Python pandas: add values from another DataFrame into a new column by condition

I have two dataframes with the following data:
import pandas as pd
fixtures = pd.DataFrame(
    {'HomeTeam': ["A", "B", "C", "D"], 'AwayTeam': ["E", "F", "G", "H"]})
ratings = pd.DataFrame({'team': ["A", "B", "C", "D", "E", "F", "G", "H"],
                        "rating": ["1,5", "0,2", "0,5", "2", "3", "4,8", "0,9", "-0,4"]})
Now I want to map the values from ratings["rating"] to the respective team names, but I can't get it to work. Is it possible to have new columns with the ratings appear to the right of the HomeTeam and AwayTeam columns?
Expected output:
fixtures:
HomeTeam HomeTeamRating AwayTeam AwayTeamRating
Team A 1,5 Team E 3

You can use:
to_replace = dict(zip(ratings.team, ratings.rating))  # create a dict: key is the team name, value is the rating
#{'A': '1,5', 'B': '0,2', 'C': '0,5', 'D': '2', 'E': '3', 'F': '4,8', 'G': '0,9', 'H': '-0,4'}
fixtures['homeTeamRating'] = fixtures['HomeTeam'].map(to_replace)  # map each team name to its rating in a new column
fixtures['AwayTeamRating'] = fixtures['AwayTeam'].map(to_replace)
fixtures = fixtures[['HomeTeam', 'homeTeamRating', 'AwayTeam', 'AwayTeamRating']]  # reorder the columns
'''
HomeTeam homeTeamRating AwayTeam AwayTeamRating
0 A 1,5 E 3
1 B 0,2 F 4,8
2 C 0,5 G 0,9
3 D 2 H -0,4
'''

If you need to compute a new column from existing columns with a custom function (rather than a simple lookup), pandas.DataFrame.apply() (or Series.apply() for a single column) should do the trick.
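A minimal self-contained sketch of the same map-based lookup, assuming you want the ratings as numbers: it converts the comma decimal separators to dots before calling astype(float) (an assumption; drop that step to keep the original strings), and uses DataFrame.insert to place each rating column directly to the right of its team column instead of reordering afterwards. The column names HomeTeamRating and AwayTeamRating are illustrative.

```python
import pandas as pd

fixtures = pd.DataFrame(
    {"HomeTeam": ["A", "B", "C", "D"], "AwayTeam": ["E", "F", "G", "H"]})
ratings = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "rating": ["1,5", "0,2", "0,5", "2", "3", "4,8", "0,9", "-0,4"],
})

# Lookup Series: index = team name, value = rating.
# Assumption: ratings use a comma decimal separator, so swap it for a dot
# before converting to float.
lookup = (ratings.set_index("team")["rating"]
          .str.replace(",", ".", regex=False)
          .astype(float))

# insert() places each new column immediately to the right of its team column.
fixtures.insert(fixtures.columns.get_loc("HomeTeam") + 1, "HomeTeamRating",
                fixtures["HomeTeam"].map(lookup))
fixtures.insert(fixtures.columns.get_loc("AwayTeam") + 1, "AwayTeamRating",
                fixtures["AwayTeam"].map(lookup))
print(fixtures)
```

Teams missing from ratings would get NaN from map(), which makes gaps easy to spot.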

Related

Pandas new col with indexes of rows sharing a code in another col

Let's say I have a DataFrame indexed on a unique Code. Each entry may inherit from another (unique) entry: the parent's Code is given in column Herit.
I need a new column giving the list of children for every entry. I can obtain it for a given Code, but I can't manage to build the whole column.
Here is my M(non)WE:
import pandas as pd
data = pd.DataFrame({
"Code": ["a", "aa", "ab", "b", "ba", "c"],
"Herit": ["", "a", "a", "", "b", ""],
"C": [12, 15, 13, 12, 14, 10]
}
)
data.set_index("Code", inplace=True)
print(data)
child_a = data[data.Herit == "a"].index.values
print(child_a)
data["child"] = data.apply(lambda x: data[data.Herit == x.index].index.values, axis=1)
print(data)
Working on the frame before set_index("Code") (called df here, so Code is still a column), you can group by the Herit column and then reduce the corresponding Codes into lists:
>>> herits = df.groupby("Herit").Code.agg(list)
>>> herits
Herit
[a, b, c]
a [aa, ab]
b [ba]
Then map the Code column of your frame through this Series, assign the result to a new column, and fill the rows that don't have any children with "":
>>> df["Children"] = df.Code.map(herits).fillna("")
>>> df
Code Herit C Children
0 a 12 [aa, ab]
1 aa a 15
2 ab a 13
3 b 12 [ba]
4 ba b 14
5 c 10
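For reference, a self-contained sketch that works on the question's data, which is already indexed by Code. It applies the same groupby idea, then looks up each row's own Code; as a deliberate variation, rows without children get an empty list instead of "". The column name child follows the question.

```python
import pandas as pd

data = pd.DataFrame({
    "Code": ["a", "aa", "ab", "b", "ba", "c"],
    "Herit": ["", "a", "a", "", "b", ""],
    "C": [12, 15, 13, 12, 14, 10],
}).set_index("Code")

# Parent code -> list of child codes (reset_index brings Code back as a column).
children = data.reset_index().groupby("Herit")["Code"].agg(list)

# Look up each row's own Code; codes that are nobody's parent get [].
data["child"] = pd.Series(
    [children.get(code, []) for code in data.index],
    index=data.index, dtype=object)
print(data)
```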

How to replace strings with values from one DataFrame in another DataFrame

I have two DataFrames: in one (df1) each user has string values, while in the other (df2) each string has an associated value.
I want a new DataFrame similar to df1, but with each string replaced by its corresponding value from df2. Is there a simple way to create such a DataFrame?
Here is sample data for df1 and df2:
df1 = pd.DataFrame({"user": ["user1", "user2", "user3", "user4"], "p1": ["A", "C", "D", "D"],"p2": ["B", "D", "D", "A"],"p3": ["A", "B", "C", "D"],"p4": ["D", "A", "B", "C"], }, index=[0, 1, 2, 3], )
df2 = pd.DataFrame({"N1": ["A", "B", "C", "D"],"N2": ["1", "2", "5", "6"], }, index=[0, 1, 2, 3], )
My desired output should look like this
You can use DataFrame.stack() with Series.map and then DataFrame.unstack():
In [95]: df3 = df1.set_index('user').stack().map(df2.set_index('N1')['N2']).unstack()
In [96]: df3
Out[96]:
p1 p2 p3 p4
user
user1 1 2 1 6
user2 5 6 2 1
user3 6 6 5 2
user4 6 1 6 5
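The stack/unstack round trip can also be avoided by mapping each column on its own with DataFrame.apply. A sketch, assuming you want integers rather than the strings stored in df2.N2 (hence the astype(int)):

```python
import pandas as pd

df1 = pd.DataFrame({
    "user": ["user1", "user2", "user3", "user4"],
    "p1": ["A", "C", "D", "D"],
    "p2": ["B", "D", "D", "A"],
    "p3": ["A", "B", "C", "D"],
    "p4": ["D", "A", "B", "C"],
})
df2 = pd.DataFrame({"N1": ["A", "B", "C", "D"], "N2": ["1", "2", "5", "6"]})

# Lookup Series: index = letter (N1), value = number (N2) as int.
lookup = df2.set_index("N1")["N2"].astype(int)

# apply() hands each column (a Series) to the lambda, which maps letters to numbers.
df3 = df1.set_index("user").apply(lambda col: col.map(lookup))
print(df3)
```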

How to return columns from a Dataframe in a function that were not calculated by the function?

I have the following DataFrames:
import pandas as pd
df_county = pd.DataFrame({
"A": [50],
"B": [60],
"C": [70]})
df_voronoi = pd.DataFrame({
"area": [1000, 2000, 3000, 4000],
"county": ["A", "B", "C", "A"],
"bus":["bus1", "bus4", "bus20", "bus2"]})
With the following function I am calculating my values:
def calc(df1, df2):
    return [1/(df1[county] / area) for county, area in zip(df2.county, df2.area)]
df=calc(df_county,df_voronoi)
df=pd.DataFrame(df)
print(df)
Result:
Here county is the index. I want county as its own column, and I want the bus column from the Voronoi DataFrame as a column, correctly matched to the county and area.
That means I would like the function to return output that looks like this:
How to realize that?
And an extra question:
Does it matter where I define the function? I have an example where the function is defined at the top and the return type is a pandas DataFrame. In this example it's a list, and I have to build a DataFrame from the list. If so, can you explain why?
I think you need a small modification to your existing structure. Try this:
import pandas as pd
df_county = pd.DataFrame({
"A": [50],
"B": [60],
"C": [70]})
df_voronoi = pd.DataFrame({
"area": [1000, 2000, 3000, 4000],
"country": ["A", "B", "C", "A"],
"bus":["bus1", "bus4", "bus20", "bus2"]})
def calc(df1, df2):
    return [(1/(df1[country] / area), area) for country, area in zip(df2.country, df2.area)]
df=calc(df_county,df_voronoi)
mdf= pd.DataFrame([f[0] for f in df]).reset_index()
mdf["area"]= [f[1] for f in df]
mdf.columns = ["country","factor","area"]
print(mdf)
country factor area
0 A 20.000000 1000
1 B 33.333333 2000
2 C 42.857143 3000
3 A 80.000000 4000
I added the area column because otherwise we can't tell which bus belongs to which row (df2 contains two A rows):
merged = pd.merge(mdf,df_voronoi,on=["country","area"],how="left")
merged = merged.drop(columns=["area"])
print(merged)
country factor bus
0 A 20.000000 bus1
1 B 33.333333 bus4
2 C 42.857143 bus20
3 A 80.000000 bus2
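A possibly simpler sketch that avoids the tuple list and the merge, assuming df_county always has exactly one row. Since 1/(value/area) equals area/value, the factor can be computed row by row on df_voronoi, so bus stays attached to the right county and area automatically. On the extra question: where a function is defined does not decide its return type; it returns whatever its return statement builds, and it only has to be defined before it is called. Here calc builds and returns a DataFrame directly.

```python
import pandas as pd

df_county = pd.DataFrame({"A": [50], "B": [60], "C": [70]})
df_voronoi = pd.DataFrame({
    "area": [1000, 2000, 3000, 4000],
    "county": ["A", "B", "C", "A"],
    "bus": ["bus1", "bus4", "bus20", "bus2"],
})

def calc(df1, df2):
    # Assumes df1 has exactly one row: turn it into a Series county -> value.
    county_values = df1.iloc[0]
    out = df2[["county", "bus"]].copy()
    # 1 / (value / area) == area / value; map() looks up each row's county.
    out["factor"] = df2["area"] / df2["county"].map(county_values)
    return out  # already a DataFrame, no list-to-DataFrame step needed

result = calc(df_county, df_voronoi)
print(result)
```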

Check a condition in the cells of a column and return a value if the condition is fulfilled, using lambda (Python)

Imagine the following DataFrame:
data = pd.DataFrame({"col1" : ["a", "b", "z","w", "g", "p", "f"], "col2" :
["010", "030","500","333","090","050","111"]})
I want to use a lambda function to remove a leading 0 from the cells in col2.
What I have tried:
data["col2"].apply(lambda row: row["col2"][1:] if row["col2"][0:1] == "0" else row["col2"])
But it is not working; it raises the following error:
TypeError: string indices must be integers
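So col2 should end up as 10, 30, 500, 333, 90, 50, 111.
There is no need to use 'col2': Series.apply passes each cell's value (a string) to the lambda, not a row, so row["col2"] tries to index a string with a string, hence the TypeError. Assign the result back to keep it:
data["col2"] = data["col2"].apply(lambda row: row[1:] if row[0:1] == "0" else row)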
You can also use a regular expression:
import re
import pandas as pd
data = pd.DataFrame({"col1" : ["a", "b", "z","w", "g", "p", "f"], "col2" :
["010", "030","500","333","090","050","111"]})
data['col2'] = data['col2'].apply(lambda x: re.sub(r"^0", '', x))
output:
col1 col2
0 a 10
1 b 30
2 z 500
3 w 333
4 g 90
5 p 50
6 f 111
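The regex answer can also be written without apply, using the vectorized .str accessor. The pattern ^0 removes exactly one leading zero, as the question asks; ^0+ would strip all of them, which is closer to what converting to a number does.

```python
import pandas as pd

data = pd.DataFrame({
    "col1": ["a", "b", "z", "w", "g", "p", "f"],
    "col2": ["010", "030", "500", "333", "090", "050", "111"],
})

# Vectorized: drop exactly one leading "0" from each string in col2.
data["col2"] = data["col2"].str.replace(r"^0", "", regex=True)
print(data)
```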
to_numeric() converts its argument to a numeric type.
astype() changes the data type of a Series.
Example:
import pandas as pd
df = pd.DataFrame({"col1" : ["a", "b", "z","w", "g", "p", "f"], "col2" :
["010", "030","500","333","090","050","111"]})
df.col2 = pd.to_numeric(df.col2, errors='coerce').astype(str)
#or
#df.col2 = df.col2.astype(int).astype(str)
print(df)
Output:
col1 col2
0 a 10
1 b 30
2 z 500
3 w 333
4 g 90
5 p 50
6 f 111

Divide a column depending on a row value in pandas

I am trying to do a calculation in Pandas that looks obvious, but after several tries I did not find how to do it correctly.
I have a dataframe that looks like this:
df = pd.DataFrame([["A", "a", 10.0],
["A", "b", 12.0],
["A", "c", 13.0],
["B", "a", 5.0 ],
["B", "b", 6.0 ],
["B", "c", 7.0 ]])
The first column is a test name, the second column is a class, and the third column gives a time. Each test is normally present in the table with all 3 classes.
This is the format I need for a plot like this:
sns.factorplot(x="2", y="0", hue="1", data=df,
kind="bar")
So that for each test, I get a group of 3 bars, one for each class.
However I would like to change the dataframe so that each value in column 2 is not an absolute value, but a ratio compared to class "a".
So I would like to transform it to this:
df = pd.DataFrame([["A", "a", 1.0],
["A", "b", 1.2],
["A", "c", 1.3],
["B", "a", 1.0],
["B", "b", 1.2],
["B", "c", 1.4]])
I am able to extract the series, change the index so that they match, and do the computation, for example:
df_a = df[df[1] == "a"].set_index(0)
df_b = df[df[1] == "b"].set_index(0)
df_b["ratio_a"] = df_b[2] / df_a[2]
But this is certainly very inefficient, and I would then need to put the results back into the original format.
What is the correct way to do it?
You could use groupby/transform('first') to find the first value in each group:
import pandas as pd
df = pd.DataFrame([["A", "a", 10.0],
["A", "b", 12.0],
["A", "c", 13.0],
["B", "b", 6.0 ],
["B", "a", 5.0 ],
["B", "c", 7.0 ]])
df = df.sort_values(by=[0,1])
df[2] /= df.groupby(0)[2].transform('first')
yields
0 1 2
0 A a 1.0
1 A b 1.2
2 A c 1.3
4 B a 1.0
3 B b 1.2
5 B c 1.4
You can also do this with index alignment, assuming the columns are named 'test', 'class' and 'time' (rather than 0, 1 and 2):
df1 = df.set_index(['test', 'class'])
df1 / df1.xs('a', level='class')
But the transform approach is preferable.
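A sketch that does not depend on class "a" sorting first: look up the class-"a" time of each test and divide every row by it. It assumes the columns are named test, class and time, and that every test has exactly one "a" row.

```python
import pandas as pd

df = pd.DataFrame([["A", "a", 10.0],
                   ["A", "b", 12.0],
                   ["A", "c", 13.0],
                   ["B", "b", 6.0],
                   ["B", "a", 5.0],
                   ["B", "c", 7.0]],
                  columns=["test", "class", "time"])  # assumed column names

# Time of class "a" for each test, as a Series indexed by test name.
base = df.loc[df["class"] == "a"].set_index("test")["time"]

# Divide each row's time by the class-"a" time of its own test; no sorting needed.
df["ratio"] = df["time"] / df["test"].map(base)
print(df)
```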
