I would like to update fields in my dataframe:
df = pd.DataFrame([
{'Name': 'Paul', 'Book': 'Plane', 'Cost': 22.50},
{'Name': 'Jean', 'Book': 'Harry Potter', 'Cost': 2.50},
{'Name': 'Jim', 'Book': 'Sponge bob', 'Cost': 5.00}
])
Book Cost Name
0 Plane 22.5 Paul
1 Harry Potter 2.5 Jean
2 Sponge bob 5.0 Jim
Changing names with this dictionary:
{"Paul": "Paula", "Jim": "Jimmy"}
to get this result:
Book Cost Name
0 Plane 22.5 Paula
1 Harry Potter 2.5 Jean
2 Sponge bob 5.0 Jimmy
Any ideas?
I think you need replace with the dictionary d:
d = {"Paul": "Paula", "Jim": "Jimmy"}
df.Name = df.Name.replace(d)
print (df)
Book Cost Name
0 Plane 22.5 Paula
1 Harry Potter 2.5 Jean
2 Sponge bob 5.0 Jimmy
Another solution uses map and combine_first. map returns NaN where a value has no match in d, so combine_first is needed to fill those gaps with the original values:
df.Name = df.Name.map(d).combine_first(df.Name)
print (df)
Book Cost Name
0 Plane 22.5 Paula
1 Harry Potter 2.5 Jean
2 Sponge bob 5.0 Jimmy
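The difference between the two approaches can be seen in a small self-contained sketch. It reuses the question's data and also shows fillna as a stand-in for combine_first; the names via_replace, mapped and via_map are illustrative only and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame([
    {'Name': 'Paul', 'Book': 'Plane', 'Cost': 22.50},
    {'Name': 'Jean', 'Book': 'Harry Potter', 'Cost': 2.50},
    {'Name': 'Jim', 'Book': 'Sponge bob', 'Cost': 5.00},
])
d = {"Paul": "Paula", "Jim": "Jimmy"}

# replace leaves values that are not keys of d untouched
via_replace = df['Name'].replace(d)

# map returns NaN for values that are not keys of d ...
mapped = df['Name'].map(d)
# ... so fall back to the original value wherever the mapping missed
via_map = mapped.fillna(df['Name'])

print(via_replace.tolist())
print(via_map.tolist())
```

Both Series can be assigned back with df['Name'] = ... to update the dataframe.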
Related
I have a dataset with unique names. Another dataset contains several rows with the same names as in the first dataset.
I want to create a column with unique ids in the first dataset and another column in the second dataset with the same ids corresponding to all the same names in the first dataset.
For example:
Dataframe 1:
player_id Name
1 John Dosh
2 Michael Deesh
3 Julia Roberts
Dataframe 2:
player_id Name
1 John Dosh
1 John Dosh
2 Michael Deesh
2 Michael Deesh
2 Michael Deesh
3 Julia Roberts
3 Julia Roberts
I want to use both data frames to run deep feature synthesis with featuretools, so that I can do something like this:
entity_set = ft.EntitySet("basketball_players")
entity_set.add_dataframe(dataframe_name="players_set",
dataframe=players_set,
index='name'
)
entity_set.add_dataframe(dataframe_name="season_stats",
dataframe=season_stats,
index='season_stats_id'
)
entity_set.add_relationship("players_set", "player_id", "season_stats", "player_id")
This should do what your question asks:
import pandas as pd
df1 = pd.DataFrame([
'John Dosh',
'Michael Deesh',
'Julia Roberts'], columns=['Name'])
df2 = pd.DataFrame([
['John Dosh'],
['John Dosh'],
['Michael Deesh'],
['Michael Deesh'],
['Michael Deesh'],
['Julia Roberts'],
['Julia Roberts']], columns=['Name'])
print('inputs:', '\n')
print(df1)
print(df2)
df1 = df1.reset_index().rename(columns={'index':'id'}).assign(id=df1.index + 1)
df2 = df2.join(df1.set_index('Name'), on='Name')[['id'] + list(df2.columns)]
print('\noutputs:', '\n')
print(df1)
print(df2)
Input/output:
inputs:
Name
0 John Dosh
1 Michael Deesh
2 Julia Roberts
Name
0 John Dosh
1 John Dosh
2 Michael Deesh
3 Michael Deesh
4 Michael Deesh
5 Julia Roberts
6 Julia Roberts
outputs:
id Name
0 1 John Dosh
1 2 Michael Deesh
2 3 Julia Roberts
id Name
0 1 John Dosh
1 1 John Dosh
2 2 Michael Deesh
3 2 Michael Deesh
4 2 Michael Deesh
5 3 Julia Roberts
6 3 Julia Roberts
UPDATE:
An alternative solution which should give the same result is:
df1 = df1.assign(id=list(range(1, len(df1) + 1)))[['id'] + list(df1.columns)]
df2 = df2.merge(df1)[['id'] + list(df2.columns)]
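A third option, sketched here under the assumption that every name in the second dataframe also appears in the first, is to build a name-to-id lookup and apply it with map. The column name player_id follows the question; the helper name name_to_id is made up for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John Dosh', 'Michael Deesh', 'Julia Roberts']})
df2 = pd.DataFrame({'Name': ['John Dosh', 'John Dosh',
                             'Michael Deesh', 'Michael Deesh', 'Michael Deesh',
                             'Julia Roberts', 'Julia Roberts']})

# Build a name -> id lookup; ids start at 1 and follow the row order of df1
name_to_id = {name: i for i, name in enumerate(df1['Name'], start=1)}

# Insert the id as the first column of both frames
df1.insert(0, 'player_id', df1['Name'].map(name_to_id))
df2.insert(0, 'player_id', df2['Name'].map(name_to_id))

print(df1)
print(df2)
```

A name in df2 that is missing from df1 would get NaN here, which makes unmatched rows easy to spot.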
df_current = pd.DataFrame({'Date':['2022-09-16', '2022-09-17', '2022-09-18'],'Name': ['Bob Jones', 'Mike Smith', 'Adam Smith'],
'Items Sold':[1, 3, 2], 'Ticket Type':['1 x GA', '2 x VIP, 1 x GA', '1 x GA, 1 x VIP']})
Date Name Items Sold Ticket Type
0 2022-09-16 Bob Jones 1 1 x GA
1 2022-09-17 Mike Smith 3 2 x VIP, 1 x GA
2 2022-09-18 Adam Smith 2 1 x GA, 1 x VIP
Hi there. I have the above dataframe, and what I'm after is new rows, with the ticket type and number of tickets sold split out such as below:
df_desired = pd.DataFrame({'Date':['2022-09-16', '2022-09-17', '2022-09-17', '2022-09-18', '2022-09-18'],
'Name': ['Bob Jones', 'Mike Smith', 'Mike Smith', 'Adam Smith', 'Adam Smith'],
'Items Sold':[1, 2, 1, 1, 1], 'Ticket Type':['GA', 'VIP', 'GA', 'GA', 'VIP']})
Any help would be greatly appreciated!
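# create df2 by splitting 'Ticket Type' on ", " and exploding each piece into its own row
# (the question's dataframe is named df_current)
df2 = df_current.assign(tt=df_current['Ticket Type'].str.split(', ')).explode('tt')
# split each piece once at " x " into the count and the ticket type
df2[['Items Sold', 'Ticket Type']] = df2['tt'].str.split(' x ', n=1, expand=True)
# the count comes back as a string, so convert it to an integer
df2['Items Sold'] = df2['Items Sold'].astype(int)
# drop the temp column
df2 = df2.drop(columns='tt')
df2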
Date Name Items Sold Ticket Type
0 2022-09-16 Bob Jones 1 GA
1 2022-09-17 Mike Smith 2 VIP
1 2022-09-17 Mike Smith 1 GA
2 2022-09-18 Adam Smith 1 GA
2 2022-09-18 Adam Smith 1 VIP
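If the spacing around the commas and the "x" is not consistent, a regular expression may be more forgiving than a fixed separator. The following is only a sketch of that idea using str.extract; the pattern and the names out and parts are assumptions made for this example.

```python
import pandas as pd

df_current = pd.DataFrame({
    'Date': ['2022-09-16', '2022-09-17', '2022-09-18'],
    'Name': ['Bob Jones', 'Mike Smith', 'Adam Smith'],
    'Items Sold': [1, 3, 2],
    'Ticket Type': ['1 x GA', '2 x VIP, 1 x GA', '1 x GA, 1 x VIP'],
})

# One row per "<count> x <type>" entry, with a fresh 0..n-1 index
out = (
    df_current
    .assign(**{'Ticket Type': df_current['Ticket Type'].str.split(',')})
    .explode('Ticket Type')
    .reset_index(drop=True)
)

# Capture the count and the type; \s* absorbs any surrounding spaces
parts = out['Ticket Type'].str.extract(r'^\s*(\d+)\s*x\s*(.+?)\s*$')
out['Items Sold'] = parts[0].astype(int)
out['Ticket Type'] = parts[1]

print(out)
```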
I need to make a function to expand a dataframe. For example, the input of the function is :
df = pd.DataFrame({
'Name':['Ali', 'Ali', 'Ali', 'Sasha', 'Sasha', 'Sasha'],
'Cart':['book', 'phonecase', 'shirt', 'phone', 'food', 'bag']
})
Suppose the value of n is 3. Then, for each person in the Name column, I have to add 3 new rows and leave Cart as np.nan. The output should look like this:
df = pd.DataFrame({
'Name':['Ali', 'Ali', 'Ali', 'Ali', 'Ali', 'Ali', 'Sasha', 'Sasha', 'Sasha', 'Sasha', 'Sasha', 'Sasha'],
'Cart':['book', 'phonecase', 'shirt', np.nan, np.nan, np.nan, 'phone', 'food', 'bag', np.nan, np.nan, np.nan]
})
How can I solve this using copy() and append()?
You can use np.repeat with pd.Series.unique. DataFrame.append was removed in pandas 2.0, so pd.concat is used here instead:
n = 3
print (pd.concat([df, pd.DataFrame(np.repeat(df["Name"].unique(), n), columns=["Name"])]))
Name Cart
0 Ali book
1 Ali phonecase
2 Ali shirt
3 Sasha phone
4 Sasha food
5 Sasha bag
0 Ali NaN
1 Ali NaN
2 Ali NaN
3 Sasha NaN
4 Sasha NaN
5 Sasha NaN
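Try this one (it adds n rows to each group of rows with the same Name value):
import pandas as pd
import numpy as np
n = 3
# one sub-frame per distinct name, in order of first appearance
list_of_df_unique_names = [df[df["Name"]==name] for name in df["Name"].unique()]
# add n rows that carry only the name (Cart becomes NaN) to each sub-frame;
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df2 = pd.concat([pd.concat([d, pd.DataFrame({"Name":np.repeat(d["Name"].values[-1], n)})])\
for d in list_of_df_unique_names]).reset_index(drop=True)
print(df2)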
Output:
Name Cart
0 Ali book
1 Ali phonecase
2 Ali shirt
3 Ali NaN
4 Ali NaN
5 Ali NaN
6 Sasha phone
7 Sasha food
8 Sasha bag
9 Sasha NaN
10 Sasha NaN
11 Sasha NaN
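Since the question asks for a function, the same idea can be wrapped up as one. This is a minimal sketch built on groupby and pd.concat; the function name pad_groups and its parameters are hypothetical, and it assumes the only columns are Name and Cart.

```python
import pandas as pd

def pad_groups(df, n):
    """Return a copy of df with n extra rows (Cart = NaN) after each Name group."""
    parts = []
    # sort=False keeps the groups in order of first appearance
    for name, group in df.groupby("Name", sort=False):
        # The filler has only the Name column, so concat fills Cart with NaN
        filler = pd.DataFrame({"Name": [name] * n})
        parts.append(pd.concat([group, filler]))
    return pd.concat(parts, ignore_index=True)

df = pd.DataFrame({
    'Name': ['Ali', 'Ali', 'Ali', 'Sasha', 'Sasha', 'Sasha'],
    'Cart': ['book', 'phonecase', 'shirt', 'phone', 'food', 'bag'],
})

padded = pad_groups(df, 3)
print(padded)
```

The input dataframe is left unchanged, so no explicit copy() is needed.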
Maybe not the most beautiful of all solutions, but it works. Say that you want to add 4 NaN rows by group. Then, given your df:
df = pd.DataFrame({
'Name':['Ali', 'Ali', 'Ali', 'Sasha', 'Sasha', 'Sasha'],
'Cart':['book', 'phonecase', 'shirt', 'phone', 'food', 'bag']
})
you can create an empty list DF and loop four times with range(4); on each pass, filter df by name and add one NaN row to that group:
DF = []
names = list(set(df.Name))
for i in range(4):
    for name in names:
        # rows belonging to the current name
        gf = df[df['Name']=='{}'.format(name)]
        # shift(-1).iloc[-1] is always NaN, so this adds one (Name, NaN) row to the group
        a = pd.concat([gf, gf.groupby('Name')['Cart'].apply(lambda x: x.shift(-1).iloc[-1]).reset_index()]).sort_values('Name').reset_index(drop=True)
        DF.append(a)
DF_full = pd.concat(DF)
Now, you'll end up with copies of your original df, so you need to dump them without dumping the NaN rows:
DFF = DF_full.sort_values(['Name','Cart'])
DFF = DFF[(~DFF.duplicated()) | (DFF['Cart'].isnull())]
which gives:
Name Cart
0 Ali book
1 Ali phonecase
2 Ali shirt
3 Ali NaN
3 Ali NaN
3 Ali NaN
3 Ali NaN
2 Sasha bag
1 Sasha food
0 Sasha phone
3 Sasha NaN
3 Sasha NaN
3 Sasha NaN
3 Sasha NaN
I have a dataframe as follows:
import pandas as pd
d = {
'Name' : ['James', 'John', 'Peter', 'Thomas', 'Jacob', 'Andrew','John', 'Peter', 'Thomas', 'Jacob', 'Peter', 'Thomas'],
'Order' : [1,1,1,1,1,1,2,2,2,2,3,3],
'Place' : ['Paris', 'London', 'Rome','Paris', 'Venice', 'Rome', 'Paris', 'Paris', 'London', 'Paris', 'Milan', 'Milan']
}
df = pd.DataFrame(d)
Name Order Place
0 James 1 Paris
1 John 1 London
2 Peter 1 Rome
3 Thomas 1 Paris
4 Jacob 1 Venice
5 Andrew 1 Rome
6 John 2 Paris
7 Peter 2 Paris
8 Thomas 2 London
9 Jacob 2 Paris
10 Peter 3 Milan
11 Thomas 3 Milan
The dataframe represents people visiting various cities; the Order column defines the order of the visits.
I would like to find which city each person visited immediately before Paris.
The expected dataframe is as follows:
Name Order Place
1 John 1 London
2 Peter 1 Rome
4 Jacob 1 Venice
What is the Pythonic way to find it?
Using merge
s = df.loc[df.Place.eq('Paris'), ['Name', 'Order']]
m = s.assign(Order=s.Order.sub(1))
m.merge(df, on=['Name', 'Order'])
Name Order Place
0 John 1 London
1 Peter 1 Rome
2 Jacob 1 Venice
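The merge works because lowering each Paris visit's Order by one yields the (Name, Order) pair of the visit just before it; an inner merge then keeps only the pairs that exist in the data. Here is the same approach as a self-contained sketch with comments, where the name before_paris is an illustrative addition:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['James', 'John', 'Peter', 'Thomas', 'Jacob', 'Andrew',
             'John', 'Peter', 'Thomas', 'Jacob', 'Peter', 'Thomas'],
    'Order': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    'Place': ['Paris', 'London', 'Rome', 'Paris', 'Venice', 'Rome',
              'Paris', 'Paris', 'London', 'Paris', 'Milan', 'Milan'],
})

# (Name, Order) of every visit to Paris
s = df.loc[df.Place.eq('Paris'), ['Name', 'Order']]

# Shift Order back by one to point at the previous visit of the same person
m = s.assign(Order=s.Order.sub(1))

# Inner merge: people whose first stop was Paris (Order 0) simply drop out
before_paris = m.merge(df, on=['Name', 'Order'])
print(before_paris)
```

James and Thomas visited Paris first, so they have no previous city and do not appear in the result.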
I have a csv file that needs to be ordered with a specific order of names.
E.g. the order key is
[David, Paul, Harry, John]
column1 of the csv is however :
Harry
David
John
Paul
And I need to order the csv so column1 is
David
Paul
Harry
John
How can I do this in pandas?
Using Categorical
df = pd.DataFrame(dict(Name=['Harry', 'David', 'John', 'Paul']))
df
Name
0 Harry
1 David
2 John
3 Paul
Set categories
cats = ['David', 'Paul', 'Harry', 'John']
df.assign(Name=pd.Categorical(df.Name, cats, ordered=True)).sort_values('Name')
Name
1 David
3 Paul
0 Harry
2 John
Without regard to the index and using sorted with a key
df.assign(Name=sorted(df.Name, key=dict(map(reversed, enumerate(cats))).get))
Name
0 David
1 Paul
2 Harry
3 John
You can set the column of names as the index and pass the list containing the order to .loc (data from @piRSquared). The list is called order here to avoid shadowing the built-in ord:
order = ['David', 'Paul', 'Harry', 'John']
df.set_index(df.Name).loc[order,:].reset_index(drop=True)
Name
0 David
1 Paul
2 Harry
3 John
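Another possibility, assuming pandas 1.1 or newer, is the key argument of sort_values, which maps each name to its position in the desired order before sorting. The names rank and ordered below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Harry', 'David', 'John', 'Paul']})
cats = ['David', 'Paul', 'Harry', 'John']

# Position of each name in the desired order
rank = {name: i for i, name in enumerate(cats)}

# key receives the column as a Series and must return a Series of sort keys
ordered = df.sort_values('Name', key=lambda s: s.map(rank))
print(ordered)
```

Unlike the Categorical approach, this leaves the dtype of the Name column unchanged, and the original index labels are kept so the rows can still be traced back.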