Pandas: create new rows based on column values - Python

df_current = pd.DataFrame({
    'Date': ['2022-09-16', '2022-09-17', '2022-09-18'],
    'Name': ['Bob Jones', 'Mike Smith', 'Adam Smith'],
    'Items Sold': [1, 3, 2],
    'Ticket Type': ['1 x GA', '2 x VIP, 1 x GA', '1 x GA, 1 x VIP']})
Date Name Items Sold Ticket Type
0 2022-09-16 Bob Jones 1 1 x GA
1 2022-09-17 Mike Smith 3 2 x VIP, 1 x GA
2 2022-09-18 Adam Smith 2 1 x GA, 1 x VIP
Hi there. I have the above dataframe, and what I'm after is new rows with the ticket type and the number of tickets sold split out, as below:
df_desired = pd.DataFrame({
    'Date': ['2022-09-16', '2022-09-17', '2022-09-17', '2022-09-18', '2022-09-18'],
    'Name': ['Bob Jones', 'Mike Smith', 'Mike Smith', 'Adam Smith', 'Adam Smith'],
    'Items Sold': [1, 2, 1, 1, 1],
    'Ticket Type': ['GA', 'VIP', 'GA', 'GA', 'VIP']})
Any help would be greatly appreciated!

# create df2 by splitting 'Ticket Type' on "," and exploding to one row per ticket entry
df2 = df_current.assign(tt=df_current['Ticket Type'].str.split(',')).explode('tt')
# split again at 'x' into the count and the type
df2[['Items Sold', 'Ticket Type']] = df2['tt'].str.split('x', expand=True)
# strip the stray whitespace left by the splits and restore the integer count
df2['Items Sold'] = df2['Items Sold'].str.strip().astype(int)
df2['Ticket Type'] = df2['Ticket Type'].str.strip()
# drop the temp column
df2.drop(columns='tt', inplace=True)
df2
Date Name Items Sold Ticket Type
0 2022-09-16 Bob Jones 1 GA
1 2022-09-17 Mike Smith 2 VIP
1 2022-09-17 Mike Smith 1 GA
2 2022-09-18 Adam Smith 1 GA
2 2022-09-18 Adam Smith 1 VIP
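An alternative sketch does the same split with a single regex via str.extractall; the group names sold and ticket are mine, not from the question:

```python
import pandas as pd

df_current = pd.DataFrame({
    'Date': ['2022-09-16', '2022-09-17', '2022-09-18'],
    'Name': ['Bob Jones', 'Mike Smith', 'Adam Smith'],
    'Items Sold': [1, 3, 2],
    'Ticket Type': ['1 x GA', '2 x VIP, 1 x GA', '1 x GA, 1 x VIP']})

# extractall pulls every "<count> x <type>" pair into one row per match,
# indexed by (original row, match number)
pairs = (df_current['Ticket Type']
         .str.extractall(r'(?P<sold>\d+)\s*x\s*(?P<ticket>\w+)')
         .reset_index(level='match', drop=True))

# join back on the original row index, duplicating Date/Name as needed
out = (df_current[['Date', 'Name']]
       .join(pairs)
       .rename(columns={'sold': 'Items Sold', 'ticket': 'Ticket Type'})
       .reset_index(drop=True))
out['Items Sold'] = out['Items Sold'].astype(int)
```

This avoids the intermediate temp column entirely, at the cost of a regex.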

Related

Create the same ids for the same names in different dataframes in pandas

I have a dataset with unique names. Another dataset contains several rows with the same names as in the first dataset.
I want to create a column with unique ids in the first dataset and another column in the second dataset with the same ids corresponding to all the same names in the first dataset.
For example:
Dataframe 1:
player_id Name
1 John Dosh
2 Michael Deesh
3 Julia Roberts
Dataframe 2:
player_id Name
1 John Dosh
1 John Dosh
2 Michael Deesh
2 Michael Deesh
2 Michael Deesh
3 Julia Roberts
3 Julia Roberts
I want to use both data frames to run deep feature synthesis using featuretools.
To be able to do something like this:
entity_set = ft.EntitySet("basketball_players")
entity_set.add_dataframe(dataframe_name="players_set",
                         dataframe=players_set,
                         index='name')
entity_set.add_dataframe(dataframe_name="season_stats",
                         dataframe=season_stats,
                         index='season_stats_id')
entity_set.add_relationship("players_set", "player_id", "season_stats", "player_id")
This should do what your question asks:
import pandas as pd
df1 = pd.DataFrame([
    'John Dosh',
    'Michael Deesh',
    'Julia Roberts'], columns=['Name'])
df2 = pd.DataFrame([
    ['John Dosh'],
    ['John Dosh'],
    ['Michael Deesh'],
    ['Michael Deesh'],
    ['Michael Deesh'],
    ['Julia Roberts'],
    ['Julia Roberts']], columns=['Name'])
print('inputs:', '\n')
print(df1)
print(df2)
df1 = df1.reset_index().rename(columns={'index':'id'}).assign(id=df1.index + 1)
df2 = df2.join(df1.set_index('Name'), on='Name')[['id'] + list(df2.columns)]
print('\noutputs:', '\n')
print(df1)
print(df2)
Input/output:
inputs:
Name
0 John Dosh
1 Michael Deesh
2 Julia Roberts
Name
0 John Dosh
1 John Dosh
2 Michael Deesh
3 Michael Deesh
4 Michael Deesh
5 Julia Roberts
6 Julia Roberts
outputs:
id Name
0 1 John Dosh
1 2 Michael Deesh
2 3 Julia Roberts
id Name
0 1 John Dosh
1 1 John Dosh
2 2 Michael Deesh
3 2 Michael Deesh
4 2 Michael Deesh
5 3 Julia Roberts
6 3 Julia Roberts
UPDATE:
An alternative solution which should give the same result is:
df1 = df1.assign(id=list(range(1, len(df1) + 1)))[['id'] + list(df1.columns)]
df2 = df2.merge(df1)[['id'] + list(df2.columns)]
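For completeness, a sketch that builds the Name-to-id mapping as an explicit dict and applies it to both frames; id_map is my name, and ids are assigned in df1's row order as in the answers above:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John Dosh', 'Michael Deesh', 'Julia Roberts']})
df2 = pd.DataFrame({'Name': ['John Dosh', 'John Dosh', 'Michael Deesh',
                             'Michael Deesh', 'Michael Deesh',
                             'Julia Roberts', 'Julia Roberts']})

# build a Name -> id mapping from the unique-name frame, then apply it to both
id_map = {name: i + 1 for i, name in enumerate(df1['Name'])}
df1.insert(0, 'id', df1['Name'].map(id_map))
df2.insert(0, 'id', df2['Name'].map(id_map))
```

The explicit dict makes it easy to reuse the same ids in any further frames featuretools needs.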

Python DataFrame: find previous row's value before a specific value with same value in other columns

I have a dataframe as follows
import pandas as pd
d = {
    'Name': ['James', 'John', 'Peter', 'Thomas', 'Jacob', 'Andrew',
             'John', 'Peter', 'Thomas', 'Jacob', 'Peter', 'Thomas'],
    'Order': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    'Place': ['Paris', 'London', 'Rome', 'Paris', 'Venice', 'Rome',
              'Paris', 'Paris', 'London', 'Paris', 'Milan', 'Milan']
}
df = pd.DataFrame(d)
Name Order Place
0 James 1 Paris
1 John 1 London
2 Peter 1 Rome
3 Thomas 1 Paris
4 Jacob 1 Venice
5 Andrew 1 Rome
6 John 2 Paris
7 Peter 2 Paris
8 Thomas 2 London
9 Jacob 2 Paris
10 Peter 3 Milan
11 Thomas 3 Milan
The dataframe represents people visiting various cities; the Order column defines the order of the visits.
I would like to find which city each person visited before Paris.
Expected dataframe is as follows
Name Order Place
1 John 1 London
2 Peter 1 Rome
4 Jacob 1 Venice
What is the pythonic way to find it?
Using merge
s = df.loc[df.Place.eq('Paris'), ['Name', 'Order']]
m = s.assign(Order=s.Order.sub(1))
m.merge(df, on=['Name', 'Order'])
Name Order Place
0 John 1 London
1 Peter 1 Rome
2 Jacob 1 Venice
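Another sketch, assuming "before" means the immediately preceding visit in each person's sequence: sort by Name and Order, then use a grouped shift(-1) to find rows whose next stop is Paris:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['James', 'John', 'Peter', 'Thomas', 'Jacob', 'Andrew',
             'John', 'Peter', 'Thomas', 'Jacob', 'Peter', 'Thomas'],
    'Order': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    'Place': ['Paris', 'London', 'Rome', 'Paris', 'Venice', 'Rome',
              'Paris', 'Paris', 'London', 'Paris', 'Milan', 'Milan']})

# sort each person's visits by Order, then look one visit ahead
df_sorted = df.sort_values(['Name', 'Order'])
next_place = df_sorted.groupby('Name')['Place'].shift(-1)
# keep the rows immediately preceding a Paris visit
before_paris = df_sorted[next_place.eq('Paris')].sort_index()
```

Unlike the merge on Order - 1, this also works when Order values have gaps, since it compares consecutive visits rather than consecutive numbers.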

Creating a new dataframe by applying a function to all cells of a dataframe

I have a data frame, df, like this:
data = {'A': ['Jason (121439)', 'Molly (194439)', 'Tina (114439)', 'Jake (127859)', 'Amy (122579)'],
        'B': ['Bob (127439)', 'Mark (136489)', 'Tyler (121443)', 'John (126259)', 'Anna(174439)'],
        'C': ['Jay (121596)', 'Ben (12589)', 'Toom (123586)', 'Josh (174859)', 'Al(121659)'],
        'D': ['Paul (123839)', 'Aaron (124159)', 'Steve (161899)', 'Vince (179839)', 'Ron (128379)']}
df = pd.DataFrame(data)
And I want to create a new data frame with one column holding the name and the other column holding the number between parentheses, which would look like this:
data2 = {'Name': ['Jason ', 'Molly ', 'Tina ', 'Jake ', 'Amy '],
         'ID#': ['121439', '194439', '114439', '127859', '122579']}
result = pd.DataFrame(data2)
I tried different things, but none of them worked:
1)
List_name = pd.DataFrame()
List_id = pd.DataFrame()
List_both = pd.DataFrame(columns=["Name", "ID"])
for i in df.columns:
    left = df[i].str.split("(", 1).str[0]
    right = df[i].str.split("(", 1).str[1]
    List_name = List_name.append(left)
    List_id = List_id.append(right)
List_both = pd.concat([List_name, List_id], axis=1)
List_both
2) applying a function to all cells
Names = lambda x: x.str.split("(",1).str[0]
IDS = Names = lambda x: x.str.split("(",1).str[1]
But I was wondering how to do it in order to store it in a data frame that will look like result...
You can use stack followed by str.extract.
(df.stack()
.str.strip()
.str.extract(r'(?P<Name>.*?)\s*\((?P<ID>.*?)\)$')
.reset_index(drop=True))
Name ID
0 Jason 121439
1 Bob 127439
2 Jay 121596
3 Paul 123839
4 Molly 194439
5 Mark 136489
6 Ben 12589
7 Aaron 124159
8 Tina 114439
9 Tyler 121443
10 Toom 123586
11 Steve 161899
12 Jake 127859
13 John 126259
14 Josh 174859
15 Vince 179839
16 Amy 122579
17 Anna 174439
18 Al 121659
19 Ron 128379
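If only a single column is needed (as in the OP's desired result), the stack step can be skipped and extract applied directly; a sketch, restricting the ID group to digits:

```python
import pandas as pd

df = pd.DataFrame({'A': ['Jason (121439)', 'Molly (194439)', 'Tina (114439)',
                         'Jake (127859)', 'Amy (122579)']})

# pull the name and the digits inside the parentheses from one column
result = df['A'].str.extract(r'(?P<Name>.*?)\s*\((?P<ID>\d+)\)')
```

The \s* before the opening parenthesis also absorbs the trailing space, so Name comes out clean.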

Update values in pandas dataframe

I would like to update fields in my dataframe:
df = pd.DataFrame([
    {'Name': 'Paul', 'Book': 'Plane', 'Cost': 22.50},
    {'Name': 'Jean', 'Book': 'Harry Potter', 'Cost': 2.50},
    {'Name': 'Jim', 'Book': 'Sponge bob', 'Cost': 5.00}
])
Book Cost Name
0 Plane 22.5 Paul
1 Harry Potter 2.5 Jean
2 Sponge bob 5.0 Jim
I want to change the names using this dictionary:
{"Paul": "Paula", "Jim": "Jimmy"}
to get this result :
Book Cost Name
0 Plane 22.5 Paula
1 Harry Potter 2.5 Jean
2 Sponge bob 5.0 Jimmy
Any idea?
I think you need replace with the dictionary d:
d = {"Paul": "Paula", "Jim": "Jimmy"}
df.Name = df.Name.replace(d)
print (df)
Book Cost Name
0 Plane 22.5 Paula
1 Harry Potter 2.5 Jean
2 Sponge bob 5.0 Jimmy
Another solution uses map and combine_first - map returns NaN where there is no match, so those values need to be replaced with the originals:
df.Name = df.Name.map(d).combine_first(df.Name)
print (df)
Book Cost Name
0 Plane 22.5 Paula
1 Harry Potter 2.5 Jean
2 Sponge bob 5.0 Jimmy
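A closely related sketch uses map with fillna instead of combine_first; the effect is the same:

```python
import pandas as pd

df = pd.DataFrame([
    {'Name': 'Paul', 'Book': 'Plane', 'Cost': 22.50},
    {'Name': 'Jean', 'Book': 'Harry Potter', 'Cost': 2.50},
    {'Name': 'Jim', 'Book': 'Sponge bob', 'Cost': 5.00}])

d = {"Paul": "Paula", "Jim": "Jimmy"}
# map gives NaN for names not in d; fillna restores the originals
df['Name'] = df['Name'].map(d).fillna(df['Name'])
```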

Convert a list of dictionaries with varying keys to a dataframe

How do I get a list of dictionaries converted into a dataframe whose columns are 'Event', 'Id', 'Name'?
sample = [{'event': 'up', '53118': 'Harry'},
          {'event': 'up', '51880': 'Smith'},
          {'event': 'down', '51659': 'Joe'},
          {'52983': 'Sam', 'event': 'up'},
          {'event': 'down', '52917': 'Roger'},
          {'event': 'up', '314615': 'Julie'},
          {'event': 'left', '276298': 'Andrew'},
          {'event': 'right', '457249': 'Carlos'},
          {'event': 'down', '391485': 'Jason'},
          {'event': 'right', '53191': 'Taylor'},
          {'51248': 'Benjy', 'event': 'down'}]
pd.DataFrame(sample) returns a wide frame with one column per distinct id key, mostly filled with NaN.
Is there a pythonic, panda-ic way to convert it to this form?
Event Id Name
up 53118 Harry
up 51880 Smith
down 51659 Joe
pd.melt can get you most of the way, starting from your df = pd.DataFrame(sample):
In [74]: m = pd.melt(df, id_vars="event", var_name="Id", value_name="Name").dropna()
In [75]: m
Out[75]:
event Id Name
6 left 276298 Andrew
16 up 314615 Julie
30 down 391485 Jason
40 right 457249 Carlos
54 down 51248 Benjy
57 down 51659 Joe
67 up 51880 Smith
81 down 52917 Roger
91 up 52983 Sam
99 up 53118 Harry
119 right 53191 Taylor
And then you can do some cleanup (reset_index(drop=True), rename(columns={"event": "Event"}), convert Id to integers, etc.)
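Those cleanup steps could look like this, sketched on a shortened sample for brevity:

```python
import pandas as pd

sample = [{'event': 'up', '53118': 'Harry'},
          {'event': 'down', '51659': 'Joe'}]
df = pd.DataFrame(sample)

# melt, drop the NaN placeholders, then tidy names, dtypes, and the index
m = (pd.melt(df, id_vars='event', var_name='Id', value_name='Name')
       .dropna()
       .rename(columns={'event': 'Event'})
       .astype({'Id': int})
       .reset_index(drop=True))
```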
Since #eumiro makes a good point, we could also implement #MattDMo's suggestion easily enough:
In [90]: sample = [dict(event=d.pop("event"), id=min(d), name=min(d.values())) for d in sample]
In [91]: pd.DataFrame(sample)
Out[91]:
event id name
0 up 53118 Harry
1 up 51880 Smith
2 down 51659 Joe
3 up 52983 Sam
4 down 52917 Roger
5 up 314615 Julie
6 left 276298 Andrew
7 right 457249 Carlos
8 down 391485 Jason
9 right 53191 Taylor
10 down 51248 Benjy
Here I've taken advantage of the fact that once we pop event there's only one element in the dictionary left, but a more manual loop would work as easily.
You need to adjust your dicts, so that instead of having:
{'event': 'up', '53118': 'Harry'}
you have:
{'event': 'up', 'id': '53118', 'name': 'Harry'}
resulting in:
In [23]: df = pd.DataFrame(sample)
In [24]: df
Out[24]:
event id name
0 up 53118 Harry
1 up 51880 Smith
2 down 51659 Joe
3 up 52983 Sam
4 down 52917 Roger
5 up 314615 Julie
6 left 276298 Andrew
7 right 457249 Carlos
8 down 391485 Jason
9 right 53191 Taylor
10 down 51248 Benjy
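One way to perform that adjustment, assuming each dict has exactly one key besides 'event' (a sketch on the first three records):

```python
import pandas as pd

sample = [{'event': 'up', '53118': 'Harry'},
          {'event': 'up', '51880': 'Smith'},
          {'event': 'down', '51659': 'Joe'}]

# rewrite each dict so the varying key becomes explicit 'id'/'name' fields
fixed = [{'event': d['event'],
          'id': next(k for k in d if k != 'event'),
          'name': next(v for k, v in d.items() if k != 'event')}
         for d in sample]
df = pd.DataFrame(fixed)
```

Unlike the pop-based one-liner above, this does not mutate the original dicts.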
