I have the following dataframes:
df = pd.DataFrame({'nameCompany': ['Piestrita Inc', 'Total Play', 'Yate Inc', 'Spider Comp', 'Tech solutions', 'LG Inno'],
'code': ['1', '1', '2', '3', '3', '3'],
'results': ['Rick', 'Patram', 'Pulis', 'Marie', 'Landon', 'Freddy']})
df2 = pd.DataFrame({'nameCompany': ['Alaska Inc', 'Kira', 'Joli Molly', 'Health Society'],
'code': ['1', '2', '3', '3']})
df:
nameCompany     code  results
Piestrita Inc   1     Rick
Total Play      1     Patram
Yate Inc        2     Pulis
Spider Comp     3     Marie
Tech solutions  3     Landon
LG Inno         3     Freddy
df2:
nameCompany     code
Alaska Inc      1
Kira            2
Joli Molly      3
Health Society  3
I need to update nameCompany in df with the values from df2, matching on code. If a code appears only once in df2, the replacement should go into the last row of that code group in df; if it appears several times, the replacements should fill the last positions of the group. The output should therefore be the following:
df_new = pd.DataFrame({'nameCompany': ['Piestrita Inc', 'Alaska Inc', 'Kira', 'Spider Comp', 'Joli Molly', 'Health Society'],
'code': ['1', '1', '2', '3', '3', '3'],
'results': ['Rick', 'Patram', 'Pulis', 'Marie', 'Landon', 'Freddy']})
df_new:
nameCompany     code  results
Piestrita Inc   1     Rick
Alaska Inc      1     Patram
Kira            2     Pulis
Spider Comp     3     Marie
Joli Molly      3     Landon
Health Society  3     Freddy
I have tried the update method, but I have not obtained the expected results. Any suggestions?
Use GroupBy.cumcount with ascending=False to build a counter column that numbers rows from the end of each group, then merge on that counter and code with DataFrame.merge, and finally fill in the replacements with Series.combine_first:
# counter within each code group, numbering rows from the end (last row gets 0)
df['g'] = df.groupby('code').cumcount(ascending=False)
df2['g'] = df2.groupby('code').cumcount(ascending=False)
# align df2's rows with the last rows of the matching code groups in df
df = df.merge(df2, on=['code','g'], how='left', suffixes=['','_']).drop('g', axis=1)
# take the merged name where one exists, otherwise keep the original
df['nameCompany'] = df.pop('nameCompany_').combine_first(df['nameCompany'])
print(df)
nameCompany code results
0 Piestrita Inc 1 Rick
1 Alaska Inc 1 Patram
2 Kira 2 Pulis
3 Spider Comp 3 Marie
4 Joli Molly 3 Landon
5 Health Society 3 Freddy
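To see how the rows are paired up, here is a quick sketch (not part of the original answer) of the counter computed on the original df and df2; the last row of each code group gets g == 0, the one before it g == 1, and equal (code, g) pairs are what the merge matches:
print(df.groupby('code').cumcount(ascending=False).tolist())    # [1, 0, 0, 2, 1, 0]
print(df2.groupby('code').cumcount(ascending=False).tolist())   # [0, 0, 1, 0]
# e.g. (code='3', g=1) pairs Tech solutions with Joli Molly,
#      (code='3', g=0) pairs LG Inno with Health Society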
df_current = pd.DataFrame({'Date':['2022-09-16', '2022-09-17', '2022-09-18'],'Name': ['Bob Jones', 'Mike Smith', 'Adam Smith'],
'Items Sold':[1, 3, 2], 'Ticket Type':['1 x GA', '2 x VIP, 1 x GA', '1 x GA, 1 x VIP']})
Date Name Items Sold Ticket Type
0 2022-09-16 Bob Jones 1 1 x GA
1 2022-09-17 Mike Smith 3 2 x VIP, 1 x GA
2 2022-09-18 Adam Smith 2 1 x GA, 1 x VIP
Hi there. I have the above dataframe, and what I'm after is new rows with the ticket type and the number of tickets sold split out, as below:
df_desired = pd.DataFrame({'Date':['2022-09-16', '2022-09-17', '2022-09-17', '2022-09-18', '2022-09-18'],
'Name': ['Bob Jones', 'Mike Smith', 'Mike Smith', 'Adam Smith', 'Adam Smith'],
'Items Sold':[1, 2, 1, 1, 1], 'Ticket Type':['GA', 'VIP', 'GA', 'GA', 'VIP']})
Any help would be greatly appreciated!
# create df2 by splitting df['Ticket Type'] on "," and exploding so each ticket type gets its own row
# (df here is the frame called df_current in the question)
df2 = df.assign(tt=df['Ticket Type'].str.split(',')).explode('tt')
# split again at 'x' and strip the leftover whitespace around each piece
df2[['Items Sold', 'Ticket Type']] = df2['tt'].str.split('x', expand=True).apply(lambda col: col.str.strip())
# drop the temp column
df2.drop(columns='tt', inplace=True)
df2
Date Name Items Sold Ticket Type
0 2022-09-16 Bob Jones 1 GA
1 2022-09-17 Mike Smith 2 VIP
1 2022-09-17 Mike Smith 1 GA
2 2022-09-18 Adam Smith 1 GA
2 2022-09-18 Adam Smith 1 VIP
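One small follow-up (a sketch, not part of the original answer): after the string split, 'Items Sold' holds strings, so you may want to cast it back to integers and flatten the duplicated index:
df2['Items Sold'] = df2['Items Sold'].astype(int)
df2 = df2.reset_index(drop=True)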
I have to extract rows from a pandas dataframe whose values in the 'Date of birth' column occur in a list of dates.
import pandas as pd
df = pd.DataFrame({'Name': ['Jack', 'Mary', 'David', 'Bruce', 'Nick', 'Mark', 'Carl', 'Sofie'],
'Date of birth': ['1973', '1999', '1995', '1992/1991', '2000', '1969', '1994', '1989/1990']})
dates = ['1973', '1992', '1969', '1989']
new_df = df.loc[df['Date of birth'].isin(dates)]
print(new_df)
For reference, the full df is:
    Name Date of birth
0   Jack          1973
1   Mary          1999
2  David          1995
3  Bruce     1992/1991
4   Nick          2000
5   Mark          1969
6   Carl          1994
7  Sofie     1989/1990
Eventually I get the table below. As you can see, Bruce's and Sofie's rows are absent, since their values are followed by / and another year. How should I split up these values so that those rows are matched as well?
Name Date of birth
0 Jack 1973
5 Mark 1969
You could use str.contains:
import pandas as pd
df = pd.DataFrame({'Name': ['Jack', 'Mary', 'David', 'Bruce', 'Nick', 'Mark', 'Carl', 'Sofie'],
'Date of birth': ['1973', '1999', '1995', '1992/1991', '2000', '1969', '1994', '1989/1990']})
dates = ['1973', '1992', '1969', '1989']
new_df = df.loc[df['Date of birth'].str.contains(rf"\b(?:{'|'.join(dates)})\b")]
print(new_df)
Output
Name Date of birth
0 Jack 1973
3 Bruce 1992/1991
5 Mark 1969
7 Sofie 1989/1990
The string rf"\b(?:{'|'.join(dates)})\b" builds a regex pattern that matches any string containing one of the dates as a whole word; the non-capturing group keeps the \b anchors applied to every alternative.
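For clarity, this is the pattern the f-string builds (a quick sketch):
dates = ['1973', '1992', '1969', '1989']
print(rf"\b(?:{'|'.join(dates)})\b")   # prints \b(?:1973|1992|1969|1989)\b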
I like @DaniMesejo's way better, but here is an approach that splits up the values and stacks them:
df[df['Date of birth'].str.split('/', expand=True).stack().isin(dates).groupby(level=0).max()]
Output:
Name Date of birth
0 Jack 1973
3 Bruce 1992/1991
5 Mark 1969
7 Sofie 1989/1990
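To see what the one-liner does, this sketch shows the intermediate stacked Series: each year becomes its own row, indexed by the original row label plus the position of the piece, and groupby(level=0).max() then collapses the isin booleans back to one flag per original row:
print(df['Date of birth'].str.split('/', expand=True).stack())
# 0  0    1973
# 1  0    1999
# ...
# 3  0    1992
#    1    1991
# ...
# 7  0    1989
#    1    1990
# dtype: object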
I have a dataframe as follows:
import pandas as pd
d = {
'Name' : ['James', 'John', 'Peter', 'Thomas', 'Jacob', 'Andrew','John', 'Peter', 'Thomas', 'Jacob', 'Peter', 'Thomas'],
'Order' : [1,1,1,1,1,1,2,2,2,2,3,3],
'Place' : ['Paris', 'London', 'Rome','Paris', 'Venice', 'Rome', 'Paris', 'Paris', 'London', 'Paris', 'Milan', 'Milan']
}
df = pd.DataFrame(d)
Name Order Place
0 James 1 Paris
1 John 1 London
2 Peter 1 Rome
3 Thomas 1 Paris
4 Jacob 1 Venice
5 Andrew 1 Rome
6 John 2 Paris
7 Peter 2 Paris
8 Thomas 2 London
9 Jacob 2 Paris
10 Peter 3 Milan
11 Thomas 3 Milan
The dataframe represents people visiting various cities; the Order column defines the order of the visits.
I would like to find which city each person visited just before Paris.
Expected dataframe is as follows
Name Order Place
1 John 1 London
2 Peter 1 Rome
4 Jacob 1 Venice
What is the pythonic way to find it?
Using merge
# every Paris visit: who made it and at which order
s = df.loc[df.Place.eq('Paris'), ['Name', 'Order']]
# step the order back by one to point at the previous visit
m = s.assign(Order=s.Order.sub(1))
# look up those (Name, Order) pairs in the original frame
m.merge(df, on=['Name', 'Order'])
Name Order Place
0 John 1 London
1 Peter 1 Rome
2 Jacob 1 Venice
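An alternative sketch (not from the original answer): sort by Name and Order, tag each row with the place visited next via a grouped shift, and keep the rows whose next stop is Paris. This looks at the immediately preceding visit rather than Order - 1 specifically, which gives the same result for this data:
df_sorted = df.sort_values(['Name', 'Order'])
next_place = df_sorted.groupby('Name')['Place'].shift(-1)   # the place each person visits next
print(df_sorted[next_place.eq('Paris')].sort_index())
#     Name  Order   Place
# 1   John      1  London
# 2  Peter      1    Rome
# 4  Jacob      1  Venice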
I have a data frame, df, like this:
data = {'A': ['Jason (121439)', 'Molly (194439)', 'Tina (114439)', 'Jake (127859)', 'Amy (122579)'],
'B': ['Bob (127439)', 'Mark (136489)', 'Tyler (121443)', 'John (126259)', 'Anna(174439)'],
'C': ['Jay (121596)', 'Ben (12589)', 'Toom (123586)', 'Josh (174859)', 'Al(121659)'],
'D': ['Paul (123839)', 'Aaron (124159)', 'Steve (161899)', 'Vince (179839)', 'Ron (128379)']}
df = pd.DataFrame(data)
And I want to create a new data frame with one column for the name and another for the number between parentheses, which would look like this:
data2 = {'Name': ['Jason ', 'Molly ', 'Tina ', 'Jake ', 'Amy '],
'ID#': ['121439', '194439', '114439', '127859', '122579']}
result = pd.DataFrame(data2)
I tried different things, but none of them worked:
1)
List_name=pd.DataFrame()
List_id=pd.DataFrame()
List_both=pd.DataFrame(columns=["Name","ID"])
for i in df.columns:
    left = df[i].str.split("(", 1).str[0]
    right = df[i].str.split("(", 1).str[1]
    List_name = List_name.append(left)
    List_id = List_id.append(right)
    List_both = pd.concat([List_name, List_id], axis=1)
List_both
2) Applying a function to every cell:
Names = lambda x: x.str.split("(",1).str[0]
IDS = Names = lambda x: x.str.split("(",1).str[1]
But I was wondering how to do it so that everything ends up in a data frame that looks like result...
You can use stack followed by str.extract.
(df.stack()                 # flatten every cell into one long Series
   .str.strip()
   .str.extract(r'(?P<Name>.*?)\s*\((?P<ID>.*?)\)$')   # name before the parentheses, ID inside them
   .reset_index(drop=True))
Name ID
0 Jason 121439
1 Bob 127439
2 Jay 121596
3 Paul 123839
4 Molly 194439
5 Mark 136489
6 Ben 12589
7 Aaron 124159
8 Tina 114439
9 Tyler 121443
10 Toom 123586
11 Steve 161899
12 Jake 127859
13 John 126259
14 Josh 174859
15 Vince 179839
16 Amy 122579
17 Anna 174439
18 Al 121659
19 Ron 128379
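If you only want the names that come from column A, as in the result frame shown in the question, the same extract can be applied to that single column (a sketch):
result = df['A'].str.extract(r'(?P<Name>.*?)\s*\((?P<ID>.*?)\)$')
print(result)
#     Name      ID
# 0  Jason  121439
# 1  Molly  194439
# 2   Tina  114439
# 3   Jake  127859
# 4    Amy  122579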
How do I get a list of dictionaries converted into a dataframe whose columns are 'Event', 'Id', 'Name'?
sample = [{'event': 'up', '53118': 'Harry'},
{'event': 'up', '51880': 'Smith'},
{'event': 'down', '51659': 'Joe'},
{'52983': 'Sam', 'event': 'up'},
{'event': 'down', '52917': 'Roger'},
{'event': 'up', '314615': 'Julie'},
{'event': 'left', '276298': 'Andrew'},
{'event': 'right', '457249': 'Carlos'},
{'event': 'down', '391485': 'Jason'},
{'event': 'right', '53191': 'Taylor'},
{'51248': 'Benjy', 'event': 'down'}]
pd.DataFrame(sample) would return a wide frame with one column per numeric key and mostly NaN values, which is not what I want.
Is there a pythonic panda-ic way to convert it to this form?
Event Id Name
up 53118 Harry
up 51880 Smith
down 51659 Joe
pd.melt can get you most of the way, starting from your df = pd.DataFrame(sample):
In [74]: m = pd.melt(df, id_vars="event", var_name="Id", value_name="Name").dropna()
In [75]: m
Out[75]:
event Id Name
6 left 276298 Andrew
16 up 314615 Julie
30 down 391485 Jason
40 right 457249 Carlos
54 down 51248 Benjy
57 down 51659 Joe
67 up 51880 Smith
81 down 52917 Roger
91 up 52983 Sam
99 up 53118 Harry
119 right 53191 Taylor
And then you can do some cleanup (reset_index(drop=True), rename(columns={"event": "Event"}), convert Id to integers, etc.)
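That cleanup could look something like this (a sketch):
m = (m.rename(columns={"event": "Event"})
       .astype({"Id": "int64"})
       .reset_index(drop=True))
print(m.head(3))
#   Event      Id    Name
# 0  left  276298  Andrew
# 1    up  314615   Julie
# 2  down  391485   Jason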
Since @eumiro makes a good point, we could also implement @MattDMo's suggestion easily enough:
In [90]: sample = [dict(event=d.pop("event"), id=min(d), name=min(d.values())) for d in sample]
In [91]: pd.DataFrame(sample)
Out[91]:
event id name
0 up 53118 Harry
1 up 51880 Smith
2 down 51659 Joe
3 up 52983 Sam
4 down 52917 Roger
5 up 314615 Julie
6 left 276298 Andrew
7 right 457249 Carlos
8 down 391485 Jason
9 right 53191 Taylor
10 down 51248 Benjy
Here I've taken advantage of the fact that once we pop event there's only one element in the dictionary left, but a more manual loop would work as easily.
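That manual loop could look like this (a sketch, starting again from the original sample list and copying each dict so it is left untouched; it assumes each dict has exactly one key besides 'event'):
rows = []
for d in sample:
    d = dict(d)                          # work on a copy
    event = d.pop("event")
    id_, name = next(iter(d.items()))    # the one remaining pair is (id, name)
    rows.append({"event": event, "id": id_, "name": name})
print(pd.DataFrame(rows))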
You need to adjust your dicts, so that instead of having:
{'event': 'up', '53118': 'Harry'}
you have:
{'event': 'up', 'id': '53118', 'name': 'Harry'}
resulting in:
In [23]: df = pd.DataFrame(sample)
In [24]: df
Out[24]:
event id name
0 up 53118 Harry
1 up 51880 Smith
2 down 51659 Joe
3 up 52983 Sam
4 down 52917 Roger
5 up 314615 Julie
6 left 276298 Andrew
7 right 457249 Carlos
8 down 391485 Jason
9 right 53191 Taylor
10 down 51248 Benjy
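If you would rather build those adjusted dicts programmatically than by hand, a non-destructive sketch could be (the adjust helper is illustrative, not part of the original answer):
def adjust(d):
    # the single key that is not 'event' carries the id; its value is the name
    key = next(k for k in d if k != "event")
    return {"event": d["event"], "id": key, "name": d[key]}

sample = [adjust(d) for d in sample]
df = pd.DataFrame(sample)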