I have two DataFrames:
df1:
colname value
gender M
status business
age 60
df2:
colname value
name Susan
Place Africa
gender F
Is there a way I can concatenate these two DataFrames to get the expected output below? I tried an outer join, but it does not work. Thank you in advance.
Note: The DataFrames do not always share a common attribute, and I am also trying to remove the colname and value columns.
Expected output:
gender status age name Place
0 M business 60 0 0
1 F 0 0 Susan Africa
You can convert to Series with colname as index and concat:
dfs = [df1, df2]
out = pd.concat([d.set_index('colname')['value'] for d in dfs],
axis=1, ignore_index=True).T
output:
colname gender status age name Place
0 M business 60 NaN NaN
1 F NaN NaN Susan Africa
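For reference, here is a self-contained sketch of the approach above. The two sample frames are rebuilt by hand from the question, so the exact dtypes (for example, age stored as an integer in an object column) are an assumption.

```python
import pandas as pd

# Sample frames reconstructed from the question (values are illustrative).
df1 = pd.DataFrame({"colname": ["gender", "status", "age"],
                    "value": ["M", "business", 60]})
df2 = pd.DataFrame({"colname": ["name", "Place", "gender"],
                    "value": ["Susan", "Africa", "F"]})

# Turn each frame into a Series keyed by attribute name,
# place the Series side by side, then transpose so that
# every original frame becomes one row.
rows = [d.set_index("colname")["value"] for d in (df1, df2)]
out = pd.concat(rows, axis=1, ignore_index=True).T

print(out)
```

Attributes missing from a frame come out as NaN, which is why a later fillna call is needed to get the zeros in the expected output.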
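Try this. Each frame is first converted to a Series indexed by colname, so that concat can line the attributes up as columns:
pd.concat([d.set_index('colname')['value'] for d in (df1, df2)], axis=1).T.fillna(0).reset_index(drop=True)
colname gender status age name Place
0 M business 60 0 0
1 F 0 0 Susan Africa
The fillna call replaces the NaN values with 0.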
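The lines below use the pandas transpose function. Note that concatenating first and transposing afterwards yields a single row in which gender appears twice, so this only matches the expected output if each frame is transposed separately.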
df_new = pd.concat([df1,df2])
df_new.set_index("colname").T
In python, I have a df that looks like this
Name ID
Anna 1
Sarah 2
Max 3
And a df that looks like this
Name ID
Dan 1
Hallie 2
Cam 3
How can I merge the df’s so that the ID column looks like this
Name ID
Anna 1
Sarah 2
Max 3
Dan 4
Hallie 5
Cam 6
This is just a minimal reproducible example. My actual data set has thousands of values. I'm basically merging DataFrames and want the IDs in numerical order (continuing from the previous DataFrame) instead of restarting from one each time.
Use pd.concat:
out = pd.concat([df1, df2.assign(ID=df2['ID'] + df1['ID'].max())], ignore_index=True)
print(out)
# Output
Name ID
0 Anna 1
1 Sarah 2
2 Max 3
3 Dan 4
4 Hallie 5
5 Cam 6
Concatenate the two DataFrames, call reset_index, and use the new index to assign the IDs:
df_new = pd.concat((df1, df2)).reset_index(drop=True)
df_new['ID'] = df_new.index + 1
Output:
Name ID
0 Anna 1
1 Sarah 2
2 Max 3
3 Dan 4
4 Hallie 5
5 Cam 6
You can concat dataframes with ignore_index=True and then set ID column:
df = pd.concat([df1, df2], ignore_index=True)
df['ID'] = df.index + 1
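The answers above fall into two groups: shifting df2's IDs by df1's maximum, or renumbering everything from the new index. A small sketch comparing them, with the sample frames rebuilt from the question (the variable names offset and renumbered are only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Anna", "Sarah", "Max"], "ID": [1, 2, 3]})
df2 = pd.DataFrame({"Name": ["Dan", "Hallie", "Cam"], "ID": [1, 2, 3]})

# Option 1: keep df1's IDs and shift df2's IDs past df1's largest ID.
offset = pd.concat(
    [df1, df2.assign(ID=df2["ID"] + df1["ID"].max())],
    ignore_index=True,
)

# Option 2: ignore the existing IDs and renumber from the new index.
renumbered = pd.concat([df1, df2], ignore_index=True)
renumbered["ID"] = renumbered.index + 1

print(offset)
print(renumbered)
```

The two results agree here only because df1's IDs already run from 1 to len(df1) without gaps. If they do not, the first option preserves the existing numbering, while the second overwrites every ID.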
For the life of me I can not figure out how to implement the following solution:
Suppose I have a dataframe called df1
ID Name Gender
0 Bill M
1 Adam M
2 Kat F
1 Adam M
Then I have another dataframe called df2
ID Name Age
5as Sam 34
1as Adam 64
2as Kat 50
All I want to do is check whether the ID from df1 is in the ID column of df2 and, if so, grab the corresponding Age value and attach it to df1.
Ideal Solution:
ID Name Gender Age
0 Bill M
1 Adam M 64
2 Kat F 50
1 Adam M 64
I implemented the following solution, which at first I thought worked, but then I realized it fails to match a lot of values at the end of the df. I am not sure whether that is because of what I wrote or because of the size of my CSV, which is large.
y_list = df2.ID.dropna().unique()
for x in df1.ID.unique():
    if x in y_list:
        df1.loc[df1.ID == x, 'Age'] = df2.Age
Any help is appreciated!
Here's what you can do
df3 = df1.join(df2.set_index('ID'), on='ID', lsuffix='_left')
if you want to join on the 'ID' column.
If however you are looking to join on 'Name', you can change on='Name'.
An alternative option is to use merge,
df1.merge(df2, on='Name', how='left')
Output
ID Name_x Gender Name_y Age
0 0 Bill M NaN NaN
1 1 Adam M Adam 64.0
2 2 Kat F Kat 50.0
3 1 Adam M Adam 64.0
Here's the output when using df1.set_index('ID').join(df2.set_index('ID'), lsuffix='_left')
Name_left Gender Name Age
ID
0 Bill M NaN NaN
1 Adam M Adam 64.0
1 Adam M Adam 64.0
2 Kat F Kat 50.0
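If only the Age column is wanted, the merge call above can be narrowed by selecting the key and the target column from df2 before merging, which avoids the suffixed duplicate columns. A minimal sketch, assuming Name is the matching key (the IDs as printed in the question, such as 1 and 1as, would not match each other):

```python
import pandas as pd

# Sample frames reconstructed from the question.
df1 = pd.DataFrame({"ID": [0, 1, 2, 1],
                    "Name": ["Bill", "Adam", "Kat", "Adam"],
                    "Gender": ["M", "M", "F", "M"]})
df2 = pd.DataFrame({"ID": ["5as", "1as", "2as"],
                    "Name": ["Sam", "Adam", "Kat"],
                    "Age": [34, 64, 50]})

# Left merge keeps every row of df1; only Name and Age are taken
# from df2, so no _x/_y columns are produced.
out = df1.merge(df2[["Name", "Age"]], on="Name", how="left")

print(out)
```

Rows of df1 with no match in df2 (Bill here) get NaN in Age, and the column becomes float as a result.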
You can do it as follows:
name_age_dict = dict(zip(df2['Name'], df2['Age']))
df1['Age'] = df1['Name'].map(name_age_dict).fillna('')
Another method
df1['Age'] = df1['Name'].map(df2.set_index('Name')['Age']).fillna('')
Output
ID Name Gender Age
0 0 Bill M
1 1 Adam M 64
2 2 Kat F 50
3 1 Adam M 64
I have the following dataframe, df1 :
AS AT CH TR
James Robert/01/08/2019 0 0 0 1
James Robert/18/08/2019 0 0 0 1
John Smith/01/08/2019 1 0 0 0
John Smith/02/08/2019 0 1 0 0
And df2 :
TIME
Andrew Johnson/08/08/2019 1
James Robert/01/08/2019 0.5
John Smith/02/08/2019 1
If an index value is present in both dataframes (example : James Robert/01/08/2019 and John Smith/02/08/2019), I would like to delete the row in df1 if the value of df1["Column with a value"] - df2['TIME'] = 0 otherwise I would like to update the value.
The desired output would be :
AS AT CH TR
James Robert/01/08/2019 0 0 0 0.5
James Robert/18/08/2019 0 0 0 1
John Smith/01/08/2019 1 0 0 0
If a row is in both dataframes, I'm able to delete it from df1, but I can't find a way to add this particular condition : "df1["Column with a value"]"
Thanks
Instead of relying on the index, copy it into a regular column in both DataFrames. Then use the isin method to keep only the df1 rows whose key also appears in df2.
df2['index'] = df2.index
df1['index'] = df1.index
filtered_df1 = df1[df1['index'].isin(df2['index'])].copy()
Create a dictionary that maps each 'index' key to its 'TIME' value in df2, then map it onto filtered_df1.
your_dict = dict(zip(df2['index'], df2['TIME']))
filtered_df1['Subtract Value'] = filtered_df1['index'].map(your_dict).fillna(value=0)
Then do the subtraction on the numeric columns only.
final_df = filtered_df1[['AS', 'AT', 'CH', 'TR']].sub(filtered_df1['Subtract Value'], axis=0)
Hope this helps.
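As an alternative that stays index-based, here is a sketch that subtracts TIME only from the non-zero cell of each row and then drops rows that end up all zero. It is one possible reading of the question, not the answer's method, and it assumes each df1 row holds exactly one non-zero value; the names time and result are illustrative.

```python
import pandas as pd

# Sample frames reconstructed from the question.
df1 = pd.DataFrame(
    {"AS": [0, 0, 1, 0], "AT": [0, 0, 0, 1],
     "CH": [0, 0, 0, 0], "TR": [1, 1, 0, 0]},
    index=["James Robert/01/08/2019", "James Robert/18/08/2019",
           "John Smith/01/08/2019", "John Smith/02/08/2019"],
)
df2 = pd.DataFrame(
    {"TIME": [1, 0.5, 1]},
    index=["Andrew Johnson/08/08/2019", "James Robert/01/08/2019",
           "John Smith/02/08/2019"],
)

# Align TIME to df1's index; rows missing from df2 subtract nothing.
time = df2["TIME"].reindex(df1.index).fillna(0)

# Keep zeros as they are; replace non-zero cells with (value - TIME).
result = df1.where(df1.eq(0), df1.sub(time, axis=0))

# Drop rows where nothing is left after the subtraction.
result = result[result.ne(0).any(axis=1)]

print(result)
```

With the sample data, the John Smith/02/08/2019 row is removed and the TR value for James Robert/01/08/2019 becomes 0.5, which matches the desired output in the question.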
I am trying to add a column from one pandas data-frame to another pandas data-frame.
Here is data frame 1:
print (df.head())
ID Name Gender
1 John Male
2 Denver 0
0
3 Jeff Male
Note: Both ID and Name are indexes
Here is the data frame 2:
print (df2.head())
ID Appointed
1 R
2
3 SL
Note: ID is the index here.
I am trying to add the Appointed column from df2 to df1 based on the ID. I tried inserting the column and copying the column from df2, but the Appointed column keeps returning all NaN values. So far I have had no luck, so any suggestions would be greatly appreciated.
If I understand your problem correctly, you should get what you need using this:
df1.reset_index().merge(df2.reset_index(), left_on='ID', right_on='ID')
ID Name Gender Appointed
0 1 John Male R
1 2 Denver 0 NaN
2 3 Jeff Male SL
Or, as an alternative, as pointed out by Wen, you could use join:
df1.join(df2)
Gender Appointed
ID Name
1 John Male R
2 Denver 0 NaN
0 NaN NaN NaN
3 Jeff Male SL
Reset the index of both DataFrames, then create a column named 'Appointed' in df1 and assign the same column from df2 to it.
After the reset, both DataFrames have an index that starts at 0. Column assignment in pandas aligns on the index, so the values now line up row by row. Note that reset_index returns a new DataFrame, so the result must be assigned back.
df1 = df1.reset_index()
df2 = df2.reset_index()
df1['Appointed'] = df2['Appointed']
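A minimal sketch of the join-based approach, assuming df1 carries a MultiIndex named (ID, Name) and df2 a plain index named ID, as the notes in the question describe. The sample values are reconstructed from the printed frames, with the stray row omitted, so they are illustrative only.

```python
import pandas as pd

# df1: indexed by both ID and Name.
df1 = pd.DataFrame(
    {"Gender": ["Male", "0", "Male"]},
    index=pd.MultiIndex.from_tuples(
        [(1, "John"), (2, "Denver"), (3, "Jeff")], names=["ID", "Name"]
    ),
)

# df2: indexed by ID only; the second entry has no value.
df2 = pd.DataFrame(
    {"Appointed": ["R", None, "SL"]},
    index=pd.Index([1, 2, 3], name="ID"),
)

# join matches df2's index against the ID level of df1's MultiIndex,
# because the level names overlap.
out = df1.join(df2)

print(out)
```

Because the match is made on the shared ID level, no index reset is needed and the row order of df1 does not matter.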
I want to merge two CSV files that share a column with the same content but a different header name.
a.csv:
id name country
1 Cyrus MY
2 May US
b.csv:
user_id gender
1 female
2 male
What I need is, c.csv:
id name country gender
1 Cyrus MY female
2 May US male
But the result I get when I use the below code
import csv
import pandas as pd
df1 = pd.read_csv('a.csv')
df2 = pd.read_csv('b.csv')
df3 = pd.merge(df1,df2, left_on=['id'],right_on=['user_id'], how='outer')
df3.to_csv('c.csv',index=False)
The result I get:
id name country user_id gender
1 Cyrus MY 1 female
2 May US 2 male
You could rename the user_id column in df2 to id. Since the name is the same, it won't be duplicated.
df2 = pd.read_csv('b.csv').rename(columns={'user_id': 'id'})
df3 = pd.merge(df1, df2, on='id', how='outer')
Otherwise, you can drop the user_id column after the merge.
df3 = df3.drop('user_id', axis=1)
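Both options can be checked without touching the file system. The sketch below builds the two frames in memory instead of reading a.csv and b.csv (that substitution is an assumption made only to keep the example self-contained) and shows that renaming before the merge and dropping after it give the same columns.

```python
import pandas as pd

# In-memory stand-ins for a.csv and b.csv.
df1 = pd.DataFrame({"id": [1, 2],
                    "name": ["Cyrus", "May"],
                    "country": ["MY", "US"]})
df2 = pd.DataFrame({"user_id": [1, 2],
                    "gender": ["female", "male"]})

# Option 1: rename the key so both frames share the column name.
renamed = pd.merge(df1, df2.rename(columns={"user_id": "id"}),
                   on="id", how="outer")

# Option 2: merge on differently named keys, then drop the extra key.
dropped = pd.merge(df1, df2, left_on="id", right_on="user_id",
                   how="outer").drop("user_id", axis=1)

print(renamed)
print(dropped)
```

Either result can then be written out with to_csv('c.csv', index=False), as in the question.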
You can do this with merge:
df1.merge(df2,left_on='id',right_on='user_id')
Out[35]:
id name country user_id gender
0 1 Cyrus MY 1 female
1 2 May US 2 male
Or concat
pd.concat([df1.set_index('id'), df2.set_index('user_id')], axis=1).reset_index()
Out[38]:
index name country gender
0 1 Cyrus MY female
1 2 May US male