I want to merge two CSV files that share a key column but use different header names for it.
a.csv:
id name country
1 Cyrus MY
2 May US
b.csv:
user_id gender
1 female
2 male
What I need is, c.csv:
id name country gender
1 Cyrus MY female
2 May US male
But that is not the result I get when I use the code below:
import csv
import pandas as pd
df1 = pd.read_csv('a.csv')
df2 = pd.read_csv('b.csv')
df3 = pd.merge(df1,df2, left_on=['id'],right_on=['user_id'], how='outer')
df3.to_csv('c.csv',index=False)
The result I get:
id name country user_id gender
1 Cyrus MY 1 female
2 May US 2 male
You could rename the user_id column in df2 to id. Since the name is the same, it won't be duplicated.
df2 = pd.read_csv('b.csv').rename(columns={'user_id': 'id'})
df3 = pd.merge(df1, df2, on='id', how='outer')
Otherwise you can drop the user_id column after the merge.
df3 = df3.drop('user_id', axis=1)
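For reference, here is a minimal sketch of both options side by side. It assumes small in-memory frames in place of a.csv and b.csv (values copied from the question), so it can be run without any files on disk.

```python
import pandas as pd

# In-memory stand-ins for a.csv and b.csv (values taken from the question)
df1 = pd.DataFrame({"id": [1, 2], "name": ["Cyrus", "May"], "country": ["MY", "US"]})
df2 = pd.DataFrame({"user_id": [1, 2], "gender": ["female", "male"]})

# Option 1: rename the key first, so both frames share a single 'id' column
renamed = pd.merge(df1, df2.rename(columns={"user_id": "id"}), on="id", how="outer")

# Option 2: merge on the differently named keys, then drop the extra key column
dropped = pd.merge(
    df1, df2, left_on="id", right_on="user_id", how="outer"
).drop(columns="user_id")

print(renamed)
print(dropped)
```

Both variants end up with the columns id, name, country and gender; writing either one out with to_csv('c.csv', index=False) works as in the question.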
You can do this with merge:
df1.merge(df2,left_on='id',right_on='user_id')
Out[35]:
id name country user_id gender
0 1 Cyrus MY 1 female
1 2 May US 2 male
Or concat
pd.concat([df1.set_index('id'),df2.set_index('user_id')],axis=1).reset_index()
Out[38]:
index name country gender
0 1 Cyrus MY female
1 2 May US male
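In the concat variant the key comes back in a column called index, most likely because the two indexes carry different names ('id' and 'user_id') and the combined index is left unnamed. A small sketch of one way around that, assuming the same frames as in the question, is to name the index before resetting it:

```python
import pandas as pd

# In-memory stand-ins for a.csv and b.csv (values taken from the question)
df1 = pd.DataFrame({"id": [1, 2], "name": ["Cyrus", "May"], "country": ["MY", "US"]})
df2 = pd.DataFrame({"user_id": [1, 2], "gender": ["female", "male"]})

out = (
    pd.concat([df1.set_index("id"), df2.set_index("user_id")], axis=1)
    .rename_axis("id")   # give the combined index a name
    .reset_index()       # so it comes back as an 'id' column
)
print(out)
```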
I've browsed a few answers but haven't found exactly what I'm looking for yet.
I have a pandas dataframe with a single column structured as follows (example)
0 alex
1 7
2 female
3 nora
4 3
5 female
...
999 fred
1000 15
1001 male
I want to split that single column into 3 columns holding name, age, and gender, to look something like this:
name age gender
0 alex 7 female
1 nora 3 female
...
100 fred 15 male
Is there a way to do this? I was thinking about using the index, but I'm not sure how to actually do it.
Not the most efficient solution perhaps, but you can use pd.concat() and put them all next to each other, if they're always in order:
df = pd.DataFrame({'Value':['alex',7,'female','nora',3,'female','fred',15,'male']})
df2 = pd.concat([df[(df.index + x) % 3 == 0].reset_index(drop=True) for x in range(3)],axis=1)
df2.columns = ["name", "gender", "age"]
Returns:
name gender age
0 alex female 7
1 nora female 3
2 fred male 15
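Assuming 0 is your column name:
import numpy as np
list_a = list(df[0])
# reshape into rows of three; np.array turns the mixed values into strings, so age comes back as str
a = np.array(list_a).reshape(-1, 3).tolist()
df2 = pd.DataFrame(a, columns=["name", "age", "gender"])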
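If the ages need to stay numeric, a variation on the reshape idea is to reshape the column's own object array rather than building a new NumPy array from a list. This is only a sketch: it assumes the column is named 0, that the rows always come in name, age, gender order, and that the row count is a multiple of 3.

```python
import pandas as pd

# Sample data in the same layout as the question (single column named 0)
df = pd.DataFrame(["alex", 7, "female", "nora", 3, "female", "fred", 15, "male"])

# to_numpy() keeps the object dtype, so the integers are not turned into strings
values = df[0].to_numpy().reshape(-1, 3)

out = pd.DataFrame(values, columns=["name", "age", "gender"])
out["age"] = out["age"].astype(int)  # convert the age column to a real integer dtype
print(out)
```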
Consider unstack:
import pandas as pd
df = pd.DataFrame(["alex", 7, "female", "nora", 3, "female", "fred", 15, "male"])
people = range(len(df) // 3)
attributes = ["name", "age", "gender"]
multi_index = pd.MultiIndex.from_product([people, attributes])
df.set_index(multi_index).unstack(level=1).droplevel(level=0, axis=1).reindex(columns=attributes)
Output:
name age gender
0 alex 7 female
1 nora 3 female
2 fred 15 male
Here is one way to do it:
# step through the DF and get values for name, age and gender as arrays
# the slices start at positions 0, 1 and 2 and take every third row
name=df['Value'][::3].values
age=df['Value'][1::3].values
gender=df['Value'][2::3].values
# create a DF based on the values
out=pd.DataFrame({'name': name,
'age' : age,
'gender': gender})
out
name age gender
0 alex 7 female
1 nora 3 female
2 fred 15 male
I have two dataframes:
df1:
colname value
gender M
status business
age 60
df2:
colname value
name Susan
Place Africa
gender F
Is there a way I can concatenate these two dataframes to get the expected output below? I tried an outer join but it does not work. Thank you in advance.
Note: The dataframes do not always share the same attributes, and I am also trying to get rid of the colname and value columns.
Expected output:
gender status age name Place
0 M business 60 0 0
1 F 0 0 Susan Africa
You can convert to Series with colname as index and concat:
dfs = [df1, df2]
out = pd.concat([d.set_index('colname')['value'] for d in dfs],
axis=1, ignore_index=True).T
output:
colname gender status age name Place
0 M business 60 NaN NaN
1 F NaN NaN Susan Africa
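To get the zeros shown in the expected output instead of NaN, fillna(0) can be chained onto the same approach. A minimal sketch, with the two frames rebuilt from the question:

```python
import pandas as pd

# The two long-format frames from the question
df1 = pd.DataFrame({"colname": ["gender", "status", "age"], "value": ["M", "business", 60]})
df2 = pd.DataFrame({"colname": ["name", "Place", "gender"], "value": ["Susan", "Africa", "F"]})

dfs = [df1, df2]

# One Series per frame, indexed by attribute name; concat side by side and
# transpose so that each original frame becomes one row
wide = pd.concat(
    [d.set_index("colname")["value"] for d in dfs], axis=1, ignore_index=True
).T

out = wide.fillna(0)  # attributes missing from a frame become 0
print(out)
```

This relies on each attribute appearing at most once per frame; a repeated colname within one frame would make the index non-unique.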
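Try this (it assumes df1 and df2 are already in wide form, with one column per attribute rather than colname/value rows):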
pd.concat([df1, df2], axis=0).fillna(0).reset_index(drop=True)
gender status age name place
0 M business 60 0 0
1 F 0 0 Susan Africa
The fillna will replace the NaN values with 0.
You can also concatenate the two frames and use the pandas transpose function. Note that this puts every attribute into a single row, so gender appears twice:
df_new = pd.concat([df1,df2])
df_new.set_index("colname").T
In python, I have a df that looks like this
Name ID
Anna 1
Sarah 2
Max 3
And a df that looks like this
Name ID
Dan 1
Hallie 2
Cam 3
How can I merge the df’s so that the ID column looks like this
Name ID
Anna 1
Sarah 2
Max 3
Dan 4
Hallie 5
Cam 6
This is just a minimal reproducible example. My actual data set has 1000’s of values. I’m basically merging data frames and want the ID’s in numerical order (continuation of previous data frame) instead of repeating from one each time.
Use pd.concat:
out = pd.concat([df1, df2.assign(ID=df2['ID'] + df1['ID'].max())], ignore_index=True)
print(out)
# Output
Name ID
0 Anna 1
1 Sarah 2
2 Max 3
3 Dan 4
4 Hallie 5
5 Cam 6
Concatenate the two DataFrames, reset_index and use the new index to assign "ID"s
df_new = pd.concat((df1, df2)).reset_index(drop=True)
df_new['ID'] = df_new.index + 1
Output:
Name ID
0 Anna 1
1 Sarah 2
2 Max 3
3 Dan 4
4 Hallie 5
5 Cam 6
You can concat dataframes with ignore_index=True and then set ID column:
df = pd.concat([df1, df2], ignore_index=True)
df['ID'] = df.index + 1
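The same idea extends to any number of dataframes. Below is a small sketch wrapped in a helper; the function name concat_with_running_id is made up for illustration and is not part of pandas. Note that it throws away the original IDs and renumbers every row from 1.

```python
import pandas as pd

def concat_with_running_id(frames, id_col="ID"):
    """Stack the frames and renumber id_col as 1..n (hypothetical helper)."""
    out = pd.concat(frames, ignore_index=True)
    out[id_col] = range(1, len(out) + 1)
    return out

# Sample frames from the question
df1 = pd.DataFrame({"Name": ["Anna", "Sarah", "Max"], "ID": [1, 2, 3]})
df2 = pd.DataFrame({"Name": ["Dan", "Hallie", "Cam"], "ID": [1, 2, 3]})

combined = concat_with_running_id([df1, df2])
print(combined)
```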
I am trying to detect phone numbers. My country code is +62, but some phone manufacturers or operators write 0 instead of +62. After querying and pivoting I get the data below, but the same number ends up split across two columns.
Here's the pivoted data
Id +623684682 03684682 +623684684 03684684
1 1 0 1 1
2 1 1 2 1
Here's what I need: the columns grouped, but without grouping them manually (+623684682 and 03684682 are the same number, etc.)
Id 03684682 03684684
1 1 2
2 2 3
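I think you need to replace the prefix and aggregate with sum:
# group on the transposed frame; groupby(..., axis=1) is deprecated in recent pandas
df = df.T.groupby(lambda x: x.replace('+62','0')).sum().T
Or replace the column names and sum:
df.columns = df.columns.str.replace('+62','0', regex=False)
# sum(level=0, axis=1) was removed in pandas 2.0, so sum the duplicate labels via groupby
df = df.T.groupby(level=0).sum().T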
print (df)
03684682 03684684
Id
1 1 2
2 2 3
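A self-contained sketch of the same grouping, for anyone who wants to try it on the sample data. It assumes Id is the index of the pivoted frame, and it uses a small helper (normalise, a made-up name) that rewrites +62 only when it is the leading prefix, rather than anywhere in the string.

```python
import pandas as pd

# Pivoted sample data from the question, with Id as the index
df = pd.DataFrame(
    {"+623684682": [1, 1], "03684682": [0, 1], "+623684684": [1, 2], "03684684": [1, 1]},
    index=pd.Index([1, 2], name="Id"),
)

def normalise(number):
    # Hypothetical helper: rewrite a leading +62 as 0, leave other labels alone
    return "0" + number[len("+62"):] if number.startswith("+62") else number

# Transpose so the phone numbers become the index, group the labels by their
# normalised form, sum the duplicates, and transpose back
out = df.T.groupby(normalise).sum().T
print(out)
```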
I am trying to add a column from one pandas data-frame to another pandas data-frame.
Here is data frame 1:
print (df.head())
ID Name Gender
1 John Male
2 Denver 0
0
3 Jeff Male
Note: Both ID and Name are indexes
Here is the data frame 2:
print (df2.head())
ID Appointed
1 R
2
3 SL
Note: ID is the index here.
I am trying to add the Appointed column from df2 to df1 based on the ID. I tried inserting the column and copying the column from df2 but the Appointed column keeps returning all NAN values. So far I had no luck any suggestions would be greatly appreciated.
If I understand your problem correctly, you should get what you need using this:
df1.reset_index().merge(df2.reset_index(), left_on='ID', right_on='ID')
ID Name Gender Appointed
0 1 John Male R
1 2 Denver 0 NaN
2 3 Jeff Male SL
Or, as an alternative, as pointed out by Wen, you could use join:
df1.join(df2)
Gender Appointed
ID Name
1 John Male R
2 Denver 0 NaN
0 NaN NaN NaN
3 Jeff Male SL
Reset the index of both dataframes, then create a column named 'Appointed' in df1 and assign the same column from df2 to it.
After resetting the index, both dataframes have an index starting from 0. When we assign the column, pandas aligns the values on that index automatically, so the rows are matched by position rather than by ID.
# reset_index returns a new dataframe, so the result has to be assigned back
df1 = df1.reset_index()
df2 = df2.reset_index()
df1['Appointed'] = df2['Appointed']
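Since positional assignment only lines up when both frames hold the same IDs in the same order, matching on ID is usually safer. The sketch below rebuilds simplified versions of the two frames (values taken from the question, with the stray row left out) and shows a left merge and an index join; a likely reason the original attempt produced all NaN is that the two indexes did not align.

```python
import pandas as pd

# Simplified versions of the frames in the question
df1 = pd.DataFrame(
    {"ID": [1, 2, 3], "Name": ["John", "Denver", "Jeff"], "Gender": ["Male", "0", "Male"]}
).set_index(["ID", "Name"])                       # ID and Name are both index levels
df2 = pd.DataFrame(
    {"ID": [1, 2, 3], "Appointed": ["R", None, "SL"]}
).set_index("ID")                                 # ID is the index

# Left merge on the ID column keeps every row of df1
merged = df1.reset_index().merge(df2.reset_index(), on="ID", how="left")

# join matches df2's index against the ID level of df1's MultiIndex
joined = df1.join(df2)

print(merged)
print(joined)
```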