Let's say df1 looks like:
id x
a 1
b 2
b 3
c 4
and df2 looks like:
id y
b 9
b 8
How do I merge them so that the output is:
id x y
b 2 9
b 3 8
I've tried pd.merge(df1, df2, on='id') but it is giving me:
id x y
b 2 9
b 2 8
b 3 9
b 3 8
which is not what I want.
IIUC (if I understand correctly), number the repeated ids within each group with GroupBy.cumcount and merge on both id and that counter:
new_df = (df1.assign(count=df1.groupby('id').cumcount())
.merge(df2.assign(count=df2.groupby('id').cumcount()),
on=['id', 'count'], how='inner')
.drop(columns='count'))
id x y
0 b 2 9
1 b 3 8
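For completeness, a self-contained version of the above, building both frames from the question:

```python
import pandas as pd

# frames from the question
df1 = pd.DataFrame({'id': ['a', 'b', 'b', 'c'], 'x': [1, 2, 3, 4]})
df2 = pd.DataFrame({'id': ['b', 'b'], 'y': [9, 8]})

# cumcount numbers repeated ids 0, 1, 2, ... within each group,
# so the k-th 'b' in df1 pairs with the k-th 'b' in df2
new_df = (df1.assign(count=df1.groupby('id').cumcount())
             .merge(df2.assign(count=df2.groupby('id').cumcount()),
                    on=['id', 'count'], how='inner')
             .drop(columns='count'))
print(new_df)
```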
df1 = pd.DataFrame({'A':[3,5,2,5], 'B':['w','x','y','z'], 'C':['0','0','0','0']})
df2 = pd.DataFrame({'B':['w','x','y','z'],'C':['1','2','3','4'], 'D':[10,20,30,40]})
I'm trying to merge df1 and df2 on B and keep all of the A, B, C and D columns:
A B C D
0 3.0 w 1 10.0
1 5.0 x 2 20.0
2 2.0 y 3 30.0
3 5.0 z 4 40.0
I've tried df1.merge(df2, how='outer', on='B')
A B C_x C_y D
0 3 w 0 1 10
1 5 x 0 2 20
2 2 y 0 3 30
3 5 z 0 4 40
which is almost what I want, but I need the C from df2 to replace the C from df1. How can I achieve that?
If you don't want C from the left-hand side at all, you could simply drop it before the merge:
df1 = pd.DataFrame({'A':[3,5,2,5], 'B':['w','x','y','z'], 'C':['0','0','0','0']})
df2 = pd.DataFrame({'B':['w','x','y','z'],'C':['1','2','3','4'], 'D':[10,20,30,40]})
result = pd.merge(
df1.drop('C', axis=1),
df2,
how='outer',
on='B')
A B C D
0 3 w 1 10
1 5 x 2 20
2 2 y 3 30
3 5 z 4 40
Edit:
However, if you want to fall back to df1's C in cases where df2 has no match, you can use combine_first():
df1 = pd.DataFrame({'A':[3,5,2,5,6], 'B':['w','x','y','z','q'], 'C':['0','0','0','0','88']})
df2 = pd.DataFrame({'B':['w','x','y','z'],'C':['1','2','3','4'], 'D':[10,20,30,40]})
result_2 = pd.merge(df1, df2, how='outer', on='B')
result_2['C'] = result_2['C_y'].combine_first(result_2['C_x'])
result_2.drop(['C_x', 'C_y'], axis=1, inplace=True)
A B D C
0 3 w 10.0 1
1 5 x 20.0 2
2 2 y 30.0 3
3 5 z 40.0 4
4 6 q NaN 88
Here is one way to do it: select only the columns you need from df1 before the merge.
df1[['A','B']].merge(df2,
on='B',
how='outer')
A B C D
0 3 w 1 10
1 5 x 2 20
2 2 y 3 30
3 5 z 4 40
I got the following data structure for object 1:
dayofweek A B C
Monday 1 2 3
Tuesday 4 5 6
I have the same columns A, B, C for several other objects, say Obj1, Obj2, Obj3.
I want to put all the data in one dataframe with the following MultiIndex column structure:
object Obj1 Obj2 Obj3
dayofweek A B C A B C A B C
Monday 1 2 3 2 1 3 3 2 1
Tuesday 4 5 6 5 4 6 6 5 4
How can I do it easily? I tried .unstack(), but it puts the objects' labels below the A, B, C columns.
Use concat with the keys parameter to build the MultiIndex; here Obj2 and Obj3 are simulated by renaming columns:
df = df.set_index('dayofweek')
df1 = df.rename(columns={'A':'B', 'B':'A'}).sort_index(axis=1)
df2 = df.rename(columns={'A':'C', 'C':'A'}).sort_index(axis=1)
df3 = pd.concat([df, df1, df2], keys=('Obj1','Obj2','Obj3'), axis=1)
print (df3)
Obj1 Obj2 Obj3
A B C A B C A B C
dayofweek
Monday 1 2 3 2 1 3 3 2 1
Tuesday 4 5 6 5 4 6 6 5 4
If there are 3 DataFrames with dayofweek column use:
dfs = [df, df1, df2]
df3 = pd.concat([x.set_index('dayofweek') for x in dfs], keys=('Obj1','Obj2','Obj3'), axis=1)
print (df3)
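A runnable version of the three-DataFrame case, with obj1/obj2/obj3 built from the example values in the question:

```python
import pandas as pd

# one frame per object, each with its own dayofweek column
# (values taken from the example in the question)
obj1 = pd.DataFrame({'dayofweek': ['Monday', 'Tuesday'],
                     'A': [1, 4], 'B': [2, 5], 'C': [3, 6]})
obj2 = pd.DataFrame({'dayofweek': ['Monday', 'Tuesday'],
                     'A': [2, 5], 'B': [1, 4], 'C': [3, 6]})
obj3 = pd.DataFrame({'dayofweek': ['Monday', 'Tuesday'],
                     'A': [3, 6], 'B': [2, 5], 'C': [1, 4]})

# concat side by side; keys becomes the top level of the column MultiIndex
dfs = [obj1, obj2, obj3]
df3 = pd.concat([x.set_index('dayofweek') for x in dfs],
                keys=('Obj1', 'Obj2', 'Obj3'), axis=1)
print(df3)
```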
Try to use merge:
print(obj1.merge(obj2, on='dayofweek').merge(obj3, on='dayofweek'))
result:
dayofweek A_x B_x C_x A_y B_y C_y A B C
0 Monday 1 2 3 2 1 3 3 2 1
1 Tuesday 4 5 6 5 4 6 6 5 4
I want to pick only the rows from df1 where the values of columns A and B match the values of columns A and B in df2. For example, if df1 and df2 are as follows:
df1
A B C
1 2 3
4 5 6
6 7 8
df2
A B D E
1 2 6 8
2 3 7 9
4 5 2 1
the result will be a subset of df1's rows; in this example it will look like:
df1
A B C
1 2 3
4 5 6
Use:
df = pd.merge(df1, df2[["A", "B"]], on=["A", "B"], how="inner")
print(df)
This prints:
A B C
0 1 2 3
1 4 5 6
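A self-contained version of this semi-join; the drop_duplicates here is a defensive addition (not in the answer above) that keeps the merge from duplicating df1 rows if df2 ever repeats an (A, B) pair:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 4, 6], 'B': [2, 5, 7], 'C': [3, 6, 8]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [2, 3, 5],
                    'D': [6, 7, 2], 'E': [8, 9, 1]})

# semi-join: keep df1 rows whose (A, B) pair also appears in df2
df = pd.merge(df1, df2[['A', 'B']].drop_duplicates(),
              on=['A', 'B'], how='inner')
print(df)
```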
How can I merge the following two data frames on columns A and B:
df1
A B C
1 2 3
2 8 2
4 7 9
df2
A B C
5 6 7
2 8 9
The result should contain only the matching rows from both frames:
df3
A B C
2 8 2
2 8 9
You can concatenate them and drop the ones that are not duplicated:
conc = pd.concat([df1, df2])
conc[conc.duplicated(subset=['A', 'B'], keep=False)]
Out:
A B C
1 2 8 2
1 2 8 9
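End to end, with the frames from the question:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 4], 'B': [2, 8, 7], 'C': [3, 2, 9]})
df2 = pd.DataFrame({'A': [5, 2], 'B': [6, 8], 'C': [7, 9]})

# stack both frames, then keep every row whose (A, B) pair occurs
# more than once, i.e. appears in both inputs
conc = pd.concat([df1, df2])
both = conc[conc.duplicated(subset=['A', 'B'], keep=False)]
print(both)
```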
If you have duplicates,
df1
Out:
A B C
0 1 2 3
1 2 8 2
2 4 7 9
3 4 7 9
4 2 8 5
df2
Out:
A B C
0 5 6 7
1 2 8 9
3 5 6 4
4 2 8 10
You can keep track of the duplicated ones via boolean arrays (note to_dict('list'); the single-letter orient 'l' was removed in pandas 2.0):
cols = ['A', 'B']
bool1 = df1[cols].isin(df2[cols].to_dict('list')).all(axis=1)
bool2 = df2[cols].isin(df1[cols].to_dict('list')).all(axis=1)
pd.concat([df1[bool1], df2[bool2]])
Out:
A B C
1 2 8 2
4 2 8 5
1 2 8 9
4 2 8 10
Solution with Index.intersection: select the matching rows in both DataFrames with loc, and finally concat them together:
df1.set_index(['A','B'], inplace=True)
df2.set_index(['A','B'], inplace=True)
idx = df1.index.intersection(df2.index)
print (idx)
MultiIndex(levels=[[2], [8]],
labels=[[0], [0]],
names=['A', 'B'],
sortorder=0)
df = pd.concat([df1.loc[idx],df2.loc[idx]]).reset_index()
print (df)
A B C
0 2 8 2
1 2 8 9
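A self-contained version of this approach (newer pandas prints the MultiIndex with codes= rather than labels=, but the logic is unchanged):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 4], 'B': [2, 8, 7], 'C': [3, 2, 9]})
df2 = pd.DataFrame({'A': [5, 2], 'B': [6, 8], 'C': [7, 9]})

# index both frames by the join keys, intersect the indexes,
# then pull the matching rows from each side and stack them
df1i = df1.set_index(['A', 'B'])
df2i = df2.set_index(['A', 'B'])
idx = df1i.index.intersection(df2i.index)

df = pd.concat([df1i.loc[idx], df2i.loc[idx]]).reset_index()
print(df)
```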
Here is a less efficient method that preserves duplicates, but it involves two merge/join operations.
# create a merged DataFrame with variables C_x and C_y with the C values
temp = pd.merge(df1, df2, how='inner', on=['A', 'B'])
# join columns A and B to a stacked DataFrame with the Cs on index
temp[['A', 'B']].join(
pd.DataFrame({'C':temp[['C_x', 'C_y']].stack()
.reset_index(level=1, drop=True)})).reset_index(drop=True)
This returns
A B C
0 2 8 2
1 2 8 9
I have a dataframe that looks like this:
In [4]:
import pandas as pd
df = pd.DataFrame( {'a':['A','A','B','B','B','C'], 'b':[1,2,5,5,4,6]})
df
Out[4]:
a b
0 A 1
1 A 2
2 B 5
3 B 5
4 B 4
5 C 6
I just want to group rows which have the same value in column a. The desired output is like this:
df
Out[4]:
a b
0 A 1
2
1 B 5
5
4
2 C 6
EDIT:
Sorry, actually the desired output should look like this:
df
Out[4]:
b
A 1
2
B 5
5
4
C 6
I think you are looking for set_index rather than groupby:
In [11]: df.set_index('a')
Out[11]:
b
a
A 1
A 2
B 5
B 5
B 4
C 6
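set_index still prints the repeated labels, though. If you also want the blank repeats shown in the desired output, one purely cosmetic trick (my addition, not part of the answer above) is to blank out duplicated index labels with Index.where and Index.duplicated; note this turns the index into strings and is for display only:

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

out = df.set_index('a')
# cosmetic only: replace a label with '' when it has already
# appeared above, mimicking the desired output in the question
out.index = out.index.where(~out.index.duplicated(), '')
print(out)
```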