I have two DataFrame objects:
df1: columns = [a, b, c]
df2: columns = [d, e]
I want to merge df1 with df2 using the equivalent of sql in pandas:
select * from df1 inner join df2 on df1.b=df2.e and df1.b <> df2.d and df1.c = 0
The following sequence of steps should get you there:
df1 = df[df1.c==0]
merged = df1.merge(df2, left_on='b', right_on='e')
merge = merged[merged.b != merged.d]
Related
I want to combine two dataframes:
df1=pd.DataFrame({'A':['a','a',],'B':['b','b']})
df2=pd.DataFrame({'B':['b','b'],'A':['a','a']})
pd.concat([df1,df2],ignore_index=True)
result:
But I want the output to be like this (I want the same code as SQL's union/union all):
Another way is to use numpy to stack the two dataframes and then use pd.DataFrame constructor:
pd.DataFrame(np.vstack([df1.values,df2.values]), columns = df1.columns)
Output:
A B
0 a b
1 a b
2 b a
3 b a
Here is a proposition to do an SQL UNION ALL with pandas by using pandas.concat :
list_dfs = [df1, df2]
out = (
pd.concat([pd.DataFrame(sub_df.to_numpy()) for sub_df in list_dfs],
ignore_index=True)
.set_axis(df1.columns, axis=1)
)
# Output :
print(out)
A B
0 a b
1 a b
2 b a
3 b a
I want to merge two dataframes like:
df1.columns = A, B, C, E, ..., D
df2.columns = A, B, C, F, ..., D
If I merge them, it merges on all columns. Also since the number of columns is high I don't want to specify them in on. I prefer to exclude the columns which I don't want to be merged. How can I do that?
mdf = pd.merge(df1, df2, exclude D)
I expect the result be like:
mdf.columns = A, B, C, E, F ..., D_x, D_y
You mentioned you mentioned you don't want to use on "since the number of columns is much".
You could still use on this way even if there are a lot of columns:
mdf = pd.merge(df1, df2, on=[i for i in df1.columns if i != 'D'])
Or
By using pd.Index.difference
mdf = pd.merge(df1, df2, on=df1.columns.difference(['D']).tolist())
Another solution can be:
mdf = pd.merge(df1, df2, on= df1.columns.tolist().remove('D')
What about dropping the unwanted column after the merge?
You can use pandas.DataFrame.drop:
mdf = pd.merge(df1, df2).drop('D', axis=1)
or dropping before the merge:
mdf = pd.merge(df1.drop('D', axis=1), df2.drop('D', axis=1))
One solution is using intersection and then difference on df1 and df2 columns:
mdf = pd.merge(df1, df2, on=df1.columns.intersection(df2.columns).difference(['D']).tolist())
The other solution could be renaming columns you want to exclude from merge:
df2.rename(columns={"D":"D_y"}, inplace=True)
mdf = pd.merge(df1, df2)
I have a:
Dataframe df1 with columns A, B and C. A is the index.
Dataframe df2 with columns D, E and F. D is the index.
What’s an efficient way to drop from df1 all rows where B is not found in df2 (in D the index)?
If need drop some not exist values it is same like select only existing values. So is possible use:
You can filter df1.B by index from df2 in Series.isin:
df3 = df1[df1.B.isin(df2.index)]
Or by DataFrame.merge with left join:
df3 = df1.merge(df2[[]], left_on='B', right_index=True, how='left')
df1 has columns A, B, C, D, E
df2 has columns A, B, D
How to concatenate them in order to have a resulting dataframe that has rows of df1 and df2, values of A, B and D will be extended from df2 on df1, and columns C and E will be filled with NaN because df2 has no data for them?
There is a function called concat
pd.concat([df1,df2])
The input must be a iterable, so put them into a list ;)
I have a list of data frame where I want to loop all the data frame and create the dataframe by append the column by filter the columns with the list provided. below is the code
#df_ls contains list of dataframes
df_ls = [A, B, C, D]
for j in df_ls:
match_ls = ['card', 'brave', 'wellness']
for i in text:
if i in j.columns:
print(i)
df1 = j[i]
df2 = df1
df_full = df2.append(df1)
Need the Result new dataframe with single column contains all the match_ls string value.
A
card banner
rex 23
fex 45
jex 66
B
brave laminate
max ste
vax pre
jox lex
expected output
rex
fex
jex
max
vax
jox
Use list comprehension with filter columns names by Index.intersection and last use concat for join together:
df_ls = [A, B, C, D]
match_ls = ['card', 'brave', 'wellness']
dfs = [j[j.columns.intersection(match_ls)].stack().reset_index(drop=True) for j in df_ls]
df_full = pd.concat(dfs, ignore_index=True)
Loop version:
dfs = []
for j in df_ls:
df = j[j.columns.intersection(match_ls)].stack().reset_index(drop=True)
print (df)
dfs.append(df)
df_full = pd.concat(dfs, ignore_index=True)
print (df_full)