How to join two dataframes in Pandas? [duplicate]

This question already has answers here:
Pandas: how to merge two dataframes on a column by keeping the information of the first one?
(4 answers)
Closed 3 years ago.
I have two dataframes.
The first dataframe is A.
And the second dataframe is B.
Both dataframes have an AdId field. The first dataframe has one unique AdId per row, but the second has multiple rows per AdId. I want to bring all of the first dataframe's information for each AdId into the second dataframe.
I am expecting the output as follows
I have tried the following code
B.join(A, on='AdId', how='left', lsuffix='_caller')
But this does not give the expected output.

Use pandas concat (note that it aligns rows on the index, not on a column):
result = pd.concat([A, B], axis=1, sort=False)
More on merging with pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#set-logic-on-the-other-axes
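Alternatively, since AdId is a column (not the index) in both frames, a left merge keeps every row of B and pulls in A's columns. A minimal sketch with made-up data:

```python
import pandas as pd

# Invented sample data mirroring the question's setup: A has one row
# per AdId, B repeats AdIds.
A = pd.DataFrame({'AdId': [1, 2], 'Title': ['ad one', 'ad two']})
B = pd.DataFrame({'AdId': [1, 1, 2], 'Clicks': [10, 20, 30]})

# A left merge keeps every row of B and adds the matching A columns.
result = B.merge(A, on='AdId', how='left')
```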

Merging Pandas DFs and overwriting NaN [duplicate]

This question already has answers here:
How to remove nan value while combining two column in Panda Data frame?
(5 answers)
Closed 1 year ago.
I have two DFs that I am trying to merge on the column 'conId'. The DFs have different numbers of rows, and the only other overlapping column is 'delta'.
I am using pf.merge(greek, on='conId', how='left')
The resulting DF gives me the columns 'delta_x' and 'delta_y'.
How can I merge these two columns into one?
Thank you!
You can use
df['delta_x'] = df['delta_x'].fillna(df['delta_y'])
then drop the extra column if you want:
df = df.drop(['delta_y'], axis=1)
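Putting it together as a runnable sketch (the contents of pf and greek are invented for illustration):

```python
import pandas as pd

# Toy frames standing in for pf and greek from the question;
# both carry a 'delta' column, so the merge produces delta_x / delta_y.
pf = pd.DataFrame({'conId': [1, 2, 3], 'delta': [0.5, None, 0.2]})
greek = pd.DataFrame({'conId': [1, 2], 'delta': [0.5, 0.7]})

df = pf.merge(greek, on='conId', how='left')

# Coalesce the two suffixed columns back into one and drop the leftovers.
df['delta'] = df['delta_x'].fillna(df['delta_y'])
df = df.drop(['delta_x', 'delta_y'], axis=1)
```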

Panda's MERGE on customerEmail column having duplicates [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
The aim is to detect fraud in this dataset.
I have two dataframes with columns as:
DF1[customerEmail, customerphone, customerdevice,customeripadd,NoOftransactions,Fraud] etc (168,11)
DF2[customerEmail,transactionid, payment methods,orderstatus] etc (623,11)
The customerEmail column is common in both the dataframes so it makes sense to merge tables on customerEmail.
The problem is that I have repeating customerEmail values in DF2, some with no match in DF1. So when I merge using
DF3 = pd.merge(DF1, DF2, on='customerEmail')
the result has shape (819, 18), with the repeated email IDs producing misleading data.
I want the match to be driven by the customerEmail values from DF1, so my final dataframe DF3 should end up roughly the same size as DF1.
Here's a link to the data for you to look at. Cheers
https://www.kaggle.com/aryanrastogi7767/ecommerce-fraud-data
Try changing the how parameter to 'left'.
For example:
DF3 = DF1.merge(DF2, how='left', on='customerEmail')
Failing this, we probably need some more information.
Maybe you should consider a different value for the option "how". By default it is "inner", which drops all rows without a match in the other frame.
Maybe the option "right" would help you, as then DF2 is the reference and DF1 is joined to DF2.
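If one row per customer is really what you want, deduplicating DF2 on customerEmail before a left merge keeps DF3 the same length as DF1. A hedged sketch with toy stand-ins for DF1/DF2 (the real columns are in the linked Kaggle set):

```python
import pandas as pd

# Invented miniature versions of DF1 and DF2 from the question.
DF1 = pd.DataFrame({'customerEmail': ['a@x.com', 'b@x.com'],
                    'Fraud': [True, False]})
DF2 = pd.DataFrame({'customerEmail': ['a@x.com', 'a@x.com', 'c@x.com'],
                    'orderStatus': ['paid', 'failed', 'paid']})

# Keep one row per email in DF2 so the merge cannot multiply DF1's rows.
DF2_unique = DF2.drop_duplicates(subset='customerEmail', keep='first')
DF3 = DF1.merge(DF2_unique, on='customerEmail', how='left')
```

Whether keeping the first transaction per email is acceptable depends on the fraud logic; aggregating DF2 per email first is another option.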

How to remove duplicates when using pandas concat to combine two dataframes [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 3 years ago.
I have two dataframes:
df1 with columns: id,x1,x2,x3,x4,....xn
df2 with columns: id,y.
df3 = pd.concat([df1, df2], axis=1)
When I use pandas concat to combine them, the columns became
id,y,id,x1,x2,x3,...,xn.
There are two id columns here. How can I get rid of one?
I have tried:
df3 = pd.concat([df1, df2], axis=1).drop_duplicates().reset_index(drop=True)
but it did not work.
DataFrames are concatenated on the index. Make sure that id is the index before concatenating:
df3 = pd.concat([df1.set_index('id'),
                 df2.set_index('id')], axis=1).reset_index()
Or, better yet, use join (here too df2 needs id as its index, since join aligns the on column with the other frame's index):
df3 = df1.join(df2.set_index('id'), on='id')
drop_duplicates() only removes rows that are completely identical.
What you're looking for is pd.merge():
pd.merge(df1, df2, on='id')
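For completeness, a tiny runnable version of the merge fix (data invented): aligning on id via merge leaves a single id column.

```python
import pandas as pd

# Invented sample data matching the question's shape.
df1 = pd.DataFrame({'id': [1, 2, 3], 'x1': [10, 20, 30]})
df2 = pd.DataFrame({'id': [1, 2, 3], 'y': ['a', 'b', 'c']})

# Merging on id keeps one id column instead of two.
df3 = pd.merge(df1, df2, on='id')
```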

Pandas - Comparing two Dataframe and finding difference [duplicate]

This question already has answers here:
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 4 years ago.
I have two Dataframes with some sales data as below:
df1:
prod_id,sale_date,new
101,2019-01-01,101_2019-01-01
101,2019-01-02,101_2019-01-02
101,2019-01-03,101_2019-01-03
101,2019-01-04,101_2019-01-04
df2:
prod_id,sale_date,new
101,2019-01-01,101_2019-01-01
101,2019-01-04,101_2019-01-04
I am trying to compare the above two Dataframes to find dates which are missing in df2 as compared to df1.
I have tried to do the below:
final_1 = df1.merge(df2, on='new', how='outer')
This returns back the below Dataframe:
prod_id_x,sale_date_x,new,prod_id_y,sale_date_y
101,2019-01-01,101_2019-01-01,,
101,2019-01-02,101_2019-01-02,,
101,2019-01-03,101_2019-01-03,,
101,2019-01-04,101_2019-01-04,,
,,101_2019-01-01,101,2019-01-01
,,101_2019-01-04,101,2019-01-04
This does not let me compare these two Dataframes.
Expected Output:
prod_id_x,sale_date_x,new
101,2019-01-02,101_2019-01-02
101,2019-01-03,101_2019-01-03
You can use drop_duplicates with keep=False, which drops every row that occurs more than once across the two frames:
pd.concat([df1,df2]).drop_duplicates(keep=False)
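Reproducing with the question's data as a quick sketch:

```python
import pandas as pd

# df1 holds all dates, df2 only a subset, as in the question.
df1 = pd.DataFrame({'prod_id': [101] * 4,
                    'sale_date': ['2019-01-01', '2019-01-02',
                                  '2019-01-03', '2019-01-04']})
df1['new'] = df1['prod_id'].astype(str) + '_' + df1['sale_date']
df2 = df1[df1['sale_date'].isin(['2019-01-01', '2019-01-04'])]

# keep=False drops *both* copies of any duplicated row, leaving only
# the rows present in df1 but not in df2.
missing = pd.concat([df1, df2]).drop_duplicates(keep=False)
```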

Print sample set of columns from dataframe in Pandas? [duplicate]

This question already has answers here:
Selecting multiple columns in a Pandas dataframe
(22 answers)
Closed 5 years ago.
How do you print (in the terminal) a subset of columns from a pandas dataframe?
I don't want to remove any columns from the dataframe; I just want to see a few columns in the terminal to get an idea of how the data is pulling through.
Right now, I have print(df2.head(10)), which prints the first 10 rows of the dataframe, but how do I choose a few columns to print? Can you choose columns by their index number and/or name?
print(df2[['col1', 'col2', 'col3']].head(10)) will select the top 10 rows from columns 'col1', 'col2', and 'col3' from the dataframe without modifying the dataframe.
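For positional selection, iloc takes row and column positions. A small sketch with invented column names showing both styles side by side:

```python
import pandas as pd

# Small example frame; the names are illustrative only.
df2 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4],
                    'col3': [5, 6], 'col4': [7, 8]})

# By name:
print(df2[['col1', 'col3']].head(10))

# By position, using iloc (row slice first, then column positions):
print(df2.iloc[:10, [0, 2]])
```

Both print the same subset; neither modifies df2.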
