This question already has answers here:
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 4 years ago.
I have two Dataframes with some sales data as below:
df1:
prod_id,sale_date,new
101,2019-01-01,101_2019-01-01
101,2019-01-02,101_2019-01-02
101,2019-01-03,101_2019-01-03
101,2019-01-04,101_2019-01-04
df2:
prod_id,sale_date
101,2019-01-01,101_2019-01-01
101,2019-01-04,101_2019-01-04
I am trying to compare the above two Dataframe to find dates which are missing in df2 as compared to df1
I have tried to do the below:
final_1 = df1.merge(df2, on='new', how='outer')
This returns back the below Dataframe:
prod_id_x,sale_date_x,new,prod_id_y,sale_date_y
101,2019-01-01,101_2019-01-01,,
101,2019-01-02,101_2019-01-01,,
101,2019-01-03,101_2019-01-01,,
101,2019-01-04,101_2019-01-01,,
,,101_2019-01-01,101,2019-01-01
,,101_2019-01-04,101,2019-01-04
This is not letting me compare these 2 Dataframe.
Expected Output:
prod_id_x,sale_date_x,new
101,2019-01-02,101_2019-01-02
101,2019-01-03,101_2019-01-03
You can use drop_duplicates
pd.concat([df1,df2]).drop_duplicates(keep=False)
Related
This question already has answers here:
How to remove nan value while combining two column in Panda Data frame?
(5 answers)
Closed 1 year ago.
I have two DFs that I am trying to merge on the column 'conId'.The DFs have different number of rows and the only other overlapping column is 'delta'.
I am using pf.merge(greek,on='conId',how='left')
The resulting DF is giving me columns 'delta_x' and 'delta_y'
how can I merge these two columns into one column?
Thank you!
You can use
df['delta_x'] = df['detlt_x'].fillna(df['delta_y'])
then drop column if you want
df.drop(['delta_y'], axis=1)
This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 2 years ago.
I have dataframe: table_revenue
how can I transpose the dataframe and have grouping by 'stations_id' to see final result as:
where values of cells is the price, aggregated by exact date (column) for specific 'station_id' (row)
It seems you need pivot_table():
output = input.pivot_table(index='station_id',columns='endAt',values='price',aggfunc='sum',fill_value=0)
This question already has answers here:
Pandas: how to merge two dataframes on a column by keeping the information of the first one?
(4 answers)
Closed 3 years ago.
I have two dataframes.
The first dataframe is A.
And the second dataframe is B.
Basically both dataframes have AdId fields. First dataframe has unique AdIds per row but the second dataframe has multiple instances of a single AdId. I want to get all the information of that AdId to the second dataframe.
I am expecting the output as follows
I have tried the following code
B.join(A, on='AdId', how='left', lsuffix='_caller')
But this does not give the expected output.
Use pandas concat:
result = pd.concat([df1, df4], axis=1, sort=False)
More on merging with pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#set-logic-on-the-other-axes
This question already has answers here:
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
Closed 3 years ago.
I want to merge duplicate rows by adding a new column 'count'
Final dataframe that I want
rows can be in any order
You can use:
df["count"] = 1
df = df.groupby(["user_id", "item_id", "total"])["count"].count().reset_index()
This question already has answers here:
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
pandas get rows which are NOT in other dataframe
(17 answers)
Pandas Merging 101
(8 answers)
Closed 4 years ago.
I've 2 dataframes, df1 and df2, with an emails column (and other non important ones.)
I want to drop rows in df2 that contain emails that are already in df1.
How can I do that?
You can do something like this:
df_1[~df_1['email_column'].isin(df_2['email_column'].tolist())