This question already has answers here: Pandas Merging 101 (8 answers). Closed 5 months ago.
[screenshots of df, df1, and dfsum omitted]
Using the 'code' column of df, I want to look up matching rows in df1 and return the 'title' and 'cu' values into dfsum.
If both dataframes have the same shape, you can iterate over them just like a regular matrix:
# assumes both dataframes share the same shape
total_rows, total_columns = df.shape
# go through the rows
for row in range(total_rows):
    # go through the columns
    for column in range(total_columns):
        # make the condition if they match
        if df.iloc[row, column] == df1.iloc[row, column]:
            # now just assign the value from df1 to df
            df.iloc[row, column] = df1.iloc[row, column]
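If the two dataframes do not line up positionally, a merge on the 'code' key avoids the loop entirely. This is only a sketch: it assumes both dataframes really have a 'code' column and that 'title' and 'cu' live in df1.
# sketch: pull 'title' and 'cu' from df1 wherever 'code' matches
dfsum = df.merge(df1[['code', 'title', 'cu']], on='code', how='left')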
I hope this solves your issue :)
This question already has answers here: Pandas: how to merge two dataframes on a column by keeping the information of the first one? (4 answers). Closed 3 years ago.
I have two dataframes: the first is A and the second is B. Both have an AdId field. The first dataframe has one unique AdId per row, but the second has multiple instances of a single AdId. I want to bring all of the information for each AdId into the second dataframe.
I am expecting the output as follows: [screenshot omitted]
I have tried the following code:
B.join(A, on='AdId', how='left', lsuffix='_caller')
But this does not give the expected output.
Use pandas concat (the df1/df4 names below come from the pandas docs example; substitute your own frames):
result = pd.concat([df1, df4], axis=1, sort=False)
More on merging with pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#set-logic-on-the-other-axes
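Note that concat aligns rows on the index, not on a key column. If the goal is to match rows by AdId, a left merge may be closer to what is wanted; a sketch, assuming AdId is a regular column in both frames:
# sketch: attach A's columns to every matching AdId row in B
result = B.merge(A, on='AdId', how='left')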
This question already has answers here: Get statistics for each group (such as count, mean, etc) using pandas GroupBy? (9 answers). Closed 3 years ago.
I want to merge duplicate rows by adding a new column, 'count'.
Final dataframe that I want (rows can be in any order): [screenshot omitted]
You can use:
# seed a count column, then count the rows in each (user_id, item_id, total) group
df["count"] = 1
df = df.groupby(["user_id", "item_id", "total"])["count"].count().reset_index()
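A more idiomatic equivalent (a sketch, assuming the same column names) skips the seed column and names the result directly with size():
# size() counts the rows in each group in one step
df = df.groupby(["user_id", "item_id", "total"]).size().reset_index(name="count")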
This question already has answers here: How to filter Pandas dataframe using 'in' and 'not in' like in SQL (11 answers). Closed 4 years ago.
I have two dataframes with some sales data, as below:
df1:
prod_id,sale_date,new
101,2019-01-01,101_2019-01-01
101,2019-01-02,101_2019-01-02
101,2019-01-03,101_2019-01-03
101,2019-01-04,101_2019-01-04
df2:
prod_id,sale_date,new
101,2019-01-01,101_2019-01-01
101,2019-01-04,101_2019-01-04
I am trying to compare the two dataframes above to find the dates that are missing from df2 compared to df1.
I have tried the following:
final_1 = df1.merge(df2, on='new', how='outer')
This returns the below dataframe:
prod_id_x,sale_date_x,new,prod_id_y,sale_date_y
101,2019-01-01,101_2019-01-01,101,2019-01-01
101,2019-01-02,101_2019-01-02,,
101,2019-01-03,101_2019-01-03,,
101,2019-01-04,101_2019-01-04,101,2019-01-04
This does not let me pick out the missing dates easily.
Expected Output:
prod_id_x,sale_date_x,new
101,2019-01-02,101_2019-01-02
101,2019-01-03,101_2019-01-03
You can use drop_duplicates:
pd.concat([df1, df2]).drop_duplicates(keep=False)
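An alternative that reads like SQL's NOT IN (a sketch, assuming the 'new' column uniquely identifies each row):
# keep the df1 rows whose 'new' key never appears in df2
missing = df1[~df1['new'].isin(df2['new'])]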
This question already has answers here: Selecting multiple columns in a Pandas dataframe (22 answers). Closed 5 years ago.
How do you print (in the terminal) a subset of columns from a pandas dataframe?
I don't want to remove any columns from the dataframe; I just want to see a few columns in the terminal to get an idea of how the data is pulling through.
Right now, I have print(df2.head(10)), which prints the first 10 rows of the dataframe, but how do I choose a few columns to print? Can you choose columns by their index number and/or name?
print(df2[['col1', 'col2', 'col3']].head(10)) will print the first 10 rows of columns 'col1', 'col2', and 'col3' without modifying the dataframe.
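To answer the by-index part: columns can also be chosen by position with .iloc. A sketch, assuming df2 has columns at these positions:
# first 10 rows of the columns at positions 0, 2, and 5
print(df2.iloc[:10, [0, 2, 5]])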