I have two DataFrames, df1 and df2. In my code I used the pandas.concat method to find the differences between them.
import pandas as pd

# Read the first and second sheets of the spreadsheet.
df1 = pd.read_excel(latest_file, sheet_name=0)
df2 = pd.read_excel(latest_file, sheet_name=1)
new_dataframe = pd.concat([df1, df2]).drop_duplicates(keep=False)
This works perfectly; however, I want to know which rows come from df1 and which come from df2. To show this, I want to add a column to new_dataframe that says "Removed" if the row is from df1 and "Added" if it is from df2. I can't seem to find any documentation on how to do this. Thanks in advance for any help.
Edit: My current code removes all rows that are identical in both DataFrames. The solution must still remove these common rows.
Consider using pd.merge with indicator=True instead. This creates a new column named _merge that indicates which DataFrame each row came from ('left_only', 'right_only' or 'both'). You can then drop the 'both' rows and map the other two values to 'Removed' and 'Added':
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'col1': [3, 4, 5, 6, 7]})
m = {'left_only': 'Removed', 'right_only': 'Added'}
new_dataframe = (pd.merge(df1, df2, how='outer', indicator=True)
                 .query('_merge != "both"')   # drop rows present in both frames
                 .replace({'_merge': m}))     # relabel the remaining rows
Output:
   col1   _merge
0     1  Removed
1     2  Removed
5     6    Added
6     7    Added
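Alternatively, if you prefer to stay with the concat approach from the question, you can tag each frame with a source column before concatenating and then drop duplicates using only the original data columns. This is a sketch; the column name status is a hypothetical choice. Like the original concat/drop_duplicates version, it also drops any row that appears more than once within df1 (or within df2) itself.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'col1': [3, 4, 5, 6, 7]})

# Compare rows on the original data columns only, not on the new tag column
data_cols = list(df1.columns)

new_dataframe = (
    # 'status' is a hypothetical column name used to record each row's source
    pd.concat([df1.assign(status='Removed'), df2.assign(status='Added')])
    .drop_duplicates(subset=data_cols, keep=False)  # remove rows common to both
)
print(new_dataframe)
```

The rows 3, 4 and 5 appear in both frames, so they are removed, leaving 1 and 2 tagged Removed and 6 and 7 tagged Added.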
I have two dataframes, df1 and df2.
df1 contains integers and df2 contains booleans.
df1 and df2 are exactly the same size (e.g. both are 10x10).
I would like to create a df3 that takes the value from df1 only where the value at the same position in df2 is True. Every position where df2 is False would be NaN in df3.
Thanks in advance!
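DataFrame.where does this directly: it keeps values where the condition is True and puts NaN elsewhere. A minimal sketch with small hypothetical 3x3 frames standing in for the 10x10 ones; it assumes df1 and df2 share the same index and column labels, because where aligns the condition by label (if the labels differ, pass df2.to_numpy() instead).

```python
import pandas as pd

# Hypothetical 3x3 inputs; the question's frames are 10x10
df1 = pd.DataFrame([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])
df2 = pd.DataFrame([[True, False, True],
                    [False, True, False],
                    [True, True, False]])

# Keep df1's value where df2 is True, NaN where df2 is False.
# The integer columns become float because NaN is a float value.
df3 = df1.where(df2)
print(df3)
```

df1.mask(~df2) gives the same result, phrased the other way round.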
I have two DataFrames, df1 and df2, where df1 has 9 columns and df2 has 8 columns. I want to replace the first 8 columns of df1 with those of df2. How can this be done? I tried with iloc but did not succeed.
The files are:
https://www.filehosting.org/file/details/842516/tpkA0t2vAtkrqKTb/df1.csv for df1
https://www.filehosting.org/file/details/842517/8XpizwCAX79p9rrZ/df2.csv for df2
import pandas as pd
df1=pd.DataFrame({0:[1,1,1,0,0,0],1:[0,1,0,0,0,0],2:[1,1,1,0,0,0],3:[0,0,0,2,3,4],4:[0,0,0,0,1,0],5:[0,0,0,2,1,2]})
df2=pd.DataFrame({6:[2,2,2,0,0,0],7:[0,2,0,0,0,0],8:[2,2,2,0,0,0],'d':[0,0,0,2,3,4],'e':[0,0,0,0,1,0],'f':[0,0,0,2,1,2]})
z = pd.concat([df1.iloc[:, 3:], df2.iloc[:, 0:3]], axis=1)
Here I have concatenated the columns of the first DataFrame from position 3 (the fourth column, since iloc is zero-based) to the end with the first three columns of the second DataFrame. In the same way, you can concatenate whichever rows or columns you need.
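Applied to the original question (replace the first 8 of df1's 9 columns with df2's 8 columns), two variants are sketched below. The linked CSVs are not reproduced here, so the frames are hypothetical stand-ins; both variants assume df1 and df2 have the same number of rows in the same order.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the linked CSVs: df1 has 9 columns, df2 has 8
df1 = pd.DataFrame(np.zeros((3, 9), dtype=int), columns=[f"a{i}" for i in range(9)])
df2 = pd.DataFrame(np.ones((3, 8), dtype=int), columns=[f"b{i}" for i in range(8)])

# Variant 1: overwrite the values of df1's first 8 columns by position,
# keeping df1's column names. .to_numpy() avoids label alignment.
df1_replaced = df1.copy()
df1_replaced.iloc[:, :8] = df2.to_numpy()

# Variant 2: build a new frame from df2's 8 columns plus df1's last column,
# so the replaced part keeps df2's column names (rows align on the index).
df1_concat = pd.concat([df2, df1.iloc[:, 8:]], axis=1)
```

The first variant is closer to "replacing" in place; the second is the same concat idea as the answer above.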
I'm iterating over two separate dataframes, where one dataframe is a subset of the other. I need to ensure that only the columns in the set (df1) which are not contained in the subset (df2) pass the conditional statement.
As written, each iteration compares a Series from df1 against the whole DataFrame df2. Ideally I would like to compare just the labels of each column, not the values contained in the columns. My code is below. Any help would be greatly appreciated!
for i in df1:
    for j in df2:
        if df1[i] not in df2:
            ...do some stuff between df1[i] and df2[j]
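To check element-wise whether the values of df1 are in df2 (when passed a DataFrame, isin matches on both the index and the column labels), you can use:
df1.isin(df2)
To find all values in df1 that are not in df2, you can use:
df1[~df1.isin(df2)]
Values that appear in both df1 and df2 will show up as NaN in this case.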
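Since the question asks about column labels rather than values, here is a sketch of a label-only comparison, using hypothetical frames in which df2's columns are a subset of df1's. Iterating over a DataFrame yields its column labels, and `label in df2` tests membership among df2's column labels, not its values.

```python
import pandas as pd

# Hypothetical frames: df2's columns are a subset of df1's
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

processed = []
for label in df1:              # yields column labels: "a", "b", "c"
    if label not in df2:       # checks df2's column labels only
        processed.append(label)  # ...do some stuff with df1[label] here

# Set-style alternative: labels in df1 but not in df2 (returned sorted)
only_in_df1 = df1.columns.difference(df2.columns)
print(processed, list(only_in_df1))
```

`label not in df2.columns` is an equivalent, more explicit spelling of the membership test.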
I am trying to join two DataFrames on the same index (the first column in both DataFrames) using Python. The code is below:
combined_data = pd.merge(df1, df2, right_index=True, left_index=True)
df1 has columns:
colA, colB
And df2 has:
colA, colC, colD, colE
The output is:
colA, colB, colC, colD, colE
with no data below it; it just gives the joined column headers.
NOTE: df1 has about 4800 rows and df2 has 4600 rows.
Could the size of the data be a problem, or is something else wrong?
The problem was that the common column had a different data type in each of the two DataFrames.
This can be resolved by casting both to the same type:
df1['colA'] = df1['colA'].astype(int)
df2['colA'] = df2['colA'].astype(int)  # ensure both are int
After this, the code works like a charm!
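A minimal reproduction, assuming (hypothetically) that colA holds integers in df1 but numeric strings in df2 (for example, after reading one file with a different parser) and has been set as the index of both frames: the index join finds no matching keys, and casting both to the same integer type restores the rows.

```python
import pandas as pd

# Hypothetical data: the same keys, stored as int in df1 but as str in df2
df1 = pd.DataFrame({"colA": [1, 2, 3], "colB": ["x", "y", "z"]}).set_index("colA")
df2 = pd.DataFrame({"colA": ["1", "2", "3"], "colC": [10, 20, 30]}).set_index("colA")

print(df1.index.dtype, df2.index.dtype)  # inspect the key dtypes first

# Joining on the index: 1 (int) never equals "1" (str), so no rows match
before = pd.merge(df1, df2, left_index=True, right_index=True)

# Cast both keys to the same type, then join again
df1.index = df1.index.astype("int64")
df2.index = df2.index.astype("int64")
after = pd.merge(df1, df2, left_index=True, right_index=True)
print(len(before), len(after))
```

Checking `df.dtypes` (or `df.index.dtype`) on both frames before merging is a quick way to catch this kind of silent mismatch.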