Selecting rows from another boolean dataframe in Python

I have two dataframes, df1 and df2.
df1 contains integers and df2 contains booleans.
df1 and df2 are exactly the same size (e.g. both are 10x10).
I would like to create a df3 that takes the data from df1 only where the value at the same location in df2 is True. Every position where df2 is False should be NaN in df3.
Thanks in advance!
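A minimal sketch, assuming df1 and df2 share the same index and column labels: DataFrame.where keeps each value where the condition is True and fills NaN everywhere else. The 3x3 frames below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up 3x3 stand-ins for the 10x10 frames in the question.
df1 = pd.DataFrame(np.arange(9).reshape(3, 3))   # integers 0..8
df2 = df1 % 2 == 0                                # boolean frame of the same shape

# Keep df1's value wherever df2 is True; every other cell becomes NaN.
df3 = df1.where(df2)

# Indexing with a boolean DataFrame does the same thing.
df3_alt = df1[df2]
print(df3)
```

Because NaN is a float, the integer columns of df3 come out as float64. where aligns the condition on labels; if the two frames line up by position but have different labels, passing df2.to_numpy() as the condition avoids that alignment.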

Related

How to merge two different dataframe with different columns

As someone who is super new to merge/append in Python, I am trying to merge two different DFs together.
DF1 has 2 columns (Text and ID) and 100 rows.
DF2 has 3 columns (Text, ID and Match) and 20 rows.
My goal is to combine the two DFs so that the "Match" column from DF2 is merged into DF1.
The Match column contains only True values, so after the merge the other 80 rows of DF1 can be NaN in that column, and I can fix them later.
Thank you to everyone for the help and support!
Try a left merge using .merge(), like this:
DF_out = DF1.merge(DF2, on=['Text', 'ID'], how='left')
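A minimal sketch with made-up miniature versions of DF1 and DF2. validate="m:1" is a real merge option that makes pandas raise an error if DF2 contains duplicate (Text, ID) pairs, which would otherwise silently duplicate DF1 rows. The final notna() step is one way to "fix it later", and it only works because Match is always True in DF2.

```python
import pandas as pd

# Made-up stand-ins for DF1 (Text, ID) and DF2 (Text, ID, Match).
DF1 = pd.DataFrame({"Text": ["a", "b", "c", "d"], "ID": [1, 2, 3, 4]})
DF2 = pd.DataFrame({"Text": ["b", "d"], "ID": [2, 4], "Match": [True, True]})

# Left merge keeps every row of DF1; rows with no partner in DF2 get NaN in Match.
# validate="m:1" raises MergeError if (Text, ID) is not unique in DF2.
DF_out = DF1.merge(DF2, on=["Text", "ID"], how="left", validate="m:1")

# Match is always True in DF2, so "matched" is the same as "not NaN".
DF_out["Match"] = DF_out["Match"].notna()
print(DF_out)
```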

Copying dataframes columns into another dataframe

I have two dataframes, df1 and df2, where df1 has 9 columns and df2 has 8 columns. I want to replace the first 8 columns of df1 with those of df2. How can this be done? I tried with iloc but was not able to get it to work.
Following are the files:
https://www.filehosting.org/file/details/842516/tpkA0t2vAtkrqKTb/df1.csv for df1
https://www.filehosting.org/file/details/842517/8XpizwCAX79p9rrZ/df2.csv for df2
import pandas as pd
df1 = pd.DataFrame({0:[1,1,1,0,0,0],1:[0,1,0,0,0,0],2:[1,1,1,0,0,0],3:[0,0,0,2,3,4],4:[0,0,0,0,1,0],5:[0,0,0,2,1,2]})
df2 = pd.DataFrame({6:[2,2,2,0,0,0],7:[0,2,0,0,0,0],8:[2,2,2,0,0,0],'d':[0,0,0,2,3,4],'e':[0,0,0,0,1,0],'f':[0,0,0,2,1,2]})
# columns at positions 3..5 of df1, side by side with positions 0..2 of df2
z = pd.concat([df1.iloc[:, 3:], df2.iloc[:, 0:3]], axis=1)
Here I concatenated the columns of the first dataframe from position 3 (the fourth column) to the end with the first three columns of the second dataframe. In the same way, you can concatenate whichever rows or columns you want.
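For the exact case in the question (replace df1's first 8 columns with df2's 8 columns), here is a minimal sketch. It assumes both frames have the same number of rows in the same order, and the 2-row frames with column names a0..a8 and b0..b7 are made up. One variant builds a new frame with concat. The other overwrites df1 in place with iloc, passing a plain NumPy array so that no column-name alignment can get in the way.

```python
import pandas as pd

# Made-up stand-ins: df1 has 9 columns, df2 has 8, same number of rows.
df1 = pd.DataFrame([[0] * 9, [1] * 9], columns=[f"a{i}" for i in range(9)])
df2 = pd.DataFrame([[7] * 8, [8] * 8], columns=[f"b{i}" for i in range(8)])

# Option 1: new frame = all of df2 followed by df1's remaining (9th) column.
z = pd.concat([df2, df1.iloc[:, 8:]], axis=1)

# Option 2: overwrite df1's first 8 columns in place, purely by position.
# .to_numpy() drops df2's labels, so nothing is matched by column name.
df1.iloc[:, :8] = df2.to_numpy()
print(z)
print(df1)
```

Option 1 keeps df2's column names for the replaced part. Option 2 keeps df1's original names.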

Check Series label does not exist in a separate DataFrame

I'm iterating over two separate dataframes, where one dataframe is a subset of the other. I need to ensure that only the columns in the set (df1) which are not contained in the subset (df2) pass the conditional statement.
In this case, it compares the Series object from each iteration over df1 to the dataframe df2. Ideally I would like to compare just the label of each column, not the values contained in the columns. My code is below. Any help would be greatly appreciated!
for i in df1:                # iterating a DataFrame yields its column labels
    for j in df2:
        if i not in df2:     # `in` on a DataFrame checks column labels
            ...              # do some stuff between df1[i] and df2[j]
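To check element-wise whether the values of df1 match df2 you can use:
df1.isin(df2)
When df2 is a DataFrame, a cell is True only if df2 holds the same value under the same index and column label.
To keep only the values in df1 that are not in df2 you can use:
df1[~df1.isin(df2)]
Values that match in df1 and df2 become NaN in this result.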
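The question asks about column labels rather than values, and labels can be compared directly through the columns Index. Here is a minimal sketch with made-up frames where df2's columns are a subset of df1's:

```python
import pandas as pd

# Made-up frames: df2 holds a subset of df1's columns.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
df2 = df1[["a"]]

# Labels of df1 that do not appear among df2's columns (values are ignored).
missing = df1.columns[~df1.columns.isin(df2.columns)]

# Same result with the `in` operator, which checks a DataFrame's column labels.
missing_loop = [col for col in df1 if col not in df2]

for col in missing:
    print(col, df1[col].tolist())   # df1[col] is the Series whose label is not in df2
```

Index.isin keeps df1's column order. df1.columns.difference(df2.columns) gives the same labels but returns them sorted.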

Concatenate two dataframes with different number of rows and columns

I have two dataframes:
df1 shape = (101, 4825)
df2 shape = (97, 5818)
The first 4825 column names of df2 are the same as df1's, and the remaining column names continue counting up by 1.
However, at the end of both dataframes, there is a column named Group_number.
I want to concatenate both dataframes so that the final dataframe has shape (198, 5818), i.e. it contains all the rows of both dataframes, with NaN values in the df1 rows for the columns after the initial 4825.
I tried pd.concat([df1,df2]) but the column Group_number gets mixed up.
This could be happening because of an index problem as well. Use the ignore_index argument:
pd.concat([df1, df2], ignore_index=True)
or you can use the keys argument so that you know which original dataframe each observation came from. Leave out ignore_index in that case, because ignore_index=True discards the keys:
pd.concat([df1, df2], keys=['a', 'b'])
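The question does not say exactly how Group_number "gets mixed up". If it means the column lands in the middle of the result rather than at the end, one fix is to reindex the result to df2's column order. Here is a minimal sketch with made-up miniature frames:

```python
import pandas as pd

# Made-up miniature: df1 has columns 0..1 plus Group_number;
# df2 has columns 0..3 plus Group_number (its extra columns keep counting up).
df1 = pd.DataFrame({0: [1, 2], 1: [3, 4], "Group_number": [10, 10]})
df2 = pd.DataFrame({0: [5], 1: [6], 2: [7], 3: [8], "Group_number": [20]})

# Row-wise concat takes the union of columns; df2-only columns are NaN in df1's rows.
out = pd.concat([df1, df2], ignore_index=True)

# The union can leave Group_number between the shared and the extra columns;
# selecting df2's column order moves it back to the end.
out = out[df2.columns]

# To tag each row with its source frame, use keys without ignore_index.
tagged = pd.concat([df1, df2], keys=["a", "b"])
print(out)
```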

Compare two separate pandas dataframes row by row and return matching values

I have two pandas dataframes, df1 and df2. df1 has 2 columns and 750 rows; df2 has 2 columns and 88 rows. I want to compare the two dataframes, return the values from df1 that are present in df2, and store the matching values in a new column in df2.
Example:
df1
A          B
emp_table  emp_id
emp_table  emp_name
pay_table  basic_amount
pay_table  da_amount
df2
A          B
emp_table  emp_id
emp_table  emp_department
pay_table  da_amount
I want to add another column to df2 that holds the matching values.
df2
A          B
emp_table  emp_id
pay_table  da_amount
I want to perform a one-to-many comparison of each element of df1 with each element of df2.
I think you need merge without the on parameter, so all common columns are used as join keys (the default is an inner join):
df = pd.merge(df1, df2)
print(df)
           A          B
0  emp_table     emp_id
1  pay_table  da_amount
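The inner merge above returns only the matching rows. If you want to keep every row of df2 and add a flag column instead, as the question describes, a left merge with indicator=True is one option. Here is a minimal sketch using the example data from the question. The column name in_df1 is hypothetical.

```python
import pandas as pd

# Example data from the question.
df1 = pd.DataFrame({"A": ["emp_table", "emp_table", "pay_table", "pay_table"],
                    "B": ["emp_id", "emp_name", "basic_amount", "da_amount"]})
df2 = pd.DataFrame({"A": ["emp_table", "emp_table", "pay_table"],
                    "B": ["emp_id", "emp_department", "da_amount"]})

# Inner merge on all shared columns: only the rows present in both frames.
matches = pd.merge(df1, df2)

# Left merge from df2 keeps all of df2's rows; indicator=True adds a "_merge"
# column ("both" or "left_only"). drop_duplicates keeps df2's row count unchanged.
flagged = df2.merge(df1.drop_duplicates(), how="left", indicator=True)
df2["in_df1"] = (flagged["_merge"] == "both").to_numpy()   # hypothetical column name
print(df2)
```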
