I have two dataframes, df1 and df2, as shown below:
df1:
a
T11552
T11559
T11566
T11567
T11569
T11594
T11604
T11625
df2:
a b
T11552 T11555
T11560 T11559
T11566 T11562
T11568 T11565
T11569 T11560
T11590 T11594
T11604 T11610
T11621 T11625
T11633 T11631
T11635 T11634
T13149 T13140
I want to create a new dataframe df3 by looking up each value of df1 in df2. If the value is present anywhere in df2, a new column added to df1 should hold True, otherwise False, as shown below.
df3:
a v
T11552 TRUE
T11559 TRUE
T11566 TRUE
T11567 FALSE
T11569 TRUE
T11594 TRUE
T11604 TRUE
T11625 TRUE
T11633 TRUE
T11634 TRUE
Use assign to build the new DataFrame and isin to test membership, after flattening all values of df2 into a one-dimensional array with ravel. To improve performance you can test against the unique values only; np.in1d is an alternative to isin:
import numpy as np
df3 = df1.assign(v = lambda x: x['a'].isin(np.unique(df2.values.ravel())))
# alternative solution (np.in1d is deprecated since NumPy 2.0; np.isin is its replacement)
#df3 = df1.assign(v = lambda x: np.in1d(x['a'], np.unique(df2[['a','b']].values.ravel())))
# if you need to specify which columns of df2 to check
df3 = df1.assign(v = lambda x: x['a'].isin(np.unique(df2[['a','b']].values.ravel())))
print (df3)
a v
0 T11552 True
1 T11559 True
2 T11566 True
3 T11567 False
4 T11569 True
5 T11594 True
6 T11604 True
7 T11625 True
Try this:
df3 = df1[['a']].copy()
df3['v'] = df3['a'].isin(set(df2.values.ravel()))
The above code will:
Create a new dataframe using column 'a' from df1.
Create a Boolean column 'v' testing the existence of each value of column 'a' versus values in df2 via set and numpy.ravel.
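As a self-contained check of the approach described above, here is a minimal sketch that rebuilds the sample frames from the question and applies the set-based lookup. The variable name lookup is only illustrative and does not come from either answer.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['T11552', 'T11559', 'T11566', 'T11567',
                          'T11569', 'T11594', 'T11604', 'T11625']})
df2 = pd.DataFrame({
    'a': ['T11552', 'T11560', 'T11566', 'T11568', 'T11569', 'T11590',
          'T11604', 'T11621', 'T11633', 'T11635', 'T13149'],
    'b': ['T11555', 'T11559', 'T11562', 'T11565', 'T11560', 'T11594',
          'T11610', 'T11625', 'T11631', 'T11634', 'T13140'],
})

# Collect every value that occurs anywhere in df2 into one set
# ('lookup' is a hypothetical name used only in this sketch).
lookup = set(df2.to_numpy().ravel())

# Leave df1 untouched and build df3 with the extra boolean column.
df3 = df1.assign(v=df1['a'].isin(lookup))
print(df3)
```

Only T11567 is absent from both columns of df2, so it is the only row flagged False.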
I'd like to assign a new column to my DataFrame based on a condition: whether row.id is one of the bad_cat values.
bad_cat = [71,84]
df = pd.DataFrame({'name' : ['a','b','c','d','e'], 'id' : [1,2,71,5,84]})
df['type'] = df[df.id in bad_cat]
Output:
name id type
a 1 False
b 2 False
c 71 True
d 5 False
e 84 True
It seems my code doesn't work. Could you explain how to do it?
The most intuitive answer is the one provided by Quang Hoang, using the .isin method. It creates a mask, i.e. a Series of boolean values:
df['type'] = df['id'].isin(bad_cat)
Another approach is to use the index, which can be faster under some circumstances. After setting the index to the column that will be checked against the list, you can use .loc to select the matching rows and set type to True for the values that appear in the list.
df.set_index('id', inplace=True)
df['type'] = False
df.loc[bad_cat, 'type'] = True  # a single .loc call avoids chained assignment
With id as the index, the output will be (the .isin solution gives the same type values but keeps id as a regular column):
name type
id
1 a False
2 b False
71 c True
5 d False
84 e True
Note that the values in the column that serves as the index do not have to be unique.
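To illustrate the note about non-unique index values, here is a small sketch that runs both approaches side by side. The sixth row (name 'f', a second id 71) is hypothetical data added for this example and is not part of the question.

```python
import pandas as pd

bad_cat = [71, 84]
# Hypothetical data: id 71 appears twice to show that duplicates are allowed.
df = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'id': [1, 2, 71, 5, 84, 71]})

# Approach 1: boolean mask built with isin.
via_isin = df['id'].isin(bad_cat)

# Approach 2: label-based assignment on an 'id' index.
indexed = df.set_index('id')
indexed['type'] = False
indexed.loc[bad_cat, 'type'] = True  # sets every row whose label is in bad_cat

print(indexed)
```

One difference worth keeping in mind: .loc raises a KeyError if a label in bad_cat does not exist in the index, whereas isin simply returns False for every row.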
For the following table:
Using pandas, I would like to produce the desired_output column, which is TRUE when the value below the current cell is different and FALSE otherwise.
I have tried the following code, but an error occurs.
df['desired_output']=df.two.apply(lambda x: True if df.iloc[int(x),1]==df.iloc[int(x+1),1] else False)
df['desired_output'] = df['city'].shift().bfill() != df['city']
Compare with Series.ne against the values moved by Series.shift; the first missing value is replaced by the original value:
df = pd.DataFrame({'city':list('mmmssb')})
df['out'] = df['city'].ne(df['city'].shift(fill_value=df['city'].iat[0]))
print (df)
city out
0 m False
1 m False
2 m False
3 s True
4 s False
5 b True
For older pandas versions, if column city has no missing values, replace the first missing value created by the shift with Series.fillna:
df['out'] = df['city'].ne(df['city'].shift().fillna(df['city']))
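The answers above compare each row with the row before it. If "the value below the current cell" is taken literally, the comparison has to look at the next row instead. The following is a minimal sketch under that assumption; the column name differs_from_next is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'city': list('mmmssb')})

# shift(-1) moves each value up one row, so every row sees the row below it.
# The last row has no successor; fill it with its own value so it compares equal.
next_city = df['city'].shift(-1, fill_value=df['city'].iat[-1])

# 'differs_from_next' is a hypothetical column name for this sketch.
df['differs_from_next'] = df['city'].ne(next_city)
print(df)
```

With this variant the True flags land on the last row of each run of equal values rather than on the first row of the next run.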
I have a dictionary of dataframes (Di_1). Each dataframe has the same number of columns, column names, number of rows and row indexes. I also have a list of the names of the dataframes (dfs). I would like to compare the contents of one of the columns (A) in each dataframe with those of the last dataframe in the list to see whether they are the same. For example:
df_A = pd.DataFrame({'A': [1,0,1,0]})
df_B = pd.DataFrame({'A': [1,1,0,0]})
Di_1 = {'X': df_A, 'Y': df_B}
dfs = ['X','Y']
I tried:
for df in dfs:
    Di_1[str(df)]['True'] = Di_1[str(df)]['A'].equals(Di_1[str(dfs[-1])]['A'])
I got:
[0,0,0,0]
I would like to get:
[1,0,0,1]
My attempt checks whether the whole column is the same, but I would instead like it to compare each dataframe row by row.
I think you are making this more complicated than necessary. You can compare the column with the last dataframe's column element-wise using ==:
series_last = Di_1[dfs[-1]]['A']
for df in map(Di_1.get, dfs):
    df['True'] = df['A'] == series_last
This will produce as result:
>>> df_A
A True
0 1 True
1 0 False
2 1 False
3 0 True
>>> df_B
A True
0 1 True
1 1 True
2 0 True
3 0 True
So each df_i gets an extra column named 'True' (a different name would probably be better) that records, for each row, whether the value is the same as the one in series_last.
If dfs contains something other than strings, we can first convert the items to strings:
series_last = Di_1[str(dfs[-1])]['A']
for df in map(Di_1.get, map(str, dfs)):
    df['True'] = df['A'] == series_last
Create a list:
l=[Di_1[i] for i in dfs]
Then use isin() to compare the first and last dataframes (when a DataFrame is passed, isin matches on both index and column labels):
l[0].isin(l[-1]).astype(int)
A
0 1
1 0
2 0
3 1
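To get the 1/0 form the question asks for in every dataframe of the dictionary, the element-wise comparison can be combined with astype(int). This is a sketch only; same_as_last is a hypothetical column name chosen to avoid the confusing 'True' label.

```python
import pandas as pd

df_A = pd.DataFrame({'A': [1, 0, 1, 0]})
df_B = pd.DataFrame({'A': [1, 1, 0, 0]})
Di_1 = {'X': df_A, 'Y': df_B}
dfs = ['X', 'Y']

# Column A of the last dataframe in the list is the reference.
reference = Di_1[dfs[-1]]['A']

# 'same_as_last' is a hypothetical column name chosen for this sketch.
for name in dfs:
    # eq compares row by row (aligned on the index); astype(int) gives 1/0.
    Di_1[name]['same_as_last'] = Di_1[name]['A'].eq(reference).astype(int)

print(Di_1['X']['same_as_last'].tolist())
```

Unlike Series.equals, which returns a single boolean for the whole column, eq returns one result per row.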
I have two data frames, df1 (35k records) and df2 (100k records). df1['col1'] and df2['col3'] contain unique IDs. I want to match df1['col1'] against df2['col3']. If a value matches, I want to add one more column to df1, say df1['Match'], with the value True, and False otherwise. Each True or False value should line up with its matching or non-matching record.
I am using the .isin() function. I get the correct match and non-match counts, but I am not able to map them to the right rows.
Match = df1['col1'].isin(df2['col3'])
df1['match'] = Match
I have also tried the merge function, passing the parameter how='right', but did not get the expected results.
You can simply do as follows:
df1['Match'] = df1['col1'].isin(df2['col3'])
For instance:
import pandas as pd
data1 = [1,2,3,4,5]
data2 = [2,3,5]
df1 = pd.DataFrame(data1, columns=['a'])
df2 = pd.DataFrame(data2,columns=['c'])
print (df1)
print (df2)
df1['Match'] = df1['a'].isin(df2['c']) # if matches it returns True else False
print (df1)
Output:
a
0 1
1 2
2 3
3 4
4 5
c
0 2
1 3
2 5
a Match
0 1 False
1 2 True
2 3 True
3 4 False
4 5 True
Use df.loc indexing:
df1['Match'] = False
df1.loc[df1['col1'].isin(df2['col3']), 'Match'] = True
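Since the question also mentions merge, here is a sketch of how a left merge with indicator=True could produce the same flag. It assumes df1 has a default RangeIndex and uses the small sample values from the answer above under the question's column names; for a plain True/False column, isin remains the simpler choice.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'col3': [2, 3, 5]})

# A left merge keeps every row of df1 in its original order.
# drop_duplicates guards against df1 rows being repeated by duplicate keys in df2.
merged = df1.merge(
    df2[['col3']].drop_duplicates(),
    left_on='col1', right_on='col3',
    how='left', indicator=True,
)

# '_merge' is 'both' where the key was found in df2 and 'left_only' otherwise.
df1['Match'] = (merged['_merge'] == 'both').to_numpy()
print(df1)
```

A how='right' merge keeps the rows of df2 instead, which is one plausible reason the attempt in the question did not line up with df1.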
How can I compare two columns of the same dataframe and write the result to another column: True if they match, False otherwise?
df:
Col1 Col2 Result
1234569 1234569 TRUE
256132 453543 FALSE
DSDFDSF DSDFDSF TRUE
TRYTR FGFH FALSE
This returns a boolean Series (column labels are case-sensitive, so use Col1 and Col2 as they appear in the frame): df['Col1'] == df['Col2']
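A minimal sketch of that comparison on the sample rows from the question, assuming all values are stored as strings (the question does not state the dtypes):

```python
import pandas as pd

# Assumption: both columns hold strings, so '1234569' is compared as text.
df = pd.DataFrame({'Col1': ['1234569', '256132', 'DSDFDSF', 'TRYTR'],
                   'Col2': ['1234569', '453543', 'DSDFDSF', 'FGFH']})

# Element-wise equality; the resulting boolean Series becomes the new column.
df['Result'] = df['Col1'] == df['Col2']
print(df)
```

If one column holds numbers and the other holds strings, the comparison returns False even for values that look identical, so converting both columns to a common type first may be needed.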