How to drop dataframe rows not in another dataframe?

I have a:
Dataframe df1 with columns A, B and C. A is the index.
Dataframe df2 with columns D, E and F. D is the index.
What’s an efficient way to drop from df1 all rows where B is not found in df2 (in D the index)?

Dropping the rows whose values do not exist is the same as selecting only the rows whose values do exist.
You can filter df1.B against the index of df2 with Series.isin:
df3 = df1[df1.B.isin(df2.index)]
Or use DataFrame.merge with an inner join (a left join would keep every row of df1, so nothing would be dropped):
df3 = df1.merge(df2[[]], left_on='B', right_index=True, how='inner')
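As a quick check of the isin approach, here is a minimal sketch. The sample values and index labels are made up for illustration; only the column names A to F come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df1 = pd.DataFrame(
    {"B": ["x", "y", "z", "x"], "C": [1, 2, 3, 4]},
    index=pd.Index([10, 11, 12, 13], name="A"),
)
df2 = pd.DataFrame(
    {"E": [0.1, 0.2], "F": [True, False]},
    index=pd.Index(["x", "z"], name="D"),
)

# Keep only the rows of df1 whose B value appears in df2's index (D).
df3 = df1[df1["B"].isin(df2.index)]
print(df3)
```

The row with B equal to "y" is dropped because "y" is not in df2's index; the original index A of the remaining rows is preserved.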

Related

Assign specific value from a column to specific number of rows

I would like to assign agent_code to specific number of rows in df2.
df1
df2
Thank you.
df3 (Output)

First make sure both DataFrames have a default index by calling DataFrame.reset_index with drop=True, then repeat agent_code by the values in number, reset the index of the result as well, and finally use concat:
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
s = df1['agent_code'].repeat(df1['number']).reset_index(drop=True)
df3 = pd.concat([df2, s], axis=1)
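The tables from the question are not shown here, so the following sketch runs the same steps on invented data. The column customer and all values are assumptions; agent_code and number are the names used in the answer.

```python
import pandas as pd

# Hypothetical inputs: the original df1 and df2 tables are not available.
df1 = pd.DataFrame({"agent_code": ["A1", "B2"], "number": [2, 3]})
df2 = pd.DataFrame({"customer": ["c1", "c2", "c3", "c4", "c5"]})

# Make sure both frames use the default 0..n-1 index.
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)

# Repeat each agent_code as many times as its "number" says,
# then renumber so the result aligns with df2 row by row.
s = df1["agent_code"].repeat(df1["number"]).reset_index(drop=True)

# Column-wise concat aligns on the index.
df3 = pd.concat([df2, s], axis=1)
print(df3)
```

If the numbers do not add up to the length of df2, concat still aligns on the index and leaves NaN in the unmatched rows.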

pandas merge by excluding certain columns from merge

I want to merge two dataframes like:
df1.columns = A, B, C, E, ..., D
df2.columns = A, B, C, F, ..., D
If I merge them, it merges on all columns. Also since the number of columns is high I don't want to specify them in on. I prefer to exclude the columns which I don't want to be merged. How can I do that?
mdf = pd.merge(df1, df2, exclude D)
I expect the result be like:
mdf.columns = A, B, C, E, F ..., D_x, D_y

You mentioned you don't want to use on because the number of columns is large.
You can still use on without listing the columns by hand, provided every column of df1 other than D also exists in df2:
mdf = pd.merge(df1, df2, on=[i for i in df1.columns if i != 'D'])
Or by using pd.Index.difference:
mdf = pd.merge(df1, df2, on=df1.columns.difference(['D']).tolist())

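Another solution is to drop D from the column index. Note that list.remove works in place and returns None, so it cannot be passed directly to on; Index.drop returns a new Index instead:
mdf = pd.merge(df1, df2, on=df1.columns.drop('D').tolist())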

What about dropping the unwanted column after the merge?
You can use pandas.DataFrame.drop:
mdf = pd.merge(df1, df2).drop('D', axis=1)
or dropping before the merge:
mdf = pd.merge(df1.drop('D', axis=1), df2.drop('D', axis=1))

One solution is using intersection and then difference on df1 and df2 columns:
mdf = pd.merge(df1, df2, on=df1.columns.intersection(df2.columns).difference(['D']).tolist())
The other solution could be renaming columns you want to exclude from merge:
df2.rename(columns={"D":"D_y"}, inplace=True)
mdf = pd.merge(df1, df2)
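To see the intersection-and-difference approach on frames that also have unshared columns (E and F in the question), here is a small sketch. The values are invented; only the column names come from the question.

```python
import pandas as pd

# Hypothetical data with the column layout from the question.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "E": [7, 8], "D": [10, 20]})
df2 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "F": [9, 0], "D": [30, 40]})

# Shared columns, minus the one that should not take part in the join.
keys = df1.columns.intersection(df2.columns).difference(["D"]).tolist()

# D is present in both frames but not in the keys, so it gets the
# default suffixes _x and _y.
mdf = pd.merge(df1, df2, on=keys)
print(keys)
print(mdf.columns.tolist())
```

Using the intersection avoids a KeyError when one frame has columns the other lacks, which the plain "all columns of df1 except D" variants do not guard against.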

How to drop column from the target data frame, but the column(s) are required for the join in merge

I have two dataframe df1, df2
df1.columns
['id','a','b']
df2.columns
['id','ab','cd','ab_test','mn_test']
Expected out column is ['id','a','b','ab_test','mn_test']
How to get the all the columns from df1, and columns which contain test in the column name
pseudocode > pd.merge(df1,df2,on='id')

You can merge and use filter on the second dataframe to keep the columns of interest:
df1.merge(df2.filter(regex=r'^id$|test'), on='id')
Or similarly through bitwise operations:
df1.merge(df2.loc[:,(df2.columns=='id')|df2.columns.str.contains('test')], on='id')
A check with empty frames that only carry the column names:
import pandas as pd
df1 = pd.DataFrame(columns=['id','a','b'])
df2 = pd.DataFrame(columns=['id','ab','cd','ab_test','mn_test'])
df1.merge(df2.filter(regex=r'^id$|test'), on='id').columns
# Columns: id, a, b, ab_test, mn_test (older pandas versions may order them differently)

Concatenate Pandas dataframes with different set of columns

df1 has columns A, B, C, D, E
df2 has columns A, B, D
How to concatenate them in order to have a resulting dataframe that has rows of df1 and df2, values of A, B and D will be extended from df2 on df1, and columns C and E will be filled with NaN because df2 has no data for them?

There is a function called concat; columns missing from one frame (here C and E) are filled with NaN:
pd.concat([df1,df2])
The input must be an iterable, so put them into a list ;)
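A minimal sketch of that behaviour, with made-up single-row frames. The ignore_index=True argument is an addition not in the answer; it just renumbers the rows of the result.

```python
import pandas as pd

# Hypothetical one-row frames with the column sets from the question.
df1 = pd.DataFrame({"A": [1], "B": [2], "C": [3], "D": [4], "E": [5]})
df2 = pd.DataFrame({"A": [6], "B": [7], "D": [8]})

# Rows are stacked; columns are aligned by name, and values that
# df2 does not have (C and E) become NaN.
result = pd.concat([df1, df2], ignore_index=True)
print(result)
```

Because NaN is a float, the C and E columns are upcast from integer to float in the result.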

How do I remove the rows identified in df2 from df1?

I have a dataframe called df1. I then create a filter like this:
df2 = df1.loc[(df1['unit'].str.contains('Ph'))]
How do I remove the rows identified in df2 from df1? thanks!

Use ~, the NOT operator, to invert the mask in boolean indexing:
df3 = df1.loc[~(df1['unit'].str.contains('Ph'))]
Now, df3 is df1 minus df2.
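Here is a small sketch of the same idea on invented data; the value column and all entries are assumptions, only the unit column and the 'Ph' pattern come from the question. Storing the mask in a variable makes it easy to reuse for both halves.

```python
import pandas as pd

# Hypothetical data; only the "unit" column name comes from the question.
df1 = pd.DataFrame({"unit": ["Ph", "mg", "Phase", "kg"], "value": [1, 2, 3, 4]})

# Boolean mask: True where unit contains "Ph".
mask = df1["unit"].str.contains("Ph")

df2 = df1.loc[mask]    # rows that match
df3 = df1.loc[~mask]   # rows that do not match
print(df3)
```

If the column may hold missing values, str.contains returns NaN for them on object columns and the mask can no longer be used for indexing; passing na=False to str.contains treats those rows as non-matches.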
