Python Pandas Merge Key Error - python

Why does the following fail with KeyError 'NUM'?
result = pandas.merge(sdf_subset, dfgeom, how='inner', on=['ID', 'NUM'])
The column 'ID' exists in sdf_subset and 'NUM' exists in dfgeom. I have checked the datatype and both are Int64.
Any ideas?

# You need left_on and right_on when the join key has a different name in each dataframe.
# on=['ID', 'NUM'] looks for both columns in both dataframes, hence the KeyError.
result = pandas.merge(sdf_subset, dfgeom, how='inner', left_on='ID', right_on='NUM')
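For illustration, here is a minimal sketch of the failure and the fix. The two small frames are hypothetical stand-ins for sdf_subset and dfgeom, with made-up columns "value" and "geom".

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question.
sdf_subset = pd.DataFrame({"ID": [1, 2, 3], "value": ["a", "b", "c"]})
dfgeom = pd.DataFrame({"NUM": [2, 3, 4], "geom": ["p", "q", "r"]})

# on=[...] needs every listed column in BOTH frames, so this raises KeyError.
try:
    pd.merge(sdf_subset, dfgeom, how="inner", on=["ID", "NUM"])
    raised = False
except KeyError:
    raised = True

# Match ID on the left against NUM on the right instead.
result = pd.merge(sdf_subset, dfgeom, how="inner", left_on="ID", right_on="NUM")
print(raised)
print(result)
```

Both key columns (ID and NUM) are kept in the result; drop one of them afterwards if it is redundant.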

Related

Python Pandas join doesn't work: unexpected argument

import pandas as pd
df1 = pd.read_csv("sdvsdvsvsd.csv")
df2 = pd.read_csv("dsvsdvdv.csv")
df3 = df1.join(df2, how='inner', left_on = 'TIME', right_on = 'TIME')
I created a join, but when I run it I get the message "unexpected argument". I checked it multiple times and can't see any mistake.
Beginner here, please help.
Use pd.merge(df1, df2, how='inner', left_on='TIME', right_on='TIME') instead.
.join doesn't accept left_on or right_on.
Solved: one column name was missing its quotation marks (" "). Thanks, everyone.
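A short sketch of the two working forms, assuming small in-memory frames in place of the CSV files (the columns "x" and "y" are made up for the example):

```python
import pandas as pd

# Hypothetical data standing in for the two CSV files.
df1 = pd.DataFrame({"TIME": [1, 2, 3], "x": [10, 20, 30]})
df2 = pd.DataFrame({"TIME": [2, 3, 4], "y": [0.2, 0.3, 0.4]})

# Column-on-column inner join; on='TIME' is enough when the name is the same.
df3 = pd.merge(df1, df2, how="inner", on="TIME")

# DataFrame.join matches against the other frame's index,
# so move TIME into the index of df2 first.
df3_join = df1.join(df2.set_index("TIME"), on="TIME", how="inner")

print(df3)
print(df3_join)
```

Both calls keep the rows whose TIME appears in both frames.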

Receiving a TypeError when using the merge() method with Pandas

I'm trying to merge two data frames on a column with a int data type
df3 = df2.merge('df1', how = 'inner', on = 'ID')
But I receive this error
TypeError: Can only merge Series or DataFrame objects, a <class 'str'> was passed
I do not understand what is causing this, so any help would be appreciated!
As written, the call tries to merge df2 with 'df1'. To Python, that is a dataframe being merged with the literal string 'df1'. Remove the quotation marks and pass df1 as an object.
You need to pass the variable df1 itself, not its name as a string:
df3 = df2.merge(df1, how = 'inner', on = 'ID')
Alternatively, you can pass both dataframes as arguments to pd.merge:
df3 = pd.merge(df1, df2, how = 'inner', on = 'ID')
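To make the difference concrete, a minimal sketch with two hypothetical frames (the columns "a" and "b" are made up):

```python
import pandas as pd

# Hypothetical frames sharing an integer ID column.
df1 = pd.DataFrame({"ID": [1, 2], "a": ["x", "y"]})
df2 = pd.DataFrame({"ID": [2, 3], "b": ["p", "q"]})

# Passing the string 'df1' instead of the object raises TypeError.
try:
    df2.merge("df1", how="inner", on="ID")
    error = None
except TypeError as exc:
    error = str(exc)

# Passing the dataframe itself works.
df3 = df2.merge(df1, how="inner", on="ID")

print(error)
print(df3)
```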

Is there any problem with pandas merge currently?

Using outer join to merge two tables. Let's say
df1 = ['productID', 'Name']
df2 = ['userID', 'productID', 'usage']
I tried to use outer join with merge function in pandas.
pd.merge(df1, df2[['userID','productID', 'usage']], on='productID', how = 'outer')
However, the error message I got is
'productID' is both an index level and a column label, which is ambiguous.
I googled this error message and saw an open issue: https://github.com/facebook/prophet/issues/891
Any solution to my problem?
The error means the index of one dataframe has the same name as the column productID:
# check it
print(df2.index.name)
The solution is to remove or rename the index name, e.g. with DataFrame.rename_axis:
pd.merge(df1, df2.rename_axis(None)[['userID','productID', 'usage']],
on='productID', how = 'outer')
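A small reproduction may help. It assumes the index picked up the name productID through something like set_index(..., drop=False); how the asker's frame actually got into that state is not known, and the data is made up.

```python
import pandas as pd

df1 = pd.DataFrame({"productID": [1, 2], "Name": ["apple", "pear"]})
df2 = pd.DataFrame({"userID": [7, 8], "productID": [2, 3], "usage": [5, 9]})

# Assumption: the index ends up named like the column, e.g. this way.
df2 = df2.set_index("productID", drop=False)

# Recent pandas versions refuse the ambiguous key with a ValueError.
try:
    pd.merge(df1, df2, on="productID", how="outer")
    ambiguous = False
except ValueError:
    ambiguous = True

# Clearing the index name removes the ambiguity.
merged = pd.merge(df1, df2.rename_axis(None), on="productID", how="outer")

print(ambiguous)
print(merged)
```

Calling df2.reset_index(drop=True) before the merge would be another way to discard the conflicting index.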

How to merge two pandas dataframe using an OR condition

I have two dataframes that I'd like to join based on a couple of columns. However, my join logic has an 'OR' in it, e.g. I want to join based on columns ['A','B','C'] OR ['A','B','D']. I have the following code to join on one set of columns, but how can I add the second set?
pd.merge(df1,df2, how='inner',left_on = ['A','B','C'], right_on = ['A','B','C'])
Try this. Since left_on and right_on are the same, just use on:
d_1 = pd.merge(df1,df2, how='inner', on = ['A','B','C'])
d_2 = pd.merge(df1,df2, how='inner', on = ['A','B','D'])
d_3 = pd.concat([d_1,d_2]).drop_duplicates()
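If both dataframes contain C and D, the two merges above produce differently suffixed columns, so the concatenated result may need tidying. As an alternative sketch (not the method from the answer), one can merge on the shared keys A and B and then keep the rows where C or D agrees. The frames and the columns left_val and right_val below are hypothetical.

```python
import pandas as pd

# Hypothetical frames that both carry A, B, C and D.
df1 = pd.DataFrame({
    "A": [1, 1, 2], "B": [1, 1, 2],
    "C": [10, 11, 12], "D": [100, 101, 102],
    "left_val": ["l0", "l1", "l2"],
})
df2 = pd.DataFrame({
    "A": [1, 1, 2], "B": [1, 1, 2],
    "C": [10, 99, 98], "D": [999, 101, 997],
    "right_val": ["r0", "r1", "r2"],
})

# Step 1: pair up rows on the keys common to both conditions.
pairs = pd.merge(df1, df2, how="inner", on=["A", "B"],
                 suffixes=("_left", "_right"))

# Step 2: keep pairs where C matches OR D matches.
match = (pairs["C_left"] == pairs["C_right"]) | (pairs["D_left"] == pairs["D_right"])
or_joined = pairs[match].reset_index(drop=True)

print(or_joined)
```

The intermediate merge on A and B can be large when those keys repeat often, so this trades memory for a single pass with no deduplication step.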

Key error when joining dfs in Pandas

I have a dataframe with these columns:
df1:
Index(['cnpj', '#CNAE', 'Estado', 'Capital_Social', '#CNAEpai', '#CNAEvo',
'#CNAEbisavo', 'Porte'],
dtype='object')
I have another dataframe with these columns:
df2:
Index(['#CNAEpai', 'ROA_t12_Peers_CNAEpai', 'MgBruta_t12_Peers_CNAEpai',
'MgEBITDA_t12_Peers_CNAEpai', 'LiqCorrente_t12_Peers_CNAEpai',
'Crescimento_t12_Peers_CNAEpai', 'MgLucro_t12_Peers_CNAEpai',
'Custo/Receita_t12_Peers_CNAEpai', 'Passivo/EBITDA_t12_Peers_CNAEpai',
'ROE_t12_Peers_CNAEpai', 'RFinanceiro/Receita_t12_Peers_CNAEpai',
'cnpj_t12_Peers_CNAEpai', 'LiqGeral_t12_Peers_CNAEpai'],
dtype='object')
I'm trying to join them, using this line:
df1=df1.join(df2,on=['#CNAEpai'],how='left',rsuffix='_bbb')
But I'm getting this error:
KeyError: '#CNAEpai'
Since #CNAEpai is a column in both dfs that shouldn't be happening right?
What's going on?
As @root indicated, pd.DataFrame.join joins index-on-index or index-on-column, but not column-on-column.
To join on column(s), use pd.DataFrame.merge. Note that merge has no rsuffix argument; it takes suffixes=(left, right) instead:
df1 = df1.merge(df2, on='#CNAEpai', how='left', suffixes=('', '_bbb'))
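Both routes can be sketched on cut-down frames. The data is invented, only a few of the original columns are kept, and the overlapping Porte column in df2 is an assumption added purely to show the suffix in action.

```python
import pandas as pd

# Cut-down, hypothetical versions of the two frames.
df1 = pd.DataFrame({
    "cnpj": [1, 2, 3],
    "#CNAEpai": ["A", "B", "A"],
    "Porte": ["S", "M", "L"],
})
df2 = pd.DataFrame({
    "#CNAEpai": ["A", "B"],
    "ROA_t12_Peers_CNAEpai": [0.1, 0.2],
    "Porte": ["x", "y"],  # assumed overlap, to demonstrate suffixes
})

# Column-on-column with merge; the right-hand duplicate gets '_bbb'.
merged = df1.merge(df2, on="#CNAEpai", how="left", suffixes=("", "_bbb"))

# The same with join: df1's column against df2's index.
joined = df1.join(df2.set_index("#CNAEpai"), on="#CNAEpai",
                  how="left", rsuffix="_bbb")

print(merged)
print(joined)
```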
