import pandas as pd
df1 = pd.read_csv("koko.csv", sep=";", header=0)
df1['Hour'] = df1['Hour'].astype(str)
df2 = pd.read_csv("wowwo", sep=";", header=0)
df2['Hour'] = df2['Hour'].astype(str)
df3 = df1.join(df2, how='inner', on='Hour')
When I use astype(str) I get the following message:
You are trying to merge on object and int64 columns.
When I use astype(int) I get the following message:
invalid literal for int() with base 10: 'X253252352552'
How can I fix this?
Answer from a different thread:
"I had the same problem. The reason for why merge works on strings but join doesn't can be found in the answer by #MatthiasFripp here: link. Basically df1.join(df2) always merges via the index of df2 whereas df1.merge(df2) will merge on the column. So basically we were trying to merge based off a string and an integer, even though both columns were strings"
In my case both columns are strings, but the contents of the columns are a mix of strings and integers. That is why it doesn't work.
It should've been df1.merge(df2, on='Hour')
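A minimal sketch of the full fix, assuming the same files and column names as in the question (the file names are placeholders copied from it):

import pandas as pd

df1 = pd.read_csv("koko.csv", sep=";", header=0)
df2 = pd.read_csv("wowwo", sep=";", header=0)

# Cast the key column to the same type in both frames before merging
df1['Hour'] = df1['Hour'].astype(str)
df2['Hour'] = df2['Hour'].astype(str)

# merge() joins column to column; join() would have used df2's index instead
df3 = df1.merge(df2, how='inner', on='Hour')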
Related
I am converting the data types of dataframe df2 to match df1 using the code below, but it is giving me this error.
code:
df2 = df2.astype(df1.dtypes.to_dict())
Error:
invalid literal for int() with base 10: '0.75'
Is there any general solution to fix this line of code?
I tried converting both data frames' values to strings, but that didn't work:
df2.astype(str)
df2 = df2.astype(df1.astype(str).dtypes.to_dict())
I don't know the exact situation of your dataset because there is no example.
Try this one:
# Parse df2's numeric-looking columns first so strings like '0.75' become floats
# (unparseable values turn into NaN instead of raising), then copy df1's dtypes
cols = df1.select_dtypes('number').columns
df2[cols] = df2[cols].apply(pd.to_numeric, errors='coerce')
df2 = df2.astype(df1.dtypes)
This one might be all you need if you just want to mirror the data type of every column in df1 to df2:
for x in df1.columns:
    df2[x] = df2[x].astype(df1[x].dtype.name)
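As a quick illustration with made-up frames (the column names here are only for the example), the loop makes df2's dtypes match df1's:

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'score': [0.5, 1.5]})
df2 = pd.DataFrame({'id': ['3', '4'], 'score': ['0.75', '2.25']})

for x in df1.columns:
    df2[x] = df2[x].astype(df1[x].dtype.name)

print(df2.dtypes)  # id: int64, score: float64 -- same as df1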
0.75 is not an integer. Therefore it breaks.
If you want to transform 0.75 to 1, you can round the values and then transform them into integers.
But you should first check whether that is really what you want to do.
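For example, a small sketch with made-up values showing round-then-convert:

import pandas as pd

s = pd.Series(['0.75', '1.2', '3'])

# Parse as float first, round to the nearest whole number, then convert to int
s_int = pd.to_numeric(s).round().astype(int)
print(s_int.tolist())  # [1, 1, 3]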
I'm trying to merge two data frames on a column with an int data type:
df3 = df2.merge('df1', how = 'inner', on = 'ID')
But I receive this error
TypeError: Can only merge Series or DataFrame objects, a (class 'str') was passed
I do not understand what is causing this, so any help would be appreciated!
As you have written it, you are calling merge on df2 with the string 'df1'; to the interpreter this looks like an attempt to merge a dataframe with the literal text 'df1'. Remove the quotation marks and pass df1 as an object.
You need to pass the df1 reference directly, not as a string:
df3 = df2.merge(df1, how = 'inner', on = 'ID')
Alternatively, you can pass both dataframes as parameters:
df3 = pd.merge(df1, df2, how = 'inner', on = 'ID')
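A quick check with throwaway frames (purely illustrative data) confirms the corrected call:

import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'a': ['x', 'y', 'z']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'b': [10, 20, 30]})

df3 = df2.merge(df1, how='inner', on='ID')  # df1 as an object, not the string 'df1'
print(df3)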
I have two dataframes in Azure Databricks. Both are of type: pyspark.sql.dataframe.DataFrame
The number of rows is the same and the indexes are the same. I thought one of the code snippets below would do the job.
First Attempt:
result = pd.concat([df1, df2], axis=1)
Error Message: TypeError: cannot concatenate object of type "<class 'pyspark.sql.dataframe.DataFrame'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
Second Attempt:
result = pd.merge(df1, df2, left_index=True, right_index=True)
Error Message: TypeError: Can only merge Series or DataFrame objects, a <class 'pyspark.sql.dataframe.DataFrame'> was passed
I ended up converting the two objects to pandas dataframes and then doing the merge with the technique I already knew.
Step #1:
df1 = df1.select("*").toPandas()
df2 = df2.select("*").toPandas()
Step #2:
result = pd.concat([df1, df2], axis=1)
Done!
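One caveat with this approach: toPandas() collects the full contents of each Spark dataframe onto the driver, and pd.concat(axis=1) then pairs rows purely by index, so it is only practical when both tables comfortably fit in driver memory and really do share the same row order.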
I faced a similar issue when combining two dataframes with the same columns:
df = pd.concat([df, resultant_df], ignore_index=True)
TypeError: cannot concatenate object of type '<class 'pyspark.sql.dataframe.DataFrame'>'; only Series and DataFrame objs are valid
Then I tried join(), but it appends the columns multiple times and returns an empty dataframe:
df.join(resultant_df)
After that I used union(), which gives the exact result:
df = df.union(resultant_df)
df.show()
It works fine in my case.
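For reference, a minimal self-contained version of that flow (toy data; assumes a SparkSession is available as spark):

# Both frames must have the same schema for union()
df = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'val'])
resultant_df = spark.createDataFrame([(3, 'c')], ['id', 'val'])

df = df.union(resultant_df)   # appends rows, like pd.concat with ignore_index=True
df.show()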
Quick question:
I have the following situation (table):
Imported data frame
Now what I would like to achieve is the following (or something along those lines; it does not have to be exactly that):
Goal
I do not want the following columns, so I drop them:
data.drop(data.columns[[0,5,6]], axis=1,inplace=True)
I assumed that the following line of code would solve it, but I am missing something:
pivoted = data.pivot(index=["Intentional homicides and other crimes","Unnamed: 2"],columns='Unnamed: 3', values='Unnamed: 4')
produces
ValueError: Length of passed values is 3395, index implies 2
The difference from that question is that I do not want any aggregation function; I just want to leave the values as they are.
Data can be found at: Data
The problem with the method pandas.DataFrame.pivot is that it does not handle duplicate values in the index. One way to solve this is to use the function pandas.pivot_table instead.
df = pd.read_csv('Crimes_UN_data.csv', skiprows=[0], encoding='latin1')
cols = list(df.columns)
cols[1] = 'Region'
df.columns = cols
pivoted = pd.pivot_table(df, values='Value', index=['Region', 'Year'], columns='Series', aggfunc=sum)
It should not actually sum anything despite the aggfunc argument, but without that argument the call was throwing pandas.core.base.DataError: No numeric types to aggregate.
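To see the difference on a tiny made-up frame: pivot() raises on duplicate index/column pairs, while pivot_table() aggregates them:

import pandas as pd

df = pd.DataFrame({
    'Region': ['A', 'A', 'A', 'B'],
    'Year':   [2010, 2010, 2010, 2010],
    'Series': ['Rate', 'Rate', 'Count', 'Rate'],
    'Value':  [1.5, 1.7, 30.0, 2.0],
})

# df.pivot(index='Region', columns='Series', values='Value') would raise
# "Index contains duplicate entries, cannot reshape" because ('A', 'Rate') repeats
pivoted = pd.pivot_table(df, values='Value', index=['Region', 'Year'],
                         columns='Series', aggfunc='sum')
print(pivoted)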
I'm trying to execute a merge with pandas. The two files have a common key ("KEY_PLA") which I'm trying to use with a left join. But unfortunately, all columns which are transferred from the second file to the first file have NaN values.
Here is what I have done so far:
df_1 = pd.read_excel(path1, skiprows=1)
df_2 = pd.read_excel(path2, skiprows=1)
df_1.columns = ["Index", "KEY", "KEY_PLA", "INFO1", "INFO2"]
df_2.columns = ["Index", "KEY_PLA", "INFO4"]
df_1.drop(["Index"], axis=1, inplace=True)
df_2.drop(["Index"], axis=1, inplace=True)
# Merge all dataframes
df_merge = pd.DataFrame()
df_merge = df_1.merge(df_2, left_on="KEY_PLA", right_on="KEY_PLA", how="left")
print(df_merge)
This is the result:
Here are the excel files:
Excel1
Excel2
What is wrong with the code? I also checked the types and even converted the columns to strings, but nothing works.
I think the problem is that the joined KEY_PLA columns have different types; one is integer and the other is strings.
The solution is to cast them to the same type, e.g. to ints:
print (df_1['KEY_PLA'].dtype)
object
print (df_2['KEY_PLA'].dtype)
int64
df_1['KEY_PLA'] = df_1['KEY_PLA'].astype(int)
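A sketch of the complete flow once the key types match (the commented-out lines are an alternative in case KEY_PLA can hold non-numeric values):

# Cast the key to a common type, then merge as before
df_1['KEY_PLA'] = df_1['KEY_PLA'].astype(int)
df_merge = df_1.merge(df_2, on='KEY_PLA', how='left')

# Alternative (instead of the int cast above): cast df_2's key to str
# df_2['KEY_PLA'] = df_2['KEY_PLA'].astype(str)
# df_merge = df_1.merge(df_2, on='KEY_PLA', how='left')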