I am trying to create a new df with certain columns from 2 others:
The first called visas_df:
And the second called cpdf:
I only need the highlighted columns. But when I try this:
df_joined = pd.merge(cpdf,visas_df["visas"],on="date")
The error appearing is: KeyError: 'date'
I imagine this is due to how I created cpdf. It was a "bad dataset" so I did some fidgeting. Line 12 of the code snippet below might have something to do with it, but I am clueless...
I even renamed the date columns of both dfs to "date" and checked that the dtypes and number of rows are the same.
Any feedback would be much appreciated. Thanks!
visas_df["visas"] in the merge call is a Series, not a DataFrame, and it does not contain the date column. If you want a DataFrame, you have to use double square brackets [[]], like this:
df_joined = pd.merge(cpdf,visas_df[["date","visas"]],on="date")
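For context, a minimal sketch (with made-up data standing in for visas_df) of the difference between single and double brackets:

import pandas as pd

# Hypothetical stand-in for the visas_df in the question
visas_df = pd.DataFrame({"date": ["2020-01-01", "2020-01-02"], "visas": [10, 12]})

print(type(visas_df["visas"]))            # Series -- carries no 'date' column to merge on
print(type(visas_df[["date", "visas"]]))  # DataFrame -- keeps both columns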
Good day all,
I have two data frames that need to be merged, in a way that is a little different from the examples I have found so far, and I could not get it working. What I am currently getting is wrong, which I am sure has to do with the index, as dataframe 1 only has one record. I need to copy the contents of dataframe 1 into new columns of dataframe 2 for all rows.
Current problem highlighted in red
I have tried merge, append, reset_index, etc.
DF 1:
Dataframe 1
DF 2:
Dataframe 2
Output Requirement:
Required Output
Any suggestions would be highly appreciated
Update:
I got it to work using the statements below; is there a more dynamic way than specifying the column names?
mod_df['Type'] = mod_df['Type'].ffill()        # fillna(method="ffill") is deprecated in newer pandas
mod_df['Date'] = mod_df['Date'].ffill()
mod_df['Version'] = mod_df['Version'].ffill()
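As for a more dynamic variant: assuming all of those columns should be forward-filled, one sketch is to ffill a column subset in a single call instead of naming each column on its own line:

cols = ['Type', 'Date', 'Version']   # or mod_df.columns to forward-fill every column
mod_df[cols] = mod_df[cols].ffill()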
Assuming you have a single row in df1, use a cross merge:
out = df2.merge(df1, how='cross')
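A quick sketch with made-up frames (how='cross' needs pandas >= 1.2), showing the cross merge broadcasting the single df1 row onto every df2 row:

import pandas as pd

df1 = pd.DataFrame({'Type': ['A'], 'Date': ['2021-01-01'], 'Version': [3]})  # one record
df2 = pd.DataFrame({'Id': [1, 2, 3]})

out = df2.merge(df1, how='cross')  # every df2 row is paired with the single df1 row
print(out)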
Just a random question. If there's a dataframe, df, from the Boston Homes dataset, and I'm trying to do EDA on a few of the columns, assigned to a variable feature_cols, which I could use afterwards to check for NA values, how would one go about this? I have the following, which is throwing an error:
This is what I was hoping to try to do after the above:
Any feedback would be greatly appreciated. Thanks in advance.
There are two problems in your pictures. The first is a KeyError, because if you want to access a subset of columns of a dataframe, you need to pass the column names in a list, not a tuple, so the first line should be
feature_cols = df[['RM','ZN','B']]
However, this will return a dataframe with three columns, and note that iterating over a dataframe yields column names, not column values, so the for loop as written will not do what you expect. If the goal is counting missing values, you don't need a loop at all; you can use the one-liner:
df.isna().sum()
This will print the name of every column in the dataframe along with the count of missing values in that column. Of course, if you want to check only a subset of columns, you can replace df with df[list_of_column_names].
You need to store just the column names in a list to access multiple columns, for example
feature_cols = ['RM','ZN','B']
and then access them as
x = df[feature_cols]
Now to iterate on columns of df, you can use
for column in df[feature_cols]:
    print(df[column])  # or anything
As per your updated comment, if your end goal is to see null counts only, you can achieve that without looping, e.g.
df[feature_cols].info(verbose=True, show_counts=True)  # the parameter was called null_counts in pandas < 1.2
I am trying to merge two dataframes with an inner join and append the values. I was able to perform the join, but for some reason the values are not ordered properly in each column.
To explain more about this,
Please find below the screenshot where my first dataframe has the stored values for each column.
My second dataframe has the string values which need to be replaced with the values stored in dataframe 1 above.
Below is the output that I got, but when you compare the values with dataframe 2, they are not assigned properly. For example, if you consider row 1 of dataframe 2, Column 1 should have the value 1.896552 (i.e. the second column in dataframe 2), but in my output I have something else.
Below is the code I worked with to achieve the above result.
Joined_df_wna_test = pd.DataFrame()
for col in Unseen_cleaned_rmv_un_2:
    Joined_data = pd.merge(Output_df_unseen, my_dataframe_categorical, on=col, how='inner')
    Joined_df = pd.DataFrame(Joined_data)
    Joined_df_wna_test[col] = Joined_df.Value
Joined_df_wna_test
Could someone please help me overcome this issue?
Found the answer for this.
The how='left' part is what actually makes it keep the order:
Joined_data = Unseen_cleaned_rmv_un.merge(my_dataframe_categorical, on=col, how='left')
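A minimal sketch (with a made-up lookup table) of why this works: a left join keeps every row of the left frame in its original order and pulls in the matching Value, so it behaves like an order-preserving lookup:

import pandas as pd

codes = pd.DataFrame({'col1': ['b', 'a', 'c']})  # left frame; row order matters
lookup = pd.DataFrame({'col1': ['a', 'b', 'c'], 'Value': [1.0, 2.0, 3.0]})

out = codes.merge(lookup, on='col1', how='left')  # rows stay in b, a, c order
print(out)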
Hello Stack Overflow community. I am having an issue while trying to do a simple merge between two dataframes which share the same date column. Sorry, I am new to Python and perhaps the way I express myself is not very clear. I am working on a project related to stock price calculations. The first dataframe has date and closing price columns, while the second one only has a similar date column. My goal is to obtain a single date column with the matching closing prices column next to it.
This is what I have done to merge the two dataframes:
inner_join = pd.merge(df.iloc[7:79],df1[['Ex-Date','FDX UN Equity']],on ='Ex-date',how ='inner')
inner_join
Ex-date refers to the date column and FDX UN Equity refers to the column with closing prices.
I get this as a result:
) = self._get_merge_keys()
...
KeyError: 'Ex-date'
Pandas reads the format of the date columns differently, so I set the same format for the date columns in the original Excel file, but it hasn't helped. I tried all sorts of merges but none of them worked either.
Does anyone have any ideas about what is going on?
The code would look like this
import pandas as pd
inner_join = pd.merge_asof(df, df1, on = 'Ex-date')
Change both column header names to the same lower case and merge again. Check 'Ex-Date': the column name headers must match exactly in both frames before you merge, and consider using how='left'.
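A hedged sketch of that fix, reusing the frames from the question (df with the closing-price rows, df1 with the 'Ex-Date' column):

# Normalize header case so the merge key matches exactly in both frames
df.columns = df.columns.str.lower()
df1.columns = df1.columns.str.lower()

inner_join = pd.merge(df.iloc[7:79], df1[['ex-date', 'fdx un equity']], on='ex-date', how='left')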
I am trying to use a data frame to regroup different kinds of data.
I have a data frame with 3 columns:
one that I define as the index (used a groupby command)
one that holds a parameter, say 'valeur1', for which I want the mean over rows that share the same index (used a mean command after the groupby)
the last column contains strings. There is only one string for each index, but some cells might contain NaN.
I am trying to get, in the end, a dataframe with the mean of the parameter for each index, as well as the string that goes with that index (NaN in the string column is not important). Here is a picture with an example of what I am trying to get: illustration. The main issue is that DataFrame.mean does not work with strings.
The code I used so far is pretty basic :
dataRaw = pd.read_csv('file.csv', sep=';', encoding='latin-1')
data = dataRaw.groupby(index)
databis = data.mean()
Any suggestion would be greatly appreciated.
Thanks !
I think you need to group by multiple columns:
databis = dataRaw.groupby(['index', 'String']).mean()
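A small sketch (made-up data, column names borrowed from the question) showing how grouping by both the index column and the string column keeps the string alongside the mean; note that groups whose string is NaN are dropped by default, so pass dropna=False to groupby (pandas >= 1.1) if you need to keep them:

import pandas as pd

df = pd.DataFrame({
    'index':   ['a', 'a', 'b', 'b'],
    'String':  ['foo', 'foo', 'bar', 'bar'],
    'valeur1': [1.0, 3.0, 10.0, 20.0],
})

databis = df.groupby(['index', 'String'])['valeur1'].mean().reset_index()
print(databis)  # a/foo -> 2.0, b/bar -> 15.0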