Unable to join two dataframes in pandas - python

I have two DataFrames. The first has a MultiIndex and the other has an ordinary single-level index.
Figure 1: Multiindex df
and
Figure 2: Single indexing
When I join these two DataFrames, I get the following error:
cannot join with no overlapping index names
I suspect this error is caused by the index column name in the first DataFrame (Figure 1).
Even swapping the index name and the plain numeric level does not help:
Figure 2: Multiindex df
May I know how to address this error?
Thanks in advance for your time.

You can convert the first level of the MultiIndex to a column before merging:
df = (df1.reset_index(level=0)
         .merge(df2, left_index=True, right_index=True)
         .set_index('name', append=True)
         .swaplevel(1, 0))
Or, if you use join:
df = df1.reset_index(level=0).join(df2).set_index('name', append=True).swaplevel(1, 0)
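As a minimal sketch of the approach above, here is a runnable version. The data is made up: it assumes df1's first index level is called name, its second level is unnamed, and df2's plain index holds the same labels as that second level.

```python
import pandas as pd

# Hypothetical data: level 0 of df1 is named "name", level 1 is unnamed.
df1 = pd.DataFrame(
    {"x": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=["name", None]),
)
# df2 has a plain, unnamed index whose labels match level 1 of df1.
df2 = pd.DataFrame({"y": [10.0, 20.0]}, index=[0, 1])

out = (
    df1.reset_index(level=0)            # move "name" into a column
       .join(df2)                       # ordinary index-on-index join
       .set_index("name", append=True)  # put "name" back into the index
       .swaplevel(1, 0)                 # restore the original level order
)
print(out)
```

Each row of out carries the y value that belongs to its second index level, and the index is again a two-level MultiIndex with name first.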

If you are computing something like df.rolling(window).cov() / df.rolling(window).var(), which effectively merges two multi-index DataFrames, you can hit the same error. In my case I had to give the index a name, because pandas does not know which index name to match on, and that is why you get this error. If you use something like yfin to get data you won't run into this issue, because the index always defaults to 'Date'. Here is a simple one-liner to fix it:
df.index.rename('Date', inplace=True)
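To illustrate why naming the index helps, here is a small sketch. The frames, the ticker level and the values are all hypothetical. It assumes df1 has a two-level index that includes a level named Date, and that df2's single index starts out unnamed.

```python
import pandas as pd

# Hypothetical data: df1 has a two-level index named ("ticker", "Date").
dates = pd.date_range("2024-01-01", periods=2)
df1 = pd.DataFrame(
    {"x": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_product([["AAA", "BBB"], dates], names=["ticker", "Date"]),
)
# df2 has a single-level index with no name yet.
df2 = pd.DataFrame({"y": [0.5, 0.7]}, index=dates)

# With no shared index name, pandas cannot tell which level to match on.
error_message = None
try:
    df1.join(df2)
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Give df2's index the same name as the matching level of df1.
df2.index.rename("Date", inplace=True)

out = df1.join(df2)  # now joins on the shared "Date" level
print(out)
```

Once the names overlap, join can align df2 against the Date level of df1 without any reset_index step.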

Related

Combining two dataframes with different rows, keeping contents of first dataframe on all rows

Good day All,
I have two data frames that need to be merged. My case is a little different from the ones I have found so far, and I could not get it working. I am sure the result I currently get has to do with the index, as dataframe 1 has only one record. I need to copy the contents of dataframe 1 into new columns of dataframe 2 for all rows.
Current problem highlighted in red
I have tried merge, append, reset index etc...
DF 1:
Dataframe 1
DF 2:
Dataframe 2
Output Requirement:
Required Output
Any suggestions would be highly appreciated
Update:
I got it to work using the statements below. Is there a more dynamic way than specifying the column names?
mod_df['Type'] = mod_df['Type'].ffill()
mod_df['Date'] = mod_df['Date'].ffill()
mod_df['Version'] = mod_df['Version'].ffill()
Assuming you have a single row in df1, use a cross merge:
out = df2.merge(df1, how='cross')
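A minimal sketch of the cross merge, using made-up frames (the id and value columns and all values are hypothetical; only Type, Date and Version come from the question):

```python
import pandas as pd

# Hypothetical data: df1 holds a single row of metadata.
df1 = pd.DataFrame({"Type": ["A"], "Date": ["2024-01-01"], "Version": [1]})
df2 = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})

# how="cross" (pandas >= 1.2) pairs every row of df2 with every row of df1,
# so the single df1 row is repeated on all df2 rows without naming any column.
out = df2.merge(df1, how="cross")
print(out)
```

Because df1 has exactly one row, the result has the same number of rows as df2. If df1 had several rows, every df2 row would be repeated once per df1 row.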

How should I filter one dataframe by entries from another one in pandas with isin?

I have two dataframes (df1, df2). The column names and indices are the same (the difference is in the column entries). Also, df2 has only 20 entries (which also exist in df1, as I said).
I want to filter df1 by df2's entries, but when I try to do it with isin, nothing happens.
df1.isin(df2) or df1.index.isin(df2.index)
Please tell me what I'm doing wrong and how I should do it.
First of all, the isin function in pandas returns booleans (a DataFrame of booleans for df1.isin, a boolean array for df1.index.isin), not the filtered result you want. So it makes sense that the commands you used did not work.
I am positive that the following post will help:
pandas - filter dataframe by another dataframe by row elements
If you want to select the entries in df1 with an index that is also present in df2, you should be able to do it with:
df1.loc[df2.index]
or if you really want to use isin:
df1[df1.index.isin(df2.index)]
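Here is a small sketch of both options on made-up data, assuming every index label of df2 also exists in df1:

```python
import pandas as pd

# Hypothetical data: df2's index labels are a subset of df1's.
df1 = pd.DataFrame({"a": [1, 2, 3, 4]}, index=["w", "x", "y", "z"])
df2 = pd.DataFrame({"a": [20, 40]}, index=["x", "z"])

# Label-based selection: raises KeyError if a df2 label is missing from df1.
by_loc = df1.loc[df2.index]

# Boolean mask: True where a df1 label also appears in df2's index.
mask = df1.index.isin(df2.index)
by_isin = df1[mask]

print(by_loc)
print(by_isin)
```

The two results can differ in row order: loc follows the order of df2.index, while the boolean mask keeps the order of df1.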

Inner Join two dataframes does not sort properly - python

I am trying to merge two dataframes with an inner join and append the values. I was able to perform the join, but for some reason the values are not ordered properly in each column.
To explain more about this,
Please see the screenshot below, where my first dataframe has the stored values for each column.
My second dataframe has the string values, which need to be replaced with the values stored in dataframe 1 above.
Below is the output that I got, but when you compare the values with dataframe 2, they are not assigned properly. For example, for row 1 in dataframe 2, Column 1 (i.e. the second column in dataframe 2) should have the value 1.896552, but my output has something else.
Below is the code I used to get the above result.
Joined_df_wna_test = pd.DataFrame()
for col in Unseen_cleaned_rmv_un_2:
    Joined_data = pd.merge(Output_df_unseen, my_dataframe_categorical, on=col, how='inner')
    Joined_df = pd.DataFrame(Joined_data)
    Joined_df_wna_test[col] = Joined_df.Value
Joined_df_wna_test
Could someone please help me overcome this issue?
I found the answer to this.
The how='left' part is what actually makes it keep the order
Joined_data = Unseen_cleaned_rmv_un.merge(my_dataframe_categorical, on=col, how='left')
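A small sketch of the order-preserving left merge, on made-up data (the frame names, the col1 column and the values are hypothetical; only the Value column name comes from the question). It also shows Series.map as an alternative for this kind of lookup, which is a different technique from the one in the answer.

```python
import pandas as pd

# Hypothetical lookup table: one numeric value per category.
lookup = pd.DataFrame({"col1": ["low", "mid", "high"], "Value": [1.0, 2.0, 3.0]})
# Rows to encode, in the order that must be preserved.
data = pd.DataFrame({"col1": ["high", "low", "high", "mid"]})

# how="left" keeps every row of `data` in its original order.
encoded = data.merge(lookup, on="col1", how="left")

# Alternative: look each category up directly, which also keeps row order.
mapped = data["col1"].map(lookup.set_index("col1")["Value"])

print(encoded)
print(mapped)
```

Both approaches assume each category appears only once in the lookup table. With duplicate keys, the merge would produce extra rows and map would raise an error.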

Add a multiindex level Dataframe to another dataframe

I have the following sample dataframe:
df_temp = pd.DataFrame(np.arange(6).reshape(3,-1),
                       index=(0,1,2),
                       columns=pd.MultiIndex.from_tuples([('A', 'Salad'),('B','Burger')]))
I would like to add the column ('A','Salad') to another dataframe, which might be empty or might already have this column.
This is the case where df_output is empty or the column already exists in df_b.
df_output = pd.concat([df_output, df_temp], axis=1)
If the column already exists, it is simply replaced. However, if df_output is empty, the multilevel column index is flattened to a single level, which is something I don't want.
This is the case where df_output already has a column:
And this is how it should look after the addition:
I am trying to use concat, but the MultiIndex level of the columns disappears.
I managed to fix it with the following solution, although I believe it can be done better:
if len(df_output.columns):
    df_output = pd.concat([df_output, df_temp], axis=1).sort_index(level=0, axis=1)
else:
    df_output = df_temp
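The same workaround can be wrapped in a small function. This is only a sketch: add_columns is a hypothetical helper name, the ('B', 'Fries') column is made up, and the function does not deduplicate columns that already exist in df_output.

```python
import numpy as np
import pandas as pd


def add_columns(df_output, df_new):
    """Hypothetical helper: append df_new's columns, keeping MultiIndex columns."""
    if len(df_output.columns) == 0:
        # Nothing to concatenate with yet: reuse the new frame as-is.
        return df_new.copy()
    return pd.concat([df_output, df_new], axis=1).sort_index(level=0, axis=1)


df_temp = pd.DataFrame(
    np.arange(6).reshape(3, -1),
    index=[0, 1, 2],
    columns=pd.MultiIndex.from_tuples([("A", "Salad"), ("B", "Burger")]),
)

# Case 1: the target frame is still empty.
from_empty = add_columns(pd.DataFrame(), df_temp)

# Case 2: the target frame already has a two-level column.
existing = pd.DataFrame(
    [[7], [8], [9]],
    index=[0, 1, 2],
    columns=pd.MultiIndex.from_tuples([("B", "Fries")]),
)
combined = add_columns(existing, df_temp)

print(from_empty.columns)
print(combined.columns)
```

In both cases the result keeps two-level columns, because the concat is only attempted when both frames already have them.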

How can I join two dataframes with different dtypes?

I have a dataframe (df1) with a date range as its index and no columns specified, and another dataframe (df2) with float values in every column.
I tried joining a specific column from df2 to df1 using the .join() method and ended up with all values as NaN in df1. What should I do to solve this?
It's unclear what you mean without an example of the data or its shape, and without more details about what kind of 'join' you're trying to do. It sounds like you are trying to concatenate dataframes without relying on a column or index level name to join on. join and merge, by contrast, match rows on those keys, so if the two frames have no common values in the join key, you'll end up with NaNs. If I'm correct and you just want a concatenation of dataframes, then you can use concat. I can't provide the exact code without more details, but it would look something like this:
new_df = pd.concat([df1, df2[['whatever_column_you_want_to_concatenate']]], axis=1)
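Here is a sketch of how the all-NaN result can arise and one way around it. The data is made up, and it assumes that df2 has a default integer index and that its rows correspond to df1's dates by position. Note that pd.concat with axis=1 also aligns rows on index labels, so in this sketch the index of df2 is replaced before concatenating.

```python
import pandas as pd

# Hypothetical data: df1 has a date index and no columns,
# df2 has a default integer index (an assumption about the question's data).
df1 = pd.DataFrame(index=pd.date_range("2024-01-01", periods=3))
df2 = pd.DataFrame({"price": [1.5, 2.5, 3.5]})

# The index labels do not overlap, so the join finds no matches.
joined = df1.join(df2["price"])
print(joined)

# Assumption: rows correspond by position, so give df2 the same index first.
df2_aligned = df2.copy()
df2_aligned.index = df1.index
new_df = pd.concat([df1, df2_aligned[["price"]]], axis=1)
print(new_df)
```

If the rows do not correspond by position, the two frames need a real shared key (for example a date column in df2 set as its index) before either join or concat can line them up.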
