Add a multiindex level Dataframe to another dataframe - python

I have the following sample dataframe:
df_temp = pd.DataFrame(np.arange(6).reshape(3, -1),
                       index=(0, 1, 2),
                       columns=pd.MultiIndex.from_tuples([('A', 'Salad'), ('B', 'Burger')]))
I would like to put the column ('A', 'Salad') into another dataframe, df_output, which may be empty or may already contain this column.
df_output = pd.concat([df_output, df_temp], axis=1)
If the column already exists, this simply replaces it. However, when df_output is empty, concat collapses the MultiIndex columns into a single level, which is something I don't want.
(Screenshots showed df_output with an existing column, and how it should look after the addition.)
I am trying to use concat, but the MultiIndex level of the columns is disappearing.

I managed to fix it with the following solution, although I believe it can be done better:
if len(df_output.columns):
    df_output = pd.concat([df_output, df_temp], axis=1).sort_index(level=0, axis=1)
else:
    df_output = df_temp
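One alternative sketch avoids the if/else by packing both cases into a small helper; the helper name `add_column` and the reuse of the question's sample data are assumptions for illustration, not the asker's code:

```python
import numpy as np
import pandas as pd

def add_column(df_output, df_temp, col):
    """Insert one MultiIndex column into df_output, replacing it if it
    already exists; an empty df_output just takes the column as-is, so
    the MultiIndex column levels are preserved."""
    piece = df_temp[[col]]
    if df_output.empty:
        return piece.copy()
    # drop any existing copy of the column, then concat the new one
    df_output = df_output.drop(columns=[col], errors='ignore')
    return pd.concat([df_output, piece], axis=1).sort_index(axis=1)

df_temp = pd.DataFrame(np.arange(6).reshape(3, -1),
                       index=(0, 1, 2),
                       columns=pd.MultiIndex.from_tuples([('A', 'Salad'), ('B', 'Burger')]))

out = add_column(pd.DataFrame(), df_temp, ('A', 'Salad'))  # empty target
out = add_column(out, df_temp, ('A', 'Salad'))             # column already present
```

Because the empty case never goes through concat, the two-level columns survive in both branches.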

Related

Pandas: Sort Dataframe if Column Value Exists in another Dataframe

I have a dataframe with two columns of unique numbers; this is my reference dataframe (df_reference). From another dataframe (df_data) I want to get the rows whose column values exist in this reference dataframe. I tried stuff like:
df_new = df_data[df_data['ID'].isin(df_reference)]
However, like this I can't get any results. What am I doing wrong here?
From what I see, you are passing the whole dataframe to the .isin() method.
Try:
df_new = df_data[df_data['ID'].isin(df_reference['ID'])]
Alternatively, set the ID column as the index of df_data first. Then you could do
df_data = df_data.set_index('ID')
matching_index = df_reference['ID']
df_new = df_data.loc[matching_index, :]
This should solve the issue.
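A minimal, self-contained illustration of the difference (toy data assumed):

```python
import pandas as pd

df_reference = pd.DataFrame({'ID': [2, 4]})
df_data = pd.DataFrame({'ID': [1, 2, 3, 4], 'value': list('abcd')})

# Passing the whole dataframe tests membership against its column labels,
# so nothing matches; pass the ID column itself instead.
df_new = df_data[df_data['ID'].isin(df_reference['ID'])]
print(df_new['ID'].tolist())  # [2, 4]
```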

Insert a dataframe between dataframes via the nested append method

The following code works fine to insert one dataframe underneath the other using a nesting of the append method.
for sheet_name, df in Input_Data.items():
    df1 = df[126:236]
    df = df1.sort_index(ascending=False)
    Indexer = df.columns.tolist()
    df = [pd.concat([df[Indexer[0]], df[Indexer[num]]], axis=1) for num in [1, 2, 3, 4, 5, 6]]
    df = [df[num].astype(str).agg(','.join, axis=1) for num in [0, 1, 2, 3, 4, 5]]
    df = pd.DataFrame(df)
    df = df.loc[0].append(df.loc[1].append(df.loc[2].append(df.loc[3].append(df.loc[4].append(df.loc[5])))))
However, I need to add additional dataframes (one row, one column) in between the df.loc[i] pieces, and as a first step I tried to insert a dataframe at the top of df.loc[0] via
df = df_1st.append(df, ignore_index=True)
which yields the following error: cannot reindex from a duplicate axis.
It seems my dataframe df has duplicate indices. I'm not sure how to proceed. Perhaps the nested method is not the best approach?
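Rather than nesting append calls (Series.append was removed in pandas 2.0), the pieces and the extra one-row frames can be interleaved in a list and stacked with a single pd.concat; ignore_index=True rebuilds a fresh 0..n-1 index, which sidesteps the duplicate-axis error. A sketch with toy stand-in data:

```python
import pandas as pd

# Toy stand-ins for the df.loc[i] rows and the one-row separator frames
pieces = [pd.Series(['a', 'b']), pd.Series(['c', 'd'])]
separator = pd.Series(['---'])

# Interleave the separator between the pieces, then stack everything once
interleaved = []
for i, piece in enumerate(pieces):
    if i > 0:
        interleaved.append(separator)
    interleaved.append(piece)

stacked = pd.concat(interleaved, ignore_index=True)  # fresh, unique index
print(stacked.tolist())  # ['a', 'b', '---', 'c', 'd']
```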

Unable to join to dataframe in pandas

I have two df. The first df has a MultiIndex and the other one has a typical single index.
Figure 1: Multiindex df
and
Figure 2: Single indexing
Upon joining these two df, I get the following error:
cannot join with no overlapping index names
I suspect this error is due to the index column name in the first df (Figure 1).
Even swapping the index name with a typical numeric value does not help:
Figure 2: Multiindex df
May I know how to address this error?
Thanks in advance for the time taken.
You can convert first level in MultiIndex to column before merge:
df = (df1.reset_index(level=0)
         .merge(df2, left_index=True, right_index=True)
         .set_index('name', append=True)
         .swaplevel(1, 0))
Or if use join:
df = df1.reset_index(level=0).join(df2).set_index('name', append=True).swaplevel(1, 0)
If you are trying to do something such as df.rolling(window).cov() / df.rolling(window).var(), where you are essentially merging two multi-index dataframes, what happened to me was that I had to give the index a name, since pandas doesn't know the name of the index to match on, which is why you are getting this error. If you are using something like yfin to get data, you won't run into this issue, because the index always defaults to 'Date'. Here is a simple one-liner to fix this:
df.index.rename('Date', inplace=True)
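A small runnable sketch of the reset_index/join pattern above, with made-up data and the index level named 'name' as in the first answer:

```python
import pandas as pd

# df1 has a two-level index ('name' plus an unnamed level); df2 a plain index
idx = pd.MultiIndex.from_tuples([('x', 0), ('x', 1)], names=['name', None])
df1 = pd.DataFrame({'a': [1, 2]}, index=idx)
df2 = pd.DataFrame({'b': [10, 20]}, index=[0, 1])

# Move the extra level into a column, join on the remaining plain index,
# then restore the level and its position
df = (df1.reset_index(level=0)
         .join(df2)
         .set_index('name', append=True)
         .swaplevel(1, 0))
print(df)
```

A direct df1.join(df2) here would raise the "no overlapping index names" error, because pandas cannot tell which index level of df1 should match df2's unnamed index.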

Pandas concat columns

I have two df-s:
I want to concatenate along the columns, i.e. get a 1000x61118 DataFrame, so I'm doing:
df_full = pd.concat([df_dev, df_temp2], axis=1)
df_full
This, however, yields a 2000x61118 df and fills everything with NaNs, and I have no idea why. What could cause this behaviour?
Create default index values with DataFrame.reset_index(drop=True) so both DataFrames align correctly:
df_full = pd.concat([df_dev.reset_index(drop=True), df_temp2.reset_index(drop=True)], axis=1)
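The doubling happens because concat aligns on index labels, not on row positions; a toy reproduction (frame names and data assumed):

```python
import pandas as pd

# Row positions match, but the index labels do not
df_dev = pd.DataFrame({'a': [1, 2]}, index=[0, 1])
df_temp2 = pd.DataFrame({'b': [3, 4]}, index=[10, 11])

bad = pd.concat([df_dev, df_temp2], axis=1)     # union of labels, NaN-padded
good = pd.concat([df_dev.reset_index(drop=True),
                  df_temp2.reset_index(drop=True)], axis=1)

print(bad.shape, good.shape)  # (4, 2) (2, 2)
```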

Replace values in column based on condition, then return dataframe

I'd like to replace some values in the first column of a dataframe with a dummy.
df[[0]].replace(["x"], ["dummy"])
The problem here is that the values in the first column are replaced, but not as part of the dataframe.
print(df)
yields the dataframe with the original data in column 1. I've tried
df[(df[[0]].replace(["x"], ["dummy"]))]
which doesn't work either..
replace returns a copy of the data by default, so you need to assign the result back to the dataframe:
df[0] = df[0].replace(["x"], ["dummy"])
Note that df[[0]].replace(["x"], ["dummy"], inplace=True) will not work here: df[[0]] creates a temporary copy, so the in-place replacement never reaches df.
see the docs
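A quick check of the assignment form (toy frame assumed):

```python
import pandas as pd

df = pd.DataFrame({0: ['x', 'y'], 1: ['x', 'z']})

# Assign the replacement back; a chained df[[0]].replace(..., inplace=True)
# would act on a temporary copy and leave df unchanged
df[0] = df[0].replace('x', 'dummy')
print(df[0].tolist())  # ['dummy', 'y']
```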
