Formatting Pandas dataframes to highlight column headers and remove blanks - python

The dataframes I have created have their column headers on different rows: the columns I included in the groupby statement sit on a lower row than the others. How do I get all the column headers onto the same row? I've tried the two links below and neither works.
concise way of flattening multiindex columns
After groupby, how to flatten column headers?
Here is an example of a dataframe I created from another one using groupby:
product_splits = dma_fees.groupby(['TRADEABLE_INSTR_NAME','SIG_CURRENCY_CODE']).sum()
product_splits = product_splits.drop('NUMBER_OF_LOTS',axis=1)
product_splits = product_splits.sort_values(by=['DMA_FEE_SUBTOTAL'],ascending=False)
product_splits = product_splits.round({'DMA_FEE_SUBTOTAL': 0}).astype(int)
Here is a picture of the dataframe it outputs; you can see DMA_FEE_SUBTOTAL sits at a higher row/level than the groupby columns. How do I get these all onto the same row?
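For reference, a minimal sketch with invented data (the real dma_fees frame isn't shown): the groupby keys end up in the index after sum(), which is why they render on a lower row, and reset_index() (or passing as_index=False to groupby) moves them back into ordinary columns.

```python
import pandas as pd

# Hypothetical stand-in for dma_fees; only the column names come from the question.
dma_fees = pd.DataFrame({
    'TRADEABLE_INSTR_NAME': ['FTSE', 'FTSE', 'DAX'],
    'SIG_CURRENCY_CODE': ['GBP', 'GBP', 'EUR'],
    'DMA_FEE_SUBTOTAL': [10, 5, 7],
    'NUMBER_OF_LOTS': [1, 1, 2],
})

product_splits = dma_fees.groupby(['TRADEABLE_INSTR_NAME', 'SIG_CURRENCY_CODE']).sum()
product_splits = product_splits.drop('NUMBER_OF_LOTS', axis=1)
product_splits = product_splits.sort_values(by=['DMA_FEE_SUBTOTAL'], ascending=False)

# The groupby keys live in the MultiIndex, which is why they render on a
# lower row; reset_index() turns them back into ordinary columns.
product_splits = product_splits.reset_index()
print(product_splits.columns.tolist())
```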

Related

Transpose column/row, change column name and reset index

I have a Pandas DF and I need to:
Transpose my columns to rows,
Transform these rows to indexes,
Set the current columns as titles for each column (and not as part of the rows).
How can I do that?
Here is my DF before the transposition:
Here is my DF after my failed transposition:
After transposing, use:
df.columns = df.iloc[0]
to set column headers to the first row.
Then use the 'set_axis()' function to set the index labels for your rows (see the pandas documentation for details on this function).
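A self-contained illustration of that answer (the question's real frames aren't shown, so the data below is invented): transpose, promote the first row to headers, drop that row, then relabel the index with set_axis().

```python
import pandas as pd

# Invented example frame standing in for the question's DF.
df = pd.DataFrame({'metric': ['sales', 'costs'],
                   '2020': [100, 60],
                   '2021': [120, 70]})

df = df.T                 # columns become rows
df.columns = df.iloc[0]   # promote the first row ('sales', 'costs') to headers
df = df.iloc[1:]          # drop that row from the body
df = df.set_axis(['FY2020', 'FY2021'], axis=0)  # relabel the row index
print(df)
```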

Finding first repeated consecutive entries in pandas dataframe

I have a dataframe with two columns, Stock and DueDate, where I need to select the first row from each run of repeated consecutive entries based on the Stock column.
df:
I am expecting output like below,
Expected output:
My Approach
The approach I tried is to first flag which rows repeat based on the Stock column by creating a new column, repeated_yes, and then subset only the first row of any run that repeats.
I used the lines of code below to create the new column "repeated_yes":
ss = df.Stock.ne(df.Stock.shift())
df['repeated_yes'] = ss.groupby(ss.cumsum()).cumcount() + 1
The updated dataframe looks like this:
df_new
But I am stuck on subsetting only rows 3 and 8 in order to attain the result. If there is any other effective approach, it would be helpful.
Edited:
I forgot to include the full question: if there are any other rows below the last repeated run in the dataframe df, they should not appear in the output.
Chain another mask, created by Series.duplicated with keep=False, using & for bitwise AND, and filter with boolean indexing:
ss = df.Stock.ne(df.Stock.shift())
ss1 = ss.cumsum().duplicated(keep=False)
df = df[ss & ss1]
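Sketching that answer against invented data (the question's frame isn't reproduced here): the first mask marks where Stock changes, cumsum() numbers each consecutive run, and duplicated(keep=False) keeps only runs longer than one, so ss & ss1 selects the first row of each repeated block.

```python
import pandas as pd

# Invented stand-in for the question's frame.
df = pd.DataFrame({
    'Stock': ['A', 'A', 'A', 'B', 'C', 'C', 'D'],
    'DueDate': ['d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7'],
})

ss = df.Stock.ne(df.Stock.shift())        # True at the start of each run
ss1 = ss.cumsum().duplicated(keep=False)  # True only for runs of length > 1
df = df[ss & ss1]                         # first row of each repeated run
print(df)
```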

Un-merge the column into two columns using pandas python

I have a dataframe generated using a pivot table which looks like this.
I want to unmerge the Score column so it looks like the picture below; that is, I want to split the Score column into two columns.
I tried this code, but it splits into two rows instead of two columns:
final_df = final_df.apply(lambda x: x.str.split(' ').explode())
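No answer is shown for this question, but one common fix (a hedged sketch, since the real pivot table isn't shown and the Score_1/Score_2 names are invented) is str.split with expand=True, which returns the split parts as columns rather than exploding them into rows:

```python
import pandas as pd

# Hypothetical frame whose 'Score' column holds two space-separated values.
final_df = pd.DataFrame({'Name': ['A', 'B'], 'Score': ['10 20', '30 40']})

# expand=True returns a DataFrame of split parts, assigned here to two new columns.
final_df[['Score_1', 'Score_2']] = final_df['Score'].str.split(' ', expand=True)
final_df = final_df.drop(columns='Score')
print(final_df)
```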

Collapsing values of a Pandas column based on Non-NA value of other column

I have data like this in a CSV file, which I am importing into a pandas df.
I want to collapse the values of the Type column by concatenating its strings into one sentence, kept in the first row next to the Date value, while keeping all other rows and values the same.
As shown below.
Edit:
You can try ffill + transform
df1=df.copy()
df1[['Number', 'Date']]=df1[['Number', 'Date']].ffill()
df1.Type=df1.Type.fillna('')
s=df1.groupby(['Number', 'Date']).Type.transform(' '.join)
df.loc[df.Date.notnull(),'Type']=s
df.loc[df.Date.isnull(),'Type']=''
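A runnable version of that answer with invented data (the question's CSV isn't shown; Number and Date are filled only on each group's first row): ffill propagates the group keys so transform(' '.join) can build one sentence per group, which is then written back only to rows where Date is present.

```python
import pandas as pd

# Invented CSV-like data: Number/Date appear only on a group's first row.
df = pd.DataFrame({
    'Number': [1, None, None, 2, None],
    'Date':   ['2021-01-01', None, None, '2021-01-02', None],
    'Type':   ['foo', 'bar', 'baz', 'spam', 'eggs'],
})

df1 = df.copy()
df1[['Number', 'Date']] = df1[['Number', 'Date']].ffill()  # propagate group keys
df1.Type = df1.Type.fillna('')
s = df1.groupby(['Number', 'Date']).Type.transform(' '.join)  # one sentence per group
df.loc[df.Date.notnull(), 'Type'] = s   # keep the sentence on the first row
df.loc[df.Date.isnull(), 'Type'] = ''   # blank out the remaining rows
print(df)
```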

How can I groupby and aggregate pandas dataframe with many columns

I am working on a pandas dataframe with 168 columns. The first three columns contain the name of the country, latitude, and longitude. The rest of the columns contain numerical data. Each row represents a country, but some countries have multiple rows, and I need to aggregate those rows by summing. I can aggregate the first three columns with the following code:
df = df.groupby('Country', as_index=False).agg({'Lat':'first','Long':'first'})
However, I couldn't find a way to include the remaining 165 columns in that code without explicitly writing out all the column names. In addition, the column names represent dates and are named like 5/27/20, 5/28/20, 5/29/20, etc., so I need to keep the column names.
How can I do that? Thanks.
Maybe you can generate the dictionary from the column names, keeping 'first' for Lat/Long and 'sum' for the rest (the groupby key itself should be left out of the dict):
df = df.groupby('Country', as_index=False).agg({c: ('first' if c in ('Lat', 'Long') else 'sum') for c in df.columns if c != 'Country'})
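A runnable sketch of that idea with toy data (the real columns are the question's 165 date strings; the values here are invented): build the aggregation dict from df.columns, taking 'first' for Lat/Long and 'sum' for the date columns to match the question's summing requirement.

```python
import pandas as pd

# Toy version of the frame: two rows for 'France' that must be summed.
df = pd.DataFrame({
    'Country': ['France', 'France', 'Spain'],
    'Lat':     [46.2, 46.2, 40.4],
    'Long':    [2.2, 2.2, -3.7],
    '5/27/20': [1, 2, 5],
    '5/28/20': [3, 4, 6],
})

# Generate the aggregation dict from the column names: keep the first
# Lat/Long per country, sum everything else, skip the groupby key.
agg = {c: ('first' if c in ('Lat', 'Long') else 'sum')
       for c in df.columns if c != 'Country'}
df = df.groupby('Country', as_index=False).agg(agg)
print(df)
```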
