pandas index and columns in same line - python

I have a MultiIndex with two columns in a DataFrame, as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[1, 2, 3, 0], [1, 5, 6, 0], [2, 5, 6, 0]]),columns=['a','b','c', 'd'])
df
df.set_index(['a', 'b'],inplace=True)
df
That prints the index names on a different line from the column headers:
     c  d
a b
1 2  3  0
  5  6  0
2 5  6  0
How can the index and column headers be put on the same line without losing the index columns?
Desired output, with index and columns on the same line:
a b  c  d
1 2  3  0
  5  6  0
2 5  6  0

That is expected: df.set_index(['a', 'b'], inplace=True) worked correctly.
The two-line header is only how pandas displays a DataFrame with named index levels; no data is changed or lost.
If you save the DataFrame (e.g. to CSV or Excel), the index columns are written out correctly.
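To see that the display is purely cosmetic, you can round-trip the frame through CSV (the header comes out as a single line) or flatten the display by resetting the index, a small sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3, 0], [1, 5, 6, 0], [2, 5, 6, 0]]),
                  columns=['a', 'b', 'c', 'd'])
df.set_index(['a', 'b'], inplace=True)

# The index survives serialization: the CSV header is one line, a,b,c,d
print(df.to_csv())

# For display only, move the index back into columns and hide row numbers
print(df.reset_index().to_string(index=False))
```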

Related

How to create multiple columns in Pandas Dataframe?

I have data as you can see in the terminal, and I need it converted to the Excel sheet format shown in the Excel file, by creating multiple levels in the columns.
I researched this and tried several approaches without reaching my goal. I then found "transpose", which gave me the shape I need, but unfortunately it reshaped columns into rows, so I got the wrong data ordering.
Current result:
Desired result:
What can I try next?
You can use the pivot() function and reorder the multi-level columns.
Before that, index/group the data by repeated iterations/rounds:
import itertools
import pandas as pd

data = [
    (2, 0, 0, 1),
    (10, 2, 5, 3),
    (2, 0, 0, 0),
    (10, 1, 1, 1),
    (2, 0, 0, 0),
    (10, 1, 2, 1),
]
columns = ["player_number", "cel1", "cel2", "cel3"]
df = pd.DataFrame(data=data, columns=columns)

# rounds per player (here 3), used to build the round labels
df_nbr_plr = df[["player_number"]].groupby("player_number").agg(cnt=("player_number", "count"))
# repeat each round number once per player: 0, 0, 1, 1, 2, 2
df["round"] = list(itertools.chain.from_iterable(
    itertools.repeat(x, df_nbr_plr.shape[0]) for x in range(df_nbr_plr.iloc[0, 0])))
[Out]:
   player_number  cel1  cel2  cel3  round
0              2     0     0     1      0
1             10     2     5     3      0
2              2     0     0     0      1
3             10     1     1     1      1
4              2     0     0     0      2
5             10     1     2     1      2
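The round column can also be derived without itertools, using cumcount (a sketch; it assumes each round's rows appear in order, as in the sample):

```python
import pandas as pd

data = [(2, 0, 0, 1), (10, 2, 5, 3), (2, 0, 0, 0),
        (10, 1, 1, 1), (2, 0, 0, 0), (10, 1, 2, 1)]
df = pd.DataFrame(data, columns=["player_number", "cel1", "cel2", "cel3"])

# cumcount numbers each player's rows 0, 1, 2, ... in order of appearance
df["round"] = df.groupby("player_number").cumcount()
```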
Now, pivot and reorder the column levels:
df = df.pivot(index="round", columns="player_number").reorder_levels([1,0], axis=1).sort_index(axis=1)
[Out]:
player_number     2              10
               cel1 cel2 cel3 cel1 cel2 cel3
round
0                 0    0    1    2    5    3
1                 0    0    0    1    1    1
2                 0    0    0    1    2    1
This can also be done with unstack after setting player__number as the index. You then have to reorder the MultiIndex columns and fill missing values/delete duplicates:
import pandas as pd

data = {"player__number": [2, 10, 2, 10, 2, 10],
        "cel1": [0, 2, 0, 1, 0, 1],
        "cel2": [0, 5, 0, 1, 0, 2],
        "cel3": [1, 3, 0, 1, 0, 1],
        }
df = pd.DataFrame(data).set_index('player__number', append=True)
# unstack, then reorder and sort the column levels
df = df.unstack('player__number').reorder_levels([1, 0], axis=1).sort_index(axis=1)
# forward-fill the staggered values and keep every second row
df = df.ffill().iloc[1::2].reset_index(drop=True)
df.to_excel('output.xlsx')
Output:

Pandas compare columns and drop rows based on values in another column

Is there a way to drop values in one column based on comparison with another column? Assuming the columns are of equal length
For example, iterate through each row and drop values in col1 greater than values in col2? Something like this:
df['col1'].drop.where(df['col1']>=df['col2']
import pandas as pd

d = {
    '1': [1, 2, 3, 4, 5],
    '2': [2, 4, 1, 6, 3],
}
df = pd.DataFrame(d)
print(df)

# drop the rows where column '1' is greater than or equal to column '2'
dfd = df.drop(df[df['1'] >= df['2']].index)
print('update')
print(dfd)
Output
   1  2
0  1  2
1  2  4
2  3  1
3  4  6
4  5  3
update
   1  2
0  1  2
1  2  4
3  4  6
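Equivalently, a plain boolean mask avoids the drop/index round trip (a sketch of the same filter):

```python
import pandas as pd

df = pd.DataFrame({'1': [1, 2, 3, 4, 5], '2': [2, 4, 1, 6, 3]})

# keep only the rows where column '1' is smaller than column '2'
out = df[df['1'] < df['2']]
print(out)
```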

Delete the rows with all value zero and keep one specific column in dataframe

I have a DataFrame and I want to delete the rows whose values are all zero. However, the first column is the id, and I want to keep that column. I tried
df = df[(df.T != 0).any()]
however, this checks every column, including id, so it does not give me the rows I want.
import pandas as pd
import numpy as np

df = pd.DataFrame()
df['id'] = ['a', 'b', 5, 'd']
df['b'] = [0, 9, 0, 2]
df['c'] = [0, 2, 0, 2]
df['d'] = [0, 7, 0, 5]
Here is the new DataFrame which I want.
You could create a boolean mask that returns True if none of the non-"id" column values are 0 for each row, False otherwise:
out = df[df.drop(columns='id').ne(0).all(axis=1)]
Output:
id b c d
1 b 9 2 7
3 d 2 2 5
Count the zeros in each row: if the result is less than 3 (the number of non-id columns), keep the row:
df[df.eq(0).sum(1)<3]
Out[438]:
id b c d
1 b 9 2 7
3 d 2 2 5
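Note that both answers coincide here only because every row containing a zero is all-zero in this sample. To drop only the rows whose non-id values are all zero (keeping rows that mix zeros and non-zeros), a variant of the first answer with any() can be used, a sketch:

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'b', 5, 'd'],
                   'b': [0, 9, 0, 2],
                   'c': [0, 2, 0, 2],
                   'd': [0, 7, 0, 5]})

# keep a row if at least one non-id value is non-zero
out = df[df.drop(columns='id').ne(0).any(axis=1)]
print(out)
```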

Keep/select columns with the n highest values in last row of a Pandas dataframe

So I have a dataframe as follows:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[1, 2, 3], [4, 3, 6], [7, 2, 9]]),
columns=['a', 'b', 'c'])
df
Output:
   a  b  c
0  1  2  3
1  4  3  6
2  7  2  9
I want to select or keep the two columns with the highest values in the last row. What is the best way to approach this?
So in this case I want to keep column 'a' (because of the value 7) and column 'c' (because of the value 9).
Try:
df = df[df.iloc[-1].nlargest(2).index]
Output:
c a
0 3 1
1 6 4
2 9 7
If you want to keep original column sequence as well, you can use Index.intersection() together with .nlargest(), as follows:
df[df.columns.intersection(df.iloc[-1].nlargest(2).index, sort=False)]
Result:
a c
0 1 3
1 4 6
2 7 9
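Another way to keep the original column order is a boolean mask over the last row (a sketch; note that ties in the last row could select more than two columns):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 3, 6], [7, 2, 9]]),
                  columns=['a', 'b', 'c'])

last = df.iloc[-1]
# mask the columns whose last-row value is among the two largest
out = df.loc[:, last.isin(last.nlargest(2))]
print(out)
```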

Pandas merge duplicate DataFrame columns preserving column names

How can I merge duplicate DataFrame columns and also keep all original column names?
e.g. If I have the DataFrame
df = pd.DataFrame({"col1": [0, 0, 1, 2, 5, 3, 7],
                   "col2": [0, 1, 2, 3, 3, 3, 4],
                   "col3": [0, 1, 2, 3, 3, 3, 4]})
I can remove the duplicate columns (yes the transpose is slow for large DataFrames) with
df.T.drop_duplicates().T
but this only preserves one column name per unique column
col1 col2
0 0 0
1 0 1
2 1 2
3 2 3
4 5 3
5 3 3
6 7 4
How can I keep the information on which columns were merged? e.g. something like
[col1] [col2, col3]
0 0 0
1 0 1
2 1 2
3 2 3
4 5 3
5 3 3
6 7 4
Thanks!
# group the columns by their values (each row's values act as one grouping key;
# note that groupby(..., axis=1) is deprecated in pandas >= 2.1)
grouped_columns = df.groupby(list(df.values), axis=1).apply(lambda g: g.columns.tolist())
# pick one column from each group of duplicates
unique_df = df.loc[:, grouped_columns.str[0]]
# a list cannot be used as a column label, so join the grouped names instead
unique_df.columns = grouped_columns.apply("-".join)
unique_df
I also used T and a tuple key to group the columns:
def f(x):
    d = x.iloc[[0]]                         # keep one representative per group
    d.index = ['-'.join(x.index.tolist())]  # join the original column names
    return d

df.T.groupby(df.apply(tuple), group_keys=False).apply(f).T
