pandas dataframe reshape after pivot - python

The pivot code:
result = pandas.pivot_table(result, values=['value'], index=['index'], columns=['columns'], fill_value=0)
The result:
value value value
columns col1 col2 col3
index
idx1 14 1 1
idx2 2 0 1
idx3 6 0 0
I tried:
result.columns = result.columns.get_level_values(1)
Then I got this:
columns col1 col2 col3
index
idx1 14 1 1
idx2 2 0 1
idx3 6 0 0
Actually what I would like is this one:
index col1 col2 col3
idx1 14 1 1
idx2 2 0 1
idx3 6 0 0
Is there any way to achieve this? Any help is really appreciated. Thank you in advance.

You need to remove the index name with rename_axis (new in pandas 0.18.0):
df = df.rename_axis(None)
If you also need to remove the columns name, use:
df = df.rename_axis(None, axis=1)
If you use an older version of pandas, use:
df.columns.name = None
df.index.name = None
Sample (if you remove the [] from pivot_table, you remove the MultiIndex from the columns):
print (result)
index columns value
0 1 Toys 5
1 2 Toys 6
2 2 Cars 7
3 1 Toys 2
4 1 Cars 9
print (pd.pivot_table(result, index='index',columns='columns',values='value', fill_value=0)
.rename_axis(None)
.rename_axis(None, axis=1))
Cars Toys
1 9 3.5
2 7 6.0
If you use [], you get (note the chained calls need enclosing parentheses to be valid Python):
result = (pd.pivot_table(result, values=['value'], index=['index'], columns=['columns'], fill_value=0)
          .rename_axis(None)
          .rename_axis((None, None), axis=1))
print (result)
value
Cars Toys
1 9 3.5
2 7 6.0

Consider this dataframe:
results = pd.DataFrame(
[
[14, 1, 1],
[2, 0, 1],
[6, 0, 0]
],
pd.Index(['idx1', 'idx2', 'idx3'], name='index'),
pd.MultiIndex.from_product([['value'], ['col1', 'col2', 'col3']], names=[None, 'columns'])
)
print(results)
value
columns col1 col2 col3
index
idx1 14 1 1
idx2 2 0 1
idx3 6 0 0
Then all you need is:
print(results.value.rename_axis(None, axis=1)) # <---- Solution
col1 col2 col3
index
idx1 14 1 1
idx2 2 0 1
idx3 6 0 0
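As an alternative to selecting the `value` level by attribute, newer pandas (0.24+) can drop the top column level directly with droplevel; a minimal sketch rebuilding the frame above:

```python
import pandas as pd

# rebuild the example frame from the question
results = pd.DataFrame(
    [[14, 1, 1], [2, 0, 1], [6, 0, 0]],
    index=pd.Index(['idx1', 'idx2', 'idx3'], name='index'),
    columns=pd.MultiIndex.from_product(
        [['value'], ['col1', 'col2', 'col3']], names=[None, 'columns']
    ),
)

# drop the top column level, then clear the remaining axis names
flat = results.droplevel(0, axis=1).rename_axis(None, axis=1).rename_axis(None)
print(flat)
```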

Related

Sort dataframe based on minimum value of two columns

Let's assume I have the following dataframe:
import pandas as pd
d = {'col1': [1, 2,3,4], 'col2': [4, 2, 1, 3], 'col3': [1,0,1,1], 'outcome': [1,0,1,0]}
df = pd.DataFrame(data=d)
I want this dataframe sorted by col1 and col2 on the minimum value. The order of the indexes should be 2, 0, 1, 3.
I tried this with df.sort_values(by=['col2', 'col1']), but then it sorts by col2 first and then by col1. Is there any way to order by taking the minimum of two columns?
Using numpy.lexsort:
order = np.lexsort(np.sort(df[['col1', 'col2']])[:, ::-1].T)
out = df.iloc[order]
Output:
col1 col2 col3 outcome
2 3 1 1 1
0 1 4 1 1
1 2 2 0 0
3 4 3 1 0
Note that you can easily handle any number of columns:
df.iloc[np.lexsort(np.sort(df[['col1', 'col2', 'col3']])[:, ::-1].T)]
col1 col2 col3 outcome
1 2 2 0 0
2 3 1 1 1
0 1 4 1 1
3 4 3 1 0
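To unpack why this works: np.sort(..., axis=1) puts each row's smaller value first, and the [:, ::-1] flip plus transpose arranges the keys so that np.lexsort uses the row minimum as the primary key and the row maximum as the tie-breaker (lexsort treats the last row of its input as the primary key). A small sketch with the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [4, 2, 1, 3]})

keys = np.sort(df[['col1', 'col2']].to_numpy(), axis=1)  # per-row (min, max)
# transpose and reverse so that primary key = row minimum, secondary = row maximum
order = np.lexsort(keys[:, ::-1].T)
print(order)  # [2 0 1 3]
```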
One way (not the most efficient):
idx = df[['col2', 'col1']].apply(lambda x: tuple(sorted(x)), axis=1).sort_values().index
Output:
>>> df.loc[idx]
col1 col2 col3 outcome
2 3 1 1 1
0 1 4 1 1
1 2 2 0 0
3 4 3 1 0
>>> idx
Int64Index([2, 0, 1, 3], dtype='int64')
You can decorate-sort-undecorate, where the decoration is the minimal and the other (i.e., maximal) value per row:
cols = ["col1", "col2"]
(df.assign(_min=df[cols].min(axis=1), _other=df[cols].max(axis=1))
.sort_values(["_min", "_other"])
.drop(columns=["_min", "_other"]))
to get
col1 col2 col3 outcome
2 3 1 1 1
0 1 4 1 1
1 2 2 0 0
3 4 3 1 0
I would compute min(col1, col2) as a new column and then sort by it. Note that this sorts by the minimum alone, so the tie between rows 0 and 2 is not broken by the larger value, which is why the order below differs from the one requested.
import pandas as pd
d = {'col1': [1, 2,3,4], 'col2': [4, 2, 1, 3], 'col3': [1,0,1,1], 'outcome': [1,0,1,0]}
df = pd.DataFrame(data=d)
df['colmin'] = df[['col1','col2']].min(axis=1) # compute min
df = df.sort_values(by='colmin').drop(columns='colmin') # sort then drop min
print(df)
gives output
col1 col2 col3 outcome
0 1 4 1 1
2 3 1 1 1
1 2 2 0 0
3 4 3 1 0

How to swap column1 value with column2 value under a condition in Pandas

I'd like to swap column1 value with column2 value if column1.value >= 14 in pandas!
col1  col2
16    1
3     2
4     3
This should become:
col1  col2
1     16
3     2
4     3
Thanks!
Use Series.mask and re-assign the two columns' values:
m = df["col1"].ge(14)
out = df.assign(
col1=df["col1"].mask(m, df["col2"]),
col2=df["col2"].mask(m, df["col1"])
)
Output:
col1 col2
0 1 16
1 3 2
2 4 3
A simple one-liner solution:
df.loc[df['col1'] >= 14,['col1','col2']] = df.loc[df['col1'] >= 14,['col2','col1']].values
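An equivalent sketch using numpy.where; both right-hand sides of the tuple assignment are evaluated before either column is written, so the swap cannot clobber itself (data taken from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [16, 3, 4], 'col2': [1, 2, 3]})

m = df['col1'] >= 14
# both new columns are computed first, then assigned together
df['col1'], df['col2'] = np.where(m, df['col2'], df['col1']), np.where(m, df['col1'], df['col2'])
print(df)
```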

How to enter the value of one index and column into a new cell with +1 in the iteration?

I have the following DataFrame named df1:
col1  col2  col3
5     3     50
10    4     3
2     0     1
I would like to create a loop that adds a new column called "Total", which takes the value of col1 at index 0 (5) and enters that value under "Total" at index 0. The next iteration takes col2 at index 1 (4), and that value goes under "Total" at index 1. This continues until all columns and rows are completed.
The ideal output will be the following:
df1
col1  col2  col3  Total
5     3     50    5
10    4     3     4
2     0     1     1
I have the following code but I would like to find a more efficient way of doing this as I have a large DataFrame:
df1.iloc[0,3] = df1.iloc[0,0]
df1.iloc[1,3] = df1.iloc[1,1]
df1.iloc[2,3] = df1.iloc[2,2]
Thank you!
Numpy has a built in diagonal function:
import pandas as pd
import numpy as np
df = pd.DataFrame({'col1': [5, 10, 2], 'col2': [3, 4, 0], 'col3': [50, 3, 1]})
df['Total'] = np.diag(df)
print(df)
Output
col1 col2 col3 Total
0 5 3 50 5
1 10 4 3 4
2 2 0 1 1
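np.diag only covers the square case; if the frame has more rows than columns, one hedged generalization (wrapping back to the first column, mirroring an i % 3 pattern) is NumPy fancy indexing. The fourth row here is hypothetical, added only to show the wrap-around:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [5, 10, 2, 7], 'col2': [3, 4, 0, 8], 'col3': [50, 3, 1, 9]})

rows = np.arange(len(df))
cols = rows % df.shape[1]            # 0, 1, 2, 0, ... wraps past the last column
df['Total'] = df.to_numpy()[rows, cols]
print(df)
```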
You can try apply on rows
df['Total'] = df.apply(lambda row: row.iloc[row.name], axis=1)
col1 col2 col3 Total
0 5 3 50 5
1 10 4 3 4
2 2 0 1 1
Hope this logic helps:
length = len(df1["col1"])
# in i % 3, 3 is the number of columns (col1, col2, col3)
total = pd.Series([df1.iloc[i, i % 3] for i in range(length)])
df1["Total"] = total  # add the total Series to df1 as a new column

Aggregate unique values of a column based on group by multiple columns and count unique - pandas

ID col1 col2 col3
I1 1 0 1
I2 1 0 1
I3 0 1 0
I4 0 1 0
I5 0 0 1
This is my dataframe. I am looking to aggregate the ID values grouped by col1, col2, and col3, and I also want a count column along with this.
Expected output :
ID_List Count
[I1,I2] 2
[I3,I4] 2
[I5] 1
My code
cols_to_group = ['col1','col2','col3']
data = pd.DataFrame(df.groupby(cols_to_group)['ID'].nunique()).reset_index(drop=True)
data.head()
ID
0 2
1 2
2 1
You can do a groupby.agg():
df.groupby(['col1','col2','col3'], sort=False).ID.agg([list,'count'])
Output:
list count
col1 col2 col3
1 0 1 [I1, I2] 2
0 1 0 [I3, I4] 2
0 1 [I5] 1
You need to aggregate with a function such as sum, count, etc. In this case, count. Try the code below.
df.groupby(['col1','col2','col3']).ID.agg([list,'count']).reset_index(drop=True)
Output:
list count
0 [I1, I2] 2
1 [I3, I4] 2
2 [I5] 1
Here you go:
grouped = df.groupby(['col1', 'col2', 'col3'], sort=False).ID
df = pd.DataFrame({
'ID_List': grouped.aggregate(list),
'Count': grouped.count()
}).reset_index(drop=True)
print(df)
Output:
ID_List Count
0 [I1, I2] 2
1 [I3, I4] 2
2 [I5] 1
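With pandas 0.25+, named aggregation gives the exact ID_List / Count column names in a single call; a sketch with the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['I1', 'I2', 'I3', 'I4', 'I5'],
    'col1': [1, 1, 0, 0, 0],
    'col2': [0, 0, 1, 1, 0],
    'col3': [1, 1, 0, 0, 1],
})

# named aggregation: output column name = (source column, aggregation)
out = (df.groupby(['col1', 'col2', 'col3'], sort=False)
         .agg(ID_List=('ID', list), Count=('ID', 'count'))
         .reset_index(drop=True))
print(out)
```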

How to make a sum row for two columns python dataframe

I have a pandas dataframe:
Col1 Col2 Col3
0 1 2 3
1 2 3 4
And I want to add a new row summing over two columns [Col1,Col2] like:
Col1 Col2 Col3
0 1 2 3
1 2 3 4
Total 3 5 NaN
Ignoring Col3. What should I do? Thanks in advance.
You can use the pandas.DataFrame.append and pandas.DataFrame.sum methods:
df2 = df.append(df.sum(), ignore_index=True)
df2.iloc[-1, df2.columns.get_loc('Col3')] = np.nan
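Note that DataFrame.append was removed in pandas 2.0; an equivalent sketch with pd.concat (columns missing from the appended row become NaN automatically):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2], 'Col2': [2, 3], 'Col3': [3, 4]})

total = df[['Col1', 'Col2']].sum()         # sum only Col1 and Col2
total.name = 'Total'                       # becomes the new row's index label
out = pd.concat([df, total.to_frame().T])  # Col3 absent from the row -> NaN
print(out)
```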
You can use pd.DataFrame.loc. Note the final column will be converted to float since NaN is considered float:
import numpy as np
df.loc['Total'] = [df['Col1'].sum(), df['Col2'].sum(), np.nan]
df[['Col1', 'Col2']] = df[['Col1', 'Col2']].astype(int)
print(df)
Col1 Col2 Col3
0 1 2 3.0
1 2 3 4.0
Total 3 5 NaN
