How to turn column header into pandas index - python

So my pandas df currently looks something like this:
Detail  Person 1  Person 2  Person 3
Name    Steve     Larry     Dave
Age     45        56        67
Hobbie  Running   Skating   Painting
But I want to reshape it to this:
Person    Name   Age  Hobbie
Person 1  Steve  45   Running
Person 2  Larry  56   Skating
Person 3  Dave   67   Painting
Anyone know a way of doing this?

Use:
out = (df.set_index('Detail').T
         .rename_axis('Person')
         .reset_index()
         .rename_axis(columns=None)
      )
Output:
Person Name Age Hobbie
0 Person 1 Steve 45 Running
1 Person 2 Larry 56 Skating
2 Person 3 Dave 67 Painting

All you need to do is transpose the dataframe with df.T and rename the column with df.rename(). There is a catch, though: df.T also transposes the default integer index, so 0, 1, 2 end up as the column labels. To work around that, set the renamed column as the index before transposing.
Here is the step by step code:
data.csv:
Detail,Person 1,Person 2,Person 3
Name,Steve,Larry,Dave
Age,45,56,67
Hobbie,Running,Skating,Painting
Reading from file:
import pandas as pd
df = pd.read_csv("data.csv")
print(df)
output:
Detail Person 1 Person 2 Person 3
0 Name Steve Larry Dave
1 Age 45 56 67
2 Hobbie Running Skating Painting
Changing the column name:
df = df.rename(columns={"Detail":"Person"})
print(df)
output:
Person Person 1 Person 2 Person 3
0 Name Steve Larry Dave
1 Age 45 56 67
2 Hobbie Running Skating Painting
Transposing with new index column:
df = df.set_index('Person').T
print(df)
output:
Person Name Age Hobbie
Person 1 Steve 45 Running
Person 2 Larry 56 Skating
Person 3 Dave 67 Painting
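Putting the steps together, here is a minimal self-contained sketch. It reads the same CSV text from a string (io.StringIO stands in for data.csv, purely so the example runs on its own) and finishes with reset_index() so that Person becomes a regular column, as in the desired output. One thing to watch for: every original column mixed names, numbers and hobbies, so Age comes out of the transpose as strings. The sketch converts it with pd.to_numeric, which is an extra step that neither answer above includes.

```python
import io

import pandas as pd

# Same content as data.csv above; a string is used only to keep the example self-contained.
csv_text = """Detail,Person 1,Person 2,Person 3
Name,Steve,Larry,Dave
Age,45,56,67
Hobbie,Running,Skating,Painting
"""
df = pd.read_csv(io.StringIO(csv_text))

out = (
    df.set_index("Detail").T        # 'Detail' values become the column labels
      .rename_axis("Person")        # name the new index 'Person'
      .reset_index()                # turn that index into a regular column
      .rename_axis(columns=None)    # drop the leftover 'Detail' label on the columns axis
)

# After the transpose every column holds strings, so convert Age explicitly.
out["Age"] = pd.to_numeric(out["Age"])
print(out)
```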

Related

How do i increase an element value from column in Pandas?

Hello, I have this pandas code (see below), but it gives me this error: TypeError: can only concatenate str (not "int") to str
import pandas as pd
import numpy as np
import os
_data0 = pd.read_excel("C:\\Users\\HP\\Documents\\DataScience task\\Gender_Age.xlsx")
_data0['Age' + 1]
I want to change the values in the 'Age' column. Say I want to increase every value in 'Age' by 1, how do I do that? (The same goes for Number of Children.)
The output I wanted:
First Name Last Name Age Number of Children
0 Kimberly Watson 36 2
1 Victor Wilson 35 6
2 Adrian Elliott 35 2
3 Richard Bailey 36 5
4 Blake Roberts 35 6
Original output:
First Name Last Name Age Number of Children
0 Kimberly Watson 24 1
1 Victor Wilson 23 5
2 Adrian Elliott 23 1
3 Richard Bailey 24 4
4 Blake Roberts 23 5
Try:
# increments taken from the wanted output above: Age + 12, Number of Children + 1
_data0['Age'] = _data0['Age'] + 12
_data0['Number of Children'] = _data0['Number of Children'] + 1
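For context on the error: in _data0['Age' + 1] Python evaluates 'Age' + 1 first, which is a str plus an int, so the TypeError is raised before pandas is involved at all. The arithmetic has to be applied to the selected column instead. A minimal sketch, assuming a small hand-built frame in place of the Excel file (the name people and the two rows are just for illustration) and an increment of 1 as the question's text asks:

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet in the question.
people = pd.DataFrame({
    "First Name": ["Kimberly", "Victor"],
    "Age": [24, 23],
    "Number of Children": [1, 5],
})

# Select the column first, then add; the addition is applied to every row.
people["Age"] = people["Age"] + 1

# Several numeric columns can be updated in one go.
cols = ["Age", "Number of Children"]
people[cols] = people[cols] + 1

print(people)
```

Age is incremented twice here (once on its own, once as part of cols), only to show both forms side by side.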

How to merge multiple rows based on a single column (implode or nest) in pandas dataframe?

I'm looking to combine multiple rows in a dataframe into a single row based on one column.
This is what my df looks like:
id Name score
0 1234 jim 34
1 5678 james 45
2 4321 Macy 56
3 1234 Jim 78
4 5678 James 80
I want to combine based on column "score" so the output would look like:
id Name score
0 1234 jim 34,78
1 5678 james 45,80
2 4321 Macy 56
Basically I want to do the reverse of the explode function. How can I achieve this using pandas dataframe?
Try agg with groupby
out = df.groupby('id',as_index=False).agg({'Name':'first','score':lambda x : ','.join(x.astype(str))})
Out[29]:
id Name score
0 1234 jim 34,78
1 4321 Macy 56
2 5678 james 45,80
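Note that the groupby result above is sorted by id, so 4321 moves ahead of 5678, whereas the desired output keeps the order of first appearance. A small sketch of the same idea with sort=False and named aggregation, using the frame from the question; whether the original order matters is an assumption on my part:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1234, 5678, 4321, 1234, 5678],
    "Name": ["jim", "james", "Macy", "Jim", "James"],
    "score": [34, 45, 56, 78, 80],
})

out = (
    df.groupby("id", as_index=False, sort=False)   # sort=False keeps first-appearance order
      .agg(
          Name=("Name", "first"),                                # keep the first spelling seen
          score=("score", lambda s: ",".join(s.astype(str))),    # implode scores into one string
      )
)
print(out)
```

If the scores should stay numeric, passing list instead of the join lambda gives a list per id, which explode() can later undo.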

find the maximum value in a column with respect to other column

I have the data frame below:
first_name last_name age preTestScore postTestScore
0 Jason Miller 42 4 25
1 Molly Jacobson 52 24 94
2 Tina Ali 36 31 57
3 Jake Milner 24 2 62
4 Amy Cooze 73 3 70
I want the output to be:
Amy 73
So basically I want to find the highest value in the age column, along with the name of the person who has it.
I tried using pandas groupby, as below:
df2=df.groupby(['first_name'])['age'].max()
But with this I get the output below:
first_name
Amy 73
Jake 24
Jason 42
Molly 52
Tina 36
Name: age, dtype: int64
whereas I only want
Amy 73
How should I go about it in pandas?
You can get your result with the code below
df.loc[df.age.idxmax(),['first_name','age']]
Here, with df.age.idxmax() we are getting the index of the row which has the maximum age value.
Then with df.loc[df.age.idxmax(),['first_name','age']] we are getting the columns 'first_name' & 'age' at that index.
This line of code should do the work
df[df['age']==df['age'].max()][['first_name','age']]
The [['first_name','age']] part lists the columns you want in the result; change it as needed.
In this case the output will be
  first_name  age
4        Amy   73
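The two answers differ in one respect worth knowing: idxmax() returns the label of the first row with the maximum, so it yields a single row (as a Series), while the boolean mask keeps every row tied for the maximum (as a DataFrame). A quick sketch with the data from the question, leaving out the test-score columns for brevity:

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
})

# Single row: label of the first maximum, then pick the columns of interest.
oldest = df.loc[df["age"].idxmax(), ["first_name", "age"]]

# All rows whose age equals the maximum (more than one row if there are ties).
all_oldest = df.loc[df["age"] == df["age"].max(), ["first_name", "age"]]

print(oldest)
print(all_oldest)
```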

Choose higher value based off column value between two dataframes

A question about choosing values based on two dataframes.
>>> df[['age','name']]
age name
0 44 Anna
1 22 Bob
2 33 Cindy
3 44 Danis
4 55 Cindy
5 66 Danis
6 11 Anna
7 43 Bob
8 12 Cindy
9 19 Danis
10 11 Anna
11 32 Anna
12 55 Anna
13 33 Anna
14 32 Anna
>>> df2[['age','name']]
age name
5 66 Danis
4 55 Cindy
0 44 Anna
7 43 Bob
The expected result is every row of df whose 'age' is higher than the 'age' in df2 for the same 'name'.
expected result
age name
12 55 Anna
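Per comments, use merge and filter the dataframe (suffixes must be a tuple or list, not a set, so that the order of the two suffixes is defined):
df.merge(df2, on='name', suffixes=('','_y')).query('age > age_y')[['name','age']]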
Output:
name age
4 Anna 55
IIUC, you can use this to find the max age of all names:
pd.concat([df,df2]).groupby('name')['age'].max()
Output:
name
Anna 55
Bob 43
Cindy 55
Danis 66
Name: age, dtype: int64
Try this:
# `age` must already hold the value to compare against
index = df[df['age'] > age].index
df.loc[index]
There are a few edge cases you don't mention how you would like to resolve, but generally what you want to do is iterate down the df and compare ages and use the larger. You could do so in the following manner:
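rows = []
# compare row by row, by position; only the overlapping positions are visited
for x in range(min(len(df), len(df2))):
    if df['age'].iloc[x] > df2['age'].iloc[x]:
        rows.append({'age': df['age'].iloc[x], 'name': df['name'].iloc[x]})
    else:
        rows.append({'age': df2['age'].iloc[x], 'name': df2['name'].iloc[x]})
df3 = pd.DataFrame(rows, columns=['age', 'name'])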
Although you will need to modify this to reflect how you want to resolve names that are only in one list, or if the lists are of different sizes.
One solution that comes to mind is to merge and then drop the rows that don't qualify:
df.merge(df2, on='name', suffixes=('', '_y')).query('age.gt(age_y)', engine='python')[['age','name']]
Out[175]:
age name
4 55 Anna
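The merge-based answers return the row under a new index (4 in the outputs above), while the expected result shows it under its original index 12. If keeping df's index matters, one alternative is to look up each name's reference age with map() and filter df directly. This is a sketch of a different technique from the answers above, and it assumes each name appears only once in df2:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [44, 22, 33, 44, 55, 66, 11, 43, 12, 19, 11, 32, 55, 33, 32],
    "name": ["Anna", "Bob", "Cindy", "Danis", "Cindy", "Danis", "Anna", "Bob",
             "Cindy", "Danis", "Anna", "Anna", "Anna", "Anna", "Anna"],
})
df2 = pd.DataFrame(
    {"age": [66, 55, 44, 43], "name": ["Danis", "Cindy", "Anna", "Bob"]},
    index=[5, 4, 0, 7],
)

# For every row of df, fetch the age stored in df2 for the same name.
# Names missing from df2 would map to NaN, and any comparison with NaN is False.
reference_age = df["name"].map(df2.set_index("name")["age"])

# Keep only the rows of df that beat their reference age; df's index is untouched.
result = df[df["age"] > reference_age]
print(result)
```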

Pivoting count of column value using python pandas

I have student data with id's and some values and I need to pivot the table for count of ID.
Here's an example of data:
id name maths science
0 B001 john 50 60
1 B021 Kenny 89 77
2 B041 Jessi 100 89
3 B121 Annie 91 73
4 B456 Mark 45 33
pivot table:
count of ID
5
Lots of different ways to approach this, I would use either shape or nunique() as Sandeep suggested.
import pandas as pd

data = {'id' : ['0','1','2','3','4'],
        'name' : ['john', 'kenny', 'jessi', 'Annie', 'Mark'],
        'math' : [50,89,100,91,45],
        'science' : [60,77,89,73,33]}
df = pd.DataFrame(data)
print(df)
id name math science
0 0 john 50 60
1 1 kenny 89 77
2 2 jessi 100 89
3 3 Annie 91 73
4 4 Mark 45 33
then use either of the following:
df.shape[0], which gives you the number of rows in the data frame (shape is an attribute, not a method, and holds a (rows, columns) tuple),
or
in: df['id'].nunique()
out: 5
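The options count slightly different things, which only shows up once ids repeat or are missing: shape[0] and len() count rows, count() counts non-null values, and nunique() counts distinct values. A short sketch with the ids from the question; the one-cell summary frame at the end is just one guess at what the "pivot table" in the question could look like:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["B001", "B021", "B041", "B121", "B456"],
    "name": ["john", "Kenny", "Jessi", "Annie", "Mark"],
    "maths": [50, 89, 100, 91, 45],
    "science": [60, 77, 89, 73, 33],
})

n_rows = df.shape[0]                # number of rows (shape is a (rows, columns) tuple)
n_rows_len = len(df)                # same number, via len()
n_non_null_ids = df["id"].count()   # ids that are not NaN
n_unique_ids = df["id"].nunique()   # distinct ids

# Hypothetical one-cell table mirroring the "count of ID" layout in the question.
summary = pd.DataFrame({"count of ID": [n_unique_ids]})
print(summary)
```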
