Convert columns into rows data with Pandas - python

My dataset has some information by location for n dates. The CSV has a two-level header and looks like this:
Country  year2018      year2019      year2020
         saleA  saleB  saleA  saleB  saleA  saleB
USA      22     23     323    32     31     65
china    12     12     2      66     66     78
I want my data to be of the form
Country year saleA saleB
USA year2018 22 23
USA year2019 323 32
USA year2020 31 65
china year2018 12 12
.
.
.
How can I do it using pandas?
I tried using pd.melt but couldn't figure it out.

You can reshape your dataframe with set_index and stack:
out = (df.set_index('Country')
         .rename_axis(columns=['year', None])
         .stack('year').reset_index())
Country year saleA saleB
0 USA year2018 22 23
1 USA year2019 323 32
2 USA year2020 31 65
3 China year2018 12 12
4 China year2019 2 66
5 China year2020 66 78
Another solution with melt and pivot_table:
out = (df.melt(id_vars='Country', var_name=['year', 'sale'])
         .pivot_table(index=['Country', 'year'], columns='sale', values='value')
         .reset_index())
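To try the stack approach without the original CSV, here is a minimal sketch that builds the frame by hand. It assumes the file was loaded with a two-level column header (year on top, sale type below) and with Country as the index; building the columns with pd.MultiIndex.from_product is only an illustration and not part of the original answer.

```python
import pandas as pd

# Two-level columns: the top level is the year, the second level is the sale type.
# Naming the top level "year" up front replaces the rename_axis step.
columns = pd.MultiIndex.from_product(
    [["year2018", "year2019", "year2020"], ["saleA", "saleB"]],
    names=["year", None],
)
df = pd.DataFrame(
    [[22, 23, 323, 32, 31, 65],
     [12, 12, 2, 66, 66, 78]],
    index=pd.Index(["USA", "china"], name="Country"),
    columns=columns,
)

# Move the "year" level from the columns into the rows, then flatten the index.
out = df.stack("year").reset_index()
print(out)
```

Recent pandas versions may print a FutureWarning about the stack implementation; the reshaped result is the same for this data.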

Related

Subtract value of column based on another column

I have a big dataframe (the following is an example)
country value
portugal 86
germany 20
belgium 21
Uk 81
portugal 77
UK 87
I want to subtract 60 from the value whenever the country is portugal or UK (in any letter case). In Python, the dataframe should end up looking like this:
country value
portugal 26
germany 20
belgium 21
Uk 21
portugal 17
UK 27
IIUC, use isin on the lowercased country string to check whether each value is in a reference list, then select those rows with loc to modify them in place:
df.loc[df['country'].str.lower().isin(['portugal', 'uk']), 'value'] -= 60
output:
country value
0 portugal 26
1 germany 20
2 belgium 21
3 Uk 21
4 portugal 17
5 UK 27
Use numpy.where:
In [1621]: import numpy as np
In [1622]: df['value'] = np.where(df['country'].str.lower().isin(['portugal', 'uk']), df['value'] - 60, df['value'])
In [1623]: df
Out[1623]:
country value
0 portugal 26
1 germany 20
2 belgium 21
3 Uk 21
4 portugal 17
5 UK 27
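Both answers above can be checked side by side. The sketch below rebuilds the example frame from the question and applies each approach to its own copy; the names mask, by_loc and by_where are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "country": ["portugal", "germany", "belgium", "Uk", "portugal", "UK"],
    "value": [86, 20, 21, 81, 77, 87],
})

# Case-insensitive membership test, shared by both approaches.
mask = df["country"].str.lower().isin(["portugal", "uk"])

# Approach 1: update only the matching rows in place with loc.
by_loc = df.copy()
by_loc.loc[mask, "value"] -= 60

# Approach 2: rebuild the whole column with numpy.where.
by_where = df.copy()
by_where["value"] = np.where(mask, df["value"] - 60, df["value"])

print(by_loc["value"].tolist())    # [26, 20, 21, 21, 17, 27]
print(by_where["value"].tolist())  # [26, 20, 21, 21, 17, 27]
```

The loc version only touches the selected rows, while the numpy.where version computes a full new column; for a frame this size the difference is a matter of taste.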

How to transpose or pivot a table? Selecting specific columns

beginner here!
I have a dataframe similar to this:
df = pd.DataFrame({'Country_Code':['FR','FR','FR','USA','USA','USA','BR','BR','BR'],'Indicator_Name':['GPD','Pop','birth','GPD','Pop','birth','GPD','Pop','birth'],'2005':[14,34,56, 25, 67, 68, 55, 8,99], '2006':[23, 34, 34, 43,34,34, 65, 34,45]})
Index Country_Code Indicator_Name 2005 2006
0 FR GPD 14 23
1 FR Pop 34 34
2 FR birth 56 34
3 USA GPD 25 43
4 USA Pop 67 34
5 USA birth 68 34
6 BR GPD 55 65
7 BR Pop 8 34
8 BR birth 99 45
I need to pivot or transpose it, keeping the Country Code, the years, and the indicators names as columns, like this:
index Country_Code year GPD Pop Birth
0 FR 2005 14 34 56
1 FR 2006 23 34 34
3 USA 2005 25 67 68
4 USA 2006 43 34 34
...
I used the transpose function like this:
df.set_index(['Indicator_Name']).transpose()
The result is close, but the countries end up as a row like this:
Indicator_Name GPD Pop birth GPD Pop birth GPD Pop birth
Country_Code FR FR FR USA USA USA BR BR BR
2005 14 34 56 25 67 68 55 8 99
2006 23 34 34 43 34 34 65 34 45
I also tried to use the "pivot" and the "pivot table" function, but the result is not satisfactory. Could you please give me some advice?
import pandas as pd
df = pd.DataFrame({'Country_Code':['FR','FR','FR','USA','USA','USA','BR','BR','BR'],'Indicator_Name':['GPD','Pop','birth','GPD','Pop','birth','GPD','Pop','birth'],'2005':[14,34,56, 25, 67, 68, 55, 8,99], '2006':[23, 34, 34, 43,34,34, 65, 34,45]})
df
#%% Pivot longer columns `'2005'` and `'2006'` to `'Year'`
df1 = df.melt(id_vars=["Country_Code", "Indicator_Name"],
              var_name="Year",
              value_name="Value")
#%% Pivot wider by values in `'Indicator_Name'`
df2 = (df1.pivot_table(index=['Country_Code', 'Year'],
                       columns=['Indicator_Name'],
                       values=['Value'],
                       aggfunc='first'))
Output:
Value
Indicator_Name GPD Pop birth
Country_Code Year
BR 2005 55 8 99
2006 65 34 45
FR 2005 14 34 56
2006 23 34 34
USA 2005 25 67 68
2006 43 34 34
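The pivot_table result above keeps a two-level column header ('Value' on top) and a MultiIndex on the rows. One possible way to reach the flat layout from the question is sketched below. It passes values as a plain string and uses pivot rather than pivot_table, which assumes every (Country_Code, year) pair has exactly one row per indicator; passing a list to index needs pandas 1.1 or later. The lowercase names year and value are picked here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Country_Code": ["FR", "FR", "FR", "USA", "USA", "USA", "BR", "BR", "BR"],
    "Indicator_Name": ["GPD", "Pop", "birth"] * 3,
    "2005": [14, 34, 56, 25, 67, 68, 55, 8, 99],
    "2006": [23, 34, 34, 43, 34, 34, 65, 34, 45],
})

# Long format: one row per (country, indicator, year).
long_df = df.melt(id_vars=["Country_Code", "Indicator_Name"],
                  var_name="year", value_name="value")

# Wide format: one column per indicator, then flatten the index and
# drop the leftover "Indicator_Name" label on the column axis.
wide = (long_df.pivot(index=["Country_Code", "year"],
                      columns="Indicator_Name", values="value")
               .rename_axis(columns=None)
               .reset_index())
print(wide)
```

If the data can contain duplicate (country, year, indicator) combinations, pivot raises a ValueError and pivot_table with an explicit aggfunc is the safer choice.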
The simplest approach, in my opinion, is to combine pivot and stack:
(df.pivot(index='Country_Code', columns='Indicator_Name')
.rename_axis(columns=['year', None]).stack(0).reset_index()
)
output:
Country_Code year GPD Pop birth
0 BR 2005 55 8 99
1 BR 2006 65 34 45
2 FR 2005 14 34 56
3 FR 2006 23 34 34
4 USA 2005 25 67 68
5 USA 2006 43 34 34

Importing Excel data with merging cells

How can we import Excel data with merged cells?
Please find the Excel sheet image.
The last column has 3 sub-columns. How can we import it without making changes to the Excel sheet?
You could try this:
import pandas as pd
# Store the file name in a variable
dataset = 'Merged_Column_Data.xlsx'
# Import the dataset and skip the first row (the merged header)
df = pd.read_excel(dataset, skiprows=1)
Unnamed: 0 Unnamed: 1 Unnamed: 2 Gold Silver Bronze
0 Great Britain GBR 2012 29 17 19
1 China CHN 2012 38 28 22
2 Russia RUS 2012 24 25 32
3 United States US 2012 46 28 29
4 Korea KOR 2012 13 8 7
# Create dictionary to handle unnamed columns
col_dict = {'Unnamed: 0':'Country', 'Unnamed: 1':'Country',
'Unnamed: 2':'Year',}
# Rename columns with dictionary
df.rename(columns=col_dict)
Country Country Year Gold Silver Bronze
0 Great Britain GBR 2012 29 17 19
1 China CHN 2012 38 28 22
2 Russia RUS 2012 24 25 32
3 United States US 2012 46 28 29
4 Korea KOR 2012 13 8 7
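Two details of the rename step are worth noting: rename returns a new DataFrame, so the result has to be assigned to keep it, and mapping two columns to the same label 'Country' leaves duplicate column names. The sketch below uses an in-memory frame standing in for what read_excel returned above, so it runs without the Excel file; the label 'Code' for the second column is an assumption made for this example.

```python
import pandas as pd

# Stand-in for the frame produced by pd.read_excel(dataset, skiprows=1).
df = pd.DataFrame({
    "Unnamed: 0": ["Great Britain", "China"],
    "Unnamed: 1": ["GBR", "CHN"],
    "Unnamed: 2": [2012, 2012],
    "Gold": [29, 38],
    "Silver": [17, 28],
    "Bronze": [19, 22],
})

# Give every unnamed column its own distinct label.
col_dict = {"Unnamed: 0": "Country", "Unnamed: 1": "Code", "Unnamed: 2": "Year"}

# rename does not modify df in place, so assign the result back.
df = df.rename(columns=col_dict)
print(df.columns.tolist())
```

With unique labels, df['Country'] returns a single Series instead of a two-column DataFrame.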

How to take mean across row in Pandas pivot table Dataframe? [duplicate]

I have a pandas dataframe as seen below which is a pivot table. I would like to print Africa in 2007 as well as do the mean of the entire Americas row; any ideas how to do this? I have been doing combinations of stack/unstack for a while now to no avail.
year 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007
continent
Africa 12 13 15 20 39 25 81 12 22 23 25 44
Americas 12 14 65 10 119 15 21 42 47 84 15 89
Asia 12 13 89 20 39 25 81 29 77 23 25 89
Europe 12 13 15 20 39 25 81 29 23 32 15 89
Oceania 12 13 15 20 39 25 81 27 32 85 25 89
import pandas as pd
df = pd.read_csv('dummy_data.csv')
# handy to see the continent name against the value rather than '0' or '3'
df.set_index('continent', inplace=True)
# print mean for all rows - see how the continent name helps here
print(df.mean(axis=1))
print('---')
print()
# print the mean for just the 'Americas' row
print(df.mean(axis=1)['Americas'])
print('---')
print()
# print the value of the 'Africa' row for the year (column) 2007
print(df.query('continent == "Africa"')['2007'])
print('---')
print()
Output:
continent
Africa 27.583333
Americas 44.416667
Asia 43.500000
Europe 32.750000
Oceania 38.583333
dtype: float64
---
44.416666666666664
---
continent
Africa 44
Name: 2007, dtype: int64
---
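If only single values are needed, label-based selection with loc is a shorter alternative to query. The sketch below rebuilds just the Africa and Americas rows from the question so it runs without dummy_data.csv; it assumes the year columns are strings, as they would be after read_csv.

```python
import pandas as pd

years = ["1952", "1957", "1962", "1967", "1972", "1977",
         "1982", "1987", "1992", "1997", "2002", "2007"]
df = pd.DataFrame(
    [[12, 13, 15, 20, 39, 25, 81, 12, 22, 23, 25, 44],
     [12, 14, 65, 10, 119, 15, 21, 42, 47, 84, 15, 89]],
    index=pd.Index(["Africa", "Americas"], name="continent"),
    columns=years,
)

# Single cell: row label first, then column label.
africa_2007 = df.loc["Africa", "2007"]

# Mean of one row: select the row as a Series and average it.
americas_mean = df.loc["Americas"].mean()

print(africa_2007)
print(americas_mean)
```

Selecting the row first avoids computing the mean of every continent when only one is wanted.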

Pandas: transform column's values in independent columns

I have a Pandas DataFrame that looks like the following (df_olympic).
I would like the values of the Type column to be transformed into independent columns (df_olympic_table).
Original dataframe
In [3]: df_olympic
Out[3]:
Country Type Num
0 USA Gold 46
1 USA Silver 37
2 USA Bronze 38
3 GB Gold 27
4 GB Silver 23
5 GB Bronze 17
6 China Gold 26
7 China Silver 18
8 China Bronze 26
9 Russia Gold 19
10 Russia Silver 18
11 Russia Bronze 19
Transformed dataframe
In [5]: df_olympic_table
Out[5]:
Country N_Gold N_Silver N_Bronze
0 USA 46 37 38
1 GB 27 23 17
2 China 26 18 26
3 Russia 19 18 19
What would be the most convenient way to achieve this?
You can use DataFrame.pivot:
df = df.pivot(index='Country', columns='Type', values='Num')
print (df)
Type Bronze Gold Silver
Country
China 26 26 18
GB 17 27 23
Russia 19 19 18
USA 38 46 37
Another solution with DataFrame.set_index and Series.unstack:
df = df.set_index(['Country','Type'])['Num'].unstack()
print (df)
Type Bronze Gold Silver
Country
China 26 26 18
GB 17 27 23
Russia 19 19 18
USA 38 46 37
But if you get:
ValueError: Index contains duplicate entries, cannot reshape
you need pivot_table with an aggregate function. The default is the mean, but you can also use sum, first, and so on.
# add a new row with duplicate values in 'Country' and 'Type'
print (df)
Country Type Num
0 USA Gold 46
1 USA Silver 37
2 USA Bronze 38
3 GB Gold 27
4 GB Silver 23
5 GB Bronze 17
6 China Gold 26
7 China Silver 18
8 China Bronze 26
9 Russia Gold 19
10 Russia Silver 18
11 Russia Bronze 20 <- changed value to 20
12 Russia Bronze 100 <- added new row with duplicate Country and Type
df = df.pivot_table(index='Country', columns='Type', values='Num', aggfunc='mean')
print (df)
Type Bronze Gold Silver
Country
China 26 26 18
GB 17 27 23
Russia 60 19 18 <- Russia gets (100 + 20) / 2 = 60
USA 38 46 37
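The difference between pivot and pivot_table on duplicated keys can be reproduced with a few rows. This is a minimal sketch using only the Russia rows from the example above; the flag pivot_failed is a name invented for this illustration.

```python
import pandas as pd

# Two rows share the key (Russia, Bronze).
df = pd.DataFrame({
    "Country": ["Russia", "Russia", "Russia", "Russia"],
    "Type": ["Gold", "Silver", "Bronze", "Bronze"],
    "Num": [19, 18, 20, 100],
})

# pivot cannot decide which of the duplicated values to keep.
try:
    df.pivot(index="Country", columns="Type", values="Num")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table resolves the duplicates by aggregating them.
table = df.pivot_table(index="Country", columns="Type",
                       values="Num", aggfunc="mean")
print(pivot_failed)
print(table)
```

Passing the aggregation as the string 'mean' avoids importing numpy just for np.mean.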
Or use groupby, aggregate with mean, and reshape with unstack:
df = df.groupby(['Country','Type'])['Num'].mean().unstack()
print (df)
Type Bronze Gold Silver
Country
China 26 26 18
GB 17 27 23
Russia 60 19 18 <- Russia gets (100 + 20) / 2 = 60
USA 38 46 37
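None of the results above has the exact layout asked for in the question (N_ prefixed columns in Gold, Silver, Bronze order, with Country as a regular column). A possible way to get there from the pivot result is sketched below; the chain of reindex, add_prefix and rename_axis is one option among several and is not part of the original answer.

```python
import pandas as pd

df_olympic = pd.DataFrame({
    "Country": ["USA"] * 3 + ["GB"] * 3 + ["China"] * 3 + ["Russia"] * 3,
    "Type": ["Gold", "Silver", "Bronze"] * 4,
    "Num": [46, 37, 38, 27, 23, 17, 26, 18, 26, 19, 18, 19],
})

df_olympic_table = (
    df_olympic.pivot(index="Country", columns="Type", values="Num")
              .reindex(columns=["Gold", "Silver", "Bronze"])  # medal order
              .add_prefix("N_")                               # N_Gold, ...
              .rename_axis(columns=None)                      # drop "Type" label
              .reset_index()                                  # Country as column
)
print(df_olympic_table)
```

The rows come out sorted by Country rather than in the original order; reindexing the rows by a list of countries before reset_index would restore a custom order.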
