Switch Header and Column in a DataFrame - python

Economy Year Indicator1 Indicator2 Indicator3 Indicator4 .
UK 1 23 45 56 78
UK 2 24 87 32 42
UK 3 22 87 32 42
UK 4 2 87 32 42
FR . . . . .
This is my data, which continues for more rows and is held as a DataFrame. I want to switch the header (the Indicators) with the Year column, which seems like a pivot. There are hundreds of indicators and 20 years.

Use DataFrame.melt with DataFrame.pivot:
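# pivot takes keyword-only arguments in pandas 2.0+, so name them explicitly
df = (df.melt(id_vars=['Economy','Year'], var_name='Ind')
        .pivot(index=['Economy','Ind'], columns='Year', values='value')
        .reset_index()
        .rename_axis(None, axis=1))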
print (df)
Economy Ind 1 2 3 4
0 UK Indicator1 23 24 22 2
1 UK Indicator2 45 87 87 87
2 UK Indicator3 56 32 32 32
3 UK Indicator4 78 42 42 42

Another option is to set Year column as index and then use transpose.
Consider the code below:
import pandas as pd
df = pd.DataFrame(columns=['Economy', 'Year', 'Indicator1', 'Indicator2', 'Indicator3', 'Indicator4'],
data=[['UK', 1, 23, 45, 56, 78],['UK', 2, 24, 87, 32, 42],['UK', 3, 22, 87, 32, 42],['UK', 4, 2, 87, 32, 42],
['FR', 1, 22, 33, 11, 35]])
# Make Year column as index
df = df.set_index('Year')
# Transpose columns to rows and vice-versa
df = df.transpose()
print(df)
gives you
Year 1 2 3 4 1
Economy UK UK UK UK FR
Indicator1 23 24 22 2 22
Indicator2 45 87 87 87 33
Indicator3 56 32 32 32 11
Indicator4 78 42 42 42 35
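Note that Year repeats across economies, so the transposed frame above ends up with the column label 1 twice. As a hedged sketch (not part of the original answer), indexing on both Economy and Year before transposing keeps every column label unique; the name `wide` is only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    columns=['Economy', 'Year', 'Indicator1', 'Indicator2', 'Indicator3', 'Indicator4'],
    data=[['UK', 1, 23, 45, 56, 78], ['UK', 2, 24, 87, 32, 42],
          ['UK', 3, 22, 87, 32, 42], ['UK', 4, 2, 87, 32, 42],
          ['FR', 1, 22, 33, 11, 35]])

# Use both keys as the index so each (Economy, Year) pair becomes one column
wide = df.set_index(['Economy', 'Year']).T
print(wide)

# A single economy/year column can then be selected with a tuple
uk_year_2 = wide[('UK', 2)]
print(uk_year_2)
```

The columns become a MultiIndex of (Economy, Year), which may or may not suit later processing; the melt + pivot answer keeps flat columns instead.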

You can use transpose like this:
df = df.set_index('Year')
df = df.transpose()
print (df)

Related

Select date columns in python based on specific date criteria

This is my sample code. My database contains columns for every date of the year, going back multiple years. Each column corresponds to a specific date.
import pandas as pd
df = pd.DataFrame([[10, 5, 25, 67,25,56],
[20, 10, 26, 45, 56, 34],
[30, 3, 27, 34, 78, 34],
[40, 9, 28, 45, 34,76]],
columns=[pd.to_datetime('2022-09-14'), pd.to_datetime('2022-08-14'), pd.to_datetime('2022-07-14'), pd.to_datetime('2021-09-14'),
pd.to_datetime('2020-09-14'), pd.to_datetime('2019-09-14')])
Is there a way to select only those columns that match a particular criterion based on year, month or quarter?
For example, I was hoping to get only the columns that fall on the same date as today (or any starting date) in every year. Today is Sep 14, 2022, so I need the columns for Sep 14, 2021, Sep 14, 2020 and so on. Another option could be to do the same on a month or quarter basis.
How can this be done in pandas?
Yes, you can do:
# day
df.loc[:, df.columns.day == 14]
2022-09-14 2022-08-14 2022-07-14 2021-09-14 2020-09-14 2019-09-14
0 10 5 25 67 25 56
1 20 10 26 45 56 34
2 30 3 27 34 78 34
3 40 9 28 45 34 76
# month
df.loc[:, df.columns.month == 9]
2022-09-14 2021-09-14 2020-09-14 2019-09-14
0 10 67 25 56
1 20 45 56 34
2 30 34 78 34
3 40 45 34 76
# quarter
df.loc[:, df.columns.quarter == 3]
2022-09-14 2022-08-14 2022-07-14 2021-09-14 2020-09-14 2019-09-14
0 10 5 25 67 25 56
1 20 10 26 45 56 34
2 30 3 27 34 78 34
3 40 9 28 45 34 76
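The filters above each test one attribute. For the "same calendar date in every year" case from the question, the month and day masks can be combined. This is a minimal sketch under the assumption that the columns form a DatetimeIndex; `ref` is a hypothetical reference date standing in for "today":

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 5, 25, 67, 25, 56],
     [20, 10, 26, 45, 56, 34],
     [30, 3, 27, 34, 78, 34],
     [40, 9, 28, 45, 34, 76]],
    columns=pd.to_datetime(['2022-09-14', '2022-08-14', '2022-07-14',
                            '2021-09-14', '2020-09-14', '2019-09-14']))

ref = pd.Timestamp('2022-09-14')  # hypothetical reference date

# Same month AND same day of month, in any year
same_day = df.loc[:, (df.columns.month == ref.month) & (df.columns.day == ref.day)]

# Optionally keep only the years before the reference year
earlier = same_day.loc[:, same_day.columns.year < ref.year]
print(same_day)
print(earlier)
```

Using `pd.Timestamp.today()` for `ref` would tie the selection to the current date.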

How to transpose or pivot a table? Selecting specific columns

beginner here!
I have a dataframe similar to this:
df = pd.DataFrame({'Country_Code':['FR','FR','FR','USA','USA','USA','BR','BR','BR'],'Indicator_Name':['GPD','Pop','birth','GPD','Pop','birth','GPD','Pop','birth'],'2005':[14,34,56, 25, 67, 68, 55, 8,99], '2006':[23, 34, 34, 43,34,34, 65, 34,45]})
Index Country_Code Indicator_Name 2005 2006
0 FR GPD 14 23
1 FR Pop 34 34
2 FR birth 56 34
3 USA GPD 25 43
4 USA Pop 67 34
5 USA birth 68 34
6 BR GPD 55 65
7 BR Pop 8 34
8 BR birth 99 45
I need to pivot or transpose it, keeping the Country Code, the years, and the indicators names as columns, like this:
index Country_Code year GPD Pop Birth
0 FR 2005 14 34 56
1 FR 2006 23 34 34
3 USA 2005 25 67 68
4 USA 2006 43 34 34
...
I used the transposed function like this:
df.set_index(['Indicator_Name']).transpose()
The result is nice, but I have the Countries as a row like this:
Indicator_Name GPD Pop birth GPD Pop birth GPD Pop birth
Country_Code FR FR FR USA USA USA BR BR BR
2005 14 34 56 25 67 68 55 8 99
2006 23 34 34 43 34 34 65 34 45
I also tried to use the "pivot" and the "pivot table" function, but the result is not satisfactory. Could you please give me some advice?
import pandas as pd
df = pd.DataFrame({'Country_Code':['FR','FR','FR','USA','USA','USA','BR','BR','BR'],'Indicator_Name':['GPD','Pop','birth','GPD','Pop','birth','GPD','Pop','birth'],'2005':[14,34,56, 25, 67, 68, 55, 8,99], '2006':[23, 34, 34, 43,34,34, 65, 34,45]})
df
#%% Pivot longer columns `'2005'` and `'2006'` to `'Year'`
df1 = df.melt(id_vars=["Country_Code", "Indicator_Name"],
var_name="Year",
value_name="Value")
#%% Pivot wider by values in `'Indicator_Name'`
df2 = (df1.pivot_table(index=['Country_Code', 'Year'],
columns=['Indicator_Name'],
values=['Value'],
aggfunc='first'))
Output:
Value
Indicator_Name GPD Pop birth
Country_Code Year
BR 2005 55 8 99
2006 65 34 45
FR 2005 14 34 56
2006 23 34 34
USA 2005 25 67 68
2006 43 34 34
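The output above keeps a `Value` level in the columns and leaves the keys in the index. As a hedged sketch of one way to reach the flat layout asked for in the question, passing `values` as a plain string and resetting the index should be enough; the names `long_df` and `flat` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'Country_Code': ['FR', 'FR', 'FR', 'USA', 'USA', 'USA', 'BR', 'BR', 'BR'],
    'Indicator_Name': ['GPD', 'Pop', 'birth'] * 3,
    '2005': [14, 34, 56, 25, 67, 68, 55, 8, 99],
    '2006': [23, 34, 34, 43, 34, 34, 65, 34, 45]})

# Long format: one row per (country, indicator, year)
long_df = df.melt(id_vars=['Country_Code', 'Indicator_Name'],
                  var_name='year', value_name='Value')

# A string (not a list) for `values` avoids the extra 'Value' column level
flat = (long_df.pivot_table(index=['Country_Code', 'year'],
                            columns='Indicator_Name',
                            values='Value',
                            aggfunc='first')
               .reset_index()
               .rename_axis(columns=None))
print(flat)
```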
The simplest approach, in my opinion, is pivot + stack:
(df.pivot(index='Country_Code', columns='Indicator_Name')
.rename_axis(columns=['year', None]).stack(0).reset_index()
)
output:
Country_Code year GPD Pop birth
0 BR 2005 55 8 99
1 BR 2006 65 34 45
2 FR 2005 14 34 56
3 FR 2006 23 34 34
4 USA 2005 25 67 68
5 USA 2006 43 34 34

Rearranging Pandas Dataframe

I have a DataFrame as follows:
d = {'name': ['a', 'a','a','b','b','b'],
'var': ['v1', 'v2', 'v3', 'v1', 'v2', 'v3'],
'Yr1': [11, 21, 31, 41, 51, 61],
'Yr2': [12, 22, 32, 42, 52, 62],
'Yr3': [13, 23, 33, 43, 53, 63]}
df = pd.DataFrame(d)
name var Yr1 Yr2 Yr3
a v1 11 12 13
a v2 21 22 23
a v3 31 32 33
b v1 41 42 43
b v2 51 52 53
b v3 61 62 63
and I want to rearrange it to look like this:
name Yr v1 v2 v3
a 1 11 21 31
a 2 12 22 32
a 3 13 23 33
b 1 41 51 61
b 2 42 52 62
b 3 43 53 63
I am new to pandas and tried using other threads I found here but struggled to make it work. Any help would be much appreciated.
Try this
import pandas as pd
d = {'name': ['a', 'a', 'a', 'b', 'b', 'b'],
'var': ['v1', 'v2', 'v3', 'v1', 'v2', 'v3'],
'Yr1': [11, 21, 31, 41, 51, 61],
'Yr2': [12, 22, 32, 42, 52, 62],
'Yr3': [13, 23, 33, 43, 53, 63]}
df = pd.DataFrame(d)
# Solution
df.set_index(['name', 'var'], inplace=True)
df = df.unstack().stack(0)
print(df.reset_index())
output:
var name level_1 v1 v2 v3
0 a Yr1 11 21 31
1 a Yr2 12 22 32
2 a Yr3 13 23 33
3 b Yr1 41 51 61
4 b Yr2 42 52 62
5 b Yr3 43 53 63
Reference: pandas.DataFrame.stack
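The result above still carries the `level_1` column name and the `Yr` prefix. A possible cleanup, sketched here as an assumption about the desired final layout rather than as part of the original answer, names the column axis before stacking and strips the prefix afterwards:

```python
import pandas as pd

d = {'name': ['a', 'a', 'a', 'b', 'b', 'b'],
     'var': ['v1', 'v2', 'v3', 'v1', 'v2', 'v3'],
     'Yr1': [11, 21, 31, 41, 51, 61],
     'Yr2': [12, 22, 32, 42, 52, 62],
     'Yr3': [13, 23, 33, 43, 53, 63]}
df = pd.DataFrame(d)

out = (df.set_index(['name', 'var'])
         .rename_axis(columns='Yr')   # name the Yr1..Yr3 axis so it survives stacking
         .stack()                     # Series indexed by (name, var, Yr)
         .unstack('var')              # v1..v3 become columns
         .reset_index()
         .rename_axis(columns=None))

# Turn 'Yr1' into the integer 1, and so on
out['Yr'] = out['Yr'].str.replace('Yr', '', regex=False).astype(int)
print(out)
```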
Try groupby apply:
df.groupby("name").apply(
lambda x: x.set_index("var").T.drop("name")
).reset_index().rename(columns={"level_1": "Yr"}).rename_axis(columns=None)
name Yr v1 v2 v3
0 a Yr1 11 21 31
1 a Yr2 12 22 32
2 a Yr3 13 23 33
3 b Yr1 41 51 61
4 b Yr2 42 52 62
5 b Yr3 43 53 63
Or better:
df.pivot(index="var", columns="name", values=["Yr1", "Yr2", "Yr3"]).T.sort_index(
    level=1
).reset_index().rename({"level_0": "Yr"}, axis=1).rename_axis(columns=None)
Yr name v1 v2 v3
0 Yr1 a 11 21 31
1 Yr2 a 12 22 32
2 Yr3 a 13 23 33
3 Yr1 b 41 51 61
4 Yr2 b 42 52 62
5 Yr3 b 43 53 63
We can use pd.wide_to_long + df.unstack here.
pd.wide_to_long doc:
With stubnames ['A', 'B'], this function expects to find one or more groups of columns with format A-suffix1, A-suffix2,…, B-suffix1, B-suffix2,… You specify what you want to call this suffix in the resulting long format with j (for example j='year').
pd.wide_to_long(
df, stubnames="Yr", i=["name", "var"], j="Y"
).squeeze().unstack(level=1).reset_index()
var name Y v1 v2 v3
0 a 1 11 21 31
1 a 2 12 22 32
2 a 3 13 23 33
3 b 1 41 51 61
4 b 2 42 52 62
5 b 3 43 53 63
We can use df.melt + df.pivot here.
out = df.melt(id_vars=['name', 'var'], var_name='Yr')
out['Yr'] = out['Yr'].str.replace('Yr', '')
out.pivot(index=['name', 'Yr'], columns='var', values='value').reset_index()
var name Yr v1 v2 v3
0 a 1 11 21 31
1 a 2 12 22 32
2 a 3 13 23 33
3 b 1 41 51 61
4 b 2 42 52 62
5 b 3 43 53 63

How to generate a pandas DataFrame based on a list with a condition

I have following list in python
movie_list = [11, 21, 31, 41, 51, 62, 55]
and following movie dataframe
userId movieId
1 11
1 21
1 31
2 62
2 55
Now I want to generate a similar dataframe containing, for each userId, every movieId that is in movie_list but not already in the dataframe for that user.
My desired dataframe would be
userId movieId
1 41
1 51
1 62
1 55
2 11
2 21
2 31
2 41
2 51
How can I do it in pandas?
IIUC, we can aggregate movieId into a list per user, then take the difference between movie_list and each user's original values in df:
s=df.groupby('userId').movieId.agg(list).\
map(lambda x : list(set(movie_list)-set(x))).explode().reset_index()
userId movieId
0 1 41
1 1 51
2 1 62
3 1 55
4 2 41
5 2 11
6 2 51
7 2 21
8 2 31
One approach would be to use itertools.product to create all combinations of userId & movieId, then concat and drop_duplicates:
from itertools import product
movie_list = [11, 21, 31, 41, 51, 62, 55]
df_all = pd.DataFrame(product(df['userId'].unique(), movie_list), columns=df.columns)
df2 = pd.concat([df, df_all]).drop_duplicates(keep=False)
print(df2)
[out]
userId movieId
3 1 41
4 1 51
5 1 62
6 1 55
7 2 11
8 2 21
9 2 31
10 2 41
11 2 51
prod = pd.MultiIndex.from_product([df.userId.unique().tolist(), movie_list]).tolist()
(
pd.DataFrame(set(prod).difference([tuple(e) for e in df.values]),
columns=['userId', 'movieId'])
.sort_values(by=['userId', 'movieId'])
)
userId movieId
7 1 41
6 1 51
2 1 55
8 1 62
5 2 11
4 2 21
3 2 31
1 2 41
0 2 51
I think you need:
df = df.groupby("userId")["movieId"].apply(list).reset_index()
df["movieId"] = df["movieId"].apply(lambda x: list(set(movie_list)-set(x)))
df = df.explode("movieId")
print(df)
Output:
userId movieId
0 1 41
0 1 51
0 1 62
0 1 55
1 2 41
1 2 11
1 2 51
1 2 21
1 2 31
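The set-based answers above do not guarantee the order of movie_list. As a further hedged sketch (an alternative, not one of the original answers), an anti-join via `merge(..., indicator=True)` keeps the order of the list; the names `all_pairs` and `missing` are illustrative:

```python
import pandas as pd

movie_list = [11, 21, 31, 41, 51, 62, 55]
df = pd.DataFrame({'userId': [1, 1, 1, 2, 2],
                   'movieId': [11, 21, 31, 62, 55]})

# Every (userId, movieId) combination, in movie_list order
all_pairs = pd.MultiIndex.from_product(
    [df['userId'].unique(), movie_list],
    names=['userId', 'movieId']).to_frame(index=False)

# Left merge with an indicator column, then keep pairs absent from df
merged = all_pairs.merge(df, on=['userId', 'movieId'], how='left', indicator=True)
missing = (merged.loc[merged['_merge'] == 'left_only', ['userId', 'movieId']]
                 .reset_index(drop=True))
print(missing)
```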

Python pandas.cut

Edit: Added defT
Does using pandas.cut change the structure of a pandas.DataFrame?
I am using pandas.cut in the following manner to map single age years to age groups and then aggregating afterwards. However, the aggregation does not work as I end up with NaN in all columns that are being aggregated. Here is my code:
cutoff = numpy.hstack([numpy.array(defT.MinAge[0]), defT.MaxAge.values])
labels = defT.AgeGrp
df['ageGrp'] = pandas.cut(df.Age,
bins = cutoff,
labels = labels,
include_lowest = True)
Here is defT:
AgeGrp MaxAge MinAge
1 18 14
2 21 19
3 24 22
4 34 25
5 44 35
6 54 45
7 65 55
Then I pass the data-frame into another function to aggregate:
grouped = df.groupby(['Year', 'Month', 'OccID', 'ageGrp', 'Sex', \
'Race', 'Hisp', 'Educ'],
as_index = False)
final = grouped.aggregate(numpy.sum)
If I change the ages to age groups via this manner it works perfectly:
df['ageGrp'] = 1
# .ix was removed in pandas 1.0; .loc is the label/boolean-based equivalent
df.loc[(df.Age >= 14) & (df.Age <= 18), 'ageGrp'] = 1 # Age 14 - 18
df.loc[(df.Age >= 19) & (df.Age <= 21), 'ageGrp'] = 2 # Age 19 - 21
df.loc[(df.Age >= 22) & (df.Age <= 24), 'ageGrp'] = 3 # Age 22 - 24
df.loc[(df.Age >= 25) & (df.Age <= 34), 'ageGrp'] = 4 # Age 25 - 34
df.loc[(df.Age >= 35) & (df.Age <= 44), 'ageGrp'] = 5 # Age 35 - 44
df.loc[(df.Age >= 45) & (df.Age <= 54), 'ageGrp'] = 6 # Age 45 - 54
df.loc[(df.Age >= 55) & (df.Age <= 64), 'ageGrp'] = 7 # Age 55 - 64
df.loc[df.Age >= 65, 'ageGrp'] = 8 # Age 65+
I would prefer to do this on the fly, importing the definition table and using pandas.cut, instead of being hard-coded.
Thank you in advance.
Here is, perhaps, a work-around.
Consider the following example which replicates the symptom you describe:
import numpy as np
import pandas as pd
np.random.seed(2015)
defT = pd.DataFrame({'AgeGrp': [1, 2, 3, 4, 5, 6, 7],
'MaxAge': [18, 21, 24, 34, 44, 54, 65],
'MinAge': [14, 19, 22, 25, 35, 45, 55]})
cutoff = np.hstack([np.array(defT['MinAge'][0]), defT['MaxAge'].values])
labels = defT['AgeGrp']
N = 50
df = pd.DataFrame(np.random.randint(100, size=(N,2)), columns=['Age', 'Year'])
df['ageGrp'] = pd.cut(df['Age'], bins=cutoff, labels=labels, include_lowest=True)
grouped = df.groupby(['Year', 'ageGrp'], as_index=False)
final = grouped.agg(np.sum)
print(final)
# Year ageGrp Age
# Year ageGrp
# 3 1 NaN NaN NaN
# 2 NaN NaN NaN
# ...
# 97 1 NaN NaN NaN
# 2 NaN NaN NaN
# [294 rows x 3 columns]
If we change
grouped = df.groupby(['Year', 'ageGrp'], as_index=False)
final = grouped.agg(np.sum)
to
grouped = df.groupby(['Year', 'ageGrp'], as_index=True)
final = grouped.agg(np.sum).dropna()
print(final)
then we obtain:
Age
Year ageGrp
6 7 61
16 4 32
18 1 34
25 3 23
28 5 39
34 7 60
35 5 42
38 4 25
40 2 19
53 7 59
56 4 25
5 35
66 6 54
67 7 55
70 7 56
73 6 51
80 5 36
81 6 46
85 5 38
90 7 58
97 1 18
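One likely contributor to the NaN rows is that pandas.cut returns a categorical column, and grouping on a categorical can emit every category combination, including ones that never occur in the data. As a hedged sketch, assuming a pandas version where `groupby` accepts the `observed` parameter, passing `observed=True` restricts the result to combinations actually present. The random data below is made up for illustration and will not reproduce the numbers shown above:

```python
import numpy as np
import pandas as pd

defT = pd.DataFrame({'AgeGrp': [1, 2, 3, 4, 5, 6, 7],
                     'MaxAge': [18, 21, 24, 34, 44, 54, 65],
                     'MinAge': [14, 19, 22, 25, 35, 45, 55]})
# Bin edges: the lowest MinAge followed by every MaxAge
cutoff = np.hstack([defT['MinAge'].iloc[0], defT['MaxAge'].to_numpy()])

rng = np.random.default_rng(2015)  # illustrative data only
df = pd.DataFrame(rng.integers(100, size=(50, 2)), columns=['Age', 'Year'])
df['ageGrp'] = pd.cut(df['Age'], bins=cutoff,
                      labels=defT['AgeGrp'].tolist(), include_lowest=True)

# observed=True keeps only (Year, ageGrp) pairs that exist in the data;
# rows whose Age falls outside the bins have NaN ageGrp and are dropped
final = (df.groupby(['Year', 'ageGrp'], observed=True, as_index=False)['Age']
           .sum())
print(final)
```

This avoids the `dropna()` step, though ages outside the 14 to 65 range are still excluded because they fall outside the bins.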
