Pivoting a Pandas Table - Peculiar Problem - python

I thought I had a simple problem of pivoting a pandas table, but it has turned out to be more complicated than I expected.
Here is a tiny sample table and the output I am looking for, to illustrate the problem.
Say I have a table like this:
df =
AF BF AT BT
1 4 100 70
2 7 102 66
3 11 200 90
4 13 300 178
5 18 403 200
I need to reshape it into a long format in which the parameter name (the first letter of each column name) goes into a PAR column and the second letter becomes the column name. (I would prefer not to subset the column-name strings if possible.)
My output table should look like the following:
dfout =
PAR F T
A 1 100
B 4 70
A 2 102
B 7 66
A 3 200
B 11 90
A 4 300
B 13 178
A 5 403
B 18 200
I tried pivoting but was not able to achieve the desired output. Any help will be immensely appreciated. Thanks.

You can use pandas wide_to_long, but first you have to reverse each column name so that the stub names (F, T) come first and the parameter letter (A, B) becomes the suffix:
pd.wide_to_long(
df.rename(columns=lambda x: x[::-1]).reset_index(),
stubnames=["F", "T"],
i="index",
sep="",
j="PAR",
suffix=".",
).reset_index("PAR")
PAR F T
index
0 A 1 100
1 A 2 102
2 A 3 200
3 A 4 300
4 A 5 403
0 B 4 70
1 B 7 66
2 B 11 90
3 B 13 178
4 B 18 200
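The result above is grouped by PAR rather than interleaved row by row as in the question. A minimal self-contained sketch, assuming the sample data from the question and that the interleaved order matters, sorts the (index, PAR) MultiIndex before flattening it. The names `long_df` and `out` are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "AF": [1, 2, 3, 4, 5],
    "BF": [4, 7, 11, 13, 18],
    "AT": [100, 102, 200, 300, 403],
    "BT": [70, 66, 90, 178, 200],
})

# Reverse the names (AF -> FA) so F/T are the stubs and A/B the suffix
long_df = pd.wide_to_long(
    df.rename(columns=lambda x: x[::-1]).reset_index(),
    stubnames=["F", "T"],
    i="index",
    j="PAR",
    sep="",
    suffix=".",
)

# Sort by (index, PAR) to interleave A and B, then flatten the index
out = long_df.sort_index().reset_index("PAR").reset_index(drop=True)
print(out)
```

Sorting is optional; without it the rows keep the A-block-then-B-block order shown above.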
Alternatively, you could use the pivot_longer function from pyjanitor to reshape the data:
# pip install pyjanitor
import janitor
df.pivot_longer(names_to=("PAR", ".value"), names_pattern=r"(.)(.)")
PAR F T
0 A 1 100
1 B 4 70
2 A 2 102
3 B 7 66
4 A 3 200
5 B 11 90
6 A 4 300
7 B 13 178
8 A 5 403
9 B 18 200
Update: Using data from @jezrael:
df
C AF BF AT BT
0 10 1 4 100 70
1 20 2 7 102 66
2 30 3 11 200 90
3 40 4 13 300 178
4 50 5 18 403 200
pd.wide_to_long(
df.rename(columns=lambda x: x[::-1]),
stubnames=["F", "T"],
i="C",
sep="",
j="PAR",
suffix=".",
).reset_index()
C PAR F T
0 10 A 1 100
1 20 A 2 102
2 30 A 3 200
3 40 A 4 300
4 50 A 5 403
5 10 B 4 70
6 20 B 7 66
7 30 B 11 90
8 40 B 13 178
9 50 B 18 200
If you use the pivot_longer function:
df.pivot_longer(index="C", names_to=("PAR", ".value"), names_pattern=r"(.)(.)")
C PAR F T
0 10 A 1 100
1 10 B 4 70
2 20 A 2 102
3 20 B 7 66
4 30 A 3 200
5 30 B 11 90
6 40 A 4 300
7 40 B 13 178
8 50 A 5 403
9 50 B 18 200
pivot_longer is being worked on; in the next release of pyjanitor it should be much better. But pd.wide_to_long can solve your task pretty easily. The other answers can easily solve it as well.

The idea is to create a MultiIndex in the columns from the last and first letter of each name, then reshape with DataFrame.stack, and finally clean up the resulting MultiIndex in the index:
df.columns= [df.columns.str[-1], df.columns.str[0]]
df = df.stack().reset_index(level=0, drop=True).rename_axis('PAR').reset_index()
print (df)
PAR F T
0 A 1 100
1 B 4 70
2 A 2 102
3 B 7 66
4 A 3 200
5 B 11 90
6 A 4 300
7 B 13 178
8 A 5 403
9 B 18 200
EDIT:
print (df)
C AF BF AT BT
0 10 1 4 100 70
1 20 2 7 102 66
2 30 3 11 200 90
3 40 4 13 300 178
4 50 5 18 403 200
df = df.set_index('C')
df.columns = pd.MultiIndex.from_arrays([df.columns.str[-1],
df.columns.str[0]], names=[None,'PAR'])
df = df.stack().reset_index()
print (df)
C PAR F T
0 10 A 1 100
1 10 B 4 70
2 20 A 2 102
3 20 B 7 66
4 30 A 3 200
5 30 B 11 90
6 40 A 4 300
7 40 B 13 178
8 50 A 5 403
9 50 B 18 200

Let's try:
(pd.wide_to_long(df.reset_index(),stubnames=['A','B'],
i='index',
j='PAR', sep='', suffix='[FT]')
.stack().unstack('PAR').reset_index(level=1)
)
Output:
PAR level_1 F T
index
0 A 1 100
0 B 4 70
1 A 2 102
1 B 7 66
2 A 3 200
2 B 11 90
3 A 4 300
3 B 13 178
4 A 5 403
4 B 18 200

Related

Python Pandas - sumif in excel - criteria and range same df column

I've been trying to code the Python equivalent of Excel's SUMIF.
Excel:
Sumif($A$1:$A$20,A1,$C$1:$C$20)
Pandas df:
A C Term
1 10 1
1 20 2
1 10 3
1 10 4
2 30 5
2 30 6
2 30 7
3 20 8
3 10 9
3 10 10
3 10 11
3 10 12
Output df: I want an output df with 'fwdSum' as follows:
A C Term fwdSum
1 10 1 50
1 20 2 50
1 10 3 50
1 10 4 50
2 30 5 90
2 30 6 90
2 30 7 90
3 20 8 60
3 10 9 60
3 10 10 60
3 10 11 60
3 10 12 60
I tried creating another df with groupby and sum and then merging it back.
Can anyone suggest the best way to achieve this?
df['fwdSum'] = df.groupby('A')['C'].transform('sum')
print(df)
Prints:
A C Term fwdSum
0 1 10 1 50
1 1 20 2 50
2 1 10 3 50
3 1 10 4 50
4 2 30 5 90
5 2 30 6 90
6 2 30 7 90
7 3 20 8 60
8 3 10 9 60
9 3 10 10 60
10 3 10 11 60
11 3 10 12 60
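For comparison with the groupby-then-merge route mentioned in the question, here is a small self-contained sketch using the sample data. The column name `fwdSum_merge` and the variable names `totals` and `merged` are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    "C": [10, 20, 10, 10, 30, 30, 30, 20, 10, 10, 10, 10],
    "Term": list(range(1, 13)),
})

# transform('sum') broadcasts each group's total back onto its own rows
df["fwdSum"] = df.groupby("A")["C"].transform("sum")

# Two-step version: aggregate per group, then merge the totals back
totals = (
    df.groupby("A", as_index=False)["C"].sum()
      .rename(columns={"C": "fwdSum_merge"})
)
merged = df.merge(totals, on="A", how="left")
print(merged)
```

Both columns hold the same values; transform simply skips the intermediate frame and the merge.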

Labeling by period

my dataset
name day value
A 7 88
A 15 101
A 21 121
A 29 56
B 21 131
B 30 78
B 35 102
C 8 80
C 16 101
...
I am trying to plot the values for these days, but I want to group the days under labels because there are too many unique day values.
I want to label them consistently.
Is there a fast way to label them by cutting every 7 days (one week)?
For example, days 1 to 7 = week 1, days 8 to 14 = week 2, and so on.
The output I want:
name day value week
A 7 88 1
A 15 101 3
A 21 121 3
A 29 56 5
B 21 131 3
B 30 78 5
B 35 102 5
C 8 80 2
C 16 101 3
Thank you for reading.
Subtract 1, then use integer division by 7, and finally add 1:
df['week'] = (df['day'] - 1) // 7 + 1
print (df)
name day value week
0 A 7 88 1
1 A 15 101 3
2 A 21 121 3
3 A 29 56 5
4 B 21 131 3
5 B 30 78 5
6 B 35 102 5
7 C 8 80 2
8 C 16 101 3
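As a cross-check of the formula, the same labels can be produced with pd.cut using right-closed 7-day bins. This is only a sketch on the sample rows shown in the question; `edges` and `week_cut` are illustrative names.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "name": list("AAAABBBCC"),
    "day": [7, 15, 21, 29, 21, 30, 35, 8, 16],
    "value": [88, 101, 121, 56, 131, 78, 102, 80, 101],
})

# Integer-division formula: days 1-7 -> 1, days 8-14 -> 2, ...
df["week"] = (df["day"] - 1) // 7 + 1

# Same labels via pd.cut with bins (0, 7], (7, 14], ...
edges = list(range(0, int(df["day"].max()) + 7, 7))
df["week_cut"] = pd.cut(df["day"], bins=edges, labels=False) + 1
print(df)
```

The integer-division version needs no bin edges, which is why it is usually the simpler choice when every bin has the same width.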

Slice values of a column and calculate average in python

I have a dataframe with three columns:
a b c
0 73 12
73 80 2
80 100 5
100 150 13
Values in "a" and "b" are days. I need to find the average value of "c" in each 30-day interval (slice the range [min(a), max(b)] into 30-day intervals and calculate the average of c in each). As a result I want a dataframe like this:
aa bb c_avg
0 30 12
30 60 12
60 90 6.33
90 120 9
120 150 13
Another sample data could be:
a b c
0 1264.0 1629.0 0.000000
1 1629.0 1632.0 133.333333
6 1632.0 1699.0 0.000000
2 1699.0 1706.0 21.428571
7 1706.0 1723.0 0.000000
3 1723.0 1726.0 50.000000
8 1726.0 1890.0 0.000000
4 1890.0 1893.0 33.333333
1 1893.0 1994.0 0.000000
How can I get to the final table?
First create a DataFrame of 30-day ranges covering the values in the a and b columns:
a = np.arange(0, 180, 30)
df1 = pd.DataFrame({'aa':a[:-1], 'bb':a[1:]})
#print (df1)
Then cross join all rows by helper column tmp:
df3 = pd.merge(df1.assign(tmp=1), df.assign(tmp=1), on='tmp')
#print (df3)
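And last, filter. There are 2 solutions, depending on which columns are tested, and they give different results: the first misses intervals that lie entirely inside a 30-day range (73 to 80 is not counted in 60 to 90), and the second misses 30-day ranges that lie entirely inside an interval (30 to 60 is dropped):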
df4 = df3[df3['aa'].between(df3['a'], df3['b']) | df3['bb'].between(df3['a'], df3['b'])]
print (df4)
aa bb tmp a b c
0 0 30 1 0 73 12
4 30 60 1 0 73 12
8 60 90 1 0 73 12
10 60 90 1 80 100 5
14 90 120 1 80 100 5
15 90 120 1 100 150 13
19 120 150 1 100 150 13
df4 = df4.groupby(['aa','bb'], as_index=False)['c'].mean()
print (df4)
aa bb c
0 0 30 12.0
1 30 60 12.0
2 60 90 8.5
3 90 120 9.0
4 120 150 13.0
df5 = df3[df3['a'].between(df3['aa'], df3['bb']) | df3['b'].between(df3['aa'], df3['bb'])]
print (df5)
aa bb tmp a b c
0 0 30 1 0 73 12
8 60 90 1 0 73 12
9 60 90 1 73 80 2
10 60 90 1 80 100 5
14 90 120 1 80 100 5
15 90 120 1 100 150 13
19 120 150 1 100 150 13
df5 = df5.groupby(['aa','bb'], as_index=False)['c'].mean()
print (df5)
aa bb c
0 0 30 12.000000
1 60 90 6.333333
2 90 120 9.000000
3 120 150 13.000000
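Neither filter above reproduces the expected table from the question exactly. A hedged sketch of a single overlap test (an interval [a, b] overlaps a range [aa, bb] when a < bb and b > aa) does reproduce it on the first sample. It assumes pandas 1.2 or later for `how="cross"` and an unweighted mean of c, as in the expected output; `bins`, `pairs`, and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

# First sample from the question
df = pd.DataFrame({
    "a": [0, 73, 80, 100],
    "b": [73, 80, 100, 150],
    "c": [12, 2, 5, 13],
})

# 30-day ranges from min(a) up to at least max(b)
step = 30
edges = np.arange(df["a"].min(), df["b"].max() + step, step)
bins = pd.DataFrame({"aa": edges[:-1], "bb": edges[1:]})

# Pair every range with every interval, keep only pairs that overlap
pairs = bins.merge(df, how="cross")
overlap = pairs[(pairs["a"] < pairs["bb"]) & (pairs["b"] > pairs["aa"])]

# Unweighted mean of c per 30-day range
result = (
    overlap.groupby(["aa", "bb"], as_index=False)["c"].mean()
           .rename(columns={"c": "c_avg"})
)
print(result)
```

The strict inequalities mean intervals that only touch a range at its boundary are not counted; whether that is the right convention depends on how the day boundaries are meant to be read.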

Merge dataframes including extreme values

I have 2 data frames, df1 and df2:
df1
Out[66]:
A B
0 1 11
1 1 2
2 1 32
3 1 42
4 1 54
5 1 66
6 2 16
7 2 23
8 3 13
9 3 24
10 3 35
11 3 46
12 3 51
13 4 12
14 4 28
15 4 39
16 4 49
df2
Out[80]:
B
0 32
1 42
2 13
3 24
4 35
5 39
6 49
I want to merge dataframes but at the same time including the first and/or last value of the set in column A. This is an example of the desired outcome:
df3
Out[93]:
A B
0 1 2
1 1 32
2 1 42
3 1 54
4 3 13
5 3 24
6 3 35
7 3 46
8 4 28
9 4 39
10 4 49
I'm trying to use merge, but that only keeps the portion of the data frames that coincides. Does anyone have an idea how to deal with this? Thanks!
Here's one way to do it using merge with indicator, groupby, and rolling:
df1[df1.merge(df2, on='B', how='left', indicator='Ind').eval('Found=Ind == "both"')
.groupby('A', group_keys=False)['Found']
.apply(lambda x: x.rolling(3, center=True, min_periods=2).max()).astype(bool)]
Output:
A B
1 1 2
2 1 32
3 1 42
4 1 54
8 3 13
9 3 24
10 3 35
11 3 46
14 4 28
15 4 39
16 4 49
pd.concat([df1.groupby('A').min().reset_index(), pd.merge(df1,df2, on="B"), df1.groupby('A').max().reset_index()]).reset_index(drop=True).drop_duplicates().sort_values(['A','B'])
A B
0 1 2
4 1 32
5 1 42
1 2 16
2 3 13
7 3 24
8 3 35
3 4 12
9 4 39
10 4 49
Breaking down each part
#Get Minimum
df1.groupby('A').min().reset_index()
# Merge on B
pd.merge(df1,df2, on="B")
# Get Maximum
df1.groupby('A').max().reset_index()
# Reset the Index and drop duplicated rows since there may be similarities between the Merge and Min/Max. Sort values by 'A' then by 'B'
.reset_index(drop=True).drop_duplicates().sort_values(['A','B'])
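Another way to read the requirement is "keep every matched row plus the row immediately before and after it within the same A group". Under that assumption (it is an interpretation of the question, not something the asker confirmed), a sketch with isin and a per-group shift reproduces the desired df3. The names `found`, `prev_found`, and `next_found` are illustrative.

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame({
    "A": [1] * 6 + [2] * 2 + [3] * 5 + [4] * 4,
    "B": [11, 2, 32, 42, 54, 66, 16, 23, 13, 24, 35, 46, 51, 12, 28, 39, 49],
})
df2 = pd.DataFrame({"B": [32, 42, 13, 24, 35, 39, 49]})

# True where the B value also appears in df2
found = df1["B"].isin(df2["B"])

# Shift the flags within each A group to mark neighbours of a match
by_group = found.groupby(df1["A"])
prev_found = by_group.shift(1, fill_value=False).astype(bool)
next_found = by_group.shift(-1, fill_value=False).astype(bool)

# Keep matches and their immediate neighbours inside the same group
df3 = df1[found | prev_found | next_found]
print(df3)
```

Group 2 has no matches, so none of its rows are kept, which agrees with the desired output in the question.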

insert dataframe into a dataframe - Python/Pandas

The question is pretty self-explanatory: how would you insert a dataframe with a couple of rows into a bigger dataframe at a given point (between indexes 10 and 11)? This means that .append can't be used.
You can use concat with sliced df by loc:
np.random.seed(100)
df1 = pd.DataFrame(np.random.randint(100, size=(5,6)), columns=list('ABCDEF'))
print (df1)
A B C D E F
0 8 24 67 87 79 48
1 10 94 52 98 53 66
2 98 14 34 24 15 60
3 58 16 9 93 86 2
4 27 4 31 1 13 83
df2 = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[5,3,6],
'F':[7,4,3]})
print (df2)
A B C D E F
0 1 4 7 1 5 7
1 2 5 8 3 3 4
2 3 6 9 5 6 3
#inserted between index values 3 and 4 (loc slices include both end labels)
print (pd.concat([df1.loc[:3], df2, df1.loc[4:]], ignore_index=True))
A B C D E F
0 8 24 67 87 79 48
1 10 94 52 98 53 66
2 98 14 34 24 15 60
3 58 16 9 93 86 2
4 1 4 7 1 5 7
5 2 5 8 3 3 4
6 3 6 9 5 6 3
7 27 4 31 1 13 83
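For the case in the question (between index values 10 and 11), position-based slicing with iloc avoids the inclusive end point of loc. The helper `insert_rows` and the frames `big` and `small` below are hypothetical, made up for illustration; the sketch assumes the bigger frame has a default 0..n-1 index.

```python
import pandas as pd

def insert_rows(df, pos, new_rows):
    """Return a new frame with new_rows placed before positional row `pos`."""
    # iloc slices exclude the end position, so no row is duplicated
    return pd.concat([df.iloc[:pos], new_rows, df.iloc[pos:]], ignore_index=True)

# Hypothetical frames for illustration
big = pd.DataFrame({"A": range(20), "B": range(100, 120)})
small = pd.DataFrame({"A": [-1, -2], "B": [-10, -20]})

# Position 11 puts the new rows between original index values 10 and 11
out = insert_rows(big, 11, small)
print(out.iloc[9:15])
```

ignore_index=True renumbers the result, so the original labels after the insertion point shift by the number of inserted rows.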
