Pandas Pivot Table Based on Specific Column Value [duplicate] - python

I need to pivot the data in a DataFrame as shown below, based on a specific value in the YYMMDD and HHMM columns, "20180101 100". Each time this date and time appears, it marks the start of a new category of data, and every category has the same number of rows. I plan to replace the repeated column names in the output with unique names. Suppose my data looks like this:
YYMMDD HHMM BestGuess(kWh)
0 20180101 100 20
1 20180101 200 70
0 20201231 2100 50
1 20201231 2200 90
2 20201231 2300 70
3 20210101 000 40
4 20180101 100 5
5 20180101 200 7
6 20201231 2100 2
7 20201231 2200 3
8 20201231 2300 1
9 20210101 000 4
I need the new df (dfpivot) to look like this:
YYMMDD HHMM BestGuess(kWh) BestGuess(kWh)
0 20180101 100 20 5
1 20180101 200 70 7
2 20201231 2100 50 2
3 20201231 2200 90 3
4 20201231 2300 70 1
5 20210101 000 40 4

Does this suffice?
cols = ['YYMMDD', 'HHMM']
df.set_index([*cols, df.groupby(cols).cumcount()]).unstack()
                  BestGuess(kWh)
                               0  1
YYMMDD   HHMM
20180101 100                  20  5
         200                  70  7
20201231 2100                 50  2
         2200                 90  3
         2300                 70  1
20210101 0                    40  4
A more complete version, with the column names flattened and the index reset:
cols = ['YYMMDD', 'HHMM']
temp = df.set_index([*cols, df.groupby(cols).cumcount()]).unstack()
temp.columns = [f'{l0} {l1}' for l0, l1 in temp.columns]
temp.reset_index()
YYMMDD HHMM BestGuess(kWh) 0 BestGuess(kWh) 1
0 20180101 100 20 5
1 20180101 200 70 7
2 20201231 2100 50 2
3 20201231 2200 90 3
4 20201231 2300 70 1
5 20210101 0 40 4
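For anyone who wants to run the answer end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question, and the `_0`/`_1` suffixes are just one assumed naming scheme for the unique column names the question mentions.

```python
import pandas as pd

# Sample data from the question: two stacked blocks of six rows each.
df = pd.DataFrame({
    "YYMMDD": [20180101, 20180101, 20201231, 20201231, 20201231, 20210101] * 2,
    "HHMM": [100, 200, 2100, 2200, 2300, 0] * 2,
    "BestGuess(kWh)": [20, 70, 50, 90, 70, 40, 5, 7, 2, 3, 1, 4],
})

cols = ["YYMMDD", "HHMM"]

# Number each repeat of a (YYMMDD, HHMM) pair: 0 for the first block, 1 for the second.
occurrence = df.groupby(cols).cumcount()

# Move the repeat number into the index, then spread it across the columns.
wide = df.set_index([*cols, occurrence]).unstack()

# Flatten the two-level column index into unique names (assumed naming scheme).
wide.columns = [f"{name}_{n}" for name, n in wide.columns]
wide = wide.reset_index()

print(wide)
```

The repeat counter from cumcount is what gives unstack a level to turn into columns; without it the (YYMMDD, HHMM) index would contain duplicates and unstack would have nothing to spread.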

Related

Pivoting a Pandas Table - Peculiar Problem

This looked like a simple problem of pivoting a pandas table, but it turned out to be more complicated than I expected.
Here is a small sample table and the output I am after, to illustrate the problem:
Say I have a table like this:
df =
AF BF AT BT
1 4 100 70
2 7 102 66
3 11 200 90
4 13 300 178
5 18 403 200
I need to pivot it so that the parameter name (A or B) gets its own column and the F and T values each share a single column. (I would rather not subset the column-name strings, if possible.)
My output table should look like the following:
dfout =
PAR F T
A 1 100
B 4 70
A 2 102
B 7 66
A 3 200
B 11 90
A 4 300
B 13 178
A 5 403
B 18 200
I tried pivoting but could not get the desired output. Any help will be greatly appreciated. Thanks.
You can use pandas wide_to_long, but first you have to reverse the column names so that the stub (F or T) comes first:
pd.wide_to_long(
df.rename(columns=lambda x: x[::-1]).reset_index(),
stubnames=["F", "T"],
i="index",
sep="",
j="PAR",
suffix=".",
).reset_index("PAR")
PAR F T
index
0 A 1 100
1 A 2 102
2 A 3 200
3 A 4 300
4 A 5 403
0 B 4 70
1 B 7 66
2 B 11 90
3 B 13 178
4 B 18 200
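The output above lists all A rows before the B rows, while the question asks for A and B interleaved per original row. A minimal sketch of one way to get that order, assuming the sample data from the question, is to sort on the (index, PAR) MultiIndex before resetting it:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "AF": [1, 2, 3, 4, 5],
    "BF": [4, 7, 11, 13, 18],
    "AT": [100, 102, 200, 300, 403],
    "BT": [70, 66, 90, 178, 200],
})

# Reverse each column name ("AF" -> "FA") so the stub comes first,
# and expose the row number as an "index" column to use as the id.
flipped = df.rename(columns=lambda name: name[::-1]).reset_index()

long = pd.wide_to_long(
    flipped,
    stubnames=["F", "T"],
    i="index",
    j="PAR",
    sep="",
    suffix=".",  # regex: any single character after the stub
)

# Sort by original row, then by PAR, to interleave A and B.
long = long.sort_index().reset_index("PAR").reset_index(drop=True)

print(long)
```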
Alternatively, you could use the pivot_longer function from pyjanitor to reshape the data:
# pip install pyjanitor
import janitor
df.pivot_longer(names_to=("PAR", ".value"), names_pattern=r"(.)(.)")
PAR F T
0 A 1 100
1 B 4 70
2 A 2 102
3 B 7 66
4 A 3 200
5 B 11 90
6 A 4 300
7 B 13 178
8 A 5 403
9 B 18 200
Update: Using data from @jezrael:
df
C AF BF AT BT
0 10 1 4 100 70
1 20 2 7 102 66
2 30 3 11 200 90
3 40 4 13 300 178
4 50 5 18 403 200
pd.wide_to_long(
df.rename(columns=lambda x: x[::-1]),
stubnames=["F", "T"],
i="C",
sep="",
j="PAR",
suffix=".",
).reset_index()
C PAR F T
0 10 A 1 100
1 20 A 2 102
2 30 A 3 200
3 40 A 4 300
4 50 A 5 403
5 10 B 4 70
6 20 B 7 66
7 30 B 11 90
8 40 B 13 178
9 50 B 18 200
If you use the pivot_longer function:
df.pivot_longer(index="C", names_to=("PAR", ".value"), names_pattern=r"(.)(.)")
C PAR F T
0 10 A 1 100
1 10 B 4 70
2 20 A 2 102
3 20 B 7 66
4 30 A 3 200
5 30 B 11 90
6 40 A 4 300
7 40 B 13 178
8 50 A 5 403
9 50 B 18 200
pivot_longer is being worked on; in the next release of pyjanitor it should be much better. But pd.wide_to_long can solve your task pretty easily. The other answers can easily solve it as well.
The idea is to create a MultiIndex in the columns from the last and first letters of each name, reshape with DataFrame.stack, and finally clean up the MultiIndex in the index:
df.columns= [df.columns.str[-1], df.columns.str[0]]
df = df.stack().reset_index(level=0, drop=True).rename_axis('PAR').reset_index()
print (df)
PAR F T
0 A 1 100
1 B 4 70
2 A 2 102
3 B 7 66
4 A 3 200
5 B 11 90
6 A 4 300
7 B 13 178
8 A 5 403
9 B 18 200
EDIT:
print (df)
C AF BF AT BT
0 10 1 4 100 70
1 20 2 7 102 66
2 30 3 11 200 90
3 40 4 13 300 178
4 50 5 18 403 200
df = df.set_index('C')
df.columns = pd.MultiIndex.from_arrays([df.columns.str[-1],
df.columns.str[0]], names=[None,'PAR'])
df = df.stack().reset_index()
print (df)
C PAR F T
0 10 A 1 100
1 10 B 4 70
2 20 A 2 102
3 20 B 7 66
4 30 A 3 200
5 30 B 11 90
6 40 A 4 300
7 40 B 13 178
8 50 A 5 403
9 50 B 18 200
Let's try:
(pd.wide_to_long(df.reset_index(),stubnames=['A','B'],
i='index',
j='PAR', sep='', suffix='[FT]')
.stack().unstack('PAR').reset_index(level=1)
)
Output:
PAR level_1 F T
index
0 A 1 100
0 B 4 70
1 A 2 102
1 B 7 66
2 A 3 200
2 B 11 90
3 A 4 300
3 B 13 178
4 A 5 403
4 B 18 200

Create a text file using the pandas dataframes

I am new to Python. I have the following dataframe:
Document_ID OFFSET PredictedFeature word
0 0 2000 abcd
0 8 2000 is
0 16 2200 a
0 23 2200 good
0 25 315 XXYYZZ
1 0 2100 but
1 5 2100 it
1 7 2100 can
1 10 315 XXYYZZ
From this dataframe I am trying to produce a file in a readable format, like this:
abcd is 2000, a good 2200
but it can 2100,
PredictedData feature offset endoffset
abcd is 2000 0 8
a good 2200 16 23
NewLine 315 25 25
but it can 2100 0 7
That is the kind of data I want. Wherever the same PredictedFeature value appears in consecutive rows, I concatenate the words and keep the feature value alongside them. Wherever the feature is 315, I emit a new line.
So, is there any way I can do this? Any help will be appreciated.
Thanks
IIUC, you can do groupby():
(df.groupby(['Document_ID', 'PredictedFeature'],as_index=False)
.agg({'word':(' '.join),
'OFFSET':('min','max')
})
)
Output:
Document_ID PredictedFeature word OFFSET
join min max
0 0 315 XXYYZZ 25 25
1 0 2000 abcd is 0 8
2 0 2200 a good 16 23
3 1 315 XXYYZZ 10 10
4 1 2100 but it can 0 7
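The groupby above pools every row that shares a feature within a document and returns the groups sorted by feature rather than in reading order. If only consecutive rows with the same feature should be merged, as the question describes, a run identifier can be built first. This is a minimal sketch under that assumption; the `run` label, the output column names (taken from the question's desired table), and the tab separator are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 25, 0, 5, 7, 10],
    "PredictedFeature": [2000, 2000, 2200, 2200, 315, 2100, 2100, 2100, 315],
    "word": ["abcd", "is", "a", "good", "XXYYZZ", "but", "it", "can", "XXYYZZ"],
})

# A new run starts whenever the feature or the document changes from the previous row.
changed = (
    df["PredictedFeature"].ne(df["PredictedFeature"].shift())
    | df["Document_ID"].ne(df["Document_ID"].shift())
)
run_id = changed.cumsum().rename("run")

# Aggregate each run: join the words, keep the feature, take first and last offset.
runs = (
    df.groupby(["Document_ID", run_id], sort=False)
    .agg(
        PredictedData=("word", " ".join),
        feature=("PredictedFeature", "first"),
        offset=("OFFSET", "min"),
        endoffset=("OFFSET", "max"),
    )
    .reset_index(drop=True)
)

# Feature 315 marks a line break in the question's desired output.
runs.loc[runs["feature"].eq(315), "PredictedData"] = "NewLine"

# Render as tab-separated text; pass a file path to to_csv to write it to disk instead.
text = runs.to_csv(sep="\t", index=False)
print(text)
```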

Problem with pandas.DataFrame.shift function

I have the following dataframe in python:
import pandas as pd
months = [1,2,3,4,5,6,7,8,9,10,11,12]
data1 = [100,200,300,400,500,600,700,800,900,1000,1100,1200]
df = pd.DataFrame({
'month' : months,
'd1' : data1,
'd2' : 0,
})
and I want to calculate the column d2, in the following way:
month d1 d2
0 1 100 101.0
1 2 200 303.0
2 3 300 606.0
3 4 400 1010.0
4 5 500 1515.0
5 6 600 2121.0
6 7 700 2828.0
7 8 800 3636.0
8 9 900 4545.0
9 10 1000 5555.0
10 11 1100 6666.0
11 12 1200 7878.0
I am doing it in the following way:
df['d2'] = (df['d2'].shift(1) + df['d1']) + df['month']
but the result is not what was expected:
month d1 d2
0 1 100 NaN
1 2 200 202.0
2 3 300 303.0
3 4 400 404.0
4 5 500 505.0
5 6 600 606.0
6 7 700 707.0
7 8 800 808.0
8 9 900 909.0
9 10 1000 1010.0
10 11 1100 1111.0
11 12 1200 1212.0
I am not sure whether my request is clear; thanks to anyone who can help.
IIUC, you're looking for cumsum:
df['d2'] = (df.d1+df.month).cumsum()
>>> df
month d1 d2
0 1 100 101
1 2 200 303
2 3 300 606
3 4 400 1010
4 5 500 1515
5 6 600 2121
6 7 700 2828
7 8 800 3636
8 9 900 4545
9 10 1000 5555
10 11 1100 6666
11 12 1200 7878
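It may help to see why the shift attempt fails. `df['d2'].shift(1)` is evaluated once over the whole column, which is still all zeros at that point, so each row only receives `d1 + month` (and the first row gets NaN). The intended recurrence, d2[i] = d2[i-1] + d1[i] + month[i], is a running total. The sketch below writes that recurrence as an explicit loop and checks that it agrees with cumsum; the `d2_loop` column name is introduced here only for the comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "month": list(range(1, 13)),
    "d1": list(range(100, 1300, 100)),
})

# Per-row increment: what each row adds to the running total.
step = df["d1"] + df["month"]

# The recurrence written out by hand.
running = []
previous = 0
for value in step:
    previous = previous + value
    running.append(previous)
df["d2_loop"] = running

# The vectorised equivalent.
df["d2"] = step.cumsum()

print(df)
```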
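What you need is a cumulative sum :) Note that this version sums d1 only and leaves out the month term, so it does not reproduce the expected d2 column.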
df['d2'] = df.d1.cumsum()
print(df)
month d1 d2
0 1 100 100
1 2 200 300
2 3 300 600
3 4 400 1000
4 5 500 1500
5 6 600 2100
6 7 700 2800
7 8 800 3600
8 9 900 4500
9 10 1000 5500
10 11 1100 6600
11 12 1200 7800

Dropping certain values in columns depending on contents of other column

I have a dataframe that looks like this:
Deal Year Financial Data1 Financial Data2 Financial Data3 Quarter
0 1 1991/1/1 122 123 120 1
3 1 1991/1/1 122 123 120 2
6 1 1991/1/1 122 123 120 3
1 2 1992/1/1 85 90 80 4
4 2 1992/1/1 85 90 80 5
7 2 1992/1/1 85 90 80 6
2 3 1993/1/1 85 90 100 1
5 3 1993/1/1 85 90 100 2
8 3 1993/1/1 85 90 100 3
However, I only want Financial Data1 shown for the first quarter of each deal, Financial Data2 for the second, and Financial Data3 for the third, with everything combined into a single column again.
The end result should look something like this:
Deal Year Financial Data Quarter
0 1 1991/1/1 122 1
3 1 1991/1/1 123 2
6 1 1991/1/1 120 3
1 2 1992/1/1 85 4
4 2 1992/1/1 90 5
7 2 1992/1/1 80 6
2 3 1993/1/1 85 1
5 3 1993/1/1 90 2
8 3 1993/1/1 100 3
Okie dokie, using np.where() I think this does what you're trying to do:
from io import StringIO
import pandas as pd
import numpy as np
df = pd.read_fwf(StringIO(
"""Deal Year Financial_Data1 Financial_Data2 Financial_Data3 Quarter
1 1991/1/1 122 123 120 1
1 1991/1/1 122 123 120 2
1 1991/1/1 122 123 120 3
2 1992/1/1 85 90 80 4
2 1992/1/1 85 90 80 5
2 1992/1/1 85 90 80 6
3 1993/1/1 85 90 100 1
3 1993/1/1 85 90 100 2
3 1993/1/1 85 90 100 3"""))
df['Financial_Data'] = np.where(
# if 'Quarter'%3==1
df['Quarter']%3==1,
# Then return Financial_Data1
df['Financial_Data1'],
# Else
np.where(
# If 'Quarter'%3==2
df['Quarter']%3==2,
# Then return Financial_Data2
df['Financial_Data2'],
# Else return Financial_Data3
df['Financial_Data3']
)
)
# Drop Old Columns
df = df.drop(['Financial_Data1', 'Financial_Data2', 'Financial_Data3'], axis=1)
print(df)
Output:
Deal Year Quarter Financial_Data
0 1 1991/1/1 1 122
1 1 1991/1/1 2 123
2 1 1991/1/1 3 120
3 2 1992/1/1 4 85
4 2 1992/1/1 5 90
5 2 1992/1/1 6 80
6 3 1993/1/1 1 85
7 3 1993/1/1 2 90
8 3 1993/1/1 3 100
(PS: I wasn't 100% sure how you intended on dealing with Quarter 4-6, in this example I just treat them as 1-3)
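If the choice of column should follow the row's position within its deal rather than `Quarter % 3`, one alternative is to number the rows per deal and use that number to pick a column. This is a minimal sketch, assuming every deal has at most three rows and that they are already in quarter order; it swaps the nested np.where for NumPy integer indexing.

```python
import numpy as np
import pandas as pd

# Sample data from the question (column names with underscores, as in the answer).
df = pd.DataFrame({
    "Deal": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Year": ["1991/1/1"] * 3 + ["1992/1/1"] * 3 + ["1993/1/1"] * 3,
    "Financial_Data1": [122] * 3 + [85] * 3 + [85] * 3,
    "Financial_Data2": [123] * 3 + [90] * 3 + [90] * 3,
    "Financial_Data3": [120] * 3 + [80] * 3 + [100] * 3,
    "Quarter": [1, 2, 3, 4, 5, 6, 1, 2, 3],
})

sources = ["Financial_Data1", "Financial_Data2", "Financial_Data3"]

# Position of each row within its deal: 0, 1, 2.
position = df.groupby("Deal").cumcount().to_numpy()

# For row i, take the value from column number position[i].
values = df[sources].to_numpy()
df["Financial_Data"] = values[np.arange(len(df)), position]

df = df.drop(columns=sources)
print(df)
```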

Add a column in dataframe conditionally from values in other dataframe python

I have a table in a pandas DataFrame df:
id product_1 count
1 100 10
2 200 20
3 100 30
4 400 40
5 500 50
6 200 60
7 100 70
I also have another table in a DataFrame df2:
product score
100 5
200 10
300 15
400 20
500 25
600 30
700 35
I have to create a new column score in my first df, taking the values of score from df2 according to product_1.
My final output should be: df =
id product_1 count score
1 100 10 5
2 200 20 10
3 100 30 5
4 400 40 20
5 500 50 25
6 200 60 10
7 100 70 5
Any ideas how to achieve it?
Use map:
df['score'] = df['product_1'].map(df2.set_index('product')['score'].to_dict())
print (df)
id product_1 count score
0 1 100 10 5
1 2 200 20 10
2 3 100 30 5
3 4 400 40 20
4 5 500 50 25
5 6 200 60 10
6 7 100 70 5
Or merge:
df = pd.merge(df,df2, left_on='product_1', right_on='product', how='left')
print (df)
id product_1 count product score
0 1 100 10 100 5
1 2 200 20 200 10
2 3 100 30 100 5
3 4 400 40 400 20
4 5 500 50 500 25
5 6 200 60 200 10
6 7 100 70 100 5
EDIT by comment:
df['score'] = df['product_1'].map(df2.set_index('product')['score'].to_dict())
df['final_score'] = (df['count'].mul(0.6).div(df.id)).add(df.score.mul(0.4))
print (df)
id product_1 count score final_score
0 1 100 10 5 8.0
1 2 200 20 10 10.0
2 3 100 30 5 8.0
3 4 400 40 20 14.0
4 5 500 50 25 16.0
5 6 200 60 10 10.0
6 7 100 70 5 8.0
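For a runnable comparison of the two approaches, here is a minimal sketch built from the sample tables in the question. It passes the lookup Series to map directly (the `.to_dict()` call is optional, provided the product values are unique) and drops the duplicated key column after the merge. The variable names `lookup`, `mapped` and `merged` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "product_1": [100, 200, 100, 400, 500, 200, 100],
    "count": [10, 20, 30, 40, 50, 60, 70],
})
df2 = pd.DataFrame({
    "product": [100, 200, 300, 400, 500, 600, 700],
    "score": [5, 10, 15, 20, 25, 30, 35],
})

# Lookup table: product -> score (requires unique product values).
lookup = df2.set_index("product")["score"]

# Option 1: map each product_1 value through the lookup.
mapped = df.assign(score=df["product_1"].map(lookup))

# Option 2: left merge, then drop the redundant key column from df2.
merged = df.merge(
    df2, left_on="product_1", right_on="product", how="left"
).drop(columns="product")

# Products missing from the lookup come back as NaN under either option.
missing = pd.Series([100, 999]).map(lookup)

print(mapped)
print(merged)
print(missing)
```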
