I have the following dataframe in python:
import pandas as pd

months = [1,2,3,4,5,6,7,8,9,10,11,12]
data1 = [100,200,300,400,500,600,700,800,900,1000,1100,1200]
df = pd.DataFrame({
    'month': months,
    'd1': data1,
    'd2': 0,
})
and I want to calculate the column d2 in the following way:
month d1 d2
0 1 100 101.0
1 2 200 303.0
2 3 300 606.0
3 4 400 1010.0
4 5 500 1515.0
5 6 600 2121.0
6 7 700 2828.0
7 8 800 3636.0
8 9 900 4545.0
9 10 1000 5555.0
10 11 1100 6666.0
11 12 1200 7878.0
I am doing it in the following way:
df['d2'] = (df['d2'].shift(1) + df['d1']) + df['month']
but the result is not what I expected:
month d1 d2
0 1 100 NaN
1 2 200 202.0
2 3 300 303.0
3 4 400 404.0
4 5 500 505.0
5 6 600 606.0
6 7 700 707.0
7 8 800 808.0
8 9 900 909.0
9 10 1000 1010.0
10 11 1100 1111.0
11 12 1200 1212.0
I hope my request is clear; thanks to anyone who can help me.
IIUC, you're looking for cumsum:
df['d2'] = (df.d1+df.month).cumsum()
>>> df
month d1 d2
0 1 100 101
1 2 200 303
2 3 300 606
3 4 400 1010
4 5 500 1515
5 6 600 2121
6 7 700 2828
7 8 800 3636
8 9 900 4545
9 10 1000 5555
10 11 1100 6666
11 12 1200 7878
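A runnable sketch of this approach, rebuilding the frame from the question and checking it against the expected d2 column:

```python
import pandas as pd

months = list(range(1, 13))
data1 = [m * 100 for m in months]
df = pd.DataFrame({'month': months, 'd1': data1})

# d2 is the running total of (d1 + month), which matches the expected output
df['d2'] = (df['d1'] + df['month']).cumsum()
```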
What you need is a cumulative sum :) (note this sums d1 only; to reproduce your expected d2 exactly, add month before taking the cumulative sum, as in the other answer)
df['d2'] = df.d1.cumsum()
print(df)
month d1 d2
0 1 100 100
1 2 200 300
2 3 300 600
3 4 400 1000
4 5 500 1500
5 6 600 2100
6 7 700 2800
7 8 800 3600
8 9 900 4500
9 10 1000 5500
10 11 1100 6600
11 12 1200 7800
I need to pivot my data in a df as shown below, based on a specific date in the YYMMDD and HHMM columns ("20180101 100"). This specific date marks the start of a new category of data with an equal number of rows. I plan on replacing the repeating column names in the output with unique names. Suppose my data looks like this below.
YYMMDD HHMM BestGuess(kWh)
0 20180101 100 20
1 20180101 200 70
0 20201231 2100 50
1 20201231 2200 90
2 20201231 2300 70
3 20210101 000 40
4 20180101 100 5
5 20180101 200 7
6 20201231 2100 2
7 20201231 2200 3
8 20201231 2300 1
9 20210101 000 4
I need the new df (dfpivot) to look like this:
YYMMDD HHMM BestGuess(kWh) BestGuess(kWh)
0 20180101 100 20 5
1 20180101 200 70 7
2 20201231 2100 50 2
3 20201231 2200 90 3
4 20201231 2300 70 1
5 20210101 000 40 4
Does this suffice?
cols = ['YYMMDD', 'HHMM']
df.set_index([*cols, df.groupby(cols).cumcount()]).unstack()
BestGuess(kWh)
0 1
YYMMDD HHMM
20180101 100 20 5
200 70 7
20201231 2100 50 2
2200 90 3
2300 70 1
20210101 0 40 4
More fully baked
cols = ['YYMMDD', 'HHMM']
temp = df.set_index([*cols, df.groupby(cols).cumcount()]).unstack()
temp.columns = [f'{l0} {l1}' for l0, l1 in temp.columns]
temp.reset_index()
YYMMDD HHMM BestGuess(kWh) 0 BestGuess(kWh) 1
0 20180101 100 20 5
1 20180101 200 70 7
2 20201231 2100 50 2
3 20201231 2200 90 3
4 20201231 2300 70 1
5 20210101 0 40 4
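The full recipe runs end to end; here is a sketch on the sample data from the question (HHMM kept as integers, so the leading zeros of 000 are lost, as in the shown output):

```python
import pandas as pd

df = pd.DataFrame({
    'YYMMDD': [20180101, 20180101, 20201231, 20201231, 20201231, 20210101] * 2,
    'HHMM':   [100, 200, 2100, 2200, 2300, 0] * 2,
    'BestGuess(kWh)': [20, 70, 50, 90, 70, 40, 5, 7, 2, 3, 1, 4],
})

cols = ['YYMMDD', 'HHMM']
# cumcount numbers repeated (YYMMDD, HHMM) keys 0, 1, ... so that unstack
# can spread each repetition into its own column
temp = df.set_index([*cols, df.groupby(cols).cumcount()]).unstack()
temp.columns = [f'{l0} {l1}' for l0, l1 in temp.columns]
temp = temp.reset_index()
```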
I have a DataFrame with two columns: the goods' names ('Goods') and their overall sales ('Sales'). I need to make another column that numbers the largest sales 1, 2, 3..., where 1 is the largest number, 2 the second largest, and so on.
Hope you can help me.
My dataframe:
lst = [['Keyboard1', 1860], ['Keyboard2', 1650], ['Keyboard3', 900], ['Keyboard4', 1230], ['Keyboard5', 1150], ['Keyboard6', 1345],
['Mouse1', 3100], ['Mouse2', 2900], ['Mouse3', 3050], ['Mouse4', 2750], ['Mouse5', 4100], ['Mouse6', 3910]]
df = pd.DataFrame(lst, columns = ['Goods', 'Sales'])
Goods Sales
0 Keyboard1 1860
1 Keyboard2 1650
2 Keyboard3 900
3 Keyboard4 1230
4 Keyboard5 1150
5 Keyboard6 1345
6 Mouse1 3100
7 Mouse2 2900
8 Mouse3 3050
9 Mouse4 2750
10 Mouse5 4100
11 Mouse6 3910
I'm trying to use this code:
import pandas as pd
import numpy as np
df = df.sort_values('Sales', ascending = False)
df['Largest'] = np.arange(len(df))+1
But I get the ranks of the largest values across all goods; I need the ranks of the largest values for each type of good separately. My result:
Goods Sales Largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 7
1 Keyboard2 1650 8
5 Keyboard6 1345 9
3 Keyboard4 1230 10
4 Keyboard5 1150 11
2 Keyboard3 900 12
Here is the output I need:
Goods Sales Largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 1
1 Keyboard2 1650 2
5 Keyboard6 1345 3
3 Keyboard4 1230 4
4 Keyboard5 1150 5
2 Keyboard3 900 6
Just do:
# remove any number of groups at the end
df['goods_group'] = df['Goods'].str.replace(r'\d+$', '', regex=True)
# sort by the new column and sales
df = df.sort_values(['goods_group', 'Sales'], ascending=False)
# create largest column
df['largest'] = df.groupby('goods_group').cumcount() + 1
# drop the new column
res = df.drop('goods_group', axis=1)
print(res)
Output
Goods Sales largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 1
1 Keyboard2 1650 2
5 Keyboard6 1345 3
3 Keyboard4 1230 4
4 Keyboard5 1150 5
2 Keyboard3 900 6
Try adding these lines to the end of the code:
df['new'] = df['Goods'].str[:-1]  # strip the trailing digit (assumes single-digit suffixes)
df['Largest'] = df.groupby('new').cumcount() + 1
df = df.drop('new', axis=1)
print(df)
Output:
Goods Sales Largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 1
1 Keyboard2 1650 2
5 Keyboard6 1345 3
3 Keyboard4 1230 4
4 Keyboard5 1150 5
2 Keyboard3 900 6
You could groupby Goods without the digits:
>>> df = df.sort_values('Sales', ascending=False)
>>> df
Goods Sales
10 Mouse5 4100
11 Mouse6 3910
6 Mouse1 3100
8 Mouse3 3050
7 Mouse2 2900
9 Mouse4 2750
0 Keyboard1 1860
1 Keyboard2 1650
5 Keyboard6 1345
3 Keyboard4 1230
4 Keyboard5 1150
2 Keyboard3 900
>>> df['Largest'] = df.groupby(df['Goods'].replace(r'\d+', '', regex=True)).cumcount() + 1
>>> df
Goods Sales Largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 1
1 Keyboard2 1650 2
5 Keyboard6 1345 3
3 Keyboard4 1230 4
4 Keyboard5 1150 5
2 Keyboard3 900 6
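As a variation that avoids pre-sorting the frame, groupby plus rank can produce the same per-group numbering directly, keeping the original row order. A sketch (the regex group key is my own choice, not taken from the answers above):

```python
import pandas as pd

lst = [['Keyboard1', 1860], ['Keyboard2', 1650], ['Keyboard3', 900],
       ['Keyboard4', 1230], ['Keyboard5', 1150], ['Keyboard6', 1345],
       ['Mouse1', 3100], ['Mouse2', 2900], ['Mouse3', 3050],
       ['Mouse4', 2750], ['Mouse5', 4100], ['Mouse6', 3910]]
df = pd.DataFrame(lst, columns=['Goods', 'Sales'])

# group key: product name with the trailing digits stripped
group = df['Goods'].str.replace(r'\d+$', '', regex=True)
# rank Sales descending within each group, without reordering the rows
df['Largest'] = df.groupby(group)['Sales'].rank(ascending=False).astype(int)
```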
It seemed I had a simple problem of pivoting a pandas table, but unfortunately the problem turns out to be a bit more complicated. I am providing a tiny sample table and the desired output to illustrate the problem I am facing:
Say, I have a table like this:
df =
AF BF AT BT
1 4 100 70
2 7 102 66
3 11 200 90
4 13 300 178
5 18 403 200
So I need to reshape it so that the parameter names (F and T) are shared across A and B. (I am not looking to subset the strings if possible.)
My output table should like the following:
dfout =
PAR F T
A 1 100
B 4 70
A 2 102
B 7 66
A 3 200
B 11 90
A 4 300
B 13 178
A 5 403
B 18 200
I tried pivoting, but not able to achieve the desired output. Any help will be immensely appreciated. Thanks.
You can use pandas wide_to_long, but first you have to reverse the column names so that the stub (F/T) comes first:
pd.wide_to_long(
df.rename(columns=lambda x: x[::-1]).reset_index(),
stubnames=["F", "T"],
i="index",
sep="",
j="PAR",
suffix=".",
).reset_index("PAR")
PAR F T
index
0 A 1 100
1 A 2 102
2 A 3 200
3 A 4 300
4 A 5 403
0 B 4 70
1 B 7 66
2 B 11 90
3 B 13 178
4 B 18 200
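Put together as a runnable sketch on the sample frame:

```python
import pandas as pd

df = pd.DataFrame({'AF': [1, 2, 3, 4, 5], 'BF': [4, 7, 11, 13, 18],
                   'AT': [100, 102, 200, 300, 403], 'BT': [70, 66, 90, 178, 200]})

# reverse each column name ('AF' -> 'FA') so the stub (F/T) comes first
res = pd.wide_to_long(
    df.rename(columns=lambda x: x[::-1]).reset_index(),
    stubnames=["F", "T"],
    i="index",
    sep="",
    j="PAR",
    suffix=".",          # the suffix is the single parameter letter (A or B)
).reset_index("PAR")
```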
Alternatively, you could use the pivot_longer function from pyjanitor to reshape the data:
# pip install pyjanitor
import janitor
df.pivot_longer(names_to=("PAR", ".value"), names_pattern=r"(.)(.)")
PAR F T
0 A 1 100
1 B 4 70
2 A 2 102
3 B 7 66
4 A 3 200
5 B 11 90
6 A 4 300
7 B 13 178
8 A 5 403
9 B 18 200
Update: Using data from #jezrael:
df
C AF BF AT BT
0 10 1 4 100 70
1 20 2 7 102 66
2 30 3 11 200 90
3 40 4 13 300 178
4 50 5 18 403 200
pd.wide_to_long(
df.rename(columns=lambda x: x[::-1]),
stubnames=["F", "T"],
i="C",
sep="",
j="PAR",
suffix=".",
).reset_index()
C PAR F T
0 10 A 1 100
1 20 A 2 102
2 30 A 3 200
3 40 A 4 300
4 50 A 5 403
5 10 B 4 70
6 20 B 7 66
7 30 B 11 90
8 40 B 13 178
9 50 B 18 200
If you use the pivot_longer function:
df.pivot_longer(index="C", names_to=("PAR", ".value"), names_pattern=r"(.)(.)")
C PAR F T
0 10 A 1 100
1 10 B 4 70
2 20 A 2 102
3 20 B 7 66
4 30 A 3 200
5 30 B 11 90
6 40 A 4 300
7 40 B 13 178
8 50 A 5 403
9 50 B 18 200
pivot_longer is being worked on; in the next release of pyjanitor it should be much better. But pd.wide_to_long can solve your task pretty easily. The other answers can easily solve it as well.
The idea is to create a MultiIndex in the columns from the last and first letters, then use DataFrame.stack to reshape, and finally do some cleanup of the resulting index:
df.columns = [df.columns.str[-1], df.columns.str[0]]
df = df.stack().reset_index(level=0, drop=True).rename_axis('PAR').reset_index()
print (df)
PAR F T
0 A 1 100
1 B 4 70
2 A 2 102
3 B 7 66
4 A 3 200
5 B 11 90
6 A 4 300
7 B 13 178
8 A 5 403
9 B 18 200
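A runnable sketch of this approach on the sample data:

```python
import pandas as pd

df = pd.DataFrame({'AF': [1, 2, 3, 4, 5], 'BF': [4, 7, 11, 13, 18],
                   'AT': [100, 102, 200, 300, 403], 'BT': [70, 66, 90, 178, 200]})

# split each name into (parameter, group): 'AF' -> ('F', 'A'), 'BT' -> ('T', 'B')
df.columns = [df.columns.str[-1], df.columns.str[0]]
# stack moves the A/B level into the index, leaving F/T as columns
out = df.stack().reset_index(level=0, drop=True).rename_axis('PAR').reset_index()
```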
EDIT:
print (df)
C AF BF AT BT
0 10 1 4 100 70
1 20 2 7 102 66
2 30 3 11 200 90
3 40 4 13 300 178
4 50 5 18 403 200
df = df.set_index('C')
df.columns = pd.MultiIndex.from_arrays([df.columns.str[-1],
df.columns.str[0]], names=[None,'PAR'])
df = df.stack().reset_index()
print (df)
C PAR F T
0 10 A 1 100
1 10 B 4 70
2 20 A 2 102
3 20 B 7 66
4 30 A 3 200
5 30 B 11 90
6 40 A 4 300
7 40 B 13 178
8 50 A 5 403
9 50 B 18 200
Let's try:
(pd.wide_to_long(df.reset_index(),stubnames=['A','B'],
i='index',
j='PAR', sep='', suffix='[FT]')
.stack().unstack('PAR').reset_index(level=1)
)
Output:
PAR level_1 F T
index
0 A 1 100
0 B 4 70
1 A 2 102
1 B 7 66
2 A 3 200
2 B 11 90
3 A 4 300
3 B 13 178
4 A 5 403
4 B 18 200
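This variant also runs as-is; a sketch on the sample frame (the A/B values end up in a column with the default name level_1, as in the output above):

```python
import pandas as pd

df = pd.DataFrame({'AF': [1, 2, 3, 4, 5], 'BF': [4, 7, 11, 13, 18],
                   'AT': [100, 102, 200, 300, 403], 'BT': [70, 66, 90, 178, 200]})

# melt A/B into columns with F/T as the "PAR" suffix level, then stack the
# A/B columns and pivot the F/T level back into columns
out = (pd.wide_to_long(df.reset_index(), stubnames=['A', 'B'],
                       i='index',
                       j='PAR', sep='', suffix='[FT]')
       .stack().unstack('PAR').reset_index(level=1)
       )
```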
df
Name Run ID1 ID2
0 A 18 100 500
1 B 19 150 550
2 C 18 200 600
3 D 15 250 650
I then have a variable named max_runs = 20
What I want to do is get the data into the format below: essentially, copy each unique row max_runs - df['Run'] times.
df_output
Name Run ID1 ID2
1 A 19 100 500
2 A 20 100 500
3 B 20 150 550
4 C 19 200 600
5 C 20 200 600
6 D 16 250 650
7 D 17 250 650
8 D 18 250 650
9 D 19 250 650
10 D 20 250 650
Thanks for any help and let me know if I need to explain further
You can use repeat to repeat the rows and assign to modify the new run:
(df.loc[df.index.repeat(max_runs - df.Run)]
.assign(Run=lambda x: x.groupby(level=0).cumcount().add(x.Run+1))
.reset_index()
)
Output:
index Name Run ID1 ID2
0 0 A 19 100 500
1 0 A 20 100 500
2 1 B 20 150 550
3 2 C 19 200 600
4 2 C 20 200 600
5 3 D 16 250 650
6 3 D 17 250 650
7 3 D 18 250 650
8 3 D 19 250 650
9 3 D 20 250 650
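A runnable sketch of this, using the max_runs variable from the question (here I drop the old index instead of keeping it as a column):

```python
import pandas as pd

max_runs = 20
df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                   'Run': [18, 19, 18, 15],
                   'ID1': [100, 150, 200, 250],
                   'ID2': [500, 550, 600, 650]})

out = (df.loc[df.index.repeat(max_runs - df.Run)]   # one copy per missing run
       # number each row's copies Run+1, Run+2, ..., max_runs
       .assign(Run=lambda x: x.groupby(level=0).cumcount().add(x.Run + 1))
       .reset_index(drop=True)
       )
```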
I have a table in a pandas df:
id product_1 count
1 100 10
2 200 20
3 100 30
4 400 40
5 500 50
6 200 60
7 100 70
I also have another table in a dataframe df2:
product score
100 5
200 10
300 15
400 20
500 25
600 30
700 35
I have to create a new column score in my first df, taking the values of score from df2 matched on product_1.
My final output should be:
df =
id product_1 count score
1 100 10 5
2 200 20 10
3 100 30 5
4 400 40 20
5 500 50 25
6 200 60 10
7 100 70 5
Any ideas how to achieve it?
Use map:
df['score'] = df['product_1'].map(df2.set_index('product')['score'].to_dict())
print (df)
id product_1 count score
0 1 100 10 5
1 2 200 20 10
2 3 100 30 5
3 4 400 40 20
4 5 500 50 25
5 6 200 60 10
6 7 100 70 5
Or merge:
df = pd.merge(df,df2, left_on='product_1', right_on='product', how='left')
print (df)
id product_1 count product score
0 1 100 10 100 5
1 2 200 20 200 10
2 3 100 30 100 5
3 4 400 40 400 20
4 5 500 50 500 25
5 6 200 60 200 10
6 7 100 70 100 5
EDIT by comment:
df['score'] = df['product_1'].map(df2.set_index('product')['score'].to_dict())
df['final_score'] = (df['count'].mul(0.6).div(df.id)).add(df.score.mul(0.4))
print (df)
id product_1 count score final_score
0 1 100 10 5 8.0
1 2 200 20 10 10.0
2 3 100 30 5 8.0
3 4 400 40 20 14.0
4 5 500 50 25 16.0
5 6 200 60 10 10.0
6 7 100 70 5 8.0
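The map lookup plus the weighted final_score from the edit can be checked end to end; a sketch (passing the indexed Series to map directly works as well as the to_dict version):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                   'product_1': [100, 200, 100, 400, 500, 200, 100],
                   'count': [10, 20, 30, 40, 50, 60, 70]})
df2 = pd.DataFrame({'product': [100, 200, 300, 400, 500, 600, 700],
                    'score': [5, 10, 15, 20, 25, 30, 35]})

# look up each product's score via a product -> score mapping
df['score'] = df['product_1'].map(df2.set_index('product')['score'])
# weighted combination: 60% of count-per-id plus 40% of score
df['final_score'] = df['count'].mul(0.6).div(df['id']).add(df['score'].mul(0.4))
```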