Append loop in Pandas/Python without duplicating header row? - python

Right now when I run this, the final output includes the header row twice, and as a result it won't let me write the result to a .csv either. How would I fix this so that it only includes the header from the first table? (The column names are the same throughout.)
import pandas as pd
import urllib.request
import bs4 as bs

urls = ['https://fantasysportsdaily.net/bsl/boxes/1-1.html',
        'https://fantasysportsdaily.net/bsl/boxes/1-2.html'
        ]

final = []
for url in urls:
    df = pd.read_html(url, header=0)
    format1 = df[1].iloc[:, :16]
    colname1 = format1.columns[0]
    format1.insert(1, 'Team', colname1)
    format1.rename(columns={list(format1)[0]: 'Player'}, inplace=True)
    format2 = format1.drop(format1[format1.Player == 'TEAM TOTALS'].index)
    team1 = format2.drop(format2[format2.Player == 'PERCENTAGES'].index)
    format3 = df[2].iloc[:, :16]
    colname2 = format3.columns[0]
    format3.insert(1, 'Team', colname2)
    format3.rename(columns={list(format3)[0]: 'Player'}, inplace=True)
    format4 = format3.drop(format3[format3.Player == 'TEAM TOTALS'].index)
    team2 = format4.drop(format4[format4.Player == 'PERCENTAGES'].index)
    both_teams = [team1, team2]
    combined = pd.concat(both_teams)
    final.append(combined, ignore_index=True)
print(final)
##final.to_csv('boxes.csv', index=True, header=True)

Please pay attention to the following points.
Since you are calling the same host, you should use the same session; otherwise each pd.read_html call opens a fresh connection, and the server may block you or treat your requests as a DDoS attack. That's why I've used requests.Session().
Please try to follow the DRY principle: you don't need to repeat your code. Use a function or class, as I've done within the code.
Finally, iloc[] can drop columns and rows as well, so you don't need to go around in circles.
import requests
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:87.0) Gecko/20100101 Firefox/87.0'
}


def myformat(content, key):
    df = pd.read_html(content)[key].iloc[:-2, :-1]
    df.insert(1, 'Team', df.columns[0])
    df.rename(columns={df.columns[0]: "Player"}, inplace=True)
    return df


def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        allin = []
        for num in range(1, 3):
            r = req.get(url.format(num))
            df1 = myformat(r.content, 1)
            df2 = myformat(r.content, 2)
            final = pd.concat([df1, df2], ignore_index=True)
            allin.append(final)
        target = pd.concat(allin, ignore_index=True)
        print(target)


main('https://fantasysportsdaily.net/bsl/boxes/1-{}.html')
Output:
Player Team POS MIN FG FGA 3P ... REB A PF ST TO BL PTS
0 David Robinson Spurs C 32 6 14 0 ... 15 1.0 4.0 2.0 3.0 2.0 21.0
1 Reggie Miller Spurs SG 42 5 12 3 ... 6 3.0 0.0 2.0 3.0 0.0 17.0
2 Tom Gugliotta Spurs PF 25 6 7 0 ... 11 1.0 4.0 2.0 3.0 1.0 17.0
3 Allan Houston Spurs PG 27 5 19 2 ... 1 1.0 1.0 2.0 3.0 0.0 12.0
4 Sean Elliott Spurs SF 34 3 6 0 ... 4 0.0 3.0 2.0 2.0 0.0 7.0
5 Rik Smits Spurs PF 32 1 10 0 ... 9 1.0 4.0 0.0 1.0 0.0 6.0
6 Mark Jackson Spurs PG 21 3 9 0 ... 3 7.0 1.0 2.0 6.0 0.0 6.0
7 Will Perdue Spurs C 16 1 3 0 ... 1 1.0 1.0 1.0 0.0 1.0 4.0
8 Robert Pack Spurs SG 12 0 2 0 ... 0 1.0 1.0 0.0 0.0 0.0 0.0
9 John Starks Lakers SG 39 10 20 2 ... 7 1.0 2.0 2.0 4.0 0.0 27.0
10 Magic Johnson Lakers PG 36 7 10 1 ... 7 7.0 1.0 1.0 2.0 0.0 20.0
11 Eddie Jones Lakers SF 31 4 7 1 ... 5 3.0 0.0 2.0 2.0 0.0 12.0
12 Elden Campbell Lakers PF 24 5 10 0 ... 5 0.0 4.0 0.0 0.0 1.0 12.0
13 Cedric Ceballos Lakers PF 32 3 11 0 ... 11 3.0 6.0 4.0 7.0 0.0 10.0
14 Vlade Divac Lakers C 24 3 6 0 ... 9 1.0 5.0 1.0 1.0 1.0 6.0
15 Pervis Ellison Lakers C 18 3 4 0 ... 4 0.0 6.0 1.0 0.0 1.0 6.0
16 Nick Van Exel Lakers PG 17 3 7 0 ... 1 3.0 0.0 0.0 1.0 0.0 6.0
17 Corie Blount Lakers C 6 0 0 0 ... 4 0.0 1.0 1.0 1.0 1.0 4.0
18 Anthony Peeler Lakers SF 13 0 4 0 ... 1 1.0 0.0 0.0 2.0 0.0 0.0
19 Terry Porter Timberwolves PG 31 6 15 2 ... 4 1.0 2.0 1.0 4.0 0.0 16.0
20 Kendall Gill Timberwolves PG 26 6 10 1 ... 5 5.0 0.0 0.0 3.0 0.0 15.0
21 J.R. Rider Timberwolves SG 34 7 14 0 ... 5 4.0 4.0 0.0 6.0 1.0 14.0
22 Larry Johnson Timberwolves SF 31 3 13 0 ... 10 3.0 1.0 0.0 1.0 0.0 8.0
23 LaPhonso Ellis Timberwolves PF 30 1 13 0 ... 15 2.0 3.0 0.0 1.0 1.0 6.0
24 J.R. Reid Timberwolves PF 18 1 4 0 ... 3 0.0 1.0 2.0 3.0 0.0 4.0
25 Mark Davis Timberwolves SF 17 1 3 0 ... 3 0.0 0.0 0.0 1.0 0.0 2.0
26 Eric Riley Timberwolves C 13 1 2 0 ... 5 0.0 0.0 1.0 1.0 0.0 2.0
27 Kevin Garnett Timberwolves C 35 0 8 0 ... 9 2.0 2.0 2.0 0.0 2.0 1.0
28 Micheal Williams Timberwolves PG 5 0 2 0 ... 2 1.0 1.0 0.0 0.0 0.0 0.0
29 Jim McIlvaine Bullets C 30 5 8 0 ... 6 1.0 2.0 0.0 1.0 6.0 16.0
30 Ledell Eackles Bullets SG 30 6 9 2 ... 9 1.0 2.0 3.0 2.0 0.0 15.0
31 Juwan Howard Bullets PF 29 4 10 0 ... 6 1.0 2.0 1.0 1.0 0.0 15.0
32 Avery Johnson Bullets PG 35 6 16 0 ... 2 6.0 2.0 2.0 2.0 0.0 14.0
33 Tim Legler Bullets SF 28 5 13 0 ... 2 0.0 0.0 1.0 1.0 0.0 10.0
34 David Benoit Bullets C 18 2 8 1 ... 10 1.0 1.0 1.0 1.0 0.0 7.0
35 Brent Price Bullets SG 18 2 6 1 ... 2 2.0 1.0 1.0 0.0 0.0 5.0
36 Rasheed Wallace Bullets SF 22 2 6 0 ... 1 2.0 1.0 1.0 0.0 0.0 4.0
37 Cory Alexander Bullets PG 9 0 2 0 ... 2 2.0 0.0 0.0 3.0 0.0 1.0
38 Mitchell Butler Bullets PF 19 0 1 0 ... 6 0.0 1.0 0.0 0.0 0.0 0.0
[39 rows x 17 columns]

pandas.concat() can concatenate a list of same-structure pandas objects into one. Note that final is a plain Python list, and list.append() takes no ignore_index argument; append each combined frame, then concatenate once at the end:
final = []
for url in urls:
    ...
    combined = pd.concat(both_teams)
    final.append(combined)

final_df = pd.concat(final, ignore_index=True)
print(final_df)
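As a minimal, self-contained sketch of this append-then-concat pattern (the toy data here is made up for illustration, not taken from the question's pages):

```python
import pandas as pd

# Collect one frame per iteration in a plain list, then concatenate once.
frames = []
for team, pts in [("Spurs", [21, 17]), ("Lakers", [27, 20])]:
    frames.append(pd.DataFrame({"Team": team, "PTS": pts}))

# ignore_index=True gives a fresh 0..n-1 index; the header exists only once.
final_df = pd.concat(frames, ignore_index=True)
print(final_df)
```

Writing `final_df.to_csv('boxes.csv', index=False)` then produces a single header row.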

Related

How to sum by grouping on a specific column using Python?

I'm not able to sum by each group/column. The idea is to create a new column on this dataset with the sum by "store":
PNO store ForecastSUM
17 20054706 WITZ 0.0
8 8007536 WITZ 0.0
2 8007205 WITZ 0.0
12 8601965 WITZ 0.0
5 8007239 WITZ 0.0
14 20054706 ROT 1.0
1 8007205 ROT 7.0
9 8601965 ROT 2.0
6 8007536 ROT 3.0
3 8007239 ROT 2.0
15 20054706 MAR 1.0
7 8007536 MAEG 6.0
10 8601965 MAEG 4.0
4 8007239 MAEG 3.0
0 8007205 MAEG 6.0
13 20054706 BUD 1.0
11 8601965 AYC 0.0
16 20054706 AYC 0.0
I am trying to apply this code:
copiedDataWHSE['sumWHSE'] = copiedDataWHSE.groupby(['ForecastSUM']).agg({'ForecastSUM': "sum"})
and the result I am getting is:
PNO store ForecastSUM sumWHSE
17 20054706 WITZ 0.0 NaN
8 8007536 WITZ 0.0 NaN
2 8007205 WITZ 0.0 4.0
12 8601965 WITZ 0.0 NaN
5 8007239 WITZ 0.0 NaN
14 20054706 ROT 1.0 NaN
1 8007205 ROT 7.0 3.0
9 8601965 ROT 2.0 NaN
6 8007536 ROT 3.0 12.0
3 8007239 ROT 2.0 6.0
15 20054706 MAR 1.0 NaN
7 8007536 MAEG 6.0 7.0
10 8601965 MAEG 4.0 NaN
4 8007239 MAEG 3.0 4.0
0 8007205 MAEG 6.0 0.0
13 20054706 BUD 1.0 NaN
11 8601965 AYC 0.0 NaN
16 20054706 AYC 0.0 NaN
This is wrong: for example, wherever the store is ROT, the sumWHSE column should receive 19.
As @sammywemmy mentions, you need to group on store, not on ForecastSUM:
store_groupby = df.groupby(['store']).agg({'ForecastSUM': "sum"})
However, since the groupby result has only one row per store, you can't assign it back to the dataframe as a new column directly.
What I would do is turn the grouped sums into a dictionary, then map them onto a new column (note that you need to select the column before calling to_dict(), otherwise you get a nested dict):
store_groupby_dict = store_groupby['ForecastSUM'].to_dict()
df = df.assign(store_total=lambda x: x.store.map(store_groupby_dict))
Doing the same thing with apply() makes it a little more readable:
df['store_total'] = df.store.apply(lambda x: store_groupby_dict[x])
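A more direct route (my own variant, not in the answer above) is groupby().transform, which broadcasts each group's sum back onto every row of that group, so no intermediate dictionary is needed:

```python
import pandas as pd

# Illustrative subset in the shape of the question's data.
df = pd.DataFrame({
    "store": ["WITZ", "WITZ", "ROT", "ROT", "ROT"],
    "ForecastSUM": [0.0, 0.0, 1.0, 7.0, 2.0],
})

# transform('sum') returns a Series aligned with df's index,
# repeating each store's total on every row of that store.
df["sumWHSE"] = df.groupby("store")["ForecastSUM"].transform("sum")
print(df)
```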

Subtraction between two dataframes' columns

I have two different datasets: total product data and selling data. I need to find the remaining products by comparing the product data against the selling data. I have done some general preprocessing to make both dataframes ready to use, but I can't work out how to compare them.
DataFrame 1:
Item Qty
0 BUDS2 1.0
1 C100 4.0
2 CK1 5.0
3 DM10 10.0
4 DM7 2.0
5 DM9 9.0
6 HM12 6.0
7 HM13 4.0
8 HOCOX25(CTYPE) 1.0
9 HOCOX30USB 1.0
10 RM510 8.0
11 RM512 8.0
12 RM569 1.0
13 RM711 2.0
14 T2C 1.0
and
DataFrame 2 :
Item Name Quantity
0 BUDS2 2.0
1 C100 5.0
2 C101CABLE 1.0
3 CK1 8.0
4 DM10 12.0
5 DM7 5.0
6 DM9 10.0
7 G20CTYPE 1.0
8 G20NORMAL 1.0
9 HM12 9.0
10 HM13 8.0
11 HM9 3.0
12 HOCOX25CTYPE 3.0
13 HOCOX30USB 3.0
14 M45 1.0
15 REMAXRC080M 2.0
16 RM510 11.0
17 RM512 10.0
18 RM569 2.0
19 RM711 3.0
20 T2C 1.0
21 Y1 3.0
22 ZIRCON 1.0
I want to see the available quantity for each item. And I want to get an output like dataframe 2 but the Quantity column values will be changed after doing the subtraction operation. How can I do that ??
Expected Output:
Item Name Quantity
0 BUDS2 1.0
1 C100 1.0
2 C101CABLE 1.0
3 CK1 3.0
4 DM10 2.0
5 DM7 3.0
6 DM9 1.0
7 G20CTYPE 1.0
8 G20NORMAL 1.0
9 HM12 3.0
10 HM13 4.0
11 HM9 3.0
12 HOCOX25CTYPE 2.0
13 HOCOX30USB 2.0
14 M45 1.0
15 REMAXRC080M 2.0
16 RM510 3.0
17 RM512 2.0
18 RM569 1.0
19 RM711 1.0
20 T2C 0.0
21 Y1 3.0
22 ZIRCON 1.0
Merging the two dataframes can help:
df_new = df_2.merge(df_1, 'left', left_on='Item Name', right_on='Item').fillna(0)
df_new.Quantity = df_new.Quantity - df_new.Qty
df_new = df_new.drop(['Item', 'Qty'], axis=1)
df_new output:
Item Name Quantity
0 BUDS2 1.0
1 C100 1.0
2 C101CABLE 1.0
3 CK1 3.0
4 DM10 2.0
5 DM7 3.0
6 DM9 1.0
7 G20CTYPE 1.0
8 G20NORMAL 1.0
9 HM12 3.0
10 HM13 4.0
11 HM9 3.0
12 HOCOX25CTYPE 3.0
13 HOCOX30USB 2.0
14 M45 1.0
15 REMAXRC080M 2.0
16 RM510 3.0
17 RM512 2.0
18 RM569 1.0
19 RM711 1.0
20 T2C 0.0
21 Y1 3.0
22 ZIRCON 1.0
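An equivalent sketch (my own variant, not from the answer above) indexes both frames by item name and subtracts with fill_value=0, so items that never appear in the sales list keep their full quantity:

```python
import pandas as pd

# Small illustrative frames in the shape of the question's data.
stock = pd.DataFrame({"Item Name": ["BUDS2", "C100", "ZIRCON"],
                      "Quantity": [2.0, 5.0, 1.0]})
sold = pd.DataFrame({"Item": ["BUDS2", "C100"],
                     "Qty": [1.0, 4.0]})

# Align on the item name; fill_value=0 treats unsold items as Qty 0.
remaining = stock.set_index("Item Name")["Quantity"].sub(
    sold.set_index("Item")["Qty"], fill_value=0)
print(remaining)
```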

Calculating RSI in Python

I am trying to calculate RSI on a dataframe
import numpy as np
import pandas as pd

df = pd.DataFrame({"Close": [100, 101, 102, 103, 104, 105, 106, 105, 103, 102,
                             103, 104, 103, 105, 106, 107, 108, 106, 105, 107, 109]})
df["Change"] = df["Close"].diff()
df["Gain"] = np.where(df["Change"] > 0, df["Change"], 0)
df["Loss"] = np.where(df["Change"] < 0, abs(df["Change"]), 0)
df["Index"] = [x for x in range(len(df))]
print(df)
Close Change Gain Loss Index
0 100 NaN 0.0 0.0 0
1 101 1.0 1.0 0.0 1
2 102 1.0 1.0 0.0 2
3 103 1.0 1.0 0.0 3
4 104 1.0 1.0 0.0 4
5 105 1.0 1.0 0.0 5
6 106 1.0 1.0 0.0 6
7 105 -1.0 0.0 1.0 7
8 103 -2.0 0.0 2.0 8
9 102 -1.0 0.0 1.0 9
10 103 1.0 1.0 0.0 10
11 104 1.0 1.0 0.0 11
12 103 -1.0 0.0 1.0 12
13 105 2.0 2.0 0.0 13
14 106 1.0 1.0 0.0 14
15 107 1.0 1.0 0.0 15
16 108 1.0 1.0 0.0 16
17 106 -2.0 0.0 2.0 17
18 105 -1.0 0.0 1.0 18
19 107 2.0 2.0 0.0 19
20 109 2.0 2.0 0.0 20
RSI_length = 7
Now, I am stuck on calculating "Avg Gain". The logic here is that the first average gain, at index 6, is the mean of "Gain" over RSI_length periods. Each subsequent "Avg Gain" should be
(Previous Avg Gain * (RSI_length - 1) + "Gain") / RSI_length
I tried the following, but it does not work as expected:
df["Avg Gain"] = np.nan
df["Avg Gain"] = np.where(df["Index"] == (RSI_length - 1),
                          df["Gain"].rolling(window=RSI_length).mean(),
                          np.where(df["Index"] > (RSI_length - 1),
                                   (df["Avg Gain"].iloc[df["Index"] - 1] * (RSI_length - 1) + df["Gain"]) / RSI_length,
                                   np.nan))
The output of this code is:
print(df)
Close Change Gain Loss Index Avg Gain
0 100 NaN 0.0 0.0 0 NaN
1 101 1.0 1.0 0.0 1 NaN
2 102 1.0 1.0 0.0 2 NaN
3 103 1.0 1.0 0.0 3 NaN
4 104 1.0 1.0 0.0 4 NaN
5 105 1.0 1.0 0.0 5 NaN
6 106 1.0 1.0 0.0 6 0.857143
7 105 -1.0 0.0 1.0 7 NaN
8 103 -2.0 0.0 2.0 8 NaN
9 102 -1.0 0.0 1.0 9 NaN
10 103 1.0 1.0 0.0 10 NaN
11 104 1.0 1.0 0.0 11 NaN
12 103 -1.0 0.0 1.0 12 NaN
13 105 2.0 2.0 0.0 13 NaN
14 106 1.0 1.0 0.0 14 NaN
15 107 1.0 1.0 0.0 15 NaN
16 108 1.0 1.0 0.0 16 NaN
17 106 -2.0 0.0 2.0 17 NaN
18 105 -1.0 0.0 1.0 18 NaN
19 107 2.0 2.0 0.0 19 NaN
20 109 2.0 2.0 0.0 20 NaN
Desired output is:
Close Change Gain Loss Index Avg Gain
0 100 NaN 0 0 0 NaN
1 101 1.0 1 0 1 NaN
2 102 1.0 1 0 2 NaN
3 103 1.0 1 0 3 NaN
4 104 1.0 1 0 4 NaN
5 105 1.0 1 0 5 NaN
6 106 1.0 1 0 6 0.857143
7 105 -1.0 0 1 7 0.734694
8 103 -2.0 0 2 8 0.629738
9 102 -1.0 0 1 9 0.539775
10 103 1.0 1 0 10 0.605522
11 104 1.0 1 0 11 0.661876
12 103 -1.0 0 1 12 0.567322
13 105 2.0 2 0 13 0.771990
14 106 1.0 1 0 14 0.804563
15 107 1.0 1 0 15 0.832483
16 108 1.0 1 0 16 0.856414
17 106 -2.0 0 2 17 0.734069
18 105 -1.0 0 1 18 0.629202
19 107 2.0 2 0 19 0.825030
20 109 2.0 2 0 20 0.992883
Here's an implementation of your formula.
RSI_LENGTH = 7
rolling_gain = df["Gain"].rolling(RSI_LENGTH).mean()
df.loc[RSI_LENGTH - 1, "RSI"] = rolling_gain[RSI_LENGTH - 1]
for inx in range(RSI_LENGTH, len(df)):
    df.loc[inx, "RSI"] = (df.loc[inx - 1, "RSI"] * (RSI_LENGTH - 1) + df.loc[inx, "Gain"]) / RSI_LENGTH
The result is:
Close Change Gain Loss Index RSI
0 100 NaN 0.0 0.0 0 NaN
1 101 1.0 1.0 0.0 1 NaN
2 102 1.0 1.0 0.0 2 NaN
3 103 1.0 1.0 0.0 3 NaN
4 104 1.0 1.0 0.0 4 NaN
5 105 1.0 1.0 0.0 5 NaN
6 106 1.0 1.0 0.0 6 0.857143
7 105 -1.0 0.0 1.0 7 0.734694
8 103 -2.0 0.0 2.0 8 0.629738
9 102 -1.0 0.0 1.0 9 0.539775
10 103 1.0 1.0 0.0 10 0.605522
11 104 1.0 1.0 0.0 11 0.661876
12 103 -1.0 0.0 1.0 12 0.567322
13 105 2.0 2.0 0.0 13 0.771990
14 106 1.0 1.0 0.0 14 0.804563
15 107 1.0 1.0 0.0 15 0.832483
16 108 1.0 1.0 0.0 16 0.856414
17 106 -2.0 0.0 2.0 17 0.734069
18 105 -1.0 0.0 1.0 18 0.629202
19 107 2.0 2.0 0.0 19 0.825030
20 109 2.0 2.0 0.0 20 0.992883
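The loop can also be vectorized. Wilder's running average (prev * (N-1) + gain) / N is exactly the recursion an exponential moving average with alpha = 1/N applies, so (as a sketch of the same idea, not the answer's code) you can seed ewm() with the N-period simple mean and let it do the rest:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Close": [100, 101, 102, 103, 104, 105, 106, 105, 103, 102,
                             103, 104, 103, 105, 106, 107, 108, 106, 105, 107, 109]})
df["Change"] = df["Close"].diff()
df["Gain"] = np.where(df["Change"] > 0, df["Change"], 0)

N = 7
# Seed value: simple mean of the first N gains (placed at index N-1).
seed = df["Gain"].iloc[:N].mean()
# ewm(alpha=1/N, adjust=False) computes y[i] = (1 - 1/N)*y[i-1] + (1/N)*x[i],
# which is the same recursion as (prev*(N-1) + gain)/N.
tail = pd.concat([pd.Series([seed], index=[N - 1]), df["Gain"].iloc[N:]])
df["Avg Gain"] = tail.ewm(alpha=1 / N, adjust=False).mean()  # rows before N-1 stay NaN
```

Assigning the shorter Series back aligns on the index, so the first N-1 rows are left as NaN, matching the desired output.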

How do you get a dataframe to perform a SQL groupby operation on multiple columns?

I have a dataframe that looks like the following:
index Player Team Matchup Game_Date WL Min PTS FGM FGA FG% 3PM 3PA 3P% FTM FTA FT% OREB DREB REB AST STL BLK TOV PF Plus_Minus Triple_Double Double_Double FPT 2PA 2PM 2P%
1 John Long TOR TOR # BOS 04/20/1997 W 6 0 0 3.0 0.0 0 1.0 0.0 0 0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 2.0 0.0 1.0 3.00 2.0 0 0.000000
2 Walt Williams TOR TOR # BOS 04/20/1997 W 29 7 3 9.0 33.3 1 2.0 50.0 0 0 0.0 0.0 3.0 3.0 2.0 2.0 1.0 1.0 5.0 20.0 0.0 1.0 21.25 7.0 2 28.571429
3 Todd Day BOS BOS vs. TOR 04/20/1997 L 36 22 8 17.0 47.1 4 8.0 50.0 2 2 100.0 0.0 6.0 6.0 4.0 0.0 0.0 3.0 2.0 -21.0 0.0 1.0 37.50 9.0 4 44.444444
4 Doug Christie TOR TOR # BOS 04/20/1997 W 39 27 8 19.0 42.1 3 9.0 33.3 8 8 100.0 0.0 1.0 1.0 5.0 3.0 1.0 0.0 2.0 30.0 0.0 1.0 46.75 10.0 5 50.000000
5 Brett Szabo BOS BOS vs. TOR 04/20/1997 L 25 5 1 4.0 25.0 0 0.0 0 3 4 75.0 1.0 2.0 3.0 1.0 0.0 0.0 0.0 5.0 -11.0 0.0 1.0 11.75 4.0 1 25.000000
I would like to create a dataframe that groups on team and game_date. I am trying the following code:
df2 = df.groupby(['Game_Date', 'Team', 'Matchup', 'WL'], as_index=False)['Min', 'PTS', 'FGM', 'FGA', '3PM', '3PA', 'FTM', 'FTA',
'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TOV', '2PM', '2PA',
'PF'].sum()
However, when I run it, I get the following dataframe:
Game_Date Team Matchup WL Min PTS FGM FGA 3PM 3PA OREB DREB REB AST STL BLK TOV 2PM 2PA PF
04/11/2018 DEN DEN # MIN L 21966 12552 4707 11506.0 2615 5230.0 1046.0 3138.0 4184.0 2615.0 523.0 523.0 1046.0 2092 6276.0 2092.0
04/11/2018 MEM MEM # OKC L 125520 64329 23012 47593.0 6799 16213.0 4707.0 16736.0 21443.0 11506.0 4707.0 1046.0 6799.0 16213 31380.0 11506.0
04/11/2018 MIN MIN vs. DEN W 40271 20397 7322 15167.0 523 2092.0 1046.0 4707.0 5753.0 2615.0 1569.0 1046.0 2092.0 6799 13075.0 4184.0
04/11/2018 NOP NOP vs. SAS W 124997 63806 27196 46024.0 2615 9937.0 5753.0 20920.0 26673.0 15690.0 5753.0 2615.0 9937.0 24581 36087.0 10460.0
04/11/2018 OKC OKC vs. MEM W 126043 71651 24581 44455.0 10460 22489.0 4184.0 19351.0 23535.0 16736.0 4184.0 1569.0 7322.0 14121 21966.0 12029.0
Why is it grouping incorrectly?
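No answer is recorded here, but sums uniformly inflated by a large constant factor, as in the output above, usually mean the input rows are duplicated before the groupby. A quick way to check and correct for that (illustrative toy frame, not the asker's data) might be:

```python
import pandas as pd

# Toy frame where every row is accidentally tripled.
df = pd.DataFrame({"Team": ["TOR", "BOS"] * 3,
                   "Game_Date": ["04/20/1997"] * 6,
                   "PTS": [7, 22] * 3})

print(df.duplicated().sum())  # number of exact duplicate rows

# Dropping duplicates before grouping restores the expected sums.
clean = df.drop_duplicates()
totals = clean.groupby(["Game_Date", "Team"], as_index=False)["PTS"].sum()
print(totals)
```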

Pandas, create new columns based on existing with repeated count

It's a bit complicated to explain, so I'll do my best. I have a pandas dataframe with two columns: hour (from 1 to 24) and value (corresponding to each hour). The index is huge, but the hour column repeats on a 24-hour basis (from 1 to 24). I am trying to create 24 new columns: value -1, value -2, value -3 ... value -24, where each row holds the value from 1 hour earlier, 2 hours earlier, and so on (taken from the rows above).
hour | value | value -1 | value -2 | value -3| ... | value - 24
1 10 0 0 0 0
2 11 10 0 0 0
3 12 11 10 0 0
4 13 12 11 10 0
...
24 32 31 30 29 0
1 33 32 31 30 10
2 34 33 32 31 11
and so on...
All value numbers are just for the example. As I said, there are lots of rows: not only 24 for the hours of one day, but the whole following time series from 1 to 24, and so on.
Thanks in advance, and may the force be with you!
Is this what you need?
df = pd.DataFrame([[1, 10], [2, 11],
                   [3, 12], [4, 13]], columns=['hour', 'value'])

for i in range(1, 24):
    df['value -' + str(i)] = df['value'].shift(i).fillna(0)
result:
Is this what you are looking for?
import pandas as pd

df = pd.DataFrame({'hour': list(range(24)) * 2,
                   'value': list(range(48))})

shift_cols_n = 10
for shift in range(1, shift_cols_n):
    new_columns_name = 'value - ' + str(shift)
    # Assuming that you don't have any NAs in your dataframe
    df[new_columns_name] = df['value'].shift(shift).fillna(0)
    # A safer (and less simple) way, in case you have NAs in your dataframe
    df[new_columns_name] = df['value'].shift(shift)
    df.loc[:shift, new_columns_name] = 0

print(df.head(9))
hour value value - 1 value - 2 value - 3 value - 4 value - 5 \
0 0 0 0.0 0.0 0.0 0.0 0.0
1 1 1 0.0 0.0 0.0 0.0 0.0
2 2 2 1.0 0.0 0.0 0.0 0.0
3 3 3 2.0 1.0 0.0 0.0 0.0
4 4 4 3.0 2.0 1.0 0.0 0.0
5 5 5 4.0 3.0 2.0 1.0 0.0
6 6 6 5.0 4.0 3.0 2.0 1.0
7 7 7 6.0 5.0 4.0 3.0 2.0
8 8 8 7.0 6.0 5.0 4.0 3.0
value - 6 value - 7 value - 8 value - 9
0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0
6 0.0 0.0 0.0 0.0
7 1.0 0.0 0.0 0.0
8 2.0 1.0 0.0 0.0
