python condition sum by index - python

I want to create a new column from a conditional sum, but I don't know how to do it for data like this:
index  Date        Open  High  Low   Close  Volume   Change     target
0      2020-12-14  2205  2205  2150  2185   180466   -0.011312  0
1      2020-12-15  2195  2195  2155  2185   139561    0.000000  0
2      2020-12-16  2195  2430  2180  2370   2909662   0.084668  1
3      2020-12-17  2425  2425  2290  2330   587914   -0.016878  0
4      2020-12-18  2335  2355  2225  2295   374375   -0.015021  0
5      2020-12-21  2250  2350  2250  2305   264192    0.004357  0
6      2020-12-22  2330  2345  2245  2255   327715   -0.021692  0
7      2020-12-23  2260  2300  2185  2220   194277   -0.015521  0
8      2020-12-24  2220  2235  2155  2165   208300   -0.024775  0
9      2020-12-28  2170  2180  2110  2120   201740   -0.020785  0
My attempt, which does not work:
last = len(df)
df['Gap_up_cnt_3d'] = [df.loc[last-j-3:last-j-1,"Close"].sum() if t< 19 else t in list(df.index.values)> ]
What I want:
index(9)['Gap_up_cnt_3d'] = sum(index(9).Close, index(8).Close, index(7).Close)
index(8)['Gap_up_cnt_3d'] = sum(index(8).Close, index(7).Close, index(6).Close)
NB: These are not exact expressions; they are only meant to convey the idea.
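One possible approach, offered as a sketch rather than a definitive answer: if the goal is the sum of each row's Close and the two rows before it, a rolling window of size 3 should do it. The frame below only rebuilds the Close column from the sample data, and the column name Gap_up_cnt_3d is taken from the attempt above.

```python
import pandas as pd

# Close values copied from the sample table, with the default 0..9 index.
df = pd.DataFrame(
    {"Close": [2185, 2185, 2370, 2330, 2295, 2305, 2255, 2220, 2165, 2120]}
)

# rolling(3) covers the current row plus the two rows before it.
# The first two rows have no full window, so they come out as NaN.
df["Gap_up_cnt_3d"] = df["Close"].rolling(window=3).sum()

print(df)
```

If the intent is instead to count gap-ups over three days, the same call on the target column would presumably give that count; that reading is a guess based on the column name.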

Related

Trying to insert the values of one Pandas Dataframe into another Dataframe using DateTimeIndex

I can't figure this one out. I want to merge the values of one DataFrame into another DataFrame using their DatetimeIndex, but I can't seem to make it work. Here are the (printed) dataframes:
open high low close volume trade_count vwap minutes
timestamp
2022-04-11 04:00:00-04:00 17.95 18.13 17.89 17.95 7107 184 17.963452 0
2022-04-11 04:01:00-04:00 17.90 17.94 17.84 17.84 2978 71 17.895107 1
2022-04-11 04:02:00-04:00 17.90 17.94 17.66 17.70 4495 67 17.717039 2
2022-04-11 04:03:00-04:00 17.90 17.94 17.66 17.84 3795 58 17.764274 3
2022-04-11 04:04:00-04:00 17.90 17.94 17.66 17.80 3912 55 17.758436 4
... ... ... ... ... ... ... ... ...
2022-04-11 19:55:00-04:00 18.37 18.44 18.30 18.34 7957 31 18.327004 55
2022-04-11 19:56:00-04:00 18.37 18.44 18.30 18.35 5361 6 18.340563 56
2022-04-11 19:57:00-04:00 18.37 18.44 18.30 18.36 1250 16 18.346664 57
2022-04-11 19:58:00-04:00 18.37 18.44 18.30 18.37 2524 30 18.366807 58
2022-04-11 19:59:00-04:00 18.37 18.44 18.30 18.43 3305 41 18.409014 59
[790 rows x 8 columns]
open high low close volume trade_count vwap
timestamp
2022-04-08 04:00:00-04:00 19.69 19.88 19.69 19.82 8246 157 19.780987
2022-04-08 04:15:00-04:00 19.82 19.84 19.77 19.77 2995 73 19.804855
2022-04-08 04:30:00-04:00 19.80 19.80 19.77 19.80 2630 42 19.794878
2022-04-08 04:45:00-04:00 19.79 19.80 19.79 19.79 2294 23 19.793871
2022-04-08 05:00:00-04:00 19.80 19.81 19.80 19.81 888 15 19.805281
... ... ... ... ... ... ... ...
2022-04-11 18:45:00-04:00 18.40 18.42 18.31 18.34 21587 112 18.368550
2022-04-11 19:00:00-04:00 18.39 18.67 18.35 18.39 26144 72 18.388739
2022-04-11 19:15:00-04:00 18.39 18.39 18.30 18.35 46662 128 18.340306
2022-04-11 19:30:00-04:00 18.33 18.44 18.33 18.44 10784 61 18.351895
2022-04-11 19:45:00-04:00 18.42 18.43 18.30 18.43 24923 163 18.356868
[128 rows x 7 columns]
help!!
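A hedged sketch of one way to do this, assuming the aim is to attach to each 1-minute row the most recent 15-minute row at or before it. The question does not say exactly which values should go where, so that reading is an assumption. The two small frames and their values are made up for illustration, use timezone-naive timestamps, and the names minute_df and quarter_df are hypothetical. Both indexes need to be sorted and share the same timezone.

```python
import pandas as pd

# Illustrative 1-minute frame (values are made up).
minute_df = pd.DataFrame(
    {"close": [17.95, 17.84, 17.70, 17.84]},
    index=pd.date_range("2022-04-11 04:14", periods=4, freq="min", name="timestamp"),
)

# Illustrative 15-minute frame (values are made up).
quarter_df = pd.DataFrame(
    {"close": [19.82, 19.77]},
    index=pd.date_range("2022-04-11 04:00", periods=2, freq="15min", name="timestamp"),
)

# For every minute row, take the last 15-minute row whose timestamp is
# at or before it. Overlapping column names get the "_15m" suffix.
merged = pd.merge_asof(
    minute_df,
    quarter_df,
    left_index=True,
    right_index=True,
    suffixes=("", "_15m"),
)

print(merged)
```

If only exact timestamp matches should be combined, a plain minute_df.join(quarter_df, rsuffix="_15m") would be the simpler choice.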

How do I plot time series data in matplotlib for only a certain time frame (like 9 am-3 pm)?

I have daily time series data from 9 am to 3 pm, but when I plot it in matplotlib, the x-axis spans the full 24 hours of each day. I want each day to run only from 9 am to 3 pm. How do I get a continuous daily graph for only 9 am to 3 pm?
This is what I got from my code.
Here is my sample data. I would like a time-date vs. close plot without any gaps. Please help me.
Pardon me for any mistakes here!
close lot1 lot2 time-date
0 800.0 25 50 2020-09-15 11:01:00
1 900.0 25 50 2020-09-15 14:33:00
2 885.85 50 75 2020-09-16 11:45:00
3 791.4 50 125 2020-09-16 12:50:00
4 1082.45 50 75 2020-09-16 14:30:00
5 1060.1 25 125 2020-09-16 14:35:00
6 855.1 50 100 2020-09-17 11:36:00
7 830.0 250 125 2020-09-17 11:39:00
8 815.0 25 125 2020-09-17 11:40:00
9 804.8 25 400 2020-09-17 11:41:00
10 803.0 275 400 2020-09-17 11:44:00
11 791.0 150 650 2020-09-17 11:54:00
12 791.0 100 650 2020-09-17 11:55:00
13 780.65 25 900 2020-09-17 11:59:00
14 784.8 25 925 2020-09-17 12:01:00
15 825.0 25 925 2020-09-17 12:16:00
16 812.3 25 925 2020-09-17 13:25:00
17 816.7 25 950 2020-09-17 14:23:00
18 811.0 25 950 2020-09-17 14:48:00
19 764.5 25 975 2020-09-17 15:11:00
20 808.95 100 1000 2020-09-17 15:20:00
21 805.85 50 1100 2020-09-17 15:24:00
22 798.85 25 1125 2020-09-17 15:27:00
23 812.45 25 1150 2020-09-18 09:17:00
24 814.9 50 1225 2020-09-18 09:18:00
25 840.95 25 1225 2020-09-18 09:19:00
26 839.0 25 1225 2020-09-18 09:20:00
27 827.1 25 1175 2020-09-18 09:23:00
28 812.0 100 1150 2020-09-18 09:28:00
29 770.0 100 1200 2020-09-18 09:32:00
30 784.95 25 1200 2020-09-18 09:33:00
31 790.0 25 1325 2020-09-18 09:35:00
32 788.7 25 1325 2020-09-18 09:37:00
33 789.25 75 1375 2020-09-18 09:38:00
34 810.95 25 1375 2020-09-18 09:42:00
35 827.3 25 1375 2020-09-18 09:43:00
36 821.25 25 1400 2020-09-18 09:45:00
37 809.45 25 1375 2020-09-18 09:57:00
38 820.6 50 1400 2020-09-18 10:01:00
39 835.15 50 1425 2020-09-18 10:04:00
40 832.9 100 1425 2020-09-18 10:05:00
41 839.5 25 1425 2020-09-18 10:07:00
42 831.85 25 1475 2020-09-18 10:09:00
43 808.0 50 1400 2020-09-18 10:14:00
44 795.0 25 1400 2020-09-18 10:17:00
45 780.0 50 1350 2020-09-18 10:26:00
46 802.7 100 1350 2020-09-18 10:28:00
47 792.5 50 1425 2020-09-18 10:29:00
48 790.7 75 1425 2020-09-18 10:30:00
49 793.0 25 1425 2020-09-18 10:34:00
50 789.65 25 1425 2020-09-18 10:35:00
51 796.9 50 1425 2020-09-18 10:37:00
52 791.5 25 1425 2020-09-18 10:38:00
53 797.1 50 1475 2020-09-18 10:39:00
54 760.0 50 1475 2020-09-18 10:41:00
55 780.65 100 1475 2020-09-18 10:42:00
56 782.4 50 1475 2020-09-18 10:43:00
57 780.0 100 1550 2020-09-18 10:45:00
58 788.6 75 1650 2020-09-18 10:50:00
59 794.75 25 1650 2020-09-18 10:53:00
60 792.8 150 1650 2020-09-18 10:54:00
61 806.5 150 1650 2020-09-18 10:55:00
62 801.0 50 1775 2020-09-18 10:57:00
63 789.4 50 1775 2020-09-18 10:58:00
64 804.55 25 1775 2020-09-18 11:00:00
65 792.0 25 1775 2020-09-18 11:03:00
66 793.9 50 1775 2020-09-18 11:05:00
67 785.1 225 1850 2020-09-18 11:06:00
68 782.45 50 1850 2020-09-18 11:07:00
69 778.05 50 1850 2020-09-18 11:08:00
70 766.2 175 2000 2020-09-18 11:12:00
71 772.1 75 2000 2020-09-18 11:13:00
72 758.9 100 2200 2020-09-18 11:14:00
73 771.0 250 2200 2020-09-18 11:15:00
74 764.7 25 2200 2020-09-18 11:16:00
75 777.45 125 2450 2020-09-18 11:19:00
76 778.2 25 2550 2020-09-18 11:22:00
77 777.85 25 2550 2020-09-18 11:23:00
78 783.85 125 2600 2020-09-18 11:24:00
79 776.8 100 2700 2020-09-18 11:26:00
80 776.2 75 2700 2020-09-18 11:28:00
81 785.8 75 2875 2020-09-18 11:30:00
82 787.45 100 2875 2020-09-18 11:31:00
83 789.9 25 2875 2020-09-18 11:32:00
84 798.1 75 2875 2020-09-18 11:33:00
85 792.85 50 2925 2020-09-18 11:38:00
86 794.2 100 2925 2020-09-18 11:40:00
87 796.55 25 3050 2020-09-18 11:42:00
88 800.0 25 3050 2020-09-18 11:44:00
89 781.9 525 3050 2020-09-18 11:50:00
90 787.85 50 3525 2020-09-18 11:51:00
91 787.15 350 3525 2020-09-18 11:53:00
92 789.0 25 3525 2020-09-18 11:54:00
93 801.9 50 3375 2020-09-18 11:57:00
94 793.3 25 3400 2020-09-18 12:02:00
95 793.4 25 3000 2020-09-18 12:07:00
96 795.0 25 3000 2020-09-18 12:08:00
97 800.95 25 3025 2020-09-18 12:09:00
98 800.9 75 3025 2020-09-18 12:10:00
99 792.85 25 3025 2020-09-18 12:11:00
100 785.3 75 3075 2020-09-18 12:12:00
101 790.8 50 3075 2020-09-18 12:13:00
102 782.0 125 3075 2020-09-18 12:14:00
103 791.9 25 3125 2020-09-18 12:15:00
104 789.9 25 3125 2020-09-18 12:16:00
105 799.65 125 3150 2020-09-18 12:18:00
106 795.15 50 3025 2020-09-18 12:30:00
107 794.05 25 3075 2020-09-18 12:36:00
108 785.75 25 3075 2020-09-18 12:37:00
109 758.5 25 3100 2020-09-18 12:45:00
110 775.0 50 3100 2020-09-18 12:46:00
111 771.6 100 3100 2020-09-18 12:47:00
112 768.6 25 3125 2020-09-18 12:48:00
113 786.95 50 3150 2020-09-18 12:53:00
114 781.5 25 3150 2020-09-18 12:54:00
115 774.95 25 3100 2020-09-18 12:58:00
116 771.7 25 3100 2020-09-18 12:59:00
117 766.7 50 3150 2020-09-18 13:00:00
118 764.55 125 3150 2020-09-18 13:01:00
119 767.0 200 3150 2020-09-18 13:02:00
120 770.0 250 3175 2020-09-18 13:05:00
121 770.15 75 3075 2020-09-18 13:06:00
122 789.15 50 3075 2020-09-18 13:08:00
123 777.7 50 3125 2020-09-18 13:10:00
124 787.0 50 3125 2020-09-18 13:11:00
125 790.0 100 3000 2020-09-18 13:14:00
126 795.0 275 3000 2020-09-18 13:15:00
127 775.0 25 3000 2020-09-18 13:20:00
128 780.0 75 2900 2020-09-18 13:21:00
129 774.15 75 2900 2020-09-18 13:22:00
130 769.0 100 2825 2020-09-18 13:24:00
131 753.2 175 2825 2020-09-18 13:25:00
132 761.7 25 2825 2020-09-18 13:26:00
133 775.0 75 2975 2020-09-18 13:30:00
134 766.15 650 3000 2020-09-18 13:37:00
135 767.5 25 3375 2020-09-18 13:40:00
136 778.0 25 3375 2020-09-18 13:42:00
137 782.0 25 3375 2020-09-18 13:43:00
138 776.5 25 3375 2020-09-18 13:44:00
139 796.1 75 3375 2020-09-18 13:45:00
140 795.2 25 3175 2020-09-18 13:48:00
141 802.55 175 3175 2020-09-18 13:49:00
142 806.3 100 3175 2020-09-18 13:50:00
143 806.65 125 3450 2020-09-18 13:51:00
144 788.2 50 3450 2020-09-18 13:52:00
145 796.25 50 3600 2020-09-18 13:55:00
146 795.0 25 3600 2020-09-18 13:56:00
147 784.4 25 3250 2020-09-18 14:01:00
148 790.0 25 3175 2020-09-18 14:05:00
149 790.0 25 3200 2020-09-18 14:06:00
150 780.0 25 3200 2020-09-18 14:07:00
151 775.0 225 3200 2020-09-18 14:08:00
152 779.0 100 3450 2020-09-18 14:10:00
153 767.0 100 3500 2020-09-18 14:12:00
154 769.0 25 3500 2020-09-18 14:13:00
155 759.7 25 3625 2020-09-18 14:17:00
156 764.45 100 3650 2020-09-18 14:18:00
157 750.0 25 3650 2020-09-18 14:19:00
158 701.9 525 3650 2020-09-18 14:20:00
159 631.0 1725 4050 2020-09-18 14:21:00
160 600.0 275 4050 2020-09-18 14:22:00
161 643.5 500 4050 2020-09-18 14:23:00
162 606.9 475 4275 2020-09-18 14:24:00
163 599.0 1000 4275 2020-09-18 14:25:00
164 590.45 675 4275 2020-09-18 14:26:00
165 605.0 925 4950 2020-09-18 14:27:00
166 614.3 600 4950 2020-09-18 14:28:00
167 600.05 775 4950 2020-09-18 14:29:00
168 595.35 1150 7025 2020-09-18 14:30:00
169 596.2 525 7025 2020-09-18 14:31:00
170 596.8 975 7025 2020-09-18 14:32:00
171 584.0 200 8125 2020-09-18 14:33:00
172 550.0 1750 8125 2020-09-18 14:34:00
173 552.2 1825 8125 2020-09-18 14:35:00
174 554.65 600 8750 2020-09-18 14:36:00
175 565.4 925 8750 2020-09-18 14:37:00
176 565.0 150 8750 2020-09-18 14:38:00
177 583.95 450 9150 2020-09-18 14:39:00
178 561.4 975 9150 2020-09-18 14:40:00
179 566.8 3450 9150 2020-09-18 14:41:00
180 563.35 425 10525 2020-09-18 14:42:00
181 565.4 700 10525 2020-09-18 14:43:00
182 570.0 650 10525 2020-09-18 14:44:00
183 572.8 200 11125 2020-09-18 14:45:00
184 595.25 750 11125 2020-09-18 14:46:00
185 585.75 625 11125 2020-09-18 14:47:00
186 593.4 475 10925 2020-09-18 14:48:00
187 590.0 950 10925 2020-09-18 14:49:00
188 576.4 775 10925 2020-09-18 14:50:00
189 596.55 775 10800 2020-09-18 14:51:00
190 595.95 475 10800 2020-09-18 14:52:00
191 593.95 725 10800 2020-09-18 14:53:00
192 611.45 1500 10550 2020-09-18 14:54:00
193 618.0 1050 10550 2020-09-18 14:55:00
194 600.0 1150 10550 2020-09-18 14:56:00
195 609.15 575 11025 2020-09-18 14:57:00
196 615.6 375 11025 2020-09-18 14:58:00
197 604.4 875 11025 2020-09-18 14:59:00
198 591.3 1225 11375 2020-09-18 15:00:00
199 600.0 1100 11375 2020-09-18 15:01:00
200 597.8 1800 11375 2020-09-18 15:02:00
201 605.0 550 12625 2020-09-18 15:03:00
202 599.1 325 12625 2020-09-18 15:04:00
203 606.2 500 12625 2020-09-18 15:05:00
204 614.65 850 12625 2020-09-18 15:06:00
205 616.0 1225 12625 2020-09-18 15:07:00
206 622.5 1325 12625 2020-09-18 15:08:00
207 620.0 750 13850 2020-09-18 15:09:00
208 632.0 525 13850 2020-09-18 15:10:00
209 630.3 375 13850 2020-09-18 15:11:00
210 635.0 425 13575 2020-09-18 15:12:00
211 630.85 400 13575 2020-09-18 15:13:00
212 627.45 275 13575 2020-09-18 15:14:00
213 620.7 200 13700 2020-09-18 15:15:00
214 622.4 200 13700 2020-09-18 15:16:00
215 631.6 625 13700 2020-09-18 15:17:00
216 624.95 525 13100 2020-09-18 15:18:00
217 632.25 775 13100 2020-09-18 15:19:00
218 610.7 350 13100 2020-09-18 15:20:00
219 602.0 575 13175 2020-09-18 15:21:00
220 612.4 200 13175 2020-09-18 15:22:00
221 617.25 325 13175 2020-09-18 15:23:00
222 617.8 300 13650 2020-09-18 15:24:00
223 622.1 600 13650 2020-09-18 15:25:00
224 622.2 250 13650 2020-09-18 15:26:00
225 623.7 300 13425 2020-09-18 15:27:00
226 622.5 425 13425 2020-09-18 15:28:00
227 621.0 375 13425 2020-09-18 15:29:00
228 567.55 1075 13275 2020-09-21 09:15:00
229 565.0 2100 13275 2020-09-21 09:16:00
230 560.0 1100 14925 2020-09-21 09:17:00
231 562.15 625 14925 2020-09-21 09:18:00
232 556.45 850 14925 2020-09-21 09:19:00
233 543.1 1525 16450 2020-09-21 09:20:00
234 538.0 800 16450 2020-09-21 09:21:00
235 537.45 575 16450 2020-09-21 09:22:00
236 544.4 775 16825 2020-09-21 09:23:00
237 545.7 500 16825 2020-09-21 09:24:00
238 551.4 550 16825 2020-09-21 09:25:00
239 544.25 900 17625 2020-09-21 09:26:00
240 538.0 1850 17625 2020-09-21 09:27:00
241 534.85 525 17625 2020-09-21 09:28:00
242 534.5 425 18775 2020-09-21 09:29:00
243 547.1 1075 18775 2020-09-21 09:30:00
244 536.85 375 18775 2020-09-21 09:31:00
245 547.6 375 19775 2020-09-21 09:32:00
246 540.25 350 19775 2020-09-21 09:33:00
247 541.2 375 19775 2020-09-21 09:34:00
248 544.6 175 18650 2020-09-21 09:35:00
249 542.55 250 18650 2020-09-21 09:36:00
250 539.65 550 18650 2020-09-21 09:37:00
251 531.15 2150 19175 2020-09-21 09:38:00
252 530.1 825 19175 2020-09-21 09:39:00
253 518.7 1575 19175 2020-09-21 09:40:00
254 520.95 575 20475 2020-09-21 09:41:00
255 511.45 1250 20475 2020-09-21 09:42:00
256 517.45 1025 20475 2020-09-21 09:43:00
257 515.0 550 21150 2020-09-21 09:44:00
258 515.0 1125 21150 2020-09-21 09:45:00
259 518.9 425 21150 2020-09-21 09:46:00
260 514.95 550 21500 2020-09-21 09:47:00
261 509.8 1625 21500 2020-09-21 09:48:00
262 507.55 475 21500 2020-09-21 09:49:00
263 514.35 500 21975 2020-09-21 09:50:00
264 524.5 500 21975 2020-09-21 09:51:00
265 527.45 550 21975 2020-09-21 09:52:00
266 527.3 675 21550 2020-09-21 09:53:00
267 521.3 525 21550 2020-09-21 09:54:00
268 520.0 275 21550 2020-09-21 09:55:00
269 519.5 750 21600 2020-09-21 09:56:00
270 516.7 400 21600 2020-09-21 09:57:00
271 517.75 350 21600 2020-09-21 09:58:00
272 511.9 575 21850 2020-09-21 09:59:00
273 507.9 1175 21850 2020-09-21 10:00:00
274 510.05 525 21850 2020-09-21 10:01:00
275 515.85 1025 22350 2020-09-21 10:02:00
276 514.25 600 22350 2020-09-21 10:03:00
277 520.05 650 22350 2020-09-21 10:04:00
278 518.9 950 22850 2020-09-21 10:05:00
279 512.25 550 22850 2020-09-21 10:06:00
280 513.65 650 22850 2020-09-21 10:07:00
281 514.0 525 24300 2020-09-21 10:08:00
282 506.3 875 24300 2020-09-21 10:09:00
283 490.85 1825 24300 2020-09-21 10:10:00
284 499.0 300 25050 2020-09-21 10:11:00
285 495.0 975 25050 2020-09-21 10:12:00
286 497.15 1125 25050 2020-09-21 10:13:00
287 496.8 625 24875 2020-09-21 10:14:00
288 492.95 1075 24875 2020-09-21 10:15:00
289 497.6 1125 24875 2020-09-21 10:16:00
290 502.3 775 24450 2020-09-21 10:17:00
291 501.85 475 24450 2020-09-21 10:18:00
292 502.85 800 24450 2020-09-21 10:19:00
293 511.35 825 24700 2020-09-21 10:20:00
294 537.0 1850 24700 2020-09-21 10:21:00
295 531.55 2025 24700 2020-09-21 10:22:00
296 558.45 2825 24675 2020-09-21 10:23:00
297 555.25 625 24675 2020-09-21 10:24:00
298 577.0 4275 24675 2020-09-21 10:25:00
299 574.0 1075 23500 2020-09-21 10:26:00
300 569.4 600 23500 2020-09-21 10:27:00
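One common way to avoid the gaps, sketched here with a few rows copied from the sample data, is to plot close against the row position instead of the timestamp and then relabel the ticks with dates. This assumes the rows are already sorted by time-date; the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# A few rows taken from the sample data above.
df = pd.DataFrame({
    "close": [800.0, 900.0, 885.85, 791.4, 855.1, 830.0],
    "time-date": pd.to_datetime([
        "2020-09-15 11:01:00", "2020-09-15 14:33:00",
        "2020-09-16 11:45:00", "2020-09-16 12:50:00",
        "2020-09-17 11:36:00", "2020-09-17 11:39:00",
    ]),
})

# Plot against row position so nights and weekends take no horizontal space.
fig, ax = plt.subplots()
ax.plot(range(len(df)), df["close"])

# Put one tick at the first row of each day and label it with the date.
first_of_day = df.groupby(df["time-date"].dt.date).head(1)
ax.set_xticks(list(first_of_day.index))
ax.set_xticklabels(first_of_day["time-date"].dt.strftime("%Y-%m-%d"))
ax.set_xlabel("time-date")
ax.set_ylabel("close")
```

The trade-off is that the x-axis is no longer a true time scale: the spacing between points reflects row order, not elapsed time.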

python read and rewrite values per row

I am changing an old question of mine.
I have a file with this format; 4 values per line:
2623 831 6892 0
2353 1803 3425 0
1910 1823 3810 0
1637 1287 2811 0
2803 546 6609 0
1591 2157 2367 0
2167 1906 2665 0
3192 2168 8362 0
3903 1465 2011 0
2355 1801 2004 0
2390 796 5055 0
1703 1044 3441 0
1886 1328 2731 0
1496 1277 3074 0
1827 460 5992 0
1945 1785 2065 0
1983 1963 2818 0
1532 2229 6936 0
2449 5972 1918 0
2699 2007 1581 0
and I want to get this one; 10 values per line:
2623 831 6892 0 2353 1803 3425 0 1910 1823
3810 0 1637 1287 2811 0 2803 546 6609 0
1591 2157 2367 0 2167 1906 2665 0 3192 2168
8362 0 3903 1465 2011 0 2355 1801 2004 0
2390 796 5055 0 1703 1044 3441 0 1886 1328
2731 0 1496 1277 3074 0 1827 460 5992 0
1945 1785 2065 0 1983 1963 2818 0 1532 2229
6936 0 2449 5972 1918 0 2699 2007 1581 0
import itertools

with open("Read_file") as f1:
    with open("Write_file", "w") as f2:
        f2.writelines(itertools.islice(f1, 4, None))
Any tip is appreciated.
Try this:
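with open('data.txt') as fp, open('output.txt', 'w') as fw:
    # split() with no arguments already splits on newlines and spaces
    data = fp.read().split()
    # step through the values 10 at a time; a shorter final chunk is kept
    for i in range(0, len(data), 10):
        fw.write(' '.join(data[i:i + 10]) + '\n')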
Output:
2623 831 6892 0 2353 1803 3425 0 1910 1823
3810 0 1637 1287 2811 0 2803 546 6609 0
1591 2157 2367 0 2167 1906 2665 0 3192 2168
8362 0 3903 1465 2011 0 2355 1801 2004 0
2390 796 5055 0 1703 1044 3441 0 1886 1328
2731 0 1496 1277 3074 0 1827 460 5992 0
1945 1785 2065 0 1983 1963 2818 0 1532 2229
6936 0 2449 5972 1918 0 2699 2007 1581 0
A version that does not rely on reading the whole file into memory:
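import itertools

def get_words(f):
    for line in f:
        for word in line.split():
            yield word

def chunk_values(iterator, num):
    # islice yields an empty chunk once the input is exhausted; calling
    # next() here would raise RuntimeError inside a generator (Python 3.7+)
    while True:
        chunk = list(itertools.islice(iterator, num))
        if not chunk:
            return
        yield chunk

with open('input.txt') as fin, open('output.txt', 'w') as fout:
    for chunk in chunk_values(get_words(fin), 10):
        fout.write(' '.join(chunk) + '\n')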

Why is Pandas resample sampling out of sample?

I've got an issue with the pandas resample function when trying to resample a time series. My program fetches daily traffic data going two years back from today and writes it to a .csv file. Resampling initially worked well, but recently it has started acting up. When I try to resample the daily data to weekly, monthly or quarterly frequency, pandas seems to randomly produce out-of-sample (non-existent) data on both sides of the actual range.
I first create a Pandas data frame from the csv file:
data = pd.read_csv('Trucks.csv')
data['Date'] = pd.to_datetime(data['Date'], infer_datetime_format=True)
data.set_index('Date',inplace=True)
data['Modified Total Trucks'] = data['Modified Total Trucks'].astype(int)
Here's a sample of the data:
Date Total Trucks Modified Total Trucks Solo Trucks Semi Trucks Full Trucks
2020-07-04 3898 2535 805 2281 812
2020-06-04 4125 2740 927 2378 820
2020-05-04 730 569 234 431 65
2020-04-04 465 354 145 270 50
2020-03-04 3501 2377 812 2051 638
2020-02-04 3594 2334 754 2081 759
...
2018-04-13 3243 2333 819 1978 446
2018-12-04 3402 2394 767 2144 491
2018-11-04 3559 2543 859 2209 491
2018-10-04 3492 2473 813 2182 497
2018-09-04 3733 2672 902 2321 510
I then try to resample the data:
DataWeekly = data.resample('1W').sum()
DataMonthly = data.resample('1M').sum()
DataQuarterly = data.resample('1Q').sum()
However, the resampled data frames have the wrong range and sometimes incorrect values. Here's an example of the monthly set:
Date Total Trucks Modified Total Trucks Solo Trucks Semi Trucks Full Trucks
2018-01-31 15553 11119 3842 9531 2180
2018-02-28 18488 13113 4497 11291 2700
2018-03-31 21355 15177 5134 13176 3045
2018-04-30 67785 48478 16524 41893 9368
2018-05-31 72390 51690 17666 44594 10130
2018-06-30 63877 45356 14938 40000 8939
2018-07-31 64846 46437 16108 39703 9035
2018-08-31 68352 49036 16905 42081 9366
2018-09-30 64629 46379 15963 39842 8824
2018-10-31 68093 48609 16806 41643 9644
2018-11-30 74643 53052 18581 45073 10989
2018-12-31 60270 43042 15030 36649 8591
2019-01-31 76866 55463 18994 47789 10083
2019-02-28 74705 53744 18170 46674 9861
2019-03-31 78664 56562 19108 49144 10412
2019-04-30 77760 56175 19356 48224 10180
2019-05-31 88033 63219 22049 53859 12125
2019-06-30 70370 50626 17448 43454 9468
2019-07-31 76014 54531 18698 46947 10369
2019-08-31 83509 60418 21600 50653 11256
2019-09-30 77289 55375 19097 47517 10675
2019-10-31 83514 60021 20761 51397 11356
2019-11-30 81383 58460 20550 49551 11282
2019-12-31 68307 49172 17092 41990 9225
2020-01-31 59448 42384 14547 36472 8429
2020-02-29 53862 38544 13687 32457 7718
2020-03-31 62950 43478 14930 37403 10617
2020-04-30 7796 5645 1968 4811 1017
2020-05-31 7983 5840 2053 4951 979
2020-06-30 11200 7918 2785 6710 1705
2020-07-31 10998 7673 2576 6691 1731
2020-08-31 4602 3323 1155 2838 609
2020-09-30 7980 5794 1991 4981 1008
2020-10-31 9759 7060 2464 6012 1283
2020-11-30 7762 5595 1906 4836 1020
2020-12-31 7642 5412 1790 4760 1092
I would expect the resample to be:
2018-04-30 67785 48478 16524 41893 9368
2018-05-31 72390 51690 17666 44594 10130
2018-06-30 63877 45356 14938 40000 8939
2018-07-31 64846 46437 16108 39703 9035
2018-08-31 68352 49036 16905 42081 9366
2018-09-30 64629 46379 15963 39842 8824
2018-10-31 68093 48609 16806 41643 9644
2018-11-30 74643 53052 18581 45073 10989
2018-12-31 60270 43042 15030 36649 8591
2019-01-31 76866 55463 18994 47789 10083
2019-02-28 74705 53744 18170 46674 9861
2019-03-31 78664 56562 19108 49144 10412
2019-04-30 77760 56175 19356 48224 10180
2019-05-31 88033 63219 22049 53859 12125
2019-06-30 70370 50626 17448 43454 9468
2019-07-31 76014 54531 18698 46947 10369
2019-08-31 83509 60418 21600 50653 11256
2019-09-30 77289 55375 19097 47517 10675
2019-10-31 83514 60021 20761 51397 11356
2019-11-30 81383 58460 20550 49551 11282
2019-12-31 68307 49172 17092 41990 9225
2020-01-31 59448 42384 14547 36472 8429
2020-02-29 53862 38544 13687 32457 7718
2020-03-31 62950 43478 14930 37403 10617
2020-04-30 7796 5645 1968 4811 1017
What am I missing? Many thanks in advance!
I think this is a problem with day-first versus month-first date formats (i.e. YYYY-DD-MM vs YYYY-MM-DD). It looks like pandas reads 2018-01-04 as the 4th of January and puts it into the 2018-01-31 block (i.e. January 2018).
You want to set the option dayfirst=True in your pd.to_datetime call; see the pandas documentation for more details.
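A minimal sketch of that suggestion. The raw date strings below are an assumption: the question only shows the dates after parsing, so the day-month-year layout in the CSV is a guess, and only the Total Trucks values are taken from the sample. The monthly totals are computed with a groupby on the month period instead of resample, purely to keep the sketch short.

```python
import pandas as pd

# Hypothetical raw CSV values, assumed to be written day-month-year.
raw = pd.DataFrame({
    "Date": ["07-04-2020", "06-04-2020", "05-04-2020", "13-04-2018"],
    "Total Trucks": [3898, 4125, 730, 3243],
})

# dayfirst=True makes 07-04-2020 parse as 7 April, not 4 July.
raw["Date"] = pd.to_datetime(raw["Date"], dayfirst=True)
data = raw.set_index("Date").sort_index()

# Group by calendar month; every row should now fall into an April bucket.
monthly = data.groupby(data.index.to_period("M")).sum()
print(monthly)
```

If the file layout is known for certain, passing an explicit format such as format="%d-%m-%Y" to pd.to_datetime is stricter than relying on dayfirst.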

Pandas Slicing Between Dates Then Replace Values With Zero

I have the following DataFrame:
Channel Column 1 Column 2 Column 3
Date
12/30/2018 638 4472 487
12/31/2018 868 6985 540
1/1/2019 755 4401 829
1/2/2019 1655 9484 1145
1/3/2019 2002 14212 1158
1/4/2019 1633 9575 1098
1/5/2019 1026 5575 941
1/6/2019 1025 4963 1007
1/7/2019 1944 10685 1246
1/8/2019 2140 9932 1151
1/9/2019 2067 1031 1087
1/10/2019 2168 1005 1074
1/11/2019 2052 9371 909
1/12/2019 1223 5953 895
1/13/2019 1268 4809 827
I would like to return the following result if possible [essentially reduce values between certain dates in a specific column to zero]
Channel Column 1 Column 2 Column 3
Date
12/30/2018 638 4472 487
12/31/2018 868 6985 540
1/1/2019 755 4401 829
1/2/2019 1655 9484 1145
1/3/2019 2002 14212 1158
1/4/2019 1633 9575 1098
1/5/2019 1026 5575 941
1/6/2019 0 4963 1007
1/7/2019 0 10685 1246
1/8/2019 0 9932 1151
1/9/2019 0 1031 1087
1/10/2019 2168 1005 1074
1/11/2019 2052 9371 909
1/12/2019 1223 5953 895
1/13/2019 1268 4809 827
I am trying to filter by a specific column at specific dates, but I can't get it to work properly.
I have tried the following approaches, but I haven't had much luck
df[df['Channel'] == 'Branded Paid Search'].loc['1/6/2019':'1/9/2019']['Sessions'].apply(lambda x: 0 if x < 4000 else 0).to_frame()
This works, but not sure how to get the values back into the original dataframe.
I tried this:
def zero(df):
    if df[df['Column 1'] > 0].loc['1/6/2019':'1/9/2019']:
        return 0
    else:
        return 1
df.apply(zero, axis=1)
ValueError: ('The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().')
I tried this:
sessions_df[sessions_df['Column 1'] > 0].loc['1/6/2019':'1/9/2019'] = 0
Nothing changes.
Any help would be appreciated
First convert the index to a DatetimeIndex with to_datetime, then set the values with DataFrame.loc:
df.index = pd.to_datetime(df.index)
df.loc['1/6/2019':'1/9/2019', 'Column 1'] = 0
print (df)
Column 1 Column 2 Column 3
Channel
2018-12-30 638 4472 487
2018-12-31 868 6985 540
2019-01-01 755 4401 829
2019-01-02 1655 9484 1145
2019-01-03 2002 14212 1158
2019-01-04 1633 9575 1098
2019-01-05 1026 5575 941
2019-01-06 0 4963 1007
2019-01-07 0 10685 1246
2019-01-08 0 9932 1151
2019-01-09 0 1031 1087
2019-01-10 2168 1005 1074
2019-01-11 2052 9371 909
2019-01-12 1223 5953 895
2019-01-13 1268 4809 827
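As for why the sessions_df[...].loc[...] = 0 attempt changed nothing: that is chained indexing, so the assignment lands on a temporary copy rather than on the original frame. A self-contained sketch that also keeps the Column 1 > 0 condition, using a subset of the rows above (the variable names in_range and mask are just for illustration):

```python
import pandas as pd

# A few rows from the question, indexed by parsed dates.
df = pd.DataFrame(
    {
        "Column 1": [1026, 1025, 1944, 2140, 2067, 2168],
        "Column 2": [5575, 4963, 10685, 9932, 1031, 1005],
    },
    index=pd.to_datetime(
        ["1/5/2019", "1/6/2019", "1/7/2019", "1/8/2019", "1/9/2019", "1/10/2019"],
        format="%m/%d/%Y",
    ),
)

# Build one boolean row mask: inside the date range AND Column 1 > 0.
in_range = (df.index >= "2019-01-06") & (df.index <= "2019-01-09")
mask = (df["Column 1"] > 0) & in_range

# A single .loc call with a row mask and a column label writes to df itself.
df.loc[mask, "Column 1"] = 0

print(df)
```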
