Calculating Rolling Mean of Temperature Anomalies

The data set is indexed by date (year-month-day) and has columns TMAX and TMIN.
I need to calculate rolling means of each of the anomalies calculated below, using a window of 10 years with the window centered, and then add them to the plot.
This part of the code creates the plot of anomalies:
tmaxanom = cll.TMAX - cll.TMAX.mean()
tminanom = cll.TMIN - cll.TMIN.mean()
yearlytmax = tmaxanom.resample('1y').mean()
yearlytmin = tminanom.resample('1y').mean()
ax = plt.plot(yearlytmax, color='red', lw=2, ms=3, marker='o')
ax = plt.plot(yearlytmin, color='blue', lw=2, ms=3, marker='o')
plt.legend(('TMAX Anomaly', 'TMIN Anomaly'), loc='best')
plt.xlabel("Year")
plt.ylabel("Degrees C")
plt.title("Temperature Anomalies in College Station, Texas")
I am trying to calculate rolling means by the following:
rolmean = yearlytmax.rolling(window=10, center=True)
rolmean2 = yearlytmin.rolling(window=10, center=True)
plt.plot(rolmean, color='pink', label='Rolling Mean Max')
plt.plot(rolmean2, color='yellow', label='Rolling Mean Min')
However, this causes pandas to raise an error: NotImplementedError: See issue #11704 https://github.com/pandas-dev/pandas/issues/11704
I followed the link but am still unsure how to fix the problem.
A sample of the data is:
DATE        TMAX  TMIN
1951-08-01 37.8 22.8
1951-08-02 37.8 22.2
1951-08-03 40.0 23.9
1951-08-04 41.7 26.7
1951-08-05 41.1 26.1
1951-08-06 40.6 26.7
1951-08-07 38.9 24.4
1951-08-08 39.4 25.0
1951-08-09 38.9 24.4
1951-08-10 38.9 24.4
1951-08-11 38.9 22.2
1951-08-12 40.0 23.3
1951-08-13 40.6 22.8
1951-08-14 41.1 25.6
1951-08-15 41.1 23.9
1951-08-16 42.2 24.4
1951-08-17 41.7 24.4
1951-08-18 36.7 21.7
1951-08-19 31.7 23.3
1951-08-20 36.7 21.7
1951-08-21 38.3 23.3
1951-08-22 39.4 22.2
1951-08-23 37.2 23.9
1951-08-24 37.8 23.3
1951-08-25 38.3 23.9
1951-08-26 37.8 23.3
1951-08-27 37.8 23.9
1951-08-28 38.3 22.8
1951-08-29 38.3 23.3
1951-08-30 38.9 23.9
... ... ...

I got it to work by adding .mean() to the end of each of the rolling commands. The call to .rolling() only builds a Rolling window object, so an aggregation such as .mean() is needed to get back a Series that can be plotted:
rolmean = yearlytmax.rolling(window=10, center=True).mean()
rolmean2 = yearlytmin.rolling(window=10, center=True).mean()
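
Putting the pieces together, a minimal sketch (assuming, as in the question, a DataFrame called cll with a DatetimeIndex and TMAX/TMIN columns) that overlays the centered 10-year rolling means on the anomaly plot could look like this:
import pandas as pd
import matplotlib.pyplot as plt

# yearly anomalies, as in the question
tmaxanom = cll.TMAX - cll.TMAX.mean()
tminanom = cll.TMIN - cll.TMIN.mean()
yearlytmax = tmaxanom.resample('1y').mean()
yearlytmin = tminanom.resample('1y').mean()

# .rolling() only builds a window object; .mean() turns it back into a plottable Series
rolmean = yearlytmax.rolling(window=10, center=True).mean()
rolmean2 = yearlytmin.rolling(window=10, center=True).mean()

plt.plot(yearlytmax, color='red', lw=2, ms=3, marker='o', label='TMAX Anomaly')
plt.plot(yearlytmin, color='blue', lw=2, ms=3, marker='o', label='TMIN Anomaly')
plt.plot(rolmean, color='pink', label='Rolling Mean Max')
plt.plot(rolmean2, color='yellow', label='Rolling Mean Min')
plt.legend(loc='best')
plt.xlabel("Year")
plt.ylabel("Degrees C")
plt.title("Temperature Anomalies in College Station, Texas")
plt.show()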

Related

matplotlib.pyplot fill_between shape error

I am trying to display a band between an upper and a lower curve, but I keep getting an error. I am using matplotlib.pyplot fill_between.
Error Message:
ValueError: x and y must have same first dimension, but have shapes (1,) and (448, 1)
plt.xlim([0, 45.5])
plt.ylim([-16.50, 15.000])
plt.plot(len(X)-1, y_predicted, 'b', linewidth=1)
plt.fill_between(X, y_predit_lower, y_predit_upper, alpha=0.1, color='green')
plt.plot(X, y_predit_upper, 'g', linewidth=1, linestyle='--')
plt.plot(X, y_predit_lower, 'g', linewidth=1, linestyle='--')
plt.axhline(y=10.70, color='r', linestyle='-.',linewidth=2)
plt.xticks(np.arange(0, 45.5, 1.0), fontsize=8)
plt.yticks(np.arange(-16.50, 15.00, 0.50), fontsize=8)
plt.title("Pressure Gradient Valve Size (27mm)")
plt.xlabel("Time (sec)")
plt.ylabel("Pressure (mmHg)")
plt.grid()
plt.show()
For my x I am using the values from a column of a DataFrame:
X = df_train['Time'].to_numpy('float')
This is the line of code that gives me the error:
plt.fill_between(X, y_predit_lower, y_predit_upper, alpha=0.1, color='green')
Error Message I get:
ValueError: x and y must have same first dimension, but have shapes (1,) and (448, 1)
In: print(X.shape)
Out: (448,)
In: print(y_predit_upper.shape)
Out: (448,)
In: print(y_predit_lower.shape)
Out: (448,)
In: print(X)
Out:
[ 0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.   1.1  1.2  1.3  1.4
 ...
 43.5 43.6 43.7 43.8 43.9 44.  44.1 44.2 44.3 44.4 44.5 44.6 44.7 44.8]
In: print(y_predit_upper)
Out:
[-10.920185  -10.730879  -10.395649   -9.781197   -8.639384
  ...
  10.438997   10.438997   10.438997 ]
In: print(y_predit_lower)
Out:
[-1.4231827e+01 -1.4231827e+01 -1.4231827e+01 -1.4231827e+01
  ...
  3.4945889e+00  3.4946666e+00  3.4947433e+00  3.4948187e+00]
Don't know what I am missing here with the data structure.
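One hedged debugging sketch (an assumption, not a confirmed fix) is to flatten every array to 1-D and re-check the shapes immediately before the plot and fill_between calls, since the (448, 1) in the traceback points at a column vector surviving somewhere at plot time:
import numpy as np

# hypothetical check: force everything to 1-D before plotting
X = np.ravel(X)
y_predit_upper = np.ravel(y_predit_upper)
y_predit_lower = np.ravel(y_predit_lower)
print(X.shape, y_predit_upper.shape, y_predit_lower.shape)  # expect (448,) for all three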

Getting Empty DataFrame in pandas from table data

I'm getting data when using the print command, but the pandas DataFrame comes out as: Empty DataFrame, Columns: [], Index: []
Script:
from bs4 import BeautifulSoup
import requests
import re
import json
import pandas as pd

url = 'http://financials.morningstar.com/finan/financials/getFinancePart.html?&callback=jsonp1640132253903&t=XNAS:AAPL'
req = requests.get(url).text
#print(req)
data = re.search(r'jsonp1640132253903\((\{.*\})\)', req).group(1)
json_data = json.loads(data)['componentData']
#print(json_data)
# with open('index.html','w') as f:
#     f.write(json_data)
soup = BeautifulSoup(json_data, 'lxml')
for tr in soup.select('tr'):
    row_data = [td.get_text(strip=True) for td in tr.select('td,th') if td.text]
    if not row_data:
        continue
    if len(row_data) < 12:
        row_data = ['Particulars'] + row_data
    #print(row_data)
df = pd.DataFrame(row_data)
print(df)
Print result:
['Particulars', '2012-09', '2013-09', '2014-09', '2015-09', '2016-09', '2017-09', '2018-09', '2019-09', '2020-09', '2021-09', 'TTM']
['RevenueUSD Mil', '156,508', '170,910', '182,795', '233,715', '215,639', '229,234', '265,595', '260,174', '274,515', '365,817', '365,817']
['Gross Margin %', '43.9', '37.6', '38.6', '40.1', '39.1', '38.5', '38.3', '37.8', '38.2', '41.8', '41.8']
['Operating IncomeUSD Mil', '55,241', '48,999', '52,503', '71,230', '60,024', '61,344', '70,898', '63,930', '66,288', '108,949', '108,949']
['Operating Margin %', '35.3', '28.7', '28.7', '30.5', '27.8', '26.8', '26.7', '24.6', '24.1', '29.8', '29.8']
['Net IncomeUSD Mil', '41,733', '37,037', '39,510', '53,394', '45,687', '48,351', '59,531', '55,256', '57,411', '94,680', '94,680']
['Earnings Per ShareUSD', '1.58', '1.42', '1.61', '2.31', '2.08', '2.30', '2.98', '2.97', '3.28', '5.61', '5.61']
Expected output:
2012-09 2013-09 2014-09 2015-09 2016-09 2017-09 2018-09 2019-09 2020-09 2021-09 TTM
Revenue USD Mil 156,508 170,910 182,795 233,715 215,639 229,234 265,595 260,174 274,515 365,817 365,817
Gross Margin % 43.9 37.6 38.6 40.1 39.1 38.5 38.3 37.8 38.2 41.8 41.8
Operating Income USD Mil 55,241 48,999 52,503 71,230 60,024 61,344 70,898 63,930 66,288 108,949 108,949
Operating Margin % 35.3 28.7 28.7 30.5 27.8 26.8 26.7 24.6 24.1 29.8 29.8
Net Income USD Mil 41,733 37,037 39,510 53,394 45,687 48,351 59,531 55,256 57,411 94,680 94,680
Earnings Per Share USD 1.58 1.42 1.61 2.31 2.08 2.30 2.98 2.97 3.28 5.61 5.61
Dividends USD 0.09 0.41 0.45 0.49 0.55 0.60 0.68 0.75 0.80 0.85 0.85
Payout Ratio % * — 27.4 28.5 22.3 24.8 26.5 23.7 25.1 23.7 16.3 15.2
Shares Mil 26,470 26,087 24,491 23,172 22,001 21,007 20,000 18,596 17,528 16,865 16,865
Book Value Per Share * USD 4.25 4.90 5.15 5.63 5.93 6.46 6.04 5.43 4.26 3.91 3.85
Operating Cash Flow USD Mil 50,856 53,666 59,713 81,266 65,824 63,598 77,434 69,391 80,674 104,038 104,038
Cap Spending USD Mil -9,402 -9,076 -9,813 -11,488 -13,548 -12,795 -13,313 -10,495 -7,309 -11,085 -11,085
Free Cash Flow USD Mil 41,454 44,590 49,900 69,778 52,276 50,803 64,121 58,896 73,365 92,953 92,953
Free Cash Flow Per Share * USD 1.58 1.61 1.93 2.96 2.24 2.41 2.88 3.07 4.04 5.57 —
Working Capital USD Mil 19,111 29,628 5,083 8,768 27,863 27,831 14,473 57,101 38,321 9,355
Expected columns:
'Particulars', '2012-09', '2013-09', '2014-09', '2015-09', '2016-09', '2017-09', '2018-09', '2019-09', '2020-09', '2021-09', 'TTM'
@QHarr's answer is by far the most straightforward, but in case you are wondering what is wrong with your code, it's that you are resetting the variable row_data on every iteration of the loop.
To make your code work, you can instead store each row as an element in a list. Then to build a DataFrame, you can pass this list of rows and the column names to pd.DataFrame:
data = []
soup = BeautifulSoup(json_data, 'lxml')
for tr in soup.select('tr'):
    row_data = [td.get_text(strip=True) for td in tr.select('td,th') if td.text]
    if not row_data:
        continue
    elif len(row_data) < 12:
        columns = ['Particulars'] + row_data
    else:
        data.append(row_data)
df = pd.DataFrame(data, columns=columns)
Result:
>>> df
Particulars 2012-09 2013-09 2014-09 2015-09 2016-09 2017-09 2018-09 2019-09 2020-09 2021-09 TTM
0 RevenueUSD Mil 156,508 170,910 182,795 233,715 215,639 229,234 265,595 260,174 274,515 365,817 365,817
1 Gross Margin % 43.9 37.6 38.6 40.1 39.1 38.5 38.3 37.8 38.2 41.8 41.8
2 Operating IncomeUSD Mil 55,241 48,999 52,503 71,230 60,024 61,344 70,898 63,930 66,288 108,949 108,949
3 Operating Margin % 35.3 28.7 28.7 30.5 27.8 26.8 26.7 24.6 24.1 29.8 29.8
4 Net IncomeUSD Mil 41,733 37,037 39,510 53,394 45,687 48,351 59,531 55,256 57,411 94,680 94,680
5 Earnings Per ShareUSD 1.58 1.42 1.61 2.31 2.08 2.30 2.98 2.97 3.28 5.61 5.61
6 DividendsUSD 0.09 0.41 0.45 0.49 0.55 0.60 0.68 0.75 0.80 0.85 0.85
7 Payout Ratio % * — 27.4 28.5 22.3 24.8 26.5 23.7 25.1 23.7 16.3 15.2
8 SharesMil 26,470 26,087 24,491 23,172 22,001 21,007 20,000 18,596 17,528 16,865 16,865
9 Book Value Per Share *USD 4.25 4.90 5.15 5.63 5.93 6.46 6.04 5.43 4.26 3.91 3.85
10 Operating Cash FlowUSD Mil 50,856 53,666 59,713 81,266 65,824 63,598 77,434 69,391 80,674 104,038 104,038
11 Cap SpendingUSD Mil -9,402 -9,076 -9,813 -11,488 -13,548 -12,795 -13,313 -10,495 -7,309 -11,085 -11,085
12 Free Cash FlowUSD Mil 41,454 44,590 49,900 69,778 52,276 50,803 64,121 58,896 73,365 92,953 92,953
13 Free Cash Flow Per Share *USD 1.58 1.61 1.93 2.96 2.24 2.41 2.88 3.07 4.04 5.57 —
14 Working CapitalUSD Mil 19,111 29,628 5,083 8,768 27,863 27,831 14,473 57,101 38,321 9,355 —
Use read_html for the DataFrame creation and then drop the all-NaN rows:
json_data=json.loads(data)['componentData']
pd.read_html(json_data)[0].dropna(axis=0, how='all')
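As a side note (an assumption about newer pandas versions, not part of the original answer): recent pandas releases warn when a literal HTML string is passed to read_html, in which case wrapping it in io.StringIO keeps the same one-liner working:
from io import StringIO
import pandas as pd

df = pd.read_html(StringIO(json_data))[0].dropna(axis=0, how='all')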

I'm getting a float axis even with the command MaxNLocator(integer=True)

I have this df called normales:
CODIGO MES TMAX TMIN PP
0 000130 Enero 31.3 23.5 51.1
1 000130 Febrero 31.7 23.8 136.7
2 000130 Marzo 31.8 23.9 119.5
3 000130 Abril 31.5 23.7 55.6
4 000130 Mayo 30.6 23.1 15.6
... ... ... ... ...
4447 158328 Agosto 11.9 -10.6 2.2
4448 158328 Septiembre 13.2 -9.1 1.2
4449 158328 Octubre 14.6 -8.2 4.9
4450 158328 Noviembre 15.4 -7.2 11.1
4451 158328 Diciembre 14.7 -5.3 35.9
With this code I'm plotting time series and bars:
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
from matplotlib.font_manager import FontProperties

for code, data in normales.groupby('CODIGO'):
    fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, sharex=False, sharey=False, figsize=(20, 15))
    data.plot('MES', ["TMAX"], alpha=0.5, color='red', marker='P', fontsize=15.0, ax=ax1)
    data.plot('MES', ["TMIN"], alpha=0.5, color='blue', marker='D', fontsize=15.0, ax=ax2)
    data.plot('MES', ["PP"], kind='bar', color='green', fontsize=15.0, ax=ax3)
    tabla = ax4.table(cellText=data[['TMAX', 'TMIN', 'PP']].T.values,
                      colLabels=["Enero", "Febrero", "Marzo", "Abril", "Mayo", "Junio", "Julio", "Agosto",
                                 "Septiembre", "Octubre", "Noviembre", "Diciembre"],
                      rowLabels=data[['TMAX', 'TMIN', 'PP']].columns,
                      rowColours=["red", "blue", "green"],
                      colColours=["black"] * 12, loc="center", bbox=[0.0, -0.5, 1, 1])
    tabla.auto_set_font_size(False)
    tabla.set_fontsize(15)
    tabla.scale(1, 2)
    ax4.axis('off')
    ax1.set_ylabel("Temperatura\nMáxima °C/mes", fontsize=15.0)
    ax1.yaxis.set_major_locator(MaxNLocator(integer=True))
    ax2.set_ylabel("Temperatura\nMínima °C/mes", fontsize=15.0)
    ax2.yaxis.set_major_locator(MaxNLocator(integer=True))
    ax3.set_ylabel("Precipitación mm/mes", fontsize=15.0)
    ax3.yaxis.set_major_locator(MaxNLocator(integer=True))
    ax1.set_xlabel("")
    ax2.set_xlabel("")
    ax3.set_xlabel("")
    ax4.set_xlabel("")
As you can see, I'm using ax.yaxis.set_major_locator(MaxNLocator(integer=True)) on every axis to force the tick values to be integers. Even so, I'm getting plots with non-integer (float) values on the y-axis. Do you know why this is happening?
Thanks in advance.
From the MaxNLocator docs:
integer bool, default: False
If True, ticks will take only integer values, provided at least min_n_ticks integers are found within the view limits.
....
min_n_ticks int, default: 2
You need to change min_n_ticks to 1 since ax2 only has one integer within the view limits, namely 12.
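In code, that means constructing the locator with min_n_ticks passed explicitly, for example (inside the same loop, reusing ax1/ax2/ax3 from the question):
from matplotlib.ticker import MaxNLocator

# allow integer-only ticks even when only one integer falls inside the view limits
ax1.yaxis.set_major_locator(MaxNLocator(integer=True, min_n_ticks=1))
ax2.yaxis.set_major_locator(MaxNLocator(integer=True, min_n_ticks=1))
ax3.yaxis.set_major_locator(MaxNLocator(integer=True, min_n_ticks=1))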

Resample Pandas Dataframe Without Filling in Missing Times

Resampling a dataframe can take the dataframe to either a higher or lower temporal resolution. Most of the time this is used to go to lower resolution (e.g. resample 1-minute data to monthly values). When the dataset is sparse (for example, no data were collected in Feb-2020), then the Feb-2020 row in the resampled dataframe will be filled with NaNs. The problem is that when the data record is long AND sparse there are a lot of NaN rows, which makes the dataframe unnecessarily large and takes a lot of CPU time. For example, consider this dataframe and resample operation:
import numpy as np
import pandas as pd
freq1 = pd.date_range("20000101", periods=10, freq="S")
freq2 = pd.date_range("20200101", periods=10, freq="S")
index = np.hstack([freq1.values, freq2.values])
data = np.random.randint(0, 100, (20, 10))
cols = list("ABCDEFGHIJ")
df = pd.DataFrame(index=index, data=data, columns=cols)
# now resample to daily average
df = df.resample(rule="1D").mean()
Most of the data in this dataframe is useless and can be removed via:
df.dropna(how="all", axis=0, inplace=True)
however, this is sloppy. Is there another method to resample the dataframe that does not fill all of the data gaps with NaN (i.e. in the example above, the resultant dataframe would have only two rows)?
Updating my original answer with (what I think) is an improvement, plus updated times.
Use groupby
There are a couple of ways you can use groupby instead of resample. In the case of daily ("1D") resampling, you can just use the date property of the DatetimeIndex:
df = df.groupby(df.index.date).mean()
This is in fact faster than the resample for your data:
%%timeit
df.resample(rule='1D').mean().dropna()
# 2.08 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df.groupby(df.index.date).mean()
# 666 µs ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The more general approach would be to use the floor of the timestamps to do the groupby operation:
rule = '1D'
f = df.index.floor(rule)
df.groupby(f).mean()
# A B C D E F G H I J
# 2000-01-01 50.5 33.5 62.7 42.4 46.7 49.2 64.0 53.3 71.0 38.0
# 2020-01-01 50.4 56.3 57.4 46.2 55.0 60.2 60.3 57.8 63.5 47.3
This will work with more irregular frequencies as well. The main snag here is that by default, it seems like the floor is calculated in reference to some initial date, which can cause weird results (see my post):
rule = '7D'
f = df.index.floor(rule)
df.groupby(f).mean()
# A B C D E F G H I J
# 1999-12-30 50.5 33.5 62.7 42.4 46.7 49.2 64.0 53.3 71.0 38.0
# 2019-12-26 50.4 56.3 57.4 46.2 55.0 60.2 60.3 57.8 63.5 47.3
The major issue is that the resampling doesn't start on the earliest timestamp within your data. However, it is fixable using this solution to the above post:
# custom function for flooring relative to a start date
def floor(x, freq):
    offset = x[0].ceil(freq) - x[0]
    return (x + offset).floor(freq) - offset

rule = '7D'
f = floor(df.index, rule)
df.groupby(f).mean()
#                A     B     C     D     E     F     G     H     I     J
# 2000-01-01  50.5  33.5  62.7  42.4  46.7  49.2  64.0  53.3  71.0  38.0
# 2019-12-28  50.4  56.3  57.4  46.2  55.0  60.2  60.3  57.8  63.5  47.3
# the cycle of 7 days is now starting from 2000-01-01
Just note here that the function floor() is relatively slow compared to pandas.Series.dt.floor(). So it is best to use the latter if you can, but both are better than the original resample (in your example):
%%timeit
df.groupby(df.index.floor('1D')).mean()
# 1.06 ms ± 6.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
df.groupby(floor(df.index, '1D')).mean()
# 1.42 ms ± 14.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

How to make a 4d plot using Python with matplotlib

I am looking for a way to create four-dimensional plots (surface plus a color scale) using Python and matplotlib. I am able to generate the surface using the first three variables, but I am not having success adding the color scale for the fourth variable. Here is a small subset of my data below. Any help would be greatly appreciated. Thanks
Data Subset
var1 var2 var3 var4
10.39 73.32 2.02 28.26
11.13 68.71 1.86 27.83
12.71 74.27 1.89 28.26
11.46 91.06 1.63 28.26
11.72 85.38 1.51 28.26
13.39 78.68 1.89 28.26
13.02 68.02 2.01 28.26
12.08 64.37 2.18 28.26
11.58 60.71 2.28 28.26
8.94 65.67 1.92 27.04
11.61 59.57 2.32 27.52
19.06 74.49 1.69 63.35
17.52 73.62 1.73 63.51
19.52 71.52 1.79 63.51
18.76 67.55 1.86 63.51
19.84 53.34 2.3 63.51
20.19 59.82 1.97 63.51
17.43 57.89 2.05 63.38
17.9 59.95 1.89 63.51
18.97 57.84 2 63.51
19.22 57.74 2.05 63.51
17.55 55.66 1.99 63.51
19.22 101.31 6.76 94.29
19.41 99.47 6.07 94.15
18.99 94.01 7.32 94.08
19.88 103.57 6.98 94.58
19.08 95.38 5.66 94.14
20.36 100.43 6.13 94.47
20.13 98.78 7.37 94.47
20.36 89.36 8.79 94.71
20.96 84.48 8.33 94.01
21.02 83.97 6.78 94.72
19.6 95.64 6.56 94.57
To create the plot you want, we need to use matplotlib's plot_surface to plot Z vs (X,Y) surface, and then use the keyword argument facecolors to pass in a new color for each patch.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
# create some fake data
x = y = np.arange(-4.0, 4.0, 0.02)
# here are the x,y and respective z values
X, Y = np.meshgrid(x, y)
Z = np.sinc(np.sqrt(X*X+Y*Y))
# this is the value to use for the color
V = np.sin(Y)
# create the figure, add a 3d axis, set the viewing angle
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.view_init(45,60)
# here we create the surface plot, but pass V through a colormap
# to create a different color for each patch
ax.plot_surface(X, Y, Z, facecolors=cm.Oranges(V))
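Since the question also asks for a color scale for the fourth variable, one possible extension (a sketch building on the answer above; the colormap choice and label are assumptions) is to normalize V to [0, 1] before mapping it through the colormap, then attach a colorbar via a ScalarMappable:
from matplotlib import colors

# map the 4th variable onto [0, 1] so the colormap spans its full range
norm = colors.Normalize(vmin=V.min(), vmax=V.max())
ax.plot_surface(X, Y, Z, facecolors=cm.Oranges(norm(V)))

# a ScalarMappable gives fig.colorbar something to draw the scale from
mappable = cm.ScalarMappable(cmap=cm.Oranges, norm=norm)
mappable.set_array(V)
fig.colorbar(mappable, ax=ax, shrink=0.6, label='4th variable')
plt.show()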
