Getting Empty DataFrame in pandas from table data - python
The print statements show the scraped data, but building a pandas DataFrame from it returns: Empty DataFrame, Columns: [], Index: []
Script:
from bs4 import BeautifulSoup
import requests
import re
import json
import pandas as pd
url='http://financials.morningstar.com/finan/financials/getFinancePart.html?&callback=jsonp1640132253903&t=XNAS:AAPL'
req=requests.get(url).text
#print(req)
data=re.search(r'jsonp1640132253903\((\{.*\})\)',req).group(1)
json_data=json.loads(data)['componentData']
#print(json_data)
# with open('index.html','w') as f:
# f.write(json_data)
soup = BeautifulSoup(json_data, 'lxml')
for tr in soup.select('tr'):
    row_data = [td.get_text(strip=True) for td in tr.select('td,th') if td.text]
    if not row_data:
        continue
    if len(row_data) < 12:
        row_data = ['Particulars'] + row_data
    #print(row_data)
df = pd.DataFrame(row_data)
print(df)
Print result:
['Particulars', '2012-09', '2013-09', '2014-09', '2015-09', '2016-09', '2017-09', '2018-09', '2019-09', '2020-09', '2021-09', 'TTM']
['RevenueUSD Mil', '156,508', '170,910', '182,795', '233,715', '215,639', '229,234', '265,595', '260,174', '274,515', '365,817', '365,817']
['Gross Margin %', '43.9', '37.6', '38.6', '40.1', '39.1', '38.5', '38.3', '37.8', '38.2', '41.8', '41.8']
['Operating IncomeUSD Mil', '55,241', '48,999', '52,503', '71,230', '60,024', '61,344', '70,898', '63,930', '66,288', '108,949', '108,949']
['Operating Margin %', '35.3', '28.7', '28.7', '30.5', '27.8', '26.8', '26.7', '24.6', '24.1', '29.8', '29.8']
['Net IncomeUSD Mil', '41,733', '37,037', '39,510', '53,394', '45,687', '48,351', '59,531', '55,256', '57,411', '94,680', '94,680']
['Earnings Per ShareUSD', '1.58', '1.42', '1.61', '2.31', '2.08', '2.30', '2.98', '2.97', '3.28', '5.61', '5.61']
Expected output:
2012-09 2013-09 2014-09 2015-09 2016-09 2017-09 2018-09 2019-09 2020-09 2021-09 TTM
Revenue USD Mil 156,508 170,910 182,795 233,715 215,639 229,234 265,595 260,174 274,515 365,817 365,817
Gross Margin % 43.9 37.6 38.6 40.1 39.1 38.5 38.3 37.8 38.2 41.8 41.8
Operating Income USD Mil 55,241 48,999 52,503 71,230 60,024 61,344 70,898 63,930 66,288 108,949 108,949
Operating Margin % 35.3 28.7 28.7 30.5 27.8 26.8 26.7 24.6 24.1 29.8 29.8
Net Income USD Mil 41,733 37,037 39,510 53,394 45,687 48,351 59,531 55,256 57,411 94,680 94,680
Earnings Per Share USD 1.58 1.42 1.61 2.31 2.08 2.30 2.98 2.97 3.28 5.61 5.61
Dividends USD 0.09 0.41 0.45 0.49 0.55 0.60 0.68 0.75 0.80 0.85 0.85
Payout Ratio % * — 27.4 28.5 22.3 24.8 26.5 23.7 25.1 23.7 16.3 15.2
Shares Mil 26,470 26,087 24,491 23,172 22,001 21,007 20,000 18,596 17,528 16,865 16,865
Book Value Per Share * USD 4.25 4.90 5.15 5.63 5.93 6.46 6.04 5.43 4.26 3.91 3.85
Operating Cash Flow USD Mil 50,856 53,666 59,713 81,266 65,824 63,598 77,434 69,391 80,674 104,038 104,038
Cap Spending USD Mil -9,402 -9,076 -9,813 -11,488 -13,548 -12,795 -13,313 -10,495 -7,309 -11,085 -11,085
Free Cash Flow USD Mil 41,454 44,590 49,900 69,778 52,276 50,803 64,121 58,896 73,365 92,953 92,953
Free Cash Flow Per Share * USD 1.58 1.61 1.93 2.96 2.24 2.41 2.88 3.07 4.04 5.57 —
Working Capital USD Mil 19,111 29,628 5,083 8,768 27,863 27,831 14,473 57,101 38,321 9,355 —
Expected columns:
'Particulars', '2012-09', '2013-09', '2014-09', '2015-09', '2016-09', '2017-09', '2018-09', '2019-09', '2020-09', '2021-09', 'TTM'
@QHarr's answer is by far the most straightforward, but in case you are wondering what is wrong with your code: you are resetting the variable row_data on every iteration of the loop, so after the loop it holds only the last row, which here is empty.
To make your code work, you can instead store each row as an element in a list. Then to build a DataFrame, you can pass this list of rows and the column names to pd.DataFrame:
data = []
soup = BeautifulSoup(json_data, 'lxml')
for tr in soup.select('tr'):
    row_data = [td.get_text(strip=True) for td in tr.select('td,th') if td.text]
    if not row_data:
        continue
    elif len(row_data) < 12:
        columns = ['Particulars'] + row_data
    else:
        data.append(row_data)
df = pd.DataFrame(data, columns=columns)
Result:
>>> df
Particulars 2012-09 2013-09 2014-09 2015-09 2016-09 2017-09 2018-09 2019-09 2020-09 2021-09 TTM
0 RevenueUSD Mil 156,508 170,910 182,795 233,715 215,639 229,234 265,595 260,174 274,515 365,817 365,817
1 Gross Margin % 43.9 37.6 38.6 40.1 39.1 38.5 38.3 37.8 38.2 41.8 41.8
2 Operating IncomeUSD Mil 55,241 48,999 52,503 71,230 60,024 61,344 70,898 63,930 66,288 108,949 108,949
3 Operating Margin % 35.3 28.7 28.7 30.5 27.8 26.8 26.7 24.6 24.1 29.8 29.8
4 Net IncomeUSD Mil 41,733 37,037 39,510 53,394 45,687 48,351 59,531 55,256 57,411 94,680 94,680
5 Earnings Per ShareUSD 1.58 1.42 1.61 2.31 2.08 2.30 2.98 2.97 3.28 5.61 5.61
6 DividendsUSD 0.09 0.41 0.45 0.49 0.55 0.60 0.68 0.75 0.80 0.85 0.85
7 Payout Ratio % * — 27.4 28.5 22.3 24.8 26.5 23.7 25.1 23.7 16.3 15.2
8 SharesMil 26,470 26,087 24,491 23,172 22,001 21,007 20,000 18,596 17,528 16,865 16,865
9 Book Value Per Share *USD 4.25 4.90 5.15 5.63 5.93 6.46 6.04 5.43 4.26 3.91 3.85
10 Operating Cash FlowUSD Mil 50,856 53,666 59,713 81,266 65,824 63,598 77,434 69,391 80,674 104,038 104,038
11 Cap SpendingUSD Mil -9,402 -9,076 -9,813 -11,488 -13,548 -12,795 -13,313 -10,495 -7,309 -11,085 -11,085
12 Free Cash FlowUSD Mil 41,454 44,590 49,900 69,778 52,276 50,803 64,121 58,896 73,365 92,953 92,953
13 Free Cash Flow Per Share *USD 1.58 1.61 1.93 2.96 2.24 2.41 2.88 3.07 4.04 5.57 —
14 Working CapitalUSD Mil 19,111 29,628 5,083 8,768 27,863 27,831 14,473 57,101 38,321 9,355 —
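If you also want the Particulars column to act as the row labels, matching the expected output in the question, the standard pandas move is to promote it to the index:

df = pd.DataFrame(data, columns=columns).set_index('Particulars')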
Use read_html for the DataFrame creation, then drop the all-NA rows:
json_data=json.loads(data)['componentData']
pd.read_html(json_data)[0].dropna(axis=0, how='all')
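For context, read_html parses every &lt;table&gt; element in the markup and returns a list of DataFrames, which is why the [0] is needed; it also requires an HTML parser such as lxml to be installed. A minimal end-to-end sketch of this approach, reusing the question's own URL and regex:

import json
import re

import pandas as pd
import requests

url = 'http://financials.morningstar.com/finan/financials/getFinancePart.html?&callback=jsonp1640132253903&t=XNAS:AAPL'
req = requests.get(url).text

# strip the JSONP wrapper, then pull the HTML fragment out of the JSON payload
data = re.search(r'jsonp1640132253903\((\{.*\})\)', req).group(1)
json_data = json.loads(data)['componentData']

# one DataFrame per <table>; keep the first and drop fully empty rows
df = pd.read_html(json_data)[0].dropna(axis=0, how='all')
print(df)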
Related
matplotlib.pyplot fill_between shape error
I am trying to display a band using upper and lower plots, but I keep getting an error. I am using matplotlib.pyplot fill_between.

plt.xlim([0, 45.5])
plt.ylim([-16.50, 15.000])
plt.plot(len(X)-1, y_predicted, 'b', linewidth=1)
plt.fill_between(X, y_predit_lower, y_predit_upper, alpha=0.1, color='green')
plt.plot(X, y_predit_upper, 'g', linewidth=1, linestyle='--')
plt.plot(X, y_predit_lower, 'g', linewidth=1, linestyle='--')
plt.axhline(y=10.70, color='r', linestyle='-.', linewidth=2)
plt.xticks(np.arange(0, 45.5, 1.0), fontsize=8)
plt.yticks(np.arange(-16.50, 15.00, 0.50), fontsize=8)
plt.title("Pressure Gradient Valve Size (27mm)")
plt.xlabel("Time (sec)")
plt.ylabel("Pressure (mmHg)")
plt.grid()
plt.show()

For my x I am using the values from a column of a DataFrame:

X = df_train['Time'].to_numpy('float')

This is the line of the code that gives me the error:

plt.fill_between(X, y_predit_lower, y_predit_upper, alpha=0.1, color='green')

The error message I get:

ValueError: x and y must have same first dimension, but have shapes (1,) and (448, 1)

In: print(X.shape)
Out: (448,)
In: print(y_predit_upper.shape)
Out: (448,)
In: print(y_predit_lower.shape)
Out: (448,)
In: print(X)
Out: [ 0.1  0.2  0.3 ... 44.6 44.7 44.8]
In: print(y_predit_upper)
Out: [-10.920185 -10.730879 -10.395649 ... 10.438997 10.438997 10.438997]
In: print(y_predit_lower)
Out: [-1.4231827e+01 -1.4231827e+01 -1.4231827e+01 ... 3.4946666e+00 3.4947433e+00 3.4948187e+00]

Don't know what I am missing here with the data structure.
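No answer was recorded for this one, so here is only a hedged observation: the traceback reports x with shape (1,) and y with shape (448, 1), while the printed checks all show (448,), which suggests the failing call involved a scalar x and a 2-D column-vector y — that matches plt.plot(len(X)-1, y_predicted, ...) rather than the fill_between line itself. A minimal sketch of what to try under that assumption (X, y_predicted and friends are the question's variables):

import numpy as np
import matplotlib.pyplot as plt

# flatten any (448, 1) column vectors down to (448,) so they line up with X
y_predicted = np.ravel(y_predicted)
y_predit_upper = np.ravel(y_predit_upper)
y_predit_lower = np.ravel(y_predit_lower)

# plot the prediction against the whole X axis, not against the scalar len(X) - 1
plt.plot(X, y_predicted, 'b', linewidth=1)
plt.fill_between(X, y_predit_lower, y_predit_upper, alpha=0.1, color='green')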
Calculating Rolling Mean of Temperature Anomalies - python
The data set is indexed by year-month-day dates and has columns TMAX and TMIN. I need to calculate rolling means of each of the anomalies I calculated, using a window of 10 years with the window centered, then add this to my plot.

This part of the code creates a plot of the anomalies:

tmaxanom = cll.TMAX - cll.TMAX.mean()
tminanom = cll.TMIN - cll.TMIN.mean()
yearlytmax = tmaxanom.resample('1y').mean()
yearlytmin = tminanom.resample('1y').mean()
ax = plt.plot(yearlytmax, color='red', lw=2, ms=3, marker='o')
ax = plt.plot(yearlytmin, color='blue', lw=2, ms=3, marker='o')
plt.legend(('TMAX Anomaly', 'TMIN Anomaly'), loc='best')
plt.xlabel("Year")
plt.ylabel("Degrees C")
plt.title("Temperature Anomalies in College Station, Texas")

I am trying to calculate the rolling means with:

rolmean = yearlytmax.rolling(window=10, center=True)
rolmean2 = yearlytmin.rolling(window=10, center=True)
plt.plot(rolmean, color='pink', label='Rolling Mean Max')
plt.plot(rolmean2, color='yellow', label='Rolling Mean Min')

However, this causes python to throw an error:

NotImplementedError: See issue #11704 https://github.com/pandas-dev/pandas/issues/11704

I followed the link, but am still unsure how to fix this problem. A sample of the data:

DATE        TMAX  TMIN
1951-08-01  37.8  22.8
1951-08-02  37.8  22.2
1951-08-03  40.0  23.9
1951-08-04  41.7  26.7
1951-08-05  41.1  26.1
1951-08-06  40.6  26.7
1951-08-07  38.9  24.4
1951-08-08  39.4  25.0
1951-08-09  38.9  24.4
1951-08-10  38.9  24.4
1951-08-11  38.9  22.2
1951-08-12  40.0  23.3
1951-08-13  40.6  22.8
1951-08-14  41.1  25.6
1951-08-15  41.1  23.9
1951-08-16  42.2  24.4
1951-08-17  41.7  24.4
1951-08-18  36.7  21.7
1951-08-19  31.7  23.3
1951-08-20  36.7  21.7
1951-08-21  38.3  23.3
1951-08-22  39.4  22.2
1951-08-23  37.2  23.9
1951-08-24  37.8  23.3
1951-08-25  38.3  23.9
1951-08-26  37.8  23.3
1951-08-27  37.8  23.9
1951-08-28  38.3  22.8
1951-08-29  38.3  23.3
1951-08-30  38.9  23.9
...          ...   ...
I got it to work by appending .mean() to the end of each of the rolling-mean commands, so they look like:

rolmean = yearlytmax.rolling(window=10, center=True).mean()
rolmean2 = yearlytmin.rolling(window=10, center=True).mean()
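The reason this works (not part of the original answer, but standard pandas behavior): .rolling() alone returns a lazy Rolling object that merely describes the window, and matplotlib has no way to draw that; applying an aggregation such as .mean() turns it back into a plottable Series. A tiny self-contained check with made-up yearly data:

import numpy as np
import pandas as pd

s = pd.Series(np.arange(20.0), index=pd.date_range('1951-01-01', periods=20, freq='YS'))
print(type(s.rolling(window=10, center=True)))         # a Rolling object, not plottable
print(type(s.rolling(window=10, center=True).mean()))  # a Series, plottable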
extract only certain rows in a dataframe
I have a dataframe like this:

     Code        Date        Open    High    Low     Close   Volume          VWAP    TWAP
0    US_GWA_BTC  2014-04-01  467.28  488.62  467.28  479.56  74,776.48       482.76  482.82
1    GWA_BTC     2014-04-02  479.20  494.30  431.32  437.08  114,052.96      460.19  465.93
2    GWA_BTC     2014-04-03  437.33  449.74  414.41  445.60  91,415.08       432.29  433.28
...
316  MWA_XRP_US  2018-01-19  1.57    1.69    1.48    1.53    242,563,870.44  1.59    1.59
317  MWA_XRP_US  2018-01-20  1.54    1.62    1.49    1.57    140,459,727.30  1.56    1.56

I want to filter the rows whose Code starts with GWA. I tried this code, but it's not working:

df.set_index("Code").filter(regex='[GWA_]*', axis=0)
Try using startswith:

df[df.Code.str.startswith('GWA')]
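As a side note (my observation, not part of the original answer): the attempted regex [GWA_]* is a character class repeated zero or more times, so it can match the empty string, and since filter(regex=...) searches anywhere in each label, every row is kept. If you prefer to stay with filter, anchoring the pattern at the start of the label behaves like startswith:

df.set_index('Code').filter(regex='^GWA', axis=0)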
How to keep pandas group by column when applying transform function?
This is what my pandas DataFrame looks like:

             sampling_time  MQ2_LPG  MQ2_CO  MQ2_SMOKE  MQ2_ALCOHOL  MQ2_CH4  MQ2_H2  MQ2_PROPANE
0  2018-07-15 08:41:49.028     4.41   32.87      19.12         7.70    10.29    7.59         4.49
1  2018-07-15 08:41:49.028     2.98   19.08      12.47         4.72     6.34    5.15         3.02
2  2018-07-15 08:41:49.028     2.73   16.88      11.33         4.22     5.69    4.72         2.76
3  2018-07-15 08:41:49.028     2.69   16.47      11.11         4.13     5.57    4.64         2.71
4  2018-07-15 08:41:49.028     2.66   16.26      11.00         4.09     5.50    4.60         2.69

When I group by sampling_time (split-apply-combine), the sampling_time column is removed:

transformed = dataframe.groupby('sampling_time').transform(lambda x: (x - x.mean()) / x.std())
transformed.head()

     MQ2_LPG     MQ2_CO  MQ2_SMOKE  MQ2_ALCOHOL    MQ2_CH4     MQ2_H2  MQ2_PROPANE
0  15.710127  15.975636  15.773724    15.876433  15.874190  15.694674
1   3.519619   3.313661   3.494836     3.408578   3.404160   3.563717
2   1.388411   1.293621   1.389884     1.316656   1.352130   1.425885
3   1.047418   0.917159   0.983665     0.940110   0.973294   1.028148
4   0.791673   0.724337   0.780556     0.772756   0.752306   0.829280

Any help or suggestion on how to keep the sampling_time column would be very appreciated.
You can do this by setting sampling_time as the index; then, when you run groupby with transform, the transformed columns come out with that index:

df1 = df.set_index('sampling_time')
df1.groupby('sampling_time').transform(lambda x: x - x.std())

Output:

                          MQ2_LPG     MQ2_CO  MQ2_SMOKE  MQ2_ALCOHOL  \
sampling_time
2018-07-15 08:41:49.028  3.663522  25.760508  15.652432     6.154209
2018-07-15 08:41:49.028  2.233522  11.970508   9.002432     3.174209
2018-07-15 08:41:49.028  1.983522   9.770508   7.862432     2.674209
2018-07-15 08:41:49.028  1.943522   9.360508   7.642432     2.584209
2018-07-15 08:41:49.028  1.913522   9.150508   7.532432     2.544209

                          MQ2_CH4    MQ2_H2  MQ2_PROPANE
sampling_time
2018-07-15 08:41:49.028  8.243523  6.313227       3.7205
2018-07-15 08:41:49.028  4.293523  3.873227       2.2505
2018-07-15 08:41:49.028  3.643523  3.443227       1.9905
2018-07-15 08:41:49.028  3.523523  3.363227       1.9405
2018-07-15 08:41:49.028  3.453523  3.323227       1.9205
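An alternative worth knowing (not from the original answer, but documented pandas behavior): transform returns a result aligned on the caller's row index, so the grouping column can simply be concatenated back on:

transformed = dataframe.groupby('sampling_time').transform(lambda x: (x - x.mean()) / x.std())
result = pd.concat([dataframe['sampling_time'], transformed], axis=1)  # rows align on the shared index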
Remove values when reading a csv and return to a list
I have a csv file of subjects' XY coordinates. Some XY pairs have been removed where the X coordinate is less than 5. This can happen for any player and changes over time (see the example dataset). At the start of this file P2, P7, P12 and P17 have removed data, although throughout the file each player will have data missing; for about 90% of the file there will be at least 4 players with missing data at any time point.

Frame Time P1_X P1_Y P2_X P2_Y P3_X P3_Y P4_X P4_Y P5_X P5_Y P6_X P6_Y P7_X P7_Y P8_X P8_Y P9_X P9_Y P10_X P10_Y P11_X P11_Y P12_X P12_Y P13_X P13_Y P14_X P14_Y P15_X P15_Y P16_X P16_Y P17_X P17_Y
0 10:39.2 65.75 45.10 73.74 -3.52 61.91 41.80 67.07 -24.62 77.14 -22.98 93.95 3.51 56.52 28.44 70.21 11.06 73.08 -35.54 69.79 45.73 73.34 29.26 64.73 -40.69 70.90 6.11 70.94 -45.11 42.78 3.00 61.77 -1.05 72.07 38.62
1 10:39.3 65.77 45.16 73.69 -3.35 61.70 41.79 67.19 -24.59 77.17 -23.03 93.90 3.53 56.54 28.38 70.20 11.00 73.15 -35.48 69.79 45.86 73.20 29.30 64.96 -40.77 70.91 6.10 71.04 -45.29 42.84 3.02 61.82 -0.99 72.12 38.71
2 10:39.4 65.78 45.24 73.63 -3.17 61.70 41.79 67.32 -24.56 77.20 -23.05 93.83 3.55 56.59 28.31 70.20 10.92 73.20 -35.41 69.79 45.86 73.03 29.36 65.19 -40.84 70.91 6.10 71.15 -45.50 42.91 3.04 61.89 -0.91 72.16 38.80
3 10:39.5 65.78 45.33 73.57 -3.00 61.49 41.78 67.45 -24.50 77.25 -23.07 93.75 3.57 56.59 28.31 70.21 10.83 73.25 -35.33 69.77 46.01 72.86 29.43 65.45 -40.86 70.90 6.09 71.15 -45.50 43.01 3.08 61.98 -0.81 72.19 38.86
4 10:39.6 65.78 45.33 73.51 -2.86 61.32 41.76 67.45 -24.50 77.31 -23.09 93.64 3.60 56.65 28.22 70.23 10.72 73.29 -35.22 69.72 46.17 72.69 29.51 65.75 -40.84 70.88 6.08 71.24 -45.71 43.11 3.12 62.06 -0.70 72.22 38.90
5 10:39.7 65.75 45.44 73.51 -2.86 61.20 41.73 67.59 -24.37 77.38 -23.10 93.52 3.63 56.73 28.09 70.25 10.59 73.29 -35.22 69.68 46.33 72.49 29.60 66.06 -40.84 70.86 6.05 71.31 -45.91 43.22 3.14 62.13 -0.59 72.26 38.92
6 10:39.8 65.72 45.56 73.45 -2.72 61.08 41.71 67.72 -24.19 77.44 -23.12 93.39 3.69 56.80 27.91 70.27 10.45 73.34 -35.08 69.66 46.48 72.27 29.67 66.36 -40.87 70.86 6.01 71.39 -46.09 43.35 3.17 62.20 -0.47 72.29 38.93
7 10:39.9 65.72 45.56 73.34 -2.48 60.97 41.72 67.92 -23.76 77.51 -23.13 93.23 3.75 56.80 27.91 70.30 10.31 73.40 -34.76 69.64 46.63 72.01 29.74 66.62 -40.93 70.85 5.96 71.39 -46.09 43.51 3.18 62.27 -0.35 72.31 38.93
8 10:40.0 65.73 45.90 73.34 -2.48 60.86 41.72 67.92 -23.76 77.51 -23.13 93.05 3.80 56.91 27.47 70.30 10.31 73.40 -34.76 69.63 46.76 72.01 29.74 66.82 -41.06 70.83 5.88 71.53 -46.45 43.68 3.20 62.27 -0.35 72.29 38.92
9 10:40.1 65.73 46.09 73.29 -2.39 60.74 41.70 68.00 -23.52 77.60 -23.12 92.83 3.86 56.99 27.23 70.35 10.17 73.43 -34.58 69.64 46.88 71.72 29.80 66.99 -41.22 70.80 5.79 71.60 -46.63 43.86 3.23 62.34 -0.22 72.22 38.89
10 10:40.2 65.76 46.27 73.22 -2.32 60.60 41.65 68.07 -23.24 77.71 -23.05 92.83 3.86 57.14 26.98 70.43 10.05 73.47 -34.38 69.68 46.96 71.42 29.85 67.16 -41.38 70.77 5.70 71.64 -46.80 44.04 3.28 62.43 -0.08 72.13 38.86
11 10:40.3 65.81 46.43 73.12 -2.28 60.43 41.60 68.12 -22.93 77.83 -22.94 92.58 3.89 57.32 26.72 70.54 9.92 73.50 -34.16 69.75 46.99 71.08 29.89 67.16 -41.38 70.74 5.62 71.67 -46.96 44.21 3.33 62.54 0.09 72.03 38.84
12 10:40.4 65.87 46.58 72.98 -2.29 60.24 41.55 68.15 -22.57 77.94 -22.76 92.30 3.93 57.52 26.45 70.67 9.78 73.50 -33.91 69.85 47.00 70.72 29.91 67.31 -41.57 70.70 5.52 71.73 -47.15 44.37 3.40 62.66 0.24 72.03 38.84
13 10:40.5 65.91 46.69 72.80 -2.32 60.07 41.49 68.17 -22.18 78.01 -22.53 91.99 3.98 57.71 26.18 70.81 9.60 73.49 -33.68 69.97 47.03 70.33 29.92 67.45 -41.78 70.64 5.38 71.81 -47.35 44.37 3.40 62.80 0.40 71.96 38.81
14 10:40.6 65.94 46.80 72.60 -2.34 59.93 41.43 68.19 -21.77 78.05 -22.27 91.69 4.03 57.89 25.90 70.96 9.42 73.47 -33.47 70.10 47.09 69.93 29.93 67.54 -41.96 70.56 5.20 71.86 -47.53 44.54 3.50 62.98 0.58 71.91 38.77
15 10:40.7 65.95 46.93 72.36 -2.36 59.80 41.38 68.18 -21.32 78.08 -21.99 91.38 4.09 58.11 25.63 71.11 9.26 73.41 -33.26 70.24 47.15 69.50 29.91 67.58 -42.15 70.56 5.20 71.86 -47.69 44.54 3.50 63.16 0.77 71.91 38.77
16 10:40.8 65.93 47.09 72.10 -2.34 59.65 41.36 68.16 -20.86 78.11 -21.68 91.09 4.17 58.35 25.38 71.23 9.13 73.31 -33.05 70.38 47.20 69.07 29.84 67.56 -42.32 70.44 5.00 71.81 -47.84 45.00 3.79 63.34 0.97 71.80 38.60
17 10:40.9 65.92 47.23 71.85 -2.28 59.47 41.37 68.11 -20.41 78.11 -21.37 90.81 4.27 58.59 25.12 71.33 9.00 73.22 -32.84 70.52 47.26 68.63 29.75 67.47 -42.51 70.28 4.78 71.75 -47.97 45.26 3.94 63.52 1.14 71.73 38.46

Because there is missing data I tried to read the csv file as follows. (If I removed the try/except, I received a ValueError stating the string couldn't be converted to float.)

with open('NoBench.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    n = 0
    for row in readCSV:
        if n == 0:
            n += 1
        try:
            visuals[0].append([float(row[3]), float(row[5]), float(row[7]), float(row[9]), float(row[11]), float(row[13]), float(row[15]), float(row[17]), float(row[19]), float(row[21]), float(row[23]), float(row[25]), float(row[27]), float(row[29]), float(row[31]), float(row[33]), float(row[35]), float(row[37]), float(row[39]), float(row[41]), float(row[43])])
            visuals[1].append([float(row[2]), float(row[4]), float(row[6]), float(row[8]), float(row[10]), float(row[12]), float(row[14]), float(row[16]), float(row[18]), float(row[20]), float(row[22]), float(row[24]), float(row[26]), float(row[28]), float(row[30]), float(row[32]), float(row[34]), float(row[36]), float(row[38]), float(row[40]), float(row[42])])
        except ValueError:
            continue

However, when I use this code it only appends the values to the lists when every value in a row is present. As mentioned, that is the case for only about 10% of the file. I am using the XYs to create a scatter plot at each time point, so I cannot change missing values to 0,0, as that would create false data points. How do I alter the code so it still returns the XY values that are present when a player's data has been removed?
You can define your own converter before the loop:

def convert_float(x):
    if x:  # i.e. the field is not an empty string
        return float(x)
    else:
        return 0.0  # or set the default value you want for missing data

In combination with @juanpa.arrivillaga's excellent suggestion, change the visuals.append lines to these:

visuals[0].append(list(map(convert_float, row[3::2])))
visuals[1].append(list(map(convert_float, row[2::2])))

Also, I'm not sure what your n += 1 line is supposed to do; if you merely wanted to skip the first row (the headers), simply do this:

def convert_float(x):
    if x:
        return float(x)
    else:
        return 0.0

for i, row in enumerate(readCSV):
    if i > 0:
        visuals[0].append(list(map(convert_float, row[3::2])))
        visuals[1].append(list(map(convert_float, row[2::2])))
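One extra thought, since the question explicitly rules out plotting 0,0 for missing coordinates: a NaN default may suit a scatter plot better, because matplotlib skips NaN values when drawing, so no false point appears. A small variation on the converter above (same assumed visuals structure as in the question):

import math

def convert_float(x):
    # empty CSV fields become NaN, which the scatter plot simply leaves out
    return float(x) if x else math.nan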