pandas wrapper raise ValueError - python
I got the error below while running my Python script, which uses pandas, on a data set of about 30 million records. Please advise what went wrong.
Traceback (most recent call last):
  File "extractyooochoose2.py", line 32, in <module>
    totalitems=[len(x) for x in clicksdat.groupby('Sid')['itemid'].unique()]
  File "", line 13, in unique
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/pandas/core/groupby.py", line 620, in wrapper
    raise ValueError
The code and sample data are shown below.
import pandas as pd
import datetime as dt
clickspath='/tmp/gensim/yoochoose/yoochoose-clicks.dat'
buyspath='/tmp/gensim/yoochoose/yoochoose-buys.dat'
clicksdat=pd.read_csv(clickspath,header=None,dtype={'itemid': pd.np.str_,'Sid':pd.np.str_,'Timestamp':pd.np.str_,'itemcategory':pd.np.str_})
clicksdat.columns=['Sid','Timestamp','itemid','itemcategory']
buysdat=pd.read_csv(buyspath,header=None)
buysdat.columns=['Sid','Timestamp','itemid','price','qty']
segment={}
for i in range(24):
    if i<7:
        segment[i]='EM'
    elif i<10:
        segment[i]='M'
    elif i<13:
        segment[i]='A'
    elif i<18:
        segment[i]='E'
    elif i<23:
        segment[i]='N'
    elif i<25:
        segment[i]='MN'
#*******************************************
buyersession=buysdat.Sid.unique()
clickersession=clicksdat.Sid.unique()
maxtemp=[(dt.datetime.strptime(x,"%Y-%m-%dT%H:%M:%S.%fZ")) for x in clicksdat.groupby('Sid')['Timestamp'].max()]
mintemp=[dt.datetime.strptime(x,"%Y-%m-%dT%H:%M:%S.%fZ") for x in clicksdat.groupby('Sid')['Timestamp'].min()]
duration=[int((a-b).total_seconds()) for a,b in zip(maxtemp,mintemp)]
day=[x.day for x in maxtemp]
month=[x.month for x in maxtemp]
noofnavigations=[clicksdat.groupby('Sid').count().Timestamp][0]
totalitems=[len(x) for x in clicksdat.groupby('Sid')['itemid'].unique()]
totalcats=[len(x) for x in clicksdat.groupby('Sid')['itemcategory'].unique()]
timesegment= [segment[x.hour]for x in maxtemp]
segmentchange=[1 if (segment[x.hour]!=segment[y.hour]) else 0 for x,y in zip(maxtemp,mintemp)]
purchased=[x in buyersession for x in noofnavigations.index.values ]
percentile_list = pd.DataFrame({'purchased' : purchased,'duration':duration,'day':day,'month':month,'noofnavigations':noofnavigations,'totalitems':totalitems,'totalcats':totalcats,'timesegment':timesegment,'segmentchange':segmentchange })
percentile_list.to_csv('/tmp/gensim/yoochoose/yoochoose-clicks1001.csv')
Sample data as shown below
sessioid,timestamp,itemid,category
1,2014-04-07T10:51:09.277Z,214536502,0
1,2014-04-07T10:54:09.868Z,214536500,0
1,2014-04-07T10:54:46.998Z,214536506,0
1,2014-04-07T10:57:00.306Z,214577561,0
2,2014-04-07T13:56:37.614Z,214662742,0
2,2014-04-07T13:57:19.373Z,214662742,0
2,2014-04-07T13:58:37.446Z,214825110,0
2,2014-04-07T13:59:50.710Z,214757390,0
Related
TypeError when fitting Statsmodels OLS with standard errors clustered 2 ways
Context
Building on top of How to run Panel OLS regressions with 3+ fixed-effect and errors clustering? and notably Josef's third comment, I am trying to adapt the OLS Coefficients and Standard Errors Clustered by Firm and Year section of this example notebook below:
cluster_2ways_ols = sm.ols(formula='y ~ x', data=df).fit(cov_type='cluster', cov_kwds={'groups': np.array(df[['firmid', 'year']])}, use_t=True)
to my own example dataset. Note that I am able to reproduce this example (and it works). I can also add fixed-effects, by using 'y ~ x + C(firmid) + C(year)' as formula instead.
Problem
However, trying to port the same command to my example dataset (see code below), I'm getting the following error:
>>> model = sm.OLS.from_formula("gdp ~ population + C(year_publication) + C(country)", df)
>>> result = model.fit( cov_type='cluster', cov_kwds={'groups': np.array(df[['country', 'year_publication']])}, use_t=True )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/venv/lib64/python3.10/site-packages/statsmodels/regression/linear_model.py", line 343, in fit
    lfit = OLSResults(
  File "/path/venv/lib64/python3.10/site-packages/statsmodels/regression/linear_model.py", line 1607, in __init__
    self.get_robustcov_results(cov_type=cov_type, use_self=True,
  File "/path/venv/lib64/python3.10/site-packages/statsmodels/regression/linear_model.py", line 2568, in get_robustcov_results
    res.cov_params_default = sw.cov_cluster_2groups(
  File "/path/venv/lib64/python3.10/site-packages/statsmodels/stats/sandwich_covariance.py", line 591, in cov_cluster_2groups
    combine_indices(group)[0],
  File "/path/venv/lib64/python3.10/site-packages/statsmodels/tools/grouputils.py", line 55, in combine_indices
    groups_ = groups.view([('', groups.dtype)] * groups.shape[1])
  File "/path/venv/lib64/python3.10/site-packages/numpy/core/_internal.py", line 549, in _view_is_safe
    raise TypeError("Cannot change data-type for object array.")
TypeError: Cannot change data-type for
object array. I have tried to manually cast the year_publication to string/object using np.array(df[['country', 'year_publication']].astype("str")), but it doesn't solve the issue. Questions What is the cause of the TypeError()? How to adapt the example command to my dataset? Minimal Working Example from io import StringIO import numpy as np import pandas as pd import statsmodels.api as sm DATA = """ "continent","country","source","year_publication","year_data","population","gdp" "Africa","Angola","OECD",2020,2018,972,52.69 "Africa","Angola","OECD",2020,2019,986,802.7 "Africa","Angola","OECD",2020,2020,641,568.74 "Africa","Angola","OECD",2021,2018,438,168.83 "Africa","Angola","OECD",2021,2019,958,310.57 "Africa","Angola","OECD",2021,2020,270,144.02 "Africa","Angola","OECD",2022,2018,528,359.71 "Africa","Angola","OECD",2022,2019,974,582.98 "Africa","Angola","OECD",2022,2020,835,820.49 "Africa","Angola","IMF",2020,2018,168,148.85 "Africa","Angola","IMF",2020,2019,460,236.21 "Africa","Angola","IMF",2020,2020,360,297.15 "Africa","Angola","IMF",2021,2018,381,249.13 "Africa","Angola","IMF",2021,2019,648,128.05 "Africa","Angola","IMF",2021,2020,206,179.05 "Africa","Angola","IMF",2022,2018,282,150.29 "Africa","Angola","IMF",2022,2019,125,23.42 "Africa","Angola","IMF",2022,2020,410,247.35 "Africa","Angola","WorldBank",2020,2018,553,182.06 "Africa","Angola","WorldBank",2020,2019,847,698.87 "Africa","Angola","WorldBank",2020,2020,844,126.61 "Africa","Angola","WorldBank",2021,2018,307,239.76 "Africa","Angola","WorldBank",2021,2019,659,510.73 "Africa","Angola","WorldBank",2021,2020,548,331.89 "Africa","Angola","WorldBank",2022,2018,448,122.76 "Africa","Angola","WorldBank",2022,2019,768,761.41 "Africa","Angola","WorldBank",2022,2020,324,163.57 "Africa","Benin","OECD",2020,2018,513,196.9 "Africa","Benin","OECD",2020,2019,590,83.7 "Africa","Benin","OECD",2020,2020,791,511.09 "Africa","Benin","OECD",2021,2018,799,474.43 "Africa","Benin","OECD",2021,2019,455,234.21 
"Africa","Benin","OECD",2021,2020,549,238.83 "Africa","Benin","OECD",2022,2018,235,229.33 "Africa","Benin","OECD",2022,2019,347,46.51 "Africa","Benin","OECD",2022,2020,532,392.13 "Africa","Benin","IMF",2020,2018,138,137.05 "Africa","Benin","IMF",2020,2019,978,239.82 "Africa","Benin","IMF",2020,2020,821,33.41 "Africa","Benin","IMF",2021,2018,453,291.93 "Africa","Benin","IMF",2021,2019,526,381.88 "Africa","Benin","IMF",2021,2020,467,313.57 "Africa","Benin","IMF",2022,2018,948,555.23 "Africa","Benin","IMF",2022,2019,323,289.91 "Africa","Benin","IMF",2022,2020,421,62.35 "Africa","Benin","WorldBank",2020,2018,983,271.69 "Africa","Benin","WorldBank",2020,2019,138,23.55 "Africa","Benin","WorldBank",2020,2020,636,623.65 "Africa","Benin","WorldBank",2021,2018,653,534.99 "Africa","Benin","WorldBank",2021,2019,564,368.8 "Africa","Benin","WorldBank",2021,2020,741,312.02 "Africa","Benin","WorldBank",2022,2018,328,292.11 "Africa","Benin","WorldBank",2022,2019,653,429.21 "Africa","Benin","WorldBank",2022,2020,951,242.73 "Africa","Chad","OECD",2020,2018,176,95.06 "Africa","Chad","OECD",2020,2019,783,425.34 "Africa","Chad","OECD",2020,2020,885,461.6 "Africa","Chad","OECD",2021,2018,673,15.87 "Africa","Chad","OECD",2021,2019,131,74.46 "Africa","Chad","OECD",2021,2020,430,61.58 "Africa","Chad","OECD",2022,2018,593,211.34 "Africa","Chad","OECD",2022,2019,647,550.37 "Africa","Chad","OECD",2022,2020,154,105.65 "Africa","Chad","IMF",2020,2018,160,32.41 "Africa","Chad","IMF",2020,2019,654,27.84 "Africa","Chad","IMF",2020,2020,616,468.92 "Africa","Chad","IMF",2021,2018,996,22.4 "Africa","Chad","IMF",2021,2019,126,93.18 "Africa","Chad","IMF",2021,2020,879,547.87 "Africa","Chad","IMF",2022,2018,663,520 "Africa","Chad","IMF",2022,2019,681,544.76 "Africa","Chad","IMF",2022,2020,101,55.6 "Africa","Chad","WorldBank",2020,2018,786,757.22 "Africa","Chad","WorldBank",2020,2019,599,593.69 "Africa","Chad","WorldBank",2020,2020,641,529.84 "Africa","Chad","WorldBank",2021,2018,343,287.89 
"Africa","Chad","WorldBank",2021,2019,438,340.83 "Africa","Chad","WorldBank",2021,2020,762,594.67 "Africa","Chad","WorldBank",2022,2018,430,128.69 "Africa","Chad","WorldBank",2022,2019,260,242.59 "Africa","Chad","WorldBank",2022,2020,607,216.1 "Europe","Denmark","OECD",2020,2018,114,86.75 "Europe","Denmark","OECD",2020,2019,937,373.29 "Europe","Denmark","OECD",2020,2020,866,392.93 "Europe","Denmark","OECD",2021,2018,296,41.04 "Europe","Denmark","OECD",2021,2019,402,32.67 "Europe","Denmark","OECD",2021,2020,306,7.88 "Europe","Denmark","OECD",2022,2018,540,379.51 "Europe","Denmark","OECD",2022,2019,108,26.72 "Europe","Denmark","OECD",2022,2020,752,307.2 "Europe","Denmark","IMF",2020,2018,157,24.24 "Europe","Denmark","IMF",2020,2019,303,79.04 "Europe","Denmark","IMF",2020,2020,286,122.36 "Europe","Denmark","IMF",2021,2018,569,69.32 "Europe","Denmark","IMF",2021,2019,808,642.67 "Europe","Denmark","IMF",2021,2020,157,5.58 "Europe","Denmark","IMF",2022,2018,147,112.21 "Europe","Denmark","IMF",2022,2019,414,311.16 "Europe","Denmark","IMF",2022,2020,774,230.46 "Europe","Denmark","WorldBank",2020,2018,695,350.03 "Europe","Denmark","WorldBank",2020,2019,511,209.84 "Europe","Denmark","WorldBank",2020,2020,181,29.27 "Europe","Denmark","WorldBank",2021,2018,503,176.89 "Europe","Denmark","WorldBank",2021,2019,710,609.02 "Europe","Denmark","WorldBank",2021,2020,264,165.78 "Europe","Denmark","WorldBank",2022,2018,670,638.99 "Europe","Denmark","WorldBank",2022,2019,651,354.6 "Europe","Denmark","WorldBank",2022,2020,632,623.94 "Europe","Estonia","OECD",2020,2018,838,263.67 "Europe","Estonia","OECD",2020,2019,638,533.95 "Europe","Estonia","OECD",2020,2020,898,638.73 "Europe","Estonia","OECD",2021,2018,262,98.16 "Europe","Estonia","OECD",2021,2019,569,552.54 "Europe","Estonia","OECD",2021,2020,868,252.48 "Europe","Estonia","OECD",2022,2018,927,264.65 "Europe","Estonia","OECD",2022,2019,205,150.6 "Europe","Estonia","OECD",2022,2020,828,752.61 
"Europe","Estonia","IMF",2020,2018,841,176.31 "Europe","Estonia","IMF",2020,2019,614,230.55 "Europe","Estonia","IMF",2020,2020,500,41.19 "Europe","Estonia","IMF",2021,2018,510,169.68 "Europe","Estonia","IMF",2021,2019,765,401.85 "Europe","Estonia","IMF",2021,2020,751,319.6 "Europe","Estonia","IMF",2022,2018,314,58.81 "Europe","Estonia","IMF",2022,2019,155,2.24 "Europe","Estonia","IMF",2022,2020,734,187.6 "Europe","Estonia","WorldBank",2020,2018,332,160.17 "Europe","Estonia","WorldBank",2020,2019,466,385.33 "Europe","Estonia","WorldBank",2020,2020,487,435.06 "Europe","Estonia","WorldBank",2021,2018,461,249.19 "Europe","Estonia","WorldBank",2021,2019,932,763.38 "Europe","Estonia","WorldBank",2021,2020,650,463.91 "Europe","Estonia","WorldBank",2022,2018,570,549.97 "Europe","Estonia","WorldBank",2022,2019,909,80.48 "Europe","Estonia","WorldBank",2022,2020,523,242.22 "Europe","Finland","OECD",2020,2018,565,561.64 "Europe","Finland","OECD",2020,2019,646,161.62 "Europe","Finland","OECD",2020,2020,194,133.69 "Europe","Finland","OECD",2021,2018,529,39.76 "Europe","Finland","OECD",2021,2019,800,680.12 "Europe","Finland","OECD",2021,2020,418,399.19 "Europe","Finland","OECD",2022,2018,591,253.12 "Europe","Finland","OECD",2022,2019,457,272.58 "Europe","Finland","OECD",2022,2020,157,105.1 "Europe","Finland","IMF",2020,2018,860,445.03 "Europe","Finland","IMF",2020,2019,108,47.72 "Europe","Finland","IMF",2020,2020,523,500.58 "Europe","Finland","IMF",2021,2018,560,81.47 "Europe","Finland","IMF",2021,2019,830,664.64 "Europe","Finland","IMF",2021,2020,903,762.62 "Europe","Finland","IMF",2022,2018,179,167.73 "Europe","Finland","IMF",2022,2019,137,98.98 "Europe","Finland","IMF",2022,2020,666,524.86 "Europe","Finland","WorldBank",2020,2018,319,146.01 "Europe","Finland","WorldBank",2020,2019,401,219.56 "Europe","Finland","WorldBank",2020,2020,711,45.35 "Europe","Finland","WorldBank",2021,2018,828,20.97 "Europe","Finland","WorldBank",2021,2019,180,66.3 
"Europe","Finland","WorldBank",2021,2020,682,92.57 "Europe","Finland","WorldBank",2022,2018,254,81.2 "Europe","Finland","WorldBank",2022,2019,619,159.08 "Europe","Finland","WorldBank",2022,2020,191,184.4 """ df = pd.read_csv(StringIO(DATA)) model = sm.OLS.from_formula("gdp ~ population + C(year_publication) + C(country)", df) result = model.fit( cov_type='cluster', cov_kwds={'groups': np.array(df[['country', 'year_publication']])}, use_t=True ) print(result.summary())
I have realized that the groups must be an array of integers rather than of objects/strings. Thus, label encoding the string column as follows: df["country"] = df["country"].astype("category") df["country_id"] = df.country.cat.codes and using country_id to cluster the standard errors solves the issue: result = model.fit( cov_type='cluster', cov_kwds={'groups': np.array(df[['country_id', 'year_publication']])}, use_t=True ) Fully working example: from io import StringIO import numpy as np import pandas as pd import statsmodels.api as sm DATA = """ "continent","country","source","year_publication","year_data","population","gdp" "Africa","Angola","OECD",2020,2018,972,52.69 "Africa","Angola","OECD",2020,2019,986,802.7 "Africa","Angola","OECD",2020,2020,641,568.74 "Africa","Angola","OECD",2021,2018,438,168.83 "Africa","Angola","OECD",2021,2019,958,310.57 "Africa","Angola","OECD",2021,2020,270,144.02 "Africa","Angola","OECD",2022,2018,528,359.71 "Africa","Angola","OECD",2022,2019,974,582.98 "Africa","Angola","OECD",2022,2020,835,820.49 "Africa","Angola","IMF",2020,2018,168,148.85 "Africa","Angola","IMF",2020,2019,460,236.21 "Africa","Angola","IMF",2020,2020,360,297.15 "Africa","Angola","IMF",2021,2018,381,249.13 "Africa","Angola","IMF",2021,2019,648,128.05 "Africa","Angola","IMF",2021,2020,206,179.05 "Africa","Angola","IMF",2022,2018,282,150.29 "Africa","Angola","IMF",2022,2019,125,23.42 "Africa","Angola","IMF",2022,2020,410,247.35 "Africa","Angola","WorldBank",2020,2018,553,182.06 "Africa","Angola","WorldBank",2020,2019,847,698.87 "Africa","Angola","WorldBank",2020,2020,844,126.61 "Africa","Angola","WorldBank",2021,2018,307,239.76 "Africa","Angola","WorldBank",2021,2019,659,510.73 "Africa","Angola","WorldBank",2021,2020,548,331.89 "Africa","Angola","WorldBank",2022,2018,448,122.76 "Africa","Angola","WorldBank",2022,2019,768,761.41 "Africa","Angola","WorldBank",2022,2020,324,163.57 "Africa","Benin","OECD",2020,2018,513,196.9 "Africa","Benin","OECD",2020,2019,590,83.7 
"Africa","Benin","OECD",2020,2020,791,511.09 "Africa","Benin","OECD",2021,2018,799,474.43 "Africa","Benin","OECD",2021,2019,455,234.21 "Africa","Benin","OECD",2021,2020,549,238.83 "Africa","Benin","OECD",2022,2018,235,229.33 "Africa","Benin","OECD",2022,2019,347,46.51 "Africa","Benin","OECD",2022,2020,532,392.13 "Africa","Benin","IMF",2020,2018,138,137.05 "Africa","Benin","IMF",2020,2019,978,239.82 "Africa","Benin","IMF",2020,2020,821,33.41 "Africa","Benin","IMF",2021,2018,453,291.93 "Africa","Benin","IMF",2021,2019,526,381.88 "Africa","Benin","IMF",2021,2020,467,313.57 "Africa","Benin","IMF",2022,2018,948,555.23 "Africa","Benin","IMF",2022,2019,323,289.91 "Africa","Benin","IMF",2022,2020,421,62.35 "Africa","Benin","WorldBank",2020,2018,983,271.69 "Africa","Benin","WorldBank",2020,2019,138,23.55 "Africa","Benin","WorldBank",2020,2020,636,623.65 "Africa","Benin","WorldBank",2021,2018,653,534.99 "Africa","Benin","WorldBank",2021,2019,564,368.8 "Africa","Benin","WorldBank",2021,2020,741,312.02 "Africa","Benin","WorldBank",2022,2018,328,292.11 "Africa","Benin","WorldBank",2022,2019,653,429.21 "Africa","Benin","WorldBank",2022,2020,951,242.73 "Africa","Chad","OECD",2020,2018,176,95.06 "Africa","Chad","OECD",2020,2019,783,425.34 "Africa","Chad","OECD",2020,2020,885,461.6 "Africa","Chad","OECD",2021,2018,673,15.87 "Africa","Chad","OECD",2021,2019,131,74.46 "Africa","Chad","OECD",2021,2020,430,61.58 "Africa","Chad","OECD",2022,2018,593,211.34 "Africa","Chad","OECD",2022,2019,647,550.37 "Africa","Chad","OECD",2022,2020,154,105.65 "Africa","Chad","IMF",2020,2018,160,32.41 "Africa","Chad","IMF",2020,2019,654,27.84 "Africa","Chad","IMF",2020,2020,616,468.92 "Africa","Chad","IMF",2021,2018,996,22.4 "Africa","Chad","IMF",2021,2019,126,93.18 "Africa","Chad","IMF",2021,2020,879,547.87 "Africa","Chad","IMF",2022,2018,663,520 "Africa","Chad","IMF",2022,2019,681,544.76 "Africa","Chad","IMF",2022,2020,101,55.6 "Africa","Chad","WorldBank",2020,2018,786,757.22 
"Africa","Chad","WorldBank",2020,2019,599,593.69 "Africa","Chad","WorldBank",2020,2020,641,529.84 "Africa","Chad","WorldBank",2021,2018,343,287.89 "Africa","Chad","WorldBank",2021,2019,438,340.83 "Africa","Chad","WorldBank",2021,2020,762,594.67 "Africa","Chad","WorldBank",2022,2018,430,128.69 "Africa","Chad","WorldBank",2022,2019,260,242.59 "Africa","Chad","WorldBank",2022,2020,607,216.1 "Europe","Denmark","OECD",2020,2018,114,86.75 "Europe","Denmark","OECD",2020,2019,937,373.29 "Europe","Denmark","OECD",2020,2020,866,392.93 "Europe","Denmark","OECD",2021,2018,296,41.04 "Europe","Denmark","OECD",2021,2019,402,32.67 "Europe","Denmark","OECD",2021,2020,306,7.88 "Europe","Denmark","OECD",2022,2018,540,379.51 "Europe","Denmark","OECD",2022,2019,108,26.72 "Europe","Denmark","OECD",2022,2020,752,307.2 "Europe","Denmark","IMF",2020,2018,157,24.24 "Europe","Denmark","IMF",2020,2019,303,79.04 "Europe","Denmark","IMF",2020,2020,286,122.36 "Europe","Denmark","IMF",2021,2018,569,69.32 "Europe","Denmark","IMF",2021,2019,808,642.67 "Europe","Denmark","IMF",2021,2020,157,5.58 "Europe","Denmark","IMF",2022,2018,147,112.21 "Europe","Denmark","IMF",2022,2019,414,311.16 "Europe","Denmark","IMF",2022,2020,774,230.46 "Europe","Denmark","WorldBank",2020,2018,695,350.03 "Europe","Denmark","WorldBank",2020,2019,511,209.84 "Europe","Denmark","WorldBank",2020,2020,181,29.27 "Europe","Denmark","WorldBank",2021,2018,503,176.89 "Europe","Denmark","WorldBank",2021,2019,710,609.02 "Europe","Denmark","WorldBank",2021,2020,264,165.78 "Europe","Denmark","WorldBank",2022,2018,670,638.99 "Europe","Denmark","WorldBank",2022,2019,651,354.6 "Europe","Denmark","WorldBank",2022,2020,632,623.94 "Europe","Estonia","OECD",2020,2018,838,263.67 "Europe","Estonia","OECD",2020,2019,638,533.95 "Europe","Estonia","OECD",2020,2020,898,638.73 "Europe","Estonia","OECD",2021,2018,262,98.16 "Europe","Estonia","OECD",2021,2019,569,552.54 "Europe","Estonia","OECD",2021,2020,868,252.48 
"Europe","Estonia","OECD",2022,2018,927,264.65 "Europe","Estonia","OECD",2022,2019,205,150.6 "Europe","Estonia","OECD",2022,2020,828,752.61 "Europe","Estonia","IMF",2020,2018,841,176.31 "Europe","Estonia","IMF",2020,2019,614,230.55 "Europe","Estonia","IMF",2020,2020,500,41.19 "Europe","Estonia","IMF",2021,2018,510,169.68 "Europe","Estonia","IMF",2021,2019,765,401.85 "Europe","Estonia","IMF",2021,2020,751,319.6 "Europe","Estonia","IMF",2022,2018,314,58.81 "Europe","Estonia","IMF",2022,2019,155,2.24 "Europe","Estonia","IMF",2022,2020,734,187.6 "Europe","Estonia","WorldBank",2020,2018,332,160.17 "Europe","Estonia","WorldBank",2020,2019,466,385.33 "Europe","Estonia","WorldBank",2020,2020,487,435.06 "Europe","Estonia","WorldBank",2021,2018,461,249.19 "Europe","Estonia","WorldBank",2021,2019,932,763.38 "Europe","Estonia","WorldBank",2021,2020,650,463.91 "Europe","Estonia","WorldBank",2022,2018,570,549.97 "Europe","Estonia","WorldBank",2022,2019,909,80.48 "Europe","Estonia","WorldBank",2022,2020,523,242.22 "Europe","Finland","OECD",2020,2018,565,561.64 "Europe","Finland","OECD",2020,2019,646,161.62 "Europe","Finland","OECD",2020,2020,194,133.69 "Europe","Finland","OECD",2021,2018,529,39.76 "Europe","Finland","OECD",2021,2019,800,680.12 "Europe","Finland","OECD",2021,2020,418,399.19 "Europe","Finland","OECD",2022,2018,591,253.12 "Europe","Finland","OECD",2022,2019,457,272.58 "Europe","Finland","OECD",2022,2020,157,105.1 "Europe","Finland","IMF",2020,2018,860,445.03 "Europe","Finland","IMF",2020,2019,108,47.72 "Europe","Finland","IMF",2020,2020,523,500.58 "Europe","Finland","IMF",2021,2018,560,81.47 "Europe","Finland","IMF",2021,2019,830,664.64 "Europe","Finland","IMF",2021,2020,903,762.62 "Europe","Finland","IMF",2022,2018,179,167.73 "Europe","Finland","IMF",2022,2019,137,98.98 "Europe","Finland","IMF",2022,2020,666,524.86 "Europe","Finland","WorldBank",2020,2018,319,146.01 "Europe","Finland","WorldBank",2020,2019,401,219.56 
"Europe","Finland","WorldBank",2020,2020,711,45.35 "Europe","Finland","WorldBank",2021,2018,828,20.97 "Europe","Finland","WorldBank",2021,2019,180,66.3 "Europe","Finland","WorldBank",2021,2020,682,92.57 "Europe","Finland","WorldBank",2022,2018,254,81.2 "Europe","Finland","WorldBank",2022,2019,619,159.08 "Europe","Finland","WorldBank",2022,2020,191,184.4 """ df = pd.read_csv(StringIO(DATA)) df["country"] = df["country"].astype("category") df["country_id"] = df.country.cat.codes model = sm.OLS.from_formula("gdp ~ population + C(year_publication) + C(country)", df) result = model.fit( cov_type='cluster', cov_kwds={'groups': np.array(df[['country_id', 'year_publication']])}, use_t=True ) print(result.summary())
How to solve IndexError: single positional indexer is out-of-bounds
CODE:- from datetime import date from datetime import timedelta from nsepy import get_history import pandas as pd import datetime # import matplotlib.pyplot as mp end1 = date.today() start1 = end1 - timedelta(days=365) stock = [ 'RELIANCE','HDFCBANK','INFY','ICICIBANK','HDFC','TCS','KOTAKBANK','LT','SBIN','HINDUNILVR','AXISBANK','ITC','BAJFINANCE','BHARTIARTL','ASIANPAINT','HCLTECH','MARUTI','TITAN','BAJAJFINSV','TATAMOTORS', 'TECHM','SUNPHARMA','TATASTEEL','M&M','WIPRO','ULTRACEMCO','POWERGRID','HINDALCO','NTPC','NESTLEIND','GRASIM','ONGC','JSWSTEEL','HDFCLIFE','INDUSINDBK','SBILIFE','DRREDDY','ADANIPORTS','DIVISLAB','CIPLA', 'BAJAJ-AUTO','TATACONSUM','UPL','BRITANNIA','BPCL','EICHERMOT','HEROMOTOCO','COALINDIA','SHREECEM','IOC','VEDL','ADANIENT', 'APOLLOHOSP', 'TATAPOWER', 'PIDILITIND', 'SRF', 'NAUKRI', 'ICICIGI', 'DABUR', 'GODREJCP', 'HAVELLS', 'PEL', 'VOLTAS', 'AUBANK', 'LTI', 'CHOLAFIN', 'AMBUJACEM', 'MARICO', 'SRTRANSFIN','GAIL', 'MCDOWELL-N', 'MPHASIS', 'MINDTREE', 'PAGEIND', 'ZEEL', 'BEL', 'TRENT', 'CROMPTON', 'JUBLFOOD', 'DLF', 'SBICARD', 'SIEMENS', 'BANDHANBNK', 'IRCTC', 'LAURUSLABS', 'PIIND', 'INDIGO', 'INDUSTOWER','ICICIPRULI', 'MOTHERSON', 'AARTIIND', 'FEDERALBNK', 'BANKBARODA', 'PERSISTENT', 'HINDPETRO', 'ACC', 'AUROPHARMA', 'COLPAL', 'GODREJPROP', 'MFSL', 'LUPIN', 'BIOCON', 'ASHOKLEY', 'BHARATFORG', 'BERGEPAINT','JINDALSTEL', 'ASTRAL', 'IEX', 'NMDC', 'CONCOR', 'INDHOTEL', 'BALKRISIND', 'PETRONET', 'CANBK', 'ALKEM', 'DIXON', 'DEEPAKNTR', 'DALBHARAT', 'TVSMOTOR', 'ATUL', 'HDFCAMC', 'TATACOMM', 'MUTHOOTFIN', 'TATACHEM','SAIL', 'IDFCFIRSTB', 'PFC', 'BOSCHLTD', 'MRF', 'NAVINFLUOR', 'CUMMINSIND', 'IGL', 'IPCALAB', 'COFORGE', 'ESCORTS', 'TORNTPHARM', 'LTTS', 'RECLTD', 'LICHSGFIN', 'BATAINDIA', 'HAL', 'PNB', 'GUJGASLTD', 'UBL','3MINDIA','ABB','AIAENG','APLAPOLLO','AARTIDRUGS','AAVAS','ABBOTINDIA','ADANIGREEN','ATGL','ABCAPITAL', 
'ABFRL','ABSLAMC','ADVENZYMES','AEGISCHEM','AFFLE','AJANTPHARM','ALKYLAMINE','ALLCARGO','AMARAJABAT','AMBER','ANGELONE','ANURAS','APTUS','ASAHIINDIA','ASTERDM','ASTRAZEN','AVANTIFEED','DMART','BASF', 'BSE','BAJAJELEC','BAJAJHLDNG','BALAMINES','BALRAMCHIN','BANKINDIA','MAHABANK','BAYERCROP','BDL','BEL','BHEL','BIRLACORPN','BSOFT','BLUEDART','BLUESTARCO','BORORENEW','BOSCHLTD','BRIGADE','BCG','MAPMYINDIA' ] target_stocks_list = [] target_stocks = pd.DataFrame() for stock in stock: vol = get_history(symbol=stock, start=start1, end=end1) d_vol = pd.concat([vol['Deliverable Volume']]) symbol_s = pd.concat([vol['Symbol']]) close = pd.concat([vol['Close']]) df = pd.DataFrame(symbol_s) df['D_vol'] = d_vol # print(df) cond = df['D_vol'].iloc[-1] > max(df['D_vol'].iloc[-91:-1]) if(cond): target_stocks_list.append(stock) target_stocks = pd.concat([target_stocks, df]) print(target_stocks_list) file_name = f'{datetime.datetime.now().day}-{datetime.datetime.now().month}-{datetime.datetime.now().year}.csv' target_stocks.to_csv(f'D:/HUGE VOLUME SPURTS/first 250/SEP 2022/{file_name}') pd.set_option('display.max_columns',10) pd.set_option('display.max_rows',2000) print(target_stocks) ERROR:- C:\python\Python310\python.exe "C:/Users/Yogesh_PC/PycharmProjects/future oi data analysis/trial2.py" Traceback (most recent call last): File "C:\Users\Yogesh_PC\PycharmProjects\future oi data analysis\trial2.py", line 64, in <module> cond = df['D_vol'].iloc[-1] > max(df['D_vol'].iloc[-91:-1]) File "C:\python\Python310\lib\site-packages\pandas\core\indexing.py", line 967, in __getitem__ return self._getitem_axis(maybe_callable, axis=axis) File "C:\python\Python310\lib\site-packages\pandas\core\indexing.py", line 1520, in _getitem_axis self._validate_integer(key, axis) File "C:\python\Python310\lib\site-packages\pandas\core\indexing.py", line 1452, in _validate_integer raise IndexError("single positional indexer is out-of-bounds") IndexError: single positional indexer is out-of-bounds Process 
finished with exit code 1
The code above fetches historical data for stocks on the Indian stock market. The data is updated on the website after the market closes, around 8:00 PM to 9:00 PM daily, and I run my code after that. On most days the code produces output without any error, but it frequently throws the error shown above. There are around 150-200 stocks in my code. I believe the error occurs because the exchange sometimes does not update the data for one or two stocks in the list. Please post code that skips the stocks that have not been updated and still produces output for all the remaining stocks. For example:
stocks = ['DLF', 'SBICARD', 'SIEMENS', 'BANDHANBNK', 'IRCTC', 'LAURUSLABS', 'PIIND', 'INDIGO', 'INDUSTOWER','ICICIPRULI', 'MOTHERSON']
Suppose the exchange did not update the data for 'IRCTC' while all the other stocks above are up to date. Because of 'IRCTC', my code throws the error and does not show the data that has been updated. Thank you.
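The "out-of-bounds" error means you are asking for a position that does not exist in the Series. The traceback points at the scalar lookup df['D_vol'].iloc[-1], which raises this IndexError when df['D_vol'] is empty, presumably because get_history returned no rows for that stock. (The slice df['D_vol'].iloc[-91:-1] does not raise on a short Series; it just returns fewer values, and max() then fails if the result is empty.)
Edit: add a length check before the offending line, so that stocks with fewer than 91 rows are skipped:
if df['D_vol'].size > 90:
    cond = df['D_vol'].iloc[-1] > max(df['D_vol'].iloc[-91:-1])
    if(cond):
        target_stocks_list.append(stock)
        target_stocks = pd.concat([target_stocks, df])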
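As a rough sketch of the skip-on-short-data idea, the check can be pulled into a small helper. The name `is_volume_spurt`, the `lookback` parameter and the sample Series are assumptions made for illustration; they stand in for the `get_history` results and are not part of nsepy.

```python
import pandas as pd


def is_volume_spurt(d_vol: pd.Series, lookback: int = 90) -> bool:
    """Return True when the latest value exceeds the max of the prior `lookback` values."""
    # We need the latest row plus `lookback` earlier rows; otherwise skip the stock.
    if d_vol.size <= lookback:
        return False
    return bool(d_vol.iloc[-1] > d_vol.iloc[-(lookback + 1):-1].max())


# Hypothetical data standing in for per-stock 'Deliverable Volume' columns.
frames = {
    "UPDATED": pd.Series(list(range(100)) + [500]),
    "MISSING": pd.Series([], dtype="float64"),  # the exchange returned no rows
}

target = [name for name, s in frames.items() if is_volume_spurt(s)]
print(target)  # ['UPDATED']
```

With this shape, a stock whose data is missing or too short simply yields False, so the loop carries on with the remaining stocks instead of stopping.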
Once my functions are nested and reference each other, my tuples return NoneType errors. Why?
I wrote functions to look up and replace tuples in a dictionary. The functions all work independently: when I run them individually, the tuple values come back as integers, as planned. When they are run in sequence or nested inside other functions, unpacking the tuple raises a NoneType error. When I run type on a value from the tuple, it returns integer. I'm confused and want to solve this issue before I convert the code to a class structure to tidy it up.
My current workflow is:
takes in an integer determined by previous code (volume) > conditionally chooses a divisor > rounds down the value > matches the value in a dictionary > returns the value from the dictionary > new tuple created > tuple in dictionary is replaced
TypeError Traceback (most recent call last)
<ipython-input-23-1f1ae48bdda3> in <module>
----> 1 asp_master(275,dict_tuberack1,vol_tuberack1,tuberack1,'A1')
      2 height_a1= get_height(dict_tuberack1,tuberack1,'A1')
      3 asp_a(275,height_a1,tuberack1['A1'])
      4 disp_master(275,dict_tuberack1,vol_tuberack1,tuberack1,'A2')
      5 height_a2= get_height(dict_tuberack1,tuberack1,'A2')
<ipython-input-10-ba8bc2194a15> in asp_master(volume, dict_vol, dict_labware, labware, well)
      1 def asp_master(volume,dict_vol,dict_labware,labware,well):
----> 2     if low_vol_check(volume,dict_labware,labware,well)==True:
      3         new_vol=volume_sub(volume,dict_labware,labware,well)
      4         tup_update_sub(new_vol,dict_vol,dict_labware,labware,well)
      5         print(dict_labware[labware[well]])
<ipython-input-16-a23165d51020> in low_vol_check(volume, dict_labware, labware, well)
     10
     11 def low_vol_check(volume,dict_labware,labware,well):
---> 12     x=get_volume(dict_labware,labware,well)
     13     y=volume
     14     if x-y < 0:
<ipython-input-16-a23165d51020> in get_volume(dict_labware, labware, well)
      1 def get_volume(dict_labware,labware,well):
      2     tup = dict_labware.get(labware[well])
----> 3     (tup_v,tup_h)=tup
      4     volume=tup_v
      5     return tup_v
TypeError: cannot unpack non-iterable NoneType object
Robot Code:
from opentrons import protocol_api
from opentrons.simulate import
get_protocol_api from math import floor,ceil from datetime import datetime import opentrons protocol = get_protocol_api('2.8') tuberack1 = protocol.load_labware('opentrons_15_tuberack_nest_15ml_conical','2', 'tuberack1') tuberack2 = protocol.load_labware('opentrons_24_tuberack_nest_1.5ml_snapcap','3','tuberack2') tiprack= protocol.load_labware('opentrons_96_tiprack_300ul','4') p300 = protocol.load_instrument('p300_single', 'left', tip_racks=[tiprack]) p300.home Code: dict_tuberack1={tuberack1['A1']:(14000,104), tuberack1['A2']:(14000,104), tuberack1['A3']:(14000,104),} vol_tuberack1= {14000: 104, 13500: 101, 13000: 98, 12500: 94, 12000: 91, 11500: 88, 11000: 85, 10500: 81, 10000: 78, 9500: 75, 9000: 72, 8500: 68, 8000: 65, 7500: 62, 7000: 59, 6500: 55, 6000: 52, 5500: 49,} def get_volume(dict_labware,labware,well): tup = dict_labware.get(labware[well]) (tup_v,tup_h)=tup volume=tup_v return tup_v def low_vol_check(volume,dict_labware,labware,well): x=get_volume(dict_labware,labware,well) y=volume if x-y < 0: return False else: return True def tup_update_sub(volume,dict_vol,dict_labware,labware,well): tup = dict_labware.get(labware[well]) adj_list=list(tup) adj_list[0]=volume divisor=1 if volume >=1000: divisor=1000 vol_even=round_down(volume, divisor) elif 100 <= volume <1000: #this was the issue and was fixed. 
divisor=100 vol_even=round_down(volume,divisor) else: divisor=10 vol_even=round_down(volume,divisor) new_height=dict_vol.get(vol_even) adj_list[1]=new_height new_tup=tuple(adj_list) dict_labware[labware[well]] = new_tup def asp_master(volume,dict_vol,dict_labware,labware,well): if low_vol_check(volume,dict_labware,labware,well)==True: new_vol=volume_sub(volume,dict_labware,labware,well) tup_update_sub(new_vol,dict_vol,dict_labware,labware,well) print(dict_labware[labware[well]]) else: print('Cannot aspirate') #robot commands below def asp_a (volume,height,source): p300.pick_up_tip() p300.aspirate(volume, source.bottom(z=height)) def disp_a (volume,height,destination): p300.dispense(volume,destination.bottom(z=height+8)) p300.blowout(height+8) #code that generated error message below asp_master(275,dict_tuberack1,vol_tuberack1,tuberack1,'A1') height_a1= get_height(dict_tuberack1,tuberack1,'A1') asp_a(275,height_a1,tuberack1['A1'])
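The workflow described in the question (choose a divisor, round down, look up the height, build the new tuple) can be sketched in isolation as below. This is only an illustration under stated assumptions: the body of `round_down` is a guess, since the question calls it but never shows it, and `pick_divisor`, `updated_entry` and the small table are hypothetical names and data. The sketch also makes the failure explicit: `dict.get` returns None when the key is absent, and unpacking None is what produces "cannot unpack non-iterable NoneType object". It may also be worth checking the argument order in the failing call, since `asp_master` is defined with `dict_vol` before `dict_labware` but is called with `dict_tuberack1` before `vol_tuberack1`.

```python
def round_down(value, divisor):
    """Hypothetical helper: round value down to the nearest multiple of divisor."""
    return (value // divisor) * divisor


def pick_divisor(volume):
    """Choose the rounding step from the volume, as in the workflow described."""
    if volume >= 1000:
        return 1000
    if volume >= 100:
        return 100
    return 10


def updated_entry(volume, height_by_volume):
    """Return the new (volume, height) tuple, failing loudly on a missing key."""
    vol_even = round_down(volume, pick_divisor(volume))
    height = height_by_volume.get(vol_even)  # dict.get returns None if the key is absent
    if height is None:
        raise KeyError(f"no height recorded for rounded volume {vol_even}")
    return (volume, height)


# Illustrative table only; the real table is vol_tuberack1 in the question.
height_by_volume = {14000: 104, 13000: 98, 12000: 91}
print(updated_entry(13725, height_by_volume))  # (13725, 98)
```

Raising at the lookup, rather than letting None travel into a later tuple unpacking, tends to put the error message next to the dictionary that is actually missing the key.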
matplotlib xlim TypeError: '>' not supported between instances of 'int' and 'list'
This is the original repo I'm trying to run on my computer: https://github.com/kreamkorokke/cs244-final-project
import os
import matplotlib.pyplot as plt
import argparse
from attacker import check_attack_type
IMG_DIR = "./plots"
def read_lines(f, d):
    lines = f.readlines()[:-1]
    for line in lines:
        typ, time, num = line.split(',')
        if typ == 'seq':
            d['seq']['time'].append(float(time))
            d['seq']['num'].append(float(num))
        elif typ == 'ack':
            d['ack']['time'].append(float(time))
            d['ack']['num'].append(float(num))
        else:
            raise "Unknown type read while parsing log file: %s" % typ
def main():
    parser = argparse.ArgumentParser(description="Plot script for plotting sequence numbers.")
    parser.add_argument('--save', dest='save_imgs', action='store_true', help="Set this to true to save images under specified output directory.")
    parser.add_argument('--attack', dest='attack', nargs='?', const="", type=check_attack_type, help="Attack name (used in plot names).")
    parser.add_argument('--output', dest='output_dir', default=IMG_DIR, help="Directory to store plots.")
    args = parser.parse_args()
    save_imgs = args.save_imgs
    output_dir = args.output_dir
    attack_name = args.attack
    if save_imgs and attack_name not in ['div', 'dup', 'opt'] :
        print("Attack name needed for saving plot figures.")
        return
    normal_log = {'seq':{'time':[], 'num':[]}, 'ack':{'time':[], 'num':[]}}
    attack_log = {'seq':{'time':[], 'num':[]}, 'ack':{'time':[], 'num':[]}}
    normal_f = open('log.txt', 'r')
    attack_f = open('%s_attack_log.txt' % attack_name, 'r')
    read_lines(normal_f, normal_log)
    read_lines(attack_f, attack_log)
    if attack_name == 'div':
        attack_desc = 'ACK Division'
    elif attack_name == 'dup':
        attack_desc = 'DupACK Spoofing'
    elif attack_name == 'opt':
        attack_desc = 'Optimistic ACKing'
    else:
        raise 'Unknown attack type: %s' % attack_name
    norm_seq_time, norm_seq_num = normal_log['seq']['time'], normal_log['seq']['num']
    norm_ack_time, norm_ack_num = normal_log['ack']['time'], normal_log['ack']['num']
    atck_seq_time, atck_seq_num = attack_log['seq']['time'], attack_log['seq']['num']
    atck_ack_time, atck_ack_num = attack_log['ack']['time'], attack_log['ack']['num']
    plt.plot(norm_seq_time, norm_seq_num, 'b^', label='Regular TCP Data Segments')
    plt.plot(norm_ack_time, norm_ack_num, 'bx', label='Regular TCP ACKs')
    plt.plot(atck_seq_time, atck_seq_num, 'rs', label='%s Attack Data Segments' % attack_desc)
    plt.plot(atck_ack_time, atck_ack_num, 'r+', label='%s Attack ACKs' % attack_desc)
    plt.legend(loc='upper left')
    x = max(max(norm_seq_time, norm_ack_time),max(atck_seq_time, atck_ack_time))
    y = max(max(norm_seq_num, norm_ack_num),max(atck_seq_num, atck_ack_num))
    plt.xlim(0, x)
    plt.ylim(0,y)
    plt.xlabel('Time (s)')
    plt.ylabel('Sequence Number (Bytes)')
    if save_imgs:
        # Save images to figure/
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)
        plt.savefig(output_dir + "/" + attack_name)
    else:
        plt.show()
    normal_f.close()
    attack_f.close()
if __name__ == "__main__":
    main()
After running this I get the following error:
Traceback (most recent call last):
  File "plot.py", line 85, in <module>
    main()
  File "plot.py", line 66, in main
    plt.xlim(0, a)
  File "/usr/lib/python3/dist-packages/matplotlib/pyplot.py", line 1427, in xlim
    ret = ax.set_xlim(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/matplotlib/axes/_base.py", line 3267, in set_xlim
    reverse = left > right
TypeError: '>' not supported between instances of 'int' and 'list'
Done! Please check ./plots for all generated plots.
How can I solve this problem? Or, better yet, is there another way of running this project? I installed matplotlib with the pip3 install matplotlib command (same with scapy). My main Python version is Python 2 right now, but I run the project with python3. Could the issue be related to this? What am I missing? Or is it about mininet itself?
The problem is in this line:

x = max(max(norm_seq_time, norm_ack_time), max(atck_seq_time, atck_ack_time))

IIUC, you want x to be the maximum value among all four lists. However, when you pass two lists to max, as in max(norm_seq_time, norm_ack_time), it compares the lists themselves (element by element) and returns whichever list it considers greater, not the highest value across both. So x ends up being a list, and plt.xlim(0, x) fails when matplotlib evaluates left > right. Instead, you can do something like:

x = max(norm_seq_time + norm_ack_time + atck_seq_time + atck_ack_time)

This concatenates the four lists into a single one, and max then returns the highest value among all of them. You will want to do the same for the calculation of y. If this is not what you wanted, or if you have any further issues, please let us know.
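To see the difference concretely, here is a small sketch with made-up numbers (the values are illustrative only and do not come from the actual log files):

```python
# Illustrative values only; the real lists are read from the log files.
norm_seq_time = [0.1, 0.5, 2.0]
norm_ack_time = [0.2, 0.6]

# With two list arguments, max() compares the lists element by element
# and returns one of the lists, not a number.
bigger_list = max(norm_seq_time, norm_ack_time)
print(type(bigger_list), bigger_list)

# Concatenating first hands max() a single list, so it returns a number.
overall_max = max(norm_seq_time + norm_ack_time)
print(type(overall_max), overall_max)
```

The first result is a list, which is what ends up being passed to plt.xlim in the original script; the second is a plain float.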
With the help of a friend, we solved this problem by changing that part of the code to this:

max_norm_seq_time = max(norm_seq_time) if len(norm_seq_time) > 0 else 0
max_norm_ack_time = max(norm_ack_time) if len(norm_ack_time) > 0 else 0
max_atck_seq_time = max(atck_seq_time) if len(atck_seq_time) > 0 else 0
max_atck_ack_time = max(atck_ack_time) if len(atck_ack_time) > 0 else 0
x = max((max_norm_seq_time, max_norm_ack_time,
         max_atck_seq_time, max_atck_ack_time))
plt.xlim([0, x])

max_norm_seq_num = max(norm_seq_num) if len(norm_seq_num) > 0 else 0
max_norm_ack_num = max(norm_ack_num) if len(norm_ack_num) > 0 else 0
max_atck_seq_num = max(atck_seq_num) if len(atck_seq_num) > 0 else 0
max_atck_ack_num = max(atck_ack_num) if len(atck_ack_num) > 0 else 0
plt.ylim([0, max((max_norm_seq_num, max_norm_ack_num,
                  max_atck_seq_num, max_atck_ack_num))])

The len(...) > 0 checks matter because max() raises ValueError on an empty list. Writing it here in case anyone else needs it.
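A more compact variant of the same idea, as a sketch: on Python 3.4+, max() accepts a default that is returned when the iterable is empty. The helper name overall_max is hypothetical, and the sketch assumes all values are non-negative (as times and sequence numbers are here).

```python
from itertools import chain

def overall_max(*series, default=0):
    """Hypothetical helper: largest value across all lists, or `default` if every list is empty."""
    # chain.from_iterable flattens the lists lazily; `default` avoids
    # the ValueError that max() raises on an empty iterable.
    return max(chain.from_iterable(series), default=default)

# Illustrative values only; two of the four lists are empty.
x = overall_max([0.1, 2.0], [], [1.5], [])
print(x)
```

In the plotting script this would be called once with the four time lists and once with the four sequence-number lists.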
Runtime Exception. Exception in python callback function evaluation:
I am working on an assignment for Coursera's Machine Learning: Regression course. I am using the kc_house_data.gl/ dataset and GraphLab Create. I am adding new variables to train_data and test_data that are combinations of old variables. Then I take the mean of all these variables. These are the variables I am adding:

bedrooms_squared = bedrooms * bedrooms
bed_bath_rooms = bedrooms*bathrooms
log_sqft_living = log(sqft_living)
lat_plus_long = lat + long

Here is my code:

train_data['bedrooms_squared'] = train_data['bedrooms'].apply(lambda x: x**2)
test_data['bedrooms_squared'] = test_data['bedrooms'].apply(lambda x: x**2)
# create the remaining 3 features in both TEST and TRAIN data
train_data['bed_bath_rooms'] = train_data.apply(lambda row: row['bedrooms'] * row['bathrooms'])
test_data['bed_bath_rooms'] = test_data.apply(lambda row: row['bedrooms'] * row['bathrooms'])
train_data['log_sqft_living'] = train_data['sqft_living'].apply(lambda x: log(x))
test_data['log_sqft_living'] = test_data['bedrooms'].apply(lambda x: log(x))
train_data['lat_plus_long'] = train_data.apply(lambda row: row['lat'] + row['long'])
train_data['lat_plus_long'] = train_data.apply(lambda row: row['lat'] + row['long'])
test_data['bedrooms_squared'].mean()
test_data['bed_bath_rooms'].mean()
test_data['log_sqft_living'].mean()
test_data['lat_plus_long'].mean()

This is the error I'm getting:

RuntimeError: Runtime Exception. Exception in python callback function evaluation:
ValueError('math domain error',):
Traceback (most recent call last):
  File "graphlab\cython\cy_pylambda_workers.pyx", line 426, in graphlab.cython.cy_pylambda_workers._eval_lambda
  File "graphlab\cython\cy_pylambda_workers.pyx", line 169, in graphlab.cython.cy_pylambda_workers.lambda_evaluator.eval_simple
  File "<ipython-input-13-1cdbcd5f5d9b>", line 5, in <lambda>
ValueError: math domain error

I have no idea what this means. Any idea on what caused it and how I fix it? Thanks.
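Your problem is that log is receiving a number that is not positive. log is defined only for numbers greater than zero, so both zero and negative inputs raise "math domain error". Note that your test_data line takes the log of test_data['bedrooms'] rather than test_data['sqft_living']; a single bedrooms value of 0 would be enough to trigger this. You need to check your values.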
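A minimal way to reproduce the error outside GraphLab, using only the standard library (the helper name try_log and the sample values are made up for illustration):

```python
import math

def try_log(value):
    """Hypothetical helper: return log(value), or a description of the error."""
    try:
        return math.log(value)
    except ValueError as err:
        # math.log raises ValueError for zero and for negative inputs.
        return "ValueError: %s" % err

print(try_log(1900))  # a positive input works
print(try_log(0))     # zero is outside the domain of log
print(try_log(-3))    # so is any negative number
```

Only the first call returns a number; the other two hit the same ValueError that the lambda raises inside apply.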
Please add (and learn about) exception handling to make your code more robust:

try:
    # Same statements as in the question, wrapped so the error is reported.
    train_data['log_sqft_living'] = train_data['sqft_living'].apply(lambda x: log(x))
    test_data['log_sqft_living'] = test_data['bedrooms'].apply(lambda x: log(x))
    train_data['lat_plus_long'] = train_data.apply(lambda row: row['lat'] + row['long'])
    train_data['lat_plus_long'] = train_data.apply(lambda row: row['lat'] + row['long'])
    test_data['bedrooms_squared'].mean()
    test_data['bed_bath_rooms'].mean()
    test_data['log_sqft_living'].mean()
    test_data['lat_plus_long'].mean()
except Exception as e:
    print("ERROR in function: %s" % e)
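Catching the exception reports the failure but does not say which values caused it. One possible complement, sketched here with plain Python lists standing in for the columns (the helper safe_log and the sample values are hypothetical, not part of GraphLab), is to guard the log itself and list the offending inputs:

```python
import math

def safe_log(x):
    """Hypothetical helper: log(x) for x > 0, otherwise None."""
    if x is None or x <= 0:
        return None
    return math.log(x)

# Illustrative stand-in for a column; real data would come from the dataset.
bedrooms = [3, 0, 4]

# Find the inputs that would have raised "math domain error".
bad_values = [v for v in bedrooms if v <= 0]
print(bad_values)

# Apply the guarded version; non-positive inputs become None.
print([safe_log(v) for v in bedrooms])
```

Whether None is an acceptable placeholder depends on what the later mean() should count, so treat this as a debugging aid rather than a fix.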