pandas wrapper raise ValueError

I got the error below while running my Python script, which uses pandas, on a data set of about 30 million records. Please advise what went wrong.
Traceback (most recent call last):
File "extractyooochoose2.py", line 32, in <module>
totalitems=[len(x) for x in clicksdat.groupby('Sid')['itemid'].unique()]
File "", line 13, in unique
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/pandas/core/groupby.py", line 620, in wrapper
raise ValueError
Data and code as shown below
import pandas as pd
import datetime as dt
clickspath='/tmp/gensim/yoochoose/yoochoose-clicks.dat'
buyspath='/tmp/gensim/yoochoose/yoochoose-buys.dat'
clicksdat=pd.read_csv(clickspath,header=None,dtype={'itemid': pd.np.str_,'Sid':pd.np.str_,'Timestamp':pd.np.str_,'itemcategory':pd.np.str_})
clicksdat.columns=['Sid','Timestamp','itemid','itemcategory']
buysdat=pd.read_csv(buyspath,header=None)
buysdat.columns=['Sid','Timestamp','itemid','price','qty']
segment={}
for i in range(24):
    if i<7:
        segment[i]='EM'
    elif i<10:
        segment[i]='M'
    elif i<13:
        segment[i]='A'
    elif i<18:
        segment[i]='E'
    elif i<23:
        segment[i]='N'
    elif i<25:
        segment[i]='MN'
#*******************************************
buyersession=buysdat.Sid.unique()
clickersession=clicksdat.Sid.unique()
maxtemp=[(dt.datetime.strptime(x,"%Y-%m-%dT%H:%M:%S.%fZ")) for x in clicksdat.groupby('Sid')['Timestamp'].max()]
mintemp=[dt.datetime.strptime(x,"%Y-%m-%dT%H:%M:%S.%fZ") for x in clicksdat.groupby('Sid')['Timestamp'].min()]
duration=[int((a-b).total_seconds()) for a,b in zip(maxtemp,mintemp)]
day=[x.day for x in maxtemp]
month=[x.month for x in maxtemp]
noofnavigations=[clicksdat.groupby('Sid').count().Timestamp][0]
totalitems=[len(x) for x in clicksdat.groupby('Sid')['itemid'].unique()]
totalcats=[len(x) for x in clicksdat.groupby('Sid')['itemcategory'].unique()]
timesegment= [segment[x.hour]for x in maxtemp]
segmentchange=[1 if (segment[x.hour]!=segment[y.hour]) else 0 for x,y in zip(maxtemp,mintemp)]
purchased=[x in buyersession for x in noofnavigations.index.values ]
percentile_list = pd.DataFrame({'purchased' : purchased,'duration':duration,'day':day,'month':month,'noofnavigations':noofnavigations,'totalitems':totalitems,'totalcats':totalcats,'timesegment':timesegment,'segmentchange':segmentchange })
percentile_list.to_csv('/tmp/gensim/yoochoose/yoochoose-clicks1001.csv')
Sample data as shown below
sessioid,timestamp,itemid,category
1,2014-04-07T10:51:09.277Z,214536502,0
1,2014-04-07T10:54:09.868Z,214536500,0
1,2014-04-07T10:54:46.998Z,214536506,0
1,2014-04-07T10:57:00.306Z,214577561,0
2,2014-04-07T13:56:37.614Z,214662742,0
2,2014-04-07T13:57:19.373Z,214662742,0
2,2014-04-07T13:58:37.446Z,214825110,0
2,2014-04-07T13:59:50.710Z,214757390,0
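The per-session counts that the failing line builds with len(x) over unique() can also be computed with the groupby method nunique(). The following is a minimal sketch on the eight sample rows above, not a confirmed diagnosis of the ValueError. The names= and dtype=str arguments are assumptions about how the file could be loaded so that the column labels exist at read time. Note that nunique() ignores missing values by default, whereas len(unique()) counts them.

```python
from io import StringIO

import pandas as pd

# The eight sample rows from the question (the real file has no header row).
SAMPLE = """\
1,2014-04-07T10:51:09.277Z,214536502,0
1,2014-04-07T10:54:09.868Z,214536500,0
1,2014-04-07T10:54:46.998Z,214536506,0
1,2014-04-07T10:57:00.306Z,214577561,0
2,2014-04-07T13:56:37.614Z,214662742,0
2,2014-04-07T13:57:19.373Z,214662742,0
2,2014-04-07T13:58:37.446Z,214825110,0
2,2014-04-07T13:59:50.710Z,214757390,0
"""

# Assumed loading step: names= assigns the column labels while reading,
# and dtype=str keeps every column as text.
clicks = pd.read_csv(
    StringIO(SAMPLE),
    header=None,
    names=["Sid", "Timestamp", "itemid", "itemcategory"],
    dtype=str,
)

by_session = clicks.groupby("Sid")

# Distinct items and categories per session, computed inside pandas
# instead of building a Python list of arrays.
totalitems = by_session["itemid"].nunique()
totalcats = by_session["itemcategory"].nunique()

print(totalitems.to_dict())
print(totalcats.to_dict())
```

Both results are Series indexed by Sid, so they line up with the other per-session aggregates without going through Python lists.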

Related

TypeError when fitting Statsmodels OLS with standard errors clustered 2 ways

Context
Building on top of How to run Panel OLS regressions with 3+ fixed-effect and errors clustering? and notably Josef's third comment, I am trying to adapt the OLS Coefficients and Standard Errors Clustered by Firm and Year section of this example notebook below:
cluster_2ways_ols = sm.ols(formula='y ~ x', data=df).fit(cov_type='cluster',
cov_kwds={'groups': np.array(df[['firmid', 'year']])},
use_t=True)
to my own example dataset.
Note that I am able to reproduce this example (and it works). I can also add fixed-effects, by using 'y ~ x + C(firmid) + C(year)' as formula instead.
Problem
However, trying to port the same command to my example dataset (see code below), I'm getting the following error:
>>> model = sm.OLS.from_formula("gdp ~ population + C(year_publication) + C(country)", df)
>>> result = model.fit(
cov_type='cluster',
cov_kwds={'groups': np.array(df[['country', 'year_publication']])},
use_t=True
)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/path/venv/lib64/python3.10/site-packages/statsmodels/regression/linear_model.py", line 343, in fit
lfit = OLSResults(
File "/path/venv/lib64/python3.10/site-packages/statsmodels/regression/linear_model.py", line 1607, in __init__
self.get_robustcov_results(cov_type=cov_type, use_self=True,
File "/path/venv/lib64/python3.10/site-packages/statsmodels/regression/linear_model.py", line 2568, in get_robustcov_results
res.cov_params_default = sw.cov_cluster_2groups(
File "/path/venv/lib64/python3.10/site-packages/statsmodels/stats/sandwich_covariance.py", line 591, in cov_cluster_2groups
combine_indices(group)[0],
File "/path/venv/lib64/python3.10/site-packages/statsmodels/tools/grouputils.py", line 55, in combine_indices
groups_ = groups.view([('', groups.dtype)] * groups.shape[1])
File "/path/venv/lib64/python3.10/site-packages/numpy/core/_internal.py", line 549, in _view_is_safe
raise TypeError("Cannot change data-type for object array.")
TypeError: Cannot change data-type for object array.
I have tried to manually cast the year_publication to string/object using np.array(df[['country', 'year_publication']].astype("str")), but it doesn't solve the issue.
Questions
What is the cause of the TypeError()?
How to adapt the example command to my dataset?
Minimal Working Example
from io import StringIO
import numpy as np
import pandas as pd
import statsmodels.api as sm
DATA = """
"continent","country","source","year_publication","year_data","population","gdp"
"Africa","Angola","OECD",2020,2018,972,52.69
"Africa","Angola","OECD",2020,2019,986,802.7
"Africa","Angola","OECD",2020,2020,641,568.74
"Africa","Angola","OECD",2021,2018,438,168.83
"Africa","Angola","OECD",2021,2019,958,310.57
"Africa","Angola","OECD",2021,2020,270,144.02
"Africa","Angola","OECD",2022,2018,528,359.71
"Africa","Angola","OECD",2022,2019,974,582.98
"Africa","Angola","OECD",2022,2020,835,820.49
"Africa","Angola","IMF",2020,2018,168,148.85
"Africa","Angola","IMF",2020,2019,460,236.21
"Africa","Angola","IMF",2020,2020,360,297.15
"Africa","Angola","IMF",2021,2018,381,249.13
"Africa","Angola","IMF",2021,2019,648,128.05
"Africa","Angola","IMF",2021,2020,206,179.05
"Africa","Angola","IMF",2022,2018,282,150.29
"Africa","Angola","IMF",2022,2019,125,23.42
"Africa","Angola","IMF",2022,2020,410,247.35
"Africa","Angola","WorldBank",2020,2018,553,182.06
"Africa","Angola","WorldBank",2020,2019,847,698.87
"Africa","Angola","WorldBank",2020,2020,844,126.61
"Africa","Angola","WorldBank",2021,2018,307,239.76
"Africa","Angola","WorldBank",2021,2019,659,510.73
"Africa","Angola","WorldBank",2021,2020,548,331.89
"Africa","Angola","WorldBank",2022,2018,448,122.76
"Africa","Angola","WorldBank",2022,2019,768,761.41
"Africa","Angola","WorldBank",2022,2020,324,163.57
"Africa","Benin","OECD",2020,2018,513,196.9
"Africa","Benin","OECD",2020,2019,590,83.7
"Africa","Benin","OECD",2020,2020,791,511.09
"Africa","Benin","OECD",2021,2018,799,474.43
"Africa","Benin","OECD",2021,2019,455,234.21
"Africa","Benin","OECD",2021,2020,549,238.83
"Africa","Benin","OECD",2022,2018,235,229.33
"Africa","Benin","OECD",2022,2019,347,46.51
"Africa","Benin","OECD",2022,2020,532,392.13
"Africa","Benin","IMF",2020,2018,138,137.05
"Africa","Benin","IMF",2020,2019,978,239.82
"Africa","Benin","IMF",2020,2020,821,33.41
"Africa","Benin","IMF",2021,2018,453,291.93
"Africa","Benin","IMF",2021,2019,526,381.88
"Africa","Benin","IMF",2021,2020,467,313.57
"Africa","Benin","IMF",2022,2018,948,555.23
"Africa","Benin","IMF",2022,2019,323,289.91
"Africa","Benin","IMF",2022,2020,421,62.35
"Africa","Benin","WorldBank",2020,2018,983,271.69
"Africa","Benin","WorldBank",2020,2019,138,23.55
"Africa","Benin","WorldBank",2020,2020,636,623.65
"Africa","Benin","WorldBank",2021,2018,653,534.99
"Africa","Benin","WorldBank",2021,2019,564,368.8
"Africa","Benin","WorldBank",2021,2020,741,312.02
"Africa","Benin","WorldBank",2022,2018,328,292.11
"Africa","Benin","WorldBank",2022,2019,653,429.21
"Africa","Benin","WorldBank",2022,2020,951,242.73
"Africa","Chad","OECD",2020,2018,176,95.06
"Africa","Chad","OECD",2020,2019,783,425.34
"Africa","Chad","OECD",2020,2020,885,461.6
"Africa","Chad","OECD",2021,2018,673,15.87
"Africa","Chad","OECD",2021,2019,131,74.46
"Africa","Chad","OECD",2021,2020,430,61.58
"Africa","Chad","OECD",2022,2018,593,211.34
"Africa","Chad","OECD",2022,2019,647,550.37
"Africa","Chad","OECD",2022,2020,154,105.65
"Africa","Chad","IMF",2020,2018,160,32.41
"Africa","Chad","IMF",2020,2019,654,27.84
"Africa","Chad","IMF",2020,2020,616,468.92
"Africa","Chad","IMF",2021,2018,996,22.4
"Africa","Chad","IMF",2021,2019,126,93.18
"Africa","Chad","IMF",2021,2020,879,547.87
"Africa","Chad","IMF",2022,2018,663,520
"Africa","Chad","IMF",2022,2019,681,544.76
"Africa","Chad","IMF",2022,2020,101,55.6
"Africa","Chad","WorldBank",2020,2018,786,757.22
"Africa","Chad","WorldBank",2020,2019,599,593.69
"Africa","Chad","WorldBank",2020,2020,641,529.84
"Africa","Chad","WorldBank",2021,2018,343,287.89
"Africa","Chad","WorldBank",2021,2019,438,340.83
"Africa","Chad","WorldBank",2021,2020,762,594.67
"Africa","Chad","WorldBank",2022,2018,430,128.69
"Africa","Chad","WorldBank",2022,2019,260,242.59
"Africa","Chad","WorldBank",2022,2020,607,216.1
"Europe","Denmark","OECD",2020,2018,114,86.75
"Europe","Denmark","OECD",2020,2019,937,373.29
"Europe","Denmark","OECD",2020,2020,866,392.93
"Europe","Denmark","OECD",2021,2018,296,41.04
"Europe","Denmark","OECD",2021,2019,402,32.67
"Europe","Denmark","OECD",2021,2020,306,7.88
"Europe","Denmark","OECD",2022,2018,540,379.51
"Europe","Denmark","OECD",2022,2019,108,26.72
"Europe","Denmark","OECD",2022,2020,752,307.2
"Europe","Denmark","IMF",2020,2018,157,24.24
"Europe","Denmark","IMF",2020,2019,303,79.04
"Europe","Denmark","IMF",2020,2020,286,122.36
"Europe","Denmark","IMF",2021,2018,569,69.32
"Europe","Denmark","IMF",2021,2019,808,642.67
"Europe","Denmark","IMF",2021,2020,157,5.58
"Europe","Denmark","IMF",2022,2018,147,112.21
"Europe","Denmark","IMF",2022,2019,414,311.16
"Europe","Denmark","IMF",2022,2020,774,230.46
"Europe","Denmark","WorldBank",2020,2018,695,350.03
"Europe","Denmark","WorldBank",2020,2019,511,209.84
"Europe","Denmark","WorldBank",2020,2020,181,29.27
"Europe","Denmark","WorldBank",2021,2018,503,176.89
"Europe","Denmark","WorldBank",2021,2019,710,609.02
"Europe","Denmark","WorldBank",2021,2020,264,165.78
"Europe","Denmark","WorldBank",2022,2018,670,638.99
"Europe","Denmark","WorldBank",2022,2019,651,354.6
"Europe","Denmark","WorldBank",2022,2020,632,623.94
"Europe","Estonia","OECD",2020,2018,838,263.67
"Europe","Estonia","OECD",2020,2019,638,533.95
"Europe","Estonia","OECD",2020,2020,898,638.73
"Europe","Estonia","OECD",2021,2018,262,98.16
"Europe","Estonia","OECD",2021,2019,569,552.54
"Europe","Estonia","OECD",2021,2020,868,252.48
"Europe","Estonia","OECD",2022,2018,927,264.65
"Europe","Estonia","OECD",2022,2019,205,150.6
"Europe","Estonia","OECD",2022,2020,828,752.61
"Europe","Estonia","IMF",2020,2018,841,176.31
"Europe","Estonia","IMF",2020,2019,614,230.55
"Europe","Estonia","IMF",2020,2020,500,41.19
"Europe","Estonia","IMF",2021,2018,510,169.68
"Europe","Estonia","IMF",2021,2019,765,401.85
"Europe","Estonia","IMF",2021,2020,751,319.6
"Europe","Estonia","IMF",2022,2018,314,58.81
"Europe","Estonia","IMF",2022,2019,155,2.24
"Europe","Estonia","IMF",2022,2020,734,187.6
"Europe","Estonia","WorldBank",2020,2018,332,160.17
"Europe","Estonia","WorldBank",2020,2019,466,385.33
"Europe","Estonia","WorldBank",2020,2020,487,435.06
"Europe","Estonia","WorldBank",2021,2018,461,249.19
"Europe","Estonia","WorldBank",2021,2019,932,763.38
"Europe","Estonia","WorldBank",2021,2020,650,463.91
"Europe","Estonia","WorldBank",2022,2018,570,549.97
"Europe","Estonia","WorldBank",2022,2019,909,80.48
"Europe","Estonia","WorldBank",2022,2020,523,242.22
"Europe","Finland","OECD",2020,2018,565,561.64
"Europe","Finland","OECD",2020,2019,646,161.62
"Europe","Finland","OECD",2020,2020,194,133.69
"Europe","Finland","OECD",2021,2018,529,39.76
"Europe","Finland","OECD",2021,2019,800,680.12
"Europe","Finland","OECD",2021,2020,418,399.19
"Europe","Finland","OECD",2022,2018,591,253.12
"Europe","Finland","OECD",2022,2019,457,272.58
"Europe","Finland","OECD",2022,2020,157,105.1
"Europe","Finland","IMF",2020,2018,860,445.03
"Europe","Finland","IMF",2020,2019,108,47.72
"Europe","Finland","IMF",2020,2020,523,500.58
"Europe","Finland","IMF",2021,2018,560,81.47
"Europe","Finland","IMF",2021,2019,830,664.64
"Europe","Finland","IMF",2021,2020,903,762.62
"Europe","Finland","IMF",2022,2018,179,167.73
"Europe","Finland","IMF",2022,2019,137,98.98
"Europe","Finland","IMF",2022,2020,666,524.86
"Europe","Finland","WorldBank",2020,2018,319,146.01
"Europe","Finland","WorldBank",2020,2019,401,219.56
"Europe","Finland","WorldBank",2020,2020,711,45.35
"Europe","Finland","WorldBank",2021,2018,828,20.97
"Europe","Finland","WorldBank",2021,2019,180,66.3
"Europe","Finland","WorldBank",2021,2020,682,92.57
"Europe","Finland","WorldBank",2022,2018,254,81.2
"Europe","Finland","WorldBank",2022,2019,619,159.08
"Europe","Finland","WorldBank",2022,2020,191,184.4
"""
df = pd.read_csv(StringIO(DATA))
model = sm.OLS.from_formula("gdp ~ population + C(year_publication) + C(country)", df)
result = model.fit(
cov_type='cluster',
cov_kwds={'groups': np.array(df[['country', 'year_publication']])},
use_t=True
)
print(result.summary())

I have realized that the groups must be an array of integers rather than of objects/strings.
Thus, label encoding the string column as follows:
df["country"] = df["country"].astype("category")
df["country_id"] = df.country.cat.codes
and using country_id to cluster the standard errors solves the issue:
result = model.fit(
cov_type='cluster',
cov_kwds={'groups': np.array(df[['country_id', 'year_publication']])},
use_t=True
)
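As a small illustration of the point above, the sketch below shows why the original groups array has object dtype and how an integer array can be built instead. It uses pd.factorize as an alternative to astype("category").cat.codes; the four-row frame is made up for the example and is not the data set from the question.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({
    "country": ["Angola", "Angola", "Benin", "Benin"],
    "year_publication": [2020, 2021, 2020, 2021],
})

# Mixing a text column with an integer column produces an object array.
raw_groups = np.array(df[["country", "year_publication"]])
print(raw_groups.dtype)

# pd.factorize maps each distinct label to an integer code
# (codes are assigned in order of first appearance).
country_codes, country_labels = pd.factorize(df["country"])

# Stack the two integer columns into an (n_rows, 2) integer array.
int_groups = np.column_stack([country_codes, df["year_publication"].to_numpy()])
print(int_groups.dtype, int_groups.shape)
```

The resulting int_groups array has the same shape as the original groups argument, but with an integer dtype throughout.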
Fully working example:
from io import StringIO
import numpy as np
import pandas as pd
import statsmodels.api as sm
DATA = """
"continent","country","source","year_publication","year_data","population","gdp"
"Africa","Angola","OECD",2020,2018,972,52.69
"Africa","Angola","OECD",2020,2019,986,802.7
"Africa","Angola","OECD",2020,2020,641,568.74
"Africa","Angola","OECD",2021,2018,438,168.83
"Africa","Angola","OECD",2021,2019,958,310.57
"Africa","Angola","OECD",2021,2020,270,144.02
"Africa","Angola","OECD",2022,2018,528,359.71
"Africa","Angola","OECD",2022,2019,974,582.98
"Africa","Angola","OECD",2022,2020,835,820.49
"Africa","Angola","IMF",2020,2018,168,148.85
"Africa","Angola","IMF",2020,2019,460,236.21
"Africa","Angola","IMF",2020,2020,360,297.15
"Africa","Angola","IMF",2021,2018,381,249.13
"Africa","Angola","IMF",2021,2019,648,128.05
"Africa","Angola","IMF",2021,2020,206,179.05
"Africa","Angola","IMF",2022,2018,282,150.29
"Africa","Angola","IMF",2022,2019,125,23.42
"Africa","Angola","IMF",2022,2020,410,247.35
"Africa","Angola","WorldBank",2020,2018,553,182.06
"Africa","Angola","WorldBank",2020,2019,847,698.87
"Africa","Angola","WorldBank",2020,2020,844,126.61
"Africa","Angola","WorldBank",2021,2018,307,239.76
"Africa","Angola","WorldBank",2021,2019,659,510.73
"Africa","Angola","WorldBank",2021,2020,548,331.89
"Africa","Angola","WorldBank",2022,2018,448,122.76
"Africa","Angola","WorldBank",2022,2019,768,761.41
"Africa","Angola","WorldBank",2022,2020,324,163.57
"Africa","Benin","OECD",2020,2018,513,196.9
"Africa","Benin","OECD",2020,2019,590,83.7
"Africa","Benin","OECD",2020,2020,791,511.09
"Africa","Benin","OECD",2021,2018,799,474.43
"Africa","Benin","OECD",2021,2019,455,234.21
"Africa","Benin","OECD",2021,2020,549,238.83
"Africa","Benin","OECD",2022,2018,235,229.33
"Africa","Benin","OECD",2022,2019,347,46.51
"Africa","Benin","OECD",2022,2020,532,392.13
"Africa","Benin","IMF",2020,2018,138,137.05
"Africa","Benin","IMF",2020,2019,978,239.82
"Africa","Benin","IMF",2020,2020,821,33.41
"Africa","Benin","IMF",2021,2018,453,291.93
"Africa","Benin","IMF",2021,2019,526,381.88
"Africa","Benin","IMF",2021,2020,467,313.57
"Africa","Benin","IMF",2022,2018,948,555.23
"Africa","Benin","IMF",2022,2019,323,289.91
"Africa","Benin","IMF",2022,2020,421,62.35
"Africa","Benin","WorldBank",2020,2018,983,271.69
"Africa","Benin","WorldBank",2020,2019,138,23.55
"Africa","Benin","WorldBank",2020,2020,636,623.65
"Africa","Benin","WorldBank",2021,2018,653,534.99
"Africa","Benin","WorldBank",2021,2019,564,368.8
"Africa","Benin","WorldBank",2021,2020,741,312.02
"Africa","Benin","WorldBank",2022,2018,328,292.11
"Africa","Benin","WorldBank",2022,2019,653,429.21
"Africa","Benin","WorldBank",2022,2020,951,242.73
"Africa","Chad","OECD",2020,2018,176,95.06
"Africa","Chad","OECD",2020,2019,783,425.34
"Africa","Chad","OECD",2020,2020,885,461.6
"Africa","Chad","OECD",2021,2018,673,15.87
"Africa","Chad","OECD",2021,2019,131,74.46
"Africa","Chad","OECD",2021,2020,430,61.58
"Africa","Chad","OECD",2022,2018,593,211.34
"Africa","Chad","OECD",2022,2019,647,550.37
"Africa","Chad","OECD",2022,2020,154,105.65
"Africa","Chad","IMF",2020,2018,160,32.41
"Africa","Chad","IMF",2020,2019,654,27.84
"Africa","Chad","IMF",2020,2020,616,468.92
"Africa","Chad","IMF",2021,2018,996,22.4
"Africa","Chad","IMF",2021,2019,126,93.18
"Africa","Chad","IMF",2021,2020,879,547.87
"Africa","Chad","IMF",2022,2018,663,520
"Africa","Chad","IMF",2022,2019,681,544.76
"Africa","Chad","IMF",2022,2020,101,55.6
"Africa","Chad","WorldBank",2020,2018,786,757.22
"Africa","Chad","WorldBank",2020,2019,599,593.69
"Africa","Chad","WorldBank",2020,2020,641,529.84
"Africa","Chad","WorldBank",2021,2018,343,287.89
"Africa","Chad","WorldBank",2021,2019,438,340.83
"Africa","Chad","WorldBank",2021,2020,762,594.67
"Africa","Chad","WorldBank",2022,2018,430,128.69
"Africa","Chad","WorldBank",2022,2019,260,242.59
"Africa","Chad","WorldBank",2022,2020,607,216.1
"Europe","Denmark","OECD",2020,2018,114,86.75
"Europe","Denmark","OECD",2020,2019,937,373.29
"Europe","Denmark","OECD",2020,2020,866,392.93
"Europe","Denmark","OECD",2021,2018,296,41.04
"Europe","Denmark","OECD",2021,2019,402,32.67
"Europe","Denmark","OECD",2021,2020,306,7.88
"Europe","Denmark","OECD",2022,2018,540,379.51
"Europe","Denmark","OECD",2022,2019,108,26.72
"Europe","Denmark","OECD",2022,2020,752,307.2
"Europe","Denmark","IMF",2020,2018,157,24.24
"Europe","Denmark","IMF",2020,2019,303,79.04
"Europe","Denmark","IMF",2020,2020,286,122.36
"Europe","Denmark","IMF",2021,2018,569,69.32
"Europe","Denmark","IMF",2021,2019,808,642.67
"Europe","Denmark","IMF",2021,2020,157,5.58
"Europe","Denmark","IMF",2022,2018,147,112.21
"Europe","Denmark","IMF",2022,2019,414,311.16
"Europe","Denmark","IMF",2022,2020,774,230.46
"Europe","Denmark","WorldBank",2020,2018,695,350.03
"Europe","Denmark","WorldBank",2020,2019,511,209.84
"Europe","Denmark","WorldBank",2020,2020,181,29.27
"Europe","Denmark","WorldBank",2021,2018,503,176.89
"Europe","Denmark","WorldBank",2021,2019,710,609.02
"Europe","Denmark","WorldBank",2021,2020,264,165.78
"Europe","Denmark","WorldBank",2022,2018,670,638.99
"Europe","Denmark","WorldBank",2022,2019,651,354.6
"Europe","Denmark","WorldBank",2022,2020,632,623.94
"Europe","Estonia","OECD",2020,2018,838,263.67
"Europe","Estonia","OECD",2020,2019,638,533.95
"Europe","Estonia","OECD",2020,2020,898,638.73
"Europe","Estonia","OECD",2021,2018,262,98.16
"Europe","Estonia","OECD",2021,2019,569,552.54
"Europe","Estonia","OECD",2021,2020,868,252.48
"Europe","Estonia","OECD",2022,2018,927,264.65
"Europe","Estonia","OECD",2022,2019,205,150.6
"Europe","Estonia","OECD",2022,2020,828,752.61
"Europe","Estonia","IMF",2020,2018,841,176.31
"Europe","Estonia","IMF",2020,2019,614,230.55
"Europe","Estonia","IMF",2020,2020,500,41.19
"Europe","Estonia","IMF",2021,2018,510,169.68
"Europe","Estonia","IMF",2021,2019,765,401.85
"Europe","Estonia","IMF",2021,2020,751,319.6
"Europe","Estonia","IMF",2022,2018,314,58.81
"Europe","Estonia","IMF",2022,2019,155,2.24
"Europe","Estonia","IMF",2022,2020,734,187.6
"Europe","Estonia","WorldBank",2020,2018,332,160.17
"Europe","Estonia","WorldBank",2020,2019,466,385.33
"Europe","Estonia","WorldBank",2020,2020,487,435.06
"Europe","Estonia","WorldBank",2021,2018,461,249.19
"Europe","Estonia","WorldBank",2021,2019,932,763.38
"Europe","Estonia","WorldBank",2021,2020,650,463.91
"Europe","Estonia","WorldBank",2022,2018,570,549.97
"Europe","Estonia","WorldBank",2022,2019,909,80.48
"Europe","Estonia","WorldBank",2022,2020,523,242.22
"Europe","Finland","OECD",2020,2018,565,561.64
"Europe","Finland","OECD",2020,2019,646,161.62
"Europe","Finland","OECD",2020,2020,194,133.69
"Europe","Finland","OECD",2021,2018,529,39.76
"Europe","Finland","OECD",2021,2019,800,680.12
"Europe","Finland","OECD",2021,2020,418,399.19
"Europe","Finland","OECD",2022,2018,591,253.12
"Europe","Finland","OECD",2022,2019,457,272.58
"Europe","Finland","OECD",2022,2020,157,105.1
"Europe","Finland","IMF",2020,2018,860,445.03
"Europe","Finland","IMF",2020,2019,108,47.72
"Europe","Finland","IMF",2020,2020,523,500.58
"Europe","Finland","IMF",2021,2018,560,81.47
"Europe","Finland","IMF",2021,2019,830,664.64
"Europe","Finland","IMF",2021,2020,903,762.62
"Europe","Finland","IMF",2022,2018,179,167.73
"Europe","Finland","IMF",2022,2019,137,98.98
"Europe","Finland","IMF",2022,2020,666,524.86
"Europe","Finland","WorldBank",2020,2018,319,146.01
"Europe","Finland","WorldBank",2020,2019,401,219.56
"Europe","Finland","WorldBank",2020,2020,711,45.35
"Europe","Finland","WorldBank",2021,2018,828,20.97
"Europe","Finland","WorldBank",2021,2019,180,66.3
"Europe","Finland","WorldBank",2021,2020,682,92.57
"Europe","Finland","WorldBank",2022,2018,254,81.2
"Europe","Finland","WorldBank",2022,2019,619,159.08
"Europe","Finland","WorldBank",2022,2020,191,184.4
"""
df = pd.read_csv(StringIO(DATA))
df["country"] = df["country"].astype("category")
df["country_id"] = df.country.cat.codes
model = sm.OLS.from_formula("gdp ~ population + C(year_publication) + C(country)", df)
result = model.fit(
cov_type='cluster',
cov_kwds={'groups': np.array(df[['country_id', 'year_publication']])},
use_t=True
)
print(result.summary())

how to solve IndexError : single positional indexer is out-of-bounds

CODE:-
from datetime import date
from datetime import timedelta
from nsepy import get_history
import pandas as pd
import datetime
# import matplotlib.pyplot as mp
end1 = date.today()
start1 = end1 - timedelta(days=365)
stock = [
'RELIANCE','HDFCBANK','INFY','ICICIBANK','HDFC','TCS','KOTAKBANK','LT','SBIN','HINDUNILVR','AXISBANK','ITC','BAJFINANCE','BHARTIARTL','ASIANPAINT','HCLTECH','MARUTI','TITAN','BAJAJFINSV','TATAMOTORS',
'TECHM','SUNPHARMA','TATASTEEL','M&M','WIPRO','ULTRACEMCO','POWERGRID','HINDALCO','NTPC','NESTLEIND','GRASIM','ONGC','JSWSTEEL','HDFCLIFE','INDUSINDBK','SBILIFE','DRREDDY','ADANIPORTS','DIVISLAB','CIPLA',
'BAJAJ-AUTO','TATACONSUM','UPL','BRITANNIA','BPCL','EICHERMOT','HEROMOTOCO','COALINDIA','SHREECEM','IOC','VEDL','ADANIENT', 'APOLLOHOSP', 'TATAPOWER', 'PIDILITIND', 'SRF', 'NAUKRI', 'ICICIGI', 'DABUR',
'GODREJCP', 'HAVELLS', 'PEL', 'VOLTAS', 'AUBANK', 'LTI', 'CHOLAFIN', 'AMBUJACEM', 'MARICO', 'SRTRANSFIN','GAIL', 'MCDOWELL-N', 'MPHASIS', 'MINDTREE', 'PAGEIND', 'ZEEL', 'BEL', 'TRENT', 'CROMPTON', 'JUBLFOOD',
'DLF', 'SBICARD', 'SIEMENS', 'BANDHANBNK', 'IRCTC', 'LAURUSLABS', 'PIIND', 'INDIGO', 'INDUSTOWER','ICICIPRULI', 'MOTHERSON', 'AARTIIND', 'FEDERALBNK', 'BANKBARODA', 'PERSISTENT', 'HINDPETRO', 'ACC',
'AUROPHARMA', 'COLPAL', 'GODREJPROP', 'MFSL', 'LUPIN', 'BIOCON', 'ASHOKLEY', 'BHARATFORG', 'BERGEPAINT','JINDALSTEL', 'ASTRAL', 'IEX', 'NMDC', 'CONCOR', 'INDHOTEL', 'BALKRISIND', 'PETRONET', 'CANBK', 'ALKEM',
'DIXON', 'DEEPAKNTR', 'DALBHARAT', 'TVSMOTOR', 'ATUL', 'HDFCAMC', 'TATACOMM', 'MUTHOOTFIN', 'TATACHEM','SAIL', 'IDFCFIRSTB', 'PFC', 'BOSCHLTD', 'MRF', 'NAVINFLUOR', 'CUMMINSIND', 'IGL', 'IPCALAB', 'COFORGE',
'ESCORTS', 'TORNTPHARM', 'LTTS', 'RECLTD', 'LICHSGFIN', 'BATAINDIA', 'HAL', 'PNB', 'GUJGASLTD', 'UBL','3MINDIA','ABB','AIAENG','APLAPOLLO','AARTIDRUGS','AAVAS','ABBOTINDIA','ADANIGREEN','ATGL','ABCAPITAL',
'ABFRL','ABSLAMC','ADVENZYMES','AEGISCHEM','AFFLE','AJANTPHARM','ALKYLAMINE','ALLCARGO','AMARAJABAT','AMBER','ANGELONE','ANURAS','APTUS','ASAHIINDIA','ASTERDM','ASTRAZEN','AVANTIFEED','DMART','BASF',
'BSE','BAJAJELEC','BAJAJHLDNG','BALAMINES','BALRAMCHIN','BANKINDIA','MAHABANK','BAYERCROP','BDL','BEL','BHEL','BIRLACORPN','BSOFT','BLUEDART','BLUESTARCO','BORORENEW','BOSCHLTD','BRIGADE','BCG','MAPMYINDIA'
]
target_stocks_list = []
target_stocks = pd.DataFrame()
for stock in stock:
    vol = get_history(symbol=stock,
                      start=start1,
                      end=end1)
    d_vol = pd.concat([vol['Deliverable Volume']])
    symbol_s = pd.concat([vol['Symbol']])
    close = pd.concat([vol['Close']])
    df = pd.DataFrame(symbol_s)
    df['D_vol'] = d_vol
    # print(df)
    cond = df['D_vol'].iloc[-1] > max(df['D_vol'].iloc[-91:-1])
    if(cond):
        target_stocks_list.append(stock)
        target_stocks = pd.concat([target_stocks, df])
print(target_stocks_list)
file_name = f'{datetime.datetime.now().day}-{datetime.datetime.now().month}-{datetime.datetime.now().year}.csv'
target_stocks.to_csv(f'D:/HUGE VOLUME SPURTS/first 250/SEP 2022/{file_name}')
pd.set_option('display.max_columns',10)
pd.set_option('display.max_rows',2000)
print(target_stocks)
ERROR:-
C:\python\Python310\python.exe "C:/Users/Yogesh_PC/PycharmProjects/future oi data analysis/trial2.py"
Traceback (most recent call last):
File "C:\Users\Yogesh_PC\PycharmProjects\future oi data analysis\trial2.py", line 64, in <module>
cond = df['D_vol'].iloc[-1] > max(df['D_vol'].iloc[-91:-1])
File "C:\python\Python310\lib\site-packages\pandas\core\indexing.py", line 967, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "C:\python\Python310\lib\site-packages\pandas\core\indexing.py", line 1520, in _getitem_axis
self._validate_integer(key, axis)
File "C:\python\Python310\lib\site-packages\pandas\core\indexing.py", line 1452, in _validate_integer
raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
Process finished with exit code 1
The code above fetches historical data for Indian stocks. The exchange website updates the data after the market closes, around 8:00 PM to 9:00 PM daily, and I run my code after that. On most days it produces output without any error, but it frequently throws the error shown above.
There are around 150-200 stocks in my list. The error occurs because the exchange sometimes does not update the data for one or two of them.
Please post code that skips the one or two stocks that have not been updated and still gives the output for all the rest.
For example: stocks = ['DLF', 'SBICARD', 'SIEMENS', 'BANDHANBNK', 'IRCTC', 'LAURUSLABS', 'PIIND',
'INDIGO', 'INDUSTOWER','ICICIPRULI', 'MOTHERSON']
Suppose the exchange didn't update the data for 'IRCTC' while all the other stocks above are up to date. My code then throws the error because of 'IRCTC' and shows nothing for the stocks that were updated.
Thank you.

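The "out-of-bounds" error means you are asking for a position that does not exist in the Series. Here it comes from df['D_vol'].iloc[-1], which fails when the Series is empty, i.e. when no rows came back for that stock. The slice
df['D_vol'].iloc[-91:-1]
does not raise on a short Series; it just returns fewer rows, and max() of an empty slice fails with a different error (ValueError).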
Edit:
Add a length check before the offending line:
if df['D_vol'].size > 90:
    cond = df['D_vol'].iloc[-1] > max(df['D_vol'].iloc[-91:-1])
    if(cond):
        target_stocks_list.append(stock)
        target_stocks = pd.concat([target_stocks, df])
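The same length check can be wrapped in a small helper so that stocks with missing or short data are skipped rather than stopping the whole run. This is only a sketch: has_volume_spurt and its lookback parameter are hypothetical names, and the three Series are made-up stand-ins for the 'Deliverable Volume' column that get_history would return.

```python
import pandas as pd


def has_volume_spurt(d_vol: pd.Series, lookback: int = 90) -> bool:
    """Return True if the latest value exceeds the max of the previous `lookback` values.

    Returns False (instead of raising) when there is not enough data,
    so the caller can simply skip that stock.
    """
    if d_vol.size <= lookback:
        return False
    latest = d_vol.iloc[-1]
    previous = d_vol.iloc[-(lookback + 1):-1]
    return bool(latest > previous.max())


# Made-up series standing in for the 'Deliverable Volume' column.
data = {
    "UPDATED": pd.Series(list(range(100)) + [1000]),  # 101 rows, last one is a spike
    "FLAT": pd.Series([5] * 101),                      # enough rows, no spike
    "MISSING": pd.Series([], dtype="float64"),         # nothing returned for this stock
}

# Stocks without enough data are skipped; the rest are still evaluated.
target_stocks_list = [name for name, series in data.items() if has_volume_spurt(series)]
print(target_stocks_list)
```

In the original loop, the helper would replace the cond line, for example cond = has_volume_spurt(df['D_vol']).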

Once my functions are nested and reference each other, my tuples return NoneType errors. Why?

I wrote functions to look up and replace tuples in a dictionary. The functions all work independently: when I run them individually, the tuple values come back as integers as planned. When they are run in sequence or nested inside other functions, unpacking the tuple raises a NoneType error. When I call type() on the value taken from the tuple, it reports an integer. I'm confused and want to solve this before I convert the code to a class structure to tidy it up.
My current workflow is: take in an integer (volume) determined by previous code > conditionally choose a divisor > round the value down > match the value in a dictionary > return the value from the dictionary > create a new tuple > replace the tuple in the dictionary
TypeError Traceback (most recent call last)
<ipython-input-23-1f1ae48bdda3> in <module>
----> 1 asp_master(275,dict_tuberack1,vol_tuberack1,tuberack1,'A1')
2 height_a1= get_height(dict_tuberack1,tuberack1,'A1')
3 asp_a(275,height_a1,tuberack1['A1'])
4 disp_master(275,dict_tuberack1,vol_tuberack1,tuberack1,'A2')
5 height_a2= get_height(dict_tuberack1,tuberack1,'A2')
<ipython-input-10-ba8bc2194a15> in asp_master(volume, dict_vol, dict_labware, labware, well)
1 def asp_master(volume,dict_vol,dict_labware,labware,well):
----> 2 if low_vol_check(volume,dict_labware,labware,well)==True:
3 new_vol=volume_sub(volume,dict_labware,labware,well)
4 tup_update_sub(new_vol,dict_vol,dict_labware,labware,well)
5 print(dict_labware[labware[well]])
<ipython-input-16-a23165d51020> in low_vol_check(volume, dict_labware, labware, well)
10
11 def low_vol_check(volume,dict_labware,labware,well):
---> 12 x=get_volume(dict_labware,labware,well)
13 y=volume
14 if x-y < 0:
<ipython-input-16-a23165d51020> in get_volume(dict_labware, labware, well)
1 def get_volume(dict_labware,labware,well):
2 tup = dict_labware.get(labware[well])
----> 3 (tup_v,tup_h)=tup
4 volume=tup_v
5 return tup_v
TypeError: cannot unpack non-iterable NoneType object
Robot Code:
from opentrons import protocol_api
from opentrons.simulate import get_protocol_api
from math import floor,ceil
from datetime import datetime
import opentrons
protocol = get_protocol_api('2.8')
tuberack1 = protocol.load_labware('opentrons_15_tuberack_nest_15ml_conical','2', 'tuberack1')
tuberack2 = protocol.load_labware('opentrons_24_tuberack_nest_1.5ml_snapcap','3','tuberack2')
tiprack= protocol.load_labware('opentrons_96_tiprack_300ul','4')
p300 = protocol.load_instrument('p300_single', 'left', tip_racks=[tiprack])
p300.home()
Code:
dict_tuberack1={tuberack1['A1']:(14000,104), tuberack1['A2']:(14000,104), tuberack1['A3']:(14000,104),}
vol_tuberack1= {14000: 104, 13500: 101, 13000: 98, 12500: 94,
12000: 91, 11500: 88, 11000: 85, 10500: 81,
10000: 78, 9500: 75, 9000: 72, 8500: 68,
8000: 65, 7500: 62, 7000: 59, 6500: 55,
6000: 52, 5500: 49,}
def get_volume(dict_labware,labware,well):
    tup = dict_labware.get(labware[well])
    (tup_v,tup_h)=tup
    volume=tup_v
    return tup_v
def low_vol_check(volume,dict_labware,labware,well):
    x=get_volume(dict_labware,labware,well)
    y=volume
    if x-y < 0:
        return False
    else:
        return True
def tup_update_sub(volume,dict_vol,dict_labware,labware,well):
    tup = dict_labware.get(labware[well])
    adj_list=list(tup)
    adj_list[0]=volume
    divisor=1
    if volume >=1000:
        divisor=1000
        vol_even=round_down(volume, divisor)
    elif 100 <= volume <1000: #this was the issue and was fixed.
        divisor=100
        vol_even=round_down(volume,divisor)
    else:
        divisor=10
        vol_even=round_down(volume,divisor)
    new_height=dict_vol.get(vol_even)
    adj_list[1]=new_height
    new_tup=tuple(adj_list)
    dict_labware[labware[well]] = new_tup
def asp_master(volume,dict_vol,dict_labware,labware,well):
    if low_vol_check(volume,dict_labware,labware,well)==True:
        new_vol=volume_sub(volume,dict_labware,labware,well)
        tup_update_sub(new_vol,dict_vol,dict_labware,labware,well)
        print(dict_labware[labware[well]])
    else:
        print('Cannot aspirate')
#robot commands below
def asp_a (volume,height,source):
    p300.pick_up_tip()
    p300.aspirate(volume, source.bottom(z=height))
def disp_a (volume,height,destination):
    p300.dispense(volume,destination.bottom(z=height+8))
    p300.blowout(height+8)
#code that generated error message below
asp_master(275,dict_tuberack1,vol_tuberack1,tuberack1,'A1')
height_a1= get_height(dict_tuberack1,tuberack1,'A1')
asp_a(275,height_a1,tuberack1['A1'])
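One reading of the traceback, offered as a hypothesis rather than a confirmed diagnosis: asp_master is defined with the parameters (volume, dict_vol, dict_labware, labware, well) but is called as asp_master(275, dict_tuberack1, vol_tuberack1, tuberack1, 'A1'), so vol_tuberack1 ends up in the dict_labware slot. That dictionary is keyed by volumes, not wells, so .get() returns None and the unpacking fails. The sketch below reproduces this with plain strings standing in for the Opentrons well objects; the wells dictionary and the shortened tables are made up for the example.

```python
# Plain strings stand in for the Opentrons well objects used in the question.
wells = {"A1": "well-A1"}

# Well -> (volume, height), like dict_tuberack1.
dict_labware = {"well-A1": (14000, 104)}
# Volume -> height lookup table, like vol_tuberack1 (shortened).
dict_vol = {14000: 104, 13500: 101}


def get_volume(dict_labware, labware, well):
    """Same logic as in the question: look up the well, unpack (volume, height)."""
    tup = dict_labware.get(labware[well])  # .get returns None for a missing key
    tup_v, tup_h = tup
    return tup_v


# Correct dictionary in the dict_labware slot: the lookup succeeds.
print(get_volume(dict_labware, wells, "A1"))

# Lookup table passed where the well dictionary is expected:
# the well is not a key, .get returns None, and unpacking raises TypeError.
try:
    get_volume(dict_vol, wells, "A1")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Calling with keyword arguments, e.g. dict_vol=... and dict_labware=..., is one way to make this kind of mix-up harder to introduce.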

matplotlib xlim TypeError: '>' not supported between instances of 'int' and 'list'

This is the original repo I'm trying to run on my computer: https://github.com/kreamkorokke/cs244-final-project
import os
import matplotlib.pyplot as plt
import argparse
from attacker import check_attack_type
IMG_DIR = "./plots"
def read_lines(f, d):
    lines = f.readlines()[:-1]
    for line in lines:
        typ, time, num = line.split(',')
        if typ == 'seq':
            d['seq']['time'].append(float(time))
            d['seq']['num'].append(float(num))
        elif typ == 'ack':
            d['ack']['time'].append(float(time))
            d['ack']['num'].append(float(num))
        else:
            raise "Unknown type read while parsing log file: %s" % typ
def main():
    parser = argparse.ArgumentParser(description="Plot script for plotting sequence numbers.")
    parser.add_argument('--save', dest='save_imgs', action='store_true',
                        help="Set this to true to save images under specified output directory.")
    parser.add_argument('--attack', dest='attack',
                        nargs='?', const="", type=check_attack_type,
                        help="Attack name (used in plot names).")
    parser.add_argument('--output', dest='output_dir', default=IMG_DIR,
                        help="Directory to store plots.")
    args = parser.parse_args()
    save_imgs = args.save_imgs
    output_dir = args.output_dir
    attack_name = args.attack
    if save_imgs and attack_name not in ['div', 'dup', 'opt'] :
        print("Attack name needed for saving plot figures.")
        return
    normal_log = {'seq':{'time':[], 'num':[]}, 'ack':{'time':[], 'num':[]}}
    attack_log = {'seq':{'time':[], 'num':[]}, 'ack':{'time':[], 'num':[]}}
    normal_f = open('log.txt', 'r')
    attack_f = open('%s_attack_log.txt' % attack_name, 'r')
    read_lines(normal_f, normal_log)
    read_lines(attack_f, attack_log)
    if attack_name == 'div':
        attack_desc = 'ACK Division'
    elif attack_name == 'dup':
        attack_desc = 'DupACK Spoofing'
    elif attack_name == 'opt':
        attack_desc = 'Optimistic ACKing'
    else:
        raise 'Unknown attack type: %s' % attack_name
    norm_seq_time, norm_seq_num = normal_log['seq']['time'], normal_log['seq']['num']
    norm_ack_time, norm_ack_num = normal_log['ack']['time'], normal_log['ack']['num']
    atck_seq_time, atck_seq_num = attack_log['seq']['time'], attack_log['seq']['num']
    atck_ack_time, atck_ack_num = attack_log['ack']['time'], attack_log['ack']['num']
    plt.plot(norm_seq_time, norm_seq_num, 'b^', label='Regular TCP Data Segments')
    plt.plot(norm_ack_time, norm_ack_num, 'bx', label='Regular TCP ACKs')
    plt.plot(atck_seq_time, atck_seq_num, 'rs', label='%s Attack Data Segments' % attack_desc)
    plt.plot(atck_ack_time, atck_ack_num, 'r+', label='%s Attack ACKs' % attack_desc)
    plt.legend(loc='upper left')
    x = max(max(norm_seq_time, norm_ack_time),max(atck_seq_time, atck_ack_time))
    y = max(max(norm_seq_num, norm_ack_num),max(atck_seq_num, atck_ack_num))
    plt.xlim(0, x)
    plt.ylim(0,y)
    plt.xlabel('Time (s)')
    plt.ylabel('Sequence Number (Bytes)')
    if save_imgs:
        # Save images to figure/
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)
        plt.savefig(output_dir + "/" + attack_name)
    else:
        plt.show()
    normal_f.close()
    attack_f.close()
if __name__ == "__main__":
    main()
After running this, I get this error:
Traceback (most recent call last):
File "plot.py", line 85, in <module>
main()
File "plot.py", line 66, in main
plt.xlim(0, a)
File "/usr/lib/python3/dist-packages/matplotlib/pyplot.py", line 1427, in xlim
ret = ax.set_xlim(*args, **kwargs)
File "/usr/lib/python3/dist-packages/matplotlib/axes/_base.py", line 3267, in set_xlim
reverse = left > right
TypeError: '>' not supported between instances of 'int' and 'list'
Done! Please check ./plots for all generated plots.
How can I solve this problem? Or, better yet, is there another way of running this project? I installed matplotlib with the pip3 install matplotlib command (same with scapy). My default Python version is Python 2 right now, but I run the project with Python 3. Could that be the issue? What am I missing? Or is it about Mininet itself?

The problem is in this line
x = max(max(norm_seq_time, norm_ack_time),max(atck_seq_time, atck_ack_time))
IIUC, you want to assign to x the maximum value among all four lists. However, when you pass two lists to the max function, such as max(norm_seq_time, norm_ack_time), it compares the lists element by element and returns whichever list is the greater one, not the highest value across both lists. So x ends up being a list, and plt.xlim(0, x) fails when matplotlib tries to compare the int 0 with that list.
Instead, you can do something like:
x = max(norm_seq_time + norm_ack_time + atck_seq_time + atck_ack_time)
This will concatenate the four lists into a single one. Then, the function will return the highest value among all of them. You might wanna do that to the calculation of y as well.
If this is not what you wanted, or if you have any further issues, please let us know.
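To make the difference concrete, here is a minimal sketch. The sample values are made up for illustration and only stand in for the parsed log data; the variable names as_list and as_number are not from the original script.

```python
# Hypothetical sample values standing in for the parsed log data.
norm_seq_time = [0.1, 0.5, 2.0]
norm_ack_time = [0.2, 0.3]

# With two list arguments, max() compares the lists element by element
# and returns one of the lists, not a number.
as_list = max(norm_seq_time, norm_ack_time)

# Concatenating first hands max() a single flat list of numbers.
as_number = max(norm_seq_time + norm_ack_time)

print(as_list)    # [0.2, 0.3]
print(as_number)  # 2.0
```

Note that the returned list is not even the one containing the largest value: [0.2, 0.3] wins because 0.2 > 0.1 at the first position.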

With the help of a friend, we solved this problem by changing that part of the code to this:
max_norm_seq_time = max(norm_seq_time) if len(norm_seq_time) > 0 else 0
max_norm_ack_time = max(norm_ack_time) if len(norm_ack_time) > 0 else 0
max_atck_seq_time = max(atck_seq_time) if len(atck_seq_time) > 0 else 0
max_atck_ack_time = max(atck_ack_time) if len(atck_ack_time) > 0 else 0
x = max((max_norm_seq_time, max_norm_ack_time,
         max_atck_seq_time, max_atck_ack_time))
plt.xlim([0,x])
max_norm_seq_num = max(norm_seq_num) if len(norm_seq_num) > 0 else 0
max_norm_ack_num = max(norm_ack_num) if len(norm_ack_num) > 0 else 0
max_atck_seq_num = max(atck_seq_num) if len(atck_seq_num) > 0 else 0
max_atck_ack_num = max(atck_ack_num) if len(atck_ack_num) > 0 else 0
plt.ylim([0, max((max_norm_seq_num, max_norm_ack_num,
                  max_atck_seq_num, max_atck_ack_num))])
Writing it here just in case anyone else needs it.
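The same empty-list guard can be written more compactly. The sketch below is one possible variant, not the code from the repo: overall_max is a hypothetical helper name, and it relies on the default argument of max(), which assumes Python 3.4 or later.

```python
from itertools import chain

def overall_max(*series, default=0):
    """Return the largest value across all given lists, or `default` if every list is empty."""
    return max(chain.from_iterable(series), default=default)

# Hypothetical sample data; one list is empty, as can happen with a short log.
x = overall_max([0.1, 2.0], [], [1.5])
empty = overall_max([], [])
```

With this helper, the limits could be set as plt.xlim([0, overall_max(norm_seq_time, norm_ack_time, atck_seq_time, atck_ack_time)]), and likewise for plt.ylim with the four num lists.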

Runtime Exception. Exception in python callback function evaluation:

I am working on an assignment for Coursera's Machine Learning: Regression course. I am using the kc_house_data.gl/ dataset and GraphLab Create. I am adding new variables to train_data and test_data that are combinations of old variables. Then I take the mean of all these variables. These are the variables I am adding:
bedrooms_squared = bedrooms * bedrooms
bed_bath_rooms = bedrooms*bathrooms
log_sqft_living = log(sqft_living)
lat_plus_long = lat + long
Here is my code:
train_data['bedrooms_squared'] = train_data['bedrooms'].apply(lambda x: x**2)
test_data['bedrooms_squared'] = test_data['bedrooms'].apply(lambda x: x**2)
# create the remaining 3 features in both TEST and TRAIN data
train_data['bed_bath_rooms'] = train_data.apply(lambda row: row['bedrooms'] * row['bathrooms'])
test_data['bed_bath_rooms'] = test_data.apply(lambda row: row['bedrooms'] * row['bathrooms'])
train_data['log_sqft_living'] = train_data['sqft_living'].apply(lambda x: log(x))
test_data['log_sqft_living'] = test_data['bedrooms'].apply(lambda x: log(x))
train_data['lat_plus_long'] = train_data.apply(lambda row: row['lat'] + row['long'])
train_data['lat_plus_long'] = train_data.apply(lambda row: row['lat'] + row['long'])
test_data['bedrooms_squared'].mean()
test_data['bed_bath_rooms'].mean()
test_data['log_sqft_living'].mean()
test_data['lat_plus_long'].mean()
This is the error I'm getting:
RuntimeError: Runtime Exception. Exception in python callback function evaluation:
ValueError('math domain error',):
Traceback (most recent call last):
File "graphlab\cython\cy_pylambda_workers.pyx", line 426, in graphlab.cython.cy_pylambda_workers._eval_lambda
File "graphlab\cython\cy_pylambda_workers.pyx", line 169, in graphlab.cython.cy_pylambda_workers.lambda_evaluator.eval_simple
File "<ipython-input-13-1cdbcd5f5d9b>", line 5, in <lambda>
ValueError: math domain error
I have no idea what this means. Any idea on what caused it and how I fix it? Thanks.

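Your problem is that log is receiving a value that is zero or negative.
log is defined only for numbers greater than zero, so anything else raises "math domain error".
You need to check your values. Note that the test_data line takes the log of the 'bedrooms' column rather than 'sqft_living'; if any row has 0 bedrooms, that alone would trigger this error.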
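As a rough sketch of how to check the values, the snippet below uses plain Python lists rather than GraphLab objects. The sample values and the names safe_log and bad_values are assumptions made for illustration.

```python
import math

def safe_log(x):
    """Return log(x) for positive x, or None where log is undefined."""
    return math.log(x) if x > 0 else None

# Hypothetical column values; the 0 is the kind of input that triggers the error.
values = [3, 0, 2]

# List the offending inputs before applying log.
bad_values = [v for v in values if v <= 0]

# Confirm that the standard library raises ValueError for log(0).
try:
    math.log(0)
    raised = False
except ValueError:
    raised = True

# Apply the guarded version instead of the bare log.
logs = [safe_log(v) for v in values]
```

A function like safe_log could be passed to apply in place of the bare log lambda, though whether missing values are acceptable in the resulting column depends on the assignment.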

Add exception handling to make your code more robust:
try:
    train_data['log_sqft_living'] = train_data['sqft_living'].apply(lambda x: log(x))
    test_data['log_sqft_living'] = test_data['bedrooms'].apply(lambda x: log(x))
    train_data['lat_plus_long'] = train_data.apply(lambda row: row['lat'] + row['long'])
    train_data['lat_plus_long'] = train_data.apply(lambda row: row['lat'] + row['long'])
    test_data['bedrooms_squared'].mean()
    test_data['bed_bath_rooms'].mean()
    test_data['log_sqft_living'].mean()
    test_data['lat_plus_long'].mean()
except Exception as e:
    print("ERROR in function: %s" % e)
