I have a dataframe of the type:
date TICKER x1 x2 ... Z Y month x3
0 1999-12-31 A UN Equity 52.1330 51.9645 ... 0.0052 NaN 12 NaN
1 1999-12-31 AA UN Equity 92.9415 92.8715 ... 0.0052 NaN 12 NaN
2 1999-12-31 ABC UN Equity 3.6843 3.6539 ... 0.0052 NaN 12 NaN
3 1999-12-31 ABF UN Equity 22.0625 21.9375 ... 0.0052 NaN 12 NaN
4 1999-12-31 ABM UN Equity 10.2188 10.1250 ... 0.0052 NaN 12 NaN
I would like to run an OLS regression with the formula 'Y ~ x1 + x2:x3', grouped by ['TICKER','year','month'] (year is a column that does not appear here), using statsmodels.formula.api as smf. I therefore use:
data.groupby(['TICKER','year','month']).apply(lambda x: smf.ols(formula='Y ~ x1 + x2:x3', data=x))
However, I get the following error:
IndexError: tuple index out of range
Any idea why?
The full traceback is:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 894, in apply
result = self._python_apply_general(f, self._selected_obj)
File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 928, in _python_apply_general
keys, values, mutated = self.grouper.apply(f, data, self.axis)
File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\pandas\core\groupby\ops.py", line 238, in apply
res = f(group)
File "<input>", line 1, in <lambda>
File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\base\model.py", line 195, in from_formula
mod = cls(endog, exog, *args, **kwargs)
File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\regression\linear_model.py", line 872, in __init__
super(OLS, self).__init__(endog, exog, missing=missing,
File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\regression\linear_model.py", line 703, in __init__
super(WLS, self).__init__(endog, exog, missing=missing,
File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\regression\linear_model.py", line 190, in __init__
super(RegressionModel, self).__init__(endog, exog, **kwargs)
File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\base\model.py", line 237, in __init__
super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\base\model.py", line 77, in __init__
self.data = self._handle_data(endog, exog, missing, hasconst,
File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\base\model.py", line 101, in _handle_data
data = handle_data(endog, exog, missing, hasconst, **kwargs)
File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\base\data.py", line 672, in handle_data
return klass(endog, exog=exog, missing=missing, hasconst=hasconst,
File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\base\data.py", line 71, in __init__
arrays, nan_idx = self.handle_missing(endog, exog, missing,
File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\base\data.py", line 247, in handle_missing
if combined_nans.shape[0] != nan_mask.shape[0]:
IndexError: tuple index out of range
I see that your Y column has a lot of NaNs, so you need to ensure that each subgroup has enough observations for the regression to work.
So if I use some example data:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf

np.random.seed(123)
data = pd.concat([
    pd.DataFrame({'TICKER': np.random.choice(['A','B','C'], 30),
                  'year': np.random.choice([2000, 2001], 30),
                  'month': np.random.choice([1, 2], 30)}),
    pd.DataFrame(np.random.normal(0, 1, (30, 4)), columns=['Y','x1','x2','x3'])
], axis=1)
data.loc[:6, 'Y'] = np.nan
If I run your code on the data frame above, I get the same error.
So if we use only the complete cases (the rows relevant to your regression):
complete_ix = data[['Y','x1','x2','x3']].dropna().index
data.loc[complete_ix].groupby(['TICKER','year','month']).apply(lambda x: smf.ols(formula='Y ~ x1 + x2:x3', data=x))
It works:
TICKER year month
A 2000 2 <statsmodels.regression.linear_model.OLS objec...
2001 1 <statsmodels.regression.linear_model.OLS objec...
2 <statsmodels.regression.linear_model.OLS objec...
B 2000 1 <statsmodels.regression.linear_model.OLS objec...
2 <statsmodels.regression.linear_model.OLS objec...
2001 1 <statsmodels.regression.linear_model.OLS objec...
C 2000 1 <statsmodels.regression.linear_model.OLS objec...
2 <statsmodels.regression.linear_model.OLS objec...
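If you would rather keep everything in a single apply, here is a minimal sketch of the same idea (the helper name safe_ols and the min_obs guard are my own, not from the code above): drop incomplete rows inside each group, skip groups too small to estimate the intercept, x1 and x2:x3, and return fitted results:
def safe_ols(g, min_obs=4):
    # keep only rows complete in every variable the formula uses
    g = g.dropna(subset=['Y', 'x1', 'x2', 'x3'])
    if len(g) < min_obs:  # too few rows to estimate 3 parameters
        return None
    return smf.ols('Y ~ x1 + x2:x3', data=g).fit()

results = data.groupby(['TICKER', 'year', 'month']).apply(safe_ols).dropna()
Because this calls .fit(), the returned objects are RegressionResults, so you can read off e.g. results.iloc[0].params directly.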
I was looking at this answer by Roman Pekar for using apply. I initially copied the code exactly and it worked fine. Then I used it on my df3, which is created from a CSV file, and I got a KeyError. I checked the datatypes; the columns I am using are int64, so that is okay, and I don't have nulls. If I can get this working, I will make the function more complex. How do I get this working?
def fxy(x, y):
    return x * y

df3 = pd.read_csv(path + 'test_data.csv', usecols=[0, 1, 2])
print(df3.dtypes)
df3['Area'] = df3.apply(lambda x: fxy(x['Len'], x['Width']))
Traceback:
Traceback (most recent call last):
File "f:\...\my_file.py", line 54, in <module>
df3['Area'] = df3.apply(lambda x: fxy(x['Len'], x['Width']))
File "C:\...\frame.py", line 8833, in apply
return op.apply().__finalize__(self, method="apply")
File "C:\...\apply.py", line 727, in apply
return self.apply_standard()
File "C:\...\apply.py", line 851, in apply_standard
results, res_index = self.apply_series_generator()
File "C:\...\apply.py", line 867, in apply_series_generator
results[i] = self.f(v)
File "f:\...\my_file.py", line 54, in <lambda>
df3['Area'] = df3.apply(lambda x: fxy(x['Len'], x['Width']))
File "C:\...\series.py", line 958, in __getitem__
return self._get_value(key)
File "C:\...\series.py", line 1069, in _get_value
loc = self.index.get_loc(label)
File "C:\...\range.py", line 389, in get_loc
raise KeyError(key)
KeyError: 'Len'
I don't see a way to attach the CSV file. Below is a sample of df3; if I save it with Excel as "CSV (Comma delimited) (*.csv)", I get the same results.
ID   Len  Width
A    170      4
B    362      5
C     12     15
D     42      7
E     15      3
F     46     49
G     71     74
I think you are missing axis=1 in apply:
df3['Area'] = df3.apply(lambda x: fxy(x['Len'], x['Width']), axis=1)
But in your case, you can just do:
df3['Area'] = df3['Len'] * df3['Width']
print(df3)
# Output
ID Len Width Area
0 A 170 4 680
1 B 362 5 1810
2 C 12 15 180
3 D 42 7 294
4 E 15 3 45
5 F 46 49 2254
6 G 71 74 5254
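For intuition, here is a small standalone sketch (made-up data, not the asker's CSV) of what the axis switch changes:
import pandas as pd

df3 = pd.DataFrame({'ID': ['A', 'B'], 'Len': [170, 362], 'Width': [4, 5]})

# With the default axis=0, apply passes each COLUMN to the function as a
# Series, so x['Len'] looks up a row label 'Len' and raises KeyError.
# With axis=1, apply passes each ROW, and x['Len'] is that row's value.
df3['Area'] = df3.apply(lambda x: x['Len'] * x['Width'], axis=1)
print(df3)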
I have label data in which some of the values are np.nan.
I want to convert the data to one-hot vectors using LabelBinarizer, with np.nan converting to a zero array.
But I get an error. I was able to convert the data with get_dummies from pandas.
I can't use the get_dummies function because the train and test data come in different files at different times. I want to use an sklearn model so I can save it and use it later.
Example code:
In[11]: df = pd.DataFrame({'CITY':['London','NYC','Manchester',np.nan],'Country':['UK','US','UK','AUS']})
In[12]: df
Out[12]:
CITY Country
0 London UK
1 NYC US
2 Manchester UK
3 NaN AUS
In[13]: pd.get_dummies(df['CITY'])
Out[13]:
London Manchester NYC
0 1 0 0
1 0 0 1
2 0 1 0
3 0 0 0
In[14]: from sklearn.preprocessing import LabelBinarizer
lb = LabelBinarizer()
In[15]: lb.fit_transform(df['CITY'])
Traceback (most recent call last):
File "/home/oshrib/.conda/envs/on_target/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-16-d0afb38b2695>", line 1, in <module>
lb.fit_transform(df['CITY'])
File "/home/oshrib/.conda/envs/on_target/lib/python3.5/site-packages/sklearn/preprocessing/label.py", line 307, in fit_transform
return self.fit(y).transform(y)
File "/home/oshrib/.conda/envs/on_target/lib/python3.5/site-packages/sklearn/preprocessing/label.py", line 276, in fit
self.y_type_ = type_of_target(y)
File "/home/oshrib/.conda/envs/on_target/lib/python3.5/site-packages/sklearn/utils/multiclass.py", line 288, in type_of_target
if (len(np.unique(y)) > 2) or (y.ndim >= 2 and len(y[0]) > 1):
File "/home/oshrib/.conda/envs/on_target/lib/python3.5/site-packages/numpy/lib/arraysetops.py", line 223, in unique
return _unique1d(ar, return_index, return_inverse, return_counts)
File "/home/oshrib/.conda/envs/on_target/lib/python3.5/site-packages/numpy/lib/arraysetops.py", line 283, in _unique1d
ar.sort()
TypeError: unorderable types: float() < str()
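The traceback bottoms out in np.unique, which cannot sort a float NaN against strings. One possible workaround, as a sketch only (the sentinel string 'MISSING' is my own choice and must not collide with a real label): fill NaN before fitting, then drop the sentinel's column so NaN rows become zero arrays:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

df = pd.DataFrame({'CITY': ['London', 'NYC', 'Manchester', np.nan]})

lb = LabelBinarizer()
onehot = lb.fit_transform(df['CITY'].fillna('MISSING'))
keep = lb.classes_ != 'MISSING'   # boolean mask over the fitted classes
onehot = onehot[:, keep]          # the NaN row is now an all-zero vector
Because lb is a fitted sklearn estimator, it can be pickled and applied to the test file later with transform plus the same keep mask.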
df = quandl.get('NSE/TATAMOTORS', start_date='2000-01-01', end_date='2018-05-10')
df=df.drop(['Last','Total Trade Quantity','Turnover (Lacs)'], axis=1)
df.head(10)
OUTPUT -
Open High Low Close
Date
2003-12-26 435.80 440.50 431.65 438.60
2003-12-29 441.00 449.70 441.00 447.80
2003-12-30 450.00 451.90 430.10 442.40
2003-12-31 446.00 459.30 443.55 452.05
2004-01-01 453.25 457.90 451.50 454.45
2004-01-02 458.00 460.35 454.05 456.40
2004-01-05 458.00 465.00 450.60 454.85
2004-01-06 460.00 465.00 448.50 454.45
2004-01-07 451.40 454.70 438.10 446.45
2004-01-08 449.00 466.95 449.00 464.75
Then I run:
from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(df, order=(5,1,0))
OUTPUT -
Traceback (most recent call last):
File "<ipython-input-90-799de8e60d6f>", line 1, in <module>
model = ARIMA(df, order=(5,1,0))
File "D:\A\lib\site-packages\statsmodels\tsa\arima_model.py", line 1000, in __new__
mod.__init__(endog, order, exog, dates, freq, missing)
File "D:\A\lib\site-packages\statsmodels\tsa\arima_model.py", line 1024, in __init__
self.data.ynames = 'D.' + self.endog_names
TypeError: must be str, not list
So I converted the date index into a proper column with:
df = df.reset_index()
df.head(10)
Out[92]:
Date Open High Low Close
0 2003-12-26 435.80 440.50 431.65 438.60
1 2003-12-29 441.00 449.70 441.00 447.80
2 2003-12-30 450.00 451.90 430.10 442.40
3 2003-12-31 446.00 459.30 443.55 452.05
4 2004-01-01 453.25 457.90 451.50 454.45
5 2004-01-02 458.00 460.35 454.05 456.40
6 2004-01-05 458.00 465.00 450.60 454.85
7 2004-01-06 460.00 465.00 448.50 454.45
8 2004-01-07 451.40 454.70 438.10 446.45
9 2004-01-08 449.00 466.95 449.00 464.75
Then, when I run these lines:
from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(df, order=(5,1,0))
OUTPUT -
Traceback (most recent call last):
File "<ipython-input-94-799de8e60d6f>", line 1, in <module>
model = ARIMA(df, order=(5,1,0))
File "D:\A\lib\site-packages\statsmodels\tsa\arima_model.py", line 1000, in __new__
mod.__init__(endog, order, exog, dates, freq, missing)
File "D:\A\lib\site-packages\statsmodels\tsa\arima_model.py", line 1015, in __init__
super(ARIMA, self).__init__(endog, (p, q), exog, dates, freq, missing)
File "D:\A\lib\site-packages\statsmodels\tsa\arima_model.py", line 452, in __init__
super(ARMA, self).__init__(endog, exog, dates, freq, missing=missing)
File "D:\A\lib\site-packages\statsmodels\tsa\base\tsa_model.py", line 43, in __init__
super(TimeSeriesModel, self).__init__(endog, exog, missing=missing)
File "D:\A\lib\site-packages\statsmodels\base\model.py", line 212, in __init__
super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
File "D:\A\lib\site-packages\statsmodels\base\model.py", line 63, in __init__
**kwargs)
File "D:\A\lib\site-packages\statsmodels\base\model.py", line 88, in _handle_data
data = handle_data(endog, exog, missing, hasconst, **kwargs)
File "D:\A\lib\site-packages\statsmodels\base\data.py", line 630, in handle_data
**kwargs)
File "D:\A\lib\site-packages\statsmodels\base\data.py", line 76, in __init__
self.endog, self.exog = self._convert_endog_exog(endog, exog)
File "D:\A\lib\site-packages\statsmodels\base\data.py", line 471, in _convert_endog_exog
raise ValueError("Pandas data cast to numpy dtype of object. "
ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).
HELP?
ARIMA expects a 1D array-like object; if instead of a 2D array (the DataFrame) we use a 1D array (a Series), it will work.
Try:
ARIMA(df['Close'].values, order=(5,1,0))
where df has a DatetimeIndex and you select a single column:
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10 entries, 2003-12-26 to 2004-01-08
Data columns (total 4 columns):
Open 10 non-null float64
High 10 non-null float64
Low 10 non-null float64
Close 10 non-null float64
dtypes: float64(4)
memory usage: 400.0 bytes
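Equivalently, here is a sketch that keeps the dates (assuming the same deprecated statsmodels.tsa.arima_model API the question uses): pass the 1D Series itself rather than .values, so the DatetimeIndex carries through to the results:
from statsmodels.tsa.arima_model import ARIMA

model = ARIMA(df['Close'], order=(5, 1, 0))  # 1D Series, dates on the index
results = model.fit(disp=0)                  # disp=0 silences optimizer output
print(results.summary())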
I have an HDF5 file that contains a table where the column time is in datetime64[ns] format.
I want to get all the rows that are older than thresh. How can I do that? This is what I've tried:
thresh = pd.datetime.strptime('2018-03-08 14:19:41','%Y-%m-%d %H:%M:%S').timestamp()
hdf = pd.read_hdf(STORE, 'gh1', where = 'time>thresh' )
I get the following error:
Traceback (most recent call last):
File "<ipython-input-80-fa444735d0a9>", line 1, in <module>
runfile('/home/joao/github/control_panel/controlpanel/controlpanel/reading_test.py', wdir='/home/joao/github/control_panel/controlpanel/controlpanel')
File "/home/joao/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "/home/joao/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/home/joao/github/control_panel/controlpanel/controlpanel/reading_test.py", line 15, in <module>
hdf = pd.read_hdf(STORE, 'gh1', where = 'time>thresh' )
File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 370, in read_hdf
return store.select(key, auto_close=auto_close, **kwargs)
File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 717, in select
return it.get_result()
File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 1457, in get_result
results = self.func(self.start, self.stop, where)
File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 710, in func
columns=columns, **kwargs)
File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 4141, in read
if not self.read_axes(where=where, **kwargs):
File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 3340, in read_axes
self.selection = Selection(self, where=where, **kwargs)
File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 4706, in __init__
self.condition, self.filter = self.terms.evaluate()
File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/core/computation/pytables.py", line 556, in evaluate
self.condition = self.terms.prune(ConditionBinOp)
File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/core/computation/pytables.py", line 118, in prune
res = pr(left.value, right.value)
File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/core/computation/pytables.py", line 113, in pr
encoding=self.encoding).evaluate()
File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/core/computation/pytables.py", line 327, in evaluate
values = [self.convert_value(v) for v in rhs]
File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/core/computation/pytables.py", line 327, in <listcomp>
values = [self.convert_value(v) for v in rhs]
File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/core/computation/pytables.py", line 185, in convert_value
v = pd.Timestamp(v)
File "pandas/_libs/tslib.pyx", line 390, in pandas._libs.tslib.Timestamp.__new__
File "pandas/_libs/tslib.pyx", line 1549, in pandas._libs.tslib.convert_to_tsobject
File "pandas/_libs/tslib.pyx", line 1735, in pandas._libs.tslib.convert_str_to_tsobject
ValueError: could not convert string to Timestamp
Demo:
Creating a sample DataFrame (100,000 rows):
In [9]: N = 10**5
In [10]: dates = pd.date_range('1980-01-01', freq='99T', periods=N)
In [11]: df = pd.DataFrame({'date':dates, 'val':np.random.rand(N)})
In [12]: df
Out[12]:
date val
0 1980-01-01 00:00:00 0.985215
1 1980-01-01 01:39:00 0.452295
2 1980-01-01 03:18:00 0.780096
3 1980-01-01 04:57:00 0.004596
4 1980-01-01 06:36:00 0.515051
... ... ...
99995 1998-10-27 15:45:00 0.509954
99996 1998-10-27 17:24:00 0.046636
99997 1998-10-27 19:03:00 0.026678
99998 1998-10-27 20:42:00 0.660652
99999 1998-10-27 22:21:00 0.839426
[100000 rows x 2 columns]
Writing it to an HDF5 file (indexing the date column):
In [13]: df.to_hdf('d:/temp/test.h5', 'test', format='t', data_columns=['date'])
Reading from the HDF5 file conditionally by the indexed column:
In [14]: x = pd.read_hdf('d:/temp/test.h5', 'test', where="date > '1998-10-27 15:00:00'")
In [15]: x
Out[15]:
date val
99995 1998-10-27 15:45:00 0.509954
99996 1998-10-27 17:24:00 0.046636
99997 1998-10-27 19:03:00 0.026678
99998 1998-10-27 20:42:00 0.660652
99999 1998-10-27 22:21:00 0.839426
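Applied back to the question: the where string compared time against the literal name thresh, which pandas then tried, and failed, to parse as a Timestamp. A minimal fix sketch (assuming the STORE path and 'gh1' key from the question, and that time was saved as a data column) is to embed the datetime itself in the expression:
import pandas as pd

thresh = '2018-03-08 14:19:41'
# the quoted datetime literal is converted to a Timestamp by the where parser
hdf = pd.read_hdf(STORE, 'gh1', where='time > "%s"' % thresh)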
I want to perform some operations on a pandas DataFrame that is split into chunks. After splitting the DataFrame, I try to iterate over the chunks, but after the first iteration runs fine, I get an error (see below). I have gone through some questions like these: 1 and 2, but they don't quite address my issue. Kindly help me resolve this, as I don't fully understand it.
import pandas as pd

tupList = [('Eisenstadt', 'Paris', '1', '2'), ('London', 'Berlin', '1', '3'), ('Berlin', 'stuttgat', '1', '4'),
           ('Liverpool', 'Southampton', '1', '5'), ('Tirana', 'Blackpool', '1', '6'), ('blackpool', 'tirana', '1', '7'),
           ('Paris', 'Lyon', '1', '8'), ('Manchester', 'Nice', '1', '10'), ('Orleans', 'Madrid', '1', '12'),
           ('Lisbon', 'Stockholm', '1', '12')]
cities = pd.DataFrame(tupList, columns=['Origin', 'Destination', 'O_Code', 'D_code'])

# purpose - splits the DataFrame into chunks of max size chunkSize (last is smaller)
def splitDataFrameIntoSmaller(df, chunkSize=3):
    listOfDf = list()
    numberChunks = len(df) // chunkSize + 1
    for i in range(numberChunks):
        listOfDf.append(df[i*chunkSize:(i+1)*chunkSize])
    return listOfDf

citiesChunks = splitDataFrameIntoSmaller(cities)
for ind, cc in enumerate(citiesChunks):
    cc["distance"] = 0
    cc["time"] = 0
    for i in xrange(len(cc)):
        al = cc['Origin'][i]
        bl = cc['Destination'][i]
        '...'  # truncating to make it readable
    cc.to_csv('out.csv', sep=',', encoding='utf-8')
Traceback (most recent call last):
File ..., line 39, in <module>
al = cc['Origin'][i]
File ..., line 603, in __getitem__
result = self.index.get_value(self, key)
File ..., line 2169, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas\index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas\index.c:3557)
File "pandas\index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas\index.c:3240)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
File "pandas\src\hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:8564)
File "pandas\src\hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:8508)
KeyError: 0L
You can first floor-divide the index values, then use a list comprehension: loop over the unique index values and select with loc, and finally reset_index to remove the duplicated index:
cities.index = cities.index // 3
print (cities)
Origin Destination O_Code D_code
0 Eisenstadt Paris 1 2
0 London Berlin 1 3
0 Berlin stuttgat 1 4
1 Liverpool Southampton 1 5
1 Tirana Blackpool 1 6
1 blackpool tirana 1 7
2 Paris Lyon 1 8
2 Manchester Nice 1 10
2 Orleans Madrid 1 12
3 Lisbon Stockholm 1 12
citiesChunks = [cities.loc[[x]].reset_index(drop=True) for x in cities.index.unique()]
#print (citiesChunks)
print (citiesChunks[0])
Origin Destination O_Code D_code
0 Eisenstadt Paris 1 2
1 London Berlin 1 3
2 Berlin stuttgat 1 4
Finally, use iterrows if you need to loop over each DataFrame:
# write columns to file first
cols = ['Origin', 'Destination', 'O_Code', 'D_code', 'distance', 'time']
df = pd.DataFrame(columns=cols)
df.to_csv('out.csv', encoding='utf-8', index=False)

for ind, cc in enumerate(citiesChunks):
    cc["distance"] = 0
    cc["time"] = 0
    for i, val in cc.iterrows():
        al = cc.loc[i, 'Origin']
        bl = cc.loc[i, 'Destination']
        '...'  # truncating to make it readable
    cc.to_csv('out.csv', encoding='utf-8', mode='a', header=None, index=False)
    print(cc.to_csv(encoding='utf-8'))
,Origin,Destination,O_Code,D_code,distance,time
0,Eisenstadt,Paris,1,2,0,0
1,London,Berlin,1,3,0,0
2,Berlin,stuttgat,1,4,0,0
,Origin,Destination,O_Code,D_code,distance,time
0,Liverpool,Southampton,1,5,0,0
1,Tirana,Blackpool,1,6,0,0
2,blackpool,tirana,1,7,0,0
,Origin,Destination,O_Code,D_code,distance,time
0,Paris,Lyon,1,8,0,0
1,Manchester,Nice,1,10,0,0
2,Orleans,Madrid,1,12,0,0
,Origin,Destination,O_Code,D_code,distance,time
0,Lisbon,Stockholm,1,12,0,0
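As an aside (not part of the original answer), np.array_split can produce the same fixed-size chunks in one line, and resetting each chunk's index also avoids the KeyError:
import numpy as np

chunkSize = 3
citiesChunks = [c.reset_index(drop=True)
                for c in np.array_split(cities, list(range(chunkSize, len(cities), chunkSize)))]
print(citiesChunks[0])  # same 3-row first chunk as above, indexed 0..2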