I am getting an error when I run the following command:
trainset, testset = train_test_split(t2data, test_size=.15,train_size=0.85)
The dataset contains user ratings, user IDs and product IDs.
Error message:
AttributeError: 'DataFrame' object has no attribute 'raw_ratings'
My dataframe doesn't have any attribute by the name raw_ratings.
This is how I am reading the CSV:
rdata = pd.read_csv('ratings_Electronics.csv', header=0, names=['userid', 'productid', 'rating', 'timestamp'], skipinitialspace=True)
So I don't understand where this error comes from. Any help would be appreciated. Thanks!
Detailed error:
AttributeError Traceback (most recent call last)
in ()
----> 1 trainset, testset = train_test_split(t2data, test_size=.15,train_size=0.85)
2 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
5134 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5135 return self[name]
-> 5136 return object.__getattribute__(self, name)
5137
5138 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'raw_ratings'
You may be using the wrong data type. Most likely you are passing a pandas DataFrame, whereas Surprise expects a Surprise Dataset.
I found this example from NicolasHug helpful: https://github.com/NicolasHug/Surprise/issues/20
The solution worked for me.
You read the CSV into the rdata variable, but you are splitting t2data.
ISL_eventPassdf[ISL_eventPassdf["match_id"].isin([3817897, 3813305])]["match_id"].drop_duplicates()
Series([], Name: match_Id, dtype: int64)
ISL_FINAL_Data = ISL_eventPassdf[ISL_eventPassdf["match_id"].isin([3817897, 3813305])]["match_id"]
ISL_FINAL_Data.pivot_table(values="type.id", index="player.name", columns="pass.recipient.name", aggfunc="count")
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11460/1068599328.py in <module>
----> 1 ISL_FINAL_Data.pivot_table(values="type.id", index="player.name", columns="pass.recipient.name", aggfunc="count")
C:\Python\Python310\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5905 ):
5906 return self[name]
-> 5907 return object.__getattribute__(self, name)
5908
5909 def __setattr__(self, name: str, value) -> None:
AttributeError: 'Series' object has no attribute 'pivot_table'
Please help me fix this.
The error says 'Series' object has no attribute 'pivot_table'.
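A minimal sketch of why this happens, using a small hypothetical events frame that only borrows the column names from the question: selecting ["match_id"] after the row filter returns a Series, which has no pivot_table method, while applying the row filter alone keeps a DataFrame (with all its columns) that does.

```python
import pandas as pd

# Hypothetical sample data using the question's column names
events = pd.DataFrame({
    "match_id": [3817897, 3817897, 3813305, 1],
    "type.id": [30, 30, 30, 30],
    "player.name": ["A", "A", "B", "C"],
    "pass.recipient.name": ["B", "B", "A", "A"],
})

mask = events["match_id"].isin([3817897, 3813305])

# Selecting one column after filtering gives a Series: no pivot_table here.
only_ids = events[mask]["match_id"]
print(type(only_ids))  # <class 'pandas.core.series.Series'>

# Filtering rows only keeps a DataFrame, which does have pivot_table.
subset = events[mask]
table = subset.pivot_table(values="type.id", index="player.name",
                           columns="pass.recipient.name", aggfunc="count")
print(table)
```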
I'm trying to run some Python code on Google Colab. I have some preprocessed data that I need to read in:
train, test, unused_feat, target_features, features, cat_idxs, cat_dims = pickle.load(open('/content/drive/My Drive/xxx/data/train_test.pkl', 'rb'))
Then, when I display train, I get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/IPython/core/formatters.py in __call__(self, obj)
697 type_pprinters=self.type_printers,
698 deferred_pprinters=self.deferred_printers)
--> 699 printer.pretty(obj)
700 printer.flush()
701 return stream.getvalue()
8 frames
pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__get__()
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
5268 or name in self._accessors
5269 ):
-> 5270 return object.__getattribute__(self, name)
5271 else:
5272 if self._info_axis._can_hold_identifiers_and_holds_name(name):
AttributeError: 'DataFrame' object has no attribute '_data'
And when I call train.shape, I get:
---------------------------------------------------------------------------
RecursionError Traceback (most recent call last)
<ipython-input-35-5e6a15ce28a5> in <module>()
----> 1 train.shape
321 frames
pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__get__()
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
5268 or name in self._accessors
5269 ):
-> 5270 return object.__getattribute__(self, name)
5271 else:
5272 if self._info_axis._can_hold_identifiers_and_holds_name(name):
RecursionError: maximum recursion depth exceeded while calling a Python object
When I run this in my own Jupyter notebook, I have no problem. I wonder what went wrong. Also, train contains roughly 250,000 rows.
I think you could raise the recursion limit if that is your constraint; however, you might be in an endless loop.
This post explains how to increase your recursion limit:
"What is the maximum recursion depth in Python, and how to increase it?"
However, I am not sure about the other error you got. Hope it helps a bit.
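For reference, the standard-library calls for reading and raising the limit are sys.getrecursionlimit and sys.setrecursionlimit. The sketch below wraps them in a small context manager (the name recursion_limit is hypothetical, not part of any library) so the limit is restored afterwards. If the recursion is truly endless, a higher limit only delays the error.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the interpreter's recursion limit (never lowers it).

    Hypothetical helper. If the code recurses forever, this only postpones
    the RecursionError; it does not fix the underlying loop.
    """
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(max(old, limit))
    try:
        yield
    finally:
        # Restore the original limit even if the body raised.
        sys.setrecursionlimit(old)


print(sys.getrecursionlimit())
with recursion_limit(sys.getrecursionlimit() + 1000):
    print(sys.getrecursionlimit())
```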
I am learning about Python/pandas methods on a Series. I can display the min and max values, but when I try to display the index labels of the min and max values, I get an error message.
google.min()
49.95
google.max()
782.22
google.idmin()
AttributeError Traceback (most recent call last)
in <module>
----> 1 google.idmin(True)
/opt/anaconda3/envs/pandas_playground/lib/python3.8/site-packages/pandas/core/generic.py in __getattr__(self, name)
5272 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5273 return self[name]
-> 5274 return object.__getattribute__(self, name)
5275
5276 def __setattr__(self, name: str, value) -> None:
AttributeError: 'Series' object has no attribute 'idmin'
After some searching, I found I was simply using the wrong methods.
idxmin and idxmax work just fine.
google.idxmax()
3011
google.idxmin()
11
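A small sketch with a hypothetical price Series (not the question's data) showing the difference between idxmin/idxmax, which return index labels, and argmin/argmax, which return integer positions. A non-zero-based index makes the two easy to tell apart.

```python
import pandas as pd

# Hypothetical prices; the index labels deliberately start at 100.
google = pd.Series([310.5, 49.95, 782.22, 500.0], index=[100, 101, 102, 103])

print(google.min(), google.max())        # 49.95 782.22
print(google.idxmin(), google.idxmax())  # labels: 101 102
print(google.argmin(), google.argmax())  # positions: 1 2
```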
I've trained an XGBoost classifier for binary classification. While training the model on the training data using CV and predicting on the test data, I get the error AttributeError: 'DataFrame' object has no attribute 'feature_names'.
My code is as follows:
folds = StratifiedKFold(n_splits=5, shuffle=False, random_state=44000)
oof = np.zeros(len(X_train))
predictions = np.zeros(len(X_test))
for fold_, (trn_idx, val_idx) in enumerate(folds.split(X_train, y_train)):
    print("Fold {}".format(fold_+1))
    trn_data = xgb.DMatrix(X_train.iloc[trn_idx], y_train.iloc[trn_idx])
    val_data = xgb.DMatrix(X_train.iloc[val_idx], y_train.iloc[val_idx])
    clf = xgb.train(params = best_params,
                    dtrain = trn_data,
                    num_boost_round = 2000,
                    evals = [(trn_data, 'train'), (val_data, 'valid')],
                    maximize = False,
                    early_stopping_rounds = 100,
                    verbose_eval=100)
    oof[val_idx] = clf.predict(X_train.iloc[val_idx], ntree_limit=clf.best_ntree_limit)
    predictions += clf.predict(X_test, ntree_limit=clf.best_ntree_limit)/folds.n_splits
How can I fix this?
Here is the complete error trace:
Fold 1
[0] train-auc:0.919667 valid-auc:0.822968
Multiple eval metrics have been passed: 'valid-auc' will be used for early stopping.
Will train until valid-auc hasn't improved in 100 rounds.
[100] train-auc:1 valid-auc:0.974659
[200] train-auc:1 valid-auc:0.97668
[300] train-auc:1 valid-auc:0.977696
[400] train-auc:1 valid-auc:0.977704
Stopping. Best iteration:
[376] train-auc:1 valid-auc:0.977862
Exception ignored in: <bound method DMatrix.__del__ of <xgboost.core.DMatrix object at 0x7f3d9c285550>>
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/xgboost/core.py", line 368, in __del__
if self.handle is not None:
AttributeError: 'DMatrix' object has no attribute 'handle'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-55-d52b20cc0183> in <module>()
19 verbose_eval=100)
20
---> 21 oof[val_idx] = clf.predict(X_train.iloc[val_idx], ntree_limit=clf.best_ntree_limit)
22
23 predictions += clf.predict(X_test, ntree_limit=clf.best_ntree_limit)/folds.n_splits
/usr/local/lib/python3.6/dist-packages/xgboost/core.py in predict(self, data, output_margin, ntree_limit, pred_leaf, pred_contribs, approx_contribs)
1042 option_mask |= 0x08
1043
-> 1044 self._validate_features(data)
1045
1046 length = c_bst_ulong()
/usr/local/lib/python3.6/dist-packages/xgboost/core.py in _validate_features(self, data)
1271 else:
1272 # Booster can't accept data with different feature names
-> 1273 if self.feature_names != data.feature_names:
1274 dat_missing = set(self.feature_names) - set(data.feature_names)
1275 my_missing = set(data.feature_names) - set(self.feature_names)
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
3612 if name in self._info_axis:
3613 return self[name]
-> 3614 return object.__getattribute__(self, name)
3615
3616 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'feature_names'
The problem has been solved. I hadn't converted X_train.iloc[val_idx] to an xgb.DMatrix. After converting X_train.iloc[val_idx] and X_test to xgb.DMatrix, the problem was gone!
I updated the following two lines:
oof[val_idx] = clf.predict(xgb.DMatrix(X_train.iloc[val_idx]), ntree_limit=clf.best_ntree_limit)
predictions += clf.predict(xgb.DMatrix(X_test), ntree_limit=clf.best_ntree_limit)/folds.n_splits
I am new to Python and am following this tutorial: https://www.hackerearth.com/practice/machine-learning/machine-learning-projects/python-project/tutorial/
This is the dataframe miss:
miss = train.isnull().sum()/len(train)
miss = miss[miss>0]
miss.sort_values(inplace = True)
miss
Electrical 0.000685
MasVnrType 0.005479
MasVnrArea 0.005479
BsmtQual 0.025342
BsmtCond 0.025342
BsmtFinType1 0.025342
BsmtExposure 0.026027
BsmtFinType2 0.026027
GarageCond 0.055479
GarageQual 0.055479
GarageFinish 0.055479
GarageType 0.055479
GarageYrBlt 0.055479
LotFrontage 0.177397
FireplaceQu 0.472603
Fence 0.807534
Alley 0.937671
MiscFeature 0.963014
PoolQC 0.995205
dtype: float64
Now I just want to visualize those missing values:
#visualising missing values
miss = miss.to_frame()
miss.columns = ['count']
miss.index.names = ['Name']
miss['Name'] = miss.index
And this is the error I got:
AttributeError Traceback (most recent call last)
<ipython-input-42-cd3b25e8862a> in <module>()
1 #visualising missing values
----> 2 miss = miss.to_frame()
C:\Users\Username\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
2742 if name in self._info_axis:
2743 return self[name]
-> 2744 return object.__getattribute__(self, name)
2745
2746 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'to_frame'
What am I missing here?
Check print(type(miss)); it should be <class 'pandas.core.series.Series'>.
What you have is a DataFrame, so something is going wrong somewhere in your code.
df = pd.DataFrame()
df.to_frame()
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Users\UR_NAME\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'to_frame'
I traced the tutorial, and this is the order of operations:
train = pd.read_csv("train.csv")
print(type(train)) # <class 'pandas.core.frame.DataFrame'>
miss = train.isnull().sum()/len(train)
print(type(miss)) # <class 'pandas.core.series.Series'>
miss = train.isnull().sum()/len(train) converts it from a pandas.core.frame.DataFrame into a pandas.core.series.Series.
You probably messed up the code at this point.
If you are using a notebook, the first time the current cell runs, miss is converted to a DataFrame and the output displays correctly. If you run the cell again, you get the error because miss is already a DataFrame. To fix the problem, re-run the previous cell and then run the current cell. This is simply how notebooks work: variables keep their values between cell executions.
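One way to make the visualization cell safe to re-run is to store the converted frame under a new name instead of overwriting miss. A minimal sketch, with a small hypothetical train frame in place of the tutorial's train.csv; the name miss_df is an illustrative choice.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's train.csv
train = pd.DataFrame({
    "a": [1, None, 3, 4],
    "b": [None, None, 3, 4],
    "c": [1, 2, 3, 4],
})

miss = train.isnull().sum() / len(train)  # Series: share of missing values
miss = miss[miss > 0]
miss.sort_values(inplace=True)

# Write the DataFrame to a new variable so that re-running this cell
# still finds miss as a Series and to_frame() keeps working.
miss_df = miss.to_frame()
miss_df.columns = ["count"]
miss_df.index.names = ["Name"]
miss_df["Name"] = miss_df.index
print(miss_df)
```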