Attribute error after pickle - python

First I followed an iris tutorial and it worked great! The program ran fine and did everything it was supposed to do. Then I started working on a pickle tutorial to pickle data and then open it again, and everything went crazy. Now I have a __pycache__ folder in my code folder that wasn't there before, and I am getting the following error:
AttributeError: module 'numpy' has no attribute 'dtype'
So far I have tried completely wiping scipy, numpy, sklearn, and pandas from my computer and reinstalling them. Then I tried disabling apport (I'm on an Ubuntu machine) because part of the long error output kept mentioning it.
Below is the program I ran that I think caused this.
# Save Model Using Pickle
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
import pickle
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=test_size, random_state=seed)
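# Fit the model on the training split (67% of the data; 33% is held out)
model = LogisticRegression()
model.fit(X_train, Y_train)
# save the model to disk
filename = 'finalized_model.sav'
with open(filename, 'wb') as f:  # the with block closes the file after writing
    pickle.dump(model, f)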

Upon further investigation I realized I had saved the code as pickle.py (in the same folder where the __pycache__ directory was appearing), so any import of pickle picked up my own script instead of the standard-library module. I renamed it to pickle1.py and now everything works. Lesson learned: don't name scripts after modules.
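The shadowing described above can be checked before it bites. The following is a minimal sketch, not part of the original post: the helper name find_shadowing_scripts is hypothetical, and it assumes Python 3.10 or later, where sys.stdlib_module_names is available (on older versions it falls back to a tiny hand-picked sample).

```python
import sys
from pathlib import Path

# Assumption: sys.stdlib_module_names exists (Python 3.10+).
# The fallback set is a small illustrative sample, not a full list.
STDLIB_NAMES = getattr(sys, "stdlib_module_names", frozenset({"pickle", "random", "json"}))


def find_shadowing_scripts(directory):
    """Return names of .py files in `directory` whose stem matches a stdlib module."""
    return sorted(
        path.name
        for path in Path(directory).glob("*.py")
        if path.stem in STDLIB_NAMES
    )


if __name__ == "__main__":
    # Scan the current working directory, where a stray pickle.py would live.
    for name in find_shadowing_scripts("."):
        print(f"{name} shadows a standard-library module; consider renaming it")
```

Printing pickle.__file__ after import pickle is another quick check: if the path points into your project folder rather than the Python installation, a local file is shadowing the module.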

My guess is that your numpy installation somehow got stepped on. Maybe try "pip install --upgrade --force-reinstall numpy" on the command line?
Or maybe a line somewhere uses "numpy.dtype" incorrectly. You'd have to share at least that line of code for anyone to check.
These are just wild guesses without seeing your entire setup.

Related

OSError when loading tokenizer for huggingface model

I am trying to use this huggingface model and have been following the example provided, but I am getting an error when loading the tokenizer:
from transformers import AutoTokenizer
task = 'sentiment'
MODEL = f"cardiffnlp/twitter-roberta-base-{task}"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
OSError: Can't load tokenizer for 'cardiffnlp/twitter-roberta-base-sentiment'. Make sure that:
'cardiffnlp/twitter-roberta-base-sentiment' is a correct model identifier listed on 'https://huggingface.co/models'
or 'cardiffnlp/twitter-roberta-base-sentiment' is the correct path to a directory containing relevant tokenizer files
What I find very weird is that I was able to run my script several times but ran into an error after some time, while I don't recall changing anything in the meantime. Does anyone know what's the solution here?
EDIT: Here is my entire script:
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import TFAutoModelForSequenceClassification
import numpy as np
from scipy.special import softmax
import csv
import urllib.request
task = 'sentiment'
MODEL = f"nlptown/bert-base-multilingual-uncased-{task}"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
labels = ['very_negative', 'negative', 'neutral', 'positive', 'very_positive']
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.save_pretrained(MODEL)
text = "I love you"
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)
print(scores)
The error seems to start happening when I run model.save_pretrained(MODEL), but this might be a coincidence.
I just came across this same issue. It seems like a bug with model.save_pretrained(), as you noted.
I was able to resolve it by deleting the directory where the model had been saved (cardiffnlp/) and running again without model.save_pretrained().
I'm not sure what your application is. For me, re-downloading the model each time takes ~5s, and that is acceptable.
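A possible explanation, offered as an assumption rather than something confirmed in this thread: from_pretrained treats its argument as a local path when a directory of that name exists, and model.save_pretrained(MODEL) creates exactly such a directory, containing model files but no tokenizer files. The sketch below uses only the standard library to detect that collision; local_dir_shadows_model_id is a hypothetical helper, not part of transformers.

```python
from pathlib import Path


def local_dir_shadows_model_id(model_id, base_dir="."):
    """Return True if a local directory has the same relative path as the hub id.

    Assumption: when such a directory exists, the loader reads from it
    instead of downloading from the hub.
    """
    return (Path(base_dir) / model_id).is_dir()


if __name__ == "__main__":
    model_id = "cardiffnlp/twitter-roberta-base-sentiment"
    if local_dir_shadows_model_id(model_id):
        print(f"A local folder named {model_id!r} exists and may be loaded instead of the hub model")
```

If that is the cause, saving to a differently named folder (for example a hypothetical ./saved_model) and calling tokenizer.save_pretrained on the same folder should keep the hub identifier usable and make the local copy loadable.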

How to solve NameError: name 'n' is not defined in train_test_split of scikit-learn 0.22 without downgrading the version?

I am doing sentiment analysis and using scikit-learn's train_test_split function, but I am getting NameError: name 'n' is not defined even though I have defined it. After checking various forums, I found claims that this error appears in newer versions of scikit-learn (after 0.19). The suggested solution is to downgrade scikit-learn to 0.19. My problem is that I am working with Python 3.7, Anaconda3, and Jupyter Notebook 6.0.3, and I cannot downgrade to the older version.
What should I do? How to solve this issue?
def postprocess(data, n=1000000):
    data = data.head(n)
    # progress_map is a variant of the map function plus a progress bar. Handy to monitor DataFrame creations.
    data['tokens'] = data['Articles'].progress_map(tokenize)
    data = data[data.tokens != 'NC']
    data.reset_index(inplace=True)
    data.drop('index', inplace=True, axis=1)
    return data

data = postprocess(data)
x_train, x_test, y_train, y_test = train_test_split(np.array(data.head(n).tokens),
                                                    np.array(data.head(n).Sentiment), test_size=0.2)
Error:
NameError                                 Traceback (most recent call last)
----> 1 x_train, x_test, y_train, y_test = train_test_split(np.array(data.head(n).tokens),
      2 np.array(data.head(n).Sentiment), test_size=0.2)
NameError: name 'n' is not defined
Thanks in Advance.
You don't seem to define n anywhere outside your postprocess function, where it exists only as a parameter. It also sounds very unlikely that such an error is due to a scikit-learn bug in recent versions (when claiming something like that, you should always include the results of your own research).
In any case, this will most probably work (provided that there are no other issues with your code & data):
n = 1000000
data = postprocess(data, n=n)
x_train, x_test, y_train, y_test = train_test_split(np.array(data.head(n).tokens),
                                                    np.array(data.head(n).Sentiment), test_size=0.2)
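The scoping rule behind this answer can be shown without pandas or scikit-learn. In this minimal sketch, take_head, lookup_outside, and limit are hypothetical names standing in for the question's postprocess and n: a parameter with a default value exists only inside its function, so code outside needs its own variable.

```python
def take_head(items, n=3):
    """Return the first n items; n is local to this function."""
    return items[:n]


def lookup_outside():
    """Try to read n from outside take_head and report what happens."""
    try:
        return n  # no variable called n exists in this scope
    except NameError as exc:
        return str(exc)


data = list(range(10))
head_default = take_head(data)   # uses the default n=3
message = lookup_outside()       # the default did not create an outer n

# Define the limit outside and pass it in, so both places share one value.
limit = 5
head_explicit = take_head(data, n=limit)
also_limited = data[:limit]
```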

Huge problem to display decision tree in Jupyter Notebook in Python: ExecutableNotFound?

I have a problem creating and displaying a decision tree in Jupyter Notebook using Python.
My code is as below:
import graphviz
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz

X = data.drop(["Risk"], axis=1)
y = data["Risk"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
klasyfikator = DecisionTreeClassifier(criterion="gini", random_state=0, max_depth=4, min_samples_leaf=1)
klasyfikator.fit(X=X, y=y)
data = export_graphviz(klasyfikator, out_file=None, feature_names=X.columns, class_names=["0", "1"],
                       filled=True, rounded=True,
                       special_characters=True)
graph = graphviz.Source(data)
graph
This decision tree concerns credit risk research: 0 - will not pay, 1 - will pay.
When I use the code above, I get an error like this:
ExecutableNotFound: failed to execute ['dot', '-Tsvg'], make sure the Graphviz executables are on your systems' PATH
I have already tried many solutions from StackOverflow, for example:
pip install graphviz
conda install graphviz
I downloaded Graphviz from http://www.graphviz.org/download/
I added this to the PATH environment variable:
C:\Program Files (x86)\Graphviz2.38\bin
The error described above still occurs. What can I do? Please help, because I'm losing hope of being able to draw this tree. Thank you!
Moreover, when I added it to PATH using this code:
import os
os.environ["PATH"] += os.pathsep + 'C:\Program Files (x86)\Graphviz2.38\bin'
I get something like this in PATH: C:\\Program Files (x86)\\Graphviz2.38\x08in. It is not the same path. What can I do?
With recent versions of sklearn (0.21 and later), you can plot the decision tree directly without graphviz.
Use:
from sklearn.tree import plot_tree
plot_tree(klasyfikator)
Read more in the scikit-learn documentation for sklearn.tree.plot_tree.
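On the PATH side of the question: the \x08 in the printed value comes from Python reading \b inside an ordinary string literal as a backspace character. Below is a minimal sketch of the difference, using a shortened path fragment so that only the \b escape is involved. append_to_path is a hypothetical helper, and whether a corrected PATH alone resolves the Graphviz error is not confirmed here.

```python
import os

broken = 'Graphviz2.38\bin'   # "\b" becomes a single backspace character (\x08)
fixed = r'Graphviz2.38\bin'   # raw string: the backslash is kept literally


def append_to_path(directory, env):
    """Append `directory` to the PATH entry of the mapping `env`."""
    current = env.get("PATH", "")
    env["PATH"] = current + os.pathsep + directory if current else directory
    return env["PATH"]


if __name__ == "__main__":
    # The raw string keeps every backslash of the directory from the question intact.
    append_to_path(r'C:\Program Files (x86)\Graphviz2.38\bin', os.environ)
    print(os.environ["PATH"])
```

Doubling each backslash or using forward slashes in the literal avoids the escape in the same way as the raw string.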

Cannot find reference to Python package (plt.cm.py)

I have a small issue with running code from a tutorial that isn't working as it should. It's not a syntax problem for sure. I'm working with scikit-learn and matplotlib, and I'm getting a warning message in my IDE "Cannot find reference 'gray_r' in 'cm.py'..." All my packages are installed properly (via pip) and have worked for sample programs except this.
Any advice?
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
digits = datasets.load_digits()
print(digits.data)
print(digits.target)
print(digits.images[0])
clf = svm.SVC(gamma=0.001, C=100)
print(len(digits.data))
x, y = digits.data[:-1], digits.target[:-1]
clf.fit(x,y)
print('Prediction:', clf.predict(digits.data[-1])
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
Well, for starters you're missing a closing parenthesis on your last print statement: print('Prediction:', clf.predict(digits.data[-1])). Other than that, this code runs on my computer with only a deprecation warning. What does the traceback say?
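As for the IDE warning itself, one plausible cause (an assumption, not something established in this thread) is that matplotlib attaches its colormaps to the cm module at import time rather than writing them out as ordinary definitions, so a static inspection of cm.py cannot see gray_r even though the attribute exists at run time. The toy module below, built with the standard library only, imitates that pattern; toy_cm and its contents are hypothetical.

```python
import types

# Build a module object and attach attributes at run time,
# the way a registry might be exposed on a real module.
toy_cm = types.ModuleType("toy_cm")
for name in ("gray", "gray_r"):
    setattr(toy_cm, name, f"<colormap {name}>")

# Attribute access works at run time although no source line defines gray_r.
selected = toy_cm.gray_r
# Lookup by name gives the same object without a statically resolvable attribute.
selected_by_name = getattr(toy_cm, "gray_r")
```

In the question's code, passing the colormap by name, cmap='gray_r', should give the same image without the attribute access the IDE complains about.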

Python error in SVM classifier.predict()

I am getting the following error when I classify new data with the following command in Python:
classifier.predict(new_data)
AttributeError: python 'SVC' object has no attribute _dual_coef_
On my laptop, though, the command works fine! What's wrong?
I had this exact error
AttributeError: python 'SVC' object has no attribute _dual_coef_
with a model trained using scikit-learn version 0.15.2, when I tried to run it in scikit-learn version 0.16.1. I solved it by retraining the model in the latest scikit-learn 0.16.1.
Make sure you are loading the right version of the package.
Have you loaded the model that you are trying to predict with?
If so, it may be a version conflict; try retraining the model using the same sklearn version.
You can see a similar problem here: Sklearn error: 'SVR' object has no attribute '_impl'
I had the same problem. I use sklearn version 0.23.02, but I was trying to run a model file trained with a version 0.18..., and my error said: "'SVC' object has no attribute 'break_ties'". I retrained the model with my version, which fixed the problem: I generated a new svc.pickle with version 0.23.02 and replaced the old one.
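One way to catch this kind of mismatch early is to store the library version next to the pickled object and compare it on load. This is a sketch under assumptions: save_with_version and load_checked are hypothetical helpers, and the version string is passed in by the caller (with scikit-learn it would typically be sklearn.__version__).

```python
import pickle


def save_with_version(obj, path, library_version):
    """Pickle `obj` together with the version of the library that produced it."""
    with open(path, "wb") as f:
        pickle.dump({"version": library_version, "object": obj}, f)


def load_checked(path, library_version):
    """Load the object, refusing if it was saved under a different version."""
    with open(path, "rb") as f:
        payload = pickle.load(f)
    if payload["version"] != library_version:
        raise RuntimeError(
            f"saved with version {payload['version']}, "
            f"running {library_version}; retrain or install the matching version"
        )
    return payload["object"]
```

A failed check then points at the version difference directly, instead of surfacing later as a missing attribute during predict.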
"""
X = X_train
y = y_train
"""
X = X_test
y = y_test
# Instantiate and train the classifier
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X, y)
# Check the results using metrics
from sklearn import metrics
y_pred = clf.predict(X)
print(metrics.confusion_matrix(y, y_pred))  # confusion_matrix expects (y_true, y_pred)
