I am trying to use pickle to save and load my ML models, but I get an error. Here is a simplified version of my code to save my model:
import pickle

def test(x, y):
    return x + y

filename = 'test.pkl'
pickle.dump(test, open(filename, 'wb'))
I can load the pickle file from the same notebook that created it, but if I close the notebook and try to load the pickle in a new one with the code below:
import pickle
filename = 'test.pkl'
loaded_model = pickle.load(open(filename, 'rb'))
I get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[2], line 2
1 filename = 'test.pkl'
----> 2 loaded_model = pickle.load(open(filename, 'rb'))
AttributeError: Can't get attribute 'test' on <module '__main__'>
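For what it's worth, pickle serializes a function by reference (its module and qualified name), not by value: test.pkl contains a pointer to __main__.test, not the function body. A minimal sketch of a loading notebook that works, assuming test.pkl was written as above:

import pickle

# pickle looks the function up by name at load time, so 'test' must be
# defined in __main__ before pickle.load runs.
def test(x, y):
    return x + y

filename = 'test.pkl'
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)

print(loaded_model(1, 2))  # prints 3

Alternatively, define test in a regular module (say, a hypothetical mymodel.py) and import it in both notebooks; pickle can then resolve the reference in any session that can import that module.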
I created a RandomForest model with PySpark.
I need to save this model as a file with a .pkl extension, and for this I used the pickle library, but when I try to use it I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-76-bf32d5617a63> in <module>()
2
3 filename = "drive/My Drive/Progetto BigData/APPOGGIO/Modelli/SVM/svm_sentiment_analysis"
----> 4 pickle.dump(model, open(filename, "wb"))
TypeError: can't pickle _thread.RLock objects
Is it possible to use pickle with a PySpark model like RandomForest, or can it only be used with a scikit-learn model?
This is my code:
from pyspark.ml.classification import RandomForestClassifier
rf = RandomForestClassifier(labelCol = "label", featuresCol = "word2vect", weightCol = "classWeigth", seed = 0, maxDepth=10, numTrees=100, impurity="gini")
model = rf.fit(train_df)
# Save our model into a file with the help of pickle library
filename = "drive/My Drive/Progetto BigData/APPOGGIO/Modelli/SVM/svm_sentiment_analysis"
pickle.dump(model, open(filename, "wb"))
My environment is Google Colab.
I need to turn the model into a pickle file to build a web app; to save a model I would normally use the .save(path) method, but in this case .save is not what I need.
Is it possible that a PySpark model cannot be turned into a pickle file?
Thanks in advance!
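As far as I understand, the TypeError happens because a fitted Spark ML model wraps a JVM object, which holds thread locks such as _thread.RLock, and plain pickle cannot serialize that; it is a limitation of what the object contains, not of pickle versus scikit-learn as such. A sketch using Spark's own persistence instead, reusing the model and path from the question:

from pyspark.ml.classification import RandomForestClassificationModel

# Spark ML models are saved with their own writer, not pickle.
filename = "drive/My Drive/Progetto BigData/APPOGGIO/Modelli/SVM/svm_sentiment_analysis"
model.write().overwrite().save(filename)

# Loading it back requires an active SparkSession, including in the webapp.
loaded_model = RandomForestClassificationModel.load(filename)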
I'm trying to import office365.sharepoint.file and I keep getting the error
ModuleNotFoundError: No module named 'office365.sharepoint.file'
I'm stuck because I haven't been able to find out whether the import path has changed.
Any help would be appreciated.
import io

import pandas as pd
from office365.sharepoint.file import File

response = File.open_binary(ctx, relative_url)

# save data to BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)  # set file object to start

# read file into pandas dataframe
df = pd.read_excel(bytes_file_obj)
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-9-ce2948d5390a> in <module>
1 import pandas as pd
2 import io
----> 3 from office365.sharepoint.file import File
4
5 response = File.open_binary(ctx, relative_url)
ModuleNotFoundError: No module named 'office365.sharepoint.file'
I found that you have to use the full module path to import File, not just the old path:
from office365.sharepoint.files.file import File
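Applying that fix, the import contrast looks like this; only the module path changes, and the rest of the snippet stays as in the question:

# the path that raised ModuleNotFoundError above:
# from office365.sharepoint.file import File

# the working path (note 'files', plural):
from office365.sharepoint.files.file import File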
Try installing the o365 library. That should solve your problem.
I'm getting a weird error when running the following code:
import os
import urllib.request
import pandas as pd
data_url = 'URL FROM GOOGLE DRIVE DOWNLOAD'
file_name = 'mask.pkl'
data_dir = os.path.join('tempdata', 'test')
file_path = os.path.join(data_dir, file_name)
gdrive_file(file_path, data_url, data_dir)
x = pd.read_pickle(file_path)
mask = x[0]
Here data_url is a download link from Google Drive, but it's just a .pkl file; any .pkl you use to test this should throw the same error. The gdrive_file() function is defined as follows:
def gdrive_file(file_path, data_url, data_dir):
    if file_path is True:
        pass
    if not os.path.isfile(file_path):
        print('Fetching example data file')
        os.makedirs(data_dir, exist_ok=True)
        return urllib.request.urlretrieve(data_url, file_path)
Everything works great up to the point where I use pandas to read the .pkl file. I get the following error:
In [11]: pd.read_pickle(file_path)
---------------------------------------------------------------------------
UnpicklingError Traceback (most recent call last)
<ipython-input-11-f996be11e8eb> in <module>
----> 1 pd.read_pickle(file_path)
~/anaconda3/envs/reborn/lib/python3.7/site-packages/pandas/io/pickle.py in read_pickle(filepath_or_buffer, compression)
180 # We want to silence any warnings about, e.g. moved modules.
181 warnings.simplefilter("ignore", Warning)
--> 182 return pickle.load(f)
183 except excs_to_catch:
184 # e.g.
UnpicklingError: invalid load key, '<'.
I've used this same code to open other data types; it's just having an issue opening a .pkl file with pd.read_pickle(file_path), and I'm not sure why.
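For what it's worth, the load key '<' is the first character of an HTML document: if data_url is an ordinary Drive share link, urlretrieve saves Drive's HTML page (a confirmation or sign-in page) instead of the pickle itself. A small diagnostic sketch, plus the usual direct-download URL shape (FILE_ID is a placeholder):

# Peek at the first bytes of the downloaded file: recent pickle protocols
# start with b'\x80', while b'<' means an HTML page came back.
with open(file_path, 'rb') as f:
    print(f.read(32))

# A direct-download URL for a Drive file usually looks like this:
data_url = 'https://drive.google.com/uc?export=download&id=FILE_ID'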
Here is my code:
# my process class----------
class Process(object):
    def PrintName(self, name):
        print('Your name is : ', name)

    # pickling-------------
    import pickle
    model = Process()
    filename = 'Process.pkl'
    pickle.dump(model, open(filename, 'wb'))

    # loading the pickle-------------
    model = pickle.load(open('Process.pkl', 'rb'))
When I run the above code in a Jupyter notebook, I get the error AttributeError: 'module' object has no attribute 'Process'.
I'm confused about which line causes the error.
Any help would be appreciated.
In Python, indentation is important.
Everything after your function was still part of the class Process due to an error in your indentation.
You can read more about indentation and coding styles in general for Python here.
I formatted your code according to PEP8 for you, and it should work now:
import pickle


# my process class----------
class Process(object):
    def PrintName(self, name):
        print('Your name is : ', name)


# pickling-------------
model = Process()
filename = 'Process.pkl'
pickle.dump(model, open(filename, 'wb'))

# loading the pickle-------------
model = pickle.load(open('Process.pkl', 'rb'))
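One caveat that survives the reformatting: pickle stores an instance's class by reference, so a fresh notebook that only loads Process.pkl still needs the Process class defined (or importable) before pickle.load is called, for the same reason as the test function earlier on this page. A minimal sketch of such a loading session (the 'Alice' argument is just illustrative):

import pickle

# Define (or import) the class first, then unpickle the instance.
class Process(object):
    def PrintName(self, name):
        print('Your name is : ', name)

model = pickle.load(open('Process.pkl', 'rb'))
model.PrintName('Alice')  # Your name is :  Alice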
I am trying to analyze some tensor data, but I could not read the data in the pickled file using np.load(). My Python code is as follows:
import pickle
import numpy as np
import sktensor as skt
import numpy.random as rn

data = np.ones((10, 8, 3), dtype='int32')  # 3-mode count tensor of size 10 x 8 x 3
##data = skt.dtensor(data)

with open('data.dat', 'w+') as f:  # can be stored as a .dat using pickle
    pickle.dump(data, f)

with open('data.dat', 'r+') as f:  # can be loaded back in using pickle.load
    tmp = pickle.load(f)

assert np.allclose(tmp, data)
But when I attempted to use np.load() to load the data in data.dat as follows:
np.load('G:\data.dat')
this error appears:
Traceback (most recent call last):
File "<pyshell#34>", line 1, in <module>
np.load('D:/GDELT_Tensor/data.dat', mmap_mode = 'r')
File "C:\Python27\lib\site-packages\numpy\lib\npyio.py", line 416, in load
"Failed to interpret file %s as a pickle" % repr(file))
IOError: Failed to interpret file 'D:/data.dat' as a pickle.
Can anyone help me?
Don't use the pickle module to save NumPy arrays. Instead, use one of the methods here: http://docs.scipy.org/doc/numpy/reference/routines.io.html
There's even one that uses pickle under the hood, for example:
np.save('data.npy', data)  # np.save appends .npy when the name doesn't end with it
tmp = np.load('data.npy')
Another format like CSV or HDF5 might be more suitable for most applications, especially where you might want to interoperate with non-Python systems.
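For the CSV route, one wrinkle is that CSV is two-dimensional, so a 10 x 8 x 3 tensor has to be flattened on save and reshaped on load; a minimal sketch:

import numpy as np

data = np.ones((10, 8, 3), dtype='int32')

# CSV is 2-D: fold the trailing modes into columns before saving.
np.savetxt('data.csv', data.reshape(10, -1), fmt='%d', delimiter=',')

# Reshape back to the original tensor shape on load.
tmp = np.loadtxt('data.csv', dtype='int32', delimiter=',').reshape(10, 8, 3)
assert np.allclose(tmp, data)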