Azure blob storage model access for gensim in Python

I am trying to load my model files using the code below:
import gensim
import os
from azure.storage.blob import BlobServiceClient
from smart_open import open
azure_storage_connection_string = "DefaultEndpointsProtocol=https;AccountName=lnipcfdevlanding;AccountKey=xxxxxxxxx"
client = BlobServiceClient.from_connection_string(azure_storage_connection_string)
file_prefix="azure://landing/TechnologyCluster/VectorCreation/embeddings/"
fin = open(file_prefix+"word2vec.Tobacco.fasttext.model", transport_params=dict(client=client))
clustering.embedding = gensim.models.Word2Vec.load(fin)
But it fails with the error below:
AttributeError: '_io.TextIOWrapper' object has no attribute 'endswith'
I assume the way I am passing the file to gensim.models.Word2Vec.load is not the right way. I could not find any good example of how to pass a filename that lives on Azure Blob Storage; if I give the complete URI, it does not work. What is the right way to achieve this?

Please check:
An AttributeError usually occurs when save() or load() is called on an object instance instead of on the class, since load() is a class method.
Please note that the information stored in a word2vec-format file is incomplete (the binary tree is missing), so while you can query such a model for word similarity, you cannot continue training a model loaded this way.
Check whether the file was saved in word2vec format; its binary parameter is a bool which, if True, indicates the data is in binary word2vec format (note from: https://radimrehurek.com/gensim/models/keyedvectors.html).
Check whether this can be worked around by importing the required models and checking version compatibility:
gensim.models.Word2Vec.load_word2vec_format('model', binary=True)
or with KeyedVectors.load in place of Word2Vec.load_word2vec_format, according to which one supports fastText.
Also check whether your model version correctly supports the function being called.
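As for the immediate error: Word2Vec.load() expects a filename, not an open file handle (it calls endswith on its argument, which is why an _io.TextIOWrapper raises AttributeError). A minimal sketch of one workaround, assuming the container and blob names from the question and that the model was saved to a single file with model.save() (models with large vectors also write .npy sidecar files, which would need downloading too):

import os
import tempfile
import gensim
from azure.storage.blob import BlobServiceClient

azure_storage_connection_string = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."
client = BlobServiceClient.from_connection_string(azure_storage_connection_string)
container = client.get_container_client("landing")

# Download the blob to a local file, then hand gensim a plain path.
blob_name = "TechnologyCluster/VectorCreation/embeddings/word2vec.Tobacco.fasttext.model"
local_path = os.path.join(tempfile.mkdtemp(), "word2vec.Tobacco.fasttext.model")
with open(local_path, "wb") as f:
    f.write(container.download_blob(blob_name).readall())

model = gensim.models.Word2Vec.load(local_path)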
References:
python - AttributeError: 'Word2Vec' object has no attribute 'endswith' - Stack Overflow
models.keyedvectors – Store and query word vectors — gensim (radimrehurek.com)

Related

How do I visualize and modify the content of a pickled function?

I need to use a pickled data processing function that was not written by me and I therefore do not know its content/structure. When I load it, a ModuleNotFoundError occurs:
ModuleNotFoundError: No module named 'sklearn.preprocessing.label'
I assume that the error occurs because the pickled object is trying to import a module named 'sklearn.preprocessing.label', which doesn't exist. I have tried to downgrade my sklearn version, but that didn't work either.
If I knew what the pickled object was doing, I could simply write my own function to replace the function within the pickled object. In order to do that, I would have to visualize the function contained within the pickled object, or remove the import of sklearn.preprocessing.label.
sklearn.preprocessing.label was available in scikit-learn version 0.21 and below: https://github.com/scikit-learn/scikit-learn/tree/0.21.X/sklearn/preprocessing
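Two hedged sketches building on that (the filename processing_function.pkl is hypothetical). First, pickletools can list which modules the pickle will try to import, without executing anything; second, since scikit-learn 0.22 moved the module to sklearn.preprocessing._label, aliasing the old name in sys.modules often lets the pickle load on a newer version.

import sys
import pickle
import pickletools

# 1) Inspect the pickle without loading it: the GLOBAL/STACK_GLOBAL
#    opcodes name every module.class the unpickler will import.
with open('processing_function.pkl', 'rb') as f:
    pickletools.dis(f.read())

# 2) Alias the renamed module so the import inside pickle.load succeeds
#    (scikit-learn >= 0.22 renamed sklearn.preprocessing.label to _label).
import sklearn.preprocessing._label
sys.modules['sklearn.preprocessing.label'] = sklearn.preprocessing._label

with open('processing_function.pkl', 'rb') as f:
    func = pickle.load(f)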

Subclass of PyTorch dataset class cannot find dataset files

I'm trying to create a subclass of the PyTorch MNIST dataset class, which I call CustomMNISTDataset, as follows:
import torchvision.datasets as datasets
class CustomMNISTDataset(datasets.MNIST):
    def __init__(self, root='/home/psando'):
        super().__init__(root=root,
                         download=False)
but when I execute:
dataset = CustomMNISTDataset()
it fails with error: "RuntimeError: Dataset not found. You can use download=True to download it".
However, when I run the following in the same file:
dataset = datasets.MNIST(root='/home/psando', download=False)
print(len(dataset))
it succeeds and prints "60000", as expected.
Since CustomMNISTDataset subclasses datasets.MNIST why is the behavior different? I've verified that the path '/home/psando' contains the MNIST directory with raw and processed subdirectories (otherwise, explicitly calling the constructor for datasets.MNIST() would have failed). The current behavior implies that the call to super().__init__() within CustomMNISTDataset is not calling the constructor for datasets.MNIST which is very strange!
Other details: I'm using Python 3.6.8 with torch==1.6.0 and
torchvision==0.7.0. Any help would be appreciated!
This requires some source-diving, but your problem is this function. The path to the dataset is dependent on the name of the class, so when you subclass MNIST the root folder changes to /home/psando/CustomMNISTDataset.
So if you rename /home/psando/MNIST to /home/psando/CustomMNISTDataset, it works.
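Alternatively, a hedged sketch of a rename-free fix, based on torchvision 0.7 where MNIST derives its folders from self.__class__.__name__ (the property names below come from that version's source):

import os
import torchvision.datasets as datasets

class CustomMNISTDataset(datasets.MNIST):
    def __init__(self, root='/home/psando'):
        super().__init__(root=root, download=False)

    # Pin the lookup folders back to "MNIST" so the class-name-derived
    # defaults (root/CustomMNISTDataset/raw, .../processed) are not used.
    @property
    def raw_folder(self):
        return os.path.join(self.root, 'MNIST', 'raw')

    @property
    def processed_folder(self):
        return os.path.join(self.root, 'MNIST', 'processed')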

unpickle instance after refactoring module name

Previously I defined an ElectrodePositionsModel class in the module gselu.py in the package gselu, and pickled the ElectrodePositionsModel objects into some files.
Some time later it was decided to refactor the project, and the package name gselu was changed to ielu.
When I attempt to unpickle the old pickle files with pickle.load(), the process fails with the error 'module' object has no attribute 'ElectrodePositionsModel'. What I understand of the Unpickler's behavior is that the pickle thinks it has stored an instance of gselu.gselu.ElectrodePositionsModel, and therefore tries to import this class from that module. When it doesn't exist, it gives up.
I think that I am supposed to add something to the package's __init__.py to tell it where gselu.gselu.ElectrodePositionsModel is, but I can't get the pickle.load() function to give me any error message other than 'module' has no attribute 'ElectrodePositionsModel', and I can't figure out where I am supposed to provide the correct path. The code that does the unpickling is in the same module file (gselu.py) as the ElectrodePositionsModel class.
When I load the pickle file in an ipython session and manually import ElectrodePositionsModel, it loads correctly.
How do I tell the pickler where to load this module?
I realise this question is old, but I just ran into a similar problem.
What I did to solve it was to take the old code and unpickle the data using that.
Then, instead of pickling the custom class instances directly, I pickled the instance's __dict__, which contained only raw Python.
This data could then easily be imported into the new module by doing:
a = NewNameOfCustomClass()
with open('olddata.p', 'rb') as f:
    a.__dict__ = pickle.load(f)
This method works if your custom class only has standard variables (such as builtins or numpy arrays, etc).
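A hedged sketch of another common fix, assuming the refactored package still exposes the module as ielu.gselu (names taken from the question): register the old module path as an alias in sys.modules before unpickling, so the unpickler's import of gselu.gselu.ElectrodePositionsModel resolves against the renamed package.

import sys
import pickle
import ielu
import ielu.gselu  # the module that now defines ElectrodePositionsModel

# Alias the pre-refactor names stored in the pickle so its import succeeds.
sys.modules['gselu'] = ielu
sys.modules['gselu.gselu'] = ielu.gselu

with open('olddata.p', 'rb') as f:
    model = pickle.load(f)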

pickling and unpickling user-defined class

I have a user-defined class 'myclass' that I store on file with the pickle module, but I am having problems unpickling it. I have about 20 distinct instances of the same structure that I save in distinct files. When I read each file, the code works on some files and not on others, where I get the error:
'module' object has no attribute 'myclass'
I have generated some files today and some other yesterday, and my code only works on the files generated today (I have NOT changed class definition between yesterday and today).
I was wondering if my method is not robust, if I am not doing things as I should, for example if user-defined classes cannot be pickled, and if this is introducing some randomness into the process.
Another issue could be that the files I generated yesterday were generated on a different machine: because I work on an academic cluster, I have some login nodes and some computing nodes that differ in architecture. So I generated yesterday's files on the computing nodes and today's files on the login nodes, and I am reading everything on the login nodes.
As suggested in some of the comments, I installed dill and loaded it with import dill as pickle. Now I can read the files from the computing nodes on the login nodes of the same cluster. But if I try to read files generated on the computing node of one cluster from the login node of another cluster, I cannot: I get KeyError: 'ClassType' in _load_type(name) in dill.py.
Can it be because the Python versions are different? I generated the files with Python 2.7 and I read them with Python 3.3.
EDIT:
I can read the pickled files if I use Python 2.7 everywhere. Sadly, part of my code, written in Python 3, is not automatically compatible with Python 2.7 :(
Can you from mymodule import myclass? Pickling does not pickle the class, just a reference to it. To load a pickled object, Python must be able to find the class that was used to create the object.
e.g.
import pickle

class A(object):
    pass

obj = A()
pickled = pickle.dumps(obj)

_A = A; del A  # hide class
try:
    pickle.loads(pickled)
except AttributeError as e:
    print(e)

A = _A  # unhide class
print(pickle.loads(pickled))
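On the Python 2.7 vs. 3.3 part of the question, a minimal sketch of the two knobs that usually matter for cross-version pickles (the filename is hypothetical): Python 2 only understands pickle protocols up to 2, and Python 3 has to be told how to decode Python 2 byte strings.

import pickle

obj = {'example': 1}

# When writing from Python 3 for a Python 2.7 reader, stick to protocol 2.
with open('data.p', 'wb') as f:
    pickle.dump(obj, f, protocol=2)

# When reading a Python 2.7 pickle under Python 3, decode old byte
# strings as latin1 (this also tends to be the safe choice for numpy data).
with open('data.p', 'rb') as f:
    obj = pickle.load(f, encoding='latin1')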

gdata AttributeError: 'ContactEntry' object has no attribute 'name'

Using GData Python libraries, version 2.0.18
Attempting to retrieve the contact list using the Service approach (not Client, like the sample app does). It appears that the return is mapped to a ContactEntry (good), but it gives an error when I try to access the name attribute:
AttributeError: 'ContactEntry' object has no attribute 'name'
from gdata.contacts.service import ContactsService
(...)
self.client = ContactsService(source='appname', additional_headers=additional_headers)
feed = self.client.GetContactsFeed(uri=query.ToUri())
self.client is a gdata.contacts.service
GetContactsFeed uses
def GetContactsFeed(self, uri=None):
    uri = uri or self.GetFeedUri()
    return self.Get(uri, converter=gdata.contacts.ContactsFeedFromString)
The sample code uses desired_class=gdata.contacts.data.ContactsFeed
Seems like there should be a name attribute.
Is my syntax wrong?
OK, here is the issue for the Python contacts sample vs. my implementation:
gdata/sample/contacts/contacts_example.py uses gdata.contacts.Client, which (through a long call chain) has the atom classes use desired_class=gdata.contacts.data.ContactsFeed.
The service, as pointed out in the question, uses converter=gdata.contacts.ContactsFeedFromString.
This converter comes from the package initializer, src/gdata/contacts/__init__.py, as do the class definitions. Obviously at this point you know what's coming -- the classes for the XML in the initializer do not match the ones in the data file.
I added the missing/incorrect classes to the initializer and things worked as expected. Alternatively, changing the code to use desired_class would do it too (at some point you'd have to map it to a converter, which isn't supported directly in service.py), or adding a converter to data.ContactsFeed, etc.
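For reference, a hedged sketch of the Client/desired_class route (gdata 2.0.x names; authentication omitted, so treat it as a skeleton rather than working code):

import gdata.contacts.client
import gdata.contacts.data

gd_client = gdata.contacts.client.ContactsClient(source='appname')
# ... authenticate gd_client here (ClientLogin/OAuth) before querying ...

feed = gd_client.GetContacts(desired_class=gdata.contacts.data.ContactsFeed)
for entry in feed.entry:
    # With the data-module classes, ContactEntry does expose .name.
    if entry.name is not None:
        print(entry.name.full_name.text)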
Hope this helps someone.
