Can I pickle a tensorflow model? - python

Will I be able to pickle all the .meta, .data and checkpoint files of a TensorFlow model? I ask because I want to run predictions with my model, and if I deploy it, the files can't stay on disk, right? I know about TensorFlow Serving, but I don't really understand it. I want to be able to load the TensorFlow files without accessing the drive all the time.

Using pickle is not recommended. Instead, TensorFlow provides a format called the SavedModel format that serves this exact purpose.
See: https://www.tensorflow.org/guide/saved_model
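As a rough sketch of that workflow (the tiny model defined here is only an illustration, not part of the question):

import tensorflow as tf

# Build a trivial model purely for illustration.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Export the graph, weights and optimizer state to a single directory
# (SavedModel format is the default in TF 2.x when no .h5 extension is given).
model.save("exported_model")

# Later, e.g. in the serving process, restore the whole model in one call.
restored = tf.keras.models.load_model("exported_model")

The exported directory is self-contained, so it can be shipped to the serving environment instead of pickling the individual .meta/.data/checkpoint files.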

Related

How to download and use the universal sentence encoder instead of loading it from a URL

I am using the universal sentence encoder to find sentence similarity. Below is the code I use to load the model:
import tensorflow_hub as hub
model = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")
Here, instead of pointing at the URL on TensorFlow Hub, is there a way to download the model programmatically to a local location and load it from the local filesystem?
Well, I am not sure and I haven't tried it, but I checked the source of hub.load() and found some interesting facts that may help with your problem.
First of all, the docs say:
This function is roughly equivalent to the TF2 function
tf.saved_model.load() on the result of hub.resolve(handle).
Calling this function requires TF 1.14 or newer. It can be called
both in eager and graph mode.
That means the function can handle either a URL or a saved model on a file system. To confirm this, I checked the documentation of hub.resolve(), which is used internally by hub.load(), and there I found something of interest:
def resolve(handle):
  """Resolves a module handle into a path.

  This function works both for plain TF2 SavedModels and the legacy TF1 Hub
  format.

  Resolves a module handle into a path by downloading and caching in
  location specified by TF_HUB_CACHE_DIR if needed.

  Currently, three types of module handles are supported:
    1) Smart URL resolvers such as tfhub.dev, e.g.:
       https://tfhub.dev/google/nnlm-en-dim128/1.
    2) A directory on a file system supported by Tensorflow containing module
       files. This may include a local directory (e.g. /usr/local/mymodule) or a
       Google Cloud Storage bucket (gs://mymodule).
    3) A URL pointing to a TGZ archive of a module, e.g.
       https://example.com/mymodule.tar.gz.

  Args:
    handle: (string) the Module handle to resolve.

  Returns:
    A string representing the Module path.
  """
  return registry.resolver(handle)
The documentation clearly says it supports a path on the local file system pointing to the module/model files, so you can run a few experiments and give it a try. For more details, have a look at the source file.
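Building on that, a minimal sketch of loading from a local copy might look like this. The download URL suffix (?tf-hub-format=compressed) and the local paths are assumptions for illustration; the point confirmed by the docstring above is simply that hub.load() accepts a plain directory path:

import os
import tarfile
import urllib.request
import tensorflow_hub as hub

# Assumed compressed-module URL and local paths (illustrative only).
url = "https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3?tf-hub-format=compressed"
local_dir = "use_multilingual_large_3"

os.makedirs(local_dir, exist_ok=True)
archive_path, _ = urllib.request.urlretrieve(url, "use_model.tar.gz")
with tarfile.open(archive_path) as tar:
    tar.extractall(local_dir)

# hub.load() resolves a local directory just like a tfhub.dev URL.
model = hub.load(local_dir)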

Loading a FastText Model from s3 without Saving Locally

I am looking to use a FastText model in an ML pipeline that I built; the model is saved as a .bin file on S3. My hope is to keep all of this in a cloud-based pipeline, so I don't want local files. I feel like I am really close, but I can't figure out how to create a temporary .bin file. I am also not sure if I am saving and reading the FastText model in the most efficient way. The code below works, but it saves the file locally, which I want to avoid.
import smart_open
file = smart_open.smart_open(s3 location of .bin model)
listed = b''.join([i for i in file])
with open("ml_model.bin", "wb") as binary_file:
    binary_file.write(listed)
model = fasttext.load_model("ml_model.bin")
If you want to use the fasttext wrapper for the official Facebook FastText code, you may need to create a local temporary copy - your troubles make it seem like that code relies on opening a local file path.
You could also try the Gensim package's separate FastText support, which should accept an S3 path via its load_facebook_model() function:
https://radimrehurek.com/gensim/models/fasttext.html#gensim.models.fasttext.load_facebook_model
(Note, though, that Gensim doesn't support all FastText functionality, like the supervised mode.)
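For reference, a hedged sketch of that Gensim route, assuming the gensim package and smart_open's S3 dependencies are installed (the bucket and key below are placeholders):

from gensim.models.fasttext import load_facebook_model

# Gensim opens files through smart_open, so an s3:// URI may work directly
# without any local copy (bucket/key are placeholders).
model = load_facebook_model("s3://my-bucket/ml_model.bin")
print(model.wv.most_similar("example"))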
As partially answered by the above response, a temporary file was needed. On top of that, the temporary file path needed to be passed as a string, which is a bit strange. Working code below:
import tempfile
import fasttext
import smart_open
from pathlib import Path
file = smart_open.smart_open(f's3://{bucket_name}/{key}')
listed = b''.join([i for i in file])
with tempfile.TemporaryDirectory() as tdir:
    tfile = Path(tdir).joinpath('tempfile.bin')
    tfile.write_bytes(listed)
    model = fasttext.load_model(str(tfile))

Using pipelines with a local model

I am trying to use a simple pipeline offline. I am only allowed to download files directly from the web.
I went to https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english/tree/main and downloaded all the files in a local folder C:\\Users\\me\\mymodel
However, when I try to load the model I get a strange error:
from transformers import pipeline
classifier = pipeline(task='sentiment-analysis',
                      model="C:\\Users\\me\\mymodel",
                      tokenizer="C:\\Users\\me\\mymodel")
ValueError: unable to parse C:\Users\me\mymodel\modelcard.json as a URL or as a local path
What is the issue here?
Thanks!
It must be one of these two cases:
You didn't download all the required files properly
Folder path is wrong
FYI, I am listing out the required contents in the directory:
config.json
pytorch_model.bin/ tf_model.h5
special_tokens_map.json
tokenizer.json
tokenizer_config.json
vocab.txt
The solution was slightly indirect:
Load the model on a computer with internet access.
Save the model with save_pretrained().
Transfer the folder obtained above to the offline machine and point the pipeline call to its path (a minimal sketch follows below).
The folder will contain all the expected files.
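A minimal sketch of those steps, assuming the same checkpoint name from the question and a placeholder folder name:

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# On the machine with internet access: download and export the checkpoint.
name = "distilbert-base-uncased-finetuned-sst-2-english"
AutoModelForSequenceClassification.from_pretrained(name).save_pretrained("exported")
AutoTokenizer.from_pretrained(name).save_pretrained("exported")

# On the offline machine: point the pipeline at the transferred folder.
classifier = pipeline(task='sentiment-analysis', model="exported", tokenizer="exported")
print(classifier("This works offline."))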

pickled python machine learning model uses hardcoded paths, doesn't run on other machine - what to do?

I use AutoGluon to create ML models locally on my computer.
Now I want to deploy them through AWS, but I realized that all the pickle files created in the process use hardcoded path references to other pickle files:
/home/myname/Desktop/ETC_PATH/AutoGluon/
I use cloudpickle.dump(predictor, open('FINAL_MODEL.pkl', 'wb')) to pickle the final ensemble model, but AutoGluon creates numerous other pickle files of the individual models, which are then referenced as /home/myname/Desktop/ETC_PATH/AutoGluon/models/ and /home/myname/Desktop/ETC_PATH/AutoGluon/models/specific_model/ and so forth...
How can I make it so that all absolute paths everywhere are replaced by relative paths like root/AutoGluon/WHATEVER_PATH, where root could be set to anything, depending on where the model is later saved?
Any pointers would be helpful.
EDIT: This solved the problem: instead of loading FINAL_MODEL.pkl (which seems to hardcode paths), use AutoGluon's predictor = task.load(model_dir); it then finds all dependencies correctly, whether or not the AutoGluon folder as a whole was moved. This issue on GitHub helped.
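For illustration, a hedged sketch of that approach with the older AutoGluon tabular API used in the question (the import path and directory name are assumptions; newer AutoGluon versions use TabularPredictor.load() instead):

from autogluon import TabularPrediction as task

# Point at the directory AutoGluon wrote during training (placeholder path);
# AutoGluon resolves its internal pickle files relative to this directory,
# so the folder can be moved or deployed as a whole.
model_dir = 'AutoGluon/'
predictor = task.load(model_dir)
# predictor.predict(...) can then be called on a pandas DataFrame of features.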

How to load tf.keras model directly from cloud bucket?

I am trying to load a tf.keras model directly from a cloud bucket, but I can't see an easy way to do it.
I would like to load the whole model structure, not only the weights.
I see 3 possible directions:
Is it possible to load a Keras model directly from a Google Cloud bucket? The command tf.keras.models.load_model('gs://my_bucket/model.h5') doesn't work.
I tried to use tensorflow.python.lib.io.file_io, but I don't know how to load the result as a model.
I copied the model to a local directory with the gsutil cp command, but I don't know how to wait until the operation is complete; tf tries to load the model before the download finishes, so errors occur.
I will be thankful for any suggestions.
Peter
Load the file from GCS storage:
import tensorflow as tf
from tensorflow.python.lib.io import file_io

model_file = file_io.FileIO('gs://mybucket/model.h5', mode='rb')
Save a temporary copy of the model locally
temp_model_location = './temp_model.h5'
temp_model_file = open(temp_model_location, 'wb')
temp_model_file.write(model_file.read())
temp_model_file.close()
model_file.close()
Load model saved locally
model = tf.keras.models.load_model(temp_model_location)
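As an alternative sketch, the copy step can also be done with tf.io.gfile, which understands gs:// paths; the bucket path and local destination here are placeholders:

import tensorflow as tf

# Copy the HDF5 file from the bucket to a local temporary path, then load it.
tf.io.gfile.copy('gs://mybucket/model.h5', '/tmp/model.h5', overwrite=True)
model = tf.keras.models.load_model('/tmp/model.h5')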
