Model name 'bert-base-uncased' was not found in tokenizers - python

My code that loads a pre-trained BERT model had been working fine until today, when I moved it to a new server. I set up the environment properly, but when loading the 'bert-base-uncased' model I got this error:
Traceback (most recent call last):
File "/jmain02/home/J2AD003/txk64/zzz70-txk64/.conda/envs/tensorflow-gpu/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/jmain02/home/J2AD003/txk64/zzz70-txk64/.conda/envs/tensorflow-gpu/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/jmain02/home/J2AD003/txk64/zzz70-txk64/wop_bert/code/python/src/exp/run_exp_bert_apply.py", line 74, in <module>
input_text_fields)
File "/jmain02/home/J2AD003/txk64/zzz70-txk64/wop_bert/code/python/src/classifier/classifier_bert_.py", line 556, in fit_bert_trainonly
tokenizer = BertTokenizer.from_pretrained(bert_model, do_lower_case=True)
File "/jmain02/home/J2AD003/txk64/zzz70-txk64/.conda/envs/tensorflow-gpu/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1140, in from_pretrained
return cls._from_pretrained(*inputs, **kwargs)
File "/jmain02/home/J2AD003/txk64/zzz70-txk64/.conda/envs/tensorflow-gpu/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1246, in _from_pretrained
list(cls.vocab_files_names.values()),
OSError: Model name 'bert-base-uncased' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, TurkuNLP/bert-base-finnish-cased-v1, TurkuNLP/bert-base-finnish-uncased-v1, wietsedv/bert-base-dutch-cased). We assumed 'bert-base-uncased' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
And the line that triggered this error (classifier_bert_.py line 556) is very simple:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
Please can I have some help on how to solve this issue?
Thanks

You have to download the model files and put them in a local directory next to your code. You can download them from here: https://huggingface.co/bert-base-uncased
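A minimal sketch of loading the tokenizer from such a local copy; the directory name below is an assumption:

from transformers import BertTokenizer

# Assumption: vocab.txt from https://huggingface.co/bert-base-uncased
# has been downloaded into the local directory ./bert-base-uncased
tokenizer = BertTokenizer.from_pretrained('./bert-base-uncased',
                                          do_lower_case=True)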

Related

OSError: SavedModel file does not exist at: model/CPN_Model.h5/{saved_model.pbtxt|saved_model.pb}

I am using Python 3.9.10.
Traceback (most recent call last):
File "C:\Users\AbdiShakra\OneDrive\Documents\Diagnosticc-main\server.py", line 4, in
import prediction
File "C:\Users\AbdiShakra\OneDrive\Documents\Diagnosticc-main\prediction.py", line 5, in
model = keras.models.load_model("model/CPN_Model.h5")
File "C:\Users\AbdiShakra\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\saving\save.py", line 205, in load_model
return saved_model_load.load(filepath, compile, options)
File "C:\Users\AbdiShakra\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\saving\saved_model\load.py", line 108, in load
meta_graph_def = tf.__internal__.saved_model.parse_saved_model(path).meta_graphs[0]
File "C:\Users\AbdiShakra\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\saved_model\loader_impl.py", line 118, in parse_saved_model
raise IOError(
OSError: SavedModel file does not exist at: model/CPN_Model.h5/{saved_model.pbtxt|saved_model.pb}
To load a model in the SavedModel format with tf.keras, pass only the directory which contains the model, not the .pb file itself: /tmp/saved_model may be a valid path, while /tmp/saved_model/mymodel.pb is not. Check the documentation to see some example code.
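A short sketch of the two call patterns; the paths are hypothetical. In this case the message most likely means model/CPN_Model.h5 was not found at all, because when the path cannot be read as an HDF5 file, tf.keras falls back to treating it as a SavedModel directory:

import tensorflow as tf

# HDF5 format: the path must point at an existing .h5 file.
model = tf.keras.models.load_model("model/CPN_Model.h5")

# SavedModel format: pass the directory that contains saved_model.pb,
# never the .pb file itself.
# model = tf.keras.models.load_model("model/my_saved_model")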

pyspark unable to load PipelineModel

I ran into a problem: I am unable to load a PipelineModel.
I tested my model in the test environment, but I am unable to apply this model and code in the production environment.
Traceback (most recent call last):
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/predict/model_predict.py", line 228, in <module>
main(xdr_input_file,model_file,xdr_output_file)
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/predict/model_predict.py", line 215, in main
xdr_df_predict = xdr_predict(xdr_df,model_file)
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/predict/model_predict.py", line 193, in xdr_predict
loadmodel = PipelineModel.load(model_input_path)
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 257, in load
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 197, in load
File "/usr/bch/1.5.0/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 79, in deco
pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.PipelineModel but found class name pyspark.ml.pipeline.PipelineModel'
21/12/01 12:01:06 INFO SparkContext: Invoking stop() from shutdown hook
Thanks for all the help. I am an intern in the big data industry, and this is my first time posting on Stack Overflow, so I am sorry for the unpolished formatting.
In the end, I solved this problem by adjusting my code from Spark 2.4 to Spark 2.2.
Here are the details about this traceback:
I tested my code in the test environment under Spark 2.4 and Python 3.7; I hit the error when I deployed it in the production environment under Spark 2.2 and Python 3.7.
When I train the model in the production environment, this is the model generation error:
Traceback (most recent call last):
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/train/model_generate.py", line 331, in
main(xdr_file_path,jingfeng_file_path,save_model_path)
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/train/model_generate.py", line 318, in main
tvs_piplineModel, gbdt_bestModel = generate_model(label_col, xdr_75109_String_title, union_df, save_model_path)
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/train/model_generate.py", line 310, in generate_model
tvs_piplineModel.save(save_model_path)
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 217, in save
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 212, in write
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 100, in init
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 249, in _to_java
AttributeError: 'TrainValidationSplitModel' object has no attribute '_to_java'
When I skip model generation and go straight to prediction with the model I trained in the test environment, this is the model prediction error:
Traceback (most recent call last):
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/predict/model_predict.py", line 228, in
main(xdr_input_file,model_file,xdr_output_file)
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/predict/model_predict.py", line 215, in main
xdr_df_predict = xdr_predict(xdr_df,model_file)
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/predict/model_predict.py", line 193, in xdr_predict
loadmodel = PipelineModel.load(model_input_path)
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 257, in load
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 197, in load
File "/usr/bch/1.5.0/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in call
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 79, in deco
pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.PipelineModel but found class name pyspark.ml.pipeline.PipelineModel'
I checked the official documentation, which explains ML persistence and version changes of ML persistence.
So the error was likely caused by the Spark version mismatch.
I dropped the TrainValidationSplitModel step mentioned in the first traceback, and it does work: my code runs successfully.
In conclusion, my code aims to deploy a machine-learning classification model in the production environment, so I use the pyspark.ml GBT classifier to process the DataFrame, but I had overlooked the version difference between the test and production environments. Thanks for all the help; this is the experience of a Chinese intern. Forgive my poor English.
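For reference, a minimal sketch of the workaround, assuming the estimator inside TrainValidationSplit is a Pipeline ending in a GBTClassifier: instead of saving the TrainValidationSplitModel (which has no _to_java, and hence no save, in this pyspark version), persist its best underlying PipelineModel. All data, names, and paths below are hypothetical:

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit

spark = SparkSession.builder.getOrCreate()
# Toy stand-in for the real training DataFrame.
df = spark.createDataFrame([(0.0, 1.0, 0), (1.0, 0.0, 1)] * 50,
                           ["f1", "f2", "label"])

assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
gbt = GBTClassifier(labelCol="label", featuresCol="features")
pipeline = Pipeline(stages=[assembler, gbt])
grid = ParamGridBuilder().addGrid(gbt.maxDepth, [2, 4]).build()
evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
tvs = TrainValidationSplit(estimator=pipeline, estimatorParamMaps=grid,
                           evaluator=evaluator)

tvs_model = tvs.fit(df)

# Save the best PipelineModel rather than the TrainValidationSplitModel,
# which cannot be persisted from Python in this Spark version.
tvs_model.bestModel.write().overwrite().save("/tmp/gbt_pipeline_model")

# Later, in the prediction job:
loaded = PipelineModel.load("/tmp/gbt_pipeline_model")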

"Private key is missing or invalid. It should be service " when making BigQuery call with credential

I'm following this "data prediction using Cloud ML Engine with scikit-learn" tutorial for GCP AI Platform. I tried to make an API call to BigQuery with:
def query_to_dataframe(query):
    import pandas as pd
    import pkgutil
    privatekey = pkgutil.get_data('trainer', 'privatekey.json')
    print(privatekey[:200])
    return pd.read_gbq(query,
                       project_id=PROJECT,
                       dialect='standard',
                       private_key=privatekey)
but got the following error:
Traceback (most recent call last):
[...]
TypeError: a bytes-like object is required, not 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/.local/lib/python3.7/site-packages/trainer/task.py", line 66, in <module>
arguments['numTrees']
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 119, in train_and_evaluate
train_df, eval_df = create_dataframes(frac)
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 95, in create_dataframes
train_df = query_to_dataframe(train_query)
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 82, in query_to_dataframe
private_key=privatekey)
File "/usr/local/lib/python3.7/dist-packages/pandas/io/gbq.py", line 149, in read_gbq
credentials=credentials, verbose=verbose, private_key=private_key)
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 846, in read_gbq
dialect=dialect, auth_local_webserver=auth_local_webserver)
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 184, in __init__
self.credentials = self.get_credentials()
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 193, in get_credentials
return self.get_service_account_credentials()
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 413, in get_service_account_credentials
"Private key is missing or invalid. It should be service "
pandas_gbq.gbq.InvalidPrivateKeyFormat: Private key is missing or invalid. It should be service account private key JSON (file path or string contents) with at least two keys: 'client_email' and 'private_key'. Can be obtained from: https://console.developers.google.com/permissions/serviceaccounts
When the package runs in the local environment, the private key loads fine, but when it is submitted as an ml-engine training job, the error occurs. Note that the private key fails to load only when I use GCP RUNTIME_VERSION="1.15" and PYTHON_VERSION="3.7"; it loads with no problem when I use PYTHON_VERSION="2.7".
In case it's useful, the structure of my package is:
/babyweight
  - setup.py
  - trainer
    - __init__.py
    - model.py
    - privatekey.json
    - task.py
I'm not sure if the problem is due to a bug in Python, or where I placed privatekey.json.
I was able to solve the problem after I changed read_gbq's argument for the BigQuery access key from private_key to credentials, as recommended by @rmesteves and as shown here. I then set the value using the absolute path to privatekey.json, as shown here. Now the job runs without error.
Note: I only encountered this problem with Python 3+, not with Python 2.7. I'm not sure why; it could be due to the implementation of read_gbq.
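A sketch of the fixed loader under those assumptions. The project id and key path below are hypothetical, and the credentials argument replaces the old private_key argument of read_gbq:

import pandas as pd
from google.oauth2 import service_account

PROJECT = "my-gcp-project"  # hypothetical project id

def query_to_dataframe(query):
    # Build explicit service-account credentials from the key file
    # (hypothetical absolute path to privatekey.json).
    credentials = service_account.Credentials.from_service_account_file(
        "/path/to/trainer/privatekey.json")
    return pd.read_gbq(query,
                       project_id=PROJECT,
                       dialect='standard',
                       credentials=credentials)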

Post-training quantization with tflite causes runtime error

I am trying to quantize my model (specifically the pretrained faster_rcnn_inception_v2 on COCO, downloaded from the model zoo), in the hope of speeding up inference.
I use the following code from here:
import tensorflow as tf
converter = tf.lite.TocoConverter.from_saved_model(saved_model_dir)
converter.post_training_quantize = True
tflite_quantized_model = converter.convert()
open("quantized_model.tflite", "wb").write(tflite_quantized_model)
The model's directory didn't have a saved_model.pb file, so I renamed frozen_inference_graph.pb to saved_model.pb.
Running the code above produces the following runtime error:
Traceback (most recent call last):
File "/home/juggernaut/pycharm-community-2018.2.4/helpers/pydev/pydevd.py", line 1664, in <module>
main()
File "/home/juggernaut/pycharm-community-2018.2.4/helpers/pydev/pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/juggernaut/pycharm-community-2018.2.4/helpers/pydev/pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/hdd/motorola/motorola_heads/tensorflow_face_detection/quantize.py", line 5, in <module>
converter = tf.lite.TocoConverter.from_saved_model(saved_model_dir)
File "/hdd/motorola/venv_py27_tf1.10/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 318, in new_func
return func(*args, **kwargs)
File "/hdd/motorola/venv_py27_tf1.10/local/lib/python2.7/site-packages/tensorflow/lite/python/lite.py", line 587, in from_saved_model
tag_set, signature_key)
File "/hdd/motorola/venv_py27_tf1.10/local/lib/python2.7/site-packages/tensorflow/lite/python/lite.py", line 376, in from_saved_model
output_arrays, tag_set, signature_key)
File "/hdd/motorola/venv_py27_tf1.10/local/lib/python2.7/site-packages/tensorflow/lite/python/convert_saved_model.py", line 254, in freeze_saved_model
meta_graph = get_meta_graph_def(saved_model_dir, tag_set)
File "/hdd/motorola/venv_py27_tf1.10/local/lib/python2.7/site-packages/tensorflow/lite/python/convert_saved_model.py", line 61, in get_meta_graph_def
return loader.load(sess, tag_set, saved_model_dir)
File "/hdd/motorola/venv_py27_tf1.10/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 318, in new_func
return func(*args, **kwargs)
File "/hdd/motorola/venv_py27_tf1.10/local/lib/python2.7/site-packages/tensorflow/python/saved_model/loader_impl.py", line 269, in load
return loader.load(sess, tags, import_scope, **saver_kwargs)
File "/hdd/motorola/venv_py27_tf1.10/local/lib/python2.7/site-packages/tensorflow/python/saved_model/loader_impl.py", line 420, in load
**saver_kwargs)
File "/hdd/motorola/venv_py27_tf1.10/local/lib/python2.7/site-packages/tensorflow/python/saved_model/loader_impl.py", line 347, in load_graph
meta_graph_def = self.get_meta_graph_def_from_tags(tags)
File "/hdd/motorola/venv_py27_tf1.10/local/lib/python2.7/site-packages/tensorflow/python/saved_model/loader_impl.py", line 323, in get_meta_graph_def_from_tags
" could not be found in SavedModel. To inspect available tag-sets in"
RuntimeError: MetaGraphDef associated with tags set(['serve']) could not be found in SavedModel. To inspect available tag-sets in the SavedModel, please use the SavedModel CLI: `saved_model_cli`
What does it mean and what should I do?
Please refer to this issue; they seem to have the same issue as you.
This may be fixed in a more recent version of TensorFlow (perhaps the tag has switched from 'serve' to 'serving' in the meantime).
A frozen graph renamed to saved_model.pb is not a SavedModel: it is a plain GraphDef with no tagged MetaGraphDefs, which is why the 'serve' tag cannot be found. You should use tf.saved_model.simple_save to export a proper SavedModel.
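A sketch of re-exporting the frozen graph as a real SavedModel with tf.saved_model.simple_save (TF 1.x), which writes the 'serve' tag that the converter looks for. The tensor names are assumptions based on the object-detection model zoo; inspect the graph for the actual input and output tensors:

import tensorflow as tf

# Parse the frozen GraphDef produced by the object-detection export.
graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Session(graph=tf.Graph()) as sess:
    tf.import_graph_def(graph_def, name="")
    # Hypothetical tensor names; list sess.graph.get_operations() to verify.
    inputs = {"image_tensor": sess.graph.get_tensor_by_name("image_tensor:0")}
    outputs = {"detection_boxes": sess.graph.get_tensor_by_name("detection_boxes:0")}
    # Writes a SavedModel tagged 'serve' under ./saved_model_dir.
    tf.saved_model.simple_save(sess, "saved_model_dir", inputs, outputs)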

Loading matlab files using Python's scipy (from a Google Cloud bucket)

I'm trying to run a Keras multi-layer perceptron model using Google Cloud ML Engine (following the format put forward in tutorials such as https://github.com/clintonreece/keras-cloud-ml-engine and http://liufuyang.github.io/2017/04/02/just-another-tensorflow-beginner-guide-4.html), and my dataset is in the form of .mat files (which, as far as I know, are not in the 7.3 format, so they don't need HDF5).
The training-set files are in a folder called "data" in a Google Cloud Storage bucket called project_1; I also have them stored locally. I modified my model for cloud use so that it loads the .mat files as follows:
def train_model(train_file='data', job_dir='./tmp/mlp2', **args):
    with file_io.FileIO(train_file + '/train_subject01.mat', mode='r') as a:
        train_data = scipy.io.loadmat(a)
etc.
When I run the model locally using gcloud commands (with --train-file ./data) it works smoothly. However, when I try to deploy it to run on the cloud using
$ export BUCKET_NAME=project_1
....
> --train-file gs://$BUCKET_NAME/data
as seems to be common practice, I get an error message as follows:
The replica master 0 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/trainer/mlp2.py", line 195, in <module>
train_model(**arguments)
File "/root/.local/lib/python2.7/site-packages/trainer/mlp2.py", line 39, in train_model
train_data = scipy.io.loadmat(a)
File "/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio.py", line 135, in loadmat
matfile_dict = MR.get_variables(variable_names)
File "/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio5.py", line 272, in get_variables
hdr, next_position = self.read_var_header()
File "/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio5.py", line 232, in read_var_header
header = self._matrix_reader.read_header(check_stream_limit)
File "scipy/io/matlab/mio5_utils.pyx", line 558, in scipy.io.matlab.mio5_utils.VarReader5.read_header (scipy/io/matlab/mio5_utils.c:5684)
File "scipy/io/matlab/mio5_utils.pyx", line 610, in scipy.io.matlab.mio5_utils.VarReader5.read_header (scipy/io/matlab/mio5_utils.c:5609)
File "scipy/io/matlab/mio5_utils.pyx", line 481, in scipy.io.matlab.mio5_utils.VarReader5.read_int8_string (scipy/io/matlab/mio5_utils.c:4635)
File "scipy/io/matlab/mio5_utils.pyx", line 362, in scipy.io.matlab.mio5_utils.VarReader5.read_element (scipy/io/matlab/mio5_utils.c:3994)
File "scipy/io/matlab/streams.pyx", line 55, in scipy.io.matlab.streams.GenericStream.seek (scipy/io/matlab/streams.c:1401)
TypeError: seek() takes exactly 2 arguments (3 given)
I have no idea what this seek() error means! Am I using the right method to load the file, and if so, why is the issue popping up? Is there an alternative way to load the file?
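One reading of the traceback, offered as an assumption rather than a confirmed fix: scipy seeks within the stream using the two-argument form seek(offset, whence), which this version of file_io.FileIO does not accept, so buffering the file into a fully seekable in-memory stream may sidestep the call entirely:

import io
import scipy.io
from tensorflow.python.lib.io import file_io

def load_mat(path):
    # Read the whole file (works for gs:// paths too) and hand scipy a
    # seekable in-memory stream instead of the FileIO object.
    with file_io.FileIO(path, mode='rb') as f:
        return scipy.io.loadmat(io.BytesIO(f.read()))

train_data = load_mat('gs://project_1/data/train_subject01.mat')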
