I'm trying to run a Keras multi-layer perceptron model on Google Cloud ML Engine (following the format put forward in tutorials such as https://github.com/clintonreece/keras-cloud-ml-engine and http://liufuyang.github.io/2017/04/02/just-another-tensorflow-beginner-guide-4.html). My dataset is in the form of .mat files (which, as far as I know, are not v7.3 format, so they don't need HDF5).
The training set files are in a folder called data in a Google Cloud Storage bucket called project_1; I also have them stored locally. I modified my model for cloud use so that it loads the .mat files as follows:
from tensorflow.python.lib.io import file_io
import scipy.io

def train_model(train_file='data', job_dir='./tmp/mlp2', **args):
    # Open the .mat file either from local disk or from GCS via file_io
    with file_io.FileIO(train_file + '/train_subject01.mat', mode='r') as a:
        train_data = scipy.io.loadmat(a)
    # etc.
When I run the model locally using gcloud commands (with --train-file ./data) it works smoothly. However, when I try to deploy it to run on the cloud using
$ export BUCKET_NAME=project_1
....
> --train-file gs://$BUCKET_NAME/data
as seems to be common practice, I get an error message as follows:
The replica master 0 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/trainer/mlp2.py", line 195, in <module>
train_model(**arguments)
File "/root/.local/lib/python2.7/site-packages/trainer/mlp2.py", line 39, in train_model
train_data = scipy.io.loadmat(a)
File "/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio.py", line 135, in loadmat
matfile_dict = MR.get_variables(variable_names)
File "/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio5.py", line 272, in get_variables
hdr, next_position = self.read_var_header()
File "/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio5.py", line 232, in read_var_header
header = self._matrix_reader.read_header(check_stream_limit)
File "scipy/io/matlab/mio5_utils.pyx", line 558, in scipy.io.matlab.mio5_utils.VarReader5.read_header (scipy/io/matlab/mio5_utils.c:5684)
File "scipy/io/matlab/mio5_utils.pyx", line 610, in scipy.io.matlab.mio5_utils.VarReader5.read_header (scipy/io/matlab/mio5_utils.c:5609)
File "scipy/io/matlab/mio5_utils.pyx", line 481, in scipy.io.matlab.mio5_utils.VarReader5.read_int8_string (scipy/io/matlab/mio5_utils.c:4635)
File "scipy/io/matlab/mio5_utils.pyx", line 362, in scipy.io.matlab.mio5_utils.VarReader5.read_element (scipy/io/matlab/mio5_utils.c:3994)
File "scipy/io/matlab/streams.pyx", line 55, in scipy.io.matlab.streams.GenericStream.seek (scipy/io/matlab/streams.c:1401)
TypeError: seek() takes exactly 2 arguments (3 given)
I have no idea what this seek() error means! Am I using the right method to load the file, and if so, why is the issue popping up? Is there an alternative way to load the file?
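One alternative worth sketching here (not verified on ML Engine; the io.BytesIO wrapping is an assumption on my part, not something taken from the tutorials above) would be to read the raw bytes and give loadmat an in-memory buffer, so it sees a standard seekable binary stream:

import io
import scipy.io
from tensorflow.python.lib.io import file_io

# Inside train_model: read the whole .mat file as bytes ('rb'), then let
# loadmat work on an ordinary in-memory stream instead of the FileIO object.
with file_io.FileIO(train_file + '/train_subject01.mat', mode='rb') as a:
    train_data = scipy.io.loadmat(io.BytesIO(a.read()))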
Related
My code that loads a pre-trained BERT model had been working fine until today, when I moved it to a new server. I set up the environment properly, but when loading the 'bert-base-uncased' model I got this error:
Traceback (most recent call last):
File "/jmain02/home/J2AD003/txk64/zzz70-txk64/.conda/envs/tensorflow-gpu/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/jmain02/home/J2AD003/txk64/zzz70-txk64/.conda/envs/tensorflow-gpu/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/jmain02/home/J2AD003/txk64/zzz70-txk64/wop_bert/code/python/src/exp/run_exp_bert_apply.py", line 74, in <module>
input_text_fields)
File "/jmain02/home/J2AD003/txk64/zzz70-txk64/wop_bert/code/python/src/classifier/classifier_bert_.py", line 556, in fit_bert_trainonly
tokenizer = BertTokenizer.from_pretrained(bert_model, do_lower_case=True)
File "/jmain02/home/J2AD003/txk64/zzz70-txk64/.conda/envs/tensorflow-gpu/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1140, in from_pretrained
return cls._from_pretrained(*inputs, **kwargs)
File "/jmain02/home/J2AD003/txk64/zzz70-txk64/.conda/envs/tensorflow-gpu/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1246, in _from_pretrained
list(cls.vocab_files_names.values()),
OSError: Model name 'bert-base-uncased' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, TurkuNLP/bert-base-finnish-cased-v1, TurkuNLP/bert-base-finnish-uncased-v1, wietsedv/bert-base-dutch-cased). We assumed 'bert-base-uncased' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
And the line that triggered this error (classifier_bert_.py line 556) is very simple:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
Please can I have some help on how to solve this issue?
Thanks
You have to download the model files and put them in the same directory (so that from_pretrained can find them as a local path).
You can download them from here: https://huggingface.co/bert-base-uncased
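For example (a sketch; the local folder name is just an illustration), after downloading vocab.txt (plus config.json and the weights if you also load the model itself) into a local directory, you can point from_pretrained at that path instead of the model name:

from transformers import BertTokenizer

# './bert-base-uncased' is an illustrative local folder containing vocab.txt
tokenizer = BertTokenizer.from_pretrained('./bert-base-uncased', do_lower_case=True)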
I ran into a problem: I am unable to load a PipelineModel.
I tested my model in a practice environment, but I am unable to apply the same model and code in the production environment.
Traceback (most recent call last):
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/predict/model_predict.py", line 228, in <module>
main(xdr_input_file,model_file,xdr_output_file)
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/predict/model_predict.py", line 215, in main
xdr_df_predict = xdr_predict(xdr_df,model_file)
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/predict/model_predict.py", line 193, in xdr_predict
loadmodel = PipelineModel.load(model_input_path)
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 257, in load
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 197, in load
File "/usr/bch/1.5.0/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 79, in deco
pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.PipelineModel but found class name pyspark.ml.pipeline.PipelineModel'
21/12/01 12:01:06 INFO SparkContext: Invoking stop() from shutdown hook
Thanks for all the help. I am an intern in the big data industry. This is my first time posting on Stack Overflow; I am sorry for not posting in the required format.
In the end, I solved this problem by adjusting my code from Spark 2.4 to Spark 2.2.
Here are the details behind this traceback:
I tested my code in the test environment under Spark 2.4 and Python 3.7; I hit errors when I deployed it to the production environment, which runs Spark 2.2 and Python 3.7.
When I train the model in the production environment, this is the model-generation error:
Traceback (most recent call last):
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/train/model_generate.py", line 331, in
main(xdr_file_path,jingfeng_file_path,save_model_path)
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/train/model_generate.py", line 318, in main
tvs_piplineModel, gbdt_bestModel = generate_model(label_col, xdr_75109_String_title, union_df, save_model_path)
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/train/model_generate.py", line 310, in generate_model
tvs_piplineModel.save(save_model_path)
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 217, in save
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 212, in write
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 100, in init
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 249, in _to_java
AttributeError: 'TrainValidationSplitModel' object has no attribute '_to_java'
When I skip model generation and go straight to prediction with the model I trained in the test environment, this is the model-prediction error:
Traceback (most recent call last):
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/predict/model_predict.py", line 228, in
main(xdr_input_file,model_file,xdr_output_file)
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/predict/model_predict.py", line 215, in main
xdr_df_predict = xdr_predict(xdr_df,model_file)
File "/home/fwfx_yaofei/telbd-yjy/src/ml/complain_user_it/predict/model_predict.py", line 193, in xdr_predict
loadmodel = PipelineModel.load(model_input_path)
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 257, in load
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 197, in load
File "/usr/bch/1.5.0/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in call
File "/usr/bch/1.5.0/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 79, in deco
pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.PipelineModel but found class name pyspark.ml.pipeline.PipelineModel'
I checked the official documentation; it explains ML persistence and version changes of ML persistence.
So the error might be caused by the Spark version.
I dropped the "TrainValidationSplitModel" persistence mentioned in the first traceback, and that works: my code runs successfully.
(Screenshot of the successful run.)
In conclusion: my code aims to deploy a machine-learning classification model in a production environment, so I import pyspark.ml.gbdt to process the dataframe, but I overlooked the version difference between the test and production environments. Thanks for all the help; this is the experience of a Chinese intern. Forgive my poor writing.
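In case it helps someone else, here is a minimal sketch (my own reconstruction, not the original code; names like feature_cols, train_df and save_model_path are placeholders) of what skipping TrainValidationSplitModel persistence can look like: fit and save the PipelineModel directly, since PipelineModel save/load is supported on Spark 2.2 (alternatively, save the tuned model's .bestModel, which is also a PipelineModel).

from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.feature import VectorAssembler

# feature_cols, label_col, train_df, test_df and save_model_path are placeholders
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
gbt = GBTClassifier(labelCol=label_col, featuresCol="features")
pipeline = Pipeline(stages=[assembler, gbt])

pipeline_model = pipeline.fit(train_df)   # a PipelineModel, not a TrainValidationSplitModel
pipeline_model.save(save_model_path)      # persistence that Spark 2.2 supports

# Later, in the prediction job:
loaded = PipelineModel.load(save_model_path)
predictions = loaded.transform(test_df)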
I'm following this "Data prediction using Cloud ML Engine with scikit-learn" tutorial for GCP AI Platform. I tried to make an API call to BigQuery with:
def query_to_dataframe(query):
    import pandas as pd
    import pkgutil
    # Load the service account key bundled inside the trainer package
    privatekey = pkgutil.get_data('trainer', 'privatekey.json')
    print(privatekey[:200])
    return pd.read_gbq(query,
                       project_id=PROJECT,
                       dialect='standard',
                       private_key=privatekey)
but got the following error:
Traceback (most recent call last):
[...]
TypeError: a bytes-like object is required, not 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/.local/lib/python3.7/site-packages/trainer/task.py", line 66, in <module>
arguments['numTrees']
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 119, in train_and_evaluate
train_df, eval_df = create_dataframes(frac)
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 95, in create_dataframes
train_df = query_to_dataframe(train_query)
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 82, in query_to_dataframe
private_key=privatekey)
File "/usr/local/lib/python3.7/dist-packages/pandas/io/gbq.py", line 149, in read_gbq
credentials=credentials, verbose=verbose, private_key=private_key)
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 846, in read_gbq
dialect=dialect, auth_local_webserver=auth_local_webserver)
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 184, in __init__
self.credentials = self.get_credentials()
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 193, in get_credentials
return self.get_service_account_credentials()
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 413, in get_service_account_credentials
"Private key is missing or invalid. It should be service "
pandas_gbq.gbq.InvalidPrivateKeyFormat: Private key is missing or invalid. It should be service account private key JSON (file path or string contents) with at least two keys: 'client_email' and 'private_key'. Can be obtained from: https://console.developers.google.com/permissions/serviceaccounts
When the package runs in local environment, the private key loads fine, but when submitted as a ml-engine training job, the error occurs. Note that the private key fails to load only when I use GCP RUNTIME_VERSION="1.15" and PYTHON_VERSION="3.7", but can load with no problem when I use PYTHON_VERSION="2.7".
In case it's useful, the structure of my package is:
/babyweight
- setup.py
- trainer
- __init__.py
- model.py
- privatekey.json
- task.py
I'm not sure if the problem is due to a bug in Python, or where I placed privatekey.json.
I was able to solve the problem after I changed the read_gbq argument used for the BigQuery access key from private_key to credentials, as recommended by @rmesteves, and as shown here. I then set its value using the absolute path to privatekey.json, as shown here. Now the job is able to run without error.
Note: I only encountered this problem with Python 3+, but not with Python 2.7. I'm not sure why. It could possibly be due to the implementation of read_gbq.
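For concreteness, here is a sketch of the credentials-based call (my reconstruction of the fix, not the exact code; it assumes the google-auth service-account helper, a pandas/pandas-gbq version recent enough to accept credentials, and an illustrative key path):

from google.oauth2 import service_account
import pandas as pd

# Illustrative absolute path to the service account key
credentials = service_account.Credentials.from_service_account_file(
    '/path/to/privatekey.json')

df = pd.read_gbq(query,
                 project_id=PROJECT,
                 dialect='standard',
                 credentials=credentials)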
My Google AI Platform / ML Engine training job doesn't seem to have access to the training file I put into a Google Cloud Storage bucket.
Google's AI Platform / ML Engine requires you to store training data files in one of their Cloud Storage buckets. Accessing the data locally from the CLI works fine. However, when I submit a training job (after ensuring the data is in the appropriate location in my Cloud Storage bucket), I get an error that seems to be due to having no access to the bucket Link URL.
The error is from trying to read what looks to me like the contents of a web page that Google served up saying "Hey, you don't have access to this." I see this gaia.loginAutoRedirect.start(5000, and a URL with this flag at the end: noautologin=true.
I know permissions between AI Platform and Cloud Storage are a thing, but both are under the same project. The walkthroughs I'm using at the very least imply that no further action is required if both are under the same project.
I am assuming I need to use the Link URL provided in the bucket Overview tab. I tried the link for gsutil, but the Python code (from Google's CloudML Samples repo) was upset about using gs://.
I think Google's examples are proving insufficient since their example data is from a public URL rather than a private Cloud Storage bucket.
Ultimately, the error message I get is a Python error. But like I said, this is preceded by a bunch of gross INFO logs of HTML/CSS/JS from Google saying I don't have permission to get the file I'm trying to get. These logs are actually just because I added a print statement to the util.py file as well - right before read_csv() on the train file. (So the Python parse error is due to trying to parse HTML as a CSV).
...
INFO g("gaia.loginAutoRedirect.stop",function(){var b=n;b.b=!0;b.a&&(clearInterval(b.a),b.a=null)});
INFO gaia.loginAutoRedirect.start(5000,
INFO 'https:\x2F\x2Faccounts.google.com\x2FServiceLogin?continue=https%3A%2F%2Fstorage.cloud.google.com%2F<BUCKET_NAME>%2Fdata%2F%2Ftrain.csv\x26followup=https%3A%2F%2Fstorage.cloud.google.com%2F<BUCKET_NAME>%2Fdata%2F%2Ftrain.csv\x26service=cds\x26passive=1209600\x26noautologin=true',
ERROR Command '['python', '-m', u'trainer.task', u'--train-files', u'gs://<BUCKET_NAME>/data/train.csv', u'--eval-files', u'gs://<BUCKET_NAME>/data/test.csv', u'--batch-pct', u'0.2', u'--num-epochs', u'1000', u'--verbosity', u'DEBUG', '--job-dir', u'gs://<BUCKET_NAME>/predictor']' returned non-zero exit status 1.
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 137, in <module>
train_and_evaluate(args)
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 80, in train_and_evaluate
train_x, train_y, eval_x, eval_y = util.load_data()
File "/root/.local/lib/python2.7/site-packages/trainer/util.py", line 168, in load_data
train_df = pd.read_csv(training_file_path, header=0, names=_CSV_COLUMNS, na_values='?')
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 446, in _read
data = parser.read(nrows)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
ParserError: Error tokenizing data. C error: Expected 5 fields in line 205, saw 961
To get the data, I'm more or less trying to mimic this:
https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/tf-keras/trainer/util.py
Various ways I have tried to address my bucket in my copy of util.py:
https://console.cloud.google.com/storage/browser/<BUCKET_NAME>/data (think this was the "Link URL" back in May)
https://storage.cloud.google.com/<BUCKET_NAME>/data (this is the "Link URL" now - in July)
gs://<BUCKET_NAME>/data (this is the URI - which gives a different error about not liking gs as a url type)
Transferring the answer from a comment above:
Looks like the URL approach requires cookie-based authentication if it's not a public object. Instead of using a URL, I would suggest using tf.gfile with a gs:// path, as is used in the Keras sample. If you need to download the file from GCS in a separate step, you can use the GCS client library.
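For illustration, a minimal sketch of that suggestion (my own example, assuming TF 1.x where tf.gfile is available; newer versions spell it tf.io.gfile.GFile. The gs:// path is a placeholder.):

import pandas as pd
import tensorflow as tf

# Placeholder for the real bucket/object path
training_file_path = 'gs://<BUCKET_NAME>/data/train.csv'

# tf.gfile understands gs:// URIs, so pandas just sees an ordinary file object
with tf.gfile.GFile(training_file_path, 'r') as f:
    train_df = pd.read_csv(f, header=0, na_values='?')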
So, I'm comparing some Python test frameworks and came across behave. I thought it was interesting and worth a test drive.
Followed the steps on the tutorial, available at:
https://behave.readthedocs.io/en/stable/tutorial.html
When I ran the behave command in PowerShell (Windows 10 and Python 2.7.10), I got the following error:
Exception TypeError: compile() expected string without null bytes
Traceback (most recent call last):
File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "C:\Python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Python27\lib\site-packages\behave\__main__.py", line 187, in <module>
sys.exit(main())
File "C:\Python27\lib\site-packages\behave\__main__.py", line 183, in main
return run_behave(config)
File "C:\Python27\lib\site-packages\behave\__main__.py", line 127, in run_behave
failed = runner.run()
File "C:\Python27\lib\site-packages\behave\runner.py", line 804, in run
return self.run_with_paths()
File "C:\Python27\lib\site-packages\behave\runner.py", line 809, in run_with_paths
self.load_step_definitions()
File "C:\Python27\lib\site-packages\behave\runner.py", line 796, in load_step_definitions
load_step_modules(step_paths)
File "C:\Python27\lib\site-packages\behave\runner_util.py", line 412, in load_step_modules
exec_file(os.path.join(path, name), step_module_globals)
File "C:\Python27\lib\site-packages\behave\runner_util.py", line 385, in exec_file
code = compile(f.read(), filename2, "exec", dont_inherit=True)
TypeError: compile() expected string without null bytes
Has anyone encountered this error while trying to run behave? (I found some threads online related mainly to Flask issues, but I couldn't solve the problem.)
Answering my own question here.
It was an encoding problem.
Sublime was saving my files with an encoding different from UTF-8.
File -> Save with Encoding -> UTF-8 did the trick.
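For anyone hitting the same thing, here is a quick sketch (my own addition, not part of the original fix; it assumes the default features/steps layout from the behave tutorial) to find which step file contains the stray null bytes before re-saving it as UTF-8:

import glob

# Null bytes in a step file are typical of a UTF-16 / non-UTF-8 save
for path in glob.glob('features/steps/*.py'):
    with open(path, 'rb') as f:
        data = f.read()
    if b'\x00' in data:
        print('Null bytes found in:', path)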