Get the path of saved_model.pb after training on ML engine - python

I have been using the Python client API of ML Engine to create training jobs for some canned estimators. What I'm not able to do is get the path of saved_model.pb on GCS, because the directory it is stored in is named with a timestamp. Is there any way I can get this path using a regular expression or something in the Python client, so that I can deploy the model with the correct path?
The path seems to be in this format right now -
gs://bucket_name/outputs/export/serv/timestamp/saved_model.pb
UPDATE
Thanks shahin for the answer.
So I wrote the following, which gives me the exact path that I can pass as the deploy_uri for ML Engine.
from google.cloud import storage

def getGCSPath(prefix):
    # bucket_name must be defined in the surrounding scope (or passed in as a parameter)
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    mlist = bucket.list_blobs(prefix=prefix)
    for blob in mlist:
        if 'saved_model.pb' in blob.name:
            # strip the trailing 'saved_model.pb' (14 characters), keeping the timestamped directory
            return blob.name[:-14]

# print getGCSPath('output/export/serv/')
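For completeness, here is a minimal sketch of passing that path as the deployment URI when creating a version with the Google API Python client. This is my own addition rather than part of the original post; the project, model, and version names and the runtime version are hypothetical, and note that blob names returned by list_blobs do not include the bucket, so the gs:// prefix has to be added back.

from googleapiclient import discovery

ml = discovery.build('ml', 'v1')
body = {
    'name': 'v1',  # hypothetical version name
    'deploymentUri': 'gs://bucket_name/' + getGCSPath('outputs/export/serv/'),
    'runtimeVersion': '1.8',  # illustrative
}
request = ml.projects().models().versions().create(
    parent='projects/my-project/models/my_model',  # hypothetical project/model names
    body=body,
)
response = request.execute()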

Use gsutil and tail:
MODEL_LOCATION=$(gsutil ls gs://${BUCKET}/outputs/export/serv | tail -1)
gcloud ml-engine models create ${MODEL_NAME} --regions $REGION
gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --runtime-version $TFVERSION

import os
import cloudstorage as gcs

# BUCKET is expected to include the leading slash that cloudstorage uses, e.g. '/my-bucket'
bucket = os.environ.get('BUCKET')
page_size = 1
stats = gcs.listbucket(bucket + '/outputs/export/serv', max_keys=page_size)
for stat in stats:
    print(stat.filename)  # each GCSFileStat exposes the full object path

Related

Cannot load azure.ml workspace from within a webservice entry_script! Where is the `/var/azureml-app/` located?

I am creating an azure-ml webservice. The following script shows the code for creating the webservice and deploying it locally.
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()
model = Model(ws, 'textDNN-20News')
ws.write_config(file_name='config.json')

env = Environment(name="init-env")
python_packages = ['numpy', 'pandas']
for package in python_packages:
    env.python.conda_dependencies.add_pip_package(package)

dummy_inference_config = InferenceConfig(
    environment=env,
    source_directory="./source_dir",
    entry_script="./init_score.py",
)

from azureml.core.webservice import LocalWebservice

deployment_config = LocalWebservice.deploy_configuration(port=6789)

service = Model.deploy(
    ws,
    "myservice",
    [model],
    dummy_inference_config,
    deployment_config,
    overwrite=True,
)
service.wait_for_deployment(show_output=True)
As can be seen, the above code deploys the entry_script init_score.py to my local machine. Within the entry script, I need to load the workspace again to connect to an Azure SQL database. I do that as follows:
from azureml.core import Dataset, Datastore
from azureml.data.datapath import DataPath
from azureml.core import Workspace

def init():
    pass

def run(data):
    try:
        ws = Workspace.from_config()
        # create a tabular dataset from a SQL database in the datastore
        datastore = Datastore.get(ws, 'sql_db_name')
        query = DataPath(datastore, 'SELECT * FROM my_table')
        tabular = Dataset.Tabular.from_sql_query(query, query_timeout=10)
        df = tabular.to_pandas_dataframe()
        return len(df)
    except Exception as e:
        output0 = "{}:".format(type(e).__name__)
        output1 = "{} ".format(e)
        output2 = f"{type(e).__name__} occurred at line {e.__traceback__.tb_lineno} of {__file__}"
        return output0 + output1 + output2
The try/except block is there to catch any exception thrown and return it as the output.
The exception that I keep getting is:
UserErrorException: The workspace configuration file config.json, could not be found in /var/azureml-app or its
parent directories. Please check whether the workspace configuration file exists, or provide the full path
to the configuration file as an argument. You can download a configuration file for your workspace,
via http://ml.azure.com and clicking on the name of your workspace in the right top.
I have actually tried saving the config file by passing an absolute path to the path argument of ws.write_config(path='my_absolute_path'), and also passing that path when loading with Workspace.from_config(path='my_absolute_path'), but I got pretty much the same error:
UserErrorException: The workspace configuration file config.json, could not be found in /var/azureml-app/my_absolute_path or its
parent directories. Please check whether the workspace configuration file exists, or provide the full path
to the configuration file as an argument. You can download a configuration file for your workspace,
via http://ml.azure.com and clicking on the name of your workspace in the right top.
It looks like even providing the path does not change the root directory from which the entry script starts looking for the file.
I also tried saving the file directly to /var/azureml-app/, but this path is not recognized when I pass it to ws.write_config(path='/var/azureml-app/').
Do you have any idea where exactly /var/azureml-app/ is located?
Any idea on how to fix this?

AzureDevOPS ML Error: We could not find config.json in: /home/vsts/work/1/s or in its parent directories

I am trying to create an Azure DevOps ML pipeline. The following code works 100% fine in Jupyter Notebooks, but when I run it in Azure DevOps I get this error:
Traceback (most recent call last):
  File "src/my_custom_package/data.py", line 26, in <module>
    ws = Workspace.from_config()
  File "/opt/hostedtoolcache/Python/3.8.7/x64/lib/python3.8/site-packages/azureml/core/workspace.py", line 258, in from_config
    raise UserErrorException('We could not find config.json in: {} or in its parent directories. '
azureml.exceptions._azureml_exception.UserErrorException: UserErrorException:
    Message: We could not find config.json in: /home/vsts/work/1/s or in its parent directories. Please provide the full path to the config file or ensure that config.json exists in the parent directories.
    InnerException None
    ErrorResponse
{
    "error": {
        "code": "UserError",
        "message": "We could not find config.json in: /home/vsts/work/1/s or in its parent directories. Please provide the full path to the config file or ensure that config.json exists in the parent directories."
    }
}
The code is:
# imports
from sklearn.model_selection import train_test_split
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.experiment import Experiment
from datetime import date
from azureml.core import Workspace, Dataset
import pandas as pd
import numpy as np
import logging

# get data
subscription_id = 'mysubid'
resource_group = 'myrg'
workspace_name = 'mlplayground'
workspace = Workspace(subscription_id, resource_group, workspace_name)
dataset = Dataset.get_by_name(workspace, name='correctData')

# auto ml
ws = Workspace.from_config()
automl_settings = {
    "iteration_timeout_minutes": 2880,
    "experiment_timeout_hours": 48,
    "enable_early_stopping": True,
    "primary_metric": 'spearman_correlation',
    "featurization": 'auto',
    "verbosity": logging.INFO,
    "n_cross_validations": 5,
    "max_concurrent_iterations": 4,
    "max_cores_per_iteration": -1,
}
cpu_cluster_name = "computecluster"
compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
print(compute_target)

automl_config = AutoMLConfig(task='regression',
                             compute_target=compute_target,
                             debug_log='automated_ml_errors.log',
                             training_data=dataset,
                             label_column_name="paidInDays",
                             **automl_settings)

today = date.today()
d4 = today.strftime("%b-%d-%Y")
experiment = Experiment(ws, "myexperiment" + d4)
remote_run = experiment.submit(automl_config, show_output=True)

from azureml.widgets import RunDetails
RunDetails(remote_run).show()
remote_run.wait_for_completion()
There is something odd happening in your code: you are getting the data from a first workspace (workspace = Workspace(subscription_id, resource_group, workspace_name)), then using the resources of a second one (ws = Workspace.from_config()). I would suggest avoiding code that relies on two different workspaces, especially since an underlying data source can be registered (linked) to multiple workspaces (documentation).
In general, using a config.json file when instantiating a Workspace object results in interactive authentication: when your code runs, a log message asks you to open a specific URL and enter a code. This uses your Microsoft account to verify that you are authorized to access the Azure resource (in this case your Workspace('mysubid', 'myrg', 'mlplayground')). This has its limitations once you start deploying the code onto virtual machines or agents, because you will not always be watching the logs, opening the URL, and authenticating yourself manually.
For this reason it is strongly recommended to set up a more advanced authentication method, and personally I would suggest the service principal one, since it is simple, convenient, and secure if done properly.
You can follow Azure's official documentation here.
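For reference, here is a minimal sketch of what service principal authentication can look like with the azureml SDK; it is not part of the original answer. The tenant ID, application ID, and secret are placeholders (in Azure DevOps they would typically come from secret pipeline variables), and the workspace details are taken from the question.

from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# placeholder credentials; never hard-code real secrets
sp_auth = ServicePrincipalAuthentication(
    tenant_id="<tenant-id>",
    service_principal_id="<application-id>",
    service_principal_password="<client-secret>",
)

ws = Workspace.get(
    name="mlplayground",
    subscription_id="mysubid",
    resource_group="myrg",
    auth=sp_auth,
)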
You need to provide a config path to Workspace.from_config().
Under https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py you will find the following explanation of how to create a config file:
Create a Workspace:
from azureml.core import Workspace

ws = Workspace.create(name='myworkspace',
                      subscription_id='<azure-subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=True,
                      location='eastus2'
                      )

Save the workspace config:
ws.write_config(path="./file-path", file_name="config.json")

Load the config from the default path:
ws = Workspace.from_config()
ws.get_details()

Or load the config from a specified path:
ws = Workspace.from_config(path="my/path/config.json")
More details about how to create a Workspace with from_config can be found here:
https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py#from-config-path-none--auth-none---logger-none---file-name-none-

Error when serving attention_ocr model ("error": "Expected one or two output Tensors, found 17")

I'm trying to serve the attention_ocr model on Docker with the tensorflow/serving image.
First, I trained this model with my own dataset and got a good result with demo_inference.py.
So I exported the trained model with export_model.py:
python export_model.py --checkpoint=model.ckpt-111111 --export_dir=/tmp/mydir
Then I ran a Docker container to serve the model:
docker run -it --rm -p 8501:8501 -v /tmp/mydir:/models/aocr -e MODEL_NAME=aocr --gpus all tensorflow/serving
And this is my python client script.
data_dir = '/root/src/models/research/attention_ocr/python/datasets/data/demo/'
data_files = os.listdir(data_dir)

with open(data_dir + "0.jpg", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read())

## Some requests I tried ##
# predict_request = '{"examples": [{"inputs": ["%s"]}]}' % encoded_string
# predict_request = '{"examples": [{"inputs": "%s"}]}' % encoded_string
predict_request = '{"examples": [{"inputs": {"b64": ["%s"]}}]}' % encoded_string

r = requests.post('http://MY_IP_ADDR:8501/v1/models/aocr:classify', data=predict_request)
print(r.text)
Result.. "error": "Expected one or two output Tensors, found 17"
This is the first time using tensorflow/serving. I can't handle this error.
Please help this newbie.. Thanks in advance.
Thank you for reporting this issue. I filed a bug (#9264) on GitHub on your behalf. The issue is that the default signature includes all the endpoints that the model provides. If you want to use Serving's Classification API, we need to modify the export_model script to export just the two tensors expected by the Classification API (i.e., predictions and scores).
In the meantime, you can use the Predict API, which supports an arbitrary number of output tensors. Please note that when using the Predict API via gRPC you can specify output_filter, but the RESTful API does not have that option, so the response is pretty heavy, since it sends back all the attention masks and the raw image.
In case somebody else is trying to figure out how to run inference, here are the steps that worked for me.
Export the model:
wget http://download.tensorflow.org/models/attention_ocr_2017_08_09.tar.gz
tar -xzvf attention_ocr_2017_08_09.tar.gz
python model_export.py --checkpoint=model.ckpt-399731 \
--export_dir=/tmp/aocr_b1 --batch_size=1
Note that --batch_size=1 is needed due to a bug in model_export.py. I'll take care of it when I send the PR for the signature issue.
Run the docker container for serving the model.
sudo docker run -t --rm -p 8501:8501 \
-v /tmp/aocr_b1:/models/aocr/1 -e MODEL_NAME=aocr tensorflow/serving
Please note that the path needs to contain a version number /models/aocr/1. If you don't append /1 the server complains that it could not find any versions.
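For reference (this layout is inferred from the commands above rather than stated in the original answer), the mounted export directory ends up being served as version 1 inside the container:

/models/aocr/
└── 1/                  # the mounted /tmp/aocr_b1 export directory
    ├── saved_model.pb
    └── variables/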
Run the script
python send_serving_request.py --image_file=testdata/fsns_train_00.png
Here are the results
Prediction: Rue de la Gare░░░░░░░░░░░░░░░░░░░░░░░
Confidence: 0.899479449
Here is the code:
send_serving_request.py
from absl import app
from absl import flags
import base64
import json
import os
from PIL import Image
import numpy as np
import requests
import tensorflow as tf

flags.DEFINE_string('image_file', None,
                    'Name of file containing image to request.')


def create_serialized_tf_example(image):
    """Create a serialized tf.Example proto for feeding the model."""
    example = tf.train.Example()
    example.features.feature['image/encoded'].float_list.value.extend(
        list(np.reshape(image, (-1))))
    return example.SerializeToString()


def main(_):
    pil_image = Image.open(flags.FLAGS.image_file)
    encoded_string = base64.b64encode(
        create_serialized_tf_example(np.asarray(pil_image)))
    predict_request = (
        b'{"instances": [{"inputs": {"b64": "%s"}}]}') % encoded_string
    r = requests.post(
        'http://localhost:8501/v1/models/aocr:predict', data=predict_request)
    data = json.loads(r.text)
    print('Prediction:', data['predictions'][0]['predicted_text'])
    print('Confidence:', data['predictions'][0]['normalized_seq_conf'])


if __name__ == '__main__':
    flags.mark_flag_as_required('image_file')
    app.run(main)
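As noted above, the gRPC Predict API additionally supports output_filter, which the REST API lacks. Below is a minimal sketch of that variant; it is my own addition, assuming the tensorflow-serving-api package is installed and the container's gRPC port (8500) is published, and the input and output tensor names ('inputs', 'predicted_text') are assumptions based on the responses shown above.

import grpc
import numpy as np
import tensorflow as tf
from PIL import Image
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# build the same serialized tf.Example as in send_serving_request.py above
image = np.asarray(Image.open('testdata/fsns_train_00.png'))
example = tf.train.Example()
example.features.feature['image/encoded'].float_list.value.extend(
    list(np.reshape(image, (-1))))

channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'aocr'
request.inputs['inputs'].CopyFrom(
    tf.make_tensor_proto([example.SerializeToString()], dtype=tf.string))
# ask only for the predicted text instead of all 17 output tensors
request.output_filter.append('predicted_text')

response = stub.Predict(request, 10.0)  # 10-second timeout
print(response.outputs['predicted_text'])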

Dataflow Error: 'Clients have non-trivial state that is local and unpickleable'

I have a pipeline that I can execute locally without any errors. I used to get this error in my locally run pipeline
'Clients have non-trivial state that is local and unpickleable.'
PicklingError: Pickling client objects is explicitly not supported.
I believe I fixed this by downgrading to apache-beam==2.3.0, and after that it ran perfectly locally.
Now I am using DataflowRunner and in the requirements.txt file I have the following dependencies
apache-beam==2.3.0
google-cloud-bigquery==1.1.0
google-cloud-core==0.28.1
google-cloud-datastore==1.6.0
google-cloud-storage==1.10.0
protobuf==3.5.2.post1
pytz==2013.7
but I get this dreaded error again
'Clients have non-trivial state that is local and unpickleable.'
PicklingError: Pickling client objects is explicitly not supported.
Why is it giving me this error with DataflowRunner but not DirectRunner? Shouldn't they be using the same dependencies/environment?
Any help would be appreciated.
I had read that the following is the way to solve it, but when I try it I still get the same error:
class MyDoFn(beam.DoFn):
    def start_bundle(self, process_context):
        self._dsclient = datastore.Client()

    def process(self, context, *args, **kwargs):
        # do stuff with self._dsclient
        pass
from https://github.com/GoogleCloudPlatform/google-cloud-python/issues/3191
My previous reference post where I fixed this locally:
Using start_bundle() in apache-beam job not working. Unpickleable storage.Client()
Thanks in advance!
Initializing unpickleable clients in the start_bundle method is a correct approach, and Beam IOs often follow that pattern; see datastoreio.py as an example.
Below is a pipeline that does a simple operation with a GCS Python client in a DoFn. I ran it on Apache Beam 2.16.0 without issues. If you can still reproduce your issue, please provide additional details.
gcs_client.py file:
import argparse
import logging
import time

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud import storage


class MyDoFn(beam.DoFn):
    def start_bundle(self):
        self.storage_client = storage.Client()

    def process(self, element):
        bucket = self.storage_client.get_bucket("existing-gcs-bucket")
        blob = bucket.blob(str(int(time.time())))
        blob.upload_from_string("payload")
        return element


logging.getLogger().setLevel(logging.INFO)
_, options = argparse.ArgumentParser().parse_known_args()
pipeline_options = PipelineOptions(options)
p = beam.Pipeline(options=pipeline_options)
_ = p | beam.Create([None]) | beam.ParDo(MyDoFn())
p.run().wait_until_finish()
requirements.txt file:
google-cloud-storage==1.23.0
command line:
python -m gcs_client \
--project=insert_your_project \
--runner=DataflowRunner \
--temp_location gs://existing-gcs-bucket/temp/ \
--requirements_file=requirements.txt \
--save_main_session
I've had a similar issue when making Dataflow write a bunch of rows to Bigtable. Setting save_main_session to False seems to have solved it.
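For reference, here is a minimal sketch of setting that option explicitly in code rather than on the command line; this is my own addition, and whether it helps depends on what your main module imports (save_main_session lives on Beam's SetupOptions):

from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

pipeline_options = PipelineOptions()  # or PipelineOptions(argv) as in the example above
pipeline_options.view_as(SetupOptions).save_main_session = False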

How can I push AWS CodeCommit to S3 using Lambda?

Python is my preferred language but any supported by Lambda will do.
-- All AWS Architecture --
I have Prod, Beta, and Gamma branches and corresponding folders in S3. I am looking for a method to have Lambda respond to a CodeCommit trigger and, based on the branch that triggered it, clone the repo and place the files in the appropriate S3 folder.
S3://Example-Folder/Application/Prod
S3://Example-Folder/Application/Beta
S3://Example-Folder/Application/Gamma
I tried to utilize GitPython but it does not work because Lambda does not have Git installed on the base Lambda AMI and GitPython depends on it.
I also looked through the Boto3 docs, but only repository-management operations seemed to be available; it was not able to return the project files.
Thank you for the help!
The latest version of the boto3 CodeCommit client includes the methods get_differences and get_blob.
You can get all the content of a CodeCommit repository using these two methods (at least if you are not interested in retaining the .git history).
The script below takes all the content of the master branch and adds it to a tar file. Afterwards you can upload it to S3 however you please.
You can run this as a Lambda function, which can be invoked when you push to CodeCommit.
This works with the current Lambda Python 3.6 environment:
botocore==1.5.89
boto3==1.4.4
import boto3
import pathlib
import tarfile
import io
import sys


def get_differences(repository_name, branch="master"):
    response = codecommit.get_differences(
        repositoryName=repository_name,
        afterCommitSpecifier=branch,
    )
    differences = response.get("differences", [])
    while "nextToken" in response:
        response = codecommit.get_differences(
            repositoryName=repository_name,
            afterCommitSpecifier=branch,
            nextToken=response["nextToken"],
        )
        differences += response.get("differences", [])
    return differences


if __name__ == "__main__":
    repository_name = sys.argv[1]
    codecommit = boto3.client("codecommit")
    repository_path = pathlib.Path(repository_name)

    buf = io.BytesIO()
    with tarfile.open(None, mode="w:gz", fileobj=buf) as tar:
        for difference in get_differences(repository_name):
            blobid = difference["afterBlob"]["blobId"]
            path = difference["afterBlob"]["path"]
            mode = difference["afterBlob"]["mode"]  # noqa
            blob = codecommit.get_blob(
                repositoryName=repository_name, blobId=blobid)
            tarinfo = tarfile.TarInfo(str(repository_path / path))
            tarinfo.size = len(blob["content"])
            tar.addfile(tarinfo, io.BytesIO(blob["content"]))
    tarobject = buf.getvalue()
    # save to s3
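The final "save to s3" step is left open above. Here is a minimal sketch of it using boto3's put_object and the bucket/folder layout from the question; the variables repository_name and tarobject come from the script above, while the branch-to-prefix mapping and object key are hypothetical.

s3 = boto3.client("s3")

# hypothetical mapping from the triggering branch to the S3 folders in the question
branch_to_prefix = {
    "Prod": "Application/Prod",
    "Beta": "Application/Beta",
    "Gamma": "Application/Gamma",
}
branch = "Prod"  # in a Lambda handler this would come from the CodeCommit trigger event

s3.put_object(
    Bucket="Example-Folder",
    Key="{}/{}.tar.gz".format(branch_to_prefix[branch], repository_name),
    Body=tarobject,
)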
Looks like LambCI does exactly what you want.
Unfortunately, CodeCommit currently doesn't have an API to upload the repository to an S3 bucket. However, if you are open to trying out CodePipeline, you can configure AWS CodePipeline to use a branch in an AWS CodeCommit repository as the source stage for your code. That way, when you make changes to your selected tracking branch, an archive of the repository at the tip of that branch will be delivered to your CodePipeline bucket. For more information about CodePipeline, please refer to the following link:
http://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-simple-codecommit.html
