I started working with SageMaker recently and I'm trying to understand what each line of code does in the SageMaker examples.
I'm stuck at the following line. I'm working on logistic regression on bank data.
from sagemaker.amazon.amazon_estimator import get_image_uri
Can anyone explain what get_image_uri does?
Also, can anyone share a link or something where each line of SageMaker-related code is explained?
Unfortunately, I can't do much better than the source code, which says:
Return algorithm image URI for the given AWS region, repository name, and repository version
the link by PV8 has demo code, but it's basically getting an HTTPS URI that points to a "disk drive"-style (Docker) image that AWS then uses to spin up a container for your training or hosting job
Amazon SageMaker is designed to be open and extensible, and it uses Docker images as the way to move between development (notebooks), training and tuning, and finally hosting for real-time and batch prediction.
When you want to submit a training job, for example, you need to point to the Docker image that holds the algorithm and the pre/post-processing code you want to execute as part of your training.
Amazon SageMaker provides a set of built-in algorithms that you can use out of the box to train models at scale (mostly optimized for distributed training). These algorithms are identified by their name, and the above line of Python code maps that name to the URI of the Docker image that Amazon provides in its container registry service, ECR.
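For example, this is roughly how the built-in XGBoost image would be resolved (a minimal sketch for SDK v1.x; the algorithm name and the printed URI are only illustrative):
import boto3
from sagemaker.amazon.amazon_estimator import get_image_uri

# Resolve the built-in algorithm name to the ECR image URI for the current region
region = boto3.Session().region_name
container = get_image_uri(region, 'xgboost')
print(container)  # e.g. 811284229777.dkr.ecr.us-east-1.amazonaws.com/xgboost:1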
It's because of a deprecation in the latest version of the SageMaker Python SDK.
Just force the use of a previous version by adding this to the very beginning of the notebook:
import sys
!{sys.executable} -m pip install -qU awscli boto3 "sagemaker>=1.71.0,<2.0.0"
Now, when importing the function you want:
from sagemaker.amazon.amazon_estimator import get_image_uri
you will just get a deprecation warning, but the code works fine anyway:
'get_image_uri' method will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
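If you would rather not pin to v1, the SDK v2 replacement is the image_uris module; a rough equivalent (the framework name and version here are just illustrative):
import boto3
from sagemaker import image_uris

# SageMaker Python SDK v2 way of resolving a built-in algorithm/framework image
container = image_uris.retrieve(framework='xgboost',
                                region=boto3.Session().region_name,
                                version='1.2-1')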
Cheers
I have done quite a few Google searches but have not found a clear answer to the following use case. Basically, I would rather use Cloud9 (most of the time) as my IDE rather than Jupyter. What I am not sure about is how I could execute long-running jobs like (Bayesian) hyperparameter optimisation from there. Can I use SageMaker capabilities? Should I use Docker and deploy to ECR (looking for the cheapest-ish option)? Any pointers w.r.t. this particular issue would be very much appreciated. Thanks.
You could use whatever IDE you choose (including your laptop).
A SageMaker tuning job (example) is asynchronous, so you can safely close your IDE after launching it. You can monitor the job in the AWS web console, or with a DescribeHyperParameterTuningJob API call.
You can launch TensorFlow, PyTorch, XGBoost, Scikit-learn, and other popular ML frameworks, using one of the built-in framework containers, avoiding the extra work of bringing your own container.
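As a minimal sketch of how that looks from any IDE (the IAM role, S3 paths, metric and hyperparameter ranges below are placeholders, not a working recipe):
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

session = sagemaker.Session()
role = 'arn:aws:iam::123456789012:role/MySageMakerRole'  # placeholder IAM role
image = image_uris.retrieve('xgboost', session.boto_region_name, '1.2-1')

estimator = Estimator(image_uri=image, role=role,
                      instance_count=1, instance_type='ml.m5.xlarge',
                      output_path='s3://my-bucket/output',  # placeholder bucket
                      sagemaker_session=session)
estimator.set_hyperparameters(objective='binary:logistic', num_round=100)

tuner = HyperparameterTuner(estimator,
                            objective_metric_name='validation:auc',
                            hyperparameter_ranges={'eta': ContinuousParameter(0.01, 0.3)},
                            max_jobs=10, max_parallel_jobs=2)

# wait=False returns immediately, so you can close the IDE after launching
tuner.fit({'train': 's3://my-bucket/train', 'validation': 's3://my-bucket/validation'},
          wait=False)
print(tuner.latest_tuning_job.job_name)  # use this name to monitor the job later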
I am trying to deploy a Python ML app (made using Streamlit) to a server. This app essentially loads a NN model that I previously trained and makes classification predictions using this model.
The problem I am running into is that because TensorFlow is such a large package (at least 150MB for the latest tensorflow-cpu version) the hosting service I am trying to use (Heroku) keeps telling me that I exceed the storage limit of 300MB.
I was wondering if anyone else had similar problems or an idea of how to fix/get around this issue?
What I've tried so far
I've already tried replacing the tensorflow requirement with tensorflow-cpu, which did significantly reduce the size, but it was still too big, so:
I also tried downgrading the tensorflow-cpu version to tensorflow-cpu==2.1.0, which finally worked, but then I ran into issues on model.load() (which I think might be related to the fact that I downgraded the tf version, since it works fine locally).
I faced the same problem last year. I know this does not answer your Heroku-specific question, but my solution was to use Docker with AWS Elastic Beanstalk. It worked out cheaper than Heroku and I had fewer issues with deployment. I can guide you on how to do this if you are interested.
You might have multiple copies of the modules downloaded. I would recommend opening a file explorer and checking the actual directory of the downloaded modules.
We have a production scenario with users invoking expensive NLP functions running for short periods of time (say 30s). Because of the high load and intermittent usage, we're looking into Lambda function deployment. However - our packages are big.
I'm trying to fit AllenNLP in a Lambda function, which in turn depends on PyTorch, SciPy, spaCy, NumPy and a few other libs.
What I've tried
Following the recommendations made here and the example here, tests and additional files are removed. I also use a non-CUDA version of PyTorch, which gets its size down. I can package an AllenNLP deployment down to about 512 MB. Currently, this is still too big for AWS Lambda.
Possible fixes?
I'm wondering if anyone has experience with one of the following potential pathways:
Cutting PyTorch out of AllenNLP. Without PyTorch, we're in reach of getting it to 250 MB. We only need to load archived models in production, but that does seem to use some of the PyTorch infrastructure. Maybe there are alternatives?
Invoking PyTorch in (a fork of) AllenNLP as a second lambda function.
Using S3 to deliver some of the dependencies: symlinking some of the larger .so files and serving them from an S3 bucket might help. This does create an additional problem: the Semantic Role Labelling we're using from AllenNLP also requires language models of around 500 MB, for which the ephemeral storage could be used, but maybe these can be streamed directly into RAM from S3?
Maybe I'm missing an easy solution. Any direction or experiences would be much appreciated!
You could deploy your models to SageMaker inside AWS and run Lambda -> SageMaker to avoid having to load very large packages inside a Lambda function.
Architecture explained here - https://aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/
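For reference, a hedged sketch of the Lambda side of that architecture; the endpoint name and JSON payload shape are hypothetical and would need to match your deployed model:
import json
import boto3

runtime = boto3.client('sagemaker-runtime')

def lambda_handler(event, context):
    # Assumed input shape: the event carries the text to analyse
    payload = json.dumps({'text': event['text']})
    response = runtime.invoke_endpoint(
        EndpointName='allennlp-srl-endpoint',   # hypothetical endpoint name
        ContentType='application/json',
        Body=payload,
    )
    return json.loads(response['Body'].read())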
I have to deploy a Deep Learning model on AWS Lambda which does object detection. It is triggered on addition of image in the S3 bucket.
The issue that I'm facing is that the Lambda function code uses a lot of libraries like TensorFlow, PIL, NumPy, Matplotlib, etc., and if I try adding all of them to the function code or as layers, it exceeds the 250 MB size limit.
Is there any way I can deploy the libraries zip file to an S3 bucket and use them from there in the function code (written in Python 3.6) instead of having them directly as part of the code?
I can also try an entirely different approach for this.
It sounds to me like Amazon SageMaker would be a better choice for your task.
You can create a model and host it on an endpoint, all via SageMaker. Then use a Lambda function triggered by your S3 upload to pass the image to your SageMaker endpoint and process the result.
Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.
https://aws.amazon.com/sagemaker/
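A rough sketch of that glue Lambda, assuming a hypothetical endpoint name and a model that accepts raw JPEG bytes:
import boto3

s3 = boto3.client('s3')
runtime = boto3.client('sagemaker-runtime')

def lambda_handler(event, context):
    # Pull the newly uploaded image from the S3 event and forward it to the endpoint
    record = event['Records'][0]['s3']
    obj = s3.get_object(Bucket=record['bucket']['name'], Key=record['object']['key'])
    response = runtime.invoke_endpoint(
        EndpointName='object-detection-endpoint',  # hypothetical endpoint name
        ContentType='image/jpeg',                  # assumes the model accepts raw JPEG bytes
        Body=obj['Body'].read(),
    )
    return response['Body'].read().decode('utf-8')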
Try out the pre-compiled packages for TensorFlow; they are about 50 MB, so you would probably have enough space for the rest of the modules you need. And in general, check the AWS recommendations on deep learning architecture approaches.
Ultimately, I changed my approach from using Lambda to using EC2.
I deployed the whole code with the libraries on an EC2 instance and then triggered it using Lambda. On EC2, it can also be deployed behind an Apache server to change the port mapping.
All,
(Environments: Windows 7, Python 3.6, Keras & tensorflow libs, gcloud ml engine)
I am running certain Keras ML model examples using gcloud ML Engine as introduced here. Everything was fine, but I got varying results across multiple runs even though I was using the same training and validation data. My goal is to get reproducible training results across multiple runs.
I googled it for a while and found some solutions in this Keras Q&A about making results reproducible. Basically, they first suggested this:
First, you need to set the PYTHONHASHSEED environment variable to 0 before the program starts (not within the program itself).
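Roughly, the rest of that recipe seeds the generators inside the program (a sketch for the TF 1.x / Keras setup above; the seed value is arbitrary):
import random
import numpy as np
import tensorflow as tf

# PYTHONHASHSEED still has to be set before the interpreter starts;
# these in-program seeds cover Python, NumPy and TensorFlow randomness.
random.seed(42)
np.random.seed(42)
tf.set_random_seed(42)  # in TF 2.x this would be tf.random.set_seed(42)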
I know I could set the variable locally on my own machine, or I could set it when I deploy a gcloud function as introduced here.
But I just do not know how to set environment variables when using gcloud ML Engine (on the server side, NOT locally). So I cannot set "PYTHONHASHSEED=0" on the gcloud server where my model programs run.
BTW, in general I know randomness is a useful property in the ML field, but I am not very familiar with the topic of making results reproducible yet, so any thoughts on this topic are also welcome. Thanks!
Daqi
PS:
I have tried setting the environment variable at runtime, as below:
import os
os.environ["PYTHONHASHSEED"] = "0"
print(hash("keras"))
But it does not have the same effect as setting the variable before the program starts. So with this code, I still cannot get the same hash results across multiple runs. On the other hand, locally, if I set "PYTHONHASHSEED=0" before running the code, I do get the same hash results.
I don't believe the Cloud ML Engine API provides a mechanism to set environment variables. However, you might be able to work around this by writing a wrapper script (NB: UNTESTED CODE):
# Wrapper script: copy the environment, fix the hash seed, then launch the
# real training script so PYTHONHASHSEED is set before its interpreter starts.
import os
import subprocess

env = os.environ.copy()
env["PYTHONHASHSEED"] = "0"
subprocess.check_call(['python', 'main.py'], env=env)