Deploy TensorFlow model to server? - python

I am trying to deploy a Python ML app (made using Streamlit) to a server. This app essentially loads a NN model that I previously trained and makes classification predictions using this model.
The problem I am running into is that because TensorFlow is such a large package (at least 150MB for the latest tensorflow-cpu version) the hosting service I am trying to use (Heroku) keeps telling me that I exceed the storage limit of 300MB.
I was wondering if anyone else had similar problems or an idea of how to fix/get around this issue?
What I've tried so far
I've already tried replacing the tensorflow requirement with tensorflow-cpu which did significantly reduce the size, but it was still too big so -
I also tried downgrading the tensorflow-cpu version to tensorflow-cpu==2.1.0 which finally worked but then I ran into issues on model.load() (which I think might be related to the fact that I downgraded the tf version since it works fine locally)

I've faced the same problem last year. I know this does not answer your Heroku specific question, but my solution was to use Docker with AWS Beanstalk. It worked out cheaper than Heroku and I had less issues with deployment. I can guide on how to do this if you are interested

You might have multiple modules downloaded. I would recommend you to open file explorer and see the actual directory of the downloaded modules.

Related

MWAA: Trouble installing Airflow Google Providers from requirements.txt

I'm trying to set up an MWAA Airflow 2.0 environment that integrates S3 and GCP's Pub/Sub. While we have no problems with the environment being initialized, we're having trouble installing some dependencies and importing Python packages -- specifically apache-airflow-providers-google==2.2.0.
We've followed all of the instructions based on the official MWAA Python documentation. We already included the constraints file as prescribed by AWS, activated all Airflow logging configs, and tested the requirements.txt file using the MWAA local runner. The result when updating our MWAA environment's requirements would always be like this
When testing using the MWAA local runner, we observed that using the requirements.txt file with the constraints still takes forever to resolve. Installation takes more than 10-30 minutes which is no good.
As an experiment, we tried using a version of the requirements.txt file that omits the constraints and pinned versioning. Doing so installs the packages successfully and we don't receive import errors anymore on both MWAA local runner and our MWAA environment itself. However, all of our dags will fail to run no matter what. Airflow logs are also inaccessible whenever we do this.
The team and I have been trying to get MWAA environments up and running for our different applications and ETL pipelines but we just can't seem to get things to work smoothly. Any help would be appreciated!
I'm having the same problems and in the end we had to refactor a lot of things to remove the dependence. It looks like is a problem with PIP resolver and apache-airflow-providers-google if you look the official page:
https://pypi.org/project/apache-airflow-providers-google/2.0.0rc1/
In the WORST case, you may need to use Airflow direct on EC2 from docker image and abandon MWAA :(
I've been through similar issues but with different packages. There are certain things you need to take into consideration when using MWAA. I didn't have any issue testing the packages on the local runner then on MWAA using a public VPC, I only had issues when using a private VPC as the web server doesn't have an internet connection, so the method to get the packages to MWAA is different.
Things to take into consideration:
The version of the packages; test on the local runner if you can first
Enable the logs; The scheduler and web server logs can show you issues, but also they may not. The reason for this is Fargate serving the images, will try to roll back to a working state rather than have MWAA be in a non-working state. So, you might not see what the error actually is, it may even look like there were no errors in certain scenarios.
Check dependencies; You may need to download a package with pip download <package>==version. There you can inspect the contents of the .whl file and see if there are any dependencies. You may have extra notes that can point you in the right direction. In one case, using the Slack package wouldn't work until I also added the http package, even though Airflow includes this package.
So, yes it's serverless, and you may have an easy time installing/setting MWAA up, but be prepared to do a little investigation if it doesn't work. I did contact AWS support, but managed to solve it myself in the end. Other than trying the obvious things, only those that use MWAA frequently and have faced varying scenarios will be of any assistance.

Heroku Slug Size Too Large due to Keras

I'm deploying my app to heroku however their free tier only allow < 500MB packages, I'm currently using tensorflow and it takes > 500 MB, I can downgrade it to lower version or use the tensorflow-cpu instead, however, I'm also using Keras, which requires at least tensorflow=2.2.0 (so size > 500MB).
I have looked into these two questions but somehow downgrading Keras causes a lot of imcompatible issues.
Deploy python app to Heroku "Slug Size too large"
Error "Keras requires TensorFlow 2.2 or higher"
What I want to achieve is to be able to use Keras while the size of my packages < 500MB so that I can push my app to heroku

How to circumvent AWS package and ephemeral limits for large packages + large models

We have a production scenario with users invoking expensive NLP functions running for short periods of time (say 30s). Because of the high load and intermittent usage, we're looking into Lambda function deployment. However - our packages are big.
I'm trying to fit AllenNLP in a lambda function, which in turn depends on pytorch, scipy, spacy and numpy and a few other libs.
What I've tried
Following recommendations made here and the example here, tests and additional files are removed. I also use a non-cuda version of Pytorch which gets its' size down. I can package an AllenNLP deployment down to about 512mb. Currently, this is still too big for AWS Lambda.
Possible fixes?
I'm wondering if anyone of has experience with one of the following potential pathways:
Cutting PyTorch out of AllenNLP. Without Pytorch, we're in reach of getting it to 250mb. We only need to load archived models in production, but that does seem to use some of the PyTorch infrastructure. Maybe there are alternatives?
Invoking PyTorch in (a fork of) AllenNLP as a second lambda function.
Using S3 to deliver some of the dependencies: SIMlinking some of the larger .so files and serving them from an S3 bucket might help. This does create an additional problem: the Semnatic Role Labelling we're using from AllenNLP also requires some language models of around 500mb, for which the ephemeral storage could be used - but maybe these can be streamed directly into RAM from S3?
Maybe i'm missing an easy solution. Any direction or experiences would be much appreciated!
You could deploy your models to SageMaker inside of AWS, and run Lambda -> Sagemaker to avoid having to load up very large functions inside of a Lambda.
Architecture explained here - https://aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/

Could not understand Sagemaker code in logistic regression model

I started working on sagemaker recently and I'm trying to understand what each line of code does in sagemaker examples.
I'm stuck at following code. I'm working on logistic regression of bank data.
from sagemaker.amazon.amazon_estimator import get_image_uri
Can anyone explain the what get_image_uri does?
Also can anyone share a link or something where each line of code related to sagemaker is explained.
unfortunately I can't do much better than the source code, which says:
Return algorithm image URI for the given AWS region, repository name, and repository version
the link by PV8 has demo code, but it's basically getting a HTTPS URL that points to a "disk drive" image that is then used by AWS to spin up a new EC2 container with Jupyter configured and running
Amazon SageMaker is designed to be open and extensible, and it is using Docker images as the way to communicate between the development (notebooks), training and tuning, and finally hosting for real-time and batch prediction.
When you want to submit a training job, for example, you need to point the docker image that is holding the algorithm and pre/post-processing code that you want to execute as part of your training.
Amazon SageMaker is providing a set of built-in algorithms that you can use out of the box to train models in scale (mostly optimized for distributed training). These algorithms are identified by their name, and the above line of python code is mapping between the name and the URI of the docker image that Amazon provided in the container registry service - ECR.
It's because of a deprecation in the latest version of Amazon packages.
Just force the use of previous versions, by adding to the very beginning of the notebook:
import sys
!{sys.executable} -m pip install -qU awscli boto3 "sagemaker>=1.71.0,<2.0.0"
Now when loading the method you want:
from sagemaker.amazon.amazon_estimator import get_image_uri
you will just get a deprecation warning, but the code works fine anyway:
'get_image_uri' method will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
Cheers

Tensorflow code on google cloud platform

I am facing a strange issue. I have a working code for a very simple neural network. I am running it on my laptop. Kind of slow but ok. I then created a 24 cores instance (Linux) on google cloud, and run the same code. It seems to take almost the same time. I expected to be a lot faster. Any idea why this could be the case? I am using a standard, vanilla pip installation of cpu tensorflow. Nothing fancy.
Would appreciate any ideas...
Best, Umberto

Categories

Resources