Cloud9 and SageMaker - hyperparameter optimisation - Python

I have done quite a few Google searches but have not found a clear answer to the following use case. Basically, I would rather use Cloud9 (most of the time) as my IDE rather than Jupyter. What I am confused about is how I could execute long-running jobs like (Bayesian) hyperparameter optimisation from there. Can I use SageMaker capabilities? Should I use Docker and deploy to ECR (looking for the cheapest-ish option)? Any pointers w.r.t. this particular issue would be very much appreciated. Thanks.

You could use whatever IDE you choose (including your laptop).
A SageMaker tuning job (example) is asynchronous, so you can safely close your IDE after launching it. You can monitor the job in the AWS web console, or with a DescribeHyperParameterTuningJob API call.
You can launch TensorFlow, PyTorch, XGBoost, Scikit-learn, and other popular ML frameworks, using one of the built-in framework containers, avoiding the extra work of bringing your own container.
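For illustration, here is a rough sketch of kicking off a Bayesian tuning job from Cloud9 (or any IDE) with the SageMaker Python SDK. The role ARN, S3 paths, entry-point script, metric regex, and hyperparameter range below are placeholders you would adapt to your own training code:

    # Sketch: launch an asynchronous SageMaker hyperparameter tuning job and return.
    import boto3
    import sagemaker
    from sagemaker.sklearn import SKLearn
    from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

    session = sagemaker.Session()
    role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder role ARN

    estimator = SKLearn(
        entry_point="train.py",            # your training script (placeholder)
        framework_version="1.2-1",
        instance_type="ml.m5.large",
        instance_count=1,
        role=role,
        sagemaker_session=session,
    )

    tuner = HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="validation:accuracy",
        metric_definitions=[{"Name": "validation:accuracy",
                             "Regex": "validation accuracy: ([0-9\\.]+)"}],
        hyperparameter_ranges={"alpha": ContinuousParameter(0.001, 1.0)},
        strategy="Bayesian",
        max_jobs=20,
        max_parallel_jobs=2,
    )

    # wait=False returns immediately; the job keeps running after the IDE is closed.
    tuner.fit({"train": "s3://my-bucket/train"}, wait=False)
    job_name = tuner.latest_tuning_job.name

    # Later, from anywhere, check progress via the DescribeHyperParameterTuningJob API.
    status = boto3.client("sagemaker").describe_hyper_parameter_tuning_job(
        HyperParameterTuningJobName=job_name
    )["HyperParameterTuningJobStatus"]

The compute runs entirely on SageMaker-managed training instances, so the Cloud9 environment itself can stay small and cheap.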

Related

Running Jupyter Notebook in GCP on a Schedule

What is the best way to migrate a Jupyter notebook into Google Cloud Platform?
Requirements
I don't want to make a lot of changes to the notebook to get it to run
I want it to be schedulable, preferably through the UI
I want it to be able to run an .ipynb file, not a .py file
In AWS, SageMaker seems like the no-brainer solution for this. I want the tool in GCP that gets as close to this specific task without a lot of extras.
I've tried the following:
Cloud Functions: seems best suited to running Python scripts rather than notebooks, and requires you to provide a main.py file by default
Dataproc: it seems you can add a notebook to a running cluster, but the notebook itself cannot be scheduled
Dataflow: seemed like overkill; it didn't feel like the right tool and is better suited to Apache (Beam)-based pipelines
I feel like this should be easier. I found this article on the subject:
How to Deploy and Schedule Jupyter Notebook on Google Cloud Platform
The author doesn't actually do what the title says; he moves a lot of GCP code into a main.py that creates an instance and then has the instance execute the notebook.
Feel free to correct my perspective on any of this
I use Vertex AI Workbench to run notebooks on GCP. It provides two variants:
Managed Notebooks
User-managed Notebooks
User-managed Notebooks creates compute instances in the background, comes with pre-built packages such as JupyterLab, Python, etc., and allows customisation. I mainly use it for developing Dataflow pipelines.
As for the other requirement, scheduling: Managed Notebooks supports this feature; refer to this documentation (I have yet to try Managed Notebooks):
Use the executor to run a notebook file as a one-time execution or on a schedule. Choose the specific environment and hardware that you want your execution to run on. Your notebook's code will run on Vertex AI custom training, which can make it easier to do distributed training, optimize hyperparameters, or schedule continuous training jobs. See Run notebook files with the executor.

You can use parameters in your execution to make specific changes to each run. For example, you might specify a different dataset to use, change the learning rate on your model, or change the version of the model.

You can also set a notebook to run on a recurring schedule. Even while your instance is shut down, Vertex AI Workbench will run your notebook file and save the results for you to look at and share with others.
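As an aside, if you ever want to run the .ipynb yourself outside the Workbench UI (for example from a small scheduled job), papermill can execute a notebook file with injected parameters in much the same spirit as the executor's parameterisation. The paths and parameter names below are placeholders, and the notebook needs a cell tagged "parameters" for the injection to take effect:

    # Sketch: execute an .ipynb directly with papermill, passing run-specific parameters.
    import papermill as pm

    pm.execute_notebook(
        "analysis.ipynb",              # input notebook (placeholder)
        "analysis-output.ipynb",       # executed copy with outputs saved
        parameters={"dataset": "gs://my-bucket/data.csv", "learning_rate": 0.01},
    )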

How to circumvent AWS package and ephemeral limits for large packages + large models

We have a production scenario with users invoking expensive NLP functions that run for short periods of time (say 30s). Because of the high load and intermittent usage, we're looking into Lambda function deployment. However, our packages are big.
I'm trying to fit AllenNLP in a Lambda function, which in turn depends on PyTorch, SciPy, spaCy, NumPy and a few other libs.
What I've tried
Following the recommendations made here and the example here, tests and additional files are removed. I also use a non-CUDA version of PyTorch, which gets its size down. I can package an AllenNLP deployment down to about 512 MB. Currently, this is still too big for AWS Lambda.
Possible fixes?
I'm wondering if anyone has experience with one of the following potential pathways:
Cutting PyTorch out of AllenNLP. Without PyTorch, we're within reach of getting it to 250 MB. We only need to load archived models in production, but that does seem to use some of the PyTorch infrastructure. Maybe there are alternatives?
Invoking PyTorch in (a fork of) AllenNLP as a second lambda function.
Using S3 to deliver some of the dependencies: symlinking some of the larger .so files and serving them from an S3 bucket might help. This does create an additional problem: the Semantic Role Labelling we're using from AllenNLP also requires some language models of around 500 MB, for which the ephemeral storage could be used, but maybe these can be streamed directly into RAM from S3? (A rough sketch of the S3-to-memory idea follows.)
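On that last point, a rough sketch of pulling a model archive from S3 straight into memory with boto3 (the bucket and key are placeholders, and the Lambda needs enough memory configured to hold the object):

    # Sketch: stream a model archive from S3 into RAM instead of ephemeral storage.
    import io
    import boto3

    s3 = boto3.client("s3")

    def load_model_bytes(bucket="my-model-bucket", key="srl-model.tar.gz"):
        obj = s3.get_object(Bucket=bucket, Key=key)
        return io.BytesIO(obj["Body"].read())  # file-like object, never touches /tmp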
Maybe I'm missing an easy solution. Any direction or experiences would be much appreciated!
You could deploy your models to SageMaker inside AWS, and call SageMaker from Lambda (Lambda -> SageMaker) to avoid having to load very large packages inside a Lambda function.
Architecture explained here - https://aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/
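A minimal sketch of the Lambda side of that pattern, assuming the model is already hosted on a SageMaker endpoint (the endpoint name and payload format are placeholders):

    # Sketch: Lambda handler that forwards a request to a SageMaker endpoint.
    import json
    import boto3

    runtime = boto3.client("sagemaker-runtime")

    def lambda_handler(event, context):
        response = runtime.invoke_endpoint(
            EndpointName="allennlp-srl-endpoint",           # placeholder
            ContentType="application/json",
            Body=json.dumps({"sentence": event.get("sentence", "")}),
        )
        return json.loads(response["Body"].read())

The Lambda then only needs boto3 (already present in the runtime), so the 250 MB package limit stops being an issue.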

Importing libraries in AWS Lambda function code from S3 bucket

I have to deploy a Deep Learning model on AWS Lambda which does object detection. It is triggered when an image is added to the S3 bucket.
The issue that I'm facing is that the Lambda function code uses a lot of libraries like Tensorflow, PIL, Numpy, Matplotlib, etc. and if I try adding all of them in the function code or as layers, it exceeds the 250 MB size limit.
Is there any way I can deploy the libraries zip file on S3 bucket and use them from there in the function code (written in Python 3.6) instead of directly having them as a part of the code?
I can also try some entirely different approach for this.
It sounds to me like Amazon SageMaker would be a better choice for your task.
You can create a model and host it on an endpoint, all via SageMaker. Then use a Lambda function triggered by your S3 upload to pass the image to your SageMaker endpoint and process the result.
Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.
https://aws.amazon.com/sagemaker/
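For concreteness, a sketch of such a Lambda, assuming it is wired to the bucket's object-created event and that the model is served from an image-accepting SageMaker endpoint (the endpoint name and content type are assumptions):

    # Sketch: Lambda triggered by an S3 upload, sending the image to a SageMaker
    # endpoint for object detection.
    import json
    import boto3

    s3 = boto3.client("s3")
    runtime = boto3.client("sagemaker-runtime")

    def lambda_handler(event, context):
        record = event["Records"][0]["s3"]
        bucket, key = record["bucket"]["name"], record["object"]["key"]

        image_bytes = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        response = runtime.invoke_endpoint(
            EndpointName="object-detection-endpoint",   # placeholder
            ContentType="application/x-image",
            Body=image_bytes,
        )
        return {"key": key, "detections": json.loads(response["Body"].read())}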
Try pre-compiled packages for TensorFlow; they are about 50 MB, so you would probably have enough space left for the rest of the modules you need. In general, also check the AWS recommendations on deep learning architecture approaches.
Ultimately, I changed my approach from using Lambda to using EC2.
I deployed the whole codebase with its libraries on an EC2 instance and then triggered it from Lambda. On EC2, it can also be deployed behind an Apache server to handle the port mapping.
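In case it helps anyone following the same route, the Lambda-side trigger can be as simple as an HTTP call to the service running on the instance; the URL and payload shape below are placeholders:

    # Sketch: Lambda forwarding the S3 object reference to a detection service on EC2.
    import json
    import urllib.request

    def lambda_handler(event, context):
        record = event["Records"][0]["s3"]
        payload = json.dumps({
            "bucket": record["bucket"]["name"],
            "key": record["object"]["key"],
        }).encode("utf-8")

        req = urllib.request.Request(
            "http://my-ec2-host.example.com/detect",   # placeholder endpoint on EC2
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read())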

Run multiple Python scripts in Azure (using Docker?)

I have a Python script that consumes an Azure queue, and I would like to scale this easily inside Azure infrastructure. I'm looking for the easiest solution possible to
run the Python script in an environment that is as managed as possible
have a centralized way to see the scripts running and their output, and easily scale the number of scripts running through a GUI or something very easy to use
I'm looking at Docker at the moment, but this seems very complicated for the extremely simple task I'm trying to achieve. What possible approaches are known to do this? An added bonus would be if I could scale with respect to the number of items on the queue, but it would be fine if we could just manually control the amount of parallelism.
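For reference, a consumer like the one described might look roughly like this with the azure-storage-queue SDK, assuming an Azure Storage queue (the connection string variable and queue name are placeholders):

    # Sketch: worker loop that polls an Azure Storage queue and processes messages.
    import os
    import time
    from azure.storage.queue import QueueClient

    queue = QueueClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"], "work-items"
    )

    def process(body: str) -> None:
        print(f"processing: {body}")        # replace with the real work

    while True:
        for msg in queue.receive_messages(messages_per_page=16):
            process(msg.content)
            queue.delete_message(msg)       # delete only after successful processing
        time.sleep(5)                       # back off when the queue is empty

Scaling then comes down to how many copies of this loop are running, which is what the options below address.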
You should have a look at Azure Web Apps, which also support Python.
This would be a managed and scalable environment that also supports background tasks (WebJobs) with central logging.
Azure Web Apps also offer a free plan for development and testing.
In my experience, CoreOS on Azure can satisfy your needs. See the doc https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-coreos-how-to/ to learn how to get started.
CoreOS is a Linux distribution built for running applications as Docker (Linux) containers, and you can access it remotely via an SSH client such as PuTTY. To get started with Docker, search for a Docker tutorial to quickly pick up the simple usage that is enough for running Python scripts.
Sounds to me like you are describing something like a microservices architecture. From that perspective, Docker is a great choice. I recommend you consider using an orchestration framework such as Apache Mesos or Docker Swarm, which will allow you to run your containers on a cluster of VMs with the ability to easily scale, deploy new versions, roll back and implement load balancing. The schedulers Mesos supports (Marathon and Chronos) also have a web UI. I believe you can also implement some kind of triggered scaling like you describe, but it will probably not be available off the shelf.
This does seem like a bit of a learning curve but I think is worth it especially once you start considering the complexities of deploying new versions (with possible rollbacks), monitoring failures and even integrating things like Jenkins and continuous delivery.
For Azure, an easy way to deploy and configure a Mesos or Swarm cluster is by using Azure Container Service (ACS) which does all the hard work of configuring the cluster for you. Find additional info here: https://azure.microsoft.com/en-us/documentation/articles/container-service-intro/

How to build a web service with one sandboxed Python (VM) per request

As part of an effort to make the scikit-image examples gallery interactive, I would like to build a web service that receives a Python code snippet, executes it, and provides me with the generated output image.
For safety, the Python instances launched should be sandboxed and resource controlled, so I was thinking of using LXC containers.
Is this a good way to approach the problem? If so, what is the recommended way of launching one Python VM per request?
Stefan, perhaps "Docker" could be of use? I get the impression that you could constrain the VM that the application is run in -- an example web service:
http://docs.docker.io/en/latest/examples/python_web_app/
You could try running the application on Digital Ocean, like so:
https://www.digitalocean.com/community/articles/how-to-install-and-use-docker-getting-started
[disclaimer: I'm an engineer at Continuum working on Wakari]
Wakari Enterprise (http://enterprise.wakari.io) is aiming to do exactly this, and we're hoping to back-port the functionality into Wakari Cloud (http://wakari.io) so "published" IPython Notebooks can have some knobs on them for variable input control, then they can be "invoked" in a sandboxed state, and then the output given back to the user.
However for things that exist now, you should look at Sage Notebook. A few years ago several people worked hard on a Sage Notebook Cell Server that could do exactly what you were asking for: execute small code snippets. I haven't followed it since then, but it seems it is still alive and well from a quick search:
http://sagecell.sagemath.org/?q=ejwwif
http://sagecell.sagemath.org
http://www.sagemath.org/eval.html
For the last URL, check out Graphics->Mandelbrot and you can see that Sage already has some great capabilities for UI widgets that are tied to the "cell execution".
I think Docker is the way to go for this. The instances are very lightweight, and Docker is designed to spawn hundreds of instances at a time (spin-up time is a fraction of a second, versus a couple of seconds for traditional VMs). Configured correctly, I believe it also gives you a completely sandboxed environment, so you don't have to worry about trying to sandbox Python itself :-D
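As a rough illustration of that idea with the Docker SDK for Python (the image, resource limits, and snippet are placeholders, and real sandboxing would need more hardening than this):

    # Sketch: run an untrusted snippet in a short-lived, resource-limited container
    # and capture its stdout.
    import docker

    client = docker.from_env()
    snippet = "print(2 + 2)"       # code received from the web request (placeholder)

    output = client.containers.run(
        "python:3.11-slim",
        ["python", "-c", snippet],
        mem_limit="256m",          # cap memory
        nano_cpus=500_000_000,     # roughly half a CPU
        network_disabled=True,     # no network access from the snippet
        remove=True,               # clean up the container afterwards
    )
    print(output.decode())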
I'm not sure if you really have to go as far as setting up LXC containers:
There is seccomp-nurse, a Python sandbox that leverages the seccomp feature of the Linux kernel.
Another option would be to use PyPy, which has explicit support for sandboxing out of the box.
In any case, do not use pysandbox, it is broken by design and has severe security risks.
