Adding python modules to AzureML workspace - python

I've been working recently on deploying a machine learning model as a web service. I used Azure Machine Learning Studio to create my Workspace ID and Authorization Token. Then I trained a LogisticRegressionCV model from sklearn.linear_model locally on my machine (using Python 2.7.13), and with the code snippet below I wanted to publish my model as a web service:
import numpy as np
from azureml import services

@services.publish('workspaceID', 'authorization_token')
@services.types(var_1=float, var_2=float)
@services.returns(int)
def predicting(var_1, var_2):
    input = np.array([var_1, var_2]).reshape(1, -1)
    return model.predict_proba(input)[0][1]
where the input variable holds the data to be scored and the model variable contains the trained classifier. Then, after defining the function above, I want to make a prediction on a sample input vector:
predicting.service(1.21, 1.34)
However, the following error occurs:
RuntimeError: Error 0085: The following error occurred during script
evaluation, please view the output log for more information:
The most important message in the log is:
AttributeError: 'module' object has no attribute 'LogisticRegressionCV'
The error is strange to me, because when I used the regular sklearn.linear_model.LogisticRegression everything was fine: I was able to make predictions by sending POST requests to the created endpoint, so sklearn itself seemed to work correctly.
After changing to LogisticRegressionCV it does not.
Therefore I want to update sklearn in my workspace.
Do you have any ideas how to do it? Or, more generally: how can I install any Python module in Azure Machine Learning Studio so that I can use the predict functions of any model I developed locally?
Thanks

For anyone who came across this question like I did while hoping to install modules in AzureML notebooks: the current environments sit on Conda on the compute instance, so it's now as simple as executing
!conda env list
# conda environments:
#
base * /anaconda
azureml_py36 /anaconda/envs/azureml_py36
!conda install -n azureml_py36 -y <packages>
from within the notebook environment, or doing pretty much the same (without the !) in the terminal.
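For example, a minimal sketch of installing one extra package into the azureml_py36 kernel from a notebook cell (scikit-learn is used here purely as an illustration, not something from the original post):
# Run in a notebook cell; the "!" shells out to the compute instance.
!conda install -n azureml_py36 -y scikit-learn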

For installing a Python module in Azure ML Studio, the Technical Notes section of the official Execute Python Script documentation describes the procedure.
The general steps are as follows.
Create a Python project via virtualenv and activate it.
Install all the packages you want via pip in that virtual environment.
Package all files and directories under the project's Lib\site-packages path as a zip file.
Upload the zip package into your Azure ML workspace as a dataset.
Follow the official document to import the Python module in your Execute Python Script module (see the sketch below).
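As a rough, hedged sketch of what the script inside Execute Python Script might then look like (azureml_main is the entry point the module expects; my_bundled_package and its score function are placeholder names for whatever you zipped):
import sys
import pandas as pd

# The zip attached to the third input port is extracted under ".\Script Bundle";
# appending that folder to sys.path helps if the bundled package isn't picked up automatically.
sys.path.append(r".\Script Bundle")
import my_bundled_package  # placeholder for the package you zipped

def azureml_main(dataframe1=None, dataframe2=None):
    # Illustrative use of the bundled package to add a score column.
    dataframe1["score"] = my_bundled_package.score(dataframe1)
    return dataframe1,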
For more details, you can refer to the similar SO thread Updating pandas to version 0.19 in Azure ML Studio, which also explains how to update the versions of the Python packages Azure installs by default.
Hope it helps.

I struggled with the same issue: error 0085
I was able to resolve it by using Azure ML code example available from their library:
Deployment of AzureML Web Services from Python Notebooks
can be found at https://gallery.cortanaintelligence.com/Notebook/Deployment-of-AzureML-Web-Services-from-Python-Notebooks-4
I won't copy the whole code here, but I used it exactly as is and it worked with the Boston dataset. Then I used it with my own dataset and no longer got error 0085. I haven't tracked down the original error yet, but it's most likely due to some misbehaving character or indentation. Hope this helps.

Related

AWS Studio ModuleNotFoundError: No module named 'sagemaker'

I am trying to replicate the below example for churn prediction.
https://towardsdatascience.com/a-practical-guide-to-mlops-in-aws-sagemaker-part-i-1d28003f565
preprocessing.py has to import sagemaker, but it throws ModuleNotFoundError when I run the pipeline. The same sagemaker package is also imported in pipeline.py and works fine there. Please let me know how to install packages in the Studio environment, including the syntax. I tried pip and conda install in a cell in another ipynb file, but it only shows a "Requirement already satisfied" message, meaning the package is already installed there.
So probably the first thing to understand here is that the steps in a SageMaker pipeline don't actually run inside of SageMaker Studio, but in containerized jobs.
What I think you're seeing is that the SageMaker Python SDK (which is open-source and published on PyPI as sagemaker) is present in your Studio notebook kernel where you set up the pipeline, but missing from the processing job that runs preprocessing.py.
I see the pipeline uses a ScriptProcessor based on the XGBoost v1.0-1 image (image_uri and script_eval in pipeline.py), so it looks like this particular image doesn't have sagemaker installed by default.
In fact, preprocessing.py only seems to use the library to look up the name of the SageMaker default bucket. You could achieve the same result with boto3 alone (which should already be installed), as follows:
import boto3

account_id = boto3.client("sts").get_caller_identity()["Account"]
region = boto3.Session().region_name
trans_bucket = f"sagemaker-{region}-{account_id}"
If you really needed to install extra libraries for your processing jobs, I would suggest checking out FrameworkProcessor (which can install sagemaker if you provide a requirements.txt file) instead of ScriptProcessor - but watch out: there have been some bug reports about using FrameworkProcessor and Pipelines together.
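For illustration only, a minimal sketch of a FrameworkProcessor set up this way (the role ARN, instance settings, framework version, and source_dir are placeholders; source_dir is assumed to contain preprocessing.py plus a requirements.txt listing sagemaker):
from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn.estimator import SKLearn

# Placeholder configuration; the requirements.txt inside source_dir is what
# installs extra libraries (e.g. sagemaker) into the processing job.
processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version="0.23-1",
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

processor.run(
    code="preprocessing.py",
    source_dir="my_processing_code/",  # contains preprocessing.py and requirements.txt
)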
If FrameworkProcessor isn't working, you could instead build your own container image FROM the pre-provided one and pip install sagemaker in the Dockerfile. You would upload this customized image to Amazon ECR and then reference it in your pipeline instead of the standard XGBoost one.

How can I match my local azure automl python sdk version to the remote version?

I'm using the Azure AutoML Python SDK to download and save a model, then reload it. I get the following error:
anaconda3\envs\automl_21\lib\site-packages\sklearn\base.py:318: UserWarning: Trying to unpickle estimator Pipeline from version 0.22.1 when using version 0.22.2.post1. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
How can I ensure that the versions match?
My Microsoft contact says -
"For this, their best bet is probably to see what the training env was pinned to and install those same pins. They can get that env by running child_run.get_environment() and then pip install all the pkgs listed in there with the pins listed there."
A useful code snippet:
from azureml.train.automl.run import AutoMLRun

# For each run in the experiment, look up its best AutoML child run via the tag
# and print that run's pinned conda/pip dependencies.
for run in experiment.get_runs():
    tags_dictionary = run.get_tags()
    best_run = AutoMLRun(experiment, tags_dictionary['automl_best_child_run_id'])
    env = best_run.get_environment()
    print(env.python.conda_dependencies.serialize_to_string())
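To then reproduce those pins locally, one option (a sketch, assuming the env object obtained above) is to dump the environment's conda specification to a file and recreate it with conda env create:
# Write the training environment's pinned dependencies to a YAML file;
# running "conda env create -f automl_training_env.yml" locally should then
# recreate an environment with matching package versions.
with open("automl_training_env.yml", "w") as f:
    f.write(env.python.conda_dependencies.serialize_to_string())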

Dockerfile pip install not working on openshift for xgboost

My team at work put together a simple Python service with a Dockerfile that performs some small tasks like pip-installing dependencies, creating a user, rewiring some directories, etc.
Everything worked fine locally: we were able to build and run the Docker image on my laptop (I have a MacBook Pro running High Sierra, if that is relevant).
We attempted to build the project on Openshift and kept getting a "Generic Build error" which told us to check the logs.
The log line it was failing on was
RUN pip3 install pandas flask-restful xgboost scikit-learn nltk scipy
and there was no related error listed with it. It literally just stopped at that point.
Specifically, it was breaking out when it got to installing xgboost. We removed the xgboost part and the entire Dockerfile ran fine and the build completed successfully so we know it was definitely just that one install causing the issue. Has anyone else encountered this and know why it happens?
We are pretty certain it wasn't any kind of memory or storage issue as the container had plenty of room for how small the service was.
Note: we were eventually able to use a coworker's template image to get our project up with the needed dependencies; I'm just curious why this was happening in case we run into a similar issue in the future.
Not sure if this is an option for you, but when I need to build from a Dockerfile, I usually use a hosted service like Quay.io or DockerHub.

"Unable to get Filesystem for path" error when training neural network on google cloud

I am using Google Cloud to train a neural network on the cloud like in the following example:
https://cloud.google.com/blog/big-data/2016/12/how-to-classify-images-with-tensorflow-using-google-cloud-machine-learning-and-cloud-dataflow
To start, I set the following environment variables:
PROJECT_ID=$(gcloud config list project --format "value(core.project)")
BUCKET_NAME=${PROJECT_ID}-mlengine
I then uploaded my training and evaluation data, two CSV files named eval_set.csv and train_set.csv, to Google Cloud Storage with the following command:
gsutil cp -r data gs://$BUCKET_NAME
I then verified that these two CSV files were in the polar-terminal-160506-mlengine/data directory in my Google Cloud Storage.
I then made the following environment variable assignments:
# Assign appropriate values.
PROJECT=$(gcloud config list project --format "value(core.project)")
JOB_ID="flowers_${USER}_$(date +%Y%m%d_%H%M%S)"
GCS_PATH="${BUCKET}/${USER}/${JOB_ID}"
DICT_FILE=gs://cloud-ml-data/img/flower_photos/dict.txt
Before trying to preprocess my evaluation data like so:
# Preprocess the eval set.
python trainer/preprocess.py \
--input_dict "$DICT_FILE" \
--input_path "gs://cloud-ml-data/img/flower_photos/eval_set.csv" \
--output_path "${GCS_PATH}/preproc/eval" \
--cloud
Sadly, this runs for a bit and then crashes, outputting the following error:
ValueError: Unable to get the Filesystem for path gs://polar-terminal-160506-mlengine/data/eval_set.csv
This doesn't seem possible, as I have confirmed via my Google Cloud Storage console that eval_set.csv is stored at this location. Is this perhaps a permissions issue, or something I am not seeing?
Edit:
I have found that the cause of this runtime error is a certain line in the trainer/preprocess.py file. The line is this one:
read_input_source = beam.io.ReadFromText(
    opt.input_path, strip_trailing_newlines=True)
This seems like a pretty good clue, but I am still not really sure what is going on. When I google "beam.io.ReadFromText ValueError: Unable to get the Filesystem for path", nothing relevant at all appears, which is a bit odd. Thoughts?
It looks like your apache-beam library installation might be incomplete.
Try pip install apache-beam[gcp].
The gcp extra allows Apache Beam to access files stored on Google Cloud Storage.
The Apache Beam package is available on PyPI.
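As a quick sanity check after installing the extra, a minimal sketch (the bucket path is the one from the question; running it still requires valid GCP credentials):
import apache_beam as beam

# With apache-beam[gcp] installed, the gs:// filesystem is registered and
# ReadFromText can resolve Cloud Storage paths.
with beam.Pipeline() as p:
    lines = p | "ReadEval" >> beam.io.ReadFromText(
        "gs://polar-terminal-160506-mlengine/data/eval_set.csv",
        strip_trailing_newlines=True)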
Just as Jean-Christophe described, I believe your installation is incomplete.
The plain apache-beam package doesn't include everything needed to read from and write to GCP. To get all of that, as well as the runner for deploying your pipeline to Cloud Dataflow (the DataflowRunner), you'll need to install it via pip.
pip install google-cloud-dataflow
This is how I was able to resolve the same issue.
Try pip install apache_beam[gcp]. This will help you.

Trouble setting up 'Bachbot': Python gives a "no such command" error even though it is installed correctly?

I am trying to set up Bachbot (https://github.com/feynmanliang/bachbot) on my Windows 10 system in Python 3.5.1, Anaconda 4.0.0. Despite several attempts, I keep failing to get this to work. I downloaded the source code from GitHub (I didn't use Docker) and got to work.
The first thing that's good to know is that I changed all print statements by adding parentheses. Furthermore, I changed every import of cPickle to
import _pickle as cPickle
since I'm using a newer version of Python. By doing this, I cleared all compile errors, but now I'm stuck at the first few steps of getting the program to work. When calling
bachbot chorales prepare_poly
I get an error
Usage: bachbot-script.py [OPTIONS] COMMAND [ARGS]
Error: no such command "chorales"
I figured the chorales script is part of the music21-module, which I installed on my computer using pip.
As far as I know, I followed the installation steps more or less correctly (see the GitHub Getting Started and Workflow sections):
run the activate script
run pip install --editable .
(in between, installed the missing module music21)
run bachbot chorales prepare_poly
I suspect it has something to do with the entry point but I can't put a finger on what's wrong. I tried several re-installs but that does not seem to do the trick.
I would be grateful if someone could help me with this. Thanks in advance!
My apologies, I was rushing to get the thesis in on time so the documentation is not the best!
The commands for building the polyphonic dataset and training the model are:
bachbot datasets prepare
bachbot datasets concatenate_corpus scratch/BWV-*.utf
bachbot make_h5
bachbot train
To use the model trained for $ITER iterations to generate samples with a sampling temperature of $TMP:
bachbot sample ~/bachbot/scratch/checkpoints/*/checkpoint_<ITER>.t7 -t <TEMP>
bachbot decode sampled_stream ~/bachbot/scratch/sampled_$TMP.utf
The first and last sections of a recent presentation I made summarize this workflow.
By the way, I would recommend using the Docker image described in the presentation I linked. While the CLI is in Python, the actual LSTM has additional dependencies (e.g. Lua, Torch, CUDA if you plan on using a GPU).
