All,
(Environment: Windows 7, Python 3.6, Keras & TensorFlow libraries, gcloud ml engine)
I am running some Keras ML model examples on gcloud ml engine as introduced here. Everything works, but I get different results across multiple runs even though I use the same training and validation data. My goal is to get reproducible training results across runs.
I googled for a while and found some suggestions in this Keras Q&A about making results reproducible. The first one is:
First, you need to set the PYTHONHASHSEED environment variable to 0 before the program starts (not within the program itself).
I know I could set the variable locally on my own machine, or set it when deploying a gcloud function as introduced here.
But I do not know how to set environment variables when using gcloud ML engine (on the server side, NOT locally). So I cannot set "PYTHONHASHSEED=0" on the gcloud server where my model programs run.
BTW, I know that randomness is generally useful in ML, but I am not very familiar with the topic of making results reproducible yet, so any thoughts on it are also welcome. Thanks!
Daqi
PS:
I have tried setting the environment variable at runtime, as below:
import os
os.environ["PYTHONHASHSEED"] = "0"
print(hash("keras"))
But it does not have the same effect as setting the variable before the program starts, so with this code I still get different hash results across runs. Locally, on the other hand, if I set "PYTHONHASHSEED=0" before running the code, I do get the same hash results.
I don't believe the Cloud ML Engine API provides a mechanism to set environment variables. However, you might be able to work around this by writing a wrapper script (NB: UNTESTED CODE):
import os
import subprocess

# Copy the parent environment and set PYTHONHASHSEED before the child starts
env = os.environ.copy()
env["PYTHONHASHSEED"] = "0"

# Re-launch the real entry point; the child interpreter sees the variable at startup
subprocess.check_call(['python', 'main.py'], env=env)
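One more note: per the Keras FAQ you linked, PYTHONHASHSEED is only the first step. For reproducible runs you generally also need to seed Python, NumPy, and TensorFlow inside your program, roughly like this (a sketch; the exact TF call depends on your TensorFlow version, and 42 is just an arbitrary seed):
import random
import numpy as np
import tensorflow as tf

random.seed(42)          # Python's built-in RNG
np.random.seed(42)       # NumPy RNG (used by Keras for weight initialization)
tf.set_random_seed(42)   # TF 1.x; on TF 2.x use tf.random.set_seed(42)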
What is the best way to migrate a Jupyter notebook into Google Cloud Platform?
Requirements
I don't want to make a lot of changes to the notebook to get it to run
I want it to be schedulable, preferably through the UI
I want it to be able to run an ipynb file, not a py file
In AWS, SageMaker seems like the no-brainer solution for this. I want the GCP tool that gets as close as possible to this specific task without a lot of extras.
I've tried the following,
Cloud Functions: seems best suited to running Python scripts rather than a notebook, and requires you to run a main.py file by default
Dataproc: seems like you can add a notebook to a running instance, but it cannot be scheduled
Dataflow: seemed like overkill; it didn't feel like the right tool and is better suited to Apache-based tools
I feel like this should be easier. I found this article on the subject:
How to Deploy and Schedule Jupyter Notebook on Google Cloud Platform
He doesn't actually do what the title says; he moves a lot of GCP code into a main.py that creates an instance and has the instance execute the notebook.
Feel free to correct my perspective on any of this
I use Vertex AI Workbench to run notebooks on GCP. It provides two variants:
Managed Notebooks
User-managed Notebooks
User-managed Notebooks creates compute instances in the background; it comes with pre-built packages such as JupyterLab, Python, etc., and allows customisation. I mainly use it for developing Dataflow pipelines.
As for your other requirement, scheduling: Managed Notebooks supports this feature; refer to this documentation (I have yet to try Managed Notebooks):
Use the executor to run a notebook file as a one-time execution or on a schedule. Choose the specific environment and hardware that you want your execution to run on. Your notebook's code will run on Vertex AI custom training, which can make it easier to do distributed training, optimize hyperparameters, or schedule continuous training jobs. See Run notebook files with the executor.
You can use parameters in your execution to make specific changes to each run. For example, you might specify a different dataset to use, change the learning rate on your model, or change the version of the model.
You can also set a notebook to run on a recurring schedule. Even while your instance is shut down, Vertex AI Workbench will run your notebook file and save the results for you to look at and share with others.
I have done quite a few Google searches but have not found a clear answer for the following use case. Basically, I would rather use Cloud9 (most of the time) as my IDE rather than Jupyter. What I am not sure about is how I could execute long-running jobs, like (Bayesian) hyperparameter optimisation, from there. Can I use SageMaker capabilities? Should I use Docker and deploy to ECR (looking for the cheapest-ish option)? Any pointers regarding this particular issue would be very much appreciated. Thanks.
You can use whatever IDE you choose (including one running on your laptop).
A SageMaker tuning job (example) is asynchronous, so you can safely close your IDE after launching it. You can monitor the job in the AWS web console, or with a DescribeHyperParameterTuningJob API call.
You can launch TensorFlow, PyTorch, XGBoost, Scikit-learn, and other popular ML frameworks using one of the built-in framework containers, avoiding the extra work of bringing your own container.
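For example, launching a tuning job from any IDE with the SageMaker Python SDK looks roughly like this (a sketch: the script name, role ARN, S3 paths, and metric regex are placeholders you would replace with your own):
import boto3
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

# Framework estimator wrapping your own training script (placeholder names)
estimator = SKLearn(
    entry_point="train.py",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="0.23-1",
)

# Bayesian search is the default strategy for HyperparameterTuner
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="validation:accuracy",
    hyperparameter_ranges={"alpha": ContinuousParameter(0.001, 1.0)},
    metric_definitions=[{"Name": "validation:accuracy",
                         "Regex": "validation accuracy: ([0-9\\.]+)"}],
    max_jobs=20,
    max_parallel_jobs=2,
)

# Launch asynchronously; you can close the IDE while SageMaker runs the job
tuner.fit({"train": "s3://my-bucket/train"}, wait=False)

# Later, from anywhere, check the status via the API
status = boto3.client("sagemaker").describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner.latest_tuning_job.name
)["HyperParameterTuningJobStatus"]
print(status)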
I started working with SageMaker recently, and I'm trying to understand what each line of code in the SageMaker examples does.
I'm stuck at the following code. I'm working on logistic regression with bank data.
from sagemaker.amazon.amazon_estimator import get_image_uri
Can anyone explain what get_image_uri does?
Also, can anyone share a link or something where each line of SageMaker-related code is explained?
Unfortunately, I can't do much better than the source code, which says:
Return algorithm image URI for the given AWS region, repository name, and repository version
The link by PV8 has demo code, but it's basically returning a URI that points to a Docker image in Amazon ECR, which AWS then uses to spin up the container that runs your training job.
Amazon SageMaker is designed to be open and extensible, and it uses Docker images as the way to move between development (notebooks), training and tuning, and finally hosting for real-time and batch prediction.
When you want to submit a training job, for example, you need to point to the Docker image that holds the algorithm and the pre/post-processing code you want to execute as part of your training.
Amazon SageMaker provides a set of built-in algorithms that you can use out of the box to train models at scale (mostly optimized for distributed training). These algorithms are identified by name, and the above line of Python code maps a name to the URI of the Docker image that Amazon provides in its container registry service, ECR.
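To make that concrete, here is roughly how the URI returned by get_image_uri gets used to train with the built-in XGBoost algorithm (SageMaker Python SDK v1 style, since that is where get_image_uri lives; the role ARN and S3 paths are placeholders):
import boto3
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri

region = boto3.Session().region_name
# Map the algorithm name to the ECR Docker image URI for this region
container = get_image_uri(region, "xgboost", repo_version="0.90-2")

xgb = sagemaker.estimator.Estimator(
    container,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    train_instance_count=1,
    train_instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output",                   # placeholder bucket
)
xgb.set_hyperparameters(objective="binary:logistic", num_round=100)
xgb.fit({"train": "s3://my-bucket/train"})                 # placeholder input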
This is because of a deprecation in the latest version of the Amazon packages.
Just force the use of the previous versions by adding this to the very beginning of the notebook:
import sys
# Pin the SageMaker SDK below 2.0 so get_image_uri is still importable
!{sys.executable} -m pip install -qU awscli boto3 "sagemaker>=1.71.0,<2.0.0"
Now, when importing the method you want:
from sagemaker.amazon.amazon_estimator import get_image_uri
you will just get a deprecation warning, but the code works fine anyway:
'get_image_uri' method will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
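If you would rather move forward to v2 of the SDK instead of pinning, the equivalent lookup is, to my understanding, sagemaker.image_uris.retrieve; something like this (the framework name and version are just examples):
import boto3
from sagemaker import image_uris

# v2 replacement for get_image_uri
container = image_uris.retrieve(
    framework="xgboost",
    region=boto3.Session().region_name,
    version="1.2-1",
)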
Cheers
I'm following the guide here and can't seem to get my Python app (which is deployed fine on GCP) to read the environment variables I've created in Cloud Functions.
The REST endpoint for the function returns the environment variables fine, since I've coded the Python method in the function to just do os.environ.get() on a request parameter that is passed in. However, in my actual deployed application, I don't want to make a REST GET call every time I need an environment variable. I would expect that using os.environ.get() in my application would be enough, but it returns blank.
How does one go about retrieving environment variables on GCP with just a simple os.environ.get(), or do I really have to call an endpoint every time?
I have been struggling with this for some time. The only solution I have found to set environment variables for the whole app is to define them in app.yaml. See the env_variables section here.
But then you cannot commit app.yaml to any version control repository if you don't want people to see the environment variables. You could add it to .gitignore. There are more secure ways to handle secrets storage if these variables contain sensitive data. If you need more robust security, you might find some inspiration here.
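For reference, the env_variables section in app.yaml looks roughly like this (variable names and values are just examples), and the app then reads them with a plain os.environ.get():
# app.yaml (example values only)
runtime: python39

env_variables:
  API_KEY: "example-value"
  BUCKET_NAME: "example-bucket"

# main.py
import os
api_key = os.environ.get("API_KEY")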