I'm following the guidelines (https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-environments) to use a custom docker file on Azure. My script to create the environment looks like this:
from azureml.core.environment import Environment
myenv = Environment(name = "myenv")
myenv.docker.enabled = True
dockerfile = r"""
FROM mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04
RUN apt-get update && apt-get install -y libgl1-mesa-glx
RUN echo "Hello from custom container!"
"""
myenv.docker.base_image = None
myenv.docker.base_dockerfile = dockerfile
Upon execution, this is totally ignored and libgl1 is not installed. Any ideas why?
EDIT: Here's the rest of my code:
est = Estimator(
    source_directory = '.',
    script_params = script_params,
    use_gpu = True,
    compute_target = 'gpu-cluster-1',
    pip_packages = ['scipy==1.1.0', 'torch==1.5.1'],
    entry_script = 'AzureEntry.py',
)
run = exp.submit(config = est)
run.wait_for_completion(show_output=True)
I have no issues installing the lib. First, please dump your Dockerfile content into a file; it's easier to maintain and read ;)
e = Environment("custom")
e.docker.base_dockerfile = "path/to/your/dockerfile"
This will load the file's content into a string property.
e.register(ws).build(ws).wait_for_completion()
Step 2/16 of the image build will run your apt-get update and libgl1 install.
Note that this requires SDK version >= 1.7.
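For completeness, here's a minimal sketch of the full flow, assuming an existing workspace config and a Dockerfile saved at ./docker/Dockerfile (that path is illustrative):

from azureml.core import Workspace
from azureml.core.environment import Environment

ws = Workspace.from_config()

e = Environment("custom")
e.docker.enabled = True
e.docker.base_image = None
e.docker.base_dockerfile = "./docker/Dockerfile"  # SDK >= 1.7 reads the file into a string

# Build the image now so you can watch the Docker steps in the build log.
build = e.register(ws).build(ws)
build.wait_for_completion()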
This should work:
from azureml.core import Workspace
from azureml.core.environment import Environment
from azureml.train.estimator import Estimator
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core import Experiment
ws = Workspace(...)
exp = Experiment(ws, 'test-so-exp')
myenv = Environment(name = "myenv")
myenv.docker.enabled = True
dockerfile = r"""
FROM mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04
RUN apt-get update && apt-get install -y libgl1-mesa-glx
RUN echo "Hello from custom container!"
"""
myenv.docker.base_image = None
myenv.docker.base_dockerfile = dockerfile
## You need to instead put your packages in the Environment definition instead...
## see below for some changes too
myenv.python.conda_dependencies = CondaDependencies.create(pip_packages = ['scipy==1.1.0', 'torch==1.5.1'])
Finally, you can build your estimator a bit differently:
est = Estimator(
    source_directory = '.',
    # script_params = script_params,
    # use_gpu = True,
    compute_target = 'gpu-cluster-1',
    # pip_packages = ['scipy==1.1.0', 'torch==1.5.1'],
    entry_script = 'AzureEntry.py',
    environment_definition = myenv
)
And submit it:
run = exp.submit(config = est)
run.wait_for_completion(show_output=True)
Let us know if that works.
Totally understandable why you're struggling -- others have also expressed a need for more information.
Perhaps base_dockerfile needs to be a text file (with the contents inside) and not a string? I'll ask the environments PM to learn more about how this works.
Another option would be to leverage Azure Container Instances (ACI). An ACI is created automatically when you spin up an Azure ML workspace. See this GitHub issue for more info on that.
For more information about using Docker in environments, see Enable Docker: https://learn.microsoft.com/azure/machine-learning/how-to-use-environments#enable-docker
The following example shows how to load Docker steps as a string.
from azureml.core import Environment
myenv = Environment(name="myenv")
# Creates the environment inside a Docker container.
myenv.docker.enabled = True
# Specify docker steps as a string.
dockerfile = r'''
FROM mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04
RUN echo "Hello from custom container!"
'''
# Alternatively, load from a file.
#with open("dockerfiles/Dockerfile", "r") as f:
# dockerfile=f.read()
myenv.docker.base_dockerfile = dockerfile
I think the issue is that you're using an estimator. Estimators create their own environment unless you set the environment_definition parameter, which I don't see in your snippet. I'm looking at https://learn.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py.
Haven't tried it, but I think you can fix this by changing your code to:
est = Estimator(
    source_directory = '.',
    script_params = script_params,
    use_gpu = True,
    compute_target = 'gpu-cluster-1',
    pip_packages = ['scipy==1.1.0', 'torch==1.5.1'],
    entry_script = 'AzureEntry.py',
    environment_definition = myenv
)
run = exp.submit(config = est)
run.wait_for_completion(show_output=True)
You might also have to move the use_gpu setting into the environment definition, as the SDK page linked above says the environment definition takes precedence over this and a couple of other estimator parameters.
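Untested, but one way to handle GPU inside the environment would be to base your Dockerfile on a CUDA-enabled AzureML image instead of passing use_gpu (the image tag below is an assumption; check the current AzureML base image list):

dockerfile = r"""
# CUDA-enabled AzureML base image (tag assumed; verify against the base image list)
FROM mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04
RUN apt-get update && apt-get install -y libgl1-mesa-glx
"""
myenv.docker.base_image = None
myenv.docker.base_dockerfile = dockerfile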
Related
Hello, I get the following error while trying to use tabula to read a table in a PDF.
I was aware of some of the difficulties (here) of using this package with AWS Lambda, and tried to zip the tabula package via an EC2 instance (Ubuntu 20.04) and then add it as a layer in the function.
Many thanks in advance!
{ "errorMessage": "`java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`", "errorType": "JavaNotFoundError", "stackTrace": [ " File \"/var/task/lambda_function.py\", line 39, in lambda_handler\n df = tabula.read_pdf(BytesIO(fs), pages=\"all\", area = [box],\n", " File \"/opt/python/lib/python3.8/site-packages/tabula/io.py\", line 420, in read_pdf\n output = _run(java_options, tabula_options, path, encoding)\n", " File \"/opt/python/lib/python3.8/site-packages/tabula/io.py\", line 98, in _run\n raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)\n" ] }
Code
import boto3
import tabula
from io import BytesIO

def lambda_handler(event, context):
    client = boto3.client('s3')
    s3 = boto3.resource('s3')

    # Get most recent file name
    response = client.list_objects_v2(Bucket='S3bucket')
    all = response['Contents']
    latest = max(all, key=lambda x: x['LastModified'])
    latest_key = latest['Key']

    # Get file
    obj = s3.Object('S3bucket', latest_key)
    fs = obj.get()['Body'].read()

    # Read PDF
    box = [3.99, .22, 8.3, 7.86]
    fc = 72
    for i in range(0, len(box)):
        box[i] *= fc
    df = tabula.read_pdf(BytesIO(fs), pages="all", area=[box], output_format="dataframe", lattice=True)
Here is the Dockerfile that ultimately worked and allowed me to run tabula in my Lambda function:
# Base image assumed; the original post omitted the FROM line
FROM public.ecr.aws/lambda/python:3.8

ARG FUNCTION_DIR="/var/task/"
COPY ./ ${FUNCTION_DIR}

# Install OpenJDK
RUN yum install -y java-1.8.0-openjdk

# Install Python requirements
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Copy function code to container
COPY app.py ./

CMD [ "app.handler" ]
Tabula's Python package is just a wrapper around Java code (tabula-java). Here's a reference to the package.
Java 8+ is required to be installed for this to work. Your best bet is to develop a Docker container image where your script works and deploy that image as a Lambda function.
AWS has a good walkthrough that might help.
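As a quick sanity check (my own suggestion, not from the walkthrough), you can confirm from inside the container or Lambda runtime that Java is actually on PATH before tabula needs it:

import shutil
import subprocess

# Where (if anywhere) is java on PATH inside the Lambda/container?
java = shutil.which("java")
print("java found at:", java)
if java:
    subprocess.run(["java", "-version"], check=True)  # version info goes to stderr on most JDKs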
I am trying to submit an experiment in Azure Machine Learning service locally on an Azure VM using a ScriptRunConfig object in my workspace ws, as in
from azureml.core import ScriptRunConfig
from azureml.core.runconfig import RunConfiguration
from azureml.core import Experiment
experiment = Experiment(ws, name='test')
run_local = RunConfiguration()
script_params = {
    '--data-folder': './data',
    '--training-data': 'train.csv'
}
src = ScriptRunConfig(source_directory = './source_dir',
                      script = 'train.py',
                      run_config = run_local,
                      arguments = script_params)
run = experiment.submit(src)
However, this fails with
ExperimentExecutionException: {
"error_details": {
"correlation": {
"operation": "bb12f5b8bd78084b9b34f088a1d77224",
"request": "iGfp+sjC34Q="
},
"error": {
"code": "UserError",
"message": "Failed to deserialize run definition"
Worse, if I set my data folder to use a datastore (which I will likely need to)
script_params = {
'--data-folder': ds.path('mydatastoredir').as_mount(),
'--training-data': 'train.csv'
}
the error is
UserErrorException: Dictionary with non-native python type values are
not supported in runconfigs.
{'--data-folder':
$AZUREML_DATAREFERENCE_d93269a580ec4ecf97be428cd2fe79,
'--training-data': 'train.csv'}
I don't quite understand how I should pass my script_params parameters to my train.py (unfortunately, the documentation of ScriptRunConfig doesn't include a lot of details on this).
Does anybody know how to properly create src in these two cases?
In the end I abandoned ScriptRunConfig and used Estimator as follows to pass script_params (after having provisioned a compute target):
estimator = Estimator(source_directory='./mysourcedir',
                      script_params=script_params,
                      compute_target='cluster',
                      entry_script='train.py',
                      conda_packages=["pandas"],
                      pip_packages=["git+https://github.com/..."],
                      use_docker=True,
                      custom_docker_image='<mydockeraccount>/<mydockerimage>')
This also allowed me to install my pip_packages dependencies by publishing a custom_docker_image to https://hub.docker.com/, built from a Dockerfile like:
FROM continuumio/miniconda
RUN apt-get update
RUN apt-get install git gcc g++ -y
(it worked!)
The correct way of passing arguments to ScriptRunConfig and RunConfiguration is as a list of strings, according to https://learn.microsoft.com/nb-no/python/api/azureml-core/azureml.core.runconfiguration?view=azure-ml-py.
The modified, working code is as follows.
from azureml.core import ScriptRunConfig
from azureml.core.runconfig import RunConfiguration
from azureml.core import Experiment
experiment = Experiment(ws, name='test')
run_local = RunConfiguration()
script_params = [
    '--data-folder',
    './data',
    '--training-data',
    'train.csv'
]
src = ScriptRunConfig(source_directory = './source_dir',
                      script = 'train.py',
                      run_config = run_local,
                      arguments = script_params)
run = experiment.submit(src)
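For reference, the receiving side in train.py would then parse these with argparse, something like the sketch below (the argument names are taken from the snippet above; the rest is illustrative):

import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', dest='data_folder', type=str)
parser.add_argument('--training-data', dest='training_data', type=str)
args = parser.parse_args()

# e.g. ./data/train.csv
train_path = os.path.join(args.data_folder, args.training_data)
print('Loading training data from:', train_path)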
As per the documentation, I created the following files:
setup.py (in folder C:\Python34\Lib\site-packages\Orange\widgets\orange-demo)
from setuptools import setup

setup(name="Demo",
      packages=["orangedemo"],
      package_data={"orangedemo": ["icons/*.svg"]},
      classifiers=["Example :: Invalid"],
      # Declare orangedemo package to contain widgets for the "Demo" category
      entry_points={"orange.widgets": "Demo = orangedemo"},
      )
and OWDataSamplerA.py (in folder C:\Python34\Lib\site-packages\Orange\widgets\orange-demo\orangedemo)
import sys
import numpy
import Orange.data
from Orange.widgets import widget, gui

class OWDataSamplerA(widget.OWWidget):
    name = "Data Sampler"
    description = "Randomly selects a subset of instances from the data set"
    icon = "icons/DataSamplerA.svg"
    priority = 10

    inputs = [("Data", Orange.data.Table, "set_data")]
    outputs = [("Sampled Data", Orange.data.Table)]

    want_main_area = False

    def __init__(self):
        super().__init__()

        # GUI
        box = gui.widgetBox(self.controlArea, "Info")
        self.infoa = gui.widgetLabel(box, 'No data on input yet, waiting to get something.')
        self.infob = gui.widgetLabel(box, '')

    def set_data(self, dataset):
        if dataset is not None:
            self.infoa.setText('%d instances in input data set' % len(dataset))
            indices = numpy.random.permutation(len(dataset))
            indices = indices[:int(numpy.ceil(len(dataset) * 0.1))]
            sample = dataset[indices]
            self.infob.setText('%d sampled instances' % len(sample))
            self.send("Sampled Data", sample)
        else:
            self.infoa.setText('No data on input yet, waiting to get something.')
            self.infob.setText('')
            self.send("Sampled Data", None)
I created an .svg icon and left the __init__.py file blank. After running pip install -e ., a Demo.egg-info directory is created and includes several files, but no Demo widget appears. After restarting Orange, no visible changes occur at all.
Any advice would be most welcome.
A separate version of Python 3.6 is bundled with Orange.
To install a new widget, you need to have the proper Python instance on the path.
On Windows, you can find a special shortcut, "Orange Command Prompt". You will need to run it as Administrator to install new packages in newer versions.
Once in the appropriate directory, you can run your pip install -e . command.
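If you're unsure whether you're in the right interpreter, a quick check from that prompt (just standard library, nothing Orange-specific):

# Run inside the Orange Command Prompt's Python to see which interpreter
# (and hence which site-packages) your pip install will actually target.
import sys
print(sys.executable)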
I'm trying to do the equivalent of git fetch -a using the dulwich library within python.
Using the docs at https://www.dulwich.io/docs/tutorial/remote.html I created the following script:
from dulwich.client import LocalGitClient
from dulwich.repo import Repo
import os
home = os.path.expanduser('~')
local_folder = os.path.join(home, 'temp/local')
local = Repo(local_folder)
remote = os.path.join(home, 'temp/remote')
remote_refs = LocalGitClient().fetch(remote, local)
local_refs = LocalGitClient().get_refs(local_folder)
print(remote_refs)
print(local_refs)
with an existing git repository at ~/temp/remote and a newly initialised repo at ~/temp/local
remote_refs shows everything I would expect, but local_refs is an empty dictionary and git branch -a on the local repo returns nothing.
Am I missing something obvious?
This is on dulwich 0.12.0 and Python 3.5
EDIT #1
Following a discussion on the python-uk irc channel, I updated my script to include the use of determine_wants_all:
from dulwich.client import LocalGitClient
from dulwich.repo import Repo
import os

home = os.path.expanduser('~')
local_folder = os.path.join(home, 'temp/local')
local = Repo(local_folder)
remote = os.path.join(home, 'temp/remote')
wants = local.object_store.determine_wants_all
remote_refs = LocalGitClient().fetch(remote, local, wants)
local_refs = LocalGitClient().get_refs(local_folder)
print(remote_refs)
print(local_refs)
but this had no effect :-(
EDIT #2
Again, following discussion on the python-uk irc channel, I tried running dulwich fetch from within the local repo. It gave the same result as my script, i.e. the remote refs were printed to the console correctly, but git branch -a showed nothing.
EDIT - Solved
A simple loop to update the local refs did the trick:
from dulwich.client import LocalGitClient
from dulwich.repo import Repo
import os
home = os.path.expanduser('~')
local_folder = os.path.join(home, 'temp/local')
local = Repo(local_folder)
remote = os.path.join(home, 'temp/remote')
remote_refs = LocalGitClient().fetch(remote, local)
for key, value in remote_refs.items():
local.refs[key] = value
local_refs = LocalGitClient().get_refs(local_folder)
print(remote_refs)
print(local_refs)
LocalGitClient.fetch() does not update refs; it just fetches objects and then returns the remote refs so you can use them to update the target repository's refs yourself.
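If you do this often, the pattern wraps up neatly; this is just the same dulwich calls as above, packaged as a function:

from dulwich.client import LocalGitClient
from dulwich.repo import Repo

def fetch_all(remote_path, local_path):
    """Fetch objects from remote_path and mirror its refs into local_path."""
    local = Repo(local_path)
    remote_refs = LocalGitClient().fetch(remote_path, local)
    # fetch() only transfers objects, so copy the refs over explicitly
    for name, sha in remote_refs.items():
        local.refs[name] = sha
    return remote_refs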
How do I run OpenERP on uWSGI?
I found this WSGI script online, but I'm not sure where to place it.
import openerp
try:
    import uwsgi
    # uWSGI's hook is post_fork_hook; "port_fork_hook" in the original is a typo
    uwsgi.post_fork_hook = openerp.wsgi.core.on_starting
except ImportError:
    openerp.wsgi.core.on_starting()
# Equivalent of --load command-line option
openerp.conf.server_wide_modules = ['web']
# internal TODO: use openerp.conf.xxx when available
conf = openerp.tools.config
# Path to the OpenERP Addons repository (comma-separated for
# multiple locations)
conf['addons_path'] = '/home/openerp/addons/trunk,/home/openerp/web/trunk/addons'
# Optional database config if not using local socket
#conf['db_name'] = 'mycompany'
#conf['db_host'] = 'localhost'
#conf['db_user'] = 'foo'
#conf['db_port'] = 5432
#conf['db_password'] = 'secret'
# OpenERP Log Level
# DEBUG=10, DEBUG_RPC=8, DEBUG_RPC_ANSWER=6, DEBUG_SQL=5, INFO=20,
# WARNING=30, ERROR=40, CRITICAL=50
# conf['log_level'] = 20
# If --static-http-enable is used, path for the static web directory
#conf['static_http_document_root'] = '/var/www'
# vim:expandtab:smartindent:tabstop=4:softtabstop=4:shiftwidth=4:
application = openerp.wsgi.core.application
I installed OpenERP in a virtual environment in /var/www/openerp/venv and I can run it by calling $ openerp-server.
Thanks in advance.
You can just put the script file in the same directory as the openerp-server.py file.
However, when I tested it, it did not work, since gunicorn cannot find openerp in the import openerp statement. The reason is that openerp is not installed as a Python module on the system by the usual installation procedures.
I think it will work when you do an OpenERP install with the DEB package. (When you make such an install, you should disable the start script so it will just work from gunicorn.)
Let me also make a test install and share the result.
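In the meantime, assuming the script above is saved as /var/www/openerp/openerp-wsgi.py and using the venv path from the question (the port and process count are illustrative; 8069 is OpenERP's usual default), a uWSGI invocation would look something like:

uwsgi --http :8069 \
      --virtualenv /var/www/openerp/venv \
      --wsgi-file /var/www/openerp/openerp-wsgi.py \
      --master --processes 2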