No file found using custom container to train on azure ml

No file found using custom container to train on azure ml - python

I'm trying to train on azureml using a custom docker container with azure-cli, using the below command:
az ml job create -f train.yaml --resource-group DefaultResourceGroup-EUS2 --workspace-name test1234
and the train.yaml is:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
type: command
environment:
image: user2001.azurecr.io/test/train:latest
command: >-
python test_local.py
compute: azureml:test1234
Upon firing the above command I'm getting this error on azure ml jobs:
Error: python: can't open file 'test_local.py': [Errno 2] No such file or directory
I have checked in my docker image, test_local.py is present,I have also tried the following combination- "./test_local.py & /test_local.py"
The error still continues. Can't figure out where I'm going wrong.
Edit:docker run -it user2001.azurecr.io/test/train:latest python test_local.py
on running this command the container executes, but same thing doesn't work on azureml

Adding the /app prefix solved the issue,
so the final statement is
command: >-
python /app/test_local.py

Related

Docker run failing when mounting host dir inside a container

I am trying to mount a directory from host to container and at the same time running jupyter from that directory. What am I doing wrong here that docker is complaining as file now found please?
docker run -it --rm -p 8888:8888 tensorflow/tensorflow:nightly-jupyter -v $HOME/mytensor:/tensor --name TensorFlow python:3.9 bash
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "-v": executable file not found in $PATH: unknown.
I tried removing the version of python but still same problem. I searched extensively online and couldnt get an answer.
Basically i want to mount that directory which is is a git clone where I have tensor files. At the same time, I want to run jupyter notebook where I can see the files and run them. With so many issues with apple:m1 processor and tensorflow, i thought going the docker route would be better but am i not surprised :)
Appreciate the help

Docker run command syntax is
docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
image name tensorflow/tensorflow:nightly-jupyter should be after options (-v, -p --name et.al.) and before the command.
docker run -it --rm -p 8888:8888 -v $HOME/mytensor:/tensor --name TensorFlow tensorflow/tensorflow:nightly-jupyter bash

Cannot run docker container. Error response from daemon pull

To use TensorFlow serving, I had to use docker.
I downloaded the TensorFlow image using
docker pull tensorflow/serving
After that, I had to start tf serving and map my directories.
$ docker run -it -v D:\softwares\software saved file\GITHUB\potato-disease\Plant-disease-classification-using-Convolution-Neural-Networks:/tf_serving -p 8605:8605 --entrypoint /bin/bash tensorflow/serving
As a result I have an error :-
Unable to find image 'saved:latest' locally
docker: Error response from daemon: pull access denied for saved, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.

The volume path contained spaces, putting "" around the path could solve the error
In this case, I changed the name of the directory.
Answered by #ai2ys

docker login --username AWS --password-stdin https://<accountNumber>.dkr.ecr.<region>.amazonaws.com exited with error code 1:

I'm simply adopting the Eventbridge ETL design pattern and it gives me this error when I deploy:
[100%] fail: docker login --username AWS --password-stdin https://315997497220.dkr.ecr.us-west-2.amazonaws.com exited with error code 1:
❌ the-eventbridge-etl failed: Error: Failed to publish one or more
assets. See the error messages above for more information. at
Object.publishAssets
(/home/mubashir/.nvm/versions/node/v16.3.0/lib/node_modules/aws-cdk/lib/util/asset-publishing.ts:25:11)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at Object.deployStack
(/home/mubashir/.nvm/versions/node/v16.3.0/lib/node_modules/aws-cdk/lib/api/deploy-stack.ts:237:3)
at CdkToolkit.deploy
(/home/mubashir/.nvm/versions/node/v16.3.0/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:194:24)
at initCommandLine
(/home/mubashir/.nvm/versions/node/v16.3.0/lib/node_modules/aws-cdk/bin/cdk.ts:267:9)
Failed to publish one or more assets. See the error messages above for
more information.
The steps I took. Github repo has a video I followed
npx cdkp init the-eventbridge-etl --lang=python
cd the-eventbridge-etl
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
cdk synth
cdk deploy
The first error I get is related to bootstrapping. So I bootstrap.
export CDK_NEW_BOOTSTRAP=1
npx cdk bootstrap aws://315997497220/us-east-2 --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess --trust 315997497220 aws://315997497220/us-east-2
I've naturally updated the cdk.json file for using the above bootstrapping technique. I've tried all bootstrap techniques, with and without a qualifier, and with its subsequent changes to cdk.json. I don't think it's a bootstrap issue.
I get the above error and I don't know what the issue is. I have not made any changes to the code.

I geuss you need to get and pipe a password first as you use the --password-stdin flag. Try:
aws ecr get-login-password | docker login --username AWS --password-stdin https://315997497220.dkr.ecr.us-west-2.amazonaws.com

Running .env files within a docker container

I have been struggling to add env variables into my container for the past 3 hrs :( I have looked through the docker run docs but haven't managed to get it to work.
I have built my image using docker build -t sellers_json_analysis . which works fine.
I then go to run it with: docker run -d --env-file ./env sellers_json_analysis
As per the docs: $ docker run --env-file ./env.list ubuntu bash but I get the following error:
docker: open ./env: no such file or directory.
The .env file is in my root directory
But when running docker run --help I am unable to find anything about env variables, but it doesn't provide the following:
Usage: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
So not sure I am placing things incorrectly. I could add my variables into the dockerfile but I want to keep it as a public repo as it's a project I would like to display.

Your problem is wrong path, either use .env or ./.env, when you use ./env it mean a file named env in current directory
docker run -d --env-file .env sellers_json_analysis

How to use camera in docker container

I have a code in python that periodically takes pictures (using OpenCV). I created an image to execute this code in container. In linux, I can use it by perfectly executing the following command:
docker run -it --device=/dev/video0:/dev/video0 myImage
A little searching, the equivalent command in windows ( 10 Pro ) would be
docker run --privileged -v /dev/bus/usb:/dev/bus/usb myImage
But I am getting Error response from daemon: Mount denied error
I already tried to enable the shared drivers option in the Docker app, but the same error continued
I also tried some similar commands, but the same error continued
The command that generated a different error was:
docker run -it --device /dev/video0 myImage
generating the error:
  Error response from daemon: error gathering device information while adding custom device "C": not a device node.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.