I am trying to install my machine learning environment (installing all the libraries that I need) using a Dockerfile.
Here is the Dockerfile:
# Build an image that can do training and inference in SageMaker
# This is a Python 2 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.
FROM ubuntu:16.04
MAINTAINER Amazon AI <sage-learner@amazon.com>
RUN apt-get -y update && apt-get install -y --no-install-recommends \
wget \
python \
nginx \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Here we get all python packages.
# There's substantial overlap between scipy and numpy that we eliminate by
# linking them together. Likewise, pip leaves the install caches populated which uses
# a significant amount of space. These optimizations save a fair amount of space in the
# image, which reduces start up time.
RUN wget https://bootstrap.pypa.io/get-pip.py && python get-pip.py && \
pip install numpy scipy scikit-learn pandas flask gevent gunicorn && \
(cd /usr/local/lib/python2.7/dist-packages/scipy/.libs; rm *; ln ../../numpy/.libs/* .) && \
rm -rf /root/.cache
# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard
# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE
# keeps Python from writing the .pyc files which are unnecessary in this case. We also update
# PATH so that the train and serve programs are found when the container is invoked.
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
# Set up the program in the image
COPY xgboost /opt/program
WORKDIR /opt/program
But I get this error:
/usr/bin/env: 'python3.5': No such file or directory
Can you help me to solve this problem please?
Thank you
apt-get install python will always install Python 2. If you want Python 3, you need to apt-get install python3.
(You might want to specify a versioned dependency if you require e.g. Python 3.5 specifically.)
In the RUN command, install Python 3.5 explicitly (for example, apt-get install -y python3.5).
If that does not work, keep in mind that this issue can also come from environment variables not being set correctly; try setting them explicitly with ENV, or hard-code them in the command that CMD runs.
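For example, here is a minimal sketch of what the changed part of the Dockerfile could look like, assuming Ubuntu 16.04 (where python3.5 is the default Python 3) and that the distribution's pip is recent enough for your packages:
FROM ubuntu:16.04
RUN apt-get -y update && apt-get install -y --no-install-recommends \
    wget \
    python3.5 \
    python3-pip \
    python3-setuptools \
    nginx \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*
# Use the Python 3 pip so the libraries are installed for python3.5.
RUN pip3 install numpy scipy scikit-learn pandas flask gevent gunicorn && \
    rm -rf /root/.cache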
I want to run a Docker container on my Raspberry Pi 2 with a Python script that uses numpy. For this I have the following Dockerfile:
FROM python:3.7
COPY numpy_script.py /
RUN pip install numpy
CMD ["python", "numpy_script.py"]
But when I want to import numpy, I get the error message that libf77blas.so.3 was not found.
I have also tried to install numpy with a wheel from www.piwheels.org, but the same error occurs.
A Google search revealed that I need to install liblapack3. How do I need to modify my Dockerfile for this?
Inspired by the answer of om-ha, this worked for me:
FROM python:3.7
COPY numpy_script.py /
RUN apt-get update \
&& apt-get -y install libatlas-base-dev \
&& pip install numpy
CMD ["python", "numpy_script.py"]
Working Dockerfile
# Python image (debian-based)
FROM python:3.7
# Create working directory
WORKDIR /app
# Copy project files
COPY numpy_script.py numpy_script.py
# RUN command to update packages & install dependencies for the project
RUN apt-get update \
&& apt-get install -y libatlas-base-dev \
&& pip install numpy
# Commands to run within the container
CMD ["python", "numpy_script.py"]
Explanation
You have an extra trailing \ in your Dockerfile; the backslash is used to continue a shell command across multiple lines. You can see it used in action here, and I used it in my answer above. Beware that the last shell command (in this case pip) must not end with a trailing \, a mishap that was present in the code you showed (see the sketch after this list).
You should probably use a working directory via WORKDIR /app
Run apt-get update just to be sure everything is up-to-date.
It's recommended to group multiple shell commands within ONE RUN directive using &&. See best practices and this discussion.
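To make the backslash point concrete, here is a small before/after sketch (the package names are just examples):
# Broken: the trailing \ after the final command continues the RUN instruction
# onto whatever comes next, leaving an empty or accidental continuation line.
RUN apt-get update \
&& apt-get install -y libatlas-base-dev \
&& pip install numpy \

# Correct: no backslash after the last command in the chain.
RUN apt-get update \
&& apt-get install -y libatlas-base-dev \
&& pip install numpy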
Resources sorted by order of use in this dockerfile
FROM
WORKDIR
COPY
RUN
CMD
I have installed a library called fastai==1.0.59 via a requirements.txt file inside my Dockerfile.
But the purpose of running the Django app is not achieved because of one error. To solve that error, I need to manually edit the files /site-packages/fastai/torch_core.py and /site-packages/fastai/basic_train.py inside this library folder, which I don't intend to do.
Therefore I'm trying to copy the fastai folder itself from my host machine to the corresponding location inside the Docker image.
source location: /Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/fastai/
destination location: ../venv/lib/python3.6/site-packages/ which is inside my docker image.
Being new to Docker, I tried this using the COPY command:
COPY /Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/fastai/ ../venv/lib/python3.6/site-packages/
which gave me an error:
ERROR: Service 'app' failed to build: COPY failed: stat /var/lib/docker/tmp/docker-builder583041406/Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/fastai: no such file or directory.
I tried referring to this: How to include files outside of Docker's build context?
but it seems like it bounced off my head a bit.
Please help me tackle this. Thanks.
Dockerfile:
FROM python:3.6-slim-buster AS build
MAINTAINER model1
ENV PYTHONUNBUFFERED 1
RUN python3 -m venv /venv
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y git && \
apt-get install -y build-essential && \
apt-get install -y awscli && \
apt-get install -y unzip && \
apt-get install -y nano && \
apt-get install -y libsm6 libxext6 libxrender-dev
RUN apt-cache search mysql-server
RUN apt-cache search libmysqlclient-dev
RUN apt-get install -y libpq-dev
RUN apt-get install -y postgresql
RUN apt-cache search postgresql-server-dev-9.5
RUN apt-get install -y libglib2.0-0
RUN mkdir -p /model/
COPY . /model/
WORKDIR /model/
RUN pip install --upgrade awscli==1.14.5 s3cmd==2.0.1 python-magic
RUN pip install -r ./requirements.txt
EXPOSE 8001
RUN chmod -R 777 /model/
COPY /Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/fastai/ ../venv/lib/python3.6/site-packages/
CMD python3 -m /venv/activate
CMD /model/my_setup.sh development
CMD export API_ENV = development
CMD cd server && \
python manage.py migrate && \
python manage.py runserver 0.0.0.0:8001
Short Answer
No
Long Answer
When you run docker build the current directory and all of its contents (subdirectories and all) are copied into a staging area called the 'build context'. When you issue a COPY instruction in the Dockerfile, docker will copy from the staging area into a layer in the image's filesystem.
As you can see, this precludes copying files from directories outside the build context.
Workaround
Either download the files you want from their golden source directly into the image during the build process (this is why you often see a lot of curl statements in Dockerfiles), or copy the files (dirs) you need into the build tree and check them into source control as part of your project (see the sketch below). Which method you choose is entirely dependent on the nature of your project and the files you need.
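For the second option, a sketch of what that could look like here, assuming you copy the patched fastai/ directory into the project directory next to the Dockerfile (the destination matches the /venv your Dockerfile creates):
# On the host, before building: bring the patched library into the build context.
# cp -r /Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/fastai ./fastai
# In the Dockerfile, copy it from the build context into the image's virtualenv.
COPY fastai/ /venv/lib/python3.6/site-packages/fastai/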
Notes
There are other workarounds documented for this, but all of them, without exception, break the 'portability' of your build. The only quality solutions are those documented here (though I'm happy to add to this list if I've missed any that preserve portability).
I have a JavaScript application in which I use a Python utility (canconvert from the canmatrix utility).
In JavaScript I call canconvert through
execSync(`canconvert\\
--jsonExportAll\\
--jsonNativeTypes\\
--additionalFrameAttributes\\
--additionalSignalAttributes\\
${dbcFileLocation} ${parsedJsonLocation}`);
So canconvert should be available in the Docker environment.
For now, I just install canmatrix through pip, but the size of the container becomes very big.
FROM node:10.15.3-slim AS dist
...
RUN apt-get update && \
apt-get install -y python python-pip && \
pip install canmatrix && \
chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
Python is not tiny. Batteries included.
I guess you could try building your Python code with Nuitka or Cython, but then you might end up giving up on automatic upgrades.
Disk is cheap - why not just use it like you have it?
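That said, if you do want to shave some size off without changing the approach, the usual tweaks are to skip recommended packages and drop the apt lists and pip cache (a sketch based on the snippet above):
FROM node:10.15.3-slim AS dist
...
RUN apt-get update && \
    apt-get install -y --no-install-recommends python python-pip && \
    pip install --no-cache-dir canmatrix && \
    chmod +x /entrypoint.sh && \
    rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["/entrypoint.sh"]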
I am using a CentOS base image and installing Python 3 with the following Dockerfile:
FROM centos:7
ENV container docker
ARG USER=dsadmin
ARG HOMEDIR=/home/${USER}
RUN yum clean all \
&& yum update -q -y -t \
&& yum install file -q -y
RUN useradd -s /bin/bash -d ${HOMEDIR} ${USER}
RUN export LC_ALL=en_US.UTF-8
# install Development Tools to get gcc
RUN yum groupinstall -y "Development Tools"
# install python development so that pip can compile packages
RUN yum -y install epel-release && yum clean all \
&& yum install -y python34-setuptools \
&& yum install -y python34-devel
# install pip
RUN easy_install-3.4 pip
# install virtualenv or virtualenvwrapper
RUN pip3 install virtualenv \
&& pip3 install virtualenvwrapper \
&& pip3 install pandas
# # install django
# RUN pip3 install django
USER ${USER}
WORKDIR ${HOMEDIR}
I build and tag the above as follows:
docker build . --label validation --tag validation
I then need to add a .tar.gz file to the home directory. This file contains all the Python scripts I maintain, and it will change frequently. If I add it to the Dockerfile above, Python is reinstalled every time I change the .gz file, which adds a lot of time to development. As a workaround, I tried creating a second Dockerfile that uses the above image as the base and just adds the .tar.gz file on top.
FROM validation:latest
ARG USER=dsadmin
ARG HOMEDIR=/home/${USER}
ADD code/validation_utility.tar.gz ${HOMEDIR}/.
USER ${USER}
WORKDIR ${HOMEDIR}
After that, if I run the Docker image and do an ls, all the files in the folder have an owner of games.
-rw-r--r-- 1 501 games 35785 Nov 2 21:24 Validation_utility.py
To fix the above, I added the following lines to the second Dockerfile:
ADD code/validation_utility.tar.gz ${HOMEDIR}/.
RUN chown -R ${USER}:${USER} ${HOMEDIR} \
&& chmod +x ${HOMEDIR}/Validation_utility.py
but I get the error:
chown: changing ownership of '/home/dsadmin/Validation_utility.py': Operation not permitted
The goal is to have two Dockerfiles. The users will build the first Dockerfile to install CentOS and the Python dependencies. The second Dockerfile will install the custom Python scripts. If the scripts change, they will just rebuild from the second Dockerfile. Is that the right way to think about docker? Thank you.
Is that the right way to think about docker?
This is the easy part of your question. Yes. You're thinking about the proper way to structure your Dockerfiles, reuse them, and keep your image builds efficient. Good job.
As for the error you're receiving, I'm less confident in answering why the ADD command is un-tarballing your tar.gz as the games user; I'm not nearly as familiar with CentOS. (Most likely the tarball was created on macOS, where your user is UID 501 with primary GID 20, ADD keeps those numeric IDs when it extracts the archive, and GID 20 happens to map to the games group on CentOS.) That's the start of the problem: dsadmin, as a regular non-privileged user, can't change ownership of files he doesn't own. Since this un-tarballed script is owned by games, the chown command fails.
I used your Dockerfiles and got the same issue on macOS.
You can get around this by, well, not using ADD. Which is funny because local tarball extraction is the one use case where Docker thinks you should prefer ADD over COPY.
COPY code/validation_utility.tar.gz ${HOMEDIR}/.
RUN tar -xvf validation_utility.tar.gz
This properly extracts the tarball and, since dsadmin was the user at the time, the contents come out properly owned by dsadmin.
(An uglier route might be to switch the USER to root to set permissions, then set it back to dsadmin. I think this is icky, but it's an option.)
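A sketch of that uglier route, using the same ARG values as the second Dockerfile above:
FROM validation:latest
ARG USER=dsadmin
ARG HOMEDIR=/home/${USER}
# Temporarily switch to root so the extracted files can be re-owned.
USER root
ADD code/validation_utility.tar.gz ${HOMEDIR}/.
RUN chown -R ${USER}:${USER} ${HOMEDIR} \
    && chmod +x ${HOMEDIR}/Validation_utility.py
# Drop back to the unprivileged user for anything that follows.
USER ${USER}
WORKDIR ${HOMEDIR}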
I am looking for a way to create multistage builds with python and Dockerfile:
For example, using the following images:
1st image: install all compile-time requirements, and install all needed python modules
2nd image: copy all compiled/built packages from the first image to the second, without the compilers themselves (gcc, postgres-dev, python-dev, etc.)
The final objective is to have a smaller image, running python and the python packages that I need.
In short: how can I 'wrap' all the compiled modules (site-packages / external libs) that were created in the first image, and copy them in a 'clean' manner, to the 2nd image.
OK, so my solution is using wheel: it lets us compile in the first image, create wheel files for all dependencies, and install them in the second image without installing the compilers.
FROM python:2.7-alpine as base
RUN mkdir /svc
COPY . /svc
WORKDIR /svc
RUN apk add --update \
postgresql-dev \
gcc \
musl-dev \
linux-headers
RUN pip install wheel && pip wheel . --wheel-dir=/svc/wheels
FROM python:2.7-alpine
COPY --from=base /svc /svc
WORKDIR /svc
RUN pip install --no-index --find-links=/svc/wheels -r requirements.txt
You can see my answer regarding this in the following blog post
https://www.blogfoobar.com/post/2018/02/10/python-and-docker-multistage-build
I recommend the approach detailed in this article (section 2). He uses virtualenv, so pip install stores all the Python code, binaries, etc. under one folder instead of spreading them all over the file system. Then it's easy to copy just that one folder to the final "production" image. In summary:
Compile image
Activate virtualenv in some path of your choosing.
Prepend that path to PATH via Docker's ENV instruction. That is all virtualenv needs in order to work for all future RUN and CMD actions.
Install system dev packages and pip install xyz as usual.
Production image
Copy the virtualenv folder from the Compile Image.
Prepend the virtualenv folder to docker's PATH
This is a place where using a Python virtual environment inside Docker can be useful. Copying a virtual environment normally is tricky since it needs to be the exact same filesystem path on the exact same Python build, but in Docker you can guarantee that.
(This is the same basic recipe @mpoisot describes in their answer and it appears in other SO answers as well.)
Say you're installing the psycopg PostgreSQL client library. The extended form of this requires the Python C development library plus the PostgreSQL C client library headers; but to run it you only need the PostgreSQL C runtime library. So here you can use a multi-stage build: the first stage installs the virtual environment using the full C toolchain, and the final stage copies the built virtual environment but only includes the minimum required libraries.
A typical Dockerfile could look like:
# Name the single Python image we're using everywhere.
ARG python=python:3.10-slim
# Build stage:
FROM ${python} AS build
# Install a full C toolchain and C build-time dependencies for
# everything we're going to need.
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install --no-install-recommends --assume-yes \
build-essential \
libpq-dev
# Create the virtual environment.
RUN python3 -m venv /venv
ENV PATH=/venv/bin:$PATH
# Install the Python library dependencies, including those with
# C extensions. They'll get installed into the virtual environment.
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
# Final stage:
FROM ${python}
# Install the runtime-only C library dependencies we need.
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install --no-install-recommends --assume-yes \
libpq5
# Copy the virtual environment from the first stage.
COPY --from=build /venv /venv
ENV PATH=/venv/bin:$PATH
# Copy the application in.
COPY . .
CMD ["./main.py"]
If your application uses a Python entry point script then you can do everything in the first stage: RUN pip install . will copy the application into the virtual environment and create a wrapper script in /venv/bin for you. In the final stage you don't need to COPY the application again. Set the CMD to run the wrapper script out of the virtual environment, which is already at the front of the $PATH.
Again, note that this approach only works because it is the same Python base image in both stages, and because the virtual environment is on the exact same path. If it is a different Python or a different container path the transplanted virtual environment may not work correctly.
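A sketch of that entry-point variant (the myapp console-script name is hypothetical; it would come from your own setup.py or pyproject.toml):
# Build stage, after creating /venv and installing requirements as above:
WORKDIR /app
COPY . .
# Installing the project itself puts the code and a myapp wrapper script
# into the virtual environment.
RUN pip install .

# Final stage: copy only the virtual environment; no separate COPY of the app.
COPY --from=build /venv /venv
ENV PATH=/venv/bin:$PATH
CMD ["myapp"]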
The docs on this explain exactly how to do this.
https://docs.docker.com/engine/userguide/eng-image/multistage-build/#before-multi-stage-builds
Basically you do exactly what you've said. The magic of the multi-stage build feature, though, is that you can do all of this from one Dockerfile.
ie:
FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
This builds a Go binary, then the next image runs the binary. The first image has all the build tools and the second is just a base Linux image that can run a binary.
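You still build it with a single command, and only the final stage's filesystem ends up in the tagged image (the tag href-counter here just mirrors the example's import path):
docker build -t href-counter .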