Python (pip) does not install the Elasticsearch library on Kubernetes

I'm writing a Python application that pushes some results to Elasticsearch.
I've written a Dockerfile to build it and am deploying it on Kubernetes.
Everything works without any problem on my local machine when I execute docker run.
The application runs and pushes data to Elasticsearch.
But when I run it on Kubernetes, I get the error below:
Traceback (most recent call last):
  File "application.py", line 2, in <module>
    from elasticsearch import Elasticsearch
ModuleNotFoundError: No module named 'elasticsearch'
I'm installing elasticsearch using pip.
Dockerfile:
FROM python:3.7.3-alpine
RUN apk update && apk upgrade && apk add gcc libc-dev g++ libffi-dev libxml2 unixodbc-dev mariadb-dev postgresql-dev \
python-dev vim
RUN addgroup -S -g 1000 docker \
&& adduser -D -S -h /var/cache/docker -s /sbin/nologin -G docker -u 1000 docker \
&& chown docker:docker -R /usr/local/lib/python3.7/site-packages/
WORKDIR /app/
COPY application.py /app/
COPY lib.txt /app/
RUN chown docker:docker -R /app/
USER docker
# Install the dependencies
RUN ["pip", "install", "-r", "lib.txt", "--user"]
ENV PYTHONPATH=/usr/local/lib/python2.7/site-packages
RUN echo $PYTHONPATH
CMD [ "python", "application.py"]
lib.txt
Flask==1.0.2
prometheus_client>=0.6.0
requests>=2.21.0
six>=1.12.0
# Elasticsearch 7.x
elasticsearch>=7.0.0,<8.0.0
pyodbc
As suggested in one answer, I'm also setting PYTHONPATH in the Dockerfile.
Any suggestions on what I'm missing?
Example code here.
Thanks

Try with this Dockerfile:
FROM python:3.7.3-alpine
RUN apk update && apk upgrade && apk add gcc libc-dev g++ libffi-dev libxml2 unixodbc-dev mariadb-dev postgresql-dev \
python-dev vim
# Install the dependencies
RUN pip install --upgrade pip
RUN mkdir /app
COPY lib.txt /app/lib.txt
RUN pip install -r /app/lib.txt
RUN addgroup -S -g 1000 docker \
&& adduser -D -S -h /var/cache/docker -s /sbin/nologin -G docker -u 1000 docker
WORKDIR /app
COPY application.py /app/
RUN chown docker:docker -R /app/
USER docker
CMD [ "python", "application.py"]
Changes:
Upgraded pip before installing the dependencies. This removes some warnings during the build and keeps pip at the latest version whenever the image is built.
Installed the pip packages system-wide, while the build is still running as root.
Removed PYTHONPATH, which pointed to the wrong place (the python2.7 site-packages, while the image runs Python 3.7).
Removed the unnecessary ownership change on site-packages.
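To see why the import fails in the pod but not locally, it can help to print where the interpreter actually looks for packages. The following is a minimal diagnostic sketch, assuming it runs with the same user and interpreter as application.py; the function name describe_import_paths is illustrative and not part of any library.

```python
import importlib.util
import os
import site
import sys


def describe_import_paths(module_name="elasticsearch"):
    """Collect the facts that decide whether `import module_name` can work."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python_version": "%d.%d" % sys.version_info[:2],
        "uid": os.getuid(),
        "home": os.path.expanduser("~"),
        # Directory that `pip install --user` targets for the current user.
        "user_site": site.getusersitepackages(),
        "user_site_enabled": site.ENABLE_USER_SITE,
        "pythonpath": os.environ.get("PYTHONPATH"),
        # None means the interpreter cannot locate the module at all.
        "module_found_at": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_import_paths().items():
        print(f"{key}: {value}")
```

Running it once under docker run and once inside the pod shows whether the UID, home directory, or user site directory differ between the two environments. Packages installed with pip install --user live under the installing user's home, so a different runtime user or home directory would not see them; whether that is what happens in this particular cluster is an assumption to verify.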

Related

How to use GITHUB_TOKEN in pip's requirements.txt without setting it as env variable in Dockerfile?

I have a private repo that can be installed via Python's pip:
requirements.txt
git+https://${GITHUB_TOKEN}@github.com/MY_ACCOUNT/MY_REPO.git
And a Dockerfile:
Dockerfile
FROM python:3.8.11
RUN apt-get update && \
apt-get -y install gcc curl && \
rm -rf /var/lib/apt/lists/*
ARG GITHUB_TOKEN
COPY ./requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
It worked perfectly when I built the image:
$ docker build . --build-arg GITHUB_TOKEN=THIS_IS_MY_GITHUB_TOKEN -t wow/my_app:latest
But when I inspected the image, it showed GITHUB_TOKEN in the Cmd section:
$ docker image inspect wow/my_app:latest
...
"ContainerConfig": {
...
"Cmd": [
"|1",
"GITHUB_TOKEN=THIS_IS_MY_GITHUB_TOKEN", # Here!
"/bin/sh",
"-c",
"pip install -r /tmp/requirements.txt"
],
...
},
...
I think this could lead to a security problem. How can I solve this so that no credential information appears in docker inspect?
If you build your image using BuildKit, you can take advantage of Docker build secrets.
You would structure your Dockerfile something like this:
FROM python:3.8.11
RUN apt-get update && \
apt-get -y install gcc curl && \
rm -rf /var/lib/apt/lists/*
COPY ./requirements.txt /tmp/requirements.txt
RUN --mount=type=secret,id=GITHUB_TOKEN \
GITHUB_TOKEN=$(cat /run/secrets/GITHUB_TOKEN) \
pip install -r /tmp/requirements.txt
And then if you have a GITHUB_TOKEN environment variable in your local environment, you could run:
docker buildx build --secret id=GITHUB_TOKEN -t myimage .
Or if you have the value in a file, you could run:
docker buildx build \
--secret id=GITHUB_TOKEN,src=github_token.txt \
-t myimage .
In either case, the setting will not be baked into the resulting image. See the linked documentation for more information.
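One way to double-check this after a build is to search the image metadata for the token. Below is a minimal sketch, assuming you have the JSON printed by docker image inspect available as text; find_secret is a hypothetical helper, and the sample data mirrors the output quoted in the question.

```python
import json


def find_secret(node, secret, path="$"):
    """Return the JSON paths of every string value that contains `secret`."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits.extend(find_secret(value, secret, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_secret(value, secret, f"{path}[{index}]"))
    elif isinstance(node, str) and secret in node:
        hits.append(path)
    return hits


# Sample shaped like the `docker image inspect` output quoted in the question.
sample = json.loads("""
[{"ContainerConfig": {"Cmd": ["|1", "GITHUB_TOKEN=THIS_IS_MY_GITHUB_TOKEN",
  "/bin/sh", "-c", "pip install -r /tmp/requirements.txt"]}}]
""")

print(find_secret(sample, "THIS_IS_MY_GITHUB_TOKEN"))
```

For the image built with --build-arg, the helper reports the Cmd entry that leaks the token. For an image built with a secret mount, an empty list would be the expected result.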

Django server does not start, the site does not work when starting dockerfile

I'm practicing writing Dockerfiles, and I can't understand why this one doesn't work.
The Python Django project is on GitHub:
https://github.com/BrianRuizy/covid19-dashboard
This is my current Dockerfile. If you know how to write Dockerfiles, please help me figure out what my mistake is.
FROM python:3.8-slim
RUN apt-get update && \
apt-get install -y --no-install-recommends build-essential
ENV PYTHONUNBUFFERED=1
ADD requirements.txt /
RUN pip install -r /requirements.txt
EXPOSE 8000
CMD ["python", "manage.py", "runserver", "127.0.0.1:8000"]
You never copied your files into the container. An example would look something like this. It also avoids running as the superuser.
FROM python:3.8-slim
#INSTALL REQUIREMENTS
RUN apt-get update
RUN apt-get -y install default-libmysqlclient-dev gcc wget
# create the app user
RUN useradd -u 1000 -ms /bin/bash app && mkdir /app && chown -R app:app /app
# copy the requirements
COPY --chown=app:app ./requirements.txt /app/requirements.txt
COPY --chown=app:app ./deploy-requirements.txt /app/deploy-requirements.txt
WORKDIR /app
# and install them
RUN pip install -r requirements.txt && pip install -r deploy-requirements.txt
#####
#
# everything below runs as the 'app' user for security reasons
#
#####
USER app
#COPY APP
COPY --chown=app:app . /app
WORKDIR /app
#RUN
ENTRYPOINT gunicorn -u app -g app --bind 0.0.0.0:8000 PROJECT.asgi:application -k uvicorn.workers.UvicornWorker

How to copy Flask dependencies from one stage to the next one in a Dockerfile?

I am learning about Docker and I have a Dockerfile with a simple app such as this:
FROM python:3.8-alpine
WORKDIR /code
ENV FLASK_APP App.py
ENV FLASK_RUN_HOST 0.0.0.0
ENV FLASK_RUN_PORT :3001
RUN apk update \
&& apk add --virtual build-deps gcc python3-dev musl-dev \
&& apk add --no-cache mariadb-dev
COPY ./myapp/requirements.txt requirements.txt
RUN pip install --no-cache-dir -vv -r requirements.txt
ADD ./myapp .
EXPOSE 3001
CMD ["flask", "run"]
I want to use a multi-stage build to get a smaller image, so after reading https://pythonspeed.com/articles/multi-stage-docker-python/ I have changed my Dockerfile to this:
FROM python:3.8-alpine as builder
COPY ./myapp/requirements.txt requirements.txt
RUN apk update \
&& apk add --virtual build-deps gcc python3-dev musl-dev \
&& apk add --no-cache mariadb-dev
RUN pip install --user -r requirements.txt
FROM python:3.8-alpine
ADD ./myapp .
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local:$PATH
ENV FLASK_APP App.py
ENV FLASK_RUN_HOST 0.0.0.0
ENV FLASK_RUN_PORT 3000
CMD ["python", "-m", "flask", "run"]
But when I run the container, I get an error saying the MySQL DB dependency is not installed. It is listed in requirements.txt, and it works with the first Dockerfile, so I don't know what I'm missing. If I understand correctly, the COPY step in the second stage should copy the dependencies installed in the first stage, right? This is the output I get when I try to start the container:
Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/MySQLdb/__init__.py", line 18, in <module>
    from . import _mysql
ImportError: Error loading shared library libmariadb.so.3: No such file or directory (needed by /root/.local/lib/python3.8/site-packages/MySQLdb/_mysql.cpython-38-x86_64-linux-gnu.so)
apk add --no-cache mariadb-dev also installs the MariaDB shared libraries, which you don't install in the final image. Their absence is the cause of the error you get.
Is MySQL support being installed from requirements.txt, or is it installed by apk (MariaDB)? If the latter, that's what is missing in the second image: it's not installed by pip install --user under .local; it's installed system-wide in the first image but not in the second.
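To confirm which case applies, you can ask the dynamic loader directly from inside the final image. This is a minimal sketch, assuming it is run with the final stage's Python interpreter; can_load is a hypothetical helper name, and libmariadb.so.3 is taken from the ImportError above.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and load `library_name`."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the shared object (or one of its dependencies) is missing.
        return False
    return True


if __name__ == "__main__":
    # libmariadb.so.3 is the library named in the ImportError above.
    print("libmariadb.so.3 loadable:", can_load("libmariadb.so.3"))
```

If this prints False in the final stage but True in the builder stage, the Python packages were copied correctly and only the system library is missing, so the final stage also needs the package that provides it. On Alpine that is commonly mariadb-connector-c, but treat the package name as an assumption and check it with apk.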

Dockerfile for Python Django failing on COPY

I have a Dockerfile that fails on build with the error:
COPY failed: stat /var/lib/docker/tmp/docker-builder476469130/requirements.txt: no such file or directory
The error occurs on the COPY line for the requirements.txt file. I use a pretty standard Dockerfile:
FROM python:3.6.7-slim
# Version: 1.4
# Dockerfile to build the coroner container.
# Install Python and Package Libraries
RUN apt-get update && apt-get upgrade -y && apt-get autoremove && apt-get autoclean
RUN apt-get install -y \
libffi-dev \
libssl-dev \
default-libmysqlclient-dev \
libxml2-dev \
libxslt-dev \
libjpeg-dev \
libfreetype6-dev \
zlib1g-dev \
net-tools \
nano
ARG PROJECT=coroner
ARG PROJECT_DIR=/var/www/${PROJECT}
WORKDIR $PROJECT_DIR
ENV PYTHONUNBUFFERED 1
RUN mkdir -p $PROJECT_DIR
COPY requirements.txt $PROJECT_DIR/requirments.txt
RUN pip install --upgrade pip
RUN pip install -r $PROJECT_DIR/requirements.txt
EXPOSE 8888
STOPSIGNAL SIGINT
ENTRYPOINT ["python", "manage.py"]
CMD ["runserver", "0.0.0.0:8888"]
I am bashing my head against this and have been praying at the church of Google for a while now. I have checked the build context and it seems to be correct. My build command is:
sudo docker build -t coroner:dev .
Docker version: 19.03.6, build 369ce74a3c
Can somebody put me out of my misery, please?
You've got a typo in the destination of the COPY line: you've written 'requirments.txt' instead of 'requirements.txt'.
However, because you're simply copying the file to the directory you've specified as your WORKDIR, you can just do:
COPY requirements.txt .
The file will then be copied into the current working directory.

How to minimize a Python 3.7 application Docker image

I would like to dockerize a Python program with this Dockerfile:
FROM python:3.7-alpine
COPY requirements.pip ./requirements.pip
RUN python3 -m pip install --upgrade pip
RUN pip install -U setuptools
RUN apk update
RUN apk add --no-cache --virtual .build-deps gcc python3-dev musl-dev openssl-dev libffi-dev g++ && \
python3 -m pip install -r requirements.pip --no-cache-dir && \
apk --purge del .build-deps
ARG APP_DIR=/app
RUN mkdir -p ${APP_DIR}
WORKDIR ${APP_DIR}
COPY app .
ENTRYPOINT [ "python3", "run.py" ]
and this is my requirements.pip file:
pysher~=0.5.0
redis~=2.10.6
flake8~=3.5.0
pandas==0.23.4
Because of pandas, the Docker image is 461 MB; without pandas it is 131 MB.
I was wondering how to make it smaller, so I built a binary from my application using:
pyinstaller run.py --onefile
It built a 38 MB binary. When I run it, it works fine. So I built a Docker image from this Dockerfile:
FROM alpine:3.4
ARG APP_DIR=/app
RUN mkdir -p ${APP_DIR}
WORKDIR ${APP_DIR}
COPY app/dist/run run
ENTRYPOINT [ "/bin/sh", "/app/run" ]
Basically, I just copied my run binary into the /app directory. It looks fine, and the image is just 48.8 MB. When I run the container, I receive an error:
$ docker run --rm --name myapp myminimalimage:latest
/app/run: line 1: syntax error: unexpected "("
Then I thought there might be a problem with sh, so I installed bash by adding 3 lines to the Dockerfile:
RUN apk update
RUN apk upgrade
RUN apk add bash
The image was built, but when I run it there is an error again:
$ docker run --rm --name myapp myminimalimage:latest
/app/run: /app/run: cannot execute binary file
My questions:
Why is the image in the first step so big? Can I minimize the size somehow, for example by choosing what to install from the pandas package?
Why does my binary work fine on my system (Kubuntu 18.10) but not run on alpine:3.4? Should I use another image or install something to run it?
What is the best way to build a minimal image with my app? One of the approaches mentioned above, or is there another way?
On sizes, make sure you always pass --no-cache-dir when using pip (you use it once, but not in the other cases). Similarly, combine the apk invocations and make sure the last step clears the apk cache so it never gets frozen into an image layer. For example, replacing your three separate RUNs with RUN apk update && apk upgrade && apk add bash && rm -rf /var/cache/apk/* achieves the same effect in a single layer that doesn't keep the apk cache around.
Example:
FROM python:3.7-alpine
COPY requirements.pip ./requirements.pip
# Avoid pip cache, use consistent command line with other uses, and merge simple layers
RUN python3 -m pip install --upgrade --no-cache-dir pip && \
python3 -m pip install --upgrade --no-cache-dir setuptools
# Combine update and add into same layer, clear cache explicitly at end
RUN apk update && apk add --no-cache --virtual .build-deps gcc python3-dev musl-dev openssl-dev libffi-dev g++ && \
python3 -m pip install -r requirements.pip --no-cache-dir && \
apk --purge del .build-deps && rm -rf /var/cache/apk/*
Don't expect it to do much (you already used --no-cache-dir on the big pip operation), but it's something. pandas is a huge monolithic package, dependent on other huge monolithic packages; there is a limit to what you can accomplish here.
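To see how much of the image pandas and its dependencies actually account for, you can measure the installed packages from inside the container. This is a rough sketch, assuming the packages were installed into the interpreter's default site-packages directory; directory_size and largest_packages are illustrative helper names.

```python
import os
import sysconfig


def directory_size(path):
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_packages(site_packages, top=10):
    """Return (name, bytes) for the biggest top-level entries, largest first."""
    sizes = []
    for entry in os.listdir(site_packages):
        full = os.path.join(site_packages, entry)
        if os.path.isdir(full):
            sizes.append((entry, directory_size(full)))
        elif os.path.isfile(full):
            sizes.append((entry, os.path.getsize(full)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    # "purelib" is the site-packages directory of the running interpreter.
    site_packages = sysconfig.get_paths()["purelib"]
    for name, size in largest_packages(site_packages):
        print(f"{size / 1_000_000:8.1f} MB  {name}")
```

The top entries show which packages dominate the layer created by the pip install step, which gives a realistic idea of how much trimming the Dockerfile can still achieve.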
Keep in mind that if you don't use Alpine, you won't need a compiler, since you can just use wheels. This makes everything simpler... e.g. you don't need to install and then uninstall compilers. Slightly bigger, but only slightly.
(See here for more about why I'm not a fan of Alpine Linux: https://pythonspeed.com/articles/base-image-python-docker-images/)
