GCC error in python virtual environment in docker - python

I have a Dockerfile where I am creating a virtual environment with Python 2.7.15 and installing all the required Python dependencies for my project.
Some of the dependencies that need gcc to compile, like pandas and lz4, are failing with the error below:
pandas/io/sas/sas.c:4:20: fatal error: Python.h: No such file or directory
#include "Python.h"
^
compilation terminated.
error: command 'gcc' failed with exit status 1
I even tried installing python-devel and gcc in the Docker image, but that doesn't help:
RUN yum install -y python-pip python-devel gcc
Dockerfile:
FROM registry-access-redhat-com.repo.lab.pl.*-*.com/rhel7.5
# CONFIGURE YUM
RUN rm -f /etc/yum.repos.d/*
ADD resources/yum.repos.d/* /etc/yum.repos.d/
RUN echo "sslverify=false" >> /etc/yum.conf
# INSTALL REQUIRED SYSTEM PACKAGES
RUN yum install -y python-pip python-devel gcc && yum clean all && rm -rf /var/cache/yum
RUN yum install -y wget && yum clean all && rm -rf /var/cache/yum && wget http://repo.lab.pl.alcatel-lucent.com/eden-yum-releases/installation-packages-rpm/python-2.7.15-2.x86_64.rpm
RUN yum install -y python-2.7.15-2.x86_64.rpm && yum clean all && rm -rf /var/cache/yum
# DOWNLOAD LATEST PIP
RUN wget -P /tmp/ https://files.pythonhosted.org/packages/c2/d7/90f34cb0d83a6c5631cf71dfe64cc1054598c843a92b400e55675cc2ac37/pip-18.1-py2.py3-none-any.whl
#INSTALL PIP ON PYTHON 2.7.15
RUN LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install --find-links /tmp --upgrade --no-index /tmp/pip-18.1-py2.py3-none-any.whl
# CREATE foo GROUP AND USER
RUN groupadd foo
RUN useradd -d /home/foo -ms /bin/bash -g foo foo
# SETUP BASHRC for foo user
COPY file/.bashrc /home/foo
COPY file/.bash_profile /home/foo
RUN chown foo:foo /home/foo/.bash_profile
RUN chown foo:foo /home/foo/.bashrc
# SET WORKING DIRECTORY TO /home/foo
WORKDIR /home/foo
#CREATE VIRTUAL ENVIRONMENT
RUN wget -P /tmp/ https://files.pythonhosted.org/packages/e7/16/da8cb8046149d50940c6110310983abb359bbb8cbc3539e6bef95c29428a/setuptools-40.6.2-py2.py3-none-any.whl
RUN wget -P /tmp/ https://files.pythonhosted.org/packages/7c/17/9b7b6cddfd255388b58c61e25b091047f6814183e1d63741c8df8dcd65a2/virtualenv-16.1.0-py2.py3-none-any.whl
RUN LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install --find-links /tmp --upgrade --no-index /tmp/virtualenv-16.1.0-py2.py3-none-any.whl
RUN LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install --find-links /tmp --upgrade --no-index /tmp/setuptools-40.6.2-py2.py3-none-any.whl
RUN LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/virtualenv -p /usr/local/bin/python2.7 enet
RUN chown -R foo:foo /home/foo/enet
RUN export LD_LIBRARY_PATH=/usr/local/lib
RUN source /home/foo/enet/bin/activate
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install --find-links /tmp --upgrade --no-index /tmp/pip-18.1-py2.py3-none-any.whl
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install scipy
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install openpyxl
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install confluent-kafka
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install cython
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install pandas
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install lz4
How can I resolve this issue?

I guess you should:
Install the Python development files (as suggested in the comments).
Use a -I switch on the gcc command line to point to the correct path for the Python include files.
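For example, a minimal sketch, assuming the custom python-2.7.15 RPM puts its headers under /usr/local/include/python2.7 (adjust to wherever Python.h actually lands in your image): you can pass the include path to the compiler through CFLAGS when invoking pip, rather than editing gcc invocations by hand.
# Hypothetical fix: point the compiler at the headers of the custom Python build.
# The ls fails the build early if the headers are missing entirely.
RUN ls /usr/local/include/python2.7/Python.h
RUN CFLAGS="-I/usr/local/include/python2.7" LD_LIBRARY_PATH=/usr/local/lib \
    /home/foo/enet/bin/python2.7 -m pip install pandas
If Python.h exists nowhere in the image, the RPM was built without the development headers, and you need a package (or a source build) that actually ships them.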

Related

pip - is it possible to preinstall a lib, so that any requirements.txt that is run containing that lib won't need to re-build it?

So my scenario is that I'm trying to create a Dockerfile that I can build on my Mac for running spaCy in production. The production server contains an Nvidia GPU with CUDA. To get spaCy to use the GPU, I need the lib cupy-cuda117. That lib won't build on my Mac because it can't find a CUDA GPU. So what I'm trying to do is create an image on the Linux server that has the CUDA GPU, with cupy-cuda117 already pre-built in it. I'll then use that as the parent image for Docker, as all the other libs in my requirements.txt will build on my Mac.
My goal at the moment is to build that lib into the server image, but I'm not sure of the right path forward. Is it sudo pip3 install cupy-cuda117? Or should I create a venv and pip3 install cupy-cuda117? Basically my goal is later to add all the other app code and the full requirements.txt, and when pip3 install -r requirements.txt is run by Docker, it'll download/build/install everything, but not cupy-cuda117, because hopefully it'll see that it's already been built.
FYI, the handling of using the GPU on the prod server and the CPU on the dev computer I've already got sorted; it's just the building of that one package I'm stuck on. I basically just need it not to try to rebuild on my Mac. Thanks!
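One detail worth noting (my assumption, not from the original post): pip only skips a requirement when an already-installed distribution satisfies the requested specifier, so pinning the identical version in both places makes the skip predictable. A minimal sketch, with a hypothetical version number:
# In the base image built on the CUDA server, preinstall and pin the package
# (11.0.0 is a hypothetical version; use whichever one you actually install):
RUN pip3.9 install cupy-cuda117==11.0.0
# requirements.txt in the app repo then pins the same version, so
# `pip3 install -r requirements.txt` treats it as already satisfied:
# cupy-cuda117==11.0.0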
FROM "debian:bullseye-20210902-slim" as builder
# install build dependencies
RUN apt-get update -y && apt-get install --no-install-recommends -y build-essential git locales \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
WORKDIR "/app"
RUN apt update -y && apt upgrade -y && apt install -y sudo
# Install Python 3.9 reqs
RUN sudo apt install -y --no-install-recommends wget libxml2 libstdc++6 zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev libreadline-dev libffi-dev curl libbz2-dev
# Install Python 3.9
RUN wget --no-check-certificate https://www.python.org/ftp/python/3.9.1/Python-3.9.1.tgz && \
tar -xf Python-3.9.1.tgz && \
cd Python-3.9.1 && \
./configure --enable-optimizations && \
make -j $(nproc) && \
sudo make altinstall && \
cd .. && \
sudo rm -rf Python-3.9.1 && \
sudo rm -rf Python-3.9.1.tgz
# Install CUDA
RUN wget --no-check-certificate https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run && \
sudo chmod +x cuda_11.7.1_515.65.01_linux.run && \
sudo ./cuda_11.7.1_515.65.01_linux.run --silent --override --toolkit --samples --toolkitpath=/usr/local/cuda-11.7 --samplespath=/usr/local/cuda --no-opengl-libs && \
sudo ln -s /usr/local/cuda-11.7 /usr/local/cuda && \
sudo rm -rf cuda_11.7.1_515.65.01_linux.run
## Add NVIDIA CUDA to PATH and LD_LIBRARY_PATH ##
RUN echo 'case ":${PATH}:" in\n\
*:"/usr/local/cuda-11.7/lib64":*)\n\
;;\n\
*)\n\
if [ -z "${PATH}" ] ; then\n\
PATH=/usr/local/cuda-11.7/bin\n\
else\n\
PATH=/usr/local/cuda-11.7/bin:$PATH\n\
fi\n\
esac\n\
case ":${LD_LIBRARY_PATH}:" in\n\
*:"/usr/local/cuda-11.7/lib64":*)\n\
;;\n\
*)\n\
if [ -z "${LD_LIBRARY_PATH}" ] ; then\n\
LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64\n\
else\n\
LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH\n\
fi\n\
esac\n\
export PATH LD_LIBRARY_PATH\n\
export GLPATH=/usr/lib/x86_64-linux-gnu\n\
export GLLINK=-L/usr/lib/x86_64-linux-gnu\n\
export DFLT_PATH=/usr/lib\n'\
>> ~/.bashrc
ENV PATH="$PATH:/usr/local/cuda-11.7/bin"
ENV LD_LIBRARY_PATH="/usr/local/cuda-11.7/lib64"
ENV GLPATH="/usr/lib/x86_64-linux-gnu"
ENV GLLINK="-L/usr/lib/x86_64-linux-gnu"
ENV DFLT_PATH="/usr/lib"
RUN python3.9 -m pip install -U wheel setuptools
RUN sudo pip3.9 install torch torchvision torchaudio
RUN sudo pip3.9 install -U 'spacy[cuda117,transformers]'
# set runner ENV
ENV ENV="prod"
CMD ["bash"]
My local Dockerfile is this:
FROM myacct/myimg:latest
ENV ENV=prod
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
COPY ./requirements /code/requirements
RUN pip3 install --no-cache-dir -r /code/requirements.txt
COPY ./app /code/app
ENV ENV=prod
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]

Dockerfile installs newer Python instead of FROM source

I'm trying to build a new image from a Python Dockerfile, but it keeps installing Python 3.10 instead of Python 3.8.
My Dockerfile looks like this:
FROM python:3.8.16
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
## follow installation from xtb-python github:
ENV CONDA_DIR /opt/conda
RUN apt-get update && apt-get install -y \
wget \
&& rm -rf /var/lib/apt/lists/*
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh \
&& /bin/bash ~/miniconda.sh -b -p /opt/conda
ENV PATH=$CONDA_DIR/bin:$PATH
RUN conda config --add channels conda-forge \
&& conda install -y -c conda-forge ...
I don't know much about conda (but have to use it).
Is Conda messing with my Python, or did I do something wrong?
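A likely explanation (my reading, not confirmed in the post): ENV PATH=$CONDA_DIR/bin:$PATH puts Miniconda's bundled Python (currently 3.10) ahead of the image's Python 3.8.16, so python now resolves to conda's interpreter. One hedged workaround is to pin Python inside conda so the two agree:
# Sketch: force conda's base environment onto Python 3.8 before installing
# the other packages (package list elided, as in the original Dockerfile)
RUN conda config --add channels conda-forge \
    && conda install -y -c conda-forge python=3.8 \
    && conda install -y -c conda-forge ...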

How to run Chromedriver as REGULAR user? (Dockerfile configuration)

I am desperately trying to containerise my web scraping app (in Python). It uses Selenium.
I'm receiving an error which indicates that the chromedriver binary needs to run as a REGULAR user.
How can I configure this in a Dockerfile?
The error I'm getting is:
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable may have wrong permissions. Please see https://chromedriver.chromium.org/home
But when I read the ChromeDriver site, I can see the root cause is that the driver is being run as root, and it doesn't like that.
Here is my Dockerfile so far:
#FROM python:3.9-buster
FROM --platform=linux/amd64 python:3.10-buster
#FROM --platform=linux/arm64/v8 python:3.9-buster
# FROM --platform=linux/amd64 python:3.9
# FROM selenium/node-chrome
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
RUN apt-get update \
&& apt-get -y install gcc make \
&& rm -rf /var/lib/apt/lists/*s
# RUN apt-get update
RUN apt-get install -y xvfb
RUN apt-get install -y gconf-service libasound2 libatk1.0-0 libcairo2 libcups2 libfontconfig1 libgdk-pixbuf2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libxss1 fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils
RUN apt-get install -y chromium
## RUN apt-get install -y chromium-browser
RUN apt-get install chromium-driver
# RUN apt-get install -y google-chrome-stable
# install chrome
RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy install
# RUN dpkg -i google-chrome-stable_current_amd64.deb --fix-missing; apt-get -fy install
RUN wget https://chromedriver.storage.googleapis.com/2.41/chromedriver_linux64.zip
RUN apt-get install -yqq unzip
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
RUN apt-get -y update
RUN apt-get update && apt-get install -y wget bzip2 libxtst6 packagekit-gtk3-module libx11-xcb-dev libdbus-glib-1-2 libxt6 libpci-dev && rm -rf /var/lib/apt/lists/*
RUN export PATH=$PATH:'/usr/local/bin/chromedriver'
#download and install chrome
RUN apt update -y
RUN apt install -y google-chrome-stable
RUN apt-get install -y google-chrome-stable
RUN python3 --version
RUN pip3 --version
RUN pip install --no-cache-dir --upgrade pip
RUN apt-get install -y libglib2.0-0 libnss3 libgconf-2-4 libfontconfig1
# COPY chromedriver "/usr/local/bin"
#install python dependencies
COPY requirements.txt requirements.txt
RUN pip install -r ./requirements.txt
#some envs
ENV APP_HOME /app
#set workspace
WORKDIR ${APP_HOME}
RUN chmod -x "/usr/bin/google-chrome"
RUN chmod -x "/usr/local/bin/chromedriver"
#copy local files
COPY . ${APP_HOME}
CMD ["python", "/app/main.py"]

Install talib on docker

FROM python:3
USER root
RUN apt-get update
RUN apt-get -y install locales && \
localedef -f UTF-8 -i ja_JP ja_JP.UTF-8
RUN wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz && \
tar -xvzf ta-lib-0.4.0-src.tar.gz && \
cd ta-lib/ && \
./configure --prefix=/usr && \
make && \
make install
RUN pip install TA-Lib
RUN rm -R ta-lib ta-lib-0.4.0-src.tar.gz
ENV LANG ja_JP.UTF-8
ENV LANGUAGE ja_JP:ja
ENV LC_ALL ja_JP.UTF-8
ENV TZ JST-9
ENV TERM xterm
ADD . /code
WORKDIR /code
RUN apt-get install -y vim less
RUN pip install --upgrade pip
RUN pip install --upgrade setuptools
RUN pip install -r requirements.txt
version: "3"
services:
python3:
restart: always
build: .
container_name: "binancepython3"
working_dir: /root/
tty: true
volumes:
- ./opt:/root/opt
pandas
requests
ccxt == 1.81.77
I'm trying to install TA-Lib on Docker, but I got an error like the one below. Could you teach me how to solve it?
Is the problem caused by the environment? Should I use Anaconda instead of python:3?
#7 3.276 configure: error: cannot guess build type; you must specify one
------
executor failed running [/bin/sh -c wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz && tar -xvzf ta-lib-0.4.0-src.tar.gz && cd ta-lib/ && ./configure --prefix=/usr && make && make install]: exit code: 1
ERROR: Service 'python3' failed to build : Build failed
Add the --build flag to the configure line:
./configure --build=aarch64-unknown-linux-gnu
More info: https://stackoverflow.com/a/68025766/1145929
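Applied to the Dockerfile above, the relevant RUN becomes something like this sketch (assuming an ARM64 host such as Apple Silicon; on x86_64 the triplet would be x86_64-unknown-linux-gnu instead):
RUN wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz && \
    tar -xvzf ta-lib-0.4.0-src.tar.gz && \
    cd ta-lib/ && \
    ./configure --prefix=/usr --build=aarch64-unknown-linux-gnu && \
    make && \
    make install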

ModuleNotFoundError: No module named 'pyspark' while running PySpark in Docker

Getting the error:
Traceback (most recent call last):
  File "/opt/application/main.py", line 6, in <module>
    from pyspark import SparkConf, SparkContext
ModuleNotFoundError: No module named 'pyspark'
while running PySpark in Docker.
And my Dockerfile is as follows:
FROM centos
ENV DAEMON_RUN=true
ENV SPARK_VERSION=2.4.7
ENV HADOOP_VERSION=2.7
WORKDIR /opt/application
RUN yum -y install python36
RUN yum -y install wget
ENV PYSPARK_PYTHON python3.6
ENV PYSPARK_DRIVER_PYTHON python3.6
RUN ln -s /usr/bin/python3.6 /usr/local/bin/python
RUN wget https://bootstrap.pypa.io/get-pip.py
RUN python get-pip.py
RUN pip3.6 install numpy
RUN pip3.6 install pandas
RUN wget --no-verbose http://apache.mirror.iphh.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
&& mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark \
&& rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
ENV SPARK_HOME=/usr/local/bin/spark
RUN yum -y install java-1.8.0-openjdk
ENV JAVA_HOME /usr/lib/jvm/jre
COPY main.py .
RUN chmod +x /opt/application/main.py
CMD ["/opt/application/main.py"]
You forgot to install pyspark in your Dockerfile.
FROM centos
ENV DAEMON_RUN=true
ENV SPARK_VERSION=2.4.7
ENV HADOOP_VERSION=2.7
WORKDIR /opt/application
RUN yum -y install python36
RUN yum -y install wget
ENV PYSPARK_PYTHON python3.6
ENV PYSPARK_DRIVER_PYTHON python3.6
RUN ln -s /usr/bin/python3.6 /usr/local/bin/python
RUN wget https://bootstrap.pypa.io/get-pip.py
RUN python get-pip.py
RUN pip3.6 install numpy
RUN pip3.6 install pandas
RUN pip3.6 install pyspark # add this line.
RUN wget --no-verbose http://apache.mirror.iphh.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
&& mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark \
&& rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
ENV SPARK_HOME=/usr/local/bin/spark
RUN yum -y install java-1.8.0-openjdk
ENV JAVA_HOME /usr/lib/jvm/jre
COPY main.py .
RUN chmod +x /opt/application/main.py
CMD ["/opt/application/main.py"]
Edit: Dockerfile improvement:
FROM centos
ENV DAEMON_RUN=true
ENV SPARK_VERSION=2.4.7
ENV HADOOP_VERSION=2.7
WORKDIR /opt/application
RUN yum -y install python36 wget java-1.8.0-openjdk # you can install python36, wget and java in one command
ENV PYSPARK_PYTHON python3.6
ENV PYSPARK_DRIVER_PYTHON python3.6
RUN ln -s /usr/bin/python3.6 /usr/local/bin/python
RUN wget https://bootstrap.pypa.io/get-pip.py \
&& python get-pip.py \
&& pip3.6 install numpy==1.19 pandas==1.1.5 pyspark==3.0.2 # you should also pin the version you need, pandas 1.2.x does not support python 3.6
RUN wget --no-verbose http://apache.mirror.iphh.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
&& mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark \
&& rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
ENV SPARK_HOME=/usr/local/bin/spark
ENV JAVA_HOME /usr/lib/jvm/jre
COPY main.py .
RUN chmod +x /opt/application/main.py
CMD ["/opt/application/main.py"]
