How can I install python packages in Docker? - python

I am trying to run a model written in Python and packaged in Docker. The model is available here: https://github.com/pennsignals/chime_sims
The model has a requirements.txt file and a number of Python files. The requirements file is supposed to install a number of Python packages, including configargparse. The Dockerfile runs, but when it gets to the Python code that uses configargparse, I get an error that no such package exists.
I added a line that I think should install it:
RUN pip -m install --upgrade configargparse
and then I import it in the Python code. But I still get an error stating that no such package exists.
Here is the Docker code:
FROM continuumio/miniconda3
WORKDIR /chime_sims
RUN conda update -yq -n base -c defaults conda
RUN conda create -yq -n chime_sims python=3.7 pip pandas matplotlib scipy numpy seaborn
COPY ./requirements.txt .
FROM python
RUN python -m pip install --upgrade pip
RUN pip -m install --upgrade configargparse
RUN pip freeze > requirements.txt .
RUN py -m pip install -r requirements.txt
And here is the list from requirements.txt:
ConfigArgParse
configargparse
gitpython
seaborn
numpy
pandas
gvar
lsqfit
I suspect I'll have problems installing gvar and lsqfit if I ever get configargparse to work. I had trouble installing those packages outside of docker, but it finally worked.
Any ideas on how I can fix it?
thank you,
i.
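One way to narrow this down is to check, from inside the container, which interpreter is actually running and whether it can see the package. This is a minimal sketch, assuming it is run with the same python that runs the model; the module names are taken from the requirements.txt above, and the helper name is made up for illustration.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # The interpreter path shows which Python (conda env, system, ...) is in use.
    print("interpreter:", sys.executable)
    # Module names assumed from requirements.txt; adjust as needed.
    for name in ("configargparse", "gvar", "lsqfit"):
        print(name, "importable:", is_importable(name))
```

Since each FROM line in a Dockerfile starts a new build stage, the check is only meaningful when run in the final stage of the image.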

Related

Great expectations installation to AWS EMR

I am trying to use Great Expectations for data quality checks.
I run my jobs on an AWS EMR cluster, and I am trying to launch the Great Expectations job on EMR as well.
I have a bootstrap script that installs dependencies on the cluster. It looks like this:
#!/bin/bash
sudo yes | sudo yum install python3-devel
sudo python3 -m pip install --upgrade pip
sudo python3 -m pip install cython
sudo python3 -m pip install boto3==1.26.37
sudo python3 -m pip install great-expectations==0.15.36
According to the log output, all dependencies were installed correctly, but when the job started I got the following error:
ImportError: this version of pandas is incompatible with numpy < 1.17.3
your numpy version is 1.16.5.
Please upgrade numpy to >= 1.17.3 to use this pandas version
I tried to uninstall numpy and install it manually via pip in the bootstrap script, like this, but it didn't help:
sudo python3 -m pip uninstall --yes numpy
sudo python3 -m pip install numpy==1.17.3
I don't understand why this happens.
Using a newer EMR release solved the problem.
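To see which numpy version the job's interpreter actually picks up, the installed version can be compared against the minimum from the error message. This is a rough sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper names are made up for illustration, and the version parsing is deliberately simplistic.

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn '1.17.3' into (1, 17, 3); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two dotted version strings numerically."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(dist_name: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("numpy")
    # 1.17.3 is the minimum quoted in the ImportError above.
    if found is None:
        print("numpy is not installed for this interpreter")
    else:
        print("numpy", found, "ok:", meets_minimum(found, "1.17.3"))
```

Running this with the same python3 that the job uses shows whether the bootstrap script's pip install reached that interpreter at all.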

Dockerfile: pip install fails with requirements.txt (but succeeds with individual packages)

I'm trying to install some packages in a docker container, and there is a problem when installing from a requirements.txt file. This line:
RUN python3.8 -m pip install -r requirements.txt
fails with the error:
...
Collecting torch
Downloading torch-1.8.0-cp38-cp38-manylinux1_x86_64.whl (735.5 MB)
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
torch from https://files.pythonhosted.org/packages/89/c1/72e9050d3e31e4df983f6e06799a1a4c896427c1e5645a6d810940944b60/torch-1.8.0-cp38-cp38-manylinux1_x86_64.whl#sha256=fa1e391cca3937d5dea31f31a1a80a01bd4a8062c039448c254bbf5a58eb0787 (from -r requirements.txt (line 3)):
Expected sha256 fa1e391cca3937d5dea31f31a1a80a01bd4a8062c039448c254bbf5a58eb0787
Got d5466637c17c3ae0c81c00d93a0b7c8d8428cfd216f54953a11d0788ea7b74fb
The requirements.txt file is the following:
numpy
opencv-python
torch
However, when installing these packages one at a time everything works fine:
RUN python3.8 -m pip install numpy
RUN python3.8 -m pip install opencv-python
RUN python3.8 -m pip install torch
Any ideas how to solve this?
*** EDIT ***
Dockerfile up to that point:
FROM public.ecr.aws/lambda/python:3.8
COPY requirements.txt ./
You could try a couple of things. Depending on your base image, you could run pip install in this way:
RUN pip install -r requirements.txt
Another option is to pin the versions in your requirements.txt. Then you can be sure you have compatible versions, which is good practice in general. E.g.:
torch==1.8.0
Try building the image again without the cache:
docker build --no-cache .
Or you could check this answer:
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. when updating Django
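The error in the question reports an expected and an actual sha256 for the downloaded wheel. Below is a minimal sketch for checking a file by hand, assuming the wheel has already been downloaded to a local path. The function names and the path in the usage comment are hypothetical, and this is not how pip itself verifies hashes.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Hypothetical usage, with the expected value copied from the pip error:
# matches_expected("torch-1.8.0-cp38-cp38-manylinux1_x86_64.whl",
#                  "fa1e391cca3937d5dea31f31a1a80a01bd4a8062c039448c254bbf5a58eb0787")
```

If the digest of a fresh download matches the expected value, a stale or truncated cached copy is a plausible cause of the mismatch.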

Azure ML Environment: install a package from a file?

I'm building an Environment object in the Azure Machine Learning service using the Python SDK, and everything is working fine except one Python package that installs from a URL. I'm wondering how to deal with it. This works:
my_env = Environment.from_conda_specification("trident", './environment.yml')
..but the Docker build fails on one of the packages, which installs from a file.
ERROR: Could not find a version that satisfies the requirement detectron2==0.1.3+cu101 (from -r /azureml-environment-setup/condaenv.s5fi23rw.requirements.txt (line 7)) (from versions: none)
ERROR: No matching distribution found for detectron2==0.1.3+cu101 (from -r /azureml-environment-setup/condaenv.s5fi23rw.requirements.txt (line 7))
Here's how I would install that package manually:
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html
and I have another package that should install from github, like this:
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
I'm pretty ignorant about yaml files: is there a way to include complicated syntax like that in a yaml file?
I'm hoping to not have to re-build the environment locally and install from it (which is an alternative option), because I would have to reinstall CUDA to do so.
Thanks
Updating because who likes downvotes and someone might find this useful.
AML uses the same spec for Conda environment files as described here: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#create-env-file-manually
OP could have applied something like:
# run: conda env create --file environment.yml
name: test-env
dependencies:
  - python>=3.5
  - anaconda
  - pip
  - pip:
    # works for regular pip packages
    - docx
    - gooey
    # for github
    - git+https://github.com/facebookresearch/detectron2.git
    # and for wheels
    - https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/index.html
However, I found it much easier to use a Docker image to load Detectron2 onto a container for AzureML because of CUDA/CuDNN compatibility fun.
FROM mcr.microsoft.com/azureml/base-gpu:openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04
RUN apt update && apt install git -y && rm -rf /var/lib/apt/lists/*
RUN /opt/miniconda/bin/conda update -n base -c defaults conda
RUN /opt/miniconda/bin/conda install -y cython=0.29.15 numpy=1.18.1
# Install cocoapi, required for drawing bounding boxes
RUN git clone https://github.com/cocodataset/cocoapi.git && cd cocoapi/PythonAPI && python setup.py build_ext install
RUN pip install --user tensorboard cython
RUN pip install --user torch==1.5+cu101 torchvision==0.6+cu101 -f https://download.pytorch.org/whl/torch_stable.html
RUN pip install azureml-defaults
RUN pip install azureml-dataprep[fuse]
RUN pip install pandas pyarrow
RUN pip install opencv-python-headless
RUN pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/index.html

Pip install shap==0.25.2 fails with a weird error

I have a requirements.txt which includes a line for SHAP:
shap==0.25.2
This requirements.txt gets copied into the Docker image, and when I run
pip install -r requirements.txt
it fails with the error
Files/Directory not found in /tmp/pip-install-<>/shap/pip-egg-info
I am able to install shap==0.25.2 separately.
Even in the Dockerfile, if I remove shap from requirements.txt and add it separately as RUN pip install shap==0.25.2, it works.
What am I missing or doing wrong here?
ENV
Python version 3.6.4
Docker version 1.18

pip fails to install packages from requirements.txt

I am trying to install a python software using the requirements file.
>> cat requirements.txt
Cython==0.15.1
numpy==1.6.1
distribute==0.6.24
logilab-astng==0.23.1
logilab-common==0.57.1
netaddr==0.7.6
numexpr==2.0.1
ply==2.5
pycallgraph==0.5.1
pyflowtools==0.3.4.1
pylint==0.25.1
tables==2.3.1
wsgiref==0.1.2
So I create a virtual environment
>> mkvirtualenv parser
(parser)
>> pip freeze
distribute==0.6.24
wsgiref==0.1.2
(parser)
>> pip install -r requirements.txt
... and then the packages are downloaded but not installed, with errors: http://pastie.org/4079800
(parser)
>> pip freeze
distribute==0.6.24
wsgiref==0.1.2
Surprisingly, if I try to manually install each package, they install just fine.
For instance:
>> pip install numpy==1.6.1
(parser)
>> pip freeze
distribute==0.6.24
wsgiref==0.1.2
numpy==1.6.1
I am lost. What is going on?
PS: I am using pip v1.1 and python v2.7.2 with virtualenv and virtualenvwrapper
It looks like the numexpr package has an install-time dependency on numpy. Pip makes two passes through your requirements: first it downloads all packages and runs each one's setup.py to get its metadata, and then it installs them all in a second pass.
So, numexpr is trying to import from numpy in its setup.py, but when pip first runs numexpr's setup.py, it has not yet installed numpy.
This is also why you don't see this error when you install the packages one by one: if you install them one at a time, numpy will be fully installed in your environment before you pip install numexpr.
The only solution is to run pip install numpy before you ever run pip install -r requirements.txt; you won't be able to do this in a single command with a single requirements.txt file.
More info here: https://github.com/pypa/pip/issues/25
I came across a similar issue and ended up with the following:
cat requirements.txt | sed -e '/^\s*#.*$/d' -e '/^\s*$/d' | xargs -n 1 python -m pip install
That reads requirements.txt line by line and runs pip on each entry. I can't find where I originally got this from, so apologies for that, but here is some background:
How sed works: https://howto.lintel.in/truncate-empty-lines-using-sed/
Another similar answer, but with git: https://stackoverflow.com/a/46494462/7127519
Hope this helps as an alternative.
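The same one-at-a-time idea can be written in Python. This is a rough sketch and not a drop-in replacement for the pipeline above: the function names are made up for illustration, each line is passed to pip as a single argument (so option lines such as "-r other.txt" are not handled), and unlike xargs it stops at the first failure.

```python
import re
import subprocess
import sys


def requirement_lines(text: str) -> list:
    """Drop blank lines and full-line comments, like the two sed expressions."""
    kept = []
    for line in text.splitlines():
        if re.match(r"^\s*(#.*)?$", line):
            continue
        kept.append(line.strip())
    return kept


def install_one_by_one(path: str = "requirements.txt") -> None:
    """Run pip once per requirement, in file order."""
    with open(path, encoding="utf-8") as handle:
        requirements = requirement_lines(handle.read())
    for requirement in requirements:
        # sys.executable keeps pip tied to the interpreter running this script.
        # check=True raises CalledProcessError on the first package that fails.
        subprocess.run(
            [sys.executable, "-m", "pip", "install", requirement],
            check=True,
        )


# Hypothetical usage:
# install_one_by_one("requirements.txt")
```

Because the packages are installed in file order, a build-time dependency such as numpy has to appear above the packages that need it.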
This is sometimes quite annoying, and is arguably a pip bug.
When you run pip install package_name, pip first works out what the target package depends on and installs those dependencies along with it.
But when you run pip install -r requirements.txt, pip tries to install the listed packages directly, one by one from top to bottom. Sometimes a package is listed above the dependency it needs.
The solution is simple:
1. Run pip install package_name for the failing package.
2. Or simply move the failing package to the bottom of requirements.txt.
3. Sometimes a particular version of a package cannot be installed; in that case, install the newest version and update the entry in requirements.txt.
