ModuleNotFoundError: No module named 'pymongo' with Docker and Airflow

I'm currently using Docker with the puckel/docker-airflow image to run Airflow.
I installed pymongo successfully, but the import of pymongo still fails to find the module.
I added the lines below to the Dockerfile, above the other RUN instructions, before rebuilding.
1st attempt:
RUN pip install pymongo
2nd attempt:
RUN pip install pymongo -U
I built the image with
docker build --rm -t puckel/docker-airflow .
pymongo does install successfully during the build, but when I run the webserver with a simple import in my DAGs, I still get the error:
File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/mongo_hook.py", line 22, in <module>
from pymongo import MongoClient
ModuleNotFoundError: No module named 'pymongo'

I solved it by copying my requirements.txt file to the root.
In fact, puckel/docker-airflow's Dockerfile executes entrypoint.sh, which pip installs the packages from /requirements.txt if that file exists. That way we are sure our packages are installed.
You can add this to the Dockerfile:
COPY ./requirements.txt /requirements.txt
or add a volume to your container in a docker-compose.yml:
volumes:
- ./requirements.txt:/requirements.txt
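For context, a minimal sketch of a compose service with that mount (the service name webserver is an assumption based on the puckel setup; adjust it to your own file):
version: "3"
services:
  webserver:
    image: puckel/docker-airflow
    volumes:
      - ./requirements.txt:/requirements.txt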

I ran into this same symptom. I fixed it by adding && pip install pymongo \ to the puckel/docker-airflow Dockerfile, near the other pip install commands, and rebuilding the image.
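For reference, a hedged sketch of where that line goes; the surrounding RUN chain here is abridged from the puckel Dockerfile and may differ between versions:
RUN set -ex \
    # ... existing apt-get and pip install steps, abridged ...
    && pip install apache-airflow[crypto,celery,postgres]==${AIRFLOW_VERSION} \
    && pip install pymongo \
    && apt-get clean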
Here's what I tried that did not fix the problem:
Adding pymongo to requirements.txt and mounting the file. I verified that the module was loaded as expected via log messages during docker-compose startup, and by connecting to my worker and webserver instances I could see that the module was available in the Python environment using help("modules"), but it was still not available to my Airflow DAGs.
Adding --build-arg PYTHON_DEPS="pymongo" as a parameter to my docker build command. (Note: for modules other than pymongo this step fixed module-not-found errors, but not for pymongo. In fact, I did not see any log record of pymongo being installed during docker build when I set this.)

Could you try
RUN pip3 install pymongo
and report back? This can happen if you have multiple versions of Python; pip3 will make sure you are installing the module for Python 3.x.
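A quick way to verify this (not from the original answer): check which interpreter pip3 installs for, and whether pymongo is importable from it. If the two report different Python versions or paths, that mismatch is the likely cause.
pip3 --version
python3 -c "import pymongo; print(pymongo.version)"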

When you built the puckel/Airflow Docker image, did you add mongo to AIRFLOW_DEPS in your build arguments?
e.g. docker build --rm --build-arg AIRFLOW_DEPS="mongo" -t puckel/docker-airflow .
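For background, this works because the puckel Dockerfile splices AIRFLOW_DEPS into the apache-airflow extras list, roughly like this (abridged from memory; check the exact line in your Dockerfile version):
ARG AIRFLOW_DEPS=""
RUN pip install apache-airflow[crypto,celery,postgres${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION}
The mongo extra in turn pulls in pymongo as a dependency.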

I had a similar experience with the MySQL hook and solved it.
My advice is to check whether the module can be imported in a plain Python environment first.
Sometimes the package you installed is not the one Airflow wants.
For your case, you can check with the following steps:
1. Jump into the Docker container:
docker exec -it <container-id> /bin/bash
2. Launch Python (assuming you use a Python 3.x version):
python
3. Check the module in the Python environment:
import pymongo
# any other test script you want to run
If you face an error, solve it in the Python environment first and then go back to Airflow.
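A one-shot version of the same check, run from the host (substitute your own container id):
docker exec -it <container-id> python -c "import pymongo; print(pymongo.version)"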
=======================================================
I just double-checked the Airflow GitHub source code and realized that the MongoDB hook is not a default hook in the original source.
In that case, you might need to dig further into the pymongo package to study how to install and compile it and its related dependencies.

Related

sh: flake8: not found, though I installed it with python pip

Here I'm going to use flake8 within a Docker container. I installed flake8 using the following command and everything installed successfully.
$ sudo -H pip3 install flake8 # worked fine
and its install location is
/usr/local/lib/python3.8/dist-packages
Then I executed the following command, but the result was not what I expected.
$ sudo docker-compose run --rm app sh -c "flake8"
It says:
sh: flake8: not found
Maybe the flake8 package is not installed in the correct location. Please help me with this issue.
When you docker-compose run a container, it runs in a new container with a new, isolated filesystem. If you ran pip install on the host or in a debugging shell in another container, it won't be visible there.
As a general rule, never install software into a running container, unless it's for very short-term debugging. This will get lost as soon as the container exits, and it's very routine to destroy and recreate containers.
If you do need a tool like this, you should install it in your Dockerfile instead:
FROM python:3.10
...
RUN pip3 install flake8
However, for purely developer-oriented tools like linters, it may be better to not install them in Docker at all. A Docker image contains a fixed copy of your application code and is intended to be totally separate from anything on your host system. You can do day-to-day development, including unit testing and style checking, in a non-Docker virtual environment, and use docker-compose up --build to get a container-based integration-test setup.

Airflow: Module not found

I am trying to schedule a Python script in Airflow.
When the script starts to execute, it fails to find a specific MySQL module.
I have already installed mysql-connector,
so it should not be a problem.
Please help me here.
Since you are running Airflow in a Docker image, the Python libs you use should be installed in the Docker image.
For example, if you are using the image apache/airflow, you can create a new docker image based on this image, with all the libs you need:
Dockerfile:
FROM apache/airflow
RUN pip install mysql-connector
Then you build the image:
docker build -t my_custom_airflow_image <path/to/directory/containing/the/Dockerfile>
Then you can replace apache/airflow in your docker-compose file with my_custom_airflow_image, and it will work without any problem.
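A minimal sketch of that docker-compose change (the service name airflow is a placeholder for whatever your file uses):
services:
  airflow:
    image: my_custom_airflow_image
    # ... the rest of the service definition stays the same ...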

Can not add python lib to existing Docker image on Ubuntu server

Good day,
I'm trying to deploy Telegram bot on AWS Ubuntu server. But I can not run application because server says (when i run docker-compose up):
there is no name: asyncpg
However I installed it manually on server through
pip3 install asyncpg
and I checked later it is in the "packages" folder.
However, I sort of understand where problem is from. When I first tun
sudo docker-compose up
It used this file:
My Dockerfile:
FROM python:3.8
WORKDIR /src
COPY requirements.txt /src
RUN pip install -r requirements.txt
COPY . /src
where requirements.txt lacked this library. I edited the requirements with
nano
and tried to run
docker-compose up
again, but I again ran into a similar problem:
there is no asyncpg package
So as I understand it, docker-compose up uses the already created image, where there is no such package. I tried different solutions from Stack Overflow, like docker-compose build and pip freeze, but nothing helped, probably because I don't quite understand what I'm doing; I'm a beginner at programming and Python.
How can I add this package to the existing Docker image?
So after you have added the library package manually on the server, to save the changes back into the Docker image you need to commit the running Docker container using the command:
docker commit <container-id> <image-name>
Let's take an example.
You have an image, application.
You run the Docker image and get back a container id, say 1b390cd1dc2d.
Now you can go into the running container using the command
docker exec -it 1b390cd1dc2d /bin/bash
Next, install the package:
pip3 install asyncpg
Now exit from the running container with exit.
Use the first command shared above to update the image, like below:
docker commit 1b390cd1dc2d application
This updates the image by adding the required library to it.
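The full sequence, assuming the container id 1b390cd1dc2d and the image name application from the example above:
# on the host: open a shell in the running container
docker exec -it 1b390cd1dc2d /bin/bash
# inside the container: install the package, then leave
pip3 install asyncpg
exit
# back on the host: save the container's filesystem as the image
docker commit 1b390cd1dc2d application
Keep in mind the caveat implied by the question: if the image is ever rebuilt from the Dockerfile, a library added only via docker commit is lost unless it is also listed in requirements.txt.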

How to add Python libraries to Docker image

Today I started working with Docker, so please bear with me; I'm not even sure if the title makes sense. I just installed TensorFlow using Docker and wanted to run a script. However, I got the following error saying that Matplotlib is not installed.
Traceback (most recent call last):
File "tf_mlp_v3.py", line 3, in <module>
import matplotlib.pyplot as plt
ModuleNotFoundError: No module named 'matplotlib'
I used the following command to install Tensorflow
docker pull tensorflow/tensorflow:latest-gpu-jupyter
How can I now add other Python libraries such as Matplotlib to that image?
To customize an image you generally want to create a new one using the existing image as a base. In Docker it is extremely common to create custom images when existing ones don't quite do what you want. By basing your images off public ones you can add your own customizations without having to repeat (or even know) what the base image does.
Add the necessary steps to a new Dockerfile.
FROM tensorflow/tensorflow:latest-gpu-jupyter
RUN <extra install steps>
COPY <extra files>
RUN and COPY are examples of instructions you might use. RUN will run a command of your choosing such as RUN pip install matplotlib. COPY is used to add new files from your machine to the image, such as a configuration file.
Build and tag the new image. Give it a new name of your choosing. I'll call it my-customized-tensorflow, but you can name it anything you like.
Assuming the Dockerfile is in the current directory, run docker build:
$ docker build -t my-customized-tensorflow .
Now you can use my-customized-tensorflow as you would any other image.
$ docker run my-customized-tensorflow
Add this to your Dockerfile after pulling the image:
RUN python -m pip install matplotlib
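Put together, a minimal Dockerfile for this case could be as small as:
FROM tensorflow/tensorflow:latest-gpu-jupyter
RUN python -m pip install matplotlib
followed by the docker build command shown above.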
There are multiple options:
Get into the container and install the dependencies there (be aware that these changes will be lost when the container is recreated):
docker exec -it <your-container-id> /bin/bash
That should open an interactive bash shell. Then install the dependencies (with pip or conda).
Another alternative is to add them at build time of the image, that is, adding a RUN instruction to a Dockerfile.
All dependencies are installed using Python's default tools (i.e. pip, conda).
As an alternative, you can use --user to store the packages in mounted folders:
mkdir /path/to/local
mkdir /path/to/cache
Add these options to the docker command:
--mount type=bind,source=/path/to/local,target=/.local --mount type=bind,source=/path/to/cache,target=/.cache
Then you can install packages using
pip install --user pandas
The packages will then be persistent, without having to rebuild and restart your Docker image every time.
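Putting those pieces together (the paths are placeholders, and the image name is taken from the question above):
mkdir -p /path/to/local /path/to/cache
docker run \
    --mount type=bind,source=/path/to/local,target=/.local \
    --mount type=bind,source=/path/to/cache,target=/.cache \
    tensorflow/tensorflow:latest-gpu-jupyter
Anything installed afterwards with pip install --user lands in the mounted folders and therefore survives container restarts.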

Installing and Using Python Modules in a Docker Container

I am new to using Docker containers. I have successfully created a new Docker container but am now having trouble installing and then using Python modules in it.
I entered the already running container with the following command:
$ docker exec -it lizzie /bin/bash
This worked. I also managed to install the module of interest to me with the following command:
$ pip install pysnmp
I cloned my git repository, entered the local repo, and attempted to run a script that uses the pysnmp module. The following error was returned:
ImportError: No module named pysnmp
I reinstalled the module to ensure that it had installed correctly; all requirements were already satisfied. The two items currently in the container are "anaconda-ks.cfg", which I can't enter, and the repo. I feel like this has something to do with the path the module was installed in, but I'm not sure where I should be installing it or how to do so. Thanks in advance for the help!
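A diagnostic sketch, not from the original thread, for narrowing down this kind of mismatch between where pip installs and where the script looks:
# inside the container:
which python pip
python -c "import sys; print(sys.version); print(sys.path)"
pip show pysnmp   # the Location: field shows where the package went
If pip show reports a location that is not on the sys.path of the interpreter running the script, the install went into a different Python environment (for example Anaconda vs. system Python), which would explain the ImportError.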
