I am working on a project which uses pyspark, and would like to set up automated tests.
Here's what my .gitlab-ci.yml file looks like:
image: "myimage:latest"
stages:
- Tests
pytest:
stage: Tests
script:
- pytest tests/.
I built the docker image myimage using a Dockerfile such as the following (see this excellent answer):
FROM python:3.7
RUN python --version
# Create app directory
WORKDIR /app
# copy requirements.txt
COPY local-src/requirements.txt ./
# Install app dependencies
RUN pip install -r requirements.txt
# Bundle app source
COPY src /app
However, when I run this, the GitLab CI job fails with the following error:
/usr/local/lib/python3.7/site-packages/pyspark/java_gateway.py:95: in launch_gateway
raise Exception("Java gateway process exited before sending the driver its port number")
E Exception: Java gateway process exited before sending the driver its port number
------------------------------- Captured stderr --------------------------------
JAVA_HOME is not set
I understand that pyspark requires Java 8 or higher to be installed. I have this set up locally, but what about during the CI process? How can I install Java so the tests work?
I have tried adding
RUN sudo add-apt-repository ppa:webupd8team/java
RUN sudo apt-get update
RUN apt-get install oracle-java8-installer
to the Dockerfile which created the image, but got the error:
/bin/sh: 1: sudo: not found
How can I modify the Dockerfile so that tests using pyspark will work?
Solution that worked for me: add
RUN apt-get update
RUN apt-get install default-jdk -y
before
RUN pip install -r requirements.txt
It then all worked as expected with no further modifications needed!
EDIT
To make this work, I've had to update my base image to python:3.7-stretch
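For reference, here is the question's Dockerfile with those changes applied (a sketch assembled from the snippets above; paths are from the original question):
FROM python:3.7-stretch
RUN python --version
# Create app directory
WORKDIR /app
# copy requirements.txt
COPY local-src/requirements.txt ./
# Install Java (pyspark needs a JVM)
RUN apt-get update
RUN apt-get install default-jdk -y
# Install app dependencies
RUN pip install -r requirements.txt
# Bundle app source
COPY src /app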
Write in your .bash_profile:
export JAVA_HOME=(the home directory of your JDK, e.g. /Library/Java/JavaVirtualMachines/[yourjdk]/Contents/Home)
Below is my Dockerfile:
FROM python:3.9.0
ARG WORK_DIR=/opt/quarter_1
RUN apt-get update && apt-get install cron -y && apt-get install -y default-jre
# Install python libraries
COPY requirements.txt /tmp/requirements.txt
RUN pip install --upgrade pip && pip install -r /tmp/requirements.txt
WORKDIR $WORK_DIR
EXPOSE 8888
VOLUME /home/data/quarter_1/
# Copy ETL code into the container under the workdir "/opt/quarter_1"
COPY . .
I connected to the server and built the image with docker build -t my-python-app .
When I tried to run a container from the built image, I got no output and was not able to access it:
docker run -p 8888:8888 -v /home/data/quarter_1/:/opt/quarter_1 image_id
The workdir here is /opt/quarter_1.
Update based on comments
If I understand everything you've posted correctly, my suggestion here is to use a base Docker Jupyter image, modify it to add your pip requirements, and then add your files to the work path. I've tested the following:
Start with a dockerfile like below
FROM jupyter/base-notebook:python-3.9.6
COPY requirements.txt /tmp/requirements.txt
RUN pip install --upgrade pip && pip install -r /tmp/requirements.txt
COPY ./quarter_1 /home/jovyan/quarter_1
The above assumes you are running the build from the folder containing the Dockerfile, "requirements.txt", and the "quarter_1" folder with your build files.
Note "/home/jovyan" is the default working folder in this image.
Build the image
docker build -t biwia-jupyter:3.9.6 .
Start the container with open port to 8888. e.g.
docker run -p 8888:8888 biwia-jupyter:3.9.6
Connect to the container to access the token. There are a few ways to do this; for example:
docker exec -it CONTAINER_NAME bash
jupyter notebook list
Copy the token from the URL and connect using your server IP and port. You should be able to paste the token there, and afterwards access the folder you copied into the build, as I did below.
[Jupyter screenshot]
If you are deploying the image to different hosts this is probably the best way to do it using COPY/ADD etc., but otherwise look at using Docker Volumes which give you access to a folder (for example quarter_1) from the host, so you don't constantly have to rebuild during development.
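For example, during development a bind mount along these lines would let you edit files without rebuilding (the host path is illustrative):
docker run -p 8888:8888 -v /path/on/host/quarter_1:/home/jovyan/quarter_1 biwia-jupyter:3.9.6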
Second edit for Python 3.9.0 request
Using the method above, 3.9.0 is not immediately available from Docker Hub. I doubt you'll have many compatibility issues between 3.9.0 and 3.9.6, but we'll build it anyway. We can download the Dockerfile folder from GitHub, update a build argument, create our own variant with 3.9.0, and proceed as above.
This assumes you have git; otherwise, download the repo manually.
Download the Jupyter Docker stack repo
git clone https://github.com/jupyter/docker-stacks
Change into the base-notebook directory of the cloned repo
cd ./base-notebook
Build the image with python 3.9.0 instead
docker build --build-arg PYTHON_VERSION=3.9.0 -t jupyter-base-notebook:3.9.0 .
Create the version with your copied folders as in the steps above, replacing the first line of the Dockerfile with:
FROM jupyter-base-notebook:3.9.0
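Putting it together, the variant Dockerfile is the same as before with only the base image swapped:
FROM jupyter-base-notebook:3.9.0
COPY requirements.txt /tmp/requirements.txt
RUN pip install --upgrade pip && pip install -r /tmp/requirements.txt
COPY ./quarter_1 /home/jovyan/quarter_1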
I've tested this and it works, running Python 3.9.0 without issue.
There are lots of ways to build Jupyter images, this is just one method. Check out docker hub for Jupyter to see their variants.
I am trying to follow the Flask/React tutorial here, on a plain Windows machine.
On Windows 10, without considering Docker, I have the tutorial working.
On Windows 10 under a Docker setup (Ubuntu-based containers and docker-compose), I do not:
The React server works under Docker.
The Flask server won't successfully build.
The Dockerfile for the Flask server is:
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository universe
RUN apt-get update && apt-get install -y python3-pip yarn
RUN pip3 install flask
#RUN pip3 install venv
RUN mkdir -p /app
WORKDIR /app
COPY . /app
#RUN python3 -m venv venv
RUN cd api/venv/Scripts
RUN flask run --no-debugger
This fails at the very last line:
The command '/bin/sh -c flask run --no-debugger' returned a non-zero code: 1
Note that I find myself in the unenviable position of trying to use/teach myself all of Docker, venv, React, and Flask at the same time. The venv commands are commented out because I'm not even sure venv makes sense in a Docker container (but what would I know?) and also because the pip3 install venv command halts with a non-zero code: 2.
Any advice is welcome.
There are two obvious issues in the Dockerfile you show.
Each RUN command runs in a clean environment starting from the last known state of the image. Settings like the current directory (and also environment variable values) are not preserved when a RUN command exits. So RUN cd ... starts the RUN command from the old directory, changes to the new directory, and then doesn't remember that; the following RUN command starts again from the old directory. You need the WORKDIR directive to actually change directories.
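A quick way to see this behavior (a toy example; the default working directory in most base images is /):
FROM ubuntu:18.04
RUN cd /tmp
# The next command prints "/", not "/tmp": the directory change above did not persist
RUN pwd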
The RUN commands also run during the build phase. They won't publish network ports or have access to databases; in a multi-container Compose setup they can't connect to other containers. You probably want to run the Flask app as the main container CMD.
So you can update your Dockerfile to look like:
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository universe
RUN apt-get update && apt-get install -y python3-pip yarn
# Creates the directory as well
WORKDIR /app
# requirements.txt includes "flask"
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . ./
# Not `RUN cd ...`
WORKDIR /app/api/venv/Scripts
# Not `RUN ...`
CMD flask run --no-debugger
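To try it out, a build-and-run pair might look like this (the tag is arbitrary; note that flask run listens on 127.0.0.1 port 5000 by default, so reaching it from the host usually also requires adding --host=0.0.0.0 to the CMD):
docker build -t flask-api .
docker run -p 5000:5000 flask-api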
It is in fact common to just not use a virtual environment in Docker; the Docker image is isolated from any other Python installation and so it's safe to use the "system" Python package tree. (I am a little suspicious of the venv directory in there, since virtual environments can't be transplanted into other setups very well.)
Note that I find myself in the unenviable position of trying to use/teach myself all of Docker, venv, React, and Flask at the same time.
Put Docker away for another day. It's not necessary, especially during the development phase of your application. If you read through SO questions there are a lot of questions trying to contort Docker into acting just like a local development environment, where it's really not designed for it. There's nothing wrong with locally installing the tools you need to do your job, especially when they're very routine tools like Python and Node.
I believe that Flask can't find your app when you run your container (especially as the docker build attempts to run it). If you want to use the container only for the purpose of running your app, use CMD in the Dockerfile; then, when the image is run, it will start your Flask app first thing.
I'm working on a Python project and I get this problem on an Ubuntu server, though it works on my local Windows machine. The build stops at the second step, when trying to run the mkdir instruction. It seems that I can't run the typical Ubuntu commands either (apt-get clean, apt-get update).
Dockerfile
FROM python:3
RUN mkdir /code
WORKDIR /code
COPY requirements.txt /code/
RUN pip install --upgrade pip==20.0.2 && pip install -r requirements.txt
COPY . /code/
Output error
OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mountpoint for devices not found\"": unknown
Are you able to run the Docker hello-world image? If not, this may indicate a problem with your installation/configuration:
$ docker run hello-world
More information about post-installation steps can be found here. Otherwise, the first option is to try restarting Docker:
$ sudo systemctl restart docker
The Docker daemon must run with root privileges in the background, and I have experienced issues before where, on a newly-installed machine, the updated group permissions for the daemon had not been fully applied. Restarting the daemon, or logging out and back in, might fix this.
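For completeness, the usual post-install steps to let a non-root user talk to the daemon look like this (the docker group may already exist on your system):
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker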
Furthermore, when you declare a WORKDIR inside a Dockerfile, that path will automatically be created if it does not already exist. Once you have set your WORKDIR, all your paths can and should be relative to it if possible. Knowing this, we can simplify the Dockerfile:
FROM python:3
WORKDIR /code
COPY requirements.txt .
RUN pip install --upgrade pip==20.0.2 && pip install -r requirements.txt
COPY . .
That may be enough to solve your issue. In my experience the Docker build tracebacks can be rather vague at times, but it sounds like that particular error could be stemming from a failed attempt to create a directory, either from a permission issue on the host machine or a syntax issue inside the container.
I solved this problem by (re)installing with apt, instead of snap:
sudo snap remove docker
sudo apt install docker.io
Test with (now working):
sudo docker run hello-world
I have created a docker container for my pure python program and have set python main.py to be executed when the container is run. Running the container works as expected on my local machine. However, I want to run the container on my institution's high-performance cluster. The cluster machines use Singularity, which I am using to pull my docker image hosted on Dockerhub (the repo is darshank11/ga_paci_final). However, when I try to run the Singularity container, I get the following error: python3: can't open file 'main.py': [Errno 2] No such file or directory.
I've tried to change the base image in the Dockerfile, for example from FROM python:latest to FROM ubuntu:latest. I've made sure the docker container worked on my local machine, and then got one of my co-workers to pull the container from Dockerhub and run it too. Everything works fine until I get to Singularity.
Here is my docker file:
FROM ubuntu:16.04
RUN apt-get update -y && \
apt-get install -y python3-pip python3-dev
RUN mkdir src
WORKDIR /src
COPY . /src
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt
CMD ["python3", "-u", "main.py"]
You're getting that error because the execution context is not what you're expecting. The run path in singularity is the current directory on the host OS (e.g., ~/ga_paci_final), which has been mounted into the singularity image.
As mentioned in the comments, one solution is to give the full path to the Python file in the Docker CMD statement. Another option is to modify the %runscript block of the Singularity definition file to something like:
%runscript
cd /src
python3 -u main.py
That way you ensure the run environment is identical between Docker and Singularity.
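For example, a minimal Singularity definition file bootstrapping from the Docker Hub image might look like this (a sketch; the file name and build flags depend on your cluster setup):
Bootstrap: docker
From: darshank11/ga_paci_final

%runscript
    cd /src
    exec python3 -u main.py
You would then build it with singularity build ga_paci_final.sif ga_paci_final.def (which may require root or --fakeroot) and start it with singularity run ga_paci_final.sif.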
I am trying to create a build of a webapp I have created using Docker, but I have had no success. I've tried to follow two different tutorials, but neither worked for me.
Following Tutorial 1:
The build seemed to complete without any problems but I could not find the image file anywhere, and 'sudo docker ps -a' returned nothing.
Following tutorial 2:
I am now getting another error: the requirements file is not found. I looked up solutions to that here, but it seems I am doing the correct thing by adding it to the build with the 'ADD requirements.txt /webapp' command. I checked that I spelled requirements right, haha. Now I do see the container in 'sudo docker ps -a', but I don't see any image file, and presumably it would not work if I did, since it could not find the requirements.
I'm quite confused as to what is wrong and how I should properly build a Docker image. How do I get it to find the requirements file and, upon completing the build command, actually have an image? Where is this image stored?
Below is the setup I have after following the second tutorial.
Dockerfile
FROM ubuntu:latest
#Update OS
RUN sed -i 's/# \(.*multiverse$\)/\1/g' /etc/apt/sources.list
RUN apt-get update
RUN apt-get -y upgrade
# Install Python
RUN apt-get install -y python-dev python-pip
# Add requirements.txt
ADD requirements.txt /webapp
# Install uwsgi Python web server
RUN pip install uwsgi
# Install app requirements
RUN pip install -r requirements.txt
# Create app directory
ADD . /webapp
# Set the default directory for our environment
ENV HOME /webapp
WORKDIR /webapp
# Expose port 8000 for uwsgi
EXPOSE 8000
ENTRYPOINT ["uwsgi", "--http", "0.0.0.0:8000", "--module", "app:app", "--processes", "1", "--threads", "8"]
CMD ["app.py"]
Requirements
Flask==0.12.1
itsdangerous==0.24
Jinja2==2.8
MarkupSafe==0.23
Werkzeug==0.11.5
SQLite3==3.18.0
Directory Structure (if it matters)
app.py
image_data.db
README.txt
requirements.txt
Dockerfile
templates
- index.html
static/
- image.js
- main.css
img/
- camera.png
images/
- empty
Current output of 'sudo docker ps -a':
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
663d4e096938 8f5126108354 "/bin/sh -c 'pip i..." 17 minutes ago Exited (1) 17 minutes ago admiring_agnesi
The requirements.txt should be in the same directory as your Dockerfile. I see from the Dockerfile that requirements.txt is added to /webapp, but the
RUN pip install -r requirements.txt
is trying to find it in the current directory; you probably need to copy requirements.txt to the current directory, like:
ADD requirements.txt .
Let's see if that works. I did not test it.
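A sketch of the relevant portion, untested: setting WORKDIR before the ADD and the pip install means both resolve against /webapp rather than the root directory:
# Set the working directory first so relative paths resolve against it
WORKDIR /webapp
# Add requirements.txt into /webapp
ADD requirements.txt .
# Install app requirements
RUN pip install -r requirements.txt
# Add the rest of the app
ADD . .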
You can see the images by
docker images
and then run it like
docker run -t image_name