OpenNI2 with OpenCV 3 - python

I'm on OSX 10.11.5
I have been trying to build OpenCV3 with OpenNI support. I have a Python script that is supposed to work with the XBox 360 Kinect, but OpenCV isn't detecting the Kinect. I have libfreenect installed and I can run freenect-glview which brings up the depth view.
I've done pretty extensive Google searches, but all guides seem to be for Windows or Linux. Can anyone help me? Guides for OpenNI 1 will work too.
Thanks in advance!

I use the Kinect for Xbox 360 on macOS successfully. The steps are as follows:
1) Install OpenNI2
$ brew tap homebrew/science
$ brew install openni2
2) Install libfreenect
Note:
libfreenect for Kinect Xbox 360
libfreenect2 for Kinect Xbox One
Because of this issue (Using Microsoft Kinect with OpenCV 3.0.0), we need to build it ourselves.
2.1) Download
$ curl -O -L https://github.com/OpenKinect/libfreenect/archive/v0.5.5.tar.gz
$ tar zxf v0.5.5.tar.gz
$ cd libfreenect-0.5.5/
2.2) Fix issue
$ vi OpenNI2-FreenectDriver/src/DepthStream.hpp
# jump to line 173 and make the case read as follows (note the == size check):
case XN_STREAM_PROPERTY_ZERO_PLANE_DISTANCE:  // unsigned long long or unsigned int (for OpenNI2/OpenCV)
  if (*pDataSize != sizeof(unsigned long long) && *pDataSize != sizeof(unsigned int))
  {
    LogError("Unexpected size for XN_STREAM_PROPERTY_ZERO_PLANE_DISTANCE");
    return ONI_STATUS_ERROR;
  } else {
    // write the value at whichever width the caller requested
    if (*pDataSize == sizeof(unsigned long long)) {
      *(static_cast<unsigned long long*>(data)) = ZERO_PLANE_DISTANCE_VAL;
    } else {
      *(static_cast<unsigned int*>(data)) = (unsigned int) ZERO_PLANE_DISTANCE_VAL;
    }
  }
  return ONI_STATUS_OK;
2.3) Build
$ mkdir build
$ cd build
$ cmake .. -DBUILD_OPENNI2_DRIVER=ON
$ make
2.4) Install
$ export OPENNI2_DIR=$(python -c 'import subprocess, glob; \
prefix = subprocess.check_output("brew --prefix", shell=True); \
print(glob.glob("%s/Cellar/openni2/*" % prefix[:-1])[0])')
$ cd lib/OpenNI2-FreenectDriver/
$ export FREENECT_DRIVER_LIB=libFreenectDriver.dylib
$ export FREENECT_DRIVER_LIB=$(python - <<'EOF'
import os
lib = os.environ['FREENECT_DRIVER_LIB']
while os.path.islink(lib):
    lib = os.readlink(lib)
print(lib)
EOF
)
$ ln -sf `pwd`/$FREENECT_DRIVER_LIB $OPENNI2_DIR/lib/ni2/OpenNI2/Drivers/libFreenectDriver.dylib
$ ln -sf `pwd`/$FREENECT_DRIVER_LIB $OPENNI2_DIR/share/openni2/tools/OpenNI2/Drivers/
2.5) Preview
$ cd `brew --prefix`/share/openni2/tools
$ ./NiViewer
3) Build OpenCV
3.1) Python
$ brew install python
$ python --version
Python 2.7.13
$ pip install numpy
3.2) Source
$ git clone https://github.com/opencv/opencv.git
$ cd opencv/
$ git checkout 3.2.0
# if use extra modules
$ git clone https://github.com/opencv/opencv_contrib.git
$ cd opencv_contrib/
$ git checkout 3.2.0
3.3) Build
$ cd opencv/
$ mkdir build
$ cd build/
$ export PY_HOME=/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7
$ export PY_NUMPY_DIR=$(python -c 'import os.path, numpy.core; print(os.path.dirname(numpy.core.__file__))')
$ export OPENNI2_INCLUDE=/usr/local/include/ni2
$ export OPENNI2_REDIST=/usr/local/lib/ni2
$ cmake -DCMAKE_BUILD_TYPE=RELEASE \
-DCMAKE_INSTALL_PREFIX=/usr/local \
\
-DBUILD_opencv_python2=ON \
-DPYTHON2_EXECUTABLE=$PY_HOME/bin/python \
-DPYTHON2_INCLUDE_DIR=$PY_HOME/Headers \
-DPYTHON2_LIBRARY=$PY_HOME/lib \
-DPYTHON2_PACKAGES_PATH=/usr/local/lib/python2.7/site-packages \
-DPYTHON2_NUMPY_INCLUDE_DIRS=$PY_NUMPY_DIR/include/ \
\
-DWITH_OPENNI2=ON \
\
-DBUILD_DOCS=OFF \
-DBUILD_EXAMPLES=OFF \
-DBUILD_TESTS=OFF \
-DBUILD_PERF_TESTS=OFF \
\
-DOPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
..
$ make -j$(python -c 'import multiprocessing as mp; print(mp.cpu_count())')
$ make install
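Before moving on, it is worth checking that the freshly built cv2 module was really compiled with OpenNI2 support; a minimal check along these lines:
import cv2

print(cv2.__version__)  # expect 3.2.0
# The build summary should contain a line like "OpenNI2: YES (ver 2.2.0, ...)"
for line in cv2.getBuildInformation().splitlines():
    if "OpenNI2" in line:
        print(line.strip())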
4) Finally
I posted the details to this gist. Samples here: C++, Python.
NOTE: The Python code cannot retrieve a correct image at the moment. Please see this issue.
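For reference, this is roughly what the minimal Python capture loop looks like; a sketch using OpenCV's OpenNI2 capture constants (per the note above, the retrieved frames may still be wrong on this setup):
import cv2

cap = cv2.VideoCapture(cv2.CAP_OPENNI2)  # first OpenNI2 device, i.e. the Kinect
if not cap.isOpened():
    raise IOError("OpenNI2 device not found - check the FreenectDriver symlinks")

while True:
    if not cap.grab():
        break
    ok_depth, depth = cap.retrieve(flag=cv2.CAP_OPENNI_DEPTH_MAP)  # uint16 depth in mm
    ok_bgr, bgr = cap.retrieve(flag=cv2.CAP_OPENNI_BGR_IMAGE)      # uint8 BGR image
    if ok_depth:
        cv2.imshow("depth", (depth // 16).astype("uint8"))  # crude 0-4095 mm -> 0-255 scaling
    if ok_bgr:
        cv2.imshow("color", bgr)
    if cv2.waitKey(30) == 27:  # Esc quits
        break
cap.release()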

Related

How to import pandas in java application to run python script using dockerfile

I need both java and python in my docker container to run some code.
This is my Dockerfile (it works perfectly if I don't add the FROM openjdk:slim line):
#get python
FROM python:3.6-slim
RUN pip install --trusted-host pypi.python.org flask
#get openjdk
FROM openjdk:slim
COPY . /targetdir
WORKDIR /targetdir
# Make port 81 available to the world outside this container
EXPOSE 81
CMD ["python", "test.py"]
And the test.py app is in the same directory:
from flask import Flask
import os
app = Flask(__name__)

@app.route("/")
def hello():
    html = "<h3>Test:{test}</h3>"
    test = os.environ['JAVA_HOME']
    return html.format(test=test)

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=81)
I'm getting this error:
D:\MyApps\Docker Toolbox\Docker Toolbox\docker.exe: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "exec: \"python\": executable file not found in $PATH": unknown.
What exactly am I doing wrong here? I'm new to docker, perhaps I'm missing a step.
Additional details
My goal
I have to run a Python program that runs a Java file. The Python library I'm using requires the path to JAVA_HOME (a minimal sketch of this setup follows the list below).
My issues:
I do not know Java, so I cannot run the file properly.
My entire code is in Python, except this Java bit
The Python wrapper runs the file in a way I need it to run.
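For context, the wrapper described under "My goal" boils down to something like this minimal sketch (the jar name and its path are hypothetical placeholders):
import os
import subprocess

java_home = os.environ["JAVA_HOME"]            # the Python library reads this
java = os.path.join(java_home, "bin", "java")

# Hypothetical jar, standing in for the Java bit of the project.
result = subprocess.run([java, "-jar", "/targetdir/MyTool.jar"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        universal_newlines=True)
print(result.stdout)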
An easier solution to the above issue is to use a multi-stage Docker build, where you can copy content from one image into another. In this case you can have openjdk:slim as the base image and copy what you need from a python image into it, as follows:
FROM openjdk:slim
COPY --from=python:3.6 / /
...
<normal instructions for python container continues>
...
This feature is available as of Docker 17.05, and there is more you can do with multi-stage builds, such as copying only the content you need from one stage to another.
Reference documentation
OK it took me a little while to figure it out. And my thanks go to this answer.
I think my approach didn't work because I did not have a basic version of Linux.
So it goes like this:
Get Linux (I'm using Alpine because it's barebones)
Get Java via the package manager
Get Python, PIP
OPTIONAL: find and set JAVA_HOME
Find the path to JAVA_HOME. Perhaps there is a better way to do this, but I found it by running the container and then looking inside it with docker exec -it [CONTAINER ID] bin/bash.
Set JAVA_HOME in dockerfile and build + run it all again
Here is the final Dockerfile (it should work with the Python code in the question):
### 1. Get Linux
FROM alpine:3.7
### 2. Get Java via the package manager
RUN apk update \
&& apk upgrade \
&& apk add --no-cache bash \
&& apk add --no-cache --virtual=build-dependencies unzip \
&& apk add --no-cache curl \
&& apk add --no-cache openjdk8-jre
### 3. Get Python, PIP
RUN apk add --no-cache python3 \
&& python3 -m ensurepip \
&& pip3 install --upgrade pip setuptools \
&& rm -r /usr/lib/python*/ensurepip && \
if [ ! -e /usr/bin/pip ]; then ln -s pip3 /usr/bin/pip ; fi && \
if [[ ! -e /usr/bin/python ]]; then ln -sf /usr/bin/python3 /usr/bin/python; fi && \
rm -r /root/.cache
### Get Flask for the app
RUN pip install --trusted-host pypi.python.org flask
####
#### OPTIONAL : 4. SET JAVA_HOME environment variable, uncomment the line below if you need it
#ENV JAVA_HOME="/usr/lib/jvm/java-1.8-openjdk"
####
EXPOSE 81
ADD test.py /
CMD ["python", "test.py"]
I'm new to Docker, so this may not be the best possible solution. I'm open to suggestions.
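To confirm from inside the container that Python can actually see Java (which is all the wrapper library needs), a quick check like this should do; note that java -version reports to stderr:
import os
import subprocess

print("JAVA_HOME =", os.environ.get("JAVA_HOME", "<not set>"))
# java -version writes its version report to stderr, not stdout
result = subprocess.run(["java", "-version"],
                        stderr=subprocess.PIPE, universal_newlines=True)
print(result.stderr)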
UPDATE: COMMON ISSUES
Difficulty using python packages
As Joabe Lucena pointed out here, Alpine can have issues with certain Python packages.
I recommend that you use a Linux distro that works best for you, e.g. centos.
Another alternative is to simply use the docker-java-python image from Docker Hub: https://hub.docker.com/r/rappdw/docker-java-python
FROM rappdw/docker-java-python:openjdk1.8.0_171-python3.6.6
RUN java -version
RUN python --version
I found Sunny Pal's answer very useful but I made the copy more specific and added the necessary environment variables and update-alternatives lines so that Java was accessible from the command line in the Python container.
FROM python:3.9-slim
COPY --from=openjdk:8-jre-slim /usr/local/openjdk-8 /usr/local/openjdk-8
ENV JAVA_HOME /usr/local/openjdk-8
RUN update-alternatives --install /usr/bin/java java /usr/local/openjdk-8/bin/java 1
...
Oh, let me add my five cents. I took python slim as the base image. Then I found the open-jdk-11 base image code (note: open-jdk-10 will fail because it is not supported) and copy-pasted it into my Dockerfile.
Note: copy-paste driven development is cool... ONLY when you understand each line you use in your code!
And here it is!
FROM python:3.7.2-slim
# Do your stuff, install python.
# and now Jdk
RUN rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update && apt-get upgrade -y \
&& apt-get install -y --no-install-recommends curl ca-certificates \
&& rm -rf /var/lib/apt/lists/*
ENV JAVA_VERSION jdk-11.0.2+7
COPY slim-java* /usr/local/bin/
RUN set -eux; \
    ARCH="$(dpkg --print-architecture)"; \
    case "${ARCH}" in \
      ppc64el|ppc64le) \
        ESUM='c18364a778b1b990e8e62d094377af48b000f9f6a64ec21baff6a032af06386d'; \
        BINARY_URL='https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/download/jdk-11.0.1%2B13/OpenJDK11U-jdk_ppc64le_linux_hotspot_11.0.1_13.tar.gz'; \
        ;; \
      s390x) \
        ESUM='e39aacc270731dadcdc000aaaf709adae7a08113ccf5b4a045bc87fc13458d71'; \
        BINARY_URL='https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/download/jdk-11%2B28/OpenJDK11-jdk_s390x_linux_hotspot_11_28.tar.gz'; \
        ;; \
      amd64|x86_64) \
        ESUM='d89304a971e5186e80b6a48a9415e49583b7a5a9315ba5552d373be7782fc528'; \
        BINARY_URL='https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/download/jdk-11.0.2%2B7/OpenJDK11U-jdk_x64_linux_hotspot_11.0.2_7.tar.gz'; \
        ;; \
      aarch64|arm64) \
        ESUM='b66121b9a0c2e7176373e670a499b9d55344bcb326f67140ad6d0dc24d13d3e2'; \
        BINARY_URL='https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/download/jdk-11.0.1%2B13/OpenJDK11U-jdk_aarch64_linux_hotspot_11.0.1_13.tar.gz'; \
        ;; \
      *) \
        echo "Unsupported arch: ${ARCH}"; \
        exit 1; \
        ;; \
    esac; \
    curl -Lso /tmp/openjdk.tar.gz ${BINARY_URL}; \
    sha256sum /tmp/openjdk.tar.gz; \
    mkdir -p /opt/java/openjdk; \
    cd /opt/java/openjdk; \
    echo "${ESUM} /tmp/openjdk.tar.gz" | sha256sum -c -; \
    tar -xf /tmp/openjdk.tar.gz; \
    jdir=$(dirname $(dirname $(find /opt/java/openjdk -name javac))); \
    mv ${jdir}/* /opt/java/openjdk; \
    export PATH="/opt/java/openjdk/bin:$PATH"; \
    apt-get update; apt-get install -y --no-install-recommends binutils; \
    /usr/local/bin/slim-java.sh /opt/java/openjdk; \
    apt-get remove -y binutils; \
    rm -rf /var/lib/apt/lists/*; \
    rm -rf ${jdir} /tmp/openjdk.tar.gz;
ENV JAVA_HOME=/opt/java/openjdk \
PATH="/opt/java/openjdk/bin:$PATH"
ENV JAVA_TOOL_OPTIONS="-XX:+UseContainerSupport"
Now, the references:
https://github.com/AdoptOpenJDK/openjdk-docker/blob/master/11/jdk/ubuntu/Dockerfile.hotspot.releases.slim
https://hub.docker.com/_/python/
https://hub.docker.com/r/adoptopenjdk/openjdk11/
I used them to answer this question, which may help you sometime.
Running Python and Java in Docker
I believe that by adding the FROM openjdk:slim line, you tell Docker to execute all of your subsequent commands in the openjdk container (which does not have Python).
I would approach this by creating two separate containers for openjdk and python and specify individual sets of commands for them.
Docker is made to modularize your solutions and mashing everything into one container is usually a bad practice.
I tried pajamas's answer, which worked very well for creating this image. However, when trying to install packages like gensim, pandas or else, I faced errors like: don't know how to compile Fortran code on platform 'posix'. I searched and tried this, this and that, but none worked for me.
So, based on pajamas's answer, I decided to convert his image from Alpine to CentOS, which worked very well. So here's a Dockerfile that might help someone who may be struggling in this scenario like I was:
# Get Linux
FROM centos:7
# Install Java
RUN yum update -y \
&& yum install java-1.8.0-openjdk -y \
&& yum clean all \
&& rm -rf /var/cache/yum
# Set JAVA_HOME environment var
ENV JAVA_HOME="/usr/lib/jvm/jre-openjdk"
# Install Python
RUN yum install python3 -y \
&& pip3 install --upgrade pip setuptools wheel \
&& if [ ! -e /usr/bin/pip ]; then ln -s pip3 /usr/bin/pip ; fi \
&& if [[ ! -e /usr/bin/python ]]; then ln -sf /usr/bin/python3 /usr/bin/python; fi \
&& yum clean all \
&& rm -rf /var/cache/yum
CMD ["bash"]
You should have only one FROM in your Dockerfile
(unless you use a multi-stage Docker build).
I think I found the easiest way to mix Java JDK 17 and Python 3. It does not work with Python 2.
FROM openjdk:17.0.1-jdk-slim
RUN apt-get update && \
apt-get install -y software-properties-common && \
apt-get install -y python3-pip
software-properties-common brings in a lightweight Python 3 (version 3.9.1).
You can also install some libraries like this:
RUN python3 -m pip install --upgrade pip && \
python3 -m pip install numpy && \
python3 -m pip install opencv-python
OR
RUN apt-get update && \
apt-get install -y ffmpeg
Easiest is to just start from a Python image and add the OpenJDK. Note that FROM openjdk has been deprecated and replaced with eclipse-temurin
FROM python:3.10
ENV JAVA_HOME=/opt/java/openjdk
COPY --from=eclipse-temurin:17-jre $JAVA_HOME $JAVA_HOME
ENV PATH="${JAVA_HOME}/bin:${PATH}"
RUN pip install --trusted-host pypi.python.org flask
See How to use this Image - Using a different base Image section of https://hub.docker.com/_/eclipse-temurin for details.
Instead of using FROM openjdk:slim you can install Java separately; please refer to the example below:
# Install OpenJDK-8
RUN apt-get update && \
apt-get install -y openjdk-8-jdk && \
apt-get install -y ant && \
apt-get clean;
# Fix certificate issues
RUN apt-get update && \
apt-get install ca-certificates-java && \
apt-get clean && \
update-ca-certificates -f;
# Setup JAVA_HOME -- useful for docker commandline
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
RUN export JAVA_HOME
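One small caveat on the last line: RUN export JAVA_HOME has no lasting effect, since each RUN runs in its own shell; it is the ENV line that persists into the final image. And a quick check from Python that the path really points at a JVM:
import os

# Verify that the ENV JAVA_HOME above points at a real java binary.
java = os.path.join(os.environ["JAVA_HOME"], "bin", "java")
print(java, "->", "found" if os.path.isfile(java) else "missing")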

How can I activate conda venv in Dockerfile? (pip not found)

I'm trying to build a docker image like
FROM ubuntu:latest
RUN apt update && apt upgrade -y && \
apt install -y git wget libsuitesparse-dev gcc g++ swig && \
cd ~ && wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
sh Miniconda3-latest-Linux-x86_64.sh -b && rm Miniconda3-latest-Linux-x86_64.sh && \
PATH=$PATH:~/miniconda3/condabin && \
conda init bash && conda upgrade -y conda && /bin/bash -c "source ~/.bashrc" && \
pip install numpy scipy matplotlib scikit_umfpack
However, /bin/bash -c "source ~/.bashrc" does not work... so I got /bin/sh: 1: pip: not found
How can I build a docker image installing miniconda and python requirements using pip at the same time?
I would recommend using a pre-existing Docker image that already has Anaconda installed. For example, this link has a Docker image endorsed by Anaconda itself. There may be others on Docker Hub that also have Anaconda installed already. If you have already tried an image with Anaconda and it didn't meet your needs, let me know.

Loading openvino python library on Raspberry Pi 4

After following the guides (and coming through the GitHub trackers), I was able to get OpenVINO to install on my Pi 4, and I can run /opt/intel/openvino/bin/armv7l/Release/object_detection_sample_ssd successfully using my own trained model.
I used the following cmake command most recently, but I've tried pretty much all the iterations I could find on the several instruction pages.
cmake -DCMAKE_BUILD_TYPE=Release \
-DENABLE_SSE42=OFF \
-DTHREADING=SEQ \
-DENABLE_GNA=OFF \
-DENABLE_PYTHON=ON \
-DPYTHON_EXECUTABLE=/usr/bin/python3 \
-DPYTHON_LIBRARY=/usr/lib/arm-linux-gnueabihf/libpython3.7m.so \
-DPYTHON_INCLUDE_DIR=/usr/include/python3.7m \
-DNGRAPH_PYTHON_BUILD_ENABLE=ON \
-DCMAKE_CXX_FLAGS=-latomic \
-DOPENCV_EXTRA_EXE_LINKER_FLAGS=-latomic \
-DCMAKE_INSTALL_PREFIX=/usr/local \
.. && make
When I try one of the python examples I get
ModuleNotFoundError: No module named 'ngraph'
Looking more into it now, it appears the issue is the setupvars.sh script not pointing at the right directory. I was able to get the openvino module to load by adjusting the export path. I must say the documentation is, quite frankly, all over the place and seems to have wrong directory structures left and right.
As of right now, the builds for the Raspberry Pi are not being made with nGraph compiled; the user has to compile it themselves from scratch, per this thread: https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/No-ngraph-bindings-for-python-in-raspbian-distribution/m-p/1263401#M23093
The comment by the Intel "Support" is wrong and won't work on new builds.
Please refer to the steps below for building Open Source OpenVINO™ Toolkit for Raspbian OS:
Set up build environment and install build tools
sudo apt update && sudo apt upgrade -y
sudo apt install build-essential
Install CMake from source
cd ~/
wget https://github.com/Kitware/CMake/releases/download/v3.14.4/cmake-3.14.4.tar.gz
tar xvzf cmake-3.14.4.tar.gz
cd ~/cmake-3.14.4
./bootstrap
make -j4 && sudo make install
Install OpenCV from source
sudo apt install git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev python3-scipy libatlas-base-dev
cd ~/
git clone --depth 1 --branch 4.5.2 https://github.com/opencv/opencv.git
cd opencv && mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local ..
make -j4 && sudo make install
Download source code and install dependencies
cd ~/
git clone --depth 1 --branch 2021.3 https://github.com/openvinotoolkit/openvino.git
cd ~/openvino
git submodule update --init --recursive
sh ./install_build_dependencies.sh
cd ~/openvino/inference-engine/ie_bridges/python/
pip3 install -r requirements.txt
Start CMake build
export OpenCV_DIR=/usr/local/lib/cmake/opencv4
cd ~/openvino
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/home/pi/openvino_dist \
-DENABLE_MKL_DNN=OFF \
-DENABLE_CLDNN=OFF \
-DENABLE_GNA=OFF \
-DENABLE_SSE42=OFF \
-DTHREADING=SEQ \
-DENABLE_OPENCV=OFF \
-DNGRAPH_PYTHON_BUILD_ENABLE=ON \
-DNGRAPH_ONNX_IMPORT_ENABLE=ON \
-DENABLE_PYTHON=ON \
-DPYTHON_EXECUTABLE=$(which python3.7) \
-DPYTHON_LIBRARY=/usr/lib/arm-linux-gnueabihf/libpython3.7m.so \
-DPYTHON_INCLUDE_DIR=/usr/include/python3.7 \
-DCMAKE_CXX_FLAGS=-latomic ..
make -j4 && sudo make install
Configure the Intel® Neural Compute Stick 2 Linux USB Driver
sudo usermod -a -G users "$(whoami)"
source /home/pi/openvino_dist/bin/setupvars.sh
sh /home/pi/openvino_dist/install_dependencies/install_NCS_udev_rules.sh
Verify nGraph module binding to Python
cd /home/pi/openvino_dist/deployment_tools/inference_engine/samples/python/object_detection_sample_ssd
python3 object_detection_sample_ssd.py -h
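Once make install has finished and setupvars.sh has been sourced, a short import check confirms that both the inference engine and the nGraph Python binding are visible (module names as of the 2021.3 release built above):
# Smoke test for the OpenVINO 2021.3 Python bindings built above.
from openvino.inference_engine import IECore
import ngraph

ie = IECore()
print("Available devices:", ie.available_devices)  # expect MYRIAD with an NCS2 plugged in
print("ngraph loaded from:", ngraph.__file__)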

Cannot import pyspark from a pipenv virtualenv as it cannot find py4j

I have built a docker image containing spark and pipenv. If I run python within the pipenv virtualenv and attempt to import pyspark, it fails with error "ModuleNotFoundError: No module named 'py4j'"
root@4d0ae585a52a:/tmp# pipenv run python -c "from pyspark.sql import SparkSession"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/spark/python/pyspark/__init__.py", line 46, in <module>
from pyspark.context import SparkContext
File "/opt/spark/python/pyspark/context.py", line 29, in <module>
from py4j.protocol import Py4JError
ModuleNotFoundError: No module named 'py4j'
However, if I run pyspark within that same virtualenv there are no such problems:
root@4d0ae585a52a:/tmp# pipenv run pyspark
Python 3.7.4 (default, Sep 12 2019, 16:02:06)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/10/16 10:18:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/10/16 10:18:33 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/
Using Python version 3.7.4 (default, Sep 12 2019 16:02:06)
SparkSession available as 'spark'.
>>> spark.createDataFrame([('Alice',)], ['name']).collect()
[Row(name='Alice')]
I admit I copied a lot of the code for my Dockerfile from elsewhere, so I'm not fully au fait with how this hangs together under the covers. I was hoping that having py4j on the PYTHONPATH would be enough, but apparently not. I can confirm that it is there on the PYTHONPATH and that it exists:
root@4d0ae585a52a:/tmp# pipenv run python -c "import os;print(os.environ['PYTHONPATH'])"
/opt/spark/python:/opt/spark/python/lib/py4j-0.10.7-src.zip:
root@4d0ae585a52a:/tmp# pipenv run ls /opt/spark/python/lib/py4j*
/opt/spark/python/lib/py4j-0.10.4-src.zip
Can anyone suggest what I can do to make py4j available to my python interpreter in my virtualenv?
Here is the Dockerfile. We pull artifacts (docker images, apt packages, pypi packages etc) from our local jfrog artifactory cache, hence all the artifactory references herein:
FROM images.artifactory.our.org.com/python3-7-pipenv:1.0
WORKDIR /tmp
ENV SPARK_VERSION=2.2.1
ENV HADOOP_VERSION=2.8.4
ARG ARTIFACTORY_USER
ARG ARTIFACTORY_ENCRYPTED_PASSWORD
ARG ARTIFACTORY_PATH=artifactory.our.org.com/artifactory/generic-dev/ceng/external-dependencies
ARG SPARK_BINARY_PATH=https://${ARTIFACTORY_PATH}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz
ARG HADOOP_BINARY_PATH=https://${ARTIFACTORY_PATH}/hadoop-${HADOOP_VERSION}.tar.gz
ADD apt-transport-https_1.4.8_amd64.deb /tmp
RUN echo "deb https://username:password#artifactory.our.org.com/artifactory/debian-main-remote stretch main" >/etc/apt/sources.list.d/main.list &&\
echo "deb https://username:password#artifactory.our.org.com/artifactory/maria-db-debian stretch main" >>/etc/apt/sources.list.d/main.list &&\
echo 'Acquire::CompressionTypes::Order:: "gz";' > /etc/apt/apt.conf.d/02update &&\
echo 'Acquire::http::Timeout "10";' > /etc/apt/apt.conf.d/99timeout &&\
echo 'Acquire::ftp::Timeout "10";' >> /etc/apt/apt.conf.d/99timeout &&\
dpkg -i /tmp/apt-transport-https_1.4.8_amd64.deb &&\
apt-get install --allow-unauthenticated -y /tmp/apt-transport-https_1.4.8_amd64.deb &&\
apt-get update --allow-unauthenticated -y -o Dir::Etc::sourcelist="sources.list.d/main.list" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
RUN apt-get update && \
apt-get -y install default-jdk
# Detect JAVA_HOME and export in bashrc.
# This will result in something like this being added to /etc/bash.bashrc
# export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
RUN echo export JAVA_HOME="$(readlink -f /usr/bin/java | sed "s:/jre/bin/java::")" >> /etc/bash.bashrc
# Configure Spark-${SPARK_VERSION}
# Not using tar -v because including verbose output causes ci logs to exceed max length
RUN curl --fail -u "${ARTIFACTORY_USER}:${ARTIFACTORY_ENCRYPTED_PASSWORD}" -X GET "${SPARK_BINARY_PATH}" -o /opt/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz \
&& cd /opt \
&& tar -xzf /opt/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz \
&& rm spark-${SPARK_VERSION}-bin-hadoop2.7.tgz \
&& ln -s spark-${SPARK_VERSION}-bin-hadoop2.7 spark \
&& sed -i '/log4j.rootCategory=INFO, console/c\log4j.rootCategory=CRITICAL, console' /opt/spark/conf/log4j.properties.template \
&& mv /opt/spark/conf/log4j.properties.template /opt/spark/conf/log4j.properties \
&& mkdir /opt/spark-optional-jars/ \
&& mv /opt/spark/conf/spark-defaults.conf.template /opt/spark/conf/spark-defaults.conf \
&& printf "spark.driver.extraClassPath /opt/spark-optional-jars/*\nspark.executor.extraClassPath /opt/spark-optional-jars/*\n">>/opt/spark/conf/spark-defaults.conf \
&& printf "spark.driver.extraJavaOptions -Dderby.system.home=/tmp/derby" >> /opt/spark/conf/spark-defaults.conf
# Configure Hadoop-${HADOOP_VERSION}
# Not using tar -v because including verbose output causes ci logs to exceed max length
RUN curl --fail -u "${ARTIFACTORY_USER}:${ARTIFACTORY_ENCRYPTED_PASSWORD}" -X GET "${HADOOP_BINARY_PATH}" -o /opt/hadoop-${HADOOP_VERSION}.tar.gz \
&& cd /opt \
&& tar -xzf /opt/hadoop-${HADOOP_VERSION}.tar.gz \
&& rm /opt/hadoop-${HADOOP_VERSION}.tar.gz \
&& ln -s hadoop-${HADOOP_VERSION} hadoop
# Set Environment Variables.
ENV SPARK_HOME="/opt/spark" \
HADOOP_HOME="/opt/hadoop" \
PYSPARK_SUBMIT_ARGS="--master=local[*] pyspark-shell --executor-memory 1g --driver-memory 1g --conf spark.ui.enabled=false spark.executor.extrajavaoptions=-Xmx=1024m" \
PYTHONPATH="/opt/spark/python:/opt/spark/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH" \
PATH="$PATH:/opt/spark/bin:/opt/hadoop/bin" \
PYSPARK_DRIVER_PYTHON="/usr/local/bin/python" \
PYSPARK_PYTHON="/usr/local/bin/python"
# Upgrade pip and setuptools
RUN pip install --index-url https://username:password@artifactory.our.org.com/artifactory/api/pypi/pypi-virtual-all/simple --upgrade pip setuptools
I think I've managed to get around this simply by installing py4j standalone:
$ docker run --rm -it images.artifactory.our.org.com/myimage:mytag bash
root@1d6a0ec725f0:/tmp# pipenv install py4j
Installing py4j…
✔ Installation Succeeded
Pipfile.lock (49f1d8) out of date, updating to (dfdbd6)…
Locking [dev-packages] dependencies…
Locking [packages] dependencies…
✔ Success!
Updated Pipfile.lock (49f1d8)!
Installing dependencies from Pipfile.lock (49f1d8)…
🐍 ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 42/42 — 00:00:06
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
root@1d6a0ec725f0:/tmp# pipenv run python -c "from pyspark.sql import SparkSession;spark = SparkSession.builder.master('local').enableHiveSupport().getOrCreate();print(spark.createDataFrame([('Alice',)], ['name']).collect())"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/10/16 13:05:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/10/16 13:05:48 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
[Row(name='Alice')]
root@1d6a0ec725f0:/tmp#
Not entirely sure why I have to, given py4j is already on the PYTHONPATH, but so far it seems OK, so I'm happy. If anyone can elucidate why it didn't work without explicitly installing py4j, I'd love to know. I can only assume that this line from my Dockerfile:
PYTHONPATH="/opt/spark/python:/opt/spark/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH"
does not successfully make py4j known to the interpreter.
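One way to see what the interpreter actually makes of those PYTHONPATH entries is to check whether each sys.path entry exists on disk; an entry that is listed but missing is silently skipped:
import os
import sys

for entry in sys.path:
    if entry:  # skip the implicit '' entry (current directory)
        print("exists " if os.path.exists(entry) else "MISSING", entry)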
Just to confirm (in case it helps) where pip thinks py4j & pyspark are installed to:
root@1d6a0ec725f0:/tmp# pipenv run pip show pyspark
Name: pyspark
Version: 2.2.1
Summary: Apache Spark Python API
Home-page: https://github.com/apache/spark/tree/master/python
Author: Spark Developers
Author-email: dev@spark.apache.org
License: http://www.apache.org/licenses/LICENSE-2.0
Location: /opt/spark-2.2.1-bin-hadoop2.7/python
Requires: py4j
Required-by:
root@1d6a0ec725f0:/tmp# pipenv run pip show py4j
Name: py4j
Version: 0.10.8.1
Summary: Enables Python programs to dynamically access arbitrary Java objects
Home-page: https://www.py4j.org/
Author: Barthelemy Dagenais
Author-email: barthelemy@infobart.com
License: BSD License
Location: /root/.local/share/virtualenvs/tmp-XVr6zr33/lib/python3.7/site-packages
Requires:
Required-by: pyspark
root@1d6a0ec725f0:/tmp#
Another solution: unzip the py4j zip file as part of the Dockerfile stage that installs Spark, and then set PYTHONPATH accordingly:
unzip spark/python/lib/py4j-*-src.zip -d spark/python/lib/
...
...
PYTHONPATH="/opt/spark/python:/opt/spark/python/lib:$PYTHONPATH"
This feels like the best solution actually. Here's the new Dockerfile:
FROM images.artifactory.our.org.com/python3-7-pipenv:1.0
WORKDIR /tmp
ENV SPARK_VERSION=2.2.1
ENV HADOOP_VERSION=2.8.4
ARG ARTIFACTORY_USER
ARG ARTIFACTORY_ENCRYPTED_PASSWORD
ARG ARTIFACTORY_PATH=artifactory.our.org.com/artifactory/generic-dev/ceng/external-dependencies
ARG SPARK_BINARY_PATH=https://${ARTIFACTORY_PATH}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz
ARG HADOOP_BINARY_PATH=https://${ARTIFACTORY_PATH}/hadoop-${HADOOP_VERSION}.tar.gz
ADD apt-transport-https_1.4.8_amd64.deb /tmp
RUN echo "deb https://username:password#artifactory.our.org.com/artifactory/debian-main-remote stretch main" >/etc/apt/sources.list.d/main.list &&\
echo "deb https://username:password#artifactory.our.org.com/artifactory/maria-db-debian stretch main" >>/etc/apt/sources.list.d/main.list &&\
echo 'Acquire::CompressionTypes::Order:: "gz";' > /etc/apt/apt.conf.d/02update &&\
echo 'Acquire::http::Timeout "10";' > /etc/apt/apt.conf.d/99timeout &&\
echo 'Acquire::ftp::Timeout "10";' >> /etc/apt/apt.conf.d/99timeout &&\
dpkg -i /tmp/apt-transport-https_1.4.8_amd64.deb &&\
apt-get install --allow-unauthenticated -y /tmp/apt-transport-https_1.4.8_amd64.deb &&\
apt-get update --allow-unauthenticated -y -o Dir::Etc::sourcelist="sources.list.d/main.list" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
RUN apt-get update && \
apt-get -y install default-jdk
# Detect JAVA_HOME and export in bashrc.
# This will result in something like this being added to /etc/bash.bashrc
# export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
RUN echo export JAVA_HOME="$(readlink -f /usr/bin/java | sed "s:/jre/bin/java::")" >> /etc/bash.bashrc
# Configure Spark-${SPARK_VERSION}
# Not using tar -v because including verbose output causes ci logs to exceed max length
RUN curl --fail -u "${ARTIFACTORY_USER}:${ARTIFACTORY_ENCRYPTED_PASSWORD}" -X GET "${SPARK_BINARY_PATH}" -o /opt/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz \
&& cd /opt \
&& tar -xzf /opt/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz \
&& rm spark-${SPARK_VERSION}-bin-hadoop2.7.tgz \
&& ln -s spark-${SPARK_VERSION}-bin-hadoop2.7 spark \
&& unzip spark/python/lib/py4j-*-src.zip -d spark/python/lib/ \
&& sed -i '/log4j.rootCategory=INFO, console/c\log4j.rootCategory=CRITICAL, console' /opt/spark/conf/log4j.properties.template \
&& mv /opt/spark/conf/log4j.properties.template /opt/spark/conf/log4j.properties \
&& mkdir /opt/spark-optional-jars/ \
&& mv /opt/spark/conf/spark-defaults.conf.template /opt/spark/conf/spark-defaults.conf \
&& printf "spark.driver.extraClassPath /opt/spark-optional-jars/*\nspark.executor.extraClassPath /opt/spark-optional-jars/*\n">>/opt/spark/conf/spark-defaults.conf \
&& printf "spark.driver.extraJavaOptions -Dderby.system.home=/tmp/derby" >> /opt/spark/conf/spark-defaults.conf
# Configure Hadoop-${HADOOP_VERSION}
# Not using tar -v because including verbose output causes ci logs to exceed max length
RUN curl --fail -u "${ARTIFACTORY_USER}:${ARTIFACTORY_ENCRYPTED_PASSWORD}" -X GET "${HADOOP_BINARY_PATH}" -o /opt/hadoop-${HADOOP_VERSION}.tar.gz \
&& cd /opt \
&& tar -xzf /opt/hadoop-${HADOOP_VERSION}.tar.gz \
&& rm /opt/hadoop-${HADOOP_VERSION}.tar.gz \
&& ln -s hadoop-${HADOOP_VERSION} hadoop
# Set Environment Variables.
ENV SPARK_HOME="/opt/spark" \
HADOOP_HOME="/opt/hadoop" \
PYSPARK_SUBMIT_ARGS="--master=local[*] pyspark-shell --executor-memory 1g --driver-memory 1g --conf spark.ui.enabled=false spark.executor.extrajavaoptions=-Xmx=1024m" \
PYTHONPATH="/opt/spark/python:/opt/spark/python/lib:$PYTHONPATH" \
PATH="$PATH:/opt/spark/bin:/opt/hadoop/bin" \
PYSPARK_DRIVER_PYTHON="/usr/local/bin/python" \
PYSPARK_PYTHON="/usr/local/bin/python"
# Upgrade pip and setuptools
RUN pip install --index-url https://username:password@artifactory.our.org.com/artifactory/api/pypi/pypi-virtual-all/simple --upgrade pip setuptools
So apparently I could not put a zip file on the PYTHONPATH and have the contents of that zip file available to the Python interpreter. As I said above, I copied that code from elsewhere, so why it worked for someone else and not for me I couldn't say at first. Note, though, that the PYTHONPATH in my original Dockerfile referenced py4j-0.10.7-src.zip, while the ls output above shows that the file which actually shipped with this Spark build is py4j-0.10.4-src.zip, which would explain why that zip entry never resolved. Oh well, everything is working now.
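For what it's worth, the interpreter can import straight from a zip on sys.path when the file really exists; a quick demonstration (zip path taken from the ls output earlier):
import sys

# Zip path as shown by ls /opt/spark/python/lib/py4j* above.
sys.path.insert(0, "/opt/spark/python/lib/py4j-0.10.4-src.zip")

import py4j
print(py4j.__file__)  # should point inside the zip archive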
Here is a nice single command to check that it all works:
docker run --rm -it myimage:mytag pipenv run python -c "from pyspark.sql import SparkSession;spark = SparkSession.builder.master('local').enableHiveSupport().getOrCreate();print(spark.createDataFrame([('Alice',)], ['name']).collect())"
Here's my output from running that command:
$ docker run --rm -it myimage:mytag pipenv run python -c "from pyspark.sql import SparkSession;spark = SparkSession.builder.master('local').enableHiveSupport().getOrCreate();print(spark.createDataFrame([('Alice',)], ['name']).collect())"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/10/16 15:53:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/10/16 15:53:55 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
19/10/16 15:53:55 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
19/10/16 15:53:56 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
[Row(name='Alice')]

How to manually pass source of bzip2 install for Python install?

I've been through several StackOverflow questions about Python & bzip2. These have been very helpful in getting me to the state I'm clearly at now. Here's what I've done so far and the problem I'm having:
I do not have root access and cannot install libbz2-dev(el)
/usr/bin/bzip2 is version 1.0.3
/usr/bin/python is version 2.4.3
GNU Stow is being used to manage libraries, similar to how Homebrew works
I need Python 2.7.3 to install with the bzip2 module in order to properly compile node.js from source. And yes, I'm sorry, but I do actually have to do all of this as a regular user from source.
I have installed bzip2 from source as follows:
$ make -f Makefile-libbz2_so
$ make
$ make install PREFIX=${STOW}/bzip2-1.0.6
$ cp libbz2.so.1.0.6 ${STOW}/bzip2-1.0.6/lib/
$ cd ${STOW}/bzip2-1.0.6/lib
$ ln -s libbz2.so.1.0.6 libbz2.so.1.0
$ cd ${STOW}
$ stow bzip2-1.0.6
I have stow's root directory in my PATH before anything else, so this results in:
$ bzip2 -V
# [...] Version 1.0.6
Which indicates that the correct bzip2 is being utilized in my PATH.
Next I move on to compiling Python from source and run the following:
$ cd Python-2.7.3
$ ./configure --prefix=${STOW}/Python-2.7.3
$ make
# Complains about several missing modules, of which "bz2" is the one I care about
$ make install prefix=${STOW}/Python-2.7.3 # unimportant as bz2 module failed to install
What is the correct way to tell Python, during its source configuration, where the source-installed bzip2 1.0.6 library lives, so it will detect the bzip2 devel headers and install the module properly?
Alright, it took me a few months to get to this, but I'm finally back and managed to tackle this problem.
Install bzip2 from source:
# Upload bzip2-1.0.6.tar.gz to ${SRC}
$ cd ${SRC}
$ tar -xzvf bzip2-1.0.6.tar.gz
$ cd bzip2-1.0.6
$ export CFLAGS="-fPIC"
$ make -f Makefile-libbz2_so
$ make
$ make install PREFIX=${STOW}/bzip2-1.0.6
$ cp libbz2.so.1.0.6 ${STOW}/bzip2-1.0.6/lib/
$ cd ${STOW}/bzip2-1.0.6/lib
$ ln -s libbz2.so.1.0.6 libbz2.so.1.0
$ cd ${STOW}
$ stow bzip2-1.0.6
$ source ${HOME}/.bash_profile
$ bzip2 --version
#=> bzip2, a block-sorting file compressor. Version 1.0.6...
Install Python from source:
# Upload Python-2.7.3.tar.gz to ${SRC}
$ cd ${SRC}
$ tar -xzvf Python-2.7.3.tar.gz
$ cd Python-2.7.3
$ export CFLAGS="-fPIC"
$ export C_INCLUDE_PATH=${STOW}/../include
$ export CPLUS_INCLUDE_PATH=${C_INCLUDE_PATH}
$ export LIBRARY_PATH=${STOW}/../lib
$ export LD_RUN_PATH=${LIBRARY_PATH}
$ ./configure --enable-shared --prefix=${STOW}/Python-2.7.3 --libdir=${STOW}/../lib
$ make
$ make install prefix=${STOW}/Python-2.7.3
$ cd ${STOW}
$ stow Python-2.7.3
$ source ${HOME}/.bash_profile
$ python -V
#=> Python 2.7.3
$ python -c "import bz2; print bz2.__doc__"
#=> The python bz2 module provides...
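A slightly stronger check than printing the docstring is a compress/decompress round trip, which exercises the linked libbz2 (works on both Python 2.7 and 3):
import bz2

data = b"stow-managed bzip2 smoke test " * 10
packed = bz2.compress(data)
assert bz2.decompress(packed) == data
print("bz2 round trip OK, ratio %.2f" % (float(len(packed)) / len(data)))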
Although node.js wasn't technically part of the question, it is what drove me to go through all of the above so I may as well include the last few commands to get node.js installed from source using a source install Python 2.7.3 & bzip2 1.0.6:
Install node.js from source:
# Upload node-v0.10.0.tar.gz to ${SRC}
$ cd ${SRC}
$ tar -xzvf node-v0.10.0.tar.gz
$ cd node-v0.10.0
$ ./configure --prefix=${STOW}/node-v0.10.0
$ make
$ make install prefix=${STOW}/node-v0.10.0
$ cd ${STOW}
$ stow node-v0.10.0
$ source ${HOME}/.bash_profile
$ node -v
#=> v0.10.0
