Run tika-python with Django in Docker

I have a Django site that parses PDFs using tika-python and stores the parsed PDF content in an Elasticsearch index. It works fine on my local machine. I want to run this setup with Docker; however, tika-python does not work there because it requires Java 8 to run its REST server in the background.
My Dockerfile:
FROM python:3.6.5
WORKDIR /site
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
EXPOSE 9200
ENV PATH="/site/poppler/bin:${PATH}"
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
My requirements.txt file:
django==2.2
beautifulsoup4==4.6.0
json5==0.8.4
jsonschema==2.6.0
django-elasticsearch-dsl==0.5.1
tika==1.19
sklearn
Where (Dockerfile or requirements.txt) and how should I add the Java 8 that Tika needs to make this work in Docker? Online tutorials/examples cover Java + Tika alone in a container, which is easy to achieve; unfortunately, I couldn't find a similar Python + Java solution on Stack Overflow either.
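One approach that should work (a sketch, not verified against this exact project): install a headless JRE with apt in the same image. The python:3.6.5 image is Debian-based, and tika-python only needs a java binary on the PATH to launch its background REST server. For example:
FROM python:3.6.5
# Install a headless JRE so tika-python can launch its REST server.
# On the Debian Stretch base used by python:3.6.5, default-jre-headless
# resolves to OpenJDK 8; on newer bases pick the JRE package you need.
RUN apt-get update \
 && apt-get install -y --no-install-recommends default-jre-headless \
 && rm -rf /var/lib/apt/lists/*
WORKDIR /site
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
EXPOSE 9200
ENV PATH="/site/poppler/bin:${PATH}"
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
An alternative is to run Tika as its own container (for example from the apache/tika image) and point tika-python at it instead of letting it spawn a local server; tika-python has a client-only mode for this (see its README for the TIKA_CLIENT_ONLY / TIKA_SERVER_ENDPOINT environment variables), which keeps the Django image Java-free.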

Related

Python winamd64 image and Windows container

I am new to Docker and want to know whether there is a slim or lightweight Python winamd64 image.
My Dockerfile is as follows:
FROM python:3.8-slim-buster
# Create project directory (workdir)
WORKDIR /SMART
# Add requirements.txt to WORKDIR and install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
# Add source code files to WORKDIR
ADD . .
# Application port (optional)
EXPOSE 8000
# Container start command
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
It works well in the Linux container mode, but I need my container to be Windows-based. I face a problem when I switch to Windows containers, even after setting "experimental" to true. Therefore, I decided to use a Python winamd64 image, but all of the official images on the website are around 1 GB. I need a small Python image like python:3.8-slim-buster, so how could I solve this problem? Or how could I change the first line of the Dockerfile (FROM python:3.8-slim-buster) to use a Python image that I pushed to my own hub?
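As for the last part of the question, pointing the Dockerfile at an image you have pushed yourself only requires referencing it by repository name and tag (the name below is a placeholder):
# Hypothetical repository name; replace with the image:tag you actually pushed
FROM your-dockerhub-username/python-winamd64:3.8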

Trying to supply PGPASS to Docker Image

New to Docker here. I'm trying to create a basic Dockerfile where I run a Python script that runs some queries in Postgres through psycopg2. I currently have a pgpass file set up via my environment variables so that I can run these tools without supplying a password in the code. I'm trying to replicate this in Docker. I have Windows on my local machine.
FROM datascienceschool/rpython as base
RUN mkdir /test
WORKDIR /test
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY main_airflow.py /test
RUN cp C:\Users\user.name\AppData\Roaming\postgresql\pgpass.conf /test
ENV PGPASSFILE="test/pgpass.conf"
ENV PGUSER="user"
ENTRYPOINT ["python", "/test/main_airflow.py"]
This is what I've tried in my Dockerfile. I've tried to copy over my pgpass file and set it as my environment variable. Apologies if I have my forward/backslashes or syntax wrong; I'm very new to Docker, Linux, etc.
Any help or alternatives would be appreciated
It's better to pass your secrets into the container at runtime than it is to include the secret in the image at build-time. This means that the Dockerfile doesn't need to know anything about this value.
For example
$ export PGPASSWORD=<postgres password literal>
$ docker run -e PGPASSWORD <image ref>
Now in that example, I've used PGPASSWORD, which is an alternative to PGPASSFILE. It's a little more complicated to do the same thing if you're using a file, but it would look something like this:
The plan will be to mount the credentials as a volume at runtime.
FROM datascienceschool/rpython as base
RUN mkdir /test
WORKDIR /test
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY main_airflow.py /test
ENV PGPASSFILE="/credentials/pgpass.conf"
ENV PGUSER="user"
ENTRYPOINT ["python", "/test/main_airflow.py"]
As I said above, we don't want to include the secrets in the image. We are going to indicate where the file will be in the image, but we don't actually include it yet.
Now, when we start the image, we'll mount a volume containing the file at the location specified in the image, /credentials
$ docker run --mount src="<host path to directory>",target="/credentials",type=bind <image ref>
I haven't tested this, so you may need to adjust the exact paths and such, but this is the idea of how to pass sensitive values into a Docker container.
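For completeness, a minimal sketch (untested; the connection details are placeholders) of what main_airflow.py can look like when the password is not hard-coded and is instead resolved from PGPASSWORD or the pgpass file named by PGPASSFILE:
import psycopg2

# No password argument here: libpq (used by psycopg2) reads it from the
# PGPASSWORD environment variable or from the file named by PGPASSFILE.
conn = psycopg2.connect(
    host="db.example.com",  # placeholder host
    dbname="mydb",          # placeholder database name
    user="user",            # matches the PGUSER set in the Dockerfile
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone())

conn.close()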

Is there a way to download TextBlob corpora to Google Cloud Run?

I am using Python with TextBlob for sentiment analysis. I want to deploy my app (built in Plotly Dash) to Google Cloud Run with Google Cloud Build (without using Docker). When running locally in my virtual environment all goes fine, but after deploying it to the cloud the corpora are not downloaded. Looking at the requirements.txt file, there was also no reference to the corpora.
I have tried to add python -m textblob.download_corpora to my requirements.txt file, but it doesn't download when I deploy. I have also tried to add
import textblob
import subprocess
cmd = ['python','-m','textblob.download_corpora']
subprocess.run(cmd)
and
import nltk
nltk.download('movie_reviews')
to my script (callbacks.py, I am using Plotly Dash to make my app), all without success.
Is there a way to add this corpus to my requirements.txt file? Or is there another workaround to download this corpus? How can I fix this?
Thanks in advance!
Vijay
Since Cloud Run creates and destroys containers as needed for your traffic levels, you'll want to embed your corpora in the pre-built container to ensure a fast cold-start time (instead of downloading them when the container starts).
The easiest way to do this is to add another line to your Dockerfile that downloads and installs the corpora at build time, like so:
RUN python -m textblob.download_corpora
Here's a full Dockerfile for your reference:
# Python image to use.
FROM python:3.8
# Set the working directory to /app
WORKDIR /app
# copy the requirements file used for dependencies
COPY requirements.txt .
# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r requirements.txt
RUN python -m textblob.download_corpora
# Copy the rest of the working directory contents into the container at /app
COPY . .
# Run app.py when the container launches
ENTRYPOINT ["python", "app.py"]
Good luck,
Josh

Run multiple Python programs through a batch file on Docker CE Desktop (Windows)

I have 3 Python scripts which I want to run at the same time through a batch file on Docker CE for Windows.
I have created 3 containers for the 3 Python scripts. All 3 scripts require input files.
python_script_1 : docker-container-name
python_script_2 : docker-container-name
python_script_3 : docker-container-name
The Dockerfiles for the 3 Python scripts are:
Docker_1
FROM python:3.7
RUN pip install pandas
COPY input.csv ./
COPY script.py ./
CMD ["python", "./script.py", "input.csv"]
Docker_2
FROM python:3.7
RUN pip install pandas
COPY input_itgear.txt ./
COPY script.py ./
CMD ["python", "./script.py", "input_itgear.txt"]
Docker_3
FROM python:3.7
RUN pip install pandas
COPY sale_new.txt ./
COPY store.txt ./
COPY script.py ./
CMD ["python", "./script.py", "sale_new.txt", "store.txt"]
I want to run all three scripts at the same time on Docker through a batch file. Any help will be greatly appreciated.
So here is the gist:
https://gist.github.com/karlisabe/9f0d43fe09536efa8035092ccbb593d4
You need to place your csv and py files accordingly so that they can be copied into the images. What will happen is that when you run docker-compose up -d, it should build the images (or use the cache if no changes were made) and run all 3 services. In essence it's like running docker run my-image three times, but with some additional features; for example, all 3 containers will be placed on a special network created by Docker. You should read more about docker-compose here: https://docs.docker.com/compose/
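Not reproducing the gist here, but a docker-compose.yml for the three scripts would have roughly this shape (a sketch; the service names and build directories are assumptions, each directory holding the corresponding Dockerfile and its input files):
version: "3"
services:
  script1:
    build: ./python_script_1   # contains Docker_1, script.py and input.csv
  script2:
    build: ./python_script_2   # contains Docker_2, script.py and input_itgear.txt
  script3:
    build: ./python_script_3   # contains Docker_3, script.py, sale_new.txt and store.txt
With that in place, the batch file only needs a single line, something like docker-compose up -d --build, to build and start all three containers together.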

How to run a Docker image in IBM Cloud Functions?

I have a simple Python program that I want to run in IBM Cloud Functions. Alas, it needs two libraries (O365 and PySnow), so I have to Dockerize it, and it needs to be able to accept a JSON feed from STDIN. I succeeded in doing this:
FROM python:3
ADD requirements.txt ./
RUN pip install -r requirements.txt
ADD ./main ./main
WORKDIR /main
CMD ["python", "main.py"]
This runs with: cat env_var.json | docker run -i f9bf70b8fc89
I've added the Docker container to IBM Cloud Functions like this:
ibmcloud fn action create e2t-bridge --docker [username]/e2t-bridge
However, when I run it, it times out.
Now I did see a possible solution route, where I Dockerize it as an OpenWhisk application. But for that I would need to create a binary from my Python application and then load it into a rather complicated OpenWhisk skeleton, I think?
But having a file you can simply run is the whole point of my Docker image, so creating a binary of an interpreted language and then adding it into an OpenWhisk Docker image just feels awfully clunky.
What would be the best way to approach this?
It turns out you don't need to create a binary, you just need to edit the OpenWhisk skeleton like so:
# Dockerfile for example whisk docker action
FROM openwhisk/dockerskeleton
ENV FLASK_PROXY_PORT 8080
### Add source file(s)
ADD requirements.txt /action/requirements.txt
RUN cd /action; pip install -r requirements.txt
# Move the source files to /action
ADD ./main /action
# Rename our executable Python action
ADD /main/main.py /action/exec
CMD ["/bin/bash", "-c", "cd actionProxy && python -u actionproxy.py"]
And make sure that your Python code accepts the JSON input (the action proxy passes it as the first command-line argument):
json_input = json.loads(sys.argv[1])
The whole explanation is here: https://github.com/iainhouston/dockerPython
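As a rough sketch of the Python side (based on how the dockerskeleton's action proxy invokes /action/exec: the invocation parameters arrive as a JSON string in the first argument, and the result is read back from the last line printed to stdout), main.py could look something like this:
#!/usr/bin/env python
import json
import sys

def main():
    # The action proxy passes the parameters as a JSON string in argv[1];
    # fall back to an empty dict if the action is invoked without any.
    params = json.loads(sys.argv[1]) if len(sys.argv) > 1 else {}

    # ... do the O365 / PySnow work here ...
    result = {"received": params}

    # The last line written to stdout must be the JSON result object.
    print(json.dumps(result))

if __name__ == "__main__":
    main()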
