How to install Poppler to be used on AWS Lambda - python

I have to run pdf2image on my Python Lambda Function in AWS, but it requires poppler and poppler-utils to be installed on the machine.
I have tried to search in many different places how to do that but could not find anything or anyone that have done that using lambda functions.
Would any of you know how to generate poppler binaries, put it on my Lambda package and tell Lambda to use that?
Thank you all.

AWS lambda runs under an execution environment which includes software and libraries if anything you need is not there you need to install it to create an execution environment.Check the below link for more info ,
https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html
for poppler follow this steps to create your own binary
https://github.com/skylander86/lambda-text-extractor/blob/master/BuildingBinaries.md

My approach was to use the AWS Linux 2 image as a base to ensure maximum compatibility with the Lambda environment, compile openjpeg and poppler in the container build and build a zip containing the binaries and libraries needed which can then by used as a layer.
This enables you to write your code in it's own lambda which pulls in the poppler dependencies as a layer, simplifying build and deployment.
The contents of the layer will be unpacked into /opt/. This means the contents will automatically be available because by default in the lambda environment
$PATH is /usr/local/bin:/usr/bin/:/bin:/opt/bin
$LD_LIBRARY_PATH is /lib64:/usr/lib64:$LAMBDA_RUNTIME_DIR:$LAMBDA_RUNTIME_DIR/lib:$LAMBDA_TASK_ROOT:$LAMBDA_TASK_ROOT/lib:/opt/lib
Dockerfile :
# https://www.petewilcock.com/using-poppler-pdftotext-and-other-custom-binaries-on-aws-lambda/
ARG POPPLER_VERSION="21.10.0"
ARG POPPLER_DATA_VERSION="0.4.11"
ARG OPENJPEG_VERSION="2.4.0"
FROM amazonlinux:2
ARG POPPLER_VERSION
ARG POPPLER_DATA_VERSION
ARG OPENJPEG_VERSION
WORKDIR /root
RUN yum update -y
RUN yum install -y \
cmake \
cmake3 \
fontconfig-devel \
gcc \
gcc-c++ \
gzip \
libjpeg-devel \
libpng-devel \
libtiff-devel \
make \
tar \
xz \
zip
RUN curl -o poppler.tar.xz https://poppler.freedesktop.org/poppler-${POPPLER_VERSION}.tar.xz
RUN tar xf poppler.tar.xz
RUN curl -o poppler-data.tar.gz https://poppler.freedesktop.org/poppler-data-${POPPLER_DATA_VERSION}.tar.gz
RUN tar xf poppler-data.tar.gz
RUN curl -o openjpeg.tar.gz https://codeload.github.com/uclouvain/openjpeg/tar.gz/refs/tags/v${OPENJPEG_VERSION}
RUN tar xf openjpeg.tar.gz
WORKDIR poppler-data-${POPPLER_DATA_VERSION}
RUN make install
WORKDIR /root
RUN mkdir openjpeg-${OPENJPEG_VERSION}/build
WORKDIR openjpeg-${OPENJPEG_VERSION}/build
RUN cmake .. -DCMAKE_BUILD_TYPE=Release
RUN make
RUN make install
WORKDIR /root
RUN mkdir poppler-${POPPLER_VERSION}/build
WORKDIR poppler-${POPPLER_VERSION}/build
RUN cmake3 .. -DCMAKE_BUILD_TYPE=release -DBUILD_GTK_TESTS=OFF -DBUILD_QT5_TESTS=OFF -DBUILD_QT6_TESTS=OFF \
-DBUILD_CPP_TESTS=OFF -DBUILD_MANUAL_TESTS=OFF -DENABLE_BOOST=OFF -DENABLE_CPP=OFF -DENABLE_GLIB=OFF \
-DENABLE_GOBJECT_INTROSPECTION=OFF -DENABLE_GTK_DOC=OFF -DENABLE_QT5=OFF -DENABLE_QT6=OFF \
-DENABLE_LIBOPENJPEG=openjpeg2 -DENABLE_CMS=none -DBUILD_SHARED_LIBS=OFF
RUN make
RUN make install
WORKDIR /root
RUN mkdir -p package/{lib,bin,share}
RUN cp -d /usr/lib64/libexpat* package/lib
RUN cp -d /usr/lib64/libfontconfig* package/lib
RUN cp -d /usr/lib64/libfreetype* package/lib
RUN cp -d /usr/lib64/libjbig* package/lib
RUN cp -d /usr/lib64/libjpeg* package/lib
RUN cp -d /usr/lib64/libpng* package/lib
RUN cp -d /usr/lib64/libtiff* package/lib
RUN cp -d /usr/lib64/libuuid* package/lib
RUN cp -d /usr/lib64/libz* package/lib
RUN cp -rd /usr/local/lib/* package/lib
RUN cp -rd /usr/local/lib64/* package/lib
RUN cp -d /usr/local/bin/* package/bin
RUN cp -rd /usr/local/share/poppler package/share
WORKDIR package
RUN zip -r9 ../package.zip *
And to run...
docker build -t poppler .
docker run --name poppler -d -t poppler cat
docker cp poppler:/root/package.zip .
Then upload package.zip as a layer using the console or aws cli.

I used the pre-built AWS Lambda layer https://github.com/jeylabs/aws-lambda-poppler-layer/releases and it worked!
You can use this solution if you just want to run the function, but If you want to specify the version and have more control, I'll recommend using the container image solution.

Straightforward Build Instructions for Poppler on Lambda using Docker
In order to put Poppler on Lambda, we will build a zipped folder containing poppler and add it as a layer. Follow these steps on an EC2 instance running Amazon Linux 2 (t2micro is plenty).
Setup the machine
Install docker on the EC2 machine. Instructions here
mkdir -p poppler_binaries
Create a Dockerfile
Use this link or copy/paste from below.
FROM ubuntu:18.04
# Installing dependencies
RUN apt update
RUN apt-get update
RUN apt-get install -y locate \
libopenjp2-7 \
poppler-utils
RUN rm -rf /poppler_binaries; mkdir /poppler_binaries;
RUN updatedb
RUN cp $(locate libpoppler.so) /poppler_binaries/.
RUN cp $(which pdftoppm) /poppler_binaries/.
RUN cp $(which pdfinfo) /poppler_binaries/.
RUN cp $(which pdftocairo) /poppler_binaries/.
RUN cp $(locate libjpeg.so.8 ) /poppler_binaries/.
RUN cp $(locate libopenjp2.so.7 ) /poppler_binaries/.
RUN cp $(locate libpng16.so.16 ) /poppler_binaries/.
RUN cp $(locate libz.so.1 ) /poppler_binaries/.
Build Docker Image and create a zip file
Running the commands below will produce a zip file in your home directory.
docker build -t poppler-build .
# Run the container
docker run -d --name poppler-build-cont poppler-build sleep 20
#docker exec poppler-build-cont
sudo docker cp poppler-build-cont:/poppler_binaries .
# Cleaning up
docker kill poppler-build-cont
docker rm poppler-build-cont
docker image rm poppler-build
cd poppler_binaries
zip -r9 ..poppler.zip .
cd ..
Make and add your Lambda Layer
Download your zip file or upload it to S3. Head to the Lambda Console page to create a Layer and then add it to your function. Information about layers here.
Add Environment Variable to Lambda
In order to avoid adding unnecessary folder structure to the zip as described here. We will add an environment variable to point to our dependency
PYTHONPATH: /opt/
And Viola! You now have a working Lambda function with Poppler!
Note: Credit to these two articles which helped me piece this together
pdf2image-as-a-service GitHub Repo
Pete Wilcock Website
Warning: do not try to add pdf2image to the same layer. I am not sure why but when they are in the same layer, pdf2image cannot find poppler.

Hi #Alex Albracht thanks for compiled easy instructions! They helped a lot. But I really struggled with getting the lambda function find the poppler path. So, I'll try to add that up with an effort to make it clear.
The binary files should go in a zip folder having structure as:
poppler.zip -> bin/poppler
where poppler folder contains the binary files. This zip folder can be then uploaded as a layer in AWS lambda.
For pdf2image to work, it needs poppler path. This should be included in the lambda function in the format - "/opt/bin/poppler".
For example,
poppler_path = "/opt/bin/poppler"
pages = convert_from_path(PDF_file, 500, poppler_path=poppler_path)

Related

AppDynamics Serverless APM for Lambda. Auto instrument docker layer extension. Python app. Error = Extension.Crash

I'm trying to implement APM Serverless for AWS Lambda in a Python function. The function is deployed via a container image, so the extension is built as a layer in the Dockerfile.
Firstly, in case someone is trying to auto instrument this process; The extension must be unzipped into /opt/extensions/ not to /opt/ like suggested in the docs. Otherwise, the Lambda won't see the extension.
This is the Dockerfile:
FROM amazon/aws-cli:2.2.4 AS downloader
ARG version_number=10
ARG region=
ENV AWS_REGION=${region}
ENV VERSION_NUMBER=${version_number}
RUN yum install -y jq curl
WORKDIR /aws
RUN aws lambda get-layer-version-by-arn --arn arn:aws:lambda:$AWS_REGION:716333212585:layer:appdynamics-lambda-extension:$VERSION_NUMBER | jq -r '.Content.Location' | xargs curl -o extension.zip
# set base image (host OS)
FROM python:3.6.8-slim
ENV APPDYNAMICS_PYTHON_AUTOINSTRUMENT=true
# set the working directory for AppD
WORKDIR /opt
RUN apt-get clean \
&& apt-get -y update \
&& apt-get -y install python3-dev \
python3-psycopg2 \
&& apt-get -y install build-essential
COPY --from=downloader /aws/extension.zip .
RUN apt-get install -y unzip && unzip extension.zip -d /opt/extensions/ && rm -f extension.zip
# set the working directory in the container
WORKDIR /code
RUN pip install --upgrade pip \
&& pip install awslambdaric && pip install appdynamics-lambda-tracer
# copy the dependencies file to the working directory
COPY requirements.txt .
# install dependencies
RUN pip install -r requirements.txt
# copy the content of the local src directory to the working directory
COPY /src .
RUN chmod 644 $(find . -type f) \
&& chmod 755 $(find . -type d)
# command to run on container start
ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
CMD [ "app.lambda_handler" ]
However, when executing the function, I get the following error:
AWS_Execution_Env is not supported, running lambda with default arguments.
EXTENSION Name: appdynamics-extension-script State: Started Events: []
Start RequestId <requestid>
End RequestId <requestid>
Error: exit code 0 Extension.Crash
Without more information.
When I try to implement the tracer manually, by installing appdynamics-lambda-tracer with pip, and importing the module I do see the logs in Cloudwatch from AppDynamics but they don't report to the controller.
Any idea what could be causing said crash?

standard_init_linux.go:211: exec user process caused "exec format error" when build tiktokapi:latest [duplicate]

docker started throwing this error:
standard_init_linux.go:178: exec user process caused "exec format error"
whenever I run a specific docker container with CMD or ENTRYPOINT, with no regard to any changes to the file other then removing CMD or ENTRYPOINT. here is the docker file I have been working with which worked perfectly until about an hour ago:
FROM buildpack-deps:jessie
ENV PATH /usr/local/bin:$PATH
ENV LANG C.UTF-8
RUN apt-get update && apt-get install -y --no-install-recommends \
tcl \
tk \
&& rm -rf /var/lib/apt/lists/*
ENV GPG_KEY 0D96DF4D4110E5C43FBFB17F2D347EA6AA65421D
ENV PYTHON_VERSION 3.6.0
ENV PYTHON_PIP_VERSION 9.0.1
RUN set -ex \
&& buildDeps=' \
tcl-dev \
tk-dev \
' \
&& apt-get update && apt-get install -y $buildDeps --no-install-recommends && rm -rf /var/lib/apt/lists/* \
\
&& wget -O python.tar.xz "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz" \
&& wget -O python.tar.xz.asc "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc" \
&& export GNUPGHOME="$(mktemp -d)" \
&& gpg --keyserver ha.pool.sks-keyservers.net --recv-keys "$GPG_KEY" \
&& gpg --batch --verify python.tar.xz.asc python.tar.xz \
&& rm -r "$GNUPGHOME" python.tar.xz.asc \
&& mkdir -p /usr/src/python \
&& tar -xJC /usr/src/python --strip-components=1 -f python.tar.xz \
&& rm python.tar.xz \
\
&& cd /usr/src/python \
&& ./configure \
--enable-loadable-sqlite-extensions \
--enable-shared \
&& make -j$(nproc) \
&& make install \
&& ldconfig \
\
&& if [ ! -e /usr/local/bin/pip3 ]; then : \
&& wget -O /tmp/get-pip.py 'https://bootstrap.pypa.io/get-pip.py' \
&& python3 /tmp/get-pip.py "pip==$PYTHON_PIP_VERSION" \
&& rm /tmp/get-pip.py \
; fi \
&& pip3 install --no-cache-dir --upgrade --force-reinstall "pip==$PYTHON_PIP_VERSION" \
&& [ "$(pip list |tac|tac| awk -F '[ ()]+' '$1 == "pip" { print $2; exit }')" = "$PYTHON_PIP_VERSION" ] \
\
&& find /usr/local -depth \
\( \
\( -type d -a -name test -o -name tests \) \
-o \
\( -type f -a -name '*.pyc' -o -name '*.pyo' \) \
\) -exec rm -rf '{}' + \
&& apt-get purge -y --auto-remove $buildDeps \
&& rm -rf /usr/src/python ~/.cache
RUN cd /usr/local/bin \
&& { [ -e easy_install ] || ln -s easy_install-* easy_install; } \
&& ln -s idle3 idle \
&& ln -s pydoc3 pydoc \
&& ln -s python3 python \
&& ln -s python3-config python-config
RUN pip install uwsgi
RUN mkdir /config
RUN mkdir /logs
ENV HOME /var/www
WORKDIR /config
ADD conf/requirements.txt /config
RUN pip install -r /config/requirements.txt
ADD conf/wsgi.py /config
ADD conf/wsgi.ini /config
ADD conf/__init__.py /config
ADD start.sh /bin/start.sh
RUN chmod +x /bin/start.sh
EXPOSE 8000
ENTRYPOINT ["start.sh", "uwsgi", "--ini", "wsgi.ini"]
I forgot to put
#!/bin/bash
at the top of the sh file, problem solved.
This can happen if you're trying to run an x86 built image on an arm64/aarch64 machine.
You'll need to rebuild the image using the corresponding architecture
This error could also occur if an image was built on a MacBook Pro with a Apple M1 Pro chip, which is ARM-based, so by default the Docker build command targets arm64.
Docker in fact detects the Apple M1 Pro platform as linux/arm64/v8
Specifying the platform to both the build command and version tag was enough:
# Build for ARM64 (default)
docker build -t <image-name>:<version>-arm64 .
# Build for ARM64
docker build --platform=linux/arm64 -t <image-name>:<version>-arm64 .
# Build for AMD64
docker build --platform=linux/amd64 -t <image-name>:<version>-amd64 .
Environment
Chip: Apple M1 Pro, 10 Cores (8 performance and 2 efficiency)
Docker version 20.10.12, build e91ed57
Add this code
#!/usr/bin/env bash
at the top of your script file.
If the Docker image is built on an M1 chip and uploaded to be deployed by Fargate then you’ll notice this container error in Fargate:
standard_init_linux.go:228: exec user process caused: exec format error
There’s a couple ways to work around this. You can either:
Build your docker image using:
docker buildx build --platform=linux/amd64 -t image-name:version .
Update your Dockerfile’s FROM statements with
FROM --platform=linux/amd64 BASE_IMAGE:VERSION
Got the same Error, i was building ARM image after changing to AMD. Issue Fixed
That error usually means you're trying to run this amd64 image on a non-amd64 host (such as 32-bit or ARM).
TRY BUILDING by using buildx and specifying --platform linux/amd64
Sample Command
docker buildx build -t ranjithkumarmv/node-12.13.0-awscli . --platform linux/amd64
If you're getting this in AWS ECS, you probably built the image with an Apple M1 Pro chip. In your Dockerfile, you can add the following:
FROM --platform=linux/amd64 <image>:<tag>.
If you're using a sub-image, ex: FROM <parent_image_you_created>:<tag> you'll want to make sure the <parent_image_you_created>:<tag> was built with FROM --platform=linux/amd64 <image>:<tag>.
Another possible reason for this could be if the file is saved with Windows line endings (CRLF). Save it with Unix line endings (LF) and the file will be found.
Extending to the accepted answer:
For an alpine (without bash) image:
#!/bin/ash
at the top of the sh file, solves the problem.
I'm currently rocking an M1 Mac, I was also running into this issue a little earlier. I actually realized a Fargate task that I had deployed as part of a stack, had not been running for over a month because I had deployed it on my M1 Mac laptop. The deploy script was working fine on my old Intel-based Mac.
The error I was just noticing in the CW logs (again the task had been failing for over a month) was just exactly as follows:
standard_init_linux.go:228: exec user process caused: exec format error
To fix it, in all the docker build steps -- since I had one for building a lambda layer, and another for building the Fargate task -- I just updated to add the --platform=linux/amd64. Please note, it did not work for me if I for added the tag after the -t argument, for example like img-test:0.1-amd64. I wonder if perhaps that could be because I was referencing the :latest tag later in my script.
In any case, these were the changes necessitated. You'll notice, all I really did was add the --platform argument; everything else stayed the same.
ecr_repository="some/app"
docker build -t ${ecr_repository} \
--platform=linux/amd64 \
--build-arg SOME_ARG='value' \
.
And I'm not sure if it's technically required, but just to be on the safe side, I also updated all the FROM statements in my Dockerfiles:
FROM --platform=linux/amd64 ubuntu:<version>
For those trying to build images for aarch64 or armv7l architectures on amd64 Linux system and running into the same error: check if qemu-user-static package is installed. If not, install it with sudo apt install qemu-user-static on Ubuntu/Debian/Mint etc, or with sudo dnf install qemu-user-static on Fedora
None of this worked for me. Why is no one mentioning the BOM?
This error occurs if you have a Byte Order Mark at the start of your file.
You can check with:
head -n 1 yourscript | LC_ALL=C od -tc
In Notepad++ you can save text in UTF-8 without the BOM by selecting the appropriate option in the Encoding menu:
Encoding > UTF-8
I have faced same issue in RHEL 7.3, docker 17.05-ce when running offline loaded image. It appeared the default storage driver of RHEL/CentOS changed from device-mapper to overlay. Reverting back the driver to devicemapper fixed the problem.
dockerd --storage-driver=devicemapper
or
/etc/docker/daemon.json
{
"storage-driver": "devicemapper"
}
One more possibility is that #!/bin/bash is not in the very first line. There must be really nothing before it (no empty lines, nothing).
Not a direct answer to the question asked. Although I got the error while calling "docker-compose up" to bring my nodejs application up. Realized that in my "Dockerfile" i had CMD ["./server.js"].
To fix i replaced it with CMD ["npm","start"] and that solved the issue. Hope if someone lands here for this exception may find this useful.
standard_init_linux.go:228: exec user process caused: exec format error
For me, it's because there is no main package in my go project. Hope it helps someone.
If you have buildx installed and still run into this error, it might be because cross-platform emulators are missing.
These can be installed like this:
docker run --privileged --rm tonistiigi/binfmt --install all
Source: https://docs.docker.com/build/building/multi-platform/#building-multi-platform-images
Happened to me when trying to build a linux/arm64 image via Gitlab pipeline running on a Gitlab runner with platform linux/amd64.
In my case, I "drained" my ECS instanced and "activated" them back again and thereafter the error vanished.
If you are using an IBR1700 router which runs containers, you may get a similar error when in the router command line after using command container logs test (where test is the name of the container).
To fix this you need to build the application so it runs on a different platform. It uses linux/arm/v7.
docker run -it --rm --privileged docker/binfmt:a7996909642ee92942dcd6cff44b9b95f08dad64
docker buildx create --name mybuilder
docker buildx use mybuilder
docker buildx build --platform linux/arm/v7 --no-cache -t <username/repository>:<tag> . --push
Pushing to the repository with this build means it can run on the router.
https://github.com/cradlepoint/container-samples
For me, my ECS cluster was arm64 architecture, but my docker image was showing amd64 architecture. I rebuilt my docker image: https://docs.docker.com/desktop/multi-arch/
I had similar problem standard_init_linux.go:228: exec user process caused: exec format error, but nothing helped from the answers. Eventually i found that it was old docker version 17.09.0-ce which is also default on the Circle CI, so just after changing it to the most recent one problem was solved.

A more elegant docker run command

I have built a docker image using a Dockerfile that does the following:
FROM my-base-python-image
WORKDIR /opt/data/projects/project/
RUN mkdir files
RUN rm -rf /etc/yum.repos.d/*.repo
COPY rss-centos-7-config.repo /etc/yum.repos.d/
COPY files/ files/
RUN python -m venv /opt/venv && . /opt/venv/activate
RUN yum install -y unzip
WORKDIR files/
RUN unzip file.zip && rm -rf file.zip && . /opt/venv/bin/activate && python -m pip install *
WORKDIR /opt/data/projects/project/
That builds an image that allows me to run a custom command. In a terminal, for instance, here is the commmand I run after activating my project venv:
python -m pathA.ModuleB -a inputfile_a.json -b inputfile_b.json -c
Arguments a & b are custom tags to identify input files. -c calls a block of code.
So to run the built image successfully, I run the container and map local files to input files:
docker run --rm -it -v /local/inputfile_a.json:/opt/data/projects/project/inputfile_a.json -v /local/inputfile_b.json:/opt/data/projects/project/inputfile_b.json image-name:latest bash -c 'source /opt/venv/bin/activate && python -m pathA.ModuleB -a inputfile_a.json -b inputfile_b.json -c'
Besides shortening file paths, is there anythin I can do to shorten the docker run command? I'm thinking that adding a CMD and/or ENTRYPOINT to the Dockerfile would help, but I cannot figure out how to do it as I get errors.
There are a couple of things you can do to improve this.
The simplest is to run the application outside of Docker. You mention that you have a working Python virtual environment. A design goal of Docker is that programs in containers can't generally access files on the host, so if your application is all about reading and writing host files, Docker may not be a good fit.
Your file paths inside the container are fairly long, and this is bloating your -v mount options. You don't need an /opt/data/projects/project prefix; it's very typical just to use short paths like /app or /data.
You're also installing your application into a Python virtual environment, but inside a Docker image, which provides its own isolation. As you're seeing in your docker run command and elsewhere, the mechanics of activating a virtual environment in Docker are a little hairy. It's also not necessary; just skip the virtual environment setup altogether. (You can also directly run /opt/venv/bin/python and it knows it "belongs to" a virtual environment, without explicitly activating it.)
Finally, in your setup.py file, you can use a setuptools entry_points declaration to provide a script that runs your named module.
That can reduce your Dockerfile to more or less
FROM my-base-python-image
# OS-level setup
RUN rm -rf /etc/yum.repos.d/*.repo
COPY rss-centos-7-config.repo /etc/yum.repos.d/
RUN yum install -y unzip
# Copy the application in
WORKDIR /app/files
COPY files/ ./
RUN unzip file.zip \
&& rm file.zip \
&& pip install *
# Typical runtime metadata
WORKDIR /app
CMD main-script --help
And then when you run it, you can:
docker run --rm -it \
-v /local:/data \ # just map the entire directory
image-name:latest \
main-script -a /data/inputfile_a.json -b /data/inputfile_b.json -c
You can also consider the docker run -w /data option to change the current directory, which would add a Docker-level argument but slightly shorten the script command line.

Docker python output csv file

I have a script python which should output a file csv. I'm trying to have this file in the current working directory but without success.
This is my Dockerfile
FROM python:3.6.4
RUN apt-get update && apt-get install -y libaio1 wget unzip
WORKDIR /opt/oracle
RUN wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-
basiclite-linuxx64.zip && \ unzip instantclient-basiclite-linuxx64.zip && rm
-f instantclient-basiclite-linuxx64.zip && \ cd /opt/oracle/instantclient*
&& rm -f jdbc occi mysql *README jar uidrvci genezi adrci && \ echo
/opt/oracle/instantclient > /etc/ld.so.conf.d/oracle-instantclient.conf &&
ldconfig
RUN pip install --upgrade pip
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install pystan
RUN apt-get -y update && python3 -m pip install cx_Oracle --upgrade
RUN pip install -r requirements.txt
CMD [ "python", "Main.py" ]
And run the container with the following command
docker container run -v $pwd:/home/learn/rstudio_script/output image
This is bad practice to bind a volume just to have 1 file on your container be saved onto your host.
Instead, what you should leverage is the copy command:
docker cp <containerId>:/file/path/within/container /host/path/target
You can set this command to auto execute with bash, after your docker run.
So something like:
#!/bin/bash
# this stores the container id
CONTAINER_ID=$(docker run -dit img)
docker cp $CONTAINER_ID:/some_path host_path
If you are adamant on using a bind volume, then as the others have pointed out, the issue is most likely your python script isn't outputting the csv to the correct path.
Your script Main.py is probably not trying to write to /home/learn/rstudio_script/output. The working directory in the container is /app because of the last WORKDIR directive in the Dockerfile. You can override that at runtime with --workdir but then the CMD would have to be changed as well.
One solution is to have your script write files to /output/ and then run it like this:
docker container run -v $PWD:/output/ image

How to Have NaCL at AWS Lambda Properly Working?

I am using Pychromeless repo with success at AWS lambda.
Now I need to use NaCL dependency, to decrypt a string, but I am getting
Unable to import module 'lambda_function': /var/task/lib/nacl/_sodium.abi3.so
with a complement of
invalid ELF header
 
when running the function on AWS Lambda.
I know it is a problem specific related to AWS Lambda environment, because I can run the function on my Mac inside docker.
Here's my requirements.txt file
boto3==1.6.18
botocore==1.9.18
selenium==2.53.6
chromedriver-install==1.0.3
beautifulsoup4==4.6.1
certifi==2018.11.29
chardet==3.0.4
editdistance==0.5.3
future==0.17.1
idna==2.7
python-telegram-bot==10.1.0
requests==2.19.1
soupsieve==1.7.3
urllib3==1.23
PyNaCl==1.3.0
Here is the dockerfile
FROM lambci/lambda:python3.6
MAINTAINER tech#21buttons.com
USER root
ENV APP_DIR /var/task
WORKDIR $APP_DIR
COPY requirements.txt .
COPY bin ./bin
COPY lib ./lib
RUN mkdir -p $APP_DIR/lib
RUN pip3 install -r requirements.txt -t /var/task/lib
And the makefile:
clean:
rm -rf build build.zip
rm -rf __pycache__
fetch-dependencies:
mkdir -p bin/
# Get chromedriver
curl -SL https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip > chromedriver.zip
unzip chromedriver.zip -d bin/
# Get Headless-chrome
curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-37/stable-headless-chromium-amazonlinux-2017-03.zip > headless-chromium.zip
unzip headless-chromium.zip -d bin/
# Clean
rm headless-chromium.zip chromedriver.zip
docker-build:
docker-compose build
docker-run:
docker-compose run lambda src/lambda_function.lambda_handler
build-lambda-package: clean fetch-dependencies
mkdir build
cp -r src build/.
cp -r bin build/.
cp -r lib build/.
pip install -r requirements.txt -t build/lib/.
cd build; zip -9qr build.zip .
cp build/build.zip .
rm -rf build
Without the decryption part, the code works great. So the issue is 100% related to PyNaCl.
Any help on solving this?
I think you may try to setup PyNaCl like so:
SODIUM_INSTALL=system pip3 install pynacl
that will force PyNaCl to use the version of libsodium provided by AWS
see this
in the last version of PyNaClis updated to libsodium 1.0.16. so maybe it is not compatible with AWS
so you may remove PyNaCl from requirements.txt and add this to your Dockerfile:
RUN SODIUM_INSTALL=system pip3 install pynacl -t /var/task/lib
or maybe setup the dockerfile like this and keep PyNaCl in requirements.txt:
ARG SODIUM_INSTALL=system
Try also to setup sodium before installing PyNaCI:
RUN wget https://download.libsodium.org/libsodium/releases/libsodium-1.0.15.tar.gz \
&& tar xzf libsodium-1.0.15.tar.gz \
&& cd libsodium-1.0.15 \
&& ./configure \
&& make install
Ok, this is how I did it. I had to build everything on an EC2 AMI Linux 2 instance.
amzn2-ami-hvm-2.0.20190823.1-x86_64-gp2 (ami-0a1f49a762473adbd)
After launching the instance, I use this script to install Python 3.6 (and pip) and to create and activate a virtual environment.
For the docker part, I followed this tutorial, not without some troubles on the way (had to
sudo yum install polkit
and
sudo usermod -a -G docker ec2-user
and
reboot
and
sudo curl -L https://github.com/docker/compose/releases/download/1.22.0/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
and
sudo chmod +x /usr/local/bin/docker-compose).
But anyway, I manage to work with docker on the EC2 instance, building the zip file and uploading it to Lambda environment, where everything worked fine, as I expected.
I thought docker was an environment independent from the host, but I guess that is not the case.

Categories

Resources