I am using the Pychromeless repo with success on AWS Lambda.
Now I need to use the NaCl dependency to decrypt a string, but I am getting
Unable to import module 'lambda_function': /var/task/lib/nacl/_sodium.abi3.so
followed by
invalid ELF header
when running the function on AWS Lambda.
I know it is a problem specifically related to the AWS Lambda environment, because I can run the function on my Mac inside Docker.
Here's my requirements.txt file:
boto3==1.6.18
botocore==1.9.18
selenium==2.53.6
chromedriver-install==1.0.3
beautifulsoup4==4.6.1
certifi==2018.11.29
chardet==3.0.4
editdistance==0.5.3
future==0.17.1
idna==2.7
python-telegram-bot==10.1.0
requests==2.19.1
soupsieve==1.7.3
urllib3==1.23
PyNaCl==1.3.0
Here is the Dockerfile:
FROM lambci/lambda:python3.6
MAINTAINER tech@21buttons.com
USER root
ENV APP_DIR /var/task
WORKDIR $APP_DIR
COPY requirements.txt .
COPY bin ./bin
COPY lib ./lib
RUN mkdir -p $APP_DIR/lib
RUN pip3 install -r requirements.txt -t /var/task/lib
And the Makefile:
clean:
	rm -rf build build.zip
	rm -rf __pycache__

fetch-dependencies:
	mkdir -p bin/
	# Get chromedriver
	curl -SL https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip > chromedriver.zip
	unzip chromedriver.zip -d bin/
	# Get Headless-chrome
	curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-37/stable-headless-chromium-amazonlinux-2017-03.zip > headless-chromium.zip
	unzip headless-chromium.zip -d bin/
	# Clean
	rm headless-chromium.zip chromedriver.zip

docker-build:
	docker-compose build

docker-run:
	docker-compose run lambda src/lambda_function.lambda_handler

build-lambda-package: clean fetch-dependencies
	mkdir build
	cp -r src build/.
	cp -r bin build/.
	cp -r lib build/.
	pip install -r requirements.txt -t build/lib/.
	cd build; zip -9qr build.zip .
	cp build/build.zip .
	rm -rf build
Without the decryption part, the code works great. So the issue is 100% related to PyNaCl.
Any help on solving this?
I think you may try to set up PyNaCl like so:
SODIUM_INSTALL=system pip3 install pynacl
That will force PyNaCl to use the version of libsodium provided by AWS; see this.
The latest version of PyNaCl is updated to libsodium 1.0.16, so maybe it is not compatible with AWS.
So you may remove PyNaCl from requirements.txt and add this to your Dockerfile:
RUN SODIUM_INSTALL=system pip3 install pynacl -t /var/task/lib
Or maybe set up the Dockerfile like this and keep PyNaCl in requirements.txt:
ARG SODIUM_INSTALL=system
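For instance, a minimal sketch (an ARG declared in a stage is exposed as an environment variable to the RUN steps that follow it, which is what PyNaCl's build checks):
ARG SODIUM_INSTALL=system
# pip builds PyNaCl here and sees SODIUM_INSTALL=system, so it links against the system libsodium
RUN pip3 install -r requirements.txt -t /var/task/lib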
Try also to set up libsodium before installing PyNaCl:
RUN wget https://download.libsodium.org/libsodium/releases/libsodium-1.0.15.tar.gz \
&& tar xzf libsodium-1.0.15.tar.gz \
&& cd libsodium-1.0.15 \
&& ./configure \
&& make install
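Once the import error is gone, a quick smoke test of the decrypt path confirms the native module actually loads. A minimal sketch using PyNaCl's SecretBox (illustrative only; the question doesn't say which NaCl primitive is used):
import nacl.secret
import nacl.utils

# round-trip a message through encrypt/decrypt with a throwaway key
key = nacl.utils.random(nacl.secret.SecretBox.KEY_SIZE)
box = nacl.secret.SecretBox(key)
encrypted = box.encrypt(b"hello lambda")
assert box.decrypt(encrypted) == b"hello lambda"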
OK, this is how I did it. I had to build everything on an EC2 Amazon Linux 2 instance.
amzn2-ami-hvm-2.0.20190823.1-x86_64-gp2 (ami-0a1f49a762473adbd)
After launching the instance, I used this script to install Python 3.6 (and pip) and to create and activate a virtual environment.
For the Docker part, I followed this tutorial, not without some troubles on the way (I had to
sudo yum install polkit
and
sudo usermod -a -G docker ec2-user
and
reboot
and
sudo curl -L https://github.com/docker/compose/releases/download/1.22.0/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
and
sudo chmod +x /usr/local/bin/docker-compose).
But anyway, I managed to get Docker working on the EC2 instance, build the zip file and upload it to the Lambda environment, where everything worked fine, as I expected.
I thought Docker was an environment independent from the host, but I guess that is not the case.
I'm trying to implement APM Serverless for AWS Lambda in a Python function. The function is deployed via a container image, so the extension is built as a layer in the Dockerfile.
First, in case someone is trying to auto-instrument this process: the extension must be unzipped into /opt/extensions/, not to /opt/ as suggested in the docs. Otherwise, the Lambda won't see the extension.
This is the Dockerfile:
FROM amazon/aws-cli:2.2.4 AS downloader
ARG version_number=10
ARG region=
ENV AWS_REGION=${region}
ENV VERSION_NUMBER=${version_number}
RUN yum install -y jq curl
WORKDIR /aws
RUN aws lambda get-layer-version-by-arn --arn arn:aws:lambda:$AWS_REGION:716333212585:layer:appdynamics-lambda-extension:$VERSION_NUMBER | jq -r '.Content.Location' | xargs curl -o extension.zip
# set base image (host OS)
FROM python:3.6.8-slim
ENV APPDYNAMICS_PYTHON_AUTOINSTRUMENT=true
# set the working directory for AppD
WORKDIR /opt
RUN apt-get clean \
&& apt-get -y update \
&& apt-get -y install python3-dev \
python3-psycopg2 \
&& apt-get -y install build-essential
COPY --from=downloader /aws/extension.zip .
RUN apt-get install -y unzip && unzip extension.zip -d /opt/extensions/ && rm -f extension.zip
# set the working directory in the container
WORKDIR /code
RUN pip install --upgrade pip \
&& pip install awslambdaric && pip install appdynamics-lambda-tracer
# copy the dependencies file to the working directory
COPY requirements.txt .
# install dependencies
RUN pip install -r requirements.txt
# copy the content of the local src directory to the working directory
COPY /src .
RUN chmod 644 $(find . -type f) \
&& chmod 755 $(find . -type d)
# command to run on container start
ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
CMD [ "app.lambda_handler" ]
However, when executing the function, I get the following error:
AWS_Execution_Env is not supported, running lambda with default arguments.
EXTENSION Name: appdynamics-extension-script State: Started Events: []
Start RequestId <requestid>
End RequestId <requestid>
Error: exit code 0 Extension.Crash
Without more information.
When I try to implement the tracer manually, by installing appdynamics-lambda-tracer with pip and importing the module, I do see the AppDynamics logs in CloudWatch, but they don't report to the controller.
Any idea what could be causing said crash?
I'm getting the following error:
This CDK CLI is not compatible with the CDK library used by your application. Please upgrade the CLI to the latest version.
(Cloud assembly schema version mismatch: Maximum schema version supported is 8.0.0, but found 9.0.0)
after issuing the cdk diff command.
I ran npm install -g aws-cdk@latest, after which I successfully installed new versions of the Python packages (Successfully installed aws-cdk.assets-1.92.0 aws-cdk.aws-apigateway-1.92.0 aws-cdk.aws-apigatewayv2-1.92.0 ... etc.) with pip install -r requirements.txt.
However, after typing cdk --version I'm still getting 1.85.0 (build 5f44668).
The relevant part of my setup.py is as follows:
install_requires=[
    "aws-cdk.core==1.92.0",
    "aws-cdk.aws-ec2==1.92.0",
    "aws-cdk.aws_ecs==1.92.0",
    "aws-cdk.aws_elasticloadbalancingv2==1.92.0"
],
And I'm stuck now, as downgrading the packages in setup.py to 1.85.0 throws ImportError: cannot import name 'CapacityProviderStrategy' from 'aws_cdk.aws_ecs'.
Help :), I would like to use the newest package versions.
I encountered this issue with a TypeScript package, after upgrading the CDK in package.json. As Maciej noted, upgrading did not seem to work. I am installing the CDK CLI with npm, and an uninstall followed by an install fixed the issue.
npm -g uninstall aws-cdk
npm -g install aws-cdk
So I've fixed it, but it's too chaotic to describe the steps.
It seems like there were problems with the symlink
/usr/local/bin/cdk
which was pointing to version 1.85.0 and not the 1.92.0 I had updated to.
I removed aws-cdk from node_modules and installed it again, then removed the symlink /usr/local/bin/cdk and recreated it manually with
ln -s /usr/lib/node_modules/aws-cdk/bin/cdk /usr/local/bin/cdk
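As a quick sanity check (optional), you can verify where the shim now resolves and which version it reports:
ls -l /usr/local/bin/cdk
cdk --version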
Nothing helped for macOS except this command:
yarn global upgrade aws-cdk@latest
For those coming here who are not using a global install (using cdk from node_modules) in a mono-repo: this issue is due to the aws-cdk package in the devDependencies not matching the aws-cdk-lib version used by the packages.
I was using "aws-cdk": "2.18.0" in my root package.json, but all my packages were using "aws-cdk-lib": "2.32.1" as their dependency. Updating the root package.json to "aws-cdk": "2.31.1" solved the issue.
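A sketch of the aligned layout (version numbers illustrative; the point is that the aws-cdk CLI in the root is at least as new as the aws-cdk-lib the packages use):
// root package.json (the CLI used to run cdk commands)
{
  "devDependencies": {
    "aws-cdk": "2.32.1"
  }
}
// a package's package.json (the construct library)
{
  "dependencies": {
    "aws-cdk-lib": "2.32.1"
  }
}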
I have been experiencing this a few times as well, so I am just dropping this solution, which helps me resolve version mismatches, particularly on existing projects.
For Python:
What you need to do is modify the setup.py to specify the latest version.
Either implicitly:
install_requires=[
    "aws-cdk.core",
    "aws-cdk.aws-ec2",
    "aws-cdk.aws_ecs",
    "aws-cdk.aws_elasticloadbalancingv2"
],
or explicitly:
install_requires=[
    "aws-cdk.core==1.xx.x",
    "aws-cdk.aws-ec2==1.xx.x",
    "aws-cdk.aws_ecs==1.xx.x",
    "aws-cdk.aws_elasticloadbalancingv2==1.xx.x"
],
Then from the project root, run:
python setup.py install
For TypeScript:
Modify package.json:
"dependencies": {
    "@aws-cdk/core": "latest",
    "source-map-support": "^0.5.16"
}
Then run from project root:
npm install
I hope this helps! Please let me know if I need to elaborate or provide more details.
Uninstall the CDK version:
npm uninstall -g aws-cdk
Install the specific version which your application is using, for example CDK 1.158.0:
npm install -g aws-cdk@1.158.0
If you've made it down here, there are 2 options to overcome this issue.
Provision a Cloud9 env and run the code there.
Run the app in a container.
Move the CDK app into a folder so that you have a parent folder in which to put a Dockerfile, a Makefile and a docker-compose file.
Fire up Docker Desktop.
cd into the root folder from the terminal and then run make.
Dockerfile contents
FROM ubuntu:20.04 as compiler
WORKDIR /app/
RUN apt-get update -y \
&& apt install python3 -y \
&& apt install python3-pip -y \
&& apt install python3-venv -y \
&& python3 -m venv venv
ARG NODE_VERSION=16
RUN ls
RUN apt-get update
RUN apt-get install xz-utils
RUN apt-get -y install curl
RUN apt-get update -y && \
apt-get upgrade -y && \
apt-get install -y && \
curl -sL https://deb.nodesource.com/setup_$NODE_VERSION.x | bash - && \
apt install -y nodejs
RUN apt-get install -y python-is-python3
RUN npm install -g aws-cdk
RUN python -m venv /opt/venv
docker-compose.yaml contents
version: '3.6'
services:
  cdk-base:
    build: .
    image: cdk-base
    command: ${COMPOSE_COMMAND:-bash}
    volumes:
      - .:/app
      - /var/run/docker.sock:/var/run/docker.sock # Needed so a docker container can be run from inside a docker container
      - ~/.aws/:/root/.aws:ro
Makefile content
SHELL=/bin/bash
CDK_DIR=cdk_app/
COMPOSE_RUN = docker-compose run --rm cdk-base
COMPOSE_UP = docker-compose up
PROFILE = --profile default

all: pre-reqs synth

pre-reqs: _prep-cache container-build npm-install

_prep-cache: # This resolves Error: EACCES: permission denied, open 'cdk.out/tree.json'
	mkdir -p cdk_websocket/cdk.out/

container-build: pre-reqs
	docker-compose build

container-info:
	${COMPOSE_RUN} make _container-info

_container-info:
	./containerInfo.sh

clear-cache:
	${COMPOSE_RUN} rm -rf ${CDK_DIR}cdk.out && rm -rf ${CDK_DIR}node_modules

cli: _prep-cache
	docker-compose run cdk-base /bin/bash

npm-install: _prep-cache
	${COMPOSE_RUN} make _npm-install

_npm-install:
	cd ${CDK_DIR} && ls && python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt && npm -v && python --version

npm-update: _prep-cache
	${COMPOSE_RUN} make _npm-update

_npm-update:
	cd ${CDK_DIR} && npm update

synth: _prep-cache
	${COMPOSE_RUN} make _synth

_synth:
	cd ${CDK_DIR} && source .venv/bin/activate && pip install -r requirements.txt && cdk synth --no-staging ${PROFILE} && cdk deploy --require-approval never ${PROFILE}

bootstrap: _prep-cache
	${COMPOSE_RUN} make _bootstrap

_bootstrap:
	cd ${CDK_DIR} && source .venv/bin/activate && pip install -r requirements.txt && cdk bootstrap ${PROFILE}

deploy: _prep-cache
	${COMPOSE_RUN} make _deploy

_deploy:
	cd ${CDK_DIR} && cdk deploy --require-approval never ${PROFILE}

destroy:
	${COMPOSE_RUN} make _destroy

_destroy:
	cd ${CDK_DIR} && source .venv/bin/activate && pip install -r requirements.txt && cdk destroy --force ${PROFILE}

diff: _prep-cache
	${COMPOSE_RUN} make _diff

_diff: _prep-cache
	cd ${CDK_DIR} && cdk diff ${PROFILE}

test:
	${COMPOSE_RUN} make _test

_test:
	cd ${CDK_DIR} && npm test
This worked for me in my dev environment: since I had re-cloned the source, I needed to re-run the npm install command. A sample package.json might look like this (update the versions as needed):
{
  "dependencies": {
    "aws-cdk": "2.27.0",
    "node": "^16.14.0"
  }
}
I changed the typescript version ("typescript": "^4.7.3") and it worked.
Note: CDK version 2 ("aws-cdk-lib": "^2.55.0").
I have installed a library called fastai==1.0.59 via the requirements.txt file in my Dockerfile.
But the Django app fails to run because of one error. To solve that error, I need to manually edit the files /site-packages/fastai/torch_core.py and site-packages/fastai/basic_train.py inside this library folder, which I don't intend to do.
Therefore I'm trying to copy the fastai folder itself from my host machine to the location inside docker image.
source location: /Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/fastai/
destination location: ../venv/lib/python3.6/site-packages/ which is inside my docker image.
Being new to Docker, I tried this using the COPY command:
COPY /Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/fastai/ ../venv/lib/python3.6/site-packages/
which gave me an error:
ERROR: Service 'app' failed to build: COPY failed: stat /var/lib/docker/tmp/docker-builder583041406/Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/fastai: no such file or directory.
I tried referring to this: How to include files outside of Docker's build context? but it seems like it bounced off my head a bit.
Please help me tackle this. Thanks.
Dockerfile:
FROM python:3.6-slim-buster AS build
MAINTAINER model1
ENV PYTHONUNBUFFERED 1
RUN python3 -m venv /venv
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y git && \
apt-get install -y build-essential && \
apt-get install -y awscli && \
apt-get install -y unzip && \
apt-get install -y nano && \
apt-get install -y libsm6 libxext6 libxrender-dev
RUN apt-cache search mysql-server
RUN apt-cache search libmysqlclient-dev
RUN apt-get install -y libpq-dev
RUN apt-get install -y postgresql
RUN apt-cache search postgresql-server-dev-9.5
RUN apt-get install -y libglib2.0-0
RUN mkdir -p /model/
COPY . /model/
WORKDIR /model/
RUN pip install --upgrade awscli==1.14.5 s3cmd==2.0.1 python-magic
RUN pip install -r ./requirements.txt
EXPOSE 8001
RUN chmod -R 777 /model/
COPY /Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/fastai/ ../venv/lib/python3.6/site-packages/
CMD python3 -m /venv/activate
CMD /model/my_setup.sh development
CMD export API_ENV = development
CMD cd server && \
python manage.py migrate && \
python manage.py runserver 0.0.0.0:8001
Short Answer
No
Long Answer
When you run docker build the current directory and all of its contents (subdirectories and all) are copied into a staging area called the 'build context'. When you issue a COPY instruction in the Dockerfile, docker will copy from the staging area into a layer in the image's filesystem.
As you can see, this precludes copying files from directories outside the build context.
Workaround
Either download the files you want from their golden source directly into the image during the build process (this is why you often see a lot of curl statements in Dockerfiles), or copy the files (dirs) you need into the build tree and check them into source control as part of your project. Which method you choose is entirely dependent on the nature of your project and the files you need.
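For example, a sketch of the second approach (the source path comes from the question; the vendor/ directory name is just an illustration):
# copy the package into the build tree, next to the Dockerfile
mkdir -p vendor
cp -r /Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/fastai vendor/fastai
# then in the Dockerfile, COPY from inside the build context:
# COPY vendor/fastai /venv/lib/python3.6/site-packages/fastai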
Notes
There are other workarounds documented for this, but all of them, without exception, break the intent of 'portability' of your build. The only quality solutions are those documented here (though I'm happy to add to this list if I've missed any that preserve portability).
I have a Python script which should output a CSV file. I'm trying to have this file in the current working directory, but without success.
This is my Dockerfile
FROM python:3.6.4
RUN apt-get update && apt-get install -y libaio1 wget unzip
WORKDIR /opt/oracle
RUN wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip && \
    unzip instantclient-basiclite-linuxx64.zip && \
    rm -f instantclient-basiclite-linuxx64.zip && \
    cd /opt/oracle/instantclient* && \
    rm -f jdbc occi mysql *README jar uidrvci genezi adrci && \
    echo /opt/oracle/instantclient > /etc/ld.so.conf.d/oracle-instantclient.conf && \
    ldconfig
RUN pip install --upgrade pip
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install pystan
RUN apt-get -y update && python3 -m pip install cx_Oracle --upgrade
RUN pip install -r requirements.txt
CMD [ "python", "Main.py" ]
And I run the container with the following command:
docker container run -v $pwd:/home/learn/rstudio_script/output image
It is bad practice to bind a volume just to have one file from your container saved onto your host.
Instead, what you should leverage is the copy command:
docker cp <containerId>:/file/path/within/container /host/path/target
You can set this command to auto-execute with bash, after your docker run.
So something like:
#!/bin/bash
# this stores the container id
CONTAINER_ID=$(docker run -dit img)
docker cp $CONTAINER_ID:/some_path host_path
If you are adamant on using a bind volume, then as the others have pointed out, the issue is most likely that your Python script isn't outputting the CSV to the correct path.
Your script Main.py is probably not trying to write to /home/learn/rstudio_script/output. The working directory in the container is /app because of the last WORKDIR directive in the Dockerfile. You can override that at runtime with --workdir but then the CMD would have to be changed as well.
One solution is to have your script write files to /output/ and then run it like this:
docker container run -v $PWD:/output/ image
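For example, a minimal sketch of what Main.py would do under that convention (the filename result.csv is illustrative):
import csv

# write into /output/, which the bind mount maps back to $PWD on the host
with open("/output/result.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "value"])
    writer.writerow([1, "example"])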
I have to run pdf2image on my Python Lambda Function in AWS, but it requires poppler and poppler-utils to be installed on the machine.
I have tried to search in many different places for how to do that, but could not find anything or anyone that has done it using Lambda functions.
Would any of you know how to generate poppler binaries, put it on my Lambda package and tell Lambda to use that?
Thank you all.
AWS Lambda runs under an execution environment which includes software and libraries. If anything you need is not there, you need to install it to create your execution environment. Check the link below for more info:
https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html
For poppler, follow these steps to create your own binaries:
https://github.com/skylander86/lambda-text-extractor/blob/master/BuildingBinaries.md
My approach was to use the AWS Linux 2 image as a base to ensure maximum compatibility with the Lambda environment, compile openjpeg and poppler in the container build, and build a zip containing the binaries and libraries needed, which can then be used as a layer.
This enables you to write your code in its own Lambda which pulls in the poppler dependencies as a layer, simplifying build and deployment.
The contents of the layer will be unpacked into /opt/. This means the contents will automatically be available, because by default in the Lambda environment:
$PATH is /usr/local/bin:/usr/bin/:/bin:/opt/bin
$LD_LIBRARY_PATH is /lib64:/usr/lib64:$LAMBDA_RUNTIME_DIR:$LAMBDA_RUNTIME_DIR/lib:$LAMBDA_TASK_ROOT:$LAMBDA_TASK_ROOT/lib:/opt/lib
Dockerfile :
# https://www.petewilcock.com/using-poppler-pdftotext-and-other-custom-binaries-on-aws-lambda/
ARG POPPLER_VERSION="21.10.0"
ARG POPPLER_DATA_VERSION="0.4.11"
ARG OPENJPEG_VERSION="2.4.0"
FROM amazonlinux:2
ARG POPPLER_VERSION
ARG POPPLER_DATA_VERSION
ARG OPENJPEG_VERSION
WORKDIR /root
RUN yum update -y
RUN yum install -y \
cmake \
cmake3 \
fontconfig-devel \
gcc \
gcc-c++ \
gzip \
libjpeg-devel \
libpng-devel \
libtiff-devel \
make \
tar \
xz \
zip
RUN curl -o poppler.tar.xz https://poppler.freedesktop.org/poppler-${POPPLER_VERSION}.tar.xz
RUN tar xf poppler.tar.xz
RUN curl -o poppler-data.tar.gz https://poppler.freedesktop.org/poppler-data-${POPPLER_DATA_VERSION}.tar.gz
RUN tar xf poppler-data.tar.gz
RUN curl -o openjpeg.tar.gz https://codeload.github.com/uclouvain/openjpeg/tar.gz/refs/tags/v${OPENJPEG_VERSION}
RUN tar xf openjpeg.tar.gz
WORKDIR poppler-data-${POPPLER_DATA_VERSION}
RUN make install
WORKDIR /root
RUN mkdir openjpeg-${OPENJPEG_VERSION}/build
WORKDIR openjpeg-${OPENJPEG_VERSION}/build
RUN cmake .. -DCMAKE_BUILD_TYPE=Release
RUN make
RUN make install
WORKDIR /root
RUN mkdir poppler-${POPPLER_VERSION}/build
WORKDIR poppler-${POPPLER_VERSION}/build
RUN cmake3 .. -DCMAKE_BUILD_TYPE=release -DBUILD_GTK_TESTS=OFF -DBUILD_QT5_TESTS=OFF -DBUILD_QT6_TESTS=OFF \
-DBUILD_CPP_TESTS=OFF -DBUILD_MANUAL_TESTS=OFF -DENABLE_BOOST=OFF -DENABLE_CPP=OFF -DENABLE_GLIB=OFF \
-DENABLE_GOBJECT_INTROSPECTION=OFF -DENABLE_GTK_DOC=OFF -DENABLE_QT5=OFF -DENABLE_QT6=OFF \
-DENABLE_LIBOPENJPEG=openjpeg2 -DENABLE_CMS=none -DBUILD_SHARED_LIBS=OFF
RUN make
RUN make install
WORKDIR /root
RUN mkdir -p package/{lib,bin,share}
RUN cp -d /usr/lib64/libexpat* package/lib
RUN cp -d /usr/lib64/libfontconfig* package/lib
RUN cp -d /usr/lib64/libfreetype* package/lib
RUN cp -d /usr/lib64/libjbig* package/lib
RUN cp -d /usr/lib64/libjpeg* package/lib
RUN cp -d /usr/lib64/libpng* package/lib
RUN cp -d /usr/lib64/libtiff* package/lib
RUN cp -d /usr/lib64/libuuid* package/lib
RUN cp -d /usr/lib64/libz* package/lib
RUN cp -rd /usr/local/lib/* package/lib
RUN cp -rd /usr/local/lib64/* package/lib
RUN cp -d /usr/local/bin/* package/bin
RUN cp -rd /usr/local/share/poppler package/share
WORKDIR package
RUN zip -r9 ../package.zip *
And to run...
docker build -t poppler .
docker run --name poppler -d -t poppler cat
docker cp poppler:/root/package.zip .
Then upload package.zip as a layer using the console or the AWS CLI.
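For example, with the AWS CLI (the layer name is up to you):
aws lambda publish-layer-version --layer-name poppler --zip-file fileb://package.zip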
I used the pre-built AWS Lambda layer https://github.com/jeylabs/aws-lambda-poppler-layer/releases and it worked!
You can use this solution if you just want to run the function, but if you want to pin the version and have more control, I'd recommend the container image solution.
Straightforward Build Instructions for Poppler on Lambda using Docker
In order to put Poppler on Lambda, we will build a zipped folder containing Poppler and add it as a layer. Follow these steps on an EC2 instance running Amazon Linux 2 (a t2.micro is plenty).
Set up the machine
Install Docker on the EC2 machine. Instructions here.
mkdir -p poppler_binaries
Create a Dockerfile
Use this link or copy/paste from below.
FROM ubuntu:18.04
# Installing dependencies
RUN apt update
RUN apt-get update
RUN apt-get install -y locate \
libopenjp2-7 \
poppler-utils
RUN rm -rf /poppler_binaries; mkdir /poppler_binaries;
RUN updatedb
RUN cp $(locate libpoppler.so) /poppler_binaries/.
RUN cp $(which pdftoppm) /poppler_binaries/.
RUN cp $(which pdfinfo) /poppler_binaries/.
RUN cp $(which pdftocairo) /poppler_binaries/.
RUN cp $(locate libjpeg.so.8 ) /poppler_binaries/.
RUN cp $(locate libopenjp2.so.7 ) /poppler_binaries/.
RUN cp $(locate libpng16.so.16 ) /poppler_binaries/.
RUN cp $(locate libz.so.1 ) /poppler_binaries/.
Build the Docker image and create a zip file
Running the commands below will produce a zip file in your home directory.
docker build -t poppler-build .
# Run the container
docker run -d --name poppler-build-cont poppler-build sleep 20
#docker exec poppler-build-cont
sudo docker cp poppler-build-cont:/poppler_binaries .
# Cleaning up
docker kill poppler-build-cont
docker rm poppler-build-cont
docker image rm poppler-build
cd poppler_binaries
zip -r9 ../poppler.zip .
cd ..
Make and add your Lambda Layer
Download your zip file or upload it to S3. Head to the Lambda Console page to create a Layer and then add it to your function. Information about layers here.
Add Environment Variable to Lambda
In order to avoid adding unnecessary folder structure to the zip, as described here, we will add an environment variable to point to our dependency:
PYTHONPATH: /opt/
And voilà! You now have a working Lambda function with Poppler!
Note: Credit to these two articles which helped me piece this together
pdf2image-as-a-service GitHub Repo
Pete Wilcock Website
Warning: do not try to add pdf2image to the same layer. I am not sure why but when they are in the same layer, pdf2image cannot find poppler.
Hi @Alex Albracht, thanks for the compiled easy instructions! They helped a lot. But I really struggled with getting the Lambda function to find the poppler path. So I'll try to add that here, in an effort to make it clear.
The binary files should go in a zip file with the structure:
poppler.zip -> bin/poppler
where the poppler folder contains the binary files. This zip file can then be uploaded as a layer in AWS Lambda.
For pdf2image to work, it needs the poppler path. This should be included in the Lambda function as "/opt/bin/poppler".
For example:
from pdf2image import convert_from_path

poppler_path = "/opt/bin/poppler"
pages = convert_from_path(PDF_file, 500, poppler_path=poppler_path)