How to package Scrapy dependency to lambda? - python

I am writing a Python application that depends on the Scrapy module. It works fine locally but fails when I run it from the AWS Lambda test console. My Python project has a requirements.txt file with the following dependency:
scrapy==1.6.0
I packaged all dependencies by following this link: https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html. I also put my *.py source files at the root level of the zip file. My packaging script can be found at https://github.com/zhaoyi0113/quote-datalake/blob/master/bin/deploy.sh.
It basically does two things: first, it runs pip install -r requirements.txt -t dist to download all dependencies into the dist directory; second, it copies the app's Python source code into the dist directory.
The deployment is done via Terraform; the configuration file is below.
provider "aws" {
  profile = "default"
  region  = "ap-southeast-2"
}
variable "runtime" {
  default = "python3.6"
}
data "archive_file" "zipit" {
  type        = "zip"
  source_dir  = "crawler/dist"
  output_path = "crawler/dist/deploy.zip"
}
resource "aws_lambda_function" "test_lambda" {
  filename         = "crawler/dist/deploy.zip"
  function_name    = "quote-crawler"
  role             = "arn:aws:iam::773592622512:role/LambdaRole"
  handler          = "handler.handler"
  source_code_hash = "${data.archive_file.zipit.output_base64sha256}"
  runtime          = "${var.runtime}"
}
It zips the directory and uploads the file to Lambda.
When the code contains the statement import scrapy, I get this runtime error in Lambda: Unable to import module 'handler': cannot import name 'etree'. I don't use etree in my code, so I believe it is something used by Scrapy.
My source code can be found at https://github.com/zhaoyi0113/quote-datalake/tree/master/crawler. There are only two simple Python files.
It works fine if I run them locally. The error only appears in Lambda. Is there a different way to package Scrapy for Lambda?

Based on the discussion with Tim, the issue is caused by incompatible library versions between the local machine and Lambda.
The easiest way to resolve this is to use the Docker image lambci/lambda to build the package with this command:
$ docker run -v $(pwd):/outputs -it --rm lambci/lambda:build-python3.6 pip install scrapy -t /outputs/
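If you want to run that build from a script, the command can be assembled in Python. This is only a sketch: the helper name lambda_build_command and its parameters are hypothetical, it assumes Docker is installed and the lambci/lambda build image is available, and it omits -it because a script does not need an interactive terminal.

```python
import os


def lambda_build_command(output_dir, packages, tag="build-python3.6"):
    """Return the docker command that pip-installs `packages` into `output_dir`.

    Mirrors the command above; the function name and parameters are
    illustrative and not part of any library.
    """
    host_dir = os.path.abspath(output_dir)
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/outputs",      # mount the host folder as /outputs
        f"lambci/lambda:{tag}",
        "pip", "install", *packages, "-t", "/outputs/",
    ]
```

Passing the returned list to subprocess.run(..., check=True) runs the build, for example with lambda_build_command("dist", ["scrapy==1.6.0"]).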

You need to provide the entire dependency tree: Scrapy has its own set of dependencies (and they may have dependencies of their own).
The easiest way to download all the required dependencies is to use pip:
$ pip install -t packages/ scrapy
This will download Scrapy and all its dependencies into the packages folder.
Scrapy depends on lxml and pyOpenSSL, which include compiled components. Unless they are statically compiled, they will likely need the C libraries they depend on to be installed on the Lambda VM as well.
From the lxml documentation it requires:
libxml2 version 2.9.2 or later.
libxslt version 1.1.27 or later.
We recommend libxslt 1.1.28 or later.
Maybe try adding the installation of these to your deploy script. You should be able to use (I'm guessing at the package names) yum -y install libxml2 libxslt
Another good idea is to test your scripts on an Amazon Linux EC2 instance, as this is close to the environment that Lambda executes in.
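To see which dependency actually fails to load in a given environment (locally, on an EC2 instance, or temporarily inside the Lambda handler), a small probe can help. This is a sketch; probe_imports is a hypothetical helper, not part of Scrapy or Lambda.

```python
import importlib


def probe_imports(module_names):
    """Try to import each module; map its name to None or an error message."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # Missing packages and compiled extensions that cannot be loaded
            # typically surface as ImportError (or its subclasses).
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

Calling probe_imports(["lxml.etree", "OpenSSL", "scrapy"]) and logging the result narrows the problem down to a specific package before you change the build.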

Related

Unable to import module AWS Lambda Function

I'm having issues running basically any Lambda function on AWS, because Lambda is unable to import the module.
I tried to import the packages as layers (probably incorrectly, I think). (https://www.linkedin.com/pulse/add-external-python-libraries-aws-lambda-using-layers-gabe-olokun/)
Then I tried to import the packages as environments (built locally or with a bash script):
local - like this guy: https://www.youtube.com/watch?v=NGteAkN2WYc
or using bash scripting (AWS CloudShell) - https://docs.aws.amazon.com/lambda/latest/dg/python-package.html
Either way, the Lambda environment looks like this:
import psycopg2
def lambda_handler(event, context):
    # Connect to PostgreSQL database
    conn = psycopg2.connect(
        host=['host'],
        database=['postgres'],
        user=['user'],
        password=['password']
    )
And I'm hitting the following errors:
- if lambda_function.py is inside the "psycopg2+py" directory: "errorMessage": "Unable to import module 'lambda_function': No module named 'lambda_function'"
- if lambda_function.py is outside the "psycopg2+py" directory and directly inside the "postgresql" directory: "errorMessage": "Unable to import module 'lambda_function': No module named 'psycopg2'"
And I suppose the handler is set correctly.
I should also mention that when I set up the environment to install the packages, I was using Python 3.9, the same version that I'm using for the Lambda function.
I've also tried the same methods with another package, fastapi, and it still doesn't work, so it seems to be a general problem rather than a package-specific one.
I have no idea what else to try.
Here is how you can go about adding dependencies for a lambda function:
Install required python packages (in this case psycopg2):
# mkdir workspace; cd workspace
# pip3.9 install pip --upgrade
# pip3.9 install --platform manylinux2014_x86_64 \
--target=./python/lib/python3.9/site-packages --implementation cp \
--python 3.9 --only-binary=:all: --upgrade psycopg2
Rename _psycopg.xxxxxx.so file:
mv ./python/lib/python3.9/site-packages/psycopg2/_psycopg*.so ./python/lib/python3.9/site-packages/psycopg2/_psycopg.so
Zip the created python directory:
# zip -r requirements.zip python/
Create a lambda layer from the requirements.zip zip file (you can also use the portal instead of awscli):
# aws lambda publish-layer-version --layer-name dependencies \
--description "Python packages" --license-info "MIT" \
--zip-file fileb://requirements.zip --compatible-runtimes python3.9 \
--compatible-architectures "x86_64"
Add the published layer version to your lambda function (or use the portal):
# aws lambda update-function-configuration --function-name Your_Lambda_Func_Name --layers arn:aws:lambda:us-east-1:xxxxxxxx:layer:dependencies:1
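Before publishing, it can be worth confirming that the archive has the layout Lambda expects for Python layers, with every entry under the top-level python/ directory. A minimal sketch using the standard zipfile module; the helper name misplaced_layer_entries is hypothetical.

```python
import zipfile


def misplaced_layer_entries(zip_path):
    """Return archive entries that are not under the top-level python/ folder."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist()
                if not name.startswith("python/")]
```

For the requirements.zip built above, misplaced_layer_entries("requirements.zip") should return an empty list; anything it does return would not be importable from the layer.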
I found the solution: use Python 3.8 and add the packages as layers.
Here's a link to my Git repository with 2 working layers, psycopg2 and pandas (boto3 and os are already available in AWS): https://github.com/alexdragut20/AWS
Just create the layers using the "Upload a zip file" option, create a function from scratch, and add the layers to the function.

ta-lib replit python install problem, ERROR: No matching distribution found for talib-binary

I use it on my Windows machine by downloading its binary. I also use it on Heroku via its Heroku buildpack. I don't know what operating system Replit uses, but I tried every command I could think of, such as:
!pip install ta-lib
!pip install talib-binary
It doesn't work on Replit. I thought it would work like Google Colab, but it's not the same.
Has anyone used TA-Lib on Replit? If so, how did you install it?
Getting TA-Lib to work on Replit
(by installing it from sources)
1. Create a new replit with the Nix toolset and a Python template.
In main.py write:
import talib
print (talib.__ta_version__)
This will be our test case. If ta-lib is installed, running python main.py in the Shell will print something like:
$ python main.py
b'0.6.0-dev (Jan 1 1980 00:00:00)'
2. We need to prepare the tools for building the TA-Lib sources. There is a replit.nix file in your project's root folder (in my case it was ~/BrownDutifulLinux). Every time you execute a command that is not installed yet, such as cmake, Nix reports:
cmake: command not installed. Multiple versions of this command were found in Nix.
Select one to run (or press Ctrl-C to cancel):
cmake.out
cmakeCurses.out
cmakeWithGui.out
cmakeMinimal.out
cmake_2_8.out
If you select cmake.out, it will add a record for it to the replit.nix file, and the next time you call cmake, Nix will know which version to launch. You could probably edit replit.nix manually instead. But if you add commands my way, note that you must execute them in the Shell from your project root folder, where the replit.nix file is located. Otherwise Nix won't remember your choice.
In the end, the content of my replit.nix file (you can view it with cat replit.nix) was:
{ pkgs }: {
  deps = [
    pkgs.libtool
    pkgs.automake
    pkgs.autoconf
    pkgs.cmake
    pkgs.python38Full
  ];
  env = {
    PYTHON_LD_LIBRARY_PATH = pkgs.lib.makeLibraryPath [
      # Needed for pandas / numpy
      pkgs.stdenv.cc.cc.lib
      pkgs.zlib
      # Needed for pygame
      pkgs.glib
      # Needed for matplotlib
      pkgs.xorg.libX11
    ];
    PYTHONBIN = "${pkgs.python38Full}/bin/python3.8";
    LANG = "en_US.UTF-8";
  };
}
This means I executed libtool, autoconf, automake and cmake in the Shell. I always chose the generic suggestion from Nix, without a specific version. Note: some commands may report errors, since we are running them incorrectly just to get them added to replit.nix.
3. Once the build tools are set up, we need to get and build the TA-Lib C library sources. To do that, execute in the Shell:
git clone https://github.com/TA-Lib/ta-lib.git
then
cd ta-lib/
libtoolize
autoreconf --install
./configure
If the configure script completes without any problems, build the library with:
make -j4
It will end with some compilation errors, but they relate to additional tools (used for adding new TA-Lib indicators) that are built last, not to the library itself. The library will be compiled successfully, and you should be able to see it with:
$ ls ./src/.libs/
libta_lib.a libta_lib.lai libta_lib.so.0
libta_lib.la libta_lib.so libta_lib.so.0.0.0
Now we have our C library built, but we can't install it into the system default folders. So we have to use the library as is, from the folders where it was built. We need just one more preparation step:
mkdir ./include/ta-lib
cp ./include/*.h ./include/ta-lib/
This copies the library headers into a subfolder, since they are designed to be used from such a subfolder (which doesn't exist because the installation step can't be performed).
4. Now we have the TA-Lib C library built and prepared to be used locally from its build folders. All that remains is to compile the Python wrapper for it. But the Python wrapper looks for the library only in the system default folders, so we need to tell it where our library is.
To do this, execute pwd and note the absolute path to your project's root folder. In my case it was:
/home/runner/FormalPleasedOffice
Then adjust the paths (there are two) in the following command to point to your project path:
TA_INCLUDE_PATH=/home/runner/FormalPleasedOffice/ta-lib/include/ TA_LIBRARY_PATH=/home/runner/FormalPleasedOffice/ta-lib/src/.libs/ pip install ta-lib
This is a single command, not two. With shorter paths it would look like:
TA_INCLUDE_PATH=/path1/ TA_LIBRARY_PATH=/path2/ pip install ta-lib
After this command runs, the wrapper will be installed with two additional paths where it looks for the library and its header files.
That's all.
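The same install can be scripted so that the two paths are derived from the project root instead of being typed by hand. This is a sketch under the assumption that ta-lib was cloned and built directly inside the project root, as in step 3; ta_lib_build_env and install_wrapper are hypothetical helper names.

```python
import os
import subprocess
import sys


def ta_lib_build_env(project_root, base_env=None):
    """Return an environment with TA_INCLUDE_PATH and TA_LIBRARY_PATH set."""
    env = dict(os.environ if base_env is None else base_env)
    source_dir = os.path.join(project_root, "ta-lib")
    env["TA_INCLUDE_PATH"] = os.path.join(source_dir, "include")
    env["TA_LIBRARY_PATH"] = os.path.join(source_dir, "src", ".libs")
    return env


def install_wrapper(project_root):
    """Run `pip install ta-lib` with both paths pointing at the local build."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "ta-lib"],
        env=ta_lib_build_env(project_root),
        check=True,
    )
```

Running install_wrapper(os.getcwd()) from the project root is then equivalent to the one-line command above.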
An alternative would be to clone the wrapper sources, edit its setup.py, and install the wrapper manually. Just for the record, this would be:
cd ~/Your_project
git clone https://github.com/mrjbq7/ta-lib.git ta-lib-wrapper
cd ta-lib-wrapper
Here, edit setup.py: find the lines include_dirs = [ and library_dirs = [ and append your paths to these lists. Then you just need to run:
python setup.py build
pip install .
Note the dot at the end.
5. Go to the project's folder and try our Python script:
$ python main.py
b'0.6.0-dev (Jan 1 1980 00:00:00)'
Bingo!
@truf's answer is correct.
After you add
pkgs.libtool
pkgs.automake
pkgs.autoconf
pkgs.cmake
to the dependencies in replit.nix, run:
git clone https://github.com/TA-Lib/ta-lib.git
cd ta-lib/
libtoolize
autoreconf --install
./configure
make -j4
mkdir ./include/ta-lib
cp ./include/*.h ./include/ta-lib/
TA_INCLUDE_PATH=/home/runner/FormalPleasedOffice/ta-lib/include/ TA_LIBRARY_PATH=/home/runner/FormalPleasedOffice/ta-lib/src/.libs/ pip install ta-lib
Note: FormalPleasedOffice should be replaced with your project name.
Done.
Here is the YouTube video:
https://www.youtube.com/watch?v=u20y-nUMo5I

Cannot import name 'cygrpc' from 'grpc._cython' - Google Ads API

I want to deploy a working Python project from PyCharm to AWS Lambda. The project uses the google-ads library to get some report data from Google Ads.
I tried deploying to Lambda by uploading the complete project as a zip file, zipping all the folders/files inside the project rather than the project folder itself. But I got the following error:
{
"errorMessage": "Unable to import module 'main': cannot import name 'cygrpc' from 'grpc._cython' (/var/task/grpc/_cython/__init__.py)",
"errorType": "Runtime.ImportModuleError",
"stackTrace": []
}
Assuming that the google-ads library is working and that something is wrong with grpc (by the way, google-ads pulls in grpcio and related packages on its own), I tried to create a layer for grpcio, cython, and cygrpc, but the error remains the same.
I have created other projects/layers in AWS Lambda and they work. I don't know what I am doing wrong here.
Any help would be really appreciated!
versions: google-ads-14.1.0, python-3.9, grpcio-1.43.0
Answering my own question after trying a lot of workarounds. I have made the answer generic so anyone can use it.
I believe you can fix any type of ImportModuleError as long as your deployment package's file structure, your code, and the architecture are OK; only then can you deploy and run your code successfully. To fix your structure and architecture, follow the steps below:
1- Install "Ubuntu 18.04 LTS" from the Microsoft Store (Windows 10).
2- Open CMD and run the following commands:
ubuntu1804
Enter your password, or create a user if asked.
cd /mnt/c  # You can choose any of your drives; I chose C.
mkdir my-lambda-folder  # Create the project folder.
cd my-lambda-folder  # Enter the project folder.
touch lambda_function.py  # Create a file called lambda_function.py.
Now copy and paste your code into the file you just created, i.e. lambda_function.py.
pip install --target ./package your-module-name
For example, pip install --target ./package google-ads installs the google-ads module inside the folder 'package'. The folder is created automatically if it does not exist.
cd package
zip -r ../my-deployment-package.zip .  # Creates the deployment package containing the installed library in the root of your project folder, i.e. my-lambda-folder.
cd ..  # Go back to the root of your project folder.
zip -g my-deployment-package.zip lambda_function.py  # Add your Lambda function to the deployment package you just created.
(Optional) In my case I was using google-ads, and to run my code I also needed the google-ads.yaml file in my deployment package. So I ran the additional command zip -g my-deployment-package.zip google-ads.yaml (I had already placed this file in my project folder).
3- Upload my-deployment-package.zip to your Lambda function in the AWS console and you are good to go.
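The two zip steps can also be sketched with Python's standard zipfile module, which makes the resulting layout explicit: installed modules at the archive root, next to the handler file. The function name build_deployment_package is hypothetical, the folder and file names follow the steps above, and the pip install itself still has to run in a Linux environment compatible with Lambda.

```python
import os
import zipfile


def build_deployment_package(package_dir, handler_file, zip_path):
    """Zip the contents of package_dir at the archive root, then add the handler.

    zip_path should live outside package_dir so the archive does not include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _subfolders, files in os.walk(package_dir):
            for filename in files:
                full_path = os.path.join(folder, filename)
                # Store paths relative to package_dir so modules sit at the zip root.
                archive.write(full_path, os.path.relpath(full_path, package_dir))
        archive.write(handler_file, os.path.basename(handler_file))
```

For the layout above this would be called as build_deployment_package("package", "lambda_function.py", "my-deployment-package.zip") from the project folder.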
For me, it worked just by downloading the packages with pip on Ubuntu in Docker, then packing and uploading them to AWS.

How to import lxml from precompiled binary on AWS Lambda?

I'm trying to import the lxml library in Python to execute an AWS Lambda function but I'm getting the following error: [ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'lxml'. To solve this, I followed the recommendation from this SO answer and used precompiled binaries from the following repo.
I used the lxml_amazon_binaries.zip file from that repo, which has this structure:
lxml_amazon_binaries
├── lxml
└── usr
I uploaded the entire zip file to an AWS Lambda layer, created a new Lambda function, and tested with a simple from lxml import etree, which led to the above error.
Am I uploading/using these binaries correctly? I'm not sure what caused the error. Using different Python runtimes didn't help.
The most reliable way to create an lxml layer is to use Docker, as explained in the AWS blog. Specifically, the verified steps are (executed on Linux, but Windows should also work as long as you have Docker):
1. Create an empty folder, e.g. mylayer.
2. Go to the folder and create a requirements.txt file with the content:
lxml
3. Run the following docker command, which creates the layer for python3.8:
docker run -v "$PWD":/var/task "lambci/lambda:build-python3.8" /bin/sh -c "pip install -r requirements.txt -t python/lib/python3.8/site-packages/; exit"
4. Archive the layer as a zip:
zip -9 -r mylayer.zip python
5. Create a Lambda layer based on mylayer.zip in the AWS Console. Don't forget to set Compatible runtime to python3.8.
6. Add the layer created in step 5 to your function.
I tested the layer using your code:
from lxml import etree
def lambda_handler(event, context):
    root = etree.Element("root")
    root.append(etree.Element("child1"))
    print(etree.tostring(root, pretty_print=True))
It works correctly:
b'<root>\n <child1/>\n</root>\n'

How to add dependencies inside setup.py file?

How do I add dependencies inside a setup.py file? I am writing this script on a VM and want to check whether certain dependencies, such as the JDK or Docker, are present, and if they are not installed, install them automatically on the VM using this script.
Please tell me as soon as possible, as it is required for my project.
In the simplest form, you can add (Python) dependencies that can be installed via pip as follows:
from setuptools import setup
setup(
    ...
    install_requires=["install-jdk", "docker>=4.3"],
    ...
)
Alternatively, write a requirements.txt file and then use it:
with open("requirements.txt") as requirements_file:
    # Strip whitespace and newlines, and skip blank lines
    requirements = [line.strip() for line in requirements_file if line.strip()]
setup(
    ...
    install_requires=requirements,
    ...
)
Whenever you execute python setup.py install, these dependencies are checked against the libraries available in your VM, and if they are not available (or the version does not match), they will be installed (or replaced). More information can be found here.
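Real requirements files often contain blank lines and comments, which should not end up in install_requires. A small sketch of a parser that handles those two cases; parse_requirements is a hypothetical helper, and it deliberately does not handle pip options such as -r or -e.

```python
def parse_requirements(text):
    """Return requirement specifiers, skipping blank lines and # comment lines."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements
```

It would be used as install_requires=parse_requirements(open("requirements.txt").read()) in the setup() call.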
Refer to https://github.com/boto/s3transfer/blob/develop/setup.py and check the requires variables.
You can also refer to many other open source projects.
You can add dependencies using setuptools; however, it can only check dependencies on Python packages.
Because of that, you could check the JDK and Docker installations manually before calling setup().
You could call system as in the code below and check the response.
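import os
# os.system returns the command's exit status; a non-zero value means the command is missing or failed.
java_status = os.system("java -version")
docker_status = os.system("docker version --format '{{.Server.Version}}'")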
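If you only need to know whether the executables are present, without running them, the standard library's shutil.which can look them up on PATH. A minimal sketch; missing_tools is a hypothetical helper name, and installing whatever is missing is left to your own script.

```python
import shutil


def missing_tools(commands):
    """Return the commands that cannot be found on PATH."""
    return [command for command in commands if shutil.which(command) is None]


missing = missing_tools(["java", "docker"])
if missing:
    print("Not installed:", ", ".join(missing))
```

Note that this only checks that the commands exist; it does not verify their versions or that the Docker daemon is running.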
