Python package management for an AWS Lambda function

I'm struggling to understand how best to manage Python packages that need to be zipped up for an AWS Lambda function.
In my project folder I have a number of .py files. As part of my build process I zip these up and use the AWS APIs to create and publish my Lambda function, supplying the zip file as part of that call (a minimal sketch of this step appears below).
Therefore, it is my belief that I need to have all the packages my Lambda depends on within my project folder.
With that in mind, I call pip as follows:
pip install -t . tzlocal
This seems to fill my project folder with lots of stuff, and I'm unsure whether all of it needs to get zipped up into my Lambda function deployment, e.g.
.\pytz
.\pytz-2018.4.dist-info
.\tzlocal
...
...
First question - does all of this stuff need to be zipped up into my Lambda?
If not, how do I get a package that gives me just the bits I need to go into my zip file?
Coming from a .NET / Node background - with the former I NuGet my package in and it goes into a nice packages folder containing just the .dll files I need, which I then reference in my project.
If I do need all of these files, is there a way to "put" them somewhere more tidy - like in a packages folder?
Finally, is there a way to just download the binary that I actually need? I've read that the problem here is that the Lambda function will need a different binary from the one I use on my desktop development environment (Windows), so I'm not sure how to solve that problem either.
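
For reference, the create-and-publish step described above might look like this minimal boto3 sketch; the zip file name and function name are placeholders:

import boto3

lambda_client = boto3.client('lambda')

# Read the deployment package that was zipped up from the project folder.
with open('function.zip', 'rb') as f:
    zipped_code = f.read()

# Update an existing function's code; create_function would be used for
# the very first deployment. 'my-function' is a placeholder name.
lambda_client.update_function_code(
    FunctionName='my-function',
    ZipFile=zipped_code,
    Publish=True,
)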

Binary libraries used, for example, by numpy should be compiled on Amazon Linux to work on Lambda. I find this tutorial useful (https://serverlesscode.com/post/deploy-scikitlearn-on-lamba/). There is an even newer version of it which uses a Docker container, so you do not need an EC2 instance for compilation and you can do everything locally.
As for the packages: the AWS docs say to install everything to the root, but you may install them all in a ./packages directory and append it to the path at the beginning of the Lambda handler code:
import os
import sys

# Make the bundled ./packages directory importable before the handler's
# third-party imports run.
cwd = os.getcwd()
package_path = os.path.join(cwd, 'packages')
sys.path.append(package_path)
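
With this layout, the dependencies themselves would be installed with pip install -t packages tzlocal (for example) rather than into the project root, keeping the project folder tidy.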

Related

How to include external files in Python wheel?

I'm working on a Python package (built as a wheel), which requires some external files – i.e. ones that aren't in my package/repository. At runtime, it currently gets the path to these from a config file.
I'd like to instead bundle these files inside the wheel itself, so the package is all you need to install. Normally you'd use package_data to do this kind of thing, but I don't think that works for files outside of the package tree. I've wondered about making a build script, which first copies these into a local temporary directory. Would that work, or is there a more elegant way to do this?
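
As a sketch of that build-script idea, a setup.py could copy the external files into the package tree before calling setup(), so package_data can pick them up. Everything here (mypackage, the external paths) is hypothetical:

import shutil
from pathlib import Path
from setuptools import setup, find_packages

# Hypothetical external files that live outside the repository.
EXTERNAL_FILES = [Path('/opt/shared/data.csv')]
DEST = Path('mypackage') / 'data'

# Copy the external files into the package tree before building, so
# package_data can include them in the wheel.
DEST.mkdir(parents=True, exist_ok=True)
for src in EXTERNAL_FILES:
    shutil.copy(src, DEST / src.name)

setup(
    name='mypackage',
    version='0.1.0',
    packages=find_packages(),
    package_data={'mypackage': ['data/*']},
)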

How to bundle Python Lambdas with shared files and different dependencies?

I've got some apparently conflicting concerns when deploying Lambda functions using CDK:
I want all the dependencies for the whole project to be installed in a single virtualenv in my local copy of the code, so that the IDE can index everything in one go. This is done with a project-level pyproject.toml and poetry.lock.
I want only the minimal dependencies to be installed for each Lambda function. This is done by using Poetry "extras" for each Lambda function, and installing only the relevant extras when bundling each Lambda function.
I want to share code between Lambdas, by putting shared files into their common parent directory.
The way I do this currently is clunky:
bundling_options = BundlingOptions(
    image=aws_lambda.Runtime.PYTHON_3_8.bundling_docker_image,
    command=["backend/bundle.bash", directory],
)
return aws_lambda.Code.from_asset(path=".", bundling=bundling_options)
bundle.bash does two things: it uses poetry export to convert the relevant entries from Poetry to pip requirements.txt format and installs them using pip install --target=/asset-output (because Poetry doesn't support installing into a specific directory), then it copies the files from directory and its parent directory (the shared files) to /asset-output.
I have to use path="." above because the Docker container needs to read pyproject.toml and poetry.lock from the root of the project. This seems to be causing another issue, where any change to any file in the entire repository causes a redeployment of every Lambda function. This is the main reason I'm looking for alternatives.
Is there a better way to share files between Lambda functions in CDK, where the Lambda functions have different dependencies?
Some bad options:
I don't think I can use PythonFunction, because it assumes that the requirements.txt file is in the root directory (which is shared between Lambda functions).
I don't want to duplicate the list of packages, so putting all packages in the root configuration and duplicating the relevant packages in requirements.txt files for each Lambda function is out.
I don't want to do any horrible code rewriting at bundling time, such as moving all the shared code into subdirectories and changing imports from them.
A workaround for the specific issue with redeploying every time was to specify the asset_hash in from_asset (see the sketch after this list). Unfortunately, this adds considerable brittleness and complexity (hashing every possibly relevant file) rather than simplifying.
Manually doing bundling outside of Docker. This would also be semi-complex (about the same as right now, by the looks) and would be more brittle than bundling inside Docker. I'd also have to hash the source files, not the resulting files, because Pip installs are in general not reproducible (I see plenty of differences between two installs of the same package to different directories).
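
To illustrate why the asset_hash workaround gets brittle, here is a rough sketch assuming CDK v1-style imports; directory_hash and the directory names are hypothetical:

import hashlib
from pathlib import Path

from aws_cdk import aws_lambda, core

def directory_hash(*directories):
    # Hash every file under the given directories; the asset is only
    # considered changed when one of these files changes. Every relevant
    # path (including the shared parent files) has to be listed by hand,
    # which is where the brittleness comes from.
    digest = hashlib.sha256()
    for directory in directories:
        for path in sorted(Path(directory).rglob('*')):
            if path.is_file():
                digest.update(path.read_bytes())
    return digest.hexdigest()

bundling_options = core.BundlingOptions(
    image=aws_lambda.Runtime.PYTHON_3_8.bundling_docker_image,
    command=["backend/bundle.bash", "backend/my_function"],
)

# 'backend/my_function' is a placeholder for one Lambda's source directory.
code = aws_lambda.Code.from_asset(
    path=".",
    asset_hash=directory_hash("backend/my_function"),
    bundling=bundling_options,
)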
I would suggest you use Lambda Layers: install all of the dependencies into a single folder called python, and you can also put common functionality into the layer.
To create an AWS Lambda Layer package, follow the steps below:
$ mkdir python
$ cd python
$ pip install -r requirements.txt -t .
$ cd ..
$ zip -r python.zip python
Once the dependencies are installed, zip the folder into a file named python.zip (the python directory itself must sit at the root of the archive). Now you can create a Lambda Layer and upload python.zip there. After that, you can import the packages in your Lambda function as usual. AWS Lambda Layers
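
If you prefer to publish the layer programmatically, a hedged boto3 sketch (with 'my-layer' as a placeholder name) could look like this:

import boto3

lambda_client = boto3.client('lambda')

# Read the python.zip built in the steps above.
with open('python.zip', 'rb') as f:
    layer_zip = f.read()

# Publish the zip as a new layer version; 'my-layer' is a placeholder.
response = lambda_client.publish_layer_version(
    LayerName='my-layer',
    Content={'ZipFile': layer_zip},
    CompatibleRuntimes=['python3.8'],
)
print(response['LayerVersionArn'])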

Converting a python package into a single importable file

Is there a way to convert a Python package, i.e. a folder of Python files, into a single file that can be copied and then directly imported into a Python script without needing to run any extra shell commands? I know it is possible to zip all of the files and then unzip them from Python when they are needed, but I'm hoping that there is a more elegant solution.
It's not totally clear what the question is. I could interpret it two ways.
If you are looking to manage the symbols from many modules in a more organized way:
You'll want to put an __init__.py file in your directory and make it a package. In it you can define the symbols for your package, and create a graceful import packagename behavior. Details on packages.
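
As an illustration, a minimal __init__.py for a hypothetical package (mypackage, core and helpers are made-up names) that re-exports its public symbols:

# mypackage/__init__.py
# Re-export the public symbols so `import mypackage` exposes them directly.
from .core import main_function
from .helpers import helper_function

__all__ = ['main_function', 'helper_function']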
If you are looking to make your code portable to another environment:
One way or the other, the package needs to be accessible in whatever environment it is run in. That means it either needs to be installed in the python environment (likely using pip), copied into a location that is in a subdirectory relative to the running code, or in a directory that is listed in the PYTHONPATH environment variable.
The most straightforward way to package up code and make it portable is to use setuptools to create a portable package that can be installed into any Python environment. The manual page for Packaging Projects gives the details of how to go about building a package archive, and optionally uploading it to PyPI for public distribution. If it is for private use, the resulting archive can be passed around without uploading it to the public repository.

Looking for a way to upgrade / downgrade inside Lambda

I need a way to upgrade/downgrade the boto3 lib inside my Python 3.7 env inside Lambda.
Right now, the version is 1.9.42 inside Lambda. I cannot use certain things like Textract (boto3.client('textract')), but I can on my local machine (boto3 version 1.9.138).
So, I decided to install boto3 into a package (pip3 install boto3 -t dir/ --system) then upload it to Lambda after zipping it.
This didn't work because Lambda won't accept a package larger than 3MB (mine is around 8MB).
Any other workarounds?
edit: I know I could always just write code that works and keep uploading it to Lambda, but this will become cumbersome, as I'd have to include all the installed packages in the package and rebuild it as I make changes.
The Serverless Application Model (SAM) is a tool provided by AWS that lets you develop locally, as it simulates the Lambda environment inside a Docker container. Once you are ready, you can deploy your code to a Lambda and it will work as expected.
If you really want to keep editing the code in the web console, there is a workaround using Lambda Layers. You create a package with all of your dependencies and upload that to a Lambda Layer. Then include your layer in the Lambda and just modify your own code there. As has been pointed out in the comments, this is not the way to go for real development.
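
Whichever route you take, a quick sanity check is to have the function report which versions the runtime actually resolves, so you can confirm your bundled or layered boto3 is being picked up:

import boto3
import botocore

def lambda_handler(event, context):
    # Return the versions the Lambda runtime actually imports.
    return {
        'boto3': boto3.__version__,
        'botocore': botocore.__version__,
    }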

Python project in Visual Studio deployed on AWS using Lambda?

I have a Python project with a lot of dependencies (around 30 or so Python packages) that I want to deploy on AWS using a Lambda function. Right now I have about 30 custom Python packages in my VS solution that I import into the main function - there is a lot of code. What is the best way to build a deployment package, and how would I go about doing this?
I watched a few tutorials, but I am new to this, so I'm not sure exactly what concrete steps to take. If I use something like Zappa and create a virtual environment, how would I then get my project there, install all the dependencies, and then zip the file?
Thanks so much, sorry for the stupid questions; I couldn't find a Stack Overflow post that covered this.
Just go to your Python environment folder, find the site-packages folder (usually in /lib), choose all the dependencies you need, and zip them with your code.
I guess it's the easiest way.
For example, if I need beautifulsoup and urllib as dependencies, I just zip them (and their dependencies, if needed) with my code, then upload to AWS Lambda; that's all.
BTW, you can also see this gist to know whether the module you need can be directly imported in AWS Lambda or not.
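
A rough Python sketch of that manual approach, with the dependency names, handler file, and paths as placeholders (site.getsitepackages() assumes a standard interpreter layout):

import shutil
import site
import zipfile
from pathlib import Path

# Placeholder list of package directories to copy out of site-packages.
DEPENDENCIES = ['bs4', 'soupsieve']
SITE_PACKAGES = Path(site.getsitepackages()[0])

build_dir = Path('build')
shutil.rmtree(build_dir, ignore_errors=True)
build_dir.mkdir()

# Copy each dependency's package directory next to the handler code.
for name in DEPENDENCIES:
    shutil.copytree(SITE_PACKAGES / name, build_dir / name)
shutil.copy('lambda_function.py', build_dir)

# Zip the build directory's contents (not the directory itself) for upload.
with zipfile.ZipFile('deployment.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for path in build_dir.rglob('*'):
        zf.write(path, path.relative_to(build_dir))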
