I've got some apparently conflicting concerns when deploying Lambda functions using CDK:
I want all the dependencies for the whole project to be installed in a single virtualenv in my local copy of the code, so that the IDE can index everything in one go. This is done with a project-level pyproject.toml and poetry.lock.
I want only the minimal dependencies to be installed for each Lambda function. This is done by using Poetry "extras" for each Lambda function, and installing only the relevant extras when bundling each Lambda function.
I want to share code between Lambdas, by putting shared files into their common parent directory.
The way I do this currently is clunky:
bundling_options = BundlingOptions(
    image=aws_lambda.Runtime.PYTHON_3_8.bundling_docker_image,
    # bundle.bash receives this Lambda function's source directory as its argument
    command=["backend/bundle.bash", directory],
)
return aws_lambda.Code.from_asset(path=".", bundling=bundling_options)
bundle.bash
uses poetry export to convert the relevant entries from Poetry to pip requirements.txt format and installs using pip install --target=/asset-output (because Poetry doesn't support installing into a specific directory), then
copies the files from directory and its parent directory (the shared files) to /asset-output.
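Roughly speaking, the script does the equivalent of the following (the extras name and exact paths are illustrative):
import shutil
import subprocess
from pathlib import Path

def bundle(directory: str, extras: str, output: str = "/asset-output") -> None:
    # Export only this Lambda's extras from pyproject.toml/poetry.lock to pip's format
    subprocess.run(
        ["poetry", "export", "--extras", extras, "--without-hashes",
         "--output", "/tmp/requirements.txt"],
        check=True,
    )
    # Poetry can't install into an arbitrary directory, so pip handles that step
    subprocess.run(
        ["pip", "install", "--requirement", "/tmp/requirements.txt",
         "--target", output],
        check=True,
    )
    # Copy the Lambda's own code plus the shared files from its parent directory
    shutil.copytree(directory, output, dirs_exist_ok=True)
    for shared in Path(directory).parent.glob("*.py"):
        shutil.copy(shared, output)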
I have to use path="." above because the Docker container needs to read pyproject.toml and poetry.lock from the root of the project. This seems to be causing another issue, where any change to any file in the entire repository causes a redeployment of every Lambda function. This is the main reason I'm looking for alternatives.
Is there a better way to share files between Lambda functions in CDK, where the Lambda functions have different dependencies?
Some bad options:
I don't think I can use PythonFunction, because it assumes that the requirements.txt file is in the root directory (which is shared between Lambda functions).
I don't want to duplicate the list of packages, so putting all packages in the root configuration and duplicating the relevant packages in requirements.txt files for each Lambda function is out.
I don't want to do any horrible code rewriting at bundling time, such as moving all the shared code into subdirectories and changing imports from them.
A workaround for the specific issue of redeploying on every change was to specify the asset_hash in from_asset (see the sketch after this list). Unfortunately, this adds considerable brittleness and complexity (hashing every possibly relevant file) rather than simplifying.
Manually doing the bundling outside of Docker. This would also be semi-complex (about as complex as the current approach, by the looks of it) and would be more brittle than bundling inside Docker. I'd also have to hash the source files, not the resulting files, because pip installs are in general not reproducible (I see plenty of differences between two installs of the same package into different directories).
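For reference, that workaround looks something like this (CDK v1, reusing the names from the snippet above; the set of files being hashed is illustrative, and keeping it complete is exactly the brittle part):
import hashlib
from pathlib import Path

from aws_cdk.core import AssetHashType

def source_hash(directory: str) -> str:
    # Hash every file that could affect this Lambda's bundle output
    digest = hashlib.sha256()
    files = sorted(Path(directory).rglob("*.py")) + [Path("pyproject.toml"), Path("poetry.lock")]
    for path in files:
        digest.update(path.read_bytes())
    return digest.hexdigest()

code = aws_lambda.Code.from_asset(
    path=".",
    bundling=bundling_options,
    asset_hash=source_hash(directory),
    asset_hash_type=AssetHashType.CUSTOM,
)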
I would suggest you use Lambda Layers: install all the dependencies into a single folder called python, and you can also put shared/common functionality in the layer.
To create an AWS Lambda Layer package, follow the steps below:
$ mkdir python
$ cd python
$ pip install -r requirements.txt -t .
Once the dependencies are installed, zip the python folder into a file named python.zip. You can then create a Lambda Layer, upload python.zip to it, and import the packages in your Lambda function as usual. See AWS Lambda Layers.
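Since the question is about CDK, wiring the layer in looks roughly like this (inside a Stack; the layer/ directory is assumed to contain the python/ folder built above, and the function paths are placeholders):
from aws_cdk import aws_lambda

# The asset directory must have the python/ folder at its top level
deps_layer = aws_lambda.LayerVersion(
    self, "DepsLayer",
    code=aws_lambda.Code.from_asset("layer"),
    compatible_runtimes=[aws_lambda.Runtime.PYTHON_3_8],
)

handler = aws_lambda.Function(
    self, "MyFunction",
    runtime=aws_lambda.Runtime.PYTHON_3_8,
    handler="app.handler",
    code=aws_lambda.Code.from_asset("backend/my_function"),
    layers=[deps_layer],
)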
Is there a way to convert a python package, i.e. a folder of python files, into a single file that can be copied and then directly imported into a python script without needing to run any extra shell commands? I know it is possible to zip all of the files and then unzip them from python when they are needed, but I'm hoping there is a more elegant solution.
It's not totally clear what the question is. I could interpret it two ways.
If you are looking to manage the symbols from many modules in a more organized way:
You'll want to put an __init__.py file in your directory to make it a package. In it you can define (or re-export) the symbols for your package and create a graceful import packagename behavior. See the documentation on packages for details.
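For example, with a layout like mypackage/__init__.py, mypackage/core.py and mypackage/helpers.py (names illustrative), the __init__.py can re-export the public symbols:
# mypackage/__init__.py
from .core import bar        # re-export the public API at package level
from .helpers import foo

__all__ = ["foo", "bar"]     # what "from mypackage import *" exposes
so that import mypackage followed by mypackage.foo() works without the caller knowing about the submodules.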
If you are looking to make your code portable to another environment:
One way or the other, the package needs to be accessible in whatever environment it is run in. That means it either needs to be installed in the python environment (likely using pip), copied into a location that is in a subdirectory relative to the running code, or in a directory that is listed in the PYTHONPATH environment variable.
The most straightforward way to package up code and make it portable is to use setuptools to create a distributable package that can be installed into any python environment. The manual page for Packaging Projects gives the details of how to go about building a package archive and optionally uploading it to PyPI for public distribution. If it is for private use, the resulting archive can be passed around without uploading it to the public repository.
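A minimal setup.py for such a package might look like this (name, version and dependencies are placeholders):
# setup.py
from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="0.1.0",
    packages=find_packages(),        # picks up mypackage/ and any subpackages
    install_requires=["requests"],   # placeholder for real runtime dependencies
)
Running python setup.py sdist bdist_wheel (or python -m build) then produces an archive under dist/ that can be pip-installed in any environment.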
Let's say I have two distinct Python projects, libr (a library) and proj (a project that depends on libr).
If I set up a private package index and list libr as a dependency in setup.py, I can tell pip to use it using --index-url... but if I want to test out how a change in libr affects proj, I need to upload the new version to the private index and reinstall.
If I instead tell pip to look for libr in my local filesystem using --editable, I will be able to test out the code without uploading, but my setup.py will be polluted with paths instead of package names.
Is there any way to get the best of both worlds?
I'm struggling to understand how best to manage python packages to get zipped up for an AWS lambda function.
In my project folder I have a number of .py files. As part of my build process I zip these up and use the AWS APIs to create and publish my lambda function supplying the zip file as part of that call.
Therefore, it is my belief that I need to have all the packages my lambda is dependent on within my project folder.
With that in mind, I call pip as follows:
pip install -t . tzlocal
This seems to fill my project folder with lots of stuff and I'm unsure if all of it needs to get zipped up into my lambda function deployment e.g.
.\pytz
.\pytz-2018.4.dist-info
.\tzlocal
...
...
First question - does all of this stuff need to be zipped up into my lambda?
If not, how do I get a package that gives me just the bits I need to go into my zip file?
Coming from a .NET / Node background: with the former, I add a NuGet package and it goes into a tidy packages folder containing just the .dll I need, which I then reference in my project.
If I do need all of these files is there a way to "put" them somewhere more tidy - like in a packages folder?
Finally, is there a way to just download the binary that I actually need? I've read that the problem here is that the Lambda function will need a different binary from the one I use in my desktop development environment (Windows), so I'm not sure how to solve that problem either.
Binary libraries, used for example by numpy, have to be compiled on Amazon Linux to work on Lambda. I find this tutorial useful (https://serverlesscode.com/post/deploy-scikitlearn-on-lamba/). There is an even newer version of it which uses a Docker container, so you do not need an EC2 instance for compilation and can do everything locally.
As for the packages: the AWS docs say to install everything at the root of the zip, but you can instead install them all into a ./packages directory and append it to sys.path at the beginning of the Lambda handler code:
import os
import sys

# Make the bundled ./packages directory importable before any third-party imports
cwd = os.getcwd()
package_path = os.path.join(cwd, 'packages')
sys.path.append(package_path)
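With that in place, the handler can import the bundled dependencies as usual (assuming they were installed with something like pip install -r requirements.txt -t packages):
import tzlocal  # resolved from ./packages thanks to the sys.path tweak above

def handler(event, context):
    return str(tzlocal.get_localzone())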
I am working on a project and I have cloned a repository from github.
After the first compile I realized that the project I cloned has some dependencies, listed in a requirements.txt file.
I know I have to install these packages, but I don't want to, because I am on a Windows development environment and after finishing the project I am going to publish it to my Ubuntu production environment, and I don't want the hassle of installing everything twice.
I have two options:
Using a virtualenv and installing those packages inside it
Downloading the packages and using them directly with import foldername
I want to avoid the first option because I have less control over my project, and the problem gets bigger if, for example, I am inside another project's virtualenv and want to run my project's main.py from its own virtualenv, and so on. Also, moving a virtualenv from Windows (bat files) to Linux (bash/sh files) seems ugly to me and leads to approaches I would rather avoid.
The second option is my choice. For example, I need to use the future package. The scenario would be: download the package with pip download future, extract the tar.gz file, and inside the src folder I can see the future package folder, which I then use with import future_package.src.future without touching anything else.
Aside from os.path problems (assume I take care of those):
Is this good practice?
I am not running setup.py, so nothing gets installed. Can that cause problems?
Is there a better approach that involves less work (like the second option), or is the first approach I mentioned actually the better one?
UPDATE 1: I have extracted the future and certifi packages, which were part of my project's requirements, used them directly, and it works in this particular case.
I'm working on a python project that contains a number of routines I use repeatedly. Instead of rewriting code all the time, I just want to update my package and import it; however, it's nowhere near done and is constantly changing. I host the package on a repo so that colleagues on various machines (UNIX + Windows) can pull it into their local repos and use it.
It sounds like I have two options: either I keep reinstalling the package after every change, or I just add the package directory to my system's path. If I change the package, does it need to be reinstalled? I'm using this blog post as inspiration, but the author there doesn't stress the issue of a continuously changing package structure, so I'm not sure how to deal with this.
Also, if I wanted to split the project into multiple files and bundle it as a package, at what level in the directory structure does the PYTHONPATH entry need to point? To the main project directory, or to the sample/ directory?
README.rst
LICENSE
setup.py
requirements.txt
sample/__init__.py
sample/core.py
sample/helpers.py
docs/conf.py
docs/index.rst
tests/test_basic.py
tests/test_advanced.py
In this example, I want to be able to just import the package itself and call the modules within it like this:
import sample

arg = sample.helpers.foo()
out = sample.core.bar(arg)

where helpers contains a function called foo and core contains a function called bar.
PYTHONPATH is a valid way of doing this, but in my (personal) opinion it's more useful if you keep your python packages in a separate dedicated place, like /opt/pythonpkgs or so.
For projects that I want installed but also have to keep developing, I use develop instead of install with setup.py:
When installing the package, don't do:
python setup.py install
Rather, do:
python setup.py develop
What this does is create a symlink/shortcut (I believe it's called an egg-link in python) in the python libs directory (where packages are installed) that points to your module's directory. Hence, since it's only a shortcut/symlink/egg-link, whenever you change a python file the change is reflected the next time you import that file.
Note: Because of this, if you delete the repository/directory you ran this command from, the package will cease to exist (as it's only a shortcut).
The equivalent in pip is -e (for editable):
pip install -e .
Instead of:
pip install .
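A quick way to see the effect: after an editable install, the imported package resolves straight back to your working copy, so edits to the files show up the next time you run your code:
import os

import sample

# prints a path inside your repository checkout, not site-packages
print(os.path.dirname(sample.__file__))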