Unable to import module 'lambda_function': No module named 'flatten_json'

Unable to import module 'lambda_function': No module named 'flatten_json' - python

Gettting the below error while running the lambda code , I am using the library called
from flatten_json import flatten
I tried to look for a lambda layer , but did not find any online , please let me know if any one used this before or suggest any alternative

flatten_json library is missing.
Use pip install flatten_json to get it

There are four steps you need to do:
Download the dependency.
Package it in a ZIP file.
Create a new layer in AWS.
Associate the layer with your Lambda.
My answer will focus on 1. and 2. as they are what is most important to your problem. Unfortunately, packaging Python dependencies can be a bit more complicated than for other runtimes.
The main issue is that some dependencies use C code under the hood, especially performance critical libraries, for example for Machine Learning etc.
C code needs to be compiled and if you run pip install on your machine the code will be compiled for your computer. AWS Lambdas use a linux kernel and amd64 architecture. So if you are running pip install on a Linux machine with AMD or Intel processor, you can indeed just use pip install. But if you use macOS or Windows, your best bet is Docker.
Without Docker
pip install --target python flatten_json
zip -r layer.zip python
With Docker
The lambci project provides great Docker container for building and running Lambdas. In the following example I am using their build-python3.8 image.
docker run --rm -v $(pwd):/var/task lambci/lambda:build-python3.8 pip install --target python flatten_json
zip -r layer.zip python
Be aware that $(pwd) is meant to be your current directoy. On macOS and WSL this should work, but if it does not work you can just replace it with the absolute path to your current directory.
Explanation
Those commands will install the dependency into a target folder called python. The name is important, because it is one of two folders of a layer where Lambda looks for dependencies.
The python folder is than archived recursively (-r) in a file called layer.zip.
Your next step is to create a new Layer in AWS and associated your function with that layer.

There are two options to choose from
Option 1) You can use a deployment package to deploy your function code to Lambda.
The deployment package (For e.g zip) will contain your function's code and any dependencies used to run the function's code.
Hence, you can package flatten_json as your code to the Lambda.
Check Creating a function with runtime dependencies page in aws documentation, it explains the use-case of having requests library. In your scenario, the library would be flatten_json
Option 2) Create a layer that has the library dependencies you need, in your case just flatten_json. And then attach that layer to your Lambda.
Check creating and sharing lambda layers guide provided by AWS.
How to decide between 1) and 2)?
Use Option 1) when you just need the dependencies in just that one Lambda. No need to create an extra step of creating a layer.
Layers are useful if you have some common code that you want to share across different Lambdas. So if you need the library accessible in other Lambdas as well, then it's good to have a layer[Option 2)] that can be attached to different lambdas.

You can do this is in a Lambda if you don´t want to create the layer. Keep in mind it will run slower since it has to install the library in every run:
import sys
import subprocess
subprocess.call('pip install flatten_json -t /tmp/ --no-cache-dir'.split(), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
sys.path.insert(1, '/tmp/')
import flatten_json

Related

aws_cdk python error: Unzipped size must be smaller than 262144000 bytes

I use CDK to deploy a lambda function that uses several python modules.
But I got the following error at the deployment.
Unzipped size must be smaller than 262144000 bytes (Service: AWSLambdaInte
rnal; Status Code: 400; Error Code: InvalidParameterValueException;
I have searched following other questions, related to this issue.
question1
question2
But they focus on serverless.yaml and don't solve my problem.
Is there any way around for this problem?
Here is my app.py for CDK.
from aws_cdk import (
aws_events as events,
aws_lambda as lam,
core,
)
class MyStack(core.Stack):
def __init__(self, app: core.App, id: str) -> None:
super().__init__(app, id)
layer = lam.LayerVersion(
self, "MyLayer",
code=lam.AssetCode.from_asset('./lib'),
);
makeQFn = lam.Function(
self, "Singleton",
function_name='makeQ',
code=lam.AssetCode.from_asset('./code'),
handler="makeQ.main",
timeout=core.Duration.seconds(300),
layers=[layer],
runtime=lam.Runtime.PYTHON_3_7,
)
app = core.App()
MyStack(app, "MS")
app.synth()
In ./lib directory, I put python modules like,
python -m pip install numpy -t lib/python

Edit: Better method!
Original:
There is now an experimental package, aws_lambda_python_alpha which you can use to automatically bundle packages listed in a requirements.txt, but I'm unfortunately still running into the same size issue. I'm thinking now to try layers.
For anyone curious, here's a sample of bundling using aws_lambda_python_alpha:
from aws_cdk import aws_lambda_python_alpha as _lambda_python
self.prediction_lambda = _lambda_python.PythonFunction(
scope=self,
id="PredictionLambda",
# entry points to the directory
entry="lambda_funcs/APILambda",
# index is the file name
index="API_lambda.py",
# handler is the function entry point name in the lambda.py file
handler="handler",
runtime=_lambda.Runtime.PYTHON_3_9,
# name of function on AWS
function_name="ExampleAPILambda",
)

I would suggest checking out the aws-cdk-lambda-asset project, which will help bundle internal project dependencies stored in a requirements.txt file. How this works is, it installs dependencies specified in the requirements file to a local folder, then bundles it up in a zip file which is then used for CDK deployment.
For non-Linux environments like Windows/Mac, it will install the dependencies in a Docker image, so first ensure that you have Docker installed and running on your system.
Note that the above code seems to use poetry, which is a dependency management tool. I have no idea what poetry does or why it is used in lieu of a setup.py file. Therefore, I've created a slight modification in a gist here, in case this is of interest; this will install all local dependencies using a regular pip install command instead of the poetry tool, which I'm not too familiar with.

Thanks a lot.
In my case, the issue is solved just by removing all __pycache__ in the local modules before the deployment.
I hope the situation will be improved and we only have to upload requirements.txt instead of preparing all the modules locally.

Nope, there is no way around that limit in single setup.
What you have to do instead is install your dependencies into multiple zips that become multiple layers.
Basically, install several dependencies to a python folder. Then zip that folder into something like intergration_layer. Clear the python folder and install the next set and name it something else. Like data_manipilation.
Then you have two layers in cdk (using aws_lambda.LayerVersion) and add those layers to each lambda. You'll have to break the layers up to be small enough.
You can use a makefile to generate the layers automatically and then tie the makefile, cdk deploy, and some clean up inside a bash script yo tie them all together.
Note. You are still limited on space with layers and limited to 5 layers. If your dependencies outgrow that then look into Elastic File System. You can install dependencies there and tie that single EFS to any lambda to reference them through PYTHONPATH manipulation built into the EFS lambda connection
Basic makefile idea (but not exact for CDK, but still close)
(and if you never have, a good tutorial on how to use a makefile)
CDK documentation for Layers (use from AssetCode for the location)
AWS dev blog on EFS for Lambdas

How to use AWS Lambda layer using Python?

I have a simple Lambda function which is using the numpy library,
I have set up a virtual environment in my local, and my code is able to fetch and use the library locally.
I tried to use AWS Lambda's layer, and zipped the venv folder and uploaded to the layer,
Then I attached the correct layer and version to my function,
But the function is not able to fetch the library
Following is the code which works fine on local -
import numpy as np
def main(event, context):
a = np.array([1, 2, 3])
print("Your numpy array:")
print(a)
Following is the venv structure which I zipped and uploaded -
I get the following error -
{
"errorMessage": "Unable to import module 'handler': No module named 'numpy'",
"errorType": "Runtime.ImportModuleError"
}
My Lambda deployment looks like this -
I'm trying to refer this -
https://towardsdatascience.com/introduction-to-amazon-lambda-layers-and-boto3-using-python3-39bd390add17

I've seen that a few libraries like numpy and pandas don't work in Lambda when installed using pip. I have had success using the .whl package files for these libraries to create the Lambda layer. Refer to the steps below:
NOTE: These steps set up the libraries specific to the Python 3.7 runtime. If using any other version, you would need to download the .whl files corresponding to that Python version.
Create an EC2 instance using Amazon Linux AMI and SSH into this instance. We should create our layer in Amazon Linux AMI as the Lambda Python 3.7 runtime runs on this operating system (doc).
Make sure this instance has Python3 and "pip" tool installed.
Download the numpy .whl file for the cp37 Python version and the manylinux1_x86_64 OS by executing the below command:
$ wget https://files.pythonhosted.org/packages/d6/c6/58e517e8b1fb192725cfa23c01c2e60e4e6699314ee9684a1c5f5c9b27e1/numpy-1.18.5-cp37-cp37m-manylinux1_x86_64.whl
Skip to the next step if you're not using pandas. Download the pandas .whl file for the cp37 Python version and the manylinux1_x86_64 OS by executing the below command:
$ wget https://files.pythonhosted.org/packages/a4/5f/1b6e0efab4bfb738478919d40b0e3e1a06e3d9996da45eb62a77e9a090d9/pandas-1.0.4-cp37-cp37m-manylinux1_x86_64.whl
Next, we will create a directory named "python" and unzip these files into that directory:
$ mkdir python
$ unzip pandas-1.0.4-cp37-cp37m-manylinux1_x86_64.whl -d python/
$ unzip numpy-1.18.5-cp37-cp37m-manylinux1_x86_64.whl -d python/
We also need to download "pytz" library to successfully import numpy and pandas libraries:
$ pip3 install -t python/ pytz
Next, we would remove the “*.dist-info” files from our package directory to reduce the size of the resulting layer.
$ cd python
$ sudo rm -rf *.dist-info
This will install all the required libraries that we need to run pandas and numpy.
Zip the current "python" directory and upload it to your S3 bucket. Ensure that the libraries are present in the hierarchy as given here.
$ cd ..
$ zip -r lambda-layer.zip python/
$ aws s3 cp lambda-layer.zip s3://YOURBUCKETNAME
The "lambda-layer.zip" file can then be used to create a new layer from the Lambda console.

Base on aws lamda layer doc, https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html your zip package for the layer must have this structure.
my_layer.zip
| python/numpy
| python/numpy-***.dist-info
So what you have to do is create a folder python, and put the content of site-packages inside it, then zip up that python folder. I tried this out with a simple package and it seem to work fine.
Also keep in mind, some package require c/c++ compilation, and for that to work you must install and package on a machine with similar architecture to lambda. Usually you would need to do this on an EC2 where you install and package where it have similar architecture to the lambda.

That's bit of misleading question, because you at least did not mention you use serverless. I found it going through the snapshot of you project structure you provided. That means you probably use serverless for deployment of your project within AWS provider.
Actually, there are multiple ways you can arrange lambda layer. Let's have a look at each of them.
Native AWS
Once you will navigate to Add a layer, you will find 3 options:
[AWS Layers, Custom Layers, Specify an ARN;].
Specify an ARN Guys, who did all work for you: KLayers
so, you need numpy, okay. Within lambda function navigate to the layers --> create a new layer --> out of 3 options, choose Specify an ARN and as the value put: arn:aws:lambda:eu-west-1:770693421928:layer:Klayers-python38-numpy:12.
It will solve your problem and you will be able to work with numpy Namespace.
Custom Layers
Choose a layer from a list of layers created by your AWS account or organization.
For custom layers the way of implementing can differ based on your requirements in terms of deployment.
If are allowed to accomplish things manually, you should have a glimpse at following Medium article. I assume it will help you!
AWS Layers
As for AWS pre-build layers, all is simple.
Layers provided by AWS that are compatible with your function's runtime.
Can differentiate between runtimes
For me I have list of: Perl5, SciPy, AppConfig Extension
Serverless
Within serverless things are much easier, because you can define you layers directly with lambda definition in serverless.yml file. Afterwards, HOW to define them can differ as well.
Examples can be found at: How to publish and use AWS Lambda Layers with the Serverless Framework
If you will have any questions, feel free to expand the discussion.
Cheers!

Load Python Packages In Lambda During Runtime

I am trying to get Gensim on AWS lambda but after trying all the file reduction techniques (https://github.com/robertpeteuil/build-lambda-layer-python) to try to create layers it still does not fit. So I decided to try to load the packages during runtime of the lambda function as our function is not under a heavy time constraint.
So I first looked at uploaded a venv to S3 and then downloading and activating it from a script following (Activate a virtualenv with a Python script) using the 2nd block of the top rated answer. However, it turned out that the linked script was for python 2 so I looked up the python 3 version (making sure to copy an activiate_this.py from a virtualenv to the normal venv bin since the standard venv package doesn't include one)
activator = "/Volumes/SD.Card/Machine_Learning/lambda/bin/activate_this.py"
with open(activator) as f:
exec(f.read(), {'__file__': activator})
import numpy
After running this script to the target venv with numpy I am still getting a no module found error. I cannot find a good resource for how to do this properly. So I guess my question is: what is the best way to load packages during lambda runtime and how does one carry that out?

Unable to install cvxpy into virtualenv for AWS lambda

I am trying to run the cvxpy package in an AWS lambda function. This package isn't in the SDK, so I've read that I'll have to compile the dependencies into a zip, and then upload the zip into the lambda function.
I've done some research and tried out the links below, but when I try to pip install cvxpy I get error messages - I'm on a Windows box, but I know that AWS Lambda runs on Linux.
Appreciate the help!
http://i-systems.github.io/HSE545/machine%20learning%20all/cvxpy_install/CVXPY%2BInstallation%2BGuide%2Bfor%2BWindows.html
https://programwithus.com/learn-to-code/Pip-and-virtualenv-on-Windows/
https://medium.com/#manivannan_data/import-custom-python-packages-on-aws-lambda-function-5fbac36b40f8
https://www.cvxpy.org/install/index.html

For installing cvxpy on windows it requires c++ build tools (please refer: https://buildmedia.readthedocs.org/media/pdf/cvxpy/latest/cvxpy.pdf)
On Windows:
I created a lambda layer python directory structure python/lib/python3.7/site-packages (refer: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html) and installed my pip packages in that site-packages directory.
pip install cvxpy --target python/lib/python3.7/site-packages
Then, I zipped the python/lib/python3.7/site-packages as cvxpy_layer.zip and uploaded it to an S3 bucket (layer zipped file max limit is only 50 MB https://docs.aws.amazon.com/lambda/latest/dg/limits.html), to attach it to my lambda layers.
Now, the layer is ready but the lambda is failing to import the packages as they were installed on a windows machine. (refer: AWS Lambda - unable to import module 'lambda_function')
On Linux:
I created the same directory structure as earlier python/lib/python3.7/site-packages and installed the cvxpy and zipped it as shown below.
Later I uploaded the zip file to an S3 bucket and created a new lambda layer.
Attaching that lambda layer to my lambda function, I colud able to resolve the import issues failing earlier and run the basic cvxpy program on lambda.
mkdir -p alley/python/lib/python3.7/site-packages
pip install cvxpy --target alley/python/lib/python3.7/site-packages
cd alley
zip -rqvT cvxpy_layer.zip .
Lambda layer Image:
Lambda function execution:

You can wrap all your dependencies along with lambda source into a single zipfile and deploy it. Doing this, you will end up having additional repetitive code in multiple lambda functions. Suppose, if more than one of your lambda functions needs the same package cvxpy, you will have to package it twice for both the functions individually.
Instead a better option would be to try Labmda Layers, where you put all your dependencies into a package and deploy a layer into your Lambda. Then attach that layer to your function to fetch its dependencies from there. The layers can even be versioned. :)
Please refer the below links:
https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html
https://dev.to/vealkind/getting-started-with-aws-lambda-layers-4ipk

Using numpy in AWS Lambda

I'm looking for a work around to use numpy in AWS lambda. I am not using EC2 just lambda for this so if anyone has a suggestion that'd be appreciated. Currently getting the error:
cannot import name 'multiarray'
Using grunt lambda to create the zip file and upload the function code. All the modules that I use are installed into a folder called python_modules inside the root of the lambda function which includes numpy using pip install and a requirements.txt file.

An easy way to make your lambda function support the numpy library for python 3.7:
Go to your lambda function page
Find the Layers section at the bottom of the page.
Click on Add a layer.
Choose AWS layers as layer source.
Select AWSLambda-Python37-Scipy1x as AWS layers.
Select 37 for version.
And finally click on Add.
Now your lambda function is ready to support numpy.

Updated to include the solution here, rather than a link:
After much effort, I found that I had to create my deployment package from within a python3.6 virtualenv, rather than directly from the host machine. I did the following within a Ubuntu 16.04 docker image. This assumes that you have python3.6, virtualenv and awscli already installed/configured, and that your lambda function code is in the ~/lambda_code directory:
1) cd ~ (We'll build the virtualenv in the home directory)
2) virtualenv venv --python=python3.6 (Create the virtual environment)
3) source venv/bin/activate (Activate the virtual environment)
4) pip install numpy
5) cp -r ~/venv/lib/python3.6/site-packages/* ~/lambda_code (Copy all installed packages into root level of lambda_code directory. This will include a few unnecessary files, but you can remove those yourself if needed)
6) cd ~/lambda_code
7) zip -r9 ~/package.zip . (Zip up the lambda package)
8) aws lambda update-function-code --function-name my_lambda_function --zip-file fileb://~/package.zip (Upload to AWS)
Your lambda function should now be able to import numpy with no problems.
If you want a more out-of-the-box solution, you could consider using serverless to deploy your lambda function. Before I found the above solution, I followed the guide here and was able to run numpy successfully in a python3.6 lambda function.

As of 2018 it's best to just use the inbuilt layers functionality.
AWS have actually released a pre-made one with numpy in it: https://aws.amazon.com/blogs/aws/new-for-aws-lambda-use-any-programming-language-and-share-common-components/

I was unable to find a good solution using serverless plugins, but I did find a good way with layers. See Serverless - Numpy - Unable to find good bind path format

Add numpy layer in this way:
Go on your lambda function
select add a new layer
add it using this arn: arn:aws:lambda:eu-central-1:770693421928:layer:Klayers-p39-numpy:7
(change your zone if you are not in eu-central-1)
Let me know if it will work

I would add this answer as well: https://stackoverflow.com/a/52508839/1073691
Using pipenv includes all of the needed .so files as well.

1.) Do a Pip install of numpy to a folder on your local machine.
2.) once complete, zip the entire folder and create a zip file.
3.) Go to AWS lambda console, create a layer and upload zip file created in step 2 there and save the layer.
4.) After you create your lambda function, click add layer and add the layer you created. That's it, import numpy will start working.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.