I have Python 3.8. I created a new folder and installed pandas into it using
pip3 install pandas -t .
Next, I zipped the folder (the zip file is 38 MB) and uploaded it to S3.
Created a layer and pointed it at the S3 path of the zip file (also set the runtime to Python 3.8 here).
Next I created a Lambda function with Python 3.8 and tested its skeleton first, and it worked.
Added the layer to the Lambda function and imported pandas. Now when I run the test, it does not detect pandas and gives me a ModuleNotFoundError.
What could I be doing wrong here?
Did you create a folder named python and put all the package files inside it? Also, another supporting package, pytz, is required to run pandas in Lambda. I use macOS to create the zip file, but the root folder still needs to be named "python".
The creation of the zip file is a little lengthy to describe here; I suggest you go through this document. The naming convention is equally important. If you follow the document, create the zip file, and upload it directly (or via S3) to create the Lambda layer, it will definitely work.
I just uploaded a .zip file to AWS Lambda with all the needed packages. It ran all right on my Mac using a virtual environment with Python 3.8. The AWS Lambda function also has Python 3.8. But when I run in AWS Lambda I get this error:
No module named 'numpy.core._multiarray_umath'
I have changed the numpy version (1.20.2) to other versions like 1.19.1 and 1.18.5, but the problem persists.
I am also using spacy 3.0.6 and fastapi 0.63.0.
When I encountered the same issue, these steps worked for me:
1- Download the required packages (you may need different versions):
- pandas-1.4.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- python_dateutil-2.8.2-py2.py3-none-any.whl
- pytz-2022.1-py2.py3-none-any.whl
- numpy-1.21.5-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
- If you need others ...
2- Create a project folder and unzip the .whl files into the folder.
3- Remove *dist-info folders.
4- Add your source code to the folder (lambda_function.py).
5- Zip the folder and upload to Lambda as a source code zip file.
These links may also help you:
https://korniichuk.medium.com/lambda-with-pandas-fd81aa2ff25e
https://github.com/numpy/numpy/issues/13465#issuecomment-545378314
I am having an issue importing a common util file in my AWS Lambda. It is a Python file, and the folder structure looks something like this:
(functions folder)
    common_util.py
(lambda 1 folder)
    lambda1
(lambda 2 folder)
    lambda2
I need to access common_util from both of these lambdas. When I run my CDK project locally this is easy: I use something like .. in the import statement to tell Python the file is one directory up:
from ..common_util import (...)
When I deploy to AWS as a Lambda (I package all of the above), I need to specify the import without the .., because this is the root folder of the lambda:
from common_util import (...)
I need an import statement or a solution that will work for both my CDK project and the lambda.
Here is the CDK where the lambda is created:
const noteIntegrationLambda = new Function(this as any, "my-lambda", {
  functionName: "my-lambda",
  runtime: StackConfiguration.PYTHON_VERSION,
  handler: "my_lambda.execute",
  timeout: Duration.seconds(15),
  code: Code.fromAsset("functions/"),
  role,
  layers: [dependencyLayer],
  environment: env,
});
Lambda layers provide an ideal mechanism for solving this problem. As mentioned in https://medium.com/@manojf/sharing-code-among-lambdas-using-lambda-layers-ca097c8cd500,
Lambda layers allow us to share code among lambda functions. We just
have to upload the layer once and reference it in any lambda function.
So consider deploying your common code via a layer. That said, to structure your code, I recommend you create a common package that you install locally using pip install, as outlined at Python how to share package between multiple projects. Then you put that package into a layer that both of your lambdas reference. That completely solves the problem of how to structure code when your local file structure differs from the lambda file structure.
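To see why the same top-level import works once the common code ships in a layer: the runtime extracts layers under /opt and puts /opt/python on sys.path, so `from common_util import ...` resolves. A tiny sketch simulating that mechanism locally (paths and names here are illustrative, not from the question's project):

```python
import pathlib
import sys
import tempfile

# Simulate a layer's /opt/python directory containing common_util.py.
layer_python = pathlib.Path(tempfile.mkdtemp()) / "python"
layer_python.mkdir()
(layer_python / "common_util.py").write_text("GREETING = 'hello from the layer'\n")

# Lambda puts the layer's python/ directory on sys.path; we do the same by hand.
sys.path.insert(0, str(layer_python))

import common_util
print(common_util.GREETING)
```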
Also consider these resources:
Import a python module in multiple AWS Lambdas
What is the proper way to work with shared modules in Python development?
https://realpython.com/absolute-vs-relative-python-imports/
Python: sharing common code among a family of scripts
Sharing code in AWS Lambda
Installing Python packages from local file system folder to virtualenv with pip
As a layer example, suppose you wanted to include a "common_utils" library for your lambdas to reference. To make a layer, you would create a directory structure containing that code, then zip the entire directory. It may look as follows:
/python
/common_utils
__init__.py
common_util.py
...
When zipped, the zip file must have the "python" folder at its root, with your code inside it. If you do this and also install your common code as a package, you can import it in your local code and in your lambdas using the same import statement.
What I do is use pip install to install to a certain file location, the location that I then zip into a layer. For example, to make a layer for the pymysql library I might run:
pip install --target=c:\myLayers\python pymysql
That installs the library files into the location I specified, which makes it easy to know what to zip up (just create a zip that includes the "python" directory).
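That step can be wrapped in a small helper. This is just a sketch (the names are mine), separating command construction from execution so the command itself can be inspected:

```python
import subprocess
import sys

def pip_target_command(package, target_dir):
    """Build the pip invocation that drops a package's files into
    target_dir, i.e. the directory that becomes the layer's 'python' folder."""
    return [sys.executable, "-m", "pip", "install", "--target", target_dir, package]

def install_into_layer_dir(package, target_dir):
    # Actually runs pip, so it needs network access.
    subprocess.run(pip_target_command(package, target_dir), check=True)
```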
I know this question is old, but I ran into a similar issue. My solution was to detect whether the current environment is local or Lambda using the os package, and then import differently based on the environment. I'll leave it here as a reference.
import os

if os.environ.get("AWS_EXECUTION_ENV") is not None:
    # For use in the Lambda function
    from package_a import class_a
else:
    # For local use
    from ...package_a import class_a
Credits to: How to check if Python app is running within AWS lambda function?
I have a simple Lambda function which uses the numpy library.
I set up a virtual environment locally, and my code is able to fetch and use the library.
I tried to use an AWS Lambda layer: I zipped the venv folder and uploaded it to the layer,
then attached the correct layer and version to my function,
but the function is not able to find the library.
Following is the code which works fine on local -
import numpy as np

def main(event, context):
    a = np.array([1, 2, 3])
    print("Your numpy array:")
    print(a)
The following is the venv structure that I zipped and uploaded -
I get the following error -
{
"errorMessage": "Unable to import module 'handler': No module named 'numpy'",
"errorType": "Runtime.ImportModuleError"
}
My Lambda deployment looks like this -
I'm referring to this -
https://towardsdatascience.com/introduction-to-amazon-lambda-layers-and-boto3-using-python3-39bd390add17
I've seen that a few libraries like numpy and pandas don't work in Lambda when installed using pip. I have had success using the .whl package files for these libraries to create the Lambda layer. Refer to the steps below:
NOTE: These steps set up the libraries specific to the Python 3.7 runtime. If using any other version, you would need to download the .whl files corresponding to that Python version.
Create an EC2 instance using the Amazon Linux AMI and SSH into it. We should create our layer on Amazon Linux, as the Lambda Python 3.7 runtime runs on this operating system (doc).
Make sure this instance has Python3 and "pip" tool installed.
Download the numpy .whl file for the cp37 Python version and the manylinux1_x86_64 OS by executing the below command:
$ wget https://files.pythonhosted.org/packages/d6/c6/58e517e8b1fb192725cfa23c01c2e60e4e6699314ee9684a1c5f5c9b27e1/numpy-1.18.5-cp37-cp37m-manylinux1_x86_64.whl
Skip to the next step if you're not using pandas. Download the pandas .whl file for the cp37 Python version and the manylinux1_x86_64 OS by executing the below command:
$ wget https://files.pythonhosted.org/packages/a4/5f/1b6e0efab4bfb738478919d40b0e3e1a06e3d9996da45eb62a77e9a090d9/pandas-1.0.4-cp37-cp37m-manylinux1_x86_64.whl
Next, we will create a directory named "python" and unzip these files into that directory:
$ mkdir python
$ unzip pandas-1.0.4-cp37-cp37m-manylinux1_x86_64.whl -d python/
$ unzip numpy-1.18.5-cp37-cp37m-manylinux1_x86_64.whl -d python/
We also need to download the "pytz" library to successfully import the numpy and pandas libraries:
$ pip3 install -t python/ pytz
Next, we remove the "*.dist-info" files from our package directory to reduce the size of the resulting layer.
$ cd python
$ sudo rm -rf *.dist-info
The "python" directory now contains all the libraries we need to run pandas and numpy.
Zip the current "python" directory and upload it to your S3 bucket. Ensure that the libraries are present in the hierarchy as given here.
$ cd ..
$ zip -r lambda-layer.zip python/
$ aws s3 cp lambda-layer.zip s3://YOURBUCKETNAME
The "lambda-layer.zip" file can then be used to create a new layer from the Lambda console.
Based on the AWS Lambda layers doc, https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html, your zip package for the layer must have this structure:
my_layer.zip
| python/numpy
| python/numpy-***.dist-info
So what you have to do is create a folder named python, put the contents of site-packages inside it, then zip up that python folder. I tried this out with a simple package and it seems to work fine.
Also keep in mind that some packages require C/C++ compilation, and for that to work you must install and package them on a machine with an architecture similar to Lambda's. Usually you would do this on an EC2 instance whose environment matches the Lambda runtime.
That's a bit of a misleading question, because you did not mention that you use Serverless. I found out by going through the snapshot of your project structure that you provided. That means you probably use Serverless for deployment of your project within the AWS provider.
Actually, there are multiple ways you can arrange lambda layer. Let's have a look at each of them.
Native AWS
Once you navigate to Add a layer, you will find 3 options:
AWS layers, Custom layers, and Specify an ARN.
Specify an ARN
The folks at KLayers have done all the work for you.
So, you need numpy? Okay. Within the Lambda function, navigate to Layers --> Add a layer --> out of the 3 options, choose Specify an ARN and as the value put: arn:aws:lambda:eu-west-1:770693421928:layer:Klayers-python38-numpy:12.
It will solve your problem, and you will be able to work with the numpy namespace.
Custom Layers
Choose a layer from a list of layers created by your AWS account or organization.
For custom layers the way of implementing can differ based on your requirements in terms of deployment.
If you're allowed to do things manually, have a look at the following Medium article. I assume it will help you!
AWS Layers
As for the AWS pre-built layers, all is simple:
Layers provided by AWS that are compatible with your function's runtime.
The list differs between runtimes; for me it shows: Perl5, SciPy, AppConfig Extension.
Serverless
With Serverless, things are much easier, because you can define your layers directly alongside the lambda definition in the serverless.yml file. How to define them can differ as well.
Examples can be found at: How to publish and use AWS Lambda Layers with the Serverless Framework
If you have any questions, feel free to expand the discussion.
Cheers!
I am trying to load, process and write Parquet files in S3 with AWS Lambda. My testing / deployment process is:
https://github.com/lambci/docker-lambda as a container to mock the Amazon environment, because of the native libraries (numpy among others) that need to be installed.
This procedure to generate a zip file: http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example-deployment-pkg.html#with-s3-example-deployment-pkg-python
Add a test Python function to the zip, send it to S3, update the Lambda, and test it.
It seems that there are two possible approaches, both of which work locally in the docker container:
fastparquet with s3fs: Unfortunately the unzipped size of the package is bigger than 256MB and therefore I can't update the Lambda code with it.
pyarrow with s3fs: I followed https://github.com/apache/arrow/pull/916 and when executed with the lambda function I get either:
If I prefix the URI with S3 or S3N (as in the code example): In the Lambda environment OSError: Passed non-file path: s3://mybucket/path/to/myfile in pyarrow/parquet.py, line 848. Locally I get IndexError: list index out of range in pyarrow/parquet.py, line 714
If I don't prefix the URI with S3 or S3N: It works locally (I can read the parquet data). In the Lambda environment, I get the same OSError: Passed non-file path: s3://mybucket/path/to/myfile in pyarrow/parquet.py, line 848.
My questions are :
why do I get a different result in my docker container than I do in the Lambda environment?
what is the proper way to give the URI?
is there an accepted way to read Parquet files in S3 through AWS Lambda?
Thanks!
AWS has a project (AWS Data Wrangler) that enables this, with full Lambda layers support.
In the Docs there is a step-by-step to do it.
Code example:
import awswrangler as wr

# Write
wr.s3.to_parquet(
    dataframe=df,
    path="s3://...",
    dataset=True,
    database="my_database",  # Optional, only if you want it available in the Athena/Glue Catalog
    table="my_table",
    partition_cols=["PARTITION_COL_NAME"],
)

# Read
df = wr.s3.read_parquet(path="s3://...")
Reference
I was able to accomplish writing parquet files into S3 using fastparquet. It's a little tricky, but my breakthrough came when I realized that to put together all the dependencies, I had to use the exact same Linux that Lambda uses.
Here's how I did it:
1. Spin up an EC2 instance using the Amazon Linux image that is used with Lambda
Source:
https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html
Linux image:
https://console.aws.amazon.com/ec2/v2/home#Images:visibility=public-images;search=amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2
Note: you might need to install many packages and change the Python version to 3.6, as this Linux image is not meant for development. Here's how I looked for packages:
sudo yum list | grep python3
I installed:
python36.x86_64
python36-devel.x86_64
python36-libs.x86_64
python36-pip.noarch
python36-setuptools.noarch
python36-tools.x86_64
2. Used the instructions from here to build a zip file with all of the dependencies my script would use: dump them all in a folder, then zip them with these commands:
mkdir parquet
cd parquet
pip install -t . fastparquet
pip install -t . (any other dependencies)
copy my python file in this folder
zip and upload into Lambda
Note: there are some constraints I had to work around: Lambda doesn't let you upload a zip larger than 50 MB or an unzipped package larger than 260 MB. If anyone knows a better way to get dependencies into Lambda, please do share.
Source:
Write parquet from AWS Kinesis firehose to AWS S3
This was an environment issue (Lambda in VPC not getting access to the bucket). Pyarrow is now working.
Hopefully the question itself will give a good-enough overview on how to make all that work.
One can also achieve this through the AWS SAM CLI and Docker (we'll explain this requirement later).
1. Create a directory and initialize sam
mkdir some_module_layer
cd some_module_layer
sam init
After typing the last command, a series of three questions will be prompted. One could choose the following answers (I'm working under Python 3.7, but other options are possible):
1 - AWS Quick Start Templates
8 - Python 3.7
Project name [sam-app]: some_module_layer
1 - Hello World Example
2. Modify requirements.txt file
cd some_module_layer
vim hello_world/requirements.txt
This opens the requirements.txt file in vim; on Windows you could instead type code hello_world/requirements.txt to edit the file in Visual Studio Code.
3. Add pyarrow to requirements.txt
Alongside pyarrow, it also helps to include pandas and s3fs. Including pandas avoids the problem of pyarrow not being recognized as an engine for reading parquet files.
pandas
pyarrow
s3fs
4. Build with a container
Docker is required to use the option --use-container when running the sam build command. If it's the first time, it will pull the lambci/lambda:build-python3.7 Docker image.
sam build --use-container
rm .aws-sam/build/HelloWorldFunction/app.py
rm .aws-sam/build/HelloWorldFunction/__init__.py
rm .aws-sam/build/HelloWorldFunction/requirements.txt
Notice that we're keeping only the Python libraries.
5. Zip files
cp -r .aws-sam/build/HelloWorldFunction/ python/
zip -r some_module_layer.zip python/
On Windows, it would work to run Compress-Archive python/ some_module_layer.zip.
6. Upload zip file to AWS
The following link is useful for this.
I'm looking for a workaround to use numpy in AWS Lambda. I am not using EC2, just Lambda, so if anyone has a suggestion, that'd be appreciated. Currently getting the error:
cannot import name 'multiarray'
Using grunt-lambda to create the zip file and upload the function code. All the modules that I use are installed into a folder called python_modules inside the root of the lambda function; this includes numpy, installed using pip install and a requirements.txt file.
An easy way to make your lambda function support the numpy library for python 3.7:
Go to your lambda function page
Find the Layers section at the bottom of the page.
Click on Add a layer.
Choose AWS layers as layer source.
Select AWSLambda-Python37-Scipy1x as AWS layers.
Select 37 for version.
And finally click on Add.
Now your lambda function is ready to support numpy.
Updated to include the solution here, rather than a link:
After much effort, I found that I had to create my deployment package from within a python3.6 virtualenv, rather than directly from the host machine. I did the following within an Ubuntu 16.04 docker image. This assumes you have python3.6, virtualenv, and awscli already installed/configured, and that your lambda function code is in the ~/lambda_code directory:
1) cd ~ (We'll build the virtualenv in the home directory)
2) virtualenv venv --python=python3.6 (Create the virtual environment)
3) source venv/bin/activate (Activate the virtual environment)
4) pip install numpy
5) cp -r ~/venv/lib/python3.6/site-packages/* ~/lambda_code (Copy all installed packages into root level of lambda_code directory. This will include a few unnecessary files, but you can remove those yourself if needed)
6) cd ~/lambda_code
7) zip -r9 ~/package.zip . (Zip up the lambda package)
8) aws lambda update-function-code --function-name my_lambda_function --zip-file fileb://~/package.zip (Upload to AWS)
Your lambda function should now be able to import numpy with no problems.
If you want a more out-of-the-box solution, you could consider using serverless to deploy your lambda function. Before I found the above solution, I followed the guide here and was able to run numpy successfully in a python3.6 lambda function.
As of 2018, it's best to just use the built-in layers functionality.
AWS has actually released a pre-made one with numpy in it: https://aws.amazon.com/blogs/aws/new-for-aws-lambda-use-any-programming-language-and-share-common-components/
I was unable to find a good solution using serverless plugins, but I did find a good way with layers. See Serverless - Numpy - Unable to find good bind path format
Add numpy layer in this way:
Go on your lambda function
select add a new layer
add it using this arn: arn:aws:lambda:eu-central-1:770693421928:layer:Klayers-p39-numpy:7
(change your zone if you are not in eu-central-1)
Let me know if it works.
I would add this answer as well: https://stackoverflow.com/a/52508839/1073691
Using pipenv includes all of the needed .so files as well.
1.) Do a pip install of numpy to a folder on your local machine.
2.) Once complete, zip the entire folder to create a zip file.
3.) Go to the AWS Lambda console, create a layer, upload the zip file created in step 2 there, and save the layer.
4.) After you create your lambda function, click add layer and add the layer you created. That's it; import numpy will start working.