I am trying to get Gensim onto AWS Lambda, but even after applying all the file-reduction techniques (https://github.com/robertpeteuil/build-lambda-layer-python) to create layers, it still does not fit. So I decided to try loading the packages during the runtime of the Lambda function, as our function is not under a heavy time constraint.
So I first looked at uploading a venv to S3 and then downloading and activating it from a script, following (Activate a virtualenv with a Python script) using the 2nd block of the top-rated answer. However, it turned out that the linked script was for Python 2, so I looked up the Python 3 version (making sure to copy an activate_this.py from a virtualenv into the normal venv's bin directory, since the standard venv package doesn't include one):
activator = "/Volumes/SD.Card/Machine_Learning/lambda/bin/activate_this.py"
with open(activator) as f:
    exec(f.read(), {'__file__': activator})
import numpy
After running this script against the target venv (which has numpy installed), I am still getting a ModuleNotFoundError. I cannot find a good resource on how to do this properly. So I guess my question is: what is the best way to load packages during Lambda runtime, and how does one carry that out?
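One alternative I am considering is to skip the venv activation entirely: download the packages from S3 into /tmp (the only writable path in Lambda) and prepend that directory to sys.path. Something along these lines (an untested sketch; the bucket and key names are placeholders):
import sys
import zipfile
import boto3  # available by default in the Lambda runtime

# download a pre-built site-packages archive from S3 (placeholder names)
s3 = boto3.client('s3')
s3.download_file('my-bucket', 'site-packages.zip', '/tmp/site-packages.zip')

# unpack it into /tmp and make it importable
with zipfile.ZipFile('/tmp/site-packages.zip') as zf:
    zf.extractall('/tmp/site-packages')
sys.path.insert(0, '/tmp/site-packages')

import numpy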
I’ve tried to install fenics and use the repository of the paper “Hybrid FEM-NN models: Combining artificial neural networks with the finite element method” to calculate a linear Physics-Informed Neural Network for a linear problem (https://github.com/sebastkm/hybrid-fem-nn-examples/tree/main/examples/pinn_linear).
I’m using Windows 11 and Python 3.10.4.
To run the script main.py I need the fenics package. As is usual when working in Python, I ran
pip install fenics
which worked without any problems. Trying to run the script prompted the error
from fenics import *
ModuleNotFoundError: No module named 'fenics'
After reading a couple of posts on this issue, I made sure that there are no other virtual environments anymore and that sys.path contains the path
C:\Users\neuma\AppData\Local\Programs\Python\Python310\Lib\site-packages
which holds the installed fenics packages:
fenics_dijitso-2019.1.0.dist-info
fenics_ffc-2019.1.0.post0.dist-info
fenics_fiat-2019.1.0.dist-info
fenics_ufl-2019.1.0.dist-info
fenics-2019.1.0.dist-info
I’ve noticed that there is no folder named just fenics.
After this attempt did not work, I tried to follow the installation instructions for DOLFINx (https://docs.fenicsproject.org/dolfinx/main/python/installation.html#dependencies), since some posts mentioned that dolfinx and fenics are the same.
After installing Docker I followed the instructions for running FEniCS within Docker (https://fenics.readthedocs.io/projects/containers/en/latest/introduction.html#installing-docker). This at least seemed to work using the terminal:
C:\Users\neuma>docker run -ti quay.io/fenicsproject/stable:latest
# FEniCS stable version image
Welcome to FEniCS/stable!
This image provides a full-featured and optimized build of the stable
release of FEniCS.
To help you get started this image contains a number of demo
programs. Explore the demos by entering the 'demo' directory, for
example:
cd ~/demo/python/documented/poisson
python3 demo_poisson.py
fenics@9548d966c2fc:~$ cd ~/demo/python/documented/poisson
fenics@9548d966c2fc:~/demo/python/documented/poisson$ python3 demo_poisson.py
Calling FFC just-in-time (JIT) compiler, this may take some time.
Calling FFC just-in-time (JIT) compiler, this may take some time.
Calling FFC just-in-time (JIT) compiler, this may take some time.
Calling FFC just-in-time (JIT) compiler, this may take some time.
Calling FFC just-in-time (JIT) compiler, this may take some time.
Calling FFC just-in-time (JIT) compiler, this may take some time.
Solving linear variational problem.
To view figure, visit http://0.0.0.0:8000
Press Ctrl+C to stop WebAgg server
Visiting the URL prompted the following message:
This page is not working
0.0.0.0 has not sent any data.
ERR_EMPTY_RESPONSE
Since I seem to have tried all possible ways of installing fenics (and/or DOLFINx) and nothing worked, I want to ask here if anyone could help me with the installation.
I’m pretty confused about how to understand the difference between fenics and dolfinx, and why I need Ubuntu or Linux and Docker to run a package which already seems to be installed in Python. Maybe this screenshot will make it a little clearer:
If you need any more information, just let me know. It would be great if someone could help me out.
Oskar
In case anyone else struggles with this or similar problems, the solution was given here: https://fenicsproject.discourse.group/t/installation-problems/9041/43
Getting the below error while running the Lambda code. I am using the library like this:
from flatten_json import flatten
I tried to look for a Lambda layer, but did not find any online. Please let me know if anyone has used this before, or suggest an alternative.
flatten_json library is missing.
Use pip install flatten_json to get it
There are four steps you need to take:
Download the dependency.
Package it in a ZIP file.
Create a new layer in AWS.
Associate the layer with your Lambda.
My answer will focus on 1. and 2., as they are the most important for your problem. Unfortunately, packaging Python dependencies can be a bit more complicated than for other runtimes.
The main issue is that some dependencies use C code under the hood, especially performance-critical libraries, for example for Machine Learning etc.
C code needs to be compiled, and if you run pip install on your machine the code will be compiled for your computer. AWS Lambda uses a Linux kernel and the amd64 (x86_64) architecture. So if you are running pip install on a Linux machine with an AMD or Intel processor, you can indeed just use pip install. But if you use macOS or Windows, your best bet is Docker.
Without Docker
pip install --target python flatten_json
zip -r layer.zip python
With Docker
The lambci project provides great Docker containers for building and running Lambdas. In the following example I am using their build-python3.8 image.
docker run --rm -v $(pwd):/var/task lambci/lambda:build-python3.8 pip install --target python flatten_json
zip -r layer.zip python
Be aware that $(pwd) is meant to be your current directory. On macOS and WSL this should work, but if it does not, you can just replace it with the absolute path to your current directory.
Explanation
Those commands will install the dependency into a target folder called python. The name is important, because it is one of the two folders in a layer where Lambda looks for dependencies.
The python folder is then archived recursively (-r) into a file called layer.zip.
Your next step is to create a new layer in AWS and associate your function with that layer.
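If you want to script steps 3 and 4 as well, a rough boto3 sketch could look like this (the layer name, function name and runtime below are placeholders, and note that update_function_configuration replaces the function's whole layer list):
import boto3

client = boto3.client('lambda')

# step 3: publish the layer from the ZIP built above
with open('layer.zip', 'rb') as f:
    layer = client.publish_layer_version(
        LayerName='flatten-json',
        Content={'ZipFile': f.read()},
        CompatibleRuntimes=['python3.8'],
    )

# step 4: attach the new layer version to your Lambda
client.update_function_configuration(
    FunctionName='my-function',
    Layers=[layer['LayerVersionArn']],
)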
There are two options to choose from
Option 1) You can use a deployment package to deploy your function code to Lambda.
The deployment package (e.g. a ZIP file) will contain your function's code and any dependencies used to run the function's code.
Hence, you can package flatten_json together with your code for the Lambda.
Check the Creating a function with runtime dependencies page in the AWS documentation; it explains the use case of having the requests library. In your scenario, the library would be flatten_json.
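For illustration, a minimal handler packaged this way might look like the following sketch (the event shape here is made up):
# handler.py - shipped in the same ZIP as the flatten_json package
from flatten_json import flatten

def lambda_handler(event, context):
    # flatten_json turns nested dicts into flat ones,
    # e.g. {'a': {'b': 1}} -> {'a_b': 1}
    return flatten(event)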
Option 2) Create a layer that has the library dependencies you need, in your case just flatten_json, and then attach that layer to your Lambda.
Check the creating and sharing Lambda layers guide provided by AWS.
How to decide between 1) and 2)?
Use Option 1) when you need the dependencies in just that one Lambda; there is no need for the extra step of creating a layer.
Layers are useful if you have some common code that you want to share across different Lambdas. So if you need the library accessible in other Lambdas as well, then it's good to have a layer [Option 2)] that can be attached to different Lambdas.
You can do this in a Lambda if you don't want to create the layer. Keep in mind it will run slower, since it has to install the library on every run:
import sys
import subprocess

# install into /tmp, the only writable location in the Lambda environment
subprocess.call('pip install flatten_json -t /tmp/ --no-cache-dir'.split(), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

# make /tmp importable before importing the freshly installed package
sys.path.insert(1, '/tmp/')
import flatten_json
I have a simple Lambda function which uses the numpy library.
I have set up a virtual environment locally, and my code is able to fetch and use the library there.
I tried to use an AWS Lambda layer: I zipped the venv folder and uploaded it to the layer,
then attached the correct layer and version to my function,
but the function is not able to fetch the library.
Following is the code, which works fine locally -
import numpy as np
def main(event, context):
    a = np.array([1, 2, 3])
    print("Your numpy array:")
    print(a)
Following is the venv structure which I zipped and uploaded -
I get the following error -
{
"errorMessage": "Unable to import module 'handler': No module named 'numpy'",
"errorType": "Runtime.ImportModuleError"
}
My Lambda deployment looks like this -
I'm trying to follow this -
https://towardsdatascience.com/introduction-to-amazon-lambda-layers-and-boto3-using-python3-39bd390add17
I've seen that a few libraries like numpy and pandas don't work in Lambda when installed using pip. I have had success using the .whl package files for these libraries to create the Lambda layer. Refer to the steps below:
NOTE: These steps set up the libraries specific to the Python 3.7 runtime. If using any other version, you would need to download the .whl files corresponding to that Python version.
Create an EC2 instance using the Amazon Linux AMI and SSH into it. We should build our layer on Amazon Linux, as the Lambda Python 3.7 runtime runs on this operating system (doc).
Make sure this instance has Python 3 and the "pip" tool installed.
Download the numpy .whl file for the cp37 Python version and the manylinux1_x86_64 platform by executing the command below:
$ wget https://files.pythonhosted.org/packages/d6/c6/58e517e8b1fb192725cfa23c01c2e60e4e6699314ee9684a1c5f5c9b27e1/numpy-1.18.5-cp37-cp37m-manylinux1_x86_64.whl
Skip to the next step if you're not using pandas. Download the pandas .whl file for the cp37 Python version and the manylinux1_x86_64 platform by executing the command below:
$ wget https://files.pythonhosted.org/packages/a4/5f/1b6e0efab4bfb738478919d40b0e3e1a06e3d9996da45eb62a77e9a090d9/pandas-1.0.4-cp37-cp37m-manylinux1_x86_64.whl
Next, we will create a directory named "python" and unzip these files into that directory:
$ mkdir python
$ unzip pandas-1.0.4-cp37-cp37m-manylinux1_x86_64.whl -d python/
$ unzip numpy-1.18.5-cp37-cp37m-manylinux1_x86_64.whl -d python/
We also need to download the "pytz" library to successfully import the numpy and pandas libraries:
$ pip3 install -t python/ pytz
Next, we remove the “*.dist-info” directories from the package directory to reduce the size of the resulting layer.
$ cd python
$ sudo rm -rf *.dist-info
This will install all the required libraries that we need to run pandas and numpy.
Zip the current "python" directory and upload it to your S3 bucket. Ensure that the libraries are present in the hierarchy as given here.
$ cd ..
$ zip -r lambda-layer.zip python/
$ aws s3 cp lambda-layer.zip s3://YOURBUCKETNAME
The "lambda-layer.zip" file can then be used to create a new layer from the Lambda console.
Based on the AWS Lambda layer docs, https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html, your ZIP package for the layer must have this structure:
my_layer.zip
| python/numpy
| python/numpy-***.dist-info
So what you have to do is create a folder named python, put the contents of site-packages inside it, and then zip up that python folder. I tried this out with a simple package and it seems to work fine.
Also keep in mind that some packages require C/C++ compilation, and for that to work you must install and package them on a machine with an architecture similar to Lambda's. Usually you would do this on an EC2 instance, which has an architecture similar to the Lambda's.
That's a bit of a misleading question, because you did not mention that you use Serverless. I found that out by going through the snapshot of your project structure you provided. That means you probably use the Serverless Framework for deploying your project within the AWS provider.
Actually, there are multiple ways you can arrange a Lambda layer. Let's have a look at each of them.
Native AWS
Once you navigate to Add a layer, you will find three options:
AWS Layers, Custom Layers, and Specify an ARN.
Specify an ARN
The guys who did all the work for you: KLayers.
So, you need numpy, okay. Within the Lambda function navigate to the layers --> create a new layer --> out of the 3 options, choose Specify an ARN and as the value put: arn:aws:lambda:eu-west-1:770693421928:layer:Klayers-python38-numpy:12.
It will solve your problem and you will be able to work with the numpy namespace.
Custom Layers
Choose a layer from a list of layers created by your AWS account or organization.
For custom layers, the way of implementing them can differ based on your requirements in terms of deployment.
If you are allowed to do things manually, you should have a look at the following Medium article. I assume it will help you!
AWS Layers
As for the AWS pre-built layers, all is simple.
Layers provided by AWS that are compatible with your function's runtime.
These differ between runtimes; for me the list is: Perl5, SciPy, AppConfig Extension.
Serverless
Within Serverless, things are much easier, because you can define your layers directly with the Lambda definition in the serverless.yml file. Beyond that, HOW to define them can differ as well.
Examples can be found at: How to publish and use AWS Lambda Layers with the Serverless Framework
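For a concrete picture, a layer definition in serverless.yml might look roughly like this (a sketch; the service, folder and function names are made up):
# serverless.yml - rough sketch with made-up names
service: my-service

provider:
  name: aws
  runtime: python3.8

layers:
  numpy:
    path: layer  # folder that contains python/numpy, python/numpy-*.dist-info

functions:
  main:
    handler: handler.main
    layers:
      # the framework exposes each layer as <TitleCasedName>LambdaLayer
      - { Ref: NumpyLambdaLayer }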
If you have any questions, feel free to expand the discussion.
Cheers!
I am trying to import a Python deployment package in AWS Lambda. The Python code uses numpy. I followed the deployment package instructions for a virtual environment, but it still gave Missing required dependencies ['numpy']. I followed the instructions given on Stack Overflow (skipping step 4 for shared libraries, as I could not find any shared libraries), but no luck. Any suggestions to make it work?
The easiest way is to use AWS Cloud9; there is no need to start EC2 instances and prepare deployment packages.
Step 1: start up Cloud9 IDE
Go to the AWS console and select the Cloud9 service.
Create environment
enter Name
Environment settings (consider using the t2.small instance type; with the default I sometimes had problems restarting the environment)
Review
click Create environment
Step 2: create Lambda function
Create Lambda Function (bottom of screen)
enter Function name and Application name
Select runtime (Python 3.6) and blueprint (empty-python)
Function trigger (none)
Create serverless application (keep defaults)
Finish
wait a couple of seconds for the IDE to open
Step 3: install Numpy
at the bottom of the screen there is a command prompt
go to the Application folder, for me this is
cd App
install the package (we have to use pip from the virtual environment, because the default pip points to /usr/bin/pip, which is Python 2.7)
venv/bin/pip install numpy -t .
Step 4: test installation
you can test the installation by editing the lambda_function.py file:
import numpy as np
def lambda_handler(event, context):
    return np.sin(1.0)
save the changes and click the green Run button on the top of the screen
a new tab should appear, now click the green Run button inside the tab
after the Lambda executes I get:
Response
0.8414709848078965
Step 5: deploy Lambda function
on the right hand side of the screen select your Lambda function
click the upwards pointing arrow to deploy
go to the AWS Lambda service tab
the Lambda function should be visible, the name has the format
cloud9-ApplicationName-FunctionName-RandomString
Using Numpy is a real pain.
Numpy needs to be compiled on the same OS as the one it runs on. This means that you need to install/compile Numpy on an Amazon Linux AMI image in order for it to run properly in Lambda.
The easiest way to do this is to start a small EC2 instance and install it there. Then copy the compiled files (from /usr/lib/python/site-packages/numpy). These are the files you need to include in your Lambda package.
I believe you can also use the serverless tool to achieve this.
NumPy must be compiled on the platform that it will run on. The easiest way to do this is with Docker, for which a Lambda container exists: compile NumPy locally in the Lambda container, then push the result to AWS Lambda.
The serverless framework handles all this for you if you want an easy solution. See https://stackoverflow.com/a/50027031/1085343
I was having this same issue and pip install wasn't working for me. Eventually, I found some obscure posts about this issue here and here. They suggested going to PyPI, downloading the .whl file (select your Python version and manylinux1_x86_64), uploading, and unzipping. This is what ultimately worked for me. If nothing else is working for you, I suggest trying this.
For Python 3.6, you can use a Docker image as someone already suggested. Full instructions for that here: https://medium.com/i-like-big-data-and-i-cannot-lie/how-to-create-an-aws-lambda-python-3-6-deployment-package-using-docker-d0e847207dd6
The key piece is:
docker run -it dacut/amazon-linux-python-3.6
I started writing Python code two weeks ago, and until now I have manipulated some Excel data after converting it to a txt file. Now I want to manipulate Excel data directly, so I need to install the openpyxl package. However, my script will be used in many different places and on many computers (note: all of them use either OS X or a Linux distribution) which might not (and probably do not) have the openpyxl package installed. I want my script to check whether this package exists, and if it does not exist, to download and install it.
For that purpose, I searched and found that I could check for the package's existence by importing the pip module and calling the pip.get_installed_distributions() method. However, I am not sure whether I am on the wrong track or not. Besides, I have no idea how to download and install the openpyxl package "without leaving the script". A rough sketch of what I have in mind is below.
Would you like to help me?
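This is the kind of thing I mean (an untested sketch; I am not sure driving pip from inside the script like this is safe):
import importlib.util
import subprocess
import sys

# check whether the package can be imported, without actually importing it
if importlib.util.find_spec('openpyxl') is None:
    # use this interpreter's own pip so the package lands in the right place
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'openpyxl'])

import openpyxl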
You are getting ahead of yourself by trying to solve a deployment problem when you don't have the software to deploy.
Having a script trying to manage its own deployment is generally a bad idea because it takes responsibility away from package managers.
Here is my answer to your question:
Build your software under the assumption that all packages are installed/deployed correctly. I recommend creating a virtualenv (virtual environment) on your development computer, using pip to install openpyxl and any other packages you need.
Once your software is built and functional, read up on how to deploy your software package to PyPI: https://pypi.python.org/pypi. Create your package by defining openpyxl as a dependency, ensure your package can be installed and run properly (I recommend adding tests), then deploy to PyPI so anyone can use your software.
If you want to add a nice layer of validation in your script just in case, add the following to your main method:
#!/usr/bin/env python3
import sys
try:
    import openpyxl
except ImportError:
    print("Python Excel module 'openpyxl' not found")
    sys.exit(1)

# ... rest of script ...