AWS Lambda Deployment Package in Python Limits - python

I want to run my code in an AWS Lambda function. To do so, I need to import some Python packages (i.e. pandas, numpy, sklearn, scipy).
I have two problems:
First, the unzipped size of the packaged Python zip file is greater than 250MB.
Second, I get errors when importing scipy and sklearn, such as:
Unable to import module 'lambda_function': cannot import name '_ccallback_c'
or
Unable to import module 'lambda_function': No module named 'sklearn.__check_build._check_build'
___________________________________________________________________________
Contents of /var/task/sklearn/__check_build:
__pycache__  _check_build.cpython-35m-x86_64-linux-gnu.so  setup.py  __init__.py
___________________________________________________________________________
It seems that scikit-learn has not been built correctly.
I tried reinstalling many times, but I still have problems with sklearn and scipy.
Any ideas?
Sample code in the AWS Lambda function:
import json
import numpy
import pandas
import sklearn
import scipy

def lambda_handler(event, context):
    # TODO implement
    print(event)
    return

You appear to have two issues.
The first (and easiest to solve) is that you need to install the relevant modules on a Linux distro comparable to Amazon Linux.
You can either do this using EC2 or in a Docker container with Amazon Linux on it.
The second issue (which is a bit trickier if not impossible to solve given the size of the modules you want to use) is that you need to get your deployment size down to under 250MB unzipped and under 50MB zipped.
Using the relevant CFLAGS when installing may get you some of the way there. See here for an idea of what might work.
If you are still over the limit (which I suspect you will be), your only remaining choice is to delete files in the modules that you believe your particular program will not use. This is risky, error-prone, and usually takes many attempts to get right. Code coverage tools may help here, as they can indicate which files are actually being used.
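Before trimming anything, it helps to know how far over the limit the package is and which modules take the most space. The following is a minimal sketch, assuming the packages were installed into a local build directory; the function names are hypothetical, and the 250MB figure is the one quoted above, so check it against the current AWS documentation.

```python
import os

# Unzipped size limit quoted in the answer above (an assumption; verify with AWS docs).
MAX_UNZIPPED_BYTES = 250 * 1024 * 1024


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_entries(path, top=5):
    """Return (size, name) pairs for the top-level entries of path, biggest first."""
    sizes = []
    for entry in os.listdir(path):
        full = os.path.join(path, entry)
        size = directory_size(full) if os.path.isdir(full) else os.path.getsize(full)
        sizes.append((size, entry))
    return sorted(sizes, reverse=True)[:top]


def over_limit(path, limit=MAX_UNZIPPED_BYTES):
    """Return True if the unzipped contents of path exceed the limit."""
    return directory_size(path) > limit
```

Calling largest_entries on the build directory shows which packages are worth slimming down first.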

Related

Not able to add pandas in aws layer and getting error

I'm getting an error when I import pandas/numpy in a Lambda function:
I added pandas/numpy as an AWS layer. I added pymysql the same way, and it works fine.
Working on Windows 10
Using Python 3.7, the same as in Lambda
Building the zip package using pip install pandas -t
In dir - python\lib\python3.7\site-packages
{
"errorMessage": "Unable to import module 'lambda_function': \n\nIMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!\n\nImporting the numpy C-extensions failed. This error can happen for\nmany reasons, often due to issues with your setup or how NumPy was\ninstalled.\n\nWe have compiled some common reasons and troubleshooting tips at:\n\n https://numpy.org/devdocs/user/troubleshooting-importerror.html\n\nPlease note and check the following:\n\n * The Python version is: Python3.7 from \"/var/lang/bin/python3.7\"\n * The NumPy version is: \"1.19.1\"\n\nand make sure that they are the versions you expect.\nPlease carefully study the documentation linked above for further help.\n\nOriginal error was: No module named 'numpy.core._multiarray_umath'\n",
"errorType": "Runtime.ImportModuleError"
}
The zip of the package should be created in the build folder, even though you install the packages into site-packages.
https://www.youtube.com/watch?v=zrrH9nbSPhQ has some good steps to follow.
Pandas can be a huge zip file, which Lambda layers might not support. You might want to check for pandas, for example pandas-xlrd.
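The question mentions building the layer on Windows 10. Compiled extensions are platform-specific (.pyd on Windows, .so on Linux), so a layer built on Windows is one possible cause of the missing numpy.core._multiarray_umath, in line with the advice in the first answer above to install on a Linux distro comparable to Amazon Linux. The following is a rough sketch for checking a layer directory before zipping it; the function names are hypothetical.

```python
import os


def extension_suffix_counts(package_dir):
    """Count compiled extension files under package_dir by platform suffix."""
    counts = {".so": 0, ".pyd": 0}
    for _root, _dirs, files in os.walk(package_dir):
        for name in files:
            for suffix in counts:
                if name.endswith(suffix):
                    counts[suffix] += 1
    return counts


def contains_windows_binaries(package_dir):
    """Return True if any Windows-style .pyd extension modules are present."""
    return extension_suffix_counts(package_dir)[".pyd"] > 0
```

If contains_windows_binaries returns True for the layer directory, the packages would need to be reinstalled in a Linux environment before zipping.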

Error with Serverless Framework Deployment

Background: I have been working in Python to create a Lambda function that will hit the Binance API to fetch balances and transactions. To deploy, I have been using the Serverless Framework (https://serverless.com/) and virtualenv, which has made it a breeze up until this point. I have 2 other functions working perfectly with other exchanges.
Error: When I deploy, I am getting the following:
Unable to import module 'getBinanceTransactions': No module named '_regex'
getBinanceTransactions being the function I created to return what I want. Nothing crazy, just following the python-binance documentation (https://github.com/sammchardy/python-binance) to grab all transactions and then data wrangling.
Note that this works on my local machine!
I do a serverless deploy, and everything updates just fine using serverless-python-requirements to package everything separately. Here are my imports (only 2 external packages):
from __future__ import print_function
import json
from binance.client import Client
import pymysql
And my requirements.txt, with both this and the code separated in a directory just like the other I have working in a similar format:
PyMySQL==0.9.2
python_binance==0.6.9
I have been searching for a solution, but no one seems to have run into this problem. It also seems that _regex is a core part of Python, which makes the situation even stranger.
I have tried wiping my virtualenv out and rebuilding, rebuilding the entire file structure, pip freeze >> requirements.txt on installing both packages to make sure nothing was missed, changing the name of the imports and requirements, importing regex/re in my function, even switching to Python 2.7 for a hail mary. Nothing seems to work (despite the others working) and I get the same error every time.
Does anyone have any ideas?

In Python, how do I ensure that the used modules are available in target machines?

Often, after developing a Python script that uses a few specialized modules, I find that I have to let the end party know about the dependencies and have them install them before running the script.
Is there any way (similar to setup.py [and managing with pip]) to supply something along with the script that will validate and ensure that the required modules are present before executing it? Is there a Pythonic way to do this?
You could always check whether they have the dependencies and, if not, install them from the code. Example for pandas, but it can be used with any module:
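import subprocess
import sys

try:
    import pandas as pd
except ImportError:
    # pip.main() was removed in pip 10, so run pip through the current interpreter instead
    subprocess.check_call([sys.executable, "-m", "pip", "install", "pandas"])
    import pandas as pd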
You might also want to take a look at cx_Freeze, which turns your code into a standalone executable (an .exe on Windows).

Python 3.5.1 : How to check whether openpyxl package exist and if it does not exist how to download and install it "within" the script?

I started writing Python code two weeks ago, and until now I have manipulated Excel data after converting it to a txt file. Now I want to manipulate Excel data directly, so I need to install the openpyxl package. However, my script will be used in many different places and on many computers (note: all of them use either OS X or a Linux distribution), which might not (and probably do not) have the openpyxl package installed. I want my script to check whether this package exists and, if it does not exist, download and install it.
For that purpose, I searched and found that I could check for the package by first importing the pip module and then calling the pip.get_installed_distributions() method. However, I am not sure whether I am on the right track. Besides, I have no idea how to download and install the openpyxl package "without leaving the script".
Could you help me?
You are getting ahead of yourself by trying to solve a deployment problem when you don't have the software to deploy.
Having a script trying to manage its own deployment is generally a bad idea because it takes responsibility away from package managers.
Here is my answer to your question:
Build your software under the assumption that all packages are installed/deployed correctly. I recommend creating a virtualenv (virtual environment) on your development computer, using pip to install openpyxl and any other packages you need.
Once your software is built and functional, read up on how to deploy your software package to PyPI: https://pypi.python.org/pypi. Create your package with openpyxl defined as a dependency, ensure your package can be installed and run properly (I recommend adding tests), then deploy to PyPI so anyone can use your software.
If you want to add a nice layer of validation in your script just in case, add the following to your main method:
#!/usr/bin/env python3
import sys

try:
    import openpyxl
except ImportError:
    print("Python Excel module 'openpyxl' not found")
    sys.exit(1)

# ... rest of script ...
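If the script depends on more than one module, the same validation can be generalized. The following is a minimal sketch using importlib.util.find_spec from the standard library (available since Python 3.4); the function names and the REQUIRED_MODULES list are hypothetical. It only reports what is missing and leaves installation to pip, in keeping with the advice above.

```python
import importlib.util
import sys

# Hypothetical list of top-level import names the script needs.
REQUIRED_MODULES = ["openpyxl"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on this machine."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_modules(names):
    """Exit with a helpful message if any required module is missing."""
    missing = missing_modules(names)
    if missing:
        # Note: the import name can differ from the name used with pip install.
        print("Missing Python modules: " + ", ".join(missing))
        sys.exit(1)
```

Calling require_modules(REQUIRED_MODULES) at the top of the main function stops the script early with a clear message instead of failing partway through.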

mlpy, numpy, scipy on Google App Engine

Can MlPy / SciPy be used on GAE?
I believe I have imported NumPy correctly, as it does not raise any errors so far (GAE 1.6 ships with support for NumPy). However, from what I've read, I still need to import SciPy and MlPy, and I haven't been able to do so. Is there any documentation out there that explains accurately how to set up MlPy on GAE (if this is even possible)?
The main reason as to why I need MlPy is that I need to do a k-means analysis (finding a cluster center point). Isn't there a "lite" library to do this that would avoid all the hassle of setting up NumPy and MlPy?
Thanks.
EDIT:
I'm trying to import scipy. What I did is:
Downloaded scipy-0.11.0b1.tar.gz
Extracted the 'scipy' folder into my GAE App folder
From a python file, call 'import scipy'
The error that I get is:
ImportError: Error importing scipy: you cannot import scipy while
being in scipy source directory; please exit the scipy source
tree first, and relaunch your python intepreter.
Libraries written in pure Python that don't require C modules should be supported.
Libraries written in Python that utilize C modules MAY be supported. The following is a link to supported and not-supported C modules. This may help in determining whether or not the library you want to use will be supported.
Google App Engine Python Library Support
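Since pure-Python code should be supported, the k-means step mentioned in the question can be written without NumPy or MlPy. The following is a minimal sketch of Lloyd's algorithm on plain tuples, not a replacement for MlPy; the function names are hypothetical, and the caller is assumed to supply the initial centroids.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=10):
    """Run Lloyd's algorithm and return the final list of centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                count = float(len(members))
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / count
                          for d in range(len(old))))
            else:
                new_centroids.append(old)  # keep a centroid that attracted no points
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids
```

For example, kmeans([(0, 0), (0, 2), (10, 10), (10, 12)], [(0, 0), (10, 10)]) returns the two cluster centers (0.0, 1.0) and (10.0, 11.0).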
