No module named 'pyarrow.lib' found from lambda function - python

I have installed pyarrow version 0.14.0 and I'm creating a package to run it from Lambda.
While executing from Lambda I'm getting the error: No module named 'pyarrow.lib'
I have included the pyarrow package in my deployment zip file as well. My Python version is 3.7.
Can someone please help with this issue?

The underlying problem is that modules like pyarrow implement much of their code in C/C++. If you check the pyarrow codebase, you will find that two pyarrow.lib files do in fact exist, but with .pyx and .pxd file extensions. This is not pure Python code, so the compiled result depends on the underlying CPU architecture.
Bundling pyarrow with my code in the same zip did not work, irrespective of which .whl file I used. I was able to solve the problem in two ways.
1. Lambda Layer
I had to manually download the .whl files for my required version of pyarrow and its dependency numpy. From http://pypi.org/project/pyarrow/, click on Download files and search for your matching version: cp39 means CPython 3.9, and x86_64 is the CPU architecture. Follow the same steps for numpy. I ended up downloading these files: pyarrow-8.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl and numpy-1.22.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
You then have to unzip them and create an archive where both sit together in a folder named python. This folder can be used to create a layer in Lambda. Attach that layer to your function, and import pyarrow should now work.
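For reference, the layer build boils down to a few shell commands. A minimal sketch, assuming the two wheels above sit in the current directory (the layer name is illustrative):
mkdir -p python
unzip -q pyarrow-8.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl -d python
unzip -q numpy-1.22.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl -d python
zip -r -q pyarrow-layer.zip python
aws lambda publish-layer-version --layer-name pyarrow-numpy --zip-file fileb://pyarrow-layer.zip --compatible-runtimes python3.9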
I followed this guide for creating a pyarrow layer.
2. Docker container
The other solution is to use a custom Docker image. This worked smoothly for me. I believe the AWS docs are exhaustive on that topic. I have written a PoC with all the steps that I followed here.
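As a rough sketch, a Dockerfile for this route can be as small as the following, using AWS's public Lambda Python base image (the handler file name and version pin are illustrative):
FROM public.ecr.aws/lambda/python:3.9
RUN pip install pyarrow==8.0.0
COPY lambda_function.py ${LAMBDA_TASK_ROOT}
CMD ["lambda_function.lambda_handler"]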

Related

AWS Glue: passing additional Python modules to the job - ModuleNotFoundError

I'm trying to run a Glue job (version 4) to perform simple batch data processing. I'm using additional Python libraries that the Glue environment doesn't provide: translate and langdetect. Additionally, even though the Glue environment does provide the nltk package, when I try to import it I keep getting errors that its dependencies are not found (e.g. regex._regex, _sqlite3).
I tried a few solutions to achieve my goal:
using --extra-py-files, where I specified the path to an S3 bucket to which I uploaded either:
a .zip file containing the translate and langdetect packages
just a directory of the already unzipped packages
the packages themselves in .whl format (along with their dependencies)
using --additional-python-modules (see the sketch after this list), where I specified either the path to an S3 bucket to which I uploaded:
the packages themselves in .whl format (along with their dependencies)
or simply the names of the packages to be installed inside the Glue environment via pip3
using Docker
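For reference, the pip-style variant of --additional-python-modules is normally passed as a plain job parameter rather than an S3 path. A minimal Terraform sketch (resource names, script location, and version pins are illustrative):
resource "aws_glue_job" "this" {
  name         = "my-batch-job"
  role_arn     = aws_iam_role.glue.arn
  glue_version = "4.0"
  command {
    script_location = "s3://my-bucket/scripts/job.py"
  }
  default_arguments = {
    "--additional-python-modules" = "translate==3.6.2,langdetect==1.0.9"
  }
}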
Additionally, I followed a few useful sources to overcome the issue of ModuleNotFoundError:
a) https://aws.amazon.com/premiumsupport/knowledge-center/glue-import-error-no-module-named/.
b) https://aws.amazon.com/premiumsupport/knowledge-center/glue-version2-external-python-libraries/
c) https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html
Also, I tried both Glue version 4 and version 3 but had no luck. It seems like a bug. All permissions to read the S3 bucket are granted to the Glue role. The script's Python version matches that of the libraries I'm trying to install: Python 3. To give you more clues, I manage the Glue resources via Terraform.
What did I do wrong?

No module named 'numpy.core._multiarray_umath' when using AWS Lambda

I just uploaded a .zip file to AWS Lambda with all the needed packages. Everything ran fine on my Mac using a virtual environment with Python 3.8, and the AWS Lambda function also uses Python 3.8. But when I run it in AWS Lambda I get this error:
No module named 'numpy.core._multiarray_umath'
I have changed the numpy version (1.20.2) to others such as 1.19.1 and 1.18.5, but the problem persists.
I am also using spacy 3.0.6 and fastapi 0.63.0.
When I encountered the same issue, these steps worked for me:
1. Download the required packages (you may need different versions):
- pandas-1.4.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- python_dateutil-2.8.2-py2.py3-none-any.whl
- pytz-2022.1-py2.py3-none-any.whl
- numpy-1.21.5-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
- and any others you need.
2. Create a project folder and unzip the .whl files into it.
3. Remove the *.dist-info folders.
4. Add your source code (lambda_function.py) to the folder.
5. Zip the folder and upload it to Lambda as a source code zip file (see the sketch below).
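In shell form, steps 1-5 look roughly like this (a sketch; the wheel names match the ones above, and lambda_function.py stands in for your handler file):
mkdir package && cd package
unzip -q ../pandas-1.4.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
unzip -q ../python_dateutil-2.8.2-py2.py3-none-any.whl
unzip -q ../pytz-2022.1-py2.py3-none-any.whl
unzip -q ../numpy-1.21.5-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
rm -rf *.dist-info
cp ../lambda_function.py .
zip -r -q ../deployment.zip .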
These links may also help you:
https://korniichuk.medium.com/lambda-with-pandas-fd81aa2ff25e
https://github.com/numpy/numpy/issues/13465#issuecomment-545378314

missing_dependencies error using Pandas in Azure Web Job

I need to run a long-running job via an Azure Web Job in Python.
I am facing the error below when trying to import pandas:
File "D:\local\Temp\jobs\triggered\demo2\eveazbwc.iyd\pandas\__init__.py", line 13
missing_dependencies.append(f"{dependency}: {e}")
The web app (under which I will run the web job) also has Python code using pandas, and it does not throw any error.
I have tried uploading the pandas and numpy folders inside the zip file (creating a venv, installing the packages, and zipping the Lib/site-packages content), for both 32-bit and 64-bit Python, as well as appending 'D:/home/site/wwwroot/my_app_name/env/Lib/site-packages' to sys.path.
I am not facing such issues when importing standard Python modules or additional packages like requests.
The error is also thrown when trying to import numpy.
So I am assuming some kind of version mismatch is happening somewhere.
Any pointers to solve this would be really useful.
I have been using Python 3.x; I'm not sure if I should try Python 2.x (virtual env, install packages and zip the content of Lib/site-packages).
Regards
Kunal
The key to solving the problem is to add the Python 3.6.4 x64 extension on the portal.
Steps:
Add the extension on the portal.
Create a requirements.txt file:
pandas==1.1.4
numpy==1.19.3
Create run.cmd (see the sketch below).
Put these files together and zip them into a single zip.
Upload the zip for the WebJob.
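A minimal run.cmd sketch along those lines, assuming the extension installs its interpreter at D:\home\python364x64\python.exe and your entry script is named run.py (both paths are assumptions):
D:\home\python364x64\python.exe -m pip install -r requirements.txt
D:\home\python364x64\python.exe run.py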
For more details, please read this post: Webjobs Running Error (3587fd: ERR ) from zipfile

Import Python module into AWS Lambda

I have followed all the steps in the documentation:
https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html
Create a directory.
Save all of your Python source files (the .py files) at the root level of this directory.
Install any libraries using pip at the root level of the directory.
Zip the content of the project-dir directory (see the sketch below).
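In shell terms, those steps amount to something like this (a sketch; directory, file, and package names are illustrative):
mkdir project-dir && cd project-dir
cp ~/src/*.py .
pip install psycopg2 -t .
zip -r ../deployment.zip .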
But after I uploaded the zip file to the Lambda function, I get this error when I test the script.
my code:
import psycopg2
#my code...
the error:
Unable to import module 'myfilemane': No module named 'psycopg2._psycopg'
I don't know where the suffix '_psycopg' comes from...
Any help regarding this?
You are using native libraries with Lambda. We had a similar problem, and here is how we solved it.
Spin up a machine with the AWS-supported AMI that runs your real Lambda:
https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html
As of this writing, it is:
AMI name: amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2
Full documentation on installing native modules for your Python Lambda:
https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html
Install the modules required by your Lambda:
pip install module-name -t /path/to/project-dir
and prepare your package for upload, along with the native modules built under the Lambda AMI environment.
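On that machine, the packaging itself is short. A sketch (paths and the instance address are illustrative):
pip install psycopg2 -t /path/to/project-dir
cd /path/to/project-dir && zip -r ~/deployment.zip .
Then copy the zip back to your workstation, e.g. scp ec2-user@<instance>:~/deployment.zip .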
Hope this helps.
I believe this is caused because psycopg2 needs to be built and compiled with statically linked libraries for Linux. Please see Using psycopg2 with Lambda to Update Redshift (Python) for more details on this issue. Another reference describes problems compiling psycopg2 on OS X.
There are a few solutions, but it basically comes down to installing the library on a Linux machine and using that as the psycopg2 library in your upload package.
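If spinning up a Linux machine is inconvenient, a newer pip can also pull the prebuilt manylinux wheel directly onto any OS. A sketch, which swaps in psycopg2-binary (the variant that ships precompiled Linux wheels):
pip install --platform manylinux2014_x86_64 --only-binary=:all: --target ./package psycopg2-binary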

python: import com.oceanoptics.omnidriver.api.wrapper.Wrapper

I am trying to import the following API wrapper / device driver, as in this Python package:
import com.oceanoptics.omnidriver.api.wrapper.Wrapper
Python just reports that there is no module with this name:
ImportError: No module named com.oceanoptics.omnidriver.api.wrapper.Wrapper
I installed Omnidriver from the device manufacturer's website. Specifically, I used the installer OmniDriver-2.37-win32-installer.exe and installed the "Development version". It installs a bunch of dlls in C:\Program Files (x86)\Ocean Optics\OmniDriver\OOI_HOME.
The wrapper is working properly in Matlab after adding C:\Program Files (x86)\Ocean Optics\OmniDriver\OOI_HOME to C:\Program Files (x86)\MATLAB\R2012b\toolbox\local\librarypath.txt and C:\Program Files (x86)\Ocean Optics\OmniDriver\OOI_HOME\OmniDriver.jar to C:\Program Files (x86)\MATLAB\R2012b\toolbox\local\classpath.txt. Thereafter, I can load the wrapper in Matlab with wrapper = com.oceanoptics.omnidriver.api.wrapper.Wrapper().
I guess my Python installation (Enthought Canopy 1.4.1, Windows 32-bit) is not looking for the DLLs in the correct path, because I would first have to tell it where to look.
So, my question is, how do I instruct python to successfully execute the import statement above?
Another approach to interfacing with the spectrometer from Python is the python-seabreeze package. The package does not have thorough documentation, but if you're willing to be patient and try things out for yourself, you should be able to get it to work. The author has put considerable work into making the package compatible with the vast majority of Ocean Optics' spectrometers. I just finished installing it on my Windows laptop and got it to work in under an hour.
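To give a feel for it, typical python-seabreeze usage looks something like this (a sketch, assuming a supported spectrometer is plugged in; the integration time is arbitrary):
import seabreeze.spectrometers as sb

spec = sb.Spectrometer.from_first_available()  # open the first detected device
spec.integration_time_micros(20000)            # 20 ms integration time
wavelengths = spec.wavelengths()               # wavelength axis in nm
intensities = spec.intensities()               # one acquired spectrum
print(wavelengths[:5], intensities[:5])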
I checked the website and can't find any reference to Python support. I believe the instructions you referenced explain how to install the Java classes; I could find no information that mentions or discusses Python modules. You should contact Ocean Optics for clarification.
