I am trying to create a Python UDF in Amazon Redshift. The UDF itself was created successfully with no errors, and I have also created the required library for it. But when I execute the UDF, I get the error:
No Module Named pyffx. Please look at svl_udf_log for more information
I have downloaded the library from pypi.org and uploaded it to Amazon S3. This is the link I used to download the library:
https://pypi.org/project/pyffx/#files
create library pyffx
language plpythonu
from 's3://aws-bucket/tmp/python_module/pyffx-0.3.0.zip'
credentials
'aws_iam_role=iam role'
region 'us-east-1';
CREATE OR REPLACE FUNCTION schema.ffx(src VARCHAR)
RETURNS VARCHAR
STABLE
AS $$
import pyffx
value = unicode(src)  # pyffx operates on unicode strings
l = len(value)
# each character should appear exactly once in the alphabet
e = pyffx.String(b'secret-key', alphabet='abcdefghijklmnopqrstuvwxyz123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ', length=l)
return e.encrypt(value)
$$ LANGUAGE plpythonu;
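For reference, the call pattern itself is standard pyffx usage; here is a minimal sketch (placeholder key, shortened alphabet) that can be used to sanity-check the module outside Redshift:

import pyffx

# Format-preserving encryption: the output has the same length and
# alphabet as the input. b'secret-key' is a placeholder.
e = pyffx.String(b'secret-key', alphabet='abcdefghijklmnopqrstuvwxyz', length=5)
token = e.encrypt(u'hello')
print(token)             # a 5-letter lowercase string
print(e.decrypt(token))  # round-trips back to u'hello'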
I managed to get it to work... sort of.
I did the following:
Downloaded pyffx via the link you provided
Extracted the .tar.gz file and created a .zip of the files (see the sketch after this list)
Copied the .zip file to Amazon S3
Loaded the library using your CREATE LIBRARY command
Created the function
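Roughly, the repackaging in step 2 looked like this; the key detail is that the pyffx package directory itself sits at the top level of the zip (paths are illustrative):

tar xzf pyffx-0.3.0.tar.gz
cd pyffx-0.3.0
zip -r ../pyffx-0.3.0.zip pyffx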
However, when I use the function, I receive the error:
Invalid operation: AttributeError: 'module' object has no attribute 'add_metaclass'
My research suggests that the six library (which provides Python 2/3 compatibility) is the source of this problem. The Python Language Support for UDFs - Amazon Redshift page indicates that six 1.3 is included in Redshift, yet "Pip six.add_metaclass error" says that this version does not include add_metaclass. The current version of six is 1.12.
I tried to include an updated six library in the code but wasn't successful. You might be able to wrangle it better than I could.
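For completeness, the pattern I attempted mirrors the CREATE LIBRARY statement above: re-zip a newer six, upload it to S3, and load it as its own library. I haven't verified that Redshift will prefer it over the bundled 1.3, so treat this as a sketch (bucket path is a placeholder):

create library six
language plpythonu
from 's3://aws-bucket/tmp/python_module/six-1.12.0.zip'
credentials
'aws_iam_role=iam role'
region 'us-east-1';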
Related
I'm trying to run a Glue job (version 4) to perform some simple batch data processing. I'm using additional Python libraries that the Glue environment doesn't provide: translate and langdetect. Additionally, even though the Glue environment does provide the nltk package, when I try to import it I keep receiving errors that its dependencies are not found (e.g. regex._regex, _sqlite3).
I tried a few solutions to achieve my goal:
using --extra-py-files, where I specified the path to an S3 bucket to which I uploaded either:
a .zip file containing the translate and langdetect packages
just a directory of the already-unzipped packages
the packages themselves in .whl format (along with their dependencies)
using --additional-python-modules, where I specified the path to an S3 bucket to which I uploaded:
the packages themselves in .whl format (along with their dependencies)
or simply named the packages to be installed inside the Glue env via pip3 (see the sketch after this list)
using Docker
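For the --additional-python-modules route specifically, Glue only needs comma-separated pip-style specifiers, not necessarily S3 paths. A minimal sketch of how that could be declared when creating the job via boto3 (job name, role ARN, script path, and pinned versions are all placeholders; the same key/value pair goes into Terraform's default_arguments):

import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="batch-processing-job",                        # placeholder
    Role="arn:aws:iam::123456789012:role/my-glue-role", # placeholder
    GlueVersion="4.0",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/job.py",  # placeholder
        "PythonVersion": "3",
    },
    DefaultArguments={
        # Glue pip-installs these into the job environment at start-up
        "--additional-python-modules": "translate==3.6.1,langdetect==1.0.9",
    },
)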
Additionally, I followed a few useful sources to overcome the ModuleNotFoundError issue:
a) https://aws.amazon.com/premiumsupport/knowledge-center/glue-import-error-no-module-named/
b) https://aws.amazon.com/premiumsupport/knowledge-center/glue-version2-external-python-libraries/
c) https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html
Also, I tried to play with Glue versions 4 and 3 but haven't had luck. It seems like a bug. All permissions to read the S3 bucket are granted to the Glue role. The Python version of the script is the same as that of the libraries I'm trying to install: Python 3. To give you more clues, I manage the Glue resources via Terraform.
What did I do wrong?
I'm using the TDengine database to process time-series data, and my application is developed in Python.
I imported the Python connector for TDengine and encountered an error while loading the module:
taos.error.InterfaceError: [0xffff]: unable to load taos C library: Could not find module taos (or one of its dependencies)
I don't know how to fix it.
I checked the documentation, but no solution was found.
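If it helps to narrow things down: the Python connector is a thin wrapper around the TDengine C client driver, so the message suggests the driver itself (taos.dll on Windows, libtaos.so on Linux) cannot be found. A small sketch to check whether the driver is loadable at all, independent of the connector:

import ctypes
import platform

# If this fails, the client driver is missing or its directory is not
# on the dynamic loader's search path (PATH / LD_LIBRARY_PATH).
libname = "taos.dll" if platform.system() == "Windows" else "libtaos.so"
try:
    ctypes.CDLL(libname)
    print("TDengine client driver loaded OK")
except OSError as exc:
    print("driver not loadable:", exc)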
I'm trying to create a Snowpark UDF in Python as an object. Below is my code:
from snowflake.snowpark.functions import udf
from snowflake.snowpark.types import StringType  # needed for return_type below
from pytorch_tabnet.tab_model import TabNetRegressor

session.clearImports()  # session is an existing snowflake.snowpark.Session

model = TabNetRegressor()
model.load_model(model_file)  # model_file points at a saved TabNet model

lib_test = udf(lambda: (model.device), return_type=StringType())
I'm getting an error like the one below:
Failed to execute query
CREATE
TEMPORARY FUNCTION "TEST_DB"."TEST".SNOWPARK_TEMP_FUNCTION_GES3G8XHRH()
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION=3.8
IMPORTS=('@"TEST_DB"."TEST".SNOWPARK_TEMP_STAGE_CR0E7FBWQ6/cloudpickle/cloudpickle.zip','@"TEST_DB"."TEST".SNOWPARK_TEMP_STAGE_CR0E7FBWQ6/TEST_DBTESTSNOWPARK_TEMP_FUNCTION_GES3G8XHRH_5843981186544791787/udf_py_1085938638.zip')
HANDLER='udf_py_1085938638.compute'
002072 (42601): SQL compilation error:
Unknown function language: PYTHON.
It throws the error because Python is apparently not available as a UDF language.
I checked the packages available in information_schema, and it shows only Scala and Java. I'm not sure why Python is not among the packages. How do I add Python to the packages, and will adding it resolve this issue?
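For reference, the check I ran looked roughly like this (assuming information_schema.packages exposes a language column, which is what I filtered on):

# List the languages for which packages are available on this account;
# in my case it returned only 'java' and 'scala'.
session.sql(
    "select distinct language from information_schema.packages"
).show()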
Can anyone help with this? Thanks.
Python UDFs are not in production yet and are only available to selected accounts.
Please reach out to your Snowflake account team to have the functionality enabled.
I need to run a long-running job as an Azure WebJob in Python.
I am facing the error below when trying to import pandas:
File "D:\local\Temp\jobs\triggered\demo2\eveazbwc.iyd\pandas_init_.py", line 13
missing_dependencies.append(f"{dependency}: {e}")
The Web app (under which I will run the web job) also has python code using pandas, and it does not throw any error.
I have tried uploading the pandas and numpy folders inside the zip file (creating a venv, installing the packages, and zipping the Lib/site-packages content) for both 32-bit and 64-bit Python, as well as appending 'D:/home/site/wwwroot/my_app_name/env/Lib/site-packages' to sys.path.
I am not facing such issues when importing standard Python modules or additional packages like requests.
The error is also thrown when trying to import numpy.
Line 13 of pandas/__init__.py uses an f-string, which only parses on Python 3.6+, so I am assuming some kind of version mismatch is happening somewhere (i.e. the WebJob is picking up an older interpreter).
Any pointers to solve this will be really useful.
I have been using Python 3.x; not sure if I should try Python 2.x (virtual env, install packages, and zip the content of Lib/site-packages).
The key to solving the problem is to add the Python 3.6.4 x64 extension on the portal.
Steps:
Add the extension on the portal.
Create a requirements.txt file.
pandas==1.1.4
numpy==1.19.3
Create run.cmd.
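A minimal run.cmd might look like the following, assuming the extension installs Python under D:\home\python364x64 and your entry point is named run.py (both may differ in your setup):

D:\home\python364x64\python.exe -m pip install -r requirements.txt
D:\home\python364x64\python.exe run.py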
Zip the files above, together with your Python script, into a single zip file.
Upload the zip for the webjob.
For more details, please read this post:
Webjobs Running Error (3587fd: ERR) from zipfile
I have followed all the steps in the documentation:
https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html
Create a directory.
Save all of your Python source files (the .py files) at the root level of this directory.
Install any libraries using pip at the root level of the directory.
Zip the content of the project-dir directory.
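For clarity, the last step means zipping the contents of the directory, not the directory itself; for example (archive name is arbitrary):

cd project-dir
zip -r ../function.zip .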
But after I uploaded the zip file to the Lambda function, I got an error message when testing the script.
my code:
import psycopg2
#my code...
the error:
Unable to import module 'myfilemane': No module named 'psycopg2._psycopg'
I don't know where the suffix '_psycopg' comes from...
Any help regarding this?
You are using native libraries with Lambda. We had a similar problem, and here is how we solved it.
Spin up a machine with the AWS-supported AMI that your real Lambda runs on.
https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html
As of this writing, it is:
AMI name: amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2
Full documentation on installing native modules for your Python Lambda:
https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html
Install the modules required by your Lambda:
pip install module-name -t /path/to/project-dir
and prepare your package for upload along with the native modules built under the Lambda AMI environment.
Hope this helps.
I believe this is caused because psycopg2 needs to be built and compiled with statically linked libraries for Linux. Please reference Using psycopg2 with Lambda to Update Redshift (Python) for more details on this issue. Another reference describes similar problems compiling psycopg2 on OS X.
There are a few solutions, but basically it comes down to installing the library on a Linux machine and using that as the psycopg2 library in your upload package.
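A shortcut that may spare the compile step, depending on your runtime: the prebuilt psycopg2-binary manylinux wheel bundles its own libpq, so installing it straight into the package directory with the same pip -t pattern as above is worth a try (unverified for every Lambda runtime):

pip install psycopg2-binary -t /path/to/project-dir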