PIG - ExecException: ERROR 1070: Could not resolve testPyUDF.testFunc using imports - python

I am facing a basic issue importing a Python script into Pig. I'm just trying a simple script like this:
PIG script
REGISTER 'test.py' using jython as testPyUDF;
load_data = LOAD '$input_path' USING PigStorage(',') AS (row1);
resp = FOREACH load_data generate row1, testPyUDF.testFunc() as (test:CHARARRAY);
DUMP resp;
test.py (Python UDF to import):
#outputSchema("test:chararray")
def testFunc():
    return "test"
But this results in an error: ExecException: ERROR 1070: Could not resolve testPyUDF.testFunc using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin., com.yahoo.yst.sds.ULT., org.apache.pig.piggybank.evaluation., org.apache.pig.piggybank.evaluation.datetime., org.apache.pig.piggybank.evaluation.decode., org.apache.pig.piggybank.evaluation.math., org.apache.pig.piggybank.evaluation.stats., org.apache.pig.piggybank.evaluation.string., org.apache.pig.piggybank.evaluation.util., org.apache.pig.piggybank.evaluation.util.apachelogparser., string., util., math., datetime., sequence., util., java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
I even tried changing the python script path in REGISTER to HDFS path:
REGISTER 'hdfs:///user/xxx/......./test.py' using jython as testPyUDF;
and also absolute path on local..
REGISTER '/homes/user/..../test.py' using jython as testPyUDF;
No luck either way. What am I missing here?
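One plausible cause, as a sketch (an assumption, not confirmed by the thread): Pig's Jython UDFs attach the output schema with the @outputSchema decorator, and a leading '#' turns that line into a plain comment. The stub decorator below stands in for the one Pig's Jython runtime provides, so the sketch runs outside Pig:

```python
# Stub standing in for the outputSchema decorator that Pig's Jython
# runtime injects; it just records the schema on the function object.
def outputSchema(schema):
    def wrap(func):
        func.outputSchema = schema
        return func
    return wrap

# '@' (decorator), not '#' (comment), so the schema is actually attached.
@outputSchema("test:chararray")
def testFunc():
    return "test"
```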

Related

Could not load the file : Error when calling a dll in python project

We have a Python library (let's call it TestLibrary), packaged as a wheel ('whl') file.
We consume that library in another Python project (the main project, Flask based).
In TestLibrary, we call a DLL (C#, .NET Standard 2.0) that has a few encryption methods and returns encrypted data.
Now TestLibrary gives an error when those encryption methods are called.
How can we consume those DLLs in TestLibrary and get the data in the main project?
# Below code is in TestLibrary
import clr  # pythonnet

def get_encrypted_data():
    try:
        clr.AddReference('folder/dlls/EncryptionDLL')
        from EncryptionDLL import EncryptionClass
        encryptionObj = EncryptionClass()
        encryptedData = encryptionObj.Encrypt('Data', 'Encryption Key')
        return encryptedData
    except Exception as e:
        return e

# Below code is in the Flask application
# pip install TestLibrary
from TestLibrary import get_encrypted_data
encryptedData = get_encrypted_data()  # Error here: not able to read the DLL
I have tried it with PythonNet and a LibMono installation. It works fine in a POC with only that DLL in Python.
When we place it in another library and consume that library, we get the error.
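A sketch of one possible explanation (an assumption, not confirmed by the thread): clr.AddReference with a relative path like 'folder/dlls/EncryptionDLL' resolves against the process's current working directory, which differs once TestLibrary is installed into site-packages and imported by the Flask app. Building an absolute path from the module's own location avoids that; the 'dlls' folder and DLL name here are hypothetical:

```python
import os

def dll_path(name, base_file=__file__):
    # Resolve relative to this module's directory, not the process CWD,
    # so the path stays correct when the package is imported elsewhere.
    base = os.path.dirname(os.path.abspath(base_file))
    return os.path.join(base, 'dlls', name + '.dll')
```

Inside TestLibrary this would be used as clr.AddReference(dll_path('EncryptionDLL')).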

Calling python file in Variables section of robotframework 3.2

I want to replace "Variables config" with "Variables /path/CLOUD234/__init__.py" in Robot Framework. The cloud instance is defined at run time and its value changes on each run, so I have created a Python file initpath.py, shown below, with a fun() keyword that returns the required path. How can I call it in the Variables section of Robot Framework? Thank you in advance.
import socket
import re
import os

def fun():
    name = socket.gethostname()
    pattern = ".*CLOUD[0-9]*"
    hname = re.findall(pattern, name)
    cloud_instance = hname[0].replace("-", "_")
    init_file = "/path/{}/__init__.py".format(cloud_instance)
    return init_file
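For illustration only, a parameterized variant of fun() with the hostname passed in (the value "host-CLOUD234" is hypothetical) instead of read from socket, showing the hostname-to-path transformation:

```python
import re

def fun(name):
    # Extract the "...CLOUD<digits>" portion of the hostname and
    # normalize dashes to underscores before building the path.
    hname = re.findall(".*CLOUD[0-9]*", name)
    cloud_instance = hname[0].replace("-", "_")
    return "/path/{}/__init__.py".format(cloud_instance)

print(fun("host-CLOUD234"))  # /path/host_CLOUD234/__init__.py
```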
The Variables section does not execute any code.
I suggest you run the Python code in a test case or suite setup and use Set Test Variable to set the variable.
Try the following; this way you are not required to use the Variables section to call the py file.
Under the *** Settings *** section add
Library initpath.py

Unable to run another python script in azure function using python

I have created an Event Grid triggered Azure function in Python. I have deployed my solution to Azure successfully and the execution works fine. But I have an issue with calling another Python script in the same folder location. My code is given below:
import os, json, subprocess
import logging
import azure.functions as func

def main(event: func.EventGridEvent):
    try:
        correctionsMessages = event.get_json()
        for correctionMessage in correctionsMessages:
            strMessage = json.dumps(correctionMessage)
            full_path_to_script = os.path.join(os.path.dirname(os.path.realpath(__file__)) + '/' + correctionMessage['ScriptName'] + '.py')
            logging.info('Script Path: %s', full_path_to_script)
            logging.info('Parameter: %s', strMessage)
            subprocess.check_call('python ' + full_path_to_script + ' ' + json.dumps(strMessage))
        result = json.dumps({
            'id': event.id,
            'data': event.get_json(),
            'topic': event.topic,
            'subject': event.subject,
            'event_type': event.event_type,
        })
        logging.info('Python EventGrid trigger processed an event: %s', result)
    except Exception as e:
        logging.info('Error: %s', e)
The above code is giving an error for subprocess.check_call. The error is "Error: [Errno 2] No such file or directory: 'python /home/site/wwwroot/Detections/Script1.py'". Script1.py is in the same folder as __init__.py. When I run this function locally, it works absolutely fine.
In my experience, the error is caused by subprocess.check_call not knowing the path of the python executable, not by the Script1.py path.
In your local Azure Functions development environment, the python path is configured in an environment variable, so subprocess.check_call can invoke python by searching the executable paths in that variable. In the cloud, however, no python path is pre-configured in that environment variable; only the Azure Functions host knows the real absolute path of Python.
So the solution is to find the real absolute path of Python and use it instead of python in your code.
However, in the Azure Functions Python runtime, I think it's not a good idea to use subprocess.check_call to spawn a child process to process a given message. The safe and correct way is to define a function in Script1.py, or directly in __init__.py, and pass the given message to it as a parameter to achieve the same feature.
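A sketch of the suggested direction (an assumption, not the answer's exact code): use the interpreter that is running the function host via sys.executable, and pass the command as a list so the whole string is not treated as a single executable name:

```python
import subprocess
import sys

def run_script(script_path, payload):
    # sys.executable is the absolute path of the current interpreter,
    # so no PATH lookup for "python" is needed; the list form keeps
    # script path and argument as separate argv entries.
    subprocess.check_call([sys.executable, script_path, payload])
```

A subprocess-free alternative, per the answer's last point, is to import Script1 and call a function in it directly with the message as an argument.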

Use reticulate to call Python script and send email

I use Windows Task Scheduler to run an R Script several times a day. The script transforms some new data and adds it to an existing data file.
I want to use reticulate to call a Python script that will send me an email listing how many rows of data were added, and if any errors occurred. This works correctly when I run it line by line from within RStudio. The problem is that it doesn't work when the script runs on schedule. I get the following errors:
Error in py_run_file_impl(file, local, convert) :
Unable to open file 'setup_smtp.py' (does it exist?)
Error in py_get_attr_impl(x, name, silent) :
AttributeError: module '__main__' has no attribute 'message'
Calls: paste0 ... py_get_attr_or_item -> py_get_attr -> py_get_attr_impl
Execution halted
This GitHub answer (https://github.com/rstudio/reticulate/issues/232) makes it sound like reticulate can only be used within RStudio, at least for what I'm trying to do. Does anyone have suggestions?
Sample R script:
library(tidyverse)
library(reticulate)
library(lubridate)

n_rows <- 10
time_raw <- now()
result <- paste0("\nAdded ", n_rows,
                 " rows to data file at ", time_raw, ".")

try(source_python("setup_smtp.py"))
message_final <- paste0(py$message, result)
try(smtpObj$sendmail(my_email, my_email, message_final))
try(smtpObj$quit())
The Python script ("setup_smtp.py") is like this:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Call from reticulate to log in to email
"""
import smtplib
my_email = '...'
my_password = '...'
smtpObj = smtplib.SMTP('smtp.office365.com', 587)
smtpObj.ehlo()
smtpObj.starttls()
smtpObj.login(my_email, my_password)
message = """From: My Name <email address>
To: My Name <email address>
Subject: Test successful!
"""
This execution problem,
"This works correctly when I run it line by line from within RStudio. The problem is that it doesn't work when the script runs on schedule,"
can stem from multiple reasons:
You have multiple Python versions, where smtplib is installed in one version (e.g., Python 2.7 or Python 3.6) and not the other. Check which Python is being used at the command line, Rscript -e "print(Sys.which('python'))", and in RStudio, Sys.which("python"). Explicitly define which Python executable to run with reticulate's use_python("/path/to/python").
You have multiple R versions, where Rscript uses a different version than RStudio. Check R.home() in both: Rscript -e "print(R.home())" at the command line and R.home() in RStudio. Explicitly call the Rscript from the appropriate R version's bin folder: "/path/to/R #.#/bin/Rscript" "/path/to/code.R".
You have multiple reticulate packages installed for the same R version, residing in different library locations, each calling a different Python version. Check with the matrix returned by installed.packages(), locating the reticulate row. Explicitly load the right copy with library(reticulate, lib.loc="/path/to/specific/library").

How to use gnomevfs.get_mime_type()?

The Python GNOME bindings have a gnomevfs module that can, in theory, get MIME types. But calling gnomevfs.get_mime_type() with any name other than "/dev/null" raises the error "RuntimeError: there was an error reading the file". For example:
import gnomevfs
gnomevfs.get_mime_type( "/tmp/a.py" )
gnomevfs.get_mime_type( "file://tmp/a.py" )
gnomevfs.get_mime_type( "file:///tmp/a.py" )
gnomevfs.get_mime_type( "file://./tmp/a.py" )
These all fail, with any file or folder name except "/dev/null" :(. /tmp/a.py exists and is accessible. Any suggestions?
It works for me. Have you tried with other files in other directories? Are you sure that the user under which Python is being run has access to /tmp/a.py?
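As a fallback that sidesteps gnomevfs entirely, Python's standard-library mimetypes module guesses a MIME type from the file name alone, so no file access (and no permission issue) is involved. A minimal sketch:

```python
import mimetypes

# guess_type inspects only the filename extension; the file need not exist.
mime, encoding = mimetypes.guess_type("/tmp/a.py")
print(mime)  # text/x-python
```

This does not inspect file contents the way gnomevfs does, so it is only a heuristic based on the extension.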
