How to use the FS gateway handler in Python

Currently I'm trying to copy from a local directory to an HDFS directory using the FS handler from the Java gateway in Python; the script will later be run using spark-submit.
My First attempt:
fs.copyFromLocalFile(spark._jvm.org.apache.hadoop.fs.Path(local_dir,hdfs_file_path+v_file_dt))
Second attempt:
def copy_from_local_file(spark, logger, local_dir, hdfspath, delSrc=True, overwrite=True):
    # copyFromLocalFile(boolean delSrc, boolean overwrite, Path src, Path dst)
    try:
        fs(spark).copyFromLocalFile(delSrc, overwrite, spark._jvm.org.apache.hadoop.fs.Path(local_dir), spark._jvm.org.apache.hadoop.fs.Path(hdfspath))
        logger.info("copyFromLocal {} to {} success".format(local_dir, hdfspath))
    except Exception as e:
        logger.error(e)
        logger.info("copyFromLocal {} to {} error".format(local_dir, hdfspath))
Error for the second attempt:
ERROR:spark-ingestor:'JavaObject' object is not callable
Since I'm not familiar with how the FS functions work, I simply used copyFromLocalFile, which I assumed mirrors the "-put" command from the basic hdfs CLI.
However, neither method copies any file to the HDFS directory. I'd appreciate any help on this issue. Thanks!
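For reference, a minimal sketch of the usual py4j pattern (assuming an active SparkSession named spark and that local_dir and hdfspath are plain path strings): the FileSystem object has to be obtained via FileSystem.get before copyFromLocalFile can be called on it, and calling a JavaObject like a function, as in fs(spark), raises exactly the "'JavaObject' object is not callable" error shown above.
hadoop_conf = spark._jsc.hadoopConfiguration()
jvm = spark._jvm
fs = jvm.org.apache.hadoop.fs.FileSystem.get(hadoop_conf)
src = jvm.org.apache.hadoop.fs.Path(local_dir)
dst = jvm.org.apache.hadoop.fs.Path(hdfspath)
# copyFromLocalFile(delSrc, overwrite, src, dst) -- same signature as the Java API
fs.copyFromLocalFile(True, True, src, dst)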

Related

Unable to run another python script in azure function using python

I have created an Event Grid triggered Azure Function in Python. I have deployed my solution to Azure successfully and the execution is working fine. But I have an issue with calling another Python script in the same folder location. My code is given below:
import os, json, subprocess
import logging
import azure.functions as func

def main(event: func.EventGridEvent):
    try:
        correctionsMessages = event.get_json()
        for correctionMessage in correctionsMessages:
            strMessage = json.dumps(correctionMessage)
            full_path_to_script = os.path.join(os.path.dirname(os.path.realpath(__file__)) + '/' + correctionMessage['ScriptName'] + '.py')
            logging.info('Script Path: %s', full_path_to_script)
            logging.info('Parameter: %s', json.dumps(correctionMessage))
            subprocess.check_call('python ' + full_path_to_script + ' ' + json.dumps(strMessage))
        result = json.dumps({
            'id': event.id,
            'data': event.get_json(),
            'topic': event.topic,
            'subject': event.subject,
            'event_type': event.event_type,
        })
        logging.info('Python EventGrid trigger processed an event: %s', result)
    except Exception as e:
        logging.info('Error: %s', e)
The above code gives an error for subprocess.check_call. The error is "Error: [Errno 2] No such file or directory: 'python /home/site/wwwroot/Detections/Script1.py'". Script1.py is in the same folder as __init__.py. When I run this function locally, it works absolutely fine.
Per my experience, the error was caused by the subprocess.check_call function not knowing the path of the python executable, not by the path of Script1.py.
In your local Azure Functions development environment, the python path is configured in the local PATH environment variable, so subprocess.check_call can invoke python by searching for the executable along those paths. But in the cloud there is no python entry pre-configured in that environment variable; only the Azure Functions host knows the real absolute path of Python.
So the solution is to find out the real absolute path of Python and use it instead of python in your code.
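One common way to do that, as a sketch: sys.executable holds the absolute path of the interpreter currently running the function, so it resolves correctly both locally and on the host, and passing a list instead of one string also sidesteps shell-parsing issues.
import sys, subprocess
# sys.executable is the absolute path of the running interpreter,
# so this works without relying on PATH
subprocess.check_call([sys.executable, full_path_to_script, json.dumps(strMessage)])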
However, in the Azure Functions Python runtime, I don't think it's a good idea to use subprocess.check_call to spawn a child process to handle a given message. The safer and more correct way is to define a function in Script1.py (or directly in __init__.py) and pass the given message to it as a parameter to achieve the same feature.
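A minimal sketch of that approach, assuming Script1.py lives in the same function folder and defines a hypothetical process_message(message) entry point (the function name is an assumption, not part of the original code):
import json
import azure.functions as func

from . import Script1  # same-folder module inside the function package

def main(event: func.EventGridEvent):
    for correctionMessage in event.get_json():
        # call the script's entry point in-process instead of spawning python
        Script1.process_message(json.dumps(correctionMessage))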

subprocess.Popen fails in nginx

I am developing a simple website using Flask + gunicorn + nginx on a Raspberry Pi with Raspbian Jessie.
I am stuck at launching a process with this Python code:
def which(program):
    def is_exe(fpath):
        return os.path.isfile(fpath) and os.access(fpath, os.X_OK)

    fpath, fname = os.path.split(program)
    if fpath:
        if is_exe(program):
            return program
    else:
        for path in os.environ["PATH"].split(os.pathsep):
            path = path.strip('"')
            exe_file = os.path.join(path, program)
            if is_exe(exe_file):
                return exe_file
    return None

mplayer_path = which("mplayer")
try:
    player = subprocess.Popen([mplayer_path, mp3], stdin=subprocess.PIPE)
except:
    return render_template('no_mp3s.html', mp3_message=sys.exc_info())
"mp3" is the path to an mp3 file while "mplayer_path" is the absolute path to mplayer, as returned by the which function described in this answer.
The code works in development when I launch flask directly. In production, when I access the website through nginx, I get the following error message through the no_mp3s.html template:
<type 'exceptions.AttributeError'>
AttributeError("'NoneType' object has no attribute 'rfind'",)
<traceback object at 0x7612ab98>
I suspect a permissions issue with nginx, but being very new to Linux I am a bit lost!
Edit:
I should add that nowhere in my code (which fits in a single file) do I call rfind(). Also, I am sure that the error is caught in this specific try/except because it is the only one that outputs to no_mp3s.html.
Edit:
Following blubberdiblub's comments, I found out that it is the which function that does not work when the app is run under nginx. Hard-coding the path to mplayer seems to work!
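A minimal sketch of that workaround (the binary location is an assumption; verify it with `which mplayer` in a shell, since the nginx/gunicorn service environment may run with a stripped-down PATH):
import subprocess

MPLAYER_PATH = "/usr/bin/mplayer"  # assumed location; confirm on the Pi
player = subprocess.Popen([MPLAYER_PATH, mp3], stdin=subprocess.PIPE)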

Python NamedTemporaryFile - ValueError When Reading

I am having an issue writing to a NamedTemporaryFile in Python and then reading it back. The function downloads a file via tftpy to the temp file, reads it, hashes the contents, and then compares the hash digest to the original file. The function in question is below:
def verify_upload(self, image, destination):
    # create a tftp client
    client = TftpClient(ip, 69, localip=self.binding_ip)
    # generate a temp file to hold the download info
    if not os.path.exists("temp"):
        os.makedirs("temp")
    with NamedTemporaryFile(dir="temp") as tempfile, open(image, 'r') as original:
        try:
            # attempt to download the target image
            client.download(destination, tempfile, timeout=self.download_timeout)
        except TftpTimeout:
            raise RuntimeError("Could not download {0} from {1} for verification".format(destination, self.target_ip))
        # hash the original file and the downloaded version
        original_digest = hashlib.sha256(original.read()).hexdigest()
        uploaded_digest = hashlib.sha256(tempfile.read()).hexdigest()
        if self.verbose:
            print "Original SHA-256: {0}\nUploaded SHA-256: {1}".format(original_digest, uploaded_digest)
        # return the hash comparison
        return original_digest == uploaded_digest
The problem is that every time I try to execute the line uploaded_digest = hashlib.sha256(tempfile.read()).hexdigest(), the application errors out with a ValueError - I/O operation on closed file. Since the with block is not complete, I am struggling to understand why the temp file would be closed. The only possibility I can think of is that tftpy is closing the file after doing the download, but I cannot find any point in the tftpy source where this would happen. Note that I have also tried inserting the line tempfile.seek(0) in order to put the file back in a proper state for reading; however, this also gives me the ValueError.
Is tftpy closing the file, possibly? I read that there is possibly a bug in NamedTemporaryFile causing this problem. Why is the file closed before the reference defined by the with block goes out of scope?
TFTPy is closing the file. When you were looking at the source, you missed the following code path:
class TftpClient(TftpSession):
    ...
    def download(self, filename, output, packethook=None, timeout=SOCK_TIMEOUT):
        ...
        self.context = TftpContextClientDownload(self.host,
                                                 self.iport,
                                                 filename,
                                                 output,
                                                 self.options,
                                                 packethook,
                                                 timeout,
                                                 localip=self.localip)
        self.context.start()
        # Download happens here
        self.context.end()  # <--
TftpClient.download calls TftpContextClientDownload.end:
class TftpContextClientDownload(TftpContext):
    ...
    def end(self):
        """Finish up the context."""
        TftpContext.end(self)  # <--
        self.metrics.end_time = time.time()
        log.debug("Set metrics.end_time to %s", self.metrics.end_time)
        self.metrics.compute()
TftpContextClientDownload.end calls TftpContext.end:
class TftpContext(object):
    ...
    def end(self):
        """Perform session cleanup, since the end method should always be
        called explicitely by the calling code, this works better than the
        destructor."""
        log.debug("in TftpContext.end")
        self.sock.close()
        if self.fileobj is not None and not self.fileobj.closed:
            log.debug("self.fileobj is open - closing")
            self.fileobj.close()  # <--
and TftpContext.end closes the file.
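Given that behaviour, one workaround sketch (using the names from the question; delete=False is needed so the file survives tftpy's close and can be reopened by name afterwards):
from tempfile import NamedTemporaryFile
import hashlib
import os

tmp = NamedTemporaryFile(dir="temp", delete=False)
try:
    client.download(destination, tmp, timeout=self.download_timeout)
    # tftpy has closed tmp at this point, so reopen it by name
    with open(tmp.name, 'rb') as downloaded:
        uploaded_digest = hashlib.sha256(downloaded.read()).hexdigest()
finally:
    os.remove(tmp.name)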

Python fabric calling script "remote path"

I'm using Fabric to connect to a remote host. Once there, I try to call a script that I made (it parses the file I give as an argument). But when I call the script from inside my fabfile.py, it assumes the path I gave is on the machine I launch the fabfile from (so not my remote host).
In my fabfile.py I have:
import servclasse

env.host = 'host1'

def listconf():
    # here I browse to the correct folder
    # this is where I want it to open the host1:my.file file
    # and instantiate a class from what it parsed
    s = servclasse.Server("my.file")
If I do this, it tries to open the file from the folder where servclasse.py is. Is there a way to give a "remote path" as the argument? I would rather not download the file.
Should I upload the servclasse.py script with operations.put before calling it?
Edit: more info
In my servclasse I have this:
def __init__(self, path):
    self.config = ConfigParser.ConfigParser(allow_no_value=True)
    self.config.readfp(open(path))
The open() call was the problem.
I figured out how to do it, so I'll drop it here in case someone reads this topic one day:
def listconf():
    # first I browse to the correct folder, then
    contents = StringIO.StringIO()
    get("MyFile", contents)
    contents.seek(0)
    s = Server(contents)
and in servclasse.py:
def __init__(self, objfile):
    self.config = ConfigParser.ConfigParser(allow_no_value=True)
    self.config.readfp(objfile)
    # and I do my stuff

run an Excel Macro from Python (but an Excel process stays in memory)

With the following Python method, I make a call to an Excel macro. I was happy when I got that to work; however, every time I executed it I could see a Windows temporary/lock file with the same name as the .XLA I was running the macro from.
class XlaMakeUnmake:
    __configFile = 'xla_svn.cfg'
    __vbaTools = 'makeProtected.xla'
    __entries = {}

    def __init__(self):
        self.__readConfig()

    def __newDirs(self, dir):
        try:
            os.makedirs(dir)
        except OSError, e:
            if e.errno != errno.EEXIST:
                raise

    ### WILL ONLY WORK FOR UNPROTECTED
    '''
    filePath: path of XLA to unmake
    outputDir: folder in which the lightweight xla and its components will be pushed to
    '''
    def unmake(self, filePath, outputDir):
        xl = win32com.client.Dispatch("Excel.Application")
        xl.Visible = 0
        self.__newDirs(outputDir)
        xl.Application.Run("'" + os.getcwd() + os.sep + self.__vbaTools + "'!Unmake", filePath, outputDir)
When I open the Task Manager, I can see that an Excel process is still running... How do I kill it, but in a clean fashion, when the job is done? Is xl.Application.Run launching an asynchronous call to the macro? In that case it might be tricky...
Thanks guys!! ;)
I don't know Python, but you need to use:
xl.Quit
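In Python via win32com, that call would look roughly like this (a sketch using the xl object from the question; dropping the Python reference afterwards lets COM actually tear the process down):
xl.Quit()  # ask Excel to exit once the macro has finished
del xl     # release the COM reference so the process can terminate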
