Pyinstaller not allowing multiprocessing with MacOS - python

I have a Python file that I would like to package as an executable for macOS 11.6.
The Python file (called Service.py) relies on one other JSON file and runs perfectly fine when run with Python. My file uses argparse, as the arguments can differ depending on what is needed.
Example of how the file is called with python:
python3 Service.py -v Zephyr_Scale_Cloud https://myurl.cloud/ philippa@email.com password1 group3
The file is run in exactly the same way when it is an executable:
./Service.py -v Zephyr_Scale_Cloud https://myurl.cloud/ philippa@email.com password1 group3
I can package the file using PyInstaller and the executable runs.
Command used to package the file:
pyinstaller --paths=/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/ Service.py
However, when I get to the point that requires multiprocessing, the arguments get lost. My second argument (here noted as https://myurl.cloud) is a URL that I require.
The error I see is:
[MAIN] Starting new process RUNID9157
url before constructing the client recognised as pipe_handle=15
usage: Service [-h] test_management_tool url
Service: error: the following arguments are required: url
Traceback (most recent call last):
File "urllib3/connection.py", line 174, in _new_conn
File "urllib3/util/connection.py", line 72, in create_connection
File "socket.py", line 954, in getaddrinfo
I have done some logging and the url does get read correctly. But as soon as the new process starts and picks up what it needs, the url is changed to 'pipe_handle=x'; in the output above it is pipe_handle=15.
I need the url to retrieve an authentication token, but it just stops being read as the correct value and is changed to this pipe_handle value. I have no idea why.
Has anyone else seen this?!
I am using Python 3.9, PyInstaller 4.4 and ArgParse.
I have also added
if __name__ == "__main__":
    if sys.platform.startswith('win'):
        # On Windows - multiprocessing is different to Unix and Mac.
        multiprocessing.freeze_support()
to my if __name__ == "__main__" section, as I saw this on other posts, but it doesn't help.
Can someone please assist?

Sending commands via sys.argv is complicated by the fact that multiprocessing's "spawn" start method uses that to pass the file descriptors for the initial communication pipes between the parent and child.
I'm projecting here a little, because you did not share the code showing how/where you call argparse and how/where you call multiprocessing.
If you are parsing args outside of if __name__ == "__main__":, the args may get parsed (re-parsed on child import __main__) before sys.argv gets automatically cleaned up by multiprocessing.spawn.prepare() in the child. You should be able to fix this by moving the argparse stuff inside your target function. It also may be easier to parse the args in the parent, and simply send the parsed results as an argument to the target function. See this answer of mine for further discussion on sys.argv with multiprocessing.
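To make that concrete, here is a minimal sketch of that pattern (the two positional arguments are taken from the usage line above; the worker body is just a placeholder):
import argparse
import multiprocessing

def worker(ns):
    # The parsed Namespace arrives as an ordinary picklable argument,
    # so the child never has to look at sys.argv (which multiprocessing
    # repurposes for its pipe handles under the "spawn" start method).
    print("url in child:", ns.url)

if __name__ == "__main__":
    multiprocessing.freeze_support()  # harmless when not frozen, required in a PyInstaller build
    parser = argparse.ArgumentParser(prog="Service")
    parser.add_argument("test_management_tool")
    parser.add_argument("url")
    ns = parser.parse_args()
    p = multiprocessing.Process(target=worker, args=(ns,))
    p.start()
    p.join()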

Related

Python multiprocessing throws error with argparse and pyinstaller

In my project, I'm using argparse to pass arguments, and somewhere in the script I'm using multiprocessing to do the rest of the calculations. The script works fine if I call it from the command prompt, for example:
python complete_script.py --arg1=xy --arg2=yz
But after converting it to an exe using PyInstaller with the command "pyinstaller --onefile complete_script.py", it throws the error:
error: unrecognized arguments: --multiprocessing-fork 1448
Any suggestions on how I could make this work, or any other alternative? My goal is to create an exe application which I can call on another system where Python is not installed.
Here are the details of my workstation:
Platform: Windows 10
Python : 2.7.13 <installed using Anaconda>
multiprocessing : 0.70a1
argparse: 1.1
Copied from comment:
def main():
    main_parser = argparse.ArgumentParser()
    <added up arguments here>
    all_inputs = main_parser.parse_args()
    wrap_function(all_inputs)

def wrap_function(all_inputs):
    <Some calculation here>
    distribute_function(<input array for multiprocessing>)

def distribute_function(<input array>):
    pool = Pool(processes=cpu_count())
    jobs = [pool.apply_async(target_functions, args=(i,)) for i in input_array]
    pool.close()
(A bit late but it can be useful for someone else in the future...)
I had the same problem, after some research I found this multiprocessing pyInstaller recipe that states:
When using the multiprocessing module, you must call
multiprocessing.freeze_support()
straight after the if __name__ == '__main__': line of the main module.
Please read the Python library manual about multiprocessing.freeze_support for more information.
Adding that line of code solved the problem for me.
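For reference, a minimal sketch of where the call goes in a frozen script (the worker function here is just a placeholder):
import multiprocessing

def work():
    print("hello from the child process")

if __name__ == '__main__':
    # Must come immediately after the __main__ guard so the frozen child
    # process can bootstrap itself instead of re-running the parent logic.
    multiprocessing.freeze_support()
    p = multiprocessing.Process(target=work)
    p.start()
    p.join()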
I may be explaining the obvious, but you don't give us much information to work with.
python complete_script.py --arg1=xy --arg2=yz
This sort of call tells me that your parser is setup to accept at least these 2 arguments, ones flagged with '--arg1' and '--arg2'.
The error tells me that this parser (or maybe some other) is also seeing this string:
--multiprocessing-fork 1448
Possibly generated by the multiprocessing code. It would be good to see the usage part of the error, just to confirm which parser is complaining.
One of my first open source contributions to Python was to enhance the warnings about multiprocessing on Windows.
https://docs.python.org/2/library/multiprocessing.html#windows
Is your parser protected by an if __name__ block? Should this particular parser be called when run in a fork? You probably designed the parser to work when the program is called as a standalone script. But what happens when it is imported?

PYTHONPATH variable missing when using os.execlpe to restart script as root

My end goal is to have a script that can be initially launched by a non-privileged user without using sudo, but will prompt for sudo password and self-elevate to root. I've been doing this with a bash wrapper script but would like something tidier that doesn't need an additional file.
Some googling found this question on StackOverflow, where the accepted answer suggests using os.execlpe to re-launch the script while retaining the same environment. I tried it, but it immediately failed to import a non-built-in module on the second run.
Investigating revealed that the PYTHONPATH variable is not carried over, while almost every other environment variable is (PERL5LIB is also missing, and a couple of others, but I'm not using them so they're not troubling me).
I have a brief little test script that demonstrates the issue:
#!/usr/bin/env python
import os
import sys

print(len(os.environ['PYTHONPATH']))
euid = os.geteuid()
if euid != 0:
    print("Script not started as root. Running with sudo.")
    args = ['sudo', sys.executable] + sys.argv + [os.environ]
    os.execlpe('sudo', *args)
print("Success")
Expected output would be:
6548
Script not started as root. Running with sudo.
[sudo] password for esker:
6548
Success
But instead I'm getting a KeyError:
6548
Script not started as root. Running with sudo.
[sudo] password for esker:
Traceback (most recent call last):
File "/usr/home/esker/execlpe_test.py", line 5, in <module>
print(len(os.environ['PYTHONPATH']))
File "/vol/apps/python/2.7.6/lib/python2.7/UserDict.py", line 23, in __getitem__
raise KeyError(key)
KeyError: 'PYTHONPATH'
What would be the cause of this missing variable, and how can I avoid it disappearing? Alternatively, is there a better way about doing this that won't result in running into the problem?
I found this very weird too, and couldn't find any direct way to pass the environment into the replaced process. But I didn't do full system debugging either.
What I found to work as a workaround is this:
pypath = os.environ.get('PYTHONPATH', "")
args = ['sudo', f"PYTHONPATH={pypath}", sys.executable] + sys.argv
os.execvpe('sudo', args, os.environ)
I.e. explicitly pass PYTHONPATH= to the new process. Note that I prefer to use os.execvpe(), but it works the same with the other exec*(), given the correct call. See this answer for a good overview of the schema.
However, PATH and the rest of the environment is still its own environment, as an initial print(os.environ) shows. But PYTHONPATH will be passed on this way.
You're passing the environment as arguments to your script instead of arguments to execlpe. Try this instead:
args = ['sudo', sys.executable] + sys.argv
os.execvpe('sudo', args, os.environ)
If you just want to inherit the environment you can even
os.execvp('sudo', args)
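Putting the two answers together, a self-elevating script might look roughly like this (a sketch; whether sudo keeps or strips PYTHONPATH still depends on your sudoers env_reset/env_keep settings):
#!/usr/bin/env python
import os
import sys

if os.geteuid() != 0:
    print("Script not started as root. Running with sudo.")
    # Pass PYTHONPATH explicitly, since sudo usually resets it.
    pypath = os.environ.get('PYTHONPATH', '')
    args = ['sudo', 'PYTHONPATH=%s' % pypath, sys.executable] + sys.argv
    os.execvpe('sudo', args, os.environ)

print("Success")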

Using python subprocess to fake running a cmd from a terminal

We have a vendor-supplied Python tool (it's byte-compiled; we don't have the source). Because of this, we're also locked into using the vendor-supplied Python 2.4. The way to run the util is:
source login.sh
oupload [options]
The login.sh just sets a few env variables, and then 2 aliases:
odownload () {
    ${PYTHON_CMD} ${OCLIPATH}/ocli/commands/word_download_command.pyc "$@"
}
oupload () {
    ${PYTHON_CMD} ${OCLIPATH}/ocli/commands/word_upload_command.pyc "$@"
}
Now, when I run it their way it works fine. It will prompt for a username and password, then do its thing.
I'm trying to create a wrapper around the tool to do some extra steps after it's run and provide some sane defaults for the utility. The problem I'm running into is I cannot, for the life of me, figure out how to use subprocess to successfully do this. It seems to realize that the original command isn't running directly from the terminal and bails.
I created a '/usr/local/bin/oupload' and copied from the original login.sh. The only difference is that instead of defining an alias at the end, I actually run the command.
Then, in my python script, I try to run my new shell script:
if os.path.exists(options.zipfile):
    try:
        cmd = string.join(cmdargs, ' ')
        p1 = Popen(cmd, shell=True, stdin=PIPE)
But I get:
Enter Opsware Username: Traceback (most recent call last):
File "./command.py", line 31, in main
File "./controller.py", line 51, in handle
File "./controllers/word_upload_controller.py", line 81, in _handle
File "./controller.py", line 66, in _determineNew
File "./lib/util.py", line 83, in determineNew
File "./lib/util.py", line 112, in getAuth
Empty Username not legal
Unknown Error Encountered
SUMMARY:
Name: Empty Username not legal
Description: None
So it seemed like an extra carriage return was getting sent (I tried rstripping all the options; it didn't help).
If I don't set stdin=PIPE, I get:
Enter Opsware Username: Traceback (most recent call last):
File "./command.py", line 31, in main
File "./controller.py", line 51, in handle
File "./controllers/word_upload_controller.py", line 81, in _handle
File "./controller.py", line 66, in _determineNew
File "./lib/util.py", line 83, in determineNew
File "./lib/util.py", line 109, in getAuth
IOError: [Errno 5] Input/output error
Unknown Error Encountered
I've tried other variations of using p1.communicate and p1.stdin.write() along with shell=False and shell=True, but I've had no luck in trying to figure out how to properly send along the username and password. As a last resort, I tried looking at the byte code for the utility they provided - it didn't help - once I called the util's main routine with the proper arguments, it ended up core dumping with thread errors.
Final thoughts - the utility doesn't seem to want to 'wait' for any input. When run from the shell, it pauses at the 'Username' prompt. When run through Python's popen, it just blazes through and ends, assuming no password was given. I tried to look up ways of maybe preloading the stdin buffer - thinking maybe the process would read from that if it was available - but couldn't figure out if that was possible.
I'm trying to stay away from using pexpect, mainly because we have to use the vendor-provided Python 2.4 (because of the precompiled libraries they ship) and I'm trying to keep the script's distribution footprint as minimal as possible. If I have to, I have to, but I'd rather not use it (and I honestly have no idea if it works in this situation either).
Any thoughts on what else I could try would be most appreciated.
UPDATE
So I solved this by diving further into the bytecode and figuring out what I was missing from the compiled command.
However, this presented two problems -
1. The vendor code, when called, was doing an exit when it completed.
2. The vendor code was writing to stdout, which I needed to store and operate on (it contains the ID of the uploaded pkg). I couldn't just redirect stdout, because the vendor code was still asking for the username/password.
1 was solved easily enough by wrapping their code in a try/except clause.
2 was solved by doing something similar to: https://stackoverflow.com/a/616672/677373
Instead of a log file, I used cStringIO. I also had to implement a fake 'flush' method, since it seems the vendor code was calling that and complaining that the new object I had provided for stdout didn't supply it. The code ends up looking like:
class Logger(object):
    def __init__(self):
        self.terminal = sys.stdout
        self.log = StringIO()

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)

    def flush(self):
        self.terminal.flush()
        self.log.flush()

if os.path.exists(options.zipfile):
    try:
        os.environ['OCLI_CODESET'] = 'ISO-8859-1'
        backup = sys.stdout
        sys.stdout = output = Logger()
        # UploadCommand was the command found in the bytecode
        upload = UploadCommand()
        try:
            upload.main(cmdargs)
        except Exception, rc:
            pass
        sys.stdout = backup
        # now do some fancy stuff with output from output.log
I should note that the only reason I simply do a 'pass' in the except: clause is that the except clause is always called. The 'rc' is actually the return code from the command, so I will probably add handling for non-zero cases.
I tried to lookup ways of maybe preloading the stdin buffer
Do you perhaps want to create a named fifo, fill it with username/password info, then reopen it in read mode and pass it to popen (as in popen(..., stdin=myfilledbuffer))?
You could also just create an ordinary temporary file, write the data to it, and reopen it in read mode, again, passing the reopened handle as stdin. (This is something I'd personally avoid doing, since writing username/passwords to temporary files is often of the bad. OTOH it's easier to test than FIFOs)
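A rough sketch of the temporary-file variant (the command name is taken from the question; the credentials and zip file name are made up for illustration):
import subprocess
import tempfile

with tempfile.TemporaryFile() as answers:
    # Pre-fill the file with the two lines the tool reads from stdin,
    # then rewind it and hand it to Popen as the child's stdin.
    answers.write(b"myuser\nmypassword\n")
    answers.seek(0)
    proc = subprocess.Popen(["oupload", "somefile.zip"],
                            stdin=answers, stdout=subprocess.PIPE)
    out, _ = proc.communicate()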
As for the underlying cause: I suspect that the offending software is reading from stdin via a non-blocking method. Not sure why that works when connected to a terminal.
AAAANYWAY: no need to use pipes directly via Popen at all, right? I kinda laugh at the hackishness of this, but I'll bet it'll work for you:
# you don't actually seem to need popen here IMO -- call() does better for this application.
statuscode = call('echo "%s\n%s\n" | oupload %s' % (username, password, options), shell=True)
tested with status = call('echo "foo\nbar\nbar\nbaz" |wc -l', shell = True) (output is '4', naturally.)
The original question was solved by just avoiding the issue: instead of going through the terminal, I imported the Python code that the shell script was calling and used that directly.
I believe J.F. Sebastian's answer would probably work better for what was originally asked, however, so I'd suggest people looking for an answer to a similar question look down the path of using the pty module.
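For anyone who does go down the pty path, here is an untested sketch of the idea (Python 3 syntax; 'oupload' is the vendor command from the question and the credentials are placeholders):
import os
import pty
import subprocess

# Give the child a pseudo-terminal as stdin so it believes an interactive
# user is typing, then answer its prompts by writing to the master side.
master_fd, slave_fd = pty.openpty()
proc = subprocess.Popen(['oupload', 'somefile.zip'],
                        stdin=slave_fd, stdout=subprocess.PIPE)
os.close(slave_fd)                    # only the child needs the slave end
os.write(master_fd, b'myuser\n')      # answers the "Enter Opsware Username:" prompt
os.write(master_fd, b'mypassword\n')  # answers the password prompt
out, _ = proc.communicate()
os.close(master_fd)
print(out.decode())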

Can't seem to get fortran executable to run correctly through python

I have read a bunch of different topics on SO and other sites and cannot get a direct answer to my question/problem. Currently I have this python script that runs completely fine, with the exception of no calls made to run a fortran program are working correctly. I have tried using subprocess commands, os.system commands, opening bash script files that are opened through python, and no luck. Here are some examples and errors I'm getting.
One attempt:
subprocess.Popen(["sh", "{0}{1}".format(SCRIPTS,"qlmtconvertf.sh"), "qlmt"], shell=False, stdout=subprocess.PIPE)
This gives an error that the program has trouble reading the file correctly.
forrtl: severe (24): end-of-file during read, unit 1, file /home/akoufos/lapw/Ar/lda/bcc55_mt1.5_lo_e8_o4/DOS/lat70/qlmt
Another attempt:
subprocess.Popen(["./{0}{1}".format(SOURCE,"qlmtconvertf"), "qlmt"], shell=False, stdout=subprocess.PIPE)
This gives an error of not finding the file.
File "/home/akoufos/lapw/Scripts_Plots/LAPWanalysis.py", line 59, in DOS
subprocess.Popen(["./{0}{1}".format(SOURCE,"qlmtconvertf"), "qlmt"], shell=False, stdout=subprocess.PIPE)
File "/usr/lib64/python2.7/subprocess.py", line 672, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1202, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Yet another attempt:
os.system("{0}{1}".format(SOURCE,"qlmtconvertf qlmt"))
This gives an error equivalent to the first example. In all examples SOURCE="/home/myusername/lapw/Source/", where the Fortran source files are, SCRIPTS="/home/myusername/lapw/Scripts_Plots/", where I keep other files and the Python scripts, qlmtconvertf is a compiled Fortran program, and qlmt is a file that qlmtconvertf reads. This code works completely fine if I call it in the shell, as I have done countless times, but I'm trying to automate calling it. I have written a bash script as well that does what I need, but I'm trying to do everything through Python instead. Any ideas, suggestions, or answers as to what I am doing incorrectly would be greatly appreciated. Thank you all in advance.
EDIT: I got it working with the suggestion given below by Francis. I had to keep the complete paths (i.e. /home/username/etc) and the os.path.join to call the program correctly.
import os.path
LAPW = "/home/myusername/lapw/"
SOURCE = os.path.join(LAPW,'Source')
SCRIPTS = os.path.join(LAPW,'Scripts_Plots')
QLMTCONVERT = os.path.join(SOURCE,'qlmtconvertf')
qargs = [QLMTCONVERT,'qlmt']
#CALLING PROGRAM
subprocess.Popen(qargs, stdout=subprocess.PIPE).communicate(input=None)
To get it to work correctly I had to also close the 'qlmt' file I had created during the python script. Also I am working in the directory that contains the 'qlmt' file.
(edit: Also added .communicate(input=None) to the end of the subprocess call. This was unnecessary for this particular call, but it was important for a later one I made in the script that tried to use a file the process was creating. From my understanding, .communicate talks to the process and basically waits for it to finish before the next Python line is executed. Similar to .wait(), but more advanced. If someone who understands this more wants to elaborate, please feel free. edit)
I'm not exactly sure why this method worked, but using strings as inputs for the subprocess was giving errors. If any one has any insight on this I would be very thankful if you could pass on your knowledge. Thank you everyone for the help.
I think you forgot a slash in your filenames:
"{0}{1}".format(SOURCE,"qlmtconvertf qlmt") == '/home/myusername/lapw/Sourceqlmtconvertf qlmt'
I assume you mean this?
"{0}/{1}".format(SOURCE,"qlmtconvertf qlmt") == '/home/myusername/lapw/Source/qlmtconvertf qlmt'
I recommend using os.path.join rather than direct string construction for pathname creation:
import os.path
import subprocess

executable = os.path.join(SOURCE, 'qlmtconvertf')
args = ['qlmt']
subprocess.Popen([executable] + args, stdout=subprocess.PIPE)

Loading a file just once on initialization of python script using Mod_WSGI and Bottle

I am pretty new to Python, mod_wsgi and Bottle. My main problem is that when the process is run using mod_wsgi, I want it to load a file once on initialization. When running a script in a terminal you would just use if __name__ == '__main__'.
I need it to load the file once on initialization (or when first called) so that any subsequent calls to the process does not require the file to be reloaded. I am unsure of how to do this.
The following code is run whenever someone goes to the recommend page
@route('/recommend')
def recommend():
    parser = OptionParser(usage="usage: %prog [options]")
    parser.add_option('-f', '--file', default='data.csv', help='Specify csv file to read item data from.')
    parser.add_option('-D', '--debug', action='store_true', dest='debug', help='Put bottle in debug mode.')
    (options, args) = parser.parse_args()
    return res.recommend(request)
How do I do the first 4 lines (ones involving parser) just on initialization so that I just need to call the res.recommend() whenever the recommend page is accessed?
Any help is appreciated,
Mo
For daemon mode, place it at global scope in your WSGI script file. That file is only loaded once per process. This would normally be on the first request which maps to that application.
For embedded mode, if you modify a WSGI script file it can be reloaded again in the same process. In that case, and still even for daemon mode if you wanted to, use a separate script file and use the WSGIImportScript directive to force loading of it on process start.
See:
http://code.google.com/p/modwsgi/wiki/ConfigurationDirectives#WSGIImportScript
You will need to know, though, what process group/application group your WSGI application is running in for it to be loaded in the same sub-interpreter, so also look at the WSGIProcessGroup/WSGIApplicationGroup directives.
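As a minimal sketch of what that looks like for the Bottle app above (the file path and response are placeholders, not the asker's actual recommender code):
# myapp.wsgi -- loaded once per process by mod_wsgi (daemon mode),
# so the module-level code below runs exactly once.
import bottle
from bottle import route

# One-time initialisation at import time, not inside a request handler.
ITEM_DATA = open('/path/to/data.csv').read()   # placeholder path

@route('/recommend')
def recommend():
    # Handlers only read the already-loaded data.
    return "loaded %d bytes of item data" % len(ITEM_DATA)

application = bottle.default_app()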
Python modules are run only the first time you load them.
Subsequent imports don't run the code again.
E.g.
mod.py:
x = 10
print(x)
main.py:
import mod #will print 10
mod.x = 5
import mod #nothing is printed. mod.x == 5
What you're actually talking about is caching the results of the file read.
We'll keep this simple:
datacache = None

@route("/someroute")
def someroute():
    global datacache  # without this, the assignment below would create a new local variable
    if not datacache:
        datacache = do_something_clever_with_file(open("filename"))
    page = make_page_from_data(datacache)
    return page
Also, parsing the script input argument in a web method is just bad form. Akin to leaving a wet fish inside your co-worker's desk.
Instead, have a configuration file with the options in there and read the configuration file.
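For example, something along these lines (the file name and keys are hypothetical):
import configparser   # 'ConfigParser' on Python 2

config = configparser.ConfigParser()
config.read('recommend.ini')
DATA_FILE = config.get('recommend', 'file', fallback='data.csv')
DEBUG = config.getboolean('recommend', 'debug', fallback=False)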
For the braver amongst you, have a look at a memoizing decorator; turning this into a cache is left as an exercise for the reader, since caching is just memoization with expiry.
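A tiny sketch of such a decorator (the expiry window and the wrapped function are illustrative only):
import time
import functools

def cached(ttl_seconds):
    # Memoization with expiry: keep (value, timestamp) per argument tuple.
    def decorator(func):
        store = {}
        @functools.wraps(func)
        def wrapper(*args):
            now = time.time()
            if args in store and now - store[args][1] < ttl_seconds:
                return store[args][0]
            value = func(*args)
            store[args] = (value, now)
            return value
        return wrapper
    return decorator

@cached(ttl_seconds=300)
def load_item_data(path):
    with open(path) as f:
        return f.read()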
