In my project, I'm using argparse to parse arguments, and later in the script I'm using multiprocessing to do the rest of the calculations. The script works fine if I call it from the command prompt,
for example:
"python complete_script.py --arg1=xy --arg2=yz"
But after converting it to an exe with PyInstaller, using the command "pyinstaller --onefile complete_script.py", it throws the
error:
" error: unrecognized arguments: --multiprocessing-fork 1448"
Any suggestions on how I could make this work, or any other alternative? My goal is to create an exe application which I can call on another system where Python is not installed.
Here are the details of my workstation:
Platform: Windows 10
Python : 2.7.13 <installed using Anaconda>
multiprocessing : 0.70a1
argparse: 1.1
Copied from comment:
import argparse
from multiprocessing import Pool, cpu_count

def main():
    main_parser = argparse.ArgumentParser()
    # <added up arguments here>
    all_inputs = main_parser.parse_args()
    wrap_function(all_inputs)

def wrap_function(all_inputs):
    # <some calculation here>
    distribute_function(input_array)  # input array for multiprocessing

def distribute_function(input_array):
    pool = Pool(processes=cpu_count())
    jobs = [pool.apply_async(target_function, args=(i,)) for i in input_array]
    pool.close()
    pool.join()
(A bit late, but it may be useful for someone else in the future...)
I had the same problem; after some research I found this multiprocessing PyInstaller recipe, which states:
When using the multiprocessing module, you must call
multiprocessing.freeze_support()
straight after the if __name__ == '__main__': line of the main module.
Please read the Python library manual about multiprocessing.freeze_support for more information.
Adding that line of code solved the problem for me.
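For illustration, a minimal sketch of where the call goes (the parser contents are placeholders for whatever your script defines):

import argparse
import multiprocessing

def main():
    parser = argparse.ArgumentParser()
    # ... add your --arg1/--arg2 style arguments here ...
    args = parser.parse_args()
    # ... rest of the work, including any Pool usage ...

if __name__ == '__main__':
    # Must be the first statement under the guard so a frozen child
    # process is intercepted here instead of re-running main().
    multiprocessing.freeze_support()
    main()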
I may be explaining the obvious, but you don't give us much information to work with.
python complete_script.py --arg1=xy --arg2=yz
This sort of call tells me that your parser is set up to accept at least these two arguments, the ones flagged with '--arg1' and '--arg2'.
The error tells me that this parser (or maybe some other) is also seeing this string:
--multiprocessing-fork 1448
Possibly generated by the multiprocessing code. It would be good to see the usage part of the error, just to confirm which parser is complaining.
One of my first open source contributions to Python was to enhance the warnings about multiprocessing on Windows.
https://docs.python.org/2/library/multiprocessing.html#windows
Is your parser protected by an if __name__ block? Should this particular parser be called when the script runs in a fork? You probably designed the parser to work when the program is called as a standalone script. But what happens when it is imported?
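A sketch of the pattern in question (the argument names are placeholders, but this mirrors the failure mode):

import argparse
import multiprocessing

parser = argparse.ArgumentParser()
parser.add_argument('--arg1')
parser.add_argument('--arg2')

# Risky: calling parser.parse_args() here, at module level, also runs
# when a multiprocessing child re-imports this file, and the child's
# sys.argv contains --multiprocessing-fork instead of your flags.

if __name__ == '__main__':
    # Safe: only the parent, run as a standalone script, parses the CLI.
    multiprocessing.freeze_support()
    args = parser.parse_args()
    print(args)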
Related
I am trying to run a script that pulls data from online sources and then emails it to me at specified times. The idea is to run this script from another Python script which uses APScheduler to run the initial script. The reason I wanted to create two scripts is that I want to use something like cx_Freeze to make an exe out of the second script, which will run in the background of my PC, so that I don't have to keep my IDE open all the time.
My code in the second script looks as follows:
from apscheduler.schedulers.blocking import BlockingScheduler
import initial_script
import os

if __name__ == '__main__':
    scheduler = BlockingScheduler()
    scheduler.add_job(initial_script, trigger='cron', day_of_week='mon-fri', hour='18', minute='30')
    print('Press Ctrl+{0} to exit'.format('Break' if os.name == 'nt' else 'C'))
    try:
        scheduler.start()
    except (KeyboardInterrupt, SystemExit):
        pass
The 'initial_script' is the main Python file that actually provides the information I need. The APScheduler code I mostly took from the GitHub repo here, and then I consulted this link to better understand the purpose of the if __name__ == '__main__' idiom (which I have a feeling is the problem). In the 'initial_script' file I have not used if __name__ == '__main__' anywhere.
I have made sure that both files are saved in .py format - initially I attempted to run the scripts as Jupyter notebooks (.ipynb), but that caused more issues.
The code from the 'initial_script' does run, as I receive the email when I run it from the second script, but the scheduler/trigger gives me an error and does not run.
The error I get now is as follows: TypeError: func must be a callable or a textual reference to one
Please can you assist in explaining what I am doing wrong?
The add_job() method requires that you pass it a callable. You're passing a module, and modules cannot be called (you can't do initial_script()). Maybe try passing it a function that is defined in that module instead?
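For example, if initial_script wrapped its work in a function (the name send_report here is hypothetical), you could schedule that function directly:

from apscheduler.schedulers.blocking import BlockingScheduler
import initial_script  # assumed to define a send_report() function

if __name__ == '__main__':
    scheduler = BlockingScheduler()
    # Pass the function object itself - no parentheses, not the module.
    scheduler.add_job(initial_script.send_report, trigger='cron',
                      day_of_week='mon-fri', hour='18', minute='30')
    scheduler.start()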
I have a Python file that I would like to package as an executable for macOS 11.6.
The Python file (called Service.py) relies on one other JSON file and runs perfectly fine when run with Python. My file uses argparse, as the arguments can differ depending on what is needed.
Example of how the file is called with python:
python3 Service.py -v Zephyr_Scale_Cloud https://myurl.cloud/ philippa#email.com password1 group3
The file is run in exactly the same way when it is an executable:
./Service.py -v Zephyr_Scale_Cloud https://myurl.cloud/ philippa#email.com password1 group3
I can package the file using PyInstaller and the executable runs.
Command used to package the file:
pyinstaller --paths=/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/ Service.py
However, when I get to the point that requires multiprocessing, the arguments get lost. My second argument (here noted as https://myurl.cloud) is a URL that I require.
The error I see is:
[MAIN] Starting new process RUNID9157
url before constructing the client recognised as pipe_handle=15
usage: Service [-h] test_management_tool url
Service: error: the following arguments are required: url
Traceback (most recent call last):
File "urllib3/connection.py", line 174, in _new_conn
File "urllib3/util/connection.py", line 72, in create_connection
File "socket.py", line 954, in getaddrinfo
I have done some logging, and the url does get read correctly at first. But as soon as the new process starts and picks up what it needs, the url is changed to 'pipe_handle=x'; in the log above it is pipe_handle=15.
I need the url to retrieve an authentication token, but it simply stops being read as the correct value and is replaced by this pipe_handle value. I have no idea why.
Has anyone else seen this?!
I am using Python 3.9, PyInstaller 4.4 and argparse.
I have also added

if __name__ == "__main__":
    if sys.platform.startswith('win'):
        # On Windows - multiprocessing is different to Unix and Mac.
        multiprocessing.freeze_support()

to my if __name__ == "__main__" section, as I saw this on other posts, but it doesn't help.
Can someone please assist?
Sending commands via sys.argv is complicated by the fact that multiprocessing's "spawn" start method uses sys.argv to pass the file descriptors for the initial communication pipes between the parent and the child.
I'm projecting here a little, because you did not share the code showing how/where you call argparse and how/where you use multiprocessing.
If you are parsing args outside of if __name__ == "__main__":, the args may get parsed in the child (which re-imports __main__) before sys.argv is automatically cleaned up by multiprocessing.spawn.prepare(). You should be able to fix this by moving the argparse logic inside your target function. It may also be easier to parse the args in the parent and simply pass the parsed results as an argument to the target function. See this answer of mine for further discussion of sys.argv with multiprocessing.
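A minimal sketch of the parse-in-the-parent approach (the names here are illustrative, not taken from the original Service.py):

import argparse
import multiprocessing

def worker(url):
    # The child never touches sys.argv; it receives the parsed value.
    print('child got url:', url)

if __name__ == '__main__':
    multiprocessing.freeze_support()  # no-op when not frozen
    parser = argparse.ArgumentParser()
    parser.add_argument('url')
    args = parser.parse_args()  # parse exactly once, in the parent
    p = multiprocessing.Process(target=worker, args=(args.url,))
    p.start()
    p.join()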
I wish to write a Python script that needs to do task 'A' and task 'B'. Luckily there are existing Python modules for both tasks, but unfortunately the library that can do task 'A' is Python 2 only, and the library that can do task 'B' is Python 3 only.
In my case the libraries are small and permissively-licensed enough that I could probably convert them both to Python 3 without much difficulty. But I'm wondering what is the "right" thing to do in this situation - is there some special way in which a module written in Python 2 can be imported directly into a Python 3 program, for example?
The "right" way is to translate the Py2-only module to Py3 and offer the translation upstream with a pull request (or equivalent approach for non-git upstream repos). Seriously. Horrible hacks to make py2 and py3 packages work together are not worth the effort.
I presume you know of tools such as 2to3 that aim to make the job of porting code to py3k easier; I'm just repeating it here for others' reference.
In situations where I have had to use libraries from both Python 3 and Python 2, I've been able to work around it using the subprocess module. Alternatively, I've gotten around this issue with shell scripts that pipe output from the python2 script to the python3 script and vice versa. This of course covers only a tiny fraction of use cases, but if you're transferring text (or maybe even picklable objects) between 2 and 3, it (or a more thought-out variant) should work.
To the best of my knowledge, there isn't a best practice when it comes to mixing versions of python.
I present to you an ugly hack
Consider the following simple toy example, involving three files:
# py2.py
# This file uses python2, here illustrated by the print statement.
def hello_world():
    print 'hello world'

if __name__ == '__main__':
    hello_world()

# py3.py
# There's nothing py3 about this, but let's assume that there is,
# and that this is a library that will work only on python3.
def count_words(phrase):
    return len(phrase.split())

# controller.py
# Main script that coordinates the work, written in python3.
# Calls the python2 library through the subprocess module.
# The limitation here is that every function needed has to have a script
# associated with it that accepts command line arguments.
import subprocess
import py3

if __name__ == '__main__':
    phrase = subprocess.check_output('python py2.py', shell=True)
    num_words = py3.count_words(phrase)
    print(num_words)

If I run the following in bash, it outputs 2:

hals-halbook: toy hal$ python3 controller.py
2
I’m having trouble calling an external program from my python script in which I want to use mpi4py to distribute the workload among different processors.
Basically, I want to use my script such that each core prepares some input files for calculations in separate folders, then starts an external program in this folder, waits for the output, and then, finally, reads the results and collects them.
However, I simply cannot get the external program call to work. In my search for a solution to this problem I've found that the issues I'm facing seem to be quite fundamental. The following simple example makes this clear:
#!/usr/bin/env python
import subprocess
subprocess.call("EXTERNAL_PROGRAM", shell=True)
subprocess.call("echo test", shell=True)
./script.py works fine (both calls work), while mpirun -np 1 ./script.py only outputs test. Is there any workaround for this situation? The program is definitely in my PATH, but it also fails if I use the absolute path for the call.
This SO question seems to be related, sadly there are no answers...
EDIT:
In the original version of my question I’ve not included any code using mpi4py, even though I mention this module in the title. So here is a more elaborate example of the code:
#!/usr/bin/env python
import os
import subprocess
from mpi4py import MPI

def worker(parameter=None):
    """Make new folder, cd into it, prepare the config files and execute the
    external program."""
    cwd = os.getcwd()
    dir = "_calculation_" + parameter
    dir = os.path.join(cwd, dir)
    os.makedirs(dir)
    os.chdir(dir)
    # Write input for simulation & execute
    subprocess.call("echo {} > input.cfg".format(parameter), shell=True)
    subprocess.call("EXTERNAL_PROGRAM", shell=True)
    # After the program is finished, do something here with the output files
    # and return the data. I'm using the input parameter as a dummy variable
    # for the processed output.
    data = parameter
    os.chdir(cwd)
    return data

def run_parallel():
    """Iterate over job_args in parallel."""
    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    rank = comm.Get_rank()
    if rank == 0:
        # Here should normally be a list with many more entries, subdivided
        # among all the available cores. I'll keep it simple here, so one has
        # to run this script with mpirun -np 2 ./script.py
        job_args = ["a", "b"]
    else:
        job_args = None
    job_arg = comm.scatter(job_args, root=0)
    res = worker(parameter=job_arg)
    results = comm.gather(res, root=0)
    print res
    print results

if __name__ == '__main__':
    run_parallel()
Unfortunately I cannot provide more details about the external executable EXTERNAL_PROGRAM, other than that it is a C++ application which is MPI-enabled. As written in the comment section below, I suspect that this is the reason (or one of the reasons) why my external program call is basically ignored.
Please note that I'm aware that nobody can reproduce my exact situation. Still, I was hoping that someone here has already run into similar problems and might be able to help.
For completeness, the OS is Ubuntu 14.04 and I’m using OpenMPI 1.6.5.
In your first example you might be able to do this:
#!/usr/bin/env python
import subprocess
subprocess.call("EXTERNAL_PROGRAM && echo test", shell=True)
The python script is only facilitating the MPI call. You could just as well write a bash script with the command "EXTERNAL_PROGRAM && echo test" and mpirun the bash script; it would be equivalent to mpirunning the python script.
The second example will not work if EXTERNAL_PROGRAM is MPI-enabled. When using mpi4py, it will initialize MPI. You cannot spawn another MPI program once you have initialized the MPI environment in such a manner. You could spawn using MPI_Comm_spawn or MPI_Comm_spawn_multiple and the -up option to mpirun. For mpi4py, refer to the Compute PI example for spawning (use MPI.COMM_SELF.Spawn).
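A minimal sketch of the spawning approach with mpi4py (EXTERNAL_PROGRAM and maxprocs are placeholders, and the child must itself be an MPI program that cooperates over the parent communicator):

from mpi4py import MPI

# Spawn the MPI-enabled program as a child job instead of shelling out,
# which avoids starting a second MPI runtime inside the current one.
child = MPI.COMM_SELF.Spawn('EXTERNAL_PROGRAM', args=[], maxprocs=4)

# ... exchange data over the intercommunicator 'child' here if needed ...

child.Disconnect()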
Since I got Python running on Windows, here is the next problem I encountered, this time with argparse, for which I did not see a solution. I used optparse before. Here is my code:
import argparse

parser = argparse.ArgumentParser(
    description='Test description')    # main description for help
parser.add_argument('-d', '--dir',     # -d or --dir option
                    dest='dir',
                    help='directory to start with')
args = parser.parse_args()
print(args.dir)
but when I run this code with either
code.py -d test
code.py --dir test
I always get None as output. I feel this is something trivial, and something obvious I overlooked, but I cannot see it.
Thanks
Alex
The problem seems to be caused by Windows and the way the script is executed on the command line. In the given example the test script was called directly on the command line, without python before the script name, as suggested in this answer.
If the code is executed like
python code.py
the expected behavior is seen, and the arguments are correctly parsed in the code.
So either the setup of the Windows system is still incomplete, or the suggestion in the above link is incomplete.
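For reference, a common cause of this on Windows is the .py file association dropping arguments: if the ftype registration for Python files ends in "%1" without a trailing %*, everything after the script name is silently discarded when you invoke code.py directly. Checking it might look like this (a sketch; the association name and interpreter path vary by install):

C:\> assoc .py
.py=Python.File

C:\> ftype Python.File
Python.File="C:\Python27\python.exe" "%1" %*

If the %* is missing, the arguments never reach sys.argv, which would explain args.dir coming back as None.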