Large amount of python multiprocessing causes Memory Errors - python

Intro
Hi, I'm trying to run a Windows OS command in a loop using Python 3 multiprocessing, but when the loop gets too big (thousands of commands) I get memory errors and the process exits / never completes.
Why?
I need to run 65,000 commands as fast as possible, and one by one seems inefficient. These commands are normal Windows commands (dir, for example).
-- I do not need the results of the command! I just need it to run.
Code
import multiprocessing
import subprocess

def worker(num):
    print("worker:", num)
    subprocess.Popen('dir')  # or os.system('dir') for example
    return

def main():
    jobs = []
    for i in range(1, 65535):
        i = str(i)
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()

if __name__ == '__main__':
    main()
Question
What am I doing wrong here? What's the correct way to run a Windows OS command multiple times with Python (while keeping it parallel)?

You should limit the number of workers running at the same time.
You can use p.is_alive() to check how many of them are currently running.
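For example, a minimal sketch using a multiprocessing.Pool, which caps the number of concurrent workers for you instead of checking is_alive() by hand (the pool size of 8 is an assumption; tune it to your machine, and subprocess.run needs Python 3.5+):

import multiprocessing
import subprocess

def worker(num):
    # run the command and discard its output; we only care that it ran
    subprocess.run('dir', shell=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return num

if __name__ == '__main__':
    # only 8 worker processes exist at any time, so memory use stays flat
    with multiprocessing.Pool(processes=8) as pool:
        pool.map(worker, range(1, 65535))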

Related

Why does pool.imap() not execute even one task in a Jupyter Notebook?

Even a very simple piece of code does not return anything but always keeps running.
import multiprocessing as mp

pool = mp.Pool(processes=10)

def add1(x):
    return x + 1

for x in pool.imap(add1, [1, 2, 3]):
    print(x)

pool.close()
And no other operations can be done while it's running, not even shutting down the kernel!
Multiprocessing only works reliably when you run it from a file. On Mac or Windows, multiprocessing starts a new Python process. That Python process reads the file from which it was started in order to know its own code, but doesn't execute the __main__ code. It then executes whatever it was asked to execute.
This process doesn't work with Jupyter because there is no Python file to read.
Multithreading should work fine.
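If you do want processes in a notebook, one common workaround (a sketch; the file name add_module.py is just an example) is to move the function into a small module on disk and import it, so the spawned workers can read its code from a file:

# add_module.py -- a separate file saved next to the notebook
def add1(x):
    return x + 1

and then in a notebook cell:

import multiprocessing as mp
from add_module import add1

pool = mp.Pool(processes=10)
for x in pool.imap(add1, [1, 2, 3]):
    print(x)
pool.close()
pool.join()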

How to launch a couple of python scripts from a first python script and then terminate them all at once?

I have a function in a Python script which should launch another Python script multiple times. I am assuming this can be done like this (the script below is just my imagination of how this would work):
iterations = input("Enter the number of processes to run")
for x in range(0, iterations):
    subprocess.call("python3 /path/to/the/script.py", shell=True)
but I also need to pass some defined variables into the other script. For example, if
x = 1
in the first script, then I need x to have the same value in the second script without defining it there. I have NO idea how to do that.
And then there is also killing them: I have read about some method using PIDs, but don't those change every time?
Most of the methods I found on Google looked overly complex and what I want to do is really simple. Can anyone guide me in the right direction as to what to use and how I should go at accomplishing it?
I have a function in a Python script which should launch another Python script multiple times, I am assuming this can be done like this (the script is just my imagination of how this would work).
Here is the subprocess manual page which contains everything I will be talking about
https://docs.python.org/2/library/subprocess.html
One way to call one script from another is using subprocess.Popen, with something along the lines of:
import subprocess
for i in range(0, 100):
    ret = subprocess.Popen("python3 /path/to/the/script.py",
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
You can use the return value from Popen to make the call synchronous, using the communicate method:
out,err = ret.communicate()
This would block the calling script until the subprocess finishes.
I also need to pass over some defined variables into the other script??
There are multiple ways to do this.
1. Pass parameters to the called script and parse them using OptionParser or sys.argv.
In the called script, have something like:
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-x", "--variable", action="store_true", dest="xvalue", default=False)
(options, args) = parser.parse_args()
if options.xvalue == True:
    pass  # do something
In the calling script, use subprocess as:
ret = subprocess.Popen("python3 /path/to/the/script.py -x",stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=True)
Note the addition of the -x parameter.
You can also use argparse:
https://docs.python.org/2/library/argparse.html#module-argparse
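A minimal argparse sketch for the called script (the -x option mirrors the OptionParser example above and is only illustrative):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-x", "--variable", dest="xvalue", type=int, default=0)
args = parser.parse_args()
print("received x =", args.xvalue)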
2. Pass the subprocess an environment variable, which can be used to configure it. This is fast, but it only works one way, i.e. from parent process to child process.
In the called script:
import os
x = int(os.environ['xvalue'])
In the calling script, set the environment variable:
import os
x = 1
os.environ['xvalue'] = str(x)
3. Use sockets or pipes or some other IPC method.
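For the pipe route, one tiny sketch is to feed the child's stdin through subprocess (illustrative only; the child script would read the value from sys.stdin):

import subprocess

ret = subprocess.Popen(["python3", "/path/to/the/script.py"], stdin=subprocess.PIPE)
ret.communicate(input=b"1\n")   # the child reads this line from sys.stdin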
And then also killing them, I have read about some method using PIDs, but don't those change every time?
Again, you can use subprocess to hold the process id and terminate it.
This will give you the process id:
ret.pid
You can then use terminate() to terminate the process if it is running:
ret.terminate()
To check if the process is running, you can use the poll method of subprocess.Popen. I would suggest you check before you terminate the process:
ret.poll()
poll will return None if the process is still running.
If you just need to pass some values to the second script, and you need to run it
by means of the subprocess module, then you may simply pass the variables as command-line arguments:
for x in range(0, iterations):
    subprocess.call('python3 /path/to/second_script.py -x=%s' % x, shell=True)
And receive the -x=1 via the sys.argv list inside second_script.py (or using the argparse module).
On the other hand, if you need to exchange something between the two scripts dynamically (while both are running), you can use the pipe mechanism or, even better, use multiprocessing (which requires some changes in your current code); it would make communicating with and controlling (terminating) the workers much cleaner.
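A rough sketch of the multiprocessing route (the worker function replaces the separate script, which is the main change it requires; the names here are illustrative):

import multiprocessing

def work(x, results):
    results.put(x + 1)          # send something back to the parent

if __name__ == '__main__':
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=work, args=(x, results)) for x in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()                # or p.terminate() to kill them all at once
    while not results.empty():
        print(results.get())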
You can pass variables to subprocesses via the command line, via environment variables, or by passing data in on stdin. The command line is easy for simple strings that aren't too long and don't themselves have shell metacharacters in them. The target script would pull them from sys.argv.
script.py:
import sys
import os
import time
x = sys.argv[1]
print(os.getpid(), "processing", x)
time.sleep(240)
subprocess.Popen starts child processes but doesn't wait for them to complete. You could start all of the children, put their popen objects in a list and finish with them later.
import subprocess, sys, time

iterations = int(input("Enter the number of processes to run"))
processes = []
for x in range(0, iterations):
    processes.append(subprocess.Popen([sys.executable, "/path/to/the/script.py", str(x)]))

time.sleep(10)

for proc in processes:
    if proc.poll() is None:
        # still running after 10 seconds, so terminate it
        proc.terminate()

for proc in processes:
    returncode = proc.wait()

Python Multiprocessing Object/Function to replace Bash &/GNU Screen?

Is there a way I can write a Python script that emulates the use of GNU Screen and Bash? I was originally trying to write a simple Bash script, but I suspect learning the multiprocessing module will give me a bit more flexibility down the road, not to mention that Python modules are very well documented.
So, I have seen in the tutorials and documentation the use of a single function run in parallel, but am a little bit lost on how to make use of this. Any reference would be extremely helpful.
Below is basically what I want:
If I have a bunch of experiments in different python files, then in Bash:
$ python experiment1.py &
$ python experiment2.py &
...
In Python, if I have a bunch of functions in the same script, does the main block emulate the above? (This is really just a guess, and I don't mean to offend anyone other than myself with my ignorance.)
import multiprocessing as mp

def experiment1():
    """run collection of simulations and collect relevant statistics"""
    ....

def experiment2():
    """run different collection of simulations and collect relevant statistics"""
    ....

if __name__ == '__main__':
    one = mp.Process(target=experiment1)
    two = mp.Process(target=experiment2)
    ...
    one.start()
    two.start()
    ...
    one.join()
    two.join()
I am not sure how I would test this except maybe my activity monitor on OSX, which doesn't seem to tell me the distribution of the cores, so suggestions as to checking python-ically without runtime would be helpful. This last question might be too general, but thought I would throw it in. Thank you for your help!
The following program runs a bunch of scripts in parallel. For each one it prints a message when it starts, and when it ends. If it exited with an error, the error code and command line are printed, and the program continues.
It runs one shell script per CPU in the system, at a time.
source
import multiprocessing as mp, subprocess

def run_script(script_name):
    curproc = mp.current_process()
    cmd = ['python', script_name]
    print curproc, 'start:', cmd
    try:
        return subprocess.check_output(
            cmd, shell=False)
    except subprocess.CalledProcessError as err:
        print '{} error: {}'.format(
            curproc, dict(
                status=err.returncode,
                command=cmd,
            )
        )
    finally:
        print curproc, "done"

scripts = ['zhello.py', 'blam']
pool = mp.Pool()  # default: num of CPUs
print pool.map(
    run_script, scripts,
)
pool.close()
pool.join()
output
python: can't open file 'blam': [Errno 2] No such file or directory
<Process(PoolWorker-2, started daemon)> start: ['python', 'blam']
<Process(PoolWorker-2, started daemon)> error: {'status': 2, 'command': ['python', 'blam']}
<Process(PoolWorker-2, started daemon)> done
<Process(PoolWorker-1, started daemon)> start: ['python', 'zhello.py']
<Process(PoolWorker-1, started daemon)> done
['howdy\n', None]
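If you want to run fewer scripts at a time than you have CPUs, the pool size can be passed explicitly; this is the only change needed to the code above (the value 2 is arbitrary):

pool = mp.Pool(processes=2)  # run at most two scripts at once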

input() blocks other python processes in Windows 8 (python 3.3)

Working on a multi-threaded cross-platform Python 3.3 application, I came across some weird behavior I was not expecting and am not sure is expected. The issue is that on Windows 8, calling input() in one thread blocks other threads until it completes. I have tested the below example script on three Linux, two Windows 7, and one Windows 8 computers, and this behavior is only observed on the Windows 8 computer. Is this expected behavior for Windows 8?
test.py:
import subprocess, threading, time

def ui():
    i = input("-->")
    print(i)

def loop():
    i = 0
    f = 'sky.{}'.format(i)
    p = subprocess.Popen(['python', 'copy.py', 'sky1', f])
    t = time.time()
    while time.time() < t+15:
        if p.poll() != None:
            print(i)
            time.sleep(3)
            i += 1
            f = 'sky.{}'.format(i)
            p = subprocess.Popen(['python', 'copy.py', 'sky1', f])
    p.terminate()
    p.wait()

def start():
    t1 = threading.Thread(target=ui)
    t2 = threading.Thread(target=loop)
    t1.start()
    t2.start()
    return t2

t2 = start()
t2.join()
print('done')
copy.py:
import shutil
import sys
src = sys.argv[1]
dst = sys.argv[2]
print('Copying \'{0}\' to \'{1}\''.format(src, dst))
shutil.copy(src, dst)
Update:
While trying out one of the suggestions I realized that I rushed to a conclusion missing something obvious. I apologize for getting off to a false start.
As Schollii suggested, just using threads (no subprocess or Python files) results in all threads making forward progress, so the problem actually is that using input() in one Python process causes other Python processes to block/not run (I do not know exactly what is going on). Furthermore, it appears that only Python processes are affected. If I use the same code shown above (with some modifications) to execute non-Python executables with subprocess.Popen, they run as expected.
To summarize:
Using subprocess to execute a non-Python executable: works as expected with and without any calls to input().
Using subprocess to execute a Python executable: the created processes appear not to run if a call to input() is made in the original process.
Using subprocess to create Python processes with a call to input() in a new process and not the original process: a call to input() blocks all Python processes spawned by the 'main' process.
Side Note: I do not have Windows 8 platform so debugging/tests can be a little slow.
Because there were several problems with input() in Python 3.0-3.2, this function has been changed a few times.
It's possible that we have a new bug again.
Can you try the following variant, which is a "back port" of the old Python 2.x input() behaviour (eval() applied to the raw line read from the console):
...
i = eval(input("-->"))
...
It's a very good problem to work on. Since you depend on input(), which needs the console, and you have several threads, all the threads end up trying to communicate with the console. So I advise you to either use the producer-consumer concept or define all your inputs in a text file and pass the text file to the program.
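A minimal sketch of the producer-consumer idea, where a single thread owns the console and hands lines to the workers through a queue (names here are illustrative, not taken from the question):

import queue
import threading

commands = queue.Queue()

def ui():
    # the only thread that ever touches the console
    while True:
        line = input("-->")
        commands.put(line)
        if line == "quit":
            break

def worker():
    while True:
        line = commands.get()
        if line == "quit":
            break
        print("got:", line)

threading.Thread(target=ui).start()
threading.Thread(target=worker).start()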

python multiprocessing can not control multiple long running console exe?

I am a newbie in Python. I recently tried to use a Python script to call a console exe, which is a long-running process. I want to allow the exe to be called as many times as the CPU can permit, and when the exe has finished its job, it should release the CPU to other new jobs. So I think I may need a multiple-process control mechanism.
Since multiprocessing can only call a Python callable, it cannot directly call the console exe, so I wrapped subprocess.Popen(cmd) in a Python function. However, after I did so, I found that the exe had already started before multiprocessing.Process.start() was used. The problem is that before it finishes what it is doing (which takes a long time), it does not return control to me; the program freezes and waits. That is not what I want.
I am posting the codes as below:
import sys
import os
import multiprocessing
import subprocess

def subprocessExe(cmd):
    return subprocess.call(cmd, shell=False, stdout=subprocess.PIPE, \
        stderr=subprocess.PIPE, creationflags=0x08000000)

if __name__ == '__main__':
    p = multiprocessing.Process(target=self.subprocessExe(exeFileName))
    p.start()
    print p, p.is_alive()
Thank you for your time!
You're calling subprocessExe yourself when you create the multiprocessing.Process object, so the exe runs (and blocks) before the Process ever starts. You should instead do this:
p = multiprocessing.Process(target=subprocessExec, args=(exeFileName,))
And then it should work.
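Put together, a hedged sketch of the corrected script (exeFileName stands in for whatever console exe you are launching):

import multiprocessing
import subprocess

def subprocessExe(cmd):
    return subprocess.call(cmd, shell=False, stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE,
                           creationflags=0x08000000)  # CREATE_NO_WINDOW, from the question

if __name__ == '__main__':
    exeFileName = 'longjob.exe'   # hypothetical path to your console exe
    p = multiprocessing.Process(target=subprocessExe, args=(exeFileName,))
    p.start()                     # returns immediately; the exe runs in the child
    print(p, p.is_alive())
    p.join()                      # wait for it later, when you actually need it done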
There are a number of things that are wrong in your test case. The following works for me:
import multiprocessing, subprocess

def subprocessExe(cmd):
    subprocess.call([cmd], shell=False)

p = multiprocessing.Process(target=subprocessExe, args=('/full/path/to/script.sh',))
p.start()
OR
subprocess.call([cmd], shell=True)
p = multiprocessing.Process(target=subprocessExe, args=('script.sh',))
OR
subprocess.Popen([cmd], shell=True, stdout=subprocess.PIPE, \
stderr=subprocess.PIPE)
