How to stop multiprocessing in Python running for the full script

I have the following code in Python:

import multiprocessing
import time

print "I want this to show once at the beginning"

def leaveout():
    print "I don't want this to show"

def hang(input1):
    print "I want this to show 3 times"
    print "Number = %s" % input1

def main():
    p = multiprocessing.Process(target=hang, args=("0",))
    p.start()
    p1 = multiprocessing.Process(target=hang, args=("1",))
    p1.start()
    p2 = multiprocessing.Process(target=hang, args=("2",))
    p2.start()
    p.join()
    p1.join()
    p2.join()

if __name__ == '__main__':
    main()

print "I want this to show once at the end"
My objective is to run the hang function in three parallel processes, which is happening successfully. My issue is that each new process also re-runs the top-level code of the entire script, resulting in the following output:
c:\Evopt>python multiprocessingprac.py
I want this to show once at the beginning
I want this to show once at the beginning
I want this to show once at the end
I want this to show 3 times
Number = 2
I want this to show once at the beginning
I want this to show once at the end
I want this to show 3 times
Number = 1
I want this to show once at the beginning
I want this to show once at the end
I want this to show 3 times
Number = 0
I want this to show once at the end
How can I stop this happening?

When spawning a new process, Windows creates a blank process. A new Python interpreter is then loaded into the spawned process, and it is given the same code base to interpret.
That is why you see duplicated print statements: because they are top-level expressions, they are executed every time a process evaluates that code.
On Unix OSes this is not observed, because they use a totally different process creation mechanism (the fork strategy) which does not require loading a new Python interpreter in the child process.
To fix your issue, remove the top-level print expressions from the script and move them into the main function.
def main():
    print("I want this to show once at the beginning")
    p0 = multiprocessing.Process( ... )
    p0.start()
    ...
    p2.join()
    print("I want this to show once at the end")
You can read more about process start strategies in the multiprocessing documentation.
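Putting it together, a complete spawn-safe version of the script might look like the sketch below (written in Python 3 syntax; note the trailing comma in args, which makes it a one-element tuple rather than a bare string):

```python
import multiprocessing

def hang(input1):
    # Runs in a child process; under the spawn start method this module is
    # re-imported there, but only unguarded top-level code is re-executed.
    print("I want this to show 3 times")
    print("Number = %s" % input1)

def main():
    print("I want this to show once at the beginning")
    # args must be a tuple, hence the trailing comma.
    processes = [multiprocessing.Process(target=hang, args=(str(i),))
                 for i in range(3)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print("I want this to show once at the end")

if __name__ == "__main__":
    main()
```

Because every print now lives inside a function that only the parent calls, the re-import performed by each spawned child produces no extra output.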

Related

Process starts after I call a function even though I try to start the process first

I'm trying to learn how to use multiple processes in Python and I encountered a problem similar to the example below.
I start a process called p1 using .start() and after that call a function do_something(). The problem is that the function is called before the process starts.
The code I used:
import time
from multiprocessing import Process

def complex_calculation():
    start_timer = time.time()
    print("Started calculating...")
    [x ** 2 for x in range(20000000)]  # calculation
    print(f"complex_calculation: {time.time() - start_timer}")

def do_something():
    print(input("Enter a letter: "))

if __name__ == "__main__":
    p1 = Process(target=complex_calculation)
    p1.start()
    do_something()
    p1.join()
It seems to work if I use time.sleep():
if __name__ == "__main__":
    p1 = Process(target=complex_calculation)
    p1.start()
    time.sleep(1)
    do_something()
    p1.join()
My questions are:
Why does this happen?
What can I do so that I don't have to use time.sleep() ?
As pointed out in the comments, multiple processes run concurrently. Without doing some extra work, there are never guarantees about the order in which the processes are scheduled to run by the operating system. So while you call p1.start() before do_something(), all that means is that the Python code related to starting the process has completed before do_something is run. But the actual process represented by p1 may run in any way relative to the remainder of the Python code. It can run entirely before, entirely after, or interleaved in any way with the remainder of the Python code. Relying on it being scheduled in any particular way is one definition of a race condition.
To control the way in which these processes run relative to one another, you need a synchronization primitive. There are many ways to synchronize processes, it just depends on what you want to accomplish. If you want to make sure that the complex_calculation function has started before do_something is called, an event is probably the simplest approach. For example:
import time
from multiprocessing import Process, Event

def complex_calculation(event):
    event.set()  # Set the event, notifying any process waiting on it
    start_timer = time.time()
    print("Started calculating...")
    [x ** 2 for x in range(20000000)]  # calculation
    print(f"complex_calculation: {time.time() - start_timer}")

def do_something(event):
    event.wait()  # Wait for `complex_calculation` to set the event
    print(input("Enter a letter: "))

if __name__ == "__main__":
    event = Event()
    p1 = Process(target=complex_calculation, args=(event,))
    p1.start()
    do_something(event)
    p1.join()
You should see something like:
$ python3 test.py
Started calculating...
Enter a letter: a
a
complex_calculation: 6.86732816696167
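If instead you wanted the opposite ordering, running do_something only after the calculation has completely finished, no event would be needed: joining the process first serializes the two steps. A minimal sketch (which deliberately gives up the concurrency):

```python
import time
from multiprocessing import Process

def complex_calculation():
    start_timer = time.time()
    print("Started calculating...")
    [x ** 2 for x in range(1_000_000)]  # smaller workload for the sketch
    print(f"complex_calculation: {time.time() - start_timer}")

def do_something():
    print("running strictly after the calculation")

if __name__ == "__main__":
    p1 = Process(target=complex_calculation)
    p1.start()
    p1.join()       # block until the child has finished entirely
    do_something()  # guaranteed to run after complex_calculation
```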

Python Multiprocessing: Execute code serially before and after parallel execution

Novice here: I am trying to execute some code serially, then create a pool of worker processes and execute some code in parallel. After the parallel execution is done, I want to execute some more code serially.
For example...
import time
from multiprocessing import Pool

print("I only want to print this statement once")

def worker(i):
    """worker function"""
    now = time.time()
    time.sleep(i)
    then = time.time()
    print(now, then)

if __name__ == '__main__':
    with Pool(3) as p:
        p.map(worker, [1, 1, 1])
        p.close()
    print("Only print this once as well")
I would like this to return...
I only want to print this statement once
1533511478.0619314 1533511479.0620182
1533511478.0789354 1533511479.0791905
1533511478.0979397 1533511479.098235
Only print this once as well
However what it returns is this:
I only want to print this statement once
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
1533511478.0619314 1533511479.0620182
1533511478.0789354 1533511479.0791905
1533511478.0979397 1533511479.098235
Only print this once as well
So it seems to be running the print statements an additional time for each pool.
Any help would be appreciated!
Based on the observed behaviour, I assume you are on a NT/Windows Operating System.
The reason you see all those prints is because on Windows the spawn start strategy is used. When a new process is "spawned", a new Python interpreter is launched and it receives the module and the function it's supposed to execute. When the new interpreter imports the module, the top level print functions are executed. Hence the duplicate prints.
Just move those print statements inside the __main__ guard and you won't see them again.
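Concretely, the fixed version of the example might look like this sketch, with both prints moved under the guard so the workers' re-import of the module produces no output:

```python
import time
from multiprocessing import Pool

def worker(i):
    """worker function: record wall-clock time around a sleep"""
    now = time.time()
    time.sleep(i)
    then = time.time()
    print(now, then)

if __name__ == '__main__':
    # Only the parent process executes this block; spawned workers
    # import the module but skip everything under the guard.
    print("I only want to print this statement once")
    with Pool(3) as p:
        p.map(worker, [1, 1, 1])
    print("Only print this once as well")
```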

Methods of an object running in parallel can't see changes to an attribute

I have an object with two principal functions:
1) the first is "scanning": when it finds something of interest, it adds it to a list, an attribute of the current object.
2) the second operates on what was found and stored in the list.
I want the first function to run for a long time, like in the background, and the second one to deal with what is stored. So I tried using multiprocessing.
But when my first function modifies the list, the second one can't see the modification. Here is a minimal example:
# coding: utf-8
import numpy as npy
import time
from multiprocessing import Process, RLock

class MyStruct(object):

    def __init__(self):
        self.listOfInterest = []

    def searching(self):
        while True:
            time.sleep(2)  # inelegant way to be sure one process doesn't block the other from running
            a = npy.random.randn(1)[0]  # random way to add something to the list
            if a >= 0:
                self.listOfInterest.append(a)
                print(' add ', str(a), ' to list ')

    def doingStuff(self):
        while True:
            time.sleep(1)
            try:
                a = self.listOfInterest[0]  # problem here: list always empty for this function
                # doing stuff on a, we don't care here
            except IndexError:
                print(' list still empty, nothing to deal with ')

if __name__ == '__main__':
    mystruct = MyStruct()
    p1 = Process(target=mystruct.searching)
    p2 = Process(target=mystruct.doingStuff)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
As pointed out in the comment by "hop", the answer is quite simple: use Thread instead of Process.
As a first, rough approximation, threads and processes are both sequences of instructions, but threads share memory.
To be a little more precise, here is what happens in the problematic code:
1) when we run the command "python3 MyStruct.py", the kernel launches a process, call it P. This process gets its own memory, in part of which Python stores the object mystruct.
2) when P runs the commands p1.start() and p2.start(), it performs what we call a fork. Like a biological cell splitting into two cells, it becomes two processes, call them P1 and P2, each independent of the other. Each of them gets a copy of the object and works on that copy. So when one of them modifies listOfInterest, the change happens in its own process memory, and the other one can't see it.
If we use threads instead of processes, the process P runs two threads which are part of it, and they share its memory.
Here is the modified code:
import numpy as npy
import time
from threading import Thread

class MyStruct(object):

    def __init__(self):
        self.listOfInterest = []

    def searching(self):
        while True:
            time.sleep(2)  # inelegant way to be sure one thread doesn't block the other from running
            a = npy.random.randn(1)[0]  # random way to add something to the list
            if a >= 0:
                self.listOfInterest.append(a)
                print(' add ', str(a), ' to list ')

    def doingStuff(self):
        while True:
            time.sleep(1)
            try:
                a = self.listOfInterest[0]  # the list is shared now, so this sees the appended values
                # doing stuff on a, we don't care here
            except IndexError:
                print(' list still empty, nothing to deal with ')

if __name__ == '__main__':
    mystruct = MyStruct()
    p1 = Thread(target=mystruct.searching)
    p2 = Thread(target=mystruct.doingStuff)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
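If the two functions really do need separate processes (for example, to bypass the GIL for CPU-bound work), the standard-library alternative is a proxy object from multiprocessing.Manager: the list lives in a manager process, and mutations made through the proxy are visible to every process holding it. A minimal sketch, independent of the class above:

```python
from multiprocessing import Process, Manager

def producer(shared):
    # Appending through the proxy sends the mutation to the manager
    # process, so other processes holding the proxy can see it.
    for i in range(5):
        shared.append(i)

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.list()  # proxy to a list in the manager process
        p = Process(target=producer, args=(shared,))
        p.start()
        p.join()
        print(list(shared))  # [0, 1, 2, 3, 4]
```

Note that proxies are slower than shared thread memory, and in-place mutation of nested objects inside the managed list is not propagated automatically.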

Python multiprocessing start method doesn't run the process

I'm new to multiprocessing and I'm trying to check that I can run two processes simultaneously with the following code:
import random, time, multiprocessing as mp

def printer():
    """print function"""
    z = random.randint(0, 60)
    for i in range(5):
        print z
        wait = 0.2
        wait += random.randint(1, 60) / 100
        time.sleep(wait)
    return

if __name__ == '__main__':
    p1 = mp.Process(target=printer)
    p2 = mp.Process(target=printer)
    p1.start()
    p2.start()
This code does not print anything on the console, although I checked that the processes are running using the is_alive() method.
However, I can print something using :
p1.run()
p2.run()
Question 1 : Why doesn't the start() method run the process ?
Question 2 : While running the code with run() method, why do I get a sequence like
25,25,25,25,25,11,11,11,11,11
instead of something like
25,25,11,25,11,11,11,25,11,25 ?
It seems that the processes run one after the other.
I would like to use multiprocessing for using the same function on multiple files to parallelize file conversion.
I made the script run by adding
from multiprocessing import Process
However, I don't get a random sequence of the two numbers; the pattern is always A,B,A,B... If you know how to show that the two processes run simultaneously, any ideas are welcome!
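For the stated end goal, applying the same function to many files in parallel, a Pool is usually simpler than managing Process objects by hand: Pool.map distributes the inputs across workers and collects the results in order. A sketch, where convert_file and the filename list are placeholders for the real conversion:

```python
from multiprocessing import Pool

def convert_file(path):
    # Placeholder: a real version would read `path`, convert the
    # contents, and write an output file. Here we just return a
    # hypothetical output name.
    return path + ".converted"

if __name__ == "__main__":
    filenames = ["a.txt", "b.txt", "c.txt"]  # hypothetical inputs
    with Pool(4) as pool:
        # Each worker process picks up filenames as it becomes free,
        # so conversions genuinely overlap in time.
        results = pool.map(convert_file, filenames)
    print(results)
```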

Run long process continously using Tkinter (Python 2.7)

After some time trying to figure out how the Tkinter library works, I have run into a problem. The script I wrote uses multiprocessing, and because the script needs to be as fast as possible I minimized the amount of traffic between the processes. This means that it takes about a minute to complete an enormous amount of tasks.
(If a task gets aborted halfway through, the files in use will get corrupted.)
The problem is that I want a stop button in my GUI that stops the script the proper way. After some research I haven't made any progress in finding a solution, so maybe some of you could help. I basically need a way to tell the script, halfway through a task, that it has to stop; the script should then continue until the task is finished.
Edit:
The way my script is set up:
(This is missing the Tkinter part, because I don't know the solution to it yet.)
from multiprocessing import Pool

def Setup():
    # defines all paths of the files that are edited (and a whole lot more)

def Calculation(x, y, Primes):
    # takes an x and y value, calculates the value of that coordinate and determines
    # if the value is prime. Returns True or False, and the calculated value.

def Quadrant(List):
    # takes a huge list of coordinates that have to be calculated. These
    # coordinates (x and y) are passed to the 'Calculation' function, one by one.
    # Returns all the calculated coordinates and whether they are prime (boolean)

if __name__ == "__main__":
    Filenames = Setup()
    Process = Pool(4)
    while True:
        # Loop the main bit of the code to keep expanding the generated image
        Input = [list of all coordinates, split evenly into 4 quadrants (separate lists)]
        Output = Process.map(Quadrant, Input)
        # Combine all data and update list of primes
        # Detect if escape is pressed, stop if true.
I am basically looking for a way to stop the while loop above, or an alternative to this loop.
To clarify: the task has to stop, but without being aborted suddenly. The script has to wait until its current task is finished, and then check whether a button has been pressed to decide whether it should continue or not.
Since the original question had no code to respond to, here is a general pattern for a while loop controlled from outside the process (note that you can also just return from the function once some condition becomes True/False):
import time
from multiprocessing import Process, Manager

def test_f(test_d):
    """ first process to run
        exits this process when the dictionary's 'QUIT' == True
    """
    while not test_d["QUIT"]:
        print " test_f", test_d["QUIT"]
        time.sleep(1.0)

def test_f2(name):
    """ second process to run. Runs until the for loop exits
    """
    for j in range(0, 10):
        print name, j
        time.sleep(0.5)
    print "second process finished"

if __name__ == '__main__':
    ##--- create a dictionary via Manager
    manager = Manager()
    test_d = manager.dict()
    test_d["QUIT"] = False
    ##--- start first process and send dictionary
    p = Process(target=test_f, args=(test_d,))
    p.start()
    ##--- start second process
    p2 = Process(target=test_f2, args=('P2',))
    p2.start()
    ##--- sleep 2 seconds and then change dictionary
    ##    to exit first process
    time.sleep(2.0)
    print "\nterminate first process"
    test_d["QUIT"] = True
    print "test_d changed"
    print "data from first process", test_d
    ##--- may not be necessary, but I always terminate to be sure
    time.sleep(5.0)
    p.terminate()
    p2.terminate()
    """ Thanks Doug Hellmann
        Note: It is important to join() the process after terminating it,
        in order to give the background machinery time to update the
        status of the object to reflect the termination.
    """
    p.join()
    p2.join()
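Applied to the Pool-based loop from the question, the same idea means consulting a flag only between completed passes, so a task is never aborted midway. A sketch in Python 3 syntax, where quadrant and the stop condition are stand-ins for the real calculation and the Tkinter button callback:

```python
from multiprocessing import Pool

def quadrant(coords):
    # Stand-in for the asker's Quadrant(): pretend each coordinate
    # is "calculated" by squaring it.
    return [c * c for c in coords]

if __name__ == "__main__":
    stop_requested = False  # in the real GUI, the stop button's callback sets this
    pool = Pool(4)
    passes = 0
    while not stop_requested:
        quadrants = [[1, 2], [3, 4], [5, 6], [7, 8]]
        # map() blocks until the whole pass is done, so the flag below
        # is only consulted *between* passes, never mid-task.
        output = pool.map(quadrant, quadrants)
        passes += 1
        if passes >= 2:  # stand-in for "the button was pressed"
            stop_requested = True
    pool.close()
    pool.join()
    print(output)
```

In a real Tkinter application the flag would have to be shared safely, e.g. a Manager dict as in the answer above, or a plain attribute if the loop runs in a background thread of the GUI process.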
