Problems with Python's Multiprocessing Process Class - python

I am currently trying to get into Python.
To explain the code below: it is a program that compares two Roulette strategies over many runs.
The color doubling strategy without knowing which colors were hit before the start.
The color doubling strategy knowing that red got hit 10 times before, so I start with the start value times 2^10.
The "Player" class implements both strategies. The global "globalBal_1" and "globalBal_2" variables accumulate the profit for each strategy.
But the algorithm should not be the problem here. The main problem is that when I call the calculating function "run" directly, it delivers results. The multiprocessing processes, for some reason, do not change the global "globalBal_1" and "globalBal_2" variables and thus don't deliver results; they keep the value "0" I declared initially.
What am I doing wrong there? I'm fairly new to multiprocessing and to Python itself.
Edit:
Expected values for "globalBal_1" and "globalBal_2" are about half of the total rounds, so in this case about "500,000" (per process it is 500,000 / number of processes).
But the actual results for the multiprocessing runs are "0".
Code:
from numpy.random import randint
import time
from multiprocessing import Process

threads = 4
rounds = int(1000000 / threads)

globalBal_1 = 0
globalBal_2 = 0


class Player:
    def __init__(self):
        self.balance_1 = 0
        self.balance_2 = 0

    def strat_1(self, sequence):
        counter = 0
        for i in range(len(sequence) - 1):
            if sequence[i]:
                counter += 1
        self.balance_1 += counter

    def strat_2(self, sequence):
        for i in range(len(sequence) - 1 - 1):
            if sequence[i] == 1:
                return
        if sequence[len(sequence) - 1]:
            self.balance_2 += 2 ** (len(sequence) - 0)

    def getBal_1(self):
        return self.balance_1

    def getBal_2(self):
        return self.balance_2


def run(count):
    p1 = Player()
    print("Inside run func")
    global globalBal_1, globalBal_2
    for i in range(count):
        rolls = randint(0, 2, 10)
        p1.strat_1(rolls)
        p1.strat_2(rolls)
    globalBal_1 += p1.getBal_1()
    globalBal_2 += p1.getBal_2()
    print("Finished run func")


if __name__ == '__main__':
    start = time.time()
    procs = [Process(target=run, args=(rounds,)) for t in range(threads)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    tempEnd = time.time()
    print("Multiprocessing result:")
    print(globalBal_1, globalBal_2, tempEnd - start)

    print("\nSingle process:")
    run(rounds)
    end = time.time()
    print(globalBal_1, globalBal_2, end - start)
Solution, thanks to @mirmo and @Bing Wang:
from multiprocessing import Value  # shared, process-safe result holders

def runMulti(count, result1, result2):
    p1 = Player()
    for i in range(count):
        rolls = randint(0, 2, 10)
        p1.strat_1(rolls)
        p1.strat_2(rolls)
    result1.value += p1.getBal_1()
    result2.value += p1.getBal_2()

[...]

profit1 = Value('i', 0)
profit2 = Value('i', 0)
procs = [Process(target=runMulti, args=(rounds, profit1, profit2)) for t in range(threads)]

Please always include the actual and expected output.
The global variables are not updated simply because there are now 4 separate processes created (hence the name multiprocessing) which do not share the global variables you have created; more specifically, each child process gets its own copy and has no access to the global variables of the parent process.
Either return the value for each process and add them up at the end, create a queue, or, as mentioned, use a shared object.
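For the queue option, a minimal sketch (reusing the Player class, rounds, threads, and randint from the question; runQueue is a hypothetical name) could look like this:
from multiprocessing import Process, Queue

def runQueue(count, queue):
    p1 = Player()
    for i in range(count):
        rolls = randint(0, 2, 10)
        p1.strat_1(rolls)
        p1.strat_2(rolls)
    # send both totals back to the parent instead of touching globals
    queue.put((p1.getBal_1(), p1.getBal_2()))

if __name__ == '__main__':
    queue = Queue()
    procs = [Process(target=runQueue, args=(rounds, queue)) for t in range(threads)]
    for p in procs:
        p.start()
    totals = [queue.get() for _ in procs]  # one result tuple per process
    for p in procs:
        p.join()
    globalBal_1 = sum(t[0] for t in totals)
    globalBal_2 = sum(t[1] for t in totals)
    print(globalBal_1, globalBal_2)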

Related

Time out a function if it is taking more than 1 minute in Python

I am running a multiprocessing task in Python; how can I time out a function after 60 seconds?
What I have done is shown in the snippet below:
import time
import multiprocessing as mp
from multiprocessing import Pool
from multiprocessing import Queue

def main():
    global f
    global question
    global queue
    queue = Queue()
    processes = []
    question = [16,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,17,18,19,21,20,23]
    cores = 5
    loww = 0
    chunksize = int((len(question)-loww)/cores)
    splits = []
    for i in range(cores):
        splits.append(loww+1+((i)*chunksize))
    splits.append(len(question)+1)
    print("", splits)
    args = []
    for i in range(cores):
        a = []
        arguments = (i, splits[i], splits[i+1])
        a.append(arguments)
        args.append(a)
    print(args)
    p = Pool(cores)
    p.map(call_process, args)
    p.close()
    p.join()

def call_process(args):
    ## end this whole block if it is taking more than 1 minute
    starttime = time.time()
    lower = args[0][1]
    upper = args[0][2]
    for x in range(lower, upper):
        if time.time() >= starttime + 60: break
        a = question[x-1]
        try:
            pass
            # a lot of functions is called and returned here
        except:
            continue
        # write item to file
        print('a = ', a)
    return a

main()
I want to ensure that the call_process() method does not run for more than a minute for a particular value. Currently I am using if time.time() >= starttime + 60: break, which would not work effectively because I have different functions and things happening in the try and except block. What can I do?
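One way to get a hard per-value limit (a sketch, not the asker's exact pipeline; work_on_one_value is a hypothetical stand-in for the heavy per-value work) is to run each unit of work in its own Process and terminate it if join() times out, as in the join(timeout=...) answer further down:
import time
from multiprocessing import Process

def run_with_timeout(target, args, timeout=60):
    # run one unit of work in its own process; kill it if it exceeds the deadline
    p = Process(target=target, args=args)
    p.start()
    p.join(timeout)
    if p.is_alive():
        p.terminate()   # hard stop after `timeout` seconds
        p.join()
        return False    # this value timed out
    return True

def work_on_one_value(value):
    time.sleep(1)       # hypothetical stand-in for the heavy per-value work

if __name__ == '__main__':
    for value in [16, 0, 1, 2]:
        ok = run_with_timeout(work_on_one_value, (value,), timeout=60)
        print(value, 'finished' if ok else 'timed out')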

Use timeout to return if function has not finished

I have the following scenario:
res = []

def longfunc(arg):
    # function runs arg number of steps
    # each step can take 500 ms to 2 seconds to complete
    # longfunc keeps adding the result of each step into the array res
    pass

def getResult(arg, timeout):
    # should call longfunc()
    # if longfunc() has not provided a result by timeout milliseconds then return None
    # if there is a partial result in res by timeout milliseconds then return res
    # if longfunc() ends before timeout milliseconds then return the complete result of longfunc, i.e. the res array
    pass

result = getResult(2, 500)
I am thinking of using multiprocessing.Process() to put longfunc() in a separate process, then start another thread to sleep for timeout milliseconds. I can't figure out how to get the result from both of them in the main thread and decide which one came first. Any suggestions on this approach or other approaches are appreciated.
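A minimal sketch of that idea, assuming a thread is acceptable instead of a separate process (so the shared res list stays visible and partial results can be returned; the sleep is a stand-in for a real step):
import threading
import time

res = []

def longfunc(arg):
    for step in range(arg):
        time.sleep(1)      # stand-in for a 500 ms to 2 s step
        res.append(step)   # partial results accumulate here

def getResult(arg, timeout):
    # run longfunc in a background thread and wait at most `timeout` milliseconds
    t = threading.Thread(target=longfunc, args=(arg,), daemon=True)
    t.start()
    t.join(timeout / 1000.0)
    return res if res else None   # partial or complete results, or None

result = getResult(2, 500)
print(result)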
You can use time.perf_counter and your code would look like this:
import time
from time import sleep

ProcessTime = time.perf_counter  # this returns nearly 0 when first called if Python version <= 3.6

ProcessTime()

def longfunc(arg, timeout):
    start = ProcessTime()
    while True:
        # Do anything
        delta = start + timeout - ProcessTime()
        if delta > 0:
            sleep(1)
        else:
            return  # Error or False
You can change the while to a for loop and, for each task, check the timeout.
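For instance, a small sketch of that per-task check (tasks is a hypothetical list of zero-argument callables):
import time

def run_tasks(tasks, timeout):
    start = time.perf_counter()
    results = []
    for task in tasks:
        # stop handing out new work once the deadline has passed
        if time.perf_counter() - start >= timeout:
            break
        results.append(task())
    return results  # possibly partial if the timeout was hit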
If you are applying multiprocessing then you can simply apply p.join(timeout=5), where p is a process.
Here is a simple example
import time
from itertools import count
from multiprocessing import Process

def inc_forever():
    print('Starting function inc_forever()...')
    while True:
        time.sleep(1)
        print(next(counter))

def return_zero():
    print('Starting function return_zero()...')
    return 0

if __name__ == '__main__':
    # counter is an infinite iterator
    counter = count(0)
    p1 = Process(target=inc_forever, name='Process_inc_forever')
    p2 = Process(target=return_zero, name='Process_return_zero')
    p1.start()
    p2.start()
    p1.join(timeout=5)
    p2.join(timeout=5)
    p1.terminate()
    p2.terminate()
    if p1.exitcode is None:
        print(f'Oops, {p1} timed out!')
    if p2.exitcode == 0:
        print(f'{p2} is lucky and finished in 5 seconds!')
I think it may help you

Python code to benchmark in flops using threading

I'm having trouble writing benchmark code in Python using threading. I was able to get my threading to work, but I can't get my object to return a value. I want to take the values and add them to a list so I can calculate the flops.
Create a class to carry out the threading:
import threading
import time

class myThread(threading.Thread):
    def calculation(self):
        n = 0
        start = time.time()
        ex_time = 0
        while ex_time < 30:
            n += 1
            end = time.time()
            ex_time = end - start
        return ex_time

    def run(self):
        t = threading.Thread(target=self.calculation)
        t.start()
Function to create the threads:
def make_threads(num):
    times = []
    calcs = []
    for i in range(num):
        print('start thread', i+1)
        thread1 = myThread()
        t = thread1.start()
        times.append(t)
        #calcs.append(n)
        #when trying to get a return value it comes back as None as seen
    print(times)
    #average out the times, add all the calculations to get the final numbers
    #to calculate flops
    time.sleep(32)  #stop the menu from printing until calc finish
def main():
    answer = 1
    while answer != 0:
        answer = int(input("Please indicate how many threads to use: (Enter 0 to exit)"))
        print("\n\nBenchmark test with ", answer, "threads")
        make_threads(answer)

main()
Two ways to do this:
1. Using static variables (hacky, but efficient and quick)
Define a shared variable (here a dict that every thread gets a reference to) and manipulate it in the thread, i.e.:
import threading
import time

class myThread(threading.Thread):
    def calculation(self):
        n = 0
        start = time.time()
        ex_time = 0
        print("Running....")
        while ex_time < 30:
            n += 1
            end = time.time()
            ex_time = end - start
        self.myThreadValues[self.idValue] = ex_time
        print(self.myThreadValues)
        return ex_time

    def setup(self, myThreadValues=None, idValue=None):
        self.myThreadValues = myThreadValues
        self.idValue = idValue

    def run(self):
        self.calculation()
        #t = threading.Thread(target = self.calculation)
        #t.start()

def make_threads(num):
    threads = []
    calcs = []
    myThreadValues = {}
    for i in range(num):
        print('start thread', i+1)
        myThreadValues[i] = 0
        thread1 = myThread()
        thread1.setup(myThreadValues, i)
        thread1.start()
        #times.append(t)
        threads.append(thread1)
    # Now we need to wait for all the threads to finish. There are a couple ways to do this, but the best is joining.
    print("joining all threads...")
    for thread in threads:
        thread.join()
    #calcs.append(n)
    #when trying to get a return value it comes back as none as seen
    print("Final thread values: " + str(myThreadValues))
    print("Done")
    #average out the times, add all the calculations to get the final numbers
    #to calculate flops
    #time.sleep(32) #stop the menu from printing until calc finish

def main():
    answer = 1
    while answer != 0:
        answer = int(input("Please indicate how many threads to use: (Enter 0 to exit)"))
        print("\n\nBenchmark test with ", answer, "threads")
        make_threads(answer)

main()
2. The proper way to do this is with Processes
Processes are designed for passing information back and forth, versus threads which are commonly used for async work. See explanation here: https://docs.python.org/3/library/multiprocessing.html
See this answer: How can I recover the return value of a function passed to multiprocessing.Process?
import multiprocessing
from os import getpid

def worker(procnum):
    print('I am number %d in process %d' % (procnum, getpid()))
    return getpid()

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=3)
    print(pool.map(worker, range(5)))

Python queue.join() doesn't work

I'm trying to do a multiprocessing computation in Python but for some reason I can't control the number of newly created processes. After a few seconds the IDE gets crushed under hundreds of new processes.
Here is the problematic code (start with expectiEvaluation at the bottom):
def serialExpecti(state, level):
    allPieces = [L(), T(), O(), Z(), I(), J(), S()]
    if level == 0:
        return expectiHelper(state) / len(allPieces)
    queueStates = util.PriorityQueueWithFunction(expectiHelper)
    evaluation = 0
    for i in range(len(allPieces)):
        succsecors = findAllPossiblePositions(state, allPieces[i])
        for curState in succsecors:
            queueStates.push(curState)
        bestState = queueStates.pop()
        evaluation += serialExpecti(bestState, level + 1)
        queueStates = util.PriorityQueueWithFunction(expectiHelper)  # clear queue
    # print evaluation
    return evaluation

def parallelExpecti(state, queue):
    print(os.getpid())
    queue.put(serialExpecti(state, 0))

def expectiEvaluation(state):
    allPieces = [L(), T(), O(), Z(), I(), J(), S()]
    queue = multiprocessing.JoinableQueue()
    queueStates = util.PriorityQueueWithFunction(expectiHelper)
    for i in range(len(allPieces)):
        succsecors = findAllPossiblePositions(state, allPieces[i])
        for curState in succsecors:
            queueStates.push(curState)
        bestState = queueStates.pop()
        p = multiprocessing.Process(target=parallelExpecti, args=(bestState, queue,))
        p.start()
    queue.join()
    evaluation = 0
    while not queue.empty():
        evaluation += queue.get()
    return evaluation
The computation in serialExpecti() is very heavy, so I wanted to make it parallel.
The function expectiEvaluation() is called from the main thread many times during the run of the program, and I think that for some reason the queue.join() doesn't block the execution, so the main thread keeps running and calling expectiEvaluation(), causing it to create more and more processes.
What could be the problem?
UPDATE:
I tried to get rid of the queue.join() by just keeping all the processes in a list and joining them at the end of the function, and it worked... sort of. Now the processes are under control and not being created like crazy, but the program is running very, very slowly, much slower than the pure serial version. What am I doing wrong?
Maybe it will help if I put the serial version here:
def expectiHelper(state):
    return 10*holesEvaluation(state) + fullRowEvaluation(state) + heightEvaluation(state) + maxLengthEvaluation(state) + averageEvaluation(state) + medianEvalutaion(state)

def expectiEvaluation(state, level=0):
    global problem
    allPieces = [L(), T(), O(), Z(), I(), J(), S()]
    if level == 3:
        return expectiHelper(state)/len(allPieces)
    queueStates = util.PriorityQueueWithFunction(expectiHelper)
    evaluation = 0
    for i in range(len(allPieces)):
        succsecors = problem.findAllPossiblePositions(state, allPieces[i])
        for curState in succsecors:
            queueStates.push(curState)
        bestState = queueStates.pop()
        evaluation += expectiEvaluation(bestState, level+1)  # add best evaluation for this piece
        queueStates = util.PriorityQueueWithFunction(expectiHelper)  # clear queue
    # print evaluation
    return evaluation
As you can see it is a recursive function, and every iteration of this loop is independent and very heavy, so it could be done in parallel:
for i in range(len(allPieces)):
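A sketch of how that loop could be handed to a worker pool instead of spawning a fresh Process per state (process_piece and expectiEvaluationParallel are hypothetical names; it assumes the game classes, util, and problem are importable by the workers):
import multiprocessing

def process_piece(args):
    # hypothetical wrapper around the body of the loop above
    state, piece, level = args
    queueStates = util.PriorityQueueWithFunction(expectiHelper)
    for curState in problem.findAllPossiblePositions(state, piece):
        queueStates.push(curState)
    bestState = queueStates.pop()
    return expectiEvaluation(bestState, level + 1)

def expectiEvaluationParallel(state, pool, level=0):
    allPieces = [L(), T(), O(), Z(), I(), J(), S()]
    if level == 3:
        return expectiHelper(state) / len(allPieces)
    # the per-piece work is independent, so map it onto the pool's workers
    return sum(pool.map(process_piece, [(state, piece, level) for piece in allPieces]))

# create the pool once (e.g. in the main module) and reuse it for every call,
# instead of starting new processes on each evaluation:
# pool = multiprocessing.Pool()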

How to increment a shared counter from multiple processes?

I am having trouble with the multiprocessing module. I am using a Pool of workers with its map method to concurrently analyze lots of files. Each time a file has been processed I would like to have a counter updated so that I can keep track of how many files remain to be processed. Here is sample code:
import os
import multiprocessing

counter = 0

def analyze(file):
    # Analyze the file.
    global counter
    counter += 1
    print(counter)

if __name__ == '__main__':
    files = os.listdir('/some/directory')
    pool = multiprocessing.Pool(4)
    pool.map(analyze, files)
I cannot find a solution for this.
The problem is that the counter variable is not shared between your processes: each separate process is creating its own local instance and incrementing that.
See this section of the documentation for some techniques you can employ to share state between your processes. In your case you might want to share a Value instance between your workers
Here's a working version of your example (with some dummy input data). Note it uses global values which I would really try to avoid in practice:
from multiprocessing import Pool, Value
from time import sleep

counter = None

def init(args):
    ''' store the counter for later use '''
    global counter
    counter = args

def analyze_data(args):
    ''' increment the global counter, do something with the input '''
    global counter
    # += operation is not atomic, so we need to get a lock:
    with counter.get_lock():
        counter.value += 1
    print(counter.value)
    return args * 10

if __name__ == '__main__':
    #inputs = os.listdir(some_directory)
    #
    # initialize a cross-process counter and the input lists
    #
    counter = Value('i', 0)
    inputs = [1, 2, 3, 4]
    #
    # create the pool of workers, ensuring each one receives the counter
    # as it starts.
    #
    p = Pool(initializer=init, initargs=(counter,))
    i = p.map_async(analyze_data, inputs, chunksize=1)
    i.wait()
    print(i.get())
Counter class without the race-condition bug:
class Counter(object):
    def __init__(self):
        self.val = multiprocessing.Value('i', 0)

    def increment(self, n=1):
        with self.val.get_lock():
            self.val.value += n

    @property
    def value(self):
        return self.val.value
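A hypothetical usage sketch for this class with multiprocessing.Process (the Counter is created in the parent and handed to each child as an argument, so the underlying Value is shared):
import multiprocessing

def worker(counter):
    for _ in range(1000):
        counter.increment()

if __name__ == '__main__':
    counter = Counter()
    procs = [multiprocessing.Process(target=worker, args=(counter,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # expected: 4000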
An extremely simple example, changed from jkp's answer:
from multiprocessing import Pool, Value
from time import sleep

counter = Value('i', 0)

def f(x):
    global counter
    with counter.get_lock():
        counter.value += 1
    print("counter.value:", counter.value)
    sleep(1)
    return x

with Pool(4) as p:
    r = p.map(f, range(1000*1000))
Faster Counter class without using the built-in lock of Value twice
class Counter(object):
    def __init__(self, initval=0):
        self.val = multiprocessing.RawValue('i', initval)
        self.lock = multiprocessing.Lock()

    def increment(self):
        with self.lock:
            self.val.value += 1

    @property
    def value(self):
        return self.val.value
https://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing
https://docs.python.org/2/library/multiprocessing.html#multiprocessing.sharedctypes.Value
https://docs.python.org/2/library/multiprocessing.html#multiprocessing.sharedctypes.RawValue
Here is a solution to your problem based on a different approach from that proposed in the other answers. It uses message passing with multiprocessing.Queue objects (instead of shared memory with multiprocessing.Value objects) and process-safe (atomic) built-in increment and decrement operators += and -= (instead of introducing custom increment and decrement methods) since you asked for it.
First, we define a class Subject for instantiating an object that will be local to the parent process and whose attributes are to be incremented or decremented:
import multiprocessing

class Subject:
    def __init__(self):
        self.x = 0
        self.y = 0
Next, we define a class Proxy for instantiating an object that will be the remote proxy through which the child processes will request the parent process to retrieve or update the attributes of the Subject object. The interprocess communication will use two multiprocessing.Queue attributes, one for exchanging requests and one for exchanging responses. Requests are of the form (sender, action, *args) where sender is the sender name, action is the action name ('get', 'set', 'increment', or 'decrement' the value of an attribute), and args is the argument tuple. Responses are of the form value (to 'get' requests):
class Proxy(Subject):
    def __init__(self, request_queue, response_queue):
        self.__request_queue = request_queue
        self.__response_queue = response_queue

    def _getter(self, target):
        sender = multiprocessing.current_process().name
        self.__request_queue.put((sender, 'get', target))
        return Decorator(self.__response_queue.get())

    def _setter(self, target, value):
        sender = multiprocessing.current_process().name
        action = getattr(value, 'action', 'set')
        self.__request_queue.put((sender, action, target, value))

    @property
    def x(self):
        return self._getter('x')

    @property
    def y(self):
        return self._getter('y')

    @x.setter
    def x(self, value):
        self._setter('x', value)

    @y.setter
    def y(self, value):
        self._setter('y', value)
Then, we define the class Decorator to decorate the int objects returned by the getters of a Proxy object in order to inform its setters whether the increment or decrement operators += and -= have been used by adding an action attribute, in which case the setters request an 'increment' or 'decrement' operation instead of a 'set' operation. The increment and decrement operators += and -= call the corresponding augmented assignment special methods __iadd__ and __isub__ if they are defined, and fall back on the assignment special methods __add__ and __sub__ which are always defined for int objects (e.g. proxy.x += value is equivalent to proxy.x = proxy.x.__iadd__(value) which is equivalent to proxy.x = type(proxy).x.__get__(proxy).__iadd__(value) which is equivalent to type(proxy).x.__set__(proxy, type(proxy).x.__get__(proxy).__iadd__(value))):
class Decorator(int):
    def __iadd__(self, other):
        value = Decorator(other)
        value.action = 'increment'
        return value

    def __isub__(self, other):
        value = Decorator(other)
        value.action = 'decrement'
        return value
Then, we define the function worker that will be run in the child processes and request the increment and decrement operations:
def worker(proxy):
    proxy.x += 1
    proxy.y -= 1
Finally, we define a single request queue to send requests to the parent process, and multiple response queues to send responses to the child processes:
if __name__ == '__main__':
    subject = Subject()
    request_queue = multiprocessing.Queue()
    response_queues = {}
    processes = []
    for index in range(4):
        sender = 'child {}'.format(index)
        response_queues[sender] = multiprocessing.Queue()
        proxy = Proxy(request_queue, response_queues[sender])
        process = multiprocessing.Process(
            target=worker, args=(proxy,), name=sender)
        processes.append(process)
    running = len(processes)
    for process in processes:
        process.start()
    while subject.x != 4 or subject.y != -4:
        sender, action, *args = request_queue.get()
        print(sender, 'requested', action, *args)
        if action == 'get':
            response_queues[sender].put(getattr(subject, args[0]))
        elif action == 'set':
            setattr(subject, args[0], args[1])
        elif action == 'increment':
            setattr(subject, args[0], getattr(subject, args[0]) + args[1])
        elif action == 'decrement':
            setattr(subject, args[0], getattr(subject, args[0]) - args[1])
    for process in processes:
        process.join()
The program is guaranteed to terminate when += and -= are process-safe. If you remove process-safety by commenting the corresponding __iadd__ or __isub__ of Decorator then the program will only terminate by chance (e.g. proxy.x += value is equivalent to proxy.x = proxy.x.__iadd__(value) but falls back to proxy.x = proxy.x.__add__(value) if __iadd__ is not defined, which is equivalent to proxy.x = proxy.x + value which is equivalent to proxy.x = type(proxy).x.__get__(proxy) + value which is equivalent to type(proxy).x.__set__(proxy, type(proxy).x.__get__(proxy) + value), so the action attribute is not added and the setter requests a 'set' operation instead of an 'increment' operation).
Example process-safe session (atomic += and -=):
child 0 requested get x
child 0 requested increment x 1
child 0 requested get y
child 0 requested decrement y 1
child 3 requested get x
child 3 requested increment x 1
child 3 requested get y
child 2 requested get x
child 3 requested decrement y 1
child 1 requested get x
child 2 requested increment x 1
child 2 requested get y
child 2 requested decrement y 1
child 1 requested increment x 1
child 1 requested get y
child 1 requested decrement y 1
Example process-unsafe session (non-atomic += and -=):
child 2 requested get x
child 1 requested get x
child 0 requested get x
child 2 requested set x 1
child 2 requested get y
child 1 requested set x 1
child 1 requested get y
child 2 requested set y -1
child 1 requested set y -1
child 0 requested set x 1
child 0 requested get y
child 0 requested set y -2
child 3 requested get x
child 3 requested set x 2
child 3 requested get y
child 3 requested set y -3 # the program stalls here
A more sophisticated solution based on lock-free atomic operations, as given by the example in the atomics library README:
from multiprocessing import Process, shared_memory
import atomics

def fn(shmem_name: str, width: int, n: int) -> None:
    shmem = shared_memory.SharedMemory(name=shmem_name)
    buf = shmem.buf[:width]
    with atomics.atomicview(buffer=buf, atype=atomics.INT) as a:
        for _ in range(n):
            a.inc()
    del buf
    shmem.close()

if __name__ == "__main__":
    # setup
    width = 4
    shmem = shared_memory.SharedMemory(create=True, size=width)
    buf = shmem.buf[:width]
    total = 10_000

    # run processes to completion
    p1 = Process(target=fn, args=(shmem.name, width, total // 2))
    p2 = Process(target=fn, args=(shmem.name, width, total // 2))
    p1.start(), p2.start()
    p1.join(), p2.join()

    # print results and cleanup
    with atomics.atomicview(buffer=buf, atype=atomics.INT) as a:
        print(f"a[{a.load()}] == total[{total}]")
    del buf
    shmem.close()
    shmem.unlink()
(atomics could be installed via pip install atomics on most of the major platforms)
This is a different solution and the simplest to my taste.
The reasoning is that you create an empty list and append to it each time your function executes, then print len(list) to check progress.
Here is an example based on your code:
import os
import multiprocessing

counter = []

def analyze(file):
    # Analyze the file.
    counter.append(' ')
    print(len(counter))

if __name__ == '__main__':
    files = os.listdir('/some/directory')
    pool = multiprocessing.Pool(4)
    pool.map(analyze, files)
For future visitors, the hack to add a counter to multiprocessing is as follows:
from multiprocessing.pool import ThreadPool

counter = []

def your_function():
    # function/process
    counter.append(' ')  # you can append anything
    return len(counter)

pool = ThreadPool()
result = pool.map(get_data, urls)
Hope this will help.
I'm working on a progress bar in PyQt5, so I use a thread and a pool together:
import threading
import multiprocessing as mp
from queue import Queue

def multi(x):
    return x * x

def pooler(q):
    # run the pool inside a thread and report progress through the queue
    with mp.Pool() as pool:
        count = 0
        for i in pool.imap_unordered(multi, range(100)):
            print(count, i)
            count += 1
            q.put(count)

def main():
    q = Queue()
    t = threading.Thread(target=pooler, args=(q,))
    t.start()
    print('start')
    process = 0
    while process < 100:
        process = q.get()
        print('p', process)

if __name__ == '__main__':
    main()
I put this in a QThread worker and it works with acceptable latency.
