Multiprocessing script gets stuck - python

I have the following Python code:
import numpy as np
def workPackage(filelist, datacolnames, outputnames, identifier, queue):
    try:
        outputdata = dict()
        iterator = 1
        for name in outputnames:
            outputdata[name] = []
        for filename in filelist:
            read_data = np.genfromtxt(filename, comments="#", unpack=True, names=datacolnames, delimiter=";")
            mean_val1 = np.mean(read_data["val1"])
            mean_val2 = np.mean(read_data["val2"])
            outputdata[outputnames[0]].append(read_data["setpoint"][0])
            outputdata[outputnames[1]].append(mean_val1)
            outputdata[outputnames[2]].append(mean_val2)
            outputdata[outputnames[3]].append(mean_val1-mean_val2)
            outputdata[outputnames[4]].append((mean_val1-mean_val2)/read_data["setpoint"][0]*100)
            outputdata[outputnames[5]].append(2*np.std(read_data["val1"]))
            outputdata[outputnames[6]].append(2*np.std(read_data["val2"]))
            print("Process "+str(identifier+1)+": "+str(round(100*(iterator/len(filelist)),1))+"% complete")
            iterator = iterator+1
        queue.put(outputdata)
    except:
        print("some message")
if __name__ == '__main__':
    "Main script"
This code is used to evaluate a large amount of measurement data. In total I got some 900 files across multiple directories (about 13GB in total).
The main script determines all the filepaths and stores them in 4 chunks. Each chunk (list of filepaths) is given to one process.
try:
    print("Distributing the workload on "+str(numberOfProcesses)+" processes...")
    for i in range(0,numberOfProcesses):
        q[i] = multiprocessing.Queue()
        Processes[i] = multiprocessing.Process(target=workPackage, args=(filelistChunks[i], colnames, outputdatanames, i, q[i]))
        Processes[i].start()
    for i in range(0,numberOfProcesses):
        Processes[i].join()
except:
    print("Exception while processing stuff...")
After that, the results are read from the queues and stored to an output file.
Now here's my problem:
The script starts the 4 processes and each of them runs to 100% (see the print in the workPackage function). They don't finish at the same time but within about 2 minutes.
But then the script simply stops.
If I limit the amount of data to process by simply cutting the filelist, it sometimes runs until the end and sometimes doesn't.
I don't understand why the script simply gets stuck after all processes reach 100%.
I seriously don't know what's happening there.

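You add items to the queue with queue.put() and then call join() on each process, but I don't see where you call queue.get() before joining. A process that has put items on a multiprocessing.Queue does not exit until its buffered items have been flushed to the underlying pipe, so join() can block forever when the results are large. That also explains why it sometimes works with less data. Read the results with get() first, then join the processes.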
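A minimal sketch of that ordering, assuming Python 3 and one queue per process as in the question. The names work_package and run are made up for illustration, and the squaring is just a stand-in for the real file processing:

```python
import multiprocessing


def work_package(numbers, identifier, queue):
    # Hypothetical stand-in for the real file processing: square each number.
    queue.put((identifier, [n * n for n in numbers]))


def run(chunks):
    queues, processes = [], []
    for i, chunk in enumerate(chunks):
        q = multiprocessing.Queue()
        p = multiprocessing.Process(target=work_package, args=(chunk, i, q))
        p.start()
        queues.append(q)
        processes.append(p)
    # Drain every queue first, so each child can flush its buffered data ...
    results = [q.get() for q in queues]
    # ... and only then join; joining before get() is what can deadlock.
    for p in processes:
        p.join()
    return results


if __name__ == "__main__":
    print(run([[1, 2], [3, 4], [5, 6], [7, 8]]))
```

The only change compared to the question's main script is that get() happens before join().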

Related

Multiprocessing Running Slower than a Single Process

I'm attempting to use multiprocessing to run many simulations across multiple processes; however, the code I have written only uses 1 of the processes as far as I can tell.
Updated
I've gotten all the processes to work (I think) thanks to @PaulBecotte; however, the multiprocessing version seems to run significantly slower than its non-multiprocessing counterpart.
For instance, not including the function and class declarations/implementations and imports, I have:
def monty_hall_sim(num_trial, player_type='AlwaysSwitchPlayer'):
    if player_type == 'NeverSwitchPlayer':
        player = NeverSwitchPlayer('Never Switch Player')
    else:
        player = AlwaysSwitchPlayer('Always Switch Player')
    return (MontyHallGame().play_game(player) for trial in xrange(num_trial))
def do_work(in_queue, out_queue):
    while True:
        try:
            f, args = in_queue.get()
            ret = f(*args)
            for result in ret:
                out_queue.put(result)
        except:
            break
def main():
    logging.getLogger().setLevel(logging.ERROR)
    always_switch_input_queue = multiprocessing.Queue()
    always_switch_output_queue = multiprocessing.Queue()
    total_sims = 20
    num_processes = 5
    process_sims = total_sims/num_processes
    with Timer(timer_name='Always Switch Timer'):
        for i in xrange(num_processes):
            always_switch_input_queue.put((monty_hall_sim, (process_sims, 'AlwaysSwitchPlayer')))
        procs = [multiprocessing.Process(target=do_work, args=(always_switch_input_queue, always_switch_output_queue)) for i in range(num_processes)]
        for proc in procs:
            proc.start()
        always_switch_res = []
        while len(always_switch_res) != total_sims:
            always_switch_res.append(always_switch_output_queue.get())
        always_switch_success = float(always_switch_res.count(True))/float(len(always_switch_res))
    print '\tLength of Always Switch Result List: {alw_sw_len}'.format(alw_sw_len=len(always_switch_res))
    print '\tThe success average of switching doors was: {alw_sw_prob}'.format(alw_sw_prob=always_switch_success)
which yields:
Time Elapsed: 1.32399988174 seconds
Length: 20
The success average: 0.6
However, I am attempting to use this for total_sims = 10,000,000 over num_processes = 5, and doing so has taken significantly longer than using 1 process (1 process returned in ~3 minutes). The non-multiprocessing counterpart I'm comparing it to is:
def main():
    logging.getLogger().setLevel(logging.ERROR)
    with Timer(timer_name='Always Switch Monty Hall Timer'):
        always_switch_res = [MontyHallGame().play_game(AlwaysSwitchPlayer('Monty Hall')) for x in xrange(10000000)]
        always_switch_success = float(always_switch_res.count(True))/float(len(always_switch_res))
    print '\n\tThe success average of not switching doors was: {not_switching}' \
          '\n\tThe success average of switching doors was: {switching}'.format(not_switching=never_switch_success,
                                                                                switching=always_switch_success)
You could try import “process “ under some if statements
EDIT- you changed some stuff, let me try and explain a bit better.
Each message you put into the input queue will cause the monty_hall_sim function to get called and send num_trial messages to the output queue.
So your original implementation was right- to get 20 output messages, send in 5 input messages.
However, your function is slightly wrong.
for trial in xrange(num_trial):
    res = MontyHallGame().play_game(player)
    yield res
This will turn the function into a generator that will provide a new value on each next() call- great! The problem is here
while True:
    try:
        f, args = in_queue.get(timeout=1)
        ret = f(*args)
        out_queue.put(ret.next())
    except:
        break
Here, on each pass through the loop you create a NEW generator with a NEW message. The old one is thrown away. So here, each input message only adds a single output message to the queue before you throw it away and get another one. The correct way to write this is-
while True:
    try:
        f, args = in_queue.get(timeout=1)
        ret = f(*args)
        # Iterate over the generator and forward every result it yields
        for result in ret:
            out_queue.put(result)
    except:
        break
Doing it this way will continue to yield output messages from the generator until it finishes (after yielding 4 messages in this case)
I was able to get my code to run significantly faster by changing monty_hall_sim's return to a list comprehension, having do_work add the lists to the output queue, and then extend the results list of main with the lists returned by the output queue. Made it run in ~13 seconds.
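For anyone who wants to see what that change looks like, here is a rough sketch, assuming Python 3. play_game is a hypothetical stand-in for MontyHallGame().play_game(player) with an always-switch player, and the None sentinel is my own addition to stop the workers instead of relying on a bare except:

```python
import multiprocessing
import random


def play_game():
    # Hypothetical stand-in for the game: an always-switch player wins
    # whenever the first pick was not the door hiding the car.
    return random.randrange(3) != random.randrange(3)


def monty_hall_sim(num_trial):
    # Return a list, so the whole batch is pickled and sent in one message.
    return [play_game() for _ in range(num_trial)]


def do_work(in_queue, out_queue):
    # Process tasks until a None sentinel arrives; one put() per task.
    for f, args in iter(in_queue.get, None):
        out_queue.put(f(*args))


def main(total_sims=20000, num_processes=4):
    in_queue = multiprocessing.Queue()
    out_queue = multiprocessing.Queue()
    per_process = total_sims // num_processes
    for _ in range(num_processes):
        in_queue.put((monty_hall_sim, (per_process,)))
    for _ in range(num_processes):
        in_queue.put(None)  # one sentinel per worker
    procs = [multiprocessing.Process(target=do_work, args=(in_queue, out_queue))
             for _ in range(num_processes)]
    for p in procs:
        p.start()
    results = []
    for _ in range(num_processes):
        results.extend(out_queue.get())  # one list per task
    for p in procs:
        p.join()
    return sum(results) / float(len(results))


if __name__ == "__main__":
    print("Success rate when switching:", main())
```

The speedup comes from sending a handful of large messages instead of one queue message per simulation, since every put() and get() involves pickling and inter-process communication.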

How to run two functions in parallel in python when one collects data from the other?

I'm new to multiprocessing and want to collect data in one function and write data in another function simultaneously. Here's pseudocode of what I have.
import random
def Read_Data():
    ## Read 5 random values
    values = [random.randint(1, 10) for _ in range(5)]
    for num in range(0,5):
        print('Reading ' + str(values[num]))
    return values
def Write_data(values):
    ## Arrange the random values in ascending order
    arranged_values = sorted(values)
    for num in range(0,5):
        print('Writing ' + str(arranged_values[num]))
if __name__=='__main__':
    values = Read_Data()
    Write_data(values)
I want the output to look like this.
reading 1, writing none
reading 3, writing none
reading 5, writing none
reading 4, writing none
reading 2, writing none
reading 7, writing 1
reading 8, writing 2
reading 10, writing 3
reading 9, writing 4
reading 6, writing 5
Now the reason I want it to run parallel is to make sure I'm collecting data all the time and not losing data while I'm modifying and printing.
How can I do it using multiprocessing?
This should illustrate a few concepts. The queue is used to pass objects between the processes.
The reader simply gets its value somewhere and places it on the queue.
The writer listens on the queue forever.
Putting a "TERMINATE" marker on the queue is a very simple way of telling the writer to stop listening (there are other more effective ways using signals and events, but this just illustrates the concept).
At the end we "join" on the two processes to make sure they exit before we exit the main process (otherwise they are left hanging in space and time).
from multiprocessing import Process, Queue
from time import sleep
def reader(q):
    for i in range(10):
        print("Reading", i)
        q.put(i)
        sleep(1)
    print("Reading TERMINATED")
    q.put("TERMINATE")
def writer(q):
    while True:
        i = q.get()
        if i == "TERMINATE":
            print("Writer TERMINATED")
            break
        print("Writing", i)
if __name__ == "__main__":
    # The guard keeps child processes from re-running this block on platforms that spawn
    q = Queue()
    pr = Process(target=reader, args=(q,))
    pw = Process(target=writer, args=(q,))
    pw.start()
    pr.start()
    pw.join()
    pr.join()
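To get closer to the output in the question (read five values, then write them in ascending order while the next five are being read), the same pattern can pass whole batches through the queue. This is only a sketch: BATCH_SIZE, the hard-coded input values and the None sentinel are assumptions for illustration.

```python
from multiprocessing import Process, Queue

BATCH_SIZE = 5  # assumed batch size, taken from the question's example


def reader(q, values):
    # Collect values and hand them over one full batch at a time.
    batch = []
    for value in values:
        print("Reading", value)
        batch.append(value)
        if len(batch) == BATCH_SIZE:
            q.put(batch)
            batch = []
    if batch:
        q.put(batch)  # flush a final partial batch
    q.put(None)  # sentinel: no more data


def writer(q):
    # Sort and print each batch until the sentinel arrives.
    for batch in iter(q.get, None):
        for value in sorted(batch):
            print("Writing", value)


if __name__ == "__main__":
    q = Queue()
    pr = Process(target=reader, args=(q, [1, 3, 5, 4, 2, 7, 8, 10, 9, 6]))
    pw = Process(target=writer, args=(q,))
    pw.start()
    pr.start()
    pr.join()
    pw.join()
```

Because the two processes run independently, the exact interleaving of the "Reading" and "Writing" lines depends on timing.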

How can I prevent values from overlapping in a Python multiprocessing?

I'm trying Python multiprocessing, and I want to use Lock to avoid overlapping variable 'es_id' values.
According to the theory and examples, when a process acquires the lock, 'es_id' can't overlap because other processes can't access it, but the results show that es_id often overlaps.
How can the id values not overlap?
Part of my code is:
def saveDB(imgName, imgType, imgStar, imgPull, imgTag, lock): #lock=Lock() in main
    imgName=NameFormat(imgName) #name/subname > name:subname
    i=0
    while i < len(imgName):
        lock.acquire() #since global es_id
        global es_id
        print "getIMG.pt:save information about %s"%(imgName[i])
        cmd="curl -XPUT http://localhost:9200/kimhk/imgName/"+str(es_id)+" -d '{" +\
            '"image_name":"'+imgName[i]+'", '+\
            '"image_type":"'+imgType[i]+'", '+\
            '"image_star":"'+imgStar[i]+'", '+\
            '"image_pull":"'+imgPull[i]+'", '+\
            '"image_Tag":"'+",".join(imgTag[i])+'"'+\
            "}'"
        try:
            subprocess.call(cmd,shell=True)
        except subprocess.CalledProcessError as e:
            print e.output
        i+=1
        es_id+=1
        lock.release()
...
#main
if __name__ == "__main__":
    lock = Lock()
    exPg, proc_num=option()
    procs=[]
    pages=[ [] for i in range(proc_num)]
    i=1
    #Use Multiprocessing to get HTML data quickly
    if proc_num >= exPg: #if page is less than proc_num, don't need to distribute the page to the process.
        while i<=exPg:
            page=i
            proc=Process(target=getExplore, args=(page,lock,))
            procs.append(proc)
            proc.start()
            i+=1
    else:
        while i<=exPg: #distribute the page to the process
            page=i
            index=(i-1)%proc_num #if proc_num=4 -> 0 1 2 3
            pages[index].append(page)
            i+=1
        i=0
        while i<proc_num:
            proc=Process(target=getExplore, args=(pages[i],lock,))#
            procs.append(proc)
            proc.start()
            i+=1
    for proc in procs:
        proc.join()
execution result screen:
The result is the output of subprocess.call(cmd, shell=True). I use XPUT to add data to ElasticSearch, and es_id is the id of the data. I want these ids to increase sequentially without overlap, because if they overlap, the earlier data is overwritten.
I know XPOST doesn't need to use a lock code because it automatically generates an ID, but I need to access all the data sequentially in the future (like reading one line of files).
If you know how to access all the data sequentially after using XPOST, can you tell me?
It looks like you are trying to access a global variable with a lock, but global variables are different instances between processes. What you need to use is a shared memory value. Here's a working example. It has been tested on Python 2.7 and 3.6:
from __future__ import print_function
import multiprocessing as mp
def process(counter):
    # Increment the counter 3 times.
    # Hold the counter's lock for read/modify/write operations.
    # Keep holding it so the value doesn't change before printing,
    # and keep prints from multiple processes from trying to write
    # to a line at the same time.
    for _ in range(3):
        with counter.get_lock():
            counter.value += 1
            print(mp.current_process().name,counter.value)
def main():
    counter = mp.Value('i') # shared integer
    processes = [mp.Process(target=process,args=(counter,)) for i in range(3)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
if __name__ == '__main__':
    main()
Output:
Process-2 1
Process-2 2
Process-1 3
Process-3 4
Process-2 5
Process-1 6
Process-3 7
Process-1 8
Process-3 9
You've only given part of your code, so I can only see a potential problem. It doesn't do any good to lock-protect one access to es_id. You must lock-protect them all, anywhere they occur in the program. Perhaps it is best to create an access function for this purpose, like:
def increment_es_id():
    global es_id
    lock.acquire()
    es_id += 1
    lock.release()
This can be called safely from any thread.
In your code, it's good practice to keep the acquire/release calls as close together as you can. Here you only need to protect one variable, so you can move the acquire/release pair to just before and after the es_id += 1 statement.
Even better is to use the lock in a context manager (although in this simple case it won't make any difference):
def increment_es_id2():
    global es_id
    with lock:
        es_id += 1
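Combining the two answers, a small access function can reserve an id from a shared multiprocessing.Value while holding its lock, so that no two processes ever get the same number. This is a sketch under assumptions: next_es_id and save_items are hypothetical names, and the queue only stands in for the real curl call.

```python
import multiprocessing as mp


def next_es_id(counter):
    # Reserve the current id and advance the counter under the Value's lock.
    with counter.get_lock():
        reserved = counter.value
        counter.value += 1
    return reserved


def save_items(counter, names, out_q):
    # Stand-in for saveDB: pair every item with its own unique id.
    for name in names:
        out_q.put((next_es_id(counter), name))


if __name__ == "__main__":
    counter = mp.Value("i", 0)  # shared integer, starts at 0
    out_q = mp.Queue()
    chunks = [["a", "b"], ["c", "d"], ["e", "f"]]
    procs = [mp.Process(target=save_items, args=(counter, chunk, out_q))
             for chunk in chunks]
    for p in procs:
        p.start()
    saved = [out_q.get() for _ in range(6)]  # drain before joining
    for p in procs:
        p.join()
    print(sorted(saved))
```

Only the id reservation is locked here, so the slow part (the HTTP request) can still run in parallel across processes.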

Multiprocessing passing an array of dicts through shared memory

The following code works, but it is very slow due to passing the large data sets. In the actual implementation, the time it takes to create a process and send the data is almost the same as the calculation time, so by the time the second process is created, the first process is almost finished with its calculation, making parallelization pointless.
The code is the same as in this question, "Multiprocessing has cutoff at 992 integers being joined as result", with the suggested change implemented below. However, I ran into what I assume is a common problem: pickling large data takes a long time.
I see answers using multiprocessing.Array to pass a shared memory array. I have an array of ~4000 indexes, but each index has a dictionary with 200 key/value pairs. The data is only read by each process, some calculation is done, and then a matrix (4000x3) (with no dicts) is returned.
Answers like this one, "Is shared readonly data copied to different processes for Python multiprocessing?", use map. Is it possible to maintain the below system and implement shared memory? Is there an efficient way to send the data to each process with an array of dicts, such as wrapping the dict in some manager and then putting that inside of the multiprocessing.Array?
import multiprocessing
def main():
    data = {}
    total = []
    for j in range(0,3000):
        total.append(data)
        for i in range(0,200):
            data[str(i)] = i
    CalcManager(total,start=0,end=3000)
def CalcManager(myData,start,end):
    print 'in calc manager'
    #Multi processing
    #Set the number of processes to use.
    nprocs = 3
    #Initialize the multiprocessing queue so we can get the values returned to us
    tasks = multiprocessing.JoinableQueue()
    result_q = multiprocessing.Queue()
    #Setup an empty array to store our processes
    procs = []
    #Divide up the data for the set number of processes
    interval = (end-start)/nprocs
    new_start = start
    #Create all the processes while dividing the work appropriately
    for i in range(nprocs):
        print 'starting processes'
        new_end = new_start + interval
        #Make sure we dont go past the size of the data
        if new_end > end:
            new_end = end
        #Generate a new process and pass it the arguments
        data = myData[new_start:new_end]
        #Create the processes and pass the data and the result queue
        p = multiprocessing.Process(target=multiProcess,args=(data,new_start,new_end,result_q,i))
        procs.append(p)
        p.start()
        #Increment our next start to the current end
        new_start = new_end+1
    print 'finished starting'
    #Print out the results
    for i in range(nprocs):
        result = result_q.get()
        print result
    #Join the processes to wait for all data/process to be finished
    for p in procs:
        p.join()
#MultiProcess Handling
def multiProcess(data,start,end,result_q,proc_num):
    print 'started process'
    results = []
    temp = []
    for i in range(0,22):
        results.append(temp)
        for j in range(0,3):
            temp.append(j)
    result_q.put(results)
    return
if __name__== '__main__':
    main()
Solved
Just putting the list of dictionaries into a manager solved the problem.
manager=Manager()
d=manager.list(myData)
It seems that the manager holding the list also manages the dicts contained in that list. The startup time is a bit slow, so it seems the data is still being copied, but it's done once at the beginning, and then the data is sliced inside each process.
import multiprocessing
import multiprocessing.sharedctypes as mt
from multiprocessing import Process, Lock, Manager
from ctypes import Structure, c_double
def main():
    data = {}
    total = []
    for j in range(0,3000):
        total.append(data)
        for i in range(0,100):
            data[str(i)] = i
    CalcManager(total,start=0,end=500)
def CalcManager(myData,start,end):
    print 'in calc manager'
    print type(myData[0])
    manager = Manager()
    d = manager.list(myData)
    #Multi processing
    #Set the number of processes to use.
    nprocs = 3
    #Initialize the multiprocessing queue so we can get the values returned to us
    tasks = multiprocessing.JoinableQueue()
    result_q = multiprocessing.Queue()
    #Setup an empty array to store our processes
    procs = []
    #Divide up the data for the set number of processes
    interval = (end-start)/nprocs
    new_start = start
    #Create all the processes while dividing the work appropriately
    for i in range(nprocs):
        new_end = new_start + interval
        #Make sure we dont go past the size of the data
        if new_end > end:
            new_end = end
        #Generate a new process and pass it the arguments
        data = myData[new_start:new_end]
        #Create the processes and pass the data and the result queue
        p = multiprocessing.Process(target=multiProcess,args=(d,new_start,new_end,result_q,i))
        procs.append(p)
        p.start()
        #Increment our next start to the current end
        new_start = new_end+1
    print 'finished starting'
    #Print out the results
    for i in range(nprocs):
        result = result_q.get()
        print len(result)
    #Join the processes to wait for all data/process to be finished
    for p in procs:
        p.join()
#MultiProcess Handling
def multiProcess(data,start,end,result_q,proc_num):
    #print 'started process'
    results = []
    temp = []
    data = data[start:end]
    for i in range(0,22):
        results.append(temp)
        for j in range(0,3):
            temp.append(j)
    print len(data)
    result_q.put(results)
    return
if __name__ == '__main__':
    main()
You may see some improvement by using a multiprocessing.Manager to store your list in a manager server, and having each child process access items from the dict by pulling them from that one shared list, rather than copying slices to each child process:
def CalcManager(myData,start,end):
    print 'in calc manager'
    print type(myData[0])
    manager = Manager()
    d = manager.list(myData)
    nprocs = 3
    result_q = multiprocessing.Queue()
    procs = []
    interval = (end-start)/nprocs
    new_start = start
    for i in range(nprocs):
        new_end = new_start + interval
        if new_end > end:
            new_end = end
        p = multiprocessing.Process(target=multiProcess,
                                    args=(d, new_start, new_end, result_q, i))
        procs.append(p)
        p.start()
        #Increment our next start to the current end
        new_start = new_end+1
    print 'finished starting'
    for i in range(nprocs):
        result = result_q.get()
        print len(result)
    #Join the processes to wait for all data/process to be finished
    for p in procs:
        p.join()
This copies your entire data list to a Manager process prior to creating any of your workers. The Manager returns a Proxy object that allows shared access to the list. You then just pass the Proxy to the workers, which means their startup time will be greatly reduced, since there's no longer any need to copy slices of the data list. The downside here is that accessing the list will be slower in the children, since each access needs to go to the manager process via IPC. Whether or not this will really help performance is very dependent on exactly what work you're doing on the list in your worker processes, but it's worth a try, since it requires very few code changes.
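Since every access to the proxy is a round trip to the manager process, one way to limit that cost is to have each worker fetch its slice once and then work on the local copy. A minimal sketch, assuming Python 3; summarize and the tiny data set are made up for illustration:

```python
import multiprocessing


def summarize(rows):
    # Hypothetical calculation: sum the values of each dict.
    return [sum(row.values()) for row in rows]


def worker(shared, start, end, result_q):
    # One slice request copies the rows over in a single round trip;
    # after that, the work runs on plain local dicts.
    local_rows = shared[start:end]
    result_q.put((start, summarize(local_rows)))


def main():
    data = [{str(k): k + i for k in range(3)} for i in range(9)]
    manager = multiprocessing.Manager()
    shared = manager.list(data)  # data is copied to the manager once
    result_q = multiprocessing.Queue()
    bounds = [(0, 3), (3, 6), (6, 9)]
    procs = [multiprocessing.Process(target=worker, args=(shared, s, e, result_q))
             for s, e in bounds]
    for p in procs:
        p.start()
    results = sorted(result_q.get() for _ in procs)  # drain before joining
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(main())
```

Iterating over the proxy item by item inside the worker would instead cost one IPC call per element, which is usually where the slowdown shows up.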
Looking at your question, I assume the following:
- For each item in myData, you want to return an output (a matrix of some sort)
- You created a JoinableQueue (tasks), probably for holding the input, but were not sure how to use it
The Code
import logging
import multiprocessing
def create_logger(logger_name):
    ''' Create a logger that logs to the console '''
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    # create console handler and set appropriate level
    ch = logging.StreamHandler()
    formatter = logging.Formatter("%(processName)s %(funcName)s() %(levelname)s: %(message)s")
    ch.setFormatter(formatter)
    logger.addHandler(ch)
    return logger
def main():
    global logger
    logger = create_logger(__name__)
    logger.info('Main started')
    data = []
    for i in range(0,100):
        data.append({str(i):i})
    CalcManager(data,start=0,end=50)
    logger.info('Main ended')
def CalcManager(myData,start,end):
    logger.info('CalcManager started')
    #Initialize the multiprocessing queue so we can get the values returned to us
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()
    # Add tasks
    for i in range(start, end):
        tasks.put(myData[i])
    # Create processes to do work
    nprocs = 3
    for i in range(nprocs):
        logger.info('starting processes')
        p = multiprocessing.Process(target=worker,args=(tasks,results))
        p.daemon = True
        p.start()
    # Wait for tasks completion, i.e. tasks queue is empty
    try:
        tasks.join()
    except KeyboardInterrupt:
        logger.info('Cancel tasks')
    # Print out the results
    print 'RESULTS'
    while not results.empty():
        result = results.get()
        print result
    logger.info('CalcManager ended')
def worker(tasks, results):
    while True:
        try:
            task = tasks.get() # one row of input
            task['done'] = True # simulate work being done
            results.put(task) # Save the result to the output queue
        finally:
            # JoinableQueue: for every get(), we need a task_done()
            tasks.task_done()
if __name__== '__main__':
    main()
Discussion
For multi-process situations, I recommend using the logging module, as it offers a few advantages:
- It is thread-safe, and each record is typically written as one complete line, so the output of several processes is less likely to get mingled together than with bare print statements. (Note that logging to a single file from multiple processes is not supported by the standard handlers.)
- You can configure logging to show the process name and function name, which is very handy for debugging
CalcManager is essentially a task manager which does the following:
- Populates the input queue, tasks
- Creates three processes
- Waits for the tasks to complete
- Prints out the results
Note that when creating the processes, I mark them as daemon, meaning they will be killed when the main program exits. You don't have to worry about killing them.
worker is where the work is done:
- Each of them runs forever (while True loop)
- Each time through the loop, they get one unit of input, do some processing, then put the result in the output queue
- After a task is done, the worker calls task_done() so that the main process knows when all jobs are done. I put task_done() in the finally clause to ensure it runs even if an error occurs during processing

Python: subprocess memory leak

I want to run a serial program on multiple cores at the same time, and I need to do that multiple times (in a loop).
I use subprocess.Popen to distribute the jobs on the processors, limiting the number of jobs to the number of available processors. I add the jobs to a list, check with poll() whether they are done, remove finished jobs from the list, and keep submitting until the total number of jobs is completed.
I have been looking on the web and found a couple of interesting scripts to do that and came out with my adapted version:
nextProc = 0
processes = []
while (len(processes) < limitProc): # Here I assume that limitProc < ncores
    input = filelist[nextProc]+'.in' # filelist: list of input file
    output = filelist[nextProc]+'.out' # list of output file
    cwd = pathlist[nextProc] # list of paths
    processes.append(subprocess.Popen(['myProgram','-i',input,'-screen',output],cwd=cwd,bufsize=-1))
    nextProc += 1
    time.sleep(wait)
while (len(processes) > 0): # Loop until all processes are done
    time.sleep(wait)
    for i in xrange(len(processes)-1, -1, -1): # Remove processes done (traverse backward)
        if processes[i].poll() is not None:
            del processes[i]
            time.sleep(wait)
    while (len(processes) < limitProc) and (nextProc < maxProcesses): # Submit new processes
        output = filelist[nextProc]+'.out'
        input = filelist[nextProc]+'.in'
        cwd = pathlist[nextProc]
        processes.append(subprocess.Popen(['myProgram','-i',input,'-screen',output],cwd=cwd,bufsize=-1))
        nextProc += 1
        time.sleep(wait)
print 'Jobs Done'
I run this script in a loop, and the problem is that the execution time increases from one step to the next. Here is the graph: http://i62.tinypic.com/2lk8f41.png
The execution time of myProgram itself is constant.
I'd be so glad if someone could explain to me what is causing this leak.
Thanks a lot,
Begbi
