Why is the multiprocessing performance gain invisible? - python

I saw the reference here and tried to use the method for my for loop, but it does not seem to work as expected.
import re
from multiprocessing import Pool

def concatMessage(obj_grab, content):
    for logCatcher in obj_grab:
        for key in logCatcher.dic_map:
            regex = re.compile(key)
            for j in range(len(content)):
                for m in re.finditer(regex, content[j]):
                    content[j] += " " + logCatcher.index + " " + logCatcher.dic_map[key]
    return content

def transferConcat(args):
    return concatMessage(*args)

if __name__ == "__main__":
    pool = Pool()
    content = pool.map(transferConcat, [(obj_grab, content)])[0]
    pool.close()
    pool.join()
I want to improve the performance of the for loop because it takes 22 seconds to run.
When I run the method directly, it also takes about 22 seconds, so the attempted speed-up seems to have failed.
What should I do to speed up my for loop?
Why is pool.map not working in my case?
After the reminder from nablahero, I revised my code as below:
if __name__ == "__main__":
content = input_file(target).split("\n")
content = manager.list(content)
for files in source:
obj_grab.append((LogCatcher(files), content))
pool = Pool()
pool.map(transferConcat, obj_grab)
pool.close()
pool.join()
def concatMessage(LogCatcher, content):
for key in LogCatcher.dic_map:
regex = re.compile(key)
for j in range(len(content)):
for m in re.finditer(regex, content[j]):
content[j] += LogCatcher.index + LogCatcher.dic_map[key]
def transferConcat(args):
return concatMessage(*args)
After a long wait, it took 82 seconds to finish...
Why did I get this result? How can I revise my code?
obj_grab is a list which contains LogCatchers for different input files.
content is the file I want to concatenate onto; I use Manager() so that the processes can work on the same content.

What's in obj_grab and content? I guess each only contains one object, so when you start your Pool you call the function transferConcat only once, because there is only one item in obj_grab and content.
If you use map, have a look at your reference again: obj_grab and content must be lists of objects in order to speed your program up, because map calls the function multiple times with different obj_grabs and contents.
pool.map does not speed up the function itself - the function just gets called multiple times in parallel with different data!
I hope that clears some things up.
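To make that concrete, here is a minimal, self-contained sketch (not the asker's actual LogCatcher code) in which the lines are split into several chunks, so that pool.map has more than one item to hand out; dic_map, tag_chunk and chunk are hypothetical stand-ins:
import re
from multiprocessing import Pool

# hypothetical stand-in for the LogCatcher data: pattern -> text to append
dic_map = {r"ERROR": " [err]", r"WARN": " [warn]"}

def tag_chunk(lines):
    # apply every pattern to every line of this chunk and append its tag
    for j in range(len(lines)):
        for key, tag in dic_map.items():
            if re.search(key, lines[j]):
                lines[j] += tag
    return lines

def chunk(seq, n):
    # split seq into n roughly equal pieces
    size = (len(seq) + n - 1) // n
    return [seq[i:i + size] for i in range(0, len(seq), size)]

if __name__ == "__main__":
    content = ["ERROR disk full", "all good", "WARN low memory"] * 1000
    with Pool(4) as pool:
        # map receives several chunks, so several workers run tag_chunk in parallel
        parts = pool.map(tag_chunk, chunk(content, 4))
    content = [line for part in parts for line in part]  # reassemble in order
    print(content[:3])
Because each worker receives its own chunk and returns it, no Manager list is needed and the parent simply reassembles the results.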


Share variable in concurrent.futures

I am trying to write a word counter with MapReduce using concurrent.futures. I previously made a multithreading version, but it was very slow because the task is CPU bound.
I have done the mapping part, which divides the words into pairs like ['word1', 1], ['word2', 1], ['word1', 1], ['word3', 1] and splits them between the processes, so each process takes care of a part of the text file. The next step ("shuffling") is to put these words into a dictionary so that it looks like this: word1: [1, 1], word2: [1], word3: [1], but I cannot share the dictionary between the processes because we are using multiprocessing instead of multithreading. So how can I make each process add its "1"s to the dictionary shared between all the processes? I'm stuck with this and can't continue.
I am at this point:
import sys
import re
import concurrent.futures
import time

# Read text file
def input(index):
    try:
        reader = open(sys.argv[index], "r", encoding="utf8")
    except OSError:
        print("Error")
        sys.exit()
    texto = reader.read()
    reader.close()
    return texto

# Convert text to list of words
def splitting(input_text):
    input_text = input_text.lower()
    input_text = re.sub('[,.;:!¡?¿()]+', '', input_text)
    words = input_text.split()
    n_processes = 4
    # Creating processes
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = []
        for id_process in range(n_processes):
            results.append(executor.submit(mapping, words, n_processes, id_process))
        for f in concurrent.futures.as_completed(results):
            print(f.result())

def mapping(words, n_processes, id_process):
    word_map_result = []
    for i in range(int((id_process / n_processes) * len(words)),
                   int(((id_process + 1) / n_processes) * len(words))):
        word_map_result.append([words[i], 1])
    return word_map_result

if __name__ == '__main__':
    if len(sys.argv) == 1:
        print("Please, specify a text file...")
        sys.exit()
    start_time = time.time()
    for index in range(1, len(sys.argv)):
        print(sys.argv[index], ":", sep="")
        text = input(index)
        splitting(text)
        # for word in result_dictionary_words:
        #     print(word, ':', result_dictionary_words[word])
    print("--- %s seconds ---" % (time.time() - start_time))
I've seen that when doing concurrent programming it is usually best to avoid shared state as far as possible, so how can I implement a MapReduce word count without sharing the dictionary between processes?
You can create a shared dictionary using a Manager from multiprocessing. I understand from your program that it is your word_map_result you need to share.
You could try something like this:
from multiprocessing import Manager
...
def splitting():
    ...
    word_map_result = Manager().dict()
    with concurrent.futures.....:
        ...
        results.append(executor.submit(mapping, words, n_processes, id_process, word_map_result))
        ...
    ...

def mapping(words, n_processes, id_process, word_map_result):
    for ...
        # Do not return anything - word_map_result is up to date in your main process
Basically you remove the local copy of word_map_result from your mapping function and pass the managed dictionary in as a parameter. This word_map_result is now shared between all your subprocesses and the main program. Managers add data transfer overhead, though, so this might not help you very much.
In this case you do not return anything from the workers, so you do not need the for loop that processes results in your main program either - your word_map_result is identical in all subprocesses and the main program.
I may have misunderstood your problem, and I am not familiar enough with the algorithm to say whether it could be re-engineered so that nothing needs to be shared between processes.
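To make the pattern concrete, here is a small, self-contained sketch along those lines (not the asker's full program): a Manager dict shared by ProcessPoolExecutor workers, with a Manager lock guarding the read-modify-write update. count_words and the sample chunks are made up for illustration, and every access goes through the manager proxy, which is exactly the overhead mentioned above.
import concurrent.futures
from multiprocessing import Manager

def count_words(words, shared_dict, lock):
    # each worker appends a 1 to the list kept under every word it sees
    for word in words:
        with lock:  # the update below is read-modify-write, so serialize it
            shared_dict[word] = shared_dict.get(word, []) + [1]

if __name__ == "__main__":
    manager = Manager()
    shared_dict = manager.dict()
    lock = manager.Lock()
    chunks = [["a", "b", "a"], ["b", "c"], ["a", "c", "c"]]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(count_words, chunk, shared_dict, lock)
                   for chunk in chunks]
        concurrent.futures.wait(futures)
    print(dict(shared_dict))  # e.g. {'a': [1, 1, 1], 'b': [1, 1], 'c': [1, 1, 1]}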
It seems like a misconception to be using multiprocessing at all. First, there is overhead in creating the pool and overhead in passing data to and from the processes. And if you decide to use a shared, managed dictionary that the worker function mapping can use to store its results in, know that a managed dictionary uses a proxy, access to which is rather slow. The alternative to using a managed dictionary would be what you currently have, i.e. mapping returns a list and the main process uses those results to create the keys and values of the dictionary. But what then is the point of mapping returning a list where each element is always a two-element list whose second element is always the constant value 1? Isn't that rather wasteful of time and space?
I think your performance will be no faster (probably slower) than just implementing splitting as:
# Convert text to list of words
def splitting(input_text):
    input_text = input_text.lower()
    input_text = re.sub('[,.;:!¡?¿()]+', '', input_text)
    words = input_text.split()
    results = {}
    for word in words:
        results.setdefault(word, []).append(1)
    return results
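As a side note, if only the final counts matter, the same serial pass can be written with collections.Counter, which skips building the per-word lists of 1s entirely; this is a substitution offered for illustration, not what the answer above proposes:
import re
from collections import Counter

def word_counts(input_text):
    # lowercase, strip punctuation, split, and count in one serial pass
    words = re.sub('[,.;:!¡?¿()]+', '', input_text.lower()).split()
    return Counter(words)

print(word_counts("Word, word; another word!"))  # Counter({'word': 3, 'another': 1})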

How to alternate iterations in multithreading a single function with python

I have a fairly complex Python function that I'm trying to run against around 100 or so different NYSE symbols on the stock market. Right now it takes around 5 minutes to complete. This is not a terrible amount of time, but I'm trying to make it quicker by multithreading. My idea is that, since this is a single function and I'm just passing new parameters on each iteration, it might work to store a list of symbols that have "completed"; then on a new iteration it runs through the list, and if a symbol doesn't exist in the list, it runs the computation. Here's some code I put together:
iteration_count = 0
for index, row in stocklist_df.iterrows():
    # The below just filters input data
    if '-' in row[0]:
        continue
    elif '.' in row[0]:
        continue
    elif '^' in row[0]:
        continue
    elif len(row[0]) > 4:
        continue
    else:
        symbol = row[0]

    # idea is that on first iteration it runs on first thread and appends to threadlist
    # second iteration looks at threadlist and if symbol exists, then skips and goes to the next
    threadlist.append([symbol, iteration_count])
    t1 = threading.Thread(target=get_info(symbol, 0))
    t1.start()
    if iteration_count > 1:
        t2 = threading.Thread(target=get_info(symbol, 0))
        t2.start()
Right now this doesn't appear to be working, and I'm not sure this is the best solution, or maybe I'm implementing it wrong. How can I achieve this task?
I confess to having had some difficulty following your logic. I will just offer up that the usual method of handling threading when you have multiple, similar requests is to use thread pooling. There are several ways offered by Python, such as the ThreadPoolExecutor class in the concurrent.futures module (see the manual for documentation). The following is an example. Here function get_info essentially just returns its argument:
import concurrent.futures

def get_info(symbol):
    return 'answer: ' + symbol

symbols = ['abc', 'def', 'ghi', 'jkl']
NUMBER_THREADS = min(30, len(symbols))
with concurrent.futures.ThreadPoolExecutor(max_workers=NUMBER_THREADS) as executor:
    results = executor.map(get_info, symbols)
    for result in results:
        print(result)
Prints:
answer: abc
answer: def
answer: ghi
answer: jkl
You can play around with the number of threads you create. If you are using, for example, the requests package to retrieve URLs from the same website, then you might wish to create a requests Session object and pass that as an additional argument to get_info:
import concurrent.futures
import requests
import functools

def get_info(session, symbol):
    """
    r = session.get('https://somewebsite.com?symbol=' + symbol)
    return r.text
    """
    return 'symbol answer: ' + symbol

symbols = ['abc', 'def', 'ghi', 'jkl']
NUMBER_THREADS = min(30, len(symbols))
with requests.Session() as session:
    get_info_with_session = functools.partial(get_info, session)  # this will be the first argument
    with concurrent.futures.ThreadPoolExecutor(max_workers=NUMBER_THREADS) as executor:
        results = executor.map(get_info_with_session, symbols)
        for result in results:
            print(result)
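Applied to the original question, the same pattern might look roughly like the sketch below: filter the symbols out of the DataFrame first, then hand the whole list to executor.map. stocklist_df here is a made-up stand-in and get_info is still just a placeholder:
import concurrent.futures
import pandas as pd

def get_info(symbol):
    # placeholder for the real (slow) per-symbol computation
    return 'answer: ' + symbol

# dummy stand-in for the real stock list
stocklist_df = pd.DataFrame({0: ['AAPL', 'BRK.B', 'MSFT', 'ABC-D', 'GOOGL']})

# apply the same filters as the original loop, collecting the surviving symbols
symbols = [row[0] for _, row in stocklist_df.iterrows()
           if '-' not in row[0] and '.' not in row[0]
           and '^' not in row[0] and len(row[0]) <= 4]

with concurrent.futures.ThreadPoolExecutor(max_workers=min(30, len(symbols))) as executor:
    for result in executor.map(get_info, symbols):
        print(result)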

Multiprocessing Running Slower than a Single Process

I'm attempting to use multiprocessing to run many simulations across multiple processes; however, the code I have written only uses 1 of the processes as far as I can tell.
Updated
I've gotten all the processes to work (I think) thanks to @PaulBecotte; however, the multiprocessing version seems to run significantly slower than its non-multiprocessing counterpart.
For instance, not including the function and class declarations/implementations and imports, I have:
def monty_hall_sim(num_trial, player_type='AlwaysSwitchPlayer'):
    if player_type == 'NeverSwitchPlayer':
        player = NeverSwitchPlayer('Never Switch Player')
    else:
        player = AlwaysSwitchPlayer('Always Switch Player')
    return (MontyHallGame().play_game(player) for trial in xrange(num_trial))

def do_work(in_queue, out_queue):
    while True:
        try:
            f, args = in_queue.get()
            ret = f(*args)
            for result in ret:
                out_queue.put(result)
        except:
            break

def main():
    logging.getLogger().setLevel(logging.ERROR)
    always_switch_input_queue = multiprocessing.Queue()
    always_switch_output_queue = multiprocessing.Queue()
    total_sims = 20
    num_processes = 5
    process_sims = total_sims/num_processes
    with Timer(timer_name='Always Switch Timer'):
        for i in xrange(num_processes):
            always_switch_input_queue.put((monty_hall_sim, (process_sims, 'AlwaysSwitchPlayer')))
        procs = [multiprocessing.Process(target=do_work, args=(always_switch_input_queue, always_switch_output_queue)) for i in range(num_processes)]
        for proc in procs:
            proc.start()
        always_switch_res = []
        while len(always_switch_res) != total_sims:
            always_switch_res.append(always_switch_output_queue.get())
        always_switch_success = float(always_switch_res.count(True))/float(len(always_switch_res))
        print '\tLength of Always Switch Result List: {alw_sw_len}'.format(alw_sw_len=len(always_switch_res))
        print '\tThe success average of switching doors was: {alw_sw_prob}'.format(alw_sw_prob=always_switch_success)
which yields:
Time Elapsed: 1.32399988174 seconds
Length: 20
The success average: 0.6
However, I am attempting to use this for total_sims = 10,000,000 over num_processes = 5, and doing so has taken significantly longer than using 1 process (1 process returned in ~3 minutes). The non-multiprocessing counterpart I'm comparing it to is:
def main():
    logging.getLogger().setLevel(logging.ERROR)
    with Timer(timer_name='Always Switch Monty Hall Timer'):
        always_switch_res = [MontyHallGame().play_game(AlwaysSwitchPlayer('Monty Hall')) for x in xrange(10000000)]
        always_switch_success = float(always_switch_res.count(True))/float(len(always_switch_res))
        print '\n\tThe success average of not switching doors was: {not_switching}' \
              '\n\tThe success average of switching doors was: {switching}'.format(not_switching=never_switch_success,
                                                                                   switching=always_switch_success)
You could try importing Process under some if statements.
EDIT- you changed some stuff, let me try and explain a bit better.
Each message you put into the input queue will cause the monty_hall_sim function to get called and send num_trial messages to the output queue.
So your original implementation was right- to get 20 output messages, send in 5 input messages.
However, your function is slightly wrong.
for trial in xrange(num_trial):
    res = MontyHallGame().play_game(player)
    yield res
This will turn the function into a generator that will provide a new value on each next() call- great! The problem is here
while True:
    try:
        f, args = in_queue.get(timeout=1)
        ret = f(*args)
        out_queue.put(ret.next())
    except:
        break
Here, on each pass through the loop you create a NEW generator with a NEW message. The old one is thrown away. So here, each input message only adds a single output message to the queue before you throw it away and get another one. The correct way to write this is-
while True:
    try:
        f, args = in_queue.get(timeout=1)
        ret = f(*args)
        for result in ret:
            out_queue.put(result)
    except:
        break
Doing it this way will continue to yield output messages from the generator until it finishes (after yielding 4 messages in this case)
I was able to get my code to run significantly faster by changing monty_hall_sim's return to a list comprehension, having do_work add the lists to the output queue, and then extend the results list of main with the lists returned by the output queue. Made it run in ~13 seconds.
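A rough sketch of that final arrangement, with a random stub standing in for MontyHallGame and the player classes: monty_hall_sim returns a whole list, do_work puts one list per task on the output queue, and the main process extends its result list once per task.
import random
import multiprocessing

def play_game():
    # stub standing in for MontyHallGame().play_game(player): True roughly 2/3 of the time
    return random.random() < 2.0 / 3.0

def monty_hall_sim(num_trial):
    # return a plain list instead of a generator
    return [play_game() for _ in range(num_trial)]

def do_work(in_queue, out_queue):
    while True:
        try:
            f, args = in_queue.get(timeout=1)
            out_queue.put(f(*args))  # one whole list per input message
        except Exception:
            break

if __name__ == '__main__':
    total_sims, num_processes = 1000000, 5
    in_q, out_q = multiprocessing.Queue(), multiprocessing.Queue()
    for _ in range(num_processes):
        in_q.put((monty_hall_sim, (total_sims // num_processes,)))
    procs = [multiprocessing.Process(target=do_work, args=(in_q, out_q))
             for _ in range(num_processes)]
    for p in procs:
        p.start()
    results = []
    for _ in range(num_processes):
        results.extend(out_q.get())  # one extend per worker task
    for p in procs:
        p.join()
    print(float(results.count(True)) / len(results))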

Multiprocessing script gets stuck

I have the following Python code:
def workPackage(filelist, datacolnames, outputnames, identifier, queue):
    try:
        outputdata = dict()
        iterator = 1
        for name in outputnames:
            outputdata[name] = []
        for filename in filelist:
            read_data = np.genfromtxt(filename, comments="#", unpack=True, names=datacolnames, delimiter=";")
            mean_val1 = np.mean(read_data["val1"])
            mean_val2 = np.mean(read_data["val2"])
            outputdata[outputnames[0]].append(read_data["setpoint"][0])
            outputdata[outputnames[1]].append(mean_val1)
            outputdata[outputnames[2]].append(mean_val2)
            outputdata[outputnames[3]].append(mean_val1-mean_val2)
            outputdata[outputnames[4]].append((mean_val1-mean_val2)/read_data["setpoint"][0]*100)
            outputdata[outputnames[5]].append(2*np.std(read_data["val1"]))
            outputdata[outputnames[6]].append(2*np.std(read_data["val2"]))
            print("Process "+str(identifier+1)+": "+str(round(100*(iterator/len(filelist)),1))+"% complete")
            iterator = iterator+1
        queue.put(outputdata)
    except:
        print("some message")

if __name__ == '__main__':
    "Main script"
This code is used to evaluate a large amount of measurement data. In total I have some 900 files across multiple directories (about 13 GB in total).
The main script determines all the file paths and stores them in 4 chunks. Each chunk (a list of file paths) is given to one process.
try:
    print("Distributing the workload on "+str(numberOfProcesses)+" processes...")
    for i in range(0, numberOfProcesses):
        q[i] = multiprocessing.Queue()
        Processes[i] = multiprocessing.Process(target=workPackage, args=(filelistChunks[i], colnames, outputdatanames, i, q[i]))
        Processes[i].start()
    for i in range(0, numberOfProcesses):
        Processes[i].join()
except:
    print("Exception while processing stuff...")
After that, the results are read from the queues and stored to an output file.
Now here's my problem:
The script starts the 4 processes and each of them runs to 100% (see the print in the workPackage function). They don't finish at the same time, but all within about 2 minutes.
But then the script simply stops.
If I limit the amount of data to process by simply cutting the filelist, it sometimes runs to the end, but sometimes it doesn't.
I don't get why the script simply gets stuck after all processes reach 100%.
I seriously don't know what's happening there.
You add items to the queue with queue.put(), then call queue.join(), but I don't see where you call queue.get() or queue.task_done(). Join won't release the thread until the queue is empty and task_done() has been called on each item.
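Here is a small, self-contained sketch of that fix with toy stand-ins for the real worker and file chunks: every queue is drained first, and the processes are joined only afterwards, because a child that still has un-fetched data sitting in a Queue will not terminate.
import multiprocessing

def workPackage(filelist, queue):
    # toy stand-in for the real worker: put one result dict per process
    queue.put({"n_files": len(filelist)})

if __name__ == '__main__':
    chunks = [["a.csv"], ["b.csv", "c.csv"], ["d.csv"], ["e.csv"]]
    queues, procs = [], []
    for chunk in chunks:
        q = multiprocessing.Queue()
        p = multiprocessing.Process(target=workPackage, args=(chunk, q))
        p.start()
        queues.append(q)
        procs.append(p)
    # drain every queue first; a child with un-fetched Queue data will not exit
    results = [q.get() for q in queues]
    # only now join the workers
    for p in procs:
        p.join()
    print(results)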

python parallel calculations in a while loop

I have been looking around for some time, but haven't had luck finding an example that could solve my problem. I have added an example from my code. As one can see, this is slow and the two functions could be run separately.
My aim is to print the latest parameter values every second, while the slow processes are calculated in the background. The latest value is shown, and whenever either process finishes, its value is updated.
Can anybody recommend a better way to do it? An example would be really helpful.
Thanks a lot.
import time

def ProcessA(parA):
    # imitate slow process
    time.sleep(5)
    parA += 2
    return parA

def ProcessB(parB):
    # imitate slow process
    time.sleep(10)
    parB += 5
    return parB

# start from here
i, parA, parB = 1, 0, 0
while True:  # endless loop
    print(i)
    print(parA)
    print(parB)
    time.sleep(1)
    i += 1
    # update parameter A
    parA = ProcessA(parA)
    # update parameter B
    parB = ProcessB(parB)
I imagine this should do it for you. This has the benefit that you can add extra parallel functions, up to a total equal to the number of cores you have. Edits are welcome.
# import time module
import time
# import the appropriate multiprocessing functions
from multiprocessing import Pool

# define your functions
# whatever your slow function is
def slowFunction(x):
    return someFunction(x)

# printingFunction: keep printing the current value until the async result is ready
def printingFunction(new, current, timeDelay):
    while not new.ready():
        print(current)
        time.sleep(timeDelay)

# set the initial value that will be printed.
# Depending on your function this may take some time.
CurrentValue = slowFunction(someTemporallyDynamicVariable)

# establish your pool
pool = Pool()
while True:  # endless loop
    # an asynchronous call; slowFunction continues to run
    # in the background while the printing operates.
    NewValue = pool.apply_async(slowFunction, (someTemporallyDynamicVariable,))
    printingFunction(NewValue, CurrentValue, 1)
    CurrentValue = NewValue.get()
# close your pool
pool.close()
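Applied to the original ProcessA/ProcessB example, one possible arrangement (a sketch, not the answer author's code) keeps one outstanding apply_async job per parameter, prints the latest values every second, and restarts a job whenever its result is ready:
import time
from multiprocessing import Pool

def ProcessA(parA):
    time.sleep(5)   # imitate slow process
    return parA + 2

def ProcessB(parB):
    time.sleep(10)  # imitate slow process
    return parB + 5

if __name__ == "__main__":
    i, parA, parB = 1, 0, 0
    pool = Pool(2)  # one worker per slow function
    jobA = pool.apply_async(ProcessA, (parA,))
    jobB = pool.apply_async(ProcessB, (parB,))
    while True:  # endless loop
        print(i, parA, parB)
        time.sleep(1)
        i += 1
        if jobA.ready():  # ProcessA finished: take its value and restart it
            parA = jobA.get()
            jobA = pool.apply_async(ProcessA, (parA,))
        if jobB.ready():  # same for ProcessB
            parB = jobB.get()
            jobB = pool.apply_async(ProcessB, (parB,))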
