multiprocessing is slower than thread in python - python

I have tested a multiprocess and thread in python, but multiprocess is slower than thread, and I calculate a distance using editdistance, my code like:
def calc_dist(kw, trie_word):
dists = []
while len(trie_word) != 0:
w = trie_word.pop()
dist = editdistance.eval(kw, w)
dists.append((w, dist))
return dists
if __name__ == "__main__":
word_list = [str(i) for i in range(1, 10000001)]
key_word = '2'
print("calc")
s = time.time()
with Pool(processes=4) as pool:
result = pool.apply_async(calc_dist, (key_word, word_list))
print(len(result.get()))
print("用时",time.time()-s)
Using threading:
class DistThread(threading.Thread):
def __init__(self, func, args):
super(DistThread, self).__init__()
self.func = func
self.args = args
self.dists = None
def run(self):
self.dists = self.func(*self.args)
def join(self):
super().join(self)
return self.dists
In my computer, it consumes about 118s, but thread takes about 36s, where is wrong with it?

a couple of issues:
a significant amount of time will be spent serialising the data so it can be sent to the other process while threads share the same address space so pointers can be used
your current code is only using one process to do all the calcs with multiprocessing. you need to seperate your array into "chunks" somehow so that it can be processed via multiple workers
e.g:
import time
from multiprocessing import Pool
import editdistance
def calc_one(trie_word):
return editdistance.eval(key_word, trie_word)
if __name__ == "__main__":
word_list = [str(i) for i in range(1, 10000001)]
key_word = '2'
print("calc")
s = time.time()
with Pool(processes=4) as pool:
result = pool.map(calc_one, word_list, chunksize=10000)
print(len(result))
print("time",time.time()-s)
s = time.time()
result = list(calc_one(w) for w in word_list)
print(len(result))
print("time",time.time()-s)
this relies on key_word being a global variable. for me, the version using multiple processes takes ~5.3 seconds while the second version takes ~16.9 secs. not 4 times as quick as the data still needs to be sent back and forth, but pretty good

I had a similar experience with threading and multi processing inside Python to consume CSVS that had a large amount of data. I had a small look into this and found that processing spawns multiple processes to perform tasks which can be slower than just running one threaded process since threading runs in one place. There is a more definitive answer here: Multiprocessing vs Threading Python.
Pasting answer from link incase link disappears;
The threading module uses threads, the multiprocessing module uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock is for.
Spawning processes is a bit slower than spawning threads. Once they are running, there is not much difference.

Related

Multiprocessing is not executing parallel in Python

I have edited the code , currently it is working fine . But thinks it is not executing parallely or dynamically . Can anyone please check on to it
Code :
def folderStatistic(t):
j, dir_name = t
row = []
for content in dir_name.split(","):
row.append(content)
print(row)
def get_directories():
import csv
with open('CONFIG.csv', 'r') as file:
reader = csv.reader(file,delimiter = '\t')
return [col for row in reader for col in row]
def folderstatsMain():
freeze_support()
start = time.time()
pool = Pool()
worker = partial(folderStatistic)
pool.map(worker, enumerate(get_directories()))
def datatobechecked():
try:
folderstatsMain()
except Exception as e:
# pass
print(e)
if __name__ == '__main__':
datatobechecked()
Config.CSV
C:\USERS, .CSV
C:\WINDOWS , .PDF
etc.
There may be around 200 folder paths in config.csv
welcome to StackOverflow and Python programming world!
Moving on to the question.
Inside the get_directories() function you open the file in with context, get the reader object and close the file immediately after the moment you leave the context so when the time comes to use the reader object the file is already closed.
I don't want to discourage you, but if you are very new to programming do not dive into parallel programing yet. Difficulty in handling multiple threads simultaneously grows exponentially with every thread you add (pools greatly simplify this process though). Processes are even worse as they don't share memory and can't communicate with each other easily.
My advice is, try to write it as a single-thread program first. If you have it working and still need to parallelize it, isolate a single function with input file path as a parameter that does all the work and then use thread/process pool on that function.
EDIT:
From what I can understand from your code, you get directory names from the CSV file and then for each "cell" in the file you run parallel folderStatistics. This part seems correct. The problem may lay in dir_name.split(","), notice that you pass individual "cells" to the folderStatistics not rows. What makes you think it's not running paralelly?.
There is a certain amount of overhead in creating a multiprocessing pool because creating processes is, unlike creating threads, a fairly costly operation. Then those submitted tasks, represented by each element of the iterable being passed to the map method, are gathered up in "chunks" and written to a multiprocessing queue of tasks that are read by the pool processes. This data has to move from one address space to another and that has a cost associated with it. Finally when your worker function, folderStatistic, returns its result (which is None in this case), that data has to be moved from one process's address space back to the main process's address space and that too has a cost associated with it.
All of those added costs become worthwhile when your worker function is sufficiently CPU-intensive such that these additional costs is small compared to the savings gained by having the tasks run in parallel. But your worker function's CPU requirements are so small as to reap any benefit from multiprocessing.
Here is a demo comparing single-processing time vs. multiprocessing times for invoking a worker function, fn, twice where the first time it only performs its internal loop 10 times (low CPU requirements) while the second time it performs its internal loop 1,000,000 times (higher CPU requirements). You can see that in the first case the multiprocessing version runs considerable slower (you can't even measure the time for the single processing run). But when we make fn more CPU-intensive, then multiprocessing achieves gains over the single-processing case.
from multiprocessing import Pool
from functools import partial
import time
def fn(iterations, x):
the_sum = x
for _ in range(iterations):
the_sum += x
return the_sum
# required for Windows:
if __name__ == '__main__':
for n_iterations in (10, 1_000_000):
# single processing time:
t1 = time.time()
for x in range(1, 20):
fn(n_iterations, x)
t2 = time.time()
# multiprocessing time:
worker = partial(fn, n_iterations)
t3 = time.time()
with Pool() as p:
results = p.map(worker, range(1, 20))
t4 = time.time()
print(f'#iterations = {n_iterations}, single processing time = {t2 - t1}, multiprocessing time = {t4 - t3}')
Prints:
#iterations = 10, single processing time = 0.0, multiprocessing time = 0.35399389266967773
#iterations = 1000000, single processing time = 1.182999849319458, multiprocessing time = 0.5530076026916504
But even with a pool size of 8, the running time is not reduced by a factor of 8 (it's more like a factor of 2) due to the fixed multiprocessing overhead. When I change the number of iterations for the second case to be 100,000,000 (even more CPU-intensive), we get ...
#iterations = 100000000, single processing time = 109.3077495098114, multiprocessing time = 27.202054023742676
... which is a reduction in running time by a factor of 4 (I have many other processes running in my computer, so there is competition for the CPU).

Threads is not executing in parallel python with ThreadPoolExecutor

I'm new in python threading and I'm experimenting this:
When I run something in threads (whenever I print outputs), it never seems to be running in parallel. Also, my functions take the same time that before using the library concurrent.futures (ThreadPoolExecutor).
I have to calculate the gains of some attributes over a dataset (I cannot use libraries). Since I have about 1024 attributes and the function was taking about a minute to execute (and I have to use it in a for iteration) I dicided to split the array of attributes into 10 (just as an example) and run the separete function gain(attribute) separetly for each sub array. So I did the following (avoiding some extra unnecessary code):
def calculate_gains(self):
splited_attributes = np.array_split(self.attributes, 10)
result = {}
for atts in splited_attributes:
with concurrent.futures.ThreadPoolExecutor() as executor:
future = executor.submit(self.calculate_gains_helper, atts)
return_value = future.result()
self.gains = {**self.gains, **return_value}
Here's the calculate_gains_helper:
def calculate_gains_helper(self, attributes):
inter_result = {}
for attribute in attributes:
inter_result[attribute] = self.gain(attribute)
return inter_result
Am I doing something wrong? I read some other older posts but I couldn't get any info.
Thanks a lot for any help!
Python threads do not run in parallel (at least in CPython implementation) because of the GIL. Use processes and ProcessPoolExecutor to really have parallelism
with concurrent.futures.ProcessPoolExecutor() as executor:
...
You submit and then wait for each work item serially so all the threads do is slow everything down. I can't guarantee this will speed things up much because you are still dealing with the python GIL that keeps python level stuff from working in parallel, but here goes.
I've created a thread pool and pushed everything possible into the worker, including the slicing of self.attributes.
def calculate_gains(self):
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
result_list = executor.map(self.calculate_gains_helper,
((i, i+10) for i in range(0, len(self.attributes), 10)))
for return_value in result_list:
self.gains = {**self.gains, **return_value}
def calculate_gains_helper(self, start_end):
start, end = start_end
inter_result = {}
for attribute in self.attributes[start:end]:
inter_result[attribute] = self.gain(attribute)
return inter_result

parallel request using multiprocessing.dummy

I trying run parellel get requests using multiprocessing.dummy with report by progress.
from multiprocessing.dummy import Pool
from functools import partial
class Test(object):
def __init__(self):
self.count = 0
self.threads = 10
def callback(self, total, x):
self.count += 1
if self.count%100==0:
print("Working ({}/{}) cases processed.".format(self.count, total))
def do_async(self):
thread_pool = Pool(self.threads)#self.threads
input_list = link
callback = partial(self.callback, len(link))
tasks = [thread_pool.apply_async(get_data, (x,), callback=callback) for x in input_list]
return (task.get() for task in tasks)
start = time.time()
t = Test()
results = t.do_async()
end = time.time()`
the result of the operation - the same time as the non-parallel requests
CPython is inherently single-threaded due to something called the Global Interpreter Lock (GIL). This means only one thread can run at a time, even if there are multiple CPU cores available. multiprocessing.dummy is just a wrapper for using threads, so this is why you are not getting a speed up.
To get the benefit of having multiple CPUs, you must use multiprocessing itself. However, there are overheads based on the cost of sending and receiving the input and output data of the sub-process. If the cost of this is greater than the amount of work done by the sub-process then using multiprocessing can actually slow your program down. So in your example, multiprocessing would likely not give you a speed increase. This is especially true as most of the work in the callback involves printing to standard out, which all the processes in the pool must synchronise over to prevent garbage being printed out.
i found solution in concurrent.futures:
import concurrent.futures as futures
import datetime
import sys
results=[]
print("start", datetime.datetime.now().isoformat())
start =time.time()
with futures.ThreadPoolExecutor(max_workers=100) as executor:
fs = [executor.submit(get_data, url) for url in link]
for i, f in enumerate(futures.as_completed(fs)):
results.append(f.result())
if i%100==0:
sys.stdout.write("line nr: {} / {} \r".format(i, len(link)))

Python multiprocessing - Is it possible to introduce a fixed time delay between individual processes?

I have searched and cannot find an answer to this question elsewhere. Hopefully I haven't missed something.
I am trying to use Python multiprocessing to essentially batch run some proprietary models in parallel. I have, say, 200 simulations, and I want to batch run them ~10-20 at a time. My problem is that the proprietary software crashes if two models happen to start at the same / similar time. I need to introduce a delay between processes spawned by multiprocessing so that each new model run waits a little bit before starting.
So far, my solution has been to introduced a random time delay at the start of the child process before it fires off the model run. However, this only reduces the probability of any two runs starting at the same time, and therefore I still run into problems when trying to process a large number of models. I therefore think that the time delay needs to be built into the multiprocessing part of the code but I haven't been able to find any documentation or examples of this.
Edit: I am using Python 2.7
This is my code so far:
from time import sleep
import numpy as np
import subprocess
import multiprocessing
def runmodels(arg):
sleep(np.random.rand(1,1)*120) # this is my interim solution to reduce the probability that any two runs start at the same time, but it isn't a guaranteed solution
subprocess.call(arg) # this line actually fires off the model run
if __name__ == '__main__':
arguments = [big list of runs in here
]
count = 12
pool = multiprocessing.Pool(processes = count)
r = pool.imap_unordered(runmodels, arguments)
pool.close()
pool.join()
multiprocessing.Pool() already limits number of processes running concurrently.
You could use a lock, to separate the starting time of the processes (not tested):
import threading
import multiprocessing
def init(lock):
global starting
starting = lock
def run_model(arg):
starting.acquire() # no other process can get it until it is released
threading.Timer(1, starting.release).start() # release in a second
# ... start your simulation here
if __name__=="__main__":
arguments = ...
pool = Pool(processes=12,
initializer=init, initargs=[multiprocessing.Lock()])
for _ in pool.imap_unordered(run_model, arguments):
pass
One way to do this with thread and semaphore :
from time import sleep
import subprocess
import threading
def runmodels(arg):
subprocess.call(arg)
sGlobal.release() # release for next launch
if __name__ == '__main__':
threads = []
global sGlobal
sGlobal = threading.Semaphore(12) #Semaphore for max 12 Thread
arguments = [big list of runs in here
]
for arg in arguments :
sGlobal.acquire() # Block if more than 12 thread
t = threading.Thread(target=runmodels, args=(arg,))
threads.append(t)
t.start()
sleep(1)
for t in threads :
t.join()
The answer suggested by jfs caused problems for me as a result of starting a new thread with threading.Timer. If the worker just so happens to finish before the timer does, the timer is killed and the lock is never released.
I propose an alternative route, in which each successive worker will wait until enough time has passed since the start of the previous one. This seems to have the same desired effect, but without having to rely on another child process.
import multiprocessing as mp
import time
def init(shared_val):
global start_time
start_time = shared_val
def run_model(arg):
with start_time.get_lock():
wait_time = max(0, start_time.value - time.time())
time.sleep(wait_time)
start_time.value = time.time() + 1.0 # Specify interval here
# ... start your simulation here
if __name__=="__main__":
arguments = ...
pool = mp.Pool(processes=12,
initializer=init, initargs=[mp.Value('d')])
for _ in pool.imap_unordered(run_model, arguments):
pass

Multiprocessing with python3 only runs once

I have a problem running multiple processes in python3 .
My program does the following:
1. Takes entries from an sqllite database and passes them to an input_queue
2. Create multiple processes that take items off the input_queue, run it through a function and output the result to the output queue.
3. Create a thread that takes items off the output_queue and prints them (This thread is obviously started before the first 2 steps)
My problem is that currently the 'function' in step 2 is only run as many times as the number of processes set, so for example if you set the number of processes to 8, it only runs 8 times then stops. I assumed it would keep running until it took all items off the input_queue.
Do I need to rewrite the function that takes the entries out of the database (step 1) into another process and then pass its output queue as an input queue for step 2?
Edit:
Here is an example of the code, I used a list of numbers as a substitute for the database entries as it still performs the same way. I have 300 items on the list and I would like it to process all 300 items, but at the moment it just processes 10 (the number of processes I have assigned)
#!/usr/bin/python3
from multiprocessing import Process,Queue
import multiprocessing
from threading import Thread
## This is the class that would be passed to the multi_processing function
class Processor:
def __init__(self,out_queue):
self.out_queue = out_queue
def __call__(self,in_queue):
data_entry = in_queue.get()
result = data_entry*2
self.out_queue.put(result)
#Performs the multiprocessing
def perform_distributed_processing(dbList,threads,processor_factory,output_queue):
input_queue = Queue()
# Create the Data processors.
for i in range(threads):
processor = processor_factory(output_queue)
data_proc = Process(target = processor,
args = (input_queue,))
data_proc.start()
# Push entries to the queue.
for entry in dbList:
input_queue.put(entry)
# Push stop markers to the queue, one for each thread.
for i in range(threads):
input_queue.put(None)
data_proc.join()
output_queue.put(None)
if __name__ == '__main__':
output_results = Queue()
def output_results_reader(queue):
while True:
item = queue.get()
if item is None:
break
print(item)
# Establish results collecting thread.
results_process = Thread(target = output_results_reader,args = (output_results,))
results_process.start()
# Use this as a substitute for the database in the example
dbList = [i for i in range(300)]
# Perform multi processing
perform_distributed_processing(dbList,10,Processor,output_results)
# Wait for it all to finish.
results_process.join()
A collection of processes that service an input queue and write to an output queue is pretty much the definition of a process pool.
If you want to know how to build one from scratch, the best way to learn is to look at the source code for multiprocessing.Pool, which is pretty simply Python, and very nicely written. But, as you might expect, you can just use multiprocessing.Pool instead of re-implementing it. The examples in the docs are very nice.
But really, you could make this even simpler by using an executor instead of a pool. It's hard to explain the difference (again, read the docs for both modules), but basically, a future is a "smart" result object, which means instead of a pool with a variety of different ways to run jobs and get results, you just need a dumb thing that doesn't know how to do anything but return futures. (Of course in the most trivial cases, the code looks almost identical either way…)
from concurrent.futures import ProcessPoolExecutor
def Processor(data_entry):
return data_entry*2
def perform_distributed_processing(dbList, threads, processor_factory):
with ProcessPoolExecutor(processes=threads) as executor:
yield from executor.map(processor_factory, dbList)
if __name__ == '__main__':
# Use this as a substitute for the database in the example
dbList = [i for i in range(300)]
for result in perform_distributed_processing(dbList, 8, Processor):
print(result)
Or, if you want to handle them as they come instead of in order:
def perform_distributed_processing(dbList, threads, processor_factory):
with ProcessPoolExecutor(processes=threads) as executor:
fs = (executor.submit(processor_factory, db) for db in dbList)
yield from map(Future.result, as_completed(fs))
Notice that I also replaced your in-process queue and thread, because it wasn't doing anything but providing a way to interleave "wait for the next result" and "process the most recent result", and yield (or yield from, in this case) does that without all the complexity, overhead, and potential for getting things wrong.
Don't try to rewrite the whole multiprocessing library again. I think you can use any of multiprocessing.Pool methods depending on your needs - if this is a batch job you can even use the synchronous multiprocessing.Pool.map() - only instead of pushing to input queue, you need to write a generator that yields input to the threads.

Categories

Resources