I want to benchmark multiprocessing against sequential processing of a file.
Basically, the file (100 lines) is read line by line, and the first character of each line is added to a list if it is not already in it.
import multiprocessing as mp
import sys
import time
database_layout=[]
def get_first_characters(string):
    global database_layout
    if string[0:1] not in database_layout:
        database_layout.append(string[0:1])
if __name__ == '__main__':
    start_time = time.time()
    bestand_first_read=open('random.txt','r', encoding="latin-1")
    for line in bestand_first_read:
        p = mp.Process(target=get_first_characters, args=(line,))
        p.start()
    print(str(len(database_layout)))
    print("Finished part one: "+ str(time.time() - start_time))
    bestand_first_read.close()
    ###Part two
    database_layout_two=[]
    start_time = time.time()
    bestand_first_read_two=open('random.txt','r', encoding="latin-1")
    for linetwo in bestand_first_read_two:
        if linetwo[0:1] not in database_layout_two:
            database_layout_two.append(linetwo[0:1])
    print(str(len(database_layout_two)))
    print("Finished: part two"+ str(time.time() - start_time))
But when I execute this program, I get the following result:
python test.py
0
Finished part one: 17.105965852737427
10
Finished part two: 0.0
This raises two questions.
1) Why does the multiprocessing version take so much longer (+/- 17 sec) than the sequential version (+/- 0 sec)?
2) Why does the list 'database_layout' not get filled? (It is the same code.)
EDIT
A similar example that works with Pools.
import multiprocessing as mp
import timeit
def get_first_characters(string):
    return string
if __name__ == '__main__':
    database_layout=[]
    start = timeit.default_timer()
    nr = 0
    with mp.Pool(processes=4) as pool:
        for i in range(99999):
            nr += 1
            database_layout.append(pool.starmap(get_first_characters, [(str(i),)]))
    stop = timeit.default_timer()
    print("Pools: %s " % (stop - start))
    database_layout=[]
    start = timeit.default_timer()
    for i in range(99999):
        database_layout.append(get_first_characters(str(i)))
    stop = timeit.default_timer()
    print("Regular: %s " % (stop - start))
After running the above example, the following output is shown.
Pools: 22.058468394726148
Regular: 0.051738489109649066
This shows that in such a case working with Pools is roughly 430 times slower than sequential processing. Any clue why this is?
Your multiprocessing code starts one process for each line of your input. That means you pay the overhead of starting a new Python interpreter for each line of your (possibly very long) file, which accounts for the long time it takes to go through the file.
However, there are other issues with your code. While there is no synchronisation issue due to fighting for the file (since all reads are done in the main process, where the line iteration is going on), you have misunderstood how multiprocessing works.
First of all, your global variable is not global across processes. Unlike threads, processes don't usually share memory, so you have to use some interface to share objects (which is why shared objects must be picklable). When your code starts each process, each interpreter instance begins by loading your file, which creates a new database_layout variable. Because of that, each interpreter starts with an empty list and ends with a single-element list. To actually share the list, you might want to use a Manager (also see how to share state in the docs).
Also because of the huge overhead of opening new interpreters, your script performance may benefit from using a pool of workers, since this will open just a few processes for sharing the work. Remember that resource contention will impact performance if opening more processes than you have CPU cores.
The second problem, besides the issue of sharing your variable, is that your code does not wait for the processing to finish. Hence, even if the state was shared, your processing might not have finished when you check the length of database_layout. Again, using a pool might help with that.
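The two points above (shared state and waiting for completion) can be sketched together. This is a minimal illustration, not a recommended design: it still starts one process per line, so it stays slow. The helper names record_first_char and collect_first_chars and the sample data are made up for this example.

```python
import multiprocessing as mp


def record_first_char(line, shared, lock):
    """Append the first character of `line` to `shared` if it is new."""
    first = line[:1]
    # The membership test and the append must happen as one step,
    # otherwise two processes could both add the same character.
    with lock:
        if first and first not in shared:
            shared.append(first)


def collect_first_chars(lines):
    """Run one process per line (slow, for illustration) and return the result."""
    with mp.Manager() as manager:
        shared = manager.list()   # proxy to a list that lives in the manager process
        lock = manager.Lock()
        workers = [
            mp.Process(target=record_first_char, args=(line, shared, lock))
            for line in lines
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()         # wait for every process before reading the list
        return list(shared)


if __name__ == "__main__":
    sample = ["apple", "avocado", "banana", "cherry", "blueberry"]
    print(sorted(collect_first_chars(sample)))
```

Without the join() calls, the parent could read the list before the workers have written to it, which is the second problem described above.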
PS: unless you want to preserve the insertion order, you might get even faster by using a set, though I'm not sure the Manager supports it.
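If insertion order does not matter, one way to sidestep the question of whether a Manager supports sets is to let the workers return values and build the set in the parent. The sketch below assumes that approach; first_char, unique_first_chars and the chunksize value are illustrative choices, not taken from the question.

```python
import multiprocessing as mp


def first_char(line):
    """Return the first character of a line ('' for an empty line)."""
    return line[:1]


def unique_first_chars(lines, processes=4):
    """Collect the distinct first characters using a small pool of workers."""
    # A single map() call hands the whole iterable to the pool. The results
    # come back to the parent, so no shared state is needed.
    with mp.Pool(processes=processes) as pool:
        firsts = pool.map(first_char, lines, chunksize=256)
    return set(firsts) - {""}


if __name__ == "__main__":
    lines = ["apple\n", "avocado\n", "banana\n", "cherry\n"]
    print(sorted(unique_first_chars(lines)))
```

For a function this cheap the pool is still likely to lose against a plain loop; the point is only that the set can live in the parent process.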
EDIT after the OP EDIT: Your pool code still calls into the pool once for each line (or number). As you did before, you still have much of your processing in the main process, just looping and passing arguments to the other processes. Besides, you're still running each element in the pool individually and appending to the list, which pretty much uses only one worker process at a time (remember that map and starmap block until the work finishes). This is from Process Explorer running your code:
Note how the main process is still doing all the hard work (22% in a quad-core machine means its CPU is maxed). What you need to do is pass the iterable to map() in a single call, minimizing the work (especially switching between Python and the C side):
import multiprocessing as mp
import timeit
def get_first_characters(number):
    return str(number)[0]
if __name__ == '__main__':
    start = timeit.default_timer()
    with mp.Pool(processes=4) as pool:
        database_layout1 = (pool.map(get_first_characters, range(99999)))
    stop = timeit.default_timer()
    print("Pools: %s " % (stop - start))
    database_layout2=[]
    start = timeit.default_timer()
    for i in range(99999):
        database_layout2.append(get_first_characters(str(i)))
    stop = timeit.default_timer()
    print("Regular: %s " % (stop - start))
    assert database_layout1 == database_layout2
This got me from this:
Pools: 14.169268206710512
Regular: 0.056271265139002935
To this:
Pools: 0.35610273658926417
Regular: 0.07681461930314981
It's still slower than the single-processing one, but that's mainly because of the message-passing overhead for a very simple function. If your function is more complex it'll make more sense.
Related
I'm developing a program that involves computing similarity scores for around 480 pairs of images (20 directories with around 24 images in each). I'm utilizing the sentence_transformers Python module for image comparison, and it takes around 0.1 - 0.2 seconds on my Windows 11 machine to compare two images when running in serial, but for some reason, that time gets increased to between 1.5 and 3.0 seconds when running in parallel using a process Pool. So, either a), there's something going on behind the scenes that I'm not yet aware of, or b) I just did it wrong.
Here's a rough structure of the image comparison function:
def compare_images(image_one, image_two, clip_model):
    start = time()
    images = [image_one, image_two]
    # clip_model is set to SentenceTransformer('clip-ViT-B-32') elsewhere in the code
    encoded_images = clip_model.encode(images, batch_size = 2, convert_to_tensor = True, show_progress_bar = False)
    processed_images = util.paraphrase_mining_embeddings(encoded_images)
    stop = time()
    print("Comparison time: %f" % (stop - start) )
    score, image_id1, image_id2 = processed_images[0]
    return score
Here's a rough structure of the serial version of the code to compare every image:
def compare_all_images(candidate_image, directory, clip_model):
    for dir_entry in os.scandir(directory):
        dir_image_path = dir_entry.path
        dir_image = Image.open(dir_image_path)
        similarity_score = compare_images(candidate_image, dir_image, clip_model)
        # ... code to determine whether this is the maximum score the program has seen...
Here is a rough structure of the parallel version:
def compare_all_images(candidate_image, directory, clip_model):
    pool_results = dict()
    pool = Pool()
    for dir_entry in os.scandir(directory):
        dir_image_path = dir_entry.path
        dir_image = Image.open(dir_image_path)
        pool_results[dir_image_path] = pool.apply_async(compare_images, args = (candidate_image, dir_image, clip_model))
    # Added everything to the pool, close it and wait for everything to finish
    pool.close()
    pool.join()
    # ... remaining code to determine which image has the highest similarity rating
I'm not sure where I might be erring.
The interesting thing here is that I also developed a smaller program to verify whether I was doing things correctly:
def func():
    sleep(6)
def main():
    pool = Pool()
    for i in range(20):
        pool.apply_async(func)
    pool.close()
    start = time()
    pool.join()
    stop = time()
    print("Time: %f" % (stop - start) ) # This gave an average of 12 seconds
                                        # across multiple runs on my Windows 11
                                        # machine, on which multiprocessing.cpu_count=12
Is this a problem with trying to make things parallel with sentence transformers, or does the problem lie elsewhere?
UPDATE: Now I'm especially confused. I'm now only passing str objects to the comparison function and have temporarily slapped a return 0 as the very first line in the function to see if I can further isolate the issue. Oddly, even though the parallel function is doing absolutely nothing now, several seconds (usually around 5) still seem to pass between the time that the pool is closed and the time that pool.join() finishes. Any thoughts?
UPDATE 2: I've done some more playing around, and have found out that an empty pool still has some overhead. This is the code I'm testing out currently:
# ...
pool = Pool()
pool.close()
start = time()
DebuggingUtilities.debug("empty pool closed, doing a join on the empty pool to see if directory traversal is messing things up")
pool.join()
stop = time()
DebuggingUtilities.debug("Empty pool join time: %f" % (stop - start) )
This gives me an "Empty pool join time" of about 5 seconds. Moving this snippet to the very first part of my main function still yields the same. Perhaps Pool works differently on Windows? In WSL (Ubuntu 20.04), the same code runs in about 0.02 seconds. So, what would cause even an empty Pool to hang for such a long time on Windows?
UPDATE 3: I've made another discovery. The empty pool problem goes away if the only imports I have are from multiprocessing import Pool and from time import time. However, the program uses a boatload of import statements across several source files, which causes the program to hang a bit when it first starts. I suspect that this is propagating down into the Pool for some reason. Unfortunately, I need all of the import statements that are in the source files, so I'm not sure how to get around this (or why the imports would affect an empty Pool).
UPDATE 4: So, apparently it's the from sentence_transformers import SentenceTransformer line that's causing issues (without that import, the pool.join() call happens relatively quickly). I think the easiest solution now is to simply move the compare_images function into a separate file. I'll update this question again as I implement this.
UPDATE 5: I've done a little more playing around, and it seems like on Windows, the import statements get executed multiple times whenever a Pool gets created, which I think is just weird. Here's the code I used to verify this:
from multiprocessing import Pool
from datetime import datetime
from time import time
from utils import test
print("outside function lol")
def get_time():
    now = datetime.now()
    return "%02d/%02d/%04d - %02d:%02d:%02d" % (now.month, now.day, now.year, now.hour, now.minute, now.second)
def main():
    pool = Pool()
    print("Starting pool")
    """
    for i in range(4):
        print("applying %d to pool %s" % (i, get_time() ) )
        pool.apply_async(test, args = (i, ) )
    """
    pool.close()
    print("Pool closed, waiting for all processes to finish")
    start = time()
    pool.join()
    stop = time()
    print("pool done: %f" % (stop - start) )
if __name__ == "__main__":
    main()
Running through Windows command prompt:
outside function lol
Starting pool
Pool closed, waiting for all processes to finish
outside function lol
outside function lol
outside function lol
outside function lol
outside function lol
outside function lol
outside function lol
outside function lol
outside function lol
outside function lol
outside function lol
outside function lol
pool done: 4.794051
Running through WSL:
outside function lol
Starting pool
Pool closed, waiting for all processes to finish
pool done: 0.048856
UPDATE 6: I think I might have a workaround, which is to create the Pool in a file that doesn't directly or indirectly import anything from sentence_transformers. I then pass the model and anything else I need from sentence_transformers as parameters to a function that handles the Pool and kicks off all of the parallel processes. Since the sentence_transformers import seems to be the only problematic one, I'll wrap that import statement in an if __name__ == "__main__" so it only runs once, which will be fine, as I'm passing the things I need from it as parameters. It's a rather janky solution, and probably not what others would consider as "Pythonic", but I have a feeling this will work.
UPDATE 7: The workaround was successful. I've managed to get the pool join time on an empty pool down to something reasonable (0.2 - 0.4 seconds). The downside of this approach is that there is definitely considerable overhead in passing the entire model as a parameter to the parallel function, which I needed to do as a result of creating the Pool in a different place than the model was being imported. I'm quite close, though.
I've done a little more digging, and think I've finally discovered the root of the problem, and it has everything to do with what's described here.
To summarize, on Linux systems, processes are forked from the main process, meaning that the current process state is copied (which is why the import statements don't run multiple times). On Windows (and macOS), processes are spawned, meaning that a new interpreter starts and imports the "main" file from the beginning, thus running all module-level import statements again. So, the behavior I'm seeing is not a bug, but I will need to rethink my program design to account for this.
I have edited the code, and it currently works fine, but I think it is not executing in parallel or dynamically. Can anyone please check it?
Code :
def folderStatistic(t):
    j, dir_name = t
    row = []
    for content in dir_name.split(","):
        row.append(content)
    print(row)
def get_directories():
    import csv
    with open('CONFIG.csv', 'r') as file:
        reader = csv.reader(file,delimiter = '\t')
        return [col for row in reader for col in row]
def folderstatsMain():
    freeze_support()
    start = time.time()
    pool = Pool()
    worker = partial(folderStatistic)
    pool.map(worker, enumerate(get_directories()))
def datatobechecked():
    try:
        folderstatsMain()
    except Exception as e:
        # pass
        print(e)
if __name__ == '__main__':
    datatobechecked()
Config.CSV
C:\USERS, .CSV
C:\WINDOWS , .PDF
etc.
There may be around 200 folder paths in config.csv
Welcome to Stack Overflow and the world of Python programming!
Moving on to the question.
Inside the get_directories() function you open the file in a with context and get the reader object, but the file is closed as soon as you leave the context, so by the time the reader object is used the file is already closed.
I don't want to discourage you, but if you are very new to programming, do not dive into parallel programming yet. The difficulty of handling multiple threads simultaneously grows exponentially with every thread you add (pools greatly simplify this process though). Processes are even worse, as they don't share memory and can't communicate with each other easily.
My advice is, try to write it as a single-thread program first. If you have it working and still need to parallelize it, isolate a single function with input file path as a parameter that does all the work and then use thread/process pool on that function.
EDIT:
From what I can understand from your code, you get directory names from the CSV file and then, for each "cell" in the file, you run folderStatistic in parallel. This part seems correct. The problem may lie in dir_name.split(","): notice that you pass individual "cells" to folderStatistic, not rows. What makes you think it's not running in parallel?
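To make the "cells versus rows" point concrete, here is a small sketch that parses each config line into a (folder, extension) pair before anything is handed to a pool. It assumes the file is comma-separated, as in the sample shown in the question, rather than tab-delimited; parse_config is a name invented for this example.

```python
import csv
import io


def parse_config(text):
    """Return a list of (folder, extension) tuples, one per usable row."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    pairs = []
    for row in reader:
        cells = [cell.strip() for cell in row]
        # Skip blank or incomplete rows instead of failing on them.
        if len(cells) >= 2 and cells[0]:
            pairs.append((cells[0], cells[1]))
    return pairs


if __name__ == "__main__":
    sample = "C:\\USERS, .CSV\nC:\\WINDOWS , .PDF\n"
    for folder, extension in parse_config(sample):
        print(folder, extension)
```

Each tuple can then be passed to the worker as one task, so the worker no longer needs to split strings itself.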
There is a certain amount of overhead in creating a multiprocessing pool because creating processes is, unlike creating threads, a fairly costly operation. Then those submitted tasks, represented by each element of the iterable being passed to the map method, are gathered up in "chunks" and written to a multiprocessing queue of tasks that are read by the pool processes. This data has to move from one address space to another and that has a cost associated with it. Finally when your worker function, folderStatistic, returns its result (which is None in this case), that data has to be moved from one process's address space back to the main process's address space and that too has a cost associated with it.
All of those added costs become worthwhile when your worker function is sufficiently CPU-intensive that these additional costs are small compared to the savings gained by running the tasks in parallel. But your worker function's CPU requirements are too small to reap any benefit from multiprocessing.
Here is a demo comparing single-processing time vs. multiprocessing time for invoking a worker function, fn, twice, where the first time it only performs its internal loop 10 times (low CPU requirements) while the second time it performs its internal loop 1,000,000 times (higher CPU requirements). You can see that in the first case the multiprocessing version runs considerably slower (you can't even measure the time for the single-processing run). But when we make fn more CPU-intensive, multiprocessing achieves gains over the single-processing case.
from multiprocessing import Pool
from functools import partial
import time
def fn(iterations, x):
    the_sum = x
    for _ in range(iterations):
        the_sum += x
    return the_sum
# required for Windows:
if __name__ == '__main__':
    for n_iterations in (10, 1_000_000):
        # single processing time:
        t1 = time.time()
        for x in range(1, 20):
            fn(n_iterations, x)
        t2 = time.time()
        # multiprocessing time:
        worker = partial(fn, n_iterations)
        t3 = time.time()
        with Pool() as p:
            results = p.map(worker, range(1, 20))
        t4 = time.time()
        print(f'#iterations = {n_iterations}, single processing time = {t2 - t1}, multiprocessing time = {t4 - t3}')
Prints:
#iterations = 10, single processing time = 0.0, multiprocessing time = 0.35399389266967773
#iterations = 1000000, single processing time = 1.182999849319458, multiprocessing time = 0.5530076026916504
But even with a pool size of 8, the running time is not reduced by a factor of 8 (it's more like a factor of 2) due to the fixed multiprocessing overhead. When I change the number of iterations for the second case to be 100,000,000 (even more CPU-intensive), we get ...
#iterations = 100000000, single processing time = 109.3077495098114, multiprocessing time = 27.202054023742676
... which is a reduction in running time by a factor of 4 (I have many other processes running in my computer, so there is competition for the CPU).
I started learning about multiprocessing in Python, and I noticed that the same code executes much faster in the main process than in a process created with the multiprocessing module.
Here is a simplified example of my code, where I first execute the code in the main process and print the time for the first 10 calculations and the total time. Then the same code is executed in a new process (a long-running process to which I can send a new_pattern at any time).
import multiprocessing
import random
import time
old_patterns = [[random.uniform(-1, 1) for _ in range(0, 10)] for _ in range(0, 2000)]
new_patterns = [[random.uniform(-1, 1) for _ in range(0, 10)] for _ in range(0, 100)]
new_pattern_for_processing = multiprocessing.Array('d', 10)
there_is_new_pattern = multiprocessing.Value('i', 0)
queue = multiprocessing.Queue()
def iterate_and_add(old_patterns, new_pattern):
    for each_pattern in old_patterns:
        sum = 0
        for count in range(0, 10):
            sum += each_pattern[count] + new_pattern[count]
print_count_main_process = 0
def patt_recognition_main_process(new_pattern):
    global print_count_main_process
    # START of same code on main process
    start_main_process_one_patt = time.time()
    iterate_and_add(old_patterns, new_pattern)
    if print_count_main_process < 10:
        print_count_main_process += 1
        print("Time on main process one pattern:", time.time() - start_main_process_one_patt)
    # END of same code on main process
def patt_recognition_new_process(old_patterns, new_pattern_on_new_proc, there_is_new_pattern, queue):
    print_count = 0
    while True:
        if there_is_new_pattern.value:
            #START of same code on new process
            start_new_process_one_patt = time.time()
            iterate_and_add(old_patterns, new_pattern_on_new_proc)
            if print_count < 10:
                print_count += 1
                print("Time on new process one pattern:", time.time() - start_new_process_one_patt)
            #END of same code on new process
            queue.put("DONE")
            there_is_new_pattern.value = 0
if __name__ == "__main__":
    start_main_process = time.time()
    for new_pattern in new_patterns:
        patt_recognition_main_process(new_pattern)
    print(".\n.\n.")
    print("Total Time on main process:", time.time() - start_main_process)
    print("\n###########################################################################\n")
    start_new_process = time.time()
    p1 = multiprocessing.Process(target=patt_recognition_new_process, args=(old_patterns, new_pattern_for_processing, there_is_new_pattern, queue))
    p1.start()
    for new_pattern in new_patterns:
        for idx, n in enumerate(new_pattern):
            new_pattern_for_processing[idx] = n
        there_is_new_pattern.value = 1
        while True:
            msg = queue.get()
            if msg == "DONE":
                break
    print(".\n.\n.")
    print("Total Time on new process:", time.time()-start_new_process)
And here is my result:
Time on main process one pattern: 0.0025289058685302734
Time on main process one pattern: 0.0020127296447753906
Time on main process one pattern: 0.002008199691772461
Time on main process one pattern: 0.002511262893676758
Time on main process one pattern: 0.0020067691802978516
Time on main process one pattern: 0.0020036697387695312
Time on main process one pattern: 0.0020072460174560547
Time on main process one pattern: 0.0019974708557128906
Time on main process one pattern: 0.001997232437133789
Time on main process one pattern: 0.0030074119567871094
.
.
.
Total Time on main process: 0.22810864448547363
###########################################################################
Time on new process one pattern: 0.03462791442871094
Time on new process one pattern: 0.03308463096618652
Time on new process one pattern: 0.034590721130371094
Time on new process one pattern: 0.033623456954956055
Time on new process one pattern: 0.03407788276672363
Time on new process one pattern: 0.03308820724487305
Time on new process one pattern: 0.03408670425415039
Time on new process one pattern: 0.0345921516418457
Time on new process one pattern: 0.03710794448852539
Time on new process one pattern: 0.03358912467956543
.
.
.
Total Time on new process: 4.0528037548065186
Why is there such a big difference in execution time?
It's a bit subtle, but the problem is with
new_pattern_for_processing = multiprocessing.Array('d', 10)
That doesn't hold Python float objects; it holds raw bytes, in this case enough for ten 8-byte machine-level doubles. Whenever you read or write an element of this array, Python must convert between a float object and a machine double. This isn't a big deal if you read or write once, but your code does it many times in a loop, and those conversions dominate.
To confirm, I copied the machine-level array to a list of Python floats once and had the process work on that. Now its speed is the same as the parent's. My changes were only in one function:
def patt_recognition_new_process(old_patterns, new_pattern_on_new_proc, there_is_new_pattern, queue):
    print_count = 0
    while True:
        if there_is_new_pattern.value:
            local_pattern = new_pattern_on_new_proc[:]
            #START of same code on new process
            start_new_process_one_patt = time.time()
            #iterate_and_add(old_patterns, new_pattern_on_new_proc)
            iterate_and_add(old_patterns, local_pattern)
            if print_count < 10:
                print_count += 1
                print("Time on new process one pattern:", time.time() - start_new_process_one_patt)
            #END of same code on new process
            there_is_new_pattern.value = 0
            queue.put("DONE")
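The effect can be reproduced without starting a second process at all. The following sketch times the same loop against the shared array and against a one-time copy of it. The pattern sizes mirror the question, but the data values are made up, and iterate_and_add is changed here to return its total so the two runs can be compared.

```python
import multiprocessing
import timeit


def iterate_and_add(old_patterns, new_pattern):
    """Same access pattern as in the question, but returning the total."""
    total = 0.0
    for each_pattern in old_patterns:
        for count in range(10):
            total += each_pattern[count] + new_pattern[count]
    return total


if __name__ == "__main__":
    old_patterns = [[float(i + j) for j in range(10)] for i in range(2000)]
    shared = multiprocessing.Array('d', [0.5] * 10)
    local = shared[:]  # one-time copy into a plain list of Python floats

    t_shared = timeit.timeit(lambda: iterate_and_add(old_patterns, shared), number=5)
    t_local = timeit.timeit(lambda: iterate_and_add(old_patterns, local), number=5)
    print("shared array: %.4f s" % t_shared)
    print("local copy:   %.4f s" % t_local)
```

The exact ratio will depend on the machine; the sketch only shows how to measure the per-element access cost in isolation.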
In this particular case, you seem to be doing sequential execution in another process, not parallelising your algorithm. This creates some overhead.
Process creation takes time in its own right. But this is not all. You are also transmitting data in queues and using Manager proxies. These are all in practice queues or actually two queues and another process. Queues are very, very slow compared to the use of in-memory copies of data.
If you take your code, execute it in another process and use queues to transmit data in and out, it is always slower. Which makes it pointless from performance point of view. There might be other reasons to do that nevertheless, for example if your main program was needed to do something else, for example wait on IO.
If you want a performance boost, you should create several processes instead and split your algorithm so that different processes handle different parts of your range, thus working in parallel. You could also consider multiprocessing.Pool if you want a group of worker processes ready and waiting for more work. This reduces process creation overhead, as you pay it only once. In Python 3, you can also use ProcessPoolExecutor.
Parallel processing is useful, but it is seldom a silver bullet that will solve all your problems with little effort. To benefit the most from it, you need to redesign your program to maximise parallel processing and minimise data transmission through queues.
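As a rough illustration of splitting a range across workers, the sketch below divides the work into contiguous chunks so that each task carries only two integers in and one integer out. The sum-of-squares workload and the helper names are placeholders chosen for the example, not the pattern-matching code from the question.

```python
from concurrent.futures import ProcessPoolExecutor


def split_range(total, parts):
    """Split range(total) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(total, parts)
    bounds, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def sum_of_squares(bounds):
    """CPU-bound placeholder work on one chunk."""
    start, stop = bounds
    return sum(n * n for n in range(start, stop))


def parallel_sum_of_squares(total, workers=4):
    """Process each chunk in its own worker and combine the partial sums."""
    with ProcessPoolExecutor(max_workers=workers) as executor:
        return sum(executor.map(sum_of_squares, split_range(total, workers)))


if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000))
```

Keeping the arguments and results small is what keeps the queue traffic from cancelling out the gain.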
I'm trying to get a handle on Python parallelism. This is the code I'm using:
import time
from concurrent.futures import ProcessPoolExecutor
def listmaker():
    for i in xrange(10000000):
        pass
#Without duo core
start = time.time()
listmaker()
end = time.time()
nocore = "Total time, no core, %.3f" % (end- start)
#with duo core
start = time.time()
pool = ProcessPoolExecutor(max_workers=2) #I have two cores
results = list(pool.map(listmaker()))
end = time.time()
core = "Total time core, %.3f" % (end- start)
print nocore
print core
I was under the assumption that because I'm using two cores, the speed would be close to double. However, when I run this code, most of the time the nocore output is faster than the core output. This is true even if I change
def listmaker():
    for i in xrange(10000000):
        pass
to
def listmaker():
    for i in xrange(10000000):
        print i
In fact, in some runs the no-core run is faster. Could someone shed some light on the issue? Is my setup correct? Am I doing something wrong?
You're using pool.map() incorrectly. Take a look at the pool.map documentation. It expects a function followed by an iterable argument, and it passes each of the items from the iterable to the pool individually. Because you wrote listmaker() with parentheses, the function runs in the main process and pool.map only receives its return value, None, so there is nothing for the pool to do. However, you're still incurring the overhead of spawning extra processes, which takes time.
Your usage of pool.map should look like this:
results = pool.map(function_name, some_iterable)
Notice a couple of things:
Since you're using the print statement rather than a function, I'm assuming you're using some Python 2 variant. Wrapping the result in list() is fine: Executor.map from concurrent.futures returns a lazy iterator, so list() is what actually collects the results. (Pool.map from multiprocessing, by contrast, returns a list directly.)
The first argument should be the function name without parentheses. This identifies the function that the pool workers should execute. When you include the parentheses, the function is called right there, instead of in the pool.
pool.map is intended to call a function on every item in an iterable, so your test case needs to create some iterable for it to consume, instead of using a function that takes no arguments like your current example.
Try to run your trial again with some actual input to the function, and retrieve the output. Here's an example:
import time
from concurrent.futures import ProcessPoolExecutor
def read_a_file(file_name):
    with open(file_name) as fi:
        text = fi.read()
    return text
file_list = ['t1.txt', 't2.txt', 't3.txt']
#Without duo core
start = time.time()
single_process_text_list = []
for file_name in file_list:
    single_process_text_list.append(read_a_file(file_name))
end = time.time()
nocore = "Total time, no core, %.3f" % (end- start)
#with duo core
start = time.time()
pool = ProcessPoolExecutor(max_workers=2) #I have two cores
# Executor.map submits the tasks and returns a lazy iterator;
# iterate over it (e.g. list(...)) to wait for and collect the results.
multiprocess_text_list = pool.map(read_a_file, file_list)
end = time.time()
core = "Total time core, %.3f" % (end- start)
print(nocore)
print(core)
Results:
Total time, no core, 0.047
Total time core, 0.009
The text files are 150,000 lines of gibberish each. Notice how much work had to be done before the parallel processing was worth it. When I ran the trial with 10,000 lines in each file, the single process approach was still faster because it didn't have the overhead of spawning extra processes. But with that much work to do, the extra processes become worth the effort.
And by the way, this functionality is available with multiprocessing pools in Python2, so you can avoid importing anything from futures if you want to.
I am trying to implement multiprocessing with Python. It works when pooling very quick tasks, however, freezes when pooling longer tasks. See my example below:
from multiprocessing import Pool
import math
import time
def iter_count(addition):
    print "starting ", addition
    for i in range(1,99999999+addition):
        if i==99999999:
            print "completed ", addition
            break
if __name__ == '__main__':
    print "starting pooling "
    pool = Pool(processes=2)
    time_start = time.time()
    possibleFactors = range(1,3)
    try:
        pool.map( iter_count, possibleFactors)
    except:
        print "exception"
    pool.close()
    pool.join()
    #iter_count(1)
    #iter_count(2)
    time_end = time.time()
    print "total loading time is : ", round(time_end-time_start, 4)," seconds"
In this example, if I use smaller numbers in for loop (something like 9999999) it works. But when running for 99999999 it freezes. I tried running two processes (iter_count(1) and iter_count(2)) in sequence, and it takes about 28 seconds, so not really a big task. But when I pool them it freezes. I know that there are some known bugs in python around multiprocessing, however, in my case, same code works for smaller sub tasks, but freezes for bigger ones.
You're using some version of Python 2 - we can tell because of how print is spelled.
So range(1,99999999+addition) is creating a gigantic list, with at least 100 million integers. And you're doing that in 2 worker processes simultaneously. I bet your disk is grinding itself to dust while the OS swaps out everything it can ;-)
Change range to xrange and see what happens. I bet it will work fine then.
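For readers on Python 3, the distinction no longer exists: range there is lazy, like Python 2's xrange, so the loop never materialises a list of integers. A small sketch showing the difference in memory footprint (count_up_to is a name invented for this example):

```python
import sys


def count_up_to(limit):
    """Loop over a lazy range; no list of `limit` integers is ever built."""
    last = 0
    for i in range(1, limit):
        last = i
    return last


if __name__ == "__main__":
    lazy = range(1, 99_999_999)
    # The range object stores only start, stop and step.
    print("range object:", sys.getsizeof(lazy), "bytes")
    # A list stores a reference to every element, so it grows with length.
    print("list of 999 ints:", sys.getsizeof(list(range(1, 1000))), "bytes")
    print("last value:", count_up_to(1000))
```

In Python 2, range(1, 99999999) builds the whole list up front, which is what exhausts memory when two workers do it at once.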