Python quicksort: parallel sort slower than sequential

I need some help with the code below; there has to be something wrong with it, because I get better results with the sequential sort than with the parallel version. I'm new to Python, and especially to parallel programming, so any help would be welcome.
import random, time
from multiprocessing import Process, Pipe, cpu_count
from copy import deepcopy

def main():
    create_list = [random.randint(1, 1000) for x in range(25000)]

    # Sequential sort
    sequentialsortlist = deepcopy(create_list)
    start = time.time()
    sorted2 = quicksort(sequentialsortlist)
    elapsed = time.time() - start
    print("sequential sort")
    print(elapsed)

    time.sleep(4)

    # Parallel quicksort
    parallelsortlist = deepcopy(create_list)
    start = time.time()
    n = cpu_count()
    pconn, cconn = Pipe()
    p = Process(target=quicksortParallel,
                args=(parallelsortlist, cconn, n))
    p.start()
    lyst = pconn.recv()
    p.join()
    elapsed = time.time() - start
    print("Parallel sort")
    print(elapsed)

def quicksort(lyst):
    less = []
    pivotList = []
    more = []
    if len(lyst) <= 1:
        return lyst
    else:
        pivot = lyst[0]
        for i in lyst:
            if i < pivot:
                less.append(i)
            elif i > pivot:
                more.append(i)
            else:
                pivotList.append(i)
        less = quicksort(less)
        more = quicksort(more)
        return less + pivotList + more

def quicksortParallel(lyst, conn, procNum):
    less = []
    pivotList = []
    more = []
    if procNum <= 0 or len(lyst) <= 1:
        conn.send(quicksort(lyst))
        conn.close()
        return
    else:
        pivot = lyst[0]
        for i in lyst:
            if i < pivot:
                less.append(i)
            elif i > pivot:
                more.append(i)
            else:
                pivotList.append(i)
        pconnLeft, cconnLeft = Pipe()
        leftProc = Process(target=quicksortParallel,
                           args=(less, cconnLeft, procNum - 1))
        pconnRight, cconnRight = Pipe()
        rightProc = Process(target=quicksortParallel,
                            args=(more, cconnRight, procNum - 1))
        leftProc.start()
        rightProc.start()
        conn.send(pconnLeft.recv() + pivotList + pconnRight.recv())
        conn.close()
        leftProc.join()
        rightProc.join()

if __name__ == '__main__':
    main()

The simple answer is that the overhead of setting up your parallel execution environment and then re-joining it at the end is more expensive than the performance gained from the parallelism.
Multiprocessing actually forks sub-processes. That's very expensive. It only makes sense to do this if the amount of work done in each process is large.
This kind of problem is pretty common when people naively try to parallelize code. For many "reasonable" workloads, the single-threaded implementation winds up being faster.

There is a cost associated with starting/terminating a process. Interprocess communication is not free, either. So the overhead is just too big.
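To get a feel for how large that fixed cost is, you can time a child process that does nothing but echo a payload back through a pipe. This is a minimal sketch (not from the original post), and the numbers vary a lot by platform, since spawn-based start methods are far more expensive than fork:

import time
from multiprocessing import Pipe, Process

def echo(conn):
    # No real work: receive one object and send it straight back.
    conn.send(conn.recv())
    conn.close()

if __name__ == '__main__':
    start = time.time()
    parent, child = Pipe()
    p = Process(target=echo, args=(child,))
    p.start()                  # cost of creating a whole new interpreter process
    parent.send([0] * 25000)   # pickling + pipe cost in both directions
    parent.recv()
    p.join()
    print("process + pipe round trip:", time.time() - start, "seconds")

quicksortParallel pays that cost for every process it spawns, and with procNum = cpu_count() it creates a whole tree of short-lived processes for a 25,000-element list, so the overhead swamps the work actually saved.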


Create Python code that counts prime numbers in range(3000000) using multithreading in a for loop (e.g. if Nthreads is N, it divides 3000000 by N)

I'm trying to run this program but it is showing me "Thread 0 has 0 prime numbers" in the console followed by "Killed" after 5 minutes. Moreover, it is very slow. Please help me develop and correct this code.
import threading   # needed for threading.Thread below
import time

Nthreads = 4
maxNumber = 3000000
starting_range = 0
ending_range = 0
division = 0
lst = []
out = []           # collects each thread's result list

def prime(x, y):
    prime_list = []
    for i in range(x, y):
        if i == 0 or i == 1:
            continue
        else:
            for j in range(2, int(i/2)+1):
                if i % j == 0:
                    break
            else:
                prime_list.append(i)
    return prime_list

def func_thread(x, y):
    out.append(prime(x, y))

thread_list = []
results = len(lst)

for i in range(Nthreads):
    devision = maxNumber//Nthreads
    starting_range = (i-1)*division+1
    ending_range = i*devision
    lst = prime(starting_range, ending_range)
    print(" Thread ", i, " has ", len(lst), " prime numbers.")
    thread = threading.Thread(target=func_thread, args=(i, results))
    thread_list.append(thread)

for thread in thread_list:
    thread.start()
for thread in thread_list:
    thread.join()
In Python, if you use multithreading for CPU-bound tasks, it will be slower than if you don't use multithreading. You need to use multiprocessing for this problem. You can read this article for more information: https://www.geeksforgeeks.org/difference-between-multithreading-vs-multiprocessing-in-python/
Multithreading is wholly inappropriate for CPU-intensive work such as this. However, it can be done:
from concurrent.futures import ThreadPoolExecutor

NTHREADS = 4
MAXNUMBER = 3_000_000
CHUNK = MAXNUMBER // NTHREADS

assert MAXNUMBER % NTHREADS == 0

RANGES = [(base, base+CHUNK) for base in range(0, MAXNUMBER, CHUNK)]

all_primes = []

def isprime(n):
    if n <= 3:
        return n > 1
    if not n % 2 or not n % 3:
        return False
    for i in range(5, int(n**0.5)+1, 6):
        if not n % i or not n % (i + 2):
            return False
    return True

def process(_range):
    lo, hi = _range
    if lo < 3:
        all_primes.append(2)
        lo = 3
    elif lo % 2 == 0:
        lo += 1
    for p in range(lo, hi, 2):
        if isprime(p):
            all_primes.append(p)

with ThreadPoolExecutor() as executor:
    executor.map(process, RANGES)
The all_primes list is unordered. Note also that this strategy will only work if MAXNUMBER is exactly divisible by NTHREADS.
Note on performance:
This takes 7.88s on my machine. A multiprocessing variant takes 2.90s
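The multiprocessing variant is not shown in the answer; a plausible sketch of it swaps ThreadPoolExecutor for ProcessPoolExecutor. Because worker processes do not share the all_primes list, each call returns its own list (the function name process_range is an illustrative choice) and the parent flattens the results of map; isprime, NTHREADS and RANGES are reused from the code above.

from concurrent.futures import ProcessPoolExecutor

def process_range(_range):
    lo, hi = _range
    found = []
    if lo < 3:
        found.append(2)
        lo = 3
    elif lo % 2 == 0:
        lo += 1
    for p in range(lo, hi, 2):
        if isprime(p):        # reuses isprime() from the answer above
            found.append(p)
    return found              # results are pickled back to the parent

if __name__ == '__main__':    # required when the start method is spawn
    with ProcessPoolExecutor(max_workers=NTHREADS) as executor:
        all_primes = [p for chunk in executor.map(process_range, RANGES)
                      for p in chunk]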

Run linear and bisect search in parallel and stop on first result

So I have been searching far and wide for this. I want to run three functions in parallel; when one finishes, return its value and stop the other two.
I have tried asyncio, multiprocessing and concurrent.futures, and I still cannot get it to work.
As an example, I have a function that returns 0 when its argument is above a certain threshold, and some number otherwise. To test my implementation I want to return the last non-zero value.
My problem is that regardless of what I try, all three methods run to completion every time instead of stopping when a match has been found.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def first_zero(f, values=[10, 10**2, 10**3]):
    for i in values:
        if f(i) == 0:
            return i
    return values[-1]

def bisect_search(f, first, last):
    if f(first) == 0:
        return first
    middle = (last + first) // 2
    while True:
        # print(middle, "bisect")
        if abs(last - first) <= 1:
            return first
        middle = (last + first) // 2
        if f(middle) > 0:
            first = middle
        else:
            last = middle

def linear_search(f, start, stop, increment):
    i = start
    minimum = min(start, stop)
    maximum = max(start, stop)
    if increment < 0:
        while minimum <= i <= maximum:
            if f(i) != 0:
                return i
            i += increment
    else:
        while minimum <= i <= maximum:
            if f(i) == 0:
                return i - increment
            i += increment
    return i

def linear_binary_search(f, start=0):
    stop = first_zero(f)
    tasks = [
        (linear_search, [f, start, stop, 1]),
        (linear_search, [f, stop, start, -1]),
        (bisect_search, [f, start, stop]),
    ]
    executor = ThreadPoolExecutor(max_workers=len(tasks))
    futures = [executor.submit(task, *args) for (task, args) in tasks]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    executor.shutdown(wait=False, cancel_futures=True)
    return done.pop().result()

if __name__ == "__main__":
    import random
    import time

    target = 995

    def f(x):
        time.sleep(0.01)
        if x <= target:
            return random.randint(1, 1000)
        return 0

    # Comment the following line to see
    # how much slower the concurrent version is
    print(linear_binary_search(f))

    stop = first_zero(f)
    print(linear_search(f, stop, 0, -1))
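For context (not part of the original post): ThreadPoolExecutor cannot interrupt a task that has already started, and shutdown(cancel_futures=True) only cancels futures still waiting in the queue. With max_workers equal to the number of tasks, all three start immediately, so they all run to completion. A common workaround, sketched below with illustrative names such as stop_event and reusing first_zero, linear_search and bisect_search from above, is to have the searched function itself check a shared flag that is set as soon as the first future finishes.

import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def cancellable(f, stop_event):
    # Wrap f so any search loop calling it aborts once stop_event is set.
    def wrapped(x):
        if stop_event.is_set():
            raise RuntimeError("cancelled")   # unwinds the still-running searches
        return f(x)
    return wrapped

def linear_binary_search_cancellable(f, start=0):
    stop_event = threading.Event()
    g = cancellable(f, stop_event)
    stop = first_zero(f)
    tasks = [
        (linear_search, [g, start, stop, 1]),
        (linear_search, [g, stop, start, -1]),
        (bisect_search, [g, start, stop]),
    ]
    with ThreadPoolExecutor(max_workers=len(tasks)) as executor:
        futures = [executor.submit(task, *args) for (task, args) in tasks]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        stop_event.set()                      # tell the losing searches to bail out
        return done.pop().result()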

Kill threads after one thread successfully finishes its function [duplicate]

This question already has answers here:
Is there any way to kill a Thread?
(31 answers)
Closed 12 months ago.
I'm searching for a way to transform this kind of code from multiprocessing into multithreading:
import multiprocessing
import random
import time

FIND = 50
MAX_COUNT = 100000
INTERVAL = range(10)

queue = multiprocessing.Queue(maxsize=1)

def find(process, initial):
    succ = False
    while succ == False:
        start = initial
        while(start <= MAX_COUNT):
            if(FIND == start):
                queue.put(f"Found: {process}, start: {initial}")
                break;
            i = random.choice(INTERVAL)
            start = start + i
            print(process, start)

processes = []
manager = multiprocessing.Manager()

for i in range(5):
    process = multiprocessing.Process(target=find, args=(f'computer_{i}', i))
    processes.append(process)
    process.start()

ret = queue.get()

for i in range(5):
    process = processes[i]
    process.terminate()
    print(f'terminated {i}')

print(ret)
The way it works is that it starts multiple processes, and after the first process finishes, the function find isn't needed anymore. I tried to transform it that way, but unfortunately the terminate function is not usable:
import _thread as thread
import queue
import random
import time

FIND = 50
MAX_COUNT = 100000
INTERVAL = range(10)

qu = queue.Queue(maxsize=1)

def find(process, initial):
    succ = False
    while succ == False:
        start = initial
        while(start <= MAX_COUNT):
            if(FIND == start):
                qu.put(f"Found: {process}, start: {initial}")
                break;
            i = random.choice(INTERVAL)
            start = start + i
            print(process, start)

threads = []

for i in range(5):
    th = thread.start_new_thread(find, (f'computer_{i}', i))
    threads.append(th)

ret = qu.get()

for i in range(5):
    th = threads[i]
    th.terminate()
    print(f'terminated {i}')

print(ret)
How can I get the threads to terminate?
Try:
import ctypes
import threading

for id, thread in threading._active.items():
    ctypes.pythonapi.PyThreadState_SetAsyncExc(id, ctypes.py_object(SystemExit))
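This asynchronously raises SystemExit in every live thread and relies on the private threading._active dict, so it is fragile. A more conventional pattern, sketched below under the assumption that threading.Thread is acceptable instead of _thread (names such as stop_event are illustrative), is to let the workers poll a shared threading.Event and have the first finisher set it:

import queue
import random
import threading

FIND = 50
MAX_COUNT = 100000
INTERVAL = range(10)

qu = queue.Queue(maxsize=1)
stop_event = threading.Event()          # set once any worker succeeds

def find(worker, initial):
    while not stop_event.is_set():
        start = initial
        while start <= MAX_COUNT and not stop_event.is_set():
            if start == FIND:
                stop_event.set()        # tell the other workers to quit
                try:
                    qu.put_nowait(f"Found: {worker}, start: {initial}")
                except queue.Full:      # another worker got there first
                    pass
                return
            start += random.choice(INTERVAL)

threads = [threading.Thread(target=find, args=(f'computer_{i}', i))
           for i in range(5)]
for th in threads:
    th.start()

print(qu.get())                         # first result wins
for th in threads:
    th.join()                           # the rest exit as soon as they see the event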

Memoization not working properly: Python code from MIT OpenCourseWare "Introduction to Computer Science and Programming", Lecture 21

I'm brand new to coding and I'm working through the MIT OpenCourseWare 6.00 course "Introduction to Computer Science and Programming." Lecture 21 covers dynamic programming and memoization specifically.
The assignment/lecture code is an optimization program, finding the combination of items with greatest value one can take from a set of items that will fit in a bag with limited capacity. This is a section of the code that was provided:
def solve(toConsider, avail):
    global numCalls
    numCalls += 1
    if toConsider == [] or avail == 0:
        result = (0, ())
    elif toConsider[0].getWeight() > avail:
        result = solve(toConsider[1:], avail)
    else:
        nextItem = toConsider[0]
        #Explore left branch
        withVal, withToTake = solve(toConsider[1:],
                                    avail - nextItem.getWeight())
        withVal += nextItem.getValue()
        #Explore right branch
        withoutVal, withoutToTake = solve(toConsider[1:], avail)
        #Choose better branch
        if withVal > withoutVal:
            result = (withVal, withToTake + (nextItem,))
        else:
            result = (withoutVal, withoutToTake)
    return result

def fastSolve(toConsider, avail, memo = None):
    global numCalls
    numCalls += 1
    if memo == None:
        #Initialize for first invocation
        memo = {}
    if (len(toConsider), avail) in memo:
        #Use solution found earlier
        result = memo[(len(toConsider), avail)]
        return result
    elif toConsider == [] or avail == 0:
        result = (0, ())
    elif toConsider[0].getWeight() > avail:
        #Lop off first item in toConsider and solve
        result = fastSolve(toConsider[1:], avail, memo)
    else:
        item = toConsider[0]
        #Consider taking first item
        withVal, withToTake = fastSolve(toConsider[1:],
                                        avail - item.getWeight(),
                                        memo)
        withVal += item.getValue()
        #Consider not taking first item
        withoutVal, withoutToTake = fastSolve(toConsider[1:],
                                              avail, memo)
        #Choose better alternative
        if withVal > withoutVal:
            result = (withVal, withToTake + (item,))
        else:
            result = (withoutVal, withoutToTake)
    #Update memo
    memo[(len(toConsider), avail)] = result
    return result

import time
import sys
sys.setrecursionlimit(2000)

def test(maxVal = 10, maxWeight = 10, runSlowly = False):
    random.seed(0)
    global numCalls
    capacity = 8*maxWeight
    print '#items, #num taken, Value, Solver, #calls, time'
    for numItems in (4,8,16,32,64,128,256,512,1024):
        Items = buildManyItems(numItems, maxVal, maxWeight)
        if runSlowly:
            tests = (fastSolve, solve)
        else:
            tests = (fastSolve,)
        for func in tests:
            numCalls = 0
            startTime = time.time()
            val, toTake = func(Items, capacity)
            elapsed = time.time() - startTime
            funcName = func.__name__
            print numItems, len(toTake), val, funcName, numCalls, elapsed
In the lecture, when the instructor runs this test code there is a drastic difference in the performance of fastSolve due to the use of memoization, but when I run this code the memoization doesn't appear to be working: there is no difference in the number of calls or run time between solve and fastSolve.
I understand this code was written for an earlier version of Python (2.7, I believe), and I'm currently using Python 3.8.0.
Can anyone help me to identify why fastSolve/memoization isn't working as expected?

What is the time difference between normal Python code and the same code using multiprocessing?

I'm trying to clearly understand the difference between running a function in a single process and running the same function across multiple cores. The following normal Python code and the multiprocessing code give approximately the same time. Am I using multiprocessing wrong?
Normal Python code:
import time

def basic_func(x):
    if x == 0:
        return 'zero'
    elif x % 2 == 0:
        return 'even'
    else:
        return 'odd'

def multiprocessing_func(x):
    y = x * x
    print('{} squared results in a/an {} number'.format(x, basic_func(y)))

if __name__ == '__main__':
    starttime = time.time()
    for each in range(0, 1000):
        multiprocessing_func(each)
    print('That took {} seconds'.format(time.time() - starttime))
Multiprocessing code:
import time
import multiprocessing

def basic_func(x):
    if x == 0:
        return 'zero'
    elif x % 2 == 0:
        return 'even'
    else:
        return 'odd'

def multiprocessing_func(x):
    y = x * x
    print('{} squared results in a/an {} number'.format(x, basic_func(y)))

if __name__ == '__main__':
    starttime = time.time()
    pool = multiprocessing.Pool()
    pool.map(multiprocessing_func, range(0, 1000))
    pool.close()
    print('That took {} seconds'.format(time.time() - starttime))
Thanks in advance!
Code source: This tutorial
Without multiprocessing, I executed this code in 0.07s. The multiprocessing version took 0.28s. Creating a pool of processes takes some time, and it may not be worth it.
I recommend not printing during the processing, as it can create a funnel effect (I/O is always an issue for concurrent processes).
Changing your code a little bit:
import time
import multiprocessing

def basic_func(x):
    if x == 0:
        return 'zero'
    elif x % 2 == 0:
        return 'even'
    else:
        return 'odd'

def multiprocessing_func(x):
    y = x * x
    return basic_func(y)
And comparing results:
starttime = time.time()
for each in range(0, 100000000):
    multiprocessing_func(each)
print('That took {} seconds'.format(time.time() - starttime))
Took 34s
starttime = time.time()
pool = multiprocessing.Pool(processes=10)
pool.map(multiprocessing_func, range(0, 100000000))
pool.close()
print('That took {} seconds'.format(time.time() - starttime))
Took 9.6s
See how the "same" problem produced drastically different results. Answering your question in general is not possible; it depends too much on the initial problem, funnel effects, and the balance between the duration of each task and the cost of creating the pool of processes.
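One knob that shifts that balance is the chunksize argument of Pool.map: the inputs are handed to the workers in large batches, so the pickling and inter-process traffic happen far less often. A minimal sketch (not part of the original answer; the range is smaller here so it finishes quickly, and the actual timings depend on your machine):

import time
import multiprocessing

def multiprocessing_func(x):
    # Same cheap CPU-bound work as above, returning instead of printing.
    y = x * x
    if y == 0:
        return 'zero'
    return 'even' if y % 2 == 0 else 'odd'

if __name__ == '__main__':
    starttime = time.time()
    with multiprocessing.Pool(processes=10) as pool:
        # Large chunks mean each worker makes few round trips to the parent.
        results = pool.map(multiprocessing_func, range(0, 10_000_000),
                           chunksize=100_000)
    print('That took {} seconds'.format(time.time() - starttime))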
