Why are my threads not working simultaneously? - Python

For starters, I'm new to Python.
I'll be brief. I'm trying to fetch all links from a website using threads.
The problem is that the threads wait for their turn, but I want each of them to work simultaneously with the others.
For example, I set the number of threads to 2 and split the links into 2 chunks.
I want the first thread to iterate over the links in the first chunk, and the second thread to iterate over the links in the second chunk SIMULTANEOUSLY. But my program works in such a way that the threads wait for their turn. What am I doing wrong? Much obliged for your help.
My code:
import sys
import time
from threading import Thread

# target()
def url_target(text, e):
    global links
    global chunks
    number = int(sys.argv[1])
    for m in text:
        time.sleep(0.2)
        print(m, e)
    print('\n')

# main()
def main():
    global links
    global chunks
    url = sys.argv[2]
    links = fetch_links(url)
    number = int(sys.argv[1])
    url_chunk = len(links) // number
    start, stop = 0, url_chunk + len(links) % number
    chunks = []
    time.sleep(1)
    while start < len(links):
        for i in range(number):
            part_links = links[start:stop]
            p = Thread(name='myThread', target=url_target, args=(part_links, i + 1))
            p.start()
            chunks.append(p)
            start, stop = stop, stop + url_chunk
            p.join()
    time.sleep(1)
    while chunks:
        d = chunks.pop()
        print(f'{d.ident} done')
Thanks! I'd appreciate any help you can give!

p.join() blocks until p completes. You want to start all the threads first, then wait on each in turn.
while start < len(links):
    for i in range(number):
        part_links = links[start:stop]
        p = Thread(name='myThread', target=url_target, args=(part_links, i + 1))
        p.start()
        chunks.append(p)
        start, stop = stop, stop + url_chunk
time.sleep(1)
for p in chunks:
    p.join()
If you aren't planning on doing anything while waiting for all the threads to complete, this is fine. However, you might want to block until any thread completes, rather than on an arbitrarily chosen one. A thread pool can help, but a simple way to approximate one is to wait a short period of time for a thread to complete; if it doesn't, move on to another one and come back to the first later. For example,
from collections import deque

chunks = deque()
for start in range(0, len(links), url_chunk):
    for i in range(1, number + 1):
        part_links = links[start:start + url_chunk]
        p = Thread(name='myThread', target=url_target, args=(part_links, i))
        p.start()
        chunks.append(p)

while chunks:
    p = chunks.popleft()
    p.join(5)  # Wait 5 seconds, or some other small period of time
    if p.is_alive():
        chunks.append(p)  # put it back
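If you do want a real thread pool rather than simulating one, concurrent.futures in the standard library already provides this. A minimal sketch (an assumption, reusing url_target and the chunking from the question above) might look like:

from concurrent.futures import ThreadPoolExecutor, as_completed

def crawl(links, number):
    # split the links into one chunk per worker, as in the question
    url_chunk = max(1, len(links) // number)
    chunks = [links[i:i + url_chunk] for i in range(0, len(links), url_chunk)]
    with ThreadPoolExecutor(max_workers=number) as executor:
        futures = [executor.submit(url_target, chunk, i + 1)
                   for i, chunk in enumerate(chunks)]
        for future in as_completed(futures):  # yields each future as soon as it finishes
            future.result()                   # re-raises any exception from the worker

as_completed gives you the "block until any thread completes" behaviour described above, without managing the deque by hand.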

Related

Adding a semaphore or a lock to a part of the code: everything gives a deadlock

I want to add a semaphore where I commented below in the code, but I couldn't get it to work. My code runs 4 threads, each with a linked list attached, and each linked list receives the same items. The items are sorted by one of their values, and if the two biggest values are not equal, a winner is elected. If they are equal, nodeWork is called again for each thread with thr further down in the code. The problem is that when the two biggest values are equal and a new round starts, the linked lists from the previous round mix with the newly created ones, because the threads edit each other's lists. Also, the threads don't wait for each other to fill their lists with data under with semaphoreAdding, so they keep executing with an incomplete data set. I want to add a semaphore where with semaphoreAddingFinished is located in the code, so that each thread can only continue once every thread has finished its .add() call. How can I add a semaphore there? I've tried everything, but I get a deadlock.
import conset
import random
import threading
from threading import Timer

numberOfNodes = 4  # number of threads (linked lists)
llList = []
roundCount = 1
roundCountCounter = 1
semaphore = threading.Semaphore(0)
semaphoreAdding = threading.Semaphore(1)
semaphoreAddingFinished = threading.Semaphore(1)

for x in range(0, numberOfNodes):  # creating a global list with n ConSet instances (linked lists)
    ll = conset.LinkedList()
    llList.append(ll)

numb = 0  # thread counter

def nodeWork(nodeId, n):
    global numb
    randomInteger = random.randint(0, 3)  # generate random number 0 to n^2
    theItemTuple = (nodeId, randomInteger)  # create the tuple for the mailbox
    print("Node", nodeId, "proposes value", randomInteger, "for round", roundCount)
    semaphoreAddFinish = threading.Semaphore(0)
    with semaphoreAdding:
        for m in range(0, len(llList)):  # add the tuple to the mailboxes of all nodes
            llList[m].add(theItemTuple)
            # print(nodeId, llList[nodeId].head.data)
    with semaphoreAddingFinished:  # i want to add a semaphore here
        zzz = 1
    with semaphore:
        for k in range(0, len(llList)):
            node = llList[nodeId].head
            if node.next:
                if node.data[1] != node.next.data[1]:
                    print("Node", nodeId, "decide", node.data[0], "as the leader")
                    return
                else:
                    print("Node", nodeId, "could not decide on the leader and moves to the round", roundCount + 1)
                    llList[nodeId] = conset.LinkedList()
                    thr = threading.Thread(target=nodeWork, args=[nodeId, numberOfNodes])
                    thr.start()
                    numb = numb + 1
                    # print("numb", numb)
                    if numb == 4:
                        numb = 0
                        for l in range(0, numberOfNodes):
                            semaphoreAdding.release()
                    return
            else:
                print("Node", nodeId, "decide", node.data[0], "as the leader")
                return

main_thread = threading.currentThread()  # getting a handle to the current thread
for num in range(0, numberOfNodes):
    # threading.Lock.acquire()
    t = threading.Thread(target=nodeWork, args=[num, numberOfNodes])
    t.start()
    if num == numberOfNodes - 1:
        semaphore.release()
        semaphoreAdding.acquire()
    # threading.Lock.release()

for t in threading.enumerate():  # waiting for all threads to end
    if t is not main_thread:
        t.join()
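No answer is recorded for this question here, but as a heavily simplified sketch of one possible direction (an assumption, not the asker's eventual solution): threading.Barrier does what the semaphoreAddingFinished slot is meant to do. Every thread blocks at the barrier after its add() calls, and all of them are released together once the last one arrives.

import threading

numberOfNodes = 4
barrier = threading.Barrier(numberOfNodes)
mailboxes = [[] for _ in range(numberOfNodes)]   # stand-in for the linked lists

def nodeWork(nodeId):
    value = (nodeId, nodeId * 10)                # stand-in for the proposed value
    for mailbox in mailboxes:                    # write phase: deliver to every node
        mailbox.append(value)
    barrier.wait()                               # block until every node has written
    print("Node", nodeId, "sees", mailboxes[nodeId])   # read phase: data is complete

threads = [threading.Thread(target=nodeWork, args=(i,)) for i in range(numberOfNodes)]
for t in threads:
    t.start()
for t in threads:
    t.join()

A Barrier resets automatically once all parties pass it, so it can be reused in later rounds as long as exactly numberOfNodes threads reach it each round.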

Stop a thread if function delays the response

I have the following loop, which calls the getHLS function in a separate thread for each line of the text file. The problem is that getHLS can be really slow at times, and I am looking for a way to "timeout" a thread if the function does not return anything within 10 seconds.
links = open("links.txt")
lines = links.readlines()
linenumber = 0
for line in lines:
    linenumber += 1
    thread = threading.Thread(target=getHLS, args=(line, linenumber))
    thread.setDaemon(False)
    thread.start()
    if threading.active_count() == 50:
        thread.join(10)
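A Python thread cannot be killed from the outside, so "timing out" usually means giving up on its result rather than stopping it. A minimal sketch of that idea with concurrent.futures (an assumption, not from the original post; getHLS is the poster's function, assumed defined elsewhere):

from concurrent.futures import ThreadPoolExecutor, TimeoutError

def run_all(lines):
    with ThreadPoolExecutor(max_workers=50) as executor:
        # one task per line, remembering its line number
        futures = {executor.submit(getHLS, line, i + 1): i + 1
                   for i, line in enumerate(lines)}
        for future, linenumber in futures.items():
            try:
                future.result(timeout=10)  # stop waiting for this result after 10 seconds
            except TimeoutError:
                print("line", linenumber, "timed out")

Note that the executor still waits for slow getHLS calls to finish when the with block exits; the timeout only bounds how long the main thread waits for each result.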

Python Monty Hall: Multiprocessing slower than direct processing

I am trying out multiprocessing for my Monty Hall game simulation in the hope of improved performance. The game is played 10 million times, which takes ~17 seconds when run directly, but my multiprocessing implementation takes significantly longer. I am clearly doing something wrong, but I can't figure out what.
import multiprocessing
from MontyHall.game import Game
from MontyHall.player import Player
from Timer.timer import Timer

def doWork(input, output):
    while True:
        try:
            f = input.get(timeout=1)
            res = f()
            output.put(res)
        except:
            break

def main():
    # game setup
    player_1 = Player(True)  # always switch strategy
    game_1 = Game(player_1)
    input_queue = multiprocessing.Queue()
    output_queue = multiprocessing.Queue()
    # total simulations
    for i in range(10000000):
        input_queue.put(game_1.play_game)
    with Timer('timer') as t:
        # initialize 5 child processes
        processes = []
        for i in range(5):
            p = multiprocessing.Process(target=doWork, args=(input_queue, output_queue))
            processes.append(p)
            p.start()
        # terminate the processes
        for p in processes:
            p.join()
        results = []
        while len(results) != 10000000:
            r = output_queue.get()
            results.append(r)
        win = results.count(True) / len(results)
        loss = results.count(False) / len(results)
        print(len(results))
        print(win)
        print(loss)

if __name__ == '__main__':
    main()
This is my first post. Advice on posting etiquette is also appreciated. Thank you.
Code for the Classes:
import random

class Player(object):
    def __init__(self, switch_door=False):
        self._switch_door = switch_door

    @property
    def switch_door(self):
        return self._switch_door

    @switch_door.setter
    def switch_door(self, iswitch):
        self._switch_door = iswitch

    def choose_door(self):
        return random.randint(0, 2)

class Game(object):
    def __init__(self, player):
        self.player = player

    def non_prize_door(self, door_with_prize, player_choice):
        """Returns a door that doesn't contain the prize and that isn't the player's original choice"""
        x = 1
        while x == door_with_prize or x == player_choice:
            x = (x + 1) % 3  # assuming there are only 3 doors. Can be modified for more doors
        return x

    def switch_function(self, open_door, player_choice):
        """Returns the door that isn't the original player choice and isn't the opened door"""
        x = 1
        while x == open_door or x == player_choice:
            x = (x + 1) % 3  # assuming there are only 3 doors. Can be modified for more doors
        return x

    def play_game(self):
        """Game Logic"""
        # randomly places the prize behind one of the three doors
        door_with_prize = random.randint(0, 2)
        # player chooses a door
        player_choice = self.player.choose_door()
        # host opens a door that doesn't contain the prize
        open_door = self.non_prize_door(door_with_prize, player_choice)
        # final player choice
        if self.player.switch_door:
            player_choice = self.switch_function(open_door, player_choice)
        # Result
        return player_choice == door_with_prize
Code for running it without multiprocessing:
from MontyHall.game import Game
from MontyHall.player import Player
from Timer.timer import Timer

def main():
    # Setting up the game
    player_2 = Player(True)  # always switch
    game_1 = Game(player_2)
    # Testing out the hypothesis
    with Timer('timer_1') as t:
        results = []
        for i in range(10000000):
            results.append(game_1.play_game())
        win = results.count(True) / len(results)
        loss = results.count(False) / len(results)
        print(
            f'When switch strategy is {player_2.switch_door}, the win rate is {win:.2%} and the loss rate is {loss:.2%}')

if __name__ == '__main__':
    main()
As you did not give the full code that we can run locally, I can only speculate. My guess is that you are passing an object (a method of your game) to the other processes, so pickling and unpickling takes too much time. Unlike multithreading, where you can "share" data, in multiprocessing you need to pack the data and send it to the other process.
However, there's a rule I always follow when I try to optimize my code: profile before optimizing! It is much better to KNOW what's slow than to GUESS.
It's a multiprocessing program, so there are not a lot of profiling options on the market. You could try viztracer, which supports multiprocessing.
pip install viztracer
viztracer --log_multiprocess your_program.py
It will generate a result.html that you can open with Chrome. Or you can just run
vizviewer result.html
I would suggest reducing the iteration count so you can see the whole picture (viztracer uses a circular buffer, and 10 million iterations will definitely overflow it). Even if you don't, you can still see the last piece of your code executing, which should be helpful enough to figure out what's going on.
I used viztracer since you gave the whole code.
This is one of the iterations in your worker process. As you can tell, the actual working part is very small (the yellow-ish slice in the middle, p...). Most of the time is spent on receiving and putting data, which eliminates the advantage of parallelization.
The correct way to do this is to do the work in batches. Also, since this game does not actually require any data, you should just send "I want to do it 1000 times" to the process and let it do it, instead of sending the method one by one.
There's another interesting problem that you can easily spot with viztracer:
This is the big picture of your worker process. Notice the large stretch of nothing at the end? That's because your worker needs a timeout to finish, and that's when it is just waiting. You should come up with a better way to finish your worker processes gracefully.
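One common way to finish the workers without a timeout (a sketch under the assumption of the same queue setup, not the poster's code) is a sentinel value: the parent puts one None per worker, and each worker exits as soon as it sees one.

SENTINEL = None

def do_work(input_queue, output_queue):
    while True:
        task = input_queue.get()       # blocks until a task or the sentinel arrives
        if task is SENTINEL:
            break                      # clean shutdown, no timeout needed
        func, args = task
        output_queue.put(func(*args))

# in the parent, after all real work has been enqueued:
# for _ in processes:
#     input_queue.put(SENTINEL)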
Updated my code. I fundamentally misunderstood the multiprocessing method.
def do_work(input, output):
    """Generic function that takes an input function and argument and runs it"""
    while True:
        try:
            f, args = input.get(timeout=1)
            results = f(*args)
            output.put(results)
        except:
            output.put('Done')
            break

def run_sim(game, num_sim):
    """Runs the game the given number of times"""
    res = []
    for i in range(num_sim):
        res.append(game.play_game())
    return res

def main():
    input_queue = multiprocessing.Queue()
    output_queue = multiprocessing.Queue()
    g = Game(Player(False))  # set up game and player
    num_sim = 2000000
    for i in range(5):
        # run sim with game object and number of simulations passed into the queue
        input_queue.put((run_sim, (g, num_sim)))
    with Timer('Monty Hall Timer: ') as t:
        processes = []  # list to save processes
        for i in range(5):
            p = multiprocessing.Process(target=do_work, args=(input_queue, output_queue))
            processes.append(p)
            p.start()
        results = []
        while True:
            r = output_queue.get()
            if r != 'Done':
                results.append(r)
            else:
                break
        # terminate processes
        for p in processes:
            p.terminate()
        # combining the five returned lists
        flat_list = [item for sublist in results for item in sublist]
        print(len(flat_list))
        print(len(results))
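For comparison, a minimal alternative sketch (an assumption, not part of the original post) that reuses the Game and Player classes above: multiprocessing.Pool manages the worker lifecycle and collects the results, so no sentinel, timeout, or terminate() is needed.

import multiprocessing
from MontyHall.game import Game
from MontyHall.player import Player

def run_batch(num_sim):
    game = Game(Player(True))                 # always-switch strategy, as in the question
    return sum(game.play_game() for _ in range(num_sim))

def main():
    total = 10_000_000
    batches = [total // 5] * 5                # one batch of work per worker process
    with multiprocessing.Pool(processes=5) as pool:
        wins = sum(pool.map(run_batch, batches))
    print(f'win rate: {wins / total:.2%}')

if __name__ == '__main__':
    main()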

How to measure the time taken by multiple threads created in a loop?

I want to measure how much time it takes to finish running code that uses multiple threads in Python.
If I put join inside the loop, it stops the loop (the main thread) from creating new threads; it runs the sleep() calls one by one.
If I put join on the thread I use to create thread_testing, the join somehow doesn't work: it prints out the time immediately.
def sleep(name):
    print("{} going to sleep".format(name))
    time.sleep(5)
    print("{} wakes up after 5 seconds".format(name))

def thread_testing():
    for i in range(3):
        t = threading.Thread(target=sleep, name='thread' + str(i), args=(i,))
        t.start()
        # t.join()  #1

if __name__ == '__main__':
    start = time.time()
    t = threading.Thread(target=thread_testing, name='threadx')
    t.start()
    t.join()  #2
    print(time.time() - start)
Desired output:
1 sleep
2 sleep
3 sleep
1 wake up after 5
2 wake up after 5
3 wake up after 5
5.xxx secs
join() waits for the thread to finish. That is why your threads were executed one by one.
What you have to do is:
Start all threads
Store them somewhere
Once everything is started wait for every thread to finish.
Assuming you don't need the first thread started in main:
import time
import threading

def sleep(name):
    print("{} going to sleep".format(name))
    time.sleep(5)
    print("{} wakes up after 5 seconds".format(name))

def thread_testing():
    threads = []
    for i in range(3):
        t = threading.Thread(target=sleep, name='thread' + str(i), args=(i,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

if __name__ == '__main__':
    start = time.time()
    thread_testing()
    print(time.time() - start)

Python - multithreading - Threads terminate in one case. In another they don't. Why?

Consider the following example, which I've been working through to learn multithreading. It's just an extended version of the Python 3.5 queue documentation example.
It prints some numbers over 4 threads, produces one error in the queue, retries that element, and should print the remaining queue if a KeyboardInterrupt exception occurs.
import threading
import queue
import time
import random
import traceback

def worker(q, active):
    while True:
        worker_item = q.get()
        #if worker_item == None:
        if not active.is_set():
            break
        time.sleep(random.random())
        with threading.Lock():
            if worker_item == 5 or worker_item == '5':
                try:
                    print(threading.current_thread().name + ': ' + worker_item + ' | remaining queue: ' + str(list(q.queue)))
                except TypeError:
                    print(threading.current_thread().name + ': ')
                    print(traceback.format_exc())
                    q.put(str(worker_item))
            else:
                print(threading.current_thread().name + ': ' + str(worker_item) + ' | remaining queue: ' + str(list(q.queue)))
        q.task_done()

def main():
    # INITIALIZE
    num_threads = 4
    stack1 = list(range(1, 21))
    stack2 = list(range(101, 121))
    q = queue.Queue()
    active = threading.Event()
    active.set()
    # START THREADS
    threads = []
    for _ in range(num_threads):
        t = threading.Thread(target=worker, args=(q, active))
        t.start()
        threads.append(t)
    try:
        # PUT STACK ITEMS ON QUEUE AND BLOCK UNTIL ALL TASKS ARE DONE
        for stack1_item in stack1:
            q.put(stack1_item)
        q.join()
        for stack2_item in stack2:
            q.put(stack2_item)
        q.join()
        # STOP WORKER LOOP IN EVERY THREAD
        #for _ in threads:
        #    q.put(None)
        active.clear()
        # WAIT UNTIL ALL THREADS TERMINATE
        for t in threads:
            t.join()
    except KeyboardInterrupt:
        print(traceback.format_exc())
        print('remaining queue: ' + str(list(q.queue)))
        #for _ in threads:
        #    q.put(None)
        active.clear()
        for t in threads:
            t.join()

if __name__ == '__main__':
    main()
If I run the script as it is (without a KeyboardInterrupt), it won't terminate; I have to kill it with a signal. But if I comment/uncomment the following lines (not using the event and doing it the way the docs do):
in worker: comment out "if not active.is_set():" and uncomment "#if worker_item == None:"
in main: comment out "active.clear()" and uncomment "#for _ in threads: q.put(None)"
in main's except block: comment out "active.clear()" and uncomment "#for _ in threads: q.put(None)"
then it does exit with exit code 0. Why?
Why is putting Nones to the queue necessary?
What would be the solution without putting Nones to the queue?
There are two types of threads: daemon and non-daemon. By default, all threads are non-daemon. The process is kept alive as long as there is at least one non-daemon thread.
This means that to stop the process, you either have to:
stop all of its threads (this is what your commented-out code does by using None to kick the workers out of the infinite wait in q.get()); or
make the workers daemon threads, in which case the process will stop as soon as the main thread stops (this will require extra care if you want to ensure the workers have finished their tasks).
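If you want to keep the Event approach from the question, a minimal sketch (an assumption, not from the original answer) is to poll the queue with a timeout so the worker can re-check the event instead of blocking forever in q.get():

import queue
import threading

def worker(q, active):
    while active.is_set():
        try:
            worker_item = q.get(timeout=0.5)  # wake up periodically instead of blocking forever
        except queue.Empty:
            continue                          # nothing queued; re-check the event and keep waiting
        try:
            print(threading.current_thread().name + ': ' + str(worker_item))
        finally:
            q.task_done()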
