Python processes fail to start

I'm running the following code block in my application. Under Python 3.4 I get a 'python quit unexpectedly' popup, and the data missing from the aOut file covers a contiguous chunk of iterations: say, items 0-1000 of the list produce nothing while the others have their data. The failing items run properly on their own without intervention.
Under Python 2.7 the failures are for roughly items 3400-4400 in the list.
Logging shows that the detect() calls are never made for processes 0-1000, i.e. the process.start() calls don't trigger the detect method.
I am doing this on macOS Sierra. What is happening here? Is there a better way to achieve my purpose?
from multiprocessing import Manager, Process

def detectInBatch(aList, aOut):
    # iterate through the objects
    processPool = []
    pthreadIndex = 0
    pIndex = 0
    manager = Manager()
    dict = manager.dict()  # note: shadows the built-in dict
    outline = ""
    print("Threads: ", getMaxThreads())  # max threads is 20
    for key in aList:
        print("Key: %s, pIndex: %d" % (key.key, pIndex))
        processPool.append(Process(target=detect, args=(key.key, dict)))
        pthreadIndex = pthreadIndex + 1
        pIndex = pIndex + 1
        # print("Added for %d" % (pIndex))
        if pthreadIndex == getMaxThreads():
            print("ProcessPool size: %d" % len(processPool))
            for process in processPool:
                # print("Started")
                process.start()
            print("20 Processes started")
            for process in processPool:
                # print("Joined")
                process.join()
            print("20 Processes joined")
            for key in dict.keys():
                outline = outline + dict.get(key)
            dict.clear()
            pthreadIndex = 0
            processPool = []
    # leftover batch smaller than getMaxThreads()
    if pthreadIndex != 0:
        for process in processPool:
            # print("End Start")
            process.start()
        for process in processPool:
            # print("End done")
            process.join()
        for key in dict.keys():
            print("Dict: " + dict.get(key))
            outline = outline + dict.get(key)
    aOut.write(outline)
# end method detectInBatch

To avoid the 'unexpected quit' popup, you could try swallowing the exception:

try:
    your_loop()
except:
    pass

Then add some logging to track down the root cause.
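As for a better way: if detect() can be refactored to return its output string instead of writing into the managed dict (an assumption; the original writes into the shared dict), a multiprocessing.Pool would replace the hand-rolled batching entirely. A minimal sketch:

from multiprocessing import Pool

def detectInBatch(aList, aOut):
    # a pool reuses a fixed set of worker processes instead of
    # spawning and joining getMaxThreads() fresh ones per batch
    with Pool(processes=getMaxThreads()) as pool:
        results = pool.map(detect, [key.key for key in aList])
    aOut.write("".join(results))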

Related

Python Monty Hall: Multiprocessing slower than direct processing

I am trying out multiprocessing for my Monty Hall game simulation to improve performance. The game is played 10 million times, which takes ~17 seconds when run directly; however, my multiprocessing implementation is taking significantly longer. I am clearly doing something wrong but I can't figure out what.
import multiprocessing
from MontyHall.game import Game
from MontyHall.player import Player
from Timer.timer import Timer

def doWork(input, output):
    while True:
        try:
            f = input.get(timeout=1)
            res = f()
            output.put(res)
        except:  # queue empty after the 1-second timeout: no work left
            break

def main():
    # game setup
    player_1 = Player(True)  # always-switch strategy
    game_1 = Game(player_1)
    input_queue = multiprocessing.Queue()
    output_queue = multiprocessing.Queue()
    # total simulations
    for i in range(10000000):
        input_queue.put(game_1.play_game)
    with Timer('timer') as t:
        # initialize 5 child processes
        processes = []
        for i in range(5):
            p = multiprocessing.Process(target=doWork, args=(input_queue, output_queue))
            processes.append(p)
            p.start()
        # wait for the processes to finish
        for p in processes:
            p.join()
        results = []
        while len(results) != 10000000:
            r = output_queue.get()
            results.append(r)
    win = results.count(True) / len(results)
    loss = results.count(False) / len(results)
    print(len(results))
    print(win)
    print(loss)

if __name__ == '__main__':
    main()
This is my first post. Advice on posting etiquette is also appreciated. Thank you.
Code for the Classes:
import random

class Player(object):
    def __init__(self, switch_door=False):
        self._switch_door = switch_door

    @property
    def switch_door(self):
        return self._switch_door

    @switch_door.setter
    def switch_door(self, iswitch):
        self._switch_door = iswitch

    def choose_door(self):
        return random.randint(0, 2)

class Game(object):
    def __init__(self, player):
        self.player = player

    def non_prize_door(self, door_with_prize, player_choice):
        """Returns a door that doesn't contain the prize and that isn't the player's original choice"""
        x = 1
        while x == door_with_prize or x == player_choice:
            x = (x + 1) % 3  # assuming there are only 3 doors; can be modified for more doors
        return x

    def switch_function(self, open_door, player_choice):
        """Returns the door that isn't the original player choice and isn't the opened door"""
        x = 1
        while x == open_door or x == player_choice:
            x = (x + 1) % 3  # assuming there are only 3 doors; can be modified for more doors
        return x

    def play_game(self):
        """Game logic"""
        # randomly place the prize behind one of the three doors
        door_with_prize = random.randint(0, 2)
        # player chooses a door
        player_choice = self.player.choose_door()
        # host opens a door that doesn't contain the prize
        open_door = self.non_prize_door(door_with_prize, player_choice)
        # final player choice
        if self.player.switch_door:
            player_choice = self.switch_function(open_door, player_choice)
        # result
        return player_choice == door_with_prize
Code for running it without multiprocessing:
from MontyHall.game import Game
from MontyHall.player import Player
from Timer.timer import Timer

def main():
    # setting up the game
    player_2 = Player(True)  # always switch
    game_1 = Game(player_2)
    # testing out the hypothesis
    with Timer('timer_1') as t:
        results = []
        for i in range(10000000):
            results.append(game_1.play_game())
    win = results.count(True) / len(results)
    loss = results.count(False) / len(results)
    print(
        f'When switch strategy is {player_2.switch_door}, the win rate is {win:.2%} and the loss rate is {loss:.2%}')

if __name__ == '__main__':
    main()
As you did not give the full code that we can run locally, I can only speculate. My guess is that you are passing an object (a method of your game) to the other processes, so the pickling and unpickling takes too much time. Unlike multithreading, where you can "share" data, in multiprocessing you need to pack the data up and send it to the other process.
However, there's a rule I always follow when I try to optimize my code - profile before optimizing! It is much better to KNOW what's slow than to GUESS.
It's a multiprocessing program, so there are not a lot of profiler options out there. You could try viztracer, which supports multiprocessing.
pip install viztracer
viztracer --log_multiprocess your_program.py
It will generate a result.html that you can open with Chrome. Or you can just do
vizviewer result.html
I would suggest reducing the iteration count so you can see the whole picture (viztracer uses a circular buffer, and 10 million iterations will definitely overflow it). Even if you don't, you can still get the last piece of your code executing, which should be helpful enough to figure out what's going on.
I used viztracer once you posted the whole code. Looking at a single iteration in your worker process, the actual working part is very small (the yellowish slice in the middle); most of the time is spent receiving and putting data, which eliminates the advantage of parallelization.
The correct way to do this is to work in batches. Since this game does not actually require any input data, you should just send "do this 1000 times" to each process and let it run, instead of sending the method one call at a time.
There's another interesting problem that you can easily find with viztracer: in the big picture of the worker process there is a large stretch of "nothing" at the end. Your workers need the 1-second timeout to finish, and that gap is them waiting for it. You should come up with a more elegant way to finish your worker processes.
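One common pattern for that is a "poison pill" - a sketch, not tested against this code: enqueue one None sentinel per worker after the real tasks, so each worker exits as soon as it sees the sentinel instead of waiting out a timeout.

import multiprocessing

def do_work(input_queue, output_queue):
    while True:
        task = input_queue.get()   # block until something arrives
        if task is None:           # sentinel: no more work is coming
            break
        f, args = task
        output_queue.put(f(*args))

# after queueing the real tasks:
# for _ in range(num_workers):
#     input_queue.put(None)       # one sentinel per worker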
Updated my code. I fundamentally misunderstood the multiprocessing method.
import multiprocessing
from MontyHall.game import Game
from MontyHall.player import Player
from Timer.timer import Timer

def do_work(input, output):
    """Generic function that takes an input function and argument and runs it"""
    while True:
        try:
            f, args = input.get(timeout=1)
            results = f(*args)
            output.put(results)
        except:  # queue empty after the timeout: this worker is finished
            output.put('Done')
            break

def run_sim(game, num_sim):
    """Runs the game the given number of times"""
    res = []
    for i in range(num_sim):
        res.append(game.play_game())
    return res

def main():
    input_queue = multiprocessing.Queue()
    output_queue = multiprocessing.Queue()
    g = Game(Player(False))  # set up game and player
    num_sim = 2000000
    for i in range(5):
        # run_sim with the game object and number of simulations passed into the queue
        input_queue.put((run_sim, (g, num_sim)))
    with Timer('Monty Hall Timer: ') as t:
        processes = []  # list to save processes
        for i in range(5):
            p = multiprocessing.Process(target=do_work, args=(input_queue, output_queue))
            processes.append(p)
            p.start()
        results = []
        done_count = 0
        # wait for one 'Done' sentinel per worker, so a fast worker's
        # sentinel can't end the loop before slower workers report results
        while done_count < len(processes):
            r = output_queue.get()
            if r == 'Done':
                done_count += 1
            else:
                results.append(r)
        # terminate processes
        for p in processes:
            p.terminate()
    # combine the five returned lists
    flat_list = [item for sublist in results for item in sublist]
    print(len(flat_list))
    print(len(results))

if __name__ == '__main__':
    main()

While loop in Python doesn't end when it contains a lock

I'm currently learning to use threads in Python, and I'm playing around with this dummy bit of code for practice:
import threading
import queue
import time

my_queue = queue.Queue()
lock = threading.Lock()

for i in range(5):
    my_queue.put(i)

def something_useful(CPU_number):
    while not my_queue.empty():
        lock.acquire()
        print("\n CPU_C " + str(CPU_number) + ": " + str(my_queue.get()))
        lock.release()
    print("\n CPU_C " + str(CPU_number) + ": the next line is the return")
    return

number_of_threads = 8
practice_threads = []
for i in range(number_of_threads):
    thread = threading.Thread(target=something_useful, args=(i, ))
    practice_threads.append(thread)
    thread.start()
All this does is create a queue with 5 items, and pull them out and print them with different threads.
What I noticed, though, is that some of the threads aren't terminating properly. For example, if I later add something to the queue (e.g. my_queue.put(7)) then some thread will instantly print that number.
That's why I added the last print line print("\n CPU_C " + str(CPU_number) + ": the next line is the return"), and I noticed that only one thread will terminate. In other words, when I run the code above, only one thread will print "the next line is the return".
The weird thing is, this issue disappears when I remove the lock. Without the lock, it works perfectly fine.
What am I missing?
Actually, it's not just one thread that prints the next line is the return - anywhere between 1 and 8 of them can. In my runs I variously got threads 1,3,4,5,6,7 or 1,2,3,4,5,6,7 or 1,4,5,6,7 or only 5,6,7, etc.
You have a race condition between the while check not my_queue.empty() and the lock.acquire(). Essentially, .empty() can report "it is not empty", but before you acquire the lock another thread may have taken that last value out; your my_queue.get() then blocks forever, because get() waits for an item by default - which is why a hung thread springs to life when you later put(7). Hence you need to do the check and the get while holding the lock.
Here is a safer implementation:
import threading
import queue
import time

my_queue = queue.Queue()
lock = threading.Lock()

for i in range(50):
    my_queue.put(i)

def something_useful(CPU_number):
    while True:
        lock.acquire()
        if not my_queue.empty():
            print("CPU_C " + str(CPU_number) + ": " + str(my_queue.get()))
            lock.release()
        else:
            lock.release()
            break
    print("CPU_C " + str(CPU_number) + ": the next line is the return")
    return

number_of_threads = 8
practice_threads = []
for i in range(number_of_threads):
    thread = threading.Thread(target=something_useful, args=(i, ))
    practice_threads.append(thread)
    thread.start()
Note: in your current code, because the get happens inside the lock, the lock serializes everything - only one thread at a time runs the whole loop body. Ideally you would release the lock before the heavy work:

if not my_queue.empty():
    val = my_queue.get()
    lock.release()
    print("CPU_C " + str(CPU_number) + ": " + str(val))
    heavy_processing(val)  # while this is going on, another thread can read the next val
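For what it's worth, queue.Queue already does its own internal locking, so a lock-free variant of the worker is possible. A sketch under the same setup as above: get_nowait() makes the check and the take a single atomic operation, with queue.Empty as the exit condition.

import queue
import threading

my_queue = queue.Queue()
for i in range(50):
    my_queue.put(i)

def something_useful(CPU_number):
    while True:
        try:
            val = my_queue.get_nowait()  # atomic check-and-take
        except queue.Empty:
            break                        # queue exhausted: exit cleanly
        print("CPU_C " + str(CPU_number) + ": " + str(val))
        # heavy work on val can happen here without blocking other threads
    print("CPU_C " + str(CPU_number) + ": the next line is the return")

for i in range(8):
    threading.Thread(target=something_useful, args=(i,)).start()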

Daemon thread not exiting despite main program finishing

I've already referred to this thread, but it seems to be outdated and there doesn't seem to be a clean explanation:
Python daemon thread does not exit when parent thread exits
I'm running Python 3.6 and trying to run the script from either IDLE or the Spyder IDE.
Here is my code:
import threading
import time

total = 4

def creates_items():
    global total
    for i in range(10):
        time.sleep(2)
        print('added item')
        total += 1
    print('creation is done')

def creates_items_2():
    global total
    for i in range(7):
        time.sleep(1)
        print('added item')
        total += 1
    print('creation is done')

def limits_items():
    # print('finished sleeping')
    global total
    while True:
        if total > 5:
            print('overload')
            total -= 3
            print('subtracted 3')
        else:
            time.sleep(1)
            print('waiting')

limitor = threading.Thread(target=limits_items, daemon=True)
creator1 = threading.Thread(target=creates_items)
creator2 = threading.Thread(target=creates_items_2)

print(limitor.isDaemon())
creator1.start()
creator2.start()
limitor.start()

creator1.join()
creator2.join()

print('our ending value of total is', total)
The limitor thread doesn't seem to end despite being a daemon thread.
Is there a way to get this working from IDLE or Spyder?
Thanks.
I had the same problem and solved it by using multiprocessing instead of threading:
from multiprocessing import Process
import multiprocessing
from time import sleep

def daemon_thread():
    for _ in range(10):
        sleep(1)
        print("Daemon")

if __name__ == '__main__':
    multiprocessing.freeze_support()
    sub_process = Process(target=daemon_thread, daemon=True)
    sub_process.start()
    print("Exiting Main")
I haven't yet really understood why I need the call to freeze_support() but it makes the code work.
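For comparison, a minimal threading sketch (assuming it is run as a plain python script.py rather than inside IDLE or Spyder) shows the documented daemon behavior: the daemon thread is killed as soon as the main thread exits. IDEs often keep the interpreter process alive after the script ends, which is why the daemon can appear not to terminate there.

import threading
import time

def background():
    while True:
        time.sleep(1)
        print("daemon tick")  # stops appearing once the process exits

t = threading.Thread(target=background, daemon=True)
t.start()
time.sleep(3)
print("Exiting Main")  # the daemon thread dies with the process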

Print value from inside a thread during its execution

I've just been trying to get threading working properly and I've hit a problem. The default thread module doesn't seem to be able to return values, so I looked up a solution and found this answer - how to get the return value from a thread in python?
I've got this working for running multiple threads, but I can't seem to print any values from inside a thread until they've all finished. Here is the code I currently have:
import random
from multiprocessing.pool import ThreadPool

# only 2 threads for now to make sure they don't all finish at once
pool = ThreadPool(processes=2)

# should take a few seconds to process
def printNumber(number):
    num = random.randint(50000, 500000)
    for i in range(num):
        if i % 10000 == 0:
            print "Thread " + str(number) + " progress: " + str(i)
        test = random.uniform(0, 10) ** random.uniform(0, 1)
    return number

thread_list = []
# execute threads
for i in range(1, 10):
    m = pool.apply_async(printNumber, (i,))
    thread_list.append(m)

# wait for values and get output
totalNum = 0
for i in range(len(thread_list)):
    totalNum += thread_list[i].get()
    print "Thread finished"

# demonstrates that the main process waited for threads to complete
print "Done"
What happens is you get 9x "Thread finished", then "Done", and only then everything that was printed by the threads.
However, remove the "Wait for values" part and it prints them correctly. Is there any way I can keep the waiting for completion, but print things from inside the function as it runs?
Edit: Here is the output (a bit long to add to the post); it weirdly reverses the print order - http://pastebin.com/9ZRhg52Q
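If the prints are merely being buffered rather than lost (IDLE and some consoles buffer output from worker threads), a hedged tweak is to flush stdout inside the worker and consume results with imap_unordered as each one finishes, so progress lines interleave with the "Thread finished" messages. A sketch under that assumption:

import sys
import random
from multiprocessing.pool import ThreadPool

def printNumber(number):
    num = random.randint(50000, 500000)
    for i in range(num):
        if i % 10000 == 0:
            print("Thread " + str(number) + " progress: " + str(i))
            sys.stdout.flush()  # push the line out immediately
    return number

pool = ThreadPool(processes=2)
totalNum = 0
# imap_unordered yields each result as soon as that worker finishes,
# instead of blocking on thread_list[0].get() first
for result in pool.imap_unordered(printNumber, range(1, 10)):
    totalNum += result
    print("Thread finished")
print("Done")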

Is there a way to stop a Python thread when the correct answer is found?

I'm trying to check a list of answers like so:
import zipfile
from threading import Thread

def checkAns(File, answer):
    answer = bytes(answer, "UTF-8")
    try:
        File.extractall(pwd=answer)
    except:
        pass
    else:
        print("[+] Correct Answer: " + answer.decode("UTF-8") + "\n")

def main():
    File = zipfile.ZipFile("questions.zip")
    ansFile = open("answers.txt")
    for line in ansFile.readlines():
        answer = line.strip("\n")
        t = Thread(target=checkAns, args=(File, answer))
        t.start()
Assume the correct answer is 4 and your list contains values 1 through 1000000.
How do I get it to stop after it gets to 4 and not run through the remaining numbers in the list?
I have tried it several different ways:
else:
    print("[+] Correct Answer: " + answer.decode("UTF-8") + "\n")
    exit(0)
and also
try:
    File.extractall(pwd=answer)
    print("[+] Correct Answer: " + answer.decode("UTF-8") + "\n")
    exit(0)
except:
    pass
How do I get all the threads to stop after the correct answer is found?
Strangely, in Python you can't kill threads:

    Python's Thread class supports a subset of the behavior of Java's Thread class; currently, there are no priorities, no thread groups, and threads cannot be destroyed, stopped, suspended, resumed, or interrupted.

https://docs.python.org/2/library/threading.html#threading.ThreadError
This sample creates a thread that will run for 10 seconds. The parent then waits two seconds, decides it is "done", and join()s the outstanding threads before exiting cleanly.
import sys, threading, time

class MyThread(threading.Thread):
    def run(self):
        for _ in range(10):
            print 'ding'
            time.sleep(1)

MyThread().start()
time.sleep(2)

print 'joining threads'
for thread in threading.enumerate():
    if thread is not threading.current_thread():
        thread.join()
print 'done'
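Since threads can't be killed, the usual workaround is cooperative: share a threading.Event, have each thread check it before doing work, and set it once the answer is found. A sketch along the lines of the question's code (checkAns, questions.zip, and answers.txt come from the question; sharing one ZipFile across threads is assumed to be acceptable here):

import zipfile
from threading import Thread, Event

found = Event()

def checkAns(File, answer, found):
    if found.is_set():          # someone already got it: do nothing
        return
    try:
        File.extractall(pwd=bytes(answer, "UTF-8"))
    except Exception:
        return                  # wrong password, let the thread end
    print("[+] Correct Answer: " + answer + "\n")
    found.set()                 # signal every other thread to stop

def main():
    File = zipfile.ZipFile("questions.zip")
    threads = []
    with open("answers.txt") as ansFile:
        for line in ansFile:
            if found.is_set():
                break           # stop launching new attempts
            t = Thread(target=checkAns, args=(File, line.strip("\n"), found))
            t.start()
            threads.append(t)
    for t in threads:
        t.join()

if __name__ == '__main__':
    main()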
