I am new to Python programming. I understand that a process can be a daemon, but I can't see the use case for a thread in daemon mode. Could the Python gurus help me understand this?
Here is some basic code using threading:
import Queue
import threading

def basic_worker(queue):
    while True:
        item = queue.get()
        # do_work(item)
        print(item)
        queue.task_done()

def basic():
    # http://docs.python.org/library/queue.html
    queue = Queue.Queue()
    for i in range(3):
        t = threading.Thread(target=basic_worker, args=(queue,))
        t.daemon = True
        t.start()
    for item in range(4):
        queue.put(item)
    queue.join()  # block until all tasks are done
    print('got here')

basic()
When you run it, you get
% test.py
0
1
2
3
got here
Now comment out the line:
t.daemon = True
Run it again, and you'll see that the script prints the same result, but hangs.
The main thread ends (note that got here was printed), but the worker threads never finish, because each one is blocked in queue.get() waiting for more items.
In contrast, when t.daemon is set to True, the thread t is terminated when the main thread ends.
Note that "daemon threads" have little to do with daemon processes.
People tend to use Queue to explain threading, but I think there is a much simpler way to demo a daemon thread: time.sleep().
Create a daemon thread by setting the daemon parameter (it defaults to None, which means the value is inherited from the creating thread):
from threading import Thread
import time

def worker():
    time.sleep(3)
    print('daemon done')

thread = Thread(target=worker, daemon=True)
thread.start()
print('main done')
Output:
main done
Process finished with exit code 0
Remove the daemon argument, like:
thread = Thread(target=worker)
Re-run and see the output:
main done
daemon done
Process finished with exit code 0
Here we can already see what distinguishes a daemon thread:
The entire Python program can exit when only daemon threads are left.
isDaemon() and setDaemon() are the old getter/setter API (deprecated since Python 3.10). Using the constructor argument, as above, or the daemon property is recommended.
The Queue module was renamed to queue in Python 3 to better reflect the fact that there are several queue classes (LIFO, FIFO, priority) in the module,
so change the import accordingly when using the Queue example above.
In simple words...
What is a Daemon thread?
Daemon threads can be shut down at any point in their flow, whereas non-daemon threads (i.e. user threads) run to completion.
Daemon threads run intermittently in the background as long as other non-daemon threads are running.
When all of the non-daemon threads are complete, daemon threads terminate automatically (no matter whether they have fully executed or not).
Daemon threads are service providers for user threads running in the same process.
Python does not wait for running daemon threads to complete, and NOT EVEN their finally blocks are guaranteed to run; it only waits for the non-daemon threads that we create.
Daemon threads act like services in an operating system.
Python stops the daemon threads when all user threads (in contrast to the daemon threads) have terminated. Hence daemon threads can be used to implement, for example, monitoring functionality, because the thread is stopped by Python as soon as all user threads have stopped.
In a nutshell
If you do something like this
thread = Thread(target=worker_method, daemon=True)
there is NO guarantee that worker_method will get executed completely.
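To see that lack of guarantee in practice, here is a minimal sketch. It assumes CPython and runs a small script in a separate interpreter (via subprocess) so that the exit behaviour can be captured; CHILD_SCRIPT and run_child are made-up names for this illustration. The daemon thread is still sleeping inside its try block when the main thread ends, so neither its final print nor its finally block should appear in the output.

```python
import subprocess
import sys
import textwrap

# Script executed in a child interpreter so its shutdown can be observed.
CHILD_SCRIPT = textwrap.dedent("""
    import threading
    import time

    def worker_method():
        try:
            time.sleep(5)              # still sleeping when the main thread ends
            print("worker finished")
        finally:
            print("worker cleanup")    # not reached if the thread is cut off

    threading.Thread(target=worker_method, daemon=True).start()
    time.sleep(0.2)                    # let the worker enter the try block
    print("main done")
""")

def run_child():
    """Run the script in a fresh interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout

if __name__ == "__main__":
    print(run_child())
```

Dropping daemon=True from the child script makes the interpreter wait the full five seconds, and both worker lines are printed.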
Where is this behaviour useful?
Consider two threads, t1 (the parent thread) and t2 (the child thread). Let t2 be a daemon. If you want to analyze how t1 behaves while it is running, you can put the code that does this in t2.
Reference:
StackOverflow - What is a daemon thread in Java?
GeeksForGeeks - Python daemon threads
TutorialsPoint - Concurrency in Python - Threads
Official Python Documentation
I've adapted @unutbu's answer for Python 3. Make sure that you run this script from the command line and not from an interactive environment like a Jupyter notebook.
import queue
import threading

def basic_worker(q):
    while True:
        item = q.get()
        # do_work(item)
        print(item)
        q.task_done()

def basic():
    q = queue.Queue()
    for item in range(4):
        q.put(item)
    for i in range(3):
        t = threading.Thread(target=basic_worker, args=(q,))
        t.daemon = True
        t.start()
    q.join()  # block until all tasks are done
    print('got here')

basic()
So when you comment out the daemon line, you'll notice that the program does not finish; you'll have to interrupt it manually.
Setting the threads to daemon threads makes sure that they are killed once the main thread has finished.
Note: you could achieve the same thing here without daemon threads if you replaced the infinite while loop with another condition:
def basic_worker(q):
    while not q.empty():
        item = q.get()
        # do_work(item)
        print(item)
        q.task_done()
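One caveat with the q.empty() condition: with several workers, another thread can take the last item between the empty() check and the get() call, which would leave get() blocked forever. A minimal sketch that sidesteps this, assuming every item is queued before the workers start, is to call get_nowait() and treat queue.Empty as the signal to stop. The names drain_worker, results and run are made up for this illustration.

```python
import queue
import threading

def drain_worker(q, results):
    """Process items until the queue is empty, then let the thread end."""
    while True:
        try:
            item = q.get_nowait()   # never blocks; raises queue.Empty instead
        except queue.Empty:
            return                  # returning from the target ends the thread
        results.append(item)        # stands in for do_work(item)
        q.task_done()

def run(n_items=4, n_workers=3):
    q = queue.Queue()
    for item in range(n_items):     # fill the queue BEFORE starting workers
        q.put(item)
    results = []
    threads = [
        threading.Thread(target=drain_worker, args=(q, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:               # non-daemon threads, so joining is safe
        t.join()
    return sorted(results)

if __name__ == "__main__":
    print(run())
```

This only works when no producer adds items later; for a long-running producer, a sentinel value or an Event is the more usual choice.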
I have code like the following:
import threading

def run():
    while True:
        doSomething()

def main():
    thread = threading.Thread(target=run)
    thread.daemon = True
    thread.start()
    doSomethingElse()
If I write code like the above, the daemon thread will exit when the main thread exits, but it may still be in the middle of doSomething.
The main function is called from outside, and I am not allowed to use join in the main thread.
Is there any way to make the daemon thread exit gracefully when the main thread completes?
You can use a threading.Event to signal the child thread from the main thread when it should exit.
Example:
import threading

class DaemonThread(threading.Thread):
    def __init__(self):
        super().__init__()  # initialise the underlying Thread
        self.shutdown_flag = threading.Event()

    def run(self):
        # An Event object is always truthy, so test is_set() explicitly
        while not self.shutdown_flag.is_set():
            # Run your code here
            pass

def main_thread():
    daemon_thread = DaemonThread()
    daemon_thread.daemon = True
    daemon_thread.start()
    # Stop your thread
    daemon_thread.shutdown_flag.set()
    daemon_thread.join()
You are not allowed to use join, but you can set an Event and leave the daemon flag off. The official documentation says:
Note: Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an Event.
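A minimal sketch of that advice, under the assumption that the worker's loop can check a flag between iterations: the thread is non-daemonic, so the interpreter itself waits for it at exit and the caller of main() never has to join. StoppableWorker, iterations and cleaned_up are hypothetical names, and the short sleep stands in for doSomethingElse().

```python
import threading
import time

class StoppableWorker(threading.Thread):
    def __init__(self):
        super().__init__(daemon=False)   # non-daemon: Python waits for it at exit
        self.shutdown_flag = threading.Event()
        self.iterations = 0
        self.cleaned_up = False

    def run(self):
        try:
            # wait() doubles as an interruptible sleep: it returns False on
            # timeout and True as soon as the flag is set.
            while not self.shutdown_flag.wait(timeout=0.01):
                self.iterations += 1     # stands in for doSomething()
        finally:
            self.cleaned_up = True       # cleanup runs because the exit is cooperative

def main():
    worker = StoppableWorker()
    worker.start()
    try:
        time.sleep(0.1)                  # stands in for doSomethingElse()
    finally:
        worker.shutdown_flag.set()       # signal the worker; no join() here
    return worker

if __name__ == "__main__":
    w = main()
    # The interpreter waits for the non-daemon worker before exiting.
```

The try/finally in main() means the flag is set even if the foreground work raises, so the process is not left hanging on a worker that was never told to stop.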
What I cannot figure out is why, although ThreadPoolExecutor uses daemon workers, they keep running even after the main thread exits.
Here is a minimal example in Python 3.6.4:
import concurrent.futures
import time

def fn():
    while True:
        time.sleep(5)
        print("Hello")

thread_pool = concurrent.futures.ThreadPoolExecutor()
thread_pool.submit(fn)
while True:
    time.sleep(1)
    print("Wow")
Both the main thread and the worker thread run infinite loops. So if I use KeyboardInterrupt to terminate the main thread, I expect the whole program to terminate too. But the worker thread actually keeps running even though it is a daemon thread.
The source code of ThreadPoolExecutor confirms that worker threads are daemon threads:
t = threading.Thread(target=_worker,
                     args=(weakref.ref(self, weakref_cb),
                           self._work_queue))
t.daemon = True
t.start()
self._threads.add(t)
Further, if I manually create a daemon thread, it works like a charm:
from threading import Thread
import time

def fn():
    while True:
        time.sleep(5)
        print("Hello")

thread = Thread(target=fn)
thread.daemon = True
thread.start()
while True:
    time.sleep(1)
    print("Wow")
So I really cannot figure out this strange behavior.
Suddenly... I found out why. Reading further into the source code of ThreadPoolExecutor:
# Workers are created as daemon threads. This is done to allow the interpreter
# to exit when there are still idle threads in a ThreadPoolExecutor's thread
# pool (i.e. shutdown() was not called). However, allowing workers to die with
# the interpreter has two undesirable properties:
#   - The workers would still be running during interpreter shutdown,
#     meaning that they would fail in unpredictable ways.
#   - The workers could be killed while evaluating a work item, which could
#     be bad if the callable being evaluated has external side-effects e.g.
#     writing to a file.
#
# To work around this problem, an exit handler is installed which tells the
# workers to exit when their work queues are empty and then waits until the
# threads finish.

_threads_queues = weakref.WeakKeyDictionary()
_shutdown = False

def _python_exit():
    global _shutdown
    _shutdown = True
    items = list(_threads_queues.items())
    for t, q in items:
        q.put(None)
    for t, q in items:
        t.join()

atexit.register(_python_exit)
There is an exit handler that joins all unfinished workers, so the interpreter waits for them...
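Since the exit handler waits for the workers, one way to let the program end is to make the submitted function cooperative. The following is only a sketch under that assumption, not what ThreadPoolExecutor does internally: fn receives a threading.Event and leaves its loop once the event is set, and the main loop sets it in a finally block so that a KeyboardInterrupt triggers it too. The stop_event parameter, the counter and the short intervals are made up for the example.

```python
import concurrent.futures
import threading
import time

def fn(stop_event, interval=0.05):
    """Loop until stop_event is set; return how many iterations ran."""
    count = 0
    # Event.wait() returns False on timeout and True once the event is set.
    while not stop_event.wait(timeout=interval):
        count += 1                      # stands in for print("Hello")
    return count

def run(main_iterations=3):
    stop_event = threading.Event()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as thread_pool:
        future = thread_pool.submit(fn, stop_event)
        try:
            for _ in range(main_iterations):
                time.sleep(0.1)         # stands in for the "Wow" loop
        finally:
            stop_event.set()            # also reached on KeyboardInterrupt
        return future.result(timeout=5)

if __name__ == "__main__":
    print(run())
```

Once the worker function returns, the pool's shutdown (and the exit handler's join) has nothing left to wait for, so the process can exit normally.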
Here's a way to avoid this problem. Bad design can be beaten by another bad design. People write daemon=True only if they really know that the worker won't damage any objects or files.
In my case, I created a ThreadPoolExecutor with a single worker, and after a single submit I deleted the newly created thread from _threads_queues so the interpreter won't wait until this thread stops on its own. Notice that worker threads are created after submit, not when the ThreadPoolExecutor is initialized.
import concurrent.futures.thread
from concurrent.futures import ThreadPoolExecutor
...
executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(lambda: self._exec_file(args))
del concurrent.futures.thread._threads_queues[list(executor._threads)[0]]
It works in Python 3.8 but may not work in 3.9+ since this code is accessing private variables.
See the working piece of code on github
I am wondering about the ways to end a worker thread in Python 3.
If you look at this code sample from this question, the worker has a while True loop in it and all I see is that q.task_done() is called.
Why is this worker automatically ended?
Specifically I am interested in:
What options exist to end the worker's infinite loop?
It seems like the only options would be to call break or return, but I am not sure whether those even kill the thread.
To be clear, I actually want this thread to die when its task has completed, and I do not see the ways to kill the thread documented anywhere.
#!python3
import threading
from queue import Queue
import time

# lock to serialize console output
lock = threading.Lock()

def do_work(item):
    time.sleep(.1)  # pretend to do some lengthy work.
    # Make sure the whole print completes or threads can mix up output in one line.
    with lock:
        print(threading.current_thread().name, item)

# The worker thread pulls an item from the queue and processes it
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

# Create the queue and thread pool.
q = Queue()
for i in range(4):
    t = threading.Thread(target=worker)
    t.daemon = True  # thread dies when main thread (only non-daemon thread) exits.
    t.start()

# stuff work items on the queue (in this case, just a number).
start = time.perf_counter()
for item in range(20):
    q.put(item)

q.join()  # block until all tasks are done

# "Work" took .1 seconds per task.
# 20 tasks serially would be 2 seconds.
# With 4 threads should be about .5 seconds (contrived because non-CPU intensive "work")
print('time:', time.perf_counter() - start)
What options exist to end the workers infinite loop?
Run the while loop polling a threading.Semaphore instance (or, more commonly, a threading.Event) rather than a constant True boolean. Signal it from the killing thread when you want to stop the worker, and the worker will drop out of the loop.
If you want the main thread to wait for the worker to finish, signal first and then call thread.join() to block the main thread until the worker has finished doing whatever it needs to do. Just remember to signal before joining, or it will hang ;)
That said, you've daemonized the thread, so you don't need to kill it. The process will die when there are no non-daemon threads left alive. UPDATE: To remove the daemon effect, since you want the thread to exit cleanly, just remove this line:
t.daemon = True # thread dies when main thread (only non-daemon thread) exits.
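On the question of whether break or return ends the thread: a thread finishes as soon as its target function returns, so leaving the loop is enough. A common way to trigger that with a queue is a sentinel value. The sketch below assumes None is never a real work item; SENTINEL, results and run are names invented for the illustration, and item * 2 stands in for do_work.

```python
import queue
import threading

SENTINEL = None  # assumed never to be a real work item

def worker(q, results):
    while True:
        item = q.get()
        try:
            if item is SENTINEL:
                return                  # returning ends this worker thread
            results.append(item * 2)    # stands in for do_work(item)
        finally:
            q.task_done()               # runs for work items and the sentinel

def run(n_items=20, n_workers=4):
    q = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=worker, args=(q, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for item in range(n_items):
        q.put(item)
    for _ in threads:                   # one sentinel per worker
        q.put(SENTINEL)
    for t in threads:                   # non-daemon workers exit on their own
        t.join()
    return sorted(results), threads

if __name__ == "__main__":
    values, _ = run()
    print(values)
```

Because the queue is FIFO, the sentinels are only reached after all real items have been taken, so no work is dropped.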
I have some code that needs to run against several other systems that may hang or have problems not under my control. I would like to use python's multiprocessing to spawn child processes to run independent of the main program and then when they hang or have problems terminate them, but I am not sure of the best way to go about this.
When terminate is called it does kill the child process, but the child then becomes a defunct zombie that is not released until the process object is gone. The example code below, where the loop never ends, works to kill it and allows a respawn when run is called again, but it does not seem like a good way of going about this (i.e. creating the multiprocessing.Process() in __init__() would be better).
Anyone have a suggestion?
import multiprocessing
import time

class Process(object):
    def __init__(self):
        self.thing = Thing()
        self.running_flag = multiprocessing.Value("i", 1)

    def run(self):
        self.process = multiprocessing.Process(target=self.thing.worker, args=(self.running_flag,))
        self.process.start()
        print(self.process.pid)

    def pause_resume(self):
        self.running_flag.value = not self.running_flag.value

    def terminate(self):
        self.process.terminate()

class Thing(object):
    def __init__(self):
        self.count = 1

    def worker(self, running_flag):
        while True:
            if running_flag.value:
                self.do_work()

    def do_work(self):
        print("working {0} ...".format(self.count))
        self.count += 1
        time.sleep(1)
You might run the child processes as daemons in the background.
process.daemon = True
Any errors and hangs (or an infinite loop) in a daemon process will not affect the main process, and the daemon process will only be terminated once the main process exits.
This works for simple problems, until you end up with a lot of child daemon processes that keep consuming memory without any explicit control from the parent process.
The best way is to set up a Queue so that all the child processes can communicate with the parent process, which can then join them and clean up nicely. Here is some simple code that checks whether a child process is hanging (simulated with time.sleep(1000)) and sends a message to the queue for the main process to act on:
import multiprocessing as mp
import time
import queue

running_flag = mp.Value("i", 1)

def worker(running_flag, q):
    count = 1
    while True:
        if running_flag.value:
            print(f"working {count} ...")
            count += 1
            q.put(count)
            time.sleep(1)
            if count > 3:
                # Simulate hanging with sleep
                print("hanging...")
                time.sleep(1000)

def watchdog(q):
    """
    This checks the queue for updates and sends a signal to it
    when the child process isn't sending anything for too long
    """
    while True:
        try:
            msg = q.get(timeout=10.0)
        except queue.Empty as e:
            print("[WATCHDOG]: Maybe WORKER is slacking")
            q.put("KILL WORKER")

def main():
    """The main process"""
    q = mp.Queue()
    workr = mp.Process(target=worker, args=(running_flag, q))
    wdog = mp.Process(target=watchdog, args=(q,))
    # run the watchdog as daemon so it terminates with the main process
    wdog.daemon = True
    workr.start()
    print("[MAIN]: starting process P1")
    wdog.start()
    # Poll the queue
    while True:
        msg = q.get()
        if msg == "KILL WORKER":
            print("[MAIN]: Terminating slacking WORKER")
            workr.terminate()
            time.sleep(0.1)
            if not workr.is_alive():
                print("[MAIN]: WORKER is a goner")
                workr.join(timeout=1.0)
                print("[MAIN]: Joined WORKER successfully!")
                q.close()
                break  # watchdog process daemon gets terminated

if __name__ == '__main__':
    main()
Without terminating the worker, an attempt to join() it from the main process would block forever, since the worker never finishes.
The way Python multiprocessing handles processes is a bit confusing.
From the multiprocessing guidelines:
Joining zombie processes
On Unix when a process finishes but has not been joined it becomes a zombie. There should never be very many because each time a new process starts (or active_children() is called) all completed processes which have not yet been joined will be joined. Also calling a finished process’s Process.is_alive will join the process. Even so it is probably good practice to explicitly join all the processes that you start.
To keep a process from becoming a zombie, you need to call its join() method once you have killed it.
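A small helper along those lines might look like the sketch below. terminate_and_reap is a hypothetical name, the five-second timeout is arbitrary, and the fallback to kill() assumes Python 3.7 or later.

```python
import multiprocessing as mp

def terminate_and_reap(process: mp.Process, timeout: float = 5.0):
    """Terminate a child process, then join it so no zombie is left behind."""
    if process.is_alive():
        process.terminate()        # SIGTERM on Unix
    process.join(timeout)          # join() is what reaps the child
    if process.is_alive():         # the child ignored or blocked SIGTERM
        process.kill()             # SIGKILL on Unix (Python 3.7+)
        process.join(timeout)
    return process.exitcode
```

In the question's wrapper class, terminate() could call such a helper on self.process; on Unix the returned exit code is negative when the child was ended by a signal.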
If you want a simpler way to deal with the hanging calls in your system you can take a look at pebble.
I have a script that does a bunch of things and I want to spawn a thread that monitors the cpu and memory usage of what's happening.
The monitoring portion is:
import psutil
import time
import datetime

def MonitorProcess():
    procname = "firefox"
    while True:
        output_sys = open("/tmp/sysstats_counter.log", 'a')
        for proc in psutil.process_iter():
            if proc.name == procname:
                p = proc
        p.cmdline
        proc_rss, proc_vms = p.get_memory_info()
        proc_cpu = p.get_cpu_percent(1)
        scol1 = str(proc_rss / 1024)
        scol2 = str(proc_cpu)
        now = str(datetime.datetime.now())
        output_sys.write(scol1)
        output_sys.write(", ")
        output_sys.write(scol2)
        output_sys.write(", ")
        output_sys.write(now)
        output_sys.write("\n")
        output_sys.close()
        time.sleep(1)
I'm sure there's a better way to do the monitoring but I don't care at this point.
The main script calls:
RunTasks()  # which runs the foreground tasks
MonitorProcess()  # which is intended to monitor the tasks' CPU and memory usage over time
I want to run both functions simultaneously. To do this I assume that I have to use the threading library. Is the approach then to do something like:
thread = threading.Thread(target=MonitorProcess())
thread.start
Or am I way off?
Also when the RunTasks() function finishes how do I get MonitorProcess() to automatically stop? I assume I could test for the process to be present and if it's not kill the function???
It sounds like you want a daemon thread. From the docs:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property.
In your code:
thread = threading.Thread(target=MonitorProcess)
thread.daemon = True
thread.start()
The program will exit when main exits, even if the daemon thread is still active. You will want to run your foreground tasks after you set up and start your monitoring thread.
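Put together, the ordering might look like the following sketch. It uses only the standard library, so the psutil sampling is replaced by a placeholder that records timestamps; monitor_process, run_tasks and the samples list are stand-ins invented for the illustration, and the short sleeps just keep the example quick.

```python
import threading
import time

def monitor_process(samples, interval=0.01):
    """Background sampler; the loop never ends on its own."""
    while True:
        samples.append(time.monotonic())   # placeholder for CPU/memory readings
        time.sleep(interval)

def run_tasks():
    """Stand-in for the foreground work."""
    time.sleep(0.1)
    return "tasks done"

def main():
    samples = []
    # Pass the function itself (no parentheses) as the target.
    thread = threading.Thread(target=monitor_process, args=(samples,), daemon=True)
    thread.start()                          # start monitoring first
    result = run_tasks()                    # then run the foreground tasks
    return result, len(samples)

if __name__ == "__main__":
    print(main())
    # When the main thread ends here, the daemon monitor is stopped with it.
```

Because a daemon thread can be stopped in the middle of a write, a monitor that must flush its log cleanly is better served by a non-daemon thread plus an Event.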