Python - Cannot join thread - No multiprocessing

I have this piece of code in my program, where the OnDone function is an event handler in a wxPython GUI. When I click the DONE button, the OnDone event fires, does some work, and starts the thread self.tStart with target function StartEnable. I want to join this thread back using self.tStart.join(). However, I am getting the following error:
Exception in thread StartEnablingThread:
Traceback (most recent call last):
File "C:\Python27\lib\threading.py", line 801, in __bootstrap_inner
self.run()
File "C:\Python27\lib\threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "//wagnernt.wagnerspraytech.com/users$/kundemj/windows/my documents/Production GUI/Trial python Codes/GUI_withClass.py", line 638, in StartEnable
self.tStart.join()
File "C:\Python27\lib\threading.py", line 931, in join
raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread
I have not seen this type of error before. Could any of you tell me what I am missing here?
def OnDone(self, event):
    self.WriteToController([0x04], 'GuiMsgIn')
    self.status_text.SetLabel('PRESSURE CALIBRATION DONE \n DUMP PRESSURE')
    self.led1.SetBackgroundColour('GREY')
    self.add_pressure.Disable()
    self.tStart = threading.Thread(target=self.StartEnable, name="StartEnablingThread", args=())
    self.tStart.start()

def StartEnable(self):
    while True:
        time.sleep(0.5)
        if int(self.pressure_text_control.GetValue()) < 50:
            print "HELLO"
            self.start.Enable()
            self.tStart.join()
            print "hello2"
            break
I want to join the thread after the "if" condition has executed. Until then, I want the thread to keep running.

Joining Is Waiting
Joining a thread actually means waiting for another thread to finish.
So, in thread1, there can be code which says:
thread2.join()
That means "stop here and do not execute the next line of code until thread2 is finished".
If you did (in thread1) the following, that would fail with the error from the question:
thread1.join() # RuntimeError: cannot join current thread
Joining Is Not Stopping
Calling thread2.join() does not cause thread2 to stop, nor even signal to it in any way that it should stop.
A thread stops when its target function exits. Often, a thread is implemented as a loop which checks for a signal (a variable) which tells it to stop, e.g.
def run():
    while whatever:
        # ...
        if self.should_abort_immediately:
            print 'aborting'
            return
Then, the way to stop the thread is to do:
thread2.should_abort_immediately = True # tell the thread to stop
thread2.join() # entirely optional: wait until it stops
The Code from the Question
That code already implements the stopping correctly with the break. The join should just be deleted.
if int(self.pressure_text_control.GetValue()) < 50:
    print "HELLO"
    self.start.Enable()
    print "hello2"
    break

When the StartEnable method is executing, it is running on the StartEnablingThread you created in the OnDone method. You cannot join the current thread. This is clearly stated in the documentation for the join call.
join() raises a RuntimeError if an attempt is made to join the current thread as that would cause a deadlock. It is also an error to join() a thread before it has been started and attempts to do so raises the same exception.
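For illustration, here is a minimal, self-contained sketch (not the poster's GUI code) of the difference: joining the worker from another thread is fine, while joining it from inside its own target function is exactly what raises the error above.
import threading
import time

def worker():
    # Simulates the StartEnable-style loop; it simply exits when its work is done.
    time.sleep(0.5)
    print("worker finished")
    # Calling threading.current_thread().join() here would raise
    # "RuntimeError: cannot join current thread".

t = threading.Thread(target=worker, name="StartEnablingThread")
t.start()
t.join()  # fine: the main thread waits for the worker to finish
print("main thread resumes after the worker exits")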

I have some bad news. Threading in Python is largely pointless, and your best bet is to stick with a single thread or use multiprocessing. If you really need threads, you will need to look at a different language like C# or C. Have a look at https://docs.python.org/2/library/multiprocessing.html
The reason threading is pointless in Python is the global interpreter lock (GIL). It only lets one thread execute Python bytecode at a time, so there is no true multi-threading in CPython, although there are people working on alternatives. http://pypy.org/

Related

Why can asyncio event loop sometimes finish a task even when encountering `RuntimeError`?

I've been playing around with Python's asyncio. I think I have a reasonable understanding by now. But the following behavior puzzles me.
test.py:
from threading import Thread
import asyncio

async def wait(t):
    await asyncio.sleep(t)
    print(f'waited {t} sec')

def run(loop):
    loop.run_until_complete(wait(2))

loop = asyncio.get_event_loop()
t = Thread(target=run, args=(loop,))
t.start()
loop.run_until_complete(wait(1))
t.join()
This code is wrong. I know that. The event loop can't be run while it's running, and it's generally not thread safe.
My question: why can wait(1) sometimes still finish its job?
Here's the output from two consecutive runs:
>>> py test.py
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    loop.run_until_complete(wait(1))
  File "C:\Python\Python37\lib\asyncio\base_events.py", line 555, in run_until_complete
    self.run_forever()
  File "C:\Python\Python37\lib\asyncio\base_events.py", line 510, in run_forever
    raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running
waited 2 sec

>>> py test.py
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    loop.run_until_complete(wait(1))
  File "C:\Python\Python37\lib\asyncio\base_events.py", line 555, in run_until_complete
    self.run_forever()
  File "C:\Python\Python37\lib\asyncio\base_events.py", line 510, in run_forever
    raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running
waited 1 sec
waited 2 sec
The first run's behavior is what I expected - the main thread fails, but the event loop still runs wait(2) to finish in the thread t.
The second run is puzzling: how can wait(1) do its job when the RuntimeError has already been thrown? I guess it has to do with thread synchronization and the non-thread-safe nature of the event loop. But I don't know exactly how this works.
Ohhh... never mind. I read the code of asyncio and figured it out. It's actually quite simple.
run_until_complete calls ensure_future(future, loop=self) before it checks self.is_running() (which is done in run_forever). Since the loop is already running, it can pick up the task before the RuntimeError is thrown. Of course it doesn't always happen because of the race condition.
Exceptions are thrown per thread. The runtime error is raised in a different thread from the event loop. The event loop continues to execute, regardless.
And wait(1) can sometimes finish its job because you can get lucky. The asyncio loop internal data structures are not guarded against race conditions caused by using threads (which is why there are specific thread-support methods you should use instead). But the nature of race conditions is such that it depends on the exact order of events, and that order can change each time you run your program, depending on what else your OS is doing at the time.
The run_until_complete() method first calls asyncio.ensure_future() to wrap the coroutine in a task, attaches a 'done' callback that will stop the event loop again, and then calls loop.run_forever(). When the coroutine returns, the callback stops the loop. The loop.run_forever() call throws the RuntimeError here.
When you do this from a thread, the task gets added to a deque object attached to the loop, and if that happens at the right moment (e.g. when the running loop is not busy emptying the queue), the running loop in the main thread will find it, and execute it, even if the loop.run_forever() call raised an exception.
All this relies on implementation details. Different versions of Python will probably exhibit different behaviour here, and if you install an alternative loop (e.g. uvloop), there will almost certainly be different behaviour again.
If you want to schedule coroutines from a different thread, use asyncio.run_coroutine_threadsafe(); with it, the example would look like this:
from threading import Thread
import asyncio

async def wait(t):
    print(f'going to wait {t} seconds')
    await asyncio.sleep(t)
    print(f'waited {t} sec')

def run(loop):
    asyncio.run_coroutine_threadsafe(wait(2), loop)

loop = asyncio.get_event_loop()
t = Thread(target=run, args=(loop,))
t.start()
loop.run_until_complete(wait(1))
t.join()
The above doesn't actually complete the wait(2) coroutine because the wait(1) coroutine is being run with loop.run_until_complete() so its callback stops the loop again before the 2 second wait is over. But the coroutine is actually started:
going to wait 1 seconds
going to wait 2 seconds
waited 1 sec
but if you made the main-thread coroutine take longer (with, say, wait(3)) then the one scheduled from the thread would also complete. You'd have to do additional work to ensure that there are no more pending tasks scheduled to run with the loop before you shut it down.
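For instance, here is a sketch of that approach (my own illustration, not part of the original answer; the timings are arbitrary): the worker thread blocks on the concurrent.futures.Future returned by run_coroutine_threadsafe(), while the main thread keeps the loop alive long enough with wait(3).
from threading import Thread
import asyncio

async def wait(t):
    print(f'going to wait {t} seconds')
    await asyncio.sleep(t)
    print(f'waited {t} sec')
    return t

def run(loop):
    # Schedule the coroutine on the loop owned by the main thread and block
    # this worker thread until the coroutine has actually finished.
    future = asyncio.run_coroutine_threadsafe(wait(2), loop)
    print('thread got result:', future.result())

loop = asyncio.get_event_loop()
t = Thread(target=run, args=(loop,))
t.start()
loop.run_until_complete(wait(3))  # keeps the loop running long enough
t.join()
loop.close()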

When is the right time to call loop.close()?

I have been experimenting with asyncio for a little while and read the PEPs; a few tutorials; and even the O'Reilly book.
I think I got the hang of it, but I'm still puzzled by the behavior of loop.close(): I can't quite figure out when it is "safe" to invoke it.
Distilled to its simplest, my use case is a bunch of blocking "old school" calls, which I wrap in the run_in_executor() and an outer coroutine; if any of those calls goes wrong, I want to stop progress, cancel the ones still outstanding, print a sensible log and then (hopefully, cleanly) get out of the way.
Say, something like this:
import asyncio
import time

def blocking(num):
    time.sleep(num)
    if num == 2:
        raise ValueError("don't like 2")
    return num

async def my_coro(loop, num):
    try:
        result = await loop.run_in_executor(None, blocking, num)
        print(f"Coro {num} done")
        return result
    except asyncio.CancelledError:
        # Do some cleanup here.
        print(f"man, I was canceled: {num}")

def main():
    loop = asyncio.get_event_loop()
    tasks = []
    for num in range(5):
        tasks.append(loop.create_task(my_coro(loop, num)))
    try:
        # No point in waiting; if any of the tasks go wrong, I
        # just want to abandon everything. The ALL_DONE is not
        # a good solution here.
        future = asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
        done, pending = loop.run_until_complete(future)
        if pending:
            print(f"Still {len(pending)} tasks pending")
        # I tried putting a stop() - with/without a run_forever()
        # after the for - same exception raised.
        # loop.stop()
        for future in pending:
            future.cancel()
        for task in done:
            res = task.result()
            print("Task returned", res)
    except ValueError as error:
        print("Outer except --", error)
    finally:
        # I also tried placing the run_forever() here,
        # before the stop() - no dice.
        loop.stop()
        if pending:
            print("Waiting for pending futures to finish...")
            loop.run_forever()
        loop.close()
I tried several variants of the stop() and run_forever() calls; the "run_forever first, then stop" order seems to be the one to use according to the pydoc, and without the call to close() it yields a satisfying:
Coro 0 done
Coro 1 done
Still 2 tasks pending
Task returned 1
Task returned 0
Outer except -- don't like 2
Waiting for pending futures to finish...
man, I was canceled: 4
man, I was canceled: 3
Process finished with exit code 0
However, when the call to close() is added (as shown above) I get two exceptions:
exception calling callback for <Future at 0x104f21438 state=finished returned int>
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 324, in _invoke_callbacks
callback(self)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/futures.py", line 414, in _call_set_state
dest_loop.call_soon_threadsafe(_set_state, destination, source)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 620, in call_soon_threadsafe
self._check_closed()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 357, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
which is at best annoying and, to me, totally puzzling; to make matters worse, I've been unable to figure out what The Right Way of handling such a situation would be.
Thus, two questions:
What am I missing? How should I modify the code above so that, with the call to close() included, it does not raise?
What actually happens if I don't call close()? In this trivial case, I presume it's largely redundant, but what might the consequences be in "real" production code?
For my own personal satisfaction, also:
Why does it raise at all? What more does the loop want from the coros/tasks? They either exited, raised, or were canceled: isn't this enough to keep it happy?
Many thanks in advance for any suggestions you may have!
Distilled to its simplest, my use case is a bunch of blocking "old school" calls, which I wrap in the run_in_executor() and an outer coroutine; if any of those calls goes wrong, I want to stop progress, cancel the ones still outstanding
This can't work as envisioned because run_in_executor submits the function to a thread pool, and OS threads can't be cancelled in Python (or in other languages that expose them). Canceling the future returned by run_in_executor will attempt to cancel the underlying concurrent.futures.Future, but that will only have effect if the blocking function is not yet running, e.g. because the thread pool is busy. Once it starts to execute, it cannot be safely cancelled. Support for safe and reliable cancellation is one of the benefits of using asyncio compared to threads.
If you are dealing with synchronous code, be it a legacy blocking call or longer-running CPU-bound code, you should run it with run_in_executor and incorporate a way to interrupt it. For example, the code could occasionally check a stop_requested flag and exit if that is true, perhaps by raising an exception. Then you can "cancel" those tasks by setting the appropriate flag or flags.
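For example, a minimal sketch of that flag-based approach (my own illustration, not code from the question; the stop_requested name is an assumption):
import asyncio
import threading
import time

def blocking(num, stop_requested):
    # Sleep in small slices so the stop flag is checked regularly.
    for _ in range(num * 10):
        if stop_requested.is_set():
            raise RuntimeError(f"blocking({num}) interrupted")
        time.sleep(0.1)
    return num

async def main():
    loop = asyncio.get_running_loop()
    stop_requested = threading.Event()
    futures = [loop.run_in_executor(None, blocking, n, stop_requested)
               for n in (1, 5)]
    done, pending = await asyncio.wait(futures, timeout=2)
    stop_requested.set()                    # "cancel" the workers still running
    done2, _ = await asyncio.wait(pending)  # wait until they actually exit
    for fut in done2:
        print("interrupted worker raised:", fut.exception())

asyncio.run(main())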
How should I modify the code above so that, with the call to close() included, it does not raise?
As far as I can tell, there is currently no way to do so without modifications to blocking and the top-level code. run_in_executor will insist on informing the event loop of the result, and this fails when the event loop is closed. It doesn't help that the asyncio future is cancelled, because the cancellation check is performed in the event loop thread, and the error occurs before that, when call_soon_threadsafe is called by the worker thread. (It might be possible to move the check to the worker thread, but it should be carefully analyzed whether it leads to a race condition between the call to cancel() and the actual check.)
Why does it raise at all? What more does the loop want from the coros/tasks? They either exited, raised, or were canceled: isn't this enough to keep it happy?
It wants the blocking functions passed to run_in_executor (literally called blocking in the question) that have already been started to finish running before the event loop is closed. You cancelled the asyncio future, but the underlying concurrent future still wants to "phone home", finding the loop closed.
It is not obvious whether this is a bug in asyncio, or if you are simply not supposed to close an event loop until you somehow ensure that all work submitted to run_in_executor is done. Doing so requires the following changes:
Don't attempt to cancel the pending futures. Canceling them looks correct superficially, but it prevents you from being able to wait() for those futures, as asyncio will consider them complete.
Instead, send an application-specific event to your background tasks informing them that they need to abort.
Call loop.run_until_complete(asyncio.wait(pending)) before loop.close().
With these modifications (except for the application-specific event - I simply let the sleep()s finish their course), the exception did not appear.
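Concretely, a sketch of the question's main() rewritten along those lines (the application-specific event is omitted here; as noted above, the sleeps are simply allowed to finish):
import asyncio
import time

def blocking(num):
    time.sleep(num)
    if num == 2:
        raise ValueError("don't like 2")
    return num

async def my_coro(loop, num):
    print(f"Coro {num} done:", await loop.run_in_executor(None, blocking, num))

def main():
    loop = asyncio.get_event_loop()
    tasks = [loop.create_task(my_coro(loop, n)) for n in range(5)]
    done, pending = loop.run_until_complete(
        asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION))
    for task in done:
        if task.exception():
            print("Task failed:", task.exception())
    # Do NOT cancel `pending`; just wait for the remaining executor jobs.
    if pending:
        print(f"Waiting for {len(pending)} pending tasks to finish...")
        loop.run_until_complete(asyncio.wait(pending))
    loop.close()  # no RuntimeError: every executor job has reported back

main()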
What actually happens if I don't call close()? In this trivial case, I presume it's largely redundant, but what might the consequences be in "real" production code?
Since a typical event loop runs as long as the application, there should be no issue in not calling close() at the very end of the program. The operating system will clean up the resources on program exit anyway.
Calling loop.close() is important for event loops that have a clear lifetime. For example, a library might create a fresh event loop for a specific task, run it in a dedicated thread, and dispose of it. Failing to close such a loop could leak its internal resources (such as the pipe it uses for inter-thread wakeup) and cause the program to fail. Another example is test suites, which often start a new event loop for each unit test to ensure separation of test environments.
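As an illustration of such a clear lifetime (a hypothetical helper, not something from the answer):
import asyncio
import threading

def start_background_loop():
    """Create a fresh event loop and run it forever in a dedicated thread."""
    loop = asyncio.new_event_loop()

    def _run():
        asyncio.set_event_loop(loop)
        loop.run_forever()

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return loop, thread

loop, thread = start_background_loop()
# ... submit work with asyncio.run_coroutine_threadsafe(coro, loop) ...
loop.call_soon_threadsafe(loop.stop)  # let run_forever() return
thread.join()
loop.close()                          # release the loop's internal resources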
EDIT: I filed a bug for this issue.
EDIT 2: The bug was fixed by devs.
Until the upstream issue is fixed, another way to work around the problem is to replace the use of run_in_executor with a custom version without the flaw. While rolling one's own run_in_executor sounds like a bad idea at first, it is in fact only a small piece of glue between a concurrent.futures future and an asyncio future.
A simple version of run_in_executor can be cleanly implemented using the public API of those two classes:
import asyncio
import functools

def run_in_executor(executor, fn, *args):
    """Submit FN to EXECUTOR and return an asyncio future."""
    loop = asyncio.get_event_loop()
    if args:
        fn = functools.partial(fn, *args)
    work_future = executor.submit(fn)
    aio_future = loop.create_future()
    aio_cancelled = False

    def work_done(_f):
        if not aio_cancelled:
            loop.call_soon_threadsafe(set_result)

    def check_cancel(_f):
        nonlocal aio_cancelled
        if aio_future.cancelled():
            work_future.cancel()
            aio_cancelled = True

    def set_result():
        if work_future.cancelled():
            aio_future.cancel()
        elif work_future.exception() is not None:
            aio_future.set_exception(work_future.exception())
        else:
            aio_future.set_result(work_future.result())

    work_future.add_done_callback(work_done)
    aio_future.add_done_callback(check_cancel)
    return aio_future
When loop.run_in_executor(None, blocking, num) is replaced with run_in_executor(executor, blocking, num), executor being a ThreadPoolExecutor created in main(), the code works without other modifications.
Of course, in this variant the synchronous functions will continue running in the other thread to completion despite being canceled -- but that is unavoidable without modifying them to support explicit interruption.
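For instance, a possible wiring in main() (illustrative names; this relies on the run_in_executor() helper defined above):
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking(num):
    time.sleep(num)
    return num

# run_in_executor() below is the custom helper defined above, not loop.run_in_executor().

async def my_coro(executor, num):
    result = await run_in_executor(executor, blocking, num)
    print(f"Coro {num} done:", result)

def main():
    executor = ThreadPoolExecutor()
    loop = asyncio.get_event_loop()
    tasks = [loop.create_task(my_coro(executor, n)) for n in (1, 2)]
    loop.run_until_complete(asyncio.wait(tasks))
    executor.shutdown(wait=True)  # let the worker threads finish first
    loop.close()

main()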

Strange behaviour when task added to empty loop in different thread

I have an app which adds coroutines to an already-running event loop.
The arguments for these coroutines depend on I/O and are not available when I initially start the event loop - with loop.run_forever(), so I add the tasks later. To demonstrate the phenomenon, here is some example code:
import asyncio
from threading import Thread
from time import sleep

loop = asyncio.new_event_loop()

def foo():
    loop.run_forever()

async def bar(s):
    while True:
        await asyncio.sleep(1)
        print(s)

#loop.create_task(bar("A task created before thread created & before loop started"))

t = Thread(target=foo)
t.start()
sleep(1)

loop.create_task(bar("secondary task"))
The strange behaviour is that everything works as expected when there is at least one task in the loop when invoking loop.run_forever(), i.e. when the commented line is not commented out.
But when it is commented out, as shown above, nothing is printed and it appears I am unable to add a task to the event loop. Should I avoid invoking run_forever() without adding a single task? I don't see why this should be a problem. Adding tasks to an event loop after it is running is standard; why should the empty case be an issue?
Adding tasks to an event loop after it is running is standard; why should the empty case be an issue?
Because you're supposed to add tasks from the thread running the event loop. In general one should not mix threads and asyncio, except through APIs designed for that purpose, such as loop.run_in_executor.
If you understand this and still have good reason to add tasks from a separate thread, use asyncio.run_coroutine_threadsafe. Change loop.create_task(bar(...)) to:
asyncio.run_coroutine_threadsafe(bar("in loop"), loop=loop)
run_coroutine_threadsafe accesses the event loop in a thread-safe manner, and also ensures that the event loop wakes up to notice the new task, even if it otherwise has nothing to do and is just waiting for IO/timeouts.
Adding another task beforehand only appeared to work because bar happens to be an infinite coroutine that makes the event loop wake up every second. Once the event loop wakes up for any reason, it executes all runnable tasks regardless of which thread added them. It would be a really bad idea to rely on this, though, because loop.create_task is not thread-safe, so there could be any number of race conditions if it executed in parallel with a running event loop.
Because loop.create_task is not thread-safe. If you set loop._debug = True, you should see an error like:
Traceback (most recent call last):
File "test.py", line 23, in <module>
loop.create_task(bar("secondary task"))
File "/Users/soulomoon/.pyenv/versions/3.6.3/lib/python3.6/asyncio/base_events.py", line 284, in create_task
task = tasks.Task(coro, loop=self)
File "/Users/soulomoon/.pyenv/versions/3.6.3/lib/python3.6/asyncio/base_events.py", line 576, in call_soon
self._check_thread()
File "/Users/soulomoon/.pyenv/versions/3.6.3/lib/python3.6/asyncio/base_events.py", line 615, in _check_thread
"Non-thread-safe operation invoked on an event loop other "
RuntimeError: Non-thread-safe operation invoked on an event loop other than the current one

Basic python multi-threading issue

New to Python and trying to understand multi-threading. Here's an example from the Python documentation on Queue.
For the life of me, I don't understand how this example works. In the worker() function, there's an infinite loop. How does the worker know when to get out of the loop? There seems to be no break condition.
And what exactly is the join doing at the end? Shouldn't I be joining the threads instead?
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()  # block until all tasks are done
Also, another question: when should multithreading be used and when should multiprocessing be used?
Yup, you're right: worker will run forever. However, since the Queue only holds a finite number of items, eventually worker will permanently block at q.get() (since there will be no more items in the queue). At that point, it's inconsequential that worker is still running. q.join() blocks until the Queue's unfinished-task count drops to 0 (whenever a worker thread calls q.task_done, the count drops by 1). After that, the program ends, and the infinitely blocking daemon thread dies with its creator.
Regarding your second question, the biggest difference between threads and processes in Python is that the mainstream implementations use a global interpreter lock (GIL) to ensure that multiple threads can't mess up Python's internal data structures. This means that for programs that spend most of their time doing computation in pure Python, even with multiple CPUs you're not going to speed the program up much because only one thread at a time can hold the GIL. On the other hand, multiple threads can trivially share data in a Python program, and in some (but by no means all) cases, you don't have to worry too much about thread safety.
Where multithreading can speed up a Python program is when the program spends most of its time waiting on I/O -- disk access or, particularly these days, network operations. The GIL is not held while doing I/O, so many Python threads can run concurrently in I/O bound applications.
On the other hand, with multiprocessing, each process has its own GIL, so your performance can scale to the number of CPU cores you have available. The down side is that all communication between the processes will have to be done through a multiprocessing.Queue (which acts on the surface very like a Queue.Queue, but has very different underlying mechanics, since it has to communicate across process boundaries).
Since working through a thread-safe or interprocess queue avoids a lot of potential threading problems, and since Python makes it so easy, the multiprocessing module is very attractive.
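To make the contrast concrete, here is a minimal multiprocessing version of the same worker pattern (my own sketch, not from the answer), using a multiprocessing.Queue and sentinel values instead of q.task_done()/q.join():
from multiprocessing import Process, Queue

def worker(q):
    # Each process has its own GIL, so CPU-bound work can scale across cores.
    while True:
        item = q.get()
        if item is None:  # sentinel: no more work
            break
        print("processed", item * item)

if __name__ == "__main__":
    q = Queue()
    workers = [Process(target=worker, args=(q,)) for _ in range(4)]
    for p in workers:
        p.start()
    for item in range(10):
        q.put(item)
    for _ in workers:
        q.put(None)   # one sentinel per worker
    for p in workers:
        p.join()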
I mostly agree with joel-cornett. I tried to run the following snippet in Python 2.7:
from threading import Thread
from Queue import Queue

def worker():
    def do_work(item):
        print(item)
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(4):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in range(10):
    q.put(item)

q.join()
The output is:
0
1
2
3
4
5
6
7
8
9
Exception in thread Thread-3 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
File "/usr/lib/python2.7/threading.py", line 504, in run
File "abc.py", line 9, in worker
File "/usr/lib/python2.7/Queue.py", line 168, in get
File "/usr/lib/python2.7/threading.py", line 236, in wait
<type 'exceptions.TypeError'>: 'NoneType' object is not callable
The most probable explanation, I think:
As the queue becomes empty after the tasks are exhausted, the parent thread returns from q.join(), quits, and the queue is torn down as the interpreter shuts down. The child threads are terminated upon receiving the first TypeError exception produced in "item = q.get()", as the queue no longer exists at that point.
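One common way to avoid that shutdown noise (a sketch in Python 3 syntax, not part of the original answer) is to make the workers non-daemon, send each one a sentinel, and join them before the interpreter exits:
from threading import Thread
from queue import Queue  # the "Queue" module on Python 2

def worker(q):
    while True:
        item = q.get()
        if item is None:  # sentinel: time to exit the loop
            q.task_done()
            break
        print(item)
        q.task_done()

q = Queue()
threads = [Thread(target=worker, args=(q,)) for _ in range(4)]
for t in threads:
    t.start()

for item in range(10):
    q.put(item)
for _ in threads:
    q.put(None)   # one sentinel per worker

q.join()          # all items (including sentinels) processed
for t in threads:
    t.join()      # workers have returned cleanly; no noisy shutdown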

python multiprocessing pool, wait for processes and restart custom processes

I use Python multiprocessing and wait for all processes with this code:
...
results = []
for i in range(num_extract):
    url = queue.get(timeout=5)
    try:
        print "START PROCESS!"
        result = pool.apply_async(process, [host, url], callback=callback)
        results.append(result)
    except Exception, e:
        continue

for r in results:
    r.get(timeout=7)
...
I try to use pool.join but get this error:
Traceback (most recent call last):
File "C:\workspace\sdl\lxchg\walker4.py", line 163, in <module>
pool.join()
File "C:\Python25\Lib\site-packages\multiprocessing\pool.py", line 338, in joi
n
assert self._state in (CLOSE, TERMINATE)
AssertionError
Why doesn't join work? And what is a good way to wait for all processes?
My second question is: how can I restart a certain process in the pool? I need this because of a memory leak. At the moment I rebuild the whole pool after all processes have finished their tasks (I create a new Pool object to restart the processes).
What I need: for example, I have 4 processes in the pool. A process gets its task, and after the task is done I need to kill that process and start a new one (to clear up the memory leak).
You are getting the error because you need to call pool.close() before calling pool.join()
I don't know of a good way to shut down a process started with apply_async but see if properly shutting down the pool doesn't make your memory leak go away.
The reason I think this is that the Pool class has a bunch of attributes that are threads running in daemon mode. All of these threads get cleaned up by the join method. The code you have now won't clean them up so if you create a new Pool, you'll still have all those threads running from the last one.
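Put together, the proper shutdown sequence looks roughly like this (a sketch; process, the host value, and the URLs are placeholders standing in for the question's real ones):
from multiprocessing import Pool

def process(host, url):
    # placeholder for the real worker function from the question
    return host, url

if __name__ == "__main__":
    pool = Pool(processes=4)
    results = [pool.apply_async(process, ["example-host", u])
               for u in ["url1", "url2", "url3"]]
    for r in results:
        r.get(timeout=7)  # wait for (and re-raise errors from) each task
    pool.close()          # no more tasks will be submitted
    pool.join()           # now join() is legal and waits for the workers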
