How to start another thread without waiting for function to finish? - python

Hey, I am making a Telegram bot and I need it to be able to run the same command multiple times at once.
dispatcher.add_handler(CommandHandler("send", send))
This is the command ^
And inside the command it starts a function:
sendmail(email, amount, update, context)
This function takes around 5 seconds to finish. I want to be able to run it multiple times at once without needing to wait for it to finish. I tried the following:
Thread(target=sendmail(email, amount, update, context)).start()
This gave me no errors, but it still waits for the function to finish before proceeding. I also tried this:
with ThreadPoolExecutor(max_workers=100) as executor:
    executor.submit(sendmail, email, amount, update, context).result()
but it gave me the following error:
No error handlers are registered, logging exception.
Traceback (most recent call last):
File "C:\Users\seal\AppData\Local\Programs\Python\Python310\lib\site-packages\telegram\ext\dispatcher.py", line 557, in process_update
handler.handle_update(update, self, check, context)
File "C:\Users\seal\AppData\Local\Programs\Python\Python310\lib\site-packages\telegram\ext\handler.py", line 199, in handle_update
return self.callback(update, context)
File "c:\Users\seal\Downloads\telegrambot\main.py", line 382, in sendmailcmd
executor.submit(sendmail, email, amount, update, context).result()
File "C:\Users\main\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\thread.py", line 169, in submit
raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown

This is my first attempt at threading, but maybe try this:
import threading

x1 = threading.Thread(target=sendmail, args=(email, amount, update, context))
x1.start()
Note the difference from your attempt: target gets the function object and args gets its arguments. Thread(target=sendmail(email, amount, update, context)) calls sendmail immediately and passes its return value as the target, which is why it blocked.
You can just put the x1 = threading... and x1.start() lines in a loop to have it run multiple times, as in the sketch below.
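For example (a sketch; the count of five is arbitrary):
threads = []
for _ in range(5):  # start five sends concurrently
    t = threading.Thread(target=sendmail, args=(email, amount, update, context))
    t.start()
    threads.append(t)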
Hope this helps

It's not that one call must finish before the next can start; rather, Python's GIL (Global Interpreter Lock) means only one thread executes Python bytecode at any given time, so threads do not truly run on multiple cores in parallel. For I/O-bound work like sending mail, though, the threads spend most of their time waiting, so the overhead is negligible. Also note that calling .result() immediately after submit() blocks until that task finishes, so your ThreadPoolExecutor attempt waited just like the first one.
The following is one way to start threads with a ThreadPoolExecutor; please adjust it to your use case.
from concurrent.futures import ThreadPoolExecutor

def async_send_email(emails_to_send):
    with ThreadPoolExecutor(max_workers=32) as executor:
        futures = [
            executor.submit(
                send_email,
                email=email_to_send.email,
                amount=email_to_send.amount,
                update=email_to_send.update,
                context=email_to_send.context,
            )
            for email_to_send in emails_to_send
        ]
        for future, email_to_send in zip(futures, emails_to_send):
            try:
                future.result()
            except Exception as e:
                # Handle the exceptions.
                continue

def send_email(email, amount, update, context):
    # do what you want here.
    ...
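As an aside (assuming you are on python-telegram-bot v13, which your traceback suggests): the library can offload a handler to its own worker pool for you, so the manual threading may be unnecessary:
dispatcher.add_handler(CommandHandler("send", send, run_async=True))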

Related

The child process exception is not visible when BrokenProcessPool is raised

I have the following code:
import asyncio
from concurrent.futures import ProcessPoolExecutor
from typing import Awaitable  # needed for the return annotation

PROCESS_POOL_EXECUTOR = ProcessPoolExecutor(max_workers=2)

def run_in_process(blocking_task, *args) -> Awaitable:
    event_loop = asyncio.get_event_loop()
    return event_loop.run_in_executor(PROCESS_POOL_EXECUTOR, blocking_task, *args)
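For context, a coroutine would use the helper roughly like this (cpu_bound_work is a hypothetical stand-in for the real blocking function):
async def handler():
    result = await run_in_process(cpu_bound_work, 42)  # cpu_bound_work: hypothetical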
It has been working fine until I added an additional volume and mounted it in an EC2 instance. After I did that, it is raising the following exception:
File "/proc/self/fd/3/repo/utils/asyncio.py", line 63, in run_in_process
return event_loop.run_in_executor(PROCESS_POOL_EXECUTOR, blocking_task, *args)
File "/conda/lib/python3.8/asyncio/base_events.py", line 783, in run_in_executor
executor.submit(func, *args), loop=self)
File "/conda/lib/python3.8/concurrent/futures/process.py", line 629, in submit
raise BrokenProcessPool(self._broken)
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
There is nothing except this log. If I understood correctly, this means that the worker raised some exception and that's why the child process terminated. But I don't see that child process exception. That's why I have no idea what is going wrong.
It is probably related to that additional volume and mounting because it works without that new volume. I just don't know what exactly is going wrong.
I tried to run the code in ipython and it worked just fine there too.
I understand that this is a bad question that is not reproducible but maybe someone has seen this before and has some idea.

Discover what is blocking the event loop

I have thousands of asyncio tasks running.
Something is taking about 10 seconds to complete (some CPU intensive work).
This breaks the program, because some tasks need to answer a message on their network connection within, say, 5 seconds.
My current idea is to somehow intercept the event loop.
There must be some place in the asyncio module where it executes all currently active tasks in an event loop, between each epoll()/select(). If I could insert an "elapsed = time.time()" before and an "elapsed = time.time() - elapsed" after each task is "resumed", I think it would be enough to find the tasks that are taking too much time.
I think the related code may be here, at line 79:
https://github.com/python/cpython/blob/master/Lib/asyncio/events.py
def _run(self):
    try:
        self._context.run(self._callback, *self._args)
    except (SystemExit, KeyboardInterrupt):
        raise
    except BaseException as exc:
        cb = format_helpers._format_callback_source(
            self._callback, self._args)
        msg = f'Exception in callback {cb}'
        context = {
            'message': msg,
            'exception': exc,
            'handle': self,
        }
        if self._source_traceback:
            context['source_traceback'] = self._source_traceback
        self._loop.call_exception_handler(context)
    self = None  # Needed to break cycles when an exception occurs.
But I don't know what to do here to print any useful info; I need a way to identify what line of my code this "self._context.run(...)" will execute.
I have spent the last 5 sleepless months trying to fix my code, with no success yet.
I have tried cProfile and line_profiler, but neither of them helped.
They tell me the time it takes to execute a function and the time spent on each line. What I need to find out is how much time the code takes between each loop iteration.
None of the profiling/debugging tools I tried gave me a clue what should be fixed, and after rewriting the same program about 15 times in different ways I still can't get it working.
I'm just a non-professional programmer and still a newbie in Python, but if I can't solve this problem the next step will be learning Rust, which itself will be a huge pain in the ass, and probably 3 years after I started I will have this thing working, when it was supposed to take no more than 2 months.
By the way, there is a cool built-in feature inside asyncio (you can see it in the source) which tells you if there is a "blocking" function.
You just need to enable debug mode (good for load tests).
The asyncio documentation lists all the ways to enable debug mode.
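For reference, enabling it from your own code (no stdlib edits needed) looks roughly like this; slow_callback_duration defaults to 0.1 seconds:
import asyncio

loop = asyncio.get_event_loop()
loop.set_debug(True)               # log callbacks/tasks that block the loop
loop.slow_callback_duration = 0.5  # report anything that runs longer than 0.5 s
loop.run_until_complete(main())    # main() stands in for your top-level coroutine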
I just edited /usr/lib/python3.7/asyncio/events.py and added:
import time
import signal
import traceback

START_TIME = 0

def handler(signum, frame):
    print('##########', time.time() - START_TIME)
    traceback.print_stack()

signal.signal(signal.SIGALRM, handler)
And on line 79:
def _run(self):
    global START_TIME
    try:
        signal.alarm(3)
        START_TIME = time.time()
        self._context.run(self._callback, *self._args)
        signal.alarm(0)
    except Exception as exc:
        cb = format_helpers._format_callback_source(
            self._callback, self._args)
        msg = f'Exception in callback {cb}'
        context = {
            'message': msg,
            'exception': exc,
            'handle': self,
        }
        if self._source_traceback:
            context['source_traceback'] = self._source_traceback
        self._loop.call_exception_handler(context)
    self = None  # Needed to break cycles when an exception occurs.
Now, every time some asynchronous code blocks the event loop for more than 3 seconds, it will print a message and a stack trace.
I found out my problem was a simple BeautifulSoup(page, 'html.parser') call, where page was a 1 MB HTML file with a big table.
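Once a blocker like that is identified, the usual fix is to push it off the event loop into a thread, e.g. (inside a coroutine, assuming page is already in memory and loop is the running loop):
parsed = await loop.run_in_executor(None, BeautifulSoup, page, 'html.parser')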

How to use asyncio with ProcessPoolExecutor

I am searching for a huge number of addresses on the web, and I want to use both asyncio and ProcessPoolExecutor in my task to search the addresses quickly.
async def main():
    n_jobs = 3
    addresses = [list of addresses]
    _addresses = list_splitter(data=addresses, n=n_jobs)
    with ProcessPoolExecutor(max_workers=n_jobs) as executor:
        futures_list = []
        for _address in _addresses:
            futures_list += [asyncio.get_event_loop().run_in_executor(executor, execute_parallel, _address)]
        for f in tqdm(as_completed(futures_list, loop=asyncio.get_event_loop()), total=len(_addresses)):
            results = await f

asyncio.get_event_loop().run_until_complete(main())
expected:
I want the execute_parallel function to run in parallel.
error:
Traceback (most recent call last):
File "/home/awaish/danamica/scraping/skraafoto/aerial_photos_scraper.py", line 228, in <module>
asyncio.run(main())
File "/usr/local/lib/python3.7/asyncio/runners.py", line 43, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
return future.result()
File "/home/awaish/danamica/scraping/skraafoto/aerial_photos_scraper.py", line 224, in main
results = await f
File "/usr/local/lib/python3.7/asyncio/tasks.py", line 533, in _wait_for_one
return f.result() # May raise f.exception().
TypeError: can't pickle coroutine objects
I'm not sure I'm answering the correct question, but it appears the intent of your code is to run your execute_parallel function across several processes using Asyncio. As opposed to using ProcessPoolExecutor, why not try something like using a normal multiprocessing Pool and setting up separate Asyncio loops to run in each. You might set up one process per core and let Asyncio work its magic within each process.
import asyncio
import multiprocessing

def run_loop(addresses):
    # Each worker process runs its own event loop over its slice of addresses.
    # This assumes execute_parallel is an async def coroutine function.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    tasks = [execute_parallel(address) for address in addresses]
    loop.run_until_complete(asyncio.gather(*tasks))

def main():
    n_jobs = 3
    addresses = [list of addresses]
    _addresses = list_splitter(data=addresses, n=n_jobs)
    with multiprocessing.Pool(processes=n_jobs) as pool:
        # imap_unordered is lazy; consuming it makes the work actually run
        # before the pool is torn down.
        list(pool.imap_unordered(run_loop, _addresses))
I've used Pool.imap_unordered with great success, but depending on your needs you may prefer Pool.map or some other functionality. You can play around with chunksize or with the number of addresses in each list to achieve optimal results (i.e., if you're getting a lot of timeouts you may want to reduce the number of addresses being processed concurrently); see the one-liner below.
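For instance (chunksize=1 here is illustrative, not a recommendation):
results = list(pool.imap_unordered(run_loop, _addresses, chunksize=1))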

Timeout and Exception Function Getting Stuck

I am polling data using a python 2.7.10 function that I want to timeout if a device takes too long to respond, or catch a RuntimeError if that device is not available.
I am using this Timeout function:
class Timeout():
    class Timeout(Exception):
        pass

    def __init__(self, sec):
        self.sec = sec

    def __enter__(self):
        signal.signal(signal.SIGALRM, self.raise_timeout)
        signal.alarm(self.sec)

    def __exit__(self, *args):
        signal.alarm(0)

    def raise_timeout(self, *args):
        raise Timeout.Timeout()
This is my loop to make the data polls (Modbus) and catch the exceptions. This loop is called every 60 seconds:
def getDeviceTags(name, tag_data):
    global val_returns
    for tag in tag_data[name]:
        local_vals = []
        local_vals.append(name+"."+tag)
        try:
            with Timeout(3):
                value = modbus.read(str(name), str(tag))
                local_vals.append(str(value.value()))
        except RuntimeError:
            print("RuntimeError on " + str(name))
            local_vals.append(None)
        except Timeout.Timeout:
            print("Timeout on " + str(name))
            local_vals.append(None)
        val_returns.append(local_vals)
This will work for DAYS at a time with no issues, both RuntimeErrors and Timeouts being printed to the console, all data logged - GREAT.
However, recently it's been getting stuck, and this is the only error I'm getting:
Traceback (most recent call last):
File "working_one_min_back.py", line 161, in <module>
job()
File "working_one_min_back.py", line 79, in job
getDeviceTags(str(key), data)
File "working_one_min_back.py", line 57, in getDeviceTags
print("RuntimeError on " + str(name))
File "working_one_min_back.py", line 30, in raise_timeout
raise Timeout.Timeout()
__main__.Timeout
There’s no guarantee that a “Python signal” isn’t delivered after a call to alarm(0). The actual (C) signal might already have been delivered, causing the Python handler to be invoked a few bytecode instructions later.
If you call signal.signal from __exit__, any such pending signal is discarded, which usefully prevents mistaking it for the next one requested. Using that to restore the handler to the value it had before the Timeout was created (as returned by the first signal.signal call) is a good idea anyway. (Reset it after calling alarm(0) to prevent SIG_DFL from killing the process.)
In Python 3, such a call delivers any pending signals instead of discarding them, which is an improvement in that it prevents losing a signal just because the handler changed. (This is no more documented than the Python 2 behavior, unfortunately.) You can try to suppress such a late signal by setting an attribute in __exit__ and ignoring any (Python) signal raised when it is set.
Of course, the signal could be delivered after __exit__ begins execution and before the signal is discarded (or marked to be ignored). You therefore have to handle an operation both completing and timing out, perhaps by having several assignments to a single variable that is then appended in just one place.
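Putting those suggestions together, a sketch of a hardened Timeout (untested against the poster's Modbus setup; the suppress flag is the attribute mentioned above):
import signal

class Timeout():
    class Timeout(Exception):
        pass

    def __init__(self, sec):
        self.sec = sec

    def __enter__(self):
        self.suppress = False
        # Save whatever handler was installed before, so __exit__ can restore it.
        self.old_handler = signal.signal(signal.SIGALRM, self.raise_timeout)
        signal.alarm(self.sec)
        return self

    def __exit__(self, *args):
        signal.alarm(0)
        self.suppress = True  # ignore a signal already delivered in C but not yet handled
        # Restore the old handler only after alarm(0), so a late SIGALRM
        # cannot hit SIG_DFL and kill the process.
        signal.signal(signal.SIGALRM, self.old_handler)

    def raise_timeout(self, *args):
        if not self.suppress:
            raise Timeout.Timeout()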

Database errors in Django when using threading

I am working on a Django web application which needs to query a PostgreSQL database. When implementing concurrency using the Python threading interface, I get DoesNotExist errors for the queried items. Of course, these errors do not occur when performing the queries sequentially.
Let me show a unit test which I wrote to demonstrate the unexpected behavior:
class ThreadingTest(TestCase):
    fixtures = ['demo_city',]

    def test_sequential_requests(self):
        """
        A very simple request to the database, made sequentially.
        A fixture for the cities has been loaded above, so there should be
        six cities in the testing database now. We will make a request for
        each one of the cities sequentially.
        """
        for number in range(1, 7):
            c = City.objects.get(pk=number)
            self.assertEqual(c.pk, number)

    def test_threaded_requests(self):
        """
        Now, to test the threaded behavior, we will spawn a thread for
        retrieving each city from the database.
        """
        threads = []
        cities = []

        def do_requests(number):
            cities.append(City.objects.get(pk=number))

        [threads.append(threading.Thread(target=do_requests, args=(n,))) for n in range(1, 7)]
        [t.start() for t in threads]
        [t.join() for t in threads]
        self.assertNotEqual(cities, [])
As you can see, the first test performs some database requests sequentially, and they indeed work with no problem. The second test, however, performs exactly the same requests, but each request is spawned in a thread. This one actually fails, raising a DoesNotExist exception.
The output of running these unit tests looks like this:
test_sequential_requests (cesta.core.tests.threadbase.ThreadingTest) ... ok
test_threaded_requests (cesta.core.tests.threadbase.ThreadingTest) ...
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/jose/Work/cesta/trunk/src/cesta/core/tests/threadbase.py", line 45, in do_requests
cities.append(City.objects.get(pk=number))
File "/home/jose/Work/cesta/trunk/parts/django/django/db/models/manager.py", line 132, in get
return self.get_query_set().get(*args, **kwargs)
File "/home/jose/Work/cesta/trunk/parts/django/django/db/models/query.py", line 349, in get
% self.model._meta.object_name)
DoesNotExist: City matching query does not exist.
... the other threads return similar output ...
Exception in thread Thread-6:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/jose/Work/cesta/trunk/src/cesta/core/tests/threadbase.py", line 45, in do_requests
cities.append(City.objects.get(pk=number))
File "/home/jose/Work/cesta/trunk/parts/django/django/db/models/manager.py", line 132, in get
return self.get_query_set().get(*args, **kwargs)
File "/home/jose/Work/cesta/trunk/parts/django/django/db/models/query.py", line 349, in get
% self.model._meta.object_name)
DoesNotExist: City matching query does not exist.
FAIL
======================================================================
FAIL: test_threaded_requests (cesta.core.tests.threadbase.ThreadingTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jose/Work/cesta/trunk/src/cesta/core/tests/threadbase.py", line 52, in test_threaded_requests
self.assertNotEqual(cities, [])
AssertionError: [] == []
----------------------------------------------------------------------
Ran 2 tests in 0.278s
FAILED (failures=1)
Destroying test database for alias 'default' ('test_cesta')...
Remember that all this is happening against a PostgreSQL database, which is supposed to be thread safe, not SQLite or similar. The test was also run using PostgreSQL.
At this point, I am totally lost about what can be failing. Any idea or suggestion?
Thanks!
EDIT: I wrote a little view just to check whether it works outside the tests. Here is the code of the view:
def get_cities(request):
    queue = Queue.Queue()

    def get_async_cities(q, n):
        city = City.objects.get(pk=n)
        q.put(city)

    threads = [threading.Thread(target=get_async_cities, args=(queue, number)) for number in range(1, 5)]
    [t.start() for t in threads]
    [t.join() for t in threads]
    cities = list()
    while not queue.empty():
        cities.append(queue.get())
    return render_to_response('async/cities.html', {'cities': cities},
                              context_instance=RequestContext(request))
(Please do not take into account the folly of writing application logic inside the view code. Remember that this is only a proof of concept and would never be in the real app.)
The result is that the code works fine: the requests are made successfully in threads, and the view shows the cities after its URL is called.
So, I think making queries using threads is only a problem when you need to test the code. In production, it will work without any problem.
Any useful suggestions for testing this kind of code successfully?
Try using TransactionTestCase:
class ThreadingTest(TransactionTestCase):
TestCase wraps each test in a transaction and never issues a COMMIT to the database. The threads open their own connections to the DB, and the fixture data has not been committed there yet, so they cannot see it. See the description here:
https://docs.djangoproject.com/en/dev/topics/testing/?from=olddocs#django.test.TransactionTestCase
TransactionTestCase and TestCase are identical except for the manner in which the database is reset to a known state and the ability for test code to test the effects of commit and rollback. A TransactionTestCase resets the database before the test runs by truncating all tables and reloading initial data. A TransactionTestCase may call commit and rollback and observe the effects of these calls on the database.
It becomes clearer from this part of the documentation:
class LiveServerTestCase(TransactionTestCase):
    """
    ...
    Note that it inherits from TransactionTestCase instead of TestCase because
    the threads do not share the same transactions (unless if using in-memory
    sqlite) and each thread needs to commit all their transactions so that the
    other thread can see the changes.
    """
Now, the transaction has not been committed inside a TestCase, hence the changes are not visible to the other thread.
This sounds like it's an issue with transactions. If you're creating elements within the current request (or test), they're almost certainly in an uncommitted transaction that isn't accessible from the separate connection in the other thread. You probably need to manage your transactions manually to get this to work.
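One more practical detail when threading with the ORM (an assumption about the setup, not something from the answers above): Django opens a separate database connection per thread and will not close it for you, so it is worth closing it explicitly when the thread's work is done:
from django.db import connection

def do_requests(number):
    try:
        cities.append(City.objects.get(pk=number))
    finally:
        connection.close()  # each thread holds its own connection; release it ourselves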
