Timeout and Exception Function Getting Stuck - python

I am polling data using a Python 2.7.10 function that I want to time out if a device takes too long to respond, or to catch a RuntimeError if that device is not available.
I am using this Timeout function:
import signal

class Timeout():
    class Timeout(Exception):
        pass

    def __init__(self, sec):
        self.sec = sec

    def __enter__(self):
        signal.signal(signal.SIGALRM, self.raise_timeout)
        signal.alarm(self.sec)

    def __exit__(self, *args):
        signal.alarm(0)

    def raise_timeout(self, *args):
        raise Timeout.Timeout()
This is my loop to make the data polls (Modbus) and catch the exceptions. This loop is called every 60 seconds:
def getDeviceTags(name, tag_data):
    global val_returns
    for tag in tag_data[name]:
        local_vals = []
        local_vals.append(name + "." + tag)
        try:
            with Timeout(3):
                value = modbus.read(str(name), str(tag))
                local_vals.append(str(value.value()))
        except RuntimeError:
            print("RuntimeError on " + str(name))
            local_vals.append(None)
        except Timeout.Timeout:
            print("Timeout on " + str(name))
            local_vals.append(None)
        val_returns.append(local_vals)
This will work for DAYS at a time with no issues, both RuntimeErrors and Timeouts being printed to the console, all data logged - GREAT.
However, recently it's been getting stuck - and this is the only error I'm getting:
Traceback (most recent call last):
  File "working_one_min_back.py", line 161, in <module>
    job()
  File "working_one_min_back.py", line 79, in job
    getDeviceTags(str(key), data)
  File "working_one_min_back.py", line 57, in getDeviceTags
    print("RuntimeError on " + str(name))
  File "working_one_min_back.py", line 30, in raise_timeout
    raise Timeout.Timeout()
__main__.Timeout

There’s no guarantee that a “Python signal” isn’t delivered after a call to alarm(0). The actual (C) signal might already have been delivered, causing the Python handler to be invoked a few bytecode instructions later.
If you call signal.signal from __exit__, any such pending signal is discarded, which usefully prevents it from being mistaken for the next one requested. Restoring the handler to the value it had before the Timeout was created (as returned by the first signal.signal call) is a good idea anyway; do it after calling alarm(0) so that a SIG_DFL handler cannot kill the process.
In Python 3, such a call delivers any pending signal instead of discarding it, which is an improvement in that it prevents losing a signal just because the handler changed. (Unfortunately, this is no better documented than the Python 2 behavior.) You can try to suppress such a late signal by setting an attribute in __exit__ and having the handler ignore the (Python) signal while it is set.
Of course, the signal could be delivered after __exit__ begins execution but before the signal is discarded (or marked to be ignored). You therefore have to handle an operation both completing and timing out, perhaps by making several assignments to a single variable that is then appended in just one place.
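Putting those pieces together, a minimal sketch of a hardened Timeout might look like this (Python 2 as in the question; the expired flag is the attribute trick described above, and the names are my own):

import signal

class Timeout():
    class Timeout(Exception):
        pass

    def __init__(self, sec):
        self.sec = sec
        self.expired = False

    def __enter__(self):
        self.expired = False
        # Remember the previous handler so __exit__ can restore it.
        self.old_handler = signal.signal(signal.SIGALRM, self.raise_timeout)
        signal.alarm(self.sec)
        return self

    def __exit__(self, *args):
        # Mark the window closed first, so a late signal delivered between
        # alarm(0) and the handler restore is ignored instead of raised.
        self.expired = True
        signal.alarm(0)
        # Restore the old handler after alarm(0), so SIG_DFL cannot fire;
        # this also discards (Python 2) or delivers (Python 3) any pending signal.
        signal.signal(signal.SIGALRM, self.old_handler)

    def raise_timeout(self, *args):
        if not self.expired:
            raise Timeout.Timeout()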

Related

The child process exception is not visible when BrokenProcessPool is raised

I have the following code:
import asyncio
from concurrent.futures import ProcessPoolExecutor
from typing import Awaitable

PROCESS_POOL_EXECUTOR = ProcessPoolExecutor(max_workers=2)

def run_in_process(blocking_task, *args) -> Awaitable:
    event_loop = asyncio.get_event_loop()
    return event_loop.run_in_executor(PROCESS_POOL_EXECUTOR, blocking_task, *args)
It had been working fine until I added an additional volume and mounted it in an EC2 instance. After I did that, it started raising the following exception:
File "/proc/self/fd/3/repo/utils/asyncio.py", line 63, in run_in_process
return event_loop.run_in_executor(PROCESS_POOL_EXECUTOR, blocking_task, *args)
File "/conda/lib/python3.8/asyncio/base_events.py", line 783, in run_in_executor
executor.submit(func, *args), loop=self)
File "/conda/lib/python3.8/concurrent/futures/process.py", line 629, in submit
raise BrokenProcessPool(self._broken)
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
There is nothing except this log. If I understand correctly, this means a worker raised some exception, and that is why the child process terminated. But I never see that child-process exception, so I have no idea what is going wrong.
It is probably related to the additional volume and mounting, because the code works without the new volume; I just don't know what exactly breaks.
I tried to run the code in ipython and it worked just fine there too.
I understand this is a bad, non-reproducible question, but maybe someone has seen this before and has some idea.
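One hedged way to get more information, since BrokenProcessPool hides how the child died: run the same task once in a bare multiprocessing.Process and inspect its exit code (probe is a hypothetical helper, not part of the original code). A negative exitcode means the child was killed by a signal, e.g. -9 for an out-of-memory kill, which a pool only ever reports as BrokenProcessPool:

import multiprocessing

def probe(blocking_task, *args):
    # Run the task in a single child process, outside any pool.
    p = multiprocessing.Process(target=blocking_task, args=args)
    p.start()
    p.join()
    # exitcode is 0 on success, > 0 if the child raised an exception, and
    # -N if the child was terminated by signal N (e.g. -9 for SIGKILL).
    print("child exitcode:", p.exitcode)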

How to start another thread without waiting for function to finish?

Hey, I am making a Telegram bot and I need it to be able to run the same command multiple times at once.
dispatcher.add_handler(CommandHandler("send", send))
This is the command ^
And inside the command it starts a function:
sendmail(email, amount, update, context)
This function takes around 5 seconds to finish. I want to run it multiple times at once without needing to wait for it to finish. I tried the following:
Thread(target=sendmail(email, amount, update, context)).start()
This would give me no errors, but it waits for the function to finish before proceeding. I also tried this:
with ThreadPoolExecutor(max_workers=100) as executor:
    executor.submit(sendmail, email, amount, update, context).result()
but it gave me the following error:
No error handlers are registered, logging exception.
Traceback (most recent call last):
  File "C:\Users\seal\AppData\Local\Programs\Python\Python310\lib\site-packages\telegram\ext\dispatcher.py", line 557, in process_update
    handler.handle_update(update, self, check, context)
  File "C:\Users\seal\AppData\Local\Programs\Python\Python310\lib\site-packages\telegram\ext\handler.py", line 199, in handle_update
    return self.callback(update, context)
  File "c:\Users\seal\Downloads\telegrambot\main.py", line 382, in sendmailcmd
    executor.submit(sendmail, email, amount, update, context).result()
  File "C:\Users\main\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\thread.py", line 169, in submit
    raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown
This is my first attempt at threading, but maybe try this:
import threading
x1 = threading.Thread(target=sendmail, args=(email, amount, update, context))
x1.start()
You can just put the x1 = threading... and x1.start() in a loop to have it run multiple times.
Hope this helps.
It's not that one call must finish before the next can start; rather, in CPython the GIL (Global Interpreter Lock) lets only one thread execute Python bytecode at a time. Because a task like sending mail spends most of its time waiting on I/O, the threads still overlap, and the time between two functions is negligible in most cases.
The following is one way to start threads with the ThreadPoolExecutor; please adjust it to your use case.
from concurrent.futures import ThreadPoolExecutor

def async_send_email(emails_to_send):
    with ThreadPoolExecutor(max_workers=32) as executor:
        futures = [
            executor.submit(
                send_email,
                email=email_to_send.email,
                amount=email_to_send.amount,
                update=email_to_send.update,
                context=email_to_send.context,
            )
            for email_to_send in emails_to_send
        ]
        for future, email_to_send in zip(futures, emails_to_send):
            try:
                future.result()
            except Exception as e:
                # Handle the exceptions.
                continue

def send_email(email, amount, update, context):
    # do what you want here.
    pass
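For instance, with a simple namedtuple standing in for the email objects (EmailToSend and the addresses are hypothetical placeholders, not part of the original code):

from collections import namedtuple

EmailToSend = namedtuple('EmailToSend', ['email', 'amount', 'update', 'context'])

emails = [
    # update and context would come from the bot callback in practice.
    EmailToSend('a@example.com', 5, None, None),
    EmailToSend('b@example.com', 3, None, None),
]
async_send_email(emails)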

Discover what is blocking the event loop

I have thousands of asyncio tasks running.
Something is taking about 10 seconds to complete (some CPU-intensive work).
This is breaking the program, as some tasks need to answer a message on their network connection within, say, 5 seconds.
My current idea is to somehow intercept the event loop.
There must be some place in the asyncio module where it executes all currently active tasks in an event loop, between each epoll()/select(). If I could insert an "elapsed = time.time()" before and an "elapsed = time.time() - elapsed" after each task is resumed, I think it would be enough to find out which tasks are taking too much time.
I think the related code may be here, at line 79:
https://github.com/python/cpython/blob/master/Lib/asyncio/events.py
def _run(self):
    try:
        self._context.run(self._callback, *self._args)
    except (SystemExit, KeyboardInterrupt):
        raise
    except BaseException as exc:
        cb = format_helpers._format_callback_source(
            self._callback, self._args)
        msg = f'Exception in callback {cb}'
        context = {
            'message': msg,
            'exception': exc,
            'handle': self,
        }
        if self._source_traceback:
            context['source_traceback'] = self._source_traceback
        self._loop.call_exception_handler(context)
    self = None  # Needed to break cycles when an exception occurs.
But I don't know what to do here to print any useful info; I need a way to identify what line of my code this "self._context.run(...)" will execute.
I have spent the last 5 sleepless months trying to fix my code and have had no success yet.
I have tried cProfile and line_profiler, but neither of them helped.
They tell me the time it takes to execute a function and the time spent on each line. What I need to find out is how much time the code takes between each loop iteration.
None of the profiling/debugging tools I tried gave me a clue about what should be fixed, and after rewriting the same program about 15 times in different ways I still can't get it working.
I'm just a non-professional programmer and still a newbie in Python, but if I can't solve this problem the next step will be learning Rust, which will itself be a huge pain in the ass; probably 3 years after starting I will have this thing working, when it was supposed to take no more than 2 months.
By the way, there is a cool built-in feature in asyncio (you can see it in the source code) which tells you when a callback is "blocking" the event loop.
You just need to enable debug mode (good for load tests).
The documentation lists all the ways to enable debug mode.
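A minimal sketch of enabling it (loop.set_debug and loop.slow_callback_duration are the documented knobs; the 0.5 s threshold is an arbitrary choice):

import asyncio

async def main():
    loop = asyncio.get_running_loop()
    loop.set_debug(True)               # also enabled by PYTHONASYNCIODEBUG=1 or -X dev
    loop.slow_callback_duration = 0.5  # log every callback that blocks longer than 0.5 s
    # ... the rest of the application ...

asyncio.run(main())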
I just edited the file /usr/lib/python3.7/asyncio/events.py and added:
import time
import signal
import traceback

START_TIME = 0

def handler(signum, frame):
    print('##########', time.time() - START_TIME)
    traceback.print_stack()

signal.signal(signal.SIGALRM, handler)
And on line 79:
def _run(self):
    global START_TIME
    try:
        signal.alarm(3)
        START_TIME = time.time()
        self._context.run(self._callback, *self._args)
        signal.alarm(0)
    except Exception as exc:
        cb = format_helpers._format_callback_source(
            self._callback, self._args)
        msg = f'Exception in callback {cb}'
        context = {
            'message': msg,
            'exception': exc,
            'handle': self,
        }
        if self._source_traceback:
            context['source_traceback'] = self._source_traceback
        self._loop.call_exception_handler(context)
    self = None  # Needed to break cycles when an exception occurs.
Now, every time some asynchronous code blocks the event loop for more than 3 seconds, it prints a message.
It turned out my problem was a simple BeautifulSoup(page, 'html.parser') call, where page was a 1 MB HTML file with a big table.

"WindowsError: Access is denied" on calling Process.terminate

I enforce a timeout for a block of code using the multiprocessing module. It appears that with inputs of certain sizes, the following error is raised:
WindowsError: [Error 5] Access is denied
I can replicate this error with the following code. Note that the code completes with a list of 467,912,040 elements but not with 517,912,040.
import multiprocessing, Queue

def wrapper(queue, lst):
    lst.append(1)
    queue.put(lst)
    queue.close()

def timeout(timeout, lst):
    q = multiprocessing.Queue(1)
    proc = multiprocessing.Process(target=wrapper, args=(q, lst))
    proc.start()
    try:
        result = q.get(True, timeout)
    except Queue.Empty:
        return None
    finally:
        proc.terminate()
    return result

if __name__ == "__main__":
    # lst = [0]*417912040 # this works fine
    # lst = [0]*467912040 # this works fine
    lst = [0] * 517912040 # this does not
    print "List length:", len(lst)
    timeout(60*30, lst)
The output (including error):
List length: 517912040
Traceback (most recent call last):
  File ".\multiprocessing_error.py", line 29, in <module>
    print "List length:",len(lst)
  File ".\multiprocessing_error.py", line 21, in timeout
    proc.terminate()
  File "C:\Python27\lib\multiprocessing\process.py", line 137, in terminate
    self._popen.terminate()
  File "C:\Python27\lib\multiprocessing\forking.py", line 306, in terminate
    _subprocess.TerminateProcess(int(self._handle), TERMINATE)
WindowsError: [Error 5] Access is denied
Am I not permitted to terminate a Process of a certain size?
I am using Python 2.7 on Windows 7 (64bit).
While I am still uncertain regarding the precise cause of the problem, I have some additional observations as well as a workaround.
Workaround.
Add a try-except block around the call in the finally clause:
finally:
    try:
        proc.terminate()
    except WindowsError:
        pass
This also seems to be the solution arrived at in a related (?) issue posted here on GitHub (you may have to scroll down a bit).
Observations.
This error is dependent on the size of the object passed to the Process/Queue, but it is not related to the execution of the Process itself. In the OP, the Process completes before the timeout expires.
proc.is_alive() returns True both before and after the call to proc.terminate() that throws the WindowsError. A second or two later, proc.is_alive() returns False, and a second call to proc.terminate() succeeds.
Forcing the main thread to sleep with time.sleep(1) in the finally block also prevents the WindowsError (thanks to @tdelaney's comment on the OP).
My best guess is that proc is still in the middle of freeing memory (or something comparable) while being torn down by the OS (having completed execution), so the call to proc.terminate() fails when it tries to kill it again.
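Combining these observations, a more defensive sketch (safe_terminate is a hypothetical helper, untested on the original setup) would retry the terminate after a short sleep:

import time

def safe_terminate(proc, retries=3, delay=1.0):
    # terminate() can raise WindowsError while the OS is still tearing the
    # child down after it has already finished executing; retry briefly.
    for _ in range(retries):
        try:
            proc.terminate()
            return
        except WindowsError:
            time.sleep(delay)
    # If the child has exited in the meantime, there is nothing left to kill.
    if proc.is_alive():
        proc.terminate()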

cPickle - unpickling error ONLY when the unpickled stuff is used later on

The code I have now works fine; I can even print the deserialized objects with no mistakes whatsoever, so I know exactly what is in there.
@staticmethod
def receiveData(self):
    '''
    This method has to be static, as it is the argument of a Thread.
    It receives Wrapper objects from the server (as yet containing only a player)
    and resets the local positions accordingly
    '''
    # "Serverinformationen werden nun empfangen" = "Server information is now being received"
    logging.getLogger(__name__).info("Serverinformationen werden nun empfangen")
    from modules.logic import game
    sock = self.sock
    time.sleep(10)
    self.myPlayer = game.get_player()
    while (True):
        try:
            wrapPacked = sock.recv(4096)
            self.myList = cPickle.loads(wrapPacked)
            # self.setData(self.myList)
        except Exception as eload:
            print eload
However, if I try to actually use the line that is commented out here (self.setData(self.myList)),
I get
unpickling stack underflow
and
invalid load key, ' '.
Just for the record, the code of setData is:
def setData(self, list):
    if (list.__sizeof__() > 0):
        first = list[0]
        self.myPlayer.setPos(first[1])
        self.myPlayer.setVelocity(first[2])
I have been on this for 3 days now, and really, I have no idea what is wrong.
Can you help me?
Full Traceback:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "mypath/client.py", line 129, in receiveData
    self.myList = cPickle.loads(wrapPacked)
UnpicklingError: unpickling stack underflow
The fact that your exceptions always happen when you try to access the pickled data seems to indicate that you are hitting a bug in the cPickle library instead.
What can happen is that a C library forgets to handle an exception: the exception info is stored but not handled, and sits there in the interpreter until another exception happens or another piece of C code checks for an exception. At that point the old, unhandled exception is raised instead.
Your error is clearly cPickle-related: it is very unhappy about the data you feed it, but the exception itself is thrown in unrelated locations. This could be threading-related, or it could be a regular non-threading bug.
You need to see if you can load the data in a test setting. Write wrapPacked to a file for later testing. Load that file in an interpreter shell session, feed it to cPickle.loads(), and see what happens. Do the same with the pickle module.
If you run into similar problems in this test session and can reproduce them (weird exceptions being thrown at a later point in the session), you should file a bug with the Python project to have this looked at.
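A minimal sketch of that test, sticking to the Python 2 of the question (the file name wrap_packed.bin is an arbitrary choice):

# In the receive loop: dump the raw bytes before unpickling them.
with open('wrap_packed.bin', 'wb') as f:
    f.write(wrapPacked)

# Later, in a separate interpreter session:
import cPickle
import pickle

with open('wrap_packed.bin', 'rb') as f:
    data = f.read()

print cPickle.loads(data)  # does the C implementation choke on it?
print pickle.loads(data)   # does the pure-Python implementation agree?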
