I'm using APScheduler to process a few things in the background.
I'd like to capture and report possible exceptions to Sentry. My code looks like this:
from raven import Client
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.events import EVENT_JOB_EXECUTED, EVENT_JOB_ERROR

sentry = Client(dsn=SENTRY_DSN)

def sample_method():
    # some processing..
    raise ConnectionError

def listen_to_exceptions(event):
    if event.exception:
        # I was hoping raven would capture the exception, but it's not working
        sentry.captureException(event.exception)

scheduler = BlockingScheduler()
scheduler.add_listener(listen_to_exceptions, EVENT_JOB_EXECUTED | EVENT_JOB_ERROR)
scheduler.add_job(sample_method, 'interval', minutes=5, max_instances=1)

# run forever!!!
scheduler.start()
But instead of capturing the exception, it generates more exceptions while trying to report it to Sentry:
ConnectionError
Error notifying listener
Traceback (most recent call last):
  File "/.../venv/lib/python3.6/site-packages/apscheduler/schedulers/base.py", line 825, in _dispatch_event
    cb(event)
  File "app.py", line 114, in listen_to_exceptions
    sentry.captureException(event.exception)
  File "/.../venv/lib/python3.6/site-packages/raven/base.py", line 814, in captureException
    'raven.events.Exception', exc_info=exc_info, **kwargs)
  File "/.../venv/lib/python3.6/site-packages/raven/base.py", line 623, in capture
    if self.skip_error_for_logging(exc_info):
  File "/.../venv/lib/python3.6/site-packages/raven/base.py", line 358, in skip_error_for_logging
    key = self._get_exception_key(exc_info)
  File "/.../venv/lib/python3.6/site-packages/raven/base.py", line 345, in _get_exception_key
    code_id = id(exc_info[2] and exc_info[2].tb_frame.f_code)
TypeError: 'ConnectionError' object is not subscriptable
I'm trying to use an event listener according to the docs. Is there another way to capture exceptions in executed jobs?
Of course I could add try/except blocks to each job function. I'm just trying to understand whether there's a way to do it with APScheduler itself, because I have 20+ jobs and adding sentry.captureException() everywhere seems like needless repetition.
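For the record, the per-job alternative could at least be factored into a decorator rather than pasted into every function. A hypothetical sketch (report_to_sentry is a name I made up, and it reuses the sentry client from above); it works because sys.exc_info() is populated inside an except block:

import functools

def report_to_sentry(func):
    # Hypothetical decorator: report the exception to Sentry,
    # then re-raise so APScheduler still sees the job as failed.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            sentry.captureException()  # sys.exc_info() is set here
            raise
    return wrapper

@report_to_sentry
def sample_method():
    # some processing..
    raise ConnectionError

But I'd still prefer a scheduler-level hook over decorating 20+ jobs.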
You only need to capture EVENT_JOB_ERROR. Also, sentry.captureException() requires an exc_info tuple as its argument, not the exception object. The following will work on Python 3:
def listen_to_exceptions(event):
    exc_info = type(event.exception), event.exception, event.exception.__traceback__
    sentry.captureException(exc_info)

scheduler.add_listener(listen_to_exceptions, EVENT_JOB_ERROR)
The documentation has since been updated for the new sentry-sdk client, so now you have to do it the following way:
from sentry_sdk import capture_exception

....

def sentry_listener(event):
    if event.exception:
        capture_exception(event.exception)

scheduler.add_listener(sentry_listener, EVENT_JOB_ERROR)
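Note that the sentry-sdk client also has to be initialised once at startup (presumably what the .... above elides). A minimal sketch, reusing the SENTRY_DSN constant from the question:

import sentry_sdk

# Must run before any listener fires, so events have somewhere to go.
sentry_sdk.init(dsn=SENTRY_DSN)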
Here is the example Python code, called dask_multithread_demo.py:
import dask
import dask.bag as db
import logging

log = logging.getLogger(__name__)

def dask_function(input_list):
    db.from_sequence(input_list) \
        .map(lambda x: log.info(f"showing x: {x}")) \
        .compute()

def main():
    dask.config.set(scheduler='single-threaded')
    dask_function(["abc"])
    dask.config.set(scheduler='multiprocessing')
    dask_function(["abc"])

if __name__ == "__main__":
    main()
When I run the main method, the first call to dask_function raises no exceptions.
When it gets to the second call to dask_function, I get the following exception:
File "dask_multithread_demo.py", line 18, in main
dask_function(["abc"])
File "dask_multithread_demo.py", line 10, in dask_function
.map(lambda x: log.info(f"showing x: {x}")) \
File "lib/python3.6/site-packages/dask/base.py", line 166, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "lib/python3.6/site-packages/dask/base.py", line 437, in compute
results = schedule(dsk, keys, **kwargs)
File "lib/python3.6/site-packages/dask/multiprocessing.py", line 222, in get
**kwargs
File "lib/python3.6/site-packages/dask/local.py", line 489, in get_async
raise_exception(exc, tb)
File "lib/python3.6/site-packages/dask/local.py", line 318, in reraise
raise exc.with_traceback(tb)
File "lib/python3.6/site-packages/dask/local.py", line 224, in execute_task
task, data = loads(task_info)
TypeError: get_logger() missing 1 required positional argument: 'name'
Here is my question: how do I do the logging in dask_function with the scheduler set to "multiprocessing", without the TypeError being raised?
I think that when you use the multiprocessing scheduler, you cannot share variables between processes as you would in a single-threaded context. See: https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes
You may want to check out the Debug page in the Dask docs for general ideas on debugging here: https://docs.dask.org/en/stable/how-to/debug.html#debug
And more specifically the section that talks about logging: https://docs.dask.org/en/stable/how-to/debug.html#logs
I personally found the rerun_exceptions_locally argument to be valuable for quickly debugging errors that happen on Dask workers: https://docs.dask.org/en/stable/how-to/debug.html#rerun-failed-task-locally
In short: logging (and debugging in general) isn't a trivial thing in a distributed environment. Exceptions often end up killing the worker process, and because of that the stack traces or logs might not make it back to you.
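As a hedged sketch of one common workaround (not verified against this exact traceback): create the logger inside the mapped function, so that no module-level logger object has to be pickled and shipped to the worker processes:

import dask
import dask.bag as db
import logging

def show(x):
    # Constructed in the worker process instead of being captured
    # in the closure that gets pickled.
    log = logging.getLogger(__name__)
    log.info(f"showing x: {x}")
    return x

def dask_function(input_list):
    db.from_sequence(input_list).map(show).compute()

if __name__ == "__main__":
    dask.config.set(scheduler='multiprocessing')
    dask_function(["abc"])

Even then, the records are emitted by handlers in the worker processes, not in the parent, which is exactly what the logging section linked above discusses.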
Hey guys, I'm working on a prototype for a project at my school (I'm a research assistant, so this isn't a graded project). I'm running Celery on a server cluster (with 48 workers/cores) which is already set up and working. In a nutshell, we want to use Celery for some number crunching of a rather large number of files/tasks.
Because of this it is very important that we save results to an actual file, we have gigs upon gigs of data and it WON'T fit in RAM while running the traditional task queue/backend.
Anyways...
My prototype (with a trivial multiplication function):
task.py
from celery import Celery

app = Celery()

@app.task
def mult(x, y):
    return x * y
And this works great when I execute: $ celery worker -A task -l info
But if I try and add a new backend:
from celery import Celery

app = Celery()
app.conf.update(CELERY_RESULT_BACKEND='file://~/Documents/results')

@app.task
def mult(x, y):
    return x * y
I get a rather large error:
[2017-08-04 13:22:18,133: CRITICAL/MainProcess] Unrecoverable error: AttributeError("'NoneType' object has no attribute 'encode'",)
Traceback (most recent call last):
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/kombu/utils/objects.py", line 42, in __get__
    return obj.__dict__[self.__name__]
KeyError: 'backend'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/bootsteps.py", line 115, in start
    self.on_start()
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/apps/worker.py", line 143, in on_start
    self.emit_banner()
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/apps/worker.py", line 158, in emit_banner
    ' \n', self.startup_info(artlines=not use_image))),
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/apps/worker.py", line 221, in startup_info
    results=self.app.backend.as_uri(),
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/kombu/utils/objects.py", line 44, in __get__
    value = obj.__dict__[self.__name__] = self.__get(obj)
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/app/base.py", line 1183, in backend
    return self._get_backend()
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/app/base.py", line 902, in _get_backend
    return backend(app=self, url=url)
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/backends/filesystem.py", line 45, in __init__
    self.path = path.encode(encoding)
AttributeError: 'NoneType' object has no attribute 'encode'
I am only two days into this project and have never worked with Celery (or a similar library) before (I come from the algorithmic, mathy side of the fence). I'm currently wrangling with Celery's user guide docs, but they're honestly pretty sparse on this detail.
Any help is much appreciated and thank you.
Looking at the Celery code for the filesystem-backed result backend here:
https://github.com/celery/celery/blob/master/celery/backends/filesystem.py#L54
Your path needs to start with file:/// (three slashes), but your settings have it starting with file:// (two slashes).
You might also want to use an absolute path instead of the ~.
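So, illustratively (the directory is just an example, taken from the home directory visible in your traceback; substitute an absolute path that exists on your machine):

from celery import Celery

app = Celery()
# Three slashes after "file:", then an absolute path (no "~"):
app.conf.update(CELERY_RESULT_BACKEND='file:///home/bartolucci/Documents/results')

@app.task
def mult(x, y):
    return x * y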
I have a few Twitter bots that I run on my Raspberry Pi. I have most functions wrapped in a try/except to ensure that if something errors, it doesn't break the program, and execution continues.
I'm also using tweepy's streaming API as my source for monitoring the tags that I want the bots to retweet.
Here is an issue that kills the program even though I have the main function wrapped in a try/except:
Unhandled exception in thread started by <function startBot5 at 0x762fbed0>
Traceback (most recent call last):
  File "TwitButter.py", line 151, in startBot5
    '<botnamehere>'
  File "/home/pi/twitter/bots/TwitBot.py", line 49, in __init__
    self.startFiltering(trackList)
  File "/home/pi/twitter/bots/TwitBot.py", line 54, in startFiltering
    self.myStream.filter(track=tList)
  File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 445, in filter
    self._start(async)
  File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 361, in _start
    self._run()
  File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 294, in _run
    raise exception
  File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 263, in _run
    self._read_loop(resp)
  File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 313, in _read_loop
    line = buf.read_line().strip()
AttributeError: 'NoneType' object has no attribute 'strip'
My setup:
I have a parent class, TwitButter.py, that creates an object from TwitBot.py. These objects are the bots, and each is started on its own thread so they can run independently.
I have a function in the TwitBot that runs the startFiltering() function. It is wrapped in a try/except, but my except code is never triggered.
My guess is that the error is occurring within the Streaming library. Maybe that library is poorly coded and breaks on the line that is specified at the bottom of the traceback.
Any help would be awesome, and I wonder if others have experienced this issue?
I can provide extra details if needed.
Thanks!!!
This is actually a problem in tweepy that was fixed by GitHub #870 in April 2017. So it should be resolved by updating your local copy to the latest master (an example upgrade command appears after the notes below).
What I did to discover that:
Did a web search to find the tweepy source repo.
Looked at streaming.py for context on the last traceback lines.
Noticed that the most recent change to the file addressed this very problem.
I'll also note that most of the time you get a traceback from deep inside a Python library, the problem comes from the code calling it incorrectly, rather than a bug in the library. But not always. :)
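If tweepy was installed with pip, pulling the fix before it reaches a PyPI release could look something like this (assuming a standard pip/git setup):
$ pip install --upgrade git+https://github.com/tweepy/tweepy.git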
I am trying to use the example Pika Async consumer (http://pika.readthedocs.io/en/0.10.0/examples/asynchronous_consumer_example.html) as a multiprocessing process (by making the ExampleConsumer class subclass multiprocessing.Process). However, I'm running into some issues with gracefully shutting down everything.
Let's say for example I have defined my procs as below:
for k, v in queues_callbacks.iteritems():
    proc = ExampleConsumer(queue, k, v, rabbit_user, rabbit_pw, rabbit_host, rabbit_port)
"queues_callbacks" is basically just a dictionary of exchange : callback_function (ideally I'd like to be able to connect to several exchanges with this architecture).
Then I start and join the processes in the normal Python way:
try:
    for proc in self.consumers:
        proc.start()
    for proc in self.consumers:
        proc.join()
except KeyboardInterrupt:
    for proc in self.consumers:
        proc.terminate()
        proc.join(1)
The issue comes when I try to stop everything. Let's say I've overridden the "terminate" method to call the consumer's "stop" method and then continue on with the normal Process terminate. With this structure, I am getting some strange attribute errors:
Traceback (most recent call last):
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 154, in <module>
    main()
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 150, in main
    mybot.start()
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 71, in start
    self.stop()
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 53, in stop
    self.__stop_consumers__()
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 130, in __stop_consumers__
    self.consumers[0].terminate()
  File "/Users/christopheralexander/PycharmProjects/new_bot/rabbit_consumer.py", line 414, in terminate
    self.stop()
  File "/Users/christopheralexander/PycharmProjects/new_bot/rabbit_consumer.py", line 399, in stop
    self._connection.ioloop.start()
AttributeError: 'NoneType' object has no attribute 'ioloop'
It's as if these attributes somehow disappear at some point. In this particular case, _connection is initialized as None and then gets set when the consumer is started. However, by the time the "stop" method is called, it has already reverted to None (with nothing in the code setting it back). I'm also observing other strange behavior, such as things that appear to be called twice (even though "stop" is called once). Any ideas as to what is going on here, or is this not the proper way of architecting this?
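Here is a minimal sketch, independent of Pika, that reproduces the "disappearing attribute" effect I'm describing: attributes assigned inside run() are set only in the child process's copy of the object, so the parent's copy (where my terminate() override runs) still holds the value from __init__:

import multiprocessing

class Consumer(multiprocessing.Process):
    def __init__(self):
        super(Consumer, self).__init__()
        self._connection = None

    def run(self):
        # Runs in the child process; the parent's copy of this object
        # never sees the assignment below.
        self._connection = object()

if __name__ == '__main__':
    proc = Consumer()
    proc.start()
    proc.join()
    print(proc._connection)  # prints None in the parent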
Thanks!
Sorry in advance, this is going to be long ...
Possibly related:
Python Multiprocessing atexit Error "Error in atexit._run_exitfuncs"
Definitely related:
python parallel map (multiprocessing.Pool.map) with global data
Keyboard Interrupts with python's multiprocessing Pool
Here's a "simple" script I hacked together to illustrate my problem...
import time
import multiprocessing as multi
import atexit

cleanup_stuff = multi.Manager().list([])

##################################################
# Some code to allow keyboard interrupts
##################################################
was_interrupted = multi.Manager().list([])

class _interrupt(object):
    """
    Toy class to allow retrieval of the interrupt that triggered its execution
    """
    def __init__(self, interrupt):
        self.interrupt = interrupt

def interrupt():
    was_interrupted.append(1)

def interruptable(func):
    """
    Decorator to allow functions to be "interruptable" by
    a keyboard interrupt when in python's multiprocessing.Pool.map.
    **Note**, this won't actually cause the map to be interrupted,
    it will merely cause the following functions to be not executed.
    """
    def newfunc(*args, **kwargs):
        try:
            if not was_interrupted:
                return func(*args, **kwargs)
            else:
                return False
        except KeyboardInterrupt as e:
            interrupt()
            return _interrupt(e)  # If we really want to know about the interrupt...
    return newfunc

@atexit.register
def cleanup():
    for i in cleanup_stuff:
        print(i)
    return

@interruptable
def func(i):
    print(i)
    cleanup_stuff.append(i)
    time.sleep(float(i) / 10.)
    return i

# Must wrap func here, otherwise it won't be found in __main__'s dict.
# Maybe because it was created dynamically using the decorator?
def wrapper(*args):
    return func(*args)

if __name__ == "__main__":
    # This is an attempt to use signals -- I also attempted something similar where
    # the signals were only caught in the child processes... or only on the main process...
    #
    #import signal
    #def onSigInt(*args): interrupt()
    #signal.signal(signal.SIGINT, onSigInt)

    # Try 2 with signals (only catch signal on main process)
    #import signal
    #def onSigInt(*args): interrupt()
    #signal.signal(signal.SIGINT, onSigInt)
    #def startup(): signal.signal(signal.SIGINT, signal.SIG_IGN)
    #p = multi.Pool(processes=4, initializer=startup)

    # Try 3 with signals (only catch signal on child processes)
    #import signal
    #def onSigInt(*args): interrupt()
    #signal.signal(signal.SIGINT, signal.SIG_IGN)
    #def startup(): signal.signal(signal.SIGINT, onSigInt)
    #p = multi.Pool(processes=4, initializer=startup)

    p = multi.Pool(4)
    try:
        out = p.map(wrapper, range(30))
        #out = p.map_async(wrapper, range(30)).get()  # This doesn't work either...
        # The following lines don't work either.
        # Effectively trying to roll my own p.map() with p.apply_async:
        #results = [p.apply_async(wrapper, args=(i,)) for i in range(30)]
        #out = [r.get() for r in results]
    except KeyboardInterrupt:
        print("Hello!")
        out = None
    finally:
        p.terminate()
        p.join()
    print(out)
This works just fine if no KeyboardInterrupt is raised. However, if I raise one, the following exception occurs:
10
7
9
12
^CHello!
None
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "test.py", line 58, in cleanup
    for i in cleanup_stuff:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
    self._connect()
  File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
    c = SocketClient(address)
  File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
    s.connect(address)
  File "<string>", line 1, in connect
error: [Errno 2] No such file or directory
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "test.py", line 58, in cleanup
    for i in cleanup_stuff:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
    self._connect()
  File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
    c = SocketClient(address)
  File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
    s.connect(address)
  File "<string>", line 1, in connect
socket.error: [Errno 2] No such file or directory
Interestingly enough, the code does exit the Pool.map function without calling any of the additional functions ... The problem seems to be that the KeyboardInterrupt isn't handled properly at some point, but it is a little confusing where that is, and why it isn't handled in interruptable. Thanks.
Note, the same problem happens if I use out=p.map_async(wrapper,range(30)).get()
EDIT 1
A little closer... If I enclose the out=p.map(...) in a try/except/finally clause, it gets rid of the first exception; the other ones are still raised in atexit, however. The code and traceback above have been updated.
EDIT 2
Something else that does not work has been added to the code above as a comment. (Same error). This attempt was inspired by:
http://jessenoller.com/2009/01/08/multiprocessingpool-and-keyboardinterrupt/
EDIT 3
Another failed attempt using signals added to the code above.
EDIT 4
I have figured out how to restructure my code so that the above is no longer necessary. In the (unlikely) event that someone stumbles upon this thread with the same use-case that I had, I will describe my solution ...
Use Case
I have a function which generates temporary files using the tempfile module. I would like those temporary files to be cleaned up when the program exits. My initial attempt was to pack each temporary file name into a list and then delete all the elements of the list with a function registered via atexit.register. The problem is that the list was not being updated across multiple processes. This is where I got the idea of using multiprocessing.Manager to manage the list data. Unfortunately, this fails on a KeyboardInterrupt no matter how hard I tried, because the communication sockets between processes were broken for some reason.
The solution to this problem is simple. Prior to using multiprocessing, set the temporary file directory with something like tempfile.tempdir = tempfile.mkdtemp(), and then register a function to delete that temporary directory. Each of the processes writes to the same temporary directory, so it works. Of course, this solution only works where the shared data is a list of files that needs to be deleted at the end of the program's life.
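A minimal sketch of that solution (assuming a Unix-style fork, so the workers inherit the tempfile.tempdir set in the parent):

import atexit
import shutil
import tempfile
import multiprocessing as multi

# Every tempfile call, in the parent and in the forked workers,
# now targets this one freshly created directory.
tempfile.tempdir = tempfile.mkdtemp()

# Remove the whole directory at exit, instead of tracking
# individual file names in a Manager-backed list.
atexit.register(shutil.rmtree, tempfile.tempdir, ignore_errors=True)

def work(i):
    # Each worker writes into the shared temporary directory.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(str(i).encode())
    return i

if __name__ == "__main__":
    pool = multi.Pool(4)
    try:
        print(pool.map(work, range(8)))
    finally:
        pool.terminate()
        pool.join()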