Python 3.6: "AttributeError: 'SimpleQueue' object has no attribute '_poll'" - python

When I use a standard Queue to send samples to a process, everything works fine. However, since my needs are simple, I tried to use a SimpleQueue and for some reason the 'empty' method doesn't work. Here's the details:
Error comes from the consumer process (when sample_queue is Queue, everything works, when sample_queue is SimpleQueue, things break):
def frame_update(i):
while not self.sample_queue.empty():
sample = self.sample_queue.get()
for line in lines:
While executing sample_queue.empty() -- SimpleQueue.empty(), from Python 3.6 on windows (queues.py) we get:
def empty(self):
return not self._poll()
Where self._poll() has been set in init by:
def __init__(self, *, ctx):
self._reader, self._writer = connection.Pipe(duplex=False)
self._rlock = ctx.Lock()
self._poll = self._reader.poll
if sys.platform == 'win32':
self._wlock = None
else:
self._wlock = ctx.Lock()
So I follow the self._reader which is set from connection.Pipe (connection.py):
...
c1 = PipeConnection(h1, writable=duplex)
c2 = PipeConnection(h2, readable=duplex)
Ok, great. The _reader is going to be a PipeConnection and pipe connection has this method:
def _poll(self, timeout):
if (self._got_empty_message or
_winapi.PeekNamedPipe(self._handle)[0] != 0):
return True
return bool(wait([self], timeout))
Alright -- So a couple of questions:
1) Shouldn't the init of SimpleQueue be assigning self.poll to self._reader._poll instead of self._reader.poll? Or am I missing something in the inheritance hierarchy?
2) The PipeConnection _poll routine takes a timeout parameter, so #1 shouldn't work...
*) -- Is there some other binding of PipeConnection _poll that I'm missing?
Am I missing something? I am using Python3.6, Windows, debugging in PyCharm and I follow all the paths and they're in the standard multiprocessing paths. I'd appreciate any help or advice. Thanks!
EDIT: After further review, I can see that PipeConnection is a subclass of _ConnectionBase which does indeed have a 'poll' method and it is bound with a default timeout parameter.
So the question is: When SimpleQueue is initializing and sets
self._poll = self._reader.poll
Why doesn't it go up the class hierarchy to grab that from _ConnectionBase?

After looking at why the Queue type works and why the SimpleQueue doesn't, I found that Queue sets the _poll method 'after_fork' as well as before. SimpleQueue doesn't. By changing the setstate method to add self._poll = self._reader.poll as follows (queues.py, line 338), SimpleQueue works
def __setstate__(self, state):
(self._reader, self._writer, self._rlock, self._wlock) = state
self._poll = self._reader.poll
Seems like a bug to me unless I'm really misunderstanding something. I'll submit a bug report and reference this post. Hope this helps someone!
http://bugs.python.org/issue30301

Related

Why the context manager fails to execute when using the value returned from it?

I'm using a context manager to communicate with one of my servers. I've created a context manager using the contextlib module:
#contextlib.contextmanager
def session_ctx(self):
"""context manager session interface"""
try:
self.connect()
yield self
finally:
self.disconnect()
Pretty simple. It returns self to the ctx. Now I'm used to use it with the as variable syntax, with obj.session_ctx() as s:
with obj.session_ctx() as x:
some_data = x.get_data()
...
This let's me work on the object returned from the context manager function.
However, this fails to execute x.get_data() with returns invalid state (such no connection exist).
If I remove the as x part and use the obj directly, it works as expected:
with obj.session_ctx():
some_data = obj.get_data()
Why don't this work with the as x ? If x is actually self of obj, this should be transparent, isn't it?
I'm using ctx on other parts of my code, and it seems to work with the as x syntax. I'm just curious why this is different.
update: I added debug prints, and x is assigned with the value returned from self.connect(). That's totally a different object. How is this changed??

Python readline, tab completion cycling with the Cmd interface

I am using the cmd.Cmd class in Python to offer a simple readline interface to my program.
Self contained example:
from cmd import Cmd
class CommandParser(Cmd):
def do_x(self, line):
pass
def do_xy(self, line):
pass
def do_xyz(self, line):
pass
if __name__ == "__main__":
parser = CommandParser()
parser.cmdloop()
Pressing tab twice will show possibilities. Pressing tab again does the same.
My question is, how do I get the options to cycle on the third tab press? In readline terms I think this is called Tab: menu-complete, but I can't see how to apply this to a Cmd instance.
I already tried:
readline.parse_and_bind('Tab: menu-complete')
Both before and after instantiating the parser instance. No luck.
I also tried passing "Tab: menu-complete" to the Cmd constructor. No Luck here either.
Anyone know how it's done?
Cheers!
The easiest trick would be to add a space after menu-complete:
parser = CommandParser(completekey="tab: menu-complete ")
The bind expression that is executed
readline.parse_and_bind(self.completekey+": complete")
will then become
readline.parse_and_bind("tab: menu-complete : complete")
Everything after the second space is acutally ignored, so it's the same as tab: menu-complete.
If you don't want to rely on that behaviour of readline parsing (I haven't seen it documented) you could use a subclass of str that refuses to be extended as completekey:
class stubborn_str(str):
def __add__(self, other):
return self
parser = CommandParser(completekey=stubborn_str("tab: menu-complete"))
self.completekey+": complete" is now the same as self.completekey.
Unfortunately, it seems as though the only way around it is to monkey-patch the method cmdloop from the cmd.Cmd class, or roll your own.
The right approach is to use "Tab: menu-complete", but that's overriden by the class as shown in line 115: readline.parse_and_bind(self.completekey+": complete"), it is never activated. (For line 115, and the entire cmd package, see this: https://hg.python.org/cpython/file/2.7/Lib/cmd.py). I've shown an edited version of that function below, and how to use it:
import cmd
# note: taken from Python's library: https://hg.python.org/cpython/file/2.7/Lib/cmd.py
def cmdloop(self, intro=None):
"""Repeatedly issue a prompt, accept input, parse an initial prefix
off the received input, and dispatch to action methods, passing them
the remainder of the line as argument.
"""
self.preloop()
if self.use_rawinput and self.completekey:
try:
import readline
self.old_completer = readline.get_completer()
readline.set_completer(self.complete)
readline.parse_and_bind(self.completekey+": menu-complete") # <---
except ImportError:
pass
try:
if intro is not None:
self.intro = intro
if self.intro:
self.stdout.write(str(self.intro)+"\n")
stop = None
while not stop:
if self.cmdqueue:
line = self.cmdqueue.pop(0)
else:
if self.use_rawinput:
try:
line = raw_input(self.prompt)
except EOFError:
line = 'EOF'
else:
self.stdout.write(self.prompt)
self.stdout.flush()
line = self.stdin.readline()
if not len(line):
line = 'EOF'
else:
line = line.rstrip('\r\n')
line = self.precmd(line)
stop = self.onecmd(line)
stop = self.postcmd(stop, line)
self.postloop()
finally:
if self.use_rawinput and self.completekey:
try:
import readline
readline.set_completer(self.old_completer)
except ImportError:
pass
# monkey-patch - make sure this is done before any sort of inheritance is used!
cmd.Cmd.cmdloop = cmdloop
# inheritance of the class with the active monkey-patched `cmdloop`
class MyCmd(cmd.Cmd):
pass
Once you've monkey-patched the class method, (or implemented your own class), it provides the correct behavior (albeit without highlighting and reverse-tabbing, but these can be implemented with other keys as necessary).

Factory instance has no attribute 'startedConnecting'

I've looked around quite a bit, but haven't found an error quite like this. When I execute my code (below), I get the exception ControllerFactory instance has no attribute 'startedConnecting'. I tried adding the method with the body just being pass, but that simply causes it to stall without transmitting anything, leading me to believe that the problem lies within the way I've set up the classes.
This code is based on code from the twisted website. It's meant to be able to transmit python files, which the server saves, then later transmit arguments for running the python file.
#!/usr/bin/env python
from twisted.internet import reactor, protocol
import argparse
file_header = "pfile:"
run_header = "runwith:"
class Controller(protocol.Protocol):
def sendMessage(self,message):
self.transport.write(message)
class ControllerFactory(protocol.Factory):
def buildProtocol(self, addr):
cont = Controller()
cont.factory = self
return cont
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument("--address")
parser.add_argument("--file")
parser.add_argument("--args")
args = parser.parse_args()
if(args.file and args.args):
raise Exception("Can't send file and args at same time.")
reactor.connectTCP(args.address, 1337, ControllerFactory())
reactor.run()
if(args.file):
print(args.file)
a = open(args.file)
factory.connectedProtocol.sendMessage(file_header + a.read())
a.close()
if(args.args):
print(args.args)
factory.connectedProtocol.sendMessage(run_header + args.args)
Use twisted.internet.protocol.ClientFactory for clients (instead of twisted.internet.protocol.Factory). Or use something from twisted.internet.endpoints instead of twisted.internet.reactor.connectTCP.
Also, note that reactor.run() blocks. All of your code that follows that line will not run in any useful way.

pylint on in-memory file/stream

I'd like to embed pylint in a program. The user enters python programs (in Qt, in a QTextEdit, although not relevant) and in the background I call pylint to check the text he enters. Finally, I print the errors in a message box.
There are thus two questions: First, how can I do this without writing the entered text to a temporary file and giving it to pylint ? I suppose at some point pylint (or astroid) handles a stream and not a file anymore.
And, more importantly, is it a good idea ? Would it cause problems for imports or other stuffs ? Intuitively I would say no since it seems to spawn a new process (with epylint) but I'm no python expert so I'm really not sure. And if I use this to launch pylint, is it okay too ?
Edit:
I tried tinkering with pylint's internals, event fought with it, but finally have been stuck at some point.
Here is the code so far:
from astroid.builder import AstroidBuilder
from astroid.exceptions import AstroidBuildingException
from logilab.common.interface import implements
from pylint.interfaces import IRawChecker, ITokenChecker, IAstroidChecker
from pylint.lint import PyLinter
from pylint.reporters.text import TextReporter
from pylint.utils import PyLintASTWalker
class Validator():
def __init__(self):
self._messagesBuffer = InMemoryMessagesBuffer()
self._validator = None
self.initValidator()
def initValidator(self):
self._validator = StringPyLinter(reporter=TextReporter(output=self._messagesBuffer))
self._validator.load_default_plugins()
self._validator.disable('W0704')
self._validator.disable('I0020')
self._validator.disable('I0021')
self._validator.prepare_import_path([])
def destroyValidator(self):
self._validator.cleanup_import_path()
def check(self, string):
return self._validator.check(string)
class InMemoryMessagesBuffer():
def __init__(self):
self.content = []
def write(self, st):
self.content.append(st)
def messages(self):
return self.content
def reset(self):
self.content = []
class StringPyLinter(PyLinter):
"""Does what PyLinter does but sets checkers once
and redefines get_astroid to call build_string"""
def __init__(self, options=(), reporter=None, option_groups=(), pylintrc=None):
super(StringPyLinter, self).__init__(options, reporter, option_groups, pylintrc)
self._walker = None
self._used_checkers = None
self._tokencheckers = None
self._rawcheckers = None
self.initCheckers()
def __del__(self):
self.destroyCheckers()
def initCheckers(self):
self._walker = PyLintASTWalker(self)
self._used_checkers = self.prepare_checkers()
self._tokencheckers = [c for c in self._used_checkers if implements(c, ITokenChecker)
and c is not self]
self._rawcheckers = [c for c in self._used_checkers if implements(c, IRawChecker)]
# notify global begin
for checker in self._used_checkers:
checker.open()
if implements(checker, IAstroidChecker):
self._walker.add_checker(checker)
def destroyCheckers(self):
self._used_checkers.reverse()
for checker in self._used_checkers:
checker.close()
def check(self, string):
modname = "in_memory"
self.set_current_module(modname)
astroid = self.get_astroid(string, modname)
self.check_astroid_module(astroid, self._walker, self._rawcheckers, self._tokencheckers)
self._add_suppression_messages()
self.set_current_module('')
self.stats['statement'] = self._walker.nbstatements
def get_astroid(self, string, modname):
"""return an astroid representation for a module"""
try:
return AstroidBuilder().string_build(string, modname)
except SyntaxError as ex:
self.add_message('E0001', line=ex.lineno, args=ex.msg)
except AstroidBuildingException as ex:
self.add_message('F0010', args=ex)
except Exception as ex:
import traceback
traceback.print_exc()
self.add_message('F0002', args=(ex.__class__, ex))
if __name__ == '__main__':
code = """
a = 1
print(a)
"""
validator = Validator()
print(validator.check(code))
The traceback is the following:
Traceback (most recent call last):
File "validator.py", line 16, in <module>
main()
File "validator.py", line 13, in main
print(validator.check(code))
File "validator.py", line 30, in check
self._validator.check(string)
File "validator.py", line 79, in check
self.check_astroid_module(astroid, self._walker, self._rawcheckers, self._tokencheckers)
File "c:\Python33\lib\site-packages\pylint\lint.py", line 659, in check_astroid_module
tokens = tokenize_module(astroid)
File "c:\Python33\lib\site-packages\pylint\utils.py", line 103, in tokenize_module
print(module.file_stream)
AttributeError: 'NoneType' object has no attribute 'file_stream'
# And sometimes this is added :
File "c:\Python33\lib\site-packages\astroid\scoped_nodes.py", line 251, in file_stream
return open(self.file, 'rb')
OSError: [Errno 22] Invalid argument: '<?>'
I'll continue digging tomorrow. :)
I got it running.
the first one (NoneType …) is really easy and a bug in your code:
Encountering an exception can make get_astroid “fail”, i.e. send one syntax error message and return!
But for the secong one… such bullshit in pylint’s/logilab’s API… Let me explain: Your astroid object here is of type astroid.scoped_nodes.Module.
It’s also created by a factory, AstroidBuilder, which sets astroid.file = '<?>'.
Unfortunately, the Module class has following property:
#property
def file_stream(self):
if self.file is not None:
return open(self.file, 'rb')
return None
And there’s no way to skip that except for subclassing (Which would render us unable to use the magic in AstroidBuilder), so… monkey patching!
We replace the ill-defined property with one that checks an instance for a reference to our code bytes (e.g. astroid._file_bytes) before engaging in above default behavior.
def _monkeypatch_module(module_class):
if module_class.file_stream.fget.__name__ == 'file_stream_patched':
return # only patch if patch isn’t already applied
old_file_stream_fget = module_class.file_stream.fget
def file_stream_patched(self):
if hasattr(self, '_file_bytes'):
return BytesIO(self._file_bytes)
return old_file_stream_fget(self)
module_class.file_stream = property(file_stream_patched)
That monkeypatching can be called just before calling check_astroid_module. But one more thing has to be done. See, there’s more implicit behavior: Some checkers expect and use astroid’s file_encoding field. So we now have this code in the middle of check:
astroid = self.get_astroid(string, modname)
if astroid is not None:
_monkeypatch_module(astroid.__class__)
astroid._file_bytes = string.encode('utf-8')
astroid.file_encoding = 'utf-8'
self.check_astroid_module(astroid, self._walker, self._rawcheckers, self._tokencheckers)
One could say that no amount of linting creates actually good code. Unfortunately pylint unites enormous complexity with a specialization of calling it on files. Really good code has a nice native API and wraps that with a CLI interface. Don’t ask me why file_stream exists if internally, Module gets built from but forgets the source code.
PS: i had to change sth else in your code: load_default_plugins has to come before some other stuff (maybe prepare_checkers, maybe sth. else)
PPS: i suggest subclassing BaseReporter and using that instead of your InMemoryMessagesBuffer
PPPS: this just got pulled (3.2014), and will fix this: https://bitbucket.org/logilab/astroid/pull-request/15/astroidbuilderstring_build-was/diff
4PS: this is now in the official version, so no monkey patching required: astroid.scoped_nodes.Module now has a file_bytes property (without leading underscore).
Working with an unlocatable stream may definitly cause problems in case of relative imports, since the location is then needed to find the actually imported module.
Astroid support building an AST from a stream, but this is not used/exposed through Pylint which is a level higher and designed to work with files. So while you may acheive this it will need a bit of digging into the low-level APIs.
The easiest way is definitly to save the buffer to the file then to use the SA answer to start pylint programmatically if you wish (totally forgot this other account of mine found in other responses ;). Another option being to write a custom reporter to gain more control.

Hang in Python script using SQLAlchemy and multiprocessing

Consider the following Python script, which uses SQLAlchemy and the Python multiprocessing module.
This is with Python 2.6.6-8+b1(default) and SQLAlchemy 0.6.3-3 (default) on Debian squeeze.
This is a simplified version of some actual code.
import multiprocessing
from sqlalchemy import *
from sqlalchemy.orm import *
dbuser = ...
password = ...
dbname = ...
dbstring = "postgresql://%s:%s#localhost:5432/%s"%(dbuser, password, dbname)
db = create_engine(dbstring)
m = MetaData(db)
def make_foo(i):
t1 = Table('foo%s'%i, m, Column('a', Integer, primary_key=True))
conn = db.connect()
for i in range(10):
conn.execute("DROP TABLE IF EXISTS foo%s"%i)
conn.close()
db.dispose()
for i in range(10):
make_foo(i)
m.create_all()
def do(kwargs):
i, dbstring = kwargs['i'], kwargs['dbstring']
db = create_engine(dbstring)
Session = scoped_session(sessionmaker())
Session.configure(bind=db)
Session.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")
Session.commit()
db.dispose()
pool = multiprocessing.Pool(processes=5) # start 4 worker processes
results = []
arglist = []
for i in range(10):
arglist.append({'i':i, 'dbstring':dbstring})
r = pool.map_async(do, arglist, callback=results.append) # evaluate "f(10)" asynchronously
r.get()
r.wait()
pool.close()
pool.join()
This script hangs with the following error message.
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.6/multiprocessing/pool.py", line 259, in _handle_results
task = get()
TypeError: ('__init__() takes at least 4 arguments (2 given)', <class 'sqlalchemy.exc.ProgrammingError'>, ('(ProgrammingError) syntax error at or near "%"\nLINE 1: COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;\n ^\n',))
Of course, the syntax error here is TRUNCATE foo%s;. My question is, why is the process hanging, and can I persuade it to exit with an error instead, without doing major surgery to my code? This behavior is very similar to that of my actual code.
Note that the hang does not occur if the statement is replaced by something like print foobarbaz. Also, the hang still happens if we replace
Session.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")
Session.commit()
db.dispose()
by just Session.execute("TRUNCATE foo%s;")
I'm using the former version because it is closer to what my actual code is doing.
Also, removing multiprocessing from the picture and looping over the tables serially makes the hang go away, and it just exits with an error.
I'm also kind of puzzled by the form of the error, particularly the TypeError: ('__init__() takes at least 4 arguments (2 given)' bit. Where is this error coming from? It seems likely it is from somewhere in the multiprocessing code.
The PostgreSQL logs aren't helpful. I see lots of lines like
2012-01-09 14:16:34.174 IST [7810] 4f0aa96a.1e82/1 12/583 0 ERROR: syntax error at or near "%" at character 28
2012-01-09 14:16:34.175 IST [7810] 4f0aa96a.1e82/2 12/583 0 STATEMENT: COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;
but nothing else that seems relevant.
UPDATE 1: Thanks to lbolla and his insightful analysis, I was able to file a Python bug report about this.
See sbt's analysis in that report, and also here. See also the Python bug report Fix exception pickling. So, following sbt's explanation, we can reproduce the original error with
import sqlalchemy.exc
e = sqlalchemy.exc.ProgrammingError("", {}, None)
type(e)(*e.args)
which gives
Traceback (most recent call last):
File "<stdin>", line 9, in <module>
TypeError: __init__() takes at least 4 arguments (2 given)
UPDATE 2: This has been fixed, at least for SQLAlchemy, by Mike Bayer, see the bug report StatementError Exceptions un-pickable.. Per Mike's suggestion, I also reported a similar bug to psycopg2, though I didn't (and don't) have an actual example of breakage. Regardless, they have apparently fixed it, though they gave no details of the fix. See psycopg exceptions cannot be pickled. For good measure, I also reported a Python bug ConfigParser exceptions are not pickleable corresponding to the SO question lbolla mentioned. It seems they want a test for this.
Anyway, this looks like it will continue to be a problem in the foreseeable future, since, by and large, Python developers don't seem to be aware of this issue and so don't guard against it. Surprisingly, it seems that there are not enough people using multiprocessing for this to be a well known issue, or maybe they just put up with it. I hope the Python developers get around to fixing it at least for Python 3, because it is annoying.
I accepted lbolla's answer, as without his explanation of how the problem was related to exception handling, I would likely have gone nowhere in understanding this. I also want to thank sbt, who explained that Python not being able to pickle exceptions was the problem. I'm very grateful to both of them, and please vote their answers up. Thanks.
UPDATE 3: I posted a followup question: Catching unpickleable exceptions and re-raising.
I believe the TypeError comes from multiprocessing's get.
I've stripped out all the DB code from your script. Take a look at this:
import multiprocessing
import sqlalchemy.exc
def do(kwargs):
i = kwargs['i']
print i
raise sqlalchemy.exc.ProgrammingError("", {}, None)
return i
pool = multiprocessing.Pool(processes=5) # start 4 worker processes
results = []
arglist = []
for i in range(10):
arglist.append({'i':i})
r = pool.map_async(do, arglist, callback=results.append) # evaluate "f(10)" asynchronously
# Use get or wait?
# r.get()
r.wait()
pool.close()
pool.join()
print results
Using r.wait returns the result expected, but using r.get raises TypeError. As describe in python's docs, use r.wait after a map_async.
Edit: I have to amend my previous answer. I now believe the TypeError comes from SQLAlchemy. I've amended my script to reproduce the error.
Edit 2: It looks like the problem is that multiprocessing.pool does not play well if any worker raises an Exception whose constructor requires a parameter (see also here).
I've amended my script to highlight this.
import multiprocessing
class BadExc(Exception):
def __init__(self, a):
'''Non-optional param in the constructor.'''
self.a = a
class GoodExc(Exception):
def __init__(self, a=None):
'''Optional param in the constructor.'''
self.a = a
def do(kwargs):
i = kwargs['i']
print i
raise BadExc('a')
# raise GoodExc('a')
return i
pool = multiprocessing.Pool(processes=5)
results = []
arglist = []
for i in range(10):
arglist.append({'i':i})
r = pool.map_async(do, arglist, callback=results.append)
try:
# set a timeout in order to be able to catch C-c
r.get(1e100)
except KeyboardInterrupt:
pass
print results
In your case, given that your code raises an SQLAlchemy exception, the only solution I can think of is to catch all the exceptions in the do function and re-raise a normal Exception instead. Something like this:
import multiprocessing
class BadExc(Exception):
def __init__(self, a):
'''Non-optional param in the constructor.'''
self.a = a
def do(kwargs):
try:
i = kwargs['i']
print i
raise BadExc('a')
return i
except Exception as e:
raise Exception(repr(e))
pool = multiprocessing.Pool(processes=5)
results = []
arglist = []
for i in range(10):
arglist.append({'i':i})
r = pool.map_async(do, arglist, callback=results.append)
try:
# set a timeout in order to be able to catch C-c
r.get(1e100)
except KeyboardInterrupt:
pass
print results
Edit 3: so, it seems to be a bug with Python, but proper exceptions in SQLAlchemy would workaround it: hence, I've raised the issue with SQLAlchemy, too.
As a workaround the problem, I think the solution at the end of Edit 2 would do (wrapping callbacks in try-except and re-raise).
The TypeError: ('__init__() takes at least 4 arguments (2 given) error isn't related to the sql you're trying to execute, it has to do with how you're using SqlAlchemy's API.
The trouble is that you're trying to call execute on the session class rather than an instance of that session.
Try this:
session = Session()
session.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")
session.commit()
From the docs:
It is intended that the sessionmaker() function be called within the
global scope of an application, and the returned class be made
available to the rest of the application as the single class used to
instantiate sessions.
So Session = sessionmaker() returns a new session class and session = Session() returns an instance of that class which you can then call execute on.
I don't know about the cause of the original exception. However, multiprocessing's problems with "bad" exceptions is really down to how pickling works. I think the sqlachemy exception class is broken.
If an exception class has an __init__() method which does not call BaseException.__init__() (directly or indirectly) then self.args probably will not be set properly. BaseException.__reduce__() (which is used by the pickle protocol) assumes that a copy of an exception e can be recreated by just doing
type(e)(*e.args)
For example
>>> e = ValueError("bad value")
>>> e
ValueError('bad value',)
>>> type(e)(*e.args)
ValueError('bad value',)
If this invariant does not hold then pickling/unpickling will fail. So instances of
class BadExc(Exception):
def __init__(self, a):
'''Non-optional param in the constructor.'''
self.a = a
can be pickled, but the result cannot be unpickled:
>>> from cPickle import loads, dumps
>>> class BadExc(Exception):
... def __init__(self, a):
... '''Non-optional param in the constructor.'''
... self.a = a
...
>>> loads(dumps(BadExc(1)))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ('__init__() takes exactly 2 arguments (1 given)', <class '__main__.BadExc'>, ())
But instances of
class GoodExc1(Exception):
def __init__(self, a):
'''Non-optional param in the constructor.'''
Exception.__init__(self, a)
self.a = a
or
class GoodExc2(Exception):
def __init__(self, a):
'''Non-optional param in the constructor.'''
self.args = (a,)
self.a = a
can be successfully pickled/unpickled.
So you should ask the developers of sqlalchemy to fix their exception classes. In the mean time you can probably use copy_reg.pickle() to override BaseException.__reduce__() for the troublesome classes.
(This is in answer to Faheem Mitha's question in a comment about how to use copy_reg to work around the broken exception classes.)
The __init__() methods of SQLAlchemy's exception classes seem to call their base class's __init__() methods, but with different arguments. This mucks up pickling.
To customise the pickling of sqlalchemy's exception classes you can use copy_reg to register your own reduce functions for those classes.
A reduce function takes an argument obj and returns a pair (callable_obj, args) such that a copy of obj can be created by doing callable_obj(*args). For example
class StatementError(SQLAlchemyError):
def __init__(self, message, statement, params, orig):
SQLAlchemyError.__init__(self, message)
self.statement = statement
self.params = params
self.orig = orig
...
can be "fixed" by doing
import copy_reg, sqlalchemy.exc
def reduce_StatementError(e):
message = e.args[0]
args = (message, e.statement, e.params, e.orig)
return (type(e), args)
copy_reg.pickle(sqlalchemy.exc.StatementError, reduce_StatementError)
There are several other classes in sqlalchemy.exc which need to be fixed similarly. But hopefully you get the idea.
On second thoughts, rather than fixing each class individually, you can probably just monkey patch the __reduce__() method of the base exception class:
import sqlalchemy.exc
def rebuild_exc(cls, args, dic):
e = Exception.__new__(cls)
e.args = args
e.__dict__.update(dic)
return e
def __reduce__(e):
return (rebuild_exc, (type(e), e.args, e.__dict__))
sqlalchemy.exc.SQLAlchemyError.__reduce__ = __reduce__

Categories

Resources