I am trying to use the concurrent.futures.ThreadPoolExecutor module to run a class method in parallel. A simplified version of my code is pretty much the following:
import concurrent.futures
import threading
import time

class TestClass:
    def __init__(self, secondsToSleepFor):
        self.secondsToSleepFor = secondsToSleepFor

    def testMethodToExecInParallel(self):
        print("ThreadName: " + threading.currentThread().getName())
        print(threading.currentThread().getName() + " is sleeping for " + str(self.secondsToSleepFor) + " seconds")
        time.sleep(self.secondsToSleepFor)
        print(threading.currentThread().getName() + " has finished!!")

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    futuresList = []
    print("before try")
    try:
        testClass = TestClass(3)
        future = executor.submit(testClass.testMethodToExecInParallel)
        futuresList.append(future)
    except Exception as exc:
        print('Exception generated: %s' % exc)
If I execute this code, it seems to behave as intended.
But if I make a mistake, like specifying the wrong number of parameters in "testMethodToExecInParallel":
def testMethodToExecInParallel(self, secondsToSleepFor):
and then still submitting the function as:
future = executor.submit(testClass.testMethodToExecInParallel)
or trying to concatenate a string with an integer (without using str()) inside a print statement in the "testMethodToExecInParallel" method:
def testMethodToExecInParallel(self):
    print("ThreadName: " + threading.currentThread().getName())
    print("self.secondsToSleepFor: " + self.secondsToSleepFor)  # <-- should report an error here
the program doesn't report any error; it just prints "before try" and ends execution...
It is easy to see that this makes the program nearly undebuggable. Could someone explain to me why this behaviour happens?
(For the first kind of mistake:) doesn't concurrent.futures.ThreadPoolExecutor check that the submitted function has the specified signature and, eventually, throw some sort of "noSuchFunction" exception?
Maybe there is some sort of problem in submitting class methods to ThreadPoolExecutor instead of simple standalone functions, so such behaviour could be expected?
Or maybe the error is thrown inside the thread and for some reason I can't read it?
-- EDIT --
Akshay.N's suggestion of calling future.result() after submitting functions to the ThreadPoolExecutor makes the program behave as expected: it runs fine if the code is correct, and prints the error if something in the code is wrong.
I think users must be warned about this very strange behaviour of ThreadPoolExecutor:
if you only submit functions to the ThreadPoolExecutor WITHOUT THEN CALLING future.result():
- if the code is correct, the program goes on and behaves as expected
- if something in the code is wrong, the program seems not to call the submitted function at all, and it doesn't report the errors in the code (see the sketch below)
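A minimal sketch of this behaviour (the broken function here is made up for illustration): the exception is stored inside the future and is only re-raised when result() is called.

import concurrent.futures

def broken():
    return "text" + 1   # raises TypeError inside the worker thread

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(broken)
    # Without the next line the TypeError stays stored inside the future
    # and the program ends silently.
    try:
        future.result()   # re-raises the worker's exception here
    except Exception as exc:
        print('Exception generated: %s' % exc)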
As far as my knowledge goes (which is "not so far"), you have to call future.result() after executor.submit(testClass.testMethodToExecInParallel) in order to collect the result, and any error, from the thread pool.
I have tried what you said and it gives me an error; below is the code:
>>> import concurrent.futures as cf
>>> executor = cf.ThreadPoolExecutor(1)
>>> def a(x,y):
...     print(x+y)
...
>>> future = executor.submit(a, 2, 35, 45)
>>> future.result()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\username\AppData\Local\Programs\Python\Python37\lib\concurrent\futures\_base.py", line 425, in result
    return self.__get_result()
  File "C:\Users\username\AppData\Local\Programs\Python\Python37\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "C:\Users\username\AppData\Local\Programs\Python\Python37\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
TypeError: a() takes 2 positional arguments but 3 were given
Let me know if it still doesn't work
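If blocking on result() right after submit() defeats the purpose of the pool, a sketch of an alternative (not part of the original answer) is to attach a done-callback that logs any stored exception:

import concurrent.futures
import traceback

def log_exception(future):
    exc = future.exception()   # None if the call succeeded
    if exc is not None:
        traceback.print_exception(type(exc), exc, exc.__traceback__)

executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)
future = executor.submit(lambda: 1 + "x")   # fails inside the worker
future.add_done_callback(log_exception)     # runs when the future finishes
executor.shutdown(wait=True)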
I have been having trouble with Python mock and have been going crazy. I've held off on this question for fear of downvoting for not enough research. I have spent a cumulative 24 hours over the last week trying to figure out how to get this to work, and cannot.
I have read numerous examples and have created this one from those. I know mock objects are supposed to be easy to use, but this has taken too long. Now I am out of time.
I am trying to do two simple things here:
1. Override a request.ok status code inside another function
2. Cause an urllib2.HTTPError exception to be thrown
I have distilled these two tasks into the simplest possible example for your convenience:
#ExampleModule.py
import requests
import urllib2

def hello_world():
    try:
        print "BEGIN TRY"
        r = requests.request('GET', "http://127.0.0.1:80")
        print r.ok
        if r.ok:
            print "PATCH 1 FAILED"
        else:
            print "PATCH 1 SUCCESSFUL"
    except urllib2.HTTPError:
        print "PATCH 2 SUCCESSFUL"
        print "EXCEPTION 2 HIT\n"
    else:
        print "PATCH 2 FAILED\n"
and
#in TestModule.py
import mock
import ExampleModule

def test_function_try():
    with mock.patch('ExampleModule.hello_world') as patched_request:
        patched_request.requests.request.ok = False
        result = ExampleModule.hello_world()
        print result

def test_function_exception():
    with mock.patch('ExampleModule.hello_world') as patched_exception:
        patched_exception.urllib2.side_effect = HTTPError
        result = ExampleModule.hello_world()
        print result

test_function_try()
test_function_exception()
A normal call to hello_world() outputs:
BEGIN TRY
True
<Response [200]>
A normal call to test_function_try() outputs:
<MagicMock name='hello_world()' id='70272816'>
#From the "print result" inside test_function_try()
A normal call to test_function_exception() outputs:
<MagicMock name='hello_world()' id='62320016'>
#From the "print result" inside test_function_exception()
Obviously, I am not actually returning anything from hello_world() so it looks like the patched object is the hello_world() function instead of the requests or urllib2 patched modules.
It should be noted that when I try to patch with 'ExampleModule.hello_world.requests' or 'ExampleModule.hello_world.urllib2' I get an error saying they cannot be found in hello_world()
QUESTION SUMMARY
What is wrong with the two functions test_function_try() and test_function_exception()? What needs to be modified so that I can manually assign the value of request.ok inside hello_world() and also manually raise the exception HTTPError so that I can test the code in that block... Bonus points for explaining 'when' exactly the exception gets thrown: as soon as the try: is entered, or when request is called, or some other time?
Something that has been a concern of mine: will my print statements inside the ExampleModule.py reveal whether my patching and mock tests are working or do I HAVE to use assert methods to get the truth? I am not sure whether assert is a necessity when people mention 'use assertions to find out if the actual patched object was called, etc.' or if this is for convenience/convention/practicality.
UPDATE
After changing the patch target to the requests.request() function, as per #chepner's suggestion, I receive the following output:
BEGIN TRY
False
PATCH 1 SUCCESSFUL
PATCH 2 FAILED
BEGIN TRY
Traceback (most recent call last):
  File "C:\LOCAL\ECLIPSE PROJECTS\MockingTest\TestModule.py", line 44, in <module>
    test_function_exception()
  File "C:\LOCAL\ECLIPSE PROJECTS\MockingTest\TestModule.py", line 19, in test_function_exception
    ExampleModule.hello_world()
  File "C:\LOCAL\ECLIPSE PROJECTS\MockingTest\ExampleModule.py", line 12, in hello_world
    r = requests.request('GET', "http://127.0.0.1:8080")
  File "C:\Python27\lib\site-packages\mock\mock.py", line 1062, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "C:\Python27\lib\site-packages\mock\mock.py", line 1118, in _mock_call
    raise effect
TypeError: __init__() takes exactly 6 arguments (1 given)
You can't mock local variables in a function. The thing you want to mock is requests.request itself, so that when hello_world calls it, a mock object is returned, rather than actually making an HTTP request.
import mock
import urllib2
import ExampleModule

def test_function_try():
    with mock.patch('ExampleModule.requests.request') as patched_request:
        patched_request.return_value.ok = False
        ExampleModule.hello_world()

def test_function_exception():
    with mock.patch('ExampleModule.requests.request') as patched_request:
        patched_request.side_effect = urllib2.HTTPError
        ExampleModule.hello_world()

test_function_try()
test_function_exception()
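One caveat, hinted at by the traceback in the question's update: urllib2.HTTPError's constructor takes several required arguments (url, code, msg, hdrs, fp), so when mock raises the class passed as side_effect, Python tries to instantiate it with no arguments and fails. A sketch of a workaround is to pass a ready-made instance instead (the argument values here are placeholders):

import mock
import urllib2
import ExampleModule

def test_function_exception():
    with mock.patch('ExampleModule.requests.request') as patched_request:
        # An instance, not the class, so mock never calls HTTPError()
        # without the arguments it requires.
        patched_request.side_effect = urllib2.HTTPError(
            "http://127.0.0.1:80", 404, "Not Found", None, None)
        ExampleModule.hello_world()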
What happens is that if your code raises a runtime exception, your completion doesn't work, and you have no idea why, because the traceback isn't printed. Try this very short piece of code to see what I mean: the program should crash on the line c = 2 + "ddda", since you're obviously adding a string and an int, which simply doesn't work. But instead of crashing, the exception is somehow caught and you have no idea what's happening. The program continues to run as if nothing happened.
import cmd

class App(cmd.Cmd):
    def complete_foo(self, *arg):
        # Uncommenting this line will silently crash the program,
        # making it hard to debug.
        # Is there a way to force the program to crash?
        c = 2 + "ddda"
        return "d dzpo idz dza dpaoi".split(" ")

    def do_foo(self, *args):
        print "foo"

App().cmdloop()
My question is: how do I show the error when there is one (when using the cmd module)?
Unfortunately, exceptions in completers are caught somewhere inside the dark depths of readline. You can try something like this:
import cmd
import traceback

def log_exceptions(fun):
    def wrapped(*a, **kw):
        try:
            return fun(*a, **kw)
        except Exception:
            print traceback.format_exc()
            raise
    return wrapped

class App(cmd.Cmd):
    @log_exceptions
    def complete_foo(self, *arg):
        # Uncommenting this line will silently crash the program,
        # making it hard to debug.
        # Is there a way to force the program to crash?
        c = 2 + "ddda"
        return "d dzpo idz dza dpaoi".split(" ")
$ python c.py
(Cmd) foo
Traceback (most recent call last):
  File "c.py", line 7, in wrapped
    return fun(*a, **kw)
  File "c.py", line 20, in complete_foo
    c = 2 + "ddda"
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Remove the decorator after debugging your completers, because printing tracebacks from inside readline can mess up your terminal.
No, you can't crash readline easily.
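If even the temporary printing is a problem, a variation of the same idea (a sketch, assuming a writable completer_errors.log) appends the traceback to a file so the terminal stays clean:

import traceback

def log_exceptions_to_file(fun, logfile="completer_errors.log"):
    def wrapped(*a, **kw):
        try:
            return fun(*a, **kw)
        except Exception:
            # Append the traceback to a file instead of printing it
            # through readline.
            with open(logfile, "a") as f:
                f.write(traceback.format_exc())
            raise
    return wrapped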
UPDATE: As noted by Mr. Fooz, the functional version of the wrapper has a bug, so I reverted to the original class implementation. I've put the code up on GitHub:
https://github.com/nofatclips/timeout/commits/master
There are two commits, one working (using the "import" workaround) the second one broken.
The source of the problem seems to be the pickle.dumps function, which just spits out an identifier when called on a function. By the time I call Process, that identifier points to the decorated version of the function, rather than the original one.
ORIGINAL MESSAGE:
I was trying to write a function decorator to wrap a long task in a Process that would be killed if a timeout expires. I came up with this (working but not elegant) version:
from multiprocessing import Process
from threading import Timer
from functools import partial
from sys import stdout

def safeExecution(function, timeout):
    thread = None

    def _break():
        #stdout.flush()
        #print (thread)
        thread.terminate()

    def start(*kw):
        timer = Timer(timeout, _break)
        timer.start()
        thread = Process(target=function, args=kw)
        ret = thread.start() # TODO: capture return value
        thread.join()
        timer.cancel()
        return ret
    return start

def settimeout(timeout):
    return partial(safeExecution, timeout=timeout)

#@settimeout(1)
def calculatePrimes(maxPrimes):
    primes = []
    for i in range(2, maxPrimes):
        prime = True
        for prime in primes:
            if (i % prime == 0):
                prime = False
                break
        if (prime):
            primes.append(i)
            print ("Found prime: %s" % i)

if __name__ == '__main__':
    print (calculatePrimes)
    a = settimeout(1)
    calculatePrime = a(calculatePrimes)
    calculatePrime(24000)
As you can see, I commented out the decorator and assigned the modified version of calculatePrimes to calculatePrime. If I tried to reassign it to the same variable, I'd get a "Can't pickle <class 'function'>: attribute lookup builtins.function failed" error when trying to call the decorated version.
Does anybody have any idea of what is happening under the hood? Is the original function being turned into something different when I assign the decorated version to the identifier referencing it?
UPDATE: To reproduce the error, I just change the main part to
if __name__ == '__main__':
    print (calculatePrimes)
    a = settimeout(1)
    calculatePrimes = a(calculatePrimes)
    calculatePrimes(24000)
    #sleep(2)
which yields:
Traceback (most recent call last):
  File "c:\Users\mm\Desktop\ING.SW\python\thread2.py", line 49, in <module>
    calculatePrimes(24000)
  File "c:\Users\mm\Desktop\ING.SW\python\thread2.py", line 19, in start
    ret = thread.start()
  File "C:\Python33\lib\multiprocessing\process.py", line 111, in start
    self._popen = Popen(self)
  File "C:\Python33\lib\multiprocessing\forking.py", line 241, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Python33\lib\multiprocessing\forking.py", line 160, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python33\lib\multiprocessing\forking.py", line 344, in main
    self = load(from_parent)
EOFError
P.S. I also wrote a class version of safeExecution, which has exactly the same behaviour.
Move the function to a module that's imported by your script.
Functions are only picklable in Python if they're defined at the top level of a module. Ones defined in scripts are not picklable by default. Module-based functions are pickled as two strings: the name of the module, and the name of the function. They're unpickled by dynamically importing the module and then looking up the function object by name (hence the restriction on top-level-only functions).
It's possible to extend the pickle handlers to support semi-generic function and lambda pickling, but doing so can be tricky. In particular, it can be difficult to reconstruct the full namespace tree if you want to properly handle things like decorators and nested functions. If you want to do this, it's best to use Python 2.7 or later or Python 3.3 or later (earlier versions have a bug in the dispatcher of cPickle and pickle that's unpleasant to work around).
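A quick sketch of that rule: pickle records a module-level function as (module name, function name), so a top-level function round-trips while a nested one fails the name lookup (for multiprocessing, the child process must also be able to import that module).

import pickle

def top_level():
    return 42

def make_nested():
    def nested():
        return 42
    return nested

print(pickle.loads(pickle.dumps(top_level)))   # works: stored by module and name
try:
    pickle.dumps(make_nested())
except Exception as exc:   # PicklingError on Python 2, AttributeError on Python 3
    print("cannot pickle nested function: %s" % exc)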
Is there an easy way to pickle a python function (or otherwise serialize its code)?
Python: pickling nested functions
http://bugs.python.org/issue7689
EDIT:
At least in Python 2.6, the pickling works fine for me if the script only contains the if __name__ block, the script imports calculatePrimes and settimeout from a module, and if the inner start function's name is monkey-patched:
def safeExecution(function, timeout):
    ...
    def start(*kw):
        ...
    start.__name__ = function.__name__ # ADD THIS LINE
    return start
There's a second problem that's related to Python's variable scoping rules. The assignment to the thread variable inside start creates a shadow variable whose scope is limited to one evaluation of the start function. It does not assign to the thread variable found in the enclosing scope. You can't use the global keyword to override the scope because you want an intermediate scope, and Python only has full support for manipulating the local-most and global-most scopes, not any intermediate ones. You can overcome this problem by placing the thread object in a container that's housed in the intermediate scope. Here's how:
def safeExecution(function, timeout):
    thread_holder = []   # MAKE IT A CONTAINER

    def _break():
        #stdout.flush()
        #print (thread)
        thread_holder[0].terminate()   # REACH INTO THE CONTAINER

    def start(*kw):
        ...
        thread = Process(target=function, args=kw)
        thread_holder.append(thread)   # MUTATE THE CONTAINER
        ...
    start.__name__ = function.__name__   # MAKES THE PICKLING WORK
    return start
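On Python 3 (which the question's traceback suggests), the nonlocal keyword achieves the same thing without the container; a sketch:

from multiprocessing import Process
from threading import Timer

def safeExecution(function, timeout):
    thread = None

    def _break():
        thread.terminate()   # sees the value assigned in start()

    def start(*kw):
        nonlocal thread      # rebind the enclosing variable, no shadow
        timer = Timer(timeout, _break)
        timer.start()
        thread = Process(target=function, args=kw)
        thread.start()
        thread.join()
        timer.cancel()

    start.__name__ = function.__name__   # keep the pickling fix from above
    return start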
I'm not really sure why you get that problem, but to answer your title question: why does the decorator not work?
When you pass arguments to a decorator, you need to structure the code slightly differently. Essentially you have to implement the decorator as a class with an __init__ and a __call__.
In __init__ you collect the arguments that you send to the decorator, and in __call__ you get the function you decorate:
class settimeout(object):
    def __init__(self, timeout):
        self.timeout = timeout

    def __call__(self, func):
        def wrapped_func(n):
            func(n, self.timeout)
        return wrapped_func

@settimeout(1)
def func(n, timeout):
    print "Func is called with", n, 'and', timeout

func(24000)
This should get you going on the decorator front at least.
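A side note, not from the original answer: functools.wraps copies the decorated function's metadata onto the wrapper, which also takes care of the __name__ fix mentioned earlier; a sketch:

import functools

class settimeout(object):
    def __init__(self, timeout):
        self.timeout = timeout

    def __call__(self, func):
        @functools.wraps(func)   # copies __name__, __doc__, etc. onto the wrapper
        def wrapped_func(n):
            return func(n, self.timeout)
        return wrapped_func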
I'm running some python code and get an error:
Exception in Tkinter callback
Traceback (most recent call last):
  File "/usr/lib/python2.7/lib-tk/Tkinter.py", line 1413, in __call__
    return self.func(*args)
  File "multiline14.py", line 28, in getText
    if encodinghex in process:
TypeError: argument of type 'function' is not iterable
My def is below.
def gui(item):
    def default_encode(s):
        pass
    # map items from menu to commands
    encodinghex = '.encode("hex")'
    decodinghex = '.decode("hex")'
    mapping = {"encode_b64": base64.encodestring, "encode_url": urllib.quote_plus,
               "encode_hex": encodinghex, "decode_b64": base64.decodestring,
               "decode_url": urllib.unquote_plus, "decode_hex": decodinghex}
    process = mapping.get(item, default_encode)

    def getText():
        #clear bottom text field
        bottomtext.delete(1.0, END)
        #var equals whats in middle
        var = middletext.get(1.0, 'end-1c')
        #insert encoded var in bottom
        if encodinghex in process:
            var = '"%s"' % (var)
            bottomtext.insert(INSERT, eval(var + process))
        elif decodinghex in process:
            var = '"%s"' % (var)
            bottomtext.insert(INSERT, eval(var + process))
        else:
            bottomtext.insert(INSERT, process(var))
What causes that error?
You are requesting a function from mapping here:
process = mapping.get(item, default_encode)
You then try to iterate over it here:
if encodinghex in process:
You can't use the in keyword unless the subject is iterable.
What you're trying to achieve here is actually to see which value your call to mapping.get() returned:
if process == encodinghex:
Note that base64.encodestring, urllib.quote_plus, base64.decodestring and urllib.unquote_plus are functions, while encodinghex and decodinghex are plain strings.
What you've done doesn't seem to make any sense at all. You have two text strings, encodinghex and decodinghex, which you're using eval to turn into code to execute. But in your mapping dict, alongside those, you've also got various actual functions, which you're also trying to pass to eval. That is bound to fail in itself, but even before that, your code tries to concatenate the existing text string with the actual function value, which is impossible.
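One way to avoid mixing strings and functions entirely (a sketch, not the original code; item and var stand in for the GUI values) is to store only callables in the mapping, so eval is never needed:

import base64
import urllib

# Every value is a callable, so the result of the lookup can always be
# applied the same way: process(var).
mapping = {
    "encode_b64": base64.encodestring,
    "encode_url": urllib.quote_plus,
    "encode_hex": lambda s: s.encode("hex"),
    "decode_b64": base64.decodestring,
    "decode_url": urllib.unquote_plus,
    "decode_hex": lambda s: s.decode("hex"),
}

item, var = "encode_hex", "some text"     # placeholders for the GUI values
process = mapping.get(item, lambda s: s)  # identity function as the default
print process(var)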
Judging from the last line of your sample, you have a function called process() somewhere. Yet you try to access it as if it were an iterable in the line if encodinghex in process. To fix the error, change the name of either the function or the iterable.
Consider the following Python script, which uses SQLAlchemy and the Python multiprocessing module.
This is with Python 2.6.6-8+b1(default) and SQLAlchemy 0.6.3-3 (default) on Debian squeeze.
This is a simplified version of some actual code.
import multiprocessing
from sqlalchemy import *
from sqlalchemy.orm import *

dbuser = ...
password = ...
dbname = ...
dbstring = "postgresql://%s:%s@localhost:5432/%s" % (dbuser, password, dbname)
db = create_engine(dbstring)
m = MetaData(db)

def make_foo(i):
    t1 = Table('foo%s' % i, m, Column('a', Integer, primary_key=True))

conn = db.connect()
for i in range(10):
    conn.execute("DROP TABLE IF EXISTS foo%s" % i)
conn.close()

db.dispose()

for i in range(10):
    make_foo(i)

m.create_all()

def do(kwargs):
    i, dbstring = kwargs['i'], kwargs['dbstring']
    db = create_engine(dbstring)
    Session = scoped_session(sessionmaker())
    Session.configure(bind=db)
    Session.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")
    Session.commit()
    db.dispose()

pool = multiprocessing.Pool(processes=5) # start 4 worker processes
results = []
arglist = []
for i in range(10):
    arglist.append({'i':i, 'dbstring':dbstring})

r = pool.map_async(do, arglist, callback=results.append) # evaluate "f(10)" asynchronously
r.get()
r.wait()
pool.close()
pool.join()
This script hangs with the following error message.
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 259, in _handle_results
    task = get()
TypeError: ('__init__() takes at least 4 arguments (2 given)', <class 'sqlalchemy.exc.ProgrammingError'>, ('(ProgrammingError) syntax error at or near "%"\nLINE 1: COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;\n ^\n',))
Of course, the syntax error here is TRUNCATE foo%s;. My question is, why is the process hanging, and can I persuade it to exit with an error instead, without doing major surgery to my code? This behavior is very similar to that of my actual code.
Note that the hang does not occur if the statement is replaced by something like print foobarbaz. Also, the hang still happens if we replace
Session.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")
Session.commit()
db.dispose()
by just Session.execute("TRUNCATE foo%s;")
I'm using the former version because it is closer to what my actual code is doing.
Also, removing multiprocessing from the picture and looping over the tables serially makes the hang go away, and it just exits with an error.
I'm also kind of puzzled by the form of the error, particularly the TypeError: ('__init__() takes at least 4 arguments (2 given)' bit. Where is this error coming from? It seems likely it is from somewhere in the multiprocessing code.
The PostgreSQL logs aren't helpful. I see lots of lines like
2012-01-09 14:16:34.174 IST [7810] 4f0aa96a.1e82/1 12/583 0 ERROR: syntax error at or near "%" at character 28
2012-01-09 14:16:34.175 IST [7810] 4f0aa96a.1e82/2 12/583 0 STATEMENT: COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;
but nothing else that seems relevant.
UPDATE 1: Thanks to lbolla and his insightful analysis, I was able to file a Python bug report about this.
See sbt's analysis in that report, and also here. See also the Python bug report Fix exception pickling. So, following sbt's explanation, we can reproduce the original error with
import sqlalchemy.exc
e = sqlalchemy.exc.ProgrammingError("", {}, None)
type(e)(*e.args)
which gives
Traceback (most recent call last):
  File "<stdin>", line 9, in <module>
TypeError: __init__() takes at least 4 arguments (2 given)
UPDATE 2: This has been fixed, at least for SQLAlchemy, by Mike Bayer, see the bug report StatementError Exceptions un-pickable.. Per Mike's suggestion, I also reported a similar bug to psycopg2, though I didn't (and don't) have an actual example of breakage. Regardless, they have apparently fixed it, though they gave no details of the fix. See psycopg exceptions cannot be pickled. For good measure, I also reported a Python bug ConfigParser exceptions are not pickleable corresponding to the SO question lbolla mentioned. It seems they want a test for this.
Anyway, this looks like it will continue to be a problem in the foreseeable future, since, by and large, Python developers don't seem to be aware of this issue and so don't guard against it. Surprisingly, it seems that there are not enough people using multiprocessing for this to be a well known issue, or maybe they just put up with it. I hope the Python developers get around to fixing it at least for Python 3, because it is annoying.
I accepted lbolla's answer, as without his explanation of how the problem was related to exception handling, I would likely have gone nowhere in understanding this. I also want to thank sbt, who explained that Python not being able to pickle exceptions was the problem. I'm very grateful to both of them, and please vote their answers up. Thanks.
UPDATE 3: I posted a followup question: Catching unpickleable exceptions and re-raising.
I believe the TypeError comes from multiprocessing's get.
I've stripped out all the DB code from your script. Take a look at this:
import multiprocessing
import sqlalchemy.exc

def do(kwargs):
    i = kwargs['i']
    print i
    raise sqlalchemy.exc.ProgrammingError("", {}, None)
    return i

pool = multiprocessing.Pool(processes=5) # start 4 worker processes
results = []
arglist = []
for i in range(10):
    arglist.append({'i':i})

r = pool.map_async(do, arglist, callback=results.append) # evaluate "f(10)" asynchronously

# Use get or wait?
# r.get()
r.wait()
pool.close()
pool.join()
print results
Using r.wait returns the result expected, but using r.get raises TypeError. As described in Python's docs, use r.wait after a map_async.
Edit: I have to amend my previous answer. I now believe the TypeError comes from SQLAlchemy. I've amended my script to reproduce the error.
Edit 2: It looks like the problem is that multiprocessing.pool does not play well if any worker raises an Exception whose constructor requires a parameter (see also here).
I've amended my script to highlight this.
import multiprocessing

class BadExc(Exception):
    def __init__(self, a):
        '''Non-optional param in the constructor.'''
        self.a = a

class GoodExc(Exception):
    def __init__(self, a=None):
        '''Optional param in the constructor.'''
        self.a = a

def do(kwargs):
    i = kwargs['i']
    print i
    raise BadExc('a')
    # raise GoodExc('a')
    return i

pool = multiprocessing.Pool(processes=5)
results = []
arglist = []
for i in range(10):
    arglist.append({'i':i})

r = pool.map_async(do, arglist, callback=results.append)

try:
    # set a timeout in order to be able to catch C-c
    r.get(1e100)
except KeyboardInterrupt:
    pass

print results
In your case, given that your code raises an SQLAlchemy exception, the only solution I can think of is to catch all the exceptions in the do function and re-raise a normal Exception instead. Something like this:
import multiprocessing

class BadExc(Exception):
    def __init__(self, a):
        '''Non-optional param in the constructor.'''
        self.a = a

def do(kwargs):
    try:
        i = kwargs['i']
        print i
        raise BadExc('a')
        return i
    except Exception as e:
        raise Exception(repr(e))

pool = multiprocessing.Pool(processes=5)
results = []
arglist = []
for i in range(10):
    arglist.append({'i':i})

r = pool.map_async(do, arglist, callback=results.append)

try:
    # set a timeout in order to be able to catch C-c
    r.get(1e100)
except KeyboardInterrupt:
    pass

print results
Edit 3: so, it seems to be a bug in Python, but proper exceptions in SQLAlchemy would work around it: hence, I've raised the issue with SQLAlchemy, too.
As a workaround for the problem, I think the solution at the end of Edit 2 would do (wrapping the worker in try-except and re-raising).
The TypeError: ('__init__() takes at least 4 arguments (2 given)') error isn't related to the SQL you're trying to execute; it has to do with how you're using SQLAlchemy's API.
The trouble is that you're trying to call execute on the session class rather than on an instance of that session.
Try this:
session = Session()
session.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")
session.commit()
From the docs:
It is intended that the sessionmaker() function be called within the
global scope of an application, and the returned class be made
available to the rest of the application as the single class used to
instantiate sessions.
So Session = sessionmaker() returns a new session class and session = Session() returns an instance of that class which you can then call execute on.
I don't know about the cause of the original exception. However, multiprocessing's problems with "bad" exceptions are really down to how pickling works. I think the sqlalchemy exception classes are broken.
If an exception class has an __init__() method which does not call BaseException.__init__() (directly or indirectly) then self.args probably will not be set properly. BaseException.__reduce__() (which is used by the pickle protocol) assumes that a copy of an exception e can be recreated by just doing
type(e)(*e.args)
For example
>>> e = ValueError("bad value")
>>> e
ValueError('bad value',)
>>> type(e)(*e.args)
ValueError('bad value',)
If this invariant does not hold then pickling/unpickling will fail. So instances of
class BadExc(Exception):
    def __init__(self, a):
        '''Non-optional param in the constructor.'''
        self.a = a
can be pickled, but the result cannot be unpickled:
>>> from cPickle import loads, dumps
>>> class BadExc(Exception):
...     def __init__(self, a):
...         '''Non-optional param in the constructor.'''
...         self.a = a
...
>>> loads(dumps(BadExc(1)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ('__init__() takes exactly 2 arguments (1 given)', <class '__main__.BadExc'>, ())
But instances of
class GoodExc1(Exception):
    def __init__(self, a):
        '''Non-optional param in the constructor.'''
        Exception.__init__(self, a)
        self.a = a
or
class GoodExc2(Exception):
    def __init__(self, a):
        '''Non-optional param in the constructor.'''
        self.args = (a,)
        self.a = a
can be successfully pickled/unpickled.
So you should ask the developers of sqlalchemy to fix their exception classes. In the meantime you can probably use copy_reg.pickle() to override BaseException.__reduce__() for the troublesome classes.
(This is in answer to Faheem Mitha's question in a comment about how to use copy_reg to work around the broken exception classes.)
The __init__() methods of SQLAlchemy's exception classes seem to call their base class's __init__() methods, but with different arguments. This mucks up pickling.
To customise the pickling of sqlalchemy's exception classes you can use copy_reg to register your own reduce functions for those classes.
A reduce function takes an argument obj and returns a pair (callable_obj, args) such that a copy of obj can be created by doing callable_obj(*args). For example
class StatementError(SQLAlchemyError):
    def __init__(self, message, statement, params, orig):
        SQLAlchemyError.__init__(self, message)
        self.statement = statement
        self.params = params
        self.orig = orig
    ...
can be "fixed" by doing
import copy_reg, sqlalchemy.exc

def reduce_StatementError(e):
    message = e.args[0]
    args = (message, e.statement, e.params, e.orig)
    return (type(e), args)

copy_reg.pickle(sqlalchemy.exc.StatementError, reduce_StatementError)
There are several other classes in sqlalchemy.exc which need to be fixed similarly. But hopefully you get the idea.
On second thoughts, rather than fixing each class individually, you can probably just monkey patch the __reduce__() method of the base exception class:
import sqlalchemy.exc

def rebuild_exc(cls, args, dic):
    e = Exception.__new__(cls)
    e.args = args
    e.__dict__.update(dic)
    return e

def __reduce__(e):
    return (rebuild_exc, (type(e), e.args, e.__dict__))

sqlalchemy.exc.SQLAlchemyError.__reduce__ = __reduce__
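Assuming the monkey patch above is in place, a quick sanity check (a sketch) that the round-trip now succeeds:

import pickle
import sqlalchemy.exc

e = sqlalchemy.exc.ProgrammingError("", {}, None)  # same instance as in the question
e2 = pickle.loads(pickle.dumps(e))  # rebuild_exc bypasses __init__, so this works
print type(e2), e2.args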