Hang in Python script using SQLAlchemy and multiprocessing - python

Consider the following Python script, which uses SQLAlchemy and the Python multiprocessing module.
This is with Python 2.6.6-8+b1(default) and SQLAlchemy 0.6.3-3 (default) on Debian squeeze.
This is a simplified version of some actual code.
import multiprocessing
from sqlalchemy import *
from sqlalchemy.orm import *
dbuser = ...
password = ...
dbname = ...
dbstring = "postgresql://%s:%s#localhost:5432/%s"%(dbuser, password, dbname)
db = create_engine(dbstring)
m = MetaData(db)
def make_foo(i):
t1 = Table('foo%s'%i, m, Column('a', Integer, primary_key=True))
conn = db.connect()
for i in range(10):
conn.execute("DROP TABLE IF EXISTS foo%s"%i)
conn.close()
db.dispose()
for i in range(10):
make_foo(i)
m.create_all()
def do(kwargs):
i, dbstring = kwargs['i'], kwargs['dbstring']
db = create_engine(dbstring)
Session = scoped_session(sessionmaker())
Session.configure(bind=db)
Session.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")
Session.commit()
db.dispose()
pool = multiprocessing.Pool(processes=5) # start 4 worker processes
results = []
arglist = []
for i in range(10):
arglist.append({'i':i, 'dbstring':dbstring})
r = pool.map_async(do, arglist, callback=results.append) # evaluate "f(10)" asynchronously
r.get()
r.wait()
pool.close()
pool.join()
This script hangs with the following error message.
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.6/multiprocessing/pool.py", line 259, in _handle_results
task = get()
TypeError: ('__init__() takes at least 4 arguments (2 given)', <class 'sqlalchemy.exc.ProgrammingError'>, ('(ProgrammingError) syntax error at or near "%"\nLINE 1: COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;\n ^\n',))
Of course, the syntax error here is TRUNCATE foo%s;. My question is, why is the process hanging, and can I persuade it to exit with an error instead, without doing major surgery to my code? This behavior is very similar to that of my actual code.
Note that the hang does not occur if the statement is replaced by something like print foobarbaz. Also, the hang still happens if we replace
Session.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")
Session.commit()
db.dispose()
by just Session.execute("TRUNCATE foo%s;")
I'm using the former version because it is closer to what my actual code is doing.
Also, removing multiprocessing from the picture and looping over the tables serially makes the hang go away, and it just exits with an error.
I'm also kind of puzzled by the form of the error, particularly the TypeError: ('__init__() takes at least 4 arguments (2 given)' bit. Where is this error coming from? It seems likely it is from somewhere in the multiprocessing code.
The PostgreSQL logs aren't helpful. I see lots of lines like
2012-01-09 14:16:34.174 IST [7810] 4f0aa96a.1e82/1 12/583 0 ERROR: syntax error at or near "%" at character 28
2012-01-09 14:16:34.175 IST [7810] 4f0aa96a.1e82/2 12/583 0 STATEMENT: COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;
but nothing else that seems relevant.
UPDATE 1: Thanks to lbolla and his insightful analysis, I was able to file a Python bug report about this.
See sbt's analysis in that report, and also here. See also the Python bug report Fix exception pickling. So, following sbt's explanation, we can reproduce the original error with
import sqlalchemy.exc
e = sqlalchemy.exc.ProgrammingError("", {}, None)
type(e)(*e.args)
which gives
Traceback (most recent call last):
File "<stdin>", line 9, in <module>
TypeError: __init__() takes at least 4 arguments (2 given)
UPDATE 2: This has been fixed, at least for SQLAlchemy, by Mike Bayer, see the bug report StatementError Exceptions un-pickable.. Per Mike's suggestion, I also reported a similar bug to psycopg2, though I didn't (and don't) have an actual example of breakage. Regardless, they have apparently fixed it, though they gave no details of the fix. See psycopg exceptions cannot be pickled. For good measure, I also reported a Python bug ConfigParser exceptions are not pickleable corresponding to the SO question lbolla mentioned. It seems they want a test for this.
Anyway, this looks like it will continue to be a problem in the foreseeable future, since, by and large, Python developers don't seem to be aware of this issue and so don't guard against it. Surprisingly, it seems that there are not enough people using multiprocessing for this to be a well known issue, or maybe they just put up with it. I hope the Python developers get around to fixing it at least for Python 3, because it is annoying.
I accepted lbolla's answer, as without his explanation of how the problem was related to exception handling, I would likely have gone nowhere in understanding this. I also want to thank sbt, who explained that Python not being able to pickle exceptions was the problem. I'm very grateful to both of them, and please vote their answers up. Thanks.
UPDATE 3: I posted a followup question: Catching unpickleable exceptions and re-raising.

I believe the TypeError comes from multiprocessing's get.
I've stripped out all the DB code from your script. Take a look at this:
import multiprocessing
import sqlalchemy.exc
def do(kwargs):
i = kwargs['i']
print i
raise sqlalchemy.exc.ProgrammingError("", {}, None)
return i
pool = multiprocessing.Pool(processes=5) # start 4 worker processes
results = []
arglist = []
for i in range(10):
arglist.append({'i':i})
r = pool.map_async(do, arglist, callback=results.append) # evaluate "f(10)" asynchronously
# Use get or wait?
# r.get()
r.wait()
pool.close()
pool.join()
print results
Using r.wait returns the result expected, but using r.get raises TypeError. As describe in python's docs, use r.wait after a map_async.
Edit: I have to amend my previous answer. I now believe the TypeError comes from SQLAlchemy. I've amended my script to reproduce the error.
Edit 2: It looks like the problem is that multiprocessing.pool does not play well if any worker raises an Exception whose constructor requires a parameter (see also here).
I've amended my script to highlight this.
import multiprocessing
class BadExc(Exception):
def __init__(self, a):
'''Non-optional param in the constructor.'''
self.a = a
class GoodExc(Exception):
def __init__(self, a=None):
'''Optional param in the constructor.'''
self.a = a
def do(kwargs):
i = kwargs['i']
print i
raise BadExc('a')
# raise GoodExc('a')
return i
pool = multiprocessing.Pool(processes=5)
results = []
arglist = []
for i in range(10):
arglist.append({'i':i})
r = pool.map_async(do, arglist, callback=results.append)
try:
# set a timeout in order to be able to catch C-c
r.get(1e100)
except KeyboardInterrupt:
pass
print results
In your case, given that your code raises an SQLAlchemy exception, the only solution I can think of is to catch all the exceptions in the do function and re-raise a normal Exception instead. Something like this:
import multiprocessing
class BadExc(Exception):
def __init__(self, a):
'''Non-optional param in the constructor.'''
self.a = a
def do(kwargs):
try:
i = kwargs['i']
print i
raise BadExc('a')
return i
except Exception as e:
raise Exception(repr(e))
pool = multiprocessing.Pool(processes=5)
results = []
arglist = []
for i in range(10):
arglist.append({'i':i})
r = pool.map_async(do, arglist, callback=results.append)
try:
# set a timeout in order to be able to catch C-c
r.get(1e100)
except KeyboardInterrupt:
pass
print results
Edit 3: so, it seems to be a bug with Python, but proper exceptions in SQLAlchemy would workaround it: hence, I've raised the issue with SQLAlchemy, too.
As a workaround the problem, I think the solution at the end of Edit 2 would do (wrapping callbacks in try-except and re-raise).

The TypeError: ('__init__() takes at least 4 arguments (2 given) error isn't related to the sql you're trying to execute, it has to do with how you're using SqlAlchemy's API.
The trouble is that you're trying to call execute on the session class rather than an instance of that session.
Try this:
session = Session()
session.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")
session.commit()
From the docs:
It is intended that the sessionmaker() function be called within the
global scope of an application, and the returned class be made
available to the rest of the application as the single class used to
instantiate sessions.
So Session = sessionmaker() returns a new session class and session = Session() returns an instance of that class which you can then call execute on.

I don't know about the cause of the original exception. However, multiprocessing's problems with "bad" exceptions is really down to how pickling works. I think the sqlachemy exception class is broken.
If an exception class has an __init__() method which does not call BaseException.__init__() (directly or indirectly) then self.args probably will not be set properly. BaseException.__reduce__() (which is used by the pickle protocol) assumes that a copy of an exception e can be recreated by just doing
type(e)(*e.args)
For example
>>> e = ValueError("bad value")
>>> e
ValueError('bad value',)
>>> type(e)(*e.args)
ValueError('bad value',)
If this invariant does not hold then pickling/unpickling will fail. So instances of
class BadExc(Exception):
def __init__(self, a):
'''Non-optional param in the constructor.'''
self.a = a
can be pickled, but the result cannot be unpickled:
>>> from cPickle import loads, dumps
>>> class BadExc(Exception):
... def __init__(self, a):
... '''Non-optional param in the constructor.'''
... self.a = a
...
>>> loads(dumps(BadExc(1)))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ('__init__() takes exactly 2 arguments (1 given)', <class '__main__.BadExc'>, ())
But instances of
class GoodExc1(Exception):
def __init__(self, a):
'''Non-optional param in the constructor.'''
Exception.__init__(self, a)
self.a = a
or
class GoodExc2(Exception):
def __init__(self, a):
'''Non-optional param in the constructor.'''
self.args = (a,)
self.a = a
can be successfully pickled/unpickled.
So you should ask the developers of sqlalchemy to fix their exception classes. In the mean time you can probably use copy_reg.pickle() to override BaseException.__reduce__() for the troublesome classes.

(This is in answer to Faheem Mitha's question in a comment about how to use copy_reg to work around the broken exception classes.)
The __init__() methods of SQLAlchemy's exception classes seem to call their base class's __init__() methods, but with different arguments. This mucks up pickling.
To customise the pickling of sqlalchemy's exception classes you can use copy_reg to register your own reduce functions for those classes.
A reduce function takes an argument obj and returns a pair (callable_obj, args) such that a copy of obj can be created by doing callable_obj(*args). For example
class StatementError(SQLAlchemyError):
def __init__(self, message, statement, params, orig):
SQLAlchemyError.__init__(self, message)
self.statement = statement
self.params = params
self.orig = orig
...
can be "fixed" by doing
import copy_reg, sqlalchemy.exc
def reduce_StatementError(e):
message = e.args[0]
args = (message, e.statement, e.params, e.orig)
return (type(e), args)
copy_reg.pickle(sqlalchemy.exc.StatementError, reduce_StatementError)
There are several other classes in sqlalchemy.exc which need to be fixed similarly. But hopefully you get the idea.
On second thoughts, rather than fixing each class individually, you can probably just monkey patch the __reduce__() method of the base exception class:
import sqlalchemy.exc
def rebuild_exc(cls, args, dic):
e = Exception.__new__(cls)
e.args = args
e.__dict__.update(dic)
return e
def __reduce__(e):
return (rebuild_exc, (type(e), e.args, e.__dict__))
sqlalchemy.exc.SQLAlchemyError.__reduce__ = __reduce__

Related

Assertion Error when logging.exception(error)

I have this function in a script called mymodule.py
import logging
def foo():
try:
raise ConnectionError('My Connection Error')
except ConnectionError as ce:
logging.exception(ce)
And I have the test for it called test_mymodule.py:
import unittest
import unittest.mock as um
import mymodule
class TestLoggingException(unittest.TestCase):
#um.patch('mymodule.logging')
def test_connection_error_correctly_logged_without_raising(self, mock_logging):
mymodule.foo()
mock_logging.assert_has_calls(
[um.call(ConnectionError('My Connection Error'))]
)
However, when running test_mymodule.py, the below assertion error is raised.
AssertionError: Calls not found.
Expected: [call(ConnectionError('My Connection Error'))]
Actual: [call(ConnectionError('My Connection Error'))]
Why is it thinking they are different and how could I work around this?
The problem is that two instances of ConnectionError, even if create with the same arguments, are not equal.
You create two instances in your code : in foo and in the um.call().
However, those two instance are not the same, and are therefore not equal. You can illustrate that simply:
>>> ConnectionError("test") == ConnectionError("test")
False
One solution is to check which calls were made to the mockup. The calls are exposed through a variable called mockup_calls.
Something like this
class TestLoggingException(unittest.TestCase):
#um.patch('mymodule.logging')
def test_connection_error_correctly_logged_without_raising(self, mock_logging):
mymodule.foo()
print("Calls are: ", mock_logging.mock_calls)
# Check that logging was called with logging.exception(ConnectionError("My Connection Error"))
calls = mock_logging.mock_calls
assert(len(calls) == 1)
# Unpack call
function_called, args, kwargs = calls[0]
assert(function_called == "exception")
connection_error = args[0]
assert(isinstance(connection_error, ConnectionError))
assert(connection_error.args[0] == "My Connection Error") # This will depend on how ConnectionError is defined
What is tricky about this example is that it would work with types that evaluate equal even if they are not the same, like str("hi" == "hi" will yield True), but not most classes.
Does it help ?

concurrent.futures.ThreadPoolExecutor doesn't print errors

I am trying to use concurrent.futures.ThreadPoolExecutor module to run a class method in parallel, the simplified version of my code is pretty much the following:
class TestClass:
def __init__(self, secondsToSleepFor):
self.secondsToSleepFor = secondsToSleepFor
def testMethodToExecInParallel(self):
print("ThreadName: " + threading.currentThread().getName())
print(threading.currentThread().getName() + " is sleeping for " + str(self.secondsToSleepFor) + " seconds")
time.sleep(self.secondsToSleepFor)
print(threading.currentThread().getName() + " has finished!!")
with concurrent.futures.ThreadPoolExecutor(max_workers = 2) as executor:
futuresList = []
print("before try")
try:
testClass = TestClass(3)
future = executor.submit(testClass.testMethodToExecInParallel)
futuresList.append(future)
except Exception as exc:
print('Exception generated: %s' % exc)
If I execute this code it seems to behave like it is intended to.
But if I make a mistake like specifying a wrong number of parameters in "testMethodToExecInParallel" like:
def testMethodToExecInParallel(self, secondsToSleepFor):
and then still submitting the function as:
future = executor.submit(testClass.testMethodToExecInParallel)
or trying to concatenate a string object with an integer object (without using str(.) ) inside a print statement in "testMethodToExecInParallel" method:
def testMethodToExecInParallel(self):
print("ThreadName: " + threading.currentThread().getName())
print("self.secondsToSleepFor: " + self.secondsToSleepFor) <-- Should report an Error here
the program doesn't return any error; just prints "before try" and ends execution...
Is trivial to understand that this makes the program nearly undebuggable... Could someone explain me why such behaviour happens?
(for the first case of mistake) concurrent.futures.ThreadPoolExecutor doesn't check for a function with the specified signature to submit and, eventually, throw some sort of "noSuchFunction" exception?
Maybe there is some sort of problem in submitting to ThreadPoolExecutor class methods instead of simple standalone functions and, so, such behaviour could be expected?
Or maybe the error is thrown inside the thread and for some reason I can't read it?
-- EDIT --
Akshay.N suggestion of inserting future.result() after submitting functions to ThreadPoolExecutor makes the program behave as expected: goes nice if the code is correct, prints the error if something in the code is wrong.
I thing users must be warned about this very strange behaviour of ThreadPoolExecutor:
if you only submit functions to ThreadPoolExecutor WITHOUT THEN CALLING future.result():
- if the code is correct, the program goes on and behaves as expected
- if something in the code is wrong seems the program doesn't call the submitted function, whatever it does: it doesn't report the errors in the code
As far as my knowledge goes which is "not so far", you have to call "e.results()" after "executor.submit(testClass.testMethodToExecInParallel)" in order to execute the threadpool .
I have tried what you said and it is giving me error, below is the code
>>> import concurrent.futures as cf
>>> executor = cf.ThreadPoolExecutor(1)
>>> def a(x,y):
... print(x+y)
...
>>> future = executor.submit(a, 2, 35, 45)
>>> future.result()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\username
\AppData\Local\Programs\Python\Python37\lib\concurrent\futures\_base.py", line
425, in result
return self.__get_result()
File "C:\Users\username
\AppData\Local\Programs\Python\Python37\lib\concurrent\futures\_base.py", line
384, in __get_result
raise self._exception
File "C:\Users\username
\AppData\Local\Programs\Python\Python37\lib\concurrent\futures\thread.py", line
57, in run
result = self.fn(*self.args, **self.kwargs)
TypeError: a() takes 2 positional arguments but 3 were given
Let me know if it still doesn't work

Python 3.6: "AttributeError: 'SimpleQueue' object has no attribute '_poll'"

When I use a standard Queue to send samples to a process, everything works fine. However, since my needs are simple, I tried to use a SimpleQueue and for some reason the 'empty' method doesn't work. Here's the details:
Error comes from the consumer process (when sample_queue is Queue, everything works, when sample_queue is SimpleQueue, things break):
def frame_update(i):
while not self.sample_queue.empty():
sample = self.sample_queue.get()
for line in lines:
While executing sample_queue.empty() -- SimpleQueue.empty(), from Python 3.6 on windows (queues.py) we get:
def empty(self):
return not self._poll()
Where self._poll() has been set in init by:
def __init__(self, *, ctx):
self._reader, self._writer = connection.Pipe(duplex=False)
self._rlock = ctx.Lock()
self._poll = self._reader.poll
if sys.platform == 'win32':
self._wlock = None
else:
self._wlock = ctx.Lock()
So I follow the self._reader which is set from connection.Pipe (connection.py):
...
c1 = PipeConnection(h1, writable=duplex)
c2 = PipeConnection(h2, readable=duplex)
Ok, great. The _reader is going to be a PipeConnection and pipe connection has this method:
def _poll(self, timeout):
if (self._got_empty_message or
_winapi.PeekNamedPipe(self._handle)[0] != 0):
return True
return bool(wait([self], timeout))
Alright -- So a couple of questions:
1) Shouldn't the init of SimpleQueue be assigning self.poll to self._reader._poll instead of self._reader.poll? Or am I missing something in the inheritance hierarchy?
2) The PipeConnection _poll routine takes a timeout parameter, so #1 shouldn't work...
*) -- Is there some other binding of PipeConnection _poll that I'm missing?
Am I missing something? I am using Python3.6, Windows, debugging in PyCharm and I follow all the paths and they're in the standard multiprocessing paths. I'd appreciate any help or advice. Thanks!
EDIT: After further review, I can see that PipeConnection is a subclass of _ConnectionBase which does indeed have a 'poll' method and it is bound with a default timeout parameter.
So the question is: When SimpleQueue is initializing and sets
self._poll = self._reader.poll
Why doesn't it go up the class hierarchy to grab that from _ConnectionBase?
After looking at why the Queue type works and why the SimpleQueue doesn't, I found that Queue sets the _poll method 'after_fork' as well as before. SimpleQueue doesn't. By changing the setstate method to add self._poll = self._reader.poll as follows (queues.py, line 338), SimpleQueue works
def __setstate__(self, state):
(self._reader, self._writer, self._rlock, self._wlock) = state
self._poll = self._reader.poll
Seems like a bug to me unless I'm really misunderstanding something. I'll submit a bug report and reference this post. Hope this helps someone!
http://bugs.python.org/issue30301

pylint on in-memory file/stream

I'd like to embed pylint in a program. The user enters python programs (in Qt, in a QTextEdit, although not relevant) and in the background I call pylint to check the text he enters. Finally, I print the errors in a message box.
There are thus two questions: First, how can I do this without writing the entered text to a temporary file and giving it to pylint ? I suppose at some point pylint (or astroid) handles a stream and not a file anymore.
And, more importantly, is it a good idea ? Would it cause problems for imports or other stuffs ? Intuitively I would say no since it seems to spawn a new process (with epylint) but I'm no python expert so I'm really not sure. And if I use this to launch pylint, is it okay too ?
Edit:
I tried tinkering with pylint's internals, event fought with it, but finally have been stuck at some point.
Here is the code so far:
from astroid.builder import AstroidBuilder
from astroid.exceptions import AstroidBuildingException
from logilab.common.interface import implements
from pylint.interfaces import IRawChecker, ITokenChecker, IAstroidChecker
from pylint.lint import PyLinter
from pylint.reporters.text import TextReporter
from pylint.utils import PyLintASTWalker
class Validator():
def __init__(self):
self._messagesBuffer = InMemoryMessagesBuffer()
self._validator = None
self.initValidator()
def initValidator(self):
self._validator = StringPyLinter(reporter=TextReporter(output=self._messagesBuffer))
self._validator.load_default_plugins()
self._validator.disable('W0704')
self._validator.disable('I0020')
self._validator.disable('I0021')
self._validator.prepare_import_path([])
def destroyValidator(self):
self._validator.cleanup_import_path()
def check(self, string):
return self._validator.check(string)
class InMemoryMessagesBuffer():
def __init__(self):
self.content = []
def write(self, st):
self.content.append(st)
def messages(self):
return self.content
def reset(self):
self.content = []
class StringPyLinter(PyLinter):
"""Does what PyLinter does but sets checkers once
and redefines get_astroid to call build_string"""
def __init__(self, options=(), reporter=None, option_groups=(), pylintrc=None):
super(StringPyLinter, self).__init__(options, reporter, option_groups, pylintrc)
self._walker = None
self._used_checkers = None
self._tokencheckers = None
self._rawcheckers = None
self.initCheckers()
def __del__(self):
self.destroyCheckers()
def initCheckers(self):
self._walker = PyLintASTWalker(self)
self._used_checkers = self.prepare_checkers()
self._tokencheckers = [c for c in self._used_checkers if implements(c, ITokenChecker)
and c is not self]
self._rawcheckers = [c for c in self._used_checkers if implements(c, IRawChecker)]
# notify global begin
for checker in self._used_checkers:
checker.open()
if implements(checker, IAstroidChecker):
self._walker.add_checker(checker)
def destroyCheckers(self):
self._used_checkers.reverse()
for checker in self._used_checkers:
checker.close()
def check(self, string):
modname = "in_memory"
self.set_current_module(modname)
astroid = self.get_astroid(string, modname)
self.check_astroid_module(astroid, self._walker, self._rawcheckers, self._tokencheckers)
self._add_suppression_messages()
self.set_current_module('')
self.stats['statement'] = self._walker.nbstatements
def get_astroid(self, string, modname):
"""return an astroid representation for a module"""
try:
return AstroidBuilder().string_build(string, modname)
except SyntaxError as ex:
self.add_message('E0001', line=ex.lineno, args=ex.msg)
except AstroidBuildingException as ex:
self.add_message('F0010', args=ex)
except Exception as ex:
import traceback
traceback.print_exc()
self.add_message('F0002', args=(ex.__class__, ex))
if __name__ == '__main__':
code = """
a = 1
print(a)
"""
validator = Validator()
print(validator.check(code))
The traceback is the following:
Traceback (most recent call last):
File "validator.py", line 16, in <module>
main()
File "validator.py", line 13, in main
print(validator.check(code))
File "validator.py", line 30, in check
self._validator.check(string)
File "validator.py", line 79, in check
self.check_astroid_module(astroid, self._walker, self._rawcheckers, self._tokencheckers)
File "c:\Python33\lib\site-packages\pylint\lint.py", line 659, in check_astroid_module
tokens = tokenize_module(astroid)
File "c:\Python33\lib\site-packages\pylint\utils.py", line 103, in tokenize_module
print(module.file_stream)
AttributeError: 'NoneType' object has no attribute 'file_stream'
# And sometimes this is added :
File "c:\Python33\lib\site-packages\astroid\scoped_nodes.py", line 251, in file_stream
return open(self.file, 'rb')
OSError: [Errno 22] Invalid argument: '<?>'
I'll continue digging tomorrow. :)
I got it running.
the first one (NoneType …) is really easy and a bug in your code:
Encountering an exception can make get_astroid “fail”, i.e. send one syntax error message and return!
But for the secong one… such bullshit in pylint’s/logilab’s API… Let me explain: Your astroid object here is of type astroid.scoped_nodes.Module.
It’s also created by a factory, AstroidBuilder, which sets astroid.file = '<?>'.
Unfortunately, the Module class has following property:
#property
def file_stream(self):
if self.file is not None:
return open(self.file, 'rb')
return None
And there’s no way to skip that except for subclassing (Which would render us unable to use the magic in AstroidBuilder), so… monkey patching!
We replace the ill-defined property with one that checks an instance for a reference to our code bytes (e.g. astroid._file_bytes) before engaging in above default behavior.
def _monkeypatch_module(module_class):
if module_class.file_stream.fget.__name__ == 'file_stream_patched':
return # only patch if patch isn’t already applied
old_file_stream_fget = module_class.file_stream.fget
def file_stream_patched(self):
if hasattr(self, '_file_bytes'):
return BytesIO(self._file_bytes)
return old_file_stream_fget(self)
module_class.file_stream = property(file_stream_patched)
That monkeypatching can be called just before calling check_astroid_module. But one more thing has to be done. See, there’s more implicit behavior: Some checkers expect and use astroid’s file_encoding field. So we now have this code in the middle of check:
astroid = self.get_astroid(string, modname)
if astroid is not None:
_monkeypatch_module(astroid.__class__)
astroid._file_bytes = string.encode('utf-8')
astroid.file_encoding = 'utf-8'
self.check_astroid_module(astroid, self._walker, self._rawcheckers, self._tokencheckers)
One could say that no amount of linting creates actually good code. Unfortunately pylint unites enormous complexity with a specialization of calling it on files. Really good code has a nice native API and wraps that with a CLI interface. Don’t ask me why file_stream exists if internally, Module gets built from but forgets the source code.
PS: i had to change sth else in your code: load_default_plugins has to come before some other stuff (maybe prepare_checkers, maybe sth. else)
PPS: i suggest subclassing BaseReporter and using that instead of your InMemoryMessagesBuffer
PPPS: this just got pulled (3.2014), and will fix this: https://bitbucket.org/logilab/astroid/pull-request/15/astroidbuilderstring_build-was/diff
4PS: this is now in the official version, so no monkey patching required: astroid.scoped_nodes.Module now has a file_bytes property (without leading underscore).
Working with an unlocatable stream may definitly cause problems in case of relative imports, since the location is then needed to find the actually imported module.
Astroid support building an AST from a stream, but this is not used/exposed through Pylint which is a level higher and designed to work with files. So while you may acheive this it will need a bit of digging into the low-level APIs.
The easiest way is definitly to save the buffer to the file then to use the SA answer to start pylint programmatically if you wish (totally forgot this other account of mine found in other responses ;). Another option being to write a custom reporter to gain more control.

Why does this function not work when used as a decorator?

UPDATE: As noted by Mr. Fooz, the functional version of the wrapper has a bug, so I reverted to the original class implementation. I've put the code up on GitHub:
https://github.com/nofatclips/timeout/commits/master
There are two commits, one working (using the "import" workaround) the second one broken.
The source of the problem seems to be the pickle#dumps function, which just spits out an identifier when called on an function. By the time I call Process, that identifier points to the decorated version of the function, rather than the original one.
ORIGINAL MESSAGE:
I was trying to write a function decorator to wrap a long task in a Process that would be killed if a timeout expires. I came up with this (working but not elegant) version:
from multiprocessing import Process
from threading import Timer
from functools import partial
from sys import stdout
def safeExecution(function, timeout):
thread = None
def _break():
#stdout.flush()
#print (thread)
thread.terminate()
def start(*kw):
timer = Timer(timeout, _break)
timer.start()
thread = Process(target=function, args=kw)
ret = thread.start() # TODO: capture return value
thread.join()
timer.cancel()
return ret
return start
def settimeout(timeout):
return partial(safeExecution, timeout=timeout)
##settimeout(1)
def calculatePrimes(maxPrimes):
primes = []
for i in range(2, maxPrimes):
prime = True
for prime in primes:
if (i % prime == 0):
prime = False
break
if (prime):
primes.append(i)
print ("Found prime: %s" % i)
if __name__ == '__main__':
print (calculatePrimes)
a = settimeout(1)
calculatePrime = a(calculatePrimes)
calculatePrime(24000)
As you can see, I commented out the decorator and assigned the modified version of calculatePrimes to calculatePrime. If I tried to reassign it to the same variable, I'd get a "Can't pickle : attribute lookup builtins.function failed" error when trying to call the decorated version.
Anybody has any idea of what is happening under the hood? Is the original function being turned into something different when I assign the decorated version to the identifier referencing it?
UPDATE: To reproduce the error, I just change the main part to
if __name__ == '__main__':
print (calculatePrimes)
a = settimeout(1)
calculatePrimes = a(calculatePrimes)
calculatePrimes(24000)
#sleep(2)
which yields:
Traceback (most recent call last):
File "c:\Users\mm\Desktop\ING.SW\python\thread2.py", line 49, in <module>
calculatePrimes(24000)
File "c:\Users\mm\Desktop\ING.SW\python\thread2.py", line 19, in start
ret = thread.start()
File "C:\Python33\lib\multiprocessing\process.py", line 111, in start
self._popen = Popen(self)
File "C:\Python33\lib\multiprocessing\forking.py", line 241, in __init__
dump(process_obj, to_child, HIGHEST_PROTOCOL)
File "C:\Python33\lib\multiprocessing\forking.py", line 160, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtin
s.function failed
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Python33\lib\multiprocessing\forking.py", line 344, in main
self = load(from_parent)
EOFError
P.S. I also wrote a class version of safeExecution, which has exactly the same behaviour.
Move the function to a module that's imported by your script.
Functions are only picklable in python if they're defined at the top level of a module. Ones defined in scripts are not picklable by default. Module-based functions are pickled as two strings: the name of the module, and the name of the function. They're unpickled by dynamically importing the module then looking up the function object by name (hence the restriction on top-level-only functions).
It's possible to extend the pickle handlers to support semi-generic function and lambda pickling, but doing so can be tricky. In particular, it can be difficult to reconstruct the full namespace tree if you want to properly handle things like decorators and nested functions. If you want to do this, it's best to use Python 2.7 or later or Python 3.3 or later (earlier versions have a bug in the dispatcher of cPickle and pickle that's unpleasant to work around).
Is there an easy way to pickle a python function (or otherwise serialize its code)?
Python: pickling nested functions
http://bugs.python.org/issue7689
EDIT:
At least in Python 2.6, the pickling works fine for me if the script only contains the if __name__ block, the script imports calculatePrimes and settimeout from a module, and if the inner start function's name is monkey-patched:
def safeExecution(function, timeout):
...
def start(*kw):
...
start.__name__ = function.__name__ # ADD THIS LINE
return start
There's a second problem that's related to Python's variable scoping rules. The assignment to the thread variable inside start creates a shadow variable whose scope is limited to one evaluation of the start function. It does not assign to the thread variable found in the enclosing scope. You can't use the global keyword to override the scope because you want and intermediate scope and Python only has full support for manipulating the local-most and global-most scopes, not any intermediate ones. You can overcome this problem by placing the thread object in a container that's housed in the intermediate scope. Here's how:
def safeExecution(function, timeout):
thread_holder = [] # MAKE IT A CONTAINER
def _break():
#stdout.flush()
#print (thread)
thread_holder[0].terminate() # REACH INTO THE CONTAINER
def start(*kw):
...
thread = Process(target=function, args=kw)
thread_holder.append(thread) # MUTATE THE CONTAINER
...
start.__name__ = function.__name__ # MAKES THE PICKLING WORK
return start
Not sure really why you get that problem, but to answer your title question: Why does the decorator not work?
When you pass arguments to a decorator, you need to structure the code slightly different. Essentially you have to implement the decorator as a class with an __init__ and an __call__.
In the init, you collect the arguments that you send to the decorator, and in the call, you'll get the function you decorate:
class settimeout(object):
def __init__(self, timeout):
self.timeout = timeout
def __call__(self, func):
def wrapped_func(n):
func(n, self.timeout)
return wrapped_func
#settimeout(1)
def func(n, timeout):
print "Func is called with", n, 'and', timeout
func(24000)
This should get you going on the decorator front at least.

Categories

Resources