Does sending a dictionary through a multiprocessing.Queue mutate it somehow?

I have a setup where I send a dictionary through a multiprocessing.Queue and do some stuff with it. I was getting an odd "dictionary changed size during iteration" error even though I wasn't changing anything in the dictionary. Here's the traceback, although it's not terribly helpful:
Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/queues.py", line 242, in _feed
    send(obj)
RuntimeError: dictionary changed size during iteration
So I tried changing the dictionary to an immutable dictionary to see where it was getting altered. Here's the traceback I got:
Traceback (most recent call last):
  File "/home/jason/src/interface_dev/jiva_interface/jiva_interface/delta.py", line 54, in main
    msg = self.recv()
  File "/home/jason/src/interface_dev/jiva_interface/jiva_interface/process/__init__.py", line 65, in recv
    return self.inqueue.get(timeout=timeout)
  File "/usr/lib/python2.6/multiprocessing/queues.py", line 91, in get
    res = self._recv()
  File "build/bdist.linux-i686/egg/pysistence/persistent_dict.py", line 22, in not_implemented_method
    raise NotImplementedError, 'Cannot set values in a PDict'
NotImplementedError: Cannot set values in a PDict
This is a bit odd, because as far as I can tell, I'm not doing anything other than getting it from the queue. Could someone shed some light on what's happening here?

There was a bug fixed quite recently where a garbage collection could change the size of a dictionary that contained weak references and that could trigger the "dictionary changed size during iteration" error. I don't know if that is your problem but the multiprocessing package does use weak references.
See http://bugs.python.org/issue7105
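As for the second traceback: a multiprocessing queue never hands you the original object. Each item is pickled in a feeder thread and unpickled on get(), so what you receive is a rebuilt copy, and unpickling a dict subclass repopulates that copy one __setitem__ call at a time -- exactly the operation an immutable dict forbids. That is why the error surfaces inside get() even though your code never assigns to the dictionary. A minimal sketch, with FrozenDict as a hypothetical stand-in for pysistence's PDict:

import pickle

class FrozenDict(dict):
    # Toy immutable dict, standing in for pysistence's PDict (assumption).
    def __setitem__(self, key, value):
        raise NotImplementedError('Cannot set values in a FrozenDict')

d = FrozenDict(a=1)     # fine: dict.__init__ bypasses __setitem__
s = pickle.dumps(d, 2)  # fine: serializing only reads the dict
pickle.loads(s)         # NotImplementedError: the unpickler rebuilds
                        # the copy one __setitem__ call at a time

So, to the title question: the queue doesn't mutate your dictionary; it pickles it and rebuilds a copy on the other side, and it's the rebuild that PDict rejects.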

Related

Race condition in line of Python

I have an interesting problem. I am -- for shits and giggles -- trying to write a really short program. I have it down to 2 lines, but it has a race condition, and I can't figure out why. Here's the gist of it:
imports...
...[setattr(__main__, 'f', [1, 2, ..]), reduce(...random.choice(f)...)][1]...
Every once in a while, the following exception is raised -- but NOT always. That's my problem. I suspect that the order of execution is not guaranteed, especially since I'm using the list trick -- I would assume the interpreter can predict that setattr() returns None, know that I'm only selecting the 2nd item in the list, and defer the actual setattr() until later. But it only happens sometimes. Any ideas? Does CPython automatically thread some calls like map, filter and reduce?
Traceback (most recent call last):
  File "/usr/lib64/python3.4/random.py", line 253, in choice
    i = self._randbelow(len(seq))
  File "/usr/lib64/python3.4/random.py", line 230, in _randbelow
    r = getrandbits(k)          # 0 <= r < 2**k
ValueError: number of bits must be greater than zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test4.py", line 2, in <module>
    print(" ".join([setattr(n,'f',open(sys.argv[1],"r").read().replace("\n"," ").split(" ")),setattr(n,'m',c.defaultdict(list)),g.reduce(lambda p,e:p+[r.choice(m[p[-1]])],range(int(sys.argv[2])),[r.choice(list(filter(lambda x:[m[x[0]].append(x[1]),x[0].isupper()][1],zip(f[:-1],f[1:]))))[0]])][2]))
  File "test4.py", line 2, in <lambda>
    print(" ".join([setattr(n,'f',open(sys.argv[1],"r").read().replace("\n"," ").split(" ")),setattr(n,'m',c.defaultdict(list)),g.reduce(lambda p,e:p+[r.choice(m[p[-1]])],range(int(sys.argv[2])),[r.choice(list(filter(lambda x:[m[x[0]].append(x[1]),x[0].isupper()][1],zip(f[:-1],f[1:]))))[0]])][2]))
  File "/usr/lib64/python3.4/random.py", line 255, in choice
    raise IndexError('Cannot choose from an empty sequence')
IndexError: Cannot choose from an empty sequence
I've tried modifying globals() and vars() instead of using setattr(), but that does not seem to help (same exception sequence).
Here's the actual code:
import sys,collections as c,random as r,functools as g,__main__ as n
print(" ".join([setattr(n,'f',open(sys.argv[1],"r").read().replace("\n"," ").split(" ")),setattr(n,'m',c.defaultdict(list)),g.reduce(lambda p,e:p+[r.choice(m[p[-1]])],range(int(sys.argv[2])),[r.choice(list(filter(lambda x:[m[x[0]].append(x[1]),x[0].isupper()][1],zip(f[:-1],f[1:]))))[0]])][2]))
If you're curious: This is to read in a text file, generate a Markov model, and spit out a sentence.
Well, of course random.choice() is nondeterministic. If you are very careful, you could seed the pseudo-random number generator with a constant and hope that it produces the same sequence every time. There's a good chance it will work:
random.seed(42); ...
Alright, here's what actually happened: in my sentence generation, I sometimes hit the last word in the file, which (in some cases, depending on the file) has no possible successor state. Hence I'm trying to choose from an empty list in that case.
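For the record, there's no race: CPython evaluates the elements of a list display left to right and does not thread map, filter or reduce. The failure is purely data-dependent, which is why it only shows up sometimes. A minimal sketch of the failure mode, with a toy corpus and illustrative names:

import collections, random

words = "the cat sat on the mat End".split()  # toy corpus; 'End' is the last word
m = collections.defaultdict(list)
for a, b in zip(words[:-1], words[1:]):
    m[a].append(b)  # successor table: word -> list of following words

print(random.choice(m["the"]))  # fine: 'the' has successors
print(random.choice(m["End"]))  # IndexError: m['End'] is empty -- the last
                                # word in the file never gets a successor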

Multiprocessing memory limitation

Can anyone please shed some light on the maximum amount of data that multiprocessing can handle?
I am running into an issue where multiprocessing.Queue fails with the following error when I run a program with arrays of ~40 GB. The same program works for smaller arrays:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 266, in _feed
    send(obj)
SystemError: NULL result without error in PyObject_Call
So can anyone please share some information regarding the limitations for huge data?
Thank you
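One plausible explanation (an assumption, since the thread records no answer): Queue items are serialized with pickle in a background feeder thread, and Python 2's pickler has 32-bit size limits internally, so a single object in the multi-gigabyte range can fail to serialize and surface as exactly this kind of bare SystemError. A common workaround is to avoid putting one huge object on the queue and send it in pieces instead. A sketch -- CHUNK and the helper names are made up:

from multiprocessing import Process, Queue

CHUNK = 1 << 20  # elements per message; arbitrary, tune to the data

def send_in_chunks(q, arr):
    # Put a big array on the queue piecewise so no single pickled
    # message approaches the serializer's size limits.
    for i in range(0, len(arr), CHUNK):
        q.put(arr[i:i + CHUNK])
    q.put(None)  # sentinel: end of stream

def recv_all(q):
    parts = []
    while True:
        part = q.get()
        if part is None:
            break
        parts.append(part)
    return parts  # list of chunks; concatenate as appropriate

if __name__ == '__main__':
    q = Queue()
    data = list(range(10 ** 6))
    Process(target=send_in_chunks, args=(q, data)).start()
    chunks = recv_all(q)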

Django matching query does not exist

I was just checking whether a change I had made to my models had taken effect when I started getting this for some (but not all) of my models. I've never seen this before, and I'm fairly certain I've had no issues querying these models in the past.
>>> record = Record.objects.get(id=1)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/alittlesquid/grocerygod/fratgroceries/ggenv/local/lib/python2.7/site-packages/django/db/models/manager.py", line 143, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/local/alittlesquid/grocerygod/fratgroceries/ggenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 404, in get
    self.model._meta.object_name)
DoesNotExist: Record matching query does not exist.
After more digging I discovered that a query for all Record.objects.all() works as expected. Can anyone shed light on why this would be happening to some of my models? A fix would also be tremendously helpful, thanks.
There is probably no Record with id 1 (maybe you meant pk?). You can easily verify this by running Record.objects.values("id") and checking the output manually.
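If a missing row is an expected situation, a common pattern (a sketch, assuming the Record model from the question) is to catch the model's DoesNotExist instead of letting it propagate:

try:
    record = Record.objects.get(id=1)
except Record.DoesNotExist:
    record = None  # handle the missing-row case explicitly

# To see which ids actually exist:
ids = list(Record.objects.values_list("id", flat=True))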

Python 'if x is None' not catching NoneType

The snippet of code below keeps raising a "NoneType isn't iterable" error. Why doesn't the if statement catch this?
inset = set()
for x in node.contacted:
    print type(x)
    if x.is_converted() is True:
        nset.add(x)
    if x.contacted is None:
        memotable[node.gen][node.genind] = nset
    else:
        nset.union(self.legacy(x, memotable))
        memotable[node.gen][node.genind] = nset
Full traceback as requested:
Traceback (most recent call last):
  File "F:\Dropbox\CS\a4\skeleton\trialtest.py", line 142, in <module>
    test_legacy_and_frac()
  File "F:\Dropbox\CS\a4\skeleton\trialtest.py", line 125, in test_legacy_and_frac
    cunittest2.assert_equals(set([n10,n12,n21]), t.legacy(n00,mtable))
  File "F:\Dropbox\CS\a4\skeleton\trial.py", line 138, in legacy
    nset.union(self.legacy(x, memotable))
  File "F:\Dropbox\CS\a4\skeleton\trial.py", line 138, in legacy
    nset.union(self.legacy(x, memotable))
TypeError: 'NoneType' object is not iterable
The if statement guarantees that x.contacted isn't None.
But x.contacted isn't what you're trying to iterate or index, so it isn't guarding anything.
There's no reason memotable or memotable[node.gen] can't be None even though x.contacted is something else. For that matter, we have no idea what the code inside self.legacy(x, memotable) does -- maybe it tries to iterate x, or other_table[x], or who knows what, any of which could be None.
This is why you need to look at the entire traceback, not just the error string. It will tell you exactly which statement failed, and why.
And now that you've pasted the traceback:
File "F:\Dropbox\CS\a4\skeleton\trial.py", line 138, in legacy nset.union(self.legacy(x, memotable))
Yep, it's something that happens inside that self.legacy line, and it has absolutely nothing to do with x.contacted. The problem is almost certainly that your self.legacy method is returning None, so you're doing nset.union(None).
Again, whether x.contacted is or is not None is completely irrelevant here, so your check doesn't guard you here.
If you want us to debug the problem in that function, you will have to give us the code to that function, instead of code that has nothing to do with the error. Maybe it's something silly, like doing a + b instead of return a + b at the end, or maybe it's some deep logic error, but there's really no way we can guess.
Check the values of memotable and memotable[node.gen]: without seeing the code, it cannot be guaranteed that they are not None just because x.contacted is not None. If you post the values of those variables along with the full traceback, we may be able to pinpoint the problem more precisely.
The exception occurs because the function call self.legacy(x, memotable) returns None.
The traceback indicates the error occurs in nset.union(self.legacy(x, memotable)), and set.union() raises that exception when its argument is None. (I'm assuming nset is a set; your code defines inset = set(), but does not show where nset comes from.)
>>> set().union(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable
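Putting the answers together, the most likely shape of the bug is a legacy() that builds its set but never returns it. The method below is a guess at code that wasn't posted (names taken from the question), so treat it as a sketch; note also that set.union() returns a new set rather than mutating in place:

def legacy(self, node, memotable):
    nset = set()
    for x in node.contacted:
        if x.is_converted():
            nset.add(x)
        if x.contacted is not None:
            # union() returns a NEW set; calling it purely for side
            # effects (as the posted code does) discards the result.
            nset = nset.union(self.legacy(x, memotable))
    memotable[node.gen][node.genind] = nset
    return nset  # without this line, callers receive None and
                 # nset.union(None) raises exactly this TypeError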

How to recover a broken python "cPickle" dump?

I am using rss2email to convert a number of RSS feeds into mail for easier consumption. Or rather, I was using it, because it broke in a horrible way today: on every run, it only gives me this backtrace:
Traceback (most recent call last):
  File "/usr/share/rss2email/rss2email.py", line 740, in <module>
    elif action == "list": list()
  File "/usr/share/rss2email/rss2email.py", line 681, in list
    feeds, feedfileObject = load(lock=0)
  File "/usr/share/rss2email/rss2email.py", line 422, in load
    feeds = pickle.load(feedfileObject)
TypeError: ("'str' object is not callable", 'sxOYAAuyzSx0WqN3BVPjE+6pgPU', ((2009, 3, 19, 1, 19, 31, 3, 78, 0), {}))
The only helpful fact that I have been able to construct from this backtrace is that the file ~/.rss2email/feeds.dat in which rss2email keeps all its configuration and runtime state is somehow broken. Apparently, rss2email reads its state and dumps it back using cPickle on every run.
I have even found the line containing that 'sxOYAAuyzSx0WqN3BVPjE+6pgPU' string mentioned above in the giant (>12 MB) feeds.dat file. To my untrained eye, the dump does not appear to be truncated or otherwise damaged.
What approaches could I try in order to reconstruct the file?
The Python version is 2.5.4 on a Debian/unstable system.
EDIT
Peter Gibson and J.F. Sebastian have suggested directly loading from the
pickle file and I had tried that before. Apparently, a Feed class
that is defined in rss2email.py is needed, so here's my script:
#!/usr/bin/python
import sys
# import pickle
import cPickle as pickle
sys.path.insert(0,"/usr/share/rss2email")
from rss2email import Feed
feedfile = open("feeds.dat", 'rb')
feeds = pickle.load(feedfile)
The "plain" pickle variant produces the following traceback:
Traceback (most recent call last):
  File "./r2e-rescue.py", line 8, in <module>
    feeds = pickle.load(feedfile)
  File "/usr/lib/python2.5/pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.5/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.5/pickle.py", line 1133, in load_reduce
    value = func(*args)
TypeError: 'str' object is not callable
The cPickle variant produces essentially the same thing as calling
r2e itself:
Traceback (most recent call last):
  File "./r2e-rescue.py", line 10, in <module>
    feeds = pickle.load(feedfile)
TypeError: ("'str' object is not callable", 'sxOYAAuyzSx0WqN3BVPjE+6pgPU', ((2009, 3, 19, 1, 19, 31, 3, 78, 0), {}))
EDIT 2
Following J.F. Sebastian's suggestion of putting "printf debugging" into Feed.__setstate__ in my test script, these are the last few lines of output before Python bails out:
u'http:/com/news.ars/post/20080924-everyone-declares-victory-in-smutfree-wireless-broadband-test.html': u'http:/com/news.ars/post/20080924-everyone-declares-victory-in-smutfree-wireless-broadband-test.html'},
'to': None,
'url': 'http://arstechnica.com/'}
Traceback (most recent call last):
  File "./r2e-rescue.py", line 23, in ?
    feeds = pickle.load(feedfile)
TypeError: ("'str' object is not callable", 'sxOYAAuyzSx0WqN3BVPjE+6pgPU', ((2009, 3, 19, 1, 19, 31, 3, 78, 0), {}))
The same thing happens on a Debian/etch box using python 2.4.4-2.
How I solved my problem
A Perl port of pickle.py
Following J.F. Sebastian's comment about how simple the pickle format is, I set out to port parts of pickle.py to Perl. A couple of quick regular expressions would have been a faster way to get at my data, but I felt that the hack value, and the opportunity to learn more about Python, would be worth it. Plus, I still feel much more comfortable using (and debugging code in) Perl than Python.
Most of the porting effort (simple types, tuples, lists, dictionaries) was straightforward. Perl's and Python's different notions of classes and objects have been the only issue so far where more than a simple translation of idioms was needed. The result is a module called Pickle::Parse which, after a bit of polishing, will be published on CPAN.
A module called Python::Serialise::Pickle existed on CPAN, but I
found its parsing capabilities lacking: It spews debugging output all
over the place and doesn't seem to support classes/objects.
Parsing, transforming data, detecting actual errors in the stream
Based upon Pickle::Parse, I tried to parse the feeds.dat file.
After a few iterations of fixing trivial bugs in my parsing code, I got an error message that was strikingly similar to pickle.py's original "object not callable" error message:
Can't use string ("sxOYAAuyzSx0WqN3BVPjE+6pgPU") as a subroutine
ref while "strict refs" in use at lib/Pickle/Parse.pm line 489,
<STDIN> line 187102.
Ha! Now we're at a point where it's quite likely that the actual data
stream is broken. Plus, we get an idea where it is broken.
It turned out that the first line of the following sequence was wrong:
g7724
((I2009
I3
I19
I1
I19
I31
I3
I78
I0
t(dtRp62457
Position 7724 in the "memo" pointed to that string
"sxOYAAuyzSx0WqN3BVPjE+6pgPU". From similar records earlier in the
stream, it was clear that a time.struct_time object was needed
instead. All later records shared this wrong pointer. With a simple
search/replace operation, it was trivial to fix this.
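Worth noting: the standard library's pickletools module can do much of this locating work. Its dis() function prints one line per opcode, prefixed with the opcode's byte offset in the stream, and raises as soon as it hits a malformed opcode, so the last printed offset brackets the damage. A sketch, assuming feeds.dat is in the current directory:

import pickletools

data = open('feeds.dat', 'rb').read()
# Prints 'offset: OPCODE argument' lines, then stops with an error
# at the malformed part of the stream.
pickletools.dis(data)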
I find it ironic that I found the source of the error by accident
through Perl's feature that tells the user its position in the input
data stream when it dies.
Conclusion
I will move away from rss2email as soon as I find time to
automatically transform its pickled configuration/state mess to
another tool's format.
pickle.py needs more meaningful error messages that tell the user about the position in the data stream (not the position in its own code) where things go wrong.
Porting parts of pickle.py to Perl was fun and, in the end, rewarding.
Have you tried manually loading the feeds.dat file using both cPickle and pickle? If the output differs it might hint at the error.
Something like (from your home directory):
import cPickle, pickle
f = open('.rss2email/feeds.dat', 'r')
obj1 = cPickle.load(f)
f.seek(0)  # rewind before the second load, or it reads past the first object
obj2 = pickle.load(f)
(you might need to open in binary mode 'rb' if rss2email doesn't pickle in ascii).
Pete
Edit: The fact that cPickle and pickle give the same error suggests that the feeds.dat file is the problem. Probably a change in the Feed class between versions of rss2email as suggested in the Ubuntu bug J.F. Sebastian links to.
Sounds like the internals of cPickle are getting tangled up. This thread (http://bytes.com/groups/python/565085-cpickle-problems) looks like it might have a clue.
'sxOYAAuyzSx0WqN3BVPjE+6pgPU' is most probably unrelated to the pickle problem.
Post an error traceback for the following command, to determine which class defines the attribute that can't be called (the one that leads to the TypeError):
python -c "import pickle; pickle.load(open('feeds.dat'))"
EDIT:
Add the following to your code and run it (redirect stderr to a file, then use 'tail -2' on it to print the last 2 lines):
import sys
from pprint import pprint

def setstate(self, dict_):
    pprint(dict_, stream=sys.stderr, depth=None)
    self.__dict__.update(dict_)

Feed.__setstate__ = setstate
If the above doesn't yield any interesting output, use general troubleshooting tactics:
Confirm that 'feeds.dat' is the problem:
back up the ~/.rss2email directory
install rss2email into a virtualenv/pip sandbox (or use zc.buildout) to isolate the environment (make sure you are using feedparser.py from the trunk)
add a couple of feeds; keep adding feeds until 'feeds.dat' is larger than the current one. Run some tests.
try the old 'feeds.dat'
try the new 'feeds.dat' on the existing rss2email installation
See r2e bails out with TypeError bug on Ubuntu.
