Ensure pickling is complete before ProcessPoolExecutor.submit returns - python

Say I have the following simple class (easily pickled):
import time
from concurrent.futures import ProcessPoolExecutor

class A:
    def long_computation(self):
        time.sleep(10)
        return 42
I would like to be able to do this:
a = A()
with ProcessPoolExecutor(1) as executor:
    a.future = executor.submit(a.long_computation)
On Python 3.6.9, this fails with TypeError: can't pickle _thread.RLock objects. On 3.8.0, it results in an endless wait for a lock to be acquired.
What does work (on both versions) is this:
a = A()
with ProcessPoolExecutor(1) as executor:
    future = executor.submit(a.long_computation)
    time.sleep(0.001)
    a.future = future
It seems that executor.submit returns before the pickling of a has actually happened, so by the time a is pickled it already holds a reference to the resulting Future object, which cannot be pickled.
I'm not too happy about the time.sleep(0.001) workaround, as it involves a magic number and I imagine it could easily fail if the pickling ends up taking longer. I don't want to sleep for a safer, longer time as that would be a waste. Ideally I would want executor.submit to block until it is safe to store a reference to the Future object in a.
Is there a better way to do this?

Thinking about it a bit more, I came up with the following:
import pickle

a = A()
with ProcessPoolExecutor(1) as executor:
    a.future = executor.submit(pickle.loads(pickle.dumps(a)).long_computation)
It involves a duplication of effort, as a is pickled twice, but it works fine and guarantees that the Future object never gets pickled, as desired.
Then I realised that the reason this works is that it creates a copy of a. So the pickling and unpickling can be avoided by simply (shallow) copying the object before submitting the method, which ensures that no reference to the Future object exists on the copy:
from copy import copy

a = A()
with ProcessPoolExecutor(1) as executor:
    a.future = executor.submit(copy(a).long_computation)
This is faster and much less awkward than the above cycle of pickles, but I'm still interested in the best practice here so I'll wait a bit before accepting this answer.
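For what it's worth, here is a sketch (my own, not necessarily best practice) of a way to sidestep the race entirely: keep the Future in an external mapping instead of as an attribute on a, so the lazily pickled a never contains a Future at all:
import time
from concurrent.futures import ProcessPoolExecutor

class A:
    def long_computation(self):
        time.sleep(10)
        return 42

if __name__ == '__main__':
    a = A()
    futures = {}  # id(a) -> Future, kept outside the object that gets pickled
    with ProcessPoolExecutor(1) as executor:
        futures[id(a)] = executor.submit(a.long_computation)
    print(futures[id(a)].result())  # 42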

Related

Why does `Queue.put` seem to be faster at pickling a numpy array than actual pickle?

It appears that I can call q.put 1000 times in under 2.5ms. How is that possible when just pickling that very same array 1000 times takes over 2 seconds?
>>> a = np.random.rand(1024,1024)
>>> q = Queue()
>>> timeit.timeit(lambda: q.put(a), number=1000)
0.0025581769878044724
>>> timeit.timeit(lambda: pickle.dumps(a), number=1000)
2.690145633998327
Obviously, I am not understanding something about how Queue.put works. Can anyone enlighten me?
I also observed the following:
>>> def f():
... q.put(a)
... q.get()
>>> timeit.timeit(lambda: f(), number=1000)
42.33058542700019
This appears to be more realistic and suggests to me that simply calling q.put() will return before the object is actually serialized. Is that correct?
The multiprocessing implementation has a number of moving parts under the covers. Here, dealing with a multiprocessing.Queue is mostly done in a hidden (to the end user) worker thread. .put() just puts an object pointer on a queue (fast and constant-time), and that worker thread does the actual pickling, when it gets around to it.
This can burn you, though: if, as in your example, the main program goes on to mutate the np array after the .put(), an undefined number of those mutations may be captured by the eventually pickled state. The user-level .put() only captures the object pointer, nothing about the object's state.
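A small sketch (mine, not from the original answer) that makes this race visible: the array is mutated right after the .put(), and because the pickling happens later in the hidden feeder thread, the mutated values are usually what comes out of the queue:
import numpy as np
from multiprocessing import Queue

if __name__ == '__main__':
    q = Queue()
    a = np.zeros(1000000)
    q.put(a)        # returns almost immediately; a has not been pickled yet
    a[:] = 1.0      # mutate before the feeder thread gets around to pickling
    b = q.get()
    print(b[:3])    # usually [1. 1. 1.], not the zeros that were "put"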

Passing a method of a big object to imap: 1000-fold speed-up by wrapping the method

Assume yo = Yo() is a big object with a method double, which returns its parameter multiplied by 2.
If I pass yo.double to imap of multiprocessing, it is incredibly slow, because (I think) every function call creates a copy of yo.
I.e., this is very slow:
from tqdm import tqdm
from multiprocessing import Pool
import numpy as np

class Yo:
    def __init__(self):
        self.a = np.random.random((10000000, 10))

    def double(self, x):
        return 2 * x

yo = Yo()

with Pool(4) as p:
    for _ in tqdm(p.imap(yo.double, np.arange(1000))):
        pass
Output:
0it [00:00, ?it/s]
1it [00:06, 6.54s/it]
2it [00:11, 6.17s/it]
3it [00:16, 5.60s/it]
4it [00:20, 5.13s/it]
...
BUT, if I wrap yo.double with a function double_wrap and pass it to imap, then it is essentially instantaneous.
def double_wrap(x):
    return yo.double(x)

with Pool(4) as p:
    for _ in tqdm(p.imap(double_wrap, np.arange(1000))):
        pass
Output:
0it [00:00, ?it/s]
1000it [00:00, 14919.34it/s]
How and why does wrapping the function change the behavior?
I use Python 3.6.6.
You are right about the copying. yo.double is a 'bound method', bound to your big object. When you pass it into the pool-method, it will pickle the whole instance with it, send it to the child processes and unpickle it there. This happens for every chunk of the iterable a child process works on. The default value for chunksize in pool.imap is 1, so you are hitting this communication overhead for every processed item in the iterable.
By contrast, when you pass double_wrap, you are just passing a module-level function. Only its name actually gets pickled, and the child processes import the function from __main__. Since you're obviously on an OS which supports forking, your double_wrap function has access to the forked yo instance of Yo. Your big object won't be serialized (pickled) in this case, hence the communication overhead is tiny compared to the other approach.
@Darkonaut I just don't understand why making the function module-level prevents copying of the object. After all, the function needs to have a pointer to the yo object itself, which should require all processes to copy yo, as they cannot share memory.
The function running in the child process will automatically find a reference to a global yo, because your operating system (OS) is using fork to create the child process. Forking produces a clone of your whole parent process, and as long as neither the parent nor the child alters a specific object, both will see the same object in the same memory location.
Only if the parent or the child changes something on the object does the object get copied in the child process. That's called "copy-on-write" and happens at OS level, without you noticing it in Python. Your code wouldn't work on Windows, which uses 'spawn' as the start method for new processes.
I'm simplifying a bit above where I write "the object gets copied", since the unit the OS operates on is a "page" (most commonly 4 KB in size). This answer here would be a good follow-up read for broadening your understanding.
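A hedged sketch of a variant that does not rely on fork and therefore also works under the 'spawn' start method (e.g. on Windows): let each worker build its own module-level yo in a Pool initializer, so the big object is set up once per worker instead of being pickled once per task. The names _yo and _init_worker are mine, and the array is made smaller here just to keep the sketch light:
from multiprocessing import Pool
import numpy as np

class Yo:
    def __init__(self):
        self.a = np.random.random((1000000, 10))  # smaller than the original, for the sketch

    def double(self, x):
        return 2 * x

_yo = None  # per-worker global, filled in by the initializer

def _init_worker():
    global _yo
    _yo = Yo()  # each worker constructs its own instance exactly once

def double_wrap(x):
    return _yo.double(x)

if __name__ == '__main__':
    with Pool(4, initializer=_init_worker) as p:
        results = p.map(double_wrap, range(1000))
    print(results[:5])  # [0, 2, 4, 6, 8]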

Changing object properties in python multiprocessing

Consider the following objects:
class Item(object):
    def __init__(self):
        self.c = 0

    def increase(self):
        S.increase(self)

class S(object):
    @staticmethod
    def increase(item):
        item.c += 1
This mirrors the situation I am currently in: S is some library class, and Item collects and organises data and data-manipulation processes. Now I want to parallelise the work, and for that I use the Python multiprocessing module:
from multiprocessing import Process

l = [Item() for i in range(5)]
for i in l:
    Process(target=i.increase).start()
The result is not what I expected:
[i.c for i in l]
[0, 0, 0, 0, 0]
Where am I going wrong?
You're expecting your mutator, the static method increase in class S (called from the non-static increase in class Item), to adjust each i.c field, and it does. The problem is not with the static method, but rather with the internal design of multiprocessing.
The multiprocessing package works by running multiple separate instances of Python. On Unix-like systems, it uses fork, which makes this easier; on Windows-like systems, it spawns new copies of itself. Either way, this imposes all the slightly odd restrictions described in the Python documentation: v2 and v3. (NB: the rest of the links below are to the Python2 documentation since that was the page I still had open. The restrictions are pretty much the same for both Python2 and Python3.)
In this particular case, each Process call makes a copy of the object i and sends that copy to a new process. The process modifies the copy, which has no effect on the original.
To fix this, you may either send the modified objects back, e.g., through a Queue() or Pipe() instance, or place the objects into shared memory. The send-back technique is simpler and easier to program and automatically does most of the necessary synchronization (but see the caveat about being sure to collect all results before using a Process instance's join, even implicitly).
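A minimal sketch (assumed, not part of the original answer) of the send-back technique with the classes from the question: each child returns its modified Item through a Queue, and the parent replaces its stale copies with the results, collecting them before joining:
from multiprocessing import Process, Queue

class Item(object):
    def __init__(self):
        self.c = 0

    def increase(self):
        S.increase(self)

class S(object):
    @staticmethod
    def increase(item):
        item.c += 1

def worker(item, q):
    item.increase()  # mutates the child's private copy
    q.put(item)      # send the modified copy back to the parent

if __name__ == '__main__':
    items = [Item() for _ in range(5)]
    q = Queue()
    procs = [Process(target=worker, args=(item, q)) for item in items]
    for p in procs:
        p.start()
    items = [q.get() for _ in procs]  # collect every result before joining
    for p in procs:
        p.join()
    print([item.c for item in items])  # [1, 1, 1, 1, 1]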

Should I worry about circular references in Python?

Suppose I have code that maintains a parent/children structure. In such a structure I get circular references, where a child points to a parent and a parent points to a child. Should I worry about them? I'm using Python 2.5.
I am concerned that they will not be garbage collected and the application will eventually consume all memory.
"Worry" is misplaced, but if your program turns out to be slow, consume more memory than expected, or have strange inexplicable pauses, the cause is indeed likely to be in those garbage reference loops -- they need to be garbage collected by a different procedure than "normal" (acyclic) reference graphs, and that collection is occasional and may be slow if you have a lot of objects tied up in such loops (the cyclical-garbage collection is also inhibited if an object in the loop has a __del__ special method).
So, reference loops will not affect your program's correctness, but may affect its performance and/or footprint.
If and when you want to remove unwanted loops of references, you can often use the weakref module in Python's standard library.
If and when you want to exert more direct control (or perform debugging, see what exactly is happening) regarding cyclical garbage collection, use the gc module in Python's standard library.
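As a small illustration (my own, assuming a simple parent/children tree) of the weakref approach: the child holds only a weak reference to its parent, so no reference cycle is created in the first place:
import weakref

class Node:
    def __init__(self, parent=None):
        self.children = []
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        return self._parent() if self._parent is not None else None

root = Node()
child = Node(parent=root)
assert child.parent is root
del root             # the weak reference does not keep the parent alive
print(child.parent)  # None, once the parent has been reclaimed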
Experimentally: you're fine:
import itertools

for i in itertools.count():
    a = {}
    b = {"a": a}
    a["b"] = b
It consistently stays at using 3.6 MB of RAM.
Python will detect the cycle and release the memory when there are no outside references.
Circular references are a normal thing to do, so I don't see a reason to be worried about them. Many tree algorithms require that each node have links to its children and its parent. They're also required to implement something like a doubly linked list.
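A tiny sketch (mine, not from the answer) of the doubly linked case mentioned above: every adjacent pair of nodes forms a reference cycle, and the cyclic garbage collector still reclaims them once no outside reference remains:
import gc

class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

a, b = Node(1), Node(2)
a.next, b.prev = b, a  # cycle: a -> b -> a

del a, b               # no outside references remain, but refcounts are not zero
print(gc.collect())    # > 0: the collector found and freed the cycle's objects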
I don't think you should worry. Try the following program and you will see that it won't consume all memory:
while True:
    a = list(range(100))
    b = list(range(100))
    a.append(b)
    b.append(a)
    a.append(a)
    b.append(b)
There seems to be an issue with references to methods stored in a list on an instance attribute. Here are two examples. The first one does not call __del__. The second one, using weakref, is OK for __del__. However, in this latter case the problem is that you cannot weakly reference methods: http://docs.python.org/2/library/weakref.html
import sys, weakref

class One():
    def __init__(self):
        self.counters = [ self.count ]
    def __del__(self):
        print("__del__ called")
    def count(self):
        print(sys.getrefcount(self))

sys.getrefcount(One)
one = One()
sys.getrefcount(One)
del one
sys.getrefcount(One)

class Two():
    def __init__(self):
        self.counters = [ weakref.ref(self.count) ]
    def __del__(self):
        print("__del__ called")
    def count(self):
        print(sys.getrefcount(self))

sys.getrefcount(Two)
two = Two()
sys.getrefcount(Two)
del two
sys.getrefcount(Two)
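For completeness: Python 3.4 added weakref.WeakMethod, which handles exactly this case by weakly referencing a bound method without keeping the instance alive, so __del__ fires as expected. A small sketch (mine):
import sys, weakref

class Three():
    def __init__(self):
        self.counters = [ weakref.WeakMethod(self.count) ]
    def __del__(self):
        print("__del__ called")
    def count(self):
        print(sys.getrefcount(self))

three = Three()
method = three.counters[0]()  # dereference: the bound method, or None if dead
method()                      # prints the refcount, so the weak reference works
del method                    # drop the temporary strong reference to three
del three                     # prints "__del__ called"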

How to do cleanup reliably in python?

I have some ctypes bindings, and for each body.New I should call body.Free. The library I'm binding doesn't have its allocation routines insulated from the rest of the code (they can be called from just about anywhere), and to use a couple of useful features I need to make cyclic references.
I think it would be solved if I could find a reliable way to hook a destructor to an object. (Weakrefs would help if they gave me the callback just before the data is dropped.)
So obviously this code megafails when I put in velocity_func:
class Body(object):
    def __init__(self, mass, inertia):
        self._body = body.New(mass, inertia)

    def __del__(self):
        print '__del__ %r' % self
        if body:
            body.Free(self._body)

    ...

    def set_velocity_func(self, func):
        self._body.contents.velocity_func = ctypes_wrapping(func)
I also tried to solve it through weakrefs, with those the things seem getting just worse, just only largely more unpredictable.
Even if I don't put in the velocity_func, there will appear cycles at least then when I do this:
class Toy(object):
    def __init__(self, body):
        self.body.owner = self

...

def collision(a, b, contacts):
    whatever(a.body.owner)
So how to make sure Structures will get garbage collected, even if they are allocated/freed by the shared library?
There's repository if you are interested about more details: http://bitbucket.org/cheery/ctypes-chipmunk/
What you want to do, that is, create an object that allocates things and then deallocates them automatically when the object is no longer in use, is unfortunately almost impossible in Python. The __del__ method is not guaranteed to be called, so you can't rely on that.
The standard way in Python is simply:
try:
    allocate()
    dostuff()
finally:
    cleanup()
Or, since 2.5, you can also create context managers and use the with statement, which is a neater way of doing that.
But both of these are primarily for when you allocate/lock at the beginning of a code snippet. If you want to have things allocated for the whole run of the program, you need to allocate the resource at startup, before the main code of the program runs, and deallocate afterwards. There is one situation which isn't covered here, and that is when you want to allocate and deallocate many resources dynamically and use them in many places in the code. For example, if you want a pool of memory buffers or similar. But most of those cases are for memory, which Python handles for you, so you don't have to bother with those. There are of course cases where you want dynamic pool allocation of things that are NOT memory, and then you would want the type of deallocation you attempt in your example, and that is tricky to do in Python.
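For the allocate/free pattern from the question, the context-manager route might look like the following sketch; the body class here is only a stand-in for the real ctypes bindings (body.New / body.Free) so that the example runs on its own:
from contextlib import contextmanager

class body:  # stand-in for the question's ctypes bindings, not the real module
    @staticmethod
    def New(mass, inertia):
        print('allocated')
        return object()
    @staticmethod
    def Free(ptr):
        print('freed')

@contextmanager
def new_body(mass, inertia):
    ptr = body.New(mass, inertia)
    try:
        yield ptr
    finally:
        body.Free(ptr)  # runs even if the block raises

with new_body(1.0, 5.0) as b:
    pass  # use b here; Free is guaranteed to run afterwards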
If weakrefs aren't broken, I guess this may work:
from weakref import ref
from ctypes import cast, c_void_p

pointers = set()

class Pointer(object):
    def __init__(self, cfun, ptr):
        pointers.add(self)
        self.ref = ref(ptr, self.cleanup)
        self.data = cast(ptr, c_void_p).value  # python cast is so smart, but it can't be smarter than this.
        self.cfun = cfun

    def cleanup(self, obj):
        print 'cleanup 0x%x' % self.data
        self.cfun(self.data)
        pointers.remove(self)

def cleanup(cfun, ptr):
    Pointer(cfun, ptr)
I have yet to try it. The important piece is that the Pointer doesn't hold any strong reference to the foreign pointer, only an integer. This should work if ctypes doesn't free memory that I should free with the bindings. Yeah, it's basically a hack, but I think it may work better than the earlier things I've been trying.
Edit: Tried it, and it seems to work after some small fine-tuning of my code. A surprising thing is that even after I took del out of all of my structures, it still seems to fail. Interesting but frustrating.
Neither works; by some weird chance I've been able to drop cyclic references in places, but things stay broken.
Edit: Well... weakrefs WERE broken after all! So there's likely no solution for reliable cleanup in Python, except forcing it to be explicit.
In CPython, __del__ is a reliable destructor of an object, because it will always be called when the reference count reaches zero (note: there may be cases, such as circular references between items with a __del__ method defined, where the reference count will never reach zero, but that is another issue).
Update
From the comments, I understand the problem is related to the order of destruction of objects: body is a global object, and it is being destroyed before all other objects, thus it is no longer available to them.
Actually, using global objects is not good; not only because of issues like this one, but also because of maintenance.
I would then change your class to something like this:
class Body(object):
    def __init__(self, mass, inertia):
        self._bodyref = body
        self._body = body.New(mass, inertia)

    def __del__(self):
        print '__del__ %r' % self
        if body:
            body.Free(self._body)

    ...

    def set_velocity_func(self, func):
        self._body.contents.velocity_func = ctypes_wrapping(func)
A couple of notes:
The change only adds a reference to the global body object, which will thus live at least as long as all the objects created from that class.
Still, using a global object is not good because of unit testing and maintenance; better would be to have a factory for the object that sets the correct "body" on the class, and in the case of unit tests can easily supply a mock object. But that's really up to you and how much effort you think makes sense for this project.
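A rough sketch (names like make_body_factory and FakeBody are made up here) of that factory idea: the bindings are injected instead of being reached as a global, so tests can pass a mock:
class Body(object):
    def __init__(self, binding, mass, inertia):
        self._binding = binding  # keeps the bindings alive as long as this object
        self._body = binding.New(mass, inertia)

    def __del__(self):
        if self._binding is not None:
            self._binding.Free(self._body)

def make_body_factory(binding):
    def factory(mass, inertia):
        return Body(binding, mass, inertia)
    return factory

# In production: body_factory = make_body_factory(body)        # the real ctypes bindings
# In unit tests: body_factory = make_body_factory(FakeBody())  # a mock object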
