Passing a method of a big object to imap: 1000-fold speed-up by wrapping the method

Assume yo = Yo() is a big object with a method double, which returns its parameter multiplied by 2.
If I pass yo.double to imap from multiprocessing, it is incredibly slow, because (I think) every function call creates a copy of yo.
I.e., this is very slow:
from tqdm import tqdm
from multiprocessing import Pool
import numpy as np
class Yo:
    def __init__(self):
        self.a = np.random.random((10000000, 10))

    def double(self, x):
        return 2 * x

yo = Yo()

with Pool(4) as p:
    for _ in tqdm(p.imap(yo.double, np.arange(1000))):
        pass
Output:
0it [00:00, ?it/s]
1it [00:06, 6.54s/it]
2it [00:11, 6.17s/it]
3it [00:16, 5.60s/it]
4it [00:20, 5.13s/it]
...
BUT, if I wrap yo.double with a function double_wrap and pass it to imap, then it is essentially instantaneous.
def double_wrap(x):
    return yo.double(x)

with Pool(4) as p:
    for _ in tqdm(p.imap(double_wrap, np.arange(1000))):
        pass
Output:
0it [00:00, ?it/s]
1000it [00:00, 14919.34it/s]
How and why does wrapping the function change the behavior?
I use Python 3.6.6.

You are right about the copying. yo.double is a 'bound method', bound to your big object. When you pass it to a pool method, the whole instance gets pickled along with it, sent to the child processes and unpickled there. This happens for every chunk of the iterable a child process works on. The default value for chunksize in pool.imap is 1, so you are hitting this communication overhead for every single item in the iterable.
In contrast, when you pass double_wrap, you are just passing a module-level function. Only its name actually gets pickled, and the child processes import the function from __main__. Since you are obviously on an OS which supports forking, your double_wrap function has access to the forked yo instance of Yo. Your big object doesn't get serialized (pickled) in this case, hence the communication overhead is tiny compared to the other approach.
@Darkonaut I just don't understand why making the function module-level prevents copying of the object. After all, the function needs a reference to the yo object itself, which should require all processes to copy yo, as they cannot share memory.
The function running in the child process will automatically find a reference to a global yo, because your operating system (OS) uses fork to create the child process. Forking produces a clone of your whole parent process, and as long as neither the parent nor the child alters a specific object, both see the same object at the same memory location.
Only if the parent or the child changes something on the object does the object get copied in the child process. That's called "copy-on-write" and happens at the OS level, without you noticing it in Python. Your code wouldn't work on Windows, which uses 'spawn' as the start method for new processes.
I'm simplifying a bit above where I write "the object gets copied", since the unit the OS operates on is a "page" (most commonly 4 KB in size). This answer would be a good follow-up read for broadening your understanding.
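If you also need this to be fast under the 'spawn' start method (e.g. on Windows), a common pattern is to build the big object once per worker in a Pool initializer instead of relying on fork; passing a larger chunksize to imap further reduces the per-item overhead mentioned above. A minimal sketch of that idea, using the Yo class from the question (the names init_worker, _worker_yo and double_in_worker are mine, not from the answer):
from multiprocessing import Pool
import numpy as np

_worker_yo = None  # one Yo instance per worker process

def init_worker():
    # Runs once in each child process; the big object is built there
    # instead of being pickled and sent from the parent.
    global _worker_yo
    _worker_yo = Yo()

def double_in_worker(x):
    return _worker_yo.double(x)

if __name__ == "__main__":
    with Pool(4, initializer=init_worker) as p:
        results = list(p.imap(double_in_worker, np.arange(1000), chunksize=50))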

Related

Ensure pickling is complete before ProcessPoolExecutor.submit returns

Say I have the following simple class (easily pickled):
import time
from concurrent.futures import ProcessPoolExecutor
class A:
    def long_computation(self):
        time.sleep(10)
        return 42
I would like to be able to do this:
a = A()
with ProcessPoolExecutor(1) as executor:
    a.future = executor.submit(a.long_computation)
On Python 3.6.9, this fails with TypeError: can't pickle _thread.RLock objects. On 3.8.0, it results in an endless wait for a lock to be acquired.
What does work (on both versions) is this:
a = A()
with ProcessPoolExecutor(1) as executor:
    future = executor.submit(a.long_computation)
    time.sleep(0.001)
    a.future = future
It seems to me that executor.submit does not block long enough for the pickling of a to finish, and runs into issues with pickling the resulting Future object.
I'm not too happy about the time.sleep(0.001) workaround, as it involves a magic number and I imagine it could easily fail if the pickling ends up taking longer. I don't want to sleep for a safer, longer time as that would be a waste. Ideally I would want executor.submit to block until it is safe to store a reference to the Future object in a.
Is there a better way to do this?
Thinking about it a bit more, I came up with the following:
import pickle
a = A()
with ProcessPoolExecutor(1) as executor:
    a.future = executor.submit(pickle.loads(pickle.dumps(a)).long_computation)
It involves a duplication of effort, as a is pickled twice, but it works fine and ensures that the Future object never gets pickled, as desired.
Then I realised that the reason this works is that it creates a copy of a. So the pickling and unpickling can be avoided by simply (shallow) copying the object before submitting the method, which ensures that no reference to the Future object exists on the copy:
from copy import copy
a = A()
with ProcessPoolExecutor(1) as executor:
    a.future = executor.submit(copy(a).long_computation)
This is faster and much less awkward than the pickle round-trip above, but I'm still interested in the best practice here, so I'll wait a bit before accepting this answer.
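Another way to sidestep the problem entirely, if you don't strictly need the Future stored as an attribute of a, is to keep futures in a separate mapping, so the object you submit never carries an unpicklable Future. This is just a sketch of that idea, not from the answers above:
from concurrent.futures import ProcessPoolExecutor

a = A()
futures = {}  # side table: object -> its Future, kept off the object itself

with ProcessPoolExecutor(1) as executor:
    futures[a] = executor.submit(a.long_computation)

# a was pickled without any Future attached to it.
print(futures[a].result())  # 42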

Changing object properties in python multiprocessing

Consider the following objects:
class Item(object):
    def __init__(self):
        self.c = 0

    def increase(self):
        S.increase(self)

class S(object):
    @staticmethod
    def increase(item):
        item.c += 1
This mirrors the situation I am currently in: S is some library class, and Item collects and organises the data and the data-manipulation processes. Now I want to parallelise the work; for that I use the Python multiprocessing module:
from multiprocessing import Process

l = [Item() for i in range(5)]
for i in l:
    Process(target=i.increase).start()
The result is not what I expected:
[i.c for i in l]
[0, 0, 0, 0, 0]
Where am I going wrong?
You're expecting your mutator, the static method increase in class S (called from the non-static increase in class Item), to adjust each i.c field, and it does. The problem is not with the static method, but rather with the internal design of multiprocessing.
The multiprocessing package works by running multiple separate instances of Python. On Unix-like systems it uses fork, which makes this easier; on Windows-like systems it spawns new copies of itself. Either way, this imposes all the slightly odd restrictions described in the Python documentation (v2 and v3). (NB: the rest of the links below are to the Python 2 documentation, since that was the page I still had open. The restrictions are pretty much the same for both Python 2 and Python 3.)
In this particular case, each Process call makes a copy of the object i and sends that copy to a new process. The process modifies the copy, which has no effect on the original.
To fix this, you may either send the modified objects back, e.g., through a Queue() or Pipe() instance, or place the objects into shared memory. The send-back technique is simpler and easier to program, and it automatically does most of the necessary synchronization (but see the caveat about being sure to collect all results before using a Process instance's join, even implicitly).
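As a rough illustration of the send-back technique (my sketch, not part of the original answer; the work helper and the index-based protocol are made up), each child puts its result on a Queue and the parent applies the results to its own copies:
from multiprocessing import Process, Queue

def work(index, item, queue):
    # Runs in the child: mutate the child's copy, then ship the result back.
    item.increase()
    queue.put((index, item.c))

if __name__ == "__main__":
    l = [Item() for _ in range(5)]
    q = Queue()
    procs = [Process(target=work, args=(i, item, q)) for i, item in enumerate(l)]
    for p in procs:
        p.start()
    # Collect every result before joining, per the caveat above.
    for _ in procs:
        index, c = q.get()
        l[index].c = c
    for p in procs:
        p.join()
    print([item.c for item in l])  # [1, 1, 1, 1, 1]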

does return object from function lead to memory leak

Taking the following code as an example, does returning an object from a function lead to a memory leak?
I'm very curious about what happens to the object handle after it is used by the function use_age.
import MySQLdb

class Demo(object):
    def _get_mysql_handle(self):
        handle = MySQLdb.connect(host=self.conf["host"],
                                 port=self.conf["port"],
                                 user=self.conf["user"],
                                 passwd=self.conf["passwd"],
                                 db=self.conf["db"])
        return handle

    def use_age(self):
        cursor = self._get_mysql_handle().cursor()

if __name__ == "__main__":
    demo = Demo()
    demo.use_age()
No, that code won't lead to a memory leak.
CPython handles object lifetimes by reference counting. In your example the reference count drops back to 0 and the database connection object is deleted again.
The local name handle in _get_mysql_handle is one reference; it is dropped when _get_mysql_handle returns.
The stack slot holding the return value of self._get_mysql_handle() is another; it too is dropped once the surrounding expression has been evaluated.
.cursor() is a method, so it holds another reference for its self argument until the method exits.
The return value of .cursor() probably stores a reference; it is dropped when the cursor itself is reaped. That in turn depends on the lifetime of the local cursor variable in the use_age() method. As a local, it doesn't live beyond the use_age() call.
Other Python implementations use garbage collection strategies; Jython uses the Java runtime facilities, for example. The object may live a little longer, but won't 'leak'.
In Python versions < 3.4, you do need to watch out for creating circular references with custom classes that define a __del__ method. Those are the circular references that the gc module does not break. You can introspect such chains in the gc.garbage object.
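A quick way to convince yourself of this behaviour (my sketch, not part of the original answer; it needs Python 3.4+ for weakref.finalize) is to watch a finalizer fire as soon as the last reference to the returned object disappears:
import weakref

class Handle(object):
    pass

def get_handle():
    return Handle()

def use_age():
    h = get_handle()
    # Fires when the object is reclaimed.
    weakref.finalize(h, print, "handle reclaimed")

use_age()
print("after use_age")  # in CPython, "handle reclaimed" is printed before this line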

Memory leaks in Python when using an external C DLL

I have a Python module that calls a DLL written in C to encode XML strings. Once the function returns the encoded string, the memory allocated during this step is never deallocated. Concretely:
import ctypes

encodeMyString = ctypes.create_string_buffer(4096)
CallEncodingFuncInDLL(encodeMyString, InputXML)
I have looked at this, this, and this, and have also tried calling gc.collect(), but perhaps since the object has been allocated in an external DLL, Python's gc has no record of it and fails to remove it. But since the code keeps calling the encoding function, it keeps allocating memory, and eventually the Python process crashes. Is there a way to profile this memory usage?
Since you haven't given any information about the DLL, this will necessarily be pretty vague, but…
Python can't track memory allocated by something external that it doesn't know about. How could it? That memory could be part of the DLL's constant segment, or allocated with mmap or VirtualAlloc, or part of a larger object, or the DLL could just be expecting it to be alive for its own use.
Any DLL that has a function that allocates and returns a new object has to have a function that deallocates that object. For example, if CallEncodingFuncInDLL returns a new object that you're responsible for, there will be a function like DestroyEncodedThingInDLL that takes such an object and deallocates it.
So, when do you call this function?
Let's step back and make this more concrete. Let's say the function is plain old strdup, so the function you call to free up the memory is free. You have two choices for when to call free. No, I have no idea why you'd ever want to call strdup from Python, but it's about the simplest possible example, so let's pretend it's not useless.
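The snippets below use a libc handle and a buffer mybuf that are left implicit; one possible setup with ctypes on a POSIX-like system (an assumption on my part, including the find_library lookup) would be:
import ctypes
import ctypes.util

# Load the C runtime so strdup/free are callable from Python.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.free.argtypes = [ctypes.c_void_p]  # avoid pointer truncation on 64-bit

mybuf = ctypes.create_string_buffer(b"some C string to duplicate")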
The first option is to call strdup, immediately convert the returned value to a native Python object and free it, and not have to worry about it after that:
newbuf = libc.strdup(mybuf)
s = newbuf.value
libc.free(newbuf)
# now use s, which is just a Python bytes object, so it's GC-able
Or, better, wrap this up so it's automatic by using a custom restype callable:
def convert_and_free_char_p(char_p):
    try:
        return char_p.value
    finally:
        libc.free(char_p)

libc.strdup.restype = convert_and_free_char_p
s = libc.strdup(mybuf)
# now use s
But some objects can't be converted to a native Python object so easily—or they can be, but it's not very useful to do so, because you need to keep passing them back into the DLL. In that case, you can't clean it up until you're done with it.
The best way to do this is to wrap that opaque value up in a class that releases it on close or __exit__ or __del__ or whatever seems appropriate. One nice way to do this is with @contextlib.contextmanager:
import contextlib

@contextlib.contextmanager
def freeing(value):
    try:
        yield value
    finally:
        libc.free(value)
So:
newbuf = libc.strdup(mybuf)
with freeing(newbuf):
    do_stuff(newbuf)
    do_more_stuff(newbuf)
# automatically freed before you get here
# (or even if you don't, because of an exception/return/etc.)
Or:
@contextlib.contextmanager
def strduping(buf):
    value = libc.strdup(buf)
    try:
        yield value
    finally:
        libc.free(value)
And now:
with strduping(mybuf) as newbuf:
    do_stuff(newbuf)
    do_more_stuff(newbuf)
# again, automatically freed here

Should I worry about circular references in Python?

Suppose I have code that maintains a parent/children structure. In such a structure I get circular references, where a child points to a parent and a parent points to a child. Should I worry about them? I'm using Python 2.5.
I am concerned that they will not be garbage collected and the application will eventually consume all memory.
"Worry" is misplaced, but if your program turns out to be slow, consume more memory than expected, or have strange inexplicable pauses, the cause is indeed likely to be in those garbage reference loops -- they need to be garbage collected by a different procedure than "normal" (acyclic) reference graphs, and that collection is occasional and may be slow if you have a lot of objects tied up in such loops (the cyclical-garbage collection is also inhibited if an object in the loop has a __del__ special method).
So, reference loops will not affect your program's correctness, but may affect its performance and/or footprint.
If and when you want to remove unwanted loops of references, you can often use the weakref module in Python's standard library.
If and when you want to exert more direct control (or perform debugging, see what exactly is happening) regarding cyclical garbage collection, use the gc module in Python's standard library.
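As a small illustration of the weakref approach for a parent/children structure (a sketch of my own, not code from the question; the Node class is made up), the child holds only a weak reference to its parent, so no strong reference cycle is created:
import weakref

class Node(object):
    def __init__(self, parent=None):
        self.children = []
        # Hold the parent only weakly to avoid a reference cycle.
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # Returns None once the parent has been garbage collected.
        return self._parent() if self._parent is not None else None

root = Node()
child = Node(parent=root)
assert child.parent is root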
Experimentally: you're fine:
import itertools

for i in itertools.count():
    a = {}
    b = {"a": a}
    a["b"] = b
It consistently stays at using 3.6 MB of RAM.
Python will detect the cycle and release the memory when there are no outside references.
Circular references are a normal thing to do, so I don't see a reason to be worried about them. Many tree algorithms require that each node have links to its children and its parent. They're also required to implement something like a doubly linked list.
I don't think you should worry. Try the following program and you will see that it won't consume all memory:
# Python 2: range() returns a list here, so append() works.
while True:
    a = range(100)
    b = range(100)
    a.append(b)
    b.append(a)
    a.append(a)
    b.append(b)
There seems to be an issue with references to bound methods stored in a list on the instance. Here are two examples. The first one never calls __del__. The second one, with weakref, is OK for __del__. However, in this latter case the problem is that you cannot weakly reference methods: http://docs.python.org/2/library/weakref.html
import sys, weakref

class One():
    def __init__(self):
        self.counters = [self.count]

    def __del__(self):
        print("__del__ called")

    def count(self):
        print(sys.getrefcount(self))

sys.getrefcount(One)
one = One()
sys.getrefcount(One)
del one
sys.getrefcount(One)

class Two():
    def __init__(self):
        self.counters = [weakref.ref(self.count)]

    def __del__(self):
        print("__del__ called")

    def count(self):
        print(sys.getrefcount(self))

sys.getrefcount(Two)
two = Two()
sys.getrefcount(Two)
del two
sys.getrefcount(Two)
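On Python 3.4+, weakref.WeakMethod exists precisely for this situation: it weakly references a bound method without keeping the instance alive, so the list of counters no longer pins the object. A small sketch (the class name Three is made up for illustration):
import sys
import weakref

class Three(object):
    def __init__(self):
        # WeakMethod (Python 3.4+) avoids the self -> counters -> bound method -> self cycle.
        self.counters = [weakref.WeakMethod(self.count)]

    def __del__(self):
        print("__del__ called")

    def count(self):
        print(sys.getrefcount(self))

three = Three()
three.counters[0]()()   # dereference the WeakMethod, then call the method
del three               # __del__ fires immediately, since no cycle keeps the object alive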
