Deallocation of dict value in for-loop [duplicate] - python

I wrote a Python program that acts on a large input file to create a few million objects representing triangles. The algorithm is:
read an input file
process the file and create a list of triangles, represented by their vertices
output the result in the OFF format: a list of vertices followed by a list of triangles, where each triangle is represented by indices into the list of vertices
The requirement of OFF that I print out the complete list of vertices before I print out the triangles means that I have to hold the list of triangles in memory before I write the output to a file. In the meantime I'm getting memory errors because of the size of the lists.
What is the best way to tell Python that I no longer need some of the data, and it can be freed?

According to the official Python documentation, you can explicitly invoke the garbage collector to release unreferenced memory with gc.collect(). Example:
import gc
gc.collect()
You should do that after marking what you want to discard using del:
del my_array
del my_object
gc.collect()

Unfortunately (depending on your version and release of Python) some types of objects use "free lists" which are a neat local optimization but may cause memory fragmentation, specifically by making more and more memory "earmarked" for only objects of a certain type and thereby unavailable to the "general fund".
The only really reliable way to ensure that a large but temporary use of memory DOES return all resources to the system when it's done, is to have that use happen in a subprocess, which does the memory-hungry work then terminates. Under such conditions, the operating system WILL do its job, and gladly recycle all the resources the subprocess may have gobbled up. Fortunately, the multiprocessing module makes this kind of operation (which used to be rather a pain) not too bad in modern versions of Python.
In your use case, it seems that the best way for the subprocesses to accumulate some results and yet ensure those results are available to the main process is to use semi-temporary files (by semi-temporary I mean, NOT the kind of files that automatically go away when closed, just ordinary files that you explicitly delete when you're all done with them).
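Here is a minimal sketch of that subprocess approach, assuming the memory-hungry step can write its intermediate result to an ordinary file; the worker body and file names are placeholders, not the asker's actual parsing code:
from multiprocessing import Process

def memory_hungry_step(out_path):
    # placeholder for the real parsing/triangulation work
    triangles = [(i, i + 1, i + 2) for i in range(10 ** 6)]
    with open(out_path, "w") as f:
        for a, b, c in triangles:
            f.write("%d %d %d\n" % (a, b, c))
    # when this process exits, the OS reclaims everything it allocated

if __name__ == "__main__":
    p = Process(target=memory_hungry_step, args=("triangles.tmp",))
    p.start()
    p.join()
    # the parent only pays the cost of reading triangles.tmp back,
    # and deletes the semi-temporary file once it is done with it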

The del statement might be of use, but IIRC it isn't guaranteed to free the memory. The docs are here ... and an explanation of why the memory isn't released is here.
I have heard of people on Linux and Unix-type systems forking a Python process to do some work, getting the results, and then killing it.
This article has notes on the Python garbage collector, but I think the lack of memory control is the downside of managed memory.

Python is garbage-collected, so if you reduce the size of your list, it will reclaim memory. You can also use the "del" statement to get rid of a variable completely:
biglist = [blah,blah,blah]
#...
del biglist

(del can be your friend, as it marks objects as being deletable when there are no other references to them. However, the CPython interpreter often keeps this memory for later use, so your operating system might not see the "freed" memory.)
Maybe you would not run into any memory problem in the first place by using a more compact structure for your data.
For instance, lists of numbers are much less memory-efficient than the format used by the standard array module or the third-party numpy module. You would save memory by putting your vertices in a NumPy 3xN array and your triangles in an N-element array.
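As a rough sketch of the layout this answer suggests (the sizes and exact shapes here are made up for illustration):
import numpy as np

n_vertices, n_triangles = 1000000, 2000000

# one contiguous float array instead of a list of a million tuples
vertices = np.zeros((3, n_vertices), dtype=np.float64)
# three integer indices into the vertex array per triangle
triangles = np.zeros((n_triangles, 3), dtype=np.int32)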

You can't explicitly free memory. What you need to do is to make sure you don't keep references to objects. They will then be garbage collected, freeing the memory.
In cases like yours, where you need large lists, you typically need to reorganize the code, for example by using generators/iterators instead. That way you don't need to have the large lists in memory at all.
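A minimal sketch of the generator idea, assuming a made-up input format of nine floats per line (one triangle); the real parsing would obviously differ:
def read_triangles(path):
    # yield one triangle at a time instead of building the whole list
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) == 9:
                yield tuple(float(x) for x in fields)

for triangle in read_triangles("mesh_input.txt"):
    pass  # process each triangle, then let it be garbage collected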

I had a similar problem reading a graph from a file. The processing included the computation of a 200 000 x 200 000 float matrix (one line at a time) that did not fit into memory. Trying to free the memory between computations using gc.collect() fixed the memory-related aspect of the problem, but it resulted in performance issues: I don't know why, but even though the amount of used memory remained constant, each new call to gc.collect() took more time than the previous one. So quite quickly the garbage collecting took most of the computation time.
To fix both the memory and performance issues I switched to a multithreading trick I read somewhere once (I'm sorry, I cannot find the related post anymore). Previously I read each line of the file in a big for loop, processed it, and ran gc.collect() every once in a while to free memory. Now I call a function that reads and processes a chunk of the file in a new thread. Once the thread ends, the memory is automatically freed, without the strange performance issue.
Practically it works like this:
from dask import delayed  # this module wraps the multithreading

def f(storage, index, chunk_size):  # the processing function
    # read the chunk of size chunk_size starting at index in the file
    # process it using data in storage if needed
    # append data needed for further computations to storage
    return storage

partial_result = delayed([])  # put into delayed() the constructor for your data structure
# I personally use "delayed(nx.Graph())" since I am creating a networkx Graph
chunk_size = 100  # ideally as big as possible while still letting the computations fit in memory

for index in range(0, len(file), chunk_size):
    # tell dask that we will want to apply f to the parameters partial_result, index, chunk_size
    partial_result = delayed(f)(partial_result, index, chunk_size)
    # no computations are done yet!
    # dask will spawn a thread to run f(partial_result, index, chunk_size) once we call partial_result.compute()
    # passing the previous "partial_result" variable as a parameter ensures a chunk is only processed after the previous one is done
    # it also allows you to use the results of the processing of the previous chunks if needed

# this launches all the computations
result = partial_result.compute()
# one thread is spawned for each "delayed", one at a time, to compute its result
# dask then closes the thread, which solves the memory-freeing issue
# the strange performance issue with gc.collect() is also avoided

Others have posted some ways that you might be able to "coax" the Python interpreter into freeing the memory (or otherwise avoid having memory problems). Chances are you should try their ideas out first. However, I feel it important to give you a direct answer to your question.
There isn't really any way to directly tell Python to free memory. The fact of the matter is that if you want that low a level of control, you're going to have to write an extension in C or C++.
That said, there are some tools to help with this:
Cython
SWIG
Boost.Python

As other answers already say, Python can keep from releasing memory to the OS even if it's no longer in use by Python code (so gc.collect() doesn't free anything), especially in a long-running program. Anyway, if you're on Linux you can try to release memory by directly invoking the libc function malloc_trim (man page).
Something like:
import ctypes
libc = ctypes.CDLL("libc.so.6")
libc.malloc_trim(0)

If you don't care about vertex reuse, you could have two output files--one for vertices and one for triangles. Then append the triangle file to the vertex file when you are done.
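A rough sketch of that two-file approach, assuming you track the vertex and triangle counts yourself; the file names and the write_off helper are made up for illustration:
import shutil

def write_off(vertex_tmp, face_tmp, out_path, n_vertices, n_faces):
    # stitch the two temporary files together into a single OFF file
    with open(out_path, "w") as out:
        out.write("OFF\n%d %d 0\n" % (n_vertices, n_faces))
        for tmp in (vertex_tmp, face_tmp):
            with open(tmp) as f:
                shutil.copyfileobj(f, out)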

Related

Why does Python's multiprocessing module only support two data types?

According to Python's multiprocessing documentation:
Data can be stored in a shared memory map using Value or Array.
Is shared memory treated differently than memory that is typically allocated to a process? Why does Python only support two data structures?
I'm guessing it has to do with garbage collection and perhaps exists for the same reasons the GIL does. If this is the case, how/why are Value and Array implemented to be an exception to this?
I'm not remotely an expert on this, so this is definitely not a complete answer. There are a couple of things I think come into play here:
Processes have their own memory space, so if we share "normal" variables between processes and try to write to them, each process will end up with its own copy (perhaps using copy-on-write semantics).
Shared memory needs some sort of abstraction or primitive as it exists outside of process memory (SOURCE)
Value and Array, by default, are thread/process safe for concurrent use by guarding access with locks, handling allocations to shared memory AND protecting it :)
The linked documentation answers "yes" to:
is shared memory treated differently than memory that is typically allocated to a process?
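For reference, a minimal example of how Value and Array are typically used (toy values, not tied to the question's code):
from multiprocessing import Process, Value, Array

def worker(counter, data):
    with counter.get_lock():      # Value carries its own lock
        counter.value += 1
    for i in range(len(data)):    # Array elements live in shared memory
        data[i] *= 2

if __name__ == "__main__":
    counter = Value("i", 0)              # shared C int
    data = Array("d", [1.0, 2.0, 3.0])   # shared array of C doubles
    p = Process(target=worker, args=(counter, data))
    p.start()
    p.join()
    print(counter.value, list(data))     # changes made by the child are visible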

How to clear up memory when using python?

I'm working with fairly large dataframes and text files (thousands of docs) that I am opening up in my IPython notebook. I'm noticing that after a while, my computer becomes really slow. Is there a way to take inventory of my python program to find out what's slowing down my computer?
You have a few options. First, you can use third party tools like heapy or PySizer to evaluate your memory usage at different points in your program. This (now closed) SO question discusses them a little bit. Additionally, there is a third option simply called 'memory_profiler' hosted here on GitHub, and according to this blog there are some special shortcuts in IPython for memory_profiler.
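As a quick illustration of memory_profiler (the function here is just a toy; check the project's docs for the exact IPython magics mentioned in that blog, which are reportedly %memit and %mprun after %load_ext memory_profiler):
# pip install memory_profiler
from memory_profiler import profile

@profile          # prints a line-by-line memory report when the function runs
def build_data():
    data = [list(range(1000)) for _ in range(10000)]
    return len(data)

if __name__ == "__main__":
    build_data()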
Once you have identified the data structures that are consuming the most memory, there are a few options:
Refactor to take advantage of garbage collection
Examine the flow of data through your program and see if there are any places where large data structures are kept around when they don't need to be. If you have a large data structure that you do some processing on, put that processing in a function and return the processed result so the original memory hog can go out of scope and be destroyed.
A comment suggested using the del statement. Although the commenter is correct that it will free memory, it really should indicate to you that your program isn't structured correctly. Python has good garbage collection, and if you find yourself manually messing with memory freeing, you should probably put that section of code in a function or method instead, and let the garbage collector do its thing.
Temporary Files
If you really need access to large data structures (almost) simultaneously, consider writing one or several of them to temporary files while not needed. You can use the JSON or Pickle libraries to write stuff out in sophisticated formats, or simply pprint your data to a file and read it back in later.
I know that seems like some kind of manual hard disk thrashing, but it gives you great control over exactly when the writes to and reads from the hard disk occur. Also, in this case only your files are bouncing on and off the disk. When you use up your memory and swapping starts occurring, everything gets bounced around - data files, program instructions, memory page tables, etc... Everything grinds to a halt instead of just your program running a little more slowly.
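A small sketch of that idea using the standard library (the helper names are made up):
import os, pickle, tempfile

def stash(obj):
    # write obj to a temporary file and return the path
    fd, path = tempfile.mkstemp(suffix=".pkl")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return path

def restore(path, delete=True):
    # read the object back in only when you actually need it again
    with open(path, "rb") as f:
        obj = pickle.load(f)
    if delete:
        os.remove(path)
    return obj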
Buy More Memory
Yes, this is an option. But like the del statement, it can usually be avoided by more careful data abstraction and should be a last resort, reserved for special cases.
IPython is a wonderful tool, but sometimes it tends to slow things down.
If you have large print output statements, lots of graphics, or your code has grown too big, autosave takes forever to save your notebooks. Try autosaving less frequently with:
%autosave 300
Or disabling it entirely:
%autosave 0

How to track memory for a python script

We have a system that only has one interpreter. Many user scripts come through this interpreter. We want to put a cap on each script's memory usage. There is only one process, and that process invokes tasklets for each script. Since we only have one interpreter and one process, we don't know of a way to put a cap on each script's memory usage. What is the best way to do this?
I don't think it's possible at all. Your question implies that the memory used by your tasklets is completely separate, which is probably not the case. Python optimizes small objects like integers: as far as I know, every 3 in your code uses the same object, which is not a problem because it is immutable. So if two of your tasklets use the same (small?) integer, they are already sharing memory. ;-)
Memory is separated at OS process level. There's no easy way to tell to which tasklet and even to which thread does a particular object belong.
Also, there's no easy way to add a custom bookkeeping allocator that would analyze which tasklet or thread is allocating a piece of memory and prevent it from allocating too much. It would also need to plug into the garbage-collection code to discount objects which are freed.
Unless you're keen to write a custom Python interpreter, using a process per task is your best bet.
You don't even need to kill and respawn the interpreters every time you need to run another script. Pool several interpreters and only kill the ones that overgrow a certain memory threshold after running a script. Limit the interpreters' memory consumption by means provided by the OS if you need to; a sketch of one such approach follows this answer.
If you need to share large amounts of common data between the tasks, use shared memory; for smaller interactions, use sockets (with a messaging level above them as needed).
Yes, this might be slower than your current setup. But from your use of Python I suppose that in these scripts you don't do any time-critical computing anyway.
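For instance, on Linux/Unix a per-process cap can be set with the resource module before running the untrusted script in a worker process. This is a rough sketch; the run_with_cap helper and the 512 MB limit are illustrative assumptions, not a complete sandbox:
import resource
from multiprocessing import Process

def _run_capped(source, mem_limit_bytes):
    # cap this worker's address space; a runaway script gets MemoryError
    resource.setrlimit(resource.RLIMIT_AS, (mem_limit_bytes, mem_limit_bytes))
    exec(compile(source, "<user script>", "exec"), {})

def run_with_cap(source, mem_limit_bytes=512 * 1024 * 1024):
    p = Process(target=_run_capped, args=(source, mem_limit_bytes))
    p.start()
    p.join()
    return p.exitcode   # non-zero if the script crashed

if __name__ == "__main__":
    print(run_with_cap("x = list(range(10))"))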

How to deserialize 1GB of objects into Python faster than cPickle?

We've got a Python-based web server that unpickles a number of large data files on startup using cPickle. The data files (pickled using HIGHEST_PROTOCOL) are around 0.4 GB on disk and load into memory as about 1.2 GB of Python objects -- this takes about 20 seconds. We're using Python 2.6 on 64-bit Windows machines.
The bottleneck is certainly not disk (it takes less than 0.5s to actually read that much data), but memory allocation and object creation (there are millions of objects being created). We want to reduce the 20s to decrease startup time.
Is there any way to deserialize more than 1GB of objects into Python much faster than cPickle (like 5-10x)? Because the execution time is bound by memory allocation and object creation, I presume using another unpickling technique such as JSON wouldn't help here.
I know some interpreted languages have a way to save their entire memory image as a disk file, so they can load it back into memory all in one go, without allocation/creation for each object. Is there a way to do this, or achieve something similar, in Python?
Try the marshal module - it's internal (used by the byte-compiler) and intentionally not advertised much, but it is much faster. Note that it doesn't serialize arbitrary instances like pickle, only builtin types (don't remember the exact constraints, see docs). Also note that the format isn't stable.
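A tiny example of the marshal round trip (it only works for builtin types, and the on-disk format may change between Python versions; the file name is made up):
import marshal

data = {"vertices": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], "faces": [(0, 1, 2)]}

with open("cache.marshal", "wb") as f:
    marshal.dump(data, f)

with open("cache.marshal", "rb") as f:
    restored = marshal.load(f)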
If you need to initialize multiple processes and can tolerate one process always loaded, there is an elegant solution: load the objects in one process, and then do nothing in it except forking processes on demand. Forking is fast (copy on write) and shares the memory between all processes. [Disclaimers: untested; unlike Ruby, Python ref counting will trigger page copies so this is probably useless if you have huge objects and/or access a small fraction of them.]
If your objects contain lots of raw data like numpy arrays, you can memory-map them for much faster startup. pytables is also good for these scenarios.
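A sketch of the memory-mapping suggestion for the numpy case (file name invented; the data is only paged in when it is actually touched):
import numpy as np

arr = np.random.rand(1000000, 3)
np.save("vertices.npy", arr)           # done once, ahead of time

# at startup: map the file instead of reading it all into memory
vertices = np.load("vertices.npy", mmap_mode="r")
print(vertices[42])                    # this page is faulted in on demand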
If you'll only use a small part of the objects, then an OO database (like Zope's) can probably help you. Though if you need them all in memory, you will just waste lots of overhead for little gain. (never used one, so this might be nonsense).
Maybe other python implementations can do it? Don't know, just a thought...
Are you load()ing the pickled data directly from the file? What about trying to load the file into memory first and then doing the load?
I would start with trying the cStringIO(); alternatively you may try to write your own version of StringIO that would use buffer() to slice the memory which would reduce the needed copy() operations (cStringIO still may be faster, but you'll have to try).
There are sometimes huge performance bottlenecks when doing these kinds of operations, especially on the Windows platform; the Windows system is somehow very unoptimized for doing lots of small reads, while UNIXes cope quite well. If load() does lots of small reads, or you are calling load() several times to read the data, this would help.
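On Python 3 the same idea looks roughly like this (the answer's cStringIO is the Python 2 counterpart of io.BytesIO; the file name is illustrative):
import io
import pickle

with open("data.pkl", "rb") as f:
    buf = f.read()                      # one large sequential read from disk

objects = pickle.load(io.BytesIO(buf))  # unpickle from memory, avoiding many small file reads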
I haven't used cPickle (or Python) but in cases like this I think the best strategy is to
avoid unnecessary loading of the objects until they are really needed - say, load after startup on a different thread. Actually, it's usually better to avoid unnecessary loading/initialization at any time, for obvious reasons. Google 'lazy loading' or 'lazy initialization'. If you really need all the objects to do some task before server startup, then maybe you can try to implement a manual custom deserialization method; in other words, implement something yourself if you have intimate knowledge of the data you will deal with, which can help you 'squeeze' better performance than the general tool for dealing with it.
Did you try sacrificing efficiency of pickling by not using HIGHEST_PROTOCOL? It isn't clear what performance costs are associated with using this protocol, but it might be worth a try.
Impossible to answer this without knowing more about what sort of data you are loading and how you are using it.
If it is some sort of business logic, maybe you should try turning it into a pre-compiled module;
If it is structured data, can you delegate it to a database and only pull what is needed?
Does the data have a regular structure? Is there any way to divide it up and decide what is required and only then load it?
I'll add another answer that might be helpful - if you can, try to define __slots__ on the class that is most commonly created. This may be a little limiting or even impossible in your case, but it seems to have cut the time needed for initialization in my test to about half.
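A minimal sketch of what that looks like (the Vertex class is a made-up example, not the asker's actual class):
class Vertex(object):
    __slots__ = ("x", "y", "z")   # no per-instance __dict__, so less memory and faster creation

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z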

Python reclaiming memory after deleting items in a dictionary

I have a relatively large dictionary in Python and would like to be able to not only delete items from it, but actually reclaim the memory back from these deletions in my program. I am running across a problem whereby although I delete items from the dictionary and even run the garbage collector manually, Python does not appear to be freeing the memory itself.
A simple example of this:
>>> tupdict = {}
# consumes around 2 GB of memory
>>> for i in xrange(12500000):
...     tupdict[i] = (i,i)
...
# delete over half the entries, no drop in consumed memory
>>> for i in xrange(7500000):
...     del tupdict[i]
...
>>> import gc
# manually garbage collect, still no drop in consumed memory after this
>>> gc.collect()
0
>>>
I imagine what is happening is that although the entries are deleted and the garbage collector is run, Python does not go ahead and resize the dictionary. My question is: is there any simple way around this, or am I likely to require a more serious rethink about how I write my program?
A lot of factors go into whether Python returns this memory to the underlying OS or not, which is probably how you're trying to tell if memory is being freed. CPython has a pooled allocator system that tends to hold on to freed memory so that it can be reused in an efficient manner (but these subsequent allocations won't increase your memory footprint from the perspective of the OS), which might be what you're seeing.
Also, on some unix platforms processes don't release freed memory back to the OS until the application closes (or some other significant event occurs). Even if you are in a situation where an entire pool has been freed (and thus Python might decide to free() it rather than holding it open for future objects), the OS still won't release this memory to be used by other processes (but can be used for further reallocation within the original process). In general this is good for reducing memory fragmentation and doesn't have too much of a downside, as the unused process memory will get paged out to disk. Windows does release process memory back to the OS for use by any new allocation (which you can then see in the Task Manager), so trying this on Windows will likely appear to give you a different result.
In the end, how to manage deallocated process memory is the purview of the operating system, and there are various schemes (with upsides and downsides) used such that just looking in your system information tool of choice won't necessarily tell you the whole truth.
You're right that Python doesn't resize the dictionary back down when items are deleted from it. This has nothing to do with OS memory management or garbage collection; it is an implementation detail of Python's dict data structure.
A workaround is to create a new dictionary by copying the old dictionary. Check this great video for more info: http://pyvideo.org/video/276/the-mighty-dictionary-55 (around 26:30 there is an answer).
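In practice, the copy-and-replace workaround looks roughly like this, using the question's own numbers:
# rebuild the dict from only the entries you still need; once the old dict has
# no remaining references, its oversized hash table can be garbage collected
tupdict = dict((k, v) for k, v in tupdict.iteritems() if k >= 7500000)
# (use tupdict.items() on Python 3)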
