As far as I know, a buffer holds data that has yet to be written to disk, while a cache holds data that has been read from disk and stored for later use.
But what about this mechanism: in Python, when a piece of memory is no longer being used, the system keeps that area around for the next allocation instead of releasing it immediately.
I am wondering: does this area belong to the buffer or the cache?
Thanks.
As far as I understand, the mechanism you mentioned is related to Python's memory management and garbage collection.
This isn't related to buffering or caching data. A cache and a buffer are different things, both used to reduce disk operations (reading data from or writing data to disk).
Python's memory mechanism is about how the interpreter allocates memory from the operating system.
You can read more about Python's garbage collector here and about the difference between a cache and a buffer here.
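As a rough illustration, here's a minimal sketch (Linux-only; rss_kb is a hypothetical helper reading /proc) of how CPython can hold on to freed memory for reuse rather than returning it to the OS right away:

import gc

def rss_kb():
    # current resident set size in kB (Linux-specific, hypothetical helper)
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

before = rss_kb()
big = [object() for _ in range(10**6)]
after_alloc = rss_kb()
del big
gc.collect()
after_free = rss_kb()
print(before, after_alloc, after_free)
# after_free often stays well above before: CPython's allocator keeps
# freed pools/arenas around for the next allocation (exact behavior
# varies by version and platform).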
According to Python's multiprocessing documentation:
Data can be stored in a shared memory map using Value or Array.
Is shared memory treated differently than memory that is typically allocated to a process? Why does Python only support two data structures?
I'm guessing it has to do with garbage collection and is perhaps along the same reasons GIL exists. If this is the case, how/why are Value and Array implemented to be an exception to this?
I'm not remotely an expert on this, so this is definitely not a complete answer. There are a couple of things I think this considers:
Processes have their own memory space, so if we shared "normal" variables between processes, each process would end up writing to its own copy (perhaps using copy-on-write semantics).
Shared memory needs some sort of abstraction or primitive as it exists outside of process memory (SOURCE)
Value and Array are, by default, thread-/process-safe for concurrent use: they guard access with locks, handling allocation in shared memory AND protecting it :)
So the attached documentation can already answer "yes" to:
is shared memory treated differently than memory that is typically allocated to a process?
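As a minimal sketch of that last point, using only the standard multiprocessing API (the values are just toy numbers):

from multiprocessing import Process, Value, Array

def work(counter, data):
    with counter.get_lock():       # the lock Value allocates by default
        counter.value += 1
    for i in range(len(data)):     # element access is also lock-guarded
        data[i] *= 2

if __name__ == '__main__':
    counter = Value('i', 0)              # shared C int in shared memory
    data = Array('d', [0.5, 1.5, 2.5])   # shared C double array
    p = Process(target=work, args=(counter, data))
    p.start()
    p.join()
    print(counter.value, list(data))     # 1 [1.0, 3.0, 5.0]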
Everywhere I see shared memory implementations for python (e.g. in multiprocessing), creating shared memory always allocates new memory. Is there a way to create a shared memory object and have it refer to existing memory? The purpose would be to pre-initialize the data values, or rather, to avoid having to copy into the new shared memory if we already have, say, an array in hand. In my experience, allocating a large shared array is much faster than copying values into it.
The short answer is no.
I'm the author of the Python extensions posix_ipc and sysv_ipc. Like Python's multiprocessing module from the standard library, my modules are just wrappers around facilities provided by the operating system, so what you really need to know is what the OS allows when allocating shared memory. That differs a little for SysV IPC and POSIX IPC, but in this context the difference isn't really important. (I think multiprocessing uses POSIX IPC where possible.)
For SysV IPC, the OS-level call to allocate shared memory is shmget(). You can see on that call's man page that it doesn't accept a pointer to existing memory; it always allocates new memory for you. Ditto for the POSIX IPC version of the same call (shm_open()). POSIX IPC is interesting because it implements shared memory to look like a memory mapped file, so it behaves a bit differently from SysV IPC.
Regardless, whether one is calling from Python or C, there's no option to ask the operating system to turn an existing piece of private memory into shared memory.
If you think about it, you'll see why. Suppose you could pass a pointer to a chunk of private memory to shmget() or shm_open(). Now the operating system is stuck with the job of keeping that memory where it is until all sharing processes are done with it. What if it's in the middle of your stack? Suddenly this big chunk of your stack can't be allocated because other processes are using it. It also means that when your process dies, the OS can't release all its memory because some of it is now being used by other processes.
In short, what you're asking for from Python isn't offered because the underlying OS calls don't allow it, and the underlying OS calls don't allow it (probably) because it would be really messy for the OS.
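To make that concrete, here's a sketch using the newer multiprocessing.shared_memory module (Python 3.8+), which has the same constraint: the segment is freshly allocated by the OS, so pre-existing data must be copied in:

from multiprocessing import shared_memory
import array

src = array.array('d', [1.0, 2.0, 3.0])   # existing private memory
payload = src.tobytes()

shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[:len(payload)] = payload          # the unavoidable copy
# other processes attach with shared_memory.SharedMemory(name=shm.name)

shm.close()
shm.unlink()                              # release the segment when done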
I read several ZODB tutorials but here is one thing I still don't get: How do you free memory that is already serialized (and committed) to the (say) FileStorage?
More specifically, I want the following code to stop eating all my memory:
for i in xrange(bignumber):
    iobtree[i] = Bigobject()  # Bigobject is about 1 MB
    if i % 10 == 0:
        transaction.commit()  # or savepoint(True)
transaction.commit()
How can this be achieved? Is it possible to release references stored by iobtree and replace them by 'weak references' that would be accessible on demand?
Creating savepoints and committing the transaction already clears a lot of your memory.
You'll need to check what your ZODB cache parameters are set to, and tune these as necessary. The cache size parameter indicates the number of objects cached, not bytes, so you'll have to adjust this based on the size of your objects.
You can also try calling .cacheMinimize() on the ZODB connection object; this explicitly deactivates any unmodified (or already committed) objects in the cache.
Other than that, do note that even when Python frees objects from memory, the OS doesn't always reclaim that freed memory until it is needed for something else. OS-reported memory usage doesn't necessarily reflect actual memory requirements for a Python process.
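Putting those pieces together, here's a hedged sketch of the loop from the question (bignumber and Bigobject are the question's placeholders; the cache_size value is just an example to tune):

from ZODB import DB
from ZODB.FileStorage import FileStorage
from BTrees.IOBTree import IOBTree
import transaction

db = DB(FileStorage('Data.fs'), cache_size=20)  # counted in objects, not bytes
conn = db.open()
root = conn.root()
root['tree'] = iobtree = IOBTree()

for i in xrange(bignumber):
    iobtree[i] = Bigobject()                    # ~1 MB persistent object
    if i % 10 == 0:
        transaction.savepoint(True)             # write modified objects out
        conn.cacheMinimize()                    # ghost unmodified cached objects
transaction.commit()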
I have lots of data to operate on (write, sort, read). This data can potentially be larger than the main memory and doesn't need to be stored permanently.
Is there any kind of library/database that can store this data for me in memory and automagically fall back to disk if the system runs into an OOM situation? The API and storage type are unimportant as long as it can store basic Python types (str, int, list, date, and ideally dict).
Python's built-in sqlite3 caches file-system writes.
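For example, a throwaway file-backed database (the path is just an example) keeps hot pages in memory via the OS page cache while letting data spill to disk:

import sqlite3

conn = sqlite3.connect('/tmp/scratch.db')   # example path; remove when done
conn.execute('CREATE TABLE items (k INTEGER PRIMARY KEY, v TEXT)')
conn.executemany('INSERT INTO items VALUES (?, ?)',
                 ((i, str(i)) for i in range(1000)))
conn.commit()
for row in conn.execute('SELECT v FROM items ORDER BY k LIMIT 3'):
    print(row)                              # sorted reads come straight from SQL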
I will go for the in-memory solution and let the OS swap. I can still replace the storage component if it really turns out to be a problem. Thanks agf.
I have a relatively large dictionary in Python and would like to be able to not only delete items from it, but actually reclaim the memory back from these deletions in my program. I am running across a problem whereby although I delete items from the dictionary and even run the garbage collector manually, Python does not appear to be freeing the memory itself.
A simple example of this:
>>> tupdict = {}
# consumes around 2 GB of memory
>>> for i in xrange(12500000):
...     tupdict[i] = (i, i)
...
# delete over half the entries, no drop in consumed memory
>>> for i in xrange(7500000):
...     del tupdict[i]
...
>>> import gc
# manually garbage collect, still no drop in consumed memory after this
>>> gc.collect()
0
>>>
I imagine what is happening is that although the entries are deleted and garbage collector run, Python does not go ahead and resize the dictionary. My question is, is there any simple way around this, or am I likely to require a more serious rethink about how I write my program?
A lot of factors go into whether Python returns this memory to the underlying OS or not, which is probably how you're trying to tell if memory is being freed. CPython has a pooled allocator system that tends to hold on to freed memory so that it can be reused in an efficient manner (but these subsequent allocations won't increase your memory footprint from the perspective of the OS), which might be what you're seeing.
Also, on some unix platforms processes don't release freed memory back to the OS until the application closes (or some other significant event occurs). Even if you are in a situation where an entire pool has been freed (and thus Python might decide to free() it rather than holding it open for future objects), the OS still won't release this memory to be used by other processes (but can be used for further reallocation within the original process). In general this is good for reducing memory fragmentation and doesn't have too much of a downside, as the unused process memory will get paged out to disk. Windows does release process memory back to the OS for use by any new allocation (which you can then see in the Task Manager), so trying this on Windows will likely appear to give you a different result.
In the end, how to manage deallocated process memory is the purview of the operating system, and there are various schemes (with upsides and downsides) used such that just looking in your system information tool of choice won't necessarily tell you the whole truth.
You're right that Python doesn't resize the dictionary back down when items are deleted from it. This has nothing to do with OS memory management or garbage collection; it is an implementation detail of Python's dict data structure.
A workaround is to create a new dictionary by copying the old one. Check this great video for more info: http://pyvideo.org/video/276/the-mighty-dictionary-55 (around 26:30 there is an answer).
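In code, the workaround is just a sketch like this (continuing the question's session, where gc is already imported):

# rebuild so CPython allocates a hash table sized for the surviving entries
tupdict = dict(tupdict)
gc.collect()  # the old oversized table is now garbage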