My Python application reads data from files and stores it in dictionaries during start-up (the dictionaries are attributes of data reader classes). Once the application has started and the data has been used, the data in these dictionaries is no longer needed, yet it consumes a large amount of memory. How do I delete these dictionaries to free the memory?
For example:
class DataReader():
    def __init__(self, data_file):
        self.data_file = data_file

    def read_data_file_and_store_data_in_dictionary(self):
        self.data_dictionary = {}
        for data_name, data in self.data_file:
            self.data_dictionary[data_name] = data
class Application():
    def __init__(self, data_file):
        self.data_reader = DataReader(data_file)
        self.data_reader.read_data_file_and_store_data_in_dictionary()

    def start_app(self):
        self.use_read_data()
After the application has started, self.data_dictionary is no longer needed. How do I delete self.data_dictionary permanently?
Use the del statement:
del self.data_dictionary # or del obj.data_dictionary
Note that this only deletes this one reference to the dictionary. If any other references to the dictionary still exist (say you had done d = data_reader.data_dictionary and d still points to the same dictionary), then the dictionary will not be freed from memory. The same applies to live view objects returned by d.keys(), d.values(), and d.items().
Only when all references have been removed will the dictionary finally be released.
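For example, a quick sketch of this behaviour (using sys.getrefcount, which always reports one extra reference because it holds its own argument):

```python
import sys

d = {"large": "payload"}
alias = d                  # a second reference to the same dict

# getrefcount counts its own argument too, so the two names above
# show up here as a count of at least 3
print(sys.getrefcount(d))  # d, alias, and the argument itself

del d                      # removes only the name "d"
print(alias)               # the dict is still alive via "alias"

del alias                  # last reference gone; CPython frees it now
```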
Using Python, you should rarely need to care about memory management.
Python has an excellent garbage collector, which keeps a reference count for each object in the code.
When there are no references left, the object is deallocated.
In your case, if the memory is not freed after you're done using the data, it means the object can still be reached from your program. If you delete the name and then try to use it, you will get a NameError (or an AttributeError, for an instance attribute).
Other answers suggest using del, but del only removes the name, not the object itself.
My suggestion is to make sure your code no longer references the object at all. If it still does, restructure your data accordingly (use a lightweight database, save it to the local hard drive, ...) and retrieve it when needed. If your big dictionaries are attributes of a class which is still in use but no longer needs them, move those dictionaries out of the class (perhaps into a new class that only manages the dictionaries). In this Q&A you will find useful tips for optimizing memory usage.
You can read this article for a really interesting dive into Python's garbage collector.
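As a rough sketch of the "save them on local hard drive" idea, here is one way to offload a large dict with the standard shelve module and fetch single values back on demand (the names path and big_dict are illustrative, not from the question):

```python
import os
import shelve
import tempfile

# Illustrative stand-ins for a real on-disk location and a real large dict
path = os.path.join(tempfile.mkdtemp(), "data_store")
big_dict = {"alpha": 1, "beta": 2}

with shelve.open(path) as db:
    db.update(big_dict)        # persist the entries to disk

del big_dict                   # drop the in-memory copy

with shelve.open(path) as db:  # later, fetch a single value on demand
    print(db["alpha"])         # -> 1
```

This trades memory for disk reads, which is usually acceptable for data that is only needed occasionally after start-up.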
How about having the data in a smaller scope?
class Application():
    def __init__(self, data_file):
        self.use_read_data(DataReader(data_file).read())
After the application is started, self.data_dictionary is no longer needed
If you do not need the data for the whole lifetime of the application then you shouldn't be storing it in an instance attribute.
Choose the right scope and you won't need to care about deleting variables.
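A minimal sketch of the same idea: keep the dictionary as a function local, so it is released as soon as the function returns (load_and_use and the sample data are made-up names for illustration):

```python
def load_and_use(data_file):
    # data_dictionary lives only inside this function's scope
    data_dictionary = dict(data_file)
    return sum(data_dictionary.values())

# once load_and_use returns, data_dictionary's reference count drops
# to zero and CPython frees it immediately
print(load_and_use([("a", 1), ("b", 2)]))  # -> 3
```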
del will delete the reference to the object, but the object itself may still be in memory. In that case, the garbage collector (gc.collect([generation])) will free the memory:
https://docs.python.org/2.7/library/gc.html
import gc
[...]
# Delete the reference (avoid "object", which shadows the builtin)
del obj
# Garbage collector
gc.collect()
[...]
Related
I am writing a python class like this:
class MyImageProcessor:
    def __init__(self, image, metadata):
        self.image = image
        self.metadata = metadata
Both image and metadata are objects of a class written by a colleague. Now I need to make sure there is no waste of memory. I am thinking of defining a quit() method like this:
def quit(self):
    self.image = None
    self.metadata = None
    import gc
    gc.collect()
and suggesting that users call quit() systematically. I would like to know whether this is the right way to do it. In particular, do the instructions in quit() above guarantee that unused memory is properly collected?
Alternatively, I could rename quit() to the built-in __exit__() and suggest that users use the "with" syntax. But my question is more about whether the instructions in quit() indeed do the garbage-collection work one would need in this situation.
Thank you for your help.
In Python every object has a built-in reference count, and the variables (names) you create are only pointers to objects. There are mutable and immutable objects (for example, if you change the value of an integer, the name is pointed at a different integer object, while changing a list element does not rebind the list's name).
The reference count basically tracks how many names refer to an object, and it is incremented/decremented automatically.
The garbage collector destroys objects with zero references (actually not always; it takes extra steps to save time). You should check out this article.
Similarly to constructors (__init__()), which are called on object creation, you can define destructors (__del__()), which are executed on object deletion (usually when the reference count drops to zero). According to this article, they are not needed in Python as much as in C++, because Python has a garbage collector that handles memory management automatically. You can check out those examples too.
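A small sketch of a destructor in action (Resource is a made-up class; the fact that __del__ runs immediately after del relies on CPython's reference counting):

```python
released = []

class Resource:
    """Toy object whose __del__ records that it was finalized."""
    def __del__(self):
        released.append("finalized")

r = Resource()
del r              # last reference gone; CPython calls __del__ right away
print(released)    # -> ['finalized']
```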
Hope it helps :)
No need for quit() (assuming you're using CPython).
Python uses two methods of garbage collection, as alluded to in the other answers.
First, there's reference counting. Essentially, each time you add a reference to an object its count is incremented, and each time you remove a reference (e.g., it goes out of scope) it is decremented.
From https://devguide.python.org/garbage_collector/:
When an object’s reference count becomes zero, the object is deallocated. If it contains references to other objects, their reference counts are decremented. Those other objects may be deallocated in turn, if this decrement makes their reference count become zero, and so on.
You can get information about current reference counts for an object using sys.getrefcount(x), but really, why bother.
The second way is through garbage collection (gc). [Reference counting is a type of garbage collection, but python specifically calls this second method "garbage collection" -- so we'll also use this terminology. ] This is intended to find those places where reference count is not zero, but the object is no longer accessible. ("Reference cycles") For example:
class MyObj:
    pass

x = MyObj()
x.self = x
Here, x refers to itself, so the actual reference count for x is more than 1. You can call del x but that merely removes it from your scope: it lives on because "someone" still has a reference to it.
gc, and specifically gc.collect() goes through objects looking for cycles like this and, when it finds an unreachable cycle (such as your x post deletion), it will deallocate the whole lot.
Back to your question: You don't need a quit() method, because as soon as your MyImageProcessor object goes out of scope, it decrements the reference counters for image and metadata. If that puts them to zero, they're deallocated. If it doesn't, well, someone else is using them.
Setting them to None first merely decrements the reference count right then; when MyImageProcessor goes out of scope, it won't decrement those reference counts again, because MyImageProcessor no longer holds the image or metadata objects. So you're just doing explicitly what Python already does for you for free: no more, no less.
You didn't create a cycle, so your calling gc.collect() is unlikely to change anything.
Check out https://devguide.python.org/garbage_collector/ if you are interested in more of the gritty details.
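To make the cycle case concrete, a small sketch (MyObj as in the snippet above; gc.collect() returns the number of unreachable objects it found):

```python
import gc

class MyObj:
    pass

x = MyObj()
x.self = x                  # a reference cycle: x refers to itself

del x                       # the name is gone, but the cycle keeps it alive

unreachable = gc.collect()  # the cycle detector finds and frees it
print(unreachable >= 1)     # -> True
```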
Not sure if it makes sense, but by my logic you could use:
gc.get_count()
before and after
gc.collect()
to see whether anything has been removed.
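A minimal sketch of that check (the exact counts vary from run to run, so treat the numbers as indicative only):

```python
import gc

# gc.get_count() returns the allocation counts of the three generations
before = gc.get_count()   # e.g. (121, 5, 2)
collected = gc.collect()  # force a full collection
after = gc.get_count()    # counts are low again after the collection

print(before, collected, after)
```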
what are count0, count1 and count2 values returned by the Python gc.get_count()
I have this python class in which I need to do
self.data = copy.deepcopy(raw_data)
raw_data is a dictionary of dictionaries and takes many megabytes of memory. I only need the data once (I make some modifications to it, hence the need for a deepcopy), and I would like to destroy the deep-copied data once I'm done with the computation.
What would be the best way to clear the data from the memory?
Would this work?
self.data = None
Note I'm using Python 3.4 if it makes a difference.
Some say it's not necessary, that Python will do it for you as long as you don't use the variable for some time. Others say to use the garbage collector library.
According to Havenard and to the official Python documentation, you can force the garbage collector to release unreferenced memory with gc.collect().
For more information on this:
How can I explicitly free memory in Python?
You will have to stop referencing the object both directly and indirectly. It may be that setting self.data to None is not enough. del self.data does about the same thing, but instead of setting self.data to None, it removes the attribute data from self.
CPython (the standard implementation of Python) primarily uses reference counting to determine when an object may be collected; when the reference count drops to zero, the object is collected immediately. You can check the reference count of self.data using sys.getrefcount(self.data), but the result may confuse you, as it reports one extra reference because the function itself holds a reference to its argument.
I want a Roach class to "die" when it reaches a certain amount of "hunger", but I don't know how to delete the instance. I may be making a mistake with my terminology, but what I mean to say is that I have a ton of "roaches" on the window and I want specific ones to disappear entirely.
I would show you the code, but it's quite long. I have the Roach class being appended into a Mastermind classes roach population list.
In general:
Each binding of a variable -> object increases the object's internal reference counter.
There are several usual ways to decrease a reference (remove a variable -> object binding):
exiting the block of code where the variable was declared (used for the first time)
destroying an object, which releases the variable -> object references held by all of its attributes
calling del variable, which also deletes the reference in the current context
After all references to an object are removed (counter == 0), it becomes a good candidate for garbage collection, but it is not guaranteed to be processed (reference here):
CPython currently uses a reference-counting scheme with (optional)
delayed detection of cyclically linked garbage, which collects most
objects as soon as they become unreachable, but is not guaranteed to
collect garbage containing circular references. See the documentation
of the gc module for information on controlling the collection of
cyclic garbage. Other implementations act differently and CPython may
change. Do not depend on immediate finalization of objects when they
become unreachable (ex: always close files).
To find out how many references to an object exist, use sys.getrefcount.
The module for configuring/checking garbage collection is gc.
The GC calls an object's __del__ method when destroying it (additional reference here).
Some immutable objects, like strings, are handled in a special way - e.g. if two variables contain the same string, they may reference the same object, but some do not - check identifying objects, why does the returned value from id(...) change?
The id of an object can be found with the built-in function id.
The module memory_profiler looks interesting - a module for monitoring memory usage of a Python program.
There are a lot of useful resources on the topic; one example: Find all references to an object in python.
You cannot force a Python object to be deleted; it will be deleted when nothing references it (or when it's in a cycle referred to only by the items in the cycle). You will have to tell your "Mastermind" to erase its reference.
del somemastermind.roaches[n]
for i, roach in enumerate(roachpopulation_list):
    if roach.hunger == 100:
        del roachpopulation_list[i]
        break
Remove the instance by deleting it from your population list (containing all the Roach instances).
If your Roaches are Sprites created in Pygame, then a simple call to .kill() would remove the instance.
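Alternatively, a sketch that rebuilds the list instead of deleting inside the loop, which also handles several starving roaches at once (Roach and the hunger threshold of 100 are illustrative, matching the question):

```python
class Roach:
    def __init__(self, hunger):
        self.hunger = hunger

population = [Roach(10), Roach(100), Roach(55)]

# Keeping only the survivors drops the last reference to each dead
# roach, so those instances become eligible for garbage collection.
population = [r for r in population if r.hunger < 100]
print(len(population))  # -> 2
```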
According to the official Python documentation for the weakref module the "primary use for weak references is to implement caches or mappings holding large objects,...". So, I used a WeakValueDictionary to implement a caching mechanism for a long running function. However, as it turned out, values in the cache never stayed there until they would actually be used again, but needed to be recomputed almost every time. Since there were no strong references between accesses to the values stored in the WeakValueDictionary, the GC got rid of them (even though there was absolutely no problem with memory).
Now, how am I then supposed to use the weak reference stuff to implement a cache? If I keep strong references somewhere explicitly to keep the GC from deleting my weak references, there would be no point using a WeakValueDictionary in the first place. There should probably be some option to the GC that tells it: delete everything that has no references at all and everything with weak references only when memory is running out (or some threshold is exceeded). Is there something like that? Or is there a better strategy for this kind of cache?
I'll attempt to answer your inquiry with an example of how to use the weakref module to implement caching. We'll keep our cache's weak references in a weakref.WeakValueDictionary, and the strong references in a collections.deque because it has a maxlen property that controls how many objects it holds on to. Implemented in function closure style:
import weakref, collections

def createLRUCache(factory, maxlen=64):
    weak = weakref.WeakValueDictionary()
    strong = collections.deque(maxlen=maxlen)

    notFound = object()

    def fetch(key):
        value = weak.get(key, notFound)
        if value is notFound:
            weak[key] = value = factory(key)
            strong.append(value)
        return value

    return fetch
The deque object will only keep the last maxlen entries, simply dropping references to the old entries once it reaches capacity. When the old entries are dropped and garbage collected by python, the WeakValueDictionary will remove those keys from the map. Hence, the combination of the two objects helps us keep only maxlen entries in our LRU cache.
class Silly(object):
    def __init__(self, v):
        self.v = v

def fib(i):
    if i > 1:
        return Silly(_fibCache(i-1).v + _fibCache(i-2).v)
    elif i:
        return Silly(1)
    else:
        return Silly(0)

_fibCache = createLRUCache(fib)
It looks like there is no way to overcome this limitation, at least in CPython 2.7 and 3.0.
Reflecting on the createLRUCache() solution:
The solution with createLRUCache(factory, maxlen=64) does not match my expectations. The idea of binding to maxlen is something I would like to avoid. It would force me to specify some non-scalable constant here, or to create some heuristic to decide which constant is better for this or that host's memory limits.
I would prefer that the GC eliminate unreferenced values from the WeakValueDictionary not straight away, but only on the condition used for regular GC:
When the number of allocations minus the number of deallocations exceeds threshold0, collection starts.
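For reference, a small sketch of inspecting and tuning those thresholds with the gc module (the default values shown are CPython's):

```python
import gc

# The collector starts a generation-0 collection when allocations minus
# deallocations exceed threshold0 (700 by default in CPython).
print(gc.get_threshold())      # e.g. (700, 10, 10)

# The thresholds can be tuned, e.g. to collect more eagerly:
gc.set_threshold(100, 10, 10)
print(gc.get_threshold())      # -> (100, 10, 10)

gc.set_threshold(700, 10, 10)  # restore the defaults
```

Note that this tunes when collection runs; it does not change the weakref semantics the question is asking about.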
I am loading a JSON file to parse it and convert part of it to a CSV.
At the end of the method I would therefore like to free the memory used by the loaded JSON.
Here is the method:
import csv
from json import loads

def JSONtoCSV(input, output):
    outputWriter = csv.writer(open(output, 'wb'), delimiter=',')
    jsonfile = open(input).read()
    data = loads(jsonfile)
    for k, v in data["specialKey"].iteritems():
        outputWriter.writerow([v[1], v[5]])
How do I free the memory used by the data variable?
del data
should do it if you only have one reference. Keep in mind this will happen automatically when the current scope ends (the function returns).
Also, you don't need to keep the jsonfile string around, you can just
data = json.load(open(input))
to read the JSON data directly from the file.
If you want data to go away as soon as you're done with it, you can combine all of that:
for k,v in json.load(open(input))["specialKey"].iteritems():
Since there is no reference to the data once the loop has ended, Python will free the memory immediately.
In Python, variables are automatically freed when they go out of scope, so you shouldn't have to worry about it. However, if you really want to, you can use
del data
One thing to note is that the garbage collector probably won't kick in immediately, even if you do use del. That's the downside of garbage collection. You just don't have 100% control of memory management. That is something you will need to accept if you want to use Python. You just have to trust the garbage collector to know what it's doing.
The data variable does not take up any meaningful space—it's just a name. The data object takes up some space, and Python does not allow you to free objects manually. Objects will be garbage collected some time after there are no references to them.
To make sure that you don't keep things alive longer than you want, make sure you don't have a way to access them (don't have a name still bound to them, etc).
An improved implementation might be
def JSONtoCSV(input_filename, output_filename):
    with open(input_filename) as f:
        special_data = json.load(f)[u'specialKey']
    with open(output_filename, 'wb') as f:
        outputWriter = csv.writer(f, delimiter=',')
        for k, v in special_data.iteritems():
            outputWriter.writerow([v[1], v[5]])
This doesn't ever store the string you called jsonfile or the dict you called data, so they're free to be collected as soon as Python wants. The former improvement comes from using json.load instead of json.loads, which takes the file object itself; the latter from looking up 'specialKey' immediately rather than binding a name to all of data.
Consider that this delicate dance probably isn't necessary at all, since as soon as you return these references will cease to be around and you've at best sped things up momentarily.
Python is a garbage-collected language, so you don't have to worry about freeing memory once you've used it; once the jsonfile variable goes out of scope, it will automatically be freed by the interpreter.
If you really want to delete the variable, you can use del jsonfile, which will cause an error if you try to refer to it after deleting it. However, unless you're loading enough data to cause a significant drop in performance, I would leave this to the garbage collector.
Please refer to Python json memory bloat. Garbage collection does not kick in because the thresholds are not met, so even a del call will not free the memory. However, a forced garbage collection using gc.collect() will free up the object.