Something seems to be stuck in memory - python

I have a program that loops over a list and calls a function on each element. The result returned from the function differs depending on whether I loop over several observations or just one. For example, when I put in the 10th observation by itself I get one result, but when I put in 9 and 10 and loop over them I get a different answer for 10. The only explanation I can come up with is that some variable left over from processing 9 is still in memory and changes the result for 10. Here's the code for the loop:
for i, k in enumerate(Compobs):
    print i+1, ' of ', len(Compobs)
    print Compobs[i]
    Compobs[i] = Filing(k[0],k[1])
Compobs is just a list like this:
[['355300', '19990531'],[...],...]
Filing comes from another .py file that I import. It defines a class, Filing, which performs a bunch of functions on each observation and ultimately returns some output. I'm fairly new to Python, so I'm at a bit of a loss here. I could post the Filing.py code, but it's over 1,000 lines long.
Here's the Filing class and its __init__:
class Filing(object):
    cik =''
    datadate=''
    potentialpaths=[]
    potential_files=[]
    filingPath =''
    filingType=''
    reportPeriod=''
    filingText=''
    current_folder=''
    compData=pd.Series()
    potentialtablenumbers=[]
    tables=[]
    statementOfCashFlows=''
    parsedstatementOfCashFlows=[]
    denomination=''
    cashFlowDictionary ={}
    CFdataDictionary=OrderedDict()
    CFsectionindex=pd.Series()
    cfDataSeries=pd.Series()
    cfMapping=pd.DataFrame()
    compCFSeries=pd.Series()
    cftablenumber=''
    CompleteCF=pd.DataFrame()
    def __init__(self,cik,datadate):
        self.cik=cik
        self.datadate=datadate
        self.pydate=date(int(datadate[0:4]),int(datadate[4:6]),int(datadate[6:8]))
        self.findpathstofiling()
        self.selectfiling()
        self.extractFilingType()
        self.extractFilingText()
        self.getCompData()
        self.findPotentialStatementOfCashFlows()
        self.findStatementOfCashFlows()
        self.cleanUpCashFlowTable()
        self.createCashFlowDictionary()
        self.extractCFdataDictionary()
        self.createCFdataSeries()
        self.identifySections()
        self.createMapping()
        self.findOthers()
Shouldn't all the variables in the Filing.py get cleared out of memory each time it is called? Is there something I'm missing?

All of the lists, dicts, and other objects defined at the top level of Filing have only one copy. Even if you explicitly assign them to an instance, that copy is shared (and if you don't explicitly assign them, they're inherited). The point is that if you mutate one of them (for example, append to a list) through one instance, the change shows up in all instances.
If you want each instance to have its own copy, then get rid of the top-level assignments altogether, and instead assign new instances of the objects in __init__.
In other words, don't do this:
class Foo(object):
    x = []
    def __init__(self):
        self.x = Foo.x  # still the one shared list
Instead, do this:
class Foo(object):
    def __init__(self):
        self.x = []
Then each instance will have its own, unshared copy of x.
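A minimal sketch of how this produces the symptom in the question, assuming (since the 1,000 lines of Filing.py aren't shown) that some method appends to one of the class-level lists instead of a fresh one. `MiniFiling` and its `result` method are hypothetical stand-ins, written for Python 3:

```python
class MiniFiling(object):
    tables = []  # class attribute: ONE list shared by every instance

    def __init__(self, cik):
        self.cik = cik            # rebinding creates an instance attribute (harmless)
        self.tables.append(cik)   # mutating the shared list: the row outlives this instance

    def result(self):
        # The "answer" depends on everything earlier instances appended
        return len(self.tables)


alone = MiniFiling('10').result()       # observation 10 by itself

MiniFiling.tables.clear()               # reset the shared state to simulate a fresh run
MiniFiling('9')
after_nine = MiniFiling('10').result()  # observation 9, then 10

print(alone, after_nine)  # 1 2
```

Running observation 9 first changes the answer for 10 only because both instances append to the same `MiniFiling.tables` list.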

You are defining your class data members as class attributes, not object attributes. They are like static data members of a C++ or Java class.
To fix this, don't define them above the __init__ method; define them inside the __init__ method instead. For example, instead of
tables = []
above __init__ you should have:
self.tables = []
in __init__
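One nuance worth spelling out, since Filing mixes both kinds of class attributes: assigning to `self.name` inside a method creates an instance attribute that shadows the class attribute, so string defaults like `cik = ''` are replaced per instance without harm. Calling a mutating method such as `append` on `self.tables` never creates an instance attribute, so it changes the single shared object. A small Python 3 sketch; the class `Demo` is made up for illustration:

```python
class Demo(object):
    label = ''    # immutable class default
    tables = []   # mutable class default (shared by all instances)

    def __init__(self, label):
        self.label = label          # rebinding: creates an attribute on the instance
        self.tables.append(label)   # mutation: changes Demo.tables itself


a = Demo('a')
b = Demo('b')

print(vars(a))                                           # {'label': 'a'}
print(a.tables is Demo.tables, b.tables is Demo.tables)  # True True
print(Demo.tables)                                       # ['a', 'b']
```

Only the rebound name ends up in the instance's own `__dict__`; the list is still found on the class.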

Related

Best Practice in Python: Class Object with helper variables - delete helpers after use

I want to have a Class with only one argument to it. Based on that argument a couple of calculations should take place aiming at setting a specific attribute for the Class. Other attributes won't be needed afterwards and I would like to delete them within the Class. What's the best approach?
Simplified Example:
import math

class Sportsteam:
    def __init__(self, members):
        self.members = members  # members will be a list
        self.num_members = len(self.members)  # helper variable: how many team members are in the sportsteam?
        self.rooms = math.ceil(self.num_members/2)  # how many doubles will be needed in a hotel?
I want to delete the instance variable num_members because it won't be needed afterwards. I want that to be done within the class/object, so I do not need a separate line with del instance.num_members within my script for each instance.
Please note that the variable assignment in the original use case is more complex, with a lot of conditions. Calculating without the helper variable would work in the example above, but would be really annoying in the real use case.
As @monk pointed out, local variables can also be assigned within the __init__ method. For the example above, using a helper variable would therefore look like this:
import math

class Sportsteam:
    def __init__(self, members):
        self.members = members  # members will be a list
        num_members = len(self.members)  # helper variable: how many team members are in the sportsteam?
        self.rooms = math.ceil(num_members/2)  # how many doubles will be needed in a hotel?
In this case instance.num_members does not exist.
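A quick check that the local-variable version behaves as described. This assumes Python 3, where `/` is true division; the member names are made up:

```python
import math


class Sportsteam:
    def __init__(self, members):
        self.members = members                   # kept as an instance attribute
        num_members = len(members)               # local helper: gone once __init__ returns
        self.rooms = math.ceil(num_members / 2)  # doubles needed in a hotel


team = Sportsteam(['Ann', 'Ben', 'Cid'])
print(team.rooms)                    # 2
print(hasattr(team, 'num_members'))  # False
```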
I had been searching for an answer to my question with different keywords for quite a while, but found neither a solution nor an example that showed this possibility.

How to iteratively use a method of an iteratively created object in Python

I'm trying to create objects iteratively by using a class method inside the class the objects belong to. Every time I call that class method, it creates an object and adds it to a dictionary under its own index (both the dictionary and the index are class variables). My problem comes when I want to call the same method of every object, iteratively and with a random argument each time. My code is large, so here is a smaller program that shows exactly what I'm looking for and is easier to understand.
class new_class:
    objects = {}  # this dictionary stores all objects of this class
    i = 0  # used to index the dictionary and define every object separately
    def __init__(self):
        pass
    def method(self, random):  # <-- here go the random elements the method should be called with
        return random  # sample usage of the random value
    @classmethod
    def object_creator(cls):
        cls.i += 1
        cls.objects[cls.i] = cls()  # <-- creates a new object of its own class and adds it to the dictionary under the incremented "i" key
while True:
    new_class.object_creator()
    # Here I want to call the method of every existing object with random arguments
Calling the object this way, with the dictionary and its index, doesn't work because it only calls the most recently created object, since the current index belongs to it.
while True:
    new_class.object_creator()
    new_class.objects[new_class.i].method()
I'm not sure this is even possible, because I would essentially have to "create new code" for each created object. The only pseudo-solution I've found is to write another loop that iterates over the length of the dictionary and calls the method of the object whose index matches the loop's, but that calls one method at a time and not all of them at the same time.
By default, your code is executed sequentially by a single thread, so the calls to the method will be made one after another. But calling the method on all your objects can still be very quick, because computers are fast. And from the point of view of the programming language, calling call_my_method_for_all_my_objects is no different from calling int("14").
If you really really (really) want to have code executed in parallel, you can have a look at multi-threading and multi-processing, but these are not easy topics. Don't bother with them if you don't actually want your program to execute faster or really need to have multiple code execution at the same time.
Using a dict instead of a list is not a real issue.
The problem with
while True:
    new_class.object_creator()
    new_class.objects[new_class.i].method()
is that at each iteration of the loop, it will create a new object (which increments i), then call the i-th object (newly created) method. It means that each object will have its method called only once, and in the creation order which is also i-ascending.
As for a solution, I recommend creating a function or a method that calls the method on each of your objects. I decided to implement it as a static method of the class:
class new_class:
    objects = {}
    i = 0
    def __init__(self):
        pass
    def method(self, random):
        return random
    @classmethod
    def object_creator(cls):
        cls.i += 1
        cls.objects[cls.i] = cls()
    @staticmethod
    def call_each():
        for i, obj in new_class.objects.items():  # iterate over the objects
            print(obj.method(i))  # call each one's method, for example with its index
I used it like this:
# let's create 3 items for demonstration purposes
new_class.object_creator(); new_class.object_creator(); new_class.object_creator()
print(new_class.objects) # {1: <__main__.new_class object at 0x0000022B26285470>,
# 2: <__main__.new_class object at 0x0000022B262855C0>,
# 3: <__main__.new_class object at 0x0000022B262854A8>}
new_class.call_each() # prints 1 2 3
If you want to provide a random value for each call, add import random to your script and change the call_each method to:
@staticmethod
def call_each():
    for obj in new_class.objects.values():
        print(obj.method(random.random()))
so that
new_class.call_each() # prints 0.35280749626847374
# 0.22163283338299222
# 0.7368657784332368
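If the goal is the one described in the question (create one new object per iteration, then call the method on every object that exists so far), a bounded loop makes the behavior easy to observe. This Python 3 sketch keeps the question's class-level registry but runs a fixed number of rounds instead of `while True`, and passes in a seeded `random.Random` so the run is repeatable; the `calls` counter and the `rng` parameter are additions for illustration only:

```python
import random


class new_class:
    objects = {}  # class-level registry: index -> object
    i = 0

    def __init__(self):
        self.calls = 0  # hypothetical per-object counter, for illustration only

    def method(self, value):
        self.calls += 1
        return value

    @classmethod
    def object_creator(cls):
        cls.i += 1
        cls.objects[cls.i] = cls()

    @staticmethod
    def call_each(rng):
        # Call the method of EVERY existing object, each with its own random value
        return [obj.method(rng.random()) for obj in new_class.objects.values()]


rng = random.Random(0)
for _ in range(3):                # stands in for `while True`
    new_class.object_creator()    # one new object per round...
    new_class.call_each(rng)      # ...then every existing object gets called

print([obj.calls for obj in new_class.objects.values()])  # [3, 2, 1]
```

The first object is called in every round, the newest only once, which is the "all of them, every time" pattern rather than "only the last one".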
If this does not answer your question, please please try to be extra clear in what you ask.

Caching attributes with id(self), any better solutions?

I'm trying to cache attributes to get a behavior like this:
ply = Player(id0=1)
ply.name = 'Bob'
# Later on, even in a different file
ply = Player(id0=1)
print(ply.name) # outputs: Bob
So basically I want to retain the value across different objects as long as their id0 is equal.
Here's what I attempted:
from collections import defaultdict

class CachedAttr(object):
    _cached_attrs = {}
    def __init__(self, defaultdict_factory=int):
        type(self)._cached_attrs[id(self)] = defaultdict(defaultdict_factory)
    def __get__(self, instance, owner):
        if instance:
            return type(self)._cached_attrs[id(self)][instance.id0]
    def __set__(self, instance, value):
        type(self)._cached_attrs[id(self)][instance.id0] = value
And you'd use the class like so:
class Player(game_engine.Player):
    name = CachedAttr(str)
    health = CachedAttr(int)
It seems to work. However, a friend of mine (somewhat) commented about this:
You are storing objects by their id (memory address), which is likely to leak, or to give a new object the values of a garbage-collected one that reused the same address. This is dangerous because the id itself is not a reference, only an integer independent of the object (which means you will most likely store freed pointers and keep growing until you hit a MemoryError).
And I've been experiencing some random crashes. Could this be the reason for the crashes?
If so, is there a better way to cache the values other than their id?
Edit: Just to make sure: my Player class inherits from game_engine.Player, which was not created by me (I'm only creating a mod for another game), and game_engine.Player is used by always getting a new instance of the player from its id0. So this isn't a behavior defined by me.
Instead of __init__, look at __new__. There, look up the unique object in a dict and return it instead of creating a new one. That way, you avoid the unnecessary allocations for the object wherever you need it. You also avoid the problem of different objects having different IDs.
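A minimal sketch of that `__new__` idea in Python 3. It assumes a standalone `Player` class: the real one subclasses `game_engine.Player`, whose constructor is not shown and may not cooperate with this. The `_instances` dict is a hypothetical name. State is set up inside `__new__` only, because an `__init__` would run again on every `Player(id0=...)` call and reset it:

```python
class Player(object):
    _instances = {}  # id0 -> the single Player object for that id0

    def __new__(cls, id0):
        obj = cls._instances.get(id0)
        if obj is None:
            # First time this id0 is seen: build the object and remember it
            obj = super(Player, cls).__new__(cls)
            obj.id0 = id0
            obj.name = ''
            cls._instances[id0] = obj
        return obj  # later calls with the same id0 get the same object back


ply = Player(id0=1)
ply.name = 'Bob'

ply = Player(id0=1)
print(ply.name)                  # Bob
print(repr(Player(id0=2).name))  # ''
```

Note that the dict holds strong references, so cached players are never garbage collected; keying on `id0` (a stable value) rather than `id(...)` is what avoids the address-reuse problem described above.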

Python - Appending to a list of an object declared inside a for loop

I'm having a very specific problem that I could not find the answer to anywhere on the web. I'm new to python code (C++ is my first language), so I'm assuming this is just a semantic problem. My question is regarding objects that are declared inside the scope of a for loop.
The objective of this code is to create a new temporary object inside the for loop, add some items to its list, then put the object into a list outside of the for loop. At each iteration of the for loop, I wish to create a NEW, SEPARATE object to populate with items. However, each time the for loop executes, the object's list is already populated with the items from the previous iteration.
I have a bigger program that is having the same problem, but instead of including a massive program, I wrote up a small example which has the same semantic problem:
#Test
class Example:
    items = []
objList = []
for x in xrange(5):
    Object = Example()
    Object.items.append("Foo")
    Object.items.append("Bar")
    print Object.items
    objList.append(Object)
print "Final lists: "
for x in objList:
    print x.items
By the end of the program, every item in objList (even the ones from the first iterations) contains
["Foo","Bar","Foo","Bar","Foo","Bar","Foo","Bar","Foo","Bar"]
This leads me to believe the Example (called Object in this case) is not recreated in each iteration, but instead maintained throughout each iteration, and accessed every time the for loop continues.
My simple question: in Python, how do I change this?
Change your class definition to:
class Example:
    def __init__(self):
        self.items = []
The problem is that items is not being bound to the instance of each object, it is a part of the class definition. Because items is a class variable, it is shared between the instances of Example.
Your class Example is using a class variable, Example.items, to store the strings. Class variables are shared across all instances of the class, which is why you're getting all the items together and thinking it's the same object when, in fact, it is not.
You should create the items variable and assign it to self in your __init__ method:
class Example(object):
    def __init__(self):
        self.items = []
This will ensure that each instance of Example has its own items list to work with.
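For reference, the question's loop with that fix applied, written for Python 3 (`range` and `print()` instead of `xrange` and the `print` statement). The loop variable is renamed from `Object` to `obj` only so it doesn't look like a class name:

```python
class Example(object):
    def __init__(self):
        self.items = []  # a fresh list for every instance


objList = []
for x in range(5):
    obj = Example()
    obj.items.append("Foo")
    obj.items.append("Bar")
    objList.append(obj)

print("Final lists: ")
for x in objList:
    print(x.items)  # ['Foo', 'Bar'] on each of the 5 lines
```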
Also, as you're using Python 2.x, you really should subclass object as I have there.

Why do new instances of a class share members with other instances?

class Ball:
    a = []
    def __init__(self):
        pass
    def add(self,thing):
        self.a.append(thing)
    def size(self):
        print len(self.a)
for i in range(3):
    foo = Ball()
    foo.add(1)
    foo.add(2)
    foo.size()
I would expect a return of :
2
2
2
But I get :
2
4
6
Why is this? I've found that by doing a=[] in the init, I can route around this behavior, but I'm less than clear why.
doh
I just figured out why.
In the above case, the a is a class attribute, not a data attribute - those are shared by all Balls(). Commenting out the a=[] and placing it into the init block means that it's a data attribute instead. (And, I couldn't access it then with foo.a, which I shouldn't do anyhow.) It seems like the class attributes act like static attributes of the class, they're shared by all instances.
Whoa.
One question though : CodeCompletion sucks like this. In the foo class, I can't do self.(variable), because it's not being defined automatically - it's being defined by a function. Can I define a class variable and replace it with a data variable?
What you probably want to do is:
class Ball:
    def __init__(self):
        self.a = []
If you use just a = [], it creates a local variable in the __init__ function, which disappears when the function returns. Assigning to self.a makes it an instance variable which is what you're after.
For a semi-related gotcha, see how you can change the value of default parameters for future callers.
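The gotcha meant here is the mutable default argument: a default value is evaluated once, when the `def` statement runs, so a default list is shared between calls in the same way a class-level list is shared between instances. A minimal Python 3 sketch; the function names are made up for illustration:

```python
def add_item(item, bucket=[]):   # the [] is created once, at definition time
    bucket.append(item)
    return bucket


def add_item_fixed(item, bucket=None):
    if bucket is None:           # create a new list on every call instead
        bucket = []
    bucket.append(item)
    return bucket


print(add_item(1), add_item(2))              # [1, 2] [1, 2]  (same list both times)
print(add_item_fixed(1), add_item_fixed(2))  # [1] [2]
```

The `None` sentinel plays the same role as creating `self.a = []` inside `__init__`: the new list is built at call time, not once up front.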
"Can I define a class variable and replace it with a data variable?"
No. They're separate things. A class variable exists precisely once -- in the class.
You could -- to finesse code completion -- start with some class variables and then delete those lines of code after you've written your class. But every time you forget to do that nothing good will happen.
Better is to try a different IDE. Komodo Edit's code completions seem to be sensible.
If you have so many variables with such long names that code completion is actually helpful, perhaps you should make your classes smaller or use shorter names. Seriously.
I find that when you get to a place where code completion is more helpful than annoying, you've exceeded the "keep it all in my brain" complexity threshold. If the class won't fit in my brain, it's too complex.
