Optimize checking attribute of each class instance inside a list


Let's say I have a simple class, with an attribute, x:
import random
class A:
    def __init__(self):
        self.x = random.randint(-5, 5)  # not the most efficient, but it serves the purpose well
I'll also have a list, with hundreds of instances of this class:
Az = []
for i in range(150):
    Az.append(A())
Now, let's say I want to loop through all the As in Az and run a function on the instances whose x attribute is less than one. This is one way, but alas, it is very inefficient:
for cls in Az:
    if cls.x < 1:
        func1(cls)  # some function that accepts an instance as a parameter and does something to it
So, to wrap it up, my question: How to optimize the speed of the checking?

Optimizing only the third step is tricky. Why not start at the second step and save the list indices of the instances whose x attribute is < 1?
Az = []
ids = []
for idx in range(150):
    cls = A()
    if cls.x < 1:
        ids.append(idx)
    Az.append(cls)
And then modify the third step:
for idx in ids:
    func1(Az[idx])
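As a minimal sketch of the same idea, the matching instances themselves can be collected while the list is built, so the later loop needs neither the attribute check nor the index lookup. The name `matching` and the body of `func1` are assumptions made for illustration; `func1` only stands in for whatever function the question applies.

```python
import random


class A:
    def __init__(self):
        self.x = random.randint(-5, 5)


def func1(obj):
    # Hypothetical stand-in for the question's function: mark the instance.
    obj.processed = True


Az = []
matching = []  # instances whose x is < 1, recorded at creation time
for _ in range(150):
    obj = A()
    Az.append(obj)
    if obj.x < 1:
        matching.append(obj)

# The later step touches only the pre-selected instances.
for obj in matching:
    func1(obj)
```

This bookkeeping only stays correct while x is not changed after creation; if x can change, `matching` has to be updated wherever x is assigned.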

Related

Python mutable class variable vs immutable class variable

Running the sample code below:
class S:
    i = 0
    a = []
    def __init__(self):
        self.i += 1
        self.a.append(1)
s1 = S()
print((s1.i, s1.a))
s2 = S()
print((s2.i, s2.a))
The output will be:
(1, [1])
(1, [1, 1])
My question is: why does the int S.i reset to 0 for s2, while the list S.a does not reset to empty? I think it has something to do with the immutable int vs. the mutable list, but could someone explain in more detail what happens to the two class variables during the two __init__ calls? Thanks!

Inside __init__, self.i += 1 assigns to a name on the instance, so it creates an instance attribute, while self.a.append(1) mutates the list that the class attribute refers to. To change the class attributes explicitly, try this:
S.i += 1
S.a.append(1)
In your constructor, the assignment to self.i creates an instance attribute that belongs to that one instance.
The a and the i declared outside the constructor are class attributes and are shared by all instances.
The reason s1.a and S.a both show the update, regardless of which name is used, is that lists are mutable and both names refer to the same list object.

self.i += 1
is equivalent to
self.i = self.i + 1
When the instance variable does not exist, the value is looked up on the class, so in this scenario, it is equivalent to
self.i = S.i + 1
After you define self.i, then any further value lookup is on the instance variable, not on the class variable. So after this line, you have S.i = 0 and s1.i = 1. Since S.i is not modified, s2.i also becomes 1.
On the other hand,
self.a.append(1)
does not create a new instance variable, but appends an element to the existing class variable.
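The lookup rules described above can be checked by inspecting the instance and class namespaces. This is a small sketch that reuses the question's class S; only the inspection lines at the end are added.

```python
class S:
    i = 0
    a = []

    def __init__(self):
        self.i += 1        # reads S.i, then binds a new instance attribute i
        self.a.append(1)   # mutates the list stored on the class


s1 = S()
s2 = S()

print(vars(s1))     # {'i': 1}  -> only i lives on the instance
print(S.i, S.a)     # 0 [1, 1]  -> S.i untouched, S.a appended to twice
print(s1.a is S.a)  # True      -> the instance has no list of its own
```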

The way this particular code is written abstracts some of what Python is doing behind the scenes here, so let's go through it.
When you define the class, and you define variables outside of any function like you do at the beginning in your code, it creates class attributes. These are shared among all instances of your class (in your case, s1 and s2 are both sharing the same reference to your i object and your a object).
When you create an instance, the __init__ function is called, which, in your code, first runs self.i += 1, and I think this is where most of the confusion is coming from. In Python, integers are immutable, so they cannot be changed in place. The += therefore computes a new integer object and assigns it to self.i. Because that assignment targets self, the name is bound as an instance attribute, and instance attributes are not shared among different instances of your class. The class attribute S.i is left untouched.
However, lists are mutable. So when you append 1 to your list, you are not creating a new instance variable, so it keeps the same reference to the class attribute, and therefore when you initialize your class the second time, it adds it onto the class attribute that already has been populated once when you created the first instance.

class S:
    i = 0
    a = []
    def __init__(self):
        self.i += 1
        self.a.append(1)
The list as defined by a = [] is a class attribute. It's instantiated when the class is defined, and remains the same list object. Any instances of this class are going to reference the one list.
If you want to have an empty list for every new instance, then move the list definition to within the __init__ method:
class S:
    i = 0
    def __init__(self):
        self.a = []
        self.i += 1
        self.a.append(1)
Result:
>>> s1 = S()
>>> print((s1.i, s1.a))
(1, [1])
>>>
>>> s2 = S()
>>> print((s2.i, s2.a))
(1, [1])

The zen of python applied to methods in classes

The Zen of Python tells us:
There should be one-- and preferably only one --obvious way to do it.
This is difficult to put into practice in the following situation.
A class receives a list of documents.
The output is a dictionary per document with a variety of key/value pairs.
Every pair depends on a previously calculated one, or even on key/value pairs of other dictionaries in the list.
This is a very simplified example of such a class.
What is the “obvious” way to go? Every method adds a key/value pair to each of the dictionaries.
class T():
    def __init__(self, mylist):
        # build list of dicts
        self.J = [{str(i): mylist[i]} for i in range(len(mylist))]
        # enhancement 1: upper
        self.method1()
        # enhancement 2: lower
        self.J = self.static2(self.J)
    def method1(self):
        newdict = []
        for i, mydict in enumerate(self.J):
            mydict['up'] = mydict[str(i)].upper()
            newdict.append(mydict)
        self.J = newdict
    @staticmethod
    def static2(alist):
        J = []
        for i, mydict in enumerate(alist):
            mydict['down'] = mydict[str(i)].lower()
            J.append(mydict)
        return J
    @property
    def propmethod(self):
        J = []
        for i, mydict in enumerate(self.J):
            mydict['prop'] = mydict[str(i)].title()
            J.append(mydict)
        return J
    # more methods extracting info out of every doc in the list
    # ...
self.method1() is simply run, and a new key/value pair is added to every dict.
The static method static2 can also be used,
and also the property.
Out of the three ways, I discard @property because I am not adding another attribute.
Of the other two, which one would you choose?
Remember that the class will be composed of tens of these methods, which do not add attributes. They only update the dictionaries in a list (by adding key/value pairs).
I cannot see the difference between method1 and static2.
Thanks.

Python set more properties with 1 call

I have a very expensive method that returns 2 values, and it is called by class A. Since it is expensive, I made the 2 values lazy evaluated, using properties. Since I don't want to call the very_expensive_function 2 times, the first time the user wants to access one of the 2 values, I save both.
So far I wrote this:
class A:
    def __init__(self):
        self._attr1 = None
        self._attr2 = None
    @property
    def attr1(self):
        self.calculate_metrics()
        return self._attr1
    @property
    def attr2(self):
        self.calculate_metrics()
        return self._attr2
    def calculate_metrics(self):
        if self._attr1 is None:
            attr1, attr2 = very_expensive_call()
            self._attr1 = attr1
            self._attr2 = attr2
As you can see, the first time the user accesses attr1 or attr2, I save both. Is this correct, or is there another way to do it? It seems very strange to have that calculate_metrics() call copy-pasted into every property.

Memoization is, simply put, remembering whether you have already called a function with particular arguments. If you have, it simply returns the already calculated return value rather than calculating it again.
import time
def long_calculation(x, y, memo={}):
    try:
        result = memo[x, y]  # already calculated!
    except KeyError:
        # make long_calculation take a long time!
        time.sleep(2)
        result = x * y
        memo[x, y] = result
    return result
The dictionary memo is able to remember calls to the function because a default argument value is evaluated only once, when the function is defined: every call to long_calculation shares the same memo dictionary.
To test this try:
# Note that (2,2), (4,5), (7,8) and (10,10) are repeated here:
test_values = ((2,2),(4,5),(2,2),(7,8),(2,3),(7,8),(10,11),(4,5),(10,10),(10,10))
for values in test_values:
    start = time.time()
    res = long_calculation(*values)
    end = time.time()
    elapsed = end - start
    print(values, ' calculated in ', elapsed, "seconds")
It should be fairly easy to insert this kind of code into your class. If you always need the attributes calculated, then you can put the call in __init__.

Removing list item if not in another list - python

Here is my situation.
I have a list of Person objects.
import uuid
class Person():
    def __init__(self, name="", age=None):  # the default for age is missing in the original post; None is assumed
        self.name = name
        self.uid = str( uuid.uuid4( ) )
        self.age = age
My UI contains a treeview displaying these items. In some cases users can have more than one instance of the same person if they want. I highlight those in bold to let the user know it's the same person.
THE PROBLEM
When a user deletes a tree node, I then need to know whether I should remove the actual object from the list. However, if another instance of that object is being used, then I shouldn't delete it.
My thoughts for a solution:
Before the delete operation takes place, which removes just the treenode items, I would collect all persons being used in the UI.
Next I would proceed with deleting the treeview items.
Next, take another collection of the objects being used in the UI.
Lastly, compare the two lists and delete the persons not appearing in the second list.
If I go with this solution, would it be best to do a test like
for p in reversed(original_list):
    if p not in new_list:
        original_list.remove(p)
Or should I collect the uid numbers instead to do the comparisons, rather than the entire object?
The lists could be rather large.
Here is the code with my first attempt at handling the remove operation. It saves out a JSON file when you close the app.
https://gist.github.com/JokerMartini/4a78b3c5db1dff8b7ed8
This is my function doing the deleting.
def delete_treewidet_items(self, ctrl):
    global NODES
    root = self.treeWidget.invisibleRootItem()
    # delete treewidget items from gui
    for item in self.treeWidget.selectedItems():
        (item.parent() or root).removeChild(item)
    # collect all uids used in GUI
    uids_used = self.get_used_uids( root=self.treeWidget.invisibleRootItem() )
    for n in reversed(NODES):
        if n.uid not in uids_used:
            NODES.remove(n)

You have not really posted enough code, but from what I can gather:
import collections
import uuid
class Person():
    def __init__(self, name="", age=69):
        self.name = name
        self.uid = str( uuid.uuid4( ) )
        self.age = age
    def __eq__(self, other):
        return isinstance(other, Person) and self.uid == other.uid
    def __ne__(self, other): return not self == other # you need this on Python 2; writing self != other here would recurse forever
    def __hash__(self):
        return hash(self.uid)
# UI --------------------------------------------------------------------------
persons_count = collections.defaultdict(int) # belongs to your UI class
your_list_of_persons = [] # should be a set
def add_to_ui(person):
    persons_count[person] += 1
    # add it to the UI
def remove_from_ui(person):
    persons_count[person] -= 1
    if not persons_count[person]: your_list_of_persons.remove(person)
    # remove from UI
So basically:
before the delete operation takes place, which removes just the treenode items, I would collect all persons being used in the ui.
No - you have this info always available as a module variable in your UI - the persons_count above. This way you don't have to copy lists around.
What remains is the code that creates the persons - that is where your list (which contains distinct persons, so it should be a set) should be updated. If this is done in add_to_ui (which makes sense), you should modify it as:
def add_to_ui(name, age):
    p = Person(name, age)
    set_of_persons.add(p) # if already there won't be re-added and it's O(1)
    persons_count[p] += 1
    # add it to the UI
To take this a step further - you don't really need your original list - that is just persons_count.keys(), you just have to modify:
def add_to_ui(name, age):
    p = Person(name, age)
    persons_count[p] += 1
    # add it to the UI
def remove_from_ui(person):
    persons_count[person] -= 1
    if not persons_count[person]: del persons_count[person]
    # remove from UI
So you get the picture.
EDIT: here is delete from my latest iteration:
def delete_tree_nodes_clicked(self):
    root = self.treeWidget.invisibleRootItem()
    # delete treewidget items from gui
    for item in self.treeWidget.selectedItems():
        (item.parent() or root).removeChild(item)
        self.highlighted.discard(item)
        persons_count[item.person] -= 1
        if not persons_count[item.person]: del persons_count[item.person]
I have posted my solution (a rewrite of the code linked to the first question) in: https://github.com/Utumno/so_34104763/commits/master. It's a nice exercise in refactoring - have a look at the commit messages. In particular I introduce the dict here: https://github.com/Utumno/so_34104763/commit/074b7e659282a9896ea11bbef770464d07e865b7
Could use more work but it's a step towards the right direction I think - should be faster too in most operations and conserve memory
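The counting idea from this answer can be tried out without any GUI. The following is only a sketch under assumptions: the helper names `add_node` and `remove_node`, the `node_counts` counter and the `persons` dict are invented here, and the counts are keyed by uid so that Person needs no custom __eq__ or __hash__.

```python
import collections
import uuid


class Person:
    def __init__(self, name="", age=None):
        self.name = name
        self.uid = str(uuid.uuid4())
        self.age = age


# uid -> number of tree nodes currently showing that person
node_counts = collections.Counter()
# uid -> Person; plays the role of the separate list of persons
persons = {}


def add_node(person):
    # Hypothetical hook: call this whenever a tree node is created for person.
    persons[person.uid] = person
    node_counts[person.uid] += 1


def remove_node(person):
    # Hypothetical hook: call this whenever a tree node for person is deleted.
    node_counts[person.uid] -= 1
    if node_counts[person.uid] <= 0:
        # The last node for this person is gone: drop the object as well.
        del node_counts[person.uid]
        persons.pop(person.uid, None)


john = Person("John")
add_node(john)
add_node(john)       # the same person shown twice in the tree
remove_node(john)
still_there = john.uid in persons   # one node left, so the person stays
remove_node(john)
gone = john.uid not in persons      # no nodes left, so the person is removed
```

Each deletion costs a constant amount of work, so no before/after snapshots of the whole tree are needed.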

Not worrying too much about runtime or size of lists, you could use set-operations:
for p in set(original_list) - set(new_list):
    original_list.remove(p)
Or filter the list:
new_original_list = [p for p in original_list if p in new_list]
But then again, why look at the whole list - when one item (or even a non-leaf node in a tree) is deleted, you know which item was deleted, so you could restrict your search to just that one.
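For the uid-based variant that the question asks about, a rough sketch could look like the following. `Node` is a hypothetical stand-in for the Person objects, and `uids_used` is assumed to be whatever the GUI reports as still in use. Holding the uids in a set keeps each membership test at constant average cost, which matters when the lists are large.

```python
class Node:
    # Hypothetical stand-in for the question's Person objects.
    def __init__(self, uid):
        self.uid = uid


NODES = [Node("a"), Node("b"), Node("c"), Node("d")]
uids_used = {"a", "c"}   # a set gives O(1) average membership tests

# Rebuild the list in one pass; slice assignment keeps the same list object,
# so other references to NODES see the filtered contents.
NODES[:] = [n for n in NODES if n.uid in uids_used]

print([n.uid for n in NODES])  # ['a', 'c']
```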

You can compare objects using:
object identity
object equality
To compare object identity, you should use the built-in function id() or the is operator. From the docs:
id function
Return the “identity” of an object. This is an integer (or long
integer) which is guaranteed to be unique and constant for this object
during its lifetime. Two objects with non-overlapping lifetimes may
have the same id() value.
is operator
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y yields the
inverse truth value.
Example:
>>> p1 = Person('John')
>>> p2 = Person('Billy')
>>> id(p1) == id(p2)
False
>>> p1 is p2
False
To compare object equality, you use the == operator. The == operator uses the __eq__ method to test for equality. If a class does not define such a method, it falls back to comparing the identity of the objects.
So for:
Or should I collect the uid numbers instead to do the comparisons
rather than the entire object?
you would be doing the same thing, since you have not defined __eq__ in your class.
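The fallback from equality to identity can be seen in a short experiment. The class names `PlainPerson` and `UidPerson` are invented for this sketch, and the uid is copied by hand only to simulate two objects that represent the same record.

```python
import uuid


class PlainPerson:
    # No __eq__ defined: == falls back to comparing identity.
    def __init__(self, name):
        self.name = name
        self.uid = str(uuid.uuid4())


class UidPerson(PlainPerson):
    # == compares uids instead of identity.
    def __eq__(self, other):
        return isinstance(other, UidPerson) and self.uid == other.uid

    def __hash__(self):
        return hash(self.uid)


p1 = PlainPerson("John")
p2 = PlainPerson("John")
print(p1 == p2, p1 == p1)   # False True

u1 = UidPerson("John")
u2 = UidPerson("John")
u2.uid = u1.uid             # pretend both objects represent the same record
print(u1 == u2, u1 is u2)   # True False
```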
Modifying a list while you are iterating over it is a bad idea. Guess what will be printed:
>>> a = [1, 2, 3]
>>> b = [1, 2]
>>> for item in a:
...     if item in b:
...         a.remove(item)
...
>>> a
[2, 3]
If you want to do this safely, iterate over the list from the back, like this:
>>> a = [1, 2, 3]
>>> b = [1, 2]
>>> for i in range(len(a) - 1, -1, -1):
...     if a[i] in b:
...         a.pop(i)
...
2
1
>>> a
[3]

Python, how to copy an object in an efficient way that permits modifying it too?

In my Python code I have the following issue: I have to copy the same object many times and then pass each copy to a function that modifies it. I tried copy.deepcopy, but it's really computationally expensive. Then I tried itertools.repeat(), but that was a bad idea because I have to modify the object afterwards. So I wrote a simple method that copies an object by simply returning a new object with the same attributes:
def myCopy(myObj):
    return MyClass(myObj.x, myObj.y)
The problem is that this is really inefficient too: I have to do it about 6000 times and it takes more than 10 seconds! So, is there a better way to do that?
The object to copy and modify is table, which is created like this:
def initialState(self):
    table = [Events() for _ in range(self.numSlots)]
    for ei in range(1, self.numEvents):
        enr = self.exams[ei]
        k = random.randint(0, self.numSlots - 1)
        table[k].Insert(ei, enr)
    x = EtState(table)
    return x
class Event:
    def __init__(self, i, enrollment, contribution = None):
        self.ei = i
        self.enrollment = enrollment
        self.contribution = contribution
class Events:
    def __init__(self):
        self.count = 0
        self.EventList = []
    def getEvent(self, i):
        return self.EventList[i].ei
    def getEnrollment(self, i):
        return self.EventList[i].enrollment
    def Insert(self, ei, enroll = 1, contribution = None):
        self.EventList.append(Event(ei, enroll, contribution))
        self.count += 1
    def eventIn(self, ei):
        for x in range(self.count):
            if self.EventList[x].ei == ei:
                self.EventList[x].enrollment += 1
                return True
        return False

A more Pythonic way would be to write functions that do not modify the original object but return a modified form of it. From the code you posted, though, it is not clear what you are actually trying to do; you should make a simpler (generic) example of it.
Since "object" in Python can mean anything (a class, an instance, a dict, a list, a tuple, 'a', etc.),
"copy an object" is not very clear.
If I understood correctly, you mean copying an instance of a class.
So write a function that takes one instance of that class, creates another instance inside that function, and copies all the attributes you need.
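Applied to the classes from the question, that suggestion might look like the sketch below. The `clone` methods and the `clone_table` function are hypothetical additions, and the event attributes are assumed to be plain values that can be shared safely. Whether this beats copy.deepcopy for the asker's data is not measured here.

```python
class Event:
    def __init__(self, i, enrollment, contribution=None):
        self.ei = i
        self.enrollment = enrollment
        self.contribution = contribution

    def clone(self):
        # Hypothetical helper: the attributes are assumed to be plain values.
        return Event(self.ei, self.enrollment, self.contribution)


class Events:
    def __init__(self):
        self.count = 0
        self.EventList = []

    def Insert(self, ei, enroll=1, contribution=None):
        self.EventList.append(Event(ei, enroll, contribution))
        self.count += 1

    def clone(self):
        # Hypothetical helper: build a new container holding copies of its events.
        new = Events()
        new.EventList = [e.clone() for e in self.EventList]
        new.count = self.count
        return new


def clone_table(table):
    # One new Events object per slot; nothing is shared with the original.
    return [slot.clone() for slot in table]


table = [Events() for _ in range(3)]
table[0].Insert(1, 10)

copy_of_table = clone_table(table)
copy_of_table[0].EventList[0].enrollment += 1

print(table[0].EventList[0].enrollment,
      copy_of_table[0].EventList[0].enrollment)  # 10 11
```

Copying only the attributes that are known to matter avoids the generic traversal and memo bookkeeping that copy.deepcopy performs for arbitrary object graphs.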
