Efficient looping and comparing properties of two similar objects

I have a function find() that needs to loop through a lot of objects to identify a similar object by comparing a bunch of properties.
class Target:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
class Source:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
def find(target: Target, source_set: set):
    for s in source_set:
        if s.a == target.a:
            if s.b == target.b:
                if s.c == target.c:
                    print("Found!")
source_set = {
    Source(a=1, b=2, c=3),
    Source(a=4, b=2, c=4)
}
target = Target(a=4, b=2, c=4)
find(target, source_set)
The current function is very slow because my source_set can contain millions of objects.
How the source_set and its Source objects are created can be adjusted (e.g. their type). The source_set itself is not modified after initialisation.
Each Source object is created from a dict with the same keys. The raw input data for one Source looks like this:
{'a': '1', 'b': '2', 'c': '3'}
The source_set is searched with many targets.
Is there a nice way to make this more efficient? I'm hoping not to have to change the data structure.

Without any external libraries, you can define a __hash__ method on each class:
class Target:
    ...
    def __hash__(self):
        return hash(frozenset(self.__dict__.items()))
class Source:
    ...
    def __hash__(self):
        return hash(frozenset(self.__dict__.items()))
Now try:
count = len({hash(target),}.intersection(map(hash, source_set)))
print(count)
# Output
1
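One caveat with the snippet above: map(hash, source_set) rehashes every Source on each call, so every lookup is still linear, and two different objects can in principle share a hash value. A minimal sketch, assuming all attribute values are hashable, that builds the lookup set once and stores the frozenset keys themselves rather than only their hashes, so a hit is confirmed by equality and not by hash alone. The helper name attr_key is hypothetical.

```python
class Target:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

class Source:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

def attr_key(obj):
    # Hypothetical helper: an order-independent, hashable key built from the instance attributes
    return frozenset(vars(obj).items())

source_set = {Source(a=1, b=2, c=3), Source(a=4, b=2, c=4)}

# Built once. The set stores the keys themselves, and set lookup checks
# equality after matching the hash, so a hash collision cannot cause a false hit
source_keys = {attr_key(s) for s in source_set}

targets = [Target(a=4, b=2, c=4), Target(a=9, b=9, c=9)]
# Each lookup is an average O(1) set membership test
matches = [attr_key(t) in source_keys for t in targets]
print(matches)  # [True, False]
```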

Using Pandas:
# Python env: pip install pandas
# Miniconda env: conda install pandas
import pandas as pd
df = pd.DataFrame([s.__dict__ for s in source_set])
sr = pd.Series(target.__dict__)
print(df)
print(sr)
# Output of source_set
a b c
0 4 2 4
1 1 2 3
# Output of target
a 4
b 2
c 4
dtype: int64
Count the rows that match the target:
>>> df.eq(sr).all(axis=1).sum()
1

Since the source_set is created only once but searched with many targets (as stated in your question), it pays to invest time in building a suitable data structure for the source_set (done once) if that makes the later comparisons (done many times) faster.
Python's set provides exactly this: in CPython it is implemented as a hash table. For the in operator to use it, both the elements stored in the set and the object you look up must be hashable and comparable, i.e. both provide a __hash__ method and at least one of them provides an __eq__ method.
class Target:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def __hash__(self):
        return hash((self.a, self.b, self.c))
class Source:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def __eq__(self, other):
        return self.a == other.a and self.b == other.b and self.c == other.c
    def __hash__(self):
        return hash((self.a, self.b, self.c))
Building the set of Source elements is an up-front time investment, because __hash__ is called for every element that is added.
source_set = {
    Source(a=1, b=2, c=3),
    Source(a=4, b=2, c=4),
}
The reward is that checking whether a target is in the source_set now takes constant time on average, whereas your current approach compares the target to each Source in turn, which takes linear time.
target = Target(a=4, b=2, c=4)
target in source_set
# returns True
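Since the question says each Source is built from a dict like {'a': '1', 'b': '2', 'c': '3'}, a variation on the same idea is to key a plain dict by the attribute tuple, which returns the matching Source instead of just True or False. A sketch under those assumptions; key, build_index and find (redefined here) are hypothetical names. Note that the raw values are strings, so targets must use the same types, or both sides must be converted consistently.

```python
class Source:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

class Target:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

def key(obj):
    # The attributes that define a match, as a hashable tuple
    return (obj.a, obj.b, obj.c)

def build_index(raw_rows):
    # Hypothetical helper: build every Source once and index it by its key.
    # A later row with the same key overwrites an earlier one.
    return {key(s): s for s in (Source(**row) for row in raw_rows)}

def find(target, index):
    # Average O(1) dict lookup; returns the matching Source or None
    return index.get(key(target))

raw_rows = [{'a': '1', 'b': '2', 'c': '3'}, {'a': '4', 'b': '2', 'c': '4'}]
index = build_index(raw_rows)

hit = find(Target(a='4', b='2', c='4'), index)
miss = find(Target(a=4, b=2, c=4), index)  # int 4 does not equal the string '4'
print(hit.a, miss)  # 4 None
```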

Related

Same name for instance method and static method in python

I am writing a small library and want to offer users two ways to use the same functionality: an instance method and a static method. Here is a simplified example:
class ClassTimesAdd(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def TimesAdd(self, c):
        return self.a * self.b + c
    @staticmethod
    def TimesAdd(a, b, c):
        return a * b + c
print(ClassTimesAdd.TimesAdd(1, 3, 7))
ins = ClassTimesAdd(2, 5)
print(ins.TimesAdd(7))
However, the earlier definition is overwritten, so only the last one takes effect. Is there a simple way to make both approaches work?
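No answer to this question appears here. One way to get both behaviours under one name, offered only as an illustrative sketch, is a small custom descriptor. The names hybridmethod and instancemethod are hypothetical, not part of the standard library; the registration pattern mirrors how property.setter reuses a single name.

```python
class hybridmethod:
    """Hypothetical descriptor: one function for class access, another for instance access."""

    def __init__(self, class_func):
        self.class_func = class_func
        self.instance_func = None

    def instancemethod(self, func):
        # Register the instance-level implementation and keep the same name
        self.instance_func = func
        return self

    def __get__(self, obj, objtype=None):
        if obj is None or self.instance_func is None:
            # Accessed on the class: return the plain function, like a staticmethod
            return self.class_func
        # Accessed on an instance: bind the instance implementation to obj
        return self.instance_func.__get__(obj, objtype)


class ClassTimesAdd(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    @hybridmethod
    def TimesAdd(a, b, c):
        return a * b + c

    @TimesAdd.instancemethod
    def TimesAdd(self, c):
        return self.a * self.b + c


print(ClassTimesAdd.TimesAdd(1, 3, 7))  # 10
ins = ClassTimesAdd(2, 5)
print(ins.TimesAdd(7))  # 17
```

The trade-off is that one name with two call signatures can surprise readers and IDEs; two differently named methods may be simpler to maintain.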

List Comprehension Function Pointer Python

I have a method (dosomething) that defines an attribute (self.b). Dummy code below:
class foo:
    def __init__(self):
        self.a = 1
    def dosomething(self, i):
        self.b = 2 * self.a + i
        return self.b ** 2
testobj = foo()
Attribute a can change - so dosomething is called to determine b given the current value of a.
I want to write a list comprehension like the one below. However, dosomething has to be called for b to change; the code below would just repeat the current value of self.b 20 times.
[testobj.b for i in range(20)] # pass i to dosomething then store self.b
The quick fix would be to return self.b, but the return value is already used for another, much more complicated value. If I could return self.b, the following statement would work:
[testobj.dosomething(i) for i in range(20)]
Attribute b is just an intermediate value that I want to access. Is there a one-liner list comprehension for this situation? I considered defining a function inside the method that returns self.b, but I'm not sure how I would access it. Something like foo().dosomething(1).getb() wouldn't work, because dosomething(1) evaluates to a number.
class foo:
    def __init__(self):
        self.a = 1
    def dosomething(self, i):
        self.b = 2 * self.a + i
        def getb():
            return self.b
        return self.b ** 2
I should also add that I don't want to return a data structure of several values. It would affect much of my code elsewhere.

Not a good use case for list comprehensions.
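A sketch of the plain loop this answer implies, plus, for completeness, a one-liner that works but relies on left-to-right evaluation of a tuple display. The names bs and bs_oneliner are illustrative only.

```python
class foo:
    def __init__(self):
        self.a = 1
    def dosomething(self, i):
        self.b = 2 * self.a + i
        return self.b ** 2

testobj = foo()

# Plain loop: call dosomething for its side effect, then read the updated b
bs = []
for i in range(20):
    testobj.dosomething(i)
    bs.append(testobj.b)

# One-liner alternative: tuple elements are evaluated left to right,
# so dosomething(i) runs before testobj.b is read; [1] keeps only b
bs_oneliner = [(testobj.dosomething(i), testobj.b)[1] for i in range(20)]

print(bs == bs_oneliner)  # True
```

The loop makes the side effect explicit, which is why it is usually the clearer choice here.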

Changing Method called without new instance of Python Class

I'm new to Python OOP and for the purpose of this question I have simplified my problem to this:
class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def add(self):
        # some arbitrary change
        return self.a + self.b
    def subtract(self):
        # some arbitrary change
        return self.a - self.b
a = Foo(a=1, b=2).add()
b = Foo(a=1, b=3).subtract()
So I have an object with two methods that do different things. To get my output, I have created two separate instances of Foo, because the value of b has changed.
Is there a way to set b and call obj.method() dynamically, without just listing them one after the other? I.e. some sort of generic class that I can use to dynamically set the attributes and choose which of the object's methods to call? Or is there anything built in I can use?
Edit
Here is another example:
class Foo:
    def __init__(self, a, b):
        self.a = list(a)
        self.b = list(b)
    def method1(self):
        # some arbitrary change in data
        return self.a * 2
    def method2(self):
        return self.b + [5, 6, 4]
a = Foo(a=[1, 2, 3], b=[]).method1()
b = Foo(b=[1, 2, 3], a=[]).method2()
print(a)
print(b)
So here, the input list changes depending on the method called. Is there a way to package this up so that I can feed a single instance some data and it 'knows' that list a is for method1() and list b is for method2()? I want to use the word reflection, but I feel that might not be accurate.
Again, I'm new to OOP, so any advice is appreciated.

class Foo:
    def add(self, a, b):
        return a + b
    def subtract(self, a, b):
        return a - b
fo = Foo()
a = fo.add(1, 2)
b = fo.subtract(1, 3)

You don't need two instances of Foo to achieve this.
Just do something like this:
foo = Foo(a=1, b=2)
# Perform addition (now 'a' is 1 and 'b' is 2)
a = foo.add()
# Change 'b'
foo.b = 3
# Now perform subtraction (now 'a' is 1 and 'b' is 3)
b = foo.subtract()
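For the edited example, where each method uses its own list, one instance can simply hold both lists. A minimal sketch, assuming optional defaults for a and b (my choice, not from the question), that also shows getattr, Python's built-in way to look up a method by name at runtime, which is close to the "reflection" the question mentions.

```python
class Foo:
    def __init__(self, a=None, b=None):
        # Each method reads its own list, so one instance can hold both inputs
        self.a = list(a) if a is not None else []
        self.b = list(b) if b is not None else []
    def method1(self):
        return self.a * 2
    def method2(self):
        return self.b + [5, 6, 4]

foo = Foo(a=[1, 2, 3], b=[1, 2, 3])

# getattr fetches a bound method by its name, so the method can be chosen at runtime
results = {name: getattr(foo, name)() for name in ("method1", "method2")}
print(results)
```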

Unpacking instance variables by making container iterable

I just want to be able to unpack the instance variables of class foo, for example:
x = foo("name", "999", "24", "0.222")
a, b, c, d = *x
a, b, c, d = [*x]
I am not sure as to which is the correct method for doing so when implementing my own __iter__ method, however, the latter is the one that has worked with mixed "success". I say mixed because doing so with the presented code appears to alter the original instance object x, such that it is no longer valid.
class foo:
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
    def __iter__(self):
        return iter([a, b, c, d])
I have read the myriad posts on this site regarding __iter__, __next__, generators etc., and also a Python book and docs.python.org, and seem unable to figure out what I am not understanding. I've gathered that __iter__ needs to return an iterator (which can just be self, but I am not sure how that works for what I want). I've also tried various ways of implementing __next__ and iterating over vars(foo).items(), either by converting it to a list or as a dictionary, with no success.
I don't believe this is a duplicate post, because the only similar questions I've seen involve a single list attribute or a range of numbers instead of four non-container variables.

If you want the instance's variables, you should access them through self:
def __iter__(self):
    return iter([self.a, self.b, self.c, self.d])
with this change,
a, b, c, d = list(x)
will get you the variables.
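You could take the riskier route of using vars(x) or x.__dict__, sorting its items by variable name and extracting the second element of each (name, value) tuple. That approach is limited because it only works if the attribute names happen to sort into the order you want; before Python 3.7 dicts did not guarantee any order at all, and although they now preserve insertion order, relying on the order of assignments in __init__ is still fragile. I would say the iterator is definitely better.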
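A sketch of the same fix written as a generator method. Each call to iter(x) starts a fresh generator, so the instance is not consumed and can be unpacked repeatedly; plain tuple unpacking already calls iter(x), so neither * nor list() is needed.

```python
class foo:
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
    def __iter__(self):
        # Generator method: every iter(x) call returns a new, independent iterator
        yield self.a
        yield self.b
        yield self.c
        yield self.d

x = foo("name", "999", "24", "0.222")
a, b, c, d = x          # unpacking calls iter(x) implicitly
print(a, b, c, d)       # name 999 24 0.222
print(list(x) == [a, b, c, d])  # True: x can be iterated again
```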

You can store the arguments in an attribute (self.e below) or return them when the instance is called:
class foo:
    def __init__(self, *args):
        self.a, self.b, self.c, self.d = self.e = args
    def __call__(self):
        return self.e
x = foo("name", "999", "24", "0.222")
a, b, c, d = x.e
# or
a, b, c, d = x()

Mutable objects in python and constants

I have a class which contains data as attributes and which has a method to return a tuple containing these attributes:
class myclass(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def tuple(self):
        return (self.a, self.b, self.c)
I use this class essentially as a tuple where the items (attributes) can be modified/read through their attribute name. Now I would like to create objects of this class, which would be constants and have pre-defined attribute values, which I could then assign to a variable/mutable object, thereby initializing this variable object's attributes to match the constant object, while at the same time retaining the ability to modify the attributes' values. For example I would like to do this:
constant_object = myclass(1,2,3)
variable_object = constant_object
variable_object.a = 999
Of course this doesn't work in Python: variable_object is just another name for the same object, so constant_object.a becomes 999 as well. What is the best way to get this kind of functionality?

Now I would like to create objects of this class, which would be constants and have pre-defined attribute values, which I could then assign to a variable/mutable object, thereby initializing this variable object's attributes to match the constant object,
Well, you can't have that. Assignment in Python doesn't initialize anything. It doesn't copy or create anything. All it does is give a new name to the existing value.
If you want to initialize an object, the way to do that in Python is to call the constructor.
So, with your existing code:
new_object = myclass(old_object.a, old_object.b, old_object.c)
If you look at most built-in and stdlib classes, it's a lot more convenient. For example:
a = set([1, 2, 3])
b = set(a)
How do they do that? Simple. Just define an __init__ method that can be called with an existing instance. (In the case of set, this comes for free, because a set can be initialized with any iterable, and sets are iterable.)
If you don't want to give up your existing design, you're going to need a pretty clumsy __init__, but it's at least doable. Maybe this:
_sentinel = object()
class myclass(object):
    def __init__(self, myclass_or_a, b=_sentinel, c=_sentinel):
        if isinstance(myclass_or_a, myclass):
            self.a, self.b, self.c = myclass_or_a.a, myclass_or_a.b, myclass_or_a.c
        else:
            self.a, self.b, self.c = myclass_or_a, b, c
… plus some error handling to check that b is _sentinel in the first case and that it isn't in the other case.
So, however you do it:
constant_object = myclass(1,2,3)
variable_object = myclass(constant_object)
variable_object.a = 999
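The error handling mentioned above might look like the sketch below; the choice of TypeError and the messages are my own, not part of the original answer.

```python
_sentinel = object()

class myclass(object):
    def __init__(self, myclass_or_a, b=_sentinel, c=_sentinel):
        if isinstance(myclass_or_a, myclass):
            # Copy-constructor form: myclass(existing) must not also get b or c
            if b is not _sentinel or c is not _sentinel:
                raise TypeError("pass either an existing myclass or a, b, c, not both")
            self.a, self.b, self.c = myclass_or_a.a, myclass_or_a.b, myclass_or_a.c
        else:
            # Regular form: myclass(a, b, c) needs all three values
            if b is _sentinel or c is _sentinel:
                raise TypeError("myclass(a, b, c) requires b and c")
            self.a, self.b, self.c = myclass_or_a, b, c
    def tuple(self):
        return (self.a, self.b, self.c)

constant_object = myclass(1, 2, 3)
variable_object = myclass(constant_object)
variable_object.a = 999
print(constant_object.a, variable_object.a)  # 1 999
```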

import copy
class myclass(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def tuple(self):
        return (self.a, self.b, self.c)
constant_object = myclass(1, 2, 3)
variable_object = copy.deepcopy(constant_object)
variable_object.a = 999
print(constant_object.a)
print(variable_object.a)
Output:
1
999

Deepcopying is not really necessary in this case, because of the way you've set up your tuple method:
class myclass(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def tuple(self):
        return (self.a, self.b, self.c)
constant_object = myclass(1, 2, 3)
variable_object = myclass(*constant_object.tuple())
variable_object.a = 999
>>> constant_object.a
1
>>> variable_object.a
999
Usually (as others have suggested), you'd want to deepcopy, which creates a brand-new object with no ties to the object being copied. However, since you are only using ints, deepcopy is overkill and a shallow copy is enough. In fact, it might even be faster to call the class constructor on the attributes of the object you already have, since those attributes are ints. That is why I suggested the code above.
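To illustrate why ints make a shallow copy sufficient: copy.copy creates a new instance but shares the attribute objects, which only matters when an attribute is mutable. A small sketch in which c is a list, purely as a hypothetical example:

```python
import copy

class myclass(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

constant_object = myclass(1, 2, [3])  # c is a list here only for illustration

shallow = copy.copy(constant_object)
deep = copy.deepcopy(constant_object)

shallow.a = 999        # rebinding an attribute on the copy never affects the original
shallow.c.append(4)    # but mutating a list shared by the shallow copy does
deep.c.append(5)       # deepcopy made its own list, so the original is untouched

print(constant_object.a, constant_object.c)  # 1 [3, 4]
print(deep.c)  # [3, 5]
```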
