|= vs update with a subclass of collections.abc.Set

I need to subclass set so I subclassed collections.abc.Set, as suggested here: https://stackoverflow.com/a/6698723/211858.
Please find my simple implementation below.
It essentially wraps a set of integers.
I generate a list of 10,000 MySet instances, each consisting of 100 random integers.
I would like to take the union of these wrapped sets.
I have two implementations below.
For some reason, the first using update is very fast, yet the second using |= is slow.
The tqdm wrapper is to conduct nonrigorous benchmarks.
Is there some way to correct the definition of the class to fix this performance issue?
Thanks!
I'm on Python 3.10.5.
from collections.abc import Iterable, Iterator, Set
from tqdm import tqdm

class MySet(Set):
    def __init__(self, integers: Iterable[int]) -> None:
        self.data: set[int] = set(integers)

    def __len__(self) -> int:
        return len(self.data)

    def __iter__(self) -> Iterator[int]:
        return iter(self.data)

    def __contains__(self, x: object) -> bool:
        if isinstance(x, int):
            return x in self.data
        else:
            # NotImplemented is not an exception and cannot be raised;
            # raise TypeError for unsupported operand types instead.
            raise TypeError(f"unsupported type: {type(x)!r}")

    def my_func(self):
        ...

    def my_other_func(self):
        ...
# %%
import random

# Make some mock data
my_sets: list[MySet] = [
    MySet(random.sample(range(1_000_000), 100)) for _ in range(10_000)
]

# %%
universe: set[int] = set()
universe2: set[int] = set()

# %%
# Nearly instant
for my_set in tqdm(my_sets):
    universe.update(my_set)

# %%
# Takes well over 5 minutes on my laptop
for my_set in tqdm(my_sets):
    universe2 |= my_set

Conclusion: the way to add the least code is to implement the __ior__ method.
What happens when there is no implementation:
1. When the in-place or (|=) runs for the first time, universe2 is a set and my_set is a MySet. set does not recognize the MySet class, so the in-place or degenerates into a plain binary or.
2. As in point 1, set's binary or fails (returns NotImplemented), so Python tries to call MySet's __ror__ method.
3. Because MySet has no __ror__ method of its own, Python falls back to collections.abc.Set, whose __ror__ is the same as its __or__ and returns a result of type MySet. You can find it in the _collections_abc.py file:
class Set(Collection):
    ...
    @classmethod
    def _from_iterable(cls, it):
        '''Construct an instance of the class from any iterable input.

        Must override this method if the class constructor signature
        does not accept an iterable for an input.
        '''
        return cls(it)
    ...
    def __or__(self, other):
        if not isinstance(other, Iterable):
            return NotImplemented
        chain = (e for s in (self, other) for e in s)
        return self._from_iterable(chain)

    __ror__ = __or__
    ...
4. For every subsequent in-place or, universe2 is now of type MySet (the first __ror__ changed its type), and neither MySet nor collections.abc.Set has an __ior__ method, so collections.abc.Set.__or__ is called repeatedly and a full copy is made on every loop iteration. This is the root cause of the slowness of the second loop. Therefore, as long as an __ior__ method is implemented so that subsequent operations avoid the copy, performance improves greatly; see the sketch below.
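Here is a minimal sketch of that fix (a condensed version of the question's class with an __ior__ added; not code from the original answer):

from collections.abc import Iterable, Iterator, Set

class MySet(Set):
    def __init__(self, integers: Iterable[int]) -> None:
        self.data: set[int] = set(integers)

    def __len__(self) -> int:
        return len(self.data)

    def __iter__(self) -> Iterator[int]:
        return iter(self.data)

    def __contains__(self, x: object) -> bool:
        return x in self.data

    def __ior__(self, other: Iterable[int]) -> "MySet":
        # Mutate the wrapped set in place instead of building a brand-new
        # MySet (and copying every element) on each |=.
        self.data.update(other)
        return self

With this in place, once universe2 has become a MySet, each later `universe2 |= my_set` mutates in place instead of copying, and the second loop runs about as fast as the first.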
Suggestions for a better implementation: the abstract class collections.abc.Set represents an immutable set, and for that reason it does not implement the in-place operator methods. If you need your subclass to support in-place operations, consider inheriting from collections.abc.MutableSet and implementing the add and discard abstract methods. MutableSet implements the in-place operator methods such as __ior__ in terms of these two abstract methods (still not as efficient as the built-in set, so it is better to implement them yourself):
class MutableSet(Set):
    ...
    def __ior__(self, it):
        for value in it:
            self.add(value)
        return self
    ...
Correction: there were some mistakes in the old versions of this answer. They are corrected here; I hope anyone who read the old answers sees this:
Mistake 1 (old claim):
If necessary, you can also implement the __ior__ method, but it is not recommended to implement it on its own when neither the __or__ nor the __ror__ method is implemented, because Python will try to call __ior__ when it cannot find those implementations, which would turn non-in-place operations into in-place ones and may lead to unexpected results.
Correction: the binary or operation does not call the __ior__ method when __or__ and __ror__ are missing.
Mistake 2 (old claim):
Generally speaking, a binary operation between instances of different types is expected to return the type of the left operand, as with set and frozenset:
>>> {1} | frozenset({2})
{1, 2}
>>> frozenset({2}) | {1}
frozenset({1, 2})
Correction: this is not always true. For example, the __ror__ operation of collections.abc.Set returns its own subtype instead of the type of the left operand.
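A quick demonstration of that correction (assuming the MySet class from the question is in scope):

# set.__or__ returns NotImplemented for a non-set operand, so Python falls
# back to MySet's inherited Set.__ror__, which builds a MySet, not a set.
result = {1} | MySet([2])
print(type(result))   # <class '__main__.MySet'>, not the left operand's set
print(set(result))    # {1, 2}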

Related

Regarding Python methods in same class [duplicate]

I am trying to implement method overloading in Python:
class A:
    def stackoverflow(self):
        print 'first method'
    def stackoverflow(self, i):
        print 'second method', i

ob=A()
ob.stackoverflow(2)
but the output is second method 2; similarly:
class A:
    def stackoverflow(self):
        print 'first method'
    def stackoverflow(self, i):
        print 'second method', i

ob=A()
ob.stackoverflow()
gives
Traceback (most recent call last):
  File "my.py", line 9, in <module>
    ob.stackoverflow()
TypeError: stackoverflow() takes exactly 2 arguments (1 given)
How do I make this work?
It's method overloading, not method overriding. And in Python, you historically do it all in one function:
class A:
    def stackoverflow(self, i='some_default_value'):
        print('only method')

ob=A()
ob.stackoverflow(2)
ob.stackoverflow()
See the Default Argument Values section of the Python tutorial. See "Least Astonishment" and the Mutable Default Argument for a common mistake to avoid.
See PEP 443 for information about the single dispatch generic functions added in Python 3.4:
>>> from functools import singledispatch
>>> @singledispatch
... def fun(arg, verbose=False):
...     if verbose:
...         print("Let me just say,", end=" ")
...     print(arg)
...
>>> @fun.register(int)
... def _(arg, verbose=False):
...     if verbose:
...         print("Strength in numbers, eh?", end=" ")
...     print(arg)
...
>>> @fun.register(list)
... def _(arg, verbose=False):
...     if verbose:
...         print("Enumerate this:")
...     for i, elem in enumerate(arg):
...         print(i, elem)
You can also use pythonlangutil:
from pythonlangutil.overload import Overload, signature

class A:
    @Overload
    @signature()
    def stackoverflow(self):
        print('first method')

    @stackoverflow.overload
    @signature("int")
    def stackoverflow(self, i):
        print('second method', i)
While agf was right with the answer in the past, pre-3.4, we now have syntactic sugar for this in the typing module (PEP 484).
See the typing documentation for details on the @overload decorator, but note that this is really just syntactic sugar, and IMHO this is all people have been arguing about ever since.
Personally, I agree that having multiple functions with different signatures makes the code more readable than having a single function with 20+ arguments all set to a default value (None most of the time) and then having to fiddle around with endless if/elif/else chains to find out what the caller actually wants the function to do with the provided set of arguments. This was long overdue, following the Zen of Python:
Beautiful is better than ugly.
and arguably also
Simple is better than complex.
Straight from the official Python documentation linked above:
from typing import Tuple, overload

@overload
def process(response: None) -> None:
    ...
@overload
def process(response: int) -> Tuple[int, str]:
    ...
@overload
def process(response: bytes) -> str:
    ...
def process(response):
    <actual implementation>
EDIT: for anyone wondering why this example doesn't work the way you'd expect coming from other languages, I'd suggest taking a look at this discussion. The @overload-decorated functions are not supposed to contain any actual implementation. This is not obvious from the example in the Python documentation.
In Python, you don't do things that way. When people do that in languages like Java, they generally want a default value (if they don't, they generally want a method with a different name). So, in Python, you can have default values.
class A(object):  # Remember the ``object`` bit when working in Python 2.x
    def stackoverflow(self, i=None):
        if i is None:
            print 'first form'
        else:
            print 'second form'
As you can see, you can use this to trigger separate behaviour rather than merely having a default value.
>>> ob = A()
>>> ob.stackoverflow()
first form
>>> ob.stackoverflow(2)
second form
You can't, never need to and don't really want to.
In Python, everything is an object. Classes are things, so they are objects. So are methods.
There is an object called A which is a class. It has an attribute called stackoverflow. It can only have one such attribute.
When you write def stackoverflow(...): ..., what happens is that you create an object which is the method, and assign it to the stackoverflow attribute of A. If you write two definitions, the second one replaces the first, the same way that assignment always behaves.
Furthermore, you do not want to write code that does the wilder sorts of things that overloading is sometimes used for. That's not how the language works.
Instead of trying to define a separate function for each type of thing you could be given (which makes little sense since you don't specify types for function parameters anyway), stop worrying about what things are and start thinking about what they can do.
You not only can't write a separate one to handle a tuple vs. a list, but also don't want or need to.
All you do is take advantage of the fact that they are both, for example, iterable (i.e. you can write for element in container:). (The fact that they aren't directly related by inheritance is irrelevant.)
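For instance, a minimal sketch (not from the original answer) of that duck-typing approach: one function that works for any iterable, with no per-type overloads:

def summarize(container):
    # Works for lists, tuples, sets, generators... anything iterable.
    total = 0
    count = 0
    for element in container:
        total += element
        count += 1
    return total, count

print(summarize([1, 2, 3]))  # (6, 3) for a list
print(summarize((1, 2, 3)))  # (6, 3) for a tuple, same code path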
I write my answer in Python 3.2.1.
def overload(*functions):
    return lambda *args, **kwargs: functions[len(args)](*args, **kwargs)
How it works:
overload takes any number of callables, stores them in the tuple functions, and returns a lambda. The lambda takes any number of arguments and returns the result of calling the function stored in functions[number_of_unnamed_args_passed] with the arguments passed to the lambda.
Usage:
class A:
    stackoverflow = overload(
        None,  # there is always a self argument, so this should never get called
        lambda self: print('First method'),
        lambda self, i: print('Second method', i),
    )
I think the word you're looking for is "overloading". There isn't any method overloading in Python. You can however use default arguments, as follows.
def stackoverflow(self, i=None):
    if i is not None:
        print 'second method', i
    else:
        print 'first method'

When you pass it an argument, the first condition is true and the method prints 'second method'. When you pass it no arguments, it falls into the else branch and prints 'first method'.
I write my answer in Python 2.7:
In Python, method overloading is not possible; if you really want to access the same function with different features, I suggest you go for method overriding.
class Base():  # Base class
    '''def add(self, a, b):
        s = a + b
        print s'''
    def add(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
        sum = a + b + c
        print sum

class Derived(Base):  # Derived class
    def add(self, a, b):  # overriding method
        sum = a + b
        print sum

add_fun_1 = Base()      # instance creation for Base class
add_fun_2 = Derived()   # instance creation for Derived class
add_fun_1.add(4, 2, 5)  # function with 3 arguments
add_fun_2.add(4, 2)     # function with 2 arguments
In Python, overloading is not an applied concept. However, if you are trying to create a case where, for instance, you want one initializer to be performed if passed an argument of type foo and another initializer for an argument of type bar then, since everything in Python is handled as object, you can check the name of the passed object's class type and write conditional handling based on that.
class A:
    def __init__(self, arg):
        # Get the argument's class type as a string
        argClass = arg.__class__.__name__
        if argClass == 'foo':
            print 'Arg is of type "foo"'
            ...
        elif argClass == 'bar':
            print 'Arg is of type "bar"'
            ...
        else:
            print 'Arg is of a different type'
            ...
This concept can be applied to multiple different scenarios through different methods as needed.
In Python, you'd do this with a default argument.
class A:
    def stackoverflow(self, i=None):
        if i is None:
            print 'first method'
        else:
            print 'second method', i
Python does not support method overloading like Java or C++. We may overload the methods, but we can only use the latest defined method.
# First sum method.
# Takes two arguments and prints their sum
def sum(a, b):
    s = a + b
    print(s)

# Second sum method.
# Takes three arguments and prints their sum
def sum(a, b, c):
    s = a + b + c
    print(s)

# Uncommenting the below line shows an error
# sum(4, 5)

# This line will call the second sum method
sum(4, 5, 5)
We need to provide optional arguments or *args in order to allow a different number of arguments on each call; see the sketch below.
Courtesy Python | Method Overloading
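For illustration, a sketch of the *args variant (not part of the quoted article), where a single function accepts any number of addends:

def sum_all(*args):
    # One definition covers both of the calls above.
    s = 0
    for value in args:
        s += value
    print(s)

sum_all(4, 5)     # 9
sum_all(4, 5, 5)  # 14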
I just came across overloading.py (function overloading for Python 3) for anybody who may be interested.
From the linked repository's README file:
overloading is a module that provides function dispatching based on
the types and number of runtime arguments.
When an overloaded function is invoked, the dispatcher compares the
supplied arguments to available function signatures and calls the
implementation that provides the most accurate match.
Features

- Function validation upon registration and detailed resolution rules guarantee a unique, well-defined outcome at runtime.
- Implements function resolution caching for great performance.
- Supports optional parameters (default values) in function signatures.
- Evaluates both positional and keyword arguments when resolving the best match.
- Supports fallback functions and execution of shared code.
- Supports argument polymorphism.
- Supports classes and inheritance, including classmethods and staticmethods.
Python 3.x includes the standard typing library, which allows for method overloading with the use of the @overload decorator. Unfortunately, this is only to make the code more readable, as the @overload-decorated methods must be followed by a non-decorated method that handles the different arguments.
More can be found here, but for your example:
from typing import overload
from typing import Any, Optional

class A(object):
    @overload
    def stackoverflow(self) -> None:
        print('first method')
    @overload
    def stackoverflow(self, i: Any) -> None:
        print('second method', i)
    def stackoverflow(self, i: Optional[Any] = None) -> None:
        if not i:
            print('first method')
        else:
            print('second method', i)

ob=A()
ob.stackoverflow(2)
PEP 3124 proposed an @overload decorator to provide syntactic sugar for overloading via type inspection, instead of just working with overwriting.
Code example of overloading via @overload from PEP 3124:
from overloading import overload
from collections import Iterable

def flatten(ob):
    """Flatten an object to its component iterables"""
    yield ob

@overload
def flatten(ob: Iterable):
    for o in ob:
        for ob in flatten(o):
            yield ob

@overload
def flatten(ob: basestring):
    yield ob
is transformed by the @overload decorator to:
def flatten(ob):
    if isinstance(ob, basestring) or not isinstance(ob, Iterable):
        yield ob
    else:
        for o in ob:
            for ob in flatten(o):
                yield ob
In the MathMethod.py file:
from multipledispatch import dispatch

@dispatch(int, int)
def Add(a, b):
    return a + b

@dispatch(int, int, int)
def Add(a, b, c):
    return a + b + c

@dispatch(int, int, int, int)
def Add(a, b, c, d):
    return a + b + c + d
In the Main.py file:
import MathMethod as MM
print(MM.Add(200, 1000, 1000, 200))
We can overload the method by using multipledispatch.
There are some libraries that make this easy:

- functools: if you only need to dispatch on the first argument, use @singledispatch
- plum-dispatch: feature-rich method/function overloading
- multipledispatch: an alternative to plum with fewer features, but lightweight
Python 3.5 added the typing module, which includes an overload decorator.
This decorator's intended purpose is to help type checkers; functionally it's just duck typing.
from typing import Optional, overload

@overload
def foo(index: int) -> str:
    ...
@overload
def foo(name: str) -> str:
    ...
@overload
def foo(name: str, index: int) -> str:
    ...
def foo(name: Optional[str] = None, index: Optional[int] = None) -> str:
    return f"name: {name}, index: {index}"

foo(1)
foo("bar", 1)
foo("bar", None)
This produces per-overload type information in VS Code (screenshot omitted).
And while this can help, note that it adds a lot of "weird" new syntax. Its purpose, purely type hints, is not immediately obvious.
Going with a Union of types is usually a better option; a short sketch follows.
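For example, a small sketch of the Union alternative (the function name is illustrative):

from typing import Optional, Union

def foo(name_or_index: Union[int, str], index: Optional[int] = None) -> str:
    # One signature, one body; type checkers still see the allowed types.
    return f"name_or_index: {name_or_index}, index: {index}"

foo(1)
foo("bar", 1)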

Special method like __str__ that returns a number representation of an object

Say I have a Python class as follows:
class TestClass():
    value = 20

    def __str__(self):
        return str(self.value)
The __str__ method will automatically be called any time I try to use an instance of TestClass as a string, like in print. Is there any equivalent for treating it as a number? For example, in
an_object = TestClass()
if an_object > 30:
    ...
where some hypothetical __num__ function would be automatically called to interpret the object as a number. How could this be easily done?
Ideally I'd like to avoid overloading every normal mathematical operator.
You can provide __float__(), __int__(), and/or __complex__() methods to convert objects to numbers. There is also a __round__() method you can provide for custom rounding. Documentation here. The __bool__() method technically fits here too, since Booleans are a subclass of integers in Python.
While Python does implicitly convert objects to strings for e.g. print(), it never converts objects to numbers without you saying to. Thus, Foo() + 42 isn't valid just because Foo has an __int__ method. You have to explicitly use int() or float() or complex() on them. At least that way, you know what you're getting just by reading the code.
To get classes to actually behave like numbers, you have to implement all the special methods for the operations that numbers participate in, including arithmetic and comparisons. As you note, this gets annoying. You can, however, write a mixin class so that at least you only have to write it once. Such as:
class NumberMixin(object):
    def __eq__(self, other): return self.__num__() == self.__getval__(other)
    # other comparison methods
    def __add__(self, other): return self.__num__() + self.__getval__(other)
    def __radd__(self, other): return self.__getval__(other) + self.__num__()
    # etc., I'm not going to write them all out, are you crazy?
This class expects two special methods on the class it's mixed in with.
__num__() - converts self to a number. Usually this will be an alias for the conversion method for the most precise type supported by the object. For example, your class might have __int__() and __float__() methods, but __int__() will truncate the number, so you assign __num__ = __float__ in your class definition. On the other hand, if your class has a natural integral value, you might want to provide __float__ so it can also be converted to a float, but you'd use __num__ = __int__ since it should behave like an integer.
__getval__() - a static method that obtains the numeric value from another object. This is useful when you want to be able to support operations with objects other than numeric types. For example, when comparing, you might want to be able to compare to objects of your own type, as well as to traditional numeric types. You can write __getval__() to fish out the right attribute or call the right method of those other objects. Of course with your own instances you can just rely on float() to do the right thing, but __getval__() lets you be as flexible as you like in what you accept.
A simple example class using this mixin:
class FauxFloat(NumberMixin):
    def __init__(self, value): self.value = float(value)
    def __int__(self): return int(self.value)
    def __float__(self): return float(self.value)
    def __round__(self, digits=0): return round(self.value, digits)
    def __str__(self): return str(self.value)
    __repr__ = __str__
    __num__ = __float__

    @staticmethod
    def __getval__(obj):
        if isinstance(obj, FauxFloat):
            return float(obj)
        if hasattr(type(obj), "__num__") and callable(type(obj).__num__):
            return type(obj).__num__(obj)  # don't call dunder method on instance
        try:
            return float(obj)
        except TypeError:
            return int(obj)

ff = FauxFloat(42)
print(ff + 13)  # 55.0
For extra credit, you could register your class so it'll be seen as a subclass of an appropriate abstract base class:
import numbers
numbers.Real.register(FauxFloat)
issubclass(FauxFloat, numbers.Real) # True
For extra extra credit, you might also create a global num() function that calls __num__() on objects that have it, otherwise falling back to the older methods.
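Such a num() function might look like this (a sketch; num() and the __num__ hook are this answer's conventions, not part of the standard library, and the demo assumes FauxFloat from above):

def num(obj):
    """Convert obj to a number, preferring a __num__ hook if the type has one."""
    cls = type(obj)
    if callable(getattr(cls, "__num__", None)):
        return cls.__num__(obj)  # look up on the type, like real dunder methods
    try:
        return float(obj)        # fall back to the older conversion methods
    except TypeError:
        return int(obj)

print(num(FauxFloat(42)))  # 42.0, via FauxFloat.__num__
print(num("3.5"))          # 3.5, via float()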
In the case of numbers it's a bit more complicated, but it's possible! You have to override your class's comparison operators to fit your needs:
operator.__lt__(a, b)  # less than
operator.__le__(a, b)  # less than or equal
operator.__eq__(a, b)  # equal
operator.__ne__(a, b)  # not equal
operator.__ge__(a, b)  # greater than or equal
operator.__gt__(a, b)  # greater than
Python Operators
Looks like you need __gt__ method.
class A:
    val = 0

    def __gt__(self, other):
        return self.val > other

a = A()
a.val = 12
a > 10
If you just want to cast the object to an int, you should define an __int__ method (or __float__); a tiny sketch follows.
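A tiny sketch of that cast-only route:

class A:
    val = 12

    def __int__(self):
        # int(a) will call this; nothing else is affected.
        return self.val

a = A()
print(int(a))  # 12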

Overriding special methods on builtin types

Can magic methods be overridden outside of a class?
When I do something like this
def __int__(x):
    return x + 5

a = 5
print(int(a))
it prints '5' instead of '10'. Do I do something wrong or magic methods just can't be overridden outside of a class?
Short answer: not really.
You cannot arbitrarily change the behaviour of the builtin function int() (which internally calls __int__()) on builtin types such as int.
You can however change the behaviour of custom objects like this:
Example:
class Foo(object):
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        self.value += other

    def __repr__(self):
        return "<Foo(value={0:d})>".format(self.value)
Demo:
>>> x = Foo(5)
>>> x + 5
>>> x
<Foo(value=10)>
This overrides two things here by implementing two special methods:
__repr__(), which gets called by repr()
__add__(), which gets called by the + operator.
Update: As per the comments above, technically you can redefine the builtin function int. Example:
def int(x):
    return x + 5

int(5)  # returns 10
However, this is not recommended, and it does not change the overall behaviour of the object x.
Update #2: The reason you cannot change the behaviour of builtin types (without modifying the underlying source or using Cython or ctypes) is that builtin types in Python are not exposed or mutable to the user, unlike in homoiconic languages (see: Homoiconicity). Even then, I'm not really sure you can with Cython/ctypes; but the real question is "Why do you want to do this?"
Update #3: See Python's documentation on Data Model (object.__complex__ for example).
You can redefine a top-level __int__ function, but nobody ever calls that.
As implied in the Data Model documentation, when you write int(x), that calls x.__int__(), not __int__(x).
And even that isn't really true. First, __int__ is a special method, meaning the interpreter is allowed to call type(x).__int__(x) rather than x.__int__(), but that doesn't matter here. Second, int isn't required to call __int__ unless you give it something that isn't already an int (and call it with the one-argument form). So, it could be as if it were written like this:
def int(x, base=None):
    if base is not None:
        return do_basey_stuff(x, base)
    if isinstance(x, int):
        return x
    return type(x).__int__(x)
So, there is no way to change what int(5) will do… short of just shadowing the builtin int function with a different builtin/global/local function of the same name, of course.
But what if you wanted to, say, change int(5.5)? That's not an int, so it's going to call float.__int__(5.5). So, all we have to do is monkeypatch that, right?
Well, yes, except that Python allows builtin types to be immutable, and most of the builtin types in CPython are. So, if you try it:
>>> _real_float_int = float.__int__
>>> def _float_int(self):
...     return _real_float_int(self) + 5
>>> _float_int(5.5)
10
>>> float.__int__ = _float_int
TypeError: can't set attributes of built-in/extension type 'float'
However, if you're defining your own types, that's a different story:
>>> class MyFloat(float):
...     def __int__(self):
...         return super().__int__() + 5
>>> f = MyFloat(5.5)
>>> int(f)
10

Redeclaration of the method "in" within a class

I am creating an abstract data type that implements a doubly linked list (not sure that's the correct translation). In it I have created a __len__ method to calculate its length the correct way and a __repr__ method to represent it correctly, but I now want to create a method so that when the user writes something like:
if foo in liste_adt
it will return the correct answer. I don't know what to use, because __in__ does not work.
Thank you,
Are you looking for __contains__?
object.__contains__(self, item)
Called to implement membership test operators. Should return true if item is in self, false otherwise. For mapping objects, this should consider the keys of the mapping rather than the values or the key-item pairs.
For objects that don’t define __contains__(), the membership test first tries iteration via __iter__(), then the old sequence iteration protocol via __getitem__(), see this section in the language reference.
Quick example:
>>> class Bar:
...     def __init__(self, iterable):
...         self.list = list(iterable)
...     def __contains__(self, item):
...         return item in self.list
>>>
>>> b = Bar([1,2,3])
>>> b.list
[1, 2, 3]
>>> 4 in b
False
>>> 2 in b
True
Note: usually when you have this kind of doubt, the reference can be found in the Data Model section of The Python Language Reference.
Since the data structure is a linked list, it is necessary to iterate over it to check membership. Implementing an __iter__() method would make both if in and for in work. If there is a more efficient way for checking membership, implement that in __contains__().
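A minimal sketch of that advice (the class and method names are illustrative, not from the question):

class _Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0

    def append(self, value):
        node = _Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        self.size += 1

    def __len__(self):
        return self.size

    def __iter__(self):
        # Walking the links makes both `for x in lst` and `x in lst` work.
        node = self.head
        while node is not None:
            yield node.value
            node = node.next

    def __contains__(self, item):
        # Still an O(n) walk, but it makes the membership test explicit.
        return any(value == item for value in self)

lst = DoublyLinkedList()
lst.append(1)
lst.append(2)
print(2 in lst)  # True
print(5 in lst)  # False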

Understanding python object membership for sets

If I understand correctly, the __cmp__() function of an object is called in order to evaluate all objects in a collection while determining whether an object is a member, or 'in', the collection.
However, this does not seem to be the case for sets:
class MyObject(object):
    def __init__(self, data):
        self.data = data

    def __cmp__(self, other):
        return self.data - other.data

a = MyObject(5)
b = MyObject(5)
print a in [b]       # evaluates to True, as I'd expect
print a in set([b])  # evaluates to False
How is an object membership tested in a set, then?
Adding a __hash__ method to your class yields this:
class MyObject(object):
    def __init__(self, data):
        self.data = data

    def __cmp__(self, other):
        return self.data - other.data

    def __hash__(self):
        return hash(self.data)

a = MyObject(5)
b = MyObject(5)
print a in [b]       # True
print a in set([b])  # Also True!
>>> xs = []
>>> set([xs])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
There you are. Sets use hashes, very similarly to dicts. This helps performance enormously (membership tests are O(1), and many other operations depend on membership tests), and it also fits the semantics of sets well: set items must be unique, and different items will produce different hashes, while identical hashes indicate (well, in theory) duplicates.
Since the default __hash__ is just id() (which is rather stupid imho), two instances of a class that inherits object's __hash__ will never hash to the same value (well, unless the address space is larger than the size of the hash).
As others pointed out, your objects don't have a __hash__, so they use the default id as a hash; you can override it as Nathon suggested, BUT read the docs about __hash__, specifically the points about when you should and should not do that.
A set uses a dict behind the scenes, so the "in" statement is checking whether the object exists as a key in the dict. Since your object doesn't implement a hash function, the default hash function for objects uses the object's id. So even though a and b are equivalent, they're not the same object, and that's what's being tested.
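For reference, a Python 3 sketch of the same fix (__cmp__ is gone in Python 3, so __eq__ and __hash__ are defined explicitly):

class MyObject:
    def __init__(self, data):
        self.data = data

    def __eq__(self, other):
        return isinstance(other, MyObject) and self.data == other.data

    def __hash__(self):
        # Objects that compare equal must hash equal, or set/dict
        # membership breaks.
        return hash(self.data)

a = MyObject(5)
b = MyObject(5)
print(a in [b])  # True: list membership uses ==
print(a in {b})  # True: set membership uses hash(), then ==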
