Modifying the immutable class instance - python

I've got a somewhat complex data type, Mask, that I'd like to be able to use for fast membership checks such as:
seen = set()
m = Mask(...)
if m in seen:
...
This suggests that Mask should be hashable and therefore immutable. However, I'd like to generate variants of m and Mask seems like the place to encapsulate the variation logic. Here is a minimalish example that demonstrates what I want to accomplish:
class Mask(object):
    def __init__(self, seq):
        self.seq = seq
        self.hash = reduce(lambda x, y: x ^ y, self.seq)

    # __hash__ and __cmp__ satisfy the hashable contract (§3.4.1)
    def __hash__(self):
        return self.hash

    def __cmp__(self, other):
        return cmp(self.seq, other.seq)

    def complement(self):
        # cannot modify self without violating immutability requirement
        # so return a new instance instead
        return Mask([-x for x in self.seq])
This satisfies all of the hashable and immutable properties. The peculiarity is having what is effectively a Factory method complement; is this a reasonable way to implement the desired function? Note that I am not at all interested in protecting against "malicious" modification as many related questions on SO are looking to achieve.
As this example is intentionally small it could be trivially solved by making a tuple of seq. The type that I am actually working with does not afford such simple casting.

Yes, this is pretty much how you write an immutable class; methods that would otherwise change an object's state become, in your terms, "factories" that create new instances.
Note that in the specific case of computing a complement, you can name the method __invert__ and use it as
inv_mask = ~mask
Operator syntax is a strong signal to client code that operations return new values of the same type.
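A minimal sketch of that, as an extra method added to the Mask class from the question (complement could stay alongside it or be dropped):

    def __invert__(self):
        # like complement(): builds and returns a new Mask, never mutates self
        return Mask([-x for x in self.seq])

mask = Mask([1, 2, 3])
inv_mask = ~mask  # a new Mask([-1, -2, -3]); mask itself is unchanged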

When using immutable types, all 'variation' methods must return new objects/instances, and thus be factories in that sense.
Many languages make strings immutable (including Python). So all string operations return new strings; strB = strA.replace(...) etc.
If you could change the instance at all, it wouldn't be immutable.
Edit:
Rereading, you're asking if this is reasonable, and I say yes. The logic would not be better put somewhere else, and as I pointed out with string immutability it is a common paradigm to get new variations from existing ones.

Python doesn't enforce immutability. It is up to you to make sure that objects are not modified while they are in a set. Beyond that, you don't need immutability to use sets and dictionaries in Python.
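A small sketch of the pitfall this is alluding to (Box is a made-up class whose hash depends on mutable state):

class Box(object):
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return hash(self.value)
    def __eq__(self, other):
        return self.value == other.value

b = Box(1)
seen = {b}
print(b in seen)   # True
b.value = 2        # mutated while sitting in the set
print(b in seen)   # False -- the set now looks in the wrong hash bucket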

Related

How to find out a specific method in python is "in-place" or not?

I couldn't find a convenient way to distinguish in-place methods from assignable methods in python.
I mean, for example, a method like my_list.sort() doesn't need assignment; it changes the list itself (it is in-place, right?), but some other methods need to be assigned to a variable.
Am I wrong?
The reason you can't easily find such a distinction is that it really doesn't exist except by tenuous convention. "In-place" just means that a function modifies the data of a mutable argument, rather than returning an all new object. If "Not in-place" is taken to mean that a function returns a new object encapsulating updated data, leaving the input alone, then there are simply too many other possible conventions to be worth cataloging.
The standard library does its best to follow the convention of not returning values from single-argument in-place functions. So, for example, list.sort, list.append, random.shuffle and heapq.heapify all operate in-place and return None. At the same time, you have functions and methods that create new objects, and must therefore return them, like sorted, list.__add__ and tuple.__add__. But you also have in-place methods that must return a value, like list.__iadd__ (compare to list.extend, which does not return a value).
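For example:

nums = [3, 1, 2]
print(nums.sort())        # None -- the list was sorted in-place
print(nums)               # [1, 2, 3]
print(sorted([3, 1, 2]))  # [1, 2, 3] -- a brand-new list is returned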
__iadd__ and similar in-place operators emphasize a very important point, which is that in-place operation is not an option for immutable objects. Python has a workaround for this:
x = (1, 2)
y = (3, 4)
x += y
When type(x) defines __iadd__, the third line is equivalent to
x = type(x).__iadd__(x, y)
For a type like tuple that doesn't define __iadd__, Python falls back to __add__, so it becomes x = x + y. Ignoring the fact that the method is called as a function, notice that the name x is reassigned either way, so even when x += y has to create a new object (e.g., because tuple is immutable), you can still see it through the name x. Mutable objects will generally just return x from __iadd__, so the method call will appear not to return a value, even when it really does.
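A quick way to see the difference between the two cases:

xs = [1, 2]
old = xs
xs += [3, 4]
print(xs is old)   # True -- the same list was extended in place

t = (1, 2)
old = t
t += (3, 4)
print(t is old)    # False -- a new tuple was created and t was rebound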
As an interesting aside, the reassignment sometimes causes an unexpected error:
>>> z = ([],)
>>> z[0].extend([1, 2]) # OK
>>> z[0] += [3, 4] # Error! But list is mutable!
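What makes this error extra surprising is that the in-place part still happens before the failing tuple item assignment, which you can check:

z = ([],)
try:
    z[0] += [3, 4]     # the list's __iadd__ runs first and extends it...
except TypeError:
    pass               # ...then assigning back into the tuple fails
print(z)               # ([3, 4],) -- the inner list was modified anyway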
Many third party libraries, such as numpy, support the convention of in-place functions without a return value, up to a point. Most numpy functions create new objects, like np.cumsum, np.add, and np.sort. However, there are also functions and methods that operate in-place and return None, like np.ndarray.sort and np.random.shuffle.
numpy can work with large memory buffers, which means that in-place operation is often desirable. Instead of having a separate in-place version of the function, some functions and methods (most notably universal functions) have an out parameter that can be set to the input, like np.cumsum, np.ndarray.cumsum, and np.add. In these cases, the function will operate in-place, but still return a reference to the out parameter, in much the same way that Python's in-place operators do.
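A short sketch of that pattern (assuming numpy is installed):

import numpy as np

a = np.arange(5, dtype=float)
b = np.ones(5)
result = np.add(a, b, out=a)   # adds into a's existing buffer
print(result is a)             # True -- the return value is the out argument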
An added complication is that not all functions and methods perform a single action on a single object. You can write a class like this to illustrate:
class Test:
    def __init__(self, value):
        self.value = value

    def op(self, other):
        other.value += self.value
        return self
This class modifies another object, but returns a reference to the unmodified self. While contrived, the example serves to illustrate that the in-place/not-in-place paradigm is not all-encompassing.
TL;DR
In the end, the general concept of in-place is often useful, but can't replace the need for reading documentation and understanding what each function does on an individual basis. This will also save you from many common gotchas with mutable objects supporting in-place operations vs immutable ones just emulating them.

How can I make my Python `set` and `frozenset` subclasses preserve their types when engaging in binary operations?

I have some set and frozenset subclasses, OCDSet and OCDFrozenSet respectively. When I use them together with instances of their ancestor classes in binary operations, the ancestor classes dominate the type of the result – by which I mean, when I do something like subtract an OCDFrozenSet from a frozenset, I get a frozenset… but the same is true if I reverse the types in the operation (i.e. subtract a frozenset from an OCDFrozenSet).
Like so:
… what is especially counterintuitively vexing to me is the fact that using -= (subtract-in-place) mutates the type of the existing instance!
My knowledge of how to deal with this sort of thing comes strictly from C++, where the type of the operation is a foregone conclusion that is explicitly specified in a (likely templated) operator-overload function; in Python the type system is often much more implicit, but it isn't so mutably unpredictable as that in-place operation would have me now believe.
So, what is the most expedient way to address this – I assume it involves overriding some double-underscored instance methods in the subclasses of interest?
In-place operations don't guarantee that they will update the object in place; it depends entirely on the type of the object.
tuple, frozenset, etc. are immutable types, hence it is not possible to update them in place.
From the library reference on in-place operators:
For immutable targets such as strings, numbers, and tuples, the updated value is computed, but not assigned back to the input variable.
Similarly the frozenset docs also mention the same thing about in-place operations[source]:
The following table lists operations available for set that do not apply to immutable instances of frozenset.
Now, as your OCDFrozenSet doesn't implement __isub__, the operation falls back to the __sub__ method, which returns the base-class type frozenset. The base class is used because Python has no idea what arguments your subclass would expect for the frozenset newly created by the __sub__ operation.
More importantly, this was a bug in Python 2, where such operations returned a subclass instance; the fix was only applied to Python 3, though, to prevent breaking existing systems.
To get the expected output you can provide the required methods in your subclass:
class OCDFrozenSet(frozenset):
    def __sub__(self, other):
        return type(self)(super().__sub__(other))
    def __rsub__(self, other):
        return type(self)(super().__rsub__(other))
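With those two overrides in place, a quick check (with made-up values) looks like:

a = OCDFrozenSet({1, 2, 3})
b = frozenset({2, 4})
print(type(a - b).__name__)   # OCDFrozenSet
print(type(b - a).__name__)   # OCDFrozenSet, via __rsub__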

(kind of) Singleton Pattern in Python using Hash and `==`

I am looking into ways to use a quasi Singleton Pattern in python. Quick problem description:
I have objects that describe a subset of a certain group. For simplicity, assume integer numbers like in set([1,2,3]). In my case, comparison is difficult and, where it is possible at all, expensive, so I assume that if I have
complex_set1 = ...
complex_set2 = ...
these are different. Also, all set are immutable (like frozenset). Except there exists the full and empty sets for convenience
full_set = FullSet()
empty_set = EmptySet()
These seem to make sense as singletons. One way would be to create one instance and just add it to the package on import, so that there exists only one and you cannot create another.
Now my idea:
Since I do not care if I have multiple objects as long as they are considered the same in any case (besides a is b is obviously false) I just make them look equal (like a singleton would). An example would be
>>> len(set([FullSet(), FullSet()]))
1
So, I experimented with
def __hash__(self):
    return 0  # make sure all have the same hash

def __eq__(self, other):
    if isinstance(other, FullSet):
        return True
    return NotImplemented
Does this have a name? Is it considered a singleton pattern or something else?
Should I use this or are there caveats to be aware of?
Any comments on the hash value, which is also used for other purposes than comparison? Does it make more sense to use e.g. return hash(FullSet)?

Custom comparison functions for built-in types in Python

I am using Python's built-in sets to hold objects of a class I have defined. For this class, I defined __eq__, __ne__, and __hash__ so that I can compare objects by my custom comparison functions. That works just fine, until I find out that I actually need two sets of comparison functions, which will be used in different ways at different times in my code.
I can't define two sets of __eq__, etc. methods in my class, and Python's built-in set type does not accept a comparator argument. I suppose I could go write a wrapper class around set, but that seems like a lot more work than necessary.
Is there any easier solution to this than writing my own set class?
Let's say you have this class:
class Thingy(object):
def __init__(self, key, notkey):
self.key, self.notkey = key, notkey
def __eq__(self, other):
return self.key == other.key
def __hash__(self):
return hash(self.key)
Now, you want to put these in a set, but keyed by notkey instead of key. You can't do that as-is, because a set expects its elements to have a consistent meaning for equality—and also a consistent meaning for hash such that a == b always implies hash(a) == hash(b). So, create a wrapper:
class WrappedThingy(object):
    def __init__(self, thingy):
        self.thingy = thingy
    def __eq__(self, other):
        return self.thingy.notkey == other.thingy.notkey
    def __hash__(self):
        return hash(self.thingy.notkey)
And you can put those in a set:
wts = set(WrappedThingy(thingy) for thingy in thingies)
For example, let's say you want to uniquify your thingies, keeping exactly one thingy (arbitrarily) for each notkey value. Just wrap them, stick the wrappers in a set, then unwrap them and stick the unwrappees in a list:
wts = set(WrappedThingy(thingy) for thingy in thingies)
thingies = [wt.thingy for wt in wts]
This is part of a more general Python pattern called "DSU". This stands for "decorate-sort-undecorate", which is wildly inaccurate nowadays, since you almost never need it for sorting-related tasks in modern Python… but historically it made sense. Feel free to call it "decorate-process-undecorate" in hopes that it will catch on, but don't hope too hard.
The reason you don't need DSU for sorting nowadays is that the sorting functions take key functions as arguments. In fact, even for uniquifying, the unique_everseen function in the itertools recipes takes a key.
But if you look at what it does under the covers, it's basically DSU:
for element in iterable:
    k = key(element)
    if k not in seen:
        seen.add(k)
        yield element
(The fact that it's a generator rather than a list-building function means it can "undecorate on the fly", which makes things a bit simpler. But otherwise, same idea.)

Pythonic way to Implement Data Types (Python 2.7)

The majority of my programming experience has been with C++. Inspired by Bjarne Stroustrup's talk here, one of my favorite programming techniques is "type-rich" programming: the development of new, robust data types that not only reduce the amount of code I have to write by wrapping functionality into the type (for example vector addition: instead of newVec.x = vec1.x + vec2.x; newVec.y = ... etc., we can just write newVec = vec1 + vec2) but also reveal problems in the code at compile time through the strong type system.
A recent project I have undertaken in Python 2.7 requires integer values that have upper and lower bounds. My first instinct is to create a new data type (class) that will have all the same behavior as a normal number in python, but will always be within its (dynamic) boundary values.
class BoundInt:
    def __init__(self, target = 0, low = 0, high = 1):
        self.lowerLimit = low
        self.upperLimit = high
        self._value = target
        self._balance()

    def _balance(self):
        if (self._value > self.upperLimit):
            self._value = self.upperLimit
        elif (self._value < self.lowerLimit):
            self._value = self.lowerLimit
        self._value = int(round(self._value))

    def value(self):
        self._balance()
        return self._value

    def set(self, target):
        self._value = target
        self._balance()

    def __str__(self):
        return str(self._value)
This is a good start, but it requires accessing the meat of these BoundInt types like so
x = BoundInt()
y = 4
x.set(y) #it would be nicer to do something like x = y
print y #prints "4"
print x #prints "1"
z = 2 + x.value() #again, it would be nicer to do z = 2 + x
print z #prints "3"
We can add a large number of python's "magic method" definitions to the class to add some more functionality:
def __add__(self, other):
    return self._value + other
def __sub__(self, other):
    return self._value - other
def __mul__(self, other):
    return self._value * other
def __div__(self, other):
    return self._value / other
def __pow__(self, power):
    return self._value**power
def __radd__(self, other):
    return self._value + other
#etc etc
Now the code is rapidly exploding in size, and there is a ton of repetition in what is being written, for very little return; this doesn't seem very pythonic at all.
Things get even more complicated when I start to want to construct BoundInt objects from normal python numbers (integers?), and other BoundInt objects
x = BoundInt()
y = BoundInt(x)
z = BoundInt(4)
Which, as far as I'm aware requires the use of rather large/ugly if/else type checking statements within the BoundInt() constructor, as python does not support (c style) overloading.
All of this feels terribly like trying to write c++ code in python, a cardinal sin if one of my favorite books, Code Complete 2, is taken seriously. I feel like I am swimming against the dynamic typing current, instead of letting it carry me forward.
I very much want to learn to code python 'pythonic-ally', what is the best way to approach this sort of problem domain? What are good resources to learn proper pythonic style?
There's plenty of code in the standard library, in popular PyPI modules, and in ActiveState recipes that does this kind of thing, so you're probably better off reading examples than trying to figure it out from first principles. Also, note that this is pretty similar to creating a list-like or dict-like class, which there are even more examples of.
However, there are some answers to what you want to do. I'll start with the most serious, then work backward.
Things get even more complicated when I start to want to construct BoundInt objects from normal python numbers (integers?), and other BoundInt objects
…
Which, as far as I'm aware requires the use of rather large/ugly if/else type checking statements within the BoundInt() constructor, as python does not support (c style) overloading.
Ah, but think about what you're doing: You're constructing a BoundInt from anything that can act like an integer, including, say, an actual int or a BoundInt, right? So, why not:
def __init__(self, target, low, high):
    self.target, self.low, self.high = int(target), int(low), int(high)
I'm assuming you've already added an __int__ method to BoundInt, of course (the equivalent of a C++ explicit operator int() const).
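That __int__ method is a one-liner along these lines (assuming the _value attribute from the question's class):

    def __int__(self):
        # lets int(x) -- and therefore the __init__ above -- accept BoundInt instances
        return self._value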
Also, keep in mind that the lack of overloading isn't as serious as you're thinking coming from C++, because there is no "copy constructor" for making copies; you just pass the object around, and all that gets taken care of under the covers.
For example, imagine this C++ code:
BoundInt foo(BoundInt param) { BoundInt local = param; return local; }
BoundInt bar;
BoundInt baz = foo(bar);
This copies bar to param, param to local, local to an unnamed "return value" variable, and that to baz. Some of these will be optimized out, and others (in C++11) will use move instead of copy, but still, you've got 4 conceptual invocations of the copy/move constructors/assignment operators.
Now look at the Python equivalent:
def foo(param): local = param; return local
bar = BoundInt();
baz = foo(bar)
Here, we've just got one BoundInt instance—the one that was explicitly created—and all we're doing is binding new names to it. Even assigning baz as a member of a new object that outlives the scope of bar and baz won't make a copy. The only thing that makes a copy is explicitly calling BoundInt(baz) again. (This isn't quite 100% true, because someone can always inspect your object and attempt to clone it from the outside, and pickle, deepcopy, etc. may actually do so… but in that case, they're still not calling a "copy constructor" that you or the compiler wrote.)
Now, what about forwarding all those operators to the value?
Well, one possibility is to do it dynamically. The details depend on whether you're in Python 3 or 2 (and, for 2, how far back you need to support). But the idea is you just have a list of names, and for each one, you define a method with that name that calls the method of the same name on the value object. If you want a sketch of this, provide the extra info and ask, but you're probably better off looking for examples of dynamic method creation.
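To make that concrete, here is one rough sketch of the dynamic approach; the list of method names and the unwrapping of other BoundInt operands are assumptions, not a complete design:

def _forward_to_value(name):
    # build a method that applies int's method `name` to self._value
    def method(self, other):
        if isinstance(other, BoundInt):   # unwrap other BoundInts first
            other = other._value
        return getattr(self._value, name)(other)
    method.__name__ = name
    return method

class BoundInt(object):
    def __init__(self, target=0, low=0, high=1):
        self.lowerLimit = low
        self.upperLimit = high
        self._value = max(low, min(high, int(target)))

# attach the forwarding methods after the class body is created
for _name in ['__add__', '__radd__', '__sub__', '__rsub__',
              '__mul__', '__rmul__']:
    setattr(BoundInt, _name, _forward_to_value(_name))

As with the question's hand-written methods, the results come back as plain ints; whether they should instead be re-wrapped (and with which bounds) is the same open question discussed below.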
So, is that Pythonic? Well, it depends.
If you're creating dozens of "integer-like" classes, then yes, it's certainly better than copy-paste code or adding a "compile-time" generation step, and it's probably better than adding an otherwise-unnecessary base class.
And if you're trying to work across many versions of Python and don't want to have to remember "which version am I supposed to stop supplying __cmp__ to act like int again?" type questions, I might go even further and get the list of methods out of int itself (take dir(int()) and blacklist out a few names).
But if you're just doing this one class, in, say, just Python 2.6-2.7 or just 3.3+, I think it's a toss-up.
A good class to read is the fractions.Fraction class in the standard library. It's clearly-written pure Python code. And it partially demonstrates both the dynamic and explicit mechanisms (because it explicitly defines each special method in terms of generic dynamic forwarding functions), and if you've got both 2.x and 3.x around you can compare and contrast the two.
Meanwhile, it seems like your class is underspecified. If x is a BoundInt and y is an int, should x+y really return an int (as it does in your code)? If not, do you need to bound it? What about y+x? What should x+=y do? And so on.
Finally, in Python, it's often worth making "value classes" like this immutable, even if the intuitive C++ equivalent would be mutable. For example, consider this:
>>> i = BoundInt(3, 0, 10)
>>> j = i
>>> i.set(5)
>>> print(j)
5
I don't think you'd expect this. This wouldn't happen in C++ (for a typical value class), because j = i would create a new copy, but in Python, it's just binding a new name to the same copy. (It's equivalent to BoundInt &j = i, not BoundInt j = i.)
If you want BoundInt to be immutable, besides eliminating obvious things like set, also make sure not to implement __iadd__ and friends. If you leave out __iadd__, i += 2 will be turned into i = i.__add__(2): in other words, it will create a new instance, then rebind i to that new instance, leaving the old one alone.
There are likely many opinions on this. But regarding the proliferation of special methods, you will just have to do that to make it complete. But at least you only do it once, in one place. Also, the built-in number types can be subclassed. That's what I did for a similar implementation, which you can look at.
Your set method is an abomination. You do not create a number with a default value of zero and then change the number into some other number. That is very much trying to program C++ in Python, and will cause you endless amounts of headaches if you actually want to treat these the same way you do numbers, because every time you pass them to functions they are passed by reference (like everything in Python). So you'll end up with large amounts of aliasing in things you think you can treat like numbers, and you will almost certainly encounter bugs due to mutating the value of numbers you don't realise are aliased, or expecting to be able to retrieve a value stored in a dictionary with a BoundInt as a key by providing another BoundInt with the same value.
To me, high and low aren't data values associated with a particular BoundInt value, they're type parameters. I want a number 7 in the type BoundInt(1, 10), not a number 7 which is constrained to be between 1 and 10, all of which being a value in the type BoundInt.
If I really wanted to do something like this, the approach I would take would be to subclass int and treat BoundInt as a class factory; you give it a range, and it gives you the type of integers restricted to be in that range. You can apply that type to any "int-like" object and it will give you a value clamped to that range. Something like:
_bound_int_cache = {}

def BoundInt(low, high):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        class Tmp(int):
            def __new__(cls, value):
                value = max(value, cls.low)
                value = min(value, cls.high)
                return int.__new__(cls, value)
        # set the bounds after the class body; "low = low" inside the class
        # body would not see the factory's arguments
        Tmp.low = low
        Tmp.high = high
        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]
(The cache is just to ensure that two different attempts to get the BoundInt type for the same low/high values give you the exact same class, not two different classes that behave the same way. Probably wouldn't matter in practice most of the time, but it seems nicer.)
You would use this like:
B = BoundInt(1, 10)
x = B(7)
The "class factory" approach means that if you have a small number of meaningful ranges in which you want to bound your integers, you can create the classes for those ranges globally (with meaningful names), and then use them exactly like regular classes.
Subclassing int makes these objects immutable (which is why the initialisation had to be done in __new__), which frees you from aliasing bugs (which people don't expect to have to worry about when they're programming with simple value types like numbers, and for good reason). It also gives you all the integer methods for free, and so these BoundInt types behave exactly as int, except that when you create one the value is clamped by the type. Unfortunately that means that all operations on these types return int objects, not BoundInt objects.
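For example, continuing from B and x above:

print(x + 1)                # 8
print(type(x + 1) is int)   # True -- the result is a plain int; the bound is gone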
If you could come up with a way of reconciling the low/high values for the two different values involved in e.g. x + y, then you can override the special methods to make them return BoundInt values. The approaches that spring to mind are:
Take the left operand's bounds and ignore the right operand (seems messy and asymmetrical; violates the assumption that x + y = y + x)
Take the maximum low value and the minimum high value. It's nicely symmetrical, and you can treat numeric values that don't have low and high values as if they were -sys.maxint - 1 and sys.maxint (i.e. just use the bounds from the other value). Doesn't make a whole lot of sense if the ranges don't overlap at all, because you'll end up with an empty range, but operating on such numbers together probably doesn't make a whole lot of sense anyway.
Take the minimum low value and the maximum high value. Also symmetrical, but here you probably want to explicitly ignore normal numbers rather than pretending they're BoundInt values that can range over the whole integer range.
Any of the above could work, and any of the above will probably surprise you at some point (e.g. negating a number constrained to be in a positive range will always give you the smallest positive number in the range, which seems weird to me).
If you take this approach, you probably don't want to subclass int. Because if you have normalInt + boundedInt, then normalInt would handle the addition without respecting your code. You instead want it not to recognise boundedInt as an int value, so that int's __add__ won't work and will give your class a chance to try __radd__. But I would still treat your class as "immutable", and make every operation that comes up with a new number construct a new object; mutating numbers in place is practically guaranteed to cause bugs sometime.
So I'd handle that approach something like this:
class BoundIntBase(object):
    # Don't use this class directly; use a subclass that specifies low and high as
    # class attributes.
    def __init__(self, value):
        self.value = min(self.high, max(self.low, int(value)))
    def __int__(self):
        return self.value

# add binary operations to BoundIntBase
for method in ['__add__', '__radd__']:  # ... and the rest of the binary operator names
    def tmp(self, other, method=method):  # bind method now, not when the loop ends
        try:
            low = min(self.low, other.low)
            high = max(self.high, other.high)
        except AttributeError:
            cls = type(self)
        else:
            cls = BoundInt(low, high)
        v = getattr(int(self), method)(int(other))
        return cls(v)
    tmp.__name__ = method
    setattr(BoundIntBase, method, tmp)

_bound_int_cache = {}

def BoundInt(low, high):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        class Tmp(BoundIntBase):
            pass
        Tmp.low = low
        Tmp.high = high
        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]
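A quick sketch of how this version behaves (assuming the code above, with '__add__' among the forwarded names):

Five = BoundInt(1, 5)
Ten = BoundInt(1, 10)
x = Five(7)                # clamped down to 5
y = Ten(4)
z = x + y                  # bounds merge to BoundInt(1, 10)
print(int(z))              # 9
print(type(z).__name__)    # BoundInt(1, 10)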
Still seems like more code than it should be, but what you're trying to do is actually more complicated than you think it is.
A type that behaves exactly like numbers in all situations needs many special methods, due to the rich syntax support in Python (it seems no other kind of type requires so many methods; it is much simpler, for example, to define types that behave like a list or dict: a couple of methods and you have a Sequence). There are several ways to make the code less repetitive.
ABC classes such as numbers.Integral provide default implementations for some methods e.g., if __add__, __radd__ are implemented in a subclass then __sub__, __rsub__ are available automatically.
fractions.Fraction uses _operator_fallbacks to define the __r*__ methods and to provide fallback operators that deal with other numeric types:
__op__, __rop__ = _operator_fallbacks(monomorphic_operator, operator.op)
Python allows you to generate/modify a class dynamically in a factory function/metaclass, e.g., Can anyone help condense this Python code?. Even exec could be used in (very) rare cases, e.g., namedtuple().
Numbers are immutable in Python, so you should use __new__ instead of __init__.
Rare cases that are not covered by __new__ could be defined in from_sometype(cls, d: sometype) -> your_type class methods. And in reverse, cases that are not covered by special methods could use as_sometype(self) -> sometype methods.
A simpler solution in your case might be to define a higher-level type specific to your application domain. Number abstraction might be too low-level; e.g., decimal.Decimal is more than 6 KLOC.
