Sentinel object and its applications? - python

I know that in Python the builtin object() returns a sentinel object. I'm curious as to what it is, but mainly what its applications are.

object is the base class that all other classes inherit from in Python 3. There's not a whole lot you can do with a plain old object. However, an object's identity can be useful. For example, the iter function takes a sentinel argument that signals when to stop iteration. We could supply an object() for that.
sentinel = object()

def step():
    inp = input('enter something: ')
    if inp == 'stop' or inp == 'exit' or inp == 'done':
        return sentinel
    return inp

for inp in iter(step, sentinel):
    print('you entered', inp)
This will ask for input until the user types stop, exit, or done. I'm not exactly sure when iter with a sentinel is more useful than a generator, but I guess it's interesting anyway.
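One place where the two-argument iter does show up is reading a file in fixed-size chunks until an empty result comes back. A minimal sketch (process is a hypothetical handler, not from the original answer):
from functools import partial

with open('data.bin', 'rb') as f:
    # read() returns b'' at EOF, which acts as the sentinel that stops iteration
    for chunk in iter(partial(f.read, 1024), b''):
        process(chunk)  # hypothetical per-chunk handler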
I'm not sure if this answers your question. To be clear, this is just a possible application of object. Fundamentally its existence in the python language has nothing to do with it being usable as a sentinel value (to my knowledge).

Object identity and Classes in Python
Your statement "I know in python the builtin object() returns a sentinel object." is slightly off, but not totally wrong, so let me first address that just to make sure we're on the same page:
object in Python is merely the parent class of all classes. In Python 2 this was, for a while, explicit: you had to write:
class Foo(object):
    ...
to get a so-called "new-style object". You could also define classes without that superclass, but that was only for backwards compatibility and not important for this question.
Today in Python 3, the object superclass is implicit. So all classes inherit from that class. As such, the two classes below are identical in Python 3:
class Foo:
    pass

class Foo(object):
    pass
Knowing this, we can slightly rephrase your initial statement:
... the builtin object() returns a sentinel object.
becomes then:
... the builtin object() returns an object instance of class "object"
So, when writing:
my_sentinel = object()
simply creates an empty object instance "somewhere in memory". That last part is important, because by default the builtin id() function, and checks using ... is ..., rely on the memory address. For example:
>>> a = object()
>>> b = object()
>>> a is b
False
This gives you a way to create object instances that you can use to check for a certain kind of logic in your code that is otherwise very difficult or even impossible. That is the main use of "sentinel" objects.
Example use case: Telling the difference between "None" and "Nothing/Uninitialised/Empty/..."
Sometimes the value None is a valid value for a variable and you may need to detect the difference between "empty" or something similar and None.
Let's assume you have a class doing lazy-loading for an expensive operation where "None" is a valid value. You can then write it like this:
#: sentinel value for uninitialised values
UNLOADED = object()

class MyLoader:
    def __init__(self, remote_addr):
        self.value = UNLOADED
        self.remote_addr = remote_addr

    def get(self):
        if self.value is UNLOADED:
            self.value = expensive_operation(self.remote_addr)
        return self.value
Now expensive_operation can return any value, even None or any other "falsy" value, and the "caching" will work without unintended bugs. It also makes the code readable, as it communicates the intent clearly to the reader of the code block. You also save storage (albeit negligible) for an additional "is_loaded" boolean value.
The same code using a boolean:
class MyLoader:
    def __init__(self, remote_addr):
        self.value = None
        self.remote_addr = remote_addr
        self.is_loaded = False  # <- need for an additional variable

    def get(self):
        if not self.is_loaded:
            self.value = expensive_operation(self.remote_addr)
            self.is_loaded = True  # <- source for a bug if this is forgotten
        return self.value
or, using "None" as default:
class MyLoader:
    def __init__(self, remote_addr):
        self.value = None  # <- we'll use this to detect load state
        self.remote_addr = remote_addr

    def get(self):
        if self.value is None:
            self.value = expensive_operation(self.remote_addr)
            # If the above returned "None" we will never "cache" the result
        return self.value
Final Thoughts
The above "MyLoader" example is just one example where sentinel values can be handy. They help making code more readable and more expressive. They also avoid certain types of bugs.
They are especially useful in areas where one is tempted to use None to signify a special value. Whenever you think something like "When X is the case, I will set the variable to None" it may be worth thinking about using a sentinel value. Because you now gave the value None a special meaning for a specific context.
Another such example would be to have special values for infinite integers. The concept of infinity only exists in floats. But if you want to ensure type-safety you may want to create your own "special" values like that to signify infinity.
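As a rough sketch of that idea (the names here are invented for illustration):
INFINITY = object()  # sentinel standing in for "no upper bound"

def clamp(value, upper=INFINITY):
    # the sentinel means "unbounded", without overloading None or a float
    if upper is INFINITY:
        return value
    return min(value, upper)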
Using sentinel values like that help distinguish between multiple different concepts which would otherwise be impossible. If you need many different "special" values and use None everywhere, you may end up using None from one concept in the context of another concept and end up with unintended side-effects which may be hard to debug. Imagine a contrived function like this:
SENTINEL_A = object()
SENTINEL_B = object()

def foobar(a=SENTINEL_A, b=SENTINEL_B):
    if a is SENTINEL_A:
        a = -12
    if b is SENTINEL_B:
        b = a * 2
    print(a + b)
By using sentinels like this, it becomes impossible to accidentally trigger the if-branches by mixing up the variables. For example, assume you refactor the code and trip up somewhere, mixing up a and b like this:
SENTINEL_A = object()
SENTINEL_B = object()

def foobar(a=SENTINEL_A, b=SENTINEL_B):
    if b is SENTINEL_A:  # <- bug: using *b* instead of *a*
        a = -12
    if b is SENTINEL_B:
        b = a * 2
    print(a + b)
In that case, the first if can never be true (unless the function is called improperly, of course). If you had used None as the default instead, this bug would be harder to detect, because you would end up with a = -12 in cases where you would not expect it.
In that sense, sentinels make your code more robust. And if logical-errors occur in your code they will be easier to find.
Having said all that, sentinel values are pretty rare. I personally find them very useful to avoid excessive usages of None for flagging special cases.

Here is a source code example from the Python standard library's dataclasses module that uses a sentinel value:
# A sentinel object to detect if a parameter is supplied or not. Use
# a class to give it a better repr.
class _MISSING_TYPE:
    pass

MISSING = _MISSING_TYPE()
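As a sketch (not taken from the stdlib) of how such a MISSING sentinel is typically consumed:
def describe_default(default=MISSING):
    # MISSING distinguishes "no argument supplied" from an explicit default=None
    if default is MISSING:
        return 'no default supplied'
    return 'default is {!r}'.format(default)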

Related

Make my custom list Class return 'list' as type

I am writing a custom class which extends the default python lists by adding some new functions such as shuffling, adding, multiplying etc.
The code goes something like this:
class xlist(list):
    def __init__(self, original_list: list):
        self._olist = original_list

    def sumall(self) -> int:
        sum = 0
        for e in self._olist:
            sum += e
        return sum
    ...
But while doing some calculations I needed to get the type of an instance of xlist. I want to do something like this:
>>> from xlist import xlist
>>> x = xlist([1, 2, 3])
>>> type(x)
When I do this I get <class 'xlist.xlist'> , but I want it to return list.
I am a little confused about metaclasses, which seem to be able to solve the problem.
Any Help?
Why do you expect type(x) to return list if you're really creating an xlist? Your xlist inherits from list, so every xlist object is an instance of a list since it inherits from it all of its behaviour (and extends by adding some new functionality).
Note that:
x = xlist([1, 2, 3])
isinstance(x, list)
returns True. You might also want to have a look at Difference between type() and isinstance()
There are two ways for Python to check the class of an object: one is calling type and the other is checking the __class__ slot.
Most times both return the same thing, but one can modify the class (for example, by customizing attribute access on the metaclass) so that __class__ will "lie", and Python code using myobject.__class__ will get the "false" information.
However, underneath, the "true" __class__ slot in the type object will always hold a reference to the real type, and this can't be falsified. Any C extension, maybe even a few Python extensions, and type(myobject) itself will see the real class.
Changing the contents of this slot actually changes the class of your instance. It is feasible from pure Python with a simple assignment, but there are guards in place on this assignment to ensure it is only done across types that have a compatible memory layout. Forcing it to change to an incompatible type (via an extension, or ctypes) will get your Python runtime to segfault.
All that said, there is no reason to lie about your class to users of your class: they should be able to "see" that the object they are holding is an xlist and not a list, and that xlists are also list objects, due to inheritance. Falsifying this information would be rather bad practice. On the other hand, there are a few calls in the Python stdlib itself that require the underlying object to really be a list and won't accept subtypes (notoriously Python's json.dumps serialization). That call has a native code path and won't be fooled by customizing access to __class__. However, the same call also has a Python-only code path that is triggered by setting some of the optional arguments (for example, by passing indent=4 on the call). If that is what you are trying to achieve (fooling some code that requires a strict list), you have to check that, and if it is Python code, it is doable. In the specific case of json.dump, you'd be better off monkeypatching the encoder to use a less strict check than falsifying your object, because I think the code there uses type for the check.
So, with all of the above said, the trick to falsify the return of .__class__ can be as simple as a property on the class:
class xlist(list):
    def __init__(self, original_list: list):
        self._olist = original_list

    def sumall(self) -> int:
        sum = 0
        for e in self._olist:
            sum += e
        return sum

    @property
    def __class__(self):
        return list
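A quick check of the resulting behaviour (a sketch; as noted above, type() and isinstance() still see the real class):
x = xlist([1, 2, 3])
print(x.__class__)           # <class 'list'>
print(type(x))               # still the real class: xlist
print(isinstance(x, list))   # True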

Make an object that behaves like a slice

How can we make a class represent itself as a slice when appropriate?
This didn't work:
class MyThing(object):
    def __init__(self, start, stop, otherstuff):
        self.start = start
        self.stop = stop
        self.otherstuff = otherstuff

    def __index__(self):
        return slice(self.start, self.stop)
Expected output:
>>> thing = MyThing(1, 3, 'potato')
>>> 'hello world'[thing]
'el'
Actual output:
TypeError: __index__ returned non-(int,long) (type slice)
Inheriting from slice doesn't work either.
TLDR: It's impossible to make custom classes replace slice for builtin types such as list and tuple.
The __index__ method exists purely to provide an index, which is by definition an integer in python (see the Data Model). You cannot use it for resolving an object to a slice.
I'm afraid that slice seems to be handled specially by python. The interface requires an actual slice; providing its signature (which also includes the indices method) is not sufficient. As you've found out, you cannot inherit from it, so you cannot create new types of slices. Even Cython will not allow you to inherit from it.
So why is slice special? Glad you asked. Welcome to the innards of CPython. Please wash your hands after reading this.
So slice objects are described in slice.rst. Note these two guys:
.. c:var:: PyTypeObject PySlice_Type

   The type object for slice objects. This is the same as :class:`slice` in the
   Python layer.

.. c:function:: int PySlice_Check(PyObject *ob)

   Return true if ob is a slice object; ob must not be NULL.
Now, this is actually implemented in sliceobject.h as:
#define PySlice_Check(op) (Py_TYPE(op) == &PySlice_Type)
So only the slice type is allowed here. This check is actually used in list_subscript (and the tuple subscript, ...) after attempting to use the index protocol (so having __index__ on a slice is a bad idea). A custom container class is free to override __getitem__ and use its own rules, but that's how list (and tuple, ...) does it.
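For illustration (a sketch, not part of the original answer): a custom container's __getitem__ can resolve a start/stop object such as MyThing into a real slice before delegating:
class MyList(list):
    def __getitem__(self, key):
        # convert anything exposing start/stop (but not an actual slice) into a slice
        if hasattr(key, 'start') and hasattr(key, 'stop') and not isinstance(key, slice):
            key = slice(key.start, key.stop)
        return list.__getitem__(self, key)

print(MyList('hello world')[MyThing(1, 3, 'potato')])   # ['e', 'l']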
Now, why is it not possible to subclass slice? Well, type actually has a flag indicating whether something can be subclassed. It is checked here and generates the error you have seen:
if (!PyType_HasFeature(base_i, Py_TPFLAGS_BASETYPE)) {
    PyErr_Format(PyExc_TypeError,
                 "type '%.100s' is not an acceptable base type",
                 base_i->tp_name);
    return NULL;
}
I haven't been able to track down how slice (un)sets this value, but the fact that one gets this error means it does. This means you cannot subclass it.
Closing remarks: After remembering some long-forgotten C-(non)-skills, I'm fairly sure this is not about optimization in the strict sense. All existing checks and tricks would still work (at least those I've found).
After washing my hands and digging around in the internet, I've found a few references to similar "issues". Tim Peters has said all there is to say:
Nothing implemented in C is subclassable unless somebody volunteers the work
to make it subclassable; nobody volunteered the work to make the [insert name here]
type subclassable. It sure wasn't at the top of my list wink.
Also see this thread for a short discussion on non-subclass'able types.
Practically all alternative interpreters replicate the behavior to various degrees: Jython, Pyston, IronPython and PyPy (didn't find out how they do it, but they do).
I'M SORRY FOR THE DARK MAGIC
Using forbiddenfruit and Python's builtin __new__ method I was able to do this:
from forbiddenfruit import curse

class MyThing(int):
    def __new__(cls, *args, **kwargs):
        magic_slice = slice(args[0], args[1])
        curse(slice, 'otherstuff', args[2])
        return magic_slice

thing = MyThing(1, 3, 'thing')
print 'hello world'[thing]
print thing.otherstuff
output:
>>> el
>>> thing
I wrote it as a challenge just because everybody said it was impossible. I would never use it in production code; IT HAS SO MANY SIDE EFFECTS. You should think again about your structure and needs.
A slice can't be the return type here, as the __index__ method just doesn't support this. You can read more about the __index__ special method here.
I could only come up with a workaround that directly calls the function in your class:
class MyThing(object):
    def __init__(self, start, stop, otherstuff):
        self.start = start
        self.stop = stop
        self.otherstuff = otherstuff

    def __index__(self):
        return slice(self.start, self.stop)

thing = MyThing(1, 3, 'potato')
print 'Hello World'[thing.__index__()]
This will return el.

Pythonic way to Implement Data Types (Python 2.7)

The majority of my programming experience has been with C++. Inspired by Bjarne Stroustrup's talk here, one of my favorite programming techniques is "type-rich" programming; the development of new robust data-types that will not only reduce the amount of code I have to write by wrapping functionality into the type (for example vector addition, instead of newVec.x = vec1.x + vec2.x; newVec.y = ... etc, we can just use newVec = vec1 + vec2) but will also reveal problems in your code at compile time through the strong type system.
A recent project I have undertaken in Python 2.7 requires integer values that have upper and lower bounds. My first instinct is to create a new data type (class) that will have all the same behavior as a normal number in python, but will always be within its (dynamic) boundary values.
class BoundInt:
    def __init__(self, target=0, low=0, high=1):
        self.lowerLimit = low
        self.upperLimit = high
        self._value = target
        self._balance()

    def _balance(self):
        if (self._value > self.upperLimit):
            self._value = self.upperLimit
        elif (self._value < self.lowerLimit):
            self._value = self.lowerLimit
        self._value = int(round(self._value))

    def value(self):
        self._balance()
        return self._value

    def set(self, target):
        self._value = target
        self._balance()

    def __str__(self):
        return str(self._value)
This is a good start, but it requires accessing the meat of these BoundInt types like so
x = BoundInt()
y = 4

x.set(y)            # it would be nicer to do something like x = y
print y             # prints "4"
print x             # prints "1"

z = 2 + x.value()   # again, it would be nicer to do z = 2 + x
print z             # prints "3"
We can add a large number of python's "magic method" definitions to the class to add some more functionality:
def __add__(self, other):
    return self._value + other

def __sub__(self, other):
    return self._value - other

def __mul__(self, other):
    return self._value * other

def __div__(self, other):
    return self._value / other

def __pow__(self, power):
    return self._value ** power

def __radd__(self, other):
    return self._value + other

# etc etc
Now the code is rapidly exploding in size, and there is a ton of repetition in what is being written, for very little return; this doesn't seem very pythonic at all.
Things get even more complicated when I start to want to construct BoundInt objects from normal python numbers (integers?), and other BoundInt objects
x = BoundInt()
y = BoundInt(x)
z = BoundInt(4)
Which, as far as I'm aware, requires the use of rather large/ugly if/else type-checking statements within the BoundInt() constructor, as python does not support (C style) overloading.
All of this feels terribly like trying to write c++ code in python, a cardinal sin if one of my favorite books, Code Complete 2, is taken seriously. I feel like I am swimming against the dynamic typing current, instead of letting it carry me forward.
I very much want to learn to code python 'pythonic-ally', what is the best way to approach this sort of problem domain? What are good resources to learn proper pythonic style?
There's plenty of code in the standard library, in popular PyPI modules, and in ActiveState recipes that does this kind of thing, so you're probably better off reading examples than trying to figure it out from first principles. Also, note that this is pretty similar to creating a list-like or dict-like class, which there are even more examples of.
However, there are some answers to what you want to do. I'll start with the most serious, then work backward.
Things get even more complicated when I start to want to construct BoundInt objects from normal python numbers (integers?), and other BoundInt objects
…
Which, as far as I'm aware requires the use of rather large/ugly if/else type checking statements within the BoundInt() constructor, as python does not support (c style) overloading.
Ah, but think about what you're doing: You're constructing a BoundInt from anything that can act like an integer, including, say, an actual int or a BoundInt, right? So, why not:
def __init__(self, target, low, high):
    self.target, self.low, self.high = int(target), int(low), int(high)
I'm assuming you've already added a __int__ method to BoundInt, of course (the equivalent of a C++ explicit operator int() const).
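If it isn't there yet, a minimal sketch of that method for the question's class (assuming the _value attribute from the original BoundInt):
def __int__(self):
    # allows int(x) on a BoundInt, and therefore BoundInt(x) via the __init__ above
    return self._value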
Also, keep in mind that the lack of overloading isn't as serious as you're thinking coming from C++, because there is no "copy constructor" for making copies; you just pass the object around, and all that gets taken care of under the covers.
For example, imagine this C++ code:
BoundInt foo(BoundInt param) { BoundInt local = param; return local; }
BoundInt bar;
BoundInt baz = foo(bar);
This copies bar to param, param to local, local to an unnamed "return value" variable, and that to baz. Some of these will be optimized out, and others (in C++11) will use move instead of copy, but still, you've got 4 conceptual invocations of the copy/move constructors/assignment operators.
Now look at the Python equivalent:
def foo(param): local = param; return local
bar = BoundInt();
baz = foo(bar)
Here, we've just got one BoundInt instance—the one that was explicitly created—and all we're doing is binding new names to it. Even assigning baz as a member of a new object that outlives the scope of bar and baz won't make a copy. The only thing that makes a copy is explicitly calling BoundInt(baz) again. (This isn't quite 100% true, because someone can always inspect your object and attempt to clone it from the outside, and pickle, deepcopy, etc. may actually do so… but in that case, they're still not calling a "copy constructor" that you or the compiler wrote.)
Now, what about forwarding all those operators to the value?
Well, one possibility is to do it dynamically. The details depend on whether you're in Python 3 or 2 (and, for 2, how far back you need to support). But the idea is you just have a list of names, and for each one, you define a method with that name that calls the method of the same name on the value object. If you want a sketch of this, provide the extra info and ask, but you're probably better off looking for examples of dynamic method creation.
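A rough sketch of that dynamic approach, assuming the question's BoundInt class with its _value attribute (the helper names here are invented):
_FORWARDED = ['__add__', '__radd__', '__sub__', '__rsub__', '__mul__', '__rmul__']

def _make_forwarder(name):
    def forwarder(self, *args):
        # delegate to the method of the same name on the wrapped int
        return getattr(self._value, name)(*args)
    forwarder.__name__ = name
    return forwarder

for _name in _FORWARDED:
    setattr(BoundInt, _name, _make_forwarder(_name))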
So, is that Pythonic? Well, it depends.
If you're creating dozens of "integer-like" classes, then yes, it's certainly better than copy-paste code or adding a "compile-time" generation step, and it's probably better than adding an otherwise-unnecessary base class.
And if you're trying to work across many versions of Python and don't want to have to remember "which version am I supposed to stop supplying __cmp__ to act like int again?" type questions, I might go even further and get the list of methods out of int itself (take dir(int()) and blacklist out a few names).
But if you're just doing this one class, in, say, just Python 2.6-2.7 or just 3.3+, I think it's a toss-up.
A good class to read is the fractions.Fraction class in the standard library. It's clearly-written pure Python code. And it partially demonstrates both the dynamic and explicit mechanisms (because it explicitly defines each special message in terms of generic dynamic forwarding functions), and if you've got both 2.x and 3.x around you can compare and contrast the two.
Meanwhile, it seems like your class is underspecified. If x is a BoundInt and y is an int, should x+y really return an int (as it does in your code)? If not, do you need to bound it? What about y+x? What should x+=y do? And so on.
Finally, in Python, it's often worth making "value classes" like this immutable, even if the intuitive C++ equivalent would be mutable. For example, consider this:
>>> i = BoundInt(3, 0, 10)
>>> j = i
>>> i.set(5)
>>> j
5
I don't think you'd expect this. This wouldn't happen in C++ (for a typical value class), because j = i would create a new copy, but in Python, it's just binding a new name to the same copy. (It's equivalent to BoundInt &j = i, not BoundInt j = i.)
If you want BoundInt to be immutable, besides eliminating obvious things like set, also make sure not to implement __iadd__ and friends. If you leave out __iadd__, i += 2 will be turned into i = i.__add__(2): in other words, it will create a new instance, then rebind i to that new instance, leaving the old one alone.
There are likely many opinions on this. But regarding the proliferation of special methods, you will just have to do that to make it complete. But at least you only do that once, in one place. Also, the built-in number types can be subclassed. That's what I did for a similar implementation, which you can look at.
Your set method is an abomination. You do not create a number with a default value of zero and then change the number into some other number. That is very much trying to program C++ in Python, and will cause you endless amounts of headaches if you actually want to treat these the same way you do numbers, because every time you pass them to functions they are passed by reference (like everything in Python). So you'll end up with large amounts of aliasing in things you think you can treat like numbers, and you will almost certainly encounter bugs due to mutating the value of numbers you don't realise are aliased, or expecting to be able to retrieve a value stored in a dictionary with a BoundInt as a key by providing another BoundInt with the same value.
To me, high and low aren't data values associated with a particular BoundInt value, they're type parameters. I want a number 7 in the type BoundInt(1, 10), not a number 7 which is constrained to be between 1 and 10, all of which being a value in the type BoundInt.
If I really wanted to do something like this, the approach I would take would be to subclass int and treat BoundInt as a class factory; you give it a range, and it gives you the type of integers restricted to be in that range. You can apply that type to any "int-like" object and it will give you a value clamped to that range. Something like:
_bound_int_cache = {}

def BoundInt(low, high):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        class Tmp(int):
            def __new__(cls, value):
                value = max(value, cls.low)
                value = min(value, cls.high)
                return int.__new__(cls, value)
        # set the bounds as class attributes (after the class body, to avoid name shadowing)
        Tmp.low = low
        Tmp.high = high
        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]
(The cache is just to ensure that two different attempts to get the BoundInt type for the same low/high values give you the exact same class, not two different classes that behave the same way. Probably wouldn't matter in practice most of the time, but it seems nicer.)
You would use this like:
B = BoundInt(1, 10)
x = B(7)
The "class factory" approach means that if you have a small number of meaningful ranges in which you want to bound your integers, you can create the classes for those ranges globally (with meaningful names), and then use them exactly like regular classes.
Subclassing int makes these objects immutable (which is why the initialisation had to be done in __new__), which frees you from aliasing bugs (which people don't expect to have to worry about when they're programming with simple value types like numbers, and for good reason). It also gives you all the integer methods for free, and so these BoundInt types behave exactly as int, except that when you create one the value is clamped by the type. Unfortunately that means that all operations on these types return int objects, not BoundInt objects.
If you could come up with a way of reconciling the low/high values for the two different values involved in e.g. x + y, then you can override the special methods to make them return BoundInt values. The approaches that spring to mind are:
Take the left operand's bounds and ignore the right operand (seems messy and asymmetrical; violates the assumption that x + y = y + x)
Take the maximum low value and the minimum high value. It's nicely symmetrical, and you can treat numeric values that don't have low and high values as if they were -sys.maxint - 1 and sys.maxint (i.e. just use the bounds from the other value). Doesn't make a whole lot of sense if the ranges don't overlap at all, because you'll end up with an empty range, but operating on such numbers together probably doesn't make a whole lot of sense anyway.
Take the minimum low value and the maximum high value. Also symmetrical, but here you probably want to explicitly ignore normal numbers rather than pretending they're BoundInt values that can range over the whole integer range.
Any of the above could work, and any of the above will probably surprise you at some point (e.g. negating a number constrained to be in a positive range will always give you the smallest positive number in the range, which seems weird to me).
If you take this approach, you probably don't want to subclass int, because if you have normalInt + boundedInt, then normalInt would handle the addition without respecting your code. You instead want it not to recognise boundedInt as an int value, so that int's __add__ won't work and will give your class a chance to try __radd__. But I would still treat your class as "immutable", and make every operation that comes up with a new number construct a new object; mutating numbers in place is practically guaranteed to cause bugs sometime.
So I'd handle that approach something like this:
class BoundIntBase(object):
    # Don't use this class directly; use a subclass that specifies low and high as
    # class attributes.
    def __init__(self, value):
        self.value = min(self.high, max(self.low, int(value)))

    def __int__(self):
        return self.value

# add binary operations to BoundIntBase
def _make_binop(method):
    def tmp(self, other):
        try:
            low = min(self.low, other.low)
            high = max(self.high, other.high)
        except AttributeError:
            cls = type(self)
        else:
            cls = BoundInt(low, high)
        v = getattr(int(self), method)(int(other))
        return cls(v)
    tmp.__name__ = method
    return tmp

for method in ['__add__', '__radd__', ...]:
    setattr(BoundIntBase, method, _make_binop(method))
_bound_int_cache = {}

def BoundInt(low, high):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        class Tmp(BoundIntBase):
            pass
        # class attributes supply the bounds used by BoundIntBase.__init__
        Tmp.low = low
        Tmp.high = high
        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]
Still seems like more code than it should be, but what you're trying to do is actually more complicated than you think it is.
The type that behaves exactly like numbers in all situations needs many special methods due to the rich syntax support in Python (it seems no other types require so many methods; e.g., it is much simpler to define types that behave like a list or dict in Python: a couple of methods and you have a Sequence). There are several ways to make the code less repetitive.
ABC classes such as numbers.Integral provide default implementations for some methods, e.g., if __add__ and __radd__ are implemented in a subclass then __sub__ and __rsub__ are available automatically.
fractions.Fraction uses _operator_fallbacks to define the __r*__ methods and to provide fallback operators for dealing with other numeric types:
__op__, __rop__ = _operator_fallbacks(monomorphic_operator, operator.op)
Python allows you to generate/modify a class dynamically in a factory function/metaclass, e.g., Can anyone help condense this Python code?. Even exec could be used in (very) rare cases, e.g., namedtuple().
Numbers are immutable in Python so you should use __new__ instead of __init__.
Rare cases that are not covered by __new__ could be handled by from_sometype(cls, d: sometype) -> your_type class methods. And in reverse, cases that are not covered by special methods could use as_sometype(self) -> sometype methods.
A simpler solution in your case might be to define a higher-level type specific to your application domain. The number abstraction might be too low-level; e.g., decimal.Decimal is more than 6 KLOC.

Python - Function attributes or mutable default values

Say you have a function that needs to maintain some sort of state and behave differently depending on that state. I am aware of two ways to implement this where the state is stored entirely by the function:
Using a function attribute
Using a mutable default value
Using a slightly modified version of Felix Kling's answer to another question, here is an example function that can be used in re.sub() so that only the third match to a regex will be replaced:
Function attribute:
def replace(match):
    replace.c = getattr(replace, "c", 0) + 1
    return repl if replace.c == 3 else match.group(0)
Mutable default value:
def replace(match, c=[0]):
    c[0] += 1
    return repl if c[0] == 3 else match.group(0)
To me the first seems cleaner, but I have seen the second more commonly. Which is preferable and why?
I would use a closure instead; no side effects.
Here is the example (I've just modified the original example from Felix Kling's answer):
def replaceNthWith(n, replacement):
    c = [0]
    def replace(match):
        c[0] += 1
        return replacement if c[0] == n else match.group(0)
    return replace
And the usage:
# reset state (in our case count, c=0) for each string manipulation
re.sub(pattern, replaceNthWith(n, replacement), str1)
re.sub(pattern, replaceNthWith(n, replacement), str2)

# or persist state between calls
replace = replaceNthWith(n, replacement)
re.sub(pattern, replace, str1)
re.sub(pattern, replace, str2)
For the mutable default: what should happen if somebody calls replace(match, c=[])?
For the function attribute: you break encapsulation (yes, I know Python doesn't really enforce it, for various reasons ...).
Both ways feel strange to me. The first, though, is much better. But when you think about it this way: "something with a state that can do operations with that state and additional input", it really sounds like a normal object. And when something sounds like an object, it should be an object...
So, my solution would be to use a simple object with a __call__ method:
class StatefulReplace(object):
    def __init__(self, initial_c=0):
        self.c = initial_c

    def __call__(self, match):
        self.c += 1
        return repl if self.c == 3 else match.group(0)
And then you can write in the global space or your module init:
replace = StatefulReplace(0)
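For completeness, a hypothetical usage with re.sub, assuming repl is the module-level replacement string from the question:
import re

repl = 'dog'
replace = StatefulReplace(0)
# only the third match is replaced
print(re.sub('cat', replace, 'cat cat cat cat'))   # cat cat dog cat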
How about:
Use a class
Use a global variable
True, these are not stored entirely within the function. I would probably use a class:
class Replacer(object):
    c = 0

    @staticmethod  # if you like
    def replace(match):
        Replacer.c += 1
        ...
To answer your actual question, use getattr. It's a very clear and readable way to store data away for later. It should be pretty obvious to someone reading it what you're trying to do.
The mutable default argument version is an example of a common programming error (assuming you'll get a new list every time). For that reason alone I would avoid it. Someone reading it later might decide that it's a good idea without fully understanding the consequences. And even in this case, it seems as though your function would only work once (your c value is never reset to zero).
To me, both of these approaches look dodgy. The problem is crying out for a class instance. We don't normally think about functions as maintaining state between calls; that's what classes are for.
That said, I've used function attributes before for this sort of thing. Particularly if it's a one-shot function defined within other code (i.e. not possible for it to be used from anywhere else), just tacking on attributes to it is more concise than defining a whole new class and creating an instance of it.
I would never abuse default values for this. There's a large barrier to understanding because the natural purpose of default values is to provide default values of arguments, not to maintain state between calls. Plus a default argument invites you to supply a non-default value, and you typically get very strange behaviour if you do that with a function that is abusing default values to maintain state.

How to maintain lists and dictionaries between function calls in Python?

I have a function. Inside it I'm maintaining a dictionary of values.
I want that dictionary to be maintained between different function calls
Suppose the dict is:
a = {'a':1,'b':2,'c':3}
At the first call, say, I changed a['a'] to 100.
The dict becomes a = {'a':100,'b':2,'c':3}.
At another call, I changed a['b'] to 200.
I want the dict to be a = {'a':100,'b':200,'c':3}.
But in my code a['a'] doesn't remain 100; it changes back to the initial value 1.
I need an answer ASAP... I'm already late... Please help me friends...
You might be talking about a callable object.
class MyFunction(object):
    def __init__(self):
        self.rememberThis = dict()

    def __call__(self, arg1, arg2):
        # do something
        self.rememberThis['a'] = arg1
        return someValue

myFunction = MyFunction()
From then on, use myFunction as a simple function. You can access the rememberThis dictionary using myFunction.rememberThis.
You could use a static variable:
def foo(k, v):
    foo.a[k] = v

foo.a = {'a': 1, 'b': 2, 'c': 3}
foo('a', 100)
foo('b', 200)
print foo.a
Rather than forcing globals on the code base (that can be the decision of the caller) I prefer the idea of keeping the state related to an instance of the function. A class is good for this but doesn't communicate well what you are trying to accomplish and can be a bit verbose. Taking advantage of closures is, in my opinion, a lot cleaner.
def function_the_world_sees():
    a = {'a':1,'b':2,'c':3}

    def actual_function(arg0, arg1):
        a[arg0] = arg1
        return a
    return actual_function

stateful_function = function_the_world_sees()

stateful_function("b", 100)
stateful_function("b", 200)
The main caution to keep in mind is that assignments in "actual_function" occur within "actual_function". This means you can't reassign a to a different object. The workarounds I use are to put all of the variables I plan to reassign either into a single-element list per variable or into a dictionary, as in the sketch below.
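A sketch of that single-element-list workaround (the names are invented):
def make_counter():
    count = [0]   # one-element list so the inner function can effectively rebind the value
    def increment():
        count[0] += 1
        return count[0]
    return increment

tick = make_counter()
tick()   # 1
tick()   # 2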
If 'a' is being created inside the function, it goes out of scope when the function returns. Simply create it outside the function (and before the function is called). By doing this, the list/dict will not be deleted after the program leaves the function.
a = {'a':1,'b':2,'c':3}
# call your function here
This question doesn't have an elegant answer, in my opinion. The options are callable objects, default values, and attribute hacks. Callable objects are the right answer, but they bring in a lot of structure for what would be a single "static" declaration in another language. Default values are a minor change to the code, but it's kludgy and can be confusing to a new python programmer looking at your code. I don't like them because their existence isn't hidden from anyone who might be looking at your API.
I generally go with an attribute hack. My preferred method is:
def myfunct():
    if not hasattr(myfunct, 'state'): myfunct.state = list()
    # access myfunct.state in the body however you want
This keeps the declaration of the state in the first line of the function where it belongs, as well as keeping myfunct as a function. The downside is you do the attribute check every time you call the function. This is almost certainly not going to be a bottleneck in most code.
You can 'cheat' using Python's behavior for default arguments. Default arguments are only evaluated once; they get reused for every call of the function.
>>> def testFunction(persistent_dict={'a': 0}):
...     persistent_dict['a'] += 1
...     print persistent_dict['a']
...
>>> testFunction()
1
>>> testFunction()
2
This isn't the most elegant solution; if someone calls the function and passes in a parameter it will override the default, which probably isn't what you want.
If you just want a quick and dirty way to get the results, that will work. If you're doing something more complicated it might be better to factor it out into a class like S. Lott mentioned.
EDIT: Renamed the dictionary so it wouldn't hide the builtin dict as per the comment below.
Personally, I like the idea of the global statement. It doesn't introduce a global variable but states that a local identifier actually refers to one in the global namespace.
d = dict()
l = list()

def foo(bar, baz):
    global d
    global l
    l.append((bar, baz))
    d[bar] = baz
In python 3.0 there is also a "nonlocal" statement.
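For comparison, a sketch of the nonlocal variant (Python 3 only; the names are invented):
def make_accumulator():
    total = 0
    def add(value):
        nonlocal total   # rebind the enclosing function's variable directly
        total += value
        return total
    return add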
