Make an object that behaves like a slice - python

How can we make a class represent itself as a slice when appropriate?
This didn't work:
class MyThing(object):
    def __init__(self, start, stop, otherstuff):
        self.start = start
        self.stop = stop
        self.otherstuff = otherstuff

    def __index__(self):
        return slice(self.start, self.stop)
Expected output:
>>> thing = MyThing(1, 3, 'potato')
>>> 'hello world'[thing]
'el'
Actual output:
TypeError: __index__ returned non-(int,long) (type slice)
Inheriting from slice doesn't work either.

TL;DR: It's impossible to make custom classes replace slice for built-in types such as list and tuple.
The __index__ method exists purely to provide an index, which is by definition an integer in Python (see the Data Model). You cannot use it to resolve an object to a slice.
I'm afraid slice seems to be handled specially by Python. The interface requires an actual slice; providing its signature (which also includes the indices method) is not sufficient. As you've found out, you cannot inherit from it, so you cannot create new types of slices. Even Cython will not allow you to inherit from it.
So why is slice special? Glad you asked. Welcome to the innards of CPython. Please wash your hands after reading this.
So slice objects are described in slice.rst. Note these two guys:
.. c:var:: PyTypeObject PySlice_Type

   The type object for slice objects. This is the same as :class:`slice` in the
   Python layer.

.. c:function:: int PySlice_Check(PyObject *ob)

   Return true if ob is a slice object; ob must not be NULL.
Now, this is actually implemented in sliceobject.h as:
#define PySlice_Check(op) (Py_TYPE(op) == &PySlice_Type)
So only the exact slice type is allowed here. This check is actually used in list_subscript (and tuple subscript, ...) after attempting to use the index protocol (so having __index__ return a slice is a bad idea). A custom container class is free to override __getitem__ and use its own rules, as sketched below, but that's how list (and tuple, ...) do it.
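For instance, here is a minimal sketch of such a container (the SliceFriendly name and the attribute check are purely illustrative) that resolves MyThing-style objects into real slices before delegating the indexing:

class SliceFriendly(object):
    # Hypothetical wrapper: turns objects exposing .start/.stop
    # into real slice objects, then indexes the wrapped data.
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        if not isinstance(key, (int, slice)) and hasattr(key, 'start'):
            key = slice(key.start, key.stop)
        return self.data[key]

s = SliceFriendly('hello world')
print(s[MyThing(1, 3, 'potato')])   # prints 'el'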
Now, why is it not possible to subclass slice? Well, type actually has a flag indicating whether something can be subclassed. It is checked here and generates the error you have seen:
if (!PyType_HasFeature(base_i, Py_TPFLAGS_BASETYPE)) {
    PyErr_Format(PyExc_TypeError,
                 "type '%.100s' is not an acceptable base type",
                 base_i->tp_name);
    return NULL;
}
I haven't been able to track down where slice leaves this flag unset, but the fact that one gets this error means it does. This means you cannot subclass it.
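For reference, this is what that check looks like from the Python side:

>>> class MySlice(slice):
...     pass
Traceback (most recent call last):
  ...
TypeError: type 'slice' is not an acceptable base type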
Closing remarks: After remembering some long-forgotten C-(non)-skills, I'm fairly sure this is not about optimization in the strict sense. All existing checks and tricks would still work (at least those I've found).
After washing my hands and digging around on the internet, I've found a few references to similar "issues". Tim Peters has said all there is to say:
Nothing implemented in C is subclassable unless somebody volunteers the work
to make it subclassable; nobody volunteered the work to make the [insert name here]
type subclassable. It sure wasn't at the top of my list wink.
Also see this thread for a short discussion on non-subclassable types.
Practically all alternative interpreters replicate the behavior to various degrees: Jython, Pyston, IronPython and PyPy (didn't find out how they do it, but they do).

I'M SORRY FOR THE DARK MAGIC
Using forbiddenfruit and Python's built-in __new__ method, I was able to do this:
from forbiddenfruit import curse

class MyThing(int):
    def __new__(cls, *args, **kwargs):
        magic_slice = slice(args[0], args[1])
        curse(slice, 'otherstuff', args[2])
        return magic_slice

thing = MyThing(1, 3, 'thing')
print 'hello world'[thing]
print thing.otherstuff
output:
>>> el
>>> thing
I wrote it as a challenge just because everybody said it was impossible. I would never use it in production code: IT HAS SO MANY SIDE EFFECTS. You should think again about your structure and needs.

A slice can't be the return value here, as the __index__ method just doesn't support it. You can read more about the __index__ special method here.
I could only come up with a workaround that directly calls the function in your class:
class MyThing(object):
    def __init__(self, start, stop, otherstuff):
        self.start = start
        self.stop = stop
        self.otherstuff = otherstuff

    def __index__(self):
        return slice(self.start, self.stop)

thing = MyThing(1, 3, 'potato')
print 'Hello World'[thing.__index__()]
This will return el.

Related

Make my custom list Class return 'list' as type

I am writing a custom class which extends the default Python lists by adding some new functions such as shuffling, adding, multiplying, etc.
The code goes something like this:
class xlist(list):
    def __init__(self, original_list: list):
        self._olist = original_list

    def sumall(self) -> int:
        sum = 0
        for e in self._olist:
            sum += e
        return sum
    ...
...
But while doing some calculations I needed to get the type of an instance of an xlist. I want to do something like this:
>>> from xlist import xlist
>>> x = xlist([1, 2, 3])
>>> type(x)
When I do this I get <class 'xlist.xlist'>, but I want it to return list.
I am a little confused about metaclasses, which seem to be able to solve the problem.
Any help?
Why do you expect type(x) to return list if you're really creating an xlist? Your xlist inherits from list, so every xlist object is an instance of a list: it inherits all of list's behaviour (and extends it by adding some new functionality).
Note that:
x = xlist([1, 2, 3])
isinstance(x, list)
returns True. You might also want to have a look at Difference between type() and isinstance()
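A quick illustration of the difference:

class xlist(list):
    pass

x = xlist([1, 2, 3])
print(type(x) is list)       # False: type() reports the exact class
print(isinstance(x, list))   # True: isinstance() respects inheritance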
There are two ways for Python to check the class of an object: one is calling type and the other is checking the __class__ slot.
Most times both return the same thing, but one can modify the class (for example, by customizing attribute access on the metaclass) so that __class__ will "lie" and Python code using myobject.__class__ will get the "false" information.
However, underneath, the "true" __class__ slot in the type object will always hold a reference to the real type, and this can't be falsified. Any C extension, maybe even a few Python extensions, and the return of type(myobject) itself will see the real class.
Changing the contents of this slot actually changes the class of your instance. It is feasible from pure Python with a simple assignment, but there are guards in place on this assignment to ensure it is only done across types that have a compatible memory layout. Forcing it to change to an incompatible type (via an extension, or ctypes) will make your Python runtime segfault.
All that said, there is no reason to lie about your class to the users of your class: they should be able to "see" that the object they are holding is an xlist and not a list, and that xlists are also list objects, due to inheritance. Falsifying this information would be rather bad practice. On the other hand, there are a few calls in the Python stdlib itself that require the underlying object to really be a list and won't accept subtypes (notoriously, Python's json.dumps serialization). That call has a native code path and won't be fooled by customizing access to __class__. However, the same call also has a Python-only code path that is triggered by setting some of the optional arguments (for example, by passing indent=4 on the call). If that is what you are trying to achieve (fooling some code that requires a strict list), you have to check that, and if it is Python code, it is doable. In the specific case of json.dumps, you'd be better off monkeypatching the encoder to use a less strict check than falsifying your object, because I think the code there uses type for the checking.
So, with all of the above said, the "metaclass trick" to falsify the return of .__class__ can be as simple as defining __class__ as a property (no separate metaclass is actually needed for this part):

class xlist(list):
    def __init__(self, original_list: list):
        self._olist = original_list

    def sumall(self) -> int:
        sum = 0
        for e in self._olist:
            sum += e
        return sum

    @property
    def __class__(self):
        return list
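With that in place, code reading __class__ is fooled, while type() still tells the truth:

x = xlist([1, 2, 3])
print(x.__class__)           # <class 'list'> (the property lies)
print(type(x))               # the real xlist type; this can't be faked
print(isinstance(x, list))   # True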

Is it possible to proxy a Python str and make join work?

I'm trying to implement a lazily-evaluated str-like class. What I have now is something like:
class LazyString(object):
    __class__ = str

    def __init__(self, func):
        self._func = func

    def __str__(self):
        return self._func()
which works fine (for my purposes) in most cases, except one: str.join:
' '.join(['this', LazyString(lambda: 'works')])
fails with
TypeError: sequence item 1: expected string, LazyString found
And after some poking around there doesn't seem to be any magic function available behind this. join appears to be hard-coded inside the core implementation, and only a limited set of built-in types can make it work without actually being a str.
So am I really out of options here, or is there another way that I'm not aware of?
join takes strings, so give it strings:
' '.join(map(str, ['this', LazyString(lambda: 'works')]))
Python does not have support for the kind of transparent lazy evaluation you're looking for. If you want to force evaluation of a lazy object, you will have to do so explicitly, rather than having it done automatically when needed. Sometimes, Python will call some method of your object that you can rely on, such as __nonzero__ if you want a lazy boolean, but not always, and you won't generally be able to achieve full interoperability.
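Note that the faked __class__ attribute in the question is enough to fool isinstance, but not join, which checks the real type; forcing the evaluation explicitly is what works:

s = LazyString(lambda: 'works')
print(isinstance(s, str))                # True: isinstance consults __class__
print(' '.join(map(str, ['this', s])))   # 'this works'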

Pythonic way to Implement Data Types (Python 2.7)

The majority of my programming experience has been with C++. Inspired by Bjarne Stroustrup's talk here, one of my favorite programming techniques is "type-rich" programming: the development of new, robust data types that will not only reduce the amount of code I have to write by wrapping functionality into the type (for example, with vector addition, instead of newVec.x = vec1.x + vec2.x; newVec.y = ... etc., we can just use newVec = vec1 + vec2), but will also reveal problems in your code at compile time through the strong type system.
A recent project I have undertaken in Python 2.7 requires integer values that have upper and lower bounds. My first instinct is to create a new data type (class) that will have all the same behavior as a normal number in python, but will always be within its (dynamic) boundary values.
class BoundInt:
    def __init__(self, target=0, low=0, high=1):
        self.lowerLimit = low
        self.upperLimit = high
        self._value = target
        self._balance()

    def _balance(self):
        if self._value > self.upperLimit:
            self._value = self.upperLimit
        elif self._value < self.lowerLimit:
            self._value = self.lowerLimit
        self._value = int(round(self._value))

    def value(self):
        self._balance()
        return self._value

    def set(self, target):
        self._value = target
        self._balance()

    def __str__(self):
        return str(self._value)
This is a good start, but it requires accessing the meat of these BoundInt types like so:

x = BoundInt()
y = 4
x.set(y)            # it would be nicer to do something like x = y
print y             # prints "4"
print x             # prints "1"
z = 2 + x.value()   # again, it would be nicer to do z = 2 + x
print z             # prints "3"
We can add a large number of python's "magic method" definitions to the class to add some more functionality:
def __add__(self, other):
    return self._value + other

def __sub__(self, other):
    return self._value - other

def __mul__(self, other):
    return self._value * other

def __div__(self, other):
    return self._value / other

def __pow__(self, power):
    return self._value ** power

def __radd__(self, other):
    return self._value + other

# etc etc
Now the code is rapidly exploding in size, and there is a ton of repetition in what is being written, for very little return; this doesn't seem very pythonic at all.
Things get even more complicated when I start to want to construct BoundInt objects from normal python numbers (integers?), and other BoundInt objects
x = BoundInt()
y = BoundInt(x)
z = BoundInt(4)
Which, as far as I'm aware, requires the use of rather large/ugly if/else type-checking statements within the BoundInt() constructor, as Python does not support (C-style) overloading.
All of this feels terribly like trying to write C++ code in Python, a cardinal sin if one of my favorite books, Code Complete 2, is taken seriously. I feel like I am swimming against the dynamic typing current, instead of letting it carry me forward.
I very much want to learn to code Python "pythonically". What is the best way to approach this sort of problem domain? What are good resources to learn proper pythonic style?
There's plenty of code in the standard library, in popular PyPI modules, and in ActiveState recipes that does this kind of thing, so you're probably better off reading examples than trying to figure it out from first principles. Also, note that this is pretty similar to creating a list-like or dict-like class, which there are even more examples of.
However, there are some answers to what you want to do. I'll start with the most serious, then work backward.
Things get even more complicated when I start to want to construct BoundInt objects from normal python numbers (integers?), and other BoundInt objects
…
Which, as far as I'm aware requires the use of rather large/ugly if/else type checking statements within the BoundInt() constructor, as python does not support (c style) overloading.
Ah, but think about what you're doing: You're constructing a BoundInt from anything that can act like an integer, including, say, an actual int or a BoundInt, right? So, why not:
def __init__(self, target, low, high):
    self.target, self.low, self.high = int(target), int(low), int(high)
I'm assuming you've already added a __int__ method to BoundInt, of course (the equivalent of a C++ explicit operator int() const).
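For completeness, a minimal version of that method, assuming the _value attribute from the question's class:

def __int__(self):
    # Lets int(x) work on a BoundInt, which in turn makes the
    # int(target) call in __init__ accept BoundInt arguments.
    return self._value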
Also, keep in mind that the lack of overloading isn't as serious as you're thinking coming from C++, because there is no "copy constructor" for making copies; you just pass the object around, and all that gets taken care of under the covers.
For example, imagine this C++ code:
BoundInt foo(BoundInt param) { BoundInt local = param; return local; }
BoundInt bar;
BoundInt baz = foo(bar);
This copies bar to param, param to local, local to an unnamed "return value" variable, and that to baz. Some of these will be optimized out, and others (in C++11) will use move instead of copy, but still, you've got 4 conceptual invocations of the copy/move constructors/assignment operators.
Now look at the Python equivalent:
def foo(param): local = param; return local
bar = BoundInt()
baz = foo(bar)
Here, we've just got one BoundInt instance—the one that was explicitly created—and all we're doing is binding new names to it. Even assigning baz as a member of a new object that outlives the scope of bar and baz won't make a copy. The only thing that makes a copy is explicitly calling BoundInt(baz) again. (This isn't quite 100% true, because someone can always inspect your object and attempt to clone it from the outside, and pickle, deepcopy, etc. may actually do so… but in that case, they're still not calling a "copy constructor" that you or the compiler wrote.)
Now, what about forwarding all those operators to the value?
Well, one possibility is to do it dynamically. The details depend on whether you're in Python 3 or 2 (and, for 2, how far back you need to support). But the idea is that you just have a list of names, and for each one, you define a method with that name that calls the method of the same name on the value object. A rough sketch follows; for more detail, look for examples of dynamic method creation.
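Here is one such sketch (the method list and helper names are illustrative, not exhaustive; it assumes the BoundInt class from the question, with its _value attribute):

def _make_forwarder(name):
    # A factory function binds `name` at definition time, avoiding the
    # classic late-binding bug of closures created inside a loop.
    def forwarder(self, *args):
        # Unwrap any BoundInt arguments, then delegate to the
        # same-named method on the underlying int value.
        args = tuple(a._value if isinstance(a, BoundInt) else a for a in args)
        return getattr(self._value, name)(*args)
    forwarder.__name__ = name
    return forwarder

for _name in ('__add__', '__radd__', '__sub__', '__rsub__',
              '__mul__', '__rmul__'):
    setattr(BoundInt, _name, _make_forwarder(_name))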
So, is that Pythonic? Well, it depends.
If you're creating dozens of "integer-like" classes, then yes, it's certainly better than copy-paste code or adding a "compile-time" generation step, and it's probably better than adding an otherwise-unnecessary base class.
And if you're trying to work across many versions of Python and don't want to have to remember "which version am I supposed to stop supplying __cmp__ to act like int again?" type questions, I might go even further and get the list of methods out of int itself (take dir(int()) and blacklist out a few names).
But if you're just doing this one class, in, say, just Python 2.6-2.7 or just 3.3+, I think it's a toss-up.
A good class to read is the fractions.Fraction class in the standard library. It's clearly-written pure Python code. And it partially demonstrates both the dynamic and explicit mechanisms (because it explicitly defines each special method in terms of generic dynamic forwarding functions), and if you've got both 2.x and 3.x around you can compare and contrast the two.
Meanwhile, it seems like your class is underspecified. If x is a BoundInt and y is an int, should x+y really return an int (as it does in your code)? If not, do you need to bound it? What about y+x? What should x+=y do? And so on.
Finally, in Python, it's often worth making "value classes" like this immutable, even if the intuitive C++ equivalent would be mutable. For example, consider this:
>>> i = BoundInt(3, 0, 10)
>>> j = i
>>> i.set(5)
>>> j
5
I don't think you'd expect this. This wouldn't happen in C++ (for a typical value class), because j = i would create a new copy, but in Python, it's just binding a new name to the same copy. (It's equivalent to BoundInt &j = i, not BoundInt j = i.)
If you want BoundInt to be immutable, besides eliminating obvious things like set, also make sure not to implement __iadd__ and friends. If you leave out __iadd__, i += 2 will be turned into i = i.__add__(2): in other words, it will create a new instance, then rebind i to that new instance, leaving the old one alone.
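A tiny demonstration of that rebinding, using a hypothetical value class that defines __add__ but no __iadd__:

class Point(object):
    def __init__(self, x):
        self.x = x

    def __add__(self, other):
        return Point(self.x + other)

i = Point(3)
j = i
i += 2           # no __iadd__, so this runs as i = i + 2
print(i is j)    # False: i now names a brand-new object
print(j.x)       # 3: the original object was left alone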
There are likely many opinions on this. But regarding the proliferation of special methods, you will just have to do that to make it complete. But at least you only do it once, in one place. Also, the built-in number types can be subclassed. That's what I did for a similar implementation, which you can look at.
Your set method is an abomination. You do not create a number with a default value of zero and then change the number into some other number. That is very much trying to program C++ in Python, and will cause you endless amounts of headaches if you actually want to treat these the same way you do numbers, because every time you pass them to functions they are passed by reference (like everything in Python). So you'll end up with large amounts of aliasing in things you think you can treat like numbers, and you will almost certainly encounter bugs due to mutating the value of numbers you don't realise are aliased, or expecting to be able to retrieve a value stored in a dictionary with a BoundInt as a key by providing another BoundInt with the same value.
To me, high and low aren't data values associated with a particular BoundInt value, they're type parameters. I want a number 7 in the type BoundInt(1, 10), not a number 7 which is constrained to be between 1 and 10, all of which being a value in the type BoundInt.
If I really wanted to do something like this, the approach I would take would be to subclass int and treat BoundInt as a class factory; you give it a range, and it gives you the type of integers restricted to that range. You can apply that type to any "int-like" object and it will give you a value clamped to that range. Something like:
_bound_int_cache = {}

def BoundInt(low, high):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        class Tmp(int):
            def __new__(cls, value):
                value = max(value, cls.low)
                value = min(value, cls.high)
                return int.__new__(cls, value)
        # Assign after the class body: a "low = low" assignment inside a
        # class body can't see the enclosing function's variables.
        Tmp.low = low
        Tmp.high = high
        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]
(The cache is just to ensure that two different attempts to get the BoundInt type for the same low/high values give you the exact same class, not two different classes that behave the same way. Probably wouldn't matter in practice most of the time, but it seems nicer.)
You would use this like:
B = BoundInt(1, 10)
x = B(7)
The "class factory" approach means that if you have a small number of meaningful ranges in which you want to bound your integers, you can create the classes for those ranges globally (with meaningful names), and then use them exactly like regular classes.
Subclassing int makes these objects immutable (which is why the initialisation had to be done in __new__), which frees you from aliasing bugs (which people don't expect to have to worry about when they're programming with simple value types like numbers, and for good reason). It also gives you all the integer methods for free, and so these BoundInt types behave exactly as int, except that when you create one the value is clamped by the type. Unfortunately that means that all operations on these types return int objects, not BoundInt objects.
If you could come up with a way of reconciling the low/high values for the two different values involved in e.g. x + y, then you can override the special methods to make them return BoundInt values. The approaches that spring to mind are:

- Take the left operand's bounds and ignore the right operand (seems messy and asymmetrical; violates the assumption that x + y == y + x).
- Take the maximum low value and the minimum high value. It's nicely symmetrical, and you can treat numeric values that don't have low and high values as if they were sys.minint and sys.maxint (i.e. just use the bounds from the other value). Doesn't make a whole lot of sense if the ranges don't overlap at all, because you'll end up with an empty range, but operating on such numbers together probably doesn't make a whole lot of sense anyway.
- Take the minimum low value and the maximum high value. Also symmetrical, but here you probably want to explicitly ignore normal numbers rather than pretending they're BoundInt values that can range over the whole integer range.
Any of the above could work, and any of the above will probably surprise you at some point (e.g. negating a number constrained to be in a positive range will always give you the smallest positive number in the range, which seems weird to me).
If you take this approach, you probably don't want to subclass int. Because if you have normalInt + boundedInt, then normalInt would handle the addition without respecting your code. You instead want int not to recognise boundedInt as an int value, so that int's __add__ won't work and Python will give your class a chance to try __radd__. But I would still treat your class as "immutable", and make every operation that comes up with a new number construct a new object; mutating numbers in place is practically guaranteed to cause bugs sometime.
So I'd handle that approach something like this:
class BoundIntBase(object):
    # Don't use this class directly; use a subclass that specifies
    # low and high as class attributes.
    def __init__(self, value):
        self.value = min(self.high, max(self.low, int(value)))

    def __int__(self):
        return self.value

# add binary operations to BoundIntBase
for method in ['__add__', '__radd__']:  # ..., plus the other operators you need
    def tmp(self, other, method=method):  # bind method now, not at call time
        try:
            low = min(self.low, other.low)
            high = max(self.high, other.high)
        except AttributeError:
            cls = type(self)
        else:
            cls = BoundInt(low, high)
        v = getattr(int(self), method)(int(other))
        return cls(v)
    tmp.__name__ = method
    setattr(BoundIntBase, method, tmp)
_bound_int_cache = {}

def BoundInt(low, high):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        # This time the generated class subclasses BoundIntBase rather
        # than int, so int doesn't hijack mixed arithmetic; clamping
        # happens in BoundIntBase.__init__.
        class Tmp(BoundIntBase):
            pass
        Tmp.low = low
        Tmp.high = high
        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]
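Assuming the sketch above is filled out with the operators you need, usage would look something like this:

B = BoundInt(1, 10)
x = B(7)
y = B(4)
z = x + y                  # goes through the generated __add__
print(int(z))              # 10: the sum (11) is clamped to the shared range
print(type(z).__name__)    # BoundInt(1, 10)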
Still seems like more code than it should be, but what you're trying to do is actually more complicated than you think it is.
A type that behaves exactly like numbers in all situations needs many special methods due to the rich syntax support in Python (it seems no other kind of type requires so many methods; e.g., it is much simpler to define types that behave like a list or dict: a couple of methods and you have a Sequence). There are several ways to make the code less repetitive:

- ABC classes such as numbers.Integral provide default implementations for some methods, e.g., if __add__ and __radd__ are implemented in a subclass then __sub__ and __rsub__ are available automatically.
- fractions.Fraction uses _operator_fallbacks to define the __r*__ variants and to provide fallback operators for dealing with other numeric types:
  __op__, __rop__ = _operator_fallbacks(monomorphic_operator, operator.op)
- Python allows you to generate/modify a class dynamically in a factory function/metaclass, e.g., Can anyone help condense this Python code?. Even exec could be used in (very) rare cases, e.g., namedtuple().
- Numbers are immutable in Python, so you should use __new__ instead of __init__.
- Rare cases that are not covered by __new__ could be handled by from_sometype(cls, d: sometype) -> your_type class methods. And in reverse, cases that are not covered by special methods could use as_sometype(self) -> sometype methods.

A simpler solution in your case might be to define a higher-level type specific to your application domain. Number abstraction might be too low-level; e.g., decimal.Decimal is more than 6 KLOC.

Is making in-place operations return the object a bad idea?

I'm talking mostly about Python here, but I suppose this probably holds for most languages. If I have a mutable object, is it a bad idea to make an in-place operation also return the object? It seems like most examples just modify the object and return None. For example, list.sort.
Yes, it is a bad idea. The reason is that if in-place and non-in-place operations have apparently identical output, then programmers will frequently mix up in-place and non-in-place operations (list.sort() vs. sorted()), and that results in hard-to-detect errors.
In-place operations returning themselves can allow you to perform "method chaining"; however, this is bad practice because you may bury functions with side effects in the middle of a chain by accident.
To prevent errors like this, method chains should only have one method with side effects, and that function should be at the end of the chain. Functions before it in the chain should transform the input without side effects (for instance, navigating a tree, slicing a string, etc.). If in-place operations return themselves, then a programmer is bound to accidentally use one in place of an alternative function that returns a copy and therefore has no side effects (again, list.sort() vs. sorted()), which may result in an error that is difficult to debug.
This is the reason Python standard library functions always either return a copy or return None and modify objects in-place, but never modify objects in-place and also return themselves. Other Python libraries like Django also follow this practice (see this very similar question about Django).
Returning the modified object from the method that modified it can have some benefits, but is not recommended in Python. Returning self after a modification operation will allow you to perform method chaining on the object, which is a convenient way of executing several methods on the same object; it's a very common idiom in object-oriented programming. And in turn, method chaining allows a straightforward implementation of fluent interfaces. Also, it allows some functional-programming idioms to be expressed more easily.
To name a few examples: in Python, the Moka library uses method chaining. In Java, the StringBuilder class allows multiple append() invocations on the same object. In JavaScript, jQuery uses method chaining extensively. Smalltalk takes this idea to the next level: by default, all methods return self unless otherwise specified (thereby encouraging method chaining); contrast this with Python, which returns None by default.
The use of this idiom is not common in Python, because Python abides by the Command/Query Separation Principle, which states that "every method should either be a command that performs an action, or a query that returns data to the caller, but not both".
All things considered, whether it's a good or bad idea to return self at the end is a matter of programming culture and convention, mixed with personal taste. As mentioned above, some programming languages encourage this (like Smalltalk) whereas others discourage it (like Python). Each point of view has advantages and disadvantages, open to heated discussions. If you're a by-the-book Pythonist, better refrain from returning self - just be aware that sometimes it can be useful to break this rule.
I suppose it depends on the use case. I don't see why returning an object from an in-place operation would hurt, other than maybe you won't use the result, but that's not really a problem if you're not being super-fastidious about pure functionalism. I like the call-chaining pattern, such as jQuery uses, so I appreciate it when functions return the object they've acted upon, in case I want to use it further.
The answers here about not returning from in-place operations messed me up for a bit until I came across this other SO post that links to the Python documentation (which I thought I read, but must have only skimmed). The documentation, in reference to in-place operators, says:
These methods should attempt to do the operation in-place (modifying self) and return the result (which could be, but does not have to be, self).
When I tried to use an in-place operation without returning self, the name became bound to None. In the example below, it will complain that vars requires an object with __dict__: looking at self at that point shows it has become None.
# Skipping type enforcement and such.
from copy import copy
import operator
import imported_utility  # example.

class A:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def one(self, scaler):
        self *= scaler
        return imported_utility(vars(self))

    def two(self, scaler):
        tmp = self * scaler
        return imported_utility(vars(tmp))

    def three(self, scaler):
        return imported_utility(vars(self * scaler))

    # ... addition, subtraction, etc.; as below.

    def __mul__(self, other):
        tmp = copy(self)
        tmp._inplace_operation(other, operator.imul)
        return tmp

    def __imul__(self, other):  # fails.
        self._inplace_operation(other, operator.imul)

    # Fails for __imul__.
    def _inplace_operation(self, other, op):
        self.a = op(self.a, other)
        self.b = op(self.b, other)
* works (two and three), but *= (one) does not until self is returned.
def __imul__(self, other):
    return self._inplace_operation(other, operator.imul)

def _inplace_operation(self, other, op):
    self.a = op(self.a, other)
    self.b = op(self.b, other)
    return self
I do not fully understand this behavior, but a follow-up comment to the referenced post explains it: without returning self, the in-place method does modify the object, but the augmented assignment then rebinds the name to the method's return value, which is None. Unless self is returned, Python does not know what to rebind the name to. That behavior can be seen by keeping a separate reference to the object.

Parameter names in Python functions that take single object or iterable

I have some functions in my code that accept either an object or an iterable of objects as input. I was taught to use meaningful names for everything, but I am not sure how to comply here. What should I call a parameter that can be a single object or an iterable of objects? I have come up with two ideas, but I don't like either of them:
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
param - Some generic name. This makes clear that it can be several things, but does explain nothing about what the parameter is used for.
Normally I call iterables of objects just the plural of what I would call a single object. I know this might seem a little bit compulsive, but Python is supposed to be (among other things) about readability.
I have some functions in my code that accept either an object or an iterable of objects as input.
This is a very exceptional and often very bad thing to do. It's trivially avoidable.
i.e., pass [foo] instead of foo when calling this function.
The only time you can justify doing this is when (1) you have an installed base of software that expects one form (iterable or singleton) and (2) you have to expand it to support the other use case. So. You only do this when expanding an existing function that has an existing code base.
If this is new development, Do Not Do This.
I have come up with two ideas, but I don't like either of them:
[Only two?]
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
What? Are you saying you provide NO other documentation, and no other training? No support? No advice? Who is the "someone not used to it"? Talk to them. Don't assume or imagine things about them.
Also, don't use Leading Upper Case Names.
param - Some generic name. This makes clear that it can be several things, but does explain nothing about what the parameter is used for.
Terrible. Never. Do. This.
I looked in the Python library for examples. Most of the functions that do this have simple descriptions.
http://docs.python.org/library/functions.html#isinstance
isinstance(object, classinfo)
They call it "classinfo" and it can be a class or a tuple of classes.
You could do that, too.
You must consider the common use case and the exceptions. Follow the 80/20 rule.
80% of the time, you can replace this with an iterable and not have this problem.
In the remaining 20% of the cases, you have an installed base of software built around an assumption (either iterable or single item) and you need to add the other case. Don't change the name, just change the documentation. If it used to say "foo" it still says "foo" but you make it accept an iterable of "foo's" without making any change to the parameters. If it used to say "foo_list" or "foo_iter", then it still says "foo_list" or "foo_iter" but it will quietly tolerate a singleton without breaking.
80% of the code is the legacy ("foo" or "foo_list")
20% of the code is the new feature ("foo" can be an iterable or "foo_list" can be a single object.)
I guess I'm a little late to the party, but I'm surprised that nobody suggested a decorator.
def withmany(f):
    def many(many_foos):
        for foo in many_foos:
            yield f(foo)
    f.many = many
    return f

@withmany
def process_foo(foo):
    return foo + 1

processed_foo = process_foo(foo)
for processed_foo in process_foo.many(foos):
    print processed_foo
I saw a similar pattern in one of Alex Martelli's posts but I don't remember the link off hand.
It sounds like you're agonizing over the ugliness of code like:
def ProcessWidget(widget_thing):
    # Infer if we have a singleton instance and make it a
    # length 1 list for consistency
    if isinstance(widget_thing, WidgetType):
        widget_thing = [widget_thing]

    for widget in widget_thing:
        # ...
My suggestion is to avoid overloading your interface to handle two distinct cases. I tend to write code that favors re-use and clear naming of methods over clever dynamic use of parameters:
def ProcessOneWidget(widget):
    # ...

def ProcessManyWidgets(widgets):
    for widget in widgets:
        ProcessOneWidget(widget)
Often, I start with this simple pattern, but then have the opportunity to optimize the "Many" case when there are efficiencies to gain that offset the additional code complexity and partial duplication of functionality. If this convention seems overly verbose, one can opt for names like "ProcessWidget" and "ProcessWidgets", though the difference between the two is a single easily missed character.
You can use *args magic (varargs) to make your params always be iterable.
Pass a single item or multiple known items as normal function args like func(arg1, arg2, ...) and pass iterable arguments with an asterisk before, like func(*args)
Example:
# magic *args function
def foo(*args):
    print args

# many ways to call it
foo(1)
foo(1, 2, 3)

args1 = (1, 2, 3)
args2 = [1, 2, 3]
args3 = iter((1, 2, 3))

foo(*args1)
foo(*args2)
foo(*args3)
Can you name your parameter in a very high-level way? People who read the code are more interested in knowing what the parameter represents ("clients") than what its type is ("list_of_tuples"); the type can be described in the function's documentation string, which is a good thing since it might change in the future (the type is sometimes an implementation detail).
I would do one thing:

def myFunc(manyFoos):
    if not type(manyFoos) in (list, tuple):
        manyFoos = [manyFoos]
    # do stuff here

so then you don't need to worry anymore about its name.
In a function you should try to have one action, accept the same parameter type, and return the same type.
Instead of filling the functions with ifs you could have 2 functions.
Since you don't care exactly what kind of iterable you get, you could try to get an iterator for the parameter using iter(). If iter() raises a TypeError exception, the parameter is not iterable, so you then create a list or tuple of the one item, which is iterable and Bob's your uncle.
def doIt(foos):
    try:
        iter(foos)
    except TypeError:
        foos = [foos]

    for foo in foos:
        pass  # do something here
The only problem with this approach is if foo is a string. A string is iterable, so passing in a single string rather than a list of strings will result in iterating over the characters in a string. If this is a concern, you could add an if test for it. At this point it's getting wordy for boilerplate code, so I'd break it out into its own function.
def iterfy(iterable):
    if isinstance(iterable, basestring):
        iterable = [iterable]
    try:
        iter(iterable)
    except TypeError:
        iterable = [iterable]
    return iterable

def doIt(foos):
    for foo in iterfy(foos):
        pass  # do something
Unlike some of those answering, I like doing this, since it eliminates one thing the caller could get wrong when using your API. "Be conservative in what you generate but liberal in what you accept."
To answer your original question, i.e. what you should name the parameter, I would still go with "foos" even though you will accept a single item, since your intent is to accept a list. If it's not iterable, that is technically a mistake, albeit one you will correct for the caller since processing just the one item is probably what they want. Also, if the caller thinks they must pass in an iterable even of one item, well, that will of course work fine and requires very little syntax, so why worry about correcting their misapprehension?
I would go with a name explaining that the parameter can be an instance or a list of instances. Say one_or_more_Foo_objects. I find it better than the bland param.
I'm working on a fairly big project now and we're passing maps around and just calling our parameter map. The map contents vary depending on the function that's being called. This probably isn't the best situation, but we reuse a lot of the same code on the maps, so copying and pasting is easier.
I would say instead of naming it what it is, you should name it what it's used for. Also, just be careful that you don't use "in" on something that is not iterable.
