Let's say we have a class which has an instance method that accepts another instance of that class, and then returns a new instance of that class.
An example of this type of class is int. It has the __mul__ method, which accepts another integer and returns a new integer: the product of both numbers.
Here's the problem. I have a class that implements a method like __mul__. I have a list of instances of this class, and I want to apply the aforementioned method of the last object to the object before it, then take the result of that and apply it to the one before it, etc., until we have processed the entire list, and have ourselves one object.
A concrete example looks like this. Imagine we have a list of objects...
my_objs = [do, re, me, fa, so, la, te, do]
... And imagine they have the "combine" method, which follows the pattern outlined above, and we want to apply the procedure I outlined to it. You might think of it like this ...
my_objs_together = do.combine(re.combine(me.combine(fa.combine(so.combine(la.combine(te.combine(do)))))))
That's pretty gnarly, obviously. This makes me want to write a generic function like this...
def together(list_of_objects, method_name):
    combined = list_of_objects[0]
    for obj in list_of_objects[1:]:
        combined = getattr(combined, method_name)(obj)
    return combined
...But it occurs to me that there's likely already a standard library function that does this, right?
It's reduce! (I was in the middle of writing the question when I found it :/)
https://docs.python.org/2/library/functions.html#reduce
Apply function of two arguments cumulatively to the items of iterable,
from left to right, so as to reduce the iterable to a single value.
For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
((((1+2)+3)+4)+5).
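Here is a minimal Python 3 sketch of `together` rewritten with `functools.reduce` (in Python 3, `reduce` is no longer a builtin). The `Num` class and its `combine` method are hypothetical stand-ins. Note that `reduce` folds from the left, `((a.combine(b)).combine(c))...`, whereas the nested expression in the question groups from the right. The two agree only when `combine` is associative, so a right fold is sketched as well.

```python
from functools import reduce


class Num:
    """Hypothetical example class whose combine() returns a new instance."""

    def __init__(self, value):
        self.value = value

    def combine(self, other):
        # Subtraction is deliberately non-associative, to expose the fold order.
        return Num(self.value - other.value)


def together(objs, method_name):
    """Left fold: ((objs[0] op objs[1]) op objs[2]) ..."""
    return reduce(lambda acc, obj: getattr(acc, method_name)(obj), objs)


def together_right(objs, method_name):
    """Right fold: objs[0] op (objs[1] op (... op objs[-1]))."""
    return reduce(lambda acc, obj: getattr(obj, method_name)(acc), reversed(objs))


nums = [Num(10), Num(3), Num(2)]
print(together(nums, "combine").value)        # (10 - 3) - 2 -> 5
print(together_right(nums, "combine").value)  # 10 - (3 - 2) -> 9
```

For an associative method such as int.__mul__, either version gives the same result.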
I need a container that can collect a number of objects and provides some reporting functionality on the container's elements. Essentially, I'd like to be able to do:
magiclistobject = MagicList()
magiclistobject.report() ### generates all my needed info about the list content
So I thought of subclassing the normal list and adding a report() method. That way, I get to use all the built-in list functionality.
class SubClassedList(list):
    def __init__(self):
        list.__init__(self)
    def report(self):  # forgive the silly example
        if 999 in self:
            print("999 Alert!")
Instead, I could also create my own class that has a magiclist attribute, but I would then have to create new methods for appending, extending, etc., if I want to get to the list using:
magiclistobject.append() # instead of magiclistobject.list.append()
I would need something like this (which seems redundant):
class MagicList():
    def __init__(self):
        self.list = []
    def append(self,element):
        self.list.append(element)
    def extend(self,element):
        self.list.extend(element)
    # more list functionality as needed...
    def report(self):
        if 999 in self.list:
            print("999 Alert!")
I thought that subclassing the list would be a no-brainer. But this post makes it sound like a no-no. Why?
One reason extending list might be bad is that it ties your 'MagicReport' object too closely to the list. For example, a Python list supports the following methods:
append
count
extend
index
insert
pop
remove
reverse
sort
It also supports a whole host of other operations (adding, comparisons using < and >, slicing, etc.).
Are all of those operations things that your 'MagicReport' object actually wants to support? For example, the following is legal Python:
b = [1, 2]
b *= 3
print b # [1, 2, 1, 2, 1, 2]
This is a pretty contrived example, but if you inherit from 'list', your 'MagicReport' object will do exactly the same thing if somebody inadvertently does something like this.
As another example, what if you try slicing your MagicReport object?
m = MagicReport()
# Add stuff to m
piece = m[2:3]
print(type(piece))
You'd probably expect the slice to be another MagicReport object, but it's actually a list. You'd need to override __getslice__ (or, in Python 3, __getitem__, which receives a slice object) to avoid this surprising behavior, which is a bit of a pain.
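A minimal Python 3 sketch of both the surprise and one way around it. `MagicReport` and `SliceAwareReport` are hypothetical classes written for this illustration, not code from the question.

```python
class MagicReport(list):
    """Hypothetical list subclass used only to illustrate slicing."""

    def report(self):
        return "999 Alert!" if 999 in self else "ok"


class SliceAwareReport(MagicReport):
    """Same idea, but slices come back as SliceAwareReport, not list."""

    def __getitem__(self, index):
        result = super().__getitem__(index)
        if isinstance(index, slice):
            # Re-wrap the plain list that list.__getitem__ returns for slices.
            return type(self)(result)
        return result


m = MagicReport([1, 2, 999, 4])
print(type(m[1:3]).__name__)   # list: the report() method is gone

s = SliceAwareReport([1, 2, 999, 4])
print(type(s[1:3]).__name__)   # SliceAwareReport
print(s[1:3].report())         # 999 Alert!
```

Other operations such as `s + s` or `s * 2` still return plain lists, so each of them would need the same treatment. That is part of the burden described above.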
It also makes it harder for you to change the implementation of your MagicReport object. If you end up needing to do more sophisticated analysis, it often helps to be able to change the underlying data structure into something more suited for the problem.
If you subclass list, you could get around this problem by just providing new append, extend, etc. methods so that you don't change the interface, but you won't have any clear way of determining which of the list methods are actually being used unless you read through the entire codebase. However, if you use composition, keeping a list as a field and writing methods only for the operations you support, you know exactly what needs to be changed.
I actually ran into a scenario very similar to yours at work recently. I had an object which contained a collection of 'things' which I first internally represented as a list. As the requirements of the project changed, I ended up changing the object to internally use a dict, a custom collections object, then finally an OrderedDict in rapid succession. At least in my experience, composition makes it much easier than inheritance to change how something is implemented.
That being said, I think extending list might be ok in scenarios where your 'MagicReport' object is legitimately a list in all but name. If you do want to use MagicReport as a list in every single way, and don't plan on changing its implementation, then it just might be more convenient to subclass list and just be done with it.
Though in that case, it might be better to just use a list and write a 'report' function. I can't imagine you needing to report the contents of the list more than once, and creating a custom object with a custom method just for that purpose might be overkill (though this obviously depends on what exactly you're trying to do).
As a general rule, whenever you ask yourself "should I inherit or have a member of that type", choose not to inherit. This rule of thumb is known as "favour composition over inheritance".
The reasoning is that composition is appropriate where you want to use the features of another class, while inheritance is appropriate if other code needs to use the features of the other class through the class you are creating.
I am trying to write a primes module in Python. One thing I would like to be able to write is
>>> primes.primesLessThan(12)
[2, 3, 5, 7, 11]
However, I would also like to be able to write
>>> primes.primesLessThan.Sundaram(12)
[2, 3, 5, 7, 11]
to force it to use the Sieve of Sundaram. My original idea was to make primesLessThan a class with several static methods, but since __init__ can't return anything, this didn't let me achieve the first example. Would this be better done as a separate module that primes imports, or is there something else I missed?
As a rule of thumb, if you have a class without any instance variables, an empty __init__ method and just a bunch of static methods, then it's probably going to be simpler to organize it as a module instead.
# sieves module
def Sundaram(n):
    return [2,3,5,7]

def Eratosthenes(n):
    return [2,3,5,7]
And then you can use the functions from the module
import primes.sieves
primes.sieves.Sundaram(12)
Finally, Python functions are first-class objects: they can be passed as function parameters or stored in data structures. This means that if you ever need to write some code that depends on an algorithm choice, you can just pass that as a parameter.
from primes.sieves import Sundaram, Eratosthenes

def test_first_primes(algorithm):
    return algorithm(10) == [2,3,5,7]

print(test_first_primes(Sundaram))
print(test_first_primes(Eratosthenes))
There are two ways I can think of to get these kinds of semantics:
Make primes a class, and then make primesLessThan a property. It would also be a class, which implements __iter__ etc. to simulate a list, while also having some subfunctions. primesLessThan would be a constructor to that class, with the argument having a default to allow passing through.
Make primes itself support __getitem__/__iter__/etc. You can still use properties (with default), but make primesLessThan just set some internal variable in the class, and then return self. This lets you chain them in any order, i.e. primes.Sundaram.primesLessThan(12) would work the same way as primes.primesLessThan.Sundaram(12), though that looks strange to me.
Either one of these is going to be a bit weird in its return values: you can create something that acts like a list, but it obviously won't be one. You can have repr show it like a list, and you'll be able to iterate over it like a list (i.e. for prime in primes.Sundaram(12)), but it can't return an actual list for obvious reasons.
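A different, arguably simpler route (swapped in here, not proposed by either answer): `__init__` cannot return a value, but `__call__` can. So `primesLessThan` could be a module-level instance of a class whose `__call__` picks a default sieve and whose static methods force a particular one. The return value is then a real list. The class name `_PrimesLessThan` and the choice of default are assumptions, and the sieves are plain textbook versions written for this sketch.

```python
def _eratosthenes(n):
    """Primes p with p < n via the Sieve of Eratosthenes."""
    if n <= 2:
        return []
    is_prime = [True] * n
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, n, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def _sundaram(n):
    """Primes p with p < n via the Sieve of Sundaram."""
    k = (n - 2) // 2  # odd candidates are 2*i + 1 for 1 <= i <= k
    marked = [False] * max(k + 1, 0)
    for i in range(1, k + 1):
        j = i
        # 2*(i + j + 2ij) + 1 == (2i + 1)(2j + 1), so that candidate is composite.
        while i + j + 2 * i * j <= k:
            marked[i + j + 2 * i * j] = True
            j += 1
    odd_primes = [2 * i + 1 for i in range(1, k + 1) if not marked[i]]
    return ([2] if n > 2 else []) + odd_primes


class _PrimesLessThan:
    Eratosthenes = staticmethod(_eratosthenes)
    Sundaram = staticmethod(_sundaram)

    def __call__(self, n):
        # Default algorithm when none is requested explicitly.
        return self.Eratosthenes(n)


primesLessThan = _PrimesLessThan()

print(primesLessThan(12))           # [2, 3, 5, 7, 11]
print(primesLessThan.Sundaram(12))  # [2, 3, 5, 7, 11]
```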
Out of curiosity, is it more desirable to explicitly pass functions to other functions, or to let the function call other functions from within? Is this a case of "Explicit is better than implicit"?
For example (the following is only to illustrate what I mean):
import functools
import operator
def foo(x, y):
    return 1 if x > y else 0
partialfun = functools.partial(foo, 1)
def bar(xs, ys):
    return partialfun(sum(map(operator.mul, xs, ys)))
>>> bar([1,2,3], [4,5,6])
0
--or--
import functools
import operator
def foo(x, y):
    return 1 if x > y else 0
partialfun = functools.partial(foo, 1)
def bar(fn, xs, ys):
    return fn(sum(map(operator.mul, xs, ys)))
>>> bar(partialfun, [1,2,3], [4,5,6])
0
There's not really any difference between functions and anything else in this situation. You pass something as an argument if it's a parameter that might vary over different invocations of the function. If the function you are calling (bar in your example) is always calling the same other function, there's no reason to pass that as an argument. If you need to parameterize it so that you can use many different functions (i.e., bar might need to call many functions besides partialfun, and needs to know which one to call), then you need to pass it as an argument.
Generally, yes, but as always, it depends. What you are illustrating here is known as dependency injection. Generally, it is a good idea, as it allows separation of variability from the logic of a given function. This means, for example, that it will be extremely easy for you to test such code.
import operator

# To test the process performed in bar(), we can "inject" a function
# which simply returns its argument
def dummy(x):
    return x

def bar(fn,xs,ys):
    return fn(sum(map(operator.mul,xs,ys)))

>>> assert bar(dummy, [1,2,3], [4,5,6]) == 32
It depends very much on the context.
Basically, if the function is an argument to bar, then it's the responsibility of the caller to know how to implement that function. bar doesn't have to care. But consequently, bar's documentation has to describe what kind of function it needs.
Often this is very appropriate. The obvious example is the map builtin function. map implements the logic of applying a function to each item in a list, and giving back a list of results. map itself neither knows nor cares about what the items are, or what the function is doing to them. map's documentation has to describe that it needs a function of one argument, and each caller of map has to know how to implement or find a suitable function. But this arrangement is great; it allows you to pass a list of your custom objects, and a function which operates specifically on those objects, and map can go away and do its generic thing.
But often this arrangement is inappropriate. A function gives a name to a high level operation and hides the internal implementation details, so you can think of the operation as a unit. Allowing part of its operation to be passed in from outside as a function parameter exposes that it works in a way that uses that function's interface.
A more concrete (though somewhat contrived) example may help. Let's say I've implemented data types representing Person and Job, and I'm writing a function name_and_title for formatting someone's full name and job title into a string, for client code to insert into email signatures or on letterhead or whatever. It's obviously going to take a Person and a Job. It could potentially take a function parameter to let the caller decide how to format the person's name: something like lambda firstname, lastname: lastname + ', ' + firstname. But to do this is to expose that I'm representing people's names with a separate first name and last name. If I want to change to supporting a middle name, then either name_and_title won't be able to include the middle name, or I have to change the type of the function it accepts. When I realise that some people have 4 or more names and decide to change to storing a list of names, then I definitely have to change the type of function name_and_title accepts.
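A small sketch of that trade-off. `Person`, `Job` and `name_and_title` are the answer's hypothetical names; the field names, the output format and `name_and_title_injected` are assumptions made for this illustration. With the formatter parameter, every caller's lambda depends on the `(first_name, last_name)` shape of the data.

```python
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str


@dataclass
class Job:
    title: str


def name_and_title(person, job):
    """Name formatting is an internal detail; callers never see the fields."""
    return f"{person.first_name} {person.last_name}, {job.title}"


def name_and_title_injected(person, job, format_name):
    """Caller supplies the formatter, so the two-part name leaks to every caller."""
    return f"{format_name(person.first_name, person.last_name)}, {job.title}"


ada = Person("Ada", "Lovelace")
analyst = Job("Analyst")
print(name_and_title(ada, analyst))  # Ada Lovelace, Analyst
print(name_and_title_injected(ada, analyst, lambda first, last: last + ", " + first))
# Lovelace, Ada, Analyst
```

If Person later stores a list of names, only name_and_title has to change, whereas every lambda passed to name_and_title_injected would have to change too.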
So for your bar example, we can't say which is better, because it's an abstract example with no meaning. It depends on whether the call to partialfun is an implementation detail of whatever bar is supposed to be doing, or whether the call to partialfun is something that the caller knows about (and might want to do something else). If it's "part of" bar, then it shouldn't be a parameter. If it's "part of" the caller, then it should be a parameter.
It's worth noting that bar could have a huge number of function parameters. You call sum, map, and operator.mul, which could all be parameterised to make bar more flexible:
def bar(fn, xs, ys, g, h, i):
    return fn(g(h(i, xs, ys)))
And the way in which g is called on the output of h could be abstracted too:
def bar(fn, xs, ys, g, h, i, j):
    return fn(j(g, h(i, xs, ys)))
And we can keep going on and on, until bar doesn't do anything at all, and everything is controlled by the functions passed in, and the caller might as well have just directly done what they want done rather than writing 100 functions to do it and passing those to bar to execute the functions.
So there really isn't a definite answer one way or the other that applies all the time. It depends on the particular code you're writing.
I am working with a Python object that implements __add__, but does not subclass int. MyObj1 + MyObj2 works fine, but sum([MyObj1, MyObj2]) led to a TypeError, because sum() first attempts 0 + MyObj. In order to use sum(), my object needs __radd__ to handle 0 + MyObj, or I need to provide an empty object as the start parameter. The object in question is not designed to be empty.
Before anyone asks, the object is not list-like or string-like, so use of join() or itertools would not help.
Edit for details: the module has a SimpleLocation and a CompoundLocation. I'll abbreviate Location to Loc. A SimpleLoc contains one right-open interval, i.e. [start, end). Adding SimpleLocs yields a CompoundLoc, which contains a list of the intervals, e.g. [[3, 6), [10, 13)]. End uses include iterating through the union, e.g. [3, 4, 5, 10, 11, 12], checking length, and checking membership.
The numbers can be relatively large (say, smaller than 2^32 but commonly 2^20). The intervals probably won't be extremely long (100-2000, but could be longer). Currently, only the endpoints are stored. I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.
Questions I've looked at:
python's sum() and non-integer values
why there's a start argument in python's built-in sum function
TypeError after overriding the __add__ method
I'm considering two solutions. One is to avoid sum() and use the loop offered in this comment. I don't understand why sum() begins by adding the 0th item of the iterable to 0 rather than adding the 0th and 1st items (like the loop in the linked comment); I hope there's an arcane integer optimization reason.
My other solution is as follows; while I don't like the hard-coded zero check, it's the only way I've been able to make sum() work.
# ...
def __radd__(self, other):
    # This allows sum() to work (the default start value is zero)
    if other == 0:
        return self
    return self.__add__(other)
In summary, is there another way to use sum() on objects that can neither be added to integers nor be empty?
Instead of sum, use:
import operator
from functools import reduce
reduce(operator.add, seq)
In Python 2, reduce was a builtin, so this looks like:
import operator
reduce(operator.add, seq)
Reduce is generally more flexible than sum - you can provide any binary function, not only add, and you can optionally provide an initial element while sum always uses one.
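Here is a minimal Python 3 sketch of the difference. `Loc` is a hypothetical class that defines `__add__` but no `__radd__` (it just concatenates interval lists). `sum` fails on its initial `0 + item`, while `reduce` starts from the first item.

```python
import operator
from functools import reduce


class Loc:
    """Hypothetical value with __add__ but no __radd__ and no empty form."""

    def __init__(self, *intervals):
        self.intervals = list(intervals)

    def __add__(self, other):
        if not isinstance(other, Loc):
            return NotImplemented
        return Loc(*(self.intervals + other.intervals))


locs = [Loc((3, 6)), Loc((10, 13)), Loc((20, 25))]

try:
    sum(locs)  # evaluates 0 + locs[0] first
except TypeError as exc:
    print("sum failed:", exc)

total = reduce(operator.add, locs)  # ((locs[0] + locs[1]) + locs[2])
print(total.intervals)  # [(3, 6), (10, 13), (20, 25)]
```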
Also note: (Warning: maths rant ahead)
Providing support for add w/r/t objects that have no neutral element is a bit awkward from the algebraic point of view.
Note that all of:
naturals
reals
complex numbers
N-d vectors
NxM matrices
strings
together with addition form a Monoid - i.e. they are associative and have some kind of neutral element.
If your operation isn't associative and doesn't have a neutral element, then it doesn't "resemble" addition. Hence, don't expect it to work well with sum.
In such case, you might be better off with using a function or a method instead of an operator. This may be less confusing since the users of your class, seeing that it supports +, are likely to expect that it will behave in a monoidic way (as addition normally does).
Thanks for expanding, I'll refer to your particular module now:
There are 2 concepts here:
Simple locations,
Compound locations.
It indeed makes sense that simple locations could be added, but they don't form a monoid because their addition doesn't satisfy the basic property of closure - the sum of two SimpleLocs isn't a SimpleLoc. It's, generally, a CompoundLoc.
OTOH, CompoundLocs with addition look like a monoid to me (a commutative monoid, while we're at it): a sum of those is a CompoundLoc too, their addition is associative and commutative, and the neutral element is an empty CompoundLoc that contains zero SimpleLocs.
If you agree with me (and the above matches your implementation), then you'll be able to use sum as follows:
sum([SimpleLoc1, SimpleLoc2, SimpleLoc3], CompoundLoc())
Indeed, this appears to work.
I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.
Well, locations are some sets of numbers, so it makes sense to throw a set-like interface on top of them (so __contains__, __iter__, __len__, perhaps __or__ as an alias of +, __and__ as the product, etc).
As for construction from xrange, do you really need it? If you know that you're storing sets of intervals, then you're likely to save space by sticking to your representation of [start, end) pairs. You could throw in a utility method that takes an arbitrary sequence of integers and translates it to an optimal SimpleLoc or CompoundLoc if you feel it's going to help.
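Here is a minimal Python 3 sketch of that structure, under some assumptions. Intervals are stored as normalized, non-overlapping `(start, end)` pairs, and the set-like methods are the ones suggested above. The names follow the question's `SimpleLoc`/`CompoundLoc`, but the implementation and the helper `_normalize` are illustrative, not the asker's module. Normalizing on construction is what makes `CompoundLoc` addition associative and commutative, with `CompoundLoc()` as the neutral element.

```python
def _normalize(intervals):
    """Sort right-open [start, end) pairs and merge overlapping/adjacent ones."""
    merged = []
    for start, end in sorted(iv for iv in intervals if iv[0] < iv[1]):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


class SimpleLoc:
    def __init__(self, start, end):
        self.start, self.end = start, end  # right-open interval [start, end)

    def intervals(self):
        return [(self.start, self.end)]

    def __add__(self, other):
        # Not closed: SimpleLoc + SimpleLoc is a CompoundLoc.
        return CompoundLoc(self.intervals() + other.intervals())


class CompoundLoc:
    def __init__(self, intervals=()):
        self._intervals = _normalize(intervals)  # empty list = neutral element

    def intervals(self):
        return list(self._intervals)

    def __add__(self, other):
        return CompoundLoc(self._intervals + other.intervals())

    __or__ = __add__  # set-style alias for union

    def __iter__(self):
        for start, end in self._intervals:
            yield from range(start, end)

    def __len__(self):
        return sum(end - start for start, end in self._intervals)

    def __contains__(self, n):
        return any(start <= n < end for start, end in self._intervals)


loc = sum([SimpleLoc(3, 6), SimpleLoc(10, 13)], CompoundLoc())
print(list(loc), len(loc), 11 in loc)  # [3, 4, 5, 10, 11, 12] 6 True
```

Only the endpoints are stored, so memory use depends on the number of intervals rather than their length.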
I think that the best way to accomplish this is to provide the __radd__ method, or pass the start object to sum explicitly.
In case you really do not want to override __radd__ or provide a start object, how about redefining sum()?
>>> from __builtin__ import sum as builtin_sum  # "builtins" in Python 3
>>> def sum(iterable, startobj=MyCustomStartObject):
...     return builtin_sum(iterable, startobj)
...
Preferably use a function with a name like my_sum(), but I guess that is one of the things you want to avoid (even though globally redefining builtin functions is probably something that a future maintainer will curse you for).
Actually, implementing __add__ without the concept of an "empty object" makes little sense. sum needs a start parameter to support the sums of empty and one-element sequences, and you have to decide what result you expect in these cases:
sum([o1, o2]) => o1 + o2 # obviously
sum([o1]) => o1 # But how should __add__ be called here? Not at all?
sum([]) => ? # What now?
You could use an object that's universally neutral wrt. addition:
class Neutral:
    def __add__(self, other):
        return other

print(sum("A BC D EFG".split(), Neutral()))  # ABCDEFG
You could do something like:
from functools import reduce  # a builtin in Python 2; this import works in 2.6+ and 3.x
from operator import add
try:
    total = reduce(add, whatever)
except TypeError as e:
    # I'm not 100% happy about branching on the exception text, but
    # figure this msg isn't likely to be changed after so long...
    if e.args[0] == 'reduce() of empty sequence with no initial value':
        pass  # do something appropriate here if necessary
    else:
        pass  # Most likely that + isn't usable between objects...
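Branching on the exception text is fragile, because the wording of that message is not a documented interface and could change between versions. Here is a sketch of an alternative, assuming you want a caller-chosen default for an empty input. It pulls the first item off an iterator with `next()` and a sentinel, then passes that item to `reduce` as the initializer. The name `add_all` is hypothetical.

```python
import operator
from functools import reduce

_MISSING = object()  # sentinel that cannot collide with real items


def add_all(items, default=None):
    """Add items left to right; return `default` for an empty input."""
    it = iter(items)
    first = next(it, _MISSING)
    if first is _MISSING:
        return default
    # Remaining items are folded onto the first; a single item is returned as-is.
    return reduce(operator.add, it, first)


print(add_all([]))                # None
print(add_all(["A"]))             # A
print(add_all(["A", "BC", "D"]))  # ABCD
```

A genuine TypeError from incompatible operands still propagates unchanged, and a one-element input never calls `__add__` at all.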
Would you please look at the code below?
def getCrewListAll(self):
    """
    Set to crew list to all available crew
    """
    crew = getIdNumbers()
    return map(lambda cr: cr.id, crew)
What is the meaning of cr.id here? Is id a builtin Python function, or something else?
In your example, the reference to cr.id is not a function call. It's accessing a member attribute of the cr variable, which is a placeholder for whatever the crew objects are.
id is the name of a builtin function, yes. But there is no way to know if it's being used under the hood in these objects without seeing the actual class definitions.
As an example, if this were, say, a Django application, model instances have id member attributes that give you the database id of that record. It's part of the design of the class for that framework.
Even though I am assuming it's an attribute, for all we know it could also be a computed property, which is similar to a method call that acts like an attribute. It could be doing more logic than it seems when looking at your example.
Lastly, since cr.id could be anything, it could even be a method, in which case map would return a list of callables that you'd still have to invoke as cr.id().
cr.id isn't a builtin function (unless you've assigned it to be...), it's a normal member of the cr object there.
id(cr) would be an invocation of that builtin and would return the identity of cr.
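Here is a sketch with hypothetical classes, since the real class behind `getIdNumbers()` is not shown in the question. It contrasts the builtin `id()` with an attribute that happens to be named `id`, and shows that `cr.id` could equally be a property or a method.

```python
class CrewMember:
    """Hypothetical stand-in for the objects returned by getIdNumbers()."""

    def __init__(self, crew_id):
        self.id = crew_id  # plain attribute that happens to be named "id"


class ComputedCrewMember:
    def __init__(self, number):
        self._number = number

    @property
    def id(self):
        # Looks like an attribute at the call site but runs code.
        return f"CREW-{self._number}"


class MethodCrewMember:
    def id(self):
        return 42


cr = CrewMember(7)
print(cr.id)                     # 7: the attribute
print(isinstance(id(cr), int))   # True: the builtin returns an identity integer
print(ComputedCrewMember(7).id)  # CREW-7
print(MethodCrewMember().id)     # a bound method; it must be called as .id()
```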
I think the real problem here is that you don't understand the code you have posted.
In this context you need to understand map and lambda.
map is a function which applies a function to each element of a list and returns the results as a list (in Python 3, map returns an iterator instead):
>>> def func(a):
...     return a * 2
...
>>> map(func, [1,2,3])
[2, 4, 6]
lambda can be seen as a shortcut to create functions. The above could be written with lambda:
>>> func = lambda a: a * 2
>>> map(func, [1,2,3])
[2, 4, 6]
So here is what your code map(lambda cr: cr.id, crew) is doing: it returns a list of the id attribute of each of the objects in the list crew.
That said, this code is not very idiomatic. You can write the same function with a list comprehension, which is much more intuitive:
def getCrewListAll(self):
    return [cr.id for cr in getIdNumbers()]
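One detail worth adding for Python 3: map returns a lazy iterator rather than a list there, so the original getCrewListAll would no longer return a list. Here is a sketch comparing equivalent spellings, with a hypothetical `Crew` namedtuple standing in for the real objects. `operator.attrgetter` is the standard-library way to build the `lambda cr: cr.id` function.

```python
from collections import namedtuple
from operator import attrgetter

Crew = namedtuple("Crew", ["id", "name"])  # hypothetical stand-in for crew objects
crew = [Crew(1, "Ann"), Crew(2, "Bo")]

ids_lazy = map(lambda cr: cr.id, crew)     # Python 3: an iterator, not a list
ids_map = list(map(attrgetter("id"), crew))
ids_comp = [cr.id for cr in crew]          # usually the clearest option

print(list(ids_lazy), ids_map, ids_comp)   # [1, 2] [1, 2] [1, 2]
```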
There is a builtin id function, but it may or may not be related to this, depending on the implementation of that object; either way, id here is a member/property of that object.
If you are curious about what members/fields that object has, you can call dir(crew[0]) (assuming getIdNumbers() returns at least one object), and if it's properly documented you can also call help(crew[0].id).