I have a Python class with functions and properties like this:
@property
def xcoords(self):
    """Returns numpy array."""
    try:
        return self.x_coords
    except AttributeError:
        self.x_coords = self._read_coords('x')
        return self.x_coords

def _read_coords(self, type):
    # read lots of stuff from big file
    return array
This allows me to do this: data.xcoords, nice and simple.
I want to keep this as it is, however I want to define functions which allow me to do this:
data.xcoords.mm
data.xcoords.in
How do I do it? I also want these functions to work for other properties of the class, such as data.zcoords.mm.
If you really want xcoords to return a numpy array, then people may not expect the value of xcoords to have mm and in_ methods. You should think about whether mm and in_ are really properties of the arrays themselves, or if they are properties of the class you're defining. In the latter case, I would recommend against subclassing ndarray -- just define them as methods of the containing class.
On the other hand, if these are definitely properties of the thing returned by xcoords, then subclassing ndarray is a reasonable approach. Be sure to get it right by defining __new__ and __array_finalize__ as discussed in the docs.
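A minimal sketch of the subclassing route, following the __new__ / __array_finalize__ pattern described in the NumPy subclassing docs. The class name Coords, the unit attribute, and the idea that mm and in_ are millimetre/inch conversions are all assumptions made for illustration; the question does not say what they should return.

```python
import numpy as np

MM_PER_INCH = 25.4


class Coords(np.ndarray):
    """Hypothetical ndarray subclass that remembers its unit ("mm" or "in")."""

    def __new__(cls, values, unit="mm"):
        # View the input data as a Coords instance, then attach the unit.
        obj = np.asarray(values, dtype=float).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Also called for views and slices, so copy the unit from the parent.
        if obj is None:
            return
        self.unit = getattr(obj, "unit", "mm")

    @property
    def mm(self):
        plain = np.asarray(self)
        return plain if self.unit == "mm" else plain * MM_PER_INCH

    @property
    def in_(self):
        plain = np.asarray(self)
        return plain if self.unit == "in" else plain / MM_PER_INCH


x = Coords([25.4, 50.8])
inches = x.in_        # plain ndarray holding the values in inches
first = x[:1]         # slicing keeps the subclass and its unit
```

With something like this, the xcoords property could wrap the array it reads in Coords before caching it, so data.xcoords.mm and data.xcoords.in_ both work while data.xcoords still behaves as an array.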
To decide whether you should subclass ndarray, you might consider whether you can see yourself reusing this class elsewhere in your program. (You don't actually have to use it elsewhere, right now -- you just have to be able to see yourself reusing it at some point.) If you can't, then these are probably properties of the containing class. The line of reasoning here is that -- thinking in terms of functions -- if you have a short function foo and a short function bar, and know you will never call them any other way than foo(bar(x)), you might be better off writing foo_bar instead. The same logic applies to classes.
Finally, as larsmans pointed out, in is a keyword in Python, so it can't be used as an attribute name here (which is why I used in_ above).
This is really a design question and I would like to know a bit of what design patterns to use.
I have a module, let's say curves.py that defines a Bezier class. Then I want to write a function intersection which uses a recursive algorithm to find the intersections between two instances of Bezier.
What options do I have for where to put this function? What are some best practices in this case? Currently I have written the function in the module itself (and not as a method of the class).
So currently I have something like:
def intersections(inst1, inst2): ...
class Bezier: ...
and I can call the function by passing two instances:
from curves import Bezier, intersections
a = Bezier()
b = Bezier()
result = intersections(a, b)
However, another option (that I can think of) would be to make intersection a method of the class. In this case I would instead use
a.intersections(b)
For me the first choice makes a bit more sense since it feels more natural to call intersections(a, b) than a.intersections(b). However, the other option feels more natural in the sense that the function intersection really only acts on Bezier instances and this feels more encapsulated.
Do you think one of these is better than the other, and in that case, for what reasons? Are there any other design options to use here? Are there any best practices?
As an example, you can compare how the builtin set class does this:
intersection(*others)
set & other & ...
Return a new set with elements common to the set and all others.
So intersection is defined as a regular instance method on the class that takes another (or multiple) sets and returns the intersection, and it can be called as a.intersection(b).
However, due to the standard mechanics of how instance methods work, you can also spell it set.intersection(a, b), and in practice you'll see this quite often since, like you say, it feels more natural.
You can also define the __and__ method so this becomes available as a & b.
In terms of ease of use, putting it on the class is also friendlier, because you can just import the Bezier class and have all associated features available automatically, and they're also discoverable via help(Bezier).
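A small sketch of the three spellings, first on the built-in set and then on a user-defined class. Interval is a hypothetical stand-in for Bezier (the real recursive intersection algorithm is not shown in the question), so only the shape of the API carries over.

```python
a = {1, 2, 3}
b = {2, 3, 4}

# Three spellings of the same instance method on the built-in set.
assert a.intersection(b) == set.intersection(a, b) == (a & b) == {2, 3}


class Interval:
    """Hypothetical stand-in for Bezier: a closed 1-D interval [lo, hi]."""

    def __init__(self, lo, hi):
        self.lo = lo
        self.hi = hi

    def intersection(self, other):
        """Return the overlapping Interval, or None if there is no overlap."""
        lo = max(self.lo, other.lo)
        hi = min(self.hi, other.hi)
        return Interval(lo, hi) if lo <= hi else None

    def __and__(self, other):
        # Makes `p & q` available as another spelling of p.intersection(q).
        if not isinstance(other, Interval):
            return NotImplemented
        return self.intersection(other)


p = Interval(0, 5)
q = Interval(3, 8)
overlap_method = p.intersection(q)           # method call
overlap_unbound = Interval.intersection(p, q)  # "function-like" spelling
overlap_operator = p & q                     # operator spelling
```

All three calls go through the same method, so the class stays the single place where the behaviour is defined and documented.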
Let's say you're writing a child class that has a constructor that passes its unused kwargs up to the parent constructor, but your class has an argument x that it needs to store and that shouldn't be passed to the parent.
I have seen two different approaches to this:
def __init__(self, **kwargs):
    self.x = kwargs.pop('x', 'default')
    super().__init__(**kwargs)
and
def __init__(self, x='default', **kwargs):
    self.x = x
    super().__init__(**kwargs)
Is there ever any functional difference between these two constructors? Is there any reason to use one over the other?
The only difference I can see is that the second form, which defines x in the signature, allows the user to better see it as a possible argument, or an IDE to offer it as an autocomplete option. Or in Python 3.5+, you could add a type annotation to x. Does that make the first form objectively worse?
As already mentioned by Giacomo Alzetta in a comment, the second version allows x to be passed as a positional argument, whereas the first only allows it as a named argument. In other words, with the second form you can use both Child(x=2) and Child(2), while the first only supports Child(x=2).
Also, when using introspection to check the method's signature, the second form will clearly mention the existence of the x param, while the first won't.
And finally, the second version will yield a slightly clearer exception if x is required (no default) and is not passed.
And that's for the functional differences.
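These differences can be checked directly. In the sketch below, Parent, PopChild and ExplicitChild are hypothetical names; the two child constructors are the ones from the question.

```python
import inspect


class Parent:
    def __init__(self, **kwargs):
        self.extra = kwargs


class PopChild(Parent):
    # First form: x is hidden inside **kwargs.
    def __init__(self, **kwargs):
        self.x = kwargs.pop('x', 'default')
        super().__init__(**kwargs)


class ExplicitChild(Parent):
    # Second form: x is part of the signature.
    def __init__(self, x='default', **kwargs):
        self.x = x
        super().__init__(**kwargs)


# Both accept x as a keyword argument.
assert PopChild(x=2).x == 2
assert ExplicitChild(x=2).x == 2

# Only the explicit form accepts it positionally.
assert ExplicitChild(2).x == 2
try:
    PopChild(2)
except TypeError as exc:
    positional_error = exc  # PopChild takes no positional arguments besides self

# Only the explicit form shows x to introspection tools.
explicit_params = list(inspect.signature(ExplicitChild.__init__).parameters)
pop_params = list(inspect.signature(PopChild.__init__).parameters)
```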
Is there any reason to use one over the other?
Well... As a general rule, it's cleaner (best practice) to use explicit arguments whenever possible, even if only for readability, and from experience it does usually make maintenance easier indeed. So from this point of view, the second form can be seen as "objectively better" than the first.
This being said, when the parent method has dozens of mostly optional and rarely used arguments (django.forms.Form, I'm looking at you) AND you also want to preserve positional arguments order, it can be convenient to just use the generic *args, **kwargs signature for the child and force the additional param(s) to be passed as kwargs. Assuming you clearly document this in the docstring, it's still explicit enough (as far as I'm concerned, YMMV), and also avoids a lot of clutter (you can have a look at django.forms.Form for a concrete example of what I mean here).
So as always with "best practices" and other golden rules, you have to understand and weigh the pros and cons with respect to the concrete case at hand.
PS: just to make things clear, django's Form class signature makes perfect sense so I'm not ranting here - it's just one of those cases where there's no "beautiful" solution to the problem, period.
Aside from the obvious differences in code clarity, there might be a small difference in the speed of calling the function, in this case the __init__() method.
If you can, define all necessary arguments explicitly (with default values where you have them) in both methods, pass them on in the usual way, and leave out the ones you do not want to forward.
This way the code is clear and the call speed stays the same.
If you need some micro-optimization, then use timeit to check what works faster.
I expect that the one with "x" added as an explicit argument will perhaps be the winner,
because reading its value directly from a local variable is faster and the kwargs dict is smaller.
When you use "normal" arguments, they are bound directly as local variables of the function.
When you use *args and/or **kwargs, an additional tuple and/or dict is created from the arguments you passed in the call and bound as a new local variable.
When you pass them on to the next function, they are unpacked
to match that function's call signature. In both operations you lose a tiny bit of speed.
If you also remove something from the kwargs dictionary (x = kwargs.pop("x")), you lose a little more.
Looking at both versions, their call speed appears nearly equal, but you should check. If you do not need an extra 0.000001 seconds when initializing your instances, then both options are fine and you can just choose what you like most.
But again, if you are free to do it, and if it will not greatly impair the code's maintenance, define all arguments and their default values and pass them on one-by-one.
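The timeit check suggested above might look like the following sketch. The class names and the measure helper are hypothetical, and the numbers depend entirely on the interpreter and machine, so no expected timings are given.

```python
import timeit


class Parent:
    def __init__(self, **kwargs):
        pass


class PopChild(Parent):
    def __init__(self, **kwargs):
        self.x = kwargs.pop('x', 'default')
        super().__init__(**kwargs)


class ExplicitChild(Parent):
    def __init__(self, x='default', **kwargs):
        self.x = x
        super().__init__(**kwargs)


def measure(cls, number=100_000):
    """Return the best of three timings (in seconds) for `number` constructions."""
    return min(timeit.repeat(lambda: cls(x=1, y=2), number=number, repeat=3))


if __name__ == "__main__":
    for cls in (PopChild, ExplicitChild):
        print(cls.__name__, measure(cls))
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes when comparing two variants.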
If I have some code like this:
x.y.z = 12
I can infer that the z member is being indexed from the call to __setattr__. However if I have something like this:
foo = x.y.z # situation 1
bar = x.y.z.bar # situation 2
How can I determine which of the above situations I am in, if I care to do something special for z based on whether or not it is last in the chain of indexing? Is this kind of inference even possible in Python?
For more clarity let's assume I can change the implementation of all the objects being indexed, so using descriptors is wholly possible.
I worry that the answer to this question is "you can't do that" since it is impossible to override = like you can in C++.
I'm not sure how you define 'being last at chain of indexing'. You can still call more attributes on an object at any time.
But you can know when your object is being accessed as an attribute. As mentioned before, you can overload __getattr__ and __getattribute__, but a more robust way would be with descriptors.
This can get you started: http://nbviewer.jupyter.org/urls/gist.github.com/ChrisBeaumont/5758381/raw/descriptor_writeup.ipynb
Alternatively, there's a more formal guide: https://docs.python.org/3/howto/descriptor.html
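As a rough illustration of what a descriptor can and cannot see, here is a minimal sketch. The names Logged, Y and log are hypothetical. The descriptor is notified each time z is read or written, but it gets the same notification whether or not further attribute access follows.

```python
class Logged:
    """Data descriptor that records every get and set on the owning instance."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        instance.log.append(("get", self.name))
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.log.append(("set", self.name))
        instance.__dict__[self.name] = value


class Y:
    z = Logged()

    def __init__(self):
        self.log = []        # must exist before the first assignment to z
        self.z = "hello"     # recorded as ("set", "z")


y = Y()
foo = y.z            # situation 1: recorded as ("get", "z")
bar = y.z.upper()    # situation 2: also just ("get", "z"); .upper() is invisible to the descriptor
```

After running this, y.log holds one "set" entry followed by two identical "get" entries, which is why the two situations cannot be told apart from inside z.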
There is no way to do this with Python's special-method overrides. The only way is to have a known member that means "the end." For example, if you wanted to know which member was being set in a long chain of indexes, you'd need some kind of setter:
x.y.z.set(some_value)
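One way such an explicit terminator could be sketched is a proxy that accumulates the attribute path and only acts when set() or get() is called. Chain and its dictionary-backed store are hypothetical, chosen only to keep the example self-contained.

```python
class Chain:
    """Hypothetical proxy: records the attribute path until .set() or .get() ends it."""

    def __init__(self, store, path=()):
        self._store = store
        self._path = path

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for names that are not
        # real attributes or methods of Chain.
        if name.startswith("_"):
            raise AttributeError(name)
        return Chain(self._store, self._path + (name,))

    def set(self, value):
        # The explicit "end of chain" marker for writes.
        self._store[".".join(self._path)] = value

    def get(self):
        # The explicit "end of chain" marker for reads.
        return self._store[".".join(self._path)]


store = {}
x = Chain(store)
x.y.z.set(12)
value = x.y.z.get()
```

The cost of this design is that set and get become reserved names that cannot appear as members in the chain.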
Out of curiosity, is it more desirable to explicitly pass functions to other functions, or to let the function call functions from within? Is this a case of "explicit is better than implicit"?
For example (the following is only to illustrate what I mean):
import functools
import operator

def foo(x, y):
    return 1 if x > y else 0
partialfun = functools.partial(foo, 1)
def bar(xs, ys):
    return partialfun(sum(map(operator.mul, xs, ys)))
>>> bar([1,2,3], [4,5,6])
--or--
def foo(x, y):
    return 1 if x > y else 0
partialfun = functools.partial(foo, 1)
def bar(fn, xs, ys):
    return fn(sum(map(operator.mul, xs, ys)))
>>> bar(partialfun, [1,2,3], [4,5,6])
There's not really any difference between functions and anything else in this situation. You pass something as an argument if it's a parameter that might vary over different invocations of the function. If the function you are calling (bar in your example) is always calling the same other function, there's no reason to pass that as an argument. If you need to parameterize it so that you can use many different functions (i.e., bar might need to call many functions besides partialfun, and needs to know which one to call), then you need to pass it as an argument.
Generally, yes, but as always, it depends. What you are illustrating here is known as dependency injection. Generally, it is a good idea, as it allows separation of variability from the logic of a given function. This means, for example, that it will be extremely easy for you to test such code.
# To test the process performed in bar(), we can "inject" a function
# which simply returns its argument
def dummy(x):
    return x
def bar(fn, xs, ys):
    return fn(sum(map(operator.mul, xs, ys)))
>>> assert bar(dummy, [1,2,3], [4,5,6]) == 32
It depends very much on the context.
Basically, if the function is an argument to bar, then it's the responsibility of the caller to know how to implement that function. bar doesn't have to care. But consequently, bar's documentation has to describe what kind of function it needs.
Often this is very appropriate. The obvious example is the map builtin function. map implements the logic of applying a function to each item in a list, and giving back a list of results. map itself neither knows nor cares about what the items are, or what the function is doing to them. map's documentation has to describe that it needs a function of one argument, and each caller of map has to know how to implement or find a suitable function. But this arrangement is great; it allows you to pass a list of your custom objects, and a function which operates specifically on those objects, and map can go away and do its generic thing.
But often this arrangement is inappropriate. A function gives a name to a high level operation and hides the internal implementation details, so you can think of the operation as a unit. Allowing part of its operation to be passed in from outside as a function parameter exposes that it works in a way that uses that function's interface.
A more concrete (though somewhat contrived) example may help. Let's say I've implemented data types representing Person and Job, and I'm writing a function name_and_title for formatting someone's full name and job title into a string, for client code to insert into email signatures or on letterhead or whatever. It's obviously going to take a Person and Job. It could potentially take a function parameter to let the caller decide how to format the person's name: something like lambda firstname, lastname: lastname + ', ' + firstname. But to do this is to expose that I'm representing people's names with a separate first name and last name. If I want to change to supporting a middle name, then either name_and_title won't be able to include the middle name, or I have to change the type of the function it accepts. When I realise that some people have 4 or more names and decide to change to storing a list of names, then I definitely have to change the type of function name_and_title accepts.
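The coupling described here can be made visible in a short sketch. The field names firstname, lastname and title, and the format_name parameter, are hypothetical; the point is only that the callback's two-argument signature leaks how names are stored.

```python
from dataclasses import dataclass


@dataclass
class Person:
    firstname: str
    lastname: str


@dataclass
class Job:
    title: str


def name_and_title(person, job,
                   format_name=lambda firstname, lastname: firstname + ' ' + lastname):
    # The callback signature (firstname, lastname) is now part of the public
    # interface: storing names differently would break every caller's lambda.
    return format_name(person.firstname, person.lastname) + ', ' + job.title


signature = name_and_title(
    Person("Ada", "Lovelace"),
    Job("Analyst"),
    lambda firstname, lastname: lastname + ', ' + firstname,
)
```

If Person later stored a list of names, every lambda written by callers against the two-argument form would have to change along with name_and_title.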
So for your bar example, we can't say which is better, because it's an abstract example with no meaning. It depends on whether the call to partialfun is an implementation detail of whatever bar is supposed to be doing, or whether the call to partialfun is something that the caller knows about (and might want to do something else). If it's "part of" bar, then it shouldn't be a parameter. If it's "part of" the caller, then it should be a parameter.
It's worth noting that bar could have a huge number of function parameters. You call sum, map, and operator.mul, which could all be parameterised to make bar more flexible:
def bar(fn, xs, ys, g, h, i):
    return fn(g(h(i, xs, ys)))
And the way in which g is called on the output of h could be abstracted too:
def bar(fn, xs, ys, g, h, i, j):
    return fn(j(g, h(i, xs, ys)))
And we can keep going on and on, until bar doesn't do anything at all, and everything is controlled by the functions passed in, and the caller might as well have just directly done what they want done rather than writing 100 functions to do it and passing those to bar to execute the functions.
So there really isn't a definite answer one way or the other that applies all the time. It depends on the particular code you're writing.
I am a Python newbie, and I am not sure why Python implemented len(obj), max(obj), and min(obj) as static-like functions (I come from Java) rather than obj.len(), obj.max(), and obj.min().
What are the advantages and disadvantages (other than the obvious inconsistency) of having len()... over the method calls?
Why did Guido choose this over the method calls? (This could have been changed in Python 3 if needed, but it wasn't, so there have got to be good reasons... I hope.)
thanks!!
The big advantage is that built-in functions (and operators) can apply extra logic when appropriate, beyond simply calling the special methods. For example, min can look at several arguments and apply the appropriate inequality checks, or it can accept a single iterable argument and proceed similarly; abs, when called on an object without a special method __abs__, could try comparing said object with 0 and using the object's change-sign method if needed (though it currently doesn't); and so forth.
So, for consistency, all operations with wide applicability must always go through built-ins and/or operators, and it's those built-ins' responsibility to look up and apply the appropriate special methods (on one or more of the arguments), use alternate logic where applicable, and so forth.
An example where this principle wasn't correctly applied (but the inconsistency was fixed in Python 3) is "step an iterator forward": in 2.5 and earlier, you needed to define and call the non-specially-named next method on the iterator. In 2.6 and later you can do it the right way: the new next built-in calls the iterator's method (still named next in 2.6 and 2.7, renamed to __next__ in 3.*) and can apply extra logic, for example to supply a default value (in 2.6 you can still call the method directly the bad old way, for backwards compatibility, though in 3.* the next method is gone).
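A short Python 3 sketch of that extra logic: the iterator only defines __next__, and the default-value handling lives in the next built-in. Countdown is a hypothetical class used for illustration.

```python
class Countdown:
    """Iterator yielding start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


it = Countdown(2)
first = next(it)           # 2
second = next(it)          # 1
third = next(it, "done")   # exhausted: the built-in supplies the default
```

The class never had to know about defaults; the built-in catches StopIteration and returns the fallback on its behalf.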
Another example: consider the expression x + y. In a traditional object-oriented language (able to dispatch only on the type of the leftmost argument -- like Python, Ruby, Java, C++, C#, &c) if x is of some built-in type and y is of your own fancy new type, you're sadly out of luck if the language insists on delegating all the logic to the method of type(x) that implements addition (assuming the language allows operator overloading;-).
In Python, the + operator (and similarly of course the builtin operator.add, if that's what you prefer) tries x's type's __add__, and if that one doesn't know what to do with y, then tries y's type's __radd__. So you can define your types that know how to add themselves to integers, floats, complex, etc etc, as well as ones that know how to add such built-in numeric types to themselves (i.e., you can code it so that x + y and y + x both work fine, when y is an instance of your fancy new type and x is an instance of some builtin numeric type).
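The fallback from __add__ to __radd__ can be seen with a tiny numeric wrapper. Metres is a hypothetical type; the only thing being demonstrated is the lookup order the + operator applies.

```python
class Metres:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Metres):
            return Metres(self.value + other.value)
        if isinstance(other, (int, float)):
            return Metres(self.value + other)
        return NotImplemented

    def __radd__(self, other):
        # Reached for `5 + Metres(2)`: int.__add__ returns NotImplemented,
        # so Python tries the right operand's reflected method.
        return self.__add__(other)


left = Metres(2) + 5     # uses Metres.__add__
right = 5 + Metres(2)    # uses Metres.__radd__
total = sum([Metres(1), Metres(2)])  # sum starts from 0, so __radd__ is needed here too
```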
"Generic functions" (as in PEAK) are a more elegant approach (allowing any overriding based on a combination of types, never with the crazy monomaniac focus on the leftmost arguments that OOP encourages!-), but (a) they were unfortunately not accepted for Python 3, and (b) they do of course require the generic function to be expressed as free-standing (it would be absolutely crazy to have to consider the function as "belonging" to any single type, where the whole POINT is that can be differently overridden/overloaded based on arbitrary combination of its several arguments' types!-). Anybody who's ever programmed in Common Lisp, Dylan, or PEAK, knows what I'm talking about;-).
So, free-standing functions and operators are just THE right, consistent way to go (even though the lack of generic functions, in bare-bones Python, does remove some fraction of the inherent elegance, it's still a reasonable mix of elegance and practicality!-).
It emphasizes the capabilities of an object, not its methods or type. Capabilities are declared by "helper" functions such as __iter__ and __len__, but they don't make up the interface. The interface is in the built-in functions, and besides this also in the built-in operators like + and [] for indexing and slicing.
Sometimes it is not a one-to-one correspondence. For example, iter(obj) returns an iterator for an object, and will work even if __iter__ is not defined. If it is not defined, iter goes on to check whether the object defines __getitem__ and returns an iterator that accesses the object index-wise (like an array).
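The __getitem__ fallback is easy to try out. Squares below is a hypothetical class that defines no __iter__ at all, yet iter() still produces a working iterator by calling __getitem__ with 0, 1, 2, ... until IndexError is raised.

```python
class Squares:
    """Index-accessible sequence of squares; deliberately has no __iter__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * index


has_iter = hasattr(Squares, "__iter__")   # False: no __iter__ anywhere in the class
values = list(iter(Squares(4)))           # iter() falls back to __getitem__
```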
This goes together with Python's Duck Typing, we care only about what we can do with an object, not that it is of a particular type.
Actually, those aren't "static" methods in the way you are thinking about them. They are built-in functions that really just alias to certain methods on python objects that implement them.
>>> class Foo(object):
...     def __len__(self):
...         return 42
...
>>> f = Foo()
>>> len(f)
42
These functions can always be called, whether or not the object implements the corresponding method. The point is to have some consistency. Instead of some class having a method called length() and another called size(), the convention is to implement __len__ and let the callers always access it by the more readable len(obj) instead of obj.methodThatDoesSomethingCommon.
I thought the reason was so these basic operations could be done on iterators with the same interface as containers. However, it actually doesn't work with len:
def foo():
    for i in range(10):
        yield i
print(len(foo()))
... fails with TypeError. len() won't consume and count an iterator; it only works with objects that have a __len__ method.
So, as far as I'm concerned, len() shouldn't exist. It's much more natural to say obj.len than len(obj), and much more consistent with the rest of the language and the standard library. We don't say append(lst, 1); we say lst.append(1). Having a separate global method for length is an odd, inconsistent special case, and eats a very obvious name in the global namespace, which is a very bad habit of Python.
This is unrelated to duck typing; you can say getattr(obj, "len") to decide whether you can use len on an object just as easily--and much more consistently--as you can use getattr(obj, "__len__").
All that said, as language warts go--for those who consider this a wart--this is a very easy one to live with.
On the other hand, min and max do work on iterators, which gives them a use apart from any particular object. This is straightforward, so I'll just give an example:
import random
def foo():
    for i in range(10):
        yield random.randint(0, 100)
print(max(foo()))
However, there are no __min__ or __max__ methods to override their behavior, so there's no consistent way to provide efficient searching for sorted containers. If a container is sorted on the same key that you're searching, min/max are O(1) operations instead of O(n), and the only way to expose that is by a different, inconsistent method. (This could be fixed in the language relatively easily, of course.)
To follow up with another issue with this: it prevents use of Python's method binding. As a simple, contrived example, you can do this to supply a function to add values to a list:
def add(f):
    f(1)
    f(2)
    f(3)
lst = []
add(lst.append)
print(lst)
and this works on all member functions. You can't do that with min, max or len, though, since they're not methods of the object they operate on. Instead, you have to resort to functools.partial, a clumsy second-class workaround common in other languages.
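For comparison, a small sketch of the two styles side by side: a bound method obtained directly from the object, and the functools.partial workaround for a built-in such as len. The variable names are illustrative only.

```python
import functools

lst = []

# Bound method: the list is already baked in.
append = lst.append

# Built-in function: the object has to be bound explicitly.
current_len = functools.partial(len, lst)

append(1)
append(2)
size = current_len()   # len(lst) evaluated at call time
```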
Of course, this is an uncommon case; but it's the uncommon cases that tell us about a language's consistency.