Question about whether to include something in the __init__() method - python

I am new to OOP and hence, am looking for suggestions on good practice for coding something where the following issue arises.
I am defining a Seller(a, b, c, d) class. It has many attributes, two of which are mostRecentProfit and profitHistory. However, the values of these two are not known when an instance is initialized; some other steps in the program have to run before they are known. My question is:
In the __init__(a, b, c, d) of the seller class, should I write
self.mostRecentProfit = None
self.profitHistory = []
or should I not define these at all in the __init__ method? The former appeals to me because, by looking at the __init__() method, I can see all the attributes of the class. However, that may not be a good enough reason for doing this. Any suggestions would be appreciated.
Thank you.

Defining the attributes in __init__() makes the code better for when someone who has not seen the code has to start working with it. It can be confusing when a class starts accessing an attribute that doesn't seem to exist at first.
Also, since one of your default values is a list instead of None, initializing it means you can always treat the attribute as a list and never have to worry about its state.

I would define them. In my experience, if you don't, and the code that handles the instances refers to those attributes often, you end up forever typing if object.profitHistory: before looping, and so on. With an empty list there, you can skip those conditions. And as you say, it makes the class much more legible.

I would define them all in the __init__() method, because that not only documents what they normally are but, if you give them all valid default values, also lets most of the rest of your code process instances of the class easily even if these attributes never get updated.
So, in your example, that would mean initializing self.mostRecentProfit to 0 or perhaps 0.0 rather than None. Doing this would allow it to be used as a number without first checking that it exists and is not None before each reference to it, or wrapping each reference in a try/except block to handle the cases where it was never explicitly set to another value.
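To make the suggestions above concrete, here is a minimal sketch of the Seller class with both attributes defined in __init__. The parameters a, b, c, d and the two attribute names come from the question; record_profit and total_profit are hypothetical helpers added only for illustration, and starting mostRecentProfit at 0.0 follows the last answer rather than the question's None.

```python
class Seller:
    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d
        # Not known yet, but defined up front so every instance has them.
        self.mostRecentProfit = 0.0
        self.profitHistory = []

    def record_profit(self, profit):
        # Hypothetical helper: called later, once a profit is realized.
        self.mostRecentProfit = profit
        self.profitHistory.append(profit)

    def total_profit(self):
        # Hypothetical helper: works on a fresh instance without any
        # "is it defined yet?" check, because the list always exists.
        return sum(self.profitHistory)


seller = Seller(1, 2, 3, 4)
print(seller.total_profit())    # 0
seller.record_profit(12.5)
print(seller.mostRecentProfit)  # 12.5
```

Whether 0.0 or None is the better placeholder depends on whether "no profit yet" must be distinguishable from "a profit of zero".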

Related

Where am I in a chain of indexing?

If I have some code like this:
x.y.z = 12
I can infer that the z member is being indexed from the call to __setattr__. However if I have something like this:
foo = x.y.z # situation 1
bar = x.y.z.bar # situation 2
How can I determine which of the above situations I am in, if I care to do something special for z based on whether or not it is last in the chain of indexing? Is this kind of inference even possible in Python?
For more clarity let's assume I can change the implementation of all the objects being indexed, so using descriptors is wholly possible.
I worry that the answer to this question is "you can't do that" since it is impossible to override = like you can in C++.
I'm not sure how you define 'being last at chain of indexing'. You can still call more attributes on an object at any time.
But you can know when your object is being accessed as an attribute. As mentioned before, you can overload __getattr__ and __getattribute__, but a more robust way would be with descriptors.
This can get you started: http://nbviewer.jupyter.org/urls/gist.github.com/ChrisBeaumont/5758381/raw/descriptor_writeup.ipynb
Alternatively, there's a more formal guide: https://docs.python.org/3/howto/descriptor.html
There is no way to do this with python overrides. The only way is to have a known member that means "the end." For example, if you wanted to know which member was being set in a long chain of indexes you'd need some kind of setter:
x.y.z.set(some_value)
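As a rough illustration of what a descriptor can and cannot see, here is a minimal sketch for Python 3.6 or later. The names Logged, Node, Holder and the log attribute are all hypothetical. The descriptor is told that z was read or written, but a read looks the same whether or not another attribute lookup follows it.

```python
class Logged:
    """Data descriptor that records every read and write of the attribute."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        instance.log.append(("get", self.name))
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.log.append(("set", self.name))
        instance.__dict__[self.name] = value


class Node:
    z = Logged()

    def __init__(self):
        self.log = []
        self.z = 0  # goes through Logged.__set__


class Holder:
    def __init__(self):
        self.y = Node()


x = Holder()
x.y.z = 12   # recorded as a "set"
foo = x.y.z  # recorded as a "get"; a following ".bar" would look identical here
print(x.y.log)  # [('set', 'z'), ('set', 'z'), ('get', 'z')]
```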

Should I subclass list or create class with list as attribute?

I need a container that can collect a number of objects and provides some reporting functionality on the container's elements. Essentially, I'd like to be able to do:
magiclistobject = MagicList()
magiclistobject.report() ### generates all my needed info about the list content
So I thought of subclassing the normal list and adding a report() method. That way, I get to use all the built-in list functionality.
class SubClassedList(list):
    def __init__(self):
        list.__init__(self)
    def report(self):  # forgive the silly example
        if 999 in self:
            print("999 Alert!")
Instead, I could also create my own class that has a magiclist attribute but I would then have to create new methods for appending, extending, etc., if I want to get to the list using:
magiclistobject.append() # instead of magiclistobject.list.append()
I would need something like this (which seems redundant):
class MagicList():
    def __init__(self):
        self.list = []
    def append(self, element):
        self.list.append(element)
    def extend(self, element):
        self.list.extend(element)
    # more list functionality as needed...
    def report(self):
        if 999 in self.list:
            print("999 Alert!")
I thought that subclassing the list would be a no-brainer. But this post here makes it sound like a no-no. Why?
One reason why extending list might be bad is since it ties together your 'MagicReport' object too closely to the list. For example, a Python list supports the following methods:
append
count
extend
index
insert
pop
remove
reverse
sort
It also contains a whole host of other operations (adding, comparisons using < and >, slicing, etc).
Are all of those operations things that your 'MagicReport' object actually wants to support? For example, the following is legal Python:
b = [1, 2]
b *= 3
print(b)  # [1, 2, 1, 2, 1, 2]
This is a pretty contrived example, but if you inherit from 'list', your 'MagicReport' object will do exactly the same thing if somebody inadvertently does something like this.
As another example, what if you try slicing your MagicReport object?
m = MagicReport()
# Add stuff to m
slice = m[2:3]
print(type(slice))
You'd probably expect the slice to be another MagicReport object, but it's actually a list. You'd need to override __getslice__ in order to avoid surprising behavior, which is a bit of a pain.
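A sketch of that override for Python 3, where __getslice__ no longer exists and slices reach __getitem__ as slice objects. The report method returning a string is a simplification of the question's example, and the sample values are made up.

```python
class MagicReport(list):
    def __getitem__(self, index):
        result = super().__getitem__(index)
        if isinstance(index, slice):
            # Re-wrap slices so they keep the subclass type.
            return type(self)(result)
        return result

    def report(self):
        return "999 Alert!" if 999 in self else "ok"


m = MagicReport([1, 2, 999, 4])
part = m[2:3]
print(type(part).__name__)   # MagicReport
print(part.report())         # 999 Alert!
print(type(m * 2).__name__)  # list
```

The last line shows that other inherited operations, such as repetition, still hand back a plain list unless they are overridden too.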
It also makes it harder for you to change the implementation of your MagicReport object. If you end up needing to do more sophisticated analysis, it often helps to be able to change the underlying data structure into something more suited for the problem.
If you subclass list, you could get around this problem by just providing new append, extend, etc methods so that you don't change the interface, but you won't have any clear way of determining which of the list methods are actually being used unless you read through the entire codebase. However, if you use composition and just have a list as a field and create methods for the operations you support, you know exactly what needs to be changed.
I actually ran into a scenario very similar to yours at work recently. I had an object which contained a collection of 'things' which I first internally represented as a list. As the requirements of the project changed, I ended up changing the object to internally use a dict, a custom collections object, then finally an OrderedDict in rapid succession. At least in my experience, composition makes it much easier to change how something is implemented as opposed to inheritance.
That being said, I think extending list might be ok in scenarios where your 'MagicReport' object is legitimately a list in all but name. If you do want to use MagicReport as a list in every single way, and don't plan on changing its implementation, then it just might be more convenient to subclass list and just be done with it.
Though in that case, it might be better to just use a list and write a 'report' function -- I can't imagine you needing to report the contents of the list more than once, and creating a custom object with a custom method just for that purpose might be overkill (though this obviously depends on what exactly you're trying to do)
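For comparison, the "plain list plus a function" option could look like the sketch below. Returning the messages instead of printing them is an assumption made here so the result is easy to inspect; the 999 check is the question's own silly example.

```python
def report(items):
    """Return alert messages for a plain list; no custom class needed."""
    alerts = []
    if 999 in items:
        alerts.append("999 Alert!")
    return alerts


values = [1, 2, 999]
print(report(values))  # ['999 Alert!']
```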
As a general rule, whenever you ask yourself "should I inherit or have a member of that type", choose not to inherit. This rule of thumb is known as "favour composition over inheritance".
The reason why this is so is: composition is appropriate where you want to use features of another class; inheritance is appropriate if other code needs to use the features of the other class with the class you are creating.

Python: Using a dummy class to pass variable names?

This is a followup to function that returns a dict whose keys are the names of the input arguments, from which I learned many things (paraphrased):
Python objects, on the whole, don't know their names.
No, this is not possible in general with *args. You'll have to use keyword arguments
When the number of arguments is fixed, you can do this with locals
Using globals(). This will only work if the values are unique in the module scope, so it's fragile
You're probably better off not doing this anyway and rethinking the problem.
The first point highlights my fundamental misunderstanding of Python variables. The responses were very pedagogic and nearly instantaneous; clearly this is a well-understood yet easily confused topic.
Since I'd like to learn how to do things properly, is it considered bad practice to create a dummy class simply to hold the variables with names attached to them?
class system: pass
S = system()
S.T = 1.0
S.N = 20
S.L = 10
print(vars(S))
This accomplishes my original intent, but I'm left wondering if there is something I'm not considering that can bite me later.
I do it as a homage to Javascript, where you don't have any distinction between dictionaries and instance variables. I think it's not necessarily an antipattern, also because differently from dictionaries, if you don't have the value it raises AttributeError instead of KeyError, and it is easier to spot typos of the name. As I said, not an antipattern, provided that
the scope of the class is restricted to a very specific usage
the routine or method you are calling (e.g. vars in your example) is private in nature. I would not want a public interface with those calling semantics, nor do I want it as a returned entity
the name of the "dummy" class is extremely clear in its intent and the kind of aggregate it represents.
the lifetime of that object is short and uneventful. It is just a temporary bag of data.
If these constraints are not respected, go for a fully recognized class with properties.
You can do that, but why not use a dictionary?
If you do go that way, though, you're better off passing keyword arguments to the class's constructor and letting the constructor copy them to the instance's members. Something like:
class Foo(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
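A short usage sketch of that constructor, with the T, N and L values taken from the question. As an aside, and assuming Python 3.3 or later, the standard library's types.SimpleNamespace provides the same "bag of named values" behavior without a hand-written class.

```python
from types import SimpleNamespace


class Foo:
    def __init__(self, **kwargs):
        # Copy every keyword argument onto the instance.
        self.__dict__.update(kwargs)


S = Foo(T=1.0, N=20, L=10)
print(vars(S))  # {'T': 1.0, 'N': 20, 'L': 10}

# Standard-library equivalent of the dummy class.
ns = SimpleNamespace(T=1.0, N=20, L=10)
print(ns.N)     # 20
```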

Vector in python

I'm working on this project which deals with vectors in python. But I'm new to python and don't really know how to crack it. Here's the instruction:
"Add a constructor to the Vector class. The constructor should take a single argument. If this argument is either an int or a long or an instance of a class derived from one of these, then consider this argument to be the length of the Vector instance. In this case, construct a Vector of the specified length with each element is initialized to 0.0. If the length is negative, raise a ValueError with an appropriate message. If the argument is not considered to be the length, then if the argument is a sequence (such as a list), then initialize with vector with the length and values of the given sequence. If the argument is not used as the length of the vector and if it is not a sequence, then raise a TypeError with an appropriate message.
Next implement the __repr__ method to return a string of python code which could be used to initialize the Vector. This string of code should consist of the name of the class followed by an open parenthesis followed by the contents of the vector represented as a list followed by a close parenthesis."
I'm not sure how to do the class type checking, as well as how to initialize the vector based on the given object. Could someone please help me with this? Thanks!
Your instructor seems not to "speak Python as a native language". ;) The entire concept for the class is pretty silly; real Python programmers just use the built-in sequence types directly. But then, this sort of thing is normal for academic exercises, sadly...
Add a constructor to the Vector class.
In Python, the common "this is how you create a new object and say what it's an instance of" stuff is handled internally by default, and then the baby object is passed to the class' initialization method to make it into a "proper" instance, by setting the attributes that new instances of the class should have. We call that method __init__.
The constructor should take a single argument. If this argument is either an int or a long or an instance of a class derived from one of these
This is tested by using the builtin function isinstance. You can look it up for yourself in the documentation (or try help(isinstance) at the REPL).
In this case, construct a Vector of the specified length with each element is initialized to 0.0.
In our __init__, we generally just assign the starting values for attributes. The first parameter to __init__ is the new object we're initializing, which we usually call "self" so that people understand what we're doing. The rest of the arguments are whatever was passed when the caller requested an instance. In our case, we're always expecting exactly one argument. It might have different types and different meanings, so we should give it a generic name.
When we detect that the generic argument is an integer type with isinstance, we "construct" the vector by setting the appropriate data. We just assign to some attribute of self (call it whatever makes sense), and the value will be... well, what are you going to use to represent the vector's data internally? Hopefully you've already thought about this :)
If the length is negative, raise a ValueError with an appropriate message.
Oh, good point... we should check that before we try to construct our storage. Some of the obvious ways to do it would basically treat a negative number the same as zero. Other ways might raise an exception that we don't get to control.
If the argument is not considered to be the length, then if the argument is a sequence (such as a list), then initialize with vector with the length and values of the given sequence.
"Sequence" is a much fuzzier concept; lists and tuples and what-not don't have a "sequence" base class, so we can't easily check this with isinstance. (After all, someone could easily invent a new kind of sequence that we didn't think of). The easiest way to check if something is a sequence is to try to create an iterator for it, with the built-in iter function. This will already raise a fairly meaningful TypeError if the thing isn't iterable (try it!), so that makes the error handling easy - we just let it do its thing.
Assuming we got an iterator, we can easily create our storage: most sequence types (and I assume you have one of them in mind already, and that one is certainly included) will accept an iterator for their __init__ method and do the obvious thing of copying the sequence data.
Next implement the __repr__ method to return a string of python code which could be used to initialize the Vector. This string of code should consist of the name of the class followed by an open parenthesis followed by the contents of the vector represented as a list followed by a close parenthesis."
Hopefully this is self-explanatory. Hint: you should be able to simplify this by making use of the storage attribute's own __repr__. Also consider using string formatting to put the string together.
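Putting the hints above together, one possible shape for the class is sketched below. It assumes Python 3, where int covers what the assignment calls int and long, and it assumes a list stored in a hypothetical _data attribute as the internal storage. The error messages are placeholders.

```python
class Vector:
    def __init__(self, arg):
        if isinstance(arg, int):
            # Treat the argument as a length.
            if arg < 0:
                raise ValueError("Vector length must not be negative")
            self._data = [0.0] * arg
        else:
            # iter() raises a TypeError on its own if arg is not iterable.
            self._data = list(iter(arg))

    def __repr__(self):
        # Reuse the list's own repr for the contents.
        return "{}({!r})".format(type(self).__name__, self._data)


print(repr(Vector(3)))          # Vector([0.0, 0.0, 0.0])
print(repr(Vector([1, 2, 3])))  # Vector([1, 2, 3])
```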
Everything you need to get started is here:
http://docs.python.org/library/functions.html
There are many examples of how to check types in Python on StackOverflow (see my comment for the top-rated one).
To initialize a class, use the __init__ method:
class Vector(object):
    def __init__(self, sequence):
        self._internal_list = list(sequence)
Now you can call:
my_vector = Vector([1, 2, 3])
And inside other functions in Vector, you can refer to self._internal_list. I put _ before the variable name to indicate that it shouldn't be changed from outside the class.
The documentation for the list function may be useful for you.
You can do the type checking with isinstance.
The initialization of a class is done with an __init__ method.
Good luck with your assignment :-)
This may or may not be appropriate depending on the homework, but in Python programming it's not very usual to explicitly check the type of an argument and change the behaviour based on that. It's more normal to just try to use the features you expect it to have (possibly catching exceptions if necessary to fall back to other options).
In this particular example, a normal Python programmer implementing a Vector that needed to work this way would try using the argument as if it were an integer/long (hint: what happens if you multiply a list by an integer?) to initialize the Vector and if that throws an exception try using it as if it were a sequence, and if that failed as well then you can throw a TypeError.
The reason for doing this is that it leaves your class open to working with other object types that people come up with later that aren't integers or sequences but work like them. In particular it's very difficult to comprehensively check whether something is a "sequence", because user-defined classes that can be used as sequences don't have to be instances of any common type you can check. The Vector class itself is quite a good candidate for using to initialize a Vector, for example!
But I'm not sure if this is the answer your teacher is expecting. If you haven't learned about exception handling yet, then you're almost certainly not meant to use this approach so please ignore my post. Good luck with your learning!
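A sketch of the "just try it" approach described in this answer, assuming Python 3. The _data attribute, the __iter__ method and the error messages are hypothetical choices; __iter__ is there so that one Vector can be used to initialize another.

```python
class Vector:
    def __init__(self, arg):
        try:
            # Multiplying a list works for ints and int-like objects.
            data = [0.0] * arg
        except TypeError:
            try:
                # Otherwise, try to use the argument as a sequence.
                data = list(arg)
            except TypeError:
                raise TypeError(
                    "Vector() takes a length or a sequence, not %s"
                    % type(arg).__name__
                )
        else:
            # [0.0] * -1 silently gives [], so check the sign explicitly.
            if arg < 0:
                raise ValueError("Vector length must not be negative")
        self._data = data

    def __iter__(self):
        return iter(self._data)

    def __repr__(self):
        return "Vector(%r)" % (self._data,)


print(repr(Vector(2)))             # Vector([0.0, 0.0])
print(repr(Vector(Vector([3]))))   # Vector([3])
```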

Parameter names in Python functions that take single object or iterable

I have some functions in my code that accept either an object or an iterable of objects as input. I was taught to use meaningful names for everything, but I am not sure how to comply here. What should I call a parameter that can be a single object or an iterable of objects? I have come up with two ideas, but I don't like either of them:
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
param - Some generic name. This makes clear that it can be several things, but explains nothing about what the parameter is used for.
Normally I call iterables of objects just the plural of what I would call a single object. I know this might seem a little bit compulsive, but Python is supposed to be (among others) about readability.
I have some functions in my code that accept either an object or an iterable of objects as input.
This is a very exceptional and often very bad thing to do. It's trivially avoidable.
i.e., pass [foo] instead of foo when calling this function.
The only time you can justify doing this is when (1) you have an installed base of software that expects one form (iterable or singleton) and (2) you have to expand it to support the other use case. So. You only do this when expanding an existing function that has an existing code base.
If this is new development, Do Not Do This.
I have come up with two ideas, but I don't like either of them:
[Only two?]
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
What? Are you saying you provide NO other documentation, and no other training? No support? No advice? Who is the "someone not used to it"? Talk to them. Don't assume or imagine things about them.
Also, don't use Leading Upper Case Names.
param - Some generic name. This makes clear that it can be several things, but explains nothing about what the parameter is used for.
Terrible. Never. Do. This.
I looked in the Python library for examples. Most of the functions that do this have simple descriptions.
http://docs.python.org/library/functions.html#isinstance
isinstance(object, classinfo)
They call it "classinfo" and it can be a class or a tuple of classes.
You could do that, too.
You must consider the common use case and the exceptions. Follow the 80/20 rule.
80% of the time, you can replace this with an iterable and not have this problem.
In the remaining 20% of the cases, you have an installed base of software built around an assumption (either iterable or single item) and you need to add the other case. Don't change the name, just change the documentation. If it used to say "foo" it still says "foo" but you make it accept an iterable of "foo's" without making any change to the parameters. If it used to say "foo_list" or "foo_iter", then it still says "foo_list" or "foo_iter" but it will quietly tolerate a singleton without breaking.
80% of the code is the legacy ("foo" or "foo_list")
20% of the code is the new feature ("foo" can be an iterable or "foo_list" can be a single object.)
I guess I'm a little late to the party, but I'm surprised that nobody suggested a decorator.
def withmany(f):
    def many(many_foos):
        for foo in many_foos:
            yield f(foo)
    f.many = many
    return f
@withmany
def process_foo(foo):
    return foo + 1
processed_foo = process_foo(foo)
for processed_foo in process_foo.many(foos):
    print(processed_foo)
I saw a similar pattern in one of Alex Martelli's posts but I don't remember the link off hand.
It sounds like you're agonizing over the ugliness of code like:
def ProcessWidget(widget_thing):
    # Infer if we have a singleton instance and make it a
    # length 1 list for consistency
    if isinstance(widget_thing, WidgetType):
        widget_thing = [widget_thing]
    for widget in widget_thing:
        #...
My suggestion is to avoid overloading your interface to handle two distinct cases. I tend to write code that favors re-use and clear naming of methods over clever dynamic use of parameters:
def ProcessOneWidget(widget):
    #...
def ProcessManyWidgets(widgets):
    for widget in widgets:
        ProcessOneWidget(widget)
Often, I start with this simple pattern, but then have the opportunity to optimize the "Many" case when there are efficiencies to gain that offset the additional code complexity and partial duplication of functionality. If this convention seems overly verbose, one can opt for names like "ProcessWidget" and "ProcessWidgets", though the difference between the two is a single easily missed character.
You can use *args magic (varargs) to make your params always be iterable.
Pass a single item or multiple known items as normal function args like func(arg1, arg2, ...) and pass iterable arguments with an asterisk before, like func(*args)
Example:
# magic *args function
def foo(*args):
    print(args)
# many ways to call it
foo(1)
foo(1, 2, 3)
args1 = (1, 2, 3)
args2 = [1, 2, 3]
args3 = iter((1, 2, 3))
foo(*args1)
foo(*args2)
foo(*args3)
Can you name your parameter in a very high-level way? People who read the code are more interested in knowing what the parameter represents ("clients") than what its type is ("list_of_tuples"); the type can be described in the function's documentation string, which is a good thing since it might change in the future (the type is sometimes an implementation detail).
I would do one thing:
def myFunc(manyFoos):
    if type(manyFoos) not in (list, tuple):
        manyFoos = [manyFoos]
    # do stuff here
so then you don't need to worry about its name anymore.
In general, though, a function should perform one action, accept the same parameter type every time, and return the same type.
Instead of filling your functions with ifs, you could have two functions.
Since you don't care exactly what kind of iterable you get, you could try to get an iterator for the parameter using iter(). If iter() raises a TypeError exception, the parameter is not iterable, so you then create a list or tuple of the one item, which is iterable and Bob's your uncle.
def doIt(foos):
    try:
        iter(foos)
    except TypeError:
        foos = [foos]
    for foo in foos:
        pass # do something here
The only problem with this approach is if foo is a string. A string is iterable, so passing in a single string rather than a list of strings will result in iterating over the characters in a string. If this is a concern, you could add an if test for it. At this point it's getting wordy for boilerplate code, so I'd break it out into its own function.
def iterfy(iterable):
    if isinstance(iterable, basestring):
        iterable = [iterable]
    try:
        iter(iterable)
    except TypeError:
        iterable = [iterable]
    return iterable
def doIt(foos):
    for foo in iterfy(foos):
        pass # do something
Unlike some of those answering, I like doing this, since it eliminates one thing the caller could get wrong when using your API. "Be conservative in what you generate but liberal in what you accept."
To answer your original question, i.e. what you should name the parameter, I would still go with "foos" even though you will accept a single item, since your intent is to accept a list. If it's not iterable, that is technically a mistake, albeit one you will correct for the caller since processing just the one item is probably what they want. Also, if the caller thinks they must pass in an iterable even of one item, well, that will of course work fine and requires very little syntax, so why worry about correcting their misapprehension?
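The iterfy helper above relies on basestring, which exists only in Python 2. A Python 3 sketch of the same idea follows; treating bytes as a single item as well is an assumption added here, and do_it returning a list is only so the result can be inspected.

```python
def iterfy(value):
    """Return value unchanged if it is a non-string iterable, else wrap it."""
    if isinstance(value, (str, bytes)):
        # Strings are iterable, but here they count as a single item.
        return [value]
    try:
        iter(value)
    except TypeError:
        return [value]
    return value


def do_it(foos):
    return [foo for foo in iterfy(foos)]


print(do_it("abc"))   # ['abc']
print(do_it(5))       # [5]
print(do_it((1, 2)))  # [1, 2]
```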
I would go with a name explaining that the parameter can be an instance or a list of instances. Say one_or_more_Foo_objects. I find it better than the bland param.
I'm working on a fairly big project now and we're passing maps around and just calling our parameter map. The map contents vary depending on the function that's being called. This probably isn't the best situation, but we reuse a lot of the same code on the maps, so copying and pasting is easier.
I would say that instead of naming it for what it is, you should name it for what it's used for. Also, be careful: you can't use in on a non-iterable.
