Which is a better __repr__ for a custom Python class?

It seems there are different styles of output that __repr__ can return.
I have a class InfoObj that stores a number of things, some of which I don't particularly want users of the class to set by themselves. I recognize nothing is protected in Python and they could just dive in and set it anyway, but it seems that defining it in __init__ makes it more likely someone might see it and assume it's fine to just pass it in.
(Example: Booleans that get set by a validation function when it determines that the object has been fully populated, and values that get calculated from other values when enough information is stored to do so... e.g. A = B + C, so once A and B are set then C is calculated and the object is marked Valid=True.)
So, given all that, which is the best way to design the output of __repr__?
bob = InfoObj(Name="Bob")
# Populate bob.
# Output type A:
bob.__repr__()
'<InfoObj object at 0x1b91ca42>'
# Output type B:
bob.__repr__()
'InfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
# Output type C:
bob.__repr__()
'InfoObj.NewInfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
... the point of type C would be to avoid happily accepting, as constructor arguments, all the stuff I'd have set 'private' in C++, and to make teammates using the class set it up through the interface functions even if it's more work for them. In that case I would define a constructor that does not take certain things in, and a separate, slightly harder-to-notice function for the purposes of __repr__.
If it makes any difference, I am planning to store these python objects in a database using their __repr__ output and retrieve them using eval(), at least unless I come up with a better way. The consequence of a teammate creating a full object manually instead of going through the proper interface functions is just that one type of info retrieval might be unstable until someone figures out what he did.

The __repr__ method is designed to produce the most useful output for the developer, not the end user, so only you can really answer this question. However, I'd typically go with option B. Option A isn't very useful, and option C is needlessly verbose; besides, you don't know how your module will be imported anyway. Others may prefer option C.
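For concreteness, here is a minimal sketch of an option-B __repr__. The attribute names come from the question; the constructor body and exact formatting are my assumptions:

class InfoObj:
    def __init__(self, Name):
        self.Name = Name
        self.Pants = False
        self.A = self.B = self.C = None
        self.Valid = False

    def __repr__(self):
        # Option B: mirror a constructor call, even though the real
        # constructor does not accept all of these arguments.
        return ('InfoObj(Name={0.Name!r}, Pants={0.Pants!r}, A={0.A!r}, '
                'B={0.B!r}, C={0.C!r}, Valid={0.Valid!r})'.format(self))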
However, if you want to store Python objects in a database, use pickle.
>>> import pickle
>>> bob = InfoObj(Name="Bob")
>>> pickle.dumps(bob)
b'...some bytestring representation of Bob...'
>>> pickle.loads(pickle.dumps(bob))
InfoObj(Name="Bob", ...)
If you're using older Python (pre-3.x), then note that cPickle is faster, but pickle is more extensible. Pickle will work on some of your classes without any configuration, but for more complicated objects you might want to write custom picklers.
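If the database is, say, SQLite, a minimal sketch of the round trip might look like this, reusing bob from above (the table name and schema are my assumptions, not anything from the question):

import pickle
import sqlite3

conn = sqlite3.connect("objects.db")  # hypothetical database file
conn.execute("CREATE TABLE IF NOT EXISTS objs (name TEXT PRIMARY KEY, data BLOB)")

blob = pickle.dumps(bob)  # bytes, safe to store in a BLOB column
conn.execute("INSERT OR REPLACE INTO objs VALUES (?, ?)", ("bob", blob))
conn.commit()

row = conn.execute("SELECT data FROM objs WHERE name = ?", ("bob",)).fetchone()
bob_again = pickle.loads(row[0])  # round-trips without eval()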

Related

Best practice: how to pass many arguments to a function?

I am running some numerical simulations, in which my main function must receive lots and lots of arguments - I'm talking 10 to 30 arguments depending on the simulation to run.
What are some best practices to handle cases like this? Dividing the code into, say, 10 functions with 3 arguments each doesn't sound very feasible in my case.
What I do is create an instance of a class (with no methods), store the inputs as attributes of that instance, then pass the instance - so the function receives only one input.
I like this because the code looks clean, easy to read, and because I find it easy to define and run alternative scenarios.
I dislike it because accessing class attributes within a function is slower than accessing a local variable (see: How / why to optimise code by copying class attributes to local variables?) and because it is not an efficient use of memory - too much data stored multiple times unnecessarily.
Any thoughts or recommendations?
# MyInput is a plain container class with no methods; defined explicitly
# here so the example is self-contained.
class MyInput(object):
    pass

myinput = MyInput()
myinput.input_sql_table = that_sql_table
myinput.input_file = that_input_file
myinput.param1 = param1
myinput.param2 = param2
myoutput = calc(myinput)
Alternative scenarios:
import collections
import copy

inputs = collections.OrderedDict()
scenarios = collections.OrderedDict()
inputs['base scenario'] = copy.deepcopy(myinput)
inputs['param2 = 100'] = copy.deepcopy(myinput)
inputs['param2 = 100'].param2 = 100
# loop through all the inputs and store the outputs in the ordered dictionary scenarios
for name, scenario_input in inputs.items():
    scenarios[name] = calc(scenario_input)
I don't think this is really a Stack Overflow question; it's more of a Software Engineering question. For example, check out this question.
As far as whether or not this is a good design pattern, this is an excellent way to handle a large number of arguments. You mentioned that this isn't very efficient in terms of memory or speed, but I think you're making an improper micro-optimization.
As far as memory is concerned, the overhead of running the Python interpreter is going to dwarf the couple of extra bytes used by instantiating your class.
Unless you have run a profiler and determined that accessing members of that options class is slowing you down, I wouldn't worry about it. This is especially the case because you're using Python. If speed is a real concern, you should be using something else.
You may not be aware of this, but most of the large scale number crunching libraries for Python aren't actually written in Python, they're just wrappers around C/C++ libraries that are much faster.
I recommend reading this article; it is well established that "premature optimization is the root of all evil".
You could pass in a dictionary like so (note that the keys must be strings to unpack with **):
def some_func_or_class(kwarg1: int = -1, kwarg2: int = 0, kwargN: str = ''):
    print(kwarg1, kwarg2, kwargN)

all_the_kwargs = {'kwarg1': 0, 'kwarg2': 1, 'kwargN': 'xyz'}
some_func_or_class(**all_the_kwargs)  # prints: 0 1 xyz
Or you could use several named tuples, as referenced here: Type hints in namedtuple
Also note that, depending on which version of Python you are using, there may be a limit to the number of arguments you can pass into a function call.
Or you could use just a dictionary:
def some_func(a_dictionary):
    arg_xyz = a_dictionary.get('argXYZ', None)  # defaults to None if argXYZ doesn't exist

Is it a good idea to inherit from dict or list classes? [duplicate]

Here's my general problem space:
I have a byte/bit protocol with a device over I2C.
I've got a "database" of the commands that fully describes all the bitfield types, values, and enumerations.
I have a class to consume the database and an I2C driver/transactor, so that I can call commands and get responses.
MyProtocol = Protocol('database.xml',I2CDriver())
theStatus = MyProtocol.GET_STATUS()
creates the proper byte stream for the GET_STATUS command, sends it over I2C, and currently returns the response as a byte array. I can get it to pretty-print the response inside the GET_STATUS() implementation, but I want to move that behavior to the returned object rather than doing it in the command.
I want my return object to be 'smart': theStatus needs to have the list/array of bytes plus a reference to its field definitions.
I want theStatus to act like an list/bytearray, so I can directly inspect the bytes. I don't care if slices are anything other than lists of bytes or bytearray. Once they've been sliced out they're just bytes.
I want theStatus to be printable with print(theStatus) and have it pretty-print all the fields in the status. I'm comfortable with how to make this happen once I settle on a workable data structure that allows me to access the bytes and the database.
I want to inspect theStatus by field name with something like theStatus.FIELDNAME or maybe theStatus['FIELDNAME']. Same thing: once I have a workable data structure that has the byte array and database as members, I can make this happen.
The problem is I don't know the "right" data structure to cause the least amount of problems.
Any suggestions on the most pythonic way to accomplish this? My initial thought was to subclass list and add the field definitions as a member, but it seems like python doesn't like that idea at all.
Composition seems like the next bet, but getting it to act like a proper list seems like that might be a bunch of work to get it 'right'.
What you really want is to implement a new sequence type, one that is perhaps mutable. You can either create one from scratch by implementing the special methods needed to emulate container types, or you can use a suitable collections.abc collection ABC as a base.
The latter is probably the easiest path to take, as the ABCs provide implementations for many of the methods as base versions that rely on a few abstract methods you must implement.
For example, the (immutable) Sequence ABC only requires you to provide implementations for __getitem__ and __len__; the base ABC implementation provides the rest:
from collections.abc import Sequence

class StatusBytes(Sequence):
    def __init__(self, statusbytes):
        self._bytes = statusbytes

    def __getitem__(self, idx_or_name):
        try:
            return self._bytes[idx_or_name]
        except TypeError:
            # a string index raises TypeError, not IndexError;
            # assume it is a fieldname
            return FIELDNAMES[idx_or_name]

    def __len__(self):
        return len(self._bytes)
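A quick usage sketch (FIELDNAMES is an assumed lookup table keyed by field name; it isn't defined in the question):

FIELDNAMES = {'READY': 0}           # assumed name-keyed lookup table
status = StatusBytes(b'\x01\x02')
len(status)                         # 2, via __len__
status[0]                           # 1, via __getitem__
status['READY']                     # string index raises TypeError inside
                                    # __getitem__, so it falls back to FIELDNAMES

The Sequence ABC also fills in __iter__, __contains__, __reversed__, index() and count() for free on top of those two methods.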
If you really need a full list implementation, including support for rich comparisons (list_a <= list_b), sorting (list_a.sort()), copying (list_a.copy()) and multiplication (list_a * 3), then there is also the collections.UserList class. This class inherits from collections.abc.MutableSequence, and adds the extra functionality that list offers over the base sequence ABCs. If you don't need that extra functionality, stick to the base ABCs.
It sounds like you're looking for collections.UserList.
Make a subclass that inherits from collections.UserList; that's exactly what it's for: https://docs.python.org/3/library/collections.html#collections.UserList
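A minimal sketch of that approach (the FIELDS table and field names are assumptions for illustration; __getattr__ is only consulted when normal attribute lookup fails, so it doesn't interfere with the list machinery):

from collections import UserList

class StatusList(UserList):
    FIELDS = {'READY': 0, 'ERROR': 1}  # hypothetical name -> byte offset table

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        try:
            return self.data[self.FIELDS[name]]
        except KeyError:
            raise AttributeError(name)

status = StatusList([0x01, 0x00])
print(status[0])       # 1, plain list indexing
print(status.READY)    # 1, field access by name
print(status[0:1])     # [1], and slices come back as StatusList, not plain list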

Memory usage: @on_trait_change vs _foo_changed()

I built an application with Enthought Traits which is using too much memory. I think the problem is caused by trait notifications:
There seems to be a fundamental difference in the memory usage of events caught by @on_trait_change or by using the special naming convention (e.g. _foo_changed()). I made a little example with two classes, Foo and FooDecorator, which I assumed would show exactly the same behaviour. But they don't!
from traits.api import *

class Foo(HasTraits):
    a = List(Int)

    def _a_changed(self):
        pass

    def _a_items_changed(self):
        pass

class FooDecorator(HasTraits):
    a = List(Int)

    @on_trait_change('a[]')
    def bar(self):
        pass

if __name__ == '__main__':
    n = 100000
    c = FooDecorator
    a = [c() for i in range(n)]
When running this script with c = Foo, Windows task manager shows a memory usage for the whole python process of 70MB, which stays constant for increasing n. For c = FooDecorator, the python process is using 450MB, increasing for higher n.
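(For a more reproducible number than Task Manager, something like the following sketch works; psutil is a third-party package and an assumption on my part, not part of the original setup:)

# measure the resident set size of the current process
import psutil

rss = psutil.Process().memory_info().rss
print('resident set size: %.1f MB' % (rss / 1e6))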
Can you please explain this behaviour to me?
EDIT: Maybe I should rephrase: why would anyone choose FooDecorator over Foo?
EDIT 2: I just uninstalled python(x,y) 2.7.9 and installed the newest version of canopy with traits 4.5.0. Now the 450MB became 750MB.
EDIT 3: Compiled traits-4.6.0.dev0-py2.7-win-amd64 myself. The outcome is the same as in EDIT 2. So despite all plausibility https://github.com/enthought/traits/pull/248/files does not seem to be the cause.
I believe you are seeing the effect of a memory leak that has been fixed recently:
https://github.com/enthought/traits/pull/248/files
As for why one would use the decorator, in this particular instance the two versions are practically equivalent.
In general, the decorator is more flexible: you can give a list of traits to listen to, and you can use the extended name notation, as described here:
http://docs.enthought.com/traits/traits_user_manual/notification.html#semantics
For example, in this case:
class Bar(HasTraits):
    b = Str

class FooDecorator(HasTraits):
    a = List(Bar)

    @on_trait_change('a.b')
    def bar(self):
        print 'change'
the bar notifier is going to be called for changes to the trait a, its items, and for the change of the trait b in each of the Bar items. Extended names can be quite powerful.
What's going on here is that Traits has two distinct ways of handling notifications: static notifiers and dynamic notifiers.
Static notifiers (such as those created by the specially-named _*_changed() methods) are fairly lightweight: each trait on an instance has a list of notifiers on it, which are basically the functions or methods with a lightweight wrapper.
Dynamic notifiers (such as those created with on_trait_change() and the extended trait name conventions like a[]) are significantly more powerful and flexible, but as a result they are much more heavyweight. In particular, in addition to the wrapper object they create, they also create a parsed representation of the extended trait name and a handler object, some of which are in turn HasTraits subclass instances.
As a result, even for a simple expression like a[] there will be a fair number of new Python objects created, and these objects have to be created for every on_trait_change listener on every instance separately to properly handle corner-cases like instance traits. The relevant code is here: https://github.com/enthought/traits/blob/master/traits/has_traits.py#L2330
Based on the reported numbers, the majority of the difference in memory usage that you are seeing is in the creation of this dynamic listener infrastructure for each instance and each on_trait_change decorator.
It's worth noting that there is a short-circuit for on_trait_change in the case where you are using a simple trait name, in which case it generates a static trait notifier instead of a dynamic notifier. So if you were to instead write something like:
class FooSimpleDecorator(HasTraits):
    a = List(Int)

    @on_trait_change('a')
    def a_updated(self):
        pass

    @on_trait_change('a_items')
    def a_items_updated(self):
        pass
you should see similar memory performance to the specially-named methods.
To answer the rephrased question about "why use on_trait_change", in FooDecorator you can write one method instead of two if your response to a change of either the list or any items in the list is the same. This makes code significantly easier to debug and maintain, and if you aren't creating thousands of these objects then the extra memory usage is negligible.
This becomes even more of a factor when you consider more sophisticated extended trait name patterns, where the dynamic listeners automatically handle changes which would otherwise require significant manual (and error-prone) code for hooking up and removing listeners from intermediate objects and traits. The power and simplicity of this approach usually outweighs the concerns about memory usage.

Should I subclass list or create class with list as attribute?

I need a container that can collect a number of objects and provides some reporting functionality on the container's elements. Essentially, I'd like to be able to do:
magiclistobject = MagicList()
magiclistobject.report() ### generates all my needed info about the list content
So I thought of subclassing the normal list and adding a report() method. That way, I get to use all the built-in list functionality.
class SubClassedList(list):
    def __init__(self):
        list.__init__(self)

    def report(self):  # forgive the silly example
        if 999 in self:
            print "999 Alert!"
Instead, I could also create my own class that has a magiclist attribute but I would then have to create new methods for appending, extending, etc., if I want to get to the list using:
magiclistobject.append() # instead of magiclistobject.list.append()
I would need something like this (which seems redundant):
class MagicList:
    def __init__(self):
        self.list = []

    def append(self, element):
        self.list.append(element)

    def extend(self, element):
        self.list.extend(element)

    # more list functionality as needed...

    def report(self):
        if 999 in self.list:
            print "999 Alert!"
I thought that subclassing the list would be a no-brainer. But this post here makes it sound like a no-no. Why?
One reason why extending list might be bad is that it ties your 'MagicReport' object too closely to list. For example, a Python list supports the following methods:
append
count
extend
index
insert
pop
remove
reverse
sort
It also contains a whole host of other operations (adding, comparisons using < and >, slicing, etc).
Are all of those operations things that your 'MagicReport' object actually wants to support? For example, the following is legal Python:
b = [1, 2]
b *= 3
print b # [1, 2, 1, 2, 1, 2]
This is a pretty contrived example, but if you inherit from 'list', your 'MagicReport' object will do exactly the same thing if somebody inadvertently does something like this.
As another example, what if you try slicing your MagicReport object?
m = MagicReport()
# Add stuff to m
slice = m[2:3]
print type(slice)
You'd probably expect the slice to be another MagicReport object, but it's actually a list. You'd need to override __getslice__ in order to avoid surprising behavior, which is a bit of a pain.
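(In Python 3, __getslice__ is gone and slicing goes through __getitem__; a minimal sketch of keeping slices in your own type:)

class MagicReport(list):
    def __getitem__(self, index):
        result = list.__getitem__(self, index)
        if isinstance(index, slice):
            # re-wrap slices so they stay MagicReport instances
            return MagicReport(result)
        return result

m = MagicReport([1, 2, 3, 4])
print(type(m[2:3]))   # <class '__main__.MagicReport'>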
It also makes it harder for you to change the implementation of your MagicReport object. If you end up needing to do more sophisticated analysis, it often helps to be able to change the underlying data structure into something more suited for the problem.
If you subclass list, you could get around this problem by just providing new append, extend, etc methods so that you don't change the interface, but you won't have any clear way of determining which of the list methods are actually being used unless you read through the entire codebase. However, if you use composition and just have a list as a field and create methods for the operations you support, you know exactly what needs to be changed.
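As an aside, if the wrapper-method boilerplate is the main objection to composition, __getattr__ delegation can cut most of it; a minimal sketch, using the names from the question:

class MagicList(object):
    def __init__(self):
        self._list = []

    def __getattr__(self, name):
        # delegate anything we don't define (append, extend, ...)
        # to the wrapped list
        return getattr(self._list, name)

    def __contains__(self, item):
        return item in self._list

    def report(self):
        if 999 in self._list:
            print("999 Alert!")

ml = MagicList()
ml.append(999)   # resolved via __getattr__
ml.report()      # 999 Alert!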
I actually ran into a scenario very similar to yours at work recently. I had an object which contained a collection of 'things' which I first internally represented as a list. As the requirements of the project changed, I ended up changing the object to internally use a dict, a custom collections object, then finally an OrderedDict in rapid succession. At least in my experience, composition makes it much easier to change how something is implemented as opposed to inheritance.
That being said, I think extending list might be ok in scenarios where your 'MagicReport' object is legitimately a list in all but name. If you do want to use MagicReport as a list in every single way, and don't plan on changing its implementation, then it just might be more convenient to subclass list and just be done with it.
Though in that case, it might be better to just use a list and write a 'report' function -- I can't imagine you needing to report the contents of the list more than once, and creating a custom object with a custom method just for that purpose might be overkill (though this obviously depends on what exactly you're trying to do)
As a general rule, whenever you ask yourself "should I inherit or have a member of that type", choose not to inherit. This rule of thumb is known as "favour composition over inheritance".
The reason is that composition is appropriate when you want to use features of another class, while inheritance is appropriate when other code needs to use the features of that other class through the class you are creating.

GetAttr Function Problems (Python 3)

I have the following in a Python script:
setattr(stringRESULTS, "b", b)
Which gives me the following error:
AttributeError: 'str' object has no attribute 'b'
Can anyone tell me what the problem is here?
Don't do this. To quote the inestimable Greg Hewgill,
"If you ever find yourself using quoted names to refer to variables,
there's usually a better way to do whatever you're trying to do."
[Here you're one level up and using a string variable for the name, but it's the same underlying issue.] Or as S. Lott followed up with in the same thread:
"90% of the time, you should be using a dictionary. The other 10% of
the time, you need to stop what you're doing entirely."
If you're using the contents of stringRESULTS as a pointer to some object fred which you want to setattr, then these objects you want to target must already exist somewhere, and a dictionary is the natural data structure to store them. In fact, depending on your use case, you might be able to use dictionary key/value pairs instead of attributes in the first place.
IOW, my version of what (I'm guessing) you're trying to do would probably look like
d[stringRESULTS].b = b
or
d[stringRESULTS]["b"] = b
depending on whether I wanted/needed to work with an object instance or whether a dictionary would suffice.
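A minimal sketch of that dictionary-based approach (the key, value, and class names here are made up for illustration):

class Result(object):
    pass

d = {}
stringRESULTS = "run_42"   # hypothetical key
b = 7                      # hypothetical value

d[stringRESULTS] = Result()
d[stringRESULTS].b = b     # attribute style

# or purely with nested dictionaries:
d2 = {stringRESULTS: {}}
d2[stringRESULTS]["b"] = b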
(P.S. relatively few people subscribe to the python-3.x tag. You'll usually get more attention by adding the bare 'python' tag as well.)
Since str is a built-in type implemented in C, its instances have no __dict__, so you can't set arbitrary attributes on them. You probably need either a dict or a subclass of str:
class StringResult(str):
    pass
which should behave as you expect:
my_string_result = StringResult("spam_and_eggs")
my_string_result.b = b
EDIT:
If you're trying to do what DSM suggests, i.e. modify an attribute on a variable that has the same name as the value of the stringRESULTS variable, then this should do the trick:
locals()[stringRESULTS].b = b
Please note that this is an extremely dangerous operation: inside a function, writes through locals() are not guaranteed to take effect at all (CPython hands you a snapshot of the local scope), and it can wreak all kinds of havoc on your app if you aren't careful.
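A quick demonstration of the function-scope pitfall (CPython behaviour):

def f():
    x = 1
    locals()['x'] = 2   # writes to a snapshot, not the real local
    print(x)            # still prints 1 in CPython

f()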
