Subclass Python list to Validate New Items

I want a python list which represents itself externally as an average of its internal list items, but otherwise behaves as a list. It should raise a TypeError if an item is added that can't be cast to a float.
The part I'm stuck on is raising TypeError. It should be raised for invalid items added via any list method, like .append, .extend, +=, setting by slice, etc.
Is there a way to intercept new items added to the list and validate them?
I tried re-validating the whole list in __getattribute__, but when it's called I only have access to the old version of the list, plus it doesn't even get called for initialization, operators like +=, or for slices like mylist[0] = 5.
Any ideas?

Inherit from MutableSequence and implement the methods it requires, as well as any others that fall outside the scope of Sequence alone -- like the operators here. This lets you control how values enter the list while the iterator and contains capabilities are generated for you.
If you want to check for slices, btw, you need to do isinstance(key, slice) in your __getitem__ (and/or __setitem__) methods. Note that a single index like myList[0] is not a slice request but a single index, while myList[:0] is an actual slice request.
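To make that concrete, here is a minimal sketch (Python 3, where the ABC lives in collections.abc) of a MutableSequence subclass whose __setitem__ distinguishes single indexes from slices. The names here are mine, not from the question; the point is that append, extend, and += come free from the ABC mixins and all funnel through insert/__setitem__, so every write path is validated:

```python
from collections.abc import MutableSequence

class AverageList(MutableSequence):
    """Validates on every write path; repr shows the running average."""

    def __init__(self, iterable=()):
        self._items = [self._check(v) for v in iterable]

    @staticmethod
    def _check(value):
        # Anything float() rejects is reported as a TypeError, per the question
        try:
            return float(value)
        except (TypeError, ValueError):
            raise TypeError("%r cannot be cast to float" % (value,))

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):   # slice assignment: validate each new item
            self._items[index] = [self._check(v) for v in value]
        else:                          # single-index assignment
            self._items[index] = self._check(value)

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, self._check(value))

    def __repr__(self):
        return repr(sum(self._items) / len(self._items)) if self._items else 'Empty'
```

Because the mixin append is implemented in terms of insert, and extend and += in terms of append, adding 'lol' through any of them raises TypeError.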

The array.array class will take care of the float part:
import array
import math

class AverageList(array.array):
    def __new__(cls, *args, **kw):
        # 'd' = C double; array.array rejects anything not castable to float
        return array.array.__new__(cls, 'd')
    def __init__(self, values=()):
        self.extend(values)
    def __repr__(self):
        if not len(self):
            return 'Empty'
        return repr(math.fsum(self) / len(self))
And some tests:
>>> s = AverageList([1,2])
>>> s
1.5
>>> s.append(9)
>>> s
4.0
>>> s.extend('lol')
Traceback (most recent call last):
  File "<pyshell#117>", line 1, in <module>
    s.extend('lol')
TypeError: a float is required

Actually the best answer may be: don't.
Checking all objects as they get added to the list will be computationally expensive. What do you gain by doing those checks? It seems to me that you gain very little, and I'd recommend against implementing it.
Python doesn't check types, and so trying to have a little bit of type checking for one object really doesn't make a lot of sense.

There are 7 methods of the list class that add elements to the list and would have to be checked. Here's one compact implementation (Python 2 -- note that map is eager there, and __setslice__ still exists):
def check_float(x):
    try:
        float(x)
    except (TypeError, ValueError):
        raise TypeError("Cannot add %s to AverageList" % str(x))

def modify_method(f, which_arg=0, takes_list=False):
    def new_f(*args):
        if takes_list:
            map(check_float, args[which_arg + 1])
        else:
            check_float(args[which_arg + 1])
        return f(*args)
    return new_f

class AverageList(list):
    append = modify_method(list.append)
    extend = modify_method(list.extend, takes_list=True)
    insert = modify_method(list.insert, 1)
    __add__ = modify_method(list.__add__, takes_list=True)
    __iadd__ = modify_method(list.__iadd__, takes_list=True)
    __setitem__ = modify_method(list.__setitem__, 1)
    __setslice__ = modify_method(list.__setslice__, 2, takes_list=True)
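A caveat if you port this table to Python 3: __setslice__ is gone (slice assignment goes through __setitem__ with a slice key) and map() is lazy, so the check would silently never run. A hypothetical Python 3 sketch of the same wrapper idea, with an explicit loop instead of map:

```python
def check_float(x):
    try:
        float(x)
    except (TypeError, ValueError):
        raise TypeError("Cannot add %r to AverageList" % (x,))

def modify_method(f, which_arg=0, takes_list=False):
    # Wrap a list method so its new-value argument is validated first
    def new_f(*args):
        value = args[which_arg + 1]
        if takes_list:
            for item in value:      # eager, unlike Python 3's lazy map()
                check_float(item)
        else:
            check_float(value)
        return f(*args)
    return new_f

class AverageList(list):
    append = modify_method(list.append)
    extend = modify_method(list.extend, takes_list=True)
    insert = modify_method(list.insert, 1)
    __iadd__ = modify_method(list.__iadd__, takes_list=True)
    # __setitem__ would also need an isinstance(key, slice) branch in Python 3
```

This only covers the single-value setters plus extend/+=; a complete port would also wrap __setitem__ with a slice check.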

The general approach would be to create your own class inheriting from list and overriding the specific methods like append, extend, etc. This will probably also include magic methods of the Python list (see this article for details: http://www.rafekettler.com/magicmethods.html#sequence).
For validation, you will need to override __setitem__(self, key, value).

Here's how to create a subclass using the MutableSequence abstract base class in the collections module as its base class (not fully tested -- an exercise for the reader ;-):
import collections

class AveragedSequence(collections.MutableSequence):
    def _validate(self, x):
        try:
            return float(x)
        except (TypeError, ValueError):
            raise TypeError("Can't add {} to AveragedSequence".format(x))
    def average(self): return sum(self._list) / len(self._list)
    def __init__(self, arg): self._list = [self._validate(v) for v in arg]
    def __repr__(self): return 'AveragedSequence({!r})'.format(self._list)
    def __setitem__(self, i, value): self._list[i] = self._validate(value)
    def __delitem__(self, i): del self._list[i]
    def insert(self, i, value): return self._list.insert(i, self._validate(value))
    def __getitem__(self, i): return self._list[i]
    def __len__(self): return len(self._list)
    def __iter__(self): return iter(self._list)
    def __contains__(self, item): return item in self._list

if __name__ == '__main__':
    avgseq = AveragedSequence(range(10))
    print avgseq
    print avgseq.average()
    avgseq[2] = 3
    print avgseq
    print avgseq.average()
    # ..etc

Related

How to chain attribute lookups that might return None in Python?

My problem is a general one, how to chain a series of attribute lookups when one of the intermediate ones might return None, but since I ran into this problem trying to use Beautiful Soup, I'm going to ask it in that context.
Beautiful Soup parses an HTML document and returns an object that can be used to access the structured content of that document. For example, if the parsed document is in the variable soup, I can get its title with:
title = soup.head.title.string
My problem is that if the document doesn't have a title, then soup.head.title returns None and the subsequent string lookup throws an exception. I could break up the chain as:
x = soup.head
x = x.title if x else None
title = x.string if x else None
but this, to my eye, is verbose and hard to read.
I could write:
title = soup.head and soup.head.title and soup.head.title.string
but that is verbose and inefficient.
One solution I thought of, which I think is possible, would be to create an object (call it nil) that would return None for any attribute lookup. This would allow me to write:
title = ((soup.head or nil).title or nil).string
but this is pretty ugly. Is there a better way?
The most straightforward way is to wrap in a try...except block.
try:
    title = soup.head.title.string
except AttributeError:
    print "Title doesn't exist!"
There's really no reason to test at each level when removing each test would raise the same exception in the failure case. I would consider this idiomatic in Python.
You might be able to use reduce for this:
>>> class Foo(object): pass
...
>>> a = Foo()
>>> a.foo = Foo()
>>> a.foo.bar = Foo()
>>> a.foo.bar.baz = Foo()
>>> a.foo.bar.baz.qux = Foo()
>>>
>>> reduce(lambda x,y:getattr(x,y,''),['foo','bar','baz','qux'],a)
<__main__.Foo object at 0xec2f0>
>>> reduce(lambda x,y:getattr(x,y,''),['foo','bar','baz','qux','quince'],a)
''
In python3.x, I think that reduce is moved to functools though :(
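For example (the helper name and the stand-in object are mine, not from the question), a small function built on functools.reduce that walks a dotted attribute path and falls back to a default when any link is missing:

```python
from functools import reduce
from types import SimpleNamespace

def chained_getattr(obj, path, default=None):
    """Follow a dotted attribute path; return default if any link is missing."""
    try:
        return reduce(getattr, path.split('.'), obj)
    except AttributeError:
        return default

# Hypothetical stand-in for a parsed document
soup = SimpleNamespace(head=SimpleNamespace(title=SimpleNamespace(string='Foo')))
```

chained_getattr(soup, 'head.title.string') returns 'Foo', while a broken chain like 'head.missing.string' returns the default instead of raising.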
I suppose you could also do this with a simpler function:
def attr_getter(item, attributes):
    for a in attributes:
        try:
            item = getattr(item, a)
        except AttributeError:
            return None  # or whatever on error
    return item
Finally, I suppose the nicest way to do this is something like:
try:
    title = foo.bar.baz.qux
except AttributeError:
    title = None
One solution would be to wrap the outer object inside a Proxy that handles None values for you. See below for a beginning implementation.
import unittest

class SafeProxy(object):
    def __init__(self, instance):
        self.__dict__["instance"] = instance
    def __eq__(self, other):
        return self.instance == other
    def __call__(self, *args, **kwargs):
        return self.instance(*args, **kwargs)
    # TODO: Implement other special members
    def __getattr__(self, name):
        if hasattr(self.__dict__["instance"], name):
            return SafeProxy(getattr(self.instance, name))
        if name == "val":
            return lambda: self.instance
        return SafeProxy(None)
    def __setattr__(self, name, value):
        setattr(self.instance, name, value)

# Simple stub for creating objects for testing
class Dynamic(object):
    def __init__(self, **kwargs):
        for name, value in kwargs.iteritems():
            self.__setattr__(name, value)
    def __setattr__(self, name, value):
        self.__dict__[name] = value

class Test(unittest.TestCase):
    def test_nestedObject(self):
        inner = Dynamic(value="value")
        middle = Dynamic(child=inner)
        outer = Dynamic(child=middle)
        wrapper = SafeProxy(outer)
        self.assertEqual("value", wrapper.child.child.value)
        self.assertEqual(None, wrapper.child.child.child.value)
    def test_NoneObject(self):
        self.assertEqual(None, SafeProxy(None))
    def test_stringOperations(self):
        s = SafeProxy("string")
        self.assertEqual("String", s.title())
        self.assertEqual(type(""), type(s.val()))

if __name__ == "__main__":
    unittest.main()
NOTE: I am personally not sure whether I would use this in an actual project, but it makes an interesting experiment, and I put it here to get people's thoughts on it.
I'm running Python 3.9
Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64 bit (AMD64)]
and the and keyword solves my problem:
memo[v] = short_combo and short_combo.copy()
From what I gather this is not Pythonic and you should handle the exception.
However, in my solution the None ambiguity exists within the function, and in this scenario I think it would be poor practice to handle exceptions that occur ~50% of the time.
Were I outside the function and calling it, I would handle the exception.
Here is another potential technique, which hides the assignment of the intermediate value in a method call. First we define a class to hold the intermediate value:
class DataHolder(object):
    def __init__(self, value=None):
        self.v = value
    def g(self):
        return self.v
    def s(self, value):
        self.v = value
        return value

x = DataHolder(None)
Then we use it to store the result of each link in the chain of calls:
import bs4
for html in ('<html><head></head><body></body></html>',
             '<html><head><title>Foo</title></head><body></body></html>'):
    soup = bs4.BeautifulSoup(html)
    print x.s(soup.head) and x.s(x.g().title) and x.s(x.g().string)
    # or
    print x.s(soup.head) and x.s(x.v.title) and x.v.string
I don't consider this a good solution, but I'm including it here for completeness.
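In Python 3.8+ the assignment expression (walrus) operator does the same job as the DataHolder helper without the extra class. A sketch, using a hypothetical SimpleNamespace stand-in for the parsed soup:

```python
from types import SimpleNamespace

# Hypothetical stand-ins for parsed documents with and without a title
soup = SimpleNamespace(head=SimpleNamespace(title=SimpleNamespace(string='Foo')))
empty = SimpleNamespace(head=None)

# Each := stores the intermediate link, playing the role of x.s(...) / x.g()
title = ((x := soup.head) and (x := x.title) and x.string) or None
missing = ((x := empty.head) and (x := x.title) and x.string) or None
```

The and-chain short-circuits at the first falsy link, so a missing head or title yields None instead of raising AttributeError.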
This is how I handled it with inspiration from #TAS and Is there a Python library (or pattern) like Ruby's andand?
class Andand(object):
    def __init__(self, item=None):
        self.item = item
    def __getattr__(self, name):
        try:
            item = getattr(self.item, name)
            return item if name == 'item' else Andand(item)
        except AttributeError:
            return Andand()
    def __call__(self):
        return self.item

title = Andand(soup).head.title.string()
My best shot at handling midway-null attributes like this is to use pydash, as in the sample code on repl.it here:
import pydash
title = pydash.get(soup, 'head.title.string', None)

Can you easily create a list-like object in python that uses something like a descriptor for its items?

I'm trying to write an interface that abstracts another interface somewhat.
The bottom interface is somewhat inconsistent about what it requires: sometimes id's, and sometimes names. I'm trying to hide details like these.
I want to create a list-like object that will allow you to add names to it, but internally store id's associated with those names.
Preferably, I'd like to use something like descriptors for class attributes, except that they work on list items instead. That is, a function (like __get__) is called for everything added to the list to convert it to the id's I want to store internally, and another function (like __set__) to return objects (that provide convenience methods) instead of the actual id's when trying to retrieve items from the list.
So that I can do something like this:
def get_thing_id_from_name(name):
    # assume that this is more complicated
    return other_api.get_id_from_name_or_whatever(name)

class Thing(object):
    def __init__(self, thing_id):
        self.id = thing_id
        self.name = other_api.get_name_somehow(thing_id)
    def __eq__(self, other):
        if isinstance(other, basestring):
            return self.name == other
        if isinstance(other, Thing):
            return self.id == other.id
        return NotImplemented

tl = ThingList()
tl.append('thing_one')
tl.append('thing_two')
tl[1] = 'thing_three'
print tl[0].id
print tl[0] == 'thing_one'
print tl[1] == Thing(3)
The documentation recommends defining 17 methods (not including a constructor) for an object that acts like a mutable sequence. I don't think subclassing list is going to help me out at all. It feels like I ought to be able to achieve this just defining a getter and setter somewhere.
UserList is apparently deprecated (although it exists in Python 3? I'm using 2.7 though).
Is there a way to achieve this, or something similar, without having to redefine so much functionality?
You don't need to override all the list methods -- __setitem__, __init__ and append should be enough -- though you may want insert and some others as well. You could write __setitem__ and __getitem__ to call __set__ and __get__ methods on a special "Thing" class, exactly as descriptors do.
Here is a short example - maybe something like what you want:
class Thing(object):
    def __init__(self, thing):
        self.value = thing
        self.name = str(thing)
    id = property(lambda s: id(s))
    #...
    def __repr__(self):
        return "I am a %s" % self.name

class ThingList(list):
    def __init__(self, items):
        for item in items:
            self.append(item)
    def append(self, value):
        list.append(self, Thing(value))
    def __setitem__(self, index, value):
        list.__setitem__(self, index, Thing(value))
Example:
>>> a = ThingList(range(3))
>>> a.append("three")
>>> a
[I am a 0, I am a 1, I am a 2, I am a three]
>>> a[0].id
35242896
>>>
-- edit --
The O.P. commented: "I was really hoping that there would be a way to have all the functionality from list - addition, extending, slices etc. and only have to redefine the get/set item behaviour."
So mote it be -- one really has to override all relevant methods in this way. But if what we want to avoid is just a lot of boilerplate code with many functions doing almost the same thing, the new, overridden methods can be generated dynamically -- all we need is a decorator that turns ordinary objects into Things in every operation that sets values:
class Thing(object):
    # Prevents duplicating the wrapping of objects:
    def __new__(cls, thing):
        if isinstance(thing, cls):
            return thing
        return object.__new__(cls)
    def __init__(self, thing):
        self.value = thing
        self.name = str(thing)
    id = property(lambda s: id(s))
    #...
    def __repr__(self):
        return "I am a %s" % self.name

def converter(func, cardinality=1):
    def new_func(*args):
        # Pick the last item in the argument list, which
        # for all item setter methods on a list is the one
        # which actually contains the values
        if cardinality == 1:
            args = args[:-1] + (Thing(args[-1]),)
        else:
            args = args[:-1] + ([Thing(item) for item in args[-1]],)
        return func(*args)
    new_func.func_name = func.__name__
    return new_func

my_list_dict = {}
for single_setter in ("__setitem__", "append", "insert"):
    my_list_dict[single_setter] = converter(getattr(list, single_setter), cardinality=1)
for many_setter in ("__setslice__", "__add__", "__iadd__", "__init__", "extend"):
    my_list_dict[many_setter] = converter(getattr(list, many_setter), cardinality="many")

MyList = type("MyList", (list,), my_list_dict)
And it works thus:
>>> a = MyList()
>>> a
[]
>>> a.append(5)
>>> a
[I am a 5]
>>> a + [2,3,4]
[I am a 5, I am a 2, I am a 3, I am a 4]
>>> a.extend(range(4))
>>> a
[I am a 5, I am a 0, I am a 1, I am a 2, I am a 3]
>>> a[1:2] = range(10,12)
>>> a
[I am a 5, I am a 10, I am a 11, I am a 1, I am a 2, I am a 3]
>>>
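On Python 3 the same trick needs a slightly different method table: __setslice__ is gone (slice assignment arrives through __setitem__ with a slice key) and func_name became __name__. A minimal Python 3 sketch of the dynamic generation, covering the single-index setters only (a slice key would need an extra isinstance(key, slice) branch):

```python
class Thing:
    def __new__(cls, thing):
        if isinstance(thing, cls):   # avoid double wrapping
            return thing
        return super().__new__(cls)
    def __init__(self, thing):
        if thing is not self:        # skip re-init when __new__ returned an existing Thing
            self.value = thing
            self.name = str(thing)
    def __repr__(self):
        return "I am a %s" % self.name

def converter(func, many=False):
    # Wrap the last positional argument (the incoming value or values) in Thing
    def new_func(*args):
        *head, last = args
        last = [Thing(v) for v in last] if many else Thing(last)
        return func(*head, last)
    new_func.__name__ = func.__name__
    return new_func

namespace = {}
for name in ("append", "insert"):
    namespace[name] = converter(getattr(list, name))
for name in ("extend", "__iadd__", "__add__", "__init__"):
    namespace[name] = converter(getattr(list, name), many=True)

MyList = type("MyList", (list,), namespace)
```

Usage mirrors the Python 2 transcript: appended and extended values all come out wrapped as Things.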

__str__ and pretty-printing (sub)dictionaries

I have an object that consists primarily of a very large nested dictionary:
class my_object(object):
    def __init__(self):
        self.the_dict = {}  # Big, nested dictionary
I've modified __str__ to pretty-print the top-level dictionary by simply "printing" the object:
    def __str__(self):
        pp = pprint.PrettyPrinter()
        return pp.pformat(self.the_dict)
My goal here was to make the user's life a bit easier when he/she peruses the object with IPython:
print(the_object) # Pretty-prints entire dict
This works to show the user the entire dictionary, but I would like to expand this functionality to sub-portions of the dictionary as well, allowing the user to get pretty-printed output from commands such as:
print(the_object.the_dict['level1']['level2']['level3'])
(would pretty-print only the 'level3' sub-dict)
Is there a straightforward way to use __str__ (or similar) to do this?
You could provide a custom displayhook that prints builtin dictionaries and other objects you choose according to your taste at an interactive prompt:
>>> import sys
>>> oldhook = sys.displayhook
>>> sys.displayhook = your_module.DisplayHook(oldhook)
It doesn't change print obj behavior.
The idea is that your users can choose whether they'd like to use your custom formatting for dicts or not.
When a user says
print(the_object.the_dict['level1']['level2']['level3'])
Python evaluates the_object.the_dict['level1']['level2']['level3'] and (let's say) finds it is a dict, and passes that on to print.
Since the_object.the_dict is a dict, the rest is out of the_object's control. As you burrow down through level1, level2, and level3, only the type of object returned by the_object.the_dict['level1']['level2']['level3'] is going to affect how print behaves. the_object's __str__ method is not going to affect anything beyond the_object itself.
Moreover, when printing nested objects, pprint.pformat uses the repr of the object, not str of the object.
So to get the behavior we want, we need the_object.the_dict['level1']['level2']['level3'] to evaluate to something like a dict but with a different __repr__...
You could make a dict-like object (e.g. Turtle) and use Turtles all the way down:
import collections
import pprint

class Turtle(collections.MutableMapping):
    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
    def __getitem__(self, key):
        return self._data[key]
    def __setitem__(self, key, value):
        self._data[key] = value
    def __delitem__(self, key):
        del self._data[key]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)
    def __contains__(self, x):
        return x in self._data
    def __repr__(self):
        return pprint.pformat(self._data)

class MyObject(object):
    def __init__(self):
        self.the_dict = Turtle()
    def __repr__(self):
        return repr(self.the_dict)

the_object = MyObject()
the_object.the_dict['level1'] = Turtle()
the_object.the_dict['level1']['level2'] = Turtle()
the_object.the_dict['level1']['level2']['level3'] = Turtle({i: i for i in range(20)})

print(the_object)
print(the_object.the_dict['level1']['level2']['level3'])
To use this, you must replace all dicts in your nested dict structure with Turtles.
But really (as you can tell from my fanciful naming), I don't really expect you to use Turtles. Dicts are such nice, optimized builtins, I would not want to add this intermediate object just to effect pretty printing.
If instead you can convince your users to type
from pprint import pprint
then they can just use
pprint(the_object.the_dict['level1']['level2']['level3'])
to get pretty printing.
You can convert the underlying dictionaries to "pretty printing dictionaries"; perhaps something like this will do:
class my_object(object):
    _pp = pprint.PrettyPrinter()

    class PP_dict(dict):
        def __setitem__(self, key, value):
            if isinstance(value, dict):
                value = my_object.PP_dict(value)
            super(my_object.PP_dict, self).__setitem__(key, value)
        def __str__(self):
            return my_object._pp.pformat(self)

    @property
    def the_dict(self):
        return self.__dict__['the_dict']

    @the_dict.setter
    def the_dict(self, value):
        self.__dict__['the_dict'] = my_object.PP_dict(value)
The property is only there because I don't know how you set/manipulate "the_dict".
This approach is limited -- for instance, if you put dict-derivatives that are not dicts in the_dict, they will be replaced by PP_dict. Also, if you have other references to these subdicts, they will no longer be pointing to the same objects.
Another approach would be to put a __getitem__ in my_object directly, which returns a proxy wrapper for the dictionary that pretty-prints the current object in __str__, overrides __getitem__ to return proxies for subobjects, and otherwise forwards all access/manipulation to the wrapped class.
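That proxy idea might look like this sketch (the class name is mine; it assumes the wrapped object is a plain dict):

```python
import pprint

class PrettyProxy:
    """Wraps a mapping: __str__ pretty-prints it, and item access
    wraps sub-dicts so the behaviour continues all the way down."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        value = self._data[key]
        # Only wrap sub-dicts; leaves come back unchanged
        return PrettyProxy(value) if isinstance(value, dict) else value

    def __len__(self):
        return len(self._data)

    def __getattr__(self, name):
        # Forward everything else (keys, items, get, ...) to the wrapped dict
        return getattr(self._data, name)

    def __str__(self):
        return pprint.pformat(self._data)
```

Unlike the PP_dict approach, the original nested dicts are untouched; the proxies are created lazily on access, so other references to the subdicts stay valid.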

Python OrderedSet with .index() method

Does anyone know about a fast OrderedSet implementation for python that:
remembers insertion order
has an index() method (like the one lists offer)
All implementations I found are missing the .index() method.
You can always add it in a subclass. Here is a basic implementation for the OrderedSet you linked in a comment:
class IndexOrderedSet(OrderedSet):
    def index(self, elem):
        if elem in self.map:
            return next(i for i, e in enumerate(self) if e == elem)
        else:
            raise KeyError("That element isn't in the set")
You mentioned you only need add, index, and in-order iteration. You can get this by using an OrderedDict as storage. As a bonus, you can subclass the collections.Set abstract class to get the other set operations frozensets support:
from itertools import count, izip
from collections import OrderedDict, Set

class IndexOrderedSet(Set):
    """An OrderedFrozenSet-like object
    Allows constant time 'index'ing
    But doesn't allow you to remove elements"""
    def __init__(self, iterable=()):
        self.num = count()
        self.dict = OrderedDict(izip(iterable, self.num))
    def add(self, elem):
        if elem not in self:
            self.dict[elem] = next(self.num)
    def index(self, elem):
        return self.dict[elem]
    def __contains__(self, elem):
        return elem in self.dict
    def __len__(self):
        return len(self.dict)
    def __iter__(self):
        return iter(self.dict)
    def __repr__(self):
        return 'IndexOrderedSet({})'.format(self.dict.keys())
You can't subclass collections.MutableSet because you can't support removing elements from the set and keep the indexes correct.
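On Python 3.7+, insertion order of plain dicts is guaranteed, so the same append-only design needs neither OrderedDict nor izip. A minimal sketch (class layout is mine):

```python
class IndexOrderedSet:
    """Append-only ordered set; index() is O(1) via a value -> position dict."""

    def __init__(self, iterable=()):
        self._index = {}
        for elem in iterable:
            self.add(elem)

    def add(self, elem):
        if elem not in self._index:
            # Position equals the number of elements added so far
            self._index[elem] = len(self._index)

    def index(self, elem):
        return self._index[elem]   # raises KeyError if absent

    def __contains__(self, elem):
        return elem in self._index

    def __len__(self):
        return len(self._index)

    def __iter__(self):
        return iter(self._index)
```

As before, removal is unsupported on purpose, since deleting an element would invalidate every later index.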

Python: Defining a class with only Integers Defined

I am defining a class where only a set of integers is used.
I cannot use the following datatypes in defining my class: set, frozenset and dictionaries.
I need help defining:
remove(self, i): integer i is removed from the set. An exception is raised if i is not in self.
discard(self, i): integer i is removed from the set. No exception is raised if i is not in self.
Assuming you are using an internal list based on what you've said, you could do it like so:
class Example(object):
    def __init__(self):
        self._list = list()
    # all your other methods here...
    def remove(self, i):
        try:
            self._list.remove(i)
        except ValueError:
            raise ValueError("i is not in the set.")
    def discard(self, i):
        try:
            self._list.remove(i)
        except ValueError:
            pass
remove() tries to remove the element and catches the list's ValueError so it can throw its own. discard() does the same but instead does nothing if a ValueError occurs.
I cannot use the following datatypes in defining my class: set, frozenset and dictionaries.
It looks like you are going to use list.
You can use list's remove method and handle exceptions in appropriate way.
Here's a highly inefficient but complete implementation using the MutableSet ABC:
import collections

class MySet(collections.MutableSet):
    def __init__(self, iterable=tuple()):
        self._items = []
        for value in iterable:
            self.add(value)
    def discard(self, value):
        try:
            self._items.remove(value)
        except ValueError:
            pass
    def add(self, value):
        if value not in self:
            self._items.append(value)
    def __iter__(self):
        return iter(self._items)
    def __len__(self):
        return len(self._items)
    def __contains__(self, value):
        return value in self._items
From collections.MutableSet source:
def remove(self, value):
    if value not in self:
        raise KeyError(value)
    self.discard(value)
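On Python 3 the ABC lives in collections.abc; here is the same class there, showing the inherited remove in action (a sketch, restated so it runs standalone):

```python
from collections.abc import MutableSet

class MySet(MutableSet):
    def __init__(self, iterable=()):
        self._items = []
        for value in iterable:
            self.add(value)

    def discard(self, value):
        # No exception if the value is absent
        try:
            self._items.remove(value)
        except ValueError:
            pass

    def add(self, value):
        if value not in self:
            self._items.append(value)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __contains__(self, value):
        return value in self._items
```

Only add, discard, __iter__, __len__ and __contains__ are written by hand; remove (raising KeyError), pop, clear and the set operators come from the ABC mixins.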
Here is something I did to handle duplicates; take some ideas from it:
combList = list1 + list2
combList.sort()
last = combList[-1]
for i in range(len(combList)-2, -1, -1):
    if last == combList[i]:
        del combList[i]
    else:
        last = combList[i]
combList.sort()
for i in range(len(combList)):
    print i+1, combList[i]
I totally agree with LiOliQ: the only way is to do it as a list.
