Python OrderedSet with .index() method - python

Does anyone know of a fast OrderedSet implementation for Python that remembers insertion order and has an index() method (like the one lists offer)?
All implementations I found are missing the .index() method.

You can always add it in a subclass. Here is a basic implementation for the OrderedSet you linked in a comment:
class IndexOrderedSet(OrderedSet):
    def index(self, elem):
        # self.map is assumed to be the linked OrderedSet's internal lookup dict
        if elem in self.map:
            return next(i for i, e in enumerate(self) if e == elem)
        else:
            raise KeyError("That element isn't in the set")
You mentioned you only need add, index, and in-order iteration. You can get this by using an OrderedDict as storage. As a bonus, you can subclass the collections.abc.Set abstract class (plain collections.Set on Python 2) to get the other set operations frozensets support:
from itertools import count
from collections import OrderedDict
from collections.abc import Set
class IndexOrderedSet(Set):
    """An OrderedFrozenSet-like object
    Allows constant time 'index'ing
    But doesn't allow you to remove elements"""
    def __init__(self, iterable=()):
        self.num = count()
        self.dict = OrderedDict()
        # Go through add() so duplicates in the input don't use up index values
        for elem in iterable:
            self.add(elem)
    def add(self, elem):
        if elem not in self:
            self.dict[elem] = next(self.num)
    def index(self, elem):
        return self.dict[elem]
    def __contains__(self, elem):
        return elem in self.dict
    def __len__(self):
        return len(self.dict)
    def __iter__(self):
        return iter(self.dict)
    def __repr__(self):
        return 'IndexOrderedSet({})'.format(list(self.dict))
You can't subclass collections.abc.MutableSet because removing an element would shift the position of every later element, so the stored indexes would no longer be correct.
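On Python 3.7 and later a plain dict also preserves insertion order, so the same idea can be sketched without OrderedDict. This is only a rough sketch: the class name IndexedSet is made up for illustration, and it assumes elements are only ever added.

```python
from collections.abc import Set


class IndexedSet(Set):
    """Add-only ordered set (hypothetical name; assumes Python 3.7+ dict ordering)."""

    def __init__(self, iterable=()):
        self._pos = {}  # element -> insertion index
        for elem in iterable:
            self.add(elem)

    def add(self, elem):
        # Nothing is ever removed, so the current size is the next free index
        self._pos.setdefault(elem, len(self._pos))

    def index(self, elem):
        return self._pos[elem]  # raises KeyError for a missing element

    def __contains__(self, elem):
        return elem in self._pos

    def __iter__(self):
        return iter(self._pos)

    def __len__(self):
        return len(self._pos)


s = IndexedSet("abca")
s.add("d")
print(list(s), s.index("c"), s.index("d"))  # ['a', 'b', 'c', 'd'] 2 3
print(list(s & IndexedSet("cdx")))          # ['c', 'd'], via the Set mixin
```

The & operator comes from the Set ABC, which builds the result by calling the class with an iterable, so the constructor has to accept one.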

Related

Should I implement __contains__ method when implementing Mapping or MutableMapping in Python?

I wrote a custom dictionary backed by a list.
According to the collections.abc module, I only need to implement __getitem__, __iter__, and __len__ when implementing Mapping, and I did.
But an in test, which uses __contains__ internally, seems to always return True.
Should I write a __contains__ method too?
from collections.abc import Mapping
class ListDict(Mapping):
    def __init__(self):
        self._data = []  # [(key, value)]
    def __getitem__(self, item):
        for k, v in self._data:
            if k == item:
                return v
    def __iter__(self):
        return iter(e[0] for e in self._data)
    def __len__(self):
        return len(self._data)
d = ListDict()
print('key' in d)  # True
You forgot to handle the key-not-found case in your __getitem__. The provided __contains__ implementation relies on __getitem__ raising a KeyError for not-present keys. Your __getitem__ is broken, so the provided __contains__ breaks too.
Raise a KeyError:
def __getitem__(self, item):
    for k, v in self._data:
        if k == item:
            return v
    raise KeyError(item)
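To see the effect end to end, here is a small self-contained sketch of the fixed class. The pairs constructor argument and the sample data are assumptions added for the demo; the inherited __contains__ and get() both work by calling __getitem__ and catching KeyError.

```python
from collections.abc import Mapping


class ListDict(Mapping):
    def __init__(self, pairs=()):  # 'pairs' is a demo-only addition
        self._data = list(pairs)   # [(key, value)]

    def __getitem__(self, item):
        for k, v in self._data:
            if k == item:
                return v
        raise KeyError(item)  # the inherited __contains__ and get() rely on this

    def __iter__(self):
        return (k for k, _ in self._data)

    def __len__(self):
        return len(self._data)


d = ListDict([("a", 1)])
print("a" in d, "key" in d, d.get("key", 0))  # True False 0
```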

Python iterable collection that wraps a dictionary

I'm trying to modify the following code so that MyCollection wraps a dictionary. I still have to implement the __iter__ and __next__ methods in order to have the "for element in collection" functionality. I know that can be easily done by iterating through the values, but I am required to do it like this. Can someone help me?
class MyCollection:
    def __init__(self):
        self._data = []  # should be {}
    def __iter__(self):
        '''
        Return an iterator
        '''
        self._iterPoz = 0
        return self
    def __next__(self):
        '''
        Returns the next element of the iteration
        '''
        if self._iterPoz >= len(self._data):
            raise StopIteration()
        rez = self._data[self._iterPoz]
        self._iterPoz = self._iterPoz + 1
        return rez
This begins with a design decision. When you iterate MyCollection, what data do you want? If it's the values of the contained dictionary, you can return the dictionary's own values iterator, and then you don't need to implement __next__ at all.
class MyCollection:
    def __init__(self):
        self._data = {}
    def __iter__(self):
        '''
        Return an iterator of contained values
        '''
        return iter(self._data.values())
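If the exercise really does require both __iter__ and __next__ on the class itself, one possible approach is to snapshot the keys when iteration starts and then walk them by position, much like the original list version. This is a sketch under assumptions: the add method and the _keys attribute are hypothetical helpers that the question does not show.

```python
class MyCollection:
    def __init__(self):
        self._data = {}

    def add(self, key, value):  # hypothetical helper, not in the original
        self._data[key] = value

    def __iter__(self):
        # Snapshot the keys so positions stay stable for this loop
        self._keys = list(self._data)
        self._iterPoz = 0
        return self

    def __next__(self):
        if self._iterPoz >= len(self._keys):
            raise StopIteration
        rez = self._data[self._keys[self._iterPoz]]
        self._iterPoz += 1
        return rez


c = MyCollection()
c.add("x", 10)
c.add("y", 20)
print([v for v in c])  # [10, 20]
```

Because the object is its own iterator, two nested loops over the same collection would share one position counter. Returning a separate iterator, as in the answer above, avoids that.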

Custom key function for python defaultdict

What is a good way to define a custom key function analogous to the key argument to list.sort, for use in a collections.defaultdict?
Here's an example use case:
import collections
class Path(object):
    def __init__(self, start, end, *other_features):
        self._first = start
        self._last = end
        self._rest = other_features
    def startpoint(self):
        return self._first
    def endpoint(self):
        return self._last
    # Maybe it has __eq__ and __hash__, maybe not
paths = [... a list of Path objects ...]
by_endpoint = collections.defaultdict(list)
for p in paths:
    by_endpoint[p.endpoint()].append(p)
# do stuff that depends on lumping paths with the same endpoint together
What I desire is a way to tell by_endpoint to use Path.endpoint as the key function, similar to the key argument to list.sort, and not have to put this key definition into the Path class itself (via __eq__ and __hash__), since it is just as sensible to also support "lumping by start point" as well.
Something like this maybe:
from collections import defaultdict
class defaultkeydict(defaultdict):
    def __init__(self, default_factory, key=lambda x: x, *args, **kwargs):
        defaultdict.__init__(self, default_factory, *args, **kwargs)
        self.key_func = key
    def __getitem__(self, key):
        return defaultdict.__getitem__(self, self.get_key(key))
    def __setitem__(self, key, value):
        defaultdict.__setitem__(self, self.get_key(key), value)
    def get_key(self, key):
        try:
            return self.key_func(key)
        except Exception:
            return key
Note the logic that falls back to the passed-in key if the key function can't be executed. That way you can still access the items using strings or whatever keys.
Now:
p = Path("Seattle", "Boston")
d = defaultkeydict(list, key=lambda x: x.endpoint())
d[p].append(p)
print(d) # defaultdict(<type 'list'>, {'Boston': [<__main__.Path object at ...>]})
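If a dict subclass is more machinery than the use case needs, the key function can instead be applied at insertion time. The sketch below assumes a hypothetical helper named group_by, a trimmed-down Path, and made-up sample paths. It is one possible alternative, not a replacement for the class above.

```python
from collections import defaultdict


def group_by(items, key):
    """Hypothetical helper: bucket items by key(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return groups


class Path:
    def __init__(self, start, end):
        self._first = start
        self._last = end

    def startpoint(self):
        return self._first

    def endpoint(self):
        return self._last


# Made-up sample data
paths = [Path("Seattle", "Boston"), Path("Austin", "Boston"), Path("Seattle", "Denver")]
by_endpoint = group_by(paths, key=Path.endpoint)
by_startpoint = group_by(paths, key=Path.startpoint)
print(len(by_endpoint["Boston"]), len(by_startpoint["Seattle"]))  # 2 2
```

The same list can be grouped by start point or by end point without Path defining __eq__ or __hash__.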

Subclass Python list to Validate New Items

I want a python list which represents itself externally as an average of its internal list items, but otherwise behaves as a list. It should raise a TypeError if an item is added that can't be cast to a float.
The part I'm stuck on is raising TypeError. It should be raised for invalid items added via any list method, like .append, .extend, +=, setting by slice, etc.
Is there a way to intercept new items added to the list and validate them?
I tried re-validating the whole list in __getattribute__, but when it's called I only have access to the old version of the list. It also doesn't get called on initialization, for operators like +=, or for slices like mylist[0] = 5.
Any ideas?
Inherit from MutableSequence and implement the methods it requires, plus any others that fall outside the scope of a sequence alone, like the operators here. That lets you control how items get into the list, while the ABC supplies iteration, in checks, and the other list-like methods on top of the ones you write.
If you want to check for slices, you need to test isinstance(key, slice) in your __getitem__ (and/or __setitem__) methods. Note that myList[0] is a single-index request, not a slice, while myList[:0] is an actual slice request.
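A tiny probe shows the difference between the two kinds of key. The Probe class is purely illustrative and does nothing except report what __getitem__ receives.

```python
class Probe:
    """Hypothetical class that only reports what kind of key it received."""

    def __getitem__(self, key):
        if isinstance(key, slice):
            return ("slice", key.start, key.stop, key.step)
        return ("index", key)


p = Probe()
print(p[0])      # ('index', 0)
print(p[:0])     # ('slice', None, 0, None)
print(p[1:5:2])  # ('slice', 1, 5, 2)
```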
The array.array class will take care of the float part:
import array
import math
class AverageList(array.array):
    def __new__(cls, *args, **kw):
        # typecode 'd' stores C doubles, so non-numeric items are rejected
        return array.array.__new__(cls, 'd')
    def __init__(self, values=()):
        self.extend(values)
    def __repr__(self):
        if not len(self): return 'Empty'
        return repr(math.fsum(self)/len(self))
And some tests:
>>> s = AverageList([1,2])
>>> s
1.5
>>> s.append(9)
>>> s
4.0
>>> s.extend('lol')
Traceback (most recent call last):
  File "<pyshell#117>", line 1, in <module>
    s.extend('lol')
TypeError: a float is required
Actually the best answer may be: don't.
Checking all objects as they get added to the list will be computationally expensive. What do you gain by doing those checks? It seems to me that you gain very little, and I'd recommend against implementing it.
Python doesn't check types, and so trying to have a little bit of type checking for one object really doesn't make a lot of sense.
In Python 2 there are 7 methods of the list class that add elements to the list and would have to be checked. Python 3 has no __setslice__ (slice assignment goes through __setitem__), so the version below handles that case inside __setitem__. Here's one compact implementation:
def check_float(x):
    try:
        float(x)
    except (TypeError, ValueError):
        raise TypeError("Cannot add %s to AverageList" % str(x))
def modify_method(f, which_arg=0, takes_list=False):
    def new_f(*args):
        # args[0] is self, so the argument to check sits at which_arg + 1
        if takes_list:
            # A plain loop: map() is lazy on Python 3 and would check nothing
            for item in args[which_arg + 1]:
                check_float(item)
        else:
            check_float(args[which_arg + 1])
        return f(*args)
    return new_f
class AverageList(list):
    append = modify_method(list.append)
    extend = modify_method(list.extend, takes_list=True)
    insert = modify_method(list.insert, 1)
    __add__ = modify_method(list.__add__, takes_list=True)
    __iadd__ = modify_method(list.__iadd__, takes_list=True)
    def __setitem__(self, key, value):
        if isinstance(key, slice):
            value = list(value)
            for item in value:
                check_float(item)
        else:
            check_float(value)
        list.__setitem__(self, key, value)
The general approach would be to create your own class inheriting from list and overriding the specific methods like append, extend, etc. This will probably also include magic methods of the Python list (see this article for details: http://www.rafekettler.com/magicmethods.html#sequence).
For validation, you will need to override __setitem__(self, key, value).
Here's how to create a subclass using the MutableSequence abstract base class in the collections.abc module as its base class (not fully tested -- an exercise for the reader ;-):
import collections.abc
class AveragedSequence(collections.abc.MutableSequence):
    def _validate(self, x):
        try: return float(x)
        except (TypeError, ValueError): raise TypeError("Can't add {} to AveragedSequence".format(x))
    def average(self): return sum(self._list) / len(self._list)
    def __init__(self, arg): self._list = [self._validate(v) for v in arg]
    def __repr__(self): return 'AveragedSequence({!r})'.format(self._list)
    def __setitem__(self, i, value): self._list[i] = self._validate(value)
    def __delitem__(self, i): del self._list[i]
    def insert(self, i, value): return self._list.insert(i, self._validate(value))
    def __getitem__(self, i): return self._list[i]
    def __len__(self): return len(self._list)
    def __iter__(self): return iter(self._list)
    def __contains__(self, item): return item in self._list
if __name__ == '__main__':
    avgseq = AveragedSequence(range(10))
    print(avgseq)
    print(avgseq.average())
    avgseq[2] = 3
    print(avgseq)
    print(avgseq.average())
    # ..etc
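The reason the ABC route covers append, extend, and += without extra work is that MutableSequence implements those mixins in terms of insert. The sketch below uses a hypothetical FloatList class, written only to show that behaviour, with a slice branch added to __setitem__ as an assumption about how slice assignment might be validated.

```python
from collections.abc import MutableSequence


class FloatList(MutableSequence):
    """Hypothetical minimal list that stores only float-convertible items."""

    def __init__(self, values=()):
        self._list = []
        self.extend(values)  # extend() is a mixin built on insert()

    @staticmethod
    def _validate(x):
        try:
            return float(x)
        except (TypeError, ValueError):
            raise TypeError("Can't add {!r} to FloatList".format(x))

    def __getitem__(self, i):
        return self._list[i]

    def __setitem__(self, i, value):
        if isinstance(i, slice):
            self._list[i] = [self._validate(v) for v in value]
        else:
            self._list[i] = self._validate(value)

    def __delitem__(self, i):
        del self._list[i]

    def __len__(self):
        return len(self._list)

    def insert(self, i, value):
        self._list.insert(i, self._validate(value))


nums = FloatList([1, 2])
nums.append("3.5")    # mixin, goes through insert()
nums += [4]           # mixin __iadd__, goes through extend()
nums[0:1] = [10, 20]  # slice assignment, validated in __setitem__
print(list(nums))     # [10.0, 20.0, 2.0, 3.5, 4.0]
for bad in ("lol", None):
    try:
        nums.append(bad)
    except TypeError as exc:
        print("rejected:", exc)
```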

Python: Defining a class with only Integers Defined

I am defining a class where only a set of integers is used.
I cannot use the following datatypes in defining my class: set, frozenset and dictionaries.
I need help defining:
remove(self,i): Integer i is removed from the set. An exception is raised if i is not in self.
discard(self, i): integer i is removed from the set. No exception is raised if i is not in self
Assuming you are using an internal list based on what you've said, you could do it like so:
class Example(object):
    def __init__(self):
        self._list = list()
    # all your other methods here...
    def remove(self, i):
        try:
            self._list.remove(i)
        except ValueError:
            raise ValueError("i is not in the set.")
    def discard(self, i):
        try:
            self._list.remove(i)
        except ValueError:
            pass
remove() tries to remove the element and catches the list's ValueError so it can throw its own. discard() does the same but instead does nothing if a ValueError occurs.
I cannot use the following datatypes in defining my class: set, frozenset and dictionaries.
It looks like you are going to use list.
You can use list's remove method and handle exceptions in an appropriate way.
Here's a highly inefficient but complete implementation using the MutableSet ABC:
import collections.abc
class MySet(collections.abc.MutableSet):
    def __init__(self, iterable=tuple()):
        self._items = []
        for value in iterable:
            self.add(value)
    def discard(self, value):
        try: self._items.remove(value)
        except ValueError:
            pass
    def add(self, value):
        if value not in self:
            self._items.append(value)
    def __iter__(self):
        return iter(self._items)
    def __len__(self):
        return len(self._items)
    def __contains__(self, value):
        return value in self._items
From collections.MutableSet source:
def remove(self, value):
    if value not in self:
        raise KeyError(value)
    self.discard(value)
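A short check of that inherited behaviour, using a hypothetical list-backed ListSet that defines only the five abstract methods. remove() and the | operator both come from the ABC.

```python
from collections.abc import MutableSet


class ListSet(MutableSet):
    """Hypothetical list-backed set; only the abstract methods are written."""

    def __init__(self, iterable=()):
        self._items = []
        for value in iterable:
            self.add(value)

    def __contains__(self, value):
        return value in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, value):
        if value not in self._items:
            self._items.append(value)

    def discard(self, value):
        if value in self._items:
            self._items.remove(value)


s = ListSet([3, 1, 3, 2])
s.discard(99)    # missing element, silently ignored
s.remove(1)      # inherited from MutableSet
try:
    s.remove(1)  # already gone, so the inherited remove() raises
except KeyError as exc:
    print("KeyError:", exc)                 # KeyError: 1
print(sorted(s), sorted(s | ListSet([7])))  # [2, 3] [2, 3, 7]
```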
Here is something I did to remove duplicates; take some ideas from it:
combList = list1 + list2
combList.sort()
last = combList[-1]
# Walk backwards so deleting an item doesn't shift the indexes still to visit
for i in range(len(combList)-2, -1, -1):
    if last == combList[i]:
        del combList[i]
    else:
        last = combList[i]
combList.sort()
for i in range(len(combList)):
    print(i+1, combList[i])
I totally agree with LiOliQ; the only way is to do it as a list.
