Slicing converts UserList to list - python

While implementing a custom list (via UserList) I noticed that all slicing operations return a plain list, not the derived class type. This creates an issue: after slicing, none of the added functionality is available on the object. Here is a quick test program to demonstrate the issue; just note that the actual code is more complicated.
#!/usr/bin/python3
from collections import UserList

class myList(UserList):
    def __init__(self, data=None):
        super().__init__(data)

    def setFunc(self, data):
        self.data.extend(data)

    def getFunc(self):
        return self.data

l1 = myList()
l1.setFunc([1, 2, 3, 4])
print(type(l1))

l2 = l1[:3]
print(type(l2))
print(l2.getFunc())
<class '__main__.myList'>
<class 'list'>
Traceback (most recent call last):
  File "./test.py", line 17, in <module>
    print(l2.getFunc())
AttributeError: 'list' object has no attribute 'getFunc'
I can work around this by "casting" the slice with l2 = myList(l1[:3]), but it seems like the right solution would be to implement this behavior directly in myList.
I'm not certain of the correct/most elegant way to do this. I suspect wrapping the result in __getitem__ would work. Is that the best way, or is there a more direct change to the slicing that would be preferred? Also, what other methods should I override to ensure all operations return a myList rather than a list?

I'm not sure why this isn't the default behavior in UserList, but implementing the following in the derived class seems to fix the issue.
def __getitem__(self, i):
    new_data = self.data[i]
    if type(new_data) == list:
        # slicing produced a plain list: wrap it in the derived class
        return self.__class__(new_data)
    else:
        # integer indexing produced a single element: return it as-is
        return new_data
The parameter i to __getitem__ can be either a slice object or an integer, so new_data will be either a list or a single element. If it's a list, wrap it in the myList container and return that; otherwise, just pass the single element back.
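
With this override in place, the test program from the question behaves as expected:

l1 = myList()
l1.setFunc([1, 2, 3, 4])

l2 = l1[:3]
print(type(l2))      # <class '__main__.myList'>
print(l2.getFunc())  # [1, 2, 3]
print(type(l1[0]))   # <class 'int'> -- integer indexing still returns the element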

Related

Is there a reason why something like `list[]` raises `SyntaxError` in Python?

Let's say that I want to implement my custom list class, and I want to override __getitem__ so that the item parameter can be initialized with a default None, and behave accordingly:
class CustomList(list):
    def __init__(self, iterable, default_index):
        self.default_index = default_index
        super().__init__(iterable)

    def __getitem__(self, item=None):
        if item is None:
            item = self.default_index
        return super().__getitem__(item)
iterable = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
my_list = CustomList(iterable, 2)
This allows for my_list[None], but it would be awesome to have something like my_list[] inherently use the default argument.
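For clarity, the workaround in action with the my_list defined above:

print(my_list[None])  # 3 -- the element at default_index 2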
Unfortunately that raises SyntaxError, so I'm assuming the statement is illegal at the grammar level... my question is: why? Would it conflict with some other statements?
I'm very curious about this, so thanks a bunch to anyone willing to explain!
It's not syntactically useful. There isn't a useful way to programmatically use my_list[] without literally hard-coding it as such. A single piece of code can't sometimes have a variable inside the brackets and other times not. In that case, why not just have a separate property that gets the default?
@property
def default(self):
    return super().__getitem__(self.default_index)

@default.setter
def default(self, val):
    super().__setitem__(self.default_index, val)
The __getitem__(self, val) protocol is defined to take one required positional argument. Python is dynamic, so you can get away with changing that call signature, but that doesn't change how all the other code calls it.
All Python operators have a magic method behind them, and it's always the case that the magic method could expose more features than the operator does. Why not let + have a default, so that a = b + would be legal? Once again, that would not be syntactically useful: you could just expose a function if you wanted that behavior.
__getitem__ always takes exactly one argument. You can sort of pass multiple arguments, but Python actually just packs them into a tuple:
>>> a = []
>>> a[1, 2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not tuple
Note the "not tuple" in the error message.
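A tiny demonstration class (the name Show is made up purely for illustration) makes the tuple packing visible:

class Show:
    def __getitem__(self, key):
        # whatever appears between the brackets arrives as a single object
        return key

s = Show()
print(s[1, 2])      # (1, 2) -- two "arguments" packed into one tuple
print(s[1:3, ...])  # (slice(1, 3, None), Ellipsis)

This is exactly the mechanism NumPy uses for multi-dimensional indexing like arr[i, j].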

Custom Indexing Python Data Structure

I have a class that wraps Python's deque from collections. When I create a deque x = deque() and want to reference the first element:
In[78]: x[0]
Out[78]: 0
My question is: how can I use [] for indexing with the following example wrapper?
class deque_wrapper:
    def __init__(self):
        self.data_structure = deque()

    def newCustomAddon(self):
        return len(self.data_structure)

    def __repr__(self):
        return repr(self.data_structure)
I.e., continuing the example above:
In [75]: x[0]
TypeError: 'deque_wrapper' object does not support indexing
I want to customize my own indexing; is that possible?
You want to implement the __getitem__ method:
class DequeWrapper:
    def __init__(self):
        self.data_structure = deque()

    def newCustomAddon(self):
        return len(self.data_structure)

    def __repr__(self):
        return repr(self.data_structure)

    def __getitem__(self, index):
        # etc
Whenever you do my_obj[x], Python will actually call my_obj.__getitem__(x).
You may also want to consider implementing the __setitem__ method, if applicable. (When you write my_obj[x] = y, Python will actually run my_obj.__setitem__(x, y).)
The Python data model documentation has more information on which methods you need to implement in order to build custom data structures in Python.
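
For instance, a minimal delegating implementation (a sketch that fills in the body left as # etc above) might look like this:

from collections import deque

class DequeWrapper:
    def __init__(self):
        self.data_structure = deque()

    def __getitem__(self, index):
        # delegate indexing to the wrapped deque
        return self.data_structure[index]

    def __setitem__(self, index, value):
        self.data_structure[index] = value

    def __repr__(self):
        return repr(self.data_structure)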

extending built-in python dict class

I want to create a class that would extend dict's functionalities. This is my code so far:
class Masks(dict):
    def __init__(self, positive=[], negative=[]):
        self['positive'] = positive
        self['negative'] = negative
I want to have two predefined arguments in the constructor: a list of positive and a list of negative masks. With the code above I can run
m = Masks()
and a new masks dictionary object is created - that's fine. But I'd like to be able to create these mask objects just like I can with dicts:
d = dict(one=1, two=2)
But this fails with Masks:
>>> n = Masks(one=1, two=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() got an unexpected keyword argument 'two'
I should probably call the parent's __init__ somewhere in Masks.__init__. I tried it with **kwargs, passing them on to the parent constructor, but something still went wrong. Could someone point out what I should add here?
You must call the superclass __init__ method. And if you want to be able to use the Masks(one=1, ..) syntax then you have to use **kwargs:
In [1]: class Masks(dict):
   ...:     def __init__(self, positive=(), negative=(), **kwargs):
   ...:         super(Masks, self).__init__(**kwargs)
   ...:         self['positive'] = list(positive)
   ...:         self['negative'] = list(negative)
   ...:
In [2]: m = Masks(one=1, two=2)
In [3]: m['one']
Out[3]: 1
A general note: do not subclass built-ins!!!
It seems an easy way to extend them, but it has a lot of pitfalls that will bite you at some point.
A safer way to extend a built-in is delegation, which gives better control over the class's behaviour and avoids many of the pitfalls of inheriting from built-ins. (Note that by implementing __getattr__ you can avoid explicitly reimplementing many methods; see the sketch below.)
Inheritance should be used only as a last resort, e.g. when you need to pass the object to code that does explicit isinstance checks.
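A minimal sketch of the delegation approach (the class name MaskDict is made up for illustration):

class MaskDict:
    def __init__(self, positive=(), negative=(), **kwargs):
        self._data = dict(**kwargs)
        self._data['positive'] = list(positive)
        self._data['negative'] = list(negative)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getattr__(self, name):
        # forward everything else (get, keys, items, ...) to the wrapped dict
        return getattr(self._data, name)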
Since all you want is a regular dict with predefined entries, you can use a factory function.
def mask(*args, **kw):
    """Create mask dict using the same signature as dict(),
    defaulting 'positive' and 'negative' to empty lists.
    """
    d = dict(*args, **kw)
    d.setdefault('positive', [])
    d.setdefault('negative', [])
    return d
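
Usage then mirrors dict() exactly, with the two defaults filled in:

m = mask(one=1, two=2)
print(m['one'])       # 1
print(m['positive'])  # []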

How to implement an array-like property wrapper in python?

I have a class in Python that acts as a front end to a C library. This library performs simulations and handles very large arrays of data. It passes forward a ctypes array, and my wrapper converts it into a proper numpy.ndarray.
class SomeClass(object):
    @property
    def arr(self):
        return numpy.array(self._lib.get_arr())
However, to make sure memory problems don't occur, I keep the ndarray data separate from the library data, so changing the ndarray does not change the true array held by the library. I can, however, pass along a new array of the same shape and overwrite the library's held array.
@arr.setter
def arr(self, new_arr):
    self._lib.set_arr(new_arr.ctypes)
So, I can interact with the array like so:
x = SomeClass()
a = x.arr
a[0] += 1
x.arr = a
My desire is to simplify this even more by allowing the syntax to simply be x.arr[0] += 1, which would be more readable and use fewer variables. I am not exactly sure how to go about creating such a wrapper (I have very little experience writing wrapper classes/functions) that mimics properties but allows item access as in my example.
How would I go about making such a wrapper class? Is there a better way to accomplish this goal? If you have any advice or reading that could help I would appreciate it very much.
This could work. Array is a proxy for the NumPy/C array:
class Array(object):
    def __init__(self):
        # self._lib = ...
        self.np_array = numpy.array(self._lib.get_arr())

    def __getitem__(self, key):
        # refresh the local copy from the library before reading
        self.np_array = numpy.array(self._lib.get_arr())
        return self.np_array.__getitem__(key)

    def __setitem__(self, key, value):
        # write locally, then push the updated array back to the library
        self.np_array.__setitem__(key, value)
        self._lib.set_arr(self.np_array.ctypes)

    def __getattr__(self, name):
        """Delegate everything else to the NumPy array."""
        try:
            return getattr(self.np_array, name)
        except AttributeError:
            raise AttributeError(
                "'Array' object has no attribute {}".format(name))
Should behave like this:
>>> a = Array()
>>> a[1]
1
>>> a[1] = 10
>>> a[1]
10
The 10 should end up in your C array too.
I think your property should return an instance of a list-like class that knows about self._lib and updates it during normal operations: append, __setitem__, __getitem__, etc.
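
One way to wire such a proxy into the original class (a sketch, assuming the Array proxy above can reach the library handle):

class SomeClass(object):
    def __init__(self):
        self._arr_proxy = Array()  # assumes Array's __init__ sets up self._lib (elided above)

    @property
    def arr(self):
        return self._arr_proxy

x = SomeClass()
x.arr[0] += 1  # reads via __getitem__, then writes back via __setitem__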

Understanding python object membership for sets

If I understand correctly, an object's __cmp__() function is called to compare it against the objects in a collection when determining whether the object is a member of ('in') the collection.
However, this does not seem to be the case for sets:
class MyObject(object):
    def __init__(self, data):
        self.data = data

    def __cmp__(self, other):
        return self.data - other.data

a = MyObject(5)
b = MyObject(5)
print a in [b]       # evaluates to True, as I'd expect
print a in set([b])  # evaluates to False
How is an object membership tested in a set, then?
Adding a __hash__ method to your class yields this:
class MyObject(object):
    def __init__(self, data):
        self.data = data

    def __cmp__(self, other):
        return self.data - other.data

    def __hash__(self):
        return hash(self.data)

a = MyObject(5)
b = MyObject(5)
print a in [b]       # True
print a in set([b])  # Also True!
>>> xs = []
>>> set([xs])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
There you are. Sets use hashes, very similar to dicts. This helps performance enormously (membership tests are O(1), and many other operations depend on membership tests), and it also fits the semantics of sets well: set items must be unique, distinct items should produce different hashes, and equal hashes indicate (in theory, up to collisions) duplicates.
Since the default __hash__ is based on id() (which is rather crude, imho), two instances of a class that inherits object's __hash__ will never hash to the same value (well, unless the address space is larger than the size of the hash).
As others pointed out, your objects don't define __hash__, so they fall back to the default id-based hash. You can override it as Nathon suggested, BUT read the docs about __hash__, specifically the points about when you should and should not do that.
A set uses a dict behind the scenes, so the in check tests whether the object exists as a key in the dict. Since your object doesn't implement a hash function, the default hash for objects uses the object's id. So even though a and b are equivalent, they're not the same object, and that's what's being tested.
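
For reference, the same fix in Python 3, where __cmp__ no longer exists and __eq__/__hash__ must be defined together (defining __eq__ alone sets __hash__ to None):

class MyObject:
    def __init__(self, data):
        self.data = data

    def __eq__(self, other):
        return self.data == other.data

    def __hash__(self):
        # equal objects must hash equal
        return hash(self.data)

a, b = MyObject(5), MyObject(5)
print(a in [b])  # True
print(a in {b})  # True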
