Object is enumerable but not indexable? - python

Problem summary and question
I'm trying to look at some of the data inside an object that can be enumerated over but not indexed. I'm still newish to python, but I don't understand how this is possible.
If you can enumerate it, why can't you access the index through the same way enumerate does? And if not, is there a way to access the items individually?
The actual example
import tensorflow_datasets as tfds

train_validation_split = tfds.Split.TRAIN.subsplit([6, 4])
(train_data, validation_data), test_data = tfds.load(
    name="imdb_reviews",
    split=(train_validation_split, tfds.Split.TEST),
    as_supervised=True)
Take a select subset of the dataset:
foo = train_data.take(5)
I can iterate over foo with enumerate:
for i, x in enumerate(foo):
    print(i)
which generates the expected output:
0
1
2
3
4
But then, when I try to index into it foo[0] I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-44-2acbea6d9862> in <module>
----> 1 foo[0]
TypeError: 'TakeDataset' object does not support indexing

Python only allows these things if the class has methods for them:
__getitem__ is required for the [] syntax.
__iter__ and __next__1 are required to iterate.
Any class can define one without defining the other. __getitem__ is usually not defined when it would be inefficient, e.g. when an element can only be reached by walking through all the preceding ones.
1 __next__ is required on the class of the object returned by __iter__.
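A minimal sketch of such a type (the class name is illustrative) that defines __iter__ but not __getitem__, reproducing the behaviour above:

```python
class Countdown:
    """Iterable but not indexable: defines __iter__, not __getitem__."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # A generator function returns an iterator object, whose class
        # supplies the required __next__.
        n = self.start
        while n > 0:
            yield n
            n -= 1

c = Countdown(3)
print(list(enumerate(c)))   # enumeration works: [(0, 3), (1, 2), (2, 1)]
try:
    c[0]                    # indexing does not
except TypeError as e:
    print(e)                # 'Countdown' object is not subscriptable
```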

This is a result of foo being iterable, but not having a __getitem__ function. You can use itertools.islice to get the nth element of an iterable like so:
import itertools

def nth(iterable, n, default=None):
    "Returns the nth item or a default value"
    return next(itertools.islice(iterable, n, None), default)
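One caveat worth knowing: islice consumes the iterator it is given, so on a one-shot iterator successive calls to nth index into what remains, not into the original sequence. A self-contained sketch of the recipe:

```python
import itertools

def nth(iterable, n, default=None):
    "Returns the nth item or a default value."
    return next(itertools.islice(iterable, n, None), default)

print(nth(range(10), 3))              # → 3
print(nth(range(10), 99, 'missing'))  # → missing (out of range falls back)

# With a one-shot iterator, islice advances it as a side effect:
it = iter(range(10))
print(nth(it, 3))  # → 3 (consumes 0, 1, 2, returns 3)
print(nth(it, 3))  # → 7 (continues from 4: skips 4, 5, 6, returns 7)
```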

In Python, instances of custom classes support iteration through the special (or "dunder") method __iter__. This class presumably implements __iter__ but not __getitem__.
Dunder overview: https://dbader.org/blog/python-dunder-methods
Specs for an __iter__ method: https://docs.python.org/3/library/stdtypes.html#typeiter

Related

Is there a reason why something like `list[]` raises `SyntaxError` in Python?

Let's say that I want to implement my custom list class, and I want to override __getitem__ so that the item parameter can be initialized with a default None, and behave accordingly:
class CustomList(list):
    def __init__(self, iterable, default_index):
        self.default_index = default_index
        super().__init__(iterable)

    def __getitem__(self, item=None):
        if item is None:
            item = self.default_index
        return super().__getitem__(item)
iterable = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
my_list = CustomList(iterable, 2)
This allows for my_list[None], but it would be awesome to have something like my_list[] inherently use the default argument.
Unfortunately that raises SyntaxError, so I'm assuming that the statement is illegal at the grammar level...my question is: why? Would it conflict with some other statements?
I'm very curious about this, so thanks a bunch to anyone willing to explain!
It's not syntactically useful. There isn't a useful way to use my_list[] programmatically without literally hard-coding it as such. A single piece of code can't sometimes have a variable between the brackets and other times not. In that case, why not just have a separate property that gets the default?
@property
def default(self):
    return super().__getitem__(self.default_index)

@default.setter
def default(self, val):
    super().__setitem__(self.default_index, val)
object.__getitem__(self, val) is defined to have a required positional argument. Python is dynamic and so you can get away with changing that call signature, but that doesn't change how all the other code uses it.
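A self-contained sketch of that suggestion, building on the asker's CustomList (attribute names are illustrative):

```python
class CustomList(list):
    """list subclass exposing the default element as a property,
    rather than through a hypothetical my_list[] syntax."""
    def __init__(self, iterable, default_index):
        self.default_index = default_index
        super().__init__(iterable)

    @property
    def default(self):
        # Read the element at the default index.
        return self[self.default_index]

    @default.setter
    def default(self, val):
        # Write the element at the default index.
        self[self.default_index] = val

my_list = CustomList([1, 2, 3, 4, 5], 2)
print(my_list.default)   # → 3
my_list.default = 42
print(my_list)           # → [1, 2, 42, 4, 5]
```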
All python operators have a magic method behind them, and it's always the case that the magic method could expose more features than the operator. Why not let + have a default, so that a = b + would be legal? Once again, that would not be syntactically useful - you could just expose a function if you want to do that.
__getitem__ always takes exactly one argument. You can kind of pass multiple arguments, but Python actually just packs them into a tuple:
>>> a = []
>>> a[1, 2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not tuple
Note the "not tuple" in the error message.
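You can observe the tuple-packing directly with a throwaway class (the name is illustrative) whose __getitem__ simply returns whatever key it receives:

```python
class Probe:
    """Echo whatever key Python passes to __getitem__."""
    def __getitem__(self, key):
        return key

p = Probe()
print(p[1])       # → 1
print(p[1, 2])    # → (1, 2): comma-separated indices arrive as one tuple
print(p[1:2, 3])  # → (slice(1, 2, None), 3): slices mix in the same tuple
```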

How to get pyomo VarData/values from slice of indexed variable?

I'm working with pyomo variables indexed by multiple sets. I've created slices along some sets and would like to use these slices to access the variable values, given indices of the sliced-along sets.
Code that I hoped would work is:
from pyomo.environ import *

m = ConcreteModel()
m.time = Set(initialize=[1, 2, 3])
m.space = Set(initialize=[1, 2, 3])
m.comp = Set(initialize=['a', 'b', 'c'])
m.v = Var(m.time, m.space, m.comp, initialize=1)

slice = m.v[:, :, 'a']

for x in m.space:
    value_list = []
    for t in m.time:
        value_list.append(slice[t, x].value)
    # write value_list to csv file
But this gives me:
>>> value_list
[<pyomo.core.base.indexed_component_slice._IndexedComponent_slice object at 0x7f4db9104a58>, <pyomo.core.base.indexed_component_slice._IndexedComponent_slice object at 0x7f4db9104a58>, <pyomo.core.base.indexed_component_slice._IndexedComponent_slice object at 0x7f4db9104a58>]
instead of a list of values, as I hoped.
Is it possible to access the values corresponding to variable slices from only the wildcard indices?
I tried using some of the methods of _IndexedComponent_slice, without success. For example:
>>> for item in slice.wildcard_items(): item
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/rparker1/Idaes/pyomo/pyomo/core/base/indexed_component_slice.py", line 194, in <genexpr>
return ((_iter.get_last_index_wildcards(), _) for _ in _iter)
File "/home/rparker1/Idaes/pyomo/pyomo/core/base/indexed_component_slice.py", line 350, in __next__
_comp = _comp.__getitem__( _call[1] )
AttributeError: '_GeneralVarData' object has no attribute '__getitem__'
I would expect some method to give me a dictionary mapping wildcard indices to vardata objects, but could not find one. Any help finding such a dictionary or other solution is much appreciated.
_IndexedComponent_slice objects are a bit tricky, in that they are designed to work with hierarchical models. As such, they should be thought of as more of a special iterator and not as a view into a dictionary. In particular, these "slice-like" objects defer the resolution of __getitem__, __getattr__, and __call__ until iteration time. So, when you say slice.value, that attribute lookup doesn't actually occur until you iterate over the slice.
The easiest way to get the variable values is to iterate over the slice:
value_list = list(m.v[:, :, 'a'].value)
If you want a new component that you can treat in a dictionary-like manner (just like the original Var), then you want to create a Reference component using the slice:
r = Reference(m.v[:, :, 'a'])
These can be attached to a model like any other component, and (for regular slices) will adopt the ctype of the referred-to objects (so in this case, r will look and act just like a Var).

Slicing converts UserList to list

While implementing a custom list (via UserList) I noticed that all slicing operations return a type of list, not of the derived class type. This creates an issue that, after slicing, none of the added functionality is available in the object. Here is a quick test program to demonstrate the issue; just note that the actual code is more complicated.
#!/usr/bin/python3
from collections import UserList

class myList(UserList):
    def __init__(self, data=None):
        super().__init__(data)

    def setFunc(self, data):
        self.data.extend(data)

    def getFunc(self):
        return self.data

l1 = myList()
l1.setFunc([1, 2, 3, 4])
print(type(l1))
l2 = l1[:3]
print(type(l2))
print(l2.getFunc())
<class '__main__.myList'>
<class 'list'>
Traceback (most recent call last):
File "./test.py", line 17, in <module>
print(l2.getFunc())
AttributeError: 'list' object has no attribute 'getFunc'
I can overcome this issue by "casting" the list with l2 = myList(l1[:3]) but it seems like the right solution would be to implement this functionality directly in myList.
I'm not certain of the correct/most-elegant way to do this. I suspect putting a cast in __getitem__ would work. Is that the best way, or is there a more direct change to the slicing that would be preferred? Also, what other methods should I override to ensure all operations return a myList, not a list?
I'm not sure why this isn't the default behavior in UserList but implementing the following in the derived class seems to fix the issue.
def __getitem__(self, i):
    new_data = self.data[i]
    if type(new_data) == list:
        return self.__class__(new_data)
    else:
        return new_data
The parameter i for __getitem__ can be either a slice object or an integer, so new_data will be either a list or a single element. If it's a list, wrap it in the myList container and return it; otherwise just pass the single element back.
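Putting it together, a runnable sketch (class and method names are illustrative; checking the index for being a slice is equivalent here to checking the result type, and some newer Python versions already re-wrap slices in UserList itself, so the override matters mainly on older versions):

```python
from collections import UserList

class MyList(UserList):
    """UserList subclass whose slices keep the subclass type."""
    def __getitem__(self, i):
        result = self.data[i]
        if isinstance(i, slice):
            # Re-wrap slices so added methods survive slicing.
            return self.__class__(result)
        return result

    def get_func(self):
        return self.data

l1 = MyList([1, 2, 3, 4])
l2 = l1[:3]
print(type(l2).__name__)  # → MyList
print(l2.get_func())      # → [1, 2, 3]
print(l1[0])              # → 1 (integer indexing still returns the element)
```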

Is avoiding expensive __init__ a good reason to use __new__?

In my project, we have a class based on set. It can be initialised from a string, or an iterable (eg tuple) of strings, or other custom classes. When initialised with an iterable it converts each item to a particular custom class if it is not one already.
Because it can be initialised from a variety of data structures a lot of the methods that operate on this class (such as __and__) are liberal in what they accept and just convert their arguments to this class (ie initialise a new instance). We are finding this is rather slow, when the argument is already an instance of the class, and has a lot of members (it is iterating through them all and checking that they are the right type).
I was thinking that to avoid this, we could add a __new__ method to the class and just if the argument passed in is already an instance of the class, return it directly. Would this be a reasonable use of __new__?
Adding a __new__ method will not solve your problem. From the documentation for __new__:
If __new__() returns an instance of cls, then the new instance’s
__init__() method will be invoked like __init__(self[, ...]),
where self is the new instance and the remaining arguments are the
same as were passed to __new__().
In other words, returning an existing instance will not prevent Python from calling __init__ on it.
You can verify this quite easily:
In [20]: class A:
    ...:     def __new__(cls, arg):
    ...:         if isinstance(arg, cls):
    ...:             print('here')
    ...:             return arg
    ...:         return super().__new__(cls)
    ...:     def __init__(self, values):
    ...:         self.values = list(values)
In [21]: a = A([1,2,3])
In [22]: A(a)
here
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-22-c206e38274e0> in <module>()
----> 1 A(a)
<ipython-input-20-5a7322f37287> in __init__(self, values)
6 return super().__new__(cls)
7 def __init__(self, values):
----> 8 self.values = list(values)
TypeError: 'A' object is not iterable
You may be able to make this work if you did not implement __init__ at all, but only __new__. I believe this is what tuple does.
Also, that behaviour would be acceptable only if your class is immutable (as tuple is), because then returning the same instance is sensible. If it is mutable, you are asking for hidden bugs.
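For an immutable class, the pattern of doing all the set-up in __new__ and defining no __init__ at all can be sketched like this (names are illustrative):

```python
class Frozen:
    """Immutable wrapper that returns existing instances unchanged.
    This works because no __init__ is defined: the default
    object.__init__ re-runs on the returned instance but does nothing."""
    def __new__(cls, values):
        if isinstance(values, cls):
            return values  # already one of ours: reuse it as-is
        self = super().__new__(cls)
        # All initialisation happens here, in __new__, not in __init__.
        self._values = frozenset(values)
        return self

a = Frozen([1, 2, 3])
b = Frozen(a)
print(a is b)  # → True: no expensive re-conversion took place
```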
A more sensible approach is to do what set does: the __*__ operations operate only on sets, but set also provides named methods that work with any iterable:
In [30]: set([1,2,3]) & [1,2]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-30-dfd866b6c99b> in <module>()
----> 1 set([1,2,3]) & [1,2]
TypeError: unsupported operand type(s) for &: 'set' and 'list'
In [31]: set([1,2,3]) & set([1,2])
Out[31]: {1, 2}
In [32]: set([1,2,3]).intersection([1,2])
Out[32]: {1, 2}
In this way the user can choose between speed and flexibility of the API.
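The same API split can be sketched for a custom class (illustrative, not the asker's real class): the operator is strict and skips any conversion, while a named method converts whatever iterable it is given:

```python
class Tags:
    """Mimic set's split: __and__ accepts only Tags (fast path, no
    conversion), intersection accepts any iterable (flexible path)."""
    def __init__(self, items):
        self._items = set(items)

    def __and__(self, other):
        if not isinstance(other, Tags):
            return NotImplemented  # operator rejects other types
        return Tags(self._items & other._items)

    def intersection(self, other):
        return Tags(self._items & set(other))  # converts the argument

a = Tags(['x', 'y', 'z'])
print(sorted((a & Tags(['y']))._items))           # → ['y']
print(sorted(a.intersection(['y', 'q'])._items))  # → ['y']
try:
    a & ['y']
except TypeError as e:
    print(e)  # unsupported operand types, as with set & list
```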
A simpler approach is the one proposed by unutbu: use isinstance instead of duck-typing when implementing the operations.

extending built-in python dict class

I want to create a class that would extend dict's functionalities. This is my code so far:
class Masks(dict):
    def __init__(self, positive=[], negative=[]):
        self['positive'] = positive
        self['negative'] = negative
I want to have two predefined arguments in the constructor: a list of positive and negative masks. When I execute the following code, I can run
m = Masks()
and a new masks-dictionary object is created - that's fine. But I'd like to be able to create this masks objects just like I can with dicts:
d = dict(one=1, two=2)
But this fails with Masks:
>>> n = Masks(one=1, two=2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __init__() got an unexpected keyword argument 'two'
I should probably call the parent constructor's __init__ somewhere in Masks.__init__. I tried it with **kwargs, passing them into the parent constructor, but something still went wrong. Could someone point out what I should add here?
You must call the superclass __init__ method. And if you want to be able to use the Masks(one=1, ..) syntax then you have to use **kwargs:
In [1]: class Masks(dict):
   ...:     def __init__(self, positive=(), negative=(), **kwargs):
   ...:         super(Masks, self).__init__(**kwargs)
   ...:         self['positive'] = list(positive)
   ...:         self['negative'] = list(negative)
   ...:
In [2]: m = Masks(one=1, two=2)
In [3]: m['one']
Out[3]: 1
A general note: do not subclass built-ins!!!
It seems an easy way to extend them but it has a lot of pitfalls that will bite you at some point.
A safer way to extend a built-in is to use delegation, which gives better control over the subclass behaviour and can avoid many pitfalls of inheriting from the built-ins. (Note that by implementing __getattr__ it is possible to avoid reimplementing many methods explicitly.)
Inheritance should be used as a last resort when you want to pass the object into some code that does explicit isinstance checks.
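A delegation-based sketch of the same Masks idea (illustrative), forwarding unknown attributes to an internal dict via __getattr__:

```python
class Masks:
    """Delegation version: wraps a dict instead of inheriting from it."""
    def __init__(self, positive=(), negative=(), **kwargs):
        self._data = dict(kwargs)
        self._data['positive'] = list(positive)
        self._data['negative'] = list(negative)

    def __getitem__(self, key):
        return self._data[key]

    def __getattr__(self, name):
        # Called only for attributes not found on Masks itself:
        # forward get, keys, items, ... to the wrapped dict.
        return getattr(self._data, name)

m = Masks(one=1)
print(m['positive'])  # → []
print(m.get('one'))   # → 1 (dict method forwarded via __getattr__)
```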
Since all you want is a regular dict with predefined entries, you can use a factory function.
def mask(*args, **kw):
    """Create mask dict using the same signature as dict(),
    defaulting 'positive' and 'negative' to empty lists.
    """
    d = dict(*args, **kw)
    d.setdefault('positive', [])
    d.setdefault('negative', [])
    return d
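For completeness, a self-contained version of the factory with example calls (note that the function must end with return d, and setdefault leaves caller-supplied entries untouched):

```python
def mask(*args, **kw):
    """Create a mask dict using the same signature as dict(),
    defaulting 'positive' and 'negative' to empty lists."""
    d = dict(*args, **kw)
    d.setdefault('positive', [])
    d.setdefault('negative', [])
    return d

m = mask(one=1, two=2)
print(m['one'])       # → 1
print(m['positive'])  # → []
print(mask({'positive': ['p']})['positive'])  # → ['p'] (kept, not reset)
```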
