__getitem__ for a list vs a dict - python

The Dictionary __getitem__ method does not seem to work the same way as it does for List, and it is causing me headaches. Here is what I mean:
If I subclass list, I can overload __getitem__ as:
class myList(list):
    def __getitem__(self, index):
        if isinstance(index, int):
            ...  # do one thing
        if isinstance(index, slice):
            ...  # do another thing
If I subclass dict, however, __getitem__ exposes key instead of index:
class myDict(dict):
    def __getitem__(self, key):
        # Here I want to inspect the INDEX, but only have access to key!
        ...
So, my question is how can I intercept the index of a dict, instead of just the key?
Example use case:
a = myDict()
a['scalar'] = 1 # Create dictionary entry called 'scalar', and assign 1
a['vector_1'] = [1,2,3,4,5] # I want all subsequent vectors to be 5 long
a['vector_2'][[0,1,2]] = [1,2,3] # I want to intercept this and force vector_2 to be 5 long
print(a['vector_2'])
[1,2,3,0,0]
a['test'] # This should throw a KeyError
a['test'][[0,2,3]] # So should this

Dictionaries are not sequences; there is no positional index to pass in. Python uses the same syntax ([..]) and the same magic method (__getitem__) for both lists and dictionaries, but for a list the argument is an index (or slice) and for a dictionary it is a key.
When you index a dictionary on an integer like 0, the dictionary treats that like any other key:
>>> d = {'foo': 'bar', 0: 42}
>>> d.keys()
[0, 'foo']
>>> d[0]
42
>>> d['foo']
'bar'
Chained indexing applies to return values; the expression:
a['vector_2'][0, 1, 2]
is executed as:
_result = a['vector_2'] # via a.__getitem__('vector_2')
_result[0, 1, 2] # via _result.__getitem__((0, 1, 2))
so if you want values in your dictionary to behave in a certain way, you must return objects that support those operations.
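For the use case in the question, one option is to make the dictionary store list-like values that themselves support index-list assignment. A minimal sketch (FixedVector, the padding behavior, and the fixed length of 5 are illustrative choices here, not a standard API):
class FixedVector(list):
    # Accept a list of positions on assignment, e.g. v[[0, 1, 2]] = [1, 2, 3]
    def __setitem__(self, index, value):
        if isinstance(index, list):
            for i, v in zip(index, value):
                super().__setitem__(i, v)
        else:
            super().__setitem__(index, value)

class myDict(dict):
    def __setitem__(self, key, value):
        if isinstance(value, list):
            # pad short vectors with zeros up to length 5
            value = FixedVector(value + [0] * (5 - len(value)))
        super().__setitem__(key, value)

a = myDict()
a['vector_2'] = [1, 2, 3]             # stored as [1, 2, 3, 0, 0]
a['vector_2'][[0, 1, 2]] = [9, 9, 9]  # handled by FixedVector.__setitem__
print(a['vector_2'])                  # [9, 9, 9, 0, 0]
Missing keys still raise KeyError, because myDict does not override __getitem__.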


How to use any value as a dictionary key?

I'd like to use instances of any type as a key in a single dict.
def add_to_dict(my_object, d, arbitrary_val='123'):
    d[my_object] = arbitrary_val

d = {}
add_to_dict('my_str', d)
add_to_dict(my_list, d)
add_to_dict(my_int, d)
my_object = myclass()
my_object.__hash__ = None
add_to_dict(my_object, d)
The above won't work because my_list and my_object can't be hashed.
My first thought was to just pass in the id value of the object using the id() function.
def add_to_dict(my_object, d, arbitrary_val='123'):
    d[id(my_object)] = arbitrary_val
However, that won't work because id('some string') == id('some string') is not guaranteed to always be True.
My second thought was to test if the object has the __hash__ attribute. If it does, use the object, otherwise, use the id() value.
def add_to_dict(my_object, d, arbitrary_val='123'):
    d[my_object if my_object.__hash__ else id(my_object)] = arbitrary_val
However, since hash() and id() both return ints, I believe I will eventually get a collision.
How can I write add_to_dict(obj, d) above to ensure that no matter what obj is (list, int, str, object, dict), it will correctly set the item in the dictionary and do so without collision?
We could make some kind of dictionary that allows us to insert mutable objects as well:
class DictionaryMutable:
nullobject = object()
def __init__(self):
self._inner_dic = {}
self._inner_list = []
def __getitem__(self, name):
try:
return self._inner_dic[name]
except TypeError:
for key, val in self._inner_list:
if name == key:
return val
raise KeyError(name)
def __setitem__(self, name, value):
try:
self._inner_dic[name] = value
except TypeError:
for elm in self._inner_list:
if name == elm[0]:
elm[1] = value
break
else:
self._inner_list.append([name,value])
# ...
This works as follows: the DictionaryMutable consists of a dictionary and a list. The dictionary contains the hashable keys; the list contains sublists where each sublist holds two elements: a key and a value.
For each lookup we first attempt the lookup on the dictionary; if the key name is unhashable, a TypeError is thrown. In that case we iterate through the list, check if one of the keys matches, and return the corresponding value if it does. If no such element exists, we raise a KeyError.
Setting elements works approximately the same way: first we attempt to set the element in the dictionary. If the key turns out to be unhashable, we search linearly through the list for an existing entry to update; if there is none, we append the element at the end of the list.
This implementation has some major disadvantages:
if the dictionary lookup fails because the key is unhashable, we fall back to a linear search, which can significantly slow down the lookup; and
if you mutate an object that is used as a key, the stored key changes with it, so a later search for that object can fail. This can result in unpredictable behavior.
This is only a basic implementation. Methods such as __iter__, __delitem__, etc. need to be implemented as well.
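For example, __delitem__ and __iter__ could look something like this (a sketch that belongs inside the class body above, following the same dict-first, list-fallback pattern):
    def __delitem__(self, name):
        try:
            del self._inner_dic[name]    # hashable key: normal dict delete
        except TypeError:                # unhashable key: linear search
            for i, (key, _) in enumerate(self._inner_list):
                if name == key:
                    del self._inner_list[i]
                    return
            raise KeyError(name)

    def __iter__(self):
        yield from self._inner_dic       # hashable keys first
        for key, _ in self._inner_list:  # then the unhashable ones
            yield key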
Instead of the id() of the object, you could use the pickled byte-stream representation that pickle.dumps() returns for it. pickle works with most built-in types, and there are ways to extend it to handle most values it doesn't know how to process automatically.
Note: I used the repr() of the object as its "arbitrary value" in an effort to make it easier to identify them in the output displayed.
try:
    import cPickle as pickle  # Python 2: use the optimized C version if available
except ImportError:
    import pickle
from pprint import pprint

def add_to_dict(d, obj, arbitrary_val='123'):
    d[pickle.dumps(obj)] = arbitrary_val

class MyClass: pass

my_string = 'spam'
my_list = [13, 'a']
my_int = 42
my_instance = MyClass()

d = {}
add_to_dict(d, my_string, repr(my_string))
add_to_dict(d, my_list, repr(my_list))
add_to_dict(d, my_int, repr(my_int))
add_to_dict(d, my_instance, repr(my_instance))

pprint(d)
Output:
{b'\x80\x03K*.': '42',
b'\x80\x03X\x04\x00\x00\x00spamq\x00.': "'spam'",
b'\x80\x03]q\x00(K\rX\x01\x00\x00\x00aq\x01e.': "[13, 'a']",
b'\x80\x03c__main__\nMyClass\nq\x00)\x81q\x01.': '<__main__.MyClass object at '
'0x021C1630>'}
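To read a value back, you re-pickle the key (assuming both dumps() calls use the same pickle protocol, the byte strings match):
>>> d[pickle.dumps(my_list)]
"[13, 'a']"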

Why make lists unhashable?

A common issue on SO is removing duplicates from a list of lists. Since lists are unhashable, set([[1, 2], [3, 4], [1, 2]]) throws TypeError: unhashable type: 'list'. Answers to this kind of question usually involve using tuples, which are immutable and therefore hashable.
This answer to What makes lists unhashable? includes the following:
If the hash value changes after it gets stored at a particular slot in the dictionary, it will lead to an inconsistent dictionary. For example, initially the list would have gotten stored at location A, which was determined based on the hash value. If the hash value changes, and if we look for the list we might not find it at location A, or as per the new hash value, we might find some other object.
but I don't quite understand because other types that can be used for dictionary keys can be changed without issue:
>>> d = {}
>>> a = 1234
>>> d[a] = 'foo'
>>> a += 1
>>> d[a] = 'bar'
>>> d
{1234: 'foo', 1235: 'bar'}
It is obvious that if the value of a changes, it will hash to a different location in the dictionary. Why is the same assumption dangerous for a list? Why is the following an unsafe method for hashing a list, since it is what we all use when we need to anyway?
>>> class my_list(list):
...     def __hash__(self):
...         return tuple(self).__hash__()
...
>>> a = my_list([1, 2])
>>> b = my_list([3, 4])
>>> c = my_list([1, 2])
>>> foo = [a, b, c]
>>> foo
[[1, 2], [3, 4], [1, 2]]
>>> set(foo)
set([[1, 2], [3, 4]])
It seems that this solves the set() problem, why is this an issue? Lists may be mutable, but they are ordered which seems like it would be all that's needed for hashing.
You seem to confuse mutability with rebinding. a += 1 assigns a new object, the int object with the numeric value 1235, to a. Under the hood, for immutable objects like int, a += 1 is just the same as a = a + 1.
The original 1234 object is not mutated. The dictionary is still using an int object with numeric value 1234 as the key. The dictionary still holds a reference to that object, even though a now references a different object. The two references are independent.
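You can see the rebinding in the interpreter (old and a are just two independent references):
>>> a = 1234
>>> old = a
>>> a += 1
>>> old     # the original object is unchanged
1234
>>> a is old
False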
Try this instead:
>>> class BadKey:
...     def __init__(self, value):
...         self.value = value
...     def __eq__(self, other):
...         return other == self.value
...     def __hash__(self):
...         return hash(self.value)
...     def __repr__(self):
...         return 'BadKey({!r})'.format(self.value)
...
>>> badkey = BadKey('foo')
>>> d = {badkey: 42}
>>> badkey.value = 'bar'
>>> print(d)
{BadKey('bar'): 42}
Note that I altered the attribute value on the badkey instance. I didn't even touch the dictionary, yet the dictionary reflects the change; the key object itself, which both the name badkey and the dictionary reference, was mutated.
However, you now can't access that key anymore:
>>> badkey in d
False
>>> BadKey('bar') in d
False
>>> for key in d:
... print(key, key in d)
...
BadKey('bar') False
I have thoroughly broken my dictionary, because I can no longer reliably locate the key.
That's because BadKey violates the principle of hashability: the hash value must remain stable. You can only guarantee that if you never change anything about the object that the hash is based on. And the hash must be based on whatever makes two instances equal.
For lists, the contents make two list objects equal. And you can change those, so you can't produce a stable hash either.
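The my_list class from the question breaks in exactly the same way as BadKey once a stored element is mutated:
>>> a = my_list([1, 2])
>>> s = set([a])
>>> a.append(3)  # changes tuple(self), and therefore hash(a)
>>> a in s
False
>>> my_list([1, 2]) in s  # the old slot now holds an unequal object
False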

Strange thing when Python __setitem__ use multiple key

I want to test the type of the key when __setitem__ is used. But, strangely, part of the code seems to be skipped when multiple keys are used. Here is my test class:
class foo():
    def __init__(self):
        self.data = [[1, 2], [3, 4], [5, 6]]
    def __getitem__(self, key):
        return self.data[key]
    def __setitem__(self, key, value):
        print('Key is {0}, type of key is {1}'.format(key, type(key)))
        self.data[key] = value
f = foo()
Using one key works fine:
>>> f[1] = [0, 0]
Key is 1, type of key is <class 'int'>
>>> f[1]
[0, 0]
With two keys the result is correct, but why is nothing printed?
>>> f[1][1] = 100
>>> f[1][1]
100
I'm new to Python; any suggestions will be appreciated!
f[1][1] = 0 is equivalent to
f.__getitem__(1).__setitem__(1, 0)
It calls __getitem__ on your custom class, which returns [0, 0] or [3, 4] or whatever is stored at f[1]; in any case this value is a plain Python list. The assignment then calls __setitem__ on that list, and list.__setitem__ does not print anything.
f[1] calls your __getitem__ and thus returns a list ([3, 4] in the case of the freshly initialized object). The second indexing operation, f[1][1], then indexes the object that was returned: the list [3, 4], which is not an instance of your class but a plain list.
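If you want the inner assignment intercepted as well, your __getitem__ has to return an object whose __setitem__ you also control. A minimal sketch (LoggingList is an illustrative name, not part of the question's code):
class LoggingList(list):
    def __setitem__(self, key, value):
        print('Inner key is {0}, type of key is {1}'.format(key, type(key)))
        super().__setitem__(key, value)

class foo():
    def __init__(self):
        # store rows as LoggingList so f[1] returns an object we control
        self.data = [LoggingList(row) for row in [[1, 2], [3, 4], [5, 6]]]
    def __getitem__(self, key):
        return self.data[key]
    def __setitem__(self, key, value):
        print('Key is {0}, type of key is {1}'.format(key, type(key)))
        self.data[key] = value
Now f[1][1] = 100 prints the inner message, because the returned row is a LoggingList rather than a plain list, and mutating it updates self.data in place.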

How to make live connection between two objects

I declare two variables. The first is a dictionary. The second is a list (the output of the dictionary's .values() method).
dictVar={'one':1,'two':2,'three':3}
listVar=dictVar.values()
At this point the content of listVar accurately represents every value stored in the dictionary dictVar.
Later somewhere down the code the dictionary is updated with a new value:
dictVar['four']=4
Now the content of listVar is outdated. It no longer represents every value stored in the dictionary.
In order to keep list updated I have to manually append a new value such as:
dictVar['four']=4
listVar.append(4)
I wonder if there is a way to establish a "live" update between the list variable and dictionary. So every time dictionary is changed the list is updated too.
Use a dictionary view object:
>>> dictVar={'one':1,'two':2,'three':3}
>>> listVar=dictVar.viewvalues()
>>> listVar
dict_values([3, 2, 1])
>>> dictVar['one']=100
>>> listVar
dict_values([3, 2, 100])
>>> dictVar['four']=4
>>> listVar
dict_values([4, 3, 2, 100])
>>> list(listVar)==dictVar.values()
True
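Note that viewvalues() is Python 2.7-specific. In Python 3, dict.values() already returns a live view, so no special method is needed:
>>> dictVar = {'one': 1, 'two': 2, 'three': 3}
>>> listVar = dictVar.values()
>>> dictVar['four'] = 4
>>> listVar
dict_values([1, 2, 3, 4])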
Something you could do would be to create a custom class that acts as a wrapper for the dictionary. Whenever you call obj[key] = val, you're implicitly calling that object's __setitem__(self, key, val) method. When you create a custom class, you can overwrite this method to do what you like with it (namely, update an associated list).
Here's a sample class wrapper:
class EnhancedDict(object):
    def __init__(self):               # The constructor
        self.dictVar = {}             # Your dictionary
        self.listVar = []             # Your list

    def __getitem__(self, key):       # Equivalent to obj[key]
        return self.dictVar[key]

    def __setitem__(self, key, val):  # Equivalent to obj[key] = val
        self.dictVar[key] = val
        self.listVar.append(val)
Then the list is automatically updated whenever you add a new item to the dictionary, which you can do easily:
>>> dict_obj = EnhancedDict()
>>> dict_obj["foo"] = "bar" # Automatically updates both the list and the dict
>>> dict_obj["foo"]
'bar'
>>> dict_obj.dictVar
{'foo': 'bar'}
>>> dict_obj.listVar
['bar']
There's also a __delitem__ function you can override to complete the functionality of the class. Lots more information can be found in the docs:
https://docs.python.org/2/reference/datamodel.html
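For instance, a matching __delitem__ for the class above might look like this (a sketch; note that list.remove drops the first occurrence of the value, which is ambiguous when several keys share one value):
    def __delitem__(self, key):      # Equivalent to del obj[key]
        val = self.dictVar.pop(key)  # raises KeyError if key is missing
        self.listVar.remove(val)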

Attach functions to array elements?

Is there a way to attach a function (same function) to all the elements of an array without looping through and attaching it one by one?
So like
# create function foo from some computation
foo  # some def
# list
objects  # list of objects
# attach same foo function to all elements of objects
# maybe using a decorator?
# loop through list to execute foo
for obj in objects:
    obj.foo()
Let me explain this more:
Of course I can just assign the value of an object like
obj.attr = value
or for an object list:
for obj in objects:
    obj.attr = value
What I am trying to avoid is setting the attribute on each object one at a time; instead, I want to apply a function to the entire list/array so that each element executes that function.
You could make a function to wrap it up:
def for_each(l, f):
    for item in l:
        f(item)
Then for a function foo you could do this:
for_each(objects, foo)
For a method foo you could do this:
for_each(objects, lambda item: item.foo())
Or this:
from operator import methodcaller
for_each(objects, methodcaller('foo'))
In Python 2, you can also use map:
map(foo, objects)
For Python 3, you'll have to wrap that in list(...). In either version, you can use list comprehensions:
[foo(item) for item in objects]
However, if you're calling the function just for its side effect rather than transforming the list somehow, I'd recommend against these last two ways as it's against the Zen of Python:
Explicit is better than implicit.
And frankly, one more line for a for loop isn't that much.
You can use map. It is generally used to create a second list, and will return that value, but you can just ignore it.
map(lambda x: x.foo(), objects)
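Keep in mind that in Python 3 map is lazy, so nothing runs until the result is consumed:
list(map(lambda x: x.foo(), objects))  # wrap in list() to force the calls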
Use numpy's vectorize! It will work perfectly for you!
import numpy as np

def fun(x):
    # do something
    ...

array = np.array(your_list)
vectfun = np.vectorize(fun)
answer = vectfun(array)
Now answer will be an array consisting of all the items of the original list with the function applied to them! Here is an example:
>>> your_list = [1,2,3,4,5]
>>> def fun(x):
... return x**x
...
>>> array = np.array(your_list)
>>> vectfun = np.vectorize(fun)
>>> answer = vectfun(array)
>>> answer
array([ 1, 4, 27, 256, 3125])
