I'd like to use instances of any type as a key in a single dict.
def add_to_dict(my_object, d, arbitrary_val='123'):
    d[my_object] = arbitrary_val

d = {}
add_to_dict('my_str', d)
add_to_dict(my_list, d)
add_to_dict(my_int, d)
my_object = myclass()
my_object.__hash__ = None
add_to_dict(my_object, d)
The above won't work because my_list and my_object can't be hashed.
My first thought was to just pass in the id value of the object using the id() function.
def add_to_dict(my_object, d, arbitrary_val='123'):
    d[id(my_object)] = arbitrary_val
However, that won't work because id('some string') == id('some string') is not guaranteed to always be True.
My second thought was to test if the object has the __hash__ attribute. If it does, use the object, otherwise, use the id() value.
def add_to_dict(my_object, d, arbitrary_val='123'):
    d[my_object if my_object.__hash__ else id(my_object)] = arbitrary_val
However, since hash() and id() both return ints, I believe I will eventually get a collision.
How can I write add_to_dict(obj, d) above to ensure that no matter what obj is (list, int, str, object, dict), it will correctly set the item in the dictionary and do so without collision?
We could make some kind of dictionary that allows us to insert mutable objects as well:
class DictionaryMutable:
    nullobject = object()

    def __init__(self):
        self._inner_dic = {}
        self._inner_list = []

    def __getitem__(self, name):
        try:
            return self._inner_dic[name]
        except TypeError:
            for key, val in self._inner_list:
                if name == key:
                    return val
            raise KeyError(name)

    def __setitem__(self, name, value):
        try:
            self._inner_dic[name] = value
        except TypeError:
            for elm in self._inner_list:
                if name == elm[0]:
                    elm[1] = value
                    break
            else:
                self._inner_list.append([name, value])

    # ...
This works as follows: the DictionaryMutable consists of a dictionary and a list. The dictionary holds the hashable keys; the list holds sublists, each containing two elements: a key and a value.
For each lookup we first try the dictionary. If the key name is unhashable, a TypeError is raised. In that case we iterate through the list, check whether one of the keys matches, and return the corresponding value if it does. If no such element exists, we raise a KeyError.
Setting elements works in much the same way: first we attempt to set the element in the dictionary. If the key turns out to be unhashable, we search linearly through the list and update the value of the matching entry. If there is no matching entry, we append a new one to the end of the list.
This implementation has some major disadvantages:
if the dictionary lookup fails because the key is unhashable, we fall back to a linear search, which can significantly slow down the lookup; and
if you alter an object that is used as a key in the list, the stored key changes with it, so a search for the object's original value will fail. This can result in unexpected behavior.
This is only a basic implementation. For instance __iter__, etc. need to be implemented as well.
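As a rough illustration of what a more complete version might look like, here is a minimal sketch that adds __len__ and __iter__ to the same dictionary-plus-list idea. The class name HybridDict and its attribute names are made up for this example; it is not a drop-in replacement for dict.

```python
class HybridDict:
    """Sketch: hashable keys go in a dict, unhashable keys in a list of pairs."""

    def __init__(self):
        self._hashed = {}   # hashable keys -> values
        self._pairs = []    # [key, value] sublists for unhashable keys

    def __setitem__(self, key, value):
        try:
            self._hashed[key] = value
        except TypeError:               # key is unhashable
            for pair in self._pairs:
                if pair[0] == key:
                    pair[1] = value     # update the existing entry
                    break
            else:
                self._pairs.append([key, value])

    def __getitem__(self, key):
        try:
            return self._hashed[key]
        except TypeError:               # key is unhashable: linear search
            for stored_key, value in self._pairs:
                if stored_key == key:
                    return value
            raise KeyError(key)

    def __len__(self):
        return len(self._hashed) + len(self._pairs)

    def __iter__(self):
        # Hashable keys first, then unhashable keys in insertion order.
        yield from self._hashed
        for stored_key, _ in self._pairs:
            yield stored_key


h = HybridDict()
h[[1, 2]] = 'list'
h['s'] = 'str'
h[[1, 2]] = 'new'       # an equal list updates the existing entry
```

Iteration order here is a design choice of the sketch, not something the original class defines.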
Instead of the id() of the object, you could use the pickled byte stream representation of the object pickle.dumps() returns for it. pickle works with most built-in types, and there are ways to extend it to work with most values it doesn't know how to do automatically.
Note: I used the repr() of the object as its "arbitrary value" in an effort to make it easier to identify them in the output displayed.
import pickle  # in Python 3, pickle uses its C implementation automatically when available
from pprint import pprint
def add_to_dict(d, obj, arbitrary_val='123'):
    d[pickle.dumps(obj)] = arbitrary_val
class MyClass: pass
my_string = 'spam'
my_list = [13, 'a']
my_int = 42
my_instance = MyClass()
d = {}
add_to_dict(d, my_string, repr(my_string))
add_to_dict(d, my_list, repr(my_list))
add_to_dict(d, my_int, repr(my_int))
add_to_dict(d, my_instance, repr(my_instance))
pprint(d)
Output:
{b'\x80\x03K*.': '42',
b'\x80\x03X\x04\x00\x00\x00spamq\x00.': "'spam'",
b'\x80\x03]q\x00(K\rX\x01\x00\x00\x00aq\x01e.': "[13, 'a']",
b'\x80\x03c__main__\nMyClass\nq\x00)\x81q\x01.': '<__main__.MyClass object at '
'0x021C1630>'}
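One consequence worth seeing in code: a pickled key identifies an object by its contents at the time of insertion, not by its identity. The short sketch below (variable names are illustrative) shows two equal lists sharing one key, and a mutated list no longer matching the key it was stored under.

```python
import pickle

d = {}
first = [13, 'a']
second = [13, 'a']                     # equal to first, but a different object

d[pickle.dumps(first)] = 'first'
d[pickle.dumps(second)] = 'second'     # same byte stream, so this overwrites
same_key = len(d) == 1

first.append('b')                      # mutate after insertion
still_found = pickle.dumps(first) in d # the byte stream has changed

# The original key object can be rebuilt from the stored bytes.
restored_keys = [pickle.loads(key) for key in d]
```

Whether value-based keys are acceptable depends on whether distinct but equal objects should share an entry.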
I have to convert a bunch of strings into numbers, process the numbers and convert back.
I thought of a map where I will add 2 keys when I've provided string:
Key1: (string, number);
Key2: (number, string).
But this is not optimal in terms of memory.
What I need to achieve, as an example:
my_cool_class.get('string') # outputs 1
my_cool_class.get(1) # outputs 'string'
Is there a better way to do this in Python?
Thanks in advance!
You can implement your own two-way dict like this:
class TwoWayDict(dict):
    def __len__(self):
        # each pair is stored twice; __len__ must return an int
        return dict.__len__(self) // 2

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        dict.__setitem__(self, value, key)

my_cool_class = TwoWayDict()
my_cool_class[1] = 'string'
print(my_cool_class[1])  # 'string'
print(my_cool_class['string'])  # 1
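One gap in a two-way dict of this kind is that reassigning a key leaves the old reverse entry behind. A minimal sketch of one way to handle that follows; the class name StrictTwoWayDict is hypothetical, and it assumes a key is never paired with itself.

```python
class StrictTwoWayDict(dict):
    """Sketch: a two-way dict that removes stale reverse entries."""

    def __setitem__(self, key, value):
        # Drop existing pairings involving either side first.
        if key in self:
            del self[key]
        if value in self:
            del self[value]
        dict.__setitem__(self, key, value)
        dict.__setitem__(self, value, key)

    def __delitem__(self, key):
        partner = dict.pop(self, key)      # raises KeyError if key is missing
        dict.pop(self, partner, None)      # remove the reverse entry too

    def __len__(self):
        # Each pair is stored twice (assumes key != value).
        return dict.__len__(self) // 2


pairs = StrictTwoWayDict()
pairs[1] = 'a'
pairs[1] = 'b'      # re-pairing 1 also removes the old 'a' -> 1 entry
```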
Instead of allocating memory for a second dict, you can get the key from the value; bear in mind that this costs run time, because it is a linear search.
mydict = {'george':16,'amber':19}
print (mydict.keys()[mydict.values().index(16)])
>>> 'george'
EDIT:
Note that in Python 3, dict.values() (along with dict.keys() and dict.items()) returns a view rather than a list. You therefore need to wrap your call to dict.values() in a call to list, like so:
mydict = {'george':16,'amber':19}
print (list(mydict.keys())[list(mydict.values()).index(16)])
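The same reverse lookup can be written without building two intermediate lists. The sketch below uses a generator expression with next(); the helper name key_for_value is made up for this example, and it returns only the first matching key.

```python
mydict = {'george': 16, 'amber': 19}


def key_for_value(mapping, target, default=None):
    """Return the first key whose value equals target, else default."""
    return next((k for k, v in mapping.items() if v == target), default)


found = key_for_value(mydict, 16)
missing = key_for_value(mydict, 99)
```

This is still a linear scan, so it trades lookup time for memory just as the list-based version does.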
If optimal memory usage is an issue, you may not want to use Python in the first place. To solve your immediate problem, just add both the string and the number as keys to the dictionary. Remember that only a reference to the original objects will be stored. Additional copies will not be made:
d = {}
s = '123'
n = int(s)
d[s] = n
d[n] = s
Now you can access the value by the opposite key just like you wanted. This method has the advantage of O(1) lookup time.
You can create a dictionary of tuples; then you just need to check the type of the argument to decide which element to return.
Example:
class your_cool_class(object):
    def __init__(self):
        # example of dictionary
        self.your_dictionary = {'3': ('3', 3), '4': ('4', 4)}

    def get(self, number):
        is_string = isinstance(number, str)
        number = str(number)
        n = self.your_dictionary.get(number)
        if n is not None:
            # a string argument yields the number, a number yields the string
            return n[1] if is_string else n[0]

>>> my_cool_class = your_cool_class()
>>> my_cool_class.get(3)
'3'
>>> my_cool_class.get('3')
3
I want to test the type of the key in __setitem__. Strangely, I found that part of the code seems to be skipped when I use multiple keys. Here is my test class:
class foo():
    def __init__(self):
        self.data = [[1, 2], [3, 4], [5, 6]]

    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        print('Key is {0}, type of key is {1}'.format(key, type(key)))
        self.data[key] = value

f = foo()
With one key it works as expected:
>>> f[1] = [0,0]
Key is 1, type of key is <class 'int'>
>>> f[1]
[0, 0]
With two keys the result is correct, but why is nothing printed?
>>> f[1][1] = 100
>>> f[1][1]
100
I'm new to Python; any suggestions would be appreciated!
f[1][1] = 0 is equivalent to
f.__getitem__(1).__setitem__(1, 0)
It calls __getitem__ on your custom class; and this returns [0, 0] or [3, 4] or whatever was stored in f[1]; in any case this value is a plain Python list; then calls the __setitem__ on this list. list.__setitem__ does not print anything.
f[1] calls your __getitem__ and thus returns a list ([3,4] in the case of the freshly initialized object). Then, the second indexing operation f[1][1] indexes the object that was returned, the list [3,4], which is not an instance of your class, but simply a list type (as returned by your class).
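If the second assignment should be observed as well, one option is for __getitem__ to return a small wrapper instead of the raw list. The following is only a sketch: the names LoggedRow and Foo are invented here, and it records keys in a list rather than printing them so the effect is easy to inspect.

```python
class LoggedRow:
    """Sketch: wraps an inner list and records every item assignment."""

    def __init__(self, row, log):
        self._row = row
        self._log = log

    def __getitem__(self, key):
        return self._row[key]

    def __setitem__(self, key, value):
        self._log.append(('inner', key, type(key).__name__))
        self._row[key] = value      # writes through to the wrapped list


class Foo:
    def __init__(self):
        self.data = [[1, 2], [3, 4], [5, 6]]
        self.log = []

    def __getitem__(self, key):
        # Return a wrapper, so a chained f[i][j] = x reaches LoggedRow.__setitem__.
        return LoggedRow(self.data[key], self.log)

    def __setitem__(self, key, value):
        self.log.append(('outer', key, type(key).__name__))
        self.data[key] = value


f = Foo()
f[1][1] = 100       # Foo.__getitem__(1), then LoggedRow.__setitem__(1, 100)
f[0] = [9, 9]       # Foo.__setitem__(0, [9, 9])
```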
I am using python's set class. The set contains tuples (id,name). Given an id how can I check whether that corresponds to one already in the set and do:
if id is not in the set by searching the tuples
add a new tuple (id,name) in the set
I am using sets because they are supposed to use a hashtable which is more efficient than a list and I am dealing with a lot of data (more than 50GB)
You'll have to loop over all tuples in the set and test each one:
if not any(t[0] == id for t in tuple_set):
    tuple_set.add((id, some_name))
The any() function here will iterate over the generator expression given and short-circuit to return True as soon as a match is found.
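If the set of tuples has to stay as it is, the linear scan can be avoided by keeping a second set that holds only the ids. This is a sketch under that assumption; seen_ids and add_entry are names chosen for the example, and the extra set costs additional memory.

```python
tuple_set = {(1, 'alice'), (2, 'bob')}

# Build the id index once; afterwards each membership test is a hash lookup.
seen_ids = {t[0] for t in tuple_set}


def add_entry(id_, name):
    """Add (id_, name) only if id_ is new; return True when it was added."""
    if id_ in seen_ids:
        return False
    seen_ids.add(id_)
    tuple_set.add((id_, name))
    return True


added_new = add_entry(3, 'carol')
added_dup = add_entry(1, 'dave')    # id 1 already exists, so nothing changes
```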
If your tuples are always going to be unique based on the first element, then you probably want to use a custom class that implements __eq__ and __hash__:
class Entry(object):
    __slots__ = ('id', 'name')  # save some memory

    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, Entry): return NotImplemented
        return self.id == other.id

    def __hash__(self):
        # hash the value, not its identity, so equal ids hash alike
        return hash(self.id)

    def __repr__(self):
        return '<{0}({1[0]!r}, {1[1]!r})>'.format(type(self).__name__, self)

    def __getitem__(self, index):
        return getattr(self, ('id', 'name')[index])
then use those in a set, after which you can use:
if Entry(id, some_name) in entries_set:
Demo:
>>> entries_set = {Entry('foo', 'bar'), Entry('foo', 'baz')}
>>> entries_set
set([<Entry('foo', 'baz')>])
>>> Entry('foo', 'spam') in entries_set
True
Another option is to just map ids to names in a dictionary; dictionaries are sets with values:
id_value_dictionary = {'id1': 'name1', 'id2': 'name2'}
if id not in id_value_dictionary:
    id_value_dictionary[id] = some_name
In Python, set and dict use a very similar implementation (see "Python collections complexity"), and both are backed by a hash table.
What you'd like to do is not a good fit for a set; use a dict with "id" as the key and "name" as the value, and use the setdefault method:
#!/usr/bin/python
d = {"a": 1, "b": 2, "c": 3}
d.setdefault("a", 5) # a will retain its original value
d.setdefault("d", 9) # the d key will be inserted with the passed value
To get the key-value tuples back, use the items() or iteritems() methods. In Python 2, items() builds a list while iteritems() returns an iterator, which is probably better for very large datasets because it uses less memory. In Python 3, iteritems() no longer exists and items() returns a lightweight view.
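Applied to the (id, name) problem, the setdefault approach might look like the sketch below. The sample records and the name names_by_id are invented for illustration, and the code is written for Python 3, where items() returns a view.

```python
names_by_id = {}
records = [(1, 'alice'), (2, 'bob'), (1, 'carol')]   # id 1 appears twice

for id_, name in records:
    # Inserts the name only if id_ is not present yet; otherwise a no-op.
    names_by_id.setdefault(id_, name)

# Recover the (id, name) tuples when they are needed.
pairs = list(names_by_id.items())
```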
I declare two variables. The first is a dictionary. The second is a list (it is the output of dictionary's '.values()' method).
dictVar={'one':1,'two':2,'three':3}
listVar=dictVar.values()
At this point the content of listVar accurately represents every value stored in dictionary dictVar
Later somewhere down the code the dictionary is updated with a new value:
dictVar['four']=4
Now the content of listVar is "outdated". It does not represent every value stored in dictionary.
In order to keep list updated I have to manually append a new value such as:
dictVar['four']=4
listVar.append(4)
I wonder if there is a way to establish a "live" update between the list variable and dictionary. So every time dictionary is changed the list is updated too.
Use a dictionary view object:
>>> dictVar={'one':1,'two':2,'three':3}
>>> listVar=dictVar.viewvalues()
>>> listVar
dict_values([3, 2, 1])
>>> dictVar['one']=100
>>> listVar
dict_values([3, 2, 100])
>>> dictVar['four']=4
>>> listVar
dict_values([4, 3, 2, 100])
>>> list(listVar)==dictVar.values()
True
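The session above uses Python 2.7's viewvalues(). In Python 3 that method is gone, and values() itself returns a live view, so a rough equivalent (assuming Python 3.7+ for the insertion-ordered output) would be:

```python
dict_var = {'one': 1, 'two': 2, 'three': 3}
live_values = dict_var.values()      # a view, not a copy

dict_var['one'] = 100                # change an existing value
dict_var['four'] = 4                 # add a new key

# The view reflects both changes without being reassigned.
snapshot = list(live_values)
```

Note that the view is read-only; it has no append(), so all changes must go through the dictionary.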
One thing you could do is create a custom class that acts as a wrapper for the dictionary. Whenever you call obj[key] = val, you're implicitly calling that object's __setitem__(self, key, val) method. When you create a custom class, you can override this method to do what you like with it (namely, update an associated list).
Here's a sample class wrapper:
class EnhancedDict(object):
    def __init__(self):  # The constructor
        self.dictVar = {}  # Your dictionary
        self.listVar = []  # Your list

    def __getitem__(self, key):  # Equivalent to obj[key]
        return self.dictVar[key]

    def __setitem__(self, key, val):  # Equivalent to obj[key] = val
        self.dictVar[key] = val
        self.listVar.append(val)
Then the list is automatically updated whenever you add a new item to the dictionary, which you can do easily:
>>> dict_obj = EnhancedDict()
>>> dict_obj["foo"] = "bar" # Automatically updates both the list and the dict
>>> dict_obj["foo"]
'bar'
>>> dict_obj.dictVar
{'foo': 'bar'}
>>> dict_obj.listVar
['bar']
There's also a __delitem__ function you can override to complete the functionality of the class. Lots more information can be found in the docs:
https://docs.python.org/2/reference/datamodel.html
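A wrapper that only appends will drift out of sync when a key is reassigned or deleted. As a sketch of how __setitem__ and __delitem__ could keep the two in step, the class below (SyncedDict is a hypothetical name) relies on dictionaries preserving insertion order, which holds in Python 3.7+.

```python
class SyncedDict:
    """Sketch: keeps list_var parallel to the values of dict_var."""

    def __init__(self):
        self.dict_var = {}
        self.list_var = []

    def __getitem__(self, key):
        return self.dict_var[key]

    def __setitem__(self, key, val):
        if key in self.dict_var:
            # Existing key: replace the value at the key's position.
            index = list(self.dict_var).index(key)
            self.list_var[index] = val
        else:
            self.list_var.append(val)
        self.dict_var[key] = val

    def __delitem__(self, key):
        index = list(self.dict_var).index(key)   # ValueError if key is missing
        del self.dict_var[key]
        del self.list_var[index]


s = SyncedDict()
s['a'] = 1
s['b'] = 2
s['a'] = 10         # replaces 1 in the list instead of appending
after_update = list(s.list_var)
del s['a']
```

Finding the position is a linear scan, so this is only reasonable for small dictionaries.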
The Dictionary __getitem__ method does not seem to work the same way as it does for List, and it is causing me headaches. Here is what I mean:
If I subclass list, I can overload __getitem__ as:
class myList(list):
    def __getitem__(self, index):
        if isinstance(index, int):
            pass  # do one thing
        if isinstance(index, slice):
            pass  # do another thing
If I subclass dict, however, the __getitem__ does not expose index, but key instead as in:
class myDict(dict):
    def __getitem__(self, key):
        pass  # Here I want to inspect the INDEX, but only have access to key!
So, my question is how can I intercept the index of a dict, instead of just the key?
Example use case:
a = myDict()
a['scalar'] = 1 # Create dictionary entry called 'scalar', and assign 1
a['vector_1'] = [1,2,3,4,5] # I want all subsequent vectors to be 5 long
a['vector_2'][[0,1,2]] = [1,2,3] # I want to intercept this and force vector_2 to be 5 long
print(a['vector_2'])
[1,2,3,0,0]
a['test'] # This should throw a KeyError
a['test'][[0,2,3]] # So should this
Dictionaries are not indexed by position (even though Python 3.7+ preserves insertion order); there is no index to pass in, only a key. This is why Python can use the same syntax ([..]) and the same magic method (__getitem__) for both lists and dictionaries.
When you index a dictionary on an integer like 0, the dictionary treats that like any other key:
>>> d = {'foo': 'bar', 0: 42}
>>> d.keys()
[0, 'foo']
>>> d[0]
42
>>> d['foo']
'bar'
Chained indexing applies to return values; the expression:
a['vector_2'][0, 1, 2]
is executed as:
_result = a['vector_2'] # via a.__getitem__('vector_2')
_result[0, 1, 2] # via _result.__getitem__((0, 1, 2))
so if you want values in your dictionary to behave in a certain way, you must return objects that support those operations.
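To make that concrete, here is a minimal sketch of a value type that accepts a list or tuple of positions, stored in an ordinary dict. The Vector class is an assumption made for illustration; it does not enforce the fixed length of 5 or create missing entries.

```python
class Vector(list):
    """Sketch: a list that also accepts a list/tuple of positions."""

    def __getitem__(self, index):
        if isinstance(index, (list, tuple)):
            return [list.__getitem__(self, i) for i in index]
        return list.__getitem__(self, index)

    def __setitem__(self, index, value):
        if isinstance(index, (list, tuple)):
            # Assign element-wise; extra positions or values are ignored by zip.
            for i, v in zip(index, value):
                list.__setitem__(self, i, v)
        else:
            list.__setitem__(self, index, value)


a = {'vector_2': Vector([0] * 5)}
a['vector_2'][[0, 1, 2]] = [1, 2, 3]   # dict lookup first, then Vector.__setitem__
picked = a['vector_2'][[0, 2]]
```

A missing key such as a['test'] still raises KeyError, because the dictionary lookup happens before any indexing of the value.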