How do I use setdefault in Python for nested dictionary structures?
E.g.:
self.table[field] = 0
self.table[date] = []
self.table[value] = {}
I would like to use setdefault for these.
Assuming self.table is a dict, you could use
self.table.setdefault(field,0)
The rest are all similar. Note that if self.table already has a key field, then the value associated with that key is returned. Only if there is no key field is self.table[field] set to 0.
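Because setdefault returns the stored value (either the existing one or the default it just inserted), it can be combined with a mutation in one expression, which is what makes it useful for nested structures. A minimal sketch, assuming string keys 'field', 'date' and 'value' as stand-ins for the question's variables:

```python
table = {}

# Inserts 0 only if 'field' is missing; returns the stored value either way.
count = table.setdefault('field', 0)

# setdefault returns the stored list itself, so append() mutates it in place.
table.setdefault('date', []).append('2000-1-1')
table.setdefault('date', []).append('2000-1-2')

# Nested: create the inner dict on first use, then set a key inside it.
table.setdefault('value', {})['A'] = 1

print(count)           # 0
print(table['date'])   # ['2000-1-1', '2000-1-2']
print(table['value'])  # {'A': 1}
```

One trade-off: the default argument ([] or {}) is constructed on every call, even when the key already exists, which is one reason collections.defaultdict is often preferred in loops.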
Edit: Perhaps this is closer to what you want:
import collections
class Foo(object):
    def __init__(self):
        self.CompleteAnalysis = collections.defaultdict(
            lambda: collections.defaultdict(list))
    def getFilledFields(self, sentence):
        field, field_value, field_date = sentence.split('|')
        field_value = field_value.strip('\n')
        field_date = field_date.strip('\n')
        self.CompleteAnalysis[field]['date'].append(field_date)
        self.CompleteAnalysis[field]['value'].append(field_value)
foo=Foo()
foo.getFilledFields('A|1|2000-1-1')
foo.getFilledFields('A|2|2000-1-2')
print(foo.CompleteAnalysis['A']['date'])
# ['2000-1-1', '2000-1-2']
print(foo.CompleteAnalysis['A']['value'])
# ['1', '2']
Instead of keeping track of the count, perhaps just take the length of the list:
print(len(foo.CompleteAnalysis['A']['value']))
# 2
I'd like to use instances of any type as a key in a single dict.
def add_to_dict(my_object, d, arbitrary_val='123'):
    d[my_object] = arbitrary_val
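d = {}
arbitrary_val = '123'
my_list, my_int = [1, 2], 42  # example values
add_to_dict('my_str', d, arbitrary_val)
add_to_dict(my_list, d, arbitrary_val)  # TypeError: unhashable type: 'list'
add_to_dict(my_int, d, arbitrary_val)
class myclass(object):
    __hash__ = None  # set on the class: hash() ignores an instance attribute
my_object = myclass()
add_to_dict(my_object, d, arbitrary_val)  # TypeError: unhashable type: 'myclass'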
The above won't work because my_list and my_object can't be hashed.
My first thought was to just pass in the id value of the object using the id() function.
def add_to_dict(my_object, d, arbitrary_val='123'):
    d[id(my_object)] = arbitrary_val
However, that won't work because equal objects do not necessarily share an identity: id('some string') == id('some string') is not guaranteed to always be True.
My second thought was to test if the object has the __hash__ attribute. If it does, use the object, otherwise, use the id() value.
def add_to_dict(my_object, d, arbitrary_val='123'):
    d[my_object if my_object.__hash__ else id(my_object)] = arbitrary_val
However, since hash() and id() both return ints, I believe I will eventually get a collision.
How can I write add_to_dict(obj, d) above to ensure that no matter what obj is (list, int, str, object, dict), it will correctly set the item in the dictionary and do so without collision?
We could make some kind of dictionary that allows us to insert mutable objects as well:
class DictionaryMutable:
    nullobject = object()
    def __init__(self):
        self._inner_dic = {}
        self._inner_list = []
    def __getitem__(self, name):
        try:
            return self._inner_dic[name]
        except TypeError:
            for key, val in self._inner_list:
                if name == key:
                    return val
            raise KeyError(name)
    def __setitem__(self, name, value):
        try:
            self._inner_dic[name] = value
        except TypeError:
            for elm in self._inner_list:
                if name == elm[0]:
                    elm[1] = value
                    break
            else:
                self._inner_list.append([name, value])
    # ...
This works as follows: DictionaryMutable consists of a dictionary and a list. The dictionary holds the hashable (immutable) keys; the list holds sublists of two elements each, a key and its value.
For each lookup we first try the dictionary. If the key name is unhashable, a TypeError is raised; in that case we iterate through the list, check whether one of the keys is equal to name, and return the corresponding value if it is. If no such element exists, we raise a KeyError.
Setting elements works in much the same way: first we try to set the element in the dictionary. If the key turns out to be unhashable, we search linearly through the list for an entry with an equal key and update its value. If no such entry exists, we append a new [key, value] pair to the end of the list.
This implementation has some major disadvantages:
if the dictionary lookup fails because the key is unhashable, we fall back to a linear search, which can significantly slow down lookups; and
if you mutate an object that is used as a key, the stored key changes with it (the list holds a reference, not a copy), so a later search using the key's old value will fail. This can result in unpredictable behavior.
This is only a basic implementation. For instance, __iter__, __delitem__, __len__ and similar methods still need to be implemented.
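A sketch of how the missing pieces could look, assuming Python 3. The _find helper and the __delitem__, __contains__, __len__ and __iter__ methods are illustrative additions; the lookup strategy (dict first, linear scan of [key, value] pairs on TypeError) is the same as above.

```python
class DictionaryMutable:
    """Dict-like container that also accepts unhashable keys (sketch)."""

    def __init__(self):
        self._inner_dic = {}   # hashable keys -> values
        self._inner_list = []  # [key, value] pairs for unhashable keys

    def _find(self, name):
        # Linear scan for the pair whose key equals name.
        for i, (key, _) in enumerate(self._inner_list):
            if key == name:
                return i
        raise KeyError(name)

    def __getitem__(self, name):
        try:
            return self._inner_dic[name]
        except TypeError:  # unhashable key: fall back to the list
            return self._inner_list[self._find(name)][1]

    def __setitem__(self, name, value):
        try:
            self._inner_dic[name] = value
        except TypeError:
            try:
                self._inner_list[self._find(name)][1] = value
            except KeyError:
                self._inner_list.append([name, value])

    def __delitem__(self, name):
        try:
            del self._inner_dic[name]
        except TypeError:
            del self._inner_list[self._find(name)]

    def __contains__(self, name):
        try:
            self[name]
        except KeyError:
            return False
        return True

    def __len__(self):
        return len(self._inner_dic) + len(self._inner_list)

    def __iter__(self):
        yield from self._inner_dic
        for key, _ in self._inner_list:
            yield key


d = DictionaryMutable()
d['spam'] = 1
d[[1, 2]] = 'list key'
d[[1, 2]] = 'updated'  # equal key, so the value is replaced
print(d[[1, 2]])  # updated
print(len(d))     # 2
del d[[1, 2]]
print([1, 2] in d, list(d))  # False ['spam']
```

Deleting or updating an unhashable key is still O(n), so the first disadvantage listed above remains.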
Instead of the id() of the object, you could use the pickled byte-string representation that pickle.dumps() returns for it. pickle works with most built-in types, and there are ways to extend it to handle most values it doesn't know how to serialize automatically.
Note: I used the repr() of the object as its "arbitrary value" in an effort to make it easier to identify them in the output displayed.
try:
    import cPickle as pickle  # Python 2's C implementation
except ImportError:
    import pickle  # Python 3 uses its C accelerator automatically
from pprint import pprint
def add_to_dict(d, obj, arbitrary_val='123'):
    d[pickle.dumps(obj)] = arbitrary_val
class MyClass: pass
my_string = 'spam'
my_list = [13, 'a']
my_int = 42
my_instance = MyClass()
d = {}
add_to_dict(d, my_string, repr(my_string))
add_to_dict(d, my_list, repr(my_list))
add_to_dict(d, my_int, repr(my_int))
add_to_dict(d, my_instance, repr(my_instance))
pprint(d)
Output:
{b'\x80\x03K*.': '42',
b'\x80\x03X\x04\x00\x00\x00spamq\x00.': "'spam'",
b'\x80\x03]q\x00(K\rX\x01\x00\x00\x00aq\x01e.': "[13, 'a']",
b'\x80\x03c__main__\nMyClass\nq\x00)\x81q\x01.': '<__main__.MyClass object at '
'0x021C1630>'}
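Whether pickle bytes make a good key depends on what "the same key" should mean. The sketch below (get_from_dict is a hypothetical helper, assuming Python 3) shows that the key follows the object's pickled state rather than its identity, which cuts both ways:

```python
import pickle


class MyClass:
    pass


def add_to_dict(d, obj, arbitrary_val='123'):
    d[pickle.dumps(obj)] = arbitrary_val


def get_from_dict(d, obj):
    # Hypothetical helper: look an object up by its pickled bytes.
    return d[pickle.dumps(obj)]


d = {}
add_to_dict(d, [13, 'a'], 'list value')

# An equal list built separately pickles to the same bytes, so lookup works.
print(get_from_dict(d, [13, 'a']))  # list value

# Two distinct instances with the same (empty) state pickle identically,
# so they would share a single key.
print(pickle.dumps(MyClass()) == pickle.dumps(MyClass()))  # True

# Equal dicts with different insertion order pickle differently,
# so they would end up under two different keys.
a = {'x': 1, 'y': 2}
b = {'y': 2, 'x': 1}
print(a == b, pickle.dumps(a) == pickle.dumps(b))  # True False
```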
I have to convert a bunch of strings into numbers, process the numbers and convert back.
I thought of a map where I add two entries for each string I provide:
Key1: (string, number);
Key2: (number, string).
But this is not optimal in terms of memory.
What I need to achieve, for example:
my_cool_class.get('string') # outputs 1
my_cool_class.get(1) # outputs 'string'
Is there a better way to do this in Python?
Thanks in advance!
You can implement your own two-way dict like this:
class TwoWayDict(dict):
    def __len__(self):
        # Each pair is stored twice, once in each direction.
        return dict.__len__(self) // 2
    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        dict.__setitem__(self, value, key)
my_cool_class = TwoWayDict()
my_cool_class[1] = 'string'
print(my_cool_class[1])         # string
print(my_cool_class['string'])  # 1
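The subclass above leaves stale entries behind when a key is reassigned (after m[1] = 'a' and then m[1] = 'b', looking up 'a' still returns 1), and its __len__ is off when a key maps to itself. A sketch of a variant, assuming Python 3, that keeps the two directions in separate dicts and removes stale pairs; BiDict, add and get are hypothetical names, and get assumes keys and values come from disjoint domains such as str and int:

```python
class BiDict:
    """Two-way mapping kept as two plain dicts (illustrative sketch)."""

    def __init__(self):
        self.forward = {}   # string -> number
        self.backward = {}  # number -> string

    def add(self, key, value):
        # Drop any stale pairs so both directions stay consistent.
        if key in self.forward:
            del self.backward[self.forward[key]]
        if value in self.backward:
            del self.forward[self.backward[value]]
        self.forward[key] = value
        self.backward[value] = key

    def get(self, item):
        # Try the forward direction first, then the reverse one.
        if item in self.forward:
            return self.forward[item]
        return self.backward[item]

    def __len__(self):
        return len(self.forward)


m = BiDict()
m.add('string', 1)
print(m.get('string'))   # 1
print(m.get(1))          # string
m.add('other', 1)        # 1 now maps back to 'other'; 'string' is dropped
print(m.get(1), len(m))  # other 1
```

Both dicts store references to the same key and value objects, so the extra cost is the second hash table, not copies of the strings.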
Instead of allocating memory for a second dict, you can look up the key from the value; keep in mind that this costs run time, since every lookup scans all the values.
mydict = {'george':16,'amber':19}
print (mydict.keys()[mydict.values().index(16)])
# prints: george
EDIT:
Note that in Python 3, dict.values() (along with dict.keys() and dict.items()) returns a view rather than a list. You therefore need to wrap the calls to dict.keys() and dict.values() in list(), like so:
mydict = {'george':16,'amber':19}
print (list(mydict.keys())[list(mydict.values()).index(16)])
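In Python 3 the same reverse lookup can also be written without building two lists, using a generator over items() and next(). It is still a linear scan, but it stops at the first match and returns a default (None here, an arbitrary choice) instead of raising when nothing matches:

```python
mydict = {'george': 16, 'amber': 19}

# Stops at the first matching value; the default None avoids StopIteration.
name = next((k for k, v in mydict.items() if v == 16), None)
print(name)  # george

missing = next((k for k, v in mydict.items() if v == 99), None)
print(missing)  # None
```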
If optimal memory usage is an issue, you may not want to use Python in the first place. To solve your immediate problem, just add both the string and the number as keys to the dictionary. Remember that only a reference to the original objects will be stored. Additional copies will not be made:
d = {}
s = '123'
n = int(s)
d[s] = n
d[n] = s
Now you can access the value by the opposite key just like you wanted. This method has the advantage of O(1) lookup time.
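A quick check, assuming Python 3, that the dict stores references rather than copies: the `is` operator compares object identity, so it is True only if both entries point at the original objects.

```python
d = {}
s = '123'
n = int(s)
d[s] = n
d[n] = s

# Both entries hold references to the original objects, not copies.
print(d[n] is s)  # True
print(d[s] is n)  # True
print(d['123'], d[123])  # 123 123
```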
You can create a dictionary of tuples; that way you just need to check the type of the argument to decide which element to return.
Example:
class your_cool_class(object):
    def __init__(self):
        # example of dictionary: string key -> (string, number)
        self.your_dictionary = {'3': ('3', 3), '4': ('4', 4)}
    def get(self, number):
        is_string = isinstance(number, str)
        number = str(number)
        n = self.your_dictionary.get(number)
        if n is not None:
            # Return the opposite representation of the one passed in.
            return n[1] if is_string else n[0]
>>> my_cool_class = your_cool_class()
>>> my_cool_class.get(3)
'3'
>>> my_cool_class.get('3')
3
I am using Python's set class. The set contains tuples (id, name). Given an id, how can I check whether it corresponds to a tuple already in the set, and:
if the id is not in the set (searching the tuples),
add a new tuple (id, name) to the set?
I am using sets because they are supposed to use a hash table, which is more efficient than a list, and I am dealing with a lot of data (more than 50 GB).
You'll have to loop over all tuples in the set and test each one:
if not any(t[0] == id for t in tuple_set):
    tuple_set.add((id, some_name))
The any() function here will iterate over the generator expression given and short-circuit to return True as soon as a match is found.
If your tuples are always going to be unique based on the first element, then you probably want to use a custom class that implements __eq__ and __hash__:
class Entry(object):
    __slots__ = ('id', 'name')  # save some memory
    def __init__(self, id, name):
        self.id = id
        self.name = name
    def __eq__(self, other):
        if not isinstance(other, Entry):
            return NotImplemented
        return self.id == other.id
    def __hash__(self):
        # Hash the id's value, consistent with __eq__ (not id(self.id)).
        return hash(self.id)
    def __repr__(self):
        return '<{0}({1[0]!r}, {1[1]!r})>'.format(type(self).__name__, self)
    def __getitem__(self, index):
        return getattr(self, ('id', 'name')[index])
Then use those in a set, after which you can use:
if Entry(id, some_name) not in entries_set:
    entries_set.add(Entry(id, some_name))
Demo:
>>> entries_set = {Entry('foo', 'bar'), Entry('foo', 'baz')}
>>> entries_set
set([<Entry('foo', 'baz')>])
>>> Entry('foo', 'spam') in entries_set
True
Another option is to just map ids to names in a dictionary; dictionaries are sets with values:
id_value_dictionary = {'id1': 'name1', 'id2': 'name2'}
if id not in id_value_dictionary:
    id_value_dictionary[id] = some_name
In Python, set and dict use very similar implementations (see "Python collections complexity"), and both are backed by a hash table.
What you'd like to do is not a good fit for a set; use a dict with the id as key and the name as value, and use the setdefault method:
#!/usr/bin/python
d = {"a": 1, "b": 2, "c": 3}
d.setdefault("a", 5) # a will retain its original value
d.setdefault("d", 9) # the d key will be inserted with the passed value
To get the key-value tuples you want, use the items() method. In Python 2, items() builds a list while iteritems() returns an iterator; the latter is probably better for very large datasets because it uses less memory. In Python 3, iteritems() no longer exists and items() returns a lightweight view.
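A sketch of the question's workflow with a dict, assuming Python 3.7+ (where dicts keep insertion order) and made-up (id, name) records. The loop variable is named record_id rather than id, which would shadow the built-in id():

```python
# Hypothetical input: (id, name) pairs, possibly with repeated ids.
records = [('id1', 'name1'), ('id2', 'name2'), ('id1', 'other')]

names_by_id = {}
for record_id, name in records:
    # Keeps the first name seen for each id; later duplicates are ignored.
    names_by_id.setdefault(record_id, name)

# items() yields (id, name) tuples, like the original set of tuples.
for pair in names_by_id.items():
    print(pair)
# ('id1', 'name1')
# ('id2', 'name2')
```

Each membership check and insert is an average O(1) hash lookup, unlike the any() scan over the set.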
I declare two variables. The first is a dictionary. The second is a list (it is the output of dictionary's '.values()' method).
dictVar={'one':1,'two':2,'three':3}
listVar=dictVar.values()
At this point, the content of listVar accurately represents every value stored in the dictionary dictVar.
Later in the code, the dictionary is updated with a new value:
dictVar['four']=4
Now the content of listVar is "outdated": it does not represent every value stored in the dictionary.
To keep the list updated, I have to append the new value manually:
dictVar['four']=4
listVar.append(4)
I wonder if there is a way to establish a "live" link between the list variable and the dictionary, so that every time the dictionary changes, the list is updated too.
Use a dictionary view object (Python 2.7):
>>> dictVar={'one':1,'two':2,'three':3}
>>> listVar=dictVar.viewvalues()
>>> listVar
dict_values([3, 2, 1])
>>> dictVar['one']=100
>>> listVar
dict_values([3, 2, 100])
>>> dictVar['four']=4
>>> listVar
dict_values([4, 3, 2, 100])
>>> list(listVar)==dictVar.values()
True
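In Python 3, viewvalues() no longer exists because dict.values() itself returns a view. A minimal sketch, assuming Python 3.7+ for the insertion-ordered output; note that a view supports iteration, len() and `in`, but not indexing or append(), so take a list() snapshot when you need those:

```python
dictVar = {'one': 1, 'two': 2, 'three': 3}
listVar = dictVar.values()  # a live view in Python 3, not a copy

dictVar['four'] = 4
print(listVar)       # dict_values([1, 2, 3, 4])
print(4 in listVar)  # True

# Take a snapshot when a real list (indexing, append) is needed.
snapshot = list(listVar)
print(snapshot[-1])  # 4
```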
One thing you could do is create a custom class that acts as a wrapper for the dictionary. Whenever you write obj[key] = val, you're implicitly calling that object's __setitem__(self, key, val) method. In a custom class you can override this method to do whatever you like (namely, update an associated list).
Here's a sample class wrapper:
class EnhancedDict(object):
    def __init__(self):  # The constructor
        self.dictVar = {}  # Your dictionary
        self.listVar = []  # Your list
    def __getitem__(self, key):  # Equivalent to obj[key]
        return self.dictVar[key]
    def __setitem__(self, key, val):  # Equivalent to obj[key] = val
        self.dictVar[key] = val
        self.listVar.append(val)
Then the list is automatically updated whenever you add a new item to the dictionary, which you can do easily:
>>> dict_obj = EnhancedDict()
>>> dict_obj["foo"] = "bar" # Automatically updates both the list and the dict
>>> dict_obj["foo"]
'bar'
>>> dict_obj.dictVar
{'foo': 'bar'}
>>> dict_obj.listVar
['bar']
There's also a __delitem__ method you can override to complete the functionality of the class. Much more information can be found in the docs:
https://docs.python.org/2/reference/datamodel.html
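As written, the wrapper appends a second value when an existing key is reassigned, and it has no deletion support. A sketch of one way to handle both, assuming Python 3.7+: dicts keep insertion order, so a key's position in dictVar matches its index in listVar. Overwrites and deletes cost O(n) because of the index lookup.

```python
class EnhancedDict:
    """Dict wrapper whose listVar mirrors dictVar's values (sketch)."""

    def __init__(self):
        self.dictVar = {}
        self.listVar = []

    def __getitem__(self, key):
        return self.dictVar[key]

    def __setitem__(self, key, val):
        if key in self.dictVar:
            # Overwriting keeps the key's position, so replace in place.
            self.listVar[list(self.dictVar).index(key)] = val
        else:
            self.listVar.append(val)
        self.dictVar[key] = val

    def __delitem__(self, key):
        if key not in self.dictVar:
            raise KeyError(key)
        index = list(self.dictVar).index(key)
        del self.dictVar[key]
        del self.listVar[index]


e = EnhancedDict()
e['foo'] = 'bar'
e['baz'] = 'qux'
e['foo'] = 'new'
print(e.listVar)  # ['new', 'qux']
del e['foo']
print(e.listVar)  # ['qux']
```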
How do I nest an OrderedDict?
I tried:
table=collections.OrderedDict()
table['E']['a']='abc'
but this raises an error (KeyError: 'E').
I also tried:
table=collections.OrderedDict(OrderedDict())
table['E']['a']='abc'
This also raises an error.
Then I tried:
table=collections.OrderedDict()
table['E']=collections.OrderedDict()
table['E']['a']='abc'
This works fine.
In my code I had to write it like this:
table = collections.OrderedDict()
for lhs in left:
    table[lhs] = collections.OrderedDict()
    for val in terminal:
        table[lhs][val] = 0
This works fine, but is there another way? As I understand it, Python manages its data structures automatically.
Is there any way to declare a dictionary in one line, along with how deeply it is nested and what data structure each level uses?
Using an extra loop just to declare a dictionary feels like I'm missing something in Python.
You can define your own custom subclass of OrderedDict and handle the __missing__ method to support infinite nesting.
from collections import OrderedDict
class MyDict(OrderedDict):
    def __missing__(self, key):
        val = self[key] = MyDict()
        return val
Demo:
>>> d = MyDict()
>>> d['b']['c']['e'] = 100
>>> d['a']['c']['e'] = 100
>>> d.keys()
['b', 'a']
>>> d['a']['d']['e'] = 100
>>> d['a'].keys()
['c', 'd']
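On Python 3.7 and later, plain dicts (and therefore defaultdict) keep insertion order, so the same auto-nesting can be had without subclassing OrderedDict. This swaps in a different technique, a recursive defaultdict; the name tree is only an illustration. OrderedDict still offers extras such as move_to_end() and order-sensitive equality.

```python
from collections import defaultdict


def tree():
    # Each missing key gets a new nested tree, to any depth.
    return defaultdict(tree)


d = tree()
d['b']['c']['e'] = 100
d['a']['c']['e'] = 100
d['a']['d']['e'] = 100
print(list(d))       # ['b', 'a']
print(list(d['a']))  # ['c', 'd']
```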
If you really want to do it in one line, this would work:
table = collections.OrderedDict([(lhs, collections.OrderedDict(zip(terminal, [0] * len(terminal)))) for lhs in left])
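You would be better off (especially if terminal has many members) building the pairs once. In Python 3, zip() returns a one-shot iterator, so materialize it with list() first; otherwise only the first inner dict gets filled:
zipped = list(zip(terminal, [0] * len(terminal)))
table = collections.OrderedDict([(lhs, collections.OrderedDict(zipped)) for lhs in left])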
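OrderedDict.fromkeys(iterable, value) builds each inner dict without zip. Each call creates a new inner dict, so changing one row does not affect the others; the value itself is shared, which is fine for an immutable 0 but not for a mutable default such as []. A sketch with made-up left and terminal lists:

```python
from collections import OrderedDict

left = ['E', 'T']           # hypothetical sample data
terminal = ['a', 'b', 'c']  # hypothetical sample data

# One fresh inner OrderedDict per lhs, every value starting at 0.
table = OrderedDict((lhs, OrderedDict.fromkeys(terminal, 0)) for lhs in left)
table['E']['a'] = 'abc'  # only the 'E' row changes
print(list(table['E'].items()))
print(list(table['T'].items()))
```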
from collections import OrderedDict
class OrderedDefaultDict(OrderedDict):
    def __init__(self, default_factory=None, *args, **kwargs):
        super(OrderedDefaultDict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory
    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        val = self[key] = self.default_factory()
        return val
It's simple enough to subclass OrderedDict with defaultdict-like behavior. You can then use an OrderedDefaultDict as follows:
table = OrderedDefaultDict(OrderedDict)
table['a']['b'] = 3
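For counting-style use, the default factory can itself build another OrderedDefaultDict, giving two levels of defaults. A sketch assuming Python 3 and made-up (lhs, val) pairs; the class is repeated so the block runs on its own:

```python
from collections import OrderedDict


class OrderedDefaultDict(OrderedDict):
    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        val = self[key] = self.default_factory()
        return val


# Outer rows are created on demand; inner values default to int() == 0.
table = OrderedDefaultDict(lambda: OrderedDefaultDict(int))
for lhs, val in [('E', 'a'), ('T', 'b'), ('E', 'a'), ('E', 'c')]:
    table[lhs][val] += 1

print(list(table))               # ['E', 'T']
print(list(table['E'].items()))  # [('a', 2), ('c', 1)]
```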