Can I have a dictionary with same-name keys? - python

I need to have a dictionary which might have same names for some keys and return a list of values when referencing the key in that case.
For example
print mydict['key']
[1,2,3,4,5,6]

For consistency, you should have the dictionary map keys to lists (or sets) of values, of which some can be empty. There is a nice idiom for this:
from collections import defaultdict
d = defaultdict(set)
d["key"].add(...)
(A defaultdict is like a normal dictionary, but if a key is missing it will call the argument you passed in when you instantiated it and use the result as the default value. So this will automatically create an empty set of values if you ask for a key which isn't already present.)
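As a minimal sketch of that idiom, using the key name and values from the question (`list` is used here instead of `set`, on the assumption that insertion order and duplicates should be kept):

```python
from collections import defaultdict

# Map each key to a list of values; a missing key starts as an empty list.
d = defaultdict(list)
for value in [1, 2, 3, 4, 5, 6]:
    d["key"].append(value)

print(d["key"])      # [1, 2, 3, 4, 5, 6]
print(d["missing"])  # [] (looking up a missing key also inserts it)
```

Note that merely reading a missing key adds it to the dictionary, so use `in` or `.get()` when you only want to test for presence.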
If you need the object to look more like a dictionary (i.e. to set a value by d["key"] = ...) you can do the following. But this is probably a bad idea, because it goes against the normal Python syntax, and is likely to come back and bite you later. Especially if someone else has to maintain your code.
class Multidict(defaultdict):
    def __init__(self):
        super(Multidict, self).__init__(set)
    def __setitem__(self, key, value):
        if isinstance(value, self.default_factory):  # self.default_factory is `set`
            super(Multidict, self).__setitem__(key, value)
        else:
            self[key].add(value)  # sets have add(), not append()
I haven't tested this.

You can also try paste.util.multidict.MultiDict
$ easy_install Paste
Then:
from paste.util.multidict import MultiDict
d = MultiDict()
d.add('a', 1)
d.add('a', 2)
d.add('b', 3)
d.mixed()
>>> {'a': [1, 2], 'b': 3}
d.getall('a')
>>> [1, 2]
d.getall('b')
>>> [3]
Web frameworks like Pylons are using this library to handle HTTP query string/post data, which can have same-name keys.

You can use:
myDict = {'key': []}
Then during runtime:
if newKey in myDict:
    myDict[newKey].append(value)
else:
    myDict[newKey] = [value]
Edited as per @Ben's comment:
myDict = {}
myDict.setdefault(newKey, []).append(value)

This is an ideal place to use a defaultdict object from the collections library
from collections import defaultdict
mydict = defaultdict(set)
mydict['key'] |= set([1,2,3,4])
mydict['key'] |= set([4,5,6])
print(mydict['key'])
prints {1, 2, 3, 4, 5, 6}
In the case where a key is referenced that has not been explicitly assigned, an empty set is returned.
print(mydict['bad_key'])
prints set()
Using setdefault on a dict from the standard library would require a significant change in your syntax when assigning values and can get rather messy. I've never used Multidict, but it also looks like a significant change in the way assignments are made. Using this method, you simply assume that there may already be a value associated with this key in the dictionary and slightly modify your assignment by using the '|=' operator (sets do not support '+='; use '+=' only if the default is a list) when assigning key values.
FYI - I am a big fan of using None as the default, which results in any access of a missing key returning None. This behaves properly in most cases, including iterating and json dumps, but for your specific need the default should be of type set unless you want to allow duplicate values to be stored under a key. In that case use a list. In fact, any time you have a homogeneous dictionary the default should be of that type.
mydict = defaultdict(lambda: None)

I'm unsatisfied with all the proposed solutions, so this is my solution. This is for Python 3. Code is below.
EXAMPLES
(code is below)
>>> a = MultiDict({0: [0]})
>>> a
MultiDict({0: [0]})
>>> a[0] = (1, 7)
>>> a
MultiDict({0: [1, 7]})
>>> a.add(0, 2)
>>> a
MultiDict({0: [1, 7, 2]})
>>> a.add(1, 2)
>>> a
MultiDict({0: [1, 7, 2], 1: [2]})
>>> a.getfirst(0)
1
>>> a.getfirst(3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 61, in getfirst
File "<stdin>", line 17, in __getitem__
KeyError: 3
>>> len(a)
2
>>> tuple(a.items())
((0, [1, 7, 2]), (1, [2]))
>>> tuple(a.values())
([1, 7, 2], [2])
>>> a.get(0)
[1, 7, 2]
>>> tuple(a.multiitems())
((0, 1), (0, 7), (0, 2), (1, 2))
>>> tuple(a.multikeys())
(0, 0, 0, 1)
>>> tuple(a.multivalues())
(1, 7, 2, 2)
>>> a.remove(0, 1)
>>> a
MultiDict({0: [7, 2], 1: [2]})
>>> a.remove(3, 5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 53, in remove
File "<stdin>", line 17, in __getitem__
KeyError: 3
>>> a.remove(0, 5)
Traceback (most recent call last):
File "<stdin>", line 53, in remove
ValueError: list.remove(x): x not in list
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 56, in remove
ValueError: No element with value 5 for key 0
>>> b = MultiDict({0: [7, 2], 1: [2]})
>>> b == a
True
>>> c = MultiDict(a)
>>> c
MultiDict({0: [7, 2], 1: [2]})
>>> d = MultiDict({0: 0})
Traceback (most recent call last):
File "<stdin>", line 30, in __init__
TypeError: 'int' object is not iterable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 33, in __init__
TypeError: Values must be iterables, found 'int' for key 0
>>> a.pop(0)
[7, 2]
>>> a
MultiDict({1: [2]})
>>> c.popitem()
(0, [7, 2])
>>> c.setdefault(0, [1])
[1]
>>> c
MultiDict({0: [1], 1: [2]})
>>> c.setdefault(0, [2])
[1]
>>> c
MultiDict({0: [1], 1: [2]})
>>> c.setdefault(3)
[]
>>> c
MultiDict({0: [1], 1: [2], 3: []})
>>> c.getfirst(3)
Traceback (most recent call last):
File "<stdin>", line 61, in getfirst
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 63, in getfirst
IndexError: No values in key 3
>>> c.clear()
>>> c
MultiDict({})
>>> c.update(b)
>>> c
MultiDict({0: [7, 2], 1: [2]})
>>> d = c.copy()
>>> d == c
True
>>> id(d) == id(c)
False
>>> MultiDict.fromkeys((0, 1), [5])
MultiDict({0: [5], 1: [5]})
>>> MultiDict.fromkeys((0, 1))
MultiDict({0: [], 1: []})
CODE
try:
    from collections.abc import MutableMapping
except ImportError:  # python < 3.3
    from collections import MutableMapping
class MultiDict(MutableMapping):
    @classmethod
    def fromkeys(cls, seq, value=None, *args, **kwargs):
        if value is None:
            v = []
        else:
            v = value
        return cls(dict.fromkeys(seq, v, *args, **kwargs))
    def __setitem__(self, k, v):
        self._dict[k] = list(v)
    def __getitem__(self, k):
        return self._dict[k]
    def __iter__(self):
        for k in self._dict:
            yield k
    def __init__(self, *args, **kwargs):
        self._dict = dict(*args, **kwargs)
        for k, v in self._dict.items():
            try:
                self._dict[k] = list(v)
            except TypeError:
                err_str = "Values must be iterables, found '{t}' for key {k}"
                raise TypeError(err_str.format(k=k, t=type(v).__name__))
    def __delitem__(self, k):
        del self._dict[k]
    def __len__(self):
        return len(self._dict)
    def add(self, k, v):
        if k not in self:
            self[k] = []
        self[k].append(v)
    def remove(self, k, v):
        try:
            self[k].remove(v)
        except ValueError:
            err_str = "No element with value {v} for key {k}"
            raise ValueError(err_str.format(v=v, k=k))
    def getfirst(self, k):
        try:
            res = self[k][0]
        except IndexError:
            raise IndexError("No values in key {k}".format(k=k))
        return res
    def multiitems(self):
        for k, v in self.items():
            for vv in v:
                yield (k, vv)
    def multikeys(self):
        for k, v in self.items():
            for vv in v:
                yield k
    def multivalues(self):
        for v in self.values():
            for vv in v:
                yield vv
    def setdefault(self, k, default=None):
        if default is None:
            def_val = []
        else:
            def_val = default
        if k not in self:
            self[k] = def_val
        return self[k]
    def copy(self):
        return self.__class__(self)
    def __repr__(self):
        # repr of the inner dict already includes the braces
        return (
            self.__class__.__name__ +
            "({body})".format(body=self._dict)
        )
SOME VERBOSE EXPLANATION
For simplicity, the constructor takes the same arguments as dict. All values passed to the constructor, or assigned directly to a key, must be iterables.
All the values of my MultiDict are lists, even when a key has only one value. This is to avoid confusion.
I also added a remove method to delete a single entry from the MultiDict. Furthermore, I added multiitems, which iterates over the (key, value) pairs across all the values of the dictionary. multikeys and multivalues are similar.
ALTERNATIVES
You can also use the aiohttp, WebOb or Werkzeug implementations of MultiDict.

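from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2
def toMultiDict(items):
    def insertMulti(d, kv):
        k, v = kv
        d.setdefault(k, []).append(v)
        return d
    # pass the empty dict as the initial value so items can be any iterable
    return reduce(insertMulti, items, {})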
should create a dict from key to a list of values:
In [28]: toMultiDict(zip([1,2,1], [4,5,6]))
Out[28]: {1: [4, 6], 2: [5]}
I couldn't put insertMulti into a lambda, because the lambda needs to return the dict again.

Related

TypeError: object has no attribute '__getItem__'

I've looked at another post with this heading but I'm puzzled because my values are already integers. I want the script to look at each key's values (an array with multiple values), sort the array by making it a list, and then iterate through the sorted and converted list's values subtracting the first from the second, then the second from the third, and so on, storing the differences in a list.
b = {"a":[5,2,1],"b":[8,4,3]}
for k in b.values():
    eVals = []
    #print listVals
    x = 0
    for i in sorted(k):
        dif = i[x+1] - i[x]
        print dif
        eVals.append(dif)
        x +=1
Here is the error:
Traceback (most recent call last):
File "C:\Python27\ArcGIS10.2\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 323, in RunScript
debugger.run(codeObject, __main__.__dict__, start_stepping=0)
File "C:\Python27\ArcGIS10.2\Lib\site-packages\pythonwin\pywin\debugger\__init__.py", line 60, in run
_GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
File "C:\Python27\ArcGIS10.2\Lib\site-packages\pythonwin\pywin\debugger\debugger.py", line 654, in run
exec cmd in globals, locals
File "N:\Python\test_dict.py", line 1, in <module>
b = {"a":[5,2,1],"b":[8,4,3]}
TypeError: 'int' object has no attribute '__getitem__'
>>> b = {"a":[5,2,1],"b":[8,4,3]}
>>> for key, value in b.iteritems():
...     value.sort()
...     value[:] = [cur-prev for cur, prev in zip(value, [0] + value[:-1])]
...
>>> b
{'a': [1, 1, 3], 'b': [3, 1, 4]}
If you have numpy conveniently available, you can do this in a one-liner comprehension:
>>> import numpy as np
>>> b = {"a":[5,2,1],"b":[8,4,3]}
>>> {k: np.diff([0] + sorted(v)) for k, v in b.iteritems()}
{'a': array([1, 1, 3]), 'b': array([3, 1, 4])}
You are calling __getitem__ on an int: in for i in sorted(k), i is each element of the list (an int), so i[x+1] tries to index an int.
Here's a possible solution
inp = {"a":[5,2,1],"b":[8,4,3]}
out = {}
for key, lis in inp.iteritems():
    difLis = []
    sLis = sorted(lis)
    for i, _ in enumerate(sLis[:-1]):
        dif = sLis[i+1] - sLis[i]
        print "%d - %d = %d" % (sLis[i+1], sLis[i], dif)
        difLis.append(dif)
    out[key] = difLis
print out # {'a': [1, 3], 'b': [1, 4]}
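The same pairwise differences can be sketched in Python 3 by zipping the sorted list with itself shifted by one. `pairwise_diffs` is a hypothetical helper name, not something from the question:

```python
def pairwise_diffs(values):
    """Return the differences between consecutive elements of sorted(values)."""
    s = sorted(values)
    # zip(s, s[1:]) pairs each element with its successor.
    return [nxt - cur for cur, nxt in zip(s, s[1:])]

b = {"a": [5, 2, 1], "b": [8, 4, 3]}
out = {key: pairwise_diffs(vals) for key, vals in b.items()}
print(out)  # {'a': [1, 3], 'b': [1, 4]}
```

Because the loop never indexes past the end of the list, there is no counter to maintain and no risk of an IndexError on the last element.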

Create dictionary of statistics for several lists in Python?

I would like to create some basic statistics for several lists of data and store them in a dictionary:
>>> from statistics import mean,median
>>> a,b,c=[1,2,3],[4,5,6],[7,8,9]
The following list comprehension works and outputs stats for "a":
>>> [eval("{}({})".format(op,a)) for op in ['mean','median','min','max']]
[2, 2, 1, 3]
Assigning the list's variable name (a) to another object (dta) and evaluating "dta" in a list comprehension also works:
>>> dta="a"
>>> [eval("{}({})".format(op,eval("dta"))) for op in ['mean','median','min','max']]
[2, 2, 1, 3]
But when I try to tie this all together in a dictionary comprehension, it does not work:
>>> {k:[eval("{}({})".format(op,eval("k"))) for op in ['mean','median','min','max']] for k in ['a','b','c']}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <dictcomp>
File "<stdin>", line 1, in <listcomp>
File "<string>", line 1, in <module>
NameError: name 'k' is not defined
My guess is that the eval is processed before the comprehension, which is why 'k' is not yet defined? Any suggestions for how to get this work or a different routine that would accomplish the same output?
Do not quote the k in the inner eval:
{k:[eval("{}({})".format(op,eval(k))) for op in ['mean','median','min','max']] for k in ['a','b','c']}
                                 ^
Or drop eval altogether:
[[mean(k), median(k), min(k), max(k)] for k in [a, b, c]]
You can do a simple workaround with the keys to change this to a dictionary comprehension.
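One way that workaround might look, as a sketch: pair each label with its list explicitly so that no eval is needed. The names `data`, `ops` and `stats` are illustrative, not from the question:

```python
from statistics import mean, median

a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]

# Pair each label with its list explicitly instead of looking names up with eval.
data = {"a": a, "b": b, "c": c}
ops = [mean, median, min, max]

stats = {name: [op(values) for op in ops] for name, values in data.items()}
print(stats)
```

Functions are ordinary objects in Python, so they can be stored in a list and called directly; this avoids both the string formatting and the scoping problem that eval runs into inside a comprehension.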
Try removing the quotation marks around k in your call to eval in the format function.
I ran the following commands:
> from statistics import mean,median
> a,b,c=[1,2,3],[4,5,6],[7,8,9]
> {k:[eval("{}({})".format(op,eval(k))) for op in ['mean','median','min','max']] for k in ['a','b','c']}
and got the following output:
{'a': [2.0, 2, 1, 3], 'c': [8.0, 8, 7, 9], 'b': [5.0, 5, 4, 6]}

Cant understand some python tuple syntax

I started learning python today and found this very nice code visualization tool pythontutor.com, the problem is that I still don't quite get some of the syntax on the example code.
def listSum(numbers):
    if not numbers:
        return 0
    else:
        (f, rest) = numbers
        return f + listSum(rest)
myList = (1, (2, (3, None)))
total = listSum(myList)
What does (f, rest) = numbers mean?
It's tuple unpacking.
There need to be exactly 2 items in the tuple when it is used in this way. More or fewer will result in an exception, as shown below.
>>> numbers = (1, 2, 3, 4, 5)
>>> (f, rest) = numbers
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
>>> numbers = (1, 2)
>>> (f, rest) = numbers
>>> print f
1
>>> print rest
2
>>> numbers = (1)
>>> (f, rest) = numbers
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
>>> numbers = (1,)
>>> (f, rest) = numbers
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: need more than 1 value to unpack
Note that (1) and (1,) are different expressions: only the latter is a tuple.
See the Python Doc on Tuples and Sequences for more details.
(f, rest) = numbers
unpacks the tuple. That is, it takes the two values stored in numbers and stores them in f and rest, respectively. Note that the number of variables you unpack into must be the same as the number of values in the tuple, or else an exception will be thrown.
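To connect this back to the example in the question, here is a small sketch (Python 3 syntax; `list_sum` is just a renamed copy of the question's function) showing why the unpacking always sees exactly two items: the data is a linked list built from nested 2-tuples of the form (value, rest_of_list).

```python
def list_sum(numbers):
    """Sum a linked list built from nested 2-tuples: (value, rest_of_list)."""
    if not numbers:          # None marks the end of the list
        return 0
    first, rest = numbers    # unpack exactly two items
    return first + list_sum(rest)

my_list = (1, (2, (3, None)))
first, rest = my_list
print(first)              # 1
print(rest)               # (2, (3, None))
print(list_sum(my_list))  # 6

# Extended unpacking (Python 3) collects any remaining items into a list.
head, *tail = (1, 2, 3, 4, 5)
print(head, tail)         # 1 [2, 3, 4, 5]
```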
A tuple is a data structure in which you can store multiple items under one name.
Let's say that we have a tuple (t) with two items.
Then t[0] = first_item and t[1] = second_item
Another way of accessing the tuple's items is:
(f, rest) = numbers
With this syntax, numbers (the tuple) must have exactly 2 items, otherwise an exception is raised. It is equivalent to:
f = numbers[0]
rest = numbers[1]

TypeError while representing arbitrary element type in multiprocessing.Array

>>> from multiprocessing import Array, Value
>>> import numpy as np
>>> a = [(i,[]) for i in range(3)]
>>> a
[(0, []), (1, []), (2, [])]
>>> a[0][1].extend(np.array([1,2,3]))
>>> a[1][1].extend(np.array([4,5]))
>>> a[2][1].extend(np.array([6,7,8]))
>>> a
[(0, [1, 2, 3]), (1, [4, 5]), (2, [6, 7, 8])]
Following the python multiprocessing example: def test_sharedvalues(): I am trying to create a shared Proxy object using the below code:
shared_a = [multiprocessing.Array(id, e) for id, e in a]
but it is giving me an error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/multiprocessing/__init__.py", line 255, in Array
return Array(typecode_or_type, size_or_initializer, **kwds)
File "/usr/lib64/python2.6/multiprocessing/sharedctypes.py", line 87, in Array
obj = RawArray(typecode_or_type, size_or_initializer)
File "/usr/lib64/python2.6/multiprocessing/sharedctypes.py", line 60, in RawArray
result = _new_value(type_)
File "/usr/lib64/python2.6/multiprocessing/sharedctypes.py", line 36, in _new_value
size = ctypes.sizeof(type_)
TypeError: this type has no size
Ok. The problem is solved
I changed
>>> a = [(i,[]) for i in range(3)]
to
>>> a = [('i',[]) for i in range(3)]
and this solved the TypeError.
Actually, I also found out that I did not have to use i as a counter within range(3) at all (since Array already supports indexing). The 'i' is the typecode for c_int under multiprocessing.sharedctypes.
Hope this helps.
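A minimal sketch of the working form, assuming the goal is one shared integer array per inner list (the names `data` and `shared` are illustrative):

```python
from multiprocessing import Array

# The first argument is a ctypes typecode ('i' = signed int), not an index;
# the second is either a length or a sequence of initial values.
data = [[1, 2, 3], [4, 5], [6, 7, 8]]
shared = [Array('i', values) for values in data]

print([arr[:] for arr in shared])  # [[1, 2, 3], [4, 5], [6, 7, 8]]
```

The original error came from passing the integer counter as the first argument: Array then asked ctypes for the size of an int value rather than of a C type, which is what "this type has no size" refers to.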

Why list as key dictionary, will still show itself as tuple as key dictionary

When I define a dictionary which use list as key
collections.defaultdict(list)
When I print it out, it shows itself is using tuple as key.
May I know why?
import collections
tuple_as_dict_key = collections.defaultdict(tuple)
tuple_as_dict_key['abc', 1, 2] = 999
tuple_as_dict_key['abc', 3, 4] = 999
tuple_as_dict_key['abc', 5, 6] = 888
# defaultdict(<type 'tuple'>, {('abc', 5, 6): 888, ('abc', 1, 2): 999, ('abc', 3, 4): 999})
print tuple_as_dict_key
list_as_dict_key = collections.defaultdict(list)
list_as_dict_key['abc', 1, 2] = 999
list_as_dict_key['abc', 3, 4] = 999
list_as_dict_key['abc', 5, 6] = 888
# defaultdict(<type 'list'>, {('abc', 5, 6): 888, ('abc', 1, 2): 999, ('abc', 3, 4): 999})
# Isn't it should be defaultdict(<type 'list'>, {['abc', 5, 6]: 888, ...
print list_as_dict_key
The parameter to defaultdict is not the type of the key, it is a function that creates default data. Your test cases don't exercise this because you're filling the dict with defined values and not using any defaults. If you were to try to get the value list_as_dict_key['abc', 7, 8] it would return an empty list, since that is what you defined as a default value and you never set the value at that index.
When you add values to your dictionary you do it the same way in both cases, and in both cases the key is a tuple. What you pass to the constructor is a factory that produces the default value for any key that is not present. The factory in this case happens to be a type (list or tuple), but that has absolutely nothing to do with how keys are treated.
There's a nice article explaining the answer to why you can't use a list as key here.
Dictionary keys must be hashable, which in practice means immutable types such as tuples; a mutable list cannot be used as a key. No list is ever created in your code, though: the subscript 'abc', 1, 2 is a comma-separated expression, which Python always treats as a tuple.
defaultdict is not setting the key as a list. It's setting the default value.
>>> import collections
>>> d1 = collections.defaultdict(list)
>>> d1['foo']
[]
>>> d1['foo'] = 37
>>> d1['foo']
37
>>> d1['bar']
[]
>>> d1['bar'].append(37)
>>> d1['bar']
[37]
The way that you're getting a tuple as the key type is normal dict behaviour:
>>> d2 = dict()
>>> d2[37, 19, 2] = [14, 19]
>>> d2
{(37, 19, 2): [14, 19]}
The way Python works with subscripting is that in x[a] the key is a, in x[a, b] the key is the tuple (a, b), and in x[a:b] the key is a slice object. See how it works with a list:
>>> mylist = [1, 2, 3]
>>> mylist[4, 5]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple
It's taken 4, 5 as a tuple. The dict has done the same.
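To see exactly what object a subscript passes along, here is a small sketch using a hypothetical helper class, `ShowKey`, whose `__getitem__` simply returns its argument:

```python
class ShowKey:
    """Hypothetical helper: returns whatever object the subscript passes in."""
    def __getitem__(self, key):
        return key

s = ShowKey()
print(s[4])       # 4
print(s[4, 5])    # (4, 5)
print(s[1:3])     # slice(1, 3, None)

# A real list cannot be a dict key because lists are unhashable.
try:
    {}[['abc', 1, 2]] = 999
except TypeError as exc:
    print(exc)
```

The last part shows that an actual list key is rejected outright with a TypeError, so a dictionary never silently converts a list into a tuple.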
