Pythonic way of adding items to a dict of lists - python

I use this snippet pretty often:
d = {}
for x in some_list:
    y = some_func(x)  # can be identity
    if y in d:
        d[y].append(another_func(x))
    else:
        d[y] = [another_func(x)]
Is this the most Pythonic way of doing this, or is there a better way? I use Python 3.

You can try:
from collections import defaultdict
d = defaultdict(list)
for x in some_list:
    d[some_func(x)].append(another_func(x))
A defaultdict is a dict subclass that you initialise with a factory (here, the type list). When you look up a key that is not yet in the dictionary, the factory is called to create the default value for that key.
In this case, each time you access d[key] for a key that doesn't exist yet, an empty list is created and stored under that key.
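As a minimal sketch of the above, assuming some_func is len and another_func is str.upper (both picked here purely for illustration, as is the sample list), grouping a few words by length could look like this:

```python
from collections import defaultdict

# Hypothetical input and helper functions, chosen only for illustration.
some_list = ["apple", "kiwi", "pear", "mango", "fig"]
some_func = len            # computes the grouping key
another_func = str.upper   # computes the value to store

d = defaultdict(list)
for x in some_list:
    # A missing key is created with an empty list on first access.
    d[some_func(x)].append(another_func(x))

# Plain dict copy, in case the auto-insert behaviour is no longer wanted.
grouped = dict(d)
print(grouped)
```

Converting to a plain dict at the end is optional; it only matters if later lookups of absent keys should raise KeyError.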

Related

Replace collections' defaultdict by a normal dict with setdefault

I often used collections.defaultdict to be able to append an element to d[key] without having to initialize it first to [] (benefit: you don't need to do: if key not in d: d[key] = []):
import collections, random
d = collections.defaultdict(list)
for i in range(100):
    j = random.randint(0,20)
    d[j].append(i)  # if d[j] does not exist yet, initialize it to [], so we can use append directly
Now I realize we can simply use a normal dict and setdefault:
import random
d = {}
for i in range(100):
    j = random.randint(0,20)
    d.setdefault(j, []).append(i)
Question: when using a dict whose values are lists, is there a good reason to use a collections.defaultdict instead of the second method (using a simple dict and setdefault), or are they purely equivalent?
collections.defaultdict is generally more performant: it is optimised exactly for this task and implemented in C. However, you should use dict.setdefault if you want accessing an absent key in your resulting dictionary to result in a KeyError rather than inserting an empty list. This is the most important practical difference.
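A small sketch of that practical difference (the variable names plain, auto and plain_raised are made up for this example):

```python
from collections import defaultdict

plain = {}
plain.setdefault("a", []).append(1)

auto = defaultdict(list)
auto["a"].append(1)

# Reading an absent key from the plain dict raises KeyError.
try:
    plain["missing"]
    plain_raised = False
except KeyError:
    plain_raised = True

# Reading an absent key from the defaultdict inserts an empty list instead.
_ = auto["missing"]

print(plain_raised)
print(dict(auto))
```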
In addition to the answer by Chris_Rands, I want to further emphasize that a primary reason to use defaultdict is if you want key accesses to always succeed, and to insert the default value if there was none.
This can be for any reason, and a completely valid one is the convenience of being able to use [] instead of having to call dict.setdefault before every access.
Also note that key in default_dict will still return False if that key has never been accessed before, so you can still check for existence of keys in a defaultdict if necessary. This allows appending to the lists without checking for their existence, but also checking for the existence of the lists if necessary.
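To illustrate the point about existence checks, here is a short sketch (names are illustrative). It also shows that .get(), like in, does not trigger the default factory:

```python
from collections import defaultdict

d = defaultdict(list)
d["seen"].append(1)

has_other = "other" in d   # membership test: does not create the key
got = d.get("other")       # .get() also bypasses the default factory
keys_after = list(d)       # still only the key that was really used

print(has_other, got, keys_after)
```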
When using defaultdict you have a possibility to do inplace addition:
import collections, random
d = collections.defaultdict(list)
for i in range(100):
    j = random.randint(0,20)
    d[j] += [i]
There is no equivalent construction like d.setdefault(j, []) += [i]; it raises a SyntaxError, because a function call cannot be the target of an augmented assignment.
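That said, a plain dict can get close to in-place addition, because setdefault returns the stored list object itself. A rough sketch, with a fixed list of keys standing in for the random ones:

```python
d = {}
for i, j in enumerate([3, 1, 3, 2, 1]):
    # setdefault returns the stored list, so mutating methods act on it directly.
    d.setdefault(j, []).extend([i])

# Alternatively, bind the list to a name first; += on a list extends it in place.
bucket = d.setdefault(3, [])
bucket += [99]

print(d)
```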

python dictionary keyError

I'm new to Python, and what looks like a simple, doable piece of code is raising a KeyError:
patt=list('jkasb')
dict={}
for i in patt:
    dict[i]= 1 if dict[i] is None else dict[i]+1  # This line throws error
error: KeyError: 'j'
In your case, the KeyError is occurring because you are trying to access a key which is not in the dictionary. Initially, the dictionary is empty. So, none of the keys exist in it.
This may seem strange if you are coming from a C++ background as C++ maps give default values for keys that don't exist yet. You can get the same behavior in python by using collections.defaultdict. The modified code is given below. I took the liberty of converting the defaultdict to a regular dictionary at the end of the code:
from collections import defaultdict
patt='jkasb'
my_default_dict=defaultdict(int)
for i in patt:
    my_default_dict[i]+=1
my_dict = dict(my_default_dict) # converting the defaultdict to a regular dictionary
You can also solve this problem in a number of other ways. I am showing some of them below:
By checking if the key exists in the dictionary:
patt='jkasb'
my_dict={}
for i in patt:
    my_dict[i]= 1 if i not in my_dict else my_dict[i]+1  # checking if i exists in dict
Using dict.get() without default return values:
patt='jkasb'
my_dict={}
for i in patt:
    my_dict[i]= 1 if my_dict.get(i) is None else my_dict[i]+1  # using dict.get
print(my_dict)
Using dict.get() with default return values:
patt='jkasb'
my_dict={}
for i in patt:
    my_dict[i]= my_dict.get(i, 0)+1  # using dict.get with default return value 0
As your code is actually just counting the frequency of each character, you can also use collections.Counter and then convert it to a dictionary:
from collections import Counter
patt='jkasb'
character_counter = Counter(patt)
my_dict = dict(character_counter)
Also, as dict is a built-in data type and I used dict to convert the defaultdict and Counter to a normal dictionary, I changed the name of the dictionary from dict to my_dict.
While building the dictionary, dict[i] tries to access a key which does not exist yet. To check whether a key exists in a dictionary, use the in operator instead:
d[i] = 1 if i not in d else d[i] + 1
Alternatives (for what you're trying to accomplish):
Using dict.get:
d[i] = d.get(i, 0) + 1
Using collections.defaultdict:
from collections import defaultdict
d = defaultdict(int)
for i in 'jkasb':
    d[i] += 1
Using collections.Counter:
from collections import Counter
d = Counter('jkasb')
Avoid using dict (built-in type) as a variable name. And just iterate over 'jkasb' without converting it to a list; strings are iterable too.
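As a quick check that these alternatives agree, here is a sketch that runs three of them on the same string. The input has a repeated character added (an assumption, so that the counts are not all 1):

```python
from collections import Counter, defaultdict

text = "jkasbjj"  # hypothetical input with a repeated character

# dict.get with a default of 0
with_get = {}
for ch in text:
    with_get[ch] = with_get.get(ch, 0) + 1

# defaultdict(int): missing keys start at 0
with_defaultdict = defaultdict(int)
for ch in text:
    with_defaultdict[ch] += 1

# Counter does the counting in one call
with_counter = Counter(text)

print(with_get == dict(with_defaultdict) == dict(with_counter))
```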
As your dict is initially empty, trying to access any value with dict[i] will throw a KeyError.
You should replace this with .get() which returns None if the key is not found:
for i in patt:
    dict[i] = 1 if dict.get(i) is None else dict[i] + 1
Another alternative, as suggested by @snakecharmerb, is to check beforehand whether or not the key exists in your dict:
for i in patt:
    dict[i] = 1 if i not in dict else dict[i] + 1
Both solutions are equivalent, but the second is maybe more "idiomatic".
These snippets, dict[i] and dict[i]+1, will try to get a value from the dictionary with the corresponding key i. Since you have nothing in your dictionary, you get a KeyError.
You are trying to access a key in an empty dictionary. You can also use defaultdict, so that you do not have to care whether the key already exists:
from collections import defaultdict
patt=list('jkasb')
my_dict = defaultdict(int)
for i in patt:
    my_dict[i] += 1

Joining dictionaries in a for loop

I have a simple for loop
for m in my_list:
    x = my_function(m)
    my_dictionary = {m:x}
my_function() gives me a string. If I use print my_dictionary at the end, I get
{m1: string1}
{m2: string2}
I want to be able to have one dictionary at the end.
I have tried several methods that I found in other threads.
dic = dict(my_dictionary, **my_dictionary)
dic = my_dictionary.copy()
dic.update(my_dictionary)
But every time I just get the last dictionary instead of all of them.
I wish I could just add them with +, but you can't do that in Python.
You can use a dict comprehension to create your main dict:
dic = {m : my_function(m) for m in my_list}
Why are you creating separate dictionaries in the first place? Just set the key in an existing dict on each iteration.
my_dictionary = {}
for m in my_list:
    x = my_function(m)
    my_dictionary[m] = x
Maybe I'm missing something, but isn't your problem just that you want a simple, non-nested dictionary, and you keep overwriting it within the loop? In that case, this small change should suffice:
my_dictionary = {}
for m in my_list:
    x = my_function(m)
    my_dictionary[m] = x
You can update the dictionary rather than overwriting it each time.
my_dictionary = {}
for m in my_list:
    x = my_function(m)
    my_dictionary.update({m:x})
print my_dictionary
There is no need to create a new dictionary at each loop iteration.
You can either create the dictionary beforehand and add items to it as you iterate:
my_dict = {}
for m in my_list:
    my_dict[m] = my_function(m)
or you can use a dict comprehension:
my_dict = {m:my_function(m) for m in my_list}
or, in Python versions older than 2.7, which lack dict comprehensions:
my_dict = dict((m,my_function(m)) for m in my_list)
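On the wish for a + operator: since Python 3.9, dictionaries support | and |= for merging. A rough sketch, in which my_function and my_list are stand-ins for the asker's own function and data:

```python
def my_function(m):
    # Hypothetical stand-in for the asker's function; returns a string.
    return m.upper()

my_list = ["m1", "m2", "m3"]

# In-place merge with |= (requires Python 3.9 or newer).
merged = {}
for m in my_list:
    merged |= {m: my_function(m)}

# The same result on older versions, using update().
merged_old = {}
for m in my_list:
    merged_old.update({m: my_function(m)})

print(merged)
```

For a case this simple, plain item assignment or a dict comprehension is still the more direct choice; merging is mainly useful when the per-item dictionaries already exist.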

Searching key/values with defaultdict

I'm familiar with using iteritems() and items() on a standard dictionary, coupled with a for loop, to scan over keys and values. However, how can I best do this with a defaultdict? For example, I'd like to check that a given value does not show up as a key or in any of the values associated with any key. I'm currently trying the following:
for key, val in dic.iteritems():
    print key, val
however I get the following:
1 deque([2, 2])
and I have the following declarations for the variables/dictionary
from collections import defaultdict, deque
clusterdict = defaultdict(deque)
So how do I best get at key values? Thanks!
In general, for a defaultdict dd, to check whether a value x is used as a key do this:
x in dd
To check whether x is used as a value do this:
x in dd.itervalues()
In your case (a defaultdict with deques as values), you may want to see whether x is in any of the deques:
any(x in deq for deq in dd.itervalues())
Remember, defaultdicts behave like regular dictionaries except that they create new entries automatically when doing d[k] lookups on missing keys; otherwise, they behave no differently than regular dicts.
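For Python 3, where itervalues() no longer exists, the same checks can be sketched with values(). The helper name appears_anywhere and the sample data are assumptions made for this example:

```python
from collections import defaultdict, deque

clusterdict = defaultdict(deque)
clusterdict[1].extend([2, 2])
clusterdict[5].append(7)

def appears_anywhere(dd, x):
    """Return True if x is a key of dd or an element of any of its values."""
    return x in dd or any(x in values for values in dd.values())

print(appears_anywhere(clusterdict, 7))
print(appears_anywhere(clusterdict, 3))
```

Because the helper only uses in and values(), it never performs a dd[x] lookup, so checking for an absent value does not add a new key.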
If I understood your question:
for key, val in dic.iteritems():
    if key != given_value and given_value not in val:
        print "it's not there!"
Unless you meant something else...
I made this for Python 3:
from collections import defaultdict
count_data = defaultdict(int)
count_data[1] = 10
query = 2
if query in count_data.values():
    print('yes')
Edit
You can use a Counter too:
from collections import Counter
count_data = Counter()
count_data[1] = 10
query = 2
if query in count_data.values():
    print('yes')
stuff = 'value to check'
if not any((stuff in key or stuff in value) for key, value in dic.iteritems()):
    # do something if stuff not in any key or value
    pass
http://docs.python.org/library/collections.html#collections.defaultdict
So you can use iteritems

Efficient way to either create a list, or append to it if one already exists?

I'm going through a whole bunch of tuples with a many-to-many correlation, and I want to make a dictionary where each b of (a,b) has a list of all the a's that correspond to a b. It seems awkward to test for a list at key b in the dictionary, then look for an a, then append a if it's not already there, every single time through the tuple digesting loop; but I haven't found a better way yet. Does one exist? Is there some other way to do this that's a lot prettier?
See the docs for the setdefault() method:
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
You can use this as a single call that will get b if it exists, or set b to an empty list if it doesn't already exist - and either way, return b:
>>> key = 'b'
>>> val = 'a'
>>> d = {}
>>> print d
{}
>>> d.setdefault(key, []).append(val)
>>> print d
{'b': ['a']}
>>> d.setdefault(key, []).append('zee')
>>> print d
{'b': ['a', 'zee']}
Combine this with a simple "not in" check and you've done what you're after in three lines:
>>> val = 'c'
>>> b = d.setdefault('b', [])
>>> if val not in b:
...     b.append(val)
...
>>> print d
{'b': ['a', 'zee', 'c']}
Assuming you're not really tied to lists, defaultdict and set are quite handy.
import collections
d = collections.defaultdict(set)
for a, b in mappings:
    d[b].add(a)
If you really want lists instead of sets, you could follow this with a
for k, v in d.iteritems():
    d[k] = list(v)
And if you really want a dict instead of a defaultdict, you can say
d = dict(d)
I don't really see any reason you'd want to, though.
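A Python 3 sketch of the same idea, with a made-up mappings list that contains one duplicate pair. Sorting during the conversion is an addition here, used only to make the list order predictable:

```python
from collections import defaultdict

mappings = [("a", 1), ("c", 1), ("a", 1), ("b", 2)]  # hypothetical (a, b) pairs

d = defaultdict(set)
for a, b in mappings:
    d[b].add(a)   # the set ignores the repeated ("a", 1)

# Sets are unordered, so sort when converting to get a predictable list.
as_lists = {b: sorted(a_values) for b, a_values in d.items()}

print(as_lists)
```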
Use collections.defaultdict
from collections import defaultdict
your_dict = defaultdict(list)
for (a,b) in your_list:
    your_dict[b].append(a)
You can sort your tuples, O(n log n), then create your dictionary, O(n).
Or, more simply, in O(n), though this could impose a heavy load on memory in case of many tuples:
your_dict = {}
for (a,b) in your_list:
    if b in your_dict:
        your_dict[b].append(a)
    else:
        your_dict[b]=[a]
Hmm it's pretty much the same as you've described. What's awkward about that?
You could also consider using an SQL database to do the dirty work.
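The answer gives no code for the sort-first variant. One possible reading of it, sketched with itertools.groupby (an assumption; the answer does not name a technique), is:

```python
from itertools import groupby
from operator import itemgetter

your_list = [("a", 1), ("a", 3), ("b", 1), ("f", 1), ("a", 2), ("z", 1)]

by_b = itemgetter(1)  # key function: the second element of each tuple

# groupby only merges adjacent items, hence the sort on the same key first.
your_dict = {
    b: [a for a, _ in group]
    for b, group in groupby(sorted(your_list, key=by_b), key=by_b)
}

print(your_dict)
```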
AFAIK it is more Pythonic to use a try block instead of an if.
your_list=[('a',1),('a',3),('b',1),('f',1),('a',2),('z',1)]
your_dict={}
for (a,b) in your_list:
    try:
        your_dict[b].append(a)
    except KeyError:
        your_dict[b]=[a]
print your_dict
I am not sure how you will get out of the key test, but once the key/value pair has been initialized it is easy :)
d = {}
if 'b' not in d:
    d['b'] = set()
d['b'].add('a')
The set will ensure that only 1 of 'a' is in the collection. You need to do the initial 'b' check though to make sure the key/value exist.
Dict get method?
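It returns the value of my_dict[some_key] if some_key is in the dictionary, and if not, returns some default value ([] in the example below). Note that list.append() returns None, so build the new list with + rather than assigning the result of append():
my_dict[some_key] = my_dict.get(some_key, []) + [something_else]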
There's another way that's rather efficient (though maybe not as efficient as sets) and simple. It's similar in practice to defaultdict but does not require an additional import.
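Granted that you have a dict with empty (None) keys, it means you also create the dict keys somewhere. You can do so with the dict.fromkeys method, which also allows setting a default value for all keys. Be careful with a mutable default such as []: dict.fromkeys(keylist, []) stores the same list object under every key, so an append through one key shows up under all of them. A dict comprehension creates a separate list per key:
keylist = ['key1', 'key2']
result = {key: [] for key in keylist}
where result will be:
{'key1': [], 'key2': []}
Then you can do your loop and use result['key1'].append(..) directly.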
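To see why a mutable default passed to dict.fromkeys can surprise, here is a small sketch comparing it with a dict comprehension (the names shared and separate are made up for the example):

```python
keylist = ["key1", "key2"]

# fromkeys stores the SAME list object under every key.
shared = dict.fromkeys(keylist, [])
shared["key1"].append("x")       # also shows up under "key2"

# A dict comprehension evaluates [] once per key, giving separate lists.
separate = {key: [] for key in keylist}
separate["key1"].append("x")     # only "key1" changes

print(shared)
print(separate)
```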
