Seeking a more efficient way to initialize a dict - Python

prior={}
conditionProb={}
Counts={}
for i in range(len(trainingData)):
    label=trainingLabels[i]
    prior[label]+=1
    datum=trainingData[i]
    for j in range(len(datum)):
        Counts[(i,j,label)]+=1
        if(datum[j]>0):
            conditionProb[(i,j,label)]+=1
When I run this code, it raises a KeyError because the keys of prior are not initialized first, so there is no value of 0 to increment. I could initialize these three dicts with loops, but that seems like too much code for the job. So I am looking for another way to do this, e.g. overriding the default method in dict? I am not familiar with Python. Any ideas are appreciated.

You can use defaultdict to initialize keys to 0:
from collections import defaultdict
prior = defaultdict(lambda: 0)
conditionProb = defaultdict(lambda: 0)
Counts = defaultdict(lambda: 0)
for i, (label, data) in enumerate(zip(trainingLabels, trainingData)):
    prior[label] += 1
    for j,datum in enumerate(data):
        Counts[i, j, label] += 1
        if datum > 0:
            conditionProb[i, j, label] += 1
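
To see the defaultdict version run end to end, here is a small self-contained sketch. The contents of trainingData and trainingLabels are made up for illustration, since the question does not show them, and defaultdict(int) is used in place of defaultdict(lambda: 0), which behaves the same way here.

```python
from collections import defaultdict

# Hypothetical toy inputs (not from the question): two data points, three features each.
trainingLabels = ["spam", "ham"]
trainingData = [[0, 2, 1], [1, 0, 0]]

prior = defaultdict(int)          # int() returns 0, same effect as lambda: 0
Counts = defaultdict(int)
conditionProb = defaultdict(int)

for i, (label, data) in enumerate(zip(trainingLabels, trainingData)):
    prior[label] += 1
    for j, datum in enumerate(data):
        Counts[i, j, label] += 1
        if datum > 0:
            conditionProb[i, j, label] += 1

print(dict(prior))
print(len(Counts), len(conditionProb))
```

No key has to be created up front; each counter starts at 0 the first time it is incremented.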

You can use defaultdict from the collections module. You construct it by passing a factory callable that produces the default value for missing keys; here that is int, which returns 0 when called with no arguments. Do it like this:
from collections import defaultdict
my_dict = defaultdict(int)
my_dict['foo'] += 2

You can use Counter:
>>> from collections import Counter
>>> c = Counter()
>>> c['a'] += 2
>>> c
Counter({'a': 2})

Related

Most frequent element in Python hashed dictionary

I have the following dictionary:
data = {112: [25083], 25091: [6939], 32261: [9299, 6939, 3462], 32934: [7713, 6762, 6939], 34854: [6939], 56630: [7713]}
I am trying to find the most frequent values. The output has to look like ({value: number, ...}):
{6939:4, 7713:2, 25083:1, 9299:1, 3462:1, 6762:1}
or ({value: keys, ...})
{6939:[25091, 32261, 32934, 34854], 7713:[32934, 56630], 25083:[112], 9299:[32261], 3462:[32261], 6762:[32934]}
I have a script that works for a normal dictionary, but I don't know how to handle this one, where each value is a list.
k = {}
from collections import defaultdict
for key, val in data.items():
    for i in val:
        k.setdefault(i, set()).add(k)
You can use Counter and defaultdict:
from collections import Counter, defaultdict
from itertools import chain
data = {112: [25083], 25091: [6939], 32261: [9299, 6939, 3462], 32934: [7713, 6762, 6939], 34854: [6939], 56630: [7713]}
counter = Counter(chain.from_iterable(data.values()))
print(counter) # Counter({6939: 4, 7713: 2, 25083: 1, 9299: 1, 3462: 1, 6762: 1})
data_inverted = defaultdict(list)
for k, vs in data.items():
    for v in vs:
        data_inverted[v].append(k)
print(data_inverted)
# defaultdict(<class 'list'>,
#             {25083: [112],
#              6939: [25091, 32261, 32934, 34854],
#              9299: [32261],
#              3462: [32261],
#              7713: [32934, 56630],
#              6762: [32934]})
Actually, if you are going to build data_inverted anyway, you can use the following after data_inverted (instead of using collections.Counter):
counter = {k: len(v) for k, v in data_inverted.items()}
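
If the goal is to list the most frequent values first, Counter.most_common gives that ordering directly. A minimal sketch using the data from the question (the name top_two is only illustrative):

```python
from collections import Counter
from itertools import chain

data = {112: [25083], 25091: [6939], 32261: [9299, 6939, 3462],
        32934: [7713, 6762, 6939], 34854: [6939], 56630: [7713]}

counter = Counter(chain.from_iterable(data.values()))

# most_common(n) returns the n highest counts as (value, count) pairs.
top_two = counter.most_common(2)
print(top_two)  # [(6939, 4), (7713, 2)]
```

A plain dict built with a comprehension has no most_common method, so this is one reason to keep the Counter even when data_inverted is available.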

How to use dictionaries .get method for counting?

So I want to set up a function that takes a string and basically counts how many times a letter is repeating, and I want to do it with dictionaries. I've used an if else statement, but now I want to use the .get method. So far my code looks like this:
def histogram(s):
    d = dict()
    for c in s:
        d.get(c)
        d[c] = 1
    return d
g = histogram('bronto')
print(g)
This prints:
{'b': 1, 'r': 1, 'o': 1, 'n': 1, 't': 1}
However, as you can see, there should be 2 o's. I can't do d[c] += 1, because it hasn't been previously declared. How do I get the function to count the extra letters within that for loop?
That's exactly what collections.Counter is for:
from collections import Counter
g = Counter('bronto')
However if you want to use plain dicts and dict.get you need to process the return value of dict.get, for example with:
d[c] = d.get(c, 0) + 1
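
Put back into the function from the question, that one-line change could look like this sketch:

```python
def histogram(s):
    """Count how often each character occurs in s, using dict.get."""
    d = {}
    for c in s:
        # get returns 0 for a character not seen yet, so its count starts at 1.
        d[c] = d.get(c, 0) + 1
    return d

g = histogram('bronto')
print(g)  # {'b': 1, 'r': 1, 'o': 2, 'n': 1, 't': 1}
```

The original loop discarded the return value of d.get(c) and then overwrote d[c] with 1 every time, which is why the second 'o' was lost.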
You'll want to check if the entry exists in the dictionary before trying to add to it. The simplest extension of what you've written so far is to check each character as you go.
def histogram(s):
    d = dict()
    for c in s:
        if c in d:
            d[c] += 1
        else:
            d[c] = 1
    return d
g = histogram('bronto')
print(g)
Apart from the d[c] = d.get(c, 0) + 1, and the Counter solutions, I'd like to point out the existence of defaultdict:
from collections import defaultdict
def histogram(s):
    d = defaultdict(int)
    for c in s:
        d[c] += 1
    return d
A defaultdict with a default factory does not raise a KeyError when a missing key is looked up with d[key]. It is initialized with a callable (a class or a function). If a key is missing, the callable is called without arguments and the returned value is assigned to the key before normal operation resumes.
For this case I'd use Counter, but defaultdict can be useful in more general scenarios.
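
One consequence of that behavior is worth knowing: merely reading a missing key with d[key] stores it, while a membership test or .get() does not. A short sketch (all variable names are illustrative):

```python
from collections import defaultdict

d = defaultdict(int)
d['a'] += 1

# Neither a membership test nor .get() calls the factory, so nothing is inserted.
has_b = 'b' in d
got_c = d.get('c')
keys_before = sorted(d)

# Plain indexing of a missing key calls int() and stores the result.
value = d['z']
keys_after = sorted(d)

print(has_b, got_c, keys_before, value, keys_after)
```

So if the set of keys matters later (for example when iterating), prefer `in` or .get() for lookups that should not create entries.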

In Python merge two dictionaries so that their keys are added/subtracted

I have two dictionaries, the output of factorint from sympy.ntheory. I need to merge them so that the common keys get their values summed, i.e. MergedDict[key] = Dict1[key] + Dict2[key], while unique keys remain the same.
I also need a merged dictionary in which the common keys are differenced, i.e. MergedDict[key] = Dict1[key] - Dict2[key]. Here the keys of Dict2 will always be a subset of the keys of Dict1, so negative numbers are not a problem.
I've tried to follow this question. But I'm unable to make it work. So far my approach has been as follows:
from sympy.ntheory import factorint
from collections import defaultdict
d=factorint(12)
dd = defaultdict(lambda: defaultdict(int))
for key, values_dict in d.items():
    for date, integer in values_dict.items():
        dd[key] += integer
for n in range(2,6):
    u = factorint(n)
    for key, values_dict in u.items():
        for date, integer in values_dict.items():
            dd[key] += integer
It gives the error AttributeError: 'int' object has no attribute 'items'. The code above is only for the summing part. I have yet to do anything on the differencing part, assuming that the summing code can be adapted to difference the common keys.
I'm not sure what your goal is, but factorint gives you key/value pairs of ints, so you should be summing the values directly. You are trying to call items on each value from the dict, which is an integer, and that cannot work:
from sympy.ntheory import factorint
from collections import defaultdict
d=factorint(12)
dd = defaultdict(int)
for key, val in d.items():
    dd[key] += val
for n in range(2, 6):
    u = factorint(n)
    for key, val in u.items():
        dd[key] += val
print(dd)
Output:
defaultdict(<type 'int'>, {2: 5, 3: 2, 5: 1})
The result of factorint is a dict, which cannot have duplicate keys, so the first loop can be done using update:
d = factorint(12)
dd = defaultdict(int)
dd.update(d)
for n in range(2, 6):
    u = factorint(n)
    for key, val in u.items():
        dd[key] += val
It seems that collections.Counter can do most of what you want. It might be as simple as (untested, I do not have sympy installed):
from collections import Counter
cnt1 = Counter(Dict1)
cnt2 = Counter(Dict2)
sum_cnt = cnt1 + cnt2
diff_cnt = cnt1 - cnt2
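
Here is a sketch of that Counter approach that runs without sympy. The two plain dicts are hand-written stand-ins with the same prime-to-exponent shape that factorint returns (360 = 2**3 * 3**2 * 5 and 12 = 2**2 * 3); the variable names are illustrative.

```python
from collections import Counter

# Stand-ins for factorint results: prime -> exponent.
dict1 = {2: 3, 3: 2, 5: 1}   # 360
dict2 = {2: 2, 3: 1}         # 12

cnt1, cnt2 = Counter(dict1), Counter(dict2)

summed = cnt1 + cnt2          # exponents added
diff = cnt1 - cnt2            # exponents subtracted; results <= 0 are dropped

# subtract() works in place and keeps zero or negative entries.
kept = Counter(dict1)
kept.subtract(Counter({5: 1}))

print(dict(summed), dict(diff), dict(kept))
```

Note the difference between the two subtraction styles: the - operator removes keys whose result is zero or negative, whereas subtract() keeps them. Which one is appropriate depends on whether a prime with exponent 0 should still appear in the merged dictionary.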

Python dictionary increment

In Python it's annoying to have to check whether a key is in the dictionary first before incrementing it:
if key in my_dict:
    my_dict[key] += num
else:
    my_dict[key] = num
Is there a shorter substitute for the four lines above?
An alternative is:
my_dict[key] = my_dict.get(key, 0) + num
You have quite a few options. I like using Counter:
>>> from collections import Counter
>>> d = Counter()
>>> d[12] += 3
>>> d
Counter({12: 3})
Or defaultdict:
>>> from collections import defaultdict
>>> d = defaultdict(int) # int() == 0, so the default value for each key is 0
>>> d[12] += 3
>>> d
defaultdict(<class 'int'>, {12: 3})
What you want is called a defaultdict
See http://docs.python.org/library/collections.html#collections.defaultdict
Transform:
if key in my_dict:
    my_dict[key] += num
else:
    my_dict[key] = num
into the following using setdefault:
my_dict[key] = my_dict.setdefault(key, 0) + num
There is also a slightly different way to use setdefault:
my_dict.setdefault(key, 0)
my_dict[key] += num
Which may have some advantages if combined with other logic.
The condition can be shortened as in the following sample:
my_dict = {}
my_dict['1'] = 10
# get returns 0 when the key is missing, so no separate membership test is needed
my_dict['1'] = my_dict.get('1', 0) + 1
print(my_dict)
Either .get or .setdefault can be used:
.get() returns the default value passed to it if the key is not present:
my_dict[key] = my_dict.get(key, 0) + num
.setdefault() creates the key with the default value passed to it if the key is not present:
my_dict[key] = my_dict.setdefault(key, 0) + num
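
The practical difference between the two is whether the lookup itself modifies the dictionary. A small sketch (the snapshot variable names are illustrative):

```python
my_dict = {}

# get only reads: the default is returned but the key is not stored.
looked_up = my_dict.get('a', 0)
after_get = dict(my_dict)

# setdefault reads and writes: the default is stored if the key was missing.
stored = my_dict.setdefault('a', 0)
after_setdefault = dict(my_dict)

# Either way, the increment itself looks the same.
my_dict['a'] = my_dict.get('a', 0) + 5
print(after_get, after_setdefault, my_dict)  # {} {'a': 0} {'a': 5}
```

For a pure increment the result is identical, so .get() is usually enough; .setdefault() is more useful when the stored default is a mutable object such as a list that you then modify in place.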

defaultdict of defaultdict?

Is there a way to have a defaultdict(defaultdict(int)) in order to make the following code work?
for x in stuff:
    d[x.a][x.b] += x.c_int
d needs to be built ad-hoc, depending on x.a and x.b elements.
I could use:
for x in stuff:
    d[x.a,x.b] += x.c_int
but then I wouldn't be able to use:
d.keys()
d[x.a].keys()
Yes, like this:
defaultdict(lambda: defaultdict(int))
The argument of a defaultdict (in this case lambda: defaultdict(int)) will be called when you try to access a key that doesn't exist. Its return value will be set as the new value of this key, which means that in our case the value of d[Key_doesnt_exist] will be defaultdict(int).
If you try to access a key from this last defaultdict i.e. d[Key_doesnt_exist][Key_doesnt_exist] it will return 0, which is the return value of the argument of the last defaultdict i.e. int().
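
As a concrete illustration, here is a sketch of the loop from the question running against a nested defaultdict. The Item record type and the contents of stuff are hypothetical stand-ins, since the question does not define them.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type and data standing in for the question's `stuff`.
Item = namedtuple('Item', ['a', 'b', 'c_int'])
stuff = [Item('x', 'p', 1), Item('x', 'q', 2), Item('y', 'p', 3), Item('x', 'p', 4)]

d = defaultdict(lambda: defaultdict(int))
for x in stuff:
    d[x.a][x.b] += x.c_int

print(sorted(d.keys()))        # ['x', 'y']
print(sorted(d['x'].keys()))   # ['p', 'q']
print(d['x']['p'])             # 5
```

Both d.keys() and d[x.a].keys() work as the question requires, because each level is a real dictionary.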
The parameter to the defaultdict constructor is the function that will be called to build new elements. So let's use a lambda!
>>> from collections import defaultdict
>>> d = defaultdict(lambda : defaultdict(int))
>>> print(d[0])
defaultdict(<class 'int'>, {})
>>> print(d[0]["x"])
0
Since Python 2.7, there's an even better solution using Counter:
>>> from collections import Counter
>>> c = Counter()
>>> c["goodbye"]+=1
>>> c["and thank you"]=42
>>> c["for the fish"]-=5
>>> c
Counter({'and thank you': 42, 'goodbye': 1, 'for the fish': -5})
Some bonus features:
>>> c.most_common()[:2]
[('and thank you', 42), ('goodbye', 1)]
For more information, see "PyMOTW - Collections - Container data types" and "Python Documentation - collections".
Previous answers have addressed how to make a two-levels or n-levels defaultdict. In some cases you want an infinite one:
def ddict():
    return defaultdict(ddict)
Usage:
>>> d = ddict()
>>> d[1]['a'][True] = 0.5
>>> d[1]['b'] = 3
>>> import pprint; pprint.pprint(d)
defaultdict(<function ddict at 0x7fcac68bf048>,
            {1: defaultdict(<function ddict at 0x7fcac68bf048>,
                            {'a': defaultdict(<function ddict at 0x7fcac68bf048>,
                                              {True: 0.5}),
                             'b': 3})})
I find it slightly more elegant to use partial:
import functools
from collections import defaultdict
dd_int = functools.partial(defaultdict, int)
defaultdict(dd_int)
Of course, this is the same as a lambda.
For reference, it's possible to implement a generic nested defaultdict factory method through:
from collections import defaultdict
from functools import partial
from itertools import repeat
def nested_defaultdict(default_factory, depth=1):
    result = partial(defaultdict, default_factory)
    for _ in repeat(None, depth - 1):
        result = partial(defaultdict, result)
    return result()
The depth defines the number of nested dictionary levels before the type defined in default_factory is used.
For example:
my_dict = nested_defaultdict(list, 3)
my_dict['a']['b']['c'].append('e')
Others have correctly answered your question of how to get the following to work:
for x in stuff:
    d[x.a][x.b] += x.c_int
An alternative would be to use tuples for keys:
d = defaultdict(int)
for x in stuff:
    d[x.a,x.b] += x.c_int
    # ^^^^^^^ tuple key
The nice thing about this approach is that it is simple and can be easily expanded. If you need a mapping three levels deep, just use a three item tuple for the key.
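
The trade-off, as the question points out, is that d[x.a].keys() is no longer available with tuple keys. A sketch of one way to recover the second-level keys by scanning the tuple keys; the Item type and the sample data are hypothetical, as in the question they are left undefined.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type and data standing in for the question's `stuff`.
Item = namedtuple('Item', ['a', 'b', 'c_int'])
stuff = [Item('x', 'p', 1), Item('x', 'q', 2), Item('y', 'p', 3)]

d = defaultdict(int)
for x in stuff:
    d[x.a, x.b] += x.c_int

# Rough equivalent of d['x'].keys() in the nested layout: filter the tuple keys.
b_keys_for_x = sorted(b for (a, b) in d if a == 'x')
print(b_keys_for_x)  # ['p', 'q']
```

This scan visits every key, so it costs more per query than the nested layout; that is acceptable for small mappings or occasional lookups, but the nested defaultdict is a better fit when per-level access is frequent.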
