Most frequent element in Python hashed dictionary - python

I have the following dictionary:
data = {112: [25083], 25091: [6939], 32261: [9299, 6939, 3462], 32934: [7713, 6762, 6939], 34854: [6939], 56630: [7713]}
I am trying to find the most frequent values. The output should look like this ({value: count, ...}):
{6939:4, 7713:2, 25083:1, 9299:1, 3462:1, 6762:1}
or like this ({value: keys, ...}):
{6939: [25091, 32261, 32934, 34854], 7713: [32934, 56630], 25083: [112], 9299: [32261], 3462: [32261], 6762: [32934]}
I use the following script for a normal dictionary, but I don't know how to handle this one, where the values are (unhashable) lists.
k = {}
from collections import defaultdict
for key, val in data.items():
    for i in val:
        k.setdefault(i, set()).add(k)

You can use Counter and defaultdict:
from collections import Counter, defaultdict
from itertools import chain
data = {112: [25083], 25091: [6939], 32261: [9299, 6939, 3462], 32934: [7713, 6762, 6939], 34854: [6939], 56630: [7713]}
counter = Counter(chain.from_iterable(data.values()))
print(counter) # Counter({6939: 4, 7713: 2, 25083: 1, 9299: 1, 3462: 1, 6762: 1})
data_inverted = defaultdict(list)
for k, vs in data.items():
    for v in vs:
        data_inverted[v].append(k)
print(data_inverted)
# defaultdict(<class 'list'>,
# {25083: [112],
# 6939: [25091, 32261, 32934, 34854],
# 9299: [32261],
# 3462: [32261],
# 7713: [32934, 56630],
# 6762: [32934]})
Actually, if you are going to build data_inverted anyway, you can derive the counts from it afterwards (instead of using collections.Counter):
counter = {k: len(v) for k, v in data_inverted.items()}
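If only the top few values are needed, Counter.most_common may be enough on its own. A minimal sketch, reusing the data from the question (the name top_two is just for illustration):

```python
from collections import Counter
from itertools import chain

data = {112: [25083], 25091: [6939], 32261: [9299, 6939, 3462],
        32934: [7713, 6762, 6939], 34854: [6939], 56630: [7713]}

# Flatten all the value lists and count each element.
counter = Counter(chain.from_iterable(data.values()))

# most_common(n) returns the n highest counts as (value, count) pairs.
top_two = counter.most_common(2)
print(top_two)  # [(6939, 4), (7713, 2)]
```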

Related

Given a list of [string, number] tuples, create a dictionary where keys are the first characters of strings and the values are sums of the numbers

I have a list of tuples, where the first item is a string and the second is a number. I need to create a dictionary that uses the first letter of the string as the key and the number as the value (summing the numbers when several strings share the same first letter).
For example:
input
lst = [('Alex', 5), ('Addy', 7), ('Abdul', 2), ('Bob', 6), ('Carl', 8), ('Cal', 4)]
output
dct = {'A': 14, 'B': 6, 'C': 12}
The simplest, most straightforward, and naive way is:
dct = {}
for k, v in lst:
    if k[0] in dct:
        dct[k[0]] += v
    else:
        dct[k[0]] = v
There are progressively cleverer ways; the first is probably to use .get with a default:
dct = {}
for k, v in lst:
    dct[k[0]] = dct.get(k[0], 0) + v
Next, you can use a collections.defaultdict, which takes a "factory" function that is called whenever a key is missing; use int as the factory:
from collections import defaultdict
dct = defaultdict(int)
for k, v in lst:
    dct[k[0]] += v
NOTE: it is usually safer to create a regular dict out of this, to avoid the default behavior:
dct = dict(dct)
Or even
dct.default_factory = None
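To see why switching off the default behavior can matter, here is a small sketch. It assumes the lst from the question and keys on the first letter; the probe key 'Z' and the flag names are only for illustration:

```python
from collections import defaultdict

lst = [('Alex', 5), ('Addy', 7), ('Abdul', 2), ('Bob', 6), ('Carl', 8), ('Cal', 4)]

dct = defaultdict(int)
for name, v in lst:
    dct[name[0]] += v

# While the factory is active, merely reading a missing key inserts it.
_ = dct['Z']
has_z_before = 'Z' in dct  # True: the lookup silently created 'Z': 0
del dct['Z']

# Disabling the factory restores the usual KeyError for missing keys.
dct.default_factory = None
try:
    dct['Z']
    raised = False
except KeyError:
    raised = True

print(has_z_before, raised)  # True True
```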
Finally, one of the more flexible ways is to create your own dict subclass and use __missing__. This is useful if you need access to the key when making the default value, so it is not particularly more helpful here, but for completeness' sake:
class AggDict(dict):
    def __missing__(self, key):
        return 0
dct = AggDict()
for k, v in lst:
    dct[k[0]] += v
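To illustrate the point about needing the key, here is a minimal sketch. KeyAwareDict is a hypothetical name, and deriving the default from len(key) is an arbitrary choice made only for the example:

```python
class KeyAwareDict(dict):
    """Hypothetical dict whose default value is derived from the missing key."""

    def __missing__(self, key):
        # Unlike a defaultdict factory, __missing__ receives the key itself.
        value = len(key)
        self[key] = value  # store it so later lookups take the normal path
        return value


d = KeyAwareDict()
length = d['Alex']  # triggers __missing__('Alex')
print(length, d)    # 4 {'Alex': 4}
```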
Use a defaultdict:
from collections import defaultdict
dct = defaultdict(int) # default to 0
for name, val in lst:
    dct[name[0]] += val
dct = dict(dct) # get rid of default value
You could use Counter from collections to convert the tuples to countable key/values, then use reduce from functools to add them together:
from collections import Counter
from functools import reduce
lst = [('Alex', 5), ('Addy', 7), ('Abdul', 2), ('Bob', 6), ('Carl', 8), ('Cal', 4)]
dst = reduce(Counter.__add__,(Counter({k[:1]:v}) for k,v in lst))
# Counter({'A': 14, 'C': 12, 'B': 6})

Getting values and total keys they are present in python

I have a dict in this form:
dic = {'movie1': {('bob', 1), ('jim', 3), ('dianne', 4)},
       'movie2': {('dianne', 1), ('bob', 3), ('asz', 4)}}
It consists of movie names as keys, and sets of (name, score) tuples as the values for each movie.
I want to convert this into:
{'bob': {'movie1': 1, 'movie2': 3},
 'jim': {'movie1': 3},
 'dianne': {'movie1': 4, 'movie2': 1},
 'asz': {'movie2': 4}}
i.e. the movies reviewed by each person, and the score for each movie.
I'm looking for a function to which I can pass my dictionary 'dic' and which gives this result.
What I tried was:
def reviewer_rank(db):
    main_l = []
    for i in db.values():
        temp_l = list(i)
        temp_l = dict(temp_l).keys()
        main_l.extend(temp_l)
    return main_l
I was able to get all the names from the dict in a list.
You could use a defaultdict to avoid some bumpy code checking for the existence of each name in the result dict:
from collections import defaultdict
d = defaultdict(dict)
for movie, scores in dic.items():
    for name, score in scores:
        d[name][movie] = score
d
# defaultdict: {'bob': {'movie1': 1, 'movie2': 3}, 'asz': {'movie2': 4}, 'jim': {'movie1': 3}, 'dianne': {'movie1': 4, 'movie2': 1}}
You can iterate over the key-value tuples k,vs of the outer dictionary, and then for every tuple u, s in the value vs, you assign result[u][k] with s.
Since it is not certain that u is already in the result dictionary, we better use a defaultdict:
from collections import defaultdict
result = defaultdict(dict)
for k, vs in dic.items():
    for u, s in vs:
        result[u][k] = s
You can later cast the result back to a dict, with:
dict(result)
Note however that a defaultdict is a subclass of dict, so all dictionary operations are supported.
Here is a naive solution. It is just a simple case of looping over your input dict restructuring it into a new one with the required structure:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from collections import defaultdict
dic = {'movie1': {('bob', 1), ('jim', 3), ('dianne', 4)},
       'movie2': {('dianne', 1), ('bob', 3), ('asz', 4)}}
raters = defaultdict(dict)
for movie, ratings in dic.items():
    for rater, rating in ratings:
        raters[rater][movie] = rating
Hope this helps!
You can organize your current data structure into tuples, and create a new dict of dict based on that. Do that with a generator, and it will yield one item at a time.
Using namedtuple you can access your data using strings instead of ints, which is more readable.
from collections import defaultdict, namedtuple
entries = namedtuple('Entries', 'name movie score')
new_dic = defaultdict(dict)
keys = (entries(name[0], k, name[1]) for k, v in dic.items() for name in v)
for item in keys:
    new_dic[item.name][item.movie] = item.score
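Since the question asks for a function that takes the dictionary, the defaultdict approach from the answers could be wrapped up roughly like this. The function name reviews_by_person is hypothetical, and returning a plain dict is a design choice rather than a requirement:

```python
from collections import defaultdict


def reviews_by_person(db):
    """Invert {movie: {(name, score), ...}} into {name: {movie: score}}."""
    result = defaultdict(dict)
    for movie, scores in db.items():
        for name, score in scores:
            result[name][movie] = score
    # Return a plain dict so that unknown names raise KeyError.
    return dict(result)


dic = {'movie1': {('bob', 1), ('jim', 3), ('dianne', 4)},
       'movie2': {('dianne', 1), ('bob', 3), ('asz', 4)}}

by_person = reviews_by_person(dic)
print(by_person['bob'])  # {'movie1': 1, 'movie2': 3}
```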

Filter out elements that occur less times than a minimum threshold

I am counting the occurrences of the elements in a list using the code below:
from collections import Counter
A = ['a','a','a','b','c','b','c','b','a']
A = Counter(A)
min_threshold = 3
After calling Counter on A above, a counter object like this is formed:
>>> A
Counter({'a': 4, 'b': 3, 'c': 2})
From here, how do I filter only 'a' and 'b' using minimum threshold value of 3?
Build your Counter, then use a dict comprehension as a second, filtering step.
{x: count for x, count in A.items() if count >= min_threshold}
# {'a': 4, 'b': 3}
As covered by Satish BV, you can iterate over your Counter with a dictionary comprehension. You could use items (or iteritems, which is more efficient if you're on Python 2) to get a sequence of (key, value) tuple pairs.
And then turn that into a Counter.
my_dict = {k: v for k, v in A.iteritems() if v >= min_threshold} # Python 2; use A.items() on Python 3
filteredA = Counter(my_dict)
Alternatively, you could iterate over a snapshot of the original Counter's items and remove the unnecessary values.
for k, v in list(A.items()): # copy the items, since A changes size during the loop
    if v < min_threshold:
        A.pop(k)
This looks nicer:
{ x: count for x, count in A.items() if count >= min_threshold }
You could remove the keys from the dictionary that are below 3:
for key, cnts in list(A.items()): # list is important here
    if cnts < min_threshold:
        del A[key]
Which gives you:
>>> A
Counter({'a': 4, 'b': 3})
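If the original counts should stay untouched, the comprehension can be wrapped in a small helper that returns a new Counter. A sketch only; the helper name at_least is made up for this example:

```python
from collections import Counter


def at_least(counter, min_threshold):
    """Return a new Counter with only the entries counted >= min_threshold times."""
    return Counter({k: c for k, c in counter.items() if c >= min_threshold})


A = Counter(['a', 'a', 'a', 'b', 'c', 'b', 'c', 'b', 'a'])
filtered = at_least(A, 3)

print(filtered)  # Counter({'a': 4, 'b': 3})
print(A['c'])    # 2, the original Counter is not modified
```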

Create a list from an existing list of key value pairs in python

I am trying to come up with a neat way of doing this in python.
I have a list of pairs of letters and numbers that looks like this:
[(a,1),(a,2),(a,3),(b,10),(b,100),(c,99),(d,-1),(d,-2)]
What I want to do is to create a new list for each letter and append all of its numerical values to it.
So, output should look like:
alist = [1,2,3]
blist = [10,100]
clist = [99]
dlist = [-1,-2]
Is there a neat way of doing this in Python?
from collections import defaultdict
data = [('a',1),('a',2),('a',3),('b',10),('b',100),('c',99),('d',-1),('d',-2)]
if __name__ == '__main__':
    result = defaultdict(list)
    for alphabet, number in data:
        result[alphabet].append(number)
or without the collections module:
if __name__ == '__main__':
    result = {}
    for alphabet, number in data:
        if alphabet not in result:
            result[alphabet] = [number, ]
            continue
        result[alphabet].append(number)
But I think the first solution is more effective and clearer.
If you want to avoid using a defaultdict but are comfortable using itertools, you can do it with a one-liner (groupby only groups consecutive items, so the data must already be sorted by key, as it is here):
from itertools import groupby
data = [('a',1),('a',2),('a',3),('b',10),('b',100),('c',99),('d',-1),('d',-2)]
grouped = dict((key, list(pair[1] for pair in values)) for (key, values) in groupby(data, lambda pair: pair[0]))
# gives {'b': [10, 100], 'a': [1, 2, 3], 'c': [99], 'd': [-1, -2]}
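One caveat with groupby is that it only merges adjacent items with the same key. The sketch below uses a deliberately unsorted sample list (not the data from the question) to show the difference that sorting first makes:

```python
from itertools import groupby
from operator import itemgetter

data = [('b', 10), ('a', 1), ('b', 100), ('a', 2)]  # deliberately unsorted
first = itemgetter(0)

# Without sorting, each run of equal keys becomes its own group,
# and the dict keeps only the last group seen for each key.
unsorted_result = {key: [n for _, n in pairs]
                   for key, pairs in groupby(data, key=first)}

# Sorting by key first (sorted() is stable) puts equal keys next to each other.
sorted_result = {key: [n for _, n in pairs]
                 for key, pairs in groupby(sorted(data, key=first), key=first)}

print(unsorted_result)  # {'b': [100], 'a': [2]}
print(sorted_result)    # {'a': [1, 2], 'b': [10, 100]}
```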
After seeing the responses in the thread and reading the implementation of defaultdict, I implemented my own version of it since I didn't want to use the collections library.
mydict = {}
for alphabet, value in data:
    try:
        mydict[alphabet].append(value)
    except KeyError:
        mydict[alphabet] = []
        mydict[alphabet].append(value)
You can use defaultdict from the collections module for this:
from collections import defaultdict
l = [('a',1),('a',2),('a',3),('b',10),('b',100),('c',99),('d',-1),('d',-2)]
d = defaultdict(list)
for k,v in l:
    d[k].append(v)
for k,v in d.items():
    exec(k + "list=" + str(v))

In Python merge two dictionaries so that their keys are added/subtracted

I have two dictionaries, both output of factorint from sympy.ntheory. I need to merge them so that the common keys get their values summed up, i.e. MergedDict[key] = Dict1[key] + Dict2[key], while unique keys remain the same.
Also I need to get a merged dictionary with the common keys being differenced, i.e. MergedDict[key] = Dict1[key] - Dict2[key]. Here Dict2 keys will be always a subset of Dict1 keys, so no problem of negative numbers.
I've tried to follow this question. But I'm unable to make it work. So far my approach has been as follows:
from sympy.ntheory import factorint
from collections import defaultdict
d=factorint(12)
dd = defaultdict(lambda: defaultdict(int))
for key, values_dict in d.items():
    for date, integer in values_dict.items():
        dd[key] += integer
for n in range(2,6):
    u = factorint(n)
    for key, values_dict in u.items():
        for date, integer in values_dict.items():
            dd[key] += integer
It gives the error AttributeError: 'int' object has no attribute 'items'. The code above is only for the summing part. I have yet to do anything on the differencing part, assuming that the summing code can be adapted to take differences for the common keys.
Not sure what your goal is, but factorint gives you key/value pairs of ints, so you should be summing the values directly. You are trying to call items on each value from the dict, which is an integer, so that cannot work:
from sympy.ntheory import factorint
from collections import defaultdict
d=factorint(12)
dd = defaultdict(int)
for key, val in d.items():
    dd[key] += val
for n in range(2, 6):
    u = factorint(n)
    for key, val in u.items():
        dd[key] += val
print(dd)
Output:
defaultdict(<type 'int'>, {2: 5, 3: 2, 5: 1})
Since factorint returns a dict, it cannot have duplicate keys, so the first loop can be done using update:
d = factorint(12)
dd = defaultdict(int)
dd.update(d)
for n in range(2, 6):
    u = factorint(n)
    for key, val in u.items():
        dd[key] += val
It seems that collections.Counter can do most of what you want. It might be as simple as (untested, I do not have sympy installed):
from collections import Counter
cnt1 = Counter(Dict1)
cnt2 = Counter(Dict2)
sum_cnt = cnt1 + cnt2
diff_cnt = cnt1 - cnt2
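The Counter approach can be tried without sympy by hard-coding dictionaries of the same shape. In the sketch below, dict1 and dict2 are stand-ins that match the factorizations of 12 and 6. One thing to watch for: the - operator drops keys whose result is zero or negative, while Counter.subtract keeps them:

```python
from collections import Counter

dict1 = {2: 2, 3: 1}  # same shape as factorint(12), hard-coded here
dict2 = {2: 1, 3: 1}  # same shape as factorint(6), hard-coded here

cnt1, cnt2 = Counter(dict1), Counter(dict2)

summed = dict(cnt1 + cnt2)   # common keys are added
diff_op = dict(cnt1 - cnt2)  # keys whose count reaches 0 are dropped

# Counter.subtract works in place and keeps zero (and negative) counts.
kept = Counter(dict1)
kept.subtract(dict2)
diff_keep = dict(kept)

print(summed)     # {2: 3, 3: 2}
print(diff_op)    # {2: 1}
print(diff_keep)  # {2: 1, 3: 0}
```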
