I have a dictionary called wordCounts which maps a word to how many times it occurred. How can I get the top n words in the dict while allowing more than n if there is a tie?
As the previous answer says, you can cast as a Counter to make this dataset easier to deal with.
>>> from collections import Counter
>>> d = {"d":1,"c":2,"a":3,'b':3,'e':0,'f':1}
>>> c = Counter(d)
>>> c
Counter({'b': 3, 'a': 3, 'c': 2, 'f': 1, 'd': 1, 'e': 0})
Counter has a most_common(n) method that returns the n most common elements. Note that it returns exactly n items, so elements tied with the nth one are cut off. For example:
>>> c.most_common(4)
[('b', 3), ('a', 3), ('c', 2), ('f', 1)]
To include all values equal to the nth element, you can do something like the following, without converting to a Counter. This is pretty messy, but it should do the trick.
def most_common_inclusive(freq_dict, n):
    # find the nth most common value
    nth_most_common = sorted(freq_dict.values(), reverse=True)[n-1]
    # keep every item whose count is at least that value, so ties are included
    return {k: v for k, v in freq_dict.items() if v >= nth_most_common}
You can use it as follows:
>>> d = {'b': 3, 'a': 3, 'c': 2, 'f': 1, 'd': 1, 'e': 0}
>>> most_common_inclusive(d, 4)
{'d': 1, 'b': 3, 'c': 2, 'f': 1, 'a': 3}
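The function above indexes position n-1 of the sorted counts, so it raises IndexError when n is larger than the number of entries. A minimal sketch of a guarded variant follows; the name most_common_inclusive_safe is hypothetical, and using heapq.nlargest for the cut-off is a swapped-in technique rather than part of the original answer.

```python
import heapq

def most_common_inclusive_safe(freq_dict, n):
    """Return {word: count} for the top n counts, keeping ties with the nth."""
    if n <= 0 or not freq_dict:
        return {}
    # nlargest returns at most n values, so its last element is a valid
    # cut-off even when n exceeds the number of entries.
    cutoff = heapq.nlargest(n, freq_dict.values())[-1]
    return {k: v for k, v in freq_dict.items() if v >= cutoff}

d = {'b': 3, 'a': 3, 'c': 2, 'f': 1, 'd': 1, 'e': 0}
print(most_common_inclusive_safe(d, 4))   # both words with count 1 are kept
print(most_common_inclusive_safe(d, 10))  # n larger than the dict: everything
```

This also avoids sorting every count when n is small, though the final filter still scans the whole dictionary.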
One solution could be:
from collections import Counter, defaultdict
list_of_words = ['dog', 'cat', 'moo', 'dog', 'pun', 'pun']
def get_n_most_common(n, list_of_words):
    ct = Counter(list_of_words)
    d = defaultdict(list)
    for word, quantity in ct.items():
        d[quantity].append(word)
    most_common = sorted(d.keys(), reverse=True)
    return [(word, val) for val in most_common[:n] for word in d[val]]
And the tests:
>> get_n_most_common(2, list_of_words)
=> [('pun', 2), ('dog', 2), ('moo', 1), ('cat', 1)]
>> get_n_most_common(1, list_of_words)
=> [('pun', 2), ('dog', 2)]
MooingRawr is on the right track, but now we need to just get the top n results
l = []
for i, (word, count) in enumerate(sorted(d.items(), reverse=True, key=lambda x: x[1])):
    if i >= n and count < l[-1][1]:
        break
    l.append((word, count))
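The loop above relies on d and n already being defined. As a rough sketch, it can be wrapped in a function; the name top_n_with_ties and the extra guard for n <= 0 are assumptions added for illustration, not part of the original answer.

```python
from collections import Counter

def top_n_with_ties(counts, n):
    """Return (word, count) pairs for the n highest counts, plus any ties."""
    result = []
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for i, (word, count) in enumerate(ranked):
        # Once n items are collected, stop at the first strictly smaller count.
        if i >= n and (not result or count < result[-1][1]):
            break
        result.append((word, count))
    return result

ct = Counter(['dog', 'cat', 'moo', 'dog', 'pun', 'pun'])
print(top_n_with_ties(ct, 1))  # both words with count 2
print(top_n_with_ties(ct, 3))  # the third slot is tied, so four pairs come back
```

Unlike get_n_most_common earlier in the thread, n here counts words rather than distinct count values.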
I have a list of 90 strings and a separate string X. In my code I compare X to each of the 90 strings and count how many matches there are with each string in the list. This worked at first, but then I realised it was printing the cumulative frequency. I moved the counter inside my loop, and although the count now resets, the code no longer seems to go through the whole list: when I print the length of the result at the end, I get 11 rather than 90, and I'm not sure why.
I've recreated it below:
Just use a Counter object from collections
import collections
l = ['a', 'a', 'b', 'c', 'd', 'c']
c = collections.Counter()
c.update(l)
print(c)
Results in this:
Counter({'a': 2, 'c': 2, 'b': 1, 'd': 1})
The problem here is that you are overwriting entries in your dictionary. When you come across a string with the same number of matches as a previous string, you set results[matches] to that string, thus erasing the previous string stored there. The length of results is therefore the number of distinct match counts, not the number of strings. I can't propose a fix, as it is not entirely clear what you are trying to implement here.
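To make the overwriting concrete, here is a small sketch. The asker's code is not shown, so the data, the matches helper and the variable names are all hypothetical stand-ins.

```python
words = ['ABCD', 'ABBB', 'ACCC', 'AACC']  # hypothetical sample data
base = 'ACEC'

def matches(a, b):
    """Count positions where the two strings have the same character."""
    return sum(x == y for x, y in zip(a, b))

# Keyed by match count: a later word with the same count replaces the earlier one.
overwritten = {}
for word in words:
    overwritten[matches(base, word)] = word

# Keyed by word: one entry per distinct word, nothing is lost.
per_word = {word: matches(base, word) for word in words}

print(len(overwritten), overwritten)  # fewer entries than words
print(len(per_word), per_word)
```

'ABCD' and 'ABBB' both have one match, so the first dictionary keeps only the later of the two.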
It's not completely clear what you want to achieve. Here are a couple of options: With
words = [
    'ABCD', 'ABBB', 'ACCC', 'AACC', 'ACDA', 'AACC', 'ABBB', 'ACCC',
    'AGGG', 'ABBC', 'BBBA', 'BCCD', 'ABBE', 'ABBE', 'ACDE', 'ACCC'
]
base = 'ACEC'
this
results = [(sum(a == b for a, b in zip(base, word)), word) for word in words]
results in
[(1, 'ABCD'), (1, 'ABBB'), (3, 'ACCC'), (2, 'AACC'), (2, 'ACDA'), (2, 'AACC'),
(1, 'ABBB'), (3, 'ACCC'), (1, 'AGGG'), (2, 'ABBC'), (0, 'BBBA'), (1, 'BCCD'),
(1, 'ABBE'), (1, 'ABBE'), (2, 'ACDE'), (3, 'ACCC')]
and this
results = {word: sum(a == b for a, b in zip(base, word)) for word in words}
results in
{'ABCD': 1, 'ABBB': 1, 'ACCC': 3, 'AACC': 2, 'ACDA': 2, 'AGGG': 1,
'ABBC': 2, 'BBBA': 0, 'BCCD': 1, 'ABBE': 1,'ACDE': 2}
and this
results = {}
for word in words:
    results.setdefault(sum(a == b for a, b in zip(base, word)), []).append(word)
results in
{0: ['BBBA'],
1: ['ABCD', 'ABBB', 'ABBB', 'AGGG', 'BCCD', 'ABBE', 'ABBE'],
2: ['AACC', 'ACDA', 'AACC', 'ABBC', 'ACDE'],
3: ['ACCC', 'ACCC', 'ACCC']}
I'd like to implement a Counter which drops the least frequent element when the counter's size goes beyond some threshold. For that I need to remove the least frequent element.
What is the fastest way to do that in Python?
I know counter.most_common()[-1], but it creates a whole list and seems slow when called repeatedly. Is there a better command (or maybe a different data structure)?
You can implement least_common by borrowing the implementation of most_common and making the necessary changes.
Refer to collections source in Py2.7:
def most_common(self, n=None):
    '''List the n most common elements and their counts from the most
    common to the least. If n is None, then list all element counts.
    >>> Counter('abcdeabcdabcaba').most_common(3)
    [('a', 5), ('b', 4), ('c', 3)]
    '''
    # Emulate Bag.sortedByCount from Smalltalk
    if n is None:
        return sorted(self.iteritems(), key=_itemgetter(1), reverse=True)
    return _heapq.nlargest(n, self.iteritems(), key=_itemgetter(1))
To change it in order to retrieve least common we need just a few adjustments.
import collections
from operator import itemgetter as _itemgetter
import heapq as _heapq
class MyCounter(collections.Counter):
    def least_common(self, n=None):
        # items() works on Python 2 and 3; the 2.7 source uses iteritems()
        if n is None:
            return sorted(self.items(), key=_itemgetter(1), reverse=False)  # was: reverse=True
        return _heapq.nsmallest(n, self.items(), key=_itemgetter(1))  # was: _heapq.nlargest
Tests:
c = MyCounter("abbcccddddeeeee")
assert c.most_common() == c.least_common()[::-1]
assert c.most_common()[-1:] == c.least_common(1)
Since your stated goal is to remove items in the counter below a threshold, just invert the counter (so that each count maps to a list of the keys with that count) and then remove the keys whose count is at or below the threshold.
Example:
>>> c=Counter("aaaabccadddefeghizkdxxx")
>>> c
Counter({'a': 5, 'd': 4, 'x': 3, 'c': 2, 'e': 2, 'b': 1, 'g': 1, 'f': 1, 'i': 1, 'h': 1, 'k': 1, 'z': 1})
counts = {}
for k, v in c.items():
    counts.setdefault(v, []).append(k)
tol = 2
for k, v in counts.items():
    if k <= tol:
        c = c - Counter({}.fromkeys(v, k))
>>> c
Counter({'a': 5, 'd': 4, 'x': 3})
In this example, all counts less than or equal to 2 are removed.
Or, just recreate the counter with a comparison to your threshold value:
>>> c
Counter({'a': 5, 'd': 4, 'x': 3, 'c': 2, 'e': 2, 'b': 1, 'g': 1, 'f': 1, 'i': 1, 'h': 1, 'k': 1, 'z': 1})
>>> Counter({k:v for k,v in c.items() if v>tol})
Counter({'a': 5, 'd': 4, 'x': 3})
If you only want to get the least common value, then the most efficient way to handle this is to simply get the minimum value from the counter (dictionary).
Since you can only say whether a value is the lowest, you actually need to look at all items, so a time complexity of O(n) is really the lowest we can get. However, we do not need to have a linear space complexity, as we only need to remember the lowest value, and not all of them. So a solution that works like most_common() in reverse is too much for us.
In this case, we can simply use min() with a custom key function here:
>>> c = Counter('foobarbazbar')
>>> c
Counter({'a': 3, 'b': 3, 'o': 2, 'r': 2, 'f': 1, 'z': 1})
>>> k = min(c, key=lambda x: c[x])
>>> del c[k]
>>> c
Counter({'a': 3, 'b': 3, 'o': 2, 'r': 2, 'z': 1})
Of course, since dictionaries are unordered, you have no control over which key is removed when several share the same lowest count. (From Python 3.7 on, dictionaries keep insertion order, so min() returns the earliest-inserted of the tied keys.)
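Putting the min() approach together with the size limit from the question gives roughly the following sketch. The helper name add_bounded and its max_size parameter are hypothetical, chosen only for illustration.

```python
from collections import Counter

def add_bounded(counter, key, max_size):
    """Count key, then evict least frequent keys while over max_size."""
    counter[key] += 1
    while len(counter) > max_size:
        # Linear scan for a minimum-count key; no intermediate list is built.
        least = min(counter, key=counter.get)
        del counter[least]

c = Counter()
for ch in 'aaabbbccd':
    add_bounded(c, ch, max_size=3)
print(c)  # 'd' arrives with count 1 while the counter is full and is evicted
```

One design caveat: a brand-new key enters with count 1, so it is often the key that gets evicted straight away. Whether that is acceptable depends on what the bounded counter is meant to approximate.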
X = [[1,2], [5,1], [1,2], [2,-1] , [5,1]]
I want to count the "frequency" of repeated elements, for example [1,2].
Unless speed is really an issue, the simplest approach is to map the sub arrays to tuples and use a Counter dict:
X = [[1,2], [5,1], [1,2], [2,-1] , [5,1]]
from collections import Counter
cn = Counter(map(tuple, X))
print(cn)
print(list(filter(lambda x:x[1] > 1,cn.items())))
Counter({(1, 2): 2, (5, 1): 2, (2, -1): 1})
[((1, 2), 2), ((5, 1), 2)]
If you consider [1, 2] equal to [2, 1], then you could use a frozenset: Counter(map(frozenset, X))
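A short sketch of the difference between the tuple and frozenset keys. The list X below is a hypothetical variant of the one in the question, changed so that it contains reversed pairs.

```python
from collections import Counter

# Hypothetical data: [2, 1] and [1, 5] are reversed copies of other pairs.
X = [[1, 2], [5, 1], [2, 1], [2, -1], [1, 5]]

ordered = Counter(map(tuple, X))        # [1, 2] and [2, 1] are different keys
unordered = Counter(map(frozenset, X))  # [1, 2] and [2, 1] share one key

print(ordered[(1, 2)])
print(unordered[frozenset([1, 2])])
```

Note that a frozenset also collapses repeated values, so [1, 1] and [1] would end up under the same key. If that matters, Counter(tuple(sorted(item)) for item in X) is one alternative that ignores order but keeps duplicates.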
Take a look at numpy.unique: http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.unique.html
You can use the return_counts argument for getting the count of each item:
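import numpy
# axis=0 treats each row as one item (NumPy 1.13+); without it the array is flattened
values, counts = numpy.unique(X, axis=0, return_counts=True)
repeated = values[counts > 1]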
Assuming I understand what you want:
Count each item of your list in a dictionary, then select the entries whose count is greater than 1.
The following code might help you:
freq = dict()
for item in X:
    # count by tuple, since lists are not hashable
    if tuple(item) not in freq:
        freq[tuple(item)] = 1
    else:
        freq[tuple(item)] += 1
print({k: v for k, v in freq.items() if v > 1})
That code will give you the output:
{(1, 2): 2, (5, 1): 2}
I am trying to sort this
{1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
to this
{3: 4, 4: 3, 1: 2, 2: 1, 0: 0}
using an a:b model, sorted by b value from highest to lowest.
import operator
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=operator.itemgetter(1), reverse=True)
print(sorted_x)
The first result is correct, but it is a list:
[(3, 4), (4, 3), (1, 2), (2, 1), (0, 0)]
Then I tried
print(dict(sorted_x))
But…
{0: 0, 1: 2, 2: 1, 3: 4, 4: 3}
How can I keep the correctly sorted result and convert it to dict format?
Thank you so much
upd:
OrderedDict([(3, 4), (4, 3), (1, 2), (2, 1), (0, 0)])
I can't understand how OrderedDict can solve my problem
>>> from collections import OrderedDict
>>> sorted_dict = OrderedDict()
>>> dct = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
>>> for x in sorted(dct.values(), reverse=True):
...     keys = [idx for idx in dct if dct[idx] == x]
...     for key in keys:
...         sorted_dict[key] = x
...
>>>
>>> sorted_dict
OrderedDict([(3, 4), (4, 3), (1, 2), (2, 1), (0, 0)])
>>>
or if you are dead set on writing something that looks like dict syntax to a file,
>>> def print_dict_looking_thing(ordered_dict):
...     print("{")
...     for key, value in ordered_dict.items():
...         print("{0}: {1},".format(key, value))
...     print("}")
...
>>> print_dict_looking_thing(sorted_dict)
{
3: 4,
4: 3,
1: 2,
2: 1,
0: 0,
}
That is how OrderedDict "solves your problem". But, a few points worth reemphasizing:
Dicts are unordered (before Python 3.7; later versions keep insertion order but still have no sort method). You can't "sort them", only operate on their keys and values.
OrderedDicts remember the order that items were added.
It sounds like you are trying to print to a file something that looks like dict syntax. If you do that and read the dict into python at some point, it will become an unordered object (a dict) again.
It is not super clear what "your problem" is. What is the desired end state; printing a string that reads like dict syntax? Making something that has the accessor methods that dicts do? Clarifying what exactly you want (in terms of performance) would be helpful.
The code above has O(n**2) complexity. Don't use it on big things.
A plain dictionary can't be sorted in place, and before Python 3.7 it did not keep insertion order either. You can use the collections.OrderedDict class instead.
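As a sketch of how OrderedDict can be filled without the quadratic loop mentioned above, the already-sorted list of pairs from the question can be passed straight to the constructor. The second variant assumes Python 3.7 or later, where plain dicts keep insertion order.

```python
from collections import OrderedDict
from operator import itemgetter

x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}

# One O(n log n) sort of the (key, value) pairs, highest value first.
pairs = sorted(x.items(), key=itemgetter(1), reverse=True)

ordered = OrderedDict(pairs)  # remembers insertion order on any Python version
plain = dict(pairs)           # keeps this order only on Python 3.7 and later

print(ordered)
print(plain)
```

Both objects support the usual dict lookups. The order reflects only how the items were inserted, so adding a new key later appends it at the end rather than re-sorting.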
I am using Python to learn linear algebra and I have two dictionaries:
v = {1: 1, 2: 8, 3: 0}
and
M = {(1, 2): 2, (3, 1): 1, (3, 3): 7, (2, 1): -1}
and I want to make a dictionary that, for each key in v, sums the products of v's value with every element of M whose tuple key starts with that key. Here is an example of the answer I want for the two dictionaries above (showing the calculations I want to do):
newDict = {1: 1*M[(1, 2)], 2: 8*M[(2, 1)], 3: 0*M[(3, 1)]+0*M[(3, 3)]}
which is the same as:
newDict = {1: 1*2, 2: 8*-1, 3: 0*1+0*7}
so I get a final dictionary in the form
newDict = {1:2, 2:-8, 3:0}
as you can see, I want the same keys as in the dictionary v. The closest I have gotten is this:
>>> [v[k]*M[r] for k in v for r in M if k == r[0]]
[2, -8, 0, 0]
This at least gives the right products, but I cannot get it into the dictionary form I want. I don't know where to go from here, or whether I am on the right track at all. Sorry if my explanation is lacking.
Because you are basing values on multiple input keys, use a loop, not a comprehension. Using a collections.defaultdict object makes the logic a little simpler too:
from collections import defaultdict
newDict = defaultdict(int)
for x, y in M:
    newDict[x] += M[x, y] * v.get(x, 0)
Output:
>>> from collections import defaultdict
>>> v = {1: 1, 2: 8, 3: 0}
>>> M = {(1, 2): 2, (3, 1): 1, (3, 3): 7, (2, 1): -1}
>>> newDict = defaultdict(int)
>>> for x, y in M:
...     newDict[x] += M[x, y] * v.get(x, 0)
...
>>> newDict
defaultdict(<type 'int'>, {1: 2, 2: -8, 3: 0})
How about this..
newD = {k: 0 for k in v}
for k in v:
    for r in M:
        if k == r[0]:
            newD[k] += v[k] * M[r]
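For comparison, the nested loops above can also be folded into a single dict comprehension if rescanning M for every key of v is acceptable. This is only a sketch of the same calculation, with the same quadratic cost:

```python
v = {1: 1, 2: 8, 3: 0}
M = {(1, 2): 2, (3, 1): 1, (3, 3): 7, (2, 1): -1}

# For each key k of v, sum v[k] * M[(row, col)] over the entries whose row is k.
newDict = {k: sum(v[k] * M[r] for r in M if r[0] == k) for k in v}
print(newDict)  # {1: 2, 2: -8, 3: 0}
```

A key of v that never appears as a row in M gets the value 0 here, because sum() over an empty sequence is 0.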