How to find the 2nd max of a Counter - Python

The max of a Counter can be accessed like this:
import operator
from collections import Counter
c = Counter()
c['foo'] = 124123
c['bar'] = 43
c['foofro'] = 5676
c['barbar'] = 234
# This only prints the max key
print max(c), c[max(c)]
# Print the key with the max value
x = max(c.iteritems(), key=operator.itemgetter(1))[0]
print x, c[x]
What if I want the counter sorted by descending count?
How do I access the 2nd maximum, or the 3rd, or the Nth maximum key?

most_common(self, n=None) method of collections.Counter instance
List the n most common elements and their counts from the most common to the least. If n is None, then list all element counts.
>>> Counter('abcdeabcdabcaba').most_common(3)
[('a', 5), ('b', 4), ('c', 3)]
and so:
>>> c.most_common()
[('foo', 124123), ('foofro', 5676), ('barbar', 234), ('bar', 43)]
>>> c.most_common(2)[-1]
('foofro', 5676)
Note that max(c) probably doesn't return what you want: iterating over a Counter iterates over its keys, so max(c) == max(c.keys()) == 'foofro', because that key sorts last as a string. You'd need to do something like
>>> max(c, key=c.get)
'foo'
to get a key with the largest value (there may be ties). In a similar fashion, you could forgo most_common entirely and do the sort yourself:
>>> sorted(c, key=c.get)[-2]
'foofro'
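The two approaches above can be wrapped in a small helper. This is a minimal sketch in Python 3; the function name nth_most_common is hypothetical and not part of collections.Counter, and ties are resolved in whatever order most_common returns them.

```python
from collections import Counter

def nth_most_common(counter, n):
    """Return the (key, count) pair ranked n-th by count, with n starting at 1."""
    if not 1 <= n <= len(counter):
        raise IndexError("n must be between 1 and the number of keys")
    # most_common(n) lists the n highest counts in descending order,
    # so the last pair of that list is the n-th maximum.
    return counter.most_common(n)[-1]

c = Counter(foo=124123, bar=43, foofro=5676, barbar=234)
print(nth_most_common(c, 2))  # ('foofro', 5676)
print(nth_most_common(c, 4))  # ('bar', 43)
```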

Related

Make a list with the most frequent tuple of a dictionary according to the first element

I'm trying to make a list that contains the most frequent tuple of a dictionary, grouped by the first element. For example:
If d is my dictionary:
d = {('Hello', 'my'): 1, ('Hello', 'world'): 2, ('my', 'name'): 3, ('my', 'house'): 1}
I want to obtain a list like this:
L = [('Hello', 'world'), ('my', 'name')]
So I tried this:
L = [k for k, val in d.iteritems() if val == max(d.values())]
But that only gives me the max over all the tuples:
L = [('my', 'name')]
I was thinking that maybe I have to go through my dictionary and make a new one for every first word of each tuple, then find the most frequent and put it in a list, but I'm having trouble translating that into code.
from itertools import groupby
# your input data
d = {('Hello', 'my'): 1,('Hello', 'world'):2, ('my', 'name'):3, ('my','house'):1}
key_fu = lambda x: x[0][0] # first element of first element,
# i.e. of ((a,b), c), return a
groups = groupby(sorted(d.iteritems(), key=key_fu), key_fu)
l = [max(g, key=lambda x:x[1])[0] for _, g in groups]
This is achievable in O(n) if you just re-key the mapping off the first word:
>>> d = {('Hello','my'): 1, ('Hello','world'): 2, ('my','name'): 3, ('my','house'): 1}
>>> d_max = {}
>>> for (first, second), count in d.items():
...     if count >= d_max.get(first, (None, 0))[1]:
...         d_max[first] = (second, count)
...
>>> d_max
{'Hello': ('world', 2), 'my': ('name', 3)}
>>> output = [(first, second) for (first, (second, count)) in d_max.items()]
>>> output
[('my', 'name'), ('Hello', 'world')]
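The same single-pass idea can be written as a reusable function. This is a sketch in Python 3, assuming dictionaries preserve insertion order (Python 3.7+); the name most_frequent_by_first is hypothetical. It uses a strict comparison, so on a tie the pair seen first wins, unlike the >= in the session above.

```python
def most_frequent_by_first(pair_counts):
    """Return, for each first word, its most frequent (first, second) pair."""
    best = {}  # first word -> (second word, count)
    for (first, second), count in pair_counts.items():
        # Strictly larger counts replace the stored pair; ties keep the earlier one.
        if first not in best or count > best[first][1]:
            best[first] = (second, count)
    return [(first, second) for first, (second, _) in best.items()]

d = {('Hello', 'my'): 1, ('Hello', 'world'): 2, ('my', 'name'): 3, ('my', 'house'): 1}
print(most_frequent_by_first(d))  # [('Hello', 'world'), ('my', 'name')]
```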
In my opinion, you should not take the max over all the values of d, because that just gives you the largest value in the whole dictionary (3 in this case).
What I would do is build an intermediate list (this could be hidden inside a helper) that stores the count as the first element and the first part of the key as the second element. That way you can sort the list and take the first entry for each first word to get its real max key.
You have pairs of words and a count associated to each of them. You could store your information in (or convert it to) 3-tuples:
d = [
('Hello', 'my', 1),
('Hello', 'world', 2),
('my', 'name', 3),
('my', 'house', 1)
]
For each word in the first position, you want to find the word in the 2nd position that occurs most frequently. Sort the data by the first word (any order, just to group them), then by the count (descending).
d.sort(lambda t1,t2: cmp(t2[2],t1[2]) if (t1[0]==t2[0]) else cmp(t1[0],t2[0]))
Finally, iterate through the resulting array, keeping track of the last word encountered, and append only when encountering a new word in 1st position.
L = []
last_word = ""
for word1, word2, count in d:
    if word1 != last_word:
        L.append((word1, word2))
        last_word = word1
print L
By running this code, you obtain [('Hello', 'world'), ('my', 'name')].

How to replace the alphabetically smallest letter by 1, the next smallest by 2, without discarding multiple occurrences of a letter?

I am using Python 3 and I want to write a function that takes a string of all capital letters, so suppose s = 'VENEER', and gives me the following output '614235'.
The function I have so far is:
def key2(s):
    new = ''
    for ch in s:
        acc = 0
        for temp in s:
            if temp <= ch:
                acc += 1
        new += str(acc)
    return new
If s == 'VENEER' then new == '634335'. If s contains no duplicates, the code works perfectly.
I am stuck on how to edit the code to get the output stated in the beginning.
Note that the built-in method for replacing characters within a string, str.replace, takes a third argument, count. You can use this to your advantage, replacing only the first appearance of each letter (once you replace the first 'E', the second one becomes the first appearance, and so on):
def process(s):
    for i, c in enumerate(sorted(s), 1):
        ## print s # uncomment to see process
        s = s.replace(c, str(i), 1)
    return s
I have used the built-in functions sorted and enumerate to get the appropriate numbers to replace the characters:
1 2 3 4 5 6 # 'enumerate' from 1 -> 'i'
E E E N R V # 'sorted' input 's' -> 'c'
Example usage:
>>> process("VENEER")
'614235'
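The counting loop from the question can also be kept with a small change. The following is a sketch in Python 3, assuming repeated letters should be numbered from left to right; the name key2_fixed is hypothetical. Each letter's rank is the number of strictly smaller letters plus the number of copies of that letter up to and including the current position.

```python
def key2_fixed(s):
    """Rank each letter, numbering repeated letters from left to right."""
    out = []
    for i, ch in enumerate(s):
        # Letters anywhere in s that sort strictly before ch.
        smaller = sum(1 for other in s if other < ch)
        # Copies of ch at or before position i (includes ch itself).
        equal_so_far = sum(1 for other in s[:i + 1] if other == ch)
        out.append(str(smaller + equal_so_far))
    return ''.join(out)

print(key2_fixed("VENEER"))  # 614235
```

Like the original loop, this is quadratic in the length of the string, which is fine for short keys.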
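One way would be to use numpy.argsort with a stable sort (so that equal letters keep their original order) to find the order, then find the ranks, and join them:
>>> import numpy as np
>>> s = 'VENEER'
>>> order = np.argsort(list(s), kind='stable')
>>> rank = np.argsort(order) + 1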
>>> ''.join(map(str, rank))
'614235'
You can use a regex:
import re
s="VENEER"
for n, c in enumerate(sorted(s), 1):
    s = re.sub('%c' % c, '%i' % n, s, count=1)
print s
# 614235
You can also use several nested generators:
def indexes(seq):
    for v, i in sorted((v, i) for (i, v) in enumerate(seq)):
        yield i
print ''.join('%i' % (e+1) for e in indexes(indexes(s)))
# 614235
Going by your title, you may want something like this instead, where repeated letters share the same number:
>>> from collections import OrderedDict
>>> s='VENEER'
>>> d = {k: n for n, k in enumerate(OrderedDict.fromkeys(sorted(s)), 1)}
>>> "".join(map(lambda k: str(d[k]), s))
'412113'
As @jonrsharpe commented, I didn't need to use OrderedDict.
def caps_to_nums(in_string):
    indexed_replaced_string = [(idx, val) for val, (idx, ch) in enumerate(sorted(enumerate(in_string), key=lambda x: x[1]), 1)]
    return ''.join(map(lambda x: str(x[1]), sorted(indexed_replaced_string)))
First we run enumerate to be able to save the natural sort order
enumerate("VENEER") -> [(0, 'V'), (1, 'E'), (2, 'N'), (3, 'E'), (4, 'E'), (5, 'R')]
# this gives us somewhere to RETURN to later.
Then we sort that according to its second element, which is alphabetical, and run enumerate again with a start value of 1 to get the replacement value. We throw away the alpha value, since it's not needed anymore.
[(idx, val) for val, (idx, ch) in enumerate(sorted([(0, 'V'), (1, 'E'), ...], key = lambda x: x[1]), start=1)]
# [(1, 1), (3, 2), (4, 3), (2, 4), (5, 5), (0, 6)]
Then map the second element (our value) sorting by the first element (the original index)
map(lambda x: str(x[1]), sorted(replacement_values))
and str.join it
''.join(that_mapping)
Ta-da!
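The steps described above (sort the positions by letter, then write each rank back to its original position) can be sketched without nested enumerate calls. This is a Python 3 sketch; the name rank_letters is hypothetical. It relies on sorted() being stable, so equal letters keep their left-to-right order.

```python
def rank_letters(s):
    """Replace each character by its 1-based rank in stable sorted order."""
    # Positions of s, ordered by the letter found at each position.
    order = sorted(range(len(s)), key=lambda i: s[i])
    ranks = [0] * len(s)
    for rank, position in enumerate(order, start=1):
        ranks[position] = rank  # write the rank back at the original position
    return ''.join(str(r) for r in ranks)

print(rank_letters("VENEER"))  # 614235
```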

Selecting all top words in Python list using Counter

I believe this should be pretty straightforward, but it seems I am not able to think straight to get this right.
I have a list as follows:
comp = [Amazon, Apple, Microsoft, Google, Amazon, Ebay, Apple, Paypal, Google]
I just want to print the words that occur the most. I did the following:
cnt = Counter(comp.split(','))
final_list = cnt.most_common(2)
This gives me the following output:
[[('Amazon', 2), ('Apple', 2)]]
I am not sure what parameter to pass to most_common(), since it could be different for each input list. I would like to know how I can print all of the top-occurring words, whether that is 3 for one list or 4 for another. For the above sample, the output would be as follows:
[[('Amazon', 2), ('Apple', 2), ('Google',2)]]
Thanks
You can use itertools.takewhile here:
>>> from itertools import takewhile
>>> lis = ['Amazon', 'Apple', 'Microsoft', 'Google', 'Amazon', 'Ebay', 'Apple', 'Paypal', 'Google']
>>> c = Counter(lis)
>>> items = c.most_common()
Get the max count:
>>> max_ = items[0][1]
Select only those items whose count equals max_, and stop as soon as an item with a smaller count is found:
>>> list(takewhile(lambda x: x[1]==max_, items))
[('Google', 2), ('Apple', 2), ('Amazon', 2)]
You've misunderstood Counter.most_common:
most_common(self, n=None)
List the n most common elements and their counts from the most common
to the least. If n is None, then list all element counts.
That is, n is not a count threshold; it is the number of top items you want returned. c.most_common(n) is essentially equivalent to c.most_common()[:n]:
>>> c.most_common(4)
[('Google', 2), ('Apple', 2), ('Amazon', 2), ('Paypal', 1)]
>>> c.most_common()[:4]
[('Google', 2), ('Apple', 2), ('Amazon', 2), ('Paypal', 1)]
You can do this by maintaining two variables, maxi and maxi_value, storing the most frequent element and the number of times it has occurred. Note that on a tie this keeps only the word that reached the top count first.
counts = {}
maxi = None
maxi_value = 0
for elem in comp:
    try:
        counts[elem] += 1
    except KeyError:
        counts[elem] = 1
    if counts[elem] > maxi_value:
        maxi = elem
        maxi_value = counts[elem]
print (maxi)
Find the number of occurrences of one of the top words, and then filter the whole list returned by most_common:
>>> mc = cnt.most_common()
>>> filter(lambda t: t[1] == mc[0][1], mc)
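Putting the filtering idea into a function gives a version that also handles an empty input. This is a sketch in Python 3; the name most_common_with_ties is hypothetical. The result is sorted before printing because the order in which tied items come back from most_common can differ between Python versions.

```python
from collections import Counter

def most_common_with_ties(items):
    """Return every (item, count) pair whose count equals the highest count."""
    ranked = Counter(items).most_common()
    if not ranked:
        return []
    top_count = ranked[0][1]  # most_common() is sorted, so the first pair holds the max
    return [(item, count) for item, count in ranked if count == top_count]

comp = ['Amazon', 'Apple', 'Microsoft', 'Google', 'Amazon',
        'Ebay', 'Apple', 'Paypal', 'Google']
print(sorted(most_common_with_ties(comp)))
# [('Amazon', 2), ('Apple', 2), ('Google', 2)]
```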

How to write a function to rearrange a list according to the dictionary of index

How to write a function to rearrange a list according to the dictionary of index in python?
for example,
L=[('b',3),('a',2),('c',1)]
dict_index={'a':0,'b':1,'c':2}
I want a list of :
[2,3,1]
where 2 comes from 'a', 3 from 'b' and 1 from 'c'; that is, only the numbers in L are kept, rearranged in the order given by dict_index
Try this (edited with simpler solution):
L=[('b',3),('a',2),('c',1)]
dict_index={'a':0,'b':1,'c':2}
# Creates a new empty list with a "slot" for each letter.
result_list = [0] * len(dict_index)
for letter, value in L:
    # Assigns the value on the correct slot based on the letter.
    result_list[dict_index[letter]] = value
print result_list # prints [2, 3, 1]
sorted and the .sort() method of lists take a key parameter:
>>> L=[('b',3),('a',2),('c',1)]
>>> dict_index={'a':0,'b':1,'c':2}
>>> sorted(L, key=lambda x: dict_index[x[0]])
[('a', 2), ('b', 3), ('c', 1)]
and so
>>> [x[1] for x in sorted(L, key=lambda x: dict_index[x[0]])]
[2, 3, 1]
should do it. For a more interesting example -- yours happens to match alphabetical order with the numerical order, so it's hard to see that it's really working -- we can shuffle dict_index a bit:
>>> dict_index={'a':0,'b':2,'c':1}
>>> sorted(L, key=lambda x: dict_index[x[0]])
[('a', 2), ('c', 1), ('b', 3)]
Using list comprehensions:
def index_sort(L, dict_index):
    res = [(dict_index[i], j) for (i, j) in L] #Substitute in the index
    res = sorted(res, key=lambda entry: entry[0]) #Sort by index
    res = [j for (i, j) in res] #Just take the value
    return res
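For reference, the sort-by-key approach fits in two lines in Python 3. This is a sketch, assuming every key in the list of pairs appears in the index dictionary; the name values_in_index_order is hypothetical.

```python
def values_in_index_order(pairs, index_of):
    """Return the values from (key, value) pairs, ordered by index_of[key]."""
    ordered = sorted(pairs, key=lambda pair: index_of[pair[0]])
    return [value for _, value in ordered]

L = [('b', 3), ('a', 2), ('c', 1)]
print(values_in_index_order(L, {'a': 0, 'b': 1, 'c': 2}))  # [2, 3, 1]
print(values_in_index_order(L, {'a': 0, 'b': 2, 'c': 1}))  # [2, 1, 3]
```

Unlike the slot-assignment answer, this does not require the indices to be exactly 0 to len(L) - 1; they only need to be comparable.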

How to write a function counting the number of each letter in a list?

How can I write a function that counts how many times each letter occurs in a list?
For example:
letter_list=['a','b','a','c','b','a']
letter_index={'a':0,'b':1,'c':2}
I want to get a result of:
([3,2,1])
To get the most common items in a list, or just count the number of occurrences, use the Counter class.
from collections import Counter
letter_list=('a','b','a','c','b','a')
counter = Counter(letter_list)
print counter.most_common(1)
# Prints [('a', 3)] because 'a' is the most common element
And from this you can also get the number of occurrences of each element:
print counter['a'] # Prints 3
print counter.most_common() # Prints [('a', 3), ('b', 2), ('c', 1)]
Try using dict comprehension. Also, letter_list in your example is a tuple, not a list.
>>> letter_list = ['a','b','a','c','b','a']
>>> {x:letter_list.count(x) for x in letter_list}
{'a': 3, 'c': 1, 'b': 2}
To get the most frequent item in the list, you could use the Counter class as detailed by @BoppreH, or you could do something like this.
>>> max(set(letter_list), key=letter_list.count)
'a'
Just one small option:
letter_list=('a','b','a','c','b','a')
def __get_res(lVals):
    unique = set(lVals)
    res = map(lVals.count, unique)
    return (max(unique, key=lVals.count), res)
print __get_res(letter_list)
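None of the snippets above produce the list ordered by letter_index that the question asks for. Here is a minimal sketch in Python 3, assuming the values of letter_index are exactly the positions 0 to len(letter_index) - 1; the name counts_in_index_order is hypothetical.

```python
from collections import Counter

def counts_in_index_order(letters, letter_index):
    """Return a list whose slot letter_index[x] holds the count of x."""
    counts = Counter(letters)
    result = [0] * len(letter_index)
    for letter, slot in letter_index.items():
        result[slot] = counts[letter]  # a Counter returns 0 for letters it never saw
    return result

letter_list = ['a', 'b', 'a', 'c', 'b', 'a']
letter_index = {'a': 0, 'b': 1, 'c': 2}
print(counts_in_index_order(letter_list, letter_index))  # [3, 2, 1]
```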
