Does collections' Counter keep data sorted? - python

I was reading about Counter in Python's collections module. It shows the following:
>>> from collections import Counter
>>> Counter({'z': 9,'a':4, 'c':2, 'b':8, 'y':2, 'v':2})
Counter({'z': 9, 'b': 8, 'a': 4, 'c': 2, 'y': 2, 'v': 2})
Somehow the values are printed in descending order (9 > 8 > 4 > 2). Why is that? Does Counter store its values sorted?
PS: I am on Python 3.7.7

In terms of the data stored in a Counter object: The data is insertion-ordered as of Python 3.7, because Counter is a subclass of the built-in dict. Prior to Python 3.7, there was no guaranteed order of the data.
However, the behavior you are seeing is coming from Counter.__repr__. We can see from the source code that it will first try to display using the Counter.most_common method, which sorts by value in descending order. If that fails because the values are not sortable, it will fall back to the dict representation, which, again, is insertion-ordered.
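A minimal sketch of that distinction, assuming Python 3.7 or later: iterating the Counter shows the stored (insertion) order, while most_common(), which __repr__ relies on, gives the count-sorted view. The variable names are illustrative only.

```python
from collections import Counter

c = Counter({'z': 9, 'a': 4, 'c': 2, 'b': 8, 'y': 2, 'v': 2})

# Stored order: the order in which the keys were inserted (guaranteed on 3.7+).
stored = list(c.items())

# Display order: most_common() sorts by count, descending; ties keep insertion order.
displayed = c.most_common()

print(stored)
print(displayed)
print(repr(c))
```

Only the printed form is sorted; the underlying data is left in insertion order.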

The order depends on the Python version.
For Python < 3.7 there is no guaranteed order; since Python 3.7 the order is that of insertion.
Changed in version 3.7: As a dict subclass, Counter inherited the
capability to remember insertion order. Math operations on Counter
objects also preserve order. Results are ordered according to when an
element is first encountered in the left operand and then by the order
encountered in the right operand.
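The rule about math operations quoted above can be illustrated with a small sketch, assuming Python 3.7 or later. The two counters and their contents are made up for the illustration.

```python
from collections import Counter

left = Counter({'b': 1, 'a': 2})
right = Counter({'c': 1, 'a': 1})

total = left + right

# Keys seen in the left operand come first, then keys that only the right operand has.
key_order = list(total)

print(key_order)
print(dict(total))
```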
Example on python 3.8 (3.8.10 [GCC 9.4.0]):
from collections import Counter
Counter({'z': 9,'a':4, 'c':2, 'b':8, 'y':2, 'v':2})
Output:
Counter({'z': 9, 'a': 4, 'c': 2, 'b': 8, 'y': 2, 'v': 2})
How to check that Counter doesn't sort by count
Because Counter's __repr__ displays the items via most_common, the printed Counter is not a reliable way to check the stored order.
Convert it to a dict; the dict representation is faithful to the stored order.
c = Counter({'z': 9,'a':4, 'c':2, 'b':8, 'y':2, 'v':2})
print(dict(c))
# {'z': 9, 'a': 4, 'c': 2, 'b': 8, 'y': 2, 'v': 2}

Related

Update dictionary items in list through comprehension

I have a dictionary which I want to use as a template to generate multiple dictionaries, each with updated items. The resulting list should be used as a dataset for unit tests in pytest.
I am using the following construct in my code (checks are excluded):
from pprint import pprint
def _f(template,**kwargs):
    result = [template]
    for key, value in kwargs.items():
        # for every candidate value of this key, copy each dict built so far
        result = [dict(template_item,**dict([(key,v)])) for v in value for template_item in result]
    return result
template = {'a': '', 'b': '', 'x': 'asdf'}
r = _f(template, a=[1,2],b=[11,22])
pprint(r)
[{'a': 1, 'b': 11, 'x': 'asdf'},
 {'a': 2, 'b': 11, 'x': 'asdf'},
 {'a': 1, 'b': 22, 'x': 'asdf'},
 {'a': 2, 'b': 22, 'x': 'asdf'}]
I would like to ask whether the construct used to build the list is good enough, or whether it can be written more efficiently.
Is this the correct way to prepare testing data?
EDIT:
In particular, I am unsure about
[dict(template_item,**dict([(key,v)])) for v in value for template_item in result]
and
dict(template_item,**dict([(key,v)]))
At first I was thinking about dict.update(), but it is not suitable for a comprehension because it does not return the dictionary.
Then I was thinking about a simple syntax like
d = {'aa': 11, 'bb': 22}
dict(d,x=33,y=44)
{'aa': 11, 'bb': 22, 'x': 33, 'y': 44}
but I was unable to pass the key through a variable. And creating a dict just to unpack it sounds counterproductive to me.
In particular, I am unsure about...
Updating Python dicts inside comprehensions is a bit awkward because dicts are mutable: dict.update() changes the dict in place and returns None. In "Why doesn't a python dict.update() return the object?" the best answer suggests your current solution. Personally I'd probably go with a regular for-loop here to keep the code legible.
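As an aside that is not part of the linked answer: on Python 3.5 or later, a dict display with unpacking lets the key come from a variable without building a throwaway dict first. A minimal sketch, reusing the names from the question:

```python
template_item = {'a': '', 'b': '', 'x': 'asdf'}
key, v = 'a', 1

# {**mapping, key: value} builds a new dict: a copy of the mapping with one key overridden.
updated = {**template_item, key: v}

print(updated)
print(template_item)  # the template itself is left untouched
```

Inside the comprehension this would read {**template_item, key: v} in place of dict(template_item, **dict([(key, v)])).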
Is this the correct way to prepare testing data?
Usually in unit tests you will test both for edge cases and regular cases (you don't wanna repeat yourself, though). You usually want to split the tests, so that each has its own name explaining why it's there and possibly some other data that could help some outsider understand why it's important to make sure this scenario works correctly. Putting all scenarios in one list and then running the test for each one of them without giving the reader additional context (in form of at least a test case name) makes it harder for the reader to distinguish between the cases and judge whether they are all really needed.
Putting each of the scenarios in a separate test case may seem a bit tedious at times, but if any of the tests fails, you can immediately tell which part of the software is failing. If you feel like you write way too many unit tests, then perhaps some of them cover the same kinds of scenarios.
When dealing with unit tests, performance is rarely the top priority. Usually what counts more is keeping the number of tests minimal, yet sufficient to ensure the software works correctly. The other priority is making the tests easy to understand. See below for another take on this (not necessarily more performant, yet hopefully more legible).
Alternative solution
You could use itertools.product in order to simplify your code.
The template parameter can be removed (since you can pass the template variable names and their possible values in **kwargs):
from pprint import pprint
import itertools
def _f(**kwargs):
    keys, values = zip(*(kwargs.items())) # 1.
    subsets = list(itertools.product(*values)) # 2.
    return [
        {key: value for key, value in zip(keys, subset)} for subset in subsets
    ] # 3.
r = _f(a=[1, 2], b=[11, 22], x=['asdf'])
pprint(r)
Now what's happening in each of these steps:
Step 1.
You split the keyword dict into keys and values. This matters because it fixes the order in which you iterate over the arguments, so every value stays lined up with its key. The keys and values look like this at this point:
keys = ('a', 'b', 'x')
values = ([1, 2], [11, 22], ['asdf'])
Step 2. You compute the cartesian product of the values, which means you get all the possible combinations of taking a value from each of the values lists. The result of this operation is as follows:
subsets = [(1, 11, 'asdf'), (1, 22, 'asdf'), (2, 11, 'asdf'), (2, 22, 'asdf')]
Step 3.
Now you need to map each of the keys to its corresponding value in each of the subsets, hence the list and dict comprehensions. The result should contain exactly the dictionaries you computed with your previous method (listed in a different order):
[{'a': 1, 'b': 11, 'x': 'asdf'},
{'a': 1, 'b': 22, 'x': 'asdf'},
{'a': 2, 'b': 11, 'x': 'asdf'},
{'a': 2, 'b': 22, 'x': 'asdf'}]
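To check that both approaches really produce the same dictionaries, here is a small comparison sketch. The function names build_with_loop, build_with_product and canonical are hypothetical, and the loop version uses dict unpacking (Python 3.5+) instead of the original dict(...) call.

```python
import itertools

def build_with_loop(template, **kwargs):
    # The question's approach: grow the list one keyword at a time.
    result = [template]
    for key, values in kwargs.items():
        result = [{**item, key: v} for v in values for item in result]
    return result

def build_with_product(**kwargs):
    # The itertools.product approach from the answer.
    keys, values = zip(*kwargs.items())
    return [dict(zip(keys, combo)) for combo in itertools.product(*values)]

def canonical(dicts):
    # An order-independent form, so two lists of dicts can be compared as collections.
    return sorted(sorted(d.items()) for d in dicts)

loop_result = build_with_loop({'a': '', 'b': '', 'x': 'asdf'}, a=[1, 2], b=[11, 22])
product_result = build_with_product(a=[1, 2], b=[11, 22], x=['asdf'])

same_content = canonical(loop_result) == canonical(product_result)
same_order = loop_result == product_result
print(same_content, same_order)
```

The two lists hold the same four dictionaries; only the order in which they are listed differs.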

How to remove the least frequent element from a Counter in Python the fastest way?

I'd like to implement a Counter which drops the least frequent element when the counter's size grows beyond some threshold. For that I need to remove the least frequent element.
What is the fastest way to do that in Python?
I know counter.most_common()[-1], but it creates a whole list and seems slow when done extensively. Is there a better command (or maybe a different data structure)?
You may implement least_common by borrowing the implementation of most_common and making the necessary changes.
Refer to the collections source in Py2.7:
def most_common(self, n=None):
    '''List the n most common elements and their counts from the most
    common to the least. If n is None, then list all element counts.
    >>> Counter('abcdeabcdabcaba').most_common(3)
    [('a', 5), ('b', 4), ('c', 3)]
    '''
    # Emulate Bag.sortedByCount from Smalltalk
    if n is None:
        return sorted(self.iteritems(), key=_itemgetter(1), reverse=True)
    return _heapq.nlargest(n, self.iteritems(), key=_itemgetter(1))
To retrieve the least common elements instead, we need just a few adjustments.
import collections
from operator import itemgetter as _itemgetter
import heapq as _heapq
class MyCounter(collections.Counter):
    def least_common(self, n=None):
        if n is None:
            return sorted(self.iteritems(), key=_itemgetter(1), reverse=False) # was: reverse=True
        return _heapq.nsmallest(n, self.iteritems(), key=_itemgetter(1)) # was _heapq.nlargest
Tests:
c = MyCounter("abbcccddddeeeee")
assert c.most_common() == c.least_common()[::-1]
assert c.most_common()[-1:] == c.least_common(1)
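The code above targets Python 2.7 (iteritems). A minimal Python 3 sketch of the same idea, using items() instead; the class name LeastCommonCounter is hypothetical:

```python
import heapq
from collections import Counter
from operator import itemgetter

class LeastCommonCounter(Counter):
    def least_common(self, n=None):
        # Mirror of most_common(): ascending sort, or nsmallest when n is given.
        if n is None:
            return sorted(self.items(), key=itemgetter(1))
        return heapq.nsmallest(n, self.items(), key=itemgetter(1))

c = LeastCommonCounter("abbcccddddeeeee")
print(c.least_common(2))
```

With a small n, heapq.nsmallest avoids sorting the whole counter, although it still has to look at every item once.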
Since your stated goal is to remove items in the counter below a threshold, just invert the counter (so that each count maps to the list of keys with that count) and then remove the keys whose count is at or below the threshold.
Example:
>>> c=Counter("aaaabccadddefeghizkdxxx")
>>> c
Counter({'a': 5, 'd': 4, 'x': 3, 'c': 2, 'e': 2, 'b': 1, 'g': 1, 'f': 1, 'i': 1, 'h': 1, 'k': 1, 'z': 1})
counts={}
for k, v in c.items():
    counts.setdefault(v, []).append(k)
tol=2
for k, v in counts.items():
    if k<=tol:
        c=c-Counter({}.fromkeys(v, k))
>>> c
Counter({'a': 5, 'd': 4, 'x': 3})
In this example, all counts less than or equal to 2 are removed.
Or, just recreate the counter with a comparison to your threshold value:
>>> c
Counter({'a': 5, 'd': 4, 'x': 3, 'c': 2, 'e': 2, 'b': 1, 'g': 1, 'f': 1, 'i': 1, 'h': 1, 'k': 1, 'z': 1})
>>> Counter({k:v for k,v in c.items() if v>tol})
Counter({'a': 5, 'd': 4, 'x': 3})
If you only want to get the least common value, then the most efficient way to handle this is to simply get the minimum value from the counter (dictionary).
Since you can only tell that a value is the lowest after comparing it with every other value, you need to look at all items, so a time complexity of O(n) is the best we can get. However, we do not need linear space, as we only need to remember the lowest value seen so far, not all of them. So a solution that works like most_common() in reverse does more than we need.
In this case, we can simply use min() with a custom key function:
>>> c = Counter('foobarbazbar')
>>> c
Counter({'a': 3, 'b': 3, 'o': 2, 'r': 2, 'f': 1, 'z': 1})
>>> k = min(c, key=lambda x: c[x])
>>> del c[k]
>>> c
Counter({'a': 3, 'b': 3, 'o': 2, 'r': 2, 'z': 1})
Of course, you have little influence on which key is removed when several share the same lowest count: min() returns the first such key it meets while iterating, which is arbitrary on Python versions where dictionaries are unordered and is the earliest inserted one on Python 3.7+.
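Putting this together for the size-limited counter from the question, here is a minimal sketch. The helper name add_bounded and its max_size parameter are hypothetical, and the eviction policy (drop one least frequent key as soon as the limit is exceeded) is an assumption about what the asker wants.

```python
from collections import Counter

def add_bounded(counter, item, max_size):
    """Count item, then evict one least frequent key if the counter grew past max_size."""
    counter[item] += 1
    if len(counter) > max_size:
        # O(n) scan with constant extra space; no sorted list is built.
        victim = min(counter, key=counter.get)
        del counter[victim]

c = Counter()
for ch in "aaabbc":
    add_bounded(c, ch, max_size=2)

print(c)
```

Note that with this policy a newly seen item is often the one evicted right away, since it starts with a count of 1.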

Why is the Python dictionary key order like this?

I have a dictionary:
dict_a = dict(zip(('a','b','c','d','e'),(1,2,3,4,5)))
The output is:
dict_a = {'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4}
I want to know why it is not:
dict_a = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
I do know dict_a is not a sorted object, but I still want to know why the key order is a, c, b, e, d and not some other order.
Thanks
Dictionaries are not just unsorted, they are unordered. At a deeper level, a dictionary is a hash table: each key is hashed, and the hash decides where the entry is stored.
Let's tackle this another way. In traditional languages you have arrays. Internally, arrays are contiguous memory, i.e. x[0] and x[1] are next to each other in memory. Dictionaries, meanwhile, are loose collections of pointers. y[a] and y[b] have no physical relationship, so they have no order.
See longer discussions earlier:
Why is python ordering my dictionary like so?
Why is the order in dictionaries and sets arbitrary?
(And this should rather have been a comment, but I don't have the reputation to write one...)
As you said, dictionaries are not ordered objects. So no matter what order you add items in, they will be jumbled up. Dictionaries do not support indexing by position, so they have no reason to keep the items in the order you expect. I guess it saves memory not having to know what position the items are supposed to be in.
In a way you can say they have indexing, using keys rather than positions (as in lists) to obtain the associated value. A key can only point to one value, just as you can only have 1 value at position 0 in a list.
More info at Python documentation
Because the regular dictionary does not keep track of insertion order; it uses an arbitrary order. So if you want an ordered dictionary you should use OrderedDict, which does keep track of insertion order. But you need to consider your situation when you use an ordered dictionary, because it is slower than a regular dictionary when you insert and update items.
If you want to maintain the order of the items in your dictionary, you can use an OrderedDict:
from collections import OrderedDict
dict_a = OrderedDict()
dict_a['a'] = 1
dict_a['b'] = 2
dict_a['c'] = 3
dict_a['d'] = 4
dict_a['e'] = 5
Then the order of the dictionary will be maintained:
dict_a = OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)])
So, first I thought it was zip that was causing it. However it is not zip since when I enter
>>zip(('a','b','c','d','e'),(1,2,3,4,5))
The output is from left to right
[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)]
So I considered that maybe it has to do with how memory storage differs between our systems (I'm using Windows 10). I checked with the same input and got the same output
>>dict_a = dict(zip(('a','b','c','d','e'),(1,2,3,4,5)))
{'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4}
Now, I checked the documentation for Python's built-in types. Their example looks more like what you'd expect, where
>>c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
has the output from right to left, (which I'm guessing after zipping the two lists the dictionary adds from popping values and using the update function).
>>{'three': 3, 'two': 2, 'one': 1}
No matter how you build the dictionary, it still has the same output (for dictionaries, like sets, order does not matter, as OP states).
>>d = dict([('two', 2), ('one', 1), ('three', 3)])
{'three': 3, 'two': 2, 'one': 1}
The python manual states for built in types
If keyword arguments are given, the keyword arguments and their values are added to the dictionary created from the positional argument
Given that the key arguments don't change with or without zip, I tried one last thing. I decided to switch the keys and items
>>d = dict([(1,'a'), (2,'b'),(3,'c'),(4,'d'),(5,'e')])
Which outputted what we expected
{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
Ordered by numbers. My explanation is that the dictionary orders its keys by their hash values rather than by their spelling: small integers hash to themselves, so 1 to 5 come out in numeric order, while the hashes of the strings 'a','b','c','d','e' are effectively arbitrary numbers, which changes the order of a,b,c,d,e among the keys of the dictionary.
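One way to look at the ordering seen above, as a deliberately simplified model rather than the exact CPython implementation (which also handles collisions and resizing): an entry's slot in the hash table is derived from the key's hash. The TABLE_SIZE value is an assumption chosen for illustration.

```python
TABLE_SIZE = 8  # assumed size of a small hash table, for illustration only

int_keys = [1, 2, 3, 4, 5]
int_hashes = [hash(k) for k in int_keys]                # small ints hash to themselves
int_slots = [h & (TABLE_SIZE - 1) for h in int_hashes]  # so their slots are already in order

str_keys = ['a', 'b', 'c', 'd', 'e']
# String hashes look arbitrary (and are randomized per process on Python 3.3+).
str_slots = [hash(k) & (TABLE_SIZE - 1) for k in str_keys]

print(int_slots)
print(str_slots)
```

In this model the integer keys land in slots 1 to 5 in numeric order, whereas the slots of the string keys bear no relation to alphabetical order.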

Why does this python dictionary get created out of order using setdefault()?

I'm just starting to play around with Python (VBA background). Why does this dictionary get created out of order? Shouldn't it be a:1, b:2...etc.?
class Card:
    def county(self):
        c = 0
        l = 0
        groupL = {} # groupL for Loop
        for n in range(0,13):
            c += 1
            l = chr(n+97)
            groupL.setdefault(l,c)
        return groupL
pick_card = Card()
group = pick_card.county()
print group
here's the output:
{'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4, 'g': 7, 'f': 6, 'i': 9, 'h': 8, 'k': 11, 'j': 10, 'm': 13, 'l': 12}
or, does it just get printed out of order?
Dictionaries have no order in Python versions before 3.7 (from 3.7 on, a plain dict keeps insertion order). In other words, when you iterate over such a dictionary, the order in which the keys/items are "yielded" is not necessarily the order in which you put them into the dictionary. (Try your code on a different version of Python and you're likely to get differently ordered output.) If you want a dictionary that is ordered, you need a collections.OrderedDict, which wasn't introduced until Python 2.7. You can find equivalent recipes on ActiveState if you're using an older version of Python. However, often it's good enough to just sort the items (e.g. sorted(mydict.items())).
EDIT as requested, an OrderedDict example:
from collections import OrderedDict
groupL = OrderedDict() # groupL for Loop
c = 0
for n in range(0,13):
    c += 1
    l = chr(n+97)
    groupL.setdefault(l,c)
print (groupL)
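The simpler suggestion of sorting the items can be sketched like this. The dict literal is a shortened, made-up stand-in for the jumbled output in the question.

```python
jumbled = {'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4}

# sorted() on the items returns a list of (key, value) pairs ordered by key.
pairs_by_key = sorted(jumbled.items())

for key, value in pairs_by_key:
    print(key, value)
```

This leaves the dictionary itself untouched and only fixes the order at the point where it is displayed.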
