How reliable is Python's dictionary ordering?

I need to quickly hash a dictionary (a counter), and I’m noticing that python seems to order dictionaries with the same keys in the same order, even if they are constructed differently. In fact the dictionaries seem to be able to survive quite a bit of abuse:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> list(D)
['b', 'c', 'a']
>>> list(D)
['b', 'c', 'a']
>>> list(D)
['b', 'c', 'a']
>>> list(D)
['b', 'c', 'a']
>>> E = {'a': 1, 'b': 2, 'c': 3}
>>> list(E)
['b', 'c', 'a']
>>> list(E)
['b', 'c', 'a']
>>> list(E)
['b', 'c', 'a']
>>> F = {'a': 1, 'b': 2, 'c': 3}
>>> list(F)
['b', 'c', 'a']
>>> list(F)
['b', 'c', 'a']
>>> list(F)
['b', 'c', 'a']
>>> list(F)
['b', 'c', 'a']
>>> G = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> list(G)
['b', 'c', 'a', 'd']
>>> list(G)
['b', 'c', 'a', 'd']
>>> list(G)
['b', 'c', 'a', 'd']
>>> list(F)
['b', 'c', 'a']
>>> F.pop('a')
1
>>> list(F)
['b', 'c']
>>> F['a'] = 2
>>> list(F)
['b', 'c', 'a']
>>> list(F)
['b', 'c', 'a']
>>> H = {'b': 2, 'a': 1, 'c': 3}
>>> list(H)
['b', 'c', 'a']
>>> H = {'b': 2, 'c': 1, 'a': 3}
>>> list(H)
['b', 'c', 'a']
>>> K = {'b': 2, 'c': 1, 'a': 3, 'd': 4}
>>> list(K)
['b', 'c', 'a', 'd']
>>> K = {'b': 2, 'c': 1, 'd': 3, 'a': 4}
>>> list(K)
['b', 'c', 'a', 'd']
My question is then, if my dictionaries have the same keys and the same values, can I count on the keys being in the same order, at least for the lifetime of that running instance of python?
Note that I’m aware python is a bit incomprehensible in how it decides to order a dictionary, but I want to know if given the same inputs, the same instance of python will return the same key ordering each time.

Before Python 3.7, regular Python dicts are not ordered as far as the language is concerned. There is no guarantee that the list of keys comes back in the order you expect.
If you need to preserve order on those versions, use an OrderedDict.
https://docs.python.org/2/library/collections.html#collections.OrderedDict

Python 3.7 and later
Dictionary order is guaranteed to be insertion order.
Python before 3.7
In terms of the language definition, no you cannot rely on stable ordering, because it is not promised in the language definition.
Now, it might be that over the short and medium term you will find that this ordering is stable, and this makes sense: computers are deterministic, so it's reasonable to expect the same results from one iteration of the experiment to the next. (However, since they are complex systems, a nominally deterministic machine might still produce unexpected results, because you don't know all the factors that determine the outcome.) This reasoning does not extend to the long term, which is what you should be programming for, because the language implementation is free to choose any means of ordering those keys that it likes, and to change that choice at any time, as long as the implementation is consistent with the language definition. This means that programs depending on some order remaining stable are subject to breakage if run under different implementations, and they are subject to breakage when the implementation is updated.
This is not a place you want to be, therefore you should not make any assumptions about the stability of ordering of dictionary keys.
That being said, if you are only concerned about stability just across the lifetime of one running instance of python then this seems like a safe gamble - again, computers are deterministic - but still a gamble. Test carefully against cases rather more complex than the ones you're expecting to encounter, and then decide whether that chopping block looks like a comfortable place to rest your neck.
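For the original goal of hashing a counter, one option that avoids relying on key order at all is to hash an order-independent view of the items. A minimal sketch, assuming every key and value is hashable; the helper name counter_key is made up for illustration:

```python
from collections import Counter


def counter_key(counter):
    """Return a hashable key that ignores the order of the items.

    Assumes every key and value is itself hashable.
    """
    return frozenset(counter.items())


a = Counter("abca")   # built in one order
b = Counter("caab")   # same counts, built in another order

# Equal contents give equal keys, and equal keys give equal hashes.
print(counter_key(a) == counter_key(b))
print(hash(counter_key(a)) == hash(counter_key(b)))
```

When the keys are mutually comparable, tuple(sorted(counter.items())) is an alternative that also yields a reproducible ordering.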

if my dictionaries have the same keys and the same values, can I count on the keys being in the same order
No.
>>> list({'d': 0, 'l': 0})
['d', 'l']
>>> list({'l': 0, 'd': 0})
['l', 'd']
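On Python 3.7 and later, that difference comes from insertion order. A short sketch of what does and does not compare equal in that case, assuming such an interpreter (the names d1 and d2 are only for illustration):

```python
d1 = {'d': 0, 'l': 0}
d2 = {'l': 0, 'd': 0}

same_contents = d1 == d2                     # dict equality ignores order
same_key_order = list(d1) == list(d2)        # iteration follows insertion order
same_sorted_keys = sorted(d1) == sorted(d2)  # sorting removes the dependence

print(same_contents, same_key_order, same_sorted_keys)
```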

Given that nobody mentioned this yet, I'll tell you that hash randomization is enabled by default since Python 3.3.
With hash randomization, the result of hash('abc') is different between each Python run. Because hashes are at the base of dictionaries (they are used to determine the location of the item in the internal array used by dict), there are even fewer guarantees about ordering.
$ python3.5
>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> list(d)
['a', 'c', 'b']
>>> list(d)
['a', 'c', 'b']
$ python3.5
# new process, new random seed, new ordering
>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> list(d)
['c', 'a', 'b']
>>> list(d)
['c', 'a', 'b']
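To observe this from within a script, a rough sketch can start fresh interpreters with the PYTHONHASHSEED environment variable set. The helper name hash_in_fresh_process is hypothetical, and the sketch assumes sys.executable points at a CPython 3.7+ interpreter:

```python
import os
import subprocess
import sys


def hash_in_fresh_process(seed):
    """Return hash('abc') as computed by a new interpreter with a fixed seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('abc'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# The same seed gives the same string hash in every new process.
print(hash_in_fresh_process(1) == hash_in_fresh_process(1))
```

Fixing the seed makes string hashes repeatable across runs; leaving the variable unset, which is the default, gives each process its own random seed.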

Related

Sorting list based on dictionary keys in python

Is there a short way to sort a list based on the order of another dictionary keys?
suppose I have:
lst = ['b', 'c', 'a']
dic = { 'a': "hello" , 'b': "bar" , 'c': "foo" }
I want to sort the list to be ['a','b','c'] based on the order of dic keys.
You can create a lookup of keys versus their insertion order in dic. To do so you can write:
>>> lst = ['b', 'c', 'a']
>>> dic = {"a": "hello", "b": "bar", "c": "foo"}
>>> order = {k: i for i, k in enumerate(dic)}
>>> order
{'a': 0, 'b': 1, 'c': 2}
Using this you can write a simple lookup for the key argument of sorted to rank items based on order.
>>> sorted(lst, key=order.get)
['a', 'b', 'c']
If there are values in lst that are not found in dic you should call get using a lambda so you can provide a default index. You'll have to choose if you want to rank unknown items at the start or end.
Default to the start:
>>> lst = ['d', 'b', 'c', 'a']
>>> sorted(lst, key=lambda k: order.get(k, -1))
['d', 'a', 'b', 'c']
Default to the end:
>>> lst = ['d', 'b', 'c', 'a']
>>> sorted(lst, key=lambda k: order.get(k, len(order)))
['a', 'b', 'c', 'd']
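The pieces above can be wrapped into one helper. This is only a sketch: the function name sort_by_key_order and its unknown_last parameter are invented here, and it relies on dicts iterating in insertion order (Python 3.7+):

```python
def sort_by_key_order(items, reference, unknown_last=True):
    """Sort items by the position of each item among reference's keys."""
    order = {key: index for index, key in enumerate(reference)}
    # Items missing from reference are ranked after or before everything else.
    default = len(order) if unknown_last else -1
    return sorted(items, key=lambda item: order.get(item, default))


dic = {"a": "hello", "b": "bar", "c": "foo"}
at_end = sort_by_key_order(['d', 'b', 'c', 'a'], dic)
at_start = sort_by_key_order(['d', 'b', 'c', 'a'], dic, unknown_last=False)
print(at_end)
print(at_start)
```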

Python Iterating through two lists only iterates through last element

I am trying to iterate through a double list but am getting the incorrect results. I am trying to get the count of each element in the list.
l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'], ['<s>', 'a', 'c', 'b', 'c', '</s>'], ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
dict = {}
for words in l:
    for letters in words:
        dict[letters] = words.count(letters)
for x in dict:
    print(x + ":" + str(dict[x]))
at the moment, I am getting:
<s>:1
a:1
b:2
c:2
</s>:1
It seems as if it is only iterating through the last list in 'l' : ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']
but I am trying to get:
<s>: 3
a: 4
b: 5
c: 6
</s>:3
In each inner for loop, you are not adding to the current value of dict[letters] but setting it to whatever count is found for the current sublist (peculiarly named words).
Fixing your code with a vanilla dict:
>>> l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'], ['<s>', 'a', 'c', 'b', 'c', '</s>'], ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
>>> d = {}
>>>
>>> for sublist in l:
...     for x in sublist:
...         d[x] = d.get(x, 0) + 1
...
>>> d
{'<s>': 3, 'a': 4, 'b': 5, 'c': 6, '</s>': 3}
Note that I am not calling list.count in each inner for loop. Calling count will iterate over the whole list again and again. It is far more efficient to just add 1 every time a value is seen, which can be done by looking at each element of the (sub)lists exactly once.
Using a Counter.
>>> from collections import Counter
>>> Counter(x for sub in l for x in sub)
Counter({'c': 6, 'b': 5, 'a': 4, '<s>': 3, '</s>': 3})
Using a Counter and not manually unnesting the nested list:
>>> from collections import Counter
>>> from itertools import chain
>>> Counter(chain.from_iterable(l))
Counter({'c': 6, 'b': 5, 'a': 4, '<s>': 3, '</s>': 3})
The dictionary is being overwritten in every iteration; instead, each occurrence should add to the running count:
count_dict[letters] += 1
For this to work on keys that are not present yet, initialize the dictionary as a defaultdict:
from collections import defaultdict
count_dict = defaultdict(int)
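Put together, the defaultdict approach might look like the following minimal sketch. It adds 1 per occurrence instead of calling count, so each element is visited exactly once:

```python
from collections import defaultdict

l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'],
     ['<s>', 'a', 'c', 'b', 'c', '</s>'],
     ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]

count_dict = defaultdict(int)  # missing keys start at 0
for words in l:
    for letters in words:
        count_dict[letters] += 1  # accumulate across all sublists

for token, count in count_dict.items():
    print(token + ":" + str(count))
```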
As @Vishnudev said, you must add to the current count. But dict[letters] must already exist (otherwise you'll get a KeyError exception). You can use the get method of dict with a default value to avoid this:
l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'],
     ['<s>', 'a', 'c', 'b', 'c', '</s>'],
     ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
dict = {}
for words in l:
    for letters in words:
        dict[letters] = dict.get(letters, 0) + 1
As your question suggests, the dictionary ends up holding only the result for the last sublist. This happens because each iteration overwrites the values stored by the previous one. You need to keep the previous counts and add the new occurrences to them.
You can try this:
l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'], ['<s>', 'a', 'c', 'b', 'c', '</s>'], ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
d = {}
for lis in l:
    for x in lis:
        if x in d:
            d[x] += 1
        else:
            d[x] = 1
The resulting dictionary d will be:
{'<s>': 3, 'a': 4, 'c': 6, 'b': 5, '</s>': 3}
I hope this helps!

How to store sorted records in python, with log (n) access time?

I need to store records arranged in ascending order, with log(n) access time. I come from a C++ background, and if I had to use C++ I would have gone for std::map, which implements a red-black tree internally. This guarantees that the records are always stored in ascending order of the keys, and also guarantees log(n) access time. But what's the best way to do this in Python 3.5?
One way to solve this problem will be to use the bintrees library, but is there a dedicated library for storing sorted records?
You can use sortedcontainers, which lets you maintain always-sorted data structures (SortedList, SortedDict, SortedSet).
You can install using
pip install sortedcontainers
Here is a quick example
import sortedcontainers
g = {'B': ['A', 'C'],
     'C': ['D'],
     'A': ['B', 'C'],
     'D': [],
     }
l = sortedcontainers.SortedDict(g)
>>> l
SortedDict(None, 1000, {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['D'], 'D': []})
>>> l['G']=['A','B']
>>> l
SortedDict(None, 1000, {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['D'], 'D': [], 'G': ['A', 'B']})
>>> l['E']=['C','D','G']
>>> l
SortedDict(None, 1000, {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['D'], 'D': [], 'E': ['C', 'D', 'G'], 'G': ['A', 'B']})
>>>
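If installing a third-party package is not an option, the standard library's bisect module can approximate this. The following is a rough sketch, not a red-black tree: the class name SortedRecords and its methods are invented for illustration, lookups use binary search in O(log n), but inserting a new key costs O(n) because the underlying list has to shift elements.

```python
import bisect


class SortedRecords:
    """Keeps keys in ascending order; an illustrative stand-in for std::map."""

    def __init__(self):
        self._keys = []     # always sorted
        self._values = {}   # key -> record

    def insert(self, key, value):
        if key not in self._values:
            # O(log n) search for the slot, O(n) to shift the list.
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values[key]

    def floor_key(self, key):
        """Return the largest stored key <= key, found by binary search."""
        index = bisect.bisect_right(self._keys, key)
        if index == 0:
            raise KeyError(key)
        return self._keys[index - 1]

    def items(self):
        """Return (key, value) pairs in ascending key order."""
        return [(k, self._values[k]) for k in self._keys]


records = SortedRecords()
for name, neighbours in [('B', ['A', 'C']), ('C', ['D']), ('A', ['B', 'C']), ('D', [])]:
    records.insert(name, neighbours)
print(records.items())
```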

Divide list to multiple lists based on elements value

I have the following list:
initial_list = [['B', 'D', 'A', 'C', 'E']]
On each element of the list I apply a function and put the results in a dictionary:
for state in initial_list:
    next_dict[state] = move([state], alphabet)
This gives the following result:
next_dict = {'D': ['E'], 'B': ['D'], 'A': ['C'], 'C': ['C'], 'E': ['D']}
What I would like to do is separate the keys from initial_list based on their values in the next_dict dictionary, basically grouping the elements of the first list with the elements that have the same value in next_dict:
new_list = [['A', 'C'], ['B', 'E'], ['D']]
'A' and 'C' will stay in the same group because they have the same value 'C', 'B' and 'E' will also share the same group because their value is 'D', and then 'D' will be in its own group.
How can I achieve this result?
You need itertools.groupby, applied after sorting your list by the next_dict values. From the documentation:
"It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function)."
from itertools import groupby
initial_list = ['B', 'D', 'A', 'C', 'E']
def move(letter):
    return {'A': 'C', 'C': 'C', 'D': 'E', 'E': 'D', 'B': 'D'}.get(letter)
sorted_list = sorted(initial_list, key=move)
print([list(v) for k, v in groupby(sorted_list, key=move)])
#=> [['A', 'C'], ['B', 'E'], ['D']]
Simplest way to achieve this will be to use itertools.groupby with key as dict.get as:
>>> from itertools import groupby
>>> next_dict = {'D': ['E'], 'B': ['D'], 'A': ['C'], 'C': ['C'], 'E': ['D']}
>>> initial_list = ['B', 'D', 'A', 'C', 'E']
>>> [list(i) for _, i in groupby(sorted(initial_list, key=next_dict.get), next_dict.get)]
[['A', 'C'], ['B', 'E'], ['D']]
I'm not exactly sure that's what you want but you can group the values based on their values in the next_dict:
>>> next_dict = {'D': 'E', 'B': 'D', 'A': 'C', 'C': 'C', 'E': 'D'}
>>> # external library but one can also use a defaultdict.
>>> from iteration_utilities import groupedby
>>> groupings = groupedby(['B', 'D', 'A', 'C', 'E'], key=next_dict.__getitem__)
>>> groupings
{'C': ['A', 'C'], 'D': ['B', 'E'], 'E': ['D']}
and then convert that to a list of their values:
>>> list(groupings.values())
[['A', 'C'], ['D'], ['B', 'E']]
Combine everything into a one-liner (not really recommended but a lot of people prefer that):
>>> list(groupedby(['B', 'D', 'A', 'C', 'E'], key=next_dict.__getitem__).values())
[['A', 'C'], ['D'], ['B', 'E']]
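The defaultdict alternative mentioned in the comment above could look like the following sketch. It assumes next_dict maps each state to a list, as in the question, and converts that list to a tuple so it can serve as a dictionary key:

```python
from collections import defaultdict

initial_list = ['B', 'D', 'A', 'C', 'E']
next_dict = {'D': ['E'], 'B': ['D'], 'A': ['C'], 'C': ['C'], 'E': ['D']}

groups = defaultdict(list)
for state in initial_list:
    # Lists are unhashable, so use a tuple of the target states as the key.
    groups[tuple(next_dict[state])].append(state)

# Sort the groups so the result does not depend on dictionary order.
new_list = sorted(groups.values())
print(new_list)
```

Unlike groupby, this does not require the input to be sorted first, since every state is appended to its group in a single pass.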
Try this:
next_next_dict = {}
for key in next_dict:
    if next_dict[key][0] in next_next_dict:
        next_next_dict[next_dict[key][0]].append(key)
    else:
        next_next_dict[next_dict[key][0]] = [key]
new_list = list(next_next_dict.values())
Or this:
new_list = []
for value in next_dict.values():
    new_value = [key for key in next_dict.keys() if next_dict[key] == value]
    if new_value not in new_list:
        new_list.append(new_value)
We can sort your list with your dictionary mapping, and then use itertools.groupby to form the groups. The only amendment I made here is making your initial list an actual flat list.
>>> from itertools import groupby
>>> initial_list = ['B', 'D', 'A', 'C', 'E']
>>> next_dict = {'D': ['E'], 'B': ['D'], 'A': ['C'], 'C': ['C'], 'E': ['D']}
>>> s_key = lambda x: next_dict[x]
>>> [list(v) for k, v in groupby(sorted(initial_list, key=s_key), key=s_key)]
[['A', 'C'], ['B', 'E'], ['D']]

Python count in a sublist in a nest list

x = [['a', 'b', 'c'], ['a', 'c', 'd'], ['e', 'f', 'f']]
Let's say we have a list with random str letters.
How can I create a function that tells me how many times the letter 'a' comes up, which in this case is 2? Or any other letter: 'b' comes up once, 'f' comes up twice, etc.
Thank you!
You could flatten the list and use collections.Counter:
>>> import collections
>>> x = [['a', 'b', 'c'], ['a', 'c', 'd'], ['e', 'f', 'f']]
>>> d = collections.Counter(e for sublist in x for e in sublist)
>>> d
Counter({'a': 2, 'c': 2, 'f': 2, 'b': 1, 'e': 1, 'd': 1})
>>> d['a']
2
import itertools, collections
result = collections.defaultdict(int)
for i in itertools.chain(*x):
    result[i] += 1
This will create result as a dictionary with the characters as keys and their counts as values.
Just FYI, you can use sum() to flatten a single nested list.
>>> from collections import Counter
>>>
>>> x = [['a', 'b', 'c'], ['a', 'c', 'd'], ['e', 'f', 'f']]
>>> c = Counter(sum(x, []))
>>> c
Counter({'a': 2, 'c': 2, 'f': 2, 'b': 1, 'e': 1, 'd': 1})
But, as Blender and John Clements have addressed, itertools.chain.from_iterable() may be more clear.
>>> from itertools import chain
>>> c = Counter(chain.from_iterable(x))
>>> c
Counter({'a': 2, 'c': 2, 'f': 2, 'b': 1, 'e': 1, 'd': 1})
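Since the question asks for a function, the approaches above can be wrapped as follows. This is a small sketch; the name count_letter is made up for illustration:

```python
from collections import Counter
from itertools import chain


def count_letter(nested, letter):
    """Count how often letter occurs across all sublists of nested."""
    # Counter returns 0 for letters that never occur.
    return Counter(chain.from_iterable(nested))[letter]


x = [['a', 'b', 'c'], ['a', 'c', 'd'], ['e', 'f', 'f']]
print(count_letter(x, 'a'))
print(count_letter(x, 'f'))
```

If many letters need to be looked up, building the Counter once and indexing it repeatedly avoids re-scanning the lists on every call.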
