I am using a dictionary of dataframes to do some analysis on NFL teams. I need to loop through the dictionaries (backwards, ordered by time of insertion) for the analysis I plan to do. Each NFL team gets its own dictionary.
My functions iterate through the dictionary with code similar to the line displayed at the top. Each key is a tuple, and the second entry in the tuple denotes the week (of the NFL season) the game was played. I initially inserted week 1's key and value, then week 2's key and value, then week 3's key and value. Seeing the output, this works as planned and means my functions should work as they are meant to. No problems in practice. However, if you view the dictionary itself, the keys are out of order (see the second output).
So what exactly determines the order of the keys when you view the dictionary? The Buccaneers dictionary goes 2 -> 1 -> 3. But this is not the case for each team's dictionary; the order seems completely random. What determines this order? I am curious (I definitely inserted them in 1 -> 2 -> 3 order for every team). I am using Python 3.6
See this question for details. To summarize, dictionaries have preserved insertion order since CPython 3.6, but that was an implementation detail until the Python 3.7 language specification made it a guarantee. The docs state:
Changed in version 3.7: Dictionary order is guaranteed to be insertion order.
Hence the answer to your question is:
if you mean CPython 3.6 specifically, the dictionary order is the insertion order (though that is not guaranteed by the specs and one can imagine, in theory, a patch to CPython 3.6 that breaks this behaviour)
if you mean any implementation (CPython, Jython, PyPy...), the implementation determines the dictionary order: there is no guarantee on the order (unless specified by the implementation).
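A minimal sketch of the first case, assuming CPython 3.6 or any Python 3.7+. The team data is made up for illustration, and plain strings stand in for the dataframes:

```python
# Hypothetical stand-in for one team's dictionary: keys are (team, week) tuples.
games = {}
games[("Buccaneers", 1)] = "week 1 dataframe"
games[("Buccaneers", 2)] = "week 2 dataframe"
games[("Buccaneers", 3)] = "week 3 dataframe"

# Iterating the dict yields keys in insertion order.
weeks_forward = [week for (_team, week) in games]

# reversed() on a dict needs Python 3.8+; going through a list also works on 3.6.
weeks_backward = [week for (_team, week) in reversed(list(games))]

print(weeks_forward)   # [1, 2, 3]
print(weeks_backward)  # [3, 2, 1]
```

If the keys look shuffled when the dictionary is displayed, the display tool (not the dictionary) may be re-sorting them.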
You might ask why there are implementations of dictionaries that are not ordered by insertion order. I suggest you check the hash table data structure. Basically, the values are put in an array, depending on the hash of the key. The hash is a function that maps a key to the index of an array cell. This is why the lookup is so fast: take the key, compute the hash, read the value in the cell (I ignore the collision resolution details), instead of scanning a whole list of (key, value) pairs for instance.
There is no guarantee that the order of the hashed keys is the same as the order of insertion of the keys (or the order of the keys themselves). If you list the keys by scanning the array, the order of the keys appears to be random.
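To make this concrete, here is a toy hash table, not CPython's actual implementation. The function name toy_table_order, the table size of 8 and the linear probing are all assumptions chosen for illustration. It relies on small integers hashing to themselves, which holds in CPython:

```python
def toy_table_order(keys, size=8):
    """Place keys in a fixed-size array by hash, then list them by scanning the array."""
    table = [None] * size
    for key in keys:
        slot = hash(key) % size
        # Simplified collision handling: step to the next free cell.
        while table[slot] is not None:
            slot = (slot + 1) % size
        table[slot] = key
    # Listing the keys means scanning the array from the first cell to the last.
    return [key for key in table if key is not None]

inserted = [10, 3, 17, 4]
scanned = toy_table_order(inserted)
print(scanned)  # [17, 10, 3, 4]
```

The keys were inserted as 10, 3, 17, 4, but the scan returns them in slot order (slots 1, 2, 3, 4), which matches neither the insertion order nor the sorted order.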
Remark: you can use the collections.OrderedDict class to make the ordering explicit, but it also keeps the insertion order; it does not sort the keys (e.g. 'Opponent' < 'Reference'). To iterate in key order, use sorted() on the keys.
Since Python 3.7, dictionaries preserve order based on insertion.
It seems like you can get the first item in a dictionary using next(iter(my_dict))?
My question is around the Big O time complexity of that operation?
Can I regard next(iter(my_dict)) as a constant time (O(1)) operation? Or what's the best way to retrieve the first item in the dictionary in constant time?
The reason I'm asking is that I'm hoping to use this for coding interviews, where there's a significant emphasis on the time complexity of your solution, rather than how fast it runs in milliseconds.
It's probably the best way (actually you're getting the first key now, next(iter(d.values())) gets your value).
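A short sketch of these calls, assuming Python 3.7+ so that "first" means "first inserted". The prices dictionary is made up for illustration:

```python
prices = {"apple": 0.67, "milk": 1.49, "avocado": 1.49}

first_key = next(iter(prices))             # first inserted key
first_value = next(iter(prices.values()))  # value stored under that key
first_item = next(iter(prices.items()))    # (key, value) pair

# next() accepts a default, which avoids StopIteration on an empty dict.
first_of_empty = next(iter({}), None)

print(first_key, first_value, first_item, first_of_empty)
```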
This operation (any iteration through keys, values or items for combined tables at least) iterates through an array holding the dictionary entries:
PyDictKeyEntry *entry_ptr = &DK_ENTRIES(k)[i];
while (i < n && entry_ptr->me_value == NULL) {
    entry_ptr++;
    i++;
}
entry_ptr->me_value holds the value for each respective key.
If your dictionary is freshly created, this finds the first inserted item during the first iteration (the dictionary entries array is append-only, hence preserving order).
If your dictionary has been altered (you've deleted many of the items), this might, in the worst case, take O(N) to find the first inserted item among those remaining (where N is the total number of original items). This is due to dictionaries not resizing when items are removed and, as a result, entry_ptr->me_value being NULL for many entries.
Note that this is CPython specific. I'm not aware of how other implementations of Python implement this.
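The deletion scenario can be sketched as follows. The size of 1000 is arbitrary, and the example only shows that the result stays correct; it does not measure the extra scanning CPython may do internally:

```python
n = 1000
d = {i: str(i) for i in range(n)}

# Delete everything except the last inserted key.
for i in range(n - 1):
    del d[i]

# Still returns the first remaining key, but CPython may have to step over
# the emptied entries to reach it.
first_remaining = next(iter(d))
print(first_remaining, len(d))
```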
I'd like to explore the hash table,
In [1]: book = {"apple":0.67, "milk":1.49, "avocado":1.49, "python":2}
In [5]: [hex(id(key)) for key in book]
Out[5]: ['0x10ffffc70', '0x10ffffab0', '0x10ffffe68', '0x10ee1cca8']
The addresses show that the keys are far apart from each other, especially the key "python".
I had assumed that they would be adjacent to one another.
How does this happen? Does the dictionary still perform well?
There are two ways we can interpret your confusion: either you expected the id() to be the hash function for the keys, or you expected keys to be relocated to the hash table and, since in CPython the id() value is a memory location, that the id() values would say something about the hash table size. We can address both by talking about Python's dictionary implementation and how Python deals with objects in general.
Python dictionaries are implemented as a hash table, which is a table of limited size. To store keys, a hash function generates an integer (same integer for equal values), and the key is stored in a slot based on that number using a modulo function:
slot = hash(key) % len(table)
This can lead to collisions, so having a large range of numbers for the hash function to pick from is going to help reduce the chances there are such collisions. You still have to deal with collisions anyway, but you want to minimise that.
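A small illustration of the slot formula and of a collision, assuming a table of 8 slots (an arbitrary size for the example) and relying on small integers hashing to themselves in CPython:

```python
table_size = 8
keys = [1, 9, 17, 4]

# Apply the slot formula to each key.
slots = {key: hash(key) % table_size for key in keys}
print(slots)  # {1: 1, 9: 1, 17: 1, 4: 4}

# Keys that land in the same slot collide.
colliding = [key for key in keys if slots[key] == 1]
print(colliding)  # [1, 9, 17]
```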
Python does not use the id() function as a hash function here, because that would not produce the same hash for equal values! If equal values did not hash the same, you could not use separate "hello world" strings to find the right slot again: after dictionary["hello world"] = "value", the test "hello world" in dictionary could involve a different string object with a different id() value, which would hash to a different slot, and you would not know that this string value had already been used as a key.
Instead, objects are expected to implement a __hash__ method, and you can see what that method produces for various objects with the hash() function.
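A brief sketch of the difference between hash() and id(). The second string is built at runtime so that it is a separate object in memory. Whether two equal strings share an id() is an implementation detail, so the code does not rely on it:

```python
a = "hello world"
b = " ".join(["hello", "world"])  # equal value, built separately at runtime

same_value = (a == b)
same_hash = (hash(a) == hash(b))  # equal values must hash the same
same_object = (a is b)            # typically False in CPython: two objects

# Lookup works through hash and equality, not through identity.
dictionary = {a: "value"}
found = b in dictionary

print(same_value, same_hash, same_object, found)
```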
Because keys stored in a dictionary must remain unchanged, Python won't let you use its mutable built-in types (such as lists) as dictionary keys. Otherwise, if you could change their value, they would no longer be equal to another such object with the old value and the same hash, and you wouldn't find them in the slot that their new hash would map to.
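For example, a tuple works as a key while a list with the same contents is rejected. The key contents are made up, and the exact wording of the error message varies between Python versions:

```python
table = {}
table[("Buccaneers", 1)] = "ok"  # tuples are immutable and hashable

error = None
try:
    table[["Buccaneers", 1]] = "fails"  # lists are mutable and unhashable
except TypeError as exc:
    error = str(exc)

print(error)
```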
Note that Python puts all objects in a dynamic heap, and uses references everywhere to relate the objects. Dictionaries hold references to keys and values; putting a key into a dictionary does not re-locate the key in memory and the id() of the key won't change. If keys were relocated, then a requirement for the id() function would be violated, the documentation states: This is an integer which is guaranteed to be unique and constant for this object during its lifetime.
As for those collisions: Python deals with collisions by looking for a new slot with a fixed formula, finding an empty slot in a predictable but pseudorandom series of slot numbers; see the dictobject.c source code comments if you want to know the details. As the table fills up, Python will dynamically grow the table to fit more elements, so there will always be empty slots.
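As a rough sketch, the probing recurrence described in the dictobject.c comments can be modelled in Python as below. The function name probe_sequence is hypothetical, the sketch assumes a non-negative hash and a power-of-two table size, and details differ between CPython versions:

```python
from itertools import islice

PERTURB_SHIFT = 5  # shift constant named in the dictobject.c comments

def probe_sequence(hash_value, table_size):
    """Yield slot numbers in the order a CPython-style probe would try them."""
    mask = table_size - 1  # table_size must be a power of two
    perturb = hash_value
    slot = hash_value & mask
    while True:
        yield slot
        # Mix in higher bits of the hash, then advance with the 5*i + 1 recurrence.
        perturb >>= PERTURB_SHIFT
        slot = (slot * 5 + perturb + 1) & mask

first_eight = list(islice(probe_sequence(3, 8), 8))
print(first_eight)  # [3, 0, 1, 6, 7, 4, 5, 2]
```

Once perturb has been shifted down to zero, the recurrence visits every slot of the table, so an empty slot is always found as long as one exists.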
I have a set, I add items (ints) to it, and when I print it, the items apparently are sorted:
a = set()
a.add(3)
a.add(2)
a.add(4)
a.add(1)
a.add(5)
print a
# set([1, 2, 3, 4, 5])
I have tried with various values, apparently it needs to be only ints.
I run Python 2.7.5 under MacOSX. It is also reproduced using repl.it (see http://repl.it/TpV)
The question is: is this documented somewhere (I haven't found it so far), is it normal, and is it something that can be relied on?
Extra question: when is the sort done? during the print? is it internally stored sorted? (is that even possible given the expected constant complexity of insertion?)
This is a coincidence. The data is neither sorted nor does __str__ sort.
The hash values for integers equal their value (except for -1 and long integers outside the sys.maxint range), which increases the chance that integers are slotted in order, but that's not a given.
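These hash values can be checked directly. The results below are CPython behaviour and should be treated as implementation details:

```python
small_hash = hash(5)       # small integers hash to themselves in CPython
minus_one_hash = hash(-1)  # -1 is reserved as an error code in C, so CPython uses -2

items = set()
for n in [3, 2, 4, 1, 5]:
    items.add(n)

# The iteration order happens to look sorted here, but other integers
# (for example 1 and 8) may come out unsorted.
print(small_hash, minus_one_hash, list(items), list({1, 8}))
```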
set uses a hash table to track items contained, and ordering depends on the hash value, and insertion and deletion history.
The how and why of the interaction between integers and sets are all implementation details, and can easily vary from version to version. Python 3.3 introduced hash randomisation for certain types, and Python 3.4 expanded on this, making ordering of sets and dictionaries volatile between Python process restarts too (depending on the types of values stored).
I have a python dictionary (say dict) in which I keep modifying values (the keys remain unaltered). Will the order of keys in the list given by dict.keys() change when I modify the values corresponding to the keys?
No. A Python dictionary has an ordering for its keys, but it does not guarantee what that order will be or how it is calculated, which is why dictionaries are not guaranteed to be ordered in the first place.
The values stored in the dictionary do not have an effect on the hash values of the keys and so will not change ordering.
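A quick check of this claim, using a made-up dictionary. Only existing values are modified, and no keys are added or removed:

```python
scores = {"de": 1, "sk": 2, "hu": 3}
before = list(scores.keys())

# Modify values only; the set of keys stays the same.
scores["sk"] = 99
scores["de"] += 10

after = list(scores.keys())
print(before == after)  # True
```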
Taken from the Python Documentation:
The keys() method of a dictionary object returns a list of all the keys used in the dictionary, in arbitrary order (if you want it sorted, just apply the sorted() function to it). To check whether a single key is in the dictionary, use the in keyword.
No, the order of the dict will not change because you change the values. The order depends on the keys only (or on their hash values, to be more specific, at least in CPython). However, it may change between versions and implementations of Python, and from Python 3.3, where hash randomisation is on by default, it can change every time you start Python (for string keys, for example).
Python dictionaries' key ordering should not be assumed constant.
However, there are other datastructures that do give consistent key ordering, that work a lot like dictionaries:
http://stromberg.dnsalias.org/~strombrg/treap/
http://stromberg.dnsalias.org/~strombrg/red-black-tree-mod/
BTW, you should not name a variable "dict", because there is a builtin type called "dict" that will be made invisible.
I am a bit confused with the output I get from the following.
I do not understand the order of the loop that is executed.
domains = { "de": "Germany", "sk": "Slovakia", "hu": "Hungary",
            "us": "United States", "no": "Norway" }
for key in domains:
    print key
Output here is
sk
de
no
us
hu
but not
de
sk
hu
us
no
similarly, here
num = {1:"one",4:"two",23:"three",10:"four"}
for key in num:
    print key
output is
1
10
4
23
but not
1
4
23
10
Thanks for helping
Python dictionaries do not preserve ordering:
Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions
A dictionary in CPython is implemented as a hash table to enable fast lookups and membership tests, and enumerating the keys or values happens in the order the items are listed in that table; where they are inserted depends on the hash value for the key and if anything was hashed to the same slot before already.
You'll have to either sort the keys every time you display them or use a different type of data structure to preserve ordering. Python 2.7 or newer has a collections.OrderedDict() type, or you can use a list of two-value tuples (at which point lookups of individual key-value pairs are going to be slow).
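Both alternatives can be sketched as follows, reusing the domains data from the question. The code is written so that it runs on Python 2.7 and Python 3:

```python
from collections import OrderedDict

# Option 1: OrderedDict remembers the order in which keys were inserted.
domains = OrderedDict()
domains["de"] = "Germany"
domains["sk"] = "Slovakia"
domains["hu"] = "Hungary"
domains["us"] = "United States"
domains["no"] = "Norway"
ordered_keys = list(domains)

# Option 2: a list of (key, value) tuples keeps order, but each lookup is a linear scan.
pairs = [("de", "Germany"), ("sk", "Slovakia"), ("hu", "Hungary")]
hungary = next(value for key, value in pairs if key == "hu")

print(ordered_keys)  # ['de', 'sk', 'hu', 'us', 'no']
print(hungary)       # Hungary
```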
Python dictionaries don't have an order. However, you can specify an order by using the sorted(domains) function. By default, it sorts using the key.
for key in sorted(domains):
    print key
will produce
de
hu
no
sk
us
If you want to order based on values, you can use something like sorted(domains.items(), key = lambda(k, v): (v, k)).
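The lambda(k, v) form relies on tuple parameter unpacking, which exists only in Python 2. A version that also runs on Python 3 could look like this, again using the domains data from the question:

```python
domains = {"de": "Germany", "sk": "Slovakia", "hu": "Hungary",
           "us": "United States", "no": "Norway"}

# Sort the (key, value) pairs by value first, then by key to break ties.
by_value = sorted(domains.items(), key=lambda item: (item[1], item[0]))

for key, value in by_value:
    print(key, value)
```

This visits the pairs in the order Germany, Hungary, Norway, Slovakia, United States.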
The order is unspecified. It is, however, guaranteed to remain unchanged in the absence of modifications to the dictionary.
You can sort the keys when iterating:
for key in sorted(domains):
    print key
Finally, it may be useful to note that newer versions of Python have collections.OrderedDict, which preserves the insertion order.
If you want an ordered dictionary in Python you must use collections.OrderedDict
Dictionaries by definition have no order. That puts it into the dangerous "undefined behavior" zone - not a good idea to rely on it in anything you program, as it can change all of a sudden across implementations/instances. Even if it happens to work how you want now... it lays a landmine for you later.