Python - hash() and dict


If we have two separate dicts with the same keys and values, printing them can show the items in different orders, as expected.
So, let's say I want to use hash() on those dicts:
hash(frozenset(dict1.items()))
hash(frozenset(dict2.items()))
I'm doing this to build a new dict that uses the hash() values as its keys.
Even though the dicts print in different orders, will the value created by hash() always be equal? If not, how can I make it always the same so that comparisons succeed?

If the keys and values hash the same, frozenset is designed to be a stable and unique representation of the underlying values. The docs explicitly state:
Two sets are equal if and only if every element of each set is contained in the other (each is a subset of the other).
And the rules for hashable types require that:
Hashable objects which compare equal must have the same hash value.
So by definition frozensets with equal, hashable elements are equal and hash to the same value. This can only be violated if a user-defined class which does not obey the rules for hashing and equality is contained in the resulting frozenset (but then you've got bigger problems).
Note that this does not mean they'll iterate in the same order or produce the same repr; because of how hash collisions are resolved, two frozensets constructed from the same elements in a different order need not iterate in the same order. But they're still equal to one another and hash the same (precise outputs and ordering are implementation dependent and could easily vary between different versions of Python; this just happens to work on my Py 3.5 install to create the desired "different iteration order" behavior):
>>> frozenset([1,9])
frozenset({1, 9})
>>> frozenset([9,1])
frozenset({9, 1}) # <-- Different order; consequence of 8 buckets colliding for 1 and 9
>>> hash(frozenset([1,9]))
-7625378979602737914
>>> hash(frozenset([9,1]))
-7625378979602737914 # <-- Still the same hash though
>>> frozenset([1,9]) == frozenset([9,1])
True # <-- And still equal
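Applied to the dicts from the question, this could look like the following minimal sketch. dict1, dict2 and by_hash are made-up sample names, and all values are assumed to be hashable:

```python
# Two dicts with equal contents but different insertion order (sample data).
dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = {"c": 3, "b": 2, "a": 1}

key1 = hash(frozenset(dict1.items()))
key2 = hash(frozenset(dict2.items()))

# Equal frozensets must hash equal within one interpreter run.
print(key1 == key2)  # True

# Using the hash as the key of a new dict, as described in the question.
by_hash = {key1: dict1}
print(key2 in by_hash)  # True
```

Two things to keep in mind: str hashes are randomised per process, so the numbers should not be stored and compared across runs, and unequal dicts can in principle share a hash. Using frozenset(dict1.items()) itself as the key avoids the second problem.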

Related

Why enumerate should accept a set as an input?

A Python set is meant to be unordered, so why does enumerate accept one as input?
The same question applies to dictionaries.
From my point of view, this gives the false impression that there is a predictable way of enumerating them, but there is not.
This is quite misleading. I would have expected at least a warning from enumerate when I request enumerate(set) or enumerate(dict).
Can anyone explain why this warning is not there? Is it "pythonic" to allow enumeration that may not be predictable?

There is a distinction between a container and its iterator. Technically, enumerate doesn't work with set, dict, or list directly, because none of those types is an iterator. They are iterable, though, meaning enumerate can get an iterator from each by implicitly using the iter function (i.e., enumerate(some_list_dict_or_set) behaves like enumerate(iter(some_list_dict_or_set))):
>>> iter([1,2,3])
<list_iterator object at 0x109d924e0>
>>> iter(dict(a=1, b=2))
<dict_keyiterator object at 0x109d4b818>
>>> iter({1,2,3})
<set_iterator object at 0x109d53ab0>
So while a given container may not have any inherent ordering of its elements, its iterator can impose an order, and enumerate simply pairs that ordering with a sequence of int values.
You can really see the difference between inherent ordering and imposed ordering when comparing dict and OrderedDict in Python 3.7 or later. Both remember the order in which their keys were added, but that order isn't an important part of a dict's identity. That is, two dicts with the same keys and the same values mapped to those keys are equal, no matter what order the keys were added in.
>>> dict(a=1, b=2) == dict(b=2, a=1)
True
The same is not true of two OrderedDicts, which are only equal if they have the same keys, the same values for those keys, and the keys were added in the same order.
>>> from collections import OrderedDict
>>> OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
False

enumerate accepts any iterable which includes set and dict. set might be unordered but its order of iteration is not arbitrary; if you iterate the same set multiple times, it will yield elements in the same order.
Also note that as of Python 3.7 dict preserves insertion order. Whether or not this is useful solely depends on your use case.
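A small sketch of both points (colors and prices are made-up sample data). The order a set yields is not something to predict, but it is stable for an unmodified set within one process:

```python
colors = {"red", "green", "blue"}  # sample set

# Iterating the same, unmodified set twice gives the same order both times.
first_pass = list(enumerate(colors))
second_pass = list(enumerate(colors))
print(first_pass == second_pass)  # True

# A dict (Python 3.7+) iterates in insertion order.
prices = {"apple": 0.67, "milk": 1.49}
print(list(enumerate(prices)))  # [(0, 'apple'), (1, 'milk')]

# For a reproducible numbering of a set, sort it first.
print(list(enumerate(sorted(colors))))  # [(0, 'blue'), (1, 'green'), (2, 'red')]
```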

The address of keys are stored very far from each other

I'd like to explore the hash table,
In [1]: book = {"apple":0.67, "milk":1.49, "avocado":1.49, "python":2}
In [5]: [hex(id(key)) for key in book]
Out[5]: ['0x10ffffc70', '0x10ffffab0', '0x10ffffe68', '0x10ee1cca8']
The addresses show that the keys are far away from each other, especially the key "python".
I assumed that they would be adjacent to one another.
How could this happen? Does it still perform well?

There are two ways we can interpret your confusion: either you expected the id() to be the hash function for the keys, or you expected keys to be relocated to the hash table and, since in CPython the id() value is a memory location, that the id() values would say something about the hash table size. We can address both by talking about Python's dictionary implementation and how Python deals with objects in general.
Python dictionaries are implemented as a hash table, which is a table of limited size. To store keys, a hash function generates an integer (same integer for equal values), and the key is stored in a slot based on that number using a modulo function:
slot = hash(key) % len(table)
This can lead to collisions, so having a large range of numbers for the hash function to pick from is going to help reduce the chances there are such collisions. You still have to deal with collisions anyway, but you want to minimise that.
Python does not use the id() function as a hash function here, because that would not produce the same hash for equal values! If equal values didn't produce the same hash, you couldn't use multiple "hello world" strings to find the right slot again: after dictionary["hello world"] = "value", the test "hello world" in dictionary could involve two string objects with different id() values, which would hash to different slots, and you would not know that the specific string value has already been used as a key.
Instead, objects are expected to implement a __hash__ method, and you can see what that method produces for various objects with the hash() function.
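To illustrate, here is a minimal sketch. The strings are built at runtime so that they are likely to be two separate objects; whether they really are is a CPython detail and not guaranteed:

```python
# Build two equal strings at runtime so they are likely to be distinct objects.
a = "".join(["hello", " ", "world"])
b = "".join(["hello", " ", "world"])

print(a == b)              # True
print(hash(a) == hash(b))  # True: equal values, equal hashes
print(a is b)              # usually False in CPython: two separate objects

d = {a: "value"}
print(b in d)              # True: lookup goes through hash() and ==, not id()
```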
Because keys stored in a dictionary must remain unchanged, Python won't let you use mutable built-in types (such as list, dict or set) as dictionary keys. Otherwise, if you could change their value, they would no longer be equal to another such object with the old value and the same hash, and you wouldn't find them in the slot that their new hash would map to.
Note that Python puts all objects in a dynamic heap, and uses references everywhere to relate the objects. Dictionaries hold references to keys and values; putting a key into a dictionary does not relocate the key in memory and the id() of the key won't change. If keys were relocated, then a requirement for the id() function would be violated; the documentation states: This is an integer which is guaranteed to be unique and constant for this object during its lifetime.
As for those collisions: Python deals with collisions by looking for a new slot with a fixed formula, finding an empty slot in a predictable but pseudorandom series of slot numbers; see the dictobject.c source code comments if you want to know the details. As the table fills up, Python will dynamically grow the table to fit more elements, so there will always be empty slots.
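The probing idea can be sketched in pure Python. This is a simplified illustration based on the recurrence described in the dictobject.c comments, not the actual implementation: probe_sequence is a hypothetical helper, the table size is assumed to be a power of two, and real tables also resize and store entries:

```python
PERTURB_SHIFT = 5  # constant named in CPython's dictobject.c


def probe_sequence(hash_value, table_size, count):
    """Return the first `count` slots probed for a hash (simplified sketch).

    table_size must be a power of two.
    """
    mask = table_size - 1
    perturb = hash_value & (2**64 - 1)  # treat the hash as unsigned
    slot = perturb & mask
    slots = []
    for _ in range(count):
        slots.append(slot)
        # Mix in higher bits of the hash, then step to the next candidate slot.
        perturb >>= PERTURB_SHIFT
        slot = (5 * slot + perturb + 1) & mask
    return slots


# 1 and 9 start in the same slot of an 8-slot table and follow the same path.
print(probe_sequence(1, 8, 4))  # [1, 6, 7, 4]
print(probe_sequence(9, 8, 4))  # [1, 6, 7, 4]
```

Once the perturb value has been shifted down to zero, the recurrence cycles through every slot of the table, which is why an empty slot is always found as long as one exists.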

Are sets internally sorted, or is the str method displaying a sorted list?

I have a set, I add items (ints) to it, and when I print it, the items apparently are sorted:
a = set()
a.add(3)
a.add(2)
a.add(4)
a.add(1)
a.add(5)
print a
# set([1, 2, 3, 4, 5])
I have tried various values; apparently it only happens with ints.
I run Python 2.7.5 under MacOSX. It is also reproduced using repl.it (see http://repl.it/TpV)
The question is: is this documented somewhere (I haven't found it so far)? Is it normal, and is it something that can be relied on?
Extra question: when is the sort done? During the print? Is the data stored sorted internally? (Is that even possible, given the expected constant complexity of insertion?)

This is a coincidence. The data is neither sorted nor does __str__ sort.
The hash values for integers equal their value (except for -1 and long integers outside the sys.maxint range), which increases the chance that integers are slotted in order, but that's not a given.
set uses a hash table to track items contained, and ordering depends on the hash value, and insertion and deletion history.
The how and why of the interaction between integers and sets are all implementation details, and can easily vary from version to version. Python 3.3 introduced hash randomisation for certain types, and Python 3.4 expanded on this, making ordering of sets and dictionaries volatile between Python process restarts too (depending on the types of values stored).
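A quick way to check the integer hashing behaviour described above, as a sketch that assumes CPython (other implementations may hash differently):

```python
# CPython detail: small ints hash to themselves, with -1 as a special case.
print([hash(n) for n in (1, 2, 3, 4, 5)])  # [1, 2, 3, 4, 5]
print(hash(-1))                             # -2 in CPython

a = set()
for n in (3, 2, 4, 1, 5):
    a.add(n)

# Do not rely on the set's own order; sort explicitly when order matters.
print(sorted(a))  # [1, 2, 3, 4, 5]
```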

What dictates the order of data in a dictionary in Python?

What determines the order of items in a dictionary (specifically in Python, though this may apply to other languages)? For example:
>>> spam = {'what':4, 'shibby':'cream', 'party':'rock'}
>>> spam
{'party': 'rock', 'what': 4, 'shibby': 'cream'}
If I call on spam again, the items will still be in that same order. But how is this order decided?

According to the Python docs:
Dictionaries are sometimes found in other languages as “associative
memories” or “associative arrays”. Unlike sequences, which are indexed
by a range of numbers, dictionaries are indexed by keys, which can be
any immutable type; strings and numbers can always be keys.
The keys are arbitrary, again from the docs:
A dictionary’s keys are almost arbitrary values. Values that are not
hashable, that is, values containing lists, dictionaries or other
mutable types (that are compared by value rather than by object
identity) may not be used as keys. Numeric types used for keys obey
the normal rules for numeric comparison: if two numbers compare equal
(such as 1 and 1.0) then they can be used interchangeably to index the
same dictionary entry. (Note however, that since computers store
floating-point numbers as approximations it is usually unwise to use
them as dictionary keys.)

The order in an ordinary dictionary is based on an internal hash value, so you're not supposed to make any assumptions about it.
Use collections.OrderedDict for a dictionary whose order you control.

Because dictionary keys are stored in a hash table. According to http://en.wikipedia.org/wiki/Hash_table:
The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order.

How do sets work in Python?

>>> {x for x in 'spam'}
{'a', 'p', 's', 'm'}
Why does it change the order? If you take a look at a loop, it works perfectly:
>>> for x in 'spam':
... print(x)
...
s
p
a
m
>>>

Sets in python (and in set theory) are not ordered. So when you loop over them, there is no defined ordering.
You looped over the string literal 'spam' to make a set containing each character in that string. Once you did that, the ordering was gone.
When you perform the for loop over 'spam', you are performing the loop against a string which does have ordering.

From Set types:
These represent unordered, finite sets of unique, immutable objects. As such, they cannot be indexed by any subscript [because no ordering is defined among the elements]. However, they can be iterated over, and the built-in function len() returns the number of items in a set. Common uses for sets are fast membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.
But if you really need to preserve the order, have a look at an ordered set.
Also, you may prefer to write just set('spam') instead of a comprehension.
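If the goal is only to drop duplicates while keeping first-seen order, one option on Python 3.7+ is to use dict keys instead of a set. This is a sketch rather than a full ordered-set type, and unique_in_order is a made-up helper name:

```python
# Python 3.7+: dict keys are unique and keep insertion order.
def unique_in_order(iterable):
    return list(dict.fromkeys(iterable))


print(unique_in_order("spam"))    # ['s', 'p', 'a', 'm']
print(unique_in_order("banana"))  # ['b', 'a', 'n']
```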

set is not an ordered collection, and as such, the internal order of keys is undefined.

From docs.python.org
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built in dict, list, and tuple classes, and the collections module.)

sets are unordered by definition. The reason for this is that their implementation runs faster that way, by using appropriate data structures that do not preserve order. If you need order, you can use the (slower) OrderedDict type.

Python sets are defined as unordered, so Python is free to order them any way it likes (efficiently, I presume).
