Python Hash function and Hash Object

Python Hash function and Hash Object - python

What is the different between hashable and hashobject in python?

Hashable
In general means an object has a hash value that never changes in its lifetime and can be compared to other objects. Thanks to those two features, a hashable object can be used as a key in a generic hash map
in python mmutable built-in objects are hashable while mutable containers (such as lists or dictionaries) are not. User-defined objects are by default hashable
Hashtable
in general, hash table (hash map) is a data structure used to implement an associative array, a structure that can map keys to values. Each key given a hash value through hash function for lookup
in python, dictionary is an implementation of hashtable
hash() in python
hash is a hash function that gives you a hash value (for the key inputed)
In [1]: hash ('seed_of_wind')
Out[1]: 8762898084756078118
As mentioned already, this distinctive 'id' is very useful for look up
in theory, a distinctive key will generate a distinctive hash value
By hash object, do you mean by hashable object? If so, it is covered above

Related

The address of keys are stored very far from each other

I'd like to explore the hash table,
In [1]: book = {"apple":0.67, "milk":1.49, "avocado":1.49, "python":2}
In [5]: [hex(id(key)) for key in book]
Out[5]: ['0x10ffffc70', '0x10ffffab0', '0x10ffffe68', '0x10ee1cca8']
The addresses tell that the keys are far away from each other, especially key "python",
I assumed that they are adjacent to one another.
How could this happen? Is it running in high performance?

There are two ways we can interpret your confusion: either you expected the id() to be the hash function for the keys, or you expected keys to be relocated to the hash table and, since in CPython the id() value is a memory location, that the id() values would say something about the hash table size. We can address both by talking about Python's dictionary implementation and how Python deals with objects in general.
Python dictionaries are implemented as a hash table, which is a table of limited size. To store keys, a hash function generates an integer (same integer for equal values), and the key is stored in a slot based on that number using a modulo function:
slot = hash(key) % len(table)
This can lead to collisions, so having a large range of numbers for the hash function to pick from is going to help reduce the chances there are such collisions. You still have to deal with collisions anyway, but you want to minimise that.
Python does not use the id() function as a hash function here, because that would not produce the same hash for equal values! If you didn't produce the same hash for equal values, then you couldn't use multiple "hello world" strings as a means to find the right slot again, as dictionary["hello world"] = "value" then "hello world" in dictionary would produce different id() values and thus hash to different slots and you would not that the specific string value has already been used as a key.
Instead, objects are expected to implement a __hash__ method, and you can see what that method produces for various objects with the hash() function.
Because keys stored in a dictionary must remain unchanged, Python won't let you store mutable types in a dictionary. Otherwise, if you can change their value, they would no longer be equal to another such object with the old value and shame hash, and you wouldn't find them in the slot that their new hash would map to.
Note that Python puts all objects in a dynamic heap, and uses references everywhere to relate the objects. Dictionaries hold references to keys and values; putting a key into a dictionary does not re-locate the key in memory and the id() of the key won't change. If keys were relocated, then a requirement for the id() function would be violated, the documentation states: This is an integer which is guaranteed to be unique and constant for this object during its lifetime.
As for those collisions: Python deals with collisions by looking for a new slot with a fixed formula, finding an empty slot in a predictable but psuedorandom series of slot numbers; see the dictobject.c source code comments if you want to know the details. As the table fills up, Python will dynamically grow the table to fit more elements, so there will always be empty slots.

Python - hash() and dict

If we have 2 separate dict, both with the same keys and values, when we print them it will come in different orders, as expected.
So, let's say I want to to use hash() on those dict:
hash(frozenset(dict1.items()))
hash(frozenset(dict2.items()))
I'm doing this to make a new dict with the hash() value created as the new keys .
Even showing up different when printing dict, the value createad by hash() will always be equal? If no, how to make it always the same so I can make comparisons successfully?

If the keys and values hash the same, frozenset is designed to be a stable and unique representation of the underlying values. The docs explicitly state:
Two sets are equal if and only if every element of each set is contained in the other (each is a subset of the other).
And the rules for hashable types require that:
Hashable objects which compare equal must have the same hash value.
So by definition frozensets with equal, hashable elements are equal and hash to the same value. This can only be violated if a user-defined class which does not obey the rules for hashing and equality is contained in the resulting frozenset (but then you've got bigger problems).
Note that this does not mean they'll iterate in the same order or produce the same repr; thanks to chaining on hash collisions, two frozensets constructed from the same elements in a different order need not iterate in the same order. But they're still equal to one another, and hash the same (precise outputs and ordering is implementation dependent, could easily vary between different versions of Python; this just happens to work on my Py 3.5 install to create the desired "different iteration order" behavior):
>>> frozenset([1,9])
frozenset({1, 9})
>>> frozenset([9,1])
frozenset({9, 1}) # <-- Different order; consequence of 8 buckets colliding for 1 and 9
>>> hash(frozenset([1,9]))
-7625378979602737914
>>> hash(frozenset([9,1]))
-7625378979602737914 # <-- Still the same hash though
>>> frozenset([1,9]) == frozenset([9,1])
True # <-- And still equal

What is the difference between a Ruby Hash and a Python dictionary?

In Python, there are dictionaries:
residents = {'Puffin' : 104, 'Sloth' : 105, 'Burmese Python' : 106}
In Ruby, there are Hashes:
residents = {'Puffin' => 104, 'Sloth' => 105, 'Burmese Python' => 106}
The only difference is the : versus => syntax. (Note that if the example were using variables instead of strings, then there would be no syntax difference.)
In Python, you call a dictionary's value via a key:
residents['Puffin']
# => 104
In Ruby, you grab a Hash's value via a key as well:
residents['Puffin']
# => 104
They appear to be the same.
What is the difference between a Hash in Ruby and a dictionary in Python?

Both Ruby's Hash and Python's dictionary represent a Map Abstract Data Type (ADT)
.. an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears at most once in the collection.
Furthermore, both Hash and dictionary are implemented as Hash Tables which require that keys are hashable and equatable. Generally speaking, insert and delete and fetch operations on a hash table are O(1) amortized or "fast, independent of hash/dict size".
[A hash table] is a data structure used to implement an associative array, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found.
(Map implementations that use Trees, as opposed to Hash Tables, are found in persisted and functional programming contexts.)
Of course, there are also differences between Ruby and Python design choices and the specific/default Map implementations provided:
Default behavior on missing key lookup: nil in Hash, exception in dict1
Insertion-ordering guarantees: guaranteed in Hash (since Ruby 2.0), no guarantee in dict (until Python 3.6)1
Being able to specify a default value generator: Hash only1
Ability to use core mutable types (eg. lists) as keys: Hash only2
Syntax used for Hash/dict Literals, etc..
The [] syntax support is common insofar as both languages provide syntactic sugar for an overloaded index operator, but is implemented differently underneath and has different semantics in the case of missing keys.
1 Python offers defaultdict and OrderedDict implementations as well which have different behavior/functionality from the standard dict. These implementation allow default value generators, missing-key handling, and additional ordering guarantees that are not found in the standard dict type.
2 Certain core types in Python (eg. list and dict) explicitly reject being hashable and thus they cannot be used as keys in a dictionary that is based on hashing. This is not strictly a difference of dict itself and one can still use mutable custom types as keys, although such is discouraged in most cases.

They (dictionary in Python, hash in Ruby) are identical for all practical purposes, and implement a general Dictionary / Hashtable (a key - value store) where you typically store an entry given a unique key, and get fast lookup for it's value.

Now ruby also supports following sysntax:
residents = {'Puffin': 104, 'Sloth': 105, 'Burmese Python': 106}
But then we should access values by the symbol notation:
residents[:Puffin]

What makes lists unhashable?

So lists are unhashable:
>>> { [1,2]:3 }
TypeError: unhashable type: 'list'
The following page gives an explanation:
A list is a mutable type, and cannot be used as a key in a dictionary
(it could change in-place making the key no longer locatable in the
internal hash table of the dictionary).
I understand why it is undesirable to use mutable objects as dictionary keys. However, Python raises the same exception even when I am simply trying to hash a list (independently of dictionary creation)
>>> hash( [1,2] )
TypeError: unhashable type: 'list'
Does Python do this as a guarantee that mutable types will never be used as dictionary keys? Or is there another reason that makes mutable objects impossible to hash, regardless of how I plan to use them?

Dictionaries and sets use hashing algorithms to uniquely determine an item. And those algorithms make use of the items used as keys to come up the unique hash value. Since lists are mutable, the contents of a list can change. After allowing a list to be in a dictionary as a key, if the contents of the list changes, the hash value will also change. If the hash value changes after it gets stored at a particular slot in the dictionary, it will lead to an inconsistent dictionary. For example, initially the list would have gotten stored at location A, which was determined based on the hash value. If the hash value changes, and if we look for the list we might not find it at location A, or as per the new hash value, we might find some other object.
Since, it is not possible to come up with a hash value, internally there is no hashing function defined for lists.
PyObject_HashNotImplemented, /* tp_hash */
As the hashing function is not implemented, when you use it as a key in the dictionary, or forcefully try to get the hash value with hash function, it fails to hash it and so it fails with unhashable type
TypeError: unhashable type: 'list'

What dictates the order of data in a dictionary in Python?

What determines the order of items in a dictionary(specifically in Python, though this may apply to other languages)? For example:
>>> spam = {'what':4, 'shibby':'cream', 'party':'rock'}
>>> spam
{'party': 'rock', 'what': 4, 'shibby': 'cream'}
If I call on spam again, the items will still be in that same order. But how is this order decided?

According to python docs,
Dictionaries are sometimes found in other languages as “associative
memories” or “associative arrays”. Unlike sequences, which are indexed
by a range of numbers, dictionaries are indexed by keys, which can be
any immutable type; strings and numbers can always be keys.
They are arbitary, again from docs:
A dictionary’s keys are almost arbitrary values. Values that are not
hashable, that is, values containing lists, dictionaries or other
mutable types (that are compared by value rather than by object
identity) may not be used as keys. Numeric types used for keys obey
the normal rules for numeric comparison: if two numbers compare equal
(such as 1 and 1.0) then they can be used interchangeably to index the
same dictionary entry. (Note however, that since computers store
floating-point numbers as approximations it is usually unwise to use
them as dictionary keys.)

The order in an ordinary dictionary is based on an internal hash value, so you're not supposed to make any assumptions about it.
Use collections.OrderedDict for a dictionary whose order you control.

Because dictionary keys are stored in a hash table. According to http://en.wikipedia.org/wiki/Hash_table:
The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.