What is the difference between a Ruby Hash and a Python dictionary?

In Python, there are dictionaries:
residents = {'Puffin' : 104, 'Sloth' : 105, 'Burmese Python' : 106}
In Ruby, there are Hashes:
residents = {'Puffin' => 104, 'Sloth' => 105, 'Burmese Python' => 106}
The only difference is the : versus => syntax. (Note that if the example were using variables instead of strings, then there would be no syntax difference.)
In Python, you call a dictionary's value via a key:
residents['Puffin']
# => 104
In Ruby, you grab a Hash's value via a key as well:
residents['Puffin']
# => 104
They appear to be the same.
What is the difference between a Hash in Ruby and a dictionary in Python?

Both Ruby's Hash and Python's dictionary represent a Map Abstract Data Type (ADT)
.. an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears at most once in the collection.
Furthermore, both Hash and dictionary are implemented as Hash Tables which require that keys are hashable and equatable. Generally speaking, insert and delete and fetch operations on a hash table are O(1) amortized or "fast, independent of hash/dict size".
[A hash table] is a data structure used to implement an associative array, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found.
(Map implementations that use Trees, as opposed to Hash Tables, are found in persisted and functional programming contexts.)
Of course, there are also differences between Ruby and Python design choices and the specific/default Map implementations provided:
Default behavior on missing key lookup: nil in Hash, exception (KeyError) in dict [1]
Insertion-ordering guarantees: guaranteed in Hash (since Ruby 1.9); in dict, guaranteed only since Python 3.7 (CPython 3.6 preserved order as an implementation detail) [1]
Being able to specify a default value generator: Hash only [1]
Ability to use core mutable types (e.g. lists) as keys: Hash only [2]
Syntax used for Hash/dict literals, etc.
The [] syntax support is common insofar as both languages provide syntactic sugar for an overloaded index operator, but is implemented differently underneath and has different semantics in the case of missing keys.
[1] Python offers defaultdict and OrderedDict implementations as well, which have different behavior/functionality from the standard dict. These implementations allow default value generators, missing-key handling, and additional ordering guarantees that are not found in the standard dict type.
[2] Certain core types in Python (e.g. list and dict) explicitly reject being hashable and thus cannot be used as keys in a dictionary that is based on hashing. This is not strictly a difference of dict itself, and one can still use mutable custom types as keys, although this is discouraged in most cases.
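The Python side of the differences listed above can be sketched as follows. This is a minimal illustration using the residents example from the question; the 'Lemur' key and the visits counter are made up for the demonstration.

```python
from collections import defaultdict

residents = {'Puffin': 104, 'Sloth': 105}

# A plain dict raises KeyError on a missing key (a Ruby Hash would return nil).
try:
    residents['Lemur']
    missing_raised = False
except KeyError:
    missing_raised = True

# dict.get returns None (or a supplied default) instead of raising.
lookup_or_none = residents.get('Lemur')
lookup_or_default = residents.get('Lemur', 0)

# defaultdict accepts a default value factory, here int() which returns 0.
visits = defaultdict(int)
visits['Puffin'] += 1

# Lists refuse to be hashed, so they cannot be dict keys; tuples can.
try:
    {[1, 2]: 'list key'}
    list_key_rejected = False
except TypeError:
    list_key_rejected = True
by_pair = {(1, 2): 'tuple key'}
```

Using residents.get(...) is the closest match to Ruby's nil-on-missing behavior, while defaultdict is the closest match to a Hash created with a default value block.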

They (dictionary in Python, hash in Ruby) are identical for all practical purposes, and implement a general Dictionary / Hashtable (a key - value store) where you typically store an entry given a unique key, and get fast lookup for its value.

Ruby now also supports the following syntax:
residents = {'Puffin': 104, 'Sloth': 105, 'Burmese Python': 106}
But then the keys are symbols, so we should access values by the symbol notation:
residents[:Puffin]

Related

The addresses of the keys are very far from each other

I'd like to explore the hash table,
In [1]: book = {"apple":0.67, "milk":1.49, "avocado":1.49, "python":2}
In [5]: [hex(id(key)) for key in book]
Out[5]: ['0x10ffffc70', '0x10ffffab0', '0x10ffffe68', '0x10ee1cca8']
The addresses show that the keys are far away from each other, especially the key "python".
I assumed that they would be adjacent to one another.
How can this happen? Does the dictionary still perform well?
There are two ways we can interpret your confusion: either you expected the id() to be the hash function for the keys, or you expected keys to be relocated to the hash table and, since in CPython the id() value is a memory location, that the id() values would say something about the hash table size. We can address both by talking about Python's dictionary implementation and how Python deals with objects in general.
Python dictionaries are implemented as a hash table, which is a table of limited size. To store keys, a hash function generates an integer (same integer for equal values), and the key is stored in a slot based on that number using a modulo function:
slot = hash(key) % len(table)
This can lead to collisions, so having a large range of numbers for the hash function to pick from is going to help reduce the chances there are such collisions. You still have to deal with collisions anyway, but you want to minimise that.
Python does not use the id() function as a hash function here, because that would not produce the same hash for equal values! If equal values did not produce the same hash, you couldn't use multiple "hello world" strings to find the right slot again: after dictionary["hello world"] = "value", the string in "hello world" in dictionary could have a different id() value and thus hash to a different slot, and you would not know that the specific string value has already been used as a key.
Instead, objects are expected to implement a __hash__ method, and you can see what that method produces for various objects with the hash() function.
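A small sketch of that point: two equal strings hash to the same value and find the same dictionary entry, even when they are separate objects. The second string is built at runtime on the assumption that CPython then creates a distinct object; the example does not rely on that being true.

```python
first = "hello world"
# Built at runtime, so in CPython this is typically a separate object from `first`.
second = " ".join(["hello", "world"])

values_equal = first == second
hashes_equal = hash(first) == hash(second)

dictionary = {first: "value"}
# Lookup goes through hash() and ==, not id(), so the other object finds the entry.
found_with_other_object = second in dictionary
```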
Because a key stored in a dictionary must keep the same hash, Python won't let you use mutable built-in types (such as lists) as dictionary keys. Otherwise, if you could change their value, they would no longer be equal to another such object with the old value and the same hash, and you wouldn't find them in the slot that their new hash would map to.
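To see what goes wrong, here is a sketch with a hypothetical MutableKey class (not part of any library) whose hash depends on state that can change. Python allows this for custom classes, which is exactly why it is discouraged.

```python
class MutableKey:
    """Hypothetical key type whose hash depends on mutable state (a bad idea)."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


key = MutableKey(1)
table = {key: 'stored'}
found_before = key in table

# Mutating the key after insertion changes its hash.
key.value = 2

# The lookup now starts from the slot for the new hash and misses the entry.
found_after = key in table
# A fresh key with the old value reaches the old slot, but == against the
# stored (now changed) key fails, so it is not found either.
found_with_old_value = MutableKey(1) in table
# The entry is still in the table, just unreachable by lookup.
entries_left = len(table)
```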
Note that Python puts all objects in a dynamic heap, and uses references everywhere to relate the objects. Dictionaries hold references to keys and values; putting a key into a dictionary does not re-locate the key in memory and the id() of the key won't change. If keys were relocated, then a requirement for the id() function would be violated, the documentation states: This is an integer which is guaranteed to be unique and constant for this object during its lifetime.
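This can be checked directly: the key object inside the dictionary is the very same object that was inserted, with the same id(). The tuple key below is made up for the demonstration.

```python
key = ("apple", 1)
id_before = id(key)

book = {key: 0.67}

# The dictionary stores a reference to the key, not a relocated copy.
stored_key = next(iter(book))
same_object = stored_key is key
id_unchanged = id(key) == id_before
```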
As for those collisions: Python deals with collisions by looking for a new slot with a fixed formula, finding an empty slot in a predictable but pseudorandom series of slot numbers; see the dictobject.c source code comments if you want to know the details. As the table fills up, Python will dynamically grow the table to fit more elements, so there will always be empty slots.
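The idea of probing for another slot and growing the table can be sketched with a toy class. This is not CPython's algorithm: ToyHashTable is a hypothetical name, linear probing is used for brevity where CPython uses a perturbation-based sequence, and the two-thirds growth threshold is an assumption of the sketch.

```python
class ToyHashTable:
    """Toy open-addressing table; linear probing is a simplification."""

    def __init__(self, size=8):
        self.slots = [None] * size  # each slot is None or a (key, value) pair
        self.used = 0

    def _probe(self, key):
        # Start at hash(key) % len(table); on a collision, try the next slot.
        index = hash(key) % len(self.slots)
        while self.slots[index] is not None and self.slots[index][0] != key:
            index = (index + 1) % len(self.slots)
        return index

    def __setitem__(self, key, value):
        index = self._probe(key)
        if self.slots[index] is None:
            self.used += 1
        self.slots[index] = (key, value)
        # Grow once two thirds of the slots are used, so an empty slot always exists.
        if self.used * 3 >= len(self.slots) * 2:
            self._resize(len(self.slots) * 2)

    def __getitem__(self, key):
        entry = self.slots[self._probe(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def _resize(self, new_size):
        old_entries = [entry for entry in self.slots if entry is not None]
        self.slots = [None] * new_size
        self.used = 0
        for key, value in old_entries:
            self[key] = value


table = ToyHashTable()
for number in range(20):
    # Multiples of 8 all map to slot 0 in an 8-slot table, forcing collisions.
    table[number * 8] = number
```

Every key is still retrievable with table[key], because a lookup repeats the same probe sequence that the insert used.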

Python Hash function and Hash Object

What is the difference between hashable and hash object in Python?
Hashable
In general, it means an object has a hash value that never changes during its lifetime and can be compared to other objects. Thanks to those two features, a hashable object can be used as a key in a generic hash map.
In Python, immutable built-in objects are hashable, while mutable containers (such as lists or dictionaries) are not. User-defined objects are hashable by default.
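One way to check this is simply to call hash() and see whether it raises TypeError. The helper is_hashable and the class Plain below are hypothetical names introduced for this sketch.

```python
class Plain:
    """Hypothetical user-defined class with no __eq__ or __hash__ of its own."""


def is_hashable(obj):
    # hash() raises TypeError for objects that do not support hashing.
    try:
        hash(obj)
    except TypeError:
        return False
    return True


hashable_examples = [42, 'text', (1, 2), frozenset({1}), Plain()]
unhashable_examples = [[1, 2], {'a': 1}, {1, 2}]

all_hashable = all(is_hashable(obj) for obj in hashable_examples)
none_hashable = not any(is_hashable(obj) for obj in unhashable_examples)

# A tuple is only hashable if everything inside it is hashable too.
tuple_with_list = is_hashable((1, [2, 3]))
```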
Hashtable
In general, a hash table (hash map) is a data structure used to implement an associative array, a structure that can map keys to values. Each key is given a hash value by a hash function for lookup.
In Python, the dictionary is an implementation of a hash table.
hash() in python
hash() is a built-in function that gives you the hash value of the key passed in
In [1]: hash ('seed_of_wind')
Out[1]: 8762898084756078118
As mentioned already, this hash value works like an 'id' that is very useful for lookup.
Equal keys always produce the same hash value; different keys usually produce different values, although collisions are possible.
By hash object, do you mean hashable object? If so, it is covered above.

What dictates the order of data in a dictionary in Python?

What determines the order of items in a dictionary(specifically in Python, though this may apply to other languages)? For example:
>>> spam = {'what':4, 'shibby':'cream', 'party':'rock'}
>>> spam
{'party': 'rock', 'what': 4, 'shibby': 'cream'}
If I call on spam again, the items will still be in that same order. But how is this order decided?
According to python docs,
Dictionaries are sometimes found in other languages as “associative
memories” or “associative arrays”. Unlike sequences, which are indexed
by a range of numbers, dictionaries are indexed by keys, which can be
any immutable type; strings and numbers can always be keys.
They are arbitrary; again from the docs:
A dictionary’s keys are almost arbitrary values. Values that are not
hashable, that is, values containing lists, dictionaries or other
mutable types (that are compared by value rather than by object
identity) may not be used as keys. Numeric types used for keys obey
the normal rules for numeric comparison: if two numbers compare equal
(such as 1 and 1.0) then they can be used interchangeably to index the
same dictionary entry. (Note however, that since computers store
floating-point numbers as approximations it is usually unwise to use
them as dictionary keys.)
In Python versions before 3.7, the order in an ordinary dictionary is based on an internal hash value, so you're not supposed to make any assumptions about it. (Since Python 3.7, dictionaries preserve insertion order.)
Use collections.OrderedDict for a dictionary whose order you control.
Because dictionary keys are stored in a hash table. According to http://en.wikipedia.org/wiki/Hash_table:
The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order.
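For readers on current versions: on Python 3.7 and later a plain dict iterates in insertion order, so the example from the question behaves as sketched below. OrderedDict still adds features of its own, such as reordering and order-sensitive equality.

```python
from collections import OrderedDict

spam = {'what': 4, 'shibby': 'cream', 'party': 'rock'}

# On Python 3.7+ iteration follows insertion order.
keys_in_order = list(spam)

# OrderedDict lets you rearrange entries explicitly.
ordered = OrderedDict(spam)
ordered.move_to_end('what')
reordered_keys = list(ordered)

# Plain dicts compare equal regardless of order; OrderedDicts do not.
plain_equal = {'a': 1, 'b': 2} == {'b': 2, 'a': 1}
ordered_equal = (OrderedDict([('a', 1), ('b', 2)])
                 == OrderedDict([('b', 2), ('a', 1)]))
```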

Python data structure design

The data structure should meet the following purpose:
each object is unique with certain key-value pairs
the keys and values are not predetermined and can contain any string value
querying for objects should be fast
Example:
object_123({'stupid':True, 'foo':'bar', ...})
structure.get({'stupid':True, 'foo':'bar', ...}) should return object_123
Optimally this structure is implemented with the standard python data structures available through the standard library.
How would you implement this?
The simplest solution I can think of is to use sorted tuple keys:
def key(d): return tuple(sorted(d.items()))
x = {}
x[key({'stupid':True, 'foo':'bar', ...})] = object_123
x.get(key({'stupid':True, 'foo':'bar', ...}))  # => object_123
Another option would be to come up with your own hashing scheme for your keys (either by wrapping them in a class or just using numeric keys in the dictionary), but depending on your access pattern this may be slower.
I think SQLite or is what you need. It may not be implemented with standard python structures but it's available through the standard library.
Say object_123 is a dict, which it pretty much looks like. Your structure seems to be a standard dict with keys like (('foo', 'bar'), ('stupid', True)); in other words, tuple(sorted(object_123.items())) so that they're always listed in a defined order.
The reason for the defined ordering is that dict.items() isn't guaranteed to return the items in the same order for two equal dictionaries. If your dictionary key is (('foo', 'bar'), ('stupid', True)), you don't want a false negative just because you're searching for (('stupid', True),('foo', 'bar')). Sorting the items is probably the quickest way to protect against that.
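A runnable version of the sorted-tuple idea, with the elided pairs left out and object_123 replaced by a placeholder object, might look like this. The frozenset variant at the end is an alternative not mentioned in the answers; it avoids sorting but requires hashable values in the same way.

```python
def key(d):
    # Sort the items so that equal dicts always produce the same tuple.
    return tuple(sorted(d.items()))


object_123 = object()  # placeholder for whatever object is being stored

structure = {}
structure[key({'stupid': True, 'foo': 'bar'})] = object_123

# The same pairs in a different order still find the object.
found = structure.get(key({'foo': 'bar', 'stupid': True}))
not_found = structure.get(key({'foo': 'baz'}))

# Alternative: a frozenset of the items is order-independent by construction.
by_frozenset = {frozenset({'stupid': True, 'foo': 'bar'}.items()): object_123}
found_by_frozenset = by_frozenset.get(
    frozenset({'foo': 'bar', 'stupid': True}.items()))
```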

What is a tuple useful for?

I am learning Python for a class now, and we just covered tuples as one of the data types. I read the Wikipedia page on it, but, I could not figure out where such a data type would be useful in practice. Can I have some examples, perhaps in Python, where an immutable set of numbers would be needed? How is this different from a list?
Tuples are used whenever you want to return multiple results from a function.
Since they're immutable, they can be used as keys for a dictionary (lists can't).
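A short sketch of the multiple-results case; min_max is a hypothetical function written for this example.

```python
def min_max(values):
    # The comma builds a tuple, so the function returns one tuple of two results.
    return min(values), max(values)


result = min_max([3, 1, 4, 1, 5])

# The caller can unpack the tuple straight into separate names.
low, high = min_max([3, 1, 4, 1, 5])
```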
Tuples make good dictionary keys when you need to combine more than one piece of data into your key and don't feel like making a class for it.
a = {}
a[(1,2,"bob")] = "hello!"
a[("Hello","en-US")] = "Hi There!"
I've used this feature primarily to create a dictionary with keys that are coordinates of the vertices of a mesh. However, in my particular case, the exact comparison of the floats involved worked fine which might not always be true for your purposes [in which case I'd probably convert your incoming floats to some kind of fixed-point integer]
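The fixed-point conversion mentioned above could be sketched like this. The helper vertex_key and the scale of 1000 (three decimal places) are assumptions made for the example; a real mesh would pick a scale that matches its tolerance.

```python
def vertex_key(x, y, z, scale=1000):
    # Hypothetical helper: snap each coordinate to a grid of 1/scale units
    # so that nearly equal floats map to the same integer key.
    return (round(x * scale), round(y * scale), round(z * scale))


# As raw floats these two coordinates are different dictionary keys.
raw_keys_differ = (0.1 + 0.2, 1.0, 2.0) != (0.3, 1.0, 2.0)

vertices = {}
vertices[vertex_key(0.1 + 0.2, 1.0, 2.0)] = 'v0'

# After conversion to fixed-point integers they land on the same key.
same_vertex = vertices.get(vertex_key(0.3, 1.0, 2.0))
```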
The best way to think about it is:
A tuple is a record whose fields don't have names.
You use a tuple instead of a record when you can't be bothered to specify the field names.
So instead of writing things like:
person = {"name": "Sam", "age": 42}
name, age = person["name"], person["age"]
Or the even more verbose:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
person = Person("Sam", 42)
name, age = person.name, person.age
You can just write:
person = ("Sam", 42)
name, age = person
This is useful when you want to pass around a record that has only a couple of fields, or a record that is only used in a few places. In that case specifying a whole new record type with field names (in Python, you'd use an object or a dictionary, as above) could be too verbose.
Tuples originate from the world of functional programming (Haskell, OCaml, Elm, F#, etc.), where they are commonly used for this purpose. Unlike Python, most functional programming languages are statically typed (a variable can only hold one type of value, and that type is determined at compile time). Static typing makes the role of tuples more obvious. For example, in the Elm language:
type alias Person = (String, Int)
person : Person
person = ("Sam", 42)
This highlights the fact that a particular type of tuple is always supposed to have a fixed number of fields in a fixed order, and each of those fields is always supposed to be of the same type. In this example, a person is always a tuple of two fields, one is a string and the other is an integer.
The above is in stark contrast to lists, which are supposed to be variable length (the number of items is normally different in each list, and you write functions to add and remove items) and each item in the list is normally of the same type. For example, you'd have one list of people and another list of addresses - you would not mix people and addresses in the same list. Whereas mixing different types of data inside the same tuple is the whole point of tuples. Fields in a tuple are usually of different types (but not always - e.g. you could have a (Float, Float, Float) tuple to represent x,y,z coordinates).
Tuples and lists are often nested. It's common to have a list of tuples. You could have a list of Person tuples just as well as a list of Person objects. You can also have a tuple field whose value is a list. For example, if you have an address book where one person can have multiple addresses, you could have a tuple of type (Person, [String]). The [String] type is commonly used in functional programming languages to denote a list of strings. In Python, you wouldn't write down the type, but you could use tuples like that in exactly the same manner, putting a Person object in the first field of a tuple and a list of strings in its second field.
In Python, confusion arises because the language does not enforce any of these practices that are enforced by the compiler in statically typed functional languages. In those languages, you cannot mix different kinds of tuples. For example, you cannot return a (String, String) tuple from a function whose type says that it returns a (String, Integer) tuple. You also cannot return a list when the type says you plan to return a tuple, and vice versa. Lists are used strictly for growing collections of items, and tuples strictly for fixed-size records. Python doesn't stop you from breaking any of these rules if you want to.
In Python, a list is sometimes converted into a tuple for use as a dictionary key, because Python dictionary keys need to be immutable (i.e. constant) values, whereas Python lists are mutable (you can add and remove items at any time). This is a workaround for a particular limitation in Python, not a property of tuples as a computer science concept.
So in Python, lists are mutable and tuples are immutable. But this is just a design choice, not an intrinsic property of lists and tuples in computer science. You could just as well have immutable lists and mutable tuples.
In Python (using the default CPython implementation), tuples are also faster than objects or dictionaries for most purposes, so they are occasionally used for that reason, even when naming the fields using an object or dictionary would be clearer.
Finally, to make it even more obvious that tuples are intended to be another kind of record (not another kind of list), Python also has named tuples:
from collections import namedtuple
Person = namedtuple("Person", "name age")
person = Person("Sam", 42)
name, age = person.name, person.age
This is often the best choice - shorter than defining a new class, but the meaning of the fields is more obvious than when using normal tuples whose fields don't have names.
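A named tuple is still an ordinary tuple underneath, which the following sketch tries to show: it unpacks, compares, and hashes like a plain tuple, and it stays immutable.

```python
from collections import namedtuple

Person = namedtuple("Person", "name age")
person = Person("Sam", 42)

# It behaves like a plain tuple: unpacking, equality, and hashing all work.
name, age = person
is_tuple = isinstance(person, tuple)
equals_plain_tuple = person == ("Sam", 42)
lookup = {person: 'usable as a dict key'}

# Fields cannot be reassigned; _replace builds a new record instead.
try:
    person.age = 43
    assignment_allowed = True
except AttributeError:
    assignment_allowed = False
older = person._replace(age=43)
```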
Immutable lists are highly useful for many purposes, but the topic is far too complex to answer here. The main point is that things that cannot change are easier to reason about than things that can change. Most software bugs come from things changing in unexpected ways, so restricting the ways in which they can change is a good way to eliminate bugs. If you are interested, I recommend reading a tutorial for a functional programming language such as Elm, Haskell or Clojure (Elm is the friendliest). The designers of those languages considered immutability so useful that all lists are immutable there. (Instead of changing a list to add or remove an item, you make a new list with the item added or removed. Immutability guarantees that the old copy of the list can never change, so the compiler and runtime can make the code perform well by re-using parts of the old list in the new one and garbage-collecting the left-over parts when they are no longer needed.)
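In Python, a tuple gives a taste of this style: "adding" an item means building a new tuple, and the original can never change underneath other code that holds it. A minimal sketch:

```python
original = (1, 2, 3)

# "Adding" an item creates a new tuple; the original is left untouched.
extended = original + (4,)

# In-place modification is rejected.
try:
    original[0] = 99
    modified_in_place = True
except TypeError:
    modified_in_place = False
```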
I like this explanation.
Basically, you should use tuples when there's a constant structure (the 1st position always holds one type of value and the second another, and so forth), and lists should be used for lists of homogeneous values.
Of course there are always exceptions, but this is a good general guideline.
Tuples and lists have the same uses in general. Immutable data types in general have many benefits, mostly about concurrency issues.
So, when you have lists that are not volatile in nature and you need to guarantee that no consumer is altering it, you may use a tuple.
Typical examples are fixed data in an application like company divisions, categories, etc. If this data changes, typically a single producer rebuilds the tuple.
I find them useful when you always deal with two or more objects as a set.
A tuple is a sequence of values. The values can be any type, and they are indexed by integer, so in that respect tuples are a lot like lists. The most important difference is that tuples are immutable.
A tuple is a comma-separated list of values:
t = 'p', 'q', 'r', 's', 't'
It is good practice to enclose tuples in parentheses:
t = ('p', 'q', 'r', 's', 't')
A list can always replace a tuple, with respect to functionality (except, apparently, as keys in a dict). However, a tuple can make things go faster. The same is true for, for example, immutable strings in Java -- when will you ever need to be unable to alter your strings? Never!
I just read a decent discussion on limiting what you can do in order to make better programs; Why Why Functional Programming Matters Matters
A tuple is useful for storing multiple values. As you note, a tuple is just like a list that is immutable, i.e. once created you cannot add/remove/swap elements.
One benefit of being immutable is that because the tuple is fixed size it allows the run-time to perform certain optimizations. This is particularly beneficial when a tuple is used in the context of a return value or a parameter to a function.
Use a tuple:
If your data should not or does not need to be changed.
Tuples are faster than lists. We should use a tuple instead of a list if we are defining a constant set of values and all we are ever going to do with it is iterate through it.
If we need an array of elements to be used as a dictionary key, we can use a tuple. As lists are mutable, they can never be used as dictionary keys.
Furthermore, tuples are immutable, whereas lists are mutable. By the same token, tuples are fixed size in nature, whereas lists are dynamic.
a_tuple = tuple(range(1000))
a_list = list(range(1000))
a_tuple.__sizeof__() # 8024 bytes
a_list.__sizeof__() # 9088 bytes
more information :
https://jerrynsh.com/tuples-vs-lists-vs-sets-in-python/
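The size difference can be reproduced with sys.getsizeof. The exact byte counts depend on the Python version and platform, so this sketch, which assumes CPython, only compares the sizes rather than asserting specific numbers.

```python
import sys

a_tuple = tuple(range(1000))
a_list = list(range(1000))

# On CPython the tuple needs less memory than a list of the same items.
tuple_size = sys.getsizeof(a_tuple)
list_size = sys.getsizeof(a_list)

# A list over-allocates as it grows so that append stays cheap, which
# means its size jumps in steps instead of changing on every append.
grown = []
sizes_seen = set()
for item in range(100):
    grown.append(item)
    sizes_seen.add(sys.getsizeof(grown))
distinct_sizes = len(sizes_seen)
```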
In addition to the places where they're syntactically required like the string % operation and for multiple return values, I use tuples as a form of lightweight classes. For example, suppose you have an object that passes out an opaque cookie to a caller from one method which is then passed into another method. A tuple is a good way to pack multiple values into that cookie without having to define a separate class to contain them.
I try to be judicious about this particular use, though. If the cookies are used liberally throughout the code, it's better to create a class because it helps document their use. If they are only used in one place (e.g. one pair of methods) then I might use a tuple. In any case, because it's Python you can start with a tuple and then change it to an instance of a custom class without having to change any code in the caller.
Tuples are used in:
places where you want your sequence of elements to be immutable
in tuple assignments
a,b=1,2
in variable length arguments
def add(*arg):  # arg is a tuple
    return sum(arg)
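Both uses can be tried out in a few lines. The helper arg_type is a hypothetical function added only to show what type the collected arguments have.

```python
def add(*args):
    # All positional arguments arrive packed into one tuple.
    return sum(args)


def arg_type(*args):
    return type(args)


total = add(1, 2, 3)
collected_as = arg_type(1, 2, 3)

# Tuple assignment also makes swapping values a one-liner.
a, b = 1, 2
a, b = b, a
```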
