What dictates the order of data in a dictionary in Python? - python

What determines the order of items in a dictionary (specifically in Python, though this may apply to other languages)? For example:
>>> spam = {'what':4, 'shibby':'cream', 'party':'rock'}
>>> spam
{'party': 'rock', 'what': 4, 'shibby': 'cream'}
If I call on spam again, the items will still be in that same order. But how is this order decided?

According to python docs,
Dictionaries are sometimes found in other languages as “associative
memories” or “associative arrays”. Unlike sequences, which are indexed
by a range of numbers, dictionaries are indexed by keys, which can be
any immutable type; strings and numbers can always be keys.
The keys are arbitrary; again from the docs:
A dictionary’s keys are almost arbitrary values. Values that are not
hashable, that is, values containing lists, dictionaries or other
mutable types (that are compared by value rather than by object
identity) may not be used as keys. Numeric types used for keys obey
the normal rules for numeric comparison: if two numbers compare equal
(such as 1 and 1.0) then they can be used interchangeably to index the
same dictionary entry. (Note however, that since computers store
floating-point numbers as approximations it is usually unwise to use
them as dictionary keys.)

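The order in an ordinary dictionary has historically been based on internal hash values, so you're not supposed to make any assumptions about it. (Since Python 3.7 the language guarantees that a dict iterates in insertion order; older versions make no such promise.)
Use collections.OrderedDict for a dictionary whose order you control.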
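A short sketch of both options. It assumes Python 3.7 or later for the plain dict behaviour; the variable names are only illustrative.

```python
from collections import OrderedDict

# Plain dict: iterates in insertion order on Python 3.7+ (assumption about the runtime).
spam = {'what': 4, 'shibby': 'cream', 'party': 'rock'}
plain_order = list(spam)

# OrderedDict: ordering is part of the type's contract, and it can be rearranged explicitly.
ordered = OrderedDict([('what', 4), ('shibby', 'cream'), ('party', 'rock')])
ordered.move_to_end('what')  # push 'what' to the back
ordered_order = list(ordered)

print(plain_order)
print(ordered_order)
```

If the goal is only "keys come back in the order I added them", a plain dict is probably enough on current Python; OrderedDict is mainly useful when the order itself has to be manipulated.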

Because dictionary keys are stored in a hash table. According to http://en.wikipedia.org/wiki/Hash_table:
The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order.

Related

The address of keys are stored very far from each other

I'd like to explore the hash table,
In [1]: book = {"apple":0.67, "milk":1.49, "avocado":1.49, "python":2}
In [5]: [hex(id(key)) for key in book]
Out[5]: ['0x10ffffc70', '0x10ffffab0', '0x10ffffe68', '0x10ee1cca8']
The addresses show that the keys are far apart in memory, especially the key "python".
I had assumed they would be adjacent to one another.
How does this happen? Does it still perform well?
There are two ways we can interpret your confusion: either you expected the id() to be the hash function for the keys, or you expected keys to be relocated to the hash table and, since in CPython the id() value is a memory location, that the id() values would say something about the hash table size. We can address both by talking about Python's dictionary implementation and how Python deals with objects in general.
Python dictionaries are implemented as a hash table, which is a table of limited size. To store keys, a hash function generates an integer (same integer for equal values), and the key is stored in a slot based on that number using a modulo function:
slot = hash(key) % len(table)
This can lead to collisions, so having a large range of numbers for the hash function to pick from is going to help reduce the chances there are such collisions. You still have to deal with collisions anyway, but you want to minimise that.
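To make the modulo step concrete, here is a simplified model of the slot calculation. The table size of 8 and the helper name slot_for are assumptions for illustration; it relies on the CPython behaviour that small non-negative integers hash to themselves.

```python
TABLE_SIZE = 8  # hypothetical table size, chosen to keep the numbers small


def slot_for(key, table_size=TABLE_SIZE):
    """Return the slot a key maps to in this simplified model."""
    return hash(key) % table_size


# In CPython, hash(1) == 1, hash(2) == 2 and hash(9) == 9,
# so the slots can be checked by hand.
slots = {key: slot_for(key) for key in (1, 2, 9)}
print(slots)  # keys 1 and 9 share a slot: a collision
```

This is only the starting slot; what happens after a collision is a separate step.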
Python does not use the id() function as a hash function here, because that would not produce the same hash for equal values! If you didn't produce the same hash for equal values, then you couldn't use multiple "hello world" strings as a means to find the right slot again: after dictionary["hello world"] = "value", the test "hello world" in dictionary could involve two distinct string objects with different id() values, which would hash to different slots, and you would not know that the specific string value has already been used as a key.
Instead, objects are expected to implement a __hash__ method, and you can see what that method produces for various objects with the hash() function.
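As a rough illustration of that contract, the sketch below defines a hypothetical Point class (not from the question) whose __hash__ and __eq__ agree, so two distinct but equal objects find the same dictionary entry.

```python
class Point:
    """Hypothetical value class: equal points must produce equal hashes."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Delegate to the hash of a tuple built from the same fields used in __eq__.
        return hash((self.x, self.y))


a, b = Point(1, 2), Point(1, 2)
same_object = a is b                 # two separate objects, so different id() values
same_hash = hash(a) == hash(b)       # but equal values hash equally
lookup = {a: "found"}[b]             # so b finds the entry stored under a
numeric_same = hash(1) == hash(1.0)  # built-in numbers follow the same rule
```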
Because keys stored in a dictionary must remain unchanged, Python won't let you use mutable built-in types such as lists as dictionary keys. Otherwise, if you could change their value, they would no longer be equal to another such object with the old value and the same hash, and you wouldn't find them in the slot that their new hash would map to.
Note that Python puts all objects in a dynamic heap, and uses references everywhere to relate the objects. Dictionaries hold references to keys and values; putting a key into a dictionary does not re-locate the key in memory and the id() of the key won't change. If keys were relocated, then a requirement for the id() function would be violated, the documentation states: This is an integer which is guaranteed to be unique and constant for this object during its lifetime.
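A quick way to check this for yourself; the key used here is just an example.

```python
key = ("python", 3)        # a tuple object living on the heap
address_before = id(key)

book = {key: 2}
stored_key = next(iter(book))  # the key as the dictionary sees it

same_object = stored_key is key               # the dict holds a reference, not a copy
address_unchanged = id(key) == address_before  # and the object has not moved
```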
As for those collisions: Python deals with collisions by looking for a new slot with a fixed formula, finding an empty slot in a predictable but pseudorandom series of slot numbers; see the dictobject.c source code comments if you want to know the details. As the table fills up, Python will dynamically grow the table to fit more elements, so there will always be empty slots.
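The general idea (probe for another slot, grow before the table fills up) can be sketched with a toy table. Note that this uses plain linear probing and a made-up growth threshold of two thirds, which is NOT the probe formula CPython uses; ToyTable and its methods are hypothetical names.

```python
class ToyTable:
    """Toy open-addressing hash table with linear probing (not CPython's scheme)."""

    def __init__(self, size=8):
        self.slots = [None] * size
        self.used = 0

    def _probe(self, key):
        # Start at the home slot; walk forward until the key or an empty slot is found.
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def _grow(self):
        # Double the table and re-insert every entry at its new position.
        old = [entry for entry in self.slots if entry is not None]
        self.slots = [None] * (len(self.slots) * 2)
        for k, v in old:
            self.slots[self._probe(k)] = (k, v)

    def put(self, key, value):
        # Grow before the table gets too full, so an empty slot always exists.
        if (self.used + 1) * 3 > len(self.slots) * 2:
            self._grow()
        i = self._probe(key)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (key, value)

    def get(self, key):
        entry = self.slots[self._probe(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]


table = ToyTable()
table.put(1, "one")
table.put(9, "nine")  # 9 % 8 == 1: collides with key 1 and moves to the next slot
```

Linear probing was chosen here only because it is the shortest scheme to write down; the comments in dictobject.c describe the real one.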

Python - hash() and dict

If we have two separate dicts, both with the same keys and values, printing them may show the items in different orders, as expected.
So, let's say I want to use hash() on those dicts:
hash(frozenset(dict1.items()))
hash(frozenset(dict2.items()))
I'm doing this to make a new dict that uses the resulting hash() values as keys.
Even though the dicts print differently, will the value created by hash() always be equal? If not, how can I make it always the same so that I can compare them successfully?
If the keys and values hash the same, frozenset is designed to be a stable and unique representation of the underlying values. The docs explicitly state:
Two sets are equal if and only if every element of each set is contained in the other (each is a subset of the other).
And the rules for hashable types require that:
Hashable objects which compare equal must have the same hash value.
So by definition frozensets with equal, hashable elements are equal and hash to the same value. This can only be violated if a user-defined class which does not obey the rules for hashing and equality is contained in the resulting frozenset (but then you've got bigger problems).
Note that this does not mean they'll iterate in the same order or produce the same repr; because of how hash collisions are resolved (CPython probes for another slot), two frozensets constructed from the same elements in a different order need not iterate in the same order. But they're still equal to one another, and hash the same (precise outputs and ordering are implementation dependent and could easily vary between different versions of Python; this just happens to work on my Py 3.5 install to create the desired "different iteration order" behavior):
>>> frozenset([1,9])
frozenset({1, 9})
>>> frozenset([9,1])
frozenset({9, 1}) # <-- Different order; consequence of 8 buckets colliding for 1 and 9
>>> hash(frozenset([1,9]))
-7625378979602737914
>>> hash(frozenset([9,1]))
-7625378979602737914 # <-- Still the same hash though
>>> frozenset([1,9]) == frozenset([9,1])
True # <-- And still equal
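Applied to the dictionaries from the question, a minimal sketch might look like this. It assumes every value in the dicts is hashable (a list value would make frozenset raise TypeError); dict1, dict2 and by_hash are example names.

```python
dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'c': 3, 'a': 1, 'b': 2}  # same items, inserted in a different order

key1 = hash(frozenset(dict1.items()))
key2 = hash(frozenset(dict2.items()))

# Use the hash as a key in a new dictionary, as the question describes.
by_hash = {key1: dict1}
found = by_hash.get(key2)
```

Keep in mind that equal hashes do not prove equal contents, since unrelated values can collide. Using the frozenset itself as the key (rather than its hash) avoids that ambiguity, because the dictionary then also compares the keys for equality.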

Does the order of keys in dictionary.keys() in a python dictionary change if the values are changed?

I have a python dictionary (say dict) in which I keep modifying values (the keys remain unaltered). Will the order of keys in the list given by dict.keys() change when I modify the values corresponding to the keys?
No, a python dictionary has an ordering for the keys but does not guarantee what that order will be or how it is calculated.
Which is why they are not guaranteed to be ordered in the first place.
The values stored in the dictionary do not have an effect on the hash values of the keys and so will not change ordering.
Taken from the Python Documentation:
The keys() method of a dictionary object returns a list of all the keys used in the dictionary, in arbitrary order (if you want it sorted, just apply the sorted() function to it). To check whether a single key is in the dictionary, use the in keyword.
No, the order of the dict will not change because you change the values. The order depends on the keys only (or their hash value, to be more specific, at least in CPython). However, it may change between versions and implementations of Python, and from Python 3.3 on, hash randomization is enabled by default, so the order of string keys can change every time you start Python. (Since Python 3.7, dicts are guaranteed to iterate in insertion order instead.)
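This is easy to confirm with a small experiment; the dictionary contents are made up for the example.

```python
prices = {"apple": 0.67, "milk": 1.49, "avocado": 1.49}
order_before = list(prices.keys())

prices["milk"] = 1.59    # modify a value; no key is added or removed
prices["apple"] += 0.10

order_after = list(prices.keys())
print(order_before == order_after)
```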
Python dictionaries' key ordering should not be assumed constant.
However, there are other datastructures that do give consistent key ordering, that work a lot like dictionaries:
http://stromberg.dnsalias.org/~strombrg/treap/
http://stromberg.dnsalias.org/~strombrg/red-black-tree-mod/
BTW, you should not name a variable "dict", because there is a builtin type called "dict" that will be made invisible.

How do sets work in Python?

>>> {x for x in 'spam'}
{'a', 'p', 's', 'm'}
Why does it change the order? If you take a look at a loop, it works perfectly:
>>> for x in 'spam':
... print(x)
...
s
p
a
m
>>>
Sets in python (and in set theory) are not ordered. So when you loop over them, there is no defined ordering.
You looped over the string literal 'spam' to make a set containing each character in that string. Once you did that, the ordering was gone.
When you perform the for loop over 'spam', you are performing the loop against a string which does have ordering.
From Set types:
These represent unordered, finite sets of unique, immutable objects. As such, they cannot be indexed by any subscript [because no ordering is defined among the elements]. However, they can be iterated over, and the built-in function len() returns the number of items in a set. Common uses for sets are fast membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.
But if you really need to preserve the order, then please check ordered set.
In any case, you could simply write set('spam') instead of using a comprehension.
set is not an ordered collection, and as such, the internal order of keys is undefined.
From docs.python.org
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built in dict, list, and tuple classes, and the collections module.)
sets are unordered by definition. The reason for this is that their implementation runs faster that way, by using appropriate data structures that do not preserve order. If you need order, you can use the (slower) OrderedDict type.
Python sets are defined as unordered, so Python is free to order them any way it likes (efficiently, I presume).
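If the goal is to drop duplicates while keeping first-seen order, one common approach (assuming Python 3.7 or later, where dicts keep insertion order) is dict.fromkeys:

```python
text = 'spamspam'

unique_unordered = set(text)                 # duplicates removed, order not defined
unique_ordered = list(dict.fromkeys(text))   # duplicates removed, first-seen order kept

print(unique_ordered)
```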

What is a tuple useful for?

I am learning Python for a class now, and we just covered tuples as one of the data types. I read the Wikipedia page on it, but, I could not figure out where such a data type would be useful in practice. Can I have some examples, perhaps in Python, where an immutable set of numbers would be needed? How is this different from a list?
Tuples are used whenever you want to return multiple results from a function.
Since they're immutable, they can be used as keys for a dictionary (lists can't).
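Both points fit in a few lines. The function name min_max is just an example.

```python
def min_max(values):
    """Return the smallest and largest item as a (low, high) tuple."""
    return min(values), max(values)


# The returned tuple can be unpacked directly into separate names...
low, high = min_max([3, 1, 4, 1, 5])

# ...or used as-is as a dictionary key, which a list could not be.
batches = {min_max([3, 1, 4]): "first batch"}
```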
Tuples make good dictionary keys when you need to combine more than one piece of data into your key and don't feel like making a class for it.
a = {}
a[(1,2,"bob")] = "hello!"
a[("Hello","en-US")] = "Hi There!"
I've used this feature primarily to create a dictionary with keys that are coordinates of the vertices of a mesh. However, in my particular case, the exact comparison of the floats involved worked fine which might not always be true for your purposes [in which case I'd probably convert your incoming floats to some kind of fixed-point integer]
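The fixed-point idea mentioned above could look roughly like this. The scale factor of 1000 (three decimal places) and the helper name vertex_key are assumptions; pick a resolution that suits your data.

```python
SCALE = 1000  # hypothetical resolution: three decimal places


def vertex_key(x, y, z, scale=SCALE):
    """Convert float coordinates to a tuple of scaled integers usable as a dict key."""
    return (round(x * scale), round(y * scale), round(z * scale))


vertices = {}
vertices[vertex_key(0.1 + 0.2, 1.0, 2.0)] = "v0"

# 0.1 + 0.2 != 0.3 as floats, but both map to the same fixed-point key.
label = vertices.get(vertex_key(0.3, 1.0, 2.0))
```

Values that sit very close to a rounding boundary can still end up in neighbouring cells, so this reduces the problem rather than eliminating it.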
The best way to think about it is:
A tuple is a record whose fields don't have names.
You use a tuple instead of a record when you can't be bothered to specify the field names.
So instead of writing things like:
person = {"name": "Sam", "age": 42}
name, age = person["name"], person["age"]
Or the even more verbose:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
person = Person("Sam", 42)
name, age = person.name, person.age
You can just write:
person = ("Sam", 42)
name, age = person
This is useful when you want to pass around a record that has only a couple of fields, or a record that is only used in a few places. In that case specifying a whole new record type with field names (in Python, you'd use an object or a dictionary, as above) could be too verbose.
Tuples originate from the world of functional programming (Haskell, OCaml, Elm, F#, etc.), where they are commonly used for this purpose. Unlike Python, most functional programming languages are statically typed (a variable can only hold one type of value, and that type is determined at compile time). Static typing makes the role of tuples more obvious. For example, in the Elm language:
type alias Person = (String, Int)
person : Person
person = ("Sam", 42)
This highlights the fact that a particular type of tuple is always supposed to have a fixed number of fields in a fixed order, and each of those fields is always supposed to be of the same type. In this example, a person is always a tuple of two fields, one is a string and the other is an integer.
The above is in stark contrast to lists, which are supposed to be variable length (the number of items is normally different in each list, and you write functions to add and remove items) and each item in the list is normally of the same type. For example, you'd have one list of people and another list of addresses - you would not mix people and addresses in the same list. Whereas mixing different types of data inside the same tuple is the whole point of tuples. Fields in a tuple are usually of different types (but not always - e.g. you could have a (Float, Float, Float) tuple to represent x,y,z coordinates).
Tuples and lists are often nested. It's common to have a list of tuples. You could have a list of Person tuples just as well as a list of Person objects. You can also have a tuple field whose value is a list. For example, if you have an address book where one person can have multiple addresses, you could have a tuple of type (Person, [String]). The [String] type is commonly used in functional programming languages to denote a list of strings. In Python, you wouldn't write down the type, but you could use tuples like that in exactly the same manner, putting a Person object in the first field of a tuple and a list of strings in its second field.
In Python, confusion arises because the language does not enforce any of these practices that are enforced by the compiler in statically typed functional languages. In those languages, you cannot mix different kinds of tuples. For example, you cannot return a (String, String) tuple from a function whose type says that it returns a (String, Integer) tuple. You also cannot return a list when the type says you plan to return a tuple, and vice versa. Lists are used strictly for growing collections of items, and tuples strictly for fixed-size records. Python doesn't stop you from breaking any of these rules if you want to.
In Python, a list is sometimes converted into a tuple for use as a dictionary key, because Python dictionary keys need to be immutable (i.e. constant) values, whereas Python lists are mutable (you can add and remove items at any time). This is a workaround for a particular limitation in Python, not a property of tuples as a computer science concept.
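For instance (the path list here is just sample data):

```python
path = ["usr", "local", "bin"]  # a list: mutable, therefore unhashable

index = {}
try:
    index[path] = "as list"
    list_accepted = True
except TypeError:
    # Lists do not define a usable hash, so they are rejected as keys.
    list_accepted = False

index[tuple(path)] = "as tuple"  # the converted tuple is accepted
```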
So in Python, lists are mutable and tuples are immutable. But this is just a design choice, not an intrinsic property of lists and tuples in computer science. You could just as well have immutable lists and mutable tuples.
In Python (using the default CPython implementation), tuples are also faster than objects or dictionaries for most purposes, so they are occasionally used for that reason, even when naming the fields using an object or dictionary would be clearer.
Finally, to make it even more obvious that tuples are intended to be another kind of record (not another kind of list), Python also has named tuples:
from collections import namedtuple
Person = namedtuple("Person", "name age")
person = Person("Sam", 42)
name, age = person.name, person.age
This is often the best choice - shorter than defining a new class, but the meaning of the fields is more obvious than when using normal tuples whose fields don't have names.
Immutable lists are highly useful for many purposes, but the topic is far too complex to answer here. The main point is that things that cannot change are easier to reason about than things that can change. Most software bugs come from things changing in unexpected ways, so restricting the ways in which they can change is a good way to eliminate bugs. If you are interested, I recommend reading a tutorial for a functional programming language such as Elm, Haskell or Clojure (Elm is the friendliest). The designers of those languages considered immutability so useful that all lists are immutable there. (Instead of changing a list to add or remove an item, you make a new list with the item added or removed. Immutability guarantees that the old copy of the list can never change, so the compiler and runtime can make the code perform well by re-using parts of the old list in the new one and garbage-collecting the left-over parts when they are no longer needed.)
I like this explanation.
Basically, you should use tuples when there's a constant structure (the 1st position always holds one type of value and the second another, and so forth), and lists should be used for lists of homogeneous values.
Of course there's always exceptions, but this is a good general guideline.
Tuples and lists have the same uses in general. Immutable data types in general have many benefits, mostly about concurrency issues.
So, when you have lists that are not volatile in nature and you need to guarantee that no consumer is altering it, you may use a tuple.
Typical examples are fixed data in an application like company divisions, categories, etc. If this data changes, typically a single producer rebuilds the tuple.
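A small sketch of that pattern, with made-up division names:

```python
DIVISIONS = ("Sales", "Engineering", "Support")

try:
    DIVISIONS[0] = "Marketing"  # a consumer trying to alter the data in place
    mutated = True
except TypeError:
    mutated = False             # tuples reject item assignment

# The producer "changes" the data by building a new tuple and rebinding the name.
DIVISIONS = DIVISIONS + ("Marketing",)
```

Anyone still holding the old tuple keeps seeing the old, unchanged contents, which is what makes this safe to share.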
I find them useful when you always deal with two or more objects as a set.
A tuple is a sequence of values. The values can be any type, and they are indexed by integers, so in that respect tuples are a lot like lists. The most important difference is that tuples are immutable.
A tuple is a comma-separated list of values:
t = 'p', 'q', 'r', 's', 't'
It is good practice to enclose tuples in parentheses:
t = ('p', 'q', 'r', 's', 't')
A list can always replace a tuple, with respect to functionality (except, apparently, as keys in a dict). However, a tuple can make things go faster. The same is true for, for example, immutable strings in Java -- when will you ever need to be unable to alter your strings? Never!
I just read a decent discussion on limiting what you can do in order to make better programs; Why Why Functional Programming Matters Matters
A tuple is useful for storing multiple values. As you note, a tuple is just like a list that is immutable, i.e. once created you cannot add, remove, or swap elements.
One benefit of being immutable is that because the tuple is fixed size it allows the run-time to perform certain optimizations. This is particularly beneficial when a tuple is used in the context of a return value or a parameter to a function.
Use a tuple:
If your data should not or does not need to be changed.
Tuples are faster than lists. We should use a tuple instead of a list if we are defining a constant set of values and all we are ever going to do with it is iterate through it.
If we need an array of elements to be used as dictionary keys, we can use tuples. As lists are mutable, they can never be used as dictionary keys.
Furthermore, tuples are immutable, whereas lists are mutable. By the same token, tuples are fixed size in nature, whereas lists are dynamic.
a_tuple = tuple(range(1000))
a_list = list(range(1000))
a_tuple.__sizeof__() # 8024 bytes
a_list.__sizeof__() # 9088 bytes
more information :
https://jerrynsh.com/tuples-vs-lists-vs-sets-in-python/
In addition to the places where they're syntactically required like the string % operation and for multiple return values, I use tuples as a form of lightweight classes. For example, suppose you have an object that passes out an opaque cookie to a caller from one method which is then passed into another method. A tuple is a good way to pack multiple values into that cookie without having to define a separate class to contain them.
I try to be judicious about this particular use, though. If the cookies are used liberally throughout the code, it's better to create a class because it helps document their use. If they are only used in one place (e.g. one pair of methods) then I might use a tuple. In any case, because it's Python you can start with a tuple and then change it to an instance of a custom class without having to change any code in the caller.
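As a rough sketch of the cookie idea, consider a hypothetical Paginator class (the name, methods, and cookie layout are all invented for this example). The caller only stores the cookie and hands it back, so its internal shape could later become a class without the caller noticing.

```python
class Paginator:
    """Hypothetical object that hands out an opaque cookie and accepts it back."""

    def __init__(self, items, page_size):
        self.items = items
        self.page_size = page_size

    def first_page(self):
        cookie = (0, self.page_size)  # (offset, limit) packed into a tuple
        return self.items[:self.page_size], cookie

    def next_page(self, cookie):
        offset, limit = cookie        # only Paginator ever looks inside the cookie
        start = offset + limit
        return self.items[start:start + limit], (start, limit)


pager = Paginator(list(range(10)), page_size=4)
page1, cookie = pager.first_page()
page2, cookie = pager.next_page(cookie)  # the caller never unpacks the cookie
```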
Tuples are used in:
places where you want your sequence of elements to be immutable
in tuple assignments
a, b = 1, 2
in variable length arguments
def add(*arg):  # arg is a tuple
    return sum(arg)
