How do sets work in Python?

>>> {x for x in 'spam'}
{'a', 'p', 's', 'm'}
Why does it change the order? If you take a look at a loop, it works perfectly:
>>> for x in 'spam':
...     print(x)
...
s
p
a
m
>>>

Sets in Python (and in set theory) are not ordered, so when you loop over one there is no defined ordering.
You looped over the string literal 'spam' to make a set containing each character in that string. Once you did that, the ordering was gone.
When you perform the for loop over 'spam', you are performing the loop against a string which does have ordering.

From Set types:
These represent unordered, finite sets of unique, immutable objects. As such, they cannot be indexed by any subscript [because no ordering is defined among the elements]. However, they can be iterated over, and the built-in function len() returns the number of items in a set. Common uses for sets are fast membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.
If you really need to preserve the order, look at an ordered set implementation.
Also, you can simply write set('spam') instead of using a comprehension.
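If the goal is only to drop duplicate characters while keeping their first-seen order, one possible sketch uses dict.fromkeys, relying on the insertion-order guarantee of dict in Python 3.7 and later. The helper name unique_in_order is hypothetical and not part of the standard library.

```python
def unique_in_order(iterable):
    """Return the distinct items of `iterable` in first-seen order.

    Assumes Python 3.7+, where dict preserves insertion order,
    and that every item is hashable.
    """
    return list(dict.fromkeys(iterable))


letters = unique_in_order('spam')         # ['s', 'p', 'a', 'm']
deduped = unique_in_order('mississippi')  # ['m', 'i', 's', 'p']
```

This keeps hash-based deduplication but returns a list, so the result can also be indexed.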

set is not an ordered collection, and as such, the internal order of keys is undefined.

From docs.python.org
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built in dict, list, and tuple classes, and the collections module.)

sets are unordered by definition. The reason for this is that their implementation runs faster that way, by using appropriate data structures that do not preserve order. If you need order, you can use the (slower) OrderedDict type.

Python sets are defined as unordered, so Python is free to order them any way it likes (efficiently, I presume).

Related

Why should enumerate accept a set as input?

A Python set is meant to be unordered, so why does enumerate accept one as input?
The same question applies to dictionaries.
From my point of view, this gives the false impression that there is a predictable way of enumerating them, but there is not.
This is quite misleading. I would have expected at least a warning from enumerate when I call enumerate(set) or enumerate(dict).
Can anyone explain why this warning is not there? Is it "pythonic" to allow enumeration that may not be predictable?
There is a distinction between a container and its iterator. Technically, enumerate doesn't work with set, dict, or list directly, because none of those types is an iterator. They are iterable, though, meaning enumerate can get an iterator from each by implicitly calling the iter function (i.e., enumerate(some_list_dict_or_set) behaves the same as enumerate(iter(some_list_dict_or_set))).
>>> iter([1,2,3])
<list_iterator object at 0x109d924e0>
>>> iter(dict(a=1, b=2))
<dict_keyiterator object at 0x109d4b818>
>>> iter({1,2,3})
<set_iterator object at 0x109d53ab0>
So while a given container may not have any inherent ordering of its elements, its iterator can impose an order, and enumerate simply pairs that ordering with a sequence of int values.
You can really see the difference between inherent ordering and imposed ordering when comparing dict and OrderedDict in Python 3.7 or later. Both remember the order in which their keys were added, but that order isn't an important part of a dict's identity. That is, two dicts with the same keys and the same values mapped to those keys are equal, no matter what order the keys were added in.
>>> dict(a=1, b=2) == dict(b=2, a=1)
True
The same is not true of two OrderedDicts, which are equal only if they have the same keys, the same values for those keys, and the keys were added in the same order.
>>> from collections import OrderedDict
>>> OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
False
enumerate accepts any iterable, which includes set and dict. A set is unordered, but its order of iteration is not random: if you iterate the same unmodified set multiple times, it will yield elements in the same order.
Also note that as of Python 3.7 dict preserves insertion order. Whether or not this is useful solely depends on your use case.
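When a reproducible numbering matters, a common approach is to impose an order explicitly before enumerating, for example with sorted(). A minimal sketch, using made-up sample data:

```python
colors = {"red", "green", "blue"}

# Iterating the same unmodified set twice yields the same order within one run.
first_pass = list(colors)
second_pass = list(colors)

# sorted() returns a list, so the numbering no longer depends on
# the set's internal layout.
numbered = list(enumerate(sorted(colors)))
# numbered == [(0, 'blue'), (1, 'green'), (2, 'red')]
```

Sorting only works when the elements can be compared with each other, which is an assumption about the data rather than something a set guarantees.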

Is the order of execution guaranteed when looping over a string?

Is the program below guaranteed to always produce the same output?
s = 'fgvhlsdagfcisdghfjkfdshfsal'
for c in s:
    print(c)
Yes, it is. This is because the str type is an immutable sequence. Sequences represent a finite ordered set of elements (see Sequences in the Data model chapter of the Reference guide).
Iteration through a given string (any Sequence) is guaranteed to always produce the same results in the same order for different runs of the CPython interpreter, versions of CPython and implementations of Python.
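The guarantee can be checked directly: iterating a sequence visits the same items as indexing it from 0 to len(s) - 1. A small sketch using the string from the question:

```python
s = 'fgvhlsdagfcisdghfjkfdshfsal'

# Characters collected by plain iteration.
by_iteration = [c for c in s]

# Characters collected by explicit index, from first to last.
by_index = [s[i] for i in range(len(s))]

same_order = by_iteration == by_index  # True for any str, list, or tuple
```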
Yes. Internally, the string you have there is stored in a C-style array (depending on the interpreter implementation). Because it is a sequential array of data, an iterator can be created over it. To use the for ... in ... syntax, the object after in must be iterable. A string supplies its own iterator, which lets for ... in traverse it in sequential order, as with all Python sequences.
The same is true for lists, and even custom objects that you create. However, not every iterable Python object is necessarily ordered or yields the values it stores; a clear example is the dictionary. Iterating over a dictionary yields its keys rather than sequential values as a list, tuple, or string does, and the keys may or may not come out in the order you added them, depending on the Python version among other things (insertion order is guaranteed only from Python 3.7 onward; on older versions use OrderedDict).
Yes, it is. Over a string, a for-loop iterates over the characters in order. This is also true for lists and tuples -- a for-loop will iterate over the elements in order.
You may be thinking of sets and dictionaries. These don't specify a particular order, so:
for x in {"a","b","c"}:  # over a set
    print(x)
for key in {"x":1, "y":2, "z":3}:  # over a dict
    print(key)
will iterate in some order that you can't easily predict in advance (for dicts this applies before Python 3.7; later versions iterate in insertion order).
See this Stack Overflow answer for some additional information on what guarantees are made about the order for dictionaries and sets.
Yes. The for loop is sequential.
Yes, the loop will always print each letter one by one starting from the first character and ending with the last.

Python - hash() and dict

If we have two separate dicts with the same keys and values, printing them may show the items in different orders, as expected.
So, let's say I want to use hash() on those dicts:
hash(frozenset(dict1.items()))
hash(frozenset(dict2.items()))
I'm doing this to make a new dict that uses the hash() values as its keys.
Even though the dicts print differently, will the value produced by hash() always be equal? If not, how can I make it always the same so that the comparisons succeed?
If the keys and values hash the same, frozenset is designed to be a stable and unique representation of the underlying values. The docs explicitly state:
Two sets are equal if and only if every element of each set is contained in the other (each is a subset of the other).
And the rules for hashable types require that:
Hashable objects which compare equal must have the same hash value.
So by definition frozensets with equal, hashable elements are equal and hash to the same value. This can only be violated if a user-defined class which does not obey the rules for hashing and equality is contained in the resulting frozenset (but then you've got bigger problems).
Note that this does not mean they'll iterate in the same order or produce the same repr; thanks to chaining on hash collisions, two frozensets constructed from the same elements in a different order need not iterate in the same order. But they're still equal to one another, and hash the same (precise outputs and ordering is implementation dependent, could easily vary between different versions of Python; this just happens to work on my Py 3.5 install to create the desired "different iteration order" behavior):
>>> frozenset([1,9])
frozenset({1, 9})
>>> frozenset([9,1])
frozenset({9, 1}) # <-- Different order; consequence of 8 buckets colliding for 1 and 9
>>> hash(frozenset([1,9]))
-7625378979602737914
>>> hash(frozenset([9,1]))
-7625378979602737914 # <-- Still the same hash though
>>> frozenset([1,9]) == frozenset([9,1])
True # <-- And still equal
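Applied to the question, a minimal sketch could look like the following. The names dict1, dict2, and seen are illustrative, and the approach assumes every value in the dicts is hashable. Note that hash() of str objects is randomized per process by default, so the integers should not be stored and compared across separate runs; using the frozenset itself as the key avoids relying on the integer at all.

```python
# Same keys and values, inserted in a different order.
dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = {"c": 3, "b": 2, "a": 1}

key1 = frozenset(dict1.items())
key2 = frozenset(dict2.items())

# Equal frozensets are required to hash equally.
hashes_match = hash(key1) == hash(key2)

# Using the frozenset directly as a dict key also handles the rare case
# where two different dicts happen to share a hash value.
seen = {key1: "first dict"}
found = seen.get(key2)  # looks up the entry stored under key1
```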

What dictates the order of data in a dictionary in Python?

What determines the order of items in a dictionary (specifically in Python, though this may apply to other languages)? For example:
>>> spam = {'what':4, 'shibby':'cream', 'party':'rock'}
>>> spam
{'party': 'rock', 'what': 4, 'shibby': 'cream'}
If I call on spam again, the items will still be in that same order. But how is this order decided?
According to the Python docs:
Dictionaries are sometimes found in other languages as “associative
memories” or “associative arrays”. Unlike sequences, which are indexed
by a range of numbers, dictionaries are indexed by keys, which can be
any immutable type; strings and numbers can always be keys.
The keys are almost arbitrary; again from the docs:
A dictionary’s keys are almost arbitrary values. Values that are not
hashable, that is, values containing lists, dictionaries or other
mutable types (that are compared by value rather than by object
identity) may not be used as keys. Numeric types used for keys obey
the normal rules for numeric comparison: if two numbers compare equal
(such as 1 and 1.0) then they can be used interchangeably to index the
same dictionary entry. (Note however, that since computers store
floating-point numbers as approximations it is usually unwise to use
them as dictionary keys.)
The order in an ordinary dictionary is based on an internal hash value, so you're not supposed to make any assumptions about it.
Use collections.OrderedDict for a dictionary whose order you control.
Because dictionary keys are stored in a hash table. According to http://en.wikipedia.org/wiki/Hash_table:
The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order.
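The answers above describe older interpreters. Since Python 3.7 the language guarantees that a dict iterates in insertion order, so on a current interpreter the question's example behaves as in this short sketch (assuming Python 3.7+):

```python
spam = {'what': 4, 'shibby': 'cream', 'party': 'rock'}

# Keys come back in the order they were inserted.
keys_in_order = list(spam)  # ['what', 'shibby', 'party']

# Updating an existing key keeps its position; a new key goes to the end.
spam['what'] = 5
spam['extra'] = None
keys_after_update = list(spam)  # ['what', 'shibby', 'party', 'extra']
```

Sets received no such guarantee, so their iteration order remains an implementation detail.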

Adding Elements from a List of Lists to a Set?

I'm attempting to add elements from a list of lists into a set. For example if I had
new_list=[['blue','purple'],['black','orange','red'],['green']]
How would I receive
new_set=(['blue','purple'],['black','orange','red'],['green'])
I'm trying to do this so I can use intersection to find out what elements appear in 2 sets. I thought this would work...
results=set()
results2=set()
for element in new_list:
    results.add(element)
for element in new_list2:
    results2.add(element)
results3=results.intersection(results2)
but I keep receiving:
TypeError: unhashable type: 'list'
for some reason.
Convert the inner lists to tuples, because sets can store only hashable objects (and lists, being mutable, are not hashable):
In [72]: new_list=[['blue','purple'],['black','orange','red'],['green']]
In [73]: set(tuple(x) for x in new_list)
Out[73]: set([('blue', 'purple'), ('black', 'orange', 'red'), ('green',)])
How would I receive
new_set=(['blue','purple'],['black','orange','red'],['green'])
Well, despite the misleading name, that's not a set of anything, that's a tuple of lists. To convert a list of lists into a tuple of lists:
new_set = tuple(new_list)
Maybe you wanted to receive this?
new_set=set([['blue','purple'],['black','orange','red'],['green']])
If so… you can't. A set cannot contain unhashable values like lists. That's what the TypeError is telling you.
If this weren't a problem, all you'd have to do is write:
new_set = set(new_list)
And anything more complicated you write will have exactly the same problem as just calling set, so there's no tricky way around it.
Of course you can have a set of tuples, since they're hashable. So, maybe you wanted this:
new_set=set([('blue','purple'),('black','orange','red'),('green',)])
That's easy too. Assuming your inner lists are guaranteed to contain nothing but strings (or other hashable values), as in your example it's just:
new_set = set(map(tuple, new_list))
Or, if you use a sort-based set class, you don't need hashable values, just fully-ordered values. For example:
new_set = sortedset(new_list)
Python doesn't come with such a thing in the standard library, but there are some great third-party implementations you can install, like blist.sortedset or bintrees.FastRBTree.
Of course sorted-set operations aren't quite as fast as hash operations in general, but often they're more than good enough. (For a concrete example, if you have 1 million items in the list, hashing will make each lookup 1 million times faster; sorting will only make it 50,000 times faster.)
Basically, any output you can describe or give an example of, we can tell you how to get that, or that it isn't a valid object you can get… but first you have to tell us what you actually want.
By the way, if you're wondering why lists aren't hashable, it's just because they're mutable. If you're wondering why most mutable types aren't hashable, the FAQ explains that.
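If what is actually wanted is the individual strings that appear in both nested lists, rather than whole inner lists, one option is to flatten each list before building the sets. This is a guess at the intent; new_list2 below is made-up sample data.

```python
from itertools import chain

new_list = [['blue', 'purple'], ['black', 'orange', 'red'], ['green']]
new_list2 = [['blue'], ['red', 'yellow']]  # hypothetical second input

# chain.from_iterable yields the inner strings one by one;
# strings are hashable, so they can go straight into a set.
flat = set(chain.from_iterable(new_list))
flat2 = set(chain.from_iterable(new_list2))

common = flat & flat2  # same as flat.intersection(flat2)
```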
Make the element a tuple before adding it to the set:
new_list=[['blue','purple'],['black','orange','red'],['green']]
new_list2=[['blue','purple'],['black','green','red'],['orange']]
results=set()
results2=set()
for element in new_list:
    results.add(tuple(element))
for element in new_list2:
    results2.add(tuple(element))
results3=results.intersection(results2)
print(results3)
results in:
{('blue', 'purple')}
Set elements have to be hashable.
To add lists to a set, convert them to tuples.
To add sets to a set, convert them to frozensets.
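A short sketch of both conversions, using made-up sample data:

```python
rows = [['blue', 'purple'], ['green']]
groups = [{'blue', 'purple'}, {'green'}]

# Lists become tuples; order within each tuple is kept.
set_of_tuples = {tuple(row) for row in rows}

# Sets become frozensets; element order is irrelevant for equality.
set_of_frozensets = {frozenset(group) for group in groups}

has_pair = ('blue', 'purple') in set_of_tuples
has_group = frozenset({'purple', 'blue'}) in set_of_frozensets
```

The choice matters for intersections: tuples match only when the elements appear in the same order, while frozensets match regardless of order.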
