What is the purpose of collections.ChainMap?

In Python 3.3 a ChainMap class was added to the collections module:
A ChainMap class is provided for quickly linking a number of mappings
so they can be treated as a single unit. It is often much faster than
creating a new dictionary and running multiple update() calls.
Example:
>>> from collections import ChainMap
>>> x = {'a': 1, 'b': 2}
>>> y = {'b': 10, 'c': 11}
>>> z = ChainMap(y, x)
>>> for k, v in z.items():
...     print(k, v)
...
a 1
c 11
b 10
It was motivated by this issue and made public by this one (no PEP was created).
As far as I understand, it is an alternative to having an extra dictionary and maintaining it with update()s.
The questions are:
What use cases does ChainMap cover?
Are there any real world examples of ChainMap?
Is it used in third-party libraries that switched to python3?
Bonus question: is there a way to use it on Python2.x?
I've heard about it in Transforming Code into Beautiful, Idiomatic Python PyCon talk by Raymond Hettinger and I'd like to add it to my toolkit, but I lack in understanding when should I use it.

I like @b4hand's examples, and indeed I have used ChainMap-like structures (though not ChainMap itself) in the past for the two purposes he mentions: multi-layered configuration overrides and variable stack/scope emulation.
I'd like to point out two other motivations/advantages/differences of ChainMap, compared to using a dict-update loop and thus only storing the "final" version:
1. More information: since a ChainMap structure is "layered", it supports answering questions like: Am I getting the "default" value, or an overridden one? What is the original ("default") value? At what level did the value get overridden (borrowing @b4hand's config example: user-config or command-line-overrides)? Using a simple dict, the information needed for answering these questions is already lost.
2. Speed tradeoff: suppose you have N layers and at most M keys in each; constructing a ChainMap takes O(N) and each lookup O(N) worst-case[*], while construction of a dict using an update-loop takes O(NM) and each lookup O(1). This means that if you construct often and only perform a few lookups each time, or if M is big, ChainMap's lazy-construction approach works in your favor.
[*] The analysis in (2) assumes dict-access is O(1), when in fact it is O(1) on average, and O(M) worst case. See more details here.
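To illustrate the first point, here is a minimal sketch (the defaults/overrides layer names are my own, purely for illustration):

from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
overrides = {'color': 'blue'}
cm = ChainMap(overrides, defaults)

print(cm['color'])           # blue -- the override wins
print(cm['user'])            # guest -- falls through to the defaults layer
print(cm.maps[-1]['color'])  # red -- the original ("default") value is still available
# Which layer supplies a key? Walk cm.maps in priority order:
print(next(i for i, m in enumerate(cm.maps) if 'color' in m))  # 0, i.e. the overrides layer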

I could see using ChainMap for a configuration object where you have multiple scopes of configuration like command line options, a user configuration file, and a system configuration file. Since lookups search the mappings in the order they were passed to the constructor, you can override settings from the lower-priority scopes. I've not personally used or seen ChainMap used, but that's not surprising since it is a fairly recent addition to the standard library.
It might also be useful for emulating stack frames where you push and pop variable bindings if you were trying to implement a lexical scope yourself.
The standard library docs for ChainMap give several examples and link to similar implementations in third-party libraries. Specifically, they name Django's Context class and Enthought's MultiContext class.
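For a rough idea of what the configuration layering described above might look like (a sketch only; the option names and layers are invented for illustration):

from collections import ChainMap

cli_options = {'verbose': True}
user_config = {'verbose': False, 'editor': 'vim'}
system_config = {'verbose': False, 'editor': 'nano', 'encoding': 'utf-8'}

config = ChainMap(cli_options, user_config, system_config)
print(config['verbose'])   # True  -- the command line wins
print(config['editor'])    # vim   -- user config overrides the system default
print(config['encoding'])  # utf-8 -- falls through to the system config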

I'll take a crack at this:
ChainMap looks like a very just-so kind of abstraction. It's a good solution for a very specialized kind of problem. I propose this use case.
If you have:
1. multiple mappings (e.g., dicts)
2. some duplication of keys in those mappings (the same key can appear in multiple mappings, but it is not the case that all keys appear in all mappings)
3. a consuming application which wishes to access the value of a key from the "highest priority" mapping, where there is a total ordering over all the mappings for any given key (that is, mappings may have equal priority, but only if it is known that there are no duplicate keys within those mappings) (in Python itself, for example, packages can live in the same directory (same priority) but must have different names, so, by definition, the symbol names in that directory cannot be duplicates)
4. the consuming application does not need to change the value of a key
5. while at the same time the mappings must maintain their independent identity and can be changed asynchronously by an external force
6. and the mappings are big enough, expensive enough to access, or change often enough between application accesses, that the cost of computing the projection (3) each time your app needs it is a significant performance concern for your application...
Then,
you might consider using a ChainMap to create a view over the collection of mappings.
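A tiny sketch of what such a view buys you (the mapping names are made up; the "asynchronous" changes are simulated here with ordinary mutation):

from collections import ChainMap

live_a = {'x': 1}
live_b = {'x': 100, 'y': 2}
view = ChainMap(live_a, live_b)
print(view['x'], view['y'])  # 1 2

# The underlying mappings change independently of the view...
live_a['x'] = 42
live_b['y'] = 99

# ...and the view reflects the change immediately, with no re-merge cost:
print(view['x'], view['y'])  # 42 99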
But this is all after-the-fact justification. The Python guys had a problem, came up with a good solution in the context of their code, then did some extra work to abstract their solution so we could use it if we choose. More power to them. But whether it's appropriate for your problem is up to you to decide.

To imperfectly answer your:
Bonus question: is there a way to use it on Python2.x?
from ConfigParser import _Chainmap as ChainMap
However, keep in mind that this isn't a real ChainMap: it inherits from DictMixin and only defines:
__init__(self, *maps)
__getitem__(self, key)
keys(self)
# And from DictMixin:
__iter__(self)
has_key(self, key)
__contains__(self, key)
iteritems(self)
iterkeys(self)
itervalues(self)
values(self)
items(self)
clear(self)
setdefault(self, key, default=None)
pop(self, key, *args)
popitem(self)
update(self, other=None, **kwargs)
get(self, key, default=None)
__repr__(self)
__cmp__(self, other)
__len__(self)
Its implementation also doesn't seem particularly efficient.
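If that shim is too limited, a small read-only approximation is easy to write yourself for Python 2. This is only a sketch of the lookup behaviour, not a drop-in replacement for the 3.3 class (no writes, no new_child(), etc.):

class ReadOnlyChainMap(object):
    """Minimal read-only stand-in for collections.ChainMap on Python 2."""
    def __init__(self, *maps):
        self.maps = list(maps) or [{}]

    def __getitem__(self, key):
        # Search the maps in priority order and return the first hit.
        for mapping in self.maps:
            if key in mapping:
                return mapping[key]
        raise KeyError(key)

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        return any(key in m for m in self.maps)

    def __iter__(self):
        # Yield each key once, in the order of the maps.
        seen = set()
        for mapping in self.maps:
            for key in mapping:
                if key not in seen:
                    seen.add(key)
                    yield key

    def __len__(self):
        return sum(1 for _ in self)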


Why does the **kwargs mapping compare equal with a differently ordered OrderedDict?

According to PEP 468:
Starting in version 3.6 Python will preserve the order of keyword arguments as passed to a function. To accomplish this the collected kwargs will now be an ordered mapping. Note that this does not necessarily mean OrderedDict.
In that case, why does this ordered mapping fail to respect equality comparison with Python's canonical ordered mapping type, the collections.OrderedDict:
>>> from collections import OrderedDict
>>> data = OrderedDict(zip('xy', 'xy'))
>>> def foo(**kwargs):
...     return kwargs == data
...
>>> foo(x='x', y='y') # expected result: True
True
>>> foo(y='y', x='x') # expected result: False
True
Although iteration order is now preserved, kwargs seems to be behaving just like a normal dict for the comparisons. Python has a C implemented ordered dict since 3.5, so it could conceivably have been used directly (or, if performance was still a concern, a faster implementation using a thin subclass of the 3.6 compact dict).
Why doesn't the ordered mapping received by a function respect ordering in equality comparisons?
Regardless of what an "ordered mapping" means, as long as it isn't necessarily an OrderedDict, OrderedDict's == won't take its order into account. From the docs:
Equality tests between OrderedDict objects are order-sensitive and are implemented as list(od1.items())==list(od2.items()). Equality tests between OrderedDict objects and other Mapping objects are order-insensitive like regular dictionaries. This allows OrderedDict objects to be substituted anywhere a regular dictionary is used.
"Ordered mapping" only means the mapping has to preserve order. It doesn't mean that order has to be part of the mapping's == relation.
The purpose of PEP 468 is just to preserve the ordering information. Having order be part of == would produce backward incompatibility without any real benefit to any of the use cases that motivated PEP 468. Using OrderedDict would also be more expensive (since OrderedDict still maintains its own separate linked list to track order, and it can't abandon that linked list without sacrificing big-O efficiency in popitem and move_to_end).
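The distinction the docs draw is easy to see directly (a quick illustration, not taken from the question):

from collections import OrderedDict

od1 = OrderedDict([('x', 1), ('y', 2)])
od2 = OrderedDict([('y', 2), ('x', 1)])
plain = {'y': 2, 'x': 1}

print(od1 == od2)    # False -- OrderedDict vs OrderedDict is order-sensitive
print(od1 == plain)  # True  -- OrderedDict vs plain dict ignores order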
The answer to your first 'why' is that this feature is implemented using a plain dict in CPython. As @Ryan's answer points out, this means that comparisons won't be order-sensitive.
The second 'why' here is why this doesn't use an OrderedDict.
Using an OrderedDict was the initial plan as stated in the first draft of PEP 468. The idea, as stated in this reply, was to collect some perf data to show the effect of plugging in the OrderedDict since this was a point of contention when the idea was floated around before. The author of the PEP even alluded to the order-preserving dict being another option in the final reply on that thread.
After that, the conversation on the topic seems to have died down until Python 3.6 came along. When the new dict came, it had the nice side-effect of just implementing PEP 468 out of the box (as this Python-dev thread states). The specific message in that thread also states how the author wanted the term OrderedDict to be changed to Ordered Mapping. (This is also when a new commit on PEP 468, after the initial one, was made.)
As far as I can tell, this rewording was done in order to allow other implementations to provide this feature as they see fit. CPython and PyPy already had a dict that easily implemented PEP 468, other implementations might opt for an OrderedDict, others could go for another form of an ordered mapping.
That does open the door for a problem, though. It does mean that, theoretically, in an implementation of Python 3.6 with an OrderedDict as the structure implementing this feature, the comparison would be order-sensitive while in others (CPython) it wouldn't. (In Python 3.7, all dicts are required to be insertion-ordered so this point is probably moot since all implementations would use it for **kwargs)
Though it does seem like an issue, it really isn't. As @user2357112 pointed out, there's no guarantee on ==. PEP 468 only guarantees order. As far as I can tell, == is basically implementation-defined.
In short, it compares equal in CPython because kwargs in CPython is a dict and it's a dict because after 3.6 the whole thing just worked.
Just to add, if you do want to make this check (without relying on an implementation detail, which even then won't be one in Python 3.7), just do
>>> from collections import OrderedDict
>>> data = OrderedDict(zip('xy', 'xy'))
>>> def foo(**kwargs):
...     return OrderedDict(kwargs) == data
...
since this comparison is guaranteed to be order-sensitive.

Custom class which is a dict, but initialized without dict copy?

For legibility purposes, I would like to have a custom class that behaves exactly like a dict (but carries a meaningful type instead of the more general dict type):
class Derivatives(dict):
    "Dictionary that represents the derivatives."
Now, is there a way of building new objects of this class in a way that does not involve copies? The naive usage
derivs = Derivatives({var: 1}) # var is a Python object
in fact creates a copy of the dictionary passed as an argument, which I would like to avoid, for efficiency reasons.
I tried to bypass the copy but then the class of the dict cannot be changed, in CPython:
class Derivatives(dict):
    def __new__(cls, init_dict):
        init_dict.__class__ = cls  # Fails with "__class__ assignment: only for heap types"
        return init_dict
I would like to have both the ability to give an explicit class name to the dictionaries that the program manipulates and an efficient way of building such dictionaries (instead of being forced to copy a Python dict). Is this doable efficiently in Python?
PS: The use case is maybe 100,000 creations of single-key Derivatives, where the key is a variable (not a string, so no keyword initialization). This is actually not slow, so "efficiency reasons" here means something more like "elegance": there is ideally no need to waste time doing a copy when the copy is not needed. So, in this particular case the question is more about the elegance/clarity that Python can bring here than about running speed.
By inheriting from dict you are given three possibilities for constructor arguments (barring the {} literal):
class dict(**kwarg)
class dict(mapping, **kwarg)
class dict(iterable, **kwarg)
This means that, in order to instantiate your instance you must do one of the following:
Pass the variables as keywords D(x=1) which are then packed into an intermediate dictionary anyway.
Create a plain dictionary and pass it as a mapping.
Pass an iterable of (key,value) pairs.
So in all three of these cases you will need to create intermediate objects to satisfy the dict constructor.
With the third option, a single pair would look like D(((var, 1),)), which I highly recommend against for readability's sake.
So if you want your class to inherit from a dictionary, using Derivatives({var: 1}) is your most efficient and most readable option.
As a personal note, if you will have thousands of single-pair dictionaries, I'm not sure a dict-based setup is the best choice in the first place; you may want to reconsider the basis of your class.
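For concreteness, here is roughly what those construction routes look like for the question's single-pair case (var below is just a stand-in object, not the question's actual variable type):

class Derivatives(dict):
    "Dictionary that represents the derivatives."

var = object()  # stand-in for the non-string key

d1 = Derivatives({var: 1})     # pass a mapping -- copies the one-entry dict
d2 = Derivatives([(var, 1)])   # pass an iterable of (key, value) pairs
d3 = Derivatives(((var, 1),))  # same thing as a tuple -- hard to read
# Derivatives(var=1) is not an option: keyword argument names must be strings.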
TL;DR: There's no general-purpose way to do it unless you do it in C.
Long answer:
The dict class is implemented in C. Thus, there is no way to access its internal properties - most importantly, its internal hash table - unless you use C.
In C, you could simply copy the pointer representing the hash table into your object without having to iterate over the dict (key, value) pairs and insert them into your object. (Of course, it's a bit more complicated than this. Note that I omit memory management details).
Longer answer:
I'm not sure why you are concerned about efficiency.
Python passes arguments as references. It rarely ever copies unless you explicitly tell it to.
I read in the comments that you can't use named parameters, as the keys are actual Python objects. That leads me to understand that you're worried about copying the dict keys (and maybe values). However, even the dictionary keys are not copied; they are passed by reference! Consider this code:
class Test:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __hash__(self):
        return self.x

t = Test(1, 2)
print(t.y)  # prints 2

d = {t: 1}
print(d[t])  # prints 1

keys = list(d.keys())
keys[0].y = 10
print(t.y)  # prints 10! No copying was made when inserting the object into the dictionary.
Thus, the only remaining area of concern is iterating through the dict and inserting the values in your Derivatives class. This is unavoidable, unless you can somehow set the internal hash table of your class to the dict's internal hash table. There is no way to do this in pure python, as the dict class is implemented in C (as mentioned above).
Note that others have suggested using generators. This seems like a good idea too - say, if you were reading the derivatives from a file or if you were generating them with a simple formula. It would avoid creating the dict object in the first place. However, there will be no noticeable improvements in efficiency if the generators are just wrappers around lists (or any other data structure that can contain an arbitrary set of values).
Your best bet is to stick with your original method. Generators are great, but they can't efficiently represent an arbitrary set of values (which might be the case in your scenario). It's also not worth it to do it in C.
EDIT: It might be worth it to do it in C, after all!
I'm not too big on the details of the Python C API, but consider defining a class in C, for example, DerivativesBase (deriving from dict). All you do is define an __init__ function in C for DerivativesBase that takes a dict as a parameter and copies the hash table pointer from the dict into your DerivativesBase object. Then, in Python, your Derivatives class derives from DerivativesBase and implements the bulk of the functionality.

Sampling dictionaries in Python 3.x

In Python 3, dict_values, dict_keys and dict_items do not support indexing
my_dict = {'a': 0, 'b': 1, 'c': 2}
All of the queries below fail:
my_dict.keys()[1]
my_dict.values()[1]
my_dict.items()[1]
for that reason.
Sometimes I just want to get a random sample of what's in my dictionary. I know I can convert their output to lists. Do they have any other getter methods that do not require creating another data structure? (I would also imagine that converting them to a list would create a copy, which may not work well for huge dictionaries.)
The view types are explained under Dictionary view objects, and are also guaranteed to be subclasses of collections.abc.KeysView and friends. Basically, this means you can only count on them having __contains__, __iter__, and __len__.
They don't directly support indexing because their ordering can be invalidated.* But practically, in any implementation of Python, they're only actually invalidated if you mutate the dictionary. Which means you can safely do things like this:
next(itertools.islice(my_dict.keys(), i, None))
Basically, the same way you'd index a set, or any other non-iterator iterable.
* The actual rules as to what behavior is documented have changed a few times. The current version actually says "They provide a dynamic view on the dictionary’s entries, which means that when the dictionary changes, the view reflects these changes," which implies the practical rule can now be relied on. But even if you're using an older version that, e.g., explicitly only guarantees consistency between adjacent calls to keys, values, items, and related functions, unless you're worried about someone writing a new implementation of Python 2.6 or 3.1 or something, there's no reason to worry about that.
Of course you probably want to wrap that up in a function that's more readable. In fact, I'd do it in two steps. First, use the nth function from the itertools recipes:
import itertools

def nth(iterable, n, default=None):
    return next(itertools.islice(iterable, n, None), default)
Then wrap up the key indexing:
def getkey(mapping, index, default=None):
    return nth(mapping.keys(), index, default)
What if you want a random sample? Well, dictionary views are Sized, as are dictionaries themselves, so you can always use randrange:
import random

def choosekey(mapping):
    return getkey(mapping, random.randrange(len(mapping)))
If you just want a key, value or item, use next() and iter():
next(iter(my_dict))
next(iter(my_dict.values()))
next(iter(my_dict.items()))
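And if you want several random entries rather than a single one, a small sketch (note that random.sample here does materialise a list of keys, which is exactly the copy the question hopes to avoid for huge dicts):

import itertools
import random

my_dict = {'a': 0, 'b': 1, 'c': 2}

some_keys = random.sample(list(my_dict), 2)
some_items = random.sample(list(my_dict.items()), 2)

# A single random key without listing every key, in the spirit of nth()/choosekey() above:
i = random.randrange(len(my_dict))
random_key = next(itertools.islice(my_dict, i, None))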

How to glob for iterable element

I have a python dictionary that contains iterables, some of which are lists, but most of which are other dictionaries. I'd like to do glob-style assignment similar to the following:
myiter['*']['*.txt']['name'] = 'Woot'
That is, for each element in myiter, look up all elements with keys ending in '.txt' and then set their 'name' item to 'Woot'.
I've thought about sub-classing dict and using the fnmatch module. But, it's unclear to me what the best way of accomplishing this is.
The best way, I think, would be not to do it -- '*' is a perfectly valid key in a dict, so myiter['*'] has a perfectly well defined meaning and usefulness, and subverting that can definitely cause problems. How to "glob" over keys which are not strings, including the exclusively integer "keys" (indices) in elements which are lists and not mappings, is also quite a design problem.
If you nevertheless must do it, I would recommend taking full control by subclassing the abstract base class collections.MutableMapping, and implement the needed methods (__len__, __iter__, __getitem__, __setitem__, __delitem__, and, for better performance, also override others such as __contains__, which the ABC does implement on the base of the others, but slowly) in terms of a contained dict. Subclassing dict instead, as per other suggestions, would require you to override a huge number of methods to avoid inconsistent behavior between the use of "keys containing wildcards" in the methods you do override, and in those you don't.
Whether you subclass collections.MutableMapping, or dict, to make your Globbable class, you have to make a core design decision: what does yourthing[somekey] return when yourthing is a Globbable?
Presumably it has to return a different type when somekey is a string containing wildcards, versus anything else. In the latter case, one would imagine, just what is actually at that entry; but in the former, it can't just return another Globbable -- otherwise, what would yourthing[somekey] = 'bah' do in the general case? For your single "slick syntax" example, you want it to set a somekey entry in each of the items of yourthing (a HUGE semantic break with the behavior of every other mapping in the universe;-) -- but then, how would you ever set an entry in yourthing itself?!
Let's see if the Zen of Python has anything to say about this "slick syntax" for which you yearn...:
>>> import this
...
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Consider for a moment the alternative of losing the "slick syntax" (and all the huge semantic headaches it necessarily implies) in favor of clarity and simplicity (using Python 2.7-and-better syntax here, just for the dict comprehension -- use an explicit dict(...) call instead if you're stuck with 2.6 or earlier), e.g.:
import fnmatch

def match(s, pat):
    try: return fnmatch.fnmatch(s, pat)
    except TypeError: return False

def sel(ds, pat):
    return [d[k] for d in ds for k in d if match(k, pat)]

def set(ds, k, v):  # note: this shadows the built-in set()
    for d in ds: d[k] = v
so your assignment might become
set(sel(sel([myiter], '*'), '*.txt'), 'name', 'Woot')
(the selection with '*' being redundant if all keys match anyway, in which case it could simply be omitted). Is this so horrible as to be worth the morass of issues I've mentioned above in order to use instead
myiter['*']['*.txt']['name'] = 'Woot'
...? By far the clearest and best-performing way, of course, remains the even-simpler
def match(k, v, pat):
    try:
        if fnmatch.fnmatch(k, pat):
            return isinstance(v, dict)
    except TypeError:
        return False

for k, v in myiter.items():
    if match(k, v, '*'):
        for sk, sv in v.items():
            if match(sk, sv, '*.txt'):
                sv['name'] = 'Woot'
but if you absolutely crave conciseness and compactness, despising the Zen of Python's koan "Sparse is better than dense", you can at least obtain them without the various nightmares I mentioned as needed to achieve your ideal "syntax sugar".
The best way is to subclass dict and use the fnmatch module.
subclass dict: adding functionality you want in an object-oriented way.
fnmatch module: reuse of existing functionality.
You could use fnmatch for functionality to match on dictionary keys although you would have to compromise syntax slightly, especially if you wanted to do this on a nested dictionary. Perhaps a custom dictionary-like class with a search method to return wildcard matches would work well.
Here is a VERY BASIC example that comes with a warning that this is NOT RECURSIVE and will not handle nested dictionaries:
from fnmatch import fnmatch

class GlobDict(dict):
    def glob(self, match):
        """match should be a glob-style pattern (e.g. '*.txt')"""
        return dict([(k, v) for k, v in self.items() if fnmatch(k, match)])

# Start with a basic dict
basic_dict = {'file1.jpg': 'image', 'file2.txt': 'text', 'file3.mpg': 'movie',
              'file4.txt': 'text'}
# Create a GlobDict from it
glob_dict = GlobDict(**basic_dict)
# Then get glob-style results!
globbed_results = glob_dict.glob('*.txt')
# => {'file4.txt': 'text', 'file2.txt': 'text'}
As for what way is the best? The best way is the one that works. Don't try to optimize a solution before it's even created!
Following the principle of least magic, perhaps just define a recursive function, rather than subclassing dict:
import fnmatch
def set_dict_with_pat(it, key_patterns, value):
    if len(key_patterns) > 1:
        for key in it:
            if fnmatch.fnmatch(key, key_patterns[0]):
                set_dict_with_pat(it[key], key_patterns[1:], value)
    else:
        for key in it:
            if fnmatch.fnmatch(key, key_patterns[0]):
                it[key] = value
Which could be used like this:
myiter=({'dir1':{'a.txt':{'name':'Roger'},'b.notxt':{'name':'Carl'}},'dir2':{'b.txt':{'name':'Sally'}}})
set_dict_with_pat(myiter,['*','*.txt','name'],'Woot')
print(myiter)
# {'dir2': {'b.txt': {'name': 'Woot'}}, 'dir1': {'b.notxt': {'name': 'Carl'}, 'a.txt': {'name': 'Woot'}}}

Is python's "set" stable?

The question arose when answering to another SO question (there).
When I iterate several times over a Python set (without changing it between calls), can I assume it will always return elements in the same order? And if not, what is the rationale for changing the order? Is it deterministic, or random? Or implementation defined?
And when I call the same Python program repeatedly (not random, not input dependent), will I get the same ordering for sets?
The underlying question is whether Python's set iteration order depends only on the algorithm used to implement sets, or also on the execution context.
There's no formal guarantee about the stability of sets. However, in the CPython implementation, as long as nothing changes the set, the items will be produced in the same order. Sets are implemented as open-addressing hashtables (with a prime probe), so inserting or removing items can completely change the order (in particular, when that triggers a resize, which reorganizes how the items are laid out in memory.) You can also have two identical sets that nonetheless produce the items in different order, for example:
>>> s1 = {-1, -2}
>>> s2 = {-2, -1}
>>> s1 == s2
True
>>> list(s1), list(s2)
([-1, -2], [-2, -1])
Unless you're very certain you have the same set and nothing touched it in between the two iterations, it's best not to rely on it staying the same. Making seemingly irrelevant changes to, say, functions you call in between could produce very hard-to-find bugs.
A set or frozenset is inherently an unordered collection. Internally, sets are based on a hash table, and the order of keys depends both on the insertion order and on the hash algorithm. In CPython (aka standard Python), integers less than the machine word size (32 bit or 64 bit) hash to themselves, but text strings, byte strings, and datetime objects hash to integers that vary randomly; you can control that by setting the PYTHONHASHSEED environment variable.
From the __hash__ docs:
Note
By default, the __hash__() values of str, bytes and datetime objects are "salted" with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.
This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.
Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).
See also PYTHONHASHSEED.
The results of hashing objects of other classes depend on the details of the class's __hash__ method.
The upshot of all this is that you can have two sets containing identical strings but when you convert them to lists they can compare unequal. Or they may not. ;) Here's some code that demonstrates this. On some runs, it will just loop, not printing anything, but on other runs it will quickly find a set that uses a different order to the original.
from random import seed, shuffle
seed(42)
data = list('abcdefgh')
a = frozenset(data)
la = list(a)
print(''.join(la), a)
while True:
    shuffle(data)
    lb = list(frozenset(data))
    if lb != la:
        print(''.join(data), ''.join(lb))
        break
typical output
dachbgef frozenset({'d', 'a', 'c', 'h', 'b', 'g', 'e', 'f'})
deghcfab dahcbgef
And when I call the same Python program repeatedly (not random, not input dependent), will I get the same ordering for sets?
I can answer this part of the question now after a quick experiment. Using the following code:
class Foo(object):
    def __init__(self, val):
        self.val = val
    def __repr__(self):
        return str(self.val)

x = set()
for y in range(500):
    x.add(Foo(y))
print list(x)[-10:]
I can trigger the behaviour that I was asking about in the other question. If I run this repeatedly then the output changes, but not on every run. It seems to be "weakly random" in that it changes slowly. This is certainly implementation dependent, so I should say that I'm running the MacPorts Python 2.6 on Snow Leopard. While the program will output the same answer for long runs of time, doing something that affects the system entropy pool (writing to the disk mostly works) will sometimes kick it into a different output.
The class Foo is just a simple int wrapper as experiments show that this doesn't happen with sets of ints. I think that the problem is caused by the lack of __eq__ and __hash__ members for the object, although I would dearly love to know the underlying explanation / ways to avoid it. Also useful would be some way to reproduce / repeat a "bad" run. Does anyone know what seed it uses, or how I could set that seed?
It’s definitely implementation defined. The specification of a set says only that
Being an unordered collection, sets do not record element position or order of insertion.
Why not use OrderedDict to create your own OrderedSet class?
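In case it helps, that suggestion boils down to something like this minimal sketch (the names are mine; a fuller version would implement collections.abc.MutableSet and the set operators):

from collections import OrderedDict

class OrderedSet(object):
    """Tiny insertion-ordered set sketch backed by an OrderedDict."""
    def __init__(self, iterable=()):
        self._data = OrderedDict.fromkeys(iterable)

    def add(self, item):
        self._data[item] = None

    def discard(self, item):
        self._data.pop(item, None)

    def __contains__(self, item):
        return item in self._data

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

print(list(OrderedSet('abracadabra')))  # ['a', 'b', 'r', 'c', 'd'] -- stable insertion order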
The answer is simply a NO.
Python set operation is NOT stable.
I did a simple experiment to show this.
The code:
import random
random.seed(1)

x = []

class aaa(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

for i in range(5):
    x.append(aaa(random.choice('asf'), random.randint(1, 4000)))

for j in x:
    print(j.a, j.b)

print('====')

for j in set(x):
    print(j.a, j.b)
Run this twice, and you will get this:
First time result:
a 2332
a 1045
a 2030
s 1935
f 1555
====
a 2030
a 2332
f 1555
a 1045
s 1935
Process finished with exit code 0
Second time result:
a 2332
a 1045
a 2030
s 1935
f 1555
====
s 1935
a 2332
a 1045
f 1555
a 2030
Process finished with exit code 0
The reason is explained in comments in this answer.
However, there are some ways to make it stable:
set PYTHONHASHSEED to 0, see details here, here and here.
Use OrderedDict instead.
As pointed out, this is strictly an implementation detail.
But as long as you don’t change the structure between calls, there should be no reason for a read-only operation (= iteration) to change with time: no sane implementation does that. Even randomized (= non-deterministic) data structures that can be used to implement sets (e.g. skip lists) don’t change the reading order when no changes occur.
So, being rational, you can safely rely on this behaviour.
(I’m aware that certain GCs may reorder memory in a background thread but even this reordering will not be noticeable on the level of data structures, unless a bug occurs.)
The definition of a set is unordered, unique elements ("Unordered collections of unique elements"). You should care only about the interface, not the implementation. If you want an ordered enumeration, you should probably put it into a list and sort it.
There are many different implementations of Python. Don't rely on undocumented behaviour, as your code could break on different Python implementations.
