When CPython set `in` operator is O(n)?

When CPython set `in` operator is O(n)? - python

I was reading about the time complexity of set operations in CPython and learned that the in operator for sets has the average time complexity of O(1) and worst case time complexity of O(n). I also learned that the worst case wouldn't occur in CPython unless the set's hash table's load factor is too high.
This made me wonder, when such a case would occur in the CPython implementation? Is there a simple demo code, which shows a set with clearly observable O(n) time complexity of the in operator?

Load factor is a red herring. In CPython sets (and dicts) automatically resize to keep the load factor under 2/3. There's nothing you can do in Python code to stop that.
O(N) behavior can occur when a great many elements have exactly the same hash code. Then they map to the same hash bucket, and set lookup degenerates to a slow form of linear search.
The easiest way to contrive such bad elements is to create a class with a horrible hash function. Like, e.g., and untested:
class C:
def __init__(self, val):
self.val = val
def __eq__(a, b):
return a.val == b.val
def __hash__(self):
return 3
Then hash(C(i)) == 3 regardless of the value of i.
To do the same with builtin types requires deep knowledge of their CPython implementation details. For example, here's a way to create an arbitrarily large number of distinct ints with the same hash code:
>>> import sys
>>> M = sys.hash_info.modulus
>>> set(hash(1 + i*M) for i in range(10000))
{1}
which shows that the ten thousand distinct ints created all have hash code 1.

You can view the set source here which can help: https://github.com/python/cpython/blob/723f71abf7ab0a7be394f9f7b2daa9ecdf6fb1eb/Objects/setobject.c#L429-L441
It's difficult to devise a specific example but the theory is fairly simple luckily :)
The set stores the keys using a hash of the value, as long as that hash is unique enough you'll end up with the O(1) performance as expected.
If for some weird reason all of your items have different data but the same hash, it collides and it will have to check all of them separately.
To illustrate, you can see the set as a dict like this:
import collection
your_set = collection.defaultdict(list)
def add(value):
your_set[hash(value)].append(value)
def contains(value):
# This is where your O(n) can occur, all values the same hash()
values = your_set.get(hash(value), [])
for v in values:
if v == value:
return True
return False

This a sometimes called the 'amortization' of a set or dictionary. It's shows up now and then as an interview question. As #TimPeters says resizing happens automagically at 2/3 capacity, so you'll only hit O(n) if you force the hash, yourself.
In computer science, amortized analysis is a method for analyzing a given algorithm's complexity, or how much of a resource, especially time or memory, it takes to execute. The motivation for amortized analysis is that looking at the worst-case run time per operation, rather than per algorithm, can be too pessimistic.
`/* GROWTH_RATE. Growth rate upon hitting maximum load.
* Currently set to used*3.
* This means that dicts double in size when growing without deletions,
* but have more head room when the number of deletions is on a par with the
* number of insertions. See also bpo-17563 and bpo-33205.
*
* GROWTH_RATE was set to used*4 up to version 3.2.
* GROWTH_RATE was set to used*2 in version 3.3.0
* GROWTH_RATE was set to used*2 + capacity/2 in 3.4.0-3.6.0.
*/
#define GROWTH_RATE(d) ((d)->ma_used*3)`
More to the efficiency point. Why 2/3 ? The Wikipedia article has a nice graph
https://upload.wikimedia.org/wikipedia/commons/1/1c/Hash_table_average_insertion_time.png
accompanying the article . (linear probing curve corresponds to O(1) to O(n) for our purposes, chaining is a more complicated hashing approach)
See https://en.wikipedia.org/wiki/Hash_table
for the complete
Say you have a set or dictionary which is stable, and is at 2/3 - 1 of it underlying capacity. Do you really want sluggish performance forever? You may wish to force resizing it upwards.
"if the keys are always known in advance, you can store them in a set and build your dictionaries from the set using dict.fromkeys()." plus some other useful if dated observations. Improving performance of very large dictionary in Python
For a good read on dictresize(): (dict was in Python before set)
https://github.com/python/cpython/blob/master/Objects/dictobject.c#L415

Related

BIG O time complexity of TSP algorithms

I've written 2 nearest neighbor algorithms in python and I have to analyize the runtime complexity by O(n) and Θ(n).
So I've tried several samples and I don't understand why one of my algorithm is faster than the other one.
So here is my Code for the repeated nearest neighbor (RNN) algorithm:
def repeated_nn_tsp(cities):
return shortest_tour(nn_tsp(cities, start) for start in cities)
def shortest_tour(self, tours):
return min(tours, key=self.tour_length)
nn_tsp has a runtime complexity of O(n^2) and every startpoint will create a new NN Tour. Through all NN tours I have to find the best tour.
That's why I think the time complexity of the RNN has to be T(n)=O(n^3) and T(n)=Θ(n^3).
So here is my Code for the altered nearest neighbor (ANN) algorithm:
def alter_tour(tour):
original_length = tour_length(tour)
for (start, end) in all_segments(len(tour)):
reverse_segment_if_better(tour, start, end)
if tour_length(tour) < original_length:
return alter_tour(tour)
return tour
def all_segments(N):
return [(start, start + length) for length in range(N, 2-1, -1) for start in range(N - length + 1)]
def reverse_segment_if_better(tour, i, j):
A, B, C, D = tour[i-1], tour[i], tour[j-1], tour[j % len(tour)]
if distance(A, B) + distance(C, D) > distance(A, C) + distance(B, D):
tour[i:j] = reversed(tour[i:j])
The time complexity of all_segments should be T(n) = O(1/2 * n^2 - 0.5n) -> O(n^2) and creates n^2 elements.
Inside the Loop through all_segments (through n^2 elements) I call the function reverse_segment_if_better. I'll use the reversed method of python, which causes a time complexity of O(n).
That's why I think the time complexity of the loop has to be O(n^3). When there's a better tour, the function will call itself recursive. I think the outcome of the altered NN has a time complexity of O(n^4). Is that right?
But here we come to my problem: My evaluation, which runs the code 100times over 100cities, shows me that ANN is faster than RNN on average which is the opposite of the runtime complexity I expected. (RNN needs 4.829secs and ANN only needs 0.877secs for 1x 100-city.)
So where did I make a mistake?
Thanks in advance!

First I must say that time-complexity and big-o notations are not always on point, one algorithm may have a 'better' running-time function but still would run slower than expected, or slower than another function with a worst running-time function, in your case, it is very hard to determine what is the worst case to feed the algorithm, and we cannot assure you have done that! Maybe the cases were 'pleasant' with the ANN algorithm while the other one got stuck somewhere..? this is why it is not always 100% correct to rely only on the running time function we calculate.
What I am trying to say, is that you most probably did not make a mistake in your calculations on purpose, because they are hard functions to analyze on the fly, or What kind of input would be the worst, for example
As for the 'why?':
When talking about actual personal running time (as you gave an example of 0.877seconds), it boils down to our own machines, each computer has its own running hardware behind the curtains, not all computers are born the same.
Secondly, when we talk about running time complexity, we drop the low term values as you did with the all_segments function, you can see that you even dropped a negative term which in theory would help us reduce the numbers of 'operations'.
There are many cases in which there is a bit of code not-so efficient, that we bother to execute only if a specific criteria is met, thus reducing the running time.
Last and most importantly is the fact that when we talk about classifying
algorithms into sets such as O(n) or O(nlogn) we are talking about
asymptotic functions, we need to look at the bigger picture and see
what happens when we feed the algorithm very large amount of data,
which I assume you didn't check, because as you wrote, you ran only 100 cities. That may
vary if we would look at let's say, millions and millions of cities.
For your code, I can notice multiple parts that would reasonably be the cause of this 'weird' difference in the running time. The first, is that in theANN code, more specifically in the reverse_segment_if_better function, we are not always reversing the list, only if a certain statement is evaluated to a truthy value. We cannot be sure what kind of input you've given the algorithm, and thus I have only to imagine it is compliant with the second algorithm.
Moreover, it may be that I am missing something (as the function reverse_segment_if_better / we cannot view the function tour_length or distance) but I don't see how you came up with O(n^4) at the end, it seems like it is doing O(n^3):
all_segments- no doubt it is O(n) - returning ~n/2 values
The tricky part is analyzing reverse_segment_if_better and alter_tour - reversing only occurs from i:j thus it is not strictly correct to say it has O(n) - as we do not reverse the whole tour (at least, not for every value of start, end.
It is safe to say that it may be the case of not checking for very large numbers asymptotically, you gave an input and it was kind to this specific algorithm, or the final form of T(n) was not strict enough.

Why is `word == word[::-1]` to test for palindrome faster than a more algorithmic solution in Python?

I wrote a disaster of a question on Code Review asking why Python programmers normally test if a string is a palindrome by comparing the string to itself reversed, instead of a more algorithmic way with lower complexity, assuming that the normal way would be faster.
Here is the pythonic way:
def is_palindrome_pythonic(word):
# The slice requires N operations, plus memory
# and the equality requires N operations in the worst case
return word == word[::-1]
Here is my attempt at a more efficient way to accomplish this:
def is_palindrome_normal(word):
# This requires N/2 operations in the worst case
low = 0
high = len(word) - 1
while low < high:
if word[low] != word[high]:
return False
low += 1
high -= 1
return True
I would expect the normal way would be faster than the pythonic way. See for example this great article
Timing it with timeit, however, brought exactly the opposite result:
setup = '''
def is_palindrome_pythonic(word):
# ...
def is_palindrome_normal(word):
# ...
# N here is 2000
first_half = ''.join(map(str, (i for i in range(1000))))
word = first_half + first_half[::-1]
'''
timeit.timeit('is_palindrome_pythonic(word)', setup=setup, number=1000)
# 0.0052
timeit.timeit('is_palindrome_normal(word)', setup=setup, number=1000)
# 0.4268
I then figured that my n was too small, so I changed the length of word from 2000 to 2,000,000. The pythonic way took about 16 seconds on average, whereas the normal way ran several minutes before I canceled it.
Incidentally, in the best case scenario, where the very first letter does not match the very last letter, the normal algorithm was much faster.
What explains the extreme difference between the speeds of the two algorithms?

Because the "Pythonic" way with slicing is implemented in C. The interpreter / VM doesn't need to execute more than approximately once. The bulk of the algorithm is spent in a tight loop of native code.

As much as I love Python, I have to say that if you want maximum speed you probably shouldn't be using Python. ;)
The rule of thumb in Python time optimization is to use operators or module functions that do the bulk of the work at C speed rather than equivalent code running at Python speed. Even if the two equivalent approaches are using algorithms with the same big-O complexity, the time scaling factor of (mostly) running directly on the CPU vs running on the Python virtual machine has a big impact.
This is even true of an algorithm that's mostly just integer arithmetic, since Python integers are immutable objects, so when you do arithmetic there's the overhead of allocating and initialising a new integer object and disposing of the old one. CPython tries to be frugal, and is pretty smart at managing memory (so every new object doesn't require a system call to allocate memory), and of course the CPython interpreter maintains a cache of integers from -5 to 256 (inclusive) so that arithmetic with small numbers isn't so bad. But it's certainly slower than doing arithmetic at C speed with machine integers.
You can see the difference even with a simple counting loop. On my admittedly ancient 32 bit machine running Python 3.6, using the Bash time command to do the timings,
m = 5000000
for i in range(m):
i
is roughly twice as fast as
m = 5000000
i = 0
while i<m:
i += 1
because range can do the arithmetic at C speed, even though it still has to create a new integer object on each iteration. If you replace the i line in the range version with pass the time is roughly halved.
With more complicated algorithms the time differences can be much more significant, eg string or list copying that happens at the C level can often be done with efficient CPU operators that are much faster than chugging along on the Python virtual machine with Python code.
I agree that this can take a while to get used to if you come from a language that gets compiled to native machine code. And I admit that even after over 10 years of using Python it still feels a little weird to me that when (for example) you need to do some bit manipulation stuff that it can often be faster in Python to do it using string operations on a string composed of '0's and '1's that to do it using the traditional bitwise and arithmetic integer operators.
OTOH, I think it's useful to know the traditional algorithms as well as the Pythonic ones. It's rare that a programmer will work only in Python, so it's good to know how to do things in languages that don't work the way that Python does.

LFU cache implementation in python

I have implemented LFU cache in python with the help of Priority Queue Implementation given at
https://docs.python.org/2/library/heapq.html#priority-queue-implementation-notes
I have given code in the end of the post.
But I feel that code has some serious problems:
1. To give a scenario, suppose there is only one page is continuously getting visited (say 50 times). But this code will always mark the already added node as "removed" and add it to heap again. So basically it will have 50 different nodes for the same page. Hence increasing heap size enormously.
2. This question is almost similar to Q1 of Telephonic Interview of
http://www.geeksforgeeks.org/flipkart-interview-set-2-sde-2/
And the person mentioned that doubly linked list can give better efficiency as compared to heap. Can anyone explain me, how?
from llist import dllist
import sys
from heapq import heappush, heappop
class LFUCache:
heap = []
cache_map = {}
REMOVED = "<removed-task>"
def __init__(self, cache_size):
self.cache_size = cache_size
def get_page_content(self, page_no):
if self.cache_map.has_key(page_no):
self.update_frequency_of_page_in_cache(page_no)
else:
self.add_page_in_cache(page_no)
return self.cache_map[page_no][2]
def add_page_in_cache(self, page_no):
if (len(self.cache_map) == self.cache_size):
self.delete_page_from_cache()
heap_node = [1, page_no, "content of page " + str(page_no)]
heappush(self.heap, heap_node)
self.cache_map[page_no] = heap_node
def delete_page_from_cache(self):
while self.heap:
count, page_no, page_content = heappop(self.heap)
if page_content is not self.REMOVED:
del self.cache_map[page_no]
return
def update_frequency_of_page_in_cache(self, page_no):
heap_node = self.cache_map[page_no]
heap_node[2] = self.REMOVED
count = heap_node[0]
heap_node = [count+1, page_no, "content of page " + str(page_no)]
heappush(self.heap, heap_node)
self.cache_map[page_no] = heap_node
def main():
cache_size = int(raw_input("Enter cache size "))
cache = LFUCache(cache_size)
while 1:
page_no = int(raw_input("Enter page no needed "))
print cache.get_page_content(page_no)
print cache.heap, cache.cache_map, "\n"
if __name__ == "__main__":
main()

Efficiency is a tricky thing. In real-world applications, it's often a good idea to use the simplest and easiest algorithm, and only start to optimize when that's measurably slow. And then you optimize by doing profiling to figure out where the code is slow.
If you are using CPython, it gets especially tricky, as even an inefficient algorithm implemented in C can beat an efficient algorithm implemented in Python due to the large constant factors; e.g. a double-linked list implemented in Python tends to be a lot slower than simply using the normal Python list, even for cases where in theory it should be faster.
Simple algorithm:
For an LFU, the simplest algorithm is to use a dictionary that maps keys to (item, frequency) objects, and update the frequency on each access. This makes access very fast (O(1)), but pruning the cache is slower as you need to sort by frequency to cut off the least-used elements. For certain usage characteristics, this is actually faster than other "smarter" solutions, though.
You can optimize for this pattern by not simply pruning your LFU cache to the maximum length, but to prune it to, say, 50% of the maximum length when it grows too large. That means your prune operation is called infrequently, so it can be inefficient compared to the read operation.
Using a heap:
In (1), you used a heap because that's an efficient way of storing a priority queue. But you are not implementing a priority queue. The resulting algorithm is optimized for pruning, but not access: You can easily find the n smallest elements, but it's not quite as obvious how to update the priority of an existing element. In theory, you'd have to rebalance the heap after every access, which is highly inefficient.
To avoid that, you added a trick by keeping elements around even if they are deleted. But this trades in space for time.
If you don't want to trade in time, you could update the frequencies in-place, and simply rebalance the heap before pruning the cache. You regain fast access times at the expense of slower pruning time, like the simple algorithm above. (I doubt there is any speed difference between the two, but I have not measured this.)
Using a double-linked list:
The double-linked list mentioned in (2) takes advantage of the nature of the possible changes here: An element is either added as the lowest priority (0 accesses), or an existing element's priority is incremented exactly by 1. You can use these attributes to your advantage if you design your data structures like this:
You have a double-linked list of elements which is ordered by the frequency of the elements. In addition, you have a dictionary that maps items to elements within that list.
Accessing an element then means:
Either it's not in the dictionary, that is, it's a new item, in which case you can simply append it to the end of the double-linked list (O(1))
or it's in the dictionary, in which case you increment the frequency in the element and move it leftwards through the double-linked list until the list is ordered again (O(n) worst-case, but usually closer to O(1)).
To prune the cache, you simply cut off n elements from the end of the list (O(n)).

How to optimize operations on large (75,000 items) sets of booleans in Python?

There's this script called svnmerge.py that I'm trying to tweak and optimize a bit. I'm completely new to Python though, so it's not easy.
The current problem seems to be related to a class called RevisionSet in the script. In essence what it does is create a large hashtable(?) of integer-keyed boolean values. In the worst case - one for each revision in our SVN repository, which is near 75,000 now.
After that it performs set operations on such huge arrays - addition, subtraction, intersection, and so forth. The implementation is the simplest O(n) implementation, which, naturally, gets pretty slow on such large sets. The whole data structure could be optimized because there are long spans of continuous values. For example, all keys from 1 to 74,000 might contain true. Also the script is written for Python 2.2, which is a pretty old version and we're using 2.6 anyway, so there could be something to gain there too.
I could try to cobble this together myself, but it would be difficult and take a lot of time - not to mention that it might be already implemented somewhere. Although I'd like the learning experience, the result is more important right now. What would you suggest I do?

You could try doing it with numpy instead of plain python. I found it to be very fast for operations like these.
For example:
# Create 1000000 numbers between 0 and 1000, takes 21ms
x = numpy.random.randint(0, 1000, 1000000)
# Get all items that are larger than 500, takes 2.58ms
y = x > 500
# Add 10 to those items, takes 26.1ms
x[y] += 10
Since that's with a lot more rows, I think that 75000 should not be a problem either :)

Here's a quick replacement for RevisionSet that makes it into a set. It should be much faster. I didn't fully test it, but it worked with all of the tests that I did. There are undoubtedly other ways to speed things up, but I think that this will really help because it actually harnesses the fast implementation of sets rather than doing loops in Python which the original code was doing in functions like __sub__ and __and__. The only problem with it is that the iterator isn't sorted. You might have to change a little bit of the code to account for this. I'm sure there are other ways to improve this, but hopefully it will give you a good start.
class RevisionSet(set):
"""
A set of revisions, held in dictionary form for easy manipulation. If we
were to rewrite this script for Python 2.3+, we would subclass this from
set (or UserSet). As this class does not include branch
information, it's assumed that one instance will be used per
branch.
"""
def __init__(self, parm):
"""Constructs a RevisionSet from a string in property form, or from
a dictionary whose keys are the revisions. Raises ValueError if the
input string is invalid."""
revision_range_split_re = re.compile('[-:]')
if isinstance(parm, set):
print "1"
self.update(parm.copy())
elif isinstance(parm, list):
self.update(set(parm))
else:
parm = parm.strip()
if parm:
for R in parm.split(","):
rev_or_revs = re.split(revision_range_split_re, R)
if len(rev_or_revs) == 1:
self.add(int(rev_or_revs[0]))
elif len(rev_or_revs) == 2:
self.update(set(range(int(rev_or_revs[0]),
int(rev_or_revs[1])+1)))
else:
raise ValueError, 'Ill formatted revision range: ' + R
def sorted(self):
return sorted(self)
def normalized(self):
"""Returns a normalized version of the revision set, which is an
ordered list of couples (start,end), with the minimum number of
intervals."""
revnums = sorted(self)
revnums.reverse()
ret = []
while revnums:
s = e = revnums.pop()
while revnums and revnums[-1] in (e, e+1):
e = revnums.pop()
ret.append((s, e))
return ret
def __str__(self):
"""Convert the revision set to a string, using its normalized form."""
L = []
for s,e in self.normalized():
if s == e:
L.append(str(s))
else:
L.append(str(s) + "-" + str(e))
return ",".join(L)
Addition:
By the way, I compared doing unions, intersections and subtractions of the original RevisionSet and my RevisionSet above, and the above code is from 3x to 7x faster for those operations when operating on two RevisionSets that have 75000 elements. I know that other people are saying that numpy is the way to go, but if you aren't very experienced with Python, as your comment indicates, then you might not want to go that route because it will involve a lot more changes. I'd recommend trying my code, seeing if it works and if it does, then see if it is fast enough for you. If it isn't, then I would try profiling to see what needs to be improved. Only then would I consider using numpy (which is a great package that I use quite frequently).

For example, all keys from 1 to 74,000 contain true
Why not work on a subset? Just 74001 to the end.
Pruning 74/75th of your data is far easier than trying to write an algorithm more clever than O(n).

You should rewrite RevisionSet to have a set of revisions. I think the internal representation for a revision should be an integer and revision ranges should be created as needed.
There is no compelling reason to use code that supports python 2.3 and earlier.

Just a thought. I used to do this kind of thing using run-coding in binary image manipulation. That is, store each set as a series of numbers: number of bits off, number of bits on, number of bits off, etc.
Then you can do all sorts of boolean operations on them as decorations on a simple merge algorithm.

Is python's "set" stable?

The question arose when answering to another SO question (there).
When I iterate several times over a python set (without changing it between calls), can I assume it will always return elements in the same order? And if not, what is the rationale of changing the order ? Is it deterministic, or random? Or implementation defined?
And when I call the same python program repeatedly (not random, not input dependent), will I get the same ordering for sets?
The underlying question is if python set iteration order only depends on the algorithm used to implement sets, or also on the execution context?

There's no formal guarantee about the stability of sets. However, in the CPython implementation, as long as nothing changes the set, the items will be produced in the same order. Sets are implemented as open-addressing hashtables (with a prime probe), so inserting or removing items can completely change the order (in particular, when that triggers a resize, which reorganizes how the items are laid out in memory.) You can also have two identical sets that nonetheless produce the items in different order, for example:
>>> s1 = {-1, -2}
>>> s2 = {-2, -1}
>>> s1 == s2
True
>>> list(s1), list(s2)
([-1, -2], [-2, -1])
Unless you're very certain you have the same set and nothing touched it inbetween the two iterations, it's best not to rely on it staying the same. Making seemingly irrelevant changes to, say, functions you call inbetween could produce very hard to find bugs.

A set or frozenset is inherently an unordered collection. Internally, sets are based on a hash table, and the order of keys depends both on the insertion order and on the hash algorithm. In CPython (aka standard Python) integers less than the machine word size (32 bit or 64 bit) hash to themself, but text strings, bytes strings, and datetime objects hash to integers that vary randomly; you can control that by setting the PYTHONHASHSEED environment variable.
From the __hash__ docs:
Note
By default, the __hash__() values of str, bytes and datetime
objects are “salted” with an unpredictable random value. Although they
remain constant within an individual Python process, they are not
predictable between repeated invocations of Python.
This is intended to provide protection against a denial-of-service
caused by carefully-chosen inputs that exploit the worst case
performance of a dict insertion, O(n^2) complexity. See
http://www.ocert.org/advisories/ocert-2011-003.html for details.
Changing hash values affects the iteration order of dicts, sets and
other mappings. Python has never made guarantees about this ordering
(and it typically varies between 32-bit and 64-bit builds).
See also PYTHONHASHSEED.
The results of hashing objects of other classes depend on the details of the class's __hash__ method.
The upshot of all this is that you can have two sets containing identical strings but when you convert them to lists they can compare unequal. Or they may not. ;) Here's some code that demonstrates this. On some runs, it will just loop, not printing anything, but on other runs it will quickly find a set that uses a different order to the original.
from random import seed, shuffle
seed(42)
data = list('abcdefgh')
a = frozenset(data)
la = list(a)
print(''.join(la), a)
while True:
shuffle(data)
lb = list(frozenset(data))
if lb != la:
print(''.join(data), ''.join(lb))
break
typical output
dachbgef frozenset({'d', 'a', 'c', 'h', 'b', 'g', 'e', 'f'})
deghcfab dahcbgef

And when I call the same python
program repeatedly (not random, not
input dependent), will I get the same
ordering for sets?
I can answer this part of the question now after a quick experiment. Using the following code:
class Foo(object) :
def __init__(self,val) :
self.val = val
def __repr__(self) :
return str(self.val)
x = set()
for y in range(500) :
x.add(Foo(y))
print list(x)[-10:]
I can trigger the behaviour that I was asking about in the other question. If I run this repeatedly then the output changes, but not on every run. It seems to be "weakly random" in that it changes slowly. This is certainly implementation dependent so I should say that I'm running the macports Python2.6 on snow-leopard. While the program will output the same answer for long runs of time, doing something that affects the system entropy pool (writing to the disk mostly works) will somethimes kick it into a different output.
The class Foo is just a simple int wrapper as experiments show that this doesn't happen with sets of ints. I think that the problem is caused by the lack of __eq__ and __hash__ members for the object, although I would dearly love to know the underlying explanation / ways to avoid it. Also useful would be some way to reproduce / repeat a "bad" run. Does anyone know what seed it uses, or how I could set that seed?

It’s definitely implementation defined. The specification of a set says only that
Being an unordered collection, sets do not record element position or order of insertion.
Why not use OrderedDict to create your own OrderedSet class?

The answer is simply a NO.
Python set operation is NOT stable.
I did a simple experiment to show this.
The code:
import random
random.seed(1)
x=[]
class aaa(object):
def __init__(self,a,b):
self.a=a
self.b=b
for i in range(5):
x.append(aaa(random.choice('asf'),random.randint(1,4000)))
for j in x:
print(j.a,j.b)
print('====')
for j in set(x):
print(j.a,j.b)
Run this for twice, you will get this:
First time result:
a 2332
a 1045
a 2030
s 1935
f 1555
====
a 2030
a 2332
f 1555
a 1045
s 1935
Process finished with exit code 0
Second time result:
a 2332
a 1045
a 2030
s 1935
f 1555
====
s 1935
a 2332
a 1045
f 1555
a 2030
Process finished with exit code 0
The reason is explained in comments in this answer.
However, there are some ways to make it stable:
set PYTHONHASHSEED to 0, see details here, here and here.
Use OrderedDict instead.

As pointed out, this is strictly an implementation detail.
But as long as you don’t change the structure between calls, there should be no reason for a read-only operation (= iteration) to change with time: no sane implementation does that. Even randomized (= non-deterministic) data structures that can be used to implement sets (e.g. skip lists) don’t change the reading order when no changes occur.
So, being rational, you can safely rely on this behaviour.
(I’m aware that certain GCs may reorder memory in a background thread but even this reordering will not be noticeable on the level of data structures, unless a bug occurs.)

The definition of a set is unordered, unique elements ("Unordered collections of unique elements"). You should care only about the interface, not the implementation. If you want an ordered enumeration, you should probably put it into a list and sort it.
There are many different implementations of Python. Don't rely on undocumented behaviour, as your code could break on different Python implementations.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.