I need a set-like data structure with these properties:
hashable
no duplicate elements
maintains order
immutable
iterable
part of the standard library? I want to keep it simple
What is happening:
frozenset([3,1,2,2,3]) -> frozenset({1, 2, 3})
What I need:
frozenset*([3,1,2,2,3]) -> frozenset*(3,1,2)
I thought I could use frozenset but both sets and frozensets
reorder elements. I assume this is for faster duplicate checks?
But in any case I can't have a reordering.
As of Python 3.7 dicts no longer reorder elements and instead guarantee to preserve insertion order. You could use a dict where the keys are your set items and the values are ignored.
>>> dict.fromkeys([3,1,2,2,3])
{3: None, 1: None, 2: None}
Dicts aren't frozen, so if that's crucial then you could first put all the items into a dict and then build a tuple from the keys.
>>> tuple(dict.fromkeys([3,1,2,2,3]).keys())
(3, 1, 2)
This would be pretty close to a frozenset. The main difference is that it would take O(n) rather than O(1) time to check if an item is in the tuple.
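As a minimal sketch of the approach above, the dict-then-tuple step can be wrapped in a small helper. The name ordered_frozen is made up for illustration and is not part of the standard library; the sketch assumes Python 3.7+ so that dict preserves insertion order.

```python
def ordered_frozen(items):
    """Return the unique items as a tuple, in first-seen order.

    Hypothetical helper: relies on dict preserving insertion order (Python 3.7+).
    """
    return tuple(dict.fromkeys(items))


result = ordered_frozen([3, 1, 2, 2, 3])
print(result)  # (3, 1, 2)

# A tuple of hashable items is itself hashable, so it can be a dict key.
lookup = {result: "usable as a key"}

# Unlike frozenset, tuple equality depends on order.
same_items_other_order = ordered_frozen([1, 2, 3])
print(result == same_items_other_order)  # False
```

If fast membership tests matter, one option is to keep a frozenset of the same items alongside the tuple and use it only for lookups.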
There is no such implementation in the standard library.
In Python 3.10, I am aware that a dictionary preserves insertion order. However when performing conditional list comprehensions, can this order still be guaranteed?
For example, given:
my_dict = {}
my_dict['a'] = 1
my_dict['b'] = 2
my_dict['c'] = 3
my_dict['d'] = 4
Can one guarantee that either (option A):
print([k for k in my_dict.keys() if k not in ['c']])
or (option B):
print([k for k in (my_dict.keys() - {'c'})])
will always return:
['a', 'b', 'd']
Iterating over dict or dict.keys() should give the same results for any version of Python, since the language guarantees the current order will always be stable, even if it doesn't necessarily match the insertion order. In Python 3, the keys() method provides a dynamic view of the dictionary's entries, so it will directly reflect the current state of the dict. The views themselves may be "set-like", but that does not imply they are unordered (or independently ordered).
The problem with the examples in the question is that they don't compare like with like. The keys() method returns a view (or a list in earlier versions), whereas keys() - {'c'} evaluates to a set (i.e. an object with no guaranteed order). So it is safe to assume option A will always give the same results, but not option B.
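A short sketch of the difference, using the dictionary from the question (the variable names option_a, difference and option_b_sorted are only for illustration). It shows that the view difference is a plain set, and that sorting is one way to get a deterministic result from it:

```python
my_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# Option A: filtering while iterating keeps the dict's insertion order.
option_a = [k for k in my_dict if k not in {'c'}]
print(option_a)  # ['a', 'b', 'd']

# Option B: subtracting from a keys view produces an ordinary set,
# so the iteration order of the result is not guaranteed.
difference = my_dict.keys() - {'c'}
print(type(difference).__name__)  # set

# If the set form is needed, impose an order explicitly.
option_b_sorted = sorted(difference)
print(option_b_sorted)  # ['a', 'b', 'd']
```

Note that sorted() gives alphabetical order here, which only happens to match the insertion order in this example.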
I think the short answer is yes: the "preserves insertion order" clause gives you a proper order of keys whenever you go through them (be it via for k in my_dict or my_dict.keys()), and together with the one that @Larry pointed out, it gives you what you ask for.
However, the downvotes on this question are probably due to the fact that if you need an answer to this question for a coding problem, you should either learn more about list comprehensions or rethink your solution and sort the keys based on insertion order or whatever other guarantee you have in mind.
A Python set is meant to be unordered, so why does enumerate accept one as input?
The same question would apply to dictionaries.
From my point of view these are giving the false impression that there is a predictable way of enumerating them, but there is not.
This is quite misleading. I would have expected at least a warning from enumerate when I request enumerate(set) or enumerate(dict).
Can anyone explain why this warning is not there? Is it "pythonic" to allow an enumeration that may not be predictable?
There is a distinction between a container and its iterator. Technically, enumerate doesn't work with set, dict, or list, because none of those types is an iterator. They are iterable, though, meaning enumerate can get an iterator from each by implicitly using the iter function (i.e., enumerate(some_list_dict_or_set) is equivalent to enumerate(iter(some_list_dict_or_set))).
>>> iter([1,2,3])
<list_iterator object at 0x109d924e0>
>>> iter(dict(a=1, b=2))
<dict_keyiterator object at 0x109d4b818>
>>> iter({1,2,3})
<set_iterator object at 0x109d53ab0>
So while a given container may not have any inherent ordering of its elements, its iterator can impose an order, and enumerate simply pairs that ordering with a sequence of int values.
You can really see the difference between inherent ordering and imposed ordering when comparing dict and OrderedDict in Python 3.7 or later. Both remember the order in which their keys were added, but that order isn't an important part of a dict's identity. That is, two dicts with the same keys and the same values mapped to those keys are equal, no matter what order the keys were added in.
>>> dict(a=1, b=2) == dict(b=2, a=1)
True
The same is not true of two OrderedDicts, which are only equal if they have the same keys, the same values for those keys, and the keys were added in the same order.
>>> from collections import OrderedDict
>>> OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
False
enumerate accepts any iterable which includes set and dict. set might be unordered but its order of iteration is not arbitrary; if you iterate the same set multiple times, it will yield elements in the same order.
Also note that as of Python 3.7 dict preserves insertion order. Whether or not this is useful solely depends on your use case.
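As a rough illustration of the points above, the sketch below enumerates the same unmodified set several times and compares the results. The set contents and variable names are arbitrary; the exact order printed may differ between runs of the interpreter, which is why sorting is shown as a way to get reproducible numbering.

```python
items = {"pear", "apple", "fig"}

# Enumerating the same, unmodified set twice yields the same pairs.
first_pass = list(enumerate(items))
second_pass = list(enumerate(items))
print(first_pass == second_pass)  # True

# enumerate() obtains the iterator itself; passing iter(items) is equivalent.
explicit = list(enumerate(iter(items)))
print(first_pass == explicit)  # True

# For numbering that is the same on every run, impose an order first.
stable = list(enumerate(sorted(items)))
print(stable)  # [(0, 'apple'), (1, 'fig'), (2, 'pear')]
```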
So the Python documentation suggests using itemgetter, attrgetter, or methodcaller from the operator module when applying sorted to complex data types. Further, iterators are smaller and faster than lists for large objects.
Thus I am wondering how to create an iterator over an OrderedDict's values. The reason is that the values of the OrderedDict I wish to sort are themselves (regular) dictionaries.
For regular dictionaries, one could do this with:
sorted(my_dict.itervalues(), key=itemgetter('my_key'))
However, OrderedDict only seems to have the method __iter__(), which works on the OrderedDict's keys.
So how can I efficiently make an iterator over the values of the OrderedDict?
Note, I am not looking for list comprehension, a lambda function, or extracting the relevant sub key (key inside the dictionary (a value)) values of the OrderedDict.
e.g.
sorted(my_dict, key=lambda key: my_dict[key]['my_key'])
example nested:
test = OrderedDict({'a': {'x':1, 'y':2, 'z':3},
'b': {'x':1, 'y':2, 'z':3}
})
Neither dict nor OrderedDict have an itervalues() method in Python 3. That method only exists in Python 2.
Use dict.values():
sorted(my_dict.values(), key=itemgetter('my_key'))
In Python 2 you want to use itervalues() not so much because it is an iterator, but because dict.values() has to create a new list object which is then discarded again. Iterators are also not faster (rather, they are often slower!); they are instead more memory efficient. In this case it is faster because creating a (large) list that you then discard takes time.
In Python 3, dict.values() creates a view instead, a lightweight object that like dict.itervalues() yields values on demand and doesn't have to produce a list up front.
You don't have to call iter() on this. sorted() takes an iterable, and will itself call iter() on whatever you passed in. Because it does this from native code and doesn't have to look up a global name, it can do this much faster than Python code ever could.
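Here is a small sketch of that on Python 3, with a nested OrderedDict similar to the one in the question. The sample values and the name by_x are invented for illustration so that the sort visibly changes the order.

```python
from collections import OrderedDict
from operator import itemgetter

# Hypothetical sample data: each value is a regular dict.
test = OrderedDict([
    ('a', {'x': 3, 'y': 2}),
    ('b', {'x': 1, 'y': 5}),
    ('c', {'x': 2, 'y': 0}),
])

# values() returns a view; sorted() iterates it directly, no iter() needed.
by_x = sorted(test.values(), key=itemgetter('x'))
print(by_x)
# [{'x': 1, 'y': 5}, {'x': 2, 'y': 0}, {'x': 3, 'y': 2}]
```

The result is a new list of the same inner dict objects; the OrderedDict itself is left unchanged.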
The answer is to call the method .values() to get a view and wrap it in iter():
sorted(iter(my_dict.values()), key=itemgetter('my_subkey'))
I was looking for ways to sort a dictionary and came across this code on a SO thread:
import operator
x = {1: 2, 3: 4, 4:3, 2:1, 0:0}
sorted_x = sorted(x.iteritems(), key=operator.itemgetter(1))
How does this code work?
When I call iteritems() over a dictionary I get this:
<dictionary-itemiterator object at 0xf09f18>
I know that this is a reference, but how do you use it?
And afaik, in sorted(a, b), a is supposed to be the thing you want to sort, and b would be the indicator for sorting, right? How does itemgetter(1) work here?
operator.itemgetter(1) is equivalent to lambda x: x[1]. It's an efficient way to specify a function that returns the value at index 1 of its input.
.iteritems() is a method of a dictionary that returns an iterator over the entries in the dictionary in (key,value) tuple form.
iteritems() is just like items(), except it returns an iterator rather than a list. For large dictionaries, this saves memory because you can iterate over each individual element without having to build up the complete list of items in memory first.
sorted accepts a keyword argument key, which is a function used to determine what to compare by when sorting something. In this case, it is using operator.itemgetter, which is like the function version of doing something[1]. Therefore, the code is sorting on the [1] item of the tuples returned by iteritems(), which is the value stored in the dictionary.
Most Python built-ins which deal with lists or list-like objects also accept iterators. These are like a pointer into the list, which you can advance to the next item in the list with the next() member function. This can be very convenient for infinite lists or very large lists (either many elements or very large elements), to keep memory usage down. See http://docs.python.org/library/stdtypes.html#iterator-types
iteritems() gives an iterator into the list of items in the dictionary.
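For reference, a sketch of the same code on Python 3, where iteritems() no longer exists and items() returns a view that sorted() can iterate directly. The name get_value is only for illustration.

```python
from operator import itemgetter

x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}

# itemgetter(1) builds a callable that returns element 1 of its argument,
# i.e. the value of each (key, value) pair.
get_value = itemgetter(1)
print(get_value((1, 2)))  # 2

# Sort the (key, value) pairs by value.
sorted_x = sorted(x.items(), key=get_value)
print(sorted_x)  # [(0, 0), (2, 1), (1, 2), (4, 3), (3, 4)]
```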
>>> {x for x in 'spam'}
{'a', 'p', 's', 'm'}
Why does it change the order? If you take a look at a loop, it works perfectly:
>>> for x in 'spam':
... print(x)
...
s
p
a
m
>>>
Sets in Python (and in set theory) are not ordered. So when you loop over them, there is no defined ordering.
You looped over the string literal 'spam' to make a set containing each character in that string. Once you did that, the ordering was gone.
When you perform the for loop over 'spam', you are performing the loop against a string which does have ordering.
From Set types:
These represent unordered, finite sets of unique, immutable objects. As such, they cannot be indexed by any subscript [because no ordering is defined among the elements]. However, they can be iterated over, and the built-in function len() returns the number of items in a set. Common uses for sets are fast membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.
But if you really need to preserve the order, then please check ordered set.
And anyway, you would probably prefer to write just set('spam') instead of a comprehension.
set is not an ordered collection, and as such, the internal order of keys is undefined.
From docs.python.org
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built in dict, list, and tuple classes, and the collections module.)
sets are unordered by definition. The reason for this is that their implementation runs faster that way, by using appropriate data structures that do not preserve order. If you need order, you can use the (slower) OrderedDict type.
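If the goal is to drop duplicate characters but keep first-seen order, a minimal sketch with OrderedDict might look like this. The input string 'spammap' is made up so that it contains duplicates; on Python 3.7+ a plain dict.fromkeys would behave the same way.

```python
from collections import OrderedDict

text = 'spammap'  # hypothetical input with repeated characters

# A set removes duplicates but makes no promise about order.
as_set = set(text)

# OrderedDict keys are unique and remember insertion order.
unique_in_order = list(OrderedDict.fromkeys(text))
print(unique_in_order)  # ['s', 'p', 'a', 'm']
```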
Python sets are defined as unordered, so Python is free to order them any way it likes (efficiently, I presume).