Time complexity of dict.fromkeys() - python

I'm trying to get an ordered set in Python 3.8. Following this answer, I'm using the dict.fromkeys() method to get the unique items from a list while preserving insertion order. What's the time complexity of this method? Since I use this frequently in my codebase, is it the most efficient way, or is there a better way to get an ordered set?
>>> lst = [4,2,4,5,6,2]
>>> dict.fromkeys(lst)
{4: None, 2: None, 5: None, 6: None}
>>> list(dict.fromkeys(lst))
[4, 2, 5, 6]
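For reference, dict.fromkeys(iterable) makes a single pass over the input and stores each element as a key (with value None by default). Each dict insertion is O(1) on average, so the whole call is O(n) on average for n elements, and the elements must be hashable. The function below is only a rough pure-Python equivalent written to make that single pass visible; the name `ordered_unique` is hypothetical.

```python
def ordered_unique(items):
    """Rough pure-Python equivalent of list(dict.fromkeys(items)).

    One pass over `items`; each dict assignment is O(1) on average,
    so the function is O(n) on average. Items must be hashable.
    """
    seen = {}
    for item in items:
        # Re-assigning an existing key does not move it, so the first
        # occurrence keeps its position (dicts keep insertion order, 3.7+).
        seen[item] = None
    return list(seen)


lst = [4, 2, 4, 5, 6, 2]
print(ordered_unique(lst))        # [4, 2, 5, 6]
print(list(dict.fromkeys(lst)))   # [4, 2, 5, 6]
```

In practice the built-in dict.fromkeys does the same work in C, so it will normally be faster than a hand-written loop.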

Related

Python: maintain key sort order on dictionary

I want to iteratively add elements to a dictionary with an integer key for which I would like to keep a key-ordering. Modern Python (3.7+) keeps an insertion order on dict, but I need a key ordering.
Example use-case:
from collections import defaultdict
import numpy as np
my_dict = defaultdict(list)
for i in range(10):
    idx = i + np.random.randint(10)
    my_dict[idx].append(i)
    # Do something with my_dict
    ...
print(my_dict)
Example output:
>> defaultdict(<class 'list'>, {9: [0, 4, 9], 10: [1], 2: [2], 6: [3, 5], 7: [6, 7], 16: [8]})
Desired output:
print(defaultdict(list, sorted(my_dict.items())))
>> defaultdict(<class 'list'>, {2: [2], 6: [3, 5], 7: [6, 7], 9: [0, 4, 9], 10: [1], 16: [8]})
Of course, this is a very simple sort, but the index shifts (computed above as i + np.random.randint(10)) can become arbitrarily large, and I need a solution with low time complexity. Also note that I am removing items from my_dict inside the loop (e.g., keys less than or equal to i).
What kind of objects/data structures does Python provide to achieve this? I've looked at PriorityQueue (heapq), which preserves the ordering I need but offers nothing else. I need the get and pop methods of a conventional dictionary plus the key ordering of, e.g., a PriorityQueue, without having to do an expensive sort at every iteration.
Edit: The best solution I have found so far is SortedDict from the SortedContainers library. Unfortunately, it raises the cost of pop from O(1) (for dict.pop) to O(log N), but it keeps the dictionary in key order with O(log N) work per update instead of an O(N log N) re-sort.
I am still open to alternative solutions that preserve the characteristics of SortedDict but provide O(1) time complexity for SortedDict.pop. Note that pop is always called on the smallest key, just like a queue (dequeue).
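One common alternative, when pops always take the smallest key, is to pair a plain dict (O(1) average get/set) with a heapq heap of its keys (to find the minimum). The class below is a minimal sketch under that assumption; the name MinKeyDict and its methods are hypothetical and not from any library. It does not reach O(1) pop: adding a new key and popping the smallest one are O(log N). Keys removed out of order stay in the heap and are skipped lazily, so the heap can hold some stale entries. The usage at the bottom replays the key/value pairs from the example output above instead of calling numpy.

```python
import heapq


class MinKeyDict:
    """Hypothetical dict-like container that can pop its smallest key.

    - get / `in` / setdefault on an existing key: O(1) on average (plain dict)
    - adding a new key, pop_min: O(log N) (heapq)
    - discard: O(1) on average; its heap entry is skipped later (lazy deletion)
    Keys must be hashable and mutually comparable.
    """

    def __init__(self):
        self._data = {}
        self._heap = []

    def setdefault(self, key, default):
        if key not in self._data:
            heapq.heappush(self._heap, key)
            self._data[key] = default
        return self._data[key]

    def get(self, key, default=None):
        return self._data.get(key, default)

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)

    def discard(self, key):
        # Remove from the dict only; the matching heap entry becomes stale.
        self._data.pop(key, None)

    def pop_min(self):
        """Remove and return (key, value) for the smallest live key."""
        while self._heap:
            key = heapq.heappop(self._heap)
            if key in self._data:          # skip stale (already removed) keys
                return key, self._data.pop(key)
        raise KeyError("pop_min from empty MinKeyDict")


m = MinKeyDict()
for i, idx in enumerate([9, 10, 2, 6, 9, 6, 7, 7, 16, 9]):
    m.setdefault(idx, []).append(i)
print(m.pop_min())   # (2, [2])
print(m.pop_min())   # (6, [3, 5])
```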
Here is a function I made to sort the dictionary keys.
def dict_sort(d):
    return {x: d[x] for x in sorted(d)}
The function iterates over the keys in sorted order, maps each key to its value in a new dictionary (in that order), and returns the new dictionary. Note that the result is a plain dict, not a defaultdict.
Call like this:
a = dict_sort(my_dict)
print(a)  # prints the dictionary with its keys in sorted order
Hope this helps.

Example of set subtraction in python

I'm taking a data structures course in Python, and a suggestion for a solution includes this code which I don't understand.
This is a sample of a dictionary:
vc_metro = {
    'Richmond-Brighouse': set(['Lansdowne']),
    'Lansdowne': set(['Richmond-Brighouse', 'Aberdeen'])
}
It is suggested that to remove some of the elements in the value, we use this code:
vc_metro['Lansdowne'] -= set(['Richmond-Brighouse'])
I have never seen such a structure, and using it in a basic situation such as:
my_list = [1, 2, 3, 4, 5, 6]
other_list = [1, 2]
my_list -= other_list
doesn't work. Where can I learn more about this recommended strategy?
You can't subtract lists, but you can meaningfully subtract set objects. Sets are hash tables, somewhat similar to dict.keys(), in which each object can appear only once.
The -= operator is the in-place form of the - (set difference) operator: it is equivalent to the difference_update method, whereas the difference method returns a new set. It removes from the left operand every element that is also present in the right one.
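To make the in-place distinction concrete: `a - b` and `a.difference(b)` build a new set, while `a -= b` and `a.difference_update(b)` modify `a` itself, so any other name bound to the same set sees the change. One small difference: the operator forms require both operands to be sets, whereas the methods accept any iterable.

```python
a = {1, 2, 3, 4, 5, 6}
alias = a                      # a second name for the same set object

b = a - {1, 2}                 # new set; a is unchanged
print(b, a)                    # {3, 4, 5, 6} {1, 2, 3, 4, 5, 6}

a -= {1, 2}                    # in place, same as a.difference_update({1, 2})
print(a is alias, alias)       # True {3, 4, 5, 6}

a.difference_update([3, 4])    # the method form accepts any iterable
print(a)                       # {5, 6}

try:
    a - [5]                    # the operator form needs a set on both sides
except TypeError as exc:
    print("TypeError:", exc)
```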
Your simple example with sets would look like this:
>>> my_set = {1, 2, 3, 4, 5, 6}
>>> other_set = {1, 2}
>>> my_set -= other_set
>>> my_set
{3, 4, 5, 6}
Curly braces containing items but no colons are interpreted as a set literal (an empty pair, {}, is an empty dict; use set() for an empty set). So the direct constructor call
set(['Richmond-Brighouse'])
is equivalent to
{'Richmond-Brighouse'}
Notice that you can't do set('Richmond-Brighouse'): that would add all the individual characters of the string to the set, since strings are iterable.
The reason to use -=/difference instead of remove is that differencing only removes existing elements, and silently ignores others. The discard method does this for a single element. Differencing allows removing multiple elements at once.
The original line vc_metro['Lansdowne'] -= set(['Richmond-Brighouse']) could be rewritten as
vc_metro['Lansdowne'].discard('Richmond-Brighouse')
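A quick sketch contrasting the three removal styles discussed above, plus the set('...') pitfall. The station name 'Waterfront' is just an illustrative missing element, not taken from the course data.

```python
stations = {'Richmond-Brighouse', 'Aberdeen'}

stations.discard('Waterfront')           # absent: silently ignored
stations -= {'Waterfront', 'Aberdeen'}   # removes only the elements present
print(stations)                          # {'Richmond-Brighouse'}

try:
    stations.remove('Waterfront')        # remove() raises if absent
except KeyError as exc:
    print('KeyError:', exc)              # KeyError: 'Waterfront'

print(len(set('abc')), len({'abc'}))     # 3 1: set() iterates over the string
```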

Why does sorting a list of dict keys in one line, with .sort() not work, while sorted() does?

While practicing Python (3.7.3), I found myself wanting to sort the keys of a dict. But I am running up against something I don't understand and can't find explained on SO.
edit: I know that the sort() method changes the list itself, while sorted() leaves the original list intact and returns a new one. But can someone explain why the list() constructor doesn't seem to return the list anymore when I call its sort() method?
Can someone explain why this doesn't return anything:
>>> md = {5: 3, 2: 1, 8: 9}
>>> ml = list(md.keys()).sort()
>>> ml
>>>
While if I do it in two separate steps, it does work:
>>> ml = list(md.keys())
>>> ml
[5, 2, 8]
>>> ml.sort()
>>> ml
[2, 5, 8]
>>>
Also, I found that doing it in one line using sorted(), it works as well:
>>> sorted(list(md.keys()))
[2, 5, 8]
list.sort() sorts the list in place but returns None, and that None is what gets assigned to ml. That's why the REPL does not show anything.
By contrast, sorted() accepts any iterable and returns a new, sorted list.
sort() sorts your list directly, while sorted() returns a new list. (Docs)
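A short sketch of the variants from the question. Note also that sorted() accepts any iterable, including the dict itself (iterating a dict yields its keys), so the list(md.keys()) wrapper isn't needed.

```python
md = {5: 3, 2: 1, 8: 9}

result = list(md.keys()).sort()   # sorts a temporary list, then returns None
print(result)                     # None

ml = list(md)                     # list of the keys
ml.sort()                         # in place; ignore the None it returns
print(ml)                         # [2, 5, 8]

print(sorted(md))                 # [2, 5, 8]
```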

Sorting a list of python sets by value

The frozenset docs says:
The frozenset type is immutable and hashable — its contents cannot be altered after it is created; it can therefore be used as a dictionary key or as an element of another set.
However, the docs for Python sets say:
Since sets only define partial ordering (subset relationships), the output of the list.sort() method is undefined for lists of sets.
This makes me ask: why is this the case? And if I wanted to sort a list of sets by set content, how could I do it? I know that the extension intbitset (https://pypi.python.org/pypi/intbitset/2.3.0) has a function that returns a bit sequence representing the set contents. Is there something comparable for Python sets?
Tuples, lists, strings, etc. have a natural lexicographic ordering, so a list of them can be sorted: you can always compare any two of them. That is, either a < b, b < a, or a == b.
A natural comparison between two sets is having a <= b mean a is a subset of b, which is what the expression a <= b actually does in Python. What the documentation means by "partial ordering" is that not all sets are comparable. Take, for example, the following sets:
a = {1, 2, 3}
b = {4, 5, 6}
Is a a subset of b? No. Is b a subset of a? No. Are they equal? No. If you can't compare them at all, you clearly can't sort them.
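The same check in code. Because no set here compares as "less than" the other, CPython's sort has no reason to swap them, so for these two sets the result simply reflects the input order.

```python
a = {1, 2, 3}
b = {4, 5, 6}

print(a < b, b < a, a == b)   # False False False: neither is "smaller"
print(a <= (a | b))           # True: a is a subset of their union

print(sorted([a, b]))         # [{1, 2, 3}, {4, 5, 6}]
print(sorted([b, a]))         # [{4, 5, 6}, {1, 2, 3}]
```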
The only way you can sort a collection of sets is if your comparison function actually can compare any two elements (a total order). This means you can still sort a collection of sets using the above subset relation, but you will have to ensure that all of the sets are comparable (e.g. [{1}, {1, 2, 4}, {1, 2}]).
The easiest way to do what you want is to transform each set into something you can actually compare: compare f(a) <= f(b) instead, for some simple function f whose results have an ordinary total order. This is done with the key keyword argument:
In [10]: def f(some_set):
   ...:     return max(some_set)
   ...:
In [11]: sorted([{1, 2, 3, 999}, {4, 5, 6}, {7, 8, 9}], key=f)
Out[11]: [{4, 5, 6}, {7, 8, 9}, {1, 2, 3, 999}]
You're sorting [f(set1), f(set2), f(set3)] and applying the resulting ordering to [set1, set2, set3].
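If "sort by set content" means comparing the elements themselves, one option is key=sorted, which turns each set into a sorted list; lists compare lexicographically, which gives a total order as long as the elements are mutually comparable. For sets of small non-negative integers, a bitmask key plays a role similar to the bit sequence intbitset provides. The helper name `bitmask` is hypothetical.

```python
sets = [{3, 1}, {2}, {1, 2}]

# Lexicographic order of each set's sorted elements: [1, 2] < [1, 3] < [2]
print(sorted(sets, key=sorted))   # [{1, 2}, {1, 3}, {2}]


def bitmask(s):
    """Hypothetical helper: encode a set of non-negative ints as an int bitmask."""
    return sum(1 << x for x in s)


# Masks: {1, 3} -> 0b1010 (10), {2} -> 0b100 (4), {1, 2} -> 0b110 (6)
print(sorted(sets, key=bitmask))  # [{2}, {1, 2}, {1, 3}]
```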
Take an example: say you wanted to sort a list of sets by the "first element" of each set. The issue is that Python sets or frozensets don't have a "first element." They have no sense of their own ordering. A set is an unordered collection with no duplicate elements.
Furthermore, list.sort() sorts the list in place, using only the < operator between items.
If you just call a.sort() without passing a key parameter, comparing set_a < set_b (i.e. set_a.__lt__(set_b)) is not enough, because set_a.__lt__(set_b) is a subset test (is a a proper subset of b?). As mentioned by @Blender and referenced in your question, this provides only a partial rather than a total ordering, which is not enough to define a consistent sorted order for the list of sets.
From the docs:
set < other: Test whether the set is a proper subset of other, that is, set <= other and set != other.
You could pass a key to sort(); it just couldn't rely on any internal "ordering" of the sets, because, remember, there is none.
>>> a = {2, 3, 1}
>>> b = {6, 9, 0, 1}
>>> c = {0}
>>> i = [b, a, c]
>>> i.sort(key=len)
>>> i
[{0}, {1, 2, 3}, {0, 9, 6, 1}]

Why does Python not include an ordered dict (by default)?

Python has some great structures to model data.
Here are some:
+-------------+-------------------------+-----------------------------------+
|             | indexed by int          | not indexed by int                |
+-------------+-------------------------+-----------------------------------+
| not indexed | [1, 2, 3]               | {1, 2, 3}                         |
| by key      | or                      | or                                |
|             | [x+1 for x in range(3)] | {x+1 for x in range(3)}           |
+-------------+-------------------------+-----------------------------------+
| indexed     |                         | {'a': 97, 'c': 99, 'b': 98}       |
| by key      |                         | or                                |
|             |                         | {chr(x):x for x in range(97,100)} |
+-------------+-------------------------+-----------------------------------+
Why doesn't Python include, by default, a structure indexed by both key and int (like a PHP array)? I know there is a class that emulates this object (http://docs.python.org/3/library/collections.html#ordereddict-objects). But here is the representation of an OrderedDict taken from the documentation:
OrderedDict([('pear', 1), ('apple', 4), ('orange', 2), ('banana', 3)])
Wouldn't it be better to have a native type that could logically be written like this:
['a': 97, 'b': 98, 'c': 99]
And the same logic for an OrderedDict comprehension:
[chr(x):x for x in range(97,100)]
Does it make sense to fill that table cell like this in Python's design?
Is there any particular reason this has not been implemented yet?
Python's dictionaries are implemented as hash tables. Those are inherently unordered data structures. While it is possible to add extra logic to keep track of the order (as is done in collections.OrderedDict in Python 2.7 and 3.1+), there's a non-trivial overhead involved.
For instance, the recipe that the collections documentation suggests for use in Python 2.4-2.6 requires more than twice as much work to complete many basic dictionary operations (such as adding and removing values). This is because it must maintain a doubly-linked list to use for ordered iteration, and it needs an extra dictionary to help maintain the list. While its operations are still O(1), the constant terms are larger.
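The idea behind such ordered-dict recipes can be sketched as a dict mapping each key to a node in a doubly-linked list that records insertion order. The class below is a simplified illustration written for this explanation (LinkedOrderedDict is a hypothetical name; it is not the documented recipe or the standard-library implementation). Every operation is still O(1) on average, but each insertion or deletion also has to relink list nodes, which is the extra constant-factor work described above.

```python
class _Node:
    __slots__ = ("prev", "next", "key", "value")

    def __init__(self, key=None, value=None):
        self.prev = self.next = self
        self.key, self.value = key, value


class LinkedOrderedDict:
    """Hypothetical ordered mapping: dict of key -> node + circular linked list."""

    def __init__(self):
        self._map = {}
        self._root = _Node()          # sentinel; root.next is the oldest node

    def __setitem__(self, key, value):
        node = self._map.get(key)
        if node is not None:          # existing key: keep its position
            node.value = value
            return
        node = _Node(key, value)
        last = self._root.prev        # link the new node in just before the sentinel
        node.prev, node.next = last, self._root
        last.next = node
        self._root.prev = node
        self._map[key] = node

    def __getitem__(self, key):
        return self._map[key].value

    def __delitem__(self, key):
        node = self._map.pop(key)     # raises KeyError if missing
        node.prev.next = node.next    # unlink in O(1)
        node.next.prev = node.prev

    def __len__(self):
        return len(self._map)

    def __iter__(self):
        node = self._root.next
        while node is not self._root:
            yield node.key
            node = node.next


d = LinkedOrderedDict()
for k, v in [('pear', 1), ('apple', 4), ('orange', 2)]:
    d[k] = v
del d['apple']
d['banana'] = 3
print(list(d))   # ['pear', 'orange', 'banana']
```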
Since Python uses dict instances everywhere (for all variable lookups, for instance), they need to be very fast or every part of every program will suffer. Since ordered iteration is not needed very often, it makes sense to avoid the overhead it requires in the general case. If you need an ordered dictionary, use the one in the standard library (or the recipe it suggests, if you're using an earlier version of Python).
Your question appears to be "why does Python not have native PHP-style arrays with ordered keys?"
Python has three core non-scalar datatypes: list, dict, and tuple. Dicts and tuples are absolutely essential for implementing the language itself: they are used for assignment, argument unpacking, attribute lookup, etc. Although not really used for the core language semantics, lists are pretty essential for data and programs in Python. All three must be extremely lightweight, have very well-understood semantics, and be as fast as possible.
PHP-style arrays are none of these things. They are not fast or lightweight, have poorly defined runtime complexity, and they have confused semantics since they can be used for so many different things--look at the array functions. They are actually a terrible datatype for almost every use case except the very narrow one for which they were created: representing x-www-form-encoded data. Even for this use case, a failing is that later values overwrite earlier ones for the same key: in PHP, ?a=1&a=2 results in array('a'=>2). (A common structure for dealing with this in Python is the MultiDict, which has ordered keys and values, and each key can have multiple values.)
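The standard library's urllib.parse shows two ways of keeping repeated query-string keys instead of letting the last one win: parse_qsl returns an ordered list of (key, value) pairs, and parse_qs returns a dict mapping each key to a list of values. This is a stdlib stand-in for the MultiDict idea, not a MultiDict class.

```python
from urllib.parse import parse_qs, parse_qsl

query = "a=1&a=2&b=3"

print(parse_qsl(query))        # [('a', '1'), ('a', '2'), ('b', '3')]
print(parse_qs(query))         # {'a': ['1', '2'], 'b': ['3']}
print(dict(parse_qsl(query)))  # {'a': '2', 'b': '3'}: last value wins, as in PHP
```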
PHP has one datatype that must be used for pretty much every use case without being great for any of them. Python has many different datatypes (some core, many more in external libraries) which excel at much more narrow use cases.
Adding a new answer with updated information: as of CPython 3.6, dicts preserve insertion order, though they are still not index-accessible, most likely because integer-based item lookup would be ambiguous, since dict keys can themselves be ints. (Some custom use cases exist.)
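If you do need the item at a given position, you have to walk the dict in insertion order, which is O(n). A minimal sketch with a hypothetical helper `nth_item` (built on itertools.islice), also showing why d[1] can't mean "position 1" when keys may be ints:

```python
from itertools import islice


def nth_item(mapping, n):
    """Hypothetical helper: the n-th (key, value) pair in insertion order.

    Walks the items from the start, so it is O(n); n must be non-negative.
    """
    for item in islice(mapping.items(), n, None):
        return item
    raise IndexError(n)


d = {'a': 1, 'b': 2, 'c': 3}
print(nth_item(d, 1))    # ('b', 2)

e = {1: 'one', 0: 'zero'}
print(e[1])              # one: lookup by key
print(nth_item(e, 1))    # (0, 'zero'): position 1 in insertion order
```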
Unfortunately, the documentation for dict hasn't been updated to reflect this (yet) and still says "Keys and values are iterated over in an arbitrary order which is non-random". Ironically, the collections.OrderedDict docs mention the new behaviour:
Changed in version 3.6: With the acceptance of PEP 468, order is retained for keyword arguments passed to the OrderedDict constructor and its update() method.
And here's an article mentioning some more details about it:
A minor but useful internal improvement: Python 3.6 preserves the order of elements for more structures. Keyword arguments passed to a function, attribute definitions in a class, and dictionaries all preserve the order of elements as they were defined.
So if you're only writing code for Py36 onwards, you shouldn't need collections.OrderedDict unless you're using popitem, move_to_end or order-based equality.
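A quick sketch of those OrderedDict-specific features: move_to_end, popitem(last=False) for popping from the front, and order-sensitive equality between two OrderedDicts. Since Python 3.7 a plain dict.popitem() is guaranteed to remove the last-inserted item, but dict has no option to pop from the front.

```python
from collections import OrderedDict

od = OrderedDict.fromkeys('abcd')
od.move_to_end('a')                 # move an existing key to the end
print(list(od))                     # ['b', 'c', 'd', 'a']
print(od.popitem(last=False))       # ('b', None): FIFO pop from the front

# Equality between two OrderedDicts is order-sensitive; between dicts it is not.
print(OrderedDict(x=1, y=2) == OrderedDict(y=2, x=1))   # False
print(dict(x=1, y=2) == dict(y=2, x=1))                 # True
```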
Example, in Python 2.7:
>>> d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 0: None}
>>> d
{'a': 1, 0: None, 'c': 3, 'b': 2, 'd': 4}
And in Python 3.6:
>>> d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 0: None}
>>> d
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 0: None}
>>> d['new'] = 'really?'
>>> d[None]= None
>>> d
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 0: None, 'new': 'really?', None: None}
>>> d['a'] = 'aaa'
>>> d
{'a': 'aaa', 'b': 2, 'c': 3, 'd': 4, 0: None, 'new': 'really?', None: None}
>>>
>>> # equality is not order-based
>>> d1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 0: None}
>>> d2 = {'b': 2, 'a': 1, 'd': 4, 'c': 3, 0: None}
>>> d2
{'b': 2, 'a': 1, 'd': 4, 'c': 3, 0: None}
>>> d1 == d2
True
As of Python 3.7 this is guaranteed behavior for dictionaries: it was an implementation detail in 3.6 and was adopted into the language spec with the 3.7 release in June 2018:
the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec.
https://docs.python.org/3/whatsnew/3.7.html