I was wondering about the time complexity of sorting a dictionary by key versus sorting it by value.
For example:

for key in sorted(my_dict, key=my_dict.get):
    <some-code>
In the line above, what is the time complexity of sorted? If quicksort is assumed, is it O(n log n) on average and O(n^2) in the worst case?
Also, are the time complexities of sorting by value and sorting by key different? Since accessing a value by its key takes only O(1) time, shouldn't both be the same?
Thanks.
sorted doesn't really sort a dictionary; it collects the iterable it receives into a list and sorts that list using the Timsort algorithm. Timsort is decidedly not a variant of quicksort; it is a hybrid algorithm closer to merge sort. According to Wikipedia, its complexity is O(n log n) in the worst case, with optimizations that speed it up on the partially ordered data sets commonly encountered in practice.
Since collecting the dict's keys and collecting its values are both O(n), the complexity of both sorts is the same, determined by the sort algorithm: O(n log n).
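For a concrete illustration (my_dict is the dictionary from the question; the values here are made up), both calls hand sorted an iterable of n keys, so both are O(n log n):

my_dict = {"b": 3, "a": 1, "c": 2}

# Sort by key: sorted() copies the keys into a list and Timsorts it.
by_key = sorted(my_dict)                     # ['a', 'b', 'c']

# Sort by value: the same list of keys is sorted, but each comparison key
# is obtained by an O(1) my_dict.get lookup, so the cost stays O(n log n).
by_value = sorted(my_dict, key=my_dict.get)  # ['a', 'c', 'b']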
I am looking for a Python data structure that functions as a sorted list and has the following asymptotics:
O(1) pop from beginning (pop smallest element)
O(1) pop from end (pop largest element)
>= O(log n) insert
Does such a data structure with an efficient implementation exist? If so, is there a library that implements it in Python?
A regular red/black tree or B-tree can do this in an amortized sense. If you store pointers to the smallest and biggest elements of the tree, then the cost of deleting those elements is amortized O(1), meaning that any series of d deletions takes time O(d), though an individual deletion may take longer than this. The cost of an insertion is O(log n), which is as good as possible, because otherwise you could sort n items in less than O(n log n) time with your data structure.
As for libraries that implement this, I'm not sure.
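That said, if a third-party package is acceptable, one candidate worth evaluating is sortedcontainers; the sketch below assumes its SortedList satisfies the asymptotics above (I'm not claiming its exact internal bounds):

from sortedcontainers import SortedList  # third-party: pip install sortedcontainers

s = SortedList([5, 1, 9, 3])
s.add(4)              # insertion keeps the list sorted, roughly O(log n)
smallest = s.pop(0)   # pop the smallest element -> 1
largest = s.pop(-1)   # pop the largest element  -> 9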
There's already a question about this, and the answer says that the asymptotic complexity is O(n). But I observed that if an unsorted list is converted into a set, the set can be printed out in sorted order, which means that at some point in the middle of these operations the list has been sorted. Then, since any comparison sort has a lower bound of Omega(n log n), the asymptotic complexity of this operation should also be Omega(n log n). So what exactly is the complexity of this operation?
A set in Python is an unordered collection, so any order you see is by chance. As both dict and set are implemented as hash tables in CPython, insertion is average case O(1) and worst case O(N).
So list(set(...)) is always O(N) and set(list(...)) is average case O(N).
You can browse the source code for set here.
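A quick illustration of both points; small integers often hash to themselves in CPython, so they can come out looking sorted, but that ordering is an accident of the hash table layout rather than a guarantee:

data = [3, 1, 2, 1, 3]

s = set(data)     # average case O(N): each insert is amortized O(1)
back = list(s)    # always O(N): a single pass over the hash table

print(back)       # may print [1, 2, 3] here, but no order is promised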
I'm interested in the big-O complexity of looping over the same dictionary twice and then looping over the length of a specific key's value, as shown in the pseudocode. What is the big-O for each loop, and what would the final big-O be?
I have tried looking at other big-O threads on here, but they are either confusing to me, due to my limited knowledge of big-O, or not as specific to my case as what I'm looking for.
Thanks
# A dictionary with 100 keys and string values of 10-20 characters each
dictionary = {f"key{i}": "x" * 15 for i in range(100)}

for Akey in dictionary:
    ...  # do something

for Bkey in dictionary:
    ...  # do something
    for i in range(len(dictionary[Bkey])):
        ...  # do something
Python dictionaries are based on a hash table, so the worst case for finding an element is O(n), but the amortized average case is O(1). A loop over all n elements is therefore n x O(1) -> O(n), unless you hit the degenerate case of bad hash codes, in which case it becomes O(n^2). Doing several similar operations doesn't change the O as long as the number of those operations is fixed and doesn't depend on n.
If you nest a loop inside another loop, you have to multiply the costs; here the inner loop is bounded by a constant, so it is O(n * const) -> O(n). Note that you talk about doing something with the keys but don't mention the values.
From the pseudocode it seems that you want to take a list of the keys; iterating over all the keys is O(n).
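To make the multiplication concrete, here is a small sketch that just counts the iterations; the dictionary is made up to match the question (100 keys, values around 15 characters):

dictionary = {f"key{i}": "x" * 15 for i in range(100)}

first_loop = sum(1 for Akey in dictionary)     # O(n): 100 iterations

second_loop = sum(
    1
    for Bkey in dictionary                     # O(n) outer loop
    for i in range(len(dictionary[Bkey]))      # inner loop bounded by 10-20 chars
)                                              # O(n * const) -> O(n): 1500 iterations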
What data structure - Set, List or Dictionary - in Python will you use to store 10 million integers? Query operations consist of finding the number of times a number appears in the given set. What will be the worst case time and space complexity in all the cases?
This is a sample interview question. What would be the most appropriate answer?
The key to this question is the line that states:
"finding the number of times a number appears in a given set"
The set data structure is going to be incapable of keeping a count of how many times a number appears within the total dataset, and a List is going to be extremely costly to iterate over, which leaves a dictionary as the only viable option.
Breaking down the options:
Set:
A set automatically de-dupes values that are added when they already exist in the set, so it would be impossible to query the frequency with which a number appears in the stored dataset using a set: the answer for every stored number will be 1.
Time complexity for querying: O(1)
Space complexity for storing: O(n)
List:
A list could be iterated over to determine the frequency of a given number within the list. However, this is an O(n) operation, and for 10 million integers it will not be efficient.
Time complexity for querying: O(n)
Space complexity for storing: O(n)
Dictionary:
A dictionary allows you to store key-value pairs. In this case, you would store the number to be searched as the key and the count of how many times it has been stored as the associated value. Because dictionaries hash keys into distinct buckets (there can be collisions, but let's assume a non-colliding theoretical dictionary for now), the lookup time for a given key approaches O(1). Calculating the counts, however, is going to slow a dictionary down: it takes O(n) time to build the counts for all keys, because each key has to be visited at least once in order to add its occurrence to the running count stored in the value.
Time complexity for querying: O(1)
Time complexity for storing: O(n)
Space complexity for storing: O(2n) = O(n)
Adding to the answer of John Stark:
From the Python wiki, the worst-case time complexity for querying a set is O(n). A set uses a hash to find the value, but with (a LOT of) bad luck you might have a hash collision for every key. In the vast majority of cases, however, you won't have collisions.
Also, because the keys here are integers, you reduce hash collisions if the range of the integers is limited. In Python 2 with the type int, you can't have collisions.
Add every number to a dict as number: 1 if the number is not in the dict yet; otherwise add 1 to that key's value.
Then look up a specific number as the key; its value is the number of times that number appears.
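A minimal sketch of that approach (collections.Counter would do the same bookkeeping, but the plain dict mirrors the description above):

numbers = [7, 3, 7, 1, 3, 7]   # stand-in for the 10 million integers

counts = {}
for number in numbers:          # one-time O(n) build
    if number not in counts:
        counts[number] = 1
    else:
        counts[number] += 1

print(counts[7])                # O(1) average-case query -> 3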
I was studying hash tables and a thought occurred to me:
Why not use a dictionary to search for an element instead of first sorting the list and then doing binary search? (Assume that I want to search multiple times.)
We can convert a list to a dictionary in O(n) time (I think), because we have to go through all the elements.
Adding each of those elements to the dictionary takes O(1) time.
Once the dictionary is ready, we can search for any element in O(1) time on average and O(n) in the worst case.
Now if we talk about the average case, O(n) is better than the sorting approaches, because at best they take O(n log n). And if I am right about all of this, then why not do it this way?
I know there are various other things you can do with sorted elements that cannot be done with an unsorted dictionary or array. But if we stick only to searching, isn't this a better way than sorting and then searching?
Right, a well-designed hash table can beat sorting and searching.
For a proper choice, many factors come into play, such as an in-place requirement, how dynamic the data set is, the number of searches vs. insertions/deletions, and how easy it is to build an effective hashing function...
Binary search is a searching technique that exploits the fact that the list of keys to be searched is already sorted; it doesn't require you to sort and then search, which makes its worst-case search time O(log n).
If you do not have a sorted list of keys and want to search for a key, then you will have to fall back to linear search, which in the worst case runs with O(n) complexity; there is no need to sort and then search, since that is definitely slower given that the best known sorting algorithms only work in O(n log n) time.
Building a dictionary from a list of keys and then performing a single lookup is of no advantage here, because a linear search yields the same or better performance, and a dictionary also needs auxiliary memory. However, if you have multiple lookups and the key space is small, using a dictionary can be an advantage: building the dictionary is a one-time O(n) cost, and each subsequent lookup is O(1), at the expense of some memory used by the dictionary.
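A small sketch of that trade-off, comparing a one-time O(n) hash-based build with repeated O(1) average-case lookups against an O(n log n) sort followed by O(log n) binary searches (the data and queries are made up for illustration):

import bisect

data = [42, 7, 19, 3, 25]
queries = [19, 8, 42]

# Option 1: hash-based lookups. Build once in O(n), then O(1) average per query.
seen = set(data)
hash_hits = [q in seen for q in queries]      # [True, False, True]

# Option 2: sort once in O(n log n), then binary search each query in O(log n).
ordered = sorted(data)

def found(x):
    i = bisect.bisect_left(ordered, x)
    return i < len(ordered) and ordered[i] == x

bisect_hits = [found(q) for q in queries]     # [True, False, True]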