What is the time complexity of dict.keys() in Python?

I came across a question while solving this LeetCode problem. Although my solution was accepted by the system, even after searching online I still have no answer to the following:
What is the time complexity of dict.keys() operation?
Does it return a view of the keys or a real list (stored in memory) of the keys?

In Python 2, dict.keys() is O(n): it builds a new list of the keys. In Python 3, it's O(1), but it doesn't return a list; it returns a view. To draw a random element from a dict's keys, you'd need to convert that view to a list, and the conversion is O(n).
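A quick illustration of the Python 3 view behaviour (a toy dict, just to show what d.keys() returns):
d = {'a': 1, 'b': 2}
keys = d.keys()         # O(1): a view object, not a copy
d['c'] = 3
print(keys)             # dict_keys(['a', 'b', 'c']) - the view reflects the new key
key_list = list(keys)   # O(n): materializes the keys into a real list
print(key_list)         # ['a', 'b', 'c']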
It sounds like you were probably using random.choice(d.keys()) for part 3 of that problem. If so, that step was O(n), so the requirement wasn't actually met. You need to either implement your own hash table or maintain a separate list of elements alongside the dict, without sacrificing average-case O(1) insertion and deletion.
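A minimal sketch of the "maintain a separate list of elements" approach, assuming the goal is average-case O(1) insert, delete, and random draw (the class and method names are illustrative, not taken from the original answer):
import random

class RandomizedSet:
    # keep a list of values plus a dict mapping value -> index in that list
    def __init__(self):
        self.values = []
        self.index = {}

    def insert(self, val):
        if val in self.index:
            return False
        self.index[val] = len(self.values)
        self.values.append(val)
        return True

    def remove(self, val):
        if val not in self.index:
            return False
        # swap the element to remove with the last element, then pop: O(1) on average
        i = self.index[val]
        last = self.values[-1]
        self.values[i] = last
        self.index[last] = i
        self.values.pop()
        del self.index[val]
        return True

    def get_random(self):
        # random.choice on a list is O(1)
        return random.choice(self.values)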

Related

Does looping through a reversed(list) add to the time complexity of my function?

Short but simple question. I have been studying time complexity for a coding interview and I cannot find a concise answer to this. I am aware that a related question exists on SO: "What is the time complexity of Python List Reverse?"
Python has two ways to loop through a list in reverse. You can either call list.reverse(), which reverses the list in place in O(n) time, and then loop over it, or you can loop over reversed(list). My question is: does using reversed(list) also cost O(n), or not?
I can imagine two possible answers: either it actually reverses the list (in which case I would say yes, it adds O(n)), or it just iterates over the list from the other end (in which case I would say no, it doesn't). Can anyone give me a definitive answer?
The reversed builtin, like many others in Python 3, returns an iterator, and hence does not cost an extra n iterations. In big-O terms it wouldn't matter anyway, but since you are interested in the constant factor: no,
for i in reversed(my_list):
passes over the list exactly once. That is exactly the point of these kinds of helper functions.
Note you could have used
for i in my_list[::-1]:
which is a common way to iterate in reverse, but slicing builds and returns a new list, so, much like calling my_list.reverse() first, it costs an extra O(n) pass as well.
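A quick way to see the difference (toy values, only to show what each expression produces):
nums = [1, 2, 3, 4]

r = reversed(nums)      # O(1): a reverse iterator, no copy is made
print(list(r))          # [4, 3, 2, 1] - consuming it walks the list once

s = nums[::-1]          # O(n): slicing builds a brand-new reversed list
print(s is nums)        # False - it's a separate list object
print(nums)             # [1, 2, 3, 4] - the original is untouched either way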

What is the algorithmic complexity of converting a collections.deque to python list?

I'm trying to determine whether the complexity of converting a collections.deque object into a Python list is O(n). I imagine it would have to take every element and copy it into the list, but I cannot seem to find the implementation code behind deque. Has Python built in something more efficient under the hood that could allow for an O(1) conversion to a list?
Edit: based on the following quote from the docs, I do not believe it could be any faster than O(n):
"Indexed access is O(1) at both ends but slows to O(n) in the middle. For fast random access, use lists instead."
If it cannot access a middle node in O(1) time it will not be able to convert without the same complexity.
You have to access every node. O(1) time is impossible for that fact alone.
I would believe that a deque follows the same principles as conventional deques, in that it's constant time to access the first element. You have to do that for n elements, so the runtime to do so would be O(n).
Here is the implementation of deque
However, that is largely irrelevant for determining the complexity of converting a deque to a list in Python.
Unless Python is reusing the internal data structure somehow, conversion into a list will require a walk through the deque, so it will be O(n).
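A rough way to observe this empirically; the sizes below are arbitrary and timings will vary by machine, but the growth should look roughly linear:
from collections import deque
import timeit

for n in (10_000, 100_000, 1_000_000):
    d = deque(range(n))
    # time 100 conversions of a deque of size n into a list
    t = timeit.timeit(lambda: list(d), number=100)
    print(f"n={n:>9}: {t:.3f}s for 100 conversions")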

Best / most pythonic way to remove duplicates from a list and sort in reverse order

I'm trying to take a list (orig_list below), and return a list (new_list below) which:
does not contain duplicate items (i.e. contains only unique elements)
is sorted in reverse order
Here is what I have so far, which seems... I'm going to say "weird," though I'm sure there is a better way to say that. I'm mostly put off by using list() twice for what seems pretty straightforward, and then I'm wondering about the efficiency of this approach.
new_list = list(reversed(sorted(list(set(orig_list)))))
Question #1 (SO-style question):
Are the following propositions correct?
There is no more efficient way to get unique elements of a list than converting the list to a set and back.
Since sets are unordered in Python, one must (1) deduplicate by converting to a set before sorting, because otherwise you'd lose the sort order anyway, and (2) convert back to a list before you can sort.
Using list(reversed()) is programmatically equivalent to using list.sort(reverse=True).
Question #2 (bonus):
Are there any ways to achieve the same result in fewer operations, or with a less verbose approach? If so, what are some examples?
sorted(set(orig_list), reverse=True)
Shortest in code, more efficient, same result.
Depending on the size, it may or may not be faster to sort first then dedupe in linear time as user2864740 suggests in comments. (The biggest drawback to that approach is it would be entirely in Python, while the above line executes mostly in native code.)
Your questions:
You do not need to convert from set to list and back. sorted accepts any iterable, so set qualifies, and spits out a list, so no post-conversion needed.
reversed(sorted(x)) is not equivalent to sorted(x, reverse=True): you get the same result, but more slowly. A sort costs the same whether forward or reverse, so reversed adds an extra step that isn't needed if you sort into the desired order from the start.
You've got a few mildly wasteful steps in here, but your proposition is largely correct. The only real improvements to be made are to get rid of all the unnecessary temporary lists:
new_list = sorted(set(orig_list), reverse=True)
sorted already converts its input to a list (so no need to listify before passing to sorted), and you can have it directly produce the output list sorted in reverse (so no need to produce a list only to make a copy of it in reverse).
The only conceivable improvement on big-O time is if you know the data is already sorted, in which case you can avoid O(n log n) sorting, and uniqify without losing the existing sorted order by using itertools.groupby:
new_list = [key for key, grp in itertools.groupby(orig_list)]
If orig_list is sorted in forward order, you can make the result of this reversed at essentially no cost by changing itertools.groupby(orig_list) to itertools.groupby(reversed(orig_list)).
The groupby solution isn't really practical for initially unsorted inputs, because if duplicates are even remotely common, removing them via uniquification as a O(n) step is almost always worth it, as it reduces the n in the more costly O(n log n) sorting step. groupby is also a relatively slow tool; the nature of the implementation using a bunch of temporary iterators for each group, internal caching of values, etc., means that it's a slower O(n) in practice than the O(n) uniquification via set, with its primary advantage being the streaming aspect (making it scale to data sets streamed from disk or the network and back without storing anything for the long term, where set must pull everything into memory).
The other reason to use sorted+groupby would be if your data wasn't hashable, but was comparable; in that case, set isn't an option, so the only choice is sorting and grouping.
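A small sketch of that last case, using nested lists as stand-in data that is comparable but not hashable:
from itertools import groupby

orig_list = [[3, 1], [1, 2], [3, 1], [1, 2], [2, 5]]   # set(orig_list) would raise TypeError

new_list = [key for key, grp in groupby(sorted(orig_list, reverse=True))]
print(new_list)   # [[3, 1], [2, 5], [1, 2]]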

Using dictionary instead of sorting and then searching

I was studying hash tables and a thought came:
Why not use a dictionary to search for an element instead of first sorting the list and then doing binary search? (Assume that I want to search multiple times.)
We can convert a list to a dictionary in O(n) time (I think), because we have to go through all the elements.
Adding each of those elements to the dictionary takes O(1) time on average.
Once the dictionary is ready, we can search for any element in O(1) average time, with O(n) as the worst case.
In the average case, O(n) is better than sorting first, because comparison sorts take at least O(n log n). If I am right about all of this, why not do it this way?
I know there are various other things you can do with sorted elements that cannot be done with an unsorted dictionary or array. But if we stick only to searching, isn't this a better approach than sorting and then searching?
Right, a well-designed hash table can beat sorting and searching.
For a proper choice, many factors come into play, such as any in-place requirement, how dynamic the data set is, the ratio of searches to insertions/deletions, and how easy it is to build an effective hash function...
Binary search is a searching technique that exploits the fact that the list of keys being searched is already sorted; it does not do the sorting for you, and its worst-case search time is O(log n).
If you do not have a sorted list of keys and want to search for a key once, you will have to fall back to linear search, which runs in O(n) in the worst case. There is no point in sorting and then searching for a single lookup, since that is definitely slower: the best known comparison sorts still take O(n log n) time.
Building a dictionary from a list of keys and then performing a single lookup is also of no advantage here, because a linear search yields the same or better performance without the auxiliary memory a dictionary needs. However, if you have multiple lookups and the key space is small, using a dictionary can be advantageous: building it is one-time O(n) work, and subsequent lookups are O(1) on average, at the expense of the memory the dictionary uses.
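A small sketch contrasting the two approaches for repeated membership tests; the data is made up, and a set stands in for a dict whose values don't matter:
import bisect

data = [42, 7, 19, 3, 28]
queries = [19, 100, 3]

# one-time O(n) build, then O(1) average per lookup
lookup = set(data)
print([q in lookup for q in queries])    # [True, False, True]

# alternative: one-time O(n log n) sort, then O(log n) per lookup
sorted_data = sorted(data)

def contains(x):
    i = bisect.bisect_left(sorted_data, x)
    return i < len(sorted_data) and sorted_data[i] == x

print([contains(q) for q in queries])    # [True, False, True]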

How does Python's bisect work?

I've read that Python's lists are implemented using pointers. I then see this module http://docs.python.org/2/library/bisect.html which does efficient insertion into a sorted list. How does it do that efficiently? If the list is implemented with pointers and not as a contiguous array, how can it be efficiently searched for the insertion point? And if the list is backed by a contiguous array, then there would have to be element shifting when inserting. So how does bisect work efficiently?
I believe the elements of a list are pointed to, but the "list" itself is really a contiguous array (in C). They're called lists, but they're not linked lists.
Actually, finding an element in a sorted list is pretty good: it's O(log n). But inserting is not that good: it's O(n).
If you need an O(log n) data structure, it'd be better to use a treap or a red-black tree.
It's the searching that's efficient, not the actual insertion. The fast searching makes the whole operation "adding a value and keeping all values in order" fast compared to, for example, appending and then sorting again: O(n) rather than O(n log n).
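A short example of what bisect does in practice (values are illustrative):
import bisect

scores = [10, 20, 30, 40]

# bisect_left finds the insertion point by binary search: O(log n)
pos = bisect.bisect_left(scores, 25)
print(pos)        # 2

# insort does the same search, then list.insert shifts elements: O(n) overall
bisect.insort(scores, 25)
print(scores)     # [10, 20, 25, 30, 40]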
