Big-O of loop over dictionary - python

I'm interested in the big-O complexity of looping over the same dictionary twice and then looping over the length of the value stored under a specific key, as shown in the pseudocode below. What is the big-O for each loop, and what would the final big-O be?
I have tried looking at other big-O threads on here, but they are either confusing to me, given my limited knowledge of big-O, or not as case-specific as what I'm looking for.
Thanks
dictionary = ...  # a dictionary with 100 keys; each value is a string of 10-20 characters

for Akey in dictionary:
    # do something
    ...

for Bkey in dictionary:
    # do something
    ...
    for i in range(len(dictionary[Bkey])):
        # do something
        ...

Python dictionaries are based on a hash table, so the worst case of finding an element is O(n). However, the amortized average case is O(1). So if you loop over all elements it is O(1) × n -> O(n), unless you hit the degenerate case of bad hash codes, in which case it becomes O(n^2). If you do several similar operations, but the number of those operations is fixed and doesn't depend on n, it doesn't change the big-O.
If you nest a loop inside another loop, you have to multiply the costs. Here the inner loop runs over a value of bounded length (10-20 characters), so that is O(n * const) -> O(n). Now, you were talking about doing something with the keys but didn't mention the values.
From the pseudocode it seems that you want to take a list of the keys; iterating over all the keys is O(n).
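As a rough sketch of where those costs come from (assuming a dictionary d with n keys whose values are short strings of bounded length):

d = ...  # hypothetical dictionary with n keys; values are strings of at most ~20 characters

for a_key in d:                      # O(n): visits each key once
    pass                             # O(1) work per key

for b_key in d:                      # O(n): visits each key once
    for i in range(len(d[b_key])):   # O(1) average lookup, then at most ~20 iterations -> constant
        pass                         # O(1) work per character

# Total: O(n) + O(n * const) = O(n)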


Complexity for set

This was my interview question, which I got wrong, and I am very confused by it.
fruits = {"apples", "grapes", "oranges", "pears"}
for fruit in fruits:
    print(fruit)
My thinking was that we do an O(1) access N times, so the time complexity is O(N). However, they said that I am incorrect and the answer is O(1). It was multiple-choice, so I did not get feedback. Can anyone explain it?
If you assume the fruits set has a constant number of members (4) in every case, the loop will take constant time every time you run it. ;-)
The complexity is O(1) in most situations.
Explanation:
Lookup/Insert/Delete have O(1) complexity in the average case. The reason is that sets make use of a hash table.
How does the hash table do it in O(1)?
Whenever you try to insert a new value into a set, the value goes through a hash function and its position is decided by that hash function. So, calculating that position is O(1), and inserting at a known position is O(1).
Now, when you wish to find a value x, you just need to apply the hash function to know the position of x. You need not loop through all the values (which would make it O(n)).
When is the lookup O(n)?
As previously explained, you take a value, pass it through the hash function and find the position where it can be placed. Imagine a hash function that, in some cases, places different values at the same location. When you then try to look up a value, you know its position, but you still have to do a linear search among the elements stored there to find the one you want. Imagine the worst-case "hypothetical" scenario where the hash function places all the elements in a single position. Here we again have a linear search among n elements, which makes the lookup O(n).
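A contrived sketch of that worst case (the BadHash class is purely illustrative; its __hash__ sends every instance to the same bucket, so set lookups degrade to a linear scan):

class BadHash:
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 1                      # every instance collides in the same bucket

    def __eq__(self, other):
        return self.value == other.value

items = {BadHash(i) for i in range(1000)}
print(BadHash(999) in items)          # True, but only found after probing the colliding entries -> O(n)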

List vs Dictionary vs Set for finding the number of times a number appears

What data structure - Set, List or Dictionary - in Python will you use to store 10 million integers? Query operations consist of finding the number of times a number appears in the given set. What will be the worst case time and space complexity in all the cases?
This is a sample interview question. What would be the most appropriate answer?
The key to this question is the line that states:
"finding the number of times a number appears in a given set"
The set data structure is incapable of keeping a count of how many times a number appears within the total dataset, and a list is going to be extremely costly to iterate over, which leaves a dictionary as the only viable option.
Breaking down the options:
Set:
A set automatically de-duplicates values that are added to it, so it would be impossible to query the frequency with which a number appears in the stored dataset: the answer for every stored number would be 1.
Time complexity for querying: O(1)
Space complexity for storing: O(n)
List:
A list could be iterated over to determine the frequency of a given number within the list. However, this is an O(n) operation, and for 10 million integers it will not be efficient.
Time complexity for querying: O(n)
Space complexity for storing: O(n)
Dictionary:
A dictionary allows you to store key-value pairs. In this case, you would store the number to be searched as the key, and the count of how many times it has been seen as the associated value. Because dictionaries hash keys into distinct buckets (there can be collisions, but let's assume a non-colliding theoretical dictionary for now), the lookup time for a given key approaches O(1). Building the counts, however, is where a dictionary pays its cost: it takes O(n) time to calculate the counts for all keys, because each input number has to be visited at least once in order to add to the running count stored in its value.
Time complexity for querying: O(1)
Time complexity for storing: O(n)
Space complexity for storing: O(2n) = O(n)
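A minimal sketch of that approach using collections.Counter (the build is the O(n) pass; each subsequent query is an average-case O(1) dictionary lookup):

from collections import Counter

numbers = [5, 3, 5, 9, 3, 5]       # stand-in for the 10 million integers
counts = Counter(numbers)          # O(n): one pass builds the number -> count mapping

print(counts[5])                   # O(1) average: prints 3
print(counts[42])                  # O(1) average: prints 0 for a number never stored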
Adding to the answer of @John Stark:
From the Python wiki, the worst-case time complexity for querying a set is O(n). This is because it uses a hash to find the value, but with (a LOT of) bad luck you might have a hash collision for every key. In the vast majority of cases, however, you won't have collisions.
Also, because the keys here are integers, you reduce hash collisions if the range of integers is limited. In Python 2, with the type int, the hash of an integer is the integer itself, so you can't get colliding hashes for distinct ints.
Add every number to the dict as number: 1 if the number is not already in the dict; otherwise add 1 to the value of that key.
Then look up a specific number as the key, and the value will be the number of times it appears.
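That approach looks roughly like this (the list of numbers is just illustrative):

numbers = [5, 3, 5, 9, 3, 5]

counts = {}
for number in numbers:
    counts[number] = counts.get(number, 0) + 1   # insert as 1, or add 1 to the existing value

print(counts.get(5, 0))   # 3: the number of times 5 appears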

What is the time complexity of dict.keys() in Python?

I came across a question when I solved this LeetCode problem. Although my solution got accepted by the system, I still do not have any idea after searching online for the following question:
What is the time complexity of dict.keys() operation?
Does it return a view of the keys or a real list (stores in memory) of the keys?
In Python 2, it's O(n), and it builds a new list. In Python 3, it's O(1), but it returns a view of the keys rather than a list. To draw a random element from a dict's keys, you'd need to convert that view to a list, and the conversion is O(n).
It sounds like you were probably using random.choice(d.keys()) for part 3 of that problem. If so, that call was O(n), even though your solution was accepted. To get true O(1), you need to either implement your own hash table or maintain a separate list of elements alongside the dict, without sacrificing average-case O(1) insertions and deletions.
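A common way to do that (a sketch of one possible design, not necessarily the problem's reference solution) is to keep the elements in a list for O(1) random access plus a dict mapping each element to its index, swapping with the last element on removal:

import random

class RandomizedSet:
    def __init__(self):
        self.items = []      # elements, for O(1) random access
        self.index = {}      # element -> position in self.items

    def insert(self, val):
        if val in self.index:
            return False
        self.index[val] = len(self.items)
        self.items.append(val)
        return True

    def remove(self, val):
        if val not in self.index:
            return False
        pos = self.index.pop(val)
        last = self.items.pop()
        if pos < len(self.items):        # move the former last element into the freed slot
            self.items[pos] = last
            self.index[last] = pos
        return True

    def get_random(self):
        return random.choice(self.items)   # O(1): no conversion of dict keys needed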

Using dictionary instead of sorting and then searching

I was studying hash tables and a thought came:
Why not use a dictionary for searching an element instead of first sorting the list and then doing binary search? (Assume that I want to search multiple times.)
We can convert a list to a dictionary in O(n) time (I think), because we have to go through all the elements.
Adding each of those elements to the dictionary takes O(1) time.
When the dictionary is ready, we can then search for any element in O(1) time on average, with O(n) as the worst case.
Now, for the average case, O(n) is better than sorting first, because sorting takes at best O(n log n). And if I am right about all of this, then why not do it this way?
I know there are various other things you can do with sorted elements that cannot be done with an unsorted dictionary or array. But if we stick only to searching, isn't this a better way than sorting and then searching?
Right, a well-designed hash table can beat sorting and searching.
For a proper choice, many factors come into play, such as in-place requirements, how dynamic the data set is, the ratio of searches to insertions/deletions, and how easy it is to build an effective hash function...
Binary search is a searching technique that exploits the fact that the list of keys to be searched is already sorted; it doesn't require you to sort and then search, which makes its worst-case search time O(log n).
If you do not have a sorted list of keys and want to search for a key, then you will have to do a linear search, which in the worst case runs in O(n). There is no need to sort and then search, which would definitely be slower, since the best known comparison sorts take O(n log n) time.
Building a dictionary from a list of keys and then performing a single lookup is of no advantage here, because a linear search yields the same or better performance, and the dictionary also needs auxiliary memory. However, if you have multiple lookups and the key space is small, using a dictionary can be an advantage: building the dictionary is one-time work of O(n), and subsequent lookups cost O(1) each, at the expense of the memory used by the dictionary.
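A sketch of the multiple-lookup case (the data and queries here are just illustrative): build the hash-based structure once in O(n), then answer each query in O(1) on average. A set is enough when you only test membership:

data = [17, 4, 23, 42, 8, 15, 16]
queries = [42, 99, 4]

lookup = set(data)           # one-time O(n) build

for q in queries:
    print(q, q in lookup)    # each membership test is O(1) on average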

Can you speed up a "for" loop in Python with sorting?

If I have a long unsorted list of 300k elements, will sorting this list first and then doing a "for" loop over the list speed up my code? I need to do a "for" loop regardless; I can't use a list comprehension.
sortedL = sorted(unsortedL)      # unsortedL stands in for the original list; list.sort() sorts in place and returns None, so use sorted()
for i in sortedL:
    if i == somenumber:
        # do some work
        ...
How could I signal to Python that sortedL is sorted, so it doesn't read the whole list? Is there any benefit to sorting the list? If there is, how can I implement it?
It would appear that you're considering sorting the list so that you could then quickly look for somenumber.
Whether the sorting will be worth it depends on whether you are going to search once, or repeatedly:
If you're only searching once, sorting the list will not speed things up. Just iterate over the list looking for the element, and you're done.
If, on the other hand, you need to search for values repeatedly, by all means pre-sort the list. This will enable you to use bisect to quickly look up values.
The third option is to store elements in a dict. This might offer the fastest lookups, but will probably be less memory-efficient than using a list.
The cost of a for loop in python is not dependent on whether the input data is sorted.
That being said, you might be able to break out of the for loop early, or make other computation-saving changes at the algorithm level, if you sort first.
If you want to search within a sorted list, you need an algorithm that takes advantage of the sorting.
One possibility is the built-in bisect module. This is a bit of a pain to use, but there's a recipe in the documentation for building simple sorted-list functions on top of it.
With that recipe, you can just write this:
i = index(sortedL, somenumber)
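For reference, that recipe's index helper is essentially the following, built on bisect_left:

from bisect import bisect_left

def index(a, x):
    """Locate the leftmost value exactly equal to x in the sorted list a."""
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError(f"{x!r} is not in the list")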
Of course if you're just sorting for the purposes of speeding up a single search, this is a bit silly. Sorting will take O(N log N) time, then searching will take O(log N), for a total of O(N log N); just doing a linear search takes O(N) time. So, unless you're typically doing on the order of log N or more searches on the same list, this isn't worth doing.
If you don't actually need sorting, just fast lookups, you can use a set instead of a list. This gives you O(1) lookup for all but pathological cases.
Also, if you want to keep a list sorted while continuing to add/remove/etc., consider using something like blist.sortedlist instead of a plain list.
