QUICKEST way to find a key in a dictionary - python

I have a dictionary with over 11 million keys (and each value is a list). Each key is a unique integer.
e.g.
Dict1 = {11:"a",12:"b",22:"c",56:"d"}
Then, separately, I have a list of ranges, e.g.
[10-20,30-40,50-60]
And I want to say, for each range in my list of ranges, go through the dictionary and return the value, if the key is within the range.
So it would return:
10-20: "a","b"
50-60: "d"
The actual code that I used is:
for each_key in sorted(dictionary):
    if each_key in range(start,end):
        print str(dictionary[each_key])
The problem is that this technique is prohibitively slow because it goes through all 11 million keys and checks whether each one is within the range or not.
Is there a way that I can say "skip through all of the dictionary keys until one is found that is higher than the start number" and then "stop once the key is higher than the end number"? Basically, some way that zooms in very quickly on the portion of the dictionary within a certain range?
Thanks

Just use Python's EAFP principle. It's Easier to Ask Forgiveness than Permission.
Assume that all keys are valid, and catch the error if they're not:
for key in xrange(start, end):
    try:
        print str(dictionary[key])
    except KeyError:
        pass
This will just try to get each number as a key, and if there's a KeyError from a non existent key then it will move on to the next iteration.
Note that if you expect a lot of the keys will be missing, it might be faster to test first:
for key in xrange(start, end):
    if key in dictionary:
        print str(dictionary[key])
Note that xrange is just a slightly different function to range. It will produce the values one by one instead of creating the whole list in advance. It's useful to use in for loops and has no drawbacks in this case.

My thought for this problem is to find the correct keys first. Your solution takes too much time because it uses an O(n) scan to find the matching keys. With binary search over the sorted keys, finding each range boundary is reduced to O(log(n)), which helps a lot.
Following is my sample code. It works for the example, but I cannot promise it is free of small bugs. Take the idea from it and implement your own.
def binarySearch(alist, target):
    # Return the position of the first element >= target in the sorted list
    left = 0
    right = len(alist) -1
    if target>alist[-1]:
        return len(alist)
    while left < right:
        m = (left + right) // 2   # integer division
        if alist[m] == target:
            return m
        if alist[m] < target:
            left = m+1
        else:
            right = m
    return left
def work(dictionary, start, end):
    keys = sorted(dictionary.keys())
    start_pos = binarySearch(keys, start)
    end_pos = binarySearch(keys, end)
    print [dictionary[keys[pos]] for pos in range(start_pos,end_pos)]
dictionary = {11:"a",12:"b",22:"c",56:"d"}
work(dictionary, 10, 20)
work(dictionary, 20, 40)
work(dictionary, 10, 60)
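For what it's worth, the standard library's bisect module already provides this binary search, so a hand-written one may not be needed. Below is a minimal sketch of the same idea; the name values_in_range is made up for illustration, and it assumes the keys are sorted once up front and reused for every range (sorting inside each call would cost more than the scan you are trying to avoid). The range is treated as start <= key < end.

```python
from bisect import bisect_left

def values_in_range(dictionary, sorted_keys, start, end):
    """Return the values whose keys k satisfy start <= k < end."""
    lo = bisect_left(sorted_keys, start)   # first position with key >= start
    hi = bisect_left(sorted_keys, end)     # first position with key >= end
    return [dictionary[k] for k in sorted_keys[lo:hi]]

dictionary = {11: "a", 12: "b", 22: "c", 56: "d"}
sorted_keys = sorted(dictionary)           # sort once, reuse for every range

for start, end in [(10, 20), (30, 40), (50, 60)]:
    print(start, end, values_in_range(dictionary, sorted_keys, start, end))
```

If the end of each range should be inclusive, bisect_right can be used for the upper bound instead.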

This solution (using OrderedDict and filter) can help you a bit.
from collections import OrderedDict
d = {2:3, 10:89, 4:5, 23:0}
od = OrderedDict(sorted(d.items()))
lst=["1-10","11-20","21-30"]
lower_lst=map(int,[i.split("-")[0] for i in lst])
upper_lst=map(int,[i.split("-")[1] for i in lst])
for low,up in zip(lower_lst,upper_lst):
    print "In range {0}-{1}".format(low,up),filter(lambda a:low <= a[0] <= up,od.iteritems())

Related

Recursively generating a list of lists in a triangular format given a height and value

I recently started looking into recursion to clean up my code and "up my game" as it were. As such, I'm trying to do things which could normally be accomplished rather simply with loops, etc., but practicing them with recursive algorithms instead.
Currently, I am attempting to generate a two-dimensional array which should theoretically resemble a sort of right-triangle in an NxN formation given some height n and the value which will get returned into the 2D-array.
As an example, say I call: my_function(3, 'a');, n = 3 and value = 'a'
My output returned should be: [['a'], ['a', 'a'], ['a', 'a', 'a']]
[['a'],
['a', 'a'],
['a', 'a', 'a']]
Wherein n determines both how many lists will be within the outermost list, as well as how many elements should successively appear within those inner-lists in ascending order.
As it stands, my code currently looks as follows:
def my_function(n, value):
    base_val = [value]
    if n == 0:
        return [base_val]
    else:
        return [base_val] + [my_function(n-1, value)]
Unfortunately, using my above example n = 3 and value = 'a', this currently outputs: [['a'], [['a'], [['a'], [['a']]]]]
Now, this doesn't have to get formatted or printed the way I showed above in a literal right-triangle formation (that was just a visualization of what I want to accomplish).
I will answer any clarifying questions you need, of course!
return [base_val]
Okay, for n == 0 we get [[value]]. Solid. Er, sort of. That's the result with one row in it, right? So, our condition for the base case should be n == 1 instead.
Now, let's try the recursive case:
return [base_val] + [my_function(n-1, value)]
We had [[value]], and we want to end up with [[value], [value, value]]. Similarly, when we have [[value], [value, value]], we want to produce [[value], [value, value], [value, value, value]] from it. And so on.
The plan is that we get one row at the moment, and all the rest of the rows by recursing, yes?
Which rows will we get by recursing? Answer: the ones at the beginning, because those are the ones that still look like a triangle in isolation.
Therefore, which row do we produce locally? Answer: the one at the end.
Therefore, how do we order the results? Answer: we need to get the result from the recursive call, and add a row to the end of it.
Do we need to wrap the result of the recursive call? Answer: No. It is already a list of lists. We're just going to add one more list to the end of it.
How do we produce the last row? Answer: we need to repeat the value, n times, in a list. Well, that's easy enough.
Do we need to wrap the local row? Answer: Yes, because we want to append it as a single item to the recursive result - not concatenate all its elements.
Okay, let's re-examine the base case. Can we properly handle n == 0? Yes, and it makes perfect sense as a request, so we should handle it. What does our triangle look like with no rows in it? Well, it's still a list of rows, but it doesn't have any rows in it. So that's just []. And we can still append the first row to that, and proceed recursively. Great.
Let's put it all together:
if n == 0:
    return []
else:
    return my_function(n-1, value) + [[value] * n]
Looks like base_val isn't really useful any more. Oh well.
We can condense that a little further, with a ternary expression:
return [] if n == 0 else (my_function(n-1, value) + [[value] * n])
You have a couple logic errors: off-by-1 with n, growing the wrong side (critically, the non-base implementation should not use a base-sized array), growing by an array of the wrong size. A fixed version:
#!/usr/bin/env python3
def my_function(n, value):
    if n <= 0:
        return []
    return my_function(n-1, value) + [[value]*n]
def main():
    print(my_function(3, 'a'))
if __name__ == '__main__':
    main()
Since you're returning a mutable list, you can get some more efficiency by using .append rather than +, which would make it no longer purely functional. Also note that the inner mutable objects don't get copied (but since the recursion is internal this doesn't really matter in this case).
It would be possible to write a tail-recursive version of this instead, by adding a parameter.
But python is a weird language for using unnecessary recursion.
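To illustrate the "adding a parameter" remark, here is a rough sketch of an accumulator-style version. The extra rows parameter is my own naming, not something from the question. Keep in mind that CPython does not eliminate tail calls, so this is still bounded by the recursion limit; it only shows the shape of the idea.

```python
def my_function(n, value, rows=None):
    # rows is a hypothetical accumulator holding the rows built so far
    if rows is None:
        rows = []
    if n <= 0:
        return rows
    # Put the row of length n in front of the longer rows already built,
    # then recurse for the shorter ones; the recursive call is the last step.
    return my_function(n - 1, value, [[value] * n] + rows)

print(my_function(3, 'a'))  # [['a'], ['a', 'a'], ['a', 'a', 'a']]
```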
The easiest way for me to think about recursive algorithms is in terms of the base case and how to build on that.
The base case (case where no recursion is necessary) is when n = 1 (or n = 0, but I'm going to ignore that case). A 1x1 "triangle" is just a 1x1 list: [[a]].
So how do we build on that? Well, if n = 2, we can assume we already have that base case value (from calling f(1)) of [[a]]. So we need to add [a, a] to that list.
We can generalize this as:
f(1) = [[a]]
f(n > 1) = f(n - 1) + [[a] * n]
, or, in Python:
def my_function(n, value):
    if n == 1:
        return [[value]]
    else:
        return my_function(n - 1, value) + [[value] * n]
While the other answers proposed a different algorithm for solving your problem, it could also have been solved by correcting your solution.
Using a helper function such as:
def indent(x, lst):
    new_lst = []
    for val in lst:
        new_lst.append([x] + val)   # prepend x to each row
    return new_lst
you can implement the return in the original function (with the base case changed to n == 1, so that n rows are produced) as:
return [base_val] + indent(value, my_function(n-1, value))
The other solutions are more elegant though so feel free to accept them.
Here is an image explaining this solution.
The red part is your current function call and the green one the previous function call.
As you can see, we also need to add the yellow part in order to complete the triangle.
These are the other solutions.
In these solutions you only need to add a new row, so that it's more elegant overall.

Test if all values of a dictionary are equal - when value is unknown

I have 2 dictionaries:
the values in each dictionary should all be equal.
BUT I don't know what that number will be...
dict1 = {'xx':A, 'yy':A, 'zz':A}
dict2 = {'xx':B, 'yy':B, 'zz':B}
N.B. A does not equal B
N.B. Both A and B are actually strings of decimal numbers (e.g. '-2.304998') as they have been extracted from a text file
I want to create another dictionary - that effectively summarises this data - but only if all the values in each dictionary are the same.
i.e.
summary = {}
if dict1['xx'] == dict1['yy'] == dict1['zz']:
    summary['s'] = dict1['xx']
if dict2['xx'] == dict2['yy'] == dict2['zz']:
    summary['hf'] = dict2['xx']
Is there a neat way of doing this in one line?
I know it is possible to create a dictionary using comprehensions
summary = {k:v for (k,v) in zip(iterable1, iterable2)}
but am struggling with both the underlying for loop and the if statement...
Some advice would be appreciated.
I have seen this question, but the answers all seem to rely on already knowing the value being tested (i.e. are all the entries in the dictionary equal to a known number) - unless I am missing something.
sets are a solid way to go here, but just for code golf purposes here's a version that can handle non-hashable dict values:
expected_value = next(iter(dict1.values())) # check for an empty dictionary first if that's possible
all_equal = all(value == expected_value for value in dict1.values())
all terminates early on a mismatch, but the set constructor is well enough optimized that I wouldn't say that matters without profiling on real test data. Handling non-hashable values is the main advantage to this version.
One way to do this would be to leverage set. You know a set of an iterable has a length of 1 if there is only one value in it:
if len(set(dct.values())) == 1:
    summary[k] = next(iter(dct.values()))
This of course, only works if the values of your dictionary are hashable.
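Since the question asked for a one-liner, the set test could be folded into a dict comprehension along these lines. This is only a sketch: the sources mapping is a name I made up to pair each summary key with its dictionary, and the values in dict2 are invented so that one dictionary fails the test.

```python
dict1 = {'xx': '-2.304998', 'yy': '-2.304998', 'zz': '-2.304998'}
dict2 = {'xx': '0.5', 'yy': '0.5', 'zz': '0.75'}   # made-up values, not all equal

# Hypothetical helper mapping: summary key -> dictionary to check
sources = {'s': dict1, 'hf': dict2}

summary = {
    name: next(iter(d.values()))          # any value will do, they are all equal
    for name, d in sources.items()
    if len(set(d.values())) == 1          # exactly one distinct value
}
print(summary)  # {'s': '-2.304998'}
```

An empty dictionary gives a set of length 0, so it is simply left out of the summary.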
While we can use set for this, doing so has a number of inefficiencies when the input is large. It can take memory proportional to the size of the input, and it always scans the whole input, even when two distinct values are found early. Also, the input has to be hashable.
For 3-key dicts, this doesn't matter much, but for bigger ones, instead of using set, we can use itertools.groupby and see if it produces multiple groups:
import itertools
groups = itertools.groupby(dict1.values())
# Consume one group if there is one, then see if there's another.
next(groups, None)
if next(groups, None) is None:
    # All values are equal.
    do_something()
else:
    # Unequal values detected.
    do_something_else()
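Wrapped up as a reusable function, the groupby check might look like the sketch below. The name all_equal is my own choice. It treats an empty input as "all equal", and it only needs == on the values, so unhashable values such as lists are fine.

```python
import itertools

def all_equal(values):
    """True if every item in values compares equal (or there are none)."""
    groups = itertools.groupby(values)
    next(groups, None)                 # skip the first run of equal values, if any
    return next(groups, None) is None  # a second run means a different value appeared

print(all_equal({'xx': 'A', 'yy': 'A', 'zz': 'A'}.values()))  # True
print(all_equal({'xx': 'A', 'yy': 'B', 'zz': 'A'}.values()))  # False
```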
Except for readability, I don't care for the answers involving set or .values. All of these are always O(N) in time and memory. An early-exit scan can be faster in practice, although that depends on the distribution of values.
Also, because set employs hashing operations, you may have a hefty constant multiplier on your time cost. And your values have to be hashable, when a test for equality is all that's needed.
It is theoretically better to take the first value from the dictionary and search the remaining values for the first one that is not equal to it.
set might still be quicker than the solution below, because its workings may reduce to C implementations.
def all_values_equal(d):
    if len(d)<=1: return True # Treat len0 len1 as all equal
    i = d.itervalues()
    firstval = i.next()
    try:
        # Incrementally generate all values not equal to firstval
        # .next raises StopIteration if empty.
        (j for j in i if j!=firstval).next()
        return False
    except StopIteration:
        return True
print all_values_equal({1:0, 2:1, 3:0, 4:0, 5:0}) # False
print all_values_equal({1:0, 2:0, 3:0, 4:0, 5:0}) # True
print all_values_equal({1:"A", 2:"B", 3:"A", 4:"A", 5:"A"}) # False
print all_values_equal({1:"A", 2:"A", 3:"A", 4:"A", 5:"A"}) # True
In the above:
(j for j in i if j!=firstval)
is equivalent to:
def gen_neq(i, val):
    """
    Give me the values of iterator i that are not equal to val
    """
    for j in i:
        if j!=val:
            yield j
I found this solution, which I like quite a bit; I combined it with another solution found elsewhere:
user_min = {'test':1,'test2':2}
all(value == list(user_min.values())[0] for value in user_min.values())
>>> user_min = {'test':1,'test2':2}
>>> all(value == list(user_min.values())[0] for value in user_min.values())
False
>>> user_min = {'test':2,'test2':2}
>>> all(value == list(user_min.values())[0] for value in user_min.values())
True
>>> user_min = {'test':'A','test2':'B'}
>>> all(value == list(user_min.values())[0] for value in user_min.values())
False
>>> user_min = {'test':'A','test2':'A'}
>>> all(value == list(user_min.values())[0] for value in user_min.values())
True
Good for a small dictionary, but I'm not sure about a large dictionary, since we get all the values to choose the first one
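One way around building a list just to pick the first value is to pull it from an iterator and compare the rest against it. A sketch for Python 3 (the function name is arbitrary, and treating an empty dictionary as "all equal" is my assumption):

```python
def all_values_equal(d):
    values = iter(d.values())
    try:
        first = next(values)         # take the first value without copying anything
    except StopIteration:
        return True                  # empty dict: treated as all equal
    # all() stops at the first mismatch, so large dicts can exit early
    return all(v == first for v in values)

print(all_values_equal({'test': 1, 'test2': 2}))      # False
print(all_values_equal({'test': 'A', 'test2': 'A'}))  # True
```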

Fastest way to return duplicate element in list and also find missing element in list?

So my code is as shown below. The input is a list with exactly one duplicate item and one missing item. The answer is a list of two elements: the first is the duplicate element in the list and the second is the missing element in the range 1 to n.
Example =[1,4,2,5,1] answer=[1,3]
The code below works.
Am I wrong about the complexity being O(n), and is there any faster way of achieving this in Python?
Also, is there any way I can do this without using extra space?
Note: The number of elements may be of the order 10^5 or larger
n = max(A)
answer = []
seen = set()
for i in A:
    if i in seen:
        answer.append(i)
    else:
        seen.add(i)
for i in xrange(1,n):
    if i not in A:
        answer.append(i)
print answer
You are indeed correct that finding the duplicate with a set is O(n), which is the best you can achieve. Note, however, that the second loop tests "i not in A" against a list, which is a linear scan per test, so that loop is O(n^2) as written; testing against seen instead keeps it O(n). You can try to optimize by aborting the search as soon as you find the duplicate value, but in the worst case the duplicate is at the back of the list and you still need to traverse it completely.
The use of hashing (your use of a set) is a good solution. There are a lot of other approaches, for instance the use of Counters, but they won't change the asymptotic complexity of the algorithm.
As @Emisor advises, you can leverage the fact that the list has exactly one duplicate and one missing value. If the list had no duplicate and no missing value, summing all its elements would give 1+2+3+..+n, which equals n*(n+1)/2.
When you've discovered the duplicate value, you can calculate the missing value, without having to perform:
for i in xrange(1,n):
    if i not in A:
        answer.append(i)
You know what the sum would be if all values were present: total = n*(n+1)/2 = 15 (for n = 5), and you know which value is duplicated. Taking the sum of the array A = [1,4,2,5,1], which is 13, and removing the duplicated value 1 gives 12.
Subtracting that 12 from the calculated total gives 3.
This can all be written in a single line (len(A) equals n here, since one value is duplicated and one is missing):
(len(A)*(len(A)+1))//2 - sum(A) + duplicate
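Putting the set scan and the sum formula together, a minimal sketch could look like this. The function name is made up, and it assumes the input really contains the values 1..n with exactly one value duplicated and one missing, so that n equals len(A).

```python
def find_duplicate_and_missing(A):
    n = len(A)                       # one duplicate + one missing => len(A) == n
    seen = set()
    duplicate = None
    for x in A:
        if x in seen:
            duplicate = x            # second sighting: this is the duplicate
            break                    # no need to scan further
        seen.add(x)
    expected_total = n * (n + 1) // 2             # 1 + 2 + ... + n
    missing = expected_total - (sum(A) - duplicate)
    return [duplicate, missing]

print(find_duplicate_and_missing([1, 4, 2, 5, 1]))  # [1, 3]
```

Using len(A) rather than max(A) for n also covers the case where the missing element is n itself.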
Slight optimization (I think)
def lalala2(A):
    _max = 0
    _sum = 0
    seen = set()
    duplicate = None
    for i in A:
        _sum += i
        if _max < i:
            _max = i
        if i in seen:
            duplicate = i
        elif duplicate is None:
            seen.add(i)
    missing = -_sum + duplicate + (_max*(_max + 1)//2) # This last term means the sum of every number from 1 to N
    return [duplicate , missing]
Looks a bit uglier, and I'm doing stuff like sum() and max() on my own instead of relying on Python's tools. But this way, we only check every element once. Also, it'll stop adding stuff to the set once it has found the duplicate, since it can calculate the missing element from it once it knows the max.
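On the "without using extra space" part of the question: one possibility, which is not taken from the answers above, is to drop the set entirely and use the sum together with the sum of squares. If d is the duplicate and m the missing value, then sum(A) minus the expected sum is d - m, and the sum of squares minus the expected sum of squares is d^2 - m^2 = (d - m)(d + m). A sketch, under the same assumption that the values are 1..n with exactly one duplicate and one missing (the function name is invented):

```python
def find_duplicate_and_missing_o1(A):
    n = len(A)
    # d - m: the duplicate adds d once too often, the missing m is absent
    diff = sum(A) - n * (n + 1) // 2
    # d^2 - m^2, using 1^2 + ... + n^2 = n(n+1)(2n+1)/6
    sq_diff = sum(x * x for x in A) - n * (n + 1) * (2 * n + 1) // 6
    total = sq_diff // diff          # d + m (exact division, diff is never 0)
    duplicate = (total + diff) // 2
    missing = (total - diff) // 2
    return [duplicate, missing]

print(find_duplicate_and_missing_o1([1, 4, 2, 5, 1]))  # [1, 3]
```

It still makes passes over the list, so the time stays O(n), but the only extra storage is a few integers.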

How to make this code execute faster?

def lookfor(alist, number):
    if number in alist:
        return alist.index(number)
    else:
        return "no"
So basically I input hundreds of thousands of numbers and I have to send each one of them through "lookfor" to get an output of either the index of "number" in "alist" or "no" if the number isn't there.
It runs fine when I input only a few numbers, but takes several minutes when I input xx,xxx-xxx,xxx numbers.
Any suggestions?
Your code iterates through the list until it finds the number you seek (or until it reaches the end), and if it does find the number, it has to iterate the exact same amount to return the index. Why not take advantage of the behavior of the .index method? Just keep in mind that it raises a ValueError if the number is not present in the list.
def lookfor(alist, number):
    try:
        return alist.index(number)
    except ValueError:
        return "no"
afterword: use the timeit module to find the most efficient solution, but be sure to use a variety of inputs so that you can find the overall fastest solution.
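As a rough idea of what such a timeit comparison could look like, here is a small sketch. The function names, the list size and the repeat count are all arbitrary choices of mine; the actual numbers will depend on your machine and data, so none are shown.

```python
import timeit

def lookfor_in_then_index(alist, number):
    # Original approach: one scan for "in", a second scan for index()
    if number in alist:
        return alist.index(number)
    return "no"

def lookfor_try(alist, number):
    # Single scan; index() raises ValueError when the number is absent
    try:
        return alist.index(number)
    except ValueError:
        return "no"

alist = list(range(10000))           # made-up test data

# Try both a present (worst position) and an absent number
for label, number in [("present", 9999), ("absent", -1)]:
    for func in (lookfor_in_then_index, lookfor_try):
        seconds = timeit.timeit(lambda: func(alist, number), number=100)
        print(label, func.__name__, seconds)
```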
def index_on(lst):
    index = {val:i for i,val in enumerate(lst)}
    def lookup(val):
        return index.get(val, 'no')
    return lookup
search = index_on(alist)
search('123-4567') # => 293 (index in alist)
search('123-4500') # => 'no' (not found)
Your code currently needs to search through the entire list for each call to lookfor. This can be very slow if alist is big enough.
Instead, you should create a dictionary that maps each element to its index in alist. For example, for alist = [7,4,88], you'd have: indexmap = {7:0, 4:1, 88:2}. Then you can search the dictionary with:
def lookfor(indexmap, number):
    return indexmap.get(number, "no")
If alist is constant, you can create indexmap during initialization:
indexmap = {number: index for index,number in enumerate(alist)}
If alist changes over time, you can maintain this dictionary together with alist. For example, if you normally add items with append, you can use:
alist.append(number)
if number not in indexmap:
    indexmap[number] = len(alist) - 1
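One detail worth checking if alist can contain repeated numbers: a plain dict comprehension over enumerate keeps the last position of each value, whereas alist.index returns the first. If the first position is what you need, the map could be built like this (a sketch; build_index is a name I made up):

```python
def build_index(alist):
    """Map each value to the position of its first occurrence in alist."""
    index = {}
    for position, value in enumerate(alist):
        index.setdefault(value, position)   # keep the earliest position only
    return index

alist = [7, 4, 88, 4]                        # 4 appears twice
indexmap = build_index(alist)
print(indexmap.get(4, "no"))                 # 1, same as alist.index(4)
print(indexmap.get(5, "no"))                 # no
```

Building the map is a single O(n) pass, after which each lookup is a constant-time dictionary access on average.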

Why doesn't this sort function work for Python

Please tell me why this sort function for Python isn't working :)
def sort(list):
    if len(list)==0:
        return list
    elif len(list)==1:
        return list
    else:
        for b in range(1,len(list)):
            if list[b-1]>list[b]:
                print (list[b-1])
                hold = list[b-1]
                list[b-1]=list[b]
                list[b] = hold
a = [1,2,13,131,1,3,4]
print (sort(a))
It looks like you're attempting to implement a neighbor-sort algorithm. You need to repeat the loop N times. Since you only loop through the array once, you end up with the largest element being in its place (i.e., in the last index), but the rest is left unsorted.
You could debug your algorithm on your own, using pdb.
Or, you could use python's built-in sorting.
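To make the "repeat the loop N times" point concrete, here is a sketch of the neighbor-swap pass repeated until the list is sorted. The names neighbor_sort and items are my own; it works on a copy and returns it, which also takes care of the original function returning nothing in the general case.

```python
def neighbor_sort(items):
    items = list(items)                  # work on a copy, leave the input alone
    n = len(items)
    for done in range(n - 1):            # at most n - 1 passes are needed
        swapped = False
        # After each pass the largest remaining element is in place,
        # so the last `done` positions can be skipped.
        for b in range(1, n - done):
            if items[b - 1] > items[b]:
                items[b - 1], items[b] = items[b], items[b - 1]
                swapped = True
        if not swapped:                  # a pass without swaps: already sorted
            break
    return items

a = [1, 2, 13, 131, 1, 3, 4]
print(neighbor_sort(a))  # [1, 1, 2, 3, 4, 13, 131]
```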
Let's take a look at your code. Python already has built-in sorting (the sorted function and the list.sort method, in both 2.7 and 3.x), so when you write your own functions, try to avoid names that clash with built-ins unless you really intend to replace them (which is a whole different topic). The same applies to the parameter you used: list is a built-in type in Python, so using it as a variable name shadows the type inside your function. Now for some work on your code, after renaming the variables...
As you go through your function, you only swap two neighboring elements when needed, in a single pass. This will not work for all lists. You have to be able to check that the element at the current index i is in its correct place. So if the last element is the lowest in the list, it has to be swapped all the way to the front of the list. There are many ways of sorting (e.g. quicksort, merge sort, bubble sort) and this isn't the best way... :) Here is some help:
def sortThis(L):
    if (len(L) == 0 or len(L) == 1):
        return L
    else:
        for i in range(len(L)):
            value = L[i]
            j = i - 1
            # Shift larger elements right, then drop value into the gap
            while (j >= 0) and (L[j] > value):
                L[j+1] = L[j]
                j -= 1
            L[j+1] = value
        return L
a = [1,2,13,131,1,3,4]
sortThis(a)
print a
Take a look at this for more sorting Fun: QuickSort MergeSort
If it worked, it would be the best sorting algorithm in the world (O(n)). Your algorithm only puts the greatest element at the end of the list. You have to apply your function recursively to list[:-1].
You should not use the names of Python built-ins (such as list) as variable names.
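The recursive suggestion could be sketched roughly as follows: one pass pushes the greatest element to the end, then the same function sorts everything before it. The names bubble_up and recursive_sort are invented here. Slicing copies the list at every level, and very long lists would hit the recursion limit, so treat this as an illustration rather than something to use on big inputs.

```python
def bubble_up(items):
    """One neighbor-swap pass: moves the greatest element to the end, in place."""
    for b in range(1, len(items)):
        if items[b - 1] > items[b]:
            items[b - 1], items[b] = items[b], items[b - 1]

def recursive_sort(items):
    items = list(items)                  # copy so the caller's list is untouched
    if len(items) <= 1:
        return items
    bubble_up(items)                     # greatest element is now items[-1]
    return recursive_sort(items[:-1]) + [items[-1]]

print(recursive_sort([1, 2, 13, 131, 1, 3, 4]))  # [1, 1, 2, 3, 4, 13, 131]
```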
