I have a function which receives an integer as an input and depending on what range this input lies in, assigns to it a difficulty value. I know that this can be done using if else loops. I was wondering whether there is a more efficient/cleaner way to do it.
I tried to do something like this
TIME_RATING_KEY ={
range(0,46):1,
range(46,91):2,
range(91,136):3,
range(136,201):4,
range(201,10800):5,
}
But found out that we can use range as a key in dict(right?). So is there a better way to do this?
You can implement an interval tree. This kind of data structures are able to return all the intervals that intersect a given input point.
In your case intervals don't overlap, so they would always return 1 interval.
Centered interval trees run in O(log n + m) time, where m is the number of intervals returned (1 in your case). So this would reduce the complexity from O(n) to O(log n).
The idea of these interval trees is the following:
You consider the interval that encloses all the intervals you have
Take the center of that interval and partition the given intervals into those that end before that point, those that contain that point and those that start after it.
Recursively construct the same kind of tree for the intervals ending before the center and those starting after it
Keep the intervals that contain the center point in two sorted sequences. One sorted by starting point, and the other sorted by ending point
When searching go left or right depending on the center point. When you find an overlap you use binary search on the sorted sequence you want to check (this allows for looking up not only intervals that contain a given point but intervals that intersect or contain a given interval).
It's trivial to modify the data structure to return a specific value instead of the found interval.
This said, from the context I don't think you actually need to reduce the efficiency of this lookup and you should probably use the simpler and more readable solution since it would be more maintainable and there are less chances to make mistakes.
However reading about the mroe efficient data structure can turn out useful in the future.
The simplest way is probably just to write a short function:
def convert(n, difficulties=[0, 46, 91, 136, 201]):
if n < difficulties[0]:
raise ValueError
for difficulty, end in enumerate(difficulties):
if n < end:
return difficulty
else:
return len(difficulties)
Examples:
>>> convert(32)
1
>>> convert(68)
2
>>> convert(150)
4
>>> convert(250)
5
As a side note: You can use a range as a dictionary key in Python 3.x, but not directly in 2.x (because range returns a list). You could do:
TIME_RATING_KEY = {tuple(range(0, 46)): 1, ...}
However that won't be much help!
Related
I am trying to make an algorithm that propagates from point to point in a distance matrix using the smallest distance in the proximity. The code has two conditions: the minimum distance must be no less than 0 and each point must be visited once and return to the starting position.
This is my code in its entirety:
def totalDistance(aList):
path = []
for j in range(0,len(aList)):
k=j
order = []
for l in range(0,len(aList)):
order.append(k)
initval= min(x for x in aList[k] if x > 0 )
k = aList[k].index(initval)
for s in range(0,len(aList)):
for t in range(0,len(aList[s])):
aList[s][k] = 0
path.append(order)
return path
The code is meant to return the indexes of the points in within the closes proximity of the evaluated point.
aList = [[0,3,4,6],[3,0,7,3],[4,7,0,9],[6,3,9,0]] and represents the distance matrix.
When running the code, I get the following error:
initval= min(x for x in aList[k] if x > 0 )
ValueError: min() arg is an empty sequence
I presume that when I make the columns in my distance matrix zero with the following function:
for s in range(0,len(aList)):
for t in range(0,len(aList[s])):
aList[s][k] = 0
the min() function is unable to find a value with the given conditions. Is there a better way to format my code such that this does not occur or a better approach to this problem all together?
One technique and a pointer on the rest that you say is working...
For preventing re-visiting / backtracking. One of the common design patterns for this is to keep a separate data structure to "mark" the places you've been. Because your points are numerically indexed, you could use a list of booleans, but I think it is much easier to just keep a set of the places you've been. Something like this...
visited = set() # places already seen
# If I decide to visit point/index "3"...
visited.add(3)
Not really a great practice to modify your input data as you are doing, and especially so if you are looping over it, which you are...leads to headaches.
So then... Your current error is occurring because when you screen the rows for x>0 you eventually get an empty list because you are changing values and then min() chokes. So part of above can fix that, and you don't need to zero-ize, just mark them.
Then, the obvious question...how to use the marks? You can just use it as a part of your search. And it can work well with the enumerate command which can return index values and the value by enumeration.
Try something like this, which will make a list of "eligible" tuples with the distance and index location.
pts_to_consider = [(dist, idx) for idx, dist in enumerate(aList[k])
if dist > 0
and idx not in visited]
There are other ways to do this with numpy and other things, but this is a reasonable approach and close to what you have in code now. Comment back if stuck. I don't want to give away the whole farm because this is probably H/W. Perhaps you can use some of the hints here.
Given that we have two lines on a graph (I just noticed that I inverted the numbers on the Y axis, this was a mistake, it should go from 11-1)
And we only care about whole number X axis intersections
We need to order these points from highest Y value to lowest Y value regardless of their position on the X axis (Note I did these pictures by hand so they may not line up perfectly).
I have a couple of questions:
1) I have to assume this is a known problem, but does it have a particular name?
2) Is there a known optimal solution when dealing with tens of billions (or hundreds of millions) of lines? Our current process of manually calculating each point and then comparing it to a giant list requires hours of processing. Even though we may have a hundred million lines we typically only want the top 100 or 50,000 results some of them are so far "below" other lines that calculating their points is unnecessary.
Your data structure is a set of tuples
lines = {(y0, Δy0), (y1, Δy1), ...}
You need only the ntop points, hence build a set containing only
the top ntop yi values, with a single pass over the data
top_points = choose(lines, ntop)
EDIT --- to choose the ntop we had to keep track of the smallest
one, and this is interesting info, so let's return also this value
from choose, also we need to initialize decremented
top_points, smallest = choose(lines, ntop)
decremented = top_points
and start a loop...
while True:
Generate a set of decremented values
decremented = {(y-Δy, Δy) for y, Δy in top_points}
decremented = {(y-Δy, Δy) for y, Δy in decremented if y>smallest}
if decremented == {}: break
Generate a set of candidates
candidates = top_lines.union(decremented)
generate a new set of top points
new_top_points, smallest = choose(candidates, ntop)
The following is no more necessary
check if new_top_points == top_points
if new_top_points == top_points: break
top_points = new_top_points</strike>
of course we are in a loop...
The difficult part is the choose function, but I think that this
answer to the question
How can I sort 1 million numbers, and only print the top 10 in Python?
could help you.
It's not a really complicated thing, just a "normal" sorting problem.
Usually sorting requires a large amount of computing time. But your case is one where you don't need to use complex sorting techniques.
You on both graphs are growing or falling constantly, there are no "jumps". You can use this to your advantage. The basic algorithm:
identify if a graph is growing or falling.
write a generator, that generates the values; from left to right if raising, form right to left if falling.
get the first value from both graphs
insert the lower on into the result list
get a new value from the graph that had the lower value
repeat the last two steps until one generator is "empty"
append the leftover items from the other generator.
I have a list of tuples, which I am using to mark the lower and upper bounds of ranges. For example:
[(3,10), (4,11), (2,6), (8,11), (9,11)] # 5 separate ranges.
I want to find the ranges where three or more of the input ranges overlap. For instance the tuples listed above would return:
[(4,6), (8,11)]
I tried the method provided by #WolframH in answer to this post
But I don't know what I can do to:
Give me more than one output range
Set a threshold of at least three range overlaps to qualify an output
You first have to find all combinations of ranges. Then you can transform them to sets and calculate the intersection:
import itertools
limits = [(3,10), (4,11), (2,6), (8,11), (9,11)]
ranges = [range(*lim) for lim in limits]
results = []
for comb in itertools.combinations(ranges,3):
intersection = set(comb[0]).intersection(comb[1])
intersection = intersection.intersection(comb[2])
if intersection and intersection not in results and\
not any(map(intersection.issubset, results)):
results = filter(lambda res: not intersection.issuperset(res),results)
results.append(intersection)
result_limits = [(res[0], res[-1]+1) for res in map(list,results)]
It should give you all 3-wise intersections
You can, of course, solve this by brute-force checking all the combinations if you want. If you need this algorithm to scale, though, you can do it in (pseudo) nlogn. You can technically come up with a degenerate worst-case that's O(n**2), but whatchagonnado.
Basically, you sort the ranges, then for a given range you look to its immediate left to see that the bounds overlap, and if so you then look right to mark overlapping intervals as results. Pseudocode (which is actually almost valid python, look at that):
ranges.sort()
for left_range, current_range, right_range in sliding_window(ranges, 3):
if left_range.right < current_range.left:
continue
while right_range.left < min(left_range.right, current_range.right):
results.append(overlap(left_range, right_range))
right_range = right_range.next
#Before moving on to the next node, extend the current_range's right bound
#to be the longer of (left_range.right, current_range.right)
#This makes sense if you think about it.
current_range.right = max(left_range.right, current_range.right)
merge_overlapping(results)
(you also need to merge some possibly-overlapping ranges at the end, this is another nlogn operation - though n will usually be much smaller there. I won't discuss the code for that, but it's similar in approach to the above, involving a sort-then-merge. See here for an example.)
I have a dictionary which has coordinates as keys. They are by default in 3 dimensions, like dictionary[(x,y,z)]=values, but may be in any dimension, so the code can't be hard coded for 3.
I need to find if there are other values within a certain radius of a new coordinate, and I ideally need to do it without having to import any plugins such as numpy.
My initial thought was to split the input into a cube and check no points match, but obviously that is limited to integer coordinates, and would grow exponentially slower (radius of 5 would require 729x the processing), and with my initial code taking at least a minute for relatively small values, I can't really afford this.
I heard finding the nearest neighbor may be the best way, and ideally, cutting down the keys used to a range of +- a certain amount would be good, but I don't know how you'd do that when there's more the one point being used.Here's how I'd do it with my current knowledge:
dimensions = 3
minimumDistance = 0.9
#example dictionary + input
dictionary[(0,0,0)]=[]
dictionary[(0,0,1)]=[]
keyToAdd = [0,1,1]
closestMatch = 2**1000
tooClose = False
for keys in dictionary:
#calculate distance to new point
originalCoordinates = str(split( dictionary[keys], "," ) ).replace("(","").replace(")","")
for i in range(dimensions):
distanceToPoint = #do pythagors with originalCoordinates and keyToAdd
#if you want the overall closest match
if distanceToPoint < closestMatch:
closestMatch = distanceToPoint
#if you want to just check it's not within that radius
if distanceToPoint < minimumDistance:
tooClose = True
break
However, performing calculations this way may still run very slow (it must do this to millions of values). I've searched the problem, but most people seem to have simpler sets of data to do this to. If anyone can offer any tips I'd be grateful.
You say you need to determine IF there are any keys within a given radius of a particular point. Thus, you only need to scan the keys, computing the distance of each to the point until you find one within the specified radius. (And if you do comparisons to the square of the radius, you can avoid the square roots needed for the actual distance.)
One optimization would be to sort the keys based on their "Manhattan distance" from the point (that is, add the component offsets), since the Euclidean distance will never be less than this. This would avoid some of the more expensive calculations (though I don't think you need and trigonometry).
If, as you suggest later in the question, you need to handle multiple points, you can obviously process each individually, or you could find the center of those points and sort based on that.
Write the function sinusoid(a, w, n) that will return a list of ordered pairs representing n cycles of a sinusoid with amplitude a and frequency w. Each cycle should contain 180 ordered pairs.
So far I have:
def sinusoid(a,w,n):
return [a*sin(x) for x in range 180]
Please consider the actual functional form of a sinusoidal wave and how the frequency comes into the equation. (Hint: http://en.wikipedia.org/wiki/Sine_wave).
Not sure what is meant exactly by 'ordered pairs', but I would assume it means the x,y pairs. Currently you're only returning a list of single values. Also you might want to take a look at the documentation for Python's sin function.
Okay, we know this is a homework assignment and we're not going to do it for you. However, I'll give you a couple hints.
The instructions:
Write the function sinusoid(a, w, n) that will return a list of ordered pairs representing n cycles of a sinusoid with amplitude a and frequency w. Each cycle should contain 180 ordered pairs.
... translated into a bullet list of requirements:
Write a function
... named sinusoid()
... taking three arguments: a, w, and n
returning a list
... of n cycles(?)
... (each consisting of?) 180 "ordered pairs"
The example you've given does define a function, by the correct name, and taking the correct number of arguments. That's a start (not much of one, frankly, but it's something).
The obvious failings are that it doesn't use two of the arguments that are required and it doesn't return pairs of anything. It seems that it would return 180 numbers which are based on the argument supplied to its first parameter.
Surely you can do a bit better than that.
Let's start with a stub:
def sinusoid(a, w, n):
'''Return n cycles of the sinusoid for a given amplitude and frequence
where each cycle consists of 180 ordered pairs
'''
results = list()
# do stuff here
return results
That's a function, takes three arguments and returns a list. Now for that list to contain anything before we return it we'll have to append some things to it ... and the instructions tell us how many things it should return (n times 180) and what sorts of things they should be (ordered pairs).
That sounds quite a bit like we'll need a loop (for n) and another (for 180). Hmmm ...
That might look like:
for each_cycle in range(n):
for each_pair in range(180):
# do something here
results.append(something) # where something is a tuple ... an "ordered pair"
... or it might look like:
for each_cycle in range(n):
this_cycle = list()
for each_pair in range(180):
this_cycle.append(something)
results.extend(this_cycle)
... or it might even look like:
for each_pair in range(n*180):
results.append(something)
... though, frankly, that seems unlikely. (If you try flattening the inner loop to the outer loop in this way you might find that you're having to use modulo arithmetic to get n back out for some other intermediate computational purposes).
I have no idea what the instructor is actually asking for. It seems likely that the math.sin() function will be involved and I guess "ordered pairs" might be co-ordinates mapped to some sort of graphics subsystem and suitable for plotting a graph. I guess 180 of these to show the sinusoid wave through a full range of its values. Maybe you're supposed to multiply something by the amplitude and/or divide something else by the frequency and maybe you're supposed to even add something for each cycle ... some sort of offset to keep the plot moving towards the right or something.
But it seems like you might start with that stub of a function definition and try pasting in one or another of these loop bodies and then figuring out how to actually return meaningful values in the parts where I've used "something" as a placeholder.
Going with the assumption that these "ordered pairs" are co-ordinates, for plotting, then it seems likely that each of the things you append to your results should be of the form (x,y) where x is monotonically increasing (fancy way of saying it keeps going up, never goes down) and might even always be the range(0,n*180) and y is probably math.sin() of something involved a and w ... but that's just speculation on my part.