Finding local minima in 2D array, in O(N) time

Finding local minima in 2D array, in O(N) time - python

Given an NxN matrix where all numbers are unique, find ANY local minima in O(N) time. Here is an example of the matrix. It's given in
lst = [[30,100,20,19,18],
[29,101,21,104,17],
[28,102,22,105,16],
[27,103,23,106,15],
[26,25,24,107,14]]
So the way to solve it in O(N) time is to essentially split the list into 4 sections ("windows"), find the smallest element in the green (middle row & column) and compare it with the next "neighboring" smallest value. If not smaller than it's neighbor, recurse to either top left, top right, bottom left, bottom right, depending on which position that smaller than mid_smallest is in.
(slide 20)
https://courses.csail.mit.edu/6.006/spring11/lectures/lec02.pdf
I have everything written out so far, but I'm really really struggling to find a way to recurse through one of the 4 sections: Here's the pseudocode:
The portion I'm having a difficulty with is the recursion part ex: Find_find_minimum(Top-left_sub_matrix)
How can I recurse through the top left/right or bottom left/right matrix without creating another matrix?

Solved it. I modularized my function with different inputs as indicators.

Related

Suppose an array contains only two kinds of elements, how to quickly find their boundaries?

I've asked a similar question before, but this time it's different.
Since our array contains only two elements, we might as well set it to 1 and -1, where 1 is on the left side of the array and -1 is on the right side of the array:
[1,...,1,1,-1,-1,...,-1]
Both 1 and -1 exist at the same time and the number of 1 and -1 is not necessarily the same. Also, the numbers of 1 and -1 are both very large.
Then, define the boundary between 1 and -1 as the index of the -1 closest to 1. For example, for the following array:
[1,1,1,-1,-1,-1,-1]
Its boundary is 3.
Now, for each number in the array, I cover it with a device that you have to unlock to see the number in it.
I want to try to unlock as few devices as possible that cover 1, because it takes much longer to see a '1' than it takes to see a '-1'. And I also want to reduce my time cost as much as possible.
How can I search to get the boundary as quickly as possible?

The problem is very like the "egg dropping" problem, but where a wrong guess has a large fixed cost (100), and a good guess has a small cost (1).
Let E(n) be the (optimal) expected cost of finding the index of the right-most 1 in an array (or finding that the array is all -1), assuming each possible position of the boundary is equally likely. Define the index of the right-most 1 to be -1 if the array is all -1.
If you choose to look at the array element at index i, then it's -1 with probability i/(n+1), and 1 with probability (n-i+1)/(n+1).
So if you look at array element i, your expected cost for finding the boundary is (1+E(i)) * i/(n+1) + (100+E(n-i-1)) * (n-i+1)/(n+1).
Thus E(n) = min((1+E(i)) * i/(n+1) + (100+E(n-i-1)) * (n-i+1)/(n+1), i=0..n-1)
For each n, the i that minimizes the equation is the optimal array element to look at for an array of that length.
I don't think you can solve these equations analytically, but you can solve them with dynamic programming in O(n^2) time.
The solution is going to look like a very skewed binary search for large n. For smaller n, it'll be skewed so much that it will be a traversal from the right.

If I am right, a strategy to minimize the expectation of the cost is to draw at a fraction of the interval that favors the -1 outcome, in inverse proportion of the cost. So instead of picking the middle index, take the right centile.
But this still corresponds to a logarithmic asymptotic complexity.
There is probably nothing that you can do regarding the worst case.

Finding the coordinates to randomly distribute images on a display without overlap

I have the following code which randomly generates a list of (X,Y) tuples:
import random
coords = []
for i in range(10):
x = random.randint(85,939)
y = random.randint(75,693)
coords.append((x,y))
In the final list, the X values of each tuple are considered to overlap if the absolute difference between them is less then 85, and the Y values are considered to overlap if the absoulte difference is less than 75. How can I make sure that none of the tuples in the final list will overlap in both dimensions?

The easiest way to do this is to just keep sampling and discarding coordinates which will create overlap. This will however become very inefficient when you come close to filling the available space. If that's not an issue you should go with this solution.
A bit more efficient and as far as I can tell statistically equivalent is to sample one coordinate, like the row, first. Then compute the occupied area in that row and sample from the remaining positions if there are any.
To avoid the same problem as in the easy solution, if there are no available spaces in a row, it should be removed from the possible sampling outcomes for the row (plus the margin of 75 in both directions).
Ideally, you don't compute the occupied regions each time but you keep a mapping from the row to the occupied space in that row and the amount of non-full rows and just update this mapping when inserting new images. You will need storage for n_rows + 1 extra numbers.
To clarify: When sampling from a restricted space just subtract the occupied positions and get a sampling result n. Then find the correct position for n by walking along the coordinate axis n steps, skipping all the occupied positions.

Number of shortest paths

Here is the problem:
Given the input n = 4 x = 5, we must imagine a chessboard that is 4 squares across (x-axis) and 5 squares tall (y-axis). (This input changes, all the up to n = 200 x = 200)
Then, we are asked to determine the minimum shortest path from the bottom left square on the board to the top right square on the board for the Knight (the Knight can move 2 spaces on one axis, then 1 space on the other axis).
My current ideas:
Use a 2d array to store all the possible moves, perform breadth-first
search(BFS) on the 2d array to find the shortest path.
Floyd-Warshall shortest path algorithm.
Create an adjacency list and perform BFS on that (but I think this would be inefficient).
To be honest though I don't really have a solid grasp on the logic.
Can anyone help me with psuedocode, python code, or even just a logical walk-through of the problem?

BFS is efficient enough for this problem as it's complexity is O(n*x) since you explore each cell only one time. For keeping the number of shortest paths, you just have to keep an auxiliary array to save them.
You can also use A* to solve this faster but it's not necessary in this case because it is a programming contest problem.
dist = {}
ways = {}
def bfs():
start = 1,1
goal = 6,6
queue = [start]
dist[start] = 0
ways[start] = 1
while len(queue):
cur = queue[0]
queue.pop(0)
if cur == goal:
print "reached goal in %d moves and %d ways"%(dist[cur],ways[cur])
return
for move in [ (1,2),(2,1),(-1,-2),(-2,-1),(1,-2),(-1,2),(-2,1),(2,-1) ]:
next_pos = cur[0]+move[0], cur[1]+move[1]
if next_pos[0] > goal[0] or next_pos[1] > goal[1] or next_pos[0] < 1 or next_pos[1] < 1:
continue
if next_pos in dist and dist[next_pos] == dist[cur]+1:
ways[next_pos] += ways[cur]
if next_pos not in dist:
dist[next_pos] = dist[cur]+1
ways[next_pos] = ways[cur]
queue.append(next_pos)
bfs()
Output
reached goal in 4 moves and 4 ways
Note that the number of ways to reach the goal can get exponentially big

I suggest:
Use BFS backwards from the target location to calculate (in just O(nx) total time) the minimum distance to the target (x, n) in knight's moves from each other square. For each starting square (i, j), store this distance in d[i][j].
Calculate c[i][j], the number of minimum-length paths starting at (i, j) and ending at the target (x, n), recursively as follows:
c[x][n] = 1
c[i][j] = the sum of c[p][q] over all (p, q) such that both
(p, q) is a knight's-move-neighbour of (i, j), and
d[p][q] = d[i][j]-1.
Use memoisation in step 2 to keep the recursion from taking exponential time. Alternatively, you can compute c[][] bottom-up with a slightly modified second BFS (also backwards) as follows:
c = x by n array with each entry initially 0;
seen = x by n array with each entry initially 0;
s = createQueue();
push(s, (x, n));
while (notEmpty(s)) {
(i, j) = pop(s);
for (each location (p, q) that is a knight's-move-neighbour of (i, j) {
if (d[p][q] == d[i][j] + 1) {
c[p][q] = c[p][q] + c[i][j];
if (seen[p][q] == 0) {
push(s, (p, q));
seen[p][q] = 1;
}
}
}
}
The idea here is to always compute c[][] values for all positions having some given distance from the target before computing any c[][] value for a position having a larger distance, as the latter depend on the former.
The length of a shortest path will be d[1][1], and the number of such shortest paths will be c[1][1]. Total computation time is O(nx), which is clearly best-possible in an asymptotic sense.

My approach to this question would be backtracking as the number of squares in the x-axis and y-axis are different.
Note: Backtracking algorithms can be slow for certain cases and fast for the other
Create a 2-d Array for the chess-board. You know the staring index and the final index. To reach to the final index u need to keep close to the diagonal that's joining the two indexes.
From the starting index see all the indexes that the knight can travel to, choose the index which is closest to the diagonal indexes and keep on traversing, if there is no way to travel any further backtrack one step and move to the next location available from there.
PS : This is a bit similar to a well known problem Knight's Tour, in which choosing any starting point you have to find that path in which the knight whould cover all squares. I have codes this as a java gui application, I can send you the link if you want any help
Hope this helps!!

Try something. Draw boards of the following sizes: 1x1, 2x2, 3x3, 4x4, and a few odd ones like 2x4 and 3x4. Starting with the smallest board and working to the largest, start at the bottom left corner and write a 0, then find all moves from zero and write a 1, find all moves from 1 and write a 2, etc. Do this until there are no more possible moves.
After doing this for all 6 boards, you should have noticed a pattern: Some squares couldn't be moved to until you got a larger board, but once a square was "discovered" (ie could be reached), the number of minimum moves to that square was constant for all boards not smaller than the board on which it was first discovered. (Smaller means less than n OR less than x, not less than (n * x) )
This tells something powerful, anecdotally. All squares have a number associated with them that must be discovered. This number is a property of the square, NOT the board, and is NOT dependent on size/shape of the board. It is always true. However, if the square cannot be reached, then obviously the number is not applicable.
So you need to find the number of every square on a 200x200 board, and you need a way to see if a board is a subset of another board to determine if a square is reachable.
Remember, in these programming challenges, some questions that are really hard can be solved in O(1) time by using lookup tables. I'm not saying this one can, but keep that trick in mind. For this one, pre-calculating the 200x200 board numbers and saving them in an array could save a lot of time, whether it is done only once on first run or run before submission and then the results are hard coded in.
If the problem needs move sequences rather than number of moves, the idea is the same: save move sequences with the numbers.

Sorting points on multiple lines

Given that we have two lines on a graph (I just noticed that I inverted the numbers on the Y axis, this was a mistake, it should go from 11-1)
And we only care about whole number X axis intersections
We need to order these points from highest Y value to lowest Y value regardless of their position on the X axis (Note I did these pictures by hand so they may not line up perfectly).
I have a couple of questions:
1) I have to assume this is a known problem, but does it have a particular name?
2) Is there a known optimal solution when dealing with tens of billions (or hundreds of millions) of lines? Our current process of manually calculating each point and then comparing it to a giant list requires hours of processing. Even though we may have a hundred million lines we typically only want the top 100 or 50,000 results some of them are so far "below" other lines that calculating their points is unnecessary.

Your data structure is a set of tuples
lines = {(y0, Δy0), (y1, Δy1), ...}
You need only the ntop points, hence build a set containing only
the top ntop yi values, with a single pass over the data
top_points = choose(lines, ntop)
EDIT --- to choose the ntop we had to keep track of the smallest
one, and this is interesting info, so let's return also this value
from choose, also we need to initialize decremented
top_points, smallest = choose(lines, ntop)
decremented = top_points
and start a loop...
while True:
Generate a set of decremented values
decremented = {(y-Δy, Δy) for y, Δy in top_points}
decremented = {(y-Δy, Δy) for y, Δy in decremented if y>smallest}
if decremented == {}: break
Generate a set of candidates
candidates = top_lines.union(decremented)
generate a new set of top points
new_top_points, smallest = choose(candidates, ntop)
The following is no more necessary
check if new_top_points == top_points
if new_top_points == top_points: break
top_points = new_top_points</strike>
of course we are in a loop...
The difficult part is the choose function, but I think that this
answer to the question
How can I sort 1 million numbers, and only print the top 10 in Python?
could help you.

It's not a really complicated thing, just a "normal" sorting problem.
Usually sorting requires a large amount of computing time. But your case is one where you don't need to use complex sorting techniques.
You on both graphs are growing or falling constantly, there are no "jumps". You can use this to your advantage. The basic algorithm:
identify if a graph is growing or falling.
write a generator, that generates the values; from left to right if raising, form right to left if falling.
get the first value from both graphs
insert the lower on into the result list
get a new value from the graph that had the lower value
repeat the last two steps until one generator is "empty"
append the leftover items from the other generator.

Performance of a "fuzzy" Jaccard index implementation

I'm a trying to calculate a kind of fuzzy Jaccard index between two sets with the following rationale: as the Jaccard index, I want to calculate the ratio between the number of items that are common to both sets and the total number of different items in both sets. The problem is that I want to use a similarity function with a threshold to determine what what counts as the "same" item being in both sets, so that items that are similar:
Aren't counted twice in the union
Are counted in the intersection.
I have a working implementation here (in python):
def fuzzy_jaccard(set1, set2, similarity, threshold):
intersection_size = union_size = len(set1 & set2)
shorter_difference, longer_difference = sorted([set2 - set1, set1 - set2], key=len)
while len(shorter_difference) > 0:
item1, item2 = max(
itertools.product(longer_difference, shorter_difference),
key=lambda (a, b): similarity(a, b)
)
longer_difference.remove(item1)
shorter_difference.remove(item2)
if similarity(item1, item2) > threshold:
union_size += 1
intersection_size += 1
else:
union_size += 2
union_size = union_size + len(longer_difference)
return intersection_size / union_size
The problem here is the this is quadratic in the size of the sets, because in itertools.product I iterate in all possible pairs of items taken one from each set(*). Now, I think I must do this because I want to match each item a from set1 with the best possible candidate b from set2 that isn't more similar to another item a' from set1.
I have a feeling that there should be a O(n) way of doing that I'm not grasping. Do you have any suggestions?
There are other issues two, like recalculating the similarity for each pair once I get the best match, but I don't care to much about them.

I doubt there's any way that would be O(n) in the general case, but you can probably do a lot better than O(n^2) at least for most cases.
Is similarity transitive? By this I mean: can you assume that distance(a, c) <= distance(a, b) + distance(b, c)? If not, this answer probably won't help. I'm treating similarities like distances.
Try clumping the data:
Pick a radius r. Based on intuition, I suggest setting r to one-third of the average of the first 5 similarities you calculate, or something.
The first point you pick in set1 becomes the centre of your first clump. Classify the points in set2 as being in the clump (similarity to the centre point <= r) or outside the clump. Also keep track of points that are within 2r of the clump centre.
You can require that clump centre points be at least a distance of 2r from each other; in that case some points may not be in any clump. I suggest making them at least r from each other. (Maybe less if you're dealing with a large number of dimensions.) You could treat every point as a clump centre but then you wouldn't save any processing time.
When you pick a new point, first compare it with the clump centre points (even though they're in the same set). Either it's in an already existing clump, or it becomes a new clump centre, (or perhaps neither if it's between r and 2r of a clump centre). If it's within r of a clump centre, then compare it with all points in the other set that are within 2r of that clump centre. You may be able to ignore points further than 2r from the clump centre. If you don't find a similar point within the clump (perhaps because the clump has no points left), then you may have to scan all the rest of the points for that case. Hopefully this would mostly happen only when there aren't many points left in the set. If this works well, then in most cases you'd find the most similar point within the clump and would know that it's the most similar point.
This idea may require some tweaking.
If there are a large number of dimenstions involved, then you might find that for a given radius r, frustratingly many points are within 2r of each other while few are within r of each other.
Here's another algorithm. The more time-consuming it is to calculate your similarity function (as compared to the time it takes to maintain sorted lists of points) the more index points you might want to have. If you know the number of dimensions, it might make sense to use that number of index points. You might reject a point as a candidate index point if it's too similar to another index point.
For each of the first point you use and any others you decide to use as index points, generate a list of all the remaining points in the other set, sorted in order of distance from the index point,
When you're comparing a point P1 to points in the other set, I think you can skip over sets for two possible reasons. Consider the most similar point P2 you've found to P1. If P2 is similar to an index point then you can skip all points which are sufficiently dissimilar from that index point. If P2 is dissimilar to an index point then you can skip over all points which are sufficiently similar to that index point. I think in some cases you can skip over some of both types of point for the same index point.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.