I have an array that represents an inverse parabola and I want to find the maximum, which can be anywhere in the array. In my application I cannot take the derivative and I have to loop through the array.
I implemented this by iterating through the array, starting from the left, until I get a value lower than the previous iteration:
import numpy as np

num = 21  # number of possible configurations

def simulation(n):
    # create inverse parabola
    parabola = np.linspace(-8, 12, num=num)
    parabola = -np.abs(parabola) ** 2
    return parabola[n]

previous_iteration = -1000  # some initialization
for n in range(num):
    # Configure the entire system
    # Run simulation
    # simulation(n) - a function returning simulation result with configuration "n"
    simulation_result = simulation(n)
    if previous_iteration < simulation_result:
        previous_iteration = simulation_result
    else:
        best_iteration = n - 1
        break

print(best_iteration)
print(previous_iteration)
Is there a faster way to do this?
Edit:
The actual implementation will be on an FPGA, and for every iteration I have to configure the system and run a simulation, so every iteration costs a lot of time. If I run the simulation with all possible configurations, I will get a parabola vector, but that would be time consuming and very inefficient.
I'm looking for a way to find the maximum value while generating as few points as possible. Unfortunately, the for loop has to stay because that is a representation of how the system works. The key here is to change the code inside the for loop.
I edited the code to better explain what I mean.
An inverse parabola (sampled at evenly spaced points) has the property that the differences between adjacent points always decrease. The maximum is just before the differences go negative.
If the difference between the first two points is negative or zero, the maximum is the first point in the array.
Otherwise, do a binary search to find the smallest positive difference between two adjacent points. The maximum will be the second of these two points.
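A minimal sketch of that binary search over adjacent differences, written against a simulation(n)-style interface like the one in the question (find_max_index and n_points are placeholder names; each call to f costs one simulation run):
def find_max_index(f, n_points):
    # Binary search for the last index i where f(i) - f(i-1) > 0; that index is the maximum.
    if n_points == 1 or f(1) - f(0) <= 0:
        return 0  # already flat or decreasing: the first point is the maximum
    lo, hi = 1, n_points - 1  # invariant: the difference at index lo is positive
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if f(mid) - f(mid - 1) > 0:
            lo = mid       # difference still positive: maximum is at or after mid
        else:
            hi = mid - 1   # difference non-positive: maximum is before mid
    return lo

best_n = find_max_index(simulation, 21)
print(best_n, simulation(best_n))
Each halving step costs at most two runs of f, so this needs roughly 2·log2(n) simulation runs instead of all n (a small cache would avoid re-running a configuration that two steps both touch).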
What you want to do is called a grid search. You need to define your search grid points (here in 'x') and calculate all the values of the parabola for these points (here in 'y'). You can use np.argmax to find the index of the argument that yields the maximum value:
import numpy as np
# Define inverse parabola
parabola = lambda x: -np.abs(x)**2
# Search axis
x = np.linspace(-8, 12, 21)
# Calculate parabola at each point
y = parabola(x)
# Find argument that yields maximum value
xmax = x[np.argmax(y)]
print(xmax) # Should be 0.0
I edited the post; hopefully the problem is clearer now.
What you're looking for is called a "ternary search": https://en.wikipedia.org/wiki/Ternary_search
It works to find the maximum of any function f(x) that has an increasing part, maybe followed by an all-equal part, followed by a decreasing part.
Given bounds low and high, pick two points m1 = low + (high-low)/3 and m2 = low + (high-low)*2/3.
Then if f(m1) > f(m2), you know the maximum is at x<=m2, because m2 can't be on the increasing part. So set high=m2 and try again.
Otherwise, you know the maximum is at x>=m1, so set low=m1 and try again.
Repeat until high - low < 3, then just pick whichever of the remaining values is biggest.
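A minimal sketch of this on integer configuration indices, reusing the simulation(n) function from the question (ternary_search_max is a placeholder name; the lru_cache wrapper is only there so a configuration is never simulated twice):
from functools import lru_cache

def ternary_search_max(f, low, high):
    # Return the integer n in [low, high] that maximizes a unimodal function f.
    f = lru_cache(maxsize=None)(f)  # cache results so no configuration is simulated twice
    while high - low >= 3:
        m1 = low + (high - low) // 3
        m2 = low + 2 * (high - low) // 3
        if f(m1) > f(m2):
            high = m2  # the maximum cannot be to the right of m2
        else:
            low = m1   # the maximum cannot be to the left of m1
    # only a few candidates remain; evaluate them all and keep the best
    return max(range(low, high + 1), key=f)

best_n = ternary_search_max(simulation, 0, 20)
print(best_n, simulation(best_n))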
I have a set of sphere coordinates in 3D that evolves.
They represent a stack of spheres which are continuously removed from the bottom of a box and reinserted at the top at a random location. Since this kind of simulation is really periodic, I would like to simulate the drainage of the box a few times (say, 5 times, so t=1 takes positions 1 -> t=5 takes positions 5), and then come back to the first state to simulate the next steps (t=6 takes positions 1, t=10 takes positions 5, same for t=11->15, etc.).
The problem is that the coordinates of a given sphere (say, sphere 1) can be very different from the first state to the last simulated one. However, it is very important, for the sake of the simulation, to have a simulation that is as smooth as possible. If I had to quantify it, I would say that I need the distance between state 5 and state 6 for each pebble to be as low as possible.
It seems to me like an assignment problem. Is there any known solution and method for this kind of problem?
Here is an example of what I would like to have (I mostly use Python):
import numpy as np
# Mockup of the simulation positions
Nspheres = 100
Nsteps = 5 # number of simulated steps
coordinates = np.random.uniform(0,100, (Nsteps, Nspheres, 3)) # mockup x,y,z for each step
initial_positions = coordinates[0]
final_positions = coordinates[Nsteps-1]
indices_adjust_initial_positions = adjust_initial_positions(initial_positions, final_positions) # to do
adjusted_initial_positions = initial_positions[indices_adjust_initial_positions]
# Quantification of error made
mean_error = np.mean(np.abs(final_positions-adjusted_initial_positions))
max_error = np.max(np.abs(final_positions-adjusted_initial_positions))
print(mean_error, max_error)
# Assign it for each "cycle"
Ncycles = 5 # Number of times the simulation is repeated
simulation_coordinates = np.empty((Nsteps*Ncycles, Nspheres, 3))
simulation_coordinates[:Nsteps] = np.array(coordinates)
for n in range(1, Ncycles):
    new_cycle_coordinates = simulation_coordinates[Nsteps*(n-1):Nsteps*n, indices_adjust_initial_positions, :]
    simulation_coordinates[Nsteps*n:Nsteps*(n+1)] = new_cycle_coordinates
# Print result
print(simulation_coordinates)
The adjust_initial_positions function would therefore take the initial and final states, and determine the ideal set of indices to apply to the initial state so that it looks most like the final state. Please note that, if it makes the problem any simpler, I do not really care if the very top spheres do not really match between the two states; however, it is important to be as close as possible towards the bottom.
Would you have any suggestion?
After some research, it seems that scipy.optimize has some nice features able to do something like this. If list1 is my first step and list2 is my last simulated step, we can do something like:
import numpy as np
import scipy.optimize
# cost[i, j] = distance between point i of the last step and point j of the first step
cost = np.linalg.norm(list2[:, np.newaxis, :] - list1, axis=2)
_, indexes = scipy.optimize.linear_sum_assignment(cost)
list3 = list1[indexes]
Therefore, list3 will be as close to list2 as possible thanks to the index reordering, while taking the positions from list1.
I have a problem where, in a grid of x*y size, I am provided a single dot and I need to find the nearest neighbour. In practice, I am trying to find the closest dot to the cursor in pygame that crosses a color distance threshold, which is calculated as follows:
sqrt(((rgb1[0]-rgb2[0])**2)+((rgb1[1]-rgb2[1])**2)+((rgb1[2]-rgb2[2])**2))
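For reference, the same check as a small self-contained helper (color_distance is a placeholder name):
import math

def color_distance(rgb1, rgb2):
    # Euclidean distance between two RGB triples (same formula as above)
    return math.sqrt((rgb1[0] - rgb2[0]) ** 2 + (rgb1[1] - rgb2[1]) ** 2 + (rgb1[2] - rgb2[2]) ** 2)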
So far I have a function that calculates the different resolutions for the grid and reduces it by a factor of two while always maintaining the darkest pixel. It looks as follows:
from PIL import Image
from typing import Dict
import numpy as np

# We input a Pillow image object and retrieve a dictionary with every grid version of the 3-dimensional array:
def calculate_resolutions(image: Image) -> Dict[int, np.ndarray]:
    resolutions = {}
    # We start with the highest resolution image, the size of which we initially divide by 1, then 2, then 4 etc.:
    divisor = 1
    # Reduce the grid by 5 iterations
    resolution_iterations = 5
    for i in range(resolution_iterations):
        pixel_lookup = image.load()  # convert image to PixelValues object, which allows for pixel lookup via [x, y] index
        # Calculate the resolution of the new grid, round upwards:
        resolution = (int((image.size[0] - 1) // divisor + 1), int((image.size[1] - 1) // divisor + 1))
        # Generate 3d array with new grid resolution, fill in values that are darker than white:
        new_grid = np.full((resolution[0], resolution[1], 3), np.array([255, 255, 255]))
        for x in range(image.size[0]):
            for y in range(image.size[1]):
                if not x % divisor and not y % divisor:
                    darkest_pixel = (255, 255, 255)
                    x_range = divisor if x + divisor < image.size[0] else (0 if image.size[0] - x < 0 else image.size[0] - x)
                    y_range = divisor if y + divisor < image.size[1] else (0 if image.size[1] - y < 0 else image.size[1] - y)
                    for x_ in range(x, x + x_range):
                        for y_ in range(y, y + y_range):
                            if pixel_lookup[x_, y_][0] + pixel_lookup[x_, y_][1] + pixel_lookup[x_, y_][2] < darkest_pixel[0] + darkest_pixel[1] + darkest_pixel[2]:
                                darkest_pixel = pixel_lookup[x_, y_]
                    if darkest_pixel != (255, 255, 255):
                        new_grid[int(x / divisor)][int(y / divisor)] = np.array(darkest_pixel)
        resolutions[i] = new_grid
        divisor = divisor * 2
    return resolutions
This is the most performance efficient solution I was able to come up with. If this function is run on a grid that continually changes, like a video with x fps, it will be very performance intensive. I also considered using a kd-tree algorithm that simply adds and removes any dots that happen to change on the grid, but when it comes to finding individual nearest neighbours on a static grid this solution has the potential to be more resource efficient. I am open to any kinds of suggestions in terms of how this function could be improved in terms of performance.
Now, I am in a position where, for example, I try to find the nearest neighbour of the current cursor position in a 100x100 grid. The resulting reduced grids are 50^2, 25^2, 13^2, and 7^2. In a situation where a part of the grid looks as follows:
And I am on the aggregation step where a part of the grid consists of six large squares, the black one being the current cursor position and the orange dots being dots where the color distance threshold is crossed, I would not know which diagonally located closest neighbour I would want to pick to search next. In this case, going one aggregation step down shows that the lower left would be the right choice. Depending on how many grid layers I have, this could result in a very large error in the nearest neighbour search. Is there a good way to solve this problem? If there are multiple squares that show they have a relevant location, do I have to search them all in the next step to be sure? And if that is the case, the further away I get, the more I would need to use math such as the Pythagorean theorem to assert whether the two positive squares I find overlap in terms of distance and could potentially contain the closest neighbour, which would start to be performance intensive again if the function is called frequently. Would it still make sense to pursue this solution over a regular kd-tree? For now the grid size is still fairly small (~800x600), but if the grid gets larger the performance may start suffering again. Is there a good scalable solution that could be applied here?
I have come across the following problem. I have created a program that estimates the trajectory of a camera recording video along the x and y axes. This approach is very common in warp stabilizers. I want to find the range of values which represents the most stable video footage and get their values. Here is the graph of the trajectory. The graph is based on a numpy array. I guess that the best idea would be to pick the part where the increase of values is the slowest, but I am not sure how to do it.
The following code will find the longest range of values in which both X and Y are stable within a certain threshold. This threshold limits the change in X and Y.
You can use it to tune your result.
If you want a more stable section, choose a lower epsilon.
This will result in a shorter range.
import numpy as np

X = 150*np.random.rand(270)  # mock data as I do not have yours
Y = 150*np.random.rand(270)  # Replace X and Y with your X and Y data
epsilon = 80  # threshold
# get indexes where the difference is smaller than the threshold for X and Y
valid_values = np.logical_and(np.abs(np.diff(X)) < epsilon, np.abs(np.diff(Y)) < epsilon)
cummulative_valid_values = []
count = 0
# find longest range of values that satisfy the threshold
for id, value in enumerate(valid_values):
    if value == True:
        count = count + 1
    else:
        count = 0
    cummulative_valid_values.append(count)
# Calculate start and end of largest stable range
end_of_range = cummulative_valid_values.index(max(cummulative_valid_values)) + 1
start_of_range = end_of_range - max(cummulative_valid_values) + 1
print("Largest stable range is: ", start_of_range, " - ", end_of_range)
What is the most efficient way to compute the (Euclidean) distance to the nearest neighbor for each point in an array?
I have a list of 100k (X,Y,Z) points and I would like to compute a list of nearest neighbor distances. The index of the distance would correspond to the index of the point.
I've looked into PYOD and sklearn neighbors, but those seem to require "teaching". I think my problem is simpler than that. For each point: find nearest neighbor, compute distance.
Example data:
points = [
    (0.0,         0.0, 1322.1695),
    (0.006711111, 0.0, 1322.1696),
    (0.026844444, 0.0, 1322.1697),
    (0.0604,      0.0, 1322.1649),
    (0.107377778, 0.0, 1322.1651),
    (0.167777778, 0.0, 1322.1634),
    (0.2416,      0.0, 1322.1629),
    (0.328844444, 0.0, 1322.1631),
    (0.429511111, 0.0, 1322.1627),
    ...
]
compute k = 1 nearest neighbor distances
result format:
results = [nearest neighbor distance]
example results:
results = [
    0.005939372,
    0.005939372,
    0.017815632,
    0.030118587,
    0.041569616,
    0.053475883,
    0.065324964,
    0.077200014,
    0.089077602,
]
UPDATE:
I've implemented two of the approaches suggested.
Use scipy.spatial.cdist to compute the full distance matrix.
Use a nearest-X-neighbors search within radius R to find the subset of neighbor distances for every point and return the smallest.
Results are that Method 2 is faster than Method 1 but took a lot more effort to implement (makes sense).
It seems the limiting factor for Method 1 is the memory needed to run the full computation, especially when my data set is approaching 10^5 (x, y, z) points. For my data set of 23k points, it takes ~ 100 seconds to capture the minimum distances.
For Method 2, the speed scales as n_radius^2, that is, "neighbor radius squared", which really means that the algorithm scales roughly linearly with the number of included neighbors. Using a radius of ~5 (more than enough given the application), it took 5 seconds, for the set of 23k points, to provide a list of minimums in the same order as the point list itself. The difference matrix between the "exact solution" and Method 2 is basically zero.
Thanks for everyones' help!
Similar to Caleb's answer, but you could stop the iterative loop if you get a distance greater than some previous minimum distance (sorry - no code).
I used to program video games. It would take too much CPU to calculate the actual distance between two points. What we did was divide the "screen" into larger Cartesian squares and avoid the actual distance calculation if the Delta-X or Delta-Y was "too far away" - that's just subtraction, so maybe use something like that to qualify where the actual Euclidean distance calculation is needed (extend to n dimensions as needed)?
EDIT - expanding "too far away" candidate pair selection comments.
For brevity, I'll assume a 2-D landscape.
Take the point of interest (X0,Y0) and "draw" an nxn square around that point, with (X0,Y0) at the origin.
Go through the initial list of points and form a list of candidate points that are within that square. While doing that, if the DeltaX [ABS(Xi-X0)] is outside of the square, there is no need to calculate the DeltaY.
If there are no candidate points, make the square larger and iterate.
If there is exactly one candidate point and it is within the radius of the circle inscribed by the square, that is your minimum.
If there are "too many" candidates, make the square smaller, but you only need to reexamine the candidate list from this iteration, not all the points.
If there are not "too many" candidates, then calculate the distance for that list. When doing so, first calculate DeltaX^2 + DeltaY^2 for the first candidate. If for subsequent candidates the DetlaX^2 is greater than the minumin so far, no need to calculate the DeltaY^2.
The minimum from that calculation is the minimum if it is within the radius of the circle inscribed by the square.
If not, you need to go back to a previous candidate list that includes points within the circle that has the radius of that minimum. For example, if you ended with one candidate in a 2x2 square that happened to be on the vertex X=1, Y=1, the distance/radius would be SQRT(2). So go back to a previous candidate list that has a square greater than or equal to 2xSQRT(2).
If warranted, generate a new candidate list that only includes points within the +/- SQRT(2) square.
Calculate distance for those candidate points as described above - omitting any that exceed the minimum calculated so far.
No need to do the square root of the sum of the Delta^2 until you have only one candidate.
How to size the initial square, or if it should be a rectangle, and how to increase or decrease the size of the square/rectangle could be influenced by application knowledge of the data distribution.
I would consider recursive algorithms for some of this if the language you are using supports that.
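A minimal 2-D sketch of that square pre-filter with the deferred square root (all names here are placeholders; points is assumed to be a plain list of (x, y) tuples):
import math

def nearest_in_window(origin, points, half_width):
    # Nearest point to origin among those inside a +/- half_width square window.
    # Returns (point, distance), or None if the answer is not guaranteed
    # (no candidate in the window, or the winner lies outside the inscribed circle),
    # in which case the caller should enlarge the window and retry.
    x0, y0 = origin
    best_sq, best_point = None, None
    for (x, y) in points:
        dx = abs(x - x0)
        if dx > half_width:                          # cheap rejection on Delta-X alone: just a subtraction
            continue
        if best_sq is not None and dx * dx > best_sq:
            continue                                 # Delta-X alone already exceeds the best distance so far
        dy = abs(y - y0)
        if dy > half_width:                          # cheap rejection on Delta-Y
            continue
        d_sq = dx * dx + dy * dy                     # defer the square root
        if best_sq is None or d_sq < best_sq:
            best_sq, best_point = d_sq, (x, y)
    if best_sq is not None and best_sq <= half_width * half_width:
        return best_point, math.sqrt(best_sq)        # winner is inside the inscribed circle, so it is the true minimum
    return None
If this returns None, the caller widens (or, for "too many" candidates, narrows) half_width and tries again, as described in the steps above.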
How about this?
from scipy.spatial import distance

A = (0.003467119, 0.01422762, 0.0101960126)
B = (0.007279433, 0.01651597, 0.0045558849)
C = (0.005392258, 0.02149997, 0.0177409387)
D = (0.017898802, 0.02790659, 0.0006487222)
E = (0.013564214, 0.01835688, 0.0008102952)
F = (0.013375397, 0.02210725, 0.0286032185)

points = [A, B, C, D, E, F]
results = []
for point in points:
    distances = [{'point': point, 'neighbor': p, 'd': distance.euclidean(point, p)} for p in points if p != point]
    results.append(min(distances, key=lambda k: k['d']))
results will be a list of objects, like this:
results = [
{'point':(x1, y1, z1), 'neighbor':(x2, y2, z2), 'd':"distance from point to neighbor"},
...]
Where point is the reference point and neighbor is point's closest neighbor.
The fastest option available to you may be scipy.spatial.distance.cdist, which finds the pairwise distances between all of the points in its input. While finding all of those distances may not be the fastest algorithm for finding the nearest neighbors, cdist is implemented in C, so it is likely to run faster than anything you try in pure Python.
import numpy as np
from scipy.spatial.distance import cdist

points = np.array(...)  # your (n, 3) array of points
# Pairwise distance matrix between every pair of points
distances = cdist(points, points)
# An element is not its own nearest neighbor
np.fill_diagonal(distances, np.inf)
# Find the index of each element's nearest neighbor
mins = distances.argmin(0)
# Extract the nearest neighbors from the data by row indexing
nearest_neighbors = points[mins, :]
# Put the arrays in the specified shape
results = np.stack((points, nearest_neighbors), 1)
You could theoretically make this run faster (mostly by combining all of the steps into one algorithm), but unless you're writing in C, you won't be able to compete with SciPy/NumPy.
(cdist runs in Θ(n²) time (if the size of each point is fixed), and every other part of the algorithm runs in O(n) time, so even if you did try to optimize the code in Python, you wouldn't notice the change for small amounts of data, and the improvements would be overshadowed by cdist for more data.)
I have data that is a list of positions from 0-20 with corresponding intensities, in increments of 0.1. They are discrete values and not a function. The values are (close to) symmetrical and there should be a center point around 10 where if you fold the plot on itself you would have overlapping points. I've done this so far by summing the difference between each pair of values equidistant from center. The data is currently in a dataframe.
This seems different than the question "how to minimize a function with discrete variable values in scipy" because there the objective was actually to find the minimum of a function but I don't have a function.
The issue is there are many options that might actually be the best center position (realistically, from ~9-11 in increments of 0.1) and I don't want to have to manually change the center value, but it's not a function so fmin from scipy.optimize returns values for the center which are not in 0.1 increments. My code so far that does it is:
# Position data has values from 0-20 in 0.1 increments
stepsize = 0.1
max_pos = 15  # max that center could be at
center_sum = 0  # accumulator for the symmetry error
# center_guess is the manually chosen trial center
HAB_slice['Position Relative to Center'] = HAB_slice['Position'] - center_guess  # defines position relative to center
HAB_slice['Radial Position'] = np.abs(HAB_slice['Position Relative to Center'].values)  # absolute value based on previous
possible_pos = np.linspace(0, max_pos, int(max_pos / stepsize) + 1)
for i in range(0, len(possible_pos)):  # sum absolute differences between pairs at the same absolute distance from zero; smaller is generally a better center
    temp = HAB_slice[HAB_slice['Radial Position'] == possible_pos[i]]
    if len(temp) == 2:
        center_sum += np.abs(temp['I'].diff().values[1])
I then manually change the value of center_guess until center_sum is the smallest I can get, which is really tedious. Note that the 'I' values are basically the y values.
Can someone show me a method of automating this that does not require the thing being minimized to be a function, so that it iterates through the actual values of 'Position' to find the one that yields the smallest center_sum?
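A minimal sketch of automating that search by brute force over every candidate center, assuming the HAB_slice DataFrame and column names from the code above (center_sum_for, center_candidates and best_center are placeholder names):
import numpy as np

def center_sum_for(center_guess, HAB_slice, stepsize=0.1, max_pos=15):
    # Sum of |I(left) - I(right)| over all pairs equidistant from center_guess
    radial = np.abs(HAB_slice['Position'].values - center_guess)
    possible_pos = np.linspace(0, max_pos, int(max_pos / stepsize) + 1)
    total = 0.0
    for pos in possible_pos:
        temp = HAB_slice[np.isclose(radial, pos)]  # float-safe version of the == comparison
        if len(temp) == 2:
            total += np.abs(temp['I'].diff().values[1])
    return total

# Try every candidate center in 0.1 increments (~9 to 11, as in the question) and keep the smallest sum
center_candidates = np.arange(9.0, 11.0 + 0.05, 0.1)
sums = [center_sum_for(c, HAB_slice) for c in center_candidates]
best_center = center_candidates[int(np.argmin(sums))]
print(best_center)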