Need to find solution that yields minimum value - not a function - python

I have data that is a list of positions from 0-20, in increments of 0.1, with corresponding intensities. They are discrete values, not a function. The values are (close to) symmetrical, and there should be a center point around 10 where, if you fold the plot on itself, the points would overlap. So far I have checked this by summing the difference between each pair of values equidistant from the center. The data is currently in a dataframe.

This seems different from the question "how to minimize a function with discrete variable values in scipy" because there the objective was actually to find the minimum of a function, and I don't have a function.

The issue is that there are many candidate center positions (realistically, from ~9-11 in increments of 0.1) and I don't want to have to change the center value manually. But because this is not a function, fmin from scipy.optimize returns center values that are not in 0.1 increments. My code so far is:
# Position data has values from 0-20 in 0.1 increments
stepsize = 0.1
max_pos = 15  # maximum position the center could be at

# Position of each point relative to the guessed center, and its absolute value
HAB_slice['Position Relative to Center'] = HAB_slice['Position'] - center_guess
HAB_slice['Radial Position'] = np.abs(HAB_slice['Position Relative to Center'].values)

possible_pos = np.linspace(0, max_pos, int(max_pos / stepsize) + 1)

# Sum the absolute intensity difference between each pair of points at the same
# radial distance from the guessed center; a smaller sum generally means a better center
for i in range(len(possible_pos)):
    temp = HAB_slice[HAB_slice['Radial Position'] == possible_pos[i]]
    if len(temp) == 2:
        center_sum += np.abs(temp['I'].diff().values[1])
Then I manually change the value of center_guess until center_sum is as small as it gets, which is really tedious. Note that the 'I' values are essentially the y values.

Can someone show me a way to automate this that does not require the quantity being minimized to be a function, so that it iterates through the actual values of 'Position' and finds the one that yields the smallest center_sum?
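Since the data is discrete and there are only a couple of dozen candidate centers, one straightforward way to automate this (not from the original post) is a brute-force search: compute center_sum for every candidate center on the 0.1 grid and keep the smallest. A minimal sketch, using a synthetic stand-in for the HAB_slice DataFrame and a helper name (center_sum_for) invented here:

import numpy as np
import pandas as pd

# Synthetic stand-in for HAB_slice: positions 0-20 in 0.1 steps,
# intensities roughly symmetric about 10
positions = np.round(np.arange(0, 20.1, 0.1), 1)
HAB_slice = pd.DataFrame({'Position': positions,
                          'I': np.exp(-0.1 * (positions - 10.0) ** 2)})

def center_sum_for(df, center_guess):
    # Sum of |I_left - I_right| over pairs equidistant from the guessed center
    radial = np.round(np.abs(df['Position'].values - center_guess), 1)
    total = 0.0
    for r in np.unique(radial):
        pair = df.loc[radial == r, 'I']
        if len(pair) == 2:
            total += abs(pair.diff().iloc[1])
    return total

# Brute-force search: evaluate every candidate center from 9 to 11 in 0.1 steps
candidates = np.round(np.arange(9.0, 11.05, 0.1), 1)
sums = [center_sum_for(HAB_slice, c) for c in candidates]
best_center = candidates[int(np.argmin(sums))]
print(best_center)  # close to 10.0 for symmetric data

Because there are only ~21 candidates, the full scan is essentially instantaneous and sidesteps scipy.optimize entirely.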


Calculating the nearest neighbour in a 2d grid using multilevel solution

I have a problem where, in a grid of x*y size, I am provided a single dot and I need to find the nearest neighbour. In practice, I am trying to find the closest dot to the cursor in pygame that crosses a color distance threshold, calculated as follows:
sqrt(((rgb1[0]-rgb2[0])**2)+((rgb1[1]-rgb2[1])**2)+((rgb1[2]-rgb2[2])**2))
So far I have a function that calculates the different resolutions for the grid and reduces it by a factor of two while always keeping the darkest pixel. It looks as follows:
from PIL import Image
from typing import Dict
import numpy as np

# We input a Pillow Image object and retrieve a dictionary with every grid version of the 3-dimensional array:
def calculate_resolutions(image: Image.Image) -> Dict[int, np.ndarray]:
    resolutions = {}
    # We start with the highest-resolution image, the size of which we initially divide by 1, then 2, then 4, etc.:
    divisor = 1
    # Reduce the grid over 5 iterations
    resolution_iterations = 5
    for i in range(resolution_iterations):
        pixel_lookup = image.load()  # pixel-access object, which allows pixel lookup via an [x, y] index
        # Calculate the resolution of the new grid, rounding upwards:
        resolution = (int((image.size[0] - 1) // divisor + 1), int((image.size[1] - 1) // divisor + 1))
        # Generate a 3D array with the new grid resolution, filling in values that are darker than white:
        new_grid = np.full((resolution[0], resolution[1], 3), np.array([255, 255, 255]))
        for x in range(image.size[0]):
            for y in range(image.size[1]):
                if not x % divisor and not y % divisor:
                    darkest_pixel = (255, 255, 255)
                    x_range = divisor if x + divisor < image.size[0] else (0 if image.size[0] - x < 0 else image.size[0] - x)
                    y_range = divisor if y + divisor < image.size[1] else (0 if image.size[1] - y < 0 else image.size[1] - y)
                    for x_ in range(x, x + x_range):
                        for y_ in range(y, y + y_range):
                            if pixel_lookup[x_, y_][0] + pixel_lookup[x_, y_][1] + pixel_lookup[x_, y_][2] < darkest_pixel[0] + darkest_pixel[1] + darkest_pixel[2]:
                                darkest_pixel = pixel_lookup[x_, y_]
                    if darkest_pixel != (255, 255, 255):
                        new_grid[int(x / divisor)][int(y / divisor)] = np.array(darkest_pixel)
        resolutions[i] = new_grid
        divisor = divisor * 2
    return resolutions
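For completeness, a hypothetical usage of that function (the file name here is made up):

from PIL import Image

image = Image.open("frame.png")  # hypothetical input frame
resolutions = calculate_resolutions(image)
print({level: grid.shape for level, grid in resolutions.items()})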
This is the most performance-efficient solution I was able to come up with. If this function is run on a grid that continually changes, like a video at x fps, it will be very performance-intensive. I also considered a k-d tree algorithm that simply adds and removes any dots that happen to change on the grid, but when it comes to finding individual nearest neighbours on a static grid, this multilevel solution has the potential to be more resource-efficient. I am open to any suggestions on how this function could be improved in terms of performance.
Now, I am in a position where, for example, I try to find the nearest neighbour of the current cursor position in a 100x100 grid. The resulting reduced grids are 50^2, 25^2, 13^2, and 7^2.

Consider an aggregation step where a part of the grid consists of six large squares, the black one being the current cursor position and the orange dots being dots where the color distance threshold is crossed. At that resolution I would not know which diagonally located neighbour to pick to search next. In this case, going one aggregation step down shows that the lower-left square would be the right choice. Depending on how many grid layers I have, this could produce a very large error in the nearest-neighbour search.

Is there a good way to solve this problem? If multiple squares indicate that they contain a relevant location, do I have to search them all in the next step to be sure? If so, the further away I get, the more I would need math such as the Pythagorean theorem to check whether the two positive squares overlap in terms of distance and could each potentially contain the closest neighbour, which would again become performance-intensive if the function is called frequently. Would it still make sense to pursue this solution over a regular k-d tree? For now the grid size is fairly small (~800x600), but if the grid gets larger the performance may start suffering again. Is there a good, scalable solution that could be applied here?
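For comparison with the k-d tree alternative mentioned above (this is not from the original post), here is a minimal sketch using scipy.spatial.cKDTree, assuming the dots that cross the color distance threshold have already been collected into an array of (x, y) coordinates:

import numpy as np
from scipy.spatial import cKDTree

# Hypothetical example data: coordinates of dots that crossed the color threshold
dot_positions = np.array([[12, 40], [85, 7], [33, 61], [70, 70]])

# Build the tree once; rebuild it whenever the set of dots changes
tree = cKDTree(dot_positions)

cursor = (50, 50)
distance, index = tree.query(cursor)  # nearest neighbour of the cursor
print(dot_positions[index], distance)

For grids of this size, rebuilding the tree each frame and querying it is often cheaper than rescanning the whole grid, though the trade-off depends on how many dots change per frame.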

Geometric median

I have written some code to find the geometric median for a set of weighted points; it is based on this Google Kickstart challenge. I don't want a better solution, I want to know what is wrong in my code.

The code iterates against a given precision value of 10^-6 to arrive at a value close to the geometric median. The problem I face is that it returns the correct value for digits down to 10^-3, but after that it goes wrong, and I cannot figure out why. I also noticed that changing the initial value alters the result, but I don't know why. The code also works if the weights of the points are not considered.

Here is the formula I use for the distance from a candidate centre k to each point i (a weighted Chebyshev distance):

max(abs(i.x - k.x), abs(i.y - k.y)) * weight_of_i

Here is the iteration function I used:
# c = previous centre, stp = previous step, listy_r = list of points (x, y, wt), k = previous sum of distances
def move_ct(c, stp, listy_r, k):
    # Calculates the minimum centre iteratively; returns c -> centre, stp -> step, k -> sum of distances
    while True:
        tmp = list()
        moves = [(c[0], c[1] + stp), (c[0], c[1] - stp),
                 (c[0] + stp, c[1]), (c[0] - stp, c[1])]
        for each in moves:
            tmp.append(sdist(listy_r, each))
        tmp_min = min(tmp)
        if tmp_min < k:
            k = tmp_min
            index = tmp.index(tmp_min)
            c = moves[index]
            break
        else:
            stp *= 0.5
    return (c, stp, k)
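The function relies on a helper sdist that is not shown in the post. A minimal sketch of what it presumably computes, based on the weighted Chebyshev distance formula given above (the implementation here is an assumption, not the poster's code):

def sdist(listy_r, centre):
    # Assumed: sum of weighted Chebyshev distances from `centre` to every (x, y, wt) point
    cx, cy = centre
    return sum(max(abs(x - cx), abs(y - cy)) * wt for x, y, wt in listy_r)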
Here are the values I initialized (a sketch of a possible driver loop using them follows this list):

Initial geometric centre = centroid of the weighted points
Precision = 10**-6
Step = half of the distance between the highest and lowest coordinates on x, y
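A plausible driver loop consistent with those initial values (the outer loop is not shown in the question, so this is an assumed reconstruction rather than the poster's code; the interpretation of "half of the distance between the highest and lowest coordinates" is also an assumption):

def weighted_geometric_median(listy_r, precision=10**-6):
    # Initial centre: centroid of the weighted points
    total_wt = sum(wt for _, _, wt in listy_r)
    c = (sum(x * wt for x, _, wt in listy_r) / total_wt,
         sum(y * wt for _, y, wt in listy_r) / total_wt)
    # Initial step: half the spread between the highest and lowest coordinates
    coords = [x for x, _, _ in listy_r] + [y for _, y, _ in listy_r]
    stp = (max(coords) - min(coords)) / 2.0
    k = sdist(listy_r, c)
    # Keep moving the centre until the step has shrunk below the requested precision
    while stp > precision:
        c, stp, k = move_ct(c, stp, listy_r, k)
    return c, k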
I have attached an input text file here that contains 10000 points (it is test case 1 of the large input for the challenge), with one point per line and 3 parameters per point (x, y, weight), e.g.:

980.69 595.86 619.03

where 980.69 = x coordinate, 595.86 = y coordinate, and 619.03 = weight.

The result for the 10000 points should be 3288079343.471880, but my code gives 3288079343.4719906. Notice it is off only after 10^-3.

Finding a maximum in inverse parabola iteratively

I have an array that represents an inverse parabola, and I want to find the maximum, which can be anywhere in the array. In my application I cannot take the derivative and I have to loop through the array.

I implemented this by iterating through the array, starting on the left, until I get a value lower than the previous iteration:
import numpy as np

num = 21  # number of possible configurations

def simulation(n):
    # Create an inverse parabola and return its value at index n
    parabola = np.linspace(-8, 12, num=num)
    parabola = -np.abs(parabola) ** 2
    return parabola[n]

previous_iteration = -1000  # some initialization
for n in range(num):
    # Configure the entire system
    # Run simulation
    # simulation(n) - a function returning the simulation result with configuration "n"
    simulation_result = simulation(n)
    if previous_iteration < simulation_result:
        previous_iteration = simulation_result
    else:
        best_iteration = n - 1
        break

print(best_iteration)
print(previous_iteration)
Is there a faster way to do this?
Edit:
The actual implementation will be on an FPGA, and for every iteration I have to configure the system and run a simulation, so every iteration costs a lot of time. If I ran the simulation with all possible configurations, I would get the full parabola vector, but that would be time-consuming and very inefficient.

I'm looking for a way to find the maximum value while generating as few points as possible. Unfortunately, the for loop has to stay because it represents how the system works; the key is to change the code inside the for loop.

I edited the code to better explain what I mean.
An inverse parabola (sampled at evenly spaced points) has the property that the differences between adjacent points always decrease. The maximum is just before the differences go negative.
If the difference between the first two points is negative or zero, the maximum is the first point in the array.
Otherwise, do a binary search to find the smallest positive difference between two adjacent points. The maximum will be the second of these two points.
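A minimal sketch of that idea (not part of the original answer), assuming an expensive-to-evaluate concave f(n) such as the simulation above; it binary-searches for the last index where the forward difference is still positive:

def find_maximum_index(f, n_points):
    # Index of the maximum of a concave sequence f(0..n_points-1), using O(log n) evaluations
    if f(1) - f(0) <= 0:
        return 0  # differences start non-positive: the first point is the maximum
    lo, hi = 0, n_points - 2  # indices of the forward differences f(i+1) - f(i)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if f(mid + 1) - f(mid) > 0:
            lo = mid       # difference still positive: the maximum lies to the right of mid
        else:
            hi = mid - 1   # difference negative or zero: the maximum lies at or before mid
    return lo + 1          # the maximum sits just after the last positive difference

# Example with the simulation() defined earlier: find_maximum_index(simulation, num)

In practice you would cache the f values so that each configuration is simulated at most once.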
What you want to do is called a grid search. You need to define your search grid points (here in 'x') and calculate all the values of the parabola for these points (here in 'y'). You can use np.argmax to find the index of the argument that yields the maximum value:
import numpy as np
# Define inverse parabola
parabola = lambda x: -np.abs(x)**2
# Search axis
x = np.linspace(-8, 12, 21)
# Calculate parabola at each point
y = parabola(x)
# Find argument that yields maximum value
xmax = x[np.argmax(y)]
print(xmax) # Should be 0.0
I edited the post; hopefully the problem is clearer now.
What you're looking for is called a "ternary search": https://en.wikipedia.org/wiki/Ternary_search
It works to find the maximum of any function f(x) that has an increasing part, maybe followed by an all-equal part, followed by a decreasing part.
Given bounds low and high, pick two points m1 = low + (high-low)/3 and m2 = low + (high-low)*2/3.
Then if f(m1) > f(m2), you know the maximum is at x <= m2, because m2 can't be on the increasing part. So set high = m2 and try again.
Otherwise, you know the maximum is at x >= m1, so set low = m1 and try again.
Repeat until high - low < 3, and then just pick whichever of those values is biggest.
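A minimal sketch of that procedure over integer indices (not from the original answer), again assuming an expensive unimodal f(n) such as the simulation above:

def ternary_search_max(f, low, high):
    # Index of the maximum of a unimodal sequence f on the inclusive range [low, high]
    while high - low >= 3:
        m1 = low + (high - low) // 3
        m2 = low + 2 * (high - low) // 3
        if f(m1) > f(m2):
            high = m2  # the maximum cannot lie beyond m2
        else:
            low = m1   # the maximum cannot lie before m1
    # Only a handful of candidates left: evaluate them all and pick the best
    return max(range(low, high + 1), key=f)

# Example with the simulation() defined earlier: ternary_search_max(simulation, 0, num - 1)

Each iteration discards about a third of the remaining range using only two evaluations, so the number of simulation runs grows logarithmically with the array length.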

Number density distribution of a 1D array - 2 different attempts

I have a large array of elements, which I call RelDist (dimensionally, each element is a distance), in a simulated volume. I am attempting to determine the distribution of the number of values per unit volume, which is the number density. It should be similar to this diagram:
I am aware that the axes there are scaled log base 10; the plot of this set should definitely drop off.
Mathematically, I set it up as two equivalent equations:

n = (1/V) * dN/d(ln r)

where N is the number of elements in the array, differentiated with respect to the natural log of the distances, and V is the volume. It can also be equivalently rewritten as a regular derivative by introducing another factor of r:

n = (r/V) * dN/dr

So, for ever-increasing r, I want to count the change in N per logarithmic bin of r.
As of now, I have trouble setting up the frequency counting in the histogram while accommodating the volume alongside it.
Attempt 1
This uses the dN/d(ln r) / volume equation:
def n(dist, numbins):
    logdist = np.log(dist)
    hist, r_array = np.histogram(logdist, numbins)
    dlogR = r_array[1] - r_array[0]
    x_array = r_array[1:] - dlogR / 2
    ## I am confident the above part of this code is correct.
    ## The succeeding portion does not work.
    dR = r_array[1:] - r_array[0:numbins]
    dN_dlogR = hist * x_array / dR
    volume = 4 * np.pi * dist * dist * dist
    ## The included volume is incorrect
    return [x_array, dN_dlogR / volume]
Plotting this does not properly show a distribution like the first plot I posted above, and it only works when I choose the bin number to match the shape of my input array. The bin number should be arbitrary, shouldn't it?
Attempt 2
This is using the equivalent dN/dr/volume equation.
numbins = np.linspace(min(RelDist),max(RelDist), 100)
hist, r_array = np.histogram(RelDist, numbins)
volume = 4*float(1000**2)
dR = r_array[1]-r_array[0]
x_array = r_array[1:] - dR/2
y = hist/dR
A little bit easier, but without including the volume term, I get a sort of histogram distribution, which is at least a start.
With this attempt, how would I include the volume term with the array?
Example
Start at a distance value R of something like 10 and count the change in number with respect to R; then increase to a distance value R of 20 and count the change, increase to 30 and count the change, and so on and so forth.
Here is a txt file of my array if you are interested in re-creating it
https://www.dropbox.com/s/g40gp88k2p6pp6y/RelDist.txt?dl=0
Since no one was able to answer, I will provide my result in case someone wants to use it in the future:
def n_ln(dist, numbins):
    log_dist = np.log10(dist)
    bins = np.linspace(min(log_dist), max(log_dist), numbins)
    hist, r_array = np.histogram(log_dist, bins)
    dR = r_array[1] - r_array[0]
    x_array = r_array[1:] - dR / 2
    volume = [4. * np.pi * i ** 3. for i in 10 ** x_array[:]]
    return [10 ** x_array, hist / dR / volume]
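A short usage sketch of that function (not from the original post), assuming matplotlib is available; RelDist is replaced here by synthetic data:

import numpy as np
import matplotlib.pyplot as plt

RelDist = np.random.uniform(1.0, 1000.0, size=10000)  # synthetic stand-in for the real array

x, density = n_ln(RelDist, 50)  # 50 logarithmic bins

plt.loglog(x, density)
plt.xlabel('r')
plt.ylabel('dN/d(ln r) per unit volume')
plt.show()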

Counting the number of times a threshold is met or exceeded in a multidimensional array in Python

I have a numpy array that I brought in from a netCDF file, with the shape (930, 360, 720), organized as (time, latitude, longitude).
At each lat/lon pair, across the 930 time stamps, I need to count the number of times the value meets or exceeds a threshold "x" (such as 0.2 or 0.5), ultimately calculate the percentage of time steps for which the threshold was met at each point, and then output the results so they can be plotted later on.
I have attempted numerous methods but here is my most recent:
lat_length = len(lats)  # lats was defined earlier when unpacked from the netCDF dataset
lon_length = len(lons)  # as was lons; these were also used with np.meshgrid(lons, lats)

for i in range(0, lat_length):
    for j in range(0, lon_length):
        if ice[:, i, j] >= x:
            # code to count number of occurrences here
            # code to calculate percentage here
            percent_ice[i, j] += count / len(time)  # calculation

# then go on to plot percent_ice
I hope this makes sense! I would greatly appreciate any help. I'm self-taught in Python, so I may be missing something simple.
Would this be a time to use the any() function? What would be the most efficient way to count the number of times the threshold was exceeded and then calculate the percentage?
You can compare the input 3D array with the threshold x and then sum along the first axis with ndarray.sum(axis=0) to get the count and thereby the percentages, like so -
# Calculate count after thresholding with x and summing along first axis
count = (ice >= x).sum(axis=0)
# Get percentages (ratios) by dividing with first axis length
percent_ice = np.true_divide(count,ice.shape[0])
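A quick, self-contained example of that approach (the data here is synthetic and smaller than the real array):

import numpy as np

ice = np.random.rand(930, 36, 72)  # synthetic (time, lat, lon) data
x = 0.2                            # threshold

count = (ice >= x).sum(axis=0)                     # counts per grid point, shape (36, 72)
percent_ice = np.true_divide(count, ice.shape[0])  # fraction of time steps at or above x
print(percent_ice.shape)  # (36, 72)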
Ah, look, another meteorologist!
There are probably multiple ways to do this and my solution is unlikely to be the fastest since it uses numpy's MaskedArray, which is known to be slow, but this should work:
Numpy has a data type called a MaskedArray which actually contains two normal numpy arrays. It contains a data array as well as a boolean mask. I would first mask all data that are greater than or equal to my threshold (use np.ma.masked_greater() for just greater than):
ice = np.ma.masked_greater_equal(ice, x)
You can then use ice.count() to determine how many values are below your threshold for each lat/lon point by specifying that you want to count along a specific axis:
n_good = ice.count(axis=0)
This should return a 2-dimensional array containing the number of good points. You can then calculate the number of bad points by subtracting n_good from ice.shape[0]:
n_bad = ice.shape[0] - n_good
and calculate the percentage that are bad using:
perc_bad = n_bad/float(ice.shape[0])
There are plenty of ways to do this without using MaskedArray. This is just the easy way that comes to mind for me.
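Putting those steps together, a minimal sketch of the MaskedArray approach (again with synthetic data; the variable names follow the answer above):

import numpy as np

ice = np.random.rand(930, 36, 72)  # synthetic (time, lat, lon) data
x = 0.2                            # threshold

masked = np.ma.masked_greater_equal(ice, x)  # mask values that meet or exceed x
n_good = masked.count(axis=0)                # unmasked (below-threshold) values per point
n_bad = ice.shape[0] - n_good                # values at or above the threshold
perc_bad = n_bad / float(ice.shape[0])       # fraction of time steps at or above x
print(perc_bad.shape)  # (36, 72)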
