How to implement an interactive tone curve in Python?

I want to implement a photo editor in Python using Flask. So far, I have managed to apply an S-curve to a photo, like this:
import cv2
import numpy as np
image = cv2.imread('apple.jpg')
def sToneCurve(frame):
    look_up_table = np.zeros((256, 1), dtype='uint8')
    for i in range(256):
        look_up_table[i][0] = 255 * (np.sin(np.pi * (i / 255 - 1 / 2)) + 1) / 2
    return cv2.LUT(frame, look_up_table)
image_contrasted = sToneCurve(image)
cv2.imwrite('apple_dark.jpg', image_contrasted)
How could I implement an interactive tone curve, so that the user can choose how to edit the photo by shaping a tone curve, instead of a predefined formula being applied as in the code above? What would be the best approach, and which libraries and visualizations should I use for the curve plots?

You implement this using "standard" polynomial fitting: you have N points that you need a curve through, so you find the N-1st order polynomial that does that, then use that polynomial as your mapping function.
You're already using numpy, so use numpy.polynomial.polynomial.polyfit with:
x all your points' x coordinates, including your black and white points (which in a proper tone curve users should be able to move off of (0,0) and (1,1) respectively),
y all your points' y coordinates,
deg if the polynomial has to pass through all points, which it should, this should be equal to len(x) - 1, as two points is a line, or a first degree polynomial, three points is a quadratic curve, or a second degree polynomial, etc. "The" polynomial through N points is an N-1 degree polynomial,
the rest of the args shouldn't particularly matter.
This gives you a numpy array of polynomial coefficients (let's call that array c) that you can then use for mapping: any pixel with lightness/intensity value i should get mapped to:
mapped = f(i) = c[0] * i**0 + c[1] * i**1 + c[2] * i**2 + ...
Which thankfully numpy can do for you by simply using the corresponding polyval function.
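As a minimal sketch of the fit-then-evaluate step described above (the four control points are made up for illustration), note that polyval takes the evaluation points first and the coefficients second, with the lowest-order coefficient first:

```python
import numpy as np
from numpy.polynomial.polynomial import polyfit, polyval

# Hypothetical control points on a [0, 1] x [0, 1] tone curve.
points = [(0.0, 0.0), (0.25, 0.15), (0.75, 0.85), (1.0, 1.0)]
x, y = zip(*points)

# Degree len(x) - 1 makes the polynomial pass through every point.
c = polyfit(x, y, len(x) - 1)

# Evaluate the polynomial: back at the control points, and at a mid-tone.
fitted = polyval(np.array(x), c)
midtone = float(polyval(0.5, c))
```

Here fitted reproduces the y coordinates of the control points up to rounding, which is a quick way to confirm the degree was chosen correctly.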
And of course, to make that fast, what you really want to do is build a LUT that you can just directly consult, every time the user changes a coordinate in the tone curve UI, so:
from numpy.polynomial.polynomial import polyfit, polyval
# How big of a LUT you actually need depends entirely
# on the bit depth you're working with, of course...
BIT_DEPTH = 2**16
TONE_LUT = list(range(0, BIT_DEPTH))
def update_from_tone_ui(coordinates):
    """
    Called on user value update, with coordinates being
    a list-of-lists a la [[0,0], [0.1,0.1], ...]
    """
    global TONE_LUT  # rebind the module-level LUT rather than a local name
    x, y = zip(*coordinates)
    coefficients = polyfit(x, y, len(x) - 1)
    f = lambda i: clamp(polyval(i, coefficients), 0, 1)
    # And remember to make sure the input range to f() matches
    # the actual x/y domain that we used for the polyfit:
    divisor = BIT_DEPTH - 1
    # Scale by divisor (not BIT_DEPTH) so that f = 1 maps to the largest valid value.
    TONE_LUT = [int(round(divisor * f(i/divisor))) for i in range(0, BIT_DEPTH)]
with clamp coming from "somewhere", but if you don't already have one then it's trivially implemented with some shortcut returns:
def clamp(n, floor, ceiling):
    if n < floor: return floor
    if n > ceiling: return ceiling
    return n
(And of course make sure to adjust your clamping values if you don't want your tone curve x and y coordinates in [0,1])
Now, rather than running the mapping function every time, you just directly look up the mapped value. Note that you get a bit of freedom in terms of precision: you could use a tone curve in which the x and y values run from 0 to 1, or you could have them run from 0 to whatever-bit-depth-you-use (2^8, 2^16, what have you), but whatever you use, make sure you scale your actual pixel intensities accordingly when you generate your LUT. Otherwise things will look really interesting.
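For 8-bit images the whole thing can be vectorised. The following is only a sketch: build_tone_lut and apply_tone_lut are hypothetical helper names, the control points are invented, and NumPy fancy indexing is used here in place of cv2.LUT so the snippet runs without OpenCV.

```python
import numpy as np
from numpy.polynomial.polynomial import polyfit, polyval

def build_tone_lut(points, levels=256):
    """Return an integer LUT of length `levels` for control points in [0, 1]."""
    x, y = zip(*points)
    coefficients = polyfit(x, y, len(x) - 1)
    inputs = np.arange(levels) / (levels - 1)        # pixel values scaled to [0, 1]
    outputs = np.clip(polyval(inputs, coefficients), 0.0, 1.0)
    dtype = np.uint8 if levels <= 256 else np.uint16
    return np.rint(outputs * (levels - 1)).astype(dtype)

def apply_tone_lut(image, lut):
    """Map every pixel through the LUT; `image` must be an unsigned integer array."""
    return lut[image]

# Stand-in for a grayscale photo: every 8-bit value exactly once.
image = np.arange(256, dtype=np.uint8).reshape(16, 16)

# Two points give a straight line, so this LUT leaves the image unchanged.
identity = build_tone_lut([(0.0, 0.0), (1.0, 1.0)])
unchanged = apply_tone_lut(image, identity)

# A gentle S-curve through four made-up control points.
s_curve = build_tone_lut([(0.0, 0.0), (0.25, 0.15), (0.75, 0.85), (1.0, 1.0)])
contrasted = apply_tone_lut(image, s_curve)
```

In a Flask app, the server would presumably rebuild the LUT whenever the client posts new control points and then re-render the preview; that wiring is left out here.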

Related

How to quickly calculate a 1D integral over an interpolated 2D array?

Given is a geometrical object, for simplification a hemisphere with a certain radius. It is represented as a 2D matrix, with the Z data being the height. Assuming that I cut the object along any line, I want to calculate the area of the cut. My solution is to interpolate the hemisphere using SciPy's RectBivariateSpline to represent it accurately.
import numpy as np
import scipy.interpolate as intp
radius = 15.
gridsize = 0.5
spectrum = np.arange(-radius,radius+gridsize,gridsize)
X,Y = np.meshgrid(spectrum,spectrum)
Z = np.where(np.sqrt(X**2+Y**2)<=radius, np.sqrt(radius**2-np.sqrt(X**2+Y**2)**2), 0)
spline = intp.RectBivariateSpline(x = X[0,:], y = Y[:,0], z = Z)
#Example coordinates of the cut
x0 = -4.78
x = -6.73
y0 = -15.
y = 15.
However, RectBivariateSpline only offers an area integral (which can be quickly checked by setting x0 = x or y0 = y). On the other hand, UnivariateSpline only takes a 1D array, which would only work if my cut happened to lie along one specific row or column of the matrix Z.
Since I want to perform this operation a few thousand times, I would need a comparably quick way to solve the integral (numerically or analytically doesn't matter as long as the error is somewhat negligible). Does anyone have an idea on how to do this?
It turned out that, for my case, it was sufficient to sample the spline along my cut (using numpy's arange to gather equally spaced points) and then integrate with Simpson's rule, which only requires points that are sufficiently close together (the spacing can be controlled via arange's step parameter).
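A rough sketch of that approach follows. The helper names cut_area and simpson are made up, linspace is used instead of arange so the number of intervals is guaranteed to be even, and an analytic hemisphere stands in for the spline so the snippet needs only NumPy.

```python
import numpy as np

def simpson(values, spacing):
    """Composite Simpson rule for an odd number of equally spaced samples."""
    if len(values) % 2 == 0:
        raise ValueError("need an odd number of samples")
    return spacing / 3.0 * (values[0] + values[-1]
                            + 4.0 * values[1:-1:2].sum()
                            + 2.0 * values[2:-1:2].sum())

def cut_area(height, start, end, n_intervals=2000):
    """Area under `height` along the straight cut from `start` to `end`.

    `height` is any vectorised callable height(x, y); n_intervals must be even.
    """
    (x0, y0), (x1, y1) = start, end
    t = np.linspace(0.0, 1.0, n_intervals + 1)
    xs = x0 + t * (x1 - x0)
    ys = y0 + t * (y1 - y0)
    length = np.hypot(x1 - x0, y1 - y0)
    return simpson(height(xs, ys), length / n_intervals)

# Analytic hemisphere used as a stand-in for the interpolating spline.
radius = 15.0
def hemisphere(x, y):
    return np.sqrt(np.clip(radius**2 - x**2 - y**2, 0.0, None))

area = cut_area(hemisphere, (-4.78, -15.0), (-6.73, 15.0))   # the cut from the question
diameter_cut = cut_area(hemisphere, (-15.0, 0.0), (15.0, 0.0))
```

With the spline from the question, passing spline.ev as height should work the same way, since ev evaluates the spline at paired (x, y) points.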

Programmatical Change of basis for coordinate vectors with different origin of coordinates (python/general maths)

I have a Support Vector Machine that splits my data in two using a decision hyperplane (for visualisation purposes this is a sample dataset with three dimensions), like this:
Now I want to perform a change of basis, such that the hyperplane lies flatly on the x/y plane, such that the distance from each sample point to the decision hyperplane is simply their z-coordinate.
For that, I know that I need to perform a change of basis. The hyperplane of the SVM is given by their coefficient (3d-vector) and intercept (scalar), using (as far as I understand it) the general form for mathematical planes: ax+by+cz=d, with a,b,c being the coordinates of the coefficient and d being the intercept. When plotted as 3d-Vector, the coefficient is a vector orthogonal to the plane (in the image it's the cyan line).
Now to the change of basis: If there was no intercept, I could just assume the vector that is the coefficient is one vector of my new basis, one other can be a random vector that is on the plane and the third one is simply cross product of both, resulting in three orthogonal vectors that can be the column vectors of the transformation-matrix.
The z-function used in the code below comes from simple term rearrangement from the general form of planes: ax+by+cz=d <=> z=(d-ax-by)/c:
z_func = lambda interc, coef, x, y: (interc-coef[0]*x -coef[1]*y) / coef[2]
def generate_trafo_matrices(coefficient, z_func):
    normalize = lambda vec: vec/np.linalg.norm(vec)
    uvec1 = normalize(coefficient)  # unit normal of the plane
    uvec2 = normalize(np.array([1, 0, z_func(1, 0)]))
    uvec3 = normalize(np.cross(uvec1, uvec2))
    back_trafo_matrix = np.array([uvec2, uvec3, coefficient]).T
    #in other order such that its on the xy-plane instead of the yz-plane
    trafo_matrix = np.linalg.inv(back_trafo_matrix)
    return trafo_matrix, back_trafo_matrix
This transformation matrix would then be applied to all points, like this:
def _transform(self, points, inverse=False):
    trafo_mat = self.inverse_trafo_mat if inverse else self.trafo_mat
    points = np.array([trafo_mat.dot(point) for point in points])
    return points
Now if the intercept would be zero, that would work perfectly and the plane would be flat on the xy-axis. However as soon as I have an intercept != zero, the plane is not flat anymore:
I understand that that is the case because this is not a simple change of basis, because the coordinate origin of my other basis is not at (0,0,0) but at a different place (the hyperplane could be crossing the coefficient-vector at any point), but my attempts of adding the intercept to the transformation all didn't lead to the correct result:
def _transform(self, points, inverse=False):
    trafo_mat = self.inverse_trafo_mat if inverse else self.trafo_mat
    intercept = self.intercept if inverse else -self.intercept
    ursprung_translate = trafo_mat.dot(np.array([0,0,0])+trafo_matrix[:,0]*intercept)
    points = np.array([point+trafo_matrix[:,0]*intercept for point in points])
    points = np.array([trafo_mat.dot(point) for point in points])
    points = np.array([point-ursprung_translate for point in points])
    return points
is for example wrong. I am asking this on StackOverflow and not on the math StackExchange because I think I wouldn't be able to translate the respective math into code, I am glad I even got this far.
I have created a github gist with the code to do the transformation and create the plots at https://gist.github.com/cstenkamp/0fce4d662beb9e07f0878744c7214995, which can be launched using Binder under the link https://mybinder.org/v2/gist/jtpio/0fce4d662beb9e07f0878744c7214995/master?urlpath=lab%2Ftree%2Fchange_of_basis_with_translate.ipynb if somebody wants to play around with the code itself.
Any help is appreciated!
The problem here is that your plane is an affine space, not a vector space, so you can't use the usual transform matrix formula.
A coordinate system in affine space is given by an origin point and a basis (put together, they're called an affine frame). For example, if your origin is called O, the coordinates of the point M in the affine frame will be the coordinates of the OM vector in the affine frame's basis.
As you can see, the "normal" R^3 space is a special case of affine space where the origin is (0,0,0).
Once we've determined those, we can use the frame change formulas in affine spaces: if we have two affine frames R = (O, b) and R' = (O', b'), the base change formula for a point M is: M(R') = base_change_matrix_from_b'_to_b * (M(R) - O'(R)) (with O'(R) the coordinates of O' in the coordinate system defined by R).
In our case, we're trying to go from the frame with an origin at (0,0,0) and the canonical basis, to a frame where the origin is the orthogonal projection of (0,0,0) on the plane and the basis is, for instance, the one described in your initial post.
Let's implement these steps:
To begin with, we'll define a Plane class to make our lives a bit easier:
from dataclasses import dataclass
import numpy as np
@dataclass
class Plane:
    # plane equation: a*x + b*y + c*z + d = 0
    a: float
    b: float
    c: float
    d: float
    @property
    def normal(self):
        return np.array([self.a, self.b, self.c])
    def __contains__(self, point:np.array):
        return np.isclose(self.a*point[0] + self.b*point[1] + self.c*point[2] + self.d, 0)
    def project(self, point):
        x,y,z = point
        k = (self.a*x + self.b*y + self.c*z + self.d)/(self.a**2 + self.b**2 + self.c**2)
        return np.array([x - k*self.a, y-k*self.b, z-k*self.c])
    def z(self, x, y):
        return (- self.d - self.b*y - self.a*x)/self.c
We can then implement make_base_changer, which takes a Plane as input and returns two lambda functions performing the forward and inverse transforms (each taking and returning a point). You should be able to test
def normalize(vec):
    return vec/np.linalg.norm(vec)
def make_base_changer(plane):
    uvec1 = plane.normal
    # a direction lying in the plane; requires b, c and d to be non-zero
    uvec2 = [0, -plane.d/plane.b, plane.d/plane.c]
    uvec3 = np.cross(uvec1, uvec2)
    transition_matrix = np.linalg.inv(np.array([uvec1, uvec2, uvec3]).T)
    origin = np.array([0,0,0])
    new_origin = plane.project(origin)
    forward = lambda point: transition_matrix.dot(point - new_origin)
    backward = lambda point: np.linalg.inv(transition_matrix).dot(point) + new_origin
    return forward, backward
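To check the affine frame change numerically, here is a self-contained variant. It is a sketch rather than the code above: make_frame_change is a hypothetical name, it builds an orthonormal basis (an extra assumption, so the inverse is just the transpose), and it puts the normal last so that points on the plane end up with z = 0, as asked in the question. It uses the same a x + b y + c z + d = 0 convention as the Plane class.

```python
import numpy as np

def make_frame_change(normal, d):
    """Forward/backward maps for the plane normal . p + d = 0.

    After `forward`, points on the plane have z = 0 and z is the signed
    distance to the plane.
    """
    normal = np.asarray(normal, dtype=float)
    n = normal / np.linalg.norm(normal)
    # Seed with the coordinate axis least aligned with n to get an in-plane vector.
    seed = np.zeros(3)
    seed[np.argmin(np.abs(n))] = 1.0
    u = np.cross(n, seed)
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    basis = np.column_stack([u, v, n])               # orthonormal columns
    new_origin = -d * normal / normal.dot(normal)    # projection of (0,0,0) on the plane

    def forward(points):
        return (np.asarray(points, dtype=float) - new_origin) @ basis

    def backward(points):
        return np.asarray(points, dtype=float) @ basis.T + new_origin

    return forward, backward

forward, backward = make_frame_change((1.0, 2.0, 3.0), d=-4.0)   # x + 2y + 3z - 4 = 0
on_plane = np.array([[1.0, 1.0, 1.0 / 3.0], [4.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
flat = forward(on_plane)                      # third column should vanish
origin_height = forward(np.zeros((1, 3)))[0, 2]
```

If the plane comes from an equation written as a x + b y + c z = d, the sign of d has to be flipped before calling this.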

Inverse FFT returns negative values when it should not

I have several points (x, y, z coordinates) in a 3D box with associated masses. I want to draw a histogram of the mass density found in spheres of a given radius R.
I have written code that, provided I did not make any errors (which I think I may have), works in the following way:
1. My "real" data is huge, so I wrote a little code to generate non-overlapping points randomly with arbitrary masses in a box.
2. I compute a 3D histogram (weighted by mass) with a binning about 10 times smaller than the radius of my spheres.
3. I take the FFT of my histogram, compute the wave-modes (kx, ky and kz) and use them to multiply my histogram in Fourier space by the analytic expression of the 3D top-hat window (sphere filtering) function in Fourier space.
4. I inverse FFT my newly computed grid.
Thus drawing a 1D histogram of the values in each bin would give me what I want.
My issue is the following: given what I do, there should not be any negative values in my inverted FFT grid (step 4), but I get some, with magnitudes much larger than the numerical error.
If I run my code on a small box (300x300x300 cm3, with the points separated by at least 1 cm) I do not get the issue. I do get it for 600x600x600 cm3 though.
If I set all the masses to 0, thus working on an empty grid, I do get back my 0 without any noted issues.
I give my code here in a full block so that it is easily copied.
import numpy as np
import matplotlib.pyplot as plt
import random
from numba import njit
# 1. Generate a bunch of points with masses from 1 to 3 separated by a radius of 1 cm
radius = 1
rangeX = (0, 100)
rangeY = (0, 100)
rangeZ = (0, 100)
rangem = (1,3)
qty = 20000 # or however many points you want
# Generate a set of all points within 1 of the origin, to be used as offsets later
deltas = set()
for x in range(-radius, radius+1):
    for y in range(-radius, radius+1):
        for z in range(-radius, radius+1):
            if x*x + y*y + z*z<= radius*radius:
                deltas.add((x,y,z))
X = []
Y = []
Z = []
M = []
excluded = set()
for i in range(qty):
    x = random.randrange(*rangeX)
    y = random.randrange(*rangeY)
    z = random.randrange(*rangeZ)
    m = random.uniform(*rangem)
    if (x,y,z) in excluded: continue
    X.append(x)
    Y.append(y)
    Z.append(z)
    M.append(m)
    excluded.update((x+dx, y+dy, z+dz) for (dx,dy,dz) in deltas)
print("There is ",len(X)," points in the box")
# Compute the 3D histogram
a = np.vstack((X, Y, Z)).T
b = 200
H, edges = np.histogramdd(a, weights=M, bins = b)
# Compute the FFT of the grid
Fh = np.fft.fftn(H, axes=(-3,-2, -1))
# Compute the different wave-modes
kx = 2*np.pi*np.fft.fftfreq(len(edges[0][:-1]))*len(edges[0][:-1])/(np.amax(X)-np.amin(X))
ky = 2*np.pi*np.fft.fftfreq(len(edges[1][:-1]))*len(edges[1][:-1])/(np.amax(Y)-np.amin(Y))
kz = 2*np.pi*np.fft.fftfreq(len(edges[2][:-1]))*len(edges[2][:-1])/(np.amax(Z)-np.amin(Z))
# I create a matrix containing the values of the filter in each point of the grid in Fourier space
R = 5
Kh = np.empty((len(kx),len(ky),len(kz)))
@njit(parallel=True)
def func_njit(kx, ky, kz, Kh):
    for i in range(len(kx)):
        for j in range(len(ky)):
            for k in range(len(kz)):
                if np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2) != 0:
                    Kh[i][j][k] = (np.sin((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)-(np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R*np.cos((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R))*3/((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)**3
                else:
                    Kh[i][j][k] = 1
    return Kh
Kh = func_njit(kx, ky, kz, Kh)
# I multiply each point of my grid by the associated value of the filter (multiplication in Fourier space = convolution in real space)
Gh = np.multiply(Fh, Kh)
# I take the inverse FFT of my filtered grid. I take the real part to get back floats but there should only be zeros for the imaginary part.
Density = np.real(np.fft.ifftn(Gh,axes=(-3,-2, -1)))
# Here it shows if there are negative values the magnitude of the error
print(np.min(Density))
D = Density.flatten()
N = np.mean(D)
# I then compute the histogram I want
hist, bins = np.histogram(D/N, bins='auto', density=True)
bin_centers = (bins[1:]+bins[:-1])*0.5
plt.plot(bin_centers, hist)
plt.xlabel('rho/rhom')
plt.ylabel('P(rho)')
plt.show()
Do you know why I'm getting these negative values? Do you think there is a simpler way to proceed?
Sorry if this is a very long post, I tried to make it very clear and will edit it with your comments, thanks a lot!
-EDIT-
A follow-up question on the issue can be found here.
The filter you create in the frequency domain is only an approximation to the filter you want to create. The problem is that we are dealing with the DFT here, not the continuous-domain FT (with its infinite frequencies). The Fourier transform of a ball is indeed the function you describe, however this function is infinitely large -- it is not band-limited!
By sampling this function only within a window, you are effectively multiplying it with an ideal low-pass filter (the rectangle of the domain). This low-pass filter, in the spatial domain, has negative values. Therefore, the filter you create also has negative values in the spatial domain.
This is a slice through the origin of the inverse transform of Kh (after I applied fftshift to move the origin to the middle of the image, for better display):
As you can tell here, there is some ringing that leads to negative values.
One way to overcome this ringing is to apply a windowing function in the frequency domain. Another option is to generate a ball in the spatial domain, and compute its Fourier transform. This second option would be the simplest to achieve. Do remember that the kernel in the spatial domain must also have the origin at the top-left pixel to obtain a correct FFT.
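The second option might look roughly like the sketch below. The helper name ball_kernel_fft, the grid size and the random stand-in for the histogram are all made up, and the radius is given in voxels (an assumption; the question's R would have to be divided by the bin size).

```python
import numpy as np

def ball_kernel_fft(shape, radius_px):
    """FFT of a normalised ball whose centre sits at index (0, 0, 0)."""
    # fftfreq(n) * n gives the signed integer offsets 0, 1, ..., -2, -1 in FFT
    # order, which puts the centre at the top-left voxel and wraps the ball around.
    offsets = [np.rint(np.fft.fftfreq(n) * n) for n in shape]
    dx, dy, dz = np.meshgrid(*offsets, indexing="ij")
    ball = (dx**2 + dy**2 + dz**2 <= radius_px**2).astype(float)
    ball /= ball.sum()                   # unit sum, so filtering conserves total mass
    return np.fft.fftn(ball)

rng = np.random.default_rng(0)
H = rng.uniform(0.0, 3.0, size=(32, 32, 32))     # stand-in for the mass histogram
Kh = ball_kernel_fft(H.shape, radius_px=5)
density = np.real(np.fft.ifftn(np.fft.fftn(H) * Kh))
```

Because the spatial kernel is non-negative by construction, the filtered grid can only go negative by rounding error. Note that the convolution is circular, so spheres near a face of the box wrap around to the opposite face.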
A windowing function is typically applied in the spatial domain to avoid issues with the image border when computing the FFT. Here, I propose to apply such a window in the frequency domain to avoid similar issues when computing the IFFT. Note, however, that this will always further reduce the bandwidth of the kernel (the windowing function would work as a low-pass filter after all), and therefore yield a smoother transition of foreground to background in the spatial domain (i.e. the spatial domain kernel will not have as sharp a transition as you might like). The best known windowing functions are Hamming and Hann windows, but there are many others worth trying out.
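As an illustration of the windowing option, the sketch below tapers the sampled top-hat filter with a Hann window written directly in FFT ordering. The helper name hann_frequency_window and the grid parameters are made up, and a grid spacing of 1 is assumed.

```python
import numpy as np

def hann_frequency_window(shape):
    """Separable Hann taper in FFT ordering: 1 at zero frequency, 0 at Nyquist."""
    axes = [0.5 * (1.0 + np.cos(2.0 * np.pi * np.fft.fftfreq(n))) for n in shape]
    return axes[0][:, None, None] * axes[1][None, :, None] * axes[2][None, None, :]

# Top-hat filter sampled in Fourier space, as in the question.
n, R = 32, 5.0
k = 2.0 * np.pi * np.fft.fftfreq(n)
kr = R * np.sqrt(k[:, None, None]**2 + k[None, :, None]**2 + k[None, None, :]**2)
kr[0, 0, 0] = 1.0                        # placeholder to avoid 0/0 at the origin
Kh = 3.0 * (np.sin(kr) - kr * np.cos(kr)) / kr**3
Kh[0, 0, 0] = 1.0

# Spatial-domain kernels without and with the taper.
raw_kernel = np.real(np.fft.ifftn(Kh))
windowed_kernel = np.real(np.fft.ifftn(Kh * hann_frequency_window(Kh.shape)))
```

This particular taper is equivalent to smoothing the spatial kernel with weights (1/4, 1/2, 1/4) along each axis, so the most negative lobe cannot get worse, but it is not guaranteed to disappear. Comparing raw_kernel.min() with windowed_kernel.min() shows how much ringing is left.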
Unsolicited advice:
I simplified your code to compute Kh to the following:
kr = np.sqrt(kx[:,None,None]**2 + ky[None,:,None]**2 + kz[None,None,:]**2)
kr *= R
Kh = (np.sin(kr)-kr*np.cos(kr))*3/(kr)**3
Kh[0,0,0] = 1
I find this easier to read than the nested loops. It should also be significantly faster, and avoid the need for njit. Note that you were computing the same distance (what I call kr here) 5 times. Factoring out such computation is not only faster, but yields more readable code.
Just a guess:
Where do you get the idea that the imaginary part MUST be zero? Have you ever tried to take the absolute values (sqrt(re^2 + im^2)) and forget about the phase instead of just taking the real part? Just something that came to my mind.

Construct an array spacing proportional to a function or other array

I have a function (f : black line) which varies sharply in a specific, small region (derivative f' : blue line, and second derivative f'' : red line). I would like to integrate this function numerically, and if I distribute points evenly (in log-space) I end up with fairly large errors in the sharply varying region (near 2E15 in the plot).
How can I construct an array spacing such that it is very well sampled in the area where the second derivative is large (i.e. a sampling frequency proportional to the second derivative)?
I happen to be using python, but I'm interested in a general algorithm.
Edit:
1) It would be nice to be able to still control the number of sampling points (at least roughly).
2) I've considered constructing a probability distribution function shaped like the second derivative and drawing randomly from that --- but I think this will offer poor convergence, and in general, it seems like a more deterministic approach should be feasible.
Assuming f'' is available as a NumPy array (called d2f below), you could do the following
# Scale these deltas as you see fit
# (abs() because only the magnitude of the second derivative matters)
deltas = 1/np.abs(d2f)
domain = deltas.cumsum()
To account only for order of magnitude swings, this could be adjusted as follows...
deltas = 1/(-np.log10(1/np.abs(d2f)))
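A variant of the same idea that also fixes the number of points is to accumulate the weight and invert the cumulative curve with np.interp. This is a sketch under assumptions: resample_by_weight is a hypothetical helper, the tanh test function is made up, and the small floor is an arbitrary choice that keeps flat regions from being left empty.

```python
import numpy as np

def resample_by_weight(x, weight, n_points, floor=1e-3):
    """Return n_points in [x[0], x[-1]] with density roughly proportional to |weight|."""
    w = np.abs(weight)
    w = w / w.max() + floor
    # Cumulative weight along x (trapezoidal rule), normalised to run from 0 to 1.
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * (w[1:] + w[:-1]) * np.diff(x))])
    cdf /= cdf[-1]
    # Equal steps in cumulative weight give small x-steps where the weight is large.
    return np.interp(np.linspace(0.0, 1.0, n_points), cdf, x)

x = np.linspace(-5.0, 5.0, 2001)
f = np.tanh(4.0 * x)                     # made-up function with a sharp transition at 0
d2f = np.gradient(np.gradient(f, x), x)
grid = resample_by_weight(x, d2f, n_points=200)
```

Unlike drawing random samples from a distribution shaped like f'', this is deterministic, and n_points controls the size of the result exactly.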
I'm just spitballing here ... (as I don't have time to try this out for real)...
Your data looks (roughly) linear on a log-log plot (at least, each segment seems to be), so I might consider doing a sort-of integration in log-space.
log_x = np.log(x)
log_y = np.log(y)
Now, for each of your points, you can get the slope (and intercept) in log-log space:
rise = np.diff(log_y)
run = np.diff(log_x)
slopes = rise / run
And, similarly, the intercept can be calculated:
# y = mx + b
# :. b = y - mx
intercepts = log_y[:-1] - slopes * log_x[:-1]
Alright, now we have a bunch of (straight) lines in log-log space. But a straight line in log-log space corresponds to y = exp(intercept) * x^slope in real space. We can integrate that easily enough: the antiderivative of a * x^k is a/(k+1) * x^(k+1) (for k != -1), with a = exp(intercept) and k = slope, so...
def _eval_log_log_integrate(b, k, x):
    # b is the log-log intercept, so the real-space prefactor is exp(b)
    return np.exp(b)/(k+1) * x ** (k+1)
def log_log_integrate(b, k, x1, x2):
    return _eval_log_log_integrate(b, k, x2) - _eval_log_log_integrate(b, k, x1)
partial_integrals = []
for b, k, x_lower, x_upper in zip(intercepts, slopes, x[:-1], x[1:]):
    partial_integrals.append(log_log_integrate(b, k, x_lower, x_upper))
total_integral = sum(partial_integrals)
You'll want to check my math -- It's been a while since I've done this sort of thing :-)
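One way to check the math is to run the piecewise power-law integration on a function whose integral is known. The sketch below is a vectorised take on the idea (log_log_integral is a hypothetical name); it assumes x and y are strictly positive and also handles segments with slope -1, where the antiderivative is a logarithm.

```python
import numpy as np

def log_log_integral(x, y):
    """Integrate y(x) assuming each segment is a power law y = a * x**k (x, y > 0)."""
    log_x, log_y = np.log(x), np.log(y)
    k = np.diff(log_y) / np.diff(log_x)               # slope of each segment
    a = np.exp(log_y[:-1] - k * log_x[:-1])           # prefactor of each segment
    x_lo, x_hi = x[:-1], x[1:]
    near_minus_one = np.isclose(k, -1.0)
    k_safe = np.where(near_minus_one, 0.0, k)         # dummy exponent, replaced below
    power = a / (k_safe + 1.0) * (x_hi**(k_safe + 1.0) - x_lo**(k_safe + 1.0))
    logarithmic = a * np.log(x_hi / x_lo)             # antiderivative when k == -1
    return np.sum(np.where(near_minus_one, logarithmic, power))

x = np.logspace(0, 1, 6)                 # only six samples between 1 and 10
exact = 0.9                              # integral of x**-2 from 1 to 10
estimate = log_log_integral(x, x**-2.0)
trapezoid = np.sum(0.5 * (x[1:]**-2.0 + x[:-1]**-2.0) * np.diff(x))
```

For an exact power law the estimate matches the analytic value to rounding error even with six samples, while the plain trapezoid rule on the same samples is visibly off.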
1) The Cool Approach
At the moment I implemented an 'adaptive refinement' approach inspired by hydrodynamics techniques. I have a function which I want to sample, f, and I choose some initial array of sample points x_i. I construct a "sampling" function g, which determines where to insert new sample points.
In this case I chose g as the slope of log(f) --- since I want to resolve rapid changes in log space. I then divide the span of g into L=3 refinement levels. If g(x_i) exceeds a refinement level, that span is subdivided into N=2 pieces, those subdivisions are added into the samples and are checked against the next level. This yields something like this:
The solid grey line is the function I want to sample, and the black crosses are my initial sampling points.
The dashed grey line is the derivative of the log of my function.
The colored dashed lines are my 'refinement levels'
The colored crosses are my refined sampling points.
This is all shown in log-space.
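The refinement loop described above could be sketched as follows. This is one reading of the description, not the original implementation: adaptive_refine is a hypothetical name, the thresholds are taken as equally spaced bands of the initial g, subdivision is geometric because everything happens in log-space, and the test function is made up.

```python
import numpy as np

def adaptive_refine(x, f, n_levels=3, pieces=2):
    """Subdivide intervals whose log-log slope magnitude exceeds successive thresholds."""
    x = np.asarray(x, dtype=float)
    g = np.abs(np.diff(np.log(f(x))) / np.diff(np.log(x)))
    # Thresholds splitting the initial span of g into n_levels + 1 equal bands.
    thresholds = g.min() + (g.max() - g.min()) * np.arange(1, n_levels + 1) / (n_levels + 1)
    for threshold in thresholds:
        g = np.abs(np.diff(np.log(f(x))) / np.diff(np.log(x)))
        steep = g > threshold
        # Interior points of a geometric subdivision of every steep interval.
        extra = [np.geomspace(lo, hi, pieces + 1)[1:-1]
                 for lo, hi in zip(x[:-1][steep], x[1:][steep])]
        x = np.sort(np.concatenate([x] + extra))
    return x

x0 = np.geomspace(1e13, 1e18, 30)                    # coarse initial samples
f = lambda x: 1.0 / (1.0 + (x / 2e15)**4)            # made-up function with a knee at 2e15
refined = adaptive_refine(x0, f)
```

The number of output points is only controlled indirectly, through n_levels and pieces.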
2) The Simple Approach
After I finished (1), I realized that I probably could have just chosen a maximum spacing in y, and chosen x-spacings to achieve that. Similarly, just divide the function evenly in y, and find the corresponding x points.... The results of this are shown below:
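For a monotonic function, dividing evenly in y and finding the matching x points can be done by interpolating the inverse. A minimal sketch, assuming y is strictly monotonic in x (equal_y_spacing and the tanh example are made up):

```python
import numpy as np

def equal_y_spacing(x, y, n_points):
    """Sample points spaced evenly in y; assumes y is strictly monotonic in x."""
    if y[0] > y[-1]:                     # np.interp needs increasing abscissae
        x, y = x[::-1], y[::-1]
    targets = np.linspace(y[0], y[-1], n_points)
    return np.sort(np.interp(targets, y, x))

x = np.linspace(-5.0, 5.0, 2001)
y = np.tanh(x)                           # made-up monotonic function with a steep middle
samples = equal_y_spacing(x, y, 50)
```

A non-monotonic function would have to be split into monotonic pieces first.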
A simple approach would be to split the x-axis-array into three parts and use different spacing for each of them. It would allow you to maintain the total number of points and also the required spacing in different regions of the plot. For example:
x = np.linspace(10**13, 10**15, 100)
x = np.append(x, np.linspace(10**15, 10**16, 100))
x = np.append(x, np.linspace(10**16, 10**18, 100))
You may want to choose a better spacing based on your data, but you get the idea.

Fast, elegant way to calculate empirical/sample covariogram

Does anyone know a good method to calculate the empirical/sample covariogram, if possible in Python?
This is a screenshot of a book which contains a good definition of covariogram:
If I understood it correctly, for a given lag/width h, I'm supposed to get all the pair of points that are separated by h (or less than h), multiply its values and for each of these points, calculate its mean, which in this case, are defined as m(x_i). However, according to the definition of m(x_{i}), if I want to compute m(x1), I need to obtain the average of the values located within distance h from x1. This looks like a very intensive computation.
First of all, am I understanding this correctly? If so, what is a good way to compute this assuming a two dimensional space? I tried to code this in Python (using numpy and pandas), but it takes a couple of seconds and I'm not even sure it is correct, that is why I will refrain from posting the code here. Here is another attempt of a very naive implementation:
import numpy as np
from scipy.spatial.distance import pdist, squareform
distances = squareform(pdist(np.array(coordinates))) # coordinates is a nx2 array
z = np.array(z) # z are the values
cutoff = np.max(distances)/3.0 # somewhat arbitrary cutoff
width = cutoff/15.0
widths = np.arange(0, cutoff + width, width)
Z = []
Cov = []
for w in np.arange(len(widths)-1): # for each width
    # for each pairwise distance
    for i in np.arange(distances.shape[0]):
        for j in np.arange(distances.shape[1]):
            if distances[i, j] <= widths[w+1] and distances[i, j] > widths[w]:
                m1 = []
                m2 = []
                # when a distance is within a given width, calculate the means of
                # the points involved
                for x in np.arange(distances.shape[1]):
                    if distances[i,x] <= widths[w+1] and distances[i, x] > widths[w]:
                        m1.append(z[x])
                for y in np.arange(distances.shape[1]):
                    if distances[j,y] <= widths[w+1] and distances[j, y] > widths[w]:
                        m2.append(z[y])
                mean_m1 = np.array(m1).mean()
                mean_m2 = np.array(m2).mean()
                Z.append(z[i]*z[j] - mean_m1*mean_m2)
    Z_mean = np.array(Z).mean() # calculate covariogram for width w
    Cov.append(Z_mean) # collect covariances for all widths
However, now I have confirmed that there is an error in my code. I know that because I used the variogram to calculate the covariogram (covariogram(h) = covariogram(0) - variogram(h)) and I get a different plot:
And it is supposed to look like this:
Finally, if you know a Python/R/MATLAB library to calculate empirical covariograms, let me know. At least, that way I can verify what I did.
One could use numpy.cov (older SciPy releases also exposed it as scipy.cov), but if one does the calculation directly (which is very easy), there are more ways to speed this up.
First, make some fake data that has some spatial correlations. I'll do this by first making the spatial correlations, and then using random data points that are generated using this, where the data is positioned according to the underlying map, and also takes on the values of the underlying map.
Edit 1:
I changed the data point generator so positions are purely random, but z-values are proportional to the spatial map. And, I changed the map so that the left and right sides were shifted relative to each other to create negative correlation at large h.
from numpy import *
import math
import random  # stdlib random; deliberately shadows numpy's random module
import matplotlib.pyplot as plt
S = 1000
N = 900
# first, make some fake data, with correlations on two spatial scales
# density map
x = linspace(0, 2*pi, S)
sx = sin(3*x)*sin(10*x)
density = .8* abs(outer(sx, sx))
density[:,:S//2] += .2
# make a point cloud motivated by this density
random.seed(10) # so this can be repeated
points = []
while len(points)<N:
    v, ix, iy = random.random(), random.randint(0,S-1), random.randint(0,S-1)
    if True: #v<density[ix,iy]:
        points.append([ix, iy, density[ix,iy]])
locations = array(points).transpose()
print(locations.shape)
plt.imshow(density, alpha=.3, origin='lower')
plt.plot(locations[1,:], locations[0,:], '.k')
plt.xlim((0,S))
plt.ylim((0,S))
plt.show()
# build these into the main data: all pairs into distances and z0 z1 values
L = locations
m = array([[math.sqrt((L[0,i]-L[0,j])**2+(L[1,i]-L[1,j])**2), L[2,i], L[2,j]]
for i in range(N) for j in range(N) if i>j])
Which gives:
The above is just the simulated data, and I made no attempt to optimize its production, etc. I assume this is where the OP starts, with the task below, since the data already exists in a real situation.
Now calculate the "covariogram" (which is much easier than generating the fake data, btw). The idea here is to sort all the pairs and associated values by h, and then index into these using ihvals. That is, summing up to index ihval is the sum over N(h) in the equation, since this includes all pairs with hs below the desired values.
Edit 2:
As suggested in the comments below, N(h) is now only the pairs that are between h-dh and h, rather than all pairs between 0 and h (where dh is the spacing of h-values in ihvals -- ie, S/1000 was used below).
# now do the real calculations for the covariogram
# sort by h and give clear names
i = argsort(m[:,0]) # h sorting
h = m[i,0]
zh = m[i,1]
zsh = m[i,2]
zz = zh*zsh
hvals = linspace(0,S,1000) # the values of h to use (S should be in the units of distance, here I just used ints)
ihvals = searchsorted(h, hvals)
result = []
for i, ihval in enumerate(ihvals[1:]):
    # pairs whose distance lies between hvals[i] and hvals[i+1]
    start, stop = ihvals[i], ihval
    N = stop-start
    if N>0:
        mnh = sum(zh[start:stop])/N
        mph = sum(zsh[start:stop])/N
        szz = sum(zz[start:stop])/N
        C = szz-mnh*mph
        result.append([hvals[i+1], C])
result = array(result)
plt.plot(result[:,0], result[:,1])
plt.grid()
plt.show()
which looks reasonable to me, as one can see bumps or troughs at the expected h values, but I haven't done a careful check.
The main speedup here over calling cov is that one can precalculate all of the products, zz. Otherwise, one would feed zh and zsh into cov for every new h, and all the products would be recalculated. This calculation could be sped up even more by doing partial sums, i.e., from ihvals[n-1] to ihvals[n] at each step n, but I doubt that will be necessary.
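The per-bin sums can also be computed without a Python loop. The sketch below uses np.digitize and np.bincount instead of sorting; binned_covariogram is a hypothetical helper, the data is random, and the bins are half-open intervals [lower, upper).

```python
import numpy as np

def binned_covariogram(coords, z, bin_edges):
    """Per distance bin: mean(z_i * z_j) - mean(z_i) * mean(z_j) over all pairs in the bin."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)               # every unordered pair once
    h = np.linalg.norm(coords[i] - coords[j], axis=1)
    which = np.digitize(h, bin_edges) - 1             # bin index of each pair
    n_bins = len(bin_edges) - 1
    keep = (which >= 0) & (which < n_bins)            # drop pairs outside the edges
    which, zi, zj = which[keep], z[i][keep], z[j][keep]
    count = np.bincount(which, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_i = np.bincount(which, weights=zi, minlength=n_bins) / count
        mean_j = np.bincount(which, weights=zj, minlength=n_bins) / count
        mean_ij = np.bincount(which, weights=zi * zj, minlength=n_bins) / count
    return mean_ij - mean_i * mean_j, count           # NaN where a bin is empty

rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 10.0, size=(40, 2))         # made-up sample locations
z = rng.normal(size=40)                               # made-up sample values
edges = np.linspace(0.0, 5.0, 6)
cov, count = binned_covariogram(coords, z, edges)
```

The pair arrays grow quadratically with the number of points, so for large data sets the pairs would need to be processed in chunks.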
