Extrapolate 2d numpy array in one dimension - python

I have a numpy array dataset from a simulation, but I'm missing the point at the edge (x=0.1). How can I interpolate/extrapolate the data in z to the edge? I have:
x = [ 0. 0.00667 0.02692 0.05385 0.08077]
y = [ 0. 10. 20. 30. 40. 50.]
# 0. 0.00667 0.02692 0.05385 0.08077
z = [[ 25. 25. 25. 25. 25. ] # 0.
[ 25.301 25.368 25.617 26.089 26.787] # 10.
[ 25.955 26.094 26.601 27.531 28.861] # 20.
[ 26.915 27.126 27.887 29.241 31.113] # 30.
[ 28.106 28.386 29.378 31.097 33.402] # 40.
[ 29.443 29.784 30.973 32.982 35.603]] # 50.
I want to add a new column in z corresponding to x = 0.1 so that my new x will be
x_new = [ 0. 0.00667 0.02692 0.05385 0.08077 0.1]
# 0. 0.00667 0.02692 0.05385 0.08077 0.1
z = [[ 25. 25. 25. 25. 25. ? ] # 0.
[ 25.301 25.368 25.617 26.089 26.787 ? ] # 10.
[ 25.955 26.094 26.601 27.531 28.861 ? ] # 20.
[ 26.915 27.126 27.887 29.241 31.113 ? ] # 30.
[ 28.106 28.386 29.378 31.097 33.402 ? ] # 40.
[ 29.443 29.784 30.973 32.982 35.603 ? ]] # 50.
Where all '?' replaced with interpolated/extrapolated data.
Thanks for any help!

Have you had a look at scipy.interpolate.interp2d (which uses splines)?
import numpy as np
from scipy.interpolate import interp2d

fspline = interp2d(x, y, z)  # maybe need to switch x and y around
znew = fspline([0.1], y)
z = np.c_[z, znew]  # join the new column onto z
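Note that in recent SciPy releases interp2d is deprecated (and removed in the newest ones). A minimal sketch of the same idea with RectBivariateSpline, assuming x, y and z are the arrays from the question:
import numpy as np
from scipy.interpolate import RectBivariateSpline

x = np.array([0., 0.00667, 0.02692, 0.05385, 0.08077])
y = np.array([0., 10., 20., 30., 40., 50.])
# z as in the question, shape (6, 5): rows follow y, columns follow x,
# while RectBivariateSpline expects shape (len(x), len(y)), hence the .T
spline = RectBivariateSpline(x, y, np.asarray(z).T)
znew = spline(0.1, y)            # spline-extrapolated column at x = 0.1
z_ext = np.c_[z, znew.ravel()]   # append it as the new last column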
EDIT:
The method that @dnalow and I are imagining is along the following lines:
import numpy as np
import matplotlib.pyplot as plt
# make some test data
def func(x, y):
    return np.sin(np.pi*x) + np.sin(np.pi*y)

xx, yy = np.mgrid[0:2:20j, 0:2:20j]
zz = func(xx, yy)
fig, (ax1, ax2, ax3, ax4) = plt.subplots(1,4, figsize=(13, 3))
ax1.imshow(zz, interpolation='nearest')
ax1.set_title('Original')
# remove last column
zz[:,-1] = np.nan
ax2.imshow(zz, interpolation='nearest')
ax2.set_title('Missing data')
# compute missing column using simplest imaginable model: first order Taylor
gxx, gyy = np.gradient(zz[:, :-1])
zz[:, -1] = zz[:, -2] + gxx[:, -1] + gyy[:,-1]
ax3.imshow(zz, interpolation='nearest')
ax3.set_title('1st order Taylor approx')
# add curvature to estimate
ggxx, _ = np.gradient(gxx)
_, ggyy = np.gradient(gyy)
zz[:, -1] = zz[:, -2] + gxx[:, -1] + gyy[:,-1] + ggxx[:,-1] + ggyy[:, -1]
ax4.imshow(zz, interpolation='nearest')
ax4.set_title('2nd order Taylor approx')
fig.tight_layout()
fig.savefig('extrapolate_2d.png')
plt.show()
You could improve the estimate by
(a) adding higher order derivatives (aka Taylor expansion), or
(b) computing the gradients in more directions than just x and y (and then weighting the gradients accordingly).
Also, you will get better gradients if you pre-smooth the image (and now we have a complete Sobel filter...).
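For instance, a minimal sketch of the pre-smoothing idea, assuming zz is the array with the missing last column from the snippet above:
from scipy.ndimage import gaussian_filter

# smooth the known part of the image before differentiating;
# sigma is a free parameter you would tune to your data
zz_smooth = gaussian_filter(zz[:, :-1], sigma=1.0)
gxx, gyy = np.gradient(zz_smooth)
# first order Taylor step, now from the smoothed gradients
zz[:, -1] = zz[:, -2] + gxx[:, -1] + gyy[:, -1]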

Related

How to find all negative slope straight lines and get the one with the minimum y=0 value

I would like to implement a function that plots a histogram and computes the minimum y=0 value over all the negative-slope straight lines obtained from the coordinates of the bins.
For example, this plot (generated from the data below):
i1s1=[1.081576, 1.063301000000001, 0.7665449999999989, 0.6702140000000014, 0.9948089999999983, 0.8247,
1.0281650000000013, 1.0204430000000002, 0.9952000000000005, 0.8824919999999992, 1.0094080000000005,
0.23627600000000015, 0.7032509999999981, 0.34252400000000094, 0.5976010000000009, 0.9419879999999985,
1.1269390000000001, 1.0165110000000013, 0.803722999999998, 1.2493930000000013, 0.3798589999999997,
0.5761640000000021, 1.0876199999999976, 0.8915590000000009, 0.9461050000000029, 1.0046489999999935,
0.8577720000000042, 1.131541999999996, 0.9394370000000052, 1.939746999999997, 0.7513170000000002,
0.7799210000000016, 0.2271250000000009, 0.5776759999999967, 1.0690549999999988, 1.2057460000000049,
2.6899219999999957, 3.521351000000003, 0.8345109999999991, 1.1897260000000003, 0.9561250000000001,
2.113745999999999, 0.7494179999999986, 1.1265460000000047, 0.8125209999999967, 2.5974119999999985,
0.7458990000000014, 1.0843160000000012, 0.9465989999999991, 0.8917330000000021, 0.933920999999998,
2.2939850000000064, 1.3038799999999924, 1.7666460000000086]
import numpy as np
import matplotlib.pyplot as plt

n, bins, patches = plt.hist(x=i1s1, bins='auto', color='#0504aa',
                            alpha=0.6, rwidth=0.9)
plt.grid(axis='y', alpha=0.35)
plt.xlabel('time [s]')
plt.ylabel('Frequency')
plt.show()
The coordinates of each bin can be calculated like this:
midpoints = bins[:-1] + np.diff(bins)/2
coords = np.column_stack((midpoints,n))
coords = array([[ 0.31381516, 4. ],
[ 0.48719547, 0. ],
[ 0.66057579, 6. ],
[ 0.83395611, 12. ],
[ 1.00733642, 18. ],
[ 1.18071674, 6. ],
[ 1.35409705, 1. ],
[ 1.52747737, 0. ],
[ 1.70085768, 1. ],
[ 1.874238 , 1. ],
[ 2.04761832, 1. ],
[ 2.22099863, 1. ],
[ 2.39437895, 0. ],
[ 2.56775926, 1. ],
[ 2.74113958, 1. ],
[ 2.91451989, 0. ],
[ 3.08790021, 0. ],
[ 3.26128053, 0. ],
[ 3.43466084, 1. ]])
I would like to use coords to:
- compute all the straight lines with negative slope,
- get the one that has the minimum y=0 value,
- plot that line on the histogram, and
- get the x-axis value at y=0 of that line.
For example, in the histogram above, the solution is a line whose y=0 value lies at x = 0.48 (the solution figure is omitted here). I would like to plot that line on the histogram and read off its x value at y=0. Figures for the solutions on other example histograms are likewise omitted.
EDIT
Another dataset example to test solutions:
i3s1=[1.4856339999999992, 0.27564800000000034, 1.1008430000000011, 1.2301969999999987, 0.2667920000000006, 0.8187089999999984, 0.42119200000000134, 0.5471469999999989, 1.0582640000000012, 0.7725049999999989, 0.8486200000000004, 0.8414530000000013, 0.34434200000000104, 3.3969810000000003, 4.355844999999999, 1.5555109999999992, 1.2929899999999996, 1.2005979999999994, 2.6386439999999993, 1.2733500000000006, 1.2238090000000028, 1.406841, 1.227254000000002, 1.4577429999999936, 1.204816000000001, 0.4409120000000044, 1.1166549999999944, 0.20276700000000147, 3.8218770000000006, 0.11855700000000269, 2.541343999999995, 0.0911790000000039, 2.0708699999999993, 2.6000489999999985, 0.11452600000000501, 0.16021299999999883, 2.288936999999997, 0.266489, 0.18775300000000072, 2.497996999999998, 0.42036200000000434, 2.3378999999999976, 0.23202399999999557, 2.6313650000000024, 0.20198899999999753, 0.17698099999999783]
Use itertools.combinations to get a list of all possible red lines and then find the linear function with a negative slope that has the smallest x value for y=0.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from itertools import combinations

# get all possible pairs of points that a linear function could cross
all_combinations = pd.DataFrame(
    map(lambda r: (r[0][0], r[0][1], r[1][0], r[1][1]),
        combinations(coords, 2)),
    columns=["x1", "y1", "x2", "y2"])
# calculate a and b for each linear function using y = a*x + b
all_combinations["a"] = ((all_combinations["y2"] - all_combinations["y1"])
                         / (all_combinations["x2"] - all_combinations["x1"]))
all_combinations["b"] = (all_combinations["y2"] -
                         all_combinations["a"] * all_combinations["x2"])
# for each linear function get the x value for y=0 (called x0)
all_combinations["x0"] = -all_combinations["b"] / all_combinations["a"]
# only keep the functions with a negative, finite slope
neg_slope = all_combinations[(all_combinations["a"] < 0) &
                             (all_combinations["a"] != float("inf")) &
                             (all_combinations["a"] != float("-inf"))]
# only keep the function with the smallest x0
miny0 = neg_slope[neg_slope["x0"] == neg_slope["x0"].min()].reset_index()
x1, y1, x0 = miny0["x1"][0], miny0["y1"][0], miny0["x0"][0]
# plot the result
plt.bar(x=coords[:, 0], height=coords[:, 1], width=0.1)
plt.plot([x1, x0], [y1, 0], color="red")
plt.show()

Is there a more efficient way to generate a distance matrix in numpy

I was wondering if there is a more straightforward, more efficient way of generating a distance matrix, given the H x W of the matrix and the starting index location.
For simplicity let's take a 3x3 matrix where the starting point is (0,0). Thus, the distance matrix to be generated is:
[[ 0. 1. 2. ]
[ 1. 1.41421356 2.23606798]
[ 2. 2.23606798 2.82842712]]
Index (0,1) is 1 distance away, while index (2,2) is 2.828 distance away.
The code I have so far is below:
import numpy as np

def get_distances(start, height, width):
    matrix = np.zeros((height, width), dtype=np.float16)
    indexes = [(y, x) for y, row in enumerate(matrix) for x, val in enumerate(row)]
    to_points = np.array(indexes)
    start_point = np.array(start)
    distances = np.linalg.norm(to_points - start_point, ord=2, axis=1)
    return distances.reshape((height, width))

height = 3
width = 3
start = [0, 0]
distance_matrix = get_distances(start, height, width)
This is pretty efficient already, I think. But numpy always surprises me with tricks that I usually never think of, so I was wondering if one exists for this scenario. Thanks
You can use hypot() and broadcasting:
import numpy as np
x = np.arange(3)
np.hypot(x[None, :], x[:, None])
or the outer method:
np.hypot.outer(x, x)
the result:
array([[ 0. , 1. , 2. ],
[ 1. , 1.41421356, 2.23606798],
[ 2. , 2.23606798, 2.82842712]])
To calculate the distance from every point on a grid to a fixed point (x, y):
x, y = np.ogrid[0:3, 0:3]
np.hypot(x - 2, y - 2)
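Putting this together for the general case from the question (an H x W grid and an arbitrary start index), a minimal sketch:
import numpy as np

def get_distances_ogrid(start, height, width):
    # the open grids broadcast against each other, so no (H*W, 2)
    # array of indexes is ever materialized
    y, x = np.ogrid[0:height, 0:width]
    return np.hypot(y - start[0], x - start[1])

print(get_distances_ogrid((0, 0), 3, 3))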

How to generate a sphere in 3D Numpy array

Given a 3D numpy array of shape (256, 256, 256), how would I make a solid sphere shape inside? The code below generates a series of increasing and decreasing circles but is diamond shaped when viewed in the two other dimensions.
def make_sphere(arr, x_pos, y_pos, z_pos, radius=10, size=256, plot=False):
    val = 255
    for r in range(radius):
        y, x = np.ogrid[-x_pos:size-x_pos, -y_pos:size-y_pos]
        mask = x*x + y*y <= r*r
        top_half = arr[z_pos+r]
        top_half[mask] = val  # + np.random.randint(val)
        arr[z_pos+r] = top_half
    for r in range(radius, 0, -1):
        y, x = np.ogrid[-x_pos:size-x_pos, -y_pos:size-y_pos]
        mask = x*x + y*y <= r*r
        bottom_half = arr[z_pos+r]
        bottom_half[mask] = val  # + np.random.randint(val)
        arr[z_pos+2*radius-r] = bottom_half
    if plot:
        for i in range(2*radius):
            if arr[z_pos+i].max() != 0:
                print(z_pos+i)
                plt.imshow(arr[z_pos+i])
                plt.show()
    return arr
EDIT: pymrt.geometry has been removed in favor of raster_geometry.
DISCLAIMER: I am the author of both pymrt and raster_geometry.
If you just need to have the sphere, you can use the pip-installable module raster_geometry, and particularly raster_geometry.sphere(), e.g:
import numpy as np
import raster_geometry as rg

arr = rg.sphere(3, 1)
print(arr.astype(np.int_))
# [[[0 0 0]
# [0 1 0]
# [0 0 0]]
# [[0 1 0]
# [1 1 1]
# [0 1 0]]
# [[0 0 0]
# [0 1 0]
# [0 0 0]]]
Internally, this is implemented as an n-dimensional superellipsoid generator; you can check its source code for details.
Briefly, the (simplified) code would read like this:
import numpy as np

def sphere(shape, radius, position):
    """Generate an n-dimensional spherical mask."""
    # assume shape and position have the same length and contain ints
    # the units are pixels / voxels (px for short)
    # radius is an int or float in px
    assert len(position) == len(shape)
    n = len(shape)
    semisizes = (radius,) * len(shape)
    # generate the grid for the support points
    # centered at the position indicated by position
    grid = [slice(-x0, dim - x0) for x0, dim in zip(position, shape)]
    position = np.ogrid[grid]
    # calculate the distance of all points from `position` center
    # scaled by the radius
    arr = np.zeros(shape, dtype=float)
    for x_i, semisize in zip(position, semisizes):
        # this can be generalized for exponent != 2
        # in which case `(x_i / semisize)`
        # would become `np.abs(x_i / semisize)`
        arr += (x_i / semisize) ** 2
    # the inner part of the sphere will have distance below or equal to 1
    return arr <= 1.0
and testing it:
# this will save a sphere in a boolean array
# the shape of the containing array is: (256, 256, 256)
# the position of the center is: (127, 127, 127)
# if you want 0 and 1 instead, just use .astype(int)
# for plotting it is likely that you want that
arr = sphere((256, 256, 256), 10, (127, 127, 127))
# just for fun you can check that the volume is matching what expected
# (the two numbers do not match exactly because of the discretization error)
print(np.sum(arr))
# 4169
print(4 / 3 * np.pi * 10 ** 3)
# 4188.790204786391
I am failing to see exactly how your code works, but to check that this approach actually produces spheres (using your numbers) you could try:
arr = sphere((256, 256, 256), 10, (127, 127, 127))
# plot in 3D
import matplotlib.pyplot as plt
from skimage import measure

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
verts, faces, normals, values = measure.marching_cubes(arr, 0.5)
ax.plot_trisurf(
    verts[:, 0], verts[:, 1], faces, verts[:, 2], cmap='Spectral',
    antialiased=False, linewidth=0.0)
plt.show()
Other approaches
One could implement essentially the same with a combination of np.linalg.norm() and np.indices():
import numpy as np

def sphere_idx(shape, radius, position):
    """Generate an n-dimensional spherical mask."""
    assert len(position) == len(shape)
    n = len(shape)
    position = np.array(position).reshape((-1,) + (1,) * n)
    arr = np.linalg.norm(np.indices(shape) - position, axis=0)
    return arr <= radius
producing the same results (sphere_ogrid is sphere from above):
import matplotlib.pyplot as plt

funcs = sphere_ogrid, sphere_idx
fig, axs = plt.subplots(1, len(funcs), squeeze=False, figsize=(4 * len(funcs), 4))
d = 500
n = 2
shape = (d,) * n
position = (d // 2,) * n
size = (d // 8)
base = sphere_ogrid(shape, size, position)
for i, func in enumerate(funcs):
    arr = func(shape, size, position)
    axs[0, i].imshow(arr)
However, this is going to be substantially slower, and it requires much more temporary memory (n_dim times the size of the output).
The benchmarks below seem to support the speed assessment:
base = sphere_ogrid(shape, size, position)
for func in funcs:
    print(f"{func.__name__:20s}", np.allclose(base, func(shape, size, position)), end=" ")
    %timeit -o func(shape, size, position)  # IPython magic
# sphere_ogrid         True 1000 loops, best of 5: 866 µs per loop
# sphere_idx           True 100 loops, best of 5: 4.15 ms per loop
Another straightforward option is to build the full coordinate grid with np.mgrid and threshold the Euclidean distance from the center:
import numpy as np

size = 100
radius = 10
x0, y0, z0 = (50, 50, 50)
x, y, z = np.mgrid[0:size:1, 0:size:1, 0:size:1]
r = np.sqrt((x - x0)**2 + (y - y0)**2 + (z - z0)**2)
r[r > radius] = 0  # zero outside the sphere; points inside keep their distance to the center
Nice question. My answer to a similar question would be applicable here also.
You can try the following code. In the code below, AA is the matrix you want.
import numpy as np
from copy import deepcopy

''' size : size of original 3D numpy matrix A.
    radius : radius of circle inside A which will be filled with ones.
'''
size, radius = 5, 2

''' A : numpy.ndarray of shape size*size*size. '''
A = np.zeros((size, size, size))

''' AA : copy of A (you don't want the original copy of A to be overwritten.) '''
AA = deepcopy(A)

''' (x0, y0, z0) : coordinates of center of circle inside A. '''
x0, y0, z0 = int(np.floor(A.shape[0]/2)), \
    int(np.floor(A.shape[1]/2)), int(np.floor(A.shape[2]/2))

for x in range(x0-radius, x0+radius+1):
    for y in range(y0-radius, y0+radius+1):
        for z in range(z0-radius, z0+radius+1):
            ''' deb: measures how far a coordinate in A is from the center.
                deb>=0: inside the sphere.
                deb<0: outside the sphere.
            '''
            deb = radius - abs(x0-x) - abs(y0-y) - abs(z0-z)
            if deb >= 0:
                AA[x, y, z] = 1
Following is an example of the output for size=5 and radius=2 (a sphere of radius 2 pixels inside a numpy array of shape 5*5*5):
[[[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 1. 1. 1. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0.]]
[[0. 0. 1. 0. 0.]
[0. 1. 1. 1. 0.]
[1. 1. 1. 1. 1.]
[0. 1. 1. 1. 0.]
[0. 0. 1. 0. 0.]]
[[0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 1. 1. 1. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]]]
I haven't printed the output for the size and radius that you had asked for (size=32 and radius=4), as the output will be very long.
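Note that deb above uses the L1 (Manhattan) distance, so the filled region is actually an octahedron (which is why the slices look diamond shaped). A sketch of the Euclidean variant, keeping the same loop structure and names (run it on a fresh AA to compare):
for x in range(x0-radius, x0+radius+1):
    for y in range(y0-radius, y0+radius+1):
        for z in range(z0-radius, z0+radius+1):
            # Euclidean (L2) test: inside the ball iff the squared
            # distance from the center does not exceed radius**2
            if (x0-x)**2 + (y0-y)**2 + (z0-z)**2 <= radius**2:
                AA[x, y, z] = 1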
Here is how to create a voxel space without numpy. The main idea is to calculate the distance between the center and each voxel, and to create the voxel only if that distance is within the radius.
from math import sqrt

def distance_dimension(xyz0=[], xyz1=[]):
    delta_OX = pow(xyz0[0] - xyz1[0], 2)
    delta_OY = pow(xyz0[1] - xyz1[1], 2)
    delta_OZ = pow(xyz0[2] - xyz1[2], 2)
    return sqrt(delta_OX + delta_OY + delta_OZ)

def voxels_figure(figure='sphere', position=[0, 0, 0], size=1):
    xmin, xmax = position[0]-size, position[0]+size
    ymin, ymax = position[1]-size, position[1]+size
    zmin, zmax = position[2]-size, position[2]+size
    voxels = []
    if figure == 'cube':
        for world_z in range(zmin, zmax):
            for world_y in range(ymin, ymax):
                for world_x in range(xmin, xmax):
                    voxels.append([world_x, world_y, world_z])
    elif figure == 'sphere':
        for world_z in range(zmin, zmax):
            for world_y in range(ymin, ymax):
                for world_x in range(xmin, xmax):
                    # distance from this voxel to the sphere center
                    distance = distance_dimension(xyz0=[world_x, world_y, world_z], xyz1=position)
                    if distance < size:
                        voxels.append([world_x, world_y, world_z])
    return voxels

voxels = voxels_figure(figure='sphere', position=[0, 0, 0], size=3)
Once you have the voxel indexes, you can set the corresponding entries of a cube matrix to ones, as sketched below.
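A minimal sketch of that last step (the variable names here are my own):
import numpy as np

vox = np.array(voxels)              # shape (N, 3) of world coordinates
origin = vox.min(axis=0)            # shift so all indexes are non-negative
cube = np.zeros(vox.max(axis=0) - origin + 1, dtype=np.uint8)
cube[tuple((vox - origin).T)] = 1   # mark every listed voxel with a one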
Instead of using loops, I propose to use a meshgrid + sphere equation + np.where
import numpy as np

def generate_sphere(volumeSize):
    x_ = np.linspace(0, volumeSize, volumeSize)
    y_ = np.linspace(0, volumeSize, volumeSize)
    z_ = np.linspace(0, volumeSize, volumeSize)
    r = int(volumeSize/2)       # radius can be changed by changing r value
    center = int(volumeSize/2)  # center can be changed here
    u, v, w = np.meshgrid(x_, y_, z_, indexing='ij')
    a = np.power(u-center, 2) + np.power(v-center, 2) + np.power(w-center, 2)
    b = np.where(a <= r*r, 1, 0)
    return b
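A quick usage check (the numbers are my own, chosen to be easy to eyeball):
vol = generate_sphere(64)
print(vol.shape)  # (64, 64, 64)
print(vol.sum())  # roughly (4/3)*pi*32**3 voxels are set to 1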

Autocorrelation of a multidimensional array in numpy

I have a two dimensional array, i.e. an array of sequences which are also arrays. For each sequence I would like to calculate the autocorrelation, so that for a (5,4) array, I would get 5 results, or an array of dimension (5,7).
I know I could just loop over the first dimension, but that's slow and my last resort. Is there another way?
Thanks!
EDIT:
Based on the chosen answer plus the comment from mtrw, I have the following function:
import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
    """FFT based autocorrelation function, which is faster than numpy.correlate"""
    # x is supposed to be an array of sequences, of shape (totalelements, length)
    length = x.shape[1]
    fftx = fft(x, n=(length*2-1), axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret
Here length is simply the length of each sequence, x.shape[1]. I also didn't restrict the result to real numbers, since I need to take complex numbers into account as well.
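A quick sanity check of this function on toy data (my own example):
x = np.arange(20.0).reshape(5, 4)
print(xcorr(x).shape)  # (5, 7): one full autocorrelation of 2*4 - 1 lags per row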
Using FFT-based autocorrelation:
import numpy
from numpy.fft import fft, ifft

data = numpy.arange(5*4).reshape(5, 4)
print(data)
##[[ 0  1  2  3]
## [ 4  5  6  7]
## [ 8  9 10 11]
## [12 13 14 15]
## [16 17 18 19]]
dataFT = fft(data, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(dataAC)
##[[  14.    8.    6.    8.]
## [ 126.  120.  118.  120.]
## [ 366.  360.  358.  360.]
## [ 734.  728.  726.  728.]
## [1230. 1224. 1222. 1224.]]
I'm a little confused by your statement about the answer having dimension (5, 7), so maybe there's something important I'm not understanding.
EDIT: At the suggestion of mtrw, a padded version that doesn't wrap around:
import numpy
from numpy.fft import fft, ifft

data = numpy.arange(5*4).reshape(5, 4)
padding = numpy.zeros((5, 3))
dataPadded = numpy.concatenate((data, padding), axis=1)
print(dataPadded)
##[[ 0.  1.  2.  3.  0.  0.  0.]
## [ 4.  5.  6.  7.  0.  0.  0.]
## [ 8.  9. 10. 11.  0.  0.  0.]
## [12. 13. 14. 15.  0.  0.  0.]
## [16. 17. 18. 19.  0.  0.  0.]]
dataFT = fft(dataPadded, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(numpy.round(dataAC, 10))
##[[  14.    8.    3.    0.    0.    3.    8.]
## [ 126.   92.   59.   28.   28.   59.   92.]
## [ 366.  272.  179.   88.   88.  179.  272.]
## [ 734.  548.  363.  180.  180.  363.  548.]
## [1230.  920.  611.  304.  304.  611.  920.]]
There must be a more efficient way to do this, especially because autocorrelation is symmetric and I don't take advantage of that.
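One easy saving for real input is the real-input FFT, which exploits the conjugate symmetry of the spectrum; a sketch of the same computation on dataPadded with numpy.fft.rfft/irfft:
from numpy.fft import rfft, irfft

spec = rfft(dataPadded, axis=1)
dataAC = irfft(spec * numpy.conjugate(spec), n=dataPadded.shape[1], axis=1)
# same values as above, with roughly half the FFT work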
For really large arrays it becomes important to have n = 2 ** p, where p is an integer. This will save you huge amounts of time. For example:
import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
    # int() floors, so l may be smaller than 2*x.shape[1] - 1
    l = 2 ** int(np.log2(x.shape[1] * 2 - 1))
    fftx = fft(x, n=l, axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret
This might give you wrap-around errors. For large arrays the auto correlation should be insignificant near the edges, though.
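If the wrap-around matters, a variant of my own that rounds the FFT length up to the next power of two keeps the speed benefit without truncating any lags (reusing the imports above):
def xcorr_nowrap(x):
    m = x.shape[1]
    # smallest power of two that is >= 2*m - 1, so no lag wraps around
    l = 2 ** int(np.ceil(np.log2(2 * m - 1)))
    fftx = fft(x, n=l, axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    return fftshift(ret, axes=1)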
Maybe it's just a preference, but I wanted to follow from the definition. I personally find it a bit easier to follow that way. This is my implementation for an arbitrary nd array.
from itertools import product
from numpy import empty, roll, asarray

def autocorrelate(x):
    """
    Compute the multidimensional autocorrelation of an nd array.
    input: an nd array of floats
    output: an nd array of autocorrelations
    """
    # used for transposes
    t = roll(range(x.ndim), 1)
    # pairs of indexes
    # the first is for the autocorrelation array
    # the second is the shift
    ii = [list(enumerate(range(1, s - 1))) for s in x.shape]
    # initialize the resulting autocorrelation array
    acor = empty(shape=[len(s0) for s0 in ii])
    # iterate over all combinations of directional shifts
    for i in product(*ii):
        # extract the indexes for
        # the autocorrelation array
        # and original array respectively
        i1, i2 = asarray(i).T
        x1 = x.copy()
        x2 = x.copy()
        for i0 in i2:
            # clip the unshifted array at the end
            x1 = x1[:-i0]
            # and the shifted array at the beginning
            x2 = x2[i0:]
            # prepare to do the same for
            # the next axis
            x1 = x1.transpose(t)
            x2 = x2.transpose(t)
        # normalize shifted and unshifted arrays
        x1 -= x1.mean()
        x1 /= x1.std()
        x2 -= x2.mean()
        x2 /= x2.std()
        # compute the autocorrelation directly
        # from the definition
        acor[tuple(i1)] = (x1 * x2).mean()
    return acor

scipy smart optimize

I need to fit some points from different datasets with straight lines: from every dataset I want to fit a line. So I get the parameters ai and bi that describe the i-th line: ai + bi*x. The problem is that I want to impose that all the ai are equal, because I want the same intercept. I found a tutorial here: http://www.scipy.org/Cookbook/FittingData#head-a44b49d57cf0165300f765e8f1b011876776502f. The difference is that I don't know a priori how many datasets I have. My code is this:
from numpy import *
from scipy import optimize

# here I have 3 datasets, but in general I don't know how many there are
ypoints = [array([0, 2.1, 2.4]),    # first dataset, 3 points
           array([0.1, 2.1, 2.9]),  # second dataset
           array([-0.1, 1.4])]      # only 2 points
xpoints = [array([0, 2, 2.5]),      # first dataset
           array([0, 2, 3]),        # second, also x coordinates are different
           array([0, 1.5])]         # the first coordinate is always 0
fitfunc = lambda a, b, x: a + b * x
errfunc = lambda p, xs, ys: array([yi - fitfunc(p[0], p[i+1], xi)
                                   for i, (xi, yi) in enumerate(zip(xs, ys))])
p_arrays = [r_[0.]] * len(xpoints)
pinit = r_[[ypoints[0][0]] + p_arrays]
fit_parameters, success = optimize.leastsq(errfunc, pinit, args=(xpoints, ypoints))
I got
Traceback (most recent call last):
File "prova.py", line 19, in <module>
fit_parameters, success = optimize.leastsq(errfunc, pinit, args = (xpoints, ypoints))
File "/usr/lib64/python2.6/site-packages/scipy/optimize/minpack.py", line 266, in leastsq
m = check_func(func,x0,args,n)[0]
File "/usr/lib64/python2.6/site-packages/scipy/optimize/minpack.py", line 12, in check_func
res = atleast_1d(thefunc(*((x0[:numinputs],)+args)))
File "prova.py", line 14, in <lambda>
for i, (xi,yi) in enumerate(zip(xs, ys)) ])
ValueError: setting an array element with a sequence.
If you just need a linear fit, then it is better to estimate it with linear regression instead of a non-linear optimizer.
More fit statistics could be obtained by using scikits.statsmodels instead.
import numpy as np
from numpy import array

ypoints = np.r_[array([0, 2.1, 2.4]),   # first dataset, 3 points
                array([0.1, 2.1, 2.9]), # second dataset
                array([-0.1, 1.4])]     # only 2 points
xpoints = [array([0, 2, 2.5]),  # first dataset
           array([0, 2, 3]),    # second, also x coordinates are different
           array([0, 1.5])]     # the first coordinate is always 0
xp = np.hstack(xpoints)
indicator = []
for i, a in enumerate(xpoints):
    indicator.extend([i] * len(a))
indicator = np.array(indicator)
x = xp[:, None] * (indicator[:, None] == np.arange(3)).astype(int)  # 3 = number of datasets
x = np.hstack((np.ones((xp.shape[0], 1)), x))
print(np.dot(np.linalg.pinv(x), ypoints))
# [ 0.01947973 0.98656987 0.98481549 0.92034684]
The matrix of regressors has a common intercept, but different columns for each dataset:
>>> x
array([[ 1. , 0. , 0. , 0. ],
[ 1. , 2. , 0. , 0. ],
[ 1. , 2.5, 0. , 0. ],
[ 1. , 0. , 0. , 0. ],
[ 1. , 0. , 2. , 0. ],
[ 1. , 0. , 3. , 0. ],
[ 1. , 0. , 0. , 0. ],
[ 1. , 0. , 0. , 1.5]])
(Side note: use def, not lambda assigned to a name -- that's utterly silly and has nothing but downsides, lambda's only use is making anonymous functions!).
Your errfunc should return a sequence (array or otherwise) of floating point numbers, but it doesn't, because you're trying to use, as the items of your array, the arrays of differences between each dataset's y points (remember, ypoints aka ys is a list of arrays!) and the fit function's results. So you need to "collapse" the expression yi - fitfunc(p[0], p[i+1], xi) to a single floating point number, e.g. norm(yi - fitfunc(p[0], p[i+1], xi)).
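Alternatively (a sketch of my own, not from the answer above), you can keep the individual residuals and simply flatten them into one 1-D array, which is the shape leastsq expects:
import numpy as np
from scipy import optimize

def fitfunc(a, b, x):
    return a + b * x

def errfunc(p, xs, ys):
    # one shared intercept p[0] and one slope p[i+1] per dataset;
    # concatenating keeps every signed residual visible to leastsq
    return np.concatenate([yi - fitfunc(p[0], p[i+1], xi)
                           for i, (xi, yi) in enumerate(zip(xs, ys))])

pinit = np.r_[ypoints[0][0], np.zeros(len(xpoints))]
fit_parameters, success = optimize.leastsq(errfunc, pinit, args=(xpoints, ypoints))
print(fit_parameters)  # shared intercept followed by one slope per dataset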
