Is there any numpy, scipy, or Python function to interpolate between two 2D numpy arrays? I have two 2D numpy arrays, and I want to apply changes to the first array to make it similar to the second. The constraint is that I want the changes to be smooth. For example, let the arrays be:
A
[[1 1 1
1 1 1
1 1 1]]
and
B
[[34 100 15
62 17 87
17 34 60]]
To make A similar to B, I could add 33 to the first grid cell of A, and so on. However, to make the changes smoother, I plan to compute a mean using a 2x2 window on array B and then apply the resulting changes to array A. Is there a built-in numpy or scipy method to do this, or to follow this approach, without using a for loop?
You've just described a Kalman Filtering / data fusion problem. You have an initial state A that has some errors and you have some observations B that also have some noise. You want to improve your estimate of state A by injecting some information from B, all while accounting for spatially correlated errors in both datasets. We don't have any prior information about the errors in A and B, so we can just make it up. Here's an implementation:
import numpy as np
# Make a matrix of the distances between points in an array
def dist(M):
    nx = M.shape[0]
    ny = M.shape[1]
    x = np.ravel(np.tile(np.arange(nx), (ny, 1))).reshape((nx*ny, 1))
    y = np.ravel(np.tile(np.arange(ny), (nx, 1))).reshape((nx*ny, 1))
    n, m = np.meshgrid(x, y)
    d = np.sqrt((n - n.T)**2 + (m - m.T)**2)
    return d
# Turn a distance matrix into a covariance matrix. Here is a linear covariance matrix.
def covariance(d, scaling_factor):
    c = (-d/np.amax(d) + 1)*scaling_factor
    return c
A = np.array([[1,1,1],[1,1,1],[1,1,1]]) # background state
B = np.array([[34,100,15],[62,17,87],[17,34,60]]) # observations
x = np.ravel(A).reshape((9,1)) # vector representation
y = np.ravel(B).reshape((9,1)) # vector representation
P_a = np.eye(9)*50 # background error covariance matrix (set to diagonal here)
P_b = covariance(dist(B),2) # observation error covariance matrix (set to a function of distance here)
# Compute the Kalman gain matrix
K = P_a.dot(np.linalg.inv(P_a+P_b))
x_new = x + K.dot(y-x)
A_new = x_new.reshape(A.shape)
print(A)
print(B)
print(A_new)
Now, this method strictly only works if your data are unbiased, i.e. mean(A) must equal mean(B); you'll still get reasonable results if that doesn't hold exactly. Also, you can tune the covariance matrices however you like. I'd recommend reading the Kalman filter Wikipedia page for more details.
By the way, the example above yields:
[[ 27.92920141 90.65490699 7.17920141]
[ 55.92920141 7.65490699 79.17920141]
[ 10.92920141 24.65490699 52.17920141]]
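As a side note on the unbiasedness assumption: a minimal sketch (not part of the original recipe) of removing the mean difference from the observations before applying the gain, reusing x, y and K from the code above:
# hypothetical de-biasing step: shift the observations so mean(B) matches mean(A)
bias = y.mean() - x.mean()
x_new = x + K.dot((y - bias) - x)
A_new = x_new.reshape(A.shape)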
One way of smoothing could be to use convolve2d:
import numpy as np
from scipy import signal
B = np.array([[34, 100, 15],
[62, 17, 87],
[17, 34, 60]])
kernel = np.full((2, 2), .25)
smoothed = signal.convolve2d(B, kernel)
# [[ 8.5 33.5 28.75 3.75]
# [ 24. 53.25 54.75 25.5 ]
# [ 19.75 32.5 49.5 36.75]
# [ 4.25 12.75 23.5 15. ]]
The above pads the matrix with zeros from all sides and then calculates the mean of each 2x2 window placing the value at the center of the window.
If the matrices were actually larger, then using a 3x3 kernel (such as np.full((3, 3), 1/9)) and passing mode='same' to convolve2d would give a smoothed B with its shape preserved and elements "matching" the original. Otherwise you may need to decide what to do with the boundary values to make the shapes the same again.
To move A towards the smoothed B, it can be set to a chosen affine combination of the matrices using standard arithmetic operations, for instance: A = .2 * A + .8 * smoothed.
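A minimal sketch of those last two steps together, on the same A and B as above (the 3x3 kernel and the 0.2/0.8 weights are just illustrative choices; with matrices this small the boundary effects of mode='same' dominate):
import numpy as np
from scipy import signal
A = np.ones((3, 3))
B = np.array([[34, 100, 15],
              [62, 17, 87],
              [17, 34, 60]])
kernel = np.full((3, 3), 1/9)                         # 3x3 mean filter
smoothed = signal.convolve2d(B, kernel, mode='same')  # same shape as B
A = .2 * A + .8 * smoothed                            # move A towards the smoothed B
print(A)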
I am seeking to construct a matrix of which I will calculate the inverse. This will be used in an implicit method for solving a nonlinear parabolic PDE. My current approach, for reasons that will become obvious, gives me a singular (non-invertible) matrix. For context, in reality the matrix will be 30 by 30, but in these examples I am using smaller matrices for testing purposes.
Say I want to create a large square sparse matrix. Using spdiags only allows you to input members of the main, lower and upper diagonals individually. So how do you make it so that each diagonal has one value repeated for all its entries?
Example Code:
import numpy as np
from scipy.sparse import spdiags
from numpy.linalg import inv
updiag = -0.25
diag = 0.5
lowdiag = -0.25
Jdata = np.array([[diag], [lowdiag], [updiag]])
Diags = [0, -1, 1]
J = spdiags(Jdata, Diags, 3, 3).toarray()
print(J)
inverseJ = inv(J)
print(inverseJ)
This produces a 3 x 3 matrix, but only with the first entry of each diagonal filled in. I wondered about using np.fill_diagonal, but that requires an existing matrix and only handles the main diagonal. Am I misunderstanding something?
The first argument of spdiags is a matrix of values to be used as the diagonals. You can use it this way:
Jdata = np.array([3 * [diag], 3 * [lowdiag], 3 * [updiag]])
Diags = [0, -1, 1]
J = spdiags(Jdata, Diags, 3, 3).toarray()
print(J)
# [[ 0.5 -0.25 0. ]
# [-0.25 0.5 -0.25]
# [ 0. -0.25 0.5 ]]
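For the 30 x 30 case mentioned in the question, the same idea scales by repeating each constant along a full-length diagonal, for example (a sketch; n = 30 is the size from the question):
import numpy as np
from scipy.sparse import spdiags
n = 30
updiag, diag, lowdiag = -0.25, 0.5, -0.25
Jdata = np.array([np.full(n, diag), np.full(n, lowdiag), np.full(n, updiag)])
J = spdiags(Jdata, [0, -1, 1], n, n).toarray()
The only change from the 3 x 3 example is that each diagonal is now a full-length array rather than a single value.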
Hello, I have a question regarding a problem I am facing in Python. I was studying tensors and I saw that each row/column of a tensor must have the same size. Is it possible to create a tensor of, perhaps, a 3D object or matrix where, let's say, we have 3 axes: x, y, z?
On the x axis I want to create a vector to work as an index. So let x be from 0 to N.
Then on the y axis I want to have N random integer vectors of size m (where m < N), and on the z axis N random matrices of size (m, m).
Is it possible?
My first approach was to create a big vector of length N*m and a big matrix of dimensions (N*m, N*m) where I would store all my random vectors and matrices, and then if I wanted to change, for example, my second vector, I would have to play with the indexes. However, is there another way to approach this problem with tensors or numpy that I'm unaware of?
Thank you in advance for your advice.
First vector, N = 3: [1, 2, 3]
Second, N vectors of length m, m = 2:
[[4,5], [6,7], [7,8]]
So, N matrices of size (m, m):
[[[1,1], [2,2]], [[1,1], [2,2]], [[1,1], [2,2]]]
Let's create numpy arrays from them.
import numpy as np
N = 3
m = 2
a = np.array([1,2,3])
b = np.random.randn(N, m)
c = np.random.randn(N, m, m)
You see the problem here? The last matrix c already has 3 dimensions according to your definitions.
Your argument can be simplified.
Let's say our final matrix is -
a = np.zeros((3,2,2)) # 3 dimensions, x,y,z
1) For the first dimension -
a[0,:,:] = 0 # first axis, first index = 0
a[1,:,:] = 1 # first axis, 2nd index = 1
a[2,:,:] = 2 # first axis, 3rd index = 2
2) Now, we need to fill up the rest of the positions, but dimensions don't match up.
So, it's better to create separate tensors for them.
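As a rough usage note on the "separate tensors" idea (reusing b, c and m from the first answer's code; the replacement values are arbitrary): updating, say, the second vector is then plain indexing, with no offset bookkeeping:
b[1] = np.array([9, 9])  # replace the second length-m vector
c[1] = np.eye(m)         # replace the second (m, m) matrix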
I am following this tutorial to implement object tracking for my project - https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/
The method is to find the centroids of detected objects in the initial frame, and then calculate the shortest distance to the centroids of detected objects that show up in the next frame. The assumption is that the closest centroid belongs to the same object.
In the tutorial -
from scipy.spatial import distance as dist
...
D = dist.cdist(np.array(objectCentroids), newCentroids)
is used to calculate the distance (Euclidean distance). Unfortunately, I cannot use the scipy module as I am trying to deploy this to AWS Lambda (size limit). In this case, the recommendation is to use this - https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html
D = np.linalg.norm(objectCentroids - newCentroids)
The issue with this is that, unlike dist.cdist, which computes the full pairwise distance matrix, np.linalg.norm only outputs a single value, calculated after newCentroids is subtracted from the objectCentroids matrix. My plan is to loop n times (however big the matrix is) and append to another matrix to construct the result I need. However, I wasn't sure if my understanding of this concept is correct, so I wanted to seek some help. If anyone knows of a better way, I would appreciate any pointers.
UPDATE
Based on the feedback/answer I got, I updated the code a bit, and well... it seems to be working -
n = arrayObjectCentroids.shape[0]
m = inputCentroids.shape[0]
T = []
for i in range(0, n):
    for z in range(0, m):
        Tv = np.linalg.norm(arrayObjectCentroids[i] - inputCentroids[z])
        # print(f'Tv is \n {Tv}')
        T = np.append(T, Tv)
        # print(f'T is \n {T}')
print(f'new T is \n {T}')
D = np.reshape(T, (n, m))
print(f'D is \n {D}')
In this case, if there is one object and it moves a little -
newCentroids is [[224 86]], and the shape of it is (1, 2)...
objectCentroids is [[224 86]], and the shape objectCentroids is (1, 2)
D is [[0.]]
If I have 3 objects -
new Centroids is
[[228 79]
[ 45 127]
[103 123]]
shape of inputCentroids is (3, 2)
objectCentroids is
[[228 79]
[ 45 127]
[103 123]]
shape objectCentroids is (3, 2)
D is
[[ 0. 189.19038031 132.51792332]
[189.19038031 0. 58.13776741]
[132.51792332 58.13776741 0. ]]
Great that it works, but I feel like this may not be the best solution out there, and if you have any pointer, I would appreciate it!
Thanks!
EDIT: Edited code to address comments below
If in your case you have vectors in Euclidean space then np.linalg.norm will return the length of that vector.
So objectCentroid - newCentroid will give you the vector between the point at objectCentroid and the point at newCentroid. Note that this is between 2 points, not an array containing ALL points.
To get all combinations of points I've used itertools and then reshaped the array to give the same output as dist.cdist:
import numpy as np
from scipy.spatial import distance as dist
import itertools
# Example data
objectCentroids = np.array([[0,0,0],[1,1,1],[2,2,2], [3,3,3]])
newCentroids = np.array([[4,4,4],[5,5,5],[6,6,6],[7,7,7]])
comb = list(itertools.product(objectCentroids, newCentroids))
all_dist = []
for pair in comb:
    dis = np.linalg.norm(pair[0] - pair[1])
    all_dist.append(dis)
all_dist = np.reshape(all_dist, (len(objectCentroids), len(newCentroids)))
D = dist.cdist(objectCentroids, newCentroids)
print(D)
print(" ")
print(all_dist)
You can use Numpy broadcasting to create a distance matrix.
Read about it here and here.
The basic idea is:
Stack (reshape) your centroids as (1, n, 3) and (n, 1, 3), where the last dimension with shape 3 is (x, y, z). Then subtract the arrays and use np.linalg.norm to calculate the distance along the last axis. That yields a square (n, n) distance matrix.
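A minimal sketch of that broadcasting approach (reusing the example centroids from the previous answer; the norm is taken along the last, coordinate axis):
import numpy as np
objectCentroids = np.array([[0,0,0],[1,1,1],[2,2,2],[3,3,3]])
newCentroids = np.array([[4,4,4],[5,5,5],[6,6,6],[7,7,7]])
# (n, 1, 3) - (1, m, 3) broadcasts to (n, m, 3)
diff = objectCentroids[:, None, :] - newCentroids[None, :, :]
D = np.linalg.norm(diff, axis=-1)  # (n, m) pairwise distance matrix
print(D)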
I have seen several discussions in this forum about applying a median filter with a moving window, but my application has a special peculiarity.
I have a 3D array of dimension 750x12000x10000 and I need to apply a median filter that results in a 2D array (12000x10000). For this, each median calculation should consider a fixed neighborhood window (usually 100x100) and all z-axis values. There are some zero values in the matrix and they should not be considered in the calculation of the median. To process the real data, I am using numpy.memmap:
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(750, 12000, 10000))
To process the real data stored with memmap, my input array is subdivided into several chunks, but to speed up my tests, in this post I will use a reduced array (11, 200, 300) and a smaller window (11, 5, 5) or (11, 50, 50), and I expect a result matrix of shape (200, 300):
import numpy as np
from timeit import default_timer as timer
zsize, ysize, xsize = (11, 200, 300)
w_size = 5 #to generate a 3D window (all_z, w_size, w_size)
#w_size = 50 #to generate a 3D window (all_z, w_size, w_size)
m_in=np.arange(zsize*ysize*xsize).reshape(zsize, ysize, xsize)
m_out = np.zeros((ysize, xsize))
First, I've tried the brute force method, but it is very slow as expected (even for the small array):
start = timer()
for l in range(0, ysize):
    i_l = max(0, l - w_size//2)  # // keeps the slice bounds integral
    o_l = min(ysize, i_l + w_size//2)
    for c in range(0, xsize):
        i_c = max(0, c - w_size//2)
        o_c = min(xsize, i_c + w_size//2)
        values = m_in[:, i_l:o_l, i_c:o_c]
        values = values[np.nonzero(values)]
        value = np.median(values)
        m_out[l, c] = value
end = timer()
print("Time elapsed: %f seconds" % (end - start))
#11.7 seconds with 50 in z, 7.9 seconds with 5 in z
To remove the double for loop, I tried itertools.product, but it is still slow:
from itertools import product
for l, c in product(range(0, ysize), range(0, xsize)):
    i_l = max(0, l - w_size//2)
    o_l = min(ysize, i_l + w_size//2)
    i_c = max(0, c - w_size//2)
    o_c = min(xsize, i_c + w_size//2)
    values = m_in[:, i_l:o_l, i_c:o_c]
    values = values[np.nonzero(values)]
    value = np.median(values)
    m_out[l, c] = value
#11.7 seconds with 50 in z, 2.3 seconds with 5
So I tried to exploit numpy's fast matrix operations, starting with scipy.ndimage:
from scipy import ndimage
m_all = ndimage.median_filter(m_in, size=(zsize, w_size, w_size))
m_out[:] = m_all[0] #only first layer of 11, considering all the same
#a lot of seconds with 50 in z, 7.9 seconds with 5
and scipy.signal too:
from scipy import signal
m_all = signal.medfilt(m_in, kernel_size=(zsize, w_size, w_size))
m_out[:] = m_all[0] #only first layer of 11, considering all the same
#a lot of seconds with 50 in z, 7.8 seconds with 5 in z
But in both scipy cases there is wasted processing, because the function is applied at every 3D position of the input matrix; it would be enough to apply it only to the first layer, using a sliding window of dimension (all_z, w_size, w_size).
In all my tests I did not get fast execution times, even when using the reduced matrix and windows ((11, 200, 300) and (11, 50, 50)). The performance will be even more critical with my real data (an array of 750x12000x10000 and a window of 750x100x100).
Please, can anyone help me apply this median filter (3D array to 2D array) in a faster, more Pythonic way?
Edit1
The real data array has many zero values. Considering a single z-axis column, of the 750 values only about 15 are non-zero. The zeros must be discarded in the processing, and because of this I am not using a sparse array representation.
This ended up being too long for a comment:
If you were applying a mean-filter, this problem would be trivial: you would take the mean over the z-axis and then apply the mean filter in 2D; this would be exactly equivalent to computing the mean over the full (x,y,z) neighbourhood in one go as the mean operation is associative (if that is the term; I mean: f(f(a,b), c) = f(a, b, c)).
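A quick numerical check of that equivalence (a sketch with made-up data; scipy.ndimage.uniform_filter stands in for the 2D mean filter, and the chosen pixel is away from the borders):
import numpy as np
from scipy import ndimage
rng = np.random.default_rng(0)
m = rng.random((11, 50, 60))
w = 5
# mean over z first, then a 2D mean filter
two_step = ndimage.uniform_filter(m.mean(axis=0), size=w)
# direct mean over the full (z, w, w) neighbourhood at one interior pixel
y, x = 20, 30
h = w // 2
direct = m[:, y-h:y+h+1, x-h:x+h+1].mean()
print(np.isclose(two_step[y, x], direct))  # True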
In principle, this is not true for the median. However, as your neighbourhoods in (x,y) and z are both fairly large, I would assume that associativity still approximately holds (unless your data is drawn from a whacky distribution which it probably is not as this looks like some sort of imaging data). If I were you, I would test on some test data if applying the median in z first and then the median filter (or maybe even a mean filter) in (x,y) results in an unacceptable error compared to computing the median exactly by filtering in (x,y,z) simultaneously.
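A sketch of that approximation on the reduced array size from the question (the zeros are masked as NaN so np.nanmedian ignores them, matching the requirement to discard zeros; this is an approximation of, not a replacement for, the exact 3D median):
import numpy as np
from scipy import ndimage
zsize, ysize, xsize = 11, 200, 300
w_size = 5
rng = np.random.default_rng(0)
m_in = rng.random((zsize, ysize, xsize))
m_in[m_in < 0.3] = 0                                 # some zeros to be ignored
masked = np.where(m_in == 0, np.nan, m_in)
z_med = np.nan_to_num(np.nanmedian(masked, axis=0))  # median over z, zeros ignored
m_out = ndimage.median_filter(z_med, size=w_size)    # then a 2D median filter in (x, y)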
I tried to use numpy.random.multivariate_normal to draw random samples of some 30,000+ variables, but it always consumed all of my memory (32 GB) and then terminated. Actually, the correlation is spherical and every variable is correlated with only about 2,500 other variables. Is there a way to specify the spherical covariance matrix rather than the full covariance matrix, or any other way to reduce the memory usage?
My code is like this:
import numpy
cm = []  # covariance matrix
for i in range(width*height):
    cm.append([])
    for j in range(width*height):
        cm[i].append(corr_calc())  # corr is inversely proportional to the distance
mean = [vth]*(width*height)
cache_vth = numpy.random.multivariate_normal(mean, cm)
If your correlation is spherical, that is the same as saying that the value along each dimension is uncorrelated with the other dimensions, and that the variance along every dimension is the same. You don't need to build the covariance matrix at all: drawing one sample from your 30,000-D multivariate normal is the same as drawing 30,000 samples from a 1-D normal. That is, instead of doing:
n = 30000
mu= 0
corr = 1
cm = np.eye(n) * corr
mean = np.ones((n,)) * mu
np.random.multivariate_normal(mean, cm)
That fails when trying to build the cm array; try the following instead:
n = 30000
mu = 0
corr = 1
>>> np.random.normal(mu, corr, size=n)
array([ 0.88433649, -0.55460098, -0.74259886, ..., 0.66459841,
0.71225572, 1.04012445])
If you want more than one random sample, say 3, try
>>> np.random.normal(mu, corr, size=(3, n))
array([[-0.97458499, 0.05072532, -0.0759601 , ..., -0.31849315,
-2.17552787, -0.36884723],
[ 1.5116701 , 2.53383547, 1.99921923, ..., -1.2769304 ,
0.36912488, 0.3024549 ],
[-1.12615267, 0.78125589, 0.67133243, ..., -0.45441239,
-1.21083007, 1.45696714]])
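Translated back to the variables in the question (vth, width, height), the whole loop that builds cm collapses to one call; the 1.0 here stands in for whatever common standard deviation you assume for every variable:
cache_vth = np.random.normal(vth, 1.0, size=width*height)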