Simple moving average 2D array python

I am trying to compute a simple moving average for each row of a 2D array. Each row is a separate data set, so I can't just compute the SMA over the whole array; I need to do it separately for each row. I have tried a for loop, but it takes the window over rows rather than over individual values.
The equation I am using to compute the SMA is: (a1+a2+...+an)/n
This is the code I have so far:
import numpy as np
#make amplitude array
amplitude=[0,1,2,3, 5.5, 6,5,2,2, 4, 2,3,1,6.5,5,7,1,2,2,3,8,4,9,2,3,4,8,4,9,3]
#split array up into a line for each sample
traceno=5 #number of traces in file
samplesno=6 #number of samples in each trace. This won't change.
amplitude_split=np.array(amplitude, dtype=int).reshape((traceno,samplesno))
#define window to average over:
window_size=3
#doesn't work for values that come before the window size. i.e. index 2 would not have enough values to divide by 3
#define limits:
lowerlimit=(window_size-1)
upperlimit=samplesno
i=window_size
for row in range(traceno):
    for n in range(samplesno):
        while lowerlimit<i<upperlimit:
            this_window=amplitude_split[(i-window_size):i]
            window_average=sum(this_window)/window_size
            i+=1
            print(window_average)
My expected output for this data set is:
[[1, 2, 3.33, 4.66]
[3, 2.66, 2.66, 3. ]
[4, 6, 4.33, 3.33]
[4.33, 5, 7, 5. ]
[5, 5.33, 7, 5.33]]
But I am getting:
[2. 3. 3. 4.66666667 2.66666667 3.66666667]
[2.66666667 3.66666667 5. 5. 4. 2.33333333]
[2. 4.33333333 7. 5. 6.33333333 2.33333333]

You can convolve each row with a kernel of ones of length window_size and then divide by window_size to get the average (no loop needed):
from scipy.signal import convolve2d
window_average = convolve2d(amplitude_split, np.ones((1, window_size)), 'valid') / window_size
Convolving with ones simply adds up the elements in each window.
output:
[[1. 2. 3.33333333 4.66666667]
[3. 2.66666667 2.66666667 3. ]
[4. 6. 4.33333333 3.33333333]
[4.33333333 5. 7. 5. ]
[5. 5.33333333 7. 5.33333333]]
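As a cross-check on the idea that convolving with ones sums each window, the same row-wise means can be sketched with NumPy alone. This assumes NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view is available; the helper name row_sma is made up for illustration and is not part of the answer above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def row_sma(data, window_size):
    """Simple moving average along each row (hypothetical helper)."""
    # windows has shape (rows, cols - window_size + 1, window_size)
    windows = sliding_window_view(data, window_size, axis=1)
    # Averaging over the last axis gives one mean per full window
    return windows.mean(axis=-1)

amplitude = [0, 1, 2, 3, 5.5, 6, 5, 2, 2, 4, 2, 3, 1, 6.5, 5,
             7, 1, 2, 2, 3, 8, 4, 9, 2, 3, 4, 8, 4, 9, 3]
# dtype=int truncates 5.5 and 6.5, as in the question
amplitude_split = np.array(amplitude, dtype=int).reshape((5, 6))
sma = row_sma(amplitude_split, 3)
print(sma.shape)  # (5, 4)
```

Because the windows are a view on the input, no per-row copies are made until the mean is taken.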

That should be easy to compute with np.correlate, using a vector np.ones(window_size) / window_size, but unfortunately that function does not seem to be able to broadcast the correlation operation. So here is another simple way to compute that with np.cumsum:
import numpy as np
amplitude = [ 0, 1, 2, 3, 5.5, 6,
5, 2, 2, 4, 2, 3,
1, 6.5, 5, 7, 1, 2,
2, 3, 8, 4, 9, 2,
3, 4, 8, 4, 9, 3]
traceno = 5
samplesno = 6
amplitude_split = np.array(amplitude, dtype=int).reshape((traceno, samplesno))
window_size = 3
# Scale down by window size
a = amplitude_split * (1.0 / window_size)
# Cumsum across columns
b = np.cumsum(a, axis=1)
# Add an initial column of zeros
c = np.pad(b, [(0, 0), (1, 0)])
# Take difference to get means
result = c[:, window_size:] - c[:, :-window_size]
print(result)
# [[1. 2. 3.33333333 4.66666667]
# [3. 2.66666667 2.66666667 3. ]
# [4. 6. 4.33333333 3.33333333]
# [4.33333333 5. 7. 5. ]
# [5. 5.33333333 7. 5.33333333]]
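For comparison, np.correlate can still be used if it is applied to one row at a time. The sketch below does that with np.apply_along_axis on a two-row sample taken from the question's data; it loops over rows internally, so it is probably slower than the cumsum approach on large arrays.

```python
import numpy as np

window_size = 3
# Averaging kernel: each of the window_size entries weighs 1/window_size
kernel = np.ones(window_size) / window_size
data = np.array([[0, 1, 2, 3, 5, 6],
                 [5, 2, 2, 4, 2, 3]])
# Apply the 1-D correlation to each row; 'valid' keeps only full windows
row_means = np.apply_along_axis(np.correlate, 1, data, kernel, mode="valid")
print(row_means.shape)  # (2, 4)
```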

Related

How to do an absolute sorting of columns in a matrix (numpy)

So I have a matrix:
a = np.array([[7,-1,0,5],
[2,5.2,4,2],
[3,-2,1,4]])
which I would like to sort in ascending order of absolute value within each column. I tried np.sort(abs(a)) and sorted(a, key=abs); sorted is probably the right one, but I do not know how to use it for columns. I wish to get
a = np.array([[2,-1,0,2],
[3,-2,1,4],
[7,5.2,4,5]])
Try argsort on axis=0 then take_along_axis to apply the order to a:
import numpy as np
a = np.array([[7, -1, 0, 5],
[2, 5.2, 4, 2],
[3, -2, 1, 4]])
s = np.argsort(abs(a), axis=0)
a = np.take_along_axis(a, s, axis=0)
print(a)
a:
[[ 2. -1. 0. 2. ]
[ 3. -2. 1. 4. ]
[ 7. 5.2 4. 5. ]]
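The same two calls generalize to any axis. The following is a minimal sketch that wraps them in a helper; the name sort_by_abs is hypothetical and chosen only for this example.

```python
import numpy as np

def sort_by_abs(a, axis=0):
    """Sort along `axis` by absolute value, keeping signs (hypothetical helper)."""
    # Ordering is computed on |a| but applied to the signed values
    order = np.argsort(np.abs(a), axis=axis)
    return np.take_along_axis(a, order, axis=axis)

a = np.array([[7, -1, 0, 5],
              [2, 5.2, 4, 2],
              [3, -2, 1, 4]])
by_column = sort_by_abs(a, axis=0)  # each column sorted independently
by_row = sort_by_abs(a, axis=1)     # each row sorted independently
print(by_column)
```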

How to get a mapping from high to low resolution ndarray in numpy

I need to make a function that takes in a vector of indices into a high-resolution array, e.g. hr, and outputs the corresponding indices when the array is sampled at low resolution, lr.
My thoughts were to create a translation matrix as follows:
In the following matrix, whose high resolution is (6, 12), and whose low resolution is (2, 4)
If the input vector is
v = [0, 1, 4, 24, 36, 42]
I would achieve my translation as
w = m[v] which I would expect to output [0,0,1,0,4,6]
Questions:
Is this the right way to go?
If so, how can I create that m ndarray in numpy?
Also, if there is a better name for this question, please let me know so I could change it.
Space-efficient way:
import numpy as np
hires = np.array((6, 12))
lowres = np.array((2,4))
h, w = hires // lowres
m = np.arange(np.prod(lowres)).reshape(lowres)
print(m)
# [[0 1 2 3]
# [4 5 6 7]]
v = [0, 1, 4, 24, 36, 42]
i, j = np.unravel_index(v, hires)
w = m[i // h, j // w]
print(w)
# [0 0 1 0 4 6]
Space-inefficient way:
import numpy as np
hires = np.array((6, 12))
lowres = np.array((2,4))
h, w = hires // lowres
# DON'T DO THIS. INEFFICIENT
m = np.kron(np.arange(np.prod(lowres)).reshape(lowres), np.ones((h, w)))
print(m)
# [[0. 0. 0. 1. 1. 1. 2. 2. 2. 3. 3. 3.]
# [0. 0. 0. 1. 1. 1. 2. 2. 2. 3. 3. 3.]
# [0. 0. 0. 1. 1. 1. 2. 2. 2. 3. 3. 3.]
# [4. 4. 4. 5. 5. 5. 6. 6. 6. 7. 7. 7.]
# [4. 4. 4. 5. 5. 5. 6. 6. 6. 7. 7. 7.]
# [4. 4. 4. 5. 5. 5. 6. 6. 6. 7. 7. 7.]]
v = [0, 1, 4, 24, 36, 42]
w = m[np.unravel_index(v, hires)]
print(w)
# [0. 0. 1. 0. 4. 6.]
The main idea here is to use np.unravel_index to convert a "flat index" into a tuple of coordinates given the shape of the array you intend to index into.
For example,
In [446]: np.unravel_index([0, 1, 4, 24, 36, 42], (6, 12))
Out[446]: (array([0, 0, 0, 2, 3, 3]), array([0, 1, 4, 0, 0, 6]))
It returns two indexing arrays which together give the coordinates of the 0th, 1st, 4th, etc. "flattened" elements in an array of shape (6, 12).
The space-inefficient method constructs the big m array and then finds w by indexing m with those coordinates: w = m[np.unravel_index(v, hires)].
The more space-efficient method simply integer-divides the coordinates by the block size (in this case, 3-by-3) to generate low-resolution coordinates.
This avoids the need to generate the big matrix m. We can instead use a smaller matrix
In [447]: m = np.arange(np.prod(lowres)).reshape(lowres); m
Out[447]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
and index into that: w = m[i // h, j // w].
You might also be interested in np.ravel_multi_index, which is the inverse of np.unravel_index:
In [451]: np.ravel_multi_index((np.array([0, 0, 0, 2, 3, 3]), np.array([0, 1, 4, 0, 0, 6])), (6, 12))
Out[451]: array([ 0, 1, 4, 24, 36, 42])
It converts the coordinate arrays, i and j, back into v.
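Putting np.unravel_index and np.ravel_multi_index together, the space-efficient mapping can be sketched as one function that never builds m at all. The name hires_to_lowres is hypothetical, and the sketch assumes each high-resolution dimension is an exact multiple of the corresponding low-resolution one.

```python
import numpy as np

def hires_to_lowres(v, hires, lowres):
    """Map flat high-res indices to flat low-res indices (hypothetical helper)."""
    hires = np.asarray(hires)
    lowres = np.asarray(lowres)
    # Block size along each axis, e.g. (3, 3) for (6, 12) -> (2, 4)
    block = hires // lowres
    # Flat index -> per-axis coordinates in the high-res grid
    coords = np.unravel_index(v, tuple(hires))
    # Integer division maps each coordinate to its low-res block
    low_coords = tuple(c // b for c, b in zip(coords, block))
    # Per-axis low-res coordinates -> flat low-res index
    return np.ravel_multi_index(low_coords, tuple(lowres))

w = hires_to_lowres([0, 1, 4, 24, 36, 42], (6, 12), (2, 4))
print(w)  # [0 0 1 0 4 6]
```

Since nothing here is specific to two dimensions, the same function should work for shapes of any rank.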

Interpolate 2D matrix along columns using Python

I am trying to interpolate a 2D numpy matrix with the dimensions (5, 3) to a matrix with the dimensions (7, 3) along axis 0 (that is, down each column). Obviously, the wrong approach would be to randomly insert rows anywhere between the original matrix, see the following example:
Source:
[[0, 1, 1]
[0, 2, 0]
[0, 3, 1]
[0, 4, 0]
[0, 5, 1]]
Target (terrible interpolation -> not wanted!):
[[0, 1, 1]
[0, 1.5, 0.5]
[0, 2, 0]
[0, 3, 1]
[0, 3.5, 0.5]
[0, 4, 0]
[0, 5, 1]]
The correct approach would be to take every row into account and interpolate between all of them to expand the source matrix to a (7, 3) matrix. I am aware of the scipy.interpolate.interp1d or scipy.interpolate.interp2d methods, but could not get it to work with other Stack Overflow posts or websites. I hope to receive any type of tips or tricks.
Update #1: The expected values should be equally spaced.
Update #2:
What I want to do is basically use the separate columns of the original matrix, expand the length of the column to 7 and interpolate between the values of the original column. See the following example:
Source:
[[0, 1, 1]
[0, 2, 0]
[0, 3, 1]
[0, 4, 0]
[0, 5, 1]]
Split into 3 separate Columns:
[0 [1 [1
0 2 0
0 3 1
0 4 0
0] 5] 1]
Expand length to 7 and interpolate between them, example for second column:
[1
1.66
2.33
3
3.66
4.33
5]
It seems like each column can be treated completely independently, but for each column you need to define essentially an "x" coordinate so that you can fit some function "f(x)" from which you generate your output matrix.
Unless the rows in your matrix are associated with some other datastructure (e.g. a vector of timestamps), an obvious set of x values is just the row-number:
x = numpy.arange(0, Source.shape[0])
You can then construct an interpolating function:
fit = scipy.interpolate.interp1d(x, Source, axis=0)
and use that to construct your output matrix:
Target = fit(numpy.linspace(0, Source.shape[0]-1, 7))
which produces:
array([[ 0. , 1. , 1. ],
[ 0. , 1.66666667, 0.33333333],
[ 0. , 2.33333333, 0.33333333],
[ 0. , 3. , 1. ],
[ 0. , 3.66666667, 0.33333333],
[ 0. , 4.33333333, 0.33333333],
[ 0. , 5. , 1. ]])
By default, scipy.interpolate.interp1d uses piecewise-linear interpolation. There are many more exotic options within scipy.interpolate, based on higher order polynomials, etc. Interpolation is a big topic in itself, and unless the rows of your matrix have some particular properties (e.g. being regular samples of a signal with a known frequency range), there may be no "truly correct" way of interpolating. So, to some extent, the choice of interpolation scheme will be somewhat arbitrary.
You can do this as follows:
from scipy.interpolate import interp1d
import numpy as np
a = np.array([[0, 1, 1],
[0, 2, 0],
[0, 3, 1],
[0, 4, 0],
[0, 5, 1]])
x = np.array(range(a.shape[0]))
# define new x range, we need 7 equally spaced values
xnew = np.linspace(x.min(), x.max(), 7)
# apply the interpolation to each column
f = interp1d(x, a, axis=0)
# get final result
print(f(xnew))
This will print
[[ 0. 1. 1. ]
[ 0. 1.66666667 0.33333333]
[ 0. 2.33333333 0.33333333]
[ 0. 3. 1. ]
[ 0. 3.66666667 0.33333333]
[ 0. 4.33333333 0.33333333]
[ 0. 5. 1. ]]
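If SciPy is not available, the same piecewise-linear result can be sketched with NumPy only. np.interp works on 1-D data, so this version interpolates each column separately and stacks the results; the variable name target is chosen for this example.

```python
import numpy as np

a = np.array([[0, 1, 1],
              [0, 2, 0],
              [0, 3, 1],
              [0, 4, 0],
              [0, 5, 1]])
# Use the row number as the x coordinate of each sample
x = np.arange(a.shape[0])
# 7 equally spaced positions spanning the original rows
xnew = np.linspace(x.min(), x.max(), 7)
# np.interp is 1-D only, so interpolate each column and stack the results
target = np.column_stack([np.interp(xnew, x, a[:, k]) for k in range(a.shape[1])])
print(target.shape)  # (7, 3)
```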

Find numpy array coordinates of neighboring maximum

I used the accepted answer in this question to obtain local maxima in a numpy array of 2 or more dimensions so I could assign labels to them. Now I would like to also assign these labels to neighboring cells in the array, depending on gradient – i.e. a cell gets the same label as the neighboring cell with the highest value. This way I can iteratively assign labels to my entire array.
Assume I have an array A like
>>> A = np.array([[ 1. , 2. , 2.2, 3.5],
[ 2.1, 2.4, 3. , 3.3],
[ 1. , 3. , 3.2, 3. ],
[ 2. , 4.1, 4. , 2. ]])
Applying the maximum_filter I get
>>> scipy.ndimage.filters.maximum_filter(A, size=3)
array([[ 2.4, 3. , 3.5, 3.5],
[ 3. , 3.2, 3.5, 3.5],
[ 4.1, 4.1, 4.1, 4. ],
[ 4.1, 4.1, 4.1, 4. ]])
Now, for every cell in this array I would like to have the coordinates of the maximum found by the filter, i.e.
array([[[1,1],[1,2],[0,3],[0,3]],
[[2,1],[2,2],[0,3],[0,3]],
[[3,1],[3,1],[3,1],[3,2]],
[[3,1],[3,1],[3,1],[3,2]]])
I would then use these coordinates to assign my labels iteratively.
I can do it for two dimensions using loops, ignoring borders
highest_neighbor_coordinates = np.array([[(argmax2D(A[i-1:i+2, j-1:j+2])+np.array([i-1, j-1])) for j in range(1, A.shape[1]-1)] for i in range(1, A.shape[0]-1)])
but after seeing the many filter functions in scipy.ndimage I was hoping there would be a more elegant and extensible (to >=3 dimensions) solution.
We can use pad with reflected elements to simulate the max-filter operation and get sliding windows on it with scikit-image's view_as_windows, compute the flattened argmax indices, offset those with ranged values to translate onto global scale -
import numpy as np
from skimage.util import view_as_windows as viewW

def window_argmax_global2D(A, size):
    hsize = (size-1)//2 # expects size as odd number
    m,n = A.shape
    A1 = np.pad(A, (hsize,hsize), mode='reflect')
    idx = viewW(A1, (size,size)).reshape(-1,size**2).argmax(-1).reshape(m,n)
    r,c = np.unravel_index(idx, (size,size))
    rows = np.abs(r + np.arange(-hsize,m-hsize)[:,None])
    cols = np.abs(c + np.arange(-hsize,n-hsize))
    return rows, cols
Sample run -
In [201]: A
Out[201]:
array([[1. , 2. , 2.2, 3.5],
[2.1, 2.4, 3. , 3.3],
[1. , 3. , 3.2, 3. ],
[2. , 4.1, 4. , 2. ]])
In [202]: rows, cols = window_argmax_global2D(A, size=3)
In [203]: rows
Out[203]:
array([[1, 1, 0, 0],
[2, 2, 0, 0],
[3, 3, 3, 3],
[3, 3, 3, 3]])
In [204]: cols
Out[204]:
array([[1, 2, 3, 3],
[1, 2, 3, 3],
[1, 1, 1, 2],
[1, 1, 1, 2]])
Extending to n-dim
We would use np.ogrid for this extension part :
def window_argmax_global(A, size):
    hsize = (size-1)//2 # expects size as odd number
    shp = A.shape
    N = A.ndim
    A1 = np.pad(A, (hsize,hsize), mode='reflect')
    idx = viewW(A1, ([size]*N)).reshape(-1,size**N).argmax(-1).reshape(shp)
    offsets = np.ogrid[tuple(map(slice, shp))]
    out = np.unravel_index(idx, ([size]*N))
    return [np.abs(i+j-hsize) for i,j in zip(out,offsets)]
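For readers without scikit-image, the same steps can be sketched with NumPy's own sliding_window_view in place of view_as_windows. This assumes NumPy 1.20 or later and an odd size; the function name window_argmax_global_np is hypothetical, and only the 2D sample from the question has been checked here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_argmax_global_np(A, size):
    """Coordinates of each window's maximum (hypothetical NumPy-only variant)."""
    hsize = (size - 1) // 2  # expects size as an odd number
    N = A.ndim
    # Reflect-pad every axis so border cells also get full windows
    A1 = np.pad(A, hsize, mode="reflect")
    # windows has shape A.shape + (size,) * N
    windows = sliding_window_view(A1, (size,) * N)
    # Flattened argmax inside each window
    idx = windows.reshape(A.shape + (-1,)).argmax(-1)
    # Local window coordinates, one array per axis
    local = np.unravel_index(idx, (size,) * N)
    # Shift to global coordinates; abs folds negative (reflected) indices back
    offsets = np.ogrid[tuple(map(slice, A.shape))]
    return [np.abs(l + o - hsize) for l, o in zip(local, offsets)]

A = np.array([[1.0, 2.0, 2.2, 3.5],
              [2.1, 2.4, 3.0, 3.3],
              [1.0, 3.0, 3.2, 3.0],
              [2.0, 4.1, 4.0, 2.0]])
rows, cols = window_argmax_global_np(A, size=3)
print(A[rows, cols])  # the value of the maximum found for each cell
```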

numpy: how interpolate between two arrays for various timesteps?

I'm looking for a way to do a simple linear interpolation between two numpy arrays that represent a start and endpoint in time.
The two arrays have the same length:
fst = np.random.random_integers(5, size=(10.))
>>> array([4, 4, 1, 3, 1, 4, 3, 2, 5, 2])
snd = np.random.random_integers(5, size=(10.))
>>> array([1, 1, 3, 4, 1, 5, 5, 5, 4, 3])
Between my start and endpoint there are 3 timesteps. How can I interpolate between fst and snd? I want to be able, taking the first entry of fst and snd as an example, to retrieve the value of each timestep like
np.interp(1, [1,5], [4,1])
np.interp(2, [1,5], [4,1])
...
# that is
np.interp([1,2,3,4,5], [1,5], [4,1])
>>> array([ 4. , 3.25, 2.5 , 1.75, 1. ])
But then not just for the first entry, but over the whole array.
Obviously, this won't do it:
np.interp(1, [1,5], [fst,snd])
Well I know I get there in a loop, e.g.
[np.interp(2, [1,5], [item,snd[idx]]) for idx,item in enumerate(fst)]
>>> [3.25, 3.25, 1.5, 3.25, 1.0, 4.25, 3.5, 2.75, 4.75, 2.25]
but I believe that when you are looping over numpy arrays you are doing something fundamentally wrong.
The facilities in scipy.interpolate.interp1d allow this to be done quite easily if you form your samples into a 2D matrix. In your case, you can construct a 2xN array, and construct an interpolation function that operates down the columns:
from scipy.interpolate import interp1d
fst = np.array([4, 4, 1, 3, 1, 4, 3, 2, 5, 2])
snd = np.array([1, 1, 3, 4, 1, 5, 5, 5, 4, 3])
linfit = interp1d([1,5], np.vstack([fst, snd]), axis=0)
You can then generate an interpolated vector at any time of interest. For example linfit(2) produces:
array([ 3.25, 3.25, 1.5 , 3.25, 1. , 4.25, 3.5 , 2.75, 4.75, 2.25])
or you can invoke linfit() with a vector of time values, e.g. linfit([1,2,3]) gives:
array([[ 4. , 4. , 1. , 3. , 1. , 4. , 3. , 2. , 5. , 2. ],
[ 3.25, 3.25, 1.5 , 3.25, 1. , 4.25, 3.5 , 2.75, 4.75, 2.25],
[ 2.5 , 2.5 , 2. , 3.5 , 1. , 4.5 , 4. , 3.5 , 4.5 , 2.5 ]])
If you're only doing linear interpolation, you could also just do something like:
((5-t)/(5-1)) * fst + ((t-1)/(5-1)) * snd
to directly compute the interpolated vector at any time t.
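With broadcasting, that formula can produce every timestep at once. The sketch below uses a column vector of times so that each row of the result is one timestep; the names t0, t1 and steps are chosen for this example.

```python
import numpy as np

fst = np.array([4, 4, 1, 3, 1, 4, 3, 2, 5, 2])
snd = np.array([1, 1, 3, 4, 1, 5, 5, 5, 4, 3])
t0, t1 = 1, 5  # times of the start and end arrays
# Column vector of times, shape (5, 1), so it broadcasts against fst and snd
t = np.arange(t0, t1 + 1)[:, None]
# Weighted average of the two arrays; weights sum to 1 at every t
steps = ((t1 - t) / (t1 - t0)) * fst + ((t - t0) / (t1 - t0)) * snd
print(steps.shape)  # (5, 10)
```

Row 0 reproduces fst and the last row reproduces snd, with the intermediate timesteps in between.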
