I used the accepted answer in this question to obtain local maxima in a numpy array of 2 or more dimensions so I could assign labels to them. Now I would like to also assign these labels to neighboring cells in the array, depending on gradient – i.e. a cell gets the same label as the neighboring cell with the highest value. This way I can iteratively assign labels to my entire array.
Assume I have an array A like
>>> A = np.array([[ 1. , 2. , 2.2, 3.5],
                  [ 2.1, 2.4, 3. , 3.3],
                  [ 1. , 3. , 3.2, 3. ],
                  [ 2. , 4.1, 4. , 2. ]])
Applying the maximum_filter I get
>>> scipy.ndimage.filters.maximum_filter(A, size=3)
array([[ 2.4, 3. , 3.5, 3.5],
       [ 3. , 3.2, 3.5, 3.5],
       [ 4.1, 4.1, 4.1, 4. ],
       [ 4.1, 4.1, 4.1, 4. ]])
Now, for every cell in this array I would like to have the coordinates of the maximum found by the filter, i.e.
array([[[1,1],[1,2],[0,3],[0,3]],
       [[2,1],[2,2],[0,3],[0,3]],
       [[3,1],[3,1],[3,1],[3,2]],
       [[3,1],[3,1],[3,1],[3,2]]])
I would then use these coordinates to assign my labels iteratively.
I can do it for two dimensions using loops, ignoring borders:
highest_neighbor_coordinates = np.array([[(argmax2D(A[i-1:i+2, j-1:j+2])+np.array([i-1, j-1])) for j in range(1, A.shape[1]-1)] for i in range(1, A.shape[0]-1)])
but after seeing the many filter functions in scipy.ndimage I was hoping there would be a more elegant and extensible (to >=3 dimensions) solution.
We can use np.pad with reflected elements to simulate the max-filter operation, get sliding windows on the padded array with scikit-image's view_as_windows, compute the flattened argmax indices, and then offset those with ranged values to translate them onto the global scale -
import numpy as np
from skimage.util import view_as_windows as viewW

def window_argmax_global2D(A, size):
    hsize = (size-1)//2 # expects size as odd number
    m,n = A.shape
    A1 = np.pad(A, (hsize,hsize), mode='reflect')
    idx = viewW(A1, (size,size)).reshape(-1,size**2).argmax(-1).reshape(m,n)
    r,c = np.unravel_index(idx, (size,size))
    rows = np.abs(r + np.arange(-hsize,m-hsize)[:,None]) # fold reflected (negative) indices back
    cols = np.abs(c + np.arange(-hsize,n-hsize))
    return rows, cols
Sample run -
In [201]: A
Out[201]:
array([[1. , 2. , 2.2, 3.5],
       [2.1, 2.4, 3. , 3.3],
       [1. , 3. , 3.2, 3. ],
       [2. , 4.1, 4. , 2. ]])

In [202]: rows, cols = window_argmax_global2D(A, size=3)

In [203]: rows
Out[203]:
array([[1, 1, 0, 0],
       [2, 2, 0, 0],
       [3, 3, 3, 3],
       [3, 3, 3, 3]])

In [204]: cols
Out[204]:
array([[1, 2, 3, 3],
       [1, 2, 3, 3],
       [1, 1, 1, 2],
       [1, 1, 1, 2]])
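As a quick check (my addition, not part of the original answer), indexing A with these coordinates should reproduce the maximum filter's output:

import scipy.ndimage

# Fancy-indexing A at the recovered (row, col) pairs matches the filter.
assert np.array_equal(A[rows, cols], scipy.ndimage.maximum_filter(A, size=3))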
Extending to n-dim
We would use np.ogrid for this extension part:
def window_argmax_global(A, size):
    hsize = (size-1)//2 # expects size as odd number
    shp = A.shape
    N = A.ndim
    A1 = np.pad(A, (hsize,hsize), mode='reflect')
    idx = viewW(A1, ([size]*N)).reshape(-1,size**N).argmax(-1).reshape(shp)
    offsets = np.ogrid[tuple(map(slice, shp))]
    out = np.unravel_index(idx, ([size]*N))
    return [np.abs(i+j-hsize) for i,j in zip(out,offsets)]
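A minimal usage sketch (my addition, assuming the definitions above): the n-dim version should agree with the 2D one on the sample array, and extends directly to three dimensions -

# Sanity check against the 2D results (rows, cols) from the sample run above.
rows2, cols2 = window_argmax_global(A, size=3)
assert (rows == rows2).all() and (cols == cols2).all()

# 3D example: one index array per axis comes back.
B = np.random.rand(4, 5, 6)
ri, rj, rk = window_argmax_global(B, size=3)
print(B[ri, rj, rk])  # value of the window maximum at every position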
So I have a matrix:
a = np.array([[7, -1, 0, 5],
              [2, 5.2, 4, 2],
              [3, -2, 1, 4]])
which I would like to sort column-wise by ascending absolute value. I tried np.sort(abs(a)) and sorted(a, key=abs); sorted is probably the right one, but I don't know how to use it for columns. I wish to get
a = np.array([[2, -1, 0, 2],
              [3, -2, 1, 4],
              [7, 5.2, 4, 5]])
Try argsort on axis=0 then take_along_axis to apply the order to a:
import numpy as np

a = np.array([[7, -1, 0, 5],
              [2, 5.2, 4, 2],
              [3, -2, 1, 4]])

s = np.argsort(abs(a), axis=0)       # per-column ordering by absolute value
a = np.take_along_axis(a, s, axis=0) # apply that ordering within each column
print(a)
a:
[[ 2. -1. 0. 2. ]
[ 3. -2. 1. 4. ]
[ 7. 5.2 4. 5. ]]
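On NumPy versions before 1.15, where np.take_along_axis is not available, an equivalent fancy-indexing form would be (my sketch):

# Pick row s[i, j] from column j; same effect as take_along_axis on axis=0.
a = a[s, np.arange(a.shape[1])]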
I am trying to compute a simple moving average for each line of a 2D array. The data in each row is a separate data set, so I can't just compute the SMA over the whole array; I need to do it separately for each line. I have tried a for loop, but it is taking the window as rows rather than individual values.
The equation I am using to compute the SMA is: (a1 + a2 + ... + an) / n
This is the code I have so far:
import numpy as np

# make amplitude array
amplitude = [0, 1, 2, 3, 5.5, 6, 5, 2, 2, 4, 2, 3, 1, 6.5, 5, 7, 1, 2, 2, 3, 8, 4, 9, 2, 3, 4, 8, 4, 9, 3]

# split array up into a line for each sample
traceno = 5   # number of traces in file
samplesno = 6 # number of samples in each trace. This won't change.
amplitude_split = np.array(amplitude, dtype=int).reshape((traceno, samplesno))

# define window to average over:
window_size = 3

# doesn't work for values that come before the window size,
# i.e. index 2 would not have enough values to divide by 3
# define limits:
lowerlimit = (window_size-1)
upperlimit = samplesno
i = window_size

for row in range(traceno):
    for n in range(samplesno):
        while lowerlimit < i < upperlimit:
            this_window = amplitude_split[(i-window_size):i]
            window_average = sum(this_window)/window_size
            i += 1
            print(window_average)
My expected output for this data set is:
[[1,    2,    3.33, 4.66]
 [3,    2.66, 2.66, 3.  ]
 [4,    6,    4.33, 3.33]
 [4.33, 5,    7,    5.  ]
 [5,    5.33, 7,    5.33]]
But I am getting:
[2.         3.         3.         4.66666667 2.66666667 3.66666667]
[2.66666667 3.66666667 5.         5.         4.         2.33333333]
[2.         4.33333333 7.         5.         6.33333333 2.33333333]
You can convolve with a kernel of ones of length window_size and then divide by window_size to get the average (no need for a loop):
from scipy.signal import convolve2d
window_average = convolve2d(amplitude_split, np.ones((1, window_size)), 'valid') / window_size
Convolving with ones simply adds up the elements in each window.
output:
[[1.         2.         3.33333333 4.66666667]
 [3.         2.66666667 2.66666667 3.        ]
 [4.         6.         4.33333333 3.33333333]
 [4.33333333 5.         7.         5.        ]
 [5.         5.33333333 7.         5.33333333]]
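The same result can also be had without scipy, a sketch assuming NumPy >= 1.20 (where sliding_window_view was added):

from numpy.lib.stride_tricks import sliding_window_view

# Length-window_size windows along each row, then average over the last axis.
window_average = sliding_window_view(amplitude_split, window_size, axis=1).mean(axis=-1)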
That should be easy to compute with np.correlate, using a vector np.ones(window_size) / window_size, but unfortunately that function does not seem to be able to broadcast the correlation operation. So here is another simple way to compute that with np.cumsum:
import numpy as np

amplitude = [0, 1, 2, 3, 5.5, 6,
             5, 2, 2, 4, 2, 3,
             1, 6.5, 5, 7, 1, 2,
             2, 3, 8, 4, 9, 2,
             3, 4, 8, 4, 9, 3]
traceno = 5
samplesno = 6
amplitude_split = np.array(amplitude, dtype=int).reshape((traceno, samplesno))
window_size = 3

# Scale down by window size
a = amplitude_split * (1.0 / window_size)
# Cumsum across columns
b = np.cumsum(a, axis=1)
# Add an initial column of zeros
c = np.pad(b, [(0, 0), (1, 0)])
# Take differences of offset columns to get the means
result = c[:, window_size:] - c[:, :-window_size]
print(result)
# [[1.         2.         3.33333333 4.66666667]
#  [3.         2.66666667 2.66666667 3.        ]
#  [4.         6.         4.33333333 3.33333333]
#  [4.33333333 5.         7.         5.        ]
#  [5.         5.33333333 7.         5.33333333]]
So for example I have an array that I want to sort by a column in ascending order, and it's easy to do for integers using 'sort()', 'np.arange()', or 'np.argsort()'.
However, what if my column consists of floats?
What would you recommend?
Edit:
I mean, I have something like:
a = array([[1.7, 2, 3],
           [4.5, 5, 6],
           [0.1, 0, 1]])
and I want to get this:
array([[0.1, 0, 1],
       [1.7, 2, 3],
       [4.5, 5, 6]])
So far with argsort() I get the following error:
TypeError: only integer scalar arrays can be converted to a scalar index
You can use standard Python's sorted (or sort for in-place sorting), no matter what is contained in the sequence. Just use a custom key, or (in Python 2) a custom compare function (cmp). For example, to sort a list of lists (a 2-d array) ascending by the 4th column:
>>> a = [[1.0,2.0,3.0,4.0], [4.0,3.0,2.0,1.0], [0,0,0,0]]
>>> from operator import itemgetter
>>> sorted(a, key=itemgetter(3))
[[0, 0, 0, 0], [4.0, 3.0, 2.0, 1.0], [1.0, 2.0, 3.0, 4.0]]
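On Python 3, where cmp is gone, a comparison function can still be used by wrapping it with functools.cmp_to_key (my addition):
>>> from functools import cmp_to_key
>>> sorted(a, key=cmp_to_key(lambda x, y: (x[3] > y[3]) - (x[3] < y[3])))
[[0, 0, 0, 0], [4.0, 3.0, 2.0, 1.0], [1.0, 2.0, 3.0, 4.0]]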
The standard way to do this in numpy is to specify the correct axis you want to sort on, by default it sorts on axis=-1:
>>> np.sort(a, axis=0)
array([[ 0.1, 0. , 1. ],
       [ 1.7, 2. , 3. ],
       [ 4.5, 5. , 6. ]])
Or inplace:
>>> a.sort(axis=0)
>>> a
array([[ 0.1, 0. , 1. ],
       [ 1.7, 2. , 3. ],
       [ 4.5, 5. , 6. ]])
To sort just on a specific column you can use argsort(), e.g. column 0:
>>> a[np.argsort(a[:,0])]
array([[ 0.1, 0. , 1. ],
       [ 1.7, 2. , 3. ],
       [ 4.5, 5. , 6. ]])
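And for descending order (my addition), just reverse the index array:
>>> a[np.argsort(a[:,0])[::-1]]
array([[ 4.5, 5. , 6. ],
       [ 1.7, 2. , 3. ],
       [ 0.1, 0. , 1. ]])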
I'm looking for a way to do a simple linear interpolation between two numpy arrays that represent a start and endpoint in time.
The two arrays have the same length:
fst = np.random.random_integers(5, size=10)
>>> array([4, 4, 1, 3, 1, 4, 3, 2, 5, 2])
snd = np.random.random_integers(5, size=10)
>>> array([1, 1, 3, 4, 1, 5, 5, 5, 4, 3])
Between my start and endpoint there are 3 timesteps. How can I interpolate between fst and snd? I want to be able, taking the first entry of fst and snd as an example, to retrieve the value of each timestep like
np.interp(1, [1,5], [4,1])
np.interp(2, [1,5], [4,1])
...
# that is
np.interp([1,2,3,4,5], [1,5], [4,1])
>>> array([ 4. , 3.25, 2.5 , 1.75, 1. ])
But then not just for the first entry, but over the whole array.
Obviously, this won't do it:
np.interp(1, [1,5], [fst,snd])
Well, I know I can get there with a loop, e.g.
[np.interp(2, [1,5], [item, snd[idx]]) for idx, item in enumerate(fst)]
>>> [3.25, 3.25, 1.5, 3.25, 1.0, 4.25, 3.5, 2.75, 4.75, 2.25]
but I believe that when you are looping over numpy arrays you are doing something fundamentally wrong.
The facilities in scipy.interpolate.interp1d allow this to be done quite easily if you form your samples into a 2D matrix. In your case, you can construct a 2xN array, and construct an interpolation function that operates down the columns:
from scipy.interpolate import interp1d
fst = np.array([4, 4, 1, 3, 1, 4, 3, 2, 5, 2])
snd = np.array([1, 1, 3, 4, 1, 5, 5, 5, 4, 3])
linfit = interp1d([1,5], np.vstack([fst, snd]), axis=0)
You can then generate an interpolated vector at any time of interest. For example linfit(2) produces:
array([ 3.25, 3.25, 1.5 , 3.25, 1. , 4.25, 3.5 , 2.75, 4.75, 2.25])
or you can invoke linfit() with a vector of time values, e.g. linfit([1,2,3]) gives:
array([[ 4.  , 4.  , 1.  , 3.  , 1.  , 4.  , 3.  , 2.  , 5.  , 2.  ],
       [ 3.25, 3.25, 1.5 , 3.25, 1.  , 4.25, 3.5 , 2.75, 4.75, 2.25],
       [ 2.5 , 2.5 , 2.  , 3.5 , 1.  , 4.5 , 4.  , 3.5 , 4.5 , 2.5 ]])
If you're only doing linear interpolation, you could also just do something like:
((5-t)/(5-1)) * fst + ((t-1)/(5-1)) * snd
to directly compute the interpolated vector at any time t.
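Broadcasting that same formula over a column of time values gives the whole interpolation table at once (a sketch, my addition):

import numpy as np

fst = np.array([4, 4, 1, 3, 1, 4, 3, 2, 5, 2])
snd = np.array([1, 1, 3, 4, 1, 5, 5, 5, 4, 3])

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])[:, None]  # times as a column for broadcasting
table = ((5 - t) / (5 - 1)) * fst + ((t - 1) / (5 - 1)) * snd
# Row k of table holds the interpolated vector at time t = k + 1.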
I have this delta function which has 3 cases: mask1, mask2, and, if neither of them is satisfied, delta = 0 (since res starts as np.zeros).
def delta(r, dr):
    res = np.zeros(r.shape)
    mask1 = (r >= 0.5*dr) & (r <= 1.5*dr)
    res[mask1] = (5 - 3*np.abs(r[mask1])/dr
                  - np.sqrt(-3*(1 - np.abs(r[mask1])/dr)**2 + 1)) / (6*dr)
    mask2 = np.logical_not(mask1) & (r <= 0.5*dr)
    res[mask2] = (1 + np.sqrt(-3*(r[mask2]/dr)**2 + 1)) / (3*dr)
    return res
Then I have this other function, where I call the former and construct an array, E:
def matrix_E(nk, X, Y, xhi, eta, dx, dy):
    rx = abs(X[np.newaxis,:] - xhi[:,np.newaxis])
    ry = abs(Y[np.newaxis,:] - eta[:,np.newaxis])
    deltx = delta(rx, dx)
    delty = delta(ry, dy)
    E = deltx*delty
    return E
The thing is that most of the elements of E belong to the third case of delta, 0. Most means about 99%. So I would like to have a sparse matrix instead of a dense one, and not store the zero elements, in order to save memory. Any ideas on how I could do it?
The normal way to create a sparse matrix is to construct three 1d arrays, with the nonzero values, and their i and j indexes. Then pass them to the coo_matrix function.
The coordinates don't have to be in order, so you could construct the arrays for the 2 nonzero mask cases and concatenate them.
Here's a sample construction using 2 masks:
In [105]: import numpy as np
In [106]: from scipy import sparse

In [107]: x=np.arange(5)
In [108]: i,j,data=[],[],[]
In [110]: mask1=x%2==0
In [111]: mask2=x%2!=0
In [112]: i.append(x[mask1])
In [113]: j.append((x*2)[mask1])
In [114]: i.append(x[mask2])
In [115]: j.append(x[mask2])
In [116]: i=np.concatenate(i)
In [117]: j=np.concatenate(j)
In [118]: i
Out[118]: array([0, 2, 4, 1, 3])
In [119]: j
Out[119]: array([0, 4, 8, 1, 3])
In [120]: M=sparse.coo_matrix((x,(i,j)))
In [121]: print(M)
  (0, 0)    0
  (2, 4)    1
  (4, 8)    2
  (1, 1)    3
  (3, 3)    4
In [122]: M.A
Out[122]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 3, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 4, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 2]])
A coo format stores those 3 arrays as is, but they get sorted and cleaned up when converted to other formats and printed.
I can work on adapting this to your case, but this may be enough to get you started.
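For instance, an adaptation to the delta function might look like this (my sketch, untested against the original data; it reuses the two mask expressions from the question):

from scipy import sparse
import numpy as np

def delta_sparse(r, dr):
    # Case 1 and case 2 masks, exactly as in the question's delta().
    mask1 = (r >= 0.5*dr) & (r <= 1.5*dr)
    mask2 = np.logical_not(mask1) & (r <= 0.5*dr)
    d1 = (5 - 3*np.abs(r[mask1])/dr
          - np.sqrt(-3*(1 - np.abs(r[mask1])/dr)**2 + 1)) / (6*dr)
    d2 = (1 + np.sqrt(-3*(r[mask2]/dr)**2 + 1)) / (3*dr)
    i1, j1 = np.where(mask1)
    i2, j2 = np.where(mask2)
    data = np.concatenate((d1, d2))
    i = np.concatenate((i1, i2))
    j = np.concatenate((j1, j2))
    return sparse.coo_matrix((data, (i, j)), shape=r.shape)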
It looks like X, Y, xhi, eta are 1d arrays; rx and ry are then 2d. delta returns a result the same shape as its input, and E = deltx*delty suggests that deltx and delty are the same shape (or at least broadcastable).
Since sparse matrices have a .multiply method for elementwise multiplication, we can focus on producing sparse delta matrices.
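The final combination could then be something like (my sketch; deltx_s and delty_s are hypothetical sparse versions of deltx and delty with matching shapes):

E = deltx_s.multiply(delty_s)  # elementwise product; the result stays sparse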
If you can afford the memory to make rx and a couple of masks, then you can also afford to make deltx (all the same size). Even though deltx has lots of zeros, it is probably fastest to make it dense.
But let's try to recast the delta calculation as a sparse build.
This looks like the essence of what you are doing in delta, at least with one mask:
start with a 2d array:
In [138]: r = np.arange(24).reshape(4,6)
In [139]: mask1 = (r>=8) & (r<=16)
In [140]: res1 = r[mask1]*0.2
In [141]: I,J = np.where(mask1)
the resulting vectors are:
In [142]: I
Out[142]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)
In [143]: J
Out[143]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
In [144]: res1
Out[144]: array([ 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. , 3.2])
Make a sparse matrix:
In [145]: M1=sparse.coo_matrix((res1,(I,J)), r.shape)
In [146]: M1.A
Out[146]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
       [ 0. , 0. , 1.6, 1.8, 2. , 2.2],
       [ 2.4, 2.6, 2.8, 3. , 3.2, 0. ],
       [ 0. , 0. , 0. , 0. , 0. , 0. ]])
I could make another sparse matrix with mask2, and add the two.
In [147]: mask2 = (r>=17) & (r<=22)
In [148]: res2 = r[mask2]*-0.4
In [149]: I,J = np.where(mask2)
In [150]: M2=sparse.coo_matrix((res2,(I,J)), r.shape)
In [151]: M2.A
Out[151]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
       [ 0. , 0. , 0. , 0. , 0. , 0. ],
       [ 0. , 0. , 0. , 0. , 0. , -6.8],
       [-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
...
In [153]: (M1+M2).A
Out[153]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
       [ 0. , 0. , 1.6, 1.8, 2. , 2.2],
       [ 2.4, 2.6, 2.8, 3. , 3.2, -6.8],
       [-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
Or I could concatenate res1 and res2, etc., and make one sparse matrix:
In [156]: I1,J1 = np.where(mask1)
In [157]: I2,J2 = np.where(mask2)
In [158]: res12=np.concatenate((res1,res2))
In [159]: I12=np.concatenate((I1,I2))
In [160]: J12=np.concatenate((J1,J2))
In [161]: M12=sparse.coo_matrix((res12,(I12,J12)), r.shape)
In [162]: M12.A
Out[162]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
       [ 0. , 0. , 1.6, 1.8, 2. , 2.2],
       [ 2.4, 2.6, 2.8, 3. , 3.2, -6.8],
       [-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
Here I chose the masks so the nonzero values don't overlap, but both methods work if they did. It's a deliberate design feature of the coo format that values for repeated indices are summed. It's a very handy feature when creating sparse matrices for finite element problems.
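A tiny demonstration of that summing behavior (my addition):

# Duplicate (row, col) pairs are summed when the coo matrix is used.
m = sparse.coo_matrix(([1.0, 2.0], ([0, 0], [0, 0])), shape=(1, 1))
print(m.toarray())   # [[ 3.]]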
I can also get index arrays by creating a sparse matrix from the mask:
In [179]: rmask1=sparse.coo_matrix(mask1)
In [180]: rmask1.row
Out[180]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)
In [181]: rmask1.col
Out[181]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
In [184]: sparse.coo_matrix((res1, (rmask1.row, rmask1.col)), rmask1.shape).A
Out[184]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
       [ 0. , 0. , 1.6, 1.8, 2. , 2.2],
       [ 2.4, 2.6, 2.8, 3. , 3.2, 0. ],
       [ 0. , 0. , 0. , 0. , 0. , 0. ]])
I can't, though, create a mask from a sparse version of r, as in (r>=8) & (r<=16). That kind of inequality test has not been implemented for sparse matrices. But that might not matter, since r is probably not sparse.