I have many (x, y) coordinates on a picture stimulus and I want to:

1. divide the entire area of the picture into several small cells,
2. give each cell a name (e.g., A, B, C, D, ...),
3. map each point to the corresponding cell.
For example:
import numpy as np
n = 5
x = np.linspace(0, 10, n)
y = np.linspace(0, 10, n)
xv, yv = np.meshgrid(x, y, indexing='xy')
np.array([xv, yv])
array([[[ 0. ,  2.5,  5. ,  7.5, 10. ],
        [ 0. ,  2.5,  5. ,  7.5, 10. ],
        [ 0. ,  2.5,  5. ,  7.5, 10. ],
        [ 0. ,  2.5,  5. ,  7.5, 10. ],
        [ 0. ,  2.5,  5. ,  7.5, 10. ]],

       [[ 0. ,  0. ,  0. ,  0. ,  0. ],
        [ 2.5,  2.5,  2.5,  2.5,  2.5],
        [ 5. ,  5. ,  5. ,  5. ,  5. ],
        [ 7.5,  7.5,  7.5,  7.5,  7.5],
        [10. , 10. , 10. , 10. , 10. ]]])
If I have a big list of points (x,y)
points = [(1,3), (2,4), (0.4, 0.8), (3.5, 7.9), ...]
What I want to get is a pandas DataFrame with one column for the coordinates and one column for the cell names. For example, if the above four points get these cell names:
location = ['A','K','B','F']
Then I can create a data frame:
pd.DataFrame({'points': points,'location':location})
I want to know how to get the corresponding cell names (i.e. location); constructing a data frame from there is easy. Because I have a lot of cells and a lot of coordinates, I was wondering what an efficient way to do this would be. The order of the cell names doesn't matter as long as each one is unique. If a point happens to sit on a cell boundary we can simply return np.nan.
If you divide an N×M picture into n×m cells like this (n=4, m=3):
A B C D
E F G H
I J K L
You can get the cell row and column using np.digitize, and from them look up the corresponding letter.
In [115]: N, M = (10, 10)
In [116]: n, m = (4, 3)
In [117]: x_boundaries = np.linspace(0, N, n+1)
In [118]: y_boundaries = np.linspace(0, M, m+1)
In [119]: letters = np.array([chr(65+i) for i in range(n*m)]).reshape((m, n))
In [120]: letters
Out[120]:
array([['A', 'B', 'C', 'D'],
       ['E', 'F', 'G', 'H'],
       ['I', 'J', 'K', 'L']], dtype='<U1')
In [121]: points = [(1,3), (2,4), (0.4, 0.8), (3.5, 7.9)]
In [122]: xs, ys = zip(*points)
In [123]: xs, ys
Out[123]: ((1, 2, 0.4, 3.5), (3, 4, 0.8, 7.9))
In [124]: cell_row = np.digitize(ys, y_boundaries)-1
In [125]: cell_column = np.digitize(xs, x_boundaries)-1
In [126]: cell_row, cell_column
Out[126]: (array([0, 1, 0, 2]), array([0, 0, 0, 1]))
In [127]: locations = letters[cell_row, cell_column]
In [128]: locations
Out[128]: array(['A', 'E', 'A', 'J'], dtype='<U1')
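The question also asks for np.nan when a point sits exactly on a cell boundary. np.digitize alone doesn't flag that, but a mask can. Here is a sketch building on the session above; it assumes an exact floating-point match is the right test for "on the boundary":

xs, ys = np.asarray(xs), np.asarray(ys)
# Clip so points on the outer edge don't index out of range.
cell_row = np.clip(np.digitize(ys, y_boundaries) - 1, 0, m - 1)
cell_column = np.clip(np.digitize(xs, x_boundaries) - 1, 0, n - 1)
locations = letters[cell_row, cell_column].astype(object)
# Points lying exactly on any grid line get np.nan, as requested.
on_boundary = np.isin(xs, x_boundaries) | np.isin(ys, y_boundaries)
locations[on_boundary] = np.nan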
I used the accepted answer in this question to obtain local maxima in a numpy array of 2 or more dimensions so I could assign labels to them. Now I would like to also assign these labels to neighboring cells in the array, depending on gradient – i.e. a cell gets the same label as the neighboring cell with the highest value. This way I can iteratively assign labels to my entire array.
Assume I have an array A like
>>> A = np.array([[ 1. ,  2. ,  2.2,  3.5],
...               [ 2.1,  2.4,  3. ,  3.3],
...               [ 1. ,  3. ,  3.2,  3. ],
...               [ 2. ,  4.1,  4. ,  2. ]])
Applying the maximum_filter I get
>>> scipy.ndimage.filters.maximum_filter(A, size=3)
array([[ 2.4,  3. ,  3.5,  3.5],
       [ 3. ,  3.2,  3.5,  3.5],
       [ 4.1,  4.1,  4.1,  4. ],
       [ 4.1,  4.1,  4.1,  4. ]])
Now, for every cell in this array I would like to have the coordinates of the maximum found by the filter, i.e.
array([[[1,1],[1,2],[0,3],[0,3]],
       [[2,1],[2,2],[0,3],[0,3]],
       [[3,1],[3,1],[3,1],[3,2]],
       [[3,1],[3,1],[3,1],[3,2]]])
I would then use these coordinates to assign my labels iteratively.
I can do it for two dimensions using loops, ignoring borders:
highest_neighbor_coordinates = np.array(
    [[argmax2D(A[i-1:i+2, j-1:j+2]) + np.array([i-1, j-1])
      for j in range(1, A.shape[1]-1)]
     for i in range(1, A.shape[0]-1)])
but after seeing the many filter functions in scipy.ndimage I was hoping there would be a more elegant and extensible (to >=3 dimensions) solution.
We can pad with reflected elements to simulate the max-filter operation, get sliding windows on the padded array with scikit-image's view_as_windows, compute the flattened argmax indices, and offset those with ranged values to translate them onto the global scale -
from skimage.util import view_as_windows as viewW
def window_argmax_global2D(A, size):
    hsize = (size-1)//2 # expects size as odd number
    m,n = A.shape
    A1 = np.pad(A, (hsize,hsize), mode='reflect')
    # argmax in each size x size window, as a flat index into the window
    idx = viewW(A1, (size,size)).reshape(-1,size**2).argmax(-1).reshape(m,n)
    r,c = np.unravel_index(idx, (size,size))
    # shift window-local coordinates to global ones; abs() folds indices
    # that land in the reflect-padding back into the array
    rows = np.abs(r + np.arange(-hsize,m-hsize)[:,None])
    cols = np.abs(c + np.arange(-hsize,n-hsize))
    return rows, cols
Sample run -
In [201]: A
Out[201]:
array([[1. , 2. , 2.2, 3.5],
       [2.1, 2.4, 3. , 3.3],
       [1. , 3. , 3.2, 3. ],
       [2. , 4.1, 4. , 2. ]])
In [202]: rows, cols = window_argmax_global2D(A, size=3)
In [203]: rows
Out[203]:
array([[1, 1, 0, 0],
       [2, 2, 0, 0],
       [3, 3, 3, 3],
       [3, 3, 3, 3]])
In [204]: cols
Out[204]:
array([[1, 2, 3, 3],
       [1, 2, 3, 3],
       [1, 1, 1, 2],
       [1, 1, 1, 2]])
Extending to n-dim
We would use np.ogrid for this extension part:
def window_argmax_global(A, size):
    hsize = (size-1)//2 # expects size as odd number
    shp = A.shape
    N = A.ndim
    A1 = np.pad(A, (hsize,hsize), mode='reflect')
    idx = viewW(A1, ([size]*N)).reshape(-1,size**N).argmax(-1).reshape(shp)
    # per-axis global offsets, broadcastable against the unraveled indices
    offsets = np.ogrid[tuple(map(slice, shp))]
    out = np.unravel_index(idx, ([size]*N))
    return [np.abs(i+j-hsize) for i,j in zip(out,offsets)]
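A quick way to exercise the n-dim version (an illustrative sketch; the array here is made up):

A3 = np.random.rand(4, 5, 6)
rows, cols, depths = window_argmax_global(A3, size=3)
# rows[i,j,k], cols[i,j,k], depths[i,j,k] are the coordinates of the maximum
# in the 3x3x3 neighborhood (reflect-padded) of cell (i,j,k), so
# A3[rows, cols, depths] should match scipy.ndimage.maximum_filter(A3, size=3).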
I have an nd array that looks as follows:
[[ 0.          1.73205081  6.40312424  7.21110255  2.44948974]
 [ 1.73205081  0.          5.09901951  5.91607978  1.        ]
 [ 6.40312424  5.09901951  0.          1.          4.35889894]
 [ 7.21110255  5.91607978  1.          0.          5.09901951]
 [ 2.44948974  1.          4.35889894  5.09901951  0.        ]]
Each element in this array is a distance, and I need to turn it into a list of (row, col, distance) tuples, as follows:
l = [(0,0,0),(0,1, 1.73205081),(0,2, 6.40312424),...,(1,0, 1.73205081),(1,1,0),...,(4,4,0)]
Additionally, it would be cool to remove the diagonal elements, and also the elements (j, i) when (i, j) is already there. Essentially, is it possible to take just the upper triangle of this matrix?
Is this possible to do efficiently (without a lot of loops)? I had created this array with squareform, but couldn't find anything in the docs to do this.
squareform does all this. Read the docs and experiment. It works in both directions. If you give it a matrix it returns the upper triangle values (condensed form). If you give it those values, it returns the matrix.
In [667]: from scipy import spatial
In [668]: M
Out[668]:
array([[ 0. ,  0.1,  0.5,  0.2],
       [ 0.1,  0. ,  2. ,  0.3],
       [ 0.5,  2. ,  0. ,  0.2],
       [ 0.2,  0.3,  0.2,  0. ]])
In [669]: spatial.distance.squareform(M)
Out[669]: array([ 0.1, 0.5, 0.2, 2. , 0.3, 0.2])
In [670]: v=spatial.distance.squareform(M)
In [671]: v
Out[671]: array([ 0.1, 0.5, 0.2, 2. , 0.3, 0.2])
In [672]: spatial.distance.squareform(v)
Out[672]:
array([[ 0. ,  0.1,  0.5,  0.2],
       [ 0.1,  0. ,  2. ,  0.3],
       [ 0.5,  2. ,  0. ,  0.2],
       [ 0.2,  0.3,  0.2,  0. ]])
You can also specify a force and checks parameter, but without those it just goes by the shape.
Indices can come from triu:
In [677]: np.triu_indices(4,1)
Out[677]:
(array([0, 0, 0, 1, 1, 2], dtype=int32),
 array([1, 2, 3, 2, 3, 3], dtype=int32))
In [680]: np.vstack((np.triu_indices(4,1),v)).T
Out[680]:
array([[ 0. ,  1. ,  0.1],
       [ 0. ,  2. ,  0.5],
       [ 0. ,  3. ,  0.2],
       [ 1. ,  2. ,  2. ],
       [ 1. ,  3. ,  0.3],
       [ 2. ,  3. ,  0.2]])
Just to check, we can fill in a 4x4 matrix with these values
In [686]: A=np.vstack((np.triu_indices(4,1),v)).T
In [687]: MM = np.zeros((4,4))
In [688]: MM[A[:,0].astype(int),A[:,1].astype(int)]=A[:,2]
In [689]: MM
Out[689]:
array([[ 0. ,  0.1,  0.5,  0.2],
       [ 0. ,  0. ,  2. ,  0.3],
       [ 0. ,  0. ,  0. ,  0.2],
       [ 0. ,  0. ,  0. ,  0. ]])
Those triu indices can also fetch the values from M:
In [693]: I,J = np.triu_indices(4,1)
In [694]: M[I,J]
Out[694]: array([ 0.1, 0.5, 0.2, 2. , 0.3, 0.2])
squareform uses compiled code in spatial.distance._distance_wrap, so I expect it will be quite fast for large arrays. The only problem is that it returns just the condensed-form values, not the indices. But given the shape, the indices can always be calculated; they don't need to be stored with the values.
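Given those pieces, assembling the list of (row, col, distance) tuples the question asks for is short (a sketch using the session's M and v):

I, J = np.triu_indices(M.shape[0], 1)
l = list(zip(I.tolist(), J.tolist(), v.tolist()))
# [(0, 1, 0.1), (0, 2, 0.5), (0, 3, 0.2), (1, 2, 2.0), (1, 3, 0.3), (2, 3, 0.2)]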
If your input is x, first generate the indices:
i0,i1 = np.indices(x.shape)
Then:
np.concatenate((i1,i0,x)).reshape(3,5,5).T
That gives you the first result, for the entire matrix.
As for taking only the upper triangle, you might consider trying np.triu(), but I'm not sure exactly what result you're looking for. You can probably figure out how to mask the parts you don't want now, though.
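For instance, a boolean mask built from the same index arrays keeps just the strict upper triangle (a sketch; note np.triu itself would zero out values rather than drop them):

i0, i1 = np.indices(x.shape)
keep = i0 < i1                 # strictly above the diagonal
triples = np.stack((i0[keep], i1[keep], x[keep]), axis=-1)
# each row of triples is (row, col, distance); the result is float throughout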
You can try this:
print([(x,y, value) for (x,y), value in np.ndenumerate(numpymatrixarray)])
Output:
[(0, 0, 0.0), (0, 1, 1.7320508100000001), (0, 2, 6.4031242400000004), (0, 3, 7.2111025499999997), (0, 4, 2.4494897400000002),
 (1, 0, 1.7320508100000001), (1, 1, 0.0), (1, 2, 5.0990195099999998), (1, 3, 5.9160797799999996), (1, 4, 1.0),
 (2, 0, 6.4031242400000004), (2, 1, 5.0990195099999998), (2, 2, 0.0), (2, 3, 1.0), (2, 4, 4.3588989400000004),
 (3, 0, 7.2111025499999997), (3, 1, 5.9160797799999996), (3, 2, 1.0), (3, 3, 0.0), (3, 4, 5.0990195099999998),
 (4, 0, 2.4494897400000002), (4, 1, 1.0), (4, 2, 4.3588989400000004), (4, 3, 5.0990195099999998), (4, 4, 0.0)]
Do you really want the upper triangular matrix for an n×m matrix where n>m? That will give you (n×n−n)/2 elements and lose all the data in the rows below the diagonal block.
What you probably want is the lower triangular matrix:
def tri_reduce(m):
    n = m.shape
    if n[0] > n[1]:
        # tall matrix: take the strict lower triangle (k=-1 excludes the diagonal)
        i = np.tril_indices(n[0], -1, n[1])
    else:
        # wide or square matrix: take the strict upper triangle
        i = np.triu_indices(n[0], 1, n[1])
    return np.vstack((i, m[i])).T
Rebuilding it into a list of tuples would require a loop, though, I believe; list(tri_reduce(m)) would give a list of 1d arrays.
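For example, a single comprehension does that conversion (a sketch):

# tri_reduce returns a (k, 3) float array; unpack each row into a tuple
l = [(int(i), int(j), d) for i, j, d in tri_reduce(m)]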
I would like to calculate the sum of the columns two by two (the sum of columns 0 and 1, of columns 2 and 3, ...).
So I tried nested for loops, but I never get the right result.
For example:
c = np.array([[0,0,0.25,0.5],[0,0.5,0.25,0],[0.5,0,0,0]],float)
freq=np.zeros(6,float).reshape((3, 2))
# I calculate the sum of the first and second columns, and of the third and fourth columns
for i in range(0,4,2):
    for j in range(1,4,2):
        for p in range(0,2):
            freq[:,p]=(c[:,i]+c[:,j])
But the result is:
print freq
array([[ 0.75,  0.75],
       [ 0.25,  0.25],
       [ 0.  ,  0.  ]])
Normally the correct result should be (0., 0.5, 0.5) and (0.75, 0.25, 0.), so I think the problem is in the nested for loops.
Does anyone know how I can calculate the sum of every two columns? I have a matrix with 400 columns.
You can simply reshape to split the last dimension into two, with the new trailing dimension of length 2, and then sum along it, like so -
freq = c.reshape(c.shape[0],-1,2).sum(2).T
Reshaping only creates a view into the array, so effectively we are just paying for the summing operation here, which should be efficient.
Sample run -
In [17]: c
Out[17]:
array([[ 0.  ,  0.  ,  0.25,  0.5 ],
       [ 0.  ,  0.5 ,  0.25,  0.  ],
       [ 0.5 ,  0.  ,  0.  ,  0.  ]])
In [18]: c.reshape(c.shape[0],-1,2).sum(2).T
Out[18]:
array([[ 0.  ,  0.5 ,  0.5 ],
       [ 0.75,  0.25,  0.  ]])
Add the slices c[:, ::2] and c[:, 1::2]:
In [62]: c
Out[62]:
array([[ 0.  ,  0.  ,  0.25,  0.5 ],
       [ 0.  ,  0.5 ,  0.25,  0.  ],
       [ 0.5 ,  0.  ,  0.  ,  0.  ]])
In [63]: c[:, ::2] + c[:, 1::2]
Out[63]:
array([[ 0.  ,  0.75],
       [ 0.5 ,  0.25],
       [ 0.5 ,  0.  ]])
Here is one way using np.split():
In [36]: np.array(np.split(c, np.arange(2, c.shape[1], 2), axis=1)).sum(axis=-1)
Out[36]:
array([[ 0.  ,  0.5 ,  0.5 ],
       [ 0.75,  0.25,  0.  ]])
Or, as a more general way that works even for odd-length arrays:
In [87]: def vertical_adder(array):
   ....:     return np.column_stack([np.sum(arr, axis=1) for arr in np.array_split(array, np.arange(2, array.shape[1], 2), axis=1)])
   ....:
In [88]: vertical_adder(c)
Out[88]:
array([[ 0.  ,  0.75],
       [ 0.5 ,  0.25],
       [ 0.5 ,  0.  ]])
In [94]: a
Out[94]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
In [95]: vertical_adder(a)
Out[95]:
array([[ 1,  5,  4],
       [11, 15,  9],
       [21, 25, 14]])
I have this delta function, which has three cases: mask1, mask2, and, if neither is satisfied, delta = 0 (since res starts as np.zeros):
def delta(r, dr):
    res = np.zeros(r.shape)
    mask1 = (r >= 0.5*dr) & (r <= 1.5*dr)
    res[mask1] = (5-3*np.abs(r[mask1])/dr \
                  - np.sqrt(-3*(1-np.abs(r[mask1])/dr)**2+1)) \
                 /(6*dr)
    mask2 = np.logical_not(mask1) & (r <= 0.5*dr)
    res[mask2] = (1+np.sqrt(-3*(r[mask2]/dr)**2+1))/(3*dr)
    return res
Then I have this other function, where I call the former and construct an array E:
def matrix_E(nk,X,Y,xhi,eta,dx,dy):
    rx = abs(X[np.newaxis,:] - xhi[:,np.newaxis])
    ry = abs(Y[np.newaxis,:] - eta[:,np.newaxis])
    deltx = delta(rx,dx)
    delty = delta(ry,dy)
    E = deltx*delty
    return E
The thing is that most of the elements of E fall into the third case of delta and are 0; "most" means about 99%.
So I would like to have a sparse matrix instead of a dense one, and not store the zero elements, in order to save memory.
Any ideas on how I could do it?
The normal way to create a sparse matrix is to construct three 1d arrays, with the nonzero values, and their i and j indexes. Then pass them to the coo_matrix function.
The coordinates don't have to be in order, so you could construct the arrays for the 2 nonzero mask cases and concatenate them.
Here's a sample construction using 2 masks:
In [106]: from scipy import sparse
In [107]: x=np.arange(5)
In [108]: i,j,data=[],[],[]
In [110]: mask1=x%2==0
In [111]: mask2=x%2!=0
In [112]: i.append(x[mask1])
In [113]: j.append((x*2)[mask1])
In [114]: i.append(x[mask2])
In [115]: j.append(x[mask2])
In [116]: i=np.concatenate(i)
In [117]: j=np.concatenate(j)
In [118]: i
Out[118]: array([0, 2, 4, 1, 3])
In [119]: j
Out[119]: array([0, 4, 8, 1, 3])
In [120]: M=sparse.coo_matrix((x,(i,j)))
In [121]: print(M)
  (0, 0)    0
  (2, 4)    1
  (4, 8)    2
  (1, 1)    3
  (3, 3)    4
In [122]: M.A
Out[122]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 3, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 4, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 2]])
A coo format stores those 3 arrays as is, but they get sorted and cleaned up when converted to other formats and printed.
I can work on adapting this to your case, but this may be enough to get you started.
It looks like X, Y, xhi, eta are 1d arrays. rx and ry are then 2d. delta returns a result the same shape as its input. E = deltx*delty suggests that deltx and delty are the same shape (or at least broadcastable).
Since sparse matrices have a .multiply method for element-wise multiplication, we can focus on producing sparse delta matrices.
If you can afford the memory to make rx and a couple of masks, then you can also afford to make deltx (all the same size). Even though deltx has lots of zeros, it is probably fastest to make it dense.
But let's try to recast the delta calculation as a sparse build.
This looks like the essence of what you are doing in delta, at least with one mask.
Start with a 2d array:
In [138]: r = np.arange(24).reshape(4,6)
In [139]: mask1 = (r>=8) & (r<=16)
In [140]: res1 = r[mask1]*0.2
In [141]: I,J = np.where(mask1)
The resulting vectors are:
In [142]: I
Out[142]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)
In [143]: J
Out[143]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
In [144]: res1
Out[144]: array([ 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. , 3.2])
Make a sparse matrix:
In [145]: M1=sparse.coo_matrix((res1,(I,J)), r.shape)
In [146]: M1.A
Out[146]:
array([[ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  1.6,  1.8,  2. ,  2.2],
       [ 2.4,  2.6,  2.8,  3. ,  3.2,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ]])
I could make another sparse matrix with mask2, and add the two.
In [147]: mask2 = (r>=17) & (r<=22)
In [148]: res2 = r[mask2]*-0.4
In [149]: I,J = np.where(mask2)
In [150]: M2=sparse.coo_matrix((res2,(I,J)), r.shape)
In [151]: M2.A
Out[151]:
array([[ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. , -6.8],
       [-7.2, -7.6, -8. , -8.4, -8.8,  0. ]])
...
In [153]: (M1+M2).A
Out[153]:
array([[ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  1.6,  1.8,  2. ,  2.2],
       [ 2.4,  2.6,  2.8,  3. ,  3.2, -6.8],
       [-7.2, -7.6, -8. , -8.4, -8.8,  0. ]])
Or I could concatenate res1 and res2, etc., and make one sparse matrix:
In [156]: I1,J1 = np.where(mask1)
In [157]: I2,J2 = np.where(mask2)
In [158]: res12=np.concatenate((res1,res2))
In [159]: I12=np.concatenate((I1,I2))
In [160]: J12=np.concatenate((J1,J2))
In [161]: M12=sparse.coo_matrix((res12,(I12,J12)), r.shape)
In [162]: M12.A
Out[162]:
array([[ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  1.6,  1.8,  2. ,  2.2],
       [ 2.4,  2.6,  2.8,  3. ,  3.2, -6.8],
       [-7.2, -7.6, -8. , -8.4, -8.8,  0. ]])
Here I chose the masks so the nonzero values don't overlap, but both methods would work if they did. It's a deliberate design feature of the coo format that values for repeated indices are summed, which is very handy when creating sparse matrices for finite element problems.
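A tiny demonstration of that summing behavior (a sketch with made-up values):

# Two entries at the same (row, col) position are summed when the coo
# matrix is converted to a dense (or csr/csc) form.
dup = sparse.coo_matrix(([1.0, 2.0], ([0, 0], [1, 1])), shape=(2, 2))
dup.A                    # -> array([[0., 3.], [0., 0.]])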
I can also get index arrays by creating a sparse matrix from the mask:
In [179]: rmask1=sparse.coo_matrix(mask1)
In [180]: rmask1.row
Out[180]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)
In [181]: rmask1.col
Out[181]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
In [184]: sparse.coo_matrix((res1, (rmask1.row, rmask1.col)),rmask1.shape).A
Out[184]:
array([[ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  1.6,  1.8,  2. ,  2.2],
       [ 2.4,  2.6,  2.8,  3. ,  3.2,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ]])
I can't, though, create a mask from a sparse version of r with (r>=8) & (r<=16); that kind of inequality test has not been implemented for sparse matrices. But that might not matter, since r is probably not sparse.
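Putting the pieces together, a sparse version of delta might look like this. It's a sketch (delta_sparse is a hypothetical name, and it isn't tested against the original data); it mirrors the two masks from the question and never stores the zero case:

import numpy as np
from scipy import sparse

def delta_sparse(r, dr):
    # Same two cases as the question's delta; the zero case is simply absent.
    mask1 = (r >= 0.5*dr) & (r <= 1.5*dr)
    mask2 = np.logical_not(mask1) & (r <= 0.5*dr)
    v1 = (5 - 3*np.abs(r[mask1])/dr
          - np.sqrt(-3*(1 - np.abs(r[mask1])/dr)**2 + 1)) / (6*dr)
    v2 = (1 + np.sqrt(-3*(r[mask2]/dr)**2 + 1)) / (3*dr)
    I1, J1 = np.where(mask1)
    I2, J2 = np.where(mask2)
    data = np.concatenate((v1, v2))
    I = np.concatenate((I1, I2))
    J = np.concatenate((J1, J2))
    return sparse.coo_matrix((data, (I, J)), shape=r.shape)

E could then be built as delta_sparse(rx, dx).multiply(delta_sparse(ry, dy)), keeping everything sparse.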
Suppose we had two arrays: some values, e.g. array([1.2, 1.4, 1.6]), and some indices (let's say, array([0, 2, 1])). Our output is expected to be the values put into a bigger array, "addressed" by the indices, so we would get:
array([[ 1.2,  0. ,  0. ],
       [ 0. ,  0. ,  1.4],
       [ 0. ,  1.6,  0. ]])
Is there a way to do this without loops, in a nice, fast way?
With
a = zeros((3,3))
b = array([0, 2, 1])
vals = array([1.2, 1.4, 1.6])
You just need to index it (with the help of arange or r_):
>>> a[r_[:len(b)], b] = vals
>>> a
array([[ 1.2,  0. ,  0. ],
       [ 0. ,  0. ,  1.4],
       [ 0. ,  1.6,  0. ]])
How do we modify this for higher dimensions? For example, if a is a 5x4x3 array and b and vals are 5x4 arrays, how do we modify the statement a[r_[:len(b)], b] = vals?
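One way for the 3D case (a sketch, assuming the last axis is the one addressed by b) is to build index grids for the leading axes with np.indices:

import numpy as np

a = np.zeros((5, 4, 3))
b = np.random.randint(0, 3, size=(5, 4))   # made-up indices into the last axis
vals = np.random.rand(5, 4)                # made-up values

i, j = np.indices(b.shape)   # index grids for the first two axes, each 5x4
a[i, j, b] = vals            # b fancy-indexes the last axis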