I have an array of zeros
arr = np.zeros([5,5])
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
I want to assign values based on index, so I did this:
out = np.array([[np.nan,2.,4.,1.,1.],[np.nan,3.,4.,4.,4.]])
arr[out[0].astype(int),np.arange(len(out[0]))] = 1
arr[out[1].astype(int),np.arange(len(out[1]))] = 1
Assignment works fine if there is 0 instead of nan.
How can I skip the assignment where the index is nan? And is it possible to assign all values at once from a multidimensional index array rather than using a for loop?
Mask it -
mask = ~np.isnan(out)
arr[out[0,mask[0]].astype(int),np.flatnonzero(mask[0])] = 1
arr[out[1,mask[1]].astype(int),np.flatnonzero(mask[1])] = 1
Sample run -
In [171]: out
Out[171]:
array([[ nan, 2., 4., 1., 1.],
[ nan, 3., 4., 4., 4.]])
In [172]: mask = ~np.isnan(out)
...: arr[out[0,mask[0]].astype(int),np.flatnonzero(mask[0])] = 1
...: arr[out[1,mask[1]].astype(int),np.flatnonzero(mask[1])] = 1
...:
In [173]: arr
Out[173]:
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1., 1.],
[ 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 1., 1., 1.]])
Alternatively, replace the flatnonzero calls with range-masking -
r = np.arange(arr.shape[1])
arr[out[0,mask[0]].astype(int),r[mask[0]]] = 1
arr[out[1,mask[1]].astype(int),r[mask[1]]] = 1
If you are working with many more rows than just 2 and you want to assign them in a vectorized manner, here's one method, using linear-indexing -
n = arr.shape[1]
linear_idx = (out*n + np.arange(n))
np.put(arr, linear_idx[~np.isnan(linear_idx)].astype(int), 1)
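A possible variant of the same vectorized idea, sketched with ordinary two-array fancy indexing instead of linear indices. It assumes the same arr and out as in the question; mask, rows and cols are names chosen just for this illustration.

```python
import numpy as np

arr = np.zeros((5, 5))
out = np.array([[np.nan, 2., 4., 1., 1.],
                [np.nan, 3., 4., 4., 4.]])

# True wherever `out` holds a usable row index.
mask = ~np.isnan(out)

# Column position of every valid entry; the row of `out` itself is not needed.
_, cols = np.nonzero(mask)

# Valid entries of `out`, in the same row-major order as `cols`.
rows = out[mask].astype(int)

arr[rows, cols] = 1
print(arr)
```

Boolean indexing and np.nonzero both walk the array in row-major order, which is why rows and cols line up element for element.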
There are several really similar questions, but I don't really understand how to apply them to my case precisely.
I have an array of matrices and an array of vectors, and I need the pairwise dot product (matrix i with vector i). Illustration:
In [1]: matrix1 = np.eye(5)
In [2]: matrix2 = np.eye(5) * 5
In [3]: matrices = np.array((matrix1,matrix2))
In [4]: matrices
Out[4]:
array([[[ 1., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 1.]],
[[ 5., 0., 0., 0., 0.],
[ 0., 5., 0., 0., 0.],
[ 0., 0., 5., 0., 0.],
[ 0., 0., 0., 5., 0.],
[ 0., 0., 0., 0., 5.]]])
In [5]: vectors = np.ones((5,2))
In [6]: vectors
Out[6]:
array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.],
[ 1., 1.],
[ 1., 1.]])
In [9]: np.array([m @ v for m,v in zip(matrices, vectors.T)]).T
Out[9]:
array([[ 1., 5.],
[ 1., 5.],
[ 1., 5.],
[ 1., 5.],
[ 1., 5.]])
This last line is my desired output. Unfortunately it is very inefficient: for instance, computing matrices @ vectors, which performs unwanted dot products due to broadcasting (if I understand correctly, it returns the first matrix dotted with both vectors and the second matrix dotted with both vectors), is actually faster.
I guess np.einsum or np.tensordot might be helpful here but all my attempts have failed:
In [30]: np.einsum("i,j", matrices, vectors)
ValueError: operand has more dimensions than subscripts given in einstein sum, but no '...' ellipsis provided to broadcast the extra dimensions.
In [34]: np.tensordot(matrices, vectors, axes=(0,1))
Out[34]:
array([[[ 6., 6., 6., 6., 6.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]],
[[ 0., 0., 0., 0., 0.],
[ 6., 6., 6., 6., 6.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]],
[[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 6., 6., 6., 6., 6.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]],
[[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 6., 6., 6., 6., 6.],
[ 0., 0., 0., 0., 0.]],
[[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 6., 6., 6., 6., 6.]]])
NB: my real-case scenario uses more complicated matrices than matrix1 and matrix2.
With np.einsum, you might use:
np.einsum("ijk,ki->ji", matrices, vectors)
#array([[ 1., 5.],
# [ 1., 5.],
# [ 1., 5.],
# [ 1., 5.],
# [ 1., 5.]])
You can use @ as follows
matrices @ vectors.T[..., None]
# array([[[ 1.],
# [ 1.],
# [ 1.],
# [ 1.],
# [ 1.]],
# [[ 5.],
# [ 5.],
# [ 5.],
# [ 5.],
# [ 5.]]])
As we can see, it computes the right values but arranges them in the wrong shape.
Therefore
(matrices @ vectors.T[..., None]).squeeze().T
# array([[ 1., 5.],
# [ 1., 5.],
# [ 1., 5.],
# [ 1., 5.],
# [ 1., 5.]])
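As a sanity check, here is a small sketch comparing the loop from the question with the einsum and @ variants on random data rather than scaled identities. The random inputs and the names loop, via_einsum and via_matmul are made up for this check; squeeze(-1) is used instead of a bare squeeze() so that only the trailing length-1 axis is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)
matrices = rng.standard_normal((2, 5, 5))  # stack of 2 matrices
vectors = rng.standard_normal((5, 2))      # column i pairs with matrix i

# Reference result: one matrix-vector product per (matrix, column) pair.
loop = np.array([m @ v for m, v in zip(matrices, vectors.T)]).T

# i = matrix index, j = output row, k = summed axis.
via_einsum = np.einsum("ijk,ki->ji", matrices, vectors)

# Turn each vector into a (5, 1) column so @ broadcasts over the stack.
via_matmul = (matrices @ vectors.T[..., None]).squeeze(-1).T

print(np.allclose(loop, via_einsum), np.allclose(loop, via_matmul))
```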
Given a numpy array of zeros, say
arr = np.zeros((5, 5))
and an array of indices that represent vertices of a polygon, say
verts = np.array([[0, 2], [2, 0], [2, 4]])
1) What is the elegant way of doing
for v in verts:
arr[v[0], v[1]] = 1
such that the resulting array is
In [108]: arr
Out[108]:
array([[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
2) How can I fill the array with ones such that the output array is
In [158]: arr
Out[158]:
array([[ 0., 0., 1., 0., 0.],
[ 0., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 1.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
To answer the first part of your question: arr[tuple(verts.T)] = 1
verts.T transposes your indices to a (2, n) array, where the two rows correspond to the row and column dimensions of arr. These are then unpacked into a tuple of (row_indices, col_indices), which we then use to index into arr.
We could write this a bit more verbosely as:
row_indices = verts[:, 0]
col_indices = verts[:, 1]
arr[row_indices, col_indices] = 1
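A short sketch of why the tuple() matters, using the verts from the question (a, b and c are throwaway names for this comparison). Indexing with a tuple of arrays pairs them up as (row, column) coordinates, while indexing with a single 2-D integer array selects whole rows along the first axis.

```python
import numpy as np

verts = np.array([[0, 2], [2, 0], [2, 4]])

# Tuple of (row_indices, col_indices): sets exactly the three vertices.
a = np.zeros((5, 5))
a[tuple(verts.T)] = 1

# The explicit loop from the question, for comparison.
b = np.zeros((5, 5))
for r, c in verts:
    b[r, c] = 1

# Without tuple(): every value in verts.T is treated as a row index,
# so entire rows 0, 2 and 4 are overwritten.
c = np.zeros((5, 5))
c[verts.T] = 1

print(np.array_equal(a, b), a.sum(), c.sum())
```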
For the second part, one method that will work for arbitrary polygons would be to use matplotlib.path.Path.contains_points:
from matplotlib.path import Path
points = np.indices(arr.shape).reshape(2, -1).T
path = Path(verts)
mask = path.contains_points(points, radius=1e-9)
mask = mask.reshape(arr.shape).astype(arr.dtype)
print(repr(mask))
# array([[ 0., 0., 1., 0., 0.],
# [ 0., 1., 1., 1., 0.],
# [ 1., 1., 1., 1., 1.],
# [ 0., 0., 0., 0., 0.],
# [ 0., 0., 0., 0., 0.]])
Consider, for reference:
>>> x, y = np.ones((2, 2, 2)), np.zeros((2, 2, 2))
>>> np.concatenate((x, y, x, y), axis=2)
array([[[ 1., 1., 0., 0., 1., 1., 0., 0.],
[ 1., 1., 0., 0., 1., 1., 0., 0.]],
[[ 1., 1., 0., 0., 1., 1., 0., 0.],
[ 1., 1., 0., 0., 1., 1., 0., 0.]]])
We have stacked the arrays along the innermost dimension, merging it - the resulting shape is (2, 2, 8). But suppose I wanted those innermost elements to lie side-by-side instead (this would only work because every dimension of the source arrays is the same, including the one I want to 'stack' in), producing a result with shape (2, 2, 4, 2) as follows?
array([[[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]],
[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]]],
[[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]],
[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]]]])
The best approach I have is to reshape each source array first, to add a 1-length dimension right before the last:
def pad(npa):
return npa.reshape(npa.shape[:-1] + (1, npa.shape[-1]))
np.concatenate((pad(x), pad(y), pad(x), pad(y)), axis=2) # does what I want
# np.hstack might be better? I always want the second-last dimension, now
But I feel like I am reinventing a wheel. Have I overlooked something that will do this more directly?
You can do it as follows:
>>> xx = x[..., None, :]
>>> yy = y[..., None, :]
>>> np.concatenate((xx, yy, xx, yy), axis=2).shape
(2, 2, 4, 2)
>>> np.concatenate((xx, yy, xx, yy), axis=2)
array([[[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]],
[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]]],
[[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]],
[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]]]])
>>>
What this example does is change the shape of the arrays (no data is copied). Indexing with None, or equivalently np.newaxis, adds an axis of length 1:
>>> xx.shape
(2, 2, 1, 2)
>>> xx
array([[[[ 1., 1.]],
[[ 1., 1.]]],
[[[ 1., 1.]],
[[ 1., 1.]]]])
>>>
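For what it's worth, np.stack (available in NumPy 1.10 and later) joins arrays along a new axis, so it should give the same result without reshaping each input by hand. A minimal sketch with the x and y from the question; stacked and manual are names chosen here for comparison.

```python
import numpy as np

x, y = np.ones((2, 2, 2)), np.zeros((2, 2, 2))

# Insert the new axis just before the last one and join along it.
stacked = np.stack((x, y, x, y), axis=-2)

# The newaxis + concatenate approach, for comparison.
manual = np.concatenate([a[..., None, :] for a in (x, y, x, y)], axis=2)

print(stacked.shape)                     # (2, 2, 4, 2)
print(np.array_equal(stacked, manual))   # True
```

Using axis=-2 rather than axis=2 keeps the call correct for inputs with any number of dimensions, since the new axis always lands just before the last one.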
I am new to Python and am trying to do some simple classification on a raster image.
Basically, I am reading a TIF image as a 2D array and doing some calculation and manipulation on it. For the classification part, I am trying to create 3 empty arrays for land, water, and clouds. Cells in each class array will be set to 1 under multiple conditions, and eventually the classes will be encoded as landclass=1, waterclass=2, cloudclass=3 respectively.
Apparently I can set values in an array to 1 under one condition,
like this:
crop = gdal.Open(crop,GA_ReadOnly)
crop = crop.ReadAsArray()
rows,cols = crop.shape
mode = int(stats.mode(crop, axis=None)[0])
water = np.empty((rows, cols), dtype=np.float32)
land = water
clouds = water
Then I have something like this (output):
>>> land
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
>>> land[water==0]=1
>>> land
array([[ 0., 0., 0., ..., 1., 1., 1.],
[ 0., 0., 0., ..., 1., 1., 1.],
[ 0., 0., 0., ..., 1., 1., 1.],
...,
[ 1., 1., 1., ..., 0., 0., 0.],
[ 1., 1., 1., ..., 0., 0., 0.],
[ 1., 1., 1., ..., 0., 0., 0.]], dtype=float32)
>>> land[crop>mode]=1
>>> land
array([[ 0., 0., 0., ..., 1., 1., 1.],
[ 0., 0., 0., ..., 1., 1., 1.],
[ 0., 0., 0., ..., 1., 1., 1.],
...,
[ 1., 1., 1., ..., 0., 0., 0.],
[ 1., 1., 1., ..., 0., 0., 0.],
[ 1., 1., 1., ..., 0., 0., 0.]], dtype=float32)
But how can I have the values in "land" equal to 1 under a couple of conditions without altering the shape of the array?
I tried to do this
land[water==0,crop>mode]=1
and I got ValueError. And I tried this
land[water==0 and crop>mode]=1
and Python asks me to use a.any() or a.all()....
With only one condition at a time, the result is exactly what I want, but I have to apply the conditions one after another to get it, e.g. (this is what I have in my actual code):
water[band6 < b6_threshold]=1
water[band7 < b7_threshold_1]=1
water[band6 > b6_threshold]=1
water[band7 < b7_threshold_2]=1
land[band6 > b6_threshold]=1
land[band7 > b7_threshold_2]=1
land[clouds == 1]=1
land[water == 1]=1
land[b1b4 < 0.5]=1
land[band3 < 0.1]=1
clouds[land == 0]=1
clouds[water == 0]=1
clouds[band6 < (b6_mode-4)]=1
I find this a bit confusing and I would like to combine all conditions within one statement... Any suggestions on that?
Thank you very much!
You can multiply the boolean arrays for something like "and":
>>> import numpy as np
>>> a = np.array([1,2,3,4])
>>> a[(a > 1) * (a < 3)] = 99
>>> a
array([ 1, 99, 3, 4])
And you can add them for something like "or":
>>> a[(a > 1) + (a < 3)] = 123
>>> a
array([123, 123, 123, 123])
Alternatively, if you prefer to think in terms of boolean logic rather than True and False being 1 and 0, you can also use the operators & and | to the same effect. Keep each comparison in parentheses, since & and | bind more tightly than < and >.
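Applied to the question, a combined condition could look roughly like the sketch below. The band6 and band7 values and the two thresholds are invented for illustration and are not taken from the original data.

```python
import numpy as np

# Hypothetical 2x2 "bands" and thresholds, just to show the pattern.
band6 = np.array([[1.0, 5.0], [7.0, 2.0]])
band7 = np.array([[0.2, 0.9], [0.4, 0.1]])
b6_threshold = 4.0
b7_threshold = 0.5

# Separate arrays for each class (not aliases of one another).
land = np.zeros(band6.shape, dtype=np.float32)
water = np.zeros(band6.shape, dtype=np.float32)

# Elementwise "and": both conditions must hold for a cell.
land[(band6 > b6_threshold) & (band7 > b7_threshold)] = 1

# Elementwise "or": either condition is enough.
water[(band6 < b6_threshold) | (band7 < b7_threshold)] = 1

print(land)
print(water)
```

Python's and/or keywords try to reduce each whole array to a single True or False, which is what triggers the a.any()/a.all() error; & and | work element by element instead.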