"stacking" arrays in a new dimension? - python

Consider, for reference:
>>> x, y = np.ones((2, 2, 2)), np.zeros((2, 2, 2))
>>> np.concatenate((x, y, x, y), axis=2)
array([[[ 1., 1., 0., 0., 1., 1., 0., 0.],
[ 1., 1., 0., 0., 1., 1., 0., 0.]],
[[ 1., 1., 0., 0., 1., 1., 0., 0.],
[ 1., 1., 0., 0., 1., 1., 0., 0.]]])
We have stacked the arrays along the innermost dimension, merging it - the resulting shape is (2, 2, 8). But suppose I wanted those innermost elements to lie side-by-side instead (this would only work because every dimension of the source arrays is the same, including the one I want to 'stack' in), producing a result with shape (2, 2, 4, 2) as follows?
array([[[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]],
[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]]],
[[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]],
[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]]]])
The best approach I have is to reshape each source array first, to add a 1-length dimension right before the last:
def pad(npa):
return npa.reshape(npa.shape[:-1] + (1, npa.shape[-1]))
np.concatenate((pad(x), pad(y), pad(x), pad(y)), axis=2) # does what I want
# np.hstack might be better? I always want the second-last dimension, now
But I feel like I am reinventing a wheel. Have I overlooked something that will do this more directly?

You can do it as follows:
>>> xx = x[..., None, :]
>>> yy = y[..., None, :]
>>> np.concatenate((xx, yy, xx, yy), axis=2).shape
(2, 2, 4, 2)
>>> np.concatenate((xx, yy, xx, yy), axis=2)
array([[[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]],
[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]]],
[[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]],
[[ 1., 1.],
[ 0., 0.],
[ 1., 1.],
[ 0., 0.]]]])
>>>
What this example does is change the shape (no data is copied) of the arrays. Slicing with None or equivalently np.newaxis adds an axis:
>>> xx.shape
(2, 2, 1, 2)
>>> xx
array([[[[ 1., 1.]],
[[ 1., 1.]]],
[[[ 1., 1.]],
[[ 1., 1.]]]])
>>>

Related

How to fill numpy array of zeros with ones given indices/coordinates

Given a numpy array of zeros, say
arr = np.zeros((5, 5))
and an array of indices that represent vertices of a polygon, say
verts = np.array([[0, 2], [2, 0], [2, 4]])
1) What is the elegant way of doing
for v in verts:
arr[v[0], v[1]] = 1
such that the resulting array is
In [108]: arr
Out[108]:
array([[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
2) How can I fill the array with ones such that the output array is
In [158]: arr
Out[158]:
array([[ 0., 0., 1., 0., 0.],
[ 0., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 1.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
To answer the first part of your question: arr[tuple(verts.T)] = 1
verts.T transposes your indices to a (2, n) array, where the two rows correspond to the row and column dimensions of arr. These are then unpacked into a tuple of (row_indices, col_indices), which we then use to index into arr.
We could write this a bit more verbosely as:
row_indices = verts[:, 0]
col_indices = verts[:, 1]
arr[row_indices, col_indices] = 1
For the second part, one method that will work for arbitrary polygons would be to use matplotlib.Path.contains_points, as described here:
from matplotlib.path import Path
points = np.indices(arr.shape).reshape(2, -1).T
path = Path(verts)
mask = path.contains_points(points, radius=1e-9)
mask = mask.reshape(arr.shape).astype(arr.dtype)
print(repr(mask))
# array([[ 0., 0., 1., 0., 0.],
# [ 0., 1., 1., 1., 0.],
# [ 1., 1., 1., 1., 1.],
# [ 0., 0., 0., 0., 0.],
# [ 0., 0., 0., 0., 0.]])

Return index of every non-zero element in array

I know there is a way to return the index of the maximum element in an array in python: numpy.argmax(). Is there a way to return index of every non-zero element?
For example
array([[ 0., 1., 1., ..., 1., 0., 0.],
[ 0., 1., 1., ..., 1., 0., 1.],
[ 0., 1., 1., ..., 1., 0., 0.],
...,
[ 0., 1., 1., ..., 1., 0., 0.],
[ 0., 1., 1., ..., 1., 0., 0.],
[ 0., 1., 1., ..., 1., 0., 0.]], dtype=float32)
to
[[1, 2, ...,6],
[1,2,...6,8],
...
...
]
Do you want something like this:
x = np.asarray([0, 1, 2, 3, 0, 1])
In [129]: np.nonzero(x)
Out[129]: (array([1, 2, 3, 5]),)
See: [1], [2], [3]

"Trailing" One-Hot Encode

I am trying to do something similar to One-Hot-Encoding but instead of the selected class being 1 and the rest zero, I want all the classes up to (and including the selected class) to be 1. Say I have a training batch with labels (5 possible class labels; 0, 1, 2, 3, 4)
y = np.array([0,2,1,3,4,1])
I can one-hot-encode with
def one_hot_encode(arr, num_classes):
return np.eye(num_classes)[arr]
which gives
>>> one_hot_encode(y, 5)
array([[ 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 1.],
[ 0., 1., 0., 0., 0.]])
I liked to get
array([[ 1., 0., 0., 0., 0.],
[ 1., 1., 1., 0., 0.],
[ 1., 1., 0., 0., 0.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 0., 0., 0.]])
Anyone know how to do this?
You could achieve this by using a lower-triangular matrix instead of an identity matrix in your function definition:
def many_hot_encode(arr, num_classes):
return np.tril(np.ones(num_classes))[arr]
many_hot_encode(y,5)
array([[ 1., 0., 0., 0., 0.],
[ 1., 1., 1., 0., 0.],
[ 1., 1., 0., 0., 0.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 0., 0., 0.]])
You can also use broadcasting -
out = (y[:,None] >= np.arange(num_classes)).astype(float)
Sample run -
In [71]: y = np.array([0,2,1,3,4,1])
In [72]: num_classes = 5
In [73]: (y[:,None] >= np.arange(num_classes)).astype(float)
Out[73]:
array([[ 1., 0., 0., 0., 0.],
[ 1., 1., 1., 0., 0.],
[ 1., 1., 0., 0., 0.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 0., 0., 0.]])

Create a Numpy array representing connections in a network

Say I have an array describing network links between nodes:
array([[ 1., 2.],
[ 2., 3.],
[ 3., 4.]])
This would be a linear 4 node network with links from node 1 to node 2 and so on..
What would be the best way to convert this information to an array of the following format?
array([[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.],
[ 0., 0., 0., 0.]])
The column numbers then represent the "to nodes" and the rows the "from nodes".
Another example would be:
array([[ 1., 2.],
[ 2., 3.],
[ 2., 4.]])
giving
array([[ 0., 1., 0., 0.],
[ 0., 0., 1., 1.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
Node ids should be integers. Also the rows and columns in numpy are numbered from zero, so we have to substract one in each dimension:
import numpy as np
conns = np.array([[ 1, 2],
[ 2, 3],
[ 3, 4]])
net = np.zeros((conns.max(), conns.max()), dtype=int)
# two possibilities:
# if you need the number of connections:
for conn in conns:
net[conn[0]-1, conn[1]-1] += 1
# if you just need a 1 for existing connection(s):
net[conns[:,0]-1, conns[:,1]-1] = 1

multiple condition in fancy indexing

I am new to python and am trying to some simple classification on raster image.
Basically, I am reading a TIF image as a 2D array and do some calculating and manipulation on it. For classification part, I am trying to create 3 empty arrays for land, water, and clouds. These classes will be assigned a value of 1 under multiple conditions, and eventually assigning these classes as landclass=1, waterclass=2, cloudclass=3 respectively.
apparently I can assign all values in an array to 1 under one condition
like this:
crop = gdal.Open(crop,GA_ReadOnly)
crop = crop.ReadAsArray()
rows,cols = crop.shape
mode = int(stats.mode(crop, axis=None)[0])
water = np.empty(shape(row,cols),dtype=float32)
land = water
clouds = water
than I have something like this (output):
>>> land
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
>>> land[water==0]=1
>>> land
array([[ 0., 0., 0., ..., 1., 1., 1.],
[ 0., 0., 0., ..., 1., 1., 1.],
[ 0., 0., 0., ..., 1., 1., 1.],
...,
[ 1., 1., 1., ..., 0., 0., 0.],
[ 1., 1., 1., ..., 0., 0., 0.],
[ 1., 1., 1., ..., 0., 0., 0.]], dtype=float32)
>>> land[crop>mode]=1
>>> land
array([[ 0., 0., 0., ..., 1., 1., 1.],
[ 0., 0., 0., ..., 1., 1., 1.],
[ 0., 0., 0., ..., 1., 1., 1.],
...,
[ 1., 1., 1., ..., 0., 0., 0.],
[ 1., 1., 1., ..., 0., 0., 0.],
[ 1., 1., 1., ..., 0., 0., 0.]], dtype=float32)
But how can I have the values in "land" equal to 1 under a couple of conditions without altering the shape of the array?
I tried to do this
land[water==0,crop>mode]=1
and I got ValueError. And I tried this
land[water==0 and crop>mode]=1
and python asks me to use a.all() or a.all()....
For only one condition, the result is exactly what I want, and I have to do it in order to get the result. eg (this is what I have in my actual code):
water[band6 < b6_threshold]=1
water[band7 < b7_threshold_1]=1
water[band6 > b6_threshold]=1
water[band7 < b7_threshold_2]=1
land[band6 > b6_threshold]=1
land[band7 > b7_threshold_2]=1
land[clouds == 1]=1
land[water == 1]=1
land[b1b4 < 0.5]=1
land[band3 < 0.1)]=1
clouds[land == 0]=1
clouds[water == 0]=1
clouds[band6 < (b6_mode-4)]=1
I found this is a bit confusing and I would like to combine all conditions within one statement... Any suggestion on that?
Thank you very much!
You can multiply the boolean arrays for something like "and":
>>> import numpy as np
>>> a = np.array([1,2,3,4])
>>> a[(a > 1) * (a < 3)] = 99
>>> a
array([ 1, 99, 3, 4])
And you can add them for something like "or":
>>> a[(a > 1) + (a < 3)] = 123
>>> a
array([123, 123, 123, 123])
Alternatively, if you prefer to think of boolean logic rather than True and False being 0 and 1, you can also use the operators & and | to the same effect.

Categories

Resources