Vectorize the distribution of a 2D matrix into a 3D matrix - python

I have a 2D matrix K with shape (j, k) and I'd like to turn it into a 3D one Q by distributing the vectors according to another vector d (length i) which would match the new first dimension.
I have already done that with a loop, but the routine is called several times and the overall process takes too long:
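def shift(arr, num, init=0):
    # Push the rows of arr down by num + init positions, zero-filling the top
    result = np.zeros((arr.shape[0], arr.shape[1]), dtype=np.int32)
    offset = num + init
    if offset == 0:
        # arr[:-0] would be empty, so handle the no-shift case explicitly
        return arr.copy()
    result[offset:, :] = arr[:-offset, :]
    return result

q = np.zeros((i, j, k), dtype=np.int32)
q[0, :, :] = K
for col in range(len(d)):
    q[:, col, :] = shift(q[:, col, :], int(d[col]))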
Here I copy the initial matrix K into the first slice of the 3D matrix Q (q[0]), which is created with np.zeros and the final dimensions (d.shape + K.shape). Afterwards, for each index along dimension 2, I "push" that slice along the first dimension by the number of positions given in vector d.
Any ideas to avoid that for loop?

For the case in your comment:
In [853]: v = [4, 6, 3, 8]; d = [9, 2, 3, 1]
In [865]: q = np.zeros((10,4), int)
In [866]: q[d,np.arange(4)]=v
In [867]: q
Out[867]:
array([[0, 0, 0, 0],
[0, 0, 0, 8],
[0, 6, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[4, 0, 0, 0]])
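The same advanced-indexing idea appears to carry over to the 2D case in the question. The following is a minimal sketch, assuming d holds one non-negative offset per row of K (so it has length j) and that the depth n of Q is at least d.max() + 1. The names n, rows, cols and expected, and the sample values, are introduced here for illustration and are not from the original post.

```python
import numpy as np

# Illustrative inputs (assumed, not from the original post)
K = np.arange(1, 13, dtype=np.int32).reshape(3, 4)   # shape (j, k) = (3, 4)
d = np.array([2, 0, 4])                               # one offset per row of K
n = int(d.max()) + 1                                  # assumed depth of Q

j, k = K.shape
rows = np.arange(j)[:, None]   # shape (j, 1)
cols = np.arange(k)            # shape (k,)

# d[:, None], rows and cols broadcast to shape (j, k), so that
# q[d[a], a, b] = K[a, b] for every a and b, without a Python loop.
q = np.zeros((n, j, k), dtype=K.dtype)
q[d[:, None], rows, cols] = K

# Loop-based reference, for comparison only
expected = np.zeros_like(q)
for a in range(j):
    expected[d[a], a, :] = K[a, :]

print(np.array_equal(q, expected))
```

Because each (a, b) pair is written exactly once, there are no duplicate indices and the assignment order does not matter.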

Related

How to keep a fixed size of unique values in random positions in an array while replacing others with a mask?

This may be a very simple question, as I am still exploring Python. I am using NumPy for this.
Updated 09/30/21: I adopted and modified the code shown below, for future reference. I also added an elif branch in the loop for classes that have fewer occurrences than the wanted size. Some of the code may be unnecessary.
new_array = test_array.copy()
uniques, counts = np.unique(new_array, return_counts=True)
print("classes:", uniques, "counts:", counts)
for unique, count in zip(uniques, counts):
    #print (unique, count)
    if unique != 0 and count > 3:
        ids = np.random.choice(count, count-3, replace=False)
        new_array[tuple(i[ids] for i in np.where(new_array == unique))] = 0
    elif unique != 0 and count <= 3:
        ids = np.random.choice(count, count, replace=False)
        new_array[tuple(i[ids] for i in np.where(new_array == unique))] = unique
Below is original question.
Let's say I have a 2D array like this:
test_array = np.array([[0,0,0,0,0],
[1,1,1,1,1],
[0,0,0,0,0],
[2,2,2,4,4],
[4,4,4,2,2],
[0,0,0,0,0]])
print("existing classes:", np.unique(test_array))
# "existing classes: [0 1 2 4]"
Now I want to keep a fixed number of values (e.g. 2) in each class that != 0 (in this case two 1s, two 2s, and two 4s) and replace the rest with 0. The positions that get replaced should be random on each run (or determined by a seed).
For example, with run 1 I will have
([[0,0,0,0,0],
[1,0,0,1,0],
[0,0,0,0,0],
[2,0,0,0,4],
[4,0,0,2,0],
[0,0,0,0,0]])
with another run it might be
([[0,0,0,0,0],
[1,1,0,0,0],
[0,0,0,0,0],
[2,0,2,0,4],
[4,0,0,0,0],
[0,0,0,0,0]])
etc. Could anyone help me with this?
My strategy is:
1. Create a new array initialized to all zeros.
2. Find the elements in each class.
3. For each class:
   a. Randomly sample two of the elements to keep.
   b. Set those elements of the new array to the class value.
The trick is keeping the shape of the indexes appropriate so you retain the shape of the original array.
import numpy as np
test_array = np.array([[0,0,0,0,0],
[1,1,1,1,1],
[0,0,0,0,0],
[2,2,2,4,4],
[4,4,4,2,2],
[0,0,0,0,0]])
def sample_classes(arr, n_keep=2, random_state=42):
    # Use the function argument here, not the global test_array
    classes, counts = np.unique(arr, return_counts=True)
    rng = np.random.default_rng(random_state)
    out = np.zeros_like(arr)
    for klass, count in zip(classes, counts):
        # Find locations of the class elements
        indexes = np.nonzero(arr == klass)
        # Sample up to n_keep elements of the class (fewer if the class is small)
        keep_idx = rng.choice(count, min(n_keep, count), replace=False)
        # Select the kept elements and reformat for indexing the output array and retaining its shape
        keep_idx_reshape = tuple(ind[keep_idx] for ind in indexes)
        out[keep_idx_reshape] = klass
    return out
You can use it like
In [3]: sample_classes(test_array)
Out[3]:
array([[0, 0, 0, 0, 0],
[0, 1, 1, 0, 0],
[0, 0, 0, 0, 0],
[2, 0, 0, 4, 0],
[4, 0, 0, 2, 0],
[0, 0, 0, 0, 0]])
In [4]: sample_classes(test_array, n_keep=3)
Out[4]:
array([[0, 0, 0, 0, 0],
[1, 0, 1, 1, 0],
[0, 0, 0, 0, 0],
[0, 2, 0, 4, 0],
[4, 4, 0, 2, 2],
[0, 0, 0, 0, 0]])
In [5]: sample_classes(test_array, random_state=88)
Out[5]:
array([[0, 0, 0, 0, 0],
[0, 0, 1, 1, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[4, 0, 4, 2, 2],
[0, 0, 0, 0, 0]])
In [6]: sample_classes(test_array, random_state=88, n_keep=4)
Out[6]:
array([[0, 0, 0, 0, 0],
[0, 1, 1, 1, 1],
[0, 0, 0, 0, 0],
[2, 2, 0, 4, 4],
[4, 4, 0, 2, 2],
[0, 0, 0, 0, 0]])
Here is my not-so-elegant solution:
def unique(arr, num=2, seed=None):
    np.random.seed(seed)
    vals = {}
    for i, row in enumerate(arr):
        for j, val in enumerate(row):
            if val in vals and val != 0:
                vals[val].append((i, j))
            elif val != 0:
                vals[val] = [(i, j)]
    new = np.zeros_like(arr)
    for val in vals:
        np.random.shuffle(vals[val])
        while len(vals[val]) > num:
            vals[val].pop()
        for row, col in vals[val]:
            new[row,col] = val
    return new
The following should be O(n log n) in array size
def keep_k_per_class(data,k,rng):
    out = np.zeros_like(data)
    unq,cnts = np.unique(data,return_counts=True)
    assert (cnts >= k).all()
    # calculate class boundaries from class sizes
    CNTS = cnts.cumsum()
    # indirectly group classes together by partial sorting
    idx = data.ravel().argpartition(CNTS[:-1])
    # the following lines implement simultaneous drawing without replacement
    # from all classes
    # lower boundaries of intervals to draw random numbers from
    # for each class they start with the lower class boundary
    # and from there grow one by one - together with the
    # swapping out below this implements "without replacement"
    lb = np.add.outer(np.arange(k),CNTS-cnts)
    pick = rng.integers(lb,CNTS,lb.shape)
    for l,p in zip(lb,pick):
        # populate output array
        out.ravel()[idx[p]] = unq
        # swap out used indices so still available ones occupy a linear
        # range (per class)
        idx[p] = idx[l]
    return out
Examples:
>>> rng = np.random.default_rng()
>>>
>>> keep_k_per_class(test_array,2,rng)
array([[0, 0, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[2, 0, 2, 0, 4],
[0, 4, 0, 0, 0],
[0, 0, 0, 0, 0]])
>>> keep_k_per_class(test_array,2,rng)
array([[0, 0, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 2, 0, 0, 0],
[4, 0, 4, 0, 2],
[0, 0, 0, 0, 0]])
and a large one
>>> BIG = np.add.outer(np.tile(test_array,(100,100)),np.arange(0,500,5))
>>> BIG.size
30000000
>>> res = keep_k_per_class(BIG,30,rng)
### takes ~4 sec
### check
>>> np.unique(np.bincount(res.ravel()),return_counts=True)
(array([ 0, 30, 29988030]), array([100, 399, 1]))
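For comparison, here is a loop-free variant built on a different technique from the answers above: every element gets a random key, the elements are sorted by class and then by key, and the first n_keep elements of each class are kept. This is only a sketch; the function name keep_n_random_keys and its parameters are hypothetical, and it assumes that class 0 is the background and that classes with fewer than n_keep members should be kept whole.

```python
import numpy as np

def keep_n_random_keys(arr, n_keep=2, seed=None):
    """Keep up to n_keep randomly chosen elements per non-zero class (sketch)."""
    rng = np.random.default_rng(seed)
    flat = arr.ravel()
    keys = rng.random(flat.size)
    # lexsort uses the LAST key as the primary one: sort by class value,
    # then by random key, which shuffles the elements within each class.
    order = np.lexsort((keys, flat))
    sorted_vals = flat[order]
    # Positions in the sorted array where a new class starts
    starts = np.flatnonzero(np.r_[True, sorted_vals[1:] != sorted_vals[:-1]])
    run_lengths = np.diff(np.r_[starts, flat.size])
    # Rank of each element within its own class (0, 1, 2, ...)
    rank = np.arange(flat.size) - np.repeat(starts, run_lengths)
    keep = order[(rank < n_keep) & (sorted_vals != 0)]
    out = np.zeros_like(flat)
    out[keep] = flat[keep]
    return out.reshape(arr.shape)

test_array = np.array([[0, 0, 0, 0, 0],
                       [1, 1, 1, 1, 1],
                       [0, 0, 0, 0, 0],
                       [2, 2, 2, 4, 4],
                       [4, 4, 4, 2, 2],
                       [0, 0, 0, 0, 0]])
res = keep_n_random_keys(test_array, n_keep=2, seed=0)
print(res)
```

The sort makes this O(n log n) in the array size, and the work does not grow with the number of classes the way a per-class Python loop does.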

How can I add a 1D array to a segment of a 2D array?

I have a 2D NumPy array filled with zeroes (placeholder values). I would like to add a 1D array filled with ones and zeroes to a part of it, e.g.:
2D array:
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
1D array:
array([1, 0, 1])
Desired end product: I want the 1D array to start at position [2, 1] (column 2 of row 1, counting from 0):
array([[0, 0, 0, 0, 0],
[0, 0, 1, 0, 1],
[0, 0, 0, 0, 0]])
Or an insertion in any other position it could reasonably fit in. I have tried to do it with boolean masks but have not had any luck creating one in the correct shape. I have also tried flattening the 2D array, but couldn't figure out how to replace the values in the correct space.
You can indeed flatten the array and create a sequence of positions where you will insert your 1D array segment (here x is the 2D array and segment is the 1D array):
>>> pos = [1, 2]
>>> start = x.shape[1]*pos[0] + pos[1]
>>> seq = start + np.arange(len(segment))
>>> seq
array([7, 8, 9])
Then, you can either index the flattened array:
>>> x_f = x.flatten()
>>> x_f[seq] = segment
>>> x_f.reshape(x.shape)
array([[0, 0, 0, 0, 0],
[0, 0, 1, 0, 1],
[0, 0, 0, 0, 0]])
Alternatively, you can use np.ravel_multi_index to get seq and apply np.unravel_index to it:
>>> seq = np.arange(len(segment)) + np.ravel_multi_index(pos, x.shape)
>>> seq
array([7, 8, 9])
>>> indices = np.unravel_index(seq, x.shape)
>>> indices
(array([1, 1, 1]), array([2, 3, 4]))
>>> x[indices] = segment
>>> x
array([[0, 0, 0, 0, 0],
[0, 0, 1, 0, 1],
[0, 0, 0, 0, 0]])
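When the segment fits inside a single row, plain slice assignment may be all that is needed. A minimal sketch, assuming x and segment are the arrays from the question and that row and col (names introduced here) give the 0-based start position:

```python
import numpy as np

x = np.zeros((3, 5), dtype=int)
segment = np.array([1, 0, 1])

row, col = 1, 2                             # 0-based start position
x[row, col:col + len(segment)] = segment    # write the segment into part of one row
print(x)
# [[0 0 0 0 0]
#  [0 0 1 0 1]
#  [0 0 0 0 0]]
```

Unlike the flattened-index approach, this does not wrap around to the next row: if the segment is longer than the space left in the row, NumPy raises a shape-mismatch error.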

Assign a tensor to multiple slices

Let
a = tensor([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]])
b = torch.tensor([1, 2])
c = tensor([[1, 2, 0, 0],
[0, 1, 2, 0],
[0, 0, 1, 2]])
Is there a way to obtain c by assigning b to slices of a without any loops? That is, a[indices] = b for some indices or something similar?
You can use the scatter_ method in PyTorch.
a = torch.tensor([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]])
b = torch.tensor([1, 2])
index = torch.tensor([[0,1],[1,2],[2,3]])
a.scatter_(1, index, b.view(-1,2).repeat(3,1))
# tensor([[1, 2, 0, 0],
# [0, 1, 2, 0],
# [0, 0, 1, 2]])
The logic behind this operation is a bit iffy, in the sense that it is not clear what the parameters of the operation are.
However, one way of obtaining the desired output from the input with vectorized operations only is:
1. determine how many rows are needed (3 for your example);
2. create an array a with the chosen number of rows (3) and enough columns that b is followed by as many zeros as there are rows (2 + 3);
3. assign b to the beginning of each row of a;
4. flatten the array, cut the last num_rows zeros, and reshape to the target shape.
In NumPy, this can be implemented as:
import numpy as np
b = np.array([1, 2])
c = np.array([[1, 2, 0, 0],
[0, 1, 2, 0],
[0, 0, 1, 2]])
num_rows = 3
a = np.zeros((num_rows, len(b) + num_rows), dtype=b.dtype)
a[:, :len(b)] = b
a = a.ravel()[:-num_rows].reshape((num_rows, len(b) + num_rows - 1))
print(a)
# [[1 2 0 0]
# [0 1 2 0]
# [0 0 1 2]]
print(np.all(a == c))
# True
EDIT
The same approach implemented in Torch:
import torch as to
b = to.tensor([1, 2])
c = to.tensor([[1, 2, 0, 0],
[0, 1, 2, 0],
[0, 0, 1, 2]])
num_rows = 3
a = to.zeros((num_rows, len(b) + num_rows), dtype=b.dtype)
a[:, :len(b)] = b
a = a.flatten()[:-num_rows].reshape((num_rows, len(b) + num_rows - 1))
print(a)
# tensor([[1, 2, 0, 0],
# [0, 1, 2, 0],
# [0, 0, 1, 2]])
print(to.all(a == c))
# tensor(1, dtype=torch.uint8)
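The question asks for something of the form a[indices] = b. One way to build such indices is with broadcasted integer arrays. The sketch below uses NumPy only. The names rows and cols are introduced here, and it assumes that row r should receive b starting at column r, as in the example.

```python
import numpy as np

b = np.array([1, 2])
num_rows = 3

a = np.zeros((num_rows, num_rows + len(b) - 1), dtype=b.dtype)
rows = np.arange(num_rows)[:, None]     # shape (3, 1)
cols = rows + np.arange(len(b))         # shape (3, 2): [[0, 1], [1, 2], [2, 3]]
a[rows, cols] = b                       # b is broadcast across the three rows
print(a)
# [[1 2 0 0]
#  [0 1 2 0]
#  [0 0 1 2]]
```

Here cols plays the same role as the index tensor passed to scatter_ above, but it is computed from num_rows and len(b) instead of being written out by hand.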

How to efficiently unroll a matrix by value with numpy?

I have a matrix M with values 0 through N within it. I'd like to unroll this matrix to create a new matrix A where each submatrix A[i, :, :] represents whether or not M == i.
The solution below uses a loop.
# Example Setup
import numpy as np
np.random.seed(0)
N = 5
M = np.random.randint(0, N, size=(5,5))
# Solution with Loop
A = np.zeros((N, M.shape[0], M.shape[1]))
for i in range(N):
    A[i, :, :] = M == i
This yields:
M
array([[4, 0, 3, 3, 3],
[1, 3, 2, 4, 0],
[0, 4, 2, 1, 0],
[1, 1, 0, 1, 4],
[3, 0, 3, 0, 2]])
M.shape
# (5, 5)
A
array([[[0, 1, 0, 0, 0],
[0, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[0, 0, 1, 0, 0],
[0, 1, 0, 1, 0]],
...
[[1, 0, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0]]])
A.shape
# (5, 5, 5)
Is there a faster way, or a way to do it in a single numpy operation?
Broadcasted comparison is your friend:
B = (M[None, :] == np.arange(N)[:, None, None]).view(np.int8)
np.array_equal(A, B)
# True
The idea is to expand the dimensions in such a way that the comparison can be broadcasted in the manner desired.
As pointed out by @Alex Riley in the comments, you can use np.equal.outer to avoid having to do the indexing yourself:
B = np.equal.outer(np.arange(N), M).view(np.int8)
np.array_equal(A, B)
# True
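To make the broadcasting step concrete, the sketch below spells out the intermediate shapes and checks the result against the loop from the question. The names left, right, mask, B_view and B_copy are introduced here for illustration.

```python
import numpy as np

np.random.seed(0)
N = 5
M = np.random.randint(0, N, size=(5, 5))

left = M[None, :]                       # shape (1, 5, 5)
right = np.arange(N)[:, None, None]     # shape (N, 1, 1)
mask = left == right                    # broadcasts to (N, 5, 5), dtype bool

# .view(np.int8) reinterprets the 1-byte booleans without copying;
# .astype(np.int8) makes a converted copy and does not rely on the item size.
B_view = mask.view(np.int8)
B_copy = mask.astype(np.int8)

# Loop-based reference from the question
A = np.zeros((N, M.shape[0], M.shape[1]))
for i in range(N):
    A[i, :, :] = M == i

print(mask.shape, np.array_equal(A, B_view), np.array_equal(A, B_copy))
```

Since every entry of M equals exactly one value in range(N), summing mask over its first axis gives an array of ones, which is a quick sanity check for this kind of one-hot expansion.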
You can make use of some broadcasting here (note that this relies on M having exactly N rows, as in the example):
P = np.arange(N)
Y = np.broadcast_to(P[:, None], M.shape)
T = np.equal(M, Y[:, None]).astype(int)
Alternative using indices:
X, Y = np.indices(M.shape)
Z = np.equal(M, X[:, None]).astype(int)
You can index into the identity matrix like so
A = np.identity(N, int)[:, M]
or so
A = np.identity(N, int)[M.T].T
Or use the new (v1.15.0) put_along_axis
A = np.zeros((N,5,5), int)
np.put_along_axis(A, M[None], 1, 0)
Note if N is much larger than 5 then creating an NxN identity matrix may be considered wasteful. We can mitigate this using stride tricks:
def read_only_identity(N, dtype=float):
    z = np.zeros(2*N-1, dtype)
    s, = z.strides
    z[N-1] = 1
    return np.lib.stride_tricks.as_strided(z[N-1:], (N, N), (-s, s))
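The helper can then stand in for np.identity in the indexing expression above. A minimal usage sketch, repeating the helper so that the block is self-contained. The name eye is introduced here, and the strided view should only be read from, since its rows share memory.

```python
import numpy as np

def read_only_identity(N, dtype=float):
    # One buffer of 2*N - 1 elements with a single 1 in the middle;
    # row i of the view starts one element earlier than row i - 1.
    z = np.zeros(2 * N - 1, dtype)
    s, = z.strides
    z[N - 1] = 1
    return np.lib.stride_tricks.as_strided(z[N - 1:], (N, N), (-s, s))

np.random.seed(0)
N = 5
M = np.random.randint(0, N, size=(5, 5))

eye = read_only_identity(N, int)    # backed by only 2*N - 1 elements
A = eye[:, M]                       # advanced indexing copies into a regular array
print(A.shape)
```

The result of eye[:, M] is an ordinary array of shape (N,) + M.shape, so it can be modified safely even though eye itself should not be.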

How to toggle theano matrix based on vector of int position

Using Theano tensor operations, how can I toggle one cell in each row of a matrix, based on an integer position given by the corresponding entry of a vector (i.e. |v| = number of rows of the matrix)? For example, given a 100x5 matrix of zeros
M = [
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
...
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]
] # |M| = 100x5
and a 100-element vector of integers in the range [0, 4],
V = [2, 4, ..., 0, 2] # |V| = 100, max(V) = 4, min(V) = 0
update (or create another) matrix M to
M = [
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
...
[1, 0, 0, 0, 0],
[0, 0, 1, 0, 0]
] # |M| = 100x5
(I know how to do this iteratively using conventional code, but I want to run it as part of an algorithm on GPU without complicating my input, which is currently vector V, so a direct Theano implementation would be great.)
I figured out the answer myself. This operation is known as one-hot and it is supported as the "to_one_hot" in Theano's extra_ops package. Code:
M_one_hot = theano.tensor.extra_ops.to_one_hot(V, 5, dtype='int32')
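For readers without Theano at hand, the operation itself can be illustrated in plain NumPy. This is only a sketch of what a one-hot encoding computes, not Theano code and not a GPU implementation. V is shortened to four illustrative entries, and n_cols is a name introduced here.

```python
import numpy as np

V = np.array([2, 4, 0, 2])      # illustrative positions (the question uses 100 of them)
n_cols = 5

M_one_hot = np.zeros((len(V), n_cols), dtype=np.int32)
M_one_hot[np.arange(len(V)), V] = 1     # in row r, set column V[r]
print(M_one_hot)
# [[0 0 1 0 0]
#  [0 0 0 0 1]
#  [1 0 0 0 0]
#  [0 0 1 0 0]]
```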
