I have two square arrays, each with shape (25, 25). I want to check whether an entire row is filled with zeros and whether the corresponding column is also filled with zeros. If so, I want to remove that row and column from the array.
For example:
array = np.array([[1, 0, 1, 1],
[0, 0, 0, 0],
[1, 0, 1, 1],
[1, 0, 1, 1]])
I want it manipulated to
array=np.array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
I hope it is clear what I am aiming at. In this example the second row and the second column have been removed because they contain only zeros.
I could do this by iterating over all of the arrays, but since I have 10 million of them I would like a Pythonic, efficient way to solve this.
The second array is a TensorFlow array; manipulating it should be no problem once I know the indices of the rows and columns I want removed.
Edit:
I have now found the following solution, but it uses for loops:
def removepadding(y_true, y_pred):
    shape = np.shape(y_true)
    y_true_cleaned = []
    for i in range(shape[0]):
        x = y_true[i]
        for n in range(shape[1] - 1, -1, -1):
            if sum(x[n, :]) == 0 and sum(x[:, n]) == 0:
                x = np.delete(np.delete(x, n, 0), n, 1)
        y_true_cleaned.append(x)
    return y_true_cleaned
You can do it in one line:
array[array.any(axis = 1)][:, array.any(axis = 0)]
#array([[1, 1, 1],
# [1, 1, 1],
# [1, 1, 1]])
If there are negative values in the array, np.sum may fail, because positive and negative entries can cancel to zero.
For a 2D array:
import numpy as np
a = np.array([[1,0,2,3,0,4],
[0,0,0,0,0,0],
[0,0,0,0,0,0],
[2,0,3,4,0,5],
[3,0,4,5,0,6],
[4,0,5,6,0,7],
[5,0,6,7,0,8]])
row = np.all(a==0, axis=1)
col = np.all(a==0, axis=0)
a[~row][:,~col]
Output:
array([[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6],
[4, 5, 6, 7],
[5, 6, 7, 8]])
For a 3D array:
a = np.ones((3,3,3))
a[1,:,1] = 0
a[1,1,:] = 0
a[:,1,1] = 0
z = np.all(a==0, axis=2)
y = np.all(a==0, axis=1)
x = np.all(a==0, axis=0)
Z = ~np.array([z]*a.shape[2])
Y = ~np.array([y]*a.shape[1])
X = ~np.array([x]*a.shape[0])
ZZ, YY, XX = (Z*Y*X).nonzero()
a[ZZ, YY, XX]
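To illustrate the caveat about negative values, here is a minimal sketch. The array b is a made-up example, not taken from the question. A row whose entries cancel sums to zero even though it is not empty, whereas comparing each entry against zero is unaffected.

```python
import numpy as np

# Hypothetical example: row 0 is not empty, but its entries cancel out.
b = np.array([[1, -1, 0],
              [0,  0, 0],
              [2,  0, 3]])

rows_by_sum = b.sum(axis=1) != 0      # wrongly treats row 0 as empty
rows_by_any = (b != 0).any(axis=1)    # tests each entry, so row 0 is kept

print(rows_by_sum.tolist())  # [False, False, True]
print(rows_by_any.tolist())  # [True, False, True]
```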
You can use np.count_nonzero to get the indices in one step per dimension:
nnz_row = np.count_nonzero(array, axis=1)
nnz_col = np.count_nonzero(array, axis=0)
Now you make a mask of where both are zero:
mask = (nnz_row == 0) & (nnz_col == 0)
You can turn the mask into indices and pass it to np.delete:
ind = np.flatnonzero(mask)
array = np.delete(np.delete(array, ind, axis=0), ind, axis=1)
Alternatively, you can compute the positive mask:
pmask = nnz_row.astype(bool) | nnz_col.astype(bool)
This mask can select directly, analogously to what delete did with the negative mask:
array = array[pmask, :][:, pmask]
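For the batch described in the question, the positive mask can be computed for all arrays at once. The following is only a sketch under assumptions: the helper name remove_zero_padding and the input layout (N, K, K) are invented for illustration. The mask computation is vectorized, but the final selection still loops over the arrays, because the trimmed results can have different sizes and so cannot be stacked.

```python
import numpy as np

def remove_zero_padding(batch):
    """Drop index i from each square array when row i and column i are both all zero.

    `batch` is assumed to have shape (N, K, K). Returns a list of 2D arrays.
    """
    batch = np.asarray(batch)
    # keep[n, i] is True if row i or column i of array n has a nonzero entry
    keep = batch.any(axis=2) | batch.any(axis=1)
    return [x[np.ix_(k, k)] for x, k in zip(batch, keep)]

batch = np.array([
    [[1, 0, 1, 1],
     [0, 0, 0, 0],
     [1, 0, 1, 1],
     [1, 0, 1, 1]],
    [[0, 0, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]],
])
cleaned = remove_zero_padding(batch)
print(cleaned[0].shape)     # (3, 3)
print(cleaned[1].tolist())  # [[0, 2], [0, 0]]
```

In the second array, index 1 survives because its row is nonzero and index 2 survives because its column is nonzero, so both the row and the column are kept for each.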
Edit: Thanks to @Mad Physicist, we can use np.flatnonzero. Here's the 2D case:
import numpy as np
a=np.array([[1,0,2,3,0,4],
[0,0,0,0,0,0],
[0,0,0,0,0,0],
[2,0,3,4,0,5],
[3,0,4,5,0,6],
[4,0,5,6,0,7],
[5,0,6,7,0,8]])
cols_to_keep = np.flatnonzero(a.sum(axis=0))
rows_to_keep = np.flatnonzero(a.sum(axis=1))
a = a[:, cols_to_keep]
a = a[rows_to_keep, :]
a
>>>
array([[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6],
[4, 5, 6, 7],
[5, 6, 7, 8]])
Here's the 3d case:
import numpy as np
a=np.array([
[[1,0,2,3,0,4],
[0,0,0,0,0,0],
[0,0,0,0,0,0],
[2,0,3,4,0,5],
[3,0,4,5,0,6],
[4,0,5,6,0,7],
[5,0,6,7,0,8]],
[[0,0,0,0,0,0],
[0,0,0,0,0,0],
[0,0,0,0,0,0],
[0,0,0,0,0,0],
[0,0,0,0,0,0],
[0,0,0,0,0,0],
[0,0,0,0,0,0]],
[[5,0,5,5,0,5],
[0,0,0,0,0,0],
[0,0,0,0,0,0],
[2,0,3,4,0,5],
[3,0,4,5,0,6],
[4,0,5,6,0,7],
[5,0,6,7,0,8]],
])
ix_keep_axis_0 = np.flatnonzero(a.sum(axis=(1, 2)))
ix_keep_axis_1 = np.flatnonzero(a.sum(axis=(0, 2)))
ix_keep_axis_2 = np.flatnonzero(a.sum(axis=(0, 1)))
a = a[ix_keep_axis_0, :, :]
a = a[:, ix_keep_axis_1, :]
a = a[:, :, ix_keep_axis_2]
a
>>>
array([[[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6],
[4, 5, 6, 7],
[5, 6, 7, 8]],
[[5, 5, 5, 5],
[2, 3, 4, 5],
[3, 4, 5, 6],
[4, 5, 6, 7],
[5, 6, 7, 8]]])
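The per-axis pattern above can be written once for any number of dimensions. This is a sketch, not part of the original answer: trim_zero_slices is a made-up name, and it tests entries with any rather than summing them.

```python
import numpy as np

def trim_zero_slices(a):
    """Remove all-zero slices along every axis of an N-d array (hypothetical helper)."""
    a = np.asarray(a)
    for axis in range(a.ndim):
        other_axes = tuple(i for i in range(a.ndim) if i != axis)
        # indices along `axis` whose slice contains at least one nonzero entry
        keep = np.flatnonzero(a.any(axis=other_axes))
        a = np.take(a, keep, axis=axis)
    return a

a = np.zeros((3, 4, 5), dtype=int)
a[0, 1, 2] = 7
a[2, 3, 4] = 9
trimmed = trim_zero_slices(a)
print(trimmed.shape)  # (2, 2, 2)
```

Trimming one axis only removes zeros, so it does not change which slices along the other axes are empty, and the order of the axes does not matter.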
I want to implement the following operation.
Given a tensor,
m = ([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
How can I remove the row with value [2, 2, 2] from m?
You can do that like this:
import tensorflow as tf
def remove_row(m, q):
    # Assumes m is 2D
    mask = tf.math.reduce_any(tf.not_equal(m, q), axis=-1)
    return tf.boolean_mask(m, mask)
# Test
m = tf.constant([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
q = tf.constant([2, 2, 2])
tf.print(remove_row(m, q))
# [[1 1 1]
# [3 3 3]]
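For comparison, the same mask logic can be sketched in NumPy. This is an analogue for illustration only, not part of the TensorFlow answer, and the name remove_row_np is made up. A row is kept when at least one of its elements differs from q, so a row that matches q only partially survives.

```python
import numpy as np

def remove_row_np(m, q):
    # Assumes m is 2D and q has the same length as a row of m
    mask = (m != q).any(axis=-1)   # True for rows that differ from q somewhere
    return m[mask]

m = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3], [2, 2, 5]])
q = np.array([2, 2, 2])
result = remove_row_np(m, q)
print(result.tolist())  # [[1, 1, 1], [3, 3, 3], [2, 2, 5]]
```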
I am trying to generate a vector-matrix outer product (tensor) using PyTorch. Assuming the vector v has size p and the matrix M has size qXr, the result of the product should be pXqXr.
Example:
#size: 2
v = [0, 1]
#size: 2X3
M = [[0, 1, 2],
[3, 4, 5]]
#size: 2X2X3
v*M = [[[0, 0, 0],
[0, 0, 0]],
[[0, 1, 2],
[3, 4, 5]]]
For two vectors v1 and v2, I can use torch.bmm(v1.view(1, -1, 1), v2.view(1, 1, -1)). This can be easily extended to a batch of vectors. However, I have not been able to find a solution for the vector-matrix case. I also need to perform this operation on batches of vectors and matrices.
You can use torch.einsum operator:
torch.einsum('bp,bqr->bpqr', v, M) # batch-wise operation v.shape=(b,p) M.shape=(b,q,r)
torch.einsum('p,qr->pqr', v, M) # cross-batch operation
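As a quick check of the subscript strings, here is a sketch that uses np.einsum instead of torch.einsum so it runs without PyTorch. The subscript strings are the ones from the answer, the single-pair data is the example from the question, and the batched data is made up.

```python
import numpy as np

v = np.array([0, 1])                 # shape (p,) = (2,)
M = np.array([[0, 1, 2],
              [3, 4, 5]])            # shape (q, r) = (2, 3)

# out[p, q, r] = v[p] * M[q, r]
out = np.einsum('p,qr->pqr', v, M)   # shape (2, 2, 3)
print(out.tolist())
# [[[0, 0, 0], [0, 0, 0]], [[0, 1, 2], [3, 4, 5]]]

# Batched version: one vector and one matrix per batch entry (made-up data)
vb = np.stack([v, v + 1])            # shape (b, p) = (2, 2)
Mb = np.stack([M, 2 * M])            # shape (b, q, r) = (2, 2, 3)
out_b = np.einsum('bp,bqr->bpqr', vb, Mb)
print(out_b.shape)                   # (2, 2, 2, 3)
```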
I was able to do it with the following code.
Single vector and matrix
v = torch.arange(3)
M = torch.arange(8).view(2, 4)
# v: tensor([0, 1, 2])
# M: tensor([[0, 1, 2, 3],
# [4, 5, 6, 7]])
torch.mm(v.unsqueeze(1), M.view(1, 2*4)).view(3,2,4)
tensor([[[ 0, 0, 0, 0],
[ 0, 0, 0, 0]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 0, 2, 4, 6],
[ 8, 10, 12, 14]]])
For a batch of vectors and matrices, it can be easily extended using torch.bmm.
batch_size = 2  # example value
v = torch.arange(batch_size*2).view(batch_size, 2)
M = torch.arange(batch_size*3*4).view(batch_size, 3, 4)
torch.bmm(v.unsqueeze(2), M.view(-1, 1, 3*4)).view(-1, 2, 3, 4)
If [batch_size, z, x, y] is the shape of the target tensor, another solution is to build two tensors of this shape, with the appropriate elements in each position, and then multiply them elementwise. It also works with batches of vectors:
# input matrices
batch_size = 2
x1 = torch.Tensor([0,1])
x2 = torch.Tensor([[0,1,2],
[3,4,5]])
x1 = x1.unsqueeze(0).repeat((batch_size, 1))
x2 = x2.unsqueeze(0).repeat((batch_size, 1, 1))
# dimensions
b = x1.shape[0]
z = x1.shape[1]
x = x2.shape[1]
y = x2.shape[2]
# solution
mat1 = x1.reshape(b, z, 1, 1).repeat(1, 1, x, y)
mat2 = x2.reshape(b,1,x,y).repeat(1, z, 1, 1)
mat1*mat2
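The explicit repeat calls are not strictly needed, since broadcasting stretches size-1 axes implicitly. Below is a sketch of that idea in NumPy, chosen so it runs without PyTorch; the same indexing with None should also work on PyTorch tensors. The variable names mirror the answer above.

```python
import numpy as np

batch_size = 2
x1 = np.tile(np.array([0., 1.]), (batch_size, 1))            # shape (b, z)
x2 = np.tile(np.array([[0., 1., 2.],
                       [3., 4., 5.]]), (batch_size, 1, 1))   # shape (b, x, y)

# (b, z, 1, 1) * (b, 1, x, y) broadcasts to (b, z, x, y)
out = x1[:, :, None, None] * x2[:, None, :, :]
print(out.shape)  # (2, 2, 2, 3)
```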
import numpy as np
import itertools as it
SPIN_POS = np.array([[0, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 1],
[2, 2, 0], [3, 3, 0], [3, 2, 1], [2, 3, 1],
[2, 0, 2], [3, 1, 2], [3, 0, 3], [2, 1, 3],
[0, 2, 2], [1, 3, 2], [1, 2, 3], [0, 3, 3]
]) / 4
def gen_posvecs(xdim:int, ydim:int, zdim:int):
    """
    Generates position vectors of site pairs in the lattice of size xdim,ydim,zdim
    :param x,y,z is the number of unit cells in the x,y,z directions;
    :returns array containing the position vectors
    """
    poss = np.zeros((xdim,ydim,zdim,16,3))
    for x,y,z,s in it.product(range(xdim), range(ydim), range(zdim), range(16)):
        poss[x,y,z,s] = np.array([x,y,z]) + SPIN_POS[s]
    return poss
A = gen_posvecs(4,4,4) # A.shape = (4,4,4,16,3)
B = np.subtract.outer(A[...,-1], A) # my attempt at a soln
assert all(A[1,2,0,12] - A[0,1,3,11] == B[1,2,0,12,0,1,3,11]) # should give true
Consider the above code. I have an array A of shape (4,4,4,16,3), which represents 3D position vectors in a lattice (the last axis of dim 3 are the x,y,z coordinates). The first 4 dimensions index the site in the lattice.
What I want
I would like to generate from A, an array containing all possible separation vectors between sites in the lattice. This means an output array B, of shape (4,4,4,16,4,4,4,16,3). The first 4 dimensions being of site i, next 4 dimensions of site j, then the last dimension of the (x,y,z) coordinate of the position vector difference.
i.e., A[a,b,c,d]: shape (3,) is the (x,y,z) of first site; A[r,s,t,u]: shape (3,) is the (x,y,z) of second site; Then I want B[a,b,c,d,r,s,t,u] to be (x,y,z) difference between the first two.
My attempt
I know about the ufunc.outer function, as you can see in my attempt in code. But I'm stuck at applying it together with performing element-wise subtraction on the last axis (the (x,y,z)) of each A.
In my attempt, B has the correct dimensions I want, but it is obviously wrong. Any hints? (barring the use of any for-loops)
I think you just need to do:
B = (A[:, :, :, :, np.newaxis, np.newaxis, np.newaxis, np.newaxis] -
A[np.newaxis, np.newaxis, np.newaxis, np.newaxis])
In your code:
import numpy as np
import itertools as it
SPIN_POS = np.array([[0, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 1],
[2, 2, 0], [3, 3, 0], [3, 2, 1], [2, 3, 1],
[2, 0, 2], [3, 1, 2], [3, 0, 3], [2, 1, 3],
[0, 2, 2], [1, 3, 2], [1, 2, 3], [0, 3, 3]
]) / 4
def gen_posvecs(xdim:int, ydim:int, zdim:int):
    """
    Generates position vectors of site pairs in the lattice of size xdim,ydim,zdim
    :param x,y,z is the number of unit cells in the x,y,z directions;
    :returns array containing the position vectors
    """
    poss = np.zeros((xdim,ydim,zdim,16,3))
    for x,y,z,s in it.product(range(xdim), range(ydim), range(zdim), range(16)):
        poss[x,y,z,s] = np.array([x,y,z]) + SPIN_POS[s]
    return poss
A = gen_posvecs(4,4,4) # A.shape = (4,4,4,16,3)
B = A[:, :, :, :, np.newaxis, np.newaxis, np.newaxis, np.newaxis] - A[np.newaxis, np.newaxis, np.newaxis, np.newaxis]
assert all(A[1,2,0,12] - A[0,1,3,11] == B[1,2,0,12,0,1,3,11])
# Does not fail
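An equivalent way to see what the newaxis indexing does is to flatten the site axes, take all pairwise differences, and restore the site axes afterwards. This is a sketch under assumptions: it uses a smaller lattice filled with random numbers as a stand-in for A, and the names P, D and site_shape are made up.

```python
import numpy as np

# Stand-in for A on a smaller, made-up lattice: shape (2, 2, 2, 16, 3)
rng = np.random.default_rng(0)
A = rng.random((2, 2, 2, 16, 3))

site_shape = A.shape[:-1]                 # (2, 2, 2, 16)
P = A.reshape(-1, 3)                      # one row per site
D = P[:, None, :] - P[None, :, :]         # shape (n_sites, n_sites, 3)
B = D.reshape(site_shape + site_shape + (3,))

print(B.shape)  # (2, 2, 2, 16, 2, 2, 2, 16, 3)
assert np.array_equal(B[1, 0, 1, 12, 0, 1, 1, 11],
                      A[1, 0, 1, 12] - A[0, 1, 1, 11])
```

Note that the result has n_sites squared times 3 entries, so memory grows quickly with lattice size.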
I want to assign 0 to different length slices of a 2d array.
Example:
import numpy as np
arr = np.array([[1,2,3,4],
[1,2,3,4],
[1,2,3,4],
[1,2,3,4]])
idxs = np.array([0,1,2,0])
Given the above array arr and indices idxs, how can you assign to slices of different lengths, such that the result is:
arr = np.array([[0,2,3,4],
[0,0,3,4],
[0,0,0,4],
[0,2,3,4]])
These don't work
slices = np.array([np.arange(i) for i in idxs])
arr[slices] = 0
arr[:, :idxs] = 0
You can use broadcasted comparison to generate a mask, and index into arr accordingly:
arr[np.arange(arr.shape[1]) <= idxs[:, None]] = 0
arr
array([[0, 2, 3, 4],
[0, 0, 3, 4],
[0, 0, 0, 4],
[0, 2, 3, 4]])
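To see why the comparison works, it can help to look at the mask itself. In this small sketch the names cols and mask are introduced for illustration; the column indices of shape (4,) are compared against idxs reshaped to (4, 1), which broadcasts to one row of booleans per row of arr.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4]] * 4)
idxs = np.array([0, 1, 2, 0])

cols = np.arange(arr.shape[1])      # [0 1 2 3], shape (4,)
mask = cols <= idxs[:, None]        # (4,) vs (4, 1) broadcasts to (4, 4)
print(mask.astype(int))
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 0 0 0]]

arr[mask] = 0                       # zero every position where the mask is True
```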
This does the trick:
import numpy as np
arr = np.array([[1,2,3,4],
[1,2,3,4],
[1,2,3,4],
[1,2,3,4]])
idxs = [0,1,2,0]
for i,j in zip(range(arr.shape[0]),idxs):
    arr[i,:j+1]=0
import numpy as np
arr = np.array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
idxs = np.array([0, 1, 2, 0])
for i, idx in enumerate(idxs):
    arr[i,:idx+1] = 0
Here is a sparse solution that may be useful in cases where only a small fraction of places should be zeroed out:
>>> idx = idxs+1
>>> I = idx.cumsum()
>>> cidx = np.ones((I[-1],), int)
>>> cidx[0] = 0
>>> cidx[I[:-1]]-=idx[:-1]
>>> cidx=np.cumsum(cidx)
>>> ridx = np.repeat(np.arange(idx.size), idx)
>>> arr[ridx, cidx]=0
>>> arr
array([[0, 2, 3, 4],
[0, 0, 3, 4],
[0, 0, 0, 4],
[0, 2, 3, 4]])
Explanation: We need to construct the coordinates of the positions we want to put zeros in.
The row indices are easy: we just need to go from 0 to 3 repeating each number to fill the corresponding slice.
The column indices start at zero and are mostly incremented by 1, so we construct them with cumsum over an array of mostly ones. Only at the start of each new row do we have to reset, which we do by subtracting the length of the preceding slice so as to cancel the ones summed in that row.
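The intermediate arrays make the explanation concrete. This sketch repeats the steps of the sparse solution for the example idxs and prints each stage; the name steps is introduced here for the array before the final cumsum.

```python
import numpy as np

idxs = np.array([0, 1, 2, 0])
idx = idxs + 1                       # slice lengths: [1, 2, 3, 1]
I = idx.cumsum()                     # end offset of each slice: [1, 3, 6, 7]

steps = np.ones(I[-1], dtype=int)    # increments of the column index
steps[0] = 0                         # the first column index is 0
steps[I[:-1]] -= idx[:-1]            # reset at the start of each new row
print(steps.tolist())                # [0, 0, 1, -1, 1, 1, -2]

cidx = np.cumsum(steps)              # column index of every zeroed position
print(cidx.tolist())                 # [0, 0, 1, 0, 1, 2, 0]

ridx = np.repeat(np.arange(idx.size), idx)   # matching row indices
print(ridx.tolist())                 # [0, 1, 1, 2, 2, 2, 3]
```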