How to efficiently divide blocks of a NumPy array - python

I don't even know how to phrase what I am trying to do, so I'm going straight to a simple example. I have a blocked array that looks something like this:
a = np.array([
[1,2,0,0],
[3,4,0,0],
[9,9,0,0],
[0,0,5,6],
[0,0,7,8],
[0,0,8,8]
])
and I want as an output:
np.array([
[1/9,2/9,0,0],
[3/9,4/9,0,0],
[9/9,9/9,0,0],
[0,0,5/8,6/8],
[0,0,7/8,8/8],
[0,0,8/8,8/8]
])
Let's view this as two blocks
Block 1
np.array([
[1,2,0,0],
[3,4,0,0],
[9,9,0,0],
])
Block 2
np.array([
[0,0,5,6],
[0,0,7,8],
[0,0,8,8]
])
I want to normalize by the last row of each block, i.e. I want to divide each block by its last row (plus an epsilon for stability, so the zeros become 0/(0+eps) = 0).
I need an efficient way to do this.
My current inefficient solution is to create a new array of the same shape as a, where each block in the new array is filled with the last row of the corresponding block in a, and then divide. As follows:
norming_indices = np.array([2,2,2,5,5,5])
divisors = a[norming_indices, :]
b = a / (divisors + 1e-9)
In this example:
divisors = np.array([
[9,9,0,0],
[9,9,0,0],
[9,9,0,0],
[0,0,8,8],
[0,0,8,8],
[0,0,8,8]
])
This seems like a very inefficient way to do this. Does anyone have a better approach?

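Reshape to three dimensions, normalize each block by its last row (index 2 of each 3-row block), then reshape back to the original shape. Note that the code divides by the maximum of that last row, which matches the desired output for this example because the nonzero entries of each last row are equal: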
b = a.reshape(-1, 3, 4)
b = b / b[:,2::3].max(axis=2,keepdims=True)
b = b.reshape(a.shape)
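If the entries of a block's last row are not all equal, the maximum no longer matches the element-wise divisor described in the question. Here is a minimal sketch of the same reshape idea that divides element-wise by each block's last row, assuming every block has the same number of rows (block_rows and eps are names chosen here for illustration):

```python
import numpy as np

a = np.array([
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [9, 9, 0, 0],
    [0, 0, 5, 6],
    [0, 0, 7, 8],
    [0, 0, 8, 8],
], dtype=float)

block_rows = 3  # assumption: every block has the same number of rows
eps = 1e-9      # stabilizer from the question, so 0 / (0 + eps) == 0

# View the array as (n_blocks, block_rows, n_cols).
blocks = a.reshape(-1, block_rows, a.shape[1])
# Keep each block's last row with shape (n_blocks, 1, n_cols) so it broadcasts.
last_rows = blocks[:, -1:, :]
b = (blocks / (last_rows + eps)).reshape(a.shape)
print(b)
```

Broadcasting replaces the explicit divisors array from the question, so no fancy-indexed copy of a is needed.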

np.concatenate may help you
a = np.array([
[1,2,0,0],
[3,4,0,0],
[9,9,0,0],
[0,0,5,6],
[0,0,7,8],
[0,0,8,8]
])
b = np.concatenate((a[0:3, :] / (a[2, :] + 1e-9),
a[3:, :] / (a[5, :] + 1e-9)))
print(b)
Output:
[[0.11111111 0.22222222 0. 0. ]
[0.33333333 0.44444444 0. 0. ]
[1. 1. 0. 0. ]
[0. 0. 0.625 0.75 ]
[0. 0. 0.875 1. ]
[0. 0. 1. 1. ]]

Related

Replace column by 0 based on probability

How can I replace columns in a NumPy array with a certain number based on a probability, if the array has shape (1, X, X)?
I found code to replace rows, but cannot figure out how to modify it so that it applies to column replacement.
grid_example = np.random.rand(1,5,5)
probs = np.random.random((1,5))
grid_example[probs < 0.25] = 0
grid_example
Thanks!
Use:
import numpy as np
rng = np.random.default_rng(42)
grid_example = rng.random((1, 5, 5))
probs = rng.random((1, 5))
grid_example[..., (probs < 0.25).flatten()] = 0
print(grid_example)
Output
[[[0. 0.43887844 0. 0. 0.09417735]
[0. 0.7611397 0. 0. 0.45038594]
[0. 0.92676499 0. 0. 0.4434142 ]
[0. 0.55458479 0. 0. 0.6316644 ]
[0. 0.35452597 0. 0. 0.7783835 ]]]
The notation [..., (probs < 0.25).flatten()] applies the boolean indexing to the last axis. See the NumPy indexing documentation for more.
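To see the effect without randomness, here is a small sketch with a fixed mask standing in for (probs < 0.25). The array of ones and the mask values are made up for illustration:

```python
import numpy as np

grid = np.ones((1, 3, 4))
# Hypothetical fixed mask in place of (probs < 0.25): select columns 0 and 3.
mask = np.array([[True, False, False, True]])

# The ellipsis keeps all leading axes; the flattened 1-D boolean mask
# then selects positions along the last axis only.
grid[..., mask.flatten()] = 0
print(grid)
```

Every row of the single 3x4 slice ends up with zeros in the first and last columns, while the other entries are untouched.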

Create Jordan matrix from eigenvalues using NumPy

I have an ndarray of eigenvalues and their multiplicities (for instance, np.array([(2.2, 2), (3, 3), (5, 1)])). I need to compute the Jordan matrix for these eigenvalues without using Python loops and iterables (list comprehensions, for loops etc.), only NumPy functions.
I decided to build the matrix in these steps:
Create the blocks using np.vectorize and np.eye with np.fill_diagonal.
Combine the blocks into one matrix using hstack and vstack.
But I've got two problems:
Here's a snippet of my block-creating code:
def eye(t):
    eye = np.eye(t[1].astype(int),k=1)
    return eye
def jordan_matrix(X: np.ndarray) -> np.ndarray:
    dim = np.sum(X[:,1].astype(int))
    eyes = np.vectorize(eye, signature='(x)->(n,m)')(X)
    return eyes
And I'm getting the error ValueError: could not broadcast input array from shape (3,3) into shape (2,2)
I need to create extra zero matrices to fill the space that is not used by the created blocks, but their sizes are variable and I can't figure out how to create them without using Python's for and its equivalents.
Am I on the right track? How can I get around these problems?
np.vectorize would basically loop under the hood. We could use NumPy functions for actual vectorization at the Python level. Here's one such way -
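def blockwise_jordan(a):
    r = a[:,1].astype(int)                  # multiplicities
    v = np.repeat(a[:,0],r)                 # eigenvalues repeated along the main diagonal
    out = np.diag(v)
    n = out.shape[1]
    fillvals = np.ones(n, dtype=out.dtype)
    fillvals[r[:-1].cumsum()-1] = 0         # no 1 across a block boundary
    out.flat[1::out.shape[1]+1] = fillvals  # write the superdiagonal
    return out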
Sample run -
In [52]: X = np.array([(2.2, 2), (3, 3), (5, 1)])
In [53]: blockwise_jordan(X)
Out[53]:
array([[2.2, 1. , 0. , 0. , 0. , 0. ],
[0. , 2.2, 0. , 0. , 0. , 0. ],
[0. , 0. , 3. , 1. , 0. , 0. ],
[0. , 0. , 0. , 3. , 1. , 0. ],
[0. , 0. , 0. , 0. , 3. , 0. ],
[0. , 0. , 0. , 0. , 0. , 5. ]])
Optimization #1
We can replace the final three steps to perform the conditional assignment of 1s and 0s, like so -
out.flat[1::n+1] = 1
c = r[:-1].cumsum()-1
out[c,c+1] = 0
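Putting that optimization into a complete function might look like the sketch below. The name blockwise_jordan_v2 is made up here, and the input is assumed to have the same (eigenvalue, multiplicity) layout as in the question:

```python
import numpy as np

def blockwise_jordan_v2(a):
    # Hypothetical name; a is assumed to hold (eigenvalue, multiplicity) rows.
    r = a[:, 1].astype(int)               # multiplicities
    out = np.diag(np.repeat(a[:, 0], r))  # eigenvalues on the main diagonal
    n = out.shape[1]
    out.flat[1::n + 1] = 1                # fill the whole superdiagonal with 1s
    c = r[:-1].cumsum() - 1               # last row index of every block but the final one
    out[c, c + 1] = 0                     # clear the 1s that cross block boundaries
    return out

X = np.array([(2.2, 2), (3, 3), (5, 1)])
J = blockwise_jordan_v2(X)
print(J)
```

For the sample input this should give the same 6x6 matrix as the sample run above.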
Here's my solution:
def jordan(a):
    e = a[:,0] # eigenvalues
    m = a[:,1].astype('int') # multiplicities
    d = np.repeat(e, m) # main diagonal
    ones = np.ones(d.size - 1)
    ones[np.cumsum(m)[:-1] -1] = 0
    j = np.diag(d) + np.diag(ones, k=1)
    return j
Edit: just realized that my solution is almost the same as Divakar's.

Remove zero vectors from a matrix in TensorFlow

Just like the question says, I'm trying to remove all zero vectors (i.e. [0, 0, 0, 0]) from a tensor.
Given:
array([[ 0. , 0. , 0. , 0. ],
[ 0.19999981, 0.5 , 0. , 0. ],
[ 0.4000001 , 0.29999995, 0.10000002, 0. ],
...,
[-0.5999999 , 0. , -0.0999999 , -0.20000005],
[-0.29999971, -0.4000001 , -0.30000019, -0.5 ],
[ 0. , 0. , 0. , 0. ]], dtype=float32)
I tried the following code (inspired by another Stack Overflow answer):
x = tf.placeholder(tf.float32, shape=(10000, 4))
zero_vector = tf.zeros(shape=(1, 4), dtype=tf.float32)
bool_mask = tf.not_equal(x, zero_vector)
omit_zeros = tf.boolean_mask(x, bool_mask)
But bool_mask also seems to have shape (10000, 4), as if it were comparing every element of the x tensor to zero rather than whole rows.
I thought about using tf.reduce_sum to find rows that sum to zero, but that would also omit rows like [1, -1, 0, 0], and I don't want that.
Ideas?
One possible way would be to sum the absolute values of each row, so that rows like [1, -1, 0, 0] are not omitted, and then compare the result with a zero vector. You can do something like this:
intermediate_tensor = reduce_sum(tf.abs(x), 1)
zero_vector = tf.zeros(shape=(1,1), dtype=tf.float32)
bool_mask = tf.not_equal(intermediate_tensor, zero_vector)
omit_zeros = tf.boolean_mask(x, bool_mask)
I tried the solution by Rudresh Panchal and it didn't work for me, maybe due to a version change.
I found a typo in the first line: reduce_sum(tf.abs(x), 1) -> tf.reduce_sum(tf.abs(x), 1).
Also, bool_mask has rank 2 instead of the rank 1 that is required:
tensor: N-D tensor.
mask: K-D boolean tensor, K <= N and K must be known statically.
In other words, the shape of bool_mask must be, for example, [6], not [1,6]. tf.squeeze works well to reduce the dimension.
Corrected code which works for me:
intermediate_tensor = tf.reduce_sum(tf.abs(x), 1)
zero_vector = tf.zeros(shape=(1,1), dtype=tf.float32)
bool_mask = tf.squeeze(tf.not_equal(intermediate_tensor, zero_vector))
omit_zeros = tf.boolean_mask(x, bool_mask)
Just cast the tensor to tf.bool, reduce each row with tf.reduce_any so the mask has one entry per row, and use it as a boolean mask:
boolean_mask = tf.reduce_any(tf.cast(x, dtype=tf.bool), axis=1)
no_zeros = tf.boolean_mask(x, boolean_mask, axis=0)
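The row-mask logic itself can be checked in plain NumPy, independent of the TensorFlow version. This is only an analogue of the answers above, with made-up sample data, not TensorFlow code:

```python
import numpy as np

# Made-up sample: two all-zero rows, plus one row whose plain sum is zero.
x = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [1.0, -1.0, 0.0, 0.0],
    [0.2, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
], dtype=np.float32)

# One boolean per row: True when the row has any nonzero entry.
row_mask = np.abs(x).sum(axis=1) != 0
kept = x[row_mask]
print(kept)
```

Summing absolute values keeps the row [1, -1, 0, 0], which a plain row sum would have dropped.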

From list of indices to one-hot matrix

What is the best (elegant and efficient) way in Theano to convert a vector of indices to a matrix of zeros and ones, in which every row is the one-of-N representation of an index?
v = t.ivector() # the vector of indices
n = t.scalar() # the width of the matrix
convert = <your code here>
f = theano.function(inputs=[v, n], outputs=convert)
Example:
n_val = 4
v_val = [1,0,3]
f(v_val, n_val) = [[0,1,0,0],[1,0,0,0],[0,0,0,1]]
I didn't compare the different options, but you can also do it like this. It doesn't require extra memory.
import numpy as np
import theano
n_val = 4
v_val = np.asarray([1,0,3])
idx = theano.tensor.lvector()
z = theano.tensor.zeros((idx.shape[0], n_val))
one_hot = theano.tensor.set_subtensor(z[theano.tensor.arange(idx.shape[0]), idx], 1)
f = theano.function([idx], one_hot)
print(f(v_val))
[[ 0. 1. 0. 0.]
[ 1. 0. 0. 0.]
[ 0. 0. 0. 1.]]
It's as simple as:
convert = t.eye(n,n)[v]
There still might be a more efficient solution that doesn't require building the whole identity matrix. This might be problematic for large n and short v's.
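The two indexing ideas above have direct NumPy counterparts, which makes them easy to try without compiling a Theano function. This sketch is a NumPy analogue, not Theano code, and the names via_eye and via_assign are made up for illustration:

```python
import numpy as np

n_val = 4
v_val = np.asarray([1, 0, 3])

# Identity-matrix lookup: row i of eye(n) is the one-hot vector for index i.
via_eye = np.eye(n_val)[v_val]

# Direct assignment: start from zeros and set one entry per row,
# which avoids building the full n x n identity matrix.
via_assign = np.zeros((v_val.shape[0], n_val))
via_assign[np.arange(v_val.shape[0]), v_val] = 1

print(via_eye)
print(via_assign)
```

Both give the matrix from the question's example. The second form allocates only len(v) x n entries, which matters for large n and short v.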
There's now a built-in function for this: theano.tensor.extra_ops.to_one_hot.
import theano
from theano import tensor
y = tensor.as_tensor([3,2,1])
fn = theano.function([], tensor.extra_ops.to_one_hot(y, 4))
print(fn())
# [[ 0. 0. 0. 1.]
# [ 0. 0. 1. 0.]
# [ 0. 1. 0. 0.]]

Making a matrix square and padding it with desired value in numpy

In general we could have matrices of arbitrary sizes. For my application it is necessary to have a square matrix, and the dummy entries should have a specified value. Is there anything built into NumPy for this?
Or what is the easiest way of doing it?
EDIT:
The matrix X already exists and is not square. We want to pad it with the given dummy value to make it square. All the original values will stay the same.
Thanks a lot
Building upon the answer by LucasB here is a function which will pad an arbitrary matrix M with a given value val so that it becomes square:
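import numpy
def squarify(M,val):
    (a,b)=M.shape
    if a>b:
        padding=((0,0),(0,a-b))  # more rows than columns: add columns on the right
    else:
        padding=((0,b-a),(0,0))  # otherwise: add rows at the bottom
    return numpy.pad(M,padding,mode='constant',constant_values=val)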
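A quick check of this function on a wide and a tall input, with the function repeated so the snippet runs on its own. The sample arrays and the pad value -1 are made up for illustration:

```python
import numpy as np

def squarify(M, val):
    a, b = M.shape
    if a > b:
        padding = ((0, 0), (0, a - b))  # add columns on the right
    else:
        padding = ((0, b - a), (0, 0))  # add rows at the bottom
    return np.pad(M, padding, mode='constant', constant_values=val)

wide = np.arange(6).reshape(2, 3)  # 2 rows, 3 columns
tall = wide.T                      # 3 rows, 2 columns

padded_wide = squarify(wide, -1)
padded_tall = squarify(tall, -1)
print(padded_wide)
print(padded_tall)
```

The wide input gains a row of -1 at the bottom and the tall input gains a column of -1 on the right. The original values keep their positions in both cases.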
Since NumPy 1.7, there's the numpy.pad function. Here's an example:
>>> x = np.random.rand(2,3)
>>> np.pad(x, ((0,1), (0,0)), mode='constant', constant_values=42)
array([[ 0.20687158, 0.21241617, 0.91913572],
[ 0.35815412, 0.08503839, 0.51852029],
[ 42. , 42. , 42. ]])
For a 2D numpy array m it’s straightforward to do this by creating a max(m.shape) x max(m.shape) array of ones p and multiplying this by the desired padding value, before setting the slice of p corresponding to m (i.e. p[0:m.shape[0], 0:m.shape[1]]) to be equal to m.
This leads to the following function, where the first line deals with the possibility that the input has only one dimension (i.e. is a 1-D array rather than a 2-D matrix):
import numpy as np
def pad_to_square(a, pad_value=0):
    m = a.reshape((a.shape[0], -1))
    padded = pad_value * np.ones(2 * [max(m.shape)], dtype=m.dtype)
    padded[0:m.shape[0], 0:m.shape[1]] = m
    return padded
So, for example:
>>> r1 = np.random.rand(3, 5)
>>> r1
array([[ 0.85950957, 0.92468279, 0.93643261, 0.82723889, 0.54501699],
[ 0.05921614, 0.94946809, 0.26500925, 0.02287463, 0.04511802],
[ 0.99647148, 0.6926722 , 0.70148198, 0.39861487, 0.86772468]])
>>> pad_to_square(r1, 3)
array([[ 0.85950957, 0.92468279, 0.93643261, 0.82723889, 0.54501699],
[ 0.05921614, 0.94946809, 0.26500925, 0.02287463, 0.04511802],
[ 0.99647148, 0.6926722 , 0.70148198, 0.39861487, 0.86772468],
[ 3. , 3. , 3. , 3. , 3. ],
[ 3. , 3. , 3. , 3. , 3. ]])
or
>>> r2=np.random.rand(4)
>>> r2
array([ 0.10307689, 0.83912888, 0.13105124, 0.09897586])
>>> pad_to_square(r2, 0)
array([[ 0.10307689, 0. , 0. , 0. ],
[ 0.83912888, 0. , 0. , 0. ],
[ 0.13105124, 0. , 0. , 0. ],
[ 0.09897586, 0. , 0. , 0. ]])
etc.
