Remove zero vectors from a matrix in TensorFlow


Just like the question says, I'm trying to remove all zero vectors (i.e. [0, 0, 0, 0]) from a tensor.
Given:
array([[ 0. , 0. , 0. , 0. ],
[ 0.19999981, 0.5 , 0. , 0. ],
[ 0.4000001 , 0.29999995, 0.10000002, 0. ],
...,
[-0.5999999 , 0. , -0.0999999 , -0.20000005],
[-0.29999971, -0.4000001 , -0.30000019, -0.5 ],
[ 0. , 0. , 0. , 0. ]], dtype=float32)
I tried the following code (inspired by this SO answer):
x = tf.placeholder(tf.float32, shape=(10000, 4))
zero_vector = tf.zeros(shape=(1, 4), dtype=tf.float32)
bool_mask = tf.not_equal(x, zero_vector)
omit_zeros = tf.boolean_mask(x, bool_mask)
But bool_mask also seems to have shape (10000, 4), as if it compared every element of x to zero rather than whole rows.
I thought about using tf.reduce_sum to find rows whose sum is zero, but that would also omit rows like [1, -1, 0, 0], which I don't want.
Ideas?

One possible way is to sum the absolute values of each row, so that rows like [1, -1, 0, 0] are not omitted, and then compare the result with a zero vector. You can do something like this:
intermediate_tensor = reduce_sum(tf.abs(x), 1)
zero_vector = tf.zeros(shape=(1,1), dtype=tf.float32)
bool_mask = tf.not_equal(intermediate_tensor, zero_vector)
omit_zeros = tf.boolean_mask(x, bool_mask)

I tried the solution by Rudresh Panchal and it doesn't work for me, possibly due to version changes.
I found a typo in the first line: reduce_sum(tf.abs(x), 1) should be tf.reduce_sum(tf.abs(x), 1).
Also, bool_mask has rank 2 instead of the required rank 1. The tf.boolean_mask documentation says:
tensor: N-D tensor.
mask: K-D boolean tensor, K <= N and K must be known statically.
In other words, the shape of bool_mask must be, for example, [6], not [1, 6]. tf.squeeze works well to remove the extra dimension.
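Corrected code which works for me:
intermediate_tensor = tf.reduce_sum(tf.abs(x), 1)  # per-row sum of |x|, shape (N,)
zero_vector = tf.zeros(shape=(1,1), dtype=tf.float32)
# The comparison broadcasts to shape (1, N); squeeze axis 0 to get the rank-1 mask (N,)
bool_mask = tf.squeeze(tf.not_equal(intermediate_tensor, zero_vector), axis=0)
omit_zeros = tf.boolean_mask(x, bool_mask)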
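The rank-2 mask comes from broadcasting: a shape-(N,) row sum compared against a shape-(1, 1) zero tensor broadcasts to shape (1, N). NumPy follows the same broadcasting rules, so this small NumPy sketch (standing in for the TensorFlow ops, with a made-up 3-row x) shows the shapes involved and why squeezing axis 0 fixes it:

```python
import numpy as np

# Hypothetical 3x4 input: rows 0 and 2 are all zeros, row 1 is not
x = np.array([[0., 0., 0., 0.],
              [1., -1., 0., 0.],
              [0., 0., 0., 0.]], dtype=np.float32)

row_sums = np.abs(x).sum(axis=1)               # shape (3,)
zero_vector = np.zeros((1, 1), dtype=np.float32)

mask_2d = np.not_equal(row_sums, zero_vector)  # broadcasts to shape (1, 3)
mask_1d = np.squeeze(mask_2d, axis=0)          # shape (3,), usable as a row mask

print(mask_2d.shape, mask_1d.shape)
print(x[mask_1d])                              # only the [1, -1, 0, 0] row survives
```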

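Cast the tensor to tf.bool, reduce each row with tf.reduce_any so that a row is kept if any of its elements is nonzero, and use the result as a boolean mask:
boolean_mask = tf.reduce_any(tf.cast(x, dtype=tf.bool), axis=1)  # shape (N,): one flag per row
no_zeros = tf.boolean_mask(x, boolean_mask, axis=0)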
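Both answers reduce each row to a single keep/drop flag. As a hedged NumPy illustration rather than TensorFlow code (drop_zero_rows is an illustrative name), the "sum of absolute values is nonzero" mask and the "any element is nonzero" mask agree, and neither drops a row like [1, -1, 0, 0]:

```python
import numpy as np

def drop_zero_rows(x):
    """Return the rows of a 2-D array that contain at least one nonzero entry."""
    keep = np.any(x != 0, axis=1)   # shape (N,): True if any element in the row is nonzero
    return x[keep]

x = np.array([[0., 0., 0., 0.],
              [1., -1., 0., 0.],    # sums to 0 but must be kept
              [0.4, 0.3, 0.1, 0.],
              [0., 0., 0., 0.]], dtype=np.float32)

# The sum-of-absolute-values mask matches the any-nonzero mask
assert np.array_equal(np.abs(x).sum(axis=1) != 0, np.any(x != 0, axis=1))

result = drop_zero_rows(x)
print(result)
```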

Related

How to efficiently divide blocks of a NumPy array

I don't even know how to phrase what I'm trying to do, so I'll go straight to a simple example. I have a blocked array that looks something like this:
a = np.array([
[1,2,0,0],
[3,4,0,0],
[9,9,0,0],
[0,0,5,6],
[0,0,7,8],
[0,0,8,8]
])
and I want this as output:
np.array([
[1/9,2/9,0,0],
[3/9,4/9,0,0],
[9/9,9/9,0,0],
[0,0,5/8,6/8],
[0,0,7/8,8/8],
[0,0,8/8,8/8]
])
Let's view this as two blocks:
Block 1
np.array([
[1,2,0,0],
[3,4,0,0],
[9,9,0,0],
])
Block 2
np.array([
[0,0,5,6],
[0,0,7,8],
[0,0,8,8]
])
I want to normalize by the last row of each block, i.e. divide each block element-wise by its last row (plus an epsilon for stability, so that the zeros become 0/(0+eps) = 0).
I need an efficient way to do this.
My current inefficient solution is to create a new array with the same shape as a, in which every row of each block is replaced by the last row of that block, and then divide. As follows:
norming_indices = np.array([2,2,2,5,5,5])
divisors = a[norming_indices, :]
b = a / (divisors + 1e-9)
In this example:
divisors = np.array([
[9,9,0,0],
[9,9,0,0],
[9,9,0,0],
[0,0,8,8],
[0,0,8,8],
[0,0,8,8]
])
This seems like a very inefficient way to do it. Does anyone have a better approach?

Reshape to three dimensions so that each 3-row block becomes one slice, divide each block element-wise by its last row (plus epsilon), then reshape back to the original shape:
b = a.reshape(-1, 3, a.shape[1])    # (n_blocks, 3, 4)
b = b / (b[:, -1:, :] + 1e-9)       # last row of each block; -1: keeps the axis so it broadcasts
b = b.reshape(a.shape)
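The reshape idea can be wrapped in a small helper that works for any number of equally sized blocks. This is a sketch: normalize_blocks and block_rows are illustrative names, and it assumes the row count is a multiple of the block size. It divides element-wise by each block's last row, as the question asks, checks itself against the question's index-based version, and also shows that the question's norming_indices can be generated instead of typed by hand:

```python
import numpy as np

def normalize_blocks(a, block_rows, eps=1e-9):
    """Divide every block of `block_rows` consecutive rows by that block's last row."""
    n_rows, n_cols = a.shape
    if n_rows % block_rows:
        raise ValueError("row count must be a multiple of block_rows")
    blocks = a.reshape(n_rows // block_rows, block_rows, n_cols)  # (n_blocks, block_rows, n_cols)
    last = blocks[:, -1:, :]          # (n_blocks, 1, n_cols); broadcasts over each block's rows
    return (blocks / (last + eps)).reshape(n_rows, n_cols)

a = np.array([[1, 2, 0, 0],
              [3, 4, 0, 0],
              [9, 9, 0, 0],
              [0, 0, 5, 6],
              [0, 0, 7, 8],
              [0, 0, 8, 8]], dtype=float)

b = normalize_blocks(a, block_rows=3)

# Index of the last row of each row's block: [2, 2, 2, 5, 5, 5] for this a
norming_indices = (np.arange(a.shape[0]) // 3) * 3 + 2
assert np.allclose(b, a / (a[norming_indices, :] + 1e-9))
print(b)
```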

np.concatenate may help you
a = np.array([
[1,2,0,0],
[3,4,0,0],
[9,9,0,0],
[0,0,5,6],
[0,0,7,8],
[0,0,8,8]
])
b = np.concatenate((a[0:3, :] / (a[2, :] + 1e-9),
a[3:, :] / (a[5, :] + 1e-9)))
print(b)
Output:
[[0.11111111 0.22222222 0. 0. ]
[0.33333333 0.44444444 0. 0. ]
[1. 1. 0. 0. ]
[0. 0. 0.625 0.75 ]
[0. 0. 0.875 1. ]
[0. 0. 1. 1. ]]

Increasing the performance of a code snippet with nested for-loops

I have to run the snippet shown below about 200,000 times in a row, and it takes about 0.12585 seconds per 1000 iterations. datapoints has a shape of (3, 2704, 64).
output = []
maxium = 0
for datapoint in datapoints:
    tmp = []
    for data in datapoint:
        maxium = max(data)
        if maxium == 0:
            tmp.append(data)
        else:
            tmp.append(data / maxium)
    output.append(tmp)
I have tried to rewrite it using map(), but this gives me an average of 0.23237 seconds per iteration, probably due to the repeated max(data) and list() calls.
np.asarray(list(map(lambda datapoint: list(map(lambda data: data / max(data) if max(data) > 0 else data, datapoint)), datapoints)))
Is there a way to optimize the code further to improve performance?

Well here's a short answer:
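import numpy as np

def bar(datapoints):
    m = np.amax(datapoints, axis=2)   # max of each vector along the last axis
    m[m == 0] = 1                     # dividing by 1 leaves vectors with max 0 unchanged
    return datapoints / m[:,:,np.newaxis]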
Here's an explanation of how you might have got there (it's how I did get there!):
Let's start off with some example data:
>>> x = np.array([[[1, 2, 3, 4], [11, -12, 13, -14]], [[26, 27, 28, 29], [0, 0, 0, 0]]])
Now check what you get with your original function:
def foo(datapoints):
    output = []
    maxium = 0
    for datapoint in datapoints:
        tmp = []
        for data in datapoint:
            maxium = max(data)
            if maxium == 0:
                tmp.append(data)
            else:
                tmp.append(data / maxium)
        output.append(tmp)
    return np.array(output)
The result is:
>>> foo(x)
array([[[ 0.25 , 0.5 , 0.75 , 1. ],
[ 0.84615385, -0.92307692, 1. , -1.07692308]],
[[ 0.89655172, 0.93103448, 0.96551724, 1. ],
[ 0. , 0. , 0. , 0. ]]])
Now let's try out amax:
>>> np.amax(x, axis=0)
array([[26, 27, 28, 29],
[11, 0, 13, 0]])
>>> np.amax(x, axis=2)
array([[ 4, 13],
[29, 0]])
Aha, it looks like axis=2 is what we're after. Now we want to divide the original array by this, but only in the places where the max is nonzero. How do we divide only in some places? The answer is: we divide everywhere, but in some places we divide by 1 so it has no effect. So let's replace zeros with ones:
>>> m = np.amax(x, axis=2)
>>> m[m == 0] = 1
>>> m
array([[ 4, 13],
[29, 1]])
Finally, let's divide by this, broadcasting back over axis 2 which we took the maximum over earlier:
>>> x / m[:,:,np.newaxis]
array([[[ 0.25 , 0.5 , 0.75 , 1. ],
[ 0.84615385, -0.92307692, 1. , -1.07692308]],
[[ 0.89655172, 0.93103448, 0.96551724, 1. ],
[ 0. , 0. , 0. , 0. ]]])
Putting that all together you get bar() at the top.

Try something like this:
maximum = datapoints.max(axis=2, keepdims=True)
output = np.where(maximum==0, datapoints, datapoints/maximum)
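You may see a RuntimeWarning (e.g. "invalid value encountered in true_divide"), because the division is evaluated everywhere, including where maximum is 0; np.where then discards those entries, so it should work as expected.
Update: as @ArthurTacca pointed out,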
output = datapoints/np.where(maximum==0, 1, maximum)
will eliminate the warning.
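A further variant, not taken from the answers above, uses the where= and out= arguments of np.divide: the division is performed only where the condition holds, and every other entry keeps whatever out already contains (here, a copy of the input). A minimal sketch, with scale_by_row_max as an illustrative name, checked on the example array from the first answer:

```python
import numpy as np

def scale_by_row_max(datapoints):
    """Divide each vector along axis 2 by its max; vectors whose max is 0 are left as-is."""
    datapoints = np.asarray(datapoints, dtype=float)
    maximum = datapoints.max(axis=2, keepdims=True)   # shape (..., 1)
    out = datapoints.copy()                            # kept unchanged where maximum == 0
    # Divide only where the max is nonzero, so no 0/0 is ever evaluated
    np.divide(datapoints, maximum, out=out, where=maximum != 0)
    return out

x = np.array([[[1, 2, 3, 4], [11, -12, 13, -14]],
              [[26, 27, 28, 29], [0, 0, 0, 0]]])
result = scale_by_row_max(x)
print(result)
```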

Yes you can definitely speed this up w/ vectorized numpy operations. Here's how I would do it, if I understand what you're trying to do correctly:
import numpy as np
# I use a randomly initialized array here, replace this with your input
arr = np.random.random(size=(3, 2704, 64))
# Find max for 3rd dimension, returns array w/ shape (3, 2704)
max_arr = np.max(arr, axis=2)
# Set up divisor, returns array w/ shape (3, 2704)
divisor = np.where(max_arr == 0, 1, max_arr)
# Use expand_dims to add third dimension, returns array w/ shape (3, 2704, 1)
divisor = np.expand_dims(divisor, axis=2)
# Perform division, shape is (3, 2704, 64)
ans = np.divide(arr, divisor)
From your code, I gather that you intend to scale your data by the max along your 3rd axis, but skip the scaling when that max is 0. You also seem to want your output to have the same structure as your input, which explains the way you structured output and tmp. That's why I left the code snippet to end with the output in a numpy array, but if you need it in its original form regardless, it's a simple loop to rearrange your data:
output = []
for i in ans:
    tmp = []
    for j in i:
        tmp.append(list(j))
    output.append(tmp)
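If plain nested Python lists are acceptable, ndarray.tolist() performs the same rearrangement in a single call. One difference worth knowing: tolist() converts the elements to built-in Python floats, whereas the loop above stores NumPy scalar values inside the lists. A short sketch using the same random stand-in array:

```python
import numpy as np

ans = np.random.random(size=(3, 2704, 64))

# Nested lists: 3 outer lists, each holding 2704 lists of 64 Python floats
output = ans.tolist()
print(len(output), len(output[0]), len(output[0][0]))
```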
For future reference, furnish your questions with more detail. It will make it easier for people to participate, and you'll increase the chance of getting your questions answered quickly!

Slicing a tensor along a dimension with a given index

Suppose I have a tensor:
tensor = tf.constant(
[[[0.05340263, 0.27248233, 0.49127685, 0.07926575, 0.96054204],
[0.50013988, 0.05903472, 0.43025479, 0.41379231, 0.86508251],
[0.02033722, 0.11996034, 0.57675261, 0.12049974, 0.65760677],
[0.71859089, 0.22825203, 0.64064407, 0.47443116, 0.64108334]],
[[0.18813498, 0.29462021, 0.09433628, 0.97393446, 0.33451445],
[0.01657461, 0.28126666, 0.64016929, 0.48365073, 0.26672697],
[0.9379696 , 0.44648103, 0.39463243, 0.51797975, 0.4173626 ],
[0.89788558, 0.31063058, 0.05492096, 0.86904097, 0.21696292]],
[[0.07279436, 0.94773635, 0.34173115, 0.7228713 , 0.46553334],
[0.61199848, 0.88508141, 0.97019517, 0.61465985, 0.48971128],
[0.53037002, 0.70782324, 0.32158754, 0.2793538 , 0.62661128],
[0.52787814, 0.17085317, 0.83711126, 0.40567032, 0.71386498]]])
which has shape (3, 4, 5).
I want to slice it to get a new tensor of shape (3, 5), using a given 1-D tensor whose values indicate which position to retrieve for each entry, for example:
index_tensor = tf.constant([2,1,3])
which results in a new tensor which looks like this:
[[0.02033722, 0.11996034, 0.57675261, 0.12049974, 0.65760677],
[0.01657461, 0.28126666, 0.64016929, 0.48365073, 0.26672697],
[0.52787814, 0.17085317, 0.83711126, 0.40567032, 0.71386498]]
That is, along the second dimension, take the items at index 2, 1, and 3 respectively.
It is similar to doing:
tensor[:,x,:]
except that this only gives me the items at a single index x along that dimension, and I want a different index for each entry.
Can this be done?

You can use tf.one_hot() to turn index_tensor into a mask:
index = tf.one_hot(index_tensor,tensor.shape[1])
[[0. 0. 1. 0.]
[0. 1. 0. 0.]
[0. 0. 0. 1.]]
Then get your result with tf.boolean_mask():
result = tf.boolean_mask(tensor,index)
[[0.02033722 0.11996034 0.57675261 0.12049974 0.65760677]
[0.01657461 0.28126666 0.64016929 0.48365073 0.26672697]
[0.52787814 0.17085317 0.83711126 0.40567032 0.71386498]]

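Another option, using the TensorFlow 1.x Session API with the indices written out by hand, is to slice out each row separately and concatenate the pieces:
import tensorflow as tf
tensor = tf.constant(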
[[[0.05340263, 0.27248233, 0.49127685, 0.07926575, 0.96054204],
[0.50013988, 0.05903472, 0.43025479, 0.41379231, 0.86508251],
[0.02033722, 0.11996034, 0.57675261, 0.12049974, 0.65760677],
[0.71859089, 0.22825203, 0.64064407, 0.47443116, 0.64108334]],
[[0.18813498, 0.29462021, 0.09433628, 0.97393446, 0.33451445],
[0.01657461, 0.28126666, 0.64016929, 0.48365073, 0.26672697],
[0.9379696 , 0.44648103, 0.39463243, 0.51797975, 0.4173626 ],
[0.89788558, 0.31063058, 0.05492096, 0.86904097, 0.21696292]],
[[0.07279436, 0.94773635, 0.34173115, 0.7228713 , 0.46553334],
[0.61199848, 0.88508141, 0.97019517, 0.61465985, 0.48971128],
[0.53037002, 0.70782324, 0.32158754, 0.2793538 , 0.62661128],
[0.52787814, 0.17085317, 0.83711126, 0.40567032, 0.71386498]]])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tf.concat([tensor[0:1, 2:3], tensor[1:2, 1:2], tensor[2:3, 3:4]], 1)))
This prints:
[[[0.02033722 0.11996034 0.5767526 0.12049974 0.6576068 ]
[0.01657461 0.28126666 0.64016926 0.48365074 0.26672697]
[0.52787817 0.17085317 0.83711123 0.40567032 0.713865 ]]]
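For comparison, the same per-entry selection is a single advanced-indexing expression in NumPy: pair each batch position with its own index. This NumPy sketch is only an analogue of the TensorFlow code, with stand-in data; in TensorFlow, tf.gather(tensor, index_tensor, axis=1, batch_dims=1) should express the same gather, assuming a version whose tf.gather supports batch_dims.

```python
import numpy as np

tensor = np.arange(3 * 4 * 5, dtype=np.float32).reshape(3, 4, 5)  # stand-in data, shape (3, 4, 5)
index = np.array([2, 1, 3])                                        # one row index per batch entry

# Row i of the result is tensor[i, index[i], :]
result = tensor[np.arange(tensor.shape[0]), index]                 # shape (3, 5)

# Equivalent formulation with take_along_axis
alt = np.take_along_axis(tensor, index[:, None, None], axis=1)[:, 0, :]
assert np.array_equal(result, alt)
print(result)
```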

Changing the scale of a tensor in tensorflow

Sorry if the title is unclear; I didn't know how to phrase this. I have a tensor of values, and I want every element in it to lie in the range 0 to 255 (0 to 1 works too). However, I don't want the values to add up to 1 or 255 like softmax does; I just want to scale them down.
Is there any way to do this?
Thanks!

You are trying to normalize the data. A classic normalization formula is this one:
normalize_value = (value − min_value) / (max_value − min_value)
The implementation on tensorflow will look like this:
tensor = tf.div(
    tf.subtract(tensor, tf.reduce_min(tensor)),
    tf.subtract(tf.reduce_max(tensor), tf.reduce_min(tensor))
)
All the values of the tensor will then be between 0 and 1.
IMPORTANT: make sure the tensor has float/double values, or the output tensor will contain just zeros and ones. If you have an integer tensor, call this first:
tensor = tf.to_float(tensor)
Update: as of TensorFlow 2, tf.to_float() is deprecated; use tf.cast() instead:
tensor = tf.cast(tensor, dtype=tf.float32) # or any other tf.dtype, that is precise enough
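One caveat with this formula: if every element is equal, max_value - min_value is 0 and the division produces NaN. A hedged NumPy sketch of the same min-max scaling to an arbitrary range [lo, hi] (minmax_scale is an illustrative name), with a guard for that constant case:

```python
import numpy as np

def minmax_scale(x, lo=0.0, hi=1.0):
    """Linearly map x so that its min becomes lo and its max becomes hi."""
    x = np.asarray(x, dtype=np.float32)   # work in floating point, as the answer recommends
    x_min, x_max = x.min(), x.max()
    span = x_max - x_min
    if span == 0:
        # Constant input: nothing to stretch, so map everything to lo instead of dividing by 0
        return np.full_like(x, lo)
    return (x - x_min) / span * (hi - lo) + lo

print(minmax_scale([2, 4, 6, 1, 0]))
print(minmax_scale([2, 4, 6, 1, 0], 0, 255))
```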

According to the Wikipedia article on feature scaling, you can also try scaling to unit length:
It can be implemented using this segment of code:
In [3]: a = tf.constant([2.0, 4.0, 6.0, 1.0, 0])
In [4]: b = a / tf.norm(a)
In [5]: b.eval()
Out[5]: array([ 0.26490647, 0.52981293, 0.79471946, 0.13245323, 0. ], dtype=float32)

tf.sigmoid(tensor) * 255 should do it, although the sigmoid squashes values nonlinearly rather than rescaling them linearly.

Let the input be
X = tf.constant([[0.65, 0.61, 0.59, 0.62, 0.6], [0.25, 0.31, 0.89, 0.52, 0.6]])
We can define a scaling function that rescales each row to the range [a, b]:
def rescale(X, a=0, b=1):
    repeat = X.shape[1]
    # Per-row min and max, repeated across the columns to match X's shape
    xmin = tf.repeat(tf.reshape(tf.math.reduce_min(X, axis=1), shape=[-1,1]), repeats=repeat, axis=1)
    xmax = tf.repeat(tf.reshape(tf.math.reduce_max(X, axis=1), shape=[-1,1]), repeats=repeat, axis=1)
    X = (X - xmin) / (xmax - xmin)
    return X * (b - a) + a
This outputs X in range [0,1]
>>> rescale(X)
<tf.Tensor: shape=(2, 5), dtype=float32, numpy=
array([[1. , 0.333334 , 0. , 0.5000005 , 0.16666749],
[0. , 0.09375001, 1. , 0.42187497, 0.54687506]],
dtype=float32)>
To scale in range [0, 255]
>>> rescale(X, 0, 255)
<tf.Tensor: shape=(2, 5), dtype=float32, numpy=
array([[255. , 85.00017 , 0. , 127.50012 , 42.50021 ],
[ 0. , 23.906252, 255. , 107.57812 , 139.45314 ]],
dtype=float32)>

In some contexts, you need to normalize each image separately - for example adversarial datasets where each image has noise. The following normalizes each image according to its own min and max, assuming the inputs have typical size Batch x YDim x XDim x Channels:
cast_input = tf.cast(inputs,dtype=tf.float32) # e.g. MNIST is integer
input_min = tf.reduce_min(cast_input,axis=[1,2]) # result B x C
input_max = tf.reduce_max(cast_input,axis=[1,2])
ex_min = tf.expand_dims(input_min,axis=1) # put back inner dimensions
ex_max = tf.expand_dims(input_max,axis=1)
ex_min = tf.expand_dims(ex_min,axis=1) # one at a time - better way?
ex_max = tf.expand_dims(ex_max,axis=1) # Now Bx1x1xC
input_range = tf.subtract(ex_max, ex_min)
floored = tf.subtract(cast_input,ex_min) # broadcast
scale_input = tf.divide(floored,input_range)
I would like to expand the dimensions in one shot like you can in NumPy, but tf.expand_dims seems to accept only one dimension at a time; open to suggestions here. Thanks!
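On the open question of expanding dimensions in one go: reductions can keep the reduced axes as size-1 dimensions, which removes the need for tf.expand_dims entirely. In TensorFlow that is the keepdims=True argument of tf.reduce_min and tf.reduce_max. The NumPy sketch below (random stand-in data of shape B x Y x X x C) shows the same per-image, per-channel normalization:

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.integers(0, 256, size=(8, 28, 28, 3))   # hypothetical batch, B x Y x X x C

x = inputs.astype(np.float32)
x_min = x.min(axis=(1, 2), keepdims=True)   # B x 1 x 1 x C, ready to broadcast
x_max = x.max(axis=(1, 2), keepdims=True)
scaled = (x - x_min) / (x_max - x_min)      # each image/channel now spans [0, 1]

print(x_min.shape, scaled.min(), scaled.max())
```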

If you want the maximum value to be the upper bound of the 0-1 range and zero is meaningful (so it should stay 0), you can use this:
import tensorflow as tf
tensor = tf.constant([0, 1, 5, 10])
tensor = tf.divide(tensor, tf.reduce_max(tensor))
tf.print(tensor)
would result in:
[0 0.1 0.5 1]

Making a matrix square and padding it with desired value in numpy

In general we could have matrices of arbitrary sizes, but my application needs a square matrix, and the dummy entries should have a specified value. Is there anything built into NumPy for this?
Or what is the easiest way of doing it?
EDIT:
The matrix X already exists and is not square. I want to pad it with the given dummy value to make it square. All the original values stay the same.
Thanks a lot

Building upon the answer by LucasB, here is a function that pads an arbitrary matrix M with a given value val so that it becomes square:
import numpy

def squarify(M, val):
    (a, b) = M.shape
    if a > b:
        padding = ((0, 0), (0, a - b))   # more rows than columns: add columns on the right
    else:
        padding = ((0, b - a), (0, 0))   # more columns than rows: add rows at the bottom
    return numpy.pad(M, padding, mode='constant', constant_values=val)
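The two branches can also be collapsed by computing both pad widths from the larger dimension, since one of them is always zero. A small sketch (pad_to_square_np is an illustrative name), using np.pad's constant mode as in the function above:

```python
import numpy as np

def pad_to_square_np(M, val):
    """Pad a 2-D array on the bottom/right with `val` until it is square."""
    n = max(M.shape)
    padding = ((0, n - M.shape[0]), (0, n - M.shape[1]))  # one of these widths is 0
    return np.pad(M, padding, mode='constant', constant_values=val)

M = np.arange(6).reshape(2, 3)
print(pad_to_square_np(M, -1))
```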

Since Numpy 1.7, there's the numpy.pad function. Here's an example:
>>> x = np.random.rand(2,3)
>>> np.pad(x, ((0,1), (0,0)), mode='constant', constant_values=42)
array([[ 0.20687158, 0.21241617, 0.91913572],
[ 0.35815412, 0.08503839, 0.51852029],
[ 42. , 42. , 42. ]])

For a 2D numpy array m it’s straightforward to do this by creating a max(m.shape) x max(m.shape) array of ones p and multiplying this by the desired padding value, before setting the slice of p corresponding to m (i.e. p[0:m.shape[0], 0:m.shape[1]]) to be equal to m.
This leads to the following function, where the first line deals with the possibility that the input has only one dimension (i.e. is a 1-D array rather than a matrix):
import numpy as np

def pad_to_square(a, pad_value=0):
    m = a.reshape((a.shape[0], -1))
    padded = pad_value * np.ones(2 * [max(m.shape)], dtype=m.dtype)
    padded[0:m.shape[0], 0:m.shape[1]] = m
    return padded
So, for example:
>>> r1 = np.random.rand(3, 5)
>>> r1
array([[ 0.85950957, 0.92468279, 0.93643261, 0.82723889, 0.54501699],
[ 0.05921614, 0.94946809, 0.26500925, 0.02287463, 0.04511802],
[ 0.99647148, 0.6926722 , 0.70148198, 0.39861487, 0.86772468]])
>>> pad_to_square(r1, 3)
array([[ 0.85950957, 0.92468279, 0.93643261, 0.82723889, 0.54501699],
[ 0.05921614, 0.94946809, 0.26500925, 0.02287463, 0.04511802],
[ 0.99647148, 0.6926722 , 0.70148198, 0.39861487, 0.86772468],
[ 3. , 3. , 3. , 3. , 3. ],
[ 3. , 3. , 3. , 3. , 3. ]])
or
>>> r2=np.random.rand(4)
>>> r2
array([ 0.10307689, 0.83912888, 0.13105124, 0.09897586])
>>> pad_to_square(r2, 0)
array([[ 0.10307689, 0. , 0. , 0. ],
[ 0.83912888, 0. , 0. , 0. ],
[ 0.13105124, 0. , 0. , 0. ],
[ 0.09897586, 0. , 0. , 0. ]])
etc.
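A minor variant of pad_to_square: np.full creates the filled array directly instead of multiplying an array of ones by the padding value. This is a sketch under the same assumptions (2-D input, or 1-D input treated as a single column); pad_to_square_full is an illustrative name:

```python
import numpy as np

def pad_to_square_full(a, pad_value=0):
    m = a.reshape((a.shape[0], -1))                     # treat 1-D input as a single column
    n = max(m.shape)
    padded = np.full((n, n), pad_value, dtype=m.dtype)  # n x n array filled with pad_value
    padded[:m.shape[0], :m.shape[1]] = m                # copy the original values into the top-left corner
    return padded

print(pad_to_square_full(np.array([1.0, 2.0, 3.0])))
```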
