I have a NumPy array of shape (100, 100, 20) (in Python 3).
For each 'pixel', I want to find the 15 channels with the minimum values and set them to zero (meaning: make the array sparse, keeping only the 5 highest values).
Example:
input: array = [[1,2,3], [7,6,9], [12,71,3]], num_channels_to_zero = 2
output: [[0,0,3], [0,0,9], [0,71,0]]
How can I do it?
What I have so far:
array = numpy.random.rand(100, 100, 20)
inds = numpy.argsort(array, axis=-1) # also shape (100, 100, 20)
I want to do something like
array[..., inds[..., :15]] = 0
but it doesn't give me what I want.
np.argsort outputs indices suitable for the [...]_along_axis functions of numpy. This includes np.put_along_axis:
import numpy as np
array = np.random.rand(100, 100, 20)
print(array[0,0])
#[0.44116124 0.94656705 0.20833932 0.29239585 0.33001399 0.82396784
# 0.35841905 0.20670957 0.41473762 0.01568006 0.1435386 0.75231818
# 0.5532527 0.69366173 0.17247832 0.28939985 0.95098187 0.63648877
# 0.90629116 0.35841627]
inds = np.argsort(array, axis=-1)
np.put_along_axis(array, inds[..., :15], 0, axis=-1)
print(array[0,0])
#[0. 0.94656705 0. 0. 0. 0.82396784
# 0. 0. 0. 0. 0. 0.75231818
# 0. 0. 0. 0. 0.95098187 0.
# 0.90629116 0. ]
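If the ordering among the dropped channels doesn't matter, np.argpartition can replace the full argsort; it only guarantees that the 15 smallest values come first along the axis, which is typically faster. A minimal sketch of this variation:

import numpy as np

array = np.random.rand(100, 100, 20)
# entries before position 15 index the 15 smallest values per pixel
inds = np.argpartition(array, 15, axis=-1)
np.put_along_axis(array, inds[..., :15], 0, axis=-1)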
As mentioned in the numpy documentation:
From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])
So, for your example:
a = np.array([[1,2,3], [7,6,9], [12,71,3]])
amax = a.argmax(axis=-1)
a[np.arange(a.shape[0]), amax] = 0
a
array([[ 1,  2,  0],
       [ 7,  6,  0],
       [12,  0,  3]])
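Note that argmax zeroes only the single largest entry per row. To zero the k smallest entries per row (as in the question, with num_channels_to_zero = 2), the same advanced-indexing idea can be combined with argsort; a minimal sketch:

a = np.array([[1, 2, 3], [7, 6, 9], [12, 71, 3]])
k = 2  # num_channels_to_zero
order = np.argsort(a, axis=-1)           # column indices, smallest values first
rows = np.arange(a.shape[0])[:, None]    # row indices, broadcast against order
a[rows, order[:, :k]] = 0
# a is now [[0, 0, 3], [0, 0, 9], [0, 71, 0]]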
I'm looking to see if there is a more efficient way (i.e. using native NumPy functionality) to achieve what I'm doing currently.
My process is I start with an array a:
a = np.array([[0,2,0,-1],[-0.2,0,-0.1,0],[0,0,-0.1,0],[0,0,0,0]])
array([[ 0. ,  2. ,  0. , -1. ],
       [-0.2,  0. , -0.1,  0. ],
       [ 0. ,  0. , -0.1,  0. ],
       [ 0. ,  0. ,  0. ,  0. ]])
I then filter based on where the values are not equal to 0:
r_indices, c_indices = np.where(a != 0)
(array([0, 0, 1, 1, 2]), array([1, 3, 0, 2, 2]))
From there, I create a Python dictionary b like so:
b = {i: c_indices[r_indices == i] for i in np.unique(r_indices)}
{
 0: array([1, 3]),
 1: array([0, 2]),
 2: array([2])
}
I do this because I want to know for a given unique row index r, which column indices are not 0.
My own preference is to use NumPy as much as possible to take advantage of its speed benefits. However, I'm not sure how else to structure this in NumPy, since the arrays in the dictionary could range from length 0 (no non-zero values) to 4 (all values non-zero).
Am I being paranoid about the potential speed benefits?
You can use Pandas in the following way:
import pandas as pd
import numpy as np
if __name__ == '__main__':
    a = np.array([[0, 2, 0, -1], [-0.2, 0, -0.1, 0], [0, 0, -0.1, 0], [0, 0, 0, 0]])
    rows, cols = np.where(a != 0)
    x = list(zip(rows, cols))
    df = pd.DataFrame.from_records(data=x)
    l = df.groupby(0)[1].apply(list)
    L = [np.array(v) for v in l.values]
    d = dict(zip(np.unique(rows), L))
Output
{0: array([1, 3]), 1: array([0, 2]), 2: array([2])}
As Pandas works with NumPy under the hood, for large arrays this can be more efficient than the plain dictionary comprehension.
Also, if all you need is a dictionary-like object, you could enhance performance further by using the Pandas GroupBy result l directly:
l.loc[0]
which results in:
[1, 3]
which is equivalent to b[0] in your example, letting you omit the last two lines altogether. Pandas provides very fast mechanisms for handling large amounts of tabular data and is generally preferable to a plain dict object when used for the same purpose.
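As a side note, since the question asks about native NumPy: here is a sketch of the same grouping without Pandas, relying on the fact that np.where returns the row indices of a 2D array in sorted order (the variable names are just illustrative):

import numpy as np

a = np.array([[0, 2, 0, -1], [-0.2, 0, -0.1, 0], [0, 0, -0.1, 0], [0, 0, 0, 0]])
rows, cols = np.where(a != 0)
# the first occurrence of each unique row index marks a group boundary
unique_rows, first_pos = np.unique(rows, return_index=True)
groups = np.split(cols, first_pos[1:])  # one array of column indices per unique row
b = dict(zip(unique_rows, groups))
# {0: array([1, 3]), 1: array([0, 2]), 2: array([2])}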
Cheers.
I'm trying to to update an element in an array. If I've got an array say:
[[0, 0],
[0, 0]]
as far as I knew, the way to update e.g. the first element to 0.5 was
array[0,0] = 0.5
However, when I print the array, the contents are unchanged. I read some things on Stack Overflow about copies of arrays being created, but I don't know if this applies here.
Any help would be great
Your problem is that your array is integer-valued (because you initialize it with integers), and when you write a float to it, the float gets truncated to an integer, so 0.5 becomes 0. You can check that this is the case if you write
array = np.array([[0, 0], [0, 0]])
array[0, 0] = 1.5
>>> array
array([[1, 0],
       [0, 0]])
To get the expected behaviour, either initialize it with floats
array = np.array([[0., 0.], [0., 0.]])
or explicitly specify dtype
array = np.array([[0, 0], [0, 0]], dtype=np.float32)
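Alternatively, a float array of zeros can be created directly, since np.zeros defaults to float64:

import numpy as np

array = np.zeros((2, 2))  # dtype is float64 by default
array[0, 0] = 0.5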
You need to change the data type of the numpy array to float before updating the value:
import numpy as np
a = [[0,0],[0,0]]
a = np.array(a)
a = a.astype('float64')
a[0,0] = 0.5
print(a)
this will give you
[[0.5 0. ]
[0. 0. ]]
The data type of the array is automatically set to int, and 0.5 as an int is 0.
# For example:
In [12]: int(0.5)
Out[12]: 0
# To construct the array try:
array = np.array([[0.0,0.0],[0.0,0.0]])
# or:
array = np.array([[0,0],[0,0]], dtype=float)
Then:
In [9]: array[0,0]=0.5
In [10]: array
Out[10]:
array([[0.5, 0. ],
[0. , 0. ]])
Python nested list objects don't support array-like indexing; you can only use a single value at a time to index a list:
arr = [[0,0], [0,0]]
arr[0][0] = 0.5
arr # [[0.5, 0], [0, 0]]
To use the kind of indexing you mention in your post, you'll have to use a numpy array
import numpy as np
np_arr = np.array([[0,0], [0,0]], dtype=np.float32)
np_arr[0,0] = 0.5
I am trying to interpolate a 2D numpy matrix with the dimensions (5, 3) to a matrix with the dimensions (7, 3) along axis 0, interpolating each column independently. Obviously, the wrong approach would be to randomly insert rows anywhere between the original rows, as in the following example:
Source:
[[0, 1, 1]
[0, 2, 0]
[0, 3, 1]
[0, 4, 0]
[0, 5, 1]]
Target (terrible interpolation -> not wanted!):
[[0, 1, 1]
[0, 1.5, 0.5]
[0, 2, 0]
[0, 3, 1]
[0, 3.5, 0.5]
[0, 4, 0]
[0, 5, 1]]
The correct approach would be to take every row into account and interpolate between all of them to expand the source matrix to a (7, 3) matrix. I am aware of the scipy.interpolate.interp1d and scipy.interpolate.interp2d methods, but could not get them to work based on other Stack Overflow posts or websites. Any tips or tricks would be appreciated.
Update #1: The expected values should be equally spaced.
Update #2:
What I want to do is basically use the separate columns of the original matrix, expand the length of the column to 7 and interpolate between the values of the original column. See the following example:
Source:
[[0, 1, 1]
[0, 2, 0]
[0, 3, 1]
[0, 4, 0]
[0, 5, 1]]
Split into 3 separate Columns:
[0 [1 [1
0 2 0
0 3 1
0 4 0
0] 5] 1]
Expand length to 7 and interpolate between them, example for second column:
[1
1.66
2.33
3
3.66
4.33
5]
It seems like each column can be treated completely independently, but for each column you need to define essentially an "x" coordinate so that you can fit some function "f(x)" from which you generate your output matrix.
Unless the rows in your matrix are associated with some other data structure (e.g. a vector of timestamps), an obvious set of x values is just the row number:
x = numpy.arange(0, Source.shape[0])
You can then construct an interpolating function:
fit = scipy.interpolate.interp1d(x, Source, axis=0)
and use that to construct your output matrix:
Target = fit(numpy.linspace(0, Source.shape[0]-1, 7))
which produces:
array([[ 0.        ,  1.        ,  1.        ],
       [ 0.        ,  1.66666667,  0.33333333],
       [ 0.        ,  2.33333333,  0.33333333],
       [ 0.        ,  3.        ,  1.        ],
       [ 0.        ,  3.66666667,  0.33333333],
       [ 0.        ,  4.33333333,  0.33333333],
       [ 0.        ,  5.        ,  1.        ]])
By default, scipy.interpolate.interp1d uses piecewise-linear interpolation. There are many more exotic options within scipy.interpolate, based on higher order polynomials, etc. Interpolation is a big topic in itself, and unless the rows of your matrix have some particular properties (e.g. being regular samples of a signal with a known frequency range), there may be no "truly correct" way of interpolating. So, to some extent, the choice of interpolation scheme will be somewhat arbitrary.
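For instance, a minimal sketch that swaps in cubic interpolation, assuming the same Source matrix as above (cubic requires at least four sample rows):

import numpy
import scipy.interpolate

Source = numpy.array([[0, 1, 1],
                      [0, 2, 0],
                      [0, 3, 1],
                      [0, 4, 0],
                      [0, 5, 1]], dtype=float)
x = numpy.arange(0, Source.shape[0])
# piecewise-cubic instead of the default piecewise-linear fit
fit = scipy.interpolate.interp1d(x, Source, axis=0, kind='cubic')
Target = fit(numpy.linspace(0, Source.shape[0] - 1, 7))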
You can do this as follows:
from scipy.interpolate import interp1d
import numpy as np
a = np.array([[0, 1, 1],
[0, 2, 0],
[0, 3, 1],
[0, 4, 0],
[0, 5, 1]])
x = np.arange(a.shape[0])
# define new x range, we need 7 equally spaced values
xnew = np.linspace(x.min(), x.max(), 7)
# apply the interpolation to each column
f = interp1d(x, a, axis=0)
# get final result
print(f(xnew))
This will print
[[ 0. 1. 1. ]
[ 0. 1.66666667 0.33333333]
[ 0. 2.33333333 0.33333333]
[ 0. 3. 1. ]
[ 0. 3.66666667 0.33333333]
[ 0. 4.33333333 0.33333333]
[ 0. 5. 1. ]]
As the title states, I'm trying to extract the highest n elements per row from a matrix in tensorflow, and store the result in a sparse Tensor.
I've been able to extract the indices and values with tf.nn.top_k, but the indices don't follow the convention required by tf.SparseTensor.
Specifically, tf.nn.top_k returns a matrix of column indices with the same shape as the resulting value matrix (Rows x n), whereas tf.SparseTensor wants a (# non-zero x 2) matrix with one row per non-zero element and the columns holding the row and column indices.
The values face an analogous problem, whereby a list of non-zero elements is desired instead of a matrix of values.
How can I quickly convert between these indexing notation schemes?
This is doable with a bit of modular arithmetic. Here's an example that works on matrices, although it would be possible to loop over more axes.
import tensorflow as tf
def slices_to_dims(slice_indices):
    """
    Args:
      slice_indices: An [N, k] Tensor mapping to column indices.
    Returns:
      An index Tensor with shape [N * k, 2], corresponding to indices suitable for
      passing to SparseTensor.
    """
    slice_indices = tf.cast(slice_indices, tf.int64)
    num_rows = tf.shape(slice_indices, out_type=tf.int64)[0]
    row_range = tf.range(num_rows)
    item_numbers = slice_indices * num_rows + tf.expand_dims(row_range, axis=1)
    item_numbers_flat = tf.reshape(item_numbers, [-1])
    return tf.stack([item_numbers_flat % num_rows,
                     item_numbers_flat // num_rows], axis=1)
Example usage:
dense_shape = [5, 7]
dense_matrix = tf.random_normal(shape=dense_shape)
top_values, top_indices = tf.nn.top_k(dense_matrix, k=2)
sparse_indices = slices_to_dims(top_indices)
sparse_tensor = tf.sparse_reorder(tf.SparseTensor(
    indices=sparse_indices,
    values=tf.reshape(top_values, [-1]),
    dense_shape=dense_shape))
densified_top = tf.sparse_tensor_to_dense(sparse_tensor)
with tf.Session() as session:
    sparse_top, dense_original, dense_selected = session.run(
        [sparse_tensor, dense_matrix, densified_top])
    print(dense_original)
    print(dense_selected)
    print(sparse_top)
Prints:
[[ 1.44056129 -1.01790774 -0.2795608 2.34854746 -2.27528405 -0.62035948
3.36598897]
[ 0.7114948 -0.42564821 -0.93446779 -0.25373486 -0.51730365 0.72331643
-0.75625718]
[-0.6501748 -0.92748415 -0.95409006 -0.07157528 0.80637723 -0.32177576
-1.4516511 ]
[-1.081038 -0.67226124 -1.19455576 0.44537872 -0.69019234 -0.61539739
0.15328468]
[ 0.43032476 -0.11295394 0.83491379 -0.67906654 0.20325914 -0.0155068
0.52107805]]
[[ 0. 0. 0. 2.34854746 0. 0.
3.36598897]
[ 0.7114948 0. 0. 0. 0. 0.72331643
0. ]
[ 0. 0. 0. -0.07157528 0.80637723 0. 0. ]
[ 0. 0. 0. 0.44537872 0. 0.
0.15328468]
[ 0. 0. 0.83491379 0. 0. 0.
0.52107805]]
SparseTensorValue(indices=array([[0, 3],
[0, 6],
[1, 0],
[1, 5],
[2, 3],
[2, 4],
[3, 3],
[3, 6],
[4, 2],
[4, 6]]), values=array([ 2.34854746, 3.36598897, 0.7114948 , 0.72331643, -0.07157528,
0.80637723, 0.44537872, 0.15328468, 0.83491379, 0.52107805], dtype=float32), dense_shape=array([5, 7]))
How can I get the indices of intersection points between two numpy arrays? I can get intersecting values with intersect1d:
import numpy as np
a = np.arange(11)
b = np.array([2, 7, 10])
inter = np.intersect1d(a, b)
# inter == array([ 2, 7, 10])
But how can I get the indices into a of the values in inter?
You could use the boolean array produced by in1d to index an arange. Reversing a so that the indices are different from the values:
>>> a[::-1]
array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
>>> a = a[::-1]
intersect1d still returns the same values...
>>> numpy.intersect1d(a, b)
array([ 2, 7, 10])
But in1d returns a boolean array:
>>> numpy.in1d(a, b)
array([ True, False, False, True, False, False, False, False, True,
False, False], dtype=bool)
Which can be used to index a range:
>>> numpy.arange(a.shape[0])[numpy.in1d(a, b)]
array([0, 3, 8])
>>> indices = numpy.arange(a.shape[0])[numpy.in1d(a, b)]
>>> a[indices]
array([10, 7, 2])
To simplify the above, though, you could use nonzero -- this is probably the most correct approach, because it returns a tuple of uniform lists of X, Y... coordinates:
>>> numpy.nonzero(numpy.in1d(a, b))
(array([0, 3, 8]),)
Or, equivalently:
>>> numpy.in1d(a, b).nonzero()
(array([0, 3, 8]),)
The result can be used as an index to arrays of the same shape as a with no problems.
>>> a[numpy.nonzero(numpy.in1d(a, b))]
array([10, 7, 2])
But note that under many circumstances, it makes sense just to use the boolean array itself, rather than converting it into a set of non-boolean indices.
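For instance, indexing with the mask directly gives the intersecting values without computing indices at all:
>>> a[numpy.in1d(a, b)]
array([10,  7,  2])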
Finally, you can also pass the boolean array to argwhere, which produces a slightly differently-shaped result that's not as suitable for indexing, but might be useful for other purposes.
>>> numpy.argwhere(numpy.in1d(a, b))
array([[0],
[3],
[8]])
If you need to get unique values as given by intersect1d:
import numpy as np
a = np.array([range(11,21), range(11,21)]).reshape(20)
b = np.array([12, 17, 20])
print(np.intersect1d(a,b))
#unique values
inter = np.in1d(a, b)
print(a[inter])
#you can see these values are not unique
indices = np.arange(len(a))[inter]
# these are the non-unique indices
_, unique = np.unique(a[inter], return_index=True)
uniqueIndices = indices[unique]
# this grabs the unique indices
print(uniqueIndices)
print(a[uniqueIndices])
#now they are unique as you would get from np.intersect1d()
Output:
[12 17 20]
[12 17 20 12 17 20]
[1 6 9]
[12 17 20]
indices = np.argwhere(np.in1d(a,b))
For Python >= 3.5, there's another solution. Let's go through it step by step.
Based on the original code from the question
import numpy as np
a = np.array(range(11))
b = np.array([2, 7, 10])
inter = np.intersect1d(a, b)
First, we create a numpy array with zeros
c = np.zeros(len(a))
print (c)
output
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
Second, set the entries of c at the positions given by inter to 1 (this works here because the values of a equal their indices). Hence, we have
c[inter] = 1
print (c)
output
[ 0.  0.  1.  0.  0.  0.  0.  1.  0.  0.  1.]
In the last step, use np.nonzero(): it returns exactly the indices of the non-zero entries, which are the intersection indices you want.
inter_with_idx = np.nonzero(c)
print (inter_with_idx)
Final output
(array([ 2,  7, 10]),)
Reference
[1] numpy.nonzero
As of NumPy version 1.15.0, intersect1d has a return_indices option:
numpy.intersect1d(ar1, ar2, assume_unique=False, return_indices=False)
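A minimal usage sketch with the arrays from the question:

import numpy as np

a = np.arange(11)
b = np.array([2, 7, 10])
inter, a_idx, b_idx = np.intersect1d(a, b, return_indices=True)
# inter -> array([ 2,  7, 10])  common values
# a_idx -> array([ 2,  7, 10])  indices of those values in a
# b_idx -> array([0, 1, 2])     indices of those values in b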