Let's say I have a 2D numpy array with 0 and 1 as values. I want to randomly pick an index that contains a 1. Is there an efficient way to do this using numpy?
I achieved it in pure Python, but it's too slow.
Example input:
[[0, 1], [1, 0]]
output:
(0, 1)
EDIT:
For clarification: I want my function to take a 2D numpy array with values belonging to {0, 1}. The output should be a tuple (a 2D index) of a uniformly-randomly picked position whose value in the given array is 1.
EDIT2:
Using Paul H's suggestion, I came up with this:
nonzero = np.nonzero(a)
return random.choice(list(zip(*nonzero)))
But it only works with Python's random.choice, not with numpy's (np.random.choice expects a 1D array, not a list of tuples). Is there a way to optimise it further?
It's easier to get all the non-zero coordinates and sample from there:
xs, ys = np.where([[0, 1], [1, 0]])
# randomly pick one of those positions:
idx = np.random.choice(np.arange(len(xs)))
# output:
out = xs[idx], ys[idx]
You can try argwhere and permutation:
a = np.array([[0, 1], [1, 0]])
b = np.argwhere(a)                  # coordinates of all non-zero entries
tuple(np.random.permutation(b)[0])  # shuffle the rows and take the first
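Equivalently, you can pick just one random row of argwhere instead of permuting all of them, which avoids shuffling every coordinate (a minimal sketch):
import numpy as np

a = np.array([[0, 1], [1, 0]])
b = np.argwhere(a)               # coordinates of all ones, shape (k, 2)
idx = np.random.randint(len(b))  # uniform over the k candidates
out = tuple(b[idx])              # e.g. (0, 1) or (1, 0)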
Given a numpy array, say
x = np.array(
    [[0],
     [1],
     [2]]
)
I would like to find the matrix containing a-b for every possible pair a,b in x. I.e.
[[0-0, 0-1, 0-2],
 [1-0, 1-1, 1-2],
 [2-0, 2-1, 2-2]]
==
[[ 0, -1, -2],
 [+1,  0, -1],
 [+2, +1,  0]]
I am avoiding using a for loop for the sake of efficiency.
As Michael wrote, numpy broadcasting can help you with this. If you perform an operation on a vector a of shape (3, 1) and a vector b of shape (1, 3), numpy under the hood treats it as if a's single column were repeated across the columns and b's single row were repeated across the rows, which yields exactly the operation you described.
That's why Michael told you to subtract the transpose of the vector from itself to get the result you asked for: x - x.T.
Broadcasting is good because it achieves this with striding, so most of the time it uses less memory.
In the more general case, a and b do not need to have the same length for this to work. There are more details here:
https://numpy.org/doc/stable/user/basics.broadcasting.html
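A minimal sketch of this on the example above:
import numpy as np

x = np.array([[0], [1], [2]])   # shape (3, 1)
diff = x - x.T                  # (3, 1) minus (1, 3) broadcasts to (3, 3)
print(diff)
# [[ 0 -1 -2]
#  [ 1  0 -1]
#  [ 2  1  0]]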
I need to select only the non-zero 3D portions of a 3D binary array (or, alternatively, the True values of a boolean array). Currently I am able to do so with a series of for loops that use np.any; this works, but it seems awkward and slow, so I am investigating a more direct way to accomplish the task.
I am rather new to numpy. The approaches I have tried include a) np.nonzero, which returns indices that I am at a loss to use for my purposes, b) boolean array indexing, and c) boolean masks. I can generally understand each of those approaches for simple 2D arrays, but am struggling to understand the differences between them, and cannot get them to return the right values for a 3D array.
Here is my current function that returns a 3D array with nonzero values:
def real_size(arr3):
    true_0 = []
    true_1 = []
    true_2 = []
    print(f'The input array shape is: {arr3.shape}')
    for zero_ in range(arr3.shape[0]):
        if arr3[zero_].any():
            true_0.append(zero_)
    for one_ in range(arr3.shape[1]):
        if arr3[:, one_, :].any():
            true_1.append(one_)
    for two_ in range(arr3.shape[2]):
        if arr3[:, :, two_].any():
            true_2.append(two_)
    arr4 = arr3[min(true_0):max(true_0) + 1,
                min(true_1):max(true_1) + 1,
                min(true_2):max(true_2) + 1]
    print(f'The nonzero area is: {arr4.shape}')
    return arr4
# Then use it on a small test array:
test_array = np.zeros([2, 3, 4], dtype = int)
test_array[0:2, 0:2, 0:2] = 1
#The function call works and prints out as expected:
non_zero = real_size(test_array)
>> The input array shape is: (2, 3, 4)
>> The nonzero area is: (2, 2, 2)
# So, the array is correct, but likely not the best way to get there:
non_zero
>> array([[[1, 1],
[1, 1]],
[[1, 1],
[1, 1]]])
The code works appropriately, but I am using this on much larger and more complex arrays, and don't think this is an appropriate approach. Any thoughts on a more direct method to make this work would be greatly appreciated. I am also concerned about errors and the results if the input array has for example two separate non-zero 3d areas within the original array.
To clarify the problem, I need to return one or more 3D portions as one or more 3d arrays beginning with an original larger array. The returned arrays should not include extraneous zeros (or false values) in any given exterior plane in three dimensional space. Just getting the indices of the nonzero values (or vice versa) doesn't by itself solve the problem.
Assuming you want to eliminate all rows, columns, etc. that contain only zeros, you could do the following:
nz = (test_array != 0)
non_zero = test_array[nz.any(axis=(1, 2))][:, nz.any(axis=(0, 2))][:, :, nz.any(axis=(0, 1))]
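On the sample test_array from the question, the three masks evaluate like this (a quick sanity check):
import numpy as np

test_array = np.zeros([2, 3, 4], dtype=int)
test_array[0:2, 0:2, 0:2] = 1

nz = (test_array != 0)
print(nz.any(axis=(1, 2)))  # [ True  True]              -> keep both axis-0 planes
print(nz.any(axis=(0, 2)))  # [ True  True False]        -> keep the first two rows
print(nz.any(axis=(0, 1)))  # [ True  True False False]  -> keep the first two columns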
An alternative solution using np.nonzero:
i = [np.unique(_) for _ in np.nonzero(test_array)]
non_zero = test_array[i[0]][:, i[1]][:, :, i[2]]
This can also be generalized to arbitrary dimensions, but requires a bit more work (only showing the first approach here):
def real_size(arr):
    nz = (arr != 0)
    result = arr
    axes = np.arange(arr.ndim)
    for axis in range(arr.ndim):
        # reduce over all other axes to find which slices along `axis`
        # contain at least one non-zero element
        keep = nz.any(axis=tuple(np.delete(axes, axis)))
        result = result[(slice(None),) * axis + (keep,)]
    return result
non_zero = real_size(test_array)
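As a quick check, the generalized version also works on inputs of other dimensionality, e.g. a 2D array (a minimal sketch):
import numpy as np

a2 = np.array([[0, 0, 0],
               [0, 5, 0],
               [0, 0, 0]])
print(real_size(a2))
# [[5]]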
I have numpy data which I am trying to turn into contour plot data. I realize this can be done through matplotlib, but I am trying to do this with just numpy if possible.
So, say I have an array of numbers 1-10, and I want to divide the array according to contour "levels". I want to turn the input array into an array of boolean arrays, each the size of the input, with a 1/True for any data point in that contour level and 0/False everywhere else.
For example, suppose the input is:
[1.2,2.3,3.4,2.5]
And the levels are [1,2,3,4],
then the return should be:
[[1,0,0,0],[0,1,0,1],[0,0,1,0]]
So here is the start of an example I whipped up:
import numpy as np
a = np.random.rand(3,3)*10
print(a)
b = np.zeros(54).reshape((6,3,3))
levs = np.arange(6)
#This is as far as I've gotten:
bins = np.digitize(a, levs)
print(bins)
I can use np.digitize to find out which level each value in a should belong to, but that's as far as I get. I'm fairly new to numpy and this really has me scratching my head. Any help would be greatly appreciated, thanks.
We can read off np.digitize's output the index of the level bin each element of a falls into. Those indices say which slice along the first axis of the output should be set to True. So, we could either set up the output array and assign into it with fancy indexing, or compare the indices against an outer range, leveraging broadcasting.
Hence, with broadcasting, one that covers generic n-dim arrays -
idx = np.digitize(a, levs) - 1
out = idx == np.arange(idx.max()+1).reshape([-1] + [1]*idx.ndim)
With the indexing-based one, re-using idx from the previous method, it would be -
# https://stackoverflow.com/a/46103129/ #Divakar
def all_idx(idx, axis):
    grid = np.ogrid[tuple(map(slice, idx.shape))]
    grid.insert(axis, idx)
    return tuple(grid)

out = np.zeros((idx.max()+1,) + idx.shape, dtype=int)  # dtype=bool for a boolean array
out[all_idx(idx, axis=0)] = 1
Sample run -
In [77]: a = np.array([1.2,2.3,3.4,2.5])
In [78]: levs = np.array([1,2,3,4])
In [79]: idx = np.digitize(a, levs)-1
...: out = idx==(np.arange(idx.max()+1)).reshape([-1,]+[1]*idx.ndim)
In [80]: out.astype(int)
Out[80]:
array([[1, 0, 0, 0],
[0, 1, 0, 1],
[0, 0, 1, 0]])
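The indexing-based version produces the same output on this sample (a quick check, reusing the all_idx helper defined above):
import numpy as np

a = np.array([1.2, 2.3, 3.4, 2.5])
levs = np.array([1, 2, 3, 4])
idx = np.digitize(a, levs) - 1   # [0 1 2 1]

out = np.zeros((idx.max()+1,) + idx.shape, dtype=int)
out[all_idx(idx, axis=0)] = 1
print(out)
# [[1 0 0 0]
#  [0 1 0 1]
#  [0 0 1 0]]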
Forgive me for a vague title. I honestly don't know which title will suit this question. If you have a better title, let's change it so that it will be apt for the problem at hand.
The problem.
Let's say result is a 2D array and values is a 1D array. values holds some values associated with each element in result. The mapping of an element in values to result is stored in x_mapping and y_mapping. A position in result can be associated with different values. Now, I have to find the sum of the values grouped by associations.
An example for better clarification.
result array:
[[0, 0],
[0, 0],
[0, 0],
[0, 0]]
values array:
[ 1., 2., 3., 4., 5., 6., 7., 8.]
Note: Here result and values have the same number of elements. But it might not be the case. There is no relation between the sizes at all.
x_mapping and y_mapping have mappings from 1D values to 2D result. The sizes of x_mapping, y_mapping and values will be the same.
x_mapping - [0, 1, 0, 0, 0, 0, 0, 0]
y_mapping - [0, 3, 2, 2, 0, 3, 2, 1]
Here, the 1st value (values[0]) has x = 0 and y = 0 (x_mapping[0] and y_mapping[0]), and hence is associated with result[0, 0]. If we are counting the number of associations, the element at result[0, 0] will be 2, as the 1st and 5th values are both associated with result[0, 0]. If we are taking the sum, result[0, 0] = values[0] + values[4], which is 6.
Current solution
# Initialisation. No connection with the solution.
result = np.zeros([4,2], dtype=np.int16)
values = np.linspace(start=1, stop=8, num=8)
y_mapping = np.random.randint(low=0, high=values.shape[0], size=values.shape[0])
x_mapping = np.random.randint(low=0, high=values.shape[1], size=values.shape[0])
# Summing the values associated with x,y (current solution.)
for i in range(values.size):
    x = x_mapping[i]
    y = y_mapping[i]
    result[-y, x] = result[-y, x] + values[i]
The result:
[[6, 0],
[ 6, 2],
[14, 0],
[ 8, 0]]
Failed solution; but why?
test_result = np.zeros_like(result)
test_result[-y_mapping, x_mapping] = test_result[-y_mapping, x_mapping] + values # solution
To my surprise, elements are overwritten in test_result. The values in test_result:
[[5, 0],
[6, 2],
[7, 0],
[8, 0]]
Question
1. Why, in the second solution, is every element overwritten?
As #Divakar has pointed out in the comment in his answer -
NumPy doesn't assign accumulated/summed values when the indices are repeated in test_result[-y_mapping, x_mapping] =. It randomly assigns from one of the instances.
2. Is there any numpy way to do this, i.e. without looping? I'm looking for some speed optimization.
Approach #2 in #Divakar's answer gives me good results. For 23315 associations, for loop took 50 ms while Approach #1 took 1.85 ms. Beating all these, Approach #2 took 668 µs.
Side note
I'm using Numpy version 1.14.3 with Python 3.5.2 on an i7 processor.
Approach #1
The most intuitive one would be np.add.at, which accumulates over those repeated indices -
np.add.at(result, (-y_mapping, x_mapping), values)
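For instance, on the example data from the question (a minimal sketch; a float result array is used here so the float values accumulate without casting issues):
import numpy as np

values = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])

result = np.zeros((4, 2))
np.add.at(result, (-y_mapping, x_mapping), values)
print(result)
# [[ 6.  0.]
#  [ 6.  2.]
#  [14.  0.]
#  [ 8.  0.]]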
Approach #2
We need to perform binned summations owing to the possibly repeated nature of the x, y indices. Hence, another way is to use NumPy's binned-summation function, np.bincount, with an implementation like so -
# Get linear index equivalents of the x and y indices into the result array
m, n = result.shape
out_dtype = result.dtype
lidx = ((-y_mapping) % m) * n + x_mapping

# Get binned summations of values, using the linear indices as bins
binned_sums = np.bincount(lidx, values, minlength=m*n)

# Finally, add into the result array
result += binned_sums.astype(out_dtype).reshape(m, n)
If you are always starting off with a zeros array for result, the last step could be made more performant with -
result = binned_sums.astype(out_dtype).reshape(m,n)
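As a quick check on the same example data, the linear indices and binned sums come out as follows (a minimal sketch):
import numpy as np

values = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
m, n = 4, 2

lidx = ((-y_mapping) % m) * n + x_mapping   # [0 3 4 4 0 2 4 6]
binned_sums = np.bincount(lidx, values, minlength=m*n)
print(binned_sums.reshape(m, n))
# [[ 6.  0.]
#  [ 6.  2.]
#  [14.  0.]
#  [ 8.  0.]]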
I guess you meant to write:
y_mapping = np.random.randint(low=0, high=result.shape[0], size=values.shape[0])
x_mapping = np.random.randint(low=0, high=result.shape[1], size=values.shape[0])
With that correction, the code works for me as expected.
I am using numpy in Python. I have a 1D (nx1) array and a 2D (nxm) array. I used argsort to get the sort indices of the 1D array. Now I want to use those indices to sort the columns of my 2D (nxm) array.
How do I do that?
For example:
>>>array1d = np.array([1, 3, 0])
>>>array2d = np.array([[1,2,3],[4,5,6]])
>>>array1d_indice = np.argsort(array1d)
array([2, 0, 1], dtype=int64)
I want to use array1d_indice to sort the columns of array2d to get:
[[3, 1, 2],
[6, 4, 5]]
Any easier way to achieve this is also welcome.
If what you mean is that you want the columns sorted based on the vector, then you use argsort on the vector:
vi = np.argsort(vector)
Then, to arrange the columns of array in the right order:
sorted_cols = array[:, vi]   # named to avoid shadowing the built-in sorted
To reorder rows instead, index the first axis: array[vi, :].
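Applied to the arrays from the question (a quick check):
import numpy as np

array1d = np.array([1, 3, 0])
array2d = np.array([[1, 2, 3],
                    [4, 5, 6]])

vi = np.argsort(array1d)   # array([2, 0, 1])
print(array2d[:, vi])
# [[3 1 2]
#  [6 4 5]]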