How do I change values at given indexes in an array using Numba (Python)?

I have a function in which I do some operations and want to speed it up with Numba. In my code, changing the values in an array with advanced indexing does not work. I think the Numba documents say as much. So what is a workaround for something like numpy.put()?
Here is a short example of what I want to do:

#example array
array([[ 0,  1,  2],
       [ 0,  2, -1],
       [ 0,  3, -1]])

Changing the values at given indexes with any method working in Numba, to get:

changed values at: [0,0], [1,2], [2,1]

#changed example array by given indexes with one given value (10)
array([[10,  1,  2],
       [ 0,  2, 10],
       [ 0, 10, -1]])
Here is what I did in plain Python, which does not work with Numba. indexList is a tuple, which works with numpy.take(). This is the working Python example; the values in the array change to 100:
x = np.zeros((151, 151))
print(x.ndim)
indexList = np.array([[0, 1, 3], [0, 1, 2]])
indexList = tuple(indexList)

def change(xx, filter_list):
    xx[filter_list] = 100
    return xx

Z = change(x, indexList)
Now using @jit on the function:

from numba import jit

@jit
def change(xx, filter_list):
    xx[filter_list] = 100
    return xx

Z = change(x, indexList)
Compilation is falling back to object mode WITH looplifting enabled because Function "change" failed type inference due to: No implementation of function Function(<built-in function setitem>) found for signature: setitem(array(float64, 2d, C), UniTuple(array(int32, 1d, C) x 2), Literal[int](100))
This error comes up, so I need a workaround for it. numpy.put() is not supported by Numba.
I would be grateful for any ideas.
Thank you.

If it's not a problem for you to keep indexList as an array, you can use it together with for loops in the change function to make it compatible with Numba:

from numba import njit

indexList = np.array([[0, 1, 3], [0, 1, 2]]).T

@njit
def change(xx, filter_list):
    for y, x in filter_list:
        xx[y, x] = 100
    return xx

change(x, indexList)
Note that indexList has to be transposed in order to have the y, x coordinates along the 1st axis. In other words, it has to have a shape of (n, 2) rather than (2, n) for the n points to be changed. Effectively it is now a list of coordinates: [[0, 0], [1, 1], [3, 2]].
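If you would rather keep the original (2, n) layout, a small variant can index the two coordinate rows directly instead of transposing (a minimal sketch, assuming filter_list is a 2D integer array; change_no_transpose is a hypothetical name):

from numba import njit

@njit
def change_no_transpose(xx, filter_list):
    # filter_list has shape (2, n): row indices in filter_list[0],
    # column indices in filter_list[1]
    for k in range(filter_list.shape[1]):
        xx[filter_list[0, k], filter_list[1, k]] = 100
    return xx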

@mandulaj posted the way to go. Here is a slightly different way I went before mandulaj gave his answer.
With this function I get a deprecation warning, so it is best to go with @mandulaj's approach, and don't forget to transpose the indexList.
@jit
def change_arr(arr, idx, val):  # change values in array at the given index pairs to val
    for i, item in enumerate(idx):
        arr[item[0], item[1]] = val
    return arr
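The deprecation warning is most likely Numba announcing that @jit's silent fall-back to object mode is going away; compiling explicitly in nopython mode avoids that fallback entirely (a sketch, assuming idx is an (n, 2) integer array):

from numba import njit

@njit
def change_arr(arr, idx, val):
    # idx holds [row, col] pairs; assign val at each position
    for i in range(idx.shape[0]):
        arr[idx[i, 0], idx[i, 1]] = val
    return arr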

Related

Combining 2 numpy 3d arrays into overall shape

I have two 3D np.arrays containing numbers.
Both arrays can have different shapes (different dimensions).
My objective is to generate a 3D np.array:
whose shape contains both other shapes (i.e. (1,1,3) & (1,2,1) => (1,2,3)),
where each element is the sum of the elements of the parent 3D arrays that have the same coordinates (assuming 0 when a coordinate does not exist).
To summarize, I would like to obtain the following:
a = np.array([[[0, 0, 0, 1]]])
b = np.array([[[0],
               [1]]])
addition(a, b)
>>> array([[[0, 0, 0, 1],
            [1, 0, 0, 0]]])
Thanks in advance for your help
EDIT: I found something better:

def addition(a, b):
    c = np.zeros(np.max([np.shape(a), np.shape(b)], axis=0), dtype=int)
    c[np.where(a != 0)] += a[np.where(a != 0)]
    c[np.where(b != 0)] += b[np.where(b != 0)]
    return c
OLD:
After multiple searches, I haven't found a good way to do it without iterating over the whole arrays:

def addition(a, b):
    c = np.zeros(np.max([np.shape(a), np.shape(b)], axis=0), dtype=int)
    for index, element in np.ndenumerate(a):
        c[index] += element
    for index, element in np.ndenumerate(b):
        c[index] += element
    return c

I'll continue to search for a better way to do it.
EDIT 2:
I added dtype=int, because you seem to want to keep ints instead of floats.
Have fun
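As a further alternative, the zero-padding semantics can also be expressed with plain slice assignment, which avoids both ndenumerate and the nonzero masks (a sketch of my own, not from the original answer, assuming both inputs are 3D):

import numpy as np

def addition(a, b):
    # Output shape is the elementwise maximum of the two input shapes
    c = np.zeros(np.maximum(a.shape, b.shape), dtype=int)
    # Each input is added into the "corner" of c; everything else stays 0
    c[:a.shape[0], :a.shape[1], :a.shape[2]] += a
    c[:b.shape[0], :b.shape[1], :b.shape[2]] += b
    return c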

Randomly choose index based on condition in numpy

Let's say I have a 2D numpy array with 0 and 1 as values. I want to randomly pick an index that contains a 1. Is there an efficient way to do this using numpy?
I achieved it in pure Python, but it's too slow.
Example input:
[[0, 1], [1, 0]]
output:
(0, 1)
EDIT:
For clarification: I want my function to take a 2D numpy array with values belonging to {0, 1}, and to output a tuple (a 2D index) of a uniformly randomly picked position in the given array whose value equals 1.
EDIT2:
Using Paul H's suggestion, I came up with this:

nonzero = np.nonzero(a)
return random.choice(list(zip(*nonzero)))

But it only works with Python's random.choice, not with numpy's. Is there a way to optimise it further?
It's easier to get all the non-zero coordinates and sample from them:

xs, ys = np.where([[0, 1], [1, 0]])
# randomly pick a position:
idx = np.random.choice(np.arange(len(xs)))
# output:
out = xs[idx], ys[idx]
You may try argwhere and permutation
a = np.array([[0, 1], [1, 0]])
b = np.argwhere(a)
tuple(np.random.permutation(b)[0])
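For reuse, either approach can be wrapped in a small helper (a sketch; random_one_index is a hypothetical name, and default_rng assumes NumPy >= 1.17):

import numpy as np

def random_one_index(a, rng=None):
    # Return a uniformly random (row, col) position where a == 1
    rng = np.random.default_rng() if rng is None else rng
    xs, ys = np.nonzero(a)
    i = rng.integers(len(xs))
    return xs[i], ys[i]

random_one_index(np.array([[0, 1], [1, 0]]))  # e.g. (0, 1)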

Numpy: slicing a volume using a matrix

I have a 3D numpy volume and a 2D numpy matrix:
foo = np.random.rand(20,20,10)
amin = np.argmin(foo, axis=2)
I would like to use the amin variable to slice the volume in the same way np.min would do:

grid = np.indices(amin.shape)
idcs = np.stack([grid[0], grid[1], amin])
fmin = foo[idcs[0], idcs[1], idcs[2]]
The problem is that I can't use np.min, because I also need the amin neighbors for interpolation reasons, something that I would obtain by doing:
pre = foo[idcs[0], idcs[1], np.clip(idcs[2]-1, 0, 9)]
post = foo[idcs[0], idcs[1], np.clip(idcs[2]+1, 0, 9)]
Is there a more pythonic (numpyic) way to do this without creating a grid? Something like:

foo[:, :, amin-1:amin+1]

that actually works (I would take care of margin handling with early padding).
You could use np.ogrid instead of np.indices to save memory.
np.ogrid returns an "open" meshgrid:
In [24]: np.ogrid[:5, :5]
Out[24]:
[array([[0],
        [1],
        [2],
        [3],
        [4]]), array([[0, 1, 2, 3, 4]])]
ogrid returns component arrays which can be used as indices
in the same way as one would use np.indices.
NumPy will automatically broadcast the values in the open mesh when they are used as indices:
In [49]: (np.indices((5,5)) == np.broadcast_arrays(*np.ogrid[:5, :5])).all()
Out[49]: True
import numpy as np
h, w, d = 20, 20, 10
foo = np.random.rand(h, w, d)
amin = np.argmin(foo, axis=2)
X, Y = np.ogrid[:h, :w]
amins = np.stack([np.clip(amin+i, 0, d-1) for i in [-1, 0, 1]])
fmins = foo[X, Y, amins]
It's better to store fmin, pre and post in one array, fmins,
since some NumPy/SciPy operations (like argmin or griddata) may need the values in one array. If, later, you need to operate on the three components individually, you can always access them using fmins[i] or define
pre, fmin, post = fmins
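As a side note, newer NumPy (1.15+) has np.take_along_axis, which expresses the same gather without building index grids at all (a sketch of mine, not part of the original answer):

import numpy as np

foo = np.random.rand(20, 20, 10)
amin = np.argmin(foo, axis=2)

# take_along_axis needs the index array to have the same ndim as foo
idx = amin[..., None]   # shape (20, 20, 1)
d = foo.shape[2]
fmin = np.take_along_axis(foo, idx, axis=2)[..., 0]
pre  = np.take_along_axis(foo, np.clip(idx - 1, 0, d - 1), axis=2)[..., 0]
post = np.take_along_axis(foo, np.clip(idx + 1, 0, d - 1), axis=2)[..., 0]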

Numpy : Grouping/ binning values based on associations

Forgive me for a vague title. I honestly don't know which title will suit this question. If you have a better title, let's change it so that it will be apt for the problem at hand.
The problem.
Let's say result is a 2D array and values is a 1D array. values holds some values associated with each element of result. The mapping of an element in values to a position in result is stored in x_mapping and y_mapping. A position in result can be associated with multiple values. Now, I have to find the sum of the values grouped by associations.
An example for better clarification.
result array:

[[0, 0],
 [0, 0],
 [0, 0],
 [0, 0]]

values array:

[ 1., 2., 3., 4., 5., 6., 7., 8.]

Note: Here result and values have the same number of elements, but that might not be the case. There is no relation between the sizes at all.
x_mapping and y_mapping hold the mappings from the 1D values to the 2D result. The sizes of x_mapping, y_mapping and values will be the same.

x_mapping - [0, 1, 0, 0, 0, 0, 0, 0]
y_mapping - [0, 3, 2, 2, 0, 3, 2, 1]
Here, the 1st value (values[0]) has x as 0 and y as 0 (x_mapping[0] and y_mapping[0]) and hence is associated with result[0, 0]. If we are counting the number of associations, the element at result[0, 0] will be 2, as the 1st and 5th values are associated with result[0, 0]. If we are taking the sum, result[0, 0] = values[0] + values[4], which is 6.
Current solution
# Initialisation. No connection with the solution.
result = np.zeros([4, 2], dtype=np.int16)
values = np.linspace(start=1, stop=8, num=8)
y_mapping = np.random.randint(low=0, high=values.shape[0], size=values.shape[0])
x_mapping = np.random.randint(low=0, high=values.shape[1], size=values.shape[0])

# Summing the values associated with x, y (current solution.)
for i in range(values.size):
    x = x_mapping[i]
    y = y_mapping[i]
    result[-y, x] = result[-y, x] + values[i]
The result:

[[ 6,  0],
 [ 6,  2],
 [14,  0],
 [ 8,  0]]
Failed solution; but why?

test_result = np.zeros_like(result)
test_result[-y_mapping, x_mapping] = test_result[-y_mapping, x_mapping] + values # solution

To my surprise, elements are overwritten in test_result. Values at test_result:

[[5, 0],
 [6, 2],
 [7, 0],
 [8, 0]]
Question
1. Why, in the second solution, every element is overwritten?
As @Divakar has pointed out in the comment on his answer:
NumPy doesn't assign accumulated/summed values when the indices are repeated in test_result[-y_mapping, x_mapping] =. It randomly assigns from one of the instances.
2. Is there any Numpy way to do this? That is without looping? I'm looking for some speed optimization.
Approach #2 in @Divakar's answer gives me good results. For 23315 associations, the for loop took 50 ms while Approach #1 took 1.85 ms. Beating all these, Approach #2 took 668 µs.
Side note
I'm using Numpy version 1.14.3 with Python 3.5.2 on an i7 processor.
Approach #1
The most intuitive one would be with np.add.at for those repeated indices:
np.add.at(result, [-y_mapping, x_mapping], values)
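To see why add.at is needed here at all, compare it with plain fancy-indexed assignment on repeated indices (a minimal sketch):

import numpy as np

a = np.zeros(3)
idx = np.array([0, 0, 2])

a[idx] += 1           # buffered: the repeated index 0 is written only once
print(a)              # [1. 0. 1.]

a = np.zeros(3)
np.add.at(a, idx, 1)  # unbuffered: the repeated index 0 accumulates
print(a)              # [2. 0. 1.]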
Approach #2
We need to perform binned summations owing to the possibly repeated nature of the x, y indices. Hence, another way could be to use NumPy's binned summation function, np.bincount, with an implementation like so:
# Get linear index equivalents of the x and y indices into the result array
m, n = result.shape
out_dtype = result.dtype
lidx = ((-y_mapping) % m) * n + x_mapping

# Get binned summations of values, using the linear indices as bins
binned_sums = np.bincount(lidx, values, minlength=m*n)

# Finally, add into the result array
result += binned_sums.astype(out_dtype).reshape(m, n)
If you are always starting off with a zeros array for result, the last step could be made more performant with -
result = binned_sums.astype(out_dtype).reshape(m,n)
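A quick sanity check that the binned summation matches the original loop (a sketch of mine with arbitrary sizes):

import numpy as np

m, n = 4, 2
values = np.linspace(1, 8, 8)
y_mapping = np.random.randint(0, m, values.size)
x_mapping = np.random.randint(0, n, values.size)

# Reference: the original loop
expected = np.zeros((m, n))
for i in range(values.size):
    expected[-y_mapping[i], x_mapping[i]] += values[i]

# Binned summation, as in Approach #2
lidx = ((-y_mapping) % m) * n + x_mapping
got = np.bincount(lidx, values, minlength=m*n).reshape(m, n)

assert np.allclose(expected, got)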
I guess you meant to write:
y_mapping = np.random.randint(low=0, high=result.shape[0], size=values.shape[0])
x_mapping = np.random.randint(low=0, high=result.shape[1], size=values.shape[0])
With that correction, the code works for me as expected.

Is there a difference in the way we access elements of a list comprehension and the elements of a numpy array

I am working on a genetic algorithm code. I am fairly new to python.
My code snippet is as follows:
import numpy as np

pop_size = 10  # Population size
noi = 2        # Number of Iterations
M = 2          # Number of Phases in the Data

alpha = [np.random.randint(0, 64, size=pop_size)] * M
phi = [np.random.randint(0, 64, size=pop_size)] * M
reduced_tensor = [np.zeros((pop_size, 3, 3))] * M

for n_i in range(noi):
    alpha_en = [(2*np.pi*alpha/63.00) for alpha in alpha]
    phi_en = [(phi/63.00) for phi in phi]
    for i in range(M):
        for j in range(pop_size):
            reduced_tensor[i][j] = [[1, 0, 0],
                                    [0, phi_en[i][j], 0],
                                    [0, 0, 0]]
Here I have a list of numpy arrays. The variable alpha is a list containing two numpy arrays. How do I use a list comprehension in this case? I want to create a similar list alpha_en which operates on every element of alpha. How do I do that? I know my current code is wrong; it was just trial and error.
What does 'for alpha in alpha' mean (the alpha_en line)? This line doesn't give any error, but it also doesn't give the desired output. It changes the dimension and value of alpha.
The variable reduced_tensor is a list of arrays of 3x3 matrices, i.e., four dimensions in total. How do I differentiate between indexing a list and indexing a numpy array? I want to perform various operations on a list of matrices; in this case, assigning the values of phi_en to one of the elements of each matrix in reduced_tensor (as shown in the code). How should I do it efficiently? I think my current code is wrong, if not just confusing.
There is some questionable programming in these two lines:

alpha = [np.random.randint(0, 64, size=pop_size)] * M
...
alpha_en = [(2*np.pi*alpha/63.00) for alpha in alpha]

The first makes one array and then makes a list with M pointers to the same thing; not M copies of the random array, but M references to one. If I were to change one element of alpha, I'd change them all. I don't see the point of this type of construction.
The [... for alpha in alpha] works because the two uses of alpha are different. At least in newer Pythons, the i in [i*3 for i in range(3)] does not 'leak out' of the comprehension. That said, I would not approve of that variable naming; at the very least it is confusing to readers.
The arrays in alpha_en are separate. Values are derived from the arrays in alpha, but they are new.
for a in alphas:
    a *= 2

would modify each array in alphas; however, due to how alphas is constructed, this ends up multiplying the same array many times.
reduced_tensor = [np.zeros((pop_size,3,3))] * M

has the same problem; it's a list of M references to the same 3D array.

reduced_tensor[i][j]

references the i-th entry in that list, and the j-th 'row' of that array. I like to use

reduced_tensor[i][j,:,:]

to make the expected dimensions of the result clearer to me and my readers.
The iteration over M does nothing for you; it just repeats the same assignment M times.
At the root of your problems is that use of list replication.
In [30]: x=[np.arange(3)]*3
In [31]: x
Out[31]: [array([0, 1, 2]), array([0, 1, 2]), array([0, 1, 2])]
In [32]: [id(i) for i in x]
Out[32]: [3036895536, 3036895536, 3036895536]
In [33]: x[0] *= 10
In [34]: x
Out[34]: [array([ 0, 10, 20]), array([ 0, 10, 20]), array([ 0, 10, 20])]
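For completeness, a minimal way to get M independent arrays is a list comprehension rather than replication (a sketch of mine, not from the original answer):

import numpy as np

M, pop_size = 2, 10

# The comprehension calls the constructor M times, so each entry is a distinct array
alpha = [np.random.randint(0, 64, size=pop_size) for _ in range(M)]
reduced_tensor = [np.zeros((pop_size, 3, 3)) for _ in range(M)]

assert len({id(a) for a in alpha}) == M  # all distinct objects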
