numpy 2d array recursion - python

I have a numpy 2D array and I want to run a function that checks whether the values of neighbouring pixels are lower than a given value (the start value). If they are, I call the function recursively on each such pixel with the value from the first one.
It works fine for small arrays, but with a big one I get memory errors.
I'm wondering if there is a better way to do this.
As a result, I'm trying to get a numpy 2D array with the values that meet this criterion.
code:
def check_neighbours(self, point, arr, water_level):
    # Offsets for the eight neighbouring cells
    locs = [[-1, -1], [-1, 0], [-1, 1], [0, -1],
            [0, 1], [1, 1], [1, -1], [1, 0]]
    if point in self.checked_cells:
        return True
    self.checked_cells.append(point)
    neighbours = [self.get_locs(x, point) for x in locs]
    for i in neighbours:
        n_point = arr[i[0], i[1]]
        if n_point <= water_level:
            self.check_neighbours(i, arr, water_level)

self.check_neighbours([10, 20], arr_2d, 70)

Thanks everyone for the insights. It turned out I was looking for something like a flood fill algorithm.
To avoid the stack overflow I had to use a queue within a while loop.
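For reference, a minimal iterative sketch of that approach (the start point and eight-neighbour offsets follow the question; the function name and bookkeeping are illustrative, not the exact final code):

from collections import deque
import numpy as np

def flood_fill(arr, start, water_level):
    # A deque replaces the recursion stack, so large regions
    # no longer overflow it.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    filled = np.zeros(arr.shape, dtype=bool)
    queue = deque([tuple(start)])
    while queue:
        r, c = queue.popleft()
        if filled[r, c]:
            continue
        filled[r, c] = True
        for dr, dc in offsets:
            nr, nc = r + dr, c + dc
            if (0 <= nr < arr.shape[0] and 0 <= nc < arr.shape[1]
                    and not filled[nr, nc] and arr[nr, nc] <= water_level):
                queue.append((nr, nc))
    return filled

# e.g. flooded = flood_fill(arr_2d, (10, 20), 70)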

Related

I want to convert a 3D numpy array from the format [g, list([f])] to [[g, f]]

I'm currently doing some work to extract NoData values from a gridded satellite image. The image is presented as a 2D array, where the inner array is every pixel value in a given row from left to right, and the outer array is every row in the image from top to bottom.
Any advice on this?
I have built the following functions:
from more_itertools import locate

def find_indices(liste, item):
    indices = locate(liste, lambda x: x == item)
    return list(indices)

def find_indices2(liste, item):
    indices = locate(liste, lambda x: item in x)
    return list(indices)
and I have built two separate arrays of the index positions of:
a) the rows containing a '0' value in them (all of them). This is a 1D array marked as 'f'
b) the pixels with a '0' value within their given row. This is a 2D array, marked as 'g'
Finally, I carried out the following to merge my two arrays.
h = np.dstack((g, f))
Which gives me a 3D array of the form [g, list([f])]. I.e. [[0, list([0, 1, 2, 3, 4, 5...])], [1, list([0, 1, 2, 3, 4, 5...])]].
I want to convert this array into the form [[g, f]]. I.e. [[0, 0], [0, 1], [0, 2], [0, 3], [0, 4]...]. This will essentially give me a set of 2D co-ordinates for each NoData pixel, which I can then apply to a second satellite image to mask it, turn both satellite images into arrays of the same length, and run a regression on them.
Assuming I understood correctly what you mean, you could do something like this to convert your data:
import numpy as np

# dtype=object is needed because the inner lists have different lengths
data = np.array([[0, list([0, 1, 2, 3])], [1, list([0, 1, 2])]], dtype=object)
for i in range(data.shape[0]):
    converted = np.asarray(np.meshgrid(data[i][0], data[i][1])).T.reshape(-1, 2)
    print(converted)
    # and you could vstack here, for example
This would give the output:
[[0 0]
 [0 1]
 [0 2]
 [0 3]]
[[1 0]
 [1 1]
 [1 2]]
This can surely be done faster and more efficiently, but you didn't provide exact information about the data you start with, so I'm just addressing the conversion part of the question. I think it's a bad idea to store lists inside a numpy array in the first place, especially when their lengths vary.
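As a side note (this assumes you still have the original 2D image, which wasn't shown in the question): np.argwhere gives the (row, col) pairs in one step, without building f and g at all:

import numpy as np

# Hypothetical `image`: the 2D pixel array described in the question.
image = np.array([[0, 5, 0],
                  [7, 0, 9]])
coords = np.argwhere(image == 0)  # one [row, col] pair per NoData pixel
# coords -> [[0, 0], [0, 2], [1, 1]]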

How to make a gradient line between two points in numpy array?

I hope to generate a gradient line between two points in a numpy array. For example, if I have a numpy array like
[[1, 0, 0, 0],
 [0, 0, 0, 0],
 [0, 0, 0, 0],
 [0, 0, 0, 4]]
What I hope to get is to find the line between [0, 0] and [3, 3] and put a linear gradient along it. So I hope to make an array like
[[1, 0, 0, 0],
 [0, 2, 0, 0],
 [0, 0, 3, 0],
 [0, 0, 0, 4]]
The tricky part is that the matrix may not be a perfect n×n. I don't care if some lines have two non-zero elements (because we cannot get a perfect diagonal for an m×n matrix), and elements on the same line can have the same value in my case.
I am wondering, is there an efficient way to make this happen?
You can use np.fill_diagonal:
np.fill_diagonal(arr, np.arange(arr[0,0], arr[-1,-1]+1))
Output:
array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])
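np.fill_diagonal only covers the square case, though. For an m×n array, one possible sketch (the helper name is mine) is to sample the line with np.linspace and round to the nearest cell, which naturally repeats values where a row or column spans several steps:

import numpy as np

def gradient_line(arr, start, end, v_start, v_end):
    # One step per cell along the longer axis of the line.
    steps = max(abs(end[0] - start[0]), abs(end[1] - start[1])) + 1
    rows = np.round(np.linspace(start[0], end[0], steps)).astype(int)
    cols = np.round(np.linspace(start[1], end[1], steps)).astype(int)
    arr[rows, cols] = np.linspace(v_start, v_end, steps)
    return arr

arr = np.zeros((4, 6), dtype=int)
gradient_line(arr, (0, 0), (3, 5), 1, 4)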

Randomly choose index based on condition in numpy

Let's say I have a 2D numpy array with 0 and 1 as values. I want to randomly pick an index that contains a 1. Is there an efficient way to do this using numpy?
I achieved it in pure python, but it's too slow.
Example input:
[[0, 1], [1, 0]]
output:
(0, 1)
EDIT:
For clarification: my function gets a 2D numpy array with values belonging to {0, 1}. I want the output to be a tuple (a 2D index) of a randomly (uniformly) picked cell from the given array whose value is 1.
EDIT2:
Using Paul H's suggestion, I came up with this:
nonzero = np.nonzero(a)
return random.choice(list(zip(*nonzero)))
But it doesn't work with numpy's random choice, only with Python's. Is there a way to optimise it further?
It's easier to get all the non-zero coordinates and sample from there:
xs,ys = np.where([[0, 1], [1, 0]])
# randomly pick a number:
idx = np.random.choice(np.arange(len(xs)))
# output:
out = xs[idx], ys[idx]
You may try argwhere and permutation
a = np.array([[0, 1], [1, 0]])
b = np.argwhere(a)
tuple(np.random.permutation(b)[0])
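Combining both answers into one compact variant (using the newer Generator API, which should behave the same):

import numpy as np

a = np.array([[0, 1], [1, 0]])
coords = np.argwhere(a)  # all (row, col) positions that hold a 1
rng = np.random.default_rng()
out = tuple(coords[rng.integers(len(coords))])  # uniform pick, e.g. (0, 1)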

What are the efficient ways to assign values to 2D numpy arrays as functions of indicies

It may be a stupid question but I couldn't find a similar question asked (for now).
For example, I define a function called f(x, y):
def f(x, y):
    return x + y
Now I want to output a 2D numpy array, the value of an element is equal to its indices summed, for example, if I want a 2x2 array:
arr = [[0, 1],
       [1, 2]]
If I want a 3x3 array, then the output should be:
arr = [[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]]
It's not efficient to assign the values one by one, especially when the array is large, say 10000×10000; that also wastes numpy's speed. Although it sounds quite basic, I can't think of a simple and quick solution. What is the most common and efficient way to do it?
By the way, summing the indices is just an example. I hope the method can also be generalised to arbitrary functions like, say,
def f(x, y):
    return np.cos(x) + np.sin(y)
Or even to higher dimensional arrays, like 4x4 arrays.
You can use numpy.indices, which returns an array representing the indices of a grid; you'll just need to sum along the 0 axis:
>>> a = np.random.random((2,2))
>>> np.indices(a.shape).sum(axis=0) # array([[0, 1], [1, 2]])
>>> a = np.random.random((3,3))
>>> np.indices((3,3)).sum(axis=0) #array([[0, 1, 2], [1, 2, 3], [2, 3, 4]])
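For the generalised case, np.fromfunction passes the index grids straight to your function, so any vectorised f works unchanged; a quick sketch:

import numpy as np

def f(x, y):
    return np.cos(x) + np.sin(y)

# np.fromfunction builds index arrays of the given shape and calls f on them.
arr = np.fromfunction(f, (3, 3))
# Equivalent to: i, j = np.indices((3, 3)); arr = f(i, j)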

Numpy : Grouping/ binning values based on associations

Forgive me for a vague title. I honestly don't know which title will suit this question. If you have a better title, let's change it so that it will be apt for the problem at hand.
The problem.
Let's say result is a 2D array and values is a 1D array. values holds some values associated with each element in result. The mapping of an element in values to result is stored in x_mapping and y_mapping. A position in result can be associated with different values. Now, I have to find the sum of the values grouped by associations.
An example for better clarification.
result array:
[[0, 0],
 [0, 0],
 [0, 0],
 [0, 0]]
values array:
[ 1., 2., 3., 4., 5., 6., 7., 8.]
Note: Here result and values have the same number of elements. But it might not be the case. There is no relation between the sizes at all.
x_mapping and y_mapping have mappings from 1D values to 2D result. The sizes of x_mapping, y_mapping and values will be the same.
x_mapping - [0, 1, 0, 0, 0, 0, 0, 0]
y_mapping - [0, 3, 2, 2, 0, 3, 2, 1]
Here, the 1st value (values[0]) has x as 0 and y as 0 (x_mapping[0] and y_mapping[0]) and hence is associated with result[0, 0]. If we are counting the number of associations, the element at result[0, 0] will be 2, as the 1st and 5th values are associated with result[0, 0]. If we are taking the sum, result[0, 0] = values[0] + values[4], which is 6.
Current solution
# Initialisation. No connection with the solution.
result = np.zeros([4,2], dtype=np.int16)
values = np.linspace(start=1, stop=8, num=8)
y_mapping = np.random.randint(low=0, high=values.shape[0], size=values.shape[0])
x_mapping = np.random.randint(low=0, high=values.shape[1], size=values.shape[0])
# Summing the values associated with x,y (current solution.)
for i in range(values.size):
    x = x_mapping[i]
    y = y_mapping[i]
    result[-y, x] = result[-y, x] + values[i]
The result:
[[ 6,  0],
 [ 6,  2],
 [14,  0],
 [ 8,  0]]
Failed solution; but why?
test_result = np.zeros_like(result)
test_result[-y_mapping, x_mapping] = test_result[-y_mapping, x_mapping] + values # solution
To my surprise, elements are overwritten in test_result. The values in test_result:
[[5, 0],
 [6, 2],
 [7, 0],
 [8, 0]]
Question
1. Why, in the second solution, is every element overwritten?
As #Divakar has pointed out in the comment on his answer -
NumPy doesn't assign accumulated/summed values when the indices are repeated in test_result[-y_mapping, x_mapping] =. It randomly assigns from one of the instances.
2. Is there any Numpy way to do this? That is without looping? I'm looking for some speed optimization.
Approach #2 in #Divakar's answer gives me good results. For 23315 associations, the for loop took 50 ms while Approach #1 took 1.85 ms. Beating both, Approach #2 took 668 µs.
Side note
I'm using Numpy version 1.14.3 with Python 3.5.2 on an i7 processor.
Approach #1
The most intuitive one would be with np.add.at for those repeated indices -
np.add.at(result, [-y_mapping, x_mapping], values)
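A tiny demo of the buffering difference (my own illustration, not from the original answer):

import numpy as np

idx = np.array([0, 0, 1])
a = np.zeros(3)
a[idx] += 1           # buffered fancy assignment: the repeated 0 is lost
print(a)              # [1. 1. 0.]
b = np.zeros(3)
np.add.at(b, idx, 1)  # unbuffered: every occurrence accumulates
print(b)              # [2. 1. 0.]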
Approach #2
We need to perform binned summations owing to the possibly repeated nature of the x, y indices. Hence, another way could be to use NumPy's binned summation function, np.bincount, with an implementation like so -
# Get linear index equivalents off the x and y indices into result array
m,n = result.shape
out_dtype = result.dtype
lidx = ((-y_mapping)%m)*n + x_mapping
# Get binned summations off values based on linear index as bins
binned_sums = np.bincount(lidx, values, minlength=m*n)
# Finally add into result array
result += binned_sums.astype(result.dtype).reshape(m,n)
If you are always starting off with a zeros array for result, the last step could be made more performant with -
result = binned_sums.astype(out_dtype).reshape(m,n)
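As a quick sanity check (my own test, using the fixed mappings from the question rather than the random initialisation), both approaches reproduce the loop's result:

import numpy as np

values = np.linspace(1, 8, 8)
x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
m, n = 4, 2

# Approach #2: binned summation over linear indices
lidx = ((-y_mapping) % m) * n + x_mapping
binned = np.bincount(lidx, values, minlength=m*n).reshape(m, n)

# Approach #1: unbuffered accumulation
via_add_at = np.zeros((m, n))
np.add.at(via_add_at, [-y_mapping, x_mapping], values)

assert (binned == via_add_at).all()
# both give [[6, 0], [6, 2], [14, 0], [8, 0]]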
I guess you meant to write
y_mapping = np.random.randint(low=0, high=result.shape[0], size=values.shape[0])
x_mapping = np.random.randint(low=0, high=result.shape[1], size=values.shape[0])
With that correction, the code works for me as expected.
