numpy histogram indexing


Considering I have a 3D histogram or, for simplicity, a 3D numpy array of shape (X, Y, Z):
import numpy as np
array = np.random.random((100,100,100))
What is the best way, using numpy or scipy, to obtain the indexes of the array's elements that satisfy a sphere condition?
(index_x**2 + index_y**2 + index_z**2) <= radius**2
Obviously, in the latter condition, the center is (0, 0, 0). In general the condition will be
((index_x-center_x)**2 + (index_y-center_y)**2 + (index_z-center_z)**2) <= radius**2
The problem is easy to solve with a plain Python loop, but I need it to be optimized.
Many thanks for your help.

You can efficiently build the coordinate arrays with ogrid and then obtain the indexes that satisfy your condition with nonzero():
indexes = np.transpose((x**2+y**2+z**2 <= radius**2).nonzero()) # transpose() might be unnecessary: it depends on your needs
where the coordinate arrays are obtained efficiently with ogrid:
x, y, z = np.ogrid[:100, :100, :100]
or, for an arbitrary shape for your input data array:
x, y, z = np.ogrid[tuple(slice(None, dim) for dim in data.shape)]

To make @EOL's nice approach more general, one can define a center within the shape of the array:
array = np.random.random((100,100,100))
center = (30,10,25)
radius = 5.0
x, y, z = np.ogrid[-center[0]:array.shape[0]-center[0], -center[1]:array.shape[1]-center[1], -center[2]:array.shape[2]-center[2]]
indexes = np.transpose((x**2+y**2+z**2 <= radius**2).nonzero())
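As a minimal sketch combining the two answers above: the centered condition can also be written against zero-based ogrid coordinates, with np.argwhere standing in for transpose(nonzero()). The array shape, center, and radius here are illustrative values only, not taken from the question.

```python
import numpy as np

# Illustrative values (assumptions, not from the question)
array = np.random.random((20, 30, 40))
center = (5, 10, 25)
radius = 5.0

# Open (broadcastable) coordinate grids, one per axis
x, y, z = np.ogrid[tuple(slice(0, dim) for dim in array.shape)]

# Boolean mask of the points inside the sphere; broadcasting expands it to array.shape
mask = (x - center[0])**2 + (y - center[1])**2 + (z - center[2])**2 <= radius**2

indexes = np.argwhere(mask)  # (N, 3) array of index triples
values = array[mask]         # the N array values inside the sphere, in the same order
```

If only the values are needed, the boolean mask can be used directly and the index array skipped.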

Related

Suggestion to vectorize a Python function

I wrote the following function, which takes as inputs three 1D arrays (namely int_array, x, and y) and a number lim. The output is a number as well.
def integrate_to_lim(int_array, x, y, lim):
    if lim >= np.max(x):
        res = 0.0
    elif lim <= np.min(x):
        res = int_array[0]
    else:
        index = np.argmax(x > lim) # To find the first element of x larger than lim
        partial = int_array[index]
        slope = (y[index-1] - y[index]) / (x[index-1] - x[index])
        rest = (x[index] - lim) * (y[index] + (lim - x[index]) * slope / 2.0)
        res = partial + rest
    return res
Basically, apart from the limit cases lim>=np.max(x) and lim<=np.min(x), the idea is that the function finds the index of the first value of the array x larger than lim and then uses it to make some simple calculations.
In my case, however, lim can also be a fairly big 2D array (shape of roughly 2000 by 1000 elements).
I would like to rewrite it such that it makes the same calculations for the case that lim is a 2D array.
Obviously, the output should also be a 2D array of the same shape as lim.
I am having a real struggle figuring out how to vectorize it.
I would like to stick only to the numpy package.
PS I want to vectorize my function because efficiency is important and as I understand using for loops is not a good choice in this regard.
Edit: my attempt
I was not aware of the function np.take, which made the task way easier.
Here is my brutal attempt that seems to work (suggestions on how to clean up or to make the code faster are more than welcome).
def integrate_to_lim_vect(int_array, x, y, lim_mat):
    lim_mat = np.asarray(lim_mat) # Make sure that it is an array
    shape_3d = list(lim_mat.shape) + [1]
    x_3d = np.ones(shape_3d) * x # 3 dimensional version of x
    lim_3d = np.expand_dims(lim_mat, axis=2) * np.ones(x_3d.shape) # also 3d
    # I use np.argmax on the 3d matrices (is there a simpler way?)
    index_mat = np.argmax(x_3d > lim_3d, axis=2)
    # Silly calculations
    partial = np.take(int_array, index_mat)
    y1_mat = np.take(y, index_mat)
    y2_mat = np.take(y, index_mat - 1)
    x1_mat = np.take(x, index_mat)
    x2_mat = np.take(x, index_mat - 1)
    slope = (y1_mat - y2_mat) / (x1_mat - x2_mat)
    rest = (x1_mat - lim_mat) * (y1_mat + (lim_mat - x1_mat) * slope / 2.0)
    res = partial + rest
    # Make the cases with np.select
    condlist = [lim_mat >= np.max(x), lim_mat <= np.min(x)]
    choicelist = [0.0, int_array[0]] # Should these options be a 2d matrix?
    output = np.select(condlist, choicelist, default=res)
    return output
I am aware that if the limit is larger than the maximum value in the array, np.argmax returns the index zero (leading to wrong results). This is why I used np.select to check and correct for these cases.
Is it necessary to define the three dimensional matrices x_3d and lim_3d, or is there a simpler way to find the 2D matrix of the indices index_mat?
Suggestions, especially to improve the way I expanded the dimension of the arrays, are welcome.

I think you can solve this using two tricks. First, a 2d array can be easily flattened to a 1d array, and then your answers can be converted back into a 2d array with reshape.
Next, your use of argmax suggests that your array is sorted. Then you can find your full set of indices using digitize. Thus instead of a single index, you will get a complete array of indices. All the calculations you are doing are intrinsically supported as array operations in numpy, so that should not cause any problems.
You will have to specifically look at the limiting cases. If those are rare enough, then it might be okay to let the answers be derived by the default formula (they will be garbage values), and then replace them with the actual values you desire.
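The suggestion above could be sketched roughly as follows. This is only one possible reading of it: the function name integrate_to_lim_digitize is made up for illustration, and the code assumes x is sorted in strictly increasing order, which the question does not state explicitly.

```python
import numpy as np

def integrate_to_lim_digitize(int_array, x, y, lim_mat):
    """Hypothetical vectorized variant; assumes x is strictly increasing."""
    int_array, x, y = (np.asarray(a, dtype=float) for a in (int_array, x, y))
    lim_mat = np.asarray(lim_mat, dtype=float)
    flat = lim_mat.ravel()  # flatten, compute in 1D, reshape at the end

    # For increasing x, digitize returns i with x[i-1] <= lim < x[i],
    # i.e. the first index whose x value is larger than lim.
    index = np.digitize(flat, x)
    # Out-of-range limits give 0 or len(x); clip so indexing stays valid.
    # Those entries hold garbage for now and are overwritten below.
    index = np.clip(index, 1, len(x) - 1)

    partial = int_array[index]
    slope = (y[index - 1] - y[index]) / (x[index - 1] - x[index])
    rest = (x[index] - flat) * (y[index] + (flat - x[index]) * slope / 2.0)
    res = partial + rest

    # Replace the limiting cases with their prescribed values
    res = np.where(flat >= x[-1], 0.0, res)
    res = np.where(flat <= x[0], int_array[0], res)
    return res.reshape(lim_mat.shape)
```

Compared with the 3D-broadcast attempt in the question, this avoids building arrays of shape lim.shape + (len(x),), so memory use stays proportional to the size of lim.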

Cutting a subset from middle of NumPy array

I work with raster images and the module rasterio to import them as numpy arrays. I would like to cut a portion of size (1000, 1000) out of the middle of each (to avoid the out-of-bound masks of the image).
image = np.random.random_sample((2000, 2000))
s = image.shape
mid = [round(x / 2) for x in s] # middle point of both axes
margins = [[y + x for y in [-500, 500]] for x in mid] # 1000 range around every middle point
The result is a list of 2 lists, one cut range per axis. But this is where I'm stumped: range() doesn't accept lists, so I'm attempting the following brute-force method:
cut_image = image[range(margins[0][0], margins[0][1]), range(margins[1][0], margins[1][1])]
However:
cut_image.shape
## (1000,)
Slicing an array loses dimension information which is exactly what I don't want.
Consider me confused.
Looking for a more tasteful solution.

As the other answer points out, you're not really slicing your array, but using integer indexing on it.
If you want to slice your array (and you're right, that's more elegant than using lists of indices), you'll be happier with slice objects. A slice object represents the start:end:step syntax.
In your case,
import numpy as np
WND = 50
image = np.random.random_sample((200, 300))
s = image.shape
mid = [round(x / 2) for x in s] # middle point of both axes
margins = [[y + x for y in [-WND, WND]] for x in mid] # range of width 2*WND around every middle point
# array[slice(start, end)] -> array[start:end]
x_slice = slice(margins[0][0], margins[0][1])
y_slice = slice(margins[1][0], margins[1][1])
print(x_slice, y_slice)
# slice(50, 150, None) slice(100, 200, None)
cut_image = image[x_slice, y_slice]
print(cut_image.shape)
# (100, 100)
Indexing?
You might wonder what was happening in your question that resulted in only 1000 elements instead of the expected 1000*1000.
Here is a simpler example of indexing with lists on different dimensions
# n and n2 have identical values
n = a[[i0, i1, i2],[j0, j1, j2]]
n2 = np.array([a[i0, j0], a[i1, j1], a[i2, j2]])
This being clarified, you'll understand that instead of taking a block matrix, your code only returns the diagonal coefficients of that block matrix :)
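To make the difference concrete, here is a small sketch on a toy 4x4 array (the array and the index lists are illustrative, not from the question). It also shows np.ix_, which is one way to get the full block while still using index lists.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)
rows = [0, 1, 2]
cols = [1, 2, 3]

# Integer (advanced) indexing pairs the lists element by element:
# a[0, 1], a[1, 2], a[2, 3]
paired = a[rows, cols]

# np.ix_ builds an open mesh, so every row is combined with every column
block = a[np.ix_(rows, cols)]

# Basic slicing selects the same block, as a view rather than a copy
sliced = a[0:3, 1:4]

print(paired.shape, block.shape, sliced.shape)
```

Here paired holds only three elements, while block and sliced both hold the full 3x3 sub-array.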

The issue here is that what you're doing is known as integer indexing, as opposed to slice indexing. The behaviour changes and may seem counterintuitive if you're not acquainted with it. You can check the docs for more details.
Here's how you could do it with basic slicing:
# center coordinates of the image
x_0, y_0 = np.asarray(image.shape)//2
# slice taken from the center point
out = image[x_0-x_0//2:x_0+x_0//2, y_0-y_0//2:y_0+y_0//2]
print(out.shape)
# (1000, 1000)
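The slice above cuts out half of each axis, which happens to be 1000 for a (2000, 2000) image. For a fixed window size on images of other shapes, a small helper along these lines might be useful. The name center_crop and its size argument are hypothetical and not part of numpy or rasterio.

```python
import numpy as np

def center_crop(image, size):
    """Return a view of the central window of shape `size` (hypothetical helper)."""
    if any(s > dim for s, dim in zip(size, image.shape)):
        raise ValueError("requested window is larger than the image")
    # Start offset on each axis so that the window is centered
    starts = [(dim - s) // 2 for dim, s in zip(image.shape, size)]
    window = tuple(slice(start, start + s) for start, s in zip(starts, size))
    return image[window]

image = np.random.random_sample((2000, 2100))
cut_image = center_crop(image, (1000, 1000))
print(cut_image.shape)
```

Because only basic slicing is involved, the result keeps both dimensions and is a view of the original array.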

Randomly keeping a single element different from zero along one axis of a numpy array

I need an efficient way to create a numpy array of shape (x,y,3) where only one random element out of the 3 for each tuple (x,y) has a value randomly selected from [-1,0,1]
np.random.randint(-1, 2, (x,y,3))
does the work only for the second half of my requirements.
I could use a nested loop to iterate over each (x, y) and multiply its value by a random mask, but it would not be efficient at all.
Here is the loop implementation:
a=np.random.randint(-1, 2, (x,y,3))
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        mask = np.array(np.random.permutation([0,1,0]))
        a[i][j] = a[i][j] * mask

Rather than generating a whole bunch of extra numbers and turning most of them off, I'd approach this from the point of view of only generating the numbers you need. You want to assign to a random index between 0 and 2 for each x-y pair. So generate a random index, and the random values, and assign:
indices = np.random.randint(3, size=(x, y))
values = np.random.randint(-1, 2, size=(x, y))
result = np.zeros((x, y, 3), dtype=int)
result[(*np.ogrid[:x, :y], indices)] = values
The indexing expression is an advanced index because indices is an array of integers. Using ... or :, : for the first two indices won't do what you want in that case. Instead, np.ogrid generates ranges of the correct shape to force the elements of indices to correspond to the correct x-y coordinates.
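A self-contained version of the approach above, with small illustrative sizes and a seeded generator (both are assumptions made for the sake of a reproducible sketch), might look like this. It also shows np.put_along_axis as an alternative way to express the same assignment.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
x, y = 4, 5                     # illustrative sizes

indices = rng.integers(3, size=(x, y))     # which of the 3 slots to fill
values = rng.integers(-1, 2, size=(x, y))  # value drawn from {-1, 0, 1}

result = np.zeros((x, y, 3), dtype=int)
ii, jj = np.ogrid[:x, :y]  # shapes (x, 1) and (1, y), broadcast against indices
result[ii, jj, indices] = values

# Equivalent assignment without building the index grids by hand
result2 = np.zeros((x, y, 3), dtype=int)
np.put_along_axis(result2, indices[..., None], values[..., None], axis=2)
```

Both variants write exactly one slot per (x, y) pair, so at most one of the three entries is nonzero.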

Reshaping array of matrices in Python

I have a Numpy array X of n 2x2 matrices, arranged so that X.shape = (2,2,n), that is, to get the first matrix I call X[:,:,0]. I would like to reshape X into an array Y such that I can get the first matrix by calling Y[0] etc., but performing X.reshape(n,2,2) messes up the matrices. How can I get it to preserve the matrices while reshaping the array?
I am essentially trying to do this:
import numpy as np
Y = np.zeros([n,2,2])
for i in range(n):
    Y[i] = X[:,:,i]
but without using the for loop. How can I do this with reshape or a similar function?
(To get an example array X, try X = np.concatenate([np.identity(2)[:,:,None]] * n, axis=2) for some n.)

numpy.moveaxis can be used to take a view of an array with one axis moved to a different position in the shape:
numpy.moveaxis(X, 2, 0)
numpy.moveaxis(a, source, destination) takes a view of array a where the axis originally at position source ends up at position destination, so numpy.moveaxis(X, 2, 0) makes the original axis 2 the new axis 0 in the view.
There's also numpy.transpose, which can be used to perform arbitrary rearrangements of an array's axes in one go if you pass it the optional second argument, and numpy.rollaxis, an older version of moveaxis with a more confusing calling convention.
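As a quick check of the above, the following sketch uses a small X with distinct entries (an illustrative choice, so that a wrong rearrangement is visible) and compares moveaxis and transpose against a plain reshape.

```python
import numpy as np

n = 4
# Distinct entries make it easy to see whether the matrices are preserved
X = np.arange(2 * 2 * n).reshape(2, 2, n)

Y = np.moveaxis(X, 2, 0)        # axis 2 becomes axis 0, shape (n, 2, 2)
Y_t = np.transpose(X, (2, 0, 1))  # same rearrangement spelled out explicitly
Y_bad = X.reshape(n, 2, 2)      # reinterprets memory order, scrambling the matrices

print(np.array_equal(Y[0], X[:, :, 0]))      # True
print(np.array_equal(Y_bad[0], X[:, :, 0]))  # False
```

Both moveaxis and transpose return views, so no data is copied until the result is modified or made contiguous.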

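Use swapaxes. Note that swapping axes 0 and 2 alone also transposes each 2x2 matrix, so a second swap of the last two axes is needed to preserve them:
Y = X.swapaxes(0,2).swapaxes(1,2)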

Python Numpy error : setting an array element with a sequence

I'm quite new to Python and Numpy, so I apologize if I'm missing something obvious here.
I have a function that solves a system of 2 differential equations :
import numpy as np
import numpy.linalg as la
def solve_ode(x0, a0, beta, t):
    At = np.array([[0.23*t, (-10**5)*t], [0, -beta*t]], dtype=np.float32)
    # get eigenvalues and eigenvectors
    evals, V = la.eig(At)
    Vi = la.inv(V)
    # get e^At coeff
    eAt = V @ np.exp(evals) @ Vi
    xt = eAt*x0
    return xt
However, running it with this code :
import matplotlib.pyplot as plt
# initial values
x0 = 10**6
a0 = 2.5
beta = 0.05
t = np.linspace(0, 3600, 360)
plt.semilogy(t, solve_ode(x0, a0, beta, t))
... throws this error :
ValueError: setting an array element with a sequence.
At this line :
At = np.array([[0.23*t, (-10**5)*t], [0, -beta*t]], dtype=np.float32)
Note that t and beta are supposed to be floats. I think Python might not be able to infer this but I don't know how I could do this...
Thx in advance for your help.

You are supplying t as a numpy array of shape (360,) from linspace and not simply a float. The resulting At numpy array you are trying to create is then ill-formed, as all entries must have the same length. In Python there is an important difference between lists and numpy arrays. For example, you could do what you have here as a list of lists, e.g.
At = [[0.23*t, (-10**5)*t], [0, -beta*t]]
where the first inner list holds two 360-element arrays and the second holds a scalar and one 360-element array.
Alternatively, if all elements of At are the length of t the array would work,
At = np.array([[0.23*t, (-10**5)*t], [t, -beta*t]], dtype=np.float32)
with shape [2, 2, 360].

When you give a list or a list of lists, or in this case a list of lists of lists, all of them should have the same length, so that numpy can automatically infer the dimensions (shape) of the resulting array.
In your example, it's all correctly put, except for the part where you put 0 as one of the entries, I guess. I'm not sure what to call the result though, because your expected output is a cube, I suppose.
You can fix it by giving the correct number of zeros as below:
At = np.array([[0.23*t, (-10**5)*t], [np.zeros(len(t)), -beta*t]], dtype=np.float32)
But check the .shape of the resulting array, and make sure it's what you want.

As others note the problem is the 0 in the inner list. It doesn't match the 360 length arrays generated by the other expressions. np.array can make an object dtype array from that (2x2), but can't make a float one.
At = np.array([[0.23*t, (-10**5)*t], [0*t, -beta*t]])
produces a (2,2,360) array. But I suspect the rest of that function is built around the assumption that At is (2,2) - a 2d square array with eig, inv etc.
What is the return xt supposed to be?
Does this work?
S = np.array([solve_ode(x0, a0, beta, i) for i in t])
giving a 1d array with the same number of values as in t?
I'm not suggesting this is the fastest way of solving the problem, but it's the simplest, especially if you are only generating 360 values.
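The shape issue discussed in these answers can be reproduced in a few lines. The sketch below reuses the values of beta and t from the question; moving the time axis to the front and calling eig on the whole stack is an additional assumption about what might be wanted, not something the answers prescribe.

```python
import numpy as np

beta = 0.05
t = np.linspace(0, 3600, 360)

# A scalar 0 next to 360-element arrays cannot form a regular float array
try:
    np.array([[0.23 * t, -1e5 * t], [0, -beta * t]], dtype=np.float32)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# With every entry the same length, the result is a regular (2, 2, 360) array
At = np.array([[0.23 * t, -1e5 * t], [np.zeros_like(t), -beta * t]])
print(At.shape)

# Assumption: one 2x2 matrix per time step is wanted, so move the time axis
# to the front; np.linalg.eig then processes the whole stack at once.
At_stack = np.moveaxis(At, -1, 0)  # shape (360, 2, 2)
evals, V = np.linalg.eig(At_stack)
print(evals.shape, V.shape)
```

Whether the stacked form or the per-value loop is preferable depends on what solve_ode is meant to return for each t.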
