I have a large 2D array (3x100,000) of 3D coordinates and a second 1D array containing certain coordinate values, not sorted. I would like to find all points whose coordinates are all contained in the second array.
An example:
mat1 = np.array([[1,2,3],[1,2,5],[2,3,6],[10,11,12],[20,2,3]])
mat2 = np.array([1,2,3,6])
So here I need to obtain the indices 0 and 2. I need to do this for around 100,000 coordinates. Is there a specific function in Python to do this work?
Easiest way would be with np.isin -
# a,b are input arrays - mat1,mat2 respectively
In [7]: np.flatnonzero(np.isin(a,b).all(1))
Out[7]: array([0, 2])
Another way, with np.searchsorted -
In [19]: idx = np.searchsorted(b,a)
In [20]: idx[idx==len(b)] = 0
In [21]: np.flatnonzero((b[idx]==a).all(1))
Out[21]: array([0, 2])
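If b is not in sorted order, pass np.argsort(b) as the sorter argument to np.searchsorted. The returned positions then refer to the sorted order of b, so map them back through the sorter before comparing with a.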
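Because the question says the second array is not sorted, here is a minimal sketch of the sorter variant. It assumes b is an arbitrary unsorted 1D array (the reordered values below are made up for illustration). The key step is indexing b through sorter[pos] rather than b[pos], because the positions returned by np.searchsorted with sorter index into the sorted order.

```python
import numpy as np

# Data from the question, with b deliberately reordered so it is unsorted
a = np.array([[1, 2, 3], [1, 2, 5], [2, 3, 6], [10, 11, 12], [20, 2, 3]])
b = np.array([6, 3, 1, 2])

sorter = np.argsort(b)
# Positions refer to the sorted order of b, not to b itself
pos = np.searchsorted(b, a, sorter=sorter)
# Positions past the end would be out of bounds; send them to 0 (the
# comparison below then fails for those entries, as it should)
pos[pos == len(b)] = 0
# Map back through sorter so we compare against the actual values of b
matches = b[sorter[pos]] == a
rows = np.flatnonzero(matches.all(axis=1))
print(rows)  # [0 2]
```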
I have two large 2D arrays (3x100,000) corresponding to 3D coordinates, and I would like to find the index of each correspondence.
An example:
mat1 = np.array([[0,0,0],[0,0,0],[0,0,0],[10,11,12],[1,2,3]]).T
mat2 = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]]).T
So here I need to obtain the indices 3 and 0. I need to find each correspondence among around 100,000 coordinates. Is there a specific function in Python to do this work? Applying a for loop could be problematic.
res = [3,0]
We can use a Cython-powered kd-tree for quick nearest-neighbor lookup -
In [77]: from scipy.spatial import cKDTree
In [78]: d,idx = cKDTree(mat2.T).query(mat1.T, k=1)
In [79]: idx[np.isclose(d,0)]
Out[79]: array([3, 0])
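The filtered idx tells you which mat2 point was matched, but not which mat1 point it was matched from. A small extension of the same query, sketched below, keeps both sides by taking np.flatnonzero of the same mask. This assumes "correspondence" means an exact (zero-distance) match, as the np.isclose(d, 0) filter in the answer does.

```python
import numpy as np
from scipy.spatial import cKDTree

mat1 = np.array([[0,0,0],[0,0,0],[0,0,0],[10,11,12],[1,2,3]]).T
mat2 = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]]).T

# Build the tree on the points of mat2 (rows of mat2.T), query each mat1 point
d, idx = cKDTree(mat2.T).query(mat1.T, k=1)
matched = np.isclose(d, 0)               # exact matches only
rows_in_mat1 = np.flatnonzero(matched)   # which mat1 points have a partner
rows_in_mat2 = idx[matched]              # index of that partner in mat2
pairs = np.column_stack((rows_in_mat1, rows_in_mat2))
print(pairs)
```

Each row of pairs is (index in mat1, index in mat2); for this data the second column is the [3, 0] from the answer.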
I need to select only the non-zero 3D portions of a 3D binary array (or, alternatively, the true values of a boolean array). Currently I can do this with a series of for loops that use np.any. This works, but it seems awkward and slow, so I am looking for a more direct way to accomplish the task.
I am rather new to numpy. The approaches I have tried include a) np.nonzero, which returns indices that I don't know how to use for my purposes, b) boolean array indexing, and c) boolean masks. I generally understand each of these for simple 2D arrays, but I am struggling to understand the differences between them, and I cannot get them to return the right values for a 3D array.
Here is my current function that returns a 3D array with nonzero values:
def real_size(arr3):
    true_0 = []
    true_1 = []
    true_2 = []
    print(f'The input array shape is: {arr3.shape}')
    for zero_ in range (0, arr3.shape[0]):
        if arr3[zero_].any()==True:
            true_0.append(zero_)
    for one_ in range (0, arr3.shape[1]):
        if arr3[:,one_,:].any()==True:
            true_1.append(one_)
    for two_ in range (0, arr3.shape[2]):
        if arr3[:,:,two_].any()==True:
            true_2.append(two_)
    arr4 = arr3[min(true_0):max(true_0) + 1, min(true_1):max(true_1) + 1, min(true_2):max(true_2) + 1]
    print(f'The nonzero area is: {arr4.shape}')
    return arr4
# Then use it on a small test array:
test_array = np.zeros([2, 3, 4], dtype = int)
test_array[0:2, 0:2, 0:2] = 1
#The function call works and prints out as expected:
non_zero = real_size(test_array)
>> The input array shape is: (2, 3, 4)
>> The nonzero area is: (2, 2, 2)
# So, the array is correct, but likely not the best way to get there:
non_zero
>> array([[[1, 1],
           [1, 1]],

          [[1, 1],
           [1, 1]]])
The code works, but I am using it on much larger and more complex arrays, and I don't think this approach is appropriate there. Any thoughts on a more direct method would be greatly appreciated. I am also concerned about errors, and about the results if the input array contains, for example, two separate non-zero 3D areas.
To clarify the problem: starting from an original larger array, I need to return one or more 3D portions as one or more 3D arrays. The returned arrays should not include extraneous zeros (or false values) in any exterior plane in three-dimensional space. Just getting the indices of the nonzero values (or vice versa) doesn't by itself solve the problem.
Assuming you want to eliminate all rows, columns, etc. that contain only zeros, you could do the following:
nz = (test_array != 0)
non_zero = test_array[nz.any(axis=(1, 2))][:, nz.any(axis=(0, 2))][:, :, nz.any(axis=(0, 1))]
An alternative solution using np.nonzero:
i = [np.unique(_) for _ in np.nonzero(test_array)]
non_zero = test_array[i[0]][:, i[1]][:, :, i[2]]
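Note that both approaches above drop every all-zero plane, including interior ones, and the fancy indexing returns a copy. If you instead want exactly what the question's real_size does (crop to the bounding box and keep interior zero planes), the min/max of the np.nonzero indices can be turned into plain slices. The sketch below does that for any number of dimensions; bbox_crop is a hypothetical helper name, and because it uses basic slicing the result is a view of the input rather than a copy.

```python
import numpy as np

def bbox_crop(arr):
    """Crop arr to the bounding box of its nonzero entries (any ndim >= 1).

    Returns a view of arr. Interior all-zero planes are kept, matching the
    min/max slicing in the question's real_size.
    """
    coords = np.nonzero(arr)  # one index array per axis
    if coords[0].size == 0:
        # No nonzero values at all: return an empty view
        return arr[tuple(slice(0, 0) for _ in range(arr.ndim))]
    slices = tuple(slice(c.min(), c.max() + 1) for c in coords)
    return arr[slices]

test_array = np.zeros([2, 3, 4], dtype=int)
test_array[0:2, 0:2, 0:2] = 1
cropped = bbox_crop(test_array)
print(cropped.shape)                          # (2, 2, 2)
print(np.shares_memory(cropped, test_array))  # True
```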
This can also be generalized to arbitrary dimensions, but requires a bit more work (only showing the first approach here):
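def real_size(arr):
    nz = (arr != 0)
    result = arr
    axes = np.arange(arr.ndim)
    for axis in range(arr.ndim):
        # True for each index along `axis` whose slice contains any nonzero value
        keep = nz.any(axis=tuple(np.delete(axes, axis)))
        # Boolean-index along `axis`, leaving the preceding axes untouched
        result = result[(slice(None),)*axis + (keep,)]
    return result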
non_zero = real_size(test_array)
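The answers above return a single array, so they don't address the concern about two separate non-zero areas. One way to handle that, swapped in here rather than taken from the answer, is connected-component labeling with scipy.ndimage.label followed by scipy.ndimage.find_objects, which returns one bounding box (a tuple of slices) per region. The sketch assumes "separate" means not connected through shared faces, which is label's default connectivity.

```python
import numpy as np
from scipy import ndimage

arr = np.zeros((4, 6, 6), dtype=int)
arr[0:2, 0:2, 0:2] = 1   # first region
arr[2:4, 3:6, 4:6] = 1   # second region, not touching the first

# Label connected nonzero regions (default structure: face connectivity)
labeled, n_regions = ndimage.label(arr)
# find_objects returns one tuple of slices (a bounding box) per label
boxes = ndimage.find_objects(labeled)
regions = [arr[box] for box in boxes]
print(n_regions)                   # 2
print([r.shape for r in regions])  # [(2, 2, 2), (2, 3, 2)]
```

If two bounding boxes overlap, arr[box] can include pixels from the other region; in that case labeled[box] == k isolates region k within its box.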
I have numpy data which I am trying to turn into contour plot data. I realize this can be done through matplotlib, but I am trying to do this with just numpy if possible.
So, say I have an array of numbers from 1 to 10, and I want to divide the array according to contour "levels". I want to turn the input array into an array of boolean arrays, each the size of the input, with 1/True for any data point in that contour level and 0/False everywhere else.
For example, suppose the input is:
[1.2,2.3,3.4,2.5]
And the levels are [1,2,3,4],
then the return should be:
[[1,0,0,0],[0,1,0,1],[0,0,1,0]]
So here is the start of an example I whipped up:
import numpy as np
a = np.random.rand(3,3)*10
print(a)
b = np.zeros(54).reshape((6,3,3))
levs = np.arange(6)
#This is as far as I've gotten:
bins = np.digitize(a, levs)
print(bins)
I can use np.digitize to find out which level each value in a belongs to, but that's as far as I get. I'm fairly new to numpy and this really has me scratching my head. Any help would be greatly appreciated, thanks.
We could use the indices from the np.digitize output. For each input element, they give the position along the first axis of the output that should be set to True, while the remaining n-1 axes of the n-dim output line up with the input. So we could either use indexing after setting up the output array, or use an outer range comparison that leverages broadcasting to achieve the same.
Hence, the broadcasting one, which covers generic n-dim arrays -
idx = np.digitize(a, levs)-1
out = idx==(np.arange(idx.max()+1)).reshape([-1,]+[1]*idx.ndim)
The indexing-based one, re-using idx from the previous method, would be -
# https://stackoverflow.com/a/46103129/ #Divakar
def all_idx(idx, axis):
    grid = np.ogrid[tuple(map(slice, idx.shape))]
    grid.insert(axis, idx)
    return tuple(grid)
out = np.zeros((idx.max()+1,) + idx.shape,dtype=int) #dtype=bool for bool array
out[all_idx(idx,axis=0)] = 1
Sample run -
In [77]: a = np.array([1.2,2.3,3.4,2.5])
In [78]: levs = np.array([1,2,3,4])
In [79]: idx = np.digitize(a, levs)-1
...: out = idx==(np.arange(idx.max()+1)).reshape([-1,]+[1]*idx.ndim)
In [80]: out.astype(int)
Out[80]:
array([[1, 0, 0, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]])
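One thing to watch: idx.max()+1 depends on the data, so the number of output layers changes if the highest level is never reached (the question pre-allocates b with a fixed 6 layers). A minimal variant of the broadcasting approach, sketched below with a hypothetical helper level_masks, ties the layer count to len(levs) - 1 instead. Values below levs[0] get index -1 and values at or above levs[-1] get index len(levs) - 1, so both fall outside the compared range and are False in every layer.

```python
import numpy as np

def level_masks(a, levs):
    """Return one boolean mask per interval [levs[k], levs[k+1]).

    The output always has len(levs) - 1 layers, independent of the data.
    """
    a = np.asarray(a)
    idx = np.digitize(a, levs) - 1  # interval index per element
    layers = np.arange(len(levs) - 1)
    # Reshape layers to (n_layers, 1, 1, ...) so it broadcasts against idx
    return idx == layers.reshape((-1,) + (1,) * idx.ndim)

out = level_masks([1.2, 2.3, 3.4, 2.5], [1, 2, 3, 4])
print(out.astype(int))
```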
I am new to numpy but have been using python for quite a while as an engineer.
I am writing a program that stores stress tensors as 3x3 numpy arrays within an NxM array representing values through time and through the thickness of a wall, so overall it is an NxMx3x3 numpy array. I want to efficiently calculate the eigenvalues and eigenvectors of each 3x3 array within this larger array. So far I have tried using "fromiter", but this doesn't seem to work because the function returns 2 arrays. I have also tried apply_along_axis, which also doesn't work because it says the inner 3x3 is not a square matrix. I can do it with a list comprehension, but resorting to lists doesn't seem ideal.
Example calculating only the eigenvalues, using a list comprehension:
import numpy as np
from scipy import linalg
a=np.random.random((2,2,3,3))
f=linalg.eigvalsh
ans=np.asarray([f(x) for x in a.reshape((4,3,3))])
ans.shape=(2,2,3)
I thought something like this would work but I have played around with it and can't get it working:
np.apply_along_axis(f,0,a)
BTW, the 2x2 part could be up to 5000x100, and this code is repeated ~50x50x200 times, hence the need for efficiency. Any help would be greatly appreciated.
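You can use numpy.linalg.eigh. It accepts a stack of matrices like your example a and operates on the last two axes, so no loop is needed. Note that it assumes each matrix is symmetric (it only reads one triangle), which holds for stress tensors.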
Here's an example. First, create an array of 3x3 symmetric arrays:
In [96]: a = np.random.random((2, 2, 3, 3))
In [97]: a = a + np.transpose(a, axes=(0, 1, 3, 2))
In [98]: a[0, 0]
Out[98]:
array([[0.61145048, 0.85209618, 0.03909677],
       [0.85209618, 1.79309413, 1.61209077],
       [0.03909677, 1.61209077, 1.55432465]])
Compute the eigenvalues and eigenvectors of all the 3x3 arrays:
In [99]: evals, evecs = np.linalg.eigh(a)
In [100]: evals.shape
Out[100]: (2, 2, 3)
In [101]: evecs.shape
Out[101]: (2, 2, 3, 3)
Take a look at the result for a[0, 0]:
In [102]: evals[0, 0]
Out[102]: array([-0.31729364, 0.83148477, 3.44467813])
In [103]: evecs[0, 0]
Out[103]:
array([[-0.55911658, 0.79634401, 0.23070516],
       [ 0.63392772, 0.23128064, 0.73800062],
       [-0.53434473, -0.55887877, 0.63413738]])
Verify that it is the same as computing the eigenvalues and eigenvectors for a[0, 0] separately:
In [104]: np.linalg.eigh(a[0, 0])
Out[104]:
(array([-0.31729364, 0.83148477, 3.44467813]),
 array([[-0.55911658, 0.79634401, 0.23070516],
        [ 0.63392772, 0.23128064, 0.73800062],
        [-0.53434473, -0.55887877, 0.63413738]]))
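If only the eigenvalues are needed, as in the question's list-comprehension example, np.linalg.eigvalsh broadcasts over the leading axes in the same way. The sketch below (random symmetric data, seeded for repeatability) checks the vectorized call against the question's scipy.linalg.eigvalsh loop; both return eigenvalues in ascending order, so they can be compared directly.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
a = rng.random((2, 2, 3, 3))
a = a + np.swapaxes(a, -1, -2)  # make every 3x3 block symmetric

# Vectorized: eigenvalues of every 3x3 block in one call
evals = np.linalg.eigvalsh(a)   # shape (2, 2, 3)

# Reference: the list-comprehension version from the question
ref = np.asarray([linalg.eigvalsh(x) for x in a.reshape((-1, 3, 3))])
ref = ref.reshape(a.shape[:-1])

print(evals.shape)              # (2, 2, 3)
print(np.allclose(evals, ref))  # True
```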
I have a 2D array that describes index ranges for a 1D array, like:
z = np.array([[0,4],[4,9]])
The 1D array
a = np.array([1,1,1,1,0,0,0,0,0,1,1,1,1])
I want to have a view on the 1D array with the index ranges defined by z. So, for only the first range:
a[z[0][0]:z[0][1]]
How can I get this for all ranges? Is it possible to use as_strided with unequal lengths defined by z as the shape? I want to avoid copying data; I only want a different view on a for further computation.
In [66]: a = np.array([1,1,1,1,0,0,0,0,0,1,1,1,1])
In [67]: z = np.array([[0,4],[4,9]])
So generating the slices from the rows of z we get 2 arrays:
In [68]: [a[x[0]:x[1]] for x in z]
Out[68]: [array([1, 1, 1, 1]), array([0, 0, 0, 0, 0])]
Individually those arrays are views, but together they aren't an array. The lengths differ, so they can't be vstacked into a (2,?) array. They can be hstacked, but the result won't be a view.
The calculation core of np.array_split is:
sub_arys = []
sary = _nx.swapaxes(ary, axis, 0)
for i in range(Nsections):
    st = div_points[i]
    end = div_points[i + 1]
    sub_arys.append(_nx.swapaxes(sary[st:end], axis, 0))
Ignoring the swapaxes bit, this is doing the same thing as my list comprehension.
So in practice, iterate over the ranges and work with one view at a time:
for x, y in z:
    array_view = a[x:y]
    # do something with array_view
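To make the view-versus-copy distinction concrete, the sketch below uses np.shares_memory to check that each basic slice shares memory with a while the hstacked result does not, and shows that writing through a view changes a itself.

```python
import numpy as np

a = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1])
z = np.array([[0, 4], [4, 9]])

# Basic slices: each element is a view into a, no data copied
views = [a[start:stop] for start, stop in z]
print([np.shares_memory(v, a) for v in views])  # [True, True]

# Concatenating produces a new array that no longer shares memory with a
joined = np.hstack(views)
print(np.shares_memory(joined, a))              # False

# Writing through a view modifies a itself
views[1][:] = 7
print(a[4:9])                                   # [7 7 7 7 7]
```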