I have the following code:
for (old_point, new_point) in zip(masked_indices, masked_new_indices):
    row, col = old_point
    new_row, new_col = new_point
    new_img[row, col] = img[new_row, new_col]
Where new_img and img are both 1024x1024x3 ndarrays, and masked_indices and masked_new_indices are both 80000x2 ndarrays.
Why does this statement not have the same behaviour?
new_img[masked_indices] = img[masked_new_indices]
And is there a way to optimize this for loop into a more NumPy-ish style?
I figured it out. Each pixel is addressed by a row and a column, so I have to pass two index arrays, one per axis. A single 2D index array is applied to the first axis only, which was causing the unexpected behaviour.
This had the effect that I was going for, although it's not very pretty:
new_img[masked_indices[:, 0], masked_indices[:, 1], :] = img[masked_new_indices[:, 0], masked_new_indices[:, 1], :]
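To see why the single-index form behaves differently, here is a small sketch on toy data. The 8x8 image size and the random index arrays are assumptions made up for illustration. An (N, 2) integer array used as one index is applied to the first axis only, so each entry selects a whole row. Passing one index array per axis pairs the entries up element-wise, which is what the loop does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the 1024x1024x3 images (sizes are illustrative only).
img = rng.integers(0, 256, size=(8, 8, 3))
n = 10

# Unique target pixels, so the loop and the vectorized write agree exactly.
flat_targets = rng.permutation(8 * 8)[:n]
masked_indices = np.column_stack(np.unravel_index(flat_targets, (8, 8)))
masked_new_indices = rng.integers(0, 8, size=(n, 2))

# Reference result: the explicit loop from the question.
loop_img = np.zeros_like(img)
for (row, col), (new_row, new_col) in zip(masked_indices, masked_new_indices):
    loop_img[row, col] = img[new_row, new_col]

# A single (n, 2) index array indexes axis 0 only: every entry picks a whole row.
single_index_shape = img[masked_new_indices].shape
print(single_index_shape)  # (10, 2, 8, 3)

# One index array per axis: tuple(arr.T) is (arr[:, 0], arr[:, 1]).
vec_img = np.zeros_like(img)
vec_img[tuple(masked_indices.T)] = img[tuple(masked_new_indices.T)]

print(np.array_equal(loop_img, vec_img))  # True
```

The `tuple(arr.T)` form is just a shorter spelling of `arr[:, 0], arr[:, 1]`. The trailing `:` for the colour axis can be omitted because unindexed trailing axes are taken whole.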
I need to select only the non-zero 3d portions of a 3d binary array (or alternatively the true values of a boolean array). Currently I am able to do so with a series of 'for' loops that use np.any. This works, but it seems awkward and slow, so I am investigating a more direct way to accomplish the task.
I am rather new to numpy, so the approaches that I have tried include a) using np.nonzero, which returns indices that I do not know how to use for my purposes, b) boolean array indexing, and c) boolean masks. I can generally understand each of those approaches for simple 2d arrays, but am struggling to understand the differences between the approaches, and cannot get them to return the right values for a 3d array.
Here is my current function that returns a 3D array with nonzero values:
def real_size(arr3):
    true_0 = []
    true_1 = []
    true_2 = []
    print(f'The input array shape is: {arr3.shape}')
    for zero_ in range(0, arr3.shape[0]):
        if arr3[zero_].any()==True:
            true_0.append(zero_)
    for one_ in range(0, arr3.shape[1]):
        if arr3[:,one_,:].any()==True:
            true_1.append(one_)
    for two_ in range(0, arr3.shape[2]):
        if arr3[:,:,two_].any()==True:
            true_2.append(two_)
    arr4 = arr3[min(true_0):max(true_0) + 1, min(true_1):max(true_1) + 1, min(true_2):max(true_2) + 1]
    print(f'The nonzero area is: {arr4.shape}')
    return arr4
# Then use it on a small test array:
test_array = np.zeros([2, 3, 4], dtype = int)
test_array[0:2, 0:2, 0:2] = 1
#The function call works and prints out as expected:
non_zero = real_size(test_array)
>> The input array shape is: (2, 3, 4)
>> The nonzero area is: (2, 2, 2)
# So, the array is correct, but likely not the best way to get there:
non_zero
>> array([[[1, 1],
           [1, 1]],

          [[1, 1],
           [1, 1]]])
The code works appropriately, but I am using this on much larger and more complex arrays, and don't think this is an appropriate approach. Any thoughts on a more direct method to make this work would be greatly appreciated. I am also concerned about errors and the results if the input array has for example two separate non-zero 3d areas within the original array.
To clarify the problem, I need to return one or more 3D portions as one or more 3d arrays beginning with an original larger array. The returned arrays should not include extraneous zeros (or false values) in any given exterior plane in three dimensional space. Just getting the indices of the nonzero values (or vice versa) doesn't by itself solve the problem.
Assuming you want to eliminate all rows, columns, etc. that contain only zeros, you could do the following:
nz = (test_array != 0)
non_zero = test_array[nz.any(axis=(1, 2))][:, nz.any(axis=(0, 2))][:, :, nz.any(axis=(0, 1))]
An alternative solution using np.nonzero:
i = [np.unique(_) for _ in np.nonzero(test_array)]
non_zero = test_array[i[0]][:, i[1]][:, :, i[2]]
This can also be generalized to arbitrary dimensions, but requires a bit more work (only showing the first approach here):
def real_size(arr):
    nz = (arr != 0)
    result = arr
    axes = np.arange(arr.ndim)
    for axis in range(arr.ndim):
        zeros = nz.any(axis=tuple(np.delete(axes, axis)))
        result = result[(slice(None),)*axis + (zeros,)]
    return result
non_zero = real_size(test_array)
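One thing worth noting: the approaches above drop every all-zero plane, including ones that lie between two non-zero regions, whereas the loop in the question crops to a bounding box and keeps interior zeros. Here is a rough sketch of both behaviours for any number of dimensions. The function names `drop_empty_planes` and `bounding_box_crop` and the `gap_array` test data are made up for illustration, not taken from the question.

```python
import numpy as np

def drop_empty_planes(arr):
    """Remove every all-zero plane along every axis (any number of dimensions)."""
    # For each axis, the sorted unique positions that hold at least one non-zero value.
    keep = [np.unique(ix) for ix in np.nonzero(arr)]
    # np.ix_ builds an open mesh so the per-axis selections combine as an outer product.
    return arr[np.ix_(*keep)]

def bounding_box_crop(arr):
    """Crop to the smallest box containing all non-zero values, keeping interior zeros."""
    nz = np.nonzero(arr)
    if nz[0].size == 0:
        # Nothing is non-zero: return an empty array with the same number of dimensions.
        return arr[(slice(0, 0),) * arr.ndim]
    return arr[tuple(slice(ix.min(), ix.max() + 1) for ix in nz)]

test_array = np.zeros([2, 3, 4], dtype=int)
test_array[0:2, 0:2, 0:2] = 1

# Two separate non-zero regions with an empty gap between them along the last axis.
gap_array = np.zeros([2, 3, 5], dtype=int)
gap_array[:, :2, 0] = 1
gap_array[:, :2, 4] = 1

print(drop_empty_planes(test_array).shape)   # (2, 2, 2)
print(bounding_box_crop(test_array).shape)   # (2, 2, 2)
print(drop_empty_planes(gap_array).shape)    # (2, 2, 2)
print(bounding_box_crop(gap_array).shape)    # (2, 2, 5)
```

Neither function splits separate regions into separate arrays. With two disjoint regions, the first squeezes them together and the second returns one box spanning both.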
I have numpy data which I am trying to turn into contour plot data. I realize this can be done through matplotlib, but I am trying to do this with just numpy if possible.
So, say I have an array of numbers 1-10, and I want to divide the array according to contour "levels". I want to turn the input array into an array of boolean arrays, each of those being the size of the input, with a 1/True for any data point in that contour level and 0/False everywhere else.
For example, suppose the input is:
[1.2,2.3,3.4,2.5]
And the levels are [1,2,3,4],
then the return should be:
[[1,0,0,0],[0,1,0,1],[0,0,1,0]]
So here is the start of an example I whipped up:
import numpy as np
a = np.random.rand(3,3)*10
print(a)
b = np.zeros(54).reshape((6,3,3))
levs = np.arange(6)
#This is as far as I've gotten:
bins = np.digitize(a, levs)
print(bins)
I can use np.digitize to find out which level each value in a should belong to, but that's as far as I get. I'm fairly new to numpy and this really has me scratching my head. Any help would be greatly appreciated, thanks.
The np.digitize output gives, for each element of a, the index along the first axis of the output at which a True value should be set (the output has one more dimension than a). So we could either index into a pre-allocated output array, or compare against an outer range and let broadcasting do the work.
With broadcasting, a version that covers generic n-dim arrays would be -
idx = np.digitize(a, levs)-1
out = idx==(np.arange(idx.max()+1)).reshape([-1,]+[1]*idx.ndim)
The indexing-based version, re-using idx from the previous method, would be -
# https://stackoverflow.com/a/46103129/ #Divakar
def all_idx(idx, axis):
    # list() so that insert() works whether np.ogrid returns a list or a tuple
    grid = list(np.ogrid[tuple(map(slice, idx.shape))])
    grid.insert(axis, idx)
    return tuple(grid)
out = np.zeros((idx.max()+1,) + idx.shape,dtype=int) #dtype=bool for bool array
out[all_idx(idx,axis=0)] = 1
Sample run -
In [77]: a = np.array([1.2,2.3,3.4,2.5])
In [78]: levs = np.array([1,2,3,4])
In [79]: idx = np.digitize(a, levs)-1
...: out = idx==(np.arange(idx.max()+1)).reshape([-1,]+[1]*idx.ndim)
In [80]: out.astype(int)
Out[80]:
array([[1, 0, 0, 0],
[0, 1, 0, 1],
[0, 0, 1, 0]])
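As a variation on the broadcasting idea, here is a minimal sketch that fixes the number of masks at `len(levs) - 1` instead of `idx.max() + 1`, so empty upper levels still get an all-False mask and values outside the levels match nothing. The function name `level_masks` is made up for illustration, and this assumes the levels are sorted in increasing order.

```python
import numpy as np

def level_masks(a, levs):
    """Return one boolean mask per interval [levs[i], levs[i+1]), stacked on axis 0."""
    a = np.asarray(a)
    # digitize returns 1 for levs[0] <= x < levs[1], so subtract 1 to get a bin index.
    # Values below levs[0] give -1 and values >= levs[-1] give len(levs) - 1.
    idx = np.digitize(a, levs) - 1
    n_bins = len(levs) - 1
    # Shape (n_bins, 1, ..., 1) so the comparison broadcasts against idx.
    bins = np.arange(n_bins).reshape((-1,) + (1,) * a.ndim)
    return idx == bins

masks = level_masks([1.2, 2.3, 3.4, 2.5], [1, 2, 3, 4])
print(masks.astype(int))
# [[1 0 0 0]
#  [0 1 0 1]
#  [0 0 1 0]]

# Works the same way for a 2D input: one 3x3 mask per level interval.
a2 = np.random.rand(3, 3) * 10
print(level_masks(a2, np.arange(6)).shape)  # (5, 3, 3)
```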
I've read the numpy doc on slicing (especially the bottom, where it discusses variable array indexing):
https://docs.scipy.org/doc/numpy/user/basics.indexing.html
But I'm still not sure how I could do the following: Write a method that either returns a 3D set of indices, or a 4D set of indices that are then used to access an array. I want to write a method for a base class, but the classes that derive from it access either 3D or 4D depending on which derived class is instantiated.
Example Code to illustrate idea:
import numpy as np
a = np.ones([2,2,2,2])
size = np.shape(a)
print(size)
for i in range(size[0]):
    for j in range(size[1]):
        for k in range(size[2]):
            for p in range(size[3]):
                a[i,j,k,p] = i*size[1]*size[2]*size[3] + j*size[2]*size[3] + k*size[3] + p
print(a)
print('compare')
indices = (0,:,0,0)
print(a[0,:,0,0])
print(a[indices])
In short, I'm trying to get a tuple (or something similar) that can be used to perform either of the following accesses, depending on how I fill the tuple:
a[i, 0, :, 1]
a[i, :, 1]
The slice method looked promising, but it seems to require a range, and I just want a ":" i.e. the whole dimension. What options are out there for variable numpy array dimension access?
In [324]: a = np.arange(8).reshape(2,2,2)
In [325]: a
Out[325]:
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
slicing:
In [326]: a[0,:,0]
Out[326]: array([0, 2])
In [327]: idx = (0,slice(None),0) # interpreter converts : into slice object
In [328]: a[idx]
Out[328]: array([0, 2])
In [331]: idx
Out[331]: (0, slice(None, None, None), 0)
In [332]: np.s_[0,:,0] # indexing trick to generate same
Out[332]: (0, slice(None, None, None), 0)
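Building on the slice(None) idea, an index tuple can be assembled at run time for either a 3D or a 4D array. This is only a sketch: the helper name `build_index` and its `fixed` argument (a dict mapping axis number to index) are made up for illustration.

```python
import numpy as np

def build_index(ndim, fixed):
    """Return an index tuple with ':' on every axis except those listed in `fixed`."""
    idx = [slice(None)] * ndim          # start with ':' everywhere
    for axis, value in fixed.items():
        idx[axis] = value               # pin the requested axes
    return tuple(idx)                   # indexing needs a tuple, not a list

a3 = np.arange(8).reshape(2, 2, 2)
a4 = np.arange(16).reshape(2, 2, 2, 2)

i = 0
idx3 = build_index(3, {0: i, 2: 1})         # same as a3[i, :, 1]
idx4 = build_index(4, {0: i, 1: 0, 3: 1})   # same as a4[i, 0, :, 1]

print(a3[idx3])  # [1 3]
print(a4[idx4])  # [1 3]
```

A base-class method could return such a tuple and leave it to each derived class to decide how many axes to pin.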
Your code appears to work how you want it using :. The reason the two examples
(a[i, 0, :, 7], a[i, :, 7])
don't work is because the 7 is out of range of the array. If you change the 7 to something in range like 1 then it returns a value, which I believe is what you are looking for.
I have this subcode in Python and I cannot understand what it is or what it does, especially this statement:
X[:,:,:,i]
The subcode is:
train_dict = sio.loadmat(train_location)
X = np.asarray(train_dict['X'])
X_train = []
for i in range(X.shape[3]):
    X_train.append(X[:,:,:,i])
X_train = np.asarray(X_train)
Y_train = train_dict['y']
for i in range(len(Y_train)):
    if Y_train[i]%10 == 0:
        Y_train[i] = 0
Y_train = to_categorical(Y_train,10)
return (X_train,Y_train)
This is called array slicing. As @cᴏʟᴅsᴘᴇᴇᴅ mentioned, X is a 4D array and X[:,:,:,i] gets one specific 3D array slice of it.
Maybe an example with fewer dimensions can help.
matrix = np.arange(4).reshape((2,2))
In this case matrix is a bidimensional array:
array([[0, 1],
[2, 3]])
Therefore matrix[:, 1] will result in a smaller slice of matrix:
array([1, 3])
In the original code, X[:,:,:,i], each of the leading : means something like "all elements in this dimension", so the expression selects the i-th 3D slice along the last axis.
Have a look at how array slicing works in the NumPy indexing documentation.
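To make the effect of the loop concrete, here is a small sketch on made-up data. The array `X` below is a hypothetical stand-in for train_dict['X'], and the shapes are chosen only for illustration. Collecting X[:,:,:,i] for every i just moves the last axis to the front, which np.moveaxis does in one step. The label loop can be written with np.where in the same spirit.

```python
import numpy as np

# Hypothetical stand-in for train_dict['X']: 4 samples of shape 2x3x3 stacked on the last axis.
X = np.arange(2 * 3 * 3 * 4).reshape(2, 3, 3, 4)

# What the loop in the question builds: one 3D slice per sample.
X_loop = np.asarray([X[:, :, :, i] for i in range(X.shape[3])])

# Same result without a Python loop: move the sample axis to the front.
X_moved = np.moveaxis(X, -1, 0)

print(X_loop.shape, X_moved.shape)        # (4, 2, 3, 3) (4, 2, 3, 3)
print(np.array_equal(X_loop, X_moved))    # True

# Hypothetical labels shaped like a .mat column vector; multiples of 10 become 0.
Y = np.array([[1], [10], [3], [10]])
Y_fixed = np.where(Y % 10 == 0, 0, Y)
print(Y_fixed.ravel())                    # [1 0 3 0]
```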
I am trying to get the x and y coordinates of a given value in a numpy image array.
I can do it by running through the rows and columns manually with a for statement, but this seems rather slow and I am positive there is a better way to do this.
I was trying to modify a solution I found in this post: "Finding the (x,y) indexes of specific (R,G,B) color values from images stored in NumPy ndarrays".
a = image
c = intensity_value
y_locs = np.where(np.all(a == c, axis=0))
x_locs = np.where(np.all(a == c, axis=1))
return np.int64(x_locs), np.int64(y_locs)
I have the np.int64 to convert the values back to int64.
I was also looking at numpy.where documentation
I don't quite understand the problem. The axis parameter in all() runs over the colour channels (axis 2 or -1) rather than the x and y indices. Then where() will give you the coordinates of the matching values in the image:
>>> # set up data
>>> image = np.zeros((5, 4, 3), dtype=int)  # np.int was removed in recent NumPy; plain int works
>>> image[2, 1, :] = [7, 6, 5]
>>> # find indices
>>> np.where(np.all(image == [7, 6, 5], axis=-1))
(array([2]), array([1]))
>>>
This is really just repeating the answer you linked to, but it is a bit too long for a comment. Maybe you could explain a bit more why you need to modify the previous answer? It doesn't seem like you need to.
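In case it helps, here is a short sketch of getting the coordinates out as separate row/column arrays or as (row, col) pairs. The two matching pixels and the single-channel `gray` image are assumptions made up for illustration. Note that the first array returned is the row index (y) and the second is the column index (x).

```python
import numpy as np

image = np.zeros((5, 4, 3), dtype=int)
image[2, 1, :] = [7, 6, 5]
image[4, 3, :] = [7, 6, 5]

# True where all three channels match the target colour; shape (5, 4).
mask = np.all(image == [7, 6, 5], axis=-1)

# Separate arrays of row (y) and column (x) indices.
rows, cols = np.nonzero(mask)
print(rows, cols)      # [2 4] [1 3]

# One (row, col) pair per matching pixel.
coords = np.argwhere(mask)
print(coords)
# [[2 1]
#  [4 3]]

# For a single-channel image no np.all is needed: compare directly.
gray = image[..., 0]
gray_rows, gray_cols = np.nonzero(gray == 7)
print(gray_rows, gray_cols)   # [2 4] [1 3]
```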