I am currently working with NumPy and need to transform large data sets. The starting point is a set of one-dimensional arrays, which should be combined into one large two-dimensional array. Here is a small example of how it should look.
# Returns 16 arrays with four numbers in each array (from 0 to 3)
array = [np.arange(4) for i in range(16)]
# Each of these arrays should be transformed to become a two-dimensional shape=(2,2)
array = [a.reshape((2, 2)) for a in array]
1. Step:
# First, all of them are to be joined side by side:
[[0,1,0,1,0,1,0,1,0,1, ...,0,1],
[2,3,2,3,2,3,2,3,2,3, ...,2,3]]
2. Step:
# Afterwards, the joined array is to be wrapped at a fixed width (let's say 8). The result should look like this:
[[0,1,0,1,0,1,0,1],
[2,3,2,3,2,3,2,3],
[0,1,0,1,0,1,0,1],
[2,3,2,3,2,3,2,3]]
This is only a miniature example. I'm actually working with an array with the length of 64 that is to be converted to an array with the shape=(8, 8). And at the end I want to create a 2 dimensional array with the dimensions 416x416.
Edit: So my current question is, how do I get to the first and second step in the example above?
Assuming that you have created array (as you described), try the following code:
chunkSize = 8 # No of columns in the result
# Reshape and bring together
array2 = np.hstack([ arr.reshape(2,2) for arr in array ])
# The final result
result = np.vstack(np.hsplit(array2, [ (n + 1) * chunkSize
for n in range(int(array2.shape[1] / chunkSize) - 1) ]))
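The same two steps can also be written without hsplit, using only reshape and transpose. This is a minimal sketch, assuming the total number of columns is an exact multiple of the wrap width; the names blocks, wide and wrapped are made up for the illustration.

```python
import numpy as np

blocks = [np.arange(4) for _ in range(16)]
chunk_size = 8  # number of columns in the wrapped result

# Step 1: reshape each block to (2, 2) and join them side by side -> (2, 32)
wide = np.hstack([b.reshape(2, 2) for b in blocks])

# Step 2: cut the columns into groups of chunk_size, then stack the groups vertically
n_rows = wide.shape[0]
wrapped = (wide.reshape(n_rows, -1, chunk_size)  # (2, 4, 8): row, group, column
               .transpose(1, 0, 2)               # (4, 2, 8): group, row, column
               .reshape(-1, chunk_size))         # (8, 8)
print(wrapped.shape)  # (8, 8)
```

The transpose is what keeps the two rows of each group together; a plain wide.reshape(-1, chunk_size) would interleave them differently.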
You can use np.pad, with mode='wrap':
final_width = 8
final_height = 8
a = np.arange(4).reshape(2,2)
padded = np.pad(a, ((0, final_height-a.shape[0]),(0, final_width-a.shape[1])), mode='wrap')
padded
out:
array([[0, 1, 0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3, 2, 3],
[0, 1, 0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3, 2, 3],
[0, 1, 0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3, 2, 3],
[0, 1, 0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3, 2, 3]])
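When the target size is an exact multiple of the block size, np.tile should give the same result as the wrap padding. A small sketch comparing the two (the variable names are only for illustration):

```python
import numpy as np

a = np.arange(4).reshape(2, 2)

# Repeat the 2x2 block 4 times along each axis -> (8, 8)
tiled = np.tile(a, (4, 4))

# Same target size via wrap padding, as in the answer above
padded = np.pad(a, ((0, 8 - a.shape[0]), (0, 8 - a.shape[1])), mode='wrap')

print(np.array_equal(tiled, padded))  # True
```

Note that both variants only cover the case where every block is identical; they do not combine different blocks.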
Well the example you posted explains everything clearly. I don't know what the problem is.
An implementation for your 64 to 8*8 transformation would be like:
import numpy as np
a = np.arange(64) # defines a 1-D array with 64 elements
# a = array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
# 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
# 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
# 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63])
print(a.shape) # (64,)
a = a.reshape((8,8)) # makes "a" an 8*8 2-D array
# array([[ 0, 1, 2, 3, 4, 5, 6, 7],
# [ 8, 9, 10, 11, 12, 13, 14, 15],
# [16, 17, 18, 19, 20, 21, 22, 23],
# [24, 25, 26, 27, 28, 29, 30, 31],
# [32, 33, 34, 35, 36, 37, 38, 39],
# [40, 41, 42, 43, 44, 45, 46, 47],
# [48, 49, 50, 51, 52, 53, 54, 55],
# [56, 57, 58, 59, 60, 61, 62, 63]])
print(a.shape) # (8, 8)
The same goes for converting a 1-D array of 173056 elements into a 416*416 one.
Maybe you are unsure whether the reshape method can be used on a 2-D array. Of course it can!
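If the 416x416 result is meant to be built from many 64-element arrays laid out as 8x8 blocks, row by row, a reshape combined with a transpose can do it. This is only a sketch under that assumption; the helper assemble and its parameters are hypothetical, not something from the question.

```python
import numpy as np

def assemble(blocks, block_side, blocks_per_row):
    """Lay flat blocks out row by row into one 2-D array (hypothetical helper)."""
    blocks = np.asarray(blocks)
    n_block_rows = blocks.shape[0] // blocks_per_row
    tiled = blocks.reshape(n_block_rows, blocks_per_row, block_side, block_side)
    # Put the within-block row axis next to the block-row axis, then merge axes.
    return tiled.transpose(0, 2, 1, 3).reshape(n_block_rows * block_side,
                                                blocks_per_row * block_side)

# 52 * 52 = 2704 blocks of 64 values each give a 416 x 416 result.
flat_blocks = np.arange(2704 * 64).reshape(2704, 64)
big = assemble(flat_blocks, block_side=8, blocks_per_row=52)
print(big.shape)  # (416, 416)
```

With this layout, block number k ends up at block row k // 52 and block column k % 52.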
Hi, I'm fairly new to Python, and an assignment requires me to print the elements of a numpy array that are less than a given value.
I made a 20x10 numpy array of random integers between -5 and 50:
x = np.random.randint (-5, 50, (20, 10))
x
array([[17, 23, 15, 13, -1, 17, 30, 14, 2, 3],
[ 8, 0, -5, 3, 10, 10, 48, 6, -1, 34],
[23, 40, 21, 5, 47, 41, 44, 22, 46, 30],
[36, 13, 48, 29, 46, 25, 48, 38, 13, 40],
[18, -4, 1, 37, 48, 43, 25, 11, 21, 30],
[44, 37, 4, 39, 8, 1, 33, 34, 3, 8],
[ 2, 11, 17, 10, 20, 3, 30, 1, 12, 2],
[15, 20, -3, 11, 45, 40, 18, 19, -1, 31],
[39, 44, 18, 25, 49, 20, 15, 28, 32, 18],
[22, 24, 28, 46, 48, 46, 17, 49, 2, 36],
[44, 4, 49, -5, 14, 31, 12, 15, 48, 43],
[-2, 37, -4, 15, 31, -1, 11, 43, 42, 5],
[40, 35, 25, 22, 38, 26, 15, 1, 4, 22],
[42, 30, 14, 7, 13, 44, 5, 29, 28, 38],
[-2, 7, 31, -4, 44, -5, 34, 19, 31, 30],
[ 0, 1, -2, 29, 35, 28, 23, -1, 21, 27],
[40, 46, 4, 48, 0, 28, 2, 25, 3, 49],
[15, 2, -2, 16, 22, 39, -2, 33, 15, 2],
[14, 26, -5, 0, 22, 38, 25, 4, 14, 2],
[16, 32, 23, 3, 38, 41, -5, 35, 46, 33]])
Above is the result. Now I want to print the number of elements that are less than 5 in each row.
I managed to do this:
print (x[0, :] < 5)
[False False False False True False False False True True]
The result is shown above, but what I wanted was the number of elements that are less than 5. For this row it should give me 3, since there are 3 such elements.
Can anyone help me with this? Thank you.
It's possible to use np.sum for arrays of type bool like yours. So, at first I have tried the following:
[np.sum(n<5) for n in x]
This gives me a list [3, 4, 0, 0, 2, 3, 4, 2, 0, 1, 2, 3, 2, 0, 3, 4, 4, 4, 4, 2], which is correct, but looping over rows in Python is usually slower than a vectorized numpy call. Here is a more idiomatic way to do this in numpy:
np.sum(x<5, axis=1)
This command builds a boolean array from x and then counts the True values in each row by summing along axis 1, i.e. across the columns.
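To make the role of the axis argument concrete, here is a tiny sketch on a made-up 2x3 array (not the data from the question):

```python
import numpy as np

x = np.array([[1, 7, 3],
              [9, 8, 6]])
mask = x < 5                 # boolean array, True where the element is below 5

per_row = mask.sum(axis=1)   # sum across columns: one count per row
per_col = mask.sum(axis=0)   # sum across rows: one count per column

print(per_row)  # [2 0]
print(per_col)  # [1 0 1]
```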
You can use your boolean mask to index the array and then count the elements. Alternatively, you can use numpy.where(). Given only a condition, it returns the indices at which the condition is met rather than a boolean mask.
For your example (this counts over the whole array, not per row):
indices = numpy.where(x < 5)
values_less_than_5 = x[indices]
count = len(values_less_than_5)
print(count)
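As a small illustration of what np.where returns for a 2-D condition, and of how a per-row count can be derived from it, consider this sketch on a made-up array:

```python
import numpy as np

x = np.array([[1, 7, 3],
              [9, 8, 4]])

# With a single argument, np.where returns one index array per dimension.
rows, cols = np.where(x < 5)
print(rows)  # [0 0 1]
print(cols)  # [0 2 2]

total = rows.size                                    # matches anywhere in the array
per_row = np.bincount(rows, minlength=x.shape[0])    # matches in each row
print(total)    # 3
print(per_row)  # [2 1]
```

np.count_nonzero(x < 5, axis=1) gives the same per-row counts without going through the indices.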
I have these two 1d arrays A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and its label L = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]; where L[i] is the label of A[i].
Objective: I need to randomly shuffle both 1d arrays in such a way that each element keeps its label at the same index.
e.g. after the shuffle:
A= [2, 4, 9, 1, 3, 6, 0, 7, 5] then
L= [7, 5, 0, 8, 6, 3, 9, 2, 4], so each A[i] still has the same label L[i] as in the original.
I was thinking of concatenating the two 1d arrays into a single 2d array, shuffling it, and then separating the two 1d arrays again. It's not working, and I am stuck at the shuffle.
Below is the code that I tried
import numpy as np
import random
# initializing the contents
A = np.arange(0,10)
length = len(A)
print(length)
print(A)
labels = np.zeros(10)
for index in range(length):
    labels[index] = A[length-index-1]
print(labels)
# end, contents ready
combine = []
combine.append([A, labels])
print(combine)
random.shuffle(combine)
print("After shuffle")
print(combine)
If you are using Numpy just use a numpythonic approach. Create the pairs using np.column_stack and shuffle them with numpy.random.shuffle function:
pairs = np.column_stack((A, L))
np.random.shuffle(pairs)
Demo:
In [16]: arr = np.column_stack((A, L))
In [17]: np.random.shuffle(arr)
In [18]: arr
Out[18]:
array([[4, 5],
[5, 4],
[7, 2],
[1, 8],
[3, 6],
[6, 3],
[8, 1],
[2, 7],
[9, 0],
[0, 9]])
If you want to get the arrays just do a simple indexing:
In [19]: arr[:,0]
Out[19]: array([4, 5, 7, 1, 3, 6, 8, 2, 9, 0])
In [20]: arr[:,1]
Out[20]: array([5, 4, 2, 8, 6, 3, 1, 7, 0, 9])
Your thought was in the right direction. You just needed some Python-Fu:
from random import shuffle
A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
L = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
res = list(zip(A, L))
shuffle(res) # shuffles in-place!
A, L = zip(*res) # unzip
print(A) # -> (4, 0, 2, 1, 8, 7, 9, 6, 5, 3)
print(L) # -> (5, 9, 7, 8, 1, 2, 0, 3, 4, 6)
The unzipping operation is explained here in detail in case you are wondering how it works.
You can also keep an index array np.arange(size) where size is the length of A and L and do shuffling on this array. Then use this array to rearrange A and L.
idx = np.arange(10)
np.random.shuffle(idx) # shuffles in place and returns None; or use idx = np.random.permutation(10)
A = np.arange(100).reshape(10, 10)
L = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
L[idx], A[idx]
# output
(array([2, 5, 1, 7, 8, 9, 0, 6, 4, 3]),
array([[70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]]))
Reference
Numpy: Rearrange array based upon index array
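The index-array idea can be sketched with the newer Generator API as well. The seed and the variable names below are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only to make the example repeatable

A = np.arange(10)
L = A[::-1].copy()               # L[i] is the label of A[i]

perm = rng.permutation(len(A))   # one random ordering of the indices 0..9
A_shuffled = A[perm]
L_shuffled = L[perm]             # same ordering, so the pairs stay aligned

print(A_shuffled)
print(L_shuffled)
```

Because both arrays are indexed with the same permutation, A_shuffled[i] and L_shuffled[i] always come from the same original position.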
What does this mean?
data.transpose(3, 0, 1, 2)
Also, if data.shape == (10, 10, 10), why do I get ValueError: axes don't match array?
Let me discuss in terms of Python3.
I use the transpose function in python as data.transpose(3, 0, 1, 2)
This is wrong, as this operation requires 4 dimensions while you only provide 3 (as in (10,10,10)). The same error appears whenever the number of axes does not match the number of dimensions, for example:
>>> a = np.arange(60).reshape((1,4,5,3))
>>> b = a.transpose((2,0,1))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: axes don't match array
You can add another dimension simply by reshaping (10,10,10) to (1,10,10,10) if the image batch is 1. This can be done as:
w,h,c = original_image.shape #10,10,10
modified_img = np.reshape(original_image, (1,w,h,c)) #(1,10,10,10)
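Adding the leading axis can also be done with np.newaxis or np.expand_dims, after which the four-axis transpose works. A short sketch, assuming original_image is a (10, 10, 10) array as in the question:

```python
import numpy as np

original_image = np.zeros((10, 10, 10))        # stand-in for the real data

batched = original_image[np.newaxis, ...]      # shape (1, 10, 10, 10)
also_batched = np.expand_dims(original_image, axis=0)

moved = batched.transpose(3, 0, 1, 2)          # now valid: 4 axes for 4 dimensions
print(batched.shape, moved.shape)  # (1, 10, 10, 10) (10, 1, 10, 10)
```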
What does 3, 0, 1, 2 mean?
For 2D numpy arrays, transpose swaps rows and columns, just as the name says. For higher-dimensional arrays like yours, it permutes the axes in the order you give; moving the last axis to the front, as here, is the same as what moveaxis does.
>>> a = np.arange(60).reshape((4,5,3))
>>> b = a.transpose((2,0,1))
>>> b.shape
(3, 4, 5)
>>> c = np.moveaxis(a,-1,0)
>>> c.shape
(3, 4, 5)
>>> b
array([[[ 0, 3, 6, 9, 12],
[15, 18, 21, 24, 27],
[30, 33, 36, 39, 42],
[45, 48, 51, 54, 57]],
[[ 1, 4, 7, 10, 13],
[16, 19, 22, 25, 28],
[31, 34, 37, 40, 43],
[46, 49, 52, 55, 58]],
[[ 2, 5, 8, 11, 14],
[17, 20, 23, 26, 29],
[32, 35, 38, 41, 44],
[47, 50, 53, 56, 59]]])
>>> c
array([[[ 0, 3, 6, 9, 12],
[15, 18, 21, 24, 27],
[30, 33, 36, 39, 42],
[45, 48, 51, 54, 57]],
[[ 1, 4, 7, 10, 13],
[16, 19, 22, 25, 28],
[31, 34, 37, 40, 43],
[46, 49, 52, 55, 58]],
[[ 2, 5, 8, 11, 14],
[17, 20, 23, 26, 29],
[32, 35, 38, 41, 44],
[47, 50, 53, 56, 59]]])
As evident, both methods work the same.
The operation converts from (samples, rows, columns, channels) into (channels, samples, rows, columns). The common OpenCV-to-PyTorch layout change, (samples, channels, rows, columns), would be transpose(0, 3, 1, 2) instead.
Have a look at numpy.transpose
Use transpose(a, argsort(axes)) to invert the transposition of tensors
when using the axes keyword argument.
Transposing a 1-D array returns an unchanged view of the original
array.
e.g.
>>> x = np.arange(4).reshape((2,2))
>>> x
array([[0, 1],
[2, 3]])
>>>
>>> np.transpose(x)
array([[0, 2],
[1, 3]])
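The argsort remark from the documentation can be checked on a small 3-D array. This sketch uses an arbitrary axes tuple chosen for the example:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
axes = (2, 0, 1)

b = np.transpose(a, axes)                     # shape (4, 2, 3)
restored = np.transpose(b, np.argsort(axes))  # argsort(axes) is the inverse permutation

print(b.shape)                      # (4, 2, 3)
print(np.array_equal(restored, a))  # True
```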
You specified too many values in the transpose
>>> a = np.arange(8).reshape(2,2,2)
>>> a.shape
(2, 2, 2)
>>> a.transpose([2,0,1])
array([[[0, 2],
[4, 6]],
[[1, 3],
[5, 7]]])
>>> a.transpose(3,0,1,2)
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
ValueError: axes don't match array
>>>
From the NumPy documentation on np.transpose, the second argument of the function is axes: a list of ints, optional.
By default, reverse the dimensions, otherwise permute the axes
according to the values given.
Example :
>>> x = np.arange(9).reshape((3,3))
>>> x
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> np.transpose(x, (0,1))
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> np.transpose(x, (1,0))
array([[0, 3, 6],
[1, 4, 7],
[2, 5, 8]])
The thing is that you have taken a 3-dimensional array and applied a 4-dimensional transpose.
Your command converts a 4d array (batch, rows, cols, channel) into another 4d array (channel, batch, rows, cols), but you need a command for a 3d array. So remove the 3 and write
data.transpose(2, 0, 1).
For all i, j, k, l, the following holds true:
arr[i, j, k, l] == arr.transpose(3, 0, 1, 2)[l, i, j, k]
transpose(3, 0, 1, 2) reorders the array dimensions from (a, b, c, d) to (d, a, b, c):
>>> arr = np.zeros((10, 11, 12, 13))
>>> arr.transpose(3, 0, 1, 2).shape
(13, 10, 11, 12)
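The index identity above can be verified by brute force on a small array. The shape below is an arbitrary choice that keeps the loop short:

```python
import numpy as np

arr = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
t = arr.transpose(3, 0, 1, 2)
print(t.shape)  # (5, 2, 3, 4)

# Check arr[i, j, k, l] == t[l, i, j, k] for every index combination.
identity_holds = all(arr[i, j, k, l] == t[l, i, j, k]
                     for i, j, k, l in np.ndindex(arr.shape))
print(identity_holds)  # True
```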
I have a variable dimension numpy array, for example it could have the following shapes
(64, 64)
(64, 64, 2, 5)
(64, 64, 40)
(64, 64, 10, 20, 4)
What I want to do is that if the number of dimensions is greater than 3, I want to collapse/stack everything else into the third dimension while preserving order. So, in my above example the shapes after the operation should be:
(64, 64)
(64, 64, 10)
(64, 64, 40)
(64, 64, 800)
Also, the order needs to be preserved. For example, the array of the shape (64, 64, 2, 5) should be stacked as
(64, 64, 2)
(64, 64, 2)
(64, 64, 2)
(64, 64, 2)
(64, 64, 2)
i.e. the 3D slices one after the other. Also, after the operation I would like to reshape it back to the original shape without any permutation i.e. preserve the original order.
One way I could do is multiply all the dimension values from 3 to the last dimension i.e.
shape = array.shape
if len(shape) > 3:
    final_dim = 1
    for i in range(2, len(shape)):
        final_dim *= shape[i]
and then reshape the array. Something like:
array.reshape(64, 64, final_dim)
However, I was not sure whether the order is preserved the way I want, and whether there is a more Pythonic way to achieve this.
EDIT: As pointed out in the other answers it is even easier to just provide -1 as the third dimension for reshape. Numpy automatically determines the correct shape then.
I am not sure what the problem here is. You can just use np.reshape and it preserves order. See the following code:
import numpy as np
A = np.random.rand(20,20,2,2,18,5)
print(A.shape)
new_dim = np.prod(A.shape[2:])
print(new_dim)
B = np.reshape(A, (A.shape[0], A.shape[1], new_dim))
print(B.shape)
C = B.reshape((20,20,2,2,18,5))
print(np.array_equal(A,C))
The output is:
(20, 20, 2, 2, 18, 5)
360
(20, 20, 360)
True
This accomplishes exactly what you asked for.
reshape accepts -1 for one dimension and infers its size automatically:
a = np.random.rand(20,20,8,6,4)
s = a.shape[:2]
if a.ndim > 2: s = s + (-1,)
b = a.reshape(s)
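Wrapped into a function, the -1 approach might look like the sketch below. The name collapse_trailing is made up for this example, and the ordering is plain C order (last axis varies fastest), which may or may not be the stacking the question asks for.

```python
import numpy as np

def collapse_trailing(a):
    """Merge every axis after the second into one (hypothetical helper)."""
    if a.ndim <= 3:
        return a
    return a.reshape(a.shape[0], a.shape[1], -1)

for shape in [(64, 64), (64, 64, 2, 5), (64, 64, 40), (64, 64, 10, 20, 4)]:
    print(shape, '->', collapse_trailing(np.zeros(shape)).shape)

# Round trip: reshaping back with the original shape restores the array.
a = np.arange(4 * 4 * 2 * 5).reshape(4, 4, 2, 5)
b = collapse_trailing(a)
restored = b.reshape(a.shape)
print(np.array_equal(restored, a))  # True
```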
Going by the requirement of stacking for the given (64, 64, 2, 5) sample, I think you need to permute the axes. For the permuting, we can use np.rollaxis, like so -
def collapse_dims(a):
    if a.ndim>3:
        return np.rollaxis(a,-1,2).reshape(a.shape[0],a.shape[1],-1)
    else:
        return a
Sample run on the given four sample shapes -
1) Sample shapes :
In [234]: shp1 = (64, 64)
...: shp2 = (64, 64, 2, 5)
...: shp3 = (64, 64, 40)
...: shp4 = (64, 64, 10, 20, 4)
...:
Case #1 :
In [235]: a = np.random.randint(11,99,(shp1))
In [236]: np.allclose(a, collapse_dims(a))
Out[236]: True
Case #2 :
In [237]: a = np.random.randint(11,99,(shp2))
In [238]: np.allclose(a[:,:,:,0], collapse_dims(a)[:,:,0:2])
Out[238]: True
In [239]: np.allclose(a[:,:,:,1], collapse_dims(a)[:,:,2:4])
Out[239]: True
In [240]: np.allclose(a[:,:,:,2], collapse_dims(a)[:,:,4:6]) # .. so on
Out[240]: True
Case #3 :
In [241]: a = np.random.randint(11,99,(shp3))
In [242]: np.allclose(a, collapse_dims(a))
Out[242]: True
Case #4 :
In [243]: a = np.random.randint(11,99,(shp4))
In [244]: np.allclose(a[:,:,:,:,0].ravel(), collapse_dims(a)[:,:,:200].ravel())
Out[244]: True
In [245]: np.allclose(a[:,:,:,:,1].ravel(), collapse_dims(a)[:,:,200:400].ravel())
Out[245]: True
I'll try to illustrate the concern that @Divaker brings up.
In [522]: arr = np.arange(2*2*3*4).reshape(2,2,3,4)
In [523]: arr
Out[523]:
array([[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]],
[[[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]],
[[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]]])
4 is the inner most dimension, so it displays the array as 3x4 blocks. And if you pay attention to spaces and [] you'll see there are 2x2 blocks.
Notice what happens when we use the reshape:
In [524]: arr1 = arr.reshape(2,2,-1)
In [525]: arr1
Out[525]:
array([[[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]],
[[24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]]])
Now it is 2 2x12 blocks. You can do anything to those 12 element rows, and reshape them back to 3x4 blocks
In [526]: arr1.reshape(2,2,3,4)
Out[526]:
array([[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
...
But I could also split this array on the last dimension. np.split can do it, but a list comprehension is easier to understand:
In [527]: alist = [arr[...,i] for i in range(4)]
In [528]: alist
Out[528]:
[array([[[ 0, 4, 8],
[12, 16, 20]],
[[24, 28, 32],
[36, 40, 44]]]),
array([[[ 1, 5, 9],
[13, 17, 21]],
[[25, 29, 33],
[37, 41, 45]]]),
array([[[ 2, 6, 10],
[14, 18, 22]],
[[26, 30, 34],
[38, 42, 46]]]),
array([[[ 3, 7, 11],
[15, 19, 23]],
[[27, 31, 35],
[39, 43, 47]]])]
This contains 4 (2,2,3) arrays. Note that the 3 element rows display as columns in the 4d display.
I can reform into a 4d array with np.stack (which is like np.array, but gives more control of how the arrays are joined):
In [529]: np.stack(alist, axis=-1)
Out[529]:
array([[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
...
[[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]]])
==========
The split equivalent is [x[...,0] for x in np.split(arr, 4, axis=-1)]. Without the indexing split produces (2, 2, 3, 1) arrays.
collapse_dims produces (for my example):
In [532]: np.rollaxis(arr,-1,2).reshape(arr.shape[0],arr.shape[1],-1)
Out[532]:
array([[[ 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11],
[12, 16, 20, 13, 17, 21, 14, 18, 22, 15, 19, 23]],
[[24, 28, 32, 25, 29, 33, 26, 30, 34, 27, 31, 35],
[36, 40, 44, 37, 41, 45, 38, 42, 46, 39, 43, 47]]])
A (2,2,12) array, but with the elements in rows in a different order. It does a transpose on the inner 2 dimensions before flattening.
In [535]: arr[0,0,:,:].ravel()
Out[535]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [536]: arr[0,0,:,:].T.ravel()
Out[536]: array([ 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11])
Restoring that back to the original order requires another roll or transpose (arr2 here is the (2,2,12) result of the rollaxis and reshape shown above):
In [542]: arr2.reshape(2,2,4,3).transpose(0,1,3,2)
Out[542]:
array([[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
....
[[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]]])
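Both variants and their inverses can be put side by side. The sketch below uses np.moveaxis in place of np.rollaxis, which should be equivalent for this case; the variable names are chosen for the example only.

```python
import numpy as np

arr = np.arange(2 * 2 * 3 * 4).reshape(2, 2, 3, 4)

# Variant 1: plain reshape, undone by a plain reshape.
flat = arr.reshape(2, 2, -1)
back1 = flat.reshape(arr.shape)

# Variant 2: move the last axis to position 2 first, then flatten.
rolled = np.moveaxis(arr, -1, 2)              # shape (2, 2, 4, 3)
arr2 = rolled.reshape(2, 2, -1)               # same values as collapse_dims(arr)
# Undo: restore the rolled shape, then move axis 2 back to the end.
back2 = np.moveaxis(arr2.reshape(rolled.shape), 2, -1)

print(np.array_equal(back1, arr), np.array_equal(back2, arr))  # True True
```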
I'm having a bit of trouble understanding the output of unravel_index in the context of the following bit of code.
Using meshgrid I create two arrays representative of some coordinates:
import numpy as np
x_in=np.arange(-800, 0, 70)
y_in=np.arange(-3500, -2000, 70)
y, x =np.meshgrid(y_in,x_in,indexing='ij')
I then run through one of the grids to identify values within certain limits:
limit=100
x_gd=x[np.logical_and(x>=-600-limit,x<=-600+limit)]
This returns an array with the values I'm interested in - to get the indices of these values I use the following function (which I developed after reading this):
def get_index(array, select_array):
    '''
    Find the index positions of values from select_array in array
    '''
    rows,cols=array.shape
    flt = array.flatten()
    sorted = np.argsort(flt)
    pos = np.searchsorted(flt[sorted], select_array)
    indices = sorted[pos]
    y_indx, x_indx = np.unravel_index(indices, [rows, cols])
    return y_indx, x_indx
xx_y_indx, xx_x_indx = get_index(x, x_gd)
xx_x_indx returns what I expect - the col reference for the values from x:
array([2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3,
4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2,
3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4], dtype=int64)
xx_y_indx however returns:
array([15, 2, 19, 15, 2, 19, 15, 2, 19, 15, 2, 19, 15, 2, 19, 15, 2,
19, 15, 2, 19, 15, 2, 19, 15, 2, 19, 15, 2, 19, 15, 2, 19, 15,
2, 19, 15, 2, 19, 15, 2, 19, 15, 2, 19, 15, 2, 19, 15, 2, 19,
15, 2, 19, 15, 2, 19, 15, 2, 19, 15, 2, 19, 15, 2, 19], dtype=int64)
when I would expect it to show all rows, since the coordinates in array x are identical on every line, not just rows 15, 2 and 19.
For what I'm interested in, I can just use the result of xx_x_indx - the column indices. However, I can't explain why the y (row) indices report as they do.
This call to searchsorted does not find the location of every occurrence of the select_array values in flt[sorted]; it finds only the index of the first occurrence of each value.
pos = np.searchsorted(flt[sorted], select_array)
In [273]: pos
Out[273]:
array([44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66,
88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44,
66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88,
44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88])
Notice all the repeated values in pos.
Everything past this point is perhaps not what you intended, since you are not really working with all the locations of the select_array values in flt[sorted] or array.
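The first-occurrence behaviour is easy to see on a short sorted array with repeats. This is a made-up example, not the data from the question:

```python
import numpy as np

sorted_vals = np.array([10, 10, 10, 20, 20, 30])

# Default side='left': index of the first element >= each query value.
left = np.searchsorted(sorted_vals, [10, 20, 30])
# side='right': index just past the last element equal to each query value.
right = np.searchsorted(sorted_vals, [10, 20, 30], side='right')

print(left)   # [0 3 5]
print(right)  # [3 5 6]
```

Each query yields a single position, so repeated values in the sorted array are never all reported.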
You could fix the problem by using:
def get_index(array, select_array):
    '''
    Find the index positions of values from select_array in array
    '''
    mask = np.logical_or.reduce([array==val for val in np.unique(select_array)])
    y_indx, x_indx = np.where(mask)
    return y_indx, x_indx
or
def get_index2(array, select_array):
    idx = np.in1d(array.ravel(), select_array.ravel())
    y_indx, x_indx = np.where(idx.reshape(array.shape))
    return y_indx, x_indx
Which is faster depends on the number of elements in np.unique(select_array). When this is large, using a for-loop is slower, and hence get_index2 is faster. But if there are a lot of repeats in select_array and np.unique(select_array) is small, then get_index can be the faster option.
To demonstrate a use of np.unravel_index, you could even use
def get_index3(array, select_array):
    idx = np.in1d(array.ravel(), select_array.ravel())
    # np.where returns a tuple; take its first item to get the flat indices
    y_indx, x_indx = np.unravel_index(np.where(idx)[0], array.shape)
    return y_indx, x_indx
but I think this is slower than get_index2 in all cases: reshape is very fast, so using np.where with reshape is faster than using np.where with np.unravel_index.
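On recent NumPy versions, np.isin works directly on N-dimensional input, so the ravel and reshape steps can be dropped. A sketch using the grids from the question (the variable names follow the question; the use of np.isin is a substitution, not the answer's original code):

```python
import numpy as np

x_in = np.arange(-800, 0, 70)
y_in = np.arange(-3500, -2000, 70)
y, x = np.meshgrid(y_in, x_in, indexing='ij')

limit = 100
x_gd = x[np.logical_and(x >= -600 - limit, x <= -600 + limit)]

# Boolean mask with the same shape as x, True where x holds a selected value.
mask = np.isin(x, x_gd)
y_indx, x_indx = np.nonzero(mask)

print(np.unique(x_indx))  # [2 3 4]
print(np.unique(y_indx))  # every row index, 0 through 21
```

Here every row appears in y_indx, which is what the question expected from the row indices.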