I have two arrays, say,
n = [1,2,3,4,5,6,7,8,9]
nc = [3,0,2,0,1,2,0,0,0]
The nonzero elements of nc are ncz = [3, 2, 1, 2]. The elements of n corresponding to the nonzero elements of nc are p = [1, 3, 5, 6]. I need to create a new array with the elements of p[1:] inserted at the positions ncz.cumsum()[:-1] + 1, i.e. at [4, 6, 7].
Is there any way to do this without using np.insert or a for loop?
Suppose I have m such pairs of arrays. Would I be able to do the same thing for each pair without using a loop? The resulting arrays can be zero padded to bring them to the same shape.
The result would be [1, 2, 3, 4, 3, 5, 6, 5, 7, 6, 8, 9]
To do it using np.insert, one would do:
n = np.array([1,2,3,4,5,6,7,8,9])
nc = np.array([3,0,2,0,1,2,0,0,0])
p1 = n[nc.nonzero()][1:]
ncz1 = nc[nc.nonzero()][:-1].cumsum()
result = np.insert(n,ncz1+1,p1)
I know how to do this using numpy insert operation, but I need to replicate it in theano and theano doesn't have an insert op.
Because of its generality, np.insert is rather complex (but its source is available for study). For your case, with a 1-d array and ordered insertion points, it can be simplified to
np.insert(n, i, p1) with:
In [688]: n
Out[688]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
In [689]: p1
Out[689]: array([13, 15, 16])
In [690]: i
Out[690]: array([4, 6, 7], dtype=int32)
Target array, z, and the insertion points in that array:
In [691]: j=i+np.arange(len(i))
In [692]: z=np.zeros(len(n)+len(i),dtype=n.dtype)
Make a boolean mask, True where the n values go and False where the p1 values go:
In [693]: ind=np.ones(z.shape,bool)
In [694]: ind[j]=False
In [695]: ind
Out[695]:
array([ True, True, True, True, False, True, True, False, True,
False, True, True], dtype=bool)
Copy the values into the right slots:
In [696]: z[ind]=n
In [697]: z[~ind]=p1 # z[j]=p1 would also work
In [698]: z
Out[698]: array([ 1, 2, 3, 4, 13, 5, 6, 15, 7, 16, 8, 9])
This is typical of array operations that return a new array of a different size. Make the target, and copy the appropriate values. This is true even when the operations are done in compiled numpy code (e.g. concatenate).
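The steps above can be collected into a small helper; the function name insert_sorted is mine, not NumPy's:

```python
import numpy as np

def insert_sorted(n, i, p):
    """Insert values p into the 1-d array n at the sorted positions i,
    using the boolean-mask technique instead of np.insert."""
    j = i + np.arange(len(i))             # insertion slots in the larger array
    z = np.zeros(len(n) + len(i), dtype=n.dtype)
    mask = np.ones(z.shape, bool)         # True where n values go
    mask[j] = False                       # False where p values go
    z[mask] = n
    z[~mask] = p                          # z[j] = p would also work
    return z

n = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(insert_sorted(n, np.array([4, 6, 7]), np.array([13, 15, 16])))
# [ 1  2  3  4 13  5  6 15  7 16  8  9]
```

This sketch assumes i is already sorted; for unsorted insertion points np.insert does extra bookkeeping that is omitted here.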
So I'm trying to learn NumPy and I can't understand how this block of code gives the output it does:
arr = np.array([1,2,3,4,5,6,7,8,9,10])
arr[arr > 5]
Output :
array([6,7,8,9,10])
I do know that arr > 5 returns an array of boolean values, but I just can't understand how that boolean array, when passed to arr[...], gives the specified output.
Help appreciated.
>>> import numpy as np
>>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> a
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
As you already said, a > 5 results in an array of boolean values:
>>> mask = a > 5
>>> mask
array([False, False, False, False, False, True, True, True, True,
True])
This can be interpreted as a mask. Similar to the way you can access single elements, for example the first element, with
>>> a[0]
1
You can access specific elements by indexing with this mask:
>>> a[mask]
array([ 6, 7, 8, 9, 10])
1, 2, 3, 4, 5 don't appear because the first 5 elements of mask are False. The rest are True, so 6, 7, 8, 9, 10 are shown.
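One way to see what the mask does is to compare boolean indexing with an explicit Python loop (a sketch of the semantics, not of NumPy's internals):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mask = a > 5

# Boolean indexing keeps exactly the elements whose mask entry is True,
# in their original order -- the same result as this explicit loop:
by_loop = np.array([x for x, keep in zip(a, mask) if keep])

print(a[mask])   # [ 6  7  8  9 10]
print(by_loop)   # [ 6  7  8  9 10]
```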
I would like to know the most efficient way to test whether each element of one 1-d array exists in the corresponding row of another 2-d array, using Python.
Specifically, I have two arrays. The first is an 1-d array of integers. The second is a 2-d array of integers.
Sample input:
[1, 4, 12, 9] # array 1
[[1, 12, 299],
[2, 5, 11],
[1, 3, 11],
[0, 1, 9]] # array 2
Expected output:
[True, False, False, True]
You can reshape a to a 2-d array, compare it with b, and then check whether there's any True in each row:
np.equal(np.reshape(a, (-1,1)), b).any(axis=1)
a = [1, 4, 12, 9] # array 1
b = [[1, 12, 299],
[2, 5, 11],
[1, 3, 11],
[0, 1, 9]]
np.equal(np.reshape(a, (-1,1)), b).any(1)
# array([ True, False, False, True], dtype=bool)
This is a fairly Pythonic solution to your problem. Note that it works virtually identically for NumPy arrays and standard lists. It's definitely not the most efficient solution for huge NumPy arrays, but I doubt this would be a performance bottleneck, and pursuing premature optimization over readability is a cardinal sin.
a = [1, 4, 12, 9]
b = [
[1, 12, 299],
[2, 5, 11],
[1, 3, 11],
[0, 1, 9]
]
c = [x in y for x, y in zip(a,b)]
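As a quick sanity check, the list-comprehension version and the earlier broadcasting version agree on the sample data:

```python
import numpy as np

a = [1, 4, 12, 9]
b = [[1, 12, 299],
     [2, 5, 11],
     [1, 3, 11],
     [0, 1, 9]]

# Pure-Python row-wise membership test:
by_list = [x in y for x, y in zip(a, b)]
# NumPy version: compare a column vector against b and reduce each row:
by_numpy = np.equal(np.reshape(a, (-1, 1)), b).any(axis=1)

print(by_list)            # [True, False, False, True]
print(by_numpy.tolist())  # [True, False, False, True]
```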
I have a multidimensional array, say of shape (4, 3), that looks like
a = np.array([(1,2,3),(4,5,6),(7,8,9),(10,11,12)])
If I have a list of fixed conditions
conditions = [True, False, False, True]
How can I return the array
array([(1,2,3),(10,11,12)])
Using np.extract returns
>>> np.extract(conditions, a)
array([1, 4])
which only returns the first element along each nested array, as opposed to the array itself. I wasn't sure if or how I could do this with np.where. Any help is much appreciated, thanks!
Let's define your variables:
>>> import numpy as np
>>> a = np.array([(1,2,3),(4,5,6),(7,8,9),(10,11,12)])
>>> conditions = [True, False, False, True]
Now, let's select the elements that you want:
>>> a[np.array(conditions)]
array([[ 1, 2, 3],
[10, 11, 12]])
Aside
Note that the simpler a[conditions] has some ambiguity:
>>> a[conditions]
-c:1: FutureWarning: in the future, boolean array-likes will be handled as a boolean array index
array([[4, 5, 6],
[1, 2, 3],
[1, 2, 3],
[4, 5, 6]])
As you can see, conditions are treated here as (integer-like) index values which is not what we wanted.
You can use np.where; it's more or less made for exactly this situation:
>>> a[np.where(conditions)]
array([[ 1,  2,  3],
       [10, 11, 12]])
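np.compress is another built-in that fits here: with axis=0 it keeps whole rows where the condition is true, avoiding the partial-row behaviour of np.extract:

```python
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)])
conditions = [True, False, False, True]

# np.compress with axis=0 selects whole rows where the condition is True,
# unlike np.extract, which flattens the array first.
print(np.compress(conditions, a, axis=0))
# [[ 1  2  3]
#  [10 11 12]]
```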
I'm working on a problem with image processing, and my data is presented as a 3-dimensional NumPy array, where the (x, y, z) entry is the (x, y) pixel (numerical intensity value) of image z. There are 10000 images and each image is 25x25. Thus, the data matrix is of size 25x25x10000. I am trying to convert this into a 2-dimensional matrix of size 10000x625, where each row is a linearization of the pixels in the image. For example, suppose that instead the images were 3x3; we would have the following:
1 2 3
4 5 6 ------> [1, 2, 3, 4, 5, 6, 7, 8, 9]
7 8 9
I am attempting to do this by calling data.reshape((10000, 625)), but the data is no longer aligned properly after doing so. I have tried transposing the matrix in valid stages of reshaping, but that does not seem to fix it.
Does anyone know how to fix this?
If you want the data to stay aligned, you need to do data.reshape((625, 10000)).
If you want a different layout try np.rollaxis:
data_rolled = np.rollaxis(data, 2, 0) # This is Shape (10000, 25, 25)
data_reshaped = data_rolled.reshape(10000, 625) # Now you can do your reshape.
Reshaping never moves data around, so only "merge" dimensions that belong together, i.e. that are adjacent in the current shape.
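A small sketch of that rollaxis approach, using two 3x3 "images" stored images-last (shape (3, 3, 2)), so the result is easy to check by eye:

```python
import numpy as np

# Two 3x3 "images" stored with the image index last: shape (3, 3, 2).
data = np.dstack([np.arange(1, 10).reshape(3, 3),
                  np.arange(11, 20).reshape(3, 3)])

# Move the image axis to the front, then flatten each image into a row.
rows = np.rollaxis(data, 2, 0).reshape(2, 9)
print(rows)
# [[ 1  2  3  4  5  6  7  8  9]
#  [11 12 13 14 15 16 17 18 19]]
```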
The problem is that you aren't respecting the standard index order in your reshape call. The data will only be aligned if the two dimensions you want to combine are in the same position in the new array ((25, 25, 10000) -> (625, 10000)).
Then, to get the shape you want, you can transpose. It's easier to visualize with a smaller example -- when you run into problems like this, always try out a smaller example in the REPL if you can.
>>> a = numpy.arange(12)
>>> a = a.reshape(2, 2, 3)
>>> a
array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 6, 7, 8],
[ 9, 10, 11]]])
>>> a.reshape(4, 3)
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
>>> a.reshape(4, 3).T
array([[ 0, 3, 6, 9],
[ 1, 4, 7, 10],
[ 2, 5, 8, 11]])
No need to rollaxis!
Notice how the print layout that numpy uses makes this kind of reasoning easier. The differences between the first and the second step are only in the bracket positions; the numbers all stay in the same place, which often helps when you want to think through shape issues.
Suppose I have a 2-d array and two boolean arrays:
two_d = np.array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
first = np.array((True, True, False, False, False))
second = np.array((False, False, False, True, True))
Now, when I enter:
two_d[first, second]
I get:
array([3,9])
which doesn't make a whole lot of sense to me. Can anybody explain that simply?
When given multiple boolean arrays to index with, NumPy pairs up the indices of the True values: the first True value in first is paired with the first True value in second, and so on. NumPy then fetches the elements at each of these (x, y) indices.
This means that two_d[first, second] is equivalent to:
two_d[[0, 1], [3, 4]]
In other words, you're retrieving the values at index (0, 3) and index (1, 4): 3 and 9. Note that if the two arrays had different numbers of True values, an error would be raised (unless one of them has exactly one True value, which broadcasts)!
The documentation on advanced indexing mentions this behaviour briefly and suggests np.ix_ as a 'less surprising' alternative:
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
Hence you may be looking for:
>>> two_d[np.ix_(first, second)]
array([[3, 4],
[8, 9]])
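The shape-mismatch error mentioned above, and the np.ix_ alternative, can both be demonstrated in a few lines (the array here is my own example, not from the question):

```python
import numpy as np

two_d = np.arange(25).reshape(5, 5)
first = np.array([True, True, False, False, False])   # two True values
other = np.array([True, True, True, False, False])    # three True values

# Paired boolean indexing needs matching (broadcastable) True counts:
try:
    two_d[first, other]
except IndexError as e:
    print("IndexError:", e)

# np.ix_ takes the outer product instead, so the counts may differ:
print(two_d[np.ix_(first, other)])
# [[0 1 2]
#  [5 6 7]]
```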
Check the documentation on boolean indexing.
two_d[first, second] is the same as two_d[first.nonzero(), second.nonzero()], where:
>>> first.nonzero()
(array([0, 1]),)
>>> second.nonzero()
(array([3, 4]),)
Used as indices, this will select 3 and 9 because
>>> two_d[0,3]
3
>>> two_d[1,4]
9
and
>>> two_d[[0,1],[3,4]]
array([3, 9])
Also mildly related: NumPy indexing using List?