I am trying to select specific column elements for each row of a numpy array. For example, given:
In [1]: a = np.random.random((3,2))
Out[1]:
array([[ 0.75670668, 0.1283942 ],
[ 0.51326555, 0.59378083],
[ 0.03219789, 0.53612603]])
I would like to select the first element of the first row, the second element of the second row, and the first element of the third row. So I tried to do the following:
In [2]: b = np.array([0,1,0])
In [3]: a[:,b]
But this produces the following output:
Out[3]:
array([[ 0.75670668, 0.1283942 , 0.75670668],
[ 0.51326555, 0.59378083, 0.51326555],
[ 0.03219789, 0.53612603, 0.03219789]])
which clearly is not what I am looking for. Is there an easy way to do what I would like to do without using loops?
You can use:
a[np.arange(3), (0,1,0)]
in your example above.
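A minimal sketch of the same idea on a small deterministic array, so the result is easy to check by hand. The names `rows`, `cols` and `picked` are illustrative only, and `np.take_along_axis` (available in NumPy 1.15 and later) is shown as an equivalent alternative, not as something the answer above relies on.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)      # [[0, 1], [2, 3], [4, 5]]
cols = np.array([0, 1, 0])          # column to take from each row
rows = np.arange(a.shape[0])        # [0, 1, 2], one row index per column index

# Pairing rows[i] with cols[i] selects a[i, cols[i]] for every i.
picked = a[rows, cols]

# Equivalent formulation: the index array must have the same ndim as `a`.
picked_alt = np.take_along_axis(a, cols[:, None], axis=1)[:, 0]

print(picked)       # [0 3 4]
```

Using `np.arange(a.shape[0])` instead of a hard-coded `np.arange(3)` keeps the expression valid when the number of rows changes.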
OK, just to clarify here, let's do a simple example
A = np.diag(np.arange(0, 10, 1))
gives
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 4, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 5, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 7, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9]])
then
A[0][0:4]
gives
array([0, 0, 0, 0])
that is first row, elements 0 to 3. But
A[0:4][1]
doesn't give the first 4 rows, the 2nd element in each. Instead we get
array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0])
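i.e. the entire 2nd row (A[0:4] is evaluated first, and [1] then picks row 1 of that result; for this diagonal matrix the 2nd row and the 2nd column happen to hold the same values).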
A[0:4,1]
gives
array([0, 1, 0, 0])
I'm sure there is a very good reason for this and which makes perfect sense to programmers
but for those of us uninitiated in that great religion it can be quite confusing.
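To make the difference easier to see, here is a small sketch on a non-symmetric array. The array `B` is an assumption introduced purely for illustration; on the diagonal matrix above, row 1 and column 1 hold the same values, which hides what is going on.

```python
import numpy as np

B = np.arange(100).reshape(10, 10)   # B[i, j] == 10*i + j

# Chained indexing applies one index at a time:
# B[0:4] is a (4, 10) array of the first four rows, and [1] then picks ITS row 1.
chained = B[0:4][1]

# A single index with a comma addresses both axes at once:
# rows 0..3, column 1.
combined = B[0:4, 1]

print(chained)    # [10 11 12 13 14 15 16 17 18 19]
print(combined)   # [ 1 11 21 31]
```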
This isn't an answer so much as an attempt to document this a bit. For the answer above, we would have:
>>> import numpy as np
>>> A = np.array(range(6))
>>> A
array([0, 1, 2, 3, 4, 5])
>>> A.shape = (3,2)
>>> A
array([[0, 1],
[2, 3],
[4, 5]])
>>> A[(0,1,2),(0,1,0)]
array([0, 3, 4])
Specifying a list (or tuple) of individual row and column coordinates allows fancy indexing of the array. The first example in the comment looks similar at first, but the indices are slices. They don't extend over the whole range, and the shape of the array that is returned is different:
>>> A[0:2,0:2]
array([[0, 1],
[2, 3]])
For the second example in the comment
>>> A[[0,1],[0,1]]
array([0, 3])
So it seems that slices are different, but except for that, regardless of how indices are constructed, you can specify a tuple or list of (x-values, y-values), and recover those specific elements from the array.
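As a hedged illustration of the distinction drawn above: paired index lists pick individual elements, while `np.ix_` (a standard NumPy helper not mentioned in the original answer) builds the cross product of the lists and therefore behaves like the slice version.

```python
import numpy as np

A = np.arange(6).reshape(3, 2)        # [[0, 1], [2, 3], [4, 5]]

# Paired coordinates: elements (0, 0) and (1, 1).
paired = A[[0, 1], [0, 1]]

# Cross product: every combination of rows 0, 1 and columns 0, 1.
block = A[np.ix_([0, 1], [0, 1])]

print(paired)                                # [0 3]
print(np.array_equal(block, A[0:2, 0:2]))    # True
```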
I have a numpy array of shape (X,Y,Z). I want to check each slice along the Z dimension and delete the slices that contain only zeros, really fast.
Detailed explanation:
I would like to check array[:,:,0] if any entry is non-zero.
If yes, ignore and check array[:,:,1].
Else if No, delete dimension array[:,:,0]
I'm also not 100% sure what you're after, but I think you want
np.squeeze(array, axis=2)
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.squeeze.html
I'm not certain what you want but this hopefully points in the right direction.
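For context, a small sketch of what `np.squeeze` does and does not do. It only removes axes of length 1, so it would not drop all-zero slices from an axis of length 5. The arrays `x` and `y` and the flag `squeeze_failed` are made up for this illustration.

```python
import numpy as np

# An axis of length 1 can be squeezed away.
x = np.zeros((3, 4, 1))
squeezed_shape = np.squeeze(x, axis=2).shape
print(squeezed_shape)    # (3, 4)

# An axis of any other length cannot; NumPy raises ValueError.
y = np.zeros((3, 4, 5))
try:
    np.squeeze(y, axis=2)
    squeeze_failed = False
except ValueError:
    squeeze_failed = True
print(squeeze_failed)    # True
```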
Edit 1st Jan:
Inspired by @J.Warren's use of np.squeeze I think np.compress may be more appropriate.
This does the compression in one line
np.compress((a!=0).sum(axis=(0,1)), a, axis=2)
To explain the first parameter in np.compress
(a!=0).sum(axis=(0, 1)) # sum across both the 0th and 1st axes.
Out[37]: array([1, 1, 0, 0, 2]) # Keep the slices where the array !=0
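A minimal sketch of the `np.compress` approach on a hand-built array, so the kept slices are known in advance. The array contents are an assumption for illustration, and the condition is written as an explicit boolean (`counts > 0`) instead of passing the raw counts.

```python
import numpy as np

a = np.zeros((3, 4, 5), dtype=int)
a[0, 1, 4] = 1
a[1, 0, 1] = 1
a[2, 3, 0] = 1

# Number of nonzero entries in each slice a[:, :, k].
counts = (a != 0).sum(axis=(0, 1))

# Keep only the slices along axis 2 whose count is positive.
kept = np.compress(counts > 0, a, axis=2)

print(counts)        # [1 1 0 0 1]
print(kept.shape)    # (3, 4, 3)
```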
My first answer which may no longer be relevant
import numpy as np
a=np.random.randint(2, size=(3,4,5))*np.random.randint(2, size=(3,4,5))*np.random.randint(2, size=(3,4,5))
# Make a an array of mainly zeroes.
a
Out[31]:
array([[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]],
[[0, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
[1, 0, 0, 0, 0]]])
res=np.zeros(a.shape[2], dtype=bool) # builtin bool; the np.bool alias is missing in some NumPy releases
for ix in range(a.shape[2]):
    res[ix] = (a[...,ix]!=0).any()
res
Out[34]: array([ True, True, False, False, True], dtype=bool)
# res is a boolean array of which slices of 'a' contain nonzero data
a[...,res]
# use this array to index a
# The output contains the nonzero slices
Out[35]:
array([[[0, 0, 0],
[0, 0, 1],
[0, 0, 0],
[0, 0, 0]],
[[0, 1, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]],
[[0, 0, 0],
[0, 0, 1],
[0, 0, 0],
[1, 0, 0]]])
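The loop above can probably be replaced by a single reduction over the first two axes. The sketch below uses a small hand-built array (an assumption for illustration) and checks that both versions produce the same mask.

```python
import numpy as np

a = np.zeros((3, 4, 5), dtype=int)
a[0, 1, 4] = 1
a[1, 0, 1] = 1

# Loop version, as in the answer above.
res_loop = np.zeros(a.shape[2], dtype=bool)
for ix in range(a.shape[2]):
    res_loop[ix] = (a[..., ix] != 0).any()

# Vectorized version: reduce over axes 0 and 1 in one call.
res_vec = (a != 0).any(axis=(0, 1))

print(res_vec)                   # [False  True False False  True]
print(a[..., res_vec].shape)     # (3, 4, 2)
```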
How can I sample some of the rows of a scipy sparse matrix and form a new scipy sparse matrix from these sampled rows?
For eg. if I have a scipy sparse matrix A with 10 rows and I want to make a new scipy sparse matrix B with rows 1,3,4 from A, how to do that?
Left-multiply with an appropriate indicator matrix. The indicator matrix can be built using scipy.sparse.block_diag or directly, using csr format, as shown below.
>>> import numpy as np
>>> from scipy import sparse
>>>
# create example
>>> m, n = 10, 8
>>> subset = [1,3,4]
>>> A = sparse.csr_matrix(np.random.randint(-10, 5, (m, n)).clip(0, None))
>>> A.toarray() # the .A shorthand is not available in newer SciPy releases
array([[3, 2, 4, 0, 0, 0, 2, 0],
[0, 0, 2, 0, 0, 0, 0, 0],
[4, 0, 0, 0, 0, 2, 0, 0],
[0, 0, 0, 0, 0, 0, 4, 0],
[3, 0, 0, 0, 1, 4, 0, 0],
[0, 0, 0, 0, 0, 0, 2, 0],
[0, 0, 0, 4, 0, 4, 4, 0],
[0, 2, 0, 0, 0, 3, 0, 0],
[4, 0, 3, 3, 0, 0, 0, 2],
[4, 0, 0, 0, 0, 2, 0, 1]], dtype=int64)
>>>
# build indicator matrix
# either using block_diag ...
>>> split_points = np.arange(len(subset)+1).repeat(np.diff(np.concatenate([[0], subset, [m-1]])))
>>> indicator = sparse.block_diag(np.split(np.ones(len(subset), int), split_points)).T
>>> indicator.toarray()
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]], dtype=int64)
>>>
# ... or manually; this also works for an unsorted, non-unique subset,
# and is therefore to be preferred over block_diag
>>> indicator = sparse.csr_matrix((np.ones(len(subset), int), subset, np.arange(len(subset)+1)), (len(subset), m))
>>> indicator.toarray()
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]])
>>>
# apply
>>> result = indicator @ A
>>> result.toarray()
array([[0, 0, 2, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 4, 0],
[3, 0, 0, 0, 1, 4, 0, 0]], dtype=int64)
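As a cross-check, here is a hedged sketch comparing the indicator-matrix product with plain row indexing, which CSR matrices also support. The seeded random generator and the names `via_product` and `via_indexing` are assumptions added for reproducibility and are not part of the answer above.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
m, n = 10, 8
subset = [1, 3, 4]
A = sparse.csr_matrix(rng.integers(-10, 5, (m, n)).clip(0, None))

# Indicator matrix in CSR form: row i has a single 1 in column subset[i].
indicator = sparse.csr_matrix(
    (np.ones(len(subset), dtype=int), subset, np.arange(len(subset) + 1)),
    shape=(len(subset), m),
)
via_product = indicator @ A

# CSR matrices also accept a list of row indices directly.
via_indexing = A[subset]

print(via_product.shape)    # (3, 8)
print(np.array_equal(via_product.toarray(), via_indexing.toarray()))    # True
```

Both results stay sparse; which one is faster likely depends on the matrix size and the SciPy version, so it is worth timing on real data.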
What is the fastest way of creating a new matrix that is a result of a "look-up" of some numpy matrix X (using an array of indices to be looked up in matrix X)? Example of what I want to achieve:
indices = np.array([[[1,1],[1,1],[3,3]],[[1,1],[5,8],[6,9]]]) #[i,j]
new_matrix = lookup(X, use=indices)
Output will be something like:
new_matrix = np.array([[3,3,7],[3,4,9]])
where for example X[1,1] was 3. I'm using python 2.
Use sliced columns for indexing into X -
X[indices[...,0], indices[...,1]]
Or with tuple -
X[tuple(indices.T)].T # or X[tuple(indices.transpose(2,0,1))]
Sample run -
In [142]: X
Out[142]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 7, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 4, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9]])
In [143]: indices
Out[143]:
array([[[1, 1],
[1, 1],
[3, 3]],
[[1, 1],
[5, 8],
[6, 9]]])
In [144]: X[indices[...,0], indices[...,1]]
Out[144]:
array([[3, 3, 7],
[3, 4, 9]])
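A self-contained sketch of the same lookup. Here `X` is filled so that `X[i, j] == 10*i + j`, which is an assumption chosen to make every looked-up value easy to verify by eye; `np.moveaxis` is used as one more way to build the index tuple.

```python
import numpy as np

X = np.arange(70).reshape(7, 10)                 # X[i, j] == 10*i + j
indices = np.array([[[1, 1], [1, 1], [3, 3]],
                    [[1, 1], [5, 8], [6, 9]]])   # shape (2, 3, 2); last axis is (i, j)

# Row indices and column indices, each of shape (2, 3).
new_matrix = X[indices[..., 0], indices[..., 1]]

# Same lookup: move the (i, j) axis to the front and unpack it into a tuple.
same = X[tuple(np.moveaxis(indices, -1, 0))]

print(new_matrix)
# [[11 11 33]
#  [11 58 69]]
```

The result has the shape of `indices` without its last axis, because the two index arrays are broadcast together and each pair addresses one element.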
How can I iterate through a list of lists so that, for every "1", its neighbours (top, top left, top right, left, right, bottom, bottom left and bottom right) also become "1", as shown below? That is, turning list_1 into list_2:
list_1 =[[0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0],
[0,0,0,1,0,0,0,0],
[0,0,0,0,0,0,0,0]]
list_2 =[[0,0,0,0,0,0,0,0],
[0,0,1,1,1,0,0,0],
[0,0,1,1,1,0,0,0],
[0,0,1,1,1,0,0,0]]
This is a common operation known as "dilation" in image processing. Your problem is 2-dimensional, so you would be best served by using a more appropriate 2-d data structure than a list of lists, and an already available library function rather than reinventing the wheel.
Here is an example using a numpy ndarray and scipy's binary_dilation respectively:
>>> import numpy as np
>>> from scipy import ndimage
>>> a = np.array([[0,0,0,0,0,0,0,0],
...               [0,0,0,0,0,0,0,0],
...               [0,0,0,1,0,0,0,0],
...               [0,0,0,0,0,0,0,0]], dtype=int)
>>> ndimage.binary_dilation(a, structure=ndimage.generate_binary_structure(2, 2)).astype(a.dtype)
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]])
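The `structure` argument decides which neighbours count. The sketch below compares the full 3x3 neighbourhood used above with the plus-shaped one; the variable names are illustrative only.

```python
import numpy as np
from scipy import ndimage

a = np.zeros((4, 8), dtype=int)
a[2, 3] = 1

full = ndimage.generate_binary_structure(2, 2)    # 3x3, all True: includes diagonals
cross = ndimage.generate_binary_structure(2, 1)   # plus-shaped: no diagonals

with_diagonals = ndimage.binary_dilation(a, structure=full).astype(a.dtype)
without_diagonals = ndimage.binary_dilation(a, structure=cross).astype(a.dtype)

print(with_diagonals.sum())       # 9
print(without_diagonals.sum())    # 5
```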
You can do this with numpy, which is more suitable for manipulating 2D lists in general. If you're doing image analysis, see @wim's answer. Otherwise, here is how you could manage it with numpy only.
> import numpy as np
> list_1 =[[0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0],
[0,0,0,1,0,0,0,0],
[0,0,0,0,0,0,0,0]]
> l = np.array(list_1) # convert the list into a numpy array
> pos = np.where(l==1) # get the position where the array is equal to one
> pos
(array([2]), array([3]))
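# make a lambda function to limit the lower indexes (a negative start would wrap around):
> get_low = lambda x: x-1 if x>0 else x
# get_high is not needed: a slice end past the edge of the array is simply clipped.
# np.where returns arrays, so take the single position as plain ints,
# then slice the array around that position and set the value to one
> r, c = int(pos[0][0]), int(pos[1][0])
> l[get_low(r):r+2,
    get_low(c):c+2] = 1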
> l
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]])
> corner
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1]])
> p = np.where(corner==1)
> r, c = int(p[0][0]), int(p[1][0]) # plain ints for the slice bounds
> corner[get_low(r):r+2,
         get_low(c):c+2] = 1
> corner
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1]])
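The snippet above handles a single "1". A possible generalisation to any number of ones is sketched below; the function name `dilate` is hypothetical, and the approach loops over the positions found by `np.argwhere`, so it is meant for small grids.

```python
import numpy as np

def dilate(grid):
    """Set the 3x3 neighbourhood of every 1 in `grid` to 1 (NumPy only)."""
    arr = np.asarray(grid)
    out = arr.copy()
    # Read positions from the untouched input so new ones do not spread further.
    for r, c in np.argwhere(arr == 1):
        # max(..., 0) clamps the start; a slice end past the edge is clipped anyway.
        out[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2] = 1
    return out

list_1 = [[0, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 1, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0]]

list_2 = dilate(list_1).tolist()
print(list_2[1])    # [0, 0, 1, 1, 1, 0, 0, 0]
```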
HTH
Is there an efficient way to generate a list (or an array) of all possible combinations of say 2 ones and 8 zeros? E.g.
[[0,0,0,0,0,0,0,0,1,1],
[0,0,0,0,0,0,0,1,0,1,],
...]
This works, but there could be a better way?
import itertools
import numpy as np
result = []
for subset in itertools.combinations(range(10), 2):
    subset = list(subset)
    c = np.zeros(10)
    c[subset] = 1
    result.append(c)
Would love to have some ideas on how to optimize this code.
Well, it's not much different but doing bulk operations on Numpy arrays is bound to have much less overhead:
import itertools
import numpy
which = numpy.array(list(itertools.combinations(range(10), 2)))
grid = numpy.zeros((len(which), 10), dtype="int8")
# Magic
grid[numpy.arange(len(which))[None].T, which] = 1
grid
#>>> array([[1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
#>>> [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
#>>> [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
#>>> [1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
#>>> [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
#>>> ...
The bulk of the time is then spent doing numpy.array(list(itertools.combinations(range(10), 2))). I tried using numpy.fromiter but I didn't get any speed improvements. Since half the time is literally generating the tuples, the only real way to improve further is to generate the combinations in something like C or Cython.
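For the specific case of exactly two ones, the tuple generation can be skipped altogether. The sketch below is an alternative that the answer above does not use: `np.triu_indices` yields all index pairs `i < j` in the same order as `itertools.combinations`. It does not generalise directly to more than two ones.

```python
import itertools
import numpy as np

n = 10

# Reference: the itertools-based construction from the answer above.
which = np.array(list(itertools.combinations(range(n), 2)))
grid = np.zeros((len(which), n), dtype="int8")
grid[np.arange(len(which))[:, None], which] = 1

# Alternative for two ones only: all pairs i < j straight from NumPy.
i, j = np.triu_indices(n, k=1)
rows = np.arange(len(i))
grid2 = np.zeros((len(i), n), dtype="int8")
grid2[rows, i] = 1
grid2[rows, j] = 1

print(grid2.shape)                    # (45, 10)
print(np.array_equal(grid, grid2))    # True
```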
Alternative using numpy.bincount:
>>> [np.bincount(xs, minlength=10) for xs in itertools.combinations(range(10), 2)]
[array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64),
array([1, 0, 1, 0, 0, 0, 0, 0, 0, 0], dtype=int64),
array([1, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=int64),
array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0], dtype=int64),
...]
Shouldn't we be using permutations for this? E.g.,
from itertools import permutations as perm
a, b = 6, 2
print('\n'.join(sorted(set(''.join(t) for t in perm(a*'0' + b*'1')))))
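One caveat with the permutations approach, sketched below with the same `a` and `b`: `permutations` treats equal characters as distinct, so it generates every ordering before the set removes duplicates. Choosing the positions of the ones with `combinations` reaches the same strings without that detour. The helper names here are illustrative only.

```python
from itertools import combinations, permutations
from math import comb, factorial

a, b = 6, 2
n = a + b

# Number of tuples permutations() yields before deduplication.
generated = factorial(n)
unique_perm = {''.join(t) for t in permutations(a * '0' + b * '1')}

# Same strings, built by choosing where the ones go.
unique_comb = set()
for ones in combinations(range(n), b):
    s = ['0'] * n
    for k in ones:
        s[k] = '1'
    unique_comb.add(''.join(s))

print(generated, len(unique_perm), comb(n, b))    # 40320 28 28
```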