I want to create a 3D array where the content is identical to the indices used to access it, so m[2,5] would result in array([2, 5]).
I couldn't find an obvious solution with the numpy functions indices, ogrid, concatenate, etc.
At the moment I'm using this, but was wondering whether there is a solution that makes better use of the API:
import numpy as np

a, b = 3, 4
m = np.ones((a, b, 2))
for x in range(a):
    m[x, :, 1] = np.arange(b)
for y in range(b):
    m[:, y, 0] = np.arange(a)
Try np.mgrid:
a, b = 3, 4
m = np.mgrid[:a, :b].transpose(1, 2, 0)
print(m[1, 2])
# [1 2]
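Since the question mentions np.indices: it works here too, with the same transpose to move the index axis last. A minimal sketch:

import numpy as np

a, b = 3, 4
# np.indices((a, b)) has shape (2, a, b); transpose to (a, b, 2)
m = np.indices((a, b)).transpose(1, 2, 0)
print(m[1, 2])
# [1 2]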
I want to divide a sparse matrix's rows by scalars given in an array.
For example, I have a csr_matrix C:
C = [[2,4,6], [5,10,15]]
D = [2,5]
I want the result of C after division to be:
result = [[1, 2, 3], [1, 2, 3]]
I have tried this using the method that we use for numpy arrays:
result = C / D[:,None]
But this seems really slow. How can I do this efficiently with sparse matrices?
Approach #1
Here's a sparse matrix solution using manual replication with indexing -
from scipy.sparse import csr_matrix

# D is assumed to be a NumPy array here, so that (1.0/D)[r] works
r, c = C.nonzero()
rD_sp = csr_matrix(((1.0/D)[r], (r, c)), shape=C.shape)
out = C.multiply(rD_sp)
The output is also a sparse matrix, as opposed to the output from C / D[:,None], which creates a full (dense) matrix. As such, the proposed approach saves memory.
Possible performance boost with replication using np.repeat instead of indexing -
val = np.repeat(1.0/D, C.getnnz(axis=1))
rD_sp = csr_matrix((val, (r, c)), shape=C.shape)
Approach #2
Another approach could involve the data attribute of the sparse matrix, which gives us a flattened view into its stored values for in-place results and also avoids the use of nonzero, like so -
val = np.repeat(D, C.getnnz(axis=1))
C.data /= val
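A minimal end-to-end sketch of Approach #2 with the arrays from the question (note the cast to float: in-place division would fail on integer data):

import numpy as np
from scipy.sparse import csr_matrix

C = csr_matrix(np.array([[2, 4, 6], [5, 10, 15]], dtype=float))
D = np.array([2, 5])

# repeat each row's divisor once per stored element in that row
C.data /= np.repeat(D, C.getnnz(axis=1))
print(C.toarray())
# [[1. 2. 3.]
#  [1. 2. 3.]]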
Question: I want to divide a sparse matrix's rows by scalars given in an array.
For example:
C = [[2,4,6], [5,10,15]]
D = [2,5]
Answer: use the multiply method provided by the sparse matrix interface; it performs pointwise multiplication of matrices by matrices, as well as by vectors and scalars.
import numpy as np
from scipy.sparse import csr_matrix

C = [[2, 4, 6], [5, 10, 15]]
D = [2, 5]
c = csr_matrix(C)
c2 = c.multiply(1.0 / np.array(D).reshape(2, 1))
c2.toarray()
# output:
array([[1., 2., 3.],
       [1., 2., 3.]])
PS
Thanks to Alexander Kirillin
One line of code (plain Python lists):
C = [[2, 4, 6], [5, 10, 15]]  # len(C[0]) = 3
D = [2, 5]                    # len(D) = 2
result = [[C[i][j] / D[i] for j in range(len(C[0]))] for i in range(len(D))]
print(result)
If you first cast D to type numpy.matrix (which I'm assuming you can do unless D is too big to fit into memory), then you can just run
C.multiply(1.0 / D.T)
to get what you want.
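A minimal sketch of that approach, assuming C is the csr_matrix from the question (note that the result of multiply may come back as a sparse coo_matrix depending on the SciPy version):

import numpy as np
from scipy.sparse import csr_matrix

C = csr_matrix([[2, 4, 6], [5, 10, 15]])
D = np.matrix([2, 5])

result = C.multiply(1.0 / D.T)  # D.T has shape (2, 1); broadcasts across each row
print(result.toarray())
# [[1. 2. 3.]
#  [1. 2. 3.]]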
Say I have a 3-dimensional numpy array:
np.random.seed(1145)
A = np.random.random((5,5,5))
and I have two lists of indices corresponding to the 2nd and 3rd dimensions:
second = [1,2]
third = [3,4]
and I want to select the elements in the numpy array corresponding to
A[:][second][third]
so the shape of the sliced array would be (5,2,2) and
A[:][second][third].flatten()
would be equivalent to:
In [226]: for i in range(5):
   .....:     for j in second:
   .....:         for k in third:
   .....:             print A[i][j][k]
0.556091074129
0.622016249651
0.622530505868
0.914954716368
0.729005532319
0.253214472335
0.892869371179
0.98279375528
0.814240066639
0.986060321906
0.829987410941
0.776715489939
0.404772469431
0.204696635072
0.190891168574
0.869554447412
0.364076117846
0.04760811817
0.440210532601
0.981601369658
Is there a way to slice a numpy array in this way? So far when I try A[:][second][third] I get IndexError: index 3 is out of bounds for axis 0 with size 2 because the [:] for the first dimension seems to be ignored.
NumPy supports multidimensional indexing, so instead of A[1][2][3], you can (and should) use A[1,2,3].
You might then think you could do A[:, second, third], but the numpy indices are broadcast, and broadcasting second and third (two one-dimensional sequences) ends up being the numpy equivalent of zip, so the result has shape (5, 2).
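That zip-like behavior can be checked directly on the question's A:

A[:, second, third].shape  # (5, 2): the pairs (1, 3) and (2, 4) are zipped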
What you really want is to index with, in effect, the outer product of second and third. You can do this with broadcasting by making one of them, say second, a two-dimensional array with shape (2,1). Then the shape that results from broadcasting second and third together is (2,2).
For example:
In [8]: import numpy as np
In [9]: a = np.arange(125).reshape(5,5,5)
In [10]: second = [1,2]
In [11]: third = [3,4]
In [12]: s = a[:, np.array(second).reshape(-1,1), third]
In [13]: s.shape
Out[13]: (5, 2, 2)
Note that, in this specific example, the values in second and third are sequential. If that is typical, you can simply use slices:
In [14]: s2 = a[:, 1:3, 3:5]
In [15]: s2.shape
Out[15]: (5, 2, 2)
In [16]: np.all(s == s2)
Out[16]: True
There are a couple of very important differences between those two methods.
The first method would also work with indices that are not equivalent to slices. For example, it would work if second = [0, 2, 3]. (Sometimes you'll see this style of indexing referred to as "fancy indexing".)
In the first method (using broadcasting and "fancy indexing"), the data is a copy of the original array. In the second method (using only slices), the array s2 is a view into the same block of memory used by a. An in-place change in one will change them both.
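A quick check of that copy-vs-view distinction, reusing a, s, and s2 from above (a hedged sketch):

s2[0, 0, 0] = -999
print(a[0, 1, 3])  # -999: s2 is a view into a
print(s[0, 0, 0])  # unchanged: s is a copy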
One way would be to use np.ix_:
>>> out = A[np.ix_(range(A.shape[0]),second, third)]
>>> out.shape
(5, 2, 2)
>>> manual = [A[i,j,k] for i in range(5) for j in second for k in third]
>>> (out.ravel() == manual).all()
True
Downside is that you have to specify the missing coordinate ranges explicitly, but you could wrap that into a function.
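For instance, a hypothetical helper along those lines (ix_select is an illustrative name, not a NumPy function) might look like:

import numpy as np

def ix_select(arr, axis_indices):
    # Select with np.ix_, filling unspecified axes with their full range.
    # axis_indices: dict mapping axis number -> sequence of indices.
    idx = [axis_indices.get(ax, range(arr.shape[ax])) for ax in range(arr.ndim)]
    return arr[np.ix_(*idx)]

out = ix_select(A, {1: second, 2: third})
print(out.shape)  # (5, 2, 2)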
I think there are three problems with your approach:
Both second and third should be slices
Since the 'to' index is exclusive, they should go from 1 to 3 and from 3 to 5
Instead of A[:][second][third], you should use A[:,second,third]
Try this:
>>> np.random.seed(1145)
>>> A = np.random.random((5,5,5))
>>> second = slice(1,3)
>>> third = slice(3,5)
>>> A[:,second,third].shape
(5, 2, 2)
>>> A[:,second,third].flatten()
array([ 0.43285482, 0.80820122, 0.64878266, 0.62689481, 0.01298507,
0.42112921, 0.23104051, 0.34601169, 0.24838564, 0.66162209,
0.96115751, 0.07338851, 0.33109539, 0.55168356, 0.33925748,
0.2353348 , 0.91254398, 0.44692211, 0.60975602, 0.64610556])
How can I accomplish column addition with a shift using Python NumPy arrays?
I have a two-dimensional array and need an extended copy of it.
a = array([[0, 2, 4, 6, 8],
[1, 3, 5, 7, 9]])
I want something like the following (it's pseudocode and doesn't work; there is no a.columns in NumPy as far as I know):
shift = 3
mult_factor = 0.7
for column in a.columns - shift:
    out[column] = a[column] + 0.7 * a[column + shift]
I also know that I can do something similar using indices, but it seems like overkill to enumerate three values and use only one (j):
for (i, j), value in np.ndenumerate(a):
    print(i, j)
I found that I could iterate over columns, but not over their indices:
for column in a.T:
    print(column)
Then I thought that I could simply do this with something similar to range, but applied to a multidimensional array:
In [225]: for column in np.ndindex(a.shape[1]):
   .....:     print(column)
   .....:
(0,)
(1,)
(2,)
(3,)
(4,)
So for now I only know how to do this with a plain range loop, and I am not sure it is the best solution:
out = np.zeros(a.shape)
shift = 2
mult_factor = 0.7
for i in range(a.shape[1] - shift):
    print(a[:, i])
    out[:, i] = a[:, i] + mult_factor * a[:, i + shift]
However, this will not be as fast in Python as it could be. Can you give me advice on its performance, and is there a faster way to accomplish this column addition of NumPy arrays with a shift?
out = a[:, :-shift] + mult_factor * a[:, shift:]
I think this is what you're looking for. It's a vectorized form of your loop, operating on large slices of a instead of column by column.
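For example, with the array and constants from the question (a quick sanity check):

import numpy as np

a = np.array([[0, 2, 4, 6, 8],
              [1, 3, 5, 7, 9]])
shift, mult_factor = 3, 0.7

out = a[:, :-shift] + mult_factor * a[:, shift:]
print(out)
# [[4.2 7.6]
#  [5.9 9.3]]

Note that out has a.shape[1] - shift columns; pad with zeros if you need the original width.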
I'm not positive I completely understand what the computed quantity should be, but here are two things that seem germane to what you are asking:
If you have a 2D array called a that you wish to convert to a list of 1D arrays (the columns of a), you can do this:
cols = [c for c in a.T]
It looks like what you want can be accomplished with matrix multiplication, if I am not mistaken. You could make a banded matrix in NumPy using numpy.diag or, since you would have the same value along each band (1, mult_factor, or 0), you could use scipy.linalg.toeplitz:
import numpy as np
import scipy.linalg

m, n = a.shape
band = np.eye(1, n)           # first row of T: a 1 followed by zeros...
band[0, shift] = mult_factor  # ...with mult_factor at position `shift`
# T needs one row per output column, i.e. n - shift rows
T = scipy.linalg.toeplitz(np.eye(1, n - shift), band)
out = np.inner(a, T)          # shape (m, n - shift)
For large matrices, it might make sense to use a sparse matrix for T if you only want to add two or a few columns of a.
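A hedged sketch of that sparse variant, using scipy.sparse.diags to build the two bands:

import numpy as np
from scipy.sparse import diags

n = a.shape[1]
# ones on the main diagonal, mult_factor on the diagonal offset by `shift`
T = diags([np.ones(n - shift), np.full(n - shift, mult_factor)],
          [0, shift], shape=(n - shift, n))
out = (T @ a.T).T  # shape (a.shape[0], n - shift)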
I have something like
m = array([[1, 2],
[4, 5],
[7, 8],
[6, 2]])
and
select = array([0,1,0,0])
My target is
result = array([1, 5, 7, 6])
I tried np.ix_ as I read at Simplfy row AND column extraction, numpy, but this did not result in what I wanted.
p.s. Please change the title of this question if you can think of a more precise one.
The numpy way to do this is by using np.choose or fancy indexing/take (see below):
m = array([[1, 2],
[4, 5],
[7, 8],
[6, 2]])
select = array([0,1,0,0])
result = np.choose(select, m.T)
So there is no need for Python loops or anything else, with all the speed advantages NumPy gives you. m.T is just needed because choose is really more of a choice between two arrays, as in np.choose(select, (m[:,0], m[:,1])), but it's straightforward to use it like this.
Using fancy indexing:
result = m[np.arange(len(select)), select]
And if speed is very important, there is np.take, which works on a 1D view (it's quite a bit faster for some reason, but maybe not for these tiny arrays):
result = m.take(select+np.arange(0, len(select) * m.shape[1], m.shape[1]))
I prefer to use NP.where for indexing tasks of this sort (rather than NP.ix_)
What is not mentioned in the OP is whether the result is selected by location (row/col in the source array) or by some condition (e.g., m >= 5). In any event, the code snippet below covers both scenarios.
Three steps:
create the condition array;
generate an index array by calling NP.where, passing in this condition array; and
apply this index array against the source array.
>>> import numpy as NP
>>> cnd = (m==1) | (m==5) | (m==7) | (m==6)
>>> cnd
matrix([[ True, False],
[False, True],
[ True, False],
[ True, False]], dtype=bool)
>>> # generate the index array/matrix
>>> # by calling NP.where, passing in the condition (cnd)
>>> ndx = NP.where(cnd)
>>> ndx
(matrix([[0, 1, 2, 3]]), matrix([[0, 1, 0, 0]]))
>>> # now apply it against the source array
>>> m[ndx]
matrix([[1, 5, 7, 6]])
The argument passed to NP.where, cnd, is a boolean array, which in this case is the result of a single expression composed of compound conditional expressions (first line above).
If constructing such a value filter doesn't apply to your particular use case, that's fine, you just need to generate the actual boolean matrix (the value of cnd) some other way (or create it directly).
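For the by-location scenario, one way to create that boolean mask directly (a hedged sketch assuming m is a plain ndarray and using the question's select) is:

import numpy as np

cnd = np.zeros(m.shape, dtype=bool)
cnd[np.arange(len(select)), select] = True  # mark the chosen column in each row
result = m[cnd]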
What about using plain Python?
result = array([subarray[index] for subarray, index in zip(m, select)])
IMHO, this is the simplest variant:
m[np.arange(4), select]
Since the title is referring to indexing a 2D array with another 2D array, the actual general numpy solution can be found here.
In short:
A 2D array of indices of shape (n,m), with arbitrarily large dimension m, named inds, is used to access elements of another 2D array of shape (n,k), named B:
# array of row-start offsets to be added to each row of inds;
# each row of B begins at a multiple of B.shape[1] in the flattened B
offset = np.arange(B.shape[0]) * B.shape[1]
# np.take(B, C) "flattens" B and C and selects elements from B based on the indices in C
Result = np.take(B, offset[:, np.newaxis] + inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
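For instance, assignment works through the same index expression (a quick sketch with made-up B and inds):

import numpy as np

B = np.arange(12).reshape(3, 4)
inds = np.array([[0, 2], [1, 3], [0, 1]])
rows = np.expand_dims(np.arange(B.shape[0]), -1)

print(B[rows, inds])  # read: shape (3, 2)
B[rows, inds] = -1    # write back through the same indices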
result = array([m[j][i] for i, j in zip(select, range(len(m)))])