How can I perform a sum over just a list of indices of a numpy array? For example, if I have an array a = [1,2,3,4] and a list of indices to sum, indices = [0, 2], I want a fast operation that gives me the answer 4, because the values at index 0 and index 2 of a are 1 and 3, and 1 + 3 = 4.
You can use sum directly after indexing with indices:
import numpy as np

a = np.array([1,2,3,4])
indices = [0, 2]
a[indices].sum()
The accepted a[indices].sum() approach copies data and creates a new array, which might cause problems if the array is large. np.sum actually has a where argument to mask out elements, so you can just do
np.sum(a, where=[True, False, True, False])
which doesn't copy any data.
The mask array can be obtained by:
mask = np.full(4, False)
mask[np.array([0,2])] = True
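Putting the two pieces together, a minimal end-to-end sketch might look like this (assuming a 1-D array, a plain Python list of indices, and a numpy version recent enough to support where= in np.sum):

import numpy as np

a = np.array([1, 2, 3, 4])
indices = [0, 2]

# Build a boolean mask from the index list, then sum without copying the data.
mask = np.zeros(a.shape, dtype=bool)
mask[indices] = True

print(np.sum(a, where=mask))   # 4, the same result as a[indices].sum()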
Try:
>>> a = [1,2,3,4]
>>> indices = [0, 2]
>>> sum(a[i] for i in indices)
4
Faster
If you have a lot of numbers and you want high speed, then you need to use numpy:
>>> import numpy as np
>>> a = np.array([1,2,3,4])
>>> a[indices]
array([1, 3])
>>> np.sum(a[indices])
4
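To get a feel for the speed difference, here is a rough timing sketch; the array size, index stride, and repetition count are arbitrary choices for illustration, and absolute numbers will vary by machine:

import numpy as np
from timeit import timeit

a = np.arange(1_000_000)
indices = list(range(0, 1_000_000, 10))

# Pure-Python generator sum versus numpy fancy indexing + sum.
py_time = timeit(lambda: sum(a[i] for i in indices), number=10)
np_time = timeit(lambda: a[indices].sum(), number=10)
print(f"python: {py_time:.3f}s  numpy: {np_time:.3f}s")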
Related
I would like to slice a numpy array so that I can exclude a single element from it.
For example, like this:
a = numpy.array([1,2,3,4,5])
b = a[0:1::3:4]
b = [1 2 4 5]
Only that this does not work, so either I am doing something wrong, or it isn't possible.
If you are going to repeatedly 'delete' one item at a time, I'd suggest using a boolean mask:
In [493]: a = np.arange(100)
In [494]: mask = np.ones(a.shape, dtype=bool)
In [495]: for i in [2,5,9,20,3,26,40,60]:
     ...:     mask[i]=0
     ...:     a1 = a[mask]
In [496]: a1.shape
Out[496]: (92,)
That's effectively what np.delete does when given a list or array of indices to delete:
In [497]: a2 = np.delete(a, [2,5,9,20,3,26,40,60])
In [498]: np.allclose(a1,a2)
Out[498]: True
For a single element it joins two slices, either by concatenating them or by copying into a result array of the right size. In all cases we have to make a new array.
One exclusion or many, you seek a discontinuous selection of the elements of the original. That can't be produced with a view, which uses shape and strides to select a regular subset of the original.
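A small sketch to make the copy-versus-view distinction concrete (the index values are chosen arbitrarily):

import numpy as np

a = np.arange(10)

view = a[2:6]                        # plain slice: a view into a's memory
copy = a[np.array([0, 1, 3, 4])]     # fancy indexing: a new array with its own data

view[0] = 99     # also changes a[2]
copy[0] = -1     # leaves a untouched
print(a)         # a[2] is now 99, everything else unchanged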
You need to do something like the following:
import numpy as np

a = np.array([1,2,3,4,5])
b = a[:2]
c = a[3:]
print(b)
print(c)
z = np.concatenate((b, c), axis=None)
print(z)
Output:
[1 2]
[4 5]
[1 2 4 5]
Hence everything other than 3 is in the new numpy.ndarray z.
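If you want to exclude an arbitrary index i rather than hard-coding 2, the same idea can be wrapped in a small helper; this is just a sketch, and exclude_index is an illustrative name:

import numpy as np

def exclude_index(arr, i):
    # Join the slice before i with the slice after i; this copies the data.
    return np.concatenate((arr[:i], arr[i + 1:]))

a = np.array([1, 2, 3, 4, 5])
print(exclude_index(a, 2))   # [1 2 4 5]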
The other way is to use the np.delete function, as shown in one of the answers: you provide the list of indices to be deleted inside the [], separated by commas.
a = np.array([15,14,13,12,11])
a4=np.delete(a,[1,4])
print(a4)
Output:
[15 13 12]
import numpy as np

a = np.array([1,2,3,4,5])
result = np.delete(a, 2)
print(result)   # [1 2 4 5]
You could always combine two slices. If a is a plain Python list, + concatenates them:
b = a[:2] + a[3:]
which will return [1, 2, 4, 5].
For a numpy return value you could take two slices and concatenate the results:
b = a[3:]
c = a[:2]
numpy.concatenate([c,b])
Will return
array([1, 2, 4, 5])
I have two NxN numpy arrays, they are equal size.
If the value at a given row and column in the first array is nonzero, then it is guaranteed that the other array either has the same value at that row and column, or a zero there.
If the value at a given row and column in the first array is zero, then the other array can hold either a zero or a nonzero value at that row and column.
I would like to combine both arrays such that, for every [row, col], if one array has a zero and the other has a nonzero value, then my second array will be modified (if necessary) to hold the nonzero value.
And if both have a nonzero value (which is guaranteed to be the same value), then there is no modification for that row and column; it stays the same.
Example:
array 1:
[[0,9],[2,0]]
array 2:
[[0,0],[2,2]]
After doing my "union", I want array 2 to be:
[[0,9],[2,2]]
What is a fast way to do this for large matrices? Thank you.
All you want to do is replace the zeros in the second array with the items at the same indices in the first array. You can do the following:
mask = arr2 == 0
arr2[mask] = arr1[mask]
Demo:
In [7]: arr1 = np.array([[0,9],[2,0]])
In [8]: arr2 = np.array([[0,0],[2,2]])
In [9]: mask = arr2 == 0
In [10]: arr2[mask] = arr1[mask]
In [11]: arr2
Out[11]:
array([[0, 9],
[2, 2]])
Since you are asking for "fast" you may be interested in np.copyto:
>>> from timeit import timeit
>>> a = np.random.randint(0, 2, (100, 100))
>>> b = np.random.randint(-1, 1, (100, 100))
>>> timeit("bk = b.copy(); mask = bk == 0; bk[mask] = a[mask]", globals=globals(), number=10000)
1.3142543959984323
>>> timeit("bp = b.copy(); np.copyto(bp, a, where=bp==0)", globals=globals(), number=10000)
0.7330851459992118
>>> # check that the results are the same
>>> bk = b.copy(); mask=bk==0; bk[mask] = a[mask]
>>> bp = b.copy(); np.copyto(bp, a, where=bp==0)
>>> np.all(bk==bp)
True
I have 2 numpy arrays:
arr_a = array(['1m_nd', '2m_nd', '1m_4wk'],
dtype='<U15')
arr_b = array([0, 1, 1])
I want to select elements from arr_a based on arr_b. I am doing this:
arr_a[arr_b], but I get this as the result:
array(['1m_nd', '2m_nd', '2m_nd'],
dtype='<U15')
instead of:
array(['2m_nd', '1m_4wk'],
dtype='<U15')
How do I fix this?
You need to pass it a boolean array, for example:
>>> arr_a[arr_b>0]
array(['2m_nd', '1m_4wk'],
dtype='<U15')
Given arr_a and arr_b, the following builds a boolean array from arr_b, mapping each 1 to True and each 0 to False; those booleans then select the corresponding elements of arr_a. Here is the line of code you'd need:
>>> arr_a[arr_b == 1]
array([u'2m_nd', u'1m_4wk'],
dtype='<U15')
But I don't have the index values; I just have ones at those same indices in a different array. For example, I have
a = array([3,4,5,6])
b = array([0,1,0,1])
Is there some NumPy method than can quickly look at both of these and extract all values from a whose indices match the indices of all 1's in b? I want it to result in:
array([4,6])
It is probably worth mentioning that my a array is multidimensional, while my b array will always have values of either 0 or 1. I tried using NumPy's logical_and function, though this raises a ValueError when a and b have different dimensions:
a = numpy.array([[3,2], [4,5], [6,1]])
b = numpy.array([0, 1, 0])
print(numpy.logical_and(a, b))
ValueError: operands could not be broadcast together with shapes (3,2) (3,)
This method does seem to work if a is flat, though. Either way, the return value of numpy.logical_and() is a boolean array, which I do not want. Is there another way? Again, in the second example above, the desired return would be
array([[4,5]])
Obviously I could write a simple loop to accomplish this, I'm just looking for something a bit more concise.
Edit:
This introduces more constraints: I should also mention that each element of the multidimensional array a may be of arbitrary length, which need not match its neighbour's.
You can simply use fancy indexing.
b == 1
will give you a boolean array:
>>> from numpy import array
>>> a = array([3,4,5,6])
>>> b = array([0,1,0,1])
>>> b==1
array([False, True, False, True], dtype=bool)
which you can pass as an index to a.
>>> a[b==1]
array([4, 6])
Demo for your second example:
>>> a = array([[3,2], [4,5], [6,1]])
>>> b = array([0, 1, 0])
>>> a[b==1]
array([[4, 5]])
You could use compress:
>>> a = np.array([3,4,5,6])
>>> b = np.array([0,1,0,1])
>>> a.compress(b)
array([4, 6])
You can provide an axis argument for multi-dimensional cases:
>>> a2 = np.array([[3,2], [4,5], [6,1]])
>>> b2 = np.array([0, 1, 0])
>>> a2.compress(b2, axis=0)
array([[4, 5]])
This method will work even if the axis of a you're indexing against is a different length from b.
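For instance, a small sketch of the shorter-condition case: compress simply ignores the elements of the axis beyond the length of the condition array.

import numpy as np

a = np.array([3, 4, 5, 6])
b_short = np.array([0, 1])    # shorter than the axis of a

# Only the first len(b_short) elements of a are considered.
print(a.compress(b_short))    # [4]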
Say I have a 3 dimensional numpy array:
np.random.seed(1145)
A = np.random.random((5,5,5))
and I have two lists of indices corresponding to the 2nd and 3rd dimensions:
second = [1,2]
third = [3,4]
and I want to select the elements in the numpy array corresponding to
A[:][second][third]
so the shape of the sliced array would be (5,2,2) and
A[:][second][third].flatten()
would be equivalent to:
In [226]: for i in range(5):
     ...:     for j in second:
     ...:         for k in third:
     ...:             print(A[i][j][k])
0.556091074129
0.622016249651
0.622530505868
0.914954716368
0.729005532319
0.253214472335
0.892869371179
0.98279375528
0.814240066639
0.986060321906
0.829987410941
0.776715489939
0.404772469431
0.204696635072
0.190891168574
0.869554447412
0.364076117846
0.04760811817
0.440210532601
0.981601369658
Is there a way to slice a numpy array in this way? So far when I try A[:][second][third] I get IndexError: index 3 is out of bounds for axis 0 with size 2 because the [:] for the first dimension seems to be ignored.
Numpy uses multiple indexing, so instead of A[1][2][3], you can--and should--use A[1,2,3].
You might then think you could do A[:, second, third], but the numpy indices are broadcast, and broadcasting second and third (two one-dimensional sequences) ends up being the numpy equivalent of zip, so the result has shape (5, 2).
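To see that zip-like pairing, here is a small sketch using the same 5x5x5 setup:

import numpy as np

a = np.arange(125).reshape(5, 5, 5)
second = [1, 2]
third = [3, 4]

# second and third are paired element-by-element, like zip: (1, 3) and (2, 4).
paired = a[:, second, third]
print(paired.shape)    # (5, 2), not the (5, 2, 2) we are after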
What you really want is to index with, in effect, the outer product of second and third. You can do this with broadcasting by making one of them, say second, into a two-dimensional array with shape (2, 1). Then the shape that results from broadcasting second and third together is (2, 2).
For example:
In [8]: import numpy as np
In [9]: a = np.arange(125).reshape(5,5,5)
In [10]: second = [1,2]
In [11]: third = [3,4]
In [12]: s = a[:, np.array(second).reshape(-1,1), third]
In [13]: s.shape
Out[13]: (5, 2, 2)
Note that, in this specific example, the values in second and third are sequential. If that is typical, you can simply use slices:
In [14]: s2 = a[:, 1:3, 3:5]
In [15]: s2.shape
Out[15]: (5, 2, 2)
In [16]: np.all(s == s2)
Out[16]: True
There are a couple of very important differences between those two methods.
The first method would also work with indices that are not equivalent to slices. For example, it would work if second = [0, 2, 3]. (Sometimes you'll see this style of indexing referred to as "fancy indexing".)
In the first method (using broadcasting and "fancy indexing"), the data is a copy of the original array. In the second method (using only slices), the array s2 is a view into the same block of memory used by a. An in-place change in one will change them both.
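Here is a self-contained sketch of that copy-versus-view difference:

import numpy as np

a = np.arange(125).reshape(5, 5, 5)
s = a[:, np.array([1, 2]).reshape(-1, 1), [3, 4]]   # fancy indexing: a copy
s2 = a[:, 1:3, 3:5]                                 # slicing: a view into a

s2[0, 0, 0] = -999     # writes through to a, because s2 shares a's memory
print(a[0, 1, 3])      # -999

s[0, 0, 0] = 0         # does not touch a, because s owns its own data
print(a[0, 1, 3])      # still -999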
One way would be to use np.ix_:
>>> out = A[np.ix_(range(A.shape[0]),second, third)]
>>> out.shape
(5, 2, 2)
>>> manual = [A[i,j,k] for i in range(5) for j in second for k in third]
>>> (out.ravel() == manual).all()
True
Downside is that you have to specify the missing coordinate ranges explicitly, but you could wrap that into a function.
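Such a wrapper might look like the following sketch (select_axes is just an illustrative name; any axis not given explicitly gets its full range):

import numpy as np

def select_axes(arr, **axis_indices):
    # Build the index list for np.ix_, filling unspecified axes with their full range.
    # Axes are passed as keyword arguments like axis1=[1, 2], axis2=[3, 4].
    idx = [axis_indices.get(f"axis{d}", range(n)) for d, n in enumerate(arr.shape)]
    return arr[np.ix_(*idx)]

A = np.random.random((5, 5, 5))
out = select_axes(A, axis1=[1, 2], axis2=[3, 4])
print(out.shape)   # (5, 2, 2)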
I think there are three problems with your approach:
Both second and third should be slices
Since the 'to' index is exclusive, they should go from 1 to 3 and from 3 to 5
Instead of A[:][second][third], you should use A[:,second,third]
Try this:
>>> np.random.seed(1145)
>>> A = np.random.random((5,5,5))
>>> second = slice(1,3)
>>> third = slice(3,5)
>>> A[:,second,third].shape
(5, 2, 2)
>>> A[:,second,third].flatten()
array([ 0.43285482, 0.80820122, 0.64878266, 0.62689481, 0.01298507,
0.42112921, 0.23104051, 0.34601169, 0.24838564, 0.66162209,
0.96115751, 0.07338851, 0.33109539, 0.55168356, 0.33925748,
0.2353348 , 0.91254398, 0.44692211, 0.60975602, 0.64610556])