How can I accomplish column addition with a shift using Python numpy arrays?
I have a two-dimensional array and need its extended copy.
a = array([[0, 2, 4, 6, 8],
           [1, 3, 5, 7, 9]])
I want something like the following (it's pseudocode and doesn't work; there is no a.columns in numpy as far as I know):
shift = 3
mult_factor = 0.7
for column in a.columns - shift:
    out[column] = a[column] + mult_factor * a[column + shift]
I also know that I can do something similar to what I need using indexes. But it seems like overkill to enumerate three values and use only one (j):
for (i, j), value in np.ndenumerate(a):
    print i, j
I found that I could iterate over columns, but not over their indexes:
for column in a.T:
    print column
Then I thought that I could simply do this with something similar to xrange, but applied to a multidimensional array:
In [225]: for column in np.ndindex(a.shape[1]):
     ...:     print column
     ...:
(0,)
(1,)
(2,)
(3,)
(4,)
So now I only know how to do this with a plain xrange, and I am not sure that this is the best solution:
out = np.zeros(a.shape)
shift = 2
mult_factor = 0.7
for i in xrange(a.shape[1] - shift):
    out[:, i] = a[:, i] + mult_factor * a[:, i + shift]
However, this will not be as fast in Python as it could be. Can you give me advice on its performance, and is there perhaps a faster way to accomplish column addition of numpy arrays with a shift?
out = a[:, :-shift] + mult_factor * a[:, shift:]
I think this is what you're looking for. It's a vectorized form of your loop, operating on large slices of a instead of column by column.
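As a quick sanity check, a minimal sketch using the array and parameters from the question (note that out here has shift fewer columns than a, unlike the zero-padded version in your loop):
import numpy as np

a = np.array([[0, 2, 4, 6, 8],
              [1, 3, 5, 7, 9]])
shift = 2
mult_factor = 0.7
out = a[:, :-shift] + mult_factor * a[:, shift:]
# array([[ 2.8,  6.2,  9.6],
#        [ 4.5,  7.9, 11.3]])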
I'm not positive I completely understand what the computed quantity should be, but here are two things that seem germane to what you are asking:
If you have a 2D array called a that you wish to convert to a list of 1D arrays which are the columns of a, you can do this:
cols = [c for c in a.T]
It looks like what you want can be accomplished with matrix multiplication, if I am not mistaken. You could make a banded matrix in numpy using numpy.diag or, since each band holds a single value (1, mult_factor, or 0), you could use scipy.linalg.toeplitz:
import scipy.linalg

m, n = a.shape
# first row of T: 1 on the diagonal, mult_factor shift columns to the right
band = np.eye(1, n)
band[0, shift] = mult_factor
# first column of T is (1, 0, ..., 0) of length n - shift, so T has shape
# (n - shift, n) and np.inner(a, T) produces one output column per valid i
T = scipy.linalg.toeplitz(np.eye(1, n - shift), band)
out = np.inner(a, T)
For large matrices, it might make sense to use a sparse matrix for T if you only want to add two or a few columns of a.
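A quick check that this agrees with the sliced one-liner above, using the example array with shift = 2 and mult_factor = 0.7:
out_slice = a[:, :-shift] + mult_factor * a[:, shift:]
print(np.allclose(np.inner(a, T), out_slice))  # True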
My problem
Suppose I have
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
They are two arrays of different sizes, containing other arrays (the inner arrays all have the same size!).
I want to count how many items of b (i.e. inner arrays) are also in a. Notice that I am not considering their position!
How can I do that?
My Try
count = 0
for bitem in b:
    for aitem in a:
        if (aitem == bitem).all():  # == is elementwise, so reduce with .all()
            count += 1
Is there a better way? Ideally a one-liner, maybe with some comprehension...
The numpy_indexed package contains efficient (nlogn, generally) and vectorized solutions to these types of problems:
import numpy_indexed as npi
count = len(npi.intersection(a, b))
Note that this is subtly different from your double loop, discarding duplicate entries in a and b, for instance. If you want to retain duplicates in b, this would work:
count = npi.in_(b, a).sum()
Duplicate entries in a could also be handled by doing npi.count(a) and factoring in the result of that; but anyway, I'm just rambling on for illustration purposes, since I imagine the distinction probably does not matter to you.
Here is a simple way to do it:
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
count = np.count_nonzero(
    np.any(np.all(a[:, np.newaxis, :] == b[np.newaxis, :, :], axis=-1), axis=0))
print(count)
# output: 2
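To unpack what the broadcast comparison is doing (the intermediate names here are mine, for illustration; shapes refer to the example arrays):
eq = a[:, np.newaxis, :] == b[np.newaxis, :, :]  # (5, 3, 2): compare every inner array of a with every inner array of b
row_match = np.all(eq, axis=-1)                  # (5, 3): True where a whole inner array matches
found_in_a = np.any(row_match, axis=0)           # (3,): True for each item of b that occurs anywhere in a
count = np.count_nonzero(found_in_a)             # 2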
You can do what you want in a one-liner as follows:
count = sum([np.array_equal(x,y) for x,y in product(a,b)])
Explanation
Here's an explanation of what's happening:
Iterate through the two arrays using itertools.product, which creates an iterator over the Cartesian product of the two arrays.
Compare each pair of arrays (x, y) coming from step 1 using np.array_equal.
Sum the results; True counts as 1 when using sum on a list.
Full example:
The final code looks like this:
import numpy as np
from itertools import product
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
count = sum([np.array_equal(x,y) for x,y in product(a,b)])
# output: 2
You can convert the rows to dtype = np.void and then use np.in1d on the resulting 1D arrays:
def void_arr(a):
    return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
b[np.in1d(void_arr(b), void_arr(a))]
array([[5, 6],
       [1, 2]])
If you just want the number of intersections, it's
np.in1d(void_arr(b), void_arr(a)).sum()
2
Note: if there are repeat items in b or a, then np.in1d(void_arr(b), void_arr(a)).sum() likely won't be equal to np.in1d(void_arr(a), void_arr(b)).sum(). I've reversed the order from my original answer to match your question (i.e. how many elements of b are in a?)
For more information, see the third answer here
Maybe I'm just being lazy here, but let's say that I have two arrays, of length n and m, and I'd like a pairwise minimum of all of the elements of the two arrays compared against each other. For example:
a = [1,5,3]
b = [2,4]
cross_min(a,b)
= [[1,1],[2,4],[2,3]]
This is similar to the behavior of np.outer(), except that instead of multiplying the two arrays, it computes the minimum of the two elements.
Is there an operation in numpy that does a similar thing?
I know that I can just run np.minimum() along b and stack the results together. I'm wondering if this is a well-known operation that I just don't know the name of.
You can use np.minimum.outer(a, b).
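For the example arrays above, a quick demonstration:
import numpy as np

a = np.array([1, 5, 3])
b = np.array([2, 4])
np.minimum.outer(a, b)
# array([[1, 1],
#        [2, 4],
#        [2, 3]])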
You might turn one of the arrays into a 2D array and then make use of the broadcasting rules and np.minimum:
import numpy as np
a = np.array([1,5,3])
b = np.array([2,4])
np.minimum(a[:,None], b)
#array([[1, 1],
# [2, 4],
# [2, 3]])
I have two 3D matrices, A (32x3x3) and B (32x3x3), and I want to get a matrix C with dimension 32x3x3. The calculation can be done using a loop like:
a = numpy.random.rand(32, 3, 3)
b = numpy.random.rand(32, 3, 3)
c = numpy.empty((32, 3, 3))  # output buffer
for i in range(32):
    c[i] = numpy.dot(a[i], b[i])
I believe there must be a more efficient one-line solution to this problem. Can anybody help? Thanks.
You could do this using np.einsum:
In [142]: old = c  # result of the loop above
In [143]: new = np.einsum('ijk,ikl->ijl', a, b)
In [144]: np.allclose(old, new)
Out[144]: True
One advantage of using einsum is that you can almost read off what it's doing from the indices: leave the first axis alone (i), and perform a matrix multiplication on the last two (jk,kl->jl).
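For what it's worth, the same batched product can also be written with matmul (the @ operator, available on Python 3.5+ with numpy 1.10+), which broadcasts over the leading axis; a minimal equivalent sketch:
new2 = a @ b             # same as np.matmul(a, b)
np.allclose(c, new2)     # True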
I have a 2D numpy array that I need to take the max of along a specific axis. I then need to know which indexes were selected for this operation, as a mask for another operation which is done only on those same indexes but on another array of the same shape.
Right now I'm doing it by using 2D array indexing, but it's slow and kind of convoluted, particularly the mgrid hack to generate the row indexes. It's just [0,1] for this example, but I need it to be robust to arbitrary shapes.
a = np.array([[0,0,5],[0,0,5]])
b = np.array([[1,1,1],[1,1,1]])
columnIndexes = np.argmax(a,axis=1)
rowIndexes = np.mgrid[0:a.shape[0], 0:columnIndexes.size-1][0].flatten()
b[rowIndexes,columnIndexes] = b[rowIndexes,columnIndexes]+1
b should now be array([[1,1,2],[1,1,2]]), since it performed the operation on b only for the indexes of the max along the columns of a.
Anyone know a better way? Preferably using just boolean mask arrays, so that I can port this code to run on a GPU without too much hassle. Thanks!
I will suggest an answer but with slightly different data.
c = np.array([[0,1,1],[2,1,0]])      # note: the row [0, 1, 1] has a duplicated max
d = np.array([[0,10,10],[20,10,0]])  # data to be changed
c_argmax = np.argmax(c,axis=1)[:,np.newaxis]
b_map1 = c_argmax == np.arange(c.shape[1])
# now use the bool map as you described
d[b_map1] += 1
d
[out]
array([[ 0, 11, 10],
       [21, 10,  0]])
Note that I created the example with a duplicate of the largest number. The above works with argmax as you requested, but you might have wanted to increment all max values, as in:
c_max = np.max(c,axis=1)[:,np.newaxis]
b_map2 = c_max == c
d[b_map2] += 1
d
[out]
array([[ 0, 12, 11],
       [22, 10,  0]])
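As an aside, a sketch assuming numpy 1.15+ (which added take_along_axis/put_along_axis): the argmax-only variant can be written without materializing the boolean map, though it is not mask-based and so may not suit the GPU port:
idx = np.argmax(c, axis=1)[:, np.newaxis]
vals = np.take_along_axis(d, idx, axis=1)    # current values at each row's argmax (start from a fresh d)
np.put_along_axis(d, idx, vals + 1, axis=1)  # write the incremented values back, in place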
I have something like
m = array([[1, 2],
           [4, 5],
           [7, 8],
           [6, 2]])
and
select = array([0,1,0,0])
My target is
result = array([1, 5, 7, 6])
I tried np.ix_, as I read at Simplfy row AND column extraction, numpy, but this did not result in what I wanted.
p.s. Please change the title of this question if you can think of a more precise one.
The numpy way to do this is by using np.choose or fancy indexing/take (see below):
m = array([[1, 2],
           [4, 5],
           [7, 8],
           [6, 2]])
select = array([0,1,0,0])
result = np.choose(select, m.T)
So there is no need for Python loops or anything, with all the speed advantages numpy gives you. m.T is needed because choose is really a choice between two (or more) arrays, np.choose(select, (m[:,0], m[:,1])), but it's straightforward to use it like this.
Using fancy indexing:
result = m[np.arange(len(select)), select]
And if speed is very important, np.take, which works on a 1D view (it's quite a bit faster for some reason, but maybe not for these tiny arrays):
result = m.take(select+np.arange(0, len(select) * m.shape[1], m.shape[1]))
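A quick check that all three variants agree on the example (a minimal sketch):
import numpy as np

m = np.array([[1, 2], [4, 5], [7, 8], [6, 2]])
select = np.array([0, 1, 0, 0])
r1 = np.choose(select, m.T)
r2 = m[np.arange(len(select)), select]
r3 = m.take(select + np.arange(0, len(select) * m.shape[1], m.shape[1]))
print(r1, r2, r3)  # each is [1 5 7 6]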
I prefer to use NP.where for indexing tasks of this sort (rather than NP.ix_)
What is not mentioned in the OP is whether the result is selected by location (row/col in the source array) or by some condition (e.g., m >= 5). In any event, the code snippet below covers both scenarios.
Three steps:
create the condition array;
generate an index array by calling NP.where, passing in this condition array; and
apply this index array against the source array
>>> import numpy as NP
>>> cnd = (m==1) | (m==5) | (m==7) | (m==6)
>>> cnd
matrix([[ True, False],
[False, True],
[ True, False],
[ True, False]], dtype=bool)
>>> # generate the index array/matrix
>>> # by calling NP.where, passing in the condition (cnd)
>>> ndx = NP.where(cnd)
>>> ndx
(matrix([[0, 1, 2, 3]]), matrix([[0, 1, 0, 0]]))
>>> # now apply it against the source array
>>> m[ndx]
matrix([[1, 5, 7, 6]])
The argument passed to NP.where, cnd, is a boolean array, which in this case is the result of a single expression composed of compound conditional expressions (the first line above).
If constructing such a value filter doesn't apply to your particular use case, that's fine; you just need to generate the actual boolean matrix (the value of cnd) some other way (or create it directly).
What about using plain Python?
result = array([subarray[index] for subarray, index in zip(m, select)])
IMHO, this is the simplest variant:
m[np.arange(4), select]
Since the title refers to indexing a 2D array with another 2D array, the actual general numpy solution can be found here.
In short:
A 2D array of indices of shape (n,m), with arbitrarily large m, named inds, is used to access elements of another 2D array of shape (n,k), named B:
# array of flat-index offsets, one per row of B; the stride must be B's row
# length, B.shape[1] (not inds.shape[1], which differs whenever m != k)
offset = np.arange(0, B.size, B.shape[1])
# numpy.take(B, C) "flattens" B and C and selects elements from B based on the indices in C
result = np.take(B, offset[:, np.newaxis] + inds)
Another solution, which doesn't use np.take and which I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
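A quick demonstration with small, hypothetical arrays (the values of B and inds here are mine, for illustration):
import numpy as np

B = np.arange(12).reshape(3, 4)            # shape (3, 4)
inds = np.array([[0, 2], [1, 1], [3, 0]])  # shape (3, 2)
offset = np.arange(0, B.size, B.shape[1])  # [0, 4, 8]
r1 = np.take(B, offset[:, np.newaxis] + inds)
r2 = B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
print(np.array_equal(r1, r2))  # True
print(r2)                      # [[ 0  2] [ 5  5] [11  8]]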
result = array([m[j][0] if i == 0 else m[j][1] for i, j in zip(select, range(len(m)))])