Numpy: vectorized access of several columns at once?

I have scripts with multi-dimensional arrays and instead of for-loops I would like to use a vectorized implementation for my problems (which sometimes contain column operations).
Let's consider a simple example with matrix arr:
> arr = np.arange(12).reshape(3, 4)
> arr
> array([[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]])
> arr.shape
> (3, 4)
So we have a matrix arr with 3 rows and 4 columns.
The simplest case in my scripts is adding something to the values in the array. E.g. I'm doing this for single or multiple rows:
> someVector = np.array([1, 2, 3, 4])
> arr[0] += someVector
> arr
> array([[ 1,  3,  5,  7],    <--- successfully added someVector to one row
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]])
> arr[0:2] += someVector
> arr
> array([[ 2,  5,  8, 11],    <--- added someVector
         [ 5,  7,  9, 11],    <--- to two rows at once
         [ 8,  9, 10, 11]])
This works well. However, sometimes I need to manipulate one or several columns. One column at a time works:
> arr[:, 0] += [1, 2, 3]
> array([[ 3,  5,  8, 11],
         [ 7,  7,  9, 11],
         [11,  9, 10, 11]])
           ^
           |___ added the values [1, 2, 3] successfully to this column
But I am struggling to figure out why this does not work for multiple columns at once:
> arr[:, 0:2] += [1, 2, 3]
> ValueError                                Traceback (most recent call last)
> <ipython-input-16-5feef53e53af> in <module>()
> ----> 1 arr[:, 0:2] += [1, 2, 3]
> ValueError: operands could not be broadcast together with shapes (3,2) (3,) (3,2)
Isn't this the very same way it works with rows? What am I doing wrong here?

To add a 1D array to multiple columns you need to broadcast the values to a 2D array. Since broadcasting adds new axes on the left (of the shape) by default, broadcasting a row vector to multiple rows happens automatically:
arr[0:2] += someVector
someVector has shape (N,) and is automatically broadcast to shape (1, N). If arr[0:2] has shape (2, N), then the sum is performed element-wise as though both arr[0:2] and someVector were arrays of the same shape, (2, N).
But broadcasting a column vector across multiple columns requires telling NumPy that the new axis belongs on the right. You have to add that axis explicitly, using someVector[:, np.newaxis] or, equivalently, someVector[:, None]:
In [41]: arr = np.arange(12).reshape(3, 4)
In [42]: arr[:, 0:2] += np.array([1, 2, 3])[:, None]
In [43]: arr
Out[43]:
array([[ 1,  2,  2,  3],
       [ 6,  7,  6,  7],
       [11, 12, 10, 11]])
someVector (e.g. np.array([1, 2, 3])) has shape (N,) and someVector[:, None] has shape (N, 1) so now broadcasting happens on the right. If arr[:, 0:2] has shape (N, 2), then the sum is performed element-wise as though both arr[:, 0:2] and someVector[:, None] were arrays of the same shape, (N, 2).
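As a quick sanity check on the shapes involved (a small sketch added for illustration, not part of the original answer), np.broadcast reports the shape two operands would broadcast to:
In [44]: someVector = np.array([1, 2, 3])
In [45]: someVector.shape, someVector[:, None].shape
Out[45]: ((3,), (3, 1))
In [46]: np.broadcast(arr[:, 0:2], someVector[:, None]).shape
Out[46]: (3, 2)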

Very clear explanation by unutbu.
As a complement, transposition (.T) can often simplify the task by letting you work on the first dimension:
In [273]: arr = np.arange(12).reshape(3, 4)
In [274]: arr.T[0:2] += [1, 2, 3]
In [275]: arr
Out[275]:
array([[ 1,  2,  2,  3],
       [ 6,  7,  6,  7],
       [11, 12, 10, 11]])
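This updates arr in place because .T returns a view of the same data rather than a copy (a quick check, added for illustration):
In [276]: np.shares_memory(arr, arr.T)
Out[276]: True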

Related

Apply multiple masks at once to a Numpy array

Is there a way to apply multiple masks at once to a multi-dimensional Numpy array?
For instance:
X = np.arange(12).reshape(3, 4)
# array([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]])
m0 = (X>0).all(axis=1) # array([False, True, True])
m1 = (X<3).any(axis=0) # array([ True, True, True, False])
# In one step: error
X[m0, m1]
# IndexError: shape mismatch: indexing arrays could not
# be broadcast together with shapes (2,) (3,)
# In two steps: works (but awkward)
X[m0, :][:, m1]
# array([[ 4, 5, 6],
# [ 8, 9, 10]])
Try:
>>> X[np.ix_(m0, m1)]
array([[ 4, 5, 6],
[ 8, 9, 10]])
From the docs:
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
Another solution (also straight from the docs but less intuitive IMO):
>>> X[m0.nonzero()[0][:, np.newaxis], m1]
array([[ 4, 5, 6],
[ 8, 9, 10]])
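For reference (added for illustration), ix_ converts the boolean masks into integer index arrays whose shapes broadcast together to select the block:
>>> np.ix_(m0, m1)
(array([[1],
       [2]]), array([[0, 1, 2]]))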
The error tells you what you need to do: the mask dimensions need to broadcast together. You can fix this at the source:
m0 = (X>0).all(axis=1, keepdims=True)
m1 = (X<3).any(axis=0, keepdims=True)
>>> X[m0 & m1]
array([ 4, 5, 6, 8, 9, 10])
Really only m0 needs the extra axis; alternatively, leave both masks 1D and add the axis at indexing time:
>>> X[m0[:, None] & m1]
array([ 4, 5, 6, 8, 9, 10])
You can reshape to the desired shape:
>>> X[m0[:, None] & m1].reshape(np.count_nonzero(m0), np.count_nonzero(m1))
array([[ 4, 5, 6],
[ 8, 9, 10]])
Another option is to convert the masks to indices:
>>> X[np.flatnonzero(m0)[:, None], np.flatnonzero(m1)]
array([[ 4, 5, 6],
[ 8, 9, 10]])

Numpy call array values by list of indices

I have a 2D array of values, and I want to index it with two lists of indices, x and y. It used to work perfectly before; I don't know why it's not working now, maybe the Python version, I'm not sure.
x = np.squeeze(np.where(data['info'][:,2]==cdp)[0])
y = np.squeeze(np.where((data['Time']>=ub) & (data['Time']<=lb))[0])
s = data['gather'][x,y]
Error:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (36,) (45,)
I don't know what the problem is. It works when I do it in two stages.
s = data['gather'][:,y]; s = s[x,:]
But I can't do that; I need to do it in one step.
In [92]: data = np.arange(12).reshape(3,4)
In [93]: x,y = np.arange(3), np.arange(4)
In [94]: data[x,y]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-94-8bd18da6c0ef> in <module>
----> 1 data[x,y]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (4,)
When you provide 2 or more arrays as indices, numpy broadcasts them against each other. Understanding broadcasting is important.
In MATLAB, providing two indexing arrays (actually 2d matrices) fetches a block. In numpy, two index arrays that match in shape fetch individual elements, e.g. a diagonal:
In [99]: data[x,x]
Out[99]: array([ 0, 5, 10])
The MATLAB equivalent requires an extra function (sub2ind, or some such name).
Two stage indexing:
In [95]: data[:,y][x,:]
Out[95]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
ix_ is a handy tool for constructing indices for block access:
In [96]: data[np.ix_(x,y)]
Out[96]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Notice what it produces:
In [97]: np.ix_(x,y)
Out[97]:
(array([[0],
[1],
[2]]), array([[0, 1, 2, 3]]))
that's the same as doing:
In [98]: data[x[:,None], y]
Out[98]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
x[:,None] is (3,1), y is (4,); they broadcast to produce a (3,4) selection.
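Applied back to the arrays from the question (names taken from the question itself), the one-step version would therefore look something like:
s = data['gather'][np.ix_(x, y)]      # or equivalently: data['gather'][x[:, None], y]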

Accessing chunks at once in a numpy array

Provided a numpy array:
arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12])
I wonder how to access chunks of a chosen size with a chosen separation, both concatenated and in slices:
E.g.: obtain chunks of size 3 separated by two values:
arr_chunk_3_sep_2 = np.array([0,1,2,5,6,7,10,11,12])
arr_chunk_3_sep_2_in_slices = np.array([[0,1,2],[5,6,7],[10,11,12]])
What is the most efficient way to do it? If possible, I would like to avoid copying or creating new objects as much as possible. Maybe memoryviews could be of help here?
Approach #1
Here's one with masking -
def slice_grps(a, chunk, sep):
    N = chunk + sep
    return a[np.arange(len(a)) % N < chunk]
Sample run -
In [223]: arr
Out[223]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
In [224]: slice_grps(arr, chunk=3, sep=2)
Out[224]: array([ 0, 1, 2, 5, 6, 7, 10, 11, 12])
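For reference (added for illustration), the boolean mask that slice_grps builds for chunk=3, sep=2 looks like this:
In [225]: np.arange(len(arr)) % 5 < 3
Out[225]:
array([ True,  True,  True, False, False,  True,  True,  True, False,
       False,  True,  True,  True])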
Approach #2
If the input array is such that the last chunk has enough runway, we could leverage np.lib.stride_tricks.as_strided, inspired by this post, to select m elements off each block of n elements -
# https://stackoverflow.com/a/51640641/ #Divakar
def skipped_view(a, m, n):
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    shp = ((a.size + n - 1) // n, n)
    return strided(a, shape=shp, strides=(n*s, s), writeable=False)[:, :m]
out = skipped_view(arr,chunk,chunk+sep)
Note that the output is a view into the input array, so there is no extra memory overhead and it is virtually free!
Sample run to make things clear -
In [255]: arr
Out[255]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
In [256]: chunk = 3
In [257]: sep = 2
In [258]: skipped_view(arr,chunk,chunk+sep)
Out[258]:
array([[ 0, 1, 2],
[ 5, 6, 7],
[10, 11, 12]])
# Let's prove that the output is a view indeed
In [259]: np.shares_memory(arr, skipped_view(arr,chunk,chunk+sep))
Out[259]: True
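If you need an independent, writable array rather than the read-only view, copying the view gives you that, at the cost of the memory the view avoided (added as a usage note):
In [260]: out = skipped_view(arr, chunk, chunk+sep).copy()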
How about a reshape and slice?
In [444]: arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12])
In [445]: arr.reshape(-1,5)
...
ValueError: cannot reshape array of size 13 into shape (5)
Ah a problem - your array isn't big enough for this reshape - so we have to pad it:
In [446]: np.concatenate((arr,np.zeros(2,int))).reshape(-1,5)
Out[446]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 0, 0]])
In [447]: np.concatenate((arr,np.zeros(2,int))).reshape(-1,5)[:,:-2]
Out[447]:
array([[ 0, 1, 2],
[ 5, 6, 7],
[10, 11, 12]])
as_strided can get away with this by including bytes outside the data buffer. Usually that's seen as a bug, though here it can be an asset - provided you really do throw that garbage away.
Or throwing away the last incomplete line:
In [452]: arr[:-3].reshape(-1,5)[:,:3]
Out[452]:
array([[0, 1, 2],
[5, 6, 7]])
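And to get the concatenated form from the padded reshape above (added for completeness; note that raveling a non-contiguous slice makes a copy):
In [453]: np.concatenate((arr,np.zeros(2,int))).reshape(-1,5)[:,:3].ravel()
Out[453]: array([ 0,  1,  2,  5,  6,  7, 10, 11, 12])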

Tensorflow- Cartesian product of two 2-D tensors

I have two 2-D tensors and want the Cartesian product of them. By Cartesian, I mean the concatenation of every row of the first tensor with every row of the second tensor. For example:
Input:
[[1,2,3],[4,5,6]]
and
[[7,8],[9,10]]
Output:
[[1,2,3,7,8],
[1,2,3,9,10],
[4,5,6,7,8],
[4,5,6,9,10]]
I've seen this post, but it doesn't work for this case. What is the best way to do it?
Thanks
Here is one way: tile a along the second dimension and b along the first, reshape the tiled a, and then concatenate the two repeated tensors.
a_ = tf.reshape(tf.tile(a, [1, b.shape[0]]), (a.shape[0] * b.shape[0], a.shape[1]))
b_ = tf.tile(b, [a.shape[0], 1])
tf.concat([a_, b_], 1).eval()
#array([[ 1, 2, 3, 7, 8],
# [ 1, 2, 3, 9, 10],
# [ 4, 5, 6, 7, 8],
# [ 4, 5, 6, 9, 10]])
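As a cross-check (not part of the original answer), the same tile/repeat idea in plain NumPy, using the example inputs from the question:
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8], [9, 10]])

# Repeat each row of a once per row of b, tile b once per row of a, then concatenate.
a_ = np.repeat(a, b.shape[0], axis=0)   # shape (4, 3)
b_ = np.tile(b, (a.shape[0], 1))        # shape (4, 2)
np.concatenate([a_, b_], axis=1)
#array([[ 1,  2,  3,  7,  8],
#       [ 1,  2,  3,  9, 10],
#       [ 4,  5,  6,  7,  8],
#       [ 4,  5,  6,  9, 10]])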

numpy: how to get a max from an argmax result

I have a numpy array of arbitrary shape, e.g.:
a = array([[[ 1, 2],
[ 3, 4],
[ 8, 6]],
[[ 7, 8],
[ 9, 8],
[ 3, 12]]])
a.shape = (2, 3, 2)
and a result of argmax over the last axis:
np.argmax(a, axis=-1) = array([[1, 1, 0],
[1, 0, 1]])
I'd like to get max:
np.max(a, axis=-1) = array([[ 2, 4, 8],
[ 8, 9, 12]])
But without recalculating everything. I've tried:
a[np.arange(len(a)), np.argmax(a, axis=-1)]
But got:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,3)
How to do it? Similar question for 2-d: numpy 2d array max/argmax
You can use advanced indexing -
In [17]: a
Out[17]:
array([[[ 1, 2],
[ 3, 4],
[ 8, 6]],
[[ 7, 8],
[ 9, 8],
[ 3, 12]]])
In [18]: idx = a.argmax(axis=-1)
In [19]: m,n = a.shape[:2]
In [20]: a[np.arange(m)[:,None],np.arange(n),idx]
Out[20]:
array([[ 2, 4, 8],
[ 8, 9, 12]])
For a generic ndarray case of any number of dimensions, as stated in the comments by hpaulj, we could use np.ix_, like so -
shp = np.array(a.shape)
dim_idx = list(np.ix_(*[np.arange(i) for i in shp[:-1]]))
dim_idx.append(idx)
out = a[tuple(dim_idx)]   # index with a tuple, not a list: list-of-arrays indexing is deprecated/removed in newer NumPy
For an ndarray of arbitrary shape, you can flatten the argmax indices and then recover the correct shape, like so:
idx = np.argmax(a, axis=-1)
flat_idx = np.arange(a.size, step=a.shape[-1]) + idx.ravel()
maximum = a.ravel()[flat_idx].reshape(*a.shape[:-1])
For arbitrary-shape arrays, the following should work :)
a = np.arange(5 * 4 * 3).reshape((5,4,3))
# for last axis
argmax = a.argmax(axis=-1)
a[tuple(np.indices(a.shape[:-1])) + (argmax,)]
# for other axis (eg. axis=1)
argmax = a.argmax(axis=1)
idx = list(np.indices(a.shape[:1]+a.shape[2:]))
idx[1:1] = [argmax]
a[tuple(idx)]
or
a = np.arange(5 * 4 * 3).reshape((5,4,3))
argmax = a.argmax(axis=0)
np.choose(argmax, np.moveaxis(a, 0, 0))
argmax = a.argmax(axis=1)
np.choose(argmax, np.moveaxis(a, 1, 0))
argmax = a.argmax(axis=2)
np.choose(argmax, np.moveaxis(a, 2, 0))
argmax = a.argmax(axis=-1)
np.choose(argmax, np.moveaxis(a, -1, 0))
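A final note (not in the original answers): np.choose only supports a limited number of choice arrays (historically 32), so it breaks down when the reduced axis is long. On NumPy 1.15+, np.take_along_axis pairs directly with argmax and avoids building the index grids by hand:
argmax = a.argmax(axis=1)
np.take_along_axis(a, np.expand_dims(argmax, axis=1), axis=1).squeeze(axis=1)
# same result as a.max(axis=1)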
