cumulative sum along given dimension from scratch


I was recently given a task (during an exam, no less) to write a function that returns the cumulative sum along a given dimension of a 2-D array, without using np.cumsum, of course. To be honest, I find it hard to even get started.
The function should look like this:
def cumsum_2d(array: np.ndarray, dim: int = 0) -> np.ndarray:
and its result is then compared with the result of the actual np.cumsum.
I would be grateful for even a basic outline or a general idea of what to do.

Here is another approach that doesn't use ufunc.accumulate or functools.reduce.
It works by inserting an extra dimension, broadcasting the array along that dimension, and then doing a sum where it only considers indices less than or equal to the current index along the summation dimension.
It's morally similar to a brute-force approach where you make a bunch of copies of the array, set the elements you don't want to zero, and then do the sum.
import numpy as np
def cumsum_2d(array: np.ndarray, dim: int = 0):
    # Make sure the dim argument is non-negative
    dim = dim % array.ndim
    # Calculate the new shape with an extra copy of dim
    shape_new = list(array.shape)
    shape_new.insert(dim + 1, array.shape[dim])
    # Insert the new dimension and broadcast the array along that dimension
    array = np.broadcast_to(np.expand_dims(array, dim + 1), shape_new)
    # Save the indices of the array
    indices = np.indices(array.shape)
    # Sum along the requested dimension, considering only the elements
    # whose index is less than or equal to the current index
    return np.sum(array, axis=dim, where=indices[dim] <= indices[dim + 1])
a = np.random.random((4, 5))
assert np.array_equal(cumsum_2d(a, 1), np.cumsum(a, 1))
assert np.array_equal(cumsum_2d(a, 0), np.cumsum(a, 0))
assert np.array_equal(cumsum_2d(a, -1), np.cumsum(a, -1))
assert np.array_equal(cumsum_2d(a, -2), np.cumsum(a, -2))
Note that this function should work for arrays of any rank, not just two-dimensional ones.
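To make the masking idea concrete, here is a minimal 1-D sketch of the same trick, written out with np.where rather than the where= argument of np.sum. The names x, copies and mask are illustrative only and do not come from the answer above.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
n = x.size

# copies[i, j] == x[i]: one copy of x for every output position j
copies = np.broadcast_to(x[:, None], (n, n))

# Keep x[i] in column j only when i <= j
mask = np.arange(n)[:, None] <= np.arange(n)[None, :]

# Zero out the unwanted entries, then sum over i
result = np.where(mask, copies, 0).sum(axis=0)
print(result)  # [ 1  3  6 10]
```

The n-by-n intermediate is what makes this a brute-force method: memory grows with the square of the axis length.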

This approach is fairly "from scratch". It does use functools.reduce(), which I assume must be permitted.
import functools
import numpy as np
def cumsum_2d(array: np.ndarray, dim: int = 0) -> np.ndarray:
    if not isinstance(dim, int) or not 0 <= dim <= 1:
        # f-string so that the offending value appears in the message
        raise ValueError(f'"dim": expected integer 0 or 1, got {dim}.')
    elif not array.ndim == 2:
        raise ValueError(
            f"{array.ndim} dimensional array not allowed - 2 dimensional arrays expected."
        )
    # Always accumulate along axis 0; transpose first when dim == 1
    array = array.T if dim == 1 else array
    result = [
        functools.reduce(lambda x, y: x + y, array[: i + 1]) for i in range(len(array))
    ]
    result = np.array(result)
    result = result.T if dim == 1 else result
    return result
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
dim = 1
print(f"For dim = {dim} and a= \n{a}:")
print(f"...got: \n{cumsum_2d(a, dim)}")
print(f"...expected: \n{np.cumsum(a, dim)}")
This has the result:
# For dim = 1 and a=
# [[1 2 3]
# [4 5 6]
# [7 8 9]]:
# ...got:
# [[ 1 3 6]
# [ 4 9 15]
# [ 7 15 24]]
# ...expected:
# [[ 1 3 6]
# [ 4 9 15]
# [ 7 15 24]]
Trying with dim = 2 raises a ValueError per the function definition. This mimics the AxisError raised by np.cumsum under similar circumstances:
ValueError: "dim": expected integer 0 or 1, got 2.
Lastly, trying with a non-2-D array also raises a customised ValueError as programmed, so the user doesn't get any unexpected behaviour passed through silently.
b = np.array([[[1, 2, 3], [1, 2, 3]], [[4, 5, 6], [1, 2, 3]], [[7, 8, 9], [1, 2, 3]]])
cumsum_2d(b, dim)
Result:
ValueError: 3 dimensional array not allowed - 2 dimensional arrays expected.
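The list comprehension above re-sums every prefix, so the work grows quadratically with the length of the axis. A running total avoids that. The sketch below is one possible variant and is not part of the original answer: the name cumsum_2d_running is made up for illustration, and it assumes a 2-D input with dim equal to 0 or 1.

```python
import numpy as np

def cumsum_2d_running(array: np.ndarray, dim: int = 0) -> np.ndarray:
    """Cumulative sum of a 2-D array using a single pass along `dim`."""
    if array.ndim != 2 or dim not in (0, 1):
        raise ValueError("expected a 2-D array and dim 0 or 1")
    # Always work along axis 0; transpose first when dim == 1
    work = array.T if dim == 1 else array
    out = np.empty_like(work)
    running = np.zeros(work.shape[1], dtype=work.dtype)
    for i, row in enumerate(work):
        running = running + row  # add the next slice to the total
        out[i] = running
    return out.T if dim == 1 else out

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
result_rows = cumsum_2d_running(a, 1)
result_cols = cumsum_2d_running(a, 0)
```

Unlike np.cumsum, this sketch keeps the input dtype as-is, so small integer types may overflow where np.cumsum would promote them.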

Related

Python equivalent of Matlab shiftdim()

I am currently converting some Matlab code to Python and I am wondering if there is a similar function to Matlab's shiftdim(A, n)
B = shiftdim(A,n) shifts the dimensions of an array A by n positions. shiftdim shifts the dimensions to the left when n is a positive integer and to the right when n is a negative integer. For example, if A is a 2-by-3-by-4 array, then shiftdim(A,2) returns a 4-by-2-by-3 array.

If you use numpy you can use np.moveaxis.
From the docs:
>>> x = np.zeros((3, 4, 5))
>>> np.moveaxis(x, 0, -1).shape
(4, 5, 3)
>>> np.moveaxis(x, -1, 0).shape
(5, 3, 4)
numpy.moveaxis(a, source, destination)
Parameters
a: np.ndarray
    The array whose axes should be reordered.
source: int or sequence of int
    Original positions of the axes to move. These must be unique.
destination: int or sequence of int
    Destination positions for each of the original axes. These must also be unique.

shiftdim's function is a bit more complex than shifting axes around.
For input shiftdim(A, n), if n is positive, shift the axes to the left by n (i.e., rotate), but if n is negative, shift the axes to the right by prepending leading dimensions of size 1.
For input shiftdim(A), remove any leading dimensions of size 1.
from collections import deque
import numpy as np
def shiftdim(array, n=None):
    if n is not None:
        if n >= 0:
            axes = tuple(range(len(array.shape)))
            new_axes = deque(axes)
            new_axes.rotate(n)
            return np.moveaxis(array, axes, tuple(new_axes))
        return np.expand_dims(array, axis=tuple(range(-n)))
    else:
        idx = 0
        for dim in array.shape:
            if dim == 1:
                idx += 1
            else:
                break
        axes = tuple(range(idx))
        # Note that this returns a tuple of 2 results
        return np.squeeze(array, axis=axes), len(axes)
Same examples as the Matlab docs
a = np.random.uniform(size=(4, 2, 3, 5))
print(shiftdim(a, 2).shape) # prints (3, 5, 4, 2)
print(shiftdim(a, -2).shape) # prints (1, 1, 4, 2, 3, 5)
a = np.random.uniform(size=(1, 1, 3, 2, 4))
b, nshifts = shiftdim(a)
print(nshifts) # prints 2
print(b.shape) # prints (3, 2, 4)
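For the positive-n case only, the rotation of the axis order can also be spelled out with np.transpose, which some readers may find easier to follow than deque plus np.moveaxis. This is a sketch under that assumption; the helper name rotate_axes_left is hypothetical and it does not handle negative n or the one-argument form.

```python
import numpy as np

def rotate_axes_left(array: np.ndarray, n: int) -> np.ndarray:
    """Cyclically shift the axis order left by n (n >= 0, ndim >= 1 assumed)."""
    n = n % array.ndim
    # New axis i is taken from old axis order[i]
    order = list(range(n, array.ndim)) + list(range(n))
    return np.transpose(array, order)

a = np.random.uniform(size=(4, 2, 3, 5))
b = rotate_axes_left(a, 2)
print(b.shape)  # (3, 5, 4, 2)
```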

How to use the output of argmin as index with Numpy [duplicate]

I want to find the location of minima along a given axis in a rank-3 numpy array. I have obtained these locations with np.argmin, however I'm not sure how to "apply" this to the original matrix to get the actual minima.
For example:
import numpy as np
a = np.random.randn(10, 5, 2)
min_loc = a.argmin(axis = 0) # this gives an array of shape (5, 2)
Now, the problem is: how do I get the actual minima using min_loc? I have tried a[min_loc], which gives me a shape of (5, 2, 5, 2). What's the logic behind this shape? How can I use this auxiliary matrix to get a sensible solution of shape (5, 2)?
Note that a.min(axis = 0) is not the solution I'm looking for. I need a solution via argmin.

a[min_loc] does integer array indexing on the first dimension only: for each integer in min_loc it picks up a (5, 2)-shaped subarray. Since min_loc itself has shape (5, 2), you end up with a (5, 2, 5, 2) array. For the same reason, a[np.array([0, 3])] has shape (2, 5, 2) and a[np.array([[0], [3]])] has shape (2, 1, 5, 2), since you only provide an index for the first dimension.
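The shapes quoted above are easy to confirm directly. The snippet below is only a check of those three cases; the variable names are illustrative.

```python
import numpy as np

a = np.random.randn(10, 5, 2)
min_loc = a.argmin(axis=0)  # shape (5, 2)

# Indexing only the first axis: index shape + remaining axes (5, 2)
shape_full = a[min_loc].shape              # (5, 2, 5, 2)
shape_1d = a[np.array([0, 3])].shape       # (2, 5, 2)
shape_2d = a[np.array([[0], [3]])].shape   # (2, 1, 5, 2)
```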
For your use case, you do not want to pick up a subarray for each index in min_loc; you need a single element. For instance, if min_loc = [[5, ...], ...], the first element should have a full index of 5, 0, 0 instead of 5, :, :. This is exactly what advanced indexing does: by providing an integer array as the index for each dimension, you pick up the element at each specified position. You can construct the indices for the 2nd and 3rd dimensions from the (5, 2) shape with np.indices:
j, k = np.indices(min_loc.shape)
a[min_loc, j, k]
# [[-1.82762089 -0.80927253]
# [-1.06147046 -1.70961507]
# [-0.59913623 -1.10963768]
# [-2.57382762 -0.77081778]
# [-1.6918745 -1.99800825]]
where j, k are coordinates for the 2nd and 3rd dimensions:
j
#[[0 0]
# [1 1]
# [2 2]
# [3 3]
# [4 4]]
k
#[[0 1]
# [0 1]
# [0 1]
# [0 1]
# [0 1]]
Or, as @hpaulj commented, use np.take_along_axis (note that the result keeps a leading axis of length 1):
np.take_along_axis(a, min_loc[None], axis=0)
# [[[-0.93515242 -2.29665325]
# [-1.30864779 -1.483428 ]
# [-1.24262879 -0.71030707]
# [-1.40322789 -1.35580273]
# [-2.10997209 -2.81922197]]]
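As a sanity check, both routes should reproduce a.min(axis=0) exactly. The sketch below uses a seeded generator so it is repeatable; the seed and the variable names are arbitrary choices, not from the answer.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((10, 5, 2))
min_loc = a.argmin(axis=0)

# Route 1: one index array per axis
j, k = np.indices(min_loc.shape)
via_indexing = a[min_loc, j, k]

# Route 2: take_along_axis, then drop the leading axis of length 1
via_take = np.take_along_axis(a, min_loc[None], axis=0)[0]
```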

Finding an index numpy python

Consider a NumPy array of shape (8, 8).
My Question: What is the index (x,y) of the 50th element?
Note: For counting the elements go row-wise.
Example, in array A, where A = [[1, 5, 9], [3, 0, 2]] the 5th element would be '0'.
Can someone explain how to find the general solution for this and, what would be the solution for this specific problem?

You can use np.unravel_index to find the coordinates corresponding to an index into the flattened array. NumPy indices start at 0, so subtract 1 from the 1-based position.
import numpy as np
a = np.arange(64).reshape(8,8)
np.unravel_index(50-1, a.shape)
Out:
(6, 1)
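The inverse direction is np.ravel_multi_index, which can be used to confirm the round trip. A small sketch, with variable names chosen here for illustration:

```python
import numpy as np

shape = (8, 8)
coords = np.unravel_index(50 - 1, shape)     # 1-based position 50 -> flat index 49
flat = np.ravel_multi_index(coords, shape)   # back to the flat index
position = int(flat) + 1                     # back to the 1-based position
```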

In a NumPy array a of shape (r, c) (just like a list of lists), the n-th element is
a[(n-1) // c][(n-1) % c],
assuming that n starts from 1 as in your example.
The formula does not depend on r. Thus, when r = c = 8 and n = 50, the above formula is exactly
a[6][1].
Let me show more using your example:
import numpy as np
a = np.array([[1, 5, 9], [3, 0, 2]])
r = len(a)
c = len(a[0])
print(f'(r, c) = ({r}, {c})')
print(f'Shape: {a.shape}')
for n in range(1, r * c + 1):
    print(f'Element {n}: {a[(n-1) // c][(n-1) % c]}')
Below is the result:
(r, c) = (2, 3)
Shape: (2, 3)
Element 1: 1
Element 2: 5
Element 3: 9
Element 4: 3
Element 5: 0
Element 6: 2

numpy.ndarray.flatten(a) returns a copy of the array a collapsed into one dimension. Note that counting starts from 0, so in your example 0 is at index 4 and 1 is at index 0.
import numpy as np
arr = np.array([[1, 5, 9], [3, 0, 2]])
fourth_element = np.ndarray.flatten(arr)[4]
or
fourth_element = arr.flatten()[4]
The same works for an 8x8 matrix.

First, create an 8x8 2-D NumPy array using np.array and range, reshaped to (8, 8).
In the output you can check that the index of the 50th element is [6, 1]:
import numpy as np
arr = np.array(range(1,(8*8)+1)).reshape(8,8)
print(arr[6,1])
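The output will be 50.
Or you can do it in a generic way with the help of NumPy's where function. Note that this looks the element up by value, so it relies on the array holding the values 1 to 64 in order: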
import numpy as np
def getElementIndex(array: np.ndarray, element):
    # Find the element by value and report the first match as "[row,col]"
    elementIndex = np.where(array==element)
    return f'[{elementIndex[0][0]},{elementIndex[1][0]}]'
def getXYOrderNumberArray(x:int, y:int):
    return np.array(range(1,(x*y)+1)).reshape(x,y)
arr = getXYOrderNumberArray(8,8)
print(getElementIndex(arr,50))

Sort invariant for numpy.argsort with multiple dimensions

numpy.argsort docs state
Returns:
index_array : ndarray, int
Array of indices that sort a along the specified axis. If a is one-dimensional, a[index_array] yields a sorted a.
How can I apply the result of numpy.argsort for a multidimensional array to get back a sorted array? (NOT just a 1-D or 2-D array; it could be an N-dimensional array where N is known only at runtime)
>>> import numpy as np
>>> np.random.seed(123)
>>> A = np.random.randn(3,2)
>>> A
array([[-1.0856306 , 0.99734545],
[ 0.2829785 , -1.50629471],
[-0.57860025, 1.65143654]])
>>> i=np.argsort(A,axis=-1)
>>> A[i]
array([[[-1.0856306 , 0.99734545],
[ 0.2829785 , -1.50629471]],
[[ 0.2829785 , -1.50629471],
[-1.0856306 , 0.99734545]],
[[-1.0856306 , 0.99734545],
[ 0.2829785 , -1.50629471]]])
For me it's not just a matter of using sort() instead; I have another array B and I want to order B using the results of np.argsort(A) along the appropriate axis. Consider the following example:
>>> A = np.array([[3,2,1],[4,0,6]])
>>> B = np.array([[3,1,4],[1,5,9]])
>>> i = np.argsort(A,axis=-1)
>>> BsortA = ???
# should result in [[4,1,3],[5,1,9]]
# so that corresponding elements of B and sort(A) stay together
It looks like this functionality is already an enhancement request in numpy.

The numpy issue #8708 has a sample implementation of take_along_axis that does what I need; I'm not sure if it's efficient for large arrays but it seems to work.
def take_along_axis(arr, ind, axis):
    """
    ... here means a "pack" of dimensions, possibly empty
    arr: array_like of shape (A..., M, B...)
        source array
    ind: array_like of shape (A..., K..., B...)
        indices to take along each 1d slice of `arr`
    axis: int
        index of the axis with dimension M
    out: array_like of shape (A..., K..., B...)
        out[a..., k..., b...] = arr[a..., inds[a..., k..., b...], b...]
    """
    if axis < 0:
        if axis >= -arr.ndim:
            axis += arr.ndim
        else:
            raise IndexError('axis out of range')
    ind_shape = (1,) * ind.ndim
    ins_ndim = ind.ndim - (arr.ndim - 1)  # inserted dimensions
    dest_dims = list(range(axis)) + [None] + list(range(axis+ins_ndim, ind.ndim))
    # could also call np.ix_ here with some dummy arguments, then throw those results away
    inds = []
    for dim, n in zip(dest_dims, arr.shape):
        if dim is None:
            inds.append(ind)
        else:
            ind_shape_dim = ind_shape[:dim] + (-1,) + ind_shape[dim+1:]
            inds.append(np.arange(n).reshape(ind_shape_dim))
    return arr[tuple(inds)]
which yields
>>> A = np.array([[3,2,1],[4,0,6]])
>>> B = np.array([[3,1,4],[1,5,9]])
>>> i = A.argsort(axis=-1)
>>> take_along_axis(A,i,axis=-1)
array([[1, 2, 3],
[0, 4, 6]])
>>> take_along_axis(B,i,axis=-1)
array([[4, 1, 3],
[5, 1, 9]])
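Later NumPy releases ship this as a built-in, np.take_along_axis (available since NumPy 1.15, as far as I know), so on a reasonably recent installation the hand-written helper may not be needed. A minimal sketch using the same A and B:

```python
import numpy as np

A = np.array([[3, 2, 1], [4, 0, 6]])
B = np.array([[3, 1, 4], [1, 5, 9]])
i = np.argsort(A, axis=-1)

# Reorder both arrays with the permutation that sorts A along the last axis
sorted_A = np.take_along_axis(A, i, axis=-1)       # [[1, 2, 3], [0, 4, 6]]
B_sorted_by_A = np.take_along_axis(B, i, axis=-1)  # [[4, 1, 3], [5, 1, 9]]
```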

This argsort produces a (3,2) array
In [453]: idx=np.argsort(A,axis=-1)
In [454]: idx
Out[454]:
array([[0, 1],
[1, 0],
[0, 1]], dtype=int32)
As you note, applying this to A to get the equivalent of np.sort(A, axis=-1) isn't obvious. The iterative solution is to sort each row (a 1d case) with:
In [459]: np.array([x[i] for i,x in zip(idx,A)])
Out[459]:
array([[-1.0856306 , 0.99734545],
[-1.50629471, 0.2829785 ],
[-0.57860025, 1.65143654]])
While probably not the fastest, it is probably the clearest solution, and a good starting point for conceptualizing a better solution.
The tuple(inds) from the take solution is:
(array([[0],
[1],
[2]]),
array([[0, 1],
[1, 0],
[0, 1]], dtype=int32))
In [470]: A[_]
Out[470]:
array([[-1.0856306 , 0.99734545],
[-1.50629471, 0.2829785 ],
[-0.57860025, 1.65143654]])
In other words:
In [472]: A[np.arange(3)[:,None], idx]
Out[472]:
array([[-1.0856306 , 0.99734545],
[-1.50629471, 0.2829785 ],
[-0.57860025, 1.65143654]])
The first part is what np.ix_ would construct, but it does not 'like' the 2d idx.
Looks like I explored this topic a couple of years ago
argsort for a multidimensional ndarray
a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
I tried to explain what is going on. The take function does the same sort of thing, but constructs the indexing tuple for a more general case (dimensions and axis). Generalizing to more dimensions, but still with axis=-1 should be easy.
For the first axis, A[np.argsort(A,axis=0),np.arange(2)] works.
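Both 2-D forms can be checked against np.sort. The sketch below does that for a seeded random (3, 2) array; the seed and the names by_rows and by_cols are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(123)
A = rng.standard_normal((3, 2))

# Last axis: row numbers as a column vector, argsort result as column indices
by_rows = A[np.arange(3)[:, None], np.argsort(A, axis=-1)]

# First axis: argsort result as row indices, column numbers broadcast across
by_cols = A[np.argsort(A, axis=0), np.arange(2)]
```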

We just need to use advanced indexing to index along all axes with index arrays. We can use np.ogrid to create open grids of range arrays along all axes and then replace only the entry for the input axis with the input indices. Finally, index into the data array with those indices for the desired output. Thus, essentially, we would have -
# Inputs : arr, ind, axis
# list(...) so the grid can be modified even if np.ogrid returns a tuple
idx = list(np.ogrid[tuple(map(slice, ind.shape))])
idx[axis] = ind
out = arr[tuple(idx)]
To make it functional and add error checks, let's create two functions: one to get those indices, and a second to take the data array and simply index it. The idea with the first function is to get indices that can be re-used for indexing into any array with the necessary number of dimensions and lengths along each axis.
Hence, the implementations would be -
def advindex_allaxes(ind, axis):
    axis = np.core.multiarray.normalize_axis_index(axis,ind.ndim)
    idx = list(np.ogrid[tuple(map(slice, ind.shape))])
    idx[axis] = ind
    return tuple(idx)
def take_along_axis(arr, ind, axis):
    return arr[advindex_allaxes(ind, axis)]
Sample runs -
In [161]: A = np.array([[3,2,1],[4,0,6]])
In [162]: B = np.array([[3,1,4],[1,5,9]])
In [163]: i = A.argsort(axis=-1)
In [164]: take_along_axis(A,i,axis=-1)
Out[164]:
array([[1, 2, 3],
[0, 4, 6]])
In [165]: take_along_axis(B,i,axis=-1)
Out[165]:
array([[4, 1, 3],
[5, 1, 9]])
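The sample runs only cover the last axis of a 2-D array. Here is a self-contained sketch of the same open-grid idea on a 3-D array along the middle axis. The function name take_along_axis_ogrid is made up, and the axis is normalised with a plain modulo instead of NumPy's internal helper, which assumes the axis is already within range.

```python
import numpy as np

def take_along_axis_ogrid(arr, ind, axis):
    axis = axis % ind.ndim  # simple normalisation; no range check
    # Open grid: one range array per axis, each broadcastable to ind.shape
    idx = list(np.ogrid[tuple(map(slice, ind.shape))])
    idx[axis] = ind  # swap in the real indices for the chosen axis
    return arr[tuple(idx)]

rng = np.random.default_rng(0)
arr = rng.standard_normal((2, 3, 4))
ind = np.argsort(arr, axis=1)
out = take_along_axis_ogrid(arr, ind, axis=1)
```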

Multi-dimensional gather in Tensorflow

The general solution to this question is being worked on in this github issue, but I was wondering if there are workarounds using tf.gather (or something else) to achieve array indexing using a multi-index. One solution I came up with was to broadcast multiply each index in the multi-idx with the cumulative product of the tensor shape, which produces indices suitable for indexing the flattened tensor:
import tensorflow as tf
import numpy as np
def __cumprod(l):
    # Get the length and make a copy
    ll = len(l)
    l = [v for v in l]
    # Reverse cumulative product
    for i in range(ll-1):
        l[ll-i-2] *= l[ll-i-1]
    return l
def ravel_multi_index(tensor, multi_idx):
    """
    Returns a tensor suitable for use as the index
    on a gather operation on argument tensor.
    """
    if not isinstance(tensor, (tf.Variable, tf.Tensor)):
        raise TypeError('tensor should be a tf.Variable')
    if not isinstance(multi_idx, list):
        multi_idx = [multi_idx]
    # Shape of the tensor in ints
    shape = [i.value for i in tensor.get_shape()]
    if len(shape) != len(multi_idx):
        raise ValueError("Tensor rank is different "
                         "from the multi_idx length.")
    # Work out the shape of each tensor in the multi_idx
    idx_shape = [tuple(j.value for j in i.get_shape()) for i in multi_idx]
    # Ensure that each multi_idx tensor is length 1
    assert all(len(i) == 1 for i in idx_shape)
    # Create a list of reshaped indices. New shape will be
    # [1, 1, dim[0], 1] for the 3rd index in multi_idx
    # for example.
    reshaped_idx = [tf.reshape(idx, [1 if i !=j else dim[0]
                                     for j in range(len(shape))])
                    for i, (idx, dim)
                    in enumerate(zip(multi_idx, idx_shape))]
    # Figure out the base indices for each dimension
    base = __cumprod(shape)
    # Now multiply base indices by each reshaped index
    # to produce the flat index
    return (sum(b*s for b, s in zip(base[1:], reshaped_idx[:-1]))
            + reshaped_idx[-1])
# Shape and slice starts and sizes
shape = (Z, Y, X) = 4, 5, 6
Z0, Y0, X0 = 1, 1, 1
ZS, YS, XS = 3, 3, 4
# Numpy matrix and index
M = np.random.random(size=shape)
idx = [
np.arange(Z0, Z0+ZS).reshape(ZS,1,1),
np.arange(Y0, Y0+YS).reshape(1,YS,1),
np.arange(X0, X0+XS).reshape(1,1,XS),
]
# Tensorflow matrix and indices
TM = tf.Variable(M)
TF_flat_idx = ravel_multi_index(TM, [
tf.range(Z0, Z0+ZS),
tf.range(Y0, Y0+YS),
tf.range(X0, X0+XS)])
TF_data = tf.gather(tf.reshape(TM,[-1]), TF_flat_idx)
with tf.Session() as S:
    S.run(tf.initialize_all_variables())
    # Obtain data via flat indexing
    data = S.run(TF_data)
    # Check that it agrees with data obtained
    # by numpy smart indexing (the list of index arrays
    # is passed as a tuple, one array per axis)
    assert np.all(data == M[tuple(idx)])
However, this only works on tensors of rank 3, due to a (current) limitation that restricts broadcasts to tensors of rank 3.
At the moment I can only think of doing a chained gather, transpose, gather, transpose, gather, but this is unlikely to be efficient. e.g.
shape = (8, 9, 10)
A = tf.random_normal(shape)
data = tf.gather(tf.transpose(tf.gather(A, [1, 3]), [1,0,2]), ...)
Any ideas?

It sounds like you want gather_nd.
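For intuition, here is a NumPy-only sketch of what I understand gather_nd to compute when the last axis of the index array has length equal to the tensor rank: each index row is a full coordinate and selects one element. It is not TensorFlow code; the names indices, gathered and via_flat are illustrative, and the flat-index route from the question is included for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((4, 5, 6))

# One full (z, y, x) coordinate per output element, shape (3, 3, 4, 3)
zz, yy, xx = np.meshgrid(np.arange(1, 4), np.arange(1, 4), np.arange(1, 5),
                         indexing="ij")
indices = np.stack([zz, yy, xx], axis=-1)

# gather_nd-style lookup: split the last axis into one index array per axis
gathered = M[tuple(np.moveaxis(indices, -1, 0))]

# Flat-index route: z*Y*X + y*X + x into the flattened array
flat = np.ravel_multi_index((zz, yy, xx), M.shape)
via_flat = M.ravel()[flat]
```

Both results match the plain slice M[1:4, 1:4, 1:5], which is the region the question's example selects.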
