Apply function n items at a time along axis - python

I am looking for a way to apply a function n items at a time along an axis. E.g.
array([[ 1, 2],
[ 3, 4],
[ 5, 6],
[ 7, 8]])
If I apply sum across the rows 2 items at a time I get:
array([[ 4, 6],
[ 12, 14]])
Which is the sum of 1st 2 rows and the last 2 rows.
NB: I am dealing with a much larger array, and the number of items n that the function is applied to is only decided at runtime.
The data also extends along other axes. E.g.
array([[... [ 1, 2, ...],
[ 3, 4, ...],
[ 5, 6, ...],
[ 7, 8, ...],
...], ...])

This is a reduction:
numpy.add.reduceat(a, [0,2])
>>> array([[ 4, 6],
[12, 14]], dtype=int32)
As long as by "larger" you mean longer in the "y" axis, you can extend:
a = numpy.array([[ 1, 2],
[ 3, 4],
[ 5, 6],
[ 7, 8],
[ 9, 10],
[11, 12]])
numpy.add.reduceat(a, [0,2,4])
>>> array([[ 4, 6],
[12, 14],
[20, 22]], dtype=int32)
EDIT: actually, this works fine for "larger in both dimensions", too:
a = numpy.arange(24).reshape(6,4)
numpy.add.reduceat(a, [0,2,4])
>>> array([[ 4, 6, 8, 10],
[20, 22, 24, 26],
[36, 38, 40, 42]], dtype=int32)
I will leave it up to you to adapt the indices to your specific case.
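If n is only known at runtime, one way to build those indices is np.arange with a step of n. This is a minimal sketch; the helper name sum_every_n is made up for illustration. Note that reduceat puts any leftover rows into a final, shorter group rather than dropping them.

```python
import numpy as np

def sum_every_n(a, n, axis=0):
    # Hypothetical helper: start a new group every n items along `axis`.
    starts = np.arange(0, a.shape[axis], n)
    return np.add.reduceat(a, starts, axis=axis)

a = np.arange(24).reshape(6, 4)
print(sum_every_n(a, 2))          # same grouping as reduceat(a, [0, 2, 4])
print(sum_every_n(a, 2, axis=1))  # groups pairs of columns instead of rows
```

Other ufuncs with a reduceat method (np.maximum, np.multiply, ...) can be swapped in for np.add in the same way.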

Reshape so that the first axis is split into two axes, the second of length n, which gives a 3D array; then sum along that second axis (this assumes the number of rows is a multiple of n), like so -
a.reshape(a.shape[0]//n,n,a.shape[1]).sum(1)
It should be pretty efficient, as reshaping just creates a view into the input array.
Sample run -
In [55]: a
Out[55]:
array([[2, 8, 0, 0],
[1, 5, 3, 3],
[6, 1, 4, 7],
[0, 4, 0, 7],
[8, 0, 8, 1],
[8, 3, 3, 8]])
In [56]: n = 2 # Sum every two rows
In [57]: a.reshape(a.shape[0]//n,n,a.shape[1]).sum(1)
Out[57]:
array([[ 3, 13, 3, 3],
[ 6, 5, 4, 14],
[16, 3, 11, 9]])
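The same reshape idea can be pointed at any axis and any reducing function. The following is a sketch under a few assumptions: the helper name reduce_blocks is invented here, func must accept an axis keyword (as np.sum, np.max and np.mean do), and trailing items that do not fill a whole block are dropped.

```python
import numpy as np

def reduce_blocks(a, n, axis=0, func=np.sum):
    # Hypothetical helper, not part of NumPy.
    a = np.moveaxis(np.asarray(a), axis, 0)   # bring the target axis to the front
    n_blocks = a.shape[0] // n                # an incomplete trailing block is dropped
    blocks = a[:n_blocks * n].reshape(n_blocks, n, *a.shape[1:])
    reduced = func(blocks, axis=1)            # collapse each block of n items
    return np.moveaxis(reduced, 0, axis)      # put the reduced axis back in place

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
print(reduce_blocks(a, 2))                         # sums the rows in pairs
print(reduce_blocks(a.T, 2, axis=1, func=np.max))  # max over pairs of columns
```

When the target axis is not the first one, the reshape may have to copy the data instead of returning a view.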

How about something like this?
import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])  # sample array from the question
n = 2
# calculate the cumsum along axis 0 and take one row from every n rows
cumarr = arr.cumsum(axis=0)[(n-1)::n]
# calculate the difference of the resulting numpy array along axis 0
np.vstack((cumarr[0][None, :], np.diff(cumarr, axis=0)))
# array([[ 4, 6],
# [12, 14]])

Related

How can I use numpy.sum to output sum as 2d array without a for loop?

I would like to get the sum of each team's scores in each round. Currently I am only able to get the sum of each team's scores from round 1.
Current Input
scores = np.array([
[1, 2, 2, 3, 5, 8, 12], #Round 1
[11, 3, 9, 2, 3, 5, 10]] # Round 2
)
teams = np.array([[0, 1, 2], [1, 2, 3], [6, 5, 4]])
np.sum(np.take(scores, teams), axis=1)
This outputs the correct sum for each team in round 1
array([ 5, 7, 25])
Is there a way to make it output the sum for each team in each round without using a for loop?
Desired Output
array([[ 5, 7, 25], [23, 14, 18]])
You can add axis=1 to np.take, and then sum on axis 2:
>>> np.take(scores, teams, axis=1).sum(axis=2)
array([[ 5, 7, 25],
[23, 14, 18]])
You didn't specify an axis, so take, as documented, works with the flattened array:
In [216]: np.take(scores.ravel(),teams)
Out[216]:
array([[ 1, 2, 2],
[ 2, 2, 3],
[12, 8, 5]])
Using teams to index the columns:
In [220]: scores[:,teams]
Out[220]:
array([[[ 1, 2, 2],
[ 2, 2, 3],
[12, 8, 5]],
[[11, 3, 9],
[ 3, 9, 2],
[10, 5, 3]]])
and summing:
In [221]: scores[:,teams].sum(axis=2)
Out[221]:
array([[ 5, 7, 25],
[23, 14, 18]])
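To see why the sum is taken over axis 2, it can help to look at the intermediate shape. This sketch reuses the arrays from the question; the variable names picked, per_round and via_take are only illustrative.

```python
import numpy as np

scores = np.array([[1, 2, 2, 3, 5, 8, 12],      # round 1
                   [11, 3, 9, 2, 3, 5, 10]])    # round 2
teams = np.array([[0, 1, 2], [1, 2, 3], [6, 5, 4]])

picked = scores[:, teams]          # axes: (round, team, member)
print(picked.shape)                # (2, 3, 3)

per_round = picked.sum(axis=2)     # add up the members of each team
via_take = np.take(scores, teams, axis=1).sum(axis=2)
print(per_round)
```

Both forms index the columns of scores with teams, so they produce the same array.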

Numpy flatten subarray while maintaining the shape

I have been going over this issue with numpy for a while and can't figure out if there is an intuitive way of converting the array while maintaining the position of the sub-arrays. The size of the array will change depending on the input, so doing it manually with concatenate is not an option, but I do have the dimensions.
a= np.array([[[0,1],[2,3]],[[4,5],[6,7]],[[8,9],[10,11]],[[12,13],[14,15]]])
Reshaping just flattens the array like
[0,1,2,3]
[4,5,6,7]
etc.
I have also tried np.block, but apart from setting the positions manually I have not had any success.
The result I would like to get in this case is (4,4):
[[ 0, 1, 4, 5],
[ 2, 3, 6, 7],
[ 8, 9,12,13],
[10,11,14,15]]
Do any of you smart people know if there is something in numpy that I could use to get this result?
Your original has 16 consecutive values; here they are reshaped into a 4d array:
In [67]: x=np.arange(16).reshape(2,2,2,2)
In [68]: x
Out[68]:
array([[[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]]],
[[[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15]]]])
Reshape to (4,4) keeps that original order - see the 0,1,2,3...
In [69]: x.reshape(4,4)
Out[69]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
You want to swap values:
In [70]: x.transpose(0,2,1,3)
Out[70]:
array([[[[ 0, 1],
[ 4, 5]],
[[ 2, 3],
[ 6, 7]]],
[[[ 8, 9],
[12, 13]],
[[10, 11],
[14, 15]]]])
which can then be reshaped to (4,4):
In [71]: x.transpose(0,2,1,3).reshape(4,4)
Out[71]:
array([[ 0, 1, 4, 5],
[ 2, 3, 6, 7],
[ 8, 9, 12, 13],
[10, 11, 14, 15]])
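Since the sizes change with the input, the reshape/transpose/reshape steps can be wrapped in a small function. This is only a sketch: the name tile_blocks and the grid_rows/grid_cols parameters are assumptions, standing in for the dimensions the question says are known.

```python
import numpy as np

def tile_blocks(blocks, grid_rows, grid_cols):
    # Hypothetical helper: lay out (n, h, w) blocks on a grid_rows x grid_cols grid.
    n, h, w = blocks.shape
    if n != grid_rows * grid_cols:
        raise ValueError("number of blocks does not match the grid")
    grid = blocks.reshape(grid_rows, grid_cols, h, w)   # (block row, block col, h, w)
    grid = grid.transpose(0, 2, 1, 3)                   # (block row, h, block col, w)
    return grid.reshape(grid_rows * h, grid_cols * w)

a = np.arange(16).reshape(4, 2, 2)   # same values as the array in the question
print(tile_blocks(a, 2, 2))
```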

How can I calculate the sum of n-elements in a numpy array in python?

Imagine I have an n x d NumPy array, e.g. a=np.array([[1,2,3],[4,5,6], [7,8,9], [10,11,12], [13,14,15]])
so in this case n=5, d=3. Imagine I also have some number c which is less than or equal to n, and what I want to calculate is the following:
Consider every column independently and calculate the sum of every c values; e.g. if c=2, the solution would be
solution=np.array([[1+4, 2+5, 3+6], [7+10,8+11,9+12]])
The last row is skipped because 5 mod 2 = 1, so we need to leave out one line in the end;
If c=1, the solution would be the original array and if e.g. c=3 the solution would be
solution=np.array([[1+4+7, 2+5+8, 3+6+9]]), while the last two lines are omitted;
Now what would be the most elegant and efficient solution to do that? I have searched a lot online but could not find a similar problem
Here's one way -
def sum_in_blocks(a, c):
    # Get extent of each col for summing
    l = c*(len(a)//c)
    # Reshape to 3D considering first l rows, and "cutting" after each c rows
    # Then sum along second axis
    return a[:l].reshape(-1,c,a.shape[1]).sum(1)
More info on second step - General idea for nd to nd transformation.
Sample runs -
In [79]: a
Out[79]:
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15]])
In [80]: sum_in_blocks(a, c=1)
Out[80]:
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15]])
In [81]: sum_in_blocks(a, c=2)
Out[81]:
array([[ 5, 7, 9],
[17, 19, 21]])
In [82]: sum_in_blocks(a, c=3)
Out[82]: array([[12, 15, 18]])
Explanation with given sample
In [84]: a
Out[84]:
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15]])
In [85]: c = 2
In [87]: l = c*(len(a)//c) # = 4; Get extent of each col for summing
In [89]: a[:l] # so the leftover rows are dropped
Out[89]:
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
# Reshape to 3D "cutting" after every c=2 rows
In [90]: a[:l].reshape(-1,c,a.shape[1])
Out[90]:
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 7, 8, 9],
[10, 11, 12]]])
# Sum along axis=1 for final o/p
In [91]: a[:l].reshape(-1,c,a.shape[1]).sum(axis=1)
Out[91]:
array([[ 5, 7, 9],
[17, 19, 21]])

Why does tensordot/reshape not agree with kron?

If I define an array X with shape (2, 2):
X = np.array([[1, 2], [3, 4]])
and take the kronecker product, then reshape the output using
np.kron(X, X).reshape((2, 2, 2, 2))
I get a resulting matrix:
array([[[[ 1, 2],
[ 2, 4]],
[[ 3, 4],
[ 6, 8]]],
[[[ 3, 6],
[ 4, 8]],
[[ 9, 12],
[12, 16]]]])
However, when I use np.tensordot(X, X, axes=0) the following matrix is output
array([[[[ 1, 2],
[ 3, 4]],
[[ 2, 4],
[ 6, 8]]],
[[[ 3, 6],
[ 9, 12]],
[[ 4, 8],
[12, 16]]]])
which is different from the first output. Why is this the case? I found this while searching for answers, however I don't understand why that solution works or how to generalise to higher dimensions.
My first question is, why do you expect them to be the same?
Let's do the kron without reshaping:
In [403]: X = np.array([[1, 2],
...: [3, 4]])
...:
In [404]: np.kron(X,X)
Out[404]:
array([[ 1, 2, 2, 4],
[ 3, 4, 6, 8],
[ 3, 6, 4, 8],
[ 9, 12, 12, 16]])
It's easy to visualize the action.
[X*1, X*2
X*3, X*4]
tensordot normally is thought of as a generalization of np.dot, able to handle more complex situations than the common matrix product (i.e. sum of products on one or more axes). But here there's no summing.
In [405]: np.tensordot(X,X, axes=0)
Out[405]:
array([[[[ 1, 2],
[ 3, 4]],
[[ 2, 4],
[ 6, 8]]],
[[[ 3, 6],
[ 9, 12]],
[[ 4, 8],
[12, 16]]]])
When axes is an integer rather than a tuple, the action is a little tricky to understand. The docs say:
``axes = 0`` : tensor product :math:`a\otimes b`
I recently tried to explain what happens when axes is a scalar (it's not trivial) in the question "How does numpy.tensordot function works step-by-step?"
Specifying axes=0 is equivalent to providing this tuple:
np.tensordot(X,X, axes=([],[]))
In any case it's evident from the output that this tensordot is producing the same numbers - but the layout is different from the kron.
I can replicate the kron layout with
In [424]: np.tensordot(X,X,axes=0).transpose(0,2,1,3).reshape(4,4)
Out[424]:
array([[ 1, 2, 2, 4],
[ 3, 4, 6, 8],
[ 3, 6, 4, 8],
[ 9, 12, 12, 16]])
That is, I swap the middle 2 axes.
And omitting the reshape, I get the same (2,2,2,2) you get from kron:
np.tensordot(X,X,axes=0).transpose(0,2,1,3)
I like the explicitness of np.einsum:
np.einsum('ij,kl->ijkl',X,X) # = tensordot(X,X,0)
np.einsum('ij,kl->ikjl',X,X) # = kron(X,X).reshape(2,2,2,2)
Or using broadcasting, the 2 products are:
X[:,:,None,None]*X[None,None,:,:] # tensordot 0
X[:,None,:,None]*X[None,:,None,:] # kron
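As for generalising: the same axis swap works for two different, non-square matrices, as far as I can tell. The sketch below checks this numerically; the arrays A and B and their shapes are arbitrary choices for illustration.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)        # shape (m, n)
B = np.arange(20).reshape(4, 5)       # shape (p, q)
m, n = A.shape
p, q = B.shape

outer = np.tensordot(A, B, axes=0)            # outer[i, j, k, l] = A[i, j] * B[k, l]
kron4d = np.kron(A, B).reshape(m, p, n, q)    # kron4d[i, k, j, l] = A[i, j] * B[k, l]

# Swapping the two middle axes turns one layout into the other.
print(np.array_equal(outer.transpose(0, 2, 1, 3), kron4d))
print(np.array_equal(np.einsum('ij,kl->ikjl', A, B), kron4d))
print(np.array_equal(outer.transpose(0, 2, 1, 3).reshape(m * p, n * q), np.kron(A, B)))
```

All three comparisons should print True.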

Numpy: replace each element in a row by the maximum of other elements in the same row

Let's say we have a 2-D array like this:
>>> a
array([[1, 1, 2],
[0, 2, 2],
[2, 2, 0],
[0, 2, 0]])
For each line I want to replace each element by the maximum of the 2 others in the same line.
I've found how to do it for each column separately, using numpy.amax and an identity array, like this:
>>> np.amax(a*(1-np.eye(3)[0]), axis=1)
array([ 2., 2., 2., 2.])
>>> np.amax(a*(1-np.eye(3)[1]), axis=1)
array([ 2., 2., 2., 0.])
>>> np.amax(a*(1-np.eye(3)[2]), axis=1)
array([ 1., 2., 2., 2.])
But I would like to know if there is a way to avoid a for loop and get directly the result which in this case should look like this:
>>> numpy_magic(a)
array([[2, 2, 1],
[2, 2, 2],
[2, 2, 2],
[2, 0, 2]])
Edit: after a few hours playing in the console, I've finally come up with the solution I was looking for. Be ready for some mind blowing one line code:
np.amax(a[[range(a.shape[0])]*a.shape[1],:][(np.eye(a.shape[1]) == 0)[:,[range(a.shape[1])*a.shape[0]]].reshape(a.shape[1],a.shape[0],a.shape[1])].reshape((a.shape[1],a.shape[0],a.shape[1]-1)),axis=2).transpose()
array([[2, 2, 1],
[2, 2, 2],
[2, 2, 2],
[2, 0, 2]])
Edit2: Paul has suggested a much more readable and faster alternative which is:
np.max(a[:, np.where(~np.identity(a.shape[1], dtype=bool))[1].reshape(a.shape[1], -1)], axis=-1)
After timing these 3 alternatives, both of Paul's solutions are 4 times faster in every context (I benchmarked 2, 3 and 4 columns with 200 rows). Congratulations on these amazing pieces of code!
Last Edit (sorry): after replacing np.identity with np.eye which is faster, we now have the fastest and most concise solution:
np.max(a[:, np.where(~np.eye(a.shape[1], dtype=bool))[1].reshape(a.shape[1], -1)], axis=-1)
Here are two solutions, one that is specifically designed for max and a more general one that works for other operations as well.
For every position except that of the row maximum itself, the maximum of the other elements is simply the maximum of the entire row; at the position of the row maximum it is the second-largest value. So we can use argpartition to cheaply find the indices of the two largest elements, put the second-largest value in the position of the largest, and put the largest value everywhere else. This also works for more than 3 columns.
>>> a
array([[6, 0, 8, 8, 0, 4, 4, 5],
[3, 1, 5, 0, 9, 0, 3, 6],
[1, 6, 8, 3, 4, 7, 3, 7],
[2, 1, 6, 2, 9, 1, 8, 9],
[7, 3, 9, 5, 3, 7, 4, 3],
[3, 4, 3, 5, 8, 2, 2, 4],
[4, 1, 7, 9, 2, 5, 9, 6],
[5, 6, 8, 5, 5, 3, 3, 3]])
>>>
>>> M, N = a.shape
>>> result = np.empty_like(a)
>>> largest_two = np.argpartition(a, N-2, axis=-1)
>>> rng = np.arange(M)
>>> result[...] = a[rng, largest_two[:, -1], None]
>>> result[rng, largest_two[:, -1]] = a[rng, largest_two[:, -2]]
>>>
>>> result
array([[8, 8, 8, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 6, 9, 9, 9],
[8, 8, 7, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 9, 9, 9, 9],
[9, 9, 7, 9, 9, 9, 9, 9],
[8, 8, 8, 8, 5, 8, 8, 8],
[9, 9, 9, 9, 9, 9, 9, 9],
[8, 8, 6, 8, 8, 8, 8, 8]])
This solution depends on specific properties of max.
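For reuse, the argpartition steps can be wrapped in a function and checked against a slow column-by-column loop. This is a sketch: the names max_of_others and brute_force are invented here, and the function assumes a 2-D array with at least two columns.

```python
import numpy as np

def max_of_others(a):
    # Hypothetical wrapper around the argpartition steps; needs >= 2 columns.
    M, N = a.shape
    order = np.argpartition(a, N - 2, axis=-1)
    rows = np.arange(M)
    top, second = order[:, -1], order[:, -2]   # column indices of the two largest
    result = np.empty_like(a)
    result[...] = a[rows, top, None]           # row maximum everywhere ...
    result[rows, top] = a[rows, second]        # ... except at the maximum itself
    return result

def brute_force(a):
    # Reference implementation: drop one column at a time.
    return np.column_stack([np.delete(a, j, axis=1).max(axis=1)
                            for j in range(a.shape[1])])

a = np.array([[1, 1, 2], [0, 2, 2], [2, 2, 0], [0, 2, 0]])
print(max_of_others(a))
print(np.array_equal(max_of_others(a), brute_force(a)))
```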
A more general solution, which for example also works for sum instead of max, is the following. Glue two copies of a together (side-by-side, not on top of each other), so the rows look like a0 a1 a2 a3 a0 a1 a2 a3. For an index x we can then get everything except ax by slicing [x+1:x+4] (in general [x+1:x+N] for N columns). To do this in a vectorized way we use stride_tricks:
>>> a
array([[2, 6, 0],
[5, 0, 0],
[5, 0, 9],
[6, 4, 4],
[5, 0, 8],
[1, 7, 5],
[9, 7, 7],
[4, 4, 3]])
>>> M, N = a.shape
>>> aa = np.c_[a, a]
>>> ast = np.lib.stride_tricks.as_strided(aa, (M, N+1, N-1), aa.strides + aa.strides[1:])
>>> result = np.max(ast[:, 1:, :], axis=-1)
>>> result
array([[6, 2, 6],
[0, 5, 5],
[9, 9, 5],
[4, 6, 6],
[8, 8, 5],
[7, 5, 7],
[7, 9, 9],
[4, 4, 4]])
# use sum instead of max
>>> result = np.sum(ast[:, 1:, :], axis=-1)
>>> result
array([[ 6, 2, 8],
[ 0, 5, 5],
[ 9, 14, 5],
[ 8, 10, 10],
[ 8, 13, 5],
[12, 6, 8],
[14, 16, 16],
[ 7, 7, 8]])
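as_strided can read outside the array if the shape or strides are wrong, so a bounds-checked variant may be preferable. The sketch below swaps in sliding_window_view, which assumes NumPy 1.20 or newer; the helper name reduce_others is made up, and func must accept an axis keyword.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def reduce_others(a, func=np.max):
    # Hypothetical helper: apply func to "all elements but this one", per row.
    M, N = a.shape
    aa = np.concatenate([a, a], axis=1)                # a0 .. aN-1 a0 .. aN-1
    windows = sliding_window_view(aa, N - 1, axis=1)   # windows[:, j] == aa[:, j:j+N-1]
    others = windows[:, 1:N + 1, :]                    # window x+1 skips element x
    return func(others, axis=-1)

a = np.array([[2, 6, 0], [5, 0, 0], [5, 0, 9]])
print(reduce_others(a))               # maximum of the other elements
print(reduce_others(a, func=np.sum))  # sum of the other elements
```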
List comprehension solution.
np.array([np.amax(a * (1 - np.eye(3)[j]), axis=1) for j in range(a.shape[1])]).T
Similar to @Ethan's answer but with np.delete(), np.max(), and np.dstack():
np.dstack([np.max(np.delete(a, i, 1), axis=1) for i in range(a.shape[1])])
array([[[2, 2, 1],
[2, 2, 2],
[2, 2, 2],
[2, 0, 2]]])
delete() "filters" out each column successively;
max() finds the row-wise maximum of the remaining two columns;
dstack() stacks the resulting 1d arrays depth-wise, so the result has shape (1, 4, 3).
If you have more than 3 columns, note that this will find the maximum of "all other" columns rather than the "2-greatest" columns per row. For example:
a2 = np.arange(25).reshape(5,5)
np.dstack([np.max(np.delete(a2, i, 1), axis=1) for i in range(a2.shape[1])])
array([[[ 4, 4, 4, 4, 3],
[ 9, 9, 9, 9, 8],
[14, 14, 14, 14, 13],
[19, 19, 19, 19, 18],
[24, 24, 24, 24, 23]]])
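The extra pair of brackets in the output above comes from np.dstack promoting each 1d array to 3d before stacking. If a plain 2d result is wanted, np.column_stack is one alternative; the sketch below compares the two on the array from the question.

```python
import numpy as np

a = np.array([[1, 1, 2], [0, 2, 2], [2, 2, 0], [0, 2, 0]])
cols = [np.delete(a, i, axis=1).max(axis=1) for i in range(a.shape[1])]

stacked_3d = np.dstack(cols)         # shape (1, 4, 3)
stacked_2d = np.column_stack(cols)   # shape (4, 3)
print(stacked_3d.shape, stacked_2d.shape)
print(stacked_2d)
```

Indexing the dstack result with [0] gives the same 2d array.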
