Can anyone recommend a way to do a reverse cumulative sum on a numpy array?
Where 'reverse cumulative sum' is defined as below (I welcome any corrections on the name for this procedure):
if
x = np.array([0,1,2,3,4])
then
np.cumsum(x)
gives
array([0,1,3,6,10])
However, I would like to get
array([10,10,9,7,4])
Can anyone suggest a way to do this?
This does it:
np.cumsum(x[::-1])[::-1]
You can use .flipud() for this as well, which is equivalent to [::-1]
https://docs.scipy.org/doc/numpy/reference/generated/numpy.flipud.html
In [0]: x = np.array([0,1,2,3,4])
In [1]: np.flipud(np.flipud(x).cumsum())
Out[1]: array([10, 10, 9, 7, 4])
.flip() is new as of NumPy 1.12, and combines the .flipud() and .fliplr() into one API.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.flip.html
This is equivalent, and has fewer function calls:
np.flip(np.flip(x, 0).cumsum(), 0)
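As a quick sanity check, here is a minimal sketch comparing the three spellings mentioned so far on the example array from the question. The variable names (by_slicing, by_flipud, by_flip) are just illustrative.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4])

# Reverse, accumulate, reverse again: three equivalent spellings.
by_slicing = np.cumsum(x[::-1])[::-1]
by_flipud = np.flipud(np.flipud(x).cumsum())
by_flip = np.flip(np.flip(x, 0).cumsum(), 0)

print(by_slicing)  # [10 10  9  7  4]
print(np.array_equal(by_slicing, by_flipud))  # True
print(np.array_equal(by_slicing, by_flip))    # True
```

All three return a reversed view of a temporary array, so x itself is left untouched.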
The answers given so far all seem inefficient if you want the result stored in the original array. Also, if you want a copy, keep in mind that they return a view rather than a contiguous array, so np.ascontiguousarray() is still needed.
How about
view=np.flip(x, 0)
np.cumsum(view, 0, out=view)
#x contains the reverse cumsum result and remains contiguous and unflipped
This modifies the flipped view of x, which writes the data in reverse order back into the original x variable. It leaves no non-contiguous views at the end of execution and is about as fast as possible. I am guessing numpy will never add a reversecumsum method, mainly because the technique I describe is so trivial and efficient. That said, an explicit method might be ever so slightly more efficient.
Otherwise if a copy is desired, then the extra flip is required AND conversion back to a contiguous array, mainly if it will be used in many vector operations thereafter. A tricky part of numpy, but views and contiguity are something to be careful with if you are seriously interested in performance.
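To make the view/contiguity point concrete, here is a small sketch of both variants. It assumes a 1-D integer array; the names rev_view and rev_copy are only for illustration.

```python
import numpy as np

# In-place variant: accumulate through a reversed view so the result lands in x.
x = np.array([0, 1, 2, 3, 4])
view = np.flip(x, 0)
np.cumsum(view, axis=0, out=view)
print(x)                        # [10 10  9  7  4]
print(x.flags["C_CONTIGUOUS"])  # True

# Copy variant: the double flip yields a negatively strided view of a temporary,
# so an explicit contiguous copy is made before heavy vectorised use.
y = np.array([0, 1, 2, 3, 4])
rev_view = np.flip(np.cumsum(np.flip(y, 0)), 0)
rev_copy = np.ascontiguousarray(rev_view)
print(rev_view.flags["C_CONTIGUOUS"])  # False
print(rev_copy.flags["C_CONTIGUOUS"])  # True
```

Whether the extra copy pays off depends on how often the result is reused, so it is worth timing on your own data.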
I'm just learning python, but have decided to do so by recoding and improving some old java based school AI project.
My project involved a mathematical operation that is basically a discrete convolution operation, but without one of the functions time reversed.
So, while in my original java project I just wrote all the code to do the operation myself, since I'm working in python, and it's got great math libraries like numpy and scipy, I figured I could just make use of an existing convolution function like scipy.convolve. However, this would require me to pre-reverse one of the two arrays so that when scipy.convolve runs, and reverses one of the arrays to perform the convolution, it's really un-reversing the array. (I also still don't know how I can be sure to pre-reverse the right one of the two arrays so that the two arrays are still slid past each other both forwards rather than both backwards, but I assume I should ask that as a separate question.)
Unlike my java code, which only handled one dimensional data, I wanted to extend this project to multidimensional data. And so, while I have learned that if I had a numpy array of known dimension, such as a three dimensional array a, I could fully reverse the array (or rather get back a view that is reversed, which is much faster), by
a = a[::-1, ::-1, ::-1]
However, this requires me to have a ::-1 for every dimension. How can I perform this same reversal within a method for an array of arbitrary dimension that has the same result as the above code?
You can use np.flip. From the documentation:
numpy.flip(m, axis=None)
Reverse the order of elements in an array along the given axis.
The shape of the array is preserved, but the elements are reordered.
Note: flip(m) corresponds to m[::-1,::-1,...,::-1] with ::-1 at all positions.
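A short check of that note, assuming a NumPy version recent enough for np.flip to accept axis=None (1.15 or later, as far as I know). The array shape is arbitrary.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

flipped = np.flip(a)          # axis=None reverses every axis
manual = a[::-1, ::-1, ::-1]  # explicit version for a known 3-D array

print(np.array_equal(flipped, manual))  # True
print(np.shares_memory(flipped, a))     # True: a view, no data copied
```

Because the result is a view, writing into flipped also changes a.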
This is a possible solution:
slices = tuple([slice(-1, -n-1, -1) for n in a.shape])
result = a[slices]
extends to arbitrary number of axes. Verification:
a = np.arange(8).reshape(2, 4)
slices = tuple([slice(-1, -n-1, -1) for n in a.shape])
result = a[slices]
yields:
>>> a
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
>>> result
array([[7, 6, 5, 4],
       [3, 2, 1, 0]])
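The same idea can be wrapped in a small helper. This is only a sketch: reverse_all_axes is a made-up name, and it uses slice(None, None, -1), which is the object form of ::-1, instead of the explicit bounds above.

```python
import numpy as np

def reverse_all_axes(a):
    """Return a view of `a` reversed along every axis (hypothetical helper)."""
    return a[(slice(None, None, -1),) * a.ndim]

a = np.arange(8).reshape(2, 4)
result = reverse_all_axes(a)
print(result.tolist())                      # [[7, 6, 5, 4], [3, 2, 1, 0]]
print(np.array_equal(result, np.flip(a)))   # True
```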
I couldn't find the solution to a performance enhancement problem.
I have a 1D array and I would like to compute sums over sliding windows of indices, here is an example code:
import numpy as np
input = np.linspace(1, 100, 100)
list_of_indices = [[0, 10], [5, 15], [45, 50]] #just an example
output = np.array([input[idx[0]: idx[1]].sum() for idx in list_of_indices])
The computation of the output array is extremely slow compared to numpy vectorised built-in functions.
In real life my list_of_indices contains tens of thousands [lower bound, upper bound] pairs, and this loop is definitely the bottle-neck of a high performance python script.
How can I deal with this using numpy internal functions such as masks, a clever np.einsum, or something similar?
Since I work in HPC field, I am also concerned by memory consumption.
Does anyone have an answer for this problem while respecting the performance requirements?
If:
input is about the same length as output or shorter
The output values have similar magnitude
...you could create a cumsum of your input values. Then the summations turn into subtractions.
cs = np.zeros(len(input) + 1, dtype=np.float32) # or float64 if you need it
cs[1:] = np.cumsum(input, dtype=np.float32) # leading 0, so cs[k] == input[:k].sum()
loi = np.array(list_of_indices, dtype=np.uint16)
output = cs[loi[:,1]] - cs[loi[:,0]]
The numerical hazard here is loss of precision if input has runs of large and tiny values. Then cumsum may not be accurate enough for you.
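Here is a self-contained sketch of the cumulative-sum idea, checked against the original loop. It prepends a zero to the cumulative sum so that cs[k] equals the sum of the first k elements, which makes cs[hi] - cs[lo] match the half-open slice [lo:hi]. The names values and bounds are my own (values stands in for input, to avoid shadowing the builtin), and float64 is used to keep rounding error small.

```python
import numpy as np

values = np.linspace(1, 100, 100)
list_of_indices = [[0, 10], [5, 15], [45, 50]]

# cs[k] == values[:k].sum(), with cs[0] == 0
cs = np.concatenate(([0.0], np.cumsum(values, dtype=np.float64)))
bounds = np.asarray(list_of_indices, dtype=np.intp)

# One vectorised subtraction per window instead of one Python-level sum each
output = cs[bounds[:, 1]] - cs[bounds[:, 0]]

# Reference result from the original list comprehension
reference = np.array([values[lo:hi].sum() for lo, hi in list_of_indices])
print(np.allclose(output, reference))  # True (window sums are 55, 105 and 240)
```

The extra memory is one array of len(values) + 1 floats, independent of the number of windows.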
Here's a simple approach to try: Keep the same solution structure as you already have, which presumably works. Just make the storage creation and indexing more efficient. If you are summing many elements from input for most indexes, the summation ought to take more time than the for looping. For example:
# Put all the indices in a nice efficient structure:
idxx = np.hstack((np.array(list_of_indices, dtype=np.uint16),
                  np.arange(len(list_of_indices), dtype=np.uint16)[:,None]))
# Allocate appropriate data type to the precision and range you need,
# Do it in one go to be time-efficient
output = np.zeros(len(list_of_indices), dtype=np.float32)
for idx0, idx1, idxo in idxx:
    output[idxo] = input[idx0:idx1].sum()
If len(list_of_indices) > 2**16, or if any index exceeds 65535, use uint32 rather than uint16.
I know that this question has partly been answered, but I am looking specifically at numpy and scipy. Say I have a grid
lGrid = linspace(0.1, 8, 50)
and I want to find the index that corresponds best to 2, I do
index = abs(lGrid-2).argmin()
lGrid[index]
2.034
However, what if I have a whole matrix of values instead of 2 here. I guess iteration is pretty slow. abs(lGrid-[2,4]) however will fail due to shape issues. I will need a solution that is easily extendable to N-dim matrices. What is the best course of action in this environment?
You can use broadcasting:
from numpy import arange,linspace,argmin,newaxis
vals = arange(30).reshape(2,5,3) #your N-dimensional input, like array([2,4])
lGrid = linspace(0.1, 8, 50)
result = argmin(abs(lGrid-vals[...,newaxis]),axis=-1)
For example, with input vals = array([2,4]), you obtain result = array([12, 24]) and lGrid[result] = array([ 2.03469388, 3.96938776]).
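Spelled out with the np. prefix, the broadcasting step looks roughly like this. The intermediate name diff is mine; it is kept only to show the shape of the temporary array that broadcasting creates.

```python
import numpy as np

lGrid = np.linspace(0.1, 8, 50)
vals = np.array([2, 4])

# vals[..., np.newaxis] has shape vals.shape + (1,), so the subtraction
# broadcasts to shape vals.shape + (len(lGrid),)
diff = np.abs(lGrid - vals[..., np.newaxis])
result = diff.argmin(axis=-1)

print(diff.shape)  # (2, 50)
print(result)      # [12 24]
```

Note that the temporary holds vals.size * len(lGrid) floats, which matters for large inputs.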
You "guess that iteration is pretty slow", but I guess it isn't. So I would just iterate over the "whole matrix of values instead of 2". Perhaps:
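def nearest_grid_values(BigArray, lGrid):
    # Generator: yields the grid value closest to each element of BigArray
    for val in BigArray.flatten():
        index = abs(lGrid-val).argmin()
        yield lGrid[index]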
If lGrid is fairly large, then the overhead of iterating in a Python for loop is probably not big in comparison to the vectorised operation happening inside it.
There might be a way you can use broadcasting and reshaping to do the whole thing in one giant operation, but would be complicated, and you might accidentally allocate such a huge array that your machine slows down to a crawl.
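If the grid is sorted (as a linspace grid is), one way to sidestep the large temporary is np.searchsorted. This is a different technique from the ones suggested above, offered only as a sketch; nearest_index is a hypothetical helper name, and it assumes a 1-D ascending grid with at least two points.

```python
import numpy as np

def nearest_index(grid, vals):
    """Index of the grid point nearest to each value (grid sorted ascending)."""
    grid = np.asarray(grid)
    vals = np.asarray(vals)
    # Position of the first grid point >= val, clipped so both neighbours exist
    right = np.clip(np.searchsorted(grid, vals), 1, len(grid) - 1)
    left = right - 1
    # Pick whichever neighbour is closer (ties go to the left neighbour)
    use_left = (vals - grid[left]) <= (grid[right] - vals)
    return np.where(use_left, left, right)

lGrid = np.linspace(0.1, 8, 50)
print(nearest_index(lGrid, [2, 4]))  # [12 24]
```

The memory use is proportional to the size of vals rather than to vals.size * len(grid), and the result keeps the shape of vals.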
I need to slice an array so that index zero is assumed for every dimension except the first.
Given an array:
x = numpy.zeros((3,3,3))
I would like the following behavior, but without needing to know the number of dimensions beforehand:
y = x[:,0,0]
Essentially I am looking for something that would take the place of Ellipsis, but instead of expanding to the needed number of : objects, it would expand into the needed number of zeros.
Is there anything built in for this? If not, what is the best way to get the functionality that I need?
Edit:
One way to do this is to use:
y = x.ravel(order='F')[0:x.shape[0]]
This works fine, however in some cases (such as mine) ravel will need to create a copy of the array instead of a view. Since I am working with large arrays, I want a more memory efficient way of doing this.
You could create a indexing tuple, like this:
x = np.arange(3*3*3).reshape(3,3,3)
s = (slice(None),) + (0,)*(x.ndim-1)
print(x[s])      # [ 0  9 18]
print(x[:,0,0])  # [ 0  9 18]
I guess you could also do:
x.transpose().flat[:3]
but I prefer the first approach, since it works for any dimension (rather than only the first), and it's obviously equally efficient to just writing x[:,0,0], since it's just a different syntax.
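Wrapped as a function, the indexing-tuple approach might look like the sketch below. The name first_axis_line and the axis parameter are my own additions, meant to illustrate the "works for any dimension" remark.

```python
import numpy as np

def first_axis_line(x, axis=0):
    """Slice `axis` fully and take index 0 on every other axis (hypothetical helper)."""
    index = [0] * x.ndim
    index[axis] = slice(None)
    return x[tuple(index)]  # basic indexing, so this is a view

x = np.arange(3 * 3 * 3).reshape(3, 3, 3)
y = first_axis_line(x)
print(y.tolist())               # [0, 9, 18]
print(np.shares_memory(y, x))   # True: no copy is made
```

Since only basic slicing is involved, no copy of the large array is created, which addresses the memory concern in the question.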
I usually use tom10's method, but here's another:
for i in range(x.ndim-1):
    x = x[...,0]
I have the following challenge in a simulation for my PhD thesis:
I need to optimize the following code:
repelling_forces = repelling_force_prefactor * np.exp(-(height_r_t/potential_steepness))
In this code snippet 'height_r_t' is a real Numpy array and 'potential_steepness' is a scalar. 'repelling_force_prefactor' is also a Numpy array, which is mostly ZERO, but ONE at pre-calculated positions, which do NOT change during runtime (i.e. a mask).
Obviously the code is inefficient as it would make much more sense to only calculate the exponential function at the positions, where 'repelling_force_prefactor' is non-zero.
The question is how do I do this in the most efficient manner?
The only idea I have up to now would be to define slice to 'height_r_t' using 'repelling_force_prefactor' and apply 'np.exp' to those slices. However, I have made the experience that slicing is slow (not sure if this is generally correct) and the solution seems awkward.
Just as a side-note, the ratio of 1's to 0's in 'repelling_force_prefactor' is about 1/1000 and I am running this in a loop, so efficiency is very important.
(Comment: I wouldn't have a problem with resorting to Cython, as I will need/want to learn it at some point anyway... but I am a novice, so I'd need a good pointer/explanation.)
Masked arrays are implemented exactly for this purpose.
Performance is the same as Sven's answer:
height_r_t = np.ma.masked_where(repelling_force_prefactor == 0, height_r_t)
repelling_forces = np.ma.exp(-(height_r_t/potential_steepness))
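The advantage of masked arrays is that you do not have to slice and expand your array: the shape always stays the same, and the result of exp is simply masked wherever the input is masked. Whether the masked entries are actually skipped during the computation is an implementation detail, so it is worth timing this against the index-based approach.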
Also, you can sum arrays with different masks, and the resulting array is masked wherever either operand is masked (the union of the masks).
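A small sketch of the masked-array variant on made-up data. The numbers are purely illustrative, and filled(0.0) is used to get back a plain array with zeros at the masked positions.

```python
import numpy as np

height_r_t = np.array([0.5, 1.0, 2.0, 4.0])
repelling_force_prefactor = np.array([0.0, 1.0, 0.0, 1.0])
potential_steepness = 2.0

# Mask every position where the prefactor is zero
masked_height = np.ma.masked_where(repelling_force_prefactor == 0, height_r_t)
repelling_forces = np.ma.exp(-(masked_height / potential_steepness))
dense = repelling_forces.filled(0.0)  # plain ndarray, zeros where masked

# Adding two masked arrays: the result is masked wherever either operand is
a = np.ma.masked_array([1, 2, 3], mask=[True, False, False])
b = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
combined_mask = np.ma.getmaskarray(a + b)
print(combined_mask.tolist())  # [True, True, False]
```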
Slicing is probably much faster than computing all the exponentials. Instead of using the mask repelling_force_prefactor for slicing directly, I suggest precomputing the indices where it is non-zero and using them for slicing:
# before the loop
indices = np.nonzero(repelling_force_prefactor)
# inside the loop
repelling_forces = np.exp(-(height_r_t[indices]/potential_steepness))
Now repelling_forces will contain only the results that are non-zero. If you have to update some array of the original shape of height_r_t with these values, you can use slicing with indices again, or use np.put() or a similar function.
Slicing with the list of indices will be more efficient than slicing with a boolean mask in this case, since the list of indices is shorter by a factor of a thousand. Actually measuring the performance is of course up to you.
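Putting the pieces together, a sketch of the precomputed-indices approach that also writes the results back into a full-shaped array. The array sizes, the random data and the two mask positions are invented for illustration, and the prefactor is assumed to be exactly 0 or 1 as in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
height_r_t = rng.random((20, 50))
potential_steepness = 0.5

# Sparse 0/1 mask with two active positions (made-up locations)
repelling_force_prefactor = np.zeros((20, 50))
repelling_force_prefactor[3, 7] = 1.0
repelling_force_prefactor[12, 40] = 1.0

# Before the loop: find the active positions once
indices = np.nonzero(repelling_force_prefactor)
repelling_forces = np.zeros_like(height_r_t)

# Inside the loop: evaluate exp only at the active positions and scatter back
repelling_forces[indices] = np.exp(-(height_r_t[indices] / potential_steepness))

# Reference: the original dense expression
full = repelling_force_prefactor * np.exp(-(height_r_t / potential_steepness))
print(np.allclose(repelling_forces, full))  # True
```

If the prefactor could take values other than 0 and 1, the scattered values would need to be multiplied by repelling_force_prefactor[indices] as well.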