I have:
import numpy as np
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7, ..., 4])
x = (B/position**2)*dt
A = np.cumsum(x)
assert A[0] == 0 # I want this to be true.
Where B and dt are scalar constants. This is for a numerical integration problem with initial condition of A[0] = 0. Is there a way to set A[0] = 0 and then do a cumsum for everything else?
I don't understand exactly what your problem is, but here are two things you can do to get A[0] = 0.
You can create A to be longer by one index to have the zero as the first entry:
# initialize example data
import numpy as np
B = 1
dt = 1
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7])
# do calculation
A = np.zeros(len(position) + 1)
A[1:] = np.cumsum((B/position**2)*dt)
Result:
A = [ 0. 0.0625 0.11559096 0.16105356 0.20073547 0.23633533 0.26711403]
len(A) == len(position) + 1
Alternatively, you can manipulate the calculation to subtract the first entry of the result:
# initialize example data
import numpy as np
B = 1
dt = 1
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7])
# do calculation
A = np.cumsum((B/position**2)*dt)
A = A - A[0]
Result:
[ 0. 0.05309096 0.09855356 0.13823547 0.17383533 0.20461403]
len(A) == len(position)
As you see, the results have different lengths. Is one of them what you expect?
1D cumsum
A wrapper around np.cumsum that sets the first element to 0:
def cumsum(pmf):
    cdf = np.empty(len(pmf) + 1, dtype=pmf.dtype)
    cdf[0] = 0
    np.cumsum(pmf, out=cdf[1:])
    return cdf
Example usage:
>>> np.arange(1, 11)
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> cumsum(np.arange(1, 11))
array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55])
N-D cumsum
A wrapper around np.cumsum that sets the first element to 0 and works with N-D arrays:
def cumsum(pmf, axis=None, dtype=None):
    if axis is None:
        pmf = pmf.reshape(-1)
        axis = 0
    if dtype is None:
        dtype = pmf.dtype
    idx = [slice(None)] * pmf.ndim
    # Create array with an extra element along the cumsummed axis.
    shape = list(pmf.shape)
    shape[axis] += 1
    cdf = np.empty(shape, dtype)
    # Set the first element to 0.
    idx[axis] = 0
    cdf[tuple(idx)] = 0
    # Perform cumsum on the remaining elements.
    idx[axis] = slice(1, None)
    np.cumsum(pmf, axis=axis, dtype=dtype, out=cdf[tuple(idx)])
    return cdf
Example usage:
>>> np.arange(1, 11).reshape(2, 5)
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])
>>> cumsum(np.arange(1, 11).reshape(2, 5), axis=-1)
array([[ 0,  1,  3,  6, 10, 15],
       [ 0,  6, 13, 21, 30, 40]])
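The axis=None case flattens the input first (mirroring np.cumsum's default), so the result is 1-D with a leading zero:
>>> cumsum(np.arange(1, 7).reshape(2, 3))
array([ 0,  1,  3,  6, 10, 15, 21])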
I totally understand your pain; I wonder why NumPy doesn't allow this with np.cumsum. Anyway, though I'm really late and there's already another good answer, I prefer this one a bit more:
np.cumsum(np.pad(array, (1, 0), "constant"))
where array in your case is (B/position**2)*dt. You can change the order of np.pad and np.cumsum as well. I'm just adding a zero to the start of the array and calling np.cumsum.
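Since prepending a zero commutes with taking the running sum, both orders give the same result. A quick sanity check, assuming B, dt, and position are defined as in the question:
x = (B/position**2)*dt
A1 = np.cumsum(np.pad(x, (1, 0), "constant"))
A2 = np.pad(np.cumsum(x), (1, 0), "constant")
assert np.allclose(A1, A2) and A1[0] == 0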
You can also use np.roll (shift right by one) and then set the first entry to zero.
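A minimal sketch of that idea; note that, unlike the padding approaches, it keeps len(A) == len(position) by discarding the final cumulative value:
A = np.roll(np.cumsum((B/position**2)*dt), 1)  # shift right by one; the total wraps to the front
A[0] = 0                                       # overwrite the wrapped-around total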
I have a 2d numpy array called arm_resets that holds positive integers. The first column contains only positive integers < 360. For all columns other than the first, I need to replace all values over 360 with the value that is in the same row in the first column. I thought this would be a relatively easy thing to do; here's what I have:
i = 300
over_360 = arm_resets[:, [i]] >= 360
print(arm_resets[:, [i]][over_360])
print(arm_resets[:, [0]][over_360])
arm_resets[:, [i]][over_360] = arm_resets[:, [0]][over_360]
print(arm_resets[:, [i]][over_360])
And here's what prints:
[3600 3609 3608 ... 3600 3611 3605]
[ 0 9 8 ... 0 11 5]
[3600 3609 3608 ... 3600 3611 3605]
Since all the numbers shown in the first print (first 3 and last 3) are above 360, they should have been replaced by the values from the second print in the third print. Why is this not working?
edit: reproducible example:
import numpy as np
import pandas as pd

df = pd.DataFrame({"start": [1, 2, 5, 6], "freq": [1, 5, 6, 9]})
periods = 6
arm_resets = df[["start"]].values
freq = df[["freq"]].values
arm_resets = np.pad(arm_resets, ((0, 0), (0, periods - 1)))
for i in range(1, periods):
    arm_resets[:, [i]] = arm_resets[:, [i-1]] + freq
    #over_360 = arm_resets[:, [i]] >= periods
    #arm_resets[:, [i]][over_360] = arm_resets[:, [0]][over_360]
arm_resets
With those lines commented out, here's what prints:
array([[ 1,  2,  3,  4,  5,  6],
       [ 2,  7, 12, 17, 22, 27],
       [ 3,  9, 15, 21, 27, 33],
       [ 4, 13, 22, 31, 40, 49]])
What I would expect:
array([[1, 2, 3, 4, 5, 1],
       [2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4, 4]])
Now if it helps, the final 2d array I'm actually trying to create is a 1/0 array that indicates which positions are filled in, so in this example I'd want this:
array([[0, 1, 1, 1, 1, 1],
       [0, 0, 1, 0, 0, 0],
       [0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0]])
The code I use to achieve this from the above arm_resets is this:
fin = np.zeros((len(arm_resets), periods), dtype=int)
for i in range(len(arm_resets)):
    fin[i, arm_resets[i]] = 1
The expression arm_resets[:, [i]] is a fancy index, so it makes a copy of the ith column of the data. arm_resets[:, [i]][over_360] = ... therefore calls __setitem__ on a temporary array that is discarded as soon as the statement executes. If you want to assign through the mask, call __setitem__ on the sliced object directly:
arm_resets[over_360, [i]] = ...
You also don't need to make the index into a list. It's generally better to use simple indices, especially when doing assignments, since they create views rather than copies:
arm_resets[over_360, i] = ...
With slicing, even the following should work, since it calls __setitem__ on a view:
arm_resets[:, i][over_360] = ...
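A tiny demonstration of the copy-versus-view distinction on toy data (hypothetical array, just for illustration):
a = np.arange(6).reshape(2, 3)
a[:, [0]][0, 0] = 99   # fancy index: the assignment hits a temporary copy
print(a[0, 0])         # -> 0, unchanged
a[:, 0][0] = 99        # basic slice: the assignment goes through a view
print(a[0, 0])         # -> 99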
Looping over each column i is unnecessary, though: you can process the entire matrix in one step if you use integer indices rather than a boolean mask. The reason indices are useful here is that they let you match each offending element to the correct row of the first column:
rows, cols = np.nonzero(arm_resets[:, 1:] >= 360)
arm_resets[rows, cols + 1] = arm_resets[rows, 0]  # cols is offset by 1 because of the slice
You can use np.where()
first_col = arm_resets[:, 0]                      # first column
first_col = first_col.reshape(first_col.size, 1)  # reshape into a 2d column
arm_resets = np.where(arm_resets >= 360, first_col, arm_resets)
You can see in detail how np.where works here, but basically it compares arm_resets >= 360; where the condition is true it takes the value from first_col (there is another detail here with broadcasting), and where it is false it keeps the arm_resets value.
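To make the broadcasting detail concrete, here is a toy example (hypothetical data, just for illustration) where the (n, 1) column is stretched across every column of the comparison:
a = np.array([[10, 400],
              [20, 500]])
fc = a[:, 0].reshape(-1, 1)        # shape (2, 1)
print(np.where(a >= 360, fc, a))   # fc broadcasts to shape (2, 2)
# [[10 10]
#  [20 20]]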
Edit: As suggested by Mad Physicist, you can use arm_resets[:, 0, None] directly instead of creating the first_col variable:
arm_resets = np.where(arm_resets >= 360, arm_resets[:, 0, None], arm_resets)
I am having trouble creating a function that takes a matrix M as input, deletes BOTH rows and columns containing the number 0, and returns the remaining numbers. Any help is much appreciated as I have my programming exam coming up soon.
By "deleting both rows and columns", this is what I mean:
import numpy as np

x = np.array([[ 1,  2,  3,  4,  5],
              [ 6,  0,  8,  9, 10],
              [11, 12, 13, 14, 15],
              [16,  0,  0, 19, 20]])

idxs_array = list(np.where(x == 0))
idxs_array = [list(dict.fromkeys(idxs)) for idxs in idxs_array]
for axis, idxs in enumerate(idxs_array):
    sub_factor = 0
    for idx in idxs:
        x = np.delete(x, idx - sub_factor, axis)
        sub_factor += 1
print(x)
# x = [[ 1,  4,  5],
#      [11, 14, 15]]
1. Locate zero elements
First of all, we need to identify the location of the zero elements in the matrix, which can be done easily with np.where().
np.where returns the row/column indices of the elements that match a specific condition (doc).
row_idx, col_idx = np.where(arr == 0)
2. Remove corresponding rows/columns
An easy way to remove the corresponding rows and columns is boolean indexing (doc).
That is, you mark each row (or column) you want to keep with True and each one you want to drop with False.
print(np.arange(4)[[True, False, True, False]])
# array([0, 2])
3. Put two things together
Here is a minimal example.
arr = np.array([[ 1,  2,  3,  4,  5],
                [ 6,  0,  8,  9, 10],
                [11, 12, 13, 14, 15],
                [16,  0,  0, 19, 20]])
row_idx, col_idx = np.where(arr == 0)
rm_row_idx = set(row_idx.tolist())
rm_col_idx = set(col_idx.tolist())
row_mask = [i not in rm_row_idx for i in range(arr.shape[0])]
col_mask = [i not in rm_col_idx for i in range(arr.shape[1])]
arr = arr[row_mask, :]
arr = arr[:, col_mask]
print(arr)
# Shall be:
# array([[ 1,  4,  5],
#        [11, 14, 15]])
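As a side note, the two masks can also be applied in a single step with np.ix_, which converts boolean masks into an open mesh of integer indices; note it must run on the original arr, before the two-step filtering above overwrites it:
arr[np.ix_(row_mask, col_mask)]  # same result, in one indexing operation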
I have a numpy 2d array and I need to transform it so that the first row remains the same, the second row moves one position to the right (it can wrap around, or just have a zero padded at the front), the third row shifts two positions to the right, and so on.
I can do this with a for loop, but that is not very efficient. I am guessing there should be a filtering matrix that, multiplied by the original one, has the same effect, or maybe a numpy trick that will help me do this? Thanks!
I have looked into numpy.roll() but I don't think it can work on each row separately.
import numpy as np

p = np.array([[ 1,  2,  3,  4],
              [ 5,  6,  7,  8],
              [ 9, 10, 11, 12],
              [13, 14, 15, 16]])
'''
p  = [  1  2  3  4
        5  6  7  8
        9 10 11 12
       13 14 15 16]

desired output:

p' = [  1  2  3  4
        0  5  6  7
        0  0  9 10
        0  0  0 13]
'''
We can extract sliding windows from a zero-padded version of the input, giving a memory-efficient and hence performant approach. To get those windows, we can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided. More info on the use of as_strided-based view_as_windows.
Hence, the solution would be -
from skimage.util.shape import view_as_windows

def slide_by_one(p):
    m, n = p.shape
    z = np.zeros((m, m-1), dtype=p.dtype)
    a = np.concatenate((z, p), axis=1)
    w = view_as_windows(a, (1, p.shape[1]))[..., 0, :]
    r = np.arange(m)
    return w[r, r[::-1]]
Sample run -
In [60]: p  # generic sample of size m x n
Out[60]:
array([[ 1,  5,  9, 13, 17],
       [ 2,  6, 10, 14, 18],
       [ 3,  7, 11, 15, 19],
       [ 4,  8, 12, 16, 20]])

In [61]: slide_by_one(p)
Out[61]:
array([[ 1,  5,  9, 13, 17],
       [ 0,  2,  6, 10, 14],
       [ 0,  0,  3,  7, 11],
       [ 0,  0,  0,  4,  8]])
We can leverage the regular ramp pattern to get a more efficient approach with a more raw usage of np.lib.stride_tricks.as_strided, like so -
def slide_by_one_v2(p):
    m, n = p.shape
    z = np.zeros((m, m-1), dtype=p.dtype)
    a = np.concatenate((z, p), axis=1)
    s0, s1 = a.strides
    return np.lib.stride_tricks.as_strided(a[:, m-1:], shape=(m, n), strides=(s0-s1, s1))
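One caveat worth noting: as_strided returns a view into the zero-padded buffer, so it is safest to copy the result before mutating it, e.g.:
out = slide_by_one_v2(p).copy()  # materialize the strided view before writing to it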
Another one with some masking -
def slide_by_one_v3(p):
    m, n = p.shape
    z = np.zeros((len(p), 1), dtype=p.dtype)
    a = np.concatenate((p, z), axis=1)
    return np.triu(a[:, ::-1], 1)[:, ::-1].flat[:-m].reshape(m, -1)
Here is a simple method based on zero-padding and reshaping. It is fast because it avoids advanced indexing and other overheads.
def pp(p):
    m, n = p.shape
    aux = np.zeros((m, n+m-1), p.dtype)
    np.copyto(aux[:, :n], p)
    return aux.ravel()[:-m].reshape(m, n+m-2)[:, :n].copy()
I have two numpy arrays of integers, A and B. The values in arrays A and B correspond to time points at which events A and B occurred. I would like to transform A to contain the time since the most recent event in B occurred.
I know I need to subtract each element of A by its nearest smaller element of B, but am unsure of how to do so. Any help would be greatly appreciated.
>>> import numpy as np
>>> A = np.array([11, 12, 13, 17, 20, 22, 33, 34])
>>> B = np.array([5, 10, 15, 20, 25, 30])
Desired Result:
cond_a = relative_timestamp(to_transform=A, reference=B)
cond_a
>>> array([1, 2, 3, 2, 0, 2, 3, 4])
You can use np.searchsorted to find the indices where the elements of A would be inserted in B to maintain order. In other words, you are finding the closest element in B for each element in A:
idx = np.searchsorted(B, A, side='right')
result = A - B[idx-1]  # subtract one for the proper index
According to the docs searchsorted uses binary search, so it will scale fine for large inputs.
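Running this on the sample arrays from the question reproduces the desired result:
A = np.array([11, 12, 13, 17, 20, 22, 33, 34])
B = np.array([5, 10, 15, 20, 25, 30])
idx = np.searchsorted(B, A, side='right')
print(A - B[idx - 1])
# [1 2 3 2 0 2 3 4]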
Here's an approach based on computing the pairwise differences. Note that it has O(n**2) complexity, so for larger arrays @brenlla's answer will perform much better.
The idea here is to use np.subtract.outer and then find the minimum difference along axis 1 over a masked array, where only values in B no larger than the corresponding element of A are considered:
dif = np.abs(np.subtract.outer(A,B))
np.ma.array(dif, mask = A[:,None] < B).min(1).data
# array([1, 2, 3, 2, 0, 2, 3, 4])
As I am not sure it is really faster to calculate all pairwise differences rather than use a Python loop over each array entry (worst case O(len(A)+len(B))), here is the solution with a loop:
A = np.array([11, 12, 13, 17, 20, 22, 33, 34])
B = np.array([5, 10, 15, 20, 25, 30])
def calculate_next_distance(to_transform, reference):
    max_reference = len(reference) - 1
    current_reference = 0
    transformed_values = np.zeros_like(to_transform)
    for i, value in enumerate(to_transform):
        while current_reference < max_reference and reference[current_reference+1] <= value:
            current_reference += 1
        transformed_values[i] = value - reference[current_reference]
    return transformed_values
calculate_next_distance(A,B)
# array([1, 2, 3, 2, 0, 2, 3, 4])
I have a numpy matrix M and I need to apply some operation to all the rows of the matrix, except for certain rows.
For example, suppose rows [3, 5] should be excluded from an operation like M[:, 8] = 4. So I want every row of the 8th column to be set to 4, except rows 3 and 5. How can I do this in numpy?
Edit: basically I need this to avoid a division by zero when normalizing each row by the sum of its elements. Some rows are all zeros, so the summation is zero, and dividing by it raises a division by zero. I find out which rows are all zeros, and I want to skip the normalization operation for those specific rows.
Perhaps something like this?
>>> import numpy as np
>>> M = np.arange(32).reshape(8, 4)
>>> ignore = {3, 5}
>>> rest = [i for i in range(M.shape[0]) if i not in ignore]
>>> M[rest, 3] = 4
>>> M
array([[ 0,  1,  2,  4],
       [ 4,  5,  6,  4],
       [ 8,  9, 10,  4],
       [12, 13, 14, 15],
       [16, 17, 18,  4],
       [20, 21, 22, 23],
       [24, 25, 26,  4],
       [28, 29, 30,  4]])
Based on your edit, in order to solve your specific problem, where you seem to be manipulating a matrix with non-negative entries, you may exploit the following trick:
import numpy as np
rng = np.random.RandomState(42)
M = rng.randn(10, 10) ** 2
M[[0, 5]] = 0. # set 2 lines to 0
M_norm = M / (M.sum(axis=1) + 1e-18)[:, np.newaxis]
Obviously this result is not exact, but it is exact enough that you won't notice the difference. To make it slightly better, you can also write
M_norm = M / np.maximum(M.sum(axis=1), 1e-18)[:, np.newaxis]
If this still isn't sufficient, and you want it exact, for the general case (negativity allowed) you can write
row_sums = M.sum(axis=1)
row_sums[row_sums == 0] = 1.
M_norm = M / row_sums[:, np.newaxis] # dividing the zeros by 1 still yields 0
To add some robustness, you could also do
tolerance = 1e-6
row_sums = M.sum(axis=1)
OK_rows = np.abs(row_sums) > tolerance
M_norm = np.zeros_like(M)
M_norm[OK_rows] = M[OK_rows] / row_sums[OK_rows][:, np.newaxis]
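As a quick check on the example matrix from above (where rows 0 and 5 were zeroed out), the normalized rows sum to one while the zero rows stay exactly zero:
print(M_norm.sum(axis=1))  # ~1.0 for the normal rows, exactly 0.0 for rows 0 and 5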