Let's say we have a 1d numpy array filled with some int values. And let's say that some of them are 0.
Is there any way, using numpy array's power, to fill all the 0 values with the last non-zero values found?
for example:
arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
fill_zeros_with_last(arr)
print(arr)
[1 1 1 2 2 4 6 8 8 8 8 8 2]
A way to do it would be with this function:
def fill_zeros_with_last(arr):
    last_val = None  # I don't really care about the initial value
    for i in range(arr.size):
        if arr[i]:
            last_val = arr[i]
        elif last_val is not None:
            arr[i] = last_val
However, this uses a raw Python for loop instead of taking advantage of numpy's and scipy's power.
If we knew that only a reasonably small number of consecutive zeros were possible, we could use something based on numpy.roll. The problem is that the number of consecutive zeros is potentially large...
Any ideas? Or should we go straight to Cython?
Disclaimer:
I think that long ago I found a question on Stack Overflow asking something like this, or very similar, but I wasn't able to find it again. :-(
Maybe I missed the right search terms; if so, sorry for the duplicate. Or maybe it was just my imagination...
Here's a solution using np.maximum.accumulate:
def fill_zeros_with_last(arr):
    prev = np.arange(len(arr))
    prev[arr == 0] = 0
    prev = np.maximum.accumulate(prev)
    return arr[prev]
We construct an array prev which has the same length as arr, such that prev[i] is the index of the last non-zero entry at or before the i-th entry of arr. For example, if:
>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
Then prev looks like:
array([ 0, 0, 0, 3, 3, 5, 6, 7, 7, 7, 7, 7, 12])
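To make the construction concrete, here is a small sketch (assuming the same arr) showing the intermediate array before and after the accumulate step:

import numpy as np

arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])

prev = np.arange(len(arr))           # [0, 1, 2, ..., 12]
prev[arr == 0] = 0                   # zero out the indices where arr is 0
# -> [ 0,  0,  0,  3,  0,  5,  6,  7,  0,  0,  0,  0, 12]
prev = np.maximum.accumulate(prev)   # the running maximum carries the last non-zero index forward
# -> [ 0,  0,  0,  3,  3,  5,  6,  7,  7,  7,  7,  7, 12]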
Then we just index into arr with prev and we obtain our result. A test:
>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
>>> fill_zeros_with_last(arr)
array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])
Note: Be careful to understand what this does when the first entry of your array is zero:
>>> fill_zeros_with_last(np.array([0,0,1,0,0]))
array([0, 0, 1, 1, 1])
Inspired by jme's answer here and by Bas Swinckels' (in the linked question) I came up with a different combination of numpy functions:
def fill_zeros_with_last(arr, initial=0):
    ind = np.nonzero(arr)[0]
    cnt = np.cumsum(np.array(arr, dtype=bool))
    return np.where(cnt, arr[ind[cnt-1]], initial)
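A quick check, assuming the same test array plus one with a leading zero to exercise the initial argument:

>>> fill_zeros_with_last(np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2]))
array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])
>>> fill_zeros_with_last(np.array([0, 0, 1, 0, 0]), initial=-1)
array([-1, -1,  1,  1,  1])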
I think it's succinct and also works, so I'm posting it here for the record. Still, jme's is also succinct and easy to follow and seems to be faster, so I'm accepting it :-)
If the 0s only come in runs of length 1, this use of nonzero might work:
In [266]: arr=np.array([1,0,2,3,0,4,0,5])
In [267]: I=np.nonzero(arr==0)[0]
In [268]: arr[I] = arr[I-1]
In [269]: arr
Out[269]: array([1, 1, 2, 3, 3, 4, 4, 5])
I can handle your arr by applying this repeatedly until I is empty.
In [286]: arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
In [287]: while True:
   .....:     I = np.nonzero(arr==0)[0]
   .....:     if len(I) == 0: break
   .....:     arr[I] = arr[I-1]
   .....:
In [288]: arr
Out[288]: array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])
If the runs of 0s are long, it might be better to find those runs and handle them as a block. But if most runs are short, this repeated application may be the fastest route.
Related
Is there any efficient way to replace all zeros in a tensor with the last non-zero value in torch?
For example if I had the tensor:
tensor([[1, 0, 0, 4, 0, 5, 0, 0],
        [0, 3, 0, 6, 0, 0, 8, 0]])
The output should be:
tensor([[1, 1, 1, 4, 4, 5, 5, 5],
        [0, 3, 3, 6, 6, 6, 8, 8]])
I currently have the following code:
def replace_zeros_with_prev_nonzero(tensor):
    output = tensor.clone()
    for i in range(len(output)):
        prev_value = 0
        for j in range(len(tensor[i])):
            if tensor[i, j] == 0:
                output[i, j] = prev_value
            else:
                prev_value = tensor[i, j].item()
    return output
But it feels a bit clunky, and I'm sure there has to be a better way to do this. So is it possible to write it in fewer lines, or better yet, parallelise the operation without treating the tensors as arrays?
You can remove one of the loops by vectorising over the first dimension.
def replace_zeros_with_prev_nonzero(tensor):
    output = tensor.clone()
    for i in range(1, tensor.shape[1]):
        mask = tensor[:, i] == 0
        output[mask, i] = output[mask, i-1]
    return output
output[mask, i] = output[mask, i-1] replaces each 0 with the value from the previous column (which has itself already been replaced if it was originally 0, except at column 0).
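As a quick sanity check, a sketch assuming the tensor from the question:

import torch

t = torch.tensor([[1, 0, 0, 4, 0, 5, 0, 0],
                  [0, 3, 0, 6, 0, 0, 8, 0]])
print(replace_zeros_with_prev_nonzero(t))
# tensor([[1, 1, 1, 4, 4, 5, 5, 5],
#         [0, 3, 3, 6, 6, 6, 8, 8]])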
I'd like to find equal values in an array, and their indices, if they occur consecutively more than 2 times.
[0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4]
So in this example I would find that the value "2" occurred "4" times, starting from position "8". Is there any built-in function to do that?
I found a way with collections.Counter
collections.Counter(a)
# Counter({2: 5, 1: 4, 0: 3, 3: 2, 4: 1})
but this is not what I am looking for.
Of course I can write a loop, compare two values and then count them, but maybe there is a more elegant solution?
Find consecutive runs and length of runs with condition
import numpy as np
arr = np.array([0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4])
res = np.ones_like(arr)
np.bitwise_xor(arr[:-1], arr[1:], out=res[1:]) # set equal, consecutive elements to 0
# use this for np.floats instead
# arr = np.array([0, 3, 0, 1, 0, 1, 2, 1, 2.4, 2.4, 2.4, 2, 1, 3, 4, 4, 4, 5])
# res = np.hstack([True, ~np.isclose(arr[:-1], arr[1:])])
idxs = np.flatnonzero(res) # get indices of non zero elements
values = arr[idxs]
counts = np.diff(idxs, append=len(arr)) # difference between consecutive indices are the length
cond = counts > 2
values[cond], counts[cond], idxs[cond]
Output
(array([2]), array([4]), array([8]))
# (array([2.4, 4. ]), array([3, 3]), array([ 8, 14]))
_, i, c = np.unique(np.r_[[0], ~np.isclose(arr[:-1], arr[1:])].cumsum(),
                    return_index=True,
                    return_counts=True)
for index, count in zip(i, c):
    if count > 1:
        print([arr[index], count, index])
Out[]: [2, 4, 8]
A little more compact way of doing it that works for all input types.
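For instance, a sketch reusing the float example from the other answer (np.isclose handles the comparison, so non-integer values work too):

arr = np.array([0, 3, 0, 1, 0, 1, 2, 1, 2.4, 2.4, 2.4, 2, 1, 3, 4, 4, 4, 5])
_, i, c = np.unique(np.r_[[0], ~np.isclose(arr[:-1], arr[1:])].cumsum(),
                    return_index=True,
                    return_counts=True)
for index, count in zip(i, c):
    if count > 1:
        print([arr[index], count, index])
# [2.4, 3, 8]
# [4.0, 3, 14]   (exact element formatting depends on your numpy version)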
Say I have a 1D numpy array of numbers, myArray = np.array([1, 1, 0, 2, 0, 1, 1, 1, 1, 0, 0, 1, 2, 1, 1, 1]).
I want to create a 2D numpy array that describe the first (column 1) and last (column 2) indices of any "streak" of consecutive 1's that is longer than 2.
So for the example above, the 2D array should look like this:
indicesArray =
([5, 8],
[13, 15])
Since there are at least 3 consecutive ones in the 5th, 6th, 7th, 8th places and in the 13th, 14th, 15th places.
Any help would be appreciated.
Approach #1
Here's one approach inspired by this post -
def start_stop(a, trigger_val, len_thresh=2):
    # "Enclose" mask with sentinels to catch shifts later on
    mask = np.r_[False, np.equal(a, trigger_val), False]
    # Get the shifting indices
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    # Get lengths
    lens = idx[1::2] - idx[::2]
    return idx.reshape(-1,2)[lens > len_thresh] - [0,1]
Sample run -
In [47]: myArray
Out[47]: array([1, 1, 0, 2, 0, 1, 1, 1, 1, 0, 0, 1, 2, 1, 1, 1])
In [48]: start_stop(myArray, trigger_val=1, len_thresh=2)
Out[48]:
array([[ 5, 8],
[13, 15]])
Approach #2
Another with binary_erosion -
from scipy.ndimage.morphology import binary_erosion
mask = binary_erosion(myArray==1,structure=np.ones((3)))
idx = np.flatnonzero(mask[1:] != mask[:-1])
out = idx.reshape(-1,2)+[0,1]
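For reference, a quick check on the same myArray (assuming the array from the question) gives the same pairs:

print(out)
# [[ 5  8]
#  [13 15]]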
Let's say I have
arr = np.arange(6)
arr
array([0, 1, 2, 3, 4, 5])
and I decide that I want to treat an array "like a circle": When I run out of material at the end, I want to start at index 0 again. That is, I want a convenient way of selecting x elements, starting at index i.
Now, if x == 6, I can simply do
i = 3
np.hstack((arr[i:], arr[:i]))
Out[9]: array([3, 4, 5, 0, 1, 2])
But is there a convenient way of doing this in general, even if x > 6, without having to manually break the array apart and think through the logic?
For example:
print(roll_array_arround(arr)[2:17])
should return:
array([2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0])
See mode='wrap' in ndarray.take:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.take.html
Taking your hypothetical function:
print(roll_array_arround(arr)[2:17])
If it is implied that it is a true slice of the original array that you are after, that is not going to happen; a wrapped-around array cannot be expressed as a strided view of the original; so if you seek a function that maps an ndarray to an ndarray, this will necessarily involve a copy of your data.
That is, efficiency-wise, you shouldn't expect to find a solution that significantly differs in performance from the expression below.
print(arr.take(np.arange(2,17), mode='wrap'))
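For reference, that call picks indices 2 through 16 and wraps them modulo the array length, so it prints:

[2 3 4 5 0 1 2 3 4 5 0 1 2 3 4]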
Modulus operation seems like the best fit here -
def rolling_array(n, x, i):
    # n is the rolling period (length of the array being wrapped over)
    # x is the number of elements to take
    # i is the starting index
    return np.mod(np.arange(i, i+x), n)
Sample runs -
In [61]: rolling_array(n=6, x=6, i=3)
Out[61]: array([3, 4, 5, 0, 1, 2])
In [62]: rolling_array(n=6, x=17, i=2)
Out[62]: array([2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0])
A solution you could look into would probably be:
from itertools import cycle
list_to_rotate = np.array([1,2,3,4,5])
rotatable_list = cycle(list_to_rotate)
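To actually pull a wrapped slice out of the cycle, one option is to combine it with itertools.islice; this is a sketch, assuming the same wrap-around semantics as in the question:

from itertools import cycle, islice

import numpy as np

list_to_rotate = np.array([1, 2, 3, 4, 5])
rotatable_list = cycle(list_to_rotate)

# take 15 elements of the infinite cycle, starting at position 2
# (note: islice consumes the iterator, so a later islice continues from here)
wrapped = np.fromiter(islice(rotatable_list, 2, 17), dtype=list_to_rotate.dtype)
# -> array([3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2])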
You need to roll your array.
>>> x = np.arange(10)
>>> np.roll(x, 2)
array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
See numpy documentation for more details.
I cut out the zeros of a numpy array, do some stuff, and want to insert them back in for visual purposes. I do have the indices of the sections and tried to insert the zeros back in with numpy.insert and zip, but the index runs out of bounds, even though I start at the lower end. Example:
import numpy as np
a = np.array([1, 2, 4, 0, 0, 0, 3, 6, 2, 0, 0, 1, 3, 0, 0, 0, 5])
a = a[a != 0] # cut zeros out
zero_start = [3, 9, 13]
zero_end = [5, 10, 15]
# Now insert the zeros back in using the former indices
for ev in zip(zero_start, zero_end):
    a = np.insert(a, ev[0], np.zeros(ev[1]-ev[0]))
>>> IndexError: index 13 is out of bounds for axis 0 with size 12
It seems like the array size is not being refreshed inside the loop. Any suggestions, or other (more Pythonic) approaches to solve this problem?
Approach #1: Using indexing -
# Get all zero indices
idx = np.concatenate([range(i,j+1) for i,j in zip(zero_start,zero_end)])
# Setup output array of zeros
N = len(idx) + len(a)
out = np.zeros(N,dtype=a.dtype)
# Get mask of non-zero places and assign values from a into those
out[~np.in1d(np.arange(N),idx)] = a
We can also generate the actual indices where a had non-zeros originally and then assign. Thus, the last step of masking could be replaced with something like this -
out[np.setdiff1d(np.arange(N),idx)] = a
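A quick check with the question's data (assuming a, zero_start and zero_end as defined there) reproduces the original array:

print(out)
# [1 2 4 0 0 0 3 6 2 0 0 1 3 0 0 0 5]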
Approach #2: Using np.insert given zero_start and zero_end as arrays -
insert_start = np.r_[zero_start[0], zero_start[1:] - zero_end[:-1]-1].cumsum()
out = np.insert(a, np.repeat(insert_start, zero_end - zero_start + 1), 0)
Sample run -
In [755]: a = np.array([1, 2, 4, 0, 0, 0, 3, 6, 2, 0, 0, 1, 3, 0, 0, 0, 5])
...: a = a[a != 0] # cut zeros out
...: zero_start = np.array([3, 9, 13])
...: zero_end = np.array([5, 10, 15])
...:
In [756]: s0 = np.r_[zero_start[0], zero_start[1:] - zero_end[:-1]-1].cumsum()
In [757]: np.insert(a, np.repeat(s0, zero_end - zero_start + 1), 0)
Out[757]: array([1, 2, 4, 0, 0, 0, 3, 6, 2, 0, 0, 1, 3, 0, 0, 0, 5])