Given a mask numpy array such as:
mask = np.array([0, 0, 1, 0, 0, 0, 1, ...])
I want to replace each 1 with a target vector. Example:
target = np.array([5, 4, 3, 2, 1])
mask = np.array([0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,...])
output = np.array([0, 0, 5, 4, 3, 2, 1, 0, 5, 4, 3, 2, ...])
# Overlaps:
mask = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 0,...])
output = np.array([0, 0, 5, 4, 3, 2, 5, 4, 3, 2, ...])
Naively, one can write this as follows (ignoring boundary problems):
output = np.zeros_like(mask)
for i, x in enumerate(mask):
    if x == 1:
        output[i:i+len(target)] = target
I'm wondering whether this is possible without resorting to a for loop.
NumPy lets you assign to the same index several times in a single indexed assignment; in the example below the last value written to an index is the one that sticks:
mask = np.array([0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0])
padding_idx = [2,3,4,5,6,5,6,7,8,9,8,9,10,11]
padding_values = [5,4,3,2,1,5,4,3,2,1,5,4,3,2]
mask[padding_idx] = padding_values
>>> mask
array([0, 0, 5, 4, 3, 5, 4, 3, 5, 4, 3, 2])
You just need to work out padding_idx and padding_values.
Note that padding_values = [5,4,3,2,1,5,4,3,2,1,5,4,3,2] is one value short: the last copy of the vector runs past the end of mask and has to be truncated. So you also need to find the number of missing values. After that you can use broadcasting:
vector = np.array([5,4,3,2,1])
N = len(vector)
mask = np.array([0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0])
idx = np.flatnonzero(mask)
missing_values = len(mask) - idx[-1] - N
#Broadcast
padding_idx = np.flatnonzero(mask)[:,None] + np.arange(N)
padding_values = np.repeat(vector[np.newaxis, :], len(idx), axis=0)
#Flatten
padding_idx = padding_idx.ravel()[:missing_values]
padding_values = padding_values.ravel()[:missing_values]
#Go!
mask[padding_idx] = padding_values
>>> mask
array([0, 0, 5, 4, 3, 5, 4, 3, 5, 4, 3, 2])
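One caveat with the slice above: when the last copy of the vector fits exactly, missing_values is 0 and `[:missing_values]` returns an empty array, and it also does not cover the case where more than one copy overruns the end. A minimal sketch of a variant that filters by index instead is shown below. The function name `place_target` is made up for illustration, and the code assumes, like the answer does, that later writes to a repeated index override earlier ones, which is the behaviour observed in practice rather than something I would treat as guaranteed.

```python
import numpy as np

def place_target(mask, target):
    """Write `target` at every position where mask == 1 (hypothetical helper)."""
    mask = np.asarray(mask)
    target = np.asarray(target)
    out = np.zeros(len(mask), dtype=target.dtype)
    starts = np.flatnonzero(mask)
    # One row of destination indices per 1 in the mask (broadcasting).
    idx = starts[:, None] + np.arange(len(target))
    vals = np.broadcast_to(target, idx.shape)
    # Drop every destination that falls outside the array.
    keep = idx < len(mask)
    # Assumption: for repeated indices the later write wins.
    out[idx[keep]] = vals[keep]
    return out

vector = np.array([5, 4, 3, 2, 1])
print(place_target([0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0], vector))
```

Unlike the original snippet, this returns a new array and leaves mask untouched.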
Not a full answer, but some thoughts: The for loop is O(n), where n = len(mask). We can use np.split to get that down to O(k), where k = number of 1s in mask:
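def set_target(mask, target):
    i, = np.where(mask == 1)
    # the first chunk is everything before the first 1, so keep it as is
    head, *chunks = np.split(mask, i)
    output = [head]
    for chunk in chunks:
        if len(chunk) > len(target):
            chunk = chunk.copy()  # np.split returns views; don't modify mask
            chunk[:len(target)] = target
            output.append(chunk)
        else:
            output.append(target[:len(chunk)])
    return np.concatenate(output, 0)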
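For comparison, here is a sketch of a fully loop-free alternative. It does not use np.split; instead it tracks, for every position, the index of the most recent 1 with np.maximum.accumulate and uses the distance to it as an index into target. The function name `stamp_target` is made up for illustration, and the sketch assumes a non-empty target.

```python
import numpy as np

def stamp_target(mask, target):
    """Loop-free sketch: each 1 in mask starts a fresh copy of target."""
    mask = np.asarray(mask)
    target = np.asarray(target)
    pos = np.arange(len(mask))
    # Index of the most recent 1 at or before each position (-1 if none yet).
    last_one = np.maximum.accumulate(np.where(mask == 1, pos, -1))
    offset = pos - last_one
    # A position is covered if a 1 was seen and target still reaches it.
    inside = (last_one >= 0) & (offset < len(target))
    # Clip the offsets so the lookup is always valid; uncovered spots become 0.
    safe = np.minimum(offset, len(target) - 1)
    return np.where(inside, target[safe], 0)

target = np.array([5, 4, 3, 2, 1])
print(stamp_target([0, 0, 1, 0, 0, 0, 1, 0, 0, 0], target))
```

Because every output element is computed exactly once, overlaps and the right-hand boundary need no special handling.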
Related
I want to find the frequency of the values of one array (arr1) given another array (arr2). They are both one-dimensional, and arr2 is sorted and has no repeating elements.
Example:
arr1 = np.array([1, 0, 3, 0, 3, 0, 3, 0, 8, 0, 1, 8, 0])
arr2 = np.array([0, 1, 2, 8])
The output should be: freq = np.array([6, 2, 0, 2])
What I was trying was this:
arr2, freq = np.unique(arr1, return_counts=True)
But this method doesn't output values that have frequency of 0.
One way to do it is as follows:
import numpy as np
arr1 = np.array([1, 0, 3, 0, 3, 0, 3, 0, 8, 0, 1, 8, 0])
arr2 = np.array([0, 1, 2, 8])
arr3, freq = np.unique(arr1, return_counts=True)
dict_ = dict(zip(arr3, freq))
freq = np.array([dict_.get(i, 0) for i in arr2])
freq
Output:
[6, 2, 0, 2]
Alternative One-liner Solution
import numpy as np
arr1 = np.array([1, 0, 3, 0, 3, 0, 3, 0, 8, 0, 1, 8, 0])
arr2 = np.array([0, 1, 2, 8])
freq = np.array([np.count_nonzero(arr1 == i) for i in arr2])
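Since arr2 is sorted and has no repeats, the counting can also be done without a Python-level loop. The following is a sketch of one way to do that, combining np.searchsorted and np.bincount; the intermediate names `pos` and `hit` are mine.

```python
import numpy as np

arr1 = np.array([1, 0, 3, 0, 3, 0, 3, 0, 8, 0, 1, 8, 0])
arr2 = np.array([0, 1, 2, 8])

# Position in arr2 where each element of arr1 would be inserted.
pos = np.searchsorted(arr2, arr1)
# Values larger than arr2[-1] land one past the end; clip them back in range.
pos = np.clip(pos, 0, len(arr2) - 1)
# Keep only the elements of arr1 that really occur in arr2.
hit = arr2[pos] == arr1
# Count the hits per slot; minlength keeps the zero-frequency entries.
freq = np.bincount(pos[hit], minlength=len(arr2))
print(freq)
```

The clipping and the `hit` check are what make values such as 3, which do not appear in arr2, drop out instead of being counted against a neighbouring slot.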
I need to filter out short nonzero series that lie between zeros. For example, this array:
fhr = np.array([1, 3, 1, 0, 0, 1, 8, 3, 0, 8, 2, 4, 7, 0,0,4,1])
should become:
array([1, 3, 1, 0, 0, 0, 0, 0, 0, 8, 2, 4, 7, 0, 0, 4, 1])
I found the first index of each nonzero sequence and counted the number of nonzeros between consecutive starts. I wrote the following. It works, but it looks awful. I tried other things but got errors.
How can I rewrite it in a more Pythonic way?
minseq = 4  # length of minimal non zero seq
p = np.where(fhr>0, 1, 0).astype(int)
s = np.array([1]+ list(np.diff(p)))
sind = np.where(s==1)[0][1:]
print(sind)
for i in range(len(sind) - 1):
    s1 = sind[i]
    e1 = sind[i+1]
    subfhr = np.where(fhr[s1:e1] > 0, 1, 0).sum()
    if (subfhr < minseq):
        print(s1, e1, subfhr)
        fhr[s1:e1] = 0
out:
[ 5 9 15]
5 9 3
array([1, 3, 1, 0, 0, 0, 0, 0, 0, 8, 2, 4, 7, 0, 0, 4, 1])
You can use binary_closing, borrowed from image processing -
import numpy as np
from scipy.ndimage import binary_closing

def remove_small_nnz(a, W):
    K = np.ones(W, dtype=int)
    m = a==0
    p = binary_closing(m,K)
    a[~m & p] = 0  # note: modifies a in place
    return a
Sample run -
In [97]: a
Out[97]: array([1, 3, 1, 0, 0, 1, 8, 3, 0, 8, 2, 4, 7, 0, 0, 4, 1])
In [98]: remove_small_nnz(a, W=3)
Out[98]: array([1, 3, 1, 0, 0, 1, 8, 3, 0, 8, 2, 4, 7, 0, 0, 4, 1])
In [99]: remove_small_nnz(a, W=4)
Out[99]: array([1, 3, 1, 0, 0, 0, 0, 0, 0, 8, 2, 4, 7, 0, 0, 4, 1])
In [100]: remove_small_nnz(a, W=5)
Out[100]: array([1, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 1])
Since you're only looking for nonzeros, you can cast the array to boolean and look for spots where there are at least minseq Trues in a row.
import numpy as np

def orig(fhr, minseq):
    p = np.where(fhr>0, 1, 0).astype(int)
    s = np.array([1]+ list(np.diff(p)))
    sind = np.where(s==1)[0][1:]
    for i in range(len(sind) - 1):
        s1 = sind[i]
        e1 = sind[i+1]
        subfhr = np.where(fhr[s1:e1] > 0, 1, 0).sum()
        if (subfhr < minseq):
            fhr[s1:e1] = 0
    return fhr

def update(fhr, minseq):
    # convert the sequence to boolean
    nonzero = fhr.astype(bool)
    # stack the boolean array with lagged copies of itself
    seqs = np.stack([nonzero[i:-minseq+i] for i in range(minseq)],
                    axis=1)
    # find the spots where the sequence is long enough
    # (np.bool was removed from recent NumPy releases; plain bool works)
    inseq = np.r_[np.zeros(minseq, bool), seqs.sum(axis=1) == minseq]
    # the start and end of the series are assumed to be included in the result
    inseq[minseq] = True
    inseq[-1] = True
    # make sure that the full sequence is included.
    # There may be a way to vectorize this further
    for ind in np.where(inseq)[0]:
        inseq[ind-minseq:ind] = True
    # Apply the inseq array as a mask
    return inseq * fhr

fhr = np.array([1, 3, 1, 0, 0, 1, 8, 3, 0, 8, 2, 4, 7, 0,0,4,1])
minseq = 4
# orig() modifies its argument in place, so hand it a copy
print(np.all(orig(fhr.copy(), minseq) == update(fhr, minseq)))
# True
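Another way to look at the problem is in terms of run lengths: find where each nonzero run starts and ends, then zero the runs that are too short. The sketch below does that with np.diff on a padded boolean array. The function name `zero_short_runs` is made up, and the sketch assumes, based on the expected output in the question, that runs touching either end of the array are always kept.

```python
import numpy as np

def zero_short_runs(a, minseq):
    """Zero nonzero runs shorter than minseq that lie between zeros (sketch)."""
    a = np.asarray(a).copy()
    nz = a != 0
    # Pad with False on both sides so every run has a rising and a falling edge.
    edges = np.diff(np.concatenate(([False], nz, [False])).astype(np.int8))
    starts = np.flatnonzero(edges == 1)   # first index of each run
    ends = np.flatnonzero(edges == -1)    # one past the last index of each run
    short = (ends - starts) < minseq
    # Assumption: runs at the very start or end are not "between zeros".
    touches_edge = (starts == 0) | (ends == len(a))
    drop = short & ~touches_edge
    for s, e in zip(starts[drop], ends[drop]):
        a[s:e] = 0
    return a

fhr = np.array([1, 3, 1, 0, 0, 1, 8, 3, 0, 8, 2, 4, 7, 0, 0, 4, 1])
print(zero_short_runs(fhr, 4))
```

The remaining loop only visits the runs that actually get removed, not every element of the array.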
I have a numpy binary array like this:
Array A = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
I would like to count how many 0s there are to the left of each 1, and return the counts in another array that would look like this for this example:
nb_1s = [0, 0, 1, 2, 2, 5]
There are no 0s to the left of the first two 1s, so the first two numbers of the array are 0, and so on.
I know that I first have to initialize an array with one entry per 1 in my array:
def give_zeros(binary_array):
    binary_array = np.asarray(binary_array)
    nb_zeros = np.zeros(binary_array.sum())
    return nb_zeros
But I'm not sure how to count the zeros. Should I iterate in a for loop with 'nditer'? That doesn't seem efficient, as I will have to run this function on very large arrays.
Do you have any ideas?
Thank you.
Code
You could use:
(A == 0).cumsum()[A > 0]
# array([0, 0, 1, 2, 2, 5])
or:
(~A).cumsum()[A]
# array([0, 0, 1, 2, 2, 5])
if A is a bool array.
Explanation
A == 0 is a boolean array which is True for each 0:
>>> import numpy as np
>>> A = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])
>>> A == 0
array([False, False, True, False, True, False, False, True, True,
True, False, True, True, True, True], dtype=bool)
You can use cumsum() to count the number of Trues:
>>> (A == 0).cumsum()
array([0, 0, 1, 1, 2, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9])
You only need the values where A > 0:
>>> (A == 0).cumsum()[A > 0]
array([0, 0, 1, 2, 2, 5])
Done!
Here's a vectorized way that subtracts a range array from the indices of the 1s -
def leftzeros_count(a):
    idx = np.flatnonzero(a!=0)
    return idx - np.arange(len(idx))
Sample runs -
In [298]: a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])
In [299]: leftzeros_count(a)
Out[299]: array([0, 0, 1, 2, 2, 5])
In [300]: a = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])
In [301]: leftzeros_count(a)
Out[301]: array([1, 1, 2, 3, 3, 6])
In [302]: a = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1])
In [303]: leftzeros_count(a)
Out[303]: array([ 1, 1, 2, 3, 3, 6, 10])
Runtime test
For the timings, let's tile the given sample a large number of times and time the vectorized approaches -
In [7]: a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])
In [8]: a = np.tile(a,100000)
# @Eric Duminil's soln
In [9]: %timeit (a == 0).cumsum()[a > 0]
100 loops, best of 3: 10.9 ms per loop
# Proposed in this post
In [10]: %timeit leftzeros_count(a)
100 loops, best of 3: 3.71 ms per loop
In the non-vectorized manner:
>>> x = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
>>> c, y = 0, []
>>> for i in x:
...     if i == 1:
...         y.append(c)
...     else:
...         c += 1
...
>>> y
[0, 0, 1, 2, 2, 5]
For a vectorized solution, see @Divakar's answer:
In numpy, first find the non-zero indices, with np.nonzero():
>>> np.nonzero(x)[0]
array([ 0, 1, 3, 5, 6, 10])
Then subtract a range array of the same length as the index array:
>>> idx = np.nonzero(x)[0]
>>> np.arange(len(idx))
array([0, 1, 2, 3, 4, 5])
>>> np.nonzero(x)[0] - np.arange(len(idx))
array([0, 0, 1, 2, 2, 5])
>>> np.arange(x.count(1))
array([0, 1, 2, 3, 4, 5])
>>> np.nonzero(x)[0] - np.arange(x.count(1))
array([0, 0, 1, 2, 2, 5])
If the count is cumulative (as per your example) then you can do this easily in O(n). Simply have a counter that increases by one every time you find a zero and then append the value of the counter variable to another array for every one you hit in your initial array.
I have a numpy array and would like to obtain the indexes of the elements that satisfy a common property. For example, suppose the array is np.array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1]), and I want to have the indexes of all elements equal to 1, so the output would be [0, 4, 5, 8, 10, 14].
I have defined the following procedure
def find_indexes(A):
    res = []
    for i in range(len(A)):
        if A[i] == 1:
            res.append(i)
    return res
Is there a more "pythonesque" way of doing this? More specifically, I am wondering if there is something similar to boolean indexing:
A[A>=1]
that would return the indexes of the elements rather than the elements themselves.
Use np.where.
import numpy as np
x = np.array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1])
indices, = np.where(x == 1)
print(indices)
Use numpy.where
arr = np.array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1])
print(np.where(arr == 1))
(array([ 0, 4, 5, 8, 10, 14]),)
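As the output above shows, np.where with a single argument returns a tuple with one index array per dimension. For a 1-D array, np.flatnonzero gives the index array directly, so no unpacking is needed. A small sketch:

```python
import numpy as np

arr = np.array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1])

# flatnonzero returns the indices of the True entries as a plain 1-D array.
indices = np.flatnonzero(arr == 1)
print(indices.tolist())
```

Any boolean condition can take the place of `arr == 1`, for example `arr >= 1`.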
List comprehension for pure python:
ar = [i for i in range(len(a)) if a[i] == 1]
I have an MxN array. I want to zero out all the values after an element in a row is zero or less.
For example the 2x12 array
111110011111
112321341411
should turn into
111110000000
112321341411
Thanks!
It may not be the most efficient method, but I've used np.cumsum for these types of things.
>>> import numpy as np
>>> dat = np.array([[1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1],
...                 [1, 1, 2, 3, 2, 1, 3, 4, 1, 4, 1, 1]])
>>> dat[np.cumsum(dat <= 0, 1, dtype='bool')] = 0
>>> dat
array([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 2, 3, 2, 1, 3, 4, 1, 4, 1, 1]])
@Jaime just pointed out that np.logical_or.accumulate(dat <= 0, axis=1) is probably better than np.cumsum.
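A minimal sketch of the np.logical_or.accumulate variant is shown below. The names `seen_nonpositive` and `result` are mine, and np.where is used here so that the input array is left unchanged, which is a choice of this sketch rather than part of the suggestion.

```python
import numpy as np

dat = np.array([[1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1],
                [1, 1, 2, 3, 2, 1, 3, 4, 1, 4, 1, 1]])

# True from the first non-positive entry of each row onwards.
seen_nonpositive = np.logical_or.accumulate(dat <= 0, axis=1)
# Zero those positions and keep everything else.
result = np.where(seen_nonpositive, 0, dat)
print(result)
```

The accumulate call states the intent directly: once a row has hit a value of zero or less, every later position in that row is flagged.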
Maybe you or someone else needs an alternative solution without numpy.
>>> dat = ['111110011111','112321341411','000000000000', '123456789120']
>>> def zero(dat):
...     result = []
...     for row in dat:
...         pos = row.find('0')
...         if pos != -1:  # find() returns -1 when the row has no '0'
...             result.append(row[0:pos] + ('0' * (len(row) - pos)))
...         else:
...             result.append(row)
...     return result
...
>>> res = zero(dat)
>>> res
['111110000000', '112321341411', '000000000000', '123456789120']
>>> dat
['111110011111', '112321341411', '000000000000', '123456789120']