python consecutive counts of an occurrence with length - python

This is probably really easy to do, but I am looking to calculate the length of consecutive runs of positive values in a list in Python. For example, I have a and I am looking to return b:
a=[0,0,1,1,1,1,0,0,1,0,1,1,1,0]
b=[0,0,4,4,4,4,0,0,1,0,3,3,3,0]
I note a similar question, Counting consecutive positive values in Python array, but that only counts the consecutive occurrences; it does not give each element the length of the group it belongs to.
Thanks

This is similar to a run length encoding problem, so I've borrowed some ideas from that Rosetta code page:
import itertools
a=[0,0,1,1,1,1,0,0,1,0,1,1,1,0]
b = []
for item, group in itertools.groupby(a):
    size = len(list(group))
    for i in range(size):
        if item == 0:
            b.append(0)
        else:
            b.append(size)
b
Out[8]: [0, 0, 4, 4, 4, 4, 0, 0, 1, 0, 3, 3, 3, 0]

At last, after many tries, I came up with these two lines.
In [9]: from itertools import groupby
In [10]: lst=[list(g) for k,g in groupby(a)]
In [21]: [x*len(_lst) if x>=0 else x for _lst in lst for x in _lst]
Out[21]: [0, 0, 4, 4, 4, 4, 0, 0, 1, 0, 3, 3, 3, 0]

Here's one approach.
The basic premise is that while in a consecutive run of positive values, it remembers the indices of those positive values. As soon as it hits a zero, it backtracks and replaces all of those positive values with the length of their run.
a = [0,0,1,1,1,1,0,0,1,0,1,1,1,0]
glob = []   # indices of the current run of positive values
last = None
for idx, i in enumerate(a):
    if i > 0:
        glob.append(idx)
    if i == 0 and last != i:
        # end of a run: replace the run's values with its length
        for j in glob:
            a[j] = len(glob)
        glob = []
    last = i
# a -> [0, 0, 4, 4, 4, 4, 0, 0, 1, 0, 3, 3, 3, 0]
# (a trailing run of positive values would need one final flush after the loop)
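For larger lists, a vectorised numpy sketch of the same run-length idea (my own illustration, not taken from any of the answers above) might look like this:

import numpy as np

def run_length_labels(a):
    a = np.asarray(a)
    # positions where the value changes mark the start of a new run
    change = np.flatnonzero(np.diff(a) != 0) + 1
    starts = np.r_[0, change]
    lengths = np.diff(np.r_[starts, len(a)])
    # spread each run's length over the run, then zero out the runs of zeros
    out = np.repeat(lengths, lengths)
    out[a == 0] = 0
    return out

run_length_labels([0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0])
# array([0, 0, 4, 4, 4, 4, 0, 0, 1, 0, 3, 3, 3, 0])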

Related

Replace all zeros with last non-zero value in torch

Is there any efficient way to replace all zeros in a tensor with the last non-zero value in torch?
For example if I had the tensor:
tensor([[1, 0, 0, 4, 0, 5, 0, 0],
[0, 3, 0, 6, 0, 0, 8, 0]])
The output should be:
tensor([[1, 1, 1, 4, 4, 5, 5, 5],
[0, 3, 3, 6, 6, 6, 8, 8]])
I currently have the following code:
def replace_zeros_with_prev_nonzero(tensor):
    output = tensor.clone()
    for i in range(len(output)):
        prev_value = 0
        for j in range(len(tensor[i])):
            if tensor[i, j] == 0:
                output[i, j] = prev_value
            else:
                prev_value = tensor[i, j].item()
    return output
But it feels a bit clunky and I'm sure there has to be a better way to do this. So is it possible to write it in fewer lines, or better yet, parallelise the operation without treating the tensors as arrays?
You can remove one of the loops by vectorising over the first dimension.
def replace_zeros_with_prev_nonzero(tensor):
    output = tensor.clone()
    for i in range(1, tensor.shape[1]):
        mask = tensor[:, i] == 0
        output[mask, i] = output[mask, i-1]
    return output
output[mask, i] = output[mask, i-1] replaces each 0 with the previous value (which has itself already been replaced if it was originally 0, except at index 0).
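If you want to drop the remaining Python loop entirely, a fully vectorised sketch (my own addition, assuming a reasonably recent PyTorch where torch.cummax is available) could look like this:

import torch

def replace_zeros_vectorized(tensor):
    # column index of every element, broadcast over the rows
    idx = torch.arange(tensor.shape[1], device=tensor.device).expand_as(tensor)
    # keep the column index only where the value is non-zero, otherwise 0
    idx = torch.where(tensor != 0, idx, torch.zeros_like(idx))
    # running maximum along each row = index of the last non-zero column so far
    idx = idx.cummax(dim=1).values
    # pick the values at those indices (a leading zero stays zero, as in the example)
    return torch.gather(tensor, 1, idx)

On the example tensor this reproduces the expected output shown in the question.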

Find runs and lengths of consecutive values in an array

I'd like to find equal values in an array, and their indices, if they occur consecutively more than 2 times.
[0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4]
so in this example I would find that the value "2" occurred "4" times, starting from position "8". Is there any built-in function to do that?
I found a way with collections.Counter
collections.Counter(a)
# Counter({2: 5, 1: 4, 0: 3, 3: 2, 4: 1})
but this is not what I am looking for.
Of course I can write a loop and compare two values and then count them, but maybe there is a more elegant solution?
Find consecutive runs and length of runs with condition
import numpy as np
arr = np.array([0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4])
res = np.ones_like(arr)
np.bitwise_xor(arr[:-1], arr[1:], out=res[1:])  # equal consecutive elements become 0
# use this for np.floats instead
# arr = np.array([0, 3, 0, 1, 0, 1, 2, 1, 2.4, 2.4, 2.4, 2, 1, 3, 4, 4, 4, 5])
# res = np.hstack([True, ~np.isclose(arr[:-1], arr[1:])])
idxs = np.flatnonzero(res)  # indices of the non-zero elements, i.e. the run starts
values = arr[idxs]
counts = np.diff(idxs, append=len(arr))  # differences between consecutive run starts are the run lengths
cond = counts > 2
values[cond], counts[cond], idxs[cond]
Output
(array([2]), array([4]), array([8]))
# (array([2.4, 4. ]), array([3, 3]), array([ 8, 14]))
_, i, c = np.unique(np.r_[[0], ~np.isclose(arr[:-1], arr[1:])].cumsum(),
                    return_index=True,
                    return_counts=True)
for index, count in zip(i, c):
    if count > 1:
        print([arr[index], count, index])
# prints: [2, 4, 8]
A little more compact way of doing it that works for all input types.
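For reference, if a plain-Python route without numpy is acceptable, a small itertools.groupby sketch (the long_runs helper below is my own illustration, not from the answers above) reports the same (value, length, start) triples:

from itertools import groupby

def long_runs(seq, min_len=3):
    """Return (value, run_length, start_index) for every run of at least min_len."""
    out, pos = [], 0
    for value, group in groupby(seq):
        length = len(list(group))
        if length >= min_len:
            out.append((value, length, pos))
        pos += length
    return out

long_runs([0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4])
# [(2, 4, 8)]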

Finding unique sets without subsets in python array

I have a dataset that needs to output boolean-style data, just 1 and 0, for true or not true. I am trying to parse simple data sets I've processed to look for a subset of information in a numpy array; the array is about 100,000 elements in one direction and 20 in the other. I only need to search along the 20-element axis, but I need to do that for each of the 100,000 entries and get output that I can map.
I've produced an array of this size made up of zeros, with the intention of simply setting the matching index to 1. A main hitch is that if I find a long set (I'm working from long sets down to small sets), I need to NOT include any smaller set that's within it.
Sample:
[0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,1,0,1]
I need to find here that there is one group of 5, starting at index 2, and one group of 3, starting at index 9, and not return any subset of the group of 5 as though it were a group of 4 or a group of 3, thus leaving zeros for all the values already covered; i.e. when searching for groups of 3, the indices 2, 3, 4, 5, and 6 would all remain zero. It doesn't need to be overly efficient; I don't care if it searches anyway, I just need to not keep the result.
Currently I'm using a codeblock basically like this for a simple search:
values = numpy.array([0,1,1,1,1,1,0,0,1,1,1])
searchval = [1,2]
N = len(searchval)
possibles = numpy.where(values == searchval[0])[0]
print(possibles)
solns = []
for p in possibles:
    check = values[p:p+N]
    if numpy.all(check == searchval):
        solns.append(p)
print(solns)
I've been wracking my brain trying to come up with a way to restructure this or similar code to produce the desired results. The end goal is to search for groups of 9 down to groups of 3, and to end up with, effectively, a matrix of 1s and 0s indicating whether an index has a group of the desired length starting on it.
Hopefully someone can point me to what I'm missing to make this work. Thanks!
Using more_itertools, a third-party library (pip install more_itertools):
import more_itertools as mit
sample = [0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1]
groups = [list(c) for c in mit.consecutive_groups(mit.locate(sample))]
d = {group[0]: len(group) for group in groups}
d
# {2: 5, 9: 3, 15: 1, 17: 1}
This result reads "At index 2 is a group of 5 ones. At index 9 is a group of 3 ones," etc.
Details
more_itertools.locate finds indices for truthy items by default.
more_itertools.consecutive_groups chunks consecutive numbers together.
The result is a dictionary of (starting-index, length) pairs.
As a dictionary, you can extract different kinds of information:
>>> # List of starting indices
>>> list(d)
[2, 9, 15, 17]
>>> # List indices for all lonely groups
>>> [k for k, v in d.items() if v == 1]
[15, 17]
>>> # List indices of groups with more than 1 item
>>> [k for k, v in d.items() if v > 1]
[2, 9]
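If you specifically need the 0/1 indicator array described in the question (a 1 where a run of exactly the requested length starts, so that pieces of a longer run are not reported), a sketch building on the same more_itertools pieces could look like this; mark_runs is a hypothetical helper, not part of the library:

import numpy as np
import more_itertools as mit

def mark_runs(sample, length):
    # group the indices of the 1s into consecutive runs, as above
    groups = [list(c) for c in mit.consecutive_groups(mit.locate(sample))]
    out = np.zeros(len(sample), dtype=int)
    for g in groups:
        if len(g) == length:  # exact length only, so subsets of longer runs are skipped
            out[g[0]] = 1
    return out

sample = [0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1]
mark_runs(sample, 3)
# array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0])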
Here is a numpy solution. I'm using a small example for demonstration but it easily scales (20 x 100,000 takes 25 ms on my rather modest laptop, see timings at the end of this post):
>>> import numpy as np
>>>
>>>
>>> a = np.random.randint(0, 2, (5, 10), dtype=np.int8)
>>> a
array([[0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
[0, 1, 1, 0, 1, 0, 1, 0, 0, 0],
[1, 0, 1, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 1, 1, 1, 1, 0, 0]], dtype=int8)
>>>
>>> padded = np.pad(a,((1,1),(0,0)), 'constant')
# compare array to itself with offset to mark all switches from
# 0 to 1 or from 1 to 0
# then use 'where' to extract the coordinates
>>> colinds, rowinds = np.where((padded[:-1] != padded[1:]).T)
>>>
# the lengths of sets are the differences between switch points
>>> lengths = rowinds[1::2] - rowinds[::2]
# now we have the lengths we are free to throw the off-switches away
>>> colinds, rowinds = colinds[::2], rowinds[::2]
>>>
# admire
>>> from pprint import pprint
>>> pprint(list(zip(colinds, rowinds, lengths)))
[(0, 2, 1),
(1, 0, 2),
(2, 1, 2),
(2, 4, 1),
(3, 2, 1),
(4, 0, 5),
(5, 0, 1),
(5, 2, 1),
(5, 4, 1),
(6, 1, 1),
(6, 3, 2),
(7, 4, 1)]
Timings:
>>> def find_stretches(a):
... padded = np.pad(a,((1,1),(0,0)), 'constant')
... colinds, rowinds = np.where((padded[:-1] != padded[1:]).T)
... lengths = rowinds[1::2] - rowinds[::2]
... colinds, rowinds = colinds[::2], rowinds[::2]
... return colinds, rowinds, lengths
...
>>> a = np.random.randint(0, 2, (20, 100000), dtype=np.int8)
>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=100)
>>> repeat('find_stretches(a)', **kwds)
[2.475784719004878, 2.4715258619980887, 2.4705517270049313]
Something like this?
from collections import defaultdict

sample = [0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1]

# Keys are the lengths of runs of consecutive 1's, values are their starting indices
results = defaultdict(list)
found = 0
for i, x in enumerate(sample):
    if x == 1:
        found += 1
    elif i == 0 or found == 0:
        continue
    else:
        results[found].append(i - found)
        found = 0
if found:
    results[found].append(i - found + 1)

assert results == {1: [15, 17], 3: [9], 5: [2]}

How can I find the second most common number in an array?

I have tried using scipy.stats mode to find the most common value. My matrix contains a lot of zeros, though, and so this is always the mode.
For example, if my matrix looks like the following:
array = np.array([[0, 0, 3, 2, 0, 0],
[5, 2, 1, 2, 6, 7],
[0, 0, 2, 4, 0, 0]])
I'd like to have the value of 2 returned.
Try collections.Counter:
import numpy as np
from collections import Counter
a = np.array(
[[0, 0, 3, 2, 0, 0],
[5, 2, 1, 2, 6, 7],
[0, 0, 2, 4, 0, 0]]
)
ctr = Counter(a.ravel())
second_most_common_value, its_frequency = ctr.most_common(2)[1]
As mentioned in some comments, you probably are speaking of numpy arrays.
In this case, it is rather simple to mask the value you want to avoid:
import numpy as np
from scipy.stats import mode
array = np.array([[0, 0, 3, 2, 0, 0],
[5, 2, 1, 2, 6, 7],
[0, 0, 2, 4, 0, 0]])
flat_non_zero = array[np.nonzero(array)]
mode(flat_non_zero)
Which returns (array([2]), array([ 4.])), meaning the value appearing the most is 2, and it appears 4 times (see the docs for more info). So if you only want the 2, you just need to take the first index of the return value of mode: mode(flat_non_zero)[0][0]
EDIT: if you want to filter another specific value x from array instead of zero, you can use array[array != x]
original_list = [1, 2, 3, 1, 2, 5, 6, 7, 8]  # original list
noDuplicates = list(set(original_list))  # list of all the unique numbers in the original list
most_common = [None, 0]  # start with a placeholder value and a count of zero
for number in noDuplicates:  # loop through the unique numbers
    if number != 0:  # make sure that we do not check 0
        count = original_list.count(number)  # how many times that unique number appears in the original list
        if count > most_common[1]:  # if the count is greater than the most_common count
            most_common = [number, count]  # reset most_common to the current number and count
print(str(most_common[0]) + " is listed " + str(most_common[1]) + " times!")
This loops through your list, finds the most frequently used non-zero number, and prints it along with its number of occurrences in your original list.

Fill zero values of 1d numpy array with last non-zero values

Let's say we have a 1d numpy array filled with some int values. And let's say that some of them are 0.
Is there any way, using numpy array's power, to fill all the 0 values with the last non-zero values found?
for example:
arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
fill_zeros_with_last(arr)
print(arr)
# [1 1 1 2 2 4 6 8 8 8 8 8 2]
A way to do it would be with this function:
def fill_zeros_with_last(arr):
    last_val = None  # I don't really care about the initial value
    for i in range(arr.size):
        if arr[i]:
            last_val = arr[i]
        elif last_val is not None:
            arr[i] = last_val
However, this is using a raw python for loop instead of taking advantage of the numpy and scipy power.
If we knew that a reasonably small number of consecutive zeros are possible, we could use something based on numpy.roll. The problem is that the number of consecutive zeros is potentially large...
Any ideas? or should we go straight to Cython?
Disclaimer:
I think that a long time ago I found a question on Stack Overflow asking something like this, or something very similar, but I wasn't able to find it again. :-(
Maybe I missed the right search terms (sorry for the duplicate, in that case), or maybe it was just my imagination...
Here's a solution using np.maximum.accumulate:
def fill_zeros_with_last(arr):
    prev = np.arange(len(arr))
    prev[arr == 0] = 0
    prev = np.maximum.accumulate(prev)
    return arr[prev]
We construct an array prev which has the same length as arr, and such that prev[i] is the index of the last non-zero entry at or before the i-th entry of arr. For example, if:
>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
Then prev looks like:
array([ 0, 0, 0, 3, 3, 5, 6, 7, 7, 7, 7, 7, 12])
Then we just index into arr with prev and we obtain our result. A test:
>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
>>> fill_zeros_with_last(arr)
array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])
Note: Be careful to understand what this does when the first entry of your array is zero:
>>> fill_zeros_with_last(np.array([0,0,1,0,0]))
array([0, 0, 1, 1, 1])
Inspired by jme's answer here and by Bas Swinckels' (in the linked question) I came up with a different combination of numpy functions:
def fill_zeros_with_last(arr, initial=0):
    ind = np.nonzero(arr)[0]
    cnt = np.cumsum(np.array(arr, dtype=bool))
    return np.where(cnt, arr[ind[cnt-1]], initial)
I think it's succinct and also works, so I'm posting it here for the record. Still, jme's is also succinct and easy to follow and seems to be faster, so I'm accepting it :-)
If the 0s only come in runs of length 1, this use of nonzero might work:
In [266]: arr=np.array([1,0,2,3,0,4,0,5])
In [267]: I=np.nonzero(arr==0)[0]
In [268]: arr[I] = arr[I-1]
In [269]: arr
Out[269]: array([1, 1, 2, 3, 3, 4, 4, 5])
I can handle your arr by applying this repeatedly until I is empty.
In [286]: arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
In [287]: while True:
.....: I=np.nonzero(arr==0)[0]
.....: if len(I)==0: break
.....: arr[I] = arr[I-1]
.....:
In [288]: arr
Out[288]: array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])
If the runs of 0s are long it might be better to look for those runs and handle each one as a block, as sketched below. But if most runs are short, this repeated application may be the fastest route.
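For completeness, here is a rough sketch of that "handle each run of zeros as a block" idea (my own illustration, not code from the answer above): it loops once over the zero runs instead of repeatedly over the whole array.

import numpy as np

def fill_zeros_blockwise(arr):
    arr = arr.copy()
    is_zero = arr == 0
    # starts of zero runs: a zero whose predecessor is non-zero (leading zeros are left alone)
    starts = np.flatnonzero(is_zero[1:] & ~is_zero[:-1]) + 1
    # exclusive ends of zero runs: a non-zero whose predecessor is zero
    ends = np.flatnonzero(~is_zero[1:] & is_zero[:-1]) + 1
    for s in starts:
        later = ends[ends > s]
        stop = later[0] if len(later) else len(arr)
        arr[s:stop] = arr[s - 1]  # fill the whole block with the preceding value
    return arr

fill_zeros_blockwise(np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2]))
# array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])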
