Find runs and lengths of consecutive values in an array - python

I'd like to find equal values in an array, along with their indices, if they occur consecutively more than 2 times.
[0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4]
so in this example I would find that the value 2 occurred 4 times, starting at position 8. Is there any built-in function to do that?
I found a way with collections.Counter
collections.Counter(a)
# Counter({2: 5, 1: 4, 0: 3, 3: 2, 4: 1})
but this is not what I am looking for.
Of course I can write a loop that compares consecutive values and counts them, but maybe there is a more elegant solution?

Find consecutive runs and length of runs with condition
import numpy as np
arr = np.array([0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4])
res = np.ones_like(arr)
np.bitwise_xor(arr[:-1], arr[1:], out=res[1:]) # set equal, consecutive elements to 0
# use this for np.floats instead
# arr = np.array([0, 3, 0, 1, 0, 1, 2, 1, 2.4, 2.4, 2.4, 2, 1, 3, 4, 4, 4, 5])
# res = np.hstack([True, ~np.isclose(arr[:-1], arr[1:])])
idxs = np.flatnonzero(res) # indices of the non-zero elements, i.e. the run starts
values = arr[idxs]
counts = np.diff(idxs, append=len(arr)) # differences between consecutive run starts are the run lengths
cond = counts > 2
values[cond], counts[cond], idxs[cond]
Output
(array([2]), array([4]), array([8]))
# (array([2.4, 4. ]), array([3, 3]), array([ 8, 14]))
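For reuse, the same run-length idea can be wrapped in a small helper. A minimal sketch (find_runs is a hypothetical name; the plain != comparison works for ints, and you could swap in ~np.isclose for floats as above):
import numpy as np

def find_runs(arr, min_length=3):
    # a run starts at index 0 and wherever a value differs from its predecessor
    arr = np.asarray(arr)
    starts = np.flatnonzero(np.r_[True, arr[1:] != arr[:-1]])
    lengths = np.diff(starts, append=len(arr))  # distance to the next run start
    keep = lengths >= min_length
    return arr[starts[keep]], starts[keep], lengths[keep]

print(find_runs([0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4]))
# (array([2]), array([8]), array([4]))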

_, i, c = np.unique(np.r_[[0], ~np.isclose(arr[:-1], arr[1:])].cumsum(),
                    return_index=True,
                    return_counts=True)
for index, count in zip(i, c):
    if count > 1:
        print([arr[index], count, index])
Out[]: [2, 4, 8]
A little more compact way of doing it that works for all input types.

Related

In python, how can I replace first value of a series of matrices by a random number?

I want to simulate N people, each over T time periods; currently the value of p for each person at T=0 is 0. How can I write this so that, instead of zeros in the first time period, I have a random number chosen from a distribution, with the other values remaining equal?
N = 100
T = 70
p = np.zeros((N, T))  # N people, T time periods
p[:, 0] = 0.0         # first time period is currently all zeros
If you need to replace the first value in each row of the 2D array with a value taken from an earlier-defined array, you can do it this way. I have added an example for your reference.
p = [[0, 1, 1, 1, 1, 1], [0, 2, 2, 3, 3, 3], [0, 2, 3, 3, 3, 4]]
array = [0.6, 0.5, 0.9]
for i in range(len(p)):
    if p[i][0] == 0:
        p[i][0] = array[i]  # use random() if you want to replace with a random number
print(p)
output: [[0.6, 1, 1, 1, 1, 1], [0.5, 2, 2, 3, 3, 3], [0.9, 2, 3, 3, 3, 4]]
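If p is a NumPy array, the loop can be avoided entirely by assigning the whole first column at once. A minimal sketch, assuming a standard normal distribution (the loc/scale parameters are placeholders; swap in whatever np.random draw matches your model):
import numpy as np

N, T = 100, 70
p = np.zeros((N, T))
# one draw per person for the first time period; all other columns stay as they were
p[:, 0] = np.random.normal(loc=0.0, scale=1.0, size=N)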

How to adjust for loop so that it prints list only once?

I have some sample code that goes as follows:
import numpy as np
import pandas as pd
x = range(1, 12)
arr1 = np.random.randint(x)
arr2 = np.array(x)
arr3 = np.random.randint(x)
arr4 = np.random.randint(x)
arr5 = np.random.randint(0, 2, 11)
dict_df = {
    'arr1': arr1,
    'arr2': arr2,
    'arr3': arr3,
    'arr4': arr4,
    'arr5': arr5,
}
d = pd.DataFrame(dict_df)
num_count = 0
list_of_num = []
for i in d.index:
    number = d['arr1'][i]
    for num in d['arr5']:
        if num == 1:
            num_count = 1
            number = number
            list_of_num.append(number)
        elif num == 0:
            num_count = 0
print(list_of_num)
I am trying to build a list into which all of the ones in column arr5 are appended if they are preceded by a -1. The output I am receiving from this is:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 6, 6, 6, 6, 6, 6, 1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 2, 2, 2, 2, 2, 2, 8, 8, 8, 8, 8, 8]
The issue with the code is that I am misusing for loops, which is why the list repeats itself so many times. How can I change the code so that it does not repeat itself?
It appears that you did not intend to nest your loops. The outer loop steps over each row. The inner loop then loops over each row for each iteration of the outer loop. To move along two columns in lockstep, you can write a single loop:
for i in d.index:
    if d['arr5'][i]:
        list_of_num.append(d['arr1'][i])
        num_count += 1
This is of course extremely inefficient and discards all the benefits of using numpy or pandas in the first place. You can accomplish the same thing using boolean masks. In numpy:
array_of_num = arr1[arr5.astype(bool)]
num_count = array_of_num.size
In pandas:
series_of_num = d['arr1'][d['arr5'].astype(bool)]
num_count = series_of_num.size
In both cases, you can replace .astype(bool) with != 0.
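For concreteness, here is a small self-contained sketch of the mask idea (the array values are made up for illustration):
import numpy as np

arr1 = np.array([4, 7, 1, 9, 2])
arr5 = np.array([1, 0, 1, 1, 0])
mask = arr5 != 0     # equivalent to arr5.astype(bool)
print(arr1[mask])    # [4 1 9] -- the arr1 values where arr5 is 1
print(mask.sum())    # 3 -- the count of selected values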

How to replace a list comprehension with a numpy command?

Is there a way to replace the following Python list comprehension with a numpy function that doesn't use explicit loops?
a = np.array([0, 1, 1, 1, 0, 3])
bins = np.bincount(a)
>>> bins: [2 3 0 1]
a_counts = [bins[val] for val in a]
>>> a_counts: [2, 3, 3, 3, 2, 1]
So the basic idea is to generate an array where the actual values are replaced by the number of occurrences of that specific value in the array.
I want to do this calculation in a custom keras loss function which, to my knowledge, doesn't work with loops or list comprehensions.
You just need to index the result from np.bincount with a:
a = np.array([0, 1, 1, 1, 0, 3])
bins = np.bincount(a)
a_counts = bins[a]
print(a_counts)
# array([2, 3, 3, 3, 2, 1], dtype=int64)
Or use collections.Counter:
from collections import Counter
l = [0, 1, 1, 1, 0, 3]
print(Counter(l))
Which Outputs:
Counter({1: 3, 0: 2, 3: 1})
If you want to avoid loops, you may use pandas library:
import pandas as pd
import numpy as np
a = np.array([0, 1, 1, 1, 0, 3])
a_counts = pd.value_counts(a)[a].values
>>> a_counts: array([2, 3, 3, 3, 2, 1], dtype=int64)
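If the array may contain negative or very large values, for which np.bincount either fails or wastes memory, np.unique with return_inverse is an alternative worth knowing; a sketch:
import numpy as np

a = np.array([0, 1, 1, 1, 0, 3])
# counts[inverse] maps every element to the count of its own value
_, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
print(counts[inverse])  # [2 3 3 3 2 1]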

Finding unique sets without subsets in python array

I have a dataset that needs to output boolean style data, just 1 and 0, for true or not true. I am trying to parse simple data sets I've processed to look for a subset of information in a numpy array, the array is about 100,000 elements in one direction and 20 in the other. I only need to search along the 20 axis, but I need to do that for each of the 100,000 entries and get output that I can map.
I've produced an array of this size made up of zeros, with the intention of simply setting the matching index indicator to 1. A main hitch is that if I find a long set (I'm working from long sets down to small sets), I need to NOT include any smaller set that's within it.
Sample:
[0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,1,0,1]
I need to find here that there is one group of 5 starting at index 2 and one group of 3 starting at index 9, and not return any subset of the group of 5 as though it were a group of 4 or a group of 3, thus leaving the results zero for all those already-covered values; i.e. for groups of 3, the indices 2, 3, 4, 5, and 6 would all remain zero. It doesn't need to be overly efficient; I don't care if it searches anyway, I just need to not keep the result.
Currently I'm using a codeblock basically like this for a simple search:
values = numpy.array([0,1,1,1,1,1,0,0,1,1,1])
searchval = [1,2]
N = len(searchval)
possibles = numpy.where(values == searchval[0])[0]
print(possibles)
solns = []
for p in possibles:
    check = values[p:p+N]
    if numpy.all(check == searchval):
        solns.append(p)
print(solns)
I've been wracking my brain trying to come up with a way to restructure this or similar code to produce the desired results. The end goal is to search for groups of 9 down to groups of 3, producing effectively a matrix of 1s and 0s indicating whether an index has a group starting on it that is as long as we want.
Hopefully someone can point me to what I'm missing to make this work. Thanks!
Using more_itertools, a third-party library (pip install more_itertools):
import more_itertools as mit
sample = [0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1]
groups = [list(c) for c in mit.consecutive_groups(mit.locate(sample))]
d = {group[0]: len(group) for group in groups}
d
# {2: 5, 9: 3, 15: 1, 17: 1}
This result reads "At index 2 is a group of 5 ones. At index 9 is a group of 3 ones," etc.
Details
more_itertools.locate finds indices for truthy items by default.
more_itertools.consecutive_groups chunks consecutive numbers together.
The result is a dictionary of (starting-index, length) pairs.
As a dictionary, you can extract different kinds of information:
>>> # List of starting indices
>>> list(d)
[2, 9, 15, 17]
>>> # List indices for all lonely groups
>>> [k for k, v in d.items() if v == 1]
[15, 17]
>>> # List indices of groups with more than 1 item
>>> [k for k, v in d.items() if v > 1]
[2, 9]
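From there, the boolean-style output array the question asks for can be built by scattering the starts of runs of a chosen length into a zero array; a minimal sketch:
import numpy as np

length = 5
out = np.zeros(len(sample), dtype=int)
for start, run_len in d.items():
    if run_len == length:
        out[start] = 1  # mark where a run of exactly `length` ones begins
print(out)
# [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]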
Here is a numpy solution. I'm using a small example for demonstration but it easily scales (20 x 100,000 takes 25 ms on my rather modest laptop, see timings at the end of this post):
>>> import numpy as np
>>>
>>>
>>> a = np.random.randint(0, 2, (5, 10), dtype=np.int8)
>>> a
array([[0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
       [0, 1, 1, 0, 1, 0, 1, 0, 0, 0],
       [1, 0, 1, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
       [0, 0, 1, 0, 1, 1, 1, 1, 0, 0]], dtype=int8)
>>>
>>> padded = np.pad(a, ((1,1), (0,0)), 'constant')
>>> # compare the array to itself with an offset to mark all switches
>>> # from 0 to 1 or from 1 to 0, then use 'where' to extract the coordinates
>>> colinds, rowinds = np.where((padded[:-1] != padded[1:]).T)
>>>
>>> # the lengths of sets are the differences between switch points
>>> lengths = rowinds[1::2] - rowinds[::2]
>>> # now that we have the lengths, we are free to throw the off-switches away
>>> colinds, rowinds = colinds[::2], rowinds[::2]
>>>
>>> # admire
>>> from pprint import pprint
>>> pprint(list(zip(colinds, rowinds, lengths)))
[(0, 2, 1),
 (1, 0, 2),
 (2, 1, 2),
 (2, 4, 1),
 (3, 2, 1),
 (4, 0, 5),
 (5, 0, 1),
 (5, 2, 1),
 (5, 4, 1),
 (6, 1, 1),
 (6, 3, 2),
 (7, 4, 1)]
Timings:
>>> def find_stretches(a):
...     padded = np.pad(a, ((1,1), (0,0)), 'constant')
...     colinds, rowinds = np.where((padded[:-1] != padded[1:]).T)
...     lengths = rowinds[1::2] - rowinds[::2]
...     colinds, rowinds = colinds[::2], rowinds[::2]
...     return colinds, rowinds, lengths
...
>>> a = np.random.randint(0, 2, (20, 100000), dtype=np.int8)
>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=100)
>>> repeat('find_stretches(a)', **kwds)
[2.475784719004878, 2.4715258619980887, 2.4705517270049313]
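If the end goal is the 0/1 indicator matrix from the question, the (column, start, length) triples can be scattered back into a zero array. A sketch on top of find_stretches (mark_runs is a hypothetical helper; matching lengths exactly keeps subsets of longer runs from being marked, and runs go down each column, matching the axis find_stretches scans):
import numpy as np

def mark_runs(a, length):
    # put a 1 wherever a run of exactly `length` ones starts
    colinds, rowinds, lengths = find_stretches(a)
    out = np.zeros_like(a)
    keep = lengths == length
    out[rowinds[keep], colinds[keep]] = 1
    return out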
Something like this?
from collections import defaultdict
sample = [0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1]
# Keys are numbers of consecutive 1's, values are lists of starting indices
results = defaultdict(list)
found = 0
for i, x in enumerate(sample):
    if x == 1:
        found += 1
    elif i == 0 or found == 0:
        continue
    else:
        results[found].append(i - found)
        found = 0
if found:
    results[found].append(i - found + 1)
assert results == {1: [15, 17], 3: [9], 5: [2]}

Fill zero values of 1d numpy array with last non-zero values

Let's say we have a 1d numpy array filled with some int values. And let's say that some of them are 0.
Is there any way, using numpy array's power, to fill all the 0 values with the last non-zero values found?
for example:
arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
fill_zeros_with_last(arr)
print(arr)
[1 1 1 2 2 4 6 8 8 8 8 8 2]
A way to do it would be with this function:
def fill_zeros_with_last(arr):
    last_val = None  # I don't really care about the initial value
    for i in range(arr.size):
        if arr[i]:
            last_val = arr[i]
        elif last_val is not None:
            arr[i] = last_val
However, this is using a raw python for loop instead of taking advantage of the numpy and scipy power.
If we knew that a reasonably small number of consecutive zeros are possible, we could use something based on numpy.roll. The problem is that the number of consecutive zeros is potentially large...
Any ideas? or should we go straight to Cython?
Disclaimer:
I could swear that long ago I found a question on Stack Overflow asking something like this, or something very similar, but I wasn't able to find it. :-(
Maybe I missed the right search terms (sorry for the duplicate, if so), or maybe it was just my imagination...
Here's a solution using np.maximum.accumulate:
def fill_zeros_with_last(arr):
    prev = np.arange(len(arr))
    prev[arr == 0] = 0
    prev = np.maximum.accumulate(prev)
    return arr[prev]
We construct an array prev which has the same length as arr, such that prev[i] is the index of the last non-zero entry at or before the i-th entry of arr. For example, if:
>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
Then prev looks like:
array([ 0, 0, 0, 3, 3, 5, 6, 7, 7, 7, 7, 7, 12])
Then we just index into arr with prev and we obtain our result. A test:
>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
>>> fill_zeros_with_last(arr)
array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])
Note: Be careful to understand what this does when the first entry of your array is zero:
>>> fill_zeros_with_last(np.array([0,0,1,0,0]))
array([0, 0, 1, 1, 1])
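If you'd rather have leading zeros filled with an explicit default, one small tweak is to prepend that default as a sentinel before running the same logic; a sketch (the initial parameter is my addition, not part of the answer above):
import numpy as np

def fill_zeros_with_last(arr, initial=0):
    arr = np.r_[initial, arr]  # prepend the default so leading zeros have a donor
    prev = np.arange(len(arr))
    prev[arr == 0] = 0
    prev = np.maximum.accumulate(prev)
    return arr[prev][1:]       # drop the sentinel again

print(fill_zeros_with_last(np.array([0, 0, 1, 0, 0]), initial=-1))
# [-1 -1  1  1  1]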
Inspired by jme's answer here and by Bas Swinckels' (in the linked question) I came up with a different combination of numpy functions:
def fill_zeros_with_last(arr, initial=0):
    ind = np.nonzero(arr)[0]
    cnt = np.cumsum(np.array(arr, dtype=bool))
    return np.where(cnt, arr[ind[cnt-1]], initial)
I think it's succinct and also works, so I'm posting it here for the record. Still, jme's is also succinct and easy to follow and seems to be faster, so I'm accepting it :-)
If the 0s only come in runs of length 1, this use of nonzero might work:
In [266]: arr=np.array([1,0,2,3,0,4,0,5])
In [267]: I=np.nonzero(arr==0)[0]
In [268]: arr[I] = arr[I-1]
In [269]: arr
Out[269]: array([1, 1, 2, 3, 3, 4, 4, 5])
I can handle your arr by applying this repeatedly until I is empty.
In [286]: arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
In [287]: while True:
   .....:     I = np.nonzero(arr==0)[0]
   .....:     if len(I) == 0: break
   .....:     arr[I] = arr[I-1]
   .....:
In [288]: arr
Out[288]: array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])
If the runs of 0s are long, it might be better to look for those runs and handle them as blocks. But if most runs are short, this repeated application may be the fastest route.
