Find consecutive ones in numpy array - python

How can I find the number of consecutive 1s (or any other value) in each row of the following numpy array? I need a pure numpy solution.
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
       [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
There are two parts to my question. First: what is the maximum number of consecutive 1s in each row? It should be
array([2,3,2])
in the example case.
And second, what is the index of the start of the first set of multiple consecutive 1s in each row? For the example case this would be
array([3,9,9])
In this example I used runs of 2 consecutive 1s, but it should be possible to change that to runs of 5 consecutive 1s; this is important.
A similar question was answered using np.unique, but it only works for one row and not an array with multiple rows, as the results would have different lengths.

Here's a vectorized approach based on differentiation -
import numpy as np
import pandas as pd
# `counts` is the input 2D array; append a zeros column on either side of it
append1 = np.zeros((counts.shape[0], 1), dtype=int)
counts_ext = np.column_stack((append1, counts, append1))
# Get start and stop indices of the islands of 1s, with the edges as triggers
diffs = np.diff((counts_ext == 1).astype(int), axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Pair each island's row index with its length (stop minus start)
start_stop = np.column_stack((starts[:, 0], stops[:, 1] - starts[:, 1]))
# For each row, pick the maximum island length with a pandas groupby
SS_df = pd.DataFrame(start_stop)
out = start_stop[SS_df.groupby([0], sort=False)[1].idxmax(), 1]
Sample input, output -
Original sample case :
In [574]: counts
Out[574]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
       [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
In [575]: out
Out[575]: array([2, 3, 2], dtype=int64)
Modified case :
In [577]: counts
Out[577]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 1, 1],
       [0, 0, 0, 4, 1, 1, 1, 1, 1, 0, 1, 0]])
In [578]: out
Out[578]: array([2, 4, 5], dtype=int64)
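The second part of the question (the start index of the first run of at least N consecutive 1s per row) can be answered from the same starts and stops. Here is a minimal sketch, with N and first_start as illustrative names:
N = 2  # minimum run length of interest
lens = stops[:, 1] - starts[:, 1]
first_start = np.full(counts.shape[0], -1)  # -1 marks rows with no such run
for row, col, ln in zip(starts[:, 0], starts[:, 1], lens):
    if ln >= N and first_start[row] == -1:
        first_start[row] = col
# first_start -> array([3, 9, 9]) for the original sample case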
Here's a pure-NumPy version that is identical to the pandas-based one up to the point where we have starts and stops. Here's the full implementation -
# Append a zeros column on either side of counts
append1 = np.zeros((counts.shape[0], 1), dtype=int)
counts_ext = np.column_stack((append1, counts, append1))
# Get start and stop indices of the islands of 1s, with the edges as triggers
diffs = np.diff((counts_ext == 1).astype(int), axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Get island lengths from the differences between start and stop indices
intvs = stops[:, 1] - starts[:, 1]
# Scatter the lengths into a 2D array (one row per input row) for vectorized ops
c = np.bincount(starts[:, 0])
mask = np.arange(c.max()) < c[:, None]
intvs2D = mask.astype(float)
intvs2D[mask] = intvs
# Max along each row gives the final output
out = intvs2D.max(1)
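One small usage note: since intvs2D is initialized from mask.astype(float), the row maxima come back as floats, so a final cast restores integers:
out = intvs2D.max(1).astype(int)  # array([2, 3, 2]) for the sample counts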

A very similar problem is checking whether, within each sorted row, elements a fixed distance apart differ by a certain amount. For example, five consecutive values that each differ by 1 (a straight in a hand of cards) span exactly 4 between the first and fifth card, which can be checked as follows; the same idea works with a difference of 0 between two cards (a pair):
cardAmount = cards[0, :].size
has4 = cards[:, np.arange(4, cardAmount)] - cards[:, np.arange(0, cardAmount - 4)]
isStraight = np.any(has4 == 4, axis=1)
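For instance, with a hypothetical sorted hand per row (values chosen purely for illustration):
import numpy as np
cards = np.array([[2, 3, 4, 5, 6, 11],    # contains the straight 2-3-4-5-6
                  [2, 5, 7, 9, 11, 12]])  # no straight
cardAmount = cards[0, :].size
has4 = cards[:, np.arange(4, cardAmount)] - cards[:, np.arange(0, cardAmount - 4)]
isStraight = np.any(has4 == 4, axis=1)
# isStraight -> array([ True, False])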

Related

Python - Replacing Values Leading Up To 1s in an Array

Pretend I have a pandas Series that consists of 0s and 1s, but this can work with numpy arrays or any iterable. I would like to create a function that takes an array and an input n and returns a new series that contains 1s at the n indices leading up to every 1 in the original series. Here is an example:
array = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1])
> preceding_indices_function(array, 2)
np.array([0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
For each time there is a 1 in the input array, the two indices preceding it are filled in with 1 regardless of whether there is a 0 or 1 in that index in the original array.
I would really appreciate some help on this. Thanks!
Use a convolution with np.convolve:
N = 2
# craft a custom kernel
kernel = np.ones(2*N+1)
kernel[-N:] = 0
# kernel: array([1., 1., 1., 0., 0.])
out = (np.convolve(array, kernel, mode='same') != 0).astype(int)
Because np.convolve flips the kernel, zeroing its last N entries makes each output position sum the current value and the N values that follow it, so any nonzero result marks either a 1 or one of the N indices preceding a 1.
Output:
array([0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
Unless you want to avoid numpy, mozway's answer is the best solution. But since several iterative solutions have been given, I'll add my itertools-based one:
[a or b or c for a,b,c in itertools.zip_longest(array, array[1:], array[2:], fillvalue=0)]
zip_longest is the same as the classical zip, but when the iterables have different lengths, the number of iterations is that of the longest one, with exhausted iterables yielding None — or yielding the fillvalue parameter, if you pass one to zip_longest.
So here itertools.zip_longest(array, array[1:], array[2:], fillvalue=0) gives a sequence of triplets (a, b, c) of 3 subsequent elements (a being the current element, b the next, c the one after that; b and c being 0 when there is no such element).
From there, a simple comprehension builds a list of a or b or c, which is 1 if any of a, b, or c is 1, and 0 otherwise.
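As a self-contained check against the question's example (the comprehension returns a plain list):
import itertools
array = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
out = [a or b or c
       for a, b, c in itertools.zip_longest(array, array[1:], array[2:], fillvalue=0)]
# out == [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]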
import numpy as np
array = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1])
array = np.array([a or array[idx+1] or array[idx+2] for idx, a in enumerate(array[:-2])]
                 + [array[-2] or array[-1]] + [array[-1]])
this function works if array is a list; it should work with other mutable sequences as well:
def preceding_indices_function(array, n):
    for i in range(len(array)):
        if array[i] == 1:
            for j in range(n):
                if i - j - 1 >= 0:
                    array[i-j-1] = 1
    return array
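For example, on the question's input (note that the function also modifies the list in place):
arr = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
print(preceding_indices_function(arr, 2))
# [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]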
I got a solution that is similar to the other one but slightly simpler in my opinion (note that it drops the last two elements, which would need to be appended separately as in the answer above):
>>> [1 if (array[i+1] == 1 or array[i+2] == 1) else x for i,x in enumerate(array) if i < len(array) - 2]
[0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]

Optimization Problem in Python (using binary array) possible solutions

I have a problem where I have a large binary numpy array (1000,2000). The general idea is that the array's columns represent time from 0 to 2000 and each row represents a task. Each 0 in the array represents a failure and each 1 represents success.
What I need to do is select 150 tasks (rows) out of the 1000 available and maximize the total successes (1s) over unique columns. The successes do not have to be consecutive; we are just looking to maximize success per time period (only one success per column is needed; any additional are extraneous). I would like to select the optimal "basket" of 150 tasks, taken from anywhere among the 1000 initial rows, that leads to the most success across time (columns). (Edited for additional clarity.)
A real basic example of what the array looks like :
array([[0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0],
       [0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1],
       [0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0],
       [1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
       [1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0],
       [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0]])
I have successfully created a Monte Carlo simulation in NumPy that draws randomly generated baskets of tasks and then walks through the array and sums. As you can imagine this takes a while, and given the large number of potential combinations it is inefficient. Can someone point me to an algorithm, or a way to set this problem up in a solver like PuLP?
Try this:
n = 150
row_sums = np.sum(x, axis=1)
top_n_row_sums = np.argsort(row_sums)[-n:]
max_successes = x[top_n_row_sums]
This takes the sum of every row, grabs the indices of the highest n sums, and indexes into x with those row indices.
Note that the rows will end up sorted in ascending order of their sums across the columns. If you want the rows in normal order (ascending order by index), use this instead:
max_successes = x[sorted(top_n_row_sums)]
Why not just calculate the sum of successes for each row? Then you can easily pick the top 150 values.
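For the PuLP part of the question, the unique-column objective can be written as a maximum-coverage integer program. A minimal sketch, assuming PuLP with its bundled CBC solver; the array, sizes, and all names here are illustrative stand-ins, not the asker's setup:
import numpy as np
import pulp

rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(30, 40))  # stand-in for the (1000, 2000) array
n_select = 5                              # stand-in for 150

prob = pulp.LpProblem("max_column_coverage", pulp.LpMaximize)
pick = [pulp.LpVariable(f"pick_{i}", cat="Binary") for i in range(data.shape[0])]
covered = [pulp.LpVariable(f"cov_{j}", cat="Binary") for j in range(data.shape[1])]

prob += pulp.lpSum(covered)               # maximize columns with at least one 1
prob += pulp.lpSum(pick) == n_select      # choose exactly n_select rows
for j in range(data.shape[1]):
    # a column only counts as covered if some picked row has a 1 in it
    prob += covered[j] <= pulp.lpSum(pick[i] for i in range(data.shape[0])
                                    if data[i, j] == 1)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
basket = [i for i, v in enumerate(pick) if v.value() == 1]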

Python: distance from index to 1s in binary mask

I have a binary mask like this:
X = [[0, 0, 0, 0, 0, 1],
     [0, 0, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 1, 1, 1, 1],
     [0, 0, 1, 1, 1, 1],
     [0, 0, 0, 1, 1, 1]]
I have a certain index in this array and want to compute the distance from that index to the closest 1 in the mask. If there's already a 1 at that index, the distance should be zero.
Examples (assuming Manhattan distance):
distance(X, idx=(0, 5)) == 0 # already is a 1 -> distance is zero
distance(X, idx=(1, 2)) == 2 # second row, third column
distance(X, idx=(0, 0)) == 5 # upper left corner
Is there already existing functionality like this in Python/NumPy/SciPy? Both Euclidean and Manhattan distance would be fine.
I'd prefer to avoid computing distances for the entire matrix (as that is pretty big in my case), and only get the distance for my one index.
Here's one for the Manhattan distance metric for a single entry -
def bwdist_manhattan_single_entry(X, idx):
    nz = np.argwhere(np.asarray(X) == 1)
    return np.abs(idx - nz).sum(1).min()
Sample run -
In [143]: bwdist_manhattan_single_entry(X, idx=(0,5))
Out[143]: 0
In [144]: bwdist_manhattan_single_entry(X, idx=(1,2))
Out[144]: 2
In [145]: bwdist_manhattan_single_entry(X, idx=(0,0))
Out[145]: 5
Optimize further on performance by extracting only the boundary elements of the blobs of 1s -
from scipy.ndimage.morphology import binary_erosion

def bwdist_manhattan_single_entry_v2(X, idx):
    k = np.ones((3, 3), dtype=int)
    nz = np.argwhere((np.asarray(X) == 1) & (~binary_erosion(X, k, border_value=1)))
    return np.abs(idx - nz).sum(1).min()
The number of elements in nz with this method is smaller than with the earlier one, hence the performance improves.
You can use scipy.ndimage.morphology.distance_transform_cdt (scipy.ndimage.distance_transform_cdt in recent SciPy versions) to compute the "taxicab" (Manhattan) distance transform:
import numpy as np
import scipy.ndimage.morphology
x = np.array([[0, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 1, 1, 1, 1],
              [0, 0, 1, 1, 1, 1],
              [0, 0, 0, 1, 1, 1]])
d = scipy.ndimage.morphology.distance_transform_cdt(1 - x, 'taxicab')
print(d[0, 5])
# 0
print(d[1, 2])
# 2
print(d[0, 0])
# 5
You can do it like this:
def Manhattan_distance(X, idx):
    return min(abs(i - idx[0]) + abs(j - idx[1])
               for i, row in enumerate(X)
               for j, val in enumerate(row)
               if val == 1)
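Checked against the question's examples:
print(Manhattan_distance(X, (0, 5)))  # 0
print(Manhattan_distance(X, (1, 2)))  # 2
print(Manhattan_distance(X, (0, 0)))  # 5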

Efficient way to subset and combine arrays of different lengths

Given 3-dimensional boolean data:
np.random.seed(13)
bool_data = np.random.randint(2, size=(2,3,6))
>> bool_data
array([[[0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 1]],
       [[1, 0, 1, 1, 0, 0],
        [0, 1, 1, 1, 1, 0],
        [1, 1, 1, 0, 0, 0]]])
I wish to count the number of consecutive 1's bounded by two 0's in each row (along the last axis) and return a single array with the tally. For bool_data, this would give array([1, 1, 2, 4]).
Due to the 3D structure of bool_data and the variable tallies for each row, I had to clumsily convert the tallies into nested lists, flatten them using itertools.chain, then back-convert the list into an array:
# count consecutive 1's bounded by two 0's
def count_consect_ones(input):
    return np.diff(np.where(input == 0)[0]) - 1

# run tallies across all rows in bool_data
consect_ones = []
for i in range(len(bool_data)):
    for j in range(len(bool_data[i])):
        res = count_consect_ones(bool_data[i, j])
        consect_ones.append(list(res[res != 0]))
>> consect_ones
[[], [1, 1], [], [2], [4], []]
# combines nested lists
from itertools import chain
consect_ones_output = np.array(list(chain.from_iterable(consect_ones)))
>> consect_ones_output
array([1, 1, 2, 4])
Is there a more efficient or clever way for doing this?
consect_ones.append(list(res[res!=0]))
If you use .extend instead, the contents of the sequence are appended directly. That saves the step of combining the nested lists afterwards:
consect_ones.extend(res[res!=0])
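A quick illustration of the difference:
out = []
out.append([1, 1])  # out == [[1, 1]] -> nested, needs flattening later
out = []
out.extend([1, 1])  # out == [1, 1]   -> already flat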
Furthermore, you could skip the indexing, and iterate over the dimensions directly:
consect_ones = []
for i in bool_data:
    for j in i:
        res = count_consect_ones(j)
        consect_ones.extend(res[res != 0])
We could use a trick: pad the columns with zeros, look for ramp-up and ramp-down indices on a flattened version, and finally filter out the indices corresponding to islands at the borders, giving a vectorized solution, like so -
# Input 3D array : a
b = np.pad(a, ((0,0), (0,0), (1,1)), 'constant', constant_values=(0,0))
# Get ramp-up and ramp-down indices, i.e. start/end indices of the islands of 1s
s0 = np.flatnonzero(b[...,1:] > b[...,:-1])
s1 = np.flatnonzero(b[...,1:] < b[...,:-1])
# Filter out the islands that touch the borders
n = b.shape[2]
valid_mask = (s0 % (n-1) != 0) & (s1 % (n-1) != a.shape[2])
out = (s1 - s0)[valid_mask]
Explanation -
The idea behind padding zeros at either end of each row as sentinels is that, when we compare one-off sliced versions of the padded array, b[...,1:] > b[...,:-1] and b[...,1:] < b[...,:-1] detect the ramp-up and ramp-down positions respectively. Thus we get s0 and s1 as the start and end indices of each island of 1s. Since we don't want islands touching the borders, we trace the column indices back to the original un-padded array with s0 % (n-1) and s1 % (n-1), and drop every island that starts at the left border (s0 % (n-1) == 0) or ends at the right border (s1 % (n-1) == a.shape[2]). The island lengths are s1 - s0, masked with valid_mask to get the desired output.
Sample input, output -
In [151]: a
Out[151]:
array([[[0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 1]],
       [[1, 0, 1, 1, 0, 0],
        [0, 1, 1, 1, 1, 0],
        [1, 1, 1, 0, 0, 0]]])
In [152]: out
Out[152]: array([1, 1, 2, 4])

Slicing different rows of a numpy array differently

I'm working on a Monte Carlo radiative transfer code, which simulates firing photons through a medium and statistically modelling their random walk. It runs slowly firing one photon at a time, so I'd like to vectorize it and run perhaps 1000 photons at once.
I have divided my slab through which the photons are passing into nlayers slices between optical depth 0 and depth. Effectively, that means that I have nlayers + 2 regions (nlayers plus the region above the slab and the region below the slab). At each step, I have to keep track of which layers each photon passes through.
Let's suppose that I already know that two photons start in layer 0. One takes a step and ends up in layer 2, and the other takes a step and ends up in layer 6. This is represented by an array pastpresent that looks like this:
[[0 2]
 [0 6]]
I want to generate an array traveled_through with (nlayers + 2) columns and 2 rows, describing whether photon i passed through layer j (endpoint-inclusive). It would look something like this (with nlayers = 10):
[[1 1 1 0 0 0 0 0 0 0 0 0]
 [1 1 1 1 1 1 1 0 0 0 0 0]]
I could do this by iterating over the photons and generating each row of traveled_through individually, but that's rather slow, and sort of defeats the point of running many photons at once, so I'd rather not do that.
I tried to define the array as follows:
traveled_through = np.zeros((2, nlayers)).astype(int)
traveled_through[ : , np.min(pastpresent, axis = 1) : np.max(pastpresent, axis = 1) + 1 ] = 1
The idea was that in a given photon's row, the indices from the starting layer through and including the ending layer would be set to 1, with all others remaining 0. However, I get the following error:
traveled_through[ : , np.min(pastpresent, axis = 1) : np.max(pastpresent, axis = 1) + 1 ] = 1
IndexError: invalid slice
My best guess is that numpy does not allow different rows of an array to be indexed differently using this method. Does anyone have suggestions for how to generate traveled_through for an arbitrary number of photons and an arbitrary number of layers?
If the two photons always start at 0, you could perhaps construct your array as follows.
First setting the variables...
>>> pastpresent = np.array([[0, 2], [0, 6]])
>>> nlayers = 10
...and then constructing the array:
>>> (pastpresent[:,1][:,np.newaxis] + 1 > np.arange(nlayers+2)).astype(int)
array([[1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]])
Or if the photons have an arbitrary starting layer:
>>> pastpresent2 = np.array([[1, 7], [3, 9]])
>>> ((pastpresent2[:,0][:,np.newaxis] < np.arange(nlayers+2)) &
...  (pastpresent2[:,1][:,np.newaxis] + 1 > np.arange(nlayers+2))).astype(int)
array([[0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]])
A little trick I kind of like for this kind of thing involves the accumulate method of the logical_xor ufunc:
>>> a = np.zeros(10, dtype=int)
>>> b = [3, 7]
>>> a[b] = 1
>>> a
array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0])
>>> np.logical_xor.accumulate(a, out=a)
array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])
Note that this sets to 1 the entries between the positions in b, first index inclusive, last index exclusive, so you have to handle off by 1 errors depending on what exactly you are after.
With several rows, you could make it work as:
>>> a = np.zeros((3, 10), dtype=int)
>>> b = np.array([[1, 7], [0, 4], [3, 8]])
>>> b[:, 1] += 1 # handle the off by 1 error
>>> a[np.arange(len(b))[:, None], b] = 1
>>> a
array([[0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
       [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 1]])
>>> np.logical_xor.accumulate(a, axis=1, out=a)
array([[0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
       [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]])
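Tying the trick back to the original photon problem, here is a sketch of the end-to-end use (endpoint-inclusive, assuming every end layer is at most nlayers + 1 so the off-by-one shift stays in bounds):
import numpy as np

nlayers = 10
pastpresent = np.array([[0, 2], [0, 6]])

traveled_through = np.zeros((len(pastpresent), nlayers + 2), dtype=int)
b = np.sort(pastpresent, axis=1)  # ensure start <= end for each photon
b[:, 1] += 1                      # make the end layer inclusive
traveled_through[np.arange(len(b))[:, None], b] = 1
np.logical_xor.accumulate(traveled_through, axis=1, out=traveled_through)
# traveled_through -> [[1 1 1 0 0 0 0 0 0 0 0 0]
#                      [1 1 1 1 1 1 1 0 0 0 0 0]]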
