Numpy vectorization with calculation depending on previous value(s) - python

Is there a vectorized way to change all concurrent 1s that are within offset of the first 1 into 0s (transform A into B)? I'm currently trying to do this on a numpy array with over 1 million items where speed is critical.
The 1s represent a signal trigger and the 0s represent no trigger. For example: Given an offset of 5, whenever there is a 1, the following 5 items must be 0 (to remove signal concurrency).
Example 1:
offset = 3
A = np.array([1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0])
B = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
Example 2:
offset = 2
A = np.array([1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0])
B = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0])

From the comments, it seems the question is not strictly about using NumPy and …; the main objective is to speed up the code. Since you are using the partial solution mentioned by JohanC (which needs much more consideration for this question), I suggest the following methods:
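def com_():
    # Works in place on the global NumPy array A
    n = 1
    for i in range(1, len(A)+1):
        if A[n-1] == 1:
            A[n:n+offset] = 0
            n += offset + 1
        else:
            n += 1
        if n > len(A):
            break

# @nb.jit(forceobj=True)  (numba decorator, assuming "import numba as nb")
def com_fast():
    # Works on a list copy of A and leaves A itself untouched
    B = A.tolist()
    n = 1
    while n < len(B):
        if B[n-1] == 1:
            for i in range(offset):
                if n+i < len(B):
                    B[n+i] = 0
            n += offset + 1
        else:
            n += 1
    return B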
The first method works on A as a NumPy array with loops. The second takes the input as a list, also with loops, and is accelerated by numba, as mentioned by hpaulj in the comments.
Using the same inputs (1,000,000 in length) for both methods, running on Google Colab TPU:
1000 loops, best of 5: 153 ms per loop # for com_()
1000 loops, best of 5: 10.2 ms per loop # for com_fast()
I think these runtimes are acceptable for data that large.
I do not think this question can be solved by NumPy alone; if it can, it would be very difficult and need a lot of thought (I tried and got good results, but in the end it still needed loops). My guess is that numba and similar libraries give comparable runtimes, so there is no need to restrict yourself to NumPy.
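As a point of comparison, the same greedy rule can be sketched by visiting only the positions that hold a 1 instead of every element. This is a minimal sketch, assuming the input contains only 0s and 1s; the function name suppress_within_offset is made up for illustration and does not come from the answer above. It still loops in Python, so it is not a fully vectorized solution.

```python
import numpy as np

def suppress_within_offset(a, offset):
    """Keep a 1 only if it is more than `offset` positions after the last kept 1."""
    out = np.zeros_like(a)
    next_allowed = 0  # first index at which a new trigger may be kept
    # Visit only the indices that hold a 1 (assumes a contains just 0s and 1s)
    for i in np.flatnonzero(a).tolist():
        if i >= next_allowed:
            out[i] = 1
            next_allowed = i + offset + 1
    return out

A = np.array([1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0])
B = suppress_within_offset(A, 3)
```

The loop runs once per 1 in the input rather than once per element, so any gain depends on how sparse the triggers are.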

Related

Python - Replacing Values Leading Up To 1s in an Array

Suppose I have a pandas Series that consists of 0s and 1s (this should also work with numpy arrays or any iterable). I would like a function that takes an array and an input n and returns a new series that contains 1s at the n indices leading up to every 1 in the original series. Here is an example:
array = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1])
> preceding_indices_function(array, 2)
np.array([0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
For each time there is a 1 in the input array, the two indices preceding it are filled in with 1 regardless of whether there is a 0 or 1 in that index in the original array.
I would really appreciate some help on this. Thanks!
Use a convolution with np.convolve:
N = 2
# craft a custom kernel
kernel = np.ones(2*N+1)
kernel[-N:] = 0
# array([1, 1, 1, 0, 0])
out = (np.convolve(array, kernel, mode='same') != 0).astype(int)
Output:
array([0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
Unless you want to avoid numpy, mozway's transpose is the best solution.
But since several iterative solutions have been given, I'll add my itertools-based one:
[a or b or c for a,b,c in itertools.zip_longest(array, array[1:], array[2:], fillvalue=0)]
zip_longest works like the classic zip, but if the iterables have different lengths, it runs for as many iterations as the longest one, and exhausted iterables yield None unless you pass a fillvalue parameter.
So here itertools.zip_longest(array, array[1:], array[2:], fillvalue=0) gives a sequence of triplets (a,b,c) of 3 consecutive elements (a being the current element, b the next, c the one after; b and c are 0 if there is no next element or no element after the next).
From there, a simple comprehension builds a list of [a or b or c], which is 1 if a, b or c is 1, and 0 otherwise.
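The triplet idea above is fixed to n = 2. A possible generalization to any n is sketched below; the name fill_preceding is hypothetical, and the sketch assumes the input is a sliceable sequence of 0s and 1s.

```python
from itertools import zip_longest

def fill_preceding(values, n):
    """Return a list with 1 wherever the element or any of the next n elements is 1."""
    # values[k:] is the sequence shifted left by k positions
    shifted = [values[k:] for k in range(n + 1)]
    # zip_longest pads the shorter, shifted copies with 0 at the end
    return [int(any(group)) for group in zip_longest(*shifted, fillvalue=0)]

array = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
result = fill_preceding(array, 2)
```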
import numpy as np
array = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1])
array = np.array([a or array[idx+1] or array[idx+2] for idx, a in enumerate(array[:-2])] + [array[-2] or array[-1]] + [array[-1]])
This function works if array is a list (it modifies the list in place), and should work with other mutable sequences as well:
def preceding_indices_function(array, n):
    for i in range(len(array)):
        if array[i] == 1:
            for j in range(n):
                if i-j-1 >= 0:
                    array[i-j-1] = 1
    return array
I got a solution that is similar to the other one but slightly simpler in my opinion (note that the result omits the last two elements):
>>> [1 if (array[i+1] == 1 or array[i+2] == 1) else x for i,x in enumerate(array) if i < len(array) - 2]
[0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]

Itertools with conditions in python 3

I am trying to generate vectors of numbers from [0, ..., k-1] with length k^n, where n and k are given beforehand.
k = 4
n = 2
args = list(product(range(k), repeat=n))
# vector=str([i for i in range(k)]*(n+1))
for i in product(range(k), repeat=k ** n):
    if (check(i, args)): print(i)
The commented line is not important; it was just an idea of mine.
I need to generate these vectors under a condition: each number from [0; k-1] must appear in a vector at least n times. So this is a task about permutations with replacement, with a special condition on how often each number appears. What should I do?
For example, with k=2, n=2 I have vectors of 4 elements and want to see 0 and 1 each TWO or more times.
I should get 0011 0101 0110 1001 1010 1100
The example is easy, but with k=5, n=2 (for example) the vector has size 25, and I want each of 0 1 2 3 4 to appear at least 2 times while the other 15 entries can be anything from 0 1 2 3 4, which makes it difficult.
UPDATE:
Here is a solution that generates the necessary combinations only. It is in principle faster, although the complexity is still exponential and you can quickly hit the limits of recursion.
def my_vectors(k, n):
    # Minimum repetitions per element
    base_repetitions = [n] * k
    # "Unassigned" repetitions
    rest = k ** n - k * n
    # List reused for permutation construction
    permutation = [-1] * (k ** n)
    # For each possible repetition assignment
    for repetitions in make_repetitions(base_repetitions, rest):
        # Make all possible permutations
        yield from make_permutations(repetitions, permutation)

# Finds all possible repetition assignments
def make_repetitions(repetitions, rest, first=0):
    if rest <= 0:
        yield repetitions
    else:
        for i in range(first, len(repetitions)):
            repetitions[i] += 1
            yield from make_repetitions(repetitions, rest - 1, i)
            repetitions[i] -= 1

# Make all permutations with repetitions
def make_permutations(repetitions, permutation, idx=0):
    if idx >= len(permutation):
        # Copy into a tuple so the printed output matches the format shown below
        yield tuple(permutation)
        # If you are going to use the permutation within a loop only
        # maybe you can avoid copying the list and do just:
        # yield permutation
    else:
        for elem in range(len(repetitions)):
            if repetitions[elem] > 0:
                repetitions[elem] -= 1
                permutation[idx] = elem
                yield from make_permutations(repetitions, permutation, idx + 1)
                repetitions[elem] += 1

for v in my_vectors(3, 2):
    print(v)
Output:
(0, 0, 0, 0, 0, 1, 1, 2, 2)
(0, 0, 0, 0, 0, 1, 2, 1, 2)
(0, 0, 0, 0, 0, 1, 2, 2, 1)
(0, 0, 0, 0, 0, 2, 1, 1, 2)
(0, 0, 0, 0, 0, 2, 1, 2, 1)
(0, 0, 0, 0, 0, 2, 2, 1, 1)
(0, 0, 0, 0, 1, 0, 1, 2, 2)
(0, 0, 0, 0, 1, 0, 2, 1, 2)
(0, 0, 0, 0, 1, 0, 2, 2, 1)
(0, 0, 0, 0, 1, 1, 0, 2, 2)
...
This is an inefficient but simple way to implement it:
from itertools import product
from collections import Counter
def my_vectors(k, n):
    for v in product(range(k), repeat=k ** n):
        count = Counter(v)
        if all(count[i] >= n for i in range(k)):
            yield v

for v in my_vectors(3, 2):
    print(v)
Output:
(0, 0, 0, 0, 0, 1, 1, 2, 2)
(0, 0, 0, 0, 0, 1, 2, 1, 2)
(0, 0, 0, 0, 0, 1, 2, 2, 1)
(0, 0, 0, 0, 0, 2, 1, 1, 2)
(0, 0, 0, 0, 0, 2, 1, 2, 1)
(0, 0, 0, 0, 0, 2, 2, 1, 1)
(0, 0, 0, 0, 1, 0, 1, 2, 2)
(0, 0, 0, 0, 1, 0, 2, 1, 2)
(0, 0, 0, 0, 1, 0, 2, 2, 1)
(0, 0, 0, 0, 1, 1, 0, 2, 2)
...
Obviously, as soon as your numbers get slightly bigger it will take forever to run, so it is only useful either for very small problems or as a baseline for comparison.
In any case, the number of items that the problem produces is exponentially large anyway, so although you can make it significantly better (i.e. generate only the right elements instead of all the possible ones and discarding), it cannot be "fast" for any size.
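To get a feel for how quickly the result set grows, the number of valid vectors can be counted without generating them, by summing multinomial coefficients over all admissible repetition counts. This is only a sketch; count_vectors is a hypothetical helper, not part of either solution above, and it is itself exponential in k.

```python
from itertools import product
from math import factorial

def count_vectors(k, n):
    """Count vectors of length k**n over range(k) where every value appears >= n times."""
    length = k ** n
    total = 0
    # Try every tuple of per-value counts, each at least n
    for counts in product(range(n, length + 1), repeat=k):
        if sum(counts) != length:
            continue
        # Multinomial coefficient: length! / (c_0! * c_1! * ... * c_{k-1}!)
        ways = factorial(length)
        for c in counts:
            ways //= factorial(c)
        total += ways
    return total

print(count_vectors(2, 2))  # 6, matching the six vectors listed in the question
```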

Simultaneous changing of python numpy array elements

I have a vector of integers from range [0,3], for example:
v = [0,0,1,2,1,3, 0,3,0,2,1,1,0,2,0,3,2,1].
I know that I can replace a specific value in the vector with another value using the following
v[v == 0] = 5
which changes all appearances of 0 in vector v to the value 5.
But I would like to do something a little bit different - I want to change all values of 0 (let's call them target values) to 1, and all values different from 0 to 0, thus I want to obtain the following:
v = [1,1,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0]
However, I cannot call the substitution code (which I used above) as follows:
v[v==0] = 1
v[v!=0] = 0
because this obviously leads to a vector of zeros.
Is it possible to do the above substitution in a parallel way, to obtain the desired vector? (I want a universal technique that I can still use if I change my target value.) Any suggestions will be very helpful!
You can check if v is equal to zero and then convert the boolean array to int, and so if the original value is zero, the boolean is true and converts to 1, otherwise 0:
v = np.array([0,0,1,2,1,3, 0,3,0,2,1,1,0,2,0,3,2,1])
(v == 0).astype(int)
# array([1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0])
Or use numpy.where:
np.where(v == 0, 1, 0)
# array([1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0])
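For the "universal technique" the question asks about, the common thread is to compute the new values into a fresh array instead of editing v step by step, so later substitutions never see earlier ones. A minimal sketch follows; the helper name indicator and the lookup table lut are illustrative, and the lookup-table variant assumes the values are small non-negative integers (here 0 to 3).

```python
import numpy as np

v = np.array([0, 0, 1, 2, 1, 3, 0, 3, 0, 2, 1, 1, 0, 2, 0, 3, 2, 1])

def indicator(values, target):
    """Return 1 where values equal target and 0 elsewhere, as a new array."""
    return (values == target).astype(int)

# Lookup table: position i holds the replacement for value i,
# so all substitutions happen in a single indexing step.
lut = np.array([1, 0, 0, 0])  # 0 -> 1, and 1, 2, 3 -> 0
remapped = lut[v]
```

With the lookup table, changing the target or swapping several values at once only means editing lut.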

Find consecutive ones in numpy array

How can I find the number of consecutive 1s (or any other value) in each row of the following numpy array? I need a pure numpy solution.
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
[0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
There are two parts to my question, first: what is the maximum number of 1s in a row? Should be
array([2,3,2])
in the example case.
And second, what is the index of the start of the first set of multiple consecutive 1s in a row? For the example case this would be
array([3,9,9])
In this example I put 2 consecutive 1s in a row. But it should be possible to change that to 5 consecutive 1s in a row, this is important.
A similar question was answered using np.unique, but it only works for one row and not an array with multiple rows as the result would have different lengths.
Here's a vectorized approach based on differentiation -
import numpy as np
import pandas as pd
# Append zeros columns at either sides of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))
# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Get intervals using differences between start and stop indices
start_stop = np.column_stack((starts[:,0], stops[:,1] - starts[:,1]))
# Get indices corresponding to max. interval lens and thus lens themselves
SS_df = pd.DataFrame(start_stop)
out = start_stop[SS_df.groupby([0],sort=False)[1].idxmax(),1]
Sample input, output -
Original sample case :
In [574]: counts
Out[574]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
[0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
In [575]: out
Out[575]: array([2, 3, 2], dtype=int64)
Modified case :
In [577]: counts
Out[577]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 1, 1],
[0, 0, 0, 4, 1, 1, 1, 1, 1, 0, 1, 0]])
In [578]: out
Out[578]: array([2, 4, 5], dtype=int64)
Here's a Pure NumPy version that is identical to the previous until we have start, stop. Here's the full implementation -
# Append zeros columns at either sides of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))
# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Get intervals using differences between start and stop indices
intvs = stops[:,1] - starts[:,1]
# Store intervals as a 2D array for further vectorized ops to make.
c = np.bincount(starts[:,0])
mask = np.arange(c.max()) < c[:,None]
intvs2D = mask.astype(float)
intvs2D[mask] = intvs
# Get max along each row as final output
out = intvs2D.max(1)
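The second part of the question (the start index of the first run of at least N consecutive 1s in each row) can be approached with the same start/stop differencing. The following is a sketch under assumptions: first_run_start and its parameters are hypothetical names, and rows without a long enough run are marked with -1, which is a convention chosen here rather than something the question specifies.

```python
import numpy as np

def first_run_start(counts, min_len=2, value=1):
    """Per row, the start index of the first run of `value` with length >= min_len."""
    is_val = (counts == value).astype(int)
    pad = np.zeros((counts.shape[0], 1), dtype=int)
    # +1 marks where a run starts, -1 marks the position just past its end
    diffs = np.diff(np.column_stack((pad, is_val, pad)), axis=1)
    rows, starts = np.nonzero(diffs == 1)
    stops = np.nonzero(diffs == -1)[1]
    # Keep only the runs that are long enough
    keep = (stops - starts) >= min_len
    rows_ok, starts_ok = rows[keep], starts[keep]
    # np.unique with return_index gives the first qualifying run of each row
    uniq_rows, first_idx = np.unique(rows_ok, return_index=True)
    result = np.full(counts.shape[0], -1)  # -1: no qualifying run in that row
    result[uniq_rows] = starts_ok[first_idx]
    return result

counts = np.array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
                   [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
print(first_run_start(counts, min_len=2))  # [3 9 9]
```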
I think a very similar problem is checking whether, within sorted rows, the element-wise difference between entries equals a certain amount. For five consecutive values that each differ by 1, it would look as follows; the same can be done with a difference of 0 for two cards:
cardAmount=cards[0,:].size
has4=cards[:,np.arange(0,cardAmount-4)]-cards[:,np.arange(cardAmount-3,cardAmount)]
isStraight=np.any(has4 == 4, axis=1)

Slicing different rows of a numpy array differently

I'm working on a Monte Carlo radiative transfer code, which simulates firing photons through a medium and statistically modelling their random walk. It runs slowly firing one photon at a time, so I'd like to vectorize it and run perhaps 1000 photons at once.
I have divided my slab through which the photons are passing into nlayers slices between optical depth 0 and depth. Effectively, that means that I have nlayers + 2 regions (nlayers plus the region above the slab and the region below the slab). At each step, I have to keep track of which layers each photon passes through.
Let's suppose that I already know that two photons start in layer 0. One takes a step and ends up in layer 2, and the other takes a step and ends up in layer 6. This is represented by an array pastpresent that looks like this:
[[ 0 2]
[ 0 6]]
I want to generate an array traveled_through with (nlayers + 2) columns and 2 rows, describing whether photon i passed through layer j (endpoint-inclusive). It would look something like this (with nlayers = 10):
[[ 1 1 1 0 0 0 0 0 0 0 0 0]
[ 1 1 1 1 1 1 1 0 0 0 0 0]]
I could do this by iterating over the photons and generating each row of traveled_through individually, but that's rather slow, and sort of defeats the point of running many photons at once, so I'd rather not do that.
I tried to define the array as follows:
traveled_through = np.zeros((2, nlayers)).astype(int)
traveled_through[ : , np.min(pastpresent, axis = 1) : np.max(pastpresent, axis = 1) + 1 ] = 1
The idea was that in a given photon's row, the indices from the starting layer through and including the ending layer would be set to 1, with all others remaining 0. However, I get the following error:
traveled_through[ : , np.min(pastpresent, axis = 1) : np.max(pastpresent, axis = 1) + 1 ] = 1
IndexError: invalid slice
My best guess is that numpy does not allow different rows of an array to be indexed differently using this method. Does anyone have suggestions for how to generate traveled_through for an arbitrary number of photons and an arbitrary number of layers?
If the two photons always start at 0, you could perhaps construct your array as follows.
First setting the variables...
>>> pastpresent = np.array([[0, 2], [0, 6]])
>>> nlayers = 10
...and then constructing the array:
>>> (pastpresent[:,1][:,np.newaxis] + 1 > np.arange(nlayers+2)).astype(int)
array([[1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]])
Or if the photons have an arbitrary starting layer (using <= so that the starting layer is included as well):
>>> pastpresent2 = np.array([[1, 7], [3, 9]])
>>> ((pastpresent2[:,0][:,np.newaxis] <= np.arange(nlayers+2)) &
...  (pastpresent2[:,1][:,np.newaxis] + 1 > np.arange(nlayers+2))).astype(int)
array([[0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]])
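The broadcasting comparison can be wrapped into a small helper that also handles photons moving toward lower layer numbers, which the question's use of np.min and np.max suggests can happen. This is a sketch; the function name and signature are assumptions, and both endpoints are treated as inclusive, as described in the question.

```python
import numpy as np

def traveled_through(pastpresent, nlayers):
    """Mark, for each photon (row), every layer between its two positions, inclusive."""
    lo = pastpresent.min(axis=1)[:, np.newaxis]  # lower endpoint per photon
    hi = pastpresent.max(axis=1)[:, np.newaxis]  # upper endpoint per photon
    layers = np.arange(nlayers + 2)              # nlayers plus the regions above and below
    # Broadcasting compares every photon against every layer index at once
    return ((layers >= lo) & (layers <= hi)).astype(int)

pastpresent = np.array([[0, 2], [0, 6]])
result = traveled_through(pastpresent, 10)
```

Because the comparison is done against column vectors, the number of photons and the number of layers can both be arbitrary.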
A little trick I kind of like for this kind of thing involves the accumulate method of the logical_xor ufunc:
>>> a = np.zeros(10, dtype=int)
>>> b = [3, 7]
>>> a[b] = 1
>>> a
array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0])
>>> np.logical_xor.accumulate(a, out=a)
array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])
Note that this sets to 1 the entries between the positions in b, first index inclusive, last index exclusive, so you have to handle off by 1 errors depending on what exactly you are after.
With several rows, you could make it work as:
>>> a = np.zeros((3, 10), dtype=int)
>>> b = np.array([[1, 7], [0, 4], [3, 8]])
>>> b[:, 1] += 1 # handle the off by 1 error
>>> a[np.arange(len(b))[:, None], b] = 1
>>> a
array([[0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 1]])
>>> np.logical_xor.accumulate(a, axis=1, out=a)
array([[0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 1, 1, 1, 0]])
