Numpy vectorized summation with variable number of factors

I am currently computing a function that contains a summation over an index. The index is between 0 and the integer part of T; ideally I would like to be able to compute this summation quickly for several values of T.
In a real-life case, most of the values of T are small, but a small percentage can be one or two orders of magnitude larger than the average.
What I am doing now is:
1) I define the vector T, e.g. (my real-life data have a much larger number of entries, it is just to give an idea):
import numpy as np
T = np.random.exponential(5, 10)
2) I create a matrix containing the factors between 0 and int(T), padded with zeros:
n = int(T.max())
j = ((np.arange(n) < T[:,np.newaxis])*np.arange(1,n+1)).astype(int).transpose()
print(j)
[[ 1 1 1 1 1 1 1 1 1 1]
[ 2 0 2 2 2 0 2 0 2 2]
[ 0 0 3 0 3 0 3 0 3 3]
[ 0 0 4 0 4 0 0 0 4 4]
[ 0 0 5 0 5 0 0 0 5 5]
[ 0 0 6 0 6 0 0 0 6 6]
[ 0 0 7 0 7 0 0 0 0 7]
[ 0 0 8 0 8 0 0 0 0 8]
[ 0 0 9 0 9 0 0 0 0 9]
[ 0 0 0 0 10 0 0 0 0 10]
[ 0 0 0 0 11 0 0 0 0 0]
[ 0 0 0 0 12 0 0 0 0 0]]
3) I generate the individual elements of the summation, using a mask to zero out the terms coming from the padding zeros:
A = np.log(1 + (1 + j) * 5) * (j > 0)
4) I sum along the columns:
A.sum(axis=0)
Obtaining:
array([ 5.170484 , 2.39789527, 29.96464821, 5.170484 ,
42.29052851, 2.39789527, 8.21500643, 2.39789527,
18.49060911, 33.9899999 ])
Is there a faster/better way to vectorize this? I have the feeling that it is very slow due to the large number of zeros that do not contribute to the sum, but since I am a beginner with NumPy I couldn't figure out a better way of writing it.
EDIT: in my actual problem, the function applied to j also depends on a second parameter tau (given in a vector of the same size as T). So the items contained in each column are not the same.

Looking at your j, each column has numbers going from 1 to N, where N is determined by the corresponding element of T. Then, you are summing along each column, which is the same as summing up to N, because the rest of the elements are zeros anyway. Those summed values could be calculated with np.cumsum, and those N values, which are basically the limits of each column in j, could be calculated directly from T. These N values are then used as indices to index into the cumsum-ed values to give us the final output.
This should be pretty fast and memory efficient, given that cumsum is the only computation done, and on a 1D array at that, as compared to the summation done in the original approach on a 2D array along each column. Thus, we have a vectorized approach like so -
n = int(T.max())
vals = (np.log(1 + (1 + np.arange(1,n+1)) * 5)).cumsum()
out = vals[(T.astype(int)).clip(max=n-1)]
In terms of memory usage, we are generating three variables -
n : Scalar
vals : 1D array of n elements
out : 1D array of T.size elements (this is the output anyway)
Runtime test and verify output -
In [5]: def original_app(T):
...: n = int(T.max())
...: j = ((np.arange(n) < T[:,None])*np.arange(1,n+1)).astype(int).transpose()
...: A = np.log(1 + (1 + j) * 5)* (j>0)
...: return A.sum(axis=0)
...:
...: def vectorized_app(T):
...: n = int(T.max())
...: vals = (np.log(1 + (1 + np.arange(1,n+1)) * 5)).cumsum()
...: return vals[(T.astype(int)).clip(max=n-1)]
...:
In [6]: # Input array
...: T = np.random.exponential(5, 10000)
In [7]: %timeit original_app(T)
100 loops, best of 3: 9.62 ms per loop
In [8]: %timeit vectorized_app(T)
10000 loops, best of 3: 50.1 µs per loop
In [9]: np.allclose(original_app(T),vectorized_app(T)) # Verify outputs
Out[9]: True
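Regarding the EDIT in the question: once the summand also depends on a per-entry parameter tau, each column sums different values, so the cumsum shortcut above no longer applies directly. Below is a minimal sketch of a masked broadcast for that case, assuming a hypothetical form f(j, tau) = log(1 + (1 + j) * tau) (the actual function is not given in the question):
import numpy as np

def tau_dependent_app(T, tau):
    n = int(T.max())
    k = np.arange(n)                                    # 0-based summation index
    mask = k[:, None] < T[None, :]                      # term k contributes while k < T_i
    vals = np.log(1 + (2 + k[:, None]) * tau[None, :])  # f(j, tau_i) with j = k + 1
    return np.where(mask, vals, 0).sum(axis=0)

T = np.random.exponential(5, 10)
tau = np.random.exponential(5, 10)
print(tau_dependent_app(T, tau))
This still materializes the 2D array of terms, so it is a sketch of the shape of the computation rather than a drop-in replacement for the cumsum trick.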

Related

How to change the array elements according to a specific condition

I have an array, for example:
import numpy as np
data = np.array([[4,4,4,0,1,1,1,0,0,0,0,1,0,0,1],
                 [3,0,0,1,1,1,1,1,1,1,1,0,0,1,0],
                 [6,0,0,1,1,1,1,1,0,0,0,0,1,0,0]])
Requirement: in the data array, find where the 1s form a consecutive square of size (3, 3); keep the 1s of that square unchanged, and replace every other 1 with zero.
Expected output :
[[4 4 4 0 1 1 1 0 0 0 0 0 0 0 0]
[3 0 0 0 1 1 1 0 0 0 0 0 0 0 0]
[6 0 0 0 1 1 1 0 0 0 0 0 0 0 0]]
I will provide here two different approaches as solutions: one that doesn't use Python loops and one that does. Let's start with the common header:
import numpy as np
from skimage.util import view_as_windows as winview
data = np.array([[4,4,4,0,1,1,1,0,0,0,0,1,0,0,1],
                 [3,0,0,1,1,1,1,1,1,1,1,0,0,1,0],
                 [6,0,0,1,1,1,1,1,0,0,0,0,1,0,0]])
Below is an approach without Python loops, resulting in the shortest code, but requiring the import of an additional module, skimage:
clmn = np.where(np.all(winview(data,(3,3))[0],axis=(1,2)))[0][0] # first column with an all-ones (3,3) window
data[data == 1] = 0 # set all ONEs to zero
data[0:3,clmn+3:] = 0 # set after match to zero
data[0:3,clmn:clmn+3] = 1 # restore ONEs
The other one uses Python loops and is only two lines longer:
for clmn in range(0, data.shape[1]):
    if np.all(data[0:3, clmn:clmn+3]):
        data[data == 1] = 0
        data[0:3, clmn+3:] = 0
        data[0:3, clmn:clmn+3] = 1
        break
Instead of explaining how the looping code above works, I have put the 'explanations' into the names of the variables, so the code hopefully becomes self-explanatory. With these explanations and some redundant code, you can use the version below to search for a differently shaped haystack in another array of the same kind. For an array with more rows than the sub-array shape, you would also need to loop over the rows and optimize the code to skip some unnecessary checks.
import numpy as np

data = np.array([[4,4,4,0,1,1,1,0,0,0,0,1,0,0,1],
                 [3,0,0,1,1,1,1,1,1,1,1,0,0,1,0],
                 [6,0,0,1,1,1,1,1,0,0,0,0,1,0,0]])

indx_of_clmns_in_shape = 1
indx_of_rows_in_shape = 0
subarr_shape = (3, 3)
first_row = 0
first_clmn = 0
for clmn in range(first_clmn, data.shape[indx_of_clmns_in_shape], 1):
    sub_data = data[
        first_row:first_row + subarr_shape[indx_of_rows_in_shape],
        clmn:clmn + subarr_shape[indx_of_clmns_in_shape]]
    if np.all(sub_data):
        data[data == 1] = 0
        data[first_row:subarr_shape[indx_of_rows_in_shape],
             clmn + subarr_shape[indx_of_clmns_in_shape]:] = 0
        data[first_row:subarr_shape[indx_of_rows_in_shape],
             clmn:clmn + subarr_shape[indx_of_clmns_in_shape]] = 1
        break
# print(sub_data)
print(data)
All three versions of the code give the same result:
[[4 4 4 0 1 1 1 0 0 0 0 0 0 0 0]
[3 0 0 0 1 1 1 0 0 0 0 0 0 0 0]
[6 0 0 0 1 1 1 0 0 0 0 0 0 0 0]]
This should be easy to do with a double for loop and a second array:
rows = len(source_array)
columns = len(source_array[0])
# Create a result array of the same size (rows of columns, not the reverse)
result_array = [[0 for _ in range(columns)] for _ in range(rows)]
for i in range(rows):
    for j in range(columns):
        # Copy non 1s
        if source_array[i][j] != 1:
            result_array[i][j] = source_array[i][j]
        # If enough rows and columns are left, check the 3x3 block here
        if i <= rows - 3 and j <= columns - 3:
            # Create a set of the values in the selected partition
            # (assumes source_array is a list of lists, so + concatenates)
            elements = set(source_array[i][j:j+3] + source_array[i+1][j:j+3] + source_array[i+2][j:j+3])
            # Copy the block of 1s to the new array
            if len(elements) == 1 and 1 in elements:
                for sq_i in range(i, i+3):
                    for sq_j in range(j, j+3):
                        result_array[sq_i][sq_j] = 1

Changing proportion of agreeing values in numpy arrays

I have a problem I've been trying to think through. Say I have a numpy array that looks like this (in the actual implementation, len(array) will be around 4500):
array = np.repeat([0, 1, 2], 2)
array
>> [0, 0, 1, 1, 2, 2]
From this, I'm trying to generate a new (shuffled) array where the proportion of values that randomly agree with array is a particular proportion p. So let's say p = .5. Then, an example new array would be something like
array = [0, 0, 1, 1, 2, 2]
new_array = [0, 1, 2, 1, 0, 2]
where you can see that exactly 50% of the values in new_array agree with the values in array. The requirements of the output array are:
1 - np.count_nonzero(array - new_array) / len(array) == p, and
set(np.unique(array)) == set(np.unique(new_array)).
By "agree" I mean array[i] == new_array[i] for agreeing indices i. All values in new_array should be the same as array, just shuffled.
I'm sure there's an elegant way of doing this -- can anybody think of something?
Thanks!
You can try something like
import random

p = 0.5
arr = np.array([0, 0, 1, 1, 2, 2])
# number of agreeing elements required
num_sim_element = round(len(arr) * p)
# creating indices of agreeing elements, grouped by value
hp = {}
for i, e in enumerate(arr):
    if e in hp:
        hp[e].append(i)
    else:
        hp[e] = [i]
#print(hp)
out_map = []
k = list(hp.keys())
v = list(hp.values())
index = 0
# pick num_sim_element indices (cycling over the values) that keep their original value
while len(out_map) != num_sim_element:
    if len(v[index]) > 0:
        k_ = k[index]
        random.shuffle(v[index])
        v_ = v[index].pop()
        out_map.append((k_, v_))
    index += 1
    index %= len(k)
#print(out_map)
out_unique = set([i[0] for i in out_map])
out_indices = [i[-1] for i in out_map]
out_arr = arr.copy()
#for i in out_map:
#    out_arr[i[-1]] = i[0]
# overwrite every non-kept index with a value different from the original
for i in set(range(len(arr))).difference(out_indices):
    out_arr[i] = random.choice(list(out_unique.difference([out_arr[i]])))
print(arr)
print(out_arr)
assert 1 - (np.count_nonzero(arr - out_arr) / len(arr)) == p
assert set(np.unique(arr)) == set(np.unique(out_arr))
[0 0 1 1 2 2]
[1 0 1 0 0 2]
Here's a version that might be a little easier to follow:
import math, random
import numpy as np

# generate array of random values
a = np.random.rand(4500)

# make a utility list of every position in that array, and shuffle it
indices = [i for i in range(0, len(a))]
random.shuffle(indices)

# set the proportion you want to keep the same
proportion = 0.5

# make two collections of indices: the ones that stay the same (anchors)
# and the ones that get shuffled
anchors = set(indices[0:math.floor(len(a) * proportion)])
not_anchors = indices[math.floor(len(a) * proportion):]

# get values of non-anchor indices, and shuffle them
not_anchor_values = [a[i] for i in not_anchors]
random.shuffle(not_anchor_values)

# loop over the original array: if an anchor position, keep the original value;
# if not an anchor, draw a value from the shuffled non-anchor values and
# increment the count
final_list = []
count = 0
for e, i in enumerate(a):
    if e in anchors:
        final_list.append(i)
    else:
        final_list.append(not_anchor_values[count])
        count += 1

# test proportion of matches and non-matches in output
match = []
not_match = []
for e, i in enumerate(a):
    if i == final_list[e]:
        match.append(True)
    else:
        not_match.append(True)
print(len(match) / (len(match) + len(not_match)))
Comments in the code explain the approach.
(EDITED to include a different and more accurate approach.)
One should note that not all values of the shuffled fraction p (the number of shuffled elements divided by the total number of elements) are achievable.
The possible values of p depend on the size of the input and on the number of repeated elements.
That said, I can suggest two possible approaches:
1) Split your input into pinned and unpinned indices of the correct size, and then shuffle the unpinned indices:
import numpy as np

def partial_shuffle(arr, p=1.0):
    n = arr.size
    k = round(n * p)
    shuffling = np.arange(n)
    shuffled = np.random.choice(n, k, replace=False)
    shuffling[shuffled] = np.sort(shuffled)
    return arr[shuffling]
The main advantage of approach (1) is that it can be easily implemented in a vectorized form using np.random.choice() and advanced indexing.
On the other hand, it works well only as long as you are willing to accept that some calls may leave some elements unshuffled, either because of repeating values or simply because the shuffling indices accidentally coincide with the unshuffled ones.
This causes the requested value of p to be typically larger than the value actually observed.
If one needs a relatively more accurate value of p, one could perform a search on the p parameter that gives the desired value on the output, or go by trial and error.
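A rough sketch of such a search (my own illustration, not part of the original answer): repeatedly call partial_shuffle() and keep the candidate whose observed shuffled fraction is closest to the target.
def calibrated_partial_shuffle(arr, p_target, n_trials=100):
    best, best_err = arr.copy(), float('inf')
    for _ in range(n_trials):
        candidate = partial_shuffle(arr, p_target)
        # observed fraction of positions that actually changed
        observed = np.count_nonzero(candidate != arr) / arr.size
        err = abs(observed - p_target)
        if err < best_err:
            best, best_err = candidate, err
    return best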
2) Implement a variation of the Fisher-Yates shuffle where you: (a) reject swaps of positions whose values are identical, and (b) pick only random positions to swap that were not already visited:
def partial_shuffle_eff(arr, p=1.0, inplace=False, tries=2.0):
    if not inplace:
        arr = arr.copy()
    n = arr.size
    k = round(n * p)
    tries = round(n * tries)
    seen = set()
    i = l = t = 0
    while i < n and l < k:
        seen.add(i)
        j = np.random.randint(i, n)
        while j in seen and t < tries:
            j = np.random.randint(i, n)
            t += 1
        if arr[i] != arr[j]:
            arr[i], arr[j] = arr[j], arr[i]
            l += 2
            seen.add(j)
        while i in seen:
            i += 1
    return arr
While this approach gets to a more accurate value of p, it is still limited by the fact that the target number of swaps must be even.
Also, for inputs with many unique values, the second while (while j in seen ...) is potentially an infinite loop, so a cap on the number of tries should be set.
Finally, you would need to go with explicit looping, resulting in a much slower execution speed, unless you can use Numba's JIT compilation, which would speed up your execution significantly.
import numba as nb
partial_shuffle_eff_nb = nb.njit(partial_shuffle_eff)
partial_shuffle_eff_nb.__name__ = 'partial_shuffle_eff_nb'
To test the accuracy of the partial shuffling we may use the (percent) Hamming distance:
def hamming_distance(a, b):
    assert a.shape == b.shape
    return np.count_nonzero(a != b)

def percent_hamming_distance(a, b):
    return hamming_distance(a, b) / len(a)

def shuffling_fraction(a, b):
    return percent_hamming_distance(a, b)
And we may observe a behavior similar to this:
funcs = (
    partial_shuffle,
    partial_shuffle_eff,
    partial_shuffle_eff_nb
)

n = 12
m = 3
arrs = (
    np.zeros(n, dtype=int),
    np.arange(n),
    np.repeat(np.arange(m), n // m),
    np.repeat(np.arange(3), 2),
    np.repeat(np.arange(3), 3),
)

np.random.seed(0)
for arr in arrs:
    print(" " * 24, arr)
    for func in funcs:
        shuffled = func(arr, 0.5)
        print(f"{func.__name__:>24s}", shuffled, shuffling_fraction(arr, shuffled))
# [0 0 0 0 0 0 0 0 0 0 0 0]
# partial_shuffle [0 0 0 0 0 0 0 0 0 0 0 0] 0.0
# partial_shuffle_eff [0 0 0 0 0 0 0 0 0 0 0 0] 0.0
# partial_shuffle_eff_nb [0 0 0 0 0 0 0 0 0 0 0 0] 0.0
# [ 0 1 2 3 4 5 6 7 8 9 10 11]
# partial_shuffle [ 0 8 2 3 6 5 7 4 9 1 10 11] 0.5
# partial_shuffle_eff [ 3 8 11 0 4 5 6 7 1 9 10 2] 0.5
# partial_shuffle_eff_nb [ 9 10 11 3 4 5 6 7 8 0 1 2] 0.5
# [0 0 0 0 1 1 1 1 2 2 2 2]
# partial_shuffle [0 0 2 0 1 2 1 1 2 2 1 0] 0.33333333333333337
# partial_shuffle_eff [1 1 1 0 0 1 0 0 2 2 2 2] 0.5
# partial_shuffle_eff_nb [1 2 1 0 1 0 0 1 0 2 2 2] 0.5
# [0 0 1 1 2 2]
# partial_shuffle [0 0 1 1 2 2] 0.0
# partial_shuffle_eff [1 1 0 0 2 2] 0.6666666666666667
# partial_shuffle_eff_nb [1 2 0 1 0 2] 0.6666666666666667
# [0 0 0 1 1 1 2 2 2]
# partial_shuffle [0 0 1 1 0 1 2 2 2] 0.2222222222222222
# partial_shuffle_eff [0 1 2 1 0 1 2 2 0] 0.4444444444444444
# partial_shuffle_eff_nb [0 0 1 0 2 1 2 1 2] 0.4444444444444444
or, for an input closer to your use-case:
n = 4500
m = 3
arr = np.repeat(np.arange(m), n // m)
np.random.seed(0)
for func in funcs:
    shuffled = func(arr, 0.5)
    print(f"{func.__name__:>24s}", shuffling_fraction(arr, shuffled))
# partial_shuffle 0.33777777777777773
# partial_shuffle_eff 0.5
# partial_shuffle_eff_nb 0.5
Finally some small benchmarking:
n = 4500
m = 3
arr = np.repeat(np.arange(m), n // m)
np.random.seed(0)
for func in funcs:
print(f"{func.__name__:>24s}", end=" ")
%timeit func(arr, 0.5)
# partial_shuffle 213 µs ± 6.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# partial_shuffle_eff 10.9 ms ± 194 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# partial_shuffle_eff_nb 172 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Changing the boolean values of an array according to a formula for the indices

I want to create a 64-component array showing all the squares to which the two rooks on an otherwise empty chessboard could move from their current positions. So far I am doing it with for and while loops.
I first create a function just to better visualize the board:
import numpy as np

def from_array_to_matrix(v):
    m = np.zeros((8, 8)).astype('int')
    for row in range(8):
        for column in range(8):
            m[row, column] = v[row * 8 + column]
    return m
and here I show how I actually build the array:
# positions of the two rooks
a = np.zeros(64).astype('int')
a[15] = 1
a[25] = 1
print(from_array_to_matrix(a))

# attack_a will be all the squares where they could move on the empty board
attack_a = np.zeros(64).astype('int')
for piece in np.where(a)[0]:
    # move down the column
    j = 0
    square = piece + j * 8
    while square < 64:
        attack_a[square] = 1
        j += 1
        square = piece + j * 8
    # move up the column
    j = 0
    square = piece - j * 8
    while square >= 0:
        attack_a[square] = 1
        j += 1
        square = piece - j * 8
    # move right along the row
    j = 0
    square = piece + j
    while square < 8 * (1 + piece // 8):
        attack_a[square] = 1
        j += 1
        square = piece + j
    # move left along the row
    j = 0
    square = piece - j
    while square >= 8 * (piece // 8):
        attack_a[square] = 1
        j += 1
        square = piece - j
print(attack_a)
print(from_array_to_matrix(attack_a))
I have been advised to avoid for and while loops whenever it is possible to use other constructs, because they tend to be time consuming. Is there any way to achieve the same result without iterating with for and while loops? Perhaps by using the fact that the indices to which I want to assign the value 1 can be determined by a formula.
There are a couple of different ways to do this. The simplest thing is of course to work with matrices.
But you can vectorize operations on the raveled array as well. For example, say you had a rook at position 0 <= n < 64 in the linear array. To set the row to one, use integer division:
array[8 * (n // 8):8 * (n // 8 + 1)] = True
To set the column, use modulo:
array[n % 8::8] = True
You can convert to a matrix using reshape:
matrix = array.reshape(8, 8)
And back using ravel:
array = matrix.ravel()
Or reshape:
array = matrix.reshape(-1)
Setting ones in a matrix is even simpler, given a specific row 0 <= m < 8 and column 0 <= n < 8:
matrix[m, :] = matrix[:, n] = True
Now the only question is how to vectorize multiple indices simultaneously. As it happens, you can use a fancy index in one axis. I.e, the expression above can be used with an m and n containing multiple elements:
m, n = np.nonzero(matrix)
matrix[m, :] = matrix[:, n] = True
You could even play games and do this with the array, also using fancy indexing:
n = np.nonzero(array)[0]
# linspace returns floats, so cast to int before indexing
r = np.linspace(8 * (n // 8), 8 * (n // 8 + 1), 8, False).astype(int).T.ravel()
c = np.linspace(n % 8, n % 8 + 64, 8, False).astype(int)
array[r] = array[c] = True
Using linspace allows you to generate multiple sequences of the same size simultaneously. Each sequence is a column, so we transpose before raveling, although this is not required.
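As a quick sanity check (my own illustration, reusing the question's rook positions 15 and 25):
import numpy as np

array = np.zeros(64, dtype=bool)
array[[15, 25]] = True
n = np.nonzero(array)[0]
r = np.linspace(8 * (n // 8), 8 * (n // 8 + 1), 8, False).astype(int).T.ravel()
c = np.linspace(n % 8, n % 8 + 64, 8, False).astype(int)
array[r] = array[c] = True
print(array.reshape(8, 8).astype(int))
This should reproduce the attack board printed in the next answer.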
Use reshaping to convert the 1-D array to an 8x8 2-D matrix, and then NumPy advanced indexing to select the rows and columns to set to 1:
import numpy as np

def from_array_to_matrix(v):
    return v.reshape(8, 8)

# positions of the two rooks
a = np.zeros(64).astype('int')
a[15] = 1
a[25] = 1
a = from_array_to_matrix(a)

# attack_a will be all the squares where they could move in the empty board
attack_a = np.zeros(64).astype('int')
attack_a = from_array_to_matrix(attack_a)

# these two lines replace your for and while loops
attack_a[np.where(a)[0], :] = 1
attack_a[:, np.where(a)[1]] = 1
output:
a:
[[0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0]]
attack_a:
[[0 1 0 0 0 0 0 1]
[1 1 1 1 1 1 1 1]
[0 1 0 0 0 0 0 1]
[1 1 1 1 1 1 1 1]
[0 1 0 0 0 0 0 1]
[0 1 0 0 0 0 0 1]
[0 1 0 0 0 0 0 1]
[0 1 0 0 0 0 0 1]]

How to assign ones and zeros to specific indices of an array using numpy?

I wanted to construct a 6 x 9 matrix with entries zeros and ones, in a specific way, as follows. In the zeroth row, columns 0 to 2 should be 1; in the first row, columns 3 to 5 should be 1; and in the second row, columns 6 to 8 should be 1, with all other entries zero. In the third row, elements 0, 3, 6 should be 1 and the others zero. In the fourth row, elements 1, 4, 7 should be 1 and the others zero. In the fifth row, elements 2, 5, 8 should be 1 and the remaining zero. So half of the rows follow one procedure to enter the value 1 and the other half follow a different procedure. How do I extend this to, say, a 20 x 100 case, where the first 10 rows follow one procedure as described above and the second half follow a different one?
The 6 x 9 matrix looks as follows:
[[1,1,1,0,0,0,0,0,0],
[0,0,0,1,1,1,0,0,0],
[0,0,0,0,0,0,1,1,1],
[1,0,0,1,0,0,1,0,0],
[0,1,0,0,1,0,0,1,0],
[0,0,1,0,0,1,0,0,1]]
EDIT: Code I used to create this matrix:
import numpy as np

m = int(input("Enter the value of m, no. of points = "))
pimatrix = np.zeros((2*m + 1) * (m**2)).reshape((2*m + 1), (m**2))
for i in range(2*m + 1):
    for j in range(m**2):
        if (i < m) and (j < ((i+1) * m) and j >= (i * m)):
            pimatrix[i][j] = 1
        if i > (m - 1):
            for k in range(-1, m - 1, 1):
                if j == i + (k * m):
                    pimatrix[i][j] = 1
        if i == 2*m:
            pimatrix[i][j] = 1
print(pimatrix)
Try to use the numpy.put function.
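A minimal sketch of that suggestion (my own illustration): numpy.put writes into the flattened array, so the flat indices of the ones in the 6 x 9 example can be listed directly.
import numpy as np

a = np.zeros((6, 9), dtype=int)
# flat index of (row, col) in a 9-column matrix is row * 9 + col
top = [i * 9 + j for i in range(3) for j in range(3 * i, 3 * i + 3)]
bottom = [(3 + i) * 9 + j for i in range(3) for j in range(i, 9, 3)]
np.put(a, top + bottom, 1)
print(a)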
The best approach depends on the rules you plan to follow, but an easy approach would be to initialise the array as an array of zeroes:
import numpy as np
a = np.zeros([3, 4], dtype = int)
You can then write the logic to loop over the appropriate rows and set 1's as needed. You can simply access any element of the array by its coordinates:
a[2,1] = 1
print(a)
Result:
[[0 0 0 0]
[0 0 0 0]
[0 1 0 0]]
Without a general rule, it's hard to say what your intended logic is exactly, but assuming these rules: the top half of the array has runs of three ones on each consecutive row, starting in the upper left and moving down a row at the end of every run, until it reaches the bottom of the top half, where it wraps around to the top; the bottom half has runs of single ones, following the same pattern.
Implementing that, with your given example:
import numpy as np

a = np.zeros([6, 9], dtype=int)

def set_ones(a, run_length, start, end):
    for n in range(a.shape[1]):
        a[start + ((n // run_length) % (end - start)), n] = 1

set_ones(a, 3, 0, a.shape[0] // 2)
set_ones(a, 1, a.shape[0] // 2, a.shape[0])
print(a)
Result:
[[1 1 1 0 0 0 0 0 0]
[0 0 0 1 1 1 0 0 0]
[0 0 0 0 0 0 1 1 1]
[1 0 0 1 0 0 1 0 0]
[0 1 0 0 1 0 0 1 0]
[0 0 1 0 0 1 0 0 1]]
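Assuming the 20 x 100 case follows the same pattern (runs of ten ones in the top half, single ones in the bottom half, which is my reading of the question), the same helper extends directly:
b = np.zeros([20, 100], dtype=int)
set_ones(b, 10, 0, b.shape[0] // 2)          # top 10 rows: runs of ten ones
set_ones(b, 1, b.shape[0] // 2, b.shape[0])  # bottom 10 rows: single ones
print(b)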

Modifying numpy array to get minimum number of values between elements

I have a numpy array of the form: arr = 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1
I would like to modify it such that there are at least seven 0s between any two 1s. If there are fewer than seven 0s, then convert the intervening 1s to 0.
I am thinking that numpy.where could work here, but I am not sure how to do it in a succinct, pythonic manner. My attempt so far:
numpy.where(arr[:] > 1.0, 1.0, 0.0)
The output should look like this:
0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
The following code is a really ugly hack, but it gets the job done in linear time (assuming 7 is fixed) without resorting to Python loops and without needing anything like Numba or Cython. I don't recommend using it, especially if 7 might be 700 next month.
import numpy as np

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

arr2 = np.append(1 - arr, [0] * 7)
np.power.at(rolling_window(arr2[1:], 7), np.arange(len(arr)), arr2[:-7, None])
arr = 1 - arr2[:-7]
It works by setting 1s to 0s and vice versa, then for each element x, setting each element y in the next 7 spots to y**x, then undoing the 0/1 switch. The power operation sets everything within 7 spaces of a 0 to 1, in such a way that the effect is immediately visible to power operations further down the array.
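A quick check of this on the question's input (my own addition; it assumes the rolling_window helper above is in scope):
import numpy as np

arr = np.array([0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1])
arr2 = np.append(1 - arr, [0] * 7)
np.power.at(rolling_window(arr2[1:], 7), np.arange(len(arr)), arr2[:-7, None])
print(1 - arr2[:-7])
# [0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]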
Now, this is just a simple implementation using for loops and ifs, but I am pretty sure it can be condensed (a lot!). And there is really no need to use NumPy for this; it will only complicate things for you.
question = [0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1]
result = [0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1]

indexesOf1s = []
for index, item in enumerate(question):  # here just collect the indices of all 1s
    if item == 1:
        indexesOf1s.append(index)

lastKept = indexesOf1s[0]  # the first 1 is always kept
for i in indexesOf1s[1:]:  # iterate over the indices and change acc. to the condition
    if i - lastKept > 7:  # a gap greater than 7 means at least seven 0s in between
        lastKept = i
    else:
        question[i] = 0

print(question)
print(result)
