Modifying numpy array to get minimum number of values between elements - python

I have a numpy array of the form: arr = 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1
I would like to modify it such that there are at least seven 0s between any two 1s. If there are fewer than seven 0s, then convert the intervening 1s to 0s.
The output should look like this:
0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
I am thinking that numpy.where could work here, but I am not sure how to do it in a succinct, pythonic manner:
numpy.where(arr[:] > 1.0, 1.0, 0.0)

The following code is a really ugly hack, but it gets the job done in linear time (assuming 7 is fixed) without resorting to Python loops and without needing anything like Numba or Cython. I don't recommend using it, especially if 7 might be 700 next month.
import numpy as np

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

arr2 = np.append(1 - arr, [0] * 7)
np.power.at(rolling_window(arr2[1:], 7), np.arange(len(arr)), arr2[:-7, None])
arr = 1 - arr2[:-7]
It works by setting 1s to 0s and vice versa, then for each element x, setting each element y in the next 7 spots to y**x, then undoing the 0/1 switch. The power operation sets everything within 7 spaces of a 0 to 1, in such a way that the effect is immediately visible to power operations further down the array.
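For reference, a quick illustration of what rolling_window produces (my own example, not part of the original answer):
x = np.arange(6)
rolling_window(x, 3)
# array([[0, 1, 2],
#        [1, 2, 3],
#        [2, 3, 4],
#        [3, 4, 5]])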

Now this is just a simple implementation using for loops and ifs, but I am pretty sure it can be condensed (a lot!); see the sketch after the code. And there's no need to use NumPy for this; it will only complicate things for you.
question = [0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1]
result = [0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1]

# First collect the indexes of all 1s
indexesOf1s = [index for index, item in enumerate(question) if item == 1]

# Iterate over the indexes and change according to the conditions,
# always measuring the gap from the last 1 that was kept
last_kept = indexesOf1s[0]  # the first 1 is always kept
for i in indexesOf1s[1:]:
    if i - last_kept >= 8:  # at least seven 0s in between
        last_kept = i
    else:
        question[i] = 0

print(question)
print(result)
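And indeed, the same rule condenses to a single pass (a sketch of one possibility, applied to the original question list):
out, last_kept = [], -8
for i, v in enumerate(question):
    keep = v == 1 and i - last_kept >= 8
    out.append(1 if keep else 0)
    if keep:
        last_kept = i
print(out)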

Related

How to change array elements according to a specific condition

I have an array, for example:
import numpy as np
data = np.array([[4,4,4,0,1,1,1,0,0,0,0,1,0,0,1],
                 [3,0,0,1,1,1,1,1,1,1,1,0,0,1,0],
                 [6,0,0,1,1,1,1,1,0,0,0,0,1,0,0]])
Requirement:
In the data array, find where the 1s form a consecutive square of size (3,3). Keep that 3x3 square of 1s unchanged, and replace every other 1 in the array with zero.
Expected output:
[[4 4 4 0 1 1 1 0 0 0 0 0 0 0 0]
[3 0 0 0 1 1 1 0 0 0 0 0 0 0 0]
[6 0 0 0 1 1 1 0 0 0 0 0 0 0 0]]
I will provide two different approaches as solutions: one that doesn't use Python loops and one that does. Let's start with the common header:
import numpy as np
from skimage.util import view_as_windows as winview
data = np.array([[4,4,4,0,1,1,1,0,0,0,0,1,0,0,1],
                 [3,0,0,1,1,1,1,1,1,1,1,0,0,1,0],
                 [6,0,0,1,1,1,1,1,0,0,0,0,1,0,0]])
Below is the approach without Python loops, resulting in the shortest code, but requiring the import of an additional module, skimage:
clmn = np.where(np.all(winview(data,(3,3))[0], axis=(1,2)))[0][0]  # first all-nonzero 3x3 window
data[data == 1] = 0            # set all ONEs to zero
data[0:3, clmn+3:] = 0         # set everything after the match to zero
data[0:3, clmn:clmn+3] = 1     # restore the ONEs of the match
Another one is using Python loops and only two lines longer:
for clmn in range(0, data.shape[1]):
    if np.all(data[0:3, clmn:clmn+3]):
        data[data == 1] = 0
        data[0:3, clmn+3:] = 0
        data[0:3, clmn:clmn+3] = 1
        break
Instead of explaining how the looped code above works, I have put the 'explanations' into the names of the variables, so the code hopefully becomes self-explanatory. With these explanations and some redundant code, you can use the code below to search for a differently shaped sub-array in another array of the same kind. For an array with more rows than the sub-array shape, you would also need to loop over the rows and optimize the code by skipping some unnecessary checks.
import numpy as np

data = np.array([[4,4,4,0,1,1,1,0,0,0,0,1,0,0,1],
                 [3,0,0,1,1,1,1,1,1,1,1,0,0,1,0],
                 [6,0,0,1,1,1,1,1,0,0,0,0,1,0,0]])

indx_of_clmns_in_shape = 1
indx_of_rows_in_shape = 0
subarr_shape = (3, 3)
first_row = 0
first_clmn = 0

for clmn in range(first_clmn, data.shape[indx_of_clmns_in_shape], 1):
    sub_data = data[
        first_row : first_row+subarr_shape[indx_of_rows_in_shape],
        clmn : clmn+subarr_shape[indx_of_clmns_in_shape]]
    if np.all(sub_data):
        data[data == 1] = 0
        data[first_row : first_row+subarr_shape[indx_of_rows_in_shape],
             clmn+subarr_shape[indx_of_clmns_in_shape] : ] = 0
        data[first_row : first_row+subarr_shape[indx_of_rows_in_shape],
             clmn : clmn+subarr_shape[indx_of_clmns_in_shape]] = 1
        break
# print(sub_data)
print(data)
All three versions of the code give the same result:
[[4 4 4 0 1 1 1 0 0 0 0 0 0 0 0]
[3 0 0 0 1 1 1 0 0 0 0 0 0 0 0]
[6 0 0 0 1 1 1 0 0 0 0 0 0 0 0]]
Should be easy to do with a double for loop and a second array. (Here I assume source_array is a plain list of lists, e.g. data.tolist().)
rows = len(source_array)
columns = len(source_array[0])
# Create a result array of the same size
result_array = [[0 for _ in range(columns)] for _ in range(rows)]
for i in range(rows):
    for j in range(columns):
        # Copy non-1s
        if source_array[i][j] != 1:
            result_array[i][j] = source_array[i][j]
        # If enough rows and columns are left, check for a 3x3 block of 1s
        if i <= rows - 3 and j <= columns - 3:
            # Collect the distinct values in the selected partition
            elements = set(source_array[i][j:j+3] + source_array[i+1][j:j+3] + source_array[i+2][j:j+3])
            # Copy the block of 1s to the new array
            if elements == {1}:
                for sq_i in range(i, i+3):
                    for sq_j in range(j, j+3):
                        result_array[sq_i][sq_j] = 1

Python DataFrame Accumulator Based on Flag

I have a logic-driven flag column and I need to create a column that increments by 1 when the flag is true and decrements by 1 when the flag is false down to a floor of zero.
I've tried a few different methods and I can't get the accumulator 'shift' to reference the new value created by the process. I know the method below wouldn't stop at zero anyway, but I was just trying to work through the concept first, and this is the most to-the-point example to explain the goal. Do I need a for loop to iterate line by line?
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randint(2, size=10), columns=['flag'])
df['accum'] = 0
df['accum'] = np.where(df['flag'] == 1, df['accum'].shift(1) + 1, df['accum'].shift(1) - 1)
df['dOutput'] = [1,0,1,2,1,2,3,2,1,0]  # desired output
df
As far as I know, there's no numpy or pandas vectorized operation to do this, so you should iterate line by line:
def cumsum_with_floor(series):
    acc = 0
    output = []
    accum_list = []
    for val in series:
        val = 1 if val else -1
        acc += val
        accum_list.append(val)
        acc = acc if acc > 0 else 0
        output.append(acc)
    return pd.Series(output, index=series.index), pd.Series(accum_list, index=series.index)
series = pd.Series([1,0,1,1,0,0,0,1])
dOutput, accum = cumsum_with_floor(series)
dOutput
Out:
0 1
1 0
2 1
3 2
4 1
5 0
6 0
7 1
dtype: int64
accum  # shifted one step forward compared with your example
Out:
0 1
1 -1
2 1
3 1
4 -1
5 -1
6 -1
7 1
dtype: int64
But maybe somebody knows a suitable combination of pd.clip and pd.cumsum or other vectorized operations; one candidate is sketched below.
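For what it's worth, here is one vectorized candidate (my own sketch, not from the original answer). It relies on the identity acc_i = s_i - min(0, min_{k<=i} s_k), where s is the plain cumulative sum of the +1/-1 steps; subtracting the running minimum (capped at 0) is exactly what clamping at the floor of zero does:
import numpy as np
import pandas as pd

series = pd.Series([1, 0, 1, 1, 0, 0, 0, 1])
v = np.where(series.values == 1, 1, -1)             # flag -> +1 / -1 steps
s = v.cumsum()                                      # unclamped running sum
floor_correction = np.minimum(np.minimum.accumulate(s), 0)
dOutput = pd.Series(s - floor_correction, index=series.index)
print(dOutput.tolist())                             # [1, 0, 1, 2, 1, 0, 0, 1]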

Numpy vectorized summation with variable number of factors

I am currently computing a function that contains a summation over an index. The index is between 0 and the integer part of T; ideally I would like to be able to compute this summation quickly for several values of T.
In a real-life case, most of the values of T are small, but a small percentage can be one or two orders of magnitude larger than the average.
What I am doing now is:
1) I define the vector T, e.g. (my real-life data have a much larger number of entries, it is just to give an idea):
import numpy as np
T = np.random.exponential(5, 10)
2) I create a matrix containing the factors between 0 and int(T), and then zeroes:
n = int(T.max())
j = ((np.arange(n) < T[:,np.newaxis])*np.arange(1,n+1)).astype(int).transpose()
print(j)
[[ 1 1 1 1 1 1 1 1 1 1]
[ 2 0 2 2 2 0 2 0 2 2]
[ 0 0 3 0 3 0 3 0 3 3]
[ 0 0 4 0 4 0 0 0 4 4]
[ 0 0 5 0 5 0 0 0 5 5]
[ 0 0 6 0 6 0 0 0 6 6]
[ 0 0 7 0 7 0 0 0 0 7]
[ 0 0 8 0 8 0 0 0 0 8]
[ 0 0 9 0 9 0 0 0 0 9]
[ 0 0 0 0 10 0 0 0 0 10]
[ 0 0 0 0 11 0 0 0 0 0]
[ 0 0 0 0 12 0 0 0 0 0]]
3) I generate the single elements of the summation, using a mask to avoid applying the function to the elements that are zero:
A = np.log(1 + (1 + j) * 5)* (j>0)
4) I sum along the columns:
A.sum(axis=0)
Obtaining:
array([  5.170484  ,   2.39789527,  29.96464821,   5.170484  ,
        42.29052851,   2.39789527,   8.21500643,   2.39789527,
        18.49060911,  33.9899999 ])
Is there a faster/better way to vectorize this? I have the feeling that it is very slow due to the large number of zeros that do not contribute to the sum, but since I am a beginner with NumPy I couldn't figure out a better way of writing it.
EDIT: in my actual problem, the function applied to j also depends on a second parameter tau (a vector of the same size as T). So the items contained in every column are not the same.
Looking at your j, each column has numbers going from 1 to N, where N is decided by the corresponding element of T. You then sum along each column, which is the same as summing up to N, because the rest of the elements are zeros anyway. Those summed values can be computed with np.cumsum, and the N values, which are basically the limits of each column in j, can be computed directly from T. These N values are then used as indices into the cumsum-ed values to give us the final output.
This should be pretty fast and memory efficient, given that cumsum is the only computation done and that too on a 1D array, as compared to the summation done in the original approach on a 2D array along each column. Thus, we have a vectorized approach like so -
n = int(T.max())
vals = (np.log(1 + (1 + np.arange(1,n+1)) * 5)).cumsum()  # prefix sums of all possible terms
out = vals[(T.astype(int)).clip(max=n-1)]                 # pick the right prefix sum per T element
In terms of memory usage, we are generating three variables -
n : Scalar
vals : 1D array of n elements
out : 1D array of T.size elements (this is the output anyway)
Runtime test and verify output -
In [5]: def original_app(T):
   ...:     n = int(T.max())
   ...:     j = ((np.arange(n) < T[:,None])*np.arange(1,n+1)).astype(int).transpose()
   ...:     A = np.log(1 + (1 + j) * 5)* (j>0)
   ...:     return A.sum(axis=0)
   ...:
   ...: def vectorized_app(T):
   ...:     n = int(T.max())
   ...:     vals = (np.log(1 + (1 + np.arange(1,n+1)) * 5)).cumsum()
   ...:     return vals[(T.astype(int)).clip(max=n-1)]
   ...:
In [6]: # Input array
   ...: T = np.random.exponential(5, 10000)
In [7]: %timeit original_app(T)
100 loops, best of 3: 9.62 ms per loop
In [8]: %timeit vectorized_app(T)
10000 loops, best of 3: 50.1 µs per loop
In [9]: np.allclose(original_app(T),vectorized_app(T)) # Verify outputs
Out[9]: True

Calculating number of permutations of a matrix with elements being adjacent integers only

I'm trying to write Python code to determine the number of possible permutations of a matrix where neighbouring elements can only be adjacent integers. I also wish to know how many times each total set of numbers appears (by that I mean, the same count of each integer across matrices, but not the same matrix arrangement).
Forgive me if I'm not being clear, or if my terminology isn't ideal! Consider a 5 x 5 zero matrix. This is an acceptable permutation, as all of the elements are adjacent to an identical number.
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
25 x 0, 0 x 1, 0 x 2
The elements within the matrix can be changed to 1 or 2. Changing any of the elements to 1 would also be an acceptable permutation, as the 1 would be surrounded by an adjacent integer, 0. For example, changing the central [2,2] element of the matrix:
0 0 0 0 0
0 0 0 0 0
0 0 1 0 0
0 0 0 0 0
0 0 0 0 0
24 x 0, 1 x 1, 0 x 2
However, changing the [2,2] element in the centre to a 2 would mean that all of the elements surrounding it would have to switch to 1, as 2 is not adjacent to 0.
0 0 0 0 0
0 1 1 1 0
0 1 2 1 0
0 1 1 1 0
0 0 0 0 0
16 x 0, 8 x 1, 1 x 2
I want to know how many permutations are possible from that zeroed 5x5 matrix by changing the elements to 1 and 2, whilst keeping neighbouring elements as adjacent integers. In other words, any permutations where 0 and 2 are adjacent are not allowed.
I also wish to know how many matrices contain a certain number of each integer. For example, both of the below matrices would be 24 x 0, 1 x 1, 0 x 2. Over every permutation, I'd like to know how many correspond to this frequency of integers.
0 0 0 0 0    0 0 0 0 0
0 0 0 0 0    0 0 0 0 0
0 0 1 0 0    1 0 0 0 0
0 0 0 0 0    0 0 0 0 0
0 0 0 0 0    0 0 0 0 0
Again, sorry if I'm not being clear or my nomenclature is poor! Thanks for your time - I'd really appreciate some help with this, and any words or guidance would be kindly received.
Thanks,
Sam
First, what you're calling a permutation isn't.
Secondly, your problem is that a naive brute-force solution would look at 3^25 = 847,288,609,443 possible combinations. (Somewhat fewer are valid, but probably still in the hundreds of billions.)
The right way to solve this is called dynamic programming. For your basic problem, what you need to do is calculate, for each row i from 0 to 4 and for each possible configuration of that row, how many valid matrices end in that row configuration.
Add up all of the possible answers in the last row, and you'll have your answer.
For the more detailed count, you need to divide it by row, by cumulative counts you could be at for each value. But otherwise it is the same.
The straightforward version should require tens of thousands of operations. The detailed version might require millions. But this will be massively better than the hundreds of billions of operations that the naive recursive version takes.
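Here is a minimal sketch of that row-by-row counting idea (my own assumed implementation, not code from the original answer; "adjacent integers" is encoded as entries that differ by at most 1):
from itertools import product

N = 5  # the matrix is N x N with entries in {0, 1, 2}

def row_ok(row):
    # horizontally adjacent entries must be adjacent integers
    return all(abs(a - b) <= 1 for a, b in zip(row, row[1:]))

def rows_compatible(r1, r2):
    # vertically adjacent entries must be adjacent integers
    return all(abs(a - b) <= 1 for a, b in zip(r1, r2))

valid_rows = [r for r in product(range(3), repeat=N) if row_ok(r)]

# counts[row] = number of valid matrices whose last row is `row`
counts = {r: 1 for r in valid_rows}
for _ in range(N - 1):
    counts = {r2: sum(c for r1, c in counts.items() if rows_compatible(r1, r2))
              for r2 in valid_rows}

print(sum(counts.values()))  # total number of valid matrices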
Just search for some simpler rules:
1s can be distributed arbitrarily in the array, since the matrix so far only consists of 0s. 2s can be distributed arbitrarily as well, since only neighbouring elements must be either 1 or 2.
Thus there are f(x) = n! / x! possibilities to distribute 1s and 2s over the matrix.
So the total number of possible permutations is 2 * sum(x = 1 .. n*n) f(x).
Calculating the number of possible permutations with a fixed number of 1s can easily be solved by simply calculating f(x).
The number of matrices with a fixed number of 2s and 1s is a bit trickier. Here you can only rely on the fact that all mirrored versions of the matrix yield the same number of 1s and 2s and are valid. Apart from using that fact, you can only brute-force search for correct solutions.

Recursion and Percolation

I'm trying to write a function that will check for undirected percolation in a numpy array. In this case, undirected percolation occurs when there is some kind of path that the liquid can follow (the liquid can travel up, down, and sideways, but not diagonally). Below is an example of an array that could be given to us.
1 0 1 1 0
1 0 0 0 1
1 0 1 0 0
1 1 1 0 0
1 0 1 0 1
The result of percolation in this scenario is below.
1 0 1 1 0
1 0 0 0 0
1 0 1 0 0
1 1 1 0 0
1 0 1 0 0
In the scenario above, the liquid could follow a path and everything with a 1 currently would refill except for the 1's in positions [1,4] and [4,4].
The function I'm trying to write starts at the top of the array and checks to see if it's a 1. If it's a 1, it writes it to a new array. What I want it to do next is check the positions above, below, left, and right of the 1 that has just been assigned.
What I currently have is below.
def flow_from(sites, full, i, j):
    n = len(sites)
    if 0 <= i < n and 0 <= j < n:  # check that the position is within the array bounds
        if sites[i, j] == 0:
            full[i, j] = 0
        else:
            full[i, j] = 1
            flow_from(sites, full, i, j + 1)
            flow_from(sites, full, i, j - 1)
            flow_from(sites, full, i + 1, j)
            flow_from(sites, full, i - 1, j)
In this case, sites is the original matrix, for example the one shown above. full is the matrix being filled in with the flow result (the second matrix shown), and i and j are used to iterate through.
Whenever I run this, I get an error that says "RuntimeError: maximum recursion depth exceeded in comparison." I looked into this and I don't think I need to adjust my recursion limit, but I have a feeling there's something blatantly obvious with my code that I just can't see. Any pointers?
Forget about your code block. This is a known problem with a known solution in the scipy library. Adapting the code from this answer, and assuming your data is in an array named A:
import numpy as np
from scipy.ndimage import measurements

A = np.array([[1,0,1,1,0],
              [1,0,0,0,1],
              [1,0,1,0,0],
              [1,1,1,0,0],
              [1,0,1,0,1]])

# Identify the clusters and measure their sizes
lw, num = measurements.label(A)
area = measurements.sum(A, lw, index=np.arange(lw.max() + 1))
print(A)
print(lw)
print(area)
This gives:
[[1 0 1 1 0]
[1 0 0 0 1]
[1 0 1 0 0]
[1 1 1 0 0]
[1 0 1 0 1]]
[[1 0 2 2 0]
[1 0 0 0 3]
[1 0 1 0 0]
[1 1 1 0 0]
[1 0 1 0 4]]
[ 0. 9. 2. 1. 1.]
That is, it's labeled all the "clusters" for you and identified the size! From here you can see that the clusters labeled 3 and 4 have size 1 which is what you want to filter away. This is a much more powerful approach because now you can filter for any size.
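A possible follow-up sketch (my assumption, not part of the original answer): zero out every cluster whose area falls below a threshold.
min_size = 2
small_labels = np.where(area < min_size)[0]         # labels of the too-small clusters
result = np.where(np.isin(lw, small_labels), 0, A)  # keep only the large clusters
print(result)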
