Efficient Logical AND of Every Combination of Two Mask Elements

I am looking to take a numpy array which is a 1D boolean mask of size N, and transform it into a new mask where each element represents a boolean AND over two mask elements (I don't want to repeat the same combinations twice since the order has no importance for the logical 'AND').
Example input:
mask = [1, 0, 1] = [a, b, c]
Expected output:
newmask = [1*0, 1*1, 0*1] = [0, 1, 0] = [a*b, a*c, b*c]

From a list of elements you can create all possible combinations of them where their order doesn't matter, without wasting time on repeated combinations:
from itertools import combinations_with_replacement
import numpy as np
n = 3
elements_to_combine = [0, 1]
for c in combinations_with_replacement(elements_to_combine, n):
    x = np.array(list(c))
    print(x)
and the output is:
[0 0 0]
[0 0 1]
[0 1 1]
[1 1 1]
Now you have a straightforward method to compute only the combinations you need. You can also add elements to the list "elements_to_combine" and increase n according to your needs. Since you didn't specify precisely what kind of elements are to be used and how you intend to mask your elements using the logical AND operations, I will leave the rest to you. Hope this solves your performance issues.
Cheers!
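For the specific mask in the question, one possible way to form every unordered pair without repeats is to index the mask with the upper-triangle indices from np.triu_indices. This is a different technique from the combinations_with_replacement approach above, offered only as a minimal sketch; it assumes the mask is (or can be converted to) a boolean NumPy array.

```python
import numpy as np

# Example mask from the question: [a, b, c] = [1, 0, 1]
mask = np.array([1, 0, 1], dtype=bool)

# Index pairs (i, j) with i < j, i.e. each unordered pair exactly once:
# (0, 1), (0, 2), (1, 2)
i, j = np.triu_indices(mask.size, k=1)

# Element-wise AND over all pairs: [a&b, a&c, b&c]
newmask = mask[i] & mask[j]

print(newmask.astype(int))  # [0 1 0]
```

The result has N*(N-1)/2 elements, so memory grows quadratically with the mask size.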

Related

Function Failing at Large List Sizes

I have a question: Starting with a 1-indexed array of zeros and a list of operations, for each operation add a value to each array element between two given indices, inclusive. Once all operations have been performed, return the maximum value in the array.
Example: n = 10, Queries = [[1,5,3],[4,8,7],[6,9,1]]
The array evolves as follows as the operations are applied one by one (first, indices 1-5 have 3 added to them, and so on):
[0,0,0, 0, 0,0,0,0,0, 0]
[3,3,3, 3, 3,0,0,0,0, 0]
[3,3,3,10,10,7,7,7,0, 0]
[3,3,3,10,10,8,8,8,1, 0]
Finally you output the max value in the final list:
[3,3,3,10,10,8,8,8,1, 0]
My current solution:
def Operations(size, Array):
    ResultArray = [0]*size
    Values = [[i.pop(2)] for i in Array]
    for index, i in enumerate(Array):
        # Current values in the results array = sum of the current values in the
        # results array and the added operation value, repeated to equal length
        ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
    Result = max(ResultArray)
    return Result
def main():
    nm = input().split()
    n = int(nm[0])
    m = int(nm[1])
    queries = []
    for _ in range(m):
        queries.append(list(map(int, input().rstrip().split())))
    result = Operations(n, queries)
    print(result)
if __name__ == "__main__":
    main()
Example input: The first line contains two space-separated integers n and m, the size of the array and the number of operations.
Each of the next m lines contains three space-separated integers a,b and k, the left index, right index and summand.
5 3
1 2 100
2 5 100
3 4 100
Compiler Error at Large Sizes:
Runtime Error
Currently this solution works for smaller final lists of length 4000; however, in other test cases where length = 10,000,000 it fails. I do not know why this is the case, and I cannot provide the example input since it is so massive. Is there any obvious reason why it would fail in larger cases?

I think the problem is that you make too many intermediate throwaway lists here:
ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
The slice ResultArray[i[0]-1:i[1]] produces a new list, and you build it twice, once just to get its size, which is a complete waste of resources. Then you make another list with Values[index]*len(...), and finally you collect the result into yet another list that is also thrown away once it has been assigned into the original. That makes 4 throwaway lists. For example, if the slice size is 5,000,000, you are making 4 of those, so you consume space for 20,000,000 extra elements, 15,000,000 of which you don't really need. And if your original list has 10,000,000 elements, well, just do the math...
You can get the same result as your list(map(...)) with a list comprehension like
[v+Values[index][0] for v in ResultArray[i[0]-1:i[1]] ]
Now we use two fewer lists, and we can drop one more by making it a generator expression, given that slice assignment does not require a list specifically, just something that is iterable:
(v+Values[index][0] for v in ResultArray[i[0]-1:i[1]] )
I don't know whether slice assignment internally turns it into a list first, but hopefully it doesn't, and with that we are back to just one extra list.
Here is an example:
>>> a=[0]*10
>>> a
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> a[1:5] = (3+v for v in a[1:5])
>>> a
[0, 3, 3, 3, 3, 0, 0, 0, 0, 0]
>>>
We can reduce it to zero extra lists (assuming that internally it doesn't make one) by using itertools.islice:
>>> import itertools
>>> a[3:7] = (1+v for v in itertools.islice(a,3,7))
>>> a
[0, 3, 3, 4, 4, 1, 1, 0, 0, 0]
>>>
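Putting the pieces above together, the original function could be sketched as follows. The name operations_in_place is made up for illustration, and whether this alone is enough to pass the 10,000,000-element test cases is not something the thread confirms; it only removes the intermediate lists discussed above.

```python
from itertools import islice

def operations_in_place(size, queries):
    """Apply each (left, right, value) operation to a 1-indexed array of zeros
    and return the maximum. Hypothetical rewrite of Operations from the question."""
    result = [0] * size
    for left, right, value in queries:
        # islice walks the existing list without copying the slice first;
        # the generator yields the updated values for the slice assignment.
        result[left - 1:right] = (v + value for v in islice(result, left - 1, right))
    return max(result)

# Example from the question: n = 10, three operations
print(operations_in_place(10, [[1, 5, 3], [4, 8, 7], [6, 9, 1]]))  # 10
```

Unlike the original, this version unpacks each query instead of calling pop(2), so the caller's list of queries is left unmodified.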

Find the index of first non-zero element to the right of given elements in python

I have a 2D numpy.ndarray. Given a list of positions, I want to find the positions of first non-zero elements to the right of the given elements in the same row. Is it possible to vectorize this? I have a huge array and looping is taking too much time.
Eg:
matrix = numpy.array([
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1]
])
query = numpy.array([[0,2], [2,1], [1,3], [0,1]])
Expected Result:
>> [[0,3], [2,4], [1,4], [0,3]]
Currently I'm doing this using for loops as follows
for query_point in query:
    y, x = query_point
    result_point = numpy.min(numpy.argwhere(matrix[y, x + 1:] == 1)) + x + 1
    print(f'{y}, {result_point}')
PS: I also want to find the first non-zero element to the left. I guess the solution for finding the right point can easily be tweaked to find the left point.

If your query array is sufficiently dense, you can reverse the computation: find an array of the same size as matrix that gives the index of the next nonzero element in the same row for each location. Then your problem becomes just one of applying query to this index array, which numpy supports directly.
It is actually much easier to find the left index, so let's start with that. We can transform matrix into an array of indices like this:
import numpy as np
r, c = np.nonzero(matrix)
left_ind = np.zeros(matrix.shape, dtype=int)
left_ind[r, c] = c
Now you can find the indices of the preceding nonzero element by using np.maximum similarly to how it is done in this answer: https://stackoverflow.com/a/48252024/2988730:
np.maximum.accumulate(left_ind, axis=1, out=left_ind)
Now you can index directly into left_ind to get the previous nonzero column index:
left_ind[query[:, 0], query[:, 1]]
or
left_ind[tuple(query.T)]
Now to do the same thing with the right index, you need to reverse the array. But then your indices are no longer ascending, and you risk overwriting any zeros you have in the first column. To solve that, in addition to just reversing the array, you need to reverse the order of the indices:
right_ind = np.zeros(matrix.shape, dtype=int)
right_ind[r, c] = matrix.shape[1] - c
You can use any number larger than matrix.shape[1] as your constant as well. The important thing is that the reversed indices all come out greater than zero so np.maximum.accumulate overwrites the zeros. Now you can use np.maximum.accumulate in the same way on the reversed array:
right_ind = matrix.shape[1] - np.maximum.accumulate(right_ind[:, ::-1], axis=1)[:, ::-1]
In this case, I would recommend against using out=right_ind, since right_ind[:, ::-1] is a view into the same buffer. The operation is buffered, but if your line size is big enough, you may overwrite data unintentionally.
Now you can index the array in the same way as before:
right_ind[(*query.T,)]
In both cases, you need to stack with the first column of query, since that's the row key:
>>> row, col = query.T
>>> np.stack((row, left_ind[row, col]), -1)
array([[0, 0],
       [2, 0],
       [1, 1],
       [0, 0]])
>>> np.stack((row, right_ind[row, col]), -1)
array([[0, 3],
       [2, 4],
       [1, 4],
       [0, 3]])
>>> np.stack((row, left_ind[row, col], right_ind[row, col]), -1)
array([[0, 0, 3],
       [2, 0, 4],
       [1, 1, 4],
       [0, 0, 3]])
If you plan on sampling most of the rows in the array, either at once, or throughout your program, this will help you speed things up. If, on the other hand, you only need to access a small subset, you can apply this technique only to the rows you need.
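For reference, the steps of this answer can be collected into one runnable sketch. The helper name nearest_nonzero_columns is made up for illustration, and the sketch assigns the result of np.maximum.accumulate instead of using out=, purely as a cautious choice. Two caveats: a query point that is itself nonzero maps to its own column, and a row with no nonzero element to the right yields matrix.shape[1] as the right index.

```python
import numpy as np

matrix = np.array([
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
])
query = np.array([[0, 2], [2, 1], [1, 3], [0, 1]])

def nearest_nonzero_columns(matrix):
    """Hypothetical helper: for every cell, the column of the nearest nonzero
    element at or to the left, and at or to the right, in the same row."""
    n_cols = matrix.shape[1]
    r, c = np.nonzero(matrix)

    # Left: write each nonzero's own column, then carry it rightwards.
    left_ind = np.zeros(matrix.shape, dtype=int)
    left_ind[r, c] = c
    left_ind = np.maximum.accumulate(left_ind, axis=1)

    # Right: store reversed column numbers (all > 0), carry them leftwards
    # by accumulating over the reversed view, then undo the reversal.
    right_ind = np.zeros(matrix.shape, dtype=int)
    right_ind[r, c] = n_cols - c
    right_ind = n_cols - np.maximum.accumulate(right_ind[:, ::-1], axis=1)[:, ::-1]
    return left_ind, right_ind

left_ind, right_ind = nearest_nonzero_columns(matrix)
row, col = query.T
result = np.stack((row, left_ind[row, col], right_ind[row, col]), -1)
print(result)
```

Each row of result holds the query row, the left column index and the right column index, matching the last output shown above.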

I came up with a solution to get both your wanted indices,
i.e. to the left and to the right from the indicated position.
First define the following function, to get the row number and both indices:
def inds(r, c, arr):
    ind = np.nonzero(arr[r])[0]
    indSlice = ind[ind < c]
    iLeft = indSlice[-1] if indSlice.size > 0 else None
    indSlice = ind[ind > c]
    iRight = indSlice[0] if indSlice.size > 0 else None
    return r, iLeft, iRight
Parameters:
r and c are row number (in the source array) and the "starting"
index in this row,
arr is the array to look in (matrix will be passed here).
Then define the vectorized version of this function:
indsVec = np.vectorize(inds, excluded=['arr'])
And to get the result, run:
result = np.vstack(indsVec(query[:, 0], query[:, 1], arr=matrix)).T
The result is:
array([[0, 0, 3],
       [2, 0, 4],
       [1, 1, 4],
       [0, 0, 3]], dtype=int64)
Your expected result is the left and right columns (the row number
and the index of the first non-zero element after the "starting" position).
The middle column is the index of the last non-zero element before the "starting" position.
This solution is meant to handle the "non-existing" case (when there is no
non-zero element "before" or "after"). In such a case the respective
index is returned as None.

ufunc.at for cases where target indices are unique (buffered call possible then)

I use ufunc.at in a way similar to a sparse matrix multiplication or, better, a flow in a graph. c[:, 0] denotes the target index into which each element denoted by the source index c[:, 1] will be summed:
c = np.array([[0, 1], [0, 2], [1, 1]]) # sum up 1 and 2 into 0, and 1 into 1
src = ... # source vector
targ = ... # target vector, not necessarily 0 in the beginning
np.add.at(targ, c[:, 0], src[c[:, 1]]) # sum up into bins
One could similarly write:
targ[c[:, 0]] += src[c[:, 1]]
That approach will only work if all target indices c[:, 0] are unique; otherwise there will be something like race conditions. I also expect it to be a bit faster, because it does not need to care about accumulation internally and can just do a 'one shot' addition, which is far more efficient when it comes to vectorization. Numpy calls this buffered/unbuffered operation.
Is there a similar syntax for the buffered version with unique target indices? (Basically just for convenience and more consistently looking code.)
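To make the difference between the two forms concrete, here is a small sketch using the c array from above with duplicate target indices. The src values and the zero-initialised targets are made up for illustration.

```python
import numpy as np

c = np.array([[0, 1], [0, 2], [1, 1]])  # (target, source) pairs; target 0 appears twice
src = np.array([10.0, 20.0, 30.0])      # made-up source vector

# Unbuffered: every pair contributes, even when a target index repeats.
targ_at = np.zeros(2)
np.add.at(targ_at, c[:, 0], src[c[:, 1]])

# Buffered fancy-index form: with a repeated target index, only one of the
# contributions to that target survives.
targ_fancy = np.zeros(2)
targ_fancy[c[:, 0]] += src[c[:, 1]]

print(targ_at)     # [50. 20.]
print(targ_fancy)  # target 0 is missing one of its two contributions
```

When all target indices are unique, the two forms give the same result.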

Generating binary lists that sum to a given number

I am attempting Project Euler #15, which essentially reduces to computing the number of binary lists of length 2*size such that their entries sum to size, for the particular case size = 20. For example, if size = 2 there are 6 such lists: [1,1,0,0], [1,0,1,0], [1,0,0,1], [0,1,1,0], [0,1,0,1], [0,0,1,1]. Of course the number of such sequences is trivial to compute for any value of size and is equal to some binomial coefficient, but I am interested in explicitly generating the correct sequences in Python. I have tried the following:
import itertools
size = 20
binary_lists = itertools.product(range(2), repeat = 2*size)
lattice_paths = {lists for lists in binary_lists if sum(lists) == size}
but the last line makes me run into memory errors. What would be a neat way to accomplish this?

There are far too many for the case of size=20 to iterate over (even if we don't materialize them, 137846528820 is not a number we can loop over in a reasonable time), so it's not particularly useful.
But you can still do it using built-in tools by thinking of the positions of the 1s:
from itertools import combinations
def bsum(size):
    for locs in combinations(range(2*size), size):
        vec = [0]*(2*size)
        for loc in locs:
            vec[loc] = 1
        yield vec
which gives
>>> list(bsum(1))
[[1, 0], [0, 1]]
>>> list(bsum(2))
[[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
>>> sum(1 for x in bsum(12))
2704156
>>> from math import factorial
>>> factorial(24)//factorial(12)**2
2704156
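If only the count is needed, the binomial coefficient can be computed directly, without generating anything. A minimal sketch, assuming Python 3.8 or newer for math.comb; the function name count_binary_lists is made up for illustration.

```python
from math import comb

def count_binary_lists(size):
    """Number of binary lists of length 2*size whose entries sum to size."""
    return comb(2 * size, size)

print(count_binary_lists(2))   # 6
print(count_binary_lists(12))  # 2704156
print(count_binary_lists(20))  # 137846528820
```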

I'm not 100% sure of the math on this problem, but your last line consumes a generator and collects every match into a set, and based on your example and a size of 20, that is a massive collection. If you want to sum it, just iterate without storing the results, but I don't think you can get a nice view of every combination.

Python - how to find numbers in a list which are not the minimum

I have a list S = [a[n],b[n],c[n]] and for n=0 the minimum of list S is the value 'a'. How do I select the values b and c given that I know the minimum? The code I'm writing runs through many iterations of n, and I want to examine the elements which are not the minimum for a given iteration in the loop.
Python 2.7.3, 32-bit. Numpy 1.6.2. Scipy 0.11.0b1

If you can convert the whole list into a numpy array, then use argsort; the first row of the argsort result tells you which array contains the minimum value at each position:
import numpy as np
a = [1,2,3,4]
b = [3,-4,5,8]
c = [6,1,-7,12]
S = [a,b,c]
S2 = np.array(S)
S2.argsort(axis=0)
array([[0, 1, 2, 0],
[1, 2, 0, 1],
[2, 0, 1, 2]])
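To actually pick out the values that are not the minimum, the remaining rows of the argsort result can be used as indices. A minimal sketch, assuming a NumPy version that provides np.take_along_axis (1.15 or newer, so much newer than the 1.6.2 mentioned in the question):

```python
import numpy as np

S2 = np.array([[1, 2, 3, 4],      # a
               [3, -4, 5, 8],     # b
               [6, 1, -7, 12]])   # c

order = S2.argsort(axis=0)

# Row 0 of order: which of a, b, c holds the minimum at each position n.
min_source = order[0]

# Rows 1 and 2 of order: the two non-minimum entries at each position n.
non_min = np.take_along_axis(S2, order[1:], axis=0)

print(min_source)  # [0 1 2 0]
print(non_min)
```

Here non_min has one column per n, containing the two values that are not the minimum, in ascending order.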

Maybe you can do something like
S.sort()
S[1:3]
Is this what you want?
