I have a 2D numpy.ndarray. Given a list of positions, I want to find the positions of first non-zero elements to the right of the given elements in the same row. Is it possible to vectorize this? I have a huge array and looping is taking too much time.
Eg:
matrix = numpy.array([
[1, 0, 0, 1, 1],
[1, 1, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 1, 1, 1, 1],
[1, 0, 0, 0, 1]
])
query = numpy.array([[0,2], [2,1], [1,3], [0,1]])
Expected Result:
>> [[0,3], [2,4], [1,4], [0,3]]
Currently I'm doing this using for loops as follows
for query_point in query:
    y, x = query_point
    result_point = numpy.min(numpy.argwhere(matrix[y, x + 1:] == 1)) + x + 1
    print(f'{y}, {result_point}')
PS: I also want to find the first non-zero element to the left as well. I guess the solution to find the right point can be easily tweaked to find the left point.
If your query array is sufficiently dense, you can reverse the computation: build an array of the same size as matrix that gives, for each location, the index of the next nonzero element in the same row. Then your problem reduces to indexing this array with query, which numpy supports directly.
It is actually much easier to find the left index, so let's start with that. We can transform matrix into an array of indices like this:
r, c = np.nonzero(matrix)
left_ind = np.zeros(matrix.shape, dtype=int)
left_ind[r, c] = c
Now you can find the indices of the preceding nonzero element by using np.maximum similarly to how it is done in this answer: https://stackoverflow.com/a/48252024/2988730:
np.maximum.accumulate(left_ind, axis=1, out=left_ind)
Now you can index directly into left_ind to get the previous nonzero column index:
left_ind[query[:, 0], query[:, 1]]
or
left_ind[tuple(query.T)]
Now to do the same thing with the right index, you need to reverse the array. But then your indices are no longer ascending, and you risk overwriting any zeros you have in the first column. To solve that, in addition to just reversing the array, you need to reverse the order of the indices:
right_ind = np.zeros(matrix.shape, dtype=int)
right_ind[r, c] = matrix.shape[1] - c
You can use any number larger than matrix.shape[1] as your constant as well. The important thing is that the reversed indices all come out greater than zero so np.maximum.accumulate overwrites the zeros. Now you can use np.maximum.accumulate in the same way on the reversed array:
right_ind = matrix.shape[1] - np.maximum.accumulate(right_ind[:, ::-1], axis=1)[:, ::-1]
In this case, I would recommend against using out=right_ind, since right_ind[:, ::-1] is a view into the same buffer. The operation is buffered, but if your line size is big enough, you may overwrite data unintentionally.
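A quick way to see the aliasing, as a minimal sketch (the array shape is arbitrary and chosen only for illustration): np.shares_memory reports whether two arrays overlap in memory, and a write through the reversed view shows up in the original array.

```python
import numpy as np

# Illustrative array; the shape is an arbitrary assumption.
right_ind = np.zeros((2, 5), dtype=int)

# Slicing with a negative step returns a view, not a copy.
reversed_view = right_ind[:, ::-1]
shares = np.shares_memory(right_ind, reversed_view)

# Writing to the first column of the view changes the last column of the original.
reversed_view[0, 0] = 7
last_value = right_ind[0, -1]
```

This is why writing the accumulated result back into right_ind while reading from its reversed view is best avoided.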
Now you can index the array in the same way as before:
right_ind[(*query.T,)]
In both cases, you need to stack with the first column of query, since that's the row key:
>>> row, col = query.T
>>> np.stack((row, left_ind[row, col]), -1)
array([[0, 0],
       [2, 0],
       [1, 1],
       [0, 0]])
>>> np.stack((row, right_ind[row, col]), -1)
array([[0, 3],
       [2, 4],
       [1, 4],
       [0, 3]])
>>> np.stack((row, left_ind[row, col], right_ind[row, col]), -1)
array([[0, 0, 3],
       [2, 0, 4],
       [1, 1, 4],
       [0, 0, 3]])
If you plan on sampling most of the rows in the array, either at once, or throughout your program, this will help you speed things up. If, on the other hand, you only need to access a small subset, you can apply this technique only to the rows you need.
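For reference, here is a hedged variant of the same idea, not the exact code above: it uses sentinel values instead of zero-initialized arrays, so that "no nonzero element on this side" can be told apart from column 0. The function name nearest_nonzero_columns and the sentinels (-1 on the left, the column count on the right) are my own assumptions. As with the tables above, a query cell that is itself nonzero returns its own column.

```python
import numpy as np

def nearest_nonzero_columns(matrix):
    """Return (left_ind, right_ind) lookup tables for a 2D array (hypothetical helper)."""
    n_cols = matrix.shape[1]
    cols = np.arange(n_cols)
    nonzero = matrix != 0
    # Column index where nonzero, -1 elsewhere; a running max carries it rightwards.
    left_ind = np.maximum.accumulate(np.where(nonzero, cols, -1), axis=1)
    # Column index where nonzero, n_cols elsewhere; a running min over the
    # reversed rows carries it leftwards.
    right_src = np.where(nonzero, cols, n_cols)
    right_ind = np.minimum.accumulate(right_src[:, ::-1], axis=1)[:, ::-1]
    return left_ind, right_ind

matrix = np.array([
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
])
query = np.array([[0, 2], [2, 1], [1, 3], [0, 1]])

left_ind, right_ind = nearest_nonzero_columns(matrix)
row, col = query.T
result = np.stack((row, left_ind[row, col], right_ind[row, col]), -1)
```

A value of -1 in the left table or matrix.shape[1] in the right table then means there is no nonzero element on that side.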
I came up with a solution to get both your wanted indices,
i.e. to the left and to the right from the indicated position.
First define the following function, to get the row number and both indices:
def inds(r, c, arr):
    ind = np.nonzero(arr[r])[0]
    indSlice = ind[ind < c]
    iLeft = indSlice[-1] if indSlice.size > 0 else None
    indSlice = ind[ind > c]
    iRight = indSlice[0] if indSlice.size > 0 else None
    return r, iLeft, iRight
Parameters:
r and c are row number (in the source array) and the "starting"
index in this row,
arr is the array to look in (matrix will be passed here).
Then define the vectorized version of this function:
indsVec = np.vectorize(inds, excluded=['arr'])
And to get the result, run:
result = np.vstack(indsVec(query[:, 0], query[:, 1], arr=matrix)).T
The result is:
array([[0, 0, 3],
       [2, 0, 4],
       [1, 1, 4],
       [0, 0, 3]], dtype=int64)
Your expected result is the left and right column (the row number
and the index of the first non-zero element after the "starting" position).
The middle column is the index of the last non-zero element before the "starting" position.
The function also handles the "non-existing" case (when there is no
non-zero element before or after the "starting" position). In such a case
the respective index is returned as None.
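One caveat, offered as a hedged sketch rather than a correction: as far as I know, np.vectorize infers the output dtype from the first call when otypes is not given, so a None appearing in a later row may fail to convert to an integer dtype. Passing object otypes should keep None intact. The small matrix and query below are made up for illustration, and inds_vec is just a renamed copy of the vectorized function.

```python
import numpy as np

def inds(r, c, arr):
    # Same logic as the function above, written compactly.
    ind = np.nonzero(arr[r])[0]
    left = ind[ind < c]
    right = ind[ind > c]
    return r, (left[-1] if left.size else None), (right[0] if right.size else None)

# Object otypes let None pass through unchanged (assumption explained above).
inds_vec = np.vectorize(inds, excluded=['arr'], otypes=[object, object, object])

# Made-up data: row 0 has nothing to the left of column 1,
# row 1 has nothing to the right of column 2.
matrix = np.array([[0, 0, 1, 0],
                   [1, 0, 0, 0]])
query = np.array([[0, 1], [1, 2]])

result = np.vstack(inds_vec(query[:, 0], query[:, 1], arr=matrix)).T
```

The result is then an object array, so convert it explicitly if you need integer arithmetic on the found indices.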
This question already has answers here:
List of lists changes reflected across sublists unexpectedly
(17 answers)
Closed 3 years ago.
I just started programming with Python and I have a question concerning 2D arrays.
I need to create a matrix (numpy forbidden) of a certain size from av[1] and save it so that I can multiply it with another one later on.
The logical thing for me was:
1- Get the length of av[1]; 2- Transform the av[1][] to its ASCII equivalent with ord and finally 3- Insert that value into a list called key_matrix.
The matrix needs to be the smallest size possible and in a "square form", so I calculate the smallest square containing the len of av[1] ( called matrix_size ) and then I initialize my array like this:
key_matrix = [[0] * matrix_size] * matrix_size;
Like that I get my square matrix filled with 0.
If matrix_size = 3 for example I get:
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
Now the strange part is (and also the part that I'm stuck on) the following, I use
index1 = 0;
index2 = 0;
key_matrix[index1][index2] = ord(sys.argv[1][0]);
print(key_matrix);
to fill only the 1st element of the 1st line.
While the result should be:
[[97, 0, 0], [0, 0, 0], [0, 0, 0]]
I get:
[[97, 0, 0], [97, 0, 0], [97, 0, 0]]
Me and my friends really can't seem to figure out the reason why it does this, any help is welcome!
Thank you all for reading :)
When you write [[0] * matrix_size] * matrix_size, the outer multiplication does not copy the inner list: it repeats a reference to the same [0] * matrix_size list object matrix_size times. Since all the rows are the same object in memory, modifying one row modifies them all. As a quick fix, build each row separately:
>>> key_matrix2 = [[0] * matrix_size for i in range(matrix_size)]
>>> key_matrix2
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> key_matrix2[0][1] = 2
>>> key_matrix2
[[0, 2, 0], [0, 0, 0], [0, 0, 0]]
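To check the aliasing yourself, here is a small sketch (the variable names shared and independent are mine): the is operator tells you whether two rows are the same object.

```python
matrix_size = 3

# Outer multiplication repeats a reference to one inner list.
shared = [[0] * matrix_size] * matrix_size
# The comprehension builds a new inner list on every iteration.
independent = [[0] * matrix_size for _ in range(matrix_size)]

shared[0][0] = 97
independent[0][0] = 97

rows_are_same_object = shared[0] is shared[1]            # True
rows_are_distinct = independent[0] is not independent[1]  # True
```

Only the comprehension version changes a single row when you assign to one element.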
I am trying to find the numpy matrix operations to get the same result as in the following for loop code. I believe it will be much faster but I am missing some python skills to do it.
It works line by line, each value from a line of x is multiplied by each value of the same line in e and then summed.
The first item of result would be (2*0+2*1+2*4+2*2+2*3)+(0*0+...)+...+(1*0+1*1+1*4+1*2+1*3)=30
Any idea would be much appreciated :).
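# dtype=object is required by recent NumPy versions for rows of unequal length
e = np.array([[0,1,4,2,3],[2,0,2,3,0,1]], dtype=object)
x = np.array([[2,0,0,0,1],[0,3,0,0,4,0]], dtype=object)
result = np.zeros(len(x))
for key, j in enumerate(x):
    for jj in j:
        for i in e[key]:
            result[key] += jj*i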
>>> result
Out[1]: array([ 30., 56.])
Those are ragged arrays, since they hold lists of different lengths. So a fully vectorized approach, even if possible, won't be straightforward. Here's one using np.einsum in a list comprehension -
[np.einsum('i,j->',x[n],e[n]) for n in range(len(x))]
Sample run -
In [381]: x
Out[381]: array([[2, 0, 0, 0, 1], [0, 3, 0, 0, 4, 0]], dtype=object)
In [382]: e
Out[382]: array([[0, 1, 4, 2, 3], [2, 0, 2, 3, 0, 1]], dtype=object)
In [383]: [np.einsum('i,j->',x[n],e[n]) for n in range(len(x))]
Out[383]: [30, 56]
If you still insist on a fully vectorized approach, you could make a regular array by padding the shorter lists with zeros. For that, here's a post that lists a NumPy based approach to do the filling.
Once, we have the regular shaped arrays as x and e, the final result would be simply -
np.einsum('ik,il->i',x,e)
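A minimal sketch of the padded version, assuming a simple loop-based padding helper (pad_rows is a hypothetical name, not the approach from the linked post). Zero padding is safe here because the extra zeros contribute nothing to the sums. It also shows that the double sum is just the product of the row sums.

```python
import numpy as np

x_rows = [[2, 0, 0, 0, 1], [0, 3, 0, 0, 4, 0]]
e_rows = [[0, 1, 4, 2, 3], [2, 0, 2, 3, 0, 1]]

def pad_rows(rows):
    """Right-pad each row with zeros to the longest row's length (hypothetical helper)."""
    width = max(len(row) for row in rows)
    padded = np.zeros((len(rows), width), dtype=int)
    for i, row in enumerate(rows):
        padded[i, :len(row)] = row
    return padded

x = pad_rows(x_rows)
e = pad_rows(e_rows)

# Sum over every pair (k, l) within each row i.
result = np.einsum('ik,il->i', x, e)

# Equivalent: sum_k sum_l x[i,k]*e[i,l] == (sum_k x[i,k]) * (sum_l e[i,l]).
alt = x.sum(axis=1) * e.sum(axis=1)
```

Both give 30 and 56 for the sample data, matching the loop in the question.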
Is this close to what you are looking for?
https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html
It seems like you are trying to get the dot product of matrices.
I'm doing some work with the Ising model. I've written a code to help me count the multiplicity of a lattice but I can't get up to any big numbers without getting a MemoryError.
The basic idea is, you have a list of zeros and ones, say [0,0,1,1]. I want to generate a set of all possible orderings of the ones and zeros. So in this example I want a set like this:
[(1,1,0,0),(1,0,1,0),(1,0,0,1),(0,1,1,0),(0,1,0,1),(0,0,1,1)]
At the moment I have done it like this:
import itertools

set_initial=[0,0,1,1]
set_intermediate=[]
for subset in itertools.permutations(set_initial,4):
    set_intermediate.append(subset)
set_final=list(set(set_intermediate))
The issue is that in set_intermediate, for this example, there are 4! = 24 elements, only six of which are unique. And to take another example such as [0,0,0,0,0,0,0,0,1], there are 9! = 362880 elements, only 9 of which are unique.
Is there another way of doing this so that set_intermediate isn't such a bottleneck?
Instead of permutations, you can think in terms of selecting the positions of the 1s as combinations. (I knew I'd done something similar before..)
from itertools import combinations
def binary_perm(seq):
    n_on = sum(seq)
    for comb in combinations(range(len(seq)), n_on):
        out = [0]*len(seq)
        for loc in comb:
            out[loc] = 1
        yield out
Not super-speedy, but generates exactly the right number of outputs, and so can handle longer sequences:
>>> list(binary_perm([0,0,1,1]))
[[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
>>> %timeit sum(1 for x in binary_perm([1]+[0]*10**4))
1 loops, best of 3: 409 ms per loop
Of course, usually you'd want to avoid looping over these in the first place, but depending on what you're doing with the permutations you might not be able to get away with simply calculating the number of unique permutations directly.
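If the count alone is enough, a sketch like the following avoids generating anything, assuming Python 3.8+ for math.comb (count_orderings is a name I made up): the number of distinct orderings is the number of ways to choose the positions of the 1s.

```python
from math import comb  # available in Python 3.8+

def count_orderings(seq):
    """Number of distinct orderings of a list of 0s and 1s (hypothetical helper)."""
    # Choosing which positions hold the 1s fully determines an ordering.
    return comb(len(seq), sum(seq))

small = count_orderings([0, 0, 1, 1])
single_one = count_orderings([0] * 8 + [1])
```

For the two examples from the question this gives 6 and 9, the same as the number of unique tuples.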
Try the built-in function itertools.permutations(iterable, r).
I am attempting Project Euler #15, which essentially reduces to computing the number of binary lists of length 2*size such that their entries sum to size, for the particular case size = 20. For example, if size = 2 there are 6 such lists: [1,1,0,0], [1,0,1,0], [1,0,0,1], [0,1,1,0], [0,1,0,1], [0,0,1,1]. Of course the number of such sequences is trivial to compute for any value of size and is equal to some binomial coefficient, but I am interested in explicitly generating the correct sequences in Python. I have tried the following:
import itertools
size = 20
binary_lists = itertools.product(range(2), repeat = 2*size)
lattice_paths = {lists for lists in binary_lists if sum(lists) == size}
but the last line makes me run into memory errors. What would be a neat way to accomplish this?
There are far too many for the case of size=20 to iterate over (even if we don't materialize them, 137846528820 is not a number we can loop over in a reasonable time), so it's not particularly useful.
But you can still do it using built-in tools by thinking of the positions of the 1s:
from itertools import combinations
def bsum(size):
    for locs in combinations(range(2*size), size):
        vec = [0]*(2*size)
        for loc in locs:
            vec[loc] = 1
        yield vec
which gives
>>> list(bsum(1))
[[1, 0], [0, 1]]
>>> list(bsum(2))
[[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
>>> sum(1 for x in bsum(12))
2704156
>>> factorial(24)//factorial(12)**2
2704156
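Since bsum is a generator, you can still peek at the size=20 case without exhausting it. A minimal sketch, assuming Python 3.8+ for math.comb and using itertools.islice to take only the first few vectors:

```python
from itertools import combinations, islice
from math import comb  # available in Python 3.8+

def bsum(size):
    # Same generator as above: place `size` ones among 2*size positions.
    for locs in combinations(range(2 * size), size):
        vec = [0] * (2 * size)
        for loc in locs:
            vec[loc] = 1
        yield vec

# Only three vectors are ever built; the rest are never generated.
first_three = list(islice(bsum(20), 3))

# The total count comes from the binomial coefficient, not from iteration.
total = comb(40, 20)
```

Here total comes out to 137846528820, the figure quoted above.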
I'm not 100% sure of the math on this problem, but your last line takes a generator and dumps it into a set, and based on your example and your size of 20, that is a massive set. If you want to sum it, just iterate, but I don't think you can get a nice view of every combination.