MATLAB "any" conditional deletion translation to Python

I'm having trouble understanding what B = A(~any(A < threshold, 2), :); (in MATLAB) does, given an array A with dimensions N x 3.
Ultimately, I am trying to implement a function to perform the same operation in Python (so far, I have something like B = A[not any(A[:,1] < threshold), :], which I know to be incorrect), and I was wondering what the NumPy equivalent of such an operation would be.
Thank you!

Not much of a difference, really. In MATLAB, you are performing ANY along each row with any(...,2). In NumPy, the axis argument denotes the dimension, and for a 2D array the equivalent is np.any(...,axis=1).
Thus, the NumPy equivalent implementation would be -
import numpy as np
B = A[~np.any(A < threshold,axis=1),:]
This is called boolean (mask) indexing in NumPy terminology. Since we are indexing along the first axis only, we can drop the select-all slice for the remaining axes. So, it simplifies to -
B = A[~np.any(A < threshold,axis=1)]
Finally, we can use the ndarray.any method and pass the axis positionally, without the axis keyword, to shorten the code further, like so -
B = A[~(A < threshold).any(1)]
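To make the behaviour concrete, here is a minimal sketch on a small made-up array (the values of A and threshold are assumptions chosen for illustration). It shows that the mask marks every row containing at least one element below the threshold, and that those rows are the ones removed.

```python
import numpy as np

# Hypothetical sample data: 4 rows, 3 columns
A = np.array([[5, 6, 7],
              [1, 9, 9],
              [8, 8, 2],
              [4, 4, 4]])
threshold = 3

# True for each row that has at least one element below the threshold
row_has_small = np.any(A < threshold, axis=1)

# Keep only the rows where no element is below the threshold
B = A[~row_has_small]

print(row_has_small)  # [False  True  True False]
print(B)              # rows [5 6 7] and [4 4 4] remain
```

The attempt in the question fails for two reasons: it only tests column 1 instead of every column, and Python's not expects a single truth value rather than an element-wise mask, which is what ~ provides.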

Related

Trouble understanding the function of : in array indexing? Python

I have this line in the code:
next_J[v] = np.min(Q[v, :] + J)
Where essentially Q is a matrix of size n x n and J is a vector of size n. What does Q[v, :] mean?
I tried to code this out but still do not understand what exactly it does.
Q[v,:] is translated by the interpreter as Q.__getitem__((v, slice(None)))
Note the (,) tuple syntax.
For a 2d array, this means select row v. The slice selects all columns, and isn't actually needed in this context.
For a list this produces an error. alist[v] would work.
Q[v]+J may not work for lists, depending on what J is. + is different for lists (it concatenates). The class of an object is important in understanding code.
The use of : in Python indexing is basic. So is its use in numpy indexing.
There's a lot more about using slicing with numpy arrays at https://numpy.org/doc/stable/user/basics.indexing.html#slicing-and-striding
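The points above can be checked with a short sketch. The values of Q, J and v here are made up for illustration; only the indexing behaviour matters.

```python
import numpy as np

# Hypothetical small inputs
Q = np.arange(9).reshape(3, 3)   # [[0 1 2], [3 4 5], [6 7 8]]
J = np.array([10, 0, 5])
v = 1

# Q[v, :] and Q[v] select the same row of a 2d array
same_row = np.array_equal(Q[v, :], Q[v])

# Element-wise sum of row v and J, then the smallest entry
next_val = np.min(Q[v, :] + J)   # min of [13, 4, 10]

# The same tuple index on a plain nested list raises TypeError
alist = Q.tolist()
try:
    alist[v, :]
    list_index_failed = False
except TypeError:
    list_index_failed = True

print(same_row, next_val, list_index_failed)
```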

Numpy compare two arrays take for each entry the smaller one

I have two arrays (which are very big) with the same dimensions. In one of them (in my code it's called "minMap") I want to save the smaller of the two values at each position.
My current code looks like this:
for y in range(loadedMap.shape[1]):
    for x in range(loadedMap.shape[0]):
        if loadedMap[x][y] < minMap[x][y]:
            minMap[x][y] = loadedMap[x][y]
It's working, but I'm pretty sure it's a dumb solution, because I haven't used any numpy functionality. Maybe a solution with vectorization is faster? I didn't know how to do that :/
(Sorry for my bad english)
There is a function np.minimum() which does exactly what you want:
# a and b are 2 arrays with same shape
c = np.minimum(a, b)
# c will contain minimum values from a and b
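Since the question wants the result stored back in minMap, one possible variation is to pass the out argument of np.minimum so that no third array is allocated. This is a sketch with made-up values; the names loadedMap and minMap are taken from the question.

```python
import numpy as np

# Hypothetical small stand-ins for the large arrays in the question
loadedMap = np.array([[1, 8],
                      [5, 2]])
minMap = np.array([[3, 4],
                   [4, 9]])

# Write the element-wise minimum directly into minMap
np.minimum(loadedMap, minMap, out=minMap)

print(minMap)  # [[1 4]
               #  [4 2]]
```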

Sum ndarray values

Is there an easier way to get the sum of all values (assuming they are all numbers) in an ndarray :
import numpy as np
m = np.array([[1,2],[3,4]])
result = 0
(dim0,dim1) = m.shape
for i in range(dim0):
    for j in range(dim1):
        result += m[i,j]
print(result)
The above code seems somewhat verbose for a straightforward mathematical operation.
Thanks!
Just use numpy.sum():
result = np.sum(matrix)
or equivalently, the .sum() method of the array:
result = matrix.sum()
By default this sums over all elements in the array - if you want to sum over a particular axis, you should pass the axis argument as well, e.g. matrix.sum(0) to sum over the first axis.
As a side note your "matrix" is actually a numpy.ndarray, not a numpy.matrix - they are different classes that behave slightly differently, so it's best to avoid confusing the two.
Yes, just use the sum method:
result = m.sum()
For example,
In [17]: m = np.array([[1,2],[3,4]])
In [18]: m.sum()
Out[18]: 10
By the way, NumPy has a matrix class which is different than "regular" numpy arrays. So calling a regular ndarray matrix causes some cognitive dissonance. To help others understand your code, you may want to change the name matrix to something else.
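As a quick illustration of the axis argument mentioned above, here is a small sketch using the 2 x 2 array from the question.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

total = m.sum()           # sum of every element: 10
col_sums = m.sum(axis=0)  # sum down each column: [4 6]
row_sums = m.sum(axis=1)  # sum across each row:  [3 7]

print(total, col_sums, row_sums)
```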

Numpy Array index problems

I am having a small issue understanding indexing in Numpy arrays. I think a simplified example is best to get an idea of what I am trying to do.
So first I create an array of zeros of the size I want to fill:
x = range(0,10,2)
y = range(0,10,2)
a = zeros(len(x),len(y))
so that will give me an array of zeros that will be 5X5. Now, I want to fill the array with a rather complicated function that I can't get to work with grids. My problem is that I'd like to iterate as:
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        .........
        "do function and fill the array corresponding to (i,j)"
however, right now what I would like to be a[2,10] is a function of 2 and 10 but instead the index for a function of 2 and 10 would be a[1,4] or whatever.
Again, maybe this is elementary, I've gone over the docs and find myself at a loss.
EDIT:
In the end I vectorized as much as possible and wrote the simulation loops that I could not in Cython. Further I used Joblib to Parallelize the operation. I stored the results in a list because an array was not filling right when running in Parallel. I then used Itertools to split the list into individual results and Pandas to organize the results.
Thank you for all the help
Some tips for getting this done while keeping good performance:
- avoid Python `for` loops
- create a function that can deal with vectorized inputs
Example:
def f(xs, ys):
    return xs**2 + ys**2 + xs*ys
where you can pass xs and ys as arrays and the operation will be done element-wise:
xs = np.random.random((100,200))
ys = np.random.random((100,200))
f(xs,ys)
You should read more about numpy broadcasting to get a better understanding of how array operations work. This will help you design a function that handles arrays properly.
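As a rough sketch of how broadcasting could fill the 5 x 5 grid from the question without Python loops: the formula below is the same one used for f, written out in full so the snippet is self-contained, and it only applies if the real function can be expressed with element-wise operations (an assumption that may not hold for the asker's simulation).

```python
import numpy as np

def f(xs, ys):
    # Element-wise formula, valid for scalars and arrays alike
    return xs**2 + ys**2 + xs*ys

x = np.arange(0, 10, 2)   # [0 2 4 6 8]
y = np.arange(0, 10, 2)

# A column vector against a row vector broadcasts to a 5 x 5 grid,
# so a[r, c] holds f(x[r], y[c])
a = f(x[:, None], y[None, :])

print(a.shape)   # (5, 5)
print(a[1, 4])   # f(2, 8)
```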
First, you are missing some parentheses with zeros; the first argument should be a tuple:
a = zeros((len(x),len(y)))
Then, the corresponding indices for your table are i//2 and j//2 (integer division, so the index stays an integer):
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        # do function and fill the array corresponding to (i,j)
        a[i//2, j//2] = 1
But I second Saullo Castro, you should try to vectorize your computations.
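If a loop is unavoidable, one way to avoid deriving the index arithmetically is enumerate, which yields the array position alongside each value. This is a sketch only; slow_function is a hypothetical stand-in for the asker's calculation.

```python
import numpy as np

def slow_function(i, j):
    # Hypothetical placeholder for a calculation that cannot be vectorized
    return i * 10 + j

x = range(0, 10, 2)
y = range(0, 10, 2)
a = np.zeros((len(x), len(y)))

# row/col are array positions (0..4); i/j are the actual values (0, 2, ..., 8)
for row, i in enumerate(x):
    for col, j in enumerate(y):
        a[row, col] = slow_function(i, j)

print(a[1, 4])   # slow_function(2, 8)
```

This keeps working if the step or start of the ranges changes, whereas dividing the value by the step has to be updated by hand.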

In SciPy, fancy indexing for csr_matrices

I am new to Python, so forgive me ahead of time if this is an elementary question, but I have searched around and have not found a satisfying answer.
I am trying to do the following using NumPy and SciPy:
I,J = x[:,0], x[:,1] # x is a two column array of (r,c) pairs
V = ones(len(I))
G = sparse.coo_matrix((V,(I,J))) # G's dimensions are 1032570x1032570
G = G + transpose(G)
r,c = G.nonzero()
G[r,c] = 1
...
NotImplementedError: Fancy indexing in assignment not supported for csr matrices
Pretty much, I want all the nonzero values to equal 1 after adding the transpose, but I get the fancy indexing error messages.
Alternatively, if I could show that the matrix G is symmetric, adding the transpose would not be necessary.
Any insight into either approach would be very much appreciated.
In addition to doing something like G = G / G, you can operate on G.data.
So, in your case, doing either:
G.data = np.ones(G.nnz)
or
G.data[G.data != 0] = 1
will do what you want. This is more flexible, as it allows you to perform other types of filters (e.g. G.data[G.data > 0.9] = 1 or G.data = np.random.random(G.nnz))
The second option will only set the values to one if they have a nonzero value. During some calculations, you'll wind up with zero values that are "dense" (i.e. they're actually stored as a value in the sparse array). (You can remove these in-place with G.eliminate_zeros())
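The following sketch puts the G.data approach together on a tiny made-up matrix, and also shows one possible way to test symmetry by counting the entries where G and its transpose differ. The (row, col) pairs and the 3 x 3 shape are assumptions for illustration, not the asker's data.

```python
import numpy as np
from scipy import sparse

# Hypothetical (row, col) pairs
x = np.array([[0, 1],
              [1, 0],
              [2, 0]])
I, J = x[:, 0], x[:, 1]
V = np.ones(len(I))

G = sparse.coo_matrix((V, (I, J)), shape=(3, 3)).tocsr()

# G is symmetric exactly when no entry differs from its transpose
is_symmetric = (G != G.T).nnz == 0

# Symmetrize, then reset every stored value to 1 via the data array
S = G + G.T
S.data[:] = 1

print(is_symmetric)
print(S.toarray())
```

Here the entry (2, 0) has no counterpart at (0, 2), so G is not symmetric, while S is symmetric after the addition.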
