Trouble understanding the function of : in array indexing? Python

I have this line in the code:
next_J[v] = np.min(Q[v, :] + J)
Where essentially Q is a matrix of size n x n and J is a vector of size n. What does Q[v, :] mean?
I tried to code this out but still do not understand what exactly it does.

Q[v,:] is translated by the interpreter as Q.__getitem__((v, slice(None)))
Note the (,) tuple syntax.
For a 2d array, this means select row v. The slice selects all columns, and isn't actually needed in this context.
For a list this produces an error. alist[v] would work.
Q[v]+J may not work for lists, depending on what J is. For lists, + concatenates instead of adding elementwise. The class of an object is important in understanding code.
The use of : in python indexing is basic. So is its use in numpy indexing.
There's a lot more about using slicing with numpy arrays at https://numpy.org/doc/stable/user/basics.indexing.html#slicing-and-striding
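A minimal sketch of what the line in the question computes. The values of Q, J and v below are made up for illustration and are not from the question:

```python
import numpy as np

# Hypothetical 3 x 3 matrix and length-3 vector, chosen only for illustration.
Q = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
J = np.array([10.0, 0.0, -5.0])
v = 1

row_with_slice = Q[v, :]                         # row v, all columns
row_plain = Q[v]                                 # same row, trailing slice omitted
row_explicit = Q.__getitem__((v, slice(None)))   # the call the indexing syntax maps to

# Elementwise sum of row v and J, then the smallest entry of that sum.
next_J_v = np.min(Q[v, :] + J)
```

All three ways of indexing return the same row, so np.min simply picks the smallest entry of row v plus J.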

Related

List comprehension for np matrixes

I have two np.matrixes, one of which I'm trying to normalize. I know, in general, list comprehensions are faster than for loops, so I'm trying to convert my double for loop into a list expression.
# normalize the rows and columns of A by B
for i in range(1,q+1):
    for j in range(1,q+1):
        A[i-1,j-1] = A[i-1,j-1] / (B[i-1] / B[j-1])
This is what I have gotten so far:
A = np.asarray([A/(B[i-1]/B[j-1]) for i, j in zip(range(1,q+1), range(1,q+1))])
but I think I'm taking the wrong approach because I'm not seeing any significant time difference.
Any help would be appreciated.
First, if you really do mean np.matrix, stop using np.matrix. It has all sorts of nasty incompatibilities, and its role is obsolete now that @ for matrix multiplication exists. Even if you're stuck on a Python version without @, using the dot method with normal ndarrays is still better than dealing with np.matrix.
You shouldn't use any sort of Python-level iteration construct with NumPy arrays, whether for loops or list comprehensions, unless you're sure you have no better options. Assuming A is 2D and B is 1D with shapes (q, q) and (q,) respectively, what you should instead do for this case is
A *= B
A /= B[:, np.newaxis]
broadcasting the operation over A. This will allow NumPy to perform the iteration at C level directly over the arrays' underlying data buffers, without having to create wrapper objects and perform dynamic dispatch on every operation.
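As a rough check that the two broadcast statements match the double loop from the question, here is a small sketch. The size q and the random data are assumptions made only for this illustration:

```python
import numpy as np

q = 4                                      # small made-up size
rng = np.random.default_rng(0)
A = rng.uniform(1.0, 2.0, size=(q, q))     # made-up data, plain ndarray
B = rng.uniform(1.0, 2.0, size=q)

# Reference result: the double loop from the question, run on a copy.
expected = A.copy()
for i in range(q):
    for j in range(q):
        expected[i, j] = expected[i, j] / (B[i] / B[j])

# Broadcasting version: multiply column j by B[j], then divide row i by B[i].
result = A.copy()
result *= B
result /= B[:, np.newaxis]

same = np.allclose(result, expected)
```

B has shape (q,), so it lines up with the columns of A, while B[:, np.newaxis] has shape (q, 1) and lines up with the rows.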

Building a numpy matrix

I'm trying to build a matrix in numpy. The matrix dimensions should be (5001x7). Here is my code:
S=np.array([.0788,.0455,.0222,.0042,.0035,.0029,.0007])
#This is vector S, comprised of 7 scalars.
lamb=list(range(0,5001))
#This is a list of possible values for lambda, a parameter in my data.
M = np.empty([5001,7], order='C')
#This is the empty matrix which is to be filled in the iterations below.
for i in S:
    for j in lamb:
        np.append(M,((S[i]**2)/(lamb[j]+S[i]**2)))
The problem I'm having is that M remains a matrix of zero vectors.
Important details:
1) I've assigned the final line as:
M=np.append(M,((S[i]**2)/(lamb[j]+S[i]**2)))
I then get an array of values of length 70,014 in a 1d array. I'm not really sure what to make of it.
2) I've already tried switching the dtype parameter between 'float' and 'int' for matrix M.
3) I receive this warning when I run the code:
VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
app.launch_new_instance()
4) I'm working in Python 3.4
I really appreciate your help. Thank you!
1) append adds to the end of the array, which is why your final array has 5001x7x2=70014 elements. Only the first half is zeros. It flattens the array to 1D because you didn't specify an axis to append.
2) A much more "numpy" way to do this whole process is broadcasting
S=np.array([.0788,.0455,.0222,.0042,.0035,.0029,.0007])
lamb=np.arange(0,5001)
M=(S[:,None]**2)/(lamb[None,:]+S[:,None]**2)
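Which operand receives the new axis decides the orientation of the result. The sketch below (variable names M_7_by_5001 and M_5001_by_7 are mine, for illustration) shows that S[:, None] with lamb[None, :] gives shape (7, 5001), while swapping the axes gives the (5001, 7) layout asked for in the question:

```python
import numpy as np

S = np.array([.0788, .0455, .0222, .0042, .0035, .0029, .0007])
lamb = np.arange(0, 5001)

# S varies along rows, lamb along columns: shape (7, 5001).
M_7_by_5001 = (S[:, None]**2) / (lamb[None, :] + S[:, None]**2)

# lamb varies along rows, S along columns: shape (5001, 7).
M_5001_by_7 = (S[None, :]**2) / (lamb[:, None] + S[None, :]**2)
```

The two results are transposes of each other, so either form can be used with a final .T if the other orientation is wanted.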
np.append makes a copy of the array and appends values to the end of the copy (making the array larger each time), whereas I think you want to modify M in place:
for i in range(len(S)):
    for j in range(len(lamb)):
        M[j][i] = ((S[i]**2)/(lamb[j]+S[i]**2))
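The copy-versus-in-place difference can be seen on a tiny array. This is only a sketch with a made-up 2 x 3 array, not the data from the question:

```python
import numpy as np

M = np.zeros((2, 3))

# np.append returns a NEW array; with no axis given, the input is flattened first.
out = np.append(M, 1.5)
m_untouched = not M.any()      # M still holds only zeros at this point
out_shape = out.shape          # 6 original elements plus the appended one

# Index assignment, by contrast, modifies M in place.
M[1, 2] = 1.5
```

This is why calling np.append without using its return value leaves M unchanged, and why assigning the return value back to M produces a growing 1-D array.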

MATLAB "any" conditional deletion translation to Python

I'm having trouble understanding what B = A(~any(A < threshold, 2), :); (in MATLAB) does given array A with dimensions N x 3.
Ultimately, I am trying to implement a function to perform the same operation in Python (so far, I have something like B = A[not any(A[:,1] < threshold), :], which I know to be incorrect), and I was wondering what the numpy equivalent to such an operation would be.
Thank you!
Not much of a difference really. In MATLAB, you are performing ANY along the rows with any(...,2). In NumPy, you have axis to denote those dimensions and for a 2D array, it would be np.any(...,axis=1).
Thus, the NumPy equivalent implementation would be -
import numpy as np
B = A[~np.any(A < threshold,axis=1),:]
In NumPy terminology this is boolean (mask) indexing along the first axis. Since we are selecting rows along the first axis, we can drop the all-elements selection along the remaining axes. So, it would simplify to -
B = A[~np.any(A < threshold,axis=1)]
Finally, we can use the method ndarray.any and pass the axis positionally instead of by name to shorten the code further, like so -
B = A[~(A < threshold).any(1)]
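A small worked example of the row filter, using a made-up 4 x 3 array and threshold (neither is from the question):

```python
import numpy as np

# Hypothetical data: rows 1 and 2 each contain a value below the threshold.
A = np.array([[5, 6, 7],
              [1, 9, 9],
              [8, 8, 2],
              [4, 4, 4]])
threshold = 3

# True for every row that has at least one entry below the threshold.
mask = (A < threshold).any(axis=1)

# Keep only the rows where the mask is False.
B = A[~mask]
```

Note that the elementwise ~ operator is needed here. Python's built-in not and any do not operate row by row on a 2D array, which is why the attempt in the question does not work.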

How to find the index of an array within an array

I have created an array in the way shown below; which represents 3 pairs of co-ordinates. My issue is I don't seem to be able to find the index of a particular pair of co-ordinates within the array.
import numpy as np
R = np.random.uniform(size=(3,2))
R
Out[5]:
array([[ 0.57150157, 0.46611662],
[ 0.37897719, 0.77653461],
[ 0.73994281, 0.7816987 ]])
R.index([ 0.57150157, 0.46611662])
The following is returned:
AttributeError: 'numpy.ndarray' object has no attribute 'index'
The reason I'm trying to do this is so I can extend a list, with the index of a co-ordinate pair, within a for-loop.
e.g.
v = []
for A in R:
    v.append(R.index(A))
I'm just not sure why the index function isn't working, and can't seem to find a way around it.
I'm new to programming so excuse me if this seems like nonsense.
index() is a method of the type list, not of numpy.ndarray. Try:
R.tolist().index(x)
Where x is, for example, the third entry of R as a plain list (R[2].tolist()). This first converts your array into a list of lists, then you can use index ;)
You can achieve the desired result by converting your inner arrays (the coordinates) to tuples.
R = map(lambda x: (x), R);
And then you can find the index of a tuple using R.index((number1, number2));
Hope this helps!
[Edit] To explain what's going on in the code above, the map function goes through (iterates) the items in the array R, and for each one replaces it with the return result of the lambda function.
So it's equivalent to something along these lines:
def someFunction(x):
    return (x)
for x in range(0, len(R)):
    R[x] = someFunction(R[x])
So it takes each item and does something to it, putting it back in the list. I realized that it may not actually do what I thought it did (returning (x) doesn't seem to change a regular array to a tuple), but it does help your situation because I think by iterating through it python might create a regular array out of the numpy array.
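To actually convert to a list of tuples, the following code should work (on Python 3 the list() call is needed because map returns an iterator, which has no index method)
R = list(map(tuple, R))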
(credits to https://stackoverflow.com/a/10016379/2612012)
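A short sketch of the tuple approach, assuming Python 3 and using the array printed in the question. The name rows is mine, chosen so that R itself is not overwritten:

```python
import numpy as np

R = np.array([[0.57150157, 0.46611662],
              [0.37897719, 0.77653461],
              [0.73994281, 0.7816987]])

# Convert each row to a tuple and collect the tuples in a list.
rows = list(map(tuple, R))

# list.index compares tuples element by element.
idx_from_row = rows.index(tuple(R[2]))
idx_from_literal = rows.index((0.37897719, 0.77653461))
```

This relies on exact floating-point equality, so it only finds a pair whose values match a stored row exactly.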
Numpy arrays don't have an index method, for a number of reasons. However, I think you're wanting something different.
For example, the code you mentioned:
v = []
for A in R:
    v.append(R.index(A))
Would just be (assuming R has unique rows, for the moment):
v = list(range(len(R)))
However, I think you might be wanting the built-in function enumerate. E.g.
for i, row in enumerate(R):
    # Presumably you're doing something else with "row"...
    v.append(i)
For example, let's say we wanted to know the indices where the sum of each row was greater than 1.
One way to do this would be:
v = []
for i, row in enumerate(R):
    if sum(row) > 1:
        v.append(i)
However, numpy also provides other ways of doing this, if you're working with numpy arrays. For example, the equivalent to the code above would be:
v, = np.where(R.sum(axis=1) > 1)
If you're just getting started with python, focus on understanding the first example before worrying too much about the best way to do things with numpy. Just be aware that numpy arrays behave very differently than lists.
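If the goal really is to look up the position of a given co-ordinate pair, one possible NumPy-only sketch is to compare every row against the target. This is a swapped-in technique, not something proposed in the answers above; the name target and the use of np.isclose (to tolerate rounding in the printed values) are my assumptions:

```python
import numpy as np

R = np.array([[0.57150157, 0.46611662],
              [0.37897719, 0.77653461],
              [0.73994281, 0.7816987]])
target = np.array([0.37897719, 0.77653461])   # pair to look for

# A row matches when both of its entries are close to the target's entries.
matches = np.isclose(R, target).all(axis=1)
indices = np.flatnonzero(matches)              # positions of matching rows

# The row-sum example from above, for comparison.
v, = np.where(R.sum(axis=1) > 1)
```

indices is an array because more than one row could match. For this data every row sum exceeds 1, so v contains all three row indices.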

Numpy Array index problems

I am having a small issue understanding indexing in Numpy arrays. I think a simplified example is best to get an idea of what I am trying to do.
So first I create an array of zeros of the size I want to fill:
x = range(0,10,2)
y = range(0,10,2)
a = zeros(len(x),len(y))
so that will give me an array of zeros that will be 5X5. Now, I want to fill the array with a rather complicated function that I can't get to work with grids. My problem is that I'd like to iterate as:
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        .........
        "do function and fill the array corresponding to (i,j)"
however, right now what I would like to be a[2,10] is a function of 2 and 10 but instead the index for a function of 2 and 10 would be a[1,4] or whatever.
Again, maybe this is elementary, I've gone over the docs and find myself at a loss.
EDIT:
In the end I vectorized as much as possible and wrote the simulation loops that I could not in Cython. Further I used Joblib to Parallelize the operation. I stored the results in a list because an array was not filling right when running in Parallel. I then used Itertools to split the list into individual results and Pandas to organize the results.
Thank you for all the help
Some tips for getting this done with good performance:
- avoid Python `for` loops
- create a function that can deal with vectorized inputs
Example:
def f(xs, ys):
    return xs**2 + ys**2 + xs*ys
where you can pass xs and ys as arrays and the operation will be done element-wise:
xs = np.random.random((100,200))
ys = np.random.random((100,200))
f(xs,ys)
You should read more about numpy broadcasting to get a better understanding of how array operations work. This will help you design a function that handles arrays properly.
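Applied to the 5 x 5 grid from the question, a vectorized function can fill the whole array in one call. This is a sketch only: the body of f is the example function from this answer, standing in for the asker's real, more complicated function:

```python
import numpy as np

def f(xs, ys):
    # Example function; works elementwise on arrays of compatible shapes.
    return xs**2 + ys**2 + xs*ys

x = np.arange(0, 10, 2)   # 0, 2, 4, 6, 8
y = np.arange(0, 10, 2)

# A (5, 1) column against a (1, 5) row broadcasts to a (5, 5) grid,
# so a[i, j] holds f(x[i], y[j]).
a = f(x[:, np.newaxis], y[np.newaxis, :])
```

Here a[1, 4] is the value of the function at x = 2 and y = 8, with no explicit loop and no manual index arithmetic.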
First, you are missing a pair of parentheses in the call to zeros; the first argument should be a tuple:
a = zeros((len(x),len(y)))
Then, the corresponding indices for your table are i//2 and j//2 (integer division):
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        # do function and fill the array corresponding to (i,j)
        a[i//2, j//2] = 1
But I second Saullo Castro, you should try to vectorize your computations.
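If a loop is unavoidable, enumerate gives the array position and the grid value together, so no division is needed. A sketch assuming Python 3, where g is a hypothetical placeholder for the real function:

```python
import numpy as np

x = range(0, 10, 2)
y = range(0, 10, 2)
a = np.zeros((len(x), len(y)))   # shape passed as a tuple

def g(xv, yv):
    # Hypothetical stand-in for the complicated function in the question.
    return xv + 10 * yv

# i, j are array positions; xv, yv are the grid values at those positions.
for i, xv in enumerate(x):
    for j, yv in enumerate(y):
        a[i, j] = g(xv, yv)
```

This also keeps working if the grid spacing changes or is not uniform, which the i//2 mapping does not.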