How to find indices of matching elements in arrays? - python

I have two vectors. Vector A has shape (1298, 1); vector B varies inside a for loop but is always a column vector. I am trying to use numpy.where to find the indices in A of the elements in B. Currently I have a for loop combing through vector B element-wise and using numpy.isclose, but I was wondering if anyone knows a quicker function and/or how to do this without a nested for loop. It works, but very slowly.
The for loop looks like this:
sphere_indices = []
for k in range(len(A)):
    for j in range(len(B)):
        if np.isclose(B[j, 0], A[k, 0]):
            sphere_indices.append(k)

There was never any reason to iterate through all 1298 elements of vector A. To use numpy.where and numpy.isclose, I just needed to pass the elements of B one at a time so that NumPy can broadcast the comparison across the whole of A. The following code runs much faster. Any further improvements are always welcome.
for j in range(len(index)):
    sphere_indices1 = np.where(np.isclose(sphere_index[:, 0], index[j, 0]))
    sphere_indices.append(sphere_indices1[0])
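One possible further step, sketched here under assumptions, is to drop the remaining loop by comparing all of A against all of B in a single broadcast call. The arrays A and B below are small made-up stand-ins for the question's column vectors, not the real data.

```python
import numpy as np

# Hypothetical sample data standing in for the question's column vectors.
A = np.array([[0.1], [0.2], [0.3], [0.4], [0.2]])
B = np.array([[0.2], [0.4]])

# A has shape (len(A), 1) and B.T has shape (1, len(B)), so the comparison
# broadcasts to a boolean array of shape (len(A), len(B)).
matches = np.isclose(A, B.T)

# Indices of A that are close to at least one element of B.
sphere_indices = np.flatnonzero(matches.any(axis=1))

# Indices of A grouped per element of B (one array per column of matches).
per_b_indices = [np.flatnonzero(col) for col in matches.T]
```

The intermediate boolean array holds len(A) * len(B) entries, so this trades memory for speed; for a very large B the one-element-at-a-time loop may still be the better fit.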

Related

List comprehension for np matrixes

I have two np.matrixes, one of which I'm trying to normalize. I know that, in general, list comprehensions are faster than for loops, so I'm trying to convert my double for loop into a list comprehension.
# normalize the rows and columns of A by B
for i in range(1,q+1):
    for j in range(1,q+1):
        A[i-1,j-1] = A[i-1,j-1] / (B[i-1] / B[j-1])
This is what I have gotten so far:
A = np.asarray([A/(B[i-1]/B[j-1]) for i, j in zip(range(1,q+1), range(1,q+1))])
but I think I'm taking the wrong approach because I'm not seeing any significant time difference.
Any help would be appreciated.
First, if you really do mean np.matrix, stop using np.matrix. It has all sorts of nasty incompatibilities, and its role is obsolete now that @ for matrix multiplication exists. Even if you're stuck on a Python version without @, using the dot method with normal ndarrays is still better than dealing with np.matrix.
You shouldn't use any sort of Python-level iteration construct with NumPy arrays, whether for loops or list comprehensions, unless you're sure you have no better options. Assuming A is 2D and B is 1D with shapes (q, q) and (q,) respectively, what you should instead do for this case is
A *= B
A /= B[:, np.newaxis]
broadcasting the operation over A. This will allow NumPy to perform the iteration at C level directly over the arrays' underlying data buffers, without having to create wrapper objects and perform dynamic dispatch on every operation.
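As a quick sanity check, the two in-place broadcast operations can be compared against the original double loop. This is a minimal sketch on random data; q, the random generator seed, and the array contents are assumptions chosen only for illustration.

```python
import numpy as np

q = 4
rng = np.random.default_rng(0)
A = rng.random((q, q))        # 2D float array, shape (q, q)
B = rng.random(q) + 1.0       # 1D array, shape (q,); the offset keeps B away from zero

# Reference result from the element-by-element loop in the question.
expected = A.copy()
for i in range(q):
    for j in range(q):
        expected[i, j] = A[i, j] / (B[i] / B[j])

# Broadcast version: multiply column j by B[j], then divide row i by B[i].
result = A.copy()
result *= B
result /= B[:, np.newaxis]
```

Note that the in-place division needs A to be a floating-point array; with an integer array, NumPy refuses to store the float result in place.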

trying to sum two arrays

I'm trying to code something like this:
where x and y are two different numpy arrays and the j is an index for the array. I don't know the length of the array because it will be entered by the user and I cannot use loops to code this.
My main problem is finding a way to move between indexes, since I would need to go from
x[2]-x[1] ... x[3]-x[2]
and so on.
I'm stumped but I would appreciate any clues.
A numpy-ic solution would be:
np.square(np.diff(x)).sum() + np.square(np.diff(y)).sum()
A list comprehension approach would be:
sum([(x[k]-x[k-1])**2+(y[k]-y[k-1])**2 for k in range(1,len(x))])
This will give you the result you want, even if your data is stored as a list.
x[2]-x[1] ... x[3]-x[2] can be generalized to:
x[[1,2,3,...]]-x[[0,1,2,...]]
x[1:]-x[:-1] # ie. (1 to the end)-(0 to almost the end)
numpy can take the difference between two arrays of the same shape
In list terms this would be
[i-j for i,j in zip(x[1:], x[:-1])]
np.diff does essentially this, a[slice1]-a[slice2], where the slices are as above.
The full answer squares, sums and takes square roots.
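To see that the slice form, np.diff, and the list comprehension agree, here is a small check on made-up data. The sample values of x and y are assumptions, and the snippet only reproduces the sum-of-squared-differences expression given in the answers above.

```python
import numpy as np

# Hypothetical sample data.
x = np.array([0.0, 1.0, 3.0, 6.0])
y = np.array([0.0, 2.0, 2.0, 5.0])

# Slice form: (element 1 to the end) minus (element 0 to almost the end).
dx = x[1:] - x[:-1]
dy = y[1:] - y[:-1]

# np.diff computes the same consecutive differences.
same_as_diff = np.array_equal(dx, np.diff(x)) and np.array_equal(dy, np.diff(y))

# Sum of squared differences, NumPy style and list-comprehension style.
numpy_sum = np.square(np.diff(x)).sum() + np.square(np.diff(y)).sum()
loop_sum = sum([(x[k] - x[k-1])**2 + (y[k] - y[k-1])**2 for k in range(1, len(x))])
```

Neither version needs to know the array length in advance, since the slices and np.diff adapt to whatever length the user supplies.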

Array operations using multiple indices of same array

I am very new to Python, and I am trying to get used to performing Python's array operations rather than looping through arrays. Below is an example of the kind of looping operation I am doing, but am unable to work out a suitable pure array operation that does not rely on loops:
import numpy as np

def f(arg1, arg2):
    # an arbitrary function
    ...

def myFunction(a1DNumpyArray):
    A = a1DNumpyArray
    # Create a square array with each dimension the size of the argument array.
    B = np.zeros((A.size, A.size))
    # Function f is a function of two elements of the 1D array. For each
    # element, i, I want to perform the function on it and every element
    # before it, and store the result in the square array, multiplied by
    # the difference between the ith and (i-1)th element.
    for i in range(A.size):
        B[i,:i] = f(A[i], A[:i])*(A[i]-A[i-1])
    # Sum through j and return full sums as 1D array.
    return np.sum(B, axis=0)
In short, I am integrating a function which takes two elements of the same array as arguments, returning an array of results of the integral.
Is there a more compact way to do this, without using loops?
The use of an arbitrary f function, and this [i, :i] business, complicates bypassing the loop.
Most of the fast compiled numpy operations work on the whole array, or whole rows and/or columns, and effectively do so in parallel. Loops that are inherently sequential (value from one loop depends on the previous) don't fit well. And different size lists or arrays in each loop are also a good indicator that 'vectorizing' will be difficult.
for i in range(A.size):
    B[i,:i] = f(A[i], A[:i])*(A[i]-A[i-1])
With a sample A and known f (as simple as arg1*arg2), I'd generate a B array, and look for patterns that treat B as a whole. At first glance it looks like your B is a lower triangle. There are functions to help index those. But that final sum might change the picture.
Sometimes I tackle these problems with a bottom up approach, trying to remove inner loops first. But in this case, I think some sort of big-picture approach is needed.
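As one illustration of the big-picture idea, the sketch below treats B as a strictly lower triangle and builds it in one go. It assumes f broadcasts over NumPy arrays; the f used here (arg1*arg2) is only the simple stand-in mentioned above, and the function names and sample A are hypothetical.

```python
import numpy as np

def f(arg1, arg2):
    # Hypothetical stand-in; any f that broadcasts over arrays would do.
    return arg1 * arg2

def my_function_loop(A):
    # The loop from the question, kept as a reference.
    B = np.zeros((A.size, A.size))
    for i in range(A.size):
        B[i, :i] = f(A[i], A[:i]) * (A[i] - A[i - 1])
    return np.sum(B, axis=0)

def my_function_vectorized(A):
    # Step sizes A[i] - A[i-1]; row 0 has no entries below the diagonal,
    # so its step is never used and can be set to zero.
    dA = np.concatenate(([0.0], np.diff(A)))
    # F[i, j] = f(A[i], A[j]) for every pair (i, j).
    F = f(A[:, np.newaxis], A[np.newaxis, :])
    # Keep only j < i (strictly below the diagonal), scaled by the row's step.
    B = np.tril(F * dA[:, np.newaxis], k=-1)
    return B.sum(axis=0)

A = np.linspace(0.0, 1.0, 6)
loop_result = my_function_loop(A)
vectorized_result = my_function_vectorized(A)
```

This evaluates f on the full square and discards the upper half, so it does roughly twice the function evaluations of the loop in exchange for running them at C level. Whether that is a net win depends on how expensive f is.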

Numpy Array index problems

I am having a small issue understanding indexing in Numpy arrays. I think a simplified example is best to get an idea of what I am trying to do.
So first I create an array of zeros of the size I want to fill:
x = range(0,10,2)
y = range(0,10,2)
a = zeros(len(x),len(y))
so that will give me an array of zeros that will be 5X5. Now, I want to fill the array with a rather complicated function that I can't get to work with grids. My problem is that I'd like to iterate as:
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        .........
        "do function and fill the array corresponding to (i,j)"
However, right now what I would like to address as a[2,10], the value of the function at 2 and 10, instead ends up at an index like a[1,4] or whatever.
Again, maybe this is elementary, I've gone over the docs and find myself at a loss.
EDIT:
In the end I vectorized as much as possible and wrote the simulation loops that I could not vectorize in Cython. I also used Joblib to parallelize the operation. I stored the results in a list because an array was not filling correctly when running in parallel. I then used itertools to split the list into individual results and pandas to organize the results.
Thank you for all the help.
Some tips to get this done while keeping good performance:
- avoid Python `for` loops
- create a function that can deal with vectorized inputs
Example:
def f(xs, ys):
    return xs**2 + ys**2 + xs*ys
where you can pass xs and ys as arrays and the operation will be done element-wise:
xs = np.random.random((100,200))
ys = np.random.random((100,200))
f(xs,ys)
You should read more about NumPy broadcasting to get a better understanding of how array operations work. This will help you design a function that handles arrays properly.
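Applied to the 5x5 case from the question, broadcasting could look like the sketch below. The function f is the example from this answer; using it in place of the asker's "rather complicated function" is an assumption for illustration.

```python
import numpy as np

def f(xs, ys):
    # Example function from the answer above; works element-wise on arrays.
    return xs**2 + ys**2 + xs*ys

x = np.arange(0, 10, 2)   # values 0, 2, 4, 6, 8
y = np.arange(0, 10, 2)

# A column of x values against a row of y values broadcasts to shape (5, 5),
# so a[i, j] holds f(x[i], y[j]) without any explicit index bookkeeping.
a = f(x[:, np.newaxis], y[np.newaxis, :])
```

Here a[1, 4] is the function evaluated at x = 2 and y = 8, so the mapping between values and array positions comes for free from the position of each value in x and y.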
First, you are missing some parentheses in the call to zeros; the first argument should be a tuple:
a = zeros((len(x),len(y)))
Then, the corresponding indices for your table are i//2 and j//2 (integer division):
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        # do function and fill the array corresponding to (i,j)
        a[i//2, j//2] = 1
But I second Saullo Castro, you should try to vectorize your computations.
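If the loop has to stay, one way to avoid computing indices from values by hand is enumerate, which yields the array position and the value together. This is a sketch in Python 3 syntax (range instead of xrange); g is a hypothetical placeholder for the real function.

```python
import numpy as np

def g(i, j):
    # Hypothetical placeholder for the "rather complicated function".
    return i + 10 * j

x = range(0, 10, 2)
y = range(0, 10, 2)
a = np.zeros((len(x), len(y)))

# row/col are array positions (0..4); i/j are the actual values (0, 2, ..., 8).
for row, i in enumerate(x):
    for col, j in enumerate(y):
        a[row, col] = g(i, j)
```

Unlike dividing by the step, this keeps working when the values are not evenly spaced.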

Create new array with array elements/amounts set by two other arrays in Python

I have two arrays in Python (numpy arrays):
a=array([5,7,3,5])
b=array([1,2,3,4])
and I wish to create a third array in which each element of b is repeated the number of times given by the corresponding element of a, as:
c=array([1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,4,4,4,4,4])
Is there a fast, numPythonic way of doing this with a minimum of looping? I need to use this operation thousands of times in a loop over a fairly large array, so I would like to have it be as fast as possible.
Cheers,
Mike
I believe repeat is what you want:
c = repeat(b, a)
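For reference, here is the same call written out with an explicit import, using the arrays from the question.

```python
import numpy as np

a = np.array([5, 7, 3, 5])   # repeat counts
b = np.array([1, 2, 3, 4])   # values to repeat

# Each b[k] is repeated a[k] times, in order.
c = np.repeat(b, a)
```

The length of c equals a.sum(), and the counts in a must be non-negative integers.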
