I'm doing the following in Python:
tmp = np.empty_like(J, dtype=X.dtype)
for idx, (ii, jj) in enumerate(zip(I, J)):
    tmp[idx] = sum((X[ii] - X[jj])**2)  # note: Python's built-in sum, not np.sum
where X is a 50000 x 128 numpy array
and I and J are integer numpy arrays of size (763690,) (columns and rows of a sparse matrix)
Now the problem is that the above operation takes about 30 seconds to complete, and I don't see what I need to do to speed it up. I know it can be done faster, since I have similar code in Matlab where it barely takes any time.
What am I doing wrong here?
Is it something about memory stride access? Not using built-in functions? Something else? Should I parallelize or vectorize it?
(I know the title is terrible, but I couldn't figure out a good way to write it, suggestions are welcome!)
We can do this with:
np.sum((X[I]-X[J])**2, axis=1)
Here we first use fancy indexing to generate a 763 690×128 matrix X[I], where for each item in I we take the corresponding row of X. The same happens for X[J]. We then subtract the two and obtain another 763 690×128 matrix. We square the items element-wise and then sum along axis 1 (across the 128 columns), so for every row we obtain a single value. The result is thus a flat array with 763 690 elements.
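A minimal sketch with toy data (the shapes here are made up for illustration) showing the same fancy-indexing pattern:

import numpy as np

X = np.arange(12.0).reshape(4, 3)  # 4 points, 3 features each
I = np.array([0, 1, 2])
J = np.array([3, 3, 3])

# X[I] and X[J] both have shape (3, 3); no Python loop needed
d2 = np.sum((X[I] - X[J])**2, axis=1)  # shape (3,): one squared distance per (I, J) pair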
Willem's method worked wonderfully!
np.sum((X[I]-X[J])**2, axis=1)
It took the operation from ~30 s down to ~0.6 s, thank you very much :)
I've been given the challenge of coding np.argmin without numpy.
I've been thinking hard about it for about a day.
I have no idea whether I should use a for statement, an if statement, a while statement, or another function.
First question!
First, I thought about distinguishing the cases with an inequality sign, using an if statement:
a[0,0] - a[0,1] > 0
a[0,0] - a[0,1] < 0
I tried to write the code by splitting it into these two cases, but there were too many cases, so I stopped.
Couldn't it be done with an if statement?
Second question!
We know that the argmin method returns, as an array value, the position (index) of the minimum element.
The screen capture shows a two-dimensional list I entered arbitrarily, converted to an ndarray.
Because the task is limited to receiving a two-dimensional list as input, I thought that the directions of axis=0 and axis=1 are fixed. Then axis=0 fixes a column and compares across rows; is it okay to think that axis=1 fixes a row and compares across columns?
Third question!
After receiving an arbitrary two-dimensional list, I thought the resulting ndarray would be in the form of an i×j matrix. Then, if you use a.shape, the output is (i, j). How can we extract i and j here?
It's really hard; I've been thinking about it all day long.
Any hints would be appreciated.
def argmin(a):
    return min(range(len(a)), key=lambda x: a[x])

def argmax(a):
    return max(range(len(a)), key=lambda x: a[x])

This code is for a 1D list.
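A minimal sketch (my own extension, not part of the original answer) of how the same min-with-key idea could handle a two-dimensional list with an axis argument; it also shows the shape unpacking from the third question, since rows, cols = len(a), len(a[0]) plays the role of i, j = a.shape:

def argmin2d(a, axis=0):
    # a is a 2D list; unpack its shape without numpy
    rows, cols = len(a), len(a[0])
    if axis == 0:
        # fix each column and find the row index of its smallest value
        return [min(range(rows), key=lambda r: a[r][c]) for c in range(cols)]
    else:
        # fix each row and find the column index of its smallest value
        return [min(range(cols), key=lambda c: a[r][c]) for r in range(rows)]

For example, argmin2d([[3, 1], [0, 2]], axis=0) gives [1, 0], matching np.argmin on the same input.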
I have two vectors: vector A is (1298, 1), and vector B varies inside a for loop but is always a column vector. I am trying to use numpy.where to find the A-indices of the elements in B. Currently I have a for loop combing through vector B element-wise and using numpy.isclose, but I was wondering if anyone knows a quicker function and/or how to do this without a nested for loop? It works, but very slowly.
The for loop looks like this:
sphere_indices = []
for k in range(len(A)):
    for j in range(len(B)):
        if np.isclose(B[j, 0], A[k, 0]):
            sphere_indices.append(k)
There was never any reason to iterate through all 1298 elements of vector A. In order to use numpy.where and numpy.isclose, I just needed to use the elements of B one at a time so numpy can broadcast properly. The following code runs much faster. Any further improvements are always welcome.
sphere_indices = []
for j in range(len(index)):
    sphere_indices1 = np.where(np.isclose(sphere_index[:, 0], index[j, 0]))
    sphere_indices.append(sphere_indices1[0])
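If you want to drop the remaining loop as well, here is a minimal sketch (assuming A and B are the (n, 1) column vectors from the question) that broadcasts one vector against the other in a single call:

import numpy as np

# (len(B), 1) against (1, len(A)) broadcasts to a (len(B), len(A)) boolean table
matches = np.isclose(B[:, 0][:, None], A[:, 0][None, :])
# for each element of B, collect the A-indices where it matched
sphere_indices = [np.nonzero(row)[0] for row in matches]

Note that this builds the whole len(B) × len(A) boolean array at once, so it trades memory for speed.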
I'm trying to code something like this:
sqrt(sum((x[j] - x[j-1])**2 + (y[j] - y[j-1])**2 for all j))
where x and y are two different numpy arrays and j is an index into the arrays. I don't know the length of the arrays because it will be entered by the user, and I cannot use loops to code this.
My main problem is finding a way to move between indices, since I would need to go from
x[2]-x[1] ... x[3]-x[2]
and so on.
I'm stumped, but I would appreciate any clues.
A numpy-ic solution would be:
np.square(np.diff(x)).sum() + np.square(np.diff(y)).sum()
A list comprehension approach would be:
sum([(x[k]-x[k-1])**2+(y[k]-y[k-1])**2 for k in range(1,len(x))])
will give you the result you want, even if your data appears as a list.
x[2]-x[1] ... x[3]-x[2] can be generalized to:
x[[1,2,3,...]] - x[[0,1,2,...]]
x[1:] - x[:-1]  # i.e. (1 to the end) - (0 to almost the end)
numpy can take the difference between two arrays of the same shape
In list terms this would be
[i-j for i,j in zip(x[1:], x[:-1])]
np.diff does essentially this, a[slice1]-a[slice2], where the slices are as above.
The full answer squares, sums and squareroots.
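Putting it together, a minimal sketch of the full expression, reading "squares, sums and squareroots" as: square the differences, sum them all, then take one square root of the total:

import numpy as np

# step differences along each coordinate, squared and summed, then rooted
result = np.sqrt(np.square(np.diff(x)).sum() + np.square(np.diff(y)).sum())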
I am having a small issue understanding indexing in Numpy arrays. I think a simplified example is best to get an idea of what I am trying to do.
So first I create an array of zeros of the size I want to fill:
x = range(0,10,2)
y = range(0,10,2)
a = zeros(len(x),len(y))
so that will give me a 5x5 array of zeros. Now, I want to fill the array with a rather complicated function that I can't get to work with grids. My problem is that I'd like to iterate as:
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        ...  # do function and fill the array corresponding to (i, j)
however, right now, I would like the value of the function at (2, 10) to be stored at a[2,10], but instead a function of 2 and 10 gets stored at a[1,4] or whatever.
Again, maybe this is elementary, I've gone over the docs and find myself at a loss.
EDIT:
In the end I vectorized as much as possible and wrote the simulation loops that I could not vectorize in Cython. I also used joblib to parallelize the operation. I stored the results in a list, because an array was not filling correctly when running in parallel. I then used itertools to split the list into individual results and pandas to organize them.
Thank you for all the help
Some tips to get things done while keeping good performance:
- avoid Python `for` loops
- write a function that can deal with vectorized inputs
Example:
def f(xs, ys):
    return xs**2 + ys**2 + xs*ys
where you can pass xs and ys as arrays and the operation will be done element-wise:
xs = np.random.random((100,200))
ys = np.random.random((100,200))
f(xs,ys)
You should read more about numpy broadcasting to get a better understanding of how array operations work. This will help you design functions that handle arrays properly.
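For instance, a minimal sketch (my own illustration) of how broadcasting combines arrays of different shapes element-wise:

import numpy as np

col = np.arange(3).reshape(3, 1)  # shape (3, 1)
row = np.arange(4).reshape(1, 4)  # shape (1, 4)
table = col * 10 + row            # broadcasts to shape (3, 4)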
First, you are missing some parentheses in zeros; the first argument should be a tuple:
a = zeros((len(x),len(y)))
Then, the corresponding indices for your table are i/2 and j/2:
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        # do function and fill the array corresponding to (i, j)
        a[i/2, j/2] = 1
But I second Saullo Castro: you should try to vectorize your computations.
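For this particular grid fill, a minimal sketch of what a vectorized version could look like (using the example element-wise function from the other answer; your real function may differ):

import numpy as np

i = np.arange(0, 10, 2)
j = np.arange(0, 10, 2)
ii, jj = np.meshgrid(i, j, indexing='ij')  # all (i, j) coordinate pairs at once
a = ii**2 + jj**2 + ii*jj                  # fills the whole 5x5 table in one shot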
Another stupid question from my side ;) I have some issues with the following snippet, with len(x) = len(y) = 7,700,000:
from numpy import *

for k in range(len(x)):
    if x[k] == xmax:
        xind = -1
    else:
        xind = int(floor((x[k] - xmin) / xdelta))
    if y[k] == ymax:
        yind = -1
    else:
        yind = int(floor((y[k] - ymin) / ydelta))
    arr = append(arr, grid[xind, yind])
All variables are floats or integers, except arr and grid: arr is a 1D array and grid is a 2D array.
My problem is that it takes a long time to run through the loop (several minutes). Can anyone explain to me why this takes so long? Does anyone have a suggestion? Even if I replace range() with arange(), I only save a few seconds.
Thanks.
1st EDIT
Sorry, I forgot to mention that I'm importing numpy.
2nd EDIT
I have some points in a 2D grid, and each cell of the grid has a value stored. I have to find out which cell each point falls in and apply that cell's value to a new array. That's my problem and my idea.
P.S.: look at the picture if you want to understand it better; the values of the cells are represented with different colors.
How about something like:
import numpy as np

xind = np.floor((x - xmin) / xdelta).astype(int)
yind = np.floor((y - ymin) / ydelta).astype(int)
# use a boolean mask so that *every* element equal to the max gets -1,
# matching the original loop (np.argmax would only flag the first one)
xind[x == xmax] = -1
yind[y == ymax] = -1
arr = grid[xind, yind]
Note: if you're using numpy, don't treat the arrays like Python lists if you want to do things efficiently.
for x_item, y_item in zip(x, y):
    # do stuff.
There's also itertools.izip if you don't want to generate a giant extra list (in Python 2, zip builds the whole list up front).
I cannot see an obvious problem besides the size of the data. Is your computer able to hold everything in memory? If not, you are probably "jumping around" in swapped memory, which will always be slow. If the complete data fits in memory, give Psyco a try. It might speed up your calculation a lot.
I suspect the problem might be in the way you're storing the results:
arr = append(arr,grid[xind,yind])
The docs for append say it returns:
"A copy of arr with values appended to axis. Note that append does not occur in-place: a new array is allocated and filled."
This means you'll be deallocating and allocating a larger and larger array on every iteration. I suggest allocating an array of the correct size up-front and then populating it in each iteration, e.g.:
arr = empty(len(x))
for k in range(len(x)):
    ...
    arr[k] = grid[xind, yind]
x's length is 7 million? I think that's why! The loop body runs 7 million times; you should probably use another kind of loop. Is it really necessary to loop 7 million times?