I am mainly interested in (d1, d2) NumPy arrays (matrices), but the question makes sense for arrays with more axes. I have a function f(i, j) and I'd like to initialize an array whose entries are given by this function:
A = np.empty((d1, d2))
for i in range(d1):
    for j in range(d2):
        A[i, j] = f(i, j)
This is readable and works but I am wondering if there is a faster way since my array A will be very large and I have to optimize this bit.
One way is to use np.fromfunction. Your code can be replaced with the line:
np.fromfunction(f, shape=(d1, d2))
np.fromfunction calls f once with whole arrays of indices rather than looping in Python, so as long as f works elementwise on arrays (plain arithmetic and NumPy ufuncs do), it should be quite a bit faster than the Python for loops for larger arrays.
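For instance, a minimal runnable sketch with a hypothetical f (just an illustration, not from the question):
import numpy as np

d1, d2 = 4, 5

def f(i, j):
    # fromfunction passes whole index arrays, so plain arithmetic works
    return i * 10 + j

A = np.fromfunction(f, (d1, d2))   # A[i, j] == f(i, j)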
If f itself works elementwise on arrays, another option is to build the index vectors once and let broadcasting produce the full (d1, d2) result:
a = np.arange(d1)
b = np.arange(d2)
A = f(a[:, None], b[None, :])
Equivalently, you can create explicit index grids with meshgrid (indexing="ij" matches the A[i, j] = f(i, j) convention):
X, Y = np.meshgrid(a, b, indexing="ij")
A = f(X, Y)
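As a quick check, a minimal sketch with a hypothetical f, confirming that the broadcasting version matches the explicit double loop:
import numpy as np

d1, d2 = 4, 5
f = lambda i, j: i * 10 + j          # hypothetical elementwise function

A_loop = np.empty((d1, d2))
for i in range(d1):
    for j in range(d2):
        A_loop[i, j] = f(i, j)

A_fast = f(np.arange(d1)[:, None], np.arange(d2)[None, :])
assert np.array_equal(A_loop, A_fast)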
I am trying to get an array generated by applying different functions, all stored in a NumPy array, to the same parameter. Is there an efficient way to code this using NumPy?
# func_array - a NumPy array of different functions that all take the same parameter
# X - the parameter passed to every function in func_array
def apply_all(func_array, X):
    # desired result: an array where index i holds func_array[i](X)
    return np.array([f(X) for f in func_array])
The only solution I thought of is iterating through func_array, and I wonder if there is a faster way of doing it.
I once had the exact same questions, and this is what I was told:
The vectorization speed-up that numpy array operations provide is due to the base data-types defined for the array (say an array of floats, for instance).
When the array elements are objects, this advantage is mostly nullified. Since functions are objects, func_array is an array of objects. Thus any other method will hardly provide any speedup over iteration.
This is what I've learnt. I'm open to more experienced advice.
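For what it's worth, a minimal sketch of the straightforward iteration over an object array of functions (the functions here are just assumed examples):
import numpy as np

func_array = np.array([np.sin, np.cos, np.exp], dtype=object)
X = 0.5

# with an object array, a comprehension is about as fast as anything else
result = np.array([f(X) for f in func_array])   # result[i] == func_array[i](X)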
My actual data is huge and quite heavy, but to simplify, say I have a list of numbers
x = [1,3,45,45,56,545,67]
and a function that performs some operation on these numbers:
def sumnum(x):
    return np.sqrt(x) + 1
What's the best way to apply this function to the list? I don't want to use a for loop. Would map be the best option, or is there anything faster/more efficient than that?
thanks,
Prasad
In standard Python, the map function is probably the easiest way to apply a function to an array (not sure about efficiency though). However, if your array is huge, as you mentioned, you may want to look into using numpy.vectorize, which is very similar to Python's built-in map function.
Edit: A possible code sample:
vsumnum = np.vectorize(sumnum)
x = vsumnum(x)
The first function call returns a vectorized version of your function, meaning that NumPy has prepared it to be mapped over your array; the second call actually applies the function to your array and returns the resulting array. Note that, per the docs, numpy.vectorize is provided primarily for convenience, not for performance, and its implementation is essentially a for loop.
Edit 2:
As #Ch3steR mentioned, NumPy also supports elementwise operations on arrays, so in this case, because you are doing simple operations, you can just write np.sqrt(x) + 1, which adds 1 to the square root of each element. Functions like map and numpy.vectorize are better suited to more complicated operations.
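For example, the direct elementwise version (using the list from the question converted to an array):
import numpy as np

x = np.array([1, 3, 45, 45, 56, 545, 67])
result = np.sqrt(x) + 1   # square root of each value, plus 1, in one vectorized step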
I am trying to implement the k-means clustering algorithm for a small project. I came upon this article, which suggests that
K-Means is much faster if you write the update functions using operations on numpy arrays, instead of manually looping over the arrays and updating the values yourself.
I am doing exactly that: iterating over each element of the array to update it. For each of the z elements of the dataset, I assign the nearest centroid to the cluster array clstr:
for i in range(z):
    clstr[i] = closest_center(data[i], cen)
and my update function is
from math import fabs

def closest_center(x, clist):
    dlist = [fabs(x - i) for i in clist]
    return clist[dlist.index(min(dlist))]
Since I am using a grayscale image, I use the absolute value of the difference as the Euclidean distance.
I noticed that opencv has this algorithm too. It takes less than 2s to execute the algorithm while mine takes more than 70s. May I know what the article is suggesting?
My images are imported as grayscale and represented as 2-D NumPy arrays. I further convert them to 1-D arrays because they are easier to process.
The list comprehension is likely what slows down execution. I would suggest vectorizing the function closest_center. This is straightforward for 1-dimensional arrays:
import numpy as np
def closest_center(x, clist):
    return clist[np.argmin(np.abs(x - clist))]
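Going one step further, the outer loop over the data can also be removed with broadcasting; a sketch, where data and cen are toy stand-ins for the question's pixel values and centroids:
import numpy as np

data = np.array([10.0, 200.0, 40.0, 90.0])     # toy pixel values
cen = np.array([0.0, 100.0, 255.0])            # toy centroids

dists = np.abs(data[:, None] - cen[None, :])   # (len(data), len(cen)) distance table
clstr = cen[np.argmin(dists, axis=1)]          # nearest centroid value for each pixel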
I have a function that I want to have quickly access the first (aka zeroth) element of a given Numpy array, which itself might have any number of dimensions. What's the quickest way to do that?
I'm currently using the following:
a.reshape(-1)[0]
This reshapes the perhaps-multi-dimensional array into a 1-D array and grabs the zeroth element, which is short, sweet and often fast. However, I think this would work poorly with some arrays, e.g., an array that is a transposed view of a large array, as I worry it would end up creating a copy rather than just another view of the original array, in order to get everything in the right order. (Is that right? Or am I worrying needlessly?) Regardless, it feels like this is doing more work than I really need, so I imagine some of you may know a generally faster way of doing this?
Other options I've considered are creating an iterator over the whole array and drawing just one element from it, or creating a vector of zeroes containing one zero for each dimension and using that to fancy-index into the array. But neither of these seems all that great either.
a.flat[0]
This should be pretty fast and never require a copy. (Note that a.flat is an instance of numpy.flatiter, not an array, which is why this operation can be done without a copy.)
You can use a.item(0); see the documentation at numpy.ndarray.item.
A possible disadvantage of this approach is that the return value is a Python data type, not a numpy object. For example, if a has data type numpy.uint8, a.item(0) will be a Python integer. If that is a problem, a.flat[0] is better--see #user2357112's answer.
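A minimal sketch illustrating the difference in return types (array and dtype chosen just for illustration):
import numpy as np

a = np.arange(12, dtype=np.uint8).reshape(3, 4).T   # a transposed view

first_item = a.item(0)   # plain Python int
first_flat = a.flat[0]   # numpy.uint8 scalar; no copy of the array is made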
np.hsplit(x, 2)[0]
Source: https://numpy.org/doc/stable/reference/generated/numpy.hsplit.html
See also numpy.dsplit: https://numpy.org/doc/stable/reference/generated/numpy.dsplit.html
## y -- numpy array of shape (1, Ty)
To get the first element of y's shape, use y.shape[0]; to get the second, use y.shape[1].
You can also use numpy.take for more complicated extraction (to get a few elements):
numpy.take(a, indices, axis=None, out=None, mode='raise') -- takes elements from an array along an axis.
Source: https://docs.scipy.org/doc/numpy/reference/generated/numpy.take.html
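A quick sketch of how take behaves (toy array for illustration):
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

np.take(a, 0)            # first element of the flattened array (axis=None)
np.take(a, [0, 1, 2])    # the first few elements of the flattened array
np.take(a, 0, axis=1)    # first slice along axis 1, shape (2, 4)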
For the sake of speeding up my algorithm that has numpy arrays with tens of thousands of elements, I'm wondering if I can reduce the time used by numpy.delete().
In fact, can I just eliminate it?
I have an algorithm where I've got my array alpha.
And this is what I'm currently doing:
alpha = np.delete(alpha, 0)
beta = sum(alpha)
But why do I need to delete the first element? Is it possible to simply sum up the entire array using all elements except the first one? Will that reduce the time used in the deletion operation?
Avoid np.delete whenever possible. It returns a new array, which means that new memory has to be allocated and (almost) all the original data has to be copied into the new array. That's slow, so avoid it if possible.
beta = alpha[1:].sum()
should be much faster.
Note also that sum(alpha) is calling the Python builtin function sum. That's not the fastest way to sum items in a NumPy array.
alpha[1:].sum() calls the NumPy array method sum which is much faster.
Note that if you were calling np.delete in a loop, then the code may be deleting more than just the first element from the original alpha. In that case, as Sven Marnach points out, it would be more efficient to compute all the partial sums at once, like this:
np.cumsum(alpha[:0:-1])[::-1]
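A small sketch (toy data) showing what these partial sums contain:
import numpy as np

alpha = np.array([5.0, 1.0, 2.0, 3.0])

partial = np.cumsum(alpha[:0:-1])[::-1]
# partial[0] == alpha[1:].sum(), partial[1] == alpha[2:].sum(), and so on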