I am trying to generate an array by applying different functions, all stored in a numpy array, to the same parameter. Is there an efficient way to code this using numpy?
# func_array - a numpy array of different functions that take the same parameter
# X - parameter for every function in func_array
def apply_all(func_array, X):
    return func_array(X)  # desired behavior, not real numpy semantics
# where the return value is an array in which index i holds func_array[i](X)
The only solution I thought of is iterating through func_array, and I wonder if there is a faster way of doing it.
I once had the exact same question, and this is what I was told:
The vectorization speed-up that numpy array operations provide is due to the base data-types defined for the array (say an array of floats, for instance).
When the array elements are objects, this advantage is mostly nullified. Since functions are objects, func_array is an array of objects. Thus any other method will hardly provide any speedup over iteration.
This is what I've learnt. I'm open to more experienced advice.
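To illustrate, a minimal sketch (the example functions here are mine, not from the question): since the speed is bound by the Python-level calls anyway, a list comprehension wrapped in np.array is about as good as it gets:

import numpy as np

def apply_all(func_array, X):
    # func_array holds Python function objects, so each call happens at
    # Python speed no matter how it is dispatched; a comprehension is fine.
    return np.array([f(X) for f in func_array])

funcs = np.array([np.sin, np.cos, np.tan])  # example functions
print(apply_all(funcs, 0.5))  # [sin(0.5), cos(0.5), tan(0.5)]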
Related
My actual data is huge and quite heavy. But to simplify, say I have a list of numbers
x = [1,3,45,45,56,545,67]
and a function that performs some action on these numbers:
def sumnum(x):
    return np.sqrt(x) + 1
What's the best way to apply this function to the list? I don't want to use a for loop. Would map be the best option, or is there anything faster/more efficient than that?
thanks,
Prasad
In standard Python, the map function is probably the easiest way to apply a function to an array (I'm not sure about efficiency, though). However, if your array is huge, as you mentioned, you may want to look into numpy.vectorize, which is very similar to Python's built-in map function.
Edit: A possible code sample:
vsumnum = np.vectorize(sumnum)
x = vsumnum(x)
The first call returns a vectorized function, meaning that numpy has prepared it to be mapped over your array, and the second call actually applies the function to your array and returns the resulting array. As the docs note, this method is provided for convenience, not efficiency, and is essentially a for loop under the hood.
Edit 2:
As @Ch3steR mentioned, numpy also supports elementwise operations on arrays, so in this case, because you are doing simple operations, you can just write np.sqrt(x) + 1, which adds 1 to the square root of each element. Functions like map and numpy.vectorize are better suited for more complicated operations.
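A quick sketch of the elementwise approach (converting the list to an array first, which is an assumption about how the data is stored):

import numpy as np

x = np.array([1, 3, 45, 45, 56, 545, 67], dtype=float)
result = np.sqrt(x) + 1  # one C-level pass over the array, no Python loop
print(result)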
I was using NumPy's np.empty() to get an array with a random value, but it doesn't work when I define a normal np.array() before it.
Here are the two functions I used:
import numpy as np

def create_float_array(x):
    return np.array([float(x)])

def get_empty_array():
    return np.empty((), dtype=float).tolist()
Just to test the get_empty_array(), I wrote in the console:
>>> get_empty_array() # Should return a random float
0.007812501848093234
I was pleased with the result, so I tried this, but it didn't work the way I wanted:
>>> create_float_array(3.1415) # Create a NumPy array with the float given
array([3.1415])
>>> get_empty_array() # Should return another random value in a NumPy array
3.1415
I am not too sure why creating a NumPy array stops the np.empty() call from giving a random value. Apparently, it gives the same value as the one in the np.array(), in this case 3.1415.
Note that I chose to leave the shape of np.empty() empty for testing purposes; in reality it would have some shape.
Finally, I know this is not the correct way of getting random values, but I need to use np.empty() in my program and don't exactly know why this behaviour occurs.
Just to clarify the point:
np.empty is not giving truly random values. The official NumPy documentation states that it will contain "uninitialized entries" or "arbitrary data":
numpy.empty(shape, dtype=float, order='C')
Return a new array of given shape and type, without initializing entries.
[...]
Returns:
out : ndarray
Array of uninitialized (arbitrary) data of the given shape, dtype, and order. Object arrays will be initialized to None.
So what does uninitialized or arbitrary mean? To understand that you have to understand that when you create an object (any object) you need to ask someone (that someone can be the NumPy internals, the Python internals or your OS) for the required amount of memory.
So when you create an empty array NumPy asks for memory. The amount of memory for a NumPy array will be some overhead for the Python object and a certain amount of memory to contain the values of the array. That memory may contain anything. So an "uninitialized value" means that it simply contains whatever is in that memory you got.
What happened here is just a coincidence. You created an array containing one float, then you printed it, then it was destroyed again because no one kept a reference to it (although that is CPython specific; other Python implementations may not free the memory immediately, they just free it eventually). Then you created an empty array containing one float. The amount of memory for the second array is identical to the amount just released by the first array. Here's where the coincidence comes in: something (NumPy, Python or your OS) decided to give you the same memory location again.
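A minimal sketch of the coincidence (this depends on the memory allocator, so it is not guaranteed to reproduce on every system):

import numpy as np

print(np.array([3.1415]))  # allocate a one-float array, print it, drop the reference
print(np.empty(1))         # may reuse the just-freed block and print [3.1415]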
I am trying to implement the k-means clustering algorithm for a small project. I came upon this article, which suggests that
K-Means is much faster if you write the update functions using operations on numpy arrays, instead of manually looping over the arrays and updating the values yourself.
I am doing exactly that: iterating over each element of the array to update it. For each element in the dataset z, I assign the cluster array entry from the nearest centroid by iterating through each element.
for i in range(z):
    clstr[i] = closest_center(data[i], cen)
and my update function is
def closest_center(x, clist):
    dlist = [fabs(x - i) for i in clist]
    return clist[dlist.index(min(dlist))]
Since I am using a grayscale image, I use the absolute value to calculate the Euclidean distance.
I noticed that OpenCV has this algorithm too. It takes less than 2 s to execute while mine takes more than 70 s. May I know what the article is suggesting?
My images are imported as grayscale and represented as 2D numpy arrays. I further flattened them into 1D arrays because they're easier to process.
The list comprehension is likely to slow down execution. I would suggest vectorizing the function closest_center. This is straightforward for 1-dimensional arrays:
import numpy as np

def closest_center(x, clist):
    return clist[np.argmin(np.abs(x - clist))]
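Going one step further, the outer loop over the dataset can be vectorized as well. A sketch (assuming data and cen are 1-D float arrays, as in the flattened grayscale case):

import numpy as np

def assign_clusters(data, cen):
    # Broadcasting builds a (len(data), len(cen)) matrix of absolute
    # distances; argmin then picks the nearest centroid for every pixel
    # in one C-level pass, replacing the Python-level for loop.
    dists = np.abs(data[:, None] - cen[None, :])
    return cen[np.argmin(dists, axis=1)]

data = np.array([12., 200., 43., 180.])  # example pixel values
cen = np.array([20., 190.])              # example centroids
print(assign_clusters(data, cen))        # [ 20. 190.  20. 190.]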
I have a function that I want to have quickly access the first (aka zeroth) element of a given Numpy array, which itself might have any number of dimensions. What's the quickest way to do that?
I'm currently using the following:
a.reshape(-1)[0]
This reshapes the perhaps-multi-dimensional array into a 1D array and grabs the zeroth element, which is short, sweet and often fast. However, I think this would work poorly with some arrays, e.g., an array that is a transposed view of a large array, as I worry this would end up needing to create a copy rather than just another view of the original array in order to get everything in the right order. (Is that right? Or am I worrying needlessly?) Regardless, it feels like this is doing more work than I really need, so I imagine some of you may know a generally faster way of doing this?
Other options I've considered are creating an iterator over the whole array and drawing just one element from it, or creating a vector of zeroes containing one zero for each dimension and using that to fancy-index into the array. But neither of these seems all that great either.
a.flat[0]
This should be pretty fast and never require a copy. (Note that a.flat is an instance of numpy.flatiter, not an array, which is why this operation can be done without a copy.)
You can use a.item(0); see the documentation at numpy.ndarray.item.
A possible disadvantage of this approach is that the return value is a Python data type, not a numpy object. For example, if a has data type numpy.uint8, a.item(0) will be a Python integer. If that is a problem, a.flat[0] is better; see @user2357112's answer.
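A small sketch contrasting the two options (the array here is just an example):

import numpy as np

a = np.arange(6).reshape(2, 3).T   # a transposed view; no copy is made
print(a.flat[0], type(a.flat[0]))  # 0 as a numpy scalar (e.g. numpy.int64)
print(a.item(0), type(a.item(0)))  # 0 as a plain Python int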
np.hsplit(x, 2)[0]

Note that this splits x into two equal halves and returns the first half (for 2-D arrays the split is along the second axis), so it only yields the first element when that half contains exactly one entry.
Source: https://numpy.org/doc/stable/reference/generated/numpy.hsplit.html

Also note that for an array y of shape (1, Ty), y.shape[0] and y.shape[1] return the sizes of the first and second dimensions (here 1 and Ty), not the elements themselves.

You can also use take for more complicated extraction (to get a few elements):

numpy.take(a, indices, axis=None, out=None, mode='raise')
    Take elements from an array along an axis.

Source: https://docs.scipy.org/doc/numpy/reference/generated/numpy.take.html
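A minimal example of using np.take to grab the first element (the array is hypothetical):

import numpy as np

a = np.arange(12).reshape(3, 4)
first = np.take(a, 0)  # indexes into the flattened array, like a.flat[0]
print(first)           # 0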
I am mainly interested in (d1, d2)-shaped numpy arrays (matrices), but the question makes sense for arrays with more axes. I have a function f(i,j) and I'd like to initialize an array from it:
A = np.empty((d1, d2))
for i in range(d1):
    for j in range(d2):
        A[i, j] = f(i, j)
This is readable and works but I am wondering if there is a faster way since my array A will be very large and I have to optimize this bit.
One way is to use np.fromfunction. Your code can be replaced with the line:
np.fromfunction(f, shape=(d1, d2))
This is implemented in terms of NumPy functions, so it should be quite a bit faster than Python for loops for larger arrays. Note that np.fromfunction calls f once with two index arrays, so f must work elementwise on array arguments.
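A short sketch with a hypothetical f:

import numpy as np

def f(i, j):
    # Must operate elementwise on arrays: fromfunction passes index
    # arrays, not scalar indices.
    return i * 10 + j

A = np.fromfunction(f, shape=(3, 4))
print(A[2, 3])  # 23.0, i.e. A[i, j] == f(i, j)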
If f is already written in terms of NumPy operations, you can also build the index arrays yourself and let broadcasting produce the full (d1, d2) grid in one call:

a = np.arange(d1)
b = np.arange(d2)
A = f(a[:, None], b[None, :])

Equivalently, you can create explicit coordinate arrays with np.meshgrid; use indexing='ij' so that A[i, j] equals f(i, j):

X, Y = np.meshgrid(a, b, indexing='ij')
A = f(X, Y)
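For a concrete check with the same hypothetical f as above:

import numpy as np

f = lambda i, j: i * 10 + j
a = np.arange(3)
b = np.arange(4)
A = f(a[:, None], b[None, :])
print(A[2, 3])  # 23, identical to the double-loop result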