NumPy: vectorize a function over an array of unknown shape - Python

I have a function that takes a NumPy array as a parameter, for example:
def f(arr):
    return arr.sum()
and I want to build a new array by applying f to each vector along the last axis of A, so if A.shape == (14, 12, 7), then myfunc(A).shape == (14, 12), i.e.
myfunc(A)[x, y] = f(A[x, y])
Note that len(A.shape) is not specified in advance.

You can apply sum along the last axis:
A.sum(axis=-1)
For example:
In [1]: np.ones((14,12,7)).sum(axis=-1).shape
Out[1]: (14, 12)
If you have a generic function, you can use apply_along_axis:
np.apply_along_axis(sum, -1, A)
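Since len(A.shape) is not fixed, np.apply_along_axis is the more general tool: it collapses the last axis for any number of leading dimensions. A minimal sketch (the names f and myfunc are taken from the question):

```python
import numpy as np

def f(arr):
    # Arbitrary function of a 1-D vector; sum() is just the example from the question.
    return arr.sum()

def myfunc(A):
    # Apply f along the last axis, whatever A.ndim happens to be.
    return np.apply_along_axis(f, -1, A)

print(myfunc(np.ones((14, 12, 7))).shape)   # (14, 12)
print(myfunc(np.ones((3, 4, 5, 7))).shape)  # (3, 4, 5)
```

Note that apply_along_axis runs a Python-level loop internally, so for reductions that NumPy already provides (like sum), the direct axis argument is much faster.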

Related

Python - Numpy - Get a list of numpy.random.normal values in a particular shape without using a loop

I am new to Python and curious whether there is a way to create a list of numpy.random.normal values of a specific shape in one go, without a loop like this:
def get_random_normal_values():
    i = 0
    result = []
    while i < 100:
        result.append(np.random.normal(0.0, 1.0, (16, 16, 3)))
        i = i + 1
    return result
Simply use the size parameter to numpy.random.normal to define the final shape of the desired array:
result = np.random.normal(size=(100, 16, 16, 3)) # default parameters are loc=0.0, scale=1.0
You will get an array instead of a list of arrays, but this does not remove functionality (you can still loop over it). Please make your use case more explicit if this doesn't suit your needs.
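A short check of the size-parameter approach, showing that the leading axis plays the role of the list index from the loop version, and that a genuine list is one conversion away if needed:

```python
import numpy as np

# One call generates all 100 samples; loc=0.0 and scale=1.0 are the defaults.
result = np.random.normal(size=(100, 16, 16, 3))
print(result.shape)      # (100, 16, 16, 3)
print(result[0].shape)   # (16, 16, 3)

# If a real list is required, iterate over the leading axis:
as_list = list(result)
print(len(as_list), as_list[0].shape)   # 100 (16, 16, 3)
```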
Yes, of course it is possible:
import numpy as np

def get_random_normal_values(n_samples: int, size: tuple[int, ...]) -> list[np.ndarray]:
    all_samples = np.random.normal(0.0, 1.0, (n_samples,) + size)
    return list(all_samples)
An array of shape (n_samples,) + size is drawn, where size = (m, n, o) is the input parameter. After that, we "flatten" it into a list of length n_samples in which each item is an array of shape (m, n, o).
Try this:
np.random.normal(0.0, 1.0, (100, 16, 16, 3))

What is necessary for a function to be able to take stacked NumPy arguments?

While plotting with a meshgrid defined like this:
Y,X = np.mgrid[-5:5:20j, -5:5:20j]
For example this function
def polynomial(x, y):
    return x**2 + y**2
can handle polynomial(X,Y) but not
stack = np.dstack((X,Y))
polynomial(stack)
--->
TypeError: polynomial() missing 1 required positional argument: 'y'
while on the other hand e.g. the pdf of SciPy.stats multivariate_normal
mu = [0, 0]
sigma = [[3, 2],
         [2, 3]]
normal = st.multivariate_normal(mu, sigma)
normal = normal.pdf
can't handle
normal(X,Y)
--->
TypeError: pdf() takes 2 positional arguments but 3 were given
but it can handle normal(stack). Both are functions of two variables, but the way they accept arguments is apparently very different.
What changes would I have to make to polynomial such that it can accept stacked arguments like normal can?
Look at the array shapes:
In [165]: Y,X = np.mgrid[-5:5:20j, -5:5:20j]
In [166]: Y.shape
Out[166]: (20, 20)
In [167]: X.shape
Out[167]: (20, 20)
That's 2 arrays. Joining them on a new trailing axis:
In [168]: stack = np.dstack((X,Y))
In [169]: stack.shape
Out[169]: (20, 20, 2)
An alternative way, with a new leading axis:
In [170]: stack1 = np.stack((X,Y))
In [171]: stack1.shape
Out[171]: (2, 20, 20)
ogrid makes a sparse pair:
In [172]: y,x = np.ogrid[-5:5:20j, -5:5:20j]
In [173]: y.shape
Out[173]: (20, 1)
In [174]: x.shape
Out[174]: (1, 20)
Your function can handle these, since they behave the same as X and Y with respect to broadcasting operators like +:
In [175]: def polynomial(x,y):
...: return x**2 + y**2
...:
In [176]: polynomial(x,y).shape
Out[176]: (20, 20)
The stacked arrays can be used via:
In [177]: polynomial(stack[...,0],stack[...,1]).shape
Out[177]: (20, 20)
In [178]: polynomial(stack1[0],stack1[1]).shape
Out[178]: (20, 20)
The function takes 2 arrays - that's explicit in the definition.
The signature of pdf is (from the docs)
pdf(x, mean=None, cov=1, allow_singular=False)
x : array_like
Quantiles, with the last axis of `x` denoting the components.
It doesn't accept a second positional argument, though you can pass additional keyword arguments. How it handles the dimensions of x is internal to the function (not part of its signature), but presumably the dstack grid works, with "2 components".
Each function has its own signature. You can't presume that the pattern for one applies to another. Keep the docs at hand!
Digging further, pdf passes x through a function whose docstring says:
Adjust quantiles array so that last axis labels the components of
each data point.
This function definition should handle both forms (not tested):
def polynomial(x, y=None):
    if y is None:
        x, y = x[..., 0], x[..., 1]
    return x**2 + y**2
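A quick self-contained check (not in the original answer) that the dual-signature version accepts both calling conventions and gives the same result, using the mgrid and dstack setup from the question:

```python
import numpy as np

def polynomial(x, y=None):
    # If only one argument is given, treat its last axis as the (x, y)
    # components, mirroring the multivariate_normal.pdf convention.
    if y is None:
        x, y = x[..., 0], x[..., 1]
    return x**2 + y**2

Y, X = np.mgrid[-5:5:20j, -5:5:20j]
stack = np.dstack((X, Y))        # shape (20, 20, 2)

a = polynomial(X, Y)             # two-argument form
b = polynomial(stack)            # stacked form
print(a.shape, np.allclose(a, b))   # (20, 20) True
```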

Numpy mask array multiple times and fill nans in 3D array with values from another 3D array

I have the following code:
import numpy as np

def fill(arr1, arr2, arr3, arr4, thresh=0.5):
    out_arr = np.zeros(arr1.shape)
    for i in range(len(arr1)):
        arr1[i] = np.where(np.abs(arr1[i]) <= thresh, np.nan, arr1[i])
        mask = np.isnan(arr1[i])
        arr1[i] = np.nan_to_num(arr1[i])
        merged1 = (arr2[i] * mask) + arr1[i]
        merged2 = np.where(np.abs(merged1) <= thresh, np.nan, merged1)
        mask = np.isnan(merged2)
        merged2 = np.nan_to_num(merged2)
        merged3 = (arr3[i] * mask) + merged2
        merged3 = np.where(np.abs(merged3) <= thresh, np.nan, merged3)
        mask = np.isnan(merged3)
        merged3 = np.nan_to_num(merged3)
        merged4 = (arr4[i] * mask) + merged3
        out_arr[i] = merged4
    return out_arr
arr1 = np.random.rand(10, 10, 10)
arr2 = np.random.rand(10, 10, 10)
arr3 = np.random.rand(10, 10, 10)
arr4 = np.random.rand(10, 10, 10)
arr = fill(arr1, arr2, arr3, arr4, 0.5)
I wonder if there is a more efficient way of doing this, maybe with masked arrays? Basically what I am doing is replacing values below the threshold in each layer of the 3D array with values from the next array, chained over 4 arrays. What would this look like for n arrays?
Thanks!
Your function can be simplified in several ways. In terms of efficiency, the most significant aspect is that you do not need to iterate over the first dimension; you can operate on the whole arrays directly. Besides that, you can refactor the replacement logic into something much simpler and use a loop to avoid repeating the same code over and over:
import numpy as np

# The function accepts as many arrays as wanted, with at least one
# (the threshold must be passed as a keyword parameter).
def fill(arr1, *arrs, thresh=0.5):
    # Output array
    out_arr = arr1.copy()
    for arr in arrs:
        # Replace values that are still below the threshold
        mask = np.abs(out_arr) <= thresh
        out_arr[mask] = arr[mask]
    return out_arr
Since thresh needs to be passed as keyword parameter in this function, you would call it as:
arr = fill(arr1, arr2, arr3, arr4, thresh=0.5)
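To make the replacement chain concrete, here is the same refactored fill (repeated so the snippet runs on its own) applied to small deterministic arrays, where the chaining is easy to follow by hand:

```python
import numpy as np

def fill(arr1, *arrs, thresh=0.5):
    out_arr = arr1.copy()
    for arr in arrs:
        # Positions still at or below the threshold get the next array's values.
        mask = np.abs(out_arr) <= thresh
        out_arr[mask] = arr[mask]
    return out_arr

a = np.array([0.1, 0.9, 0.2])
b = np.array([0.3, 0.0, 0.8])   # index 0 is replaced but is still below thresh
c = np.array([0.7, 0.0, 0.0])   # so it is replaced again from c

print(fill(a, b, c, thresh=0.5))   # [0.7 0.9 0.8]
```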

Split a numpy array using masking in python

I have a numpy array my_array of size 100x20. I want to create a function that receives as input a 2D numpy array my_arr and an index x, and returns two arrays: test_arr of size 1x20 and train_arr of size 99x20. The vector test_arr corresponds to the row of my_arr with index x, and train_arr contains the remaining rows. I tried to follow a solution using masking:
def split_train_test(my_arr, x):
    a = np.ma.array(my_arr, mask=False)
    a.mask[x, :] = True
    a = np.array(a.compressed())
    return a
Apparently this is not working as I wanted. How can I properly return the train and test arrays as NumPy arrays?
You can use simple indexing and numpy.delete for this:
def split_train_test(my_arr, x):
    return np.delete(my_arr, x, 0), my_arr[x:x+1]
my_arr = np.arange(10).reshape(5,2)
train, test = split_train_test(my_arr, 2)
train
#array([[0, 1],
# [2, 3],
# [6, 7],
# [8, 9]])
test
#array([[4, 5]])
You can also use a boolean index as the mask:
def split_train_test(my_arr, x):
    # define mask
    mask = np.zeros(my_arr.shape[0], dtype=bool)
    mask[x] = True  # True only at index x, False elsewhere
    return my_arr[mask, :], my_arr[~mask, :]
Sample run:
test_arr, train_arr = split_train_test(np.random.rand(100, 20), x=10)
print(test_arr.shape, train_arr.shape)
(1, 20) (99, 20)
EDIT:
If someone is looking for the general case where more than one element needs to be allocated to the test array (say 80%-20% split), x can also accept an array:
my_arr = np.random.rand(100, 20)
x = np.random.choice(np.arange(my_arr.shape[0]), int(my_arr.shape[0] * 0.8), replace=False)
test_arr, train_arr = split_train_test(my_arr, x)
print(test_arr.shape, train_arr.shape)
(80, 20) (20, 20)

How to get constant function to keep shape in NumPy

I have a NumPy array A with shape (m,n) and want to run all the elements through some function f. For a non-constant function such as f(x) = x or f(x) = x**2, broadcasting works perfectly fine and returns the expected result. For f(x) = 1, however, applying the function to my array A just returns the scalar 1.
Is there a way to force broadcasting to keep the shape, i.e. in this case to return an array of 1s?
f(x) = 1 written as an assignment is not a function; you need to define one with def or lambda that returns 1, and then use np.vectorize to apply it elementwise to your array.
>>> import numpy as np
>>> f = lambda x: 1
>>>
>>> f = np.vectorize(f)
>>>
>>> f(np.arange(10).reshape(2, 5))
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])
This sounds like a job for np.ones_like, or np.full_like in the general case:
def f(x):
    result = np.full_like(x, 1)  # or np.full_like(x, 1, dtype=int) if you
                                 # don't want to inherit the dtype of x
    if result.ndim == 0:
        # Return a scalar instead of a 0-D array.
        return result[()]
    else:
        return result
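A self-contained sketch of the full_like approach, guarding the 0-D case via ndim and accepting plain scalars as well as arrays:

```python
import numpy as np

def f(x):
    x = np.asarray(x)              # accept scalars and lists too
    result = np.full_like(x, 1)    # ones with the same shape and dtype as x
    if result.ndim == 0:
        # Return a scalar instead of a 0-D array.
        return result[()]
    return result

A = np.arange(6, dtype=float).reshape(2, 3)
print(f(A).shape)   # (2, 3)
print(f(3.0))       # 1.0  (a scalar, not a 0-D array)
```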
Use x.fill(1). Note that fill modifies x in place and returns None, so return x itself afterwards rather than the result of the call.
