Python: vectorise function over two multi-dimensional arrays simultaneously

In Python, it is simple to vectorise a function f(x) of a scalar x over a single array a1: if f is built from NumPy operations, just use f(a1). But suppose I have two (or, in principle, more) arrays a1, a2 with the same shape Nx3, and I want to vectorise a function g(x,y), with x,y scalars, simultaneously over both. Something like g(a1,a2), which would again return an object with the common dimension N.
EDIT:
If a1 and a2 are both 1-dimensional, this becomes trivial: we use simple broadcasting, as noted below. However, for multi-dimensional arrays, the answer is not evident to me. So, how do I do this, preferably using numpy?
Example (EDITED):
import numpy as np
a1 = np.random.rand(20, 3)  # 20x3 array (placeholder values), so that each row is a 3-vector
a2 = np.random.rand(20, 3)  # ditto
def f(x, y):  # acts on each element: one row of a1 and the matching row of a2
    # ... complicated function, using other global variables ...
    return ...  # (scalar)
Without vectorisation, I need to loop f individually over all 20 rows and get an output vector of length 20:
result = []
for i, elem in enumerate(a1):
    result.append(f(elem, a2[i]))
result = np.array(result)
However, I want to eliminate the for loop and have a single statement using numpy vectorisation, so that I can then use the NumPy wrapper of JAX (https://github.com/google/jax) to speed this up on a GPU. Something naive like
result = f(a1, a2)
does not work. So what is the correct syntax?

Use numpy's vectorize.
A simple example of np.vectorize with a lambda function:
import numpy as np
f = np.vectorize(lambda x: 2*x)
f([[2,3],[3,4],[1,1]])
# output:
array([[4, 6],
       [6, 8],
       [2, 2]])
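np.vectorize also accepts several array arguments: it broadcasts them against each other and calls the function once per pair of scalar elements. A minimal sketch, where g is a hypothetical scalar function whose Python if/else would not work on whole arrays directly. Note that on (N, 3) inputs like the question's, this plain form still works element by element, so the output keeps the shape (N, 3) rather than reducing each row to one value.

```python
import numpy as np

def g(x, y):
    # Hypothetical scalar function of two scalars; the Python `if` means
    # it cannot be applied to whole arrays directly.
    return x * y + 1 if x > y else x - y

g_vec = np.vectorize(g)

a1 = np.array([[1, 5], [3, 2]])
a2 = np.array([[2, 4], [3, 1]])

out = g_vec(a1, a2)  # g is called once per element pair; out has shape (2, 2)
print(out)
# [[-1 21]
#  [ 0  3]]
```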

It depends on the operation you need to perform. If it is a simple sum, then the following will work:
import numpy as np
a = np.arange(3*2*20).reshape((20,3,2))
b = np.arange(2*20).reshape((20,2))
res = (a.transpose((1,2,0))+b.transpose((1,0))).transpose((2,0,1))
print(a[0],b[0])
[[0 1]
 [2 3]
 [4 5]] [0 1]
print(res[0])
[[0 2]
 [2 4]
 [4 6]]
First, the input data is transposed so that the correct dimensions are involved in the broadcast operation. After the summation, the output is transposed back.
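The same sum can also be written without the round-trip transposes by inserting a length-1 axis into b, so that its shape (20, 1, 2) lines up with a's (20, 3, 2). A sketch that checks both forms agree:

```python
import numpy as np

a = np.arange(3*2*20).reshape((20, 3, 2))
b = np.arange(2*20).reshape((20, 2))

# Transpose approach: move the shared length-20 axis last, broadcast, move it back
res_t = (a.transpose((1, 2, 0)) + b.transpose((1, 0))).transpose((2, 0, 1))

# Indexing approach: b[:, None, :] has shape (20, 1, 2), broadcast over a's middle axis
res_b = a + b[:, None, :]

print(np.array_equal(res_t, res_b))  # True
print(res_b[0])
# [[0 2]
#  [2 4]
#  [4 6]]
```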

I have also been trying to do something similar over the past few days. I finally managed to do it with np.vectorize, using its signature argument. Try the code snippet below:
import numpy as np
from scipy import interpolate
fn_vectorized = np.vectorize(interpolate.interp1d,
                             signature='(n),(n)->()')
interp_fn_array = fn_vectorized(x[np.newaxis, :, :], y)
Here, I was vectorizing the interp1d function. x and y are arrays of shape (m x n). The objective was to generate an array of interpolation functions, one for row i of x and row i of y. The array interp_fn_array contains the interpolation functions (its shape is (1 x m)).
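Applied to the question's setup, the same signature idea maps a function over matching rows of a1 and a2: '(n),(n)->()' says each call consumes one length-n row from each input and returns a scalar. A minimal sketch; f here is a hypothetical stand-in for the question's complicated function, and the random a1, a2 are placeholder data. Per the NumPy documentation, np.vectorize is provided for convenience rather than performance (internally it is essentially a loop); JAX offers jax.numpy.vectorize with a signature argument, so the same pattern may carry over there.

```python
import numpy as np

def f(x, y):
    # Hypothetical stand-in: takes two 3-vectors, returns a scalar
    return float(np.dot(x, y) + np.linalg.norm(x - y))

# One call per pair of rows: (n),(n) -> scalar
f_rows = np.vectorize(f, signature='(n),(n)->()')

rng = np.random.default_rng(0)
a1 = rng.normal(size=(20, 3))
a2 = rng.normal(size=(20, 3))

result = f_rows(a1, a2)  # shape (20,)

# Same values as the explicit loop from the question
loop_result = np.array([f(x, y) for x, y in zip(a1, a2)])
print(result.shape, np.allclose(result, loop_result))  # (20,) True
```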

Related

Numpy function, adding the log of the exponential. Python

I am a new Python user.
I want to add many exponential functions, and then take (and store in memory) the logarithm of the result. (Side note: I am doing this because the sum of the exponential functions is very large, so storing the log of this result is a workaround.) Can anyone help me use this numpy function: https://numpy.org/doc/stable/reference/generated/numpy.logaddexp.html
In the code below I have a 2 x 2 matrix M and a 2-dimensional vector v. I want to first add v to the columns of M. So in the code below the result should be
[[11, 22], [13, 24]]
Then I want to take the exponential of each value, sum across the rows (ending up with a vector of length 2), and store the logarithm of the result. However, the code below outputs a matrix, and I can't work out how to use the "out=None" input of the logaddexp function.
import numpy as np
M = np.array([[1, 2], [3, 4]])
v = np.array([10, 20])
result = np.logaddexp(M, v[None, :])
The function np.logaddexp() performs an elementwise operation. In your case, you need the addition to be performed along a given axis. Using some basic functions, you can try the following.
import numpy as np
M = np.array([[1, 2], [3, 4]]) # '2 x 2' array
v = np.array([[10, 20]]) # '1 x 2' array
sum_Mv = M + v # '2 x 2' array
result = np.log(np.sum(np.exp(sum_Mv), axis=1))
Change the 'axis' parameter if needed.
If you still want to use np.logaddexp(), you can split the summed matrix into its two columns and perform the operation as shown below (this pairwise form works directly only because there are exactly two columns).
result = np.logaddexp(sum_Mv[:, 0], sum_Mv[:, 1])
TLDR:
import numpy as np
M = np.array([[1, 2], [3, 4]])
v = np.array([10, 20])
result = np.logaddexp.reduce(M + v, axis=___)
Fill in ___ depending on what "sum across the rows" means
Consider the difference between np.add and np.sum.
np.add, much like the + operator, always takes in 2 arguments, x1 and x2, and adds them together. np.add is a numpy ufunc. If x1 or x2 is an array_like, then the arguments are broadcast together.
np.sum always takes in 1 argument, typically an array_like of items, and performs a summation of all of the elements in the array_like. This is essentially equivalent to iteratively taking an element from the array_like and repeatedly calling np.add with that element on a running result variable. The running result variable is initialized with 0.
Similarly, what np.sum is to np.add, np.prod is to np.multiply (with the running result initialized as 1).
Every binary np.ufunc (such as np.add and np.multiply, but also np.logaddexp) comes with a reduce method, and many also have an identity attribute that is used as the initial value of the running result.
np.add.reduce is essentially what np.sum computes, and np.multiply.reduce is essentially np.prod. The main difference is the default axis: np.sum and np.prod default to axis=None (reducing over all elements), while ufunc.reduce defaults to axis=0.
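The ufunc/aggregate correspondence can be checked directly on a small array. The axis is passed explicitly so that every call reduces the same dimension:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

# reduce applies the binary ufunc repeatedly along the chosen axis
print(np.add.reduce(m, axis=1))       # [3 7]
print(np.sum(m, axis=1))              # [3 7]
print(np.multiply.reduce(m, axis=1))  # [ 2 12]
print(np.prod(m, axis=1))             # [ 2 12]

# Initial values of the running result
print(np.add.identity, np.multiply.identity)  # 0 1
```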
What you're looking to do is a log-sum-exp; but numpy only offers np.logaddexp. As such, you can use np.logaddexp.reduce to get the required functionality. Confusion arises from the fact that you're adding M and v as well as adding exponential terms together. You can simply perform the M + v operation first, and pass the resulting array (the intermediate result in your question), to np.logaddexp.reduce. Note that M + v is equivalent to M + v[None, :] in this case due to numpy's broadcasting rules.
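Since the motivation in the question is that the sums are very large, here is a sketch comparing np.logaddexp.reduce with the naive np.log(np.sum(np.exp(...))) on the question's data, and on the same data shifted by 1000 (an arbitrary shift chosen so that np.exp overflows in float64). axis=1 is assumed to be the intended "sum across the rows".

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])
v = np.array([10, 20])
S = (M + v).astype(float)  # [[11. 22.] [13. 24.]], v broadcast over the rows of M

# log-sum-exp within each row, without ever forming exp(S)
stable = np.logaddexp.reduce(S, axis=1)
naive = np.log(np.sum(np.exp(S), axis=1))
print(np.allclose(stable, naive))  # True

# Shift by 1000: exp(1011) overflows float64 to inf, so the naive result is inf
big = S + 1000
with np.errstate(over="ignore"):
    naive_big = np.log(np.sum(np.exp(big), axis=1))
stable_big = np.logaddexp.reduce(big, axis=1)
print(naive_big)                                # [inf inf]
print(np.allclose(stable_big, stable + 1000))   # True
```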

Python - matrix multiplication

I have an array y with shape (n,), and I want to compute the matrix of pairwise products y[i]*y[j], which is an n x n matrix.
However, when I tried to do it in Python with
np.dot(y, y)
I got a single number, which is not what I am looking for.
I have also tried:
np.dot(np.transpose(y), y)
np.dot(y, np.transpose(y))
but I always get the same single number.
I think you are looking for:
np.multiply.outer(y, y)
or, equivalently:
y = y[None, :]
y.T @ y
Example:
y = np.array([1, 2, 3])[None, :]
y.T @ y
# output:
# [[1 2 3]
#  [2 4 6]
#  [3 6 9]]
You can try to reshape y from shape (70,) to (70,1) before multiplying the 2 matrices.
# Reshape
y = y.reshape(70,1)
# Either of the lines below works
y*y.T
np.matmul(y,y.T)
One-liner?
np.dot(a[:, None], a[None, :])
transpose has no effect on 1-D arrays, because you need at least two axes to swap them. This solution adds a new axis to the array: in the first argument it looks like a column vector with two axes; in the second argument it still looks like a row vector, but now has two axes.
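A quick check that the usual spellings of this outer product agree, using a small hypothetical y in place of the question's length-n vector:

```python
import numpy as np

y = np.array([1, 2, 3])  # shape (3,)

P1 = np.outer(y, y)
P2 = np.multiply.outer(y, y)
P3 = y[:, None] * y[None, :]          # (3, 1) * (1, 3) broadcasts to (3, 3)
P4 = y[:, None] @ y[None, :]          # (3, 1) @ (1, 3) matrix product -> (3, 3)
P5 = np.dot(y[:, None], y[None, :])   # the one-liner above

for P in (P2, P3, P4, P5):
    assert np.array_equal(P, P1)
print(P1)
# [[1 2 3]
#  [2 4 6]
#  [3 6 9]]
```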
Looks like what you need is the @ matrix multiplication operator. The dot method is only for computing the dot product between vectors; what you want is matrix multiplication.
>>> a = np.random.rand(70, 1)
>>> (a @ a.T).shape
(70, 70)
UPDATE:
The answer above is incorrect: dot does the same thing if the arrays are 2-D. See the np.dot documentation:
np.dot computes the dot product of two arrays. Specifically,
If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
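The 1-D rule also explains the result in the question: for a 1-D y, np.dot(y, y) (and np.dot(y.T, y), since .T leaves a 1-D array unchanged) is the sum of squares, a single number. A small check with a hypothetical y:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])

print(np.dot(y, y))        # 14.0 = 1 + 4 + 9, a scalar
print(y.T.shape)           # (3,): transposing a 1-D array is a no-op
# With 2-D inputs, np.dot is a matrix product: (3, 1) x (1, 3) -> (3, 3)
print(np.dot(y[:, None], y[None, :]).shape)  # (3, 3)
```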
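The simplest way to do what you want is to convert the vector to a matrix first using np.matrix and then use @. Although dot can also be used, @ is better because conventionally dot is used for vectors and @ for matrices. (Note that the NumPy documentation no longer recommends np.matrix; a[:, None] gives the same (70, 1) column shape with a regular array.)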
>>> a = np.random.rand(70)
>>> a.shape
(70,)
>>> a = np.matrix(a).T
>>> a.shape
(70, 1)
>>> (a @ a.T).shape
(70, 70)

vectorized addition of parameterized arrays

I'd like to sum a series of vectors defined by a (complicated) function of parameters. Is there a way to use vectorization in place of the nested for loop to get the same value for ans in the code below? Memory usage is not a constraint. Note that for the actual problem the nested for loop is known to limit performance, even though this is not the case here.
import numpy as np
a1_vec = np.array([0.3, 1])
a2_vec = np.array([3.3, 10])
b1_vec = np.array([0.5, 0.7])
b2_vec = np.array([1.5, 1.3])
x = np.arange(0, 10000)
ans = 0
for a1, b1 in zip(a1_vec, b1_vec):
    for a2, b2 in zip(a2_vec, b2_vec):
        ans += x*np.exp(- a1 - b2) + x**(1 / 2)*np.cos(b1) + x**(1 / 3)*np.sin(a2)
So, as @anon pointed out in a comment, you can use numpy's array broadcasting:
use x[:,None,None] or the more explicit x[:,np.newaxis,np.newaxis] (np.newaxis is simply an alias for None). With np.newaxis you introduce a new dimension of length 1 into your array: x.shape (the dimensionality of the array) will no longer be (10000,) but (10000,1,1).
An element-wise operation on two arrays needs compatible shapes: identical ones (e.g. (2,5) and (2,5)) always work. One of the cool features of numpy is broadcasting, which applies when a dimension larger than 1 in one array meets a dimension of length 1 in the other (e.g. (2,5) and (2,1), where the second dimension is 5 in the first array and 1 in the second). In this case numpy broadcasts the array with the length-1 dimension by effectively looping over it (in fast compiled C code). In the example, when combining the (2,5) array with the (2,1) array, numpy pretends that the second array is also a (2,5) array, with each of its two numbers repeated 5 times.
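A minimal check of the (2,5) and (2,1) case just described; the values are arbitrary, and np.broadcast_shapes (available in NumPy 1.20 and later) only reports the resulting shape:

```python
import numpy as np

m = np.arange(10).reshape(2, 5)   # shape (2, 5)
col = np.array([[100], [200]])    # shape (2, 1)

out = m + col                     # col is stretched along axis 1
print(out)
# [[100 101 102 103 104]
#  [205 206 207 208 209]]
print(np.broadcast_shapes((2, 5), (2, 1)))  # (2, 5)
```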
So, as an easy example, look at
a = np.arange(3)
print(a*a)
# [0 1 4]
This is just normal element-by-element multiplication.
Now, if you introduce length-1 dimensions, the broadcasting rules kick in:
print(a[:,None]*a[None,:])
# [[0 0 0]
#  [0 1 2]
#  [0 2 4]]
It's really cool and the key to understanding the power of numpy, but admittedly it also took me some time to get familiar with it.
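Putting this together for the question's loop: one possible layout gives each loop variable its own axis, with x along axis 0, the zipped (a1, b1) pairs along axis 1 and the (a2, b2) pairs along axis 2, and then sums out axes 1 and 2. A sketch that checks the result against the original nested loop:

```python
import numpy as np

a1_vec = np.array([0.3, 1])
a2_vec = np.array([3.3, 10])
b1_vec = np.array([0.5, 0.7])
b2_vec = np.array([1.5, 1.3])
x = np.arange(0, 10000)

# Reference: the nested loop from the question
ans_loop = 0
for a1, b1 in zip(a1_vec, b1_vec):
    for a2, b2 in zip(a2_vec, b2_vec):
        ans_loop += x*np.exp(-a1 - b2) + x**(1/2)*np.cos(b1) + x**(1/3)*np.sin(a2)

# Broadcasting: one axis per loop level
X = x[:, None, None]          # (10000, 1, 1)
A1 = a1_vec[None, :, None]    # (1, 2, 1): outer-loop parameters
B1 = b1_vec[None, :, None]
A2 = a2_vec[None, None, :]    # (1, 1, 2): inner-loop parameters
B2 = b2_vec[None, None, :]

terms = X*np.exp(-A1 - B2) + X**(1/2)*np.cos(B1) + X**(1/3)*np.sin(A2)  # (10000, 2, 2)
ans_vec = terms.sum(axis=(1, 2))   # sum over both parameter axes -> (10000,)

print(np.allclose(ans_loop, ans_vec))  # True
```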

Vector dot product along one dimension for multidimensional arrays

I want to compute the sum product along one dimension of two multidimensional arrays, using Theano.
I'll describe precisely what I want to do using numpy first. numpy.tensordot and numpy.dot seem to always do a matrix product, whereas I'm in essence looking for a batched equivalent of a vector product. Given x and y, I want to compute z like so:
x = np.random.normal(size=(200, 2, 2, 1000))
y = np.random.normal(size=(200, 2, 2))
# this is how I now approach it:
z = np.sum(y[:,:,:,np.newaxis] * x, axis=1)
# z is of shape (200, 2, 1000)
Now I know that numpy.einsum would probably be able to help me here, but again, I want to do this particular computation in Theano, which does not have an einsum equivalent. I will need to use dot, tensordot, or Theano's specialized einsum subset functions batched_dot or batched_tensordot.
The reason I'm looking to change my approach to this is performance; I suspect that using builtin (CUDA) dot products will be faster than relying on broadcasting, element-wise product, and sum.
In Theano, none of the dimensions of three- and four-dimensional tensors are broadcastable by default. You have to set them explicitly; then the NumPy principles work just fine. One way to do this is to use T.patternbroadcast. To read more about broadcasting, refer to the Theano documentation on broadcasting.
You have three dimensions in one of the tensors. So first you need to append a singleton dimension at the end and then make that dimension broadcastable. These two things can be achieved with a single command - T.shape_padaxis. The entire code is as follows:
import theano
from theano import tensor as T
import numpy as np
X = T.ftensor4('X')
Y = T.ftensor3('Y')
Y_broadcast = T.shape_padaxis(Y, axis=-1)  # append an extra dimension and make it broadcastable
Z = T.sum((X*Y_broadcast), axis=1)  # element-wise multiplication, then sum over axis 1
f = theano.function([X, Y], Z, allow_input_downcast=True)
# Making sure that it works and gives correct results
x = np.random.normal(size=(3, 2, 2, 4))
y = np.random.normal(size=(3, 2, 2))
theano_result = f(x,y)
numpy_result = np.sum(y[:,:,:,np.newaxis] * x, axis=1)
print(np.amax(theano_result - numpy_result))  # prints 2.7e-7 on my system, close enough!
I hope this helps.
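For reference in plain NumPy, the question's reduction can also be written with einsum, or as a batched matrix product with np.matmul, which treats all but the last two axes as batch dimensions (the closest NumPy analogue of the "batched dot" the question asks about). A sketch with small, arbitrary shapes; whether the matmul form is faster on a given backend is not established here.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 2, 2, 4))   # indices (i, j, k, l)
y = rng.normal(size=(3, 2, 2))      # indices (i, j, k)

# Reference from the question: broadcast, multiply, sum over axis 1 (j)
z_ref = np.sum(y[:, :, :, np.newaxis] * x, axis=1)   # (i, k, l)

# einsum: sum over the shared j index
z_einsum = np.einsum('ijk,ijkl->ikl', y, x)

# Batched matmul: (i, k) become batch axes, then (1, j) @ (j, l) -> (1, l)
y_b = y.transpose(0, 2, 1)[:, :, np.newaxis, :]   # (i, k, 1, j)
x_b = x.transpose(0, 2, 1, 3)                     # (i, k, j, l)
z_matmul = np.matmul(y_b, x_b)[:, :, 0, :]        # (i, k, l)

print(np.allclose(z_ref, z_einsum), np.allclose(z_ref, z_matmul))  # True True
```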

Convert a 1D array to a 2D array in numpy

I want to convert a 1-dimensional array into a 2-dimensional array by specifying the number of columns in the 2D array. Something that would work like this:
> import numpy as np
> A = np.array([1,2,3,4,5,6])
> B = vec2matrix(A,ncol=2)
> B
array([[1, 2],
       [3, 4],
       [5, 6]])
Does numpy have a function that works like my made-up function "vec2matrix"? (I understand that you can index a 1D array like a 2D array, but that isn't an option in the code I have - I need to make this conversion.)
You want to reshape the array.
B = np.reshape(A, (-1, 2))
where -1 infers the size of the new dimension from the size of the input array.
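If a function with the question's interface is still wanted, a thin wrapper around reshape is enough. vec2matrix is the asker's made-up name, so this helper is hypothetical; the explicit divisibility check is an added assumption to give a clearer error message.

```python
import numpy as np

def vec2matrix(a, ncol):
    """Hypothetical helper: reshape a 1-D array into rows of length ncol."""
    a = np.asarray(a)
    if a.size % ncol != 0:
        raise ValueError(f"size {a.size} is not divisible by ncol={ncol}")
    return a.reshape(-1, ncol)  # -1: let NumPy infer the number of rows

A = np.array([1, 2, 3, 4, 5, 6])
B = vec2matrix(A, ncol=2)
print(B)
# [[1 2]
#  [3 4]
#  [5 6]]
```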
You have two options:
If you no longer want the original shape, the easiest option is to assign a new shape to the array:
a.shape = (a.size//ncols, ncols)
You can replace a.size//ncols with -1 to have the proper size computed automatically. Make sure that a.shape[0]*a.shape[1] == a.size; otherwise NumPy raises a ValueError.
You can get a new array with the np.reshape function, which works much like the version presented above:
new = np.reshape(a, (-1, ncols))
When possible, new will just be a view of the initial array a, meaning that the data are shared. In some cases, though, new will be a copy instead. Note that np.reshape also accepts an optional keyword order that lets you switch from row-major C order to column-major Fortran order. np.reshape is the function version of the a.reshape method.
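A short sketch of the view behaviour and of the order keyword, using a small example array:

```python
import numpy as np

a = np.arange(1, 7)

new = np.reshape(a, (-1, 2))              # row-major (C) order
print(new.tolist())                        # [[1, 2], [3, 4], [5, 6]]
print(np.shares_memory(a, new))            # True: a view, the data are shared

new_f = np.reshape(a, (-1, 2), order='F')  # column-major: fill column by column
print(new_f.tolist())                      # [[1, 4], [2, 5], [3, 6]]
```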
If you can't meet the requirement a.shape[0]*a.shape[1] == a.size, you're stuck with having to create a new array. You can use the np.resize function and combine it with np.reshape, such as
>>> a =np.arange(9)
>>> np.resize(a, 10).reshape(5,2)
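Note that np.resize fills the extra slots by repeating the input from the start, so the last row wraps around. If padding with a constant is preferred, np.pad is one alternative. A sketch using np.arange(1, 10) instead of np.arange(9) so the wrapped value is visible:

```python
import numpy as np

a = np.arange(1, 10)   # 9 elements, not divisible by ncols
ncols = 2

wrapped = np.resize(a, 10).reshape(5, ncols)
print(wrapped.tolist())  # [[1, 2], [3, 4], [5, 6], [7, 8], [9, 1]]

# Alternative: pad with zeros up to the next multiple of ncols
padded = np.pad(a, (0, -a.size % ncols)).reshape(-1, ncols)
print(padded.tolist())   # [[1, 2], [3, 4], [5, 6], [7, 8], [9, 0]]
```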
Try something like:
B = np.reshape(A,(-1,ncols))
You'll need to make sure that you can divide the number of elements in your array by ncols though. You can also play with the order in which the numbers are pulled into B using the order keyword.
If your sole purpose is to convert a 1d array X to a 2d array just do:
X = np.reshape(X,(1, X.size))
You can convert a 1-dimensional array into a 2-dimensional array by adding a new axis:
a = np.array([10,20,30,40,50,60])
b = a[:,np.newaxis]  # converts it to two dimensions, shape (6, 1)
There is a simple way as well, we can use the reshape function in a different way:
A_reshape = A.reshape(No_of_rows, No_of_columns)
To go the other way (2D to 1D), you can use the NumPy array method flatten():
import numpy as np
a = np.array([[1, 2],
[3, 4],
[5, 6]])
a_flat = a.flatten()
print(f"original array: {a} \nflattened array = {a_flat}")
Output:
original array: [[1 2]
 [3 4]
 [5 6]]
flattened array = [1 2 3 4 5 6]
some_array.shape = (1,) + some_array.shape
or get a new array:
another_array = numpy.reshape(some_array, (1,) + some_array.shape)
This adds one dimension at the front, which is equivalent to wrapping the array in an extra pair of outermost brackets.
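A sketch comparing where the new axis ends up with each of these approaches:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50, 60])

print(a[:, np.newaxis].shape)               # (6, 1): a column
print(a[np.newaxis, :].shape)               # (1, 6): a row
print(np.reshape(a, (1,) + a.shape).shape)  # (1, 6): one extra outer bracket

b = a.copy()
b.shape = (1,) + b.shape                    # in place, no new array object
print(b)                                    # [[10 20 30 40 50 60]]
```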
To change a 1D list into a 2D list without using NumPy:
l = [i for i in range(1, 21)]
part = 3
new = []
start, end = 0, part
while end <= len(l):
    temp = []
    for i in range(start, end):
        temp.append(l[i])
    new.append(temp)
    start += part
    end += part
print("new values: ", new)
# for uneven cases: put the leftover elements into a final, shorter row
temp = []
while start < len(l):
    temp.append(l[start])
    start += 1
if temp:  # skip the empty row when len(l) is a multiple of part
    new.append(temp)
print("new values for uneven cases: ", new)
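The same chunking can be done more compactly with list slicing, which handles the uneven case automatically because a slice past the end of the list just stops early. A sketch with the same data:

```python
l = list(range(1, 21))
part = 3

# Each slice l[i:i + part] is one row; the last one is shorter
# when len(l) is not a multiple of part.
new = [l[i:i + part] for i in range(0, len(l), part)]
print(new)
# [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
```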
import numpy as np
array = np.arange(8)
print("Original array : \n", array)
array = np.arange(8).reshape(2, 4)
print("New array : \n", array)
