Vectorized Evaluation of a Function, Broadcasting and Element Wise Operations - python

Given this...
I have to explain what this code does, knowing that it performs the vectorized evaluation of F, using broadcasting and element wise operations concepts...
def F(x_pos, alpha):
    D = x_pos.reshape(1,-1) - x_pos.reshape(-1,1)
    return (1./alpha) * (alpha.reshape(1,-1) * R(D)).sum(axis=1)
My explanation is:
The first line of the function F receives x_pos and alpha as parameters (both NumPy arrays). In the second line, the matrix D is computed by broadcasting: basic operations such as addition on NumPy arrays are performed elementwise, but they also work on arrays of different shapes if NumPy can stretch them to a common shape, and this stretching is called broadcasting. Here an Nx1 array is subtracted from a 1xN array, which gives the NxN matrix D whose row i contains x_j - x_i for every j (x_j - x_1 in the first row, x_j - x_2 in the second, and so on). Finally, in the last line, R is evaluated on each cell of D, each result is multiplied by alpha_j, the products are summed along each row (because of axis=1 in the argument), and each row sum is multiplied by the reciprocal of the corresponding element of alpha (which is itself an array).
Questions:
Considering I'm new to Python, is my explanation OK?
Does the code have an error or not? I don't see where the condition "j must be different from 1, 2, ..., n" in each sum is taken into account in the code. If it is in fact wrong, how can I fix the code so that it does exactly what is stated in the image?

A few comments, improvements and fixes could be suggested here.
1] The first step could alternatively be done by just introducing a new axis and subtracting the result from the original array, like so (the operand order keeps D[i,j] = x_j - x_i, as in the original code) -
D = x_pos - x_pos[:,None]
In my opinion, this is a cleaner option. The performance benefit might be just marginal.
2] The second line, I think, needs a fix, as we need to exclude the diagonal elements of R(D) from the sums. So, if I got that correctly, the corrected code would be -
vals = R(D)
np.fill_diagonal(vals,0)
out = (1./alpha) * (alpha.reshape(1,-1) * vals).sum(axis=1)
Now, let's make the code a bit more idiomatic/cleaner.
At that line, we could write (alpha * vals) instead of alpha.reshape(1,-1) * vals. This is because the shapes are already aligned for broadcasting, as shown in the schematic below -
alpha :     n
vals  : n x n
Thus, alpha would be automatically extended to 2D, with its elements broadcast along the first axis for the length of vals, and the elementwise multiplication is then performed. Again, this is meant as cleaner code.
There's a further performance improvement possible here: (alpha.reshape(1,-1) * vals).sum(axis=1) can be replaced with a matrix-vector multiplication using np.dot, as vals.dot(alpha). The performance benefit should be noticeable with this step. Note that alpha.dot(vals) would sum along the other axis, so it gives the same result only when vals is symmetric.
So, the second step reduces to -
out = (1./alpha) * vals.dot(alpha)
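As a sanity check, here is a minimal sketch that compares an explicit double loop against the vectorized form. It assumes the intended formula is F_i = (1/alpha_i) * sum over j != i of alpha_j * R(x_j - x_i), which is how the original code reads. The kernel R below is hypothetical and chosen only so that R(D) is not symmetric and has a nonzero diagonal; the names F_loop and F_vec are illustrative too.

```python
import numpy as np

def R(d):
    # Hypothetical kernel for illustration only: R(0) = 1 and R(-d) != R(d)
    return (1.0 + d) * np.exp(-d**2)

def F_loop(x_pos, alpha):
    # Reference: F_i = (1/alpha_i) * sum over j != i of alpha_j * R(x_j - x_i)
    n = len(x_pos)
    out = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if j != i:
                out[i] += alpha[j] * R(x_pos[j] - x_pos[i])
        out[i] /= alpha[i]
    return out

def F_vec(x_pos, alpha):
    D = x_pos - x_pos[:, None]      # D[i, j] = x_j - x_i
    vals = R(D)
    np.fill_diagonal(vals, 0)       # drop the j == i terms
    return vals.dot(alpha) / alpha  # weighted row sums, then scale by 1/alpha_i

rng = np.random.default_rng(0)
x_pos = rng.random(6)
alpha = rng.random(6) + 0.5         # keep alpha away from zero
print(np.allclose(F_loop(x_pos, alpha), F_vec(x_pos, alpha)))
```

Using a non-symmetric kernel makes the check sensitive to both the operand order in D and the axis that the sum runs over.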

Related

Writing Code using NumPy without any loops

I am writing a program that uses NumPy to calculate classification accuracy between testing and training points, but I am not sure how to use vectorized functions instead of the for loops in my code.
Here is my code (is there a way to simplify it so that I do not need any loops?):
# command to import NumPy package
import numpy as np
iris_train=np.genfromtxt("iris-train-data.csv",delimiter=',',usecols=(0,1,2,3),dtype=float)
iris_test=np.genfromtxt("iris-test-data.csv",delimiter=',',usecols=(0,1,2,3),dtype=float)
train_cat=np.genfromtxt("iris-training-data.csv",delimiter=',',usecols=(4),dtype=str)
test_cat=np.genfromtxt("iris-testing-data.csv",delimiter=',',usecols=(4),dtype=str)
correct = 0
for i in range(len(iris_test)):
    n = 0
    old_distance = float('inf')
    while n < len(iris_train):
        #finding the difference between test and train point
        iris_diff = (abs(iris_test[i] - iris_train[n])**2)
        #summing up the calculated differences
        iris_sum = sum(iris_diff)
        new_distance = float(np.sqrt(iris_sum))
        #if statement to update distance
        if new_distance < old_distance:
            index = n
            old_distance = new_distance
        n += 1
    print(i + 1, test_cat[i], train_cat[index])
    if test_cat[i] == train_cat[index]:
        correct += 1
accuracy = ((correct)/float((len(iris_test)))*100)
print(f"Accuracy:{accuracy: .2f}%")
The trick with computing the distances is to insert extra dimensions using numpy.newaxis and use broadcasting to compute a matrix with the distance from every testing sample to every training sample in one vectorized operation. In the code below, by NumPy's broadcasting rules, diff has shape (num_test_samples, num_train_samples, num_features), and distance has shape (num_test_samples, num_train_samples) since we summed along the last axis in the call to numpy.sum.
Then you can use numpy.argmin to find the index of the closest training sample for every testing sample. index has shape (num_test_samples, ) since we did the reduction along the last axis of distance.
Finally, you can use index to select the classification of the training sample closest to each testing sample. We can construct a boolean array that represents the equality between the testing classification and the closest training classification using the == operator. Since True is cast to 1 and False is cast to 0, we can simply sum this boolean array to get the number of correct classifications.
# Compute the distance from every training sample to every testing sample
# Note that `np.sqrt` is not necessary since sqrt is a monotonically
# increasing function -- removing it doesn't change the answer
diff = iris_test[:, np.newaxis] - iris_train[np.newaxis, :]
distance = np.sqrt(np.sum(np.square(diff), axis=-1))
# Compute the index of the closest training sample to the testing sample
index = np.argmin(distance, axis=-1)
# Check if class of the closest training sample matches the class
# of the testing sample
correct = (test_cat == train_cat[index]).sum()
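To see the shapes involved and check the result against a plain loop, here is a small self-contained sketch. The arrays are random stand-ins for the iris data (hypothetical values, not the real CSV files), and the square root is left out since it does not change which neighbour is closest.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the iris arrays (hypothetical data)
train = rng.random((30, 4))
test = rng.random((10, 4))
train_labels = rng.integers(0, 3, size=30)
test_labels = rng.integers(0, 3, size=10)

# Vectorized nearest neighbour
diff = test[:, np.newaxis] - train[np.newaxis, :]   # shape (10, 30, 4)
sq_dist = np.sum(np.square(diff), axis=-1)          # shape (10, 30), sqrt omitted
index = np.argmin(sq_dist, axis=-1)                 # shape (10,)
accuracy = (test_labels == train_labels[index]).mean() * 100

# Loop version for comparison: closest training row for each test row
index_loop = np.array([
    min(range(len(train)), key=lambda n: np.sum((t - train[n]) ** 2))
    for t in test
])
print(np.array_equal(index, index_loop), f"Accuracy: {accuracy:.2f}%")
```

Note that diff holds num_test x num_train x num_features values at once, so for large data sets the memory cost of the fully vectorized form can become the limiting factor.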
If I understand correctly what you are doing (though I don't really need to in order to answer the question), for each vector of iris_test you are searching for the closest one in iris_train, closest here meaning in the sense of Euclidean distance.
So you have 3 nested loops (pseudo-Python):
for u in iris_test:
    for v in iris_train:
        s = 0
        for i in range(dimensionOfVectors):
            s += (u[i] - v[i])**2
        dist = sqrt(s)
You are right to try to get rid of Python loops. The most important one to get rid of is the inner one, and you have already done so, since the inner loop of my pseudocode is, in your code, implicit in:
iris_diff = (abs(iris_test[i] - iris_train[n])**2)
and
iris_sum = sum(iris_diff)
Both of those lines iterate through all dimensions of your vectors. The first does so not in Python but in internal NumPy code, so it is fast (strictly speaking, the built-in sum still iterates at the Python level; np.sum would keep the second step inside NumPy as well).
One may object that you don't really need abs before a **2, and that you could have called the np.linalg.norm function, which does all those operations in one call
new_distance = np.linalg.norm(iris_test[i]-iris_train[n])
which is faster than your code. But at least, in your code, the loop over all components of the vectors is already mostly vectorized.
The next stage is to vectorize the middle loop.
That can also be accomplished. Instead of computing one by one
new_distance = np.linalg.norm(iris_test[i]-iris_train[n])
you could compute in one call all the len(iris_train) distances between iris_test[i] and every iris_train[n]:
new_distances = np.linalg.norm(iris_test[i]-iris_train, axis=1)
The trick here lies in NumPy broadcasting and the axis parameter.
Broadcasting means that you can compute the difference between a 1D vector of length W and a 2D n×W array (iris_test[i] is a 1D vector, and iris_train is a 2D array whose number of columns equals the length of iris_test[i]). In such a case, NumPy broadcasts the first operand and returns a 2D n×W array whose row k is iris_test[i] - iris_train[k].
Calling np.linalg.norm on that n×W 2D matrix would return a single float (the norm of the whole matrix), unless you restrict the norm to the 2nd axis (axis=1), in which case it returns n floats, each of them being the norm of one row.
In other words, after the previous line of code, new_distances[k] is the distance between iris_test[i] and iris_train[k].
Once that is done, you can easily find the k for which this distance is smallest, using np.argmin.
np.argmin(new_distances) is the index of the smallest of the distances.
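The effect of the axis argument is easy to see on a tiny example. The arrays below are made up for illustration and are not taken from the iris data.

```python
import numpy as np

train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])  # made-up 3 x 2 "training" array
point = np.array([0.0, 0.0])                             # made-up test vector

diff = point - train                      # broadcasting: shape (3, 2)
whole = np.linalg.norm(diff)              # one float: norm of the whole matrix
per_row = np.linalg.norm(diff, axis=1)    # one distance per training row

print(diff.shape)           # (3, 2)
print(per_row)              # [0. 5. 1.]
print(np.argmin(per_row))   # 0
```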
So, all together, your code could be rewritten as:
correct = 0
for i in range(len(iris_test)):
    new_distances = np.linalg.norm(iris_test[i]-iris_train, axis=1)
    index=np.argmin(new_distances)
    #printing out classifications
    print(i + 1, test_cat[i], train_cat[index])
    if test_cat[i] == train_cat[index]:
        correct += 1

Central difference with Convolution

So basically I am trying to do finite differencing on a 2D array without too many for loops. I would like to have the Hessian matrix of the array, and the gradient, so I need both the first-order and second-order derivatives of the array.
This can be achieved by evaluating the following equation on the array.
To deal with boundaries we only compute it for the interior points, so code for this derivative might look something like the following
import numpy as np

dx = 1.0  # grid spacing (assumed value; not defined in the original snippet)
arr = np.random.rand(16).reshape(4,4)
result = np.zeros_like(arr)
w, h = arr.shape
for i in range(1, w-1):
    for j in range(1, h-1):
        result[i,j] = (arr[i+1, j] - arr[i-1, j]) / (2*dx)
This gives the correct answer but can be very slow compared to NumPy operations. So I thought to myself: this is basically just a convolution with a kernel that looks like this
kernel = [1, 0 , -1]
So we execute the following code
from scipy.signal import convolve
result = np.pad((convolve(arr,kernel,mode='same',
                          method = 'direct')/(2*dx))[1:-1, 1:-1], 1).T
Since we are only dealing with the interior points, we cut off the boundary and pad with zeros afterwards, to mimic what would happen in the previous naive case.
This works! But with some arrays, the mean squared error between the naive case and the convolution case skyrockets. So it seems that the numerical error increases very much in some cases.
I would like the speed gained by convolution with the stability of the naive case. Any help?
We can simply slice and operate. Hence, after initializing the output, do -
result[1:-1,1:-1] = (arr[2:,1:-1] - arr[:-2,1:-1])/(2*dx)
Convolution IMHO would be overkill when working with NumPy arrays, as slicing arrays is virtually free in terms of memory and performance. If the work is compute heavy, one can look into numexpr to leverage multiple cores.
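A quick way to convince yourself that the sliced expression matches the naive loop is to run both on the same array. This is only a sketch: the array size and the grid spacing dx are assumed values.

```python
import numpy as np

dx = 0.1                                  # assumed grid spacing
rng = np.random.default_rng(0)
arr = rng.random((5, 7))

# Naive double loop over interior points
loop = np.zeros_like(arr)
for i in range(1, arr.shape[0] - 1):
    for j in range(1, arr.shape[1] - 1):
        loop[i, j] = (arr[i + 1, j] - arr[i - 1, j]) / (2 * dx)

# Sliced version: the same arithmetic applied to whole sub-arrays
sliced = np.zeros_like(arr)
sliced[1:-1, 1:-1] = (arr[2:, 1:-1] - arr[:-2, 1:-1]) / (2 * dx)

print(np.allclose(loop, sliced))
```

Because both versions perform the same subtraction and division per element, no extra numerical error is introduced by the slicing.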

numpy vector multiplication speed?

I'm new to NumPy, and found the following behavior strange.
I'm implementing a logistic regression cost function. Here I have 2 column vectors with the same dimension and the same type (dfloat). y contains a bunch of zeros and ones, and a contains float numbers in the range (-1, 1).
At some point I need the dot product, so I transpose one and multiply them:
x = y.T @ a
But when I use
x = y @ a.T
performance occasionally decreases about 3 times, while the results are the same.
Why is this so? Aren't the operations the same?
Thanks.
The performance decreases, and you get a very different answer!
For vector multiplication (unlike number multiplication) a @ b != b @ a. In your case (assuming column vectors), a.T @ b is a single number (a 1x1 array), but a @ b.T is a full-blown matrix! So, if your vectors are both of shape (n, 1), the last operation will result in an (n, n) matrix, which may be pretty huge. Of course, it takes way more time to compute such a matrix (that is, to multiply a whole lot of numbers and produce a whole lot of numbers) than to add up a bunch of products and produce one single number.
That's how matrix (or vector) multiplication works.
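The difference shows up immediately in the result shapes. The sketch below uses made-up column vectors of length n = 1000, following the description in the question.

```python
import numpy as np

n = 1000
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=(n, 1)).astype(float)   # column vector of zeros and ones
a = rng.uniform(-1, 1, size=(n, 1))                 # column vector of floats in (-1, 1)

inner = y.T @ a      # (1, n) @ (n, 1) -> shape (1, 1)
outer = y @ a.T      # (n, 1) @ (1, n) -> shape (n, n)

print(inner.shape, outer.shape)   # (1, 1) (1000, 1000)
```

The inner product needs n multiplications and one output value, while the outer product produces n*n values, which explains the slowdown.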

Numpy: Perform Multiplication-like Addition

I wanted to define my own addition operator that takes an Nx1 vector (call it A) and a 1xN vector (B) such that the element in the i^th row and j^th column is the sum of the i^th element in A and the j^th element in B. An example is illustrated here.
I was able to write the following code for the function (and it is correct as far as I know).
def test_fn(a, b):
    a_len = a.shape[0]
    b_len = b.shape[1]
    prod = np.array([[0]*a_len]*b_len)
    for i in range(a_len):
        for j in range(b_len):
            prod[i, j] = a[i, 0] + b[0, j]
    return prod
However, the vectors I am working with contain thousands of elements, and the function above is quite slow. I was wondering if there was a better way to approach this problem, or if there was a numpy function that could be of use. Any help would be appreciated.
According to NumPy's broadcasting rules, you can simply write a + b to get the operator you defined.
The first rule of broadcasting is that if all input arrays do not have the same number of dimensions, a “1” will be repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.
The second rule of broadcasting ensures that arrays with a size of 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the “broadcast” array.
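A short illustration of the second rule at work, with made-up input values: an (N, 1) array plus a (1, M) array gives an (N, M) table of all pairwise sums.

```python
import numpy as np

a = np.array([[1], [2], [3]])       # shape (3, 1)
b = np.array([[10, 20, 30, 40]])    # shape (1, 4)

table = a + b                       # broadcast to shape (3, 4)
print(table)
# [[11 21 31 41]
#  [12 22 32 42]
#  [13 23 33 43]]

# The same table via the outer method of the add ufunc, on flattened inputs
print(np.array_equal(table, np.add.outer(a.ravel(), b.ravel())))   # True
```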

Parallel array manipulations in numpy

I have code in which I need to handle some big NumPy arrays. For example, I have a 3D array A and I need to construct another 3D array B using the elements of A. However, all the elements of B are independent of each other. Example:
for i in np.arange(Nx):
    for j in np.arange(Ny):
        for k in np.arange(Nz):
            B[i][j][k] = A[i+1][j][k]*np.sqrt(A[i][j-1][k-1])
So it would speed up immensely if I could construct the B array in parallel. What is the simplest way to do this in Python?
I also have similar matrix operations, like normalizing each row of a 2D array. Example:
for i in np.arange(Nx):
    f[i,:] = f[i,:]/np.linalg.norm(f[i,:])
This would also speed up if it ran in parallel for each row. How can it be done?
You should look into NumPy's roll function. I think this is equivalent to your first block of code (though you need to decide what happens at the edges - roll "wraps around"). Note that np.roll(A,-1,axis=0)[i] is A[i+1], while a shift of +1 along an axis gives the element at index - 1:
B = np.roll(A,-1,axis=0) * np.sqrt(np.roll(np.roll(A,1,axis=1),1,axis=2))
Another fairly horrible one-liner for your second case is:
f /= np.sqrt(np.sum(f**2, axis=1))[...,np.newaxis]
Explanation of this line:
We are first going to calculate the norm of each row. Let's take a small example array:
f = np.random.rand(5,6)
Square each element of f
f**2
Sum the squares along axis 1, which "flattens" out that axis.
np.sum(f**2, axis=1)
Take the square root of the sum of the squares.
np.sqrt(np.sum(f**2, axis=1))
We now have the norm of each row.
To divide each original row of f by this correctly we need to make use of the Numpy broadcasting rules to effectively add a dimension:
np.sqrt(np.sum(f**2, axis=1))[...,np.newaxis]
And finally we calculate our result
f /= np.sqrt(np.sum(f**2, axis=1))[...,np.newaxis]
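As an alternative sketch (not part of the one-liner above), np.linalg.norm accepts axis and keepdims arguments, which keeps the per-row norms in a column shape that broadcasts directly against f. The example array is random, and the loop from the question serves as the reference.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((5, 6))

# Loop version from the question, used here as the reference
expected = f.copy()
for i in range(expected.shape[0]):
    expected[i, :] = expected[i, :] / np.linalg.norm(expected[i, :])

# keepdims=True leaves the norms with shape (5, 1), ready to broadcast
normalized = f / np.linalg.norm(f, axis=1, keepdims=True)

print(np.allclose(normalized, expected))
print(np.allclose(np.linalg.norm(normalized, axis=1), 1.0))
```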
If you are taking good care of the edges, the standard way of going about your first vectorization would be something like this:
B = np.zeros(A.shape)
B[:-1, 1:, 1:] = A[1:, 1:, 1:] * np.sqrt(A[:-1, :-1, :-1])
You would then need to fill B[-1, :, :], B[:, 0, :] and B[:, :, 0] with appropriate values.
Extending this to other indices should be pretty straightforward.
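The sliced assignment can be checked against an explicit loop restricted to the indices where i+1, j-1 and k-1 are all in range. The sketch below uses a small random array (assumed shape) and also compares against a roll-based expression with shift -1 on axis 0 and +1 on axes 1 and 2, which agrees on that interior region and differs only where roll wraps around.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 5, 6))
Nx, Ny, Nz = A.shape

# Loop restricted to indices where i+1, j-1 and k-1 are all valid
B_loop = np.zeros(A.shape)
for i in range(Nx - 1):
    for j in range(1, Ny):
        for k in range(1, Nz):
            B_loop[i, j, k] = A[i + 1, j, k] * np.sqrt(A[i, j - 1, k - 1])

# Sliced version
B = np.zeros(A.shape)
B[:-1, 1:, 1:] = A[1:, 1:, 1:] * np.sqrt(A[:-1, :-1, :-1])

# roll-based version, which wraps around at the edges
B_roll = np.roll(A, -1, axis=0) * np.sqrt(np.roll(np.roll(A, 1, axis=1), 1, axis=2))

print(np.allclose(B, B_loop))
print(np.allclose(B_roll[:-1, 1:, 1:], B[:-1, 1:, 1:]))
```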
To perform parallel processing in numpy, you should look at mpi4py. It's an MPI binding for Python. It allows distributed processing.
