Normalize 2d arrays - python

Consider a square matrix containing positive numbers, given as a 2d numpy array A of shape (m, m). I would like to build a new array B of the same shape with entries
B[i,j] = A[i,j] / (np.sqrt(A[i,i]) * np.sqrt(A[j,j]))
An obvious solution is to loop over all (i,j) but I'm wondering if there is a faster way.

Here are two approaches that leverage broadcasting.
Approach #1 :
d = np.sqrt(np.diag(A))
B = A/d[:,None]
B /= d
Approach #2 :
B = A/(d[:,None]*d) # d same as used in Approach #1
Approach #1 has less memory overhead, since it never builds the full (m, m) product d[:,None]*d as a temporary, so I would expect it to be faster.
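As a quick check, here is a minimal sketch comparing both approaches against the explicit double loop from the question. It assumes A is a covariance matrix built from random data (an illustrative choice, not from the question), which guarantees a positive diagonal. In that case B is the corresponding correlation matrix, so it should also match `np.corrcoef`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative input (assumption): a 4x4 covariance matrix, so the diagonal is positive.
X = rng.normal(size=(50, 4))
A = np.cov(X, rowvar=False)
m = A.shape[0]

# Reference: the explicit double loop from the question.
B_loop = np.empty_like(A)
for i in range(m):
    for j in range(m):
        B_loop[i, j] = A[i, j] / (np.sqrt(A[i, i]) * np.sqrt(A[j, j]))

d = np.sqrt(np.diag(A))

# Approach #1: divide rows, then divide columns in place in B's own buffer.
B1 = A / d[:, None]
B1 /= d

# Approach #2: build the (m, m) product of d first, then divide once.
B2 = A / (d[:, None] * d)

print(np.allclose(B1, B_loop), np.allclose(B2, B_loop))
# For a covariance matrix, this normalization yields the correlation matrix.
print(np.allclose(B1, np.corrcoef(X, rowvar=False)))
```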

You can use broadcasting to normalize each row of your array by the square root of the corresponding diagonal entry:
b = np.sqrt(np.diag(a))
a / b[:, None]
Also, you can normalize each column using
a / b[None, :]
To do both, as your question seems to ask, use
a / (b[:, None] * b[None, :])
If you want to prevent the creation of intermediate arrays and do the operation in place, you can use
a /= b[:, None]
a /= b[None, :]
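One caveat with the in-place version: `/=` writes the result back into `a`, so `a` must already have a floating-point dtype. For an integer array, NumPy refuses to cast the float result back and raises an error. A small sketch, using an illustrative integer input (an assumption, not from the question):

```python
import numpy as np

a_int = np.array([[4, 2, 8],
                  [2, 9, 3],
                  [8, 3, 16]])

# Out-of-place division always works: the result is a new float array.
b = np.sqrt(np.diag(a_int))
out = a_int / (b[:, None] * b[None, :])

# In-place division into an integer buffer fails, because float results
# cannot be stored into an int array under the default casting rule.
a_try = a_int.copy()
try:
    a_try /= b[:, None]
except TypeError as exc:  # NumPy's UFuncTypeError is a subclass of TypeError
    print("in-place on int array failed:", type(exc).__name__)

# Convert to float first (this makes one copy), then normalize in place.
a = a_int.astype(float)
a /= b[:, None]
a /= b[None, :]
print(np.allclose(a, out))
```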

Related

Applying mathematical operation between rows of two numpy arrays

Let's assume we have two numpy arrays A (n1xm) and B (n2xm) and I want to apply a certain mathematical operation between the rows of both tables.
For example, let's say that we want to calculate the Euclidean distance between each row of A and each row of B and store it at a new numpy table C (n1xn2).
The simple for-loop approach would be something like the following:
C = np.zeros((A.shape[0],B.shape[0]))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        C[i, j] = np.linalg.norm(A[i] - B[j])
However, this implementation is not very efficient. How could I rewrite it using vectorization to speed it up?
You can broadcast over a new axis:
# n1 x m x n2
diff = A[:, :, None] - B[:, :, None].T
# n1 x n2 after summing across m
dists = np.sqrt((diff * diff).sum(1))
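To confirm that the broadcast version reproduces the loop, here is a minimal sketch with random data (the sizes are illustrative). It also shows a common alternative that is not part of the answer above: expanding ||a - b||² = ||a||² + ||b||² - 2 a·b, which only needs (n1, n2) intermediates instead of an (n1, m, n2) one. Rounding can make tiny entries slightly negative, hence the clip at zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, m = 5, 7, 3
A = rng.normal(size=(n1, m))
B = rng.normal(size=(n2, m))

# Reference: the double loop from the question.
C = np.zeros((A.shape[0], B.shape[0]))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        C[i, j] = np.linalg.norm(A[i] - B[j])

# Broadcasting answer: (n1, m, 1) - (1, m, n2) -> (n1, m, n2), summed over m.
diff = A[:, :, None] - B[:, :, None].T
dists = np.sqrt((diff * diff).sum(1))

# Alternative (not from the answer): squared-norm expansion via a matrix product.
sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
dists_dot = np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative rounding errors

print(np.allclose(dists, C), np.allclose(dists_dot, C))
```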

Broadcast an array by N dimensions

I have two arrays,
import numpy as np
a = np.ones(100)
b = np.ones(1000).reshape(100, 1, 10)
dims_difference = b.ndim - a.ndim
Assume that b has more dimensions than a, but not necessarily two. I want to extend a to make sure the operation a + b works as intended (over the first axis). When I know that it is two, I can do that by hard-coding:
a = a[:, None, None]
How can I do this in a general way when the number of dimensions that need to be added at the end is stored in dims_difference?
One (not so elegant) solution based on @hpaulj's comment is the following:
for i in np.arange(dims_difference) + 1:
    a = np.expand_dims(a, i)
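A loop-free alternative (not taken from the comment mentioned above) is to append all the required length-1 axes in one step, using the `dims_difference` computed in the question. The sketch shows three equivalent spellings; the tuple form of `np.expand_dims` assumes NumPy 1.18 or newer.

```python
import numpy as np

a = np.ones(100)
b = np.ones(1000).reshape(100, 1, 10)
dims_difference = b.ndim - a.ndim

# Append dims_difference trailing axes of length 1: (100,) -> (100, 1, 1).
a_ext = a.reshape(a.shape + (1,) * dims_difference)

# Equivalent with expand_dims and a tuple of new axis positions (NumPy >= 1.18).
a_ext2 = np.expand_dims(a, tuple(range(a.ndim, b.ndim)))

# Equivalent with indexing: Ellipsis keeps existing axes, each None adds one.
a_ext3 = a[(...,) + (None,) * dims_difference]

print(a_ext.shape, (a_ext + b).shape)
# (100, 1, 1) (100, 1, 10)
```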

Want to define an ndarray in numpy elementwise

I have two 2d numpy arrays, A with shape (i, j) and B with shape (i, k), where j >> k. I want to define a new 3d array C such that each slice C[x] is the broadcast elementwise product of column x of A with the whole matrix B. In other words, as a normal Python loop I would do it like this:
C = np.empty((j, i, k))
for x in range(j):
    C[x] = A[:, x, None] * B  # column x as an (i, 1) array, broadcast across B
However j is very large in this case and it would benefit me a lot if I am able to use Numpy's functionality to maybe define an ndarray C elementwise like in my loop above.
Thank you for your help
You can use broadcasting like this:
a.T[:, :, None] * b
Example:
import numpy as np
np.random.seed(444)
i, j, k = 2, 10, 3
a = np.random.randn(i, j)
b = np.random.randn(i, k)
c = a.T[:, :, None] * b
print(c.shape)
# (10, 2, 3)
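The transpose is needed because you want to operate on each column of a: a.T has shape (j, i), so its rows are the columns of a. Indexing with [:, :, None] then appends a length-1 axis, giving shape (j, i, 1), which broadcasts against b of shape (i, k) to produce shape (j, i, k), as explained in NumPy's broadcasting rules.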
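A loop can be used to check the broadcast result. Note that the column has to be given an extra axis, `a[:, x, None]`, for the loop form to broadcast against `b` when i != k. This sketch reuses the small shapes from the example above and also shows `np.einsum` as an alternative spelling of the same product (a swapped-in technique, not part of the answer): the subscripts `'ij,ik->jik'` compute c[j, i, k] = a[i, j] * b[i, k].

```python
import numpy as np

np.random.seed(444)
i, j, k = 2, 10, 3
a = np.random.randn(i, j)
b = np.random.randn(i, k)

# Broadcasting answer: (j, i, 1) * (i, k) -> (j, i, k).
c = a.T[:, :, None] * b

# Reference loop: column x of a, as an (i, 1) column, scales every column of b.
c_loop = np.empty((j, i, k))
for x in range(j):
    c_loop[x] = a[:, x, None] * b

# Alternative: einsum with c[j, i, k] = a[i, j] * b[i, k].
c_einsum = np.einsum('ij,ik->jik', a, b)

print(np.allclose(c, c_loop), np.allclose(c, c_einsum))
```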

Numpy - Speed up iteration comparison?

The following use case:
I have a Numpy matrix/array with a few thousand 2d points. Call it A.
Eg:
[1 2]
[300 400]
..
[123 242]
I also have another Numpy matrix with a few 2d points as above. Call it B.
Basically, for each point A[i], I want to iterate through B, compute the distance between A[i] and each B[j], and store the sum of those distances in another array. I could do it like this:
weight_array = np.zeros(len(A))
for i, (x0, x1) in enumerate(zip(A[:,0], A[:,1])):
    weight_distance = 0
    for j, (p0, p1) in enumerate(zip(B[:,0], B[:,1])):
        # distance() is the asker's own point-to-point distance function
        weight_distance = weight_distance + distance((p0, p1), (x0, x1))
    weight_array[i] = weight_distance
But this is too slow. What might be a Numpy way to approach this?
What you're probably looking for is the code in scipy.spatial.distance, particularly the cdist function. This can efficiently compute the pairwise distances between arrays of points for a wide variety of metrics.
import numpy as np
from scipy.spatial.distance import cdist
A = np.random.random((1000, 2))
B = np.random.random((100, 2))
D = cdist(A, B, metric='euclidean')
print(D.shape) # (1000, 100)
weights = D.sum(1)
print(weights.shape) # (1000,)
Here euclidean is the standard root-sum-square distance that you're probably used to, and D[i, j] holds the distance between A[i] and B[j], and so summing along axis 1 gives the desired weights.
There are ways to do this via broadcasting directly in numpy, but that approach would use several large temporary arrays, and will in general be slower than the scipy cdist approach.
Edit:
I thought I may as well add a note on the NumPy-only approach. It looks like this:
D2 = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
weights2 = D2.sum(1)
np.allclose(weights, weights2) # True
Let's break it down:
A[:, None, :] adds a new dimension to A, so its shape is now [1000, 1, 2]. Similarly, B[None, :, :] has shape [1, 100, 2].
A[:, None, :] - B[None, :, :] is a broadcasting operation which results in an array of differences with shape [1000, 100, 2].
We square every element of this result.
The sum(-1) method on this result sums across the last dimension, resulting in an array of shape [1000, 100].
We take the square root of the result, which gives the distance matrix.
We sum along axis 1 to get the weights.
Notice that this broadcasting approach creates not one, but two temporary arrays of size 1000 * 100 * 2 along the way, which is why it is less efficient than a purpose-built compiled function like cdist.
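Because the points in this question are 2-D, one way to trim the temporaries in the NumPy-only version (an alternative not taken from the answer above) is to handle the x and y coordinates separately and combine them with `np.hypot`, so every intermediate has shape (1000, 100) rather than (1000, 100, 2). A minimal sketch with the same random shapes:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((1000, 2))
B = rng.random((100, 2))

# Reference: the full broadcasting version from the answer.
D2 = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))

# 2-D-specific variant: per-coordinate differences of shape (1000, 100),
# combined with hypot, which computes sqrt(dx**2 + dy**2) elementwise.
dx = A[:, 0, None] - B[None, :, 0]
dy = A[:, 1, None] - B[None, :, 1]
D3 = np.hypot(dx, dy)

weights3 = D3.sum(1)  # one summed distance per point of A
print(D3.shape, weights3.shape, np.allclose(D2, D3))
# (1000, 100) (1000,) True
```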

numpy matrix multiplication

I am trying to figure out how to do a kind of scalar matrix multiplication in numpy.
I have
a = array(((1,2,3),(4,5,6)))
b = array((11,12))
and I want to do
a op b
to get
array(((1*11,2*11,3*11),(4*12,5*12,6*12)))
Right now I am using the following expression:
c = a * array((b, b, b)).transpose()
It seems like there must be a more efficient way of doing this, though.
Taking advantage of broadcasting:
(a.T * b).T
The alternative to transposing a is to change the shape of b to make broadcasting give the result you're looking for:
a * b[:, np.newaxis]
Note that adding the new axis to b gives the following array:
array([[11],
       [12]])
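Putting both answers into one self-contained script (using `np.array` in place of the bare `array` name from the question, which presumably came from `from numpy import array`):

```python
import numpy as np

a = np.array(((1, 2, 3), (4, 5, 6)))
b = np.array((11, 12))

# Option 1: transpose so b lines up with the last axis, then transpose back.
c1 = (a.T * b).T

# Option 2: make b a (2, 1) column so it broadcasts across each row of a.
c2 = a * b[:, np.newaxis]

print(c2)
# [[11 22 33]
#  [48 60 72]]
print(np.array_equal(c1, c2))
```

Option 2 avoids the two transposes and reads more directly as "scale row r of a by b[r]".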
