Compute a kernel matrix with a custom kernel function - python

Is there a way to create something like a correlation matrix, but with a custom function?
Starting from this:
X = array([[1, 1, 1],
           [2, 2, 2],
           [3, 3, 3]])
which is in the shape (n_samples, n_features), I want to turn it into something like this:
array([[f(X[0],X[0]), f(X[0],X[1]), f(X[0],X[2])],
       [f(X[1],X[0]), f(X[1],X[1]), f(X[1],X[2])],
       [f(X[2],X[0]), f(X[2],X[1]), f(X[2],X[2])]])
which is essentially every sample passed to a function with every other sample. Thanks!
The way I currently solve it is with a nested loop:
for i in range(samples):
    for j in range(samples):
        r = test_kernel(X[i], X[j])
        output[i, j] = r
but I doubt that's the most efficient way to do it; since the matrix is symmetrical, I have to do many calculations twice.

As far as I understand your question, you are looking for a way to use a custom metric to build a kernel matrix from an input matrix of samples. You can use pairwise_kernels from sklearn.metrics.pairwise:
Vector as input
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

x = np.array([[1], [2], [3]])
print('Input vector:\n', x)

kernel_default = pairwise_kernels(x)
print('Default metric - linear kernel (dot product):\n', kernel_default)

def custom_kernel(x, y):
    # Here you can define your custom transform, e.g.:
    return x**3 + y**3

kernel_custom = pairwise_kernels(x, metric=custom_kernel)
print('Some custom kernel, which has no particular meaning...:\n', kernel_custom)
results in
Input vector:
 [[1]
 [2]
 [3]]
Default metric - linear kernel (dot product):
 [[1. 2. 3.]
 [2. 4. 6.]
 [3. 6. 9.]]
Some custom kernel, which has no particular meaning...:
 [[ 2.  9. 28.]
 [ 9. 16. 35.]
 [28. 35. 54.]]
Multi-dimensional input
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

x = np.array([[1, 1], [2, 2], [3, 3]])
print('Input vector:\n', x)

def custom_kernel(x, y):
    # Here you can define your custom transform, e.g.:
    return np.sum(x)**3 + np.sum(y)**3

kernel_custom = pairwise_kernels(x, metric=custom_kernel)
print('Some custom kernel, which has no particular meaning...:\n', kernel_custom)
results in
Input vector:
 [[1 1]
 [2 2]
 [3 3]]
Some custom kernel, which has no particular meaning...:
 [[ 16.  72. 224.]
 [ 72. 128. 280.]
 [224. 280. 432.]]
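For this particular kernel, the whole matrix can also be built with plain NumPy broadcasting, since the example f(x, y) = sum(x)**3 + sum(y)**3 depends on each sample only through its sum. A minimal sketch, equivalent to the pairwise_kernels call above:
import numpy as np

x = np.array([[1, 1], [2, 2], [3, 3]])
s = x.sum(axis=1) ** 3             # per-sample term sum(x_i)**3
kernel = s[:, None] + s[None, :]   # broadcast to all pairs: s_i + s_j
print(kernel)
# [[ 16  72 224]
#  [ 72 128 280]
#  [224 280 432]]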

array([[f(X[i], X[j]) for j in range(len(X))] for i in range(len(X))])?
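Since the questioner notes the matrix is symmetric, each pair only needs to be computed once. A minimal sketch, assuming f(x, y) == f(y, x):
import numpy as np

def symmetric_kernel_matrix(X, f):
    # Compute the upper triangle once, then mirror it.
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = f(X[i], X[j])
            K[j, i] = K[i, j]
    return K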

Related

Broadcasting a function to a 3D array Python

I tried understanding numpy broadcasting with 3d arrays but I think the OP there is asking something slightly different.
I have a 3D numpy array like so -
import math
import numpy as np

IQ = np.array([
    [[1, 2],
     [3, 4]],
    [[5, 6],
     [7, 8]]
], dtype='float64')
The shape of this array is (2, 2, 2). I want to apply a function to each 1x2 array in this 3D array like so -
def func(IQ):
    I = IQ[0]
    Q = IQ[1]
    amp = np.power(np.power(I, 2) + np.power(Q, 2), 1/2)
    phase = math.atan(Q/I)
    return [amp, phase]
As you can see, I want to apply my function to each 1x2 array and replace it with my function's return value. The output is a 3D array with the same dimensions. Is there a way to broadcast this function to each 1x2 array in my original 3D array? Currently I am using loops, which become very slow as the 3D array grows.
Currently I am doing this -
# IQ is defined from above
for i in range(IQ.shape[0]):
    for j in range(IQ.shape[1]):
        I = IQ[i, j, 0]
        Q = IQ[i, j, 1]
        amp = np.power(np.power(I, 2) + np.power(Q, 2), 1/2)
        phase = math.atan(Q/I)
        IQ[i, j, 0] = amp
        IQ[i, j, 1] = phase
And the returned 3D array is -
[[[ 2.23606798  1.10714872]
  [ 5.          0.92729522]]
 [[ 7.81024968  0.87605805]
  [10.63014581  0.85196633]]]
One way is to slice the arrays to extract the I and Q values, perform the computations using normal broadcasting, and then stick the values back together:
>>> Is, Qs = IQ[..., 0], IQ[..., 1]
>>> np.stack(((Is**2 + Qs**2) ** 0.5, np.arctan2(Qs, Is)), axis=-1)
array([[[ 2.23606798,  1.10714872],
        [ 5.        ,  0.92729522]],
       [[ 7.81024968,  0.87605805],
        [10.63014581,  0.85196633]]])
It can be done with vectorized array operations:
# amplitude: root of the sum of squares along axis 2, ie (IQ[..., 0]**2 + IQ[..., 1]**2 + ...)**0.5
amp = np.sqrt(np.square(IQ).sum(axis=2))
amp
>>> array([[ 2.23606798,  5.        ],
           [ 7.81024968, 10.63014581]])
# and phase is the arctangent for each component in each matrix
phase = np.arctan2(IQ[..., 1], IQ[..., 0])
phase
>>> array([[1.10714872, 0.92729522],
           [0.87605805, 0.85196633]])
# then combine the arrays to 3d
np.stack([amp, phase], axis=2)
>>> array([[[ 2.23606798,  1.10714872],
            [ 5.        ,  0.92729522]],
           [[ 7.81024968,  0.87605805],
            [10.63014581,  0.85196633]]])
The same idea, writing the results back into IQ in place (note that np.arctan(Q/I) divides by zero when I == 0 and loses the quadrant; np.arctan2(Q, I) is safer):
I = IQ[..., 0]
Q = IQ[..., 1]
amp = np.linalg.norm(IQ, axis=2)
phase = np.arctan(Q/I)
IQ[..., 0] = amp
IQ[..., 1] = phase
IQ
>>> array([[[ 2.23606798,  1.10714872],
            [ 5.        ,  0.92729522]],
           [[ 7.81024968,  0.87605805],
            [10.63014581,  0.85196633]]])
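For completeness, the original func can also be applied without index bookkeeping via np.apply_along_axis. Note this is still a Python-level loop under the hood, not true broadcasting, so the vectorized answers above will be faster. A sketch, using atan2 for quadrant safety:
import math
import numpy as np

def func(IQ):
    I, Q = IQ[0], IQ[1]
    amp = np.hypot(I, Q)      # sqrt(I**2 + Q**2)
    phase = math.atan2(Q, I)  # quadrant-aware arctan(Q/I)
    return [amp, phase]

IQ = np.array([[[1, 2], [3, 4]],
               [[5, 6], [7, 8]]], dtype='float64')
out = np.apply_along_axis(func, 2, IQ)  # apply func to each length-2 slice
print(out)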

python tensorflow l2 loss over axis

I am using Python 3 with TensorFlow.
I have a matrix where each row is a vector, and I want to get a distance matrix computed with the L2 norm: each value in the matrix is the distance between two rows, e.g.
D[i,j] = l2_distance(M[i,:], M[j,:])
Thanks
edit:
this is not a duplicate: the other question is about computing the norm of each row of a matrix, whereas I need the pairwise distance between each row and every other row.
This answer shows how to compute the pair-wise sum of squared differences between a collection of vectors, using the expansion ||a - b||^2 = ||a||^2 - 2 a·b + ||b||^2. By simply post-composing with the square root, you arrive at your desired pair-wise distances:
M = tf.constant([[0, 0], [2, 2], [5, 5]], dtype=tf.float64)
r = tf.reduce_sum(M*M, 1)
r = tf.reshape(r, [-1, 1])
D2 = r - 2*tf.matmul(M, tf.transpose(M)) + tf.transpose(r)
D = tf.sqrt(D2)
with tf.Session() as sess:
    print(sess.run(D))
# [[0.         2.82842712 7.07106781]
#  [2.82842712 0.         4.24264069]
#  [7.07106781 4.24264069 0.        ]]
You can write a TensorFlow operation based on the formula of Euclidean distance (L2 loss).
distance = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(x1, x2))))
A sample would be:
import tensorflow as tf
x1 = tf.constant([1, 2, 3], dtype=tf.float32)
x2 = tf.constant([4, 5, 6], dtype=tf.float32)
distance = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(x1, x2))))
with tf.Session() as sess:
    print(sess.run(distance))
As pointed out by @fuglede, this computes a single distance between two vectors. Note that
tf.sqrt(tf.square(tf.subtract(x1, x2)))
only gives the element-wise absolute differences, not the full pairwise distance matrix; for that, use the approach in the answer above.
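As a sanity check outside TensorFlow, the same pairwise matrix can be computed with SciPy (a sketch; cdist defaults to the Euclidean metric):
import numpy as np
from scipy.spatial.distance import cdist

M = np.array([[0, 0], [2, 2], [5, 5]], dtype=np.float64)
print(cdist(M, M))  # pairwise Euclidean distances between rows
# [[0.         2.82842712 7.07106781]
#  [2.82842712 0.         4.24264069]
#  [7.07106781 4.24264069 0.        ]]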

Squared Mahalanobis distance function in Python returning array - why?

The code is:
import numpy as np

def Mahalanobis(x, covariance_matrix, mean):
    x = np.array(x)
    mean = np.array(mean)
    covariance_matrix = np.array(covariance_matrix)
    return (x-mean)*np.linalg.inv(covariance_matrix)*(x.transpose()-mean.transpose())

# variables x and mean are 1xd arrays; covariance_matrix is a dxd matrix
# the 1xd array passed to x should be multiplied by the (inverted) dxd array
# that was passed into the second argument
# the resulting 1xd matrix is to be multiplied by a dx1 matrix, the transpose of
# [x-mean], which should result in a 1x1 array (a number)
But for some reason I get a matrix for my output when I enter the parameters
Mahalanobis([2,5], [[.5,0],[0,2]], [3,6])
output:
out[]: array([[ 2. ,  0. ],
              [ 0. ,  0.5]])
It seems my function is just giving me the inverse of the 2x2 matrix that I input in the 2nd argument.
You've made the classic mistake of assuming that the * operator performs matrix multiplication. That is not true in Python/numpy (see http://www.scipy-lectures.org/intro/numpy/operations.html and https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html). I broke the computation down into intermediate steps and used the dot function:
import numpy as np

def Mahalanobis(x, covariance_matrix, mean):
    x = np.array(x)
    mean = np.array(mean)
    covariance_matrix = np.array(covariance_matrix)
    t1 = x - mean
    print(f'Term 1 {t1}')
    icov = np.linalg.inv(covariance_matrix)
    print(f'Inverse covariance {icov}')
    t2 = x.transpose() - mean.transpose()
    print(f'Term 2 {t2}')
    mahal = t1.dot(icov.dot(t2))
    # return (x-mean)*np.linalg.inv(covariance_matrix).dot(x.transpose()-mean.transpose())
    return mahal

# variables x and mean are 1xd arrays; covariance_matrix is a dxd matrix
# the 1xd array passed to x should be multiplied by the (inverted) dxd array
# that was passed into the second argument
# the resulting 1xd matrix is to be multiplied by a dx1 matrix, the transpose of
# [x-mean], which should result in a 1x1 array (a number)

Mahalanobis([2,5], [[.5,0],[0,2]], [3,6])
produces
Term 1 [-1 -1]
Inverse covariance [[2.  0. ]
                    [0.  0.5]]
Term 2 [-1 -1]
Out[9]: 2.5
One can use scipy's mahalanobis() function to verify:
import scipy.spatial, numpy as np
scipy.spatial.distance.mahalanobis([2,5], [3,6], np.linalg.inv([[.5,0],[0,2]]))
# 1.5811388300841898
1.5811388300841898**2 # squared Mahalanobis distance
# 2.5000000000000004
def Mahalanobis(x, covariance_matrix, mean):
    x, m, C = np.array(x), np.array(mean), np.array(covariance_matrix)
    return (x-m) @ np.linalg.inv(C) @ (x-m).T

Mahalanobis([2,5], [[.5,0],[0,2]], [3,6])
# 2.5
np.isclose(
    scipy.spatial.distance.mahalanobis([2,5], [3,6], np.linalg.inv([[.5,0],[0,2]]))**2,
    Mahalanobis([2,5], [[.5,0],[0,2]], [3,6])
)
# True
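If you need squared Mahalanobis distances for many points at once, the same computation vectorizes with np.einsum. A sketch; the sample points here are made up for illustration:
import numpy as np

X = np.array([[2, 5], [3, 6], [4, 4]], dtype=float)  # one point per row
mean = np.array([3, 6], dtype=float)
VI = np.linalg.inv(np.array([[.5, 0], [0, 2]]))      # inverse covariance

diff = X - mean
# For each row i: sum over j, k of diff[i,j] * VI[j,k] * diff[i,k]
d2 = np.einsum('ij,jk,ik->i', diff, VI, diff)
print(d2)  # [2.5 0.  4. ] - the first entry matches the result above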

Mahalanobis distance in python returns matrix instead of distance

This should be a simple question: either I am missing information, or I have mis-coded this.
I am trying to implement the Mahalanobis distance in Python, following the formula on Wikipedia.
My code is as follows:
a = np.array([[1, 3, 5]])
b = np.array([[4, 5, 6]])
X = np.empty((0,3), float)
X = np.vstack([X, [2,3,4]])
X = np.vstack([X, a])
X = np.vstack([X, b])
n = ((a-b).T)*(np.cov(X)**-1)*(a-b)
dist = np.sqrt(n)
dist returns a 3x3 array but should I not be expecting a single number representing the distance?
dist = array([[ 1.5       ,  1.73205081,  1.22474487],
              [ 1.73205081,  2.        ,  1.41421356],
              [ 1.22474487,  1.41421356,  1.        ]])
Wikipedia does not suggest (to me) that it should return a matrix. Googling implementations of Mahalanobis distance in Python, I have not found anything to compare it to.
From the wiki page you can see that a and b are column vectors, but in your case they are row arrays, so the transposition has to be reversed. There should also be matrix multiplication: in numpy, * means element-wise multiplication; for matrix multiplication you should use the np.dot function or the .dot method of np.array. For your case the answer is:
n = (a-b).dot((np.cov(X)**-1).dot((a-b).T))
dist = np.sqrt(n)
In [54]: n
Out[54]: array([[ 25.]])
In [55]: dist
Out[55]: array([[ 5.]])
EDIT
As @roadrunner66 noticed, you should use the matrix inverse instead of the element-wise reciprocal. Usually np.linalg.inv works for such cases, but here the covariance matrix is singular (np.linalg.inv raises a singular-matrix error), so you need np.linalg.pinv:
n = (a-b).dot((np.linalg.pinv(np.cov(X))).dot((a-b).T))
dist = np.sqrt(n)
In [90]: n
Out[90]: array([[ 1.77777778]])
In [91]: dist
Out[91]: array([[ 1.33333333]])
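To double-check, scipy.spatial.distance.mahalanobis computes the same quantity (a sketch; it expects 1-D vectors and the inverse covariance matrix):
import numpy as np
from scipy.spatial.distance import mahalanobis

a = np.array([1, 3, 5])
b = np.array([4, 5, 6])
X = np.vstack([[2, 3, 4], a, b])
print(mahalanobis(a, b, np.linalg.pinv(np.cov(X))))  # ~1.33333333, as above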

How to normalize a NumPy array to a unit vector?

I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function:
def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm
This function handles the situation where vector v has a norm of 0.
Are there any similar functions provided in sklearn or numpy?
If you're using scikit-learn you can use sklearn.preprocessing.normalize:
import numpy as np
from sklearn.preprocessing import normalize

x = np.random.rand(1000) * 10
norm1 = x / np.linalg.norm(x)
norm2 = normalize(x[:, np.newaxis], axis=0).ravel()
print(np.all(norm1 == norm2))
# True
I agree that it would be nice if such a function were part of the included libraries. But it isn't, as far as I know. So here is a version for arbitrary axes that gives optimal performance.
import numpy as np

def normalized(a, axis=-1, order=2):
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)

A = np.random.randn(3, 3, 3)
print(normalized(A, 0))
print(normalized(A, 1))
print(normalized(A, 2))
print(normalized(np.arange(3)[:, None]))
print(normalized(np.arange(3)))
This might also work for you
import numpy as np
normalized_v = v / np.sqrt(np.sum(v**2))
but it fails when v has norm 0.
In that case, introducing a small constant to prevent the zero division solves this.
As proposed in the comments one could also use
v/np.linalg.norm(v)
To avoid zero division I use eps, but that's maybe not great:
def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        norm = np.finfo(v.dtype).eps
    return v / norm
If you have multidimensional data and want each axis normalized to its max or its sum:
def normalize(_d, to_sum=True, copy=True):
    # d is an (n x dimension) np array
    d = _d if not copy else np.copy(_d)
    d -= np.min(d, axis=0)
    d /= (np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0))
    return d
Uses NumPy's peak-to-peak function np.ptp.
a = np.random.random((5, 3))
b = normalize(a, copy=False)
b.sum(axis=0)  # array([1., 1., 1.]): each column sums to 1
c = normalize(a, to_sum=False, copy=False)
c.max(axis=0)  # array([1., 1., 1.]): the max of each column is 1
If you don't need utmost precision, your function can be reduced to:
v_norm = v / (np.linalg.norm(v) + 1e-16)
You mentioned scikit-learn, so I want to share another solution.
scikit-learn MinMaxScaler
In scikit-learn, there is an API called MinMaxScaler which lets you customize the value range as you like.
It also deals with NaN issues for us.
NaNs are treated as missing values: disregarded in fit, and maintained
in transform. ... see reference [1]
Code sample
The code is simple, just type
# Let's say X_train is your input dataframe
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# create a MinMaxScaler object
min_max_scaler = MinMaxScaler()
# feed in a numpy array
X_train_norm = min_max_scaler.fit_transform(X_train.values)
# wrap it up in a dataframe if you need one
df = pd.DataFrame(X_train_norm)
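A self-contained toy run (a sketch; the input values are made up for illustration):
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(MinMaxScaler().fit_transform(X))  # each column scaled to [0, 1]
# [[0.  0. ]
#  [0.5 0.5]
#  [1.  1. ]]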
Reference
[1] sklearn.preprocessing.MinMaxScaler
There is also the function unit_vector() to normalize vectors in the popular transformations module by Christoph Gohlke:
import transformations as trafo
import numpy as np

data = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 2.0, 3.0]])
print(trafo.unit_vector(data, axis=1))
If you work with multidimensional arrays, the following fast solution is possible.
Say we have a 2D array which we want to normalize along the last axis, while some rows have zero norm.
import numpy as np

arr = np.array([
    [1, 2, 3],
    [0, 0, 0],
    [5, 6, 7]
], dtype=float)

lengths = np.linalg.norm(arr, axis=-1)
print(lengths)  # [ 3.74165739  0.         10.48808848]
arr[lengths > 0] = arr[lengths > 0] / lengths[lengths > 0][:, np.newaxis]
print(arr)
# [[0.26726124 0.53452248 0.80178373]
#  [0.         0.         0.        ]
#  [0.47673129 0.57207755 0.66742381]]
If you want to normalize n dimensional feature vectors stored in a 3D tensor, you could also use PyTorch:
import numpy as np
from torch import FloatTensor
from torch.nn.functional import normalize
vecs = np.random.rand(3, 16, 16, 16)
norm_vecs = normalize(FloatTensor(vecs), dim=0, eps=1e-16).numpy()
If you're working with 3D vectors, you can do this concisely using the toolbelt vg. It's a light layer on top of numpy and it supports single values and stacked vectors.
import numpy as np
import vg

x = np.random.rand(1000) * 10
norm1 = x / np.linalg.norm(x)
norm2 = vg.normalize(x)
print(np.all(norm1 == norm2))
# True
I created the library at my last startup, where it was motivated by uses like this: simple ideas which are way too verbose in NumPy.
Without sklearn and using just numpy.
Just define a function, assuming that the rows are the variables and the columns the samples (axis=1). Note that this standardizes each row (zero mean, unit variance) rather than scaling it to unit norm:
import numpy as np

# Example array
X = np.array([[1, 2, 3], [4, 5, 6]])

def stdmtx(X):
    means = X.mean(axis=1)
    stds = X.std(axis=1, ddof=1)
    X = X - means[:, np.newaxis]
    X = X / stds[:, np.newaxis]
    return np.nan_to_num(X)
output:
X
array([[1, 2, 3],
       [4, 5, 6]])
stdmtx(X)
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.]])
For a 2D array, you can use the following one-liner to normalize across rows. To normalize across columns, simply set axis=0.
a / np.linalg.norm(a, axis=1, keepdims=True)
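If some rows may be all zeros, a small guard avoids the division warning and leaves those rows as zero vectors (a sketch):
import numpy as np

a = np.array([[3.0, 4.0], [0.0, 0.0]])
norms = np.linalg.norm(a, axis=1, keepdims=True)
norms[norms == 0] = 1  # leave zero rows untouched
print(a / norms)
# [[0.6 0.8]
#  [0.  0. ]]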
If you want all values in [0, 1] for a 1d-array, then just use
(a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
Where a is your 1d-array.
An example:
>>> a = np.array([0, 1, 2, 4, 5, 2])
>>> (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
array([0. , 0.2, 0.4, 0.8, 1. , 0.4])
A note on this method: to preserve the proportions between values, the 1d-array must contain at least one 0 and otherwise only non-negative numbers.
A simple dot product would do the job. No need for any extra package.
x = x / np.sqrt(x.dot(x))
By the way, if the norm of x is zero, it is inherently a zero vector and cannot be converted to a unit vector (which has norm 1). If you want to catch the case of np.array([0, 0, ..., 0]), then use
norm = np.sqrt(x.dot(x))
x = x / norm if norm != 0 else x
