Matrix operation using Python

I have three matrices and am required to produce a desired output. The problem is I don't know which operation on the matrices produces it.
The matrices:
a = [[1]
     [1]
     [0]
     [0]]
b = [[ 1.       ]
     [-0.5      ]
     [-0.8660254]
     [ 0.       ]]
c = [[ 1]
     [-1]
     [ 0]
     [ 0]]
Using these three matrices, I need to produce the result
d = [[1   ]
     [0.5 ]
     [0.86]
     [0   ]]
So which operators make a ? b ? c = d? I hope someone can help me. Thank you.

Use this code to get the desired result. First convert the lists to NumPy arrays, then perform the following operation.
import numpy as np

a = np.array([[1], [1], [0], [0]])
b = np.array([[1.], [-0.5], [-0.8660254], [0.]])
c = np.array([[1], [-1], [0], [0]])
d = np.array([[1], [0.5], [0.86], [0]])

>>> a - b + c
array([[1.       ],
       [0.5      ],
       [0.8660254],
       [0.       ]])

The answer is simply:
a - b + c = d
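Note that d in the question stores the rounded 0.86 rather than the full 0.8660254, so an exact comparison fails; a tolerance-based check (my addition, using np.allclose with a loose atol) confirms the relationship:
>>> np.allclose(a - b + c, d, atol=1e-2)  # d rounds 0.8660254 to 0.86
True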


Euclidean distance matrix between two matrices

I have the following function that calculates the Euclidean distance between all combinations of the vectors in matrix A and matrix B:
import numpy as np

def distance_matrix(A, B):
    n = A.shape[1]
    m = B.shape[1]
    C = np.zeros((n, m))
    for ai, a in enumerate(A.T):
        for bi, b in enumerate(B.T):
            C[ai][bi] = np.linalg.norm(a - b)
    return C
This works fine and creates an n×m matrix from a d×n matrix and a d×m matrix, containing the Euclidean distance between all combinations of the column vectors.
>>> print(A)
[[-1 -1  1  1  2]
 [ 1 -1  2 -1  1]]
>>> print(B)
[[-2 -1  1  2]
 [-1  2  1 -1]]
>>> print(distance_matrix(A, B))
[[2.23606798 1.         2.         3.60555128]
 [1.         3.         2.82842712 3.        ]
 [4.24264069 2.         1.         3.16227766]
 [3.         3.60555128 2.         1.        ]
 [4.47213595 3.16227766 1.         2.        ]]
I spent some time looking for a numpy or scipy function to achieve this more efficiently. Is there such a function, or what would be the vectorized way to do this?
You can use:
np.linalg.norm(A[:,:,None]-B[:,None,:],axis=0)
or (totally equivalent, but without the built-in function)
((A[:,:,None]-B[:,None,:])**2).sum(axis=0)**0.5
We need a 5x4 final array, so we extend our arrays this way:
A[:,:,None]               -> shape (2, 5, 1)
B[:,None,:]               -> shape (2, 1, 4)
A[:,:,None] - B[:,None,:] -> shape (2, 5, 4)   (broadcast together)
and we apply our sum over axis 0 to finally get a (5, 4) ndarray.
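As an aside (not part of the original answer): SciPy already ships a function for exactly this pairwise computation, scipy.spatial.distance.cdist. It expects one point per row, so the d×n inputs need transposing:
>>> from scipy.spatial.distance import cdist
>>> cdist(A.T, B.T)   # same (5, 4) result as the broadcasting forms above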
Yes, you can broadcast your vectors:
import numpy as np

A = np.array([[-1, -1, 1, 1, 2], [1, -1, 2, -1, 1]])
B = np.array([[-2, -1, 1, 2], [-1, 2, 1, -1]])
C = np.linalg.norm(A.T[:, None, :] - B.T[None, :, :], axis=-1)
print(C)
[[2.23606798 1.         2.         3.60555128]
 [1.         3.         2.82842712 3.        ]
 [4.24264069 2.         1.         3.16227766]
 [3.         3.60555128 2.         1.        ]
 [4.47213595 3.16227766 1.         2.        ]]
You can get an explanation of how it works here:
https://sparrow.dev/pairwise-distance-in-numpy/

Calculate distance from all points in a numpy array to a single point on the basis of index

Suppose a 2D array is given as:
arr = array([[1, 1, 1],
             [4, 5, 8],
             [2, 6, 9]])
If point = array([1, 1]) is given, then I want to calculate the Euclidean distance from all indices of arr to point (1, 1). The result should be
array([[1.41, 1.  , 1.41],
       [1.  , 0.  , 1.  ],
       [1.41, 1.  , 1.41]])
A for loop is too slow for these computations. Is there any faster method to achieve this using numpy or scipy?
Thanks!!!
Approach #1
You can use scipy.ndimage's distance_transform_edt -
import numpy as np
from scipy.ndimage import distance_transform_edt

def distmat(a, index):
    mask = np.ones(a.shape, dtype=bool)
    mask[index[0], index[1]] = False
    return distance_transform_edt(mask)
Approach #2
Another with NumPy-native tools -
def distmat_v2(a, index):
    i, j = np.indices(a.shape, sparse=True)
    return np.sqrt((i - index[0])**2 + (j - index[1])**2)
Sample run -
In [60]: a
Out[60]:
array([[1, 1, 1],
       [4, 5, 8],
       [2, 6, 9]])

In [61]: distmat(a, index=[1,1])
Out[61]:
array([[1.41421356, 1.        , 1.41421356],
       [1.        , 0.        , 1.        ],
       [1.41421356, 1.        , 1.41421356]])

In [62]: distmat_v2(a, index=[1,1])
Out[62]:
array([[1.41421356, 1.        , 1.41421356],
       [1.        , 0.        , 1.        ],
       [1.41421356, 1.        , 1.41421356]])
Benchmarking
Other proposed solution(s):
# https://stackoverflow.com/a/61629292/3293881 @Ehsan
def norm_method(arr, point):
    point = np.asarray(point)
    return np.linalg.norm(np.indices(arr.shape, sparse=True) - point)
Using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions.
In [66]: import benchit
In [76]: funcs = [distmat, distmat_v2, norm_method]
In [77]: inputs = {n:(np.random.rand(n,n),[1,1]) for n in [3,10,50,100,500,1000,2000,5000]}
In [83]: T = benchit.timings(funcs, inputs, multivar=True, input_name='Length')
In [84]: T.plot(logx=True, colormap='Dark2', savepath='plot.png')
So distmat_v2 seems to be doing really well. We can further improve on it by leveraging numexpr.
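A minimal sketch of that numexpr variant (the name distmat_v3 and the exact expression are my own, assuming numexpr is installed):
import numexpr as ne

def distmat_v3(a, index):
    # same idea as distmat_v2, but the arithmetic runs in numexpr;
    # numexpr broadcasts the sparse (m,1) and (1,n) index grids like NumPy does
    i, j = np.indices(a.shape, sparse=True)
    i0, j0 = index
    return ne.evaluate('sqrt((i - i0)**2 + (j - j0)**2)')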
Extend to array of indices
We could extend the listed solutions to cover the more generic/bigger case of a list/array of indices w.r.t. which we need the Euclidean distances at the rest of the positions, like so -

def distmat_indices(a, indices):
    indices = np.atleast_2d(indices)
    mask = np.ones(a.shape, dtype=bool)
    mask[indices[:,0], indices[:,1]] = False
    return distance_transform_edt(mask)

def distmat_indices_v2(a, indices):
    indices = np.atleast_2d(indices)
    i, j = np.indices(a.shape, sparse=True)
    return np.sqrt(((i - indices[:,0])[...,None])**2 + (j - indices[:,1,None])**2).min(1)
Sample run -
In [143]: a = np.random.rand(4,5)

In [144]: distmat_indices(a, indices=[[2,2],[0,3]])
Out[144]:
array([[2.82842712, 2.        , 1.        , 0.        , 1.        ],
       [2.23606798, 1.41421356, 1.        , 1.        , 1.41421356],
       [2.        , 1.        , 0.        , 1.        , 2.        ],
       [2.23606798, 1.41421356, 1.        , 1.41421356, 2.23606798]])
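Since the output depends only on a.shape and the indices (not on a's values), a quick check of my own that the two extended variants agree:
# hedged check: both extended variants should produce the same matrix
out_v1 = distmat_indices(a, indices=[[2,2],[0,3]])
out_v2 = distmat_indices_v2(a, indices=[[2,2],[0,3]])
assert np.allclose(out_v1, out_v2)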
On top of @Divakar's good solutions, if you are looking for something abstract, you can use:
np.linalg.norm(np.indices(arr.shape, sparse=True)-point)
Note that this needs numpy 1.17+ (the sparse argument was added in numpy 1.17). Upgrade your numpy and enjoy. (One caveat from me: the subtraction implicitly stacks a ragged tuple of index arrays, which very recent NumPy releases may reject; the explicit-axis form below sidesteps that.)
If you have a numpy version older than 1.17, you can add the dimensions to your point explicitly:
np.linalg.norm(np.indices(arr.shape)-point[:,None,None], axis=0)
Output for point = np.array([1,1]) and the given array in the question:
[[1.41421356 1.         1.41421356]
 [1.         0.         1.        ]
 [1.41421356 1.         1.41421356]]

Getting first principal component and reduction in variance with PCA using Numpy

I am following this example here: https://machinelearningmastery.com/calculate-principal-component-analysis-scratch-python/
from numpy import array, cov, mean
from numpy.linalg import eig

A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# calculate the mean of each column
M = mean(A.T, axis=1)
print(M)
# center columns by subtracting column means
C = A - M
print(C)
# calculate covariance matrix of centered matrix
V = cov(C.T)
print(V)
# eigendecomposition of covariance matrix
values, vectors = eig(V)
print(vectors)
print(values)
# project data
P = vectors.T.dot(C.T)
print(P.T)
which gives:
original data
[[1 2]
 [3 4]
 [5 6]]
column mean
[ 3.  4.]
centered matrix
[[-2. -2.]
 [ 0.  0.]
 [ 2.  2.]]
covariance matrix
[[ 4.  4.]
 [ 4.  4.]]
vectors
[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
values
[ 8.  0.]
projected data
[[-2.82842712  0.        ]
 [ 0.          0.        ]
 [ 2.82842712  0.        ]]
If I want to find the first principal direction, do I simply take the eigenvector that corresponds to the largest eigenvalue? That would be [0.70710678, 0.70710678] here?
Building on this, is the first principal component the projection of the data onto that top eigenvector? Something like:
vectors[:,:1].T.dot(C.T)
which gives:
array([[-2.82842712,  0.        ,  2.82842712]])
I just fear I have the terminology confused, or I'm oversimplifying things. Thanks in advance!
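For what it's worth, one hedged way to sanity-check both guesses (my addition, assuming scikit-learn is available): PCA's components_ should match the eigenvector with the largest eigenvalue up to sign, and its transform should match the projection.
from numpy import array
from sklearn.decomposition import PCA

A = array([[1, 2], [3, 4], [5, 6]])
pca = PCA(n_components=1)
P = pca.fit_transform(A)     # projections onto the first component
print(pca.components_)       # ~ [[0.70710678 0.70710678]] (up to sign)
print(P.ravel())             # ~ [-2.82842712 0. 2.82842712] (up to sign)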

Building NumPy array using values from another array

Consider the following code:
import numpy as np
index_info = np.matrix([[1, 1], [1, 2]])
value = np.matrix([[0.5, 0.5]])
initial = np.zeros((3, 3))
How can I produce a matrix, final, which has the structure of initial with the elements specified by value at the locations specified by index_info, WITHOUT a for loop? For this toy example, the desired result is:
final = np.matrix([[0, 0, 0], [0, 0.5, 0.5], [0, 0, 0]])
With a for loop, you can easily iterate over the indices in index_info and value and use them to populate initial and form final. But is there a way to do so with vectorization (no for loop)?
Convert index_info to a tuple and use it to assign:
>>> initial[(*index_info,)] = value
>>> initial
array([[0. , 0. , 0. ],
       [0. , 0.5, 0.5],
       [0. , 0. , 0. ]])
Please note that use of the matrix class is discouraged. Use ndarray instead.
You can do this with NumPy's array indexing:
>>> initial = np.zeros((3, 3))
>>> row = np.array([1, 1])
>>> col = np.array([1, 2])
>>> final = np.zeros_like(initial)
>>> final[row, col] = [0.5, 0.5]
>>> final
array([[0. , 0. , 0. ],
       [0. , 0.5, 0.5],
       [0. , 0. , 0. ]])
This is similar to @PaulPanzer's answer, where he unpacks row and col from index_info all in one step. In other words:
row, col = (*index_info,)
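One caveat worth adding (my note, not from either answer): fancy-index assignment does not accumulate when the same location appears twice; the last write wins. If you need duplicates to sum, np.add.at does that:
import numpy as np

final = np.zeros((3, 3))
np.add.at(final, ([1, 1], [2, 2]), [0.5, 0.5])  # duplicates accumulate
# final[1, 2] is now 1.0, whereas final[[1, 1], [2, 2]] = [0.5, 0.5]
# would have left 0.5 (last write wins)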

Sparse arrays from tuples

I searched the net for a guide to Scipy sparse matrices and failed to find one. I would be happy if anybody shared a source, but now to the question:
I have an array of tuples. I want to turn the array of tuples into a sparse matrix where the tuples appear on the main diagonal and the diagonal just beside it, as the following example shows. What is the fancy (efficient) way of doing this?
import numpy as np
A = np.asarray([[1,2],[3,4],[5,6],[7,8]])
B = np.zeros((A.shape[0], A.shape[0]+1))
for i in range(A.shape[0]):
    B[i,i] = A[i,0]
    B[i,i+1] = A[i,1]
print(B)
Output being:
[[ 1.  2.  0.  0.  0.]
 [ 0.  3.  4.  0.  0.]
 [ 0.  0.  5.  6.  0.]
 [ 0.  0.  0.  7.  8.]]
You can build those really fast as a CSR matrix:
>>> A = np.asarray([[1,2],[3,4],[5,6],[7,8]])
>>> rows = len(A)
>>> cols = rows + 1
>>> data = A.flatten() # we want a copy
>>> indptr = np.arange(0, len(data)+1, 2) # 2 non-zero entries per row
>>> indices = np.repeat(np.arange(cols), [1] + [2] * (cols-2) + [1])
>>> import scipy.sparse as sps
>>> a_sps = sps.csr_matrix((data, indices, indptr), shape=(rows, cols))
>>> a_sps.A
array([[1, 2, 0, 0, 0],
       [0, 3, 4, 0, 0],
       [0, 0, 5, 6, 0],
       [0, 0, 0, 7, 8]])
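An alternative sketch of my own: the same matrix built from COO triplets, which some find easier to read than the CSR indptr form (rows_idx and cols_idx are my names):
>>> rows_idx = np.repeat(np.arange(len(A)), 2)     # [0, 0, 1, 1, 2, 2, 3, 3]
>>> cols_idx = rows_idx + np.tile([0, 1], len(A))  # [0, 1, 1, 2, 2, 3, 3, 4]
>>> a_coo = sps.coo_matrix((A.ravel(), (rows_idx, cols_idx)), shape=(rows, cols))
>>> a_coo.toarray()
array([[1, 2, 0, 0, 0],
       [0, 3, 4, 0, 0],
       [0, 0, 5, 6, 0],
       [0, 0, 0, 7, 8]])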
Try diags from scipy.sparse:
import numpy as np
import scipy.sparse
A = np.asarray([[1,2],[3,4],[5,6],[7,8]])
B = scipy.sparse.diags([A[:,0], A[:,1]], [0, 1], [4, 5])
When I print B.todense(), it gives me
[[ 1.  2.  0.  0.  0.]
 [ 0.  3.  4.  0.  0.]
 [ 0.  0.  5.  6.  0.]
 [ 0.  0.  0.  7.  8.]]
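The same call also reads more clearly with keyword arguments (offsets and shape are documented parameters of scipy.sparse.diags):
B = scipy.sparse.diags([A[:, 0], A[:, 1]], offsets=[0, 1], shape=(4, 5))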
