This is a Python question. Say I have an (m+1)-dimensional NumPy array a of non-negative numbers, and I would like to obtain an array b of the same shape in which the entries along the last axis are normalized so that they sum to 1, or stay zero if all of them were zero. For example, if m = 2, my code would be as follows:
import numpy as np
a = np.array([[[0.34, 0.66],
               [0.75, 0.25]],
              [[0.  , 0.  ],
               [1.  , 0.  ]]])
for i1 in range(a.shape[0]):
    for i2 in range(a.shape[1]):
        s = a[i1][i2].sum()
        if s > 0:
            a[i1][i2] = a[i1][i2] / s
However, I find this method sloppy. Also, it only works for a fixed m.
This can be done by broadcasting. There are several ways to take into account the zero-sum exception. Without taking it into account, you could write
import numpy as np
shape = (2, 3, 4)
X = np.random.randn(*shape) ** 2
sums = X.sum(-1)
Y = X / sums[..., np.newaxis]
Now, to handle lines (slices along the last axis) whose sum may be zero, we first set one line of the data to 0 and then normalize only the lines with a non-zero sum:
X[0, 0, :] = 0
sums = X.sum(-1)
nnz = sums != 0
Y = np.zeros_like(X)
Y[nnz, :] = X[nnz, :] / sums[nnz, np.newaxis]
You will observe that Y.sum(axis=-1) has the entry 0 in coordinate (0,0) reflecting the zero-ness of the corresponding line.
EDIT: Application to the concrete example
X = np.array([[[0.34, 0.66],
               [0.75, 0.25]],
              [[0.  , 0.  ],
               [1.  , 0.  ]]])
sums = X.sum(-1)
nnz = sums != 0
Y = np.zeros_like(X)
Y[nnz, :] = X[nnz, :] / sums[nnz, np.newaxis]
yields Y == X (because along the last axis the sum is already one or zero.)
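For arrays of any dimension, the same idea can be wrapped in a small helper. The following is a minimal sketch and not part of the original answer: the name normalize_last_axis is hypothetical, and it uses keepdims=True together with the where argument of np.divide instead of boolean indexing.

```python
import numpy as np

def normalize_last_axis(a):
    """Scale entries along the last axis to sum to 1; all-zero lines stay zero."""
    a = np.asarray(a, dtype=float)
    # keepdims=True keeps a trailing axis of length 1 so that sums broadcasts against a
    sums = a.sum(axis=-1, keepdims=True)
    out = np.zeros_like(a)
    # Divide only where the sum is non-zero; the other entries keep their initial 0
    np.divide(a, sums, out=out, where=sums != 0)
    return out

example = np.array([[[1.0, 3.0], [0.0, 0.0]],
                    [[2.0, 2.0], [5.0, 0.0]]])
normalized = normalize_last_axis(example)
print(normalized.sum(axis=-1))
```

Because the function only refers to the last axis, it does not depend on m.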
Related
I am trying to calculate the inverse matrix using the Gauss-Jordan Method. For that, I need to find the solution X to A.X = I (A and X being N x N matrices, and I the identity matrix).
However, for every column vector of the solution matrix X I calculate in the first loop, I have to use the original matrix A, but I don't know why it keeps changing when I did a copy of it in the beginning.
def SolveGaussJordanInvMatrix(A):
    N = len(A[:,0])
    I = np.identity(N)
    X = np.zeros([N,N], float)
    A_orig = A.copy()
    for m in range(N):
        x = np.zeros(N, float)
        v = I[:,m]
        A = A_orig
        for p in range(N): # Gauss-Jordan Elimination
            A[p,:] /= A[p,p]
            v[p] /= A[p,p]
            for i in range(p): # Cancel elements above the diagonal element
                v[i] -= v[p] * A[i,p]
                A[i,p:] -= A[p,p:]*A[i,p]
            for i in range(p+1, N): # Cancel elements below the diagonal element
                v[i] -= v[p] * A[i,p]
                A[i,p:] -= A[p,p:]*A[i,p]
        X[:,m] = v # Add column vector to the solution matrix
    return X
A = np.array([[2, 1, 4, 1 ],
[3, 4, -1, -1],
[1, -4, 7, 5],
[2, -2, 1, 3]], float)
SolveGaussJordanInvMatrix(A)
Does anyone know how to turn A back into its original form after the Gauss-Jordan elimination loop?
I'm getting
array([[ 228.1, 0. , 0. , 0. ],
[-219.9, 1. , 0. , 0. ],
[ -14.5, 0. , 1. , 0. ],
[-176.3, 0. , 0. , 1. ]])
and expect
[[ 1.36842105 -0.89473684 -1.05263158 1. ]
[-1.42105263 1.23684211 1.13157895 -1. ]
[ 0.42105263 -0.23684211 -0.13157895 -0. ]
[-2. 1.5 1.5 -1. ]]
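One likely cause, offered as a sketch rather than a definitive diagnosis: the assignment A = A_orig does not copy anything. It only makes A another name for A_orig, so the in-place operations also modify the saved matrix. In addition, v[p] is divided by A[p,p] after that row has already been scaled, so the divisor is 1 by then. The version below (the name gauss_jordan_inverse is hypothetical) takes a fresh copy on every pass and reads the pivot before scaling. It assumes no zero pivot occurs, since it does no row swapping.

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert A column by column with Gauss-Jordan elimination (no pivoting)."""
    A_orig = np.array(A, dtype=float)      # private copy, the caller's A is untouched
    N = A_orig.shape[0]
    X = np.zeros((N, N))
    for m in range(N):
        A_work = A_orig.copy()             # a real copy; plain assignment would alias A_orig
        v = np.zeros(N)
        v[m] = 1.0                         # m-th column of the identity matrix
        for p in range(N):
            pivot = A_work[p, p]           # read the pivot before the row is scaled
            A_work[p, :] /= pivot
            v[p] /= pivot
            for i in range(N):
                if i != p:                 # cancel entries above and below the pivot
                    factor = A_work[i, p]
                    v[i] -= factor * v[p]
                    A_work[i, :] -= factor * A_work[p, :]
        X[:, m] = v                        # solution of A x = e_m
    return X

A = np.array([[2, 1, 4, 1],
              [3, 4, -1, -1],
              [1, -4, 7, 5],
              [2, -2, 1, 3]], float)
print(gauss_jordan_inverse(A))
```

Comparing the result with np.linalg.inv(A) is a quick way to check it on other inputs.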
Here is my matrix setup:
for a in b:
    size_x = len(a) + 1
    size_y = len(b) + 1
    matrix = np.zeros ((size_x, size_y))
    for x in range(size_x):
        matrix [x, 0] = x
    for y in range(size_y):
        matrix [0, y] = y
    for x in range(1, size_x):
        for y in range(1, size_y):
            if a[x-1] == b[y-1]:
                matrix [x,y] = min(
                    matrix[x-1, y] + 1,
                    matrix[x-1, y-1],
                    matrix[x, y-1] + 1
                )
            else:
                matrix [x,y] = min(
                    matrix[x-1,y] + 1,
                    matrix[x-1,y-1] + 1,
                    matrix[x,y-1] + 1
                )
    print(matrix)
This would give outputs such as
         t   e   s   t
  [[ 0.  1.  2.  3.  4.]
t  [ 1.  0.  1.  2.  3.]
e  [ 2.  1.  0.  1.  2.]
x  [ 3.  2.  1.  1.  2.]
t  [ 4.  3.  2.  1.  1.]]
The bottom right-hand corner holds the final value. How do I extract it and add it to a list?
You can access the i-th element of an array arr with the expression arr[i].
To answer your question: to access the bottom-right value of a 2D matrix, simply use
matrix[len(matrix)-1][len(matrix[len(matrix)-1])-1]
or better
last_row_index = len(matrix) - 1
last_col_index = len(matrix[last_row_index]) - 1
bottom_right_value = matrix[last_row_index][last_col_index]
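With NumPy arrays, negative indices count from the end, so the bottom-right entry can also be read as matrix[-1, -1]. The sketch below shows one way to collect that value for several words. The helper name edit_distance_matrix and the word list are made up for illustration and are not from the question.

```python
import numpy as np

def edit_distance_matrix(a, b):
    """Build the edit-distance table for strings a and b."""
    size_x, size_y = len(a) + 1, len(b) + 1
    matrix = np.zeros((size_x, size_y))
    matrix[:, 0] = np.arange(size_x)       # first column: 0, 1, 2, ...
    matrix[0, :] = np.arange(size_y)       # first row: 0, 1, 2, ...
    for x in range(1, size_x):
        for y in range(1, size_y):
            cost = 0 if a[x - 1] == b[y - 1] else 1
            matrix[x, y] = min(matrix[x - 1, y] + 1,         # deletion
                               matrix[x - 1, y - 1] + cost,  # match or substitution
                               matrix[x, y - 1] + 1)         # insertion
    return matrix

words = ["text", "tent", "test"]           # illustrative inputs
distances = []
for word in words:
    matrix = edit_distance_matrix(word, "test")
    distances.append(int(matrix[-1, -1]))  # bottom-right entry is the final distance

print(distances)  # [1, 1, 0]
```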
I'm looking for efficient alternate ways to compute cosine angle between 2D vectors. Your insights on this problem will be of much help.
Problem Statement:
vectors is a 2D array where vectors are stored. The shape of the vectors array is (N, 2) where N is the number of vectors. vectors[:, 0] has x-component and vectors[:, 1] has y-component.
I have to find the angle between all the vectors in vectors. For example, if there are three vectors A, B, C in vectors, I need to find the angle between A and B, B and C, and, A and C.
I have implemented it and want to know about alternative approaches.
Current Implementation:
vectors = np.array([[1, 3], [2, 4], [3, 5]])
vec_x = vectors[:, 0]
vec_y = vectors[:, 1]
a1 = np.ones([vec_x.shape[0], vec_x.shape[0]]) * vec_x
a2 = np.ones([vec_x.shape[0], vec_x.shape[0]]) * vec_y
a1b1 = a1 * a1.T
a2b2 = a2 * a2.T
mask = np.triu_indices(a1b1.shape[0], 0) # We are interested in lower triangular matrix
a1b1[mask] = 0
a2b2[mask] = 0
numer = a1b1 + a2b2
denom = np.ones([vec_x.shape[0], vec_x.shape[0]]) * np.sqrt(np.square(a1) + np.square(a2))
denom = denom * denom.T
denom[mask] = 0
eps = 1e-7
dot_res = np.rad2deg(np.arccos(np.divide(numer, denom + eps)))
dot_res[mask] = 0
print(dot_res)
Output:
[[ 0. 0. 0. ]
[ 8.13010519 0. 0. ]
[12.52880911 4.39870821 0. ]]
Questions:
Is there a more efficient alternative way to do this?
Can we improve the speed of the current version in some way?
Use scipy.spatial.distance.pdist:
import numpy as np
import scipy.spatial.distance
vectors = np.array([[1, 3], [2, 4], [3, 5]])
# Compute cosine distance
dist = scipy.spatial.distance.pdist(vectors, 'cosine')
# Compute angles
angle = np.rad2deg(np.arccos(1 - dist))
# Make it into a matrix
angle_matrix = scipy.spatial.distance.squareform(angle)
print(angle_matrix)
# [[ 0. 8.13010235 12.52880771]
# [ 8.13010235 0. 4.39870535]
# [12.52880771 4.39870535 0. ]]
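If a SciPy dependency is not wanted, the same angles can be obtained with plain NumPy by normalizing the vectors and taking one matrix product. This is a sketch of an alternative, not the method from the answer above. The np.clip call is an added safeguard against rounding pushing a cosine slightly outside [-1, 1], and np.tril keeps only the lower triangle, as in the question.

```python
import numpy as np

vectors = np.array([[1, 3], [2, 4], [3, 5]], dtype=float)

# Scale every vector to unit length (assumes no zero-length vectors)
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

# Entry (i, j) of the product is the cosine of the angle between vectors i and j
cosine = np.clip(unit @ unit.T, -1.0, 1.0)
angles = np.rad2deg(np.arccos(cosine))

# Keep only the strictly lower triangle, matching the layout in the question
lower = np.tril(angles, k=-1)
print(lower)
```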
I want to write a single function that calculates the true mean (ignoring zero elements when averaging) of each row or column of a matrix. I try to control whether the calculation is by row or by column using an axis parameter of 1 or 0, respectively.
This is the function for by-column calculation
def true_mean(matrix, axis):
    countnonzero = (matrix!=0).sum(axis)
    mask = countnonzero!=0
    output_mat = np.zeros(matrix.T.shape[axis])
    output_mat[mask] = matrix[:,mask].sum(axis)/countnonzero[mask] # line4
    return output_mat
Test the function
eachPSM = np.ones([5,4])
eachPSM[0] = 0
eachPSM[2,2:4] = 5
print(eachPSM)
> [[ 0. 0. 0. 0.]
[ 1. 1. 1. 1.]
[ 1. 1. 5. 5.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
ans = true_mean(eachPSM,0)
print(ans)
> [ 1. 1. 2. 2.]
However, if I want to calculate by row (axis = 1), only line4 has to change to
output_mat[mask] = matrix[mask,:].sum(axis)/countnonzero[mask]
Is there a way to switch between matrix[:,mask] and matrix[mask,:] using only the numbers 0 and 1, so that a single function can compute the true mean by row and by column?
You can use the fact that the [] operator accepts a tuple as its argument:
indexer = [slice(None), slice(None)]
indexer[axis] = mask
print(x[tuple(indexer)])
slice(None) is equivalent to :, so we build a list that selects the full matrix [:, :], replace the entry for the desired axis with the mask, and convert the list to a tuple before indexing. A plain list of slices is not accepted as an index by recent NumPy versions.
Complete example:
import numpy as np
x = np.arange(9).reshape(3, 3)
mask = np.array([True, False, True])
for axis in [0, 1]:
    indexer = [slice(None)] * x.ndim
    indexer[axis] = mask
    print(x[tuple(indexer)])
prints
[[0 1 2]
[6 7 8]]
and
[[0 2]
[3 5]
[6 8]]
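Applied to the function from the question, the indexer trick might look like the sketch below. It assumes a 2-D input. Note that the mask has one entry per slice along the other axis, so it goes into position 1 - axis of the indexer.

```python
import numpy as np

def true_mean(matrix, axis):
    """Mean along `axis` of a 2-D array, ignoring zero entries."""
    count_nonzero = (matrix != 0).sum(axis)
    mask = count_nonzero != 0
    # For axis=0 this builds matrix[:, mask], for axis=1 it builds matrix[mask, :]
    indexer = [slice(None)] * 2
    indexer[1 - axis] = mask
    output = np.zeros(count_nonzero.shape)
    output[mask] = matrix[tuple(indexer)].sum(axis) / count_nonzero[mask]
    return output

eachPSM = np.ones([5, 4])
eachPSM[0] = 0
eachPSM[2, 2:4] = 5
print(true_mean(eachPSM, 0))  # [1. 1. 2. 2.]
print(true_mean(eachPSM, 1))  # [0. 1. 3. 1. 1.]
```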
I'm doing a project and I'm doing a lot of matrix computation in it.
I'm looking for a smart way to speed up my code. In my project, I'm dealing with a sparse matrix of size 100Mx1M with around 10M non-zeros values. The example below is just to see my point.
Let's say I have:
A vector v of size (2)
A vector c of size (3)
A sparse matrix X of size (2,3)
import numpy as np
import scipy.sparse
from scipy.sparse import coo_matrix
v = np.asarray([10, 20])
c = np.asarray([ 2, 3, 4])
data = np.array([1, 1, 1, 1])
row = np.array([0, 0, 1, 1])
col = np.array([1, 2, 0, 2])
X = coo_matrix((data,(row,col)), shape=(2,3))
X.todense()
# matrix([[0, 1, 1],
# [1, 0, 1]])
Currently I'm doing:
result = np.zeros_like(v)
d = scipy.sparse.lil_matrix((v.shape[0], v.shape[0]))
d.setdiag(v)
tmp = d * X
print(tmp.todense())
#matrix([[  0.,  10.,  10.],
#        [ 20.,   0.,  20.]])
# At this point tmp is csr sparse matrix
for i in range(tmp.shape[0]):
    x_i = tmp.getrow(i)
    result += x_i.data * ( c[x_i.indices] - x_i.data)
    # I only want to do the subtraction on non-zero elements
print(result)
# array([-430, -380])
And my problem is the for loop and especially the subtraction.
I would like to find a way to vectorize this operation by subtracting only on the non-zero elements.
Ideally, the subtraction would directly produce the sparse matrix:
matrix([[ 0., -7., -6.],
[ -18., 0., -16.]])
Is there a smart way to do this?
You don't need to loop over the rows to do what you are already doing, and you can use a similar trick to multiply the rows by the first vector. Note that the code below relies on X.indptr, so it assumes X is in CSR format (use X = X.tocsr() if it is not):
import numpy as np
import scipy.sparse as sps
# number of nonzero entries per row of X
nnz_per_row = np.diff(X.indptr)
# multiply every row by the corresponding entry of v
# You could do this in-place as:
# X.data *= np.repeat(v, nnz_per_row)
Y = sps.csr_matrix((X.data * np.repeat(v, nnz_per_row), X.indices, X.indptr),
                   shape=X.shape)
# subtract from the non-zero entries the corresponding column value in c...
Y.data -= np.take(c, Y.indices)
# ...and multiply by -1 to get the value you are after
Y.data *= -1
To see that it works, set up some dummy data
rows, cols = 3, 5
v = np.random.rand(rows)
c = np.random.rand(cols)
X = sps.rand(rows, cols, density=0.5, format='csr')
and, after running the code above, compare against a dense computation:
>>> x = X.toarray()
>>> mask = x == 0
>>> x *= v[:, np.newaxis]
>>> x = c - x
>>> x[mask] = 0
>>> x
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
>>> Y.toarray()
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
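The same check can be run on the small example from the question. The sketch below is self-contained and makes two assumptions that differ from the question's setup: the data are floats, and X is converted to CSR first so that indptr is available. It computes the values as c minus the scaled entries in one step instead of subtracting and negating.

```python
import numpy as np
import scipy.sparse as sps

v = np.array([10.0, 20.0])
c = np.array([2.0, 3.0, 4.0])
data = np.array([1.0, 1.0, 1.0, 1.0])
row = np.array([0, 0, 1, 1])
col = np.array([1, 2, 0, 2])
# COO has no indptr attribute, so convert to CSR first
X = sps.coo_matrix((data, (row, col)), shape=(2, 3)).tocsr()

# Number of stored entries per row, used to repeat v once per stored entry
nnz_per_row = np.diff(X.indptr)
scaled = X.data * np.repeat(v, nnz_per_row)

# c[X.indices] picks the column value of c for every stored entry
Y = sps.csr_matrix((c[X.indices] - scaled, X.indices, X.indptr), shape=X.shape)
print(Y.toarray())
```

The printed array should match the matrix the question asks for, with the subtraction applied only at the stored (non-zero) positions.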
The way you are accumulating your result requires that there are the same number of non-zero entries in every row, which seems a pretty weird thing to do. Are you sure that is what you are after? If that's really what you want you could get that value with something like:
result = np.sum(Y.data.reshape(Y.shape[0], -1), axis=0)
but I have trouble believing that is really what you are after...