Elementwise division, disregarding zeros - python

This is a Python question. Say I have an (m+1)-dimensional NumPy array a of non-negative numbers, and I would like to obtain an array b of the same shape in which the entries along the last axis are normalized so that they sum to 1, or stay zero if all of them were zero. For example, if m = 2, my code would be as follows:
import numpy as np
a = np.array([[[0.34, 0.66],
               [0.75, 0.25]],
              [[0.  , 0.  ],
               [1.  , 0.  ]]])
for i1 in range(a.shape[0]):
    for i2 in range(a.shape[1]):
        s = a[i1][i2].sum()
        if s > 0:
            a[i1][i2] = a[i1][i2] / s
However, I find this method sloppy. Also, it works only for a fixed m.

This can be done by broadcasting. There are several ways to take into account the zero-sum exception. Without taking it into account, you could write
import numpy as np
shape = (2, 3, 4)
X = np.random.randn(*shape) ** 2
sums = X.sum(-1)
Y = X / sums[..., np.newaxis]
Now, in order to take into account potential zero-sum-ness of some lines, we set one line of the data to 0:
X[0, 0, :] = 0
sums = X.sum(-1)
nnz = sums != 0
Y = np.zeros_like(X)
Y[nnz, :] = X[nnz, :] / sums[nnz, np.newaxis]
You will observe that Y.sum(axis=-1) has the entry 0 in coordinate (0,0) reflecting the zero-ness of the corresponding line.
EDIT: Application to the concrete example
X = np.array([[[0.34, 0.66],
               [0.75, 0.25]],
              [[0.  , 0.  ],
               [1.  , 0.  ]]])
sums = X.sum(-1)
nnz = sums != 0
Y = np.zeros_like(X)
Y[nnz, :] = X[nnz, :] / sums[nnz, np.newaxis]
This yields Y == X, because along the last axis every sum is already one or zero.
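As an alternative to the boolean-mask indexing above, the same normalization could be sketched with np.divide and its out= and where= arguments, which leave zero-sum rows at the initial value of 0. The function name normalize_last_axis is made up for this illustration, and the input is assumed to be non-negative as in the question.

```python
import numpy as np

def normalize_last_axis(a):
    """Scale entries along the last axis to sum to 1; all-zero rows stay zero."""
    a = np.asarray(a, dtype=float)
    sums = a.sum(axis=-1, keepdims=True)   # shape (..., 1), broadcasts against a
    # Divide only where the sum is nonzero; elsewhere keep the zeros from `out`.
    return np.divide(a, sums, out=np.zeros_like(a), where=sums != 0)

a = np.array([[[1.0, 3.0],
               [2.0, 2.0]],
              [[0.0, 0.0],
               [5.0, 0.0]]])
b = normalize_last_axis(a)
print(b)
```

Because the sum is taken with keepdims=True, this works for any number of leading dimensions, not just m = 2.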

Related

How to keep a matrix unchanged

I am trying to calculate the inverse matrix using the Gauss-Jordan Method. For that, I need to find the solution X to A.X = I (A and X being N x N matrices, and I the identity matrix).
However, for every column vector of the solution matrix X that I calculate in the first loop, I have to use the original matrix A, and I don't know why it keeps changing even though I made a copy of it at the beginning.
import numpy as np

def SolveGaussJordanInvMatrix(A):
    N = len(A[:,0])
    I = np.identity(N)
    X = np.zeros([N,N], float)
    A_orig = A.copy()
    for m in range(N):
        x = np.zeros(N, float)
        v = I[:,m]
        A = A_orig
        for p in range(N): # Gauss-Jordan Elimination
            A[p,:] /= A[p,p]
            v[p] /= A[p,p]
            for i in range(p): # Cancel elements above the diagonal element
                v[i] -= v[p] * A[i,p]
                A[i,p:] -= A[p,p:]*A[i,p]
            for i in range(p+1, N): # Cancel elements below the diagonal element
                v[i] -= v[p] * A[i,p]
                A[i,p:] -= A[p,p:]*A[i,p]
        X[:,m] = v # Add column vector to the solution matrix
    return X

A = np.array([[2, 1, 4, 1 ],
              [3, 4, -1, -1],
              [1, -4, 7, 5],
              [2, -2, 1, 3]], float)
SolveGaussJordanInvMatrix(A)
Does anyone know how to turn A back to its original form after the Gauss-Jordan elimination loop?
I'm getting
array([[ 228.1, 0. , 0. , 0. ],
[-219.9, 1. , 0. , 0. ],
[ -14.5, 0. , 1. , 0. ],
[-176.3, 0. , 0. , 1. ]])
and expect
[[ 1.36842105 -0.89473684 -1.05263158 1. ]
[-1.42105263 1.23684211 1.13157895 -1. ]
[ 0.42105263 -0.23684211 -0.13157895 -0. ]
[-2. 1.5 1.5 -1. ]]
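A likely cause, as far as can be told from the code above: A = A_orig does not copy anything. It only makes A a second name for the same array, so the in-place operations /= and -= overwrite A_orig as well. Similarly, v = I[:,m] is a view into I, and v[p] /= A[p,p] runs after the row has already been normalized, so it divides by 1. The sketch below is one possible rewrite, not the asker's original method: it works on a fresh copy, stores the pivot before dividing, and reduces all columns of the identity at once. The name gauss_jordan_inverse is hypothetical, and no pivoting is done, so it assumes every diagonal pivot is nonzero.

```python
import numpy as np

def gauss_jordan_inverse(A):
    A = np.array(A, dtype=float)      # np.array copies, so the caller's matrix is untouched
    N = A.shape[0]
    X = np.identity(N)                # right-hand side; ends up holding the inverse
    for p in range(N):
        pivot = A[p, p]               # save the pivot before the row is rescaled
        A[p, :] /= pivot
        X[p, :] /= pivot
        for i in range(N):
            if i != p:
                factor = A[i, p]      # save the factor before A[i, p] is zeroed
                A[i, :] -= factor * A[p, :]
                X[i, :] -= factor * X[p, :]
    return X

A = np.array([[2, 1, 4, 1],
              [3, 4, -1, -1],
              [1, -4, 7, 5],
              [2, -2, 1, 3]], float)
A_before = A.copy()
A_inv = gauss_jordan_inverse(A)
print(A_inv)
```

If the per-column structure of the original function is to be kept, replacing A = A_orig with A = A_orig.copy() and v = I[:,m] with v = I[:,m].copy() addresses the aliasing part of the problem.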

How to call the number in a matrix

Here is my matrix setup:
for a in b:
    size_x = len(a) + 1
    size_y = len(b) + 1
    matrix = np.zeros((size_x, size_y))
    for x in range(size_x):
        matrix[x, 0] = x
    for y in range(size_y):
        matrix[0, y] = y
    for x in range(1, size_x):
        for y in range(1, size_y):
            if a[x-1] == b[y-1]:
                matrix[x, y] = min(
                    matrix[x-1, y] + 1,
                    matrix[x-1, y-1],
                    matrix[x, y-1] + 1
                )
            else:
                matrix[x, y] = min(
                    matrix[x-1, y] + 1,
                    matrix[x-1, y-1] + 1,
                    matrix[x, y-1] + 1
                )
    print(matrix)
This would give outputs such as
t e s t
[[ 0. 1. 2. 3. 4.]
t [ 1. 0. 1. 2. 3.]
e [ 2. 1. 0. 1. 2.]
x [ 3. 2. 1. 1. 2.]
t [ 4. 3. 2. 1. 1.]]
In which the bottom right-hand corner is the final value. How do I take this out and add it to a list?
You can access the element at index i of an array arr with the expression arr[i].
To answer your question: to access the bottom-right value of a 2D matrix, simply use
matrix[len(matrix)-1][len(matrix[0])-1]
or better, with negative indices,
bottomRightValue = matrix[-1, -1]
You can then add it to a list with the list's append method.
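To make this concrete, here is a minimal sketch that wraps the matrix setup from the question in a function, returns the bottom-right entry, and collects the results in a list. The names edit_distance, words and distances are hypothetical, and the comparison word "test" is taken from the example output above.

```python
import numpy as np

def edit_distance(a, b):
    size_x, size_y = len(a) + 1, len(b) + 1
    matrix = np.zeros((size_x, size_y))
    matrix[:, 0] = np.arange(size_x)   # first column: 0, 1, 2, ...
    matrix[0, :] = np.arange(size_y)   # first row: 0, 1, 2, ...
    for x in range(1, size_x):
        for y in range(1, size_y):
            cost = 0 if a[x-1] == b[y-1] else 1
            matrix[x, y] = min(matrix[x-1, y] + 1,
                               matrix[x-1, y-1] + cost,
                               matrix[x, y-1] + 1)
    return matrix[-1, -1]              # bottom right-hand corner

words = ["text", "tent", "test"]
distances = []
for word in words:
    distances.append(edit_distance(word, "test"))
print(distances)
```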

Fastest way to compute angle between 2D vectors

I'm looking for efficient alternative ways to compute the angle between 2D vectors from its cosine. Your insights on this problem would be of much help.
Problem Statement:
vectors is a 2D array where vectors are stored. The shape of the vectors array is (N, 2) where N is the number of vectors. vectors[:, 0] has x-component and vectors[:, 1] has y-component.
I have to find the angle between all the vectors in vectors. For example, if there are three vectors A, B, C in vectors, I need to find the angle between A and B, B and C, and, A and C.
I have implemented it and want to know about alternative ways.
Current Implementation:
vectors = np.array([[1, 3], [2, 4], [3, 5]])
vec_x = vectors[:, 0]
vec_y = vectors[:, 1]
a1 = np.ones([vec_x.shape[0], vec_x.shape[0]]) * vec_x
a2 = np.ones([vec_x.shape[0], vec_x.shape[0]]) * vec_y
a1b1 = a1 * a1.T
a2b2 = a2 * a2.T
mask = np.triu_indices(a1b1.shape[0], 0) # We are interested in lower triangular matrix
a1b1[mask] = 0
a2b2[mask] = 0
numer = a1b1 + a2b2
denom = np.ones([vec_x.shape[0], vec_x.shape[0]]) * np.sqrt(np.square(a1) + np.square(a2))
denom = denom * denom.T
denom[mask] = 0
eps = 1e-7
dot_res = np.rad2deg(np.arccos(np.divide(numer, denom + eps)))
dot_res[mask] = 0
print(dot_res)
Output:
[[ 0. 0. 0. ]
[ 8.13010519 0. 0. ]
[12.52880911 4.39870821 0. ]]
Questions:
Is there a more efficient alternative way to do this?
Can we improve the speed of the current version in some way?
Use scipy.spatial.distance.pdist:
import numpy as np
import scipy.spatial.distance
vectors = np.array([[1, 3], [2, 4], [3, 5]])
# Compute cosine distance
dist = scipy.spatial.distance.pdist(vectors, 'cosine')
# Compute angles
angle = np.rad2deg(np.arccos(1 - dist))
# Make it into a matrix
angle_matrix = scipy.spatial.distance.squareform(angle)
print(angle_matrix)
# [[ 0. 8.13010235 12.52880771]
# [ 8.13010235 0. 4.39870535]
# [12.52880771 4.39870535 0. ]]
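If SciPy is not available, roughly the same result can be sketched in plain NumPy by normalizing the vectors and taking a single matrix product. This is an alternative to the pdist answer above, not a claim about which is faster, and it assumes that no vector has zero length.

```python
import numpy as np

vectors = np.array([[1, 3], [2, 4], [3, 5]], dtype=float)
# Scale every vector to unit length so the dot product equals the cosine.
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
# Clip to [-1, 1] to guard arccos against rounding slightly outside its domain.
cosines = np.clip(unit @ unit.T, -1.0, 1.0)
angle_matrix = np.rad2deg(np.arccos(cosines))
np.fill_diagonal(angle_matrix, 0.0)    # angle of a vector with itself
print(angle_matrix)
```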

Transpose `matrix[:,mask]` to `matrix[mask,:]` by using only number `0` and `1`?

I want to write a single function that calculates the true mean (not counting zero elements when averaging) of each row or column of a matrix. I try to control whether the calculation is by row or by column with an axis parameter of 1 or 0, respectively.
This is the function for by-column calculation
def true_mean(matrix, axis):
    countnonzero = (matrix!=0).sum(axis)
    mask = countnonzero!=0
    output_mat = np.zeros(matrix.T.shape[axis])
    output_mat[mask] = matrix[:,mask].sum(axis)/countnonzero[mask] # line4
    return output_mat
Test the function
eachPSM = np.ones([5,4])
eachPSM[0] = 0
eachPSM[2,2:4] = 5
print(eachPSM)
> [[ 0. 0. 0. 0.]
[ 1. 1. 1. 1.]
[ 1. 1. 5. 5.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
ans = true_mean(eachPSM,0)
print(ans)
> [ 1. 1. 2. 2.]
However, if I want to calculate by row (axis = 1), only line4 has to change to
output_mat[mask] = matrix[mask,:].sum(axis)/countnonzero[mask]
Is there a way to flip matrix[:,mask] to matrix[mask,:] by using only number 0 and 1? So I can have only one function for calculating true mean from row and column.
You can use the fact that the [] operator accepts a tuple as its argument:
indexer = [slice(None), slice(None)]
indexer[axis] = mask
print(x[tuple(indexer)])
slice(None) is equivalent to :, so we build a list that selects the full matrix [:, :], replace the entry for the desired axis with the mask, and convert the list to a tuple before indexing (recent NumPy versions no longer accept a list of slices as an index).
Complete example:
import numpy as np
x = np.arange(9).reshape(3, 3)
mask = np.array([True, False, True])
for axis in [0, 1]:
    indexer = [slice(None)] * x.ndim
    indexer[axis] = mask
    print(x[tuple(indexer)])
prints
[[0 1 2]
[6 7 8]]
and
[[0 2]
[3 5]
[6 8]]
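Applied to the function from the question, the indexer approach could look like the sketch below. One detail is an assumption on my part: the mask has to index the axis that survives the reduction, which for a 2D matrix is 1 - axis rather than axis itself. The test data is the eachPSM array from the question.

```python
import numpy as np

def true_mean(matrix, axis):
    countnonzero = (matrix != 0).sum(axis)
    mask = countnonzero != 0
    output = np.zeros(countnonzero.shape)
    indexer = [slice(None)] * matrix.ndim
    indexer[1 - axis] = mask           # 2D only: mask the axis that is kept
    output[mask] = matrix[tuple(indexer)].sum(axis) / countnonzero[mask]
    return output

eachPSM = np.ones([5, 4])
eachPSM[0] = 0
eachPSM[2, 2:4] = 5
by_column = true_mean(eachPSM, 0)
by_row = true_mean(eachPSM, 1)
print(by_column)
print(by_row)
```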

Scipy Sparse Matrix special subtraction

I'm doing a project and I'm doing a lot of matrix computation in it.
I'm looking for a smart way to speed up my code. In my project, I'm dealing with a sparse matrix of size 100Mx1M with around 10M non-zero values. The example below is just to illustrate my point.
Let's say I have:
A vector v of size (2)
A vector c of size (3)
A sparse matrix X of size (2,3)
import numpy as np
import scipy.sparse
from scipy.sparse import coo_matrix
v = np.asarray([10, 20])
c = np.asarray([ 2, 3, 4])
data = np.array([1, 1, 1, 1])
row = np.array([0, 0, 1, 1])
col = np.array([1, 2, 0, 2])
X = coo_matrix((data,(row,col)), shape=(2,3))
X.todense()
# matrix([[0, 1, 1],
#         [1, 0, 1]])
Currently I'm doing:
result = np.zeros(v.shape[0])  # float, so that += with float data works
d = scipy.sparse.lil_matrix((v.shape[0], v.shape[0]))
d.setdiag(v)
tmp = d * X
print(tmp.todense())
# matrix([[ 0., 10., 10.],
#         [ 20., 0., 20.]])
# At this point tmp is csr sparse matrix
for i in range(tmp.shape[0]):
    x_i = tmp.getrow(i)
    # I only want to do the subtraction on non-zero elements
    result += x_i.data * ( c[x_i.indices] - x_i.data)
print(result)
# array([-430., -380.])
And my problem is the for loop and especially the subtraction.
I would like to find a way to vectorize this operation by subtracting only on the non-zero elements.
Something to get directly the sparse matrix on the subtraction:
matrix([[ 0., -7., -6.],
[ -18., 0., -16.]])
Is there a way to do this smartly ?
You don't need to loop over the rows to do what you are already doing. And you can use a similar trick to perform the multiplication of the rows by the first vector:
import numpy as np
import scipy.sparse as sps
# indptr and indices are CSR attributes, so convert X first if it is in COO format
X = sps.csr_matrix(X)
# number of nonzero entries per row of X
nnz_per_row = np.diff(X.indptr)
# multiply every row by the corresponding entry of v
# You could do this in-place as:
# X.data *= np.repeat(v, nnz_per_row)
Y = sps.csr_matrix((X.data * np.repeat(v, nnz_per_row), X.indices, X.indptr),
                   shape=X.shape)
# subtract from the non-zero entries the corresponding column value in c...
Y.data -= np.take(c, Y.indices)
# ...and multiply by -1 to get the value you are after
Y.data *= -1
To see that it works, set up some dummy data
rows, cols = 3, 5
v = np.random.rand(rows)
c = np.random.rand(cols)
X = sps.rand(rows, cols, density=0.5, format='csr')
and after running the code above, compare against a dense computation:
>>> x = X.toarray()
>>> mask = x == 0
>>> x *= v[:, np.newaxis]
>>> x = c - x
>>> x[mask] = 0
>>> x
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
>>> Y.toarray()
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
The way you are accumulating your result requires that there are the same number of non-zero entries in every row, which seems a pretty weird thing to do. Are you sure that is what you are after? If that's really what you want you could get that value with something like:
result = np.sum(Y.data.reshape(Y.shape[0], -1), axis=0)
but I have trouble believing that is really what you are after...
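For reference, here is the same idea run end to end on the small example from the question, as a self-contained sketch. It computes c minus the scaled entry in one step instead of subtracting and negating, and it assumes X has no duplicate entries; the variable names scaled and diff are made up for this illustration.

```python
import numpy as np
import scipy.sparse as sps

v = np.array([10, 20])
c = np.array([2, 3, 4])
data = np.array([1, 1, 1, 1])
row = np.array([0, 0, 1, 1])
col = np.array([1, 2, 0, 2])
X = sps.coo_matrix((data, (row, col)), shape=(2, 3)).tocsr()

# One entry of v per stored value of X, repeated according to the row lengths.
nnz_per_row = np.diff(X.indptr)
scaled = X.data * np.repeat(v, nnz_per_row)

# c[column] - v[row] * X[row, column], evaluated only on the stored entries.
diff = sps.csr_matrix((c[X.indices] - scaled, X.indices, X.indptr),
                      shape=X.shape)
print(diff.toarray())
```

Entries where c happens to equal the scaled value remain stored as explicit zeros; calling diff.eliminate_zeros() removes them if that matters.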
