Making a matrix square and padding it with desired value in numpy - python

In general, matrices can have arbitrary sizes. My application needs a square matrix, and the dummy (padding) entries should have a specified value. Is there anything built into NumPy for this, or what is the easiest way of doing it?
EDIT:
The matrix X already exists and is not square. I want to pad it with a given dummy value to make it square. All the original values should stay the same.
Thanks a lot

Building upon the answer by LucasB here is a function which will pad an arbitrary matrix M with a given value val so that it becomes square:
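import numpy

def squarify(M, val):
    (a, b) = M.shape
    if a > b:
        # more rows than columns: append columns on the right
        padding = ((0, 0), (0, a - b))
    else:
        # more columns than rows (or already square): append rows at the bottom
        padding = ((0, b - a), (0, 0))
    return numpy.pad(M, padding, mode='constant', constant_values=val)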
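As a quick check of the function above, here is a minimal self-contained sketch that pads a wide and a tall matrix to 3x3. The input matrices and the pad value -1 are made up for illustration:

```python
import numpy as np

def squarify(M, val):
    # Pad columns when there are more rows than columns, otherwise pad rows.
    a, b = M.shape
    if a > b:
        padding = ((0, 0), (0, a - b))
    else:
        padding = ((0, b - a), (0, 0))
    return np.pad(M, padding, mode='constant', constant_values=val)

wide = np.arange(6).reshape(2, 3)   # hypothetical 2x3 input
tall = wide.T                       # hypothetical 3x2 input

wide_sq = squarify(wide, -1)        # gains one row of -1 at the bottom
tall_sq = squarify(tall, -1)        # gains one column of -1 on the right
print(wide_sq)
print(tall_sq)
```

In both cases the original values keep their positions in the top-left corner of the result.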

Since Numpy 1.7, there's the numpy.pad function. Here's an example:
>>> x = np.random.rand(2,3)
>>> np.pad(x, ((0,1), (0,0)), mode='constant', constant_values=42)
array([[ 0.20687158, 0.21241617, 0.91913572],
[ 0.35815412, 0.08503839, 0.51852029],
[ 42. , 42. , 42. ]])

For a 2D numpy array m it’s straightforward to do this by creating a max(m.shape) x max(m.shape) array p filled with the desired padding value (an array of ones multiplied by that value), and then setting the slice of p corresponding to m (i.e. p[0:m.shape[0], 0:m.shape[1]]) equal to m.
This leads to the following function, where the first line handles the possibility that the input has only one dimension (i.e. is a 1-D array rather than a 2-D matrix):
import numpy as np

def pad_to_square(a, pad_value=0):
    m = a.reshape((a.shape[0], -1))
    padded = pad_value * np.ones(2 * [max(m.shape)], dtype=m.dtype)
    padded[0:m.shape[0], 0:m.shape[1]] = m
    return padded
So, for example:
>>> r1 = np.random.rand(3, 5)
>>> r1
array([[ 0.85950957, 0.92468279, 0.93643261, 0.82723889, 0.54501699],
[ 0.05921614, 0.94946809, 0.26500925, 0.02287463, 0.04511802],
[ 0.99647148, 0.6926722 , 0.70148198, 0.39861487, 0.86772468]])
>>> pad_to_square(r1, 3)
array([[ 0.85950957, 0.92468279, 0.93643261, 0.82723889, 0.54501699],
[ 0.05921614, 0.94946809, 0.26500925, 0.02287463, 0.04511802],
[ 0.99647148, 0.6926722 , 0.70148198, 0.39861487, 0.86772468],
[ 3. , 3. , 3. , 3. , 3. ],
[ 3. , 3. , 3. , 3. , 3. ]])
or
>>> r2=np.random.rand(4)
>>> r2
array([ 0.10307689, 0.83912888, 0.13105124, 0.09897586])
>>> pad_to_square(r2, 0)
array([[ 0.10307689, 0. , 0. , 0. ],
[ 0.83912888, 0. , 0. , 0. ],
[ 0.13105124, 0. , 0. , 0. ],
[ 0.09897586, 0. , 0. , 0. ]])
etc.
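The same idea can be written with np.full, which allocates the square array already filled with the padding value instead of multiplying an array of ones. This is only a sketch of an alternative; the function name pad_to_square_full and the example input are made up:

```python
import numpy as np

def pad_to_square_full(a, pad_value=0):
    # Hypothetical variant of pad_to_square: treat 1-D input as a column.
    m = a.reshape((a.shape[0], -1))
    n = max(m.shape)
    # np.full creates the n x n array already filled with pad_value.
    padded = np.full((n, n), pad_value, dtype=m.dtype)
    # Copy the original values into the top-left corner.
    padded[:m.shape[0], :m.shape[1]] = m
    return padded

example = np.arange(6, dtype=float).reshape(2, 3)  # made-up input
result = pad_to_square_full(example, 3)
print(result)
```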

Related

Pythonic way for double for loop

I have the following code:
import numpy as np
epsilon = np.array([[0. , 0.00172667, 0.00071437, 0.00091779, 0.00154501],
[0.00128983, 0. , 0.00028139, 0.00215905, 0.00094862],
[0.00035811, 0.00018714, 0. , 0.00029365, 0.00036993],
[0.00035631, 0.00112175, 0.00022906, 0. , 0.00291149],
[0.00021527, 0.00017653, 0.00010341, 0.00104458, 0. ]])
Sii = np.array([19998169., 14998140., 9997923., 7798321., 2797958.])
n = len(Sii)
epsilonijSjj = np.zeros((n,n))
for i in range(n):
    for j in range(n):
        epsilonijSjj[i,j] = epsilon[i][j]*Sii[j]
print (epsilonijSjj)
How can I avoid the double for loop and write the code in a fast Pythonic way?
Thank you in advance
NumPy allows you to multiply two arrays directly.
So rather than defining a zero-filled array and populating it with the scaled elements of the other array, you can simply create a copy of the other array and apply the multiplication in place, like so:
import numpy as np
epsilon = np.array([[0. , 0.00172667, 0.00071437, 0.00091779, 0.00154501],
[0.00128983, 0. , 0.00028139, 0.00215905, 0.00094862],
[0.00035811, 0.00018714, 0. , 0.00029365, 0.00036993],
[0.00035631, 0.00112175, 0.00022906, 0. , 0.00291149],
[0.00021527, 0.00017653, 0.00010341, 0.00104458, 0. ]])
Sii = np.array([19998169., 14998140., 9997923., 7798321., 2797958.])
epsilonijSjj = epsilon.copy()
epsilonijSjj *= Sii
print(epsilonijSjj)
Output:
[[ 0. 25896.8383938 7142.21625351 7157.22103059
4322.87308958]
[25794.23832127 0. 2813.31555297 16836.96495505
2654.19891796]
[ 7161.54430059 2806.7519196 0. 2289.97696165
1035.04860294]
[ 7125.54759639 16824.163545 2290.12424238 0.
8146.22673742]
[ 4305.00584063 2647.6216542 1033.88521743 8145.97015018
0. ]]
Or just do this, which is simpler and avoids the explicit copy step, since the multiplication itself allocates the result array:
import numpy as np
epsilon = np.array([[0. , 0.00172667, 0.00071437, 0.00091779, 0.00154501],
[0.00128983, 0. , 0.00028139, 0.00215905, 0.00094862],
[0.00035811, 0.00018714, 0. , 0.00029365, 0.00036993],
[0.00035631, 0.00112175, 0.00022906, 0. , 0.00291149],
[0.00021527, 0.00017653, 0.00010341, 0.00104458, 0. ]])
Sii = np.array([19998169., 14998140., 9997923., 7798321., 2797958.])
epsilonijSjj = epsilon * Sii
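This works because of broadcasting: the 1-D array Sii of length n is matched against the last axis of the n x n array epsilon, so column j is scaled by Sii[j], exactly as in the double loop. A small sketch with made-up 3x3 values (not the data from the question) that compares the two approaches:

```python
import numpy as np

eps = np.array([[0.0, 0.1, 0.2],
                [0.3, 0.0, 0.4],
                [0.5, 0.6, 0.0]])      # made-up values
s = np.array([10.0, 20.0, 30.0])       # made-up values

# Reference result from the explicit double loop.
n = len(s)
looped = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        looped[i, j] = eps[i, j] * s[j]

# Broadcasting: s is applied along the last axis, i.e. per column.
broadcast = eps * s

print(np.allclose(looped, broadcast))  # True
```

If the scaling were meant per row instead (multiplying row i by s[i]), the vector would need an extra axis, as in eps * s[:, np.newaxis].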

Create Jordan matrix from eigenvalues using NumPy

I have an ndarray of eigenvalues and their multiplicities (for instance, np.array([(2.2, 2), (3, 3), (5, 1)])). I need to compute the Jordan matrix for these eigenvalues without using Python loops and iterables (list comprehensions, for loops etc.), only NumPy functions.
I decided to build the matrix in these steps:
Create the Jordan blocks using np.vectorize and np.eye with np.fill_diagonal.
Combine the blocks into one matrix using hstack and vstack.
But I've got two problems:
Here's a snippet of my block-creating code:
def eye(t):
    eye = np.eye(t[1].astype(int),k=1)
    return eye

def jordan_matrix(X: np.ndarray) -> np.ndarray:
    dim = np.sum(X[:,1].astype(int))
    eyes = np.vectorize(eye, signature='(x)->(n,m)')(X)
    return eyes
And I'm getting the error ValueError: could not broadcast input array from shape (3,3) into shape (2,2)
I need to create extra zero matrices to fill the space that is not used by the created blocks, but their sizes vary and I can't figure out how to create them without using Python's for and its equivalents.
Am I on the right track? How can I solve these problems?
np.vectorize would basically loop under the hood. We could use NumPy functions for actual vectorization at the Python level. Here's one such way -
def blockwise_jordan(a):
    r = a[:,1].astype(int)
    v = np.repeat(a[:,0],r)
    out = np.diag(v)
    n = out.shape[1]
    fillvals = np.ones(n, dtype=out.dtype)
    fillvals[r[:-1].cumsum()-1] = 0
    out.flat[1::out.shape[1]+1] = fillvals
    return out
Sample run -
In [52]: X = np.array([(2.2, 2), (3, 3), (5, 1)])
In [53]: blockwise_jordan(X)
Out[53]:
array([[2.2, 1. , 0. , 0. , 0. , 0. ],
[0. , 2.2, 0. , 0. , 0. , 0. ],
[0. , 0. , 3. , 1. , 0. , 0. ],
[0. , 0. , 0. , 3. , 1. , 0. ],
[0. , 0. , 0. , 0. , 3. , 0. ],
[0. , 0. , 0. , 0. , 0. , 5. ]])
Optimization #1
We can replace the final three steps to perform the conditional assignment of 1s and 0s, like so -
out.flat[1::n+1] = 1
c = r[:-1].cumsum()-1
out[c,c+1] = 0
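Put together, the optimized variant might look like the sketch below. The function name blockwise_jordan_v2 is made up here; the input is the sample array from the question:

```python
import numpy as np

def blockwise_jordan_v2(a):
    # Hypothetical name; columns of `a` are (eigenvalue, multiplicity).
    r = a[:, 1].astype(int)
    # Main diagonal: each eigenvalue repeated by its multiplicity.
    out = np.diag(np.repeat(a[:, 0], r))
    n = out.shape[1]
    # Set the whole superdiagonal to 1 ...
    out.flat[1::n + 1] = 1
    # ... then zero the entries that would link two different blocks.
    c = r[:-1].cumsum() - 1
    out[c, c + 1] = 0
    return out

X = np.array([(2.2, 2), (3, 3), (5, 1)])
J = blockwise_jordan_v2(X)
print(J)
```

For this input the result should match the sample run above: ones on the superdiagonal inside each block and zeros where one block ends and the next begins.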
Here's my solution:
def jordan(a):
    e = a[:,0] # eigenvalues
    m = a[:,1].astype('int') # multiplicities
    d = np.repeat(e, m) # main diagonal
    ones = np.ones(d.size - 1)
    ones[np.cumsum(m)[:-1] -1] = 0
    j = np.diag(d) + np.diag(ones, k=1)
    return j
Edit: just realized that my solution is almost the same as Divakar's.

How to vectorize a 'for' loop which calls a function (that takes a 2-Dimensional array as argument) over a 3-Dimensional numpy array

I have a numpy array containing the XYZ coordinates of the k-neighbor (k=10) points from a point cloud:
k_neighboors
Out[53]:
array([[[ 2.51508147e-01, 5.60274944e-02, 1.98303187e+00],
[ 2.48552352e-01, 5.95569573e-02, 1.98319519e+00],
[ 2.56611764e-01, 5.36767729e-02, 1.98236740e+00],
...,
[ 2.54520357e-01, 6.23480231e-02, 1.98255634e+00],
[ 2.57603496e-01, 5.19787706e-02, 1.98221457e+00],
[ 2.43914440e-01, 5.68424985e-02, 1.98352253e+00]],
[[ 9.72352773e-02, 2.06699912e-02, 1.99344850e+00],
[ 9.91205871e-02, 2.36056261e-02, 1.99329960e+00],
[ 9.59625840e-02, 1.71508361e-02, 1.99356234e+00],
...,
[ 1.03216261e-01, 2.19752081e-02, 1.99304521e+00],
[ 9.65025574e-02, 1.44127617e-02, 1.99355054e+00],
[ 9.59930867e-02, 2.72080526e-02, 1.99344873e+00]],
[[ 1.76408485e-01, 2.81930678e-02, 1.98819435e+00],
[ 1.78670138e-01, 2.81904750e-02, 1.98804617e+00],
[ 1.80372953e-01, 3.05109434e-02, 1.98791444e+00],
...,
[ 1.81960404e-01, 2.47725621e-02, 1.98785996e+00],
[ 1.74499243e-01, 3.50728296e-02, 1.98826015e+00],
[ 1.83470801e-01, 2.70808022e-02, 1.98774099e+00]],
...,
[[ 1.78178743e-01, -4.60980982e-02, -1.98792374e+00],
[ 1.77953839e-01, -4.73701134e-02, -1.98792756e+00],
[ 1.77889392e-01, -4.75468598e-02, -1.98793030e+00],
...,
[ 1.79924294e-01, -5.08776568e-02, -1.98772371e+00],
[ 1.76720902e-01, -5.11409082e-02, -1.98791265e+00],
[ 1.83644593e-01, -4.64747548e-02, -1.98756230e+00]],
[[ 2.00245917e-01, -2.33091787e-03, -1.98685515e+00],
[ 2.02384919e-01, -5.60011715e-04, -1.98673022e+00],
[ 1.97325528e-01, -1.03301927e-03, -1.98705769e+00],
...,
[ 1.95464164e-01, -6.23105839e-03, -1.98713481e+00],
[ 1.98985338e-01, -8.39920342e-03, -1.98688531e+00],
[ 1.95959195e-01, 2.68006674e-03, -1.98713303e+00]],
[[ 1.28851235e-01, -3.24527062e-02, -1.99127460e+00],
[ 1.26415789e-01, -3.27731185e-02, -1.99143147e+00],
[ 1.25985757e-01, -3.24910432e-02, -1.99146211e+00],
...,
[ 1.28296465e-01, -3.92388329e-02, -1.99117136e+00],
[ 1.34895295e-01, -3.64872888e-02, -1.99083793e+00],
[ 1.29047096e-01, -3.97952795e-02, -1.99111152e+00]]])
With this shape:
k_neighboors.shape
Out[54]: (2999986, 10, 3)
And I have this function which applies a Principal Component Analysis to some data provided as 2-Dimensional array:
def PCA(data, correlation=False, sort=True):
    """ Applies Principal Component Analysis to the data
    Parameters
    ----------
    data: array
        The array containing the data. The array must have NxM dimensions, where each
        of the N rows represents a different individual record and each of the M columns
        represents a different variable recorded for that individual record.
            array([
            [V11, ... , V1m],
            ...,
            [Vn1, ... , Vnm]])
    correlation(Optional) : bool
        Set the type of matrix to be computed (see Notes):
            If True compute the correlation matrix.
            If False(Default) compute the covariance matrix.
    sort(Optional) : bool
        Set the order that the eigenvalues/vectors will have
            If True(Default) they will be sorted (from higher value to less).
            If False they won't.
    Returns
    -------
    eigenvalues: (1,M) array
        The eigenvalues of the corresponding matrix.
    eigenvector: (M,M) array
        The eigenvectors of the corresponding matrix.
    Notes
    -----
    The correlation matrix is a better choice when there are different magnitudes
    representing the M variables. Use covariance matrix in any other case.
    """
    #: get the mean of all variables
    mean = np.mean(data, axis=0, dtype=np.float64)
    #: adjust the data by subtracting the mean from each variable
    data_adjust = data - mean
    #: compute the covariance/correlation matrix
    #: the data is transposed due to np.cov/corrcoef syntax
    if correlation:
        matrix = np.corrcoef(data_adjust.T)
    else:
        matrix = np.cov(data_adjust.T)
    #: get the eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eig(matrix)
    if sort:
        #: sort eigenvalues and eigenvectors
        sort = eigenvalues.argsort()[::-1]
        eigenvalues = eigenvalues[sort]
        eigenvectors = eigenvectors[:,sort]
    return eigenvalues, eigenvectors
So the question is: how can I apply the PCA function above to each of the 2999986 10x3 arrays in a way that doesn't take forever, unlike this one:
data = np.empty((2999986, 3))
for i in range(len(k_neighboors)):
    w, v = PCA(k_neighboors[i])
    data[i] = v[:,2]
    break #: I break the loop so that I don't have to wait forever.
data
Out[64]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]])
Thanks to @Divakar's and @Eelco's comments.
Using the function that Divakar posted in this answer:
def vectorized_app(data):
    diffs = data - data.mean(1,keepdims=True)
    return np.einsum('ijk,ijl->ikl',diffs,diffs)/data.shape[1]
And using what Eelco pointed out in his comment, I ended up with this:
k_neighboors.shape
Out[48]: (2999986, 10, 3)
#: THE (ASSUMED)VECTORIZED ANSWER
data = np.linalg.eig(vectorized_app(k_neighboors))[1][:,:,2]
data
Out[50]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0.06462 , 0.00944352, 0.99786526],
[ 0.0654035 , 0.00860751, 0.99782177],
...,
[-0.0632175 , 0.01613551, 0.99786933],
[-0.06449399, 0.00552943, 0.99790278],
[-0.06081954, 0.01802078, 0.99798609]])
Which gives the same results as the for loop, without taking forever (although it still takes a while):
data2 = np.empty((2999986, 3))
for i in range(len(k_neighboors)):
    if i > 10:
        break #: I break the loop so that I don't have to wait forever.
    w, v = PCA(k_neighboors[i])
    data2[i] = v[:,2]
data2
Out[52]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0.06462 , 0.00944352, 0.99786526],
[ 0.0654035 , 0.00860751, 0.99782177],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]])
I don't know if there could be a better way to do this, so I'm going to keep the question open.
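One possible refinement, offered as a sketch rather than as part of the original answer: covariance matrices are symmetric, so np.linalg.eigh can be applied to the whole stack. Unlike np.linalg.eig, it returns the eigenvalues in ascending order, so the eigenvector of the smallest eigenvalue is always column 0 and does not depend on the order in which eig happens to return them. The random point cloud and the function name smallest_eigvecs are made up for illustration:

```python
import numpy as np

def smallest_eigvecs(neighbors):
    # neighbors has shape (n_points, k, 3).
    diffs = neighbors - neighbors.mean(axis=1, keepdims=True)
    # Stack of 3x3 covariance matrices (divided by k; the scale does not
    # change the eigenvectors).
    cov = np.einsum('ijk,ijl->ikl', diffs, diffs) / neighbors.shape[1]
    # eigh accepts stacked symmetric matrices; eigenvalues come out ascending.
    _, vecs = np.linalg.eigh(cov)
    # Column 0 belongs to the smallest eigenvalue of each matrix.
    return vecs[:, :, 0]

rng = np.random.default_rng(0)
cloud = rng.random((100, 10, 3))      # hypothetical stand-in for k_neighboors
normals = smallest_eigvecs(cloud)
print(normals.shape)                  # (100, 3)
```

Eigenvectors are only defined up to sign, so individual rows may be flipped relative to what the loop over PCA returns.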

Raise diagonal matrix to the negative power 1/2

I am trying to compute the matrix given by the following equation.
S = D^(-1/2) * W * D^(-1/2)
where D is a diagonal matrix of this form:
array([[ 0.59484625, 0. , 0. , 0. ],
[ 0. , 0.58563893, 0. , 0. ],
[ 0. , 0. , 0.58280472, 0. ],
[ 0. , 0. , 0. , 0.58216725]])
and W:
array([[ 0. , 0.92311635, 0.94700586, 0.95599748],
[ 0.92311635, 0. , 0.997553 , 0.99501248],
[ 0.94700586, 0.997553 , 0. , 0.9995501 ],
[ 0.95599748, 0.99501248, 0.9995501 , 0. ]])
I tried to compute D^(-1/2) using the NumPy functions linalg.matrix_power(D,-1/2) and numpy.power(D,-1/2). The matrix_power function raises TypeError: exponent must be an integer, and the numpy.power function raises RuntimeWarning: divide by zero encountered in power.
How can I compute the negative power -1/2 of a diagonal matrix? Please help.
If you can update D (like in your own answer) then simply update the items at its diagonal indices and then call np.dot:
>>> D[np.diag_indices(4)] = 1/ (D.diagonal()**0.5)
>>> np.dot(D, W).dot(D)
array([[ 0. , 0.32158153, 0.32830723, 0.33106193],
[ 0.32158153, 0. , 0.34047794, 0.33923936],
[ 0.32830723, 0.34047794, 0. , 0.33913717],
[ 0.33106193, 0.33923936, 0.33913717, 0. ]])
Or create a new zeros array and then fill its diagonal elements with 1/ (D.diagonal()**0.5):
>>> arr = np.zeros(D.shape)
>>> np.fill_diagonal(arr, 1/ (D.diagonal()**0.5))
>>> np.dot(arr, W).dot(arr)
array([[ 0. , 0.32158153, 0.32830723, 0.33106193],
[ 0.32158153, 0. , 0.34047794, 0.33923936],
[ 0.32830723, 0.34047794, 0. , 0.33913717],
[ 0.33106193, 0.33923936, 0.33913717, 0. ]])
I got the answer by working through the math, but would love to see any straightforward one-liners :)
import math

def compute_diagonal_to_negative_power():
    for i in range(4):
        for j in range(4):
            if i == j:
                element = D[i][j]
                numerator = 1
                denominator = math.sqrt(element)
                D[i][j] = numerator / denominator
    return D
diagonal_matrix = compute_diagonal_to_negative_power()
S = np.dot(diagonal_matrix, W).dot(diagonal_matrix)
print(S)
"""
[[ 0. 0.32158153 0.32830723 0.33106193]
[ 0.32158153 0. 0.34047794 0.33923936]
[ 0.32830723 0.34047794 0. 0.33913718]
[ 0.33106193 0.33923936 0.33913718 0. ]]
"""
Source: https://math.stackexchange.com/questions/340321/raising-a-square-matrix-to-a-negative-half-power
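You can do the following:
numpy.power(D, -1/2, where=(D != 0), out=numpy.zeros_like(D))
And then you will avoid getting the warning:
RuntimeWarning: divide by zero encountered in power
With where=(D != 0), numpy raises only the nonzero elements to the power -1/2 (that is, it computes 1/sqrt of each of them), so it never divides by zero. Passing out=numpy.zeros_like(D) keeps the skipped (zero) positions at 0; without an out array those positions would be left uninitialized.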
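Because D is diagonal, the product D^(-1/2) W D^(-1/2) can also be computed without building any diagonal matrix: entry (i, j) of W is simply scaled by 1/sqrt(d_i * d_j), where d is the diagonal of D. A minimal sketch with made-up 3x3 values (not the D and W from the question):

```python
import numpy as np

W = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])        # made-up symmetric matrix
D = np.diag([4.0, 9.0, 16.0])          # made-up diagonal matrix

# Work on the diagonal only, so the off-diagonal zeros are never touched.
inv_sqrt = 1.0 / np.sqrt(np.diag(D))

# Scale row i by inv_sqrt[i] and column j by inv_sqrt[j] via broadcasting.
S = inv_sqrt[:, np.newaxis] * W * inv_sqrt[np.newaxis, :]

# Reference: the explicit matrix product.
S_ref = np.diag(inv_sqrt) @ W @ np.diag(inv_sqrt)
print(np.allclose(S, S_ref))           # True
```

This assumes every diagonal entry of D is strictly positive; a zero on the diagonal would still produce a division by zero.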

compute a xi-xj matrix in numpy without loops (by api calls)

How can I compute an xi-xj matrix in numpy without loops (by API calls)?
Here's what to start with:
import numpy as np
x = np.random.rand(4)
xij = np.matrix([xi-xj for xj in x for xi in x]).reshape(4,4)
You can take advantage of broadcasting to subtract x as a column vector from x as a flat array and produce the matrix.
>>> x = np.random.rand(4)
Then:
>>> x - x[:,np.newaxis]
array([[ 0. , 0.89175647, 0.80930233, 0.37955823],
[-0.89175647, 0. , -0.08245415, -0.51219825],
[-0.80930233, 0.08245415, 0. , -0.4297441 ],
[-0.37955823, 0.51219825, 0.4297441 , 0. ]])
If you want a matrix object (and not the default array object) you could write:
np.matrix(x - x[:,np.newaxis])
By reshaping the array into a column, you can use the minus operator to calculate what you want:
import numpy as np
x = np.random.rand(4)
x = x.reshape(-1,1)
xij = np.matrix(x.T - x)
Another alternative is to use np.subtract.outer:
In [35]: x = np.random.rand(4)
In [36]: np.matrix([xi-xj for xj in x for xi in x]).reshape(4,4)
Out[36]:
matrix([[ 0. , 0.45365177, 0.07227472, -0.05824887],
[-0.45365177, 0. , -0.38137705, -0.51190064],
[-0.07227472, 0.38137705, 0. , -0.13052359],
[ 0.05824887, 0.51190064, 0.13052359, 0. ]])
In [37]: -np.subtract.outer(x, x)
Out[37]:
array([[-0. , 0.45365177, 0.07227472, -0.05824887],
[-0.45365177, -0. , -0.38137705, -0.51190064],
[-0.07227472, 0.38137705, -0. , -0.13052359],
[ 0.05824887, 0.51190064, 0.13052359, -0. ]])
(Note that the result is a numpy array, not a matrix.)
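The approaches in this thread all follow the same convention: entry [i, j] of the result is x[j] - x[i]. A short sketch with made-up values that checks they agree:

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])          # made-up values

# Explicit construction: row index picks xj, column index picks xi.
by_loop = np.array([[xi - xj for xi in x] for xj in x])

# Broadcasting a row vector against a column vector.
by_broadcast = x - x[:, np.newaxis]

# Outer subtraction gives x[i] - x[j], hence the sign flip.
by_outer = -np.subtract.outer(x, x)

print(np.array_equal(by_loop, by_broadcast))   # True
print(np.array_equal(by_loop, by_outer))       # True
```

Dropping the minus sign (or writing x[:, np.newaxis] - x) gives the transposed convention, x[i] - x[j].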
