Symmetric matrices in numpy? - python

I wish to initialize a symmetric matrix in Python and populate it with zeros.
At the moment, I have initialized an array of known dimensions, but this is unsuitable for subsequent input into R as a distance matrix.
Are there any 'simple' methods in numpy to create a symmetric matrix?
Edit
I should clarify - creating the 'symmetric' matrix is fine. However I am interested in only generating the lower triangular form, ie.,
ar = numpy.zeros((3, 3))
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
I want:
array([[ 0],
[ 0, 0 ],
[ 0., 0., 0.]])
Is this possible?

I don't think it's feasible to work with that kind of triangular array, since NumPy arrays are rectangular.
So here is for example a straightforward implementation of (squared) pairwise Euclidean distances:
import numpy as np

def pdista(X):
    """Squared pairwise distances between all columns of X."""
    B = np.dot(X.T, X)
    q = np.diag(B)[:, None]
    return q + q.T - 2 * B
Performance-wise it's hard to beat (at the Python level). What would be the main advantage of not using this approach?
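If only the lower triangle is needed afterwards (for instance as a flat vector of distances), one option is to compute the full symmetric matrix and then pick out the entries below the diagonal. A minimal sketch, assuming the points are stored as columns of X; the sample points and the use of np.tril_indices are my own illustration, not part of the question:

```python
import numpy as np

def pdista(X):
    """Squared pairwise distances between all columns of X."""
    B = np.dot(X.T, X)
    q = np.diag(B)[:, None]
    return q + q.T - 2 * B

# Hypothetical sample: three 2-D points stored as columns.
X = np.array([[0.0, 3.0, 0.0],
              [0.0, 0.0, 4.0]])
D = pdista(X)  # full symmetric 3x3 matrix

# Indices of the strictly lower triangle (k=-1 excludes the diagonal),
# listed row by row: (1,0), (2,0), (2,1).
rows, cols = np.tril_indices(D.shape[0], k=-1)
lower = D[rows, cols]
print(lower)
```

The result is a flat 1-D array in row-major order; whether that ordering matches what the downstream tool expects is something to check separately.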

Related

How to multiply diagonal elements by each other using numpy?

For the purpose of this exercise, let's consider a matrix where the element m_{i, j} is given by the rule m_{i, j} = i*j if i == j and 0 otherwise.
Is there an easy "numpy" way of calculating such a matrix without having to resort to if statements checking for the indices?
You can use the numpy function diag to construct a diagonal matrix if you give it the intended diagonal as a 1D array as input.
So you just need to create that, e.g. [i**2 for i in range(N)], with N the dimension of the matrix.
You could also use the identity matrix given by numpy.identity(n) and then multiply it by an n-dimensional vector.
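Both suggestions can be sketched in a few lines. This assumes N = 4 and zero-based indices, so the diagonal holds i**2; the variable names are only for illustration:

```python
import numpy as np

N = 4
idx = np.arange(N)

# Option 1: hand the intended diagonal to np.diag.
m_diag = np.diag(idx ** 2)

# Option 2: scale the identity matrix by a vector. Broadcasting multiplies
# column j by idx[j]**2, which only affects the 1 on the diagonal.
m_identity = np.identity(N) * idx ** 2

print(m_diag)
```

Note that np.diag keeps the integer dtype of the input, while the np.identity version produces floats.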
Assuming you have a square matrix, you can do this:
import numpy as np
ary = np.zeros((4, 4))
_ = [ary.__setitem__((i, i), i**2) for i in range(ary.shape[0])]
print(ary)
# array([[0., 0., 0., 0.],
# [0., 1., 0., 0.],
# [0., 0., 4., 0.],
# [0., 0., 0., 9.]])

Numpy list comprehension iterating over 2D array

I have a bit of code that loads up a long (100k-1mil) set of lines, it has an index in the first column followed by 18 values, for a total of 19 floats per line. This all is put into a numpy array.
I need to do some simple processing on the matrix to keep the index column and get out 1s and 0s depending on conditions of whether values are positive or negative, but the criterion varies as the columns are sequential pairs of values with different reference values.
The code below goes through the columns 2-19 first by evens then odds to check the values, and then creates a temporary list to put into the array I want to have at the end.
I know there's a simpler way to do this, with list comprehension and possibly lambda, but I'm not proficient enough with this to figure it out. So I'm hoping someone can help me reduce the length of this code into something more compact. More efficient would be great too, but I know that the compact methods don't always increase efficiency. It will however help me better understand list comprehension, with and without numpy.
Sample values for reference:
0.000 72.250 -158.622 86.575 -151.153 85.807 -149.803 84.285 -143.701 77.723 -160.471 96.587 -144.020 75.827 -157.071 87.629 -148.856 100.814 -140.488
10.000 56.224 -174.351 108.309 -154.148 68.564 -155.721 83.634 -132.836 75.030 -177.971 100.623 -146.616 61.856 -150.885 92.147 -150.124 91.841 -153.112
20.000 53.357 -153.537 58.190 -160.235 77.575 176.257 93.771 -150.549 77.789 -161.534 103.589 -146.363 73.623 -159.441 99.315 -129.663 92.842 -138.736
And here is the code snippet:
import numpy

datain = numpy.loadtxt('testfile.txt')  # load data
dataout = numpy.zeros(datain.shape)  # initialize empty processing array
dataout[:, 0] = datain[:, 0]  # assign time values from input data to processing array
dataarray = numpy.zeros(len(datain[0]))
phit = numpy.zeros((len(dataarray) - 1) // 2)
psit = numpy.zeros((len(dataarray) - 1) // 2)
for i in range(len(datain)):
    dataarray = numpy.copy(datain[i])
    phit[:] = dataarray[1::2]
    psit[:] = dataarray[2::2]
    temp = []
    for j in range(len(phit)):
        if phit[j] < 0:
            temp.append(1)
        else:
            temp.append(0)
        if psit[j] > 0:
            temp.append(1)
        else:
            temp.append(0)
    dataout[i][1:] = temp
Thanks in advance, I know there's a fair number of questions on these topics here; unfortunately I couldn't find one that helped me get to a solution.
As @abarnert mentioned, the solution here is not to write better loops, but (since you're using Numpy) to not loop in Python at all by understanding how to use Numpy in more advanced ways.
What you have is a matrix like
[ [idx, v0a, v0b, v1a, v1b, ... ], ... ]
And you want a matrix that's basically
[ [idx, 1 if v0a < 0 else 0, 1 if v0b > 0 else 0, ... ], ... ]
We're going to do this in two steps: first, we'll transform the matrix slightly so that the comparisons are all the same; second, we'll apply the comparison in-place.
The only difference between how we handle "even" and "odd" columns is that one is being checked for <0, the other >0. If we modify the second group of columns by multiplying them by -1, then these comparisons both become simply <0:
datain[:, 2::2] *= -1
Now we just want to know, for every value (besides the first column), is that value <0. This is super easy:
datain[:, 1:] < 0
This returns a matrix of boolean values, where each value represents whether or not the corresponding cell in datain[:, 1:] was less than 0. You want these as integers, 1 for True and 0 for False; it turns out, when we assign these boolean values back into our original array (which contains floats), numpy will cast the bools into floats automatically; True will get cast to 1.0, and False will get cast to 0.0.
If you don't want to throw away your original data, simply copy it off first. Here's the complete code:
# If you want to preserve your old data, create a copy for us to modify
dataout = np.array(datain)
# Now assign your integer values into your data array
dataout[:, 2::2] *= -1
dataout[:, 1:] = dataout[:, 1:] < 0
For the sample input you provided, the array after the sign flip looks like this:
array([[ 0. , 72.25 , 158.622, 86.575, 151.153, 85.807,
149.803, 84.285, 143.701, 77.723, 160.471, 96.587,
144.02 , 75.827, 157.071, 87.629, 148.856, 100.814,
140.488],
[ 10. , 56.224, 174.351, 108.309, 154.148, 68.564,
155.721, 83.634, 132.836, 75.03 , 177.971, 100.623,
146.616, 61.856, 150.885, 92.147, 150.124, 91.841,
153.112],
[ 20. , 53.357, 153.537, 58.19 , 160.235, 77.575,
-176.257, 93.771, 150.549, 77.789, 161.534, 103.589,
146.363, 73.623, 159.441, 99.315, 129.663, 92.842,
138.736]])
This code ends up with the following final result:
array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.],
[10., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.],
[20., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.]])
Thanks to abarnert for pointing me in the right direction with this, the solution is pretty simple.
datain = numpy.loadtxt('testfile.txt') #load data
dataout = numpy.empty(datain.shape, dtype=int) # initialize empty processing array
dataout[:, 0] = datain[:, 0] # assign time values from input data to processing array
dataout[:, 1::2] = datain[:, 1::2] < 0
dataout[:, 2::2] = datain[:, 2::2] > 0
That's it! Much shorter, much more readable, and gets me the values I want.
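A quick way to convince yourself that the sliced comparisons do what the loop did is to run them on a tiny array. The values below are a hypothetical two-row, five-column sample (the signs in the second row were chosen for illustration), not the real file:

```python
import numpy as np

# Hypothetical stand-in for numpy.loadtxt(...): index column plus two value pairs.
datain = np.array([[0.0, 72.250, -158.622, 86.575, -151.153],
                   [20.0, -53.357, 153.537, 77.575, 176.257]])

dataout = np.empty(datain.shape, dtype=int)
dataout[:, 0] = datain[:, 0]             # keep the index column
dataout[:, 1::2] = datain[:, 1::2] < 0   # odd columns: 1 where negative
dataout[:, 2::2] = datain[:, 2::2] > 0   # even columns: 1 where positive

print(dataout)
```

Because dataout has an integer dtype, the boolean results are stored as 1 and 0, and the index column is truncated to integers as well.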

How can I translate a MATLAB cell in Python 3?

Just to give you some context:
I have to translate some MATLAB code into Python 3, but I've run into a little problem.
Matlab:
for i = 1:num_nodes
    for j = 1:num_nodes
        K{i,j} = zeros(3,3);
    end
end
Which I translated into:
k_topology = [[]]
for i in range(x):
for i in range(x):
k_topology[[i][j]].extend(np.zeros(3,3))
Also, further in the Matlab code there's a third loop:
for k = 1:3
    K{i,j}(k,k) = -1;
end
Which also kind of... Upsets me?
The fact is I don't really see how I can translate this kind of variable into Python. I also guess that my Python code is kind of "broken", and I'm not really asking any of you to improve it. I'm just asking: what is the best way to translate a Matlab cell into Python?
I finally found something apparently simple to translate this, using list comprehension - according to kazemakase's answer. The actual Python code is now looking like this:
k_topology = [[np.zeros((3,3)) for j in range(self.get_nb_nodes_from_network())]\
for i in range(self.get_nb_nodes_from_network())]
And the output looks something like this:
[[array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]]),
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]]),
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])], ..., [array(...)]]
(There are really too many values to paste here, but I think you get the idea.)
The first question you need to ask is "what is a Matlab cell and what could be a suitable corresponding Python type?"
If I remember correctly from my bad old Matlab days, a cell is sort of a container that holds content of mixed types. It is something like a dynamically typed array or matrix. It is multidimensionally indexed.
Python is dynamically typed, so any Python container can basically fulfill this function. Lists in Python are indexed, so nested lists could work, but they are somewhat awkward to set up and access:
K = [[None] * num_nodes for _ in range(num_nodes)]
K[i][j] # need two indices to access elements of a nested list.
For this particular scenario a dictionary better mirrors the Matlab syntax. Although a dictionary takes only one index, we can exploit the fact that tuples can be declared without parentheses and that dictionaries accept tuples as keys:
K = {}
for i in range(num_nodes):
    for j in range(num_nodes):
        K[i, j] = np.zeros((3, 3))
        for k in range(3):
            K[i, j][k, k] = -1
While the dictionary is syntactically more concise, element access is potentially less performant than in nested lists. Nested lists look different from the Matlab code. The choice depends on whether performance or similarity to the original code matters more. But if performance is an issue there are many more things to consider anyway. In summary: there is no one best way to do it.
Since the OP explicitly asked not to improve the code, I explicitly ask him/her to ignore this part of the answer.
A better way to build diagonal matrices is to use np.eye instead of looping over the diagonal elements.
K = {}
for i in range(num_nodes):
    for j in range(num_nodes):
        K[i, j] = -np.eye(3)
Also, nested lists can be constructed without (much) prior initialization, if that is the preferred approach:
K = []
for i in range(num_nodes):
    K.append([])
    for j in range(num_nodes):
        K[-1].append(-np.eye(3))
Now, for the peace of my soul, let me provide feedback on the OP's code:
k_topology = [[]]
for i in range(x):
for i in range(x):
k_topology[[i][j]].extend(np.zeros(3,3))
This has nothing to do with the original Matlab code (different variable names)
Both loops use i. j is never defined.
[[i][j]] builds a list with one element i and tries to take the jth element. If j is ever something other than 0 this will cause an error.
list.extend appends all elements of the argument individually to the list, in this case the individual rows. list.append would be the correct call, as the whole 3x3 matrix should be appended as one element of K.
np.zeros(3, 3) should be np.zeros((3, 3)) (assuming np is an alias for numpy) because the function takes the shape as the first argument, not as multiple arguments.
Using the Octave/scipy save/loadmat that I demonstrated in the linked post:
In an Octave session
>> num_nodes=3
num_nodes = 3
>> num_nodes=3;
>> K=cell(num_nodes, num_nodes);
>> for i = 1:num_nodes
for j = 1:num_nodes
K{i,j} = zeros(2,2);
end
end
>> K
K =
{
[1,1] =
0 0
0 0
[2,1] =
0 0
0 0
etc
Access one cell:
>> K{1,2}
ans =
0 0
0 0
Access one element of one cell:
>> K{1,2}(1,1)
ans = 0
>> save -7 kfile.mat K
In Python
In [31]: from scipy import io
In [32]: data = io.loadmat('kfile.mat')
In [34]: data
Out[34]:
{'K': array([[array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]])],
[array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]])],
[array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]])]], dtype=object),
'__globals__': [],
'__header__': b'MATLAB 5.0 MAT-file, written by Octave 4.0.0, 2017-02-15 19:05:44 UTC',
'__version__': '1.0'}
In [35]: data['K'].shape
Out[35]: (3, 3)
In [36]: data['K'][0,0].shape
Out[36]: (2, 2)
In [37]: data['K'][0,0][0,0]
Out[37]: 0.0
loadmat treats a cell as a 2d object dtype array, while regular matrices are 2d numeric arrays. Object arrays are, in many ways, like a nested Python list.
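Since loadmat hands back a 2d object array, that kind of container can also be built directly in Python. A minimal sketch, assuming num_nodes = 3 and 3x3 blocks with -1 on the diagonal as in the question; the variable names are illustrative:

```python
import numpy as np

num_nodes = 3

# A 2d array whose elements are arbitrary Python objects (here: 3x3 arrays).
K = np.empty((num_nodes, num_nodes), dtype=object)
for i in range(num_nodes):
    for j in range(num_nodes):
        block = np.zeros((3, 3))
        np.fill_diagonal(block, -1)  # same effect as K{i,j}(k,k) = -1 for k = 1:3
        K[i, j] = block

print(K.shape)     # shape of the container
print(K[0, 1])     # one "cell"
print(K[0, 1][2, 2])  # one element of one cell
```

Indexing then reads much like the Matlab original: K[i, j] selects a cell and K[i, j][k, k] an element inside it, with zero-based indices.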

Efficient incremental sparse matrix in python / scipy / numpy

Is there a way in Python to have an efficient incremental update of sparse matrix?
H = lil_matrix((n, m))
for (i, j) in zip(A, B):
    H[i, j] += compute_something
It seems that such a way to build a sparse matrix is quite slow (lil_matrix is the fastest sparse matrix type for that).
Is there a way (like using dict of dict or other kind of approaches) to efficiently build the sparse matrix H?
In https://stackoverflow.com/a/27771335/901925 I explore incremental matrix assignment.
lil and dok are the recommended formats if you want to change values. csr will give you an efficiency warning, and coo does not allow indexing.
But I also found that dok indexing is slow compared to regular dictionary indexing. So for many changes it is better to build a plain dictionary (with the same tuple indexing), and build the dok matrix from that.
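A sketch of the plain-dictionary idea, assuming the update is a simple accumulation. Here i + j + k stands in for compute_something, the index lists are made up, and instead of going through dok the dictionary is converted straight to coo and then csr (a shortcut swapped in for this illustration):

```python
from collections import defaultdict

from scipy import sparse

# Hypothetical index lists; (1, 2) appears twice on purpose.
A = [0, 1, 1, 2]
B = [0, 2, 2, 1]
n, m = 4, 4

# Accumulate in a plain dict keyed by (row, col) tuples.
acc = defaultdict(float)
for k, (i, j) in enumerate(zip(A, B)):
    acc[i, j] += i + j + k  # stand-in for compute_something

# Unpack keys and values once, then build the sparse matrix in one go.
rows, cols = zip(*acc.keys())
vals = list(acc.values())
H = sparse.coo_matrix((vals, (rows, cols)), shape=(n, m)).tocsr()

print(H.toarray())
```

The dictionary already merges repeated (i, j) pairs, so the matrix constructor sees each coordinate only once.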
But if you can calculate the H data values with a fast numpy vector operation, as opposed to iteration, it is best to do so, and construct the sparse matrix from that (e.g. coo format). In fact even with iteration this would be faster:
h = np.zeros(A.shape)
for k, (i, j) in enumerate(zip(A, B)):
    h[k] = compute_something
H = sparse.coo_matrix((h, (A, B)), shape=(n, m))
e.g.
In [780]: A=np.array([0,1,1,2]); B=np.array([0,2,2,1])
In [781]: h=np.zeros(A.shape)
In [782]: for k, (i,j) in enumerate(zip(A,B)):
   .....:     h[k] = i+j+k
   .....:
In [783]: h
Out[783]: array([ 0., 4., 5., 6.])
In [784]: M=sparse.coo_matrix((h,(A,B)),shape=(4,4))
In [785]: M
Out[785]:
<4x4 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in COOrdinate format>
In [786]: M.A
Out[786]:
array([[ 0., 0., 0., 0.],
[ 0., 0., 9., 0.],
[ 0., 6., 0., 0.],
[ 0., 0., 0., 0.]])
Note that the (1,2) value is the sum 4+5. That's part of the coo to csr conversion.
In this case I could have calculated h with:
In [791]: A+B+np.arange(A.shape[0])
Out[791]: array([0, 4, 5, 6])
so there's no need for iteration.
Nope, do not use csr_matrix or csc_matrix, as they are going to be even slower than lil_matrix if you construct them incrementally. The Dictionary of Keys (dok) based sparse matrix is exactly what you are looking for:
import numpy as np
from scipy.sparse import dok_matrix
S = dok_matrix((5, 5), dtype=np.float32)
for i in range(5):
    for j in range(5):
        S[i, j] = i + j  # Update elements
A faster way would be:
H_ij = compute_something_vectorized()
H = coo_matrix((H_ij, (A, B))).tocsr()
The data for duplicate coordinates are then summed, see the docs for coo_matrix.

What are the advantages of using numpy.identity over numpy.eye?

Having looked over the man pages for numpy's eye and identity, I'd assumed that identity was a special case of eye, since it has fewer options (e.g. eye can fill shifted diagonals, identity cannot), but could plausibly run more quickly. However, this isn't the case on either small or large arrays:
>>> np.identity(3)
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
>>> np.eye(3)
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
>>> timeit.timeit("import numpy; numpy.identity(3)", number = 10000)
0.05699801445007324
>>> timeit.timeit("import numpy; numpy.eye(3)", number = 10000)
0.03787708282470703
>>> timeit.timeit("import numpy", number = 10000)
0.00960087776184082
>>> timeit.timeit("import numpy; numpy.identity(1000)", number = 10000)
11.379066944122314
>>> timeit.timeit("import numpy; numpy.eye(1000)", number = 10000)
11.247124910354614
What, then, is the advantage of using identity over eye?
identity just calls eye so there is no difference in how the arrays are constructed. Here's the code for identity:
def identity(n, dtype=None):
    from numpy import eye
    return eye(n, dtype=dtype)
As you say, the main difference is that with eye the diagonal can be offset, whereas identity only fills the main diagonal.
Since the identity matrix is such a common construct in mathematics, it seems the main advantage of using identity is for its name alone.
To see the difference in an example, run the code below:
import numpy as np
#Creates an array of 4 x 4 with the main diagonal of 1
arr1 = np.eye(4)
print(arr1)
print("\n")
#or you can change the diagonal position
arr2 = np.eye(4, k=1) # or try with another number like k= -2
print(arr2)
print("\n")
#but you can't change the diagonal in identity
arr3 = np.identity(4)
print(arr3)
np.identity returns a square matrix (a special case of a 2D array): an identity matrix with 1's on the main diagonal (i.e. 'k=0') and 0's everywhere else. You can't change the diagonal k here.
np.eye returns a 2D array with 1's on the diagonal selected by 'k', which can be set, and 0's everywhere else.
So the main advantage depends on the requirement. If you want an identity matrix, you can go for identity right away, or call np.eye and leave the rest at the defaults.
But if you need a matrix of 1's and 0's with a particular shape/size, or want control over which diagonal is filled, go for the eye method.
Just like how a matrix is a special case of an array, np.identity is a special case of np.eye.
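The shape/size point can be illustrated with a short sketch. The variable names are only for illustration; the calls use the N, M and k parameters of np.eye:

```python
import numpy as np

# eye accepts a separate column count, so the result need not be square.
rect = np.eye(2, 3)

# k shifts the filled diagonal: k=1 is the first diagonal above the main one.
shifted = np.eye(3, k=1)

# With defaults, eye and identity produce the same array.
same = np.array_equal(np.identity(3), np.eye(3))

print(rect)
print(shifted)
print(same)
```

np.identity takes only a single size argument, so neither the rectangular nor the shifted variant can be produced with it.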
Additional references:
Eye and Identity - HackerRank
