I have a matrix of vectors where each row is a vector. I want to take the mean of all the vectors, then calculate the cosine distance between each vector and this mean, returning an array of distances.
>>> import numpy as np
>>> x = np.arange(1, 10).reshape(3, 3)
>>> x
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> m = x.mean(0)
>>> m
array([4., 5., 6.])
The cosine distances are as follows:
>>> from scipy.spatial.distance import cosine
>>> cosine([1,2,3], [4,5,6])
0.0253681538029239
>>> cosine([4,5,6], [4,5,6])
0.0
>>> cosine([7,8,9], [4,5,6])
0.001809107314273195
Therefore I want to write a function f such that
>>> f(x, m)
array([0.0253681538029239, 0.0, 0.001809107314273195])
(Or the transpose of such an array. It doesn't matter.)
What is the most efficient, most numpythonic way to write f? It seems like the trick is to get the proper broadcast over the cosine function, but I haven't figured out how to do this. The following doesn't work.
>>> from numpy import frompyfunc
>>> f = frompyfunc(cosine, 2, 1)
>>> f(x, m)
array([[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0]], dtype=object)
(It looks like here numpy is applying cosine element-wise instead of row-wise.)
Is there a way to do this without writing a for-loop?
It looks like this is possible with apply_along_axis.
>>> from numpy import apply_along_axis
>>> from functools import partial
>>> g = partial(cosine, m)
>>> apply_along_axis(g, 1, x)
array([0.02536815, 0. , 0.00180911])
Is this the most efficient way?
You need to reshape your mean array to be 2D.
>>> from scipy.spatial.distance import cdist
>>> cdist(x, m.reshape(1, -1), metric='cosine')
array([[2.53681538e-02],
[2.22044605e-16],
[1.80910731e-03]])
The trick is to use cdist, which works on 2D arrays in a vectorized manner, to get those cosine distances. So, one way would be -
In [59]: from scipy.spatial.distance import cdist
In [61]: cdist(x, x.mean(0, keepdims=True), 'cosine')
Out[61]:
array([[2.53681538e-02],
[2.22044605e-16],
[1.80910731e-03]])
The keepdims argument keeps the mean 2D, which makes it compatible with the cdist input requirements.
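For comparison, here's a minimal pure-NumPy sketch (no scipy) that applies the cosine-distance formula row-wise via broadcasting; it should agree with cdist up to floating-point error:
import numpy as np

x = np.arange(1, 10).reshape(3, 3)
m = x.mean(0)

# cosine distance = 1 - (x . m) / (|x| * |m|), computed for every row at once
dist = 1 - (x @ m) / (np.linalg.norm(x, axis=1) * np.linalg.norm(m))
print(dist)  # approximately [0.02536815, 0.0, 0.00180911]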
I'm working on a tight-binding model for graphene using pythTB. I want to incorporate spinful elements in the calculation. The Hamiltonian for the Rashba hopping terms has the Pauli spin matrix vector crossed with the site hopping vector.
Initially I created a list of matrices and crossed that with the vector; unfortunately this did not yield the correct result (I think that after the vector cross product was taken, the cross product of the matrices was taken as well).
Next, I declared 3 symbols 's_x', 's_y', and 's_z' and used those instead of the matrices in my Pauli spin matrix vector. After taking the cross product I received the correct result. The problem I am having is that I cannot substitute a matrix into the symbols I added. Is it possible to do this? Or will I need to take the cross product manually?
Here is some of my code:
from __future__ import print_function
from pythtb import * # import TB model class
from sympy import symbols
import numpy as np
import matplotlib.pyplot as plt
# create list of pauli spin matrices
sx = [[0., 1.],[1., 0.]]
sy = [[0., -1.j],[1.j, 0.]]
sz = [[1., 0.],[0., -1.]]
Id = [[1., 0.], [0., 1.]]
s_pauli = np.zeros((4, 2, 2), dtype=complex)
s_pauli = [Id, sx, sy, sz]
# create s_pauli without identity matrix
s_pau = np.zeros((3, 2, 2), dtype=complex)
# declare sympy symbols for the Pauli matrices
s_x, s_y, s_z = symbols('s_x s_y s_z')
s_pau = [ s_x, s_y, s_z]
ab00 = [ 0.5, 0.28867513, 0.]
sig_x_ab00 = np.cross( s_pau, ab00)
If I print sig_x_ab00[2] (which is the only one I'm currently interested in), then I get:
0.288675134594813*s_x - 0.5*s_y
After obtaining that, I wanted to substitute s_pauli[1] for s_x and s_pauli[2] for s_y by doing the following command:
sig_x_ab00_ = sig_x_ab00.subs(s_x, s_pauli[1])
And I get the following error output:
AttributeError: 'numpy.ndarray' object has no attribute 'subs'
Is what I am doing at all valid? Or is there a better way to go about this?
Any input is much appreciated!
Thanks!
Let's run your code, looking at each step. Don't make assumptions.
I'm using an isympy interactive environment; that's ipython with sympy enhancements. I also imported np.
In [4]: ab00 = [ 0.5, 0.28867513, 0.]
In [5]: s_pauli
Out[5]:
[[[1.0, 0.0], [0.0, 1.0]],
[[0.0, 1.0], [1.0, 0.0]],
[[0.0, (-0-1j)], [1j, 0.0]],
[[1.0, 0.0], [0.0, -1.0]]]
This is a list. The previous np.zeros(...) expression does nothing. In Python we don't set the 'type' of a variable.
We can make an array from this list:
In [6]: np.array(s_pauli)
s_pauli[1] works because it is just list indexing.
And the added symbols:
In [11]: s_x, s_y, s_z = symbols('s_x s_y s_z')
In [12]: s_x
Out[12]: sₓ
In [13]: s_pau = [ s_x, s_y, s_z]
Again, s_pau is a list, not an array. When used in cross it will be turned into an array:
In [14]: np.array(s_pau)
Out[14]: array([s_x, s_y, s_z], dtype=object)
Note that this is an object dtype array, which is still very much like a list. Some basic math works, because operations like multiply and add are defined for the symbols. But transcendentals like np.log and np.sin don't work on such arrays.
cross just uses multiply and addition, so it works with these object arrays:
In [15]: sig = np.cross( s_pau, ab00)
In [16]: sig
Out[16]: array([-0.28867513*s_z, 0.5*s_z, 0.28867513*s_x - 0.5*s_y], dtype=object)
sig is a numpy array. It is not a sympy expression, and does not have a subs method. Again, it pays to look closely at what is happening.
The elements of the array are sympy expressions:
In [17]: sig[2]
Out[17]: 0.28867513⋅sₓ - 0.5⋅s_y
In [20]: s2 = sig[2]
subs with a scalar value works:
In [22]: s2.subs(s_x, 1)
Out[22]: 0.28867513 - 0.5⋅s_y
but not with a list
In [23]: s2.subs(s_x, s_pauli[1])
Out[23]: 0.28867513⋅sₓ - 0.5⋅s_y
However, if I make a sympy Matrix from it:
In [24]: s_pauli[1]
Out[24]: [[0.0, 1.0], [1.0, 0.0]]
In [25]: Matrix(s_pauli[1])
Out[25]:
⎡0.0 1.0⎤
⎢ ⎥
⎣1.0 0.0⎦
In [26]: s2.subs(s_x, Out[25])
Out[26]:
⎡ 0 0.28867513⎤
-0.5⋅s_y + ⎢ ⎥
⎣0.28867513 0 ⎦
The substitution does work.
In general, mixing sympy and numpy is hit-or-miss; some things work, almost more by accident than by design, and others don't. sympy.lambdify is the most reliable way of making a function that will work with numpy arrays.
In this case I suspect you'd be better off using a sympy version of cross, and doing the sympy.Matrix substitutions.
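For example, here's a minimal sketch of that approach, writing out the needed cross-product component by hand so that every term stays a sympy Matrix (the matrix names are mine, not from the question):
from sympy import Matrix

# Pauli matrices as sympy Matrix objects instead of nested lists
SX = Matrix([[0, 1], [1, 0]])
SY = Matrix([[0, -1j], [1j, 0]])

ab00 = [0.5, 0.28867513, 0.0]

# z-component of cross(sigma, ab00) is sigma_x*d_y - sigma_y*d_x
sig_z = SX * ab00[1] - SY * ab00[0]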
I am new to numpy but have been using python for quite a while as an engineer.
I am writing a program that currently stores stress tensors as 3x3 numpy arrays within another NxM array which represents values through time and through the thickness of a wall, so overall it is an NxMx3x3 numpy array. I want to efficiently calculate the eigenvalues and eigenvectors of each 3x3 array within this larger array. So far I have tried using fromiter, but this doesn't seem to work because the function returns 2 arrays. I have also tried apply_along_axis, which also doesn't work because it says the inner 3x3 is not a square matrix. I can do it with list comprehension, but it doesn't seem ideal to resort to using lists.
Example just calculating eigenvals using list comprehension
import numpy as np
from scipy import linalg
a=np.random.random((2,2,3,3))
f=linalg.eigvalsh
ans=np.asarray([f(x) for x in a.reshape((4,3,3))])
ans.shape=(2,2,3)
I thought something like this would work but I have played around with it and can't get it working:
np.apply_along_axis(f,0,a)
BTW the 2x2 bit could be up to 5000x100, and this code is repeated ~50x50x200 times, hence the need for efficiency. Any help would be greatly appreciated!
You can use numpy.linalg.eigh. It accepts an array like your example a.
Here's an example. First, create an array of 3x3 symmetric arrays:
In [96]: a = np.random.random((2, 2, 3, 3))
In [97]: a = a + np.transpose(a, axes=(0, 1, 3, 2))
In [98]: a[0, 0]
Out[98]:
array([[0.61145048, 0.85209618, 0.03909677],
[0.85209618, 1.79309413, 1.61209077],
[0.03909677, 1.61209077, 1.55432465]])
Compute the eigenvalues and eigenvectors of all the 3x3 arrays:
In [99]: evals, evecs = np.linalg.eigh(a)
In [100]: evals.shape
Out[100]: (2, 2, 3)
In [101]: evecs.shape
Out[101]: (2, 2, 3, 3)
Take a look at the result for a[0, 0]:
In [102]: evals[0, 0]
Out[102]: array([-0.31729364, 0.83148477, 3.44467813])
In [103]: evecs[0, 0]
Out[103]:
array([[-0.55911658, 0.79634401, 0.23070516],
[ 0.63392772, 0.23128064, 0.73800062],
[-0.53434473, -0.55887877, 0.63413738]])
Verify that it is the same as computing the eigenvalues and eigenvectors for a[0, 0] separately:
In [104]: np.linalg.eigh(a[0, 0])
Out[104]:
(array([-0.31729364, 0.83148477, 3.44467813]),
array([[-0.55911658, 0.79634401, 0.23070516],
[ 0.63392772, 0.23128064, 0.73800062],
[-0.53434473, -0.55887877, 0.63413738]]))
What are the advantages and disadvantages of each?
From what I've seen, either one can work as a replacement for the other if need be, so should I bother using both or should I stick to just one of them?
Will the style of the program influence my choice? I am doing some machine learning using numpy, so there are indeed lots of matrices, but also lots of vectors (arrays).
Numpy matrices are strictly 2-dimensional, while numpy arrays (ndarrays) are
N-dimensional. Matrix objects are a subclass of ndarray, so they inherit all
the attributes and methods of ndarrays.
The main advantage of numpy matrices is that they provide a convenient notation
for matrix multiplication: if a and b are matrices, then a*b is their matrix
product.
import numpy as np
a = np.mat('4 3; 2 1')
b = np.mat('1 2; 3 4')
print(a)
# [[4 3]
# [2 1]]
print(b)
# [[1 2]
# [3 4]]
print(a*b)
# [[13 20]
# [ 5 8]]
On the other hand, as of Python 3.5, NumPy supports infix matrix multiplication using the @ operator, so you can achieve the same convenience of matrix multiplication with ndarrays in Python >= 3.5.
import numpy as np
a = np.array([[4, 3], [2, 1]])
b = np.array([[1, 2], [3, 4]])
print(a@b)
# [[13 20]
# [ 5 8]]
Both matrix objects and ndarrays have .T to return the transpose, but matrix
objects also have .H for the conjugate transpose, and .I for the inverse.
In contrast, numpy arrays consistently abide by the rule that operations are
applied element-wise (except for the new @ operator). Thus, if a and b are numpy arrays, then a*b is the array
formed by multiplying the components element-wise:
c = np.array([[4, 3], [2, 1]])
d = np.array([[1, 2], [3, 4]])
print(c*d)
# [[4 6]
# [6 4]]
To obtain the result of matrix multiplication, you use np.dot (or @ in Python >= 3.5, as shown above):
print(np.dot(c,d))
# [[13 20]
# [ 5 8]]
The ** operator also behaves differently:
print(a**2)
# [[22 15]
# [10 7]]
print(c**2)
# [[16 9]
# [ 4 1]]
Since a is a matrix, a**2 returns the matrix product a*a.
Since c is an ndarray, c**2 returns an ndarray with each component squared
element-wise.
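If you want the matrix power of an ndarray rather than the element-wise power, np.linalg.matrix_power provides it; a quick sketch:
print(np.linalg.matrix_power(c, 2))  # same as c @ c
# [[22 15]
#  [10  7]]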
There are other technical differences between matrix objects and ndarrays
(having to do with np.ravel, item selection and sequence behavior).
The main advantage of numpy arrays is that they are more general than
2-dimensional matrices. What happens when you want a 3-dimensional array? Then
you have to use an ndarray, not a matrix object. Thus, learning to use matrix
objects is more work -- you have to learn matrix object operations, and
ndarray operations.
Writing a program that mixes both matrices and arrays makes your life difficult
because you have to keep track of what type of object your variables are, lest
multiplication return something you don't expect.
In contrast, if you stick solely with ndarrays, then you can do everything
matrix objects can do, and more, except with slightly different
functions/notation.
If you are willing to give up the visual appeal of NumPy matrix product
notation (which can be achieved almost as elegantly with ndarrays in Python >= 3.5), then I think NumPy arrays are definitely the way to go.
PS. Of course, you really don't have to choose one at the expense of the other,
since np.asmatrix and np.asarray allow you to convert one to the other (as
long as the array is 2-dimensional).
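For instance, a quick sketch of round-tripping between the two types:
m = np.asmatrix(c)    # 2-D ndarray -> matrix (c from above)
back = np.asarray(m)  # matrix -> ndarray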
There is a synopsis of the differences between NumPy arrays and NumPy matrices here.
Scipy.org recommends that you use arrays:
'array' or 'matrix'? Which should I use? - Short answer
Use arrays.
- They support the multidimensional array algebra that is supported in MATLAB.
- They are the standard vector/matrix/tensor type of NumPy. Many NumPy functions return arrays, not matrices.
- There is a clear distinction between element-wise operations and linear algebra operations.
- You can have standard vectors or row/column vectors if you like.
Until Python 3.5 the only disadvantage of using the array type was that you had to use dot instead of * to multiply (reduce) two tensors (scalar product, matrix vector multiplication etc.). Since Python 3.5 you can use the matrix multiplication @ operator.
Given the above, we intend to deprecate matrix eventually.
Just to add one case to unutbu's list.
For me, one of the biggest practical differences between numpy ndarrays and numpy matrices (or matrix languages like MATLAB) is that the dimension is not preserved in reduce operations. Matrices are always 2d, while the mean of an array, for example, has one dimension less.
For example, demeaning the rows of a matrix or an array:
with matrix
>>> m = np.mat([[1,2],[2,3]])
>>> m
matrix([[1, 2],
[2, 3]])
>>> mm = m.mean(1)
>>> mm
matrix([[ 1.5],
[ 2.5]])
>>> mm.shape
(2, 1)
>>> m - mm
matrix([[-0.5, 0.5],
[-0.5, 0.5]])
with array
>>> a = np.array([[1,2],[2,3]])
>>> a
array([[1, 2],
[2, 3]])
>>> am = a.mean(1)
>>> am.shape
(2,)
>>> am
array([ 1.5, 2.5])
>>> a - am #wrong
array([[-0.5, -0.5],
[ 0.5, 0.5]])
>>> a - am[:, np.newaxis] #right
array([[-0.5, 0.5],
[-0.5, 0.5]])
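As a side note, ndarrays can also preserve the reduced dimension by passing keepdims=True, which makes the broadcast behave like the matrix case:
>>> am2 = a.mean(1, keepdims=True)
>>> am2.shape
(2, 1)
>>> a - am2
array([[-0.5,  0.5],
       [-0.5,  0.5]])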
I also think that mixing arrays and matrices gives rise to many "happy" debugging hours.
However, scipy.sparse matrices are always matrices in terms of operators like multiplication.
As per the official documentation, it is no longer advisable to use the matrix class, since it will be removed in the future.
https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
As other answers already state, you can achieve all the operations with NumPy arrays.
As others have mentioned, perhaps the main advantage of matrix was that it provided a convenient notation for matrix multiplication.
However, in Python 3.5 there is finally a dedicated infix operator for matrix multiplication: @.
With recent NumPy versions, it can be used with ndarrays:
A = numpy.ones((1, 3))
B = numpy.ones((3, 3))
A @ B
So nowadays, even more so, when in doubt you should stick to ndarray.
Matrix Operations with Numpy Arrays:
I would like to keep updating this answer about matrix operations with numpy arrays, in case some users are looking for information about matrices and numpy.
As the accepted answer and the numpy-ref.pdf say:
class numpy.matrix will be removed in the future.
So now matrix algebra operations have to be done with NumPy arrays.
a = np.array([[1,3],[-2,4]])
b = np.array([[3,-2],[5,6]])
Matrix Multiplication (infix matrix multiplication):
a @ b
array([[18, 16],
[14, 28]])
Transpose:
ab = a @ b
ab.T
array([[18, 14],
[16, 28]])
Inverse of a matrix:
np.linalg.inv(ab)
array([[ 0.1 , -0.05714286],
[-0.05 , 0.06428571]])
ab_i=np.linalg.inv(ab)
ab @ ab_i  # proof of inverse
array([[1., 0.],
[0., 1.]]) # identity matrix
Determinant of a matrix:
np.linalg.det(ab)
279.9999999999999
Solving a Linear System:
x + y = 3,
x + 2y = -8
b = np.array([3,-8])
a = np.array([[1,1], [1,2]])
x = np.linalg.solve(a,b)
x
array([ 14., -11.])
# Solution x=14, y=-11
Eigenvalues and Eigenvectors:
a = np.array([[10,-18], [6,-11]])
np.linalg.eig(a)
(array([ 1., -2.]),
 array([[0.89442719, 0.83205029],
        [0.4472136 , 0.5547002 ]]))
An advantage of using matrices is easier instantiation through a string of text rather than nested square brackets.
With matrices you can do
np.matrix("1, 1+1j, 0; 0, 1j, 0; 0, 0, 1")
and get the desired output directly:
matrix([[1.+0.j, 1.+1.j, 0.+0.j],
[0.+0.j, 0.+1.j, 0.+0.j],
[0.+0.j, 0.+0.j, 1.+0.j]])
If you use arrays, this does not work:
np.array("1, 1+1j, 0; 0, 1j, 0; 0, 0, 1")
output:
array('1, 1+1j, 0; 0, 1j, 0; 0, 0, 1', dtype='<U29')
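A possible workaround, if you like the string syntax but want an ndarray, is to parse with the matrix constructor and convert the result (a sketch that still relies on the deprecated class):
np.asarray(np.matrix("1, 1+1j, 0; 0, 1j, 0; 0, 0, 1"))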
This is my goal, using Python Numpy:
I would like to create a (1000,1000) dimensional array/matrix of dot product values. That means each array/matrix entry is the dot product of vectors 1 through 1000. Constructing this is theoretically simple: one defines a (1,1000) dimensional matrix of vectors v1, v2, ..., v1000
import numpy as np
vectorvalue = np.matrix([v1, v2, v3, ..., v1000])
and takes the dot product with the transpose, i.e.
matrix_of_dotproducts = np.tensordot(vectorvalue.T, vectorvalue)
And the shape of the array/matrix will be (1000, 1000). The (1,1) entry will be the dot product of vectors (v1,v1), the (1,2) entry will be the dot product of vectors (v1,v2), etc. In order to calculate the dot product with numpy for a three-dimensional vector, it's wise to use numpy.tensordot() instead of numpy.dot()
Here's my problem: I'm not beginning with an array of vector values. I'm beginning with three 1000 element arrays of each coordinate values, i.e. an array of x-coordinates, y-coordinates, and z-coordinates.
xvalues = np.array([x1, x2, x3, ..., x1000])
yvalues = np.array([y1, y2, y3, ..., y1000])
zvalues = np.array([z1, z2, z3, ..., z1000])
Is the easiest thing to do to construct a (3, 1000) numpy array/matrix and then take the tensor dot product for each pair?
v1 = np.array([x1,y1,z1])
v2 = np.array([x2,y2,z2])
...
I'm sure there's a more tractable and efficient way to do this...
PS: To be clear, I would like to take a 3D dot product. That is, for vectors
A = (a1, a2, a3)
and B = (b1, b2, b3),
the dot product should be
dotproduct(A,B) = a1*b1 + a2*b2 + a3*b3.
IIUC, you can build the intermediate array as you suggested:
>>> arr = np.vstack([xvalues, yvalues, zvalues]).T
>>> out = arr.dot(arr.T)
Which seems to be what you want:
>>> out.shape
(1000, 1000)
>>> out[3,4]
1.193097281209083
>>> arr[3].dot(arr[4])
1.193097281209083
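For completeness, here's a minimal end-to-end sketch of the same idea, with random stand-in coordinates in place of the real data:
>>> import numpy as np
>>> xvalues = np.random.random(1000)  # stand-ins for the real coordinates
>>> yvalues = np.random.random(1000)
>>> zvalues = np.random.random(1000)
>>> arr = np.vstack([xvalues, yvalues, zvalues]).T  # shape (1000, 3)
>>> out = arr.dot(arr.T)  # (1000, 1000) array of pairwise dot products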
So, you're not far off with your initial thought. There's very little overhead involved in concatenating the arrays, but if you're interested in doing it within numpy, there's a built-in set of functions, vstack, hstack, and dstack, that should perform exactly as you wish (vertical, horizontal, and depth respectively).
I'll leave it up to you to determine which to use where, but here's an example shamelessly stolen from the docs to help get you started:
>>> a = np.array([1, 2, 3])
>>> b = np.array([2, 3, 4])
>>> np.vstack((a,b))
array([[1, 2, 3],
[2, 3, 4]])
For reference: vstack docs, hstack docs, and dstack docs
If it feels a little over-the-top to have three separate functions here then you're right! That's why numpy also has the concatenate function. It's just a generalization of vstack, hstack, and dstack that takes an axis argument.
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
[3, 4],
[5, 6]])
Concatenate docs
I have two one-dimensional numpy matrices:
[[ 0.69 0.41]] and [[ 0.81818182 0.18181818]]
I want to multiply these two to get the result
[[0.883, 0.117]] (the result is normalized)
If I use np.dot I get ValueError: matrices are not aligned
Does anybody have an idea what I am doing wrong?
EDIT
I solved it in a kind of hacky way, but it worked for me, regardless of whether there is a better solution.
new_matrix = np.matrix([ a[0,0] * b[0,0], a[0,1] * b[0,1] ])
It seems you want to do element-wise math. Numpy arrays do this by default.
In [1]: import numpy as np
In [2]: a = np.matrix([.69,.41])
In [3]: b = np.matrix([ 0.81818182, 0.18181818])
In [4]: np.asarray(a) * np.asarray(b)
Out[4]: array([[ 0.56454546, 0.07454545]])
In [5]: np.matrix(_)
Out[5]: matrix([[ 0.56454546, 0.07454545]])
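Since the desired output was normalized, a short follow-up sketch that divides the element-wise product by its sum:
In [6]: c = np.asarray(a) * np.asarray(b)
In [7]: c / c.sum()  # approximately [[0.883, 0.117]]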