Related
I have a dense over-determined matrix. I can find solutions with np.linalg.lstsq. I want to get an answer with mostly 0s.
Here is an artificial example that produces similar results to my code:
import numpy as np
# Build matrix and vector
M = np.array([[2.71*i, 3.14*i, 4*i, 5.99*i, 6*i] for i in range(1, 1000) ])
v = np.array([[i] for i in range(1, 1000)])
solution, _, _, _ = np.linalg.lstsq(M, v)
print("solution", solution)
# print("error", np.linalg.norm(v - np.dot(M, solution)))
I got solution with many small non-zero terms.
solution [[ 0.06342455]
[ 0.1076765 ]
[ 0.12445127]
[-0.00251504]
[ 0.00121254]]
The solution is not unique and I would like to also have solution to be mostly 0s. Like [0, 0, -.5, 0, .5] or [1/2.71, 0, 0, 0, 0]. Is there an easy way to do this?
A rotation matrix can either be written in two ways.
With row unit vectors,
[ ex_0 ex_1 ex_2]
[ ey_0 ey_1 ey_2]
[ ez_0 ez_1 ez_2]
Or with column unit vectors,
[ ex_0 ey_0 ez_0]
[ ex_1 ey_1 ez_1]
[ ex_2 ey_2 ez_2]
In the following code which definition is being used? Is DCM in row or column format?
from sympy.vector import CoordSysCartesian
from sympy import symbols, pi, Matrix
theta = symbols('theta')
N = CoordSysCartesian('N')
A = N.orient_new_axis('A', theta, N.k)
dcm = N.rotation_matrix(A).subs({'theta':pi/2})
Output,
Matrix([
[0, -1, 0],
[1, 0, 0],
[0, 0, 1]])
Context: I'm doing a bunch of simulations that require me to implement different Hamiltonians. These Hamiltonians are just matrices, built out of Kronecker products of some common elements, with some prefactors that I have to calculate based on the system parameters. E.g, using ⊗ for the Kronecker product
H = w1(a,b,c) * sigmax ⊗ I + w2(x,y,z)*I ⊗ sigmay
I was hoping I could make a simple parser that could read in the values of a,b,c,x,y,z and an expression for the Hamiltonian and construct the necessary matrix. Sympy seems like an obvious candidate, but I can't get a matrix expression to build using strings.
from sympy import symbols,Matrix,MatrixSymbol
from sympy.physics import msigma
from sympy.physics.quantum import TensorProduct
w1,w2 = symbols('w1 w2')
X1 = MatrixSymbol('X1',4,4)
X2 = MatrixSymbol('X2',4,4)
x = msigma(1)
x_1 = TensorProduct(eye(2),x)
x_2 = TensorProduct(x,eye(2))
exp = w1*X1 + w2*X2
exp.subs([(w1,0.5),(w2,2),(X1,x_1),(X2,x_2)]).as_explicit()
will work. But, trying
exp = MatrixExpr('w1*X1+w2*X2')
or
exp = MatrixExpr(sympify('w1*X1+w2*X2'))
or even
exp = sympify('w1*X1 + w2*X2')
exp.subs([(w1,0.5),(w2,2),(X1,x_1),(X2,x_2)])
won't.
It also won't work if I change w1 or w2 to be 1x1 instances of a MatrixSymbol.
What am I doing wrong here? This is my first time using sympy so I'm very clear that I may just be missing something.
Let's look what's going on in simpler case:
exp = sympify('w1*X1'); right_exp = w1*X1
type(exp), type(right_exp)
Out[47]: (sympy.core.mul.Mul, sympy.matrices.expressions.matmul.MatMul)
Looks like simpify doesn'y understand that X1 is a matrix. So, if we mention it explicit, everything will be allright:
exp = sympify("w1*MatrixSymbol('X1',4,4)")
exp.subs([(w1,0.5),(X1,x_1)]).as_explicit()
Out[49]:
Matrix([
[ 0, 0.5, 0, 0],
[0.5, 0, 0, 0],
[ 0, 0, 0, 0.5],
[ 0, 0, 0.5, 0]])
right_exp.subs([(w1,0.5),(X1,x_1)]).as_explicit()
Out[50]:
Matrix([
[ 0, 0.5, 0, 0],
[0.5, 0, 0, 0],
[ 0, 0, 0, 0.5],
[ 0, 0, 0.5, 0]])
And the final statement:
exp = sympify("w1*MatrixSymbol('X1',4,4)+w2*MatrixSymbol('X2',4,4)")
exp.subs([(w1,0.5),(w2,2),(X1,x_1),(X2,x_2)]).as_explicit()
Out[63]:
Matrix([
[ 0, 0.5, 2, 0],
[0.5, 0, 0, 2],
[ 2, 0, 0, 0.5],
[ 0, 2, 0.5, 0]])
What's going on? If you read Basics of expressions in SymPy you can find there statement that "matrices aren’t sympifiable" and simpify interprets X1 as a symbol.
It's hard to say how to behave in another situations. There are notes in docs that warn:
Sometimes autosimplification during sympification results in
expressions that are very different in structure than what was
entered.
With using python library numpy, it is possible to use the function cumprod to evaluate cumulative products, e.g.
a = np.array([1,2,3,4,2])
np.cumprod(a)
gives
array([ 1, 2, 6, 24, 48])
It is indeed possible to apply this function only along one axis.
I would like to do the same with matrices (represented as numpy arrays), e.g. if I have
S0 = np.array([[1, 0], [0, 1]])
Sx = np.array([[0, 1], [1, 0]])
Sy = np.array([[0, -1j], [1j, 0]])
Sz = np.array([[1, 0], [0, -1]])
and
b = np.array([S0, Sx, Sy, Sz])
then I would like to have a cumprod-like function which gives
np.array([S0, S0.dot(Sx), S0.dot(Sx).dot(Sy), S0.dot(Sx).dot(Sy).dot(Sz)])
(This is a simple example, in reality I have potentially large matrices evaluated over n-dimensional meshgrids, so I seek for the most simple and efficient way to evaluate this thing.)
In e.g. Mathematica I would use
FoldList[Dot, IdentityMatrix[2], {S0, Sx, Sy, Sz}]
so I searched for a fold function, and all I found is an accumulate method on numpy.ufuncs. To be honest, I know that I am probably doomed because an attempt at
np.core.umath_tests.matrix_multiply.accumulate(np.array([pauli_0, pauli_x, pauli_y, pauli_z]))
as mentioned in a numpy mailing list yields the error
Reduction not defined on ufunc with signature
Do you have an idea how to (efficiently) do this kind of calculation ?
Thanks in advance.
As food for thought, here are 3 ways of evaluating the 3 sequential dot products:
With the normal Python reduce (which could also be written as a loop)
In [118]: reduce(np.dot,[S0,Sx,Sy,Sz])
array([[ 0.+1.j, 0.+0.j],
[ 0.+0.j, 0.+1.j]])
The einsum equivalent
In [119]: np.einsum('ij,jk,kl,lm',S0,Sx,Sy,Sz)
The einsum index expression looks like a sequence of operations, but it is actually evaluated as a 5d product with summation on 3 axes. In the C code this is done with an nditer and strides, but the effect is as follows:
In [120]: np.sum(S0[:,:,None,None,None] * Sx[None,:,:,None,None] *
Sy[None,None,:,:,None] * Sz[None,None,None,:,:],(1,2,3))
In [127]: np.prod([S0[:,:,None,None,None], Sx[None,:,:,None,None],
Sy[None,None,:,:,None], Sz[None,None,None,:,:]]).sum((1,2,3))
A while back while creating a patch from np.einsum I translated that C code to Python, and also wrote a Cython sum-of-products function(s). This code is on github at
https://github.com/hpaulj/numpy-einsum
einsum_py.py is the Python einsum, with some useful debugging output
sop.pyx is the Cython code, which is compiled to sop.so.
Here's how it could be used for part of your problem. I'm skipping the Sy array since my sop is not coded for complex numbers (but that could be changed).
import numpy as np
import sop
import einsum_py
S0 = np.array([[1., 0], [0, 1]])
Sx = np.array([[0., 1], [1, 0]])
Sz = np.array([[1., 0], [0, -1]])
print np.einsum('ij,jk,kl', S0, Sx, Sz)
# [[ 0. -1.] [ 1. 0.]]
# same thing, but with parsing information
einsum_py.myeinsum('ij,jk,kl', S0, Sx, Sz, debug=True)
"""
{'max_label': 108, 'min_label': 105, 'nop': 3,
'shapes': [(2, 2), (2, 2), (2, 2)],
'strides': [(16, 8), (16, 8), (16, 8)],
'ndim_broadcast': 0, 'ndims': [2, 2, 2], 'num_labels': 4,
....
op_axes [[0, -1, 1, -1], [-1, -1, 0, 1], [-1, 1, -1, 0], [0, 1, -1, -1]]
"""
# take op_axes (for np.nditer) from this debug output
op_axes = [[0, -1, 1, -1], [-1, -1, 0, 1], [-1, 1, -1, 0], [0, 1, -1, -1]]
w = sop.sum_product_cy3([S0,Sx,Sz], op_axes)
print w
As written sum_product_cy3 cannot take an arbitrary number of ops. Plus the iteration space increases with each op and index. But I can imagine calling it repeatedly, either at the Cython level, or from Python. I think it has potential for being faster than repeat(dot...) for lots of small arrays.
A condensed version of the Cython code is:
def sum_product_cy3(ops, op_axes, order='K'):
#(arr, axis=None, out=None):
cdef np.ndarray[double] x, y, z, w
cdef int size, nop
nop = len(ops)
ops.append(None)
flags = ['reduce_ok','buffered', 'external_loop'...]
op_flags = [['readonly']]*nop + [['allocate','readwrite']]
it = np.nditer(ops, flags, op_flags, op_axes=op_axes, order=order)
it.operands[nop][...] = 0
it.reset()
for x, y, z, w in it:
for i in range(x.shape[0]):
w[i] = w[i] + x[i] * y[i] * z[i]
return it.operands[nop]
I have an Nx5 array containing N vectors of form 'id', 'x', 'y', 'z' and 'energy'. I need to remove duplicate points (i.e. where x, y, z all match) within a tolerance of say 0.1. Ideally I could create a function where I pass in the array, columns that need to match and a tolerance on the match.
Following this thread on Scipy-user, I can remove duplicates based on a full array using record arrays, but I need to just match part of an array. Moreover this will not match within a certain tolerance.
I could laboriously iterate through with a for loop in Python but is there a better Numponic way?
You might look at scipy.spatial.KDTree.
How big is N ?
Added: oops, tree.query_pairs is not in scipy 0.7.1 .
When in doubt, use brute force: split the space (here side^3) into little cells,
one point per cell:
""" scatter points to little cells, 1 per cell """
from __future__ import division
import sys
import numpy as np
side = 100
npercell = 1 # 1: ~ 1/e empty
exec "\n".join( sys.argv[1:] ) # side= ...
N = side**3 * npercell
print "side: %d npercell: %d N: %d" % (side, npercell, N)
np.random.seed( 1 )
points = np.random.uniform( 0, side, size=(N,3) )
cells = np.zeros( (side,side,side), dtype=np.uint )
id = 1
for p in points.astype(int):
cells[tuple(p)] = id
id += 1
cells = cells.flatten()
# A C, an E-flat, and a G walk into a bar.
# The bartender says, "Sorry, but we don't serve minors."
nz = np.nonzero(cells)[0]
print "%d cells have points" % len(nz)
print "first few ids:", cells[nz][:10]
I have finally got a solution that I am happy with, this is a slightly cleaned up cut and paste from my own code. There may yet be some bugs.
Note: that it still uses a 'for' loop. I could use Denis's idea of KDTree above coupled with the rounding to get the full solution.
import numpy as np
def remove_duplicates(data, dp_tol=None, cols=None, sort_by=None):
'''
Removes duplicate vectors from a list of data points
Parameters:
data An MxN array of N vectors of dimension M
cols An iterable of the columns that must match
in order to constitute a duplicate
(default: [1,2,3] for typical Klist data array)
dp_tol An iterable of three tolerances or a single
tolerance for all dimensions. Uses this to round
the values to specified number of decimal places
before performing the removal.
(default: None)
sort_by An iterable of columns to sort by (default: [0])
Returns:
MxI Array An array of I vectors (minus the
duplicates)
EXAMPLES:
Remove a duplicate
>>> import wien2k.utils
>>> import numpy as np
>>> vecs1 = np.array([[1, 0, 0, 0],
... [2, 0, 0, 0],
... [3, 0, 0, 1]])
>>> remove_duplicates(vecs1)
array([[1, 0, 0, 0],
[3, 0, 0, 1]])
Remove duplicates with a tolerance
>>> vecs2 = np.array([[1, 0, 0, 0 ],
... [2, 0, 0, 0.001 ],
... [3, 0, 0, 0.02 ],
... [4, 0, 0, 1 ]])
>>> remove_duplicates(vecs2, dp_tol=2)
array([[ 1. , 0. , 0. , 0. ],
[ 3. , 0. , 0. , 0.02],
[ 4. , 0. , 0. , 1. ]])
Remove duplicates and sort by k values
>>> vecs3 = np.array([[1, 0, 0, 0],
... [2, 0, 0, 2],
... [3, 0, 0, 0],
... [4, 0, 0, 1]])
>>> remove_duplicates(vecs3, sort_by=[3])
array([[1, 0, 0, 0],
[4, 0, 0, 1],
[2, 0, 0, 2]])
Change the columns that constitute a duplicate
>>> vecs4 = np.array([[1, 0, 0, 0],
... [2, 0, 0, 2],
... [1, 0, 0, 0],
... [4, 0, 0, 1]])
>>> remove_duplicates(vecs4, cols=[0])
array([[1, 0, 0, 0],
[2, 0, 0, 2],
[4, 0, 0, 1]])
'''
# Deal with the parameters
if sort_by is None:
sort_by = [0]
if cols is None:
cols = [1,2,3]
if dp_tol is not None:
# test to see if already an iterable
try:
null = iter(dp_tol)
tols = np.array(dp_tol)
except TypeError:
tols = np.ones_like(cols) * dp_tol
# Convert to numbers of decimal places
# Find the 'order' of the axes
else:
tols = None
rnd_data = data.copy()
# set the tolerances
if tols is not None:
for col,tol in zip(cols, tols):
rnd_data[:,col] = np.around(rnd_data[:,col], decimals=tol)
# TODO: For now, use a slow Python 'for' loop, try to find a more
# numponic way later - see: http://stackoverflow.com/questions/2433882/
sorted_indexes = np.lexsort(tuple([rnd_data[:,col] for col in cols]))
rnd_data = rnd_data[sorted_indexes]
unique_kpts = []
for i in xrange(len(rnd_data)):
if i == 0:
unique_kpts.append(i)
else:
if (rnd_data[i, cols] == rnd_data[i-1, cols]).all():
continue
else:
unique_kpts.append(i)
rnd_data = rnd_data[unique_kpts]
# Now sort
sorted_indexes = np.lexsort(tuple([rnd_data[:,col] for col in sort_by]))
rnd_data = rnd_data[sorted_indexes]
return rnd_data
if __name__ == '__main__':
import doctest
doctest.testmod()
Have not tested this but if you sort your array along x then y then z this should get you the list of duplicates. You then need to choose which to keep.
def find_dup_xyz(anarray, x, y, z): #for example in an data = array([id,x,y,z,energy]) x=1 y=2 z=3
dup_xyz=[]
for i, row in enumerated(sortedArray):
nx=1
while (abs(row[x] - sortedArray[i+nx[x])<0.1) and (abs(row[z] and sortedArray[i+nx[y])<0.1) and (abs(row[z] - sortedArray[i+nx[z])<0.1):
nx=+1
dup_xyz.append(row)
return dup_xyz
Also just found this
http://mail.scipy.org/pipermail/scipy-user/2008-April/016504.html