I have a matrix x, and a matrix p of the same structure and size.
One row represents the coordinates of an n-dimensional point.
I have a function f which takes a point (a row so to say) and computes a score for it.
Given x and p, I'd like to replace row i in p with row i in x if row i in x is smaller than row i in p according to my function f, formally:
for all row indices i do:
p[i] = (x[i] if f(x[i]) < f(p[i]) else p[i])
Python's list comprehension is way too slow, so I need to do it in numpy, but I'm new to numpy and have failed hard trying to figure it out.
From other computations I already have vectors for x and p (I've called them benchmarks for some reason), where the value at index i is the score of row i.
Here's the relevant code:
benchmark_x = FUNCTION(x)
benchmark_p = FUNCTION(p)
# TODO Too slow, ask smart guys from StackOverflow
p = np.array([x[i] if benchmark_x[i] < benchmark_p[i] else p[i] for i in range(p.shape[0])])
How about this?
pos = benchmark_x < benchmark_p
p[pos] = x[pos]
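To make the masking step concrete, here is a minimal sketch on made-up data. The score function f below (squared distance from the origin) is only a hypothetical stand-in for FUNCTION, and the small x and p arrays are invented for illustration.

```python
import numpy as np

def f(points):
    # Hypothetical score: squared distance of each row from the origin.
    return np.sum(points ** 2, axis=1)

x = np.array([[0.0, 1.0], [3.0, 3.0], [1.0, 1.0]])
p = np.array([[2.0, 2.0], [1.0, 0.0], [1.0, 2.0]])

benchmark_x = f(x)  # one score per row of x
benchmark_p = f(p)  # one score per row of p

# Boolean mask: True where the row of x scores lower than the row of p.
pos = benchmark_x < benchmark_p

# Variant that leaves p untouched and builds a new array instead.
p_new = np.where(pos[:, None], x, p)

# In-place variant: overwrite only the selected rows of p.
p[pos] = x[pos]
# Keep the cached scores consistent with the updated rows.
benchmark_p[pos] = benchmark_x[pos]
```

Note that p[pos] = x[pos] modifies p in place, whereas the list comprehension in the question built a new array; np.where is closer to the latter if the old p is still needed.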
I have to evaluate the following expression, given two quite large matrices A,B and a very complicated function F:
[Image: the mathematical expression]
I was wondering whether there is an efficient way to first find the indices i, j that give a non-zero element after the multiplication of the matrices, so that I can avoid the quite slow 'for' loops.
Current working code
# Starting with 4 random matrices
A = np.random.randint(0,2,size=(50,50))
B = np.random.randint(0,2,size=(50,50))
C = np.random.randint(0,2,size=(50,50))
D = np.random.randint(0,2,size=(50,50))
indices = []
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        if A[i,j] != 0:
            for k in range(B.shape[1]):
                if B[j,k] != 0:
                    for l in range(C.shape[1]):
                        if A[i,j]*B[j,k]*C[k,l]*D[l,i]!=0:
                            indices.append((i,j,k,l))
print(indices)
As you can see, in order to get the indices I need I have to use nested loops (= huge computational time).
My guess would be NO: you cannot avoid the for-loops. To find all the indices i, j you need to loop through all the elements, which defeats the purpose of this check. Therefore, you should go ahead and use simple elementwise array multiplication and the dot product in numpy; it should be quite fast, with the for-loops taken care of by numpy.
However, if you plan on using a Python loop then the answer is YES, you can avoid them by using numpy, using the following pseudo-code (=hand-waving):
i, j = np.indices((N, M)) # CAREFUL: you may need to swap i<->j or N<->M
fs = F(i, j, z) # array of values of function F
# for a given z over the index grid
R = np.dot(A*fs, B) # summation over j
# return R # if necessary do a summation over i: np.sum(R, axis=...)
If the issue is that computing fs = F(i, j, z) is a very slow operation, then you will have to identify the elements of A that are non-zero, using the two loops built into numpy (so they are quite fast):
good = np.nonzero(A) # hidden double loop (for 2D data)
fs = np.zeros_like(A, dtype=float) # float, so values of F are not truncated if A holds integers
fs[good] = F(i[good], j[good], z) # compute F only where A != 0
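As a rough, runnable version of the hand-waving above: the function F here is a cheap hypothetical stand-in for the "very complicated" one, the shapes of A and B are arbitrary, and the formula simply follows the pseudo-code (a sum over j of A[i, j] * F(i, j, z) * B[j, k]), which may or may not match the original expression exactly.

```python
import numpy as np

def F(i, j, z):
    # Hypothetical placeholder for the expensive function; works elementwise.
    return np.cos(z * (i - j)) + 2.0

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(4, 5))
B = rng.integers(0, 2, size=(5, 3))
z = 0.5

# Index grids with the same shape as A.
i, j = np.indices(A.shape)

# Evaluate F only where A is non-zero; everything else stays 0.
good = np.nonzero(A)
fs = np.zeros(A.shape, dtype=float)
fs[good] = F(i[good], j[good], z)

# Summation over j is done by the dot product.
R = np.dot(A * fs, B)

# Reference result: evaluate F everywhere (more calls, same answer).
R_full = np.dot(A * F(i, j, z), B)
```

Skipping the zero entries of A changes nothing in the result, because those terms are multiplied by zero anyway; it only saves evaluations of F.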
For lack of a LaTeX editor, here is a picture of a piecewise function that I wish to plot using SymPy. I want to pass in two arrays of the coefficients and a value for x, then evaluate it and plot the function. (Edit: there is exactly one more p than there are alphas, image updated)
This is my attempt so far (alpha and p are lists/arrays, t is a number):
def getf(alpha,p,t):
    #Create the argument list of tuples for the SymPy.Piecewise function
    argtuples = []
    for n,number in enumerate(alpha):
        if n == 0:
            argtuples.append((p[0]*x, x<alpha[0]))
        elif 0<n and n<list(enumerate(alpha))[-1][0]:
            argtuples.append((p[0]*alpha[0] + Sum(p[i]*(alpha[i] - alpha[i-1]),(i,1,n)) + p[n+1]*(x-alpha[n]), alpha[n-1] <= x < alpha[n]))
        else:
            argtuples.append((p[0]*alpha[0] + Sum(p[i]*(alpha[i] - alpha[i-1]),(i,1,n)) + p[n+1]*(x-alpha[n]), x>=alpha[n]))
    f = Piecewise(argtuples)
    return f(t)
from sympy import Piecewise, Sum
from sympy.abc import x, i
getf([10000,50000,100000,1000000],[0.05,0.08,0.15,0.30,0.40],1000001)
However, I'm getting the error "list indices must be integers or slices, not Symbol". How can I reference the coefficient values that I have passed into the function, given that the array could be any length?
You cannot use a symbolic index on a Python list (here i is symbolic, since you are importing it from sympy.abc). If you know the list ahead of time, you should use the Python sum function to sum the values, instead of Sum. Note that the upper limit of Sum is inclusive, so the equivalent range ends at n + 1:
sum(p[i]*(alpha[i] - alpha[i-1]) for i in range(1, n + 1))
There is also another problem, which is that you have alpha[n-1] <= x < alpha[n]. This unfortunately won't work, due to the way Python handles chained inequalities. You have to write this as And(alpha[n-1] <= x, x < alpha[n]) (And can be imported from sympy). Otherwise you will get TypeError: cannot determine truth value of Relational.
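Here is a small sketch of both fixes together. The coefficients and the three-piece function are made up for illustration (a continuous piecewise-linear function with slopes p and breakpoints alpha); they are an assumption and not necessarily the function from the picture. The pieces are passed to Piecewise as separate arguments, and the result is evaluated with subs.

```python
from sympy import And, Piecewise, Symbol

x = Symbol("x")
alpha = [10, 20]   # hypothetical breakpoints
p = [1, 2, 3]      # hypothetical slopes, one more than breakpoints

# Plain Python sum over integer indices instead of a symbolic Sum.
# Here it covers i = 1 only: p[1] * (alpha[1] - alpha[0]).
offset = sum(p[i] * (alpha[i] - alpha[i - 1]) for i in range(1, 2))

f = Piecewise(
    (p[0] * x, x < alpha[0]),
    # And(...) replaces the chained inequality alpha[0] <= x < alpha[1].
    (p[0] * alpha[0] + p[1] * (x - alpha[0]), And(alpha[0] <= x, x < alpha[1])),
    # Fallback piece for x >= alpha[1].
    (p[0] * alpha[0] + offset + p[2] * (x - alpha[1]), True),
)

low = f.subs(x, 5)
mid = f.subs(x, 15)
high = f.subs(x, 25)
```

For a list of pieces built in a loop, the same call would be Piecewise(*argtuples), since Piecewise expects each (expression, condition) pair as its own argument.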
Say I would like to remove the diagonal from a scipy.sparse.csr_matrix. Is there an efficient way of doing so? I saw that in the sparsetools module there are C functions to return the diagonal.
Based on other SO answers here and here my current approach is the following:
def csr_setdiag_val(csr, value=0):
    """Set all diagonal nonzero elements
    (elements currently in the sparsity pattern)
    to the given value. Useful to set to 0 mostly.
    """
    if csr.format != "csr":
        raise ValueError('Matrix given must be of CSR format.')
    csr.sort_indices()
    pointer = csr.indptr
    indices = csr.indices
    data = csr.data
    for i in range(min(csr.shape)):
        ind = indices[pointer[i]: pointer[i + 1]]
        j = ind.searchsorted(i)
        # matrix has only elements up until diagonal (in row i)
        if j == len(ind):
            continue
        j += pointer[i]
        # in case matrix has only elements after diagonal (in row i)
        if indices[j] == i:
            data[j] = value
which I then follow with
csr.eliminate_zeros()
Is that the best I can do without writing my own Cython code?
Based on @hpaulj's comment, I created an IPython Notebook which can be seen on nbviewer. It shows that, of all the methods mentioned, the following is the fastest (assume that mat is a square sparse CSR matrix with one_dim rows and columns):
mat - scipy.sparse.dia_matrix((mat.diagonal()[numpy.newaxis, :], [0]), shape=(one_dim, one_dim))
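A self-contained sketch of the subtraction approach, wrapped in a helper. The name remove_diagonal and the small test matrix are made up for illustration, and the explicit eliminate_zeros() call is a precaution I added rather than something the benchmark above necessarily included.

```python
import numpy as np
import scipy.sparse as sp

def remove_diagonal(mat):
    """Return a CSR copy of the square sparse matrix `mat` without its diagonal."""
    # Sparse matrix that holds only the main diagonal of mat.
    diag = sp.dia_matrix((mat.diagonal()[np.newaxis, :], [0]), shape=mat.shape)
    out = (mat - diag).tocsr()
    # Drop any explicitly stored zeros left on the diagonal.
    out.eliminate_zeros()
    return out

dense = np.array([[1, 2, 0],
                  [0, 3, 4],
                  [5, 0, 6]])
mat = sp.csr_matrix(dense)
result = remove_diagonal(mat)
```

Unlike csr_setdiag_val above, this returns a new matrix and leaves mat unchanged.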
Is there a smart and space-efficient symmetric matrix in numpy which automatically (and transparently) fills the position at [j][i] when [i][j] is written to?
import numpy
a = numpy.symmetric((3, 3))
a[0][1] = 1
a[1][0] == a[0][1]
# True
print(a)
# [[0 1 0], [1 0 0], [0 0 0]]
assert numpy.all(a == a.T) # for any symmetric matrix
An automatic Hermitian would also be nice, although I won’t need that at the time of writing.
If you can afford to symmetrize the matrix just before doing calculations, the following should be reasonably fast:
def symmetrize(a):
    """
    Return a symmetrized version of NumPy array a.
    Values 0 are replaced by the array value at the symmetric
    position (with respect to the diagonal), i.e. if a_ij = 0,
    then the returned array a' is such that a'_ij = a_ji.
    Diagonal values are left untouched.
    a -- square NumPy array, such that a_ij = 0 or a_ji = 0,
    for i != j.
    """
    return a + a.T - numpy.diag(a.diagonal())
This works under reasonable assumptions (such as not doing both a[0, 1] = 42 and the contradictory a[1, 0] = 123 before running symmetrize).
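To illustrate both the intended use and the assumption just mentioned, here is a short sketch with made-up values. It repeats symmetrize so that it runs on its own.

```python
import numpy as np

def symmetrize(a):
    """Return a + a.T with the diagonal counted only once."""
    return a + a.T - np.diag(a.diagonal())

# Consistent input: for each off-diagonal pair, only one entry is set.
a = np.zeros((3, 3))
a[0, 1] = 42
a[2, 2] = 7
s = symmetrize(a)

# Contradictory input: both a[0, 1] and a[1, 0] are set.
b = np.zeros((3, 3))
b[0, 1] = 42
b[1, 0] = 123
t = symmetrize(b)  # the two values are added together in both positions
```

In the contradictory case the result is still symmetric, but both positions hold the sum of the two values rather than either original one.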
If you really need a transparent symmetrization, you might consider subclassing numpy.ndarray and simply redefining __setitem__:
class SymNDArray(numpy.ndarray):
    """
    NumPy array subclass for symmetric matrices.
    A SymNDArray arr is such that doing arr[i,j] = value
    automatically does arr[j,i] = value, so that array
    updates remain symmetrical.
    """
    def __setitem__(self, indexes, value):
        (i, j) = indexes
        super(SymNDArray, self).__setitem__((i, j), value)
        super(SymNDArray, self).__setitem__((j, i), value)

def symarray(input_array):
    """
    Return a symmetrized version of the array-like input_array.
    The returned array has class SymNDArray. Further assignments to the array
    are thus automatically symmetrized.
    """
    return symmetrize(numpy.asarray(input_array)).view(SymNDArray)

# Example:
a = symarray(numpy.zeros((3, 3)))
a[0, 1] = 42
print(a)  # a[1, 0] == 42 too!
(or the equivalent with matrices instead of arrays, depending on your needs). This approach even handles more complicated assignments, like a[:, 1] = -1, which correctly sets a[1, :] elements.
Note that Python 2 also allowed writing def __setitem__(self, (i, j), value); Python 3 removed the possibility of writing def …(…, (i, j),…), which is why the index pair is unpacked inside the method here.
The more general issue of optimal treatment of symmetric matrices in numpy bugged me too.
After looking into it, I think the answer is probably that numpy is somewhat constrained by the memory layout supported by the underlying BLAS routines for symmetric matrices.
While some BLAS routines do exploit symmetry to speed up computations on symmetric matrices, they still use the same memory structure as a full matrix, that is, n^2 space rather than n(n+1)/2. They are just told that the matrix is symmetric and to use only the values in either the upper or the lower triangle.
Some of the scipy.linalg routines do accept flags (like sym_pos=True on linalg.solve) which get passed on to BLAS routines, although more support for this in numpy would be nice, in particular wrappers for routines like DSYRK (symmetric rank k update), which would allow a Gram matrix to be computed a fair bit quicker than dot(M.T, M).
(Might seem nitpicky to worry about optimising for a 2x constant factor on time and/or space, but it can make a difference to that threshold of how big a problem you can manage on a single machine...)
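As a sketch of what using DSYRK for a Gram matrix can look like: SciPy exposes the low-level routine as scipy.linalg.blas.dsyrk, and the example below assumes float64 data. The matrix M is random and only for illustration, and no timing claim is made here.

```python
import numpy as np
from scipy.linalg.blas import dsyrk

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 3))

# trans=1 requests M^T M (trans=0 would give M M^T).
# With the default lower=0, only the upper triangle of the result is filled.
G_upper = dsyrk(1.0, M, trans=1)

# Mirror the strict upper triangle to obtain the full symmetric Gram matrix.
G = np.triu(G_upper) + np.triu(G_upper, k=1).T

# Reference computed with a general matrix product.
G_ref = np.dot(M.T, M)
```

The routine only computes one triangle, which is where the potential saving over a general product comes from; the mirroring step is only needed if the full matrix is required afterwards.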
There are a number of well-known ways of storing symmetric matrices so they don't need to occupy n^2 storage elements. Moreover, it is feasible to rewrite common operations to access these revised means of storage. The definitive work is Golub and Van Loan, Matrix Computations, 3rd edition 1996, Johns Hopkins University Press, sections 1.2.7-1.2.9. For example, quoting them from form (1.2.2), for a symmetric matrix one only needs to store A = [a_{i,j}] for i >= j. Then, assuming the vector holding the matrix is denoted V, and that A is n-by-n, put a_{i,j} in
V[(j-1)n - j(j-1)/2 + i]
This assumes 1-indexing.
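The index formula can be tried out with a short sketch. The helper names packed_index and pack are made up here; the code keeps the 1-based i and j of the formula and only converts to 0-based positions when touching NumPy arrays.

```python
import numpy as np

def packed_index(i, j, n):
    """1-based position in V of a_{i,j} (1-based i, j) for an n-by-n symmetric matrix."""
    if i < j:
        # Symmetry: a_{i,j} == a_{j,i}, and only i >= j is stored.
        i, j = j, i
    return (j - 1) * n - j * (j - 1) // 2 + i

def pack(A):
    """Store the lower triangle of A column by column in a vector of length n(n+1)/2."""
    n = A.shape[0]
    V = np.empty(n * (n + 1) // 2, dtype=A.dtype)
    for j in range(1, n + 1):
        for i in range(j, n + 1):
            V[packed_index(i, j, n) - 1] = A[i - 1, j - 1]
    return V

A = np.array([[1, 2, 4],
              [2, 3, 5],
              [4, 5, 6]])
V = pack(A)

# Look up a_{2,3} (1-based) through the packed vector.
a_23 = V[packed_index(2, 3, 3) - 1]
```

For n = 3 the six stored entries are a_11, a_21, a_31, a_22, a_32, a_33, in that order.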
Golub and Van Loan offer an Algorithm 1.2.3 which shows how to access such a stored V to calculate y = V x + y.
Golub and Van Loan also provide a way of storing a matrix in diagonal dominant form. This does not save storage, but supports ready access for certain other kinds of operations.
This is plain python and not numpy, but I just threw together a routine to fill a symmetric matrix (and a test program to make sure it is correct):
import random

# fill a symmetric matrix with costs (i.e. m[x][y] == m[y][x])
# For demonstration purposes, this routine connects each node to all the others
# Since a matrix stores the costs, numbers are used to represent the nodes
# so the row and column indices can represent nodes

def fillCostMatrix(dim):        # square array of arrays
    # Create zero matrix
    new_square = [[0 for row in range(dim)] for col in range(dim)]
    # fill in main diagonal
    for v in range(0,dim):
        new_square[v][v] = random.randrange(1,10)
    # fill upper and lower triangles symmetrically by replicating diagonally
    for v in range(1,dim):
        iterations = dim - v
        x = v
        y = 0
        while iterations > 0:
            new_square[x][y] = new_square[y][x] = random.randrange(1,10)
            x += 1
            y += 1
            iterations -= 1
    return new_square

# sanity test
def test_symmetry(square):
    dim = len(square[0])
    isSymmetric = ''
    for x in range(0, dim):
        for y in range(0, dim):
            if square[x][y] != square[y][x]:
                isSymmetric = 'NOT'
    print("Matrix is", isSymmetric, "symmetric")

def showSquare(square):
    # Print out square matrix
    columnHeader = ' '
    for i in range(len(square)):
        columnHeader += ' ' + str(i)
    print(columnHeader)
    i = 0
    for col in square:
        print(i, col)    # print row number and data
        i += 1

def myMain(argv):
    if len(argv) == 1:
        nodeCount = 6
    else:
        try:
            nodeCount = int(argv[1])
        except ValueError:
            print("argument must be numeric")
            quit()
    # keep nodeCount <= 9 to keep the cost matrix pretty
    costMatrix = fillCostMatrix(nodeCount)
    print("Cost Matrix")
    showSquare(costMatrix)
    test_symmetry(costMatrix)   # sanity test

if __name__ == "__main__":
    import sys
    myMain(sys.argv)

# vim:tabstop=8:shiftwidth=4:expandtab
To construct an NxN matrix that is symmetric about the main diagonal, with 0's on the main diagonal, you can do:
a = np.array([1, 2, 3, 4, 5])
b = np.zeros(shape=(a.shape[0], a.shape[0]))
upper = np.triu(b + a)
lower = np.tril(np.transpose(b + a))
D = (upper + lower) * (np.full(a.shape[0], fill_value=1) - np.eye(a.shape[0]))
This is kind of a special case, but recently I've used this kind of matrix for network adjacency representation.
Hope that helps.
Cheers.
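For a quick look at what this construction produces, here is the same recipe on a shorter, made-up input. The only change is that the mask of ones is built as a full (n, n) array with np.ones, which gives the same result as the broadcast 1-D np.full above.

```python
import numpy as np

a = np.array([1, 2, 3])
n = a.shape[0]
b = np.zeros(shape=(n, n))

# Every row of b + a equals a; keep its upper triangle.
upper = np.triu(b + a)
# Every column of the transpose equals a; keep its lower triangle.
lower = np.tril(np.transpose(b + a))

# Add the two triangles and zero out the main diagonal.
D = (upper + lower) * (np.ones((n, n)) - np.eye(n))
```

The entry D[i, j] for i != j ends up being a[max(i, j)], so the matrix is symmetric by construction.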
It is trivial to Pythonically fill in [i][j] if [j][i] is filled in. The storage question is a little more interesting. One can augment the numpy array class with a packed attribute that is useful both to save storage and to later read the data.
import numpy as np

class Sym(np.ndarray):
    # wrapper class for numpy array for symmetric matrices. New attribute can pack matrix to optimize storage.
    # Usage:
    # If you have a symmetric matrix A as a shape (n,n) numpy ndarray, Sym(A).packed is a shape (n(n+1)/2,) numpy array
    # that is a packed version of A. To convert it back, just wrap the flat list in Sym(). Note that Sym(Sym(A).packed)

    def __new__(cls, input_array):
        obj = np.asarray(input_array).view(cls)

        if len(obj.shape) == 1:
            l = obj.copy()
            p = obj.copy()
            m = int((np.sqrt(8 * len(obj) + 1) - 1) / 2)
            # the input length must be the triangular number m(m+1)/2
            if m * (m + 1) // 2 == len(obj):
                A = np.zeros((m, m))
                for i in range(m):
                    A[i, i:] = l[:(m-i)]
                    A[i:, i] = l[:(m-i)]
                    l = l[(m-i):]
                obj = np.asarray(A).view(cls)
                obj.packed = p
            else:
                raise ValueError('One dimensional input length must be a triangular number.')

        elif len(obj.shape) == 2:
            if obj.shape[0] != obj.shape[1]:
                raise ValueError('Two dimensional input must be a square matrix.')
            packed_out = []
            for i in range(obj.shape[0]):
                packed_out.append(obj[i, i:])
            obj.packed = np.concatenate(packed_out)

        else:
            raise ValueError('Input array must be 1 or 2 dimensional.')

        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.packed = getattr(obj, 'packed', None)
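The same row-wise packing of the upper triangle can also be sketched without a subclass, using np.triu_indices. This is an alternative formulation, not the class above; the function names pack_upper and unpack_upper are made up for illustration.

```python
import numpy as np

def pack_upper(a):
    """Return the upper triangle of a square array, row by row, as a 1-D array."""
    a = np.asarray(a)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError('Input must be a square matrix.')
    return a[np.triu_indices(a.shape[0])]

def unpack_upper(packed):
    """Rebuild the full symmetric matrix from its packed upper triangle."""
    packed = np.asarray(packed)
    n = int((np.sqrt(8 * packed.size + 1) - 1) / 2)
    if n * (n + 1) // 2 != packed.size:
        raise ValueError('Input length must be a triangular number.')
    out = np.zeros((n, n), dtype=packed.dtype)
    rows, cols = np.triu_indices(n)
    out[rows, cols] = packed   # upper triangle, including the diagonal
    out[cols, rows] = packed   # mirrored lower triangle
    return out

A = np.array([[1, 2, 3],
              [2, 4, 5],
              [3, 5, 6]])
packed = pack_upper(A)
restored = unpack_upper(packed)
```

A 3x3 symmetric matrix is stored in 6 numbers instead of 9; in general n(n+1)/2 instead of n^2.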