Given a two-dimensional array T of size NxN, filled with various natural numbers (they do not have to be sorted in any way, even though they happen to be in the example below). My task is to write a program that transforms the array so that every element above the main diagonal is larger than every element on the diagonal, and every element below the main diagonal is smaller than every element on the diagonal.
For example:
T looks like this:
[2,3,5][7,11,13][17,19,23] and one of the possible solutions is:
[13,19,23][3,7,17][5,2,11]
I have no clue how to do this. Would anyone have an idea what algorithm should be used here?
Let's say the matrix is NxN.
Put all N² values inside an array.
Sort the array with whatever method you prefer (ascending order).
In your final array, the (N²-N)/2 first values go below the diagonal, the following N values go to the diagonal, and the final (N²-N)/2 values go above the diagonal.
The following pseudo-code should do the job:
mat <- array[N][N] // To be initialized; i is the row index, j the column index.
vec <- array[N*N]
for i : 0 to (N-1)
    for j : 0 to (N-1)
        vec[i*N+j] = mat[i][j]
    next j
next i
sort(vec)
p_below <- 0
p_diag <- (N*N-N)/2
p_above <- (N*N+N)/2
for i : 0 to (N-1)
    for j : 0 to (N-1)
        if (i>j) // row > column: below the diagonal, smallest values
            mat[i][j] = vec[p_below]
            p_below <- p_below + 1
        endif
        if (i<j) // row < column: above the diagonal, largest values
            mat[i][j] = vec[p_above]
            p_above <- p_above + 1
        endif
        if (i=j) // on the diagonal, middle values
            mat[i][j] = vec[p_diag]
            p_diag <- p_diag + 1
        endif
    next j
next i
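A minimal Python sketch of the same idea, assuming the matrix is a list of lists indexed as mat[row][col] and that all values are distinct. The function name partition_by_diagonal is made up for this illustration.

```python
def partition_by_diagonal(mat):
    """Return a new N x N matrix: small values below, middle on, large above the diagonal."""
    n = len(mat)
    vec = sorted(value for row in mat for value in row)
    n_off = (n * n - n) // 2              # number of cells on each side of the diagonal
    below = iter(vec[:n_off])             # smallest values
    diag = iter(vec[n_off:n_off + n])     # middle values
    above = iter(vec[n_off + n:])         # largest values
    result = [[0] * n for _ in range(n)]
    for i in range(n):                    # i: row
        for j in range(n):                # j: column
            if i > j:
                result[i][j] = next(below)
            elif i < j:
                result[i][j] = next(above)
            else:
                result[i][j] = next(diag)
    return result


T = [[2, 3, 5], [7, 11, 13], [17, 19, 23]]
print(partition_by_diagonal(T))
```

Any assignment of the three sorted slices to the three regions is a valid answer, so the result differs from the solution shown in the question while still satisfying the condition.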
The code can be heavily optimized by sorting the matrix directly, using a (quite complex) custom sort operator, so it can be sorted "in place". Technically, you'll build a bijection between the matrix indices and a partitioned set of indices representing the "below diagonal", "diagonal" and "above diagonal" cells.
But I'm unsure that it can be considered an algorithm in itself, because it will be highly dependent on the language used AND on how your matrix is stored internally (and how iterators/indices are used). I could write one in C++, but I lack the knowledge to give you such an operator in Python.
Obviously, if you can't use a standard sorting function (because it can't work on anything other than an array), then you can write your own with the tricky comparison built into the algorithm.
For such small matrices, even a bubble sort can work properly, but obviously implementing at least a quicksort would be better.
Elements about optimizing:
First, consider the trivial bijection from matrix coordinates [x][y] to a vector index [i]: i = x + y*N. The inverse is x = i mod N and y = floor(i/N). With it, you can traverse the matrix as a vector.
This is already what I do in the first part initializing vec, BTW.
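As a small illustration of that bijection, here is a sketch with x as the column and y as the row, matching the coordinate table further down. The helper names to_index and to_coords are hypothetical.

```python
N = 3  # matrix size, chosen only for the example


def to_index(x, y, n=N):
    """Matrix coordinates (x = column, y = row) -> position in the flat vector."""
    return x + y * n


def to_coords(i, n=N):
    """Position in the flat vector -> matrix coordinates (x, y)."""
    return i % n, i // n


# Every index maps to one cell and back again.
for i in range(N * N):
    x, y = to_coords(i)
    assert to_index(x, y) == i
```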
With matrix coordinates, it's easy:
Diagonal is all cells where x=y.
The "below" partition is everywhere x<y.
The "above" partition is everywhere x>y.
Look at coordinates in the below 3x3 matrix, it's quite evident when you know it.
0,0 1,0 2,0
0,1 1,1 2,1
0,2 1,2 2,2
We already know that the ordered vector will be composed of three parts: first the "below" partition, then the "diagonal" partition, then the "above" partition.
The next bijection is way more tricky, since it requires either a piecewise linear function OR a look-up table. The first requires no additional memory but uses more CPU power; the second uses as much memory as the matrix but requires less CPU power.
As always, optimizing for speed often costs memory. If memory is scarce because you use huge matrices, then you'll prefer a function.
To keep this short, I'll explain only the "below" partition. In the vector, the first (N-1) elements will be the ones belonging to the first column. Then we'll have (N-2) elements for the 2nd column, (N-3) for the third, until we have only 1 element for the (N-1)th column. You can see the pattern: the number of elements plus the column's zero-based index is always (N-1).
I won't write the function, because it's quite complex and, honestly, it won't help so much to understand. Simply know that converting from matrix indices to vector is "quite easy".
The opposite is more tricky and CPU-intensive, and it SHOULD use an (N-1)-element vector storing where each column starts within the vector to GREATLY speed up the process. Fortunately, this vector can also be used (from end to beginning) for the "above" partition, so it won't burn too much memory.
Now you can sort your "vector" normally: chain the two bijections so that each vector index leads to a matrix cell. Any sorting algorithm that reads and writes elements only through their index will work (stability is not required here), and it will sort your matrix "in place", at the expense of a lot of index arithmetic to "route" the linear indices to matrix indices.
Please note that, although we speak about bijections, we need ONLY the "vector to matrix" formulas. The "matrix to vector" ones matter (the mapping MUST be a bijection!), but you won't use them, since you'll directly sort the (virtual) vector from 0 to N²-1.
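A rough sketch of the look-up table variant, assuming mat[row][col] indexing. The table lists the "below" cells column by column, then the diagonal, then the "above" cells. A plain insertion sort is swapped in here only because it is short; any index-based in-place sort would do. The names build_lookup and sort_matrix_in_place are invented for this example.

```python
def build_lookup(n):
    """Virtual vector index -> (row, col): below part, then diagonal, then above part."""
    below = [(r, c) for c in range(n) for r in range(c + 1, n)]
    diag = [(d, d) for d in range(n)]
    above = [(r, c) for c in range(n) for r in range(c)]
    return below + diag + above


def sort_matrix_in_place(mat):
    n = len(mat)
    lookup = build_lookup(n)

    def get(k):
        r, c = lookup[k]
        return mat[r][c]

    def put(k, value):
        r, c = lookup[k]
        mat[r][c] = value

    # Insertion sort over the virtual vector 0 .. n*n - 1, touching only matrix cells.
    for k in range(1, n * n):
        value = get(k)
        m = k - 1
        while m >= 0 and get(m) > value:
            put(m + 1, get(m))
            m -= 1
        put(m + 1, value)
    return mat


T = [[2, 3, 5], [7, 11, 13], [17, 19, 23]]
sort_matrix_in_place(T)
print(T)
```

The table costs as much memory as the matrix; replacing it with an arithmetic function would trade that memory for extra computation, as described above.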
For alpha and k fixed integers with i < k also fixed, I am trying to encode a sum of the form
where all the x and y variables are known beforehand. (this is essentially the alpha coordinate of a big iterated matrix-vector multiplication)
For a normal sum varying over one index, I usually create a 1D array A, set A[i] equal to the i-indexed entry of the sum, and then use sum(A). In the above instance, however, the entries of the innermost sum depend on the indices in the previous sum, which in turn depend on the indices in the sum before that, all the way back out to the first sum, which prevents me from using this tack in a straightforward manner.
I tried making a 2D array B of appropriate length and width and setting the 0 row to be the entries in the innermost sum, then the 1 row as the entries in the next sum times sum(np.transpose(B),0) and so on, but the value of the first sum (of row 0) needs to vary with each entry in row 1 since that sum still has indices dependent on our position in row 1, so on and so forth all the way up to sum k-i.
A sum which allows for a 'variable' filled in by each position of the array it's summing through would thusly do the trick, but I can't find anything along these lines in numpy and my attempts to hack one together have thus far failed -- my intuition says there is a solution that involves summing along the axes of a k-i dimensional array, but I haven't been able to make this precise yet. Any assistance is greatly appreciated.
One simple attempt to hard-code something like this would be:
for j0 in range(0,n0):
    for j1 in range(0,n1):
        ....
Edit: (a vectorized version)
You could do something like this: (I didn't test it)
temp = np.ones(n[k-i])
for j in range(0,k-i):
    temp = x[:n[k-i-1-j],:n[k-i-j]].T@(y[:n[k-i-j]]*temp)
result = x[alpha,:n[0]]@(y[:n[0]]*temp)
The basic idea is that you try to press it into a matrix-vector form. (note that this is python3 syntax)
Edit: You should note that you need to change the "k-1" to where the innermost sum is (I just did it for all sums up to index k-i)
This is 95% identical to @sehigle's answer, but includes a generic N vector:
def nested_sum(XX, Y, N, alpha):
    intermediate = np.ones(N[-1], dtype=XX.dtype)
    for n1, n2 in zip(N[-2::-1], N[:0:-1]):
        intermediate = np.sum(XX[:n1, :n2] * Y[:n2] * intermediate, axis=1)
    return np.sum(XX[alpha, :N[0]] * Y[:N[0]] * intermediate)
Similarly, I have no knowledge of the expression, so I'm not sure how to build appropriate tests. But it runs :\
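One way to build a test: reading the loop, nested_sum appears to evaluate the chain sum over j0 of XX[alpha, j0]*Y[j0] times the sum over j1 of XX[j0, j1]*Y[j1], and so on, with each index jm running below N[m]. Whether that matches the expression in the question is an assumption. Under that reading, it can be compared against explicit loops on small random data (brute_force is a hypothetical helper):

```python
import itertools

import numpy as np


def nested_sum(XX, Y, N, alpha):
    intermediate = np.ones(N[-1], dtype=XX.dtype)
    for n1, n2 in zip(N[-2::-1], N[:0:-1]):
        intermediate = np.sum(XX[:n1, :n2] * Y[:n2] * intermediate, axis=1)
    return np.sum(XX[alpha, :N[0]] * Y[:N[0]] * intermediate)


def brute_force(XX, Y, N, alpha):
    """Explicit nested loops over every index tuple (j0, j1, ...)."""
    total = 0.0
    for indices in itertools.product(*(range(n) for n in N)):
        term = 1.0
        previous = alpha
        for j in indices:
            term *= XX[previous, j] * Y[j]
            previous = j
        total += term
    return total


rng = np.random.default_rng(0)
XX = rng.random((5, 5))
Y = rng.random(5)
N = [3, 4, 2]
print(np.isclose(nested_sum(XX, Y, N, 1), brute_force(XX, Y, N, 1)))
```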
I can test the rank of a matrix using np.linalg.matrix_rank(A) . But how can I test if all the rows of A are orthogonal efficiently?
I could take all pairs of rows and compute the inner product between them but is there a better way?
My matrix has fewer rows than columns and the rows are not unit vectors.
This answer basically summarizes the approaches mentioned in the question and the comments, and adds some comparison/insights about them
Approach #1 -- checking all row-pairs
As you suggested, you can iterate over all row pairs, and compute the inner product. If A.shape==(N,M), i.e. you have N rows of size M each, you end up with a O(M*N^2) complexity.
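A minimal sketch of this pairwise check with early exit. The function name and the tolerance tol are assumptions made for the example; with floating-point data an exact comparison against zero is rarely what you want.

```python
from itertools import combinations

import numpy as np


def rows_orthogonal_pairwise(A, tol=1e-10):
    """Return False as soon as one pair of rows has a non-negligible inner product."""
    for i, j in combinations(range(A.shape[0]), 2):
        if abs(np.dot(A[i], A[j])) > tol:
            return False
    return True


A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 3.0]])
print(rows_orthogonal_pairwise(A))
```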
Approach #2 -- matrix multiplication
As suggested in the comments by @JoeKington, you can compute the product A.dot(A.T) and check all the off-diagonal elements. Depending on the algorithm used for matrix multiplication, this can be faster than the naive O(M*N^2) algorithm, but only asymptotically: unless your matrices are big, the asymptotically faster algorithms are slower in practice.
The advantages of approach #1:
You can "short circuit" -- quit the check as soon as you find the first non-orthogonal pair
requires less memory. In #2, you create a temporary NxN matrix.
The advantages of approach #2:
The multiplication is fast, as it is implemented in a heavily optimized linear-algebra library (BLAS or ATLAS). I believe those libraries choose the right algorithm according to input size (i.e. they won't use the fancy algorithms on small matrices, because they are slower for small matrices; there's a big constant hidden behind that O-notation).
less code to write
My bet is that for small matrices, approach #2 would prove faster due to the fact the LA libraries are heavily optimized, and despite the fact they compute the entire multiplication, even after processing the first pair of non-orthogonal rows.
It seems that this will do:
product = np.dot(A, A.T)
np.fill_diagonal(product, 0)
rows_are_orthogonal = not product.any()  # True only if every off-diagonal inner product is exactly zero
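With floating-point rows, the inner products are rarely exactly zero, so a tolerance-based variant of the product check may be more practical. This is a sketch; the function name and the default tolerance are assumptions.

```python
import numpy as np


def rows_orthogonal_gram(A, tol=1e-10):
    """Check that all off-diagonal entries of A @ A.T are within tol of zero."""
    gram = A @ A.T
    off_diagonal = gram - np.diag(np.diag(gram))
    return bool(np.all(np.abs(off_diagonal) <= tol))


A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 3.0]])
print(rows_orthogonal_gram(A))
```

Unlike np.fill_diagonal, this version leaves the product untouched, at the price of one more temporary N x N array.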
Approach #3: Compute the QR decomposition of A^T
In general, to find an orthogonal basis of the range space of some matrix X, one can compute the QR decomposition of this matrix (using Givens rotations or Householder reflectors). Q is an orthogonal matrix and R upper triangular. The columns of Q corresponding to non-zero diagonal entries of R form an orthonormal basis of the range space.
If the columns of X=A^T, i.e., the rows of A, already are orthogonal, then the QR decomposition will necessarily have the R factor diagonal, where the diagonal entries are plus or minus the lengths of the columns of X, i.e. of the rows of A.
Common folklore has it that this approach is numerically better behaved than the computation of the product A*A^T = R^T*R. This may only matter for larger matrices. The computation is not as straightforward as the matrix product; however, the number of operations is of the same order.
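A sketch of this check with NumPy's np.linalg.qr, assuming A has full row rank (no zero or dependent rows). The function name and the tolerance are assumptions.

```python
import numpy as np


def rows_orthogonal_qr(A, tol=1e-10):
    """Rows of A are orthogonal iff the R factor of A.T is (numerically) diagonal."""
    _, R = np.linalg.qr(A.T)            # reduced QR: R has shape (n_rows, n_rows)
    off_diagonal = R - np.diag(np.diag(R))
    return bool(np.all(np.abs(off_diagonal) <= tol))


A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 3.0]])
print(rows_orthogonal_qr(A))
```

The absolute values of the diagonal of R are then the row lengths, as described above.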
(U.T @ U == np.eye(U.shape[0])).all()
This gives True if the matrix U is orthogonal and False otherwise. The all() call converts the matrix of boolean values that results from 'U.T @ U == np.eye(U.shape[0])' into a single boolean value.
If you want to check that the matrix is approximately orthonormal (meaning that 'U.T @ U' is nearly equal to an identity matrix),
then use np.allclose() like this:
np.allclose(U.T @ U, np.eye(U.shape[0]))
Note: '@' is used for matrix multiplication
I have two boolean sparse square matrices of c. 80,000 x 80,000 generated from 12 MB of data (and am likely to have orders of magnitude larger matrices when I use GBs of data).
I want to multiply them (which produces a triangular matrix; however, I don't get this, since I don't limit the dot product to yield a triangular matrix).
I am wondering what the best way of multiplying them is (memory-wise and speed-wise) - I am going to do the computation on a m2.4xlarge AWS instance which has >60GB of RAM. I would prefer to keep the calc in RAM for speed reasons.
I appreciate that SciPy has sparse matrices and so does h5py, but have no experience in either.
What's the best option to go for?
Thanks in advance
UPDATE: sparsity of the boolean matrices is <0.6%
If your matrices are relatively empty it might be worthwhile encoding them as a data structure of the non-False values. Say a list of tuples describing the location of the non-False values. Or a dictionary with the tuples as the keys.
If you use e.g. a list of tuples you could use a list comprehension to find the items in the second list that can be multiplied with an element from the first list.
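a = [(0,0), (3,7), (5,2)] # et cetera
b = ... # idem
res = set()  # locations of the True values in the product
for r, c in a:
    # (r, c) in a combines with (j, k) in b whenever c == j
    res.update((r, k) for j, k in b if c == j)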
-- EDITED TO SATISFY BELOW COMMENT / DOWNVOTER --
You're asking how to multiply matrices fast and easy.
SOLUTION 1: This is a solved problem: use numpy. All these operations are easy in numpy, and since they are implemented in C, are rather blazingly fast.
http://www.numpy.org/
http://www.scipy.org
also see:
Very large matrices using Python and NumPy
http://docs.scipy.org/doc/scipy/reference/sparse.html
SciPy and Numpy have sparse matrices and matrix multiplication. It doesn't use much memory since (at least if I wrote it in C) it probably uses linked lists, and thus will only use the memory required for the sum of the datapoints, plus some overhead. And, it will almost certainly be blazingly fast compared to pure python solution.
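As a rough sketch of what that could look like with scipy.sparse: the 8 x 8 shape, the coordinates and the helper names coords_to_csr and bool_matmul are made up for the illustration. The matrices are cast to integers so that the product counts matches, and then turned back into booleans.

```python
import numpy as np
from scipy import sparse


def coords_to_csr(coords, shape):
    """Build a boolean CSR matrix that is True at the given (row, col) positions."""
    rows, cols = zip(*coords)
    data = np.ones(len(coords), dtype=bool)
    return sparse.csr_matrix((data, (rows, cols)), shape=shape)


def bool_matmul(a, b):
    """Boolean product: True at (r, k) if some c has a[r, c] and b[c, k] both True."""
    counts = a.astype(np.int32) @ b.astype(np.int32)
    return counts.astype(bool)


a = coords_to_csr([(0, 0), (3, 7), (5, 2)], shape=(8, 8))
b = coords_to_csr([(0, 1), (7, 4), (2, 2)], shape=(8, 8))
product = bool_matmul(a, b)
print(sorted(zip(*(idx.tolist() for idx in product.nonzero()))))
```

Only the stored entries take memory, so the cost depends on the number of True values rather than on the 80,000 x 80,000 shape.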
SOLUTION 2
Another answer here suggests storing values as tuples of (x, y), presuming value is False unless it exists, then it's true. Alternate to this is a numeric matrix with (x, y, value) tuples.
REGARDLESS: Multiplying these would be Nasty time-wise: find element one, decide which other array element to multiply by, then search the entire dataset for that specific tuple, and if it exists, multiply and insert the result into the result matrix.
SOLUTION 3 ( PREFERRED vs. Solution 2, IMHO )
I would prefer this because it's simpler / faster.
Represent your sparse matrix with a set of dictionaries. Matrix one is a dict with the element at (x, y) and value v being (with x1,y1, x2,y2, etc.):
matrixDictOne = { 'x1:y1' : v1, 'x2:y2': v2, ... }
matrixDictTwo = { 'x1:y1' : v1, 'x2:y2': v2, ... }
Since a Python dict lookup is O(1) on average (the worst case is slower, but rare in practice), it's fast. This does not require searching the entire second matrix's data for element presence before multiplication. It's easy to write the multiply and easy to understand the representations.
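A small sketch of a dict-based boolean multiply. It is a variant of the representation above, not the same one: the second matrix is indexed by row in a dict of sets instead of using 'x:y' string keys, so that all partners of a given column are found with one lookup. The function name is hypothetical.

```python
from collections import defaultdict


def bool_sparse_multiply(a, b):
    """a and b are iterables of (row, col) positions of True values; returns a set."""
    b_by_row = defaultdict(set)
    for j, k in b:
        b_by_row[j].add(k)

    result = set()
    for r, c in a:
        # Every True value in row c of b pairs up with (r, c) of a.
        for k in b_by_row.get(c, ()):
            result.add((r, k))
    return result


a = [(0, 0), (3, 7), (5, 2)]
b = [(0, 1), (7, 4), (2, 2), (2, 6)]
print(sorted(bool_sparse_multiply(a, b)))
```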
SOLUTION 4 (if you are a glutton for punishment)
Code this solution by using a memory-mapped file of the required size. Initialize a file with null values of the required size. Compute the offsets yourself and write to the appropriate locations in the file as you do the multiplication. Linux has a VMM which will page in and out for you with little overhead or work on your part. This is a solution for very, very large matrices that are NOT SPARSE and thus won't fit in memory.
Note this solves the complaint of the below complainer that it won't fit in memory. However, the OP did say sparse, which implies very few actual datapoints spread out in giant arrays, and Numpy / SciPy handle this natively and thus nicely (lots of people at Fermilab use Numpy / SciPy regularly, I'm confident the sparse matrix code is well tested).
This may be more of an 'approach' or conceptual question.
Basically, I have a Python multi-dimensional list like so:
my_list = [[0,1,1,1,0,1], [1,1,1,0,0,1], [1,1,0,0,0,1], [1,1,1,1,1,1]]
What I have to do is iterate through the array and compare each element with those directly surrounding it, as though the list was laid out as a matrix.
For instance, given the first element of the first row, my_list[0][0], I need to know the value of my_list[0][1], my_list[1][0] and my_list[1][1]. The value of the 'surrounding' elements will determine how the current element should be operated on. Of course, for an element in the heart of the array, 8 comparisons will be necessary.
Now I know I could simply iterate through the array and compare with the indexed values, as above. I was curious as to whether there was a more efficient way which limited the amount of iteration required? Should I iterate through the array as is, or iterate and compare only values to either side and then transpose the array and run it again. This, however would ignore those values to the diagonal. And should I store results of the element lookups, so I don't keep determining the value of the same element multiple times?
I suspect this may have a fundamental approach in Computer Science, and I am eager to get feedback on the best approach using Python as opposed to looking for a specific answer to my problem.
You may get faster, and possibly even simpler, code by using numpy, or other alternatives (see below for details). But from a theoretical point of view, in terms of algorithmic complexity, the best you can get is O(N*M), and you can do that with your design (if I understand it correctly). For example:
def neighbors(matrix, row, col):
    for i in row-1, row, row+1:
        if i < 0 or i == len(matrix): continue
        for j in col-1, col, col+1:
            if j < 0 or j == len(matrix[i]): continue
            if i == row and j == col: continue
            yield matrix[i][j]

matrix = [[0,1,1,1,0,1], [1,1,1,0,0,1], [1,1,0,0,0,1], [1,1,1,1,1,1]]
for i, row in enumerate(matrix):
    for j, cell in enumerate(row):
        for neighbor in neighbors(matrix, i, j):
            do_stuff(cell, neighbor)  # do_stuff stands for whatever operation you need
This takes N * M * 8 steps (actually, a bit less than that, because many cells will have fewer than 8 neighbors). And algorithmically, there's no way you can do better than O(N * M). So, you're done.
(In some cases, you can make things simpler, with no significant change either way in performance, by thinking in terms of iterator transformations. For example, you can easily create a grouper over adjacent triplets from a list a by properly zipping a, a[1:], and a[2:], and you can extend this to adjacent 2-dimensional nonets. But I think in this case, it would just make your code more complicated than writing an explicit neighbors iterator and explicit for loops over the matrix.)
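For reference, the triplet grouper mentioned in the parenthesis looks like this on the first row of the example matrix:

```python
a = [0, 1, 1, 1, 0, 1]

# zip stops at the shortest input, so each tuple is (left, middle, right).
triplets = list(zip(a, a[1:], a[2:]))
print(triplets)  # [(0, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 1)]
```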
However, practically, you can get a whole lot faster, in various ways. For example:
Using numpy, you may get an order of magnitude or so faster. When you're iterating a tight loop and doing simple arithmetic, that's one of the things that Python is particularly slow at, and numpy can do it in C (or Fortran) instead.
Using your favorite GPGPU library, you can explicitly vectorize your operations.
Using multiprocessing, you can break the matrix up into pieces and perform multiple pieces in parallel on separate cores (or even separate machines).
Of course for a single 4x6 matrix, none of these are worth doing… except possibly for numpy, which may make your code simpler as well as faster, as long as you can express your operations naturally in matrix/broadcast terms.
In fact, even if you can't easily express things that way, just using numpy to store the matrix may make things a little simpler (and save some memory, if that matters). For example, numpy can let you access a single column from a matrix naturally, while in pure Python, you need to write something like [row[col] for row in matrix].
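A short side-by-side of the two ways to pull out a column, using the matrix from the question (the column index 2 is arbitrary):

```python
import numpy as np

matrix = [[0, 1, 1, 1, 0, 1],
          [1, 1, 1, 0, 0, 1],
          [1, 1, 0, 0, 0, 1],
          [1, 1, 1, 1, 1, 1]]
col = 2

# Pure Python: walk the rows and pick one element from each.
python_column = [row[col] for row in matrix]

# numpy: slice every row at the given column.
array = np.array(matrix)
numpy_column = array[:, col]

print(python_column, numpy_column.tolist())
```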
So, how would you tackle this with numpy?
First, you should read over numpy.matrix and ufunc (or, better, some higher-level tutorial, but I don't have one to recommend) before going too much further.
Anyway, it depends on what you're doing with each set of neighbors, but there are three basic ideas.
First, if you can convert your operation into simple matrix math, that's always easiest.
If not, you can create 8 "neighbor matrices" just by shifting the matrix in each direction, then perform simple operations against each neighbor. For some cases, it may be easier to start with an N+2 x N+2 matrix with suitable "empty" values (usually 0 or nan) in the outer rim. Alternatively, you can shift the matrix over and fill in empty values. Or, for some operations, you don't need an identical-sized matrix, so you can just crop the matrix to create a neighbor. It really depends on what operations you want to do.
For example, taking your input as a fixed 6x4 board for the Game of Life:
import numpy as np

def neighbors(matrix):
    # Yield the 8 copies of the matrix shifted by one cell in each direction.
    for i in -1, 0, 1:
        for j in -1, 0, 1:
            if i == 0 and j == 0: continue
            yield np.roll(np.roll(matrix, i, 0), j, 1)

matrix = np.matrix([[0,0,0,0,0,0,0,0],
                    [0,0,1,1,1,0,1,0],
                    [0,1,1,1,0,0,1,0],
                    [0,1,1,0,0,0,1,0],
                    [0,1,1,1,1,1,1,0],
                    [0,0,0,0,0,0,0,0]])
while True:
    livecount = sum(neighbors(matrix))
    matrix = (matrix & (livecount==2)) | (livecount==3)
(Note that this isn't the best way to solve this problem, but I think it's relatively easy to understand, and likely to illuminate whatever your actual problem is.)
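The padded variant described earlier (an outer rim of "empty" values instead of the wrap-around that np.roll produces) could be sketched as follows, counting the neighbors of every cell of the original 4x6 list. The function name neighbor_counts is made up for the example.

```python
import numpy as np


def neighbor_counts(grid):
    """Sum of the (up to 8) surrounding cells for every cell, with zeros outside the grid."""
    rows, cols = grid.shape
    padded = np.pad(grid, 1, mode="constant", constant_values=0)
    counts = np.zeros_like(grid)
    for di in (0, 1, 2):
        for dj in (0, 1, 2):
            if di == 1 and dj == 1:
                continue  # skip the cell itself
            # Each slice is the grid shifted by one cell in one of 8 directions.
            counts += padded[di:di + rows, dj:dj + cols]
    return counts


grid = np.array([[0, 1, 1, 1, 0, 1],
                 [1, 1, 1, 0, 0, 1],
                 [1, 1, 0, 0, 0, 1],
                 [1, 1, 1, 1, 1, 1]])
print(neighbor_counts(grid))
```

The loop runs 8 times regardless of the grid size; the per-cell work happens inside numpy.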
I am using Scipy to construct a large, sparse (250k X 250k) co-occurrence matrix using scipy.sparse.lil_matrix. Co-occurrence matrices are symmetric; that is, M[i,j] == M[j,i]. Since it would be highly inefficient (and in my case, impossible) to store all the data twice, I'm currently storing data at the coordinate (i,j) where i is always smaller than j. So in other words, I have a value stored at (2,3) and no value stored at (3,2), even though (3,2) in my model should be equal to (2,3). (See the matrix below for an example)
My problem is that I need to be able to randomly extract the data corresponding to a given index, but, at least the way I'm currently doing it, half the data is in the row and half is in the column, like so:
M =
[1 2 3 4
0 5 6 7
0 0 8 9
0 0 0 10]
So, given the above matrix, I want to be able to do a query like M[1], and get back [2,5,6,7]. I have two questions:
1) Is there a more efficient (preferably built-in) way to do this than first querying the row, and then the column, and then concatenating the two? This is bad because whether I use CSC (column-based) or CSR (row-based) internal representation, one of the two queries is highly inefficient.
2) Am I even using the right part of Scipy? I have seen a few functions in the Scipy library that mention triangular matrices, but they seem to revolve around getting triangular matrices from a full matrix. In my case, (I think) I already have a triangular matrix, and want to manipulate it.
Many thanks.
I would say that you can't have the cake and eat it too: if you want efficient storage, you cannot store full rows (as you say); if you want efficient row access, I'd say that you have to store full rows.
While real performances depend on your application, you could check whether the following approach works for you:
You use Scipy's sparse matrices for efficient storage.
You automatically symmetrize your matrix (there is a small recipe on StackOverflow, that works at least on regular matrices).
You can then access its rows (or columns); whether this is efficient depends on the implementation of sparse matrices…
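A small sketch of the symmetrizing step on the 4x4 example from the question, assuming the upper triangle (including the diagonal) is what is stored. Note that the symmetrized matrix holds every off-diagonal value twice, which is exactly the storage trade-off discussed above.

```python
import numpy as np
from scipy import sparse

# Upper-triangular storage, as in the question.
upper = sparse.csr_matrix(np.array([[1, 2, 3, 4],
                                    [0, 5, 6, 7],
                                    [0, 0, 8, 9],
                                    [0, 0, 0, 10]]))

# Mirror the upper triangle; subtract the diagonal once so it is not counted twice.
full = (upper + upper.T - sparse.diags(upper.diagonal())).tocsr()

row_1 = full.getrow(1).toarray().ravel()
print(row_1.tolist())  # [2, 5, 6, 7]
```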