Numpy: populating arrays to vectorize nested summations


I would like to vectorize nested sums in Python, in order to speed up the process. At the moment I have nested for loops.
for ja in np.arange(0,Na):
    for jb in np.arange(0,Nb):
        for ma in np.arange(-ja,ja+1):
            ...
The end result is the sum across 2x2 matrices, each with entries dependent on the values of ja,jb,ma,mb.
The matrices look like:
[[f11(ja,jb,ma,mb),f12(ja,jb,ma,mb)],
[f21(ja,jb,ma,mb),f22(ja,jb,ma,mb)]]
where fij are functions. The functions can be applied to arrays, as they work element by element (exponentials, square roots, trig functions, etc.). I can create arrays like:
ja=[0,0,0,1,1,1,2,2,2,3,3,3]
jb=[0,1,2,3,0,1,2,3,0,1,2,3]
By using
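range_a = np.arange(0, Na//2 + 1)
range_b = np.arange(0, Nb//2 + 1)
ja = np.repeat(range_a, Nb//2 + 1)  # each ja value repeated once per jb value
jb = np.tile(range_b, Na//2 + 1)    # the whole jb range repeated once per ja value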
But my trouble is to create arrays, such that for each value in the above j we have the m structure (from -j to j):
ma=[0,0,0,-1,-1,-1,0,0,0,1,1,1,-2,-2,-2,...]
mb=[0,-1,0,1,-2,-1,0,1,2,...]
I am having trouble making those m arrays, though. Each time a -j,...,j structure repeats it has a different length, so I cannot use functions like numpy.tile and numpy.repeat. Any ideas on how to do this?
My further intentions might be relevant: I hope to be able to pad them with zeros and construct 2*length matrices, so that only one entry is populated (I need 4 matrices for each vector, total of 4 vectors). Then I can apply the functions to these and add them up. To calculate the summation across the 2*2 matrices I will dot product with a matrix that is length*4 in shape. The result is 2*2. Perhaps a better strategy exists? I thought this might have occurred before, as it has a common application in physics (trace over a density operator), but I have not found it.
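One possible way to build the ragged -j..j runs without Python loops is to pass an array of per-element counts to np.repeat and subtract each run's start offset. The sketch below assumes that the elided innermost loop runs mb from -jb to jb, and that ja and jb run over range(Na) and range(Nb) as in the loops above; ragged_m and index_arrays are hypothetical helper names, not NumPy functions.

```python
import numpy as np

def ragged_m(j):
    """For each entry j[k], emit the run -j[k], ..., j[k].

    Returns (m, j_rep), where j_rep repeats each j[k] once per m value.
    """
    j = np.asarray(j)
    counts = 2 * j + 1                     # length of each run
    starts = np.cumsum(counts) - counts    # offset where each run begins
    j_rep = np.repeat(j, counts)
    # Position inside each run: 0, 1, ..., 2*j
    pos = np.arange(counts.sum()) - np.repeat(starts, counts)
    return pos - j_rep, j_rep

def index_arrays(Na, Nb):
    """Flat ja, jb, ma, mb arrays, in the same order as the nested loops."""
    ja, jb = np.meshgrid(np.arange(Na), np.arange(Nb), indexing="ij")
    ja, jb = ja.ravel(), jb.ravel()
    ma, ja_rep = ragged_m(ja)              # expand ma for every (ja, jb) pair
    jb_rep = np.repeat(jb, 2 * ja + 1)
    mb, jb_full = ragged_m(jb_rep)         # expand mb for every (ja, jb, ma)
    reps = 2 * jb_rep + 1
    return np.repeat(ja_rep, reps), jb_full, np.repeat(ma, reps), mb

ja, jb, ma, mb = index_arrays(2, 3)
```

With these four flat arrays, each matrix entry becomes a single reduction, for example f11(ja, jb, ma, mb).sum(), so the 2x2 result can be assembled from four sums without any zero padding.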

Related

Ordering a two-dimensional array relative to the main diagonal

Given a two-dimensional array T of size NxN, filled with various natural numbers (they do not have to be sorted in any way, as they happen to be in the example below), my task is to write a program that transforms the array in such a way that all elements lying above the main diagonal are larger than each element lying on the diagonal, and all elements lying below the main diagonal are smaller than each element on the diagonal.
For example:
T looks like this:
[2,3,5][7,11,13][17,19,23] and one of the possible solutions is:
[13,19,23][3,7,17][5,2,11]
I have no clue how to do this. Would anyone have an idea what algorithm should be used here?

Let's say the matrix is NxN.
Put all N² values inside an array.
Sort the array with whatever method you prefer (ascending order).
In your final array, the (N²-N)/2 first values go below the diagonal, the following N values go to the diagonal, and the final (N²-N)/2 values go above the diagonal.
The following pseudo-code should do the job:
mat <- array[N][N] // To be initialized.
// mat[i][j] follows the [x][y] convention used further down:
// i is the column and j the row, so i>j is above the diagonal.
vec <- array[N*N]
for i : 0 to (N-1)
    for j : 0 to (N-1)
        vec[i*N+j]=mat[i][j]
    next j
next i
sort(vec)
p_below <- 0
p_diag <- (N*N-N)/2
p_above <- (N*N+N)/2
for i : 0 to (N-1)
    for j : 0 to (N-1)
        if (i>j)
            mat[i][j] = vec[p_above]
            p_above <- p_above + 1
        endif
        if (i<j)
            mat[i][j] = vec[p_below]
            p_below <- p_below + 1
        endif
        if (i=j)
            mat[i][j] = vec[p_diag]
            p_diag <- p_diag + 1
        endif
    next j
next i
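The same steps can be sketched in NumPy. This is only one possible rendering, assuming the usual row/column indexing (mat[row][col], so "above the diagonal" means col > row) and distinct values, since ties would break the strict "larger than" requirement; order_about_diagonal is a hypothetical name.

```python
import numpy as np

def order_about_diagonal(mat):
    """Smallest values below the diagonal, middle ones on it, largest above."""
    mat = np.asarray(mat)
    n = mat.shape[0]
    vec = np.sort(mat, axis=None)            # flatten and sort ascending
    k = (n * n - n) // 2                     # cells strictly below the diagonal
    out = np.empty_like(mat)
    out[np.tril_indices(n, -1)] = vec[:k]    # strictly below
    out[np.diag_indices(n)] = vec[k:k + n]   # diagonal
    out[np.triu_indices(n, 1)] = vec[k + n:] # strictly above
    return out

T = [[2, 3, 5], [7, 11, 13], [17, 19, 23]]
result = order_about_diagonal(T)
```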
The code can be heavily optimized by sorting the matrix directly, using a (quite complex) custom sort operator, so that it is sorted "in place". Technically, you build a bijection between the matrix indices and a partitioned set of indices representing the "below diagonal", "diagonal" and "above diagonal" cells.
But I'm unsure that it can be considered an algorithm in itself, because it is highly dependent on the language used AND on how your matrix is stored internally (and how iterators/indices are used). I could write one in C++, but I lack the knowledge to give you such an operator in Python.
Obviously, if you can't use a standard sorting function (because it can't work on anything but an array), then you can write your own, with the tricky comparison built into the algorithm.
For such small matrices, even a bubble sort can work properly, but obviously implementing at least a quicksort would be better.
Some notes on optimizing:
First, there is the trivial bijection from matrix coordinates [x][y] to a linear index [i]: i=x+y*N. The inverse is x=i mod N and y=floor(i/N). With it, you can traverse the matrix as a vector.
This is already what I do in the first part, when initializing vec, BTW.
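As a quick check of the flat-index mapping i = x + y*N, here is a small round trip; to_flat and from_flat are hypothetical names used only for illustration.

```python
def to_flat(x, y, n):
    """Matrix coordinates (x = column, y = row) to linear index."""
    return x + y * n

def from_flat(i, n):
    """Linear index back to (x, y): y is the quotient, x the remainder."""
    y, x = divmod(i, n)
    return x, y

N = 3
pairs = [from_flat(i, N) for i in range(N * N)]
```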
With matrix coordinates, it's easy:
Diagonal is all cells where x=y.
The "below" partition is everywhere x<y.
The "above" partition is everywhere x>y.
Look at the coordinates in the 3x3 matrix below; it's quite evident once you see it.
0,0 1,0 2,0
0,1 1,1 2,1
0,2 1,2 2,2
We already know that the ordered vector will be composed of three parts: first the "below" partition, then the "diagonal" partition, then the "above" partition.
The next bijection is much trickier, since it requires either a piecewise linear function OR a look-up table. The first requires no additional memory but uses more CPU power; the second uses as much memory as the matrix but requires less CPU power.
As always, optimizing for speed often costs memory. If memory is scarce because you use huge matrices, then you'll prefer a function.
To keep it short, I'll explain only the "below" partition. In the vector, the first (N-1) elements will be the ones belonging to the first column. Then we'll have (N-2) elements for the 2nd column, (N-3) for the third, and so on until there is only 1 element for the (N-1)th column. You see the pattern: the sum of the number of elements and the column (zero-based index) is always (N-1).
I won't write the function, because it's quite complex and, honestly, it won't help much with understanding. Simply know that converting from matrix indices to the vector index is "quite easy".
The opposite is trickier and more CPU-intensive, and it SHOULD use an (N-1)-element vector storing where each column starts within the vector to GREATLY speed up the process. Thankfully, this vector can also be used (from end to beginning) for the "above" partition, so it won't burn too much memory.
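For the "below" partition only, the column-start vector and the vector-to-matrix lookup might look like the following. This is a sketch under the x<y convention described above, not the answer's own function; below_starts and below_to_matrix are hypothetical names.

```python
import numpy as np

def below_starts(n):
    """Start offset of each column's run inside the 'below' partition."""
    counts = (n - 1) - np.arange(n - 1)    # column x holds n-1-x cells
    return np.cumsum(counts) - counts      # (n-1)-element look-up vector

def below_to_matrix(k, starts):
    """Map index k within the 'below' partition to coordinates (x, y), x < y."""
    x = int(np.searchsorted(starts, k, side="right")) - 1  # column containing k
    y = x + 1 + (k - int(starts[x]))                       # offset inside column
    return x, y

starts = below_starts(4)
cells = [below_to_matrix(k, starts) for k in range(6)]
```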
Now you can sort your "vector" normally, simply by chaining the two bijections together with the vector index, and you'll get a matrix cell instead. As long as the sorting algorithm is stable (that's usually the case), it will work and will sort your matrix "in place", at the expense of a lot of computation to "route" the linear indices to matrix indices.
Please note that, although we speak about bijections, we need ONLY the "vector to matrix" formulas. The "matrix to vector" ones are important - it MUST be a bijection! - but you won't use them, since you'll directly sort the (virtual) vector from 0 to N²-1.

Swap the rows of one matrix to make sure each matrix obtained is different from the others?

I have an NxM matrix.
N is large: N >> 10000.
I wonder if there is an algorithm to shuffle all the rows of a matrix to get, for example, 100 matrices. The resulting matrices C must not be identical.
Thoughts?

So, do you want to keep the shape of the matrix and just shuffle the rows or do you want to get subsets of the matrix?
For the first case I think the permutation function from numpy could be your choice. Just create a permutation of an index list, as Souin proposes.
For the second case, just use the numpy choice function (also from the random module) without replacement, if I understood your needs correctly.
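Both cases can be sketched with NumPy's random Generator. The example assumes the rows themselves are unique (otherwise two different permutations could still give identical matrices) and that the requested count does not exceed the number of possible permutations; shuffled_copies is a hypothetical helper name.

```python
import numpy as np

def shuffled_copies(mat, count, seed=None):
    """Return `count` row-shuffled copies of mat, each from a distinct permutation."""
    rng = np.random.default_rng(seed)
    seen = set()
    copies = []
    while len(copies) < count:
        perm = rng.permutation(mat.shape[0])   # shuffled row indices
        key = perm.tobytes()
        if key in seen:                        # extremely unlikely for large N
            continue
        seen.add(key)
        copies.append(mat[perm])
    return copies

mat = np.arange(12).reshape(4, 3)
copies = shuffled_copies(mat, 5, seed=0)

# Second case: a random subset of rows, drawn without replacement.
rng = np.random.default_rng(0)
subset = mat[rng.choice(mat.shape[0], size=2, replace=False)]
```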

Array operations using multiple indices of same array

I am very new to Python, and I am trying to get used to performing Python's array operations rather than looping through arrays. Below is an example of the kind of looping operation I am doing, but am unable to work out a suitable pure array operation that does not rely on loops:
import numpy as np
def f(arg1, arg2):
    # an arbitrary function
    ...
def myFunction(a1DNumpyArray):
    A = a1DNumpyArray
    # Create a square array with each dimension the size of the argument array.
    B = np.zeros((A.size, A.size))
    # Function f is a function of two elements of the 1D array. For each
    # element, i, I want to perform the function on it and every element
    # before it, and store the result in the square array, multiplied by
    # the difference between the ith and (i-1)th element.
    for i in range(A.size):
        B[i,:i] = f(A[i], A[:i])*(A[i]-A[i-1])
    # Sum through j and return full sums as 1D array.
    return np.sum(B, axis=0)
In short, I am integrating a function which takes two elements of the same array as arguments, returning an array of results of the integral.
Is there a more compact way to do this, without using loops?

The use of an arbitrary function f, and this [i, :i] business, complicate bypassing the loop.
Most of the fast compiled numpy operations work on the whole array, or whole rows and/or columns, and effectively do so in parallel. Loops that are inherently sequential (value from one loop depends on the previous) don't fit well. And different size lists or arrays in each loop are also a good indicator that 'vectorizing' will be difficult.
for i in range(A.size):
    B[i,:i] = f(A[i], A[:i])*(A[i]-A[i-1])
With a sample A and known f (as simple as arg1*arg2), I'd generate a B array, and look for patterns that treat B as a whole. At first glance it looks like your B is a lower triangle. There are functions to help index those. But that final sum might change the picture.
Sometimes I tackle these problems with a bottom up approach, trying to remove inner loops first. But in this case, I think some sort of big-picture approach is needed.
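For the sample case mentioned above (f as simple as arg1*arg2), one whole-array formulation is sketched below. It assumes f broadcasts over NumPy arrays, which an arbitrary f may not; the function names are hypothetical. It evaluates f on the full square grid and keeps only the strict lower triangle, so it does about twice the work of the loop but in compiled code.

```python
import numpy as np

def f(arg1, arg2):
    # Stand-in for the arbitrary function; assumed to broadcast elementwise.
    return arg1 * arg2

def my_function_loop(A):
    B = np.zeros((A.size, A.size))
    for i in range(A.size):
        B[i, :i] = f(A[i], A[:i]) * (A[i] - A[i - 1])
    return np.sum(B, axis=0)

def my_function_vectorized(A):
    F = f(A[:, None], A[None, :])          # F[i, j] = f(A[i], A[j])
    step = A - np.roll(A, 1)               # step[i] = A[i] - A[i-1]
    B = np.tril(F * step[:, None], k=-1)   # keep only j < i, like B[i, :i]
    return B.sum(axis=0)

A = np.linspace(0.0, 1.0, 6) ** 2
```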

Large matrix multiplication in Python - what is the best option?

I have two boolean sparse square matrices of c. 80,000 x 80,000 generated from 12 MB of data (and am likely to have matrices orders of magnitude larger when I use GBs of data).
I want to multiply them (which produces a triangular matrix - however I don't get this since I don't limit the dot product to yield a triangular matrix).
I am wondering what the best way of multiplying them is (memory-wise and speed-wise) - I am going to do the computation on a m2.4xlarge AWS instance which has >60GB of RAM. I would prefer to keep the calc in RAM for speed reasons.
I appreciate that SciPy has sparse matrices and so does h5py, but have no experience in either.
What's the best option to go for?
Thanks in advance
UPDATE: sparsity of the boolean matrices is <0.6%

If your matrices are relatively empty it might be worthwhile encoding them as a data structure of the non-False values. Say a list of tuples describing the location of the non-False values. Or a dictionary with the tuples as the keys.
If you use e.g. a list of tuples you could use a list comprehension to find the items in the second list that can be multiplied with an element from the first list.
a = [(0,0), (3,7), (5,2)] # et cetera
b = ... # idem
res = set()
for r, c in a:
    # (r, c) in a combines with every (j, k) in b whose row j equals c
    res.update((r, k) for j, k in b if c == j)

-- EDITED TO SATISFY BELOW COMMENT / DOWNVOTER --
You're asking how to multiply matrices fast and easy.
SOLUTION 1: This is a solved problem: use numpy. All these operations are easy in numpy, and since they are implemented in C, are rather blazingly fast.
http://www.numpy.org/
http://www.scipy.org
also see:
Very large matrices using Python and NumPy
http://docs.scipy.org/doc/scipy/reference/sparse.html
SciPy has sparse matrices and sparse matrix multiplication (scipy.sparse; NumPy itself only provides dense arrays). It doesn't use much memory, since the compressed formats (such as CSR and CSC) store only the nonzero entries and their indices, and thus use only the memory required for the datapoints, plus some overhead. And it will almost certainly be blazingly fast compared to a pure Python solution.
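A minimal sketch of Solution 1 with scipy.sparse follows, on a small random example at roughly the sparsity quoted in the question. The sizes, the random data and the helper name random_bool_csr are assumptions for illustration; integer data is used so that the product counts matches, and the result is then cast to boolean.

```python
import numpy as np
from scipy import sparse

def random_bool_csr(n, nnz, rng):
    """n x n CSR matrix with about nnz entries set to 1 (random positions)."""
    rows = rng.integers(0, n, size=nnz)
    cols = rng.integers(0, n, size=nnz)
    data = np.ones(nnz, dtype=np.int32)
    m = sparse.csr_matrix((data, (rows, cols)), shape=(n, n))
    m.data[:] = 1          # duplicate positions were summed; clamp back to 1
    return m

rng = np.random.default_rng(0)
a = random_bool_csr(500, 1500, rng)   # about 0.6% of entries set
b = random_bool_csr(500, 1500, rng)

product = (a @ b).astype(bool)        # sparse product, then back to boolean
```

Only the stored entries take part in the product, so memory use scales with the number of nonzeros in the inputs and the result rather than with n squared.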
SOLUTION 2
Another answer here suggests storing values as tuples of (x, y), presuming value is False unless it exists, then it's true. Alternate to this is a numeric matrix with (x, y, value) tuples.
REGARDLESS: Multiplying these would be Nasty time-wise: find element one, decide which other array element to multiply by, then search the entire dataset for that specific tuple, and if it exists, multiply and insert the result into the result matrix.
SOLUTION 3 ( PREFERRED vs. Solution 2, IMHO )
I would prefer this because it's simpler / faster.
Represent your sparse matrix with a set of dictionaries. Matrix one is a dict with the element at (x, y) and value v being (with x1,y1, x2,y2, etc.):
matrixDictOne = { 'x1:y1' : v1, 'x2:y2': v2, ... }
matrixDictTwo = { 'x1:y1' : v1, 'x2:y2': v2, ... }
Since a Python dict lookup is O(1) on average (it is a hash table), it's fast. This does not require searching the entire second matrix's data for element presence before multiplication. So, it's fast. It's easy to write the multiply and easy to understand the representations.
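A pure-Python sketch of the dictionary idea for boolean matrices is shown below. It departs from the 'x:y' string keys above: as an assumption made for simplicity, the second matrix is indexed by row, mapping each row to the set of its True columns, so the matching entries for a given element are found with one lookup. The names to_rows and bool_matmul are hypothetical.

```python
from collections import defaultdict

def to_rows(entries):
    """Index (row, col) pairs by row: row -> set of True columns."""
    rows = defaultdict(set)
    for r, c in entries:
        rows[r].add(c)
    return rows

def bool_matmul(a_entries, b_entries):
    """Boolean product: (r, k) is True if some c has a[r, c] and b[c, k]."""
    b_rows = to_rows(b_entries)
    result = set()
    for r, c in a_entries:
        for k in b_rows.get(c, ()):    # one dict lookup per element of a
            result.add((r, k))
    return result

a = [(0, 0), (3, 7), (5, 2)]
b = [(0, 1), (7, 3), (2, 2), (4, 4)]
product = bool_matmul(a, b)
```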
SOLUTION 4 (if you are a glutton for punishment)
Code this solution by using a memory-mapped file of the required size. Initialize a file with null values of the required size. Compute the offsets yourself and write to the appropriate locations in the file as you do the multiplication. Linux has a VMM which will page in and out for you with little overhead or work on your part. This is a solution for very, very large matrices that are NOT SPARSE and thus won't fit in memory.
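With NumPy, the offset bookkeeping can be left to np.memmap, which exposes a file as an array. The sketch below is only an illustration on a tiny example: the file path, sizes, block size and the in-memory stand-in inputs are all assumptions, and real inputs of this kind would themselves be memory-mapped.

```python
import os
import tempfile
import numpy as np

n, block = 200, 50
path = os.path.join(tempfile.mkdtemp(), "result.dat")  # hypothetical location

# Stand-in dense inputs, small enough to live in memory for the example.
a = np.ones((n, n), dtype=np.float32)
b = np.ones((n, n), dtype=np.float32)

# File-backed result array; the OS pages it in and out as needed.
result = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n))
for start in range(0, n, block):
    stop = start + block
    result[start:stop] = a[start:stop] @ b   # one block of rows at a time
result.flush()                               # push pending writes to disk
```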
Note this solves the complaint of the below complainer that it won't fit in memory. However, the OP did say sparse, which implies very few actual datapoints spread out in giant arrays, and SciPy handles this natively and thus nicely (lots of people at Fermilab use Numpy / SciPy regularly, I'm confident the sparse matrix code is well tested).

Scipy sparse triangular matrix?

I am using Scipy to construct a large, sparse (250k X 250k) co-occurrence matrix using scipy.sparse.lil_matrix. Co-occurrence matrices are symmetric; that is, M[i,j] == M[j,i]. Since it would be highly inefficient (and in my case, impossible) to store all the data twice, I'm currently storing only the upper triangle: data at the coordinate (i,j) where i is always smaller than j. So in other words, I have a value stored at (2,3) and no value stored at (3,2), even though (3,2) in my model should be equal to (2,3). (See the matrix below for an example.)
My problem is that I need to be able to randomly extract the data corresponding to a given index, but, at least the way I'm currently doing it, half the data is in the row and half is in the column, like so:
M =
[1 2 3 4
0 5 6 7
0 0 8 9
0 0 0 10]
So, given the above matrix, I want to be able to do a query like M[1], and get back [2,5,6,7]. I have two questions:
1) Is there a more efficient (preferably built-in) way to do this than first querying the row, and then the column, and then concatenating the two? This is bad because whether I use CSC (column-based) or CSR (row-based) internal representation, one of the two queries is highly inefficient.
2) Am I even using the right part of Scipy? I have seen a few functions in the Scipy library that mention triangular matrices, but they seem to revolve around getting triangular matrices from a full matrix. In my case, (I think) I already have a triangular matrix, and want to manipulate it.
Many thanks.

I would say that you can't have the cake and eat it too: if you want efficient storage, you cannot store full rows (as you say); if you want efficient row access, I'd say that you have to store full rows.
While real performance depends on your application, you could check whether the following approach works for you:
You use Scipy's sparse matrices for efficient storage.
You automatically symmetrize your matrix (there is a small recipe on StackOverflow, that works at least on regular matrices).
You can then access its rows (or columns); whether this is efficient depends on the implementation of sparse matrices…
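The symmetrize-then-slice approach can be sketched on the 4x4 example from the question. This is one possible recipe, not necessarily the one referred to above: it adds the transposed strict upper triangle, which stores every off-diagonal value twice, so it trades the memory saving for fast row access.

```python
import numpy as np
from scipy import sparse

# Upper-triangular storage, as in the question's example matrix.
upper = sparse.csr_matrix(np.array([
    [1, 2, 3, 4],
    [0, 5, 6, 7],
    [0, 0, 8, 9],
    [0, 0, 0, 10],
]))

# Mirror the strict upper triangle below the diagonal; CSR gives fast row access.
full = (upper + sparse.triu(upper, k=1).T).tocsr()

row1 = full[1].toarray().ravel()   # [2 5 6 7]
```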
