Let's say I have a matrix M of size 10x5 and a set of indices ix of size 4x3. I want to do tf.reduce_sum(tf.gather(M, ix), axis=1), which gives a result of size 4x5. However, doing it this way creates an intermediate gather tensor of size 4x3x5. At these small sizes that isn't a problem, but once the sizes grow large enough I get an OOM error. Since I'm simply summing over axis 1, I never actually need to materialize the full intermediate tensor. So my question is: is there a way to compute the final 4x5 matrix without going through the intermediate 4x3x5 tensor?
I think you can just multiply by a sparse matrix -- I was searching for whether the two are internally equivalent when I landed on your post.
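A minimal sketch of that sparse-multiplication idea in TensorFlow, keeping the 10x5 and 4x3 shapes from the question. The SparseTensor S acts as a 4x10 indicator matrix, so S @ M reproduces tf.reduce_sum(tf.gather(M, ix), axis=1) without the 4x3x5 intermediate (assuming the indices within each row of ix are distinct; duplicate entries in a SparseTensor may need extra care):

```python
import tensorflow as tf

M = tf.random.normal((10, 5))
ix = tf.random.uniform((4, 3), minval=0, maxval=10, dtype=tf.int64)

# Indicator SparseTensor of shape (4, 10): a 1 at (row, ix[row, k]) for each index.
rows = tf.repeat(tf.range(4, dtype=tf.int64), 3)
cols = tf.reshape(ix, [-1])
indices = tf.stack([rows, cols], axis=1)
values = tf.ones([4 * 3], dtype=M.dtype)
S = tf.sparse.reorder(tf.sparse.SparseTensor(indices, values, dense_shape=[4, 10]))

# Shape (4, 5); the 4x3x5 gather result is never materialized.
result = tf.sparse.sparse_dense_matmul(S, M)
```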
I am currently working on a project where I have to calculate the extremal eigenvalues using the Lanczos algorithm. I replaced the MVM (matrix-vector multiplication) so that the matrix elements are calculated on-the-fly, because I will have to calculate the eigenvalues of really huge matrices. This slows down my code because for-loops are slower in Python than an MVM. Is there any easy way to improve my code? I tried using Cython but had no real luck with it.
for i in range(0, dim):
    for j in range(0, dim):
        temp = get_Matrix_Element(i, j)
        ws[i] = ws[i] + temp * v[j]
This replaces:
ws = M.dot(v)
Update
The matrix M is sparse and can, for "small" systems, be stored in a sparse-matrix format using scipy.sparse. For large systems with up to ~10^9 dimensions I need to calculate the matrix elements on-the-fly.
The easiest and fastest-to-implement solution would be to go halfway: precompute one row at a time.
In your original solution (M.dot(v)) you had to store dim x dim elements, which grows quadratically. If you precompute one row at a time, the storage scales linearly and should not cause you trouble (you are already storing a result vector ws of the same size).
The code should be along the lines of:
for i in range(0, dim):
    temp = get_Matrix_Row(i)
    ws[i] = temp.dot(v)
where temp is now a dim x 1 vector.
Such modification should allow more optimizations to happen during the dot product without major code modifications.
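A runnable sketch of the row-at-a-time idea. The get_Matrix_Row below is a hypothetical placeholder; the important point is that the whole row is computed with vectorized numpy operations, which is where the speed-up over the element-by-element double loop comes from:

```python
import numpy as np

dim = 1000
v = np.random.rand(dim)
ws = np.zeros(dim)

def get_Matrix_Row(i):
    # Placeholder for the real on-the-fly matrix elements: the whole row is
    # built with array operations instead of an inner Python loop.
    j = np.arange(dim)
    return 1.0 / (1.0 + np.abs(i - j))

for i in range(dim):
    ws[i] = get_Matrix_Row(i).dot(v)
```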
I'm new to Python, so this is a two-part question. First, I don't understand what this code means. What is DESCR for? Is it supposed to be a description? And in the part with split, I don't understand what values come out:
datasets = [ds.DESCR.split()[0] for ds in datasets]
clf_name = [str(clf).split('(')[0][:12] for clf in clfs]
Second, when do I use np.ones or np.zeros? I know they generate an array of ones or zeros, but what I mean is: when, specifically in data science, do I need to initialize an array with ones or zeros?
This code is creating two lists using list comprehension.
The ds.DESCR and other expressions here can mean anything, depending on the context.
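For illustration, assuming datasets holds scikit-learn dataset objects (which expose a free-text DESCR attribute) and clfs holds classifier instances, the two comprehensions behave like this; the concrete objects below are assumptions, not taken from the question:

```python
from sklearn import datasets as skds
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

datasets = [skds.load_iris(), skds.load_wine()]
clfs = [LogisticRegression(), DecisionTreeClassifier()]

# First whitespace-separated token of each dataset's description text
# (the exact value depends on the scikit-learn version).
names = [ds.DESCR.split()[0] for ds in datasets]

# str(clf) looks like 'LogisticRegression()', so splitting on '(' keeps the
# class name, and [:12] truncates it to 12 characters.
clf_name = [str(clf).split('(')[0][:12] for clf in clfs]

print(clf_name)   # ['LogisticRegr', 'DecisionTree']
```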
As for your second sub-question, I'd advise asking something more specific.
If you need ones, you use np.ones, if you need zeros, you use np.zeros. That's it.
np.zeros is great if, for example, you gradually update your matrix with values: every entry your algorithm does not touch stays zero.
In an application this could be a matrix that marks edges in a picture. You create a matrix the size of the picture filled with zeros and then go over the picture with a kernel that detects edges. For every edge you detect, you increase the value in the matrix at the position of the detected edge.
A matrix or vector of ones is useful for certain matrix multiplications. For example, multiplying a vector of shape (n, 1) by a (1, n) vector filled with ones expands the vector into a matrix of shape (n, n). This and similar cases can make a vector or matrix of ones necessary.
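A small sketch of both patterns described above (a zero-initialized accumulator, and a ones-vector used to expand a column vector by matrix multiplication); the shapes are made up for illustration:

```python
import numpy as np

# Accumulator pattern: start from zeros and only write the entries you touch.
edge_counts = np.zeros((4, 4))
edge_counts[1, 2] += 1          # e.g. an edge detected at row 1, column 2

# Expansion pattern: (n, 1) column vector times a (1, n) row of ones -> (n, n).
v = np.array([[1.0], [2.0], [3.0]])
expanded = v @ np.ones((1, 3))  # each column is a copy of v
```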
I have a matrix M with dimensions (m, n) and I need to append new columns to it from a matrix L with dimensions (m, l). So basically I will end up with a matrix (m, n + l).
No problem in doing this, I can use:
numpy.concatenate
numpy.vstack
numpy.append
in the fashion np.command(M, L), and it returns a new matrix. The problem is that I need to append many, many matrices to the original matrix, and the sizes of these L matrices are not known beforehand.
So I ended up with
# M is my original matrix
while True:
    # find out my L matrix
    M = np.append(M, L)
    # check if I do not need to append any more matrices, and break if so
Knowing that my matrix M has approximately 100k rows and I add on average 5k columns, the process is super slow and takes more than a couple of hours (I don't know exactly how long because I gave up after 2 hours).
The problem is clearly in this append call (I tried it with vstack and nothing changes). Also, if I just calculate the matrices L without appending them, the task takes less than 10 minutes. I assume the reassignment of the matrix is what makes it slow. Intuitively that makes sense, because I am constantly recreating the matrix M and throwing away the old one. But I do not know how to get rid of the reassignment.
One idea is that creating an empty matrix beforehand and then populating it with the correct columns should be faster, but the problem is that I do not know what dimensions to create it with (there is no way to predict the number of columns in my matrix).
So how can I improve performance here?
There's no way to append to an existing numpy array without creating a copy.
The reason is that a numpy array must be backed by a contiguous block of memory. If I create a (1000, 10) array, then decide that I want to append another row, I'd need to be able to extend the chunk of RAM corresponding to the array so that it's big enough to accommodate (1001, 10) elements. In the general case this is impossible, since the adjacent memory addresses may already be allocated to other objects.
The only way to 'concatenate' arrays is to get the OS to allocate another chunk of memory big enough for the new array, then copy the contents of the original array and the new row into this space. This is obviously very inefficient if you're doing it repeatedly in a loop, especially since the copying step becomes more and more expensive as your array gets larger and larger.
Here are two possible work-arounds:
Use a standard Python list to accumulate your rows inside your while loop, then convert the list to an array in a single step, outside the loop. Appending to a Python list is very cheap compared with concatenating numpy arrays, since a list is just an array of pointers which don't necessarily have to reference adjacent memory addresses, and therefore no copying is required.
Take an educated guess at the number of rows in your final array, then allocate a numpy array that's slightly bigger and fill in the rows as you go along. If you run out of space, concatenate on another chunk of rows. Obviously the concatenation step is expensive, since you'll need to make a copy, but you're much better off doing this once or twice than on every iteration of your loop. When you're choosing the initial number of rows in your output array there will be a trade-off between avoiding over-allocating and unnecessary concatenation steps. Once you're done, you could then 'trim off' any unused rows using slice indexing.
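A minimal sketch of both work-arounds, adapted to the column-appending from the question; new_blocks() is a hypothetical stand-in for however the L matrices are actually produced:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((1000, 10))                    # stand-in for the original matrix

def new_blocks():
    # Hypothetical producer of L matrices with unknown widths.
    for _ in range(5):
        yield rng.random((1000, int(rng.integers(3, 8))))

# Work-around 1: accumulate blocks in a Python list (cheap appends),
# then concatenate once, outside the loop.
blocks = [M]
for L in new_blocks():
    blocks.append(L)
M1 = np.concatenate(blocks, axis=1)

# Work-around 2: over-allocate, fill columns in place, trim the excess at the end.
out = np.empty((M.shape[0], 100))             # educated guess at the final width
out[:, :M.shape[1]] = M
col = M.shape[1]
for L in new_blocks():
    if col + L.shape[1] > out.shape[1]:       # ran out of space: grow once, copy
        out = np.concatenate([out, np.empty_like(out)], axis=1)
    out[:, col:col + L.shape[1]] = L
    col += L.shape[1]
M2 = out[:, :col]                             # trim off the unused columns
```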
I need to form a 2D matrix with a total size of 2,886 x 2,003,817. I tried using numpy.zeros to make a 2D zero matrix and then calculate and assign each element (most of them are zero, so I only need to set a few of them).
But when I use numpy.zeros to initialize the matrix I get a memory error:
C = numpy.zeros((2886, 2003817))  # raises MemoryError
I also tried to form the matrix without initialization. Basically, I calculate the elements of each row in each iteration of my algorithm and then do
C = numpy.concatenate((C, [A]), axis=0)
in which C is my final matrix and A is the row calculated in the current iteration. But I found this method takes a lot of time, and I am guessing it is because of numpy.concatenate(?)
Could you please let me know if there is a way to avoid the memory error when initializing my matrix, or whether there is a better method or suggestion for forming a matrix of this size?
Thanks,
Amir
If your data has a lot of zeros in it, you should use a scipy.sparse matrix.
It is a special data structure designed to save memory for matrices that have a lot of zeros. However, if your matrix is not that sparse, sparse matrices start to take up more memory than dense ones. There are many kinds of sparse matrices, and each of them is efficient at one thing while inefficient at another, so be careful which one you choose.
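For scale, a dense float64 array of shape 2886 x 2003817 needs roughly 46 GB of RAM, which explains the MemoryError. A minimal sketch of the sparse approach (the individual values set below are made up):

```python
import numpy as np
from scipy import sparse

# Only nonzero entries are stored; lil_matrix is convenient for incremental assignment.
C = sparse.lil_matrix((2886, 2003817))

# Assign the few nonzero elements as they are computed (example values only).
C[0, 10] = 3.5
C[100, 2000000] = -1.2

# Convert once to CSR for fast arithmetic and matrix-vector products afterwards.
C = C.tocsr()
print(C.nnz, C.shape)
```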
I am working on an FEM project using Scipy. My problem is that the assembly of the sparse matrices is too slow. I compute the contribution of every element in small dense matrices (one for each element). For the assembly of the global matrices I loop over all the small dense matrices and set the matrix entries the following way:
[i, j] = someList[k][l]
Mglobal[i, j] = Mglobal[i, j] + Mlocal[k, l]
Mglobal is a lil_matrix of appropriate size; someList maps the indexing variables.
Of course this is rather slow and consumes most of the matrix assembly time. Is there a better way to assemble a large sparse matrix from many small dense matrices? I tried scipy.weave but it doesn't seem to work with sparse matrices.
I posted my response to the scipy mailing list; Stack Overflow is a bit easier to access, so I will post it here as well, albeit in a slightly improved version.
The trick is to use the IJV storage format. This is a trio of arrays where the first contains the row indices, the second the column indices, and the third the values of the matrix at those locations. This is the best way to build finite element matrices (or any sparse matrix, in my opinion), because access to this format is really fast (you are just filling in an array).
In scipy this is called coo_matrix; the class takes the three arrays as arguments. It is really only useful for converting to another format (CSR or CSC) for fast linear algebra.
For finite elements, you can estimate the size of the three arrays with something like
size = number_of_elements * number_of_basis_functions**2
so if you have 2D quadratics you would do number_of_elements * 36, for example.
This approach is convenient because if you have the local matrices you definitely have the global numbers and the entry values: exactly what you need for building the three IJV arrays. Scipy is smart enough to throw out zero entries, so overestimating is fine.
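A self-contained sketch of the IJV/COO assembly pattern described above; the element count, connectivity, and local matrices are all made up for illustration:

```python
import numpy as np
from scipy import sparse

n_elements = 1000
n_basis = 6                                    # e.g. 2D quadratic triangles
n_dofs = 4000
size = n_elements * n_basis**2                 # the estimate from the answer

I = np.zeros(size, dtype=np.int32)
J = np.zeros(size, dtype=np.int32)
V = np.zeros(size)

rng = np.random.default_rng(0)
ptr = 0
for e in range(n_elements):
    dofs = rng.integers(0, n_dofs, size=n_basis)    # fake global dof numbers
    Mlocal = rng.random((n_basis, n_basis))         # fake local element matrix
    for k in range(n_basis):
        for l in range(n_basis):
            I[ptr] = dofs[k]
            J[ptr] = dofs[l]
            V[ptr] = Mlocal[k, l]
            ptr += 1

# Converting to CSR sums duplicate (i, j) entries, which is exactly the
# accumulation the original loop did with Mglobal[i, j] += Mlocal[k, l].
Mglobal = sparse.coo_matrix((V, (I, J)), shape=(n_dofs, n_dofs)).tocsr()
```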