SVD formula: A ≈ UΣV*
I use numpy.linalg.svd to run the SVD algorithm, and I want to set the dimensions of the output matrices.
For example: if A has dimensions 3*5, then after running numpy.linalg.svd, U is 3*3, Σ is 3*1, and V* is 5*5.
I need to set specific dimensions, such as U=3*64 and V*=64*5, but it seems there is no optional dimension parameter that can be set in numpy.linalg.svd.
If A is a 3 x 5 matrix then it has rank at most 3, so the SVD of A contains at most 3 singular values. Note that in your example above, the singular values are stored as a vector rather than as a diagonal matrix. Trivially, this means you can pad your matrices with zeroes. Since the full S matrix consists of 3 values on the diagonal followed by 0's (in your case it would be 64x64 with 3 nonzero values), the bottom rows of V* and the rightmost columns of U don't interact at all and can be set to anything you want.
Keep in mind that this isn't the SVD of A anymore, but instead the condensed SVD of the matrix augmented with a lot of 0's.
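A minimal sketch of that padding idea (variable names are illustrative; np.linalg.svd with full_matrices=False returns the reduced factors, which are then zero-padded to the requested inner dimension):

import numpy as np

A = np.random.rand(3, 5)                       # example 3x5 matrix

# Reduced SVD: U is 3x3, s has 3 values, Vh (= V*) is 3x5.
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 64                                         # desired inner dimension

# Zero-pad so that U_pad is 3x64, s_pad has 64 values, Vh_pad is 64x5.
U_pad = np.zeros((U.shape[0], k))
U_pad[:, :U.shape[1]] = U
s_pad = np.zeros(k)
s_pad[:s.size] = s
Vh_pad = np.zeros((k, Vh.shape[1]))
Vh_pad[:Vh.shape[0], :] = Vh

# The extra columns/rows multiply zero singular values, so A is still recovered.
assert np.allclose(U_pad @ np.diag(s_pad) @ Vh_pad, A)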
The Fast Fourier Transform (fft; documentation) transforms 'a' into its Fourier (spectral) equivalent:
numpy.fft.fft(a, n=None, axis=-1, norm=None)
The parameter n represents, so far as I understand it, how many samples are in the output: the output is cropped if n is smaller than the number of samples in a, or padded with zeros if n is larger.
What does axis do? What does it mean exactly? I haven't been able to find any clear examples of its use.
np.fft.fft computes the one-dimensional discrete Fourier transform. If you give it a one-dimensional input (a vector), it will just compute the transform for that input. However, if your input has more than one dimension, like a 2D matrix or higher, NumPy assumes you are giving it many vectors and that you want to compute the transform of each of them. The axis parameter indicates the dimension along which those vectors lie, and by default it is the last one (-1). So, for example, for a 2D matrix m, with axis=0 each column m[:, 0], m[:, 1], etc. would be a vector for which the transform is computed, while with axis=1 (equivalent to the default axis=-1) each row m[0, :], m[1, :], etc. would be considered a vector for the transform. If you want to compute the transform of all values in the input, regardless of the dimensions, you would have to flatten the input, for example with np.ravel.
Btw, this is a very common convention in NumPy (and many other algebra packages), where a one-dimensional operation can work on multidimensional inputs by receiving an axis parameter that indicates the dimension over which the operation is performed.
numpy.fft.fft() returns a one-dimensional Fourier transform of your array. That means if you have an array of shape (N,M) it will not give you a two-dimensional FFT (np.fft.fft2() does) but will return the FFT along the last axis. If you would like the FFT calculated along the columns rather than the rows, you should pass axis=0.
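A small sketch of the axis behaviour (array contents are illustrative):

import numpy as np

m = np.random.rand(4, 8)              # 4 "signals" of length 8 stored as rows

rows = np.fft.fft(m)                  # default axis=-1: transform each row
cols = np.fft.fft(m, axis=0)          # axis=0: transform each column instead

# The row-wise result matches transforming a single row by itself, and likewise
# the column-wise result matches transforming a single column.
assert np.allclose(rows[0, :], np.fft.fft(m[0, :]))
assert np.allclose(cols[:, 0], np.fft.fft(m[:, 0]))

flat = np.fft.fft(m.ravel())          # transform of all values: flatten first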
I'm trying to understand logistic and linear regression and was able to understand the theory behind them (I'm doing Andrew Ng's course).
We have X -> the given features -> a matrix of shape (m, n+1), where m is the number of cases and n is the number of features (excluding x0).
We have y -> the label to predict -> a matrix of shape (m, 1).
Now, while implementing it from scratch in Python, I'm confused as to why we use the transpose of theta in the sigmoid function.
We also use theta transpose X for linear regression.
We do not have to perform matrix multiplication anywhere while coding; it's straight element-to-element coding. So what is the need for the transpose, or is my understanding wrong and do we need to use matrix multiplication during implementation?
My main concern is that I'm very confused as to where we do matrix multiplication and where we do element-wise multiplication in logistic and linear regression.
You are a bit off topic for this area, but the piece you appear to be hung up on is the treatment of x and Theta.
In the use cases you describe, x is a vector of inputs, or the "feature vector". The Theta vector is the vector of coefficients. Both are usually expressed as column vectors and of course, must be of the same dimension.
So to "make a prediction" you need the inner product of these two, and the output needs to be a scalar (by definition for inner product) so you need to transpose the theta vector in order to properly express that operation, which is a matrix multiplication of two vectors. Make sense?
For matrix multiplication, the number of columns in the first element must equal the number of rows in the second element. Since one of the elements you're multiplying has either one column or one row, it does not appear to be matrix multiplication due to its simplicity. But it still is matrix multiplication.
Let me provide an example.
Let A be an (m, n) matrix.
We can perform scalar multiplication aA for some fixed real number a.
If we want to multiply A by some vector x, we need to meet some restrictions. Here it is common to mistake the dot product for matrix multiplication, but they serve completely different purposes.
So our restriction for multiplying an (m, n) matrix A by a vector x is that x must have the same number of entries as A has columns. To do this in your example, one of the elements needed to be transposed.
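A small illustrative sketch of that restriction, contrasting matrix multiplication with element-wise multiplication (shapes and values are made up):

import numpy as np

X = np.array([[1.0,  2.0,  3.0],
              [1.0,  0.5, -1.0],
              [1.0,  1.5,  2.5],
              [1.0, -0.7,  0.2]])     # (m, n+1) = (4, 3): 4 cases, 3 features
theta = np.array([0.1, 0.4, 1.2])     # 3 entries, matching the 3 columns of X

predictions = X @ theta               # matrix multiplication: shape (4,), one value per case

elementwise = X * theta               # element-wise: theta is broadcast across each row,
                                      # a completely different operation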
I have a basis set of square matrices and a data set for which I need to find the coefficients, given that my data is a linear combination of the basis set.
def basis(a, b, c):
    return a*gam1 + b*gam2 + c*kapp + allsky
So data = basis(a, b, c), and I need the best-fit (least-squares) values of the coefficients a, b and c. The basis matrices and the data matrix are all square matrices of size 89x89. I have tried using np.linalg.lstsq; however, since my A matrix would need to be built from the 4 basis matrices, the array ends up with more than two dimensions and lstsq throws an error stating the array dimension must be 2. Any help is appreciated.
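One way to reduce this to a 2-D least-squares problem (a sketch, assuming the variable names from the snippet above and treating allsky as a constant offset) is to flatten each basis matrix into one column of the design matrix:

import numpy as np

# Illustrative stand-ins for the 89x89 matrices from the question.
gam1, gam2, kapp, allsky = (np.random.rand(89, 89) for _ in range(4))
data = 1.5*gam1 - 0.3*gam2 + 2.0*kapp + allsky      # synthetic "data"

# Each flattened basis matrix becomes one column of a 2-D design matrix,
# and the constant allsky term is subtracted from the flattened data.
A = np.column_stack([gam1.ravel(), gam2.ravel(), kapp.ravel()])  # (89*89, 3)
b = (data - allsky).ravel()                                      # (89*89,)

coeffs, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)  # best-fit a, b, c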
The sparse matrix has only 0 or 1 at each entry (i, j) (1 means sample i has feature j). How can I estimate the feature co-occurrence matrix given this sparse representation of the data points? In particular, I want to find pairs of features that co-occur in at least 50 samples. I realize it might be hard to produce the exact result; is there any approximate algorithm in data mining that allows me to do that?
This can be solved reasonably easily if you go to a transposed matrix.
For any two features (now rows, originally columns) you compute the intersection. If it contains at least 50 samples, you have a frequent co-occurrence.
If you use an appropriate sparse encoding (now of the rows, but originally of the columns, so you probably need not only to transpose the matrix but also to re-encode it), this operation takes O(n+m), where n and m are the numbers of nonzero values in the two rows.
If you have an extremely high number of features, this may take a while, but 100000 should be feasible.
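A sketch of the same idea using scipy.sparse (data here is illustrative): because the entries are 0/1, the product X^T X counts exactly those row intersections. This is fine for a moderate number of features, but with very many features the result should be kept sparse rather than densified.

import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
X = csr_matrix((rng.random((1000, 200)) < 0.05).astype(np.int8))  # samples x features, 0/1

cooc = (X.T @ X).toarray()              # cooc[i, j] = number of samples having both feature i and j
np.fill_diagonal(cooc, 0)               # ignore a feature co-occurring with itself

i, j = np.nonzero(np.triu(cooc >= 50))  # feature pairs co-occurring in at least 50 samples
pairs = list(zip(i, j))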
I have a numpy ndarray object with the following shape:
(3, 256, 170, 256).
So, basically, this represents an array of 3-dimensional vectors. The vector components lie along the first axis, which lets one write something like array[0] for the relevant vector component.
Now, I am trying to use SciPy's pdist function, which computes the pairwise distances between the entries. So I need to reshape this array so that it can be represented as a two-dimensional matrix, where the number of rows is 256*170*256 and the number of columns is 3; pdist should then return a matrix where each element is the squared distance between the corresponding 3-dimensional vectors (if I have interpreted the documentation correctly).
Can someone tell me how I can get a view into this numpy array so that I can generate this matrix? I do not want to copy the data (as these arrays can be quite large), so I am looking for an efficient solution.
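A sketch of one way to do this, using a small illustrative array: reshape collapses the three trailing axes without copying (the array is C-contiguous), and the transpose is always a view, so the (num_points, 3) layout that pdist expects is obtained without duplicating the data (pdist itself may still make an internal copy when converting its input); 'sqeuclidean' gives the squared distances.

import numpy as np
from scipy.spatial.distance import pdist, squareform

arr = np.random.rand(3, 4, 5, 6)      # small stand-in for the (3, 256, 170, 256) array

# Collapse the trailing axes and move the component axis last: (num_points, 3).
# reshape and .T both return views here, so no data is copied at this step.
points = arr.reshape(3, -1).T         # shape (4*5*6, 3)

# 'sqeuclidean' gives squared distances; squareform turns the condensed
# output into a full square matrix if that form is needed.
dists = squareform(pdist(points, metric='sqeuclidean'))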