The Fast Fourier Transform (fft; documentation) transforms 'a' into its fourier, spectral equivalent:
numpy.fft.fft(a, n=None, axis=-1, norm=None)
The parameter, n represents—so far as I understand it—how many samples are in the output, where the output is either cropped if n is smaller than the number of samples in a, or padded with zeros if n is larger.
What does axis do? What does it mean exactly? I haven't been able to find any clear examples of its use.
np.fft.fft computes the one-dimensional discrete Fourier transform. If you give a one dimensional input (a vector), it will just compute the transform for that input. However, if your input has more than one dimension, like a 2D matrix, or higher, NumPy assumes you are giving many vectors and you want to compute the transform of each of them. The axis parameter indicates the dimension corresponding to those vectors, and by default it is the last one (-1). So, for example, for a 2D matrix m, if axis=0 then each column m[:, 0], m[:, 1], etc. would be the vectors for which the transform is computed, while passing axis=1 (equivalent to the default axis=-1), each row m[0, :], m[1, :], etc. would be considered a vector for the transform. If you want to compute the transform of all values in the input, regardless of the dimensions, you would have to flatten the input, for example with np.ravel.
Btw, this is a very common convention in NumPy (and many other algebra packages), where a one-dimensional operation can work on multidimensional inputs by receiving an axis parameter that indicates the dimension over which the operation is performed.
numpy.fft.fft() returnes a one-dimensional fourier-transform of your array. That means if you have an array of shape (N,M) it will not give you a two-dimensional fft (np.fft.fft2() does) but return the fft along the last axis. If you like to have the fft calculated rather along the columns than the rows you should pass axis=0.
Related
I'm trying to understand logistic and linear regression and was able to understand the theory behind it (doing andrew ng course).
We have X -> given features -> matrix of (m , n+1) where m - no. of cases and n- features given (excluding x0)
We have y - > the label to predict -> matrix of (m,1)
Now while I'm implementing it from scratch in python, I'm confused as to why we use transpose of theta in the sigmoid function.
Also we use theta transpose X for linear regression too.
We do not have to perform matrix multiplication anywhere while coding, its straight element to element coding, what's the need for the transpose or is my understanding wrong and we need to take matrix multiplication during implementation.
My main concern is that I'm very confused as to where we do matrix multiplication and where we do element wise multiplication in logistic and linear regression
You are a bit off topic for this area, but the piece you appear to be hung up on is the treatment of x and Theta.
In the use cases you describe, x is a vector of inputs, or the "feature vector". The Theta vector is the vector of coefficients. Both are usually expressed as column vectors and of course, must be of the same dimension.
So to "make a prediction" you need the inner product of these two, and the output needs to be a scalar (by definition for inner product) so you need to transpose the theta vector in order to properly express that operation, which is a matrix multiplication of two vectors. Make sense?
For matrix multiplication, the number of Columns in the first element must equal the number of rows in the second element. Since one of the elements your multiplying has either one column or one row, it does not appear to be matrix multiplication due to it's simplicity. But it still is matrix multiplication
Let me provide an example,
Let A be (m,n) matrix
We can perform scalar multiplication, for some fixed a in the real numbers
If we want to multiply A to some vector, x, we need to meet some restrictions. Here it is common to mistake the dot product for matrix multiplication, but they serve completely different purposes.
So our restrictions for multiplying an (m,n) matrix, A by a vector x is that x has the same number of entries as A has columns To do this in your example, one of the elements needed to be transposed.
I'm going to be doing some geometric calculations involving 2-D and 3D points using numpy.
What is the canonical representation of a 2-D or 3-D point? Please assume minimal familiarity with numpy, data shapes, etc.
The representation of a single point in Cartesian space is somewhat trivial. You could even use flat tuples or lists to represent them and matrix operations would still work, but if you want to add or scale them (which is fundamentally what linear spaces are for) you have to use arrays. I don't see a reason why not to use a 1d array with shape (d,) in d dimensions: you can use those both as column and row vectors on either side of a matrix using the # matmul operator:
import numpy as np
rot90 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]]) # rotate 90 degrees around z
inp = np.array([1, 0, 0]) # x
# rotate:
inp_rot = rot90 # inp # y
# inverse transform:
inp_invrot = inp # rot90 # -y
A much better question is how to represent collections of points in Cartesian space. If you have N points you will probably want to use a 2d array. But which shape should it be, (N, d) or (d, N)? The answer depends on your use case but without further input you'll want to choose (N, d).
Arrays in numpy are "C-contiguous" by default, which is also called row-major memory layout. This means that on creation an array occupies a contiguous block of memory by default, and items are laid out in memory row after row, with these indices as an example:
>>> np.arange(2*3).reshape(2, 3)
array([[0, 1, 2],
[3, 4, 5]])
One of the reasons we use numpy is that a contiguous block of memory for a given type occupies much less space than a native python container of the same size, at least for large datasets. The other reason is that we can use vectorized operations that work on slices of the input "simultaneously". The quotes are there because fundamentally the hands of the CPU are bound, but it turns out that you can achieve quite some speedup by making good use of CPU caches. And this is where memory layout comes into play: by using operations on an array that access elements close in memory you have a higher chance of making use of caching, and the reduced communication between RAM and CPU will lead to shorter runtimes.
The problem is not trivial, because vectorizing along larger non-contiguous dimensions might end up faster than vectorizing along smaller contiguous ones. However, without any additional information it's a good rule of thumb to put those dimensions last where you are likely to perform vectorized operations and reductions such as .mean() or .sum(). In case of N points in d-dimensional space it's quite likely that you will want to handle each point separately. Loops in matrix multiplications and things like scalar products and vector norms will all want you to work with one component after the other for a given point.
This is why you will see numpy and scipy functions usually assume arrays of shape (N, d): the inner dimension is second and the "batch" index is first. Consider for example numpy.linalg.eig:
Parameters:
a : (…, M, M) array
Matrices for which the eigenvalues and right eigenvectors will be computed
Returns:
w : (…, M) array
The eigenvalues, each repeated according to its multiplicity. The eigenvalues
are not necessarily ordered. The resulting array will be of complex type,
unless the imaginary part is zero in which case it will be cast to a real
type. When a is real the resulting eigenvalues will be real (0 imaginary
part) or occur in conjugate pairs
[...]
It treats multidimensional arrays as batches of matrices, where the last two indices correspond to the Cartesian indices. Similarly the returned eigenvalues and eigenvectors have batch indices first and vector space indices last.
A more direct example is scipy.spatial.distance.pdist which computes the distance between pairs of points in a collection:
Parameters
X : ndarray
An m by n array of m original observations in an n-dimensional space.
[...]
Again you can see the convention that Cartesian indices are last. The same goes for scipy.interpolate.griddata and probably a bunch of other functions.
So if you have a good reason to use either representation: do that. But if you don't have a good indicator (such as the results of profiling both representations) you should stick with the "batch of vectors/matrices" approach usually employed by numpy and scipy (shape (N, d)), because you might even end up using some of these functions, for which your representation will then be native.
Represent them in your source code as tuples or lists, e.g. (1, 0) or [1, 0, 1].
As per this example from scipy:
>>> from scipy.spatial import distance
>>> distance.euclidean([1, 0, 0], [0, 1, 0])
1.4142135623730951
svd formular: A ≈ UΣV*
I use numpy.linalg.svd to run svd algorithm.
And I want to set dimension of matrix.
For example: A=3*5 dimension, after running numpy.linalg.svd, U=3*3 dimension, Σ=3*1 dimension, V*=5*5 dimension.
I need to set specific dimension like U=3*64 dimension, V*=64*5 dimension. But it seems there is no optional dimension parameter can be set in numpy.linalg.svd.
If A is a 3 x 5 matrix then it has rank at most 3. Therefore the SVD of A contains at most 3 singular values. Note that in your example above, the singular values are stored as a vector instead of a diagonal matrix. Trivially this means that you can pad your matrices with zeroes at the bottom. Since the full S matrix contains of 3 values on the diagonal followed by the rest 0's (in your case it would be 64x64 with 3 nonzero values), the bottom rows of V and the right rows of U don't interact at all and can be set to anything you want.
Keep in mind that this isn't the SVD of A anymore, but instead the condensed SVD of the matrix augmented with a lot of 0's.
The tutorial on MNIST for ML Beginners, in Implementing the Regression, shows how to make the regression on a single line, followed by an explanation that mentions the use of a trick (emphasis mine):
y = tf.nn.softmax(tf.matmul(x, W) + b)
First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from when we multiplied them in our equation, where we had Wx, as a small trick to deal with x being a 2D tensor with multiple inputs.
What is the trick here, and why are we using it?
Well, there's no trick here. That line basically points to one previous equation multiplication order
# Here the order of W and x, this equation for single example
y = Wx +b
# if you want to use batch of examples you need the change the order of multiplication; instead of using another transpose op
y = xW +b
# hence
y = tf.matmul(x, W)
Ok, I think the main point is that if you train in batches (i.e. train with several instances of the training set at once), TensorFlow always assumes that the zeroth dimension of x indicates the number of events per batch.
Suppose you want to map a training instance of dimension M to a target instance of dimension N. You would typically do this by multiplying x (a column vector) with a NxM matrix (and, optionally, add a bias with dimension N (also a column vector)), i.e.
y = W*x + b, where y is also a column vector.
This is perfectly alright seen from the perspective of linear algebra. But now comes the point with the training in batches, i.e. training with several training instances at once.
To get to understand this, it might be helpful to not view x (and y) as vectors of dimension M (and N), but as matrices with the dimensions Mx1 (and Nx1 for y).
Since TensorFlow assumes that the different training instances constituting a batch are aligned along the zeroth dimension, we get into trouble here since the zeroth dimension is occupied by the different elements of one single instance.
The trick is then to transpose the above equation (remember that transposition of a product also switches the order of the two transposed objects):
y^T = x^T * W^T + b^T
This is pretty much what has been described in short within the tutorial.
Note that y^T is now a matrix of dimension 1xN (practically a row vector), while x^T is a matrix of dimension 1xM (also a row vector). W^T is a matrix of dimension MxN. In the tutorial, they did not write x^T or y^T, but simply defined the placeholders according to this transposed equation. The only point that is not clear to me is why they did not define b the "transposed way". I assume that the + operator automatically transposes b if it is necessary in order to get the correct dimensions.
The rest is now pretty easy: if you have batches larger than 1 instance, you just "stack" multiple of the x (1xM) matrices, say to a matrix of dimensions (AxM) (where A is the batch size). b will hopefully automatically broadcasted to this number of events (that means to a matrix of dimension (AxN). If you then use
y^T = x^T * W^T + b^T,
you will get a (AxN) matrix of the targets for each element of the batch.
I have a numpy ndarray object with the following shape:
(3, 256, 170, 256).
So, basically this represents an array of 3-dimensional vectors. The dimension of the vector is the first element as it enables one to write something like: array[0] for the relevant vector component.
Now, I am trying to use scipy pdist function, which computes the distance between the entries. So, I need to modify this array, so that it can be represented as a two dimensional matrix, where the number of rows is 256*170*256 and the number of columns is 3 and pdist should return me the matrix where each element is the squared distance between the corresponding 3 dimensional vectors (if I have interpreted the documentation correctly).
Can someone tell me how I can get a view into this numpy array, so that I can generate this matrix. I do not want to copy the data again (as these matrices can be quite large), so looking for some efficient solutions.