Applying a CNN to Fast Fourier Transformed data? - python

I have data to which a fast Fourier transform has been applied
(amplitudes at specific frequencies in Hz).
There are solutions on the internet where a CNN is applied to a mel spectrogram, but I have found no solution where a CNN is applied to a Fast Fourier Transformed signal.
Is it possible to apply a CNN to Fast Fourier Transformed signals?
Or is it not possible because a CNN relies on a temporal attribute?
Thanks!

I'm assuming each row of your spreadsheet is IID, e.g. it wouldn't change the problem to re-order the rows in that spreadsheet.
In this case you have a pretty typical ML problem. Applying the FFT and extracting specific frequency responses (columns) ahead of time is a process called "feature engineering". Prior to the common use of neural networks, this was a standard step in all machine learning problems, and it remains common in a great many domains.
With data that has been feature engineered, you should look to traditional ML algorithms. Random Forests, XGBoost, and Linear Regression come to mind. A fully connected neural network is also appropriate, but I would typically expect it to under-perform other ML methods.
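For instance, a minimal sketch with scikit-learn's RandomForestClassifier; the array shapes and the random placeholder data are stand-ins for your spreadsheet of FFT amplitudes:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 64))            # rows = samples, columns = amplitude at each frequency
y = rng.integers(0, 2, 500)          # placeholder class labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```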
The hallmark of a CNN is that it operates on an ordered sequence of data. In your case the raw data, from which your dataset was derived, would be appropriate for a CNN. In a sound file you have a 1D sequence of information. You could not re-order the data in the time dimension without fundamentally changing its meaning.
A 2D CNN operates over an image where the pixel order in X and Y cannot be changed. Again, the sequential order of the data matters. The same applies to 3D CNNs.
Be aware that applying an FFT has fundamentally biased your solution by representing it only through a limited set of frequency responses. All feature engineering biases the problem, presumably in a well-thought-out way. However, it's entirely possible that other useful signals exist in the data which aren't expressed by the FFT at, say, 10, 20, 30 Hz, etc. A CNN has the capacity to learn its own version of an FFT, as well as other non-cyclic patterns. Typically, the lack of a feature engineering step is the key differentiator between a CNN and traditional ML algorithms.
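For contrast, here is a minimal sketch of that alternative: a 1D CNN applied to the raw waveform, which can learn its own frequency-like filters. Keras is assumed, as are the 16000-sample clip length and the placeholder data:

```python
import numpy as np
from tensorflow.keras import layers, models

# Placeholder raw audio: 100 clips of 16000 samples each, binary labels.
X = np.random.randn(100, 16000, 1).astype("float32")
y = np.random.randint(0, 2, 100)

model = models.Sequential([
    layers.Input(shape=(16000, 1)),
    # Wide first-layer kernels can learn filter-bank-like (FFT-like) responses.
    layers.Conv1D(16, kernel_size=64, strides=4, activation="relu"),
    layers.MaxPooling1D(4),
    layers.Conv1D(32, kernel_size=8, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=16)
```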

The use of feature scaling in scikit-learn

I'm studying machine learning from this course, which uses scikit-learn for regression - https://www.udemy.com/machinelearning/
I can see that for some regression algorithms the author uses feature scaling, and for some he doesn't, because some scikit-learn regression algorithms take care of feature scaling by themselves.
How do I know for which training algorithms feature scaling is needed, and for which it isn't?
Strictly speaking, no machine learning technique needs feature scaling; for some algorithms, scaled inputs make the optimization easier on the computer, which results in faster training times.
Typically, algorithms that leverage distance or assume normality benefit from feature scaling. https://medium.com/greyatom/why-how-and-when-to-scale-your-features-4b30ab09db5e
It depends on the algorithm you are using and on your dataset.
Support Vector Machines (SVMs) converge faster if you scale your features (see the sketch after this list). The main advantage of scaling is to avoid attributes in greater numeric ranges dominating those in smaller numeric ranges.
In k-means clustering, you compute Euclidean distances to cluster data points together, so there is good reason to scale your features so that the centroids are not unduly affected by large or abnormal values.
In the case of linear regression, scaling your features will not be of much help, since scaling simply rescales the coefficients without changing the fitted relationship.
Decision trees don't usually require feature scaling.
For models that involve a learning rate and are trained with gradient descent, the input scale does affect the gradients, so feature scaling should be considered.
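Here is the scaling sketch promised above, assuming scikit-learn; putting the scaler and the SVM in a single pipeline ensures the scaler is fit only on the training split:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data, with one feature's numeric range exaggerated.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X[:, 0] *= 1000
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler is fit on the training split only, then reused on the test split.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```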
A very simple answer: some algorithms do the feature scaling internally even if you don't, and some do not. So if the algorithm does not, you need to scale the features manually.
You can look up which algorithms scale features themselves, but it's safest to scale manually. Always make sure the features are scaled; otherwise the algorithm's output will be offset from the ideal.

Way to embed fixed-length spectrograms into tensors, with a CNN perhaps

I'm developing a way to compare two spectrograms and score their similarity.
I have been thinking for a long time about how to do this and how to pick the whole model/approach.
The audio clips I'm using to make the spectrograms are recordings from an Android phone; I convert them from .m4a to .wav and then process them to plot the spectrogram, all in Python.
All audio recordings have the same length.
That really helps, because all the data can then be represented in the same dimensional space.
I filtered the audio using a Butterworth bandpass filter, which is commonly used in voice filtering thanks to its flat behavior in the passband. As cutoff frequencies I used 400 Hz and 3500 Hz.
After this procedure the output looks like this.
My first idea was to find the region of interest on that spectrogram using OpenCV, so I filtered by color and got this output, which can roughly be used to get the limits of the signal. But that would make every clip a different length, and I probably don't want that to happen.
Now to get to my question: I was thinking about embedding those spectrograms as multidimensional points and simply scoring their accuracy as the distance to the most accurate sample, which would be visualizable thanks to dimensionality reduction in some cluster-like space. But that seems too plain: it doesn't involve training and is thus hard to verify. So:
Is there any possibility of using a convolutional neural network, or a combination of networks like CNN -> delayed NN, to embed these spectrograms as multidimensional points, making it possible not to compare them directly but to compare the outputs of the network?
If there is anything I missed in this question, please comment and I will fix it right away. Thank you very much for your time.
Josef K.
EDIT:
After a tip from Nikolay Shmyrev I switched to using the mel spectrogram:
That looks much more promising, but my question remains almost the same: can I use pretrained CNN models, like VGG16, to embed those spectrograms as tensors and thus be able to compare them? And if so, how? Just remove the last fully connected layer and flatten instead?
In my opinion, and according to Yann LeCun, when you target speech recognition with a deep neural network you have two obligations:
You need to use a recurrent neural network in order to have memory (memory is really important for speech recognition...)
and
you will need a lot of training data.
You may try an RNN in TensorFlow, but you definitely need a lot of training data.
If you don't want to (or can't) find or generate a lot of training data, you have to forget deep learning to solve this...
In that case (forgetting deep learning) you may take a look at how Shazam works (it is based on a fingerprinting algorithm).
You can of course use a CNN; TensorFlow has classes for that, for example, as do many other frameworks. You simply convert your image to a tensor and apply the network, and as a result you get a lower-dimensional vector you can compare.
You can train your own CNN too.
For best accuracy it is better to scale the lower frequencies (the bottom part) and compress the higher frequencies in your picture, since the lower frequencies carry more importance. You can read about the Mel scale for more information.
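To address the VGG16 part of the question directly: a minimal sketch, assuming Keras, of using a pretrained VGG16 with its fully connected head removed as a fixed embedder. The 224×224 RGB input (a spectrogram rendered as an image) and the average pooling are assumptions:

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

# include_top=False removes the fully connected head; pooling="avg"
# collapses the last feature maps into a single 512-dimensional vector,
# so no manual Flatten layer is needed.
model = VGG16(weights="imagenet", include_top=False, pooling="avg")

def embed(spectrogram_rgb):
    # spectrogram_rgb: (224, 224, 3) array, e.g. a spectrogram rendered as an RGB image.
    x = preprocess_input(spectrogram_rgb[np.newaxis].astype("float32"))
    return model.predict(x)[0]        # 512-dimensional embedding

a = embed(np.random.rand(224, 224, 3) * 255)   # placeholders for two spectrogram images
b = embed(np.random.rand(224, 224, 3) * 255)
print("cosine similarity:", np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Comparing embeddings with cosine similarity, as above, is often more forgiving of overall level differences than raw Euclidean distance.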

What algorithm to choose for binary image classification

Let's say I have two arrays in my dataset:
1) The first is an array of class labels (0 or 1) - [0,1,0,1,1,1,0.....]
2) The second array consists of grayscale image vectors with 2500 elements each (numbers from 0 to 300). These numbers are the pixels of 50×50 px images. - [[13 160 239 192 219 199 4 60..][....][....][....][....]]
The size of this dataset is quite significant (~12000 elements).
I am trying to build a very basic binary classifier which will give appropriate results. Let's say I want to choose a supervised method, but not deep learning.
Is that suitable in this case? I've already tried sklearn's SVM with various parameters, but the outcome is unacceptably inaccurate and consists mainly of 1s: [1,1,1,1,1,0,1,1,1,....]
What is the right approach? Isn't a dataset of this size enough to get a decent result with a supervised algorithm?
You should probably post this on Cross Validated, but as a direct answer: you should probably look into sequence-to-sequence learners, as it has become clear to you that SVM is not the ideal solution for this.
You could look into Markov models for sequential learning if you don't want to go the deep learning route; however, neural networks have a very good track record with image classification problems.
Ideally, for sequential learning you should look into Long Short-Term Memory recurrent neural networks, and for your current dataset see whether pre-training on an existing data corpus (say, CIFAR-10) may help.
So my recommendation is to give TensorFlow a try with a high-level library such as Keras/SKFlow.
Neural networks are just another tool in your machine learning repertoire, and you might as well give them a real chance.
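If you do give that a try, a minimal Keras sketch for this dataset might look like the following; the architecture is illustrative and untuned, and the placeholder arrays mirror the 2500-element vectors described in the question:

```python
import numpy as np
from tensorflow.keras import layers, models

# Placeholders mirroring the question: ~12000 flat 2500-element vectors,
# pixel values described as 0-300, binary labels.
X = np.random.randint(0, 300, (12000, 2500)).astype("float32")
X = (X / 300.0).reshape(-1, 50, 50, 1)          # back to 50x50 images, scaled to [0, 1]
y = np.random.randint(0, 2, 12000)

model = models.Sequential([
    layers.Input(shape=(50, 50, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, validation_split=0.2)
```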
An edit to address your comment:
Your issue there is not a lack of data for the SVM.
The SVM will work well for a small dataset, as it will be easier for it to fit (or overfit) a separating hyperplane on that dataset.
As you increase your data's dimensionality, keep in mind that separating it with a hyperplane becomes increasingly difficult (look at the curse of dimensionality).
However, if you are set on doing it this way, try some dimensionality reduction
such as PCA (see the sketch below).
Although here you're bound to find another face-off with neural networks,
since Kohonen Self-Organizing Maps do this task beautifully, you could
project your data into a lower dimension, thereby allowing the SVM to separate it with greater accuracy.
I still stand by saying you may be using the incorrect approach.
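A sketch of that PCA-then-SVM idea with scikit-learn; the placeholder data stands in for the flattened 50×50 images, and class_weight="balanced" is one illustrative way to counter the mostly-1s predictions mentioned in the question:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.random.rand(1000, 2500)       # placeholder for flattened 50x50 images
y = np.random.randint(0, 2, 1000)

# Reduce the 2500 pixel features to 50 principal components before the SVM;
# class_weight="balanced" pushes back against one class dominating predictions.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=50),
                      SVC(kernel="rbf", class_weight="balanced"))
print("CV accuracy:", cross_val_score(model, X, y, cv=3).mean())
```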

Semi-supervised Gaussian mixture model clustering in Python

I have images that I am segmenting using a Gaussian mixture model from scikit-learn. Some images are labeled, so I have a good bit of prior information that I would like to use. I would like to run a semi-supervised training of a mixture model by providing some of the cluster assignments ahead of time.
From the MATLAB documentation, I can see that MATLAB allows initial values to be set. Are there any Python libraries, especially scikit-learn approaches, that would allow this?
The standard GMM does not work in a semi-supervised fashion. The initial values you mention are likely the initial values for the mean vectors and covariance matrices of the Gaussians, which will then be updated by the EM algorithm.
A simple hack would be to group your labeled data by label, estimate mean vectors and covariance matrices for each group individually, and pass these as the initial values to your MATLAB function (scikit-learn does not allow this, as far as I'm aware). Hopefully this will position your Gaussians at the "correct" locations, and the EM algorithm will take it from there to adjust these parameters.
The downside of this hack is that it does not guarantee your true label assignment will be respected: even if a data point is assigned a particular cluster label, there is a chance it might be re-assigned to another cluster. Noise in your feature vectors or labels could also cause your initial Gaussians to cover a much larger region than they are supposed to, wreaking havoc on the EM algorithm. And if you do not have sufficient data points for a particular cluster, your estimated covariance matrices might be singular, breaking this trick altogether.
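For what it's worth, the answer hedges on this, but scikit-learn's GaussianMixture does expose a means_init parameter, so the same seeding hack can be sketched in Python directly (placeholder data throughout):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholders: X is all feature vectors; X_lab / y_lab are the labeled subset.
rng = np.random.default_rng(0)
X = rng.random((1000, 3))
X_lab = rng.random((60, 3))
y_lab = rng.integers(0, 3, 60)

n_components = 3
# One starting mean per cluster, estimated from the labeled data.
means_init = np.vstack([X_lab[y_lab == k].mean(axis=0) for k in range(n_components)])

gmm = GaussianMixture(n_components=n_components, means_init=means_init, random_state=0)
labels = gmm.fit_predict(X)          # EM may still move points away from their seeded cluster
```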
Unless it is a must for you to use a GMM to cluster your data (e.g., you know for sure that Gaussians model your data well), then perhaps you can just try the semi-supervised methods in scikit-learn. These will propagate labels to your other data points based on feature similarity. However, I doubt this can handle a large dataset, as it requires the graph Laplacian matrix to be built from pairs of samples, unless there is some special implementation trick to handle this in scikit-learn.
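A minimal sketch of that label-propagation route in scikit-learn, where -1 marks unlabeled points; the data and the kNN kernel choice are placeholders:

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
X = rng.random((500, 3))             # placeholder feature vectors
y = np.full(500, -1)                 # -1 marks unlabeled samples
y[:40] = rng.integers(0, 3, 40)      # the small labeled subset

model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y)
print(model.transduction_[:10])      # labels inferred for every sample
```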

Scikit Learn Variable Bias

I am using scikit-learn to make predictions on a very large set of data. The data is very wide but not very long, so I want to set some weights for parts of the data. If I know some parts of the data are more important than other parts, how should I inform scikit-learn of this? Or does this kind of pre-teaching break the whole machine learning approach?
The most straightforward way of doing this is perhaps by using Principal Component Analysis on your data matrix X. The principal vectors form an orthogonal basis of X, and each is a linear combination of the original feature space (normally the columns) of X. The decomposition is such that each principal vector has a corresponding eigenvalue (or singular value, depending on how you compute PCA): a scalar that reflects how much of the matrix can be reconstructed from that principal vector alone, in a least-squares sense.
The magnitudes of the coefficients of the principal vectors can be interpreted as the importance of the individual features of your data, since each coefficient maps 1:1 to a feature (column) of the matrix. By selecting one or two principal vectors and examining their coefficient magnitudes, you can get a preliminary insight into which columns are most relevant, of course only up to how well those vectors approximate the matrix.
See the detailed scikit-learn API description for PCA. Again, PCA is simple, but it is just one way of doing this, among others.
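A minimal sketch of that inspection with scikit-learn's PCA; the data matrix is a placeholder, and standardizing first is assumed since PCA is scale-sensitive:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.rand(200, 8)                    # placeholder data matrix
X_std = StandardScaler().fit_transform(X)     # standardize first: PCA is scale-sensitive

pca = PCA(n_components=2).fit(X_std)
print("explained variance ratio:", pca.explained_variance_ratio_)

# Each row of components_ is a principal vector; large |coefficients|
# point to the original columns that drive that component.
for i, comp in enumerate(pca.components_):
    top = np.argsort(np.abs(comp))[::-1][:3]
    print("PC%d: most influential columns %s" % (i + 1, top))
```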
This probably depends a bit on the machine learning algorithm you're using -- many will discover feature importances on their own (as exposed via the feature_importances_ attribute in random forests and others).
If you're using a distance-based method (e.g. k-means, kNN), you could manually weight the features by scaling the values of each feature accordingly (though it's possible scikit-learn does some normalization...).
Alternatively, if you know some features really don't carry much information, you could simply eliminate them, though you'd lose any diagnostic value those features might unexpectedly bring. There are tools in scikit-learn for feature selection that might help you make this kind of judgement; a sketch of both ideas follows.
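A brief sketch of both ideas, manual feature weighting for a distance-based model and univariate feature selection; the weights and k here are illustrative, not recommendations:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.random((300, 5))
y = rng.integers(0, 2, 300)

# 1) Manual weighting for a distance-based model: scaling a column up
#    makes differences along it count more in Euclidean distance.
weights = np.array([2.0, 1.0, 1.0, 0.5, 0.5])   # illustrative importances
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X * weights)

# 2) Feature selection: keep only the k columns most associated with y.
X_reduced = SelectKBest(f_classif, k=3).fit_transform(X, y)
print(X_reduced.shape)               # (300, 3)
```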
