I have a deep learning model which extracts features from the original time series data, then uses PCA to reduce the dimensionality to 2D, then performs clustering using GMM. I am then planning to use the cluster information to label a class of signals that I am interested in finding in the original data. However, I'm having trouble wrapping my head around how to do that, since, from my understanding, information has been lost after PCA. So is this possible, and if yes, how would I go about it?
I start with 3 columns of data, each of length 1780800. They are then reshaped into an array of size (108, 3, 16800) to be fed into the model.
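To make the setup concrete, here is a minimal sketch of the PCA and GMM steps I mean, assuming the extracted features come out as one vector per window (the feature width and the number of clusters are placeholders):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

features = np.random.rand(108, 64)     # placeholder for the model's extracted features

X_2d = PCA(n_components=2).fit_transform(features)                           # reduce to 2D
labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X_2d)   # cluster in 2D

# labels[i] is the cluster assigned to window i, so it maps back by index
# to the i-th slice of the original (108, 3, 16800) array.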
The model I am referring to is as below:
Full research paper is https://www.nature.com/articles/s41467-020-17841-x.
It does not seem right to use PCA directly on time points as feature values. What you need is a feature extractor that first converts each time series into a Euclidean feature space (check feature extraction for time series). Then you can use basic clustering tools in sklearn and visualize the result with t-SNE to check whether it makes sense. You need to validate each step before going on to the next one.
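As a rough sketch of that pipeline, with a hand-rolled summary-statistics extractor standing in for whatever time-series feature extractor you actually choose (tsfresh or similar would be a real option), and random placeholder series:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def extract_features(s):
    # placeholder extractor: a few summary statistics per series;
    # swap in a proper time-series feature extractor in practice
    return np.array([s.mean(), s.std(), s.min(), s.max()])

series = np.random.rand(108, 16800)                    # placeholder: one series per row
X = np.stack([extract_features(s) for s in series])    # fixed-length feature vector per series

labels = KMeans(n_clusters=3, random_state=0).fit_predict(X)

emb = TSNE(n_components=2, random_state=0).fit_transform(X)   # 2D embedding as a sanity check
plt.scatter(emb[:, 0], emb[:, 1], c=labels)
plt.show()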
I have data to which a fast Fourier transform (FFT) has been applied
(amplitudes at specific frequencies in Hz).
There are solutions on the internet where a CNN is applied to a mel spectrogram; however, I see no solution where a CNN is applied to a Fourier-transformed signal.
Is it possible to apply a CNN to Fourier-transformed signals?
Or is it not possible because a CNN relies on temporal structure?
Thanks!
I'm assuming each row of your spreadsheet is IID, i.e. it wouldn't change the problem to re-order the rows in that spreadsheet.
In this case you have a pretty typical ML problem. The fact that the FFT has already been applied and specific frequency responses (columns) have been extracted is a process called "feature engineering". Prior to the common use of neural networks, this was a standard step in all machine learning problems and remains common to a great many domains.
With data that has been feature engineered, you should look to traditional ML algorithms. Random Forests, XGBoost, and Linear Regression come to mind. A fully connected neural network is also appropriate, but I would typically expect it to under-perform other ML methods.
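For example, a quick baseline with a random forest on the engineered (FFT) features might look like this; the shapes and values here are placeholders, not your data:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X = np.random.rand(1000, 20)           # placeholder: one row per example, FFT-derived columns
y = np.random.randint(0, 2, 1000)      # placeholder: class labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(accuracy_score(y_te, clf.predict(X_te)))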
The hallmark of a CNN is that it operates on an ordered sequence of data. In your case the raw data, from which your dataset was derived, would be appropriate for a CNN. In a sound file you have a 1D sequence of information. You could not re-order the data in the time dimension without fundamentally changing its meaning.
A 2D CNN operates over an image where the pixel order in X and Y cannot be changed. Again the sequential order of the data matters. The same applies for 3D CNNs.
Be aware that the application of an FFT has fundamentally biased your solution by representing the signal only in a limited set of frequency responses. All feature engineering fundamentally biases the problem, presumably in a well thought-out way. However, it's entirely possible that other useful signals exist in the data which aren't expressed by the FFT at 10, 20, 30 Hz, etc. The CNN has the capacity to learn its own version of an FFT as well as other non-cyclic patterns. Typically, the lack of a feature engineering step is the key differentiator between the CNN and traditional ML algorithms.
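For comparison, a 1D CNN over the raw (pre-FFT) signal might look roughly like this in Keras; the window length, layer sizes, and binary output are assumptions, not anything taken from your data:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16000, 1)),    # (time steps, channels); length is a placeholder
    tf.keras.layers.Conv1D(16, kernel_size=64, strides=8, activation='relu'),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(32, kernel_size=16, activation='relu'),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(X_raw, y, ...) where X_raw has shape (n_examples, 16000, 1)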
I'm wondering how to train a Multivariate Bayesian Structural Time Series (BSTS) model that automatically performs feature selection on hundreds of input time series using Tensorflow Probability.
The TF-Probability BSTS blog post shows how to include seasonal effects alongside a single input feature:
...
temp_effect = sts.LinearRegression(
    design_matrix=tf.reshape(temp - np.mean(temp), (-1, 1)),
    name='temp_effect')
...
model = sts.Sum([..., temp_effect, ...],
                observed_time_series=observed_time_series)
But what about when there are multiple input time series?
Reading through the documentation makes it seem that with many inputs SparseLinearRegression would be preferable, which makes sense, but how should I adapt my code?
The documentation for both the LinearRegression and SparseLinearRegression components suggests using design_matrix=tf.stack([series1, series2], axis=-1) with weights_prior_scale=0.1, but since that's different from how TF-Probability's own blog post uses it, I am unsure if that is the best way to go.
Should I add all (several hundred) input features inside the design_matrix of a single SparseLinearRegression, or should I add a separate LinearRegression for each feature and then use sts.Sum() to combine them all into the model? Though I would like the ability to visualize the impact of each feature, I am most interested in having the model automatically perform feature selection and generate weights for the remaining features, which I can then access.
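For reference, the first option I am considering (a single SparseLinearRegression over a stacked design matrix) would look roughly like this; the random series here are placeholders standing in for my real inputs:

import tensorflow as tf
import tensorflow_probability as tfp
sts = tfp.sts

# Placeholders standing in for the real series (length T each).
T = 200
observed_time_series = tf.random.normal([T])
series1, series2, series3 = (tf.random.normal([T]) for _ in range(3))

# All inputs in one SparseLinearRegression via a stacked design matrix.
design_matrix = tf.stack([series1, series2, series3], axis=-1)   # shape [T, num_features]
regression = sts.SparseLinearRegression(design_matrix=design_matrix,
                                        weights_prior_scale=0.1)

trend = sts.LocalLinearTrend(observed_time_series=observed_time_series)
model = sts.Sum([trend, regression],
                observed_time_series=observed_time_series)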
I do not understand how model.predict(...) works on a time series forecasting problem. I usually use it with a CNN and it is pretty straightforward, but for time series I don't understand what it returns.
For example, I am currently doing an exercise where I have to forecast power consumption from historical data using an LSTM. I managed to train my model, but when I want to know what the power consumption will be tomorrow (so no data except past values) I don't know what input to use.
Traditional ML algorithms, which you might be more used to, generally expect the data in a 2D structure like this:
For sequential data, such as a stream of timed events associated with each user, it’s also possible to create a lagged 2D dataset, where the history of different features for different IDs is aligned into single rows, with this structure:
This can be a good way to work because once your data is in the correct shape you can use it with models that are fast to set up and train. However, models using features engineered with this approach generally don't have any capacity to "learn" anything about the natural sequence of the data. To something like a tree-based ensemble model receiving this format, feature 1 at time t and at time t-1 in the example above are treated as completely independent, and this can severely limit the model's predictive power.
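As a minimal illustration of such a lagged layout (the column names and lag depth are placeholders):

import pandas as pd

df = pd.DataFrame({'id':        [1, 1, 1, 1, 2, 2, 2, 2],
                   'feature_1': [10, 11, 12, 13, 20, 21, 22, 23]})

for lag in (1, 2):
    df['feature_1_t-%d' % lag] = df.groupby('id')['feature_1'].shift(lag)

# Each row now holds feature_1 at t, t-1 and t-2 side by side; a tree-based
# model sees those columns as unrelated inputs.
print(df.dropna())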
There are types of deep learning architecture specifically designed for modelling sequence data called recurrent neural nets (RNN). Two of the most popular cells to use in these are long short term memory (LSTM) and gated recurrent units (GRU). There’s a good post on how to understand how LSTM cells work here, but the TL;DR is they have a structure that allows them to learn from sequences of data.
Cells like LSTM expect a 3D tensor of input data. We arrange it so that one axis has the data features along it, the second axis has the sequence steps (like time ticks) and the third axis has each of the different examples we want to predict a single "y" value for stacked along it. Using the same type of dataset as the lagged example above, it would look something like this:
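A minimal numpy sketch of that layout (all sizes are placeholders):

import numpy as np

n_examples, n_timesteps, n_features = 5, 10, 3
X = np.random.rand(n_examples, n_timesteps, n_features)

print(X.shape)      # (5, 10, 3)
print(X[0].shape)   # one example: its 10 time steps by 3 features, in order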
The ability to learn patterns in sequences of data like this is particularly beneficial for both time series and text data, which are naturally ordered.
To return to your original question, when you want to predict something in your test set you'll need to pass it sequences represented just like the ones it was trained on (this is a reasonably good rule of supervised learning in general). For example, if the model was trained on data like the last example above, you'll need to pass it a 2D example for each ID you want to make a prediction for.
You should explore the way the original training data is represented and make sure you understand it well, as you'll need to create the same shape of data to make predictions. X_train.shape is a great place to start, if you have your training data in a pandas dataframe or numpy arrays, to see what the dimensionality is, and then you can inspect entries along each axis until you get a good feel for the data it contains.
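For instance, a next-step forecast needs one input window shaped exactly like the training windows; the array sizes and the tiny untrained LSTM below are placeholders standing in for your real data and trained model:

import numpy as np
import tensorflow as tf

X_train = np.random.rand(100, 24, 1)        # e.g. 100 windows of 24 hourly readings
print(X_train.shape)                        # (100, 24, 1)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(24, 1)),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

last_window = X_train[-1:]                  # the most recent window, shape (1, 24, 1)
next_step = model.predict(last_window)      # one forecast value for the step after that window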
I am having trouble with the supervised classification method I am using for my data.
Let's say we are training our algorithm on data (N=70) after reducing the dimensionality from 100 to 2 using LDA.
Now we would like to predict the class of the 71st sample, whose class is completely unknown to us. However, it still has 100 features, so we have to reduce its dimensionality as well.
That seems easy at first glance: I can reuse the transformation fitted on the training data. For example, in Python:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis(n_components=2)
flda = lda.fit(X, Y)          # fit the LDA transform on the training data
X_lda = flda.transform(X)     # project the training data to 2D
clf.fit(X_lda, Y)             # clf is the downstream classifier, trained on the reduced data
I have stored the fitted transformation from the training data. X_p is my single sample, so when I use 'flda' again for the transformation, the same fitted information is used:
X_p = flda.transform(X_p.reshape(1, -1))
However, it doesn't predict properly! To test this, I took my first N=70 samples and extracted one of them (so the training set is now N=69), using the 70th sample as the test sample. It still didn't predict properly.
When I compared my previous results (N=70) with the new ones (N=69), I saw that every single number had changed! If I am not missing something (I hope I am, and that you can tell me what it is), LDA dimensionality reduction is not applicable to real machine learning applications, because a single data point can change everything.
As a note, the plot of the reduced data doesn't change even though all the numbers change significantly (which means the relative locations of the points do not change).
Do you know how LDA dimensionality reduction is used in real machine learning applications? What must I do to test one sample in the following order:
Reduce dimensions to 2 for training data
Reduce dimensions to 2 for test data
Predict!
without using the same transformation characteristics?
Let's say I have two arrays in a dataset:
1) The first is an array of binary class labels (0 or 1): [0,1,0,1,1,1,0,.....]
2) The second consists of greyscale image vectors with 2500 elements each (values from 0 to 300). These numbers are the pixels of 50*50 px images: [[13 160 239 192 219 199 4 60..][....][....][....][....]]
The size of this dataset is quite significant (~12000 elements).
I am trying to build a very basic binary classifier that gives reasonable results. Let's say I want to choose a supervised method other than deep learning.
Is that suitable in this case? I've already tried sklearn's SVM with various parameters, but the outcome is unacceptably inaccurate and consists mainly of 1s: [1,1,1,1,1,0,1,1,1,....]
What is the right approach? Isn't the size of the dataset enough to get a good result with a supervised algorithm?
You should probably post this on Cross Validated.
But as a direct answer, you should probably look into sequence-to-sequence learners, since it has become clear to you that SVM is not the ideal solution for this.
You should look into Markov models for sequential learning if you don't want to go the deep learning route; however, neural networks have a very good track record with image classification problems.
Ideally, for sequential learning, you should look into long short-term memory (LSTM) recurrent neural networks, and for your current dataset see whether pre-training on an existing data corpus (say, CIFAR-10) helps.
So my recommendation is to give TensorFlow a try with a high-level library such as Keras/SKFlow.
Neural networks are just another tool in your machine learning repertoire, and you might as well give them a real chance.
An edit to address your comment:
Your issue there is not a lack of data for the SVM. The SVM will work well on a small dataset, as it will be easier for it to fit (or overfit) a separating hyperplane on it. As you increase your data's dimensionality, keep in mind that separating it with a hyperplane becomes increasingly difficult (look up the curse of dimensionality).
However, if you are set on doing it this way, try some dimensionality reduction such as PCA. Although here you're bound to find another face-off with neural networks, since Kohonen self-organizing maps do this task beautifully, you could attempt to project your data into a lower dimension, thereby allowing the SVM to separate it with greater accuracy.
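A rough sketch of that route (PCA then SVM), mirroring the shapes from the question with random placeholder values and illustrative parameters:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.random.randint(0, 300, size=(12000, 2500)).astype(float)   # placeholder pixel vectors
y = np.random.randint(0, 2, size=12000)                           # placeholder labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = make_pipeline(StandardScaler(), PCA(n_components=50), SVC(kernel='rbf', C=1.0))
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))

Scaling the raw pixel values before the SVM often matters as much as the reduction itself, and can by itself help with predictions collapsing to a single class.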
I still have to stand by saying you may be using the incorrect approach.