Time series anomaly detection: steps, models, tools (Python)

I will have a device that measures, for example, temperature values at specific time intervals.
I want to train a model that learns which values do not belong to the "normal" values and, when one occurs, raises an alert. So, I want a time series anomaly detection model.
At first, I thought of using a clustering model (k-means, hierarchical). At the beginning there will be many alerts; later, some clusters will be created and hopefully I will have a good model!
But since I don't have any experience with this, I want to ask whether this approach is right and what other approaches exist. And what kind of tools should I use (either Python or R)?
I have read many links and a few papers, and I can see that some people advise against using k-means.
Also, I am not sure how, or whether, to use Dynamic Time Warping clustering.
I read a paper, "Clustering of Time Series Subsequences is Meaningless", which states that under some conditions applying clustering is meaningless.
I also saw the tsoutliers package and Twitter's AnomalyDetection package, but as I said, I am not sure which approach/tools I should use.
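To make the idea concrete, here is a rough sketch of the clustering approach I have in mind, assuming scikit-learn; the window length, cluster count, and threshold are arbitrary placeholders, and, per the paper above, clustering sliding-window subsequences may itself be problematic:

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder stream of temperature readings.
rng = np.random.default_rng(0)
temps = 20 + rng.normal(0, 0.5, 1000)

# Slice the series into fixed-length windows.
window = 10
X = np.lib.stride_tricks.sliding_window_view(temps, window)

# Cluster the windows; "normal" behaviour should form dense clusters.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Score each window by its distance to the nearest centroid.
dist = kmeans.transform(X).min(axis=1)

# Flag windows far from every centroid (arbitrary percentile threshold).
threshold = np.percentile(dist, 99)
alerts = np.where(dist > threshold)[0]
print(f"{len(alerts)} windows flagged as anomalous")
```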

I am also trying to get anomaly detection working. I started with an ARMA model, but it produced a lot of false positives, so I am now trying to implement an approach using PCA. The idea is to approximate the function using PCA components. I got this idea through this link. Hopefully this helps you. I will update to a better answer when I implement this fully.
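To illustrate the idea, here is a minimal sketch of PCA-based scoring, assuming scikit-learn; the window length, component count, and threshold are placeholders I chose for the example. Windows that the top components cannot reconstruct well get a high anomaly score:

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder sensor readings shaped into fixed-length windows.
rng = np.random.default_rng(1)
series = np.sin(np.linspace(0, 60, 2000)) + rng.normal(0, 0.1, 2000)
window = 20
X = np.lib.stride_tricks.sliding_window_view(series, window)

# Approximate each window with its top principal components.
pca = PCA(n_components=3).fit(X)
X_rec = pca.inverse_transform(pca.transform(X))

# Anomaly score = reconstruction error: a large error means the
# "normal" components cannot explain that window.
score = np.linalg.norm(X - X_rec, axis=1)
anomalies = np.where(score > np.percentile(score, 99))[0]
print(f"{len(anomalies)} suspicious windows")
```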

Related

Regression problem optimization using ML or DL

I have some data (from sensors, etc.) from an energy system. Consider that the x-axis is temperature and the y-axis is energy consumption. Suppose we just have the data and we don't have access to the mathematical formulation of the problem:
[Figure: energy consumption vs. temperature curve]
In the above figure, it is absolutely obvious that the optimum point is at 20. I want to predict the optimum point using ML or DL models. Based on the courses that I have taken, I know that this is a supervised regression problem; however, I don't know how I can do optimization on this kind of problem.
I don't want you to write code for this problem. I just want some hints and instructions for approaching this optimization problem.
Also, if you can recommend any references or courses for learning how to predict the optimum point of a supervised regression problem without knowing the mathematical formulation of the problem, I will welcome them.
There are lots of things you can try when it comes to optimizing your model, for example, fine-tuning it. What you do with fine-tuning is try the different options a model offers and find the smallest error or highest accuracy based on the actual and predicted data.
Using a DecisionTreeRegressor model, you can try different split criteria and limit the minimum number of samples per split and the maximum depth to see which settings give you the best scores/errors; a sketch follows below. For a neural network model using Keras, you can try different optimizers, try different loss functions, tune your parameters, etc., and try them out in combination.
As for resources, you can go to Google, YouTube, and other platforms and use keywords such as "fine tuning DNN model", and a lot of resources will pop up for your reference. The bottom line is that you will need to try out different models and fine-tune until you are satisfied with your results. The results are based on your judgement and there are no right or wrong answers (i.e., errors are always there); it is completely up to you how you achieve your solution with the handful of ML and DL models you have. My advice is to spend more time getting your hands dirty. It will be worth it in the long run. HFGL!
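As a concrete illustration (a sketch only, with made-up data standing in for your sensor readings), you could grid-search a DecisionTreeRegressor over those options and then read the optimum temperature off the fitted curve:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Made-up (temperature, consumption) pairs with a minimum near 20.
rng = np.random.default_rng(0)
temp = rng.uniform(0, 40, 300).reshape(-1, 1)
consumption = (temp.ravel() - 20) ** 2 + rng.normal(0, 5, 300)

# Fine-tune: try different split criteria, depths, and split sizes.
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={
        "criterion": ["squared_error", "friedman_mse"],
        "max_depth": [3, 5, 10],
        "min_samples_split": [2, 10, 50],
    },
    scoring="neg_mean_squared_error",
    cv=5,
).fit(temp, consumption)

# Predict over a fine grid and take the argmin as the optimum point.
grid_temps = np.linspace(0, 40, 401).reshape(-1, 1)
pred = grid.best_estimator_.predict(grid_temps)
print("best params:", grid.best_params_)
print("estimated optimum temperature:", grid_temps[pred.argmin()][0])
```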

How do I make a good feature using machine learning on a time series forecast that has only traffic volume as an input?

So I have a time series that only has traffic volume. I've tried FB Prophet and NeuralProphet. They work okay, but I would like to do something using machine learning. So far my problem is constructing features. Using the classical day-of-year, month, etc. does not give me good results. I have tried using shift, where I take the average, minimum, and maximum of the two previous days. That works, but when I try to predict days in advance the feature breaks down, since I can't get the average of a day that hasn't happened yet. My main concern is finding a good feature that my future prediction dataframe will also have. A picture of my data is included. Does anyone know how I would do this?
First of all, you have to clarify some definitions. FBProphet works on the same mechanism as any machine learning algorithm: fitting the model and then predicting the output. Being an additive regression model with a piecewise linear or logistic growth curve trend, it can be considered a machine learning method that allows us to predict a continuous outcome variable.
Secondly, I think you missed the most important term your question is about, namely: feature engineering.
Feature engineering includes:
The process of using domain knowledge to extract features (characteristics, properties, attributes) from raw data.
The process of transforming raw data into features that better represent the underlying problem to the predictive models, etc.
But it is very unlikely that machine learning itself does the feature engineering; you do feature engineering in order to improve your machine learning model. Many techniques, such as imputation, handling outliers, binning, log transforms, one-hot encoding, grouping operations, feature splits, and scaling, are hybrid methods using a statistical approach and/or domain knowledge.
Regarding your data, and bearing in mind that seasonality is already handled by FBProphet, I am not confident that feature engineering transformations such as adding the day of the week or adding holiday periods could really help improve performance.
To conclude, it is not possible to create new features ex nihilo that would outperform your model: whether you process/transform your data or add an external domain-knowledge dataset, the features still have to come from the data and domain knowledge you already have.
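Regarding the specific problem of predicting days in advance, the usual trick is to shift lag/rolling features by at least the forecast horizon, so that every feature is already known on the day you predict. A rough pandas sketch ("volume" and the horizon are placeholders):

```python
import pandas as pd

# Placeholder daily traffic data.
df = pd.DataFrame(
    {"volume": range(100)},
    index=pd.date_range("2023-01-01", periods=100, freq="D"),
)

horizon = 3  # predicting 3 days ahead

# Shift by the horizon first, then aggregate: every feature below is
# available at prediction time, even 3 days in advance.
past = df["volume"].shift(horizon)
df["lag_h"] = past
df["roll_mean_2"] = past.rolling(2).mean()
df["roll_min_2"] = past.rolling(2).min()
df["roll_max_2"] = past.rolling(2).max()

# Rows at the start have no history yet.
df = df.dropna()
print(df.head())
```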

Best model to predict failure using time series from sensors

I'm working with a company on a project to develop ML models for predictive maintenance. The data we have is a collection of log files. In each log file we have time series from sensors (Temperature, Pressure, MotorSpeed, ...) and a variable in which we record the faults that occurred. The aim here is to build a model that uses the log files (the time series) as input and predicts whether there will be a failure or not. For this I have some questions:
1) What is the best model capable of doing this?
2) What is the solution for dealing with imbalanced data? In fact, for some kinds of failures we don't have enough data.
I tried to construct an RNN classifier using LSTMs after transforming the time series into sub-series of a fixed length. The targets were 1 if there was a fault and 0 if not. The number of ones is negligible compared to the number of zeros; as a result, the model always predicted 0. What is the solution?
Mohamed, for this problem you could actually start with traditional ML models (random forest, LightGBM, or anything of that nature). I recommend you focus on your features. For example, you mentioned Pressure and MotorSpeed: look at some window of time going back and calculate moving averages, min/max values, and standard deviation over that window. To tackle this problem you will need a set of healthy features. Take a look at the featuretools package; you can either use it or get ideas for features that can be created from time series data. Back to your questions.
1) What is the best model capable of doing this? Traditional ML methods, as mentioned above. You could also use deep learning models, but I would start with simple models first. Also, if you do not have a lot of data, I probably would not touch RNN models.
2) What is the solution for dealing with imbalanced data? You may want to oversample or undersample your data. For oversampling, look at SMOTE (for example, the implementation in the imbalanced-learn package); a sketch follows below.
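A minimal sketch of the oversampling step, assuming the imbalanced-learn package and already-built window features (the data here is a random placeholder):

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

# Placeholder window features: 1000 healthy windows, 20 faulty ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(1020, 8))
y = np.array([0] * 1000 + [1] * 20)

# Oversample the minority (fault) class with SMOTE.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("before:", np.bincount(y), "after:", np.bincount(y_res))

# class_weight="balanced" is a complementary way to fight imbalance.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X_res, y_res)
```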
Good luck

How to start to recognize object in opencv

My final goal is to dynamically recognize different objects that are 2D and often have the same appearance (a 2D deck game) in video. I was studying the opencv-python tutorial, but there isn't any topic about this, so I want to know which topics, libraries, or functions I should learn to reach my goal. Thanks.
You can try several machine learning techniques. You may get inspiration from Viola-Jones or Histograms of Oriented Gradients + SVM (even though those algorithms solve a problem that may differ from yours, I got plenty of insights from them). In other words, try "sliding" a window of a predefined aspect ratio along the horizontal and vertical axes and try to recognize the region of interest with a model of your choice (CNN, SVM, logistic regression, etc.). The problem may be that you will need to train a model, which may require a lot of data.
Or you can do template matching, which is more of an image processing technique than machine learning. It does not require a dataset or training, but it will be sensitive to noise, lighting, and position.
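A minimal template-matching sketch with OpenCV (the file names are placeholders for your own frame and card template, and the threshold is arbitrary):

```python
import cv2
import numpy as np

# Placeholder file names: one video frame and one card template.
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("card_template.png", cv2.IMREAD_GRAYSCALE)
h, w = template.shape

# Slide the template over the frame and score each position.
result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)

# Keep every location scoring above the threshold and draw a box.
threshold = 0.8
for y, x in zip(*np.where(result >= threshold)):
    cv2.rectangle(frame, (int(x), int(y)), (int(x) + w, int(y) + h), 255, 2)

cv2.imwrite("matches.png", frame)
```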
Good luck!

NN: outputting a probability density function instead of a single value

This might sound silly, but I'm wondering about the possibility of modifying a neural network to obtain a probability density function rather than a single value when trying to predict a scalar. I know that when classifying images or words you can get a probability for each class, so I'm thinking there might be a way to do something similar with a continuous value and plot it (similar to the posterior plot in Bayesian optimisation).
Such detail could be interesting when deploying a model for prediction and could provide more flexibility than a single value.
Does anyone know a way to obtain such an output?
Thanks!
OK, so I found a solution to this issue, though it adds a lot of overhead.
Initially I thought a Keras callback could be of use, because it provided the flexibility I wanted, i.e. running only on test data or only on a subset, and not for every test. But it turns out callbacks are only given summary data from the logs.
So the first step was to create a custom metric that does the same calculation as any metric on the two arrays (the true values and the predicted values) and, once those calculations are done, writes them to a file for later use.
Then, having found a way to gather all the data for every sample, the next step was to implement a method that gives a good measure of error. I'm currently implementing a handful of methods, but the most fitting one seems to be Bayesian bootstrapping (user lmc2179 has a great Python implementation); a sketch of its core follows below. I also implemented ensemble methods and Gaussian processes as alternatives, or to use as other metrics, along with some other Bayesian methods.
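For the record, the core of the Bayesian bootstrap is only a few lines (a numpy sketch; the residuals here are placeholders for the per-sample errors gathered by the custom metric):

```python
import numpy as np

# Placeholder per-sample errors collected by the custom metric.
rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.5, size=200)

# Bayesian bootstrap: instead of resampling, draw Dirichlet weights
# over the observations and compute the weighted statistic each time.
n_draws = 5000
weights = rng.dirichlet(np.ones(len(residuals)), size=n_draws)
posterior_means = weights @ residuals

# Posterior samples of the mean error -> credible interval / density.
lo, hi = np.percentile(posterior_means, [2.5, 97.5])
print(f"95% credible interval for the mean error: [{lo:.3f}, {hi:.3f}]")
```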
I'll try to find out whether there are internals in Keras that are set during the training and testing phases, so that I can set a trigger for my metric. The main issue with using all the data is that you get a lot of unreliable data points at the start, since the network is not yet optimized. Some filtering could be useful to remove a good number of those points and improve the results of the error predictors.
I'll update if I find anything interesting.
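As a side note on the original question: another way to get a density out of a network is to have it output the parameters of a distribution and train it with negative log-likelihood. A minimal Keras sketch with a Gaussian output head, assuming TensorFlow and using placeholder data:

```python
import numpy as np
import tensorflow as tf

# Network outputs two numbers per sample: mean and log-variance.
inputs = tf.keras.Input(shape=(1,))
x = tf.keras.layers.Dense(32, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(2)(x)
model = tf.keras.Model(inputs, outputs)

def gaussian_nll(y_true, y_pred):
    # Negative log-likelihood of y_true under N(mu, exp(log_var)).
    mu = y_pred[:, :1]
    log_var = y_pred[:, 1:]
    return 0.5 * (log_var + tf.square(y_true - mu) / tf.exp(log_var))

model.compile(optimizer="adam", loss=gaussian_nll)

# Placeholder heteroscedastic data: noise grows with |x|.
rng = np.random.default_rng(0)
x_train = rng.uniform(-2, 2, (1000, 1)).astype("float32")
y_train = x_train ** 3 + rng.normal(0, 0.1 + 0.5 * np.abs(x_train))
model.fit(x_train, y_train.astype("float32"), epochs=20, verbose=0)

# Each prediction is now a full Gaussian you can plot as a density.
mu, log_var = np.split(model.predict(np.array([[1.0]])), 2, axis=1)
print("mean:", mu.ravel()[0], "std:", np.exp(0.5 * log_var).ravel()[0])
```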
