Is there any way to use multiple time-series to train one model and use this model for predictions given a new time-series as an input? It is rather a theoretical question but did not know where else to post it.
It's theoretically possible, nevertheless every time-series has it's own components about seasonality, stationarity, frequency. (In case you talk about mixing series).
I've seen some work using wavelets-decomposition, deep-learning, time-series and uses several datasets and weights to train the model. But the time-series are similar, same metric different times (aka Temperature in a city from 2000-2001, 2005-2007).
I found some library called darts
Related
I'm working on a project where I'm tasked to find anomalous data (count of people) across different dimensions (categorical i.e country, occupation and a few more) and different days.
Below is a sample of the data
count is count for people per day, country and occupation
How do I go about this? Any recommended Python libraries or models? I found lots of tutorials on multivariate time series analysis but my data isn't multivariate time series as the categorical variables in this dataset do not depend on time.
You can try with LSTM, BiRNN, GRU with multivariable time-series prediction.
You can use tensorflow or pytorch to build the model.
Sklearn has multiple possibilities. You could take a look at Isolation Forest.
I am trying to get the confidence intervals from an XGBoost saved model in a .tar.gz file that is created using python XGBoost library.
The problem is that the model has already been fitted, and I dont have training data any more, I just have inference or serving data to predict. All the examples that I found entail using a training and test data to create either quantile regression models, or bagged models, but I dont think I have the chance to do that.
Why your desired approach will not work
I assume we are talking about regression here. Given a regression model that you cannot modify, I think you will not be able to achieve your desired result using only the given model. The model was trained to calculate a continuous value that appoximates some objective value (i.e., its true value) based on some given input. Nothing more.
Possible solution
The only workaround I can think of would be to train two more models. These model's training goal would be to predict the quality of the output of your given model. One would calculate the upper bound of a given (i.e., predefined by you at training time) confidence interval and the other one the lower bound. This would probably include a lot of feature engineering. One would probably like to find features that correlate with the prediction quality of the original model.
I'm working with a company on a project to develop ML models for predictive maintenance. The data we have is a collection of log files. In each log file we have time series from sensors (Temperature, Pressure, MototSpeed,...) and a variable in which we record the faults occurred. The aim here is to build a model that will use the log files as its input (the time series) and to predict whether there will be a failure or not. For this I have some questions:
1) What is the best model capable of doing this?
2) What is the solution to deal with imbalanced data? In fact, for some kind of failures we don't have enough data.
I tried to construct an RNN classifier using LSTM after transforming the time series to sub time series of a fixed length. The targets were 1 if there was a fault and 0 if not. The number of ones compared to the number of zeros is negligible. As a result, the model always predicted 0. What is the solution?
Mohamed, for this problem you could actually start with traditional ML models (random forest, lightGBM, or anything of this nature). I recommend you focus on your features. For example you mentioned Pressure, MototSpeed. Look at some window of time going back. Calculate moving averages, min/max values in that same window, st.dev. To tackle this problem you will need to have a set of healthy features. Take a look at featuretools package. You can either use it or get some ideas what features can be created using time series data. Back to your questions.
1) What is the best model capable of doing this? Traditional ML methods as mentioned above. You could also use deep learning models, but I would first start with easy models. Also if you do not have a lot of data I probably would not touch RNN models.
2) What is the solution to deal with imbalanced data? You may want to oversample or undersample your data. For oversampling look at the SMOTE package.
Good luck
I want to predict company's sales. I tried with LSTM but all the examples that I found only use two variables (time and sales).
https://www.kaggle.com/freespirit08/time-series-for-beginners-with-arima
This page mentioned that time series only use two variables but I think that is not suficient to build a good forecast. After this, I found different 'multiple features' options like polynomial regression with PolynomialFeatures from sklearn or regression trees. I haven't write a script with these last algorithms yet, then I wanna know your recommendations about what model to use.
Thanks.
You could try Facebook's Prophet, which allows you to take into account additional regressors, or Amazon's DeepAR.
But I also have seen forecasting models based not on ARIMA style time series but on simple linear regression with extensive feature engineering (features=store+product+historical values) in production.
Hope this helps.
I would recommend using Prophet. As this has certain advantages over conventional models like ARIMA:
It take cares of empty value well.
Tunning its parameters is way easier and intuition based.
Traditional time series forecasting model expects data points to be in consistent time interval. However, that’s not the case with “Prophet”. Time interval need not to be same throughout.
I do not understand how model.predict(...) works on a a time series forecasting problem. I usually use it with a CNN and it is pretty straight forward but for time series I don't understand what it returns.
For example I am currently doing an exercise where I have to forecast the power consumption based on data using LTSM, I succeeded to train my model but when I want to know what the power cusumption will be tomorrow (so no data except past ones) I don't know what input to use.
Traditional ML algorithms, which you might be more used to, generally expect the data in a 2D structure like this:
For sequential data, such as a stream of timed events associated with each user, it’s also possible to create a lagged 2D dataset, where the history of different features for different IDs is aligned into single rows, with this structure:
This can be a good way to work because once your data is in the correct shape you can use it with fast to set up and train models. However, models using features engineered using this approach generally don’t have any capacity to “learn” anything about the natural sequence of the data. To something like a tree-based ensemble model receiving this format, feature 1 at time t and time t-1 in the example above are treated completely independently and this can severely limit the model’s predictive power.
There are types of deep learning architecture specifically designed for modelling sequence data called recurrent neural nets (RNN). Two of the most popular cells to use in these are long short term memory (LSTM) and gated recurrent units (GRU). There’s a good post on how to understand how LSTM cells work here, but the TL;DR is they have a structure that allows them to learn from sequences of data.
Cells like LSTM expect a 3D tensor of input data. We arrange it so that one axis has the data features along it, the second axis has the sequence steps (like time ticks) and the third axis has each of the different examples we want to predict a single "y" value for stacked along it. Using the same type of dataset as the lagged example above, it would look something like this:
The ability to learn patterns in sequences of data like this is particularly beneficial for both time series and text data, which are naturally ordered.
To return to your original question, when you want to predict something in your test set you'll need to pass it sequences represented just like the ones it was trained in (this is a reasonably good rule of supervised learning in general). For example, if the data is trained like the last example above, you'll need to pass it a 2D example for each ID you want to make a prediction for.
You should explore the way the original training data is represented and make sure you understand it well, as you'll need to create the same shape of data to make predictions. X_train.shape is a great place to start, if you have your training data in a pandas dataframe or numpy arrays, to see what the dimensionality is, and then you can inspect entries along each axis until you get a good feel for the data it contains.