I'm working on a project where I need to find anomalous values (counts of people) across several categorical dimensions (e.g., country, occupation, and a few more) and across different days.
Below is a sample of the data
Here, count is the number of people per day, country and occupation.
How do I go about this? Any recommended Python libraries or models? I found lots of tutorials on multivariate time series analysis, but my data isn't a multivariate time series, since the categorical variables in this dataset do not depend on time.
You can try recurrent models such as LSTM, BiRNN or GRU for multivariate time-series prediction.
You can use TensorFlow or PyTorch to build the model.
Sklearn has multiple possibilities. You could take a look at Isolation Forest.
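As a rough illustration of the Isolation Forest suggestion, here is a minimal sketch. The column names (date, country, occupation, count), the synthetic data, and the choice to score each count relative to its own group's median are all assumptions made for the example, not something taken from the question's real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def flag_anomalies(df, contamination=0.05, random_state=0):
    """Flag unusual counts within each (country, occupation) group."""
    out = df.copy()
    grp = out.groupby(["country", "occupation"])["count"]
    # Express each count relative to its own group's typical level, so a
    # large group is not flagged merely for being large.
    median = grp.transform("median")
    mad = grp.transform(lambda s: (s - s.median()).abs().median())
    out["deviation"] = (out["count"] - median) / (mad + 1.0)
    model = IsolationForest(contamination=contamination,
                            random_state=random_state)
    # fit_predict returns -1 for anomalies and 1 for normal points.
    out["is_anomaly"] = model.fit_predict(out[["deviation"]]) == -1
    return out

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
days = pd.date_range("2021-01-01", periods=60, freq="D")
rows = []
for country, occupation, level in [("US", "engineer", 100),
                                   ("FR", "teacher", 20)]:
    for day in days:
        rows.append({"date": day, "country": country,
                     "occupation": occupation,
                     "count": int(rng.poisson(level))})
demo = pd.DataFrame(rows)
demo.loc[10, "count"] = 400  # inject an obvious spike into US/engineer

result = flag_anomalies(demo)
print(result[result["is_anomaly"]][["date", "country", "occupation", "count"]])
```

The contamination value is a guess at the share of anomalous rows and usually needs tuning on real data.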
Related
I am new to time series and I have a problem. I have a dataset with 3 columns: time, category, and the frequency of that category. The time runs from 2016 to the end of 2017. I need to forecast the frequency of each category during 2018. Dataset:
I need to use NeuralProphet to forecast these frequencies, but I only know how to forecast with ds and y. Please advise on how to do this task using NeuralProphet.
Thanks
As mentioned in the NeuralProphet docs, here https://neuralprophet.com/model-overview/:
If you have many series that you expect to produce forecasts for, you
need to do this one at a time.
In your case you have multiple time series, i.e., a separate time series for each primary_industry. So if you are going to apply NeuralProphet to this dataset, you will have to fit the model on each industry separately.
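A minimal sketch of that per-series approach follows. The column names time, primary_industry and frequency are assumed from the description, and the tiny demo table is made up. The NeuralProphet calls (fit, make_future_dataframe, predict) sit in a separate function so that the splitting step can be tried without the library installed.

```python
import pandas as pd

def split_by_category(df, time_col, category_col, value_col):
    """Return {category: DataFrame with columns ds, y}, one per series."""
    series = {}
    for name, group in df.groupby(category_col):
        frame = (group[[time_col, value_col]]
                 .rename(columns={time_col: "ds", value_col: "y"})
                 .sort_values("ds")
                 .reset_index(drop=True))
        frame["ds"] = pd.to_datetime(frame["ds"])
        series[name] = frame
    return series

def forecast_each(series, periods, freq="D"):
    """Fit one NeuralProphet model per series (needs neuralprophet)."""
    from neuralprophet import NeuralProphet  # imported lazily on purpose
    forecasts = {}
    for name, frame in series.items():
        model = NeuralProphet()
        model.fit(frame, freq=freq)
        future = model.make_future_dataframe(frame, periods=periods)
        forecasts[name] = model.predict(future)
    return forecasts

# Made-up data, for illustration only.
demo = pd.DataFrame({
    "time": ["2016-01-02", "2016-01-01", "2016-01-01", "2016-01-02"],
    "primary_industry": ["retail", "retail", "mining", "mining"],
    "frequency": [5, 3, 7, 8],
})
per_industry = split_by_category(demo, "time", "primary_industry", "frequency")
# forecasts = forecast_each(per_industry, periods=365)  # on the real data
```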
Is there any way to use multiple time series to train one model and then use this model for predictions given a new time series as input? It is rather a theoretical question, but I did not know where else to post it.
It's theoretically possible; nevertheless, every time series has its own characteristics in terms of seasonality, stationarity and frequency (in case you are talking about mixing series).
I've seen some work that combines wavelet decomposition, deep learning and time series, and uses several datasets and weights to train the model. But in that work the time series are similar: the same metric at different times (e.g., temperature in a city in 2000-2001 and 2005-2007).
I also found a library called darts.
I'm working with a company on a project to develop ML models for predictive maintenance. The data we have is a collection of log files. In each log file we have time series from sensors (Temperature, Pressure, MotorSpeed, ...) and a variable in which we record the faults that occurred. The aim is to build a model that takes the log files (the time series) as input and predicts whether there will be a failure or not. For this I have some questions:
1) What is the best model capable of doing this?
2) What is the solution to deal with imbalanced data? In fact, for some kinds of failures we don't have enough data.
I tried to construct an RNN classifier using LSTM after transforming the time series to sub time series of a fixed length. The targets were 1 if there was a fault and 0 if not. The number of ones compared to the number of zeros is negligible. As a result, the model always predicted 0. What is the solution?
Mohamed, for this problem you could actually start with traditional ML models (random forest, LightGBM, or anything of this nature). I recommend you focus on your features. For example, you mentioned Pressure and MotorSpeed. Look at some window of time going back and calculate moving averages, min/max values and the standard deviation over that same window. To tackle this problem you will need a good set of features. Take a look at the featuretools package. You can either use it or get some ideas about what features can be created from time series data. Back to your questions.
1) What is the best model capable of doing this? Traditional ML methods as mentioned above. You could also use deep learning models, but I would first start with easy models. Also if you do not have a lot of data I probably would not touch RNN models.
2) What is the solution to deal with imbalanced data? You may want to oversample or undersample your data. For oversampling, look at SMOTE (implemented in the imbalanced-learn package).
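The window features and the oversampling step described above can be sketched roughly as follows. The column names, the window length and the tiny log table are assumptions for illustration. Note that this uses plain random oversampling (duplicating minority rows) rather than SMOTE, which synthesizes new points. In practice, oversampling should be applied to the training split only.

```python
import numpy as np
import pandas as pd

def add_rolling_features(df, sensor_cols, window=5):
    """Append trailing-window mean/min/max/std columns for each sensor."""
    out = df.copy()
    for col in sensor_cols:
        # The window covers the current row and the rows before it,
        # so no future information leaks into the features.
        roll = out[col].rolling(window=window, min_periods=1)
        out[f"{col}_mean"] = roll.mean()
        out[f"{col}_min"] = roll.min()
        out[f"{col}_max"] = roll.max()
        # The std of a single row is undefined, so fill it with 0.
        out[f"{col}_std"] = roll.std().fillna(0.0)
    return out

def random_oversample(X, y, random_state=0):
    """Duplicate minority-class rows until every class has equal size."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for cls, n in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=target - n, replace=True)
        parts.append(np.concatenate([idx, extra]))
    order = rng.permutation(np.concatenate(parts))
    return X[order], y[order]

# Made-up log, for illustration only.
log = pd.DataFrame({
    "Temperature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Pressure": [10.0, 10.0, 11.0, 30.0, 12.0, 10.0],
    "fault": [0, 0, 0, 1, 0, 0],
})
features = add_rolling_features(log, ["Temperature", "Pressure"], window=3)
X = features.drop(columns="fault").to_numpy()
y = features["fault"].to_numpy()
X_bal, y_bal = random_oversample(X, y)
```

The balanced arrays can then be passed to a random forest or a gradient-boosting classifier.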
Good luck
I want to predict a company's sales. I tried an LSTM, but all the examples that I found only use two variables (time and sales).
https://www.kaggle.com/freespirit08/time-series-for-beginners-with-arima
This page says that time series only use two variables, but I think that is not sufficient to build a good forecast. After this, I found different 'multiple features' options, like polynomial regression with PolynomialFeatures from sklearn, or regression trees. I haven't written a script with these algorithms yet, so I would like your recommendations on which model to use.
Thanks.
You could try Facebook's Prophet, which allows you to take into account additional regressors, or Amazon's DeepAR.
But I have also seen forecasting models in production that are based not on ARIMA-style time series but on simple linear regression with extensive feature engineering (features = store + product + historical values).
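A minimal sketch of that regression-with-features idea, assuming a daily table with hypothetical columns date, promo and sales. The lags (1 and 7 days), the synthetic data and the extra promo variable are illustrative choices, not taken from any real production system.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def make_lag_features(df, target="sales", lags=(1, 7)):
    """Add lagged copies of the target plus a day-of-week column."""
    out = df.copy()
    for lag in lags:
        out[f"{target}_lag_{lag}"] = out[target].shift(lag)
    out["day_of_week"] = out["date"].dt.dayofweek
    # The first max(lags) rows have no history, so drop them.
    return out.dropna().reset_index(drop=True)

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n = 120
dates = pd.date_range("2022-01-01", periods=n, freq="D")
promo = rng.integers(0, 2, size=n)
sales = (100 + 10 * np.sin(2 * np.pi * np.arange(n) / 7)
         + 20 * promo + rng.normal(0, 1, n))
df = pd.DataFrame({"date": dates, "promo": promo, "sales": sales})

data = make_lag_features(df)
feature_cols = ["promo", "day_of_week", "sales_lag_1", "sales_lag_7"]

# Split chronologically: never shuffle when evaluating a forecast.
train, test = data.iloc[:-20], data.iloc[-20:]
model = LinearRegression().fit(train[feature_cols], train["sales"])
pred = model.predict(test[feature_cols])
mae = float(np.mean(np.abs(pred - test["sales"].to_numpy())))
print(f"MAE on the last 20 days: {mae:.2f}")
```

This evaluates one-step-ahead predictions, since sales_lag_1 requires yesterday's actual value.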
Hope this helps.
I would recommend using Prophet, as it has certain advantages over conventional models like ARIMA:
It handles missing values well.
Tuning its parameters is much easier and more intuitive.
Traditional time series forecasting models expect data points at consistent time intervals. That is not the case with Prophet: the time interval need not be the same throughout.
I have around 4 years of bank data from different branches. I am trying to predict the number of rows at the daily and hourly level. I have issue_datetime (year, month, day, hour) as the important features. I applied different regression techniques (linear, decision trees, random forest, XGBoost) using GraphLab but could not get good accuracy.
I was also thinking of setting a threshold based on past data, e.g., taking the mean of the counts at the daily or monthly level after removing outliers and using that as the threshold.
What is the best approach?
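For reference, the thresholding idea in the question could look roughly like this. Removing outliers with the 1.5 x IQR rule and adding a margin of three standard deviations on top of the mean are assumptions made for the sketch. The question only mentions using the mean after outlier removal.

```python
import pandas as pd

def robust_threshold(counts, k=1.5, n_std=3.0):
    """Mean of the counts after IQR-based outlier removal, plus a margin."""
    s = pd.Series(counts, dtype=float)
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    # Keep only values inside the usual Tukey fences.
    trimmed = s[(s >= q1 - k * iqr) & (s <= q3 + k * iqr)]
    return trimmed.mean() + n_std * trimmed.std()

# Made-up daily counts, for illustration only.
daily_counts = [10, 12, 11, 13, 9, 10, 12, 11, 200]
threshold = robust_threshold(daily_counts)
flagged = [c for c in daily_counts if c > threshold]
```

Setting n_std=0 gives the plain trimmed mean described in the question.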
Since you have 1-D time series data, it should be relatively easy to plot your data and look for interesting patterns.
Once you establish that there are some non-stationary aspects to your data, the class of models you probably want to check out first is autoregressive models, possibly with seasonal additions. ARIMA models are pretty standard for time-series data. http://www.seanabu.com/2016/03/22/time-series-seasonal-ARIMA-model-in-python/
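To make the autoregressive idea concrete, here is a minimal sketch that fits a plain AR(p) model by least squares using only NumPy. It is not a full ARIMA: there is no differencing, moving-average term or seasonal component, and for those a dedicated library such as statsmodels is the usual choice. The simulated series and all parameter values are made up for the example.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of y[t] = c + a1*y[t-1] + ... + ap*y[t-p]."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    # Column k holds y[t-k] for t = p .. n-1.
    lagged = [y[p - k: n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lagged)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, a1, ..., ap]

def forecast_ar(series, coef, steps):
    """Iterate the fitted recursion forward, feeding predictions back in."""
    p = len(coef) - 1
    history = list(np.asarray(series, dtype=float))
    preds = []
    for _ in range(steps):
        lags = history[-1: -p - 1: -1]  # most recent value first
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        preds.append(nxt)
        history.append(nxt)
    return np.array(preds)

# Simulated AR(1) series, for illustration only.
rng = np.random.default_rng(0)
y = [0.0]
for _ in range(499):
    y.append(2.0 + 0.8 * y[-1] + rng.normal(0, 0.5))

coef = fit_ar(y, p=1)
next_week = forecast_ar(y, coef, steps=7)
```

If the data is non-stationary, differencing the series first (for example with np.diff) and fitting on the differences is the usual first step.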