My question is something I haven't encountered anywhere: I've been wondering whether a TF model can determine values between 2 dates that have real / validated values assigned to them.
I have an example :
Let's take the price of Nickel; here is its chart for the last week:
There is no data for the two following dates : 19/11 and 20/11
But we have the data points before and after.
So is it possible to use the data from before and after these 2 points to guess the values of the 2 missing dates?
Thank you a lot !
It would be possible to create a machine learning model to predict the prices given a dataset of previous prices. Take a look at this post for instance. You would have to modify it slightly such that it predicts the prices in the gaps given previous and upcoming prices.
But for the example you gave, assuming the dates are in this year (2022), these are a Saturday and a Sunday. The stock market is closed on weekends, hence there is no price for the item. Also note that there are other days in the year when no trading occurs, such as holidays, and of course there is no price on those days either.
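As a simpler baseline than a trained model, the gap can also be filled by interpolating between the neighbouring observations. The sketch below assumes pandas is available and uses made-up prices; it is plain linear interpolation, not the machine learning approach from the linked post.

```python
import pandas as pd

# Hypothetical closing prices (values are made up); 19/11 and 20/11 are missing.
prices = pd.Series(
    [100.0, 102.0, 104.0, 110.0, 112.0],
    index=pd.to_datetime(
        ["2022-11-16", "2022-11-17", "2022-11-18", "2022-11-21", "2022-11-22"]
    ),
)

# Reindex onto a full daily calendar so the missing dates show up as NaN.
daily = prices.asfreq("D")

# Linear interpolation weighted by the time distance to the neighbouring points.
filled = daily.interpolate(method="time")

print(filled)
```

Whether an interpolated weekend price means anything is a separate question, since no trading took place on those days.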
Related
I work in the gym space, and I'm trying to predict the numbers of gym leavers we will see next month, the following month etc.
The number of leavers is directly impacted by the number of joiners we had 13 months ago (for a 12-month contract) or 4 months ago (for a 3-month contract), as you need to give a month's notice.
There is some seasonality in Jan/Sept, but ultimately the type and length of contract a member joins on is the biggest contributor to how long they're likely to stay.
We have over a hundred permutations on contract types and length.
What is the best way to model this in Python, and which methods should I use?
I've created a proof-of-concept model in Excel, which looks at historic churn rates at month 1/2/3 etc. by contract, and applies them to our current member mix and their tenure to predict how many will leave this month. It's extremely messy and spread over lots of worksheets, but outside of irregular macro events it is very accurate in predicting leavers within the next month.
I've tried a linear regression of this month's leaver volume against all the joiners in t-1, t-2, ..., t-64, but it spits out a bunch of coefficients that don't give any reasonable number. Some are positive and some are negative. I thought that over a long enough period the number of joiners could be used to estimate leavers.
I've thought about time series next, but I struggle to understand how to set the data up to run that, as I have so many contract mixes. In one way, I need to look at the data and say: this person is on this contract, has been with us X months, so has this chance of leaving.
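The Excel approach described above (historic churn rate per contract and tenure month, applied to the current member mix) could be sketched in pandas roughly as follows. All column names and numbers are hypothetical, and this is only one possible layout of the data, not a recommended final model.

```python
import pandas as pd

# Hypothetical history: for each contract type and tenure month, how many
# members were active at the start of that month and how many of them left.
history = pd.DataFrame(
    {
        "contract": ["12m", "12m", "3m", "3m"],
        "tenure_month": [12, 13, 3, 4],
        "active_at_start": [200, 180, 100, 90],
        "left": [20, 90, 10, 45],
    }
)
history["churn_rate"] = history["left"] / history["active_at_start"]

# Hypothetical current member mix, keyed by the tenure month each group
# will be in during the month being forecast.
current = pd.DataFrame(
    {
        "contract": ["12m", "3m"],
        "tenure_month": [13, 4],
        "members": [150, 60],
    }
)

# Look up the historic rate for each group and apply it to the head count.
forecast = current.merge(
    history[["contract", "tenure_month", "churn_rate"]],
    on=["contract", "tenure_month"],
    how="left",
)
forecast["expected_leavers"] = forecast["members"] * forecast["churn_rate"]
total_expected = forecast["expected_leavers"].sum()

print(forecast)
print(total_expected)
```

With a hundred or more contract permutations, the same merge still applies; groups with little history would show up as missing or noisy rates and may need pooling.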
I am confused about how to forecast future steps using MA. All the articles out there validate the model only on historical data that were OBSERVED. However, once we have validated that an MA model performs well on our training data, we need to set up a pipeline for future forecasts.
The problem is that for n-step ahead future forecast, all the data is observed only for the first forecast. What happens to the other n-1 steps? Here is an example.
Let's say we have a dataset from Jan 2021 to June 2022 and, based on our experiments, we noticed that a moving average over the last 3 values gives the lowest error for a 3-step-ahead forecast horizon.
Now we want to forecast for July, August and September. For July, we have already observed the prior 3 months' values, so we can take their mean and that is the forecast. However, the actual July value is missing for the August forecast. What happens here? Should we use the forecasted value for July and the actuals for June and May to find the value for August?
I am sorry if my question is trivial but I am trying to code it myself in python so I want to make sure I am doing it the right way.
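The idea raised in the question, feeding each forecast back in as if it were an observation, is commonly called recursive (or iterated) multi-step forecasting. A minimal sketch with made-up numbers, assuming a simple moving average over the last `window` values:

```python
def recursive_ma_forecast(history, window, steps):
    """Forecast `steps` values ahead with a simple moving average,
    appending each forecast to the series so later steps can use it."""
    values = list(history)
    forecasts = []
    for _ in range(steps):
        next_value = sum(values[-window:]) / window
        forecasts.append(next_value)
        values.append(next_value)  # the forecast stands in for the missing actual
    return forecasts


# Hypothetical observed values for March, April, May and June.
observed = [10.0, 12.0, 14.0, 16.0]

# July uses April-June actuals; August uses May, June and the July forecast;
# September uses June plus the July and August forecasts.
forecasts = recursive_ma_forecast(observed, window=3, steps=3)
print(forecasts)
```

Because later steps are built on earlier forecasts, errors can accumulate over the horizon, so it is worth validating the 2-step and 3-step forecasts in the same recursive way rather than only on observed inputs.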
I am pretty new to ML and completely new to creating my own models. I have gone through TensorFlow's time-series forecasting tutorial and other LSTM time series examples on how to predict with multivariate inputs. After trying multiple examples, I think I realized that this is not what I want to achieve.
My problem involves a dataset that is in hourly intervals and includes 4 different variables, one of which is the datetime, with the possibility of more in the future. I want to train a model with data that can range from one month to many years. I will then create an input set that includes a few of the variables used during training, with at least one of them missing, and the missing one is what I want the model to predict.
For example if I had 2 years worth of solar panel data and weather conditions from Jan-12-2016 to Jan-24-2018 I would like to train the model on that then have it predict the solar panel data from May-05-2021 to May-09-2021 given the weather conditions during that date range. So in my case I am not necessarily forecasting but using existing data to point at a certain day of any year given the conditions of that day at each hour. This would mean I should be able to go back in time as well.
Is this possible to achieve using TensorFlow, and if so, are there any resources or tutorials I should be looking at?
See Best practice for encoding datetime in machine learning.
Relevant variables for solar power seem to be hour-of-day and day-of-year. (I don't think separating into month and day-of-month is useful, as they are part of the same natural cycle; if you had data on the spending habits of people who get paid monthly, then it would make sense to model the month cycle, but the Sun doesn't do anything particular monthly.) Divide them by hours-in-day and days-in-year respectively to get them to [0,1), multiply by 2*PI to get to a circle, and create a sine column and a cosine column for each of them. This gives you four features that capture the cyclic nature of day and year.
CyclicalFeatures discussed in the link is a nice convenience, but you should still pre-process your day-of-year into a [0,1) interval manually, as otherwise you can't get it to handle leap years correctly - some years days-in-year are 365 and some years 366, and CyclicalFeatures only accepts a single max value per column.
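The manual encoding described above, including the per-year handling of leap years, might look like the following sketch. It uses only the standard library; the function name and feature names are made up for illustration.

```python
import calendar
import math
from datetime import datetime


def cyclical_time_features(ts):
    """Return sine/cosine encodings of hour-of-day and day-of-year."""
    # Fraction of the day in [0, 1).
    hour_frac = (ts.hour + ts.minute / 60) / 24

    # Fraction of the year in [0, 1), using that year's own length so
    # leap years (366 days) and common years (365 days) both wrap correctly.
    days_in_year = 366 if calendar.isleap(ts.year) else 365
    day_of_year = ts.timetuple().tm_yday  # 1-based
    year_frac = (day_of_year - 1) / days_in_year

    return {
        "hour_sin": math.sin(2 * math.pi * hour_frac),
        "hour_cos": math.cos(2 * math.pi * hour_frac),
        "year_sin": math.sin(2 * math.pi * year_frac),
        "year_cos": math.cos(2 * math.pi * year_frac),
    }


features = cyclical_time_features(datetime(2021, 1, 1, 6, 0))
print(features)
```

These four columns would replace the raw datetime column and be fed to the model alongside the weather variables.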
I have been working on predicting water usage on a weekly basis. I have the starting day of every week in one column and the water consumed in another. I want my model to predict in such a way that I give the input in date-time format, like 21-01-2021 (say), to the predict() function. Which model should I use, and how can I achieve this?
I've previously tried an ARIMA model from time series analysis.
Most ML/DL algorithms use floating-point input values. That said, and based on most of the datasets I've seen, you should do some data transformation and compute a time delta (you'd see something like TimeDT). That's done by setting a base date (the first date that appears in your training data) and computing each subsequent row's delta based on your criterion (seconds, hours or days elapsed, etc.).
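A minimal sketch of that time-delta transformation, assuming dates arrive as DD-MM-YYYY strings as in the question and that days elapsed is the chosen unit (the sample dates are made up):

```python
from datetime import datetime

# Hypothetical week-start dates in DD-MM-YYYY format.
dates = ["04-01-2021", "11-01-2021", "18-01-2021"]

parsed = [datetime.strptime(d, "%d-%m-%Y") for d in dates]

# The base date is the earliest date in the training data.
base = min(parsed)

# Numeric feature: days elapsed since the base date.
days_elapsed = [(d - base).days for d in parsed]
print(days_elapsed)
```

The same base date has to be reused at prediction time, so that a new input date maps onto the same numeric scale the model was trained on.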
TL;DR
As I understand it, you're computing based on the day of the week (please correct me if I'm wrong), so your time delta would be daily, restarting each week? The most appropriate approach in that case is based on the calendar: decompose the date and add two new features, week_of_year and day_of_week.
Is week_of_year important? Well, in summer there might be a tendency to consume more water; that's something your dataset can tell you.
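The calendar decomposition could be sketched like this, again assuming DD-MM-YYYY input strings. It uses the ISO calendar from the standard library, which is one convention among several for numbering weeks; the function name is made up.

```python
from datetime import datetime


def calendar_features(date_string):
    """Split a DD-MM-YYYY date into week_of_year and day_of_week (ISO numbering)."""
    d = datetime.strptime(date_string, "%d-%m-%Y")
    iso = d.isocalendar()  # (ISO year, ISO week number, ISO weekday)
    return {
        "week_of_year": iso[1],  # 1..53
        "day_of_week": iso[2],   # 1 = Monday .. 7 = Sunday
    }


features = calendar_features("21-01-2021")
print(features)
```

A wrapper around predict() could call a helper like this on the incoming date string and pass the resulting numeric features to the model.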
I have a data set which contains site usage behavior of users over a period of six months. It contains data about:
Number of pages viewed
Number of unique cookies associated with each user
Number of different OSes and browsers used
Number of different cities visited
Everything over here is collected on a six month time frame. I have used this data to train a model to predict a target variable 'y'. Everything is numeric in format.
Now I know that since it's six months of data, and the model is built on these six months of data, I can use it to predict the target variable y on the next six months of data.
My question is: if, instead of using it to predict on a six-month time frame, I use the model to predict on a monthly time frame, will it give me incorrect results?
My logic tells me yes. For example, I used tree methods such as decision trees and random forests, and these algorithms essentially learn thresholds to output "0/1". Now the variables I mentioned above, such as the number of cookies associated, OS, browser etc., would have different values when viewed over one month than when viewed over six months. For example, the number of unique cookies associated with a user would be lower when seen over a month and higher when seen over six months.
But I am confused as to whether the model will automatically adjust for these values when run on monthly data. Please help me understand whether I am thinking about this correctly, and provide a logical explanation if possible.
Thanks.
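The concern about thresholds can be illustrated with a toy example. A fitted tree applies the split values it learned during training and does not rescale its inputs, so a count feature aggregated over a shorter window lands on the other side of the split. The threshold and page counts below are made up, and the single rule stands in for one split of a real tree.

```python
# Hypothetical split learned from six-month aggregates:
# "more than 30 pages viewed in six months" -> class 1.
SIX_MONTH_THRESHOLD = 30


def predict(pages_viewed):
    """Apply the fixed threshold exactly as it was learned."""
    return 1 if pages_viewed > SIX_MONTH_THRESHOLD else 0


# The same user, viewing 10 pages every month.
monthly_pages = [10, 10, 10, 10, 10, 10]

pred_six_month = predict(sum(monthly_pages))  # feature built over six months
pred_one_month = predict(monthly_pages[0])    # feature built over one month

print(pred_six_month, pred_one_month)
```

The user's behaviour is identical in both cases; only the aggregation window changed, which is enough to flip the prediction.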
Is your minimum unit of measurement 6 months? I hope not, but if it is, then I would suggest that you don't try to predict the next 1 month.
Seasonality within a year aside, you would need daily volume measurements. I would be very worried about building anything on monthly or even weekly numbers.
In terms of modelling techniques, please stick to simple regression methods, as kungphu suggests.