Calculating EWMA for athlete training load data with Python

Good day all!
I am new to the programming world and I'm struggling with the following.
Training load is calculated by multiplying the session rating (sRPE) by the duration of the session in minutes. The acute load of the past 7 days is then compared to the chronic load of the past 28 days. Below is an example of such a table:
(Example table of training load calculations.)
The challenge comes when the EWMA (exponentially weighted moving average) must be calculated. The formula is EWMA_today = Load_today * lambda + (1 - lambda) * EWMA_yesterday, where lambda = 2 / (number of days + 1).
The part I'm struggling with is the EWMA of the previous day. In Excel, I could select the individual cell, or use a VLOOKUP on the athlete name and training date.
To pick up the previous EWMA, the athlete name and training date must be used to select the correct value for each athlete's workload ratio.
How can I do this with Python? I tried to use Pandas (not very well), but to no avail :(
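Not an authoritative answer, but here is a minimal pandas sketch of one way to do this. It assumes a long-format DataFrame with athlete, date and load columns (those names are placeholders for whatever your data uses); grouping by athlete before applying ewm() is what replaces the Excel cell reference / VLOOKUP to the previous day's EWMA:

```python
import pandas as pd

# Toy data standing in for the real table: one row per athlete per training day.
df = pd.DataFrame({
    "athlete": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "load": [300, 450, 0, 200, 500, 350],   # sRPE x duration in minutes
})
df = df.sort_values(["athlete", "date"])

def ewma(series, n_days):
    # lambda = 2 / (n_days + 1); adjust=False gives the recursive form
    # EWMA_today = load_today * lambda + (1 - lambda) * EWMA_yesterday
    return series.ewm(alpha=2 / (n_days + 1), adjust=False).mean()

# Compute per athlete so "yesterday's EWMA" always belongs to the same athlete.
df["ewma_acute"] = df.groupby("athlete")["load"].transform(lambda s: ewma(s, 7))
df["ewma_chronic"] = df.groupby("athlete")["load"].transform(lambda s: ewma(s, 28))
df["acwr"] = df["ewma_acute"] / df["ewma_chronic"]
print(df)
```

If athletes have rest days with no rows, you may want to reindex each athlete to a daily calendar and fill missing loads with 0 first, so the decay runs over calendar days rather than over training sessions only.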

Related

What is window_size in time series, and what are the advantages and disadvantages of a small vs. large window_size?

I am quite a beginner in machine learning. I have tried hard to understand this concept, but I could not find an explanation on Google that I understood.
Please explain it in simple words and in detail.
This question is better suited to Stack Exchange, as it is not a specific coding question.
Window size is the duration of observations that you ask an algorithm to consider when learning a time series. For example, if you need to predict tomorrow's temperature and you use a window of 5 days, the algorithm will divide your entire time series into segments of 6 days (5 training days and 1 prediction day) and try to learn how to use only 5 days of data to predict the next day, based on the historical records.
Advantage of a short window:
You get more samples out of the time series, so your estimation of short-term effects is more reliable (a 100-day historical time series provides around 95 samples with a 5-day window, so the model is more certain about the influence of the past 5 days on the next day's temperature).
Advantage of a long window:
Long windows allow you to better learn seasonal and trend effects (think of events that happen yearly, monthly, etc.). If your window is small, say 5 days, your model will not learn any seasonal effect that occurs monthly. However, if your window is 60 days, then every sample of data that you feed to the model will contain at least 2 occurrences of the monthly seasonal effect, which enables your model to learn such seasonality.
The downside of a long window is that the number of samples decreases. Assuming a 100-day time series, a 60-day window will only yield 40 samples of data. This means every parameter of your model is now fitted on a much smaller sample of data, which may reduce the reliability of the model.
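A small illustrative sketch of that sample-count trade-off (the series and window sizes below are made up; any univariate series behaves the same way):

```python
import numpy as np

# 100 "days" of synthetic observations, e.g. temperatures.
series = np.random.rand(100)

def make_samples(series, window_size):
    X, y = [], []
    for i in range(len(series) - window_size):
        X.append(series[i:i + window_size])   # the window of past observations
        y.append(series[i + window_size])     # the next value to predict
    return np.array(X), np.array(y)

for w in (5, 60):
    X, y = make_samples(series, w)
    print(f"window={w:>2}: {len(X)} training samples")  # 95 and 40 respectively
```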
"window size" typically refers to the number of time periods that are used to calculate a statistic or model.
Advantages and Disadvantages of various window sizes relate to the balance between:
the sensitivity to changes in the data vs susceptibility to noise & outliers
If you have ever dealt with moving average indicators on the stock market, you will understand that each window size has a purpose, and these different window sizes are often used in combination to get a more holistic view/understanding. eg. MA20 vs MA50 vs MA100. Each of these indicators are using a different window size to calculate the moving average of the stock of interest.
Image Source: Yahoo Finance
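For illustration, here is roughly how such indicators are computed with pandas; the price series below is synthetic, not real market data:

```python
import numpy as np
import pandas as pd

# Synthetic "closing price" series just to show the effect of window size.
prices = pd.Series(100 + np.cumsum(np.random.randn(300)))

ma = pd.DataFrame({
    "MA20": prices.rolling(20).mean(),    # short window: reacts quickly, noisier
    "MA50": prices.rolling(50).mean(),    # medium window
    "MA100": prices.rolling(100).mean(),  # long window: smooth, slow to react
})
print(ma.dropna().tail())
```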

What model should I use to predict gym leavers based on recent gym joiners? Time series vs. multiple linear regression

I work in the gym space, and I'm trying to predict the numbers of gym leavers we will see next month, the following month etc.
The number of leavers is directly impacted by the number of joiners we had 13 months ago (for a 12-month contract) or 4 months ago (for a 3-month contract), since members need to give a month's notice.
There is some seasonality in Jan/Sept, but ultimately the type and length of contract a member joins is the biggest contributor to how long they are likely to stay.
We have over a hundred permutations of contract types and lengths.
What is the best way to model this in Python, and with which methods?
I've created a proof-of-concept model in Excel that looks at historic churn rates by month of tenure (month 1/2/3, etc.) for each contract and applies them to our current member mix and their tenure to predict how many will leave this month. It's extremely messy, spread across lots of worksheets, but outside of irregular macro events it is very accurate at predicting leavers within the next month.
I've tried a linear regression of this month's leaver volume against the joiners in t-1, t-2, ... t-64, but it spits out a bunch of coefficients that don't produce any reasonable numbers; some are positive and some negative. I thought that over a long enough period the numbers of joiners could be used to estimate leavers.
I've considered a time series approach next, but I struggle to understand how to set up the data for that, as I have so many contract mixes. In a way, I need to look at the data and say: this person is on this contract, has been with us X months, and therefore has this chance of leaving.
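No full answer was given here, but a rough pandas sketch of the "churn rate by contract and tenure month" idea from the Excel proof of concept might look like the following; every column name and number below is an illustrative assumption, not real data:

```python
import pandas as pd

# Historic records: did a member in a given (contract, tenure) cell leave next month?
history = pd.DataFrame({
    "contract": ["12m", "12m", "12m", "3m", "3m", "3m"],
    "tenure_months": [13, 13, 5, 4, 4, 2],
    "left_next_month": [1, 0, 0, 1, 1, 0],
})
# Historic churn rate for each (contract, tenure month) combination.
churn_rate = history.groupby(["contract", "tenure_months"])["left_next_month"].mean()

# Current member mix: how many members sit in each (contract, tenure) cell today.
current = pd.DataFrame({
    "contract": ["12m", "3m", "3m"],
    "tenure_months": [13, 4, 2],
    "members": [120, 80, 40],
})
mix = current.set_index(["contract", "tenure_months"])["members"]

# Expected leavers next month = member count x churn rate, summed over cells.
expected_leavers = (mix * churn_rate.reindex(mix.index).fillna(0)).sum()
print(expected_leavers)
```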

Create an ML model with TensorFlow that predicts values for any given time range at hourly intervals

I am pretty new to ML and completely new to creating my own models. I have gone through TensorFlow's time-series forecasting tutorial and other LSTM time-series examples on predicting with multivariate inputs. After trying multiple examples, I realized that this is not what I want to achieve.
My problem involves a dataset in hourly intervals that includes 4 different variables, with the possibility of more in the future; one column is the datetime. I want to train a model on data that can range from one month to many years. I will then create an input set containing a few of the variables used during training, with at least one of them missing, which is what I want the model to predict.
For example, if I had 2 years' worth of solar panel data and weather conditions from Jan-12-2016 to Jan-24-2018, I would like to train the model on that and then have it predict the solar panel data from May-05-2021 to May-09-2021, given the weather conditions during that date range. So in my case I am not necessarily forecasting, but using existing data to point at a certain day of any year given the conditions of that day at each hour. This means I should be able to go back in time as well.
Is this possible to achieve using tensorflow and If so are there any resources or tutorials I should be looking at?
See Best practice for encoding datetime in machine learning.
Relevant variables for solar power appear to be hour-of-day and day-of-year. (I don't think separating into month and day-of-month is useful, as they are part of the same natural cycle; if you had data on the spending habits of people who get paid monthly, then modelling the monthly cycle would make sense, but the Sun doesn't do anything in particular monthly.) Divide them by hours-in-day and days-in-year respectively to get them into [0,1), multiply by 2*PI to get onto a circle, and create a sine column and a cosine column for each of them. This gives you four features that capture the cyclic nature of day and year.
CyclicalFeatures discussed in the link is a nice convenience, but you should still pre-process your day-of-year into the [0,1) interval manually, as otherwise you can't get it to handle leap years correctly: in some years days-in-year is 365 and in others 366, and CyclicalFeatures only accepts a single max value per column.
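A minimal sketch of that encoding, assuming an hourly DatetimeIndex (the date range below mirrors the example dates in the question; adapt the column layout to your own data):

```python
import numpy as np
import pandas as pd

# Hourly index covering the training period.
idx = pd.date_range("2016-01-12", "2018-01-24", freq="h")
df = pd.DataFrame(index=idx)

# Scale hour-of-day and day-of-year into [0, 1), handling leap years explicitly.
hour_frac = df.index.hour / 24
days_in_year = np.where(df.index.is_leap_year, 366, 365)
doy_frac = (df.index.dayofyear - 1) / days_in_year

# Map each fraction onto a circle: one sine and one cosine column per cycle.
df["hour_sin"] = np.sin(2 * np.pi * hour_frac)
df["hour_cos"] = np.cos(2 * np.pi * hour_frac)
df["doy_sin"] = np.sin(2 * np.pi * doy_frac)
df["doy_cos"] = np.cos(2 * np.pi * doy_frac)
print(df.head())
```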

How to build a prediction model that takes input in datetime format?

(Screenshot of the dataset.)
I have been working on predicting water usage on a weekly basis. I have the starting day of every week in one column and the water consumed in another. I want the model to take its input in datetime format, e.g. 21-01-2021, in the predict() function. Which model should I use, and how can I achieve this?
I've previously tried an ARIMA model for time-series analysis.
Most ML/DL algorithms expect floating-point input values. Given that, and based on most of the datasets I've seen, you should transform the data and compute a time delta (you'll often see a column like TimeDT). That is done by setting a base date (the first date that appears in your training data) and computing each row's delta from it in your chosen unit (seconds, hours, days elapsed, etc.).
TL;DR
As I understand it, you're computing on a day-of-the-week basis (correct me if I'm wrong), so your time delta would be daily, restarting each week? The most appropriate approach in that case is calendar-based: decompose the date and add two new features, week_of_year and day_of_week.
Is week_of_year important? Well, in summer there might be a tendency to consume more water; that's something your dataset can tell you.
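A short sketch of both suggestions, assuming a DataFrame with a week_start date column and a consumption column (placeholder names):

```python
import pandas as pd

# Toy weekly data; replace with your own columns.
df = pd.DataFrame({
    "week_start": pd.to_datetime(["2021-01-04", "2021-01-11", "2021-01-18"]),
    "consumption": [1200, 1340, 1180],
})

# 1) Numeric time delta from the first date in the training data (in days).
base_date = df["week_start"].min()
df["time_delta_days"] = (df["week_start"] - base_date).dt.days

# 2) Calendar decomposition: week of year and day of week as features.
df["week_of_year"] = df["week_start"].dt.isocalendar().week.astype(int)
df["day_of_week"] = df["week_start"].dt.dayofweek
print(df)
```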

XGBoost with a simple time-series dataframe in Python

I have this dataframe with the hourly production of a plant. There are many 0 values in the dataframe (sometimes the plant doesn't run for external reasons).
If anyone wants to have a glance at the dataframe, here is the link (https://www.mediafire.com/file/55915xs2acdl7h4/Dataframe.zip/file).
I would like to make a prediction for the next 24 hours, but I'm having a lot of trouble with XGBoost. After splitting the data (X_train/y_train; X_test/y_test), the algorithm gets stuck.
1) Should I change the algorithm?
2) Am I missing some parameter-tuning steps?
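One common way to get XGBoost working on a series like this is to build lag features and hold out the last 24 hours as the test set. The sketch below assumes a production column and a CSV exported from the linked dataframe; the file name, column names and parameters are guesses, not the actual data:

```python
import pandas as pd
import xgboost as xgb

# Hourly production data with a datetime index (adjust path and column names).
df = pd.read_csv("dataframe.csv", parse_dates=["datetime"], index_col="datetime")

# Turn the series into a supervised problem: past hours become predictors.
for lag in (1, 2, 3, 24, 48):
    df[f"lag_{lag}"] = df["production"].shift(lag)
df = df.dropna()

X, y = df.drop(columns="production"), df["production"]
split = len(df) - 24                               # hold out the last 24 hours
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = xgb.XGBRegressor(n_estimators=300, learning_rate=0.05)
model.fit(X_train, y_train)
pred = model.predict(X_test)
```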
