Exponential Smoothing with alpha and beta greater than one

Exponential Smoothing with alpha and beta greater than one - python

I have the following time series
year value
2001-01-01 433.0
2002-01-01 445.0
2003-01-01 406.0
2004-01-01 416.0
2005-01-01 432.0
2006-01-01 458.0
2007-01-01 418.0
2008-01-01 392.0
2009-01-01 464.0
2010-01-01 434.0
2012-01-01 435.0
2013-01-01 437.0
2014-01-01 465.0
2015-01-01 442.0
2016-01-01 456.0
2017-01-01 448.0
2018-01-01 433.0
2019-01-01 399.0
that I want to fit with an Exponential Smoothing model. I define my model the following way:
model = ExponentialSmoothing(dataframe, missing='drop', trend='mul', seasonal_periods=5,
seasonal='add',initialization_method="heuristic")
model = model.fit(optimized=True, method="basinhopping")
where I let the algorithm to optimize the values of smoothing_level=$\alpha$, smoothing_trending=$\beta$, smoothing_seasonal=$\gamma$ and damping_trend=$\phi$.
However, when I print the results for this specific case, i get: $\alpha=1.49$, $\beta=1.41$, $\gamma=0.0$ and $\phi=0.0$.
Could someone explain me what's happening here?
Are these values of $\alpha$ and $\beta$ greater than 1 acceptable?

I think you're misinterpreting the results. We can run your model as follows:
data = [
433.0, 445.0, 406.0, 416.0, 432.0, 458.0,
418.0, 392.0, 464.0, 434.0, 435.0, 437.0,
465.0, 442.0, 456.0, 448.0, 433.0, 399.0]
model = sm.tsa.ExponentialSmoothing(data, missing='drop', trend='mul', seasonal_periods=5,
seasonal='add',initialization_method="heuristic")
res = model.fit(optimized=True, method="basinhopping")
print(res.params['smoothing_level'])
print(res.params['smoothing_trend'])
which gives me:
1.4901161193847656e-08
1.4873988732462211e-08
Notice the e-08 part - the first parameter isn't equal to 1.49, it's equal to 0.0000000149.

Related

Python: Linear Regression of Subsets of Columns

I have a dataset of monthly stock returns that I am trying to regress on market returns by quarter and by company ID (PERMNO). Here's what the data looks like:
date PERMNO MCAP FIRMRF MKTRF qtr
0 2018-01-02 10026 2.784892e+06 -0.017514 0.0085 1
7339 2018-01-03 10026 2.757077e+06 -0.010048 0.0059 1
14671 2018-01-04 10026 2.795160e+06 0.013753 0.0042 1
22003 2018-01-05 10026 2.768464e+06 -0.009610 0.0066 1
29334 2018-01-08 10026 2.770518e+06 0.000682 0.0019 1
... ... ... ... ... ... ...
8455011 2022-03-25 93436 1.044531e+09 -0.003235 0.0027 1
8464495 2022-03-28 93436 1.128454e+09 0.080345 0.0073 1
8473980 2022-03-29 93436 1.136443e+09 0.007080 0.0145 1
8483469 2022-03-30 93436 1.130676e+09 -0.005075 -0.0083 1
8492959 2022-03-31 93436 1.113736e+09 -0.014982 -0.0155 1
The goal is to have the slope (beta) and standard error of each firm, each quarter, stored as values in the same dataframe (the quarterly regression values would be repeating for each line in a given quarter).
I've been scouring stackoverflow the past few days and have tried to repurpose a bunch of different answers here, but to no avail. So far, I'm assuming that it will need to look something like:
for i in daily['qtr']:
for x in daily['PERMNO']:
reg = sm.OLS(daily['FIRMRF'], sm.add_constant(daily['MKTRF']))
results = reg.fit()
Any help, hints or advice is appreciated!

After a few days of experimenting I've found a solution to the problem. I was able to define a function that contains the OLS regression and then apply it via a groupby function:
#Pull beta estimates
#Def Regression func
def regress(data, yvar, xvars):
Y = data[yvar]
X = sm.add_constant(data[xvars])
result = sm.OLS(Y, X).fit()
return result.params
#regression within group
params1 = data.groupby(['PERMNO', 'year', 'qtr']).apply(regress, 'FIRMRF', 'MKTRF')
params_raw = pd.DataFrame(params1)
params_raw.head(10)
Hopefully this will be useful to someone else!

Algorithm for predict(start=start_date, end=end_date) on unique same named weather station

Story that pertain to a new design solution
Goal is to use weather data to run ARIMA model fit on each group of like named 'stations' with their associated precipitation data, then finally execute a 30 day forward forecast. Looking to process specific same named stations and then next process the next unique same named stations, etc.
The algorithm to add question
How to write algorithm to run ARIMA model for each UNIQUE 'station' and, perhaps grouping stations to be unique groups to run ARIMA model on the group, and then fit for a 30 day forward forecast? The ARIMA(2,1,1) is a working order terms from auto.arima().
How to write a group algorithm for same named 'stations' before running the ARIMA model, fit, forecast? Or what other approach would achieve a set of like named stations to process specific same named stations and then move unto the next unique same named stations.
Working code executes but needs broader algorithm
Code was working, but on last run, predict(start=start_date, end=end_date) issued a key error. Removed NA, so this may fix the predict(start, end)
wd.weather_data = wd.weather_data[wd.weather_data['date'].notna()]
forecast_models = [50000]
n = 1
df_all_stations = data_prcp.drop(['level_0', 'index', 'prcp'], axis=1)
wd.weather_data.sort_values("date", axis = 0, ascending = True, inplace = True)
for station_name in wd.weather_data['station']:
start_date = pd.to_datetime(wd.weather_data['date'])
number_of_days = 31
end_date = pd.to_datetime(start_date) + pd.DateOffset(days=30)
model = statsmodels.tsa.arima_model.ARIMA(wd.weather_data['prcp'], order=(2,1,1))
model_fit = model.fit()
forecast = model_fit.predict(start=start_date, end=end_date)
forecast_models.append(forecast)
Data Source
<bound method NDFrame.head of station date tavg tmin tmax prcp snow
0 Anchorage, AK 2018-01-01 -4.166667 -8.033333 -0.30 0.3 80.0
35328 Grand Forks, ND 2018-01-01 -14.900000 -23.300000 -6.70 0.0 0.0
86016 Key West, FL 2018-01-01 20.700000 16.100000 25.60 0.0 0.0
59904 Wilmington, NC 2018-01-01 -2.500000 -7.100000 0.00 0.0 0.0
66048 State College, PA 2018-01-01 -13.500000 -17.000000 -10.00 4.5 0.0
... ... ... ... ... ... ... ...
151850 Kansas City, MO 2022-03-30 9.550000 3.700000 16.55 21.1 0.0
151889 Springfield, MO 2022-03-30 12.400000 4.500000 17.10 48.9 0.0
151890 St. Louis, MO 2022-03-30 14.800000 8.000000 17.60 24.9 0.0
151891 State College, PA 2022-03-30 0.400000 -5.200000 6.20 0.2 0.0
151899 Wilmington, NC 2022-03-30 14.400000 6.200000 20.20 0.0 0.0
wdir wspd pres
0 143.0 5.766667 995.133333
35328 172.0 33.800000 1019.200000
86016 4.0 13.000000 1019.900000
59904 200.0 21.600000 1017.000000
66048 243.0 12.700000 1015.200000
... ... ... ...
151850 294.5 24.400000 998.000000
151889 227.0 19.700000 997.000000
151890 204.0 20.300000 996.400000
151891 129.0 10.800000 1020.400000
151899 154.0 16.400000 1021.900000
Error
KeyError: 'The `start` argument could not be matched to a location related to the index of the data.'

How to use Prophet's make_future_dataframe with multiple regressors?

make_future_dataframe seems to only produce a dataframe with date (ds) values, which in turn results in ValueError: Regressor 'var' missing from dataframe when attempting to generate forecasts when using the code below.
m = Prophet()
m.add_country_holidays(country_name='US')
m.add_regressor('var')
m.fit(df)
forecasts = m.predict(m.make_future_dataframe(periods=7))
Looking through the python docs, there doesn't seem to be any mention of how to combat this issue using Prophet. Is my only option to write additional code to lag all regressors by the period for which I want to generate forecasts (ex. take var at t-7 to produce a 7 day daily forecast)?

The issue here is that the future = m.make_future_dataframe method creates a dataset future where the only column is the ds date column. In order to predict using a model with regressors you also need columns for each regressor in the future dataset.
Using my original training data which I called regression_data, I solved this by predicting the values for the regressor variables and then filling those into a future_w_regressors dataset which was a merge of future and regression_data.
Assume you have a trained model model ready.
# List of regressors
regressors = ['Total Minutes','Sent Emails','Banner Active']
# My data is weekly so I project out 1 year (52 weeks), this is what I want to forecast
future = model.make_future_dataframe(52, freq='W')
at this point if you run model.predict(future) you will get the error you've been getting. What we need to do is incorporate the regressors. . I merge regression_data with future so that the observations from the past are filled. As you can see, the observations looking forward are empty (towards the end of the table)
# regression_data is the dataframe I used to train the model (include all covariates)
# merge the data you used to train the model
future_w_regressors = regression_data[regressors+['ds']].merge(future, how='outer', on='ds')
future_w_regressors
Total Minutes Sent Emails Banner Active ds
0 7.129552 9.241493e-03 0.0 2018-01-07
1 7.157242 8.629305e-14 0.0 2018-01-14
2 7.155367 8.629305e-14 0.0 2018-01-21
3 7.164352 8.629305e-14 0.0 2018-01-28
4 7.165526 8.629305e-14 0.0 2018-02-04
... ... ... ... ...
283 NaN NaN NaN 2023-06-11
284 NaN NaN NaN 2023-06-18
285 NaN NaN NaN 2023-06-25
286 NaN NaN NaN 2023-07-02
287 NaN NaN NaN 2023-07-09
Solution 1: Predict Regressors
For the next step I create a dataset with only the empty regressor values in it, loop through each regressor, train a naive prophet model on each, predict their values for the future date, fill those values into the empty regressors dataset and place those values into the future_w_regressors dataset.
# Get the segment for which we have no regressor values
empty_future = future_w_regressors[future_w_regressors[regressors[0]].isnull()]
only_future = empty_future[['ds']]
# Create a dictionary to hold the different independent variable forecasts
for regressor in regressors:
# Prep a new training dataset
train = regression_data[['ds',regressor]]
train.columns = ['ds','y'] # rename the variables so they can be submitted to the prophet model
# Train a model for this regressor
rmodel = Prophet()
rmodel.weekly_seasonality = False # this is specific to my case
rmodel.fit(train)
regressor_predictions = rmodel.predict(only_future)
# Replace the empty values in the empty dataset with the predicted values from the regressor model
empty_future[regressor] = regressor_predictions['yhat'].values
# Fill in the values for all regressors in the future_w_regressors dataset
future_w_regressors.loc[future_w_regressors[regressors[0]].isnull(), regressors] = empty_future[regressors].values
Now the future_w_regressors table no longer has missing values
future_w_regressors
Total Minutes Sent Emails Banner Active ds
0 7.129552 9.241493e-03 0.000000 2018-01-07
1 7.157242 8.629305e-14 0.000000 2018-01-14
2 7.155367 8.629305e-14 0.000000 2018-01-21
3 7.164352 8.629305e-14 0.000000 2018-01-28
4 7.165526 8.629305e-14 0.000000 2018-02-04
... ... ... ... ...
283 7.161023 -1.114906e-02 0.548577 2023-06-11
284 7.156832 -1.138025e-02 0.404318 2023-06-18
285 7.150829 -5.642398e-03 0.465311 2023-06-25
286 7.146200 -2.989316e-04 0.699624 2023-07-02
287 7.145258 1.568782e-03 0.962070 2023-07-09
And I can run the predict command to get my forecasts which now extend into 2023 (original data ended in 2022):
model.predict(future_w_regressors)
ds trend yhat_lower yhat_upper trend_lower trend_upper Banner Active Banner Active_lower Banner Active_upper Sent Emails Sent Emails_lower Sent Emails_upper Total Minutes Total Minutes_lower Total Minutes_upper additive_terms additive_terms_lower additive_terms_upper extra_regressors_additive extra_regressors_additive_lower extra_regressors_additive_upper yearly yearly_lower yearly_upper multiplicative_terms multiplicative_terms_lower multiplicative_terms_upper yhat
0 2018-01-07 2.118724 2.159304 2.373065 2.118724 2.118724 0.000000 0.000000 0.000000 3.681765e-04 3.681765e-04 3.681765e-04 0.076736 0.076736 0.076736 0.152302 0.152302 0.152302 0.077104 0.077104 0.077104 0.075198 0.075198 0.075198 0.0 0.0 0.0 2.271026
1 2018-01-14 2.119545 2.109899 2.327498 2.119545 2.119545 0.000000 0.000000 0.000000 3.437872e-15 3.437872e-15 3.437872e-15 0.077034 0.077034 0.077034 0.098945 0.098945 0.098945 0.077034 0.077034 0.077034 0.021911 0.021911 0.021911 0.0 0.0 0.0 2.218490
2 2018-01-21 2.120366 2.074524 2.293829 2.120366 2.120366 0.000000 0.000000 0.000000 3.437872e-15 3.437872e-15 3.437872e-15 0.077014 0.077014 0.077014 0.064139 0.064139 0.064139 0.077014 0.077014 0.077014 -0.012874 -0.012874 -0.012874 0.0 0.0 0.0 2.184506
3 2018-01-28 2.121187 2.069461 2.279815 2.121187 2.121187 0.000000 0.000000 0.000000 3.437872e-15 3.437872e-15 3.437872e-15 0.077110 0.077110 0.077110 0.050180 0.050180 0.050180 0.077110 0.077110 0.077110 -0.026931 -0.026931 -0.026931 0.0 0.0 0.0 2.171367
4 2018-02-04 2.122009 2.063122 2.271638 2.122009 2.122009 0.000000 0.000000 0.000000 3.437872e-15 3.437872e-15 3.437872e-15 0.077123 0.077123 0.077123 0.046624 0.046624 0.046624 0.077123 0.077123 0.077123 -0.030498 -0.030498 -0.030498 0.0 0.0 0.0 2.168633
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
283 2023-06-11 2.062645 2.022276 2.238241 2.045284 2.078576 0.025237 0.025237 0.025237 -4.441732e-04 -4.441732e-04 -4.441732e-04 0.077074 0.077074 0.077074 0.070976 0.070976 0.070976 0.101867 0.101867 0.101867 -0.030891 -0.030891 -0.030891 0.0 0.0 0.0 2.133621
284 2023-06-18 2.061211 1.975744 2.199376 2.043279 2.077973 0.018600 0.018600 0.018600 -4.533835e-04 -4.533835e-04 -4.533835e-04 0.077029 0.077029 0.077029 0.025293 0.025293 0.025293 0.095176 0.095176 0.095176 -0.069883 -0.069883 -0.069883 0.0 0.0 0.0 2.086504
285 2023-06-25 2.059778 1.951075 2.162531 2.041192 2.077091 0.021406 0.021406 0.021406 -2.247903e-04 -2.247903e-04 -2.247903e-04 0.076965 0.076965 0.076965 0.002630 0.002630 0.002630 0.098146 0.098146 0.098146 -0.095516 -0.095516 -0.095516 0.0 0.0 0.0 2.062408
286 2023-07-02 2.058344 1.953027 2.177666 2.039228 2.076373 0.032185 0.032185 0.032185 -1.190929e-05 -1.190929e-05 -1.190929e-05 0.076915 0.076915 0.076915 0.006746 0.006746 0.006746 0.109088 0.109088 0.109088 -0.102342 -0.102342 -0.102342 0.0 0.0 0.0 2.065090
287 2023-07-09 2.056911 1.987989 2.206830 2.037272 2.075110 0.044259 0.044259 0.044259 6.249949e-05 6.249949e-05 6.249949e-05 0.076905 0.076905 0.076905 0.039813 0.039813 0.039813 0.121226 0.121226 0.121226 -0.081414 -0.081414 -0.081414 0.0 0.0 0.0 2.096724
288 rows × 28 columns
Note that I trained the model for each regressor naively. However, you could optimize prediction for those independent variables if you wanted to.
Solution 2: Use last year's regressor values
Alternatively, you could just say that you don't want to compound the uncertainty of regressor forecasts on your main forecast and just want an idea of how forecasts might change for different values of the regressor. In that case you might just copy the regressor values from the last year into the missing future_w_regressors dataset. This has the added benefit of easily simulating drops or increases relative to current regressor levels:
from datetime import timedelta
last_date = regression_data.iloc[-1]['ds']
one_year_ago = last_date - timedelta(days=365) # works with data at any scale
last_year_of_regressors = regression_data.loc[regression_data['ds']>one_year_ago, regressors]
# If you want to simulate a 10% drop in levels compared to this year
last_year_of_regressors = last_year_of_regressors * 0.9
future_w_regressors.loc[future_w_regressors[regressors[0]].isnull(), regressors] = last_year_of_regressors.iloc[:len(future_w_regressors[future_w_regressors[regressors[0]].isnull()])].values

The sum() function does not work well when adding zeros in pandas with python

I try sum all rows of dataframe with the following code:
df = pd.read_csv('dataset.csv', parse_dates =["date"], index_col ="date")
df.fillna(0, inplace=True)
df.head(10)
Result
date CALIFORNIA-CENTRAL CALIFORNIA-SOUTH Ciudad Guzman La Barca Zamora Ensenada
2014-01-01 0.0 34.778435 7.157517 20.216307 5.405574 10.788031
2014-02-01 0.0 43.540938 9.275431 26.198327 7.005088 13.980217
2014-03-01 0.0 36.032682 10.508932 29.682333 7.936665 15.839388
2014-04-01 0.0 51.948925 7.894117 22.296823 5.961877 11.898257
2014-05-01 0.0 24.522135 12.399953 35.023496 9.364822 18.689594
2014-06-01 0.0 66.884840 5.440356 15.366209 4.108722 8.199873
2014-07-01 0.0 43.974770 9.204158 25.997018 6.951260 13.872793
2014-08-01 0.0 51.296811 8.001250 22.599420 6.042787 12.059732
2014-09-01 0.0 30.358093 11.441187 32.315475 8.640733 17.244512
2014-10-01 0.0 45.377776 8.973664 25.345991 6.777184 13.525385
Then I make a resample weekly beginning by Monday and apply sum() function:
weekly = df.resample('W-MON').sum()
weekly.head(10)
Result:
date CALIFORNIA-CENTRAL CALIFORNIA-SOUTH Ciudad Guzman La Barca Zamora Ensenada
2014-01-05 0.161442 164.688463 35.074387 99.067123 26.489245 52.865205
2014-01-12 360.218088 198.894117 23.145886 65.375235 17.480477 34.886197
2014-01-19 11.395859 330.487743 58.833493 166.174392 44.432845 88.675667
2014-01-26 0.000000 295.610943 66.435441 187.645989 50.174067 100.133559
2014-02-02 0.000000 315.052247 63.241508 178.624769 47.761912 95.319564
2014-02-09 214.260106 227.217402 42.471614 119.960488 32.075855 64.014534
2014-02-16 141.873044 255.491621 49.718734 140.429880 37.549101 74.937619
2014-02-23 0.000000 330.671191 60.675535 171.377213 45.824011 91.452049
2014-03-02 0.000000 269.364174 70.747417 199.825104 53.430602 106.632702
2014-03-09 207.773431 238.926975 41.613565 117.536941 31.427831 62.721257
And if you can see, the sum in column 'CALIFORNIA-CENTRAL', it is wrong, it should be 0 in the first rows
My question is, what is the problem? Why can't the sum() function add zeros?
I very well checked my dataset.csv file and it is correct, I rebooted the kernel, cleared my cache and checked the type column (floating)
What am I doing wrong?

How to access last year values to compare year on year ? datetime index

I have a dataframe of 2 years worth of dates and revenue on given day,
import pandas as pd
import datetime
import itertools
import time
import plotly.graph_objects as go
startDate = datetime.date(2018,1,1)
endDate = datetime.date(2019,12,31)
date_range = pd.date_range(start=startDate, end=endDate)
performance_df = pd.DataFrame(date_range)
performance_df.columns = ['Date']
performance_df = performance_df.set_index(['Date'])
performance_df['Revenue'] = [25891.84678700861, 25851.615540667623, 25037.71189951304, 26715.764965288607, 23988.35694961679, 19029.057983049668, 16935.481705163278, 22756.072913397493, 30385.672828716073, 32970.13217533639, 31089.167074855934, 24262.972414940097, 18261.273831731618, 18304.754083985797, 26297.835664941533, 32619.66940484445, 35565.26222544722, 33229.97193979324, 25405.647135516112, 19980.890374561146, 20487.553160719217, 29709.0323217462, 38164.493647661984, 39050.80114673623, 36612.554432511824, 28169.78252364524, 22086.641617812107, 21631.662705640312, 28419.94529036299, 35644.61736420142, 35829.065860994495, 32907.079142030154, 25951.247521574016, 22888.00983435945, 22582.648252027546, 30024.92542296243, 37891.251492167445, 39065.307017542225, 35326.30407697067, 28447.88908662872, 25042.500493029664, 26403.83421252776, 32380.475740928025, 36605.55089473326, 36006.56039455697, 35189.153100968004, 29780.77465095395, 24909.218739056418, 23685.75537938559, 31839.08994457272, 39061.7208522828, 39973.50309446715, 36623.93766798115, 28038.08152342491, 23004.30712890111, 22349.30571082852, 29259.27790736973, 36562.99112657728, 34942.83314919648, 30908.429691071553, 25025.293504644822, 22417.499234687977, 23235.05923247665, 30142.36658055089, 37322.22656885001, 39533.654081050176, 38817.55694852113, 32902.066818425184, 27289.215659267025, 26836.10240333383, 32714.6554385672, 40479.10841583944, 43032.710867507936, 42172.617851188385, 34581.509020848, 28796.31836571319, 28742.324093757332, 34514.108362569794, 41762.5838531726, 43526.12116978522, 39641.50434996709, 30279.354030113776, 23901.27040606382, 24482.3224100694, 34144.561683174434, 40900.767127300045, 41325.58696351466, 32836.06314047833, 28141.78555667737, 30152.48501882366, 31302.601421675394, 30866.3243266088, 35869.875330241855, 40580.31733582241, 41993.5864357607, 39319.78415151001, 35718.297902676924, 36181.313810975684, 41230.810723606075, 46047.43448330563, 47857.354050289476, 45322.56791751129, 37822.96403899934, 31215.19286619295, 29918.90585181318, 35176.29194324105, 43574.87496458186, 46034.04274563455, 43441.151399711365, 34723.430758204755, 27862.663129803153, 28567.807404056617, 37133.90907964683, 43009.01499641166, 45626.712492547886, 43142.632484355025, 33862.51562817326, 27397.539418702294, 27826.66395150021, 33130.40536093521, 39810.27565167983, 43719.59292308625, 41154.292760403885, 35925.79865016479, 32567.778590584392, 33297.526224142544, 32947.89653815421, 42798.953783336874, 47614.60147123756, 45254.62604676405, 38343.504556057334, 33179.627837955115, 33582.52982828824, 42105.00412410267, 48043.66651093638, 48837.40726130669, 48430.69364401822, 41076.68912572308, 34892.890461267096, 34935.159059467, 43160.00734032636, 52344.94145539043, 53619.43675580939, 52103.10220317212, 46364.68259720105, 43350.64074445112, 43840.939241180975, 41388.2098953964, 50023.86997334399, 53997.492172694445, 51246.8781429738, 44896.933023168574, 39424.66568855407, 39201.81892657556, 48299.32823370456, 57482.470637739105, 59523.110347325885, 55947.08067737157, 46045.719863431106, 38883.154337118904, 39475.701405236185, 49427.033216828415, 60480.43065506853, 61135.583867704656, 56363.832578283495, 46645.212577169856, 38947.471275125194, 39329.754583734706, 50567.536867536335, 60653.82734696712, 63896.50017170786, 59898.432410040055, 48184.54638977423, 41190.50886536953, 41376.579389025275, 51676.40473294583, 60212.103940879424, 61073.542419917525, 57551.1469742174, 46747.318316404155, 40323.9814761604, 41029.31091363546, 49818.65317184012, 58397.91950877408, 61499.7188209775, 58381.773792521184, 51645.68301028936, 46686.85877971279, 46284.20382322595, 53168.20709487424, 59629.08096569072, 61559.59693441008, 58214.20592954166, 50189.35908931843, 44830.32689867108, 44553.609729770156, 50596.800094530976, 57784.320856900726, 58856.24561158704, 55747.50121815237, 45909.22320056169, 39700.62514340684, 40026.08060928283, 45670.31405277475, 53989.53076463664, 56840.67697150013, 51990.04207895543, 44479.51204240872, 40155.77712289925, 39704.41534828166, 44546.13749709498, 47853.054952435254, 47513.191562263164, 44746.64958742695, 38570.66136465314, 34913.4920574524, 34960.54662729273, 38809.56679736621, 44328.06662512622, 46786.97399754649, 43176.24626069548, 38761.63401887685, 35505.42439182791, 34491.44625748903, 38371.78994694245, 43751.84248007749, 45179.535352503226, 42830.262169078655, 37807.84955152587, 33490.071062830524, 33451.981211263366, 38489.460640344005, 44884.80556430299, 48173.59627480145, 45230.34903136869, 40408.815586107376, 37482.963570560685, 37298.73472663822, 35613.14542796679, 44369.054329647, 48985.813091293036, 47004.185926539445, 40072.386470837155, 36025.40024944878, 37087.92873340035, 43829.83096193774, 53122.12039373634, 56019.219188405055, 56327.64714973823, 49299.052443800436, 43566.64775098363, 44683.28514853064, 53151.23631794093, 62692.280707330996, 64588.59942255457, 61225.42426070072, 52811.43214958245, 45888.1735729261, 46327.71392726898, 55065.63459685981, 63657.55109568991, 66424.12156809494, 64999.66733479166, 57939.99891629782, 50574.620616169435, 50558.85074509659, 58418.01819988318, 64054.52320755815, 65362.71077771696, 63212.24817914635, 53816.39717322037, 47299.69358112465, 46438.29122104288, 51606.847773139874, 58286.00492998514, 60724.299744674456, 60074.178339144695, 53563.16147623882, 47537.06339596468, 48392.602700494775, 56176.46157312282, 64575.322111131725, 66575.84159174575, 64667.37570830546, 58256.091303140376, 53478.61964952481, 55099.86601961843, 62950.88160139487, 67181.84218779847, 66728.1189827789, 64883.456569064045, 56046.87389471389, 49400.485446729814, 47508.2631477567, 50017.97093003997, 54896.861775391895, 54433.64513960683, 50522.90141287548, 40609.51437863042, 33685.99041915158, 31782.758247954887, 36404.455159974576, 40003.119259204075, 39722.56468522227, 38114.81374102996, 32394.83359664386, 27315.366586900353, 27310.107293953042, 33405.16835200959, 42120.632096858724, 43190.48348102931, 41290.86540942159, 35119.35131893462, 28756.590790603772, 28610.11081953303, 35550.66376207889, 41120.32617529186, 42589.10273496922, 42320.707348300246, 35497.925445967456, 28680.144395217914, 28433.68805319704, 34480.02122168917, 40808.10672190518, 41632.86607595266, 39212.58899530489, 32942.6873470945, 27158.723820603347, 27161.92132942049, 33334.34617535648, 38679.06248665687, 40681.03562440046, 39477.59519930245, 34513.37459740981, 28345.14667273714, 28289.697206577042, 34737.3173677138, 40574.91034815245, 40556.06688657629, 39927.322507441866, 34634.7483828078, 27666.467364275486, 27774.36383185118, 33950.84687537262, 39518.06131165054, 39587.56870083202, 38832.66031065059, 32258.462222065184, 23343.00831465727, 23914.89577468648, 28173.094174897382, 30306.555827203953, 28284.310391780135, 24228.75442600916, 20495.999364892246, 19302.93644485608, 21391.090776536974, 21072.220129100904, 19770.681250398102, 22751.205447107975, 25744.075479601106, 27119.697588885116, 28894.626077292316, 30321.364424584666, 29665.55870322018, 28601.71879337108, 28071.180317842627, 27522.026515632668, 25081.934367325113, 21303.503766392783, 18866.89154435026, 23938.585815421528, 29814.69141061624, 33132.41332574798, 32403.437424673357, 25058.826704400228, 19710.392712492816, 20359.305357642494, 28801.270965356667, 36009.139554077155, 38616.97439807099, 36667.40186755878, 28534.365261187533, 22467.35716166321, 22792.672711448053, 29707.297931805748, 37470.29566599212, 39159.19770376187, 35885.39779973568, 28198.615393579756, 22794.986958484948, 22618.6644213648, 31692.436151849328, 38546.03983274927, 38392.05968074403, 36221.937805245456, 28193.106505443564, 24431.214224965315, 24273.041173434154, 33527.551077655895, 39799.45238198621, 42167.39446971978, 38724.44521961169, 31303.32596831539, 28852.214711208617, 29328.603401220582, 38595.29805477188, 46086.742816943195, 46212.71514312982, 40873.403063823505, 34050.57262699937, 30182.502158521005, 29877.84162199769, 36061.93273317978, 42775.17967463026, 46034.16044887075, 41613.766613228676, 33678.29697864473, 28330.525176789113, 27672.35860253945, 35245.249394927574, 44232.03285856061, 45226.19144817676, 43043.919289296115, 33089.63437761405, 27591.940183796116, 27719.98729388228, 34909.70477643947, 43035.046467701366, 45301.769969577756, 42110.40663131329, 33482.256424365776, 29072.549855117086, 28488.907914432693, 37030.57030038991, 44216.431591844026, 46331.06515629943, 45488.943714074994, 37613.19663707921, 31321.72125252333, 30932.753428207085, 37772.776810934076, 45796.64013962434, 47820.36900857583, 45315.76501111126, 35448.39425262605, 29850.35013283804, 29591.159982436213, 37654.079017344375, 45495.78841466611, 47908.959672341305, 44361.65729402939, 35983.7996238946, 32075.219160365035, 33116.287276080366, 42322.98190708696, 51963.46941425464, 52271.57848952532, 52043.23599364105, 45266.807463334626, 40278.50721681852, 41783.0851672551, 49025.27275062121, 57753.777188678556, 57148.82659973791, 51158.111967429824, 40270.45350840617, 32304.431822795817, 32070.63494283356, 37752.91137050926, 42330.03164511995, 42399.727284674096, 36018.92734513604, 30062.163733450554, 29588.1255976938, 29092.231273784368, 26893.12179696699, 34929.949841169895, 42461.99046702016, 41600.38455975693, 34379.79505825023, 29348.680842359154, 30674.019184850084, 36438.884319303455, 42592.755250922644, 42370.3178916497, 42156.89423782389, 35502.775577422006, 32961.0702792199, 33247.123325205226, 32638.287197185437, 44674.20771450234, 51377.68957910347, 50811.51554085845, 40938.066497686414, 34693.61195056415, 35794.33867394901, 43608.768716936305, 53030.999471673196, 55375.512665613925, 51350.83787518498, 43515.67353483048, 36881.67381692623, 38068.73656246778, 46093.29078747657, 57268.0460688599, 58937.23137518557, 54242.503589531414, 45218.09204062047, 41196.31936112136, 42541.41388049561, 41024.48650669875, 51622.7076655413, 56013.155747602264, 52544.45187791683, 43851.12531567635, 38604.27886985339, 38620.02900914742, 47969.72289541592, 56852.747992644305, 61996.47126627607, 58805.310177720465, 47280.70456188186, 40583.34097614553, 41128.192126370894, 50193.55312608201, 59539.43834004759, 64983.864246308556, 61279.46570907009, 48338.44203029854, 40484.3484226722, 41884.39600353677, 54289.35228313394, 61250.691081910205, 64220.31437336595, 59165.983057045596, 47642.189399246454, 41115.51194818619, 41876.72414814975, 51464.90978190481, 62803.30863417166, 65385.57373290332, 60334.06981497588, 46284.58686223134, 38833.89407337377, 39698.05357331142, 47745.00067522588, 55101.85604010965, 58237.24426215552, 57764.44135307596, 50420.8551702024, 45567.72968395969, 46504.94127248667, 54349.34998968483, 60731.4138465725, 63303.91498179767, 60298.72859863081, 51704.17508237814, 45717.347342456975, 46059.70474391559, 53049.488500429776, 59306.32398907892, 62403.524245874905, 59696.274176988816, 48684.76124917582, 41835.26632192022, 42430.54555033094, 50076.369117078095, 56940.56608534447, 58025.69842660899, 55038.43675221109, 44884.821827167245, 38939.787615900466, 38979.59802389587, 44663.55176765817, 50517.54060893932, 50345.85708898577, 47988.558150172845, 40853.4057042814, 38165.8415118995, 38648.876806740445, 41908.58157309806, 47401.57907842861, 49481.04013850385, 48095.25972992769, 41284.361139943714, 37854.4480306016, 37546.53779346516, 41603.843693113595, 47820.91079534329, 47246.476111540374, 44466.81067665845, 39450.05564794844, 37010.325669384845, 36654.85544497009, 40353.082727944166, 46185.06388406285, 47580.77496864459, 46665.410664712144, 40791.501556263196, 38355.37341411503, 37635.23471064154, 36758.08397128221, 45985.88943874199, 49995.80097644232, 47975.90718646997, 41429.81245724381, 37282.35082487397, 37137.63721104596, 42201.7547964592, 49822.27312845655, 54534.49980287231, 54533.77882552861, 47481.81951410114, 42055.08936215232, 43879.59319217621, 50025.5866728274, 57779.83356382646, 62229.99530852202, 60656.63284916041, 53322.422185201416, 46947.596440932735, 48157.11595496144, 55485.4195651247, 65417.97612369258, 69959.13505419965, 68457.41761185907, 58008.38472436904, 49036.129043870984, 48812.32078456241, 55770.81148868941, 62578.40993961545, 64911.32653171455, 63452.51661006169, 56211.79640881274, 49058.582679465755, 49695.71775694223, 54647.5259012805, 59928.6898365422, 54631.05724966697, 49250.29905728087, 40856.97377954754, 36266.89995539015, 35517.15307352361, 34184.53353052385, 33371.86128513576, 31969.554327793263, 30641.723523056826, 28778.69553058833, 27946.73958770959, 27509.81466494675, 26662.70133275635, 26476.786252324444, 24811.176812171496, 23734.407940954658, 21813.02876068876, 19691.25151218245, 17829.182468339233, 15717.002576485738, 14676.555091672217, 13659.528312206226, 12944.24059105674, 11821.33696211924, 10574.619947022518, 9870.736391613336, 8602.998647942173, 8088.018686973378, 7668.737915280734, 7409.212563574305, 7059.414255782104, 6805.311671542087, 6265.577428451043, 5336.939303601046, 5036.213630383211, 4759.937178041844, 5002.634961970787, 5377.02368302538, 5080.424215798721, 4864.875421529681, 4236.1512408381295, 4106.63390017699, 4042.2341857847173, 4197.684686131461, 4474.035273523109, 4490.267528124243, 4241.475197902689, 3589.2927029890307, 3444.298594012123, 3107.614908066812, 3426.8278986551327, 3828.278281872126, 3941.803702165574, 3732.7576911047563, 3225.113307797762, 3072.114284546333, 3150.7921285990747, 3233.947689115616, 3619.052387506853, 3551.5976360602217, 3435.0323606968946, 2973.892113333866, 2674.3168644471334, 2460.238549245212, 2800.034199553288, 3195.2623366970715, 3107.693557143192, 2961.140811696215, 2340.478044336084, 1931.373924738195, 1847.1236388756024, 2179.3253334294473, 2438.0353828364327, 2379.86512657921, 2255.5989701513035, 1870.926018202182, 1620.6083820631166, 1511.3110067191255, 1685.9676158651248, 2099.8497631541054, 2430.5076190841487, 2701.9755190700494, 2997.6300510919987, 2977.3472468469777, 2893.576703677185, 3207.4670223153535, 3539.6180497104797, 3534.997475538599, 3527.721146658791, 3489.9154298884955, 3287.543337245921]
print(performance_df)
I need to access last years revenue and print it next to the current year in a new df column so I can then compare year on year performance.
My desired out put would look something like this
Revenue . LY Revenue
Date
2018-01-01 25891.846787 . Nan
2018-01-02 25851.615541 . Nan
2018-01-03 25037.711900 . Nan
2018-01-04 26715.764965 . Nan
2018-01-05 23988.356950 . Nan
... ...
2019-12-27 3539.618050 . 25744.075480
2019-12-28 3534.997476 . 27119.697589
2019-12-29 3527.721147 . 28894.626077
2019-12-30 3489.915430 . 30321.364425
2019-12-31 3287.543337 . 29665.558703
how you go about achieving this?
So far I have just been able to get last years date from the index by:
performance_df['Last year dates'] = (performance_df['Revenue'].index - pd.Timedelta(days=365))
But I want to corresponding revenue for that date not just the date its self.

Use Series.shift:
performance_df['LY Revenue']=performance_df['Revenue'].shift(365)
print(performance_df)
Revenue LY Revenue
Date
2018-01-01 25891.8% nan%
2018-01-02 25851.6% nan%
2018-01-03 25037.7% nan%
2018-01-04 26715.8% nan%
2018-01-05 23988.4% nan%
... ... ...
2019-12-27 3539.6% 25744.1%
2019-12-28 3535.0% 27119.7%
2019-12-29 3527.7% 28894.6%
2019-12-30 3489.9% 30321.4%
2019-12-31 3287.5% 29665.6%
[730 rows x 2 columns]
Here you can see the beginning of the year 2019:
print(performance_df[364:366])
Revenue LY Revenue
Date
2018-12-31 29665.6% nan%
2019-01-01 28601.7% 25891.8%

IIUC, you need this.
This works only when you have date time as index.
What we are doing here is grouping by day & month using the datetime value & even if the dates are between leap-year & a normal-year, this should work.
performance_df['LY_Revenue'] = performance_df.groupby([performance_df.index.month,performance_df.index.day])['Revenue'].shift()
print(performance_df)
Output
Revenue LY_Revenue
Date
2018-01-01 25891.846787 NaN
2018-01-02 25851.615541 NaN
2018-01-03 25037.711900 NaN
2018-01-04 26715.764965 NaN
2018-01-05 23988.356950 NaN
2018-01-06 19029.057983 NaN
2018-01-07 16935.481705 NaN
2018-01-08 22756.072913 NaN
2018-01-09 30385.672829 NaN
2018-01-10 32970.132175 NaN
2018-01-11 31089.167075 NaN
2018-01-12 24262.972415 NaN
2018-01-13 18261.273832 NaN
2018-01-14 18304.754084 NaN
2018-01-15 26297.835665 NaN
2018-01-16 32619.669405 NaN
2018-01-17 35565.262225 NaN
2018-01-18 33229.971940 NaN
2018-01-19 25405.647136 NaN
2018-01-20 19980.890375 NaN
2018-01-21 20487.553161 NaN
2018-01-22 29709.032322 NaN
2018-01-23 38164.493648 NaN
2018-01-24 39050.801147 NaN
2018-01-25 36612.554433 NaN
2018-01-26 28169.782524 NaN
2018-01-27 22086.641618 NaN
2018-01-28 21631.662706 NaN
2018-01-29 28419.945290 NaN
2018-01-30 35644.617364 NaN
... ... ...
2019-12-02 2973.892113 28289.697207
2019-12-03 2674.316864 34737.317368
2019-12-04 2460.238549 40574.910348
2019-12-05 2800.034200 40556.066887
2019-12-06 3195.262337 39927.322507
2019-12-07 3107.693557 34634.748383
2019-12-08 2961.140812 27666.467364
2019-12-09 2340.478044 27774.363832
2019-12-10 1931.373925 33950.846875
2019-12-11 1847.123639 39518.061312
2019-12-12 2179.325333 39587.568701
2019-12-13 2438.035383 38832.660311
2019-12-14 2379.865127 32258.462222
2019-12-15 2255.598970 23343.008315
2019-12-16 1870.926018 23914.895775
2019-12-17 1620.608382 28173.094175
2019-12-18 1511.311007 30306.555827
2019-12-19 1685.967616 28284.310392
2019-12-20 2099.849763 24228.754426
2019-12-21 2430.507619 20495.999365
2019-12-22 2701.975519 19302.936445
2019-12-23 2997.630051 21391.090777
2019-12-24 2977.347247 21072.220129
2019-12-25 2893.576704 19770.681250
2019-12-26 3207.467022 22751.205447
2019-12-27 3539.618050 25744.075480
2019-12-28 3534.997476 27119.697589
2019-12-29 3527.721147 28894.626077
2019-12-30 3489.915430 30321.364425
2019-12-31 3287.543337 29665.558703

Given your data is time-indexed, you can shift with freq
performance_df['LY Revenue'] = performance_df.Revenue.shift(freq='365d')
Output:
Revenue LY Revenue
Date
2018-01-01 25891.8% nan%
2018-01-02 25851.6% nan%
2018-01-03 25037.7% nan%
2018-01-04 26715.8% nan%
2018-01-05 23988.4% nan%
2018-01-06 19029.1% nan%
2018-01-07 16935.5% nan%
2018-01-08 22756.1% nan%
2018-01-09 30385.7% nan%
...
2019-12-21 2430.5% 20496.0%
2019-12-22 2702.0% 19302.9%
2019-12-23 2997.6% 21391.1%
2019-12-24 2977.3% 21072.2%
2019-12-25 2893.6% 19770.7%
2019-12-26 3207.5% 22751.2%
2019-12-27 3539.6% 25744.1%
2019-12-28 3535.0% 27119.7%
2019-12-29 3527.7% 28894.6%
2019-12-30 3489.9% 30321.4%
2019-12-31 3287.5% 29665.6%
But note that 365D is not necessarily a year in general.

Hard to confirm without the data, but my thoughts are using .loc
pdf = performance_df
pdf.loc['2018-xx-xx','Revenue'] - pdf.loc['2019-xx-xx','Revenue']
May be able to simply use:
pdf.loc['2018','Revenue'] - pdf.loc['2019','Revenue']
Of course you can also simply slice the years into df's, and then subtract/divide
pdf18 = pdf.loc['2018-01-01':'2019-01-01','Revenue']
pdf19 = pdf.loc['2019-01-01':'2020-01-01','Revenue']
pdf_diff = pdf18 - pdf19
# = (pdf19-pdf18)/pdf18
#Or slice just year, then subtract pdf18['Revenue'] - pdf19['Revenue']
This may have issues because the shape[0]'s won't match (not full year), so make sure they line up.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Exponential Smoothing with alpha and beta greater than one - python

Related

Python: Linear Regression of Subsets of Columns

Algorithm for predict(start=start_date, end=end_date) on unique same named weather station

How to use Prophet's make_future_dataframe with multiple regressors?

The sum() function does not work well when adding zeros in pandas with python

How to access last year values to compare year on year ? datetime index

Categories

Resources