Make a time series smooth while preserving monthly sales - python

I want to generate daily sales from a weekday distribution and monthly totals. The result should look smooth, without jumps from month to month, and the monthly totals must not change. The picture below shows roughly what it should look like; it looks a bit like a normal distribution:
(blue line: current sales distribution; red line: approximately what I would like to get)
My own method was to lower or raise sales at the beginning of the month and do the opposite at the end, depending on whether the next month's sales are higher or lower than the current month's. I generated a list of multipliers and multiplied the daily sales by that list. That doesn't really work, because in some cases the decrease at the beginning of the month is too large when the next month's sales are much higher than the current month's.
Standard time-series smoothing techniques will change the monthly sales. Even bringing the monthly sales back to the desired values by adding an error term, abs(new_sales_after_smooth - desired_monthly_sales) / 30, to every day in the month does not help: there will still be sharp ups and downs from month to month.
Sales can also decrease from month to month, not only increase.
Preserving the weekly seasonality is also important.
I would be grateful for any ideas on how to solve this problem. Example data is below. The numbers in the proportion column are fractions of the monthly sales.
weekday      proportion
Monday       0.040088
Tuesday      0.028345
Wednesday    0.027814
Thursday     0.034188
Friday       0.035997
Saturday     0.031616
Sunday       0.032600

month        sales
July         16263212
August       17422652
September    18028792
October      20588807
November     26466756
December     40903354
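For reference, here is a minimal sketch of how the unsmoothed baseline (the blue line) could be built from the two tables. It is not a solution to the smoothing problem. The year 2022 is an assumption, since the question does not state one, and rescaling the weekday proportions inside each month so that they sum to 1 is also an assumption about how the proportions are meant to be used.

```python
import pandas as pd

# Weekday proportions and monthly totals copied from the example data.
weekday_share = {
    "Monday": 0.040088, "Tuesday": 0.028345, "Wednesday": 0.027814,
    "Thursday": 0.034188, "Friday": 0.035997, "Saturday": 0.031616,
    "Sunday": 0.032600,
}
monthly_sales = {7: 16263212, 8: 17422652, 9: 18028792,
                 10: 20588807, 11: 26466756, 12: 40903354}

YEAR = 2022  # assumption: the question does not say which year the data covers

days = pd.date_range(f"{YEAR}-07-01", f"{YEAR}-12-31", freq="D")
month = pd.Series(days.month.to_numpy(), index=days)
weights = pd.Series(days.day_name().to_numpy(), index=days).map(weekday_share)

# Rescale the weights inside each month so they sum to 1 (assumption),
# then spread that month's total over its days.
share_in_month = weights / weights.groupby(month).transform("sum")
daily_sales = share_in_month * month.map(monthly_sales)

# Each month's total is reproduced, up to floating-point rounding.
monthly_check = daily_sales.groupby(month).sum()
```

Because every month is scaled independently, this series keeps the weekly pattern and the monthly totals but jumps at each month boundary, which is exactly the behaviour the question wants to remove.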

Related

Creating a day of year column overriding the leap day in a leap year

I have a large database of climate variables - daily values of temp, humidity etc. I have a timestamp column %Y%m%d. I have removed leap days, as I need uniform 365 days for each of my years. I want to add a new column called 'day_of_year' with 1 to 365 for each year for as many years as I have in my database. How can I accomplish this in python, any pointers, please?
If I use the day of year function from pandas, I get 59 for Feb 28 and 61 for Mar 1 in a leap year. Is there a way to override the leap year, since I have dropped the leap day, and get 60 for Mar 1?
Use pandas' day of year function, but instead of giving it the real timestamp, e.g. "2022-11-27", give it "2021-" + timestamp[-5:] (for a %Y%m%d string without dashes, that would be "2021" + timestamp[-4:]). This will give you the day number as if the timestamp were in a non-leap year.
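A minimal sketch of that idea, assuming the timestamps are strings in the %Y%m%d format mentioned in the question. The column name "timestamp" and the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample in %Y%m%d format; the leap day itself is already dropped.
df = pd.DataFrame({"timestamp": ["20200228", "20200301", "20201231", "20210301"]})

dates = pd.to_datetime(df["timestamp"], format="%Y%m%d")

# Re-date every row into a fixed non-leap year (2021) and take the day of year there.
as_non_leap = pd.to_datetime("2021" + dates.dt.strftime("%m%d"), format="%Y%m%d")
df["day_of_year"] = as_non_leap.dt.dayofyear
```

With this, 1 March maps to day 60 in leap and non-leap years alike, and 31 December always maps to day 365.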

How do I split the number of days in a 24 year period (1995-2019)

Please forgive the use of the photo but I tried copying out the dataframe but it wasn’t coming out the way I wanted it to.
The number of sales is represented by the number of rows of the dataframe which is 30255.
Above is a sample of the dataframe I am working with.
Let n be the day number, starting at n=1 for 1st January 1995 and ending at n=9131 for 31st December 2019.
I also want to consider the number of sales of 'D' over each 365-day period, representing each yearly data point by the midpoint of its period (day 183 for the first 365-day period).
My problem is how to split that data into each 365-day period.
Any help would be much appreciated.
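One possible way to do the split, sketched on a made-up stand-in for the dataframe (one row per sale with a "date" column; the real column names are not visible in the question). The period index is computed with integer division of the day number.

```python
import pandas as pd

# Hypothetical stand-in for the sales table: one row per sale.
sales = pd.DataFrame({"date": pd.to_datetime(
    ["1995-01-01", "1995-12-31", "1996-01-01", "1996-12-30", "2019-12-31"])})

start = pd.Timestamp("1995-01-01")

# n = 1 on 1 January 1995, n = 9131 on 31 December 2019.
sales["n"] = (sales["date"] - start).dt.days + 1

# Period 0 covers n = 1..365, period 1 covers n = 366..730, and so on.
sales["period"] = (sales["n"] - 1) // 365

# Midpoint day of each period: 183 for the first one, then 548, ...
sales["midpoint_n"] = sales["period"] * 365 + 183

# Number of sales (rows) in each 365-day period.
sales_per_period = sales.groupby("period").size()
```

Since 9131 = 25 * 365 + 6, the last period (index 25) holds only the final 6 days, so it may need to be dropped or handled separately.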

Can the day of the month be encoded similarly to other cyclical variables?

I am working with time-series data in Python to see if variables like the time of day, the day of the month and the month of the year affect attendance at a gym. I have read up on encoding time-series data cyclically using sine and cosine. I was wondering if you can do the same thing for the day of the month. The reason I ask is that, unlike the number of months in a year or the number of days in a week, the number of days in a month is variable (for example, February has 28, whereas March has 31). Is there any way to deal with that?
Here is a link describing what I mean by cyclic encoding: https://ianlondon.github.io/blog/encoding-cyclical-features-24hour-time/
Essentially, what this is saying is that you can't just convert the hour into a series of values like 1, 2, 3, ..., 24 when you are doing machine learning, because that implies that the 24th hour is further away (from a Euclidean geometric perspective) from the 1st hour than the 1st hour is from the 2nd hour, which is not true. Cyclical encoding (assigning sine and cosine values to each hour) allows you to represent the fact that the 24th hour and the 2nd hour are equidistant from the 1st hour.
My question is that I do not know if this cyclical conversion will work for days in a month, seeing as different months can have different numbers of days.
You can implement this by dividing each month into 2π radians; then in a 28-day month, a day is 2π/28 ≈ 0.2244 radians, while in a 31-day month, a day is 2π/31 ≈ 0.2027 radians.
This obviously introduces a skew where a shorter month will appear to take up as much time as a longer one; but it will satisfy your requirement. If you only use this metric for normalizing a single feature, that should be inconsequential, and let you achieve the stated goal.
If you have points in time with a finer granularity than a day, you obviously can and probably should normalize those into the same projection.
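A minimal sketch of this per-month normalization with pandas and NumPy. The sample dates and the feature names "dom_sin" and "dom_cos" are made up for illustration, and mapping day 1 to angle 0 is a choice rather than a requirement.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(pd.Series(
    ["2023-02-01", "2023-02-28", "2023-03-01", "2023-03-31"]))

# Angle in [0, 2*pi): day 1 maps to 0 and every month spans one full circle,
# so the step per day is 2*pi/28 in February 2023 and 2*pi/31 in March.
angle = 2 * np.pi * (dates.dt.day - 1) / dates.dt.days_in_month

features = pd.DataFrame({"dom_sin": np.sin(angle), "dom_cos": np.cos(angle)})
```

The first day of every month lands on the same point of the circle, and the last day of one month sits one step before it, whatever the month length.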

Using pandas resample monthly data to yearly data but start from a certain month

How do I resample monthly data to yearly data, but starting from 1st October?
I tried the following, as I know that using base works for starting at a certain hour of a day, but it doesn't appear to work for the month of the year.
df = (df.resample(rule='Y', base=10).sum().reset_index())
Here is how you do it:
offset = pd.DateOffset(months=9)
df.shift(freq=-offset).resample('YS').sum().shift(freq=offset)
Pandas has anchored offsets available for annual resamples starting at the first of a month.
The anchored offset for annual resampling starting in October is AS-OCT. Resampling and summing can be done like this:
df.resample("AS-OCT").sum()
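The spelling of the anchored alias depends on the pandas version (newer releases prefer "YS-OCT" over "AS-OCT", as far as I know). A version-independent way to cross-check the result is to label each row with its October-to-September year by hand and group on that label. The monthly data below is made up for illustration.

```python
import pandas as pd

# Hypothetical monthly data: value 1 for every month from Jan 2020 to Dec 2021.
idx = pd.date_range("2020-01-01", periods=24, freq="MS")
df = pd.DataFrame({"value": 1}, index=idx)

# Calendar year in which each row's October-to-September year begins:
# January to September belong to the year that started the previous October.
year_start = idx.year.to_numpy() - (idx.month.to_numpy() < 10).astype(int)

yearly = df.groupby(year_start).sum()
```

Here the label 2020 means the year running from 1 October 2020 to 30 September 2021; the first and last groups are partial because the sample data does not start or end in October.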

Pandas dataframe create Variable "Winter & rolling year" (interyears)

I am looking at a history of daily min/max temperatures for the past ~40 years of a specific city (the variable precipitation isn't needed).
I imported the CSV file with the aim of calculating an average of the low and high temperatures for each winter (I consider the range November-March as winter). So I suppose a solution could be to loop over the years and maybe create a column which consists of "Winter&year" (for instance, the 1st of December 2018 fell in winter 2018 and the 23rd of February 2019 fell in winter 2018 too). I found plenty of examples that aggregate days/months into seasons, but nothing where the year changes, and that is the bit I actually struggle with.
The structure of the data is the following:
Could anyone point me in the right direction?
Many thanks
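One way the "Winter&year" column could be sketched without a loop: label every winter day with the calendar year in which that winter started. The column names "date", "tmin" and "tmax" and the sample rows are assumptions, since the actual structure of the CSV is not shown.

```python
import pandas as pd

# Hypothetical daily data; the real column names may differ.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-12-01", "2019-02-23", "2019-07-01", "2019-11-05"]),
    "tmin": [1.0, -3.0, 15.0, 4.0],
    "tmax": [6.0, 2.0, 27.0, 9.0],
})

month = df["date"].dt.month
is_winter = (month >= 11) | (month <= 3)  # November to March

# January to March belong to the winter that started the previous calendar year.
winter_year = df["date"].dt.year - (month <= 3).astype(int)

# Keep only winter days, attach the label, and average per winter.
winters = df.loc[is_winter].assign(winter=winter_year[is_winter])
winter_means = winters.groupby("winter")[["tmin", "tmax"]].mean()
```

In this sample, 1 December 2018 and 23 February 2019 both get the label 2018, matching the example in the question, while the July row is excluded.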
