Serially Correlated Demand in Python

I'm trying to solve the following problem but I'm not sure how to continue:
Suppose a logistics company would like to simulate demand for a given product.
Assume that there are Good and Bad weeks.
On a good week, the demand is normally distributed with mean 200 and standard deviation 50.
On a bad week, the demand is normally distributed with mean 100 and standard deviation 30.
As a practical constraint, round demand to the nearest integer, and set it to zero if it is ever negative.
Additionally, we should assume that a week being good or bad is serially correlated across time.
Conditional on a given week being Good, the next week remains Good with probability 0.9. Similarly, conditional on a given week being Bad, the next week remains Bad with probability 0.9.
You are to simulate a time series of demand for 100 weeks, assuming the first week starts Good. Also, plot the demand over time.
This is what I have so far:
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
simulated_demand = [rng.normal(200, 50)]
for t in range(1, 100):
    if simulated_demand[t-1] == rng.normal():
        simulated_demand.append(rng.normal(150, 70))
    else:
        simulated_demand.append(rng.normal(50, 15))
simulated_demand = pd.DataFrame(simulated_demand, columns=['Demand Time Series'])
simulated_demand.plot(style='r--', figsize=(10, 3))
How can I fix the if condition?
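One way to make the serial correlation concrete is to track the Good/Bad state explicitly as a two-state Markov chain and draw demand conditional on it. A minimal sketch of that approach (this is an illustration, not the original poster's code; variable names are made up):

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

is_good = True  # week 1 starts Good
demand = []
for t in range(100):
    if is_good:
        d = rng.normal(200, 50)   # Good week: mean 200, sd 50
    else:
        d = rng.normal(100, 30)   # Bad week: mean 100, sd 30
    demand.append(max(0, round(d)))  # round to nearest integer, floor at zero
    # the current state persists with probability 0.9, otherwise it flips
    if rng.random() > 0.9:
        is_good = not is_good

demand = pd.DataFrame(demand, columns=['Demand Time Series'])
demand.plot(style='r--', figsize=(10, 3))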

Related

Predict the future graph based on averages of given data

I am trying to make a future stock price forecaster. I am nearly done, but the final step has stumped me.
How do I predict the future of the graph based on the different averages of the given data?
# how it works up to now:
stockprice = [1, 2, 3, ... 9999]
# for every number in the stock price list, sum the last x numbers (x would be an input) and divide them (calculate the average)
StockDataSeperate = StockData_AverageFinder[-int_toSplitBy:-1]
for num in StockDataSeperate:
    Average += num
Average = Average / len(StockDataSeperate)
Averaged_StockData = np.append(Averaged_StockData, Average)
# doing this x amount of times and exponentiating the number to average by, by x.
Using this data (the averaged stock price graphs), is it possible to predict the future of the raw data using the averaged data?
If anyone has any links or ideas I would be so grateful!
Obviously, using a moving average for future values does not work since you don't have values beyond the present. In theory you would assume that near-term stock prices follow a random walk, so your best guess for a future value would be to simply predict the last known value.
However, a more "exciting" solution could be to train an LSTM by turning the stock price series into a supervised learning problem. It is important that you don't predict the price itself but the return between the stock prices in your time series. Of course you can also use the returns of moving averages as inputs, or even multiple moving averages, and conduct multivariate time series forecasting.
Hopefully, I don't have to mention that stock price prediction is not that "easy" - it's a good exercise though.
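To illustrate the supervised-learning framing the answer describes, here is a minimal sketch that converts a toy price series into returns and lagged feature/target pairs (the window length, column names, and toy data are arbitrary assumptions, not from the answer):

import pandas as pd

# toy price series; in practice this would be the real stock prices
prices = pd.Series([100, 101, 103, 102, 105, 107, 106, 108], dtype=float)

# work with returns rather than raw prices
returns = prices.pct_change().dropna()

# build lagged features: use the previous 3 returns to predict the current one
window = 3
X = pd.concat([returns.shift(i) for i in range(1, window + 1)], axis=1).dropna()
X.columns = [f'ret_lag{i}' for i in range(1, window + 1)]
y = returns.loc[X.index]

# X and y can now be fed to any supervised learner (e.g. an LSTM after reshaping)
print(X)
print(y)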

Time series frequency in 5 minutes timestamps

I have two large time series data sets. Both are separated by 5-minute interval timestamps. The length of each time series is 3 months (from August 1, 2014 to October 2014). I'm using R (3.1.1) for forecasting the data. I'd like to know the value of the "frequency" argument in the ts() function in R for each data set. Since most of the examples and cases I've seen so far are for months or days at the most, it is quite confusing for me when dealing with equally spaced 5-minute intervals.
I would think that it would be either of these:
myts1 <- ts(series, frequency = (60*60*24)/5)
myts2 <- ts(series, deltat = 5/(60*60*24))
In the first, the frequency argument gives the number of times sampled per time unit. If the time unit is the day, there are 60*60*24 = 86,400 seconds per day and you're sampling every 5 of them, so you would be sampling 17,280 times per day. Alternatively, the second option is the fraction of a day that separates each sample. Here, we would say that every 5/86400 = 5.787037e-05 of a day, a sample is drawn. If the time unit is something different (e.g., the hour), then obviously these numbers would change.

remove/isolate days when there is no change (in pandas)

I have annual hourly energy data for two AC systems for two hotel rooms. I want to figure out when the rooms were occupied or not by isolating/removing the days when the AC was not used for 24 hours.
I did df[df.Meter2334Diff > 0.1] for one room, which gives me all the hours when the AC was turned on; however, it also removes the hours of the days when the room was most likely occupied but the AC was turned off. This is where my knowledge stops. I therefore enquire the assistance of the oracles of the internet.
(screenshot of my dataframe)
(screenshot of the results after df[df.Meter2334Diff > 0.1])
If I've interpreted your question correctly, you want to extract all the days from the dataframe where the Meter2334Diff value was zero?
As your data currently has an hourly frequency, we can resample it in pandas using the resample() function, which takes a frequency string telling pandas at what time interval to aggregate the data. There are lots of options (see the docs), but in your case we can pass 'D' to group by day.
Then we can calculate the daily sum of the Meter2334Diff column and filter out the days that have a value == 0 (obviously, without knowledge of your dataset, I don't know whether 0 is the correct cutoff).
total_daily_meter_diff = df.resample('D')['Meter2334Diff'].sum()
days_less_than_cutoff = total_daily_meter_diff[total_daily_meter_diff == 0].index
We can then use these days to filter in the original dataset:
df.loc[df.index.floor('D').isin(days_less_than_cutoff), :]
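As a self-contained illustration of the same approach on made-up hourly data (the column name and the 0 cutoff mirror the answer; everything else is assumed):

import numpy as np
import pandas as pd

# 3 days of hourly readings; day 2 has the AC off all day
idx = pd.date_range('2024-01-01', periods=72, freq='h')
usage = np.concatenate([np.random.rand(24), np.zeros(24), np.random.rand(24)])
df = pd.DataFrame({'Meter2334Diff': usage}, index=idx)

daily_total = df.resample('D')['Meter2334Diff'].sum()
unused_days = daily_total[daily_total == 0].index

# rows belonging to days with no AC usage at all
print(df.loc[df.index.floor('D').isin(unused_days)])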

Exponential Weighted Moving Average using Pandas

I need to confirm a few things related to the pandas exponentially weighted moving average function.
If I have a data set df for which I need to find a 12-day exponential moving average, would the method below be correct?
exp_12=df.ewm(span=20,min_period=12,adjust=False).mean()
Given that the data set contains 20 readings, the span (total number of values) should equal 20.
Since I need to find a 12-day moving average, min_period=12.
I interpret span as the total number of values in the data set, or the total time covered.
Can someone confirm if my above interpretation is correct?
I also can't work out the significance of adjust.
I've attached the link to pandas.df.ewm documentation below.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ewm.html
Quoting from Pandas docs:
Span corresponds to what is commonly called an “N-day EW moving average”.
In your case, set span=12.
You do not need to specify that you have 20 data points; pandas takes care of that. min_periods may not be required here.
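For concreteness, a minimal sketch of a 12-day EWMA on made-up daily data (the column name and values are illustrative):

import pandas as pd

# 20 daily readings (made-up values)
df = pd.DataFrame({'price': [10, 11, 12, 11, 13, 14, 13, 15, 16, 15,
                             17, 18, 17, 19, 20, 19, 21, 22, 21, 23]})

# 12-day exponentially weighted moving average;
# adjust=False uses the recursive form y_t = (1 - alpha) * y_{t-1} + alpha * x_t, with alpha = 2 / (span + 1)
exp_12 = df['price'].ewm(span=12, adjust=False).mean()
print(exp_12)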

Python - Zero-Order Hold Interpolation (Nearest Neighbor)

I will be shocked if there isn't some standard library function for this, especially in numpy or scipy, but no amount of Googling is providing a decent answer.
I am getting data from the Poloniex exchange - cryptocurrency. Think of it like getting stock prices - buy and sell orders - pushed to your computer. So what I have is a time series of prices for any given market. One market might get an update 10 times a day while another gets updated 10 times a minute - it all depends on how many people are buying and selling on the market.
So my time series data will end up being something like:
[1 0.0003234,
1.01 0.0003233,
10.0004 0.00033,
124.23 0.0003334,
...]
Where the 1st column is the time value (I use Unix timestamps to the microsecond but didn't think that was necessary in the example). The 2nd column would be one of the prices - either the buy or the sell price.
What I want is to convert it into a matrix where the data is "sampled" at a regular time frame. So the interpolated (zero-order hold) matrix would be:
[1 0.0003234,
2 0.0003233,
3 0.0003233,
...
10 0.0003233,
11 0.00033,
12 0.00033,
13 0.00033,
...
120 0.00033,
125 0.0003334,
...]
I want to do this with any reasonable time step. Right now I use np.linspace(start_time, end_time, time_step) to create the new time vector.
Writing my own, admittedly crude, zero-order hold interpolator won't be that hard. I'll loop through the original time vector and use np.nonzero to find all the indices in the new time vector which fit between one timestamp (t0) and the next (t1) then fill in those indices with the value from time t0.
For now, the crude method will work. The matrix of prices isn't that big. But I have to think there's a faster method using one of the built-in libraries. I just can't find it.
Also, for the example above I only use an Nx2 matrix (column 1: times, column 2: price), but ultimately the market has 6 or 8 different parameters that might get updated. A method/library function that could handle multiple prices and such in different columns would be great.
Python 3.5 via Anaconda on Windows 7 (hopefully won't matter).
TIA
For your problem you can use scipy.interpolate.interp1d. It seems to be able to do everything that you want. It is able to do a zero-order hold interpolation if you specify kind="zero". It can also simultaneously interpolate multiple columns of a matrix; you will just have to specify the appropriate axis. f = interp1d(xData, yDataColumns, kind='zero', axis=0) will then return a function that you can evaluate at any point in the interpolation range. You can then get your regularly sampled data by calling f(np.linspace(start_time, end_time, time_step)).
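A minimal sketch using the sample data from the question (the 1-unit grid spacing is an arbitrary choice):

import numpy as np
from scipy.interpolate import interp1d

# irregularly sampled times and prices from the question
times = np.array([1.0, 1.01, 10.0004, 124.23])
prices = np.array([0.0003234, 0.0003233, 0.00033, 0.0003334])

# zero-order hold interpolator; axis=0 would also allow prices to be an (N, k) matrix
f = interp1d(times, prices, kind='zero', axis=0)

# resample on a regular grid inside the original time range
new_times = np.arange(1, 125, 1.0)
resampled = f(new_times)

print(np.column_stack([new_times, resampled])[:5])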
