Theano dimensionality error - target dimensions

I am using Lasagne's Conv3DDNNLayer with input dimensions of (N x 1 x 9 x 9 x 9), where each 9x9x9 cube is one sample to be classified.
I therefore have targets of shape (N x 1), with each entry corresponding to a cube. This raises the error:
('Bad input argument to theano function with name "Conv_Net_1.py:45" at index 1(0-based)', 'Wrong number of dimensions: expected 1, got 2 with shape (324640, 1).')
What shape should my targets have in this case?
import theano
import theano.tensor as T
from theano.tensor import TensorType
import lasagne
import lasagne.layers.dnn  # Conv3DDNNLayer needs cuDNN
# DP: the user's own data-loading module (its import is not shown in the original)

dtensor5 = TensorType('float32', (False,)*5)
input_var = dtensor5('X_Train')
target_var = T.ivector('Y_train')

X_train, Y_train = DP.data_gen('/home/Upload/Smalls', 9)
print(X_train.shape)
print(Y_train.shape)
# Build Neural Network:
input = lasagne.layers.InputLayer((None, 1, 9, 9, 9), input_var=input_var)

l_conv_1 = lasagne.layers.dnn.Conv3DDNNLayer(input, 20, (2,2,2))

# ... (some lines of the original script are not shown here)
l_hidden1 = lasagne.layers.DenseLayer(l_conv_1, num_units=256, nonlinearity=lasagne.nonlinearities.rectify, W=lasagne.init.HeNormal(gain='relu'))

l_hidden1_dropout = lasagne.layers.DropoutLayer(l_hidden1, p=0.5)

output = lasagne.layers.DenseLayer(l_hidden1_dropout, num_units=2, nonlinearity=lasagne.nonlinearities.softmax)

##
prediction = lasagne.layers.get_output(output)
loss = T.mean(lasagne.objectives.categorical_crossentropy(prediction, target_var))

# Get list of all trainable parameters in the network.
params = lasagne.layers.get_all_params(output, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.3)

##
# Defined at line 45 of Conv_Net_1.py, hence the function name in the error message
train_fn = theano.function([input_var, target_var], loss, updates=updates)

##
for epoch in range(500):
    print('training')
    train_loss = train_fn(X_train, Y_train)  # returns a NumPy array, not a Theano variable
    print(type(train_loss))
    print("Epoch %d: Loss %g" % (epoch + 1, train_loss))


##
test_prediction = lasagne.layers.get_output(output, deterministic=True)
predict_fn = theano.function([input_var], T.argmax(test_prediction, axis=1))
Edit: added code.
Thanks!

In case anyone is interested: the problem was that the targets had shape (N, 1) instead of (N,).
Reshaping them to (N,) seemed to solve the problem! On to the next one...
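For reference, a minimal sketch of that reshape with NumPy, assuming Y_train holds integer class labels (0 or 1 for the two-unit softmax output). T.ivector declares a 1-D int32 variable, so Theano rejects a 2-D (N, 1) array, and it will typically also refuse float64 labels for an int32 input. The helper name to_ivector_targets is hypothetical:

```python
import numpy as np

def to_ivector_targets(y):
    """Turn (N,) or (N, 1) class labels into a 1-D int32 array for a T.ivector input."""
    y = np.asarray(y)
    if y.ndim == 2 and y.shape[1] == 1:
        y = y[:, 0]  # drop the singleton column: (N, 1) -> (N,)
    if y.ndim != 1:
        raise ValueError(f"expected (N,) or (N, 1) labels, got shape {y.shape}")
    return y.astype(np.int32)  # ivector means int32
```

Calling Y_train = to_ivector_targets(Y_train) once, before the training loop, is enough; Y_train.ravel() alone fixes the shape if the dtype is already int32.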

Related

Optimize dataframe fill and refill Python Pandas

I have changed the column names and added new columns too.
I have a NumPy array whose values I need to fill into the corresponding DataFrame columns.
Filling the DataFrame with the following code is slow:
import pandas as pd
import numpy as np

df = pd.read_csv("sample.csv")
df = df.tail(1000)
DISPLAY_IN_TRAINING = []
Slice_Middle_Piece_X = slice(None, -1, None)
Slice_Middle_Piece_Y = slice(-1, None)
input_slicer = slice(None, None)
output_slice = slice(None, None)
seq_len = 15  # choose sequence length
n_steps = seq_len - 1
Disp_Data = df

def Generate_DataSet(stock, df_clone, seq_len):
    global DISPLAY_IN_TRAINING
    data_raw = stock.values  # convert to numpy array
    data = []
    len_data_raw = data_raw.shape[0]
    for index in range(0, len_data_raw - seq_len + 1):
        data.append(data_raw[index: index + seq_len])
    data = np.array(data)
    test_set_size = int(np.round(30 / 100 * data.shape[0]))
    train_set_size = data.shape[0] - test_set_size
    x_train, y_train = Get_Data_Chopped(data[:train_set_size])
    print("Training Sliced Successful....!")
    df_train_candle = df_clone[n_steps : train_set_size + n_steps]
    if len(DISPLAY_IN_TRAINING) == 0:
        DISPLAY_IN_TRAINING = list(df_clone)
    df_train_candle.columns = DISPLAY_IN_TRAINING
    return [x_train, y_train, df_train_candle]

def Get_Data_Chopped(data_related_to):
    x_values = []
    y_values = []
    for index, iter_values in enumerate(data_related_to):
        x_values.append(iter_values[Slice_Middle_Piece_X, input_slicer])
        y_values.append([item for sublist in iter_values[Slice_Middle_Piece_Y, output_slice] for item in sublist])
    x_values = np.asarray(x_values)
    y_values = np.asarray(y_values)
    return [x_values, y_values]

x_train, y_train, df_train_candle = Generate_DataSet(df, Disp_Data, seq_len)
df_train_candle.reset_index(drop=True, inplace=True)
df_columns = list(df_train_candle)
df_outputs_name = []
OUTPUT_COLUMN = df.columns
for output_column_name in OUTPUT_COLUMN:
    df_outputs_name.append(output_column_name + "_pred")
    for i in range(len(df_columns)):
        if df_columns[i] == output_column_name:
            df_columns[i] = output_column_name + "_orig"
            break
df_train_candle.columns = df_columns
df_pred_names = pd.DataFrame(columns=df_outputs_name)
df_train_candle = df_train_candle.join(df_pred_names, how="outer")
for row_index, row_value in enumerate(y_train):
    for valueindex, output_label in enumerate(OUTPUT_COLUMN):
        df_train_candle.loc[row_index, output_label + "_orig"] = row_value[valueindex]
        df_train_candle.loc[row_index, output_label + "_pred"] = row_value[valueindex]
print(df_train_candle.head())
The shape of my y_train is (195, 24) and the DataFrame shape is (195, 48). Now I am trying to optimize this and make the process faster. The shape of y_train may change, e.g. to (195, 1) or (195, 5).
So can someone please tell me another (optimized) way of doing the above? I want a general solution that works for any shape without losing data integrity and is also faster.
If the data size increases from 1000 to 2000 rows, the process becomes slow. Please advise how to make it faster.
Sample data: df looks like this, with shape (1000, 8):
A B C D E F G H
64272 195 215 239 272 22 11 33 55
64273 196 216 240 273 22 11 33 55
64274 197 217 241 274 22 11 33 55
64275 198 218 242 275 22 11 33 55
64276 199 219 243 276 22 11 33 55
The output looks like this:
A_orig B_orig C_orig D_orig E_orig F_orig G_orig H_orig A_pred B_pred C_pred D_pred E_pred F_pred G_pred H_pred
0 10 30 54 87 22 11 33 55 10 30 54 87 22 11 33 55
1 11 31 55 88 22 11 33 55 11 31 55 88 22 11 33 55
2 12 32 56 89 22 11 33 55 12 32 56 89 22 11 33 55
3 13 33 57 90 22 11 33 55 13 33 57 90 22 11 33 55
4 14 34 58 91 22 11 33 55 14 34 58 91 22 11 33 55
Please generate a CSV with 1000 or more rows and you will see that the program becomes slower. I want to make it faster. I hope this is clear enough.
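The slow part is most likely the nested loop at the end, which makes two .loc writes per cell, each with noticeable per-call overhead, so the runtime grows with the number of rows times the number of output columns. A minimal sketch of a vectorized alternative, assuming (as the loop does) that row k of y_train belongs to row k of df_train_candle and that the columns of y_train follow the order of OUTPUT_COLUMN. fill_orig_pred is a hypothetical helper name; it writes each column as one whole array, so it works for any number of output columns, including a single one:

```python
import numpy as np
import pandas as pd

def fill_orig_pred(df_candle, y_train, output_columns):
    """Write y_train into <name>_orig and <name>_pred, one whole column at a time.

    Assumes y_train has one row per row of df_candle and one column per
    entry of output_columns, in the same order (as in the loop above).
    """
    y = np.asarray(y_train)
    if y.ndim == 1:  # a single output column may arrive as a 1-D array
        y = y.reshape(-1, 1)
    if y.shape != (len(df_candle), len(output_columns)):
        raise ValueError(f"y_train shape {y.shape} does not match "
                         f"({len(df_candle)}, {len(output_columns)})")
    out = df_candle.reset_index(drop=True).copy()
    # One vectorized assignment per output column instead of one .loc call per cell
    for j, name in enumerate(output_columns):
        out[name + "_orig"] = y[:, j]
    for j, name in enumerate(output_columns):
        out[name + "_pred"] = y[:, j]  # created here if the column does not exist yet
    return out
```

df_train_candle = fill_orig_pred(df_train_candle, y_train, OUTPUT_COLUMN) can replace the final double loop. Because missing _pred columns are created on assignment, the join with the empty df_pred_names becomes optional.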

Slicing data frame for N steps in the future - tensorflow

I'm currently tinkering with Tensorflow in my spare time. I came up with a simple data set:
A B C D E
1 2 5 7 9
2 4 10 14 18
3 6 15 21 27
4 8 20 28 36
5 10 25 35 45
6 12 30 42 54
7 14 35 49 63
8 16 40 56 72
9 18 45 63 81
I am slicing the data as follows:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range=(0,1))
training_set = dataset_train.iloc[:,:].values  # dataset_train: the DataFrame shown above
scaled_training_set = sc.fit_transform(training_set)
time_step=3
X_train = []
Y_train = []
for i in range (time_step,len(training_set)):
    X_train.append(scaled_training_set[i-time_step:i,:])
    Y_train.append(scaled_training_set[i:i+3,3])
X_train = np.array(X_train)
Y_train = np.array(Y_train)
X_train = np.reshape(X_train,(X_train.shape[0],X_train.shape[1],-1))
print(X_train.shape,Y_train.shape)
And my model is as follows:
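from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

model = Sequential()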
model.add(LSTM(units=50,return_sequences=True, input_shape=(X_train.shape[1],X_train.shape[2])))
model.add(Dropout(0.3))
model.add(LSTM(units=50,return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(units=10,return_sequences=True))
model.add(Dense(units=3))
model.compile(optimizer='adam',loss='mean_squared_error')
model.summary()
model.fit(X_train,Y_train,epochs=50,batch_size=time_step)
However, when I attempt this model, I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-120-ec3d3648bdb9> in <module>()
13 model.compile(optimizer='adam',loss='mean_squared_error')
14 model.summary()
---> 15 model.fit(X_train,Y_train,epochs=50,batch_size=time_step)
13 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
96 dtype = dtypes.as_dtype(dtype).as_datatype_enum
97 ctx.ensure_initialized()
---> 98 return ops.EagerTensor(value, ctx.device_name, dtype)
99
100
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
I am assuming this is an issue with either (1) how I am slicing the dataframe or (2) how I am tinkering with the hyperparameters.
Any insight would be appreciated.
Edit:
To address the comment below, I padded Y_train[4] and Y_train[5] so as to make them be arrays with 3 elements. This still throws the above error.
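A likely cause, judging from the slicing code: near the end of the data, scaled_training_set[i:i+3, 3] returns fewer than three values. With the 9-row example, the windows starting at rows 7 and 8 (0-based) have only 2 and 1 values; these are Y_train[4] and Y_train[5], the entries mentioned in the edit. np.array(Y_train) then cannot form a regular 2-D float array: older NumPy versions build an object array, which TensorFlow refuses to convert, while NumPy 1.24 and later raise a ValueError at np.array() instead. A minimal sketch, assuming the goal is to predict the next three values of column D (index 3) from the previous three rows, that stops the loop early so every target window is complete. make_windows is a hypothetical helper name:

```python
import numpy as np

def make_windows(data, time_step=3, horizon=3, target_col=3):
    """Build (X, Y) so that every Y row has exactly `horizon` values."""
    data = np.asarray(data, dtype=float)
    X, Y = [], []
    # The last start index is len(data) - horizon, so no target window is truncated.
    for i in range(time_step, len(data) - horizon + 1):
        X.append(data[i - time_step:i, :])           # previous `time_step` rows, all columns
        Y.append(data[i:i + horizon, target_col])    # next `horizon` values of one column
    return np.array(X), np.array(Y)
```

Calling X_train, Y_train = make_windows(scaled_training_set) gives 4 samples instead of 6 for the 9-row example. Checking that Y_train.dtype is not object after the change confirms the arrays are regular.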

I have tried optimizing model parameters using optim() but I am getting the same parameters. How do I get the optimized parameters?

Here I have modeled an epidemic disease, and I am trying to optimize the parameters so that I can fit the model to the COVID-19 data. The problem is that I am not able to get optimized parameters; instead I get back the same parameters I started with. What should I fix to get optimized parameters for my model? Please help me correct the code.
library(deSolve)
library(reshape2)
library(ggplot2)
library(ggpubr)
library(readxl)  # provides read_excel(), used below
# Model Input
N <- 1380004385
p <- .87 #0.88689
initial_state_values <- c(S = N - 1, E = 0, A = 0, I = 1, T = 0, F = 0, R = 0)
parameters <- c(pie = .6782, mu = 0.0000391, beta =.89768, alpha = 0.24757, phi = 0.08,
h = 1.08, mu_I = 0.06891, mu_T = 0.06891, gamma_I = 0.05090,
gamma_T = 0.07048) #67446.82054)
time <- seq(1, 164, .1)
# Input function: Differential equation
SEAITFR_fn <- function(time, initial_state_values, parameters) {
with(as.list(c(initial_state_values, parameters)), {
N <- S+E+A+I+T+F+R
lamda <- beta/N * (A + I)
dS <- pie - (lamda + mu) * S
dE <- lamda * S - (alpha + mu) * E
dA <- alpha * (1-p) * E - (mu + phi) * A
dI <- alpha * p * E + phi * A -(h + gamma_I + mu_I + mu) * I
dT <- h * I - (gamma_T + mu_T) * T
dF <- mu_I * I + mu_T * T
dR <- gamma_I * I + gamma_T * T - mu * R
return(list(c(dS, dE, dA, dI, dT, dF, dR)))
})
}
output <- as.data.frame(ode(y = initial_state_values,
times = time,
func = SEAITFR_fn,
parms = parameters)
)
#output
output$total_prevalence <- output$I + output$T
#output$total_prevalence
# Distance Function
SEAITFR_SSQ <- function(parameters, data) {
output <- as.data.frame(ode(y = initial_state_values, # vector of initial state
times = time, # vector of times
func = SEAITFR_fn,
parms = parameters)
)
data <- na.omit(data)
deltas_square <- (output$total_prevalence[output$time %in% data$time] -
data$new_cases)^2
SSQ <- sum(deltas_square)
return(SSQ)
}
# Real world data
covid_19_data <- read_excel("covid.xlsx")
covid_19_data
#Optimization
optimised <- optim(par = c(pie = .6782, mu = 0.0000391, beta = 0.88689, alpha = 0.24757, phi = 0.08,
                           h = 1.08, mu_I = 0.06891, mu_T = 0.06891, gamma_I = 0.05090,
                           gamma_T = 0.07048), # starting values for all ten parameters,
                                               # fed first into SEAITFR_SSQ
                   fn = SEAITFR_SSQ,
                   data = covid_19_data # this argument comes under "..."
                                        # "Further arguments to be passed to fn and gr"
)
optimised
optimised_model <- as.data.frame(ode(y = initial_state_values,
times = time, func = SEAITFR_fn, parms = optimised$par))
# Optimised_model
optimised_model$prevalence <- optimised_model$I + optimised_model$T
#optimised_model$prevalence
# Plotting
plot2 <- ggplot() + geom_line(data = optimised_model,
aes(x = time, y = prevalence)) +
geom_point(data = covid_19_data, aes(x = time, y = new_cases), colour = "red" ) +
xlab("Times(days)") + ylab("No. of infection") + labs(title =
paste("Calibration of SIR model with optimised value ")) + ylim(0,30000)
plot2

I am able to get your model to run with different parameters based on the output. Also, I have used maximum likelihood estimation and it is working just fine with the covid data

require(deSolve)
SIR_fn <- function(time, state, parameters) {
with(as.list(c(state, parameters)), {
N <- S+I+R
dS <- -beta*S*I/N
dI <- beta*S*I/N-gamma*I
dR <- gamma*I
return(list(c(dS, dI, dR)))
})
}
SIR_SSQ <- function(parameters, dat) { # parameters must contain beta & gamma
# calculate model output using your SIR function with ode()
result <- as.data.frame(ode(y = initial_state_values # vector of initial state
# values, with named elements
, times = times # vector of times
, func = SIR_fn # your predefined SIR function
, parms = parameters) # the parameters argument
# entered with SIR_SSQ()
)
# SSQ calculation: needs the dat argument (the observed data you are fitting to)
# assumes the data you are fitting to has a column "I"
# select only complete cases, i.e. rows with no NAs, from the dataframe
dat <- na.omit(dat)
# select elements where results$time is in dat$time
deltas2 <- (result$I[result$time %in% dat$time] - dat$I)^2
SSQ <- sum(deltas2)
return(SSQ)
}
flu_dat <- read.csv("flu.csv")
head(flu_dat, 10)
# time I
# 1 1 16928
# 2 2 21389
# 3 3 20006
# 4 4 21368
# 5 5 25349
# 6 6 21405
# 7 7 17662
# 8 8 17492
# 9 9 18361
# 10 10 20951
initial_state_values <- c(S = 100000, I = 1, R = 0)
# choose values to start your optimisation
beta_start <- 0.1
gamma_start <- 0.1
# times - dense timesteps for a more detailed solution
times <- seq(from = 0, to = 107, by = 0.01)
# optim
# you will need to run the cells to assign your functions first
optimised <- optim(
par = c(beta = beta_start
, gamma = gamma_start) # these are the starting beta
# and gamma that will be fed
# first, into SSQ_fn
,
fn = SIR_SSQ
,
dat = flu_dat # this argument comes under "..."
# "Further arguments to be passed to fn and gr"
)
optimised #have a look at the model output
# $par
# beta gamma
# 0.40487383 0.01637813
#
# $value
# [1] 25002879441
#
# $counts
# function gradient
# 81 NA
#
# $convergence
# [1] 0
#
# $message
# NULL
#
#
# optimised$par
# beta gamma
# 0.40487383 0.01637813
opt_mod <- as.data.frame(ode(y = initial_state_values # named vector of initial
# state values
, times = times # vector of times
, func = SIR_fn # your predefined SIR function
, parms = optimised$par))
## plot your optimised model output, with the epidemic data using ggplot ##
require(ggplot2)
opt_plot <- ggplot()
opt_plot <- opt_plot + geom_point(aes(x = time, y = I)
, colour = "red"
,size = 3
, shape = "x"
, data = flu_dat)
opt_plot <- opt_plot + geom_line(aes(x = time, y = I)
, colour = "blue"
, data = opt_mod)
opt_plot
flu_dat
# time I
# 1 1 16928
# 2 2 21389
# 3 3 20006
# 4 4 21368
# 5 5 25349
# 6 6 21405
# 7 7 17662
# 8 8 17492
# 9 9 18361
# 10 10 20951
# 11 11 23042
# 12 12 24914
# 13 13 25268
# 14 14 19019
# 15 15 19456
# 16 16 23693
# 17 17 26499
# 18 18 27867
# 19 19 30981
# 20 20 31917
# 21 21 25968
# 22 22 30676
# 23 23 36208
# 24 24 34426
# 25 25 40356
# 26 26 45353
# 27 27 41359
# 28 28 40409
# 29 29 40057
# 30 30 45980
# 31 31 51488
# 32 32 55588
# 33 33 51841
# 34 34 45549
# 35 35 49580
# 36 36 44162
# 37 37 60755
# 38 38 59740
# 39 39 62567
# 40 40 67859
# 41 41 60038
# 42 42 59085
# 43 43 58793
# 44 44 67427
# 45 45 67635
# 46 46 77102
# 47 47 71672
# 48 48 62470
# 49 49 60716
# 50 50 61477
# 51 51 64239
# 52 52 71915
# 53 53 68511
# 54 54 73208
# 55 55 65375
# 56 56 54864
# 57 57 55947
# 58 58 65785
# 59 59 71832
# 60 60 67599
# 61 61 67911
# 62 62 57839
# 63 63 46269
# 64 64 44567
# 65 65 57283
# 66 66 54433
# 67 67 59481
# 68 68 58322
# 69 69 54449
# 70 70 46369
# 71 71 48733
# 72 72 46910
# 73 73 56811
# 74 74 51742
# 75 75 64764
# 76 76 46790
# 77 77 41181
# 78 78 36484
# 79 79 45034
# 80 80 47393
# 81 81 44087
# 82 82 48253
# 83 83 43653
# 84 84 34510
# 85 85 36432
# 86 86 39979
# 87 87 45520
# 88 88 45120
# 89 89 46929
# 90 90 45414
# 91 91 34787
# 92 92 35138
# 93 93 41677
# 94 94 40951
# 95 95 43781
# 96 96 50129
# 97 97 43192
# 98 98 31511
# 99 99 23545
# 100 100 26845
# 101 101 33895
# 102 102 36066
# 103 103 47439
# 104 104 41003
# 105 105 34305
# 106 106 33842
# 107 107 39385
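A further point about the question's code, separate from the answer above: inside SEAITFR_SSQ, the local output data frame is rebuilt by ode() on every call and never receives a total_prevalence column (that column is only added to the top-level output object). In R, output$total_prevalence on a data frame without that column is NULL, the subtraction then yields numeric(0), and sum() of that is 0, so the objective is 0 for every parameter vector. optim has nothing to minimize in that case, which would explain getting the starting values back; computing output$total_prevalence <- output$I + output$T inside SEAITFR_SSQ, before the deltas, is the likely fix. The least-squares pattern the answer uses (solve the model, compare with the data at the observed times, sum the squares) is sketched below in Python with NumPy only. The forward-Euler SIR solver and the coarse grid search standing in for optim (or scipy.optimize.minimize) are simplifying assumptions, and all function names are hypothetical:

```python
import numpy as np

def sir_infected(beta, gamma, s0, i0, r0, t_max, dt=0.1):
    """Forward-Euler SIR solution; returns I(t) on the grid 0, dt, ..., t_max."""
    n_steps = int(round(t_max / dt))
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r
    infected = np.empty(n_steps + 1)
    infected[0] = i
    for k in range(n_steps):
        new_inf = beta * s * i / n * dt
        new_rec = gamma * i * dt
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        infected[k + 1] = i
    return infected

def ssq(beta, gamma, obs_t, obs_i, init, dt=0.1):
    """Sum of squared differences between model I(t) and observed I at obs_t.

    Everything compared here is recomputed from (beta, gamma), so the value
    actually changes when the parameters change.
    """
    infected = sir_infected(beta, gamma, *init, t_max=obs_t.max(), dt=dt)
    idx = np.round(obs_t / dt).astype(int)  # grid index of each observation time
    return float(np.sum((infected[idx] - obs_i) ** 2))

def grid_fit(obs_t, obs_i, init, betas, gammas, dt=0.1):
    """Return (best_beta, best_gamma, best_ssq) over a parameter grid."""
    best = (None, None, np.inf)
    for b in betas:
        for g in gammas:
            value = ssq(b, g, obs_t, obs_i, init, dt)
            if value < best[2]:
                best = (b, g, value)
    return best
```

Fitting synthetic data generated from known beta and gamma is a quick way to confirm that an objective responds to its parameters: if the true values lie on the grid, grid_fit recovers them with an SSQ of 0.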

Python recursive function failing

I am having a really strange issue.
What I am trying to accomplish is the following: I am training a neural network using pytorch, and I want to restart my training function if the training loss doesn't decrease, so as to re-initialize the neural network with a different set of weights. The training function is presented below:
import numpy as np
import torch
from torch.autograd import Variable
# TwoChannelCNN, SiameseCNN, TraditionalCNN and __plotLoss__ are defined elsewhere in my code

def __train__(dp, i, j, net, restarts, epoch=0):
    if net == '2CH': model = TwoChannelCNN().cuda()
    elif net == 'Siam' : model = SiameseCNN().cuda()
    elif net == 'Trad' : model = TraditionalCNN().cuda()
    ls_fn = torch.nn.MSELoss(reduce=True)
    optim = torch.optim.SGD(model.parameters(), lr=1e-6, momentum=0.9)
    epochs = np.arange(100)
    eloss = []
    for epoch in epochs:
        model.train()
        train_loss = []
        tr_batches = np.array_split(dp.train_set, int(len(dp.train_set)/8))
        for tr_batch in tr_batches:
            if net == '2CH': loaded_batch = dp.__load2CH__(tr_batch)
            elif net == 'Siam': loaded_batch = dp.__loadSiam__(tr_batch)
            elif net == 'Trad' : loaded_batch = dp.__load__(tr_batch, i)
            for x_batch, y_batch in loaded_batch:
                x_var, y_var = Variable(x_batch.cuda()), Variable(y_batch.cuda())
                y_pred = torch.clamp(model(x_var), 0, 1)
                loss = ls_fn(y_pred, y_var)
                train_loss.append(abs(loss.item()))
                optim.zero_grad()
                loss.backward()
                optim.step()
        eloss.append(np.mean(train_loss))
        print(epoch, np.mean(train_loss))
        if epoch == 10 and np.mean(train_loss) > 0.2:
            restarts += 1
            print('Number of restarts for client {} and fold {}: {}'.format(i,j,restarts))
            __train__(dp, i, j, net, restarts, epoch=0)
    __plotLoss__(epochs, eloss, 'train', str(i), str(j))
    torch.save(model.state_dict(), "Output/client_{}_fold_{}.pt".format(i, j))
So the restarting based on if epoch == 10 and np.mean(train_loss) > 0.2: works, but only sometimes, which is beyond my comprehension. Here is an example of the output:
0 0.5000133737921715
1 0.4999906486272812
2 0.464298670232296
3 0.2727506290078163
4 0.2628978116512299
5 0.2588871221542358
6 0.25728522151708605
7 0.25630473804473874
8 0.2556223524808884
9 0.25522999209165576
10 0.25467908215522767
Number of restarts for client 5 and fold 1: 3
0 0.10957609283713009
1 0.02840371729924134
2 0.021477583368030594
3 0.017759160268232682
4 0.015173796122947827
5 0.013349939693290782
6 0.011949078906879265
7 0.010810676779671655
8 0.00987362345259362
9 0.009110640348696108
10 0.008239036202623808
11 0.007680381585537574
12 0.007171026876221333
13 0.006765962297888837
14 0.006428168776848068
15 0.006133011780953467
16 0.005819878347673745
17 0.005572605537395361
18 0.00535818950227004
19 0.005159409143814457
20 0.0049763926251294235
21 0.004738794513338235
22 0.004578812885309958
23 0.004428663117960554
24 0.004282198464788351
25 0.004145324644400691
26 0.004018862769889626
27 0.0039044404603504573
28 0.0037960831121495744
29 0.0036947361258523586
30 0.0035982220717533267
31 0.0035018146670104723
32 0.0034150678806059887
33 0.0033372560733512698
34 0.003261332974241583
35 0.00318166259540763
36 0.003108531899014735
37 0.0030385089141125848
38 0.002977990984523103
39 0.0029195284016142937
40 0.002870084639441188
41 0.0028180573325994373
42 0.0027717544270049643
43 0.002719321814503495
44 0.0026704726860933194
45 0.0026204266263459316
46 0.002570544072460258
47 0.0025225681523167224
48 0.0024814611543610746
49 0.0024358948737413116
50 0.002398673941639636
51 0.0023606415423654587
52 0.002330436484101057
53 0.0022891738560574027
54 0.002260655496376241
55 0.002227568955708719
56 0.002191826719741698
57 0.0021609061182290058
58 0.0021279943092100666
59 0.0020966088490456513
60 0.002066195117003474
61 0.0020381672924407895
62 0.002009863329306995
63 0.001986304977759602
64 0.0019564831849032487
65 0.0019351609173580756
66 0.0019077356409993626
67 0.0018875047204855945
68 0.0018617453310780547
69 0.001839518720600381
70 0.001815563331498197
71 0.0017149778925132932
72 0.0016894878409248121
73 0.0016652211918212743
74 0.0016422999463582074
75 0.0016183732903472788
76 0.0015962369183098418
77 0.0015757764620279887
78 0.0015542267022799728
79 0.0015323152910759318
80 0.0014337954093957706
81 0.001410489170542867
82 0.0013871921329466962
83 0.0013641994057461773
84 0.001345829172682187
85 0.001322142209181493
86 0.00130379223035348
87 0.001282231878045458
88 0.001263879886683956
89 0.001243419097817167
90 0.0012279346547037929
91 0.001206978429649382
92 0.0011871445969959496
93 0.001172510546330841
94 0.0011529557384797045
95 0.0011350733004023273
96 0.001118382818282214
97 0.001103347793609089
98 0.0010848538354748599
99 0.0010698940242660911
11 0.2542190085053444
12 0.2538975296020508
So here you can see that the restarting is correct from the 3rd restart, but then, since the network converges, the training should be complete, but the function restarts AGAIN after the 99th epoch (for an unknown reason), and somehow starts at the 11th epoch, which also makes no sense as I am explicitly specifying epoch = 0 whenever the function starts or restarts. I should also add that, SOMETIMES, the function completes correctly after the epoch 99, when convergence has been achieved, and does not restart.
So my question is, why does this piece of code produce inconsistent results and outcomes? What am I missing here? Thanks in advance for any suggestions.

You restart training by calling __train__ a second time when the condition epoch == 10 and np.mean(train_loss) > 0.2 holds, but you never terminate the first loop.
So, after the second (recursive) training run has converged and returned, the outer loop simply continues at epoch 11.
What you need is a break statement after the inner call to __train__.
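One refinement worth considering: with a plain break, the outer call still falls through to __plotLoss__ and torch.save after the loop, so it saves its own non-converged model under the same file name and overwrites what the successful inner run saved. A return right after the recursive call avoids that. Another option is to drop recursion altogether and loop over fresh runs until one converges. A minimal sketch, where train_one_run is a hypothetical stand-in for the body of __train__ that returns (converged, result) and returns early with converged=False when the epoch-10 check fails; saving would then happen once, by the caller, for the converged model only:

```python
def train_with_restarts(train_one_run, max_restarts=10):
    """Repeat fresh training runs until one converges.

    train_one_run is a hypothetical callable: it builds and trains a new
    model and returns (converged, result).
    """
    restarts = 0
    while True:
        converged, result = train_one_run()
        if converged:
            return result, restarts
        restarts += 1
        print("Number of restarts: {}".format(restarts))
        if restarts > max_restarts:
            raise RuntimeError("no converged run after {} restarts".format(max_restarts))
```

Because each attempt is a separate call that finishes before the next one starts, there is no outer loop left running that could resume at epoch 11.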

What kind of model can I use to forecast this data?

This is my dataset of weekly order counts. I want to predict the orders for the rest of the year. I've tried building an ARIMA model, but it doesn't work.
Is there any other model I can try for such a small dataset? Maybe an HMM, fitting a polynomial curve, or a time-series LSTM?
FW Order
1 6
2 45
3 59
4 60
5 50
6 115
7 23
8 44
9 164
10 8
11 30
12 20
13 0
14 50
15 60
16 0
17 50
18 30
19 115
20 75
21 54
22 29
23 124
24 32
25 28

Here's a plot of your data. Your main problem is that there isn't really enough data for any model to give you meaningful predictions with statistical significance. Your data mostly just looks like white noise around a mean, so you'd represent it with:
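$$x_t = \mu + \varepsilon_t$$
where $\varepsilon_t$ is an error term representing white noise.
There is a hint of mean reversion, so you could try an Ornstein-Uhlenbeck model:
$$dx_t = \theta\,(\mu - x_t)\,dt + \sigma\,dW_t$$
Its Euler discretization with time step $\Delta t$ (one week here) is the regression the code below fits, with $\theta$ corresponding to lambda_ in the code:
$$x_t - x_{t-1} = \theta\,(\mu - x_{t-1})\,\Delta t + \sigma\,\Delta W_t$$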
https://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process
Here it is coded up; the orange line is the prediction. Again, the prediction isn't great, but you probably won't find much better without more data.
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

def least_squares_naive(s, delta=1.0):
    # Regress the one-step change x_t - x_{t-1} on the previous value x_{t-1}
    y = s.diff().iloc[1:]
    x = s.shift(1).iloc[1:]
    res = sm.OLS(y, sm.add_constant(x)).fit()
    b, a = res.params  # intercept, slope
    residual_df = y - (a * x + b)
    se = residual_df.std(ddof=2)
    lambda_ = -a / delta
    mu_ = b / (lambda_ * delta)
    sigma_ = se / (delta ** 0.5)
    return mu_, lambda_, sigma_

orders = [6,45,59,60,50,115,23,44,164,8,30,20,0,50,60,0,50,30,115,75,54,29,124,32,28]
s = pd.Series(orders)
mu_, lambda_, sigma_ = least_squares_naive(s)
dx = -lambda_ * (s - mu_)
pred = (s + dx).shift()  # one-step-ahead prediction
s.plot()
pred.plot()
plt.show()
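The code above gives a one-step-ahead, in-sample fit. To forecast the remaining weeks, one option under the same discretized model is its expected path, which decays geometrically toward the estimated mean: E[x_{t+h} | x_t] = mu + (1 - lambda*delta)^h (x_t - mu). A minimal sketch, using np.polyfit for the same least-squares regression instead of statsmodels and omitting the noise term (so it returns point forecasts only); fit_ou and ou_forecast are hypothetical names:

```python
import numpy as np

def fit_ou(x, delta=1.0):
    """Estimate (mu, lambda) by regressing x_t - x_{t-1} on x_{t-1},
    the same regression as least_squares_naive above."""
    x = np.asarray(x, dtype=float)
    slope, intercept = np.polyfit(x[:-1], np.diff(x), 1)
    lam = -slope / delta
    if lam <= 0:
        raise ValueError("no mean reversion detected (lambda <= 0)")
    mu = intercept / (lam * delta)
    return mu, lam

def ou_forecast(x_last, mu, lam, horizon, delta=1.0):
    """Expected path mu + (1 - lam*delta)**h * (x_last - mu) for h = 1..horizon."""
    h = np.arange(1, horizon + 1)
    return mu + (1.0 - lam * delta) ** h * (x_last - mu)
```

For the orders above, mu, lam = fit_ou(orders) followed by ou_forecast(orders[-1], mu, lam, horizon=27) would cover the rest of a 52-week year. The forecast converges to mu, and does so within a few steps when lam*delta is close to 1, which is what near white-noise data tends to produce.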
