Out of sample forecasting

I have the following code to perform an out-of-sample assessment of a time series model. The idea is to use a recursive and rolling method to calculate the MAPE and the MSPE (stored as MSFE in the code).
The code is as follows:
long = len(y)
n_estimation = 83
real = y[(n_estimation):len(y)]
n_forecasting = long - n_estimation
horizontes = 2
predicc = np.zeros((horizontes,n_forecasting))
MSFE = np.zeros((horizontes, 1))
MAPE = np.zeros((horizontes, 1))
for Periods_ahead in range(horizontes):
    for i in range(0,n_forecasting):
        aux_y = y[0:(n_estimation - Periods_ahead + i)]
        model = SARIMAX(endog = aux_y, order = (1,1,0), seasonal_order = (1,1,0,4))
        model_fit=model.fit(disp=0)
        y_pred = fit.forecast(Periods_ahead + 1)
        predicc[Periods_ahead][i] = y_pred[0][Periods_ahead]
    error = np.array(real) - predicc[Periods_ahead]
    MSFE[Periods_ahead] = np.mean(error**2)
    MAPE[Periods_ahead] = np.mean(np.abs(error/np.array(real))) * 100
df_pred = pd.DataFrame({"V1":predicc[0], "V2":predicc[1]})
print("MSFE",MSFE)
print("MAPE %",MAPE)
I am getting the following error, most likely related to using a newer version of SARIMAX.
ValueError Traceback (most recent call last)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\indexes\range.py:392, in RangeIndex.get_loc(self, key, method, tolerance)
391 try:
--> 392 return self._range.index(new_key)
393 except ValueError as err:
ValueError: 0 is not in range
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
c:\Users\dianaf\OneDrive - Microsoft\Documents\GitHub\big_data_operations\Homework2.ipynb Cell 36 in <cell line: 13>()
17 model_fit=model.fit(disp=0)
18 y_pred = fit.forecast(Periods_ahead + 1)
---> 19 predicc[Periods_ahead][i] = y_pred[0][Periods_ahead]
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\series.py:982, in Series.__getitem__(self, key)
979 return self._values[key]
981 elif key_is_scalar:
--> 982 return self._get_value(key)
984 if is_hashable(key):
985 # Otherwise index.get_value will raise InvalidIndexError
986 try:
987 # For labels that don't resolve as scalars like tuples and frozensets
...
--> 394 raise KeyError(key) from err
395 self._check_indexing_error(key)
396 raise KeyError(key)
KeyError: 0
Any idea how to fix it without downgrading to a previous version of statsmodels?
Thank you!
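A likely cause, sketched under assumptions: with SARIMAX, forecast() returns only the point forecasts, so there is no tuple to unpack with [0]. When y is a pandas Series, the result is itself a Series labelled by the forecast periods, so y_pred[0] looks up the label 0 and fails. The stand-in Series below (its values and its index 83..84 are made up for illustration) reproduces the failure and shows a positional lookup instead. pick_horizon is a hypothetical helper, not part of statsmodels.

```python
import numpy as np
import pandas as pd

# Stand-in for what forecast(2) might return after fitting on 83 observations:
# a Series whose labels continue the training index (values are made up).
y_pred = pd.Series([10.5, 11.2], index=pd.RangeIndex(83, 85))


def pick_horizon(forecast, periods_ahead):
    """Return the forecast at a 0-based horizon by position, not by label."""
    return float(np.asarray(forecast)[periods_ahead])


# Label-based lookup: there is no label 0 in the index 83..84.
try:
    y_pred[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

one_step = pick_horizon(y_pred, 0)  # first forecast step
two_step = pick_horizon(y_pred, 1)  # second forecast step
```

In the loop from the question this would read predicc[Periods_ahead][i] = pick_horizon(model_fit.forecast(Periods_ahead + 1), Periods_ahead), assuming the fitted result is meant to be model_fit rather than fit.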

Related

predict_model() giving error while using 'sod' (Subspace Outlier Detection) model in pycaret.anomaly

I get an error when predicting on the test data:
CODE:
from pycaret.anomaly import *
anom_exp = setup(train, session_id = 125,
                 categorical_features=['date', 'hours', 'weekNumber', 'DayName', 'isWeekday'],
                 numeric_features=['cpu_avg'],
                 ignore_features = ['Timestamp', 'time'])
sod = create_model('sod',fraction = 0.1)
sod_test= predict_model(model = sod, data = test)
ERROR:
KeyError Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py:3629, in Index.get_loc(self, key, method, tolerance)
3628 try:
-> 3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:
File ~/.local/lib/python3.10/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()
File ~/.local/lib/python3.10/site-packages/pandas/_libs/index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[20], line 1
----> 1 sod_test= predict_model(model = sod, data = test)
File ~/.local/lib/python3.10/site-packages/pycaret/anomaly/functional.py:941, in predict_model(model, data)
938 if experiment is None:
939 experiment = _EXPERIMENT_CLASS()
--> 941 return experiment.predict_model(estimator=model, data=data)
File ~/.local/lib/python3.10/site-packages/pycaret/anomaly/oop.py:87, in AnomalyExperiment.predict_model(self, estimator, data, ml_usecase)
48 def predict_model(
49 self, estimator, data: pd.DataFrame, ml_usecase: Optional[MLUsecase] = None
50 ) -> pd.DataFrame:
51 """
52 This function generates anomaly labels on using a trained model.
53
(...)
85
86 """
---> 87 return super().predict_model(estimator, data, ml_usecase)
File ~/.local/lib/python3.10/site-packages/pycaret/internal/pycaret_experiment/unsupervised_experiment.py:1354, in _UnsupervisedExperiment.predict_model(self, estimator, data, ml_usecase)
1351 else:
1352 raise TypeError("Model doesn't support predict parameter.")
-> 1354 pred = estimator.predict(data_transformed)
1355 if ml_usecase == MLUsecase.CLUSTERING:
1356 data_transformed["Cluster"] = [f"Cluster {i}" for i in pred]
File ~/.local/lib/python3.10/site-packages/pyod/models/base.py:165, in BaseDetector.predict(self, X, return_confidence)
144 """Predict if a particular sample is an outlier or not.
145
146 Parameters
(...)
161 Only if return_confidence is set to True.
162 """
164 check_is_fitted(self, ['decision_scores_', 'threshold_', 'labels_'])
--> 165 pred_score = self.decision_function(X)
166 prediction = (pred_score > self.threshold_).astype('int').ravel()
168 if return_confidence:
File ~/.local/lib/python3.10/site-packages/pyod/models/sod.py:157, in SOD.decision_function(self, X)
140 def decision_function(self, X):
141 """Predict raw anomaly score of X using the fitted detector.
142 The anomaly score of an input sample is computed based on different
143 detector algorithms. For consistency, outliers are assigned with
(...)
155 The anomaly score of the input samples.
156 """
--> 157 return self._sod(X)
File ~/.local/lib/python3.10/site-packages/pyod/models/sod.py:187, in SOD._sod(self, X)
185 anomaly_scores = np.zeros(shape=(X.shape[0],))
186 for i in range(X.shape[0]):
--> 187 obs = X[i]
188 ref = X[ref_inds[i,],]
189 means = np.mean(ref, axis=0) # mean of each column
File ~/.local/lib/python3.10/site-packages/pandas/core/frame.py:3505, in DataFrame.__getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
File ~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py:3631, in Index.get_loc(self, key, method, tolerance)
3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:
-> 3631 raise KeyError(key) from err
3632 except TypeError:
3633 # If we have a listlike key, _check_indexing_error will raise
3634 # InvalidIndexError. Otherwise we fall through and re-raise
3635 # the TypeError.
3636 self._check_indexing_error(key)
KeyError: 0
I looked at the source code and I think I know where the problem is, but after my edit the model still did not predict anomalies correctly.
Source code: https://pyod.readthedocs.io/en/latest/_modules/pyod/models/sod.html
There is a decision_function, which is defined as:
def decision_function(self, X):
    return self._sod(X)
What I think the problem is: the DataFrame X should be converted to an array with check_array(X) before it is passed to the _sod function.
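The traceback ends at obs = X[i] with X being a DataFrame, where an integer key is treated as a column label rather than a row position. A minimal sketch of that difference, using a made-up two-column frame (the values are illustrative, and to_numpy() stands in here for the check_array conversion suggested above):

```python
import pandas as pd

# Made-up stand-in for the transformed test data.
X = pd.DataFrame({"cpu_avg": [0.2, 0.9, 0.4], "hours": [1.0, 2.0, 3.0]})

# On a DataFrame, X[0] asks for a column named 0, which does not exist.
try:
    X[0]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# On a NumPy array, the same expression selects the first row.
X_arr = X.to_numpy()
first_row = X_arr[0]
```

This only reproduces the indexing behaviour. Whether converting inside decision_function is enough to get correct anomaly scores is not confirmed here.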

How to use TimeseriesGenerator for GRU.fit()?

I am using TimeseriesGenerator for my problem.
The shapes for my train and test data are:
x_train - (306720, 20)
x_test - (306720,)
y_train - (4321, 20)
y_test - (4321,)
Their dtype is float64, and I no longer need to call to_numpy().
I then use TimeseriesGenerator:
train_data = TimeseriesGenerator(x_train, x_test, length=144, batch_size=100)
test_data = TimeseriesGenerator(y_train, y_test, length=144, batch_size=100)
When I try to run
GRU = keras.models.Sequential([keras.layers.GRU(100), keras.layers.Dense(32, activation= 'relu')])
GRU.compile(loss="mae", optimizer="adam")
resultsGRU = GRU.fit(train_data, test_data, epochs = 5)
I get the following error:
File ~\Anaconda3\lib\site-packages\keras\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
File ~\Anaconda3\lib\site-packages\keras\engine\data_adapter.py:997, in KerasSequenceAdapter.__init__(self, x, y, sample_weights, shuffle, workers, use_multiprocessing, max_queue_size, model, **kwargs)
984 def __init__(
985 self,
986 x,
(...)
994 **kwargs
995 ):
996 if not is_none_or_empty(y):
--> 997 raise ValueError(
998 "`y` argument is not supported when using "
999 "`keras.utils.Sequence` as input."
1000 )
1001 if not is_none_or_empty(sample_weights):
1002 raise ValueError(
1003 "`sample_weight` argument is not supported when using "
1004 "`keras.utils.Sequence` as input."
1005 )
ValueError: `y` argument is not supported when using `keras.utils.Sequence` as input.
I tried
x, y = train_data[0]
print(x.shape, y.shape)
to convert it to float before I use GRU.fit(), but I get this error
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py:3621, in Index.get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
File ~\Anaconda3\lib\site-packages\pandas\_libs\index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()
File ~\Anaconda3\lib\site-packages\pandas\_libs\index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()
File pandas\_libs\hashtable_class_helper.pxi:2131, in pandas._libs.hashtable.Int64HashTable.get_item()
File pandas\_libs\hashtable_class_helper.pxi:2140, in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 4331
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Input In [205], in <cell line: 1>()
----> 1 x, y = train_data[0]
2 print(x.shape, y.shape)
File ~\Anaconda3\lib\site-packages\keras\preprocessing\sequence.py:189, in TimeseriesGenerator.__getitem__(self, index)
177 rows = np.arange(
178 i,
179 min(i + self.batch_size * self.stride, self.end_index + 1),
180 self.stride,
181 )
183 samples = np.array(
184 [
185 self.data[row - self.length : row : self.sampling_rate]
186 for row in rows
187 ]
188 )
--> 189 targets = np.array([self.targets[row] for row in rows])
191 if self.reverse:
192 return samples[:, ::-1, ...], targets
File ~\Anaconda3\lib\site-packages\keras\preprocessing\sequence.py:189, in <listcomp>(.0)
177 rows = np.arange(
178 i,
179 min(i + self.batch_size * self.stride, self.end_index + 1),
180 self.stride,
181 )
183 samples = np.array(
184 [
185 self.data[row - self.length : row : self.sampling_rate]
186 for row in rows
187 ]
188 )
--> 189 targets = np.array([self.targets[row] for row in rows])
191 if self.reverse:
192 return samples[:, ::-1, ...], targets
File ~\Anaconda3\lib\site-packages\pandas\core\series.py:958, in Series.__getitem__(self, key)
955 return self._values[key]
957 elif key_is_scalar:
--> 958 return self._get_value(key)
960 if is_hashable(key):
961 # Otherwise index.get_value will raise InvalidIndexError
962 try:
963 # For labels that don't resolve as scalars like tuples and frozensets
File ~\Anaconda3\lib\site-packages\pandas\core\series.py:1069, in Series._get_value(self, label, takeable)
1066 return self._values[label]
1068 # Similar to Index.get_value, but we do not fall back to positional
-> 1069 loc = self.index.get_loc(label)
1070 return self.index._get_values_for_loc(self, loc, label)
File ~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py:3623, in Index.get_loc(self, key, method, tolerance)
3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
-> 3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
3628 self._check_indexing_error(key)
KeyError: 4331
Can anyone please explain what is wrong?
My whole code worked fine before. I re-ran it to check that everything still works, and now I suddenly have this problem and I don't know how to fix it.
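Two separate things seem to be going on, as far as the tracebacks show. First, a generator already yields (samples, targets) batches, so fit() takes it alone; a second generator would normally go in through the validation_data argument, e.g. GRU.fit(train_data, validation_data=test_data, epochs=5). Second, the KeyError comes from Series.__getitem__, which suggests the targets were passed as a pandas Series and looked up by label instead of by position. The sketch below is not Keras code: make_windows is a hypothetical stand-in that mimics how the generator pairs each window with targets[row], and it converts both inputs to NumPy arrays first so that index labels no longer matter. The data is made up.

```python
import numpy as np
import pandas as pd


def make_windows(data, targets, length):
    """Pair each window data[row-length:row] with targets[row], by position."""
    data = np.asarray(data)        # drop any pandas index labels
    targets = np.asarray(targets)
    rows = range(length, len(data))
    samples = np.array([data[row - length:row] for row in rows])
    labels = np.array([targets[row] for row in rows])
    return samples, labels


# Made-up data whose index starts at 100, as it might after slicing a frame.
features = pd.DataFrame(np.arange(20.0).reshape(10, 2), index=range(100, 110))
targets = pd.Series(np.arange(10.0), index=range(100, 110))

samples, labels = make_windows(features, targets, length=3)
```

Applied to the question, that would mean passing NumPy arrays (for example via to_numpy()) as both arguments of TimeseriesGenerator.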

How to correctly solve a trigonometric equation with sympy?

I have an equation with trigonometric functions, as below:
eq = Eq(cos(theta_3), a_2*a_3*(-a_2**2/2 - a_3**2/2 + b**2/2 + z_4**2/2))
Then I try to solve for θ with sympy, using the code below:
solve([eq, theta_3 < pi ], theta_3)
But it raises an exception. Part of the traceback is as follows:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File E:\conda\lib\site-packages\sympy\polys\polyutils.py:211, in _parallel_dict_from_expr_if_gens(exprs, opt)
209 base, exp = decompose_power_rat(factor)
--> 211 monom[indices[base]] = exp
212 except KeyError:
KeyError: cos(_theta_3)
During handling of the above exception, another exception occurred:
PolynomialError Traceback (most recent call last)
File E:\conda\lib\site-packages\sympy\solvers\inequalities.py:809, in _solve_inequality(ie, s, linear)
808 try:
--> 809 p = Poly(expr, s)
810 if p.degree() == 0:
File E:\conda\lib\site-packages\sympy\polys\polytools.py:182, in Poly.__new__(cls, rep, *gens, **args)
181 else:
--> 182 return cls._from_expr(rep, opt)
File E:\conda\lib\site-packages\sympy\polys\polytools.py:311, in Poly._from_expr(cls, rep, opt)
310 """Construct a polynomial from an expression. """
--> 311 rep, opt = _dict_from_expr(rep, opt)
312 return cls._from_dict(rep, opt)
Why is this exception raised?
How can I correctly solve a trigonometric equation with sympy?
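Judging by the traceback, solve() hands the system to its inequality solver, which tries to build a polynomial in theta_3 and fails on cos(theta_3). One possible workaround, sketched under the assumption that numeric values for the parameters are available: solve the equation on its own, substitute, and apply the constraint afterwards. The parameter values below are made up for illustration.

```python
from sympy import symbols, Eq, cos, pi, solve

theta_3, a_2, a_3, b, z_4 = symbols("theta_3 a_2 a_3 b z_4")
rhs = a_2*a_3*(-a_2**2/2 - a_3**2/2 + b**2/2 + z_4**2/2)
eq = Eq(cos(theta_3), rhs)

# Solve the equation alone; no inequality is passed to solve().
candidates = solve(eq, theta_3)

# Illustrative parameter values (an assumption, not from the question).
values = {a_2: 1, a_3: 1, b: 1, z_4: 1}
numeric = [c.subs(values) for c in candidates]

# Apply the constraint theta_3 < pi once the candidates are numbers.
below_pi = [v for v in numeric if v < pi]
```

With fully symbolic parameters the comparison cannot be decided, so the filter only works after substitution.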

Pandas - TypeError: Cannot perform 'rand_' with a dtyped [bool] array and scalar of type [bool]

I wanted to change the value of a cell based on the value of another cell, and used this code:
dfT.loc[dfT.state == "CANCELLED" & (dfT.Activity != "created"), "Activity"] = "cancelled"
This is an example table:
ID  Activity   state
1   created    CANCELLED
1   completed  CANCELLED
2   created    FINISHED
2   completed  FINISHED
3   created    REJECTED
3   rejected   REJECTED
and I get a TypeError like this:
TypeError Traceback (most recent call last)
~\miniconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_logical_op(x, y, op)
264 # (xint or xbool) and (yint or bool)
--> 265 result = op(x, y)
266 except TypeError:
~\miniconda3\lib\site-packages\pandas\core\ops\roperator.py in rand_(left, right)
51 def rand_(left, right):
---> 52 return operator.and_(right, left)
53
TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
~\miniconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_logical_op(x, y, op)
278 try:
--> 279 result = libops.scalar_binop(x, y, op)
280 except (
pandas\_libs\ops.pyx in pandas._libs.ops.scalar_binop()
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'bool'
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
<ipython-input-6-350c55a06fa7> in <module>
4 # dfT2 = dfT1[dfT1.Activity != 'created']
5 # df.loc[(df.state == "CANCELLED") & (df.Activity != "created"), "Activity"] = "cancelled"
----> 6 dfT.loc[dfT.state == "CANCELLED" & (dfT.Activity != "created"), "Activity"] = "cancelled"
7 dfT
~\miniconda3\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
63 other = item_from_zerodim(other)
64
---> 65 return method(self, other)
66
67 return new_method
~\miniconda3\lib\site-packages\pandas\core\arraylike.py in __rand__(self, other)
61 @unpack_zerodim_and_defer("__rand__")
62 def __rand__(self, other):
---> 63 return self._logical_method(other, roperator.rand_)
64
65 @unpack_zerodim_and_defer("__or__")
~\miniconda3\lib\site-packages\pandas\core\series.py in _logical_method(self, other, op)
4987 rvalues = extract_array(other, extract_numpy=True)
4988
-> 4989 res_values = ops.logical_op(lvalues, rvalues, op)
4990 return self._construct_result(res_values, name=res_name)
4991
~\miniconda3\lib\site-packages\pandas\core\ops\array_ops.py in logical_op(left, right, op)
353 filler = fill_int if is_self_int_dtype and is_other_int_dtype else fill_bool
354
--> 355 res_values = na_logical_op(lvalues, rvalues, op)
356 # error: Cannot call function of unknown type
357 res_values = filler(res_values) # type: ignore[operator]
~\miniconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_logical_op(x, y, op)
286 ) as err:
287 typ = type(y).__name__
--> 288 raise TypeError(
289 f"Cannot perform '{op.__name__}' with a dtyped [{x.dtype}] array "
290 f"and scalar of type [{typ}]"
If anyone understands what my mistake is, please help.
Thanks in advance
-Alde

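You need to wrap each condition in parentheses, because & binds more tightly than ==. Without them, Python evaluates "CANCELLED" & (dfT.Activity != "created") first.
Use:
dfT.loc[(dfT.state == "CANCELLED") & (dfT.Activity != "created"), "Activity"] = "cancelled"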
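To see the effect on the example table from the question, here is a small sketch. The frame is rebuilt by hand, so it is an assumption about what the real data looks like.

```python
import pandas as pd

# Hand-built copy of the example table.
dfT = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "Activity": ["created", "completed", "created",
                 "completed", "created", "rejected"],
    "state": ["CANCELLED", "CANCELLED", "FINISHED",
              "FINISHED", "REJECTED", "REJECTED"],
})

# Each comparison is parenthesised, so & combines two boolean Series.
mask = (dfT.state == "CANCELLED") & (dfT.Activity != "created")
dfT.loc[mask, "Activity"] = "cancelled"

activities = dfT["Activity"].tolist()
```

Only the second row matches both conditions, so only its Activity changes.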

Pyspark: Random forest featureSubsetStrategy not accepting int or float

I'm building a random forest classifier using pyspark. I want to set featureSubsetStrategy to be a number rather than auto, sqrt, etc. The documentation states:
featureSubsetStrategy = Param(parent='undefined', name='featureSubsetStrategy', doc='The number of features to consider for splits at each tree node. Supported options: auto, all, onethird, sqrt, log2, (0.0-1.0], [1-n].')
However, when for example I choose a number such as 0.2, I get the following error:
TypeError: Invalid param value given for param "featureSubsetStrategy". Could not convert <class 'float'> to string type
The same happens if I use featureSubsetStrategy=5. How do I set it to an int or a float?
Example:
# setting target label
label_col = 'veh_pref_Economy'
# random forest parameters
max_depth = 2
subset_strategy = 0.2037
impurity = 'gini'
min_instances_per_node = 41
num_trees = 1
seed = 1246
rf_econ_gen = (RandomForestClassifier()
    .setLabelCol(label_col)
    .setFeaturesCol("features")
    .setMaxDepth(max_depth)
    .setFeatureSubsetStrategy(subset_strategy)
    .setImpurity(impurity)
    .setMinInstancesPerNode(min_instances_per_node)
    .setNumTrees(num_trees)
    .setSeed(seed))
This returns:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py in _set(self, **kwargs)
418 try:
--> 419 value = p.typeConverter(value)
420 except TypeError as e:
~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py in toString(value)
203 else:
--> 204 raise TypeError("Could not convert %s to string type" % type(value))
205
TypeError: Could not convert <class 'float'> to string type
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-28-71b9c2a0f1a0> in <module>()
3 .setFeaturesCol("features")
4 .setMaxDepth(max_depth)
----> 5 .setFeatureSubsetStrategy(subset_strategy)
6 .setImpurity(impurity)
7 .setMinInstancesPerNode(min_instances_per_node)
~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/regression.py in setFeatureSubsetStrategy(self, value)
632 Sets the value of :py:attr:`featureSubsetStrategy`.
633 """
--> 634 return self._set(featureSubsetStrategy=value)
635
636 @since("1.4.0")
~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py in _set(self, **kwargs)
419 value = p.typeConverter(value)
420 except TypeError as e:
--> 421 raise TypeError('Invalid param value given for param "%s". %s' % (p.name, e))
422 self._paramMap[p] = value
423 return self
TypeError: Invalid param value given for param "featureSubsetStrategy". Could not convert <class 'float'> to string type

The param is typed as a string, so try passing the number as a string:
subset_strategy = "0.2037"
rf_econ_gen = (RandomForestClassifier()
    .setFeatureSubsetStrategy(subset_strategy))
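If the value comes from a numeric hyperparameter search, a small conversion step may be convenient. The function below is a hypothetical helper, not part of pyspark. It only turns a number into the string form and checks it against the ranges quoted in the param documentation above, (0.0-1.0] for fractions and 1 or more for counts.

```python
def as_subset_strategy(value):
    """Hypothetical helper: return featureSubsetStrategy in its string form."""
    if isinstance(value, str):
        return value  # named options such as "sqrt" pass through unchanged
    if isinstance(value, bool):
        raise TypeError("bool is not a valid featureSubsetStrategy")
    if isinstance(value, int):
        if value < 1:
            raise ValueError("an integer strategy must be >= 1")
        return str(value)
    if isinstance(value, float):
        if not 0.0 < value <= 1.0:
            raise ValueError("a fractional strategy must be in (0.0, 1.0]")
        return repr(value)
    raise TypeError("unsupported type: %s" % type(value).__name__)


fraction = as_subset_strategy(0.2037)
count = as_subset_strategy(5)
named = as_subset_strategy("sqrt")
```

It would be used as .setFeatureSubsetStrategy(as_subset_strategy(subset_strategy)) in the builder chain from the question.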
