Model for predicting temperature data of fridge - python

I set up a sensor that measures temperature every 3 seconds. I collected data for 3 days and have 60,000 rows in my CSV export. Now I would like to forecast the next few days. Looking at the data, you can already see a "seasonality" that reflects the fridge's heating and cooling cycle, so I guess it shouldn't be too difficult to predict. I am not really sure if my data is too granular and whether I should do some kind of downsampling. I thought about using a seasonal ARIMA model, but I am having difficulties picking parameters. Since the seasonality in the data is pretty obvious, is there maybe a model that fits better? Please bear with me, I'm pretty new to machine learning.

When the goal is to forecast rising temperatures, you can forecast the lower and upper peaks, i.e., their height and the distances between them. Assuming (as a simplified model) that the temperature change in between is linear, we can model each complete peak starting from a first lower peak of the temperature curve, up to the next upper peak, and down to the next lower peak. A complete peak can then be seen as a triangle which is easy to integrate (calculate its area plus the area of the rectangle below it). The estimation can now be done by integrating a number of complete peaks we have already measured. By repeating this procedure, we can then do a linear regression on the average temperatures and alert when the slope is above a defined threshold.
As this only tackles a certain kind of error, one can do the same for the average distances between the upper peaks, and also for the lower peaks. I.e., take the times between them for a certain period, fit a curve (a linear regression may well be sufficient), and alert when the slope of the curve indicates that the distances are getting too long.
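A minimal sketch of this idea, assuming the readings sit in a CSV with hypothetical "timestamp" and "temp" columns; scipy's find_peaks stands in for the peak detection, and trapezoidal integration approximates the triangle-plus-rectangle area:

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

df = pd.read_csv("fridge.csv", parse_dates=["timestamp"])  # hypothetical file/columns
temp = df["temp"].to_numpy()
t = df["timestamp"].astype("int64").to_numpy() / 1e9       # timestamps in seconds

# Lower peaks are peaks of the negated signal; tune `distance` to roughly one cycle.
lower, _ = find_peaks(-temp, distance=100)

# Average temperature of each complete cycle (lower peak to next lower peak)
# via trapezoidal integration, matching the triangle-plus-rectangle idea above.
cycle_means, cycle_times = [], []
for a, b in zip(lower[:-1], lower[1:]):
    area = np.trapz(temp[a:b + 1], t[a:b + 1])
    cycle_means.append(area / (t[b] - t[a]))
    cycle_times.append(0.5 * (t[a] + t[b]))

# Linear regression on the cycle averages; alert when the slope exceeds a threshold.
slope, intercept = np.polyfit(cycle_times, cycle_means, 1)
ALERT_SLOPE = 1e-5  # degrees per second, an arbitrary placeholder
if slope > ALERT_SLOPE:
    print(f"Warning: average temperature rising at {slope:.2e} deg/s")
```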

It's mission impossible. If the fridge works without interference, the graph always looks the same. A change can be caused, for example, by opening a door, a breakdown, or a major change in external conditions, but you cannot predict such events. Instead, you can try to warn about the possibility of problems in the near future, for example based on a constant increase in average temperature. This situation may indicate a leak in the cooling system.
By the way, do you really need to log the temperature every 3 seconds? This is usually unjustified, because it is physically impossible for the temperature to change by a measurable amount in such a short interval. Our team usually sets the logging interval to 30 or 60 seconds in such cases, sometimes even more, depending on the size of the chamber, the way the air is circulated, the ratio of volume to power of the refrigeration unit, etc.
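If you do want to thin the data before modelling, resampling with pandas is a one-liner; a small sketch, again assuming the hypothetical "timestamp"/"temp" columns:

```python
import pandas as pd

# Downsample the 3-second readings to 30-second averages (or use "60s").
df = pd.read_csv("fridge.csv", parse_dates=["timestamp"]).set_index("timestamp")
temp_30s = df["temp"].resample("30s").mean()
print(temp_30s.head())
```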

Related

What's the best way to calculate strength of fit of these curves?

The green line is a fit to the red data points. What is the best way to calculate the strength of fit for something like this?
There are lots of raw points from 0-100; as the x-axis gets larger, the number of data points tends to decrease and the residuals tend to get worse.
The number of red data points is always variable, and it stops at various times on the x-axis, but the fit always goes back to 0 on the y-axis.
I'm trying to get a sense for how good the fit is for one example vs another.
I figured the average error could be good, but there are far more data points from 0-100, so they will heavily influence the average. Also, the error could be low but the data could stop very early, which would go uncaptured in that scenario.
In my grad school days, we used the Kolmogorov-Smirnov goodness-of-fit test. Essentially, you compare the cumulative distribution of your data to the cumulative distribution of your fit function evaluated at the same number of points. Enjoy the rabbit hole.
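A hedged sketch of that comparison with scipy, using made-up stand-ins for the red data points and for the green fit evaluated at the same x positions:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
x = np.linspace(0, 200, 300)
y_fit = np.exp(-x / 80.0)                     # stand-in for the fitted curve
y_data = y_fit + rng.normal(0, 0.05, x.size)  # stand-in for the measured points

# Two-sample KS test: compares the empirical CDFs of the data and the fit.
stat, p_value = ks_2samp(y_data, y_fit)
print(f"KS statistic: {stat:.3f}, p-value: {p_value:.3f}")
# A smaller KS statistic means the two distributions (data vs. fit) are more alike.
```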

Confidence interval of summation Facebook Prophet models

I have one FB Prophet demand forecast time-series for each of three consumer classes. I want to add these three models and get a confidence interval for the total demand---the stakeholders may look at everything and choose different assumptions for each consumer class. How should I approach that?
I have tried:
using the usual square root of the sum of the variances plus two times the covariances, for each point in time. That returned a much wider uncertainty interval than makes sense (the sum of the historical series falls entirely well within two standard deviations), maybe because of trend uncertainty. Also, the distribution around each point in time of the series is not normal, so just taking the standard deviation won't do.
adding sample forecasts from each model and using them to estimate the confidence intervals. But then I remembered that the samples wouldn't be correlated.
Any other ideas? Is there any way for me to correlate samples from the three models?
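For reference, a minimal sketch of the second idea (summing predictive samples), assuming three already-fitted Prophet models m1, m2, m3 and a shared future dataframe (all hypothetical names). It treats the classes as independent, which is exactly the caveat above: if the demands are correlated, this will misstate the width of the interval.

```python
import numpy as np

# m1, m2, m3: fitted Prophet models; `future` from make_future_dataframe.
samples = [m.predictive_samples(future)["yhat"] for m in (m1, m2, m3)]
total = sum(samples)                      # element-wise sum, shape (n_dates, n_samples)

point = total.mean(axis=1)
lower = np.percentile(total, 10, axis=1)  # 80% interval, Prophet's default width
upper = np.percentile(total, 90, axis=1)
```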

How can I find out if a satellite has performed maneuver?

I have recently become acquainted with orbital mechanics and am trying to do some analysis on the subject. Since I don't have subject matter expertise, I was at a crossroads trying to decide how one would determine whether a satellite has performed a maneuver/rendezvous operation, given the historical TLE data of that satellite from which we extract the orbital elements. To drill down further, I am approaching the problem like this:
I take my satellite of interest and collect the historical TLE data for it.
Once I have the data, I extract and calculate all the orbital parameters from the TLE.
From the list of orbital parameters, I choose a subset of those parameters and calculate long-term standardized anomalies for each of them.
Once I have the anomalies, I filter out those dates where any one parameter has an anomaly greater than 1.5 or less than -1.5.
But the deal is, I am not too sure of my subset. As of now, I have Inclination, RAAN, Argument of Perigee and Longitude.
Is there any other factor that I should add or remove from this subset in order to nail this analysis the right way? Or is there altogether any other approach that I can use?
What I'm interested in, is to find out the days when a satellite has performed maneuvers.
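For concreteness, a minimal sketch of the anomaly-filtering step with synthetic data; the DataFrame, column names, and injected jump are made up for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2022-01-01", periods=365, freq="D")
elements = pd.DataFrame(
    {
        "inclination": 51.6 + rng.normal(0, 0.01, 365),
        "raan": 120.0 + rng.normal(0, 0.05, 365),
        "arg_perigee": 90.0 + rng.normal(0, 0.5, 365),
    },
    index=dates,
)
# Inject a fake jump in the last 30 days to stand in for a maneuver.
elements.loc[elements.index[-30]:, "raan"] += 1.0

# Long-term standardized anomalies: (value - mean) / std, per parameter.
z = (elements - elements.mean()) / elements.std()

# Keep the dates where any parameter's anomaly is above 1.5 or below -1.5.
suspect_dates = z.index[(z.abs() > 1.5).any(axis=1)]
print(suspect_dates)
```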
You should add the semi-major and semi-minor axis sizes (i.e., the minimum and maximum altitudes). Those change after any burn along the trajectory or perpendicular to it, and they decrease from friction for orbits that are too low.
Analyzing them can hint at what kind of maneuver was performed. Also, changing them is usually done on the opposite side of the orbit, so once you find a bump in the periapsis or apoapsis, the burn most likely occurred half an orbit before reaching it.
Another thing I would check is speed. Compute the local speed as the derivative of consecutive data points (distance/time) and compare it with what Kepler's equation predicts. If they do not match, it means some kind of burn, collision, or ejection occurred, and from the difference you can also infer what was done.
For more info see:
solving Kepler's equation
Is it possible to make realistic n-body solar system simulation in matter of size and mass?
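A hedged sketch of that speed check, using the vis-viva relation v = sqrt(mu(2/r - 1/a)) as the predicted speed. The state values below are made up; in practice they would come from propagating the TLEs (for example with the sgp4 package).

```python
import numpy as np

MU_EARTH = 3.986004418e14  # m^3/s^2

def vis_viva_speed(r, a):
    """Speed predicted by the vis-viva equation at radius r for semi-major axis a."""
    return np.sqrt(MU_EARTH * (2.0 / r - 1.0 / a))

# Hypothetical observation: orbital radius, semi-major axis, and measured speed.
r = 6_778_000.0          # m (about 400 km altitude)
a = 6_778_000.0          # m (near-circular orbit)
v_observed = 7_700.0     # m/s, e.g. finite difference of consecutive positions

v_expected = vis_viva_speed(r, a)
if abs(v_observed - v_expected) > 20.0:   # threshold in m/s, chosen arbitrarily
    print(f"Possible maneuver: observed {v_observed:.1f} m/s vs expected {v_expected:.1f} m/s")
```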

How can I statistically compare a lightcurve data set with the simulated lightcurve?

In Python, I want to compare a simulated light curve with the real light curve. It should be mentioned that the measured data contain gaps and outliers, and the time steps are not constant. The model, however, has constant time steps.
In a first step I would like to compare with a statistical method how similar the two light curves are. Which method is best suited for this?
In a second step I would like to fit the model to my measurement data. However, the model data is not calculated in Python but in an independent piece of software. Basically, the model data depends on four parameters, all of which are limited to a certain range, which I am currently feeding manually to the software (automation is planned).
What is the best method to create a suitable fit?
A "Brute-Force-Fit" is currently an option that comes to my mind.
This link "https://imgur.com/a/zZ5xoqB" provides three different plots: the simulated light curve, the actual measurement, and lastly both together. The simulation is not good, but by playing with the parameters one can get an acceptable result, which means the phase and period are the same, the magnitude is of the same order, and even the specular flashes should occur at the same period.
If I understand this correctly, you're asking a more foundational question that could be better answered in https://datascience.stackexchange.com/, rather than something specific to Python.
That said, speaking as a data science layperson, this may be a problem suited to gradient descent with a mean-squared-error cost function. You initialize the parameters of the curve (possibly randomly), then calculate the squared error at your known points.
Then you make tiny changes to each parameter in turn, and calculate how the cost function is affected. Then you change all the parameters (by a tiny amount) in the direction that decreases the cost function. Repeat this until the parameters stop changing.
(Note that this might trap you in a local minimum and not work.)
More information: https://towardsdatascience.com/implement-gradient-descent-in-python-9b93ed7108d1
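A minimal sketch of that finite-difference gradient descent, with a stand-in sine model playing the role of the external simulation software (the real setup would call that software instead of simulate; learning rate and iteration count are arbitrary placeholders):

```python
import numpy as np

def simulate(params, t):
    # Stand-in model: a sine wave with amplitude, period, phase, and offset.
    amp, period, phase, offset = params
    return amp * np.sin(2 * np.pi * t / period + phase) + offset

def mse(params, t, y_obs):
    return np.mean((simulate(params, t) - y_obs) ** 2)

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 200))                     # irregular time steps
y_obs = simulate([1.0, 2.5, 0.3, 5.0], t) + rng.normal(0, 0.05, t.size)

params = np.array([0.9, 2.4, 0.2, 4.8])                  # initial guess
lr, eps = 1e-3, 1e-4
for _ in range(20000):
    grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        # Central finite difference of the cost with respect to parameter i.
        grad[i] = (mse(params + step, t, y_obs) - mse(params - step, t, y_obs)) / (2 * eps)
    params -= lr * grad
print(params)  # should move toward [1.0, 2.5, 0.3, 5.0], barring local minima
```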
Edit: I overlooked this part
The simulation is not good, but by playing with the parameters one can get an acceptable result, which means the phase and period are the same, the magnitude is of the same order, and even the specular flashes should occur at the same period.
Is the simulated curve just a sum of sine waves, and are the parameters just phase/period/amplitude of each? In this case what you're looking for is the Fourier transform of your signal, which is very easy to calculate with numpy: https://docs.scipy.org/doc/scipy/reference/tutorial/fftpack.html
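A short sketch of reading off the dominant period from an FFT with numpy, assuming the signal has been resampled to an even time grid (the real measurements would need their gaps interpolated first):

```python
import numpy as np

dt = 0.01                                    # sampling interval
t = np.arange(0, 20, dt)
signal = 2.0 * np.sin(2 * np.pi * t / 2.5)   # hypothetical light curve, period 2.5

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=dt)

peak = np.argmax(np.abs(spectrum[1:])) + 1   # skip the zero-frequency bin
print(f"Dominant period: {1.0 / freqs[peak]:.2f}")
```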

Backtracking with Particle Filter

I have just implemented a particle filter for indoor tracking. It looks good, but at some points the particles go into a room and get trapped there.
What's a smart way to do backtracking? I save the state of the particles for their last 10 movements.
Thank you
It is completely normal that particles get distributed everywhere; otherwise it would not be a probabilistic approach. In addition, note that the particles are sampled based on the posterior probability at time t-1 and the current motion distribution. However, even though it is not generally recommended in filtering, you can restrict your search space in the sampling step.
For backtracking, you can apply at each time t the same approach as for forward tracking, only changing the sign of the velocity (on all axes). You can start from the state that maximizes the probability distribution. Finally, you compare the obtained trajectories (the results of forward and backward tracking) and decide, based on the result, what further filtering is needed to get the best result.
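A hedged sketch of that backward pass: rerun the same predict/weight/resample loop with the velocity sign flipped, starting from a saved particle set. Everything here (the state layout, map_allows, the noise levels) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def map_allows(positions):
    # Stand-in for the indoor map: reject particles outside a 10 x 10 m area.
    return np.all((positions >= 0) & (positions <= 10), axis=1)

def backtrack(particles, weights, n_steps, dt=1.0):
    """particles: (N, 4) array of [x, y, vx, vy]; run the filter backwards."""
    particles = particles.copy()
    particles[:, 2:] *= -1.0                         # flip velocity for the backward pass
    trajectory = []
    for _ in range(n_steps):
        # Predict: move with the (reversed) velocity plus process noise.
        particles[:, :2] += particles[:, 2:] * dt + rng.normal(0, 0.1, (len(particles), 2))
        # Weight: zero out particles the map forbids (e.g. inside walls).
        weights = weights * map_allows(particles[:, :2])
        if weights.sum() == 0:
            weights = np.ones(len(particles))
        weights = weights / weights.sum()
        # Resample and reset to uniform weights.
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles, weights = particles[idx], np.ones(len(particles)) / len(particles)
        trajectory.append(particles[:, :2].mean(axis=0))
    return np.array(trajectory)

start = np.column_stack([rng.uniform(4, 6, (500, 2)), rng.normal(0, 0.5, (500, 2))])
print(backtrack(start, np.ones(500) / 500, n_steps=10))
```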
