I set up a sensor that measures temperature every 3 seconds. I collected data for 3 days and have 60,000 rows in my CSV export. Now I would like to forecast the next few days. Looking at the data, you can already see a "seasonality" that reflects the fridge's heating and cooling cycle, so I guess it shouldn't be too difficult to predict. I am not sure whether my data is too granular and whether I should do some kind of undersampling. I thought about using a seasonal ARIMA model, but I am having difficulty picking parameters. As the seasonality in the data is pretty obvious, is there maybe a model that fits better? Please bear with me, I'm pretty new to machine learning.
When the goal is to forecast rising temperatures, you can forecast the lower and upper peaks, i.e., their heights and the distances between them. Assuming (as a simplified model) that the temperature changes linearly in between, we can model each complete peak as running from a lower peak of the temperature curve up to the next upper peak and down to the next lower peak. A complete peak can then be seen as a triangle, which we can easily integrate (its area plus the area of the rectangle below it); dividing that area by the peak's duration gives its average temperature. The estimate is then obtained by integrating a number of complete peaks we have already measured. By repeating this procedure, we can run a linear regression on the average temperatures and alert when the slope is above a defined threshold.
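A minimal Python sketch of the triangle idea, assuming the readings are already smoothed so that every local minimum and maximum is a real lower or upper peak (raw sensor noise would need filtering first). The names `turning_points`, `cycle_averages`, `slope` and `warming_alert` are hypothetical. Each complete peak's area is computed with the trapezoid rule over its three corner points, which is exactly the area under the two straight lines (the triangle plus the area below it).

```python
def turning_points(temps):
    """Indices of local minima (lower peaks) and maxima (upper peaks).

    Assumes a smooth, already-filtered signal; raw noisy data would
    produce many spurious turning points.
    """
    lows, highs = [], []
    for i in range(1, len(temps) - 1):
        if temps[i - 1] > temps[i] <= temps[i + 1]:
            lows.append(i)
        elif temps[i - 1] < temps[i] >= temps[i + 1]:
            highs.append(i)
    return lows, highs


def cycle_averages(times, temps):
    """(mid time, average temperature) for each complete low -> high -> low peak."""
    lows, highs = turning_points(temps)
    result = []
    for lo_a, lo_b in zip(lows, lows[1:]):
        inner = [h for h in highs if lo_a < h < lo_b]
        if not inner:
            continue
        hi = max(inner, key=lambda h: temps[h])  # the upper peak of this cycle
        t0, t1, t2 = times[lo_a], times[hi], times[lo_b]
        y0, y1, y2 = temps[lo_a], temps[hi], temps[lo_b]
        # Area under the straight lines low -> high -> low (trapezoid rule).
        area = (t1 - t0) * (y0 + y1) / 2 + (t2 - t1) * (y1 + y2) / 2
        result.append(((t0 + t2) / 2, area / (t2 - t0)))
    return result


def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def warming_alert(times, temps, max_slope):
    """True if the per-cycle average temperature rises faster than max_slope."""
    cycles = cycle_averages(times, temps)
    if len(cycles) < 2:
        return False
    xs, ys = zip(*cycles)
    return slope(xs, ys) > max_slope
```

If `times` is in seconds, `max_slope` is in degrees per second, so a threshold of one degree per day would be `1 / 86400`.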
As this only catches certain kinds of errors, one can do the same for the average distances between the upper peaks, and also for the lower peaks. That is, take the times between them over a certain period, fit a curve (linear regression may be sufficient) and alert when the slope of the curve indicates that the distances are getting too long.
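The peak-distance check can be sketched in the same way. This assumes the times of the upper (or lower) peaks have already been extracted from the smoothed signal; `interval_slope` and `cycles_getting_longer` are hypothetical names, and `max_slope` is a threshold you would have to tune.

```python
def interval_slope(peak_times):
    """Least-squares slope of the gap between consecutive peaks, against time."""
    gaps = [b - a for a, b in zip(peak_times, peak_times[1:])]
    xs = peak_times[1:]  # each gap is stamped with the time of the later peak
    mx, my = sum(xs) / len(xs), sum(gaps) / len(gaps)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (g - my) for x, g in zip(xs, gaps))
    return sxy / sxx


def cycles_getting_longer(peak_times, max_slope):
    """Alert when the distances between peaks grow faster than max_slope."""
    # At least 3 peaks are needed to get 2 gaps and a defined slope.
    return len(peak_times) >= 3 and interval_slope(peak_times) > max_slope
```

A compressor that has to run longer and longer to reach the lower temperature would show up here as a positive slope.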
It's mission impossible. If the fridge works without interference, the graph always looks the same. Changes can be caused, for example, by opening the door, a breakdown, or a major change in external conditions, and you cannot predict such events. Instead, you can try to warn about possible problems in the near future, for example based on a steady increase in the average temperature. Such a situation may indicate a leak in the cooling system.
By the way, have you considered whether you really need to log the temperature every 3 seconds? This is usually unjustified, because the temperature physically cannot change by a measurable amount in such a short interval. Our team usually sets the logging interval to 30 or 60 seconds in such cases, sometimes even more, depending on the size of the chamber, the way the air is circulated, the ratio of volume to refrigeration power, etc.
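Existing 3-second data can be downsampled after the fact by averaging it into fixed bins. A minimal standard-library sketch, assuming the CSV rows have been read as `(timestamp_in_seconds, temperature)` pairs; `downsample` is a hypothetical name:

```python
def downsample(samples, bin_seconds=30):
    """Average (timestamp_seconds, temperature) samples into fixed-width time bins.

    Returns a sorted list of (bin_start_seconds, mean_temperature).
    """
    bins = {}
    for t, temp in samples:
        start = int(t // bin_seconds) * bin_seconds
        total, count = bins.get(start, (0.0, 0))
        bins[start] = (total + temp, count + 1)
    return [(start, total / count) for start, (total, count) in sorted(bins.items())]
```

Going from 3 s to 30 s bins reduces the row count tenfold and averages out some sensor noise. If the data is loaded into pandas with a `DatetimeIndex`, `df.resample("30s").mean()` does the same thing.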
What is the best way to check for collisions among a huge number of circles?
It's very easy to detect a collision between two circles, but checking every combination is O(n^2), which is definitely not optimal.
We can assume that a circle object has the following properties:
Coordinates
Radius
Velocity
Direction
Velocity is constant, but direction can change.
I've come up with two solutions, but maybe there are better ones.
Solution 1
Divide the whole space into overlapping squares and check for collisions only with circles in the same square. The squares need to overlap so that there is no problem when a circle moves from one square to another.
Solution 2
At the beginning, the distance between every pair of circles needs to be calculated.
If the distance is small, the pair is stored in a list, and we check it for a collision on every update.
If the distance is big, we store after which update a collision becomes possible (this can be calculated because we know the distance and the velocities). This needs to be stored in some kind of priority queue. After the previously calculated number of updates, the distance needs to be checked again and we repeat the procedure: put the pair on the list or back into the priority queue.
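Solution 2 can be sketched with Python's `heapq` as the priority queue. This is an illustration under assumptions, not the asker's implementation: `Circle`, `gap` and `PairScheduler` are hypothetical names, only each circle's constant speed is used (direction may change, so the worst case is two circles heading straight at each other), and the caller moves the circles before calling `check` on each update.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Circle:
    x: float
    y: float
    radius: float
    speed: float  # constant speed; the direction may change


def gap(a, b):
    """Distance between the two circle edges (negative when they overlap)."""
    return math.hypot(a.x - b.x, a.y - b.y) - a.radius - b.radius


class PairScheduler:
    """Close pairs are checked every update; distant pairs wait in a priority
    queue until the earliest update at which they could touch."""

    def __init__(self, circles, dt):
        self.circles = circles
        self.dt = dt
        self.close = set()  # pairs checked on every update
        self.queue = []     # (update number of next check, i, j)
        for i in range(len(circles)):
            for j in range(i + 1, len(circles)):
                self._schedule(i, j, 0)

    def _schedule(self, i, j, now):
        a, b = self.circles[i], self.circles[j]
        g = gap(a, b)
        # Upper bound on how much the gap can shrink during one update.
        max_approach = (a.speed + b.speed) * self.dt
        if g <= max_approach:
            self.close.add((i, j))
        elif max_approach > 0:
            # The gap cannot reach zero before this many updates have passed.
            wait = int(g // max_approach)
            heapq.heappush(self.queue, (now + wait, i, j))
        # else: neither circle moves, so the pair can never collide

    def check(self, now):
        """Call after moving the circles for update `now`; returns colliding pairs."""
        due = list(self.close)
        self.close.clear()
        while self.queue and self.queue[0][0] <= now:
            _, i, j = heapq.heappop(self.queue)
            due.append((i, j))
        hits = [(i, j) for i, j in due if gap(self.circles[i], self.circles[j]) <= 0]
        for i, j in due:
            self._schedule(i, j, now)  # back onto the list or into the queue
        return hits
```

The initial scheduling is still O(n^2), but afterwards each update only touches the close pairs and the pairs whose scheduled update has arrived.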
Answers to Mark Byers' questions
Is it for a game?
It's for a simulation, but it can also be treated as a game.
Do you want to recalculate the new position every n milliseconds, and also check for collisions at this time?
Yes, the time between updates is constant.
Do you want to find the time at which the first/every collision occurs?
No, I want to find every collision and do 'something' when it occurs.
How important is accuracy?
It depends on what you mean by accuracy. I need to detect all collisions.
Is it a big problem if very small fast moving circles can pass through each other occasionally?
It can be assumed that the speeds are small enough that this won't happen.
There are "spatial index" data structures for storing your circles for quick comparison later; quadtrees, R-trees and k-d trees are examples.
Solution 1 seems to be a spatial index, and solution 2 would benefit from a spatial index every time you recalculate your pairs.
To complicate matters, your objects are moving - they have velocity.
It is normal to use spatial indexes for objects in games and simulations, but mostly for stationary objects, and typically objects that don't react to a collision by moving.
In games and similar applications it is normal to compute everything at set (discrete) time intervals, so two objects may pass through each other without you noticing because they moved so fast. Many games don't even evaluate collisions in strict chronological order. They have a spatial index for stationary objects, e.g. walls, and lists of all the moving objects that they check exhaustively (although with relaxed discrete checks, as I outlined).
Accurate continuous collision detection, where objects react to collisions, is usually much more demanding in simulations.
The pairs approach you outlined sounds promising. You might keep the pairs sorted by next collision, and reinsert them at their appropriate new positions after they have collided. You only have to sort the newly generated collision list for the two objects (O(n lg n)) and then merge two lists (the new collisions for each object and the existing list of collisions, inserting the new collisions and removing the stale collisions that listed the two objects that collided), which is O(n).
Another solution is to adapt your spatial index to store each object not strictly in one sector but in every sector it has passed through since the last calculation, and do things discretely. This means storing fast-moving objects in your spatial structure, and you'd need to optimise it for that case.
Remember that linked lists or lists of pointers are very bad for caching on modern processors. I'd advocate that you store copies of your circles - their important properties for collision detection at any rate - in an array (sequential memory) in each sector of any spatial index, or in the pairs you outlined above.
As Mark says in the comments, it could be quite simple to parallelise the calculations.
I assume you are doing a simple hard-sphere molecular dynamics simulation, right? I came across the same problem many times in Monte Carlo and molecular dynamics simulations. Both of your solutions are very often mentioned in the literature on simulations. Personally I prefer solution 1, slightly modified.
Solution 1
Divide your space into rectangular cells that don't overlap. When you check one circle for collisions, you look at all circles in the cell that contains it, plus X cells in each direction around it. I've tried many values of X and found that X=1 is the fastest. So the number of cells along each direction and the cell size should be:
Divisor = floor(SimulationBoxSize / MaximumCircleDiameter);
CellSize = SimulationBoxSize / Divisor;
Divisor should be bigger than 3, otherwise it will cause errors (if it is too small, you should enlarge your simulation box).
Then your algorithm will look like this:
1. Put all circles inside the box.
2. Create the cell structure and store indexes or pointers to the circles in each cell (in an array or a list).
3. Make a step in time (move everything) and update the circles' positions in the cells.
4. Look around every circle for collisions. You should check one cell around in every direction.
5. If there is a collision, do something.
6. Go to 3.
If you write it correctly, you get roughly O(N) complexity, because the maximum number of circles in 9 cells (in 2D) or 27 cells (in 3D) is constant for any total number of circles.
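A minimal 2D sketch of the cell approach, assuming a non-periodic box and circles given as `(x, y, r)` tuples; `find_collisions` is a hypothetical name and this is one full pass of steps 2 to 4, not a complete simulation. Because the cell size is at least the largest diameter, two overlapping circles can never be more than one cell apart.

```python
from collections import defaultdict


def find_collisions(circles, box_size):
    """circles: list of (x, y, r) with 0 <= x, y < box_size (no periodic wrap-around).

    Returns index pairs (i, j), i < j, of overlapping circles.
    """
    max_diameter = 2 * max(r for _, _, r in circles)
    divisor = max(1, int(box_size // max_diameter))
    cell_size = box_size / divisor  # >= max_diameter

    # Step 2: store circle indexes per cell.
    cells = defaultdict(list)
    for idx, (x, y, _) in enumerate(circles):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    # Step 4: compare each circle only with circles in the 3x3 block of cells.
    hits = []
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):          # X = 1: one cell in every direction
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i >= j:      # count each pair only once
                            continue
                        x1, y1, r1 = circles[i]
                        x2, y2, r2 = circles[j]
                        if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= (r1 + r2) ** 2:
                            hits.append((i, j))
    return hits
```

In a real time-stepping loop you would update the cell membership of moved circles (step 3) instead of rebuilding the dictionary from scratch; with periodic boundaries the neighbour indices would wrap around, which is where the Divisor > 3 requirement comes from.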
Solution 2
Usually this is done like this:
1. For each circle, create a list of the circles within distance R < R_max, and calculate the time after which the lists must be updated (roughly T_update = R_max / V_max, where V_max is the maximum current velocity).
2. Make a step in time.
3. Check the distance from each circle to the circles on its list.
4. If there is a collision, do something.
5. If the current time is greater than T_update, go to 1.
6. Else go to 2.
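Steps 1 and 3 can be sketched as follows for 2D circles; the function names are hypothetical. The rebuild interval here is an assumed conservative bound rather than the rough R_max / V_max estimate above: a pair that starts at least R_max apart, with each circle moving at up to V_max, needs at least (R_max - collision distance) / (2 * V_max) to come into contact.

```python
def build_neighbor_lists(positions, r_max):
    """Step 1: for each circle, the indices of all circles closer than r_max (O(N^2))."""
    lists = [[] for _ in positions]
    r_max_sq = r_max * r_max
    for i, (xi, yi) in enumerate(positions):
        for j in range(i + 1, len(positions)):
            dx, dy = xi - positions[j][0], yi - positions[j][1]
            if dx * dx + dy * dy < r_max_sq:
                lists[i].append(j)
                lists[j].append(i)
    return lists


def colliding_pairs(positions, radii, lists):
    """Step 3: only test each circle against the circles on its own list."""
    hits = []
    for i, neighbours in enumerate(lists):
        for j in neighbours:
            if i < j:  # each pair appears in both lists; report it once
                dx = positions[i][0] - positions[j][0]
                dy = positions[i][1] - positions[j][1]
                if dx * dx + dy * dy <= (radii[i] + radii[j]) ** 2:
                    hits.append((i, j))
    return hits


def safe_update_interval(r_max, collision_distance, v_max):
    """Assumed conservative rebuild interval: two circles moving at up to v_max
    close a gap at most 2 * v_max per unit time."""
    return (r_max - collision_distance) / (2 * v_max)
```

Choosing R_max is a trade-off: a larger radius means longer lists to check every step, but fewer O(N^2) rebuilds.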
This solution with lists is very often improved by adding another list with R_max_2 > R_max and its own expiration time T_2. The second list is used to update the first list. Of course, after T_2 you have to update all lists, which is O(N^2). Also be careful with the times T and T_2, because if a collision can change velocities, those times change too. Introducing forces into your system will also cause velocity changes.
Solution 1+2
You can use lists for collision detection and cells for updating the lists. One book claims this is the best solution, but I think that if you create small cells (as in my example) solution 1 is better. That is just my opinion, though.
Other stuff
You can also do other things to improve speed of simulation:
When you calculate the distance r = sqrt((x1-x2)*(x1-x2) + (y1-y2)*(y1-y2) + ...), you don't have to take the square root: comparing r^2 with the squared threshold is fine. You also don't always have to compute the (x1-x2)*(x1-x2) terms for all dimensions, because if (x1-x2)*(x1-x2) alone is already bigger than r_collision^2, the full sum (adding the y, z, ... terms) will be bigger too.
The molecular dynamics method is very easy to parallelise. You can do it with threads or even on a GPU; you can calculate each distance in a different thread. On a GPU you can easily create thousands of threads at almost no cost.
For hard spheres there is also an efficient algorithm that doesn't step through time at fixed intervals; instead it finds the nearest collision in time, jumps to that time and updates all positions. It can be good for dilute systems where collisions are rare.
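The first tip above (skipping the square root and stopping early once the partial sum is too large) might look like this in Python. `overlaps` is a hypothetical name; it generalises the tip slightly by checking the running sum after every dimension, not just after x. Whether the extra branch actually pays off depends on the hardware and the dimension, so it is worth measuring.

```python
def overlaps(p, q, r_collision):
    """True if points p and q (any number of dimensions) are within r_collision.

    Compares squared distances, so no square root is needed, and stops as
    soon as the partial sum of squared differences exceeds the limit.
    """
    limit = r_collision * r_collision
    total = 0.0
    for a, b in zip(p, q):
        d = a - b
        total += d * d
        if total > limit:  # remaining dimensions can only make it larger
            return False
    return True
```

For hard spheres of equal radius R, `r_collision` would be 2R.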
One possible technique is to use the Delaunay triangulation of the centers of your circles.
Consider the center of each circle and apply the Delaunay triangulation. This will tessellate your surface into triangles. It lets you build a graph where each node is the center of a circle and each edge connects it to the center of a neighbouring circle. The tessellation limits the number of neighbours to a reasonable value (6 on average).
Now, when a circle moves, you have a limited set of circles to consider for collision. You then have to reapply the tessellation to the set of circles affected by the move, but this involves only a very small subset of circles (the neighbours of the moving circle and some neighbours of the neighbours).
The critical part is the first tessellation, which will take some time to perform; later tessellations are not a problem. And of course you need an implementation of the graph that is efficient in terms of both time and space...
Sub-divide your space up into regions and maintain a list of which circles are centred in each region.
Even with a very simple scheme, such as placing all the circles in a list sorted by centre.x, you can speed things up massively. To test a given circle, you only need to test it against the circles on either side of it in the list, going outwards until you reach one whose x coordinate is further away than the two radii could possibly span (the circle's own radius plus the largest radius in the set).
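A sketch of this sort-and-sweep idea in Python, assuming circles are `(x, y, r)` tuples; `sweep_collisions` is a hypothetical name. Scanning only to the right of each circle is enough, because every circle to its left has already scanned rightwards past it.

```python
def sweep_collisions(circles):
    """circles: list of (x, y, r). Returns overlapping index pairs (i, j) with i < j."""
    order = sorted(range(len(circles)), key=lambda k: circles[k][0])
    r_max = max(r for _, _, r in circles)
    hits = []
    for a, i in enumerate(order):
        xi, yi, ri = circles[i]
        for b in range(a + 1, len(order)):
            j = order[b]
            xj, yj, rj = circles[j]
            if xj - xi > ri + r_max:
                break  # every later circle in the list is even further to the right
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= (ri + rj) ** 2:
                hits.append((min(i, j), max(i, j)))
    return hits
```

Between updates the list stays almost sorted when the circles move slowly, so re-sorting with an insertion sort (or Python's `sort`, which handles nearly sorted input well) is cheap.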
You could make a 2D version of a "sphere tree" which is a special (and really easy to implement) case of the "spatial index" that Will suggested. The idea is to "combine" circles into a "containing" circle until you've got a single circle that "contains" the "huge number of circles".
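Just to indicate how simple it is to compute a "containing circle" (off the top of my head):
1) Compute the distance d between the two centres (the length of the difference of the centre vectors). The radius of the containing circle is (d + r1 + r2) / 2, unless one circle already contains the other (d + r_small <= r_large), in which case the larger circle is the container.
2) The centre of the containing circle lies on the line through both centres, at distance (radius - r1) from the first centre towards the second. When the radii are equal this is simply the midpoint of the two centres (add them as vectors and scale by 1/2).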
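The two steps above as a small Python function, with centres given as `(x, y)` tuples; `containing_circle` is a hypothetical name. Building the sphere tree would then mean repeatedly merging pairs of circles (for example nearby ones) with this function until one root circle remains.

```python
import math


def containing_circle(c1, r1, c2, r2):
    """Smallest circle enclosing circles (c1, r1) and (c2, r2); returns (centre, radius)."""
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    d = math.hypot(dx, dy)
    if d + r2 <= r1:
        return c1, r1  # circle 2 already lies inside circle 1
    if d + r1 <= r2:
        return c2, r2  # circle 1 already lies inside circle 2
    radius = (d + r1 + r2) / 2
    t = (radius - r1) / d  # fraction of the way from c1 towards c2
    return (c1[0] + t * dx, c1[1] + t * dy), radius
```

For example, circles of radius 3 at (0, 0) and radius 1 at (4, 0) give a container of radius 4 centred at (1, 0); the plain midpoint (2, 0) would leave part of the larger circle outside.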
Which answer is most efficient will depend somewhat on the density of circles. If the density is low, then placing a low-resolution grid over the map and marking the grid elements that contain a circle will likely be the most efficient. This takes approximately O(N*m*k) per update, where N is the total number of circles, m is the average number of circles per grid point, and k is the average number of grid points covered by one circle. If one circle moves more than one grid point per turn, then you have to modify m to include the number of grid points swept.
On the other hand, if the density is extremely high, you're best off with a graph-walking approach. Let each circle store all neighbours within a distance R (R > r_i for every circle radius r_i). Then, when a circle moves, you query the circles in the "forward" direction for their neighbours and grab any that will now be within R; then you forget the ones in the backward direction that are now farther than R. A complete update then takes O(N*n^2), where n is the average number of circles within a radius R. For something like a closely spaced hexagonal lattice, this will give you much better results than the grid method above.
A suggestion - I am no game developer
Why not precalculate when the collisions are going to occur?
As you specify:
We can assume that circle object has following properties:
-Coordinates
-Radius
-Velocity
-Direction
Velocity is constant, but direction can change.
Then as the direction of one object changes, recalculate those pairs that are affected. This method is effective if directions do not change too frequently.
As Will mentioned in his answer, spatial partitioning trees are the common solution to this problem. Those algorithms sometimes need some tweaking to handle moving objects efficiently, though. You'll want to use a loose bucket-fitting rule so that most movement steps don't require an object to change buckets.
I've seen your "solution 1" used for this problem before and referred to as a "collision hash". It can work well if the space you're dealing with is small enough to be manageable and you expect your objects to be at least vaguely close to uniformly distributed. If your objects may be clustered, then it's obvious how that causes a problem. Using a hybrid approach of some type of a partition tree inside each hash-box can help with this and can convert a pure tree approach into something that's easier to scale concurrently.
Overlapping regions is one way to deal with objects that straddle the boundaries of tree buckets or hash boxes. A more common solution is to test any object that crosses the edge against all objects in the neighboring box, or to insert the object into both boxes (though that requires some extra handling to avoid breaking traversals).
If your code depends on a "tick" (and tests to determine if objects overlap at the tick), then:
when objects are moving "too fast" they skip over each other without colliding
when multiple objects collide in the same tick, the end result (e.g. how they bounce, how much damage they take, ...) depends on the order that you check for collisions and not the order that collisions would/should occur. In rare cases this can cause a game to lock up (e.g. 3 objects collide in the same tick; object1 and object2 are adjusted for their collision, then object2 and object3 are adjusted for their collision causing object2 to be colliding with object1 again, so the collision between object1 and object2 has to be redone but that causes object2 to be colliding with object3 again, so ...).
Note: In theory this second problem can be solved by "recursive tick sub-division" (if more than 2 objects collide, divide the length of the tick in half and retry until only 2 objects are colliding in that "sub-tick"). This can also cause games to lock up and/or crash (when 3 or more objects collide at the exact same instant you end up with a "recurse forever" scenario).
In addition; sometimes when game developers use "ticks" they also say "1 fixed length tick = 1 / variable frame rate", which is absurd because something that is supposed to be a fixed length can't depend on something variable (e.g. when the GPU is failing to achieve 60 frames per second the entire simulation goes in slow motion); and if they don't do this and have "variable length ticks" instead then both of the problems with "ticks" become significantly worse (especially at low frame rates) and the simulation becomes non-deterministic (which can be problematic for multi-player, and can result in different behavior when the player saves, loads or pauses the game).
The only correct way is to add a dimension (time), and give each object a line segment described as "starting coordinates and ending coordinates", plus a "trajectory after ending coordinates". When any object changes its trajectory (either because something unpredicted happened or because it reached its "ending coordinates") you'd find the "soonest" collision by doing a "distance between 2 lines < (object1.radius + object2.radius)" calculation for the object that changed and every other object; then modify the "ending coordinates" and "trajectory after ending coordinates" for both objects.
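For circles moving in straight lines at constant velocity, the "distance between 2 lines < (object1.radius + object2.radius)" test reduces to a quadratic in time: solve |p + v t| = r1 + r2 for the relative position p and relative velocity v, and take the earlier root. A minimal sketch; `time_of_impact` and `soonest_collision` are hypothetical names, and each object is assumed to be a `(position, velocity, radius)` tuple at the current time.

```python
import math


def time_of_impact(p1, v1, r1, p2, v2, r2):
    """Earliest t >= 0 at which two linearly moving circles touch, or None."""
    # Relative position and velocity of circle 2 as seen from circle 1.
    px, py = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]
    r = r1 + r2
    # |p + v t|^2 = r^2  ->  a t^2 + 2 b t + c = 0
    a = vx * vx + vy * vy
    b = px * vx + py * vy
    c = px * px + py * py - r * r
    if c <= 0:
        return 0.0   # already touching or overlapping
    if a == 0 or b >= 0:
        return None  # not approaching each other
    disc = b * b - a * c
    if disc < 0:
        return None  # closest approach is still farther apart than r
    return (-b - math.sqrt(disc)) / a


def soonest_collision(obj, others):
    """(time, other) for the earliest collision of obj with any of others, or None."""
    best = None
    for other in others:
        t = time_of_impact(*obj, *other)
        if t is not None and (best is None or t < best[0]):
            best = (t, other)
    return best
```

When an object changes its trajectory, `soonest_collision` over the other objects gives the new "ending coordinates" time for that object.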
The outer "game loop" would be something like:
```
while (running) {
    frame_time = estimate_when_frame_will_be_visible();  // Note: likely to be many milliseconds after you start drawing the frame
    while (soonest_object_end_time < frame_time) {
        update_path_of_object_with_soonest_end_time();
    }
    for each object {
        calculate_object_position_at_time(frame_time);
    }
    render();
}
```
Note that there are multiple ways to optimize this, including:
split the world into "zones" - e.g. so that if you know object1 would be passing through zones 1 and 2 then it can't collide with any other object that doesn't also pass through zone 1 or zone 2
keep objects in "end_time % bucket_size" buckets to minimize time taken to find "next soonest end time"
use multiple threads to do the "calculate_object_position_at_time(frame_time);" for each object in parallel
do all the "advance simulation state up to next frame time" work in parallel with "render()" (especially if most rendering is done by GPU, leaving CPU/s free).
For performance:
When collisions occur infrequently it can be significantly faster than "ticks" (you can do almost no work for relatively long periods of time); and when you have spare time (for whatever reason - e.g. including because the player paused the game) you can opportunistically calculate further into the future (effectively, "smoothing out" the overhead over time to avoid performance spikes).
When collisions occur frequently it will give you the correct results, but can be slower than a broken joke that gives you incorrect results under the same conditions.
It also makes it trivial to have an arbitrary relationship between "simulation time" and "real time": things like fast forward and slow motion will not cause anything to break (even if the simulation is running as fast as the hardware can handle or so slowly that it's hard to tell if anything is moving at all); and (in the absence of unpredictability) you can calculate ahead to an arbitrary time in the future, and (if you store old "object line segment" information instead of discarding it when it expires) you can skip to an arbitrary time in the past, and (if you only store old information at specific points in time to minimize storage costs) you can skip back to a time described by stored information and then calculate forward to an arbitrary time. These things combined also make it easy to do things like "instant slow motion replay".
Finally; it's also more convenient for multiplayer scenarios, where you don't want to waste a huge amount of bandwidth sending a "new location" for every object to every client at every tick.
Of course the downside is complexity - as soon as you want to deal with things like acceleration/deceleration (gravity, friction, lurching movement), smooth curves (elliptical orbits, splines) or different shaped objects (e.g. arbitrary meshes/polygons and not spheres/circles) the mathematics involved in calculating when the soonest collision will occur becomes significantly harder and more expensive; which is why game developers resort to the inferior "ticks" approach for simulations that are more complex than the case of N spheres or circles with linear motion.
I have recently become acquainted with orbital mechanics and am trying to do some analysis on the subject. Since I don't have subject-matter expertise, I am struggling to decide how one would determine whether a satellite has performed a maneuver or rendezvous operation, given the historical TLE data of that satellite from which we extract the orbital elements. To drill down further, I am approaching the problem like this:
1. I take my satellite of interest and collect its historical TLE data.
2. Once I have the data, I extract and calculate all the orbital parameters from the TLE.
3. From the list of orbital parameters, I choose a subset and calculate long-term standardized anomalies for each of them.
4. Once I have the anomalies, I pick out the dates where any parameter has an anomaly greater than 1.5 or less than -1.5.
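Steps 3 and 4 might be sketched like this, assuming the orbital elements have already been extracted into one list per parameter, aligned with a list of epoch dates; `flag_anomalous_dates` is a hypothetical name. One caution: RAAN and the argument of perigee drift steadily (J2 precession) and wrap around at 360°, so z-scores of the raw values mostly measure that drift; applying the same function to epoch-to-epoch differences may be more informative.

```python
from statistics import mean, pstdev


def flag_anomalous_dates(dates, params, threshold=1.5):
    """params: dict of parameter name -> list of values aligned with dates.

    Returns the dates where any parameter's standardized anomaly (z-score)
    is greater than threshold or less than -threshold.
    """
    z = {}
    for name, values in params.items():
        mu, sigma = mean(values), pstdev(values)
        # A constant parameter has no spread, so it is never anomalous.
        z[name] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return [d for k, d in enumerate(dates)
            if any(abs(z[name][k]) > threshold for name in z)]
```

Note that a threshold of 1.5 standard deviations flags roughly 13% of normally distributed points even when nothing happened, so expect false positives.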
The problem is that I am not sure about my subset. So far it contains inclination, RAAN, argument of perigee and longitude.
Is there any other factor that I should add or remove from this subset in order to nail this analysis the right way? Or is there altogether any other approach that I can use?
What I'm interested in, is to find out the days when a satellite has performed maneuvers.
You should add the semi-major and semi-minor axes (or equivalently the minimum and maximum altitude). These change after any burn along the trajectory or perpendicular to it, and they decrease due to atmospheric drag for orbits that are too low.
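TLEs do not contain the axes directly, but they do contain the mean motion (revolutions per day) and eccentricity, from which the sizes follow via Kepler's third law. A minimal sketch; `orbit_size` is a hypothetical name, the constants are the standard Earth gravitational parameter and equatorial radius, and because TLE elements are SGP4 mean elements the results are approximate (off by a few km), which is fine for spotting jumps.

```python
import math

MU_EARTH = 398600.4418  # km^3/s^2, Earth's gravitational parameter
R_EARTH = 6378.137      # km, Earth's equatorial radius


def orbit_size(mean_motion_rev_per_day, eccentricity):
    """Approximate (a, b, perigee altitude, apogee altitude) in km from TLE values."""
    n = mean_motion_rev_per_day * 2 * math.pi / 86400.0  # rad/s
    a = (MU_EARTH / n ** 2) ** (1 / 3)                   # semi-major axis
    b = a * math.sqrt(1 - eccentricity ** 2)             # semi-minor axis
    perigee_alt = a * (1 - eccentricity) - R_EARTH
    apogee_alt = a * (1 + eccentricity) - R_EARTH
    return a, b, perigee_alt, apogee_alt
```

A sudden step in perigee or apogee altitude between consecutive TLEs suggests a burn, while a slow, steady decrease is more likely drag.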
Analyzing them can hint at what kind of maneuver was performed. Also, these are usually changed by burning on the opposite side of the orbit, so once you find a bump in the periapsis or apoapsis, the burn most likely occurred half an orbit before reaching it.
Another thing I would check is speed. Compute the local speed as the derivative of position (the distance between consecutive data points divided by the time between them) and compare it with the speed predicted by Keplerian motion. If they don't match, some kind of burn, collision or ejection occurred, and from the difference you can also infer what was done.
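For the speed comparison, the Keplerian prediction at radius r on an orbit with semi-major axis a is the vis-viva equation, v = sqrt(mu * (2/r - 1/a)). A sketch, assuming you already have positions in km in an Earth-centred frame at known times in seconds (e.g. propagated from the TLEs); `speed_mismatches` is a hypothetical name and `tolerance` is a guess you would need to tune, since real orbits are perturbed and never match exactly.

```python
import math

MU_EARTH = 398600.4418  # km^3/s^2


def vis_viva_speed(r_km, a_km):
    """Keplerian orbital speed (km/s) at radius r on an orbit with semi-major axis a."""
    return math.sqrt(MU_EARTH * (2.0 / r_km - 1.0 / a_km))


def speed_mismatches(times, positions, a_km, tolerance=0.01):
    """Indices k where the finite-difference speed between samples k and k+1
    differs from the vis-viva prediction by more than `tolerance` (relative)."""
    flagged = []
    for k in range(len(times) - 1):
        p, q = positions[k], positions[k + 1]
        measured = math.dist(p, q) / (times[k + 1] - times[k])
        mid = [(a + b) / 2 for a, b in zip(p, q)]  # evaluate at the midpoint
        expected = vis_viva_speed(math.hypot(*mid), a_km)
        if abs(measured - expected) / expected > tolerance:
            flagged.append(k)
    return flagged
```

The finite difference measures the chord rather than the arc, so the samples need to be close together in time relative to the orbital period.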
For more info see:
solving Kepler's equation
Is it possible to make realistic n-body solar system simulation in matter of size and mass?
With Python I want to compare a simulated light curve with the real light curve. Note that the measured data contain gaps and outliers and the time steps are not constant, while the model has constant time steps.
In a first step I would like to compare with a statistical method how similar the two light curves are. Which method is best suited for this?
In a second step I would like to fit the model to my measurement data. However, the model data is not calculated in Python but in independent software. The model depends on four parameters, each limited to a certain range, which I currently enter into the software manually (I plan to automate this).
What is the best method to create a suitable fit?
A "Brute-Force-Fit" is currently an option that comes to my mind.
This link "https://imgur.com/a/zZ5xoqB" provides three different plots. The simulated lightcurve, the actual measurement and lastly both together. The simulation is not good, but by playing with the parameters one can get an acceptable result. Which means the phase and period are the same, magnitude is in the same order and even the specular flashes should occur at the same period.
If I understand this correctly, you're asking a more foundational question that could be better answered in https://datascience.stackexchange.com/, rather than something specific to Python.
That said, as a data science layperson, this may be a problem suited for gradient descent with a mean-square-error cost function. You initialize the parameters of the curve (possibly randomly), then calculate the square error at your known points.
Then you make tiny changes to each parameter in turn, and calculate how the cost function is affected. Then you change all the parameters (by a tiny amount) in the direction that decreases the cost function. Repeat this until the parameters stop changing.
(Note that this might trap you in a local minimum and not work.)
More information: https://towardsdatascience.com/implement-gradient-descent-in-python-9b93ed7108d1
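A minimal sketch of this approach for your situation, in plain Python. `run_model` is a hypothetical callable that would wrap the external software: it takes a parameter list and returns the model's `(times, magnitudes)`. The model is linearly interpolated onto the irregular measurement times, so gaps and uneven steps are not a problem; the gradient is estimated by nudging one parameter at a time, and each parameter is clamped to its allowed range. The MSE itself also serves as a simple similarity measure for the first step.

```python
from bisect import bisect_left


def interpolate(xs, ys, x):
    """Linear interpolation of the samples (xs, ys) at x; xs strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_left(xs, x)
    w = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
    return ys[k - 1] + w * (ys[k] - ys[k - 1])


def mse(params, run_model, obs_t, obs_y):
    """Mean squared error between the model and the (irregular) measurements."""
    model_t, model_y = run_model(params)
    errors = [(interpolate(model_t, model_y, t) - y) ** 2 for t, y in zip(obs_t, obs_y)]
    return sum(errors) / len(errors)


def gradient_descent(run_model, obs_t, obs_y, start, bounds,
                     rate=0.1, eps=1e-4, steps=500):
    """Finite-difference gradient descent, clamping each parameter to its bounds."""
    params = list(start)
    for _ in range(steps):
        base = mse(params, run_model, obs_t, obs_y)
        grad = []
        for i in range(len(params)):
            nudged = list(params)
            nudged[i] += eps  # tiny change to one parameter at a time
            grad.append((mse(nudged, run_model, obs_t, obs_y) - base) / eps)
        params = [min(hi, max(lo, p - rate * g))
                  for p, g, (lo, hi) in zip(params, grad, bounds)]
    return params
```

Each step costs one model run per parameter plus one, which matters if the external software is slow. The squared error is sensitive to outliers, so removing them first is advisable. With bounded parameters, `scipy.optimize.minimize(..., bounds=..., method="L-BFGS-B")` is an off-the-shelf alternative.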
Edit: I overlooked this part
The simulation is not good, but by playing with the parameters one can get an acceptable result. Which means the phase and period are the same, magnitude is in the same order and even the specular flashes should occur at the same period.
Is the simulated curve just a sum of sine waves, and are the parameters just the phase/period/amplitude of each? In that case, what you're looking for is the Fourier transform of your signal, which is very easy to calculate with NumPy/SciPy: https://docs.scipy.org/doc/scipy/reference/tutorial/fftpack.html (note that the FFT assumes evenly spaced samples, so the gaps in the measured data would need to be handled first).
Query:
I want to estimate the trajectory of a person wearing an IMU between point a and point b. I know the exact location of point a and point b in an x,y,z space and the time it takes the person to walk between the points.
Is it possible to reconstruct the trajectory of the person moving from point a to point b using the data from an IMU and the time?
This question is too broad for SO. You could write a PhD thesis answering it, and I know people who have.
However, yes, it is theoretically possible.
However, there are a few things you'll have to deal with:
Your system is going to discretize time on some level. The result is that your estimate of position will be non-smooth. Increasing sampling rates is one way to address this, but this frequently increases the noise of the measurement.
Possible paths are non-unique. Knowing the time it takes to travel from a to b slightly constrains the information from the IMUs, but you are still left with an infinite family of possible routes between the two. Since you mention that you're considering a person walking between two points with z-components, perhaps you can constrain the route using knowledge of topography and roads?
IMUs function by integrating accelerations to velocities and velocities to positions. If the accelerations have measurement errors, and they always do, then the error in your estimate of the position will grow over time. The longer you run the system for, the more the results will diverge. However, if you're able to use roads/topography as a constraint, you may be able to restart the integration from known points in space; that is, if you can detect 90 degree turns on a street grid, each turn gives you the opportunity to tie the integrator back to a feasible initial condition.
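A small sketch of why the error grows: double-integrating acceleration with the trapezoid rule. `integrate_positions` is a hypothetical name, and it assumes 1-D accelerations already rotated into the world frame with gravity removed (a real IMU pipeline has to do that first, using the gyroscope). Restarting the integration from a known point, as suggested above, just means calling it again with `x0` (and, if known, `v0`) set there.

```python
def integrate_positions(accels, dt, v0=0.0, x0=0.0):
    """Double-integrate 1-D acceleration samples (trapezoid rule) into positions.

    Assumes world-frame accelerations with gravity removed and a fixed sample
    spacing dt.
    """
    v, x = v0, x0
    positions = [x]
    for a_prev, a_next in zip(accels, accels[1:]):
        v_next = v + 0.5 * (a_prev + a_next) * dt  # acceleration -> velocity
        x += 0.5 * (v + v_next) * dt               # velocity -> position
        v = v_next
        positions.append(x)
    return positions
```

For a person standing still, a constant accelerometer bias of only 0.01 m/s² already produces 0.5 * 0.01 * 60² = 18 m of position error after one minute; a constant bias makes the error grow quadratically with time.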
Given the above, perhaps the most important question you have to ask yourself is how much error you can tolerate in your path reconstruction. Low-error estimates are going to require better (i.e. more expensive) sensors, higher sampling rates, and higher-order integrators.