I am trying to reduce the number of data points for a 3D curve. I currently have 20000 points and would like to reduce this to around 2000 without losing much information.
I am doing this in Python.
As a simple example, think of a spiral on the surface of a cylinder.
Are there any built-in functions that will do this?
I've tried using the Ramer–Douglas–Peucker algorithm to simplify the line, but due to the nature of the curve, every ignored data point makes the simplified curve undershoot the original. See the picture of a 2D example: orange is what rdp produces, green is what I want.
I would like the output of the program to be an array of ~2000 coordinates that still represent the shape of the 3D curve. They don't necessarily have to be original coordinates; I want some points to overshoot and others to undershoot.
Thank you for your help
UPDATE:
In the end I chose to do something quite involved, but it gave me exactly what I wanted. I started by using the rdp algorithm to reduce the number of points. With this new information I then fit a straight line of best fit to the spread of the original points between the new reduced points:
i.e. if the algorithm 'ignored' 13 points, I fit the line from point 0 to point 14, and did the same for the next segment, where the algorithm had skipped for example 7 points, so I fit from 14 to 22, etc.
Having those lines of best fit, I found the points where the lines intersected or, if the lines did not intersect, the closest point on each of the lines to the other line.
Due to the nature of my problem, I did not need my data to be continuous, so 2000 "discontinuous" segments were not a problem.
Thank you very much for your help!
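A minimal sketch of the line-fitting step described in the update, assuming the indices kept by rdp are already available as a sorted list (called `kept` here, a hypothetical name). The helper names `fit_line`, `closest_points` and `junction_points` are made up for illustration; the best-fit line is taken from an SVD of the centred points, which is one common way to do it and not necessarily what was used originally.

```python
import numpy as np

def fit_line(points):
    """Least-squares 3D line through points: returns (centroid, direction)."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # The first right-singular vector is the direction of largest spread.
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[0]

def closest_points(p1, d1, p2, d2, eps=1e-12):
    """Closest point on each of the lines p1 + s*d1 and p2 + t*d2 to the other line."""
    w = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b
    if denom < eps:
        # Parallel lines: keep p1 and project it onto the second line.
        s, t = 0.0, e / c
    else:
        s = (b * e - c * d) / denom
        t = (a * e - b * d) / denom
    return p1 + s * d1, p2 + t * d2

def junction_points(points, kept):
    """For each pair of neighbouring fitted lines, return the two closest points.

    `kept` holds the indices retained by the simplification (assumed input).
    Each line is fitted to the original points from one kept index to the
    next, inclusive, e.g. 0..14 and then 14..22.
    """
    points = np.asarray(points, dtype=float)
    lines = [fit_line(points[i:j + 1]) for i, j in zip(kept[:-1], kept[1:])]
    return [closest_points(*l1, *l2) for l1, l2 in zip(lines[:-1], lines[1:])]
```

When two neighbouring lines actually intersect, both returned points coincide with the intersection; otherwise they are the two ends of the shortest gap between the lines.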
I have output from a commercial program that contains the dihedral angles of a molecule over time. The problem apparently comes from a known quadrant issue when taking cosines, where the interval is -180 to 180, which I am not familiar with. If the dihedral goes above 180, this commercial program (SHARC, for molecular dynamics simulations) treats it as a value just above -180, creating jumps in the plots (you can see an example in the figure below).
Is there a correct mathematical way to convert these plots to smooth curves, even if it means going to dihedrals higher than 180?
What I am trying is to write a Python program that deals with each special case: going from 180 to -180 or vice versa, cases near 90 or 0 degrees, using sines and cosines... But it is becoming extremely complex, with more than 12 nested if statements inside a for loop running along the X axis.
If it was only one figure, I could do it by hand, but I will have dozens of similar plots.
I attach an ASCII file with the data for plotting this figure.
What I would like it to look like is this:
Thank you very much,
Cayo Gonçalves
Ok, I've found a pretty easy solution.
Numpy has the unwrap function. I just need to feed the function with a vector with the angles in radians.
Thank you Yves for giving me the name of the problem. This helped me find the solution.
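For reference, a minimal sketch of that approach, assuming the dihedrals are stored in degrees (the function name `unwrap_dihedrals` and the sample values are made up for illustration): convert to radians, call `np.unwrap`, and convert back.

```python
import numpy as np

def unwrap_dihedrals(angles_deg):
    """Remove the +/-360 degree jumps from a series of angles given in degrees."""
    radians = np.deg2rad(np.asarray(angles_deg, dtype=float))
    # np.unwrap works on radians and shifts samples by multiples of 2*pi
    # whenever consecutive values differ by more than pi.
    return np.rad2deg(np.unwrap(radians))

# Illustrative data: the angle keeps growing but the raw values wrap past 180.
wrapped = [170.0, 178.0, -176.0, -170.0, -160.0]
print(unwrap_dihedrals(wrapped))
```

The result may exceed 180 degrees, which is exactly what is wanted here.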
This is called phase unwrapping.
As your curves are smooth and slowly varying, every time you see a large negative (positive) jump, add (subtract) 360. This will restore the original curve. (For the jump threshold, 170 should be good, I guess).
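The rule above can be sketched by hand as follows. This is only an illustration under the stated assumption that the curve is smooth and slowly varying; the function name `unwrap_by_threshold` is hypothetical and the default threshold of 170 is the guess given above.

```python
import numpy as np

def unwrap_by_threshold(angles_deg, threshold=170.0):
    """Add 360 after each large negative jump and subtract 360 after each large positive one."""
    angles = np.asarray(angles_deg, dtype=float)
    steps = np.diff(angles)
    # +360 for a big drop, -360 for a big rise, 0 otherwise.
    shift = np.where(steps < -threshold, 360.0, 0.0) - np.where(steps > threshold, 360.0, 0.0)
    correction = np.zeros_like(angles)
    # Every jump affects all later samples, hence the running sum.
    correction[1:] = np.cumsum(shift)
    return angles + correction
```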
I am trying to recover the trajectory of a 2D camera, using a sequence of 2D images and OpenCV. But the trajectory I get is not as good as I would like it to be. It goes back and forth instead of going just forth.
I have a sequence of photos taken with a 2D camera while it was moving (namely the outdoors part of the KITTI dataset). For every pair of consecutive frames I compute the rotation matrix (R) and translation vector (t) with E = cv2.findEssentialMat() and cv2.recoverPose(E, ...), and then I estimate the trajectory, assuming that the coordinates of every translation vector are given in a local coordinate system whose position is set by the corresponding rotation matrix.
upd: Each recovered position looks like [X,Y,Z], and I scatter (X_i, Y_i) for every i (these points are thought to be 2D positions), so the following graphs are my estimated trajectories.
Here's what I get instead of a straight line (the camera was moving straight forward). Previous results were even worse.
The green point is where it starts and the red point is where it ends. So most of the time it even moves backwards. This, though, is probably because of a mistake in the beginning, which was the cause of everything turning around (right?)
Here's what I do:
E, mask = cv2.findEssentialMat(points1, points2, K_00, cv2.RANSAC, 0.99999, 0.1)
inliers, R, t, mask = cv2.recoverPose(E, points1, points2, K_00, R, t, mask)
It seems to me that recoverPose somehow chooses the wrong sign for R and t on some steps. So the trajectory that was supposed to go forward goes back. And then forth again.
What I did to improve the situation was:
1) skip the frames with too many outliers (I check this both after using findEssentialMat and after using recoverPose)
2) set the threshold for RANSAC method in findEssentialMat to 0.1
3) increase the number of the feature points on each image from 8 to 24.
This didn't really help.
Here I need to note: I know that in practice the 5-point algorithm, which is used for computing the essential matrix, needs a lot more points than 8 or even 24. And maybe this is actually the problem.
So the questions are:
1) Can the number of feature points (approx. 8-24) be the cause of recoverPose mistakes?
2) If checking the number of outliers is the right thing to do, then what percentage of outliers should I set as the limit?
3) I estimate positions like this (instead of simple p[i+1] = R*p[i]+t):
C = np.dot(R, C)
p[i+1] = p[i] + np.dot(np.linalg.inv(C), t)
This is because I can't help thinking of t as a vector in local coordinates, so C is the transformation matrix, which is updated on every step to summarize the rotations. Is that right or not really?
4) It's really possible that I am missing something, since my knowledge of the topic seems tiny. Is there anything (anything!) you could recommend?
Huge thanks for your time! I would appreciate any advice.
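Regarding question 3, here is a small NumPy-only sketch of one way to chain relative motions into camera positions. It rests on an assumption that should be checked against the OpenCV documentation for the version in use: that each (R, t) maps point coordinates from the previous camera frame to the next one, X_next = R @ X_prev + t. The function name `accumulate_poses` is made up for illustration.

```python
import numpy as np

def accumulate_poses(relative_motions):
    """Chain relative motions into camera centres expressed in the first camera's frame.

    Assumes each (R, t) satisfies X_next = R @ X_prev + t for point coordinates.
    """
    R_wc = np.eye(3)        # rotation from current camera frame to the first frame
    centre = np.zeros(3)    # current camera centre in the first frame
    centres = [centre.copy()]
    for R, t in relative_motions:
        t = np.asarray(t, dtype=float).reshape(3)
        # Under the assumed convention the next camera centre, seen from the
        # previous camera frame, is -R.T @ t; rotate it into the first frame.
        centre = centre + R_wc @ (-R.T @ t)
        R_wc = R_wc @ R.T
        centres.append(centre.copy())
    return np.array(centres)
```

Under that assumption, the updated `R_wc` equals `np.linalg.inv(C)` from the snippet in question 3, so the two updates differ only in the sign of the translation term. Note also that t recovered from an essential matrix is only known up to scale, so the step lengths along such a trajectory are not metric.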
upd: for example, here are the first six rotation matrices, translation vectors, and recovered positions I get. Signs of t seem a bit crazy.
upd: here's my code. (I'm not a really good programmer yet). The main idea is that my feature points are corners of bounding boxes of static objects, which I detect with Faster R-CNN (I used this implementation). So the first part of the code detects objects, and the second part uses detected feature points for recovering the trajectory.
Here's the dataset I use (this is part 2011_09_26_drive_0005 from here).
I have a bunch of noisy data.
I need to be able to find the points where the increase in y begins and ends. Visually it's pretty obvious, but I've been having a hard time trying to come up with an algorithm that would be consistent and accurate.
I tried getting the slope directly (just as a difference of neighboring points):
But here still, I'm not sure how to properly identify the beginning and end of a step. I tried just going off the magnitude of the difference between points, but I get either a lot of false positives (like in that very noisy spike in the second graph), or I miss the very small steps (like the first and third). I also tried going in steps of ten points, calculating a best fit line and the MSE, and when the MSE gets above a certain threshold, I would consider that a corner in the graph. For example, for 10 points in the somewhat horizontal line, the MSE for the best fit line would be small, but for 9 points and 1 that is at the beginning of the incline, the MSE is much larger.
I thought about trying to convert it into a step graph, but I'm not sure how to do it, plus I feel like I might end up with just one point where the graph goes from low to high, rather than two points, one for when it starts increasing and another for when it stops.
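For concreteness, the block-wise line fit described above could look roughly like this. It is only a sketch of that attempt, not a recommended detector; the function name `window_fit_mse` is hypothetical, and the threshold to compare the result against still has to be chosen by hand.

```python
import numpy as np

def window_fit_mse(y, window=10):
    """MSE of a least-squares line fitted to each consecutive block of `window` samples."""
    y = np.asarray(y, dtype=float)
    x = np.arange(window)
    mses = []
    for start in range(0, len(y) - window + 1, window):
        block = y[start:start + window]
        # Degree-1 polynomial fit, i.e. the best-fit straight line for this block.
        slope, intercept = np.polyfit(x, block, 1)
        residual = block - (slope * x + intercept)
        mses.append(np.mean(residual ** 2))
    return np.array(mses)
```

Blocks whose MSE exceeds the chosen threshold would then be the candidate corners.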
Does anyone have any ideas on how one might go about doing this?
I'm working on a heatmap generation program which hopefully will fill in the colors based on value samples provided from a building layout (this is not GPS based).
If I have only a few known data points such as these in a large matrix of unknowns, how do I get the values in between interpolated in Python?
0,0,0,0,1,0,0,0,0,0,5,0,0,0,0,9
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,2,0,0,0,0,0,0,0,0,8,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,8,0,0,0,0,0,0,0,6,0,0,0,0,0,0
0,0,0,0,0,3,0,0,0,0,0,0,0,0,7,0
I understand that bilinear won't do it, and Gaussian will bring all the peaks down to low values due to the sheer number of surrounding zeros. This is obviously a matrix handling problem, and I don't need it to be Bezier curve smooth; close enough to serve as a graphic representation would be fine. My matrix will end up being about 1500×900 cells in size, with approximately 100 known points.
Once the values are interpolated, I have written code to convert it all to colors, no problem. It's just that right now I'm getting single colored pixels sprinkled over a black background.
Proposing a naive solution:
Step 1: interpolate and extrapolate existing data points onto surroundings.
This can be done using "wave propagation" type algorithm.
The known points "spread out" their values onto surroundings until all the grid is "flooded" with some known values. At the end of this stage you have a number of intersected "disks", and no zeroes left.
Step 2: smoothen the result (using bilinear filtering or some other filtering).
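The two steps can be sketched with NumPy alone. This is only an illustration under a few assumptions: zeros in the grid mean "unknown", the flooding is done as a nearest-known-point fill, and the smoothing is a repeated 3×3 box blur (a stand-in for whatever filter is preferred). The function name `fill_and_smooth` and the `passes` parameter are made up.

```python
import numpy as np

def fill_and_smooth(grid, passes=3):
    """Flood the grid with the nearest known value, then blur it `passes` times."""
    grid = np.asarray(grid, dtype=float)
    rows, cols = np.nonzero(grid)          # assumption: non-zero cells are the known samples
    yy, xx = np.indices(grid.shape)
    best = np.full(grid.shape, np.inf)     # squared distance to the nearest known point so far
    filled = np.zeros(grid.shape)
    # Step 1: each known point claims every cell it is closer to than earlier points.
    for r, c in zip(rows, cols):
        dist2 = (yy - r) ** 2 + (xx - c) ** 2
        closer = dist2 < best
        best[closer] = dist2[closer]
        filled[closer] = grid[r, c]
    # Step 2: repeated 3x3 box blur with edge padding to soften the disk borders.
    h, w = grid.shape
    for _ in range(passes):
        padded = np.pad(filled, 1, mode="edge")
        filled = sum(
            padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)
        ) / 9.0
    return filled
```

Looping over the known points keeps memory proportional to the grid size, which matters for a grid of about 1500×900 cells with around 100 samples. More blur passes give smoother transitions at the cost of flattening the peaks.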
If you are able to use SciPy, then interp2d does exactly what you want. A possible problem is that it seems not to extrapolate smoothly, according to this issue. This means that all values near the walls are going to be the same as their closest neighbouring points. This can be solved by putting thermometers in all 4 corners :)
I am not a mathematician but I am pretty sure my problem could be solved with a bit (maybe a lot?) of good maths.
Let me explain the problem with a picture.
I have a network (GIS data) which is composed of many linear segments.
Occasionally a curve is present among these segments, and I need a reasonable method to detect such curves more or less automatically.
Given that I have the coordinates of my segments and the curves (the green dots in the picture), would you recommend a reasonable way to detect these curves?
I am not sure but it could be similar to the opposite of what is asked in this other SO question, but I don't actually have a function to calculate a second derivative, only line segments (and curves) made by vertices...
Assuming you can easily list out the points in a segment and iterate over them, and that a segment is "mostly" linear, you can take the end-points of a segment and interpolate a line between them.
Next, check if each point of the segment lies on the interpolated line and add a margin of error.
You can then assume that several adjacent points of the segment that do not lie on the interpolated line make up a curve.
You may need to implement other checks:
Are the end-points part of a straight segment, i.e. the segment does not end in a curve?
Does the segment bend and should the segment be treated as two segments?
Can two curves be adjacent to one another without a point between them that's on the line?
To get started with Python, I'd write the function is_on_line and loop over all the points, calling it each time to see if the point is on the line.
Excuse the verbose pseudo code (makes lots of assumptions about data structures, can be done in one loop), but this should help you break the problem apart to get started:
points_on_line = []
for idx, point in enumerate(segment):
    # is_on_line is the helper you write yourself: it should return True when
    # the point lies within error_margin of the line through the end-points
    result = is_on_line(
        endpoint_1_x=segment[0].x,
        endpoint_1_y=segment[0].y,
        endpoint_2_x=segment[-1].x,
        endpoint_2_y=segment[-1].y,
        coord_x=point.x,
        coord_y=point.y,
        error_margin=0.1,
    )
    points_on_line.append((point, result))
for point, on_line in points_on_line:
    # figure out where your curves are
    pass
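One possible way to fill in the missing pieces, as a sketch only: `is_on_line` is implemented here as a perpendicular-distance test against the line through the two end-points, and runs of consecutive off-line points are reported as curves. The `Point` type and the `find_curves` helper are hypothetical, since the real data structures are not known.

```python
import math
from collections import namedtuple

Point = namedtuple("Point", "x y")  # hypothetical stand-in for the real vertex type

def is_on_line(endpoint_1_x, endpoint_1_y, endpoint_2_x, endpoint_2_y,
               coord_x, coord_y, error_margin):
    """True if the point is within error_margin of the line through the end-points."""
    dx, dy = endpoint_2_x - endpoint_1_x, endpoint_2_y - endpoint_1_y
    length = math.hypot(dx, dy)
    if length == 0:
        # Degenerate segment: fall back to the distance from the single end-point.
        return math.hypot(coord_x - endpoint_1_x, coord_y - endpoint_1_y) <= error_margin
    # |cross product| / length is the perpendicular distance to the line.
    cross = dx * (coord_y - endpoint_1_y) - dy * (coord_x - endpoint_1_x)
    return abs(cross) / length <= error_margin

def find_curves(segment, error_margin=0.1):
    """Return (first_index, last_index) for each run of consecutive off-line points."""
    first, last = segment[0], segment[-1]
    runs, start = [], None
    for idx, point in enumerate(segment):
        on_line = is_on_line(first.x, first.y, last.x, last.y,
                             point.x, point.y, error_margin)
        if not on_line and start is None:
            start = idx                      # a run of off-line points begins
        elif on_line and start is not None:
            runs.append((start, idx - 1))    # the run ended at the previous point
            start = None
    return runs

segment = [Point(0, 0), Point(1, 0), Point(2, 0.5), Point(3, 0.6), Point(4, 0), Point(5, 0)]
print(find_curves(segment))
```

The last vertex is itself an end-point, so it always counts as on the line and any open run is closed by then. The extra checks listed above (segments ending in a curve, bent segments, adjacent curves) are not handled by this sketch.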