I have to cluster a 3D array that looks like this:
a=([[[1,2,3],[4,5,6],[7,8,9]],[[1,4,7],[2,5,9],[3,6,8]]])
Imagine that this array represents the coordinates of a triangle in a time series: the first 2D array holds the coordinates of the vertices in the first frame, the second holds the coordinates in the second frame, and so on.
I need to cluster the position of this triangle over time, but the clustering algorithms of scikit-learn only work on 2D arrays. I have reshaped the 3D array to obtain this:
b=([[1,2,3,4,5,6,7,8,9],[1,4,7,2,5,9,3,6,8]])
but the performance of the clustering algorithms is poor (please note that the triangle is just an example; I have to cluster the position of a much more complex figure, so the dimensionality of the points in the 2D array is very high).
So I was wondering whether there are other methods to cluster a 3D array besides reshaping and dimensionality-reduction techniques. I've read that converting the 3D array into a distance matrix could be a solution, but I really don't know how to do this. If anyone has any kind of advice on how to do this, or any other advice on how to solve this problem, I would really appreciate the help!
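For context, a minimal sketch of what the reshape plus a precomputed distance matrix could look like (DBSCAN and its parameters are only placeholders to show the plumbing, not a recommendation):

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import DBSCAN

a = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
              [[1, 4, 7], [2, 5, 9], [3, 6, 8]]])   # (n_frames, n_vertices, 3)

flat = a.reshape(a.shape[0], -1)                    # one flattened sample per frame
dist_matrix = squareform(pdist(flat))               # n_frames x n_frames Euclidean distances

# any estimator accepting metric='precomputed' can consume this matrix; eps is a placeholder
labels = DBSCAN(eps=5.0, min_samples=1, metric='precomputed').fit_predict(dist_matrix)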
The clustering algorithms work with a matrix of shape (n_samples, n_features).
So in your case n_samples is your position in time and n_features are your coordinates. I'm assuming you are trying to find the average location of your shapes across time. For this kind of task I would advise calculating the centre point of your shape: that way, no matter the shape, you have one point in the middle of the object to track across time. That makes more sense than trying to track every corner of an object which, I assume, can rotate.
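A minimal sketch of that centre-point idea, using the toy array from the question (KMeans and n_clusters=2 are only placeholders):

import numpy as np
from sklearn.cluster import KMeans

a = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
              [[1, 4, 7], [2, 5, 9], [3, 6, 8]]])   # (n_frames, n_vertices, 3)

centroids = a.mean(axis=1)                          # one (x, y, z) centre point per frame
labels = KMeans(n_clusters=2, n_init=10).fit_predict(centroids)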
Hope it helps!
I am trying to write a function that extracts a 2D slice in a non-orthogonal plane from a 3D volume using numpy. The non-orthogonal slice obtained should be a rectangular two-dimensional array of shape (n, m), while the input volume should be a three-dimensional numpy array of shape (i, j, k).
So far I have tried to create a function that receives the volume, the plane normal and a point that belongs to the plane as inputs. I'm representing the plane normal and the point with numpy arrays of shape (3,). I am quite certain the function should follow these steps:
1. The function should first create a grid with the indices of the volume coordinates.
2. The function should define the plane using the dot product of the normal and the point.
3. The function should find the coordinates of the bounding box that contains the entire image slice. Note that, except for specific edge cases where one of the normal coefficients is 0, most bounding boxes will end up with corners that have some coordinates lying outside the image.
4. The function should interpolate the slice points from the bounding box using the volume, since some of the coordinates covered by the slice may not be integers and thus do not actually exist in the image.
5. The non-orthogonal slice obtained from interpolation should then be returned.
I am stuck at step 3. I have gotten steps 4 and 5 to work using orthogonal planes, which are easy to obtain using numpy slicing, but I have been stuck for days trying to get my head around how to find the bounding box coordinates (even though I do not think this should be a hard problem). Any help would be greatly welcome!
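For what it's worth, one possible way to attack step 3 is to intersect the plane with the twelve edges of the volume's bounding cube and take the min/max of the hit points; a rough sketch under that assumption (slice_bounding_box is a made-up helper name):

import itertools
import numpy as np

def slice_bounding_box(shape, normal, point):
    # corners of the axis-aligned box enclosing the plane/volume intersection
    lo = np.zeros(3)
    hi = np.array(shape, dtype=float) - 1
    corners = np.array(list(itertools.product(*zip(lo, hi))))   # 8 cube corners
    hits = []
    for a, b in itertools.combinations(corners, 2):
        if np.count_nonzero(a != b) != 1:        # keep only the 12 cube edges
            continue
        d = b - a
        denom = np.dot(normal, d)
        if np.isclose(denom, 0.0):               # edge parallel to the plane
            continue
        t = np.dot(normal, point - a) / denom
        if 0.0 <= t <= 1.0:
            hits.append(a + t * d)
    if not hits:                                  # the plane misses the volume entirely
        return None
    hits = np.array(hits)
    return hits.min(axis=0), hits.max(axis=0)

The returned min/max corners could then be used to build the grid of sample locations for step 4.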
I have a thresholded image of a moire pattern that looks like this (moire lines are a little bit inclined):
I want to find the x coordinate of the darkest region in the pattern (i.e., the x coordinate where in theory there is perfect overlap between two vertical lines).
By visual inspection I can easily approximate the expected x coordinate but I am not sure how to do it more precisely using computer vision automatic techniques and algorithms.
My first idea was to reduce the matrix to a vector by treating the matrix rows as a set of 1D vectors and averaging them until a single row is obtained. Then I plot this 1D array and hopefully find the x coordinate of the global minimum, which corresponds to the overlap region (see reduced_OpenCV).
import cv2
from matplotlib import pyplot as plt
reduced = cv2.reduce(threshold, 0, cv2.REDUCE_AVG)   # threshold is the binarized image
plt.plot(reduced[0, :])
The obtained signal is:
The results don't look good and I don't know how to proceed from here. Any ideas on how to process the signal and extract the x coordinate are more than welcome. New ideas on how to approach this problem are also welcome.
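One possible next step, assuming the reduced profile from the snippet above: smooth it before taking the minimum, so that single noisy columns do not dominate (sigma is a placeholder to be tuned to the line spacing):

import numpy as np
from scipy.ndimage import gaussian_filter1d

profile = reduced[0, :].astype(float)
smooth = gaussian_filter1d(profile, sigma=15)
x_darkest = int(np.argmin(smooth))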
Recently I was struggling to take the pixel values of a 3D volume (a numpy array) using the spatial coordinates of an STL object.
The STL object spatially overlaps the 3D volume, but the latter has no coordinates, so I don't know how to pick the pixel values corresponding to the STL coordinates.
Any idea?
If the STL object is truly in the 3D volume's coordinate space, then you can simply use the STL's coordinates as indices to look up values in the 3D array. This lookup does nearest-neighbour interpolation of the 3D image. For better-looking results you'd want to do linear (or even cubic) interpolation of the nearby pixels.
In most 3D imaging tasks those coordinate spaces do not align, so there is a transform to go from world space to 3D volume space. But if all you have is a 3D numpy array, then there is no transformation information.
Update:
To index into the 3D volume, take the X, Y, Z coordinates of your point from the STL object and convert them into integer values I, J, K. Then look up the value in the numpy array using I, J, K as indices: np_array[K][J][I]. I think you have to reverse the order of the indices because of the array ordering numpy uses.
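A small sketch of both lookups; volume and xyz are stand-ins here for the 3D array and one STL vertex already expressed in voxel units (map_coordinates does the linear interpolation mentioned above):

import numpy as np
from scipy.ndimage import map_coordinates

volume = np.random.rand(64, 64, 64)          # stand-in for the real 3D array, indexed [K][J][I]
xyz = np.array([10.3, 20.7, 5.1])            # stand-in STL vertex (X, Y, Z) in voxel units

# nearest-neighbour lookup: round and reverse the order of the indices
i, j, k = np.round(xyz).astype(int)
nearest_value = volume[k, j, i]

# linear interpolation of the neighbouring voxels
linear_value = map_coordinates(volume, [[xyz[2]], [xyz[1]], [xyz[0]]], order=1)[0]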
When you say the 3D array and the STL align in Python, how are you showing that? The original DICOM or NIfTI certainly has world-coordinate transformations in its metadata.
The code below transforms a detected 2D image point to its 3D location on a defined planar grid in the 3D world.
This means Z = 0; given that the extrinsics and intrinsics are known, we can compute the corresponding 3D point of the detected 2D image point:
import cv2
import numpy as np

# load extrinsics & intrinsics
with np.load('parameters_cam1.npz') as X:
    mtx, dist = [X[i] for i in ('mtx', 'dist')]
with np.load('extrincic.npz') as X:
    rvecs1, tvecs1 = [X[i] for i in ('rvecs1', 'tvecs1')]

# prepare rotation matrix
R_mtx, jac = cv2.Rodrigues(rvecs1)

# prepare projection matrix P = K [R | t]
Extrincic = cv2.hconcat([R_mtx, tvecs1])
Projection_mtx = mtx.dot(Extrincic)

# delete the third column since Z = 0
Projection_mtx = np.delete(Projection_mtx, 2, 1)

# find the inverse of the resulting 3x3 matrix
Inv_Projection = np.linalg.inv(Projection_mtx)

# detected image point (extracted from a queue)
img_point = np.array(pts1_blue[0], dtype=float).reshape(2, 1)

# append a 1 so the math works out (homogeneous coordinates)
img_point = np.vstack((img_point, [[1.0]]))

# calculate the 3D point, which lies on the Z = 0 plane
point_3d = Inv_Projection.dot(img_point)

# show results
print('3D_pt_method1\n', point_3d)
#output
3D_pt_method1
[[0.01881387]
[0.0259416 ]
[0.04150276]]
By normalizing the point (dividing by the third value) the result is:
X_World = 0.45331611680765327   # 45.3 cm from the defined world point, which is correct
Y_World = 0.6250572251098481    # 62.5 cm, which is also correct
By evaluating the results, it turns out that they are correct.
I know that we can't retrieve the Z coordinate of the 3D world point, since depth information is lost going from 3D to 2D. The following equation also performs the inverse projection of a 2D point into the 3D world and can be found throughout the literature; the result is an equation that represents a line on which the 3D world point must lie.
I put equation 3.15 into code, but without setting Z = 0, that is, without deleting the third column of the projection matrix as I did in the previous method (just as the equation is written), by doing the following:
# invert the rotation matrix
INV_R = np.linalg.inv(R_mtx)
# invert the camera matrix
INV_k = np.linalg.inv(mtx)
# multiply the two matrices
kinv_Rinv = INV_k.dot(INV_R)
# calculate the 3D point X expressed in eq. 3.15
point_3d_2 = kinv_Rinv.dot(img_point) + tvecs1
# print the results
print('3D_pt_method2\n', point_3d_2)
and the result was
3D_pt_method2 #how should one understand these coordinates ?
[[-9.12505825]
[-5.57152147]
[40.12264881]]
My question is: how should I understand or interpret this result? It doesn't make any sense compared to the previous method where Z = 0. The resulting 3x1 vector seems to suggest that its values are simply the 3D X, Y and Z of the detected image point. However, this is not true if we compare X and Y with the previous method!
So what exactly is the difference between 3D_pt_method1 and 3D_pt_method2?
I hope I have expressed myself clearly, and I would really appreciate help understanding the difference between the two implementations!
Note: the grid that represents my defined world plane can be seen in the image below, in which the distance between every two yellow points is 40 cm.
Thanks in advance
You are missing the key variable "w" (the homogeneous scale factor) in method 2.
You can get help from this article: https://blog.csdn.net/zhou4411781/article/details/103876478
The article is written in Chinese, but you can still get the point from its formulas even if you cannot read Chinese.
Simply speaking:
You said it right: "I know that we can't retrieve the Z coordinate of the 3D world point since depth information is lost going from 3D to 2D."
This also means: if you know the depth (the Z value in world coordinates), you can recover the 3D coordinates from the 2D coordinates and the depth. Likewise, if you know the X or Y value in world coordinates, you can also get the result.
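To make that concrete, here is a rough sketch (my own illustration, not code from the article) of how the missing scale can be recovered when the world Z of the point is known, reusing mtx, R_mtx, tvecs1 and img_point from the question; with Z = 0 the X and Y should match method 1:

import numpy as np

# mtx, R_mtx, tvecs1, img_point as defined in the question's first snippet
Z_known = 0.0                                   # the point lies on the world plane Z = 0

ray = np.linalg.inv(R_mtx).dot(np.linalg.inv(mtx)).dot(img_point)   # R^-1 K^-1 [u, v, 1]^T
offset = np.linalg.inv(R_mtx).dot(tvecs1)                           # R^-1 t

# the world point is X = s * ray - offset; choose s so its Z component equals Z_known
s = (Z_known + offset[2, 0]) / ray[2, 0]
world_point = s * ray - offset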
I am doing my best to replicate the algorithm described in this paper to build an inpainting algorithm. The idea is to get the contour or edge points of the part of the image that needs to be inpainted. To find the most linear point in the region, the orthogonal normal vector is found. On page 6, a short description of the implementation is given:
"In our implementation the contour δΩ of the target region is modelled as a dense list of image point locations. Given a point p ∈ δΩ, the normal direction n_p is computed as follows: i) the positions of the 'control' points of δΩ are filtered via a bi-dimensional Gaussian kernel and, ii) n_p is estimated as the unit vector orthogonal to the line through the preceding and the successive points in the list."
So it appears that I need to put all these points through a Gaussian filter. How do I set up a bi-dimensional Gaussian filter when I have a single-dimensional list of points?
Let's say our contour is a box shape; then I create a one-dimensional list of points: [1,1],[1,2],[1,3],[2,1],[2,3],[3,1],[3,2],[3,3]. Do I simply make a new 2D matrix, put the points in, leave the middle entry at [2,2] empty, and then run a Gaussian filter on it? That doesn't seem very dense, though.
I am trying to run this through python libraries.
"a dense list of image points"
is simply a line.
You are basically applying a Gaussian filter to a black-and-white image where the line is black and the background is white, from what I understand. I think that, by doing this, they approximate curve model fitting.
Convolve all of the points in the 2D region surrounding the point and then overwrite the point with the result.
This makes any curve on the edge of the target region less sharp, lowering the noise in the calculation of the normal, which is the vector orthogonal to the line through the two points that surround the current one.
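For what it's worth, one way to read the paper's step in code is to filter the x and y coordinate sequences of the contour with a 1D Gaussian and then take the perpendicular of the line through each point's neighbours; a sketch under that reading (note the points must be ordered along the contour, unlike the row-major list in the question):

import numpy as np
from scipy.ndimage import gaussian_filter1d

# the box contour from the question, reordered so consecutive points are neighbours on the boundary
contour = np.array([[1, 1], [1, 2], [1, 3], [2, 3], [3, 3], [3, 2], [3, 1], [2, 1]], dtype=float)

# i) smooth the control points; mode='wrap' because the contour is closed
xs = gaussian_filter1d(contour[:, 0], sigma=1.0, mode='wrap')
ys = gaussian_filter1d(contour[:, 1], sigma=1.0, mode='wrap')
smoothed = np.column_stack([xs, ys])

# ii) normal at each point: unit vector orthogonal to the line through its predecessor and successor
tangent = np.roll(smoothed, -1, axis=0) - np.roll(smoothed, 1, axis=0)
normals = np.column_stack([-tangent[:, 1], tangent[:, 0]])
normals /= np.linalg.norm(normals, axis=1, keepdims=True)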