Convert Azure Kinect JSON output to BVH - Python

My current process goes like this:
I used the k4arecorder that comes with the Azure Kinect SDK v1.4.1 to record two MKV videos.
In the first, the person is in a T-pose and in the second, they are doing some motion.
I then used the offline_processor from Microsoft/Azure-Kinect-Samples/body-tracking-samples to convert these two videos into JSON objects. For each frame, the JSON contains the x, y, z position of each joint (where z is relative to the camera and y+ points downwards) as well as the quaternion orientation of each joint.
For the T-pose JSON object, I extracted one frame of positions and rotations where the T-pose was perfect. I then parsed this JSON object into two pandas dataframes of positions and orientations. The orientations were converted into Euler angles.
For the second 'motion' JSON object, I parsed it into two more pandas dataframes. In the first (positions), each frame is a row and the columns are of the form
joint_1.x, joint_1.y, joint_1.z ... joint_n.x, joint_n.y, joint_n.z
In the orientation dataframe, each row is also a frame and the columns are of the form
joint_1.z, joint_1.y, joint_1.x ... joint_n.z, joint_n.y, joint_n.x
What I want to know is this:
How can I go from these four matrices, where all of the coordinates are in global space, to a BVH file? I've tried a number of solutions, but all have failed.
I'm missing some fundamental logic in this process, and if anybody can help, I would really appreciate it. Any code solutions in any language are also appreciated.

I've had some success with this.
You have to express the global orientation of each joint relative to (i.e. local to) its parent. But what does that actually mean?
Before you convert to the final Euler angles, you should first convert to rotation matrices. The best tool I found for this is scipy.spatial.transform.Rotation.
First find the parent's rotation matrix, then take its inverse, and then perform a dot product between that inverse and the child joint's matrix (a minimal sketch is given at the end of this answer).
This will give you the local orientation of the child joint. However, you're not out of the woods yet: this approach works for the legs, spine and hips, but results in really messed-up arms.
I suspect it's related to the Azure Kinect's joint tracking orientation. See how the axes shift when they get to the arms.
I don't really know enough about rotations to solve this final issue. Everything gets messed up from the clavicles onwards to the hands. Any suggestions would be great.
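To make the inverse/dot-product step concrete, here is a minimal sketch of how the local orientation of one joint could be computed with scipy. The joint names, the quaternion values, and the (x, y, z, w) component order are illustrative assumptions, not values taken from the Azure Kinect output above.

import numpy as np
from scipy.spatial.transform import Rotation as R

# Hypothetical quaternions in (x, y, z, w) order, as scipy expects.
# The Azure Kinect body tracking SDK reports quaternions as (w, x, y, z),
# so reorder the components first if that is what your JSON contains.
parent_quat = np.array([0.0, 0.0, 0.0, 1.0])    # e.g. pelvis, global orientation
child_quat  = np.array([0.0, 0.1, 0.0, 0.995])  # e.g. spine_navel, global orientation

parent = R.from_quat(parent_quat)
child = R.from_quat(child_quat)

# Local orientation of the child = inverse(parent_global) composed with child_global,
# which is the inverse/dot-product step described above.
local = parent.inv() * child
local_matrix = local.as_matrix()        # rotation matrix, if you need it

# Convert to Euler angles for the BVH channels; the channel order written in the
# BVH header must match the order passed here ('zyx' is just an example).
euler_deg = local.as_euler('zyx', degrees=True)
print(euler_deg)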

Hands are very complicated because the hierarchy is pretty long. Use the normal rotations from the spine-chest joint to transform the arm motion, then use zyx as the order of rotation.

Related

How do you detect if there is motion between frames using OpenCV without simply subtracting the frames?

I have a camera in a fixed position looking at a target and I want to detect whether someone walks in front of the target. The lighting in the scene can change, so subtracting the new frame from the previous frame would detect motion even though none has actually occurred. I have thought of comparing the number of contours between the two frames (obtained by using findContours() on a binary edge image produced with Canny, then taking size() of the result), since a big change here could denote movement while also being less sensitive to lighting changes. I am quite new to OpenCV and my implementations have not been successful so far. Is there a way I could make this work, or will I have to just subtract the frames? I don't need to track the person, just detect whether they are in the scene.
I am a bit rusty but there are various ways to do this.
SIFT and SURF are very expensive operations, so I don't think you would want to use them.
There are several 'background removal' methods:
1. Average removal: take the average of N frames and treat it as the background. This is vulnerable to many things: light changes, shadows, a moving object staying in one location for a long time, etc.
2. Gaussian Mixture Model: a bit more advanced than 1. Still vulnerable to a lot of things (a minimal sketch using OpenCV's implementation is shown below).
3. IncPCP (incremental principal component pursuit): I can't remember the algorithm in full, but the basic idea is to decompose each frame into a background part plus a sparse part and extract the moving objects from the sparse matrix.
4. Optical flow: you find the change across the temporal domain of a video. For example, you compare frame 2 with frame 1 block by block and determine the direction of change.
5. CNN-based methods: I know there are a bunch of them, but I haven't really followed them. You might have to do some research. As far as I know, they are often better than the methods above.
Notice that for 30 FPS your code should complete within about 33 ms per frame if it is to run in real time. You can find a lot of code available for this task.
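As a concrete illustration of the Gaussian Mixture Model approach (method 2 above), here is a minimal sketch using OpenCV's MOG2 background subtractor. The video path and both thresholds are placeholder assumptions to tune for your own scene.

import cv2

cap = cv2.VideoCapture("camera_feed.mp4")          # hypothetical input source
subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                varThreshold=16,
                                                detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                 # 255 = foreground, 127 = shadow
    mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]  # drop shadow pixels
    moving_pixels = cv2.countNonZero(mask)
    if moving_pixels > 0.01 * mask.size:           # arbitrary "someone is there" threshold
        print("motion detected")

cap.release()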
There are a handful of ways you could do this.
The first that comes to mind is doing a 2D FFT on the incoming images. Color shouldn't affect the FFT too much, but an object moving or entering/exiting the frame will.
The second is to use SIFT or SURF to generate a list of features in an image. You can insert these points into a map, sorted however you like, then take the set difference between the last image you took and the current image. You could also use the FLANN functionality to compare the generated features.
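A minimal sketch of the 2D-FFT idea, assuming two same-sized grayscale frames read from disk; the file names and the threshold are placeholders to calibrate on your own footage.

import cv2
import numpy as np

def spectrum(gray):
    # Log-magnitude of the 2D FFT of a grayscale frame.
    f = np.fft.fftshift(np.fft.fft2(gray.astype(np.float32)))
    return np.log1p(np.abs(f))

# Hypothetical frames; in practice these would come from cv2.VideoCapture.
prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Compare the spectra rather than raw pixels: a uniform brightness offset shows
# up mainly in the DC term, while a new object changes the rest of the spectrum.
diff = np.mean(np.abs(spectrum(curr) - spectrum(prev)))
if diff > 0.05:          # arbitrary threshold to calibrate on your own footage
    print("scene content changed")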

Python Implementation for creating a triangular mesh from an array of closed loop planar contours

I'm a wee bit stuck.
I have a 3D point cloud (an array of (n, 3) vertices) from which I am trying to generate a 3D triangular mesh. So far I have had no luck.
The format my data comes in:
(x,y) values in regularly spaced (z) intervals. Think of the data as closed loop planar contours stored slice by slice in the z direction.
The vertices in my data must be absolute positions for the mesh triangles (i.e. I don't want them to be smoothed out such that the volume begins to change shape, but linear interpolation between the layers is fine).
Illustration:
Z=2. : ..x-------x... <- Contour 2
Z=1.5: ...\......|... <- Join the two contours into a mesh.
Z=1. : .....x----x... <- Contour 1
Repeat for n slices, end up with an enclosed 3D triangular mesh.
Things I have tried:
Using Open3D:
The rolling ball (ball pivoting) method can only get about 75% of the mesh completed and leaves large areas incomplete (despite a range of ball sizes). It has particular problems at the top and bottom slices, where there tend to be large gaps in the middle (i.e. a flat face).
The Poisson reconstruction method smooths out the volume too much, so I no longer have an accurate representation of the volume. This occurs at all depth settings from 3 to 12.
CGAL:
I cannot get this to work for the life of me. SWIG is not very good, and the CGAL implementation that uses SWIG is also not very good.
There are two PyBind implementations of CGAL; however, they have not incorporated the 3D triangulation libraries from CGAL.
I have also explored other modules like PyMesh, TriMesh, TetGen, Scikit-Geometry, Shapely, etc. I may have missed the answer somewhere along the line.
Given that my data is a list of closed-loop planar contours, it seems as though there must be some simple solution to just "joining" adjacent slice contours into one big 3D mesh, kind of like you would in Blender.
There are non-Python solutions (like MeshLab) that may well solve these problems, but I require a Python solution. Does anyone have any ideas? I've had a bit of a look into VTK and ITK but haven't found exactly what I'm looking for as of yet.
I'm also starting to consider that maybe I can interpolate intermediate contours between slices, and fill the contours on the top and bottom with vertices to make the data a bit more "pivot ball" method friendly.
Thank you in advance for any help, it is appreciated.
If there is a good way of doing this that isn't coded yet, I promise to code it and make it available for people in my situation :)
Actually there are two ways of getting MeshLab functionality in Python:
The first is MeshLabXML (https://github.com/3DLIRIOUS/MeshLabXML ), a third-party Python interface to the MeshLab scripting interface.
The second is PyMeshLab (https://github.com/cnr-isti-vclab/PyMeshLab ), an ongoing effort by the MeshLab authors (currently in alpha stage) to provide direct Python bindings to all the MeshLab filters.
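For reference, here is a minimal sketch of driving MeshLab filters from Python via PyMeshLab. The file names are placeholders, and the exact filter name may differ between PyMeshLab releases (check the filter list in the documentation for your version); screened Poisson is used purely as an example and will smooth the volume, as discussed above.

import pymeshlab

ms = pymeshlab.MeshSet()
ms.load_new_mesh("contour_points.ply")      # your (n, 3) vertices as a point cloud

# Example filter call; the filter name is an assumption for the installed version.
ms.apply_filter("surface_reconstruction_screened_poisson", depth=8)

ms.save_current_mesh("reconstructed.ply")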
There is a very neat paper titled "Technical Note: an algorithm and software for conversion of radiotherapy contour-sequence data to ready-to-print 3D structures" in the Journal of Medical Physics that describes this problem quite nicely. No special 3D packages are required; however, it is more easily implemented with numpy.
A useful excerpt is provided:
...
The number of slices (2D contours) constituting the specified structure is determined.
The number of points in each slice is determined.
Cartesian coordinates of each of the points in each slice are extracted and stored within dedicated data structures...
Numbers of points in each slice (curve) are re-arranged in such a way that the starting points (points with indices 0) are the closest points between the subsequent slices. Renumeration starts at point 0, slice 0 (slice with the lowest z coordinate).
Orientation (i.e., the direction determined by the increasing indices of points with relation to the interior/exterior of the curve) of each curve is determined. If differences between slices are found, numbering of points in non-matching curves (and thus, orientation) is reversed.
The lateral surface of the considered structure is discretized. Points at the neighboring layers are arranged into threes, constituting triangular facets for the STL file. For each triangle the closest points with the subsequent indices from each layer are connected.
Lower and upper base surfaces of the considered structure are discretized. The program iterates over every subsequent three points on the curve and checks if they belong to a convex part of the edge. If yes, they are connected into a facet, and the middle point is removed from further iterations.
So basically it's a problem of re-indexing the points in each slice so that the starting points of adjacent slices are as close as possible, then aligning the orientation (winding direction) of each contour, and then joining the points between two layers based on distance (see the sketch below).
The paper also provides code to do this (for a DICOM file); however, I re-wrote it myself and it works a charm.
I hope this helps others! Make sure you credit the authors in any work you do that uses this.
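To make the lateral-surface step concrete, here is a minimal numpy sketch of stitching two adjacent contours into triangles. It assumes both contours are already consistently oriented (the winding-direction step above) and is written from the paper's description as an illustration, not the paper's own code.

import numpy as np

def stitch_slices(lower, upper):
    # lower, upper: (m, 3) arrays of ordered vertices of two closed planar contours.
    # Re-index the upper contour so its point 0 is closest to lower[0].
    start = np.argmin(np.linalg.norm(upper - lower[0], axis=1))
    upper = np.roll(upper, -start, axis=0)

    triangles = []
    i = j = 0
    n, m = len(lower), len(upper)
    # Walk around both rings, always advancing on the side that gives the
    # shorter connecting edge, emitting one triangular facet per step.
    while i < n or j < m:
        adv_lower = np.linalg.norm(lower[(i + 1) % n] - upper[j % m])
        adv_upper = np.linalg.norm(upper[(j + 1) % m] - lower[i % n])
        if (i < n and adv_lower <= adv_upper) or j >= m:
            triangles.append((lower[i % n], lower[(i + 1) % n], upper[j % m]))
            i += 1
        else:
            triangles.append((lower[i % n], upper[(j + 1) % m], upper[j % m]))
            j += 1
    return np.array(triangles)   # (k, 3, 3) array, one row per STL facet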
A recent feature of pymadcad can do things like this. I'm not sure, though, whether it fits your exact expectations in terms of the "pivot ball" method and such; check out the docs for blending.
Starting from a list of outlines, it can generate blended surfaces to join them:
For your purpose, I guess the best is one of:
blendpair(line1, line2)
junction(*lines)
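A hedged sketch of how this might look; the import path, the Wire/vec3 construction and the way blendpair consumes the outlines are assumptions to verify against the pymadcad blending documentation mentioned above.

# Sketch only: check the pymadcad docs for the exact types blendpair expects.
from madcad import Wire, vec3, blendpair

# Two hypothetical closed contours at z = 1 and z = 2, as ordered point loops.
contour_a = Wire([vec3(x, y, 1) for x, y in [(0, 0), (1, 0), (1, 1), (0, 1)]])
contour_b = Wire([vec3(x, y, 2) for x, y in [(0, 0), (1, 0), (1, 1), (0, 1)]])

# blendpair(line1, line2) joins the two outlines with a blended surface;
# repeating this for each adjacent pair of slices gives the lateral surface.
surface = blendpair(contour_a, contour_b)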

remove inside information when merging two 3d objects

Hi, I'm currently working on a project where we have to combine multiple 3D objects, for example placing them next to each other, and sometimes they also intersect.
I'm looking for an algorithm/library or any idea that would reduce this new merged object to only consist of the outside faces. (Our 3D objects are currently .stl files, but we are not bound to this format.)
We've tried combining these objects with numpy-stl, but it seems like this library does not have any optimisation that would help with this problem. We also tried using the boolean merge from PyMesh, but this takes a very long time with detailed objects.
We want to lose all information that is inside the object and only keep the information that is outside. So, for example, if you were to put this combined 3D object in water, we only want the faces that would be touched by the water.
We prefer Python, but any algorithm that could be implemented in Python would help us move forward.
We appreciate every answer :)
LibIGL appears to have Python bindings. I would suggest thresholding the ambient occlusion of each facet; for example, maybe delete all facets with an occlusion value higher than 0.8.
https://libigl.github.io/libigl-python-bindings/igl_docs/#ambient_occlusion
The inputs to this function are the vertices, the facet indexing into the vertices, the position of the facet centroids, and the normals for each facet. The output is the ambient occlusion for each facet, which is a value between 0 and 1. A value of 0 means the facet is fully visible, and a value of 1 means it is completely shadowed.
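A minimal sketch of that thresholding idea with the libigl Python bindings; the file names, the sample count and the 0.8 cutoff are placeholder assumptions.

import igl
import numpy as np

v, f = igl.read_triangle_mesh("merged.stl")

centroids = v[f].mean(axis=1)                                    # one point per facet
normals = igl.per_face_normals(v, f, np.array([0.0, 0.0, 1.0]))  # per-facet normals

# Occlusion in [0, 1]: 0 = fully visible from outside, 1 = completely shadowed.
ao = igl.ambient_occlusion(v, f, centroids, normals, 512)

outside = f[ao < 0.8]          # keep only the facets the "water" could touch
igl.write_triangle_mesh("outside_only.stl", v, outside)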

Separating Meshes with vtkPolyDataConnectivityFilter

I am having a hard time using VTK, especially the vtkPolyDataConnectivityFilter. I feed in the output of a Marching Cubes algorithm that created a surface from a 3D point cloud.
However, when I run the following,
filt = vtk.vtkConnectivityFilter()
filt.SetInputData(surface_data) # get the data from the MC alg.
filt.SetExtractionModeToLargestRegion()
filt.ColorRegionsOn()
filt.Update()
filt.GetNumberOfExtractedRegions() # returns 53 instead of 1
it gives me weird results. I cannot use the extraction modes for specific regions or seed a single point, since I don't know them in advance.
I need to separate the points of the largest mesh from the smaller ones and keep only the large mesh.
When I render the whole output, it shows me the right extracted region. However, the other regions are still contained in the dataset and there is no way to separate them.
What am i doing wrong?
Best J
I had the same problem where I had to segment an STL file converted to vtkpolydata.
If you look at the example https://www.vtk.org/Wiki/VTK/Examples/Cxx/PolyData/PolyDataConnectivityFilter_SpecifiedRegion , you will find they use the member function SetExtractionModeToSpecifiedRegions().
Replace your code with the following:
filt.SetInputData(surface_data)
filt.SetExtractionModeToSpecifiedRegions()
filt.AddSpecifiedRegion(0)  # manually increment from 0 up to filt.GetNumberOfExtractedRegions()
filt.Update()
You will need to render and view the specified region to figure out the index of the segmented region you're actually interested in.
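Building on that, here is a hedged sketch of how the loop over regions might look if you want to keep only the largest region programmatically. surface_data is assumed to be the marching-cubes vtkPolyData from the question, and the vtkCleanPolyData step is what actually removes the points belonging to the other regions.

import vtk

conn = vtk.vtkPolyDataConnectivityFilter()
conn.SetInputData(surface_data)
conn.SetExtractionModeToAllRegions()
conn.Update()
n_regions = conn.GetNumberOfExtractedRegions()

largest, largest_cells = None, -1
for i in range(n_regions):
    conn.SetExtractionModeToSpecifiedRegions()
    conn.InitializeSpecifiedRegionList()
    conn.AddSpecifiedRegion(i)
    conn.Update()

    # Clean away unused points: without this, the output of the connectivity
    # filter still carries the points of the other regions.
    clean = vtk.vtkCleanPolyData()
    clean.SetInputData(conn.GetOutput())
    clean.Update()

    region = vtk.vtkPolyData()
    region.DeepCopy(clean.GetOutput())
    if region.GetNumberOfCells() > largest_cells:
        largest, largest_cells = region, region.GetNumberOfCells()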

Python OpenCV stereo camera position

I'd like to determine the position and orientation of a stereo camera relative to its previous position in world coordinates. I'm using a Bumblebee XB3 camera, and the motion between stereo pairs is on the order of a couple of feet.
Would this be on the correct track?
Obtain rectified image for each pair
Detect/match feature points between the rectified images
Compute Fundamental Matrix
Compute Essential Matrix
Thanks for any help!
Well, it sounds like you have a fair understanding of what you want to do! Having a pre-calibrated stereo camera (like the Bumblebee) will then deliver up point-cloud data when you need it - but it also sounds like you basically want to use the same images to perform visual odometry (certainly the correct term) and provide absolute orientation from a last known GPS position when the GPS breaks down.
First things first - I wonder if you've had a look at the literature for some more ideas: As ever, it's often just about knowing what to google for. The whole idea of "sensor fusion" for navigation - especially in built up areas where GPS is lost - has prompted a whole body of research. So perhaps the following (intersecting) areas of research might be helpful to you:
Navigation in 'urban canyons'
Structure-from-motion for navigation
SLAM
Ego-motion
Issues you are going to encounter with all these methods include:
Handling static vs. dynamic scenes (i.e. ones that change purely based on the camera motion - c.f. others that change as a result of independent motion occurring in the scene: trees moving, cars driving past, etc.).
Relating amount of visual motion to real-world motion (the other form of "calibration" I referred to - are objects small or far away? This is where the stereo information could prove extremely handy, as we will see...)
Factorisation/optimisation of the problem - especially with handling accumulated error along the path of the camera over time and with outlier features (all the tricks of the trade: bundle adjustment, RANSAC, etc.)
So, anyway, pragmatically speaking, you want to do this in python (via the OpenCV bindings)?
If you are using OpenCV 2.4 the (combined C/C++ and Python) new API documentation is here.
As a starting point I would suggest looking at the following sample:
/OpenCV-2.4.2/samples/python2/lk_homography.py
This provides a nice instance of basic ego-motion estimation from optic flow, using the function cv2.findHomography.
Of course, this homography H only applies if the points are co-planar (i.e. lying on the same plane under the same projective transform - so it'll work on videos of nice flat roads). BUT - by the same principle, we could use the Fundamental matrix F to represent motion in epipolar geometry instead. This can be calculated by the very similar function cv2.findFundamentalMat.
Ultimately, as you correctly specify above in your question, you want the Essential matrix E - since this is the one that operates in actual physical coordinates (not just mapping between pixels along epipoles). I always think of the Fundamental matrix as a generalisation of the Essential matrix in which the (inessential) knowledge of the camera intrinsic calibration (K) is omitted, and vice versa.
Thus, the relationships can be formally expressed as:
E = K'^T F K
So, you'll need to know something of your stereo camera calibration K after all! See the famous Hartley & Zisserman book for more info.
You could then, for example, use a function such as cv2.decomposeEssentialMat (or cv2.recoverPose) to decompose the Essential matrix and recover your R orientation and t displacement.
Hope this helps! One final word of warning: this is by no means a "solved problem" for the complexities of real world data - hence the ongoing research!
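Pulling the steps above together, here is a hedged sketch of one way the F-to-E-to-pose chain could look in Python. The intrinsics, image names, and the use of ORB features and cv2.recoverPose for the final decomposition are illustrative assumptions rather than the exact recipe described above.

import cv2
import numpy as np

K = np.array([[700.0, 0.0, 640.0],
              [0.0, 700.0, 480.0],
              [0.0, 0.0, 1.0]])              # assumed left-camera intrinsics

img1 = cv2.imread("left_t0.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("left_t1.png", cv2.IMREAD_GRAYSCALE)

# Detect and match features between the two time steps.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Fundamental matrix from the matches, then lift to the Essential matrix.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
E = K.T @ F @ K                              # E = K'^T F K, with K' = K for the same camera

# recoverPose gives R and a unit-length t; the stereo depth (or GPS) is what
# fixes the real-world scale of the translation.
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)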
