This is my first post here, so hello everyone.
I am working on a project that involves writing a program in C++ or Python that will detect obstacles and will be used on an AR.Drone 2.0. However, I don't know which approach I should take.
Initially, I was advised to use OpenCV and optical flow. I've found some videos and papers about it, and one way is this: divide every frame from the AR.Drone's camera into 2 parts (left/right) or 4 parts (additionally top and bottom) and calculate the optical flow for each part. Then, fly in the direction where the optical flow is smaller.
However I have some doubts about it:
1) Which method of optical flow calculation should I use? I know that OpenCV provides methods for calculating both dense and sparse optical flow. Which one should I choose for this application? Won't dense optical flow be too slow to meet real-time requirements?
2) I guess that when the UAV moves left-right or up-down, I'll get some "fake" vectors caused by the drone's own movement rather than by a looming obstacle. How can I prevent this?
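For illustration, the left/right flow comparison described above could be sketched as follows. This is only a minimal sketch under assumptions: the function name `steer_from_flow` and the `deadband` threshold are hypothetical, and the input is assumed to be a dense flow field of shape (H, W, 2), such as the one returned by `cv2.calcOpticalFlowFarneback`. It does not compensate for flow caused by the drone's own rotation.

```python
import numpy as np

def steer_from_flow(flow, deadband=0.1):
    """Pick a steering direction from a dense optical-flow field.

    flow:     array of shape (H, W, 2) with per-pixel (dx, dy) displacements
              (assumed to come from a dense optical-flow routine).
    deadband: hypothetical tuning value; relative left/right differences
              below it are treated as "balanced".
    """
    magnitude = np.linalg.norm(flow, axis=2)   # flow length per pixel
    half = flow.shape[1] // 2
    left = magnitude[:, :half].mean()          # average flow, left half
    right = magnitude[:, half:].mean()         # average flow, right half
    total = left + right
    if total == 0 or abs(left - right) / total < deadband:
        return "straight"
    # Larger flow usually means closer surfaces, so turn away from that side.
    return "right" if left > right else "left"
```

The same comparison extends to four regions by also splitting the frame into top and bottom halves.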
Another solution I was told about is a method shown here (link to the paper in the description), along with a GitHub link from someone who implemented it. However, the author admitted that he "never get obstacle detection working properly on the drone".
Another option I was told about is attaching a RealSense camera to the drone and somehow extracting information about obstacles from it.
So, my question is: which path should I take? Or is there some other method that will work for the application I described and is relatively easy to implement?
Thanks in advance for every reply.
I'm not sure of the scope of your project, or whether it is academic or professional, but my recommendation would be to use object detection of a control image with the camera facing directly forward on the drone. If the object is detected, you can estimate its distance from your drone based on its size. Since it is a control image, it should have a constant size, and you should record how many pixels across it is at various distances from your camera. This way you can build up a model. Once you know how far away the object is, you can determine whether it is an obstacle or not.
Once the detection becomes large enough, determine if it is in the flight path. Then you move the drone such that the coordinates of the detection box are no longer in your flight path.
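As a rough sketch of the "build up a model" step: under a pinhole-camera assumption, the apparent width in pixels is inversely proportional to distance, so a single constant k in distance = k / width_px can be fitted from the recorded measurements. The function names and the calibration numbers below are made up for illustration.

```python
def fit_scale(samples):
    """Least-squares fit of k in distance = k / width_px.

    samples: list of (distance, width_px) pairs recorded during calibration.
    """
    num = sum(d / w for d, w in samples)
    den = sum(1.0 / (w * w) for d, w in samples)
    return num / den

def estimate_distance(k, width_px):
    """Distance to the control image given its detected width in pixels."""
    return k / width_px

# Made-up calibration data: (distance in metres, detected width in pixels)
samples = [(1.0, 200.0), (2.0, 100.0), (4.0, 50.0)]
k = fit_scale(samples)
distance = estimate_distance(k, 80.0)  # width of a new detection
```

Real measurements will not follow the model exactly, so it is worth checking the fit against a few held-out distances.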
For the detection, you can either use Google's detection API, which comes with a number of solid detectors/classifiers, or, if you are looking to add a layer of depth to the project, you can train your own. PyImageSearch is a great place to start. And if you are feeling extra scientific, you can dive right into TensorFlow.
Best of luck!
Try the open source project https://github.com/generalized-intelligence/GAAS
It uses a stereo camera and SLAM to detect obstacles.
Related
I'm working on a machine learning application for reading data from fuel pumps. So far I've created a pretty robust YOLOv5 object detection model that can detect the regions I want fairly accurately. But there is a problem: at certain times of the day there are reflections on the digital screen, and I'm unable to use OpenCV to pre-process it so that I can extract the numbers from the display.
Check this Video to Understand (YOLOv5 Detection)
https://www.youtube.com/watch?v=3XjZ6Nw70j8
Minimal Reproducible Example
Cars come and go, and their reflections make it really difficult to differentiate between the regions for the digital-7 font that is used in these displays. You can check out the following repository to understand what I want as a result: https://github.com/arturaugusto/display_ocr
Other Solutions I'm Open to:
Since this application is going to run 24/7, how should I deal with different times of day?
Perhaps create a database of HSV ranges to extract at different times.
Would using a polarizing lens help in removing the reflections? (I'd like to hear from any users who have previous experience deploying them.)
Edit: I added the correct video ...
I intend to make a 3D model based on multi-view stereo images (basically 2D plane images of the same object from different angles and orientations) inside Blender from scratch. However, I am new to Blender.
I wanted to know if there are any tutorials on how to project a single pixel or point into the space of Blender's 3D environment using Python. If not a tutorial, then any documentation. I am still learning about this whole 3D construction thing and am pretty new to it, so I am not sure: maybe these points are displayed using a 3-dimensional matrix/array?
Basically I want to implement 3D construction based on a paper written by some researchers. Mostly every such project is in C++. I want to do it in Python in Blender, and if I am capable enough, make these libraries open source.
Please suggest any prerequisites that you think would help me. I have just started the 3rd year of my BSc Computer Science course, and I am very new to the world of computer graphics.
(My skillset is C, Java and Python.)
I would be very glad and appreciate any help.
Thank You
Link to website: https://vision.in.tum.de/research/image-based_3d_reconstruction/multiviewreconstruction
Yes, it can very likely be done in Blender, and in Python at least for small geometries / low resolution.
A valid approach for the kind of scenarios you seem to want to play with is based on the idea of "space carving" or "silhouette projection". A good description is in an old paper by Kutulakos and Seitz, which was based in part on earlier work by Szeliski.
Given a good estimation of the silhouettes, these methods can correctly reconstruct all convex portions of the object's surface, and the subset of concavities that are resolved in the photo hull. The remaining concavities are "patched" over and need to be reconstructed using a different method (e.g. stereo, or structured light). For the surfaces that can be reconstructed, space carving is generally more robust than stereo (since it is insensitive to the color and surface texture of the object), and can work on surfaces where structured light struggles (e.g. surfaces with specularities, or very dark objects with low reflectance for a laser stripe).
The basic idea is to use the silhouettes of the projection of the object in cameras around it to "remove" mass from an initial volume (e.g. a box) encompassing the object, a bit like a sculptor carving a statue by removing material from a block of marble.
Computationally, you can do it by representing the volume of space of interest using an octree, initialized with a minimal level of subdivision, and then progressively refined. The refinement consists of projecting the vertices of the octree leaves in the cameras, and identifying which leaves are completely outside or partially inside the silhouettes. The former are pruned, while the latter are split, and the process continues until no more leaves can be split or a maximum level of subdivision is reached. The hull of the octree is then extracted as a "watertight" mesh using standard methods.
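To make the carving idea concrete, here is a minimal NumPy sketch. It is a simplification, not the octree scheme described above: it uses a fixed set of voxel centres with no refinement, and it assumes every voxel lies in front of every camera. The function name `carve` and its argument layout are assumptions made for this illustration.

```python
import numpy as np

def carve(voxels, cameras, silhouettes):
    """Keep only voxel centres whose projection lies inside every silhouette.

    voxels:      (N, 3) array of voxel-centre coordinates.
    cameras:     list of 3x4 projection matrices, one per view.
    silhouettes: list of boolean images (True = object), one per view.
    """
    keep = np.ones(len(voxels), dtype=bool)
    homog = np.hstack([voxels, np.ones((len(voxels), 1))])
    for P, mask in zip(cameras, silhouettes):
        proj = homog @ P.T                           # (N, 3) homogeneous pixels
        u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
        v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
        h, w = mask.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(voxels), dtype=bool)      # off-image counts as a miss
        hit[inside] = mask[v[inside], u[inside]]
        keep &= hit                                  # carve away the misses
    return voxels[keep]
```

An octree version would apply the same inside/outside test to the corners of each leaf and split only the leaves that straddle a silhouette boundary, which is what keeps the approach affordable at higher resolutions.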
Apart from the above paper, a far more detailed description can be found in an old patent by Geometrix, which sold a scanner based on the above ideas around the year 2000. Here is what it looked like:
I thought about tackling a new project in which I use the TensorFlow Object Detection API to detect Euro pallets (e.g. pic).
My ultimate goal is to know how far away I am from the pallet and what my position is relative to it. So I thought about first detecting the Euro pallet in an RGB feed from a Kinect camera and then using its 3D feature to get the distance to the pallet.
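One possible way to read a distance out of the depth image once a bounding box is available is sketched below. It assumes the depth image is already registered (pixel-aligned) to the RGB image and that a value of zero marks pixels with no depth reading; the function name `box_distance` is hypothetical.

```python
import numpy as np

def box_distance(depth, box):
    """Median depth inside a bounding box, ignoring invalid (zero) pixels.

    depth: (H, W) depth image aligned to the RGB image (assumption).
    box:   (x_min, y_min, x_max, y_max) in pixels, max values exclusive.
    Returns None if the box contains no valid depth reading.
    """
    x0, y0, x1, y1 = box
    patch = depth[y0:y1, x0:x1]
    valid = patch[patch > 0]           # drop pixels without a measurement
    return float(np.median(valid)) if valid.size else None
```

The median is used rather than the mean so that background pixels visible through the gaps in the pallet have less influence on the result.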
But how do I go about the relative position of the pallet? I could create different classes, for example one for "front view, lying pallet", another for "side view, lying pallet", etc., but I think for that to be accurate I'd need quite a few pictures for each class. Like 200 for each class?
Since my guess is that there are no such labeled datasets yet, that's quite a pain to create by myself.
Another way I could think of, is if I label my pallets with segmentation instead of bounding boxes, maybe there is another way to find out my relative position to the pallet? I never did semantic segmentation labeling myself but can anyone name any good programs which I could use?
I'm hoping someone can help point me in the right direction. Any help would be appreciated.
Some ideas: assuming detection and segmentation with classifier(s) works, one could then try feature detection like edges / lines to obtain clues about its orientation (bounding box).
Of course this will be tricky for simple feature detection because of very different surfaces (wood, dirt), backgrounds and lighting.
Also, "markerless tracking" (a topic in augmented reality) and "bin picking" (actually applied in the automation industry) may be keywords for similar problems, although you are probably not starting with an unordered pile of pallets.
I am using OpenCV and Python.
Let's say I have this video sequence of the car, and I have tracked some 'interesting points' on the car with cv2.goodFeaturesToTrack and cv2.calcOpticalFlowPyrLK. Now, given the traced points, I want to estimate a very rough shape (maybe a 3D box) of the car and its distance from the camera. It doesn't need to be that accurate.
On top of that, I want it to keep updating in real time. The closest YouTube video I can find that gives a view of what I am trying to achieve is this. I have found a new Structure from Motion module in OpenCV, but it is more about building a 3D model from a collection of points.
The question is: what is the best way of achieving this, and what kind of library can I use (especially in order to construct the 3D space)?
It is also OK if I somehow need to use C++ for this (although I am still not good at it yet).
Thanks.
I'd like to determine the position and orientation of a stereo camera relative to its previous position in world coordinates. I'm using a Bumblebee XB3 camera and the motion between stereo pairs is on the order of a couple of feet.
Would this be on the correct track?
Obtain rectified image for each pair
Detect/match feature points in the rectified images
Compute Fundamental Matrix
Compute Essential Matrix
Thanks for any help!
Well, it sounds like you have a fair understanding of what you want to do! Having a pre-calibrated stereo camera (like the Bumblebee) will then deliver up point-cloud data when you need it - but it also sounds like you basically want to also use the same images to perform visual odometry (certainly the correct term) and provide absolute orientation from a last known GPS position, when the GPS breaks down.
First things first - I wonder if you've had a look at the literature for some more ideas: As ever, it's often just about knowing what to google for. The whole idea of "sensor fusion" for navigation - especially in built up areas where GPS is lost - has prompted a whole body of research. So perhaps the following (intersecting) areas of research might be helpful to you:
Navigation in 'urban canyons'
Structure-from-motion for navigation
SLAM
Ego-motion
Issues you are going to encounter with all these methods include:
Handling static vs. dynamic scenes (i.e. ones that change purely based on the camera motion - c.f. others that change as a result of independent motion occurring in the scene: trees moving, cars driving past, etc.).
Relating amount of visual motion to real-world motion (the other form of "calibration" I referred to - are objects small or far away? This is where the stereo information could prove extremely handy, as we will see...)
Factorisation/optimisation of the problem - especially with handling accumulated error along the path of the camera over time and with outlier features (all the tricks of the trade: bundle adjustment, RANSAC, etc.)
So, anyway, pragmatically speaking, you want to do this in python (via the OpenCV bindings)?
If you are using OpenCV 2.4 the (combined C/C++ and Python) new API documentation is here.
As a starting point I would suggest looking at the following sample:
/OpenCV-2.4.2/samples/python2/lk_homography.py
Which provides a nice instance of basic ego-motion estimation from optic flow using the function cv2.findHomography.
Of course, this homography H only applies if the points are co-planar (i.e. lying on the same plane under the same projective transform - so it'll work on videos of nice flat roads). BUT - by the same principle we could use the Fundamental matrix F to represent motion in epipolar geometry instead. This can be calculated by the very similar function cv2.findFundamentalMat.
Ultimately, as you correctly specify above in your question, you want the Essential matrix E - since this is the one that operates in actual physical coordinates (not just mapping between pixels along epipoles). I always think of the Fundamental matrix as a generalisation of the Essential matrix by which the (inessential) knowledge of the camera intrinsic calibration (K) is omitted, and vice versa.
Thus, the relationships can be formally expressed as:
E = K'^T F K
So, you'll need to know something of your stereo camera calibration K after all! See the famous Hartley & Zisserman book for more info.
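You could then decompose the Essential matrix to recover your orientation R and displacement t (the latter only up to scale), typically via an SVD of E. In OpenCV 3.0 and later, cv2.decomposeEssentialMat and cv2.recoverPose do this; note that cv2.decomposeProjectionMatrix decomposes a 3x4 projection matrix rather than E, so it is not the right tool for this step.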
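As a minimal NumPy-only sketch of the last two steps (the function names are made up for this example): the first function applies E = K'^T F K, and the second performs the standard SVD-based decomposition of E. The decomposition yields two candidate rotations and a translation direction of unknown sign and scale; choosing the physically valid combination needs an extra check that triangulated points lie in front of both cameras, which is not shown here.

```python
import numpy as np

def essential_from_fundamental(F, K1, K2):
    """E = K2^T F K1, with K1 / K2 the intrinsics of the first / second view."""
    return K2.T @ F @ K1

def decompose_essential(E):
    """Return two candidate rotations and the unit translation direction.

    The translation sign is ambiguous, so (R1, t), (R1, -t), (R2, t) and
    (R2, -t) are all algebraically valid; only one is physically valid.
    """
    U, _, Vt = np.linalg.svd(E)
    # Flip signs if needed so that both factors are proper rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]          # direction only: the scale is not recoverable from E
    return R1, R2, t
```

With a calibrated stereo rig, the metric scale that E cannot provide can be taken from the stereo depth, which is where the point-cloud data mentioned earlier becomes useful.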
Hope this helps! One final word of warning: this is by no means a "solved problem" for the complexities of real world data - hence the ongoing research!