I'm trying to work out how to measure an object in a photograph, i.e. its actual, real-world size. Luckily, the photograph includes a scale in cm. My thinking is that I measure the scale in pixels and use that to determine the size of the other object(s) in the photo. I've been working on this in scikit-image with mixed results. One of the issues I get is that the resolution of the image changes the measurement, so it seems that thresholding for the scale and extracting raw pixel counts does not work.
I know that OpenCV has the ability to measure objects with a bounding box; however, the objects I'm trying to measure have uneven sides/edges, and this needs to be accounted for (the shape of the object is important too, and I can capture that using thresholding and contours).
I'm hoping that people on this board can point me in alternate/better directions for trying to solve this issue. Perhaps my approach is all wrong. Thank you.
Example photo of vase with 10 cm scale. (https://i.stack.imgur.com/dK2BZ.jpg)
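For reference, this is roughly the kind of thing I have been trying. The file name, the 10 cm scale length, and the assumption that the scale bar is the widest thresholded region are just placeholders based on the example photo:

```python
# Rough sketch of the pixels-per-cm idea: threshold the image, find the scale
# bar, derive a pixels-per-cm ratio, then report other regions in cm.
# "vase.jpg" and the assumptions about the scale bar are placeholders.
import numpy as np
from skimage import io, color, filters, measure

img = color.rgb2gray(io.imread("vase.jpg"))
mask = img < filters.threshold_otsu(img)            # dark objects on light background
labels = measure.label(mask)
regions = measure.regionprops(labels)

# Assume the scale bar is the region with the largest horizontal extent.
scale = max(regions, key=lambda r: r.bbox[3] - r.bbox[1])
scale_px = scale.bbox[3] - scale.bbox[1]             # length of the bar in pixels
px_per_cm = scale_px / 10.0                          # 10 cm scale bar

# Other regions can then be reported via the ratio instead of raw pixel counts.
vase = max(regions, key=lambda r: r.area)
height_cm = (vase.bbox[2] - vase.bbox[0]) / px_per_cm
area_cm2 = vase.area / (px_per_cm ** 2)
print("height ~ %.1f cm, area ~ %.1f cm^2" % (height_cm, area_cm2))
```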
Related
I'm trying to remove the differences between two frames and keep the non-changing graphics. I would probably repeat the same process with more frames to get more accurate results. My idea is to simplify the frames by removing things I won't need, to simplify the rest of the processing I'll do afterwards.
The frames come from the same video, so there is no need to deal with different sizes, orientations, etc. If the same graphic appears in another frame but with a different orientation or scale, I would like to remove it as well. For example:
Image 1
Image 2
Result (more or less; I suppose the actual result will be uglier, but it should contain similar information)
One of the problems with this idea is that the source video, even though it is computer-generated graphics, is compressed, so it's not that easy to tell whether a change in the tonality of a pixel is actually a change or not.
Ideally I'm not looking at the pixel level, and given the differences in saturation introduced by the compression it is probably not possible anyway. I'm looking for unchanged "objects" in the image. I want to extract the information layer shown on top of what's happening behind it.
Over the last couple of days I have tried to achieve this in a Python script using OpenCV, with all kinds of combinations of absdiff, subtract, threshold, equalizeHist and Canny, but so far I haven't found the right implementation and would appreciate any guidance. How would you achieve it?
This will be extremely hard. You would need to employ proper computer vision, and if you're not an expert in that field, you'll have a really hard time.
How about this: forgetting about tooling and libraries, you have two images, i.e. two equally sized sequences of RGB pixels, image A and image B. Allocate an output image R of the same size as A or B.
Run a single loop over every pixel: read pixel a from A and pixel b from B. Each is a 3-element (RGB) vector. Find the distance between the two vectors, e.g. the magnitude of the vector (b - a). If it is less than some tolerance, write either a or b to the same offset in the result image R; if not, write some default (background) color to R.
You can most likely do this in some hardware-accelerated way using OpenCV or some other library, but it's up to you to find a tool that does what you want.
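If you don't want the explicit loop, the same idea vectorizes nicely with NumPy; here is a rough sketch (the file names, tolerance and background color are placeholders):

```python
# Minimal sketch of the per-pixel distance idea described above.
# File names, tolerance and background color are arbitrary placeholders.
import cv2
import numpy as np

a = cv2.imread("frame1.png").astype(np.float32)
b = cv2.imread("frame2.png").astype(np.float32)

dist = np.linalg.norm(b - a, axis=2)        # magnitude of (b - a) per pixel
tolerance = 20.0

result = np.full_like(a, 255)               # default/background color (white)
unchanged = dist < tolerance
result[unchanged] = a[unchanged]            # keep pixels that barely changed

cv2.imwrite("result.png", result.astype(np.uint8))
```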
I have a camera in a fixed position looking at a target, and I want to detect whether someone walks in front of the target. The lighting in the scene can change, so subtracting the new frame from the previous frame would detect motion even though none has actually occurred. I have thought of comparing the number of contours between the two frames (obtained by calling findContours() on a binary edge image from Canny and then taking size() of the result), since a big change there could indicate movement while being less sensitive to lighting changes. I am quite new to OpenCV and my implementations have not been successful so far. Is there a way I could make this work, or will I have to just subtract the frames? I don't need to track the person, just detect whether they are in the scene.
I am a bit rusty but there are various ways to do this.
SIFT and SURF are very expensive operations, so I don't think you would want to use them.
There are a couple of 'background removal' methods.
Average removal: you take the average of N frames and treat it as the background. This is vulnerable to many things: light changes, shadows, a moving object staying at one location for a long time, etc.
Gaussian Mixture Model: a bit more advanced than the running average, but still vulnerable to a lot of things.
IncPCP (incremental principal component pursuit): I can't remember the algorithm in full, but the basic idea is to decompose the frames into a low-rank background plus a sparse part, and then extract the moving objects from the sparse matrix.
Optical flow: you find the change across the temporal domain of a video. For example, you compare frame 2 with frame 1 block by block and determine the direction of change.
CNN-based methods: I know there are a bunch of them, but I haven't really followed them. You might have to do some research. As far as I know, they are often better than the methods above.
Note that for 30 fps video, your processing should complete in about 33 ms per frame for it to run in real time. You can find a lot of code available for this task.
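If you want a quick starting point, OpenCV has a Gaussian-mixture background subtractor built in; here is a minimal sketch for presence detection (the video path and the pixel-count threshold are placeholders you will have to tune):

```python
# Minimal presence-detection sketch using OpenCV's built-in MOG2
# background subtractor. Video path and thresholds are placeholders.
import cv2

cap = cv2.VideoCapture("camera.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    # Shadows are marked 127 by MOG2; keep only strong foreground pixels.
    mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]
    moving_pixels = cv2.countNonZero(mask)
    if moving_pixels > 5000:                 # tune for your scene and frame size
        print("someone is in the scene")

cap.release()
```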
There are a handful of ways you could do this.
The first that comes to mind is doing a 2D FFT on the incoming images. Color shouldn't affect the FFT too much, but an object moving or entering/exiting the frame will.
The second is to use SIFT or SURF to generate a list of features in each image; you can insert these points into a map, sorted however you like, then take the set difference between the features from the last image and the current one. You could also use OpenCV's FLANN functionality to compare the generated features.
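If you go the feature route, here is a rough sketch with SIFT and FLANN (needs OpenCV 4.4 or later; the ratio-test and change thresholds are arbitrary placeholders):

```python
# Rough sketch of the feature-matching idea: detect SIFT features in the
# previous and current frame, match them with FLANN, and treat a drop in the
# fraction of matched features as a scene change. Thresholds are arbitrary.
import cv2

sift = cv2.SIFT_create()                                # OpenCV >= 4.4
flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})

def matched_fraction(img1, img2):
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    if des1 is None or des2 is None:
        return 0.0
    matches = flann.knnMatch(des1, des2, k=2)
    good = []
    for pair in matches:
        # Lowe's ratio test to keep only distinctive matches.
        if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
            good.append(pair[0])
    return len(good) / max(len(kp1), 1)

prev = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)
if matched_fraction(prev, curr) < 0.5:                  # tune for your scene
    print("large change between frames")
```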
I have some SIFT features in two stereo images, and I'm trying to place them in 3D space. I've found triangulatePoints, which seems to be what I want, however, I'm having trouble with the arguments.
triangulatePoints takes 4 arguments, projMatr1 and projMatr2, which is where my issues start, and projPoints1 and projPoints2, which are my feature points. The OpenCV docs suggest using stereoRectify to find the projection matrices.
stereoRectify takes the intrinsic camera matrices (which I've calculated beforehand with calibrateCamera) and the image size from calibration, as well as two arguments R (rotation matrix) and T (translation vector), which can be found with stereoCalibrate.
However, stereoCalibrate takes "object points", which I'm pretty sure I can't calculate for images without a reference, which is a bit of a roadblock.
Is this the best way to be calculating 3D positions from pairs of features? If so, how can I calculate projMatr1 and projMatr2 without stereoCalibrate?
As you say, you have no calibration, so let’s forget about rectification. What you want is the depth of the points, so you can project them into 3D (which then uses just the intrinsic calibration of one camera, mainly the focal length).
Since you have no rectification, you cannot expect exact results, so let’s try to get as close as possible:
Depth is focal length times baseline divided by disparity, with disparity and focal length in pixels, and depth and baseline in (preferably) meters.
For an accurate disparity you need a rectified camera pair and correspondences between your features in both images. Since without calibration you have no hope of rectification, you could try to just use the original images instead. The more parallel the cameras are, the better this will work. If they are not parallel, you will introduce an error here and your results will become less accurate. If this gets too bad, you must find a way to calibrate your cameras.
But most importantly, you need correspondences between your features in both images. Running SIFT in both images won't do. A better approach would be running SIFT in just one image and then finding the corresponding pixels for each of the features in the other image. There are plenty of methods for that; I believe OpenCV has some simple block matching built in.
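To make that concrete, here is a rough sketch of the back-projection, assuming roughly parallel cameras; the focal length, principal point and baseline are placeholders for your own calibration values:

```python
# Rough sketch: depth = focal_length * baseline / disparity, then back-project
# each feature into 3D using one camera's intrinsics. fx, fy, cx, cy and the
# baseline are placeholders for values from your own calibrateCamera results.
import numpy as np

fx, fy = 1200.0, 1200.0        # focal length in pixels
cx, cy = 960.0, 540.0          # principal point in pixels
baseline = 0.12                # camera separation in meters

def to_3d(u_left, v_left, u_right):
    """(u, v) of a feature in the left image and u of its match in the right."""
    disparity = u_left - u_right            # in pixels, assumes parallel cameras
    if disparity <= 0:
        return None                         # match behind or at infinity
    z = fx * baseline / disparity           # depth in meters
    x = (u_left - cx) * z / fx
    y = (v_left - cy) * z / fy
    return np.array([x, y, z])

print(to_3d(1000.0, 600.0, 940.0))          # example correspondence
```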
In the image there are two insulators; the one on the left has a gap, i.e. a disk missing in the middle. I have to mark the missing disk with a rectangular box. I know about algorithms such as SIFT and SURF, and about using absdiff() in OpenCV to calculate the difference between two images.
How can I detect the missing disk if I only have this one image?
Image
You should find contours, bounding boxes and circles. After that you can look for the missing object or noise objects. Another way is to use AI to fit the objects and search for the gap that way, but that is a very hard job.
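A minimal sketch of the contour and bounding-box part (the threshold and minimum-area values are placeholders; deciding which box corresponds to a missing disk still needs extra logic):

```python
# Minimal sketch of the contour / bounding-box step. Threshold and area values
# are placeholders; identifying the actual missing disk needs domain logic.
import cv2

img = cv2.imread("insulators.jpg")
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h > 500:                                    # ignore small noise blobs
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)

cv2.imwrite("boxes.jpg", img)
```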
General algorithm (it's obvious):
find insulators
find gaps
find insulators with gaps.
I think insulators are fairly standardized in size and look, so you can probably detect them by color/texture and/or some specific details. They can't be very "curved", so you can approximate them with lines and separate overlapping elements. If all insulators are the same size, you can normalize them, stretch them along one axis, and then detect the gaps.
There is no way to do 100% correct recognition in all cases, but you can use some knowledge about insulators and get good results.
I have an image captured by an Android camera. Is it possible to calculate the depth of an object in the image? The image contains an object and background only. Any suggestions, explanations or links that you think could help will be appreciated.
OpenCV is the library you need.
I did some depth identification of water levels against a pure white background a few days ago. Generally, if you want to identify depth, you can turn the problem into identifying the edges where the colors change. In this case, you can convert the color pictures to grey and detect the white-black-grey transitions. OpenCV is capable of doing the job at high speed.
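A minimal sketch of that grey-conversion and edge step (the blur size and Canny thresholds are placeholders to tune):

```python
# Minimal sketch of the grey-conversion / edge-detection step described above.
# Blur size and Canny thresholds are placeholders to tune for the actual image.
import cv2

img = cv2.imread("photo.jpg")
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
grey = cv2.GaussianBlur(grey, (5, 5), 0)     # smooth out noise before edges
edges = cv2.Canny(grey, 50, 150)             # white/grey/black transitions
cv2.imwrite("edges.jpg", edges)
```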
Hope it helps. Let me know if you need further help.
Edits:
If you want to find the actual depths, you need to project the coordinate system of your pictures into the real world, or vice versa. To do that, you have to know a fixed location as your reference and the relationship between pixels and real distances.
What I did was find the fixed location and set it as zero. Afterwards, I measured the length of an object and counted the number of pixels it spans in the picture, which gave me the relationship between pixels and real distances.
Note that these procedures may involve errors in the identification. I did it very carefully and the error was acceptable in my case.
With only one image, accurate depth estimation is nearly impossible. However, there are various methods of estimating depth under certain assumptions or given the camera calibration matrix. As mentioned by @WenlongLiu, OpenCV is a very good place to start.