I have continuous videos taken from two cameras placed in the upper-right and upper-left corners of my car's windshield (please note that they are not fixed to each other, and I aligned them only approximately straight). I am trying to build a 3D point cloud from them and have no idea how to do that. I searched the internet a lot and still couldn't find any useful information. Can you send me some links or hints on how I can make that work in Python?
You can try the stereo matching and point cloud generation implementation in the OpenCV library. Start with this short Python sample.
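As a rough illustration of the point cloud step, the sketch below reprojects a disparity map into 3D points with NumPy only. It assumes the image pair has already been calibrated and rectified (for example with OpenCV's `cv2.stereoCalibrate` and `cv2.stereoRectify`) and that a disparity map is available (for example from `cv2.StereoSGBM_create`). The function name `disparity_to_points` and all numeric values (focal length, baseline, principal point) are illustrative assumptions, not values taken from the sample.

```python
import numpy as np


def disparity_to_points(disparity, focal_px, baseline_m, cx, cy):
    """Reproject a disparity map (in pixels) to an (N, 3) array of XYZ points.

    Assumes a rectified stereo pair with a shared pinhole model.
    Pixels with non-positive disparity are treated as invalid and dropped.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    rows, cols = np.indices(disparity.shape)
    valid = disparity > 0
    d = disparity[valid]
    z = focal_px * baseline_m / d            # depth from Z = f * B / d
    x = (cols[valid] - cx) * z / focal_px    # back-project the pixel column
    y = (rows[valid] - cy) * z / focal_px    # back-project the pixel row
    return np.column_stack((x, y, z))


# Hypothetical numbers: constant disparity with one invalid pixel.
disp = np.full((4, 6), 8.0)
disp[0, 0] = 0.0
points = disparity_to_points(disp, focal_px=800.0, baseline_m=1.2, cx=3.0, cy=2.0)
print(points.shape)
```

Because the two cameras are not rigidly mounted to each other, the calibration would probably have to be repeated whenever either camera moves.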
I suppose that you have two independent video streams that are not exactly synchronized. You will have to synchronize them first, because the linked sample expects two images, not videos. Extract images from the videos using OpenCV or ffmpeg and find an image pair that shares exactly the same timepoint (e.g. green appearing on a traffic light). Alternatively, you can use the audio tracks for synchronization; see https://github.com/benkno/audio-offset-finder. Beware: synchronization based on a single frame pair or a short audio excerpt will probably hold for only a few minutes before and after the synchronized timepoint, since two independent recordings tend to drift apart over time.
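The audio-based idea can be sketched with plain cross-correlation in NumPy. This is not necessarily how the linked repository works; it is a minimal sketch that assumes both audio tracks have already been decoded to mono arrays at the same sample rate. The function name `estimate_offset` and the synthetic signals are hypothetical.

```python
import numpy as np


def estimate_offset(reference, other, sample_rate):
    """Return the delay of `other` relative to `reference`, in seconds.

    A positive result means the shared sound occurs later in `other`.
    """
    reference = np.asarray(reference, dtype=np.float64)
    other = np.asarray(other, dtype=np.float64)
    reference = reference - reference.mean()
    other = other - other.mean()
    # Full cross-correlation; index 0 corresponds to lag -(len(reference) - 1).
    corr = np.correlate(other, reference, mode="full")
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag / sample_rate


# Synthetic check: `other` is the reference delayed by 150 samples.
rng = np.random.default_rng(0)
ref = rng.standard_normal(2000)
other = np.concatenate((np.zeros(150), ref))[:2000]
offset = estimate_offset(ref, other, sample_rate=1000)
print(offset)
```

Direct correlation is quadratic in the signal length, so for long recordings it is usually applied to short excerpts. Repeating the estimate at several points in the recording is one way to notice drift.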
Related
I'm struggling with a real-time application I'm currently writing. I capture a webcam stream and apply multiple image processing algorithms to each individual frame, e.g. to get the emotion of a person in the frame and to detect objects in it.
Unfortunately, the algorithms have different runtimes and since some are based on neural networks, those in particular are slow.
My goal is to show the video stream without lag. I don't care if an image processing algorithm grabs only every n-th frame or shows its results with a delay.
To get rid of the lags, I put the image processing in different threads but I wonder if there is a more sophisticated way to synchronize my analysis on the video stream's frames - or maybe even a library that helps building pipelines for real time data analytics?
Every hint is welcome!
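One common pattern for this kind of setup is a single-slot "latest frame" buffer: the display loop overwrites the slot on every frame, and each slow worker picks up whatever is newest once it finishes, so it naturally skips frames. The sketch below uses only the standard library. The class `LatestFrame`, the function `slow_worker`, and the sleep durations are hypothetical stand-ins for a real capture loop and a real model.

```python
import threading
import time


class LatestFrame:
    """Single-slot mailbox: writers overwrite, readers get the newest item."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._seq = 0
        self._closed = False

    def put(self, frame):
        with self._cond:
            self._frame = frame
            self._seq += 1
            self._cond.notify_all()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get_newer_than(self, last_seq):
        """Block until a frame newer than last_seq exists; None once closed."""
        with self._cond:
            while self._seq == last_seq and not self._closed:
                self._cond.wait()
            if self._seq == last_seq:
                return None
            return self._seq, self._frame


def slow_worker(slot, results, delay_s):
    last_seq = 0
    while True:
        item = slot.get_newer_than(last_seq)
        if item is None:
            break
        last_seq, frame = item
        time.sleep(delay_s)  # stand-in for a slow neural network
        results.append(frame)


slot = LatestFrame()
results = []
worker = threading.Thread(target=slow_worker, args=(slot, results, 0.05))
worker.start()
for frame_id in range(30):  # stand-in for the capture and display loop
    slot.put(frame_id)
    time.sleep(0.01)
slot.close()
worker.join()
print(f"displayed 30 frames, analysed {len(results)}")
```

With one slot per algorithm, each analysis runs at its own pace while the display loop never waits for any of them.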
I'm working on a machine learning application for reading data from fuel pumps. So far I've created a pretty robust YOLOv5 object detection model that detects the regions I want fairly accurately. But there is a problem: at certain times of the day there are reflections on the digital screen, and I'm unable to pre-process it with OpenCV so that I can extract the numbers from the display.
Check this Video to Understand (YOLOv5 Detection)
https://www.youtube.com/watch?v=3XjZ6Nw70j8
Minimum Reproducible Example
Cars come and go, and their reflections make it really difficult to differentiate between the regions for the Digital-7 font that is used in these displays. You can check out the following repository to understand what I want as a result: https://github.com/arturaugusto/display_ocr
Other Solutions I'm Open to:
Since this application is going to run 24/7, how should I deal with different times of day,
perhaps by creating a database of HSV ranges to extract at different times?
Would using a polarizing lens help remove the reflections? (I'd like to hear from any users who have previous experience deploying them.)
Edit: I added the correct video ...
I've been given a video consisting of frames like this:
and I am supposed to count how many fish passed in front of the screen during the video using Python. I am not allowed to use OpenCV or a similar library. The only libraries I am allowed to use are PIL and NumPy, so I am forced to use NumPy matrices as the image representation.
So far I am able to generate this one channel image mask:
which I believe should be enough to detect fish in one frame. However, I now need to somehow track those fish throughout the remaining frames to distinguish the fish that were already on the screen from those that moved in.
What would be the procedure to do that? I'm imagining somehow marking those areas and then predicting the movement of those fishes or something.
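One simple way to mark and follow areas with NumPy alone is to label connected regions in each mask, take their centroids, and match each centroid to the nearest one from the previous frame. The sketch below assumes boolean masks like the one described above. The names `blob_centroids` and `count_tracks`, the `max_dist` threshold, and the synthetic frames are hypothetical, and fish that overlap or leave and re-enter would be miscounted.

```python
import numpy as np


def blob_centroids(mask, min_area=1):
    """Return (row, col) centroids of 4-connected True regions in a 2D mask."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    height, width = mask.shape
    centroids = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        # Iterative flood fill starting from an unvisited foreground pixel.
        stack, pixels = [(r0, c0)], []
        seen[r0, c0] = True
        while stack:
            r, c = stack.pop()
            pixels.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < height and 0 <= nc < width
                if inside and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    stack.append((nr, nc))
        if len(pixels) >= min_area:
            centroids.append(np.mean(pixels, axis=0))
    return centroids


def count_tracks(masks, max_dist=10.0):
    """Count distinct blobs over a mask sequence by nearest-centroid matching."""
    tracks = []  # centroids of the blobs seen in the previous frame
    total = 0
    for mask in masks:
        new_tracks = []
        unmatched = list(tracks)
        for centroid in blob_centroids(mask):
            if unmatched:
                dists = [np.linalg.norm(centroid - t) for t in unmatched]
                best = int(np.argmin(dists))
                if dists[best] <= max_dist:
                    unmatched.pop(best)  # continue an existing track
                    new_tracks.append(centroid)
                    continue
            total += 1  # no nearby blob in the previous frame: a new object
            new_tracks.append(centroid)
        tracks = new_tracks
    return total


# Synthetic frames: one blob drifts right, a second appears from frame 2 on.
frames = []
for t in range(5):
    frame = np.zeros((20, 40), dtype=bool)
    frame[5:8, 2 + 2 * t:5 + 2 * t] = True
    if t >= 2:
        frame[14:17, 30:33] = True
    frames.append(frame)
fish_count = count_tracks(frames)
print(fish_count)
```

A greedy nearest-neighbour match is the simplest choice here. Predicting motion, as suggested in the question, would mainly help when blobs move far between frames or cross paths.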
I am trying to detect and extract the "labels" and "dimensions" of a 2D technical drawing that is saved as a PDF, using Python. I came across a Python library called "pytesseract" which has optical character recognition capability. I tried the demo on my image, but it fails to detect most of the labels/dimensions. Please suggest whether there is another way to do it. Thank you**.
** Attached is a sample of the 2D technical drawing I am trying to process
** What I am trying to achieve is to obtain the coordinates of every dimension (the 160, 120, 10, 4x45, etc.) on the image, and to extract them as well.
About 16 months ago we asked ourselves the same question.
If you want to implement it yourself, I'd suggest the following process:
Extract the Canvas from the sheet
Separate the Cuts
Detect the Measure Regions on each Cut
Detect the individual attributes of the Measure Regions to understand where each Measure starts and ends. In your particular example that's relatively easy.
Run the detected Measure Labels through OCR
Associate the Labels to the Measures
Verify your results
Alternatively you can also run it through our API and get the results as JSON.
Here's a quick visualization of the result:
Drawing Read (GT stands for General Tolerances)
I have a problem that is not so easy to solve, I guess. In general, I have a database of frames from different videos, and for a given picture (which is not necessarily one of the frames, but comes from the same source video) I want to find the matching source video.
So let's say I have some videos and extracted a frame every x seconds. The frames are stored in the db.
My guess would now be to loop over all video frames in the db and try to find matching features. So I would somehow have to find features in the source image and then try to find these in the frames stored in the db.
My question is: how can I achieve this? The problem is that the camera angle and viewing distance can be quite different when the picture in question was not taken close to the time at which the frame was extracted.
Is this even feasible?
I'm working with Python and OpenCV.
Thanks and best regards