I would like to detect framed video content (frequently used in TV advertising and referred to as single split, program split, etc.).
Example 1:
Example 2 (3 screen captures, 2 seconds offset):
I have the video sequence as well as 3 screen captures available to analyze (the middle, the point halfway between the middle and the end, and the end).
To get started, I already tried a few methods like bounding box detection and autocrop algorithms on the screen captures using OpenCV, ImageMagick and PIL. This works to some extent, but not reliably.
Every TV station uses its own artwork for the surrounding frame.
They sometimes animate the surrounding frame in the first few seconds.
The background of the surrounding frame can be static but also animated, changing colors, etc.
What would be an effective method to get a rather precise true/false reading on the media examples above? I would appreciate some ideas to build a suitable algorithm.
Thanks
You are essentially looking for two static, horizontal lines and two static, vertical lines that represent the edges of the inset video clip. What is inside the frame and what is outside the frame may/will be changing - only the edges of the inset frame are constant.
I would apply a strong directional filter (Sobel) oriented at 0 and 90 degrees to find the horizontal and vertical lines. Then work through some number of frames, accumulating all the edges that the two filters find. The brightest lines in the accumulated image at the end should be the best-defined ones that have stayed still the longest.
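A minimal sketch of that accumulation idea, assuming the clip can be read with OpenCV's VideoCapture (the file name and the number of sampled frames are placeholders):

import cv2
import numpy as np

cap = cv2.VideoCapture("spot.mp4")   # hypothetical input clip
acc = None
sampled = 0

while sampled < 200:                 # arbitrary number of frames to accumulate
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # dx responds to vertical edges, dy to horizontal edges
    dx = np.abs(cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3))
    dy = np.abs(cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3))
    acc = dx + dy if acc is None else acc + dx + dy
    sampled += 1

cap.release()
# Edges that stay in the same place across frames (the inset's border)
# accumulate the largest values and show up as the brightest lines.
acc = cv2.normalize(acc, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("edge_accumulation.png", acc)

Summing the accumulated image along rows and columns should then show strong peaks at the inset borders, which can be thresholded to produce the true/false reading.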
I'm working on creating a tile server from some raster nautical charts (maps) I've paid for access to, and I'm trying to post-process the raw image data that these charts are distributed as, prior to geo-referencing them and slicing them up into tiles.
I've got two sets of tasks and would greatly appreciate any help, or even sample code, on how to get these done in an automated way. I'm no stranger to Python/Jupyter notebooks, but I have zero experience with this kind of image analysis/processing using things like OpenCV or machine learning (or a better toolkit/library that I'm not even aware of yet).
I have some sample images (the originals are PNG but too big to upload, so I encoded them as high-quality JPEGs to follow along / provide sample data). Here's what I'm trying to get done:
Validation of all image data. The first chart (as well as the last four) demonstrates what properly formatted chart images should look like (I manually added a few colored rectangles to the first to highlight different parts of the image for the bonus section below).
Some images will have missing tile data, as in the 2nd sample image. These are ALWAYS chunks of 256x256 image data, so it should be straightforward to identify black boxes of this exact size (a rough sketch of this check follows below).
Some images will have corrupt/misplaced tiles, as in the 3rd image (notice the large colorful semi-circle/arcs in the center/upper half of the image; it is slightly duplicated beneath, and if you look along horizontally you can see the image data is shifted, so these tiles have somehow been corrupted).
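A rough sketch of the missing-tile check mentioned above, assuming the tiles are aligned to a 256-pixel grid starting at the top-left corner (the file name is a placeholder):

import cv2
import numpy as np

img = cv2.imread("chart.png", cv2.IMREAD_GRAYSCALE)  # hypothetical chart image
h, w = img.shape
missing = []

for y in range(0, h - h % 256, 256):
    for x in range(0, w - w % 256, 256):
        tile = img[y:y + 256, x:x + 256]
        # A missing tile is (near) pure black; allow a little tolerance
        if tile.max() < 10:
            missing.append((x, y))

print("missing 256x256 tiles at:", missing)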
Extraction of information. Ultimately, once all image data is verified to be valid (the above steps are ensured), there are a few bits of data I really need pulled out of the image, the most important of which are:
The 4 coordinates (upper left, upper right, lower left, lower right) of the internal chart frame. In the first image they are highlighted with a small pink box at each corner (the other images don't have the boxes, but the corners are located in a similar way); a rough contour-based sketch follows this list. NOTE: because these are geographic coordinates and involve projections, they are NOT always 100% horizontal/vertical of each other.
The critical bit is that SOME images contain more than one "chartlet"; I really need to obtain the above 4 coordinates for EACH chartlet (some charts have no chartlets, some have two to several, and they are not always simple rectangular shapes). I may be able to supply the number of chartlets as input if that helps.
If possible, what would also help is extracting each chartlet as a separate image (each of these has a single capital letter, A, B, C, in a circle; it would be good if that letter appeared in the filename).
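A very rough, hedged sketch of the frame-corner idea: look for the largest roughly quadrilateral contour, assuming the chart frame is drawn as a strong dark border on a light background (which may not hold for every chart, and would need extending to handle multiple chartlets):

import cv2
import numpy as np

img = cv2.imread("chart.png")                      # hypothetical chart image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(bw, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
best = None
for c in sorted(contours, key=cv2.contourArea, reverse=True):
    approx = cv2.approxPolyDP(c, 0.01 * cv2.arcLength(c, True), True)
    if len(approx) == 4:                           # first big 4-corner polygon
        best = approx.reshape(4, 2)
        break

print("approximate frame corners:", best)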
As a bonus, if there were a way to also extract the sections highlighted in the first sample image (in the lower left corner): this would probably involve recognizing where/if this appears in the image (it would probably only appear once per file, but I'm not certain) and then extracting based on its coordinates.
The most important part is inside a green box and represents a pair of tables (the left table is an example and I believe would always be the same, and the right has a variable number of columns).
The table in the orange box would also be good to get the text from, as it's related.
As would the small overview map in the blue box, which can be left as an image.
I have been looking at tutorials on OpenCV and image recognition, but the content so far has been highly elementary, not to mention an overwhelming, endless list of algorithms for different operations (and again, I don't know which of them I'd even need), so I'm not sure how it relates to what I'm trying to do. Really, I don't even know where to begin structuring the steps needed to undertake all these tasks, or how each should be broken down further to ease the processing.
I'm trying to extract text from a scanned technical drawing. For confidentiality reasons, I cannot post the actual drawing, but it looks similar to this, just a lot busier with more text within shapes. The problem is quite complex due to issues with letters touching both each other and their surrounding borders/symbols.
I found an interesting paper that does exactly this called "Detection of Text Regions From Digital Engineering Drawings" by Zhaoyang Lu. It's behind a paywall so you might not be able to access it, but essentially it tries to erase everything that's not text from the image through mainly two steps:
1) Erases linear components, including long and short isolated lines
2) Erases non-text strokes based on an analysis of the connected components of strokes
What kind of OpenCV functions would help in performing these operations? I would rather not write something from the ground up to do these, but I suspect I might have to.
I've tried using a template-based approach to try to isolate the text, but since the text location isn't completely normalized between drawings (even in the same project), it fails in detecting text past the first scanned figure.
I am working on a similar problem. Technical drawings are an issue because OCR software mostly tries to find text baselines, and the drawing artifacts (lines etc.) get in the way of that approach. In the drawing you specified there are not many characters touching each other, so I suggest breaking the image into regions of contiguous (black) pixels and then scanning those individually. The height of the contiguous areas should also give you an indication of whether a region is text or a piece of the drawing. To break the image into contiguous pixel regions, use a flood fill algorithm, and for the scanning Tesseract does a good job.
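A minimal sketch of that approach, assuming a dark-on-light scan; the height band used to keep "text-sized" components is a made-up example, and pytesseract is used here as the Tesseract wrapper:

import cv2
import numpy as np
import pytesseract  # requires the tesseract binary to be installed

img = cv2.imread("drawing.png", cv2.IMREAD_GRAYSCALE)   # hypothetical scan
_, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Label contiguous black-pixel regions (equivalent to flood-filling each blob)
n, labels, stats, _ = cv2.connectedComponentsWithStats(bw, connectivity=8)

mask = np.zeros_like(bw)
for i in range(1, n):                                    # label 0 is the background
    x, y, w, h, area = stats[i]
    if 8 <= h <= 40:                                     # keep character-sized blobs only
        mask[labels == i] = 255

# OCR only the surviving (likely-text) pixels, black on white for Tesseract
text = pytesseract.image_to_string(255 - mask)
print(text)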
Obviously I've never attempted this specific task; however, if the image really looks like the one you showed me, I would start by removing all vertical and horizontal lines. This could be done pretty easily: set a width threshold, and for all pixels with intensity larger than some value N, look that threshold number of pixels perpendicular to the hypothetical line orientation. If it looks like a line, erase it.
More elegant, and perhaps better, would be to do a Hough transform for lines and circles and remove those elements that way.
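A hedged sketch of the Hough idea for the straight lines (the threshold, length and gap parameters are guesses and would need tuning):

import cv2
import numpy as np

img = cv2.imread("drawing.png", cv2.IMREAD_GRAYSCALE)   # hypothetical scan
_, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

lines = cv2.HoughLinesP(bw, 1, np.pi / 180, threshold=80,
                        minLineLength=60, maxLineGap=5)

cleaned = img.copy()
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        # Overwrite each detected segment with the background color (white)
        cv2.line(cleaned, (x1, y1), (x2, y2), 255, thickness=3)

cv2.imwrite("drawing_no_lines.png", cleaned)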
Also you could maybe try some FFT based filtering, but I'm not so sure about that.
I've never used OpenCV, but I would guess it can do the things I mentioned.
I am trying to write a script in Python using OpenCV that will find, track, and output the positions of multiple markers on a person performing an exercise in a video. However, the markers were not properly lit up at the time of video capture, so they appear the same color as much of the background: an unspectacular, non-reflective grey. This is a problem when it comes to pulling them out. Even after converting the image to HSV, it seems impossible to filter out the surroundings (the subject's clothes, the walls, the ceiling, etc.) without the markers vanishing too. And as far as finding contours goes, there's so much going on in any particular frame that the number of contours found is quite high, and the markers themselves are not necessarily the smallest detected, so I can't just take 'min(contours)' as many tutorials try to do.
I've tried to isolate the markers using several different methods, mostly involving manipulating the mask/HSV image, but also some others, such as SimpleBlobDetector and finding keypoints. The best method I can think of is using keypoint detection to manually select the points of interest, but even those don't always pick up the markers.
I can't share a full-size sample image since it's a person in the videos I'm using, but some notes on the situation:
I can't retake the video to do the markers correctly. This data wasn't originally taken for more than camera alignment, so no one was too concerned about marker illumination. The data sucks, but it's all I have to work with.
Skin is very easy to filter out for the most part, but outlines of the clothes, environment, and skin always remain.
In the image above, the user is holding the exercise bar. There's a marker just under the center of the image, and another further up the arm. The spots towards the right edge are not markers. The HSV filter range used was H(0, 26), S(0, 57), V(0, 255); a rough version of this masking and filtering is sketched at the end of this post.
Markers really are basically the same color as the wall and ceiling.
TL;DR: I need a way to grab non-reflective markers in a busy environment with as little user input as possible. Data can't simply be re-taken, and methods typical for acquiring motion capture data are not working out here.
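For reference, roughly what my current masking and contour-filtering attempt looks like, using the HSV range above (the area and circularity limits are arbitrary guesses):

import cv2
import numpy as np

frame = cv2.imread("frame.png")                       # one frame from the video
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (0, 0, 0), (26, 57, 255))     # H(0,26) S(0,57) V(0,255)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
candidates = []
for c in contours:
    area = cv2.contourArea(c)
    if not (20 < area < 500):                         # keep marker-sized blobs only
        continue
    perimeter = cv2.arcLength(c, True)
    circularity = 4 * np.pi * area / (perimeter ** 2) if perimeter else 0
    if circularity > 0.6:                             # keep roughly round blobs
        (x, y), _ = cv2.minEnclosingCircle(c)
        candidates.append((int(x), int(y)))

print("marker candidates:", candidates)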
I am getting video input from 2 separate cameras with some area of overlap between the output videos. I have tried out some code which combines the video outputs horizontally. Here is the link for that code:
https://github.com/rajatsaxena/NeuroscienceLab/blob/master/positiontracking/combinevid.py
To explain the problem visually:
The red part shows the overlap region between the two image frames. I need the output to look like the second image, with the first frame in blue and the second frame in green (as shown in the third illustration).
A solution I can think of, but am unable to implement, is: using SIFT/SURF, find the maximum-distance keypoints from both frames, then take the first video frame completely, pick only the non-overlapping region from the second video frame, and horizontally combine them to get the stitched output.
Let me know of any other solutions possible as well. Thanks!
I read this post an hour ago and tried a really easy approach. It's not perfect, but in some cases it should work well, for example if you have both cameras mounted side by side in one rig.
I took 2 images from my phone, as in the picture (color images). The program selects rectangular regions from both source images, then resizes and extracts these ROI rectangles. The idea is to find the "best" overlapping Rect regions by normalized correlation.
M1 and M2 are the Mat ROIs to compare:
matchTemplate(M1, M2, res, TM_CCOEFF_NORMED);
Afterwards, I take the best overlapping Rect, use it to crop the source images, and combine them with the hconcat() function.
My code is in C++, but it is really simple to replicate in Python. It is not the best solution, but it is one of the simplest. If your cameras are fixed in a stable position relative to each other, I think this is a good solution.
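A rough Python replication of that idea might look like this (the ROI width and the assumption that the overlap sits along the frames' shared vertical edge are my own simplifications):

import cv2
import numpy as np

left = cv2.imread("left.png")                      # hypothetical camera frames
right = cv2.imread("right.png")

roi_w = 100
template = left[:, -roi_w:]                        # strip along the left frame's right edge

# Search only within the left part of the right frame where overlap can occur
search = right[:, :right.shape[1] // 2]
res = cv2.matchTemplate(search, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(res)

# max_loc[0] is where the strip begins inside the right frame, so everything up
# to max_loc[0] + roi_w is overlap; keep only the non-overlapping remainder.
overlap_end = max_loc[0] + roi_w
stitched = cv2.hconcat([left, right[:, overlap_end:]])
cv2.imwrite("stitched.png", stitched)
print("overlap score:", max_val)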
I hold my phone in hand :)
You can also use this simple approach on video. The speed depends only on the number of rectangle candidates you compare.
You can improve this with smarter selection of the regions to compare.
Also, I am thinking about another idea: use optical flow by putting the images taken from the cameras at the same moment into a sequence, one behind the other. From the possible overlapping region in one image, extract good features to track and find them in the corresponding region of the second image.
SURF and SIFT are great for this, but this is the simplest idea that comes to mind.
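A hedged sketch of that optical-flow idea, assuming two frames grabbed at the same moment and guessing that the overlap lies in the right half of the first image:

import cv2
import numpy as np

img1 = cv2.cvtColor(cv2.imread("left.png"), cv2.COLOR_BGR2GRAY)
img2 = cv2.cvtColor(cv2.imread("right.png"), cv2.COLOR_BGR2GRAY)

# Good features in the right half of the first image (the presumed overlap region)
mask = np.zeros_like(img1)
mask[:, img1.shape[1] // 2:] = 255
p0 = cv2.goodFeaturesToTrack(img1, maxCorners=200, qualityLevel=0.01,
                             minDistance=7, mask=mask)

# Find the same features in the second image with Lucas-Kanade optical flow
p1, status, _ = cv2.calcOpticalFlowPyrLK(img1, img2, p0, None)

good0 = p0[status.flatten() == 1].reshape(-1, 2)
good1 = p1[status.flatten() == 1].reshape(-1, 2)
# The median displacement gives a rough estimate of the overlap offset
print("median shift:", np.median(good1 - good0, axis=0))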
Code is here.
Using Python, OpenCV, and live webcam input, I can't figure out how to set a point based on an x y coordinate and track where it moves.
Below is a simple example to track a yellow object.
https://github.com/abidrahmank/OpenCV-Python/blob/master/Other_Examples/track_yellow_draw_line.py
Here is the method to track the yellow color (a minimal sketch follows the steps):
1) Extract the first frame of the video.
2) Convert the frame into HSV color space. Take the H plane and threshold it for yellow so that you get a binary image with the yellow object as white (also called a blob) and everything else as black.
3) Now find the centre point of the blob. You can use moments or contours (especially if you have more than one blob; in the example above, very simple logic is used: just find the leftmost, rightmost, topmost and bottommost points of the blob and draw a rectangle around it). Store these values.
4) Extract the next frame and follow all the above steps to get the new position. Join the two positions and draw a line.
Over.
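A minimal sketch of those steps, assuming a webcam at index 0 and an approximate yellow hue band (the exact HSV bounds would need tuning):

import cv2
import numpy as np

cap = cv2.VideoCapture(0)
prev_center = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Step 2: threshold for yellow (approximate hue range)
    mask = cv2.inRange(hsv, (20, 100, 100), (35, 255, 255))

    # Step 3: centre of the largest blob via image moments
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        blob = max(contours, key=cv2.contourArea)
        m = cv2.moments(blob)
        if m["m00"] > 0:
            center = (int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"]))
            # Step 4: join the previous and current positions with a line
            if prev_center is not None:
                cv2.line(frame, prev_center, center, (0, 0, 255), 2)
            prev_center = center

    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == 27:    # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()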
There are a few blogs that explain the basics. Check out this one: Object tracking in OpenCV and Python 2.6.
Edit: I don't think you can track arbitrary points. To be able to make a correspondence between one point in two images, you need to know something unique about the point to track. This is often done with interest points, which are "unique enough" to be compared across images. Other methods are based on making the point easy to detect using a projection scheme.
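To illustrate, a hedged sketch that snaps a chosen (x, y) to the nearest corner-like interest point and then tracks it with Lucas-Kanade optical flow (the webcam index and the starting coordinate are placeholders):

import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

target = np.array([320.0, 240.0])                      # hypothetical chosen point
corners = cv2.goodFeaturesToTrack(gray, 200, 0.01, 7).reshape(-1, 2)
nearest = corners[np.argmin(np.linalg.norm(corners - target, axis=1))]
p0 = nearest.reshape(1, 1, 2).astype(np.float32)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray_new = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(gray, gray_new, p0, None)
    if status[0][0] == 1:
        x, y = p1[0, 0]
        cv2.circle(frame, (int(x), int(y)), 5, (0, 255, 0), -1)
        gray, p0 = gray_new, p1                        # carry forward for the next frame
    cv2.imshow("point tracking", frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()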