I have a program where I get one image as input and I have to compare it with ~640 known images to see which one is the most similar. To do this I was thinking of using OpenCV's match template, as it seems very fast and effective in doing what I want to do.
I noticed that matching two images, both 400x240 px, 1000 times is much slower than matching a 400x240 px image in a 1400x240 px, despite both of them being 1000 matches.
My idea was to combine the 640 images in one big image containing them in a grid (easy to do since they all have the same size).
Doing this I could really speed up the process if I could match template the input image in only some of the subimages of the big combined reference (the ones with the top-left corner in a "grid pixel", as in a subimage that was actually one of those 640 images that I used to combine into the big reference image), but it doesn't seem to be there a way to tell match template only to match on a specified set of positions.
How could I go to speed up this process? Is there a different library (I'm working with Python) that does something similar to match template but where I can specify which subimages to match? Is there an entirely different approach that suits more my goal?
EDIT:
Basically what I have to do is: I'm taking a screenshot from a videogame and in the screenshot there's a clean portrait of some character (400x240 px). I have clean portraits of all the characters (80 of them, 8 skins each for a total of 640 portraits) and I want to find which portraits is the closest to the one in the screenshot so that I can identify the character played. It'd be awesome if I could also "mask" some pixels like with matchTemplate from OpenCV as there are some specific parts of the 400x240px rectangle that change from player to player and I'd like if I didn't have to account for that and could just mask out those (known) pixels
Related
I'm very new to image processing in Python (and not massively adept at python in general), so forgive me for how stupid this may sound. Im working with an AI for object detection, and need to submit 1000x1000 pixel images to it, that have been divided up from larger images of varying lengths and widths (not necessarily divisible, but I have a way of padding out images less than 1000x1000). In order for this to work, I need 200 pixel overlap on each segment or the AI will pick may miss objects.
I've tried a host of methods, and have either got the image to divide up using the methods suggested in Creating image tiles (m*n) of original image using Python and Numpy and how can I split a large image into small pieces in python (plus a few others that effectively do the same techniques in different words. I've been able to make a grid and get the tile names from this, using How to determine coordinate of grid elements of an image, however have not been able to get overlap to work in this, as I would then just tile it normally.
Basically what I'm saying is that I've found one way to cut the images up that works, and one way to get the tile coordinates, but I am utterly failing at putting it all together. Does anyone have any advice on what to do here?
So far I've not found a direct approach to my end goal online - and I've tried mucking around with different scripts (like the ones listed above), but feel like Im barking up totally the wrong tree.
Im trying to remove the differences between two frames and keep the non-chaning graphics. Would probably repeat the same process with more frames to get more accurate results. My idea is to simplify the frames removing things that won't need to simplify the rest of the process that will do after.
The different frames are coming from the same video so no need to deal with different sizes, orientation, etc. If the same graphic its in another frame but with a different orientation or scale, I would like to also remove it. For example:
Image 1
Image 2
Result (more or less, I suppose that will be uglier but containing a similar information)
One of the problems of this idea is that the source video, even if they are computer generated graphics, is compressed so its not that easy to identify if a change on the tonality of a pixel its actually a change or not.
Im ideally not looking at a pixel level and given the differences in saturation applied by the compression probably is not possible. Im looking for unchaged "objects" in the image. I want to extract the information layer shown on top of whats happening behind it.
During the last couple of days I have tried to achieve it in a Python script by using OpenCV with all kinds of combinations of absdiffs, subtracts, thresholds, equalizeHists, canny but so far haven't found the right implementation and would appreciate any guidance. How would you achieve it?
Im ideally not looking at a pixel level and given the differences in saturation applied by the compression probably is not possible. Im looking for unchaged "objects" in the image. I want to extract the information layer shown on top of whats happening behind it.
This will be extremely hard. You would need to employ proper CV and if you're not an expert in that field, you'll have really hard time.
How about this, forgetting about tooling and libs, you have two images, ie. two equally sized sequences of RGB pixels. Image A and Image B, and the output image R. Allocate output image R of the same size as A or B.
Run a single loop for every pixel, read pixel a and from A and pixel b from B. You get a 3-element (RGB) vector. Find distance between the two vectors, eg. magnitude of a vector (b-a), if this is less than some tolerance, write either a or b to the same offset into result image R. If not, write some default (background) color to R.
You can most likely do this with some HW accelerated way using OpenCV or some other library, but that's up to you to find a tool that does what you want.
Given two images - one a cropped (but not scaled) portion of the other, how can I find the crop parameters (i.e.: the x and y offsets and width/height)? The idea is to crop one image (screenshot) by hand, and then crop a lot more at the same points.
Ideally via imagemagick, but I am happy with any pseudo-code solution, or with Perl, Python, JavaScript (in order of preference)
I have thought of a brute-force approach (find the first pixel which is the same color, check the next, keep going until different, or move to the next). Before I go down this barabarous (and probably slow) route, I'd like to check for better ones.
Template matching can be used for the identification of smaller image within a larger image.
The following resource might be helpful. Please check it out
https://docs.opencv.org/4.5.2/d4/dc6/tutorial_py_template_matching.html
I'm working on creating a tile server from some raster nautical charts (maps) i've paid for access, and i'm trying to post-process the raw image data that these charts are distributed as, prior to geo-referencing them and slicing them up into tiles
i've got a two sets of tasks and would greatly appreciate any help or even sample code on how to get these done in an automated way. i'm no stranger to python/jupyter notebooks but have zero experience with this type of data-science to do image analysis/processing using things like opencv/machine learning (or if there's a better toolkit library that i'm not even yet aware of).
i have some sample images (originals are PNG but too big to upload so i encoded them in high-quality JPEGs to follow along/provide sample data).. here's what i'm trying to get done:
validation of all image data.. the first chart (as well as last four) demonstrate what properly formatted charts images should looks like (i manually added a few colored rectangles to the first, to highlight different parts of the image in the bonus section below)
some images will either have missing tile data, as in the 2nd sample image, these are ALWAYS chunks of 256x256 image data, so should be straightforward to identify black boxes of this exact size..
some images will have corrupt/misplaced tiles as in the 3rd image (notice in the center/upper half of the image is a large colorful semi-circle/arcs, it is slightly duplicated beneath and if you look along horizontally you can see the image data is shifted and so these tiles have been corrupted somehow
extraction of information, ultimately once all image data is verified to be valid (the above steps are ensured), there is a few bit of data i really need pulled out of the image, the most important of which is
the 4 coordinates (upper left, upper right, lower left, lower right) of the internal chart frame, in the first image they are highlighted in a small pink box at each corner (the other images don't have them but they are located in a simlar way) - NOTE, because these are geographic coordinates and involve projections, they are NOT always 100% horizontal/vertical of each other.
the critical bit is that SOME images container more than one "chartlet", i really need to obtain the above 4 coordinate for EACH chartlet (some charts have no chartlets, some two to several of them, and they are not always simple rectangular shapes), i may be able to generate for input the number of chartlets if that helps..
if possible, what would also help is extracting each chartlet as a separate image (each of these have a single capital letter, A, B, C in a circle that would be good if it appeared in the filename)
as a bonus, if there was a way to also extract the sections sampled in the first sample image (in the lower left corner), this would probably involve recognize where/if in the image this appears (would probably only appear once per file but not certain) and then extracting based on its coordinates?
mainly the most important is inside a green box and represents a pair of tables (the left table is an example and i believe would always be the same, and the right has a variable amount of columns)
also the table in the orange box would be good to also get the text from as it's related
as would the small overview map in the blue box, can be left as an image
i have been looking at tutorials on opencv and image recognition processes but the content so far has been highly elementary not to mention an overwhelming endless list of algorithms for different operations (which again i don't know which i'd even need), so i'm not sure how it relates to what i'm trying to do.. really i don't even know where to begin to structure the steps needed for undertaking all these tasks or how each should be broken down further to ease the processing.
I am trying to identify a state of a valve(on or off). My approach is to give to images of each states and compare the current image with those two and see which one it belongs to.
I have tried to compare using new_image - on_image and new_image - off_image. Then compare the number of different pixels. It works, but i feel like in some cases it might not work and there must be another better way do a simple classification like this.
Any reference or ideas?
Subtracting pixels might not be very robust in case your camera position changes slightly. If you don't shy away from using open Computer Vision (open CV) there is an interesting recipe for finding a predefined object in a picture:
Feature Matching + Homography to find Objects
You could cut out the lever from your image and search it in every new image. Depending on the coordinates and especially the rotation, you can set the status of the valve. This might even work in crazy cases where someone half opened (or for pessimists: half closed) the valve, or if the lever becomes partially covered.