Good day. I have a set of geotagged photos. I want to build a system that approximates the location of a query image based on how similar it is to the geotagged photos. I will be using Python and OpenCV to accomplish this task. However, the problem is that most of the geotagged photos have people in them (I'm only after the background scenery).
I found some face detection algorithms that I can use to detect people in photos. However, what I need is to detect the whole body of each person in an image and keep only the background.
OpenCV has algorithms for background subtraction (I was hoping to invert the output and keep the background instead). However, these only apply to videos (separating moving parts from a static scene). Can you recommend any solution to this problem (where to start / related studies / algorithms)? I appreciate any help. Thanks!
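For anyone starting on this, one possible (untested) direction is OpenCV's built-in HOG pedestrian detector, which finds whole bodies; the detections can be turned into a mask so that feature extraction only sees the background. The filename and detector parameters below are placeholders:

import cv2
import numpy as np

# Hypothetical input file; replace with one of the geotagged photos.
img = cv2.imread("geotagged_photo.jpg")

# Built-in HOG person detector (detects full standing bodies).
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
rects, _ = hog.detectMultiScale(img, winStride=(8, 8), padding=(16, 16), scale=1.05)

# Build a mask that is white everywhere except where people were detected.
mask = np.full(img.shape[:2], 255, dtype=np.uint8)
for (x, y, w, h) in rects:
    mask[y:y + h, x:x + w] = 0

# Feature extractors accept a mask, so descriptors come only from the background.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
orb = cv2.ORB_create()
keypoints, descriptors = orb.detectAndCompute(gray, mask)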
I'm working on a machine learning application for reading data from fuel pumps. So far I've created a fairly robust YOLOv5 object detection model that can detect the regions I want quite accurately. But there is a problem: at certain times of the day there are reflections on the digital screen, and I'm unable to pre-process it with OpenCV so that I can extract the numbers from the display.
Check this Video to Understand (YOLOv5 Detection)
https://www.youtube.com/watch?v=3XjZ6Nw70j8
Minimal Reproducible Example
Cars come and go, and their reflections make it really difficult to differentiate the regions of the digital-7 font used in these displays. You can check out the following repository to understand what I want as a result: https://github.com/arturaugusto/display_ocr
Other Solutions I'm Open to:
Since this application is going to run 24/7, how should I deal with different times of day? Perhaps create a database of HSV ranges to apply at different times (a rough sketch of this idea follows after this list).
Would using a polarizing lens help in removing the reflections? (I'd appreciate hearing from any users who have previous experience deploying them.)
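As a purely illustrative sketch of the per-time-of-day HSV idea above (every range, period boundary, and function name below is a made-up placeholder to calibrate against real footage):

import cv2
import numpy as np
from datetime import datetime

# Placeholder HSV (lower, upper) bounds per lighting period; calibrate these.
HSV_RANGES = {
    "day":   ((0, 0, 120), (180, 60, 255)),
    "dusk":  ((0, 0, 80),  (180, 90, 220)),
    "night": ((0, 0, 40),  (180, 120, 180)),
}

def current_period(hour=None):
    # Map the clock to a lighting period; the boundaries are guesses.
    hour = datetime.now().hour if hour is None else hour
    if 7 <= hour < 18:
        return "day"
    if 18 <= hour < 21:
        return "dusk"
    return "night"

def extract_display_mask(bgr_crop):
    # Threshold the cropped display region with the range for the current period.
    hsv = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2HSV)
    lo, hi = HSV_RANGES[current_period()]
    return cv2.inRange(hsv, np.array(lo), np.array(hi))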
Edit: I added the correct video ...
I'm a senior in high school, and this year I have to do a project for my electronics class. I was hoping to get some advice from people with experience.
My idea is kind of complicated and uses a lot of different sensors, but it's not too crazy. The problem begins with possible image processing: I have a camera that needs to check for flashing light and send the video to a screen without the flashing frames (just skipping those frames, so the video is always one frame behind, but the person won't notice it).
The flashing light is supposed to be like the lights at a party, or the kind of flashing a video game warns you about. The idea is to notice the extreme change in lighting and not show it on the screen.
My teacher is afraid that image processing, and video processing in general, might be too complicated. I don't have any knowledge of it, and I only have a little background in Python and other languages. Do you think it's possible? Can anyone give me advice or point me to a good video/tutorial to learn from?
Thank you in advance:)
Your problem is quite difficult, because it involves an unknown environment over a dynamic time range.
If you take as a given that your camera runs at, for example, 20 FPS, the difference between frame f and the next frame f+1 is usually quite small, UNLESS there is a huge color change due to a light flash. So you can compare consecutive frames with an image similarity metric such as SSIM or PSNR:
https://www.pyimagesearch.com/2017/06/19/image-difference-with-opencv-and-python/
If the difference exceeds a certain threshold that you have to define (you can also use a Kalman filter to dynamically readjust that threshold), it probably means the flash is on.
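A minimal sketch of that idea, assuming a webcam via cv2.VideoCapture and scikit-image for SSIM (the 0.75 threshold is only a placeholder to tune):

import cv2
from skimage.metrics import structural_similarity as ssim

cap = cv2.VideoCapture(0)      # camera index is an assumption
prev_gray = None
FLASH_THRESHOLD = 0.75         # tune for your camera and scene

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Low similarity to the previous frame suggests a sudden flash; skip it.
    if prev_gray is None or ssim(prev_gray, gray) >= FLASH_THRESHOLD:
        cv2.imshow("filtered", frame)
    prev_gray = gray
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()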
Although it's a visual programming environment, Bonsai is a great open-source tool for doing what's in your description; Bonsai also supports applications that require combinations of different hardware (e.g. microcontrollers, cameras) and software components (e.g. Python).
To give a similar application as an example: I have set up a workflow where Bonsai captures images from a Basler camera and processes the video frame by frame. When it detects a threshold change in pixel intensity within a frame cropped around a red LED (i.e. the LED turning ON or OFF), it sends an output signal (5 volts) to an Arduino microcontroller, while saving the frame as a PNG file and an AVI video, along with a vector of True/False values (corresponding to the ON/OFF LED frames) and the corresponding timestamps saved as CSV files. Although this isn't identical to what you've described, I'm sure you can set up a similar Bonsai workflow to accomplish your goal.
Citation: https://www.frontiersin.org/articles/10.3389/fninf.2015.00007/full
Edit: I'm very familiar with Bonsai, so if you need help setting up a Bonsai workflow I'd be happy to help. I don't think Stack Overflow has direct messaging, and it doesn't list Bonsai as a tag (because it's a visual programming language, or perhaps because it's not well known enough), but feel free to reach out if you have any questions about Bonsai specifically (again, it's open-source software).
I am new to the computer vision area and I have been given this task:
I need to recognize a set of images with a camera as soon as they enter the camera's view. These images would be scanned beforehand and stored in some sort of database (maybe a collection of keypoints for each image).
Well, I've been doing some research and found that SIFT may do the trick, but I don't know how to use it properly. I need to do this with Python and OpenCV.
Note: I already found examples in which I can get the keypoints of an image using SIFT, but the code is very confusing to someone who does not know the language. Any help is appreciated.
Here is a good page for you to get started and learn the basics along the way.
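In case a concrete example helps, here is a minimal sketch of SIFT keypoint matching with a ratio test (it assumes a recent OpenCV build, 4.4 or later, where SIFT is included; the filenames are placeholders):

import cv2

# Placeholder filenames: the query frame and one previously scanned image.
query = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
stored = cv2.imread("stored.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(query, None)
kp2, des2 = sift.detectAndCompute(stored, None)

# Brute-force matcher with k=2 so Lowe's ratio test can filter weak matches.
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# More good matches means the stored image is more likely present in the frame.
print(len(good), "good matches")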
As an exercise, I'm attempting to break the following CAPTCHA:
It doesn't seem like it would be too difficult to break, as the edges seem fairly solid and the noise should be relatively easy to remove. The problem is that I have very little experience with image manipulation. Currently I'm using Python with the Pillow library to manipulate the CAPTCHA image, after which it is passed to Tesseract for OCR.
In the following code I attempt to bring out the edges by sharpening the image and then convert it to black and white:
from PIL import Image, ImageFilter

try:
    img = Image.open("Captcha.jpg")
except IOError:
    print("Can't load captcha.")
    exit()

# Bring out the edges by sharpening, then binarize and upscale.
out = img.filter(ImageFilter.SHARPEN)
out = out.convert("L")
out = out.point(lambda x: 0 if x < 136 else 255, "1")
width, height = out.size
out = out.resize((width * 5, height * 5), Image.NEAREST)
out.save("captcha_modified.png")
At this point I see the following:
However, Tesseract is still unable to read the characters. As an experiment, I used good ol' MS Paint to manually modify the image to a point where it could be read by Tesseract:
So if I can get the image to that point, I think Tesseract will do a fairly good job of detecting the characters. My current thinking is that I need to enhance the edges and reduce the noise in the image. Also, I imagine it would be easier for Tesseract to detect the letters if they were filled in rather than outlined, but I have no idea how I'd do that.
Any suggestions on how to go about this? Is there a better way to process the images?
I am short on time, so this answer may not be incredibly useful, but it covers my own two approaches. There isn't much code here, just a few method recommendations. It is a good idea to use code rather than MS Paint: with code it's actually really easy to break a CAPTCHA and achieve above a 50% success rate. Behavioral recognition may be a better security mechanism, or an additional one.
A. The edge detection method you're using:
Edge detection really isn't necessary here. Just use the getpixel((x,y)) function and fill in the area between the bounding lines: start filling at boundary crossings 1, 3, 5, etc. and stop filling after crossings 2, 4, 6, etc. Luckily, you chose an easy CAPTCHA, so this is a decent solution even without decluttering, rotation, and re-alignment.
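A literal (and deliberately naive) sketch of that fill rule, operating on the binarized "captcha_modified.png" produced by the question's code; rows that cross an odd number of edges (e.g. the very top of a curved letter) will over-fill, so treat it only as a starting point:

from PIL import Image

img = Image.open("captcha_modified.png").convert("1")
width, height = img.size

for y in range(height):
    inside = False
    prev = 255
    for x in range(width):
        cur = img.getpixel((x, y))
        if prev != 0 and cur == 0:
            # Crossed into an outline stroke: toggle the fill state.
            inside = not inside
        elif inside and cur != 0:
            # Between an odd and an even crossing: fill the gap.
            img.putpixel((x, y), 0)
        prev = cur

img.save("captcha_filled.png")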
B. Manipulation Method:
Another method I use relies on OpenCV and Pillow as well. I'm really busy, but I'm posting a blog article on this later at druid5.wordpress.com/ which will contain code examples of this method. Since it isn't illegal to get through CAPTCHAs (at least, so I'm told), I use the method I will post to collect data all the time. Mostly it's contrast and detail adjustments from Pillow, some basic clutter removal with statistics, re-alignment with a basic DFS, and rotation (doable with OpenCV or easily with a kernel). Tesseract is a good open-source choice, but it isn't too hard to create an OCR with OpenCV either.
This exercise is a decent introduction to OpenCV, PIL (Pillow), image manipulation with math, and some other things that help with everything from robotics to AI.
Using flow control to find the failed conditions and try different routes may be necessary but the aim should always be a generic solution.
I am working on a project where I need to program a Raspberry Pi to grab an image from a webcam, search that image for a box, and identify which box it is by its size ratio. The boxes will be a color unique to the rest of the environment. It would also be good to identify the distance and the angle to the box.
Everything I've seen seems to indicate that this should be possible, but after several days of searching I have yet to find anything that really helps me to do this. This project is my first experience using Python, so I'm pretty newbish. Any help even with how to do little portions of this would be greatly appreciated.
Here's my working code so far. It's not much, all it does is grab an image from a webcam :/
import imgproc
from imgproc import *

camera = Camera(160, 120)
viewer = Viewer(160, 120)

# Continuously grab frames from the webcam and display them.
while True:
    img = camera.grabImage()
    viewer.displayImage(img)
This is not a complete solution, but here are some good ideas on how to get started :)
First off, there are Python bindings for OpenCV, an open source free computer vision library originally written in C: http://opencv.willowgarage.com/documentation/python/index.html
The first thing you have to do when solving a computer vision problem is pre-processing. In particular, knowing that the box is a different colour helps a LOT: it means we can threshold by colour and create an image that is black where the box is not and white where it is, using a technique such as the one in http://aishack.in/tutorials/thresholding/ .
Then you'd follow a process similar to the Sudoku grabber/solver described in this blog: do blob extraction ( http://en.wikipedia.org/wiki/Blob_extraction ), then a Hough transform to get lines, and then compare the lines' distances to each other to determine the box's ratio. http://aishack.in/tutorials/sudoku-grabber-with-opencv-plot/
Pretty much just read about people's OpenCV Sudoku solvers until you get the gist of how it's done, because there are a lot of good tutorials and it's a simple illustration of how computer vision projects go: https://www.google.com.au/search?q=sudoku+opencv&aq=f&oq=sudoku+opencv&aqs=chrome.0.57j60l3j0l2.1506&sourceid=chrome&ie=UTF-8
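For a concrete feel of the colour-threshold, blob-extraction, and ratio steps described above, here is a rough sketch using the modern cv2 API, with a contour bounding box in place of the Hough-line step (the HSV bounds and filename are placeholders):

import cv2
import numpy as np

img = cv2.imread("frame.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Keep only pixels in the box's colour range (placeholder bounds for a red box).
mask = cv2.inRange(hsv, np.array([0, 120, 70]), np.array([10, 255, 255]))

# Blob extraction: take the largest connected region in the mask.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    box = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(box)
    print("box at (%d, %d), size %dx%d, aspect ratio %.2f" % (x, y, w, h, w / float(h)))
    # Apparent size vs. the known real size gives a rough distance estimate;
    # the horizontal offset from the image centre gives a rough angle.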
You may want to try installing SimpleCV from the github repo. Using SimpleCV you should be able to get the blob's color using the Image.hueDistance command. If you use the findBlobs command to find your boxes each blob should have its aspect ratio as a parameter. We just posted our full PyCon tutorial about SimpleCV here. You can view just the slides here. We've heard that there are some issues installing PyGame (a SimpleCV dependency) on the RaspberryPi. This walk through might address those issues.