I want to do object extraction from images. For example, I want to count the people in a picture, find similar pictures in a large database (like Google's image search), or determine the setting of a picture (nature, office, or home), etc.
Do you know of any Python library or module for this kind of work?
If possible, could you link me to:
a tutorial or instructions for this kind of work
a similar example project
Perhaps using SimpleCV?
Here is a video of a presenter at PyCon who runs through a quick tutorial on how to use SimpleCV. About halfway through, at 9:50, she demonstrates how to detect faces in an image, which you might be able to use for your project.
Try this out: https://github.com/CMU-Perceptual-Computing-Lab/openpose
I used it to detect multiple persons and extract the skeleton joints. It's also a little sensitive, so some post-processing is needed to remove outliers caused by reflections on the floor, glass walls, etc.
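As a rough illustration of the kind of post-processing involved (this is not part of OpenPose itself; the `(x, y, confidence)` joint layout and the thresholds below are assumptions for the sketch), one can drop low-confidence joints and positions that jump implausibly far between consecutive frames:

```python
def filter_outliers(trajectory, conf_threshold=0.3, max_jump=80.0):
    """Keep joints that are confident and move smoothly frame to frame.

    `trajectory` is one joint's detections over time as (x, y, confidence)
    tuples; unreliable frames are replaced with None.
    """
    cleaned = []
    prev = None
    for x, y, conf in trajectory:
        if conf < conf_threshold:
            cleaned.append(None)          # low-confidence detection: discard
            continue
        if prev is not None:
            jump = ((x - prev[0]) ** 2 + (y - prev[1]) ** 2) ** 0.5
            if jump > max_jump:           # sudden teleport: likely a reflection
                cleaned.append(None)
                continue
        cleaned.append((x, y))
        prev = (x, y)
    return cleaned

# Toy trajectory: frame 3 is a reflection, frame 5 is a weak detection.
trajectory = [(100, 100, 0.9), (102, 101, 0.9), (400, 400, 0.9),
              (104, 103, 0.9), (105, 104, 0.1)]
cleaned = filter_outliers(trajectory)
```

The thresholds would need tuning per camera setup; a smoother alternative is a median filter over each joint's coordinates.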
Related
This is more of a 'what is this called' kind of question than a technical one. I have recently started playing with PyAutoGUI and I am using it for some automation. To improve the overall speed, I am trying to narrow down the 'region' in which it's looking. How would I identify a region by looking for a specific "border" while ignoring the internal contents? I don't really need any code, unless you're just that bored; I'm just trying to learn what techniques are available to accomplish this task, or maybe some helpful keywords I can use in my search. I am having a very difficult time finding any resources that relate to my objective.
For example, how would I match the entire dimensions of the following picture regardless of what is inside the frame.
Kindly find the link to the image in question here.
I've tried using PyTesseract to achieve the intended objective. While it works well for extracting the words, it doesn't pick up the numbers with any acceptable precision; in fact, it doesn't pick up the numbers I require at all. I intend to design a program that picks up numbers from four particular locations in an image and stores them in a structured data variable (list/dictionary/etc.). Since I need to do this for a good 2500-odd screenshots, I cannot pick out the numbers manually, even if the program begins to read them correctly. The following output was returned by PyTesseract for the image discussed above.
`Activities Boyer STA
Candle Version 4.1-9 IUAC, N.Delhi - BUILD (Tuesday 24 October 2017 04:
CL-F41. Markers:
—
896 13) 937.0
Back
Total,
Peak-1
Lprnenea dais cinasedl
Ee
1511 Show State
Proceed Append to File`
The code used to produce this output was:
try:
from PIL import Image
except ImportError:
import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe'
print(pytesseract.image_to_string(Image.open('C:/Users/vatsa/Desktop/Screenshot from 2020-06-15 21-41-06.png')))
Referring to the image, I'm interested in extracting the numbers present at those four positions across all the screenshots; in this picture they are 146.47, 915.16, 354.5 and 18.89, and I would probably save them as a list. How can I achieve such functionality using Python?
Also, opening the image in question with Google Docs (linked here) shows what a great job Google does of extracting the text. Could an automated program use Google Docs to do this conversion and then scrape the desired data values as described above? Either approach to solving the issue would be acceptable, and any attempt at finding a solution would be highly appreciated.
[edit]: The question suggested in the comments section was really insightful, yet fell short of proving effective: the given code was unable to find the contours of the numbers in the image, so the model could not be trained.
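One possible approach, sketched under assumptions: if the four numbers always sit at fixed pixel positions, you can crop each region with Pillow and run Tesseract with a page-segmentation mode and character whitelist suited to a single line of digits (`--psm 7` and `tessedit_char_whitelist` are real Tesseract options). The crop boxes and file name below are placeholders, not taken from the actual screenshot; only the `parse_number` helper is generic:

```python
import re

def parse_number(raw):
    """Pull the first decimal number out of a noisy OCR string, or None."""
    match = re.search(r"\d+(?:\.\d+)?", raw)
    return float(match.group()) if match else None

# Made-up crop boxes (left, upper, right, lower) -- measure these once
# against your screenshots, since the layout is the same in all of them.
REGIONS = [(50, 100, 150, 130), (200, 100, 300, 130),
           (50, 200, 150, 230), (200, 200, 300, 230)]

def numbers_from_screenshot(path):
    """OCR each fixed region of one screenshot and return a list of floats."""
    from PIL import Image
    import pytesseract
    image = Image.open(path)
    # Treat each crop as a single text line containing only digits and dots.
    config = "--psm 7 -c tessedit_char_whitelist=0123456789."
    return [parse_number(pytesseract.image_to_string(image.crop(box),
                                                     config=config))
            for box in REGIONS]

# Usage (requires Pillow, pytesseract and a real screenshot on disk):
#   values = numbers_from_screenshot("screenshot.png")
```

Cropping first matters: Tesseract's accuracy on small numbers improves considerably when the rest of the UI clutter is excluded, and upscaling the crops 2-3x before OCR often helps further.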
I am trying to build an image comparison module as part of an open-source e-commerce solution. I have been reading about various computer vision techniques using OpenCV and Python.
Objective:
I need to pull out similar images from thousands of images available on the site. The images are primarily of clothing: shirts, pants, tops, etc.
For example, when someone is looking for a dotted dress, they should see products with a similar pattern, and maybe in the same color.
So far I have seen multiple ways to pull out similar images, but due to lack of experience I can't figure out which is the right method. Some of the possible solutions I stumbled upon:
Histogram comparison.
Feature matching (Wouldn't it match the patterns?)
Haar classifier (I assume training on a lot of dotted dresses may yield results)
Bag of words method.
Texture Matching using Local Binary Patterns
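To make the first option concrete, here is a minimal, library-free sketch of histogram comparison using histogram intersection (a real system would build the histograms from actual pixels with OpenCV's `calcHist` and score them with `compareHist`; the tiny 4-bin histograms below are purely illustrative):

```python
def normalize(hist):
    """Scale a histogram so its bins sum to 1."""
    total = sum(hist)
    return [v / total for v in hist] if total else hist

def intersection(h1, h2):
    """Histogram intersection: 1.0 = identical distributions, 0.0 = disjoint."""
    return sum(min(a, b) for a, b in zip(normalize(h1), normalize(h2)))

# Toy 4-bin "color histograms" for three products.
dotted_dress_a = [10, 40, 40, 10]
dotted_dress_b = [12, 38, 42, 8]
plain_shirt    = [90, 5, 3, 2]

# Rank catalogue images by similarity to the query image.
query = dotted_dress_a
catalogue = {"dress_b": dotted_dress_b, "shirt": plain_shirt}
ranked = sorted(catalogue, key=lambda k: intersection(query, catalogue[k]),
                reverse=True)
```

Note the limitation this exposes: histograms capture color distribution but ignore spatial layout, which is why texture descriptors like Local Binary Patterns are usually combined with them for pattern-sensitive matching like dots versus stripes.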
I also saw LIRE, based on Lucene, which serves a similar purpose, but I couldn't understand which method should be used here. For example, their documentation mentions that LIRE supports the following:
cl_ha .. ColorLayout
ph_ha .. PHOG
oh_ha .. OpponentHistogram
eh_ha .. EdgeHistogram
jc_ha .. JCD
ce_ha .. CEDD
sc_ha .. ScalableColor
Any input/direction on the best approach would be very much appreciated.
I am working on an image processing and computer vision project. The project is to count the number of people entering a conference. This needs to be done in OpenCV with Python.
I have already tried the Haar Cascade that is available in OpenCV for Upper body: Detect upper body portion using OpenCV
However, it does not address the requirement. The link to the videos is as follows:
https://drive.google.com/open?id=0B3LatSCwKo2benZyVXhKLXV6R0U
If you view the sample1 file, at 0:16 a person enters the room; that would always be the way people enter. The camera is on top of the door.
Identifying People from this Aerial Video Stream
I think there is a simple way of approaching this problem. Background subtraction methods for detecting moving objects are just what you need, because the video you provided seems to have only one moving object at any point: the person walking through the door. Thus, if you follow this tutorial in Python, you should be able to implement a satisfying solution for your problem.
Counting People Entering / Exiting
Now, the first question that pops to mind is: how do I count people if multiple people walk through the door at separate time intervals (one person walks in 10 seconds into the video and a second person walks in 20 seconds in)? Here's the simplest solution to this that I can think of. Once you've detected a blob via background subtraction, you only have to track it until it goes off the frame. Once it leaves the frame, the next blob you detect must be a new person entering the room, so you can continue counting. If you aren't familiar with how to track objects once they have been detected, give this tutorial a read. In this manner, you avoid counting the same blob (i.e., the same person) entering too many times.
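The count-when-the-blob-leaves idea can be sketched in a deliberately tiny, library-free form. In real code each frame would be a 2-D image and the foreground mask would come from OpenCV's `createBackgroundSubtractorMOG2`; here each "frame" is just a short row of pixel intensities compared against a static background:

```python
def count_entries(frames, background, threshold=30):
    """Count people passing under the camera.

    A 'person' is a run of consecutive frames in which some pixel differs
    from the background by more than `threshold`; the count increments
    when that blob disappears from the frame.
    """
    count = 0
    blob_present = False
    for frame in frames:
        moving = any(abs(p - b) > threshold for p, b in zip(frame, background))
        if moving and not blob_present:
            blob_present = True            # a new blob has entered the frame
        elif not moving and blob_present:
            blob_present = False           # the blob left: one person counted
            count += 1
    return count

# Toy sequence: two people cross the frame at separate times.
background = [10] * 5
frames = [
    [10] * 5,                   # empty scene
    [10, 10, 200, 10, 10],      # person 1 appears
    [10, 200, 10, 10, 10],      # person 1 still moving through
    [10] * 5,                   # person 1 gone -> count = 1
    [10] * 5,
    [200, 10, 10, 10, 10],      # person 2 appears
    [10] * 5,                   # person 2 gone -> count = 2
]
entries = count_entries(frames, background)
```

One caveat the sketch makes visible: a blob still in frame when the video ends is never counted, and any two people overlapping in time would merge into one blob, which is exactly the harder case discussed below.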
The Difficulties in Processing Complex Dynamic Environments
If you think that there is a high level of traffic through that doorway, then the problem becomes much more difficult. This is because in that case there may not be much stationary background to subtract at any given moment, and further there may be a lot of overlap between detected blobs. There is a lot of active research in the area of autonomous pedestrian tracking and identification - so, in short, it's a difficult question that doesn't have a straightforward easy-to-implement solution. However, if you're interested in reading about some of the potential approaches you could take to solving these more challenging problems in pedestrian detection from an aerial view, I'd recommend reading the answers to this question.
I hope this helps, good luck coding!
I'm working on a hand gesture recognition project. So far I am able to detect the centre of the hand and track it for some consecutive frames, and I'm able to get a list of points like:
1
2
Now I want to recognize the path as a gesture, e.g. RIGHT for #1 and CIRCLE for #2.
How should I do it? It should also include a method to add other gestures. Can I use an SVM for this purpose? I feel it can be done using an FSM, but I can't figure out how to implement it. I'm using Python and OpenCV. Thanks in advance!
There are various ways to approach this, but I believe the easiest is a template matching approach.
For each gesture, keep a sample you compare against; the result is simply the template that most resembles the current sample. Adding a new gesture is then just adding a new template, which matches your extensibility requirement.
For the comparison between a sample and a template, a good and simple-to-implement algorithm is Dynamic Time Warping (DTW):
https://en.wikipedia.org/wiki/Dynamic_time_warping
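A minimal DTW implementation for comparing a tracked point path against gesture templates might look like this (the two straight-line templates below are toy examples; real templates would be recorded from actual hand tracks, and a CIRCLE template would simply be another entry in the dictionary):

```python
def dtw(a, b):
    """Dynamic Time Warping distance between two 2-D point sequences,
    using Euclidean distance between points as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = ((a[i - 1][0] - b[j - 1][0]) ** 2 +
                 (a[i - 1][1] - b[j - 1][1]) ** 2) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def classify(path, templates):
    """Return the name of the template closest to the tracked path."""
    return min(templates, key=lambda name: dtw(path, templates[name]))

# Toy templates: a rightward swipe and an upward swipe.
templates = {
    "RIGHT": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "UP":    [(0, 0), (0, 1), (0, 2), (0, 3)],
}
```

In practice you would normalize each tracked path (translate to the origin and scale to a fixed size) before calling `dtw`, so that where on the screen the gesture was drawn does not affect the match.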