I'm working on a hand gesture recognition project. So far I can detect the centre of the hand and track it across consecutive frames, which gives me a list of points, like:
[images: tracked point paths #1 and #2]
Now I want to recognize the path as a gesture, e.g. #1 as RIGHT and #2 as CIRCLE.
How should I do it? It should also include a way to add other gestures. Can I use an SVM for this purpose? I have a feeling it can be done with an FSM, but I can't work out how to implement it. I'm using Python and OpenCV. Thanks in advance!
There are various ways to approach this, but I believe that the easiest is to use a template matching approach.
For each gesture, have a sample that you compare to, and the result is simply the one most resembling the current sample.
For the comparison between a sample and a template, a good and simple-to-implement algorithm is Dynamic Time Warping (DTW):
https://en.wikipedia.org/wiki/Dynamic_time_warping
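A minimal sketch of that template-matching idea, assuming one template track per gesture (the gesture names and templates below are made-up examples; adding a new gesture is just adding a dictionary entry):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) DTW between two 2-D point sequences."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify(track, templates):
    """Return the name of the template closest to the observed track."""
    return min(templates, key=lambda name: dtw_distance(track, templates[name]))

# Hypothetical templates: a left-to-right swipe and a unit circle.
templates = {
    "RIGHT":  [(x, 0) for x in range(10)],
    "CIRCLE": [(np.cos(t), np.sin(t)) for t in np.linspace(0, 2 * np.pi, 20)],
}
track = [(x * 1.1, 0.05) for x in range(8)]   # noisy left-to-right swipe
print(classify(track, templates))             # → RIGHT
```

Because DTW warps in time, the track and template do not need the same length or speed, which is exactly the problem with raw point lists from frame-by-frame tracking.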
This is more of a 'what is this called' kind of question than a technical one. I have recently started playing with PyAutoGUI and am using it for some automation. To improve the overall speed, I am trying to narrow down the 'region' in which it's looking. How would I identify a region by looking for a specific "border", ignoring the internal contents? I don't really need any code (unless you're just that bored); I'm just trying to learn what techniques are available to accomplish this task, or maybe some helpful keywords I can use in my search. I am having a very difficult time finding any resources related to my objective.
For example, how would I match the entire dimensions of the following picture regardless of what is inside the frame.
I'm comparing some open-source face-recognition frameworks in Python (dlib), and for that I want to create ROC and DET curves. To create the match scores I'm using the CASIA-FaceV5 dataset. Everything is for educational purposes only.
My question is:
What's the best way to generate these kinds of curves? (Any good libraries for that?)
I found scikit-learn via Google, but I still don't know how I should use it for face recognition.
I mean, which information do I have to pass? I know that ROC uses the true match rate and the false match rate, but from a developer's point of view I just don't know how to feed that information into the scikit-learn function.
My Test:
I'm creating genuine match scores for every person in the CASIA dataset. For this I compare different pictures of the same person. I save these scores in the array "genuineScores".
Example:
Person1_Picture1.jpg comparing with Person1_Picture2.jpg
Person2_Picture1.jpg comparing with Person2_Picture2.jpg etc.
I'm also creating impostor match scores. For this I compare two pictures of different persons. I save these scores in the array "impostorScores".
Example:
Person1_Picture1.jpg comparing with Person2_Picture1.jpg
Person2_Picture1.jpg comparing with Person3_Picture1.jpg etc.
Now I'm just looking for a library where I can pass in the two arrays and it creates a ROC curve for me.
Or is there another method for doing so?
I appreciate any kind of help. Thank you.
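A minimal sketch of what scikit-learn expects, assuming higher scores mean a better match (the score values below are made up, standing in for the two arrays described above): label every genuine comparison 1 and every impostor comparison 0, concatenate the labels and the scores, and pass both to `roc_curve`.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Made-up similarity scores standing in for the real arrays.
genuineScores = np.array([0.91, 0.87, 0.95, 0.78])   # same-person pairs
impostorScores = np.array([0.12, 0.35, 0.27, 0.44])  # different-person pairs

y_true = np.concatenate([np.ones_like(genuineScores),
                         np.zeros_like(impostorScores)])
y_score = np.concatenate([genuineScores, impostorScores])

# fpr is the false match rate, tpr the true match rate, at each threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))  # → AUC: 1.0 (toy data separates perfectly)
```

If your scores are distances (lower = better match), negate them before passing them in. A DET curve is the same data plotted as FNMR (`1 - tpr`) against FMR (`fpr`), typically on normal-deviate axes; recent scikit-learn versions also provide `det_curve` directly.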
I'm working on a project where I have to match one video sequence against another. The actions and motions in the two videos are similar, since the video I'm matching against shows the exact same movement as the other one. Currently I'm leaning towards using dynamic time warping (DTW) to align the two videos, but I'm having trouble coming up with an approach. So I'm wondering if you have any source code I can work with, or any ideas on how to dissect this problem. Thank you.
A starting point would be to understand DTW: https://www.cs.unm.edu/~mueen/DTW.pdf
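As a sketch of what DTW gives you for alignment: besides a distance, the warping path maps each frame of one video to its best-matching frame in the other. The per-frame features below are made-up 1-D values; in practice you would use something like pose keypoints or embedding vectors per frame.

```python
import numpy as np

def dtw_path(a, b):
    """DTW with backtracking: returns a list of (i, j) frame correspondences."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Same motion, but the second "video" plays at half speed.
video_a = [0, 1, 2, 3, 4]
video_b = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
print(dtw_path(video_a, video_b))
```

Each pair `(i, j)` in the result says "frame i of video A corresponds to frame j of video B", which is exactly the alignment needed to compare the two performances frame by frame.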
In general, is there any "best practice" for using videos as input to deep learning models? How can we annotate video in the most efficient way?
I also have some videos of ducks walking through a passage. I want to count the number of grey ducks and the number of yellow ducks passing through the passage. A duck can pass straight through (easiest case), stay in the passage for a while and then pass through, or go halfway through and turn back (in which case it should not be counted).
I plan to use Mask R-CNN to segment the ducks in each frame, then compare the masks from frame i with the masks from frame i+1 and apply rules to count the number of distinct ducks that truly pass through the passage.
This does not seem optimal to me.
Any ideas/help/hints?
I guess it depends on the video, but a good option is to:
Annotate some 'not too similar' frames with: http://www.robots.ox.ac.uk/~vgg/software/via/
Use a model like YOLO or Mask R-CNN to find a bounding box over each object and classify it. An optical flow algorithm is also an option instead of deep learning, but I ultimately decided not to use it because of several possible outcomes that, from my point of view, make it less automatic:
* an object that moves, stops, and starts moving again would require special attention
* objects of one main colour might be split into two pieces (the middle pixels might be seen as not moving)
* a group of objects passing together will probably be seen as one object
Then, using a tracking algorithm, you will be able to give a specific ID to each object, and hence count when it passes a certain line.
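A toy sketch of that counting step, assuming a tracker (e.g. SORT, or simple nearest-centroid matching) has already produced one centroid track per duck. Counting only tracks that start on one side of the line and end on the other automatically excludes ducks that turn back. The line coordinate and tracks below are made up.

```python
LINE_X = 100  # vertical counting line (made-up coordinate)

def count_crossings(tracks):
    """tracks: dict id -> list of (x, y) centroids over time.
    Count IDs whose track starts left of the line and ends right of it,
    so a duck that turns back halfway is not counted."""
    count = 0
    for pts in tracks.values():
        if pts[0][0] < LINE_X and pts[-1][0] >= LINE_X:
            count += 1
    return count

tracks = {
    1: [(20, 50), (80, 52), (130, 55)],            # passes through -> counted
    2: [(30, 40), (90, 41), (60, 39), (25, 38)],   # turns back -> not counted
    3: [(10, 60), (50, 61), (95, 62)],             # still in passage -> not counted
}
print(count_crossings(tracks))  # → 1
```

Running this per colour class (grey vs. yellow, from the detector's labels) gives the two counts separately.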
I want to do object extraction from images. For example, I want to count the humans in a picture, find similar pictures in a large database (like Google does), or determine the setting of a picture (nature, office, or home), etc.
Do you know any Python library or module for this kind of work?
It would help if you could link me to:
a tutorial or instructions for this work
a similar example project
Perhaps try SimpleCV?
Here is a video of a presenter at PyCon who runs through a quick tutorial on how to use SimpleCV. About halfway through, at 9:50, she demonstrates how to detect faces in an image, which you might be able to use for your project.
Try this out: https://github.com/CMU-Perceptual-Computing-Lab/openpose
I used it to detect multiple persons and extract their skeleton joints. It's also a little sensitive, so some post-processing is needed to remove outliers caused by reflections on the floor, glass walls, etc.