Is there any MediaPipe module that detects hands WITHOUT detecting their pose?
The reason I ask is that the examples I find on the internet end up running slowly on my computer, and I don't need to know the position of the fingers, just the hand.
I tried googling it, but all the videos/tutorials I find use the same code (which detects every landmark on the hand). I'm not well versed in ML, so I don't know whether there simply isn't a ready-made model for this or whether I just don't know the right terms to search for.
As an aside, if anyone knows a way to use GPU acceleration on Windows, that would also work, as I believe it would improve the FPS. Everything I found said this is only possible on Linux, so I gave up and decided to look for a simpler model that uses less CPU.
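For context, what I'm running is essentially the standard MediaPipe Hands example. The closest thing to "lighter" I've found so far is the model_complexity=0 option (available in newer MediaPipe versions, if I understand correctly), which still runs the full landmark model and then derives the box I actually need, so treat this as a sketch of where I am, not a solution:

    import cv2
    import mediapipe as mp

    # Standard MediaPipe Hands setup; model_complexity=0 picks the lighter
    # landmark model (newer MediaPipe versions only, as far as I can tell)
    hands = mp.solutions.hands.Hands(
        static_image_mode=False,
        max_num_hands=2,
        model_complexity=0,
        min_detection_confidence=0.5)

    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            h, w, _ = frame.shape
            for hand in results.multi_hand_landmarks:
                # I only need a bounding box, so I derive it from the landmarks
                xs = [p.x * w for p in hand.landmark]
                ys = [p.y * h for p in hand.landmark]
                cv2.rectangle(frame, (int(min(xs)), int(min(ys))),
                              (int(max(xs)), int(max(ys))), (0, 255, 0), 2)
        cv2.imshow('hands', frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
            break
    cap.release()
    cv2.destroyAllWindows()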
I'm currently using AI Feynman to try to create an expression for some data we've taken. However, I'm having a couple of issues during the process and was wondering if anybody in the community has any advice. I've looked around, but there doesn't seem to be much documentation on the internet.
I'm having two problems. The first has to do with units: I've searched all over, but I can't seem to find how to format units so that AI Feynman can perform dimensional analysis. If anybody could provide insight, it would be much appreciated.
My main problem, however, is with the neural-network stage. When running in a Jupyter notebook, AI Feynman gets through the brute-force stage, but as soon as it prints "Training a NN on the data...", the kernel dies. I've tried setting up GPUs manually, but that didn't work. Does anybody know how to solve these issues?
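For reference, the call I'm making looks roughly like this (the paths and filenames are placeholders, and my reading of how vars_name ties into the units table may well be wrong, which is part of my question):

    import aifeynman

    # Placeholder paths and filenames -- adjust to the actual data layout.
    # Each row of data.txt: one column per input variable, last column = target.
    aifeynman.run_aifeynman(
        "./my_data/",   # directory containing the data file
        "data.txt",     # whitespace-separated data file
        60,             # brute-force time limit in seconds
        "14ops.txt",    # operator set for the brute-force stage
        polyfit_deg=3,  # maximum degree for the polynomial fit
        NN_epochs=100,  # fewer epochs make the NN stage cheaper to test
        vars_name=["x1", "x2", "y"])  # as I understand it, these names are
                                      # matched against a units.csv table for
                                      # the dimensional-analysis step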
I'm asking this question because two weeks of research have left me really confused.
I have a bunch of images from which I want to read the numbers at runtime (this is needed for the reward function in reinforcement learning). The thing is, they look pretty clear to me (I know that's an entirely different matter for OCR systems, which is why I'm providing additional images to show what I'm talking about).
Because they seemed rather clear, I first tried PyTesseract, and when that didn't work out I started researching which other methods could be useful to me.
...and that's how my search ended up here, because two weeks of trying to find out which method would be best suited to my problem just raised more questions.
Currently I think the best solution is to train a digit-recognition model on the MNIST/SVHN datasets, but isn't that a little bit of overkill? I mean, the images are standardized, grayscale, and small, and the font of the numbers stays the same, so I suppose there's an easier way of modifying those images or using a different OCR method.
That is why I'm asking two questions:
1. Which method would be the most useful in my case, if not a model trained on the MNIST/SVHN datasets?
2. Is there any documentation/book/source that could make the actual choice of infrastructure easier? I mean, say that in the future I again have to plan which OCR system to use: on what basis should I make the choice? Is it purely a trial-and-error thing?
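For reference, my PyTesseract attempt was roughly the following (the file name is a placeholder and the preprocessing is just what I experimented with, not a recommendation):

    import cv2
    import pytesseract

    img = cv2.imread('frame_crop.png', cv2.IMREAD_GRAYSCALE)  # placeholder path
    # Upscale and binarize: small, clean digits often OCR better this way
    img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
    _, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Treat the crop as a single text line and restrict the alphabet to digits
    text = pytesseract.image_to_string(
        img, config='--psm 7 -c tessedit_char_whitelist=0123456789')
    print(text)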
If what you have to recognize are those 7-segment digits, forget about any OCR package.
Use the outline of the window to find the size and position of the digits, then count the black pixels in seven predefined areas corresponding to the segments.
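A minimal sketch of the idea (the segment geometry below is a rough placeholder; calibrate the areas against your own crops):

    # Segment order: top, top-left, top-right, middle,
    # bottom-left, bottom-right, bottom
    DIGITS = {
        (1, 1, 1, 0, 1, 1, 1): 0,
        (0, 0, 1, 0, 0, 1, 0): 1,
        (1, 0, 1, 1, 1, 0, 1): 2,
        (1, 0, 1, 1, 0, 1, 1): 3,
        (0, 1, 1, 1, 0, 1, 0): 4,
        (1, 1, 0, 1, 0, 1, 1): 5,
        (1, 1, 0, 1, 1, 1, 1): 6,
        (1, 0, 1, 0, 0, 1, 0): 7,
        (1, 1, 1, 1, 1, 1, 1): 8,
        (1, 1, 1, 1, 0, 1, 1): 9,
    }

    def read_digit(digit_img):
        # digit_img: grayscale numpy crop of one digit, dark digit on light background
        h, w = digit_img.shape
        dh, dw = max(1, int(h * 0.15)), max(1, int(w * 0.25))  # segment thickness
        areas = [                                    # (y, x, height, width)
            (0, dw, dh, w - 2 * dw),                 # top
            (0, 0, h // 2, dw),                      # top-left
            (0, w - dw, h // 2, dw),                 # top-right
            (h // 2 - dh // 2, dw, dh, w - 2 * dw),  # middle
            (h // 2, 0, h // 2, dw),                 # bottom-left
            (h // 2, w - dw, h // 2, dw),            # bottom-right
            (h - dh, dw, dh, w - 2 * dw),            # bottom
        ]
        pattern = []
        for y, x, ah, aw in areas:
            roi = digit_img[y:y + ah, x:x + aw]
            # a segment counts as "on" if most of its area is dark
            pattern.append(1 if (roi < 128).mean() > 0.5 else 0)
        return DIGITS.get(tuple(pattern))  # None for an ambiguous pattern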
Quick background: I'm a pretty proficient programmer when it comes to MATLAB, but I'm experimenting with learning Python for fun and out of curiosity. I'm working on Windows (I'll be dual-booting again soon after this experience...) and am using Anaconda (Python 3.6), as I needed SciPy and, being new, found that figuring out the setup on Windows was difficult at best.
I am working on 3D path planning for UAVs. I have a 3D array (say 1000x500x50) where unobstructed space has a value of zero and "no fly zones" have a value of 1. I can create variable terrains representative of a forest with trees, varying floors, etc., and I'd like a simple way to view this environment. I could sort of get away with doing this as a point cloud by subsampling, increasing density, and then displaying it, but that is a less-than-ideal solution.
I've been looking into Mayavi as a possible method for doing this, but I haven't been able to find examples at this kind of scale. I did write a script to draw a patch for each face of each obstructed cube, and this does work, but it is horribly inefficient. I'm just looking for a better solution if one exists. Also, Mayavi doesn't seem to be supported in Anaconda yet (at least for 3.6), so I'd like a little more of a lead before I start switching builds again. Thanks, everyone!
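For what it's worth, the kind of Mayavi call I was hoping would scale is roughly the following, which draws one cube glyph per obstructed cell instead of one patch per face. It's untested in my own environment (since Mayavi isn't available in my Anaconda 3.6 setup), and the grid contents are made up for illustration:

    import numpy as np
    from mayavi import mlab

    # Made-up occupancy grid: 1 = no-fly zone, 0 = free space
    grid = np.zeros((1000, 500, 50), dtype=np.uint8)
    grid[100:150, 200:260, 0:30] = 1  # a block of "trees" for illustration

    # One cube glyph per obstructed cell, instead of one patch per face
    xs, ys, zs = np.where(grid == 1)
    mlab.points3d(xs, ys, zs, mode='cube', scale_factor=1.0,
                  color=(0.2, 0.5, 0.2))
    mlab.show()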
I'm doing a project with TensorFlow that consists of analyzing UML diagrams drawn on a whiteboard or on tablet devices, in order to end up with a file containing the correct UML diagram, usable in software tools. The system will also use machine learning (which is why we chose TensorFlow).
As our research has progressed, my partner and I have been facing a problem: we don't know how to detect object positions in a picture with TensorFlow. We did some research and found some articles talking about it, but no real conclusion. We eventually came across this, but we're left with no real leads on what to do.
Our real question is more like: is there anything new since then (because TensorFlow is evolving pretty fast, in my opinion)? Could we have some articles/hints on what to do?
Thanks in advance.
You should take a look at this work: https://github.com/Russell91/TensorBox and the associated paper.
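On the "anything new" front: if you instead export a detector with the TensorFlow Object Detection API (which appeared after TensorBox), inference reduces to roughly the following sketch. The tensor names are the standard ones in that API's exported frozen graphs, and 'frozen_inference_graph.pb' plus the input image are placeholders:

    import numpy as np
    import tensorflow as tf

    # Load a frozen graph exported by the TF Object Detection API
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

    with tf.Session(graph=graph) as sess:
        # Placeholder input: an RGB uint8 image of shape (H, W, 3)
        image = np.zeros((480, 640, 3), dtype=np.uint8)
        boxes, scores, classes = sess.run(
            ['detection_boxes:0', 'detection_scores:0', 'detection_classes:0'],
            feed_dict={'image_tensor:0': np.expand_dims(image, axis=0)})
        # boxes are normalized [ymin, xmin, ymax, xmax]; keep confident ones
        for box, score in zip(boxes[0], scores[0]):
            if score > 0.5:
                print(box, score)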
I'm an expert in neither OpenCV nor Python, but after far too much messing around with poor C# implementations of CV libraries, I decided to take the plunge.
Thus far I've got 'blob' (read: contour) tracking working the way I want; my problem now is occlusion, a problem which, as I (and myriad YouTube videos) understand it, the Kalman filter can solve. The problem is that relevant examples in Python don't seem to exist, and the example code that does is largely devoid of comments, ergo how a red and a yellow line running all over the shop solve my problem is a mystery to me.
What I want to achieve is something like this http://www.youtube.com/watch?v=lvmEE_LWPUc or this http://www.youtube.com/watch?v=sG-h5ONsj9s.
I'd be very grateful if someone could point me in the direction of (or provide) an example using actual images pulled from a webcam or video.
Thanks in advance.
You can take a look at:
https://github.com/dajuric/accord-net-extensions
It implements Kalman filtering, particle filtering, and a Joint Probability Data Association Filter (for multi-object tracking), along with motion models.
Samples included!
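If you'd rather stay in Python/OpenCV than move to C#, a minimal constant-velocity cv2.KalmanFilter sketch looks roughly like this; the noise covariances are generic textbook values, not tuned for any particular video:

    import cv2
    import numpy as np

    # 4 state variables (x, y, vx, vy), 2 measured (x, y): constant velocity
    kf = cv2.KalmanFilter(4, 2)
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], dtype=np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], dtype=np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

    def track(blob_center):
        # blob_center: (x, y) from your contour tracker, or None when occluded
        prediction = kf.predict()  # predicted state (x, y, vx, vy)
        if blob_center is not None:
            measurement = np.array([[blob_center[0]], [blob_center[1]]],
                                   dtype=np.float32)
            kf.correct(measurement)  # fold the new measurement back in
        # during occlusion we just keep reporting the prediction
        return float(prediction[0, 0]), float(prediction[1, 0])

Call track() once per frame; when your blob tracker loses the object behind an occluder, pass None and keep drawing the predicted position until the contour reappears.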