Is there a simple way to track the motion of a single entity in a webcam feed? For example, I imagine a "hello world" app with an index finger used as a mouse pointer.
I realize there's still a lot of basic research in this area, so it might be too early to expect an easy-to-use, generic abstraction.
For the sake of completeness, I've seen some related but lower-level (and non-Python) projects being mentioned, including AForge, WiimoteLib and an article on motion detection algorithms.
You might want to take a look at http://opencv.willowgarage.com/wiki/PythonInterface. I'm not sure how hard it would be to do arbitrary motion tracking, but it was fairly simple to implement face tracking.
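For illustration, here is a minimal face-tracking sketch using the current cv2 bindings (the link above is for the older interface, but the idea is the same). It assumes opencv-python is installed and a webcam is available as device 0; swapping the cascade for a different detector gives you other kinds of tracking:

import cv2

# Haar cascade shipped with opencv-python; detects frontal faces.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Each detection is an (x, y, w, h) rectangle in pixel coordinates.
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()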
I'm currently working on a small startup for some extra cash. I'm using Qt 5.13, and my aim is to develop a small camera with functions like dimensional measurement (based on the lens and height), edge detection, and that sort of thing; these I will be developing in Python with OpenCV.
Anyway, my question is this: before I dive in too deep to go back, is it possible to use Qt to run a (Pi)camera fullscreen, with no edges, and just have a small transparent button in a corner for the settings? This is for the sake of the UX; I wouldn't like to have borders or to need to cut the screen size to add features.
In Qt, all cameras are treated the same, so you can prototype it on your PC first and it should work on the RPi. Using QML it should work just fine - it's a compositing framework that uses the GPU for composition, and the RPi 4 has more than enough GPU bandwidth to deal with it. QML supports semitransparent controls.
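QML is the natural fit here, but as a quick self-contained illustration of the same layering - a borderless fullscreen preview with a small translucent settings button parented on top - here is a sketch in plain PyQt5 widgets (my own example, not the answer's code; it assumes PyQt5 with QtMultimedia and a camera visible to Qt):

import sys
from PyQt5.QtMultimedia import QCamera
from PyQt5.QtMultimediaWidgets import QCameraViewfinder
from PyQt5.QtWidgets import QApplication, QPushButton

app = QApplication(sys.argv)

viewfinder = QCameraViewfinder()
viewfinder.showFullScreen()                      # edge-to-edge, no window borders

camera = QCamera()                               # default camera device
camera.setViewfinder(viewfinder)
camera.start()

# Small semi-transparent settings button overlaid in the top-left corner.
settings = QPushButton("\u2699", viewfinder)     # gear glyph as a placeholder label
settings.setStyleSheet("background: rgba(255, 255, 255, 80); border: none; font-size: 24px;")
settings.setGeometry(10, 10, 48, 48)
settings.show()

sys.exit(app.exec_())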
You may wish to see various augmented reality (AR) measurement applications available for iOS and Android (even just the Ruler included in iOS 12). You might be entering a crowded market. Those apps are not perfect, and there are simple cases that throw them off - like measuring the size of a window on a large flat wall on the side of a long but narrow room - there's too much bloom and not enough detail on the wall to have a stable depth reference, even on the best iPhone available.
If you can write software that is extremely robust, then you'll have a real market differentiator - but it won't generally be easy, and OpenCV is only a low-level building block. It's not unthinkable that you'll need some GPU-oriented computational framework instead (OpenCV provides some of it, but it's far from general).
Also, 99% of the UX will be the software, and that software should be very much portable by design, so investing anything in hardware before your software is good is a waste. Just as you suggest, an RPi 4 will do great for prototype hardware - but there's a catch: you may be limiting yourself unnecessarily by tying it all to one platform. There are so many platforms out there that settling on the RPi, when there's no market need for it, doesn't seem sensible to me.
You could use one of a multitude of WiFi battery-powered cameras with your PC: this will let you concentrate on the algorithms and functionality without having to mess with cross-compilation for RPi, etc. It'll also let you develop good software even if an RPi won't have enough bandwidth to do this realtime processing. There are faster platforms, so it'd be best not to get invested in any hardware at all. The quality of the camera will matter a lot, though, so you will want to start with a good WiFi camera, get things perfect, and then downgrade and see how far you can go. Even professional cameras provide WiFi streaming, so you can use a camera as good as you can afford. It will make things simpler to start with.
Also, don't spend much time on the UI before you get the core functionality solid. You'll be designing a "Debug" UI, and you should perhaps keep that one available but hidden in the final product.
I'm building a UGV (unmanned ground vehicle) prototype. The goal is to perform the desired actions on targets placed inside a maze. From what I've found online, plain maze navigation is usually done with a distance sensor; I'm asking here because I'd like to hear a wider range of ideas.
I want to navigate the labyrinth by analyzing the images from a 3D stereo camera. Is there a resource or proven method you can suggest for this? As a secondary problem, the vehicle must start in front of the entrance of the labyrinth, see the entrance and drive in, and then leave the labyrinth after it completes its tasks inside.
I would be glad if you could suggest a source for this problem. :)
The problem description is a bit vague, but I'll try to highlight some general ideas.
A useful assumption is that the labyrinth is a 2D environment which you want to explore. You need to know, at every moment, which parts of the map have been explored, which parts still need exploring, and which parts are accessible at all (in other words, where the walls are).
An easy initial data structure to help with this is a simple matrix, where each cell represents a square in the real world. Each cell can then be labelled according to its state, starting in an unexplored state. Then you start moving and exploring. Based on the distances reported by the camera, you can estimate the state of each cell. The exploration can be guided by something such as A* or Q-learning.
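As a rough illustration of that matrix (the grid size, cell labels, and the way readings are projected into cells are all placeholders, not a prescription):

import numpy as np

UNEXPLORED, FREE, WALL = 0, 1, 2

# 50x50 cells, e.g. one cell per 10 cm of real-world floor.
grid = np.full((50, 50), UNEXPLORED, dtype=np.uint8)

def mark(row, col, is_wall):
    """Called as the camera's depth readings are projected into grid cells."""
    grid[row, col] = WALL if is_wall else FREE

def frontier_cells():
    """FREE cells adjacent to UNEXPLORED ones - natural targets for the next move."""
    targets = []
    for r in range(1, grid.shape[0] - 1):
        for c in range(1, grid.shape[1] - 1):
            neighbours = (grid[r - 1, c], grid[r + 1, c], grid[r, c - 1], grid[r, c + 1])
            if grid[r, c] == FREE and UNEXPLORED in neighbours:
                targets.append((r, c))
    return targets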
Now, a rather subtle issue is that you will have to deal with uncertainty and noise. Sometimes you can ignore it, sometimes you can't. The finer the resolution you need, the bigger the issue. A probabilistic framework is most likely the best solution.
There is an entire field of research around the so-called SLAM algorithms. SLAM stands for simultaneous localization and mapping. These algorithms build a map from the input of various kinds of cameras or sensors, and while building the map they also solve the localization problem within it. They are usually designed for 3D environments and are more demanding than the simpler solution outlined above, but you can find ready-to-use implementations. For exploration, something like Q-learning still has to be used on top.
I have a project in which I need to analyze the layout of a building in order to navigate inside it. I was thinking about taking the blueprint of the building (or maybe an edited version of the blueprint, modified in some way I am still thinking of), transforming it into some kind of object, and then processing it.
Basically, I was thinking about doing something similar to OCR but limited (and I guess "limited" sounds pretty silly to most of you, but still, bear with me) to the recognition of, for example, walls and doors. My idea was to transform the whole image into a matrix of points - essentially a lower-resolution version of the source - and then compute, over that matrix, the route from point A to point B.
This is the idea, but I suspect I'm actually looking at a problem far more complex than it seems to me; moreover, I don't really know whether this is the best (read: easiest) way to proceed.
In short, my question is:
Is this approach feasible? Are there any libraries, for Python say, with similar functions? Could the recognition be done by working in some way with graphic design software (e.g. Photoshop)?
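For illustration only, here is a minimal sketch of the pipeline the question describes - threshold the blueprint so walls become blocked cells, downsample to a coarse grid, then search for a route. The file name, cell size, and threshold are placeholders, and it assumes walls are drawn dark on a light background:

from collections import deque
import cv2
import numpy as np

img = cv2.imread("plan.png", cv2.IMREAD_GRAYSCALE)
_, walls = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY_INV)       # dark lines -> 255

cell = 10                                                             # pixels per grid cell
h, w = walls.shape[0] // cell, walls.shape[1] // cell
grid = cv2.resize(walls, (w, h), interpolation=cv2.INTER_AREA) > 0    # True = contains wall

def route(start, goal):
    """Shortest 4-connected path over the coarse grid (BFS), or None if unreachable."""
    prev, queue = {start: None}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        r, c = node
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            if 0 <= nr < h and 0 <= nc < w and not grid[nr, nc] and nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None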
I have loaded an OBJ file to render my OpenGL model using PyOpenGL and Pygame. The 3D model shows up successfully.
Now I have cut my model into ten pieces along the y axis, and my question is: how do I get the sectional drawing of each piece?
I'm really very new to OpenGL. Is there any way to do that?
There are two ways to do this and both use clipping to "slice" the object.
In older versions of OpenGL you can use user clip planes to "isolate" the slices you want. You probably want to rotate the object before you clip it, but that's unclear from your question. You will need to call glClipPlane(), and you will need to enable each plane using glEnable with the argument GL_CLIP_PLANE0, GL_CLIP_PLANE1, ...
If you don't understand what a plane equation is you will have to read up on that.
In theory you should check how many user clip planes your GPU supports by calling glGetIntegerv with the argument GL_MAX_CLIP_PLANES, but all GPUs support at least 6.
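For illustration, a minimal PyOpenGL sketch of that legacy route (it assumes a context already created by pygame.display.set_mode with the OPENGL flag; the plane values are placeholders that isolate one slab along the y axis):

from OpenGL.GL import (glClipPlane, glEnable, glDisable, glGetIntegerv,
                       GL_CLIP_PLANE0, GL_CLIP_PLANE1, GL_MAX_CLIP_PLANES)

print("user clip planes available:", glGetIntegerv(GL_MAX_CLIP_PLANES))

# A plane is (a, b, c, d): points where a*x + b*y + c*z + d >= 0 are kept.
# Two opposing planes isolate the slab 0.2 <= y <= 0.3.
glClipPlane(GL_CLIP_PLANE0, (0.0,  1.0, 0.0, -0.2))   # keep y >= 0.2
glClipPlane(GL_CLIP_PLANE1, (0.0, -1.0, 0.0,  0.3))   # keep y <= 0.3
glEnable(GL_CLIP_PLANE0)
glEnable(GL_CLIP_PLANE1)

# ... draw the model here; only the slab between the two planes is rasterized ...

glDisable(GL_CLIP_PLANE0)
glDisable(GL_CLIP_PLANE1)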
Since user clip planes are deprecated in modern core OpenGL, you will need to use a shader to get the same effect there. See gl_ClipDistance[].
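Here is a hedged sketch of that modern route, shown as a GLSL vertex shader kept in a Python string (the uniform names are made up for illustration); on the application side you enable GL_CLIP_DISTANCE0 instead of GL_CLIP_PLANE0:

# Vertex shader source: any vertex with a negative clip distance gets clipped away.
VERTEX_SHADER = '''
#version 330 core
layout(location = 0) in vec3 position;
uniform mat4 u_mvp;          // model-view-projection matrix (name is illustrative)
uniform vec4 u_clip_plane;   // plane equation (a, b, c, d), same meaning as above

void main() {
    gl_Position = u_mvp * vec4(position, 1.0);
    gl_ClipDistance[0] = dot(vec4(position, 1.0), u_clip_plane);
}
'''

# On the Python side, with the shader program bound:
#   from OpenGL.GL import glEnable, GL_CLIP_DISTANCE0
#   glEnable(GL_CLIP_DISTANCE0)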
Searching around on Google should get you plenty of examples for either of these.
Sorry not to provide tested source code of my own, but I don't like to post code unless I am 100% sure it works, and I don't have the time right now to check it. However, I am 100% sure you can easily find some great examples on the internet.
Finally, if you can't make it work with clip planes and some hacks to make the cross sections visible, then this may indeed be complicated, because creating closed cross sections from an existing model is a hard problem.
You would need to split the object, and then rotate the pieces so that they are seen from the side. (Or move the camera. The two ideas are equivalent. But if you're coding this from scratch, you don't really have the abstraction of a 'camera'.) At that point, you can just render all the slices.
This is complicated to do in raw OpenGL and python, essentially because objects in OpenGL are not solid. I would highly recommend that you slice the object into pieces ahead of time in a modeling program. If you need to drive those operations with scripting, perhaps look into Blender's python scripting system.
Now, to explain why:
When you slice a real-life orange, you expect to get cross sections. You expect to be able to see the flesh of the fruit inside, with all those triangular pieces.
There is nothing inside a standard polygonal 3D model.
Additionally, as the rind of a real orange has thickness, it is possible to view the rind from the side. In contrast, one face of a 3D model is infinitely thin, so when you view it from the side, you will see nothing at all. So if you were to render the slices of this simple model, from the side, each render would be completely blank.
(Well, the pieces at the ends will have 'caps', like the ends of a loaf of bread, but the middle sections will be totally invisible.)
Without a programming library that has a conception of what a cut is, this will get very complicated very fast. Simply making the cuts is not enough: you must seal up the holes created by slicing into the original shape if you want to see the cross sections. Filling in the cross sections has to be done intelligently, though, otherwise you'll wind up with all sorts of weird shading artifacts (FYI: these are typically caused by n-gons, if you want to read more about that class of issue).
To return to the original statement:
Modeling programs are designed to address problems such as these, so I would suggest you leverage their power if possible. Or at least, you can examine how Blender implements this functionality, as it is open source.
In Blender, you could make these cuts with the knife tool*, and then fill up the holes with the 'make face' command (just hit F). Very simple, even for those who are not great at art. I encourage you to learn a little bit about 3D modeling before doing too much 3D programming. It personally helped me a lot.
*(The loop cut tool may do the job as well, but it's hard to tell without understanding the topology of your model. You probably don't want to get into understanding topology right now, so just use the knife)
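Picking up the Blender scripting suggestion from earlier: the cut-and-fill that the knife tool and 'make face' do by hand can also be scripted with bpy. A rough sketch for a single cut (run inside Blender with the target mesh active; the cut height is a placeholder, and depending on the Blender version the operator may need to be invoked from the 3D viewport context):

import bpy

obj = bpy.context.active_object
bpy.ops.object.mode_set(mode='EDIT')
bpy.ops.mesh.select_all(action='SELECT')

# Bisect along a horizontal plane at y = 0.5, fill the resulting hole with a face,
# and keep only the geometry below the plane. Repeat at different heights (on
# duplicated copies of the object) to get all ten closed slices.
bpy.ops.mesh.bisect(plane_co=(0.0, 0.5, 0.0),
                    plane_no=(0.0, 1.0, 0.0),
                    use_fill=True,
                    clear_outer=True,
                    clear_inner=False)

bpy.ops.object.mode_set(mode='OBJECT')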
I'm no expert in either OpenCV or Python, but after far too much messing around with poor C# implementations of CV libraries I decided to take the plunge.
Thus far I've got 'blob' (read: contour) tracking working the way I want. My problem now is occlusion - a problem which, as I (and myriad YouTube videos) understand it, the Kalman filter can solve. The trouble is, relevant examples in Python don't seem to exist, and the example code I have found is largely devoid of comments, so how a red and a yellow line running all over the shop solve my problem is a mystery to me.
What I want to achieve is something like this http://www.youtube.com/watch?v=lvmEE_LWPUc or this http://www.youtube.com/watch?v=sG-h5ONsj9s.
I'd be very grateful if someone could point me in the direction of (or provide) an example using actual images pulled from a webcam or video.
Thanks in Advance.
You can take a look at:
https://github.com/dajuric/accord-net-extensions
It implements Kalman filtering, particle filtering, Joint Probability Data Association Filter (for multi-object tracking) along with motion models.
Samples included!
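If you would rather stay in Python and OpenCV than use the C# library above, here is a minimal sketch of how cv2.KalmanFilter is commonly wired up for a 2D blob centroid with a constant-velocity model. The noise covariances are placeholder values, and get_centroid() stands in for whatever your contour tracker reports (None when the blob is occluded):

import numpy as np
import cv2

kf = cv2.KalmanFilter(4, 2)                    # state [x, y, vx, vy], measurement [x, y]
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

def track(centroid):
    """Predict every frame; correct only when the blob was actually detected."""
    prediction = kf.predict()
    if centroid is not None:                   # blob visible: fold the measurement in
        kf.correct(np.array(centroid, np.float32).reshape(2, 1))
    return prediction[0, 0], prediction[1, 0]  # estimated (x, y), even under occlusion

# Per frame: x, y = track(get_centroid(frame))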