I've worked with several methods in OpenCV to identify and track moving objects based on changing pixels or color, but none that deal with the area these objects move through, so I'm asking here whether anyone has a clue about this topic.
Conceptually the idea is pretty simple: let's say we have a bunch of moving objects in a video.
As these objects pass by, we would like to identify the boundaries of the regions they cover, i.e. their "trails".
By the end of the video, or after a set time, the idea would be to know what these boundaries are so we can compute properties such as their area.
My hunch would be to use Lucas-Kanade optical flow to track corner points as the objects pass by and keep the outermost ones, but so far nothing has worked, so I'm unsure this is the right approach.
Would anyone have a clue about the approach to take? Thanks!
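A different route from tracking corner points, sketched here as an assumption rather than a known solution: build a motion mask for each frame by simple frame differencing and OR all the masks together, so the accumulated mask marks every pixel an object ever crossed. The function names and the threshold of 25 grey levels are illustrative, and frames are assumed to be grayscale NumPy arrays of equal size (e.g. produced with cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)).

```python
import numpy as np


def accumulate_trail(frames, diff_threshold=25):
    """OR together per-frame motion masks from a sequence of grayscale frames.

    frames: iterable of 2-D uint8 arrays of identical shape.
    diff_threshold: assumed grey-level change that counts as motion; tune it.
    Returns a boolean mask that is True wherever motion was seen at least once.
    """
    frames = iter(frames)
    prev = next(frames).astype(np.int16)  # int16 avoids uint8 wrap-around
    trail = np.zeros(prev.shape, dtype=bool)
    for frame in frames:
        cur = frame.astype(np.int16)
        trail |= np.abs(cur - prev) > diff_threshold  # remember changed pixels
        prev = cur
    return trail


def trail_area(trail):
    """Trail area in pixels; multiply by the area of one pixel for real units."""
    return int(np.count_nonzero(trail))
```

Plain differencing only fires where pixel values change, so the interior of a large, uniformly colored object may leave holes. Swapping in a background-subtraction mask (e.g. from cv2.createBackgroundSubtractorMOG2) and applying a morphological close before cv2.findContours would likely give cleaner boundaries.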
Related
I have been working on a Python program using OpenCV that will help the user solve the Rubik's Cube. The most important and complicated part is detecting the cube so the program can read the colors on each of its faces.
I have had a decent amount of luck so far but am wanting to change the processing pipeline a little. I think it would make sense to isolate the cube from its background before trying to detect the (rounded) square stickers and read their colors.
Attached is an example of the sort of frame we would be dealing with. I'm not sure what the best method for isolating the cube from the background would be. I have tried background subtraction, which seemed somewhat promising (though the mask was very grainy and blotchy), but I am also wondering whether it would make more sense to use something like object detection.
I have considered just making a "dumb" crop which makes the user align the cube within a reticle. However, I would prefer a more elegant solution and don't mind spending the additional time that entails.
Edit: maybe the mild bokeh could be used to identify the background, or is it too minuscule to detect consistently?
Thanks for any help!
I have a camera in a fixed position looking at a target, and I want to detect whether someone walks in front of the target. The lighting in the scene can change, so subtracting the previous frame from the new one would detect motion even when none has actually occurred. I have thought of comparing the number of contours between the two frames (found by running findContours() on a binary edge image from Canny and taking the size() of the result), since a big change there could indicate movement while being less sensitive to lighting changes. I am quite new to OpenCV, and my implementations have not been successful so far. Is there a way I could make this work, or will I have to just subtract the frames? I don't need to track the person, just detect whether they are in the scene.
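One simple way to make frame comparison less sensitive to global lighting changes (a swapped-in idea, not the contour-count approach described above) is to normalize each frame to zero mean and unit variance before differencing, so a uniform brightness change cancels out. This is a sketch assuming grayscale NumPy frames and a stored reference frame of the empty scene; the function names and both thresholds are hypothetical values to tune.

```python
import numpy as np


def normalize(frame):
    """Zero-mean, unit-variance copy of a grayscale frame.

    A global change of the form gain * frame + offset is cancelled out.
    """
    f = frame.astype(np.float64)
    std = f.std()
    return (f - f.mean()) / (std if std > 0 else 1.0)


def someone_present(reference, frame, pixel_threshold=1.0, area_fraction=0.01):
    """True if enough pixels differ from the empty-scene reference.

    pixel_threshold (in normalized units) and area_fraction are assumed values.
    """
    diff = np.abs(normalize(frame) - normalize(reference))
    changed = np.count_nonzero(diff > pixel_threshold)
    return changed > area_fraction * diff.size
```

Local lighting effects, such as shadows or a lamp switching on in one corner, are not cancelled by this normalization, so a proper background model or a person detector may still be needed.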
I am a bit rusty, but there are various ways to do this.
SIFT and SURF are very expensive operations, so I don't think you would want to use them.
There are a few background-removal methods:
1. Average removal: take the average of N frames and treat it as the background. This is vulnerable to many things: lighting changes, shadows, a moving object that stays in one place for a long time, etc.
2. Gaussian mixture model: a bit more advanced than method 1, but still vulnerable to a lot of things.
3. IncPCP (incremental principal component pursuit): I can't remember the algorithm fully, but the basic idea is to split the video into a low-rank part (the background) and a sparse part, from which the moving objects are extracted; the incremental version does this frame by frame.
4. Optical flow: you measure change across the temporal dimension of the video. For example, you compare frame 2 with frame 1 block by block and estimate the direction of change.
5. CNN-based methods: I know there are a bunch of them, but I haven't followed them closely, so you might have to do some research. As far as I know, they often perform better than the methods above.
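Method 1 can be sketched as an exponential running average (the same kind of update OpenCV's cv2.accumulateWeighted performs), which avoids storing N frames. A minimal NumPy version; the class name and the alpha and threshold values are assumptions, not tuned settings:

```python
import numpy as np


class RunningAverageBackground:
    """Background = exponential running average of past grayscale frames."""

    def __init__(self, alpha=0.05, threshold=30):
        self.alpha = alpha          # assumed adaptation rate (0..1)
        self.threshold = threshold  # assumed grey-level difference for foreground
        self.background = None

    def apply(self, frame):
        """Return a boolean foreground mask, then update the background model."""
        f = frame.astype(np.float64)
        if self.background is None:
            # First frame: use it as the initial background, report no foreground
            self.background = f.copy()
            return np.zeros(f.shape, dtype=bool)
        mask = np.abs(f - self.background) > self.threshold
        # Blend the new frame into the background estimate
        self.background = (1 - self.alpha) * self.background + self.alpha * f
        return mask
```

Because the background keeps adapting, an object that stops moving is gradually blended into the background and disappears from the mask, which is exactly the "object staying at a location for a long time" weakness mentioned for this method.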
Note that at 30 fps your code must finish in about 33 ms per frame to run in real time. You can find a lot of existing code for this task.
There are a handful of ways you could do this.
The first that comes to mind is doing a 2D FFT on the incoming images. Color shouldn't affect the FFT too much, but an object moving within, entering, or exiting the frame will.
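A minimal sketch of the FFT idea, assuming that "color shouldn't affect it" mainly refers to uniform brightness changes: dropping the DC term and normalizing the magnitude spectrum makes a uniform brightness offset or gain leave the signature unchanged, while new content changes it. The function names are hypothetical, and any cutoff for "changed" would have to be chosen from your own footage.

```python
import numpy as np


def spectrum_signature(frame):
    """Normalized 2-D FFT magnitude of a grayscale frame, DC term removed.

    Removing DC ignores a brightness offset; normalizing ignores a gain.
    """
    mag = np.abs(np.fft.fft2(frame.astype(np.float64)))
    mag[0, 0] = 0.0  # drop the mean-brightness (DC) component
    norm = np.linalg.norm(mag)
    return mag / norm if norm > 0 else mag


def spectrum_change(prev, cur):
    """Distance between signatures: about 0 for an unchanged scene."""
    return float(np.linalg.norm(spectrum_signature(cur) - spectrum_signature(prev)))
```

Because only the magnitude is compared, a pure circular shift of the whole image goes unnoticed; the comparison mainly reacts to content appearing, disappearing, or changing shape.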
The second is to use SIFT or SURF to generate a list of features in an image. You can insert these points into a map, sorted however you like, then do a set_difference between the last image you took and the current image. You could also use the FLANN functionality to compare the generated features.
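The set-difference idea can be sketched in Python with built-in sets in place of C++ std::set_difference. Keypoint positions are snapped to a coarse grid (the 4-pixel cell size is an assumption) so that sub-pixel jitter between frames does not count as change. With OpenCV 4.4 or later, the points could come from [kp.pt for kp in cv2.SIFT_create().detect(gray, None)]; the helper names below are hypothetical.

```python
def quantize(points, cell=4):
    """Map (x, y) keypoint positions to a set of grid cells.

    cell is an assumed grid size in pixels; it absorbs small jitter.
    """
    return {(int(x // cell), int(y // cell)) for x, y in points}


def new_and_missing(prev_points, cur_points, cell=4):
    """Grid cells whose keypoints appeared or disappeared between two frames."""
    prev_cells = quantize(prev_points, cell)
    cur_cells = quantize(cur_points, cell)
    appeared = cur_cells - prev_cells      # set difference, like std::set_difference
    disappeared = prev_cells - cur_cells
    return appeared, disappeared
```

A large number of appeared or disappeared cells (the cutoff is yours to choose) would then indicate that something entered or left the scene.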
I thought about tackling a new project in which I use the TensorFlow Object Detection API to detect Euro pallets (e.g. pic).
My ultimate goal is to know how far away I am from the pallet and what my position is relative to it. So I thought about first detecting the Euro pallet in the RGB feed from a Kinect camera and then using the camera's depth data to get the distance to the pallet.
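Once the detector returns a bounding box, a rough distance and left/right bearing could be read from a depth image registered to the RGB frame. This is a sketch assuming depth in metres, a box given as (x_min, y_min, x_max, y_max) in pixels, and a pinhole camera model; the function name is hypothetical, and the 57° horizontal field of view is roughly the Kinect v1 figure, so replace it with your sensor's calibrated intrinsics.

```python
import math

import numpy as np


def pallet_distance_and_bearing(depth_m, box, horizontal_fov_deg=57.0):
    """Median depth inside a detection box and horizontal angle to its centre.

    depth_m: 2-D array of depths in metres aligned with the RGB image
             (0 or NaN where the sensor has no reading).
    box: (x_min, y_min, x_max, y_max) in pixels, e.g. from the detector.
    horizontal_fov_deg: ASSUMED field of view; use your camera's calibration.
    """
    x0, y0, x1, y1 = box
    patch = depth_m[y0:y1, x0:x1].astype(np.float64)
    valid = patch[np.isfinite(patch) & (patch > 0)]
    if valid.size == 0:
        return None  # no depth reading inside the box
    # Median tolerates holes and some background pixels if the pallet fills most of the box
    distance = float(np.median(valid))

    width = depth_m.shape[1]
    # Pinhole model: focal length in pixels derived from the horizontal FOV
    fx = (width / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
    cx = (x0 + x1) / 2
    bearing_deg = math.degrees(math.atan((cx - width / 2) / fx))  # + means right of centre
    return distance, bearing_deg
```

This gives distance and direction only, not the pallet's own orientation relative to you, which is the harder part of the question.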
But how do I determine my position relative to the pallet? I could create different classes, for example "front view, lying pallet", "side view, lying pallet", etc., but I think I'd need quite a few pictures per class for that to be accurate. Maybe 200 for each class?
My guess is that no such labeled datasets exist yet, and creating one myself would be quite a pain.
Another option I could think of: if I label my pallets with segmentation masks instead of bounding boxes, is there maybe another way to find out my position relative to the pallet? I have never done semantic segmentation labeling myself; can anyone recommend good programs for it?
I'm hoping someone can help point me in the right direction. Any help would be appreciated.
Some ideas: assuming detection and segmentation with classifier(s) work, one could then try feature detection such as edges/lines to obtain clues about the pallet's orientation (bounding box).
Of course this will be tricky for simple feature detection because of very different surfaces (wood, dirt), backgrounds and lighting.
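If a segmentation mask is available, one cheap orientation clue is the direction of the mask's main axis in the image, found by principal component analysis of its pixel coordinates. This is a swapped-in technique rather than the edge/line detection suggested above; the sketch assumes a 2-D boolean NumPy mask, and the function name is hypothetical.

```python
import math

import numpy as np


def mask_orientation_deg(mask):
    """Angle of the main axis of a binary mask, in degrees from the image x-axis.

    PCA of the foreground pixel coordinates. Image y grows downwards, so
    positive angles point down-right. Result is folded into (-90, 90].
    """
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys]).astype(np.float64)   # shape (2, N)
    coords -= coords.mean(axis=1, keepdims=True)     # centre the point cloud
    cov = coords @ coords.T / coords.shape[1]        # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    major = eigvecs[:, np.argmax(eigvals)]           # direction of largest spread
    angle = math.degrees(math.atan2(major[1], major[0]))
    # An axis has no preferred direction: fold the angle into (-90, 90]
    if angle <= -90:
        angle += 180
    elif angle > 90:
        angle -= 180
    return angle
```

This is only the in-image angle of the silhouette, not the pallet's 3D orientation, and it becomes ambiguous for nearly square masks.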
Also, "markerless tracking" (a topic in augmented reality) and "bin picking" (actually applied in the automation industry) may be keywords for similar problems, although you are probably not starting with an unordered pile of pallets.
I have a pygame program with a face in the center. What I want the program to do is show a bunch of objects on the screen, all irregular. Some would be circles, others would be cut-out pictures of objects like surfboards, chairs, bananas, etc. The user would be able to drag the objects around, and they'd collide with each other and with the face in the center, so they'd be unable to pass through them. Could anyone show me how I would do this? Thanks!
Edit: by "unable to pass through" I mean they'd move along the edge of the other object while trying to follow the mouse.
What you are looking for is functionality usually provided by a so-called physics engine. For very basic shapes, it is simple enough to code the basic functionality yourself. (The simplest case for 2D shapes is collision detection between circles.)
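For the circle case, the math fits in a few lines of plain Python. This sketch (the helper names are hypothetical, and shapes are assumed to be a centre tuple plus a radius) detects overlap and then pushes the moving circle back out along the line between the centres. Doing that every frame after moving the dragged circle toward the mouse makes it slide around obstacles, which is the behaviour described in the edit.

```python
import math


def circles_collide(p1, r1, p2, r2):
    """True if two circles (centre, radius) overlap; touching does not count."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    return dx * dx + dy * dy < (r1 + r2) ** 2  # compare squared lengths, no sqrt


def resolve(moving, r_moving, obstacle, r_obstacle):
    """Push the moving circle out of the obstacle along the line between centres.

    Returns the corrected centre of the moving circle.
    """
    dx, dy = moving[0] - obstacle[0], moving[1] - obstacle[1]
    dist = math.hypot(dx, dy)
    min_dist = r_moving + r_obstacle
    if dist >= min_dist:
        return moving  # no overlap, nothing to correct
    if dist == 0:
        dx, dy, dist = 1.0, 0.0, 1.0  # centres coincide: pick an arbitrary direction
    scale = min_dist / dist
    return (obstacle[0] + dx * scale, obstacle[1] + dy * scale)
```

For the irregular cut-out images, pygame.mask.from_surface together with Mask.overlap can test pixel-level overlap, but working out how to push such shapes apart is considerably harder, which is where a physics engine earns its keep.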
Collision detection gets pretty hard pretty quickly, especially if you want to do it at a reasonably fast rate (such as you would need for the sort of project you are describing) and also especially if you are dealing with arbitrary, non-regular shapes (which your description seems to indicate). So, unless you are interested in learning how to code an optimized collision detection system, I suggest you google for python physics engines. I have never used any, so I can't personally recommend one.
Good luck!
So I've been making a game using Python, specifically the PyGame module. Everything has been going fairly well (except Python's speed, am I right :P), and I've got a nice list of accomplishments from this, but I just ran into a... speedbump. Maybe a mountain. I'm not too sure yet. The problem is:
How do I go about implementing a Camera with my current engine?
That probably means nothing to you, though, so let me explain what my current engine does: I have a spritesheet that I use for all images. The map is made up of a 2D array of Tile objects, which fills the display (800 x 640). The map also contains references to all Entities and Particles. Now I want to create a camera so that the map can be larger than the display. To do this, I've figured I'll need some kind of camera that follows the player (with the player at the center of the screen). I've seen this implemented in games before and have read a few similar posts, but I also need to know: will I have to restructure all my game code to work this in? My first attempt was to make all objects move on the screen when the player moves, but I feel there is a better way to do this, since that approach screws up collision detection and such.
So, if anyone knows any good references to problems like this, or a way to fix it, I'm all ears... er.. eyes.
Thanks
You may find this link to be of interest.
In essence, what you need to do is to distinguish between the "actual" coordinates, and the "display" coordinates of each object.
What you would do is do the bulk of the work using the actual coordinates of each entity in your game. If it helps, imagine that you have a gigantic screen that can show everything at once, and calculate everything as normal. It might help if you also designed the camera to be an entity, so that you can update the position of your camera just like any other object.
Once everything is updated, you go to the camera object, and determine what tiles, objects, particles, etc. are visible within the window, and convert their actual, world coordinates to the pixel coordinates you need to display them correctly.
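A minimal sketch of that separation in plain Python, assuming integer world coordinates in pixels, a world at least as large as the screen, and square tiles. The Camera class and its methods are hypothetical names, not a pygame API.

```python
class Camera:
    """Converts world coordinates to screen coordinates, centred on a target."""

    def __init__(self, screen_w, screen_h, world_w, world_h):
        self.screen_w, self.screen_h = screen_w, screen_h
        self.world_w, self.world_h = world_w, world_h
        self.x = self.y = 0  # top-left corner of the view, in world coordinates

    def follow(self, target_x, target_y):
        """Centre the view on the target, clamped so it never leaves the map.

        Assumes the world is at least as large as the screen.
        """
        self.x = min(max(target_x - self.screen_w // 2, 0), self.world_w - self.screen_w)
        self.y = min(max(target_y - self.screen_h // 2, 0), self.world_h - self.screen_h)

    def world_to_screen(self, wx, wy):
        """Pixel position on screen for a point given in world coordinates."""
        return wx - self.x, wy - self.y

    def visible_tiles(self, tile_size):
        """Ranges of tile columns and rows that intersect the current view."""
        first_col, first_row = self.x // tile_size, self.y // tile_size
        last_col = (self.x + self.screen_w - 1) // tile_size
        last_row = (self.y + self.screen_h - 1) // tile_size
        return range(first_col, last_col + 1), range(first_row, last_row + 1)
```

In the draw loop you would then blit each entity at camera.world_to_screen(entity.x, entity.y) (assuming entities store their world x and y) and only iterate over the tile columns and rows returned by visible_tiles, leaving the world coordinates used by physics and collision untouched.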
If this is done correctly, you can also do things like scale and otherwise modify the image your camera is displaying without affecting gameplay.
In essence, you want to have a very clear distinction between gameplay and physics logic/code, and your rendering/display code, so your game can do whatever it wants, and you can render it however you want, with minimal crossover between the two.
So the good news is, you probably don't need to change anything about how your game itself works. The bad news is, you'll probably have to go in and rewrite your rendering/drawing code so that everything is drawn relative to the camera, not to the world.
Since I can't have a look into your code, I can't assess how useful this answer will be for you.
My approach for side scrollers, movable maps, etc. is to blit all tiles onto a pygame.Surface spanning the dimensions of the whole level/map/etc., or at least a big chunk of it. This way I only have to blit one already-prepared surface per frame.
For collision detection I keep the x/y values (not the entire rect) of the tiles involved in a separate list. Updating is then mainly shifting numbers around and not surfaces anymore.
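A sketch of the collision side of that approach, assuming solid tiles are stored as (column, row) grid positions in a set (a slight variation on the separate list mentioned above, chosen for fast lookups) and the player is an (x, y, w, h) rectangle in world pixels. Drawing would then be a single screen.blit(level_surface, (-camera_x, -camera_y)), where level_surface and the camera offsets are hypothetical names.

```python
def overlapping_tiles(player, solid_tiles, tile_size):
    """Solid tiles whose squares overlap the player's rectangle.

    player: (x, y, w, h) in world pixels (integers assumed).
    solid_tiles: set of (col, row) grid positions of solid tiles, kept
                 separately from the pre-rendered map surface.
    """
    x, y, w, h = player
    hits = []
    # Only the grid cells under the player's rectangle need checking
    for col in range(x // tile_size, (x + w - 1) // tile_size + 1):
        for row in range(y // tile_size, (y + h - 1) // tile_size + 1):
            if (col, row) in solid_tiles:
                hits.append((col, row))
    return hits
```

Since the check only touches a handful of grid cells and plain numbers, it stays cheap no matter how large the pre-rendered level surface is.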
Feel free to ask for more details, if you deem it useful :)