I've been given a video consisting of frames like this:
and I am supposed to count how many of fishes went in front of the screen during the video using Python. I am not allowed to use OpenCV or similar library. The only library I am allowed to use is PIL and Numpy, therefore I am forced to use numpy matrices as the image representation.
So far I am able to generate this one channel image mask:
which I believe should be enough to detect fishes on one frame. However now I need to somehow track those fishes throughout the remaining frames to distinguish those fishes who were already on the screen and those who moved in.
What would be the procedure to do that? I'm imagining somehow marking those areas and then predicting the movement of those fishes or something.
Related
Im trying to remove the differences between two frames and keep the non-chaning graphics. Would probably repeat the same process with more frames to get more accurate results. My idea is to simplify the frames removing things that won't need to simplify the rest of the process that will do after.
The different frames are coming from the same video so no need to deal with different sizes, orientation, etc. If the same graphic its in another frame but with a different orientation or scale, I would like to also remove it. For example:
Image 1
Image 2
Result (more or less, I suppose that will be uglier but containing a similar information)
One of the problems of this idea is that the source video, even if they are computer generated graphics, is compressed so its not that easy to identify if a change on the tonality of a pixel its actually a change or not.
Im ideally not looking at a pixel level and given the differences in saturation applied by the compression probably is not possible. Im looking for unchaged "objects" in the image. I want to extract the information layer shown on top of whats happening behind it.
During the last couple of days I have tried to achieve it in a Python script by using OpenCV with all kinds of combinations of absdiffs, subtracts, thresholds, equalizeHists, canny but so far haven't found the right implementation and would appreciate any guidance. How would you achieve it?
Im ideally not looking at a pixel level and given the differences in saturation applied by the compression probably is not possible. Im looking for unchaged "objects" in the image. I want to extract the information layer shown on top of whats happening behind it.
This will be extremely hard. You would need to employ proper CV and if you're not an expert in that field, you'll have really hard time.
How about this, forgetting about tooling and libs, you have two images, ie. two equally sized sequences of RGB pixels. Image A and Image B, and the output image R. Allocate output image R of the same size as A or B.
Run a single loop for every pixel, read pixel a and from A and pixel b from B. You get a 3-element (RGB) vector. Find distance between the two vectors, eg. magnitude of a vector (b-a), if this is less than some tolerance, write either a or b to the same offset into result image R. If not, write some default (background) color to R.
You can most likely do this with some HW accelerated way using OpenCV or some other library, but that's up to you to find a tool that does what you want.
I have a camera in a fixed position looking at a target and I want to detect whether someone walks in front of the target. The lighting in the scene can change so subtracting the new changed frame from the previous frame would therefore detect motion even though none has actually occurred. I have thought to compare the number of contours (obtained by using findContours() on a binary edge image obtained with canny and then getting size() of this) between the two frames as a big change here could denote movement while also being less sensitive to lighting changes, I am quite new to OpenCV and my implementations have not been successful so far. Is there a way I could make this work or will I have to just subtract the frames. I don't need to track the person, just detect whether they are in the scene.
I am a bit rusty but there are various ways to do this.
SIFT and SURF are very expensive operations, so I don't think you would want to use them.
There are a couple of 'background removal' methods.
Average removal: in this one you get the average of N frames, and consider it as BG. This is vulnerable to many things, light changes, shadow, moving object staying at a location for long time etc.
Gaussian Mixture Model: a bit more advanced than 1. Still vulnerable to a lot of things.
IncPCP (incremental principal component pursuit): I can't remember the algorithm totally but basic idea was they convert each frame to a sparse form, then extract the moving objects from sparse matrix.
Optical flow: you find the change across the temporal domain of a video. For example, you compare frame2 with frame1 block by block and tell the direction of change.
CNN based methods: I know there are a bunch of them, but I didn't really follow them. You might have to do some research. As far as I know, they often are better than the methods above.
Notice that, for a #30Fps, your code should complete in 33ms per frame, so it could be real time. You can find a lot of code available for this task.
There are a handful of ways you could do this.
The first that comes to mind is doing a 2D FFT on the incoming images. Color shouldn't affect the FFT too much, but an object moving, entering/exiting a frame will.
The second is to use SIFT or SURF to generate a list of features in an image, you can insert these points into a map, sorted however you like, then do a set_difference between the last image you took, and the current image that you have. You could also use the FLANN functionality to compare the generated features.
I have continuous videos taken from two cameras placed on up right and up left corners of my car's windshield (please note that they are not fixed to each other, and I aligned them approximately straight). Now I am trying to make a 3D point cloud out of that and have no idea how to do that. I surfed the internet a lot and still couldn't find any useful info. Can you send me some links or hints on how can I make that work in Python.
You can try the stereo matching and point cloud generation implementation in the OpenCV library. Start with this short Python sample.
I suppose that you have two independent video streams that are not exactly synchronized. You will have to synchronize them first, because the linked sample expects two images, not videos. Extract images from videos using OpenCV or ffmpeg and find an image pair that shares exactly the same timepoint (e.g. green appearing on a traffic light). Alternatively you can use the audio tracks for synchronization, see https://github.com/benkno/audio-offset-finder. Beware: synchronization based on a single frame pair or a short audio excerpt will probably work only for few minutes before and after the synchronized timepoint.
I want to count number of people going up or down using a reference line let's say in the middle of that video. Now, How do I actually implement it using python and openCV.. I saw a lot of videos showing the people counter but no one has the method or instructions on how to exactly do that.. I don't need code.. Plz just tell me the method..
Btw here is something that i tried.. But this isn't working:
import cv2
Take a look at the detailed breakdown here, as pointed out in comments by leaf, but basically you can use the OpenCV2 built-in methods to perform pedestrian detection. OpenCV ships with a pre-trained HOG + Linear SVM model that can be used to perform pedestrian detection in both images and video streams.
To separate the Up & Down counters I would split each frame on the vertical line before running the detection on each half separately. You can count the number of people going in the given direction in a single frame by a simple len(contours) while processing that frames direction half.
To track the total number of people going in a given direction you will need to detect the motion of each contour across the frame and only add a new entry to the count when a new contour is created near the entry edge of the direction frame - of course this could be confused by people sprinting through the frame, moving the opposite direction to the expected running up the down or vice-versa and entering the frame then backing out.
I'm working on detecting license plates with openCV, python and a raspberry pi. Most of it is already covered. What I want is to detect the ROI (Region of interest) of the plate, track it on a few frames and add those together to get a more clear and crisp image of the plate.
I want to get a better image of the plate by taking the information from several frames. I detect the plate, and have a collection of plates from several frames, as many as I wish and as many as the car is moving by the camera. How can I take all those and get a better version?
You need to ensure that your frame rate is fast enough to get a decent still of the moving car. When filming, each frame will most likely be blurry, and our brain pieces together the number plate on playback. Of course a blurry frame is no good for letter recognition, so is something you'll need to deal with on the hardware side, rather than software side.
Remember the old saying: Garbage in; Garbage out.