What does real-time object detection really mean?

So here is the context.
I wrote a Python script using YOLOv4, OpenCV, CUDA and cuDNN for object detection and object tracking, to count the objects in a video. I intend to use it in real time, but what does real time really mean? The video I'm using is 1 min long and originally 60 FPS, but the video after processing is 30 FPS on average and takes 3 min to finish. So comparing both videos side by side, one is clearly faster. 30 FPS is the industry standard for movies and such. I'm trying to wrap my head around what real time truly means.
Imagine I need to use this information for traffic-light management, or to lift a bridge for a passing boat; it should be done automatically. It's time-sensitive, or the chaos would be visible. In these cases, what does it truly mean to be real time?

First, learn what "real-time" means. Wikipedia: https://en.wikipedia.org/wiki/Real-time_computing
Understand the terms "hard" and "soft" real-time. Understand which aspects of your environment are soft and which require hard real-time.
Understand the response times that your environment requires. Understand the time scales.
This does not involve fuzzy terms like "quick" or "significant" or "accurate". It involves actual quantifiable time spans that depend on your task and its environment, acceptable error rates, ...
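To put numbers on the question's own video, here is a small sketch. It assumes the 3 minutes is the wall-clock time needed to process every frame of the 1-minute, 60 FPS clip; the function names are made up for illustration.

```python
def real_time_factor(footage_seconds, processing_seconds):
    """Processing time divided by footage time; 1.0 or less means the pipeline keeps up."""
    return processing_seconds / footage_seconds


def throughput_fps(footage_seconds, source_fps, processing_seconds):
    """Frames actually processed per wall-clock second."""
    total_frames = footage_seconds * source_fps
    return total_frames / processing_seconds


# Figures taken from the question: 60 s of 60 FPS footage, 180 s to process.
factor = real_time_factor(60, 180)
fps = throughput_fps(60, 60, 180)
print(f"real-time factor: {factor:.1f}x, throughput: {fps:.0f} frames/s")
```

A factor above 1 means a live feed would fall further behind every second unless frames are dropped, so the figure to compare against is the response time the task requires, not the frame rate of the output file.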
You did not share any details about your environment. I find it unlikely that you even need 30 fps for any application involving a road intersection.
You only need enough frame rate so you don't miss objects of interest, and you have fine enough data to track multiple objects with identity without mistaking them for each other.
Example: assume a car moving at 200 km/h. If your camera takes a frame every 1/30 second, the car moves 1.85 meters between frames.
How's your motion blur? What's the camera's exposure time? I'd recommend something on the order of a millisecond or better, giving motion blur of about 0.05 m.
How's your tracking? Can it deal with objects "jumping" that far between frames? Does it generate object identity information that is usable for matching (association)?
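The arithmetic behind those figures can be sketched as follows. The 200 km/h, 1/30 s and 1 ms values come from the example above; the 50 km/h and 1 m values in the last call are purely illustrative assumptions, as are the function names.

```python
def metres_per_frame(speed_kmh, fps):
    """Distance an object travels between two consecutive frames."""
    return speed_kmh / 3.6 / fps


def motion_blur_m(speed_kmh, exposure_s):
    """Distance an object travels while the shutter is open."""
    return speed_kmh / 3.6 * exposure_s


def min_fps_for_jump(speed_kmh, max_jump_m):
    """Lowest frame rate that keeps the per-frame displacement under max_jump_m."""
    return speed_kmh / 3.6 / max_jump_m


jump = metres_per_frame(200, 30)       # about 1.85 m between frames
blur = motion_blur_m(200, 0.001)       # about 0.056 m at 1 ms exposure
needed = min_fps_for_jump(50, 1.0)     # assumed: 50 km/h traffic, tracker tolerates 1 m jumps
print(f"jump {jump:.2f} m, blur {blur:.3f} m, minimum {needed:.1f} FPS")
```

Working backwards like this from the fastest object and the largest jump your tracker can associate gives a required frame rate for your scene, rather than an assumed 30 FPS.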

Related

OpenCV Object Tracking accurate enough to measure speed of conveyor precisely?

I am building a conveyor speed tracking system with only vision.
The basic concept of the project is to calculate the conveyor's speed by looking at the surface of the conveyor with a camera.
[What I Tried]
So far, I have tried OpenCV's object tracking algorithms to track each section, but they don't seem accurate enough to calculate the speed. To smooth out the speed variance, I created many tracker instances and averaged their speeds. However, even the average is not consistent. For the algorithm, I used MOSSE because it is fast. When I use other algorithms such as KCF or CSRT, processing is too slow for real-time speed tracking. I haven't tried GOTURN yet, as it requires a trained model.
Is object tracking in OpenCV not accurate enough, or is it just a problem in my algorithm?
Also, do you have any suggestions on how to precisely calculate the conveyor's speed using only a vision system?
Any help is greatly appreciated.

SUMO Simulation. Detecting high Traffic and reducing the speed limit

I am learning SUMO from the beginning, and I have read and worked through most of the tutorials from: http://sumo.dlr.de/wiki/Tutorials . What I want to do now is make cars slow down when there is traffic on a road. I only know how to change the speed limit after a certain time, from here: http://sumo.dlr.de/wiki/Simulation/Variable_Speed_Signs . Do you know how I can change the speed limit when there is traffic? I think that changing the value of the speed signs is the best idea here, but I don't know how to do it.

There is no such thing as event-triggered speed signs in SUMO, so you probably need to do that via TraCI. The easiest way is probably to start with the TraCI tutorial for traffic lights and then use functions like traci.edge.getLastStepOccupancy to find out whether vehicles are on the edge in question and then traci.edge.setMaxSpeed to set a new speed.
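A minimal sketch of that control logic is below. The threshold, the two speed limits and the edge ID are assumptions chosen for illustration, and you should check which units getLastStepOccupancy reports in your SUMO version before picking a threshold. The stub class only stands in for traci.edge so the sketch runs without SUMO.

```python
def choose_speed_limit(occupancy, normal_limit, reduced_limit, threshold):
    """Return the reduced limit when the edge is busier than the threshold."""
    return reduced_limit if occupancy >= threshold else normal_limit


def control_step(edge_api, edge_id, normal_limit=13.9, reduced_limit=8.3, threshold=30.0):
    """Read the occupancy of one edge and set its speed limit (m/s) accordingly.

    edge_api is expected to be traci.edge, or any object with the same two methods.
    """
    occupancy = edge_api.getLastStepOccupancy(edge_id)
    limit = choose_speed_limit(occupancy, normal_limit, reduced_limit, threshold)
    edge_api.setMaxSpeed(edge_id, limit)
    return limit


class StubEdge:
    """Stand-in for traci.edge, used only to exercise the logic offline."""

    def __init__(self, occupancy):
        self.occupancy = occupancy
        self.limits = {}

    def getLastStepOccupancy(self, edge_id):
        return self.occupancy

    def setMaxSpeed(self, edge_id, speed):
        self.limits[edge_id] = speed


busy = StubEdge(55.0)
quiet = StubEdge(5.0)
busy_limit = control_step(busy, "myEdge")    # "myEdge" is a hypothetical edge ID
quiet_limit = control_step(quiet, "myEdge")
print(busy_limit, quiet_limit)
```

In a running TraCI session you would call control_step(traci.edge, ...) once after each traci.simulationStep(), as in the traffic-light tutorial's control loop.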

How to figure out multilateration with xyz positions of each post and difference in time?

I'm having some issues figuring out multilateration. I'll start by saying I'm not a math whiz, but I am usually able to figure most things out, but this one has confused me. I got to this point after reading up on Time Difference of Arrival.
I have four wifi adapters. Each one takes a point in a three sided pyramid, so this should be able to take height into account, I believe. The relative positions to each other are fixed as well.
What I'm attempting to do is listen for wifi signals and find their origin. In theory, I believe I should be able to use the difference in time between each wifi adapter "hearing" a packet to find the origin of the packet.
I've paired a GPS into this. It allows me to give each wifi adapter an actual position (with a little math).
So here's what I have when I receive a packet:
wlan1 (X, Y, Z, timestamp)
wlan2 (X, Y, Z, timestamp)
wlan3 (X, Y, Z, timestamp)
wlan4 (X, Y, Z, timestamp)
X and Y are lat/lng. Z is the altitude in meters, and the timestamp is reflecting microseconds.
Some assumptions to make are that the XYZ are accurate. For all practical purposes, if they're off, then they're all consistently off, which should be reflected in finding the source.
I haven't been able to figure out how to apply any math to this, and am seeking an example. I can provide some actual data if necessary. The end goal is working on a robotics project that'll let a robot follow you, or more accurately your cell phone. The reason I'm taking this approach is that it lets me log things in a way that in the end should be extremely easy to debug visually on a Google Map.
I believe that by taking a difference in time from each point and comparing it across the adapters, I should be able to have a somewhat accurate shot at the origin location, but this math is just too far beyond me right now.
I have cross-posted this question to the Mathematics site.

There are various algorithms for this. I found a simple paper here that looks helpful, but there are also more advanced least-squares algorithms in various journals.
Just as a warning, multilateration is very sensitive to position errors of the sensors and errors in the time difference of arrival. So your results might not be particularly good -- you've said your clocks are not synchronized (they need to be) and that you are using GPS for location (which has a ±3 m error). For what it's worth, you can use GPS for time too, but I'm not sure of the error on that.
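To give an idea of what a least-squares formulation looks like, here is a rough sketch of the cost such a solver would minimize. It is not taken from the paper; it assumes the sensor positions have already been converted from lat/lng to a local Cartesian frame in metres, that the clocks are synchronized, and that sensor 0 is the reference. The sensor layout and source position are invented test values.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def tdoa_residuals(candidate, sensors, arrival_times, c=SPEED_OF_LIGHT):
    """Measured minus predicted range differences (metres), relative to sensor 0."""
    d0 = math.dist(candidate, sensors[0])
    residuals = []
    for position, t in zip(sensors[1:], arrival_times[1:]):
        measured = c * (t - arrival_times[0])
        predicted = math.dist(candidate, position) - d0
        residuals.append(measured - predicted)
    return residuals


def cost(candidate, sensors, arrival_times):
    """Sum of squared residuals; a solver searches for the candidate that minimizes it."""
    return sum(r * r for r in tdoa_residuals(candidate, sensors, arrival_times))


# Hypothetical pyramid of four receivers and a source a few metres away.
sensors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.87, 0.0), (0.5, 0.29, 0.82)]
source = (4.0, 3.0, 1.0)
arrival_times = [math.dist(source, s) / SPEED_OF_LIGHT for s in sensors]

cost_at_source = cost(source, sensors, arrival_times)
cost_elsewhere = cost((-4.0, 3.0, 1.0), sensors, arrival_times)
print(cost_at_source, cost_elsewhere)
```

With ideal timestamps the cost is essentially zero at the true position and larger elsewhere; a general-purpose minimizer such as scipy.optimize.least_squares could be pointed at tdoa_residuals. With real, noisy timestamps the minimum gets much shallower, which is where the sensitivity mentioned above shows up.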

A couple of (unfortunately negative) points:
If your timestamps are computed when the signal hits the antennas then all you'll be able to work out is the direction to the source and not the distance. After all, a signal that comes from a million miles away will have the same propagation delay between two antennas as one that comes from a meter away.
Unless your robot is very large I would be surprised if the deltas between the timestamps were not completely dominated by factors other than signal propagation delay. EM radiation goes quite quickly, so there is very little room for error. For example:
the wifi adapters will have some kind of onboard processing firmware - how quickly does it report new signals? Is the delay constant or does it depend on arcane details of the 802.11 spec? Will you be notified of the arrival of a signal, or the arrival of a complete packet which may have been the result of a whole series of acks and retransmissions?
Your device is linked to the adapters via some kind of IO bus - Even if we assume that the adapters are perfect there's going to be contention on this bus when a new pulse is received - which adapter wins and gets processed first?
Your device may have a single-core CPU - how quickly can a signal from an adapter be processed and given a timestamp? The delay between events will determine the fidelity of your timestamps, and thus the maximum accuracy of the system.
Is the device completely to-the-metal dedicated to putting timestamps on signals, or is there other software running too? What if some other event pre-empts your signal processing?
If you're in an indoor environment you will get indirect propagation - assuming that the system itself is perfect, how do you detect the case where the signal detected on one adapter took a longer path by bouncing off a wall or two?
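To see how little room for error there is, compare the timestamp resolution from the question (microseconds) with the largest time difference the antenna spacing can produce. The 1 m spacing below is an assumption, since the question doesn't give the actual size of the pyramid.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def distance_per_tick(tick_seconds):
    """How far a radio signal travels during one timestamp tick."""
    return SPEED_OF_LIGHT * tick_seconds


def max_time_difference(baseline_m):
    """Largest possible arrival-time difference between two antennas baseline_m apart."""
    return baseline_m / SPEED_OF_LIGHT


metres_per_microsecond = distance_per_tick(1e-6)
max_delta_ns = max_time_difference(1.0) * 1e9  # assumed 1 m between adapters

print(f"one microsecond tick = {metres_per_microsecond:.1f} m of travel")
print(f"largest difference across 1 m = {max_delta_ns:.2f} ns")
```

A single microsecond tick corresponds to roughly 300 m of travel, while the whole signal you are trying to measure across a 1 m baseline is a few nanoseconds, so the delays listed above would swamp it by orders of magnitude.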

image/video processing options

I have a small 12 volt board camera that is placed inside a bee hive. It is lit with infrared LEDs (bees can't see infrared). It sends a simple NTSC signal along a wire to a little TV monitor I have. This allows me to see the inside of the hive, without disturbing the bees.
The queen has a dot on her back such that it is very obvious when she's in the frame.
I would like to have something processing the signal such that it registers when the queen is in the frame. This doesn't have to be a very accurate count. Instead of processing the video, it would be just as fine to take an image every 10 seconds and see if there is a certain amount of brightness (indicating that the queen is in frame).
This is useful since it helps bee keepers know if the queen is alive (if she didn't appear for a number of days it could mean something is wrong).
I would love to hear suggestions for inexpensive ways of processing this video, especially with low power consumption. Raspberry pi? Arduino?
Camera example:
here
Sample video (no queen in frame):
here

First off, great project. I wish I was working on something this fun.
The obvious solution here is OpenCV, which will run on both Raspberry Pi (Linux) and the Android platform but not on an Arduino as far as I know. (Of the two, I'd go with Raspberry Pi to start with, since it will be less particular in how you do the programming.)
As you describe it, you may be able to get away with less robust image processing tools, but these problems are rarely as easy as they seem at first. For example, it seems to me that the brightest spot in the video is (what I guess to be) the illuminating diode reflecting off the glass. But if it's not this it will be something else, so don't start the project with your hands tied behind your back. And if this can't be done with OpenCV, it probably can't be done at all.
Raspberry Pi computers are about $50, OpenCV is free, so I doubt you'll get much cheaper than this.
In case you haven't done something like this before, I'd recommend not programming OpenCV directly in C++ for something that's exploratory like this, and not very demanding either. Instead, use, for example, the Python bindings so you can explore the images interactively.
You also asked about Arduino, and I don't think this is such a good choice for this type of project. First, you'd need extra hardware, like a video shield (e.g., http://nootropicdesign.com/ve/), adding to the expense. Second, there aren't good image processing libraries for the Arduino, so you'd be doing everything from scratch. Third, generally speaking, debugging a microcontroller program is more difficult.

I don't have a good answer about image processing, but I know how to make it much easier. When you mark the queen, throw some retro-reflecting beads on the paint to get a much higher light return.
I think you can simply mix the beads in with your paint -- use 1 part beads to 3 parts paint by volume. That said, I think you'll get better results if you pour beads onto the surface of the wet paint when marking the queen. I'd pour a lot of beads on to ensure some stick (you can do it over a bowl or bag to catch all the extra beads).
I suggest doing some tests before marking the queen -- I've never applied beads before, but I've worked with retroreflective tape and paint, and it will give you a significantly higher light return. How much higher strongly depends (i.e. I don't have a number) but I'm guessing at least 2-5 times more light -- enough that your camera will saturate when it sees the queen with current exposure settings. If you set a trigger on saturation of some threshold number of pixels (making sure few pixels saturate normally) this should give you a very good signal to noise ratio that will vastly simplify image processing.
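The saturation trigger could be as simple as the sketch below. It assumes an 8-bit grayscale frame given as rows of pixel values; the level of 250 and the minimum of 20 pixels are placeholder values you would tune on real footage, and the frames here are synthetic.

```python
def count_saturated(frame, level=250):
    """Number of pixels at or above the saturation level."""
    return sum(1 for row in frame for pixel in row if pixel >= level)


def queen_in_frame(frame, level=250, min_pixels=20):
    """Trigger when enough pixels saturate at once."""
    return count_saturated(frame, level) >= min_pixels


# Synthetic frames: a dim background, and the same frame with a 6x6 bright dot.
background = [[40] * 64 for _ in range(48)]
with_queen = [row[:] for row in background]
for y in range(10, 16):
    for x in range(20, 26):
        with_queen[y][x] = 255

print(queen_in_frame(background), queen_in_frame(with_queen))
```

The same count is a one-liner on the arrays OpenCV's Python bindings return, so this logic is cheap enough for one frame every 10 seconds on a Raspberry Pi.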
[EDIT]
I did a little more digging, and there are some important parameters to consider. First, at an index of 1.5 (the beads I'd linked before) the beads won't focus light on the back surface and retro-reflect, they'll just act like lenses. They'll probably sparkle and reflect a bit, but you might be better off just adding glitter to the paint.
You can get VERY highly reflective tape that has the right kind of beads AND has a reflective coating on the back of the beads to reflect vastly more light! You'll have to figure out how to glue a bit of tape to a queen to use it, but it might be the best reflection you can get.
http://www.amazon.com/3M-198-Scotch-Reflective-Silver/dp/B00004Z49Q
You can also try the beads I recommended earlier with an index of refraction of 1.5. I'd be sure to test it on paper against glitter to make sure you're not wasting your time.
http://www.colesafety.com/Reflective-Powder-Glass-Beads-GSB10Powder.htm
I'm having trouble finding a source for 1lb or less glass beads with 1.9+ refractive index. I'll do more searching and I'll let you know if I find a decent source of small quantities.

Recognising tone of the audio

I have a guitar and I need my pc to be able to tell what note is being played, recognizing the tone. Is it possible to do it in python, also is it possible with pygame? Being able of doing it in pygame would be very helpful.

To recognize the frequency of an audio signal, you would use the FFT (fast Fourier transform) algorithm. As far as I can tell, PyGame has no means to record audio, nor does it support the FFT transform.
First, you need to capture the raw sampled data from the sound card; this kind of data is called PCM (Pulse Code Modulation). The simplest way to capture audio in Python is using the PyAudio library (Python bindings to PortAudio). GStreamer can also do it, but it's probably overkill for your purposes. Capturing 16-bit samples at a rate of 48000 Hz is pretty typical and probably the best a normal sound card will give you.
Once you have raw PCM audio data, you can use the fftpack module from the scipy library to run the samples through the FFT transform. This will give you a frequency distribution of the analysed audio signal, i.e., how strong is the signal in certain frequency bands. Then, it's a matter of finding the frequency that has the strongest signal.
You might need some additional filtering to avoid picking up harmonic frequencies, but I am not sure.
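To illustrate the "find the strongest frequency" step without any dependencies, here is a sketch that uses a naive discrete Fourier transform in place of a real FFT. It produces the same spectrum but is far slower, so with real audio you would use the FFT routines in scipy or numpy instead. The sample rate, window length and 110 Hz test tone are made-up values.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest bin of a naive DFT, ignoring the DC bin."""
    n = len(samples)
    best_bin, best_magnitude = 0, 0.0
    for k in range(1, n // 2):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        if abs(total) > best_magnitude:
            best_bin, best_magnitude = k, abs(total)
    return best_bin * sample_rate / n


# Synthetic test tone: 110 Hz (A2) sampled at 2000 Hz for 400 samples (5 Hz bins).
sample_rate = 2000
n = 400
tone = [math.sin(2 * math.pi * 110.0 * t / sample_rate) for t in range(n)]

peak = dominant_frequency(tone, sample_rate)
print(peak)
```

The result can only land on a multiple of sample_rate / n, so the window length sets how finely you can tell neighbouring pitches apart.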

I once wrote a utility that does exactly that - it analyses what sounds are being played.
You can look at the code here (or you can download the whole project; it's integrated with Frets On Fire, an open-source Guitar Hero clone, to create a real guitar hero). It was tested using a guitar, a harmonica and whistles :) The code is ugly, but it works :)
I used pymedia to record, and scipy for the FFT.
Beyond the basics that others already noted, I can give you some tips:
If you record from mic, there is a lot of noise. You'll have to use a lot of trial-and-error to set thresholds and sound clean up methods to get it working. One possible solution is to use an electric guitar, and plug its output to the audio-in. This worked best for me.
Specifically, there is a lot of noise around 50Hz. That's not so bad, but its overtones (see below) are at 100 Hz and 150 Hz, and that's close to guitar's G2 and D3.... As I said my solution was to switch to an electric guitar.
There is a tradeoff between speed of detection, and accuracy. The more samples you take, the longer it will take you to detect sounds, but you'll be more accurate detecting the exact pitch. If you really want to make a project out of this, you probably need to use several time scales.
When a tone is played, it has overtones. Sometimes, after a few seconds, the overtones might even be more powerful than the base tone. If you don't deal with this, your program will think it heard E2 for a few seconds, and then E3. To overcome this, I used a list of currently playing sounds, and then as long as this note, or one of its overtones, had energy in it, I assumed it's the same note being played....
It is specifically hard to detect when someone plays the same note 2 (or more) times in a row, because it's hard to distinguish between that, and random fluctuations of sound level. You'll see in my code that I had to use a constant that had to be configured to match the guitar used (apparently every guitar has its own pattern of power fluctuations).
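Two of the tips above can be checked with a bit of arithmetic. This sketch is not from my utility; it assumes standard tuning with A4 = 440 Hz, and the function names are only for illustration. It maps a frequency to the nearest note name and estimates how long a window you need to tell two low notes apart.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(frequency_hz, a4_hz=440.0):
    """Nearest equal-tempered note, using MIDI numbering (A4 = 69, C4 = 60)."""
    midi = round(69 + 12 * math.log2(frequency_hz / a4_hz))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def bin_width_hz(sample_rate, window_samples):
    """Spacing between neighbouring frequency bins of an FFT over the window."""
    return sample_rate / window_samples


def window_seconds_for_resolution(resolution_hz):
    """Rough window length needed for bins no wider than resolution_hz."""
    return 1.0 / resolution_hz


# Overtones of 50 Hz mains hum land on guitar notes.
hum_notes = [note_name(100.0), note_name(150.0)]

# E2 (82.41 Hz) and F2 (87.31 Hz) are one fret apart on the low string.
low_gap_hz = 87.31 - 82.41
window_s = window_seconds_for_resolution(low_gap_hz)

print(hum_notes, f"{window_s:.2f} s")
```

Telling E2 from F2 needs a window of roughly 0.2 s, while higher notes are spaced further apart and can be resolved with much shorter windows, which is one reason to use several time scales.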

You will need to use an audio library such as the built-in audioop (note that audioop was deprecated in Python 3.11 and removed in 3.13).
Analyzing the specific note being played is not trivial, but can be done using those APIs.
Also could be of use: http://wiki.python.org/moin/PythonInMusic

Very similar questions:
Audio Processing - Tone Recognition
Real time pitch detection
Real-time pitch detection using FFT
Turning sound into a sequence of notes is not an easy thing to do, especially with multiple notes at once. Read through Google results for "frequency estimation" and "note recognition".
I have some Python frequency estimation examples, but this is only a portion of what you need to solve to get notes from guitar recordings.

This link shows someone doing it in VB.NET, but the basics of what needs to be done to achieve your goal are captured in the links below.
STFT
Cooley-Tukey
FFT
