image/video processing options - python

I have a small 12 volt board camera that is placed inside a bee hive. It is lit with infrared LEDs (bees can't see infrared). It sends a simple NTSC signal along a wire to a little TV monitor I have. This allows me to see the inside of the hive, without disturbing the bees.
The queen has a dot on her back such that it is very obvious when she's in the frame.
I would like to have something processing the signal such that it registers when the queen is in the frame. This doesn't have to be a very accurate count. Instead of processing the video, it would be just as fine to take an image every 10 seconds and see if there is a certain amount of brightness (indicating that the queen is in frame).
This is useful since it helps bee keepers know if the queen is alive (if she didn't appear for a number of days it could mean something is wrong).
I would love to hear suggestions for inexpensive ways of processing this video, especially with low power consumption. Raspberry pi? Arduino?
Camera example:
here
Sample video (no queen in frame):
here

First off, great project. I wish I was working on something this fun.
The obvious solution here is OpenCV, which will run on both Raspberry Pi (Linux) and the Android platform but not on an Arduino as far as I know. (Of the two, I'd go with Raspberry Pi to start with, since it will be less particular in how you do the programming.)
As you describe it, you may be able to get away with less robust image processing tools, but these problems are rarely as easy as they seem at first. For example, it seems to me that the brightest spot in the video is (what I guess to be) the illuminating diode reflecting off the glass. But if it's not this it will be something else, so don't start the project with your hands tied behind your back. And if this can't done with OpenCV, it probably can't be done at all.
Raspberry Pi computers are about $50, OpenCV is free, so I doubt you'll get much cheaper than this.
In case you haven't done something like this before, I'd recommend not programming OpenCV directly in C++ for something that's exploratory like this, and not very demanding either. Instead, use, for example, the Python bindings so you can explore the images interactively.
You also asked about Arduino, and I don't think this is such a good choice for this type of project. First, you'd need extra hardware, like a video shield (e.g., http://nootropicdesign.com/ve/), adding to the expense. Second, there aren't good image processing libraries for the Arduino, so you'd be doing everything from scratch. Third, generally speaking, debugging a microcontroller program is more difficult.

I don't have a good answer about image processing, but I know how to make it much easier. When you mark the queen, throw some retro-reflecting beads on the paint to get a much higher light return.
I think you can simply mix the beads in with your paint -- use 1 part beads to 3 parts paint by volume. That said, I think you'll get better results if you pour beads onto the surface of the wet paint when marking the queen. I'd pour a lot of beads on to ensure some stick (you can do it over a bowl or bag to catch all the extra beads.
I suggest doing some tests before marking the queen -- I've never applied beads before, but I've worked with retroreflective tape and paint, and it will give you a significantly higher light return. How much higher strongly depends (i.e. I don't have a number) but I'm guessing at least 2-5 times more light -- enough that your camera will saturate when it sees the queen with current exposure settings. If you set a trigger on saturation of some threshold number of pixels (making sure few pixels saturate normally) this should give you a very good signal to noise ratio that will vastly simplify image processing.to
[EDIT]
I did a little more digging, and there are some important parameters to consider. First, at an index of 1.5 (the beads I'd linked before) the beads won't focus light on the back surface and retro-reflect, they'll just act like lenses. They'll probably sparkle and reflect a bit, but you might be better off just adding glitter to the paint.
You can get VERY highly reflective tape that has the right kind of beads AND has a reflective coating on the back of the beads to reflect vastly more light! You'll have to figure out how to glue a bit of tape to a queen to use it, but it might be the best reflection you can get.
http://www.amazon.com/3M-198-Scotch-Reflective-Silver/dp/B00004Z49Q
You can also try the beads I recommended earlier with an index of refraction of 1.5. I'd be sure to test it on paper against glitter to make sure you're not wasting your time.
http://www.colesafety.com/Reflective-Powder-Glass-Beads-GSB10Powder.htm
I'm having trouble finding a source for 1lb or less glass beads with 1.9+ refractive index. I'll do more searching and I'll let you know if I find a decent source of small quantities.

Related

What does it really mean real time object detection?

So here is the context.
I created an script in python, YOLOv4, OpenCV, CUDA and CUDNN, for object detection and object tracking to count the objects in a video. I intend to use it in real time, but what real time really means? The video I'm using is 1min long and 60FPS originally, but the video after processing is 30FPS on average and takes 3mins to finish. So comparing both videos side by side, one is clearly faster. 30FPS is industry standard for movies and stuff. I'm trying to wrap my head around what real time truly means.
Imagine I need to use this information for traffic lights management or use this to lift a bridge for a passing boat, it should be done automatically. It's time sensitive or the chaos would be visible. In these cases, what it trully means to be real time?
First, learn what "real-time" means. Wikipedia: https://en.wikipedia.org/wiki/Real-time_computing
Understand the terms "hard" and "soft" real-time. Understand which aspects of your environment are soft and which require hard real-time.
Understand the response times that your environment requires. Understand the time scales.
This does not involve fuzzy terms like "quick" or "significant" or "accurate". It involves actual quantifiable time spans that depend on your task and its environment, acceptable error rates, ...
You did not share any details about your environment. I find it unlikely that you even need 30 fps for any application involving a road intersection.
You only need enough frame rate so you don't miss objects of interest, and you have fine enough data to track multiple objects with identity without mistaking them for each other.
Example: assume a car moving at 200 km/h. If your camera takes a frame every 1/30 second, the car moves 1.85 meters between frames.
How's your motion blur? What's the camera's exposure time? I'd recommend something on the order of a millisecond or better, giving motion blur of 0.05m
How's your tracking? Can it deal with objects "jumping" that far between frames? Does it generate object identity information that is usable for matching (association)?

Detecting transparency/glass clarity with openCV

I'm working on a project right now in which a key part relies on a glass slide being clean and fully transparent after having a film adhered to it.
By eye it is obvious when the the film has been well applied, since there are no cloudy sections or bubbles. Is there a good way to measure this 'clarity' with openCV?
I'm in the fortuante position that imaging of the glass can be carried out in a controlled environment; For example I can control the focus and anything that might appear behind the glass. I wondered if I could use this to try to pick out a known pattern and from that somehow determine clarity. Additionally, it may be helpful to be able to detect cracks in the glass, and I can control the lighting which may assist in illuminating fractures.
Has anyone done anything similar before, and could point me towards some reading or examples?

Qt button arrangement with fullscreen Camera view

im currently working on a small startup for some extra cash, im using qt 5.13 and my aim is to develop a small camera with functions like dimensional measurement based on the lense and height or edge detection and that sort of thing, these i will be developing in python with the use of opencv.
Anyways my question is this, before i dive in too deep to go back, is it possible to use qt, to run a (Pi)camera fullscreen, no edges and just have a small transparent button on a corner to be the settings? Like, this for sake of the UX, i wouldnt like to have borders or to need to cut screen size to add features.
In Qt, all cameras will be equal, so you can prototype it on your PC first, and it should work on RPi. Using QML, it should work just fine - it's a compositing framework that uses the GPU for composition, and RPi 4 has plenty enough GPU bandwidth to deal with it. QML supports semitransparent controls.
You may wish to see various augmented reality (AR) measurement applications available for iOS and Android (even just the Ruler included in iOS 12). You might be entering a crowded market. Those apps are not perfect, and there are simple cases that throw them off - like measuring the size of a window on a large flat wall on the side of a long but narrow room - there's too much bloom and not enough detail on the wall to have a stable depth reference, even on the best iPhone available.
If you can write software that is extremely robust, then you'll have a real market differentiator - but it won't generally be easy, and OpenCV is only a low-level building block. It's not unthinkable that you'll need some GPU-oriented computational framework instead (OpenCV provides some of it, but it's far from general).
Also, 99% of the UX will be the software, and that software should be very much portable by design, so investing anything in hardware before your software is good is a waste. Just as you suggest, a RPi 4 will do great for prototype hardware - but there's a catch that you may be limiting yourself unnecessarily by tying it all to a platform. There's so many platforms that settling on RPi when there's no market need for that is not sensible, I don't think.
You could use one of a multitude of WiFi battery-powered cameras with your PC: this will let you concentrate on the algorithms and functionality without having to mess with cross-compilation for RPi, etc. It'll also let you develop good software even if an RPi won't have enough bandwidth to do this realtime processing. There are faster platforms, so it'd be best not to get invested in any hardware at all. The quality of the camera will matter a lot, though, so you will want to start with a good WiFi camera, get things perfect, and then downgrade and see how far you can go. Even professional cameras provide WiFi streaming, so you can use a camera as good as you can afford. It will make things simpler to start with.
Also, don't spend time on the UI much before you get the core functionality solid. You'll be designing a "Debug" UI, and you perhaps should keep that one available but hidden in the final product.

Path detection and progress in the maze with live stereo3d image

I'm producing an ugv prototype. The goal is to perform the desired actions to the targets set within the maze. When I surf the Internet, the mere right to navigate in the labyrinth is usually made with a distance sensor. I want to consult more ideas than the question.
I want to navigate the labyrinth by analyzing the image from the 3d stereo camera. Is there a resource or successful method you can suggest for this? As a secondary problem, the car must start in front of the entrance of the labyrinth, see the entrance and go in, and then leave the labyrinth after it completes operations in the labyrinth.
I would be glad if you suggest a source for this problem. :)
The problem description is a bit vague, but i'll try to highlight some general ideas.
An useful assumption is that labyrinth is a 2D environment which you want to explore. You need to know, at every moment, which part of the map has been explored, which part of the map still needs exploring, and which part of the map is accessible in any way (in other words, where are the walls).
An easy initial data structure to help with this is a simple matrix, where each cell represents a square in the real world. Each cell can be then labelled according to its state, starting in an unexplored state. Then you start moving, and exploring. Based on the distances reported by the camera, you can estimate the state of each cell. The exploration can be guided by something such as A* or Q-learning.
Now, a rather subtle issue is that you will have to deal with uncertainty and noise. Sometimes you can ignore it, sometimes you don't. The finer the resolution you need, the bigger is the issue. A probabilistic framework is most likely the best solution.
There is an entire field of research of the so-called SLAM algorithms. SLAM stands for simultaneous localization and mapping. They build a map using some sort of input from various types of cameras or sensors, and they build a map. While building the map, they also solve the localization problem within the map. The algorithms are usually designed for 3d environments, and are more demanding than the simpler solution indicated above, but you can find ready to use implementations. For exploration, something like Q-learning still have to be used.

Tkinter fast Image display?

I'm working on a basic Tkinter image viewer. After fiddling with a lot of the code, I have two lines which depending on the operation that triggered the refresh are 50-95% of the execution time.
self.photo = ImageTk.PhotoImage(resized)
self.main_image.config(image=self.photo)
Is there a faster way to display a PIL/Pillow Image in Tkinter?
I would recommend testing the 4000x4000 images you're concerned about (which is twice the resolution of a 4K monitor). Use a Google advanced search to find such images, or use a photo/image editor to tile an image you already have. Since I don't think many people would connect a 4K (or better) monitor to a low-end computer, you could then test the difference between scaling down a large image and simply displaying it, so if most of the work is in resizing the image you don't have to worry as much about that part.
Next, test the individual performance of each of the two lines you posted. You might try implementing some kind of intelligent pre-caching, which many programs do: resize and create the next photo as the user is looking at the current one, then when the user goes to the next image all the program has to do is reconfigure self.main_image. You can see this general strategy at work in the standard Windows Photo Viewer, which responds apparently instantaneously to normal usage, but can have noticeable lag if you browse too quickly or switch your browsing direction.
Also ask your target userbase what sort of machines they're using and how large their images are. Remember that reasonable "recommended minimum specs" are acceptable.
Most importantly, keep in mind that the point of Python is clear, concise, readable code. If you have to throw away those benefits for the sake of performance, you might as well use a language that places more emphasis on performance, like C. Make absolutely sure that these performance improvements are an actual need, rather than just "my program might lag! I have to fix that."

Categories

Resources