How to detect an object's position in an image with Tensorflow? - python

I'm doing a project with Tensorflow that consists in analyzing UML diagrams drawn on a whiteboard or on tablet devices, in order to end up with a file containing the correct UML diagram, usable in other software. The system will also use machine learning (which is why we chose Tensorflow).
As our research progresses, my partner and I have been facing a problem: we don't know how to detect object positions in a picture with Tensorflow. We did some research and found some articles about it, but no real conclusion. We eventually came across this, but we're left with no real leads on what to do.
Our real question is: has anything new come out since then (Tensorflow seems to be evolving pretty fast)? Could we have some articles/hints on where to go from here?
Thanks in advance.

You should take a look at this work: https://github.com/Russell91/TensorBox and the associated paper.
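Since that answer, pretrained detection models have become much easier to use. As a minimal sketch (the model URL, input file, and score threshold below are illustrative assumptions, not something from this thread), a TensorFlow Hub detector returns bounding boxes directly:

    # A minimal sketch, assuming a pretrained SSD detector from TensorFlow Hub.
    import tensorflow as tf
    import tensorflow_hub as hub

    detector = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")

    # The model expects a uint8 batch of shape [1, height, width, 3].
    image = tf.io.decode_jpeg(tf.io.read_file("diagram.jpg"))  # placeholder file
    result = detector(image[tf.newaxis, ...])

    boxes = result["detection_boxes"][0].numpy()   # normalized [ymin, xmin, ymax, xmax]
    scores = result["detection_scores"][0].numpy()
    for box, score in zip(boxes, scores):
        if score > 0.5:  # illustrative threshold
            print(box, score)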


Hand recognition in Python without precise landmarks

Is there any mediapipe module that detects hands WITHOUT detecting their pose?
The reason is that the examples I find on the internet end up running slow on my computer, and I don't need to know the position of the fingers, just the hand.
I tried to google it, but all the videos/tutorials I find use the same code (which detects each landmark on the hand). I'm not very experienced in the ML area, and I don't know whether there is no ready-made model for this or whether I just don't know the correct terms to search for.
As an addendum, if anyone knows a way to use GPU acceleration on Windows, that would also work, as I believe it would improve FPS. Everything I found said this is only possible on Linux, so I gave up and started looking for a simpler model that consumes less CPU.
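For reference, the landmark-based code those tutorials share looks roughly like the sketch below; as far as I know the public Hands API always returns landmarks, but a plain hand bounding box can be derived from their extremes (the camera index and thresholds are assumptions):

    import cv2
    import mediapipe as mp

    hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)
    cap = cv2.VideoCapture(0)  # assumed camera index

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            h, w = frame.shape[:2]
            for hand in results.multi_hand_landmarks:
                # Derive a bounding box from the landmark extremes,
                # ignoring the individual finger positions.
                xs = [lm.x for lm in hand.landmark]
                ys = [lm.y for lm in hand.landmark]
                cv2.rectangle(frame, (int(min(xs) * w), int(min(ys) * h)),
                              (int(max(xs) * w), int(max(ys) * h)), (0, 255, 0), 2)
        cv2.imshow("hands", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
    cap.release()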

Can you attach sounds to a VTK simulation?

I have the following scenario:
I want to have a vector field simulation which shows the current of a fluid, let's say water. This current produces a certain noise, which can change when a solid object is submerged in the current.
Is there a way to somehow attach this noise/sound to the visuals of VTK?
I am not really experienced with VTK, so any pointer in the right direction is appreciated.
Thanks in advance!
This is a pretty general question on an esoteric topic. A good first step in these cases is to do a literature review to see what researchers have attempted before, what tools they used, and what success they had. After a quick search I found a few relevant papers that cover generating sound from simulations/data:
Sounding liquids: Automatic sound synthesis from fluid simulation
Visual to Sound: Generating Natural Sound for Videos in the Wild
Auditory Display and the VTK Sonification Toolkit
Listen to your data: Model-based sonification for data analysis
After reviewing these, you'll have a better idea of what's already been attempted and what's possible.

How to create a 3d model out of a series of 2d images using python?

So let's say I have an object like a face. I take multiple pictures of the face, all at different angles and from far and close. I have a rough idea of how to make a 3d model out of these pictures but don't know how to accomplish it. My idea goes like this.
First, write code that extracts the object from the image and gets rid of all background "noise".
Second, find which part of the 3d model the picture corresponds to and tag the image with where it should fit.
Third, collect and overlap all the images to create a 3d object.
Does anyone have any idea how to accomplish any of these steps, or other ideas on how to create a 3d model out of a series of images? I use Python 3.10.4.
It seems that you are asking whether there are Python modules that would help implement a complete photogrammetry process.
Please note that even in existing (and commercial) photogrammetry solutions the process is not always fully automated; sometimes it requires some manual tweaking and point cloud selection.
Anyway, to the best of my knowledge, what you asked requires implementing the following steps:
detecting common features between the different photographs
inferring the position in space of the camera that took each photograph
generating a point cloud from the photographs, based on their relative positions in space and the common features
converting the point cloud into a 3D mesh.
All of these steps can probably be implemented in Python, but I'm not aware of an off-the-shelf module that covers the whole pipeline.
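As an illustration of the first step, OpenCV can detect and match features between two photographs. A minimal sketch, with the file names as placeholders:

    import cv2

    img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder paths
    img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Hamming distance suits ORB's binary descriptors; cross-check keeps
    # only matches that agree in both directions.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    print(f"{len(matches)} candidate correspondences")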
There's a commercial solution called Metashape from Agisoft. It has a Python module you can use, but beware that it has its pitfalls (it threw a segmentation fault for me at the end of processing, which makes things... icky), and support tends to ignore bigger problems, so you can expect them to ignore your ticket. Still, it does the job quite well.
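For completeness, a hedged outline of what driving Metashape from Python can look like; the exact method names vary between versions, so check the API reference for the one you install:

    import Metashape

    doc = Metashape.Document()
    chunk = doc.addChunk()
    chunk.addPhotos(["face_01.jpg", "face_02.jpg", "face_03.jpg"])  # placeholder paths

    chunk.matchPhotos()       # feature detection + matching
    chunk.alignCameras()      # camera pose estimation (structure from motion)
    chunk.buildDepthMaps()    # dense reconstruction input
    chunk.buildModel()        # mesh from the depth maps
    doc.save("face.psx")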

Path detection and progress in a maze with a live stereo 3d image

I'm building a UGV (unmanned ground vehicle) prototype. The goal is to perform the desired actions on targets placed within a maze. From what I find on the internet, navigation in a labyrinth is usually done with just a distance sensor. I'd like to gather more ideas beyond that.
I want to navigate the labyrinth by analyzing the image from a 3d stereo camera. Is there a resource or successful method you can suggest for this? As a secondary problem, the car must start in front of the entrance of the labyrinth, see the entrance and go in, and then leave the labyrinth after it completes its operations inside.
I would be glad if you suggest a source for this problem. :)
The problem description is a bit vague, but I'll try to highlight some general ideas.
A useful assumption is that the labyrinth is a 2D environment that you want to explore. You need to know, at every moment, which parts of the map have been explored, which still need exploring, and which are accessible at all (in other words, where the walls are).
An easy initial data structure to help with this is a simple matrix, where each cell represents a square in the real world. Each cell can then be labelled according to its state, starting as unexplored. Then you start moving and exploring. Based on the distances reported by the camera, you can estimate the state of each cell. The exploration itself can be guided by something such as A* or Q-learning.
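A minimal sketch of such a grid, with the cell states, maze size, and distances as illustrative assumptions:

    import numpy as np

    UNKNOWN, FREE, WALL = 0, 1, 2
    grid = np.full((50, 50), UNKNOWN, dtype=np.uint8)  # 50x50 cells, all unexplored

    def mark_from_range(grid, robot_rc, direction, distance_cells):
        # Mark one sensing ray: free cells up to the hit, a wall at the hit.
        r, c = robot_rc
        dr, dc = direction
        for step in range(1, distance_cells):
            grid[r + step * dr, c + step * dc] = FREE
        grid[r + distance_cells * dr, c + distance_cells * dc] = WALL

    # e.g. the stereo camera reports a wall 4 cells ahead (towards row 0):
    mark_from_range(grid, (25, 25), (-1, 0), 4)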
Now, a rather subtle issue is that you will have to deal with uncertainty and noise. Sometimes you can ignore it, sometimes you can't. The finer the resolution you need, the bigger the issue. A probabilistic framework is most likely the best solution.
There is an entire field of research around the so-called SLAM algorithms (simultaneous localization and mapping). They build a map from the input of various types of cameras or sensors, and while building the map they also solve the localization problem within it. These algorithms are usually designed for 3d environments and are more demanding than the simpler solution sketched above, but you can find ready-to-use implementations. For exploration, something like Q-learning still has to be used on top.

Practical implementation of OpenCV Kalman filter w/python?

I'm no expert in either OpenCV or Python, but after far too much messing around with poor C# implementations of CV libraries I decided to take the plunge.
Thus far I've got 'blob' (read: contour) tracking working the way I want - my problem now is occlusion, a problem which, as I (and myriad youtube videos) understand it, the Kalman filter can solve. The problem is that relevant examples in Python don't seem to exist, and the example code is largely devoid of comments, so how a red and a yellow line running all over the shop solve my problem is a mystery to me.
What I want to achieve is something like this http://www.youtube.com/watch?v=lvmEE_LWPUc or this http://www.youtube.com/watch?v=sG-h5ONsj9s.
I'd be very grateful if someone could point me in the direction of (or provide) an example using actual images pulled from a webcam or video.
Thanks in advance.
You can take a look at:
https://github.com/dajuric/accord-net-extensions
It implements Kalman filtering, particle filtering, and the Joint Probabilistic Data Association Filter (for multi-object tracking), along with motion models.
Samples included!
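That library is .NET, though. OpenCV's own Python bindings also ship cv2.KalmanFilter; a minimal sketch of a constant-velocity 2D tracker, with the blob centroids below as placeholder measurements (predict during occlusion, correct when a measurement arrives):

    import cv2
    import numpy as np

    kf = cv2.KalmanFilter(4, 2)  # state: [x, y, vx, vy], measurement: [x, y]
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], np.float32)
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

    # Placeholder centroids from blob tracking; None simulates an occluded frame.
    for cx, cy in [(100, 120), (103, 122), (None, None), (110, 127)]:
        prediction = kf.predict()  # carries the track through occlusion
        if cx is not None:
            kf.correct(np.array([[cx], [cy]], np.float32))
        print("estimated position:", prediction[0, 0], prediction[1, 0])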
