I want to use Python to process an audio file so that it recognizes only my voice. For example, I say "forward" to a Raspberry Pi car and it goes straight, but other people who say "forward" cannot control my car.
Alternatively, I want to treat another person's voice as noise and remove it. How can I do this? Someone told me I could use PCA or ICA to reduce that noise.
You first recognize the command, then extract speaker features with i-vectors or d-vectors to verify that the speaker is you.
You can find descriptions of the algorithms in Apple's machine learning blog, for example. You can find implementations of the mentioned algorithms in Kaldi, though they are not very easy to integrate.
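A full i-vector/d-vector pipeline is heavy. As a much cruder illustration of the same idea (compare an utterance against an enrolled voiceprint and reject it if the similarity is below a threshold), here is a sketch using mean MFCC vectors with librosa; the file names and the threshold are placeholders, not something you can use as-is:

```python
import numpy as np
import librosa

def voiceprint(path, sr=16000, n_mfcc=20):
    """Very crude speaker embedding: mean MFCC vector of a recording."""
    wav, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Enroll your own voice once (file names are placeholders).
my_print = voiceprint("my_enrollment.wav")

# At command time, accept the command only if the voice matches.
cmd_print = voiceprint("incoming_command.wav")
THRESHOLD = 0.9  # needs tuning on real recordings
if cosine(my_print, cmd_print) > THRESHOLD:
    print("speaker accepted, execute command")
else:
    print("unknown speaker, ignore command")
```

This will be far less robust than proper speaker verification, but it shows where the comparison and thresholding happen.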
I have the following scenario:
I want a vector field simulation which shows the current of a fluid, let's say water. This current produces a certain noise, which can change when a solid object is submerged into it.
Is there a way to somehow attach this noise/sound to the visuals of VTK?
I am not really experienced with VTK, so any pointer in the right direction is appreciated.
Thanks in advance!
This is a pretty general question on an esoteric topic. A good first step in these cases is to do a literature review to see what researchers have attempted before, what tools they used, and what success they had. After a quick search I found a few relevant papers that cover generating sound from simulations/data.
Sounding liquids: Automatic sound synthesis from fluid simulation
Visual to Sound: Generating Natural Sound for Videos in the Wild
Auditory Display and the VTK Sonification Toolkit
Listen to your data: Model-based sonification for data analysis
After reviewing these, you'll have a better idea of what's already been attempted and what's possible.
I was wondering if I could build an augmented reality system in Python using OpenCV and SLAM. If so, do you have any tutorials or documentation you could recommend? I've been scratching my head for a while now trying to find resources to start with, so any help would be greatly appreciated!
To be a bit more specific: how would I integrate SLAM and AR together, with SLAM acting as the mapping layer so that the AR system knows where to place objects?
To be honest, Python is not fast enough to give you a real-time monocular SLAM system, so firstly you should consider writing your SLAM system in C++, which is highly recommended for real-time systems!
Secondly, you can look at some open-source SLAM systems (Stella-VSLAM, ORB-SLAM3, PTAM). But consider that to develop a SLAM system you need knowledge in a wide range of computer-science topics! The main reason ARCore and ARKit work so well is their efficient SLAM systems. You can also read this resource for more info on SLAM systems. If you have more questions, please don't hesitate to ask!
Where to Start
In order to gain some knowledge of SLAM and computer vision, I would recommend watching Cyrill Stachniss' SLAM course and reading the ORB-SLAM, ORB-SLAM2, ORB-SLAM3, and DSO papers. For computer vision I recommend reading R. Szeliski's book.
Which Language to Use
I wrote my thesis on SLAM and AR systems, and the outcome is the following: the state-of-the-art SLAM systems that achieve the best accuracy still rely on classical, hand-crafted techniques (SURF and ORB descriptors, Bag of Words (BoW), etc.) rather than deep learning. All of these systems (ORB-SLAM3, DM-VIO, DSO) are written in C++.
I always use C++ for programming SLAM, and only occasionally use Python for scripts, for example to post-process the recovered trajectory.
SLAM + AR
There aren't many resources on this subject, although the idea is simple. The SLAM system has to give you the camera location, usually as a 4x4 transformation matrix, where the top-left 3x3 block is the rotation matrix and the last 3x1 column is the translation part. Example of the transformation matrix.
Given the camera location, you can use projective geometry to project the AR objects onto the camera frame. ORB-SLAM2 has a nice AR demo to study; basically it displays the 2D camera image and puts the 3D rendered objects on top of it. A minimal sketch of the projection step follows.
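Here is a minimal Python sketch of that projection step, assuming you already have a 4x4 world-to-camera pose from the SLAM system and the camera intrinsics; all numeric values below are made up for illustration:

```python
import numpy as np

# Hypothetical camera intrinsics (fx, fy, cx, cy) and pose from the SLAM system.
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
T_world_to_cam = np.eye(4)                 # top-left 3x3: rotation, last column: translation
T_world_to_cam[:3, 3] = [0.0, 0.0, 2.0]    # camera 2 m in front of the object

def project(point_world):
    """Project a 3D world point into pixel coordinates."""
    p = T_world_to_cam @ np.append(point_world, 1.0)   # world -> camera frame
    uvw = K @ p[:3]                                     # camera frame -> image plane
    return uvw[:2] / uvw[2]                             # perspective divide

# An AR object anchored at the world origin shows up at this pixel:
print(project(np.array([0.0, 0.0, 0.0])))   # -> (320, 240), the image centre here
```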
The ORB-SLAM2 demo uses Pangolin, so you need to know how to use OpenGL and Pangolin. I recommend studying Pangolin through its examples, as it is mostly documented through them.
I'd like to improve my little robot with machine learning.
Up to now it uses simple while loops and if/then decisions in its main function to act as a lawn-mowing robot.
My idea is to use SKLearn for that purpose.
Please help me to find the right first steps.
I have a few sensors that tell it about the world outside:
World ={yaw, pan, tilt, distance_to_front_obstacle, ground_color}
I have a state vector
State = {left_motor, right_motor, cutter_motor}
that controls the 3 actuators of the robot.
I'd like to build a dataset of input and output values to teach sklearn the desired behaviour; after training, new input values should produce the correct output values for the actuators.
One example: if the motors are on and the robot should be moving forward, but the distance sensor keeps reporting constant values, the robot is probably blocked. It should then decide to back up, turn, and move in another direction.
First of all, do you think this is possible with sklearn, and second, how should I start?
My (simple) robot control code is here: http://github.com/bgewehr/RPiMower
Please help me with the first steps!
I would suggest using Reinforcement Learning. Here is a tutorial on Q-Learning that fits your problem well.
If you want code in Python: as far as I know, there is currently no implementation of Q-learning in scikit-learn. However, here are some examples of Python code that you could use: 1, 2 and 3.
Also keep in mind that reinforcement learning aims to maximize the sum of all future rewards, so you have to focus on long-term behaviour rather than single steps. A minimal tabular Q-learning sketch is shown below.
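The states, actions, and reward function in this sketch are made up for the lawn-mower scenario and would have to be replaced with your real sensor readings:

```python
import random
from collections import defaultdict

ACTIONS = ["forward", "back_and_turn", "stop_cutter"]   # hypothetical action set

# Q-table: maps (state, action) -> expected future reward.
Q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration rate

def choose_action(state):
    """Epsilon-greedy: mostly exploit the best known action, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One Q-learning step: move Q(s,a) toward reward + discounted best future value."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example step: the robot is blocked (distance reading constant), backing up is rewarded.
state = "blocked"
action = choose_action(state)
reward = 1.0 if action == "back_and_turn" else -1.0
update(state, action, reward, "free")
```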
Good luck :-)
The sklearn package contains a lot of useful tools for machine learning, so I don't think that's a problem; if it is, there are definitely other useful Python packages. I think collecting data for the supervised learning phase will be the challenging part, and I wonder if it would be smart to make a track with tape within a grid system. That would make it easier to translate the track to labels (x, y positions in the grid). Each cell in the grid should be small if you want to make complex tracks later on. It may also be smart to check how the self-driving Google car project did it. A minimal supervised-learning sketch is shown below.
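As a sketch of the supervised approach the question describes (sensor vector in, actuator command out), here is a hypothetical example with a scikit-learn decision tree; the feature values and labels are invented and would be replaced with data logged from the robot:

```python
from sklearn.tree import DecisionTreeClassifier

# Each row: [yaw, pan, tilt, distance_to_front_obstacle, ground_color]
# (toy values; log real sensor readings from the robot instead)
X = [
    [0.0, 0, 0, 200, 1],   # path clear, on grass
    [0.0, 0, 0,  15, 1],   # obstacle close ahead
    [0.0, 0, 0, 180, 0],   # clear, but off the grass
]
# Labels: a discrete command later mapped to (left_motor, right_motor, cutter_motor)
y = ["forward", "back_and_turn", "stop"]

clf = DecisionTreeClassifier()
clf.fit(X, y)

# At runtime, feed the current sensor vector and act on the prediction.
print(clf.predict([[0.0, 0, 0, 10, 1]]))   # -> likely "back_and_turn"
```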
I'm neither an expert in OpenCV nor Python, but after far too much messing around with poor C# implementations of CV libraries I decided to take the plunge.
Thus far I've got 'blob' (read: contour) tracking working the way I want. My problem now is occlusion, a problem which, as I (and myriad YouTube videos) understand it, the Kalman filter can solve. The problem is, relevant examples in Python don't seem to exist and the example code is largely devoid of comments, so how a red and a yellow line running all over the shop solve my problem is a mystery to me.
What I want to achieve is something like this http://www.youtube.com/watch?v=lvmEE_LWPUc or this http://www.youtube.com/watch?v=sG-h5ONsj9s.
I'd be very grateful if someone could point me in the direction of (or provide) an example using actual images pulled from a webcam or video.
Thanks in Advance.
You can take a look at:
https://github.com/dajuric/accord-net-extensions
It implements Kalman filtering, particle filtering, Joint Probability Data Association Filter (for multi-object tracking) along with motion models.
Samples included!
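Note that accord-net-extensions is a .NET library. Since the question asks for Python, here is a minimal sketch (not from that project) of tracking a 2D point through a brief occlusion with OpenCV's built-in cv2.KalmanFilter; the measurements are synthetic, and in practice you would feed in the blob centroids from your contour tracker:

```python
import numpy as np
import cv2

# State: [x, y, vx, vy], measurement: [x, y] (constant-velocity model).
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

# Synthetic blob centroids; None simulates frames where the blob is occluded.
measurements = [(10, 10), (12, 11), (14, 12), None, None, (20, 15)]

for m in measurements:
    prediction = kf.predict()          # predicted position even when there is no measurement
    if m is not None:
        kf.correct(np.array([[m[0]], [m[1]]], np.float32))
    print("predicted x, y:", prediction[0, 0], prediction[1, 0])
```

During the occluded frames the filter keeps extrapolating along the last estimated velocity, which is exactly what bridges the gap in the track.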
I have a guitar and I need my PC to be able to tell what note is being played, i.e., recognize the pitch. Is it possible to do this in Python, and is it possible with Pygame? Being able to do it in Pygame would be very helpful.
To recognize the frequency of an audio signal, you would use the FFT (fast Fourier transform) algorithm. As far as I can tell, PyGame has no means to record audio, nor does it support the FFT transform.
First, you need to capture the raw sampled data from the sound card; this kind of data is called PCM (Pulse Code Modulation). The simplest way to capture audio in Python is using the PyAudio library (Python bindings to PortAudio). GStreamer can also do it, but it's probably overkill for your purposes. Capturing 16-bit samples at a rate of 48000 Hz is pretty typical and probably the best a normal sound card will give you.
Once you have the raw PCM audio data, you can use the fftpack module from the scipy library to run the samples through the FFT transform. This gives you the frequency distribution of the analysed audio signal, i.e., how strong the signal is in each frequency band. Then it's a matter of finding the frequency bin with the strongest signal (a minimal sketch follows).
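Here is a minimal sketch of that FFT step; the PCM buffer is a synthetic 440 Hz sine rather than real sound-card input, so you would substitute the samples you capture with PyAudio:

```python
import numpy as np
from scipy import fftpack

RATE = 48000                               # samples per second
t = np.arange(RATE) / RATE
samples = np.sin(2 * np.pi * 440.0 * t)    # stand-in for one second of PCM data

spectrum = np.abs(fftpack.fft(samples))
freqs = fftpack.fftfreq(len(samples), d=1.0 / RATE)

# Look only at the positive-frequency half and pick the strongest bin.
positive = freqs > 0
peak_freq = freqs[positive][np.argmax(spectrum[positive])]
print("dominant frequency: %.1f Hz" % peak_freq)   # ~440 Hz, i.e. the A above middle C
```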
You might need some additional filtering to avoid harmonic frequencies, but I am not sure.
I once wrote a utility that does exactly that - it analyses what sounds are being played.
You can look at the code here (or you can download the whole project; it's integrated with Frets on Fire, an open-source Guitar Hero clone, to create a real-guitar Guitar Hero). It was tested with a guitar, a harmonica, and whistling :) The code is ugly, but it works :)
I used pymedia to record, and scipy for the FFT.
Beyond the basics that others already noted, I can give you some tips:
If you record from a mic, there is a lot of noise. You'll have to use a lot of trial and error to set thresholds and sound clean-up methods to get it working. One possible solution is to use an electric guitar and plug its output into the audio-in. This worked best for me.
Specifically, there is a lot of noise around 50 Hz. That's not so bad, but its overtones (see below) are at 100 Hz and 150 Hz, and those are close to the guitar's G2 and D3... As I said, my solution was to switch to an electric guitar.
There is a tradeoff between speed of detection, and accuracy. The more samples you take, the longer it will take you to detect sounds, but you'll be more accurate detecting the exact pitch. If you really want to make a project out of this, you probably need to use several time scales.
When a tone is played, it has overtones. Sometimes, after a few seconds, the overtones might even be more powerful than the base tone. If you don't deal with this, your program will think it heard E2 for a few seconds and then E3. To overcome this, I kept a list of currently playing notes; then, as long as that note or one of its overtones still had energy in it, I assumed it was the same note being played. (A sketch of this idea appears after these tips.)
It is especially hard to detect when someone plays the same note two (or more) times in a row, because it's hard to distinguish that from random fluctuations of the sound level. You'll see in my code that I had to use a constant that has to be configured to match the guitar used (apparently every guitar has its own pattern of power fluctuations).
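Here is a rough sketch of the overtone-tracking idea from the tips above; the tolerance value and the notion of a list of "active" fundamentals are simplifying assumptions, not the original project's code:

```python
def is_overtone(freq, active_fundamentals, tolerance=0.03):
    """Treat freq as an overtone if it is close to an integer multiple
    of a note that is already sounding."""
    for f0 in active_fundamentals:
        harmonic = round(freq / f0)
        if harmonic >= 1 and abs(freq - harmonic * f0) <= tolerance * f0:
            return True
    return False

active = [82.41]                    # E2 is currently being played
print(is_overtone(164.8, active))   # True: ~2nd harmonic of E2, not a new E3
print(is_overtone(110.0, active))   # False: A2 is a genuinely new note
```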
You will need to use an audio library such as the built-in audioop.
Analyzing the specific note being played is not trivial, but can be done using those APIs.
Also could be of use: http://wiki.python.org/moin/PythonInMusic
Very similar questions:
Audio Processing - Tone Recognition
Real time pitch detection
Real-time pitch detection using FFT
Turning sound into a sequence of notes is not an easy thing to do, especially with multiple notes at once. Read through Google results for "frequency estimation" and "note recognition".
I have some Python frequency estimation examples, but this is only a portion of what you need to solve to get notes from guitar recordings.
This link shows someone doing it in VB.NET, but the basics of what needs to be done to achieve your goal are captured in the links below.
STFT
Cooley-Tukey
FFT
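To get a feel for what the STFT gives you in Python, here is a minimal sketch using scipy on a synthetic two-note signal (real guitar recordings would of course be messier):

```python
import numpy as np
from scipy import signal

RATE = 44100
t = np.arange(2 * RATE) / RATE
# One second of A4 (440 Hz) followed by one second of E5 (~659 Hz).
x = np.concatenate([np.sin(2 * np.pi * 440 * t[:RATE]),
                    np.sin(2 * np.pi * 659 * t[:RATE])])

# Short-time Fourier transform: frequency content per time window.
freqs, times, Zxx = signal.stft(x, fs=RATE, nperseg=4096)
dominant = freqs[np.argmax(np.abs(Zxx), axis=0)]   # strongest bin per window
print(dominant[:3], dominant[-3:])                 # ~440 Hz early, ~659 Hz late
```

Each column of the STFT is essentially one windowed FFT, which is what lets you see the note change over time instead of getting one mixed spectrum for the whole recording.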