Using Python to measure audio "loudness"

I'm looking to calculate the loudness of a piece of audio using Python — probably by extracting its peak volume, or possibly using a more accurate measure (RMS?).
What's the best way to do this? I've had a look at pyaudio, but that didn't seem to do what I wanted. What looked good was ruby-audio, as this seemingly has sound.abs.max built into it.
The input audio will be taken from various local MP3 files that are around 30s in duration.

I think that RMS would be the most accurate measure. One thing to note is that we perceive loudness differently at different frequencies, so convert the audio to frequency space with an FFT (numpy.fft should work great on only 30s of audio). Now compute a power spectral density from this. Weight the PSD by frequency using some loudness curve, especially attenuating frequencies below 10 Hz, since there will be a lot of power there (it would dominate the RMS calculation in the time domain), yet we can't hear it. Now integrate the PSD, take the square root, and that will give a perceived RMS.
You can also break the mp3 into sections or windows and apply this technique to give the volume in particular sections.
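To make that concrete, here is a minimal sketch of the recipe. It assumes the MP3 has already been decoded into a mono float numpy array x at sampling rate fs (e.g. with pydub or ffmpeg; the decoding step is out of scope here), and it uses the standard A-weighting curve as one possible choice of loudness curve:

import numpy as np

def a_weighting(f):
    # Amplitude response of the standard A-weighting curve (IEC 61672).
    f2 = f ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return num / den

def perceived_rms(x, fs):
    # Weight the one-sided power spectrum by the squared loudness curve,
    # integrate, and take the square root (Parseval's theorem relates the
    # per-bin power back to the time-domain mean square).
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    weighted_power = (np.abs(X) * a_weighting(freqs)) ** 2
    return np.sqrt(2.0 * weighted_power.sum()) / x.size

The absolute number is only meaningful relative to other clips processed the same way, which is all you need for ranking files by loudness.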

Related

How to calculate beats per minute from heart sounds recorded through android MIC?

I have many .wav files of heart sounds recorded through the MIC by putting the phone directly on people's chests. I want to calculate BPM from these sounds. Could you please help with this? Any library, algorithm, or tutorials?
Can you (are you allowed to) put some sample somewhere?
I've played with some ECG (up to 12-electrode) and neural signals (the spikes look a lot like the R-S transition). Those spikes were so big that a simple find_peaks from scipy.signal was enough to detect them. I used a Butterworth filter before that, though. You might need that too; filtering out the 50/60 Hz mains hum is common, and there might be similar noise in audio as well.
After finding the peaks, beats per minute is a division (and probably some averaging).
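As a rough sketch of that pipeline (the file name, band edges, and thresholds are assumptions that will need tuning against your recordings; note also that each heartbeat produces two sounds, S1 and S2, which a naive peak picker may count separately):

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt, find_peaks

fs, x = wavfile.read("heart.wav")          # hypothetical file name
x = x.astype(float)
if x.ndim > 1:
    x = x.mean(axis=1)                     # mix down to mono

# Band-pass roughly around typical heart-sound energy
b, a = butter(4, [20, 150], btype="band", fs=fs)
envelope = np.abs(filtfilt(b, a, x))

# Require peaks at least 0.3 s apart and above a crude amplitude threshold
peaks, _ = find_peaks(envelope, distance=int(0.3 * fs),
                      height=0.4 * envelope.max())

intervals = np.diff(peaks) / fs            # seconds between detected beats
print("Estimated BPM:", 60.0 / intervals.mean())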
What you're trying to do is essentially compute the Fourier spectrum of the given sound file and then identify the strongest peak. That's likely to be the frequency of your dominant signal (which in this case should be the heart rate).
Thankfully, someone else has already asked / answered this on stackoverflow.
The only caveat with this approach, is if there are other repetitive signals that dominate the heart-beat, in which case you may need to clean your data first.
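A hedged sketch of that frequency-domain approach: take the spectrum of the signal's envelope (the beat rate is a periodicity of the energy, not an audible frequency) and pick the strongest peak in the physiologically plausible range:

import numpy as np
from scipy.io import wavfile

fs, x = wavfile.read("heart.wav")          # hypothetical file name
x = x.astype(float)
if x.ndim > 1:
    x = x.mean(axis=1)

envelope = np.abs(x)
spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
freqs = np.fft.rfftfreq(envelope.size, d=1.0 / fs)

# Plausible heart rates: roughly 40-200 BPM, i.e. about 0.67-3.3 Hz
band = (freqs > 0.6) & (freqs < 3.5)
dominant = freqs[band][np.argmax(spectrum[band])]
print("Estimated BPM:", dominant * 60.0)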

How to Identify Each Components from Audio Signal?

I have some audio files recorded from wind turbines, and I'm trying to do anomaly detection. The general idea is that if a blade has a fault (e.g. cracking), the sound of this blade will differ from the other two blades, so we can basically find a way to extract each blade's sound signal and compare the similarity / distance between them; if one of these signals has a significant difference, we can say the turbine is going to fail.
I only have some faulty samples, and labels are lacking.
However, no one seems to be doing this kind of work, and I've run into a lot of trouble attempting it.
I've tried using an STFT to convert the signal to a power spectrum, and some spikes show up. How do I identify each blade from the raw data? (Some related work uses AutoEncoders to detect anomalies in audio, but in this task we want to use a similarity-based method.)
Does anyone have a good idea? Any related work / papers to recommend?
Well...
If your shaft is rotating at, say, 1200 RPM (20 Hz), then all the significant sound produced by that rotation should be at harmonics of 20 Hz.
If the turbine has 3 perfect blades, however, then it will be in exactly the same configuration 3 times for every rotation, so all of the sound produced by the rotation should be confined to multiples of 60 Hz.
Energy at the other harmonics of 20 Hz -- 20, 40, 80, 100, etc. -- that is above the noise floor would generally result from differences between the blades.
This of course ignores noise from other sources that are also synchronized to the shaft, which can mess up the analysis.
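A minimal sketch of that idea, assuming the audio is already loaded as a mono float array x at sampling rate fs and that the shaft rate is known (the 20 Hz / 3-blade numbers are just the example figures from above):

import numpy as np

def harmonic_energy(x, fs, f0, n_harmonics, tol=0.5):
    # Sum spectral power in narrow bands around k*f0, k = 1..n_harmonics.
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return sum(spectrum[np.abs(freqs - k * f0) < tol].sum()
               for k in range(1, n_harmonics + 1))

shaft_hz = 20.0                                   # example value
blade_pass = harmonic_energy(x, fs, 3 * shaft_hz, n_harmonics=10)
all_shaft = harmonic_energy(x, fs, shaft_hz, n_harmonics=30)

# Energy at shaft harmonics that are NOT blade-pass harmonics points
# to differences between the blades.
print("Blade-asymmetry fraction:", (all_shaft - blade_pass) / all_shaft)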
Assuming that the audio you got is from a location where one can hear individual blades as they pass by, there are two subproblems:
1) Estimate each blade position, and extract the audio for each blade.
2) Compare the signal from each blade to each other, and determine if one of them is different enough to be considered an anomaly.
Estimating the blade position can be done with a sensor that detects the rotation directly, for example based on the magnetic field of the generator. Ideally you would have this kind of known-good sensor data, at least while developing your system. It may also be possible to estimate it using only audio, with some sort of periodicity detection; autocorrelation is a commonly used technique for that.
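A minimal autocorrelation sketch for the audio-only case, assuming a mono float array x at sampling rate fs and rough bounds on the plausible rotation rate:

import numpy as np
from scipy.signal import correlate

def rotation_period(x, fs, min_hz=0.2, max_hz=5.0):
    # Autocorrelate the envelope; the strongest repeat within the
    # allowed lag range is taken as the rotation period (in seconds).
    env = np.abs(x)
    env = env - env.mean()
    ac = correlate(env, env, mode="full", method="fft")[env.size - 1:]
    lo, hi = int(fs / max_hz), int(fs / min_hz)
    return (lo + np.argmax(ac[lo:hi])) / fs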
To detect differences between blades, you can try to use a standard distance function on a standard feature description, like Euclidean on MFCC. You will still need to have some samples for both known faulty examples and known good/acceptable examples, to evaluate your solution.
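For instance, a sketch of Euclidean distance on mean MFCC vectors (the segments would come from the blade-extraction step above; n_mfcc and the time-averaging are arbitrary starting points):

import numpy as np
import librosa

def blade_distance(seg_a, seg_b, sr):
    # Euclidean distance between the mean MFCC vectors of two segments.
    mfcc_a = librosa.feature.mfcc(y=seg_a, sr=sr, n_mfcc=20).mean(axis=1)
    mfcc_b = librosa.feature.mfcc(y=seg_b, sr=sr, n_mfcc=20).mean(axis=1)
    return np.linalg.norm(mfcc_a - mfcc_b)

# With three per-blade segments, an anomalous blade shows up as the one
# involved in the two largest pairwise distances.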
There is, however, a risk that this approach will not be good enough. Then try to compute some better features as a basis for the distance computation, perhaps using an AutoEncoder. You can also try some sort of similarity learning.
If you have a good amount of both good and faulty data, you may be able to use a triplet loss setup to learn the similarity metric. Feed in data for two good blades as objects that should be similar, and the known-bad as something that should be dissimilar.

Python | librosa: how to extract human voice from an audio wav file?

Given a wav file (mono, 16 kHz sampling rate) of an audio recording of a human talking, is there a way to extract just the voice, thereby filtering out most mechanical and background noise? I'm trying to use the librosa package in Python 3.6 for this, but can't figure out how piptrack works (or whether there is a simpler way).
When I tried using an FFT/IFFT to restrict frequencies to the 300-3400 range, the resulting sound was severely distorted:
import numpy as np
import scipy.io.wavfile

sr, y = scipy.io.wavfile.read(wav_file_path)
x = np.fft.rfft(y)[0:3400]   # note: these indices are FFT bins, not Hz
x[0:300] = 0
x = np.fft.irfft(x)
Extracting the human voice from an audio file is an actively researched problem. It's often referred to as 'speech enhancement' in the scientific literature. The latest developments in the field tend to be presented at the Interspeech and IEEE ICASSP conferences. You can also check out the Deep Noise Suppression Challenge from Microsoft.
The complexity of removing unwanted sound from a speech recording is highly dependent on the unwanted sound, and how much you know about it. If, as your attempt suggests, you are only interested in filtering out low-frequency noise, then you may be able to get some noise reduction with a proper high-pass (or band-pass) filter. Librosa has some filter implementations, and numpy/scipy will give you even more options.
Simply zeroing FFT coefficients will give terrible distortion. See this stackoverflow answer as to why this is never a good idea.
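For instance, here is a sketch of the asker's 300-3400 Hz band-limit done with a proper filter instead of zeroed FFT bins (the filter order and the use of zero-phase filtering are just reasonable defaults):

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

sr, y = wavfile.read(wav_file_path)        # wav_file_path as in the question
y = y.astype(float)

# 4th-order Butterworth band-pass over the classic telephony band;
# sosfiltfilt gives zero-phase filtering without hard spectral cuts.
sos = butter(4, [300, 3400], btype="band", fs=sr, output="sos")
y_filtered = sosfiltfilt(sos, y)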

Pitch detection in Python

The concept of the program I'm working on is a Python module which detects certain frequencies (the human speech range, 80-300 Hz) and, by checking against a database, shows the intonation of the sentence. I use SciPy to plot the frequency content of the sound files, but I cannot pick out a specific frequency in order to analyze pitch. How can I do this?
more info: I would like to be able to set a defined pattern in speech (e.g. Rising, Falling) and the program detects if the sound file follows the specific pattern.
UPDATE in 2019: there are now very accurate pitch trackers based on neural networks, and they work in Python out of the box. Check
https://pypi.org/project/crepe/
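A minimal usage sketch along the lines of the crepe README (the file name is hypothetical):

from scipy.io import wavfile
import crepe

sr, audio = wavfile.read("speech.wav")     # hypothetical file
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)
# frequency[i] is the f0 estimate (Hz) at time[i]; confidence[i] can be
# used to discard unvoiced frames before looking for rises and falls.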
ANSWER FROM 2015. Pitch detection is a complex problem; Google's REAPER package provides a high-quality solution to this non-trivial task:
https://github.com/google/REAPER
You can write a wrapper if you want to access it from Python.
You could try the following. I'm sure you know that the human voice also has harmonics that go way beyond 300 Hz. Nevertheless, you can move a window across your audio file and look at the change in power at the maximum frequency (as shown below) or across a set of frequencies in each window. The code below is meant to give intuition:
import numpy as np

def maxFrequency(X, F_sample, Low_cutoff=80, High_cutoff=300):
    """ Search for the strongest frequency in a band of a real signal using the FFT
    Inputs
    =======
    X: 1-D numpy array, the real time domain audio signal (single channel time series)
    F_sample: float, the sampling frequency of the signal (physical frequency in unit of Hz)
    Low_cutoff: float, lower edge of the search band (physical frequency in unit of Hz)
    High_cutoff: float, upper edge of the search band (physical frequency in unit of Hz)
    """
    M = X.size  # let M be the length of the time series
    Spectrum = np.abs(np.fft.rfft(X, n=M))         # magnitude spectrum
    freqs = np.fft.rfftfreq(M, d=1.0 / F_sample)   # bin index -> physical frequency (Hz)
    # Restrict the search to the requested band and return the peak frequency
    band = (freqs >= Low_cutoff) & (freqs <= High_cutoff)
    return freqs[band][np.argmax(Spectrum[band])]

voiceVector = []
for window in fullAudio:  # run a window of appropriate length across the audio file
    voiceVector.append(maxFrequency(window, samplingRate))
Now, based on the intonation of the voice, the maximum-power frequency may shift, which you can register and map to a given intonation. This may not always hold, and you may have to monitor shifts in many frequencies together, but it should get you started.
There are many different algorithms to estimate pitch, but a study found that Praat's algorithm is the most accurate [1]. Recently, the Parselmouth library has made it a lot easier to call Praat functions from Python [2].
[1]: Strömbergsson, Sofia. "Today's Most Frequently Used F0 Estimation Methods, and Their Accuracy in Estimating Male and Female Pitch in Clean Speech." INTERSPEECH. 2016. https://pdfs.semanticscholar.org/ff04/0316f44eab5c0497cec280bfb1fd0e7c0e85.pdf
[2]: https://github.com/YannickJadoul/Parselmouth
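A minimal sketch following the Parselmouth documentation (file name hypothetical):

import parselmouth

snd = parselmouth.Sound("speech.wav")      # hypothetical file
pitch = snd.to_pitch()
f0 = pitch.selected_array['frequency']     # one f0 value (Hz) per frame; 0 means unvoiced
times = pitch.xs()                         # frame timestamps in seconds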
There are basically two classes of f0 (pitch) estimation: time domain (with autocorrelation/cross-correlation, for example), and frequency domain (e.g. identifying the fundamental frequency by measuring distances between harmonics, or identifying the frequency in the spectrum with maximum power, as shown in the example above by Sahil M).
For many years I have successfully used RAPT (Robust Algorithm for Pitch Tracking), the predecessor of REAPER, also by David Talkin. The widely used Praat software which you mention also includes a RAPT-like cross-correlation algorithm option. Description and code are readily available on the web. A DEB install archive is available here: http://www.phon.ox.ac.uk/releases
Pattern detection (rises, falls, etc.) with the pitch function is a separate issue. The suggestion above by Sahil M for using a moving window across the pitch function is a good way to start.

Recognising tone of the audio

I have a guitar and I need my PC to be able to tell what note is being played, recognizing the tone. Is it possible to do this in Python, and is it possible with PyGame? Being able to do it in PyGame would be very helpful.
To recognize the frequency of an audio signal, you would use the FFT (fast Fourier transform) algorithm. As far as I can tell, PyGame has no means to record audio, nor does it support the FFT.
First, you need to capture the raw sampled data from the sound card; this kind of data is called PCM (pulse-code modulation). The simplest way to capture audio in Python is using the PyAudio library (Python bindings to PortAudio). GStreamer can also do it, but it's probably overkill for your purposes. Capturing 16-bit samples at a rate of 48000 Hz is pretty typical and probably the best a normal sound card will give you.
Once you have raw PCM audio data, you can use the fftpack module from the scipy library to run the samples through the FFT. This will give you the frequency distribution of the analysed audio signal, i.e., how strong the signal is in certain frequency bands. Then it's a matter of finding the frequency that has the strongest signal.
You might need some additional filtering to avoid harmonic frequencies, though I am not sure.
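A minimal capture-and-peak sketch along those lines (the rate, chunk size, and Hann window are just reasonable defaults; a real note detector needs the harmonic handling discussed elsewhere on this page):

import numpy as np
import pyaudio

RATE = 48000
CHUNK = 4096   # more samples per read -> finer frequency resolution

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, frames_per_buffer=CHUNK)

data = stream.read(CHUNK)
samples = np.frombuffer(data, dtype=np.int16).astype(float)

spectrum = np.abs(np.fft.rfft(samples * np.hanning(samples.size)))
freqs = np.fft.rfftfreq(samples.size, d=1.0 / RATE)
print("Strongest frequency (Hz):", freqs[np.argmax(spectrum)])

stream.stop_stream()
stream.close()
p.terminate()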
I once wrote a utility that does exactly that - it analyses what sounds are being played.
You can look at the code here (or you can download the whole project; it's integrated with Frets on Fire, an open-source Guitar Hero clone, to create a real guitar hero). It was tested using a guitar, a harmonica, and whistles :) The code is ugly, but it works :)
I used pymedia to record, and scipy for the FFT.
Except for the basics that others already noted, I can give you some tips:
If you record from a mic, there is a lot of noise. You'll have to use a lot of trial and error to set thresholds and sound clean-up methods to get it working. One possible solution is to use an electric guitar and plug its output into the audio-in. This worked best for me.
Specifically, there is a lot of noise around 50 Hz. That's not so bad, but its overtones (see below) are at 100 Hz and 150 Hz, and those are close to the guitar's G2 and D3... As I said, my solution was to switch to an electric guitar.
There is a tradeoff between speed of detection, and accuracy. The more samples you take, the longer it will take you to detect sounds, but you'll be more accurate detecting the exact pitch. If you really want to make a project out of this, you probably need to use several time scales.
When a tone is played, it has overtones. Sometimes, after a few seconds, the overtones might even be more powerful than the base tone. If you don't deal with this, your program will think it heard E2 for a few seconds and then E3. To overcome this, I used a list of currently playing sounds, and as long as a note or one of its overtones still had energy in it, I assumed it was the same note being played.
It is especially hard to detect when someone plays the same note two (or more) times in a row, because it's hard to distinguish between that and random fluctuations in sound level. You'll see in my code that I had to use a constant configured to match the guitar used (apparently every guitar has its own pattern of power fluctuations).
You will need to use an audio library such as the built-in audioop.
Analyzing the specific note being played is not trivial, but can be done using those APIs.
Also could be of use: http://wiki.python.org/moin/PythonInMusic
Very similar questions:
Audio Processing - Tone Recognition
Real time pitch detection
Real-time pitch detection using FFT
Turning sound into a sequence of notes is not an easy thing to do, especially with multiple notes at once. Read through Google results for "frequency estimation" and "note recognition".
I have some Python frequency estimation examples, but this is only a portion of what you need to solve to get notes from guitar recordings.
This link shows someone doing it in VB.NET, but the basics of what needs to be done to achieve your goal are captured in the links below.
STFT
Cooley-Tukey
FFT
