I need to extract the audio stream from a video and check whether it has any pitch changes or abnormalities. Ideally, we want to quantify any pitch changes in the audio stream. I'm aware that I can use ffmpeg to extract the audio stream from the video. However, what tools or programs (Python?) can then be used to identify and quantify any pitch changes or abnormalities in the audio stream?
Pitch analysis is not an easy task; luckily, there are existing solutions for it. https://pypi.org/project/crepe/ is an example that looks promising.
You could read the resulting CSV of pitch data into a Pandas dataframe and perform whatever data analysis you can think of.
For example, for the pitch-change analysis you could do
df['pitch_change'] = df.frequency.diff(periods=1)
to get a column representing the pitch change per time step.
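For context, a minimal sketch of the whole pipeline (the file names are placeholders, and the CSV column names assume crepe's command-line output, which reports time, frequency, and confidence):
# extract the audio track first, e.g.: ffmpeg -i input.mp4 -vn audio.wav
# then run crepe on it, e.g.: crepe audio.wav  (writes audio.f0.csv)
import pandas as pd

df = pd.read_csv('audio.f0.csv')          # columns: time, frequency, confidence
df['pitch_change'] = df.frequency.diff()  # Hz change between consecutive frames
print(df.loc[df.pitch_change.abs() > 50]) # e.g. flag jumps larger than 50 Hz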
I have an audio file that lasts 294 seconds (sampling rate is 50000). I use torchaudio to compute its spectrogram the following way:
T.MelSpectrogram(sample_rate=50000, n_fft=1024, hop_length=512)
Say there is an important event in the original .wav audio at exactly second 57. How can I determine exactly which pixel that event will start at on the spectrogram?
Or, put simply, how can I map a moment in an audio to a location in a spectrogram?
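Not from torchaudio's documentation, just the arithmetic I would check first (it assumes the default center=True, so frame i is centered on sample i * hop_length):
sample_rate = 50000
hop_length = 512
event_time_s = 57.0

# Each spectrogram column advances by hop_length samples, so
# column index ~= time_in_seconds * sample_rate / hop_length
frame = int(round(event_time_s * sample_rate / hop_length))
print(frame)  # ~5566: the event at 57 s starts around column 5566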
I'd like to recreate an audio compressor like this in NumPy or another Python library.
I mean a function that doesn't just clip all the samples above the threshold level or normalize the whole audio, but one that takes an audio waveform in the range (-1, 1), an attack time, a decay time, and a threshold level in dB, reduces the volume of samples above the threshold without distortion, and outputs a new sound.
All the solutions I've found so far either add distortion, like ffmpeg, or don't use 64-bit floating-point calculations, like SoX.
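For what it's worth, here is a minimal sketch of a feed-forward compressor in NumPy; the parameter names, the ratio parameter, and the one-pole attack/decay smoothing are my own assumptions rather than anything from an existing library:
import numpy as np

def compress(x, sr, threshold_db=-20.0, attack_ms=5.0, decay_ms=50.0, ratio=4.0):
    # One-pole smoothing coefficients derived from the attack/decay times
    atk = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    dec = np.exp(-1.0 / (sr * decay_ms / 1000.0))
    # Instantaneous level in dB (the floor avoids log of zero)
    level_db = 20.0 * np.log10(np.maximum(np.abs(x), 1e-10))
    # Gain reduction wanted for samples above the threshold (0 dB below it)
    over_db = np.maximum(level_db - threshold_db, 0.0)
    target_db = -over_db * (1.0 - 1.0 / ratio)
    # Smooth the gain: follow the attack time going down, the decay time recovering
    gain_db = np.empty_like(x)
    g = 0.0
    for n in range(x.size):
        coeff = atk if target_db[n] < g else dec
        g = coeff * g + (1.0 - coeff) * target_db[n]
        gain_db[n] = g
    # Everything stays in 64-bit floats; apply the smoothed gain
    return x * 10.0 ** (gain_db / 20.0)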
Hi, I am trying to plot FFT data from real-time audio input for my project work. I need to see the spectral components, as I have to perform some actions based on the frequency response. The code that I am modifying is based on this:
http://flothesof.github.io/pyqt-microphone-fft-application.html
I am interested in the frequency response around 18 kHz, and I am sampling at 44100 Hz, but the above code works only up to about 6800 Hz; after that it simply plots garbage. The CHUNK size is 2048. What should I do to get frequencies around 18 kHz? Also, I don't see aliasing around 6800 Hz when overshooting it.
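For reference, here is a quick way to check what frequency range an FFT of one CHUNK covers (a standalone sketch, not taken from the linked code):
import numpy as np

RATE = 44100
CHUNK = 2048

# Bin frequencies for a real FFT of one CHUNK of samples
freqs = np.fft.rfftfreq(CHUNK, d=1.0 / RATE)
print(freqs[0], freqs[-1], freqs[1] - freqs[0])
# 0.0 Hz up to 22050.0 Hz (Nyquist), in steps of about 21.5 Hz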
This is my first question on Stack Overflow; I'm sorry if it made you facepalm. I don't have a DSP background, but I am studying it on my own.
Thanks priest
I'm using Python 2.7.3 and I have a question relating to ultrasonic frequencies:
Sampling at 40 MHz, I measure an ultrasonic signal that's a convolution of a 1 MHz resonant frequency and an envelope; the envelope depends on the media through which the ultrasonic signal travels. I would like to listen to this received signal, and my question is:
How may I map the received signal into the range of human hearing? Or put another way,
How may I down-sample and convert this signal to an audio frequency (keeping the envelope shape, and maybe even elongating the time so it's longer)?
A simulated signal is shown here, but it's typically like this in any case:
import numpy as np
import matplotlib.pyplot as plt
# resonant frequency is 1 MHz
f = 1e6
Omega = 2*np.pi*f
# sample at 40 MHz, i.e. ts = 25 ns, for about 1000 samples:
t = np.arange(0, 25e-6, 25e-9)
# damped oscillation: 1 MHz carrier with a t**2 * exp(-t/3us) envelope
y = np.sin(Omega*t) * (t**2) * np.exp(-t/3e-6)
# normalize to a peak amplitude of 1
y /= np.max(np.abs(y))
plt.plot(y)
plt.grid()
plt.xlabel('sample')
plt.ylabel('value')
plt.show()
There are two common answers to your question:
Just play it at a fraction of the sampling frequency. If you play your signal back with, e.g., a 44.1 kHz sampling frequency, you will have an audible tone of approximately 1000 Hz and a signal length of roughly 20 ms. (I picked 44.1 kHz as it is certainly one of the frequencies any hardware can play back.) This is probably easiest to accomplish by saving your signal into a WAV file (see the wave module), and then you can play it back with anything that plays WAV files.
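A minimal sketch of this first option, assuming y is the simulated signal from the question (already normalized to the range -1..1); the file name is a placeholder:
import wave
import numpy as np

# Convert the float signal to 16-bit PCM
pcm = np.int16(np.clip(y, -1.0, 1.0) * 32767)

w = wave.open('ultrasound_slow.wav', 'wb')
w.setnchannels(1)        # mono
w.setsampwidth(2)        # 16-bit samples
w.setframerate(44100)    # play the 40 MHz samples back at 44.1 kHz
w.writeframes(pcm.tobytes())
w.close()
# 1000 samples / 44100 Hz ~ 23 ms; the 1 MHz tone becomes ~1.1 kHz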
The standard method would be to mix the resonant frequency down to audible frequencies. This is fundamental to how radios work. Mathematically it involves multiplying by a carrier frequency which is close to the resonant frequency, and then low-pass filtering the result. The operation can also be viewed as shifting the frequency spectrum closer to 0. However, as your signal envelope is very fast (about 25 µs), this would only result in a short click and thus not be useful here.
Other solutions can be figured out if there are further requirements. The envelope frequency and the resonant frequency seem to be relatively close to each other, which limits the options. If you need to do this for a real-time signal, then the challenge will be elongating the envelope, because then the envelope has to be detected. Otherwise it is not possible to stretch the time.
I wanted to make this a comment, but I have some examples.
There would be many ways to represent this. You could use sound as an encoding medium.
If your original waveform has only a few properties, such as a (constant) frequency and an envelope (variable, but it can be approximated), you could, for example, encode the frequency in binary form with a short sequence of sounds and silences (1 = generate sound, 0 = generate silence). You could then represent the amplitude with a constant tone of variable frequency (e.g., a 100 Hz sound would represent zero amplitude, and a 10000 Hz sound would represent maximum amplitude). To rebuild the original envelope, you could use interpolation.
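For illustration only, a rough sketch of the amplitude-to-frequency idea; the 100 Hz / 10 kHz endpoints and the tone length are arbitrary placeholders:
import numpy as np

def encode_envelope(env, sr=44100, tone_len=0.05, f_lo=100.0, f_hi=10000.0):
    # Each envelope value in [0, 1] becomes a short tone whose frequency
    # encodes the amplitude: 0 -> 100 Hz, 1 -> 10 kHz
    t = np.arange(int(sr * tone_len)) / float(sr)
    tones = [np.sin(2*np.pi*(f_lo + a*(f_hi - f_lo))*t) for a in env]
    return np.concatenate(tones)

# e.g. encode a coarsely sampled (20-point) version of the envelope
encoded = encode_envelope(np.linspace(0.0, 1.0, 20))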
I hope you see my point.