What algorithms are used in audio limiters? - python

I'd like to recreate one in numpy or another Python library.
I mean a function that doesn't just clip all the samples above the threshold level or normalize the whole audio, but one that takes an audio waveform in the range (-1, 1), an attack time, a decay time and a threshold level in dB, reduces the volume of samples above the threshold without distortion, and outputs a new sound.
All the solutions I've found so far either add distortion, like ffmpeg, or don't use 64-bit floating point calculations, like SoX.
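For reference, a minimal numpy sketch of the usual approach (an envelope follower that smooths a per-sample gain with separate attack and release time constants) might look like the code below. The function name, parameter defaults and the lack of lookahead are my own assumptions, not a reference implementation:

import numpy as np

def limit(x, sr, threshold_db=-6.0, attack_ms=5.0, release_ms=50.0):
    # Rough feed-forward limiter sketch: compute the gain needed to keep
    # each sample under the threshold, then smooth that gain with
    # one-pole filters (fast attack, slower release) and apply it.
    thresh = 10.0 ** (threshold_db / 20.0)            # dB -> linear
    atk = np.exp(-1.0 / (attack_ms * 1e-3 * sr))      # per-sample smoothing coefficients
    rel = np.exp(-1.0 / (release_ms * 1e-3 * sr))
    mag = np.abs(x)
    # desired gain: 1 below threshold, thresh/|x| above it
    target = np.where(mag > thresh, thresh / np.maximum(mag, 1e-12), 1.0)
    gain = np.empty(len(x), dtype=np.float64)
    g = 1.0
    for n in range(len(x)):
        coeff = atk if target[n] < g else rel         # attack while reducing gain
        g = coeff * g + (1.0 - coeff) * target[n]
        gain[n] = g
    return x * gain

Real limiters usually add a short lookahead delay so the gain can come down before a peak arrives; without it, the attack phase still lets a little of each transient through.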

Related

Extracting features from audio signal

I have just started to work on data in the form of audio. I am using librosa as a tool. My project requires me to extract features like:
Total duration of the audio
Minimum Intensity of the audio signal
Maximum Intensity of the audio signal
Mean Intensity of the audio signal
Jitter
Rate of speaking
Number of Pauses
Maximum Duration of Pauses
Average Duration of Pauses
Total Duration of Pauses
Although I know what these terms mean, I have no idea how to extract them from an audio file. Are they built into librosa.feature in some form, or do I need to calculate them manually? Can someone guide me on how to proceed?
I know that this job can be done with software like Praat, but I need to do it in Python.
Praat can be used for spectral analysis (spectrograms), pitch analysis, formant analysis, intensity analysis, jitter, shimmer, and voice breaks.
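For the duration, intensity and pause measures, a rough librosa/numpy sketch could be a starting point; the file name, the RMS-as-intensity definition and the top_db silence threshold below are my assumptions. Jitter and rate of speaking need a pitch/syllable tracker and are usually done with Praat via a wrapper such as parselmouth:

import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=None)          # hypothetical file name

# Total duration of the audio
duration = librosa.get_duration(y=y, sr=sr)

# Intensity: frame-wise RMS converted to dB (one common definition)
rms = librosa.feature.rms(y=y)[0]
intensity_db = 20 * np.log10(np.maximum(rms, 1e-10))
print(duration, intensity_db.min(), intensity_db.max(), intensity_db.mean())

# Pauses: gaps between the non-silent intervals found by librosa
intervals = librosa.effects.split(y, top_db=30)      # rows of [start, end] in samples
pauses = [(start2 - end1) / sr
          for (_, end1), (start2, _) in zip(intervals[:-1], intervals[1:])]
if pauses:
    print(len(pauses), max(pauses), np.mean(pauses), sum(pauses))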

Using Fourier Transform to Transform Image into Sound (I don't think it's working)

Backstory
I started messing with electronics and realized I need an oscilloscope, so I went to buy one online (for about $40) and watched tutorials on how to use them. I stumbled upon a video using the "X-Y" function of the oscilloscope to draw images, which I thought was cool. I tried searching how to do this from scratch and learned that you need to convert the image into the frequency domain, somehow convert that into an audio signal, and send that signal to the two oscilloscope channels from the left and right channels of the audio output. So now I am trying to do the image processing part.
What I Got So Far
Choosing an Image
The first thing I did was create an n-by-n image using some drawing software. I've read online that the total number of pixels of the image should be a power of two (I don't know why), so I created a 256x256 pixel image to keep the calculation time down. Here is the image I used for this example.
I kept the image simple so I can clearly see the symmetry when it is transformed; if there is no symmetry, then something must be wrong.
The MATLAB Code
The first thing I did was read the image, convert it to grayscale, change the data type, and grab the size of the image (so the rest of the code can adapt to different sizes later).
%Read image
img = imread('tets.jpg');
%Convert image to gray scale
grayImage = rgb2gray(img);
%Incompatibility of data type: uint8 vs double
grayImage = double(grayImage);
%Grab size of image
[nx, ny, nz] = size(grayImage);
The Algorithm
This is where things get a bit hazy. I am somewhat familiar with the Fourier Transform due to some Mechanical Engineering classes, but the topic was broadly introduced and never really fundamentally part of the course. It was more like, "Hey, check out this thing; but use the Laplace Transformation instead."
So somehow you have to incorporate spatial position, amplitude, frequency, and time when doing the calculation. I understand that the spatial coordinates are just the location of each pixel on the image in a matrix or bitmap. I also understand that the amplitude is just the grayscale value of a certain pixel, from 0 to 255. However, I don't know how to incorporate frequency and time based on the pixel itself. I think I read somewhere that the frequency increases as the y location of the pixel increases, and the time variable increases with the x location. Here's the link (read the first part of Part II).
So I tried following the formula as well as other formulas online and this is what I got for the MATLAB code.
if nx ~= ny
    error('Image size must be NxN.'); %for some reason
else
    %prepare transformation matrix
    DFT = zeros(nx,ny);
    %compute transformation for each pixel
    for ii = 1:1:nx
        for jj = 1:1:ny
            amplitude = grayImage(ii,jj);
            DFT(ii,jj) = amplitude * exp(-1i * 2 * pi * ((ii*ii/nx) + (jj*jj/ny)));
        end
    end
    %plot of complex numbers
    plot(DFT, '*');
    %calculate magnitude and phase
    magnitudeAverage = abs(DFT)/nx;
    phase = angle(DFT);
    %plot magnitudes and phase
    figure;
    plot(magnitudeAverage);
    figure;
    plot(phase);
end
This code simply tries to follow a discrete Fourier transform example video that I found on YouTube. After the calculation I plotted the complex numbers in the complex plane. This appears to be in polar coordinates; I don't know why. As mentioned in the video about the Nyquist limit, I plotted the average magnitude too, as well as the phase angles of the complex numbers. I'll just show you the plots!
The Plots
Complex Numbers
This is the complex plot; I believe it's in polar form instead of cartesian, but I don't know. It appears symmetric too.
Average Amplitude Vs. Sample
The vertical axis is amplitude, and the horizontal axis is the sample number. This looks like the deconstruction of the signal, but then again I don't really know what I am looking at.
Phase Angle Vs. Sample
The vertical axis is the phase angle, and the horizontal axis is the sample number. This looks the most promising because it looks like a plot in the frequency domain, but it isn't supposed to be a plot in the frequency domain; rather, it's a plot in the sample domain? Again, I don't know what I am looking at.
I Need Help Understanding
I need to somehow understand these plots, so I know I am getting the right plot. I believe there may be something very wrong in the algorithm because it doesn't necessarily implement the frequency and time component. So maybe you can tell me how that is done? Or at least guide me?
TLDR;
I am trying to convert images into sound files to display on an oscilloscope. I am stuck on the image processing part. I believe there is something wrong with the MATLAB code (see above) because it doesn't really include the frequency and time component of each pixel. I need help with the code and with interpreting the results, so I know the transformations are correct-ish.
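For comparison, a conventional 2D DFT of the same image computed in Python with numpy's fft2 is sketched below; the key difference from the loop above is that every output coefficient is a sum over all pixels, not a per-pixel product. The file name is the one from the question, and reading it via matplotlib is my own assumption:

import numpy as np
import matplotlib.pyplot as plt

img = plt.imread('tets.jpg')                  # same image as above
if img.ndim == 3:                             # collapse RGB to grayscale
    img = img.mean(axis=2)
img = img.astype(float)

# standard 2D DFT: each bin sums contributions from every pixel
F = np.fft.fftshift(np.fft.fft2(img))
magnitude = np.abs(F)
phase = np.angle(F)

plt.imshow(np.log1p(magnitude), cmap='gray')  # log scale makes the spectrum visible
plt.title('2D DFT magnitude (log scale)')
plt.show()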

Pitch detection results wrong

I am using freq_from_crossings from here (I haven't changed the code). My input is an audio file with an acoustic guitar E2 note and nothing else (as my microphone is pretty bad, the sound is not very clear).
This is the waveform:
And this is the spectrogram I am getting:
From the spectrogram it is pretty clear that the loudest harmonic corresponds to the E2 note. However, freq_from_crossings returns 415.461966359, which is not at all the pitch played (E2 is roughly 82 Hz). What could have gone wrong?
Thanks
A waveform that is not a single pure sine wave can cross zero more than once per pitch period: within one period it can include lots of "wiggles" that cross zero. The harmonic content in your guitar note's spectrogram shows that the total waveform is far from a single pure sine wave, and it is also changing over time.
Therefore, estimating pitch frequency from zero crossings won't work for these types of guitar sounds.
In my experience, zero crossings and autocorrelation are terrible ways to attempt pitch detection, even on a monophonic signal. Consider using a method that employs an FFT or DFT to acquire the initial frequency activity.
https://en.wikipedia.org/wiki/Transcription_(music)#Pitch_detection
https://github.com/CreativeDetectors/PitchScope_Player
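As a starting point for the FFT route, a minimal sketch (my own, not taken from the linked projects) that just picks the strongest bin might look like this; note that for a guitar note the strongest bin can be a harmonic rather than the E2 fundamental, so real trackers add harmonic-product or similar steps on top:

import numpy as np

def freq_from_fft(signal, fs):
    # Window the frame, take the magnitude spectrum and return the
    # frequency of the strongest bin (a crude single-pitch estimate).
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]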

Python: ultrasonic to audio range

I'm using Python 2.7.3 and I have a question relating to ultrasonic frequencies:
Sampling at 40 MHz, I measure an ultrasonic signal that's a convolution of a 1 MHz resonant frequency and an envelope, where the envelope depends on the medium through which the ultrasonic signal travels. I would like to listen to this received signal; my question is:
How may I map the received signal into the range of human hearing? Or put another way,
How may I down-sample and convert this signal to an audio frequency (keep the envelope shape and maybe even elongate the time so it’s longer).
The signal is simulated here, but it's typically like this in any case:
import numpy as np
import matplotlib.pylab as plt
# resonant frequency is 1MHz
f = 1e6
Omega = 2*np.pi*f
# sample at 40MHz or ts=25ns, for about 1000 samples:
t = np.arange(0,25e-6,25e-9)
y = np.sin(Omega*t) * (t**2) * np.exp(-t/3e-6)
y /= max(y)
plt.plot(y)
plt.grid()
plt.xlabel('sample')
plt.ylabel('value')
plt.show()
There are two common answers to your question:
Just play it at a fraction of the sampling frequency. If you play your signal back at, e.g., a 44.1 kHz sampling frequency, you will have an audible tone of approximately 1000 Hz and a signal length of roughly 20 ms. (I picked 44.1 kHz as it is one of the frequencies virtually any hardware can play back.) This is probably easiest to accomplish by saving your signal into a WAV file (see the wave module) and then playing it back with anything that plays WAV files; a short sketch of this option appears below.
The standard method would be to mix the resonant frequency down to audible frequencies. This is the fundamental operation in radios. Mathematically it involves multiplying by a carrier frequency close to the resonant frequency and then low-pass filtering the result; the operation can also be viewed as shifting the frequency spectrum closer to 0. However, as your signal envelope is very fast (the whole burst lasts only about 25 µs), this would only result in a short click and thus isn't useful here.
Other solutions can be figured out if there are further requirements. The envelope frequency and the resonant frequency seem to be relatively close to each other, which limits the options. If you need to do this for a real-time signal, the challenge will be elongating the envelope, because the envelope then has to be detected; otherwise it is not possible to stretch the time.
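A minimal sketch of the first option, continuing from the simulation code above; using scipy.io.wavfile instead of the wave module is my own shortcut, and the file name is made up:

import numpy as np
from scipy.io import wavfile

fs_play = 44100                                   # playback rate from the answer
pcm = np.int16(np.clip(y, -1.0, 1.0) * 32767)     # y is the simulated signal above
wavfile.write('ultrasound_slow.wav', fs_play, pcm)
# 1000 samples played back at 44.1 kHz last ~23 ms, and the 1 MHz tone
# comes out at roughly 1.1 kHz, as described above.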
I wanted to make this a comment, but I have some examples.
There would be many ways to represent this. You could use sound as an encoding medium.
If your original waveform has only a few properties, like a constant frequency and a variable (or approximable) envelope, you could, for example, encode the frequency in binary form as a short sequence of sound and silence (1 = generate sound, 0 = generate silence), and then represent the amplitude with a constant tone of variable frequency (e.g., a 100 Hz tone for zero amplitude and a 10,000 Hz tone for maximum amplitude). To rebuild the original envelope, you could use interpolation.
I hope you see my point.
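A rough sketch of the amplitude-to-frequency idea, assuming the simulated y from the question, a one-second stretched output and the 100 Hz to 10 kHz mapping mentioned above (|y| is used as a crude stand-in for a proper envelope):

import numpy as np

fs_audio = 44100
# stretch the (rectified) envelope to one second of audio
env = np.interp(np.linspace(0, 1, fs_audio),
                np.linspace(0, 1, len(y)), np.abs(y))

# map amplitude 0..1 to an audible frequency between 100 Hz and 10 kHz
f_inst = 100 + env * (10000 - 100)
phase = 2 * np.pi * np.cumsum(f_inst) / fs_audio   # integrate instantaneous frequency
tone = np.sin(phase)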

Detecting digital clipping in audio signal

Given an audio byte array data in Python, obtained like so:
inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK, card)
# Set attributes: Mono, 48000 Hz, 16 bit little endian samples
inp.setchannels(1)
inp.setrate(48000)
inp.setformat(alsaaudio.PCM_FORMAT_S16_LE)
l, data = inp.read()
How do I detect digital clipping? Which value does data have to exceed for me to be sure that it was digitally clipped?
Overdrive is basically gain distortion: it raises the voltage to the point that the driver just cuts the top off and thus distorts the signal. The digital equivalent is hard clipping, so you would need to search for values that reach the maximum threshold. With 16-bit audio the clip is going to sit at 0 dB by nature, because if there are no more bits left to store the value, the software simply caps it at the maximum a 16-bit integer can hold.
Unfortunately, if the track had previously been distorted and then had its volume lowered to blend into the mix better, you're probably not going to find it. Unless, that is, what you're examining is the only sound source on the track, in which case just find the maximum for the track and set that as your threshold.
I can say, though, that hard clipping shows up as a flat top (like a square wave), so you could search for consecutive identical values lasting longer than the period of a common audible wave (so as to ignore legitimate square-wave tones). That's about the best I can do for you.
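A small numpy sketch of both ideas from this answer (a full-scale threshold plus a run of consecutive pinned samples); the run length of 3 is an arbitrary choice of mine:

import numpy as np

# data is the raw byte string from inp.read(): signed 16-bit little-endian mono
samples = np.frombuffer(data, dtype='<i2').astype(np.int32)

FULL_SCALE = 32767                        # int16 range is -32768..32767
pinned = np.abs(samples) >= FULL_SCALE    # samples sitting at full scale

# require a short run of consecutive pinned samples so isolated peaks are ignored
run = 3
hits = np.convolve(pinned.astype(int), np.ones(run, dtype=int), mode='valid') == run
print('clipped runs found:', np.count_nonzero(hits))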
