I'm using Python 2.7.3 and I have a question relating to ultrasonic frequencies:
Sampling at 40 MHz, I measure an ultrasonic signal that is a 1 MHz resonance modulated by an envelope; the shape of the envelope depends on the medium through which the ultrasonic signal travels. I would like to listen to this received signal. My question is:
How may I map the received signal into the range of human hearing? Or put another way,
How may I down-sample and convert this signal to an audio frequency, keeping the envelope shape and maybe even stretching it in time so that it lasts longer?
Here is a simulated signal, but the real one typically looks like this in any case:
import numpy as np
import matplotlib.pylab as plt
# resonant frequency is 1MHz
f = 1e6
Omega = 2*np.pi*f
# sample at 40 MHz (ts = 25 ns), for about 1000 samples:
t = np.arange(0,25e-6,25e-9)
y = np.sin(Omega*t) * (t**2) * np.exp(-t/3e-6)
y /= max(y)
plt.plot(y)
plt.grid()
plt.xlabel('sample')
plt.ylabel('value')
plt.show()
There are two common answers to your question:
Just play it at a fraction of the sampling frequency. If you play your signal back at a sampling frequency of, e.g., 44.1 kHz, you will get an audible tone of approximately 1.1 kHz (1 MHz × 44.1 kHz / 40 MHz) and a signal length of roughly 23 ms (1000 samples / 44.1 kHz). (I picked 44.1 kHz because it is one of the rates that practically any hardware can play back.) This is probably easiest to accomplish by saving your signal to a WAV file (see the wave module), which you can then play back with anything that plays WAV files.
The standard method would be to mix the resonant frequency down to audible frequencies. This is the fundamental operation in radios. Mathematically, it involves multiplying the signal by a carrier whose frequency is close to the resonant frequency and then low-pass filtering the result. The operation can also be viewed as shifting the frequency spectrum closer to 0. However, as your signal envelope is very short (about 25 µs in total), this would only result in a short click and would thus not be useful here.
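As a rough sketch of the first option (slow playback), the snippet below writes the simulated signal to a 16-bit mono WAV file with the standard-library wave module. The 44.1 kHz playback rate follows the answer; the file name, the use of the temporary directory and the 16-bit scaling are assumptions made for illustration, and the time axis is built with np.arange(1000) so that the sample count is exactly 1000.

```python
import os
import tempfile
import wave

import numpy as np

fs_in = 40e6                      # original sampling rate from the question
t = np.arange(1000) / fs_in       # 1000 samples, 25 ns apart
y = np.sin(2 * np.pi * 1e6 * t) * t**2 * np.exp(-t / 3e-6)
y /= np.max(np.abs(y))            # normalise to the range -1..1

playback_rate = 44100             # Hz, playback rate suggested in the answer
pcm = np.round(y * 32767).astype('<i2')   # 16-bit little-endian samples

# Hypothetical output location, chosen only for this example.
wav_path = os.path.join(tempfile.gettempdir(), "slow_playback.wav")
out = wave.open(wav_path, "wb")
out.setnchannels(1)               # mono
out.setsampwidth(2)               # 2 bytes per sample
out.setframerate(playback_rate)
out.writeframes(pcm.tobytes())
out.close()

# Pitch and length the listener should expect at this playback rate.
tone_hz = 1e6 * playback_rate / fs_in
duration_s = len(pcm) / float(playback_rate)
```

A lower playback rate stretches the signal further but also lowers the pitch by the same factor, so the two cannot be chosen independently with this method.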
Other solutions can be figured out, if there are further requirements. The envelope frequency and the resonant frequency seem to be relatively close to each other, which limits the options. If you need to do this for a real time signal, then the challenge will be elongating the envelope, because then the envelope has to be detected. Otherwise it is not possible to stretch the time.
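For an offline signal, one possible way to detect and stretch the envelope is sketched below. It is not the answer's own method: it assumes the envelope can be estimated as the magnitude of the analytic signal (scipy.signal.hilbert), stretches it by linear interpolation, and puts it on an audible carrier. The 1 kHz carrier, the 1 s output length and the 44.1 kHz output rate are arbitrary illustrative choices.

```python
import numpy as np
from scipy.signal import hilbert

fs_in = 40e6
t = np.arange(1000) / fs_in
y = np.sin(2 * np.pi * 1e6 * t) * t**2 * np.exp(-t / 3e-6)
y /= np.max(np.abs(y))

# Envelope estimate: magnitude of the analytic signal.
envelope = np.abs(hilbert(y))

fs_out = 44100        # assumed audio sampling rate
duration = 1.0        # assumed stretched length in seconds
carrier_hz = 1000.0   # assumed audible carrier frequency

n_out = int(fs_out * duration)
t_out = np.arange(n_out) / float(fs_out)

# Stretch the envelope to n_out samples by linear interpolation.
positions = np.linspace(0, len(envelope) - 1, n_out)
stretched = np.interp(positions, np.arange(len(envelope)), envelope)

# Put the stretched envelope on the audible carrier.
audio = stretched * np.sin(2 * np.pi * carrier_hz * t_out)
audio /= np.max(np.abs(audio))
```

The resulting array can be written to a WAV file in the same way as any other audio signal. This keeps the envelope shape but discards the fine structure of the original 1 MHz oscillation.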
I wanted to make this a comment, but I have some examples.
There would be many ways to represent this. You could use sound as an encoding medium.
If your original waveform has only a few properties, such as a frequency (constant) and an envelope (variable, but it can be approximated), you can encode them separately. For example, you could encode the frequency in binary form as a short sequence of sounds and silences (1 = generate sound, 0 = generate silence). You could then represent the amplitude with a continuous sound of variable frequency (e.g. a 100 Hz sound would represent zero amplitude and a 10000 Hz sound would represent maximum amplitude). To rebuild the original envelope, you could use interpolation.
I hope you see my point.
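A minimal sketch of the amplitude-to-pitch part of this idea follows. The function name amplitude_to_tone, its parameters and the 2 s output length are hypothetical; only the 100 Hz to 10000 Hz mapping comes from the text above. The envelope is assumed to be non-negative and not all zero.

```python
import numpy as np

def amplitude_to_tone(envelope, fs_out=44100, duration=2.0,
                      f_min=100.0, f_max=10000.0):
    """Map a non-negative envelope to a tone whose pitch follows the amplitude."""
    env = np.asarray(envelope, dtype=float)
    env = env / np.max(env)                     # normalise to 0..1
    n_out = int(fs_out * duration)
    # Slow the envelope down to the requested duration.
    positions = np.linspace(0, len(env) - 1, n_out)
    env_slow = np.interp(positions, np.arange(len(env)), env)
    # Zero amplitude -> f_min, maximum amplitude -> f_max.
    freq = f_min + (f_max - f_min) * env_slow
    # Integrate the instantaneous frequency to get a continuous phase.
    phase = 2 * np.pi * np.cumsum(freq) / fs_out
    return np.sin(phase), freq

# Example with the envelope of the simulated signal from the question.
t_env = np.arange(1000) * 25e-9
env = t_env**2 * np.exp(-t_env / 3e-6)
tone, freq = amplitude_to_tone(env)
```

Accumulating the phase, rather than evaluating sin(2*pi*freq*t) directly, avoids pitch glitches when the frequency changes from sample to sample.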
Related
I'd like to recreate it in numpy or other python library.
I mean a function that does not simply clip all the samples above the threshold level or normalize the whole audio, but one that takes an audio waveform in the range (-1, 1), an attack time, a decay time and a threshold level in dB, reduces the volume of samples above the threshold without distortion, and outputs a new sound.
All the solutions I've found so far either add distortion, like ffmpeg, or don't use 64-bit floating-point calculations, like SoX.
I tried to reproduce Watson's spectrum plot from this set of slides (PDF p. 30, p. 29 of the slides), which is based on this data on housing building permits.
Watson achieves a very smooth spectrum curve in which it is very easy to tell the peak frequencies.
When I run an FFT on the data, I get a really noisy spectrum curve, and I wonder if there is an intermediate step that I am missing.
I ran the Fourier analysis in Python, using the scipy package fftpack, as follows:
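import numpy as np
import matplotlib.pyplot as plt
from scipy import fftpack
fs = 12  # monthly data: 12 samples per year
N = data.shape[0]  # data is the DataFrame holding the permits series
spectrum = fftpack.fft(data.PERMITNSA.values)
freqs = fftpack.fftfreq(N, d=1.0 / fs)  # frequency axis in cycles per year
plt.plot(freqs[:N//2], 20 * np.log10(np.abs(spectrum[:N//2])))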
Could anyone help me with the missing link?
The original data is:
Below is the Watson's spectrum curve, the one I tried to reproduce:
And these are my results:
The posted curve doesn't look realistic. But there are many methods to get a smooth result with a similar amount of "curviness", using various kinds of resampling and/or plot interpolation.
One method I like is to chop the data into segments (windows, possibly overlapped) roughly 4X longer than the maximum number of "bumps" you want to see, maybe a bit longer. Then window each segment before using a much longer (size of about the resolution of the final plot you want) zero-padded FFT. Then average the results of the multiple FFTs of the multiple windowed segments. This works because a zero-padded FFT is (almost) equivalent to a highest-quality Sinc interpolating low-pass filter.
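The segment-window-pad-average procedure described above is essentially Welch's method, which scipy.signal.welch implements. The sketch below is only an illustration under stated assumptions: the permits data is not reproduced here, so a synthetic monthly series with a 12-month cycle plus noise stands in for it, and the segment length (120 months), overlap (60 months) and padded FFT size (4096) are arbitrary choices.

```python
import numpy as np
from scipy import signal

# Synthetic stand-in for a monthly series: a yearly cycle plus noise.
rng = np.random.default_rng(0)
n_months = 600
months = np.arange(n_months)
series = 10 * np.sin(2 * np.pi * months / 12) + rng.normal(size=n_months)

fs = 12.0  # samples per year, so frequencies come out in cycles per year

# Overlapping Hann-windowed segments, each zero-padded to nfft points,
# with the resulting periodograms averaged.
freqs, psd = signal.welch(series, fs=fs, window='hann',
                          nperseg=120, noverlap=60, nfft=4096,
                          detrend='linear')

peak_freq = freqs[np.argmax(psd)]  # expected to lie near 1 cycle per year
```

Shorter segments give a smoother curve with fewer bumps, at the cost of frequency resolution; the zero-padding only makes the plotted curve denser and does not add resolution.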
I am just getting started with some code to pre-process audio data so that I can later feed a neural network with it. Before explaining my actual problem in more detail, I should mention that I took the reference for how to do the project from this site. I also used some code taken from this post and read the signal.spectrogram doc and this post for more information.
So far, with all of the sources mentioned above, I have managed to read the wav audio file as a numpy array and plot both its amplitude and its spectrogram. These represent a recording of me saying the word "command" in Spanish.
The strange thing is that I searched the internet and found that the human voice spectrum lies between 80 Hz and 8 kHz, so just to be sure I compared this output with the spectrogram that Audacity returned. As you can see, that one seems more consistent with the information I found, as the frequency range is the one expected for human speech.
So that takes me to my final question: am I doing something wrong when reading the audio or generating the spectrogram, or do I have a plotting issue?
By the way I'm new to both python and signal processing so thx in advance for your patience.
Here is the code I'm actually using:
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

def espectrograma(wav):
    sample_rate, samples = wavfile.read(wav)
    frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate, nperseg=320, noverlap=16, scaling='density')
    #dBS = 10 * np.log10(spectrogram) # convert to dB
    plt.subplot(2,1,1)
    plt.plot(samples[0:3100])
    plt.subplot(2,1,2)
    plt.pcolormesh(times, frequencies, spectrogram)
    plt.imshow(spectrogram,aspect='auto',origin='lower',cmap='rainbow')
    plt.ylim(0,30)
    plt.ylabel('Frecuencia [kHz]')
    plt.xlabel('Fragmento[20ms]')
    plt.colorbar()
    plt.show()
The computation of the spectrogram seems fine to me. If you plot the spectrogram on a log scale, you should see something more similar to the Audacity plots you referenced. So uncomment your line
#dBS = 10 * np.log10(spectrogram) # convert to dB
and then use the variable dBS for the plotting instead of spectrogram in
plt.pcolormesh(times, frequencies, spectrogram)
plt.imshow(spectrogram,aspect='auto',origin='lower',cmap='rainbow')
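A minimal sketch of the dB conversion and plot follows. It assumes a synthetic 440 Hz tone sampled at 16 kHz in place of the recording; the small offset inside the logarithm and the helper name plot_spectrogram are illustrative additions. As an aside that the answers do not state: calling plt.imshow after plt.pcolormesh draws a second image whose axes are in bin indices rather than Hz and seconds, so plt.ylim(0,30) may be selecting the first 30 frequency bins instead of 0 to 30 kHz. The sketch therefore uses pcolormesh alone.

```python
import numpy as np
from scipy import signal

sample_rate = 16000                              # assumed rate for the example
t = np.arange(sample_rate) / float(sample_rate)  # 1 second
samples = np.sin(2 * np.pi * 440 * t)            # synthetic stand-in signal

frequencies, times, spec = signal.spectrogram(
    samples, sample_rate, nperseg=320, noverlap=16, scaling='density')

# Convert power to dB; the tiny offset avoids taking the log of zero.
spec_db = 10 * np.log10(spec + 1e-12)

# Strongest frequency bin, averaged over time.
peak_hz = frequencies[np.argmax(spec_db.mean(axis=1))]

def plot_spectrogram(times, frequencies, spec_db):
    """Plot with real time and frequency axes (seconds and kHz)."""
    import matplotlib.pyplot as plt
    plt.pcolormesh(times, frequencies / 1000.0, spec_db, shading='auto')
    plt.ylabel('Frequency [kHz]')
    plt.xlabel('Time [s]')
    plt.colorbar(label='Power [dB]')
    plt.show()
```

Calling plot_spectrogram(times, frequencies, spec_db) shows the result. The last entry of frequencies equals sample_rate / 2, which is the highest frequency the spectrogram can represent.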
The spectrogram uses a Fourier transform to convert your time-series data into the frequency domain.
The maximum frequency that can be measured is (sampling frequency) / 2, so judging by the 30 kHz limit on your plot, it may seem that your sampling frequency is 60 kHz?
Anyway, regarding your question: it may be correct that the human voice spectrum lies within this range, but the Fourier transform is never perfect. I would simply adjust your y-axis to look specifically at these frequencies.
It seems to me that you are calculating your spectrogram correctly, at least as long as you are reading sample_rate and samples correctly.
I am using freq_from_crossings from here (I haven't changed the code). My input is an audio file with an acoustic guitar E2 note and nothing else (as my microphone is pretty bad, the sound is not very clear).
This is the waveform:
And this is the spectrogram I am getting:
From the spectrogram it is pretty clear that the loudest harmonic corresponds to the E2 note (about 82.4 Hz). However, freq_from_crossings returns 415.461966359, which is not at all the pitch played. What could have gone wrong?
Thanks
A waveform that is not a single pure sinewave can have more zero crossings than once per pitch period. Within one period, it can include lots of "wiggles" that cross zero. The harmonic content of your guitar note spectrogram shows that the total waveform is far from being a single pure sinewave. It's also changing over time.
Therefore, estimating pitch frequency from zero crossings won't work for these types of guitar sounds.
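The effect can be illustrated with a synthetic waveform. The sketch below is not the freq_from_crossings code from the question; it assumes a test tone made of an E2 fundamental (82.41 Hz) plus a strong fifth harmonic, counts rising zero crossings, and compares that with the strongest peak of a zero-padded FFT. The harmonic amplitude of 0.9 and the FFT size are arbitrary choices.

```python
import numpy as np

fs = 44100
f0 = 82.41                                   # E2 fundamental in Hz
t = np.arange(fs) / float(fs)                # 1 second of signal
# Fundamental plus a strong 5th harmonic: not a pure sine wave.
x = np.sin(2 * np.pi * f0 * t) + 0.9 * np.sin(2 * np.pi * 5 * f0 * t)

# Zero-crossing estimate: rising crossings per second.
rising = np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0))
zc_estimate = len(rising) / (len(x) / float(fs))

# FFT estimate: strongest peak of a Hann-windowed, zero-padded spectrum.
nfft = 2 ** 18
spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x)), n=nfft))
freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
fft_estimate = freqs[np.argmax(spectrum)]
```

With this waveform the harmonic adds extra zero crossings inside every period, so zc_estimate comes out several times higher than f0, while fft_estimate stays close to it. Picking the strongest FFT peak only works when the fundamental is the loudest partial, as the question reports for this recording.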
In my experience, zero crossings and autocorrelation are terrible ways to attempt pitch detection, even on a monophonic signal. Consider using a method that uses an FFT (DFT) to get an initial picture of the frequency content.
https://en.wikipedia.org/wiki/Transcription_(music)#Pitch_detection
https://github.com/CreativeDetectors/PitchScope_Player
Given an audio byte array data in python like so
import alsaaudio
# card is the name of the ALSA capture device, defined elsewhere
inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK, card)
# Set attributes: Mono, 48000 Hz, 16 bit little endian samples
inp.setchannels(1)
inp.setrate(48000)
inp.setformat(alsaaudio.PCM_FORMAT_S16_LE)
l, data = inp.read()
How do I detect digital clipping, which value does data have to exceed to be sure that it was digitally clipped?
Overdrive is basically gain distortion: the gain raises the voltage to the point where the driver cuts the top off the waveform, which distorts the signal. The digital equivalent is hard clipping, so you need to search for samples that sit at the maximum value. For 16-bit audio the clip level is 0 dBFS by definition (32767 or -32768 for signed samples): once there are no more bits to store a larger value, the software chops the sample off at the largest value a 16-bit integer can hold.
Unfortunately, if the track was distorted earlier and then had its volume lowered so as to blend into the mix better, you are probably not going to find it this way. The exception is when what you are examining is the only sound source on the track, in which case you can just find the maximum for the track and use that as your threshold.
I can say, though, that hard clipping shows up as flat tops that make the waveform look like a square wave, so you could search for consecutive identical values lasting longer than a common audible wave would stay constant (so as to ignore legitimate square-wave tones). That's about the best I can do for you, though.
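A minimal sketch of this search for 16-bit little-endian data is shown below. The function name find_clipping and the min_run parameter are hypothetical; the sketch assumes data is the bytes object returned by inp.read() in the S16_LE format set in the question, and it treats a run of at least min_run consecutive samples at either end of the 16-bit range as clipping.

```python
import numpy as np

def find_clipping(data, min_run=3):
    """Return (start_index, length) for runs of full-scale S16_LE samples."""
    samples = np.frombuffer(data, dtype='<i2')
    at_limit = (samples == 32767) | (samples == -32768)
    # Pad with False so that every run has a start edge and an end edge.
    padded = np.concatenate(([False], at_limit, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2]
    return [(int(s), int(e - s)) for s, e in zip(starts, ends)
            if e - s >= min_run]

# Small hand-made example: one run of three full-scale samples,
# and a run of two that is too short to count with min_run=3.
example = np.array([0, 32767, 32767, 32767, 0, -32768, -32768, 5],
                   dtype='<i2')
runs = find_clipping(example.tobytes())   # -> [(1, 3)]
```

A single sample at full scale can occur in an unclipped recording, which is why the sketch requires a short run. As noted above, it will not find clipping that happened before the level was reduced.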