How would I go about using Python to read the frequency peaks from a WAV PCM file and then generate an image of it, for spectrogram analysis?
I'm trying to make a program that lets you read any audio file, convert it to WAV PCM, and then find the peaks and frequency cutoffs.
Python's wave library will let you import the audio. After that, you can use numpy to take an FFT of the audio.
Then, matplotlib makes very nice charts and graphs - absolutely comparable to MATLAB.
It's old as dirt, but this article would probably get you started on almost exactly the problem you're describing (article in Python of course).
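A minimal sketch of that pipeline with just wave and numpy (the 440 Hz test tone is synthesized here only so the example is self-contained; in practice you'd open your own file):

```python
import wave
import numpy as np

# Write a 1-second, 440 Hz sine as a 16-bit mono WAV file,
# standing in for whatever file you actually want to analyse.
rate = 8000
t = np.arange(rate) / rate
samples = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
with wave.open("test.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(rate)
    wf.writeframes(samples.tobytes())

# Read it back with the wave module and take an FFT with numpy.
with wave.open("test.wav", "rb") as wf:
    n = wf.getnframes()
    data = np.frombuffer(wf.readframes(n), dtype=np.int16)

spectrum = np.abs(np.fft.rfft(data))
freqs = np.fft.rfftfreq(len(data), d=1.0 / rate)
peak_hz = freqs[np.argmax(spectrum)]
print(peak_hz)  # close to 440.0
```

From here, finding multiple peaks is a matter of looking for local maxima in `spectrum` above some threshold.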
Loading WAV files is easy using audiolab:
from audiolab import wavread
signal, fs, enc = wavread('test.wav')
or for reading any general audio format and converting to WAV:
from audiolab import Sndfile
sound_file = Sndfile('test.w64', 'r')
signal = sound_file.read_frames(sound_file.nframes)
The spectrogram is built into PyLab:
from pylab import *
specgram(signal)
Specifically, specgram is part of matplotlib, and the two lines above are the easiest way to get a spectrogram. Also quite handy in this context: subplot.
But be warned: matplotlib is very slow, though it creates beautiful images. You should not use it for demanding animation, even less when you are dealing with 3D.
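Here is a small self-contained sketch using the matplotlib interface directly rather than pylab, on a synthetic two-tone signal (the Agg backend is only set so it runs headless):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this if you have a display
import matplotlib.pyplot as plt

rate = 8000
t = np.arange(2 * rate) / rate
# Test signal: 440 Hz for one second, then 880 Hz for one second.
signal = np.concatenate([np.sin(2 * np.pi * 440 * t[:rate]),
                         np.sin(2 * np.pi * 880 * t[:rate])])

fig, (ax1, ax2) = plt.subplots(2, 1)   # same idea as pylab's subplot
ax1.plot(t, signal)                    # waveform on top
spectrum, freqs, times, im = ax2.specgram(signal, Fs=rate)
fig.savefig("spectrogram.png")
```

The returned `spectrum` is the 2-D array of power values, so you can post-process it as well as look at the image.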
If you need to convert from PCM format to integers, you'll want to use struct.unpack.
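For example, for 16-bit signed little-endian PCM (the sample values below are arbitrary illustrations):

```python
import struct
import numpy as np

# Four 16-bit little-endian PCM samples as a raw bytestring.
raw = struct.pack("<4h", 0, 1000, -1000, 32767)

# struct.unpack turns the bytestring into a tuple of Python ints...
samples = struct.unpack("<4h", raw)
print(samples)  # (0, 1000, -1000, 32767)

# ...though for long signals np.frombuffer does the same job faster.
same = np.frombuffer(raw, dtype="<i2")
```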
Related
I am trying to load a .wav file in Python using librosa library. Let's assume my code is as simple as:
import librosa
import numpy as np
pcm_data, spl_rate = librosa.core.load(resource_file, sr=None)
In general it does work, however I am experiencing strange quantization problems when reading audio files with amplitude of less than 1e-5. I need some really low amplitude noise samples for my project (VERY little ambient noise, yet not complete silence).
For instance, when I generate white noise of amplitude 0.00001 in Audacity, its waveform is visible in Audacity preview when fully magnified. It is also visible after exporting the waveform as 32bit float and re-importing it to empty Audacity project. However, when I read that file using code presented above, np.max(np.abs(pcm_data)) is 0.0. Did I just reach limits of Python in this matter? How do I read my data (without pre-scaling and rescaling in runtime)?
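One plausible culprit (an assumption, not something confirmed above) is that a decoding backend hands over 16-bit integer PCM instead of the file's 32-bit floats; the int16 quantization step is 1/32768 ≈ 3e-5, which swallows a 1e-5 signal entirely, while float32 itself has no trouble at that amplitude:

```python
import numpy as np

# A sine with amplitude 1e-5, like the Audacity-generated noise floor.
x = 1e-5 * np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)

# float32 is not the limit: it resolves values far smaller than 1e-5.
x32 = x.astype(np.float32)
print(np.max(np.abs(x32)))  # nonzero, ~1e-5

# But if the signal passes through 16-bit integer PCM, everything below
# one quantization step (1/32768 ~ 3.05e-5) truncates to zero:
as_int16 = (x * 32768).astype(np.int16)
print(np.max(np.abs(as_int16 / 32768.0)))  # 0.0
```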
So here's the idea: you can generate a spectrogram from an audio file using the short-time Fourier transform (stft). Then some people have generated something called a "binary mask" to generate different audio (i.e. with background noise removed etc.) from the inverse stft.
Here's what I understand:
stft is a simple transform that is applied to the audio signal, which generates information that can easily be displayed as a spectrogram.
By multiplying the stft matrix elementwise by a binary mask of the same size, and then taking the inverse stft of the result, you can create a new signal containing only the masked sound.
Once I do the matrix multiplication, how is the new audio file created?
It's not much but here's what I've got in terms of code:
from librosa import load
from librosa.core import stft, istft
y, sample_rate = load('1.wav')
spectrum = stft(y)
back_y = istft(spectrum)
Thank you, and here are some slides that got me this far. I'd appreciate it if you could give me an example/demo in Python.
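A sketch of the full loop, using scipy.signal's stft/istft as a stand-in for librosa's (an assumption: same technique, slightly different API) and a synthetic tone-plus-noise signal: the mask multiplies the stft matrix elementwise, the inverse stft turns the masked matrix back into samples, and writing those samples out is what creates the new audio file.

```python
import numpy as np
from scipy.signal import stft, istft
from scipy.io import wavfile

# Synthetic mix: a 440 Hz tone buried in white noise.
rate = 8000
t = np.arange(2 * rate) / rate
y = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(t.size)

freqs, times, spectrum = stft(y, fs=rate)

# Binary mask: 1 for frequency bins near 440 Hz, 0 everywhere else.
mask = (np.abs(freqs - 440.0) < 50.0)[:, None].astype(float)
masked = spectrum * mask          # elementwise, same shape as spectrum

# Inverse stft of the masked matrix gives the new time-domain signal...
_, y_masked = istft(masked, fs=rate)

# ...and writing it to disk creates the new audio file.
out = (y_masked * 32767 / np.max(np.abs(y_masked))).astype(np.int16)
wavfile.write("masked.wav", rate, out)
```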
I have an mp3 file and I want to basically plot the amplitude spectrum present in that audio sample.
I know that we can do this very easily if we have a wav file. There are a lot of Python packages available for handling the wav format. However, I do not want to convert the file into wav format, store it, and then use it.
What I am trying to achieve is to get the amplitude of an mp3 file directly, and even if I have to convert it into wav format, the script should do it on the fly at runtime without actually storing the file in the database.
I know we can convert the file like follows:
from pydub import AudioSegment
sound = AudioSegment.from_mp3("test.mp3")
sound.export("temp.wav", format="wav")
and it creates temp.wav as it's supposed to, but can we just use the content without storing the actual file?
MP3 is an encoded waveform (plus tags and other metadata). All you need to do is decode it using an MP3 decoder, which will give you the whole audio data you need for further processing.
How to decode mp3? I am shocked there are so few available tools for Python. Although I found a good one in this question. It's called pydub and I hope I can use a sample snippet from author (I updated it with more info from wiki):
from pydub import AudioSegment
sound = AudioSegment.from_mp3("test.mp3")
# get raw audio data as a bytestring
raw_data = sound.raw_data
# get the frame rate
sample_rate = sound.frame_rate
# get amount of bytes contained in one sample
sample_size = sound.sample_width
# get channels
channels = sound.channels
Note that raw_data is 'on air' at this point ;). Now it's up to you how you want to use the gathered data, but this module seems to give you everything you need.
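For instance, turning raw_data into a numpy array and getting the amplitude spectrum the question asks about might look like this (synthetic bytes stand in for an actual decode, and the dtype assumes sample_width == 2, i.e. 16-bit samples):

```python
import numpy as np

# Stand-in for sound.raw_data: interleaved 16-bit stereo samples.
# pydub's raw_data is exactly such a bytestring.
sample_rate, sample_width, channels = 44100, 2, 2
raw_data = np.array([0, 0, 1000, -1000, 32767, -32768], dtype="<i2").tobytes()

# Bytes -> int array; a sample_width of 2 means dtype int16.
samples = np.frombuffer(raw_data, dtype="<i2")

# De-interleave into (n_frames, channels) and scale to floats in [-1, 1].
frames = samples.reshape(-1, channels) / 32768.0

# From here the amplitude spectrum of one channel is a single FFT away.
spectrum = np.abs(np.fft.rfft(frames[:, 0]))
```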
I am trying to read a raw image in .CR2 format ("Canon Raw format"). I wanted to do it with opencv initially but could not get it to work so I tried doing it with a numpy function:
img = np.fromfile('IMG.CR2', dtype=np.uint16)
The camera is a canon EOS t5 18MP DSLR.
If I run img.size it returns 10105415, which seems too small for an 18 MP camera.
My first question, is using np.fromfile() a valid approach?
Secondly, would you recommend any other python libraries to do the same process in an easier way/more efficient? I have openCV installed so if it could be done there, that would be great (I still want to store it as a numpy array).
Canon RAW format is not just a blob of data, it has some metadata which you need to parse. Luckily, others have already implemented some python parsers.
RAW Image processing in Python
After using one of the suggested solutions you can load the data into numpy array.
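To make the distinction concrete: np.fromfile itself is a valid approach for headerless raw blobs, but a .CR2 is a TIFF-like container whose sensor data is losslessly compressed, so reading it byte-for-byte gives you container bytes, not pixels (hence the "too small" size). A sketch, with a synthetic raw file standing in for a real sensor dump:

```python
import numpy as np

# np.fromfile is fine for *headerless* raw data: round-trip a
# synthetic uint16 "sensor dump" to show it.
expected = np.arange(1000, dtype=np.uint16)
expected.tofile("fake_raw.bin")
img = np.fromfile("fake_raw.bin", dtype=np.uint16)

# A real .CR2 needs a parser that understands the container and
# decompresses the sensor data. One such parser is rawpy (not run
# here, since it needs an actual .CR2 file):
#   import rawpy
#   rgb = rawpy.imread("IMG.CR2").postprocess()  # numpy array
```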
I am trying to implement a simple delay-sum beamformer using a 4 Microphone Array. I am using MATLAB at the moment, which has an inbuilt Signal Processing toolkit that is quite helpful. I was wondering if there are such tools in Python. For starters, i want to know how to get an audio signal from a microphone in real time and have a continuous plot as a preliminary output.
Using PyAudio, you can get the audio signal from the mic in real time.
http://people.csail.mit.edu/hubert/pyaudio/
To plot it, you can use matplotlib or Chaco. Chaco has an example that uses PyAudio and plots the spectrum of the audio signal:
https://github.com/enthought/chaco/blob/master/examples/demo/advanced/spectrum.py
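A minimal sketch of the capture-and-analyse loop (the PyAudio calls follow the library's standard API but need a microphone, so the capture function is defined without being run; the spectrum helper is exercised on a synthetic 1 kHz chunk instead):

```python
import numpy as np

def spectrum(chunk_bytes, rate=44100):
    """Amplitude spectrum of one chunk of 16-bit mono samples."""
    samples = np.frombuffer(chunk_bytes, dtype=np.int16) / 32768.0
    mags = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(samples.size, d=1.0 / rate)
    return freqs, mags

def stream_mic(chunk=1024, rate=44100):
    # Requires a microphone, so this function is not called here.
    import pyaudio
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=rate,
                     input=True, frames_per_buffer=chunk)
    try:
        while True:
            freqs, mags = spectrum(stream.read(chunk), rate)
            # feed freqs/mags into a matplotlib or Chaco plot here
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()

# Offline check with a synthetic 1 kHz tone in place of mic input.
rate = 44100
t = np.arange(1024) / rate
fake = (0.5 * np.sin(2 * np.pi * 1000 * t) * 32767).astype(np.int16).tobytes()
freqs, mags = spectrum(fake, rate)
print(freqs[np.argmax(mags)])  # near 1000 Hz
```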