Calculation of fft using python - python

By using wave in Python we can read .wav audio format and can calculate the frequency and power of a signal. But I want to calculate the frequency of .mp3 audio format directly. I've heard a little bit about Pysox. Is Pysox capable of reading frames and can we calculate the fft and frequency using Pysox? Or is there any other software which can calculate the frequency of an MP3 file using Python?

Your question has a few parts, but I'll give it a shot. You can get the raw audio data using pydub (the same thing the wave module gives you):
import pydub
sound = pydub.AudioSegment.from_mp3("/path/to/file.mp3")
raw_data = sound.raw_data
(Note that you'll need ffmpeg or libav installed for the MP3 decoding.)
From there you should be able to use numpy. This O'Reilly post may also help
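As a rough sketch of the numpy step: the function below takes a mono sample array and its frame rate and returns the strongest frequency and its power. The name dominant_frequency and the synthetic test tone are my own, not part of pydub or numpy. With pydub you would probably pass np.array(sound.get_array_of_samples()) and sound.frame_rate; for stereo files the samples are interleaved, so take one channel first (for example samples[::sound.channels]).

```python
import numpy as np

def dominant_frequency(samples, frame_rate):
    """Return (frequency in Hz, power) of the strongest FFT bin of a mono signal."""
    x = np.asarray(samples, dtype=np.float64)
    x = x - x.mean()                     # remove DC offset so bin 0 does not dominate
    window = np.hanning(len(x))          # taper the ends to reduce spectral leakage
    spectrum = np.fft.rfft(x * window)   # FFT of a real signal: non-negative frequencies only
    power = np.abs(spectrum) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / frame_rate)
    peak = int(np.argmax(power))
    return freqs[peak], power[peak]

# Synthetic stand-in for decoded audio: one second of a 440 Hz tone at 8 kHz.
rate = 8000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 440 * t)
freq, power = dominant_frequency(tone, rate)
print(freq)
```

The frequency resolution is frame_rate / len(samples), so longer blocks give finer frequency estimates.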

Related

How do I change the speed of an audio file in Python, like in Audacity, without quality loss?

I'm building a simple Python application that involves altering the speed of an audio track.
(I acknowledge that changing the frame rate of an audio track also changes its pitch, and I do not mind the pitch being altered.)
I have tried the solution from abhi krishnan using pydub, which looks like this:
from pydub import AudioSegment
sound = AudioSegment.from_file(…)
def speed_change(sound, speed=1.0):
    # Manually override the frame_rate. This tells the computer how many
    # samples to play per second
    sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={
        "frame_rate": int(sound.frame_rate * speed)
    })
    # convert the sound with altered frame rate to a standard frame rate
    # so that regular playback programs will work right. They often only
    # know how to play audio at standard frame rate (like 44.1k)
    return sound_with_altered_frame_rate.set_frame_rate(sound.frame_rate)
However, the audio with changed speed sounds distorted or crackly, which does not happen when Audacity does the same thing. I would like to reproduce in Python how Audacity (or other digital audio editors) changes the speed of audio tracks.
I presume that the quality loss is caused by the original audio having a low frame rate (8 kHz), and that .set_frame_rate(sound.frame_rate) resamples the speed-altered audio at that original, low frame rate. Simple attempts at setting the frame rate of the original audio, of the audio with the altered frame rate, or of the exported audio didn't work.
Is there a way in pydub or another Python module to perform the task the same way Audacity does?
Assuming you want to play the audio back at, say, 1.5x the speed of the original: this is equivalent to resampling the audio down to 2/3 of its original number of samples and pretending that the sampling rate hasn't changed. If that is what you are after, I suspect most DSP packages support it (search for "audio resampling").
You can try scipy.signal.resample_poly(). It needs a numeric array rather than the bytes in sound.raw_data, so convert the samples first:
import numpy as np
from scipy.signal import resample_poly
samples = np.array(sound.get_array_of_samples())
dec_data = resample_poly(samples, up=2, down=3)
dec_data should have 2/3 as many samples as the original. If you play the dec_data samples at the sound's sampling rate, you should get a sped-up version. (For stereo audio the samples are interleaved, so resample each channel separately.) The downside of resample_poly is that it needs a rational factor, and a large numerator or denominator makes the output less ideal. You can also try scipy's resample function or look for other packages that support audio resampling.
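To handle speeds other than 1.5x, one option is to approximate 1/speed by a small fraction and feed that to resample_poly. This is only a sketch under my own assumptions: change_speed and max_denominator are names I made up, the input is a mono numeric array, and the test tone stands in for real audio.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly

def change_speed(samples, speed, max_denominator=100):
    """Resample a mono signal so playback at the original rate is `speed` times faster."""
    # Approximate 1/speed by a small rational up/down pair, e.g. 1.5x -> 2/3.
    ratio = Fraction(1.0 / speed).limit_denominator(max_denominator)
    x = np.asarray(samples, dtype=np.float64)
    # resample_poly applies an anti-aliasing filter while changing the sample count.
    return resample_poly(x, up=ratio.numerator, down=ratio.denominator)

# Synthetic stand-in for decoded audio: one second of a 200 Hz tone at 8 kHz.
rate = 8000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 200 * t)
faster = change_speed(tone, 1.5)
print(len(tone), len(faster))
```

Played back at the same 8 kHz rate, the result is shorter and the tone sounds higher (about 300 Hz instead of 200 Hz), which is the pitch change the question says is acceptable. To get the result back into pydub you would need to clip and convert it to the original integer sample type first.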

How to extract perceived loudness of a speech signal in an audio (WAV) file using Python?

I would like to extract the loudness of a speech signal from an audio file (WAV). I believe it is a perceived quantity that depends not only on the amplitude of a signal but also on the frequencies involved. I found a useful link, https://github.com/librosa/librosa/issues/463, but I:
1. would like to use existing packages that calculate this efficiently;
2. am uncertain whether the approach described there is appropriate.
For 1, I found Parselmouth, a wrapper around Praat, but I am unsure how to proceed after extracting the intensity and pitch values like so:
import parselmouth
snd = parselmouth.Sound(path)  # path: location of the WAV file
intensity = snd.to_intensity()
pitch = snd.to_pitch()
I have also looked into Pydub and PyAudioAnalysis but couldn't find direct methods of evaluating loudness using those either.
What is a Pythonic, package-based, object-oriented way of extracting loudness from a WAV file?
You could use pyloudnorm:
import soundfile as sf
import pyloudnorm as pyln
data, rate = sf.read("test.wav")
meter = pyln.Meter(rate)  # create a BS.1770 loudness meter
loudness = meter.integrated_loudness(data)  # integrated loudness in LUFS

pydub copying, resaving and splitting 192kHz sample rate wav file

I have .wav files sampled at 192kHz and want to split them based on time to many smaller files while keeping the same sample rate.
To start with, I thought I would just open and re-save the WAV file using pydub in order to learn how to do this. However, the re-saved file is much smaller. I'm not sure why (perhaps the sample rate is lower?), and I also can't open the new file with the audio analysis program I usually use (Song Scope).
So I have two questions:
- How do I open, read, copy and re-save a WAV file using pydub without changing it? (Sorry, I know this is probably easy; I just can't find it yet.)
- Are Python and pydub a sensible choice for what I am trying to do? Or is there a much simpler way?
What I am trying to do exactly is:
Split about 10 high sample frequency wav files (~ 1GB each) into many
(about 100) small wave files. (I plan to make a list of start and end
times for each of the smaller wav files needed then get Python to open
copy and resave the wav file data between those times).
I assume it is possible since I've seen questions for lower frequency wav files, but if you know otherwise or know of a simpler way please let me know. Thanks!!
My code so far is as follows:
from pydub import AudioSegment
# Input audio file to be sliced
audio = AudioSegment.from_wav("20190212_164446.wav")
audio.export("newWavFile.wav")
(I put the WAV file and ffmpeg into the same directory as the Python file to save time, since I was having a lot of trouble getting pydub to find ffmpeg.)
In case it's relevant the files are of bat calls, these bats make calls between around 1kHz and 50kHz which is quite low frequency for bats. I'm trying to crop out the actual calls from some very long files.
I know this is a basic question, I just couldn't find the answer yet, please also feel free to direct me to the answer if it's a duplicate.
thanks!!
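One thing worth checking first: pydub's export() defaults to format="mp3", so audio.export("newWavFile.wav") most likely writes MP3 data under a .wav name, which would explain both the smaller size and the file not opening; audio.export("newWavFile.wav", format="wav") should keep it as WAV. For the splitting plan itself, here is a minimal sketch using only the standard library's wave module, which copies frames without resampling. It assumes the files are plain PCM WAV; split_wav, the segment list and the output name pattern are hypothetical names of mine.

```python
import wave

def split_wav(src_path, segments, out_pattern="part_{:03d}.wav"):
    """Copy each (start_seconds, end_seconds) range of src_path into its own WAV file."""
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()              # channels, sample width, frame rate, ...
        rate = src.getframerate()
        for i, (start_s, end_s) in enumerate(segments):
            start = int(start_s * rate)
            n_frames = int(end_s * rate) - start
            src.setpos(start)                 # seek, so only the segment is read into memory
            frames = src.readframes(n_frames)
            out_path = out_pattern.format(i)
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)         # same channels, sample width and frame rate
                dst.writeframes(frames)       # frame count in the header is corrected on close
            out_paths.append(out_path)
    return out_paths
```

Usage would look something like split_wav("20190212_164446.wav", [(0.0, 2.5), (10.0, 12.0)]), with the list holding the start and end times of each call in seconds. Because only the requested ranges are read, the 1 GB files never need to fit in memory at once.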

What's a good way to examine audio with Python and split it between high, mid and low pitches for visualization?

So, I'm planning on trying out making a light organ with an Arduino and Python, communicating over serial to control the brightness of several LEDs. The computer will use the microphone or a playing MP3 to generate the data.
I'm not so sure how to handle the audio processing. What's a good option for python that can take either a playing audio file or microphone data (I'd prefer the microphone), and then split it into different frequency ranges and write the intensity to variables? Do I need to worry about overtones if I use the microphone?
If you're not committed to using Python, you should also look at using PureData (PD) to handle the audio analysis. Interfacing PD to the Arduino is already a solved problem, and there are a lot of pre-existing components that make working with audio easy.
Try http://wiki.python.org/moin/Audio for links to various Python audio processing packages.
The audioop package has some basic waveform manipulation functions.
See also:
Detect and record a sound with python
Detect & Record Audio in Python
Portaudio has a Python interface that would let you read data off the microphone.
For the band splitting, you could use something like a band-pass filter feeding into an envelope follower -- one filter+follower for each frequency band of interest.
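A minimal sketch of that filter-plus-follower idea with scipy, in case it helps. The band edges, filter orders and the 10 Hz smoothing cutoff are arbitrary choices of mine, band_levels is a made-up name, and the test tone stands in for microphone data. This version processes a whole array at once; a live light organ would need to run it block by block and carry the filter state between blocks.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def band_levels(samples, rate, bands, smooth_hz=10.0):
    """Return one smoothed intensity envelope per (low_hz, high_hz) band."""
    x = np.asarray(samples, dtype=np.float64)
    # Envelope follower: rectify the band signal, then low-pass it to smooth.
    smooth = butter(2, smooth_hz, btype="lowpass", fs=rate, output="sos")
    levels = []
    for low_hz, high_hz in bands:
        bandpass = butter(4, [low_hz, high_hz], btype="bandpass", fs=rate, output="sos")
        band = sosfilt(bandpass, x)                   # keep only this frequency range
        levels.append(sosfilt(smooth, np.abs(band)))  # intensity of that range over time
    return levels

# Synthetic stand-in for microphone input: one second of a 1 kHz tone at 16 kHz.
rate = 16000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 1000 * t)
low, mid, high = band_levels(tone, rate, [(60, 250), (250, 2000), (2000, 6000)])
print(low[-1], mid[-1], high[-1])
```

Each envelope value could then be scaled to an LED brightness and sent over serial. With a real instrument or voice, overtones will show up in the higher bands as well, which is usually what you want for a light organ.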

Making specific frequency (ranges) louder

I want to make certain frequencies in a sequence of audio data louder. I have already analyzed the data using FFT and have gotten a value for each audio frequency in the data. I just have no idea how I can use the frequencies to manipulate the sound data itself.
From what I understand so far, data is encoded in such a way that the difference between every two consecutive readings determines the audio amplitude at that time instant. So making the audio louder at that time instant would involve making the difference between the two consecutive readings greater. But how do I know which time instants are involved with which frequency? I don't know when the frequency starts appearing.
(I am using Python, specifically PyAudio for getting the audio data and Num/SciPy for the FFT, though this probably shouldn't be relevant.)
You are looking for a graphic equalizer. Some quick Googling turned up rbeq, which seems to be a plugin for Rhythmbox written in Python. I haven't looked through the code to see if the actual EQ part is written in Python or is just controlling something in the host, but I recommend looking through their source.
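Since you already have the FFT, here is one simple way to see the idea in NumPy. A frequency found by the FFT describes the whole analysed block rather than a single time instant, so you can scale the bins in the range you care about and transform back. This is only a sketch: boost_band is a name I made up, the test signal is synthetic, and a real equalizer would normally use smooth filters or overlapping windowed blocks to avoid ringing from the sharp band edges.

```python
import numpy as np

def boost_band(samples, rate, low_hz, high_hz, gain):
    """Scale every frequency component between low_hz and high_hz by `gain`."""
    x = np.asarray(samples, dtype=np.float64)
    spectrum = np.fft.rfft(x)                          # one complex value per frequency bin
    freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)      # frequency in Hz of each bin
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[in_band] *= gain                          # louder (gain > 1) or quieter (gain < 1)
    return np.fft.irfft(spectrum, n=len(x))            # back to time-domain samples

# Synthetic stand-in for audio data: a 200 Hz tone plus a 1 kHz tone at 8 kHz.
rate = 8000
t = np.arange(rate) / rate
low_tone = np.sin(2 * np.pi * 200 * t)
high_tone = np.sin(2 * np.pi * 1000 * t)
boosted = boost_band(low_tone + high_tone, rate, 900, 1100, gain=2.0)
```

Here the 1 kHz component comes out twice as large while the 200 Hz component is unchanged. Watch out for clipping if you convert the result back to integer samples.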
