I want to do a real-time FFT analysis of my audio file. When converting the binary data of the wav file to int with struct.unpack, I get the error:
data_int = struct.unpack(format, data)
struct.error: unpack requires a buffer of 4096 bytes
Every time I change the audio format or rate to match the 4096-byte buffer, the required size goes up 2x or something similar.
What is weird is that the error appears only at the last iteration of the while loop, and then I can't get out of it.
import struct
import wave

import numpy as np
import pyaudio
import matplotlib.pyplot as plt

# Get file
filename = 'snare16.wav'

# Set chunk size of 1024 samples per data frame
chunk = 1024

# Open the sound file
wf = wave.open(filename, 'rb')

# Create an interface to PortAudio
p = pyaudio.PyAudio()

# Open a stream to write the WAV data to
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

# Read data in chunks
data = wf.readframes(chunk)

# Convert binary numbers into ints (bit depth)
count = len(data) / 2
format = "%dh" % count  # results in '2048h' as format: 2048 shorts

# Play the sound by writing the audio data to the stream
while data != '':
    # Play audio
    stream.write(data)
    # Read the next chunk
    data = wf.readframes(chunk)
    # Convert data into ints
    data_int = struct.unpack(format, data)
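For reference, one way around the fixed 4096-byte expectation is to recompute the format string from each chunk as it is actually read, so the final, shorter chunk no longer breaks unpacking. A minimal sketch, assuming 16-bit samples (`unpack_chunk` is an illustrative name, not from the code above):

```python
import struct

def unpack_chunk(data):
    # Size the unpack format from the bytes actually returned,
    # so the final, shorter chunk does not raise struct.error.
    count = len(data) // 2              # 2 bytes per 16-bit sample
    return struct.unpack("%dh" % count, data)
```

Note also that in Python 3, wf.readframes returns bytes, so a loop condition like `while data != '':` never becomes false; comparing against `b''` (or checking `len(data) > 0`) lets the loop terminate.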
I'm new to Python. I'm trying to get the FFT of an uploaded wav file and write the FFT of each frame on its own line of a text file (using GCP), using scipy or librosa.
The frame rate I require is 30 fps, and the wav file will have a 48k sample rate.
So my questions are:
how do I divide the samples for the whole wav file into samples for each frame?
how do I add empty samples to make the length of the frame samples a power of 2 (as 48000/30 = 1600, add 448 empty samples to make it 2048)?
how do I normalize the resulting FFT array to [-1, 1]?
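A minimal sketch of those three steps, assuming the 16-bit mono samples are already in a NumPy array (`frame_ffts` and the constants are illustrative names). The FFT magnitudes are non-negative, so dividing by the peak lands them in [0, 1]; for the signed real/imaginary parts you would instead divide by the maximum absolute value to reach [-1, 1]:

```python
import numpy as np

SR = 48000          # sample rate of the wav file
FPS = 30            # required frame rate
HOP = SR // FPS     # 1600 samples per frame
NFFT = 2048         # next power of two above 1600

def frame_ffts(samples):
    """Split samples into 30 fps frames, zero-pad each to 2048, FFT, normalize."""
    n_frames = len(samples) // HOP
    out = []
    for i in range(n_frames):
        frame = samples[i * HOP:(i + 1) * HOP]
        padded = np.pad(frame, (0, NFFT - HOP))  # append 448 zeros
        mag = np.abs(np.fft.rfft(padded))        # magnitude spectrum
        peak = mag.max()
        if peak > 0:
            mag = mag / peak                     # scale into [0, 1]
        out.append(mag)
    return out
```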
You can use pyaudio in callback mode to achieve what you are doing.
import pyaudio
import wave
import time
import struct
import sys
import numpy as np

if len(sys.argv) < 2:
    print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
    sys.exit(-1)

wf = wave.open(sys.argv[1], 'rb')

# instantiate PyAudio (1)
p = pyaudio.PyAudio()

def callback_test(data, frame_count, time_info, status):
    frame_count = 1024
    elm = wf.readframes(frame_count)        # read n frames
    da_i = np.frombuffer(elm, dtype='<i2')  # interpret as little-endian int16
    da_fft = np.fft.rfft(da_i)              # FFT for real-valued input
    da_ifft = np.fft.irfft(da_fft)          # inverse FFT back to samples
    da_i = da_ifft.astype('<i2')            # back to little-endian int16
    da_m = da_i.tobytes()                   # convert to bytes
    return (da_m, pyaudio.paContinue)

# open stream using callback (3)
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),     # sampling frequency
                output=True,
                stream_callback=callback_test)

# start the stream (4)
stream.start_stream()

# wait for stream to finish (5)
while stream.is_active():
    time.sleep(0.1)

# stop stream (6)
stream.stop_stream()
stream.close()
wf.close()

# close PyAudio (7)
p.terminate()
Please refer to these links for further study:
https://people.csail.mit.edu/hubert/pyaudio/docs/#example-callback-mode-audio-i-o
and
Python change pitch of wav file
I'm trying to make some changes to a sound's frequency and amplitude in place. I am currently getting the sound data, but whenever I try multiplying it by a value, for example to change the amplitude, I get a lot of noise. I was wondering how to do that in a clean way. I need to loop through the data because the changes in frequency and amplitude rely on user input (I'll later make changes according to hand position using a webcam). With this current code, my input file has a single channel. I am not sure why, but if I change np.int8 to np.int16 in getNewWave, the audio gets noisy as well. Thanks!
import pyaudio
import wave
import sys
import numpy as np

def getNewWave(data):
    newdata = np.frombuffer(data, np.int8)
    # make some changes to amplitude and frequency
    return newdata

def main():
    CHUNK = 1024
    if len(sys.argv) < 2:
        print("Missing input wav file. File must have a single channel")
        sys.exit(-1)
    wf = wave.open(sys.argv[1], 'rb')
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16,
                    channels=1,
                    rate=wf.getframerate(),
                    output=True, frames_per_buffer=CHUNK)
    data = wf.readframes(CHUNK)
    while data != '':
        stream.write(getNewWave(data))
        data = wf.readframes(CHUNK)
    stream.stop_stream()
    stream.close()
    p.terminate()

main()
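For reference, the noise with np.int8 typically comes from splitting each 16-bit sample into two separate bytes; a second common source is integer overflow when a scaled sample leaves the int16 range. A hedged sketch of amplitude scaling that reads the samples at their real width and clips instead of wrapping (`scale_amplitude` is an illustrative name, assuming a 16-bit mono stream):

```python
import numpy as np

def scale_amplitude(data, gain):
    """Scale 16-bit PCM bytes by `gain` without wrapping around."""
    samples = np.frombuffer(data, dtype=np.int16)   # real sample width
    scaled = samples.astype(np.float64) * gain      # avoid int16 overflow
    clipped = np.clip(scaled, -32768, 32767)        # keep within int16 range
    return clipped.astype(np.int16).tobytes()
```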
I'm trying to extract frequencies from .wav files using python wave and numpy, and I'm almost done! But I face an error. I followed the answer from this url: Extracting frequencies from a wav file python
When I extract the frequency from a .wav file that I created myself by following that answer, it succeeds. However, when I extract the frequency from a .wav file recorded by mic, it raises an error:
struct.error: unpack requires a buffer of 288768 bytes
Following is my code:
import wave
import struct
import numpy as np

if __name__ == '__main__':
    wf = wave.open('test6.wav', 'rb')
    frame = wf.getnframes()
    data_size = wf.getnframes()
    frate = wf.getframerate()
    data = wf.readframes(data_size)
    wf.close()

    duration = frame / float(frate)
    data = struct.unpack('{n}h'.format(n=data_size), data)
    data = np.array(data)

    w = np.fft.fft(data)
    freqs = np.fft.fftfreq(len(w))
    print(freqs.min(), freqs.max())
    # (-0.5, 0.499975)

    # Find the peak in the coefficients
    idx = np.argmax(np.abs(w))
    freq = freqs[idx]
    freq_in_hertz = abs(freq * frate)
    print('frequency: ', freq_in_hertz)
    print('duration: ', duration)
The 288768 in the error message is exactly double data_size.
So when I use data_size = wf.getnframes() * 2, it does not raise the error. But then it raises an error with the file created by code.
How can I solve this?
Given that the size of the buffer is exactly double data_size, I would guess that the .wav file you recorded with your mic has two channels instead of one. You can verify this by looking at the output of wf.getnchannels(). It should be 2 for your mic recording.
If this is the case, you can load just one channel of your mic recording by following this answer:
Read the data of a single channel from a stereo wave file in Python
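In that spirit, a minimal sketch of extracting one channel from the interleaved bytes, assuming 16-bit PCM (`left_channel` is an illustrative name):

```python
import numpy as np

def left_channel(data, n_channels):
    """Return the first channel from interleaved 16-bit PCM bytes."""
    samples = np.frombuffer(data, dtype=np.int16)
    # stereo frames are interleaved L, R, L, R, ... so step by n_channels
    return samples[::n_channels]
```

With n_channels taken from wf.getnchannels(), the returned array has getnframes() samples again, so the '{n}h' unpack (or frombuffer) size matches.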
I've been trying to work on a project to detect the time shift between two streaming audio signals. I work with Python 3 and PyAudio, and I'm using a MOTU 828 sound card with a Neumann KU-100 microphone, which takes a stereo input. When I check my input_device_index, I have the correct one, the 4th, which is connected to the MOTU sound card.
However, when I record with:
import time
import pyaudio
import wave

CHUNK = 1024 * 3  # chunk is the block of frames currently processed
FORMAT = pyaudio.paInt16
RATE = 44100
RECORD_SECONDS = 2
WAVE_OUTPUT = "temp.wav"

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT, channels=2, rate=RATE, input=True,
                frames_per_buffer=CHUNK, input_device_index=4)

frames = []  # list storing all the data

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream1.read(CHUNK)
    frames.append(data1)

stream.stop_stream()
stream.close()
p.terminate()

wavef = wave.open(WAVE_OUTPUT, 'wb')  # opening the file
wavef.setnchannels(1)
wavef.setsampwidth(p.get_sample_size(FORMAT))
wavef.setframerate(RATE)
wavef.writeframes(b''.join(frames1))  # writing the data to be saved
wavef.close()
I record a wave file with no sound and almost no noise (naturally).
I can also record with 3rd-party software using this specific microphone, and that works completely fine.
NOTE: the sound card is normally 24-bit; I also tried paInt24, which records a wave file of pure noise.
I think you used the wrong variable names; looking at your code, the wrong lines are:
data = stream1.read(CHUNK)
frames.append(data1)
wavef.writeframes(b''.join(frames1))
The correct lines are:
data = stream.read(CHUNK)
frames.append(data)
wavef.writeframes(b''.join(frames))
How can I plot the input signal from the microphone with matplotlib?
I have tried plt.plot(frames), but frames is for some reason a string?
a) Why is the frames variable a string list?
b) Why is the data variable a string list?
c) Should they represent the energy/amplitude of single samples and be integers?
d) Why is the length of data 2048 when I specified a chunk size of 1024?
(I guess it's because I use paInt16, but I still cannot see why it couldn't be 1024)
I have the following code for microphone input:
import pyaudio
import audioop
import matplotlib.pyplot as plt
import numpy as np
import wave

FORMAT = pyaudio.paInt16  # we use 16-bit format per sample
CHANNELS = 1
RATE = 44100
CHUNK = 1024  # 1024 frames read from the buffer at a time
RECORD_SECONDS = 3
WAVE_OUTPUT_FILENAME = "file.wav"

audio = pyaudio.PyAudio()

# start recording
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
frames = b''.join(frames)

stream.stop_stream()
stream.close()
audio.terminate()
a) Why is frames variable a string list?
As a consequence of b), that's how you build it in your code.
b) Why is data variable string list?
It is a byte string, that is just a raw sequence of bytes. That's what read() returns.
c) Should they represent energy/amplitude of single sample and be integers?
They are. They're just packed in a byte sequence and not in Python integers.
d) Why is length of data 2048 when I specified I want chunk size of 1024?
1024 is the number of frames. Each frame is 2 bytes long, so you get 2048 bytes.
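That frames-to-bytes relationship is easy to verify with struct: packing CHUNK 16-bit samples produces exactly CHUNK * 2 bytes.

```python
import struct

CHUNK = 1024  # frames per read, 2 bytes each for paInt16

# 1024 int16 samples pack into 1024 * 2 = 2048 bytes,
# which matches the len(data) observed above
data = struct.pack('%dh' % CHUNK, *([0] * CHUNK))
print(len(data))  # 2048
```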
How can I plot on matplotlib input signal from microphone? I have tried to plot with plt.plot(frames) but frames is for some reason a string?
Depends on what you want to plot. Just raw amplitude can be obtained by transforming the byte string to a numpy array:
fig = plt.figure()
s = fig.add_subplot(111)
amplitude = np.frombuffer(frames, np.int16)
s.plot(amplitude)
fig.savefig('t.png')
A more useful plot would be a spectrogram:
fig = plt.figure()
s = fig.add_subplot(111)
amplitude = np.frombuffer(frames, np.int16)
s.specgram(amplitude)
fig.savefig('t.png')
But you can tinker with amplitude however you want, now that you have a numpy array.