So, I'm trying to use the Python Wave module to get an audio file and basically get all of the frames from it, examine them, and then write them back to another file. I tried to output the sound that I'm reading to another file just now, but it came out either as noise, or as no sound at all. So, I'm pretty sure that I'm not analyzing the file and getting the correct frames...? I'm dealing with a stereo 16-bit sound file. While I could use a simpler file to just understand the process, I eventually want to be able to accept any kind of sound file to work with, so I need to understand what the problem is.
I also noted that 32-bit sound files wouldn't be read by the Wave module - it gave me an error of "Unknown Format". Any ideas about that? Is it something I can bypass so that I could at least, for example, read 32-bit audio files, even if I can only 'render' 16-bit files?
I'm somewhat aware that wave files are interleaved between the left and right channels (first sample's for the left channel, second's for the right, etc)., but how do I separate the channels? Here's my code. I cut out the output code to just see if I'm reading the files correctly. I'm using Python 2.7.2:
import scipy
import wave
import struct
import numpy
import pylab
fp = wave.open('./sinewave16.wav', 'rb') # Problem loading certain kinds of wave files in binary?
samplerate = fp.getframerate()
totalsamples = fp.getnframes()
fft_length = 2048 # Guess
num_fft = (totalsamples / fft_length) - 2
temp = numpy.zeros((num_fft, fft_length), float)
leftchannel = numpy.zeros((num_fft, fft_length), float)
rightchannel = numpy.zeros((num_fft, fft_length), float)
for i in range(num_fft):
tempb = fp.readframes(fft_length / fp.getnchannels() / fp.getsampwidth());
#tempb = fp.readframes(fft_length)
up = (struct.unpack("%dB"%(fft_length), tempb))
#up = (struct.unpack("%dB"%(fft_length * fp.getnchannels() * fp.getsampwidth()), tempb))
#print (len(up))
temp[i,:] = numpy.array(up, float) - 128.0
temp = temp * numpy.hamming(fft_length)
temp.shape = (-1, fp.getnchannels())
fftd = numpy.fft.rfft(temp)
pylab.plot(abs(fftd[:,1]))
pylab.show()
#Frequency of an FFT should be as follows:
#The first bin in the FFT is DC (0 Hz), the second bin is Fs / N, where Fs is the sample rate and N is the size of the FFT. The next bin is 2 * Fs / N. To express this in general terms, the nth bin is n * Fs / N.
# (It would appear to me that n * Fs / N gives you the hertz, and you can use sqrt(real portion of number*r + imaginary portion*i) to find the magnitude of the signal
Currently, this will load the sound file, unpack it into a struct, and plot the sound file so that I can look at it, but I don't think it's getting all of the audio file, or it's not getting it correctly. Am I reading the wave file into the struct correctly? Are there any up-to-date resources on using Python to read and analyze wave / audio files? Any help would be greatly appreciated.
Perhaps you should try the SciPy io.wavefile module:
http://docs.scipy.org/doc/scipy/reference/io.html
Related
Learning how to modify different types of audio files, .wav, .mp3, etcetera using Python3 using the wave module. Specifically .wav file format, in this regard for this question. Presently, I know there are ISO standards for audio formats, and any references for this subject are greatly appreciated regarding audio standards for the .wav file format as well on a side note.
But in terms of my question, simply ignoring the RIFF, FMT headers, in a .wav file using the Python3 wave module import.
Is there a more efficient way to skip the RIFF headers, other containers, and go straight to the data container to modify its contents?
This crude example simply is converting a two-channel audio .wav file to a single-channel audio .wav file while modifying all values to (0, 0).
import wave
import struct
# Open Files
inf = wave.open(r"piano2.wav", 'rb')
outf = wave.open(r"output.wav", 'wb')
# Input Parameters
ip = list(inf.getparams())
print('Input Parameters:', ip)
# Example Output: Input Parameters: [2, 2, 48000, 302712, 'NONE', 'not compressed']
# Output Parameters
op = ip[:]
op[0] = 1
outf.setparams(op)
number_of_channels, sample_width, frame_rate, number_of_frames, comp_type, comp_name = ip
format = '<{}h'.format(number_of_channels)
print('# Channels:', format)
# Read >> Second
for index in range(number_of_frames):
frame = inf.readframes(1)
data = struct.unpack(format, frame)
# Here, I change data to (0, 0), testing purposes
print('Before Audio Data:', data)
print('After Modifying Audio Data', (0, 0))
# Change Audio Data
data = (0, 0)
value = data[0]
value = (value * 2) // 3
outf.writeframes(struct.pack('<h', value))
# Close In File
inf.close()
# Close Out File
outf.close()
Is there a better practice or reference material if simply just modifying data segments of .wav files?
Say you wanted to literally add a sound at a specific timestamp, that would be a more appropriate result to my question.
Performance comparison
Let's examine first 3 ways to read WAVE files.
The slowest one - wave module
As you might have noticed already, wave module can be painfully slow. Consider this code:
import wave
import struct
wavefile = wave.open('your.wav', 'r') # check e.g. freesound.org for samples
length = wavefile.getnframes()
for i in range(0, length):
wavedata = wavefile.readframes(1)
data = struct.unpack("<h", wavedata)
For a WAVE as defined below:
Input File : 'audio.wav'
Channels : 1
Sample Rate : 48000
Precision : 16-bit
Duration : 00:09:35.71 = 27634080 samples ~ 43178.2 CDDA sectors
File Size : 55.3M
Bit Rate : 768k
Sample Encoding: 16-bit Signed Integer PCM
it took on average 27.7s to load the full audio. The flip side to the wave module it is that is available out of the box and will work on any system.
The convenient one - audiofile
A much more convenient and faster solution is e.g. audiofile. According to the project description, its focus is on reading speed.
import audiofile as af
signal, sampling_rate = af.read(audio.wav)
This gave me on average 33 ms to read the mentioned file.
The fastest one - numpy
If we decide to skip header (as OP asks) and go solely for speed, numpy is a great choice:
import numpy as np
byte_length = np.fromfile(filename, dtype=np.int32, count=1, offset=40)[0]
data = np.fromfile(filename, dtype=np.int16, count=byte_length // np.dtype(np.int16).itemsize, offset=44)
The header structure (that tells us what offset to use) is defined here.
The execution of that code takes ~6 ms, 5x less than the audioread. Naturally it comes with a price / preconditions: we need to know in advance what is the data type.
Modifying the audio
Once you have the audio in a numpy array, you can modify it at will, you can also decide to stream the file rather than reading everything at once. Be warned though: since sound is a wave, in a typical scenario simply injecting new data at arbitrary time t will lead to distortion of that audio (unless it was silence).
As for writing the stream back, "modifying the container" would be terribly slow in Python. That's why you should either use arrays or switch to a more suitable language (e.g. C).
If we go with arrays, we should mind that numpy knows nothing about the WAVE format and therefore we'd have to define the header ourselves and write individual bytes. Perfectly feasible exercise, but clunky. Luckily, scipy provides a convenient function that has the benefits of numpy speed (it uses numpy underneath), while making the code much more readable:
from scipy.io.wavfile import write
fs = np.fromfile('audio.wav', dtype=np.int32, count=1, offset=24)[0] # we need sample rate
with open('audio_out.wav', 'a') as fout:
new_data = data.append(np.zeros(2 * fs)) # append 2 seconds of zeros
write(fout, fs, new_data)
It could be done in a loop, where you read a chunk with numpy / scipy, modify the array (data) and write to the file (with a for append).
I basically have this audio file that is 16-bit PCM WAV mono at 44100hz and I'm trying to convert it into a spectrogram. But I want a spectrogram of the audio every 20ms (Trying this for speech recognition), but whenever I try to compare what I have to Audacity, its really different. I'm kind of new to python so I was trying to base this off of my java knowledge. Any help would be appreciated. I think I'm either splitting the read samples incorrectly (What I did was split it every 220 elements in the array since I believe Audio Data is just samples in the time domain to get it to 20ms audio)
Here's the code I have right now:
import librosa.display
import numpy
audioPath = 'C:\\Users\\pawar\\Desktop\\Resister.wav'
audioData, sampleRate = librosa.load(audioPath, sr=None)
print(sampleRate)
new = numpy.zeros(shape=(220, 1))
counter = 0
for i in range(0, len(audioData), 882):
new = (audioData[i: i + 882])
STFT = librosa.stft(new, n_fft=882)
print(type(STFT))
audioDatainDB = librosa.amplitude_to_db(abs(STFT))
print(type(audioDatainDB))
librosa.display.specshow(audioDatainDB, sr=sampleRate, x_axis='time', y_axis='hz')
#plt.figure(figsize=(20,10))
plt.show()
counter += 1
print("Your local debug print statement ", counter)
As for the values, well I was playing around with them quite a bit trying to get it to work. Doesn't seem to be of any luck ;/
Here's the output:
https://i.stack.imgur.com/EVntx.png
And here's what Audacity shows:
https://i.stack.imgur.com/GIGy8.png
I know its not 20ms in the audacity one but you can see the two don't look even a bit similar
Just the second day met with Python and with troubles...
I've got a lot of CD-standard (16 bit, 44100 Hz) stereo wave files and need to find their average (arithmetic mean) FIR. The algorithm is easy to say... - the sum of amplitudes for each freq. divides on the amount of files. Then the achieved FIR is being plotted and written down to the text file as the table.
I rolled over some similar posts like this exciting Python Scipy FFT wav files but there are still too many things, even alphabet, I lose touch in and compiler mistkes follow every time I try to repeat the examples.
I would appreciate any help that can move mу from the dead-end. So, these are my shy paces...
As the number of files may vary it is probably useful to have a list of files at the elbow:
import os
a = os.path.expanduser(u"~") # absolute user path var.
b = "integrator\\files" # base folder to use with files in it
c = os.path.join(a, b)
flist = os.listdir(c)
images = filter(lambda x: x.endswith('.wav'), flist) # filter non-wavs
for i in range(len(flist)):
print(flist[i])
print()
And it works fine for me! But I still cannot catch how to organize the multiple files reading, and calculating their mean FIR massive
As I keeked I need something like "global package":
import glob
import mainfile
files = glob.glob('./*.wav')
for ele in files:
f(ele)
quit()
Wherу the mainfile.py looks somethng like that:
import matplotlib.pyplot as plt
from scipy.io import wavfile # get the api
from scipy.fftpack import fft
from pylab import *
def f(filename):
fs, data = wavfile.read(filename) # load the data
a = data.T[0] # this is a two channel soundtrack, I get the first track
b=[(ele/2**16.)*2-1 for ele in a] # this is 16-bit track, now normalized on [-1,1)
c = fft(b) # create a list of complex number
d = len(c)/2 # you only need half of the fft list
And here I just don;t know what should I better do with 'd's - summing in cycle or... Then this code example operated just 1 channel for plotting - I need the output FIR as seqence of pairs for each channel. Yet still it's not clear how to tweak FFT window to Hanning with at least 65536 FFT-size (oh yes, I know thу)calculations are slow as hell).
In the end we can plot and save the graph:
plt.plot(abs(c[:(d-1)]),'r')
savefig(filename+'.png',bbox_inches='tight')
... and somehow write average FIR to the txt table file
I'd be happy enough if this script worked as the console application (though at first I dreamt of kinda minimalistic GUI with ability choose any folder containing files with certian overview button and with progress bar to make sure that app is still breathing... though hard covering ten or twenty five wavs with FFT slow "scythe".
Got C:\Anaconda2 (with numpy, scipy and matplotlib properly installed) on Windows 7 x86 PC
Thank you in advance!
With regards,
Me.
I'm trying to write a wav upload function for my webapp. The front end portion seems to be working great. The problem is my backend (python). When it receives the binary data I'm not sure how to write it to a file. I tried using the basic write functon, and the sound is corrupt... Sounds like "gobbly-gook". Is there a special way to write wav files in Python?
Here is my backend... Not really much to it.
form = cgi.FieldStorage()
fileData = str(form.getvalue('data'))
with open("audio", 'w') as file:
file.write(fileData)
I even tried...
with open("audio", 'wb') as file:
file.write(fileData)
I am using aplay to play the sound, and I noticed that all the properties are messed up as well.
Before:
Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
After upload:
Unsigned 8 bit, Rate 8000 Hz, Mono
Perhaps the wave module might help?
import wave
import struct
import numpy as np
rate = 44100
def sine_samples(freq, dur):
# Get (sample rate * duration) samples on X axis (between freq
# occilations of 2pi)
X = (2*np.pi*freq/rate) * np.arange(rate*dur)
# Get sine values for these X axis samples (as integers)
S = (32767*np.sin(X)).astype(int)
# Pack integers as signed "short" integers (-32767 to 32767)
as_packed_bytes = (map(lambda v:struct.pack('h',v), S))
return as_packed_bytes
def output_wave(path, frames):
# Python 3.X allows the use of the with statement
# with wave.open(path,'w') as output:
# # Set parameters for output WAV file
# output.setparams((2,2,rate,0,'NONE','not compressed'))
# output.writeframes(frames)
output = wave.open(path,'w')
output.setparams((2,2,rate,0,'NONE','not compressed'))
output.writeframes(frames)
output.close()
def output_sound(path, freq, dur):
# join the packed bytes into a single bytes frame
frames = b''.join(sine_samples(freq,dur))
# output frames to file
output_wave(path, frames)
output_sound('sine440.wav', 440, 2)
EDIT:
I think in your case, you might only need:
packedData = map(lambda v:struct.pack('h',v), fileData)
frames = b''.join(packedData)
output_wave('example.wav', frames)
In this case, you just need to know the sampling rate. Check the wave module for information on the other output file parameters (i.e. the arguments to the setparams method).
The code I pasted will write a wav file as long as the data isn't corrupt. It was not necessary to use the wave module.
with open("audio", 'w') as file:
file.write(fileData)
I was originally reading the file in Javascript as FileAPI.readAsBinaryString. I changed this to FileAPI.readAsDataURL, and then decoded it in python using base64.decode(). Once I decoded it I was able to just write the data to a file. The .wav file was in perfect condition.
Is it possible to get python to generate a simple sound like a sine wave?
Is there a module available for this? If not, how would you go about creating your own?
Also, would you need some kind of host environment for python to run in in order to play sound, or can it be achieved just from making calls from the terminal?
If the answer is OS-dependent, I'm using a mac.
I was looking for the same thing, In the end, I wrote this code which is working fine.
import math #import needed modules
import pyaudio #sudo apt-get install python-pyaudio
PyAudio = pyaudio.PyAudio #initialize pyaudio
#See https://en.wikipedia.org/wiki/Bit_rate#Audio
BITRATE = 16000 #number of frames per second/frameset.
FREQUENCY = 500 #Hz, waves per second, 261.63=C4-note.
LENGTH = 1 #seconds to play sound
BITRATE = max(BITRATE, FREQUENCY+100)
NUMBEROFFRAMES = int(BITRATE * LENGTH)
RESTFRAMES = NUMBEROFFRAMES % BITRATE
WAVEDATA = ''
#generating wawes
for x in xrange(NUMBEROFFRAMES):
WAVEDATA = WAVEDATA+chr(int(math.sin(x/((BITRATE/FREQUENCY)/math.pi))*127+128))
for x in xrange(RESTFRAMES):
WAVEDATA = WAVEDATA+chr(128)
p = PyAudio()
stream = p.open(format = p.get_format_from_width(1),
channels = 1,
rate = BITRATE,
output = True)
stream.write(WAVEDATA)
stream.stop_stream()
stream.close()
p.terminate()
I know I'm a little late to the game on this one, but this is a pretty fantastic python project for synthesis and audio composition: https://github.com/hecanjog/pippi
It's still actively being developed, but it's been going for a while.
After wasting time on some uncompilable or non-existent projects, I discovered the python module wavebender, which offers generation of single or multiple channels of sine, square and combined waves. The results can be written either to a wavefile or to sys.stdout, from where they can be interpreted directly by aplay in real-time. Some useful examples are explained here, and are included at the project's github page.
The Python In Music wiki page has not been terribly well-kept-up, but it's a good starting point.
http://wiki.python.org/moin/PythonInMusic
I am working on a powerful synthesizer in python. I used custom functions to write directly to a .wav file. There are built in functions that can be used for this purpose. You will need to modify the .wav header to reflect the sample rate, bits per sample, number of channels, and duration of synthesis.
Here is an early version of a sin wave generator that outputs a list of values that after applying bytearray becomes suitable for writing to the data parameter of a wave file. [edit] A conversion function will need to transform the list into little endian hex values before applying the bytearray. See the WAVE PCM soundfile format link below for details on the .wav specification.[/edit]
def sin_basic(freq, time=1, amp=1, phase=0, samplerate=44100, bitspersample=16):
bytelist = []
import math
TwoPiDivSamplerate = 2*math.pi/samplerate
increment = TwoPiDivSamplerate * freq
incadd = phase*increment
for i in range(int(samplerate*time)):
if incadd > (2**(bitspersample - 1) - 1):
incadd = (2**(bitspersample - 1) - 1) - (incadd - (2**(bitspersample - 1) - 1))
elif incadd < -(2**(bitspersample - 1) - 1):
incadd = -(2**(bitspersample - 1) - 1) + (-(2**(bitspersample - 1) - 1) - incadd)
bytelist.append(int(round(amp*(2**(bitspersample - 1) - 1)*math.sin(incadd))))
incadd += increment
return bytelist
A newer version can use waveforms to modulate the frequency, amplitude, and phase of the waveform parameters. The data format makes it trivial to blend and concatenate waves together. If this seems up your alley, check out WAVE PCM soundfile format.
I like PyAudiere , which lets you play numpy arrays as sounds... I guess it jives well with my Matlab background. I believe it's cross-platform. I think scikits.audiolab does the same thing, and may be more current / better supported... seems easier to me than trying to save things as wavfiles or write them to buffers and use Python's builtin sound library.
I found these two python repositories very useful, might wanna have a look at it...
python https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder
ipython : https://timsainb.github.io/spectrograms-mfccs-and-inversion-in-python.html
[EDIT] As pointed out, here is an explanational of the two links
python one seems to have an error, but many people were able to make it run, so I'm not sure. ipython worked like a charm, so I hope you can run it.
Both of the links are supposed to take an audio as an input, preferably .wav file. Featurize it ( USE FFT : 512, step size = 512/8 ) to obtain spectrograms ( you can even visualize it ), it's a 2D matrix, and then train your Machine learning objects or do whatever you want using a matrix that represents the original audio. If you want, at anypoint, what those vectors represent you can resynthesize audio back as well.