I am trying to write an audio file using python's wave and numpy. So far I have the following and it works well:
import wave
import numpy as np
# set up WAV file parameters
num_channels = 1 # mono audio
sample_width = 1 # 8 bits(1 byte)/sample
sample_rate = 44.1e3 # 44.1k samples/second
frequency = 440 # 440 Hz
duration = 20 # play for this many seconds
num_samples = int(sample_rate * duration) # samples/seconds * seconds
# open WAV file and write data
with wave.open('sine8bit_2.wav', 'w') as wavfile:
wavfile.setnchannels(num_channels)
wavfile.setsampwidth(sample_width)
wavfile.setframerate(sample_rate)
t = np.linspace(0, duration, num_samples)
data = (127*np.sin(2*np.pi*frequency*t)).astype(np.int8)
wavfile.writeframes(data) # or data.tobytes() ??
My issue is that since I am using a high sampling rate, the num_samples variable might quickly become too large (9261000 samples for a 3 minute 30 seconds track say). Would using a numpy array this large be advisable? Is there a better way of going about this? Also is use of writeframes(.tobytes()) needed in this case because my code runs fine without it and it seems like extra overhead (especially if the arrays get too large).
Assuming you are only going to write a sine wave, you could very well create only one period as your data array and write that several times to the .wav file.
Using the parameters you provided, your data array is 8800 times smaller with that approach. Its size also no longer depends on the duration of your file!
import wave
import numpy as np
# set up WAV file parameters
num_channels = 1 # mono audio
sample_width = 1 # 8 bits(1 byte)/sample
sample_rate = 44.1e3 # 44.1k samples/second
frequency = 440 # 440 Hz
duration = 20 # play for this many seconds
# Create a single period of sine wave.
n = round(sample_rate/frequency)
t = np.linspace(0, 1/frequency, n)
data = (127*np.sin(2*np.pi*frequency*t)).astype(np.int8)
periods = round(frequency*duration)
# open WAV file and write data
with wave.open('sine8bit_2.wav', 'w') as wavfile:
wavfile.setnchannels(num_channels)
wavfile.setsampwidth(sample_width)
wavfile.setframerate(sample_rate)
for _ in range(periods):
wavfile.writeframes(data)
Related
I am trying to play a sound from a sawtooth wave. I created the waveform in Python and was able to save it as a WAV file, but when I try to play it it says the file is unplayable because the file type is unsupported, the file extension is incorrect, or the file is corrupt. I used this individual's tutorial (https://thehackerdiary.wordpress.com/2017/06/09/it-is-ridiculously-easy-to-generate-any-audio-signal-using-python/) and they worked around it by encoding the raw waveform from 16 bit to 8 bit in Audacity. How can this be done using only Python?
import soundfile
data, samplerate = soundfile.read('sawtooth_100_hz.wav')
soundfile.write('sawtooth_100_hz_8bit.wav', data, samplerate, subtype='PCM_S8')
^^ I tried this and got the following error: ValueError: Invalid combination of format, subtype and endian
I think the people who wrote this tutorial went for the long way. There is an easier way to convert a NumPy array to a wav file which is used below to generate the same wav file as the one generated in the tutorial:
import numpy as np
from scipy.io import wavfile
sampling_rate = 44100
freq = 440
samples = 44100
x = np.arange(samples)
y = 100*np.sin(2 * np.pi * freq * x / sampling_rate)
wavfile.write("test.wav", sampling_rate, y)
And you can use wavfile.read() method to read this file with no problem
Surprisingly, the underlying libsndfile library doesn't support WAV files with signed 8 bit samples (only unsigned), see http://www.mega-nerd.com/libsndfile/#Features.
You can also check this with the soundfile module:
>>> import soundfile as sf
>>> sf.available_subtypes('wav')
{'PCM_16': 'Signed 16 bit PCM', 'PCM_24': 'Signed 24 bit PCM', 'PCM_32': 'Signed 32 bit PCM', 'PCM_U8': 'Unsigned 8 bit PCM', 'FLOAT': '32 bit float', 'DOUBLE': '64 bit float', 'ULAW': 'U-Law', 'ALAW': 'A-Law', 'IMA_ADPCM': 'IMA ADPCM', 'MS_ADPCM': 'Microsoft ADPCM', 'GSM610': 'GSM 6.10', 'G721_32': '32kbs G721 ADPCM'}
You could try using AIFF or FLAC instead?
Or you could create a RAW file (i.e. a headerless file containing no information about its own data format), which is incidentally what they did in the tutorial you were mentioning (note that they are using these options: -t raw -e signed -b 8).
For a bit more information about creating and playing signals, see:
https://nbviewer.jupyter.org/github/mgeier/python-audio/blob/master/simple-signals.ipynb
https://nbviewer.jupyter.org/github/spatialaudio/communication-acoustics-exercises/blob/master/intro.ipynb
It sounds like you just want to generate the samples and playback from within Python?
If so, it looks like library "sounddevice" will let you write samples directly to your audio device:
https://python-sounddevice.readthedocs.io/en/0.3.15/usage.html#playback
I'm not in a python environment right now, so haven't tested, but mixing it with your sample code would just be:
import sounddevice as sd
import numpy as np
sampling_rate = 44100
freq = 440
samples = 44100
x = np.arange(samples)
y = 100*np.sin(2 * np.pi * freq * x / sampling_rate)
sd.play(y, sampling_rate)
Sounddevice's author is on SO, see his reply to a similar question: https://stackoverflow.com/a/34179010/1339735
You might have some scaling to do - not sure if it accepts values from -1 to 1 like most float playback or +/- 100 like in your example.
All of the above answers were helpful, but ultimately I found a solution to my problem from this thread: How to generate audio from a numpy array?
This was my code:
import numpy as np
from scipy.io.wavfile import write
from scipy import signal as sg
#data = np.random.uniform(-1,1,44100) # 44100 random samples between -1 and 1
sampling_rate = 44100 ## Sampling Rate
freq = 150 ## Frequency (in Hz)
duration = 3 # in seconds, may be float
t = np.linspace(0, duration, sampling_rate*duration) # Creating time vector
data = sg.sawtooth(2 * np.pi * freq * t, 0) # Sawtooth signal
'''
Scaling data to 16 bit. Divide each number by max number in array to get
fraction and multiply data by 32767 because that is the max value a 16 bit
integer can take
'''
scaled = np.int16(data/np.max(np.abs(data)) * 32767)
write('test.wav', 44100, scaled) # Write to file. Can be overridden
My main task is to recognize a human humming from a microphone in real time. As the first step to recognizing signals in general, I have made a 5 seconds recording of a 440 Hz signal generated from an app on my phone and tried to detect the same frequency.
I used Audacity to plot and verify the spectrum from the same 440Hz wav file and I got this, which shows that 440Hz is indeed the dominant frequency :
(https://i.imgur.com/2UImEkR.png)
To do this with python, I use the PyAudio library and refer this blog. The code I have so far which I run with the wav file is this :
"""PyAudio Example: Play a WAVE file."""
import pyaudio
import wave
import sys
import struct
import numpy as np
import matplotlib.pyplot as plt
CHUNK = 1024
if len(sys.argv) < 2:
print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
sys.exit(-1)
wf = wave.open(sys.argv[1], 'rb')
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
channels=wf.getnchannels(),
rate=wf.getframerate(),
output=True)
data = wf.readframes(CHUNK)
i = 0
while data != '':
i += 1
data_unpacked = struct.unpack('{n}h'.format(n= len(data)/2 ), data)
data_np = np.array(data_unpacked)
data_fft = np.fft.fft(data_np)
data_freq = np.abs(data_fft)/len(data_fft) # Dividing by length to normalize the amplitude as per https://www.mathworks.com/matlabcentral/answers/162846-amplitude-of-signal-after-fft-operation
print("Chunk: {} max_freq: {}".format(i,np.argmax(data_freq)))
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(data_freq)
ax.set_xscale('log')
plt.show()
stream.write(data)
data = wf.readframes(CHUNK)
stream.stop_stream()
stream.close()
p.terminate()
In the output, I get that the max frequency is 10 for all the chunks and an example of one of the plots is :
(https://i.imgur.com/zsAXME5.png)
I had expected this value to be 440 instead of 10 for all the chunks. I admit I know very little about the theory of FFTs and I appreciate any help in letting my solve this.
EDIT:
The sampling rate is 44100. no. of channels is 2 and sample width is also 2.
Forewords
As xdurch0 pointed out, you are reading a kind of index instead of a frequency. If you are about to make all computation by yourself you need to compute you own frequency vector before plotting if you want to get consistent result. Reading this answer may help you towards the solution.
The frequency vector for FFT (half plane) is:
f = np.linspace(0, rate/2, N_fft/2)
Or (full plane):
f = np.linspace(-rate/2, rate/2, N_fft)
On the other hand we can delegate most of the work to the excellent scipy.signal toolbox which aims to cope with this kind of problems (and many more).
MCVE
Using scipy package it is straight forward to get the desired result for a simple WAV file with a single frequency (source):
import numpy as np
from scipy import signal
from scipy.io import wavfile
import matplotlib.pyplot as plt
# Read the file (rate and data):
rate, data = wavfile.read('tone.wav') # See source
# Compute PSD:
f, P = signal.periodogram(data, rate) # Frequencies and PSD
# Display PSD:
fig, axe = plt.subplots()
axe.semilogy(f, P)
axe.set_xlim([0,500])
axe.set_ylim([1e-8, 1e10])
axe.set_xlabel(r'Frequency, $\nu$ $[\mathrm{Hz}]$')
axe.set_ylabel(r'PSD, $P$ $[\mathrm{AU^2Hz}^{-1}]$')
axe.set_title('Periodogram')
axe.grid(which='both')
Basically:
Read the wav file and get the sample rate (here 44.1kHz);
Compute the Power Spectrum Density and frequencies;
Then display it with matplotlib.
This outputs:
Find Peak
Then we can find the frequency of the first highest peak (P>1e-2, this criterion is subject to tuning) using find_peaks:
idx = signal.find_peaks(P, height=1e-2)[0][0]
f[idx] # 440.0 Hz
Putting all together it merely boils down to:
def freq(filename, setup={'height': 1e-2}):
rate, data = wavfile.read(filename)
f, P = signal.periodogram(data, rate)
return f[signal.find_peaks(P, **setup)[0][0]]
Handling multiple channels
I tried this code with my wav file, and got the error for the line
axe.semilogy(f, Pxx_den) as follows : ValueError: x and y must have
same first dimension. I checked the shapes and f has (2,) while
Pxx_den has (220160,2). Also, the Pxx_den array seems to have all
zeros only.
Wav file can hold multiple channels, mainly there are mono or stereo files (max. 2**16 - 1 channels). The problem you underlined occurs because of multiple channels file (stereo sample).
rate, data = wavfile.read('aaaah.wav') # Shape: (46447, 2), Rate: 48 kHz
It is not well documented, but the method signal.periodogram also performs on matrix and its input is not directly consistent with wavfile.read output (they perform on different axis by default). So we need to carefully orient dimensions (using axis switch) when performing PSD:
f, P = signal.periodogram(data, rate, axis=0, detrend='linear')
It also works with Transposition data.T but then we need to back transpose the result.
Specifying the axis solve the issue: frequency vector is correct and PSD is not null everywhere (before it performed on the axis=1 which is of length 2, in your case it performed 220160 PSD on 2-samples signals we wanted the converse).
The detrend switch ensure the signal has zero mean and its linear trend is removed.
Real application
This approach should work for real chunked samples, provided chunks hold enough data (see Nyquist-Shannon sampling theorem). Then data are sub-samples of the signal (chunks) and rate is kept constant since it does not change during the process.
Having chunks of size 2**10 seems to work, we can identify specific frequencies from them:
f, P = signal.periodogram(data[:2**10,:], rate, axis=0, detrend='linear') # Shapes: (513,) (513, 2)
idx0 = signal.find_peaks(P[:,0], threshold=0.01, distance=50)[0] # Peaks: [46.875, 2625., 13312.5, 16921.875] Hz
fig, axe = plt.subplots(2, 1, sharex=True, sharey=True)
axe[0].loglog(f, P[:,0])
axe[0].loglog(f[idx0], P[idx0,0], '.')
# [...]
At this point, the trickiest part is the fine tuning of find-peaks method to catch desired frequencies. You may need to consider to pre-filter your signal or post-process the PSD in order to make the identification easier.
I have simple case, i'm recording bee sound in Bee have by python pyAudio. After recording i have to split this record in 10 sec chunks and analyze this chunks by python wave, numpy or python scipy, numpy, i don't know what is easiest way.
I would like to read the record then split it to 10 sec chunk and apply fft or rfft and after that i need get dominant frequencies in Hz in chunk.
I collect this values with timestamp in histogram for histogram plot.
Right now i have some examples where i can get whole record and make plot by python matplotlib, scipy.signal but i don't know how to separated it into listo of Hz values.
If i have completely wrong way please tell me. Thx for advice.
import numpy as np
import struct, wave
def main():
audio = wave.open('wav_data/18-08-07_09_10_12.wav', 'rb')
rate = audio.getframerate()
num_frames = audio.getnframes()
dur = int(num_frames / rate)
fmt = "%ih" % rate
for x in range(dur):
data = audio.readframes(rate)
data_int = struct.unpack(fmt, data)
data_np = np.array(data_int, dtype='b')
w = np.fft.rfft(data_np)
freqs = np.fft.fftfreq(len(w))
idx = np.argmax(w)
freq = freqs[idx]
freq_in_hz = abs(freq * rate)
print(freq_in_hz)
if __name__ == "__main__":
main()
I want to read the left and rigth channel.
import wave
origAudio = wave.open("6980.wav","r")
frameRate = origAudio.getframerate()
nChannels = origAudio.getnchannels()
sampWidth = origAudio.getsampwidth()
nbframe=origAudio.getnframes()
da = np.fromstring(origAudio.readframes(48000), dtype=np.int16)
origAudio.getparams()
the parametre
(2, 3, 48000, 2883584, 'NONE', 'not compressed')
Now I want to separate left and right channel with wave file in 24 bit data
You can use wavio, a small module that I wrote to read and write WAV files using numpy arrays. In your case:
import wavio
wav = wavio.read("6980.wav")
# wav.data is the numpy array of samples.
# wav.rate is the sampling rate.
# wav.sampwidth is the sample width, in bytes. For a 24 bit file,
# wav.sampwdith is 3.
left_channel = wav.data[:, 0]
right_channel = wav.data[:, 1]
wavio is on PyPi, and the source is on github at https://github.com/WarrenWeckesser/wavio.
The parameters tell you that you have 2 channels of data at 3 bytes per sample, at 48kHz. So when you say readframes(48000) you get one second of frames which you should probably read into a slightly different data structure:
da = np.fromstring(origAudio.readframes(48000), dtype=np.uint8)
Now you should have 48000 * 2 * 3 bytes, i.e. len(da). To take only the first channel you'd do this:
chan1 = np.zeros(48000, np.uint32)
chan1bytes = chan1.view(np.uint8)
chan1bytes[0::4] = da[0::6]
chan1bytes[1::4] = da[1::6]
chan1bytes[2::4] = da[2::6]
That is, you make an array of integers, one per sample, and copy the appropriate bytes over from the source data (you could try copying directly from the result of readframes() and skip creating da).
The following code writes a simple sine at frequency 400Hz to a mono WAV file. How should this code be changed in order to produce a stereo WAV file. The second channel should be in a different frequency.
import math
import wave
import struct
freq = 440.0
data_size = 40000
fname = "WaveTest.wav"
frate = 11025.0 # framerate as a float
amp = 64000.0 # multiplier for amplitude
sine_list_x = []
for x in range(data_size):
sine_list_x.append(math.sin(2*math.pi*freq*(x/frate)))
wav_file = wave.open(fname, "w")
nchannels = 1
sampwidth = 2
framerate = int(frate)
nframes = data_size
comptype = "NONE"
compname = "not compressed"
wav_file.setparams((nchannels, sampwidth, framerate, nframes,
comptype, compname))
for s in sine_list_x:
# write the audio frames to file
wav_file.writeframes(struct.pack('h', int(s*amp/2)))
wav_file.close()
Build a parallel sine_list_y list with the other frequency / channel, set nchannels=2, and in the output loop use for s, t in zip(sine_list_x, sine_list_y): as the header clause, and a body with two writeframes calls -- one for s, one for t. IOW, corresponding frames for the two channels "alternate" in the file.
See e.g. this page for a thorough description of all possible WAV file formats, and I quote:
Multi-channel digital audio samples
are stored as interlaced wave data
which simply means that the audio
samples of a multi-channel (such as
stereo and surround) wave file are
stored by cycling through the audio
samples for each channel before
advancing to the next sample time.
This is done so that the audio files
can be played or streamed before the
entire file can be read. This is handy
when playing a large file from disk
(that may not completely fit into
memory) or streaming a file over the
Internet. The values in the diagram
below would be stored in a Wave file
in the order they are listed in the
Value column (top to bottom).
and the following table clearly shows the channels' samples going left, right, left, right, ...
For an example producing a stereo .wav file, see the test_wave.py module.
The test produces an all-zero file.
You can modify by inserting alternating sample values.
nchannels = 2
sampwidth = 2
framerate = 8000
nframes = 100
# ...
def test_it(self):
self.f = wave.open(TESTFN, 'wb')
self.f.setnchannels(nchannels)
self.f.setsampwidth(sampwidth)
self.f.setframerate(framerate)
self.f.setnframes(nframes)
output = '\0' * nframes * nchannels * sampwidth
self.f.writeframes(output)
self.f.close()
Another option is to use the SciPy and NumPy libraries. In the below example, we produce a stereo wave file where the left channel has a low-frequency tone while the right channel has a higher-frequency tone. (Note: Use VLC player to play the audio)
To install SciPy, see: https://pypi.org/project/scipy/
import numpy as np
from scipy.io import wavfile
# User input
duration=5.0
toneFrequency_left=500 #Hz (20,000 Hz max value)
toneFrequency_right=1200 #Hz (20,000 Hz max value)
# Constants
samplingFrequency=48000
# Generate Tones
time_x=np.arange(0, duration, 1.0/float(samplingFrequency))
toneLeft_y=np.cos(2.0 * np.pi * toneFrequency_left * time_x)
toneRight_y=np.cos(2.0 * np.pi * toneFrequency_right * time_x)
# A 2D array where the left and right tones are contained in their respective rows
tone_y_stereo=np.vstack((toneLeft_y, toneRight_y))
# Reshape 2D array so that the left and right tones are contained in their respective columns
tone_y_stereo=tone_y_stereo.transpose()
# Produce an audio file that contains stereo sound
wavfile.write('stereoAudio.wav', samplingFrequency, tone_y_stereo)
Environment Notes
Version Used
Python 3.7.1
Python 3.7.1
SciPy 1.1.0