I'm just getting started with pyaudio and I wrote a simple function to play a note. However the note sounds different depending on the version of Python I'm using:
from __future__ import division
import math
import pyaudio

BITS_PER_BYTE = 8 # for clarity
SAMPLE_BIT_DEPTH = 8 # i.e. each sample is 1 byte
SAMPLES_PER_SECOND = 16000
NOTE_TIME_SECONDS = 1
MIDDLE_C_HZ = 523.3
CYCLES_PER_SECOND = SAMPLES_PER_SECOND / MIDDLE_C_HZ
NUM_SAMPLES = SAMPLES_PER_SECOND * NOTE_TIME_SECONDS

def play_note():
    audio = pyaudio.PyAudio()
    stream = audio.open(
        format=audio.get_format_from_width(SAMPLE_BIT_DEPTH / BITS_PER_BYTE),
        channels=1,
        rate=SAMPLES_PER_SECOND,
        output=True,
    )
    byte_string = str()
    for i in range(NUM_SAMPLES):
        # calculate the amplitude for this frame as a float between -1 and 1
        frame_amplitude = math.sin(i / (CYCLES_PER_SECOND / math.pi))
        # scale the amplitude to an integer between 0 and 255 (inclusive)
        scaled_amplitude = int(frame_amplitude * 127 + 128)
        # convert amplitude to byte string (ascii value)
        byte_string += chr(scaled_amplitude)
    stream.write(byte_string)
    stream.close()
    audio.terminate()

if __name__ == '__main__':
    play_note()
In Python 2.7.13 I hear the correct, clear tone. In 3.6.2 it sounds rough, like a square wave.
Why is that, and how would I fix this (or at least start to debug)?
I am on OSX v10.11.6 using portaudio v19.6.0.
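It's because you're using a str when you should be using bytes. In Python 3, str holds Unicode text rather than raw bytes, so the values above 127 most likely no longer reach the stream as one byte per sample.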
This works for me:
byte_array = bytearray() # bytearray instead of str
for i in range(NUM_SAMPLES):
    frame_amplitude = math.sin(i / (CYCLES_PER_SECOND / math.pi))
    scaled_amplitude = int(frame_amplitude * 127 + 128)
    # Note the append here, not +=
    byte_array.append(scaled_amplitude)
stream.write(bytes(byte_array))
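A small sketch of the difference, for anyone debugging the same thing. It assumes the str ends up UTF-8 encoded on its way to the device (an assumption, not something verified against PyAudio here). make_samples and TONE_HZ are hypothetical names, and nothing is actually played:

```python
import math

SAMPLES_PER_SECOND = 16000
TONE_HZ = 523.3  # hypothetical tone frequency, for illustration only

def make_samples(n):
    """Return n unsigned 8-bit sine samples as a bytes object."""
    out = bytearray()
    for i in range(n):
        amplitude = math.sin(2 * math.pi * TONE_HZ * i / SAMPLES_PER_SECOND)
        out.append(int(amplitude * 127 + 128))
    return bytes(out)

samples = make_samples(1000)
as_text = "".join(chr(b) for b in samples)

# One byte per sample, as an 8-bit stream expects.
print(len(samples))
# Under the UTF-8 assumption, every value >= 128 becomes two bytes,
# so the encoded byte stream no longer matches the waveform.
print(len(as_text.encode("utf-8")))
```

The second number comes out larger than the first, which would explain a distorted tone if that is indeed what happens to the str.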
I'm looking for a simple Python tone generator to use in the following script, which runs on a Raspberry Pi with a USB sound card. It must be possible to switch the tone on and off and to change its frequency on the fly.
import serial, time

ser = serial.Serial('/dev/ttyUSB0', 9600, timeout=0.1)
inpold = 0 # previous input state

def monitor(inp=0):
    global inpold
    if inp != inpold:
        if inp != 0:
            ser.setDTR(1) # LED on (GPIO?)
            # start tone here, generate tone forever or change tone freq
        else:
            ser.setDTR(0) # LED off
            # stop tone without clicks
        inpold = inp

while True:
    time.sleep(0.01) # minimum tone pulse length: 10 milliseconds
    inp = ser.getCTS() # or GPIO input
    monitor(inp)
I've found several ways to do this, and I'll lay them out in order of feasibility (easiest to apply first).
Assumptions about the tone:
Wave type = Sinusoidal
Frequency = 440Hz
Way 1 (Offline track, no sound device/backend hassle)
1- Use Audacity (or any similar software) to create a particular tone and export it to a file.
2- In Audacity, open the "Generate" menu, choose "Tone" and enter 440 as the frequency.
3- In Audacity, open the "File" menu, choose "Export" and export in any format you like, preferably mp3, e.g. 'out.mp3'.
4- pip install playsound
5- In python
import playsound
playsound.playsound('out.mp3')
Way 2 (flexible, but got to make sure backend works fine)
1- pip install pygame
2- If you're working under a Linux environment then please make sure you install the following libraries
libsdl1.2-dev libsdl-image1.2-dev libsdl-mixer1.2-dev libsdl-ttf2.0-dev
3- In python
import numpy
import pygame
sampleRate = 44100
freq = 440
pygame.mixer.init(44100,-16,2,512)
# sampling frequency, size, channels, buffer
# Sampling frequency
# Analog audio is recorded by sampling it 44,100 times per second,
# and then these samples are used to reconstruct the audio signal
# when playing it back.
# size
# The size argument represents how many bits are used for each
# audio sample. If the value is negative then signed sample
# values will be used.
# channels
# 1 = mono, 2 = stereo
# buffer
# The buffer argument controls the number of internal samples
# used in the sound mixer. It can be lowered to reduce latency,
# but sound dropout may occur. It can be raised to larger values
# to ensure playback never skips, but it will impose latency on sound playback.
arr = numpy.array([4096 * numpy.sin(2.0 * numpy.pi * freq * x / sampleRate) for x in range(0, sampleRate)]).astype(numpy.int16)
arr2 = numpy.c_[arr,arr]
sound = pygame.sndarray.make_sound(arr2)
sound.play(-1)
pygame.time.delay(1000)
sound.stop()
Way 3 (Sinusoidal Wave)
Use this if all you need is a sinusoidal wave.
1- pip install pysine
2- If you're working under a Linux environment, make sure you install the following library:
portaudio19-dev
If you're working under a Windows environment, install pysine using pipwin instead:
pipwin install pysine
3- In python
import pysine
pysine.sine(frequency=440.0, duration=1.0)
Try pysinewave. It allows you to start, stop, and smoothly change pitch and volume of a tone.
Example:
from pysinewave import SineWave
import time
sinewave = SineWave(pitch = 12)
sinewave.play()
time.sleep(1)
sinewave.stop()
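If you would rather generate the samples yourself, the usual way to change frequency without a click is to keep a running phase instead of recomputing it from the sample index. A minimal sketch, assuming 44100 samples per second. ToneGenerator is a hypothetical class, not part of pysinewave, and it only produces the numbers (you would still hand them to an audio backend):

```python
import math

SAMPLE_RATE = 44100  # assumed output rate

class ToneGenerator:
    """Hypothetical phase-accumulating sine generator (illustration only)."""

    def __init__(self, frequency):
        self.frequency = frequency
        self.phase = 0.0  # current phase in radians

    def next_block(self, n_samples):
        """Return n_samples floats in [-1, 1], continuing from the last phase."""
        step = 2 * math.pi * self.frequency / SAMPLE_RATE
        block = []
        for _ in range(n_samples):
            block.append(math.sin(self.phase))
            self.phase = (self.phase + step) % (2 * math.pi)
        return block

gen = ToneGenerator(440.0)
first = gen.next_block(512)
gen.frequency = 880.0  # change pitch between blocks
second = gen.next_block(512)

# The waveform carries on from where the previous block stopped,
# so there is no jump at the block boundary.
print(first[-1], second[0])
```

Because the phase is carried over, the new frequency only changes how fast the phase advances. Stopping without a click is a separate matter and would still need a short fade-out.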
I spent a lot of time with pyaudio, but with pygame it is very simple. Thanks.
http://shallowsky.com/blog/programming/python-play-chords.html
import pygame, pygame.sndarray
import numpy
import scipy.signal
from time import sleep

sample_rate = 48000
pygame.mixer.pre_init(sample_rate, -16, 1, 1024)
pygame.init()

def square_wave(hz, peak, duty_cycle=.5, n_samples=sample_rate):
    # linspace needs an integer sample count
    t = numpy.linspace(0, 1, int(500 * 440 / hz), endpoint=False)
    wave = scipy.signal.square(2 * numpy.pi * 5 * t, duty=duty_cycle)
    wave = numpy.resize(wave, (n_samples,))
    # scale first, then convert to the 16-bit samples the mixer expects
    return (peak / 2 * wave).astype(numpy.int16)

def audio_freq(freq = 800):
    global sound
    sample_wave = square_wave(freq, 4096)
    sound = pygame.sndarray.make_sound(sample_wave)

# TEST
audio_freq()
sound.play(-1)
sleep(0.5)
sound.stop()
audio_freq(1000)
#sleep(1)
sound.play(-1)
sleep(2)
sound.stop()
sleep(1)
sound.play(-1)
sleep(0.5)
Here is an example of real-time tone generation using PySDL2. The UP and DOWN keys change the frequency.
import sys
import sdl2
import sdl2.ext
import math
import struct
import ctypes

basefreq = 110
nframes = 0

@ctypes.CFUNCTYPE(None, ctypes.c_void_p, ctypes.POINTER(sdl2.Uint8), ctypes.c_int)
def playNext(notused, stream, len):
    global nframes
    for i in range(0, len, 4):
        t = (nframes + i) / 44100
        left = int(math.sin(2 * math.pi * t * (basefreq - 1)) * 32000)
        right = int(math.sin(2 * math.pi * t * (basefreq + 1)) * 32000)
        stream[i] = left & 0xff
        stream[i+1] = (left >> 8) & 0xff
        stream[i+2] = right & 0xff
        stream[i+3] = (right >> 8) & 0xff
    nframes += len

def initAudio():
    spec = sdl2.SDL_AudioSpec(0, 0, 0, 0)
    spec.callback = playNext
    spec.freq = 44100
    spec.format = sdl2.AUDIO_S16SYS
    spec.channels = 2
    spec.samples = 1024
    devid = sdl2.SDL_OpenAudioDevice(None, 0, spec, None, 0)
    sdl2.SDL_PauseAudioDevice(devid, 0)

def run():
    global basefreq
    sdl2.SDL_Init(sdl2.SDL_INIT_AUDIO | sdl2.SDL_INIT_TIMER | sdl2.SDL_INIT_VIDEO)
    window = sdl2.ext.Window("Tone Generator", size=(800, 600))
    window.show()
    running = True
    initAudio()
    while running:
        events = sdl2.ext.get_events()
        for event in events:
            if event.type == sdl2.SDL_QUIT:
                running = False
                break
            elif event.type == sdl2.SDL_KEYDOWN:
                if event.key.keysym.sym == sdl2.SDLK_UP:
                    basefreq *= 2
                elif event.key.keysym.sym == sdl2.SDLK_DOWN:
                    basefreq /= 2
                break
        sdl2.SDL_Delay(20)
    return 0

if __name__ == "__main__":
    sys.exit(run())
I tried to convert the PCM data from a WAV file into a frequency chart using an FFT.
Here are my charts:
[Figures: FFT plots of the 512-sample windows at 0.00 s and 3.15 s]
The sound file is almost silent, with some knocking sounds starting at about 3 s.
I noticed that the value near 0 Hz is very high. How can that be?
Another strange point is that the value is 0 when the frequency is greater than about 16000.
Here is my code:
import soundfile as sf
import numpy as np
import math
import matplotlib.pyplot as plt

_audio_path = 'source_normal.wav'

def plot_data(pcm_data, samplerate, current_time):
    x_axis = np.arange(0, len(pcm_data) - 1) / len(pcm_data) * samplerate
    complex_data = [x+0j for x in pcm_data]
    result = np.fft.fft(complex_data)
    length = len(pcm_data) // 2
    amplitudes = [math.sqrt(x.imag * x.imag + x.real * x.real) for x in result[:length]]
    plt.plot(x_axis[:length], amplitudes)
    plt.title('{}s sample count: {}'.format(current_time, len(pcm_data)))
    plt.xlabel('{}Hz'.format(samplerate))
    plt.show()

def baz():
    data, samplerate = sf.read(_audio_path, dtype='int16')
    window = 512
    total_number_of_data = len(data)
    current_index = 0 # 144000
    while current_index < total_number_of_data:
        d = data[current_index:current_index+window]
        current_time = current_index / samplerate
        print('current time: {}'.format(current_index / samplerate))
        plot_data(d, samplerate, current_time)
        current_index += window

if __name__ == '__main__':
    baz()
I'm not familiar with DSP and have never tried it before, so I think my code has some mistakes. Please help, thank you.
Here is my sound file.
The high value you see near 0 Hz on the first plot is caused by the constant (DC) component of the window. Try normalizing: subtract the window's average from every value in it.
The trailing zeros are just amplitudes small enough to look like zeros on the plot. Print their values to make sure ;)
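To make the first point concrete, here is a small sketch on synthetic data rather than the asker's file. It uses a naive DFT from the standard library in place of np.fft.fft so that it runs without extra packages; dft_magnitudes and the test window are made up for the illustration:

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(n^2) DFT magnitudes for bins 0 .. n/2 - 1."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags

# Synthetic window: a sine in bin 4 riding on a constant offset of 1000.
n = 64
window = [1000 + 100 * math.sin(2 * math.pi * 4 * i / n) for i in range(n)]

raw = dft_magnitudes(window)

# Shift the window by its average, as suggested above.
mean = sum(window) / n
centered = [x - mean for x in window]
clean = dft_magnitudes(centered)

print(raw[0], clean[0])  # bin 0 (0 Hz) before and after the shift
print(clean[4])          # the sine's own bin is left untouched
```

Bin 0 is just the sum of the window, so any offset shows up there as a large spike. After subtracting the mean it collapses to (almost) zero while the other bins stay the same.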
I want to write a signal to a .wav file, but when I do this using scipy.io.wavfile.write, it just creates a .wav without sound.
The .wav has the right length, but there is no sound.
I looked for a solution to this problem but couldn't find any help.
My code is below:
import scipy as sp
import numpy as np
dt = np.dtype(np.int32)
sig = np.fromfile(filename, dtype=dt, count=-1, sep='')
sp.io.wavfile.write('sound.wav', int(fS), sig)
As a test, I also wrote a small function:
import math
import matplotlib.pyplot as plt

def write_wav_sin(name,fs,f):
    x = np.linspace(0,10,10*fs)
    dt = np.dtype(np.float32)
    sig = np.sin(2*math.pi*f*x, dtype=dt)
    print(type(sig[0]))
    sp.io.wavfile.write(name, fs, sig)
    plt.plot(x,sig)
With this test it works, but with my other code it doesn't.
Does anyone know why I have this problem?
Check the range of values in sig by printing sig.min() and sig.max(). The values are not scaled by wavfile.write, so it might be that you have a file with values so low that you can't hear them.
Try scaling up the 32 bit integer values, or writing the data as normalized 32 bit floating point. For example, this converts sig to 32 bit floating point values in the range [-1, 1] before saving it:
m = np.max(np.abs(sig))
sigf32 = (sig/m).astype(np.float32)
sp.io.wavfile.write('sound.wav', int(fS), sigf32)
In the end I divided my whole signal to get a much smaller maximum amplitude (my signal sometimes had an amplitude of 500000, so I divided it by 250000 before writing it to the WAV).
With that trick I can listen to the sound, but there is something weird, like additional artifacts/noise (I compared it to a .wav obtained with MATLAB from the same file).
The code I used is:
import scipy as sp
import numpy as np
dt = np.dtype(np.int32)
sig = np.fromfile(filename, dtype=dt, count=-1, sep='')
sp.io.wavfile.write('sound.wav', int(fS), sig/250000)
Here's a commented example of how to generate a basic wave file with a set duration, frequency, volume and number of samples, using NumPy and Python's wave library.
import numpy as ny
import wave

class SoundFile:
    def __init__(self, signal):
        # https://docs.python.org/3.6/library/wave.html#wave.open
        self.file = wave.open('test.wav', 'wb')
        self.signal = signal
        self.sr = 44100

    def write(self):
        # https://docs.python.org/3.6/library/wave.html#wave.Wave_write.setparams
        # (nchannels, sampwidth in bytes, framerate, nframes, comptype, compname)
        self.file.setparams( ( 1, 2, self.sr, len( self.signal ), 'NONE', 'noncompressed' ) )
        # https://docs.python.org/3.6/library/wave.html#wave.Wave_write.writeframes
        self.file.writeframes( self.signal.tobytes() )
        self.file.close()

# signal settings
duration = 4 # duration in seconds
samplerate = 44100 # Hz (samples per second)
samples = duration * samplerate # total number of samples
frequency = 440 # Hz
period = samplerate / float( frequency ) # length of one cycle, in samples
omega = ny.pi * 2 / period # angular frequency, in radians per sample
volume = 16384 # peak amplitude (the 16-bit maximum is 32767)

# create sin wave
xaxis = ny.arange( samples, dtype = float )
ydata = volume * ny.sin( xaxis * omega )

# convert to 16-bit little-endian samples, matching the 2-byte sample width set above
signal = ydata.astype( '<i2' )

# create sound file
f = SoundFile( signal )
f.write()
print( 'sound file created' )
Did my best to comment, update, and modify this source by a random blogger.
"""Play a fixed frequency sound."""
from __future__ import division
import math
from pyaudio import PyAudio

def sine_tone(frequency, duration, volume=1, sample_rate=22050):
    n_samples = int(sample_rate * duration)
    restframes = n_samples % sample_rate
    p = PyAudio()
    stream = p.open(format=p.get_format_from_width(1), # 8bit
                    channels=1, # mono
                    rate=sample_rate,
                    output=True)
    s = lambda t: volume * math.sin(2 * math.pi * frequency * t / sample_rate)
    samples = (int(s(t) * 0x7f + 0x80) for t in range(n_samples))
    for buf in zip(*[samples]*sample_rate): # write several samples at a time
        stream.write(bytes(bytearray(buf)))
    # fill remainder of frameset with silence
    stream.write(b'\x80' * restframes)
    stream.stop_stream()
    stream.close()
    p.terminate()

def playScale(scale):
    for x in scale:
        print(x)
        sine_tone(frequency = x,
                  duration = 1,
                  volume=.5,
                  sample_rate = 50000)
The playScale function accepts an array of frequencies and plays them using the sine_tone function. How do I save this series of sounds into .WAV file or a .MP3 file?
You should write all the audio data to one stream; then you can save this stream using Python's 'wave' library, which is capable of manipulating wave files.
However, with your current code I'm not sure how that would work, as you are opening a separate stream per sound / tone. You might want to pass a stream into that function so you can append to it, and save it with a different function later, when all the audio is rendered.
https://docs.python.org/2/library/wave.html
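A minimal sketch of that idea using only the standard library: each tone is rendered to bytes and appended to a single WAV file instead of being played. It assumes 16-bit mono output (the code in the question uses 8-bit samples); tone_frames, save_scale and the output path are hypothetical names chosen for the example:

```python
import math
import os
import struct
import tempfile
import wave

def tone_frames(frequency, duration, volume=0.5, sample_rate=22050):
    """Return 16-bit mono PCM bytes for one sine tone."""
    n_samples = int(sample_rate * duration)
    peak = int(volume * 32767)
    values = (int(peak * math.sin(2 * math.pi * frequency * t / sample_rate))
              for t in range(n_samples))
    return b"".join(struct.pack("<h", v) for v in values)

def save_scale(scale, path, duration=1.0, sample_rate=22050):
    """Render every frequency in scale back to back into one WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        for frequency in scale:
            wav_file.writeframes(tone_frames(frequency, duration,
                                             sample_rate=sample_rate))

# Hypothetical output location and a three-note scale.
out_path = os.path.join(tempfile.gettempdir(), "scale.wav")
save_scale([261.63, 293.66, 329.63], out_path, duration=0.5)
```

The wave module only writes WAV files. For MP3 you would need a separate encoder to convert the result afterwards.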
I'm looking for a way to find out the duration of an audio file (.wav) in Python. So far I have looked at the Python wave library, mutagen, pymedia and pymad, but I was not able to get the duration of the wav file. Pymad gave me the duration, but it's not consistent.
The duration is equal to the number of frames divided by the framerate (frames per second):
import wave
import contextlib

fname = '/tmp/test.wav'
with contextlib.closing(wave.open(fname,'r')) as f:
    frames = f.getnframes()
    rate = f.getframerate()
    duration = frames / float(rate)
    print(duration)
Regarding @edwards' comment, here is some code to produce a 2-channel wave file:
import math
import wave
import struct

FILENAME = "/tmp/test.wav"
freq = 440.0
data_size = 40000
frate = 1000.0
amp = 64000.0
nchannels = 2
sampwidth = 2
framerate = int(frate)
nframes = data_size
comptype = "NONE"
compname = "not compressed"
data = [(math.sin(2 * math.pi * freq * (x / frate)),
         math.cos(2 * math.pi * freq * (x / frate))) for x in range(data_size)]
try:
    wav_file = wave.open(FILENAME, 'w')
    wav_file.setparams(
        (nchannels, sampwidth, framerate, nframes, comptype, compname))
    for values in data:
        for v in values:
            wav_file.writeframes(struct.pack('h', int(v * amp / 2)))
finally:
    wav_file.close()
If you play the resultant file in an audio player, you'll find that it is 40 seconds in duration. If you run the code above, it also computes the duration to be 40 seconds. So I believe the number of frames is not influenced by the number of channels and the formula above is correct.
The librosa library can do this:
import librosa
librosa.get_duration(filename='my.wav')
A very simple method is to use soundfile (formerly pysoundfile).
Here's some example code of how to do this:
import soundfile as sf
f = sf.SoundFile('447c040d.wav')
print('samples = {}'.format(f.frames))
print('sample rate = {}'.format(f.samplerate))
print('seconds = {}'.format(f.frames / f.samplerate))
The output for that particular file is:
samples = 232569
sample rate = 16000
seconds = 14.5355625
This aligns with soxi:
Input File : '447c040d.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:14.54 = 232569 samples ~ 1090.17 CDDA sectors
File Size : 465k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
We can use ffmpeg to get the duration of any video or audio file.
To install ffmpeg, follow this link.
import subprocess
import re
process = subprocess.Popen(['ffmpeg', '-i', path_of_wav_file], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
stdout, stderr = process.communicate()
matches = re.search(r"Duration:\s{1}(?P<hours>\d+?):(?P<minutes>\d+?):(?P<seconds>\d+\.\d+?),", stdout.decode(), re.DOTALL).groupdict()
print(matches['hours'])
print(matches['minutes'])
print(matches['seconds'])
import os

path = "c:\\windows\\system32\\loopymusic.wav"
# open in binary mode: the header is raw bytes, not text
f = open(path, "rb")
# read the ByteRate field from the file (see the Microsoft RIFF WAVE file format)
# https://ccrma.stanford.edu/courses/422/projects/WaveFormat/
# ByteRate is located at byte offset 28
f.seek(28)
a = f.read(4)
# convert the 4 bytes in a into an integer value
# a is little endian, so proper conversion is required
byteRate = int.from_bytes(a, "little")
# get the file size in bytes
fileSize = os.path.getsize(path)
# the duration of the data, in milliseconds, is given by
# (this assumes the canonical 44-byte header)
ms = ((fileSize - 44) * 1000) // byteRate
print("File duration in milliseconds : %d" % ms)
print("File duration in H,M,S,mS : %d,%d,%d,%d" % (
    ms // (3600 * 1000), ms // (60 * 1000) % 60, ms // 1000 % 60, ms % 1000))
print("Actual sound data (in bytes) : %d" % (fileSize - 44))
f.close()
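Let T be the time between two consecutive samples, so T = 1/Fs. With n samples per channel, the duration is t = nT, i.e. t = n/Fs.
from scipy.io import wavfile
Fs, data = wavfile.read('filename.wav')
n = data.shape[0] # samples per channel; data.size would also count the extra channels of a stereo file
t = n / Fs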
I was trying to get the length of an audio file in a format other than '.wav'. I tried a few of the solutions above, but they didn't work for me.
This is what worked for me:
from pydub.utils import mediainfo
mediainfo('audiofile')['duration']
To find the length of a music file, the audioread module can be used.
Install audioread: pip install audioread
Then use this code:
import audioread
with audioread.audio_open(filepath) as f:
    totalsec = f.duration
    # split the total time into minutes and seconds
    minutes, seconds = divmod(totalsec, 60)
Another solution with pydub:
from pydub import AudioSegment
audio_seg = AudioSegment.from_wav('mywav.wav')
total_in_ms = len(audio_seg)
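This is short and needs only the standard library, so it works on all operating systems. It is only a rough estimate, though: it assumes the file holds about 200 KB of audio per second, which depends on the sample rate, bit depth and channel count:
import os
os.chdir(foo) # Get into the dir with sound
statbuf = os.stat('Sound.wav')
kbytes = statbuf.st_size / 1024 # file size in kilobytes
duration = kbytes / 200 # seconds, at roughly 200 KB per second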