PyAudio Recording and Playing Back in Real Time - python

I am trying to record audio from the microphone and then play that audio through the speakers. Eventually I want to modify the audio before playing it back, but I'm having trouble taking the input data and successfully playing it back through the speakers.
The format for the input stream I'm using is Int16 and for the output stream is Float32. These were the only ones which made any sound at all (albeit a demonic one).
First I tried simply putting the input data into the output stream. This outputs a demonic sound:
import pyaudio
import numpy as np
import struct

FORMATIN = pyaudio.paInt16
FORMATOUT = pyaudio.paFloat32
CHANNELS = 1
RATE = 44100
CHUNK = 1024

audio = pyaudio.PyAudio()

# start recording
streamIn = audio.open(format=FORMATIN, channels=CHANNELS,
                      rate=RATE, input=True, input_device_index=0,
                      frames_per_buffer=CHUNK)
streamOut = audio.open(format=FORMATOUT, channels=CHANNELS,
                       rate=RATE, output=True, input_device_index=0,
                       frames_per_buffer=CHUNK)

print("recording...")
while True:
    in_data = streamIn.read(CHUNK)
    streamOut.write(in_data)
in_data is as follows when printed:
1\x00\x12\x00\x0f\x00\x05\x00\x14\x00\x1e\x00\x16\x00\x14\x00\x12\x00\x10\x00\x02\x00\xf7\xff\xf7\xff\xd4\xff\xde\xff\xf8\xff\xd3\xff\xe9\xff\x14\x00#\x00Z\x00\xb9\xfft\xff\xce\x00\x93\x01\xc2\xff\xe4\xfe\x93\x00d\x00\xca\xff\x94\x01V\x01\xc8\xffS\x00t\x00\xc4\xffi\x00\xaf\x01l\x00\xdb\xfeM\xffw\xffp\x01\xf5\xffr\xfc\x97\x00~\x02S\x00\x97\x00v\x00\x87\xfe\xb7\xfc\x81\xff\xf6\x00\xef\x00\xc4\x03\x84\x02\x99\xfd`\xfc\xe2\x01b\x03\xda\xfe\xc4\xff\xfd\x00:\x00\xc6\x00\xf1\xfcV\xfd\xf0\x02\xdc\xff&\xff\xa1\x02\xc7\xff\xf5\xfe\xa9\xfe\x99\xfa\x06\xfdo\x04\xaa\x02\x8f\xfe\xec\x00\x1b\xffZ\xfe;\x01t\xfe<\xffd\x02<\x02\x04\x02\xcd\xfd\xe8\xfd\xf3\x00i\xfcD\xfa\x86\xfe\xb3\x01\xea\x00$\x00q\x00\x03\x022\x00d\xf9\x14\xfa\x86\xfdQ\xfd\xc5\xfe\x81\x02\xc2\x02=\x01\xfc\x00\xe5\xfd\t\xff\x93\xff\x83\xffd\x00(\xfeQ\xffM\x01\xb1\x01\xde\xfdE\xfd\xfe\xff\x00\x00\x06\x00\x02\xffV\xff\xcd\xffJ\xff\xfb\xfc\x86\xfd^\x00\x8d\x00\x91\xff\xb6\xfe\xf7\x00\x95\x01E\x00\x1b\xff9\xfe8\xff\xa7\xff\xd4\xff\xdd\xff\xb0\x00\x97\x01\xe8\x00\xa7\xff\xd8\xfe\x89\xff\x0c\x00\x81\xff\x81\xfe\xd1\xfeN\x00\x1a\x01\xcb\x00\x19\x00\x90\x00`\x00\x93\xff5\xff\x9b\xff\\\x00\x08\x00\xc0\xff,\x00\xc0\x00\xba\x00\x83\x00\x0f\x00\xf5\xffY\x00\x19\
Then I tried changing in_data to Float32, but that did not work either:
in_data = np.frombuffer(in_data, np.float32)
I tried various clipping and packing of the data, none of which worked:
in_data = np.clip(in_data, -2**15+1, 2**15-1)
in_data = struct.pack('d' * 1024, *in_data)
Does anyone know how to record audio from the microphone and then output it through speakers? Thank you.

Set FORMATOUT = FORMATIN.
Currently, your code does the following:
44100 times per second, a frame is recorded.
each frame is a 16-bit signed number (16-bit LPCM), so it takes 2 bytes to encode a frame. This is the FORMATIN = pyaudio.paInt16 setting you chose.
when 1024 frames have been recorded (this takes ca. 23 ms), they are returned as a bytes object in Python consisting of 2048 bytes. You call this variable in_data.
then you pass these 2048 bytes to the output device via the .write call.
the output device works in pyaudio.paFloat32, which means it believes each frame is 32 bits (4 bytes). It concludes that you have provided 2048/4 = 512 frames for it to play back. The output stream is also set to 44100 Hz, so playback takes ca. 12 ms. The values it plays back are a mess, since it tries to interpret integers as floats. Both the frame size and the encoding mismatch, and the sound from your speaker seems to come from tormented souls in purgatory.
then the whole process repeats.
Matching the input and output formats should resolve these issues.
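The frame arithmetic above can be checked without any audio hardware. This sketch (using numpy, with made-up silent data) just reinterprets a 16-bit buffer the way the mismatched Float32 output stream would:

```python
import numpy as np

# 1024 frames of 16-bit mono audio -> 2048 bytes, as returned by streamIn.read(CHUNK)
frames = np.zeros(1024, dtype=np.int16)
raw = frames.tobytes()
print(len(raw))        # 2048 bytes

# A paFloat32 output stream reinterprets those same bytes as 4-byte floats,
# so it sees only half as many frames -- and garbage values for real audio
as_floats = np.frombuffer(raw, dtype=np.float32)
print(len(as_floats))  # 512 "frames"
```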

Audio data in 16-bit signed integer format has values between -32768 and 32767. Data in float format (32-bit or 64-bit) is in the range -1.0 to 1.0.
I would recommend doing all processing in floating point in Python, so scale with in_data = in_data / 32768.0 before processing or sending it to the output.
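A sketch of that scaling with numpy (the sine data here is a stand-in for the raw bytes that streamIn.read would return):

```python
import numpy as np

# stand-in for the raw int16 bytes returned by streamIn.read(CHUNK)
in_data = (np.sin(np.linspace(0, 2 * np.pi, 1024)) * 32767).astype(np.int16).tobytes()

# decode to int16, then scale to floats in [-1.0, 1.0] for processing
samples = np.frombuffer(in_data, dtype=np.int16).astype(np.float32) / 32768.0

# ...process samples here...

# scale back and clip before writing to an int16 output stream
out_data = (samples * 32768.0).clip(-32768, 32767).astype(np.int16).tobytes()
```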

If you are using Linux you can put
os.system("pactl load-module module-loopback latency_msec=1")
at the beginning of the script and
os.system("/usr/bin/pulseaudio --kill")
at the end.
Please let me know if this works for you.

Related

How to record microphone on macOS with pyaudio?

I made a simple voice assistant in Python with speech_recognition on Windows 10 and I wanted to port the code to macOS too.
I downloaded PortAudio and PyAudio; the code runs fine, but when I play the audio track I hear nothing :( (and speech_recognition does not detect anything when I try to use it).
I guess it's something to do with permissions and things like that... does anyone have an idea?
(I also checked that I use the right device index, and I do indeed use index 0, the MacBook built-in microphone.)
here is some code sample:
import pyaudio
import wave

chunk = 1024  # Record in chunks of 1024 samples
sample_format = pyaudio.paInt16  # 16 bits per sample
channels = 1
fs = 44100  # Record at 44100 samples per second
seconds = 3
filename = "output.wav"

p = pyaudio.PyAudio()  # Create an interface to PortAudio

print('Recording')
stream = p.open(format=sample_format,
                channels=channels,
                rate=fs,
                frames_per_buffer=chunk,
                input=True)
frames = []  # Initialize array to store frames

# Store data in chunks for 3 seconds
for i in range(0, int(fs / chunk * seconds)):
    data = stream.read(chunk)
    frames.append(data)

# Stop and close the stream
stream.stop_stream()
stream.close()
# Terminate the PortAudio interface
p.terminate()

print('Finished recording')

# Save the recorded data as a WAV file
wf = wave.open(filename, 'wb')
wf.setnchannels(channels)
wf.setsampwidth(p.get_sample_size(sample_format))
wf.setframerate(fs)
wf.writeframes(b''.join(frames))
wf.close()
I found the answer!!!
The code actually worked fine all this time; the problem was that I used Visual Studio Code, which for some reason messed up the microphone permissions.
Now I run the code from the terminal with python [filename].py and it's working great!

pydub audio glitches when splitting/joining mp3

I'm experimenting with pydub, which I like very much, however I am having a problem when splitting/joining an mp3 file.
I need to generate a series of small snippets of audio on the server, which will be sent in sequence to a web browser and played via an <audio/> element. I need the audio playback to be 'seamless' with no audible joins between the separate pieces. At the moment however, the joins between the separate bits of audio are quite obvious, sometimes there is a short silence and sometimes a strange audio glitch.
In my proof of concept code I have taken a single large mp3 and split it into 1-second chunks as follows:
song = AudioSegment.from_mp3('my.mp3')
song_pos = 0
while song_pos < 100:
    p1 = song_pos * 1000
    p2 = p1 + 1000
    segment = song[p1:p2]  # 1 second of audio
    output = StringIO.StringIO()
    segment.export(output, format="mp3")
    client_data = output.getvalue()  # send this to client
    song_pos += 1
The client_data values are streamed to the browser over a long-lived http connection:
socket.send("HTTP/1.1 200 OK\r\nConnection: Keep-Alive\r\nContent-Type: audio/mp3\r\n\r\n")
and then for each new chunk of audio
socket.send(client_data)
Can anyone explain the glitches that I am hearing, and suggest a way to eliminate them?
Upgrading my comment to an answer:
The primary issue is that MP3 codecs used by ffmpeg add silence to the end of the encoded audio (and your approach is producing multiple individual audio files).
If possible, use a lossless format like WAV and then reduce the file size with gzip or similar. You may also be able to use lossless audio compression (for example, FLAC), but that probably depends on how the encoder works.
I don't have a conclusive explanation for the audible artifacts you're hearing, but it could be that you're splitting the audio at a point where the signal is non-zero. If a sound begins abruptly at a sample value of 100 (for example), that sounds like a digital pop. The MP3 compression may also alter the sound, especially at lower bit rates. If this is the issue, a 1 ms fade-in will eliminate the pop without an audible "fade" (though it can potentially introduce other artifacts); a longer fade-in (like 20 or 50 ms) would avoid strange frequency-domain artifacts but would introduce a noticeable "fade in".
If you're willing to do a little more (coding) work, you can search for a "zero crossing" (basically, a place where the signal is at a zero point naturally) and split the audio there.
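That search is only a few lines of numpy. A sketch on a toy sine signal (with pydub you could get such an array from AudioSegment.get_array_of_samples(); the function name here is illustrative):

```python
import numpy as np

def next_zero_crossing(samples, start):
    """Index of the first sign change at or after `start`."""
    signs = np.sign(samples[start:])
    changes = np.where(np.diff(signs) != 0)[0]
    return start + int(changes[0]) + 1 if len(changes) else start

# toy signal: one sine cycle over 100 samples; the mid-cycle crossing is near sample 50
sig = np.sin(np.linspace(0, 2 * np.pi, 100, endpoint=False))
cut = next_zero_crossing(sig, 30)
# splitting at `cut` starts the next chunk at (nearly) zero amplitude, avoiding the pop
```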
Probably the best approach if it's possible:
Encode the entire signal as a single, compressed file, and send the bytes (of that one file) down to the client in chunks for playback as a single stream. If you use constant bitrate mp3 encoding (CBR) you can send almost perfectly 1-second-long chunks just by counting bytes. e.g., with 256 kbps CBR, one second is 256,000 bits = 32,000 bytes, so just send 32 kB at a time.
So, I could be totally wrong (I don't usually mess with audio files), but it could be an indexing issue. Try
p2 = p1 + 1001
though you may need to invert the concatenation process for it to work, unless you add an extra millisecond on the end.
The only other thing I can think of is an artifact entering the stream when you convert the bytes to a string. Try using the AudioSegment().raw_data property for a bytes representation of the audio.
Sound is a waveform, and you are joining two waves that are out of phase with one another; the result is a step discontinuity, and that makes the pop.
I'm unfamiliar with this software but codifying Nils Werner's suggestions, you might try:
song = AudioSegment.from_mp3('my.mp3')
song_pos = 0
# begin with a millisecond of silence
segment = AudioSegment.silent(duration=1)
# append all your pieces to it
while song_pos < 100:
    p1 = song_pos * 1000
    p2 = p1 + 1000
    # append an item to your segment with several milliseconds of crossfade
    # (append returns a new AudioSegment, so reassign it)
    segment = segment.append(song[p1:p2], crossfade=50)
    song_pos += 1

# then pass it on to your client outside of your loop
output = StringIO.StringIO()
segment.export(output, format="mp3")
client_data = output.getvalue()  # send this to client
Depending on how low or high the frequency of what you're joining is, you'll need to adjust the crossfade time to blend; lower frequencies require a longer fade.

Binary string to wav file

I'm trying to write a wav upload function for my webapp. The front-end portion seems to be working great. The problem is my backend (Python). When it receives the binary data, I'm not sure how to write it to a file. I tried using the basic write function, and the sound is corrupt... it sounds like gobbledygook. Is there a special way to write wav files in Python?
Here is my backend... Not really much to it.
form = cgi.FieldStorage()
fileData = str(form.getvalue('data'))
with open("audio", 'w') as file:
    file.write(fileData)
I even tried...
with open("audio", 'wb') as file:
    file.write(fileData)
I am using aplay to play the sound, and I noticed that all the properties are messed up as well.
Before:
Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
After upload:
Unsigned 8 bit, Rate 8000 Hz, Mono
Perhaps the wave module might help?
import wave
import struct
import numpy as np

rate = 44100

def sine_samples(freq, dur):
    # Get (sample rate * duration) samples on X axis (between freq
    # oscillations of 2*pi)
    X = (2 * np.pi * freq / rate) * np.arange(rate * dur)
    # Get sine values for these X axis samples (as integers)
    S = (32767 * np.sin(X)).astype(int)
    # Pack integers as signed "short" integers (-32767 to 32767)
    as_packed_bytes = map(lambda v: struct.pack('h', v), S)
    return as_packed_bytes

def output_wave(path, frames):
    # Python 3.x allows the use of the with statement:
    # with wave.open(path, 'w') as output:
    #     # Set parameters for output WAV file
    #     output.setparams((1, 2, rate, 0, 'NONE', 'not compressed'))
    #     output.writeframes(frames)
    output = wave.open(path, 'w')
    # params are (nchannels, sampwidth, framerate, nframes, comptype, compname);
    # the data generated above is mono, so nchannels must be 1
    output.setparams((1, 2, rate, 0, 'NONE', 'not compressed'))
    output.writeframes(frames)
    output.close()

def output_sound(path, freq, dur):
    # join the packed bytes into a single bytes frame
    frames = b''.join(sine_samples(freq, dur))
    # output frames to file
    output_wave(path, frames)

output_sound('sine440.wav', 440, 2)
EDIT:
I think in your case, you might only need:
packedData = map(lambda v: struct.pack('h', v), fileData)
frames = b''.join(packedData)
output_wave('example.wav', frames)
In this case, you just need to know the sampling rate. Check the wave module for information on the other output file parameters (i.e. the arguments to the setparams method).
The code I pasted will write a wav file as long as the data isn't corrupt. It was not necessary to use the wave module.
with open("audio", 'w') as file:
    file.write(fileData)
I was originally reading the file in JavaScript with FileAPI.readAsBinaryString. I changed this to FileAPI.readAsDataURL and then decoded it in Python using the base64 module. Once I decoded it, I was able to just write the data to a file, and the .wav file was in perfect condition.
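For reference, the decode step described above can be sketched like this (the data-URL header and filename are illustrative; real payloads come from the browser):

```python
import base64

# FileReader.readAsDataURL produces "data:audio/wav;base64,<payload>";
# here the payload is built from dummy bytes for illustration
data_url = "data:audio/wav;base64," + base64.b64encode(b"RIFF....WAVEfmt ").decode("ascii")

# strip everything up to the first comma, then base64-decode the rest
header, _, payload = data_url.partition(",")
wav_bytes = base64.b64decode(payload)

# write the raw bytes in binary mode
with open("audio.wav", "wb") as f:
    f.write(wav_bytes)
```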

How to stream a numpy array into a pyaudio stream?

I'm writing code that is supposed to give some audio output to the user based on his action, and I want to generate the sound rather than having a fixed number of wav files to play. Right now, I generate the signal in numpy format, store the data in a wav file, and then read the same file into pyaudio. I think this is redundant; however, I couldn't find a way around it. My question is: can I stream a numpy array (or a regular list) directly into pyaudio to play it?
If it's just playback and does not need to be synchronised to anything, then you can just do the following:
import numpy as np
import pyaudio

p = pyaudio.PyAudio()

# Open stream with correct settings
stream = p.open(format=pyaudio.paFloat32,
                channels=CHANNELS,
                rate=48000,
                output=True,
                output_device_index=1
                )
# Assuming you have a numpy array called samples
data = samples.astype(np.float32).tobytes()
stream.write(data)
I use this method and it works fine for me. If you need to record at the same time then this won't work.
If you are just looking to generate audio tones then below code may be useful,
It does need pyaudio, which can be installed with
pip install pyaudio
Sample Code
# Play a fixed frequency sound.
import math
import pyaudio

# See http://en.wikipedia.org/wiki/Bit_rate#Audio
BITRATE = 44100  # number of frames per second/frameset
# See http://www.phy.mtu.edu/~suits/notefreqs.html
FREQUENCY = 2109.89  # Hz (waves per second); 261.63 = C4 note
LENGTH = 1.2  # seconds to play sound

NUMBEROFFRAMES = int(BITRATE * LENGTH)
RESTFRAMES = NUMBEROFFRAMES % BITRATE
WAVEDATA = bytearray()

# 8-bit unsigned samples: silence is 128, full scale is 0..255
for x in range(NUMBEROFFRAMES):
    WAVEDATA.append(int(math.sin(x / ((BITRATE / FREQUENCY) / math.pi)) * 127 + 128))
# fill the remainder of the frameset with silence
for x in range(RESTFRAMES):
    WAVEDATA.append(128)

p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(1),
                channels=1,
                rate=BITRATE,
                output=True)
stream.write(bytes(WAVEDATA))
stream.stop_stream()
stream.close()
p.terminate()
Code is slightly modified from this askubuntu site
You can stream the data directly through pyaudio; there is no need to write and read a .wav file.
import numpy as np
import pyaudio

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32,
                channels=1,
                rate=44100,
                frames_per_buffer=1024,
                output=True,
                output_device_index=1
                )
samples = np.sin(np.arange(50000) / 20)
# tobytes() replaces the deprecated tostring()
stream.write(samples.astype(np.float32).tobytes())
stream.close()

Encode binary to audio in Python or C

Using C or Python (Python preferred), how would I encode a binary file to audio that is then output through the headphone jack? And how would I decode the audio back to binary using input from the microphone jack? So far I have learned how to convert a text file to binary using Python. It would be similar to RTTY communication.
This is so that I can record data onto a cassette tape.
import binascii
a = open('/Users/kyle/Desktop/untitled folder/unix commands.txt', 'r')
f = open('/Users/kyle/Desktop/file_test.txt', 'w')
c = a.read()
b = bin(int(binascii.hexlify(c), 16))
f.write(b)
f.close()
From your comments, you want to process the binary data bit by bit, turning each bit into a high or low sound.
You still need to decide exactly what those high and low sounds are, and how long each one sounds for (and whether there's a gap in between, and so on). If you make it slow, like 1/4 of a second per sound, then you're treating them as notes. If you make it very fast, like 1/44100 of a second, you're treating them as samples. The human ear can't hear 44100 different sounds in a second; instead, it hears a single sound at up to 22050Hz.
Once you've made those decisions, there are two parts to your problem.
First, you have to generate a stream of samples—for example, a stream of 44100 16-bit integers for every second. For really simple things, like playing a chunk of a raw PCM file in 44k 16-bit mono format, or generating a square wave, this is trivial. For more complex cases, like playing a chunk of an MP3 file or synthesizing a sound out of sine waves and filters, you'll need some help. The audioop module, and a few others in the stdlib, can give you the basics; beyond that, you'll need to search PyPI for appropriate modules.
Second, you have to send that sample stream to the headphone jack. There's no built-in support for this in Python. On some platforms, you can do this just by opening a special file and writing to it. But, more generally, you will need to find a third-party library on PyPI.
The simpler modules work for one particular type of audio system. Mac and Windows each have their own standards, and Linux has a half dozen different ones. There are also some Python modules that talk to higher-level wrappers; you may have to install and set up the wrapper, but once you do that, your code will work on any system.
So, let's work through one really simple example. Let's say you've got PortAudio set up on your system, and you've installed PyAudio to talk to it. This code will play square waves of 441 Hz and 220.5 Hz (roughly A4 and A3) for just under 1/4 of a second (just because that's really easy).
import binascii
import pyaudio

a = open('/Users/kyle/Desktop/untitled folder/unix commands.txt', 'r')
c = a.read()
b = bin(int(binascii.hexlify(c), 16))

sample_stream = []
high_note = (b'\xFF' * 100 + b'\0' * 100) * 50
low_note = (b'\xFF' * 50 + b'\0' * 50) * 100

for bit in b[2:]:
    if bit == '1':
        sample_stream.extend(high_note)
    else:
        sample_stream.extend(low_note)

sample_buffer = b''.join(sample_stream)

p = pyaudio.PyAudio()
# width is in bytes, so 8-bit samples are width 1
stream = p.open(format=p.get_format_from_width(1),
                channels=1,
                rate=44100,
                output=True)
stream.write(sample_buffer)
So you want to transmit digital information using audio? Basically you want to implement a MODEM in software (no matter if it is pure software, it's still called modem).
A modem (MOdulator-DEModulator) is a device that modulates an analog carrier signal to encode digital information, and also demodulates such a carrier signal to decode the transmitted information. The goal is to produce a signal that can be transmitted easily and decoded to reproduce the original digital data. Modems can be used over any means of transmitting analog signals, from light emitting diodes to radio. [wikipedia]
There are modems everywhere you need to transmit data over an analog media, be it sound, light or radio waves. Your TV remote probably is an infrared modem.
Modems implemented in pure software are called soft-modems. Most soft-modems I see in the wild are using some form of FSK modulation:
Frequency-shift keying (FSK) is a frequency modulation scheme in which digital information is transmitted through discrete frequency changes of a carrier wave. The simplest FSK is binary FSK (BFSK). BFSK uses a pair of discrete frequencies to transmit binary (0s and 1s) information. With this scheme, the "1" is called the mark frequency and the "0" is called the space frequency. The time domain of an FSK modulated carrier is illustrated in the figures to the right. [wikipedia]
There are very interesting applications for data transmission through atmosphere via sound waves - I guess it is what shopkick uses to verify user presence.
For Python check the GnuRadio project.
For a C library, look at the work of Steve Underwood (but please don't contact him with silly questions). I used his soft-modem to bootstrap a FAX to email gateway for Asterisk (a fax transmission is not much more than a B/W TIFF file encoded in audio for transmission over a phone line).
So, here is Abarnert's code, updated to python3.
import binascii
import pyaudio

a = open('/home/ian/Python-programs/modem/testy_mcTest.txt', 'rb')
c = a.read()
b = bin(int(binascii.hexlify(c), 16))

sample_stream = []
high_note = (b'\xFF' * 100 + b'\0' * 100) * 50
low_note = (b'\xFF' * 50 + b'\0' * 50) * 100

for bit in b[2:]:
    if bit == '1':
        sample_stream.extend(high_note)
    else:
        sample_stream.extend(low_note)

# in Python 3, extending with bytes yields a list of ints (0-255);
# bytes() packs them back into a buffer that stream.write accepts
sample_buffer = bytes(sample_stream)

p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(1),
                channels=1,
                rate=44100,
                output=True)
stream.write(sample_buffer)

# stop stream
stream.stop_stream()
stream.close()
# close PyAudio
p.terminate()
If you're looking for a library that does this, I'd recommend libquiet. It uses an existing SDR library to perform its modulation, and it provides binaries that will offer you a pipe at one end and will feed the sound right to your soundcard using PortAudio at the other. It has GMSK for "over the air" low bitrate transmissions and QAM for cable-based higher bitrate.
