I'm writing a program to check for glitches in an audio signal recorded by a computer. After audio has been detected, I would like to check for glitches in the first 5 seconds of data (corresponding to 220500 samples at a sampling rate of 44.1 kHz), move on to the next 5 seconds and check for glitches in that, then the next 5 seconds, and so on. I have a while loop which begins after audio has been detected: it reads audio samples from a stream into an array until it has 220500 samples, after which it enters an if statement where it checks for glitches in those 220500 samples (and deletes all the elements in the array afterwards). My problem is that while this is happening, audio is still being recorded by the computer but is not being read from the stream into the array, so by the time I have exited the if statement and restarted the while loop I have missed a few seconds of audio data.
while 1:
    # little endian, signed short
    snd_data = array('h', stream.read(1500))
    if byteorder == 'big':
        snd_data.byteswap()
    r.extend(snd_data)
    if len(r) == 220500 or silent:
        r = trim(r)
        data = pack('<' + ('h' * len(r)), *r)
        data = np.fromstring(data, dtype=np.int16)
        # glitch detection carried out here...
I am using PyAudio for recording the audio:
p = pyaudio.PyAudio()  # start the PyAudio class
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, input=True,
                input_device_index=1, output_device_index=6, frames_per_buffer=1500)
I would like to know: is there any way for me to continue reading from the audio stream into the array while carrying out the glitch detection in the if statement? Or, if not, is there any other way I could go about this?
The answer here is multi-threading, not multiprocessing. By using Python's threading package and its signalling primitives (events, queues), you can achieve what you want.
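For illustration, here is a minimal sketch of that idea (detect_glitches is a placeholder, and the rate/chunk/block values are taken from the question): a dedicated thread does nothing but read from the PyAudio stream into a queue, while the main thread drains the queue and runs the glitch detection, so no samples are dropped while the detection is running.

import threading
import queue
import numpy as np
import pyaudio

RATE = 44100
CHUNK = 1500
BLOCK = 220500  # 5 seconds at 44.1 kHz

q = queue.Queue()

def reader(stream):
    # capture thread: only reads from the stream, never blocks on processing
    while True:
        q.put(stream.read(CHUNK))

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, frames_per_buffer=CHUNK)
threading.Thread(target=reader, args=(stream,), daemon=True).start()

buf = np.array([], dtype=np.int16)
while True:
    # drain the queue into a numpy buffer
    buf = np.concatenate([buf, np.frombuffer(q.get(), dtype=np.int16)])
    if len(buf) >= BLOCK:
        block, buf = buf[:BLOCK], buf[BLOCK:]
        # detect_glitches(block)  # placeholder for the existing detection code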
I'm learning how to use the py-webrtcvad library for Python and trying to pick out segments containing speech in a .wav file. Right now, I don't know how to split a .wav file into smaller frames to feed into a vad object. Here's my code below:
import wave
import webrtcvad

spf = wave.open("test4.wav", "rb")
signal = spf.readframes(-1)
vad = webrtcvad.Vad(3)
sample_rate = 16000
frame_duration = 10  # ms
# TODO: turn signal into segments and run the print statement below in a for loop
print('Contains speech: %s' % (vad.is_speech(signal, sample_rate)))
As I have read from https://github.com/wiseman/py-webrtcvad, the frames fed into the vad object must be 16-bit and either 10, 20 or 30 ms long. My sample is about 9 s long and I don't know how to split it into frames of the appropriate length. I have tested example.py in the repo, but it is a bit too complicated for me, so I am trying a very simple example first.
Thank you.
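For reference, a minimal sketch of one way to split the raw bytes into fixed-length frames for the VAD (assuming the file really is 16-bit mono PCM at 16 kHz; at 10 ms per frame that is 160 samples, i.e. 320 bytes):

import wave
import webrtcvad

spf = wave.open("test4.wav", "rb")
signal = spf.readframes(-1)

vad = webrtcvad.Vad(3)
sample_rate = 16000
frame_duration = 10  # ms

# bytes per frame: samples per frame * 2 bytes per 16-bit sample
frame_bytes = int(sample_rate * frame_duration / 1000) * 2

for offset in range(0, len(signal) - frame_bytes + 1, frame_bytes):
    frame = signal[offset:offset + frame_bytes]
    print('Contains speech: %s' % vad.is_speech(frame, sample_rate))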
I basically have an audio file that is 16-bit PCM WAV, mono, at 44100 Hz, and I'm trying to convert it into a spectrogram. I want a spectrogram of the audio every 20 ms (trying this for speech recognition), but whenever I compare what I have to Audacity, it's really different. I'm kind of new to Python, so I was trying to base this off of my Java knowledge. Any help would be appreciated. I think I'm splitting the read samples incorrectly (what I did was split the array every 220 elements, since I believe audio data is just samples in the time domain, to get it to 20 ms of audio).
Here's the code I have right now:
import librosa.display
import matplotlib.pyplot as plt
import numpy

audioPath = 'C:\\Users\\pawar\\Desktop\\Resister.wav'
audioData, sampleRate = librosa.load(audioPath, sr=None)
print(sampleRate)
new = numpy.zeros(shape=(220, 1))
counter = 0
for i in range(0, len(audioData), 882):
    new = audioData[i: i + 882]
    STFT = librosa.stft(new, n_fft=882)
    print(type(STFT))
    audioDatainDB = librosa.amplitude_to_db(abs(STFT))
    print(type(audioDatainDB))
    librosa.display.specshow(audioDatainDB, sr=sampleRate, x_axis='time', y_axis='hz')
    # plt.figure(figsize=(20, 10))
    plt.show()
    counter += 1
    print("Your local debug print statement ", counter)
As for the values, I was playing around with them quite a bit trying to get it to work, but without any luck.
Here's the output:
https://i.stack.imgur.com/EVntx.png
And here's what Audacity shows:
https://i.stack.imgur.com/GIGy8.png
I know it's not 20 ms in the Audacity one, but you can see the two don't look even remotely similar.
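For comparison, here is a hedged sketch of the more usual approach, which is closer to what Audacity shows: compute a single STFT over the whole signal with a 20 ms window (882 samples at 44.1 kHz) rather than a separate STFT per chunk. The filename is a placeholder.

import librosa
import librosa.display
import matplotlib.pyplot as plt

audioData, sampleRate = librosa.load('Resister.wav', sr=None)  # placeholder path

frame = int(0.020 * sampleRate)  # 882 samples = 20 ms at 44.1 kHz
STFT = librosa.stft(audioData, n_fft=frame, hop_length=frame)  # one column per 20 ms
audioDatainDB = librosa.amplitude_to_db(abs(STFT))

librosa.display.specshow(audioDatainDB, sr=sampleRate, hop_length=frame,
                         x_axis='time', y_axis='hz')
plt.show()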
For a project involving a moving robot arm, I need a "Geiger-Müller counter"-like distance-based alarm.
For that I wrote a Python module and tried to add the possibility that, if the robot arm is on the left side of an object, the sound comes only from the left speaker, and analogously for the right side.
For that I looked into the sounddevice library, where easy channel mapping is possible.
As can be seen in code snippet 1 below, and as in the documentation, I am calling sd.play() and waiting with sd.wait() until the sound finishes.
When I use a sample sound with a length of 6 seconds, everything works fine. But if I use the desired sound, a short beep (under 1 second), the code does not work: the script ends after ~1 second without any sound.
I can fix this by adding a sleep statement (commented out in the code).
But for this context I need a playback window smaller than 0.5 seconds.
Does anyone know how I can fix or work around this?
I tried to use PyAudio instead, but wasn't able to get the mono .wav file into a stereo byte array in order to switch the audio dynamically to the left/right speaker (code snippet 2).
But there I always encounter an error ("'ascii' codec can't decode byte at position 2: ordinal not in range(128)").
Snippet 1:
import numpy as np
import sounddevice as sd
import soundfile as sf

data, fs = sf.read(args.filename, dtype='float32')
byte = np.array(data)
sd.play(byte, fs)
# time.sleep(0.5)
status = sd.wait()
Snippet 2:
data = wave.readframes(chunkSize)
stereo_signal = np.zeros([len(data), 2])
stereo_signal[:,0] = data[:]
stereo_signal[:,1] = 0
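For what it's worth, a hedged sketch of the stereo-panning part using numpy with sounddevice/soundfile instead of raw wave bytes (the filename is a placeholder; the mono signal goes to the left channel only and the right channel stays silent):

import numpy as np
import sounddevice as sd
import soundfile as sf

mono, fs = sf.read("beep.wav", dtype='float32')  # placeholder file, mono

# build an (n_samples, 2) stereo array: column 0 = left, column 1 = right
stereo = np.zeros((len(mono), 2), dtype='float32')
stereo[:, 0] = mono  # signal on the left speaker only

sd.play(stereo, fs)
sd.wait()  # block until playback finishes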
I'm experimenting with pydub, which I like very much; however, I am having a problem when splitting/joining an MP3 file.
I need to generate a series of small snippets of audio on the server, which will be sent in sequence to a web browser and played via an <audio/> element. I need the audio playback to be 'seamless', with no audible joins between the separate pieces. At the moment, however, the joins between the separate bits of audio are quite obvious: sometimes there is a short silence and sometimes a strange audio glitch.
In my proof of concept code I have taken a single large mp3 and split it into 1-second chunks as follows:
from pydub import AudioSegment
import StringIO

song = AudioSegment.from_mp3('my.mp3')
song_pos = 0
while song_pos < 100:
    p1 = song_pos * 1000
    p2 = p1 + 1000
    segment = song[p1:p2]  # 1 second of audio
    output = StringIO.StringIO()
    segment.export(output, format="mp3")
    client_data = output.getvalue()  # send this to client
    song_pos += 1
The client_data values are streamed to the browser over a long-lived http connection:
socket.send("HTTP/1.1 200 OK\r\nConnection: Keep-Alive\r\nContent-Type: audio/mp3\r\n\r\n")
and then for each new chunk of audio
socket.send(client_data)
Can anyone explain the glitches that I am hearing, and suggest a way to eliminate them?
Upgrading my comment to an answer:
The primary issue is that the MP3 codecs used by ffmpeg add silence to the end of the encoded audio (and your approach produces multiple individual audio files, so you hear that padding at every join).
If possible, use a lossless format like WAV and then reduce the file size with gzip or similar. You may also be able to use lossless audio compression (for example, FLAC), but that probably depends on how the encoder works.
I don't have a conclusive explanation for the audible artifacts you're hearing, but it could be that you're splitting the audio at a point where the signal is non-zero. If a sound begins on a sample with a value of 100 (for example), that sounds like a digital pop. MP3 compression may also alter the sound, especially at lower bit rates. If this is the issue, a 1 ms fade-in will eliminate the pop without an audible "fade" (though it could introduce other artifacts); a longer fade-in (like 20 or 50 ms) would avoid strange frequency-domain artifacts but would introduce a noticeable "fade in".
If you're willing to do a little more (coding) work, you can search for a "zero crossing" (basically, a place where the signal is naturally at a zero point) and split the audio there.
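Here is a hedged sketch of both suggestions using pydub and numpy (the 1 ms fade-in uses pydub's fade_in; the zero-crossing search assumes 16-bit mono samples and only locates candidate split points):

import numpy as np
from pydub import AudioSegment

song = AudioSegment.from_mp3('my.mp3')
chunk = song[0:1000]

# suggestion 1: a 1 ms fade-in hides the click at the chunk boundary
chunk = chunk.fade_in(1)

# suggestion 2: find zero crossings so a split can land where the signal is near zero
samples = np.array(chunk.get_array_of_samples())
zero_crossings = np.where(np.diff(np.sign(samples)) != 0)[0]  # candidate sample indices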
Probably the best approach if it's possible:
Encode the entire signal as a single, compressed file, and send the bytes (of that one file) down to the client in chunks for playback as a single stream. If you use constant-bitrate MP3 encoding (CBR) you can send almost perfectly 1-second-long chunks just by counting bytes; e.g., with 256 kbps CBR, just send 32,000 bytes (one second's worth) at a time.
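As a rough sketch of that byte-counting idea (assuming a CBR file; at 256 kbps one second of audio is 256,000 bits, i.e. 32,000 bytes; the filename is a placeholder and the socket is the one from the question):

CHUNK_BYTES = 256_000 // 8  # one second of 256 kbps CBR audio

with open('my.mp3', 'rb') as f:  # placeholder filename
    while True:
        chunk = f.read(CHUNK_BYTES)
        if not chunk:
            break
        socket.send(chunk)  # reuse the HTTP socket from the question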
So, I could be totally wrong (I don't usually mess with audio files), but it could be an indexing issue. Try
p2 = p1 + 1001
but you may need to invert the concatenation process for it to work, unless you add an extra millisecond on the end.
The only other thing I can think it could be is an artifact that enters the stream when you convert the bytes to a string. Try using the AudioSegment().raw_data property for a bytes representation of the audio.
Sound is a waveform, and you are connecting two waves that are out of phase with one another, so you get a step discontinuity, and that makes the pop.
I'm unfamiliar with this software, but codifying Nils Werner's suggestions, you might try:
from pydub import AudioSegment
import StringIO

song = AudioSegment.from_mp3('my.mp3')
song_pos = 0
# begin with a millisecond of silence
segment = AudioSegment.silent(duration=1)
# append all your pieces to it
while song_pos < 100:
    p1 = song_pos * 1000
    p2 = p1 + 1000
    # append an item to your segment with several milliseconds of crossfade
    # (AudioSegment is immutable, so keep the segment that append() returns)
    segment = segment.append(song[p1:p2], crossfade=50)
    song_pos += 1
# then pass it on to your client outside of your loop
output = StringIO.StringIO()
segment.export(output, format="mp3")
client_data = output.getvalue()  # send this to client
Depending on how low or high the frequencies of what you're joining are, you'll need to adjust the crossfade time to blend; lower frequencies will require a longer fade.
Using C or Python (Python preferred), how would I encode a binary file to audio that is then output through the headphone jack, and how would I decode that audio back to binary using input from the microphone jack? So far I have learned how to convert a text file to binary using Python. It would be similar to RTTY communication.
This is so that I can record data onto a cassette tape.
import binascii
a = open('/Users/kyle/Desktop/untitled folder/unix commands.txt', 'r')
f = open('/Users/kyle/Desktop/file_test.txt', 'w')
c = a.read()
b = bin(int(binascii.hexlify(c), 16))
f.write(b)
f.close()
From your comments, you want to process the binary data bit by bit, turning each bit into a high or low sound.
You still need to decide exactly what those high and low sounds are, and how long each one sounds for (and whether there's a gap in between, and so on). If you make it slow, like 1/4 of a second per sound, then you're treating them as notes. If you make it very fast, like 1/44100 of a second, you're treating them as samples. The human ear can't hear 44100 different sounds in a second; instead, it hears a single sound at up to 22050Hz.
Once you've made those decisions, there are two parts to your problem.
First, you have to generate a stream of samples—for example, a stream of 44100 16-bit integers for every second. For really simple things, like playing a chunk of a raw PCM file in 44k 16-bit mono format, or generating a square wave, this is trivial. For more complex cases, like playing a chunk of an MP3 file or synthesizing a sound out of sine waves and filters, you'll need some help. The audioop module, and a few others in the stdlib, can give you the basics; beyond that, you'll need to search PyPI for appropriate modules.
Second, you have to send that sample stream to the headphone jack. There's no built-in support for this in Python. On some platforms, you can do this just by opening a special file and writing to it. But, more generally, you will need to find a third-party library on PyPI.
The simpler modules work for one particular type of audio system. Mac and Windows each have their own standards, and Linux has a half dozen different ones. There are also some Python modules that talk to higher-level wrappers; you may have to install and set up the wrapper, but once you do that, your code will work on any system.
So, let's work through one really simple example. Let's say you've got PortAudio set up on your system, and you've installed PyAudio to talk to it. This code will play square waves at 441 Hz and 220.5 Hz (roughly concert A and the A an octave below it) for just under a quarter of a second per bit (just because that's really easy).
import binascii
import pyaudio

a = open('/Users/kyle/Desktop/untitled folder/unix commands.txt', 'r')
c = a.read()
b = bin(int(binascii.hexlify(c), 16))

sample_stream = []
high_note = (b'\xFF' * 100 + b'\0' * 100) * 50
low_note = (b'\xFF' * 50 + b'\0' * 50) * 100

for bit in b[2:]:
    if bit == '1':
        sample_stream.extend(high_note)
    else:
        sample_stream.extend(low_note)

sample_buffer = b''.join(sample_stream)

p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(1),  # 1 byte = 8-bit samples
                channels=1,
                rate=44100,
                output=True)
stream.write(sample_buffer)
So you want to transmit digital information using audio? Basically you want to implement a MODEM in software (even if it is pure software, it's still called a modem).
A modem (MOdulator-DEModulator) is a device that modulates an analog carrier signal to encode digital information, and also demodulates such a carrier signal to decode the transmitted information. The goal is to produce a signal that can be transmitted easily and decoded to reproduce the original digital data. Modems can be used over any means of transmitting analog signals, from light emitting diodes to radio. [wikipedia]
There are modems everywhere you need to transmit data over an analog medium, be it sound, light or radio waves. Your TV remote is probably an infrared modem.
Modems implemented in pure software are called soft-modems. Most soft-modems I see in the wild use some form of FSK modulation:
Frequency-shift keying (FSK) is a frequency modulation scheme in which digital information is transmitted through discrete frequency changes of a carrier wave. The simplest FSK is binary FSK (BFSK). BFSK uses a pair of discrete frequencies to transmit binary (0s and 1s) information. With this scheme, the "1" is called the mark frequency and the "0" is called the space frequency. [wikipedia]
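To make that concrete, here is a toy sketch of BFSK modulation with numpy (the mark/space frequencies, baud rate and sample rate are assumptions; a real soft-modem also handles synchronisation, framing and demodulation):

import numpy as np

RATE = 44100
BAUD = 100                    # bits per second (assumed)
MARK, SPACE = 1200.0, 2200.0  # '1' and '0' tone frequencies (assumed)

samples_per_bit = RATE // BAUD
t = np.arange(samples_per_bit) / RATE

def modulate(bits):
    # one sine burst per bit, frequency chosen by the bit value
    tones = [np.sin(2 * np.pi * (MARK if b == '1' else SPACE) * t) for b in bits]
    return np.concatenate(tones).astype(np.float32)

signal = modulate('1011001')  # ready to write to a sound card or .wav file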
There are very interesting applications for data transmission through the atmosphere via sound waves - I guess that is what shopkick uses to verify user presence.
For Python, check the GNU Radio project.
For a C library, look at the work of Steve Underwood (but please don't contact him with silly questions). I used his soft-modem to bootstrap a FAX-to-email gateway for Asterisk (a fax transmission is not much more than a B/W TIFF file encoded in audio for transmission over a phone line).
So, here is abarnert's code, updated to Python 3.
import binascii
import pyaudio

a = open('/home/ian/Python-programs/modem/testy_mcTest.txt', 'rb')
c = a.read()
b = bin(int(binascii.hexlify(c), 16))

sample_stream = []
high_note = (b'\xFF' * 100 + b'\0' * 100) * 50
low_note = (b'\xFF' * 50 + b'\0' * 50) * 100

for bit in b[2:]:
    if bit == '1':
        sample_stream.extend(high_note)
    else:
        sample_stream.extend(low_note)

# extending a list with bytes gives a list of ints in Python 3,
# so build the final buffer with bytes(), not str.join()
sample_buffer = bytes(sample_stream)

p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(1),
                channels=1,
                rate=44100,
                output=True)
stream.write(sample_buffer)

# stop the stream
stream.stop_stream()
stream.close()

# close PyAudio
p.terminate()
If you're looking for a library that does this, I'd recommend libquiet. It uses an existing SDR library to perform its modulation, and it provides binaries that will offer you a pipe at one end and will feed the sound right to your soundcard using PortAudio at the other. It has GMSK for "over the air" low bitrate transmissions and QAM for cable-based higher bitrate.