Speeding up the sound processing algorithm - python

I use the following code to do some immediate sound processing/analysis. It works, but it is really slow (compared to the planned speed). I have added some time markers to find out where the problem is, and according to them there shouldn't be any. The typical duration (see below) is <0.01 s for all three computed times, but it still takes around a second to complete the loop. Where is the problem?
Edit: Please note that the time measurement is not the real issue here. To prove it: MyPeaks basically just finds the maximum of a pretty short FFT, nothing expensive. And the problem persists even when these routines are commented out.
Should I use something other than a lambda function to drive the cycle?
Did I make some mistake when starting and recording the stream?
etc.
import pyaudio
import struct
import mute_alsa
import time
import numpy as np
from Tkinter import *

def snd_process(k=0):
    if k<1000:
        t0=time.clock()
        data = stream.read(CHUNK)
        t1=time.clock()
        fl=CHUNK
        int_data = struct.unpack("%sh" %str(fl),data)
        ft=np.fft.fft(int_data)
        ft=np.fft.fftshift(ft)
        ft=np.abs(ft)
        t2=time.clock()
        pks=MyPeaks(np.log(ft))
        freq_out.configure(text=str(pks))
        t3=time.clock()
        print t1-t0, t2-t1, t3-t2
        master.after(1, lambda: snd_process(k+1))

CHUNK = 8000
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 4000
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

#Tkinter stuff
master=Tk()
button_play=Button(master, command=snd_process, bg="yellow", text="Analyze")
button_play.grid(row=0, column=0)
freq_out = Label(master)
freq_out.grid(row=0, column=1)
freq_out.configure(text='base')
mainloop()

You are scheduling 1000 callbacks in the Tk main thread, and each one is scheduled with a 1 ms delay (the first argument of after() is the delay in milliseconds). That means the last iteration will start roughly 1000 ms (1 second) after the first one, even if the processing itself took no time at all.
Maybe that is why the loop still takes around a second to complete.
So, try using after_idle() instead. I don't think you really need to speed up the sound processing algorithm, because numpy is already quite efficient.
[EDIT]
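Surprise! At every iteration you read CHUNK = 8000 frames of 16-bit audio from a stream running at RATE = 4000 frames per second. stream.read() blocks until that much audio has actually been captured, so each read takes CHUNK/RATE seconds of real time, no matter how fast the rest of your code is.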

Squeezing I/O and calculations into the main loop like you are doing is the classical solution. But there are alternatives.
Do the audio gathering and calculations in a second thread. Since both I/O and numpy should release the GIL, this might be a good alternative here. There is a caveat, though: GUI toolkits like Tkinter are generally not thread-safe, so you should not make Tkinter calls from the second thread. Instead, set up a function that is called via after() to check the progress of the calculation and update the UI, say every 100 ms.
Do the audio gathering and calculations in a separate multiprocessing.Process. This makes it completely independent of your GUI. You will have to set up a communication channel, such as a Queue, to send the pks back to the main process. Use an after() callback to check whether the Queue has data available and to update the display if so.
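A minimal sketch of the second-thread variant, written for Python 3 (queue module, tkinter naming) rather than the question's Python 2. Assumptions: peak_bin is a hypothetical stand-in for the question's MyPeaks, fake_read_block is a hypothetical stand-in for stream.read(), and the Tk polling function appears only as a comment so the sketch runs without a display. The multiprocessing variant has the same shape with multiprocessing.Process and multiprocessing.Queue.

```python
import queue
import threading
import time

import numpy as np

RATE = 4000   # frames per second (value from the question)
CHUNK = 400   # frames per read; smaller than the question's 8000 so the demo is quick

def peak_bin(samples):
    """Index of the strongest FFT bin; a hypothetical stand-in for MyPeaks()."""
    return int(np.argmax(np.abs(np.fft.rfft(samples))))

def audio_worker(read_block, results, stop):
    """Background thread: blocking reads and numpy work only, never Tkinter calls."""
    while not stop.is_set():
        # On real hardware: np.frombuffer(stream.read(CHUNK), dtype=np.int16)
        samples = read_block()
        results.put(peak_bin(samples))

def latest_result(results):
    """Called from the GUI thread; drains the queue without ever blocking."""
    latest = None
    while True:
        try:
            latest = results.get_nowait()
        except queue.Empty:
            return latest

# In the GUI thread the polling function would look roughly like:
#     def poll():
#         pks = latest_result(results)
#         if pks is not None:
#             freq_out.configure(text=str(pks))
#         master.after(100, poll)

def fake_read_block():
    """Hypothetical stand-in for stream.read(): a 500 Hz tone that takes 10 ms to 'record'."""
    time.sleep(0.01)
    n = np.arange(CHUNK)
    return np.sin(2 * np.pi * 500 * n / RATE)

results = queue.Queue()
stop = threading.Event()
worker = threading.Thread(target=audio_worker, args=(fake_read_block, results, stop), daemon=True)
worker.start()
time.sleep(0.1)  # the GUI thread would be running mainloop() here
stop.set()
worker.join()
pks = latest_result(results)  # 500 Hz * CHUNK / RATE = bin 50
print(pks)
```

The GUI thread only ever calls the non-blocking latest_result(), so a slow read can delay new results but cannot freeze the window.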

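Depending on the OS you're running on, you might not be measuring actual wall-clock time: on Linux and other Unix systems, time.clock() returns the processor time used by the process, so time spent blocked in stream.read() waiting for audio does not show up at all. See http://pythoncentral.io/measure-time-in-python-time-time-vs-time-clock/ for some details. Note that as of Python 3.3 time.clock is deprecated (it was removed in Python 3.8), and time.process_time() or time.perf_counter() is recommended instead.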
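A small Python 3 sketch of the difference between the two clocks. Here time.sleep() stands in for a blocking stream.read(); that is an assumption for the demo, but both wait without using CPU, so a CPU-time clock sees almost nothing.

```python
import time

def measure(block):
    """Return (wall-clock seconds, CPU seconds) spent in block()."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    block()
    return time.perf_counter() - wall0, time.process_time() - cpu0

# The thread waits for 0.2 s but does no work while waiting.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall-clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```

This mirrors the question: the three differences printed with time.clock() stay tiny even though each iteration takes a long time in real time.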

Related

In python, how can I run a function without the program waiting for its completion?

In Python, I am making a cube game (like Minecraft pre-classic) that renders chunk by chunk (16x16 blocks). It only renders blocks that are exposed (not covered on all sides). Even though this method is fast when the terrain is low (like 16x16x2, which is 512 blocks in total), once I make the terrain higher (like 16x16x64, which is 16384 blocks in total), rendering each chunk takes roughly 0.03 seconds, meaning that when I render multiple chunks at once the game freezes for about a quarter of a second. I want to render the chunks "asynchronously", meaning that the program will keep on drawing frames and calling the chunk render function multiple times, no matter how long it takes.
I tried to make another program in order to test it:
import threading

def run():
    n=1
    for i in range(10000000):
        n += 1
    print(n)

print("Start")
threading.Thread(target=run()).start()
print("End")
I know that creating such a lot of threads is not the best solution, but nothing else worked.
Threading, however, didn't work, as this is what the output looked like:
>>> Start
>>> 10000001
>>> End
It also took about a quarter of a second to complete, which is about how long the multiple chunk rendering takes.
Then I tried to use async:
import asyncio

async def run():
    n = 1
    for i in range(10000000):
        n += 1
    print(n)

print("Start")
asyncio.run(run())
print("End")
It did the exact same thing.
My questions are:
Can I run a function without stopping/pausing the program execution until it's complete?
Did I use the above correctly?
Yes. No. The answer is complicated.
First, your example has at least one error in it:
print("Start")
threading.Thread(target=run).start() #notice the missing parenthesis after run
print("End")
You can of course use multithreading for your game, but it comes at the cost of code complexity (because of synchronization), and you might not gain any performance because of the GIL.
asyncio is probably not right for this job either, since you don't need to run many tasks concurrently, and, like multithreading, it cannot run CPU-bound Python code in parallel.
The usual solution for this kind of problem is to divide your work into small batches and only process the next batch if you have time to do so on the same frame, kind of like so:
def runBatch(batch):
    for x in batch:
        print(x)

batches = [range(x, x + 200) for x in range(0, 10000, 200)]
while True:  # main loop
    # timeToNextFrame() and renderFrame() stand for your game's own timing and drawing code
    while batches and timeToNextFrame() > 15:
        runBatch(batches.pop())
    renderFrame()  # or whatever
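A runnable variant of the batching idea, using time.perf_counter() for a per-frame deadline. FRAME_BUDGET, make_batches, run_batch and do_frame_work are illustrative names, and squaring numbers is only a placeholder for real chunk-building work:

```python
import time
from collections import deque

FRAME_BUDGET = 0.005  # seconds of chunk work allowed per frame (an assumed value)

def make_batches(n_items, batch_size):
    """Split work items 0..n_items-1 into batches of at most batch_size."""
    return deque(range(start, min(start + batch_size, n_items))
                 for start in range(0, n_items, batch_size))

def run_batch(batch, out):
    for x in batch:
        out.append(x * x)  # placeholder for building one slice of chunk geometry

def do_frame_work(batches, out, budget=FRAME_BUDGET):
    """Run whole batches until this frame's time budget is spent; return how many ran."""
    deadline = time.perf_counter() + budget
    ran = 0
    while batches and time.perf_counter() < deadline:
        run_batch(batches.popleft(), out)
        ran += 1
    return ran

batches = make_batches(10000, 200)
results = []
frames = 0
while batches:
    do_frame_work(batches, results)
    frames += 1  # renderFrame() would go here, so drawing never waits for all chunks
print(frames, "frames to finish", len(results), "items")
```

A batch is never interrupted once started, so if a single batch can take longer than the budget, the fix is to make the batches smaller.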
However, in this instance, optimizing the algorithm itself could be even better than any other option. One thing Minecraft does is subdivide chunks into subchunks (you can mostly ignore subchunks that are full of blocks). Another is that it only considers the visible surfaces of the blocks (it renders only those sides of a block that could be visible, not the whole block).
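A sketch of the "visible sides only" idea with numpy: count the faces of solid blocks whose neighbour is air. This is not Minecraft's actual implementation; it assumes the chunk is stored as a 3-D boolean array and that everything outside the chunk is air, and visible_face_count is a hypothetical helper name:

```python
import numpy as np

def visible_face_count(solid):
    """Count faces of solid blocks that touch air.

    `solid` is a 3-D bool array for one chunk. Everything outside the chunk is
    treated as air (an assumption; a real game would check the neighbouring chunk).
    """
    padded = np.pad(solid, 1, mode="constant", constant_values=False)
    core = padded[1:-1, 1:-1, 1:-1]
    faces = 0
    for axis in range(3):
        for shift in (1, -1):
            # After cropping, neighbour[i] is the block one step away from core[i]
            neighbour = np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
            faces += int(np.count_nonzero(core & ~neighbour))
    return faces

chunk = np.ones((16, 16, 64), dtype=bool)  # a completely filled 16x16x64 chunk
print(visible_face_count(chunk), "visible faces instead of", 6 * chunk.size)
```

For a full chunk only the outer shell needs geometry, which is the same observation behind skipping subchunks that are full of blocks.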
asyncio only runs things asynchronously when your function is waiting on an I/O task, like a network call or disk I/O.
For non-I/O tasks to execute concurrently, use threads instead: create all your threads, then wait for them to complete their tasks using the thread's join() method.
from threading import Thread
import time

def draw_pixels(arg):
    time.sleep(arg)
    print(arg)

threads = []
args = [1,2,3,4,5]
for arg in args:
    t = Thread(target=draw_pixels, args=(arg, ))
    t.start()
    threads.append(t)

# join all threads
for t in threads:
    t.join()
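To see the "only while waiting" point in action, here is a small Python 3.7+ sketch: two coroutines interleave only where one of them awaits something (asyncio.sleep(0) here). Without an await, the event loop has no chance to switch, which is what happens with a CPU-bound loop like the one in the question. The worker and main names are illustrative:

```python
import asyncio

async def worker(name, steps, log, yield_control):
    for i in range(steps):
        log.append(f"{name}{i}")
        if yield_control:
            await asyncio.sleep(0)  # suspension point: the event loop may switch tasks here

async def main(yield_control):
    log = []
    await asyncio.gather(worker("a", 3, log, yield_control),
                         worker("b", 3, log, yield_control))
    return log

print(asyncio.run(main(yield_control=True)))   # tasks take turns
print(asyncio.run(main(yield_control=False)))  # first task runs to completion first
```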

Python: Eliminating gaps between segments of recorded audio

I am using the Python sounddevice library to record audio, but I can't seem to eliminate ~0.25 to ~0.5 second gaps between what should be consecutive audio files. I think this is because the file writing takes up time, so I learned to use multiprocessing and Queues to separate out the file writing, but it hasn't helped. The most confusing thing is that the logs suggest that the iterations in main()'s loop are near gapless (only 1-5 milliseconds apart), but mysteriously the audio_capture function is taking longer than expected even though nothing else significant is being done. I tried to reduce the script as much as possible for this post. My research has all pointed to this threading/multiprocessing approach, so I am flummoxed.
Background: Python 3.7 on Raspbian Buster
I am dividing the data into segments so that the files are not too big, and I imagine this is a common challenge. I also have 4 other subprocesses doing various things afterwards.
Log: the audio_capture part should take exactly 10:00 (10 minutes)
08:26:29.991 --- Start of segment #0
08:36:30.627 --- End of segment #0 <<<<< This is >0.6 s later than it should be
08:36:30.629 --- Start of segment #1 <<<<< This is near gapless with the prior event
Script:
import logging
import sounddevice
from scipy.io.wavfile import write
import time
import os
from multiprocessing import Queue, Process

# this process is a near endless loop
def main():
    fileQueue = Queue()
    writerProcess = Process(target=writer, args=(fileQueue,))
    writerProcess.start()
    for i in range(9000):
        fileQueue.put(audio_capture(i))
    writerProcess.join()

# This func makes an audio data object from a sound source
def audio_capture(i):
    cycleNumber = str(i)
    logging.debug('Start of segment #' + cycleNumber)
    # each cycle is 10 minutes at 32000Hz sample rate
    audio = sounddevice.rec(frames=600 * 32000, samplerate=32000, channels=2)
    name = time.strftime("%H-%M-%S") + '.wav'
    path = os.path.join('/audio', name)
    sounddevice.wait()
    logging.debug('End of segment #' + cycleNumber)
    return [audio, path]

# This function writes the files.
def writer(input_queue):
    while True:
        try:
            parameters = input_queue.get()
            audio = parameters[0]
            path = parameters[1]
            write(filename=path, rate=32000, data=audio)
            logging.debug('File is written')
        except:
            pass

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format='%(asctime)s.%(msecs)03d --- %(message)s', datefmt='%H:%M:%S',handlers=[logging.FileHandler('/audio/log.txt'), logging.StreamHandler()])
    main()
The documentation tells us that sounddevice.rec() is not meant for gapless recording:
If you need more control (e.g. block-wise gapless recording, overlapping recordings, …), you should explicitly create an InputStream yourself. If NumPy is not available, you can use a RawInputStream.
There are multiple examples for gapless recording in the example programs.
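A minimal sketch of that approach, assuming sounddevice's InputStream with a callback(indata, frames, time, status) as in its example programs. The callback only copies each block into a queue.Queue; segments() then cuts the continuous stream into fixed-length pieces and carries the remainder over, so no frames are lost between files. record_segments and handle_segment are hypothetical names, and sounddevice is imported inside the function so the segmenting logic can be exercised with fake blocks:

```python
import queue

import numpy as np

def make_callback(blocks):
    """InputStream callback: copy each incoming block into a queue and return quickly."""
    def callback(indata, frames, time_info, status):
        if status:
            print(status)
        blocks.put(indata.copy())  # indata's buffer is reused, so keep a copy
    return callback

def segments(blocks, seg_frames):
    """Cut an iterable of (frames, channels) blocks into segments of exactly seg_frames.

    Leftover frames are carried into the next segment, so no sample is dropped.
    """
    pending, count = [], 0
    for block in blocks:
        pending.append(block)
        count += len(block)
        while count >= seg_frames:
            data = np.concatenate(pending)
            yield data[:seg_frames]
            pending = [data[seg_frames:]]
            count = len(pending[0])

def record_segments(handle_segment, samplerate=32000, channels=2, seg_seconds=600):
    """Record without gaps; handle_segment should only hand the array off (e.g. to a Queue)."""
    import sounddevice as sd  # imported here so the numpy-only parts run without audio hardware
    blocks = queue.Queue()
    with sd.InputStream(samplerate=samplerate, channels=channels,
                        callback=make_callback(blocks)):
        for seg in segments(iter(blocks.get, None), samplerate * seg_seconds):
            handle_segment(seg)

# Demo of the segmenting logic with fake 1000-frame stereo blocks.
fake_blocks = [np.column_stack([np.arange(i, i + 1000)] * 2) for i in range(0, 10000, 1000)]
segs = list(segments(iter(fake_blocks), 2500))
print([s.shape for s in segs])
```

In the question's setup, handle_segment could simply be fileQueue.put, leaving the WAV writing to the existing writer process; the stream keeps recording while a segment is being handed off.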
Use PyAudio and open a non-blocking audio stream; you can find a very good basic example on the PyAudio documentation front page. Choose a buffer size; I recommend 512 or 1024. Now just append the incoming data to a numpy array. I sometimes store up to 30 seconds of audio in one numpy array. When you reach the end of a segment, create another empty numpy array and start over. Create a thread and save the first segment somewhere. The recording will continue and not one sample will be dropped ;)
Edit: if you want to write 10 minutes to one file, I would suggest creating 10 arrays of 1 minute each, then concatenating and saving them.

Spawning Python Threads at EXACTLY the Same Time?

Right now some friends and I are making a program that generates music using square waves in Python (we're still very early in development). One of the roadblocks along the way was that we found that PyAudio will only play one sound at a time, and if you try to play sounds over each other, e.g. to make a chord, the sounds just overwrite each other. Our current strategy is to use threading to get around it, and it almost works, but the timing of when the threads start is very slightly off. Here is a snippet of our code that generates a C major chord:
import numpy as np
import pyaudio
import math
from scipy import signal
import multiprocessing
from time import time

def noteTest(frequency):
    l = np.linspace(0, 2, 384000, endpoint=False)
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=192000, output=True)
    wave_data = signal.square(2 * math.pi * frequency * l)
    stream.write(wave_data)

def playNotes():
    if __name__ == "__main__":
        multiprocessing.Process(target = noteTest, args = [523.25113060119]).start()
        print(time())
        multiprocessing.Process(target = noteTest, args = [659.25511382575]).start()
        print(time())
        multiprocessing.Process(target = noteTest, args = [783.99087196355]).start()
        print(time())

playNotes()
When I look at the output of the program, here are the times it gives:
1510810518.870557
1510810518.8715587
1510810518.8730626
As you can see, the threads are over a thousandth of a second apart. This is surprisingly noticeable, even for just one chord, but we fear that this will become an even bigger problem if we try and make an actual song as the tracks will drift apart and get out of time with each other. Note that all of the computers we tested this with DO have multiple physical cores. Is there any way to make the threads synchronize better, or are we better off finding an alternate solution?
An option is to have a delay in each thread, before playing the sound. If you have a reasonable idea of the offset involved in starting the threads, you can pass that value in as the delay.
For example, let's say there is 1ms delay between starting threads:
0ms: Start thread 1, with 1ms delay
0ms: Thread 1 starts on new core, waits
1ms: Start thread 2, with no delay
1ms: Thread 1 starts playing after delay
1ms: Thread 2 starts on new core, no delay, and starts playing
Another option is to start all the threads but have each one wait for a single signal that the main process loop sends to ALL of them before they start playing.
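A sketch of that second option using threading.Event; the question actually uses multiprocessing.Process, and multiprocessing.Event works the same way there. play_note and play_chord are hypothetical names, the audio output itself is left out, and this only lines up the moment each worker starts; each output stream still adds its own latency on top:

```python
import threading
import time

def play_note(freq, go, started):
    go.wait()  # every worker blocks here until the main thread fires the signal
    started.append((freq, time.perf_counter()))
    # ... write this note's samples to its output stream here ...

def play_chord(freqs):
    go = threading.Event()
    started = []  # list.append is thread-safe in CPython, so sharing it is fine here
    workers = [threading.Thread(target=play_note, args=(f, go, started)) for f in freqs]
    for w in workers:
        w.start()      # starting each thread takes a little time...
    time.sleep(0.05)   # ...so give all of them time to reach go.wait()
    go.set()           # ...then release them together
    for w in workers:
        w.join()
    return started

started = play_chord([523.25113060119, 659.25511382575, 783.99087196355])
times = [t for _, t in started]
print("spread between first and last start: %.6f s" % (max(times) - min(times)))
```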

Python Sounddevice.play() on Threads

I am having some problems playing sound with sounddevice on a Thread. I import sounddevice as sd at the beginning. Then, while running, I want to play a tone on a thread using the ASIO sound card. All the configuration I need to do on the thread works well. However, when I want to play the tone I get the following error:
sounddevice.PortAudioError: Error opening OutputStream: Unanticipated host API 2 error 0: u'Failed to load ASIO driver'
If I do the import inside the thread every time I need it, it works. But of course I do not want to do that. Any idea how to solve it?
Thanks!
Here is a simple code example:
from threading import Thread
import numpy as np
import sounddevice as sd

class Test(Thread):
    def __init__(self):
        Thread.__init__(self)
        #-- Configuration of the Tone to be played
        self.fs = 44100 # sampling rate, in Hz, 44100 or 48000
        duration = 1.05 # in seconds, may be float
        f = 200.0 # sine frequency, Hz, may be float
        self.tone_data = (np.sin(2*np.pi*np.arange(self.fs*duration)*f/self.fs)).astype(np.float32)

    def run(self):
        #-- Configuration of the ASIO sound card
        #import sounddevice as sd
        sd.default.channels = 2
        sd.default.device = 14
        print sd.query_devices(sd.default.device)['name']
        #sd.default.latency = ('low','low')
        #asio_out = sd.AsioSettings(channel_selectors=[1, 2])
        #sd.default.extra_settings = asio_out
        sd.default.samplerate = self.fs
        sd.play(self.tone_data)
        sd.wait()

w = Test()
w.start()
This seems to be a platform-specific problem. I just tried it with ALSA/Linux and it works fine. With ASIO, you probably have to do the library initialization (which happens during import time) in the same thread you are using later to create the stream (which play() does for you)?
If I initialize the import at the thread everytime I need it, it work. But of course I do not want to do that.
Why do you not want to do that? Are you aware that imports in Python are cached automatically? The second time you use import, only a dict lookup is done and nothing else.
But you are right, the repeated import still looks a bit strange.
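A quick way to check the caching claim, with json standing in for sounddevice: the module object stored in sys.modules on the first import is the same object that every later import returns.

```python
import importlib
import sys

import json                      # first import: module is loaded and stored in sys.modules
first = sys.modules["json"]

def import_again():
    import json                  # later imports: looked up in sys.modules, not re-executed
    return json

second = import_again()
third = importlib.import_module("json")
print(first is second is third)
```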
Did you try to do the import only once in Test.__init__()? There you could also do all the sd.default stuff.
If you still have problems during the initialization (or if you insist on having all imports at the top), you can try to use the undocumented _initialize() and _terminate() functions, see issue #3.
If you want to use multiple Thread instances, you'll get problems with the play() function, which is meant for single-threaded use. But it probably makes more sense anyway to have only one Python thread that does the audio I/O. See also PortAudio Tips – Threading.
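A minimal sketch of the "one audio thread" design: other threads only put (data, samplerate) requests on a queue.Queue, and a single thread owns the device. audio_thread and sd_play_blocking are hypothetical names. Importing sounddevice inside sd_play_blocking only initializes PortAudio in that thread if nothing imported it earlier; that is an assumption related to the ASIO problem above, not a documented guarantee. The demo uses a stand-in player so it runs without audio hardware:

```python
import queue
import threading

import numpy as np

def audio_thread(jobs, play_blocking):
    """The only thread that talks to the audio device."""
    while True:
        item = jobs.get()
        if item is None:  # sentinel: shut the thread down
            break
        data, samplerate = item
        play_blocking(data, samplerate)

def sd_play_blocking(data, samplerate):
    """Real player; assumes sounddevice is installed and a default output device exists."""
    import sounddevice as sd
    sd.play(data, samplerate)
    sd.wait()

# Demo with a stand-in player that just records what it was asked to play.
played = []
jobs = queue.Queue()
player = threading.Thread(target=audio_thread,
                          args=(jobs, lambda data, fs: played.append((len(data), fs))))
player.start()
jobs.put((np.zeros(100), 44100))  # any thread may enqueue tones like this
jobs.put((np.zeros(200), 48000))
jobs.put(None)
player.join()
print(played)
```

On a real machine you would pass sd_play_blocking instead of the lambda; the requesting threads never call into sounddevice themselves.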
BTW, you don't need (...).astype(np.float32), this conversion is done automatically for you.
And while I'm at it, your line sd.query_devices(sd.default.device)['name'] will break if the default input and output devices are different.

How to read & accumulate sensor values at high frequency, without constantly writing to disk (RPi 2 b+, MCP3304, Python)

I am attempting to use a Raspberry Pi 2 Model B+ to read analog data on IR intensity from a photodiode via an MCP3304 (5v VDD) at a rate of ~ 10kHz three times in a row (0.1 ms between readings, or a rate of 10 ksps) once every second based on an external stimulus, average those values, then write them to a text file along with the current datetime. The photodiode is simply reading out data to an amplifier, which then feeds into the MCP3304, which, via SPI, feeds data into the RPi. (In essence: RPi receives a digital input, triggering three consecutive samples from the photodiode via an MCP3304 and in-line signal amplifier. Those three samples are stored in memory, averaged, then written to disk along with a datetime stamp to an existing CSV text file.) This is all on Python 2.7.
As it stands, I'm getting a sampling rate of < 1 kHz with the code below (SensorRead()). I am very new to Python (and to playing with sensors and RPis, for that matter!), and I think the way I've set up my class to take three separate, consecutive ADC samples, and possibly my setup for writing to disk, may be slowing me down. However, I can't quite seem to find a better way. Edit1: I've done a good bit of research on max sampling rates via Python from the RPi GPIO, and it appears to be well above the ADC's restriction, at ~1 MHz or ~1000 ksps (e.g. 1, 2). Edit2: Perhaps the ADC max sample rate of 100 ksps actually refers to how many bits can be read rather than how many full 12-bit samples can be taken per second?
Yup. This was it. The MCP3304 can do 100 ksps, but the Python read rate is closer to 30k bits per second, which, split across the 24 bits read from the MCP3304 per sample, comes to roughly 1 ksps (30,000 / 24 ≈ 1,250 samples per second).
My two questions: 1) Are there any better ways of getting closer to the full 100 ksps advertised in the MCP3304 spec sheet? This link suggests calling WiringPi every time I want to take a single sample may cause some considerable latency.
and
2) is it possible, with a beginner/moderate level of Python skill for me to do all of this sampling and per-second averaging in the memory, and only write to disk, say, once every minute? Edit: could you please point me in the direction of some related links/resources?
Thanks!
Note 1: the code is "Threaded" because there are some other functions running simultaneously.
Note 2: I am also, simultaneously reading a differential channel on the ADC, hence the "differential = True" in the MCP3304 command
'''
FILENAME = "~/output_file_location/and/name.txt"
adc_channel_pd = pin of the ADC from which analog signal is taken
stimulus_in_pin = the pin that receives the alert to begin sampling
stimulus_LED_alert_pin = pin that goes "high" to illuminate an LED every time the stimulus_in_pin is triggered
Vref = the reference voltage for the ADC (3.3v; VDD = 5V)
'''
# import packages
import wiringpi2 as wiringpi
import time
from gpiozero import MCP3304
import threading
import datetime
import itertools  # used by SensorRead.run(); missing from the original listing
import os

# Define important objects
FILENAME = os.path.expanduser("~/output_file_location/and/name.txt")  # open() does not expand "~" itself
Vref = 3.3
adc_channel_pd = 7
stimulus_in_pin = 32
stimulus_LED_alert_pin = 16

# establish GPIO reading structure
wiringpi.wiringPiSetupPhys()
# set appropriate pin inputs and outputs (0 = input, 1 = output)
wiringpi.pinMode(stimulus_in_pin, 0)
wiringpi.pinMode(stimulus_LED_alert_pin, 1)

# create a class to take 3 PD readings, then average them, immediately upon stimulus
class SensorRead(threading.Thread):
    def __init__(self):
        super(SensorRead, self).__init__()
        self.daemon = True
        self.start()

    def run(self):
        for i in itertools.count():
            if wiringpi.digitalRead(stimulus_in_pin) == True:
                # second argument True = differential mode (see Note 2)
                val_ir_1 = MCP3304(adc_channel_pd, True).value * Vref
                val_ir_2 = MCP3304(adc_channel_pd, True).value * Vref
                val_ir_3 = MCP3304(adc_channel_pd, True).value * Vref
                voltage_ir = round(float(sum([val_ir_1, val_ir_2, val_ir_3])) / 3, 9)
                dt_ir = '%s' % datetime.datetime.now()
                f = open(FILENAME, "a")
                f.write("IR Sensor," + dt_ir + "," + str(voltage_ir) + "\n")
                f.close()
                # print to terminal so I can verify output in real time
                print "IR Sensor:", dt_ir, ",", voltage_ir
                # blink ir light on board for visual verification of stimulus in real time
                wiringpi.digitalWrite(stimulus_LED_alert_pin, 1)
                time.sleep(0.5)
                wiringpi.digitalWrite(stimulus_LED_alert_pin, 0)
                # sleep to avoid noise post LED firings
                time.sleep(0.5)

# run class
SensorRead()
Edit: I ended up getting some great results with Cython, as demonstrated in this test-code I wrote to quantify how fast I could read my ADC. I also ended up writing my own function to read from the MCP3304-- which I'll link to once it's all clean-- that I was able to further optimize in Cython.
One point about your question. Three samples in a second is a rate of 3Hz not 100kHz. Sounds to me like what you want is three samples 10us apart.
1) 10us sample period on MCP3304 using Pi running Linux? Quite possibly not. Do some searching. See for example https://raspberrypi.stackexchange.com/questions/45386/microphone-via-spi where one answer says they achieved 33us (33ksps) using C code and avoiding the SPI driver. Also, I suspect you will find Linux process switching and other threads getting in the way and affecting the sample rate, particularly if they are also reading the ADC. Success is more likely if you use a dedicated non-Linux processor to read the ADC, programmed in C or assembly language, feeding the three samples to the Pi. It is easier still if you use a parallel ADC, i.e. one that does not use SPI-like serial comms. See also http://www.hertaville.com/interfacing-an-spi-adc-mcp3008-chip-to-the-raspberry-pi-using-c.html and https://www.raspberrypi.org/forums/viewtopic.php?f=93&t=83069
2) While the sampling at 10us period using the MCP3304 is difficult-to-impossible on Pi, the averaging and writing is definitely possible.
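For the averaging-and-writing part, a minimal Python 3 sketch (the question is on Python 2.7, where open() has no newline argument). BufferedLogger is a hypothetical helper: it averages each group of readings in memory and appends rows to the CSV only every flush_every entries, e.g. once a minute at one stimulus per second. The demo writes to a temporary file instead of the question's FILENAME:

```python
import csv
import datetime
import os
import tempfile

class BufferedLogger:
    """Average each group of readings in memory; append to the CSV in batches."""

    def __init__(self, filename, flush_every=60):
        self.filename = filename
        self.flush_every = flush_every
        self.rows = []

    def add(self, label, readings):
        average = sum(readings) / len(readings)
        stamp = datetime.datetime.now().isoformat()
        self.rows.append((label, stamp, round(average, 9)))
        if len(self.rows) >= self.flush_every:
            self.flush()

    def flush(self):
        """Write all buffered rows in one open/append/close cycle."""
        if not self.rows:
            return
        with open(self.filename, "a", newline="") as f:
            csv.writer(f).writerows(self.rows)
        self.rows = []

path = os.path.join(tempfile.mkdtemp(), "ir_log.csv")
log = BufferedLogger(path, flush_every=60)
for second in range(150):                 # e.g. one stimulus per second for 2.5 minutes
    log.add("IR Sensor", [1.0, 1.1, 1.2])  # the three ADC voltages
with open(path) as f:
    rows_before_final_flush = sum(1 for _ in f)
log.flush()  # call this on shutdown so the last partial batch is not lost
with open(path, newline="") as f:
    rows = list(csv.reader(f))
print(rows_before_final_flush, len(rows))
```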
I have a solution for your problem, though: if literally all you're going to do with the three samples is to average them, why not add an old-fashioned low-pass analog filter before the input pin and then just take one sample. Hey presto, no need for hardish-realtime ADCing, no need to worry about other processes, or kernel interrupts!
We have recently benchmarked PIGPIO and RPi.GPIO with regard to the accuracy of reading inputs at different frequencies. The test was performed on a Raspberry Pi 3b.
I would suggest using PIGPIO for better results. In our test, the max read frequency with that library at a read accuracy above 99% was 20 kHz, compared to 5 kHz with the RPi.GPIO library.
You can find the exact test setup and the complete results in this post: https://atman-iot.com/blog/raspberry-pi-benchmark/
