As I am currently building a device based on a Raspberry Pi for measuring some statistics (e.g. the variance) of noise recorded with a sound card, and trying to do this in Python, I got stuck figuring out how to get an audio sample as a float number for further calculations.
What I did:
Took a line-in-to-cinch adapter and touched the plugs to generate some sort of test signal.
Recording with, for example, Audacity or Matlab shows plausible results.
What I want to get:
Ideally, I want to get, for example, 5 frames of 1024 samples each from the sound card and convert them into a list, tuple or NumPy array of float numbers for further calculations.
When trying to achieve this with Python/PyAudio using the code at the end of this post, I got something like this:
Since the values I got with Python seem to differ from those in Matlab (and others) by a factor of about two, I think I've overlooked something or did something wrong.
I think I made a mistake somewhere in the struct.unpack region, but can't figure out where exactly or why.
I'd like to ask for your help in pointing out where the error is and what I did wrong.
A little test script for getting some samples and plotting them:
import pyaudio
import struct
import matplotlib.pyplot as plt

FORMAT = pyaudio.paFloat32   # 32-bit float samples
SAMPLEFREQ = 44100           # sample rate in Hz
FRAMESIZE = 1024             # samples per buffer
NOFFRAMES = 220              # number of buffers to record

p = pyaudio.PyAudio()
print('running')
stream = p.open(format=FORMAT, channels=1, rate=SAMPLEFREQ, input=True,
                frames_per_buffer=FRAMESIZE)
data = stream.read(NOFFRAMES*FRAMESIZE)                      # raw bytes
decoded = struct.unpack(str(NOFFRAMES*FRAMESIZE)+'f', data)  # bytes -> tuple of floats
stream.stop_stream()
stream.close()
p.terminate()
print('done')
plt.plot(decoded)
plt.show()
Try using the numpy.frombuffer function (the modern replacement for the deprecated numpy.fromstring) instead of struct.unpack:
import numpy

stream = p.open(format=FORMAT, channels=1, rate=SAMPLEFREQ, input=True,
                frames_per_buffer=FRAMESIZE)
data = stream.read(NOFFRAMES*FRAMESIZE)
decoded = numpy.frombuffer(data, dtype=numpy.float32)
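Building on that, here is a minimal sketch of the variance calculation mentioned in the question, assuming data holds the bytes read above:

import numpy as np

# reshape the flat sample stream into one row per 1024-sample frame
frames = np.frombuffer(data, dtype=np.float32).reshape(NOFFRAMES, FRAMESIZE)
print(frames.var(axis=1))   # variance of each frame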
let me know if this works for you
I basically have this audio file that is 16-bit PCM WAV mono at 44100 Hz, and I'm trying to convert it into a spectrogram. But I want a spectrogram of the audio every 20 ms (trying this for speech recognition), and whenever I try to compare what I have to Audacity, it's really different. I'm kind of new to Python, so I was trying to base this off of my Java knowledge. Any help would be appreciated. I think I'm splitting the read samples incorrectly (what I did was split the array every 220 elements, since I believe audio data is just samples in the time domain, to get it to 20 ms of audio).
Here's the code I have right now:
import librosa.display
import matplotlib.pyplot as plt
import numpy

audioPath = 'C:\\Users\\pawar\\Desktop\\Resister.wav'
audioData, sampleRate = librosa.load(audioPath, sr=None)
print(sampleRate)
new = numpy.zeros(shape=(220, 1))
counter = 0
for i in range(0, len(audioData), 882):
    new = audioData[i: i + 882]          # 882 samples = 20 ms at 44100 Hz
    STFT = librosa.stft(new, n_fft=882)
    print(type(STFT))
    audioDatainDB = librosa.amplitude_to_db(abs(STFT))
    print(type(audioDatainDB))
    librosa.display.specshow(audioDatainDB, sr=sampleRate, x_axis='time', y_axis='hz')
    #plt.figure(figsize=(20,10))
    plt.show()
    counter += 1
    print("Your local debug print statement ", counter)
As for the values, well, I was playing around with them quite a bit trying to get it to work, without any luck ;/
Here's the output:
https://i.stack.imgur.com/EVntx.png
And here's what Audacity shows:
https://i.stack.imgur.com/GIGy8.png
I know it's not 20 ms in the Audacity one, but you can see the two don't look even a bit similar.
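For comparison, a minimal sketch of computing a single spectrogram over the whole file with a 20 ms hop, instead of one plot per chunk (n_fft=2048 is an arbitrary choice here; the path is the one from above):

import librosa
import librosa.display
import matplotlib.pyplot as plt

audioData, sampleRate = librosa.load('C:\\Users\\pawar\\Desktop\\Resister.wav', sr=None)
hop = int(0.020 * sampleRate)               # 20 ms hop = 882 samples at 44100 Hz
STFT = librosa.stft(audioData, n_fft=2048, hop_length=hop)
audioDatainDB = librosa.amplitude_to_db(abs(STFT))
librosa.display.specshow(audioDatainDB, sr=sampleRate, hop_length=hop, x_axis='time', y_axis='hz')
plt.show()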
I am new to Python, but I am studying it as a programming language for DSP. I recorded a wav file and have been trying to play it back using IPython.display.Audio:
import IPython.display
from scipy.io import wavfile
rate, s = wavfile.read('h.wav')
IPython.display.Audio(s, rate=rate)
But this gives the following error:
struct.error: ushort format requires 0 <= number <= 0xffff
I tried installing FFmpeg but it hasn't helped.
That's not a very useful error message; it took a bit of debugging to figure out what was going on! It is caused by the "shape" of the matrix returned from wavfile being the wrong way around.
The docs for IPython.display.Audio say it expects a:
Numpy 2d array containing waveforms for each channel. Shape=(NCHAN, NSAMPLES).
If I read a (stereo) wav file I have lying around:
rate, samples = wavfile.read(path)
print(samples.shape)
I get (141120, 2) showing this is of shape (NSAMPLES, NCHAN). Passing this array directly to Audio I get a similar error as you do. Transposing the array will flip these around, causing the array to be compatible with this method. The transpose of a matrix in Numpy is accessed via the .T attribute, e.g.:
IPython.display.Audio(samples.T, rate=rate)
works for me.
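Putting it together, a minimal sketch that handles both mono and stereo files (reusing the h.wav filename from the question):

import IPython.display
from scipy.io import wavfile

rate, samples = wavfile.read('h.wav')   # (NSAMPLES,) for mono, (NSAMPLES, NCHAN) for stereo
if samples.ndim == 2:
    samples = samples.T                 # flip to (NCHAN, NSAMPLES), as Audio expects
IPython.display.Audio(samples, rate=rate)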
Thank you for your answer, it helped me.
Below is my code; maybe it can help someone.
import sounddevice as sd
import streamlit as st   # assuming st here is Streamlit, given the st.audio call

frequency = 44100   # sample rate in Hz
duration = 5        # seconds to record

record = sd.rec(frequency * duration, samplerate=frequency, channels=1,
                blocking=True, dtype='float64')
sd.wait()
st.audio(record.T, sample_rate=frequency)
For a project involving a moving robot arm, I need a "Geiger-Müller counter"-like distance-based alarm.
For that I wrote a Python module and tried to add the possibility that, if the robot arm is on the left side of an object, the sound plays only on the left speaker, and analogously for the right side.
For that I looked into the sounddevice library, where easy channel mapping is possible.
As can be seen in code snippet 1 below, and as in the documentation, I am calling sd.play() and waiting with sd.wait() until the sound finishes.
When I use a sample sound with a length of 6 seconds, everything works fine. But if I use the desired sound, a short beep (under 1 sec), the code does not work: the script ends after ~1 sec without any sound.
I can fix this by adding a sleep statement (commented out in the code).
But for my context I need a playing window smaller than 0.5 sec.
Does anyone know how I can fix this or work around it?
I tried to use pyaudio instead, but wasn't able to get the mono .wav file into a stereo byte array for switching the audio dynamically to the left/right speaker (code snippet 2).
There I always encounter an error ("'ascii' codec can't decode byte in position 2: ordinal not in range(128)").
Snippet1:
import numpy as np
import sounddevice as sd
import soundfile as sf

data, fs = sf.read(args.filename, dtype='float32')  # args comes from an argparse setup, as in the sounddevice examples
byte = np.array(data)
sd.play(byte, fs)
#time.sleep(0.5)
status = sd.wait()
Snippet2:
data = wave.readframes(chunkSize)        # 'wave' is the opened Wave_read object; returns raw bytes
stereo_signal = np.zeros([len(data), 2])
stereo_signal[:, 0] = data[:]            # assigning raw bytes into a float array triggers the codec error
stereo_signal[:, 1] = 0
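For reference, a minimal sketch of the bytes-to-stereo-array conversion aimed at above, assuming a mono 16-bit PCM file ('beep.wav' is just a placeholder name) and decoding the raw bytes with np.frombuffer first:

import wave
import numpy as np

wav = wave.open('beep.wav', 'rb')           # placeholder filename
raw = wav.readframes(wav.getnframes())      # raw bytes
data = np.frombuffer(raw, dtype=np.int16)   # decode 16-bit PCM samples

stereo_signal = np.zeros([len(data), 2], dtype=np.int16)
stereo_signal[:, 0] = data                  # sound on the left channel only
stereo_signal[:, 1] = 0                     # right channel stays silent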
It's just my second day with Python, and I'm already in trouble...
I've got a lot of CD-standard (16 bit, 44100 Hz) stereo wave files and need to find their average (arithmetic mean) FIR. The algorithm is easy to state: the sum of amplitudes for each frequency divided by the number of files. Then the resulting FIR is plotted and written to a text file as a table.
I rolled over some similar posts, like the exciting Python Scipy FFT wav files, but there are still too many things, even the basics, that I lose touch with, and interpreter errors follow every time I try to repeat the examples.
I would appreciate any help that can move me out of this dead end. So, these are my first shy steps...
As the number of files may vary, it is probably useful to have a list of files at hand:
import os

a = os.path.expanduser(u"~")     # absolute user path
b = "integrator\\files"          # base folder containing the files
c = os.path.join(a, b)
flist = os.listdir(c)
wavs = list(filter(lambda x: x.endswith('.wav'), flist))  # filter out non-wavs
for name in wavs:
    print(name)
print()
And it works fine for me! But I still cannot figure out how to organize reading multiple files and calculating their mean FIR array.
As far as I can see, I need something like the glob package:
import glob
import mainfile

files = glob.glob('./*.wav')
for ele in files:
    mainfile.f(ele)
quit()
where mainfile.py looks something like this:
import matplotlib.pyplot as plt
from scipy.io import wavfile   # get the api
from scipy.fftpack import fft
from pylab import *

def f(filename):
    fs, data = wavfile.read(filename)    # load the data
    a = data.T[0]                        # this is a two-channel soundtrack, take the first track
    b = [(ele/2**16.)*2-1 for ele in a]  # this is a 16-bit track, now normalized to [-1, 1)
    c = fft(b)                           # create a list of complex numbers
    d = len(c)/2                         # you only need half of the fft list
And here I just don't know what I'd better do with the 'd's: sum them in a cycle, or...? Also, this code example operates on just one channel for plotting, while I need the output FIR as a sequence of pairs for each channel. And it's still not clear how to tweak the FFT window to Hanning with at least a 65536 FFT size (oh yes, I know the calculations are slow as hell).
In the end we can plot and save the graph:
plt.plot(abs(c[:(d-1)]),'r')
savefig(filename+'.png',bbox_inches='tight')
... and somehow write the average FIR to a text table file.
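Putting the pieces together, here is a minimal sketch of that averaging loop, assuming every file is stereo and at least 65536 samples long, with a Hanning window and a 65536-point FFT (the output filename average_fir.txt is just an example):

import glob
import numpy as np
from scipy.io import wavfile

FFTSIZE = 65536
window = np.hanning(FFTSIZE)

avg = None
files = glob.glob('./*.wav')
for name in files:
    fs, data = wavfile.read(name)
    x = data[:FFTSIZE, 0]/2.**15              # first channel, normalized to [-1, 1)
    spectrum = np.abs(np.fft.rfft(x*window))  # windowed amplitude spectrum
    avg = spectrum if avg is None else avg + spectrum
avg /= len(files)

# frequency / amplitude pairs as a text table
freqs = np.fft.rfftfreq(FFTSIZE, 1./fs)       # assumes all files share one sample rate
np.savetxt('average_fir.txt', np.column_stack([freqs, avg]))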
I'd be happy enough if this script worked as a console application (though at first I dreamt of a kind of minimalistic GUI with the ability to choose any folder containing the files, an overview button, and a progress bar to make sure the app is still breathing while grinding through ten or twenty-five wavs with the slow FFT "scythe").
I have C:\Anaconda2 (with numpy, scipy and matplotlib properly installed) on a Windows 7 x86 PC.
Thank you in advance!
With regards,
Me.
Is it possible to get python to generate a simple sound like a sine wave?
Is there a module available for this? If not, how would you go about creating your own?
Also, would you need some kind of host environment for Python to run in, in order to play sound, or can it be achieved just from making calls from the terminal?
If the answer is OS-dependent, I'm using a mac.
I was looking for the same thing. In the end, I wrote this code, which is working fine.
import math      #import needed modules
import pyaudio   #sudo apt-get install python3-pyaudio

#See https://en.wikipedia.org/wiki/Bit_rate#Audio
BITRATE = 16000    #number of frames per second/frameset
FREQUENCY = 500    #Hz, waves per second, 261.63=C4-note
LENGTH = 1         #seconds to play sound

BITRATE = max(BITRATE, FREQUENCY+100)
NUMBEROFFRAMES = int(BITRATE * LENGTH)
RESTFRAMES = NUMBEROFFRAMES % BITRATE
WAVEDATA = bytearray()

#generating waves as unsigned 8-bit samples centred on 128
for x in range(NUMBEROFFRAMES):
    WAVEDATA.append(int(math.sin(x/((BITRATE/FREQUENCY)/math.pi))*127+128))
#pad the rest with silence
for x in range(RESTFRAMES):
    WAVEDATA.append(128)

p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(1),  # 1 byte -> 8-bit unsigned samples
                channels=1,
                rate=BITRATE,
                output=True)
stream.write(bytes(WAVEDATA))
stream.stop_stream()
stream.close()
p.terminate()
I know I'm a little late to the game on this one, but this is a pretty fantastic python project for synthesis and audio composition: https://github.com/hecanjog/pippi
It's still actively being developed, but it's been going for a while.
After wasting time on some uncompilable or non-existent projects, I discovered the python module wavebender, which offers generation of single or multiple channels of sine, square and combined waves. The results can be written either to a wavefile or to sys.stdout, from where they can be interpreted directly by aplay in real-time. Some useful examples are explained here, and are included at the project's github page.
The Python In Music wiki page has not been terribly well-kept-up, but it's a good starting point.
http://wiki.python.org/moin/PythonInMusic
I am working on a powerful synthesizer in Python. I used custom functions to write directly to a .wav file; there are built-in functions that can be used for this purpose. You will need to modify the .wav header to reflect the sample rate, bits per sample, number of channels, and duration of the synthesis.
Here is an early version of a sine wave generator that outputs a list of values which, after applying bytearray, becomes suitable for writing to the data parameter of a wave file. [edit] A conversion function will need to transform the list into little-endian byte values before applying the bytearray. See the WAVE PCM soundfile format link below for details on the .wav specification. [/edit]
import math

def sin_basic(freq, time=1, amp=1, phase=0, samplerate=44100, bitspersample=16):
    bytelist = []
    TwoPiDivSamplerate = 2*math.pi/samplerate
    increment = TwoPiDivSamplerate * freq   # phase step per sample
    incadd = phase*increment
    maxamp = 2**(bitspersample - 1) - 1     # e.g. 32767 for 16-bit samples
    for i in range(int(samplerate*time)):
        # reflect incadd back into range if it exceeds the maximum sample value
        if incadd > maxamp:
            incadd = maxamp - (incadd - maxamp)
        elif incadd < -maxamp:
            incadd = -maxamp + (-maxamp - incadd)
        bytelist.append(int(round(amp*maxamp*math.sin(incadd))))
        incadd += increment
    return bytelist
A newer version can use waveforms to modulate the frequency, amplitude, and phase of the waveform parameters. The data format makes it trivial to blend and concatenate waves together. If this seems up your alley, check out WAVE PCM soundfile format.
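To make that conversion step concrete, here is a minimal sketch using the standard struct and wave modules ('sine.wav' is just an example filename):

import struct
import wave

samples = sin_basic(440, time=2)                          # two seconds of A4
frames = b''.join(struct.pack('<h', s) for s in samples)  # little-endian signed 16-bit

with wave.open('sine.wav', 'wb') as wav:
    wav.setnchannels(1)       # mono
    wav.setsampwidth(2)       # 16 bits per sample
    wav.setframerate(44100)
    wav.writeframes(frames)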
I like PyAudiere, which lets you play numpy arrays as sounds... I guess it jives well with my Matlab background. I believe it's cross-platform. I think scikits.audiolab does the same thing, and may be more current / better supported... it seems easier to me than trying to save things as wavfiles or write them to buffers and use Python's builtin sound library.
I found these two Python repositories very useful; you might want to have a look at them...
Python: https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder
IPython: https://timsainb.github.io/spectrograms-mfccs-and-inversion-in-python.html
[EDIT] As pointed out, here is an explanation of the two links.
The Python one seems to have an error, but many people were able to make it run, so I'm not sure. The IPython one worked like a charm, so I hope you can run it.
Both of the links are supposed to take audio as input, preferably a .wav file, and featurize it (using an FFT size of 512 and a step size of 512/8) to obtain spectrograms (which you can even visualize): a 2D matrix you can use to train your machine learning models or do whatever you want with, as a representation of the original audio. If you want, at any point, to hear what those vectors represent, you can resynthesize the audio back as well.
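As a concrete illustration of that featurization step, a minimal sketch with librosa ('input.wav' is just a placeholder name):

import librosa
import numpy as np

y, sr = librosa.load('input.wav', sr=None)            # placeholder filename
S = librosa.stft(y, n_fft=512, hop_length=512 // 8)   # FFT size 512, step 512/8 = 64
spectrogram = np.abs(S)                               # 2D matrix: frequency bins x time frames
print(spectrogram.shape)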