I am new to Python, but I am studying it as a programming language for DSP. I recorded a wav file and have been trying to play it back using IPython.display.Audio:
import IPython.display
from scipy.io import wavfile
rate, s = wavfile.read('h.wav')
IPython.display.Audio(s, rate=rate)
But this gives the following error:
struct.error: ushort format requires 0 <= number <= 0xffff
I tried installing FFmpeg but it hasn't helped.
That's not a very useful error message; it took a bit of debugging to figure out what was going on! It is caused by the shape of the array returned from wavfile being the wrong way around.
The docs for IPython.display.Audio say it expects a:
Numpy 2d array containing waveforms for each channel. Shape=(NCHAN, NSAMPLES).
If I read a (stereo) wav file I have lying around:
rate, samples = wavfile.read(path)
print(samples.shape)
I get (141120, 2), showing this is of shape (NSAMPLES, NCHAN). Passing this array directly to Audio, I get a similar error to yours. Transposing the array flips these around, making it compatible with this method. The transpose of a matrix in NumPy is accessed via the .T attribute, e.g.:
IPython.display.Audio(samples.T, rate=rate)
works for me.
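For completeness, here is a small sketch (my addition, assuming the same h.wav as in the question) that handles both mono and stereo files, since wavfile.read returns a 1-D array for mono input:
import IPython.display
from scipy.io import wavfile

rate, samples = wavfile.read('h.wav')
# wavfile.read gives shape (NSAMPLES,) for mono and (NSAMPLES, NCHAN) for stereo;
# Audio expects (NCHAN, NSAMPLES), so transpose only when a channel axis is present
audio = samples.T if samples.ndim == 2 else samples
IPython.display.Audio(audio, rate=rate)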
Thank you for your answer, it helped me.
Below is my code; maybe it can help someone.
# Assumption: sd is the sounddevice module and st is streamlit, matching the calls below
import sounddevice as sd
import streamlit as st

frequency = 44100  # sample rate in Hz
duration = 5       # seconds to record

# blocking=True makes rec() wait until the recording finishes, so sd.wait() is redundant but harmless
record = sd.rec(frequency * duration, samplerate=frequency, channels=1, blocking=True, dtype='float64')
sd.wait()
st.audio(record.T, sample_rate=frequency)
I am processing an audio file with librosa as:
import librosa
import soundfile as sf
y, sr = librosa.load('test.wav', sr=22050)
y_processed = some_processing(y)
sf.write('test_processed.wav', y_processed, sr)
y_read, sr_read = librosa.load('test_processed.wav', sr=22050)
Now the issue is that y_processed and y_read do not match. My understanding is that this comes from some encoding done by the soundfile library. Why is this happening, and how can I get from y_processed to y_read without saving?
According to this article, librosa.load(), among other things, normalizes the bit depth between -1 and 1.
I experienced the same problem as you did, where the min and max values of the "loaded" signal were much closer to each other.
Since I don't know exactly how your data differs, this may not help you, but it helped me.
y_processed_buf = librosa.util.buf_to_float(y_processed)
This seems to be the culprit: it normalizes your values (see the source code). It is also called during librosa.load(), which is how I stumbled over it.
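As an aside (my own suggestion, not part of the original answer): if you want the write/read round trip itself to be lossless, you can ask soundfile for a float subtype, since sf.write defaults to 16-bit PCM for .wav and therefore quantizes float input:
import soundfile as sf

# 'FLOAT' stores 32-bit float samples instead of the default 16-bit PCM,
# so librosa.load returns them essentially unchanged
sf.write('test_processed.wav', y_processed, sr, subtype='FLOAT')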
I basically have this audio file that is 16-bit PCM WAV mono at 44100 Hz, and I'm trying to convert it into a spectrogram. But I want a spectrogram of the audio every 20 ms (I'm trying this for speech recognition), and whenever I compare what I have to Audacity, it's really different. I'm kind of new to Python, so I was trying to base this off of my Java knowledge. Any help would be appreciated. I think I'm splitting the read samples incorrectly (what I did was split the array every 220 elements, since I believe audio data is just samples in the time domain, to get 20 ms chunks of audio).
Here's the code I have right now:
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy

audioPath = 'C:\\Users\\pawar\\Desktop\\Resister.wav'
audioData, sampleRate = librosa.load(audioPath, sr=None)
print(sampleRate)
new = numpy.zeros(shape=(220, 1))
counter = 0
for i in range(0, len(audioData), 882):  # 882 samples = 20 ms at 44100 Hz
    new = audioData[i: i + 882]
    STFT = librosa.stft(new, n_fft=882)
    print(type(STFT))
    audioDatainDB = librosa.amplitude_to_db(abs(STFT))
    print(type(audioDatainDB))
    librosa.display.specshow(audioDatainDB, sr=sampleRate, x_axis='time', y_axis='hz')
    # plt.figure(figsize=(20, 10))
    plt.show()
    counter += 1
    print("Your local debug print statement ", counter)
As for the values, I was playing around with them quite a bit trying to get it to work, but without any luck ;/
Here's the output:
https://i.stack.imgur.com/EVntx.png
And here's what Audacity shows:
https://i.stack.imgur.com/GIGy8.png
I know it's not 20 ms in the Audacity one, but you can see the two don't look even remotely similar.
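For comparison, here is a minimal sketch (my own suggestion, not from the question) that computes one STFT over the whole file with a 20 ms window and hop instead of slicing and plotting chunk by chunk; this is usually much closer to what Audacity displays:
import librosa
import librosa.display
import matplotlib.pyplot as plt

audioData, sampleRate = librosa.load('Resister.wav', sr=None)  # 44100 Hz mono assumed
winLength = int(0.02 * sampleRate)  # 20 ms -> 882 samples at 44.1 kHz

# one STFT over the full signal; hop_length advances the window by 20 ms per column
STFT = librosa.stft(audioData, n_fft=1024, win_length=winLength, hop_length=winLength)
audioDatainDB = librosa.amplitude_to_db(abs(STFT))
librosa.display.specshow(audioDatainDB, sr=sampleRate, hop_length=winLength, x_axis='time', y_axis='hz')
plt.show()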
I have a big file saved from MATLAB with version -v7.3. When reading it in Python, the shape of the matrix changes! Is that normal?
For example, take the matrix below in MATLAB:
clear all, clc
A = randn(10,3) + randn(10,3)*i;
save('example.mat','-v7.3'); %% The saved file is example.mat with version 7.3
Above, the saved file example.mat contains a matrix of size (10,3).
Now, let's go to Python to read that file:
import numpy as np
import h5py as h5
data_try = h5.File('example.mat', 'r')
A = np.array(data_try['A'])
A = A.view(np.complex128)  # here, the matrix equivalent to the one in MATLAB
But what I find is that A in Python is of size (3,10)! And when the matrix has three dimensions, the shape changes as well.
Is it normal that Python reads the transpose of a matrix coming from MATLAB, or is something going wrong?
However, when using the other way, as below:
import scipy.io as spio
Data = spio.loadmat('example.mat', squeeze_me=True)
A = Data['A']
In that case everything works nicely, but unfortunately we cannot use that approach for big matrices (scipy.io.loadmat cannot read -v7.3 files).
Please, is there any solution for this issue?
You might be facing a problem with the different memory layouts of MATLAB (column-major) and NumPy (row-major)... check e.g. this question for related discussion and a solution (reshaping in Fortran style, which is also column-major).
SciPy's .mat interface automatically takes care of this reinterpretation, which is why you don't encounter the problem when using it.
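To make that concrete, here is a minimal sketch (assuming the file was written exactly as in the question): h5py returns the dataset with its dimensions reversed relative to MATLAB, and -v7.3 stores complex values as a compound real/imag type, so one view plus one transpose recovers the original matrix:
import numpy as np
import h5py as h5

with h5.File('example.mat', 'r') as f:
    raw = np.array(f['A'])  # arrives as shape (3, 10): dimensions reversed vs. MATLAB

# reinterpret the ('real', 'imag') compound fields as complex numbers,
# then transpose to get back the (10, 3) layout MATLAB saved
A = raw.view(np.complex128).T
print(A.shape)  # (10, 3)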
I want to plot a spectrogram of 30 s of a wav audio file, but I encountered an error while doing so in Python. How can I achieve my goal?
import scipy
import matplotlib.pyplot as plt
import scipy.io.wavfile
sample_rate, X = scipy.io.wavfile.read('595.wav')
print(sample_rate, X.shape)
plt.specgram(X, Fs=sample_rate, xextent=(0,30))
And the error:
ValueError: only 1-dimensional arrays can be used
The error is pretty clear: ValueError: only 1-dimensional arrays can be used.
In your case X is not 1-dimensional. You would find out by printing X.shape.
While I can't be certain without a complete example, the best guess is that you have a stereo wav file, which has 2 channels. So you need to select whether you want to plot the spectrogram for the left or the right channel. E.g. for the left channel:
plt.specgram(X[:,0], Fs=sample_rate, xextent=(0,30))
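Alternatively (my addition, not part of the original answer), you can mix the two channels down to mono by averaging them, which keeps information from both:
import scipy.io.wavfile
import matplotlib.pyplot as plt

sample_rate, X = scipy.io.wavfile.read('595.wav')
mono = X.mean(axis=1)  # average left and right channels into one waveform
plt.specgram(mono, Fs=sample_rate, xextent=(0, 30))
plt.show()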
I am currently building a device based on a Raspberry Pi for measuring some properties of noise recorded with a sound card (e.g. variance), and while trying to do this in Python, I got stuck figuring out how to get an audio sample as a float number for further calculations.
What did I do:
Took a line-in-to-cinch adapter and touched the plugs to generate some sort of test signal.
Recording into, for example, Audacity or MATLAB shows plausible results.
What I want to get:
Ideally, I want to get, for example, 5 frames of 1024 samples from the sound card and convert them into a list, tuple, or numpy array of floats for further calculations.
When trying to achieve this with Python/pyaudio using the code at the end of this post, I got something like this:
Since the values I get with Python seem to differ from those in MATLAB (and others) by a factor of about two, I think I've overlooked something or done something wrong.
I think I made a mistake somewhere around the struct.unpack call, but I can't figure out exactly where or why.
I'd appreciate any help pointing out where the error is and what I did wrong.
A little test code for getting some samples and plotting them:
import pyaudio
import struct
import matplotlib.pyplot as plt
FORMAT = pyaudio.paFloat32
SAMPLEFREQ = 44100
FRAMESIZE = 1024
NOFFRAMES = 220
p = pyaudio.PyAudio()
print('running')
stream = p.open(format=FORMAT, channels=1, rate=SAMPLEFREQ, input=True, frames_per_buffer=FRAMESIZE)
data = stream.read(NOFFRAMES * FRAMESIZE)
decoded = struct.unpack(str(NOFFRAMES * FRAMESIZE) + 'f', data)
stream.stop_stream()
stream.close()
p.terminate()
print('done')
plt.plot(decoded)
plt.show()
Try using the numpy.fromstring function (or its modern replacement, numpy.frombuffer) to replace struct.unpack:
import numpy

stream = p.open(format=FORMAT, channels=1, rate=SAMPLEFREQ, input=True, frames_per_buffer=FRAMESIZE)
data = stream.read(NOFFRAMES * FRAMESIZE)
# numpy.fromstring is deprecated for binary data; frombuffer reads the float32 bytes directly
decoded = numpy.frombuffer(data, dtype=numpy.float32)
Let me know if this works for you.
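As a follow-up (my addition): once decoded is a NumPy array, the variance the question set out to measure is a one-liner:
import numpy

# 'decoded' is the float32 array produced by the snippet above
noise_variance = numpy.var(decoded)
print(noise_variance)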