Python u-Law (MULAW) wave decompression to raw wave signal - python

I googled this issue for last 2 weeks and wasn't able to find an algorithm or solution. I have some short .wav file but it has MULAW compression and python doesn't seem to have function inside wave.py that can successfully decompresses it. So I've taken upon myself to build a decoder in python.
I've found some info about MULAW in basic elements:
Wikipedia
A-law u-Law comparison
Some c-esc codec library
So I need some guidance, since I don't know how to approach getting from signed short integer to a full wave signal. This is my initial thought from what I've gathered so far:
So from wiki I've got a equation for u-law compression and decompression :
compression :
decompression :
So judging by compression equation, it looks like the output is limited to a float range of -1 to +1 , and with signed short integer from –32,768 to 32,767 so it looks like I would need to convert it from short int to float in specific range.
Now, to be honest, I've heard of quantisation before, but I am not sure if I should first try and dequantize and then decompress or in the other way, or even if in this case it is the same thing... the tutorials/documentation can be a bit of tricky with terminology.
The wave file I am working with is supposed to contain 'A' sound like for speech synthesis, I could probably verify success by comparing 2 waveforms in some audio software and custom wave analyzer but I would really like to diminish trial and error section of this process.
So what I've had in mind:
u = 0xff
data_chunk = b'\xe7\xe7' # -6169
data_to_r1 = unpack('h',data_chunk)[0]/0xffff # I suspect this is wrong,
# # but I don't know what else
u_law = ( -1 if data_chunk<0 else 1 )*( pow( 1+u, abs(data_to_r1)) -1 )/u
So is there some sort of algorithm or crucial steps I would need to take in form of first: decompression, second: quantisation : third ?
Since everything I find on google is how to read a .wav PCM-modulated file type, not how to manage it if wild compression arises.

So, after scouring the google the solution was found in github ( go figure ). I've searched for many many algorithms and found 1 that is within bounds of error for lossy compression. Which is for u law for positive values from 30 -> 1 and for negative values from -32 -> -1
To be honest i think this solution is adequate but not quite per equation per say, but it is best solution for now. This code is transcribed to python directly from gcc9108 audio codec
def uLaw_d(i8bit):
bias = 33
sign = pos = 0
decoded = 0
i8bit = ~i8bit
if i8bit&0x80:
i8bit &= ~(1<<7)
sign = -1
pos = ( (i8bit&0xf0) >> 4 ) + 5
decoded = ((1 << pos) | ((i8bit & 0x0F) << (pos - 4)) | (1 << (pos - 5))) - bias
return decoded if sign else ~decoded
def uLaw_e(i16bit):
MAX = 0x1fff
BIAS = 33
mask = 0x1000
sign = lsb = 0
pos = 12
if i16bit < 0:
i16bit = -i16bit
sign = 0x80
i16bit += BIAS
if ( i16bit>MAX ): i16bit = MAX
for x in reversed(range(pos)):
if i16bit&mask != mask and pos>=5:
pos = x
break
lsb = ( i16bit>>(pos-4) )&0xf
return ( ~( sign | ( pos<<4 ) | lsb ) )
With test:
print( 'normal :\t{0}\t|\t{0:2X}\t:\t{0:016b}'.format(0xff) )
print( 'encoded:\t{0}\t|\t{0:2X}\t:\t{0:016b}'.format(uLaw_e(0xff)) )
print( 'decoded:\t{0}\t|\t{0:2X}\t:\t{0:016b}'.format(uLaw_d(uLaw_e(0xff))) )
and output:
normal : 255 | FF : 0000000011111111
encoded: -179 | -B3 : -000000010110011
decoded: 263 | 107 : 0000000100000111
And as you can see 263-255 = 8 which is within bounds. When i tried to implement seeemmmm method described in G.711 ,that kind user Oliver Charlesworth suggested that i look in to , the decoded value for maximum in data was -8036 which is close to the maximum of uLaw spec, but i couldn't reverse engineer decoding function to get binary equivalent of function from wikipedia.
Lastly, i must say that i am currently disappointed that python library doesn't support all kind of compression algorithms since it is not just a tool that people use, it is also a resource python consumers learn from since most of data for further dive into code isn't readily available or understandable.
EDIT
After decoding the data and writing wav file via wave.py i've successfully succeeded to write a new raw linear PCM file. This works... even though i was sceptical at first.
EDIT 2: ::> you can find real solution oncompressions.py

I find this helpful for converting to/from ulaw with numpy arrays.
import audioop
def numpy_audioop_helper(x, xdtype, func, width, ydtype):
'''helper function for using audioop buffer conversion in numpy'''
xi = np.asanyarray(x).astype(xdtype)
if np.any(x != xi):
xinfo = np.iinfo(xdtype)
raise ValueError("input must be %s [%d..%d]" % (xdtype, xinfo.min, xinfo.max))
y = np.frombuffer(func(xi.tobytes(), width), dtype=ydtype)
return y.reshape(xi.shape)
def audioop_ulaw_compress(x):
return numpy_audioop_helper(x, np.int16, audioop.lin2ulaw, 2, np.uint8)
def audioop_ulaw_expand(x):
return numpy_audioop_helper(x, np.uint8, audioop.ulaw2lin, 2, np.int16)

Python actually supports decoding u-Law out of the box:
audioop.ulaw2lin(fragment, width)
Convert sound fragments in u-LAW encoding to linearly encoded sound fragments. u-LAW encoding always uses 8 bits samples, so width
refers only to the sample width of the output fragment here.
https://docs.python.org/3/library/audioop.html#audioop.ulaw2lin

Related

hashlib: Unicode-objects must be encoded before hashing

I am running this hashlib code and it runs almost all the way:
def generate_hashes(peaks, fan_value=DEFAULT_FAN_VALUE):
if PEAK_SORT:
sorted(peaks,key=itemgetter(1))
# bruteforce all peaks
peaks=list(peaks)
len_peaks=len(peaks)
for i in range(len_peaks):
for j in range(1, fan_value):
if (i + j) < len(peaks):
# take current & next peak frequency value
freq1 = peaks[i][IDX_FREQ_I]
freq2 = peaks[i + j][IDX_FREQ_I]
# take current & next -peak time offset
t1 = peaks[i][IDX_TIME_J]
t2 = peaks[i + j][IDX_TIME_J]
# get diff of time offsets
t_delta = t2 - t1
# check if delta is between min & max
if t_delta >= MIN_HASH_TIME_DELTA and t_delta <= MAX_HASH_TIME_DELTA:
h = hashlib.sha1(("%s|%s|%s") % (str(freq1), str(freq2), str(t_delta)))
yield (h.hexdigest()[0:FINGERPRINT_REDUCTION], t1)
However, it returns this error:
h = hashlib.sha1(("%s|%s|%s") % (str(freq1), str(freq2), str(t_delta)))
TypeError: Unicode-objects must be encoded before hashing
I am honestly completely lost and don't know how to fix it. If you guys have any follow up questions regarding details about the code I will try my best to answer. Any feedback would be appreciated.
The answer is in the error message: use encode on your text string before hashing.
h = hashlib.sha1(("%s|%s|%s" % (str(freq1), str(freq2), str(t_delta))).encode('utf-8'))
The reason this is necessary is because hashlib.sha1() requires a bytes object due to the way it works internally. Normal Python strings (since version 3.0) are made of Unicode codepoints, which don't fit into a byte. They need an encoding which defines how the translation between codepoints and bytes occurs. UTF-8 is the most popular encoding, because it can handle every Unicode codepoint yet remain backwards compatible with older encodings like ASCII.

Serial Port data

I am attempting to read the data from an Absolute Encoder with a USB interface using pyserial on the Raspberry. The datasheet for the encoder is below. The USB interface data is on page 22-23
https://www.rls.si/eng/fileuploader/download/download/?d=0&file=custom%2Fupload%2FData-sheet-AksIM-offaxis-rotary-absolute-encoder.pdf
I have successfully connected to the Encoder and I am able to send commands using
port = serial.Serial("/dev/serial/by-id/usb-RLS_Merilna_tehnkis_AksIM_encoder_3454353-if00")
port.write(b"x")
where x is any of the available Commands listed for the USB interface.
For example port.write(b"1") is meant to initiate a single position request. I am able to print the output from encoder with
x = port.read()
print(x)
The problem is converting the output into actual positiong data. port.write(b"1") outputs the following data:
b'\xea\xd0\x05\x00\x00\x00\xef'
I know that the first and last bytes are just the header and footer. Bytes 5 and 6 are the encoder status. Bytes 2-4 is the actual position data. The customer support has informed me that I need to take bytes 2 to 4, shift them into a 32 bit unsigned integer (into lower 3 bytes), convert to a floating point number, divide by 0xFF FF FF, multiply by 360. Result are degrees.
I'm not exactly sure how to do this. Can someone please let me know the python prgramming/functions I need to write in order to do this. Thank you.
You have to use builtin from_bytes() method:
x = b'\xea\xd0\x05\x00\x00\x00\xef'
number = 360 * float(
int.from_bytes(x[1:4], 'big') # get integer from bytes
) / 0xffffff
print(number)
will print:
292.5274832563092
This is the way to extract the bytes and shift them into an integer and scale as a float:
x = b'\xea\xd0\x05\x00\x00\x00\xef'
print(x)
int_value = 0 # initialise shift register
for index in range(1,4):
int_value *= 256 # shift up by 8 bits
int_value += x[index] # or in the next byte
print(int_value)
# scale the integer against the max value
float_value = 360 * float(int_value) / 0xffffff
print(float_value)
Output:
b'\xea\xd0\x05\x00\x00\x00\xef'
13632768
292.5274832563092

how to convert wav file to float amplitude

so I asked everything in the title:
I have a wav file (written by PyAudio from an input audio) and I want to convert it in float data corresponding of the sound level (amplitude) to do some fourier transformation etc...
Anyone have an idea to convert WAV data to float?
I have identified two decent ways of doing this.
Method 1: using the wavefile module
Use this method if you don't mind installing some extra libraries which involved a bit of messing around on my Mac but which was easy on my Ubuntu server.
https://github.com/vokimon/python-wavefile
import wavefile
# returns the contents of the wav file as a double precision float array
def wav_to_floats(filename = 'file1.wav'):
w = wavefile.load(filename)
return w[1][0]
signal = wav_to_floats(sys.argv[1])
print "read "+str(len(signal))+" frames"
print "in the range "+str(min(signal))+" to "+str(max(signal))
Method 2: using the wave module
Use this method if you want less module install hassles.
Reads a wav file from the filesystem and converts it into floats in the range -1 to 1. It works with 16 bit files and if they are > 1 channel, will interleave the samples in the same way they are found in the file. For other bit depths, change the 'h' in the argument to struct.unpack according to the table at the bottom of this page:
https://docs.python.org/2/library/struct.html
It will not work for 24 bit files as there is no data type that is 24 bit, so there is no way to tell struct.unpack what to do.
import wave
import struct
import sys
def wav_to_floats(wave_file):
w = wave.open(wave_file)
astr = w.readframes(w.getnframes())
# convert binary chunks to short
a = struct.unpack("%ih" % (w.getnframes()* w.getnchannels()), astr)
a = [float(val) / pow(2, 15) for val in a]
return a
# read the wav file specified as first command line arg
signal = wav_to_floats(sys.argv[1])
print "read "+str(len(signal))+" frames"
print "in the range "+str(min(signal))+" to "+str(max(signal))
I spent hours trying to find the answer to this. The solution turns out to be really simple: struct.unpack is what you're looking for. The final code will look something like this:
rawdata=stream.read() # The raw PCM data in need of conversion
from struct import unpack # Import unpack -- this is what does the conversion
npts=len(rawdata) # Number of data points to be converted
formatstr='%ih' % npts # The format to convert the data; use '%iB' for unsigned PCM
int_data=unpack(formatstr,rawdata) # Convert from raw PCM to integer tuple
Most of the credit goes to Interpreting WAV Data. The only trick is getting the format right for unpack: it has to be the right number of bytes and the right format (signed or unsigned).
Most wave files are in PCM 16-bit integer format.
What you will want to:
Parse the header to known which format it is (check the link from Xophmeister)
Read the data, take the integer values and convert them to float
Integer values range from -32768 to 32767, and you need to convert to values from -1.0 to 1.0 in floating points.
I don't have the code in python, however in C++, here is a code excerpt if the PCM data is 16-bit integer, and convert it to float (32-bit):
short* pBuffer = (short*)pReadBuffer;
const float ONEOVERSHORTMAX = 3.0517578125e-5f; // 1/32768
unsigned int uFrameRead = dwRead / m_fmt.Format.nBlockAlign;
for ( unsigned int i = 0; i < uFrameCount * m_fmt.Format.nChannels; ++i )
{
short i16In = pBuffer[i];
out_pBuffer[i] = (float)i16In * ONEOVERSHORTMAX;
}
Be careful with stereo files, as the stereo PCM data in wave files is interleaved, meaning the data looks like LRLRLRLRLRLRLRLR (instead of LLLLLLLLRRRRRRRR). You may or may not need to de-interleave depending what you do with the data.
This version reads a wav file from the filesystem and converts it into floats in the range -1 to 1. It works with files of all sample widths and it will interleave the samples in the same way they are found in the file.
import wave
def read_wav_file(filename):
def get_int(bytes_obj):
an_int = int.from_bytes(bytes_obj, 'little', signed=sampwidth!=1)
return an_int - 128 * (sampwidth == 1)
with wave.open(filename, 'rb') as file:
sampwidth = file.getsampwidth()
frames = file.readframes(-1)
bytes_samples = (frames[i : i+sampwidth] for i in range(0, len(frames), sampwidth))
return [get_int(b) / pow(2, sampwidth * 8 - 1) for b in bytes_samples]
Also here is a link to the function that converts floats back to ints and writes them to desired wav file:
https://gto76.github.io/python-cheatsheet/#writefloatsamplestowavfile
The Microsoft WAVE format is fairly well documented. See https://ccrma.stanford.edu/courses/422/projects/WaveFormat/ for example. It wouldn't take much to write a file parser to open and interpret the data to get the information you require... That said, it's almost certainly been done before, so I'm sure someone will give an "easier" answer ;)

How to set parameters in Python zlib module

I want to write a Python program that makes PNG files. My big problem is with generating the CRC and the data in the IDAT chunk. Python 2.6.4 does have a zlib module, but there are extra settings needed. The PNG specification REQUIRES the IDAT data to be compressed with zlib's deflate method with a window size of 32768 bytes, but I can't find how to set those parameters in the Python zlib module.
As for the CRC for each chunk, the zlib module documentation indicates that it contains a CRC function. I believe that calling that CRC function as crc32(data,-1) will generate the CRC that I need, though if necessary I can translate the C code given in the PNG specification.
Note that I can generate the rest of the PNG file and the data that is to be compressed for the IDAT chunk, I just don't know how to properly compress the image data for the IDAT chunk after implementing the initial filtering step.
EDITED:
The problem with PyPNG is that it will not write tEXt chunks. A minor annoyance is that one has to manipulate the image as (R, G, B) data; I'd prefer to manipulate palette values of the pixels directly and then define the associations between palette values and color data. I'm also left unsure if PyPNG takes advantage of the "compression" allowed by using 1-, 2-, and 4- bit palette values in the image data to fit more than one pixel in a byte.
Even if you can't use PyPNG for the tEXt chunk reason, you can use its code! (it's MIT licensed). Here's how a chunk is written:
def write_chunk(outfile, tag, data=''):
"""
Write a PNG chunk to the output file, including length and
checksum.
"""
# http://www.w3.org/TR/PNG/#5Chunk-layout
outfile.write(struct.pack("!I", len(data)))
outfile.write(tag)
outfile.write(data)
checksum = zlib.crc32(tag)
checksum = zlib.crc32(data, checksum)
outfile.write(struct.pack("!i", checksum))
Note the use of zlib.crc32 to create the CRC checksum, and also note how the checksum runs over both the tag and the data.
For compressing the IDAT chunks you basically just use zlib. As others have noted the adler checksum and the default window size are all okay (by the way the PNG spec does not require a window size of 32768, it requires that the window is at most 32768 bytes; this is all a bit odd, because in any case 32768 is the maximum window size permitted by the current version of the zlib spec).
The code to do this in PyPNG is not particular great, see the write_passes() function. The bit that actually compresses the data and writes a chunk is this:
compressor = zlib.compressobj()
compressed = compressor.compress(tostring(data))
if len(compressed):
# print >> sys.stderr, len(data), len(compressed)
write_chunk(outfile, 'IDAT', compressed)
PyPNG never uses scanline filtering. Partly this is because it would be very slow in Python, partly because I haven't written the code. If you have Python code to do filtering, it would be a most welcome contribution to PyPNG. :)
Short answer: (1) "deflate" and "32Kb window" are the defaults (2) uses adler32 not crc32
Long answer:
""" The PNG specification REQUIRES the IDAT data to be compressed with zlib's deflate method with a window size of 32768 bytes, but I can't find how to set those parameters in the Python zlib module. """
You don't need to set them. Those are the defaults.
If you really want to specify non-default arguments to zlib, you can use zlib.compressobj() ... it has several args that are not documented in the Python docs. Reading material:
source: Python's gzip.py (see how it calls zlib.compressobj)
source: Python's zlibmodule.c (see its defaults)
SO: This question (see answers by MizardX and myself, and comments on each)
docs: The manual on the zlib site
"""As for the CRC for each chunk, the zlib module documentation indicates that it contains a CRC function. I believe that calling that CRC function as crc32(data,-1) will generate the CRC that I need, though if necessary I can translate the C code given in the PNG specification."""
Please check out the zlib specification aka RFC 1950 ... it says that the checksum used is adler32
The zlib compress or compressobj output will include the appropriate CRC; why do you think that you will need to do it yourself?
Edit So you do need a CRC-32. Good news: zlib.crc32() will do the job:
Code:
import zlib
crc_table = None
def make_crc_table():
global crc_table
crc_table = [0] * 256
for n in xrange(256):
c = n
for k in xrange(8):
if c & 1:
c = 0xedb88320L ^ (c >> 1)
else:
c = c >> 1
crc_table[n] = c
make_crc_table()
"""
/* Update a running CRC with the bytes buf[0..len-1]--the CRC
should be initialized to all 1's, and the transmitted value
is the 1's complement of the final running CRC (see the
crc() routine below)). */
"""
def update_crc(crc, buf):
c = crc
for byte in buf:
c = crc_table[int((c ^ ord(byte)) & 0xff)] ^ (c >> 8)
return c
# /* Return the CRC of the bytes buf[0..len-1]. */
def crc(buf):
return update_crc(0xffffffffL, buf) ^ 0xffffffffL
if __name__ == "__main__":
tests = [
"",
"\x00",
"\x01",
"Twas brillig and the slithy toves did gyre and gimble in the wabe",
]
for test in tests:
model = crc(test) & 0xFFFFFFFFL
zlib_result = zlib.crc32(test) & 0xFFFFFFFFL
print (model, zlib_result, model == zlib_result)
Output from Python 2.7 is below. Also tested with Python 2.1 to 2.6 inclusive and 1.5.2 JFTHOI.
(0L, 0L, True)
(3523407757L, 3523407757L, True)
(2768625435L, 2768625435L, True)
(4186783197L, 4186783197L, True)
Don't you want to use some existing software to generate your PNGs? How about PyPNG?
There are libraries that can write PNG files for you, such as PIL. That will be easier and faster, and as an added bonus you can read and write tons of formats.
It looks like you will have to resort to call zlib "by hand" using ctypes --
It is not that hard:
>>> import ctypes
>>> z = ctypes.cdll.LoadLibrary("libzip.so.1")
>>> z.zlibVersion.restype=ctypes.c_char_p
>>> z.zlibVersion()
'1.2.3'
You can check the zlib library docmentation here: http://zlib.net/manual.html
The zlib.crc32 works fine, and the zlib compressor has correct defaults for png generation.
For the casual reader who looks for png generation from Python code, here is a complete example that you can use as a starter for your own png generator code - all you need is the standard zlib module and some bytes-encoding:
#! /usr/bin/python
""" Converts a list of list into gray-scale PNG image. """
__copyright__ = "Copyright (C) 2014 Guido Draheim"
__licence__ = "Public Domain"
import zlib
import struct
def makeGrayPNG(data, height = None, width = None):
def I1(value):
return struct.pack("!B", value & (2**8-1))
def I4(value):
return struct.pack("!I", value & (2**32-1))
# compute width&height from data if not explicit
if height is None:
height = len(data) # rows
if width is None:
width = 0
for row in data:
if width < len(row):
width = len(row)
# generate these chunks depending on image type
makeIHDR = True
makeIDAT = True
makeIEND = True
png = b"\x89" + "PNG\r\n\x1A\n".encode('ascii')
if makeIHDR:
colortype = 0 # true gray image (no palette)
bitdepth = 8 # with one byte per pixel (0..255)
compression = 0 # zlib (no choice here)
filtertype = 0 # adaptive (each scanline seperately)
interlaced = 0 # no
IHDR = I4(width) + I4(height) + I1(bitdepth)
IHDR += I1(colortype) + I1(compression)
IHDR += I1(filtertype) + I1(interlaced)
block = "IHDR".encode('ascii') + IHDR
png += I4(len(IHDR)) + block + I4(zlib.crc32(block))
if makeIDAT:
raw = b""
for y in xrange(height):
raw += b"\0" # no filter for this scanline
for x in xrange(width):
c = b"\0" # default black pixel
if y < len(data) and x < len(data[y]):
c = I1(data[y][x])
raw += c
compressor = zlib.compressobj()
compressed = compressor.compress(raw)
compressed += compressor.flush() #!!
block = "IDAT".encode('ascii') + compressed
png += I4(len(compressed)) + block + I4(zlib.crc32(block))
if makeIEND:
block = "IEND".encode('ascii')
png += I4(0) + block + I4(zlib.crc32(block))
return png
def _example():
with open("cross3x3.png","wb") as f:
f.write(makeGrayPNG([[0,255,0],[255,255,255],[0,255,0]]))

Interpreting WAV Data

I'm trying to write a program to display PCM data. I've been very frustrated trying to find a library with the right level of abstraction, but I've found the python wave library and have been using that. However, I'm not sure how to interpret the data.
The wave.getparams function returns (2 channels, 2 bytes, 44100 Hz, 96333 frames, No compression, No compression). This all seems cheery, but then I tried printing a single frame:'\xc0\xff\xd0\xff' which is 4 bytes. I suppose it's possible that a frame is 2 samples, but the ambiguities do not end there.
96333 frames * 2 samples/frame * (1/44.1k sec/sample) = 4.3688 seconds
However, iTunes reports the time as closer to 2 seconds and calculations based on file size and bitrate are in the ballpark of 2.7 seconds. What's going on here?
Additionally, how am I to know if the bytes are signed or unsigned?
Many thanks!
Thank you for your help! I got it working and I'll post the solution here for everyone to use in case some other poor soul needs it:
import wave
import struct
def pcm_channels(wave_file):
"""Given a file-like object or file path representing a wave file,
decompose it into its constituent PCM data streams.
Input: A file like object or file path
Output: A list of lists of integers representing the PCM coded data stream channels
and the sample rate of the channels (mixed rate channels not supported)
"""
stream = wave.open(wave_file,"rb")
num_channels = stream.getnchannels()
sample_rate = stream.getframerate()
sample_width = stream.getsampwidth()
num_frames = stream.getnframes()
raw_data = stream.readframes( num_frames ) # Returns byte data
stream.close()
total_samples = num_frames * num_channels
if sample_width == 1:
fmt = "%iB" % total_samples # read unsigned chars
elif sample_width == 2:
fmt = "%ih" % total_samples # read signed 2 byte shorts
else:
raise ValueError("Only supports 8 and 16 bit audio formats.")
integer_data = struct.unpack(fmt, raw_data)
del raw_data # Keep memory tidy (who knows how big it might be)
channels = [ [] for time in range(num_channels) ]
for index, value in enumerate(integer_data):
bucket = index % num_channels
channels[bucket].append(value)
return channels, sample_rate
"Two channels" means stereo, so it makes no sense to sum each channel's duration -- so you're off by a factor of two (2.18 seconds, not 4.37). As for signedness, as explained for example here, and I quote:
8-bit samples are stored as unsigned
bytes, ranging from 0 to 255. 16-bit
samples are stored as 2's-complement
signed integers, ranging from -32768
to 32767.
This is part of the specs of the WAV format (actually of its superset RIFF) and thus not dependent on what library you're using to deal with a WAV file.
I know that an answer has already been accepted, but I did some things with audio a while ago and you have to unpack the wave doing something like this.
pcmdata = wave.struct.unpack("%dh"%(wavedatalength),wavedata)
Also, one package that I used was called PyAudio, though I still had to use the wave package with it.
Each sample is 16 bits and there 2 channels, so the frame takes 4 bytes
The duration is simply the number of frames divided by the number of frames per second. From your data this is: 96333 / 44100 = 2.18 seconds.
Building upon this answer, you can get a good performance boost by using numpy.fromstring or numpy.fromfile. Also see this answer.
Here is what I did:
def interpret_wav(raw_bytes, n_frames, n_channels, sample_width, interleaved = True):
if sample_width == 1:
dtype = np.uint8 # unsigned char
elif sample_width == 2:
dtype = np.int16 # signed 2-byte short
else:
raise ValueError("Only supports 8 and 16 bit audio formats.")
channels = np.fromstring(raw_bytes, dtype=dtype)
if interleaved:
# channels are interleaved, i.e. sample N of channel M follows sample N of channel M-1 in raw data
channels.shape = (n_frames, n_channels)
channels = channels.T
else:
# channels are not interleaved. All samples from channel M occur before all samples from channel M-1
channels.shape = (n_channels, n_frames)
return channels
Assigning a new value to shape will throw an error if it requires data to be copied in memory. This is a good thing, since you want to use the data in place (using less time and memory overall). The ndarray.T function also does not copy (i.e. returns a view) if possible, but I'm not sure how you ensure that it does not copy.
Reading directly from the file with np.fromfile will be even better, but you would have to skip the header using a custom dtype. I haven't tried this yet.

Categories

Resources