I want to write a Python program that makes PNG files. My big problem is with generating the CRC and the data in the IDAT chunk. Python 2.6.4 does have a zlib module, but there are extra settings needed. The PNG specification REQUIRES the IDAT data to be compressed with zlib's deflate method with a window size of 32768 bytes, but I can't find how to set those parameters in the Python zlib module.
As for the CRC for each chunk, the zlib module documentation indicates that it contains a CRC function. I believe that calling that CRC function as crc32(data,-1) will generate the CRC that I need, though if necessary I can translate the C code given in the PNG specification.
Note that I can generate the rest of the PNG file and the data that is to be compressed for the IDAT chunk, I just don't know how to properly compress the image data for the IDAT chunk after implementing the initial filtering step.
EDITED:
The problem with PyPNG is that it will not write tEXt chunks. A minor annoyance is that one has to manipulate the image as (R, G, B) data; I'd prefer to manipulate palette values of the pixels directly and then define the associations between palette values and color data. I'm also left unsure if PyPNG takes advantage of the "compression" allowed by using 1-, 2-, and 4- bit palette values in the image data to fit more than one pixel in a byte.
Even if you can't use PyPNG for the tEXt chunk reason, you can use its code! (it's MIT licensed). Here's how a chunk is written:
def write_chunk(outfile, tag, data=''):
"""
Write a PNG chunk to the output file, including length and
checksum.
"""
# http://www.w3.org/TR/PNG/#5Chunk-layout
outfile.write(struct.pack("!I", len(data)))
outfile.write(tag)
outfile.write(data)
checksum = zlib.crc32(tag)
checksum = zlib.crc32(data, checksum)
outfile.write(struct.pack("!i", checksum))
Note the use of zlib.crc32 to create the CRC checksum, and also note how the checksum runs over both the tag and the data.
For compressing the IDAT chunks you basically just use zlib. As others have noted the adler checksum and the default window size are all okay (by the way the PNG spec does not require a window size of 32768, it requires that the window is at most 32768 bytes; this is all a bit odd, because in any case 32768 is the maximum window size permitted by the current version of the zlib spec).
The code to do this in PyPNG is not particular great, see the write_passes() function. The bit that actually compresses the data and writes a chunk is this:
compressor = zlib.compressobj()
compressed = compressor.compress(tostring(data))
if len(compressed):
# print >> sys.stderr, len(data), len(compressed)
write_chunk(outfile, 'IDAT', compressed)
PyPNG never uses scanline filtering. Partly this is because it would be very slow in Python, partly because I haven't written the code. If you have Python code to do filtering, it would be a most welcome contribution to PyPNG. :)
Short answer: (1) "deflate" and "32Kb window" are the defaults (2) uses adler32 not crc32
Long answer:
""" The PNG specification REQUIRES the IDAT data to be compressed with zlib's deflate method with a window size of 32768 bytes, but I can't find how to set those parameters in the Python zlib module. """
You don't need to set them. Those are the defaults.
If you really want to specify non-default arguments to zlib, you can use zlib.compressobj() ... it has several args that are not documented in the Python docs. Reading material:
source: Python's gzip.py (see how it calls zlib.compressobj)
source: Python's zlibmodule.c (see its defaults)
SO: This question (see answers by MizardX and myself, and comments on each)
docs: The manual on the zlib site
"""As for the CRC for each chunk, the zlib module documentation indicates that it contains a CRC function. I believe that calling that CRC function as crc32(data,-1) will generate the CRC that I need, though if necessary I can translate the C code given in the PNG specification."""
Please check out the zlib specification aka RFC 1950 ... it says that the checksum used is adler32
The zlib compress or compressobj output will include the appropriate CRC; why do you think that you will need to do it yourself?
Edit So you do need a CRC-32. Good news: zlib.crc32() will do the job:
Code:
import zlib
crc_table = None
def make_crc_table():
global crc_table
crc_table = [0] * 256
for n in xrange(256):
c = n
for k in xrange(8):
if c & 1:
c = 0xedb88320L ^ (c >> 1)
else:
c = c >> 1
crc_table[n] = c
make_crc_table()
"""
/* Update a running CRC with the bytes buf[0..len-1]--the CRC
should be initialized to all 1's, and the transmitted value
is the 1's complement of the final running CRC (see the
crc() routine below)). */
"""
def update_crc(crc, buf):
c = crc
for byte in buf:
c = crc_table[int((c ^ ord(byte)) & 0xff)] ^ (c >> 8)
return c
# /* Return the CRC of the bytes buf[0..len-1]. */
def crc(buf):
return update_crc(0xffffffffL, buf) ^ 0xffffffffL
if __name__ == "__main__":
tests = [
"",
"\x00",
"\x01",
"Twas brillig and the slithy toves did gyre and gimble in the wabe",
]
for test in tests:
model = crc(test) & 0xFFFFFFFFL
zlib_result = zlib.crc32(test) & 0xFFFFFFFFL
print (model, zlib_result, model == zlib_result)
Output from Python 2.7 is below. Also tested with Python 2.1 to 2.6 inclusive and 1.5.2 JFTHOI.
(0L, 0L, True)
(3523407757L, 3523407757L, True)
(2768625435L, 2768625435L, True)
(4186783197L, 4186783197L, True)
Don't you want to use some existing software to generate your PNGs? How about PyPNG?
There are libraries that can write PNG files for you, such as PIL. That will be easier and faster, and as an added bonus you can read and write tons of formats.
It looks like you will have to resort to call zlib "by hand" using ctypes --
It is not that hard:
>>> import ctypes
>>> z = ctypes.cdll.LoadLibrary("libzip.so.1")
>>> z.zlibVersion.restype=ctypes.c_char_p
>>> z.zlibVersion()
'1.2.3'
You can check the zlib library docmentation here: http://zlib.net/manual.html
The zlib.crc32 works fine, and the zlib compressor has correct defaults for png generation.
For the casual reader who looks for png generation from Python code, here is a complete example that you can use as a starter for your own png generator code - all you need is the standard zlib module and some bytes-encoding:
#! /usr/bin/python
""" Converts a list of list into gray-scale PNG image. """
__copyright__ = "Copyright (C) 2014 Guido Draheim"
__licence__ = "Public Domain"
import zlib
import struct
def makeGrayPNG(data, height = None, width = None):
def I1(value):
return struct.pack("!B", value & (2**8-1))
def I4(value):
return struct.pack("!I", value & (2**32-1))
# compute width&height from data if not explicit
if height is None:
height = len(data) # rows
if width is None:
width = 0
for row in data:
if width < len(row):
width = len(row)
# generate these chunks depending on image type
makeIHDR = True
makeIDAT = True
makeIEND = True
png = b"\x89" + "PNG\r\n\x1A\n".encode('ascii')
if makeIHDR:
colortype = 0 # true gray image (no palette)
bitdepth = 8 # with one byte per pixel (0..255)
compression = 0 # zlib (no choice here)
filtertype = 0 # adaptive (each scanline seperately)
interlaced = 0 # no
IHDR = I4(width) + I4(height) + I1(bitdepth)
IHDR += I1(colortype) + I1(compression)
IHDR += I1(filtertype) + I1(interlaced)
block = "IHDR".encode('ascii') + IHDR
png += I4(len(IHDR)) + block + I4(zlib.crc32(block))
if makeIDAT:
raw = b""
for y in xrange(height):
raw += b"\0" # no filter for this scanline
for x in xrange(width):
c = b"\0" # default black pixel
if y < len(data) and x < len(data[y]):
c = I1(data[y][x])
raw += c
compressor = zlib.compressobj()
compressed = compressor.compress(raw)
compressed += compressor.flush() #!!
block = "IDAT".encode('ascii') + compressed
png += I4(len(compressed)) + block + I4(zlib.crc32(block))
if makeIEND:
block = "IEND".encode('ascii')
png += I4(0) + block + I4(zlib.crc32(block))
return png
def _example():
with open("cross3x3.png","wb") as f:
f.write(makeGrayPNG([[0,255,0],[255,255,255],[0,255,0]]))
Related
I am trying to zero out the pixels in some .tiff-like biomedical scans (.svs & .ndpi) by changing values directly in the binary file.
For reference, I am using the docs on the .tiff format here.
As a sanity check, I've confirmed that the first two bytes have values 73 and 73 (or I and I in ASCII), meaning it is little-endian, and that the two next bytes are the value 42 (both these things are expected as according to the docs just mentioned).
I wrote a Python script that reads the IFD (Image File Directory) and its components, but I am having troubles proceeding from there.
My code is this:
with open('scan.svs', "rb") as f:
# Read away the first 4 bytes:
f.read(4)
# Read offset of first IFD as the four next bytes:
IFD_offset = int.from_bytes(f.read(4), 'little')
# Move to IFD:
f.seek(IFD_offset, 0)
# Read IFD:
IFD = f.read(12)
# Get components of IFD:
tag = int.from_bytes(IFD[:2], 'little')
field_type = int.from_bytes(IFD[2:4], 'little')
count = int.from_bytes(IFD[4:8], 'little')
value_offset = int.from_bytes(IFD[8:], 'little')
# Now what?
The values for the components are tag=16, field_type=254, count=65540 and value_offset=0.
How do I go from there?
Ps: Using Python is not a must, if there is some other tool that could more easily to the job.
I googled this issue for last 2 weeks and wasn't able to find an algorithm or solution. I have some short .wav file but it has MULAW compression and python doesn't seem to have function inside wave.py that can successfully decompresses it. So I've taken upon myself to build a decoder in python.
I've found some info about MULAW in basic elements:
Wikipedia
A-law u-Law comparison
Some c-esc codec library
So I need some guidance, since I don't know how to approach getting from signed short integer to a full wave signal. This is my initial thought from what I've gathered so far:
So from wiki I've got a equation for u-law compression and decompression :
compression :
decompression :
So judging by compression equation, it looks like the output is limited to a float range of -1 to +1 , and with signed short integer from –32,768 to 32,767 so it looks like I would need to convert it from short int to float in specific range.
Now, to be honest, I've heard of quantisation before, but I am not sure if I should first try and dequantize and then decompress or in the other way, or even if in this case it is the same thing... the tutorials/documentation can be a bit of tricky with terminology.
The wave file I am working with is supposed to contain 'A' sound like for speech synthesis, I could probably verify success by comparing 2 waveforms in some audio software and custom wave analyzer but I would really like to diminish trial and error section of this process.
So what I've had in mind:
u = 0xff
data_chunk = b'\xe7\xe7' # -6169
data_to_r1 = unpack('h',data_chunk)[0]/0xffff # I suspect this is wrong,
# # but I don't know what else
u_law = ( -1 if data_chunk<0 else 1 )*( pow( 1+u, abs(data_to_r1)) -1 )/u
So is there some sort of algorithm or crucial steps I would need to take in form of first: decompression, second: quantisation : third ?
Since everything I find on google is how to read a .wav PCM-modulated file type, not how to manage it if wild compression arises.
So, after scouring the google the solution was found in github ( go figure ). I've searched for many many algorithms and found 1 that is within bounds of error for lossy compression. Which is for u law for positive values from 30 -> 1 and for negative values from -32 -> -1
To be honest i think this solution is adequate but not quite per equation per say, but it is best solution for now. This code is transcribed to python directly from gcc9108 audio codec
def uLaw_d(i8bit):
bias = 33
sign = pos = 0
decoded = 0
i8bit = ~i8bit
if i8bit&0x80:
i8bit &= ~(1<<7)
sign = -1
pos = ( (i8bit&0xf0) >> 4 ) + 5
decoded = ((1 << pos) | ((i8bit & 0x0F) << (pos - 4)) | (1 << (pos - 5))) - bias
return decoded if sign else ~decoded
def uLaw_e(i16bit):
MAX = 0x1fff
BIAS = 33
mask = 0x1000
sign = lsb = 0
pos = 12
if i16bit < 0:
i16bit = -i16bit
sign = 0x80
i16bit += BIAS
if ( i16bit>MAX ): i16bit = MAX
for x in reversed(range(pos)):
if i16bit&mask != mask and pos>=5:
pos = x
break
lsb = ( i16bit>>(pos-4) )&0xf
return ( ~( sign | ( pos<<4 ) | lsb ) )
With test:
print( 'normal :\t{0}\t|\t{0:2X}\t:\t{0:016b}'.format(0xff) )
print( 'encoded:\t{0}\t|\t{0:2X}\t:\t{0:016b}'.format(uLaw_e(0xff)) )
print( 'decoded:\t{0}\t|\t{0:2X}\t:\t{0:016b}'.format(uLaw_d(uLaw_e(0xff))) )
and output:
normal : 255 | FF : 0000000011111111
encoded: -179 | -B3 : -000000010110011
decoded: 263 | 107 : 0000000100000111
And as you can see 263-255 = 8 which is within bounds. When i tried to implement seeemmmm method described in G.711 ,that kind user Oliver Charlesworth suggested that i look in to , the decoded value for maximum in data was -8036 which is close to the maximum of uLaw spec, but i couldn't reverse engineer decoding function to get binary equivalent of function from wikipedia.
Lastly, i must say that i am currently disappointed that python library doesn't support all kind of compression algorithms since it is not just a tool that people use, it is also a resource python consumers learn from since most of data for further dive into code isn't readily available or understandable.
EDIT
After decoding the data and writing wav file via wave.py i've successfully succeeded to write a new raw linear PCM file. This works... even though i was sceptical at first.
EDIT 2: ::> you can find real solution oncompressions.py
I find this helpful for converting to/from ulaw with numpy arrays.
import audioop
def numpy_audioop_helper(x, xdtype, func, width, ydtype):
'''helper function for using audioop buffer conversion in numpy'''
xi = np.asanyarray(x).astype(xdtype)
if np.any(x != xi):
xinfo = np.iinfo(xdtype)
raise ValueError("input must be %s [%d..%d]" % (xdtype, xinfo.min, xinfo.max))
y = np.frombuffer(func(xi.tobytes(), width), dtype=ydtype)
return y.reshape(xi.shape)
def audioop_ulaw_compress(x):
return numpy_audioop_helper(x, np.int16, audioop.lin2ulaw, 2, np.uint8)
def audioop_ulaw_expand(x):
return numpy_audioop_helper(x, np.uint8, audioop.ulaw2lin, 2, np.int16)
Python actually supports decoding u-Law out of the box:
audioop.ulaw2lin(fragment, width)
Convert sound fragments in u-LAW encoding to linearly encoded sound fragments. u-LAW encoding always uses 8 bits samples, so width
refers only to the sample width of the output fragment here.
https://docs.python.org/3/library/audioop.html#audioop.ulaw2lin
import sys
import os
import zlib
try:
import pylzma as lzma
except ImportError:
import lzma
from io import StringIO
import struct
#-----------------------------------------------------------------------------------------------------------------------
def read_ui8(c):
return struct.unpack('<B', c)[0]
def read_ui16(c):
return struct.unpack('<H', c)[0]
def read_ui32(c):
return struct.unpack('<I', c)[0]
def parse(input):
"""Parses the header information from an SWF file."""
if hasattr(input, 'read'):
input.seek(0)
else:
input = open(input, 'rb')
header = { }
# Read the 3-byte signature field
header['signature'] = signature = b''.join(struct.unpack('<3c', input.read(3))).decode()
# Version
header['version'] = read_ui8(input.read(1))
# File size (stored as a 32-bit integer)
header['size'] = read_ui32(input.read(4))
# Payload
if header['signature'] == 'FWS':
print("The opened file doesn't appear to be compressed")
buffer = input.read(header['size'])
elif header['signature'] == 'CWS':
print("The opened file appears to be compressed with Zlib")
buffer = zlib.decompress(input.read(header['size']))
elif header['signature'] == 'ZWS':
print("The opened file appears to be compressed with Lzma")
# ZWS(LZMA)
# | 4 bytes | 4 bytes | 4 bytes | 5 bytes | n bytes | 6 bytes |
# | 'ZWS'+version | scriptLen | compressedLen | LZMA props | LZMA data | LZMA end marker |
size = read_ui32(input.read(4))
buffer = lzma.decompress(input.read())
# Containing rectangle (struct RECT)
# The number of bits used to store the each of the RECT values are
# stored in first five bits of the first byte.
nbits = read_ui8(buffer[0]) >> 3
current_byte, buffer = read_ui8(buffer[0]), buffer[1:]
bit_cursor = 5
for item in 'xmin', 'xmax', 'ymin', 'ymax':
value = 0
for value_bit in range(nbits-1, -1, -1): # == reversed(range(nbits))
if (current_byte << bit_cursor) & 0x80:
value |= 1 << value_bit
# Advance the bit cursor to the next bit
bit_cursor += 1
if bit_cursor > 7:
# We've exhausted the current byte, consume the next one
# from the buffer.
current_byte, buffer = read_ui8(buffer[0]), buffer[1:]
bit_cursor = 0
# Convert value from TWIPS to a pixel value
header[item] = value / 20
header['width'] = header['xmax'] - header['xmin']
header['height'] = header['ymax'] - header['ymin']
header['frames'] = read_ui16(buffer[0:2])
header['fps'] = read_ui16(buffer[2:4])
input.close()
return header
header = parse(sys.argv[1]);
print('SWF header')
print('----------')
print('Version: %s' % header['version'])
print('Signature: %s' % header['signature'])
print('Dimensions: %s x %s' % (header['width'], header['height']))
print('Bounding box: (%s, %s, %s, %s)' % (header['xmin'], header['xmax'], header['ymin'], header['ymax']))
print('Frames: %s' % header['frames'])
print('FPS: %s' % header['fps'])
I was under the impression the built in python 3.4 LZMA module works the same as the Python 2.7 pyLZMA module.
The code I've provided runs on both 2.7 and 3.4, but when it is run on 3.4 (which doesn't have pylzma so it resorts to the inbuilt lzma) I get the following error:
_lzma.LZMAError: Input format not supported by decoder
Why does pylzma work but Python 3.4's lzma doesn't?
While I do not have an answer to why the two modules work differently, I do have a solution.
I was unable to get the non-stream LZMA lzma.decompress to work since I do not have enough knowledge about the LZMA/XZ/SWF specs, however I got the lzma.LZMADecompressor to work. For completeness, I believe SWF LZMA uses this header format (not 100% confirmed):
Bytes Length Type Endianness Description
0- 2 3 UI8 - SWF Signature: ZWS
3 1 UI8 - SWF Version
4- 7 4 UI32 LE SWF FileLength aka File Size
8-11 4 UI32 LE SWF? Compressed Size (File Size - 17)
12 1 - - LZMA Decoder Properties
13-16 4 UI32 LE LZMA Dictionary Size
17- - - - LZMA Compressed Data (including rest of SWF header)
However the LZMA file format spec says that it should be:
Bytes Length Type Endianness Description
0 1 - - LZMA Decoder Properties
1- 4 4 UI32 LE LZMA Dictionary Size
5-12 8 UI64 LE LZMA Uncompressed Size
13- - - - LZMA Compressed Data
I was never able to really get my head around what Uncompressed Size should be (if even possible to define for this format). pylzma seems to not care about this, while Python 3.3 lzma does. However, it seems that an explicit unknown size works and may be specified as an UI64 with value 2^64, e.g. 8*b'\xff' or 8*'\xff', so by shuffling around headers a bit and instead of using:
buffer = lzma.decompress(input.read())
Try:
d = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
buffer = d.decompress(input.read(5) + 8*b'\xff' + input.read())
Note: I had no local python3 interpreter available so only tested it online with a slightly modified read procedure, so it might not work out of the box.
Edit: Confirmed to work in python3 however some things needed to be changed, like Marcus mentioned about unpack (easily solved by using buffer[0:1] instead of buffer[0]). It's not really necessary to read the whole file either, a small chunk, say 256 bytes should be fine for reading the whole SWF header. The frames field is a bit quirky too, though I believe all you have to do is some bit shifting, i.e.:
header['frames'] = read_ui16(buffer[0:2]) >> 8
SWF file format spec
LZMA file format spec
I'm trying to use python to create a random binary file. This is what I've got already:
f = open(filename,'wb')
for i in xrange(size_kb):
for ii in xrange(1024/4):
f.write(struct.pack("=I",random.randint(0,sys.maxint*2+1)))
f.close()
But it's terribly slow (0.82 seconds for size_kb=1024 on my 3.9GHz SSD disk machine). A big bottleneck seems to be the random int generation (replacing the randint() with a 0 reduces running time from 0.82s to 0.14s).
Now I know there are more efficient ways of creating random data files (namely dd if=/dev/urandom) but I'm trying to figure this out for sake of curiosity... is there an obvious way to improve this?
IMHO - the following is completely redundant:
f.write(struct.pack("=I",random.randint(0,sys.maxint*2+1)))
There's absolutely no need to use struct.pack, just do something like:
import os
fileSizeInBytes = 1024
with open('output_filename', 'wb') as fout:
fout.write(os.urandom(fileSizeInBytes)) # replace 1024 with a size in kilobytes if it is not unreasonably large
Then, if you need to re-use the file for reading integers, then struct.unpack then.
(my use case is generating a file for a unit test so I just need a
file that isn't identical with other generated files).
Another option is to just write a UUID4 to the file, but since I don't know the exact use case, I'm not sure that's viable.
The python code you should write completely depends on the way you intend to use the random binary file. If you just need a "rather good" randomness for multiple purposes, then the code of Jon Clements is probably the best.
However, on Linux OS at least, os.urandom relies on /dev/urandom, which is described in the Linux Kernel (drivers/char/random.c) as follows:
The /dev/urandom device [...] will return as many bytes as are
requested. As more and more random bytes are requested without giving
time for the entropy pool to recharge, this will result in random
numbers that are merely cryptographically strong. For many
applications, however, this is acceptable.
So the question is, is this acceptable for your application ? If you prefer a more secure RNG, you could read bytes on /dev/random instead. The main inconvenient of this device: it can block indefinitely if the Linux kernel is not able to gather enough entropy. There are also other cryptographically secure RNGs like EGD.
Alternatively, if your main concern is execution speed and if you just need some "light-randomness" for a Monte-Carlo method (i.e unpredictability doesn't matter, uniform distribution does), you could consider generate your random binary file once and use it many times, at least for development.
Here's a complete script based on accepted answer that creates random files.
import sys, os
def help(error: str = None) -> None:
if error and error != "help":
print("***",error,"\n\n",file=sys.stderr,sep=' ',end='');
sys.exit(1)
print("""\tCreates binary files with random content""", end='\n')
print("""Usage:""",)
print(os.path.split(__file__)[1], """ "name1" "1TB" "name2" "5kb"
Accepted units: MB, GB, KB, TB, B""")
sys.exit(2)
# https://stackoverflow.com/a/51253225/1077444
def convert_size_to_bytes(size_str):
"""Convert human filesizes to bytes.
ex: 1 tb, 1 kb, 1 mb, 1 pb, 1 eb, 1 zb, 3 yb
To reverse this, see hurry.filesize or the Django filesizeformat template
filter.
:param size_str: A human-readable string representing a file size, e.g.,
"22 megabytes".
:return: The number of bytes represented by the string.
"""
multipliers = {
'kilobyte': 1024,
'megabyte': 1024 ** 2,
'gigabyte': 1024 ** 3,
'terabyte': 1024 ** 4,
'petabyte': 1024 ** 5,
'exabyte': 1024 ** 6,
'zetabyte': 1024 ** 7,
'yottabyte': 1024 ** 8,
'kb': 1024,
'mb': 1024**2,
'gb': 1024**3,
'tb': 1024**4,
'pb': 1024**5,
'eb': 1024**6,
'zb': 1024**7,
'yb': 1024**8,
}
for suffix in multipliers:
size_str = size_str.lower().strip().strip('s')
if size_str.lower().endswith(suffix):
return int(float(size_str[0:-len(suffix)]) * multipliers[suffix])
else:
if size_str.endswith('b'):
size_str = size_str[0:-1]
elif size_str.endswith('byte'):
size_str = size_str[0:-4]
return int(size_str)
if __name__ == "__main__":
input = {} #{ file: byte_size }
if (len(sys.argv)-1) % 2 != 0:
print("-- Provide even number of arguments --")
print(f'--\tGot: {len(sys.argv)-1}: "' + r'" "'.join(sys.argv[1:]) +'"')
sys.exit(2)
elif len(sys.argv) == 1:
help()
try:
for file, size_str in zip(sys.argv[1::2], sys.argv[2::2]):
input[file] = convert_size_to_bytes(size_str)
except ValueError as ex:
print(f'Invalid size: "{size_str}"', file=sys.stderr)
sys.exit(1)
for file, size_bytes in input.items():
print(f"Writing: {file}")
#https://stackoverflow.com/a/14276423/1077444
with open(file, 'wb') as fout:
while( size_bytes > 0 ):
wrote = min(size_bytes, 1024) #chunk
fout.write(os.urandom(wrote))
size_bytes -= wrote
I'm trying to write a program to display PCM data. I've been very frustrated trying to find a library with the right level of abstraction, but I've found the python wave library and have been using that. However, I'm not sure how to interpret the data.
The wave.getparams function returns (2 channels, 2 bytes, 44100 Hz, 96333 frames, No compression, No compression). This all seems cheery, but then I tried printing a single frame:'\xc0\xff\xd0\xff' which is 4 bytes. I suppose it's possible that a frame is 2 samples, but the ambiguities do not end there.
96333 frames * 2 samples/frame * (1/44.1k sec/sample) = 4.3688 seconds
However, iTunes reports the time as closer to 2 seconds and calculations based on file size and bitrate are in the ballpark of 2.7 seconds. What's going on here?
Additionally, how am I to know if the bytes are signed or unsigned?
Many thanks!
Thank you for your help! I got it working and I'll post the solution here for everyone to use in case some other poor soul needs it:
import wave
import struct
def pcm_channels(wave_file):
"""Given a file-like object or file path representing a wave file,
decompose it into its constituent PCM data streams.
Input: A file like object or file path
Output: A list of lists of integers representing the PCM coded data stream channels
and the sample rate of the channels (mixed rate channels not supported)
"""
stream = wave.open(wave_file,"rb")
num_channels = stream.getnchannels()
sample_rate = stream.getframerate()
sample_width = stream.getsampwidth()
num_frames = stream.getnframes()
raw_data = stream.readframes( num_frames ) # Returns byte data
stream.close()
total_samples = num_frames * num_channels
if sample_width == 1:
fmt = "%iB" % total_samples # read unsigned chars
elif sample_width == 2:
fmt = "%ih" % total_samples # read signed 2 byte shorts
else:
raise ValueError("Only supports 8 and 16 bit audio formats.")
integer_data = struct.unpack(fmt, raw_data)
del raw_data # Keep memory tidy (who knows how big it might be)
channels = [ [] for time in range(num_channels) ]
for index, value in enumerate(integer_data):
bucket = index % num_channels
channels[bucket].append(value)
return channels, sample_rate
"Two channels" means stereo, so it makes no sense to sum each channel's duration -- so you're off by a factor of two (2.18 seconds, not 4.37). As for signedness, as explained for example here, and I quote:
8-bit samples are stored as unsigned
bytes, ranging from 0 to 255. 16-bit
samples are stored as 2's-complement
signed integers, ranging from -32768
to 32767.
This is part of the specs of the WAV format (actually of its superset RIFF) and thus not dependent on what library you're using to deal with a WAV file.
I know that an answer has already been accepted, but I did some things with audio a while ago and you have to unpack the wave doing something like this.
pcmdata = wave.struct.unpack("%dh"%(wavedatalength),wavedata)
Also, one package that I used was called PyAudio, though I still had to use the wave package with it.
Each sample is 16 bits and there 2 channels, so the frame takes 4 bytes
The duration is simply the number of frames divided by the number of frames per second. From your data this is: 96333 / 44100 = 2.18 seconds.
Building upon this answer, you can get a good performance boost by using numpy.fromstring or numpy.fromfile. Also see this answer.
Here is what I did:
def interpret_wav(raw_bytes, n_frames, n_channels, sample_width, interleaved = True):
if sample_width == 1:
dtype = np.uint8 # unsigned char
elif sample_width == 2:
dtype = np.int16 # signed 2-byte short
else:
raise ValueError("Only supports 8 and 16 bit audio formats.")
channels = np.fromstring(raw_bytes, dtype=dtype)
if interleaved:
# channels are interleaved, i.e. sample N of channel M follows sample N of channel M-1 in raw data
channels.shape = (n_frames, n_channels)
channels = channels.T
else:
# channels are not interleaved. All samples from channel M occur before all samples from channel M-1
channels.shape = (n_channels, n_frames)
return channels
Assigning a new value to shape will throw an error if it requires data to be copied in memory. This is a good thing, since you want to use the data in place (using less time and memory overall). The ndarray.T function also does not copy (i.e. returns a view) if possible, but I'm not sure how you ensure that it does not copy.
Reading directly from the file with np.fromfile will be even better, but you would have to skip the header using a custom dtype. I haven't tried this yet.