Reading an Ogg Opus header to check the crc - python

I decided to experiment with file formats and I'm using python to read said files.
Everything I have extracted from the Ogg header is correct, except the crc check.
The documentation says you must check the entire header and page with the original crc check value set to 0.
I'm wondering what steps I'm missing to get the expected result.
import zlib
import struct
with open("sample3.opus", "rb") as f_:
file_data = f_.read()
cp, ssv, htf, agp, ssn, psn, pc, ps = struct.unpack_from("<4sBBQIIIB", file_data, 0)
offset = struct.calcsize("<4sBBQIIIB")
segments = struct.unpack_from(f"<{ps}B", file_data, offset)
packet_size = 0
for num in segments:
packet_size += num
header_size = offset + len(segments) + packet_size
# Copying the entire packet then changing the crc to 0.
header_copy = bytearray()
header_copy.extend(file_data[0:header_size])
struct.pack_into("<I", header_copy, struct.calcsize("<4sBBQII"), 0)
print(pc)
print(zlib.crc32(header_copy))
This script results in:
277013243
752049619
The audio file I'm using:
https://filesamples.com/formats/opus

zlib.crc32() is not the CRC that they specify. They say the initial value and final exclusive-or is zero, whereas for zlib.crc32(), those values are both 0xffffffff. They fail to specify whether their CRC is reflected or not, so you'd need to try both to see which it is.
Update:
I checked, and it's a forward CRC. Unfortunately, you can't use zlib.crc32() to calculate it. You can compute it with this:
def crc32ogg(seq):
crc = 0
for b in seq:
crc ^= b << 24
for _ in range(8):
crc = (crc << 1) ^ 0x104c11db7 if crc & 0x80000000 else crc << 1
return crc

Related

Implementing a CRC algorithm

I'm trying to implement a CRC algorithm as defined in some video interface standards:
SMPTE296M-2001
BT.1120-9:2017
The raw data is 10 bit words that are squashed into 8 bit bytes which I have no issues extracting and working with in numpy.
the CRC has polynomial:
CRC(X) = X^18 + X^5 + X^4 + 1
I believe this gives me the constant:
POLY = 0x40031
I've tried a few different implementations and nothing I generate matches my sample data.
this implementation was inspired by this
MASK = 0x3FFFF
class MYCRC:
crc_table = []
def __init__(self):
if not self.crc_table:
for i in range(1024):
k = i
for j in range(10):
if k & 1:
k ^= POLY
k >>= 1
self.crc_table.append(k)
def calc(self, crc, data):
crc ^= MASK
for d in data:
crc = (crc >> 10) ^ self.crc_table[(crc & 0x3FF) ^ d]
return crc ^ MASK
then there is this implementation I pulled from somewhere (not sure where)
def crc_calc(crc, p):
crc = MASK & ~crc
for i in range(len(p)):
crc = (crc ^ p[i]) # & BIG_MASK
for j in range(10):
crc = ((crc >> 1) ^ (POLY & -(crc & 1))) # & BIG_MASK
return MASK & ~crc
I also looked at using this library which has support for using custom polynomials, but it appears to be built to work with 8 bit data, not the 10 bit data I have.
I'm not sure how best to share test data as I only have whole frames which if exported as a numpy file is ~5MB.
I'm also unclear as the the range of data I'm supposed to feed to the CRC calculation. I think from reading it, it should be from the first active sample on one line, up to the line count of the line after, then the checksum calculated over that range. This makes the most sense from a hardware perspective, but the standard doesn't read that clearly to me.
edit:
pastebin of 10 lines worth of test data, this includes the embedded checksum.
within a line of data, samples 0-7 are the EAV marker, 8-11 are the line number,12-16 are the two checksums. the data is two interleaved streams of video data (luma channel and CbCr channel).
the standards state the checksums are run from the first active sample to the end of the line data, which I interpret to mean that it runs from sample 740 of one line to sample 11 of the next line.
As per section 5 of SMPTE292M the data is 10 bit data which cannot go below 0x3 or above 0x3FC. as per table 4 the result of the CRC should be 18 bits which get split and embedded into the stream as two words (with one bit filled in with the not of another bit) Note that there is one checksum for each channel of data, these two checksums are at 12-16 on each line
edit 2
some longer test data that straddles the jump from blanking data to active frame data
The CRC calculation must be done reflected. (Clue in note on Table 9: "NOTE – CRC0 is the MSB of error detection codes.")
This C routine checks the CRCs in your example correctly:
// Update the CRC-18 crc with the low ten bits of word.
// Polynomial = 1000000000000110001
// Reflected (dropping x^18) = 10 0011 0000 0000 0000 = 0x23000
unsigned crc18(unsigned crc, unsigned word) {
crc ^= word & 0x3ff;
for (int k = 0; k < 10; k++)
crc = crc & 1 ? (crc >> 1) ^ 0x23000 : crc >> 1;
return crc;
}
Indeed the span of the check is from the start of the active line through the line numbers, up to just before the two CRCs in the stream. That calculation matches those two CRCs. Each CRC is calculated on alternating words from the stream. The CRCs are initialized to zero.

What is a python "cksum" equivalent for very large files and how does it work?

I have a problem that i need to validate huge compressed files after download (usually more than 10-20gb per file) against reference checksums that have apparently been generated using cksum
(To be more precise: My python script needs to download large compressed files from the ncbi ftp-server that was supposed to provide md5 checksums for validating the downloads, but instead only provided some different unspecified filehash/checksum values. After some trial and error I found that these checksums were identical to the output of the unix tool cksum, which apparently genereates CRC-checksums. So to compare/validate these i need to generate cksum-equivalent checksums for the downloaded files.)
It appears that the unix tool cksum yields totally different checksum values than the supposed equivalent unix tool crc32 (or the python zlib.crc32() function, for that matter). When googling the problem I could not understand the explanations for why this occurs, especially since they appear to be identical on some systems? So maybe this is because I work on a 64 bit system (but then: who doesn't nowadays)?
using built-in python modules I can easily generate md5- and CRC32 checksums, but none of these are equivalent to the cksum output, neither in decimal nor in hexadecimal representation.
I did find a previous post here on stackoverflow pointing to a snippet that seems to solve this. But while it works for small files, A.) I do not understand a word of it, so I have a hard time adapting it and B.) it does not seem to work well with large files.
for completeness sake: here is the snippet (python3 version):
#!/usr/bin/env python
import sys
crctab = [ 0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9, 0x130476dc,
0x17c56b6b, 0x1a864db2, 0x1e475005, 0x2608edb8, 0x22c9f00f,
0x2f8ad6d6, 0x2b4bcb61, 0x350c9b64, 0x31cd86d3, 0x3c8ea00a,
0x384fbdbd, 0x4c11db70, 0x48d0c6c7, 0x4593e01e, 0x4152fda9,
0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75, 0x6a1936c8,
0x6ed82b7f, 0x639b0da6, 0x675a1011, 0x791d4014, 0x7ddc5da3,
0x709f7b7a, 0x745e66cd, 0x9823b6e0, 0x9ce2ab57, 0x91a18d8e,
0x95609039, 0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5,
0xbe2b5b58, 0xbaea46ef, 0xb7a96036, 0xb3687d81, 0xad2f2d84,
0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d, 0xd4326d90, 0xd0f37027,
0xddb056fe, 0xd9714b49, 0xc7361b4c, 0xc3f706fb, 0xceb42022,
0xca753d95, 0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1,
0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d, 0x34867077,
0x30476dc0, 0x3d044b19, 0x39c556ae, 0x278206ab, 0x23431b1c,
0x2e003dc5, 0x2ac12072, 0x128e9dcf, 0x164f8078, 0x1b0ca6a1,
0x1fcdbb16, 0x018aeb13, 0x054bf6a4, 0x0808d07d, 0x0cc9cdca,
0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde, 0x6b93dddb,
0x6f52c06c, 0x6211e6b5, 0x66d0fb02, 0x5e9f46bf, 0x5a5e5b08,
0x571d7dd1, 0x53dc6066, 0x4d9b3063, 0x495a2dd4, 0x44190b0d,
0x40d816ba, 0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e,
0xbfa1b04b, 0xbb60adfc, 0xb6238b25, 0xb2e29692, 0x8aad2b2f,
0x8e6c3698, 0x832f1041, 0x87ee0df6, 0x99a95df3, 0x9d684044,
0x902b669d, 0x94ea7b2a, 0xe0b41de7, 0xe4750050, 0xe9362689,
0xedf73b3e, 0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2,
0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686, 0xd5b88683,
0xd1799b34, 0xdc3abded, 0xd8fba05a, 0x690ce0ee, 0x6dcdfd59,
0x608edb80, 0x644fc637, 0x7a089632, 0x7ec98b85, 0x738aad5c,
0x774bb0eb, 0x4f040d56, 0x4bc510e1, 0x46863638, 0x42472b8f,
0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53, 0x251d3b9e,
0x21dc2629, 0x2c9f00f0, 0x285e1d47, 0x36194d42, 0x32d850f5,
0x3f9b762c, 0x3b5a6b9b, 0x0315d626, 0x07d4cb91, 0x0a97ed48,
0x0e56f0ff, 0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623,
0xf12f560e, 0xf5ee4bb9, 0xf8ad6d60, 0xfc6c70d7, 0xe22b20d2,
0xe6ea3d65, 0xeba91bbc, 0xef68060b, 0xd727bbb6, 0xd3e6a601,
0xdea580d8, 0xda649d6f, 0xc423cd6a, 0xc0e2d0dd, 0xcda1f604,
0xc960ebb3, 0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7,
0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b, 0x9b3660c6,
0x9ff77d71, 0x92b45ba8, 0x9675461f, 0x8832161a, 0x8cf30bad,
0x81b02d74, 0x857130c3, 0x5d8a9099, 0x594b8d2e, 0x5408abf7,
0x50c9b640, 0x4e8ee645, 0x4a4ffbf2, 0x470cdd2b, 0x43cdc09c,
0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8, 0x68860bfd,
0x6c47164a, 0x61043093, 0x65c52d24, 0x119b4be9, 0x155a565e,
0x18197087, 0x1cd86d30, 0x029f3d35, 0x065e2082, 0x0b1d065b,
0x0fdc1bec, 0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088,
0x2497d08d, 0x2056cd3a, 0x2d15ebe3, 0x29d4f654, 0xc5a92679,
0xc1683bce, 0xcc2b1d17, 0xc8ea00a0, 0xd6ad50a5, 0xd26c4d12,
0xdf2f6bcb, 0xdbee767c, 0xe3a1cbc1, 0xe760d676, 0xea23f0af,
0xeee2ed18, 0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4,
0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0, 0x9abc8bd5,
0x9e7d9662, 0x933eb0bb, 0x97ffad0c, 0xafb010b1, 0xab710d06,
0xa6322bdf, 0xa2f33668, 0xbcb4666d, 0xb8757bda, 0xb5365d03,
0xb1f740b4 ]
UNSIGNED = lambda n: n & 0xffffffff
def memcrc(b):
n = len(b)
i = c = s = 0
for c in b:
tabidx = (s>>24)^c
s = UNSIGNED((s << 8)) ^ crctab[tabidx]
while n:
c = n & 0o0377
n = n >> 8
s = UNSIGNED(s << 8) ^ crctab[(s >> 24) ^ c]
return UNSIGNED(~s)
if __name__ == '__main__':
fname = sys.argv[-1]
buffer = open(fname, 'rb').read()
print("%d\t%d\t%s" % (memcrc(buffer), len(buffer), fname))
Could someone please help me understand this?
what exactly is the problem with the difference between cksum and crc32?
is it simply the fact that the one is 32bit based and the other 64 bit?
Can i simply convert between the values produced by both, and if yes how?
what is the purpose of the crctab in the above snippet and how does the conversion work there?
I don't know the why part of your question. All I can say is that the great thing about standards is that you have so many to choose from.
cksum is specified by POSIX to use a different CRC than the more common CRC-32 you find in zlib, Python, used in zip and gzip files, etc. The CRC-32/CKSUM has this specification (from Greg Cook's CRC catalog):
width=32 poly=0x04c11db7 init=0x00000000 refin=false refout=false xorout=0xffffffff check=0x765e7680 residue=0xc704dd7b name="CRC-32/CKSUM"
The more common CRC-32 has this specification:
width=32 poly=0x04c11db7 init=0xffffffff refin=true refout=true xorout=0xffffffff check=0xcbf43926 residue=0xdebb20e3 name="CRC-32/ISO-HDLC"
The cksum utility on my system (macOS) computes the CRC-32/CKSUM, but it also has options to compute the CRC-32/ISO_HDLC, as well as two other actual checksums, the first from the BSD Unix sum command, and the second from the AT&T System V Unix sum command.
There is apparently no shortage of results that cksum might produce.
No, it has nothing to do with 32 vs. 64 bit systems.
No, you cannot convert between the values.
The purpose of the table is to speed up the CRC calculation by precomputing the CRC of every byte value.
I've tried refactoring your code into a class following a similar api to the standard Python hashlib:
class crc32:
def __init__(self):
self.nchars = 0
self.crc = 0
def update(self, buf):
crc = self.crc
for c in buf:
crc = crctab[(crc >> 24) ^ c] ^ ((crc << 8) & 0xffffffff)
self.crc = crc
self.nchars += len(buf)
def digest(self):
crc = self.crc
n = self.nchars
while n:
c = n & 0xff
crc = crctab[(crc >> 24) ^ c] ^ ((crc << 8) & 0xffffffff)
n >>= 8
return UNSIGNED(~crc)
I've expanded out UNSIGNED to try and make it faster and reordered some of the statements to be more similar to the standard zlib library (as used by Python) while I was trying to understand the differences. It seems they use a different polynomial to generate the table, but otherwise it's the same.
The above code can be used as:
with open('largefile', 'rb') as fd:
digest = crc32()
while buf := fd.read(4096):
digest.update(buf)
print(digest.digest())
which prints out the expected 1135714720 for a file created by:
echo -n hello world > test.txt
The above code should work for large files, but given the performance of Python this would take far too long to be useful. A 75MB file I have takes ~11 seconds, while cksum takes just ~0.2 seconds. You should be able to get somewhere with using Cython to speed it up, but that's a bit more fiddly and if your're struggling with the existing code it's going to be quite a learning curve!
I've had another play and got performance similar to cksum with Cython, the code looks like:
cdef unsigned int *crctab_c = [
// copy/paste crctab from above
]
cdef class crc32_c:
cdef unsigned int crc, nchars
def __init__(self):
self.nchars = 0
self.crc = 0
cdef _update(self, bytes buf):
cdef unsigned int crc, i, j
cdef unsigned char c
crc = self.crc
for c in buf:
i = (crc >> 24) ^ c
j = crc << 8
crc = crctab_c[i] ^ j
self.crc = crc
self.nchars += len(buf)
def update(self, buf):
return self._update(buf)
def digest(self):
crc = self.crc
n = self.nchars
while n:
c = n & 0xff
crc = crctab_c[(crc >> 24) ^ c] ^ ((crc << 8) & 0xffffffff)
n >>= 8
return (~crc) & 0xffffffff
after compiling this code with Cython, it can be used in a similar manner to the previous class. Performance is pretty good: Python now takes ~200ms for a 75MiB file and is basically the same as cksum, but much slower than zlib which only takes ~80ms.

16 bit artimatic sum in Python

I am working on a program that is creating IRIG106 Chapter 10 data for a cube-sat project. Currently it is being implemented in python and I am having difficulty implementing the final component of the Chapter 10 header.
The way I have implemented it I am currently finding checksum values that are larger than what will fit inside of an integer of the size defined by the specification (2 bytes).
The standard defines the header checksum in section 10.6.1.1 paragraph "J" of the IRIG 106-09 standard. It is defined as the following:
J Header Checksum. (2 Bytes) contains a value representing a 16-bit arithmetic sum of all 16-bit words in the header excluding the Header Checksum Word.
There is also a programming manual provided that has example C code that shows the following (from page A-2-17):
uint16_t I106_CALL_DECL uCalcHeaderChecksum(SuI106Ch10Header * psuHeader)
{
int iHdrIdx;
uint16_t uHdrSum;
uint16_t * aHdr = (uint16_t *)psuHeader;
uHdrSum = 0;
for (iHdrIdx=0; iHdrIdx<(HEADER_SIZE-2)/2; iHdrIdx++)
uHdrSum += aHdr[iHdrIdx];
return uHdrSum;
}
I have implemented the following in Python using the BitString library:
def calculate_checksum(byte_data: BitArray = None, header_length_bytes: int = 24, chunk_length: int = 16):
# Set the checksum to zero:
checksum = 0
# Loop through the Chapter 10 header and compute the 16 bit arithmetic sum:
for bit_location in range(0, (header_length_bytes-2), chunk_length):
# Get the range of bits to select:
bit_range = slice(bit_location, (bit_location + chunk_length))
# Get the uint representation of the bit data found:
checksum += Bits(bin=byte_data.bin[bit_range]).uint
# Write the computed checksum as binary data to the start location of the checksum in the header:
byte_data.overwrite(Bits(uint=checksum, length=chunk_length), (header_length_bytes-2*8))
Any thoughts or insights you could provide would be extremely appreciated. I know it should be a simple solution but I am just not able to see it.
--- Update 2 ---
I tried doing both roll over and truncation and they both produced the same result:
test_value = 2**16
test_value1 = test_value + 500
test_value2 = test_value1 % (2**16) -> 500
test_value3 = test_value1 & 0xFFFF -> 500
--- Update 3 ---
When I compare the execution of the python and C checksum functions I have run into the following using these values as an input per the spec:
Sync = "EB25" (2 bytes)
ChannelID = 1 (2 bytes)
PacketLen = 1024 (4 bytes)
When I compare the outputs at each step I see the following:
C:
Header0: EB25
index = 0 16bit chunk = 60197 checksum = 60197
Header1: 0001
index = 1 16bit chunk = 1 checksum = 60198
Header2: 0400
index = 2 16bit chunk = 1024 checksum = 61222
Header3: 0000
index = 3 16bit chunk = 0 checksum = 61222
Python:
eb25
index: 0 chunk: 60197 checksum: 60197
0001
index: 1 chunk: 1 checksum: 60198
0000
index: 2 chunk: 0 checksum: 60198
0400
index: 3 chunk: 1024 checksum: 61222
So, I know this question is really old, but I had the same issue. The endianess of the packet matters. In a .ch10 file, the packet start is 0x25EB because each section is little endian. Here's how I'm doing everything right now in C-ish code.
// read until the end of the file
while (!atEndOfFile)
{
// verify that we get teh sync packet
if ( readNextByte == 0x25 )
{
if ( readNextByte == 0xEB )
{
// store the sync packet
byte packetHeader[24];
packetHeader[0] = 0x25;
packetHeader[1] = 0xEB
// grab the rest of the header
for ( int i = 2; i < 24; j++ )
{
packetHeader[j] = readNextByte;
}
// grab the check sum from the packet
uint16 actualCheckSum = (packetHeader[23] << 8) |
packetHeader[22];
// calculate the checkSum
uint16 calculatedCheckSum = 0;
for ( int i = 0; i < 22; i++ )
{
calculatedCheckSum += (packetHeader[i + 1] << 8) |
packetHeader[i];
}
// verify the calculation
if ( calculatedCheckSum == actualChecksum )
{
printLine( "We calculated the checksum!");
printLine( "actual checksum: " + actualCheckSum +
"calculated checksum" + calculatedCheckSum );
}
}
}
}
I haven't done enough digging into the irig106 library, but I believe that it handles the translation when it reads in a .ch10 file.

CRC in python, little Endian

I need to calc CRC checksumme of binary file.
This file content CRC too and by comparing I find out when file was corrupted.
Bin file is something like long hex string
00200020 595A0008 ......
But CRC in file was calculated per integer(4.byte little Endian) like this
1.int - 0x20002000
2.int - 0x8000A559
How can I get the same result without switching bytes in python?
I was trying http://www.tty1.net/pycrc/ and played with reflect in, but I dont get the same result.
For this two bytes is correct crc 0xEF2B32F8
Try using the struct module. You can open a file and use the unpack read the data in any format you want with any Endianess.
I have written the following code for calculating crc8:
acklist = [] # a list of your byte string data
x = 0xff
crc = 0
for i in range(len(acklist)):
crc += int(acklist[i], 16) & x
print(crc)
crc = ~crc
crc += 1
crc1 = crc >> 8 & x
crc2 = crc & x

How to set parameters in Python zlib module

I want to write a Python program that makes PNG files. My big problem is with generating the CRC and the data in the IDAT chunk. Python 2.6.4 does have a zlib module, but there are extra settings needed. The PNG specification REQUIRES the IDAT data to be compressed with zlib's deflate method with a window size of 32768 bytes, but I can't find how to set those parameters in the Python zlib module.
As for the CRC for each chunk, the zlib module documentation indicates that it contains a CRC function. I believe that calling that CRC function as crc32(data,-1) will generate the CRC that I need, though if necessary I can translate the C code given in the PNG specification.
Note that I can generate the rest of the PNG file and the data that is to be compressed for the IDAT chunk, I just don't know how to properly compress the image data for the IDAT chunk after implementing the initial filtering step.
EDITED:
The problem with PyPNG is that it will not write tEXt chunks. A minor annoyance is that one has to manipulate the image as (R, G, B) data; I'd prefer to manipulate palette values of the pixels directly and then define the associations between palette values and color data. I'm also left unsure if PyPNG takes advantage of the "compression" allowed by using 1-, 2-, and 4- bit palette values in the image data to fit more than one pixel in a byte.
Even if you can't use PyPNG for the tEXt chunk reason, you can use its code! (it's MIT licensed). Here's how a chunk is written:
def write_chunk(outfile, tag, data=''):
"""
Write a PNG chunk to the output file, including length and
checksum.
"""
# http://www.w3.org/TR/PNG/#5Chunk-layout
outfile.write(struct.pack("!I", len(data)))
outfile.write(tag)
outfile.write(data)
checksum = zlib.crc32(tag)
checksum = zlib.crc32(data, checksum)
outfile.write(struct.pack("!i", checksum))
Note the use of zlib.crc32 to create the CRC checksum, and also note how the checksum runs over both the tag and the data.
For compressing the IDAT chunks you basically just use zlib. As others have noted the adler checksum and the default window size are all okay (by the way the PNG spec does not require a window size of 32768, it requires that the window is at most 32768 bytes; this is all a bit odd, because in any case 32768 is the maximum window size permitted by the current version of the zlib spec).
The code to do this in PyPNG is not particular great, see the write_passes() function. The bit that actually compresses the data and writes a chunk is this:
compressor = zlib.compressobj()
compressed = compressor.compress(tostring(data))
if len(compressed):
# print >> sys.stderr, len(data), len(compressed)
write_chunk(outfile, 'IDAT', compressed)
PyPNG never uses scanline filtering. Partly this is because it would be very slow in Python, partly because I haven't written the code. If you have Python code to do filtering, it would be a most welcome contribution to PyPNG. :)
Short answer: (1) "deflate" and "32Kb window" are the defaults (2) uses adler32 not crc32
Long answer:
""" The PNG specification REQUIRES the IDAT data to be compressed with zlib's deflate method with a window size of 32768 bytes, but I can't find how to set those parameters in the Python zlib module. """
You don't need to set them. Those are the defaults.
If you really want to specify non-default arguments to zlib, you can use zlib.compressobj() ... it has several args that are not documented in the Python docs. Reading material:
source: Python's gzip.py (see how it calls zlib.compressobj)
source: Python's zlibmodule.c (see its defaults)
SO: This question (see answers by MizardX and myself, and comments on each)
docs: The manual on the zlib site
"""As for the CRC for each chunk, the zlib module documentation indicates that it contains a CRC function. I believe that calling that CRC function as crc32(data,-1) will generate the CRC that I need, though if necessary I can translate the C code given in the PNG specification."""
Please check out the zlib specification aka RFC 1950 ... it says that the checksum used is adler32
The zlib compress or compressobj output will include the appropriate CRC; why do you think that you will need to do it yourself?
Edit So you do need a CRC-32. Good news: zlib.crc32() will do the job:
Code:
import zlib
crc_table = None
def make_crc_table():
global crc_table
crc_table = [0] * 256
for n in xrange(256):
c = n
for k in xrange(8):
if c & 1:
c = 0xedb88320L ^ (c >> 1)
else:
c = c >> 1
crc_table[n] = c
make_crc_table()
"""
/* Update a running CRC with the bytes buf[0..len-1]--the CRC
should be initialized to all 1's, and the transmitted value
is the 1's complement of the final running CRC (see the
crc() routine below)). */
"""
def update_crc(crc, buf):
c = crc
for byte in buf:
c = crc_table[int((c ^ ord(byte)) & 0xff)] ^ (c >> 8)
return c
# /* Return the CRC of the bytes buf[0..len-1]. */
def crc(buf):
return update_crc(0xffffffffL, buf) ^ 0xffffffffL
if __name__ == "__main__":
tests = [
"",
"\x00",
"\x01",
"Twas brillig and the slithy toves did gyre and gimble in the wabe",
]
for test in tests:
model = crc(test) & 0xFFFFFFFFL
zlib_result = zlib.crc32(test) & 0xFFFFFFFFL
print (model, zlib_result, model == zlib_result)
Output from Python 2.7 is below. Also tested with Python 2.1 to 2.6 inclusive and 1.5.2 JFTHOI.
(0L, 0L, True)
(3523407757L, 3523407757L, True)
(2768625435L, 2768625435L, True)
(4186783197L, 4186783197L, True)
Don't you want to use some existing software to generate your PNGs? How about PyPNG?
There are libraries that can write PNG files for you, such as PIL. That will be easier and faster, and as an added bonus you can read and write tons of formats.
It looks like you will have to resort to call zlib "by hand" using ctypes --
It is not that hard:
>>> import ctypes
>>> z = ctypes.cdll.LoadLibrary("libzip.so.1")
>>> z.zlibVersion.restype=ctypes.c_char_p
>>> z.zlibVersion()
'1.2.3'
You can check the zlib library docmentation here: http://zlib.net/manual.html
The zlib.crc32 works fine, and the zlib compressor has correct defaults for png generation.
For the casual reader who looks for png generation from Python code, here is a complete example that you can use as a starter for your own png generator code - all you need is the standard zlib module and some bytes-encoding:
#! /usr/bin/python
""" Converts a list of list into gray-scale PNG image. """
__copyright__ = "Copyright (C) 2014 Guido Draheim"
__licence__ = "Public Domain"
import zlib
import struct
def makeGrayPNG(data, height = None, width = None):
def I1(value):
return struct.pack("!B", value & (2**8-1))
def I4(value):
return struct.pack("!I", value & (2**32-1))
# compute width&height from data if not explicit
if height is None:
height = len(data) # rows
if width is None:
width = 0
for row in data:
if width < len(row):
width = len(row)
# generate these chunks depending on image type
makeIHDR = True
makeIDAT = True
makeIEND = True
png = b"\x89" + "PNG\r\n\x1A\n".encode('ascii')
if makeIHDR:
colortype = 0 # true gray image (no palette)
bitdepth = 8 # with one byte per pixel (0..255)
compression = 0 # zlib (no choice here)
filtertype = 0 # adaptive (each scanline seperately)
interlaced = 0 # no
IHDR = I4(width) + I4(height) + I1(bitdepth)
IHDR += I1(colortype) + I1(compression)
IHDR += I1(filtertype) + I1(interlaced)
block = "IHDR".encode('ascii') + IHDR
png += I4(len(IHDR)) + block + I4(zlib.crc32(block))
if makeIDAT:
raw = b""
for y in xrange(height):
raw += b"\0" # no filter for this scanline
for x in xrange(width):
c = b"\0" # default black pixel
if y < len(data) and x < len(data[y]):
c = I1(data[y][x])
raw += c
compressor = zlib.compressobj()
compressed = compressor.compress(raw)
compressed += compressor.flush() #!!
block = "IDAT".encode('ascii') + compressed
png += I4(len(compressed)) + block + I4(zlib.crc32(block))
if makeIEND:
block = "IEND".encode('ascii')
png += I4(0) + block + I4(zlib.crc32(block))
return png
def _example():
with open("cross3x3.png","wb") as f:
f.write(makeGrayPNG([[0,255,0],[255,255,255],[0,255,0]]))

Categories

Resources