Implementing a CRC algorithm - python
I'm trying to implement a CRC algorithm as defined in some video interface standards:
SMPTE296M-2001
BT.1120-9:2017
The raw data is 10-bit words packed into 8-bit bytes, which I have no trouble extracting and working with in numpy.
The CRC has the polynomial:
CRC(X) = X^18 + X^5 + X^4 + 1
I believe this gives me the constant:
POLY = 0x40031
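As a quick sanity check on that constant, here is a minimal sketch that builds the value from the exponents of the polynomial. The names `EXPONENTS` and `POLY_FROM_EXPONENTS` are only illustrative and do not come from either standard.

```python
# Exponents of CRC(X) = X^18 + X^5 + X^4 + 1 (the final "1" is X^0).
EXPONENTS = (18, 5, 4, 0)

# Set one bit per exponent to obtain the polynomial as an integer.
POLY_FROM_EXPONENTS = sum(1 << e for e in EXPONENTS)

print(hex(POLY_FROM_EXPONENTS))  # 0x40031
```

This only confirms the bit pattern of the polynomial; it says nothing about bit ordering or initial values, which a CRC implementation also has to get right.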
I've tried a few different implementations, and nothing I generate matches my sample data.
This implementation was inspired by this:
MASK = 0x3FFFF
class MYCRC:
    crc_table = []
    def __init__(self):
        if not self.crc_table:
            for i in range(1024):
                k = i
                for j in range(10):
                    if k & 1:
                        k ^= POLY
                    k >>= 1
                self.crc_table.append(k)
    def calc(self, crc, data):
        crc ^= MASK
        for d in data:
            crc = (crc >> 10) ^ self.crc_table[(crc & 0x3FF) ^ d]
        return crc ^ MASK
Then there is this implementation, which I pulled from somewhere (not sure where):
def crc_calc(crc, p):
    crc = MASK & ~crc
    for i in range(len(p)):
        crc = (crc ^ p[i])  # & BIG_MASK
        for j in range(10):
            crc = ((crc >> 1) ^ (POLY & -(crc & 1)))  # & BIG_MASK
    return MASK & ~crc
I also looked at using this library which has support for using custom polynomials, but it appears to be built to work with 8 bit data, not the 10 bit data I have.
I'm not sure how best to share test data as I only have whole frames which if exported as a numpy file is ~5MB.
I'm also unclear as to the range of data I'm supposed to feed to the CRC calculation. From my reading, it should run from the first active sample on one line up to the line count of the following line, with the checksum calculated over that range. This makes the most sense from a hardware perspective, but the standard doesn't read that clearly to me.
edit:
Pastebin of 10 lines' worth of test data; this includes the embedded checksum.
Within a line of data, samples 0-7 are the EAV marker, 8-11 are the line number, and 12-16 are the two checksums. The data is two interleaved streams of video data (luma channel and CbCr channel).
The standards state that the checksums run from the first active sample to the end of the line data, which I interpret to mean from sample 740 of one line to sample 11 of the next line.
As per section 5 of SMPTE292M, the data is 10-bit data which cannot go below 0x3 or above 0x3FC. As per table 4, the result of the CRC should be 18 bits, which get split and embedded into the stream as two words (with one bit filled in with the NOT of another bit). Note that there is one checksum for each channel of data; these two checksums are at 12-16 on each line.
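To make the split concrete, here is a rough sketch of one way the 18-bit result could map onto two 10-bit words. The layout is an assumption on my part, not something quoted from table 4: CRC bits 0-8 sit in bits 0-8 of the first word, CRC bits 9-17 sit in bits 0-8 of the second word, and bit 9 of each word is the NOT of its bit 8. The function names are hypothetical.

```python
def crc_to_words(crc: int) -> tuple:
    """Split an 18-bit CRC into two 10-bit words (ASSUMED layout, see above)."""
    words = []
    for chunk in (crc & 0x1FF, (crc >> 9) & 0x1FF):
        not_b8 = ((chunk >> 8) & 1) ^ 1      # complement of bit 8
        words.append(chunk | (not_b8 << 9))  # place it in bit 9
    return tuple(words)


def words_to_crc(word0: int, word1: int) -> int:
    """Recover the 18-bit CRC by dropping bit 9 of each word."""
    return (word0 & 0x1FF) | ((word1 & 0x1FF) << 9)


print([hex(w) for w in crc_to_words(0)])        # ['0x200', '0x200']
print([hex(w) for w in crc_to_words(0x3FFFF)])  # ['0x1ff', '0x1ff']
```

Under this assumed layout every embedded word lands between 0x100 and 0x2FF, which would keep the checksum words away from the disallowed values at either end of the 10-bit range.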
edit 2
Some longer test data that straddles the jump from blanking data to active frame data.
The CRC calculation must be done reflected. (Clue in note on Table 9: "NOTE – CRC0 is the MSB of error detection codes.")
This C routine checks the CRCs in your example correctly:
// Update the CRC-18 crc with the low ten bits of word.
// Polynomial = 1000000000000110001
// Reflected (dropping x^18) = 10 0011 0000 0000 0000 = 0x23000
unsigned crc18(unsigned crc, unsigned word) {
    crc ^= word & 0x3ff;
    for (int k = 0; k < 10; k++)
        crc = crc & 1 ? (crc >> 1) ^ 0x23000 : crc >> 1;
    return crc;
}
Indeed the span of the check is from the start of the active line through the line numbers, up to just before the two CRCs in the stream. That calculation matches those two CRCs. Each CRC is calculated on alternating words from the stream. The CRCs are initialized to zero.
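Since the question is about Python, here is a sketch of the same routine ported from the C above, plus a helper that runs two CRCs over alternating words. The helper name `line_crcs` and the even/odd naming are mine; which of the two interleaved streams is luma and which is CbCr depends on how the samples were extracted, so that mapping is left to the caller.

```python
POLY_REFLECTED = 0x23000  # x^18 + x^5 + x^4 + 1, reflected, x^18 term dropped


def crc18_update(crc: int, word: int) -> int:
    """Update the CRC-18 with the low ten bits of word (port of the C routine)."""
    # int() keeps the arithmetic in Python integers even for numpy scalars.
    crc ^= int(word) & 0x3FF
    for _ in range(10):
        crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
    return crc


def line_crcs(words) -> tuple:
    """Return (crc_even, crc_odd) for interleaved 10-bit words.

    Both CRCs start at zero. `words` should cover the span described above:
    from the first active sample through the line numbers, stopping just
    before the embedded CRC words.
    """
    crc_even = crc_odd = 0
    for index, word in enumerate(words):
        if index % 2 == 0:
            crc_even = crc18_update(crc_even, word)
        else:
            crc_odd = crc18_update(crc_odd, word)
    return crc_even, crc_odd


print(hex(crc18_update(0, 1)))  # 0x118
```

The bit-at-a-time loop is slow for whole frames; it is meant as a reference to compare against before building a 1024-entry table from `crc18_update(0, i)`.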
Related
Reading an Ogg Opus header to check the crc
I decided to experiment with file formats and I'm using python to read said files. Everything I have extracted from the Ogg header is correct, except the crc check. The documentation says you must check the entire header and page with the original crc check value set to 0. I'm wondering what steps I'm missing to get the expected result.
import zlib
import struct
with open("sample3.opus", "rb") as f_:
    file_data = f_.read()
cp, ssv, htf, agp, ssn, psn, pc, ps = struct.unpack_from("<4sBBQIIIB", file_data, 0)
offset = struct.calcsize("<4sBBQIIIB")
segments = struct.unpack_from(f"<{ps}B", file_data, offset)
packet_size = 0
for num in segments:
    packet_size += num
header_size = offset + len(segments) + packet_size
# Copying the entire packet then changing the crc to 0.
header_copy = bytearray()
header_copy.extend(file_data[0:header_size])
struct.pack_into("<I", header_copy, struct.calcsize("<4sBBQII"), 0)
print(pc)
print(zlib.crc32(header_copy))
This script results in:
277013243
752049619
The audio file I'm using: https://filesamples.com/formats/opus
zlib.crc32() is not the CRC that they specify. They say the initial value and final exclusive-or are zero, whereas for zlib.crc32(), those values are both 0xffffffff. They fail to specify whether their CRC is reflected or not, so you'd need to try both to see which it is.
Update: I checked, and it's a forward CRC. Unfortunately, you can't use zlib.crc32() to calculate it. You can compute it with this:
def crc32ogg(seq):
    crc = 0
    for b in seq:
        crc ^= b << 24
        for _ in range(8):
            crc = (crc << 1) ^ 0x104c11db7 if crc & 0x80000000 else crc << 1
    return crc
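Putting the question's header parsing together with that CRC, a page check might look like the sketch below. The struct layout and the checksum offset of 22 are taken from the question's code; `HEADER`, `CRC_OFFSET` and `ogg_page_crc_ok` are names I made up for illustration.

```python
import struct


def crc32ogg(seq) -> int:
    """Forward CRC-32 with zero initial value and zero final XOR."""
    crc = 0
    for b in seq:
        crc ^= b << 24
        for _ in range(8):
            crc = (crc << 1) ^ 0x104C11DB7 if crc & 0x80000000 else crc << 1
    return crc


HEADER = struct.Struct("<4sBBQIIIB")  # same layout the question unpacks
CRC_OFFSET = 22                       # struct.calcsize("<4sBBQII")


def ogg_page_crc_ok(data: bytes, offset: int = 0) -> bool:
    """Check the stored CRC of the page starting at `offset`."""
    fields = HEADER.unpack_from(data, offset)
    stored_crc, n_segments = fields[6], fields[7]
    table_start = offset + HEADER.size
    lacing = data[table_start:table_start + n_segments]
    page_end = table_start + n_segments + sum(lacing)
    page = bytearray(data[offset:page_end])
    struct.pack_into("<I", page, CRC_OFFSET, 0)  # zero the CRC field first
    return crc32ogg(page) == stored_crc
```

Calling `ogg_page_crc_ok(file_data)` on the bytes read in the question would check the first page; later pages start where the previous one ends.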
What is a python "cksum" equivalent for very large files and how does it work?
I need to validate huge compressed files after download (usually more than 10-20 GB per file) against reference checksums that have apparently been generated using cksum. (To be more precise: my Python script needs to download large compressed files from the NCBI FTP server, which was supposed to provide MD5 checksums for validating the downloads but instead only provided some different, unspecified file hash/checksum values. After some trial and error I found that these checksums were identical to the output of the Unix tool cksum, which apparently generates CRC checksums. So to compare/validate these I need to generate cksum-equivalent checksums for the downloaded files.)
It appears that the Unix tool cksum yields totally different checksum values than the supposedly equivalent Unix tool crc32 (or the Python zlib.crc32() function, for that matter). When googling the problem I could not understand the explanations for why this occurs, especially since they appear to be identical on some systems. So maybe this is because I work on a 64-bit system (but then: who doesn't nowadays)? Using built-in Python modules I can easily generate MD5 and CRC32 checksums, but none of these are equivalent to the cksum output, neither in decimal nor in hexadecimal representation.
I did find a previous post here on Stack Overflow pointing to a snippet that seems to solve this. But while it works for small files, A) I do not understand a word of it, so I have a hard time adapting it, and B) it does not seem to work well with large files.
for completeness sake: here is the snippet (python3 version): #!/usr/bin/env python import sys crctab = [ 0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9, 0x130476dc, 0x17c56b6b, 0x1a864db2, 0x1e475005, 0x2608edb8, 0x22c9f00f, 0x2f8ad6d6, 0x2b4bcb61, 0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd, 0x4c11db70, 0x48d0c6c7, 0x4593e01e, 0x4152fda9, 0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75, 0x6a1936c8, 0x6ed82b7f, 0x639b0da6, 0x675a1011, 0x791d4014, 0x7ddc5da3, 0x709f7b7a, 0x745e66cd, 0x9823b6e0, 0x9ce2ab57, 0x91a18d8e, 0x95609039, 0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5, 0xbe2b5b58, 0xbaea46ef, 0xb7a96036, 0xb3687d81, 0xad2f2d84, 0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d, 0xd4326d90, 0xd0f37027, 0xddb056fe, 0xd9714b49, 0xc7361b4c, 0xc3f706fb, 0xceb42022, 0xca753d95, 0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1, 0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d, 0x34867077, 0x30476dc0, 0x3d044b19, 0x39c556ae, 0x278206ab, 0x23431b1c, 0x2e003dc5, 0x2ac12072, 0x128e9dcf, 0x164f8078, 0x1b0ca6a1, 0x1fcdbb16, 0x018aeb13, 0x054bf6a4, 0x0808d07d, 0x0cc9cdca, 0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde, 0x6b93dddb, 0x6f52c06c, 0x6211e6b5, 0x66d0fb02, 0x5e9f46bf, 0x5a5e5b08, 0x571d7dd1, 0x53dc6066, 0x4d9b3063, 0x495a2dd4, 0x44190b0d, 0x40d816ba, 0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e, 0xbfa1b04b, 0xbb60adfc, 0xb6238b25, 0xb2e29692, 0x8aad2b2f, 0x8e6c3698, 0x832f1041, 0x87ee0df6, 0x99a95df3, 0x9d684044, 0x902b669d, 0x94ea7b2a, 0xe0b41de7, 0xe4750050, 0xe9362689, 0xedf73b3e, 0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2, 0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686, 0xd5b88683, 0xd1799b34, 0xdc3abded, 0xd8fba05a, 0x690ce0ee, 0x6dcdfd59, 0x608edb80, 0x644fc637, 0x7a089632, 0x7ec98b85, 0x738aad5c, 0x774bb0eb, 0x4f040d56, 0x4bc510e1, 0x46863638, 0x42472b8f, 0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53, 0x251d3b9e, 0x21dc2629, 0x2c9f00f0, 0x285e1d47, 0x36194d42, 0x32d850f5, 0x3f9b762c, 0x3b5a6b9b, 0x0315d626, 0x07d4cb91, 0x0a97ed48, 0x0e56f0ff, 0x1011a0fa, 
0x14d0bd4d, 0x19939b94, 0x1d528623, 0xf12f560e, 0xf5ee4bb9, 0xf8ad6d60, 0xfc6c70d7, 0xe22b20d2, 0xe6ea3d65, 0xeba91bbc, 0xef68060b, 0xd727bbb6, 0xd3e6a601, 0xdea580d8, 0xda649d6f, 0xc423cd6a, 0xc0e2d0dd, 0xcda1f604, 0xc960ebb3, 0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7, 0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b, 0x9b3660c6, 0x9ff77d71, 0x92b45ba8, 0x9675461f, 0x8832161a, 0x8cf30bad, 0x81b02d74, 0x857130c3, 0x5d8a9099, 0x594b8d2e, 0x5408abf7, 0x50c9b640, 0x4e8ee645, 0x4a4ffbf2, 0x470cdd2b, 0x43cdc09c, 0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8, 0x68860bfd, 0x6c47164a, 0x61043093, 0x65c52d24, 0x119b4be9, 0x155a565e, 0x18197087, 0x1cd86d30, 0x029f3d35, 0x065e2082, 0x0b1d065b, 0x0fdc1bec, 0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088, 0x2497d08d, 0x2056cd3a, 0x2d15ebe3, 0x29d4f654, 0xc5a92679, 0xc1683bce, 0xcc2b1d17, 0xc8ea00a0, 0xd6ad50a5, 0xd26c4d12, 0xdf2f6bcb, 0xdbee767c, 0xe3a1cbc1, 0xe760d676, 0xea23f0af, 0xeee2ed18, 0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4, 0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0, 0x9abc8bd5, 0x9e7d9662, 0x933eb0bb, 0x97ffad0c, 0xafb010b1, 0xab710d06, 0xa6322bdf, 0xa2f33668, 0xbcb4666d, 0xb8757bda, 0xb5365d03, 0xb1f740b4 ] UNSIGNED = lambda n: n & 0xffffffff def memcrc(b): n = len(b) i = c = s = 0 for c in b: tabidx = (s>>24)^c s = UNSIGNED((s << 8)) ^ crctab[tabidx] while n: c = n & 0o0377 n = n >> 8 s = UNSIGNED(s << 8) ^ crctab[(s >> 24) ^ c] return UNSIGNED(~s) if __name__ == '__main__': fname = sys.argv[-1] buffer = open(fname, 'rb').read() print("%d\t%d\t%s" % (memcrc(buffer), len(buffer), fname)) Could someone please help me understand this? what exactly is the problem with the difference between cksum and crc32? is it simply the fact that the one is 32bit based and the other 64 bit? Can i simply convert between the values produced by both, and if yes how? what is the purpose of the crctab in the above snippet and how does the conversion work there?
I don't know the why part of your question. All I can say is that the great thing about standards is that you have so many to choose from.
cksum is specified by POSIX to use a different CRC than the more common CRC-32 you find in zlib, Python, used in zip and gzip files, etc. The CRC-32/CKSUM has this specification (from Greg Cook's CRC catalog):
width=32 poly=0x04c11db7 init=0x00000000 refin=false refout=false xorout=0xffffffff check=0x765e7680 residue=0xc704dd7b name="CRC-32/CKSUM"
The more common CRC-32 has this specification:
width=32 poly=0x04c11db7 init=0xffffffff refin=true refout=true xorout=0xffffffff check=0xcbf43926 residue=0xdebb20e3 name="CRC-32/ISO-HDLC"
The cksum utility on my system (macOS) computes the CRC-32/CKSUM, but it also has options to compute the CRC-32/ISO-HDLC, as well as two other actual checksums, the first from the BSD Unix sum command, and the second from the AT&T System V Unix sum command. There is apparently no shortage of results that cksum might produce.
No, it has nothing to do with 32 vs. 64 bit systems. No, you cannot convert between the values. The purpose of the table is to speed up the CRC calculation by precomputing the CRC of every byte value.
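To show where the table comes from, here is a table-free sketch of the CRC-32/CKSUM parameters listed above. The function names are mine. Feeding the length bytes after the data, least significant byte first, mirrors what the `while n:` loop in the question's snippet does.

```python
POLY = 0x04C11DB7  # poly from the CRC-32/CKSUM catalog entry


def crc_update(crc: int, data: bytes) -> int:
    """Feed bytes, most significant bit first, into a 32-bit CRC register."""
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            crc = ((crc << 1) ^ POLY if crc & 0x80000000 else crc << 1) & 0xFFFFFFFF
    return crc


def make_table() -> list:
    """Each table entry is simply the CRC of one byte value."""
    return [crc_update(0, bytes([i])) for i in range(256)]


def posix_cksum(data: bytes) -> int:
    """CRC over the data, then over its length, then complemented."""
    crc = crc_update(0, data)
    n = len(data)
    length_bytes = bytearray()
    while n:
        length_bytes.append(n & 0xFF)
        n >>= 8
    crc = crc_update(crc, bytes(length_bytes))
    return crc ^ 0xFFFFFFFF


table = make_table()
print(hex(table[1]), hex(table[255]))  # 0x4c11db7 0xb1f740b4
```

The entries produced by `make_table()` should line up with the `crctab` list in the question, which is why a table lookup per byte can replace the inner eight-step loop.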
I've tried refactoring your code into a class following a similar API to the standard Python hashlib:
class crc32:
    def __init__(self):
        self.nchars = 0
        self.crc = 0
    def update(self, buf):
        crc = self.crc
        for c in buf:
            crc = crctab[(crc >> 24) ^ c] ^ ((crc << 8) & 0xffffffff)
        self.crc = crc
        self.nchars += len(buf)
    def digest(self):
        crc = self.crc
        n = self.nchars
        while n:
            c = n & 0xff
            crc = crctab[(crc >> 24) ^ c] ^ ((crc << 8) & 0xffffffff)
            n >>= 8
        return UNSIGNED(~crc)
I've expanded out UNSIGNED to try and make it faster and reordered some of the statements to be more similar to the standard zlib library (as used by Python) while I was trying to understand the differences. It seems they use a different polynomial to generate the table, but otherwise it's the same.
The above code can be used as:
with open('largefile', 'rb') as fd:
    digest = crc32()
    while buf := fd.read(4096):
        digest.update(buf)
    print(digest.digest())
which prints out the expected 1135714720 for a file created by:
echo -n hello world > test.txt
The above code should work for large files, but given the performance of Python this would take far too long to be useful. A 75MB file I have takes ~11 seconds, while cksum takes just ~0.2 seconds.
You should be able to get somewhere with using Cython to speed it up, but that's a bit more fiddly and if you're struggling with the existing code it's going to be quite a learning curve!
I've had another play and got performance similar to cksum with Cython. The code looks like:
cdef unsigned int *crctab_c = [
    # copy/paste crctab from above
]
cdef class crc32_c:
    cdef unsigned int crc, nchars
    def __init__(self):
        self.nchars = 0
        self.crc = 0
    cdef _update(self, bytes buf):
        cdef unsigned int crc, i, j
        cdef unsigned char c
        crc = self.crc
        for c in buf:
            i = (crc >> 24) ^ c
            j = crc << 8
            crc = crctab_c[i] ^ j
        self.crc = crc
        self.nchars += len(buf)
    def update(self, buf):
        return self._update(buf)
    def digest(self):
        crc = self.crc
        n = self.nchars
        while n:
            c = n & 0xff
            crc = crctab_c[(crc >> 24) ^ c] ^ ((crc << 8) & 0xffffffff)
            n >>= 8
        return (~crc) & 0xffffffff
After compiling this code with Cython, it can be used in a similar manner to the previous class. Performance is pretty good: Python now takes ~200ms for a 75MiB file and is basically the same as cksum, but much slower than zlib which only takes ~80ms.
Why use bit shifting on values before writing to I2C?
I recently bought the PTZ-Camera-controller from arducam, and found a kind of API/controller software on GitHub (https://github.com/ArduCAM/PTZ-Camera-Controller). The module is integrated with I2C, and has several functions on different registers. I tried to understand the Python program, but I was confused by one line that occurred several times in the program:
def read(self, I2C_address, register_address):
    value = self.bus.read_word_data(I2C_address, register_address)
    value = ((value & 0x00FF) << 8) | ((value & 0xFF00) >> 8)  # This one
    return value
def write(self, I2C_address, register_address, value):
    if value < 0:
        value = 0
    value = ((value & 0x00FF) << 8) | ((value & 0xFF00) >> 8)  # And this one
    return self.bus.write_word_data(I2C_address, register_address, value)
These can be used to, for instance, read or write to the camera's zoom motor. The motor has 2317 steps, and the default span for the zoom value is 0-18000. Which makes some sense, I guess, because if you for instance try writing 18000 = 0x4650, you get:
value = ((0x4650 & 0x00FF) << 8) | ((0x4650 & 0xFF00) >> 8)
Which should equal 2400, by my estimation. Fairly close to 2317. However: why would they do this, instead of just having the input span from 0 to 2317?
It looks like that's flipping the order of two bytes in a 16 bit integer. That's probably to convert from big endian to little endian, two different ways of representing integers. I think the controller uses big endian, PCs use little endian. (https://en.wikipedia.org/wiki/Endianness)
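A small sketch of what that expression does to a value may help. `swap16` is just a name I've given the expression from the question, and the `struct` round trip shows the same swap expressed as a change of byte order.

```python
import struct


def swap16(value: int) -> int:
    """Exchange the low and high bytes of a 16-bit value."""
    return ((value & 0x00FF) << 8) | ((value & 0xFF00) >> 8)


swapped = swap16(0x4650)
print(hex(swapped), swapped)  # 0x5046 20550

# Packing as little-endian and reading back as big-endian is the same swap.
assert struct.unpack(">H", struct.pack("<H", 0x4650))[0] == swapped
```

So the operation only reorders the two bytes. It does not rescale the number, and applying it twice gives back the original value.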
16 bit arithmetic sum in Python
I am working on a program that is creating IRIG106 Chapter 10 data for a cube-sat project. Currently it is being implemented in python and I am having difficulty implementing the final component of the Chapter 10 header. The way I have implemented it I am currently finding checksum values that are larger than what will fit inside of an integer of the size defined by the specification (2 bytes).
The standard defines the header checksum in section 10.6.1.1 paragraph "J" of the IRIG 106-09 standard. It is defined as the following:
J Header Checksum. (2 Bytes) contains a value representing a 16-bit arithmetic sum of all 16-bit words in the header excluding the Header Checksum Word.
There is also a programming manual provided that has example C code that shows the following (from page A-2-17):
uint16_t I106_CALL_DECL uCalcHeaderChecksum(SuI106Ch10Header * psuHeader)
{
    int iHdrIdx;
    uint16_t uHdrSum;
    uint16_t * aHdr = (uint16_t *)psuHeader;
    uHdrSum = 0;
    for (iHdrIdx=0; iHdrIdx<(HEADER_SIZE-2)/2; iHdrIdx++)
        uHdrSum += aHdr[iHdrIdx];
    return uHdrSum;
}
I have implemented the following in Python using the BitString library:
def calculate_checksum(byte_data: BitArray = None, header_length_bytes: int = 24, chunk_length: int = 16):
    # Set the checksum to zero:
    checksum = 0
    # Loop through the Chapter 10 header and compute the 16 bit arithmetic sum:
    for bit_location in range(0, (header_length_bytes-2), chunk_length):
        # Get the range of bits to select:
        bit_range = slice(bit_location, (bit_location + chunk_length))
        # Get the uint representation of the bit data found:
        checksum += Bits(bin=byte_data.bin[bit_range]).uint
    # Write the computed checksum as binary data to the start location of the checksum in the header:
    byte_data.overwrite(Bits(uint=checksum, length=chunk_length), (header_length_bytes-2*8))
Any thoughts or insights you could provide would be extremely appreciated. I know it should be a simple solution but I am just not able to see it.
--- Update 2 ---
I tried doing both roll over and truncation and they both produced the same result:
test_value = 2**16
test_value1 = test_value + 500
test_value2 = test_value1 % (2**16) -> 500
test_value3 = test_value1 & 0xFFFF -> 500
--- Update 3 ---
When I compare the execution of the python and C checksum functions I have run into the following using these values as an input per the spec:
Sync = "EB25" (2 bytes)
ChannelID = 1 (2 bytes)
PacketLen = 1024 (4 bytes)
When I compare the outputs at each step I see the following:
C:
Header0: EB25 index = 0 16bit chunk = 60197 checksum = 60197
Header1: 0001 index = 1 16bit chunk = 1 checksum = 60198
Header2: 0400 index = 2 16bit chunk = 1024 checksum = 61222
Header3: 0000 index = 3 16bit chunk = 0 checksum = 61222
Python:
eb25 index: 0 chunk: 60197 checksum: 60197
0001 index: 1 chunk: 1 checksum: 60198
0000 index: 2 chunk: 0 checksum: 60198
0400 index: 3 chunk: 1024 checksum: 61222
So, I know this question is really old, but I had the same issue. The endianness of the packet matters. In a .ch10 file, the packet start is 0x25EB because each section is little endian. Here's how I'm doing everything right now in C-ish code:
// read until the end of the file
while (!atEndOfFile) {
    // verify that we get the sync pattern
    if (readNextByte == 0x25) {
        if (readNextByte == 0xEB) {
            // store the sync pattern
            byte packetHeader[24];
            packetHeader[0] = 0x25;
            packetHeader[1] = 0xEB;
            // grab the rest of the header
            for (int i = 2; i < 24; i++) {
                packetHeader[i] = readNextByte;
            }
            // grab the checksum from the packet
            uint16 actualCheckSum = (packetHeader[23] << 8) | packetHeader[22];
            // calculate the checksum, one little-endian 16-bit word at a time
            uint16 calculatedCheckSum = 0;
            for (int i = 0; i < 22; i += 2) {
                calculatedCheckSum += (packetHeader[i + 1] << 8) | packetHeader[i];
            }
            // verify the calculation
            if (calculatedCheckSum == actualCheckSum) {
                printLine("We calculated the checksum!");
                printLine("actual checksum: " + actualCheckSum + " calculated checksum: " + calculatedCheckSum);
            }
        }
    }
}
I haven't done enough digging into the irig106 library, but I believe that it handles the translation when it reads in a .ch10 file.
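For the Python side of the question, here is a minimal sketch of the same sum using `struct` instead of BitString. It assumes the 24-byte header and little-endian 16-bit words described above; `header_checksum` and `HEADER_SIZE` are illustrative names, and the example header fills in only the three fields from Update 3, leaving the rest as zeros.

```python
import struct

HEADER_SIZE = 24  # bytes, with the checksum in the last two


def header_checksum(header: bytes) -> int:
    """16-bit arithmetic sum of the eleven words before the checksum word."""
    words = struct.unpack("<11H", header[:HEADER_SIZE - 2])
    return sum(words) & 0xFFFF  # truncate to 16 bits, like uint16_t overflow


# Example header: Sync = 0xEB25, ChannelID = 1, PacketLen = 1024, rest zero.
header = bytearray(HEADER_SIZE)
struct.pack_into("<HHI", header, 0, 0xEB25, 1, 1024)

checksum = header_checksum(bytes(header))
struct.pack_into("<H", header, HEADER_SIZE - 2, checksum)
print(checksum)  # 61222
```

Reading the 4-byte packet length as two little-endian words gives 0x0400 followed by 0x0000, which is the order shown in the C trace in the question.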
CRC in python, little Endian
I need to calculate the CRC checksum of a binary file. The file also contains a CRC, and by comparing the two I can find out when the file has been corrupted. The bin file is something like a long hex string:
00200020 595A0008 ......
But the CRC in the file was calculated per integer (4-byte little-endian), like this:
1.int - 0x20002000
2.int - 0x8000A559
How can I get the same result without switching bytes in Python? I tried http://www.tty1.net/pycrc/ and played with reflect in, but I don't get the same result. For these two values the correct CRC is 0xEF2B32F8.
Try using the struct module. You can open a file and use unpack to read the data in any format you want, with any endianness.
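As a rough illustration of that suggestion, the sketch below reads a buffer as 4-byte little-endian integers and can also re-emit it with the bytes of each word reversed, ready to feed to whatever CRC routine is in use. `le_words` and `swap_word_bytes` are hypothetical helper names, and the CRC itself is not shown because the question doesn't say which parameters it uses.

```python
import struct


def le_words(data: bytes) -> list:
    """Interpret data as consecutive 4-byte little-endian unsigned integers."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    return [word for (word,) in struct.iter_unpack("<I", data)]


def swap_word_bytes(data: bytes) -> bytes:
    """Return the same words with the bytes of each 4-byte group reversed."""
    words = le_words(data)
    return struct.pack(f">{len(words)}I", *words)


raw = bytes.fromhex("00200020")
print(hex(le_words(raw)[0]))   # 0x20002000
print(swap_word_bytes(raw).hex())  # 20002000
```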
I have written the following code for calculating crc8:
acklist = []  # a list of your byte string data
x = 0xff
crc = 0
for i in range(len(acklist)):
    crc += int(acklist[i], 16) & x
print(crc)
crc = ~crc
crc += 1
crc1 = crc >> 8 & x
crc2 = crc & x