Ethernet CRC32 calculation - software vs algorithmic result


I'm trying to calculate the Frame Check Sequence (FCS) of an Ethernet packet byte by byte. The polynomial is 0x104C11DB7.
I followed the XOR-and-shift algorithm described at http://en.wikipedia.org/wiki/Cyclic_redundancy_check and at http://www.woodmann.com/fravia/crctut1.htm
Assume the data that is supposed to get a CRC is only one byte, say 0x03.
First, pad with 32 zero bits on the right:
0x0300000000
Then align the polynomial with the leftmost nonzero bit of the data and XOR them:
0x300000000 xor 0x209823B6E = 0x109823b6e
Take the remainder, align, and XOR again:
0x109823b6e xor 0x104C11DB7 = 0x0d4326d9
Since there are no more bits left, the CRC32 of 0x03 should be 0x0d4326d9.
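The hand calculation above is plain polynomial long division over GF(2), which can be reproduced with a minimal Python sketch. The function name `poly_mod` is made up for this illustration; it does not apply any of the extra steps (bit reversal, initial value, final complement) that real CRC-32 implementations use.

```python
POLY = 0x104C11DB7  # Ethernet polynomial including the leading x^32 term

def poly_mod(value, poly=POLY):
    """Return the remainder of dividing value by poly over GF(2)."""
    poly_len = poly.bit_length()
    while value.bit_length() >= poly_len:
        # Align the polynomial with the leftmost set bit and XOR it away.
        value ^= poly << (value.bit_length() - poly_len)
    return value

# 0x03 padded with 32 zero bits on the right, as in the hand calculation.
remainder = poly_mod(0x03 << 32)
print("0x%08x" % remainder)
```

This reproduces the hand-computed remainder, which suggests the division itself is right and the mismatch comes from steps outside the division.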
Unfortunately all the software implementations tell me I'm wrong. What did I do wrong, or what are they doing differently?
Python tells me:
"0x%08x" % binascii.crc32(chr(0x03))
0x4b0bbe37
The online tool here http://www.lammertbies.nl/comm/info/crc-calculation.html#intr gets the same result.
What is the difference between my hand calculation and the algorithm the mentioned software uses?
UPDATE:
Turns out there was a similar question already on Stack Overflow:
You can find an answer in "Python CRC-32 woes".
That answer is not very intuitive, though. If you want a more formal description of how it is done for Ethernet frames, you can look at the Ethernet standard document 802.3, Part 3, Chapter 3.2.9 Frame Check Sequence Field.
Let's continue the example from above:
1. Reverse the bit order of each byte of your message. That represents the order in which the bits arrive at the receiver.
0x03 therefore becomes 0xC0
2. Complement the first 32 bits of your message. Notice that we pad the single byte with 32 bits again.
0xC000000000 xor 0xFFFFFFFF00 = 0x3FFFFFFF00
3. Run the XOR-and-shift method from above again. After about 6 steps you get:
0x13822f2d
4. Complement the resulting bit sequence.
0x13822f2d xor 0xFFFFFFFF = 0xec7dd0d2
5. Remember that we reversed the bit order in step 1 to get the representation on the Ethernet wire. Now we have to undo that reversal, which finally gives the result:
0x4b0bbe37
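The procedure above can be sketched in Python and compared against `zlib.crc32`. This is an illustration only: the helper names `reverse_bits`, `poly_mod` and `ethernet_crc32` are made up here, and the code favours readability over speed.

```python
import zlib

POLY = 0x104C11DB7  # Ethernet polynomial including the leading x^32 term

def reverse_bits(value, width):
    """Reverse the bit order of a width-bit integer."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def poly_mod(value, poly=POLY):
    """Remainder of GF(2) polynomial division (the XOR-and-shift method)."""
    poly_len = poly.bit_length()
    while value.bit_length() >= poly_len:
        value ^= poly << (value.bit_length() - poly_len)
    return value

def ethernet_crc32(data):
    # Reverse the bits of every byte to get the on-the-wire bit order.
    msg = 0
    for byte in data:
        msg = (msg << 8) | reverse_bits(byte, 8)
    # Pad with 32 zero bits, then complement the first 32 bits.
    total_bits = 8 * len(data) + 32
    msg <<= 32
    msg ^= 0xFFFFFFFF << (total_bits - 32)
    # Divide, complement the remainder, and undo the bit reversal.
    remainder = poly_mod(msg) ^ 0xFFFFFFFF
    return reverse_bits(remainder, 32)

print("0x%08x" % ethernet_crc32(b"\x03"))
print("0x%08x" % (zlib.crc32(b"\x03") & 0xFFFFFFFF))
```

Both lines should print the same value for any input, since `zlib.crc32` implements the same CRC in a table-driven form.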
Whoever came up with this way of doing it should be ...
Often you actually want to know whether the message you received is correct. To check this, take the received message including the FCS and run it through the same steps as above, stopping after the XOR-and-shift division (that is, without the final complement and bit reversal). The result should be what is called the residue, which is a constant for a given polynomial. In this case it is 0xC704DD7B.
As mcdowella mentions, you have to play around with your bits until you get it right, depending on the application you are using.
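The residue check can also be tried with `zlib`. A small sketch, assuming the FCS is appended least significant byte first: `zlib.crc32` applies the final complement and returns the bit-reversed form, so for a frame with a valid FCS it returns a different constant, and undoing those two steps recovers 0xC704DD7B. The payload and the helper `reverse_bits32` are arbitrary choices for this illustration.

```python
import struct
import zlib

def reverse_bits32(value):
    """Reverse the bit order of a 32-bit integer."""
    result = 0
    for _ in range(32):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

payload = b"123456789"  # arbitrary example data, not from the original post
fcs = zlib.crc32(payload) & 0xFFFFFFFF
frame = payload + struct.pack("<I", fcs)  # FCS appended least significant byte first

# CRC over payload plus FCS: a constant for every valid frame.
check = zlib.crc32(frame) & 0xFFFFFFFF
# Undo zlib's final complement and bit reversal to get the residue.
residue = reverse_bits32(check ^ 0xFFFFFFFF)
print("0x%08X 0x%08X" % (check, residue))
```

In practice a receiver can simply compare `zlib.crc32(frame)` against the constant it produces for valid frames, without converting back.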

This snippet writes the correct CRC for Ethernet.
Python 3
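import zlib

# data: the frame bytes without the FCS; f: a file opened for writing text
# write payload
for byte in data:
    f.write(f'{byte:02X}\n')
# write FCS, least significant byte first
crc = zlib.crc32(data)
for i in range(4):
    byte = (crc >> (8*i)) & 0xFF
    f.write(f'{byte:02X}\n')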
Python 2
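import zlib

# data: the frame as a str without the FCS; f: a file opened for writing
# write payload
for byte in data:
    f.write('%02X\n' % ord(byte))
# write FCS, least significant byte first (mask because crc32 may be negative)
crc = zlib.crc32(data) & 0xFFFFFFFF
for i in range(4):
    byte = (crc >> (8*i)) & 0xFF
    f.write('%02X\n' % byte)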
Would have saved me some time if I found this here.

There is generally a bit of trial and error required to get CRC calculations to match, because you never end up reading exactly what has to be done. Sometimes you have to bit-reverse the input bytes or the polynomial, sometimes you have to start off with a non-zero value, and so on.
One way to bypass this is to look at the source of a program getting it right, such as http://sourceforge.net/projects/crcmod/files/ (at least it claims to match, and comes with a unit test for this).
Another is to play around with an implementation. For instance, if I use the calculator at http://www.lammertbies.nl/comm/info/crc-calculation.html#intr I can see that giving it 00000000 produces a CRC of 0x2144DF1C, but giving it FFFFFFFF produces FFFFFFFF, so it's not exactly the polynomial division you describe, for which 0 would have checksum 0.
From a quick glance at the source code and these results I think you need to start with a CRC of 0xFFFFFFFF, but I could be wrong, and you might end up debugging your code side by side with the implementation, using corresponding printfs to find out where they first differ, and fixing the differences one by one.
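To experiment with the starting value, here is a minimal bit-by-bit sketch of the reflected CRC-32 variant that `zlib` uses. The function name `crc32_bitwise` and its `init` and `xorout` parameters are made up for this illustration; 0xEDB88320 is the polynomial 0x04C11DB7 with its bit order reversed.

```python
import zlib

REFLECTED_POLY = 0xEDB88320  # 0x04C11DB7 with the bit order reversed

def crc32_bitwise(data, init=0xFFFFFFFF, xorout=0xFFFFFFFF):
    """Bit-by-bit reflected CRC-32 with selectable start and final XOR values."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when a 1 bit falls out.
            crc = (crc >> 1) ^ REFLECTED_POLY if crc & 1 else crc >> 1
    return crc ^ xorout

print("0x%08X" % crc32_bitwise(b"\x03"))
print("0x%08X" % crc32_bitwise(bytes(4)))
# With init=0 and xorout=0 this is plain polynomial division again,
# so all-zero input gives a checksum of 0.
print("0x%08X" % crc32_bitwise(bytes(4), init=0, xorout=0))
```

Changing `init` and `xorout` one at a time and comparing against a known-good implementation is a quick way to find which convention a given tool follows.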

There are a number of places on the Internet where you will read that the bit order must be reversed before calculating the FCS, but the 802.3 spec is not one of them. Quoting from the 2008 version of the spec:
3.2.9 Frame Check Sequence (FCS) field
A cyclic redundancy check (CRC) is used by the transmit and receive algorithms to
generate a CRC value for the FCS field. The FCS field contains a 4-octet (32-bit)
CRC value. This value is computed as a function of the contents of the protected
fields of the MAC frame: the Destination Address, Source Address, Length/ Type
field, MAC Client Data, and Pad (that is, all fields except FCS). The encoding is
defined by the following generating polynomial.
G(x) = x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11}
+ x^{10} + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
Mathematically, the CRC value corresponding to a given MAC frame is defined by
the following procedure:
a) The first 32 bits of the frame are complemented.
b) The n bits of the protected fields are then considered to be the coefficients
of a polynomial M(x) of degree n – 1. (The first bit of the Destination Address
field corresponds to the x^{n-1} term and the last bit of the MAC Client Data
field (or Pad field if present) corresponds to the x^0 term.)
c) M(x) is multiplied by x^{32} and divided by G(x), producing a remainder R(x) of
degree ≤ 31.
d) The coefficients of R(x) are considered to be a 32-bit sequence.
e) The bit sequence is complemented and the result is the CRC.
The 32 bits of the CRC value are placed in the FCS field so that the x^{31} term is
the left-most bit of the first octet, and the x^0 term is the right most bit of the
last octet. (The bits of the CRC are thus transmitted in the order x^{31}, x^{30},...,
x^1, x^0.) See Hammond, et al. [B37].
Certainly the rest of the bits in the frame are transmitted in reverse order, but that does not include the FCS. Again, from the spec:
3.3 Order of bit transmission
Each octet of the MAC frame, with the exception of the FCS, is transmitted least
significant bit first.

http://en.wikipedia.org/wiki/Cyclic_redundancy_check
has all the data for Ethernet and a wealth of important details. For example, there are (at least) two conventions for encoding the polynomial into a 32-bit value: largest term first or smallest term first.
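The two encodings are bit reversals of each other, which a short sketch can confirm. The helper name `reverse_bits32` is made up for this illustration; the x^32 term is implicit in both 32-bit values.

```python
def reverse_bits32(value):
    """Reverse the bit order of a 32-bit integer."""
    result = 0
    for _ in range(32):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

NORMAL = 0x04C11DB7                 # largest term first
REFLECTED = reverse_bits32(NORMAL)  # smallest term first
print("0x%08X -> 0x%08X" % (NORMAL, REFLECTED))
```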

Related

Python bit manipulation to get usable data out of a lidar sensor

I am trying to write a python driver for a lidar sensor that only has a package for robot OS.
I was able to get the communication working on a Raspberry Pi and I am getting the data that I need.
I never really worked with bytearrays before and even python is pretty new to me.
The received data looks like this (png), but you can take a look at the documentation (pdf) as well.
So if I'm not mistaken, I have to combine three bytes into two 12-bit values like this:
[0x5D, 0xC7, 0xD0] => [0x5DC, 0x7D0]
I think the aforementioned robot OS library does this here, but my C++ is even worse than my Python :)
After I have the correct data I want to sort it into a 2D array but that's not a problem.
Can you point me in the right direction, or just suggest how to search for a solution?
Thank you for your help

So here's one solution (maybe not the cleanest but it's bit-manipulation so...):
arr = [0x5D, 0xC7, 0xD0]
byte_0 = arr[0] << 4 | (arr[1] >> 4)
byte_1 = (arr[1] & 0xF) << 8 | arr[2]
I'll try to go over this step by step. The three bytes are, in binary representation:
0b0101_1101
0b1100_0111
0b1101_0000
The << operator is the left-shift operator. It moves the bits to the left by the specified amount. Applying this to the first byte yields:
0b0101_1101 << 4 = 0b0101_1101_0000, effectively appending four zeros at the end.
The >> operator is the counterpart of the << operator, shifting the other way. It discards bits that would move below position 0:
0b1100_0111 >> 4 = 0b1100
Finally, the | operator is the bitwise 'or' operator. Each result bit is '1' if one or both of the input bits are '1', and '0' only when both bits are '0'. We can use this to fill in the lower four bits of our result so far. I have omitted leading zeros above for simplicity, but here are the numbers padded with zeros:
0b0101_1101_0000 | 0b0000_0000_1100 = 0b0101_1101_1100
And there you have your first number. Note that this is no longer a byte: you now need 12 bits to represent the number.
The same is done for the second number. The only new thing here is the bitwise 'and' operator (&). This operator yields '1' only if both bits are '1'. We can use it to mask out the part of the byte we are interested in:
0b1100_0111 & 0b1111 = 0b0111
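Applied to a whole buffer, the same shifts and masks could look like the sketch below. It assumes the sensor data arrives as a `bytes` object whose length is a multiple of three; the function name `unpack_12bit_pairs` is made up for this illustration, and the real packet layout should be checked against the sensor documentation.

```python
def unpack_12bit_pairs(raw):
    """Turn every group of three bytes into two 12-bit integers."""
    if len(raw) % 3 != 0:
        raise ValueError("expected a multiple of 3 bytes")
    values = []
    for i in range(0, len(raw), 3):
        b0, b1, b2 = raw[i:i + 3]
        # First value: all of b0 followed by the upper four bits of b1.
        values.append((b0 << 4) | (b1 >> 4))
        # Second value: the lower four bits of b1 followed by all of b2.
        values.append(((b1 & 0x0F) << 8) | b2)
    return values

print([hex(v) for v in unpack_12bit_pairs(bytes([0x5D, 0xC7, 0xD0]))])
```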

Python 3 wave module byteorder..?

[Edit: In summary, this question was the result of me making (clearly incorrect) assumptions about what endian means (I assumed it was 00000001 vs 10000000, i.e. reversing the bits rather than the bytes). Many thanks to @tripleee for clearing up my confusion.]
As far as I can tell, the byte order of frames returned by the Python 3 wave module [1] (which I'll now refer to as pywave) isn't documented. I've had a look at the source code [2] [3], but haven't quite figured it out.
Firstly, it looks like pywave only supports 'RIFF' wave files [2]. 'RIFF' files use little endian; samples are unsigned for a bit depth of 8 bits or lower, otherwise signed (two's complement).
However, it looks like pywave converts the bytes it reads from the file to sys.byteorder [2]:
data = self._data_chunk.read(nframes * self._framesize)
if self._sampwidth != 1 and sys.byteorder == 'big':
    data = audioop.byteswap(data, self._sampwidth)
Except in the case of sampwidth==1, which corresponds to an 8 bit file. So 8 bit files aren't converted to sys.byteorder? Why would this be? (Maybe because they are unsigned?)
Currently my logic looks like:
if sampwidth == 1:
    signed = False
    byteorder = 'little'
else:
    signed = True
    byteorder = sys.byteorder
Is this correct?
8 bit wav files are incredibly rare nowadays, so this isn't really a problem. But I would still like to find answers...
[1] https://docs.python.org/3/library/wave.html
[2] https://github.com/python/cpython/blob/3.9/Lib/wave.py
[3] https://github.com/python/cpython/blob/3.9/Lib/chunk.py

A byte is a byte; little or big endian only makes sense for data which is more than one byte.
0xf0 is a single, 8-bit byte. The bits are 0b11110000 on any modern architecture. Without a sign bit, the range is 0 through 255 (8 bits of storage gives 2^8 = 256 possible values).
0xf0eb is a 16-bit number which takes two 8-bit bytes to represent. This can be represented as
0xf0 0xeb big-endian (0b11110000 0b11101011), or
0xeb 0xf0 little-endian (0b11101011 0b11110000)
The range of possible values without a sign bit is 0 through 65,535 (2^16 = 65,536 values).
You can also have different byte orders for 32-bit numbers etc, but I'll defer to Wikipedia etc for the full exposition.
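A short sketch with Python's built-in `int.to_bytes` and `int.from_bytes` shows the same thing in code. The variable names are chosen only for this illustration.

```python
value = 0xF0EB

# The same 16-bit number laid out in both byte orders.
big = value.to_bytes(2, "big")        # b'\xf0\xeb'
little = value.to_bytes(2, "little")  # b'\xeb\xf0'

# Reading the little-endian bytes back, as unsigned and as two's complement.
as_unsigned = int.from_bytes(little, "little", signed=False)
as_signed = int.from_bytes(little, "little", signed=True)

# For a single byte the byte order makes no difference.
single_little = int.from_bytes(b"\xf0", "little")
single_big = int.from_bytes(b"\xf0", "big")

print(big, little, as_unsigned, as_signed, single_little, single_big)
```

Only the order of the bytes changes between the two layouts; the bits inside each byte stay as they are.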

RC-6 ciphertext does not match non 0 vectors

I ran into a problem where my RC-6 implementation does not produce the ciphertext it should according to the spec document. To be more clear, let me give you an example.
As you can see, when the plaintext and key consist of zero bytes, it passes both tests: the ciphertext test and the decryption test.
To clarify further, the cipher values (both correct and wrong) are also ordered in little-endian fashion after encrypting.
So my question is: where should I look for the invalid code?
I have a feeling that it has something to do with the byte ordering before passing the data to the encryption or key-scheduling functions.
The values I pass to the key-scheduling and encryption functions are plain arrays of 32-bit words (e.g. [0x00,0x10,0x00,0x00]), and I then go straight to the algorithm (which I wrote by following the pseudo-code), so no other formatting is done before that.
They also start as follows :
def encrypt(plaintext,S):
    A,C = plaintext[0],plaintext[2]
    B = modulus(plaintext[1]+S[0])
    D = modulus(plaintext[3]+S[1])
    for i in range(1,r+1):
        ....
def keyGenerator(L):
    c = len(L)
    S = [int(0)]* (2*r+4)
    S[0] = P
    ....
I could use any help..
Thank you in advance!
By the way, the official test vectors can be found in the appendix of THIS document.

So I found out what was wrong in this case. It was indeed a problem with swapping bytes. Since all-zero input is symmetric, it passed the tests, while input with mixed values ran without errors but gave the wrong answer.
def swap32(x):
    return (((x << 24) & 0xFF000000) | ((x << 8) & 0x00FF0000) |
            ((x >> 8) & 0x0000FF00) | ((x >> 24) & 0x000000FF))
This function, which reverses the byte order of a 32-bit word, was very useful in my case. I had to swap the key bytes, the plaintext bytes at the beginning of encryption and the result at the end of encryption, and do the same at the beginning and at the end of decryption.
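An alternative sketch does the byte-order handling once, when bytes are converted to words, instead of swapping each word afterwards. It assumes the cipher expects its 32-bit words to be loaded little-endian, which should be checked against the specification; the helper `bytes_to_words` is made up for this illustration.

```python
import struct

def swap32(x):
    """Reverse the byte order of a 32-bit word."""
    return int.from_bytes(x.to_bytes(4, "big"), "little")

def bytes_to_words(block):
    """Interpret 16 bytes as four little-endian 32-bit words (assumed convention)."""
    return list(struct.unpack("<4I", block))

block = bytes(range(16))  # arbitrary example block, not a test vector
words_le = bytes_to_words(block)
words_be = list(struct.unpack(">4I", block))

# Swapping each big-endian word gives the little-endian interpretation.
print([hex(w) for w in words_le])
print([hex(swap32(w)) for w in words_be])
# An all-zero word is unchanged by the swap, which is why zero vectors pass.
print(swap32(0))
```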
I hope someone will find this useful in the future and won't be stuck in the same place like I was..
Cheers

How to do <xor> in python e.g. enc_price = pad <xor> price

I am new to crypto and I am trying to interpret the below code. Namely, what does <xor> mean?
I have a secret key, secret_key. I also have a unique_id. I create the pad using the code below.
pad = hmac.new(secret_key, msg=unique_id, digestmod=hashlib.sha1).digest()
Once the pad is created, I have a price e.g. 1000. I am trying to follow this instruction which is pseudocode:
enc_price = pad <xor> price
In Python, what is the code to implement enc_price = pad <xor> price? What is the logic behind doing this?
As a note, a complete description of what I want to do is here:
https://developers.google.com/ad-exchange/rtb/response-guide/decrypt-price
Thanks

The binary (I assume that's what you need) xor is ^ in python:
>>> 6 ^ 12
10
Binary xor works like this (numbers represented in binary):
     1234
 6 = 0110
12 = 1100
10 = 1010
For every pair of bits, if their sum is 1 (bits 1 and 3 in my example), the resulting bit is 1. Otherwise, it's 0.

The pad, and the plaintext "price" are each to be interpreted as a stream of bits. For each corresponding bit in the two streams, you take the "exclusive OR" of the pair of bits - if the bits are the same, you emit 0, if the bits are different, you emit 1. This operation is interesting because it's reversible: plaintext XOR pad -> ciphertext, and ciphertext XOR pad -> plaintext.
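A minimal Python 3 sketch of this byte-wise XOR and its reversibility follows. The key, the id, the 8-byte big-endian price encoding and the truncation of the pad to the length of the price are all assumptions made for this illustration, so check the linked guide for the actual format; `xor_bytes` is a made-up helper name.

```python
import hashlib
import hmac
import struct

def xor_bytes(a, b):
    """XOR two byte strings pairwise, stopping at the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

secret_key = b"hypothetical-secret-key"  # placeholder value
unique_id = b"hypothetical-unique-id"    # placeholder value
pad = hmac.new(secret_key, msg=unique_id, digestmod=hashlib.sha1).digest()

price = struct.pack(">Q", 1000)  # assumed encoding: 8 bytes, big-endian
enc_price = xor_bytes(pad, price)

# XOR with the same pad again recovers the original bytes.
dec_price = xor_bytes(pad, enc_price)
print(struct.unpack(">Q", dec_price)[0])
```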
However, in Python, you won't usually do the XORing yourself because it's tedious and overly complex for a newbie; you want to use a popular encryption library such as PyCrypto to do the work.

You mean "Binary bitwise operations"?
The & operator yields the bitwise AND of its arguments, which must be plain or long integers. The arguments are converted to a common type.
The ^ operator yields the bitwise XOR (exclusive OR) of its arguments, which must be plain or long integers. The arguments are converted to a common type.
The | operator yields the bitwise (inclusive) OR of its arguments, which must be plain or long integers. The arguments are converted to a common type.
[update]
Since you can't xor a string and a number, you should either:
convert the number to a string padded to the same size and xor each byte (this may give you all sorts of strange "escape" problems with some chars, for example accidentally generating invalid unicode)
use the raw value (20 byte integer?) of the digest to xor and make a hexdigest of the resulting number.
Something like this (untested):
pad = hmac.new(secret_key, msg=unique_id, digestmod=hashlib.sha1).digest()
rawpad = reduce(lambda x, y: (x << 8) + y,
                [ b for b in struct.unpack('B' * len(pad), pad)])
enc_price = "%X" % (rawpad ^ price)
[update]
The OP wants to implement "DoubleClick Ad Exchange Real-Time Bidding Protocol".
This very article says that some sample Python code is available:
Initial Testing
You can test your bidding application internally using requester.tar.gz. This is a test python program that sends requests to a bidding application and checks the responses. The program is available on request from your Ad Exchange representative.

I did it like this:
def strxor(s1,s2):
    size = min(len(s1),len(s2))
    res = ''
    for i in range(size):
        res = res + '%c' % (ord(s1[i]) ^ ord(s2[i]))
    return res

python len calculation

I'm currently trying to build an RDP client in Python and I came across the following issue with a length check:
From: http://msdn.microsoft.com/en-us/library/cc240836%28v=prot.10%29.aspx
81 2a -> ConnectData::connectPDU length = 298 bytes Since the most significant bit of the first byte (0x81) is set to 1 and the following bit is set to 0, the length is given by the low six bits of the first byte and the second byte. Hence, the value is 0x12a, which is 298 bytes.
This sounds weird.
For normal len checks, I'm simply using : struct.pack(">h",len(str(PacketLen)))
but in this case, I really don't see how I can calculate the len as described above.
Any help on this would be greatly appreciated !

Just set the most-significant bit by using a bitwise OR:
struct.pack(">H", len(...) | 0x8000)
You might want to add a check to make sure the length fits into 14 bits, i.e. it is less than 2 ** 14.
Edit: Fixed according to the comment by TokenMacGuy.

This is not a terribly uncommon scenario when dealing with bandwidth-sensitive transmission protocols. They are basically saying that if the length that follows fits in the range 0 to 0x7F, just use one byte; otherwise, use two bytes. (Note: the largest legal value with this system is therefore 16,383.)
Here's a quick example:
if length <= 0x7F:
    pkt = struct.pack('B', length)
elif length <= 0x3FFF:
    # '>H' (unsigned): values with the top bit set do not fit the signed '>h'
    pkt = struct.pack('>H', length | 0x8000)
else:
    raise ValueError('length exceeds maxvalue')
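For completeness, here is a self-contained sketch of both directions, following the quoted description in which a set top bit marks the two-byte form. The function names `encode_length` and `decode_length` are made up for this illustration and are not part of any RDP library.

```python
import struct

def encode_length(length):
    """Encode a length in one byte if it fits in 7 bits, else in two bytes."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length <= 0x7F:
        return struct.pack("B", length)
    if length <= 0x3FFF:
        # Set the most significant bit to mark the two-byte form.
        return struct.pack(">H", length | 0x8000)
    raise ValueError("length exceeds maximum value")

def decode_length(buf):
    """Return (length, number of bytes consumed)."""
    first = buf[0]
    if first & 0x80:
        # Low six bits of the first byte, followed by the second byte.
        return ((first & 0x3F) << 8) | buf[1], 2
    return first, 1

print(encode_length(298).hex())
print(decode_length(bytes([0x81, 0x2A])))
```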
