I'm currently trying to build an RDP client in Python, and I came across the following issue with a length check;
From: http://msdn.microsoft.com/en-us/library/cc240836%28v=prot.10%29.aspx
81 2a -> ConnectData::connectPDU length = 298 bytes. Since the most significant bit of the first byte (0x81) is set to 1 and the following bit is set to 0, the length is given by the low six bits of the first byte and the second byte. Hence, the value is 0x12a, which is 298 bytes.
This sounds weird.
For normal len checks, I'm simply using: struct.pack(">h", len(str(PacketLen)))
but in this case, I really don't see how I can calculate the len as described above.
Any help on this would be greatly appreciated !
Just set the most-significant bit by using a bitwise OR:
struct.pack(">H", len(...) | 0x8000)
You might want to add a check to make sure the length fits into 14 bits, i.e. it is less than 2 ** 14.
Edit: Fixed according to the comment by TokenMacGuy.
Not a terribly uncommon scenario when dealing with bandwidth-sensitive transmission protocols. They are basically saying: if the length that follows fits in the range 0 -> 0x7F, just use one byte; otherwise, you can optionally use two bytes. (Note: the largest legal value with this scheme is therefore 16,383, i.e. 0x3FFF.)
Here's a quick example:
import struct

if length <= 0x7F:
    pkt = struct.pack('B', length)            # one-byte form
elif length <= 0x3FFF:
    pkt = struct.pack('>H', length | 0x8000)  # two-byte form; '>H' (unsigned) is needed once the high bit is set
else:
    raise ValueError('length exceeds max value')
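Going the other way (parsing a received length field), here is a minimal sketch of the decoding direction under the same encoding, assuming Python 3 bytes; the helper name is just illustrative:

import struct

def read_length(buf):
    # returns (length, number of bytes consumed)
    if buf[0] & 0x80:                          # high bit set: two-byte form
        (value,) = struct.unpack('>H', buf[:2])
        return value & 0x3FFF, 2               # drop the two marker bits
    return buf[0], 1                           # one-byte form, 0..0x7F

assert read_length(b'\x81\x2a') == (298, 2)    # the example from the spec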
I am trying to solve a challenge on this site. I have everything correct except I can't properly convert a bitstring to its 32-bit signed integer representation.
For example I have this bitstring:
block = '10101010001000101110101000101110'
My own way of converting this bitstring to a 32-bit signed integer: I partially remember from school that the first bit is the sign bit; if it is 1, we have a negative number, and vice versa.
When I do this, it just converts the bitstring to its unsigned base-10 value:
int(block, 2) #yields 2854414894
I have also tried excluding the first bit, converting the remaining 31-bit bitstring, and then checking the first bit to decide whether the number is negative:
int(block[1:32], 2) #yields 706931246
But the correct answer is -1440552402. What operation should I do to this bitstring to get this integer? Is it relevant whether the byte order of the system is little-endian or big-endian? My system is little-endian.
In Python there's no fixed size for integers, so you'll never get a negative value just because a high-order bit is 1.
To "emulate" 32-bit behaviour just do this, since your 2854414894 value is > 2**31 - 1 (aka 0x7FFFFFFF):
print(int(block[1:32], 2)-2**31)
you'll get
-1440552402
You're right that the upper bit determines sign, but it's not a simple flag. Instead, the whole character of negative numbers is inverted. This is a positive number 1 (in 8 bits):
00000001
This is a negative 1:
11111111
The upshot is that addition and subtraction "wrap around". So 4 - 1 would be:
0100 - 0001 = 0011
And so 0 - 1 is the same as 1_0000_0000 - 1. The "borrow" just goes off the top of the integer.
The general way to "negate" a number is "invert the bits, add 1". This works both ways, so you can go from positive to negative and back.
In your case, use the leading '1' to detect whether negation is needed, then convert to int, then maybe perform the negation steps. Note, however, that a Python int is not a fixed-width, two's-complement value: it's an arbitrary-precision integer that keeps its sign in a separate internal flag, with a dynamically allocated representation.
block = '10101010001000101110101000101110'
asnum = int(block, 2)
if block[0] == '1':          # sign bit set: negate via two's complement
    asnum ^= 0xFFFFFFFF      # invert the bits
    asnum += 1               # add 1
    asnum = -asnum           # record the sign
print(asnum)
You should check for when the input value is outside the positive range for 32-bit signed integers:
res = int(block, 2)
if res >= 2**31:
    res -= 2**32
So first you interpret the number as an unsigned number, but when you notice the sign bit was set (value >= 2^31), you subtract 2^32 to get the negative number.
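As a sanity check, you can get the same result by reinterpreting the raw bytes with struct; note that the explicit '>' means your host's endianness doesn't matter here:

import struct

block = '10101010001000101110101000101110'
# pack as unsigned 32-bit, then reinterpret the same 4 bytes as signed
(res,) = struct.unpack('>i', struct.pack('>I', int(block, 2)))
print(res)  # -1440552402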
I have 32-bit numbers A=0x0000000A and B=0x00000005.
I get A xor B by A^B and it gives 0b1111.
I rotated this and got D=0b111100000, but I want the result as a 32-bit number, and not just for printing: I need the MSB bits (even though they are 0 in this case) for further manipulation.
Most high-level languages don't have ROR/ROL operators. There are two ways to deal with this: one is to add an external library like https://github.com/scott-griffiths/bitstring, which has native support for rotates and bit slicing on integers (rotation is pretty easy to add yourself, too).
One thing to keep in mind is that Python integers have 'infinite' precision: the MSBs are effectively always 0 for positive numbers and 1 for negative numbers, and Python stores as many digits as it needs. This is one reason you see odd-looking notation in Python, such as ~0x3 being shown as -0x4, which is the two's-complement equivalent rather than a positive value. And -0x4 stays true at any width: even if you AND it against a 5000-bit number, it will just mask off the bottom two bits.
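A quick interactive illustration of that point; AND-ing with a mask is how you confine a value to a fixed width:

>>> ~0x3
-4
>>> bin(~0x3 & 0xFF)   # the same value confined to 8 bits
'0b11111100'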
Or you can just do it yourself, the way we all used to, and the way the hardware actually does it:
def rotate_left(number, rotatebits, numbits=32):
    # keep only the low `numbits` bits of the left shift, and bring the
    # bits that fell off the top back in at the bottom
    mask = (1 << numbits) - 1
    return ((number << rotatebits) | (number >> (numbits - rotatebits))) & mask
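Applied to the question's values, with format() used to show all 32 bits including the leading zeros:

A = 0x0000000A
B = 0x00000005
D = rotate_left(A ^ B, 5)   # 0b1111 rotated left by 5
print(format(D, '032b'))    # 00000000000000000000000111100000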
To get the binary of an integer you could use bin().
Just a short example:
>>> i = 333333
>>> print (i)
333333
>>> print (bin(i))
0b1010001011000010101
>>>
bin(i)[2:].zfill(32)
I guess does what you want.
I think your bigger problem here is that you are misunderstanding the difference between a number and its representation:
12 ^ 18 #would xor the values
56 & 11 # and the values
if you need actual 32-bit signed integers you can use numpy:
a = numpy.array(range(100), dtype=numpy.int32)
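For an out-of-range value, casting with astype wraps it into the signed 32-bit range the way two's-complement hardware would (constructing an out-of-range scalar directly may raise an OverflowError on recent NumPy versions, so go through an array cast):

import numpy as np

x = np.array([0xFFFFFFF0], dtype=np.uint64).astype(np.int32)
print(x[0])  # -16: the value wraps into the signed 32-bit range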
I'm looking to compare bits of a hash in Python3, as part of a Hashcash system.
So for instance, I want to know if the first N bits of a SHA256 hash are 0.
Right now, I'm doing this based on the hex version
if newhash.hexdigest()[0:4] == '0000'
But this doesn't let me be as granular as I want - I'd prefer to compare the raw bits, which lets me vary the number of matching 0s far more closely.
I can get the bit values to compare through a convoluted hop:
bin(int(h.hexdigest(), 16))[2:]
but this seems like it can't possibly be the fastest/right way to do it.
I'd appreciate any advice on the right/correct way to do it ;)
To check whether selected bits of a number are zero, you need to AND the number with a precomputed mask that has all those bits set, and compare the result to zero. The mask that checks the first n bits of an m-bit number consists of n ones followed by m - n zeros in binary.
def mask(n, m):
    return ((1 << n) - 1) << (m - n)

def test_0bits(digest_bytes, n_bits):
    m = 8 * len(digest_bytes)
    digest_num = int.from_bytes(digest_bytes, 'big')
    return digest_num & mask(n_bits, m) == 0
>>> test_0bits(b'\x53\x2e', 3)  # 01010011 00101110 -> leading bits are 010
False
>>> test_0bits(b'\x13\x2e', 3)  # 00010011 00101110 -> leading bits are 000
True
If you keep calling test_0bits with the same number of bits, you can precompute the mask and store it as a module-level "constant".
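For example, a minimal sketch with a hypothetical fixed difficulty of 20 leading zero bits on a SHA-256 digest:

N_BITS = 20                                        # hypothetical difficulty
MASK_256 = ((1 << N_BITS) - 1) << (256 - N_BITS)   # precomputed module-level mask

def has_leading_zeros(digest_bytes):
    return int.from_bytes(digest_bytes, 'big') & MASK_256 == 0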
You can get the first 8 bytes of the digest unpacked like this:
bin(struct.unpack('>Q', h.digest()[:8])[0])
but I'm not sure if it's faster, and it won't be convenient for the rest of the bits. Bit-twiddling is not easy in Python.
If you can deal with indexing bits from the right, the integer type in gmpy2 supports slicing to access individual bits:
>>> x=gmpy2.mpz(12345)
>>> x.digits(2)
'11000000111001'
>>> x[2:5].digits(2)
'110'
If you need to modify individual bits, gmpy2 includes a mutable integer type that allows you to modify the bits in place.
Disclaimer: I maintain gmpy2.
I am new to crypto and I am trying to interpret the code below. Namely, what does <xor> mean?
I have a secret key, secret_key. I also have a unique_id. I create pad using the code below:
pad = hmac.new(secret_key, msg=unique_id, digestmod=hashlib.sha1).digest()
Once the pad is created, I have a price, e.g. 1000. I am trying to follow this instruction, which is pseudocode:
enc_price = pad <xor> price
In Python, what is the code to implement enc_price = pad <xor> price? What is the logic behind doing this?
As a note, a complete description of what I want to do is here:
https://developers.google.com/ad-exchange/rtb/response-guide/decrypt-price
The binary XOR (I assume that's what you need) is ^ in Python:
>>> 6 ^ 12
10
Binary xor works like this (numbers represented in binary):
       1234
  6 =  0110
 12 =  1100
 10 =  1010
For every pair of bits, if their sum is 1 (bits 1 and 3 in my example), the resulting bit is 1. Otherwise, it's 0.
The pad, and the plaintext "price" are each to be interpreted as a stream of bits. For each corresponding bit in the two streams, you take the "exclusive OR" of the pair of bits - if the bits are the same, you emit 0, if the bits are different, you emit 1. This operation is interesting because it's reversible: plaintext XOR pad -> ciphertext, and ciphertext XOR pad -> plaintext.
However, in Python, you won't usually do the XORing yourself because it's tedious and overly complex for a newbie; you want to use a popular encryption library such as PyCrypto to do the work.
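That said, here is a minimal pure-Python illustration of the reversibility, with a made-up 4-byte pad:

pad   = b'\x13\x37\xc0\xde'        # hypothetical pad bytes
price = (1000).to_bytes(4, 'big')  # the plaintext as 4 bytes
enc   = bytes(p ^ k for p, k in zip(price, pad))
dec   = bytes(c ^ k for c, k in zip(enc, pad))
assert dec == price                # XOR-ing with the same pad twice round-trips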
You mean "Binary bitwise operations"?
The & operator yields the bitwise AND of its arguments, which must be plain or long integers. The arguments are converted to a common type.
The ^ operator yields the bitwise XOR (exclusive OR) of its arguments, which must be plain or long integers. The arguments are converted to a common type.
The | operator yields the bitwise (inclusive) OR of its arguments, which must be plain or long integers. The arguments are converted to a common type.
[update]
Since you can't XOR a string and a number, you should either:
convert the number to a string padded to the same size and XOR each byte (this may give you all sorts of strange "escape" problems with some chars, for example accidentally generating invalid Unicode), or
use the raw value (20-byte integer?) of the digest for the XOR and make a hex digest of the resulting number.
Something like this (untested):
import hmac, hashlib, struct

pad = hmac.new(secret_key, msg=unique_id, digestmod=hashlib.sha1).digest()
rawpad = reduce(lambda x, y: (x << 8) + y,
                struct.unpack('B' * len(pad), pad))
enc_price = "%X" % (rawpad ^ price)
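In Python 3 the same big-endian integer can be built directly (and reduce would need to be imported from functools anyway):

rawpad = int.from_bytes(pad, 'big')  # Python 3 equivalent of the reduce above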
[update]
The OP wants to implement the "DoubleClick Ad Exchange Real-Time Bidding Protocol".
That very article says there is some sample Python code available:
Initial Testing
You can test your bidding application internally using requester.tar.gz. This is a test python program that sends requests to a bidding application and checks the responses. The program is available on request from your Ad Exchange representative.
I did it like this:
def strxor(s1, s2):
    # Python 2: XOR two byte strings, up to the length of the shorter one
    size = min(len(s1), len(s2))
    res = ''
    for i in range(size):
        res = res + '%c' % (ord(s1[i]) ^ ord(s2[i]))
    return res
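A Python 3 sketch of the same thing for bytes inputs:

def bytes_xor(b1, b2):
    # iterating over bytes in Python 3 yields ints, so no ord() is needed
    return bytes(a ^ b for a, b in zip(b1, b2))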
I'm trying to calculate the Frame Check Sequence (FCS) of an Ethernet packet byte by byte. The polynomial is 0x104C11DB7.
I followed the XOR-and-shift algorithm described here http://en.wikipedia.org/wiki/Cyclic_redundancy_check or here http://www.woodmann.com/fravia/crctut1.htm
Assume the information that is supposed to have a CRC is only one byte. Let's say it is 0x03.
First step: pad with 32 bits to the right:
0x0300000000
Align the polynomial and the data at the left-hand side, by their first non-zero bit, and XOR them:
0x300000000 xor 0x209823B6E = 0x109823b6e
Take the remainder, align, and XOR again:
0x109823b6e xor 0x104C11DB7 = 0x0d4326d9
Since there are no bits left over, the CRC32 of 0x03 should be 0x0d4326d9.
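Here is my hand calculation expressed in Python (plain long division: no bit reflection, no initial or final XOR), which reproduces the result above:

def crc32_plain(data):
    # naive polynomial division, exactly as in the steps above
    poly = 0x104C11DB7
    rem = int.from_bytes(data, 'big') << 32      # pad with 32 zero bits
    for shift in range(8 * len(data) - 1, -1, -1):
        if rem & (1 << (shift + 32)):            # align on the next set bit
            rem ^= poly << shift
    return rem

assert crc32_plain(b'\x03') == 0x0d4326d9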
Unfortunately all the software implementations tell me I'm wrong, but what did I do wrong or what are they doing differently?
Python tells me:
"0x%08x" % binascii.crc32(chr(0x03))
0x4b0bbe37
The online tool here http://www.lammertbies.nl/comm/info/crc-calculation.html#intr gets the same result.
What is the difference between my hand calculation and the algorithm the mentioned software uses?
UPDATE:
Turns out there was already a similar question on Stack Overflow:
You can find an answer here: Python CRC-32 woes
Although this is not very intuitive. If you want a more formal description of how it is done for Ethernet frames, you can look at the Ethernet standard document 802.3, Part 3, Chapter 3.2.9 "Frame Check Sequence Field".
Let's continue the example from above:
Reverse the bit order of your message. That represents the way the bits would arrive at the receiver, bit by bit.
0x03 therefore becomes 0xC0.
Complement the first 32 bits of your message. Notice we pad the single byte with 32 bits again.
0xC000000000 xor 0xFFFFFFFF00 = 0x3FFFFFFF00
Complete the XOR-and-shift method from above again. After about six steps you get:
0x13822f2d
The above bit sequence is then complemented:
0x13822f2d xor 0xFFFFFFFF = 0xec7dd0d2
Remember that we reversed the bit order in step one to get the representation on the Ethernet wire. Now we have to reverse that step, and we finally fulfill our quest:
0x4b0bbe37
Whoever came up with this way of doing it should be ...
A lot of the time you actually want to know if the message you received is correct. In order to achieve this, you take your received message including the FCS and do the same steps 1 through 5 as above. The result should be what they call the residue, which is a constant for a given polynomial. In this case it is 0xC704DD7B.
As mcdowella mentions, you have to play around with your bits until you get it right, depending on the application you are using.
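You can see the same invariant with zlib, whose conventions (bit reflection plus complementing) turn the residue into the constant 0x2144DF1C, the bit-reversed and complemented form of 0xC704DD7B: appending the FCS little-endian and re-running the CRC always yields it.

import struct, zlib

data = b'\x03'
fcs = struct.pack('<I', zlib.crc32(data) & 0xFFFFFFFF)    # FCS as sent on the wire
assert zlib.crc32(data + fcs) & 0xFFFFFFFF == 0x2144DF1C  # constant for any message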
This snippet writes the correct CRC for Ethernet.
Python 3
import zlib

# write payload
for byte in data:
    f.write(f'{byte:02X}\n')

# write FCS, least-significant byte first
crc = zlib.crc32(data)
for i in range(4):
    byte = (crc >> (8*i)) & 0xFF
    f.write(f'{byte:02X}\n')
Python 2
import zlib

# write payload
for byte in data:
    f.write('%02X\n' % ord(byte))

# write FCS, least-significant byte first
crc = zlib.crc32(data) & 0xFFFFFFFF   # crc32 can return a negative int on Python 2
for i in range(4):
    byte = (crc >> (8*i)) & 0xFF
    f.write('%02X\n' % byte)
It would have saved me some time if I had found this here.
There is generally a bit of trial and error required to get CRC calculations to match, because you never end up reading exactly what has to be done. Sometimes you have to bit-reverse the input bytes or the polynomial, sometimes you have to start off with a non-zero value, and so on.
One way to bypass this is to look at the source of a program getting it right, such as http://sourceforge.net/projects/crcmod/files/ (at least it claims to match, and comes with a unit test for this).
Another is to play around with an implementation. For instance, if I use the calculator at http://www.lammertbies.nl/comm/info/crc-calculation.html#intr, I can see that giving it 00000000 produces a CRC of 0x2144DF1C, but giving it FFFFFFFF produces FFFFFFFF - so it's not exactly the polynomial division you describe, for which 0 would have a checksum of 0.
From a quick glance at the source code and these results, I think you need to start with a CRC of 0xFFFFFFFF - but I could be wrong, and you might end up debugging your code side by side with the implementation, using corresponding printfs to find out where they first differ, and fixing the differences one by one.
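For what it's worth, here is a small pure-Python sketch (assuming Python 3 bytes input) that applies all three common tweaks at once - reflected bit order, 0xFFFFFFFF initial value, final complement - and agrees with zlib:

import zlib

def crc32_ethernet(data):
    poly_reflected = 0xEDB88320  # 0x04C11DB7 with the bit order reversed
    crc = 0xFFFFFFFF             # non-zero starting value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (poly_reflected if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF      # final complement

assert crc32_ethernet(b'\x03') == zlib.crc32(b'\x03') & 0xFFFFFFFF  # 0x4b0bbe37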
There are a number of places on the Internet where you will read that the bit order must be reversed before calculating the FCS, but the 802.3 spec is not one of them. Quoting from the 2008 version of the spec:
3.2.9 Frame Check Sequence (FCS) field
A cyclic redundancy check (CRC) is used by the transmit and receive algorithms to
generate a CRC value for the FCS field. The FCS field contains a 4-octet (32-bit)
CRC value. This value is computed as a function of the contents of the protected
fields of the MAC frame: the Destination Address, Source Address, Length/Type
field, MAC Client Data, and Pad (that is, all fields except FCS). The encoding is
defined by the following generating polynomial.
G(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11
+ x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
Mathematically, the CRC value corresponding to a given MAC frame is defined by
the following procedure:
a) The first 32 bits of the frame are complemented.
b) The n bits of the protected fields are then considered to be the coefficients
of a polynomial M(x) of degree n − 1. (The first bit of the Destination Address
field corresponds to the x^(n−1) term and the last bit of the MAC Client Data
field (or Pad field if present) corresponds to the x^0 term.)
c) M(x) is multiplied by x^32 and divided by G(x), producing a remainder R(x) of
degree ≤ 31.
d) The coefficients of R(x) are considered to be a 32-bit sequence.
e) The bit sequence is complemented and the result is the CRC.
The 32 bits of the CRC value are placed in the FCS field so that the x^31 term is
the left-most bit of the first octet, and the x^0 term is the right-most bit of the
last octet. (The bits of the CRC are thus transmitted in the order x^31, x^30, ...,
x^1, x^0.) See Hammond, et al. [B37].
Certainly the rest of the bits in the frame are transmitted in reverse order, but that does not include the FCS. Again, from the spec:
3.3 Order of bit transmission
Each octet of the MAC frame, with the exception of the FCS, is transmitted least
significant bit first.
http://en.wikipedia.org/wiki/Cyclic_redundancy_check
has all the data for Ethernet and a wealth of important details; for example, there are (at least) two conventions for encoding the polynomial into a 32-bit value: largest term first or smallest term first.
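To make that last point concrete, the two encodings of the Ethernet polynomial are bit reversals of each other:

normal    = 0x04C11DB7                              # largest term first
reflected = int('{:032b}'.format(normal)[::-1], 2)  # smallest term first
assert reflected == 0xEDB88320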