Python - Packing string representing short int into 2 byte string

I do some sort of calculation and in the end I have an int that needs at most 16 bits to represent. I want to pack it into a string in unsigned short int format. For example, if I have 1223, I want to store 0000010011000111.
I tried using:
from struct import pack
n = pack('H', 1223)
When I tried to print it (in binary representation), I got:
11000111 100
But I want the leading zeros to also be encoded into n, how can I do it elegantly?
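The packed value already contains all 16 bits, leading zeros included. On a typical little-endian machine, pack('H', 1223) gives b'\xc7\x04', which is exactly the 11000111 and 100 printed above; the zeros only vanish because bin() drops leading zeros when each byte is displayed. A minimal sketch, assuming the goal is the big-endian bit order shown in the question (the '>H' byte order is my assumption; plain 'H' uses native order):

```python
from struct import pack, unpack

value = 1223

# '>H' = big-endian unsigned short: high byte first, matching 0000010011000111.
packed = pack('>H', value)
assert len(packed) == 2  # always exactly 2 bytes, zero bits included

# Format each byte as exactly 8 binary digits so leading zeros are shown.
bits_per_byte = ' '.join(format(b, '08b') for b in packed)
print(bits_per_byte)  # 00000100 11000111

# Or format the whole 16-bit value at once.
print(format(value, '016b'))  # 0000010011000111

# Round trip: unpack returns a tuple holding the original int.
print(unpack('>H', packed)[0])  # 1223
```

In other words, nothing needs to change in how n is packed; only the display needs fixed-width formatting.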

Related

Python hex digest to integer

Many online gambling games use a function that converts a hash into a number from 0 to (usually) 2^52.
Here's some code I grabbed that works fine, but I don't understand why it works:
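import hashlib
import hmac

E = 2**52  # assumption: E is not defined in the snippet; 2^52 matches the range mentioned above

def get_result(hash):
    hm = hmac.new(str.encode(hash), b'', hashlib.sha256)  # hashing object
    h = hm.hexdigest()  # hex digest: 64 hex characters encoding 32 bytes (256 bits)
    print(h)  # something like 848ab848c6486d4f64...
    c = int(h, 16)
    print(c)  # decimal digits only, 77 digits long...?
    if c % 33 == 0:
        return 1
    h = int(h[:13], 16)  # first 13 hex digits = 52 bits
    return (((100 * E - h) / (E - h)) // 1) / 100.0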
The part of the code that I don't understand is the conversion from h to c. h is a hex digest, so it is base-16. The Python documentation says that the int(a, b) function converts the string a into a base-b integer. Here's my question:
How can an integer number be base-16? Isn't the definition of decimal base-10 (0-9)? Where do the extra 6 come from?
As far as I'm aware, a single hex digit can be stored by 4 bits, or 1/2 a byte. So a hex string of 64 length will occupy 32 bytes. Does this mean that any base of this data will also be 32 bytes? (converting the hex string to base-n, n being anything)
What does the fact that the c variable is always 77 digits long mean?
How can an integer number be base-16? Where do the extra 6 come from?
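This is known as the hexadecimal system. Base 16 needs 16 digit symbols, so after 0-9 it uses the letters a-f (or A-F) for the values 10-15; those are the extra 6.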
Isn't the definition of decimal base-10 (0-9)?
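Integer and decimal are not synonyms. An integer is a value; decimal (base 10), hexadecimal (base 16) and binary (base 2) are just ways of writing it down. You can write an integer in base 2 instead of base 10.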
As far as I'm aware, a single hex digit can be stored by 4 bits, or 1/2 a byte. So a hex string of 64 length will occupy 32 bytes.
There are two different concepts: a hex string and a hex integer.
When you type "8ff" in Python, for example, you're creating a hex string of length 3. A string is an array of characters, and (ignoring Unicode) each character is stored as a 1-byte integer. Therefore, the characters take 3 bytes¹. About your second statement: a hex string of length 64 holds 64 characters, so it actually occupies 64 bytes, not 32.
Now, when you type 0x8ff in Python, you're creating an integer written with 3 hex digits. If you print it, it'll show 2303, because print displays it in base 10 (8ff hex = 2303 dec); the base belongs to how the number is written, not to the integer itself. In a language with fixed-size integers, a typical int occupies 4 bytes², so you're storing 4 bytes.
Does this mean that any base of this data will also be 32 bytes? (converting the hex string to base-n, n being anything)
It depends on the type of data.
A string of length 3 will always occupy 3 bytes (let's ignore Unicode); it doesn't matter if it's "8ff" or "123".
A string of length 10 will always occupy 10 bytes; it doesn't matter if it's "85d8afff00" or "ef08c0e38e".
An integer will always occupy 4 bytes³; it doesn't matter if it's 10 or 1000000.
What does the fact that the c variable is always 77 digits long mean?
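As @flakes noted, that's because 2^256 ~= 1.16e+77 in decimal. The largest 256-bit value, 2^256 - 1, has 78 decimal digits, and a random 256-bit value falls between 10^76 and 10^77 (exactly 77 digits) roughly 78% of the time, so c is usually, but not always, 77 digits long.
¹ In C, a string of length 3 actually stores 4 bytes: three for its characters and one for the null terminator. A Python str object also carries per-object overhead, so sys.getsizeof reports more than the character count.
² Let's ignore that integers in Python are unbounded.
³ As long as it's no greater than 2,147,483,647 (signed) or 4,294,967,295 (unsigned).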
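The difference between the hex text and the integer it encodes can be checked directly. A small sketch using hashlib.sha256 on an arbitrary input b'example' (the input is illustrative and not from the question; hmac in the original code produces a digest of the same size):

```python
import hashlib

# A SHA-256 hex digest is text: 64 characters, each encoding 4 bits.
h = hashlib.sha256(b'example').hexdigest()
print(len(h))  # 64

# int(h, 16) reads that text as a base-16 number; the result is just an int.
c = int(h, 16)
print(c.bit_length() <= 256)  # True: the value fits in 256 bits

# The same value as raw bytes is 32 bytes, whichever base you display it in.
raw = c.to_bytes(32, 'big')
print(len(raw))  # 32
print(raw.hex() == h)  # True: converting back gives the same hex text

# Decimal text of a 256-bit value has at most 78 digits.
print(len(str(2**256 - 1)))  # 78
print(len(str(c)))  # usually 77 for a random digest
```

The 64-character, 32-byte and 77-digit figures are three renderings of one number: hex text, raw bytes and decimal text.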

Convert int to hex of a given number of characters

I want to convert my int into a hex string, specifying the number of characters in the final hex representation.
This is my simple code that takes an input and converts it into hex:
my_number = int(1234)
hex_version = hex(my_number)
This code returns a string equal to 0x4d2.
However I would like my output to contain 16 characters, so basically it should be 0x00000000000004d2.
Is there a way to specify the number of output characters for the hex() function, so that it pads with the needed number of zeros?
From Python's Format Specification Mini-Language:
n = int(1234)
h = format(n, '#018x')
The above will generate the required string. The magic number 18 is obtained as follows: 16 for the width you need + 2 for '0' (zero) and 'x' (for the hex descriptor string prefix).
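A short sketch of equivalent spellings of the same format spec. The f-string form and the manual-prefix variant are alternatives I'm adding, not part of the original answer:

```python
n = 1234

# Width 18 = 16 hex digits + the 2-character '0x' prefix added by '#'.
padded = format(n, '#018x')
print(padded)  # 0x00000000000004d2

# Same spec inside an f-string.
print(f'{n:#018x}')  # 0x00000000000004d2

# Or write the prefix yourself and pad only the 16 digits.
print('0x' + format(n, '016x'))  # 0x00000000000004d2
```

The manual-prefix form avoids the "+ 2" arithmetic, since the width then counts only hex digits.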

STL binary file reader with Python

I'm trying to write my own "personal" Python version of an STL binary file reader. According to Wikipedia, a binary STL file contains:
an 80-character (byte) header, which is generally ignored.
a 4-byte unsigned integer indicating the number of triangular facets in the file.
Each triangle is described by twelve 32-bit floating-point numbers: three for the normal and then three for the X/Y/Z coordinate of each vertex, just as with the ASCII version of STL. After these follows a 2-byte ("short") unsigned integer that is the "attribute byte count"; in the standard format, this should be zero because most software does not understand anything else. Floating-point numbers are represented as IEEE floating-point numbers and are assumed to be little-endian.
Here is my code:
#! /usr/bin/env python3
with open("stlbinaryfile.stl", "rb") as fichier:
    head = fichier.read(80)
    nbtriangles = fichier.read(4)
    print(nbtriangles)
The output is :
b'\x90\x08\x00\x00'
It represents an unsigned integer. I need to convert it without using any package (struct, stl, ...). Are there any (basic) rules for doing it? I don't know what \x means. How does \x90 represent one byte?
Most of the answers on Google mention "C structs", but I don't know anything about C.
Thank you for your time.
Since you're using Python 3, you can use int.from_bytes. I'm guessing the value is stored little-endian, so you'd just do:
nbtriangles = int.from_bytes(fichier.read(4), 'little')
Change the second argument to 'big' if it's supposed to be big-endian.
Mind you, the normal way to parse a fixed width type is the struct module, but apparently you've ruled that out.
For the confusion over the repr, bytes objects will display ASCII printable characters (e.g. a) or standard ASCII escapes (e.g. \t) if the byte value corresponds to one of them. If it doesn't, it uses \x##, where ## is the hexadecimal representation of the byte value, so \x90 represents the byte with value 0x90, or 144. You need to combine the byte values at offsets to reconstruct the int, but int.from_bytes does this for you faster than any hand-rolled solution could.
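To see the \x## notation and the byte order in action, here is a small check on the exact bytes from the question. It assumes, as the STL description above states, that the count is little-endian:

```python
data = b'\x90\x08\x00\x00'  # the 4 bytes printed in the question

# Each \x## is one byte written in hex: \x90 is 0x90, i.e. 144.
print(list(data))  # [144, 8, 0, 0]
print(data[0], hex(data[0]))  # 144 0x90

# STL stores the facet count little-endian (lowest byte first).
count = int.from_bytes(data, 'little')
print(count)  # 2192

# Reading the same bytes big-endian gives a very different number.
print(int.from_bytes(data, 'big'))  # 2416443392
```

A facet count of 2192 is plausible for a mesh, while the big-endian reading is not, which is a quick sanity check on the byte order.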
Update: Since apparently int.from_bytes isn't "basic" enough, here are a couple of more complex solutions that use only top-level built-ins (no alternate constructors). For little-endian, you can do this:
def int_from_bytes(inbytes):
    res = 0
    for i, b in enumerate(inbytes):
        res |= b << (i * 8)  # Adjust each byte individually by 8 times position
    return res
You can use the same solution for big-endian by adding reversed to the loop, making it enumerate(reversed(inbytes)), or you can use this alternative solution that handles the offset adjustment a different way:
def int_from_bytes(inbytes):
    res = 0
    for b in inbytes:
        res <<= 8  # Adjust bytes seen so far to make room for new byte
        res |= b  # Mask in new byte
    return res
Again, this big-endian solution can trivially work for little-endian by looping over reversed(inbytes) instead of inbytes. In both cases inbytes[::-1] is an alternative to reversed(inbytes) (the former makes a new bytes in reversed order and iterates that, the latter iterates the existing bytes object in reverse, but unless it's a huge bytes object, enough to strain RAM if you copy it, the difference is pretty minimal).
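Both hand-rolled versions can be checked against int.from_bytes. In this sketch they are renamed int_from_bytes_le and int_from_bytes_be (hypothetical names, so both can coexist in one file); the bodies are the ones from the answer:

```python
def int_from_bytes_le(inbytes):
    """Little-endian: byte i is shifted left by 8*i bits (worth 256**i)."""
    res = 0
    for i, b in enumerate(inbytes):
        res |= b << (i * 8)
    return res


def int_from_bytes_be(inbytes):
    """Big-endian: shift what we have so far up 8 bits, then OR in the next byte."""
    res = 0
    for b in inbytes:
        res <<= 8
        res |= b
    return res


data = b'\x90\x08\x00\x00'

# Little-endian, directly or by feeding reversed bytes to the big-endian version.
print(int_from_bytes_le(data))  # 2192
print(int_from_bytes_be(reversed(data)))  # 2192

# Both agree with the built-in for either byte order.
print(int_from_bytes_be(data) == int.from_bytes(data, 'big'))  # True
print(int_from_bytes_le(data[::-1]) == int.from_bytes(data, 'big'))  # True
```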
The typical way to interpret an integer is to use struct.unpack, like so:
import struct
with open("stlbinaryfile.stl", "rb") as fichier:
    head = fichier.read(80)
    nbtriangles = fichier.read(4)
    print(nbtriangles)
    # '<I' = little-endian unsigned 32-bit int; unpack returns a tuple, so take [0]
    nbtriangles = struct.unpack("<I", nbtriangles)[0]
    print(nbtriangles)
If you are allergic to import struct, then you can also compute it by hand:
def unsigned_int(s):
    result = 0
    for ch in s[::-1]:
        result *= 256
        result += ch
    return result
...
nbtriangles = unsigned_int(nbtriangles)
As to what you are seeing when you print b'\x90\x08\x00\x00': you are printing a bytes object, which is a sequence of integers in the range 0-255. The first integer has the value 144 (decimal) or 90 (hexadecimal). When printing a bytes object, that value is represented by the string \x90. The 2nd has the value eight, represented by \x08. The 3rd and 4th (final) integers are both zero; they are represented by \x00.
If you would like to see a more familiar representation of the integers, try:
print(list(nbtriangles))
[144, 8, 0, 0]
To compute the 32-bit integer represented by these four 8-bit integers, you can use this formula:
total = byte0 + (byte1*256) + (byte2*256*256) + (byte3*256*256*256)
Or, in hex:
total = byte0 + (byte1*0x100) + (byte2*0x10000) + (byte3*0x1000000)
Which results in:
0x00000890 (2192 in decimal)
Perhaps you can see the similarities to decimal, where the string "1234" represents the number:
4 + 3*10 + 2*100 + 1*1000
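The weighted-sum formula translates directly into Python. This sketch evaluates it for the question's bytes, compares it with struct.unpack, and runs the same positional idea in base 10:

```python
import struct

data = b'\x90\x08\x00\x00'
byte0, byte1, byte2, byte3 = data  # iterating a bytes object yields ints

# Little-endian: byte k is weighted by 256**k (0x100**k).
total = byte0 + byte1 * 256 + byte2 * 256**2 + byte3 * 256**3
print(total, hex(total))  # 2192 0x890

# struct.unpack returns a tuple, so take element [0].
print(struct.unpack('<I', data)[0] == total)  # True

# Same idea in base 10: the digits of "1234", least significant first.
digits = [4, 3, 2, 1]
print(sum(d * 10**k for k, d in enumerate(digits)))  # 1234
```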

Struct.unpack and Length of Byte Object

I have the following code (data is a byte object):
v = sum(struct.unpack('!%sH' % int(len(data)/2), data))
The part that confuses me is the %sH in the format string and the % int(len(data)/2) part.
How exactly is this part of the code working? What is the length of a byte object? And what exactly is this taking the sum of?
Assuming you have a byte string data such as:
>>> data = b'\x01\x02\x03\x04'
>>> data
b'\x01\x02\x03\x04'
The length is the number of bytes (or characters) in the byte string:
>>> len(data)
4
So this is equivalent to your code:
>>> import struct
>>> struct.unpack('!2H', data)
(258, 772)
This tells the struct module to use the following format characters:
! - use network (big endian) mode
2H - unpack 2 x unsigned shorts (16 bits each)
And it returns two integers which correspond to the data we supplied:
>>> '%04x' % 258
'0102'
>>> '%04x' % 772
'0304'
All your code does is automatically calculate the number of unsigned shorts on the fly
>>> struct.unpack('!%sH' % int(len(data)/2), data)
(258, 772)
But the int conversion is unnecessary, and it shouldn't really be using the %s placeholder, as that is for string substitution; %d is the integer placeholder:
>>> struct.unpack('!%dH' % (len(data)/2), data)
(258, 772)
So unpack returns two integers from unpacking 2 unsigned shorts from the data byte string. sum then adds them up:
>>> sum(struct.unpack('!%dH' % (len(data)/2), data))
1030
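A sketch of the same computation written with integer division and an f-string. The helper name sum_u16_be is hypothetical, and the even-length check is an addition: struct raises struct.error if the buffer size doesn't match the format.

```python
import struct


def sum_u16_be(data: bytes) -> int:
    """Sum the big-endian unsigned 16-bit words in data (hypothetical helper)."""
    if len(data) % 2:
        raise ValueError('data length must be even')  # '!nH' needs exactly 2*n bytes
    count = len(data) // 2  # integer division: no int(...) wrapper or float needed
    fmt = f'!{count}H'  # e.g. '!2H' for 4 bytes
    return sum(struct.unpack(fmt, data))


data = b'\x01\x02\x03\x04'
print(sum_u16_be(data))  # 1030 (0x0102 + 0x0304 = 258 + 772)

# struct.iter_unpack walks the buffer one '!H' at a time; same result.
print(sum(w for (w,) in struct.iter_unpack('!H', data)))  # 1030
```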
How your code works:
You are interpreting the byte structure of data.
struct.unpack uses a format string to determine the byte layout of the data you want to interpret.
Given the format, struct.unpack returns a tuple of the interpreted values.
You then sum the tuple.
Byte Formatting
To interpret the data you are passing, you build a string that tells Python what form the data comes in. Specifically, the %sH part is shorthand for "this many unsigned shorts", and the % operator then fills in the exact number of unsigned shorts you want.
In this case the number is:
int(len(data) / 2)
because an unsigned short is 2 bytes wide (with the ! prefix, struct uses standard sizes, so H is always exactly 2 bytes).
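struct.calcsize can confirm the size arithmetic instead of hard-coding the 2. A minimal sketch:

```python
import struct

# With '!' (or '<', '>', '='), struct uses standard sizes: H is always 2 bytes.
print(struct.calcsize('!H'))  # 2
print(struct.calcsize('!5H'))  # 10, so n shorts need 2*n bytes

data = b'\x01\x02\x03\x04'
count = len(data) // struct.calcsize('!H')  # number of shorts that fit
print(count)  # 2
```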

Convert decimal int to little endian string ('\x##\x##...')

I want to convert an integer value to a string of hex values, in little endian. For example, 5707435436569584000 would become '\x4a\xe2\x34\x4f\x4a\xe2\x34\x4f'.
All my googlefu is finding for me is hex(..) which gives me '0x4f34e24a4f34e180' which is not what I want.
I could probably manually split up that string and build the one I want, but I'm hoping someone can point me to a better option.
You need to use the struct module:
>>> import struct
>>> struct.pack('<Q', 5707435436569584000)
'\x80\xe14OJ\xe24O'
>>> struct.pack('<Q', 5707435436569584202)
'J\xe24OJ\xe24O'
Here < indicates little-endian, and Q that we want to pack an unsigned long long (8 bytes).
Note that Python will use ASCII characters for any byte that falls within the printable ASCII range to represent the resulting bytestring, hence the 4OJ, 4O and J parts of the above result (this is a Python 2 session; in Python 3, struct.pack returns bytes, displayed as b'...', and .encode('hex') is replaced by .hex()):
>>> struct.pack('<Q', 5707435436569584202).encode('hex')
'4ae2344f4ae2344f'
>>> '\x4a\xe2\x34\x4f\x4a\xe2\x34\x4f'
'J\xe24OJ\xe24O'
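For Python 3, a sketch of the same session; struct.pack and int.to_bytes are standard-library calls, and .hex() replaces the Python 2 .encode('hex'):

```python
import struct

n = 5707435436569584202

packed = struct.pack('<Q', n)  # 8 bytes, least significant byte first
print(packed)  # b'J\xe24OJ\xe24O' (printable bytes are shown as ASCII)
print(packed.hex())  # 4ae2344f4ae2344f

# The escape string from the question is the same bytes.
print(packed == b'\x4a\xe2\x34\x4f\x4a\xe2\x34\x4f')  # True

# int.to_bytes gives the same result without struct.
print(n.to_bytes(8, 'little') == packed)  # True
```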
I know it is an old thread, but it is still useful. Here are my two cents, using Python 3:
hex_string = hex(5707435436569584202)  # '0x4f34e24a4f34e24a'
ba = bytearray.fromhex(hex_string[2:])
ba.reverse()  # reverses in place and returns None, so use ba afterwards
So, the key is convert it to a bytearray and reverse it.
In one line:
bytearray.fromhex(hex(5707435436569584202)[2:])[::-1] # bytearray(b'J\xe24OJ\xe24O')
PS: You can treat "bytearray" data like "bytes" and even mix them with b'raw bytes'
Update:
As Will points out in the comments, you can also handle negative integers:
To make this work with negative integers you need to mask your input with your preferred int type output length. For example, -16 as a little endian uint32_t would be bytearray.fromhex(hex(-16 & (2**32-1))[2:])[::-1], which evaluates to bytearray(b'\xf0\xff\xff\xff')
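One pitfall with the hex() route: bytearray.fromhex needs whole byte pairs, so an odd digit count such as hex(0x123)[2:] == '123' raises ValueError. A sketch of a hypothetical helper to_le_bytes that zero-pads to the requested size and reuses the masking trick, plus the built-in int.to_bytes equivalent:

```python
def to_le_bytes(value: int, size: int) -> bytes:
    """Hypothetical helper: value as `size` little-endian bytes (two's complement if negative)."""
    masked = value & (2 ** (size * 8) - 1)  # keep the low size*8 bits (silently truncates larger values)
    hex_digits = format(masked, f'0{size * 2}x')  # zero-pad so fromhex always gets full byte pairs
    return bytes(bytearray.fromhex(hex_digits)[::-1])


print(to_le_bytes(-16, 4))  # b'\xf0\xff\xff\xff'
print(to_le_bytes(0x123, 2))  # b'#\x01' (0x23 is '#')

# Built-in equivalent in Python 3: int.to_bytes with signed=True.
print((-16).to_bytes(4, 'little', signed=True))  # b'\xf0\xff\xff\xff'
```

int.to_bytes also raises OverflowError if the value doesn't fit, instead of truncating like the mask does.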
