Set fixed length integer in python - python

After a bit of googling, nothing came up. I am manipulating sequence numbers for network packets and need the numbers to be of a fixed length. For example:
>>> 0000 + 1
1
Instead, I'd like the integer that is returned to be 0001. Are there any built-in commands for setting an integer of fixed length?
Edit: I do not need to print these integers, I need to actually manipulate them. I will need them to iterate but they must be fixed length so that they can be easily found in a networking protocol head file.

What you're asking doesn't make any sense. The integer 0011 and the integer 11 are exactly the same number.*
If you want to format them as strings to print them out or to search a text file, you can do that with, e.g., format(n, '04'). It doesn't matter whether you're formatting 11 or 0011, they're both the same number, and that number will format to the string '0011'.
If you want to convert them to big-endian 32-bit C-style unsigned integers, again, they're both the same number, and struct.pack('>I', n) will pack that number to the byte string b'\x00\x00\x00\x0b'.
If you want to add them modulo 10000, again, they're both the same number, and (n + 9990) % 10000 will give you 1.
No matter what operation you dream up, there will be no difference.
* Actually, in Python 2.x, number literals starting with 0 are treated as octal, not decimal, so 0011 is actually 9, not 11. And in 3.x numbers starting with 0 are a SyntaxError, to avoid the confusion caused by accidentally writing octal numbers. But forget all that. We're not talking about the Python number literals, we're talking about something even simpler here: the numbers themselves.

Numbers don't have a "length", they're just numbers. The representation of a number as text, in a string, has a length. To convert numbers to strings in Python, use the format() function:
x = 1
s = "{:04d}".format(x)
print(s)

Related

Python define bitwise

I have a function that accepts 'data' as a parameter. Being new to python I wasn't really sure that that was even a type.
I noticed when printing something of that type it would be
b'h'
if I encoded the letter h. Which dosen't make a ton of sense to me. Is there a way to define bits in python, such as 1 or 0. I guess b'h' must be in hex? Is there a way for me to simply define an eight bit string
bits1 = 10100000
You're conflating a number of unrelated things.
First of all, (in Python 3), quoted literals prefixed with b are of type bytes -- that means a string of raw byte values. Example:
x = b'abc'
print(type(x)) # will output `<class 'bytes'>`
This is in contrast to the str type, which is a (Unicode) string.
Integer literals can be expressed in binary using an 0b prefix, e.g.
y = 0b10100000
print(y) # Will output 160
For what I know, 'data' is not a type. Your function (probably) accepts anything you pass to it, regardless of its type.
Now, b'h' means "the number (int) whose binary sequence maps to the char ´h´", this is not hexadecimal, but a number with possibly 8 bits (1 byte, which is the standard size for int and char).
The ASCII code for ´h´ is 104 (decimal), written in binary that would be b'\b01101000', or in hexa b'\x68'.
So, here is the answer I think you are looking for: if you want to code an 8-bit int from its binary representation just type b'\b01101000' (for 104). I would recommend to use hexa instead, to make it more compact and readable. In hexa, every four bits make a symbol from 0 to f, and the symbols can be concatenated every four bits to form a larger number. So the bit sequence 01101000 is written b'\b0110\b1000' or b'\x6\x8', which can be written as b'\x68'. The preceding b, before the quote marks tells python to interpret the string as a binary sequence expressed in the base defined by \b or \x (or \d for decimal), instead of using escape characters.

Converting bitstring to 32-bit signed integer yields wrong result

I am trying to solve a challenge on this site. I have everything correct except I can't properly convert a bitstring to its 32-bit signed integer representation.
For example I have this bitstring:
block = '10101010001000101110101000101110'
My own way of converting this bitstring to 32-bit signed integer: I partially remember from school that first bit is the sign bit. If it is 1 we have negative number and vice versa.
when I do this, it gives me the number in base 10. It just converts it to base 10:
int(block, 2) #yields 2854414894
I have tried excluding the first bit and convert remaining 31 length bitstring, after that checked the first bit to decide whether this is negative number or not:
int(block[1:32], 2) #yields 706931246
But the correct answer is -1440552402. What operation should I do to this bitstring to get this integer? Is it relevant if the byte order of the system is little endian or big endian? My system is little endian.
In python there's no size for integers, so you'll never get a negative value with a high order 1 bit.
To "emulate" 32-bit behaviour just do this, since your 2854414894 value is > 2**31-1 aka 0x7FFFFFFF:
print(int(block[1:32], 2)-2**31)
you'll get
-1440552402
You're right that the upper bit determines sign, but it's not a simple flag. Instead, the whole character of negative numbers is inverted. This is a positive number 1 (in 8 bits):
00000001
This is a negative 1:
11111111
The upshot is that addition and subtraction "wrap around". So 4 - 1 would be:
0100 - 0001 = 0011
And so 0 - 1 is the same as 1_0000_0000 - 1. The "borrow" just goes off the top of the integer.
The general way to "negate" a number is "invert the bits, add 1". This works both ways, so you can go from positive to negative and back.
In your case, use the leading '1' to detect whether negation is needed, then convert to int, then maybe perform the negation steps. Note, however, that because python's int is not a fixed-width value, there's a separate internal flag (a Python int is not a "32-bit" number, it's an arbitrary-precision integer, with a dynamically allocated representation stored in some fashion other than simple 2's complement).
block = '10101010001000101110101000101110'
asnum = int(block, 2)
if block[0] == '1':
asnum ^= 0xFFFFFFFF
asnum += 1
asnum = -asnum
print(asnum)
You should check for when the input value is out of the positive range for 32 bit signed integers:
res = int(block, 2)
if res >= 2**31:
res -= 2**32
So first you interpret the number as an unsigned number, but when you notice the sign bit was set ( >= 2^31 ), you subtract 2^32 so to get the negative number.

STL binary file reader with Python

I'm trying to write my "personal" python version of STL binary file reader, according to WIKIPEDIA : A binary STL file contains :
an 80-character (byte) headern which is generally ignored.
a 4-byte unsigned integer indicating the number of triangular facets in the file.
Each triangle is described by twelve 32-bit floating-point numbers: three for the normal and then three for the X/Y/Z coordinate of each vertex – just as with the ASCII version of STL. After these follows a 2-byte ("short") unsigned integer that is the "attribute byte count" – in the standard format, this should be zero because most software does not understand anything else. --Floating-point numbers are represented as IEEE floating-point numbers and are assumed to be little-endian--
Here is my code :
#! /usr/bin/env python3
with open("stlbinaryfile.stl","rb") as fichier :
head=fichier.read(80)
nbtriangles=fichier.read(4)
print(nbtriangles)
The output is :
b'\x90\x08\x00\x00'
It represents an unsigned integer, I need to convert it without using any package (struct,stl...). Are there any (basic) rules to do it ?, I don't know what does \x mean ? How does \x90 represent one byte ?
most of the answers in google mention "C structs", but I don't know nothing about C.
Thank you for your time.
Since you're using Python 3, you can use int.from_bytes. I'm guessing the value is stored little-endian, so you'd just do:
nbtriangles = int.from_bytes(fichier.read(4), 'little')
Change the second argument to 'big' if it's supposed to be big-endian.
Mind you, the normal way to parse a fixed width type is the struct module, but apparently you've ruled that out.
For the confusion over the repr, bytes objects will display ASCII printable characters (e.g. a) or standard ASCII escapes (e.g. \t) if the byte value corresponds to one of them. If it doesn't, it uses \x##, where ## is the hexadecimal representation of the byte value, so \x90 represents the byte with value 0x90, or 144. You need to combine the byte values at offsets to reconstruct the int, but int.from_bytes does this for you faster than any hand-rolled solution could.
Update: Since apparent int.from_bytes isn't "basic" enough, a couple more complex, but only using top-level built-ins (not alternate constructors) solutions. For little-endian, you can do this:
def int_from_bytes(inbytes):
res = 0
for i, b in enumerate(inbytes):
res |= b << (i * 8) # Adjust each byte individually by 8 times position
return res
You can use the same solution for big-endian by adding reversed to the loop, making it enumerate(reversed(inbytes)), or you can use this alternative solution that handles the offset adjustment a different way:
def int_from_bytes(inbytes):
res = 0
for b in inbytes:
res <<= 8 # Adjust bytes seen so far to make room for new byte
res |= b # Mask in new byte
return res
Again, this big-endian solution can trivially work for little-endian by looping over reversed(inbytes) instead of inbytes. In both cases inbytes[::-1] is an alternative to reversed(inbytes) (the former makes a new bytes in reversed order and iterates that, the latter iterates the existing bytes object in reverse, but unless it's a huge bytes object, enough to strain RAM if you copy it, the difference is pretty minimal).
The typical way to interpret an integer is to use struct.unpack, like so:
import struct
with open("stlbinaryfile.stl","rb") as fichier :
head=fichier.read(80)
nbtriangles=fichier.read(4)
print(nbtriangles)
nbtriangles=struct.unpack("<I", nbtriangles)
print(nbtriangles)
If you are allergic to import struct, then you can also compute it by hand:
def unsigned_int(s):
result = 0
for ch in s[::-1]:
result *= 256
result += ch
return result
...
nbtriangles = unsigned_int(nbtriangles)
As to what you are seeing when you print b'\x90\x08\x00\x00'. You are printing a bytes object, which is an array of integers in the range [0-255]. The first integer has the value 144 (decimal) or 90 (hexadecimal). When printing a bytes object, that value is represented by the string \x90. The 2nd has the value eight, represented by \x08. The 3rd and final integers are both zero. They are presented by \x00.
If you would like to see a more familiar representation of the integers, try:
print(list(nbtriangles))
[144, 8, 0, 0]
To compute the 32-bit integers represented by these four 8-bit integers, you can use this formula:
total = byte0 + (byte1*256) + (byte2*256*256) + (byte3*256*256*256)
Or, in hex:
total = byte0 + (byte1*0x100) + (byte2*0x10000) + (byte3*0x1000000)
Which results in:
0x00000890
Perhaps you can see the similarities to decimal, where the string "1234" represents the number:
4 + 3*10 + 2*100 + 1*1000

How to compute a double precision float score from the first 8 bytes of a string in Python?

Trying to get a double-precision floating point score from a UTF-8 encoded string object in Python. The idea is to grab the first 8 bytes of the string and create a float, so that the strings, ordered by their score, would be ordered lexicographically according to their first 8 bytes (or possibly their first 63 bits, after forcing them all to be positive to avoid sign errors).
For example:
get_score(u'aaaaaaa') < get_score(u'aaaaaaab') < get_score(u'zzzzzzzz')
I have tried to compute the score in an integer using bit-shift-left and XOR, but I am not sure of how to translate that into a float value. I am also not sure if there is a better way to do this.
How should the score for a string be computed so the condition I specified before is met?
Edit: The string object is UTF-8 encoded (as per #Bakuriu's commment).
float won't give you 64 bits of precision. Use integers instead.
def get_score(s):
return struct.unpack('>Q', (u'\0\0\0\0\0\0\0\0' + s[:8])[-8:])[0]
In Python 3:
def get_score(s):
return struct.unpack('>Q', ('\0\0\0\0\0\0\0\0' + s[:8])[-8:].encode('ascii', 'error'))[0]
EDIT:
For floats, with 6 characters:
def get_score(s):
return struct.unpack('>d', (u'\0\1' + (u'\0\0\0\0\0\0\0\0' + s[:6])[-6:]).encode('ascii', 'error'))[0]
You will need to setup the entire alphabet and do the conversion by hand, since conversions to base > 36 are not built in, in order to do that you only need to define the complete alphabet to use. If it was an ascii string for instance you would create a conversion to a long in base 256 from the input string using all the ascii table as an alphabet.
You have an example of the full functions to do it here: string to base 62 number
Also you don't need to worry about negative-positive numbers when doing this, since the encoding of the string with the first character in the alphabet will yield the minimum possible number in the representation, which is the negative value with the highest absolute value, in your case -2**63 which is the correct value and allows you to use < > against it.
Hope it helps!

Python, string , integer

I have a string variable:
str1 = '0000120000210000'
I want to convert the string into an integer without losing the first 4 zero characters. In other words, I want the integer variable to also store the first 4 zero digits as part of the integer.
I tried the int() function, but I'm not able to retain the first four digits.
You can use two integers, one to store the width of the number, and the other to store the number itself:
kw = len(s)
k = int(s)
To put the number back together in a string, use format:
print '{:0{width}}'.format(k, width=kw) # prints 0000120000210000
But, in general, you should not store identifiers (such as credit card numbers, student IDs, etc.) as integers, even if they appear to be. Numbers in these contexts should only be used if you need to do arithmetic, and you don't usually do arithmetic with identifiers.
What you want simply cannot be done.. Integer value does not store the leading zero's, because there can be any number of them. So, it can't be said how many to store.
But if you want to print it like that, that can be done by formatting output.
EDIT: -
Added #TimPietzcker's comment from OP to make complete answer: -
You should never store a number as an integer unless you're planning on doing arithmetic with it. In all other cases, they should be stored as strings

Categories

Resources