I have a binary hex string, for example: b'\x914\x05\x11\x11\x95h\xf5' (the f is a filler in this case), and the expected result would be b'\x914\x05\x11\x11\x95h\xf5' → 91340511119568F5 → 19435011115986515.
Doing this with the string and a loop is probably not the best solution (there are millions of records). What would be a better way?
Edit: Forgot to mention the f is just a filler added from the server (network switch) to provide an even number for the switching (as mentioned by mkrieger)
What you have is a bytes object. When iterating over it you get integer values:
>>> s = b'\x914\x05\x11\x11\x95h\xf5'
>>> list(s) # equivalent to [b for b in s]
[145, 52, 5, 17, 17, 149, 104, 245]
To swap the two "nibbles" (the two 4-bit halves) of a byte, you can use bitwise operations, given that the byte is represented by its integer value. To select the higher-valued nibble, use a bit mask 0xF0 with bitwise and (&), and to select the lower-valued nibble, use 0x0F. Then shift them by 4 bits in the respective directions and combine them again with bitwise or (|):
def swap_nibbles(b: int) -> int:
    return ((b & 0xF0) >> 4) | ((b & 0x0F) << 4)
Now you can do this for all bytes in s to get a list of byte values with swapped nibbles:
>>> swapped_nibbles = [swap_nibbles(b) for b in s]
# [25, 67, 80, 17, 17, 89, 134, 95]
Now you just need to reassemble these numbers to your desired result. I'm not entirely sure what exactly you want, so here are some options:
To get another bytes object: Just use the bytes constructor.
>>> bytes(swapped_nibbles)
b'\x19CP\x11\x11Y\x86_'
To get an integer: Use the int.from_bytes constructor with big-endian byte order.
>>> int.from_bytes(swapped_nibbles, byteorder='big')
1820386708623558239 # == 0x194350111159865F
To get a hex string: Use hex on the above integer, or build it from the individual bytes:
>>> ''.join(f'{b:02X}' for b in swapped_nibbles)
'194350111159865F'
It's not clear to me what exactly the rules are for excluding the last F from the result, so you would have to add this logic yourself somehow.
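Since the question mentions millions of records, a per-byte Python loop may turn out to be slow. One possible alternative, sketched below, is a 256-entry lookup table applied with bytes.translate. The names SWAP_TABLE and decode_swapped are made up for this illustration, and dropping a single trailing F is only an assumption about how the filler works.

```python
# Precompute the nibble-swapped value of every possible byte once.
SWAP_TABLE = bytes(((b & 0x0F) << 4) | (b >> 4) for b in range(256))

def decode_swapped(raw: bytes) -> str:
    """Swap the nibbles of every byte and return the digits as a string."""
    digits = raw.translate(SWAP_TABLE).hex().upper()
    # Assumption: a trailing 'F' is the filler nibble and is not data.
    if digits.endswith('F'):
        digits = digits[:-1]
    return digits

print(decode_swapped(b'\x914\x05\x11\x11\x95h\xf5'))  # 194350111159865
```

The table is built once, so each record costs one translate call and one hex call instead of a Python-level loop over its bytes.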
I don't have much experience with Python, so I need your help!
In the following example I can convert a char to an unsigned integer, but I need a signed integer. How can I convert a char to a signed integer in Python?
d="bd"
d=int(d,16)
print (d)
The Result is: 189
but I expect: -67
First a nitpick: It's not a char, it's a string.
The main problem is that int() cannot know how long the input is supposed to be; or in other words: it cannot know which bit is the MSB (most significant bit) designating the sign. In python, int just means "an integer, i.e. any whole number". There is no defined bit size of numbers, unlike in C.
For int(), the inputs 000000bd and bd therefore are the same; and the sign is determined by the presence or absence of a - prefix.
For an arbitrary bit count of your input numbers (not only the standard 8, 16, 32, ...), you will need to do the two's-complement conversion step manually and tell it the intended input size. (In C, you would do that implicitly by assigning the conversion result to an integer variable of the target bit size.)
def hex_to_signed_number(s, width_in_bits):
    n = int(s, 16) & (pow(2, width_in_bits) - 1)
    if n >= pow(2, width_in_bits - 1):
        n -= pow(2, width_in_bits)
    return n
Some testcases for that function:
In [6]: hex_to_signed_number("bd", 8)
Out[6]: -67
In [7]: hex_to_signed_number("bd", 16)
Out[7]: 189
In [8]: hex_to_signed_number("80bd", 16)
Out[8]: -32579
In [9]: hex_to_signed_number("7fff", 16)
Out[9]: 32767
In [10]: hex_to_signed_number("8000", 16)
Out[10]: -32768
print(int.from_bytes(bytes.fromhex("bd"), byteorder="big", signed=True))
You can convert the string to bytes and then convert the bytes to an int with signed=True, which gives you the negative integer value.
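If the width should simply follow from the number of hex digits, the two ideas above can be combined. This is a sketch under the assumption that the string has an even number of digits (bytes.fromhex rejects odd lengths); the helper name hex_to_signed is hypothetical.

```python
def hex_to_signed(s: str) -> int:
    """Interpret an even-length hex string as a big-endian two's-complement number."""
    return int.from_bytes(bytes.fromhex(s), byteorder="big", signed=True)

print(hex_to_signed("bd"))    # -67
print(hex_to_signed("00bd"))  # 189
print(hex_to_signed("80bd"))  # -32579
```

Padding the string with leading zeros is then the way to say that a value such as bd is meant to be positive.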
I am receiving some data from a third party and stumbled over a curious feature of the byte array output:
Some byte arrays I receive have spaces in the string that is printed to the console, and I do not know how to interpret these.
a = b'\x14 \x00'
b = b'\x14\x00'
print(len(a), ':', a[0], a[1], a[2])
print(len(b), ':', b[0], b[1])
results in the output
3 : 20 32 0
2 : 20 0
Where does the 32 (which is '\x20' in hex) come from?
ASCII space is 32, but why is this interpreted as such?
32 is the decimal value for the string " " (a space). In Python, a bytes object is an iterable of bytes 0-255, which can be represented by \x14 for 0x14, or ASCII characters like a, b, or c. Or a combination of the two, as you've seen in your example.
list(b'\x01\x02') # [1, 2]
list(b'ab') # [97, 98] (decimal values for 'a' and 'b')
list(b'\x12ab\x44') # [18, 97, 98, 68]
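To make the stored values visible regardless of how the repr chooses to print them, the bytes can be dumped as hex. A small sketch (the separator argument to hex() needs Python 3.8 or later):

```python
a = b'\x14 \x00'

# The space is just the printable form of byte 0x20.
print(a == b'\x14\x20\x00')  # True
print(list(a))               # [20, 32, 0]
print(a.hex(' '))            # 14 20 00
```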
I want to parse some data with Python and scapy.
Therefore I have to analyse single bits. At the moment I have, for example, UDP packets with a payload like:
bytes = b'\x18\x00\x03\x61\xFF\xFF\x00\x05\x42\xFF\xFF\xFF\xFF'
Is there any elegant way to convert the bytes so that I can access single bits like:
bytes_as_bits = convert(bytes)
bit_at_index_42 = bytes_as_bits[42]
That will work:
def access_bit(data, num):
    base = int(num // 8)
    shift = int(num % 8)
    return (data[base] >> shift) & 0x1
If you'd like to create a binary array you can use it like this:
[access_bit(data,i) for i in range(len(data)*8)]
If you would like a bit string, or want to avoid writing a function, you can use format() and ord(). Here is a simpler example to illustrate:
bytes = '\xf0\x0f'
bytes_as_bits = ''.join(format(ord(byte), '08b') for byte in bytes)
This should output: '1111000000001111'
If you want LSB first you can just flip the output of format(), so:
bytes = '\xf0\x0f'
bytes_as_bits = ''.join(format(ord(byte), '08b')[::-1] for byte in bytes)
This should output: '0000111111110000'
Now you want to use b'\xf0\x0f' instead of '\xf0\x0f'. For python2 the code works the same, but for python3 you have to get rid of ord() so:
bytes = b'\xf0\x0f'
bytes_as_bits = ''.join(format(byte, '08b') for byte in bytes)
Flipping the string works the same way.
I found the format() functionality here.
And the flipping ([::-1]) functionality here.
Hm, there is no builtin bits type in python, but you can do something like
>>> bin(int.from_bytes(b"hello world", byteorder="big"))[2:]
'110100001100101011011000110110001101111001000000111011101101111011100100110110001100100'
>>> n=17
>>> [(n & (1<<x))>>x for x in [7,6,5,4,3,2,1,0]]
[0, 0, 0, 1, 0, 0, 0, 1]
To extend @Liran's answer, I have added byteorder as an input argument, which defaults to 'big'.
Note that I am not referring to bit packing within the bytes.
def access_bit(b: bytearray, n: int, byteorder: str = "big") -> int:
    """
    Returns the boolean value of the nth bit (n) from the byte array (b).
    The byteorder argument accepts the literal strings ['little', 'big'] and
    refers to the byte order endianness
    """
    base = int(n // 8)
    shift = int(n % 8)
    if byteorder == "big":
        return (b[-base - 1] >> shift) & 0x1
    elif byteorder == "little":
        return (b[base] >> shift) & 0x1
    else:
        raise KeyError("byteorder only recognises 'big' or 'little'")
access_bit(b, 0) returns the least significant bit of the least significant byte assuming big-endian
access_bit(b, 7) returns the most significant bit of the least significant byte assuming big-endian
access_bit(b, 0, 'little') returns the least significant bit of the least significant byte specifying little-endian
access_bit(b, 7, 'little') returns the most significant bit of the least significant byte specifying little-endian
Specifying an index n outside the range of the bytearray will result in an error (i.e. access_bit(b'\x05\x01', 16) results in an error as the max index of the bytearray is 15)
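For what it is worth, the same indexing can probably be expressed by converting the whole array to an integer first. The sketch below (access_bit_via_int is a made-up name) should agree with the function above for in-range indices; note that it returns 0 rather than raising an error for indices past the end.

```python
def access_bit_via_int(b: bytes, n: int, byteorder: str = "big") -> int:
    """Return bit n, counting from the least significant bit of the integer
    that the byte array represents in the given byte order."""
    return (int.from_bytes(b, byteorder) >> n) & 0x1

data = b'\x05\x01'
# Big-endian: the value is 0x0501, so bits 0, 8 and 10 are set.
print([access_bit_via_int(data, n) for n in range(16)])
# Little-endian: the value is 0x0105, so bits 0, 2 and 8 are set.
print([access_bit_via_int(data, n, "little") for n in range(16)])
```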
I would just use a simple lambda expression to convert the bytes to a string:
>>> bytes = b'\x18\x00\x03\x61\xFF\xFF\x00\x05\x42\xFF\xFF\xFF\xFF'
>>> convert = lambda x: f"{int.from_bytes(x, 'big'):b}"
>>> bytes_as_bits = convert(bytes)
>>> bytes_as_bits[42]
'1'
'big' is the byteorder to be used. The official python documentation describes it as follows:
The byteorder argument determines the byte order used to represent the integer. If byteorder is "big", the most significant byte is at the beginning of the byte array. If byteorder is "little", the most significant byte is at the end of the byte array. To request the native byte order of the host system, use sys.byteorder as the byte order value.
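One caveat with the integer route: leading zero bits are dropped, so string indices shift whenever the first byte is below 0x80. A sketch that pads to the full width (convert_padded is a hypothetical name):

```python
def convert_padded(data: bytes) -> str:
    """Render data as a bit string with exactly 8 characters per byte."""
    width = len(data) * 8
    return f"{int.from_bytes(data, 'big'):0{width}b}"

payload = b'\x18\x00\x03\x61\xFF\xFF\x00\x05\x42\xFF\xFF\xFF\xFF'
bits = convert_padded(payload)
print(len(bits))   # 104
print(bits[:8])    # 00011000
print(bits[42])    # 1
```

With the padding in place, index 42 refers to bit 42 of the payload counted from the first byte, most significant bit first.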
Let's say I have this number i = -6884376.
How do I refer to it as to an unsigned variable?
Something like (unsigned long)i in C.
Assuming:
You have 2's-complement representations in mind; and,
By (unsigned long) you mean unsigned 32-bit integer,
then you just need to add 2**32 (or 1 << 32) to the negative value.
For example, apply this to -1:
>>> -1
-1
>>> _ + 2**32
4294967295L
>>> bin(_)
'0b11111111111111111111111111111111'
Assumption #1 means you want -1 to be viewed as a solid string of 1 bits, and assumption #2 means you want 32 of them.
Nobody but you can say what your hidden assumptions are, though. If, for example, you have 1's-complement representations in mind, then you need to apply the ~ prefix operator instead. Python integers work hard to give the illusion of using an infinitely wide 2's complement representation (like regular 2's complement, but with an infinite number of "sign bits").
And to duplicate what the platform C compiler does, you can use the ctypes module:
>>> import ctypes
>>> ctypes.c_ulong(-1) # stuff Python's -1 into a C unsigned long
c_ulong(4294967295L)
>>> _.value
4294967295L
C's unsigned long happens to be 4 bytes on the box that ran this sample.
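Because the size of c_ulong depends on the platform, a fixed-width ctypes type may be a safer choice when a specific width is intended. A short sketch using c_uint32 and c_uint64:

```python
import ctypes

i = -6884376
# c_uint32 and c_uint64 have the same width on every platform.
print(ctypes.c_uint32(i).value)  # 4288082920
print(ctypes.c_uint64(i).value)  # 18446744073702667240
```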
To get the value equivalent to your C cast, just bitwise and with the appropriate mask. e.g. if unsigned long is 32 bit:
>>> i = -6884376
>>> i & 0xffffffff
4288082920
or if it is 64 bit:
>>> i & 0xffffffffffffffff
18446744073702667240
Do be aware though that although that gives you the value you would have in C, it is still a signed value, so any subsequent calculations may give a negative result and you'll have to continue to apply the mask to simulate a 32 or 64 bit calculation.
This works because although Python looks like it stores all numbers as sign and magnitude, the bitwise operations are defined as working on two's complement values. C stores integers in twos complement but with a fixed number of bits. Python bitwise operators act on twos complement values but as though they had an infinite number of bits: for positive numbers they extend leftwards to infinity with zeros, but negative numbers extend left with ones. The & operator will change that leftward string of ones into zeros and leave you with just the bits that would have fit into the C value.
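To illustrate the point about subsequent calculations, here is a minimal sketch of 32-bit unsigned arithmetic in which the mask is reapplied after every operation. The helper names are invented for this example.

```python
MASK32 = 0xFFFFFFFF

def add_u32(a: int, b: int) -> int:
    """Add two values and wrap the result like a C uint32_t."""
    return (a + b) & MASK32

def sub_u32(a: int, b: int) -> int:
    """Subtract and wrap; without the mask, 0 - 1 would simply be -1."""
    return (a - b) & MASK32

print(add_u32(0xFFFFFFFF, 1))  # 0
print(sub_u32(0, 1))           # 4294967295
```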
Displaying the values in hex may make this clearer (and I rewrote the string of f's as an expression to show we are interested in either 32 or 64 bits):
>>> hex(i)
'-0x690c18'
>>> hex(i & ((1 << 32) - 1))
'0xff96f3e8'
>>> hex(i & ((1 << 64) - 1))
'0xffffffffff96f3e8L'
For a 32 bit value in C, positive numbers go up to 2147483647 (0x7fffffff), and negative numbers have the top bit set going from -1 (0xffffffff) down to -2147483648 (0x80000000). For values that fit entirely in the mask, we can reverse the process in Python by using a smaller mask to remove the sign bit and then subtracting the sign bit:
>>> u = i & ((1 << 32) - 1)
>>> (u & ((1 << 31) - 1)) - (u & (1 << 31))
-6884376
Or for the 64 bit version:
>>> u = 18446744073702667240
>>> (u & ((1 << 63) - 1)) - (u & (1 << 63))
-6884376
This inverse process will leave the value unchanged if the sign bit is 0, but obviously it isn't a true inverse because if you started with a value that wouldn't fit within the mask size then those bits are gone.
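The two directions can be wrapped in a pair of helpers parameterised by bit width. This is only a sketch of the masking approach described above; to_unsigned and to_signed are hypothetical names.

```python
def to_unsigned(i: int, bits: int = 32) -> int:
    """Keep only the low `bits` bits, as a C cast to an unsigned type would."""
    return i & ((1 << bits) - 1)

def to_signed(u: int, bits: int = 32) -> int:
    """Reinterpret the low `bits` bits of u as a two's-complement value."""
    u &= (1 << bits) - 1
    return (u & ((1 << (bits - 1)) - 1)) - (u & (1 << (bits - 1)))

print(to_unsigned(-6884376))             # 4288082920
print(to_signed(4288082920))             # -6884376
print(to_signed(to_unsigned(-1, 8), 8))  # -1
```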
Python doesn't have builtin unsigned types. You can use mathematical operations to compute a new int representing the value you would get in C, but there is no "unsigned value" of a Python int. The Python int is an abstraction of an integer value, not a direct access to a fixed-byte-size integer.
Since version 3.2 :
def unsignedToSigned(n, byte_count):
    return int.from_bytes(n.to_bytes(byte_count, 'little', signed=False), 'little', signed=True)

def signedToUnsigned(n, byte_count):
    return int.from_bytes(n.to_bytes(byte_count, 'little', signed=True), 'little', signed=False)
output :
In [3]: unsignedToSigned(5, 1)
Out[3]: 5
In [4]: signedToUnsigned(5, 1)
Out[4]: 5
In [5]: unsignedToSigned(0xFF, 1)
Out[5]: -1
In [6]: signedToUnsigned(0xFF, 1)
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 signedToUnsigned(0xFF, 1)
Input In [1], in signedToUnsigned(n, byte_count)
4 def signedToUnsigned(n, byte_count):
----> 5 return int.from_bytes(n.to_bytes(byte_count, 'little', signed=True), 'little', signed=False)
OverflowError: int too big to convert
In [7]: signedToUnsigned(-1, 1)
Out[7]: 255
Explanations : to/from_bytes convert to/from bytes, in 2's complement considering the number as one of size byte_count * 8 bits. In C/C++, chances are you should pass 4 or 8 as byte_count for respectively a 32 or 64 bit number (the int type).
I first pack the input number in the format it is supposed to be from (using the signed argument to control signed/unsigned), then unpack to the format we would like it to have been from. And you get the result.
Note the exception when trying to use fewer bytes than required to represent the number (In [6]). 0xFF is 255, which can't be represented in C's signed char type (-128 ≤ n ≤ 127). This is preferable to any other behavior.
You could use Python's built-in struct module:
Encode:
import struct
i = -6884376
print('{0:b}'.format(i))
packed = struct.pack('>l', i) # Packing a long number.
unpacked = struct.unpack('>L', packed)[0] # Unpacking a packed long number to unsigned long
print(unpacked)
print('{0:b}'.format(unpacked))
Out:
-11010010000110000011000
4288082920
11111111100101101111001111101000
Decode:
dec_pack = struct.pack('>L', unpacked) # Packing an unsigned long number.
dec_unpack = struct.unpack('>l', dec_pack)[0] # Unpacking a packed unsigned long number to long (revert action).
print(dec_unpack)
Out:
-6884376
[NOTE]:
> selects big-endian byte order and standard sizes.
l is long.
L is unsigned long.
With the > prefix, struct uses standard sizes, so int and long are both 32 bits and you could use i and I instead of l and L respectively.
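The sizes that struct actually uses can be checked with struct.calcsize. With the > prefix the standard sizes apply, whereas the size of a native long (no prefix) depends on the platform, so it is printed below without an expected value.

```python
import struct

print(struct.calcsize('>i'), struct.calcsize('>l'))  # 4 4
print(struct.calcsize('>q'))                         # 8
# Native size of long: platform dependent (commonly 4 or 8).
print(struct.calcsize('l'))
```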
[UPDATE]
According to @hl037_'s comment, this approach works for int32 but not for int64 or int128, because I used the long format in struct.pack(). For int64, the code only needs the long long format character (q) in struct, as follows:
Encode:
i = 9223372036854775807 # the largest int64 number
packed = struct.pack('>q', i) # Packing an int64 number
unpacked = struct.unpack('>Q', packed)[0] # Unpacking signed to unsigned
print(unpacked)
print('{0:b}'.format(unpacked))
Out:
9223372036854775807
111111111111111111111111111111111111111111111111111111111111111
Decoding works the same way. Keep in mind that q is a signed long long (8 bytes) and Q is an unsigned long long.
For int128 the situation is slightly different, as struct.pack() has no 16-byte format character. You therefore have to split the number into two 64-bit halves.
Here's how it could look:
i = 10000000000000000000000000000000000000 # an int128 number
print(len('{0:b}'.format(i)))
max_int64 = 0xFFFFFFFFFFFFFFFF
packed = struct.pack('>QQ', (i >> 64) & max_int64, i & max_int64)  # both halves are masked to unsigned 64-bit values, so pack them as Q
a, b = struct.unpack('>QQ', packed)
unpacked = (a << 64) | b
print(unpacked)
print('{0:b}'.format(unpacked))
Out:
123
10000000000000000000000000000000000000
111100001011110111000010000110101011101101001000110110110010000000011110100001101101010000000000000000000000000000000000000
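As an alternative to splitting the number by hand, int.to_bytes and int.from_bytes accept any byte count, so a 128-bit reinterpretation can be sketched without struct. The function name below is made up.

```python
def int128_to_uint128(i: int) -> int:
    """Reinterpret a signed 128-bit value as unsigned.

    Raises OverflowError if i does not fit in a signed 128-bit integer.
    """
    return int.from_bytes(i.to_bytes(16, 'big', signed=True), 'big', signed=False)

print(int128_to_uint128(10**37) == 10**37)  # True
print(int128_to_uint128(-1) == 2**128 - 1)  # True
```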
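If you only need the magnitude of a negative number, you can use abs. Note that this is not the same as a C cast to unsigned: abs(-12) is 12, whereas (unsigned long)-12 is 4294967284 for a 32-bit unsigned long.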
a=-12
b=abs(a)
print(b)
Output:
12
I'm searching for a simple algorithm that 'combines' two 2-byte integers into one unique 4-byte integer.
The two 2-byte integers are both whole positive numbers in the range 0..65535.
I want to create one 4-byte integer that is the exact combination of both, in a way that will make it easy to:
(1) given the two 2-byte integers --> calculate the value of that 4-byte integer.
(2) given the 4-byte integer --> parse the contents of the two 2-byte integers.
Any idea how to achieve this in python?
How about:
def combine(n, m):
    return (n << 16) | m

def extract(c):
    return (c >> 16), c & 0xffff
This solution places one of the 2-byte integers in the upper half of a 32-bit word, and the other into the lower half. In order to extract the values, simply take the upper half of the word (c >> 16), and the lower half (c & 0xffff).
>>> i1, i2 = 345, 12
>>> i1 * 0x10000 + i2
22609932
>>> divmod(22609932, 0x10000)
(345, 12)
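Both answers assume the inputs really are in the range 0..65535; a value outside that range would silently corrupt the other half. A sketch with an explicit check (the name combine_checked and the error message are illustrative):

```python
def combine_checked(n: int, m: int) -> int:
    """Pack two 16-bit unsigned values into one 32-bit unsigned value."""
    for value in (n, m):
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"{value} does not fit in 16 bits")
    return (n << 16) | m

def extract(c: int) -> tuple:
    """Return the (upper, lower) 16-bit halves of c."""
    return (c >> 16) & 0xFFFF, c & 0xFFFF

print(combine_checked(345, 12))  # 22609932
print(extract(22609932))         # (345, 12)
```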