bytes vs str.encode in Python 3

I am using Python 3.5. Let
M = '\x09\x00\x00\x00\x01\x89\x02\xdb\xd6\x01\x49\x63\x74'
Why is the left side of the following comparison not equal to the right side?
>>> M.encode() == b'\x09\x00\x00\x00\x01\x89\x02\xdb\xd6\x01\x49\x63\x74'
False
Both sides are of type bytes. How can I get from M to a variable that holds the right side of the above comparison?

I'm not an encoding specialist, but whichever encoding you choose, if it isn't the right one it will translate some characters into different byte sequences, and the result won't be the same as the bytes object.
It works with pure ASCII, but not with your values: the default encoding, UTF-8, turns every character above \x7f (here \x89, \xdb and \xd6) into two bytes.
One way would be to rebuild the bytes object from the character codes that ord provides:
M = '\x09\x00\x00\x00\x01\x89\x02\xdb\xd6\x01\x49\x63\x74'
N = b'\x09\x00\x00\x00\x01\x89\x02\xdb\xd6\x01\x49\x63\x74'
M2 = bytes(map(ord,M))
print(N == M2)
yields True :)
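As a complement to the ord-based rebuild above, here is a minimal sketch of why the default encoding fails and of one codec that does round-trip these values. It assumes every character in M has a code point below 256, which holds for the string in the question; Latin-1 maps code points 0-255 one-to-one onto byte values.

```python
M = '\x09\x00\x00\x00\x01\x89\x02\xdb\xd6\x01\x49\x63\x74'
N = b'\x09\x00\x00\x00\x01\x89\x02\xdb\xd6\x01\x49\x63\x74'

# Default UTF-8: each character above U+007F becomes two bytes.
utf8 = M.encode()
print(len(M), len(utf8))             # 13 16
print('\x89'.encode())               # b'\xc2\x89'

# Latin-1 keeps one byte per character for code points 0-255.
latin1 = M.encode('latin-1')
print(latin1 == N)                   # True
print(latin1 == bytes(map(ord, M)))  # True
```

If M could contain characters above \xff, encode('latin-1') would raise UnicodeEncodeError, and bytes(map(ord, M)) would raise ValueError.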

Related

Decode method throws an error in Python 3

Similar to this other question on decoding a hex string, I have some code in a Python 2.7 script which has worked for years. I'm now trying to convert that script to Python 3.
OK, I apologize for not posting a complete question initially. I hope this clarifies the situation.
The issue is that I'm trying to convert an older Python 2.7 script to Python 3.8. For the most part the conversion has gone ok, but I am having issues converting the following code:
# get Register Strings
RegString = ""
for i in range(length):
    if regs[start+i]!=0:
        RegString = RegString + str(format(regs[start+i],'x').decode('hex'))
Here is some supporting data:
regs[start+0] = 20341
regs[start+1] = 29762
I think that my Python 2.7 code is converting these to HEX as "4f75" and "7442", respectively. And then to the characters "Ou" and "tB", respectively.
In Python 3 I get this error:
'str' object has no attribute 'decode'
My goal is to modify my Python 3 code so that the script will generate the same results.
str(format(regs[start+i],'x').decode('hex')) is a very verbose and roundabout way of turning the non-zero integer values in regs[start:start + length] into individual characters of a bytestring (str in Python 2 should really be seen as a sequence of bytes). It first converts an integer value into its hexadecimal representation (a string), decodes that hexadecimal string to a series of string characters, then calls str() on the result (redundantly, since the value is already a string). Assuming that the values in regs are integers in the range 0-255 (or even 0-127), in Python 2 this should really have been using the chr() function.
If you want to preserve the loop use chr() (to get a str string value) or if you need a binary value, use bytes([...]). So:
RegString = ""
for codepoint in regs[start:start + length]:
    RegString += chr(codepoint)
or
RegString = b""
for codepoint in regs[start:start + length]:
    RegString += bytes([codepoint])
Since this is actually converting a sequence of integers, you can just pass the whole lot to bytes() and filter out the zeros as you go:
# only take non-zero values
RegString = bytes(b for b in regs[start:start + length] if b)
or remove the nulls afterwards:
RegString = bytes(regs[start:start + length]).replace(b"\x00", b"")
If that's still supposed to be a string and not a bytes value, you can then decode it with whatever encoding is appropriate: ASCII if the integers are in the range 0-127, or a more specific codec otherwise. In Python 2 this code produced a bytestring, so look for other hints in the code as to what encoding it might have been using.
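The sample values quoted in the question (20341 and 29762) are larger than 255, so each register appears to hold two characters, and bytes([value]) would raise ValueError for them. A minimal sketch of a more literal port, assuming that layout: bytes.fromhex() is the Python 3 counterpart of the Python 2 str.decode('hex') call. The regs list, start and length below are hypothetical stand-ins built from the question's two values.

```python
# Hypothetical sample data based on the two values quoted in the question;
# the zero stands in for an unused register.
regs = [20341, 29762, 0]
start, length = 0, 3

# Format each non-zero register as hex and turn the hex digits into bytes.
raw = b"".join(
    bytes.fromhex(format(value, "x"))
    for value in regs[start:start + length]
    if value != 0
)
RegString = raw.decode("ascii")
print(RegString)  # OutB
```

Like the Python 2 original, this fails on a value whose hex form has an odd number of digits. If the registers are known to be 16 bits wide, value.to_bytes(2, "big") always yields exactly two bytes instead.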

Hex string convert to ASCII after XOR

I am new to Python & I am trying to learn how to XOR hex encoded ciphertexts against one another & then derive the ASCII value of this.
I have tried some of the functions as outlined in previous posts on this subject - such as bytearray.fromhex, binascii.unhexlify, decode("hex") and they have all generated different errors (obviously due to my lack of understanding). Some of these errors were due to my python version (python 3).
Let me give a simple example: say I have a hex encoded string ciphertext_1 ("4A17") and a hex encoded string ciphertext_2. I want to XOR these two strings and derive their ASCII value. The closest that I have come to a solution is with the following code:
result=hex(int(ciphertext_1, 16) ^ int(ciphertext_2, 16))
print(result)
This prints me a result of: 0xd07
(This is a hex string is my understanding??)
I then try to convert this to its ASCII value. At the moment, I am trying:
binascii.unhexlify(result)
However this gives me an error: "binascii.Error: Odd-length string"
I have tried the different functions outlined above, and also tried to solve this specific error (the strip function gives another error), but I have been unsuccessful. I realise my knowledge and understanding of the subject are lacking, so I am hoping someone might be able to advise me.
Full example:
#!/usr/bin/env python
import binascii
ciphertext_1="4A17"
ciphertext_2="4710"
result=hex(int(ciphertext_1, 16) ^ int(ciphertext_2, 16))
print(result)
print(binascii.unhexlify(result))
from binascii import unhexlify
ciphertext_1 = "4A17"
ciphertext_2 = "4710"
xored = (int(ciphertext_1, 16) ^ int(ciphertext_2, 16))
# We format this integer: hex, no leading 0x, uppercase
string = format(xored, 'X')
# We pad it with an initial 0 if the length of the string is odd
if len(string) % 2:
    string = '0' + string
# unhexlify returns a bytes object, we decode it to obtain a string
print(unhexlify(string).decode())
#
# Not much appears, just a CR followed by a BELL
Or, if you prefer the repr of the string:
print(repr(unhexlify(string).decode()))
# '\r\x07'
When doing byte-wise operations like XOR, it's often easier to work with bytes objects (since the individual bytes are treated as integers). Using bytes.fromhex, we get:
ciphertext_1 = bytes.fromhex("4A17")
ciphertext_2 = bytes.fromhex("4710")
XORing the bytes can then be accomplished with a comprehension over zip. Then you can convert that to a string:
result = [c1 ^ c2 for (c1, c2) in zip(ciphertext_1, ciphertext_2)]
result = ''.join(chr(c) for c in result)
I would probably take a slightly different angle and create a bytes object instead of a list, which can be decoded into your string:
result = bytes(b1 ^ b2 for (b1, b2) in zip(ciphertext_1, ciphertext_2)).decode()
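Putting the bytes-based approach together, here is a small self-contained sketch. The helper name xor_hex is made up for illustration and is not from any library.

```python
def xor_hex(hex_a, hex_b):
    """XOR two hex strings byte by byte and return the result as bytes."""
    a = bytes.fromhex(hex_a)
    b = bytes.fromhex(hex_b)
    # zip() stops at the shorter input, so unequal lengths are truncated.
    return bytes(x ^ y for x, y in zip(a, b))

xored = xor_hex("4A17", "4710")
print(xored)                        # b'\r\x07'
print(xored.hex())                  # 0d07
print(repr(xored.decode("ascii")))  # '\r\x07'
```

Working on bytes sidesteps the odd-length problem entirely, because the leading zero of each byte is never dropped the way it is when the whole value goes through int() and hex().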

How to convert hexadecimal string to character with that code point?

I have the string x = '0x32' and would like to turn it into y = '\x32'.
Note that len(x) == 4 and len(y) == 1.
I've tried to use z = x.replace("0", "\\"), but that causes z = '\\x32' and len(z) == 4. How can I achieve this?
You do not have to make it that hard: you can use int(..,16) to parse a hex string of the form 0x.... Next you simply use chr(..) to convert that number into the character with that Unicode code point (which, for values below 128, is also its ASCII code):
y = chr(int(x,16))
This results in:
>>> chr(int(x,16))
'2'
But \x32 is equal to '2' (you can look it up in the ASCII table):
>>> chr(int(x,16)) == '\x32'
True
and:
>>> len(chr(int(x,16)))
1
Try this (Python 2 only; in Python 3, str has no decode method):
z = x[2:].decode('hex')
The ability to include code points like '\x32' inside a quoted string is a convenience for the programmer that only works in literal values inside the source code. Once you're manipulating strings in memory, that option is no longer available to you, but there are other ways of getting a character into a string based on its code point value.
Also note that '\x32' results in exactly the same string as '2'; it's just typed out differently.
Given a string containing a hexadecimal literal, you can convert it to its numeric value with int(str,16). Once you have a numeric value, you can convert it to the character with that code point via chr(). So putting it all together:
x = '0x32'
print(chr(int(x,16)))
#=> 2
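The same conversion can be wrapped in a tiny helper and checked in both directions. This is only a sketch; the name hex_to_char is invented for the example.

```python
def hex_to_char(text):
    """Return the character whose code point is given as a hex string."""
    # With base 16, int() accepts the digits with or without a '0x' prefix.
    return chr(int(text, 16))

y = hex_to_char('0x32')
print(y, len(y))            # 2 1
print(y == '\x32')          # True
print(hex_to_char('3b1'))   # α
print(hex(ord(y)))          # 0x32
```

hex(ord(...)) is the inverse step, in case the hexadecimal form is needed again later.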

how to check for a presence of a character in a byte array

This sounds too basic for me to ask but here goes. I have a bytearray. I want to check for the presence of let's say 'a' or 'A' in the array and print the count of them. I do the following but I don't see any even though I know there is 'a' in there -
a_bytes = bytearray.fromhex(hex_string)
count = 0
for x in a_bytes:
    if ( (x=='a') or (x == 'A') ):
        count = count+1
return count
Why doesn't the above code work? I printed out the byte values as integers and I see 65 repeating multiple times.
Then again I try to convert the constant 'a' to integer using int('a') but I get an error --
ValueError: invalid literal for int() with base 10: 'a'
The values in the bytearray are stored as integers, not as hex representations. You need to search for 65 or 97, not "A" or "a".
If you want to use this to look up strings, just use a list. If you're not interested in the integer values of the bytes, a bytearray is not the right choice. Also, if you use a list, you can just use the .count method of lists to directly count occurrences of a particular value.
An int never compares equal to a str, so x == 'a' is always False here. You are trying to compare a byte with a string or character. To get the Unicode code point of a character you can use the ord() function. Note that a Unicode code point is an integer between 0 and about 1.1 million (unlike a byte, whose range is 0-255), but for ASCII characters in a byte array that uses an ASCII-compatible encoding (say ASCII or UTF-8), using ord() is OK. Explaining the relation between byte arrays and strings (encoding) is out of the scope of this answer.
A solution to correct your code is to replace 'a' and 'A' with ord('a') and ord('A') respectively as others recommended.
However instead of your solution I would do this:
count = a_bytes.count(b'a') + a_bytes.count(b'A')
This makes the code much simpler and more readable in your scenario.
Converting between characters and their integer-value and back is done with the functions ord and chr. So ord('A') == 65.
And as the byte-array stores the values as ints, the result is
a_bytes = bytearray.fromhex(hex_string)
count = sum(c == ord('A') or c == ord('a') for c in a_bytes)
This works because True is 1 and False is 0
If you want to work with strings and characters, you need to decode the byte data first:
a_bytes = bytearray.fromhex(hex_string).decode('ascii')
count = 0
for x in a_bytes:
    if x == 'a' or x == 'A':
        count += 1
return count
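The approaches above can be compared side by side in a runnable sketch. The function name count_a and the sample hex string are made up for the example.

```python
def count_a(hex_string):
    """Count the bytes equal to ASCII 'a' or 'A' in a hex-encoded string."""
    data = bytearray.fromhex(hex_string)
    # Iterating a bytearray yields ints, so compare against ints.
    targets = (ord('a'), ord('A'))
    return sum(value in targets for value in data)

sample = "41614262"   # hypothetical input: the bytes b'AaBb'
print(count_a(sample))                       # 2

data = bytearray.fromhex(sample)
print(data.count(b'a') + data.count(b'A'))   # 2
print(data[0] == 'A', data[0] == ord('A'))   # False True
```

The last line shows the original problem directly: the first byte is 65, which equals ord('A') but never equals the string 'A'.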

Python 2,3 Convert Integer to "bytes" Cleanly

The shortest ways I have found are:
n = 5
# Python 2.
s = str(n)
i = int(s)
# Python 3.
s = bytes(str(n), "ascii")
i = int(s)
I am particularly concerned with two factors: readability and portability. The second method, for Python 3, is ugly. However, I think it may be backwards compatible.
Is there a shorter, cleaner way that I have missed? I currently make a lambda expression to fix it with a new function, but maybe that's unnecessary.
Answer 1:
To convert a string to a sequence of bytes in either Python 2 or Python 3, you use the string's encode method. If you don't supply an encoding parameter, the default codec is used ('ascii' in Python 2, 'utf-8' in Python 3), which will always be good enough for numeric digits.
s = str(n).encode()
Python 2: http://ideone.com/Y05zVY
Python 3: http://ideone.com/XqFyOj
In Python 2 str(n) already produces bytes; the encode will do a double conversion as this string is implicitly converted to Unicode and back again to bytes. It's unnecessary work, but it's harmless and is completely compatible with Python 3.
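A short round-trip check of this approach, shown here for Python 3 where int() accepts a bytes object of ASCII digits directly:

```python
n = 5
s = str(n).encode()              # default codec is UTF-8 on Python 3
print(s)                         # b'5'
print(int(s))                    # 5
print(str(-42).encode('ascii'))  # b'-42'
```

Naming the codec explicitly, as in the last line, makes the intent clear and gives the same bytes for digits and the minus sign.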
Answer 2:
Above is the answer to the question that was actually asked, which was to produce a string of ASCII bytes in human-readable form. But since people keep coming here trying to get the answer to a different question, I'll answer that question too. If you want to convert 10 to b'10' use the answer above, but if you want to convert 10 to b'\x0a\x00\x00\x00' then keep reading.
The struct module was specifically provided for converting between various types and their binary representation as a sequence of bytes. The conversion from a type to bytes is done with struct.pack. There's a format parameter fmt that determines which conversion it should perform. For a 4-byte integer, that would be i for signed numbers or I for unsigned numbers. For more possibilities see the format character table, and see the byte order, size, and alignment table for options when the output is more than a single byte.
import struct
s = struct.pack('<i', 5) # b'\x05\x00\x00\x00'
You can use the struct's pack:
In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'
The ">" is the byte-order (big-endian) and the "I" is the format character. So you can be specific if you want to do something else:
In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'
In [13]: struct.pack("B", 1)
Out[13]: '\x01'
This works the same on both Python 2 and Python 3, except that Python 3 displays the results as bytes literals, e.g. b'\x00\x00\x00\x01'.
Note: the inverse operation (bytes to int) can be done with unpack.
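A minimal round trip through struct.pack and struct.unpack, run on Python 3, using the same format strings as the examples above:

```python
import struct

packed = struct.pack('<i', 5)       # little-endian signed 4-byte int
print(packed)                       # b'\x05\x00\x00\x00'

# unpack() always returns a tuple, even for a single value.
(value,) = struct.unpack('<i', packed)
print(value)                        # 5

print(struct.pack('>I', 1))         # b'\x00\x00\x00\x01'
print(struct.pack('<H', 1))         # b'\x01\x00'
```

The format string must match on both sides; unpacking with a different size or byte order gives a different number or raises struct.error.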
I have found the only reliable, portable method to be
bytes(bytearray([n]))
Just bytes([n]) does not work in python 2. Taking the scenic route through bytearray seems like the only reasonable solution.
Converting an int to a byte in Python 3:
>>> n = 5
>>> bytes([n])
b'\x05'
;) guess that'll be better than messing around with strings
source: http://docs.python.org/3/library/stdtypes.html#binaryseq
In Python 3.x, you can convert an integer value (including large ones, which the other answers don't allow for) into a series of bytes like this:
import math
x = 0x1234
number_of_bytes = int(math.ceil(x.bit_length() / 8))
x_bytes = x.to_bytes(number_of_bytes, byteorder='big')
x_int = int.from_bytes(x_bytes, byteorder='big')
x == x_int
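The same idea can be packaged as a pair of helpers. This is a sketch for non-negative integers only; the function names are invented here, and the length is computed with integer arithmetic instead of math.ceil.

```python
def int_to_bytes(value, byteorder='big'):
    """Encode a non-negative int in the fewest bytes that hold it (at least one)."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, byteorder)

def int_from_bytes(data, byteorder='big'):
    """Decode bytes produced by int_to_bytes back into an int."""
    return int.from_bytes(data, byteorder)

encoded = int_to_bytes(0x1234)
print(encoded)                        # b'\x124'
print(int_from_bytes(encoded))        # 4660
print(int_to_bytes(0))                # b'\x00'
print(len(int_to_bytes(2 ** 64)))     # 9
```

The output b'\x124' is the two bytes 0x12 and 0x34; the second is printed as the ASCII character '4'. Negative values would need signed=True on both to_bytes and from_bytes.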
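From int to bytes:
bytes_string = int_v.to_bytes(length, endian)
where length is the number of bytes (1, 2, 3, 4, ...) and endian is either 'big' or 'little'.
From bytes to a list of ints (one per byte):
data_list = list(bytes_string)
To get a single int back instead, use int.from_bytes(bytes_string, endian).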
When converting old Python 2 code you often find "%s" % number; for Python 3.5 and later this can be converted to b"%d" % number (b"%s" % number does not work).
The format b"%d" % number is also another clean way to convert an int to a binary string.
b"%d" % number
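A quick check of the bytes formatting described above. It needs Python 3.5 or later, where %-formatting on bytes objects is available.

```python
number = 5

as_bytes = b"%d" % number      # ASCII digits as a bytes object
print(as_bytes)                # b'5'
print(int(as_bytes))           # 5

try:
    b"%s" % number             # %s expects a bytes-like object, not an int
except TypeError as exc:
    print("TypeError:", exc)
```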
