I am new to Python & I am trying to learn how to XOR hex encoded ciphertexts against one another & then derive the ASCII value of this.
I have tried some of the functions as outlined in previous posts on this subject - such as bytearray.fromhex, binascii.unhexlify, decode("hex") and they have all generated different errors (obviously due to my lack of understanding). Some of these errors were due to my python version (python 3).
Let me give a simple example: say I have a hex encoded string ciphertext_1 ("4A17") and a hex encoded string ciphertext_2 ("4710"). I want to XOR these two strings and derive the ASCII value of the result. The closest that I have come to a solution is with the following code:
result=hex(int(ciphertext_1, 16) ^ int(ciphertext_2, 16))
print(result)
This prints me a result of: 0xd07
(My understanding is that this is a hex string?)
I then try to convert this to its ASCII value. At the moment, I am trying:
binascii.unhexlify(result)
However this gives me an error: "binascii.Error: Odd-length string"
I have tried the different functions as outlined above, as well as trying to solve this specific error (the strip function gives another error), but I have been unsuccessful. I realise my knowledge and understanding of the subject are lacking, so I am hoping someone might be able to advise me?
Full example:
#!/usr/bin/env python
import binascii
ciphertext_1="4A17"
ciphertext_2="4710"
result=hex(int(ciphertext_1, 16) ^ int(ciphertext_2, 16))
print(result)
print(binascii.unhexlify(result))
from binascii import unhexlify
ciphertext_1 = "4A17"
ciphertext_2 = "4710"
xored = (int(ciphertext_1, 16) ^ int(ciphertext_2, 16))
# We format this integer: hex, no leading 0x, uppercase
string = format(xored, 'X')
# We pad it with an initial 0 if the length of the string is odd
if len(string) % 2:
    string = '0' + string
# unhexlify returns a bytes object, we decode it to obtain a string
print(unhexlify(string).decode())
#
# Not much appears, just a CR followed by a BELL
Or, if you prefer the repr of the string:
print(repr(unhexlify(string).decode()))
# '\r\x07'
When doing byte-wise operations like XOR, it's often easier to work with bytes objects (since the individual bytes are treated as integers). From this question, then, we get:
ciphertext_1 = bytes.fromhex("4A17")
ciphertext_2 = bytes.fromhex("4710")
XORing the bytes can then be accomplished as in this question, with a comprehension. Then you can convert that to a string:
result = [c1 ^ c2 for (c1, c2) in zip(ciphertext_1, ciphertext_2)]
result = ''.join(chr(c) for c in result)
I would probably take a slightly different angle and create a bytes object instead of a list, which can be decoded into your string:
result = bytes(b1 ^ b2 for (b1, b2) in zip(ciphertext_1, ciphertext_2)).decode()
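The byte-wise approach above can be wrapped in a small helper. This is only a minimal sketch: the function name `xor_hex` is hypothetical, and it assumes both inputs are valid hex strings that decode to the same number of bytes (a bare `zip` would otherwise silently truncate to the shorter input).

```python
def xor_hex(hex_a, hex_b):
    """XOR two hex strings of equal decoded length and return the raw bytes."""
    a = bytes.fromhex(hex_a)
    b = bytes.fromhex(hex_b)
    if len(a) != len(b):
        raise ValueError("inputs must decode to the same number of bytes")
    return bytes(x ^ y for x, y in zip(a, b))

result = xor_hex("4A17", "4710")
print(result.hex())                  # hex digits of the XOR, leading zeros preserved
print(repr(result.decode("ascii")))  # the same bytes viewed as ASCII text
```

Working on bytes rather than on one big integer keeps leading zero bytes, so no manual padding of an odd-length hex string is needed.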
I am using python 3.8.5, and trying to convert from an integer in the range (0,65535) to a pair of bytes. I am currently using the following code:
import struct
input_integer = 2111
bytes_val = input_integer.to_bytes(2, 'little')
output_data = struct.pack('bb', bytes_val[1], bytes_val[0])
print(output_data)
This produces the following output:
b'\x08?'
This \x08 is 8 in hex, the most significant byte, and ? is 63 in ascii. So together, the numbers add up to 2111 (8*256+63=2111). What I can't figure out is why the least significant byte is coming out in ascii instead of hex? It's very strange to me that it's in a different format than the MSB right next to it. I want it in hex for the output data, and am trying to figure out how to achieve that.
I have also tried modifying the format string in the last line to the following:
output_data = struct.pack('cc',bytes_val[1],bytes_val[0])
which produces the following error:
struct.error: char format requires a bytes object of length 1
I checked the types at each step, and it looks like bytes_val is a bytes object of length 2, but when I take one of the individual elements, say bytes_val[1], it is an integer rather than a bytes object.
Any ideas?
All your observations can be verified from the docs for the bytes class:
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers
In the representation of a Python string, letters and punctuation are shown as themselves, while control codes (0-31 and 127) are shown as escapes, mostly hexadecimal ones. You can see this by evaluating ''.join(map(chr, range(128))) at the interactive prompt. Bytes literals follow the same convention, except that indexing a bytes object yields an integer, e.g. output_data[0].
If you want to represent everything as hex
>>> output_data.hex()
'083f'
>>> bytes.fromhex('083f') # to recover
b'\x08?'
As of version 3.8 bytes.hex() now supports optional sep and bytes_per_sep parameters to insert separators between bytes in the hex output.
>>> b'abcdef'.hex(' ', 2)
'6162 6364 6566'
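To make the point concrete, here is a small sketch (the variable names are illustrative). It assumes the goal is simply the two bytes of the integer, most significant first, in which case `int.to_bytes` with `'big'` produces them directly and `struct.pack` is not needed. It also shows that `?` and `\x3f` are the same byte.

```python
value = 2111
output_data = value.to_bytes(2, 'big')   # most significant byte first

# The repr shows printable bytes as ASCII, but the stored values are plain integers.
print(output_data)                  # repr mixes \x escapes and ASCII characters
print(list(output_data))            # the underlying integer values
print(output_data == b'\x08\x3f')   # '?' is just how byte 0x3f is displayed
print(output_data.hex())            # every byte rendered as two hex digits
```

Only the display differs; whatever receives `output_data` gets the same two bytes either way.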
Similar to this other question on decoding a hex string, I have some code in a Python 2.7 script which has worked for years. I'm now trying to convert that script to Python 3.
OK, I apologize for not posting a complete question initially. I hope this clarifies the situation.
The issue is that I'm trying to convert an older Python 2.7 script to Python 3.8. For the most part the conversion has gone ok, but I am having issues converting the following code:
# get Register Strings
RegString = ""
for i in range(length):
    if regs[start+i]!=0:
        RegString = RegString + str(format(regs[start+i],'x').decode('hex'))
Here are some supporting data:
regs[start+0] = 20341
regs[start+1] = 29762
I think that my Python 2.7 code is converting these to HEX as "4f75" and "7442", respectively. And then to the characters "Ou" and "tB", respectively.
In Python 3 I get this error:
'str' object has no attribute 'decode'
My goal is to modify my Python 3 code so that the script will generate the same results.
str(format(regs[start+i],'x').decode('hex')) is a very verbose and round-about way of turning the non-zero integer values in regs[start:start + length] into individual characters of a bytestring (str in Python 2 should really be seen as a sequence of bytes). It first converts an integer value into a hexadecimal representation (a string), decodes that hexadecimal string to a (series) of string characters, then calls str() on the result (redundantly, the value is already a string). Assuming that the values in regs are integers in the range 0-255 (or even 0-127), in Python 2 this should really have been using the chr() function.
If you want to preserve the loop use chr() (to get a str string value) or if you need a binary value, use bytes([...]). So:
RegString = ""
for codepoint in regs[start:start + length]:
    RegString += chr(codepoint)
or
RegString = b""
for codepoint in regs[start:start + length]:
    RegString += bytes([codepoint])
Since this is actually converting a sequence of integers, you can just pass the whole lot to bytes() and filter out the zeros as you go:
# only take non-zero values
RegString = bytes(b for b in regs[start:start + length] if b)
or remove the nulls afterwards:
RegString = bytes(regs[start:start + length]).replace(b"\x00", b"")
If that's still supposed to be a string and not a bytes value, you can then decode it with whatever encoding is appropriate: ASCII if the integers are in the range 0-127, or a more specific codec otherwise. In Python 2 this code produced a bytestring, so look for other hints in the code as to what encoding might have been in use.
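The snippets above assume each register holds a single byte. The sample values in the question (20341 and 29762) are larger than 255, so `bytes([...])` would raise a `ValueError` for them. Below is a minimal sketch for that case. It assumes each register is an unsigned 16-bit value holding two characters, high byte first, and that the text is ASCII; both assumptions come only from the question's sample data.

```python
regs = [20341, 29762, 0]   # sample values from the question, plus an empty register
start, length = 0, 3

# Each non-zero register becomes two bytes, high byte first.
raw = b"".join(
    value.to_bytes(2, "big")
    for value in regs[start:start + length]
    if value
)
reg_string = raw.decode("ascii")
print(reg_string)
```

One difference from the Python 2 code: a register below 256 yields a leading NUL byte here, whereas the hex round trip produced a single character, so strip NULs afterwards if that matters.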
I'm still on my RSA project, and now I can successfully create the keys, and encrypt a string with them
def encrypt(clear_message, public_key):
    clear_list = convert_into_unicode (clear_message)
    n = public_key[0]
    e = public_key[1]
    encrypted_message = str()
    for i, value in enumerate (clear_list) :
        encrypted_value = str( pow (int(value), e, n) )
        encrypted_message += (encrypted_value )
    return encrypted_message
def convert_into_unicode (clear_message):
    str_unicode = ''
    for car in clear_message:
        str_unicode += str (ord (car))
    # pad with trailing zeros so the digit string splits into blocks of 5
    if len (str_unicode ) % 5 != 0:
        str_unicode += (5 - len (str_unicode ) % 5) * '0'
    clear_list = []
    i = 5
    while i <= len (str_unicode ):
        clear_list .append (str_unicode [i-5:i])
        i += 5
    return clear_list
For example, encrypting the message 'Hello World' returns ['72101', '10810', '81113', '28711', '11141', '08100', '32330'] as clear_list then
'3863 111 1616 3015 1202 341 4096' as encrypted_message
The encrypt () function uses the other function to convert the string into a list of the Unicode values but put in blocks because I've read that otherwise, it would be easy to find the clear message only with frequency analysis.
Is it really that easy?
And as it probably is, I come to my main question. As you know, the Unicode values of a character are either double-digits or triple-digits. Before the encryption, the Unicode values are separated into blocks of 5 digits ('stack' -> '115 116 97 99 107' -> '11511 69799 10700')
But the problem is when I want to decrypt this, how do I know where I have to separate that string so that one number represents one character?
I mean, the former Unicode value could be either 11 or 115 (I know it couldn't really be 11, but that's only as an example). So to decrypt and then get back the character, the problem is that I don't know how many digits I have to take.
I had thought of adding a 0 when the Unicode value is < 100, but then it's easy to do the same thing as before with the frequency analysis.
And still, when I encrypt it, '087' can result in '467' and '089' can result in '046', so the problem is still here.
You're trying to solve real world problems with a toy RSA problem. The frequency analysis can be performed because no random padding of the plaintext message has been used. Random padding is required to make RSA secure.
For this kind of problem it is enough to directly use the Unicode code point (an integer value) per character as input to RSA. RSA can however only directly encrypt values in the range [0..N) where N is the modulus. If you input a larger value x, it is effectively reduced to x modulo N. In that case you lose information, and decryption can no longer recover the original value.
As for the ciphertext, just make this the string representation of the integer values separated by spaces and split them to read them in. This will take more space, but RSA always has a certain overhead.
If you want to implement secure RSA then please read the PKCS#1 standard and beware of timing attacks etc. And, as Wyzard already indicated, please use hybrid cryptography (using a symmetric encryption in addition to RSA).
Or use a standard library, now you understand how RSA works in principle.
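The per-character scheme with space-separated ciphertext described above can be sketched as follows. The key (n = 3233, e = 17, d = 2753, from p = 61 and q = 53) is a toy value assumed purely for illustration, and the function names are hypothetical. Without random padding this remains open to frequency analysis, so it is a learning aid and not a secure implementation.

```python
# Toy key for illustration only: p = 61, q = 53, far too small for real use.
n, e, d = 3233, 17, 2753

def encrypt(message, e, n):
    """Encrypt each code point separately; join the ciphertext integers with spaces."""
    for ch in message:
        if ord(ch) >= n:
            raise ValueError("code point does not fit below the modulus")
    return " ".join(str(pow(ord(ch), e, n)) for ch in message)

def decrypt(ciphertext, d, n):
    """Split on spaces, decrypt each integer, and map it back to a character."""
    return "".join(chr(pow(int(token), d, n)) for token in ciphertext.split())

ciphertext = encrypt("Hello World", e, n)
print(ciphertext)
print(decrypt(ciphertext, d, n))
```

Because every ciphertext token corresponds to exactly one character, the question of where to split the digits on decryption does not arise.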
Your convert_into_unicode function isn't really converting anything "into" Unicode. Assuming clear_message is a Unicode string (The default string type in Python 3, or u'' in Python 2), it's (naturally) Unicode already, and you're using an awkward way of turning it into a sequence of bytes that you can encrypt. If clear_message is a byte string (the default in Python 2, or b'' in Python 3), all the characters fit in a byte already, so the whole process is unnecessary.
It's true that a Unicode string needs to be encoded as a byte sequence before you can encrypt it. The normal way to do that is with an encoding such as UTF-8 or UTF-16. You can do that by calling clear_message.encode('utf-8'). After decrypting, you can turn the decrypted byte string back into a Unicode string with decrypted_bytes.decode('utf-8').
You don't need the convert_into_unicode function at all.
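As a rough illustration of that round trip in Python 3 (variable names are made up for the example, and the encryption step itself is left out):

```python
clear_message = "stack"
encoded = clear_message.encode("utf-8")   # a bytes object
values = list(encoded)                    # one integer in 0-255 per byte
print(values)

# ... each value (or the whole byte string) would be encrypted and decrypted here ...

restored = bytes(values).decode("utf-8")
print(restored)
```

Every value is a single byte, so there is no ambiguity about how many digits belong to one unit.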
The shortest ways I have found are:
n = 5
# Python 2.
s = str(n)
i = int(s)
# Python 3.
s = bytes(str(n), "ascii")
i = int(s)
I am particularly concerned with two factors: readability and portability. The second method, for Python 3, is ugly. However, I think it may be backwards compatible.
Is there a shorter, cleaner way that I have missed? I currently make a lambda expression to fix it with a new function, but maybe that's unnecessary.
Answer 1:
To convert a string to a sequence of bytes in either Python 2 or Python 3, you use the string's encode method. If you don't supply an encoding parameter the default is used ('ascii' in Python 2, 'utf-8' in Python 3); either is always good enough for numeric digits, which encode identically in both.
s = str(n).encode()
Python 2: http://ideone.com/Y05zVY
Python 3: http://ideone.com/XqFyOj
In Python 2 str(n) already produces bytes; the encode will do a double conversion as this string is implicitly converted to Unicode and back again to bytes. It's unnecessary work, but it's harmless and is completely compatible with Python 3.
Answer 2:
Above is the answer to the question that was actually asked, which was to produce a string of ASCII bytes in human-readable form. But since people keep coming here trying to get the answer to a different question, I'll answer that question too. If you want to convert 10 to b'10' use the answer above, but if you want to convert 10 to b'\x0a\x00\x00\x00' then keep reading.
The struct module was specifically provided for converting between various types and their binary representation as a sequence of bytes. The conversion from a type to bytes is done with struct.pack. There's a format parameter fmt that determines which conversion it should perform. For a 4-byte integer, that would be i for signed numbers or I for unsigned numbers. For more possibilities see the format character table, and see the byte order, size, and alignment table for options when the output is more than a single byte.
import struct
s = struct.pack('<i', 5) # b'\x05\x00\x00\x00'
You can use the struct's pack:
In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'
The ">" is the byte-order (big-endian) and the "I" is the format character. So you can be specific if you want to do something else:
In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'
In [13]: struct.pack("B", 1)
Out[13]: '\x01'
This works the same on both python 2 and python 3.
Note: the inverse operation (bytes to int) can be done with unpack.
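A short sketch of that round trip in Python 3 (the values and format strings are chosen only for illustration):

```python
import struct

packed = struct.pack(">I", 1)             # 4-byte unsigned int, big-endian
(value,) = struct.unpack(">I", packed)    # unpack always returns a tuple
print(packed, value)

# Several fields can be packed and unpacked in one call.
record = struct.pack("<HB", 350, 7)       # 2-byte unsigned + 1-byte unsigned, little-endian
print(struct.unpack("<HB", record))
```

The format string must be the same on both sides; with an explicit `<` or `>` prefix no padding bytes are inserted between fields.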
I have found the only reliable, portable method to be
bytes(bytearray([n]))
Just bytes([n]) does not work in python 2. Taking the scenic route through bytearray seems like the only reasonable solution.
Converting an int to a byte in Python 3:
>>> n = 5
>>> bytes([n])
b'\x05'
;) guess that'll be better than messing around with strings
source: http://docs.python.org/3/library/stdtypes.html#binaryseq
In Python 3.x, you can convert an integer value (including large ones, which the other answers don't allow for) into a series of bytes like this:
import math
x = 0x1234
number_of_bytes = int(math.ceil(x.bit_length() / 8))
x_bytes = x.to_bytes(number_of_bytes, byteorder='big')
x_int = int.from_bytes(x_bytes, byteorder='big')
x == x_int
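The same idea can be wrapped in a helper. This is a sketch with a hypothetical name, `int_to_bytes`; it assumes non-negative input and uses integer arithmetic for the length, which also gives zero one byte instead of an empty result.

```python
def int_to_bytes(x, byteorder="big"):
    """Encode a non-negative int using the fewest bytes that hold it (at least one)."""
    if x < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    length = max(1, (x.bit_length() + 7) // 8)
    return x.to_bytes(length, byteorder)

for x in (0, 255, 256, 0x1234, 2**64):
    encoded = int_to_bytes(x)
    assert int.from_bytes(encoded, "big") == x   # round trip
    print(x, encoded.hex())
```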
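From int to bytes:
bytes_string = int_v.to_bytes(length, endian)
where length is the number of bytes (1, 2, 3, 4, ...) and endian is either 'big' or 'little'.
From bytes to a list of ints, one per byte (use int.from_bytes(bytes_string, endian) to get a single int back):
data_list = list(bytes_string)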
When converting old Python 2 code you often encounter "%s" % number. In Python 3 (3.5 and later) this can be converted to b"%d" % number; b"%s" % number does not work, because %s in a bytes format requires a bytes-like argument.
The format b"%d" % number is in addition another clean way to convert int to a binary string.
b"%d" % number
I feel like a complete tool for posting this, it is so basic and I can't believe I have wasted the last two days on this problem. I've tried all the solutions I can find on this (seriously, I will show you my internet history) but to no avail. Here is the problem:
I am parsing a serial string in from a uC. It is 52 bytes long and contains a lot of different variables of data. The data is encoded in packed binary coded decimal.
Ex: .....blah.....0x01 0x5E .....blah
015E hex gives 350 decimal. This is the value I want. I am reading in the serial string just fine; I used binascii.hexlify to print the bytes to ensure it is correct. I use
data = ser.read()
and placed the data in an array if an newline is not received. I have tried making the array a bytearray, list, anything that I could find, but none work.
I want to send the required two byte section to a defined method.
def makeValue(highbyte, lowbyte):
When I try to use unpack, join, pack, bit manipulation, string concatenation, I keep getting the same problem.
Because 0x01 and 0x5E are not valid int numbers (start of heading and ^ in ASCII), it won't work. It won't even let me join the numbers first because it's not a valid int.
using hex(): hex argument can't be converted to hex.
Joining the strings: invalid literal for int() with base 16: '\x01^'
using int: invalid literal for int() with base 10: '\x01^'
Packing a struct: struct.error: cannot convert argument to integer
Seriously, am I missing something really basic here? All the examples I can find make use of all the functions above perfectly, but they specify the hex numbers as '0x1234', or the numbers they are converting are actual ASCII numbers. Please help.
EDIT
I got it, ch3ka set me on the right track, thanks a million!
I don't know why it wouldn't work before but I hex'ed both values
one = binascii.hexlify(line[7])
two = binascii.hexlify(line[8])
makeValue(one, two)
and then used the makeValue function ch3ka defined:
def makeValue(highbyte, lowbyte):
    print int(highbyte, 16)*256 + int(lowbyte, 16)
Thanks again!!!
You are interpreting the values as chars. Feeding chars to int() won't work; you have to feed the values as strings, like so: int("0x5E", 16). What you are attempting is in fact int(chr(int("0x5E", 16)),16), which is int("^",16) and will of course not work.
Do you expect these results?
makevalue('0x01', '0x5E') -> 350 0x15e 0b101011110
makevalue('0xFF', '0x00') -> 65280 0xff00 0b1111111100000000
makevalue('0x01', '0xFF') -> 511 0x1ff 0b111111111
makevalue('0xFF', '0xFF') -> 65535 0xffff 0b1111111111111111
If so, you can use this:
def makeValue(highbyte, lowbyte):
    return int(highbyte, 16)*256 + int(lowbyte, 16)
or the IMO more ugly and error-prone:
def makeValue(highbyte, lowbyte):
    return int(highbyte+lowbyte[2:], 16) # strips leading "0x" from lowbyte before concat
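For readers on Python 3, where indexing a bytes object already yields integers, the hex-string detour is not needed. The following is a sketch under that assumption; the name `make_value` and the stand-in frame `line` are hypothetical, with offsets 7 and 8 taken from the question.

```python
def make_value(highbyte, lowbyte):
    """Combine two byte values (ints 0-255) into one 16-bit integer, high byte first."""
    return (highbyte << 8) | lowbyte

line = b"\x00" * 7 + b"\x01\x5e"   # stand-in for a received frame

print(make_value(line[7], line[8]))
print(int.from_bytes(line[7:9], "big"))   # same result straight from the slice
```

In Python 2, where `line[7]` is a one-character str, `ord(line[7])` gives the same integer.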