I came up to this problem where my RC-6 algorithm does not produce the cipher text it should (by the spec doc) well to be more clear, let me give you an example
As you see when plain text and key are made out of zero-bytes it passes both tests -> cipher text and decryption text tests
To clarify this even more the cipher values(both correct and wrong) ,are also ordered in little-endian fashion after encrypting.
So my question is - where should I look for invalid code ?
I have a feeling that it is something to do with the byte-ordering before passing it to encryption or key-scheduling functions.
The values I pass to the key-scheduling and encryption functions are straightforward arrays of 32bit words (e.g. [0x00,0x10,0x00,0x00]) and then I move one straight to algorithm (which I wrote looking at the pseudo-code) so no other formatting done before that.
They also start as follows :
def encrypt(plaintext,S):
A,C = plaintext[0],plaintext[2]
B = modulus(plaintext[1]+S[0])
D = modulus(plaintext[3]+S[1])
for i in range(1,r+1):
....
def keyGenerator(L):
c = len(L)
S = [int(0)]* (2*r+4)
S[0] = P
....
I could use any help..
Thank you in advance!
By the way the official test vectors could be in THIS document's appendix
So I found out what was wrong in this case. It was indeed a problem with swapping bytes. Since 0's were symmetric input it would go through, and input with mixed values were working ,however giving the wrong answer.
def swap32(x):
return (((x << 24) & 0xFF000000) |((x << 8) & 0x00FF0000) |
((x >> 8) & 0x0000FF00) |((x >> 24) & 0x000000FF))
This function ,for swapping 8 byte blocks was very useful in my case. I had to swipe the key bytes, the plaintext bytes in the beggining of encryption, then at the end of the enryption, then at the beggining at decryption and at the end of decryption.
I hope someone will find this useful in the future and won't be stuck in the same place like I was..
Cheers
Related
I'm attempting to learn Z3 by break a simple (3 character) XOR-cipher: https://projecteuler.net/problem=59
So far I've specified some simple requirements, like the plaintext needing to be equivalent to the ciphertext ^ password, that the plaintext must consist of 7 bit ascii and that the password uses only lowercase characters.
ciphertext_bytes = parse_input("p059_cipher.txt")
set_param("parallel.enable", True)
set_param("parallel.threads.max", os.cpu_count())
s = Solver()
ctx = s.ctx
password = IntVector('x', 3, ctx)
ciphertext = IntVector('c', len(ciphertext_bytes), ctx)
plaintext = IntVector('p', len(ciphertext_bytes), ctx)
for x in password:
s.add(x >= 97)
s.add(x <= 122)
for i, value in enumerate(ciphertext_bytes):
ciphertext[i] = Int(value, ctx)
password_index = i % len(password)
password_char = BitVecRef(Z3_mk_int2bv(ctx.ref(), 8, password[password_index].as_ast()), ctx)
ciphertext_char = BitVecRef(Z3_mk_int2bv(ctx.ref(), 8, ciphertext[i].as_ast()), ctx)
plaintext_char = BitVecRef(Z3_mk_int2bv(ctx.ref(), 8, plaintext[i].as_ast()), ctx)
s.add(password_char ^ ciphertext_char == plaintext_char)
s.add(plaintext[i] >= 0)
s.add(plaintext[i] <= 127)
s.add(ciphertext[i] >= 0)
s.add(ciphertext[i] <= 255)
print(s.check())
print(s.model())
The s.check() call does not terminate in any reasonable amount of time however.
As the specified problem is (currently still) brute-forceable with 26*3 tries (guessing each of the password chars separately and checking the plaintext after) I must have made a mistake somewhere.
Why is this code slow?
Why does Z3 not use multi-threading here?
This isn't really a suitable problem for an SMT solver, alas. The issue here is not that you can't solve the puzzle using SMT; but rather it does not buy you anything new. Note that in an XOR based encryption, the equality:
cipher = plain ^ key
has a solution for every value of cipher and key: You simply get a plain-text as a sequence of bytes, and the SMT solver has no way of making sure that it's valid English. (Yes, you can constrain it to be 7-bits per character etc., but that's really not cutting down the search space in any meaningful way.) The trick here is to make sure the plain text is "meaningful" English, as is the password. But there's no inherent knowledge in the SMT solver to tell you if a sequence of bytes correspond to meaningful English text, or text in any known natural language.
The best way to solve this problem, then, is to use classic decipher technology; like frequency analysis, dictionary based enumeration, etc. And none of that is really going to gain anything from an SMT solver.
Alternatively, given this is an Euler-project problem, your best best is probably to scan through the dictionary file in your computer and exhaustively search for a suitable solution; since there're only so many 3-letter words that can be used as valid passwords.
Regarding the "speed" issue you're observing: Note that int2bv is an expensive operation. You should avoid it, and simply use bit-vectors. But again, this will not help you here, because you'll "quickly" get a non-sense solution.
I am new to Python & I am trying to learn how to XOR hex encoded ciphertexts against one another & then derive the ASCII value of this.
I have tried some of the functions as outlined in previous posts on this subject - such as bytearray.fromhex, binascii.unhexlify, decode("hex") and they have all generated different errors (obviously due to my lack of understanding). Some of these errors were due to my python version (python 3).
Let me give a simple example, say I have a hex encoded string ciphertext_1 ("4A17") and a hex endoded string ciphertext_2. I want to XOR these two strings and derive their ASCII value. The closest that I have come to a solution is with the following code:
result=hex(int(ciphertext_1, 16) ^ int(ciphertext_2, 16))
print(result)
This prints me a result of: 0xd07
(This is a hex string is my understanding??)
I then try to convert this to its ASCII value. At the moment, I am trying:
binascii.unhexliy(result)
However this gives me an error: "binascii.Error: Odd-length string"
I have tried the different functions as outlined above, as well as trying to solve this specific error (strip function gives another error) - however I have been unsuccessful. I realise my knowledge and understanding of the subject are lacking, so i am hoping someone might be able to advise me?
Full example:
#!/usr/bin/env python
import binascii
ciphertext_1="4A17"
ciphertext_2="4710"
result=hex(int(ciphertext_1, 16) ^ int(ciphertext_2, 16))
print(result)
print(binascii.unhexliy(result))
from binascii import unhexlify
ciphertext_1 = "4A17"
ciphertext_2 = "4710"
xored = (int(ciphertext_1, 16) ^ int(ciphertext_2, 16))
# We format this integer: hex, no leading 0x, uppercase
string = format(xored, 'X')
# We pad it with an initial 0 if the length of the string is odd
if len(string) % 2:
string = '0' + string
# unexlify returns a bytes object, we decode it to obtain a string
print(unhexlify(string).decode())
#
# Not much appears, just a CR followed by a BELL
Or, if you prefer the repr of the string:
print(repr(unhexlify(string).decode()))
# '\r\x07'
When doing byte-wise operations like XOR, it's often easier to work with bytes objects (since the individual bytes are treated as integers). From this question, then, we get:
ciphertext_1 = bytes.fromhex("4A17")
ciphertext_2 = bytes.fromhex("4710")
XORing the bytes can then be accomplished as in this question, with a comprehension. Then you can convert that to a string:
result = [c1 ^ c2 for (c1, c2) in zip(ciphertext_1, ciphertext_2)]
result = ''.join(chr(c) for c in result)
I would probably take a slightly different angle and create a bytes object instead of a list, which can be decoded into your string:
result = bytes(b1 ^ b2 for (b1, b2) in zip(ciphertext_1, ciphertext_2)).decode()
I'm still on my RSA project, and now I can successfully create the keys, and encrypt a string with them
def encrypt(clear_message, public_key):
clear_list = convert_into_unicode (clear_message)
n = public_key[0]
e = public_key[1]
message_chiffre = str()
for i, value in enumerate (clear_list) :
encrypted_value = str( pow (int(value), e, n) )
encrypted_message += (encrypted_value )
return encrypted_message
def convert_into_unicode (clear_message):
str_unicode = ''
for car in clear_message:
str_unicode += str (ord (car))
if len (str_unicode ) % 5 != 0:
str_unicode += (5 - len (str_unicode ) % 5) * '0'
clear_list = []
i = 5
while i <= len (str_unicode ):
clear_list .append (str_unicode [i-5:i])
i += 5
return liste_claire
For example, encrypting the message 'Hello World' returns ['72101', '10810', '81113', '28711', '11141', '08100', '32330'] as clear_list then
'3863 111 1616 3015 1202 341 4096' as encrypted_message
The encrypt () function uses the other function to convert the string into a list of the Unicode values but put in blocks because I've read that otherwise, it would be easy to find the clear message only with frequency analysis.
Is it really that easy?
And as it probably is, I come to my main question. As you know, the Unicode values of a character are either double-digits or triple-digits. Before the encryption, the Unicode values are separated into blocks of 5 digits ('stack' -> '115 116 97 99 107' -> '11511 69799 10700')
But the problem is when I want to decrypt this, how do I know where I have to separate that string so that one number represents one character?
I mean, the former Unicode value could be either 11 or 115 (I know it couldn't really be 11, but that's only as an example). So to decrypt and then get back the character, the problem is, I don't know how much digits I have to take.
I had thought of adding a 0 when the Unicode value is < 100, but
Then it's easy to do the same thing as before with the frequency analysis
Still, when I encrypt it, '087' can result in '467' and '089' can result in '046', so the problem is still here.
You're trying to solve real world problems with a toy RSA problem. The frequency analysis can be performed because no random padding of the plaintext message has been used. Random padding is required to make RSA secure.
For this kind of problem it is enough to directly use the Unicode code point (an integer value) per character as input to RSA. RSA can however only directly encrypt values in the range [0..N) where N is the modulus. If you input a larger value x then value will first be converted into the value x modulus N. In that case you loose information and decryption will not be deterministic anymore.
As for the ciphertext, just make this the string representation of the integer values separated by spaces and split them to read them in. This will take more space, but RSA always has a certain overhead.
If you want to implement secure RSA then please read into PKCS#1 standard and beware of time attacks etc. And, as Wyzard already indicated, please use hybrid cryptography (using a symmetric encryption in addition to RSA).
Or use a standard library, now you understand how RSA works in principle.
Your convert_into_unicode function isn't really converting anything "into" Unicode. Assuming clear_message is a Unicode string (The default string type in Python 3, or u'' in Python 2), it's (naturally) Unicode already, and you're using an awkward way of turning it into a sequence of bytes that you can encrypt. If clear_message is a byte string (the default in Python 2, or b'' in Python 3), all the characters fit in a byte already, so the whole process is unnecessary.
It's true that Unicode string needs to be encoded as a byte sequence before you can encrypt it. The normal way to do that is with an encoding such as UTF-8 or UTF-16. You can do that by calling clear_message.encode('utf-8'). After decrypting, you can turn the decrypted byte string back into a Unicode string with decrypted_bytes.decode('utf-8').
You don't need the convert_into_unicode function at all.
I am new to crypto and I am trying to interpret the below code. Namely, what does <xor> mean?
I have a secret_key secret key. I also have a unique_id. I create pad using the below code.
pad = hmac.new(secret_key, msg=unique_id, digestmod=hashlib.sha1).digest()
Once the pad is created, I have a price e.g. 1000. I am trying to follow this instruction which is pseudocode:
enc_price = pad <xor> price
In Python, what is the code to implement enc_price = pad <xor> price? What is the logic behind doing this?
As a note, a complete description of what I want to do here here:
https://developers.google.com/ad-exchange/rtb/response-guide/decrypt-price
developers.google.com/ad-exchange/rtb/response-guide/decrypt-price
Thanks
The binary (I assume that's what you need) xor is ^ in python:
>>> 6 ^ 12
10
Binary xor works like this (numbers represented in binary):
1234
6 = 0110
12 = 1100
10 = 1010
For every pair of bits, if their sum is 1 (bits 1 and 3 in my example), the resulting bit is 1. Otherwise, it's 0.
The pad, and the plaintext "price" are each to be interpreted as a stream of bits. For each corresponding bit in the two streams, you take the "exclusive OR" of the pair of bits - if the bits are the same, you emit 0, if the bits are different, you emit 1. This operation is interesting because it's reversible: plaintext XOR pad -> ciphertext, and ciphertext XOR pad -> plaintext.
However, in Python, you won't usually do the XORing yourself because it's tedious and overly complex for a newbie; you want to use a popular encryption library such as PyCrypto to do the work.
You mean "Binary bitwise operations"?
The & operator yields the bitwise AND of its arguments, which must be plain or long integers. The arguments are converted to a common type.
The ^ operator yields the bitwise XOR (exclusive OR) of its arguments, which must be plain or long integers. The arguments are converted to a common type.
The | operator yields the bitwise (inclusive) OR of its arguments, which must be plain or long integers. The arguments are converted to a common type.
[update]
Since you can't xor a string and a number, you should either:
convert the number to a string padded to the same size and xor each byte (may give you all sort of strange "escape" problems with some chars, for example, accidentally generating invalid unicode)
use the raw value (20 byte integer?) of the digest to xor and make an hexdigest of the resulting number.
Something like this (untested):
pad = hmac.new(secret_key, msg=unique_id, digestmod=hashlib.sha1).digest()
rawpad = reduce(lambda x, y: (x << 8) + y,
[ b for b in struct.unpack('B' * len(pad), pad)])
enc_price = "%X" % (rawpad ^ price)
[update]
The OP wants to implement "DoubleClick Ad Exchange Real-Time Bidding Protocol".
This very article tells there are some sample python code available:
Initial Testing
You can test your bidding application internally using requester.tar.gz. This is a test python program that sends requests to a bidding application and checks the responses. The program is available on request from your Ad Exchange representative.
I did it so
def strxor(s1,s2):
size = min(len(s1),len(s2))
res = ''
for i in range(size):
res = res + '%c' % (ord(s1[i]) ^ ord(s2[i]))
return res
I'm trying to calculate the Frame Check Sequence (FCS) of an Ethernet packet byte by byte. The polynomial is 0x104C11DB7.
I did follow the XOR-SHIFT algorithm seen here http://en.wikipedia.org/wiki/Cyclic_redundancy_check or here http://www.woodmann.com/fravia/crctut1.htm
Assume the information that is supposed have a CRC is only one byte. Let's say it is 0x03.
step: pad with 32 bits to the right
0x0300000000
align the polynomial and the data at the left hand side with their first bit that is not zero and xor them
0x300000000 xor 0x209823B6E = 0x109823b6e
take remainder align and xor again
0x109823b6e xor 0x104C11DB7 = 0x0d4326d9
Since there are no more bit left the CRC32 of 0x03 should be 0x0d4326d9
Unfortunately all the software implementations tell me I'm wrong, but what did I do wrong or what are they doing differently?
Python tells me:
"0x%08x" % binascii.crc32(chr(0x03))
0x4b0bbe37
The online tool here http://www.lammertbies.nl/comm/info/crc-calculation.html#intr gets the same result.
What is the difference between my hand calculation and the algorithm the mentioned software uses?
UPDATE:
Turns out there was a similar question already on stack overflow:
You find an answer here Python CRC-32 woes
Although this is not very intuitive. If you want a more formal description on how it is done for Ethernet frames you can look at the Ethernet Standard document 802.3 Part 3 - Chapter 3.2.9 Frame Check Sequence Field
Lets continue the example from above:
Reverse the bit order of your message. That represents the way they would come into the receiver bit by bit.
0x03 therefore is 0xC0
Complement the first 32 bit of your message. Notice we pad the single byte with 32 bit again.
0xC000000000 xor 0xFFFFFFFF = 0x3FFFFFFF00
Complete the Xor and shift method from above again. After about 6 step you get:
0x13822f2d
The above bit sequense is then complemented.
0x13822f2d xor 0xFFFFFFFF = 0xec7dd0d2
Remember that we reversed the bit order to get the representation on the Ethernet wire in step one. Now we have to reverse this step and we finally fulfill our quest.
0x4b0bbe37
Whoever came up with this way of doing it should be ...
A lot of times you actually want to know it the message you received is correct. In order to achieve this you take your received message including the FCS and do the same step 1 through 5 as above. The result should be what they call residue. Which is a constant for a given polynomial. In this case it is 0xC704DD7B.
As mcdowella mentions you have to play around with your bits until you get it right, depending on the Application you are using.
This snippet writes the correct CRC for Ethernet.
Python 3
# write payload
for byte in data:
f.write(f'{byte:02X}\n')
# write FCS
crc = zlib.crc32(data)
for i in range(4):
byte = (crc >> (8*i)) & 0xFF
f.write(f'{byte:02X}\n')
Python 2
# write payload
for byte in data:
f.write('%02X\n' % ord(byte))
# write FCS
crc = zlib.crc32(data) & 0xFFFFFFFF
for i in range(4):
byte = (crc >> (8*i)) & 0xFF
f.write('%02X\n' % byte)
Would have saved me some time if I found this here.
There is generally a bit of trial and error required to get CRC calculations to match, because you never end up reading exactly what has to be done. Sometimes you have to bit-reverse the input bytes or the polynomial, sometimes you have to start off with a non-zero value, and so on.
One way to bypass this is to look at the source of a program getting it right, such as http://sourceforge.net/projects/crcmod/files/ (at least it claims to match, and comes with a unit test for this).
Another is to play around with an implementation. For instance, if I use the calculator at http://www.lammertbies.nl/comm/info/crc-calculation.html#intr I can see that giving it 00000000 produces a CRC of 0x2144DF1C, but giving it FFFFFFFF produces FFFFFFFF - so it's not exactly the polynomial division you describe, for which 0 would have checksum 0
From a quick glance at the source code and these results I think you need to start with an CRC of 0xFFFFFFFF - but I could be wrong and you might end up debugging your code side by side with the implementation, using corresponding printfs to find out where the first differ, and fixing the differences one by one.
There are a number of places on the Internet where you will read that the bit order must be reversed before calculating the FCS, but the 802.3 spec is not one of them. Quoting from the 2008 version of the spec:
3.2.9 Frame Check Sequence (FCS) field
A cyclic redundancy check (CRC) is used by the transmit and receive algorithms to
generate a CRC value for the FCS field. The FCS field contains a 4-octet (32-bit)
CRC value. This value is computed as a function of the contents of the protected
fields of the MAC frame: the Destination Address, Source Address, Length/ Type
field, MAC Client Data, and Pad (that is, all fields except FCS). The encoding is
defined by the following generating polynomial.
G(x) = x32 + x26 + x23 + x22 + x16 + x12 + x11
+ x10 + x8 + x7 + x5 + x4 + x2 + x + 1
Mathematically, the CRC value corresponding to a given MAC frame is defined by
the following procedure:
a) The first 32 bits of the frame are complemented.
b) The n bits of the protected fields are then considered to be the coefficients
of a polynomial M(x) of degree n ā 1. (The first bit of the Destination Address
field corresponds to the x(nā1) term and the last bit of the MAC Client Data
field (or Pad field if present) corresponds to the x0 term.)
c) M(x) is multiplied by x32 and divided by G(x), producing a remainder R(x) of
degree ā¤ 31.
d) The coefficients of R(x) are considered to be a 32-bit sequence.
e) The bit sequence is complemented and the result is the CRC.
The 32 bits of the CRC value are placed in the FCS field so that the x31 term is
the left-most bit of the first octet, and the x0 term is the right most bit of the
last octet. (The bits of the CRC are thus transmitted in the order x31, x30,...,
x1, x0.) See Hammond, et al. [B37].
Certainly the rest of the bits in the frame are transmitted in reverse order, but that does not include the FCS. Again, from the spec:
3.3 Order of bit transmission
Each octet of the MAC frame, with the exception of the FCS, is transmitted least
significant bit first.
http://en.wikipedia.org/wiki/Cyclic_redundancy_check
has all the data for ethernet and wealth of important details, for example there are (at least) 2 conventions to encode polynomial into a 32-bit value, largest term first or smallest term first.