I'm attempting to learn Z3 by breaking a simple (3-character) XOR cipher: https://projecteuler.net/problem=59
So far I've specified some simple requirements: the plaintext must be equal to ciphertext ^ password, the plaintext must consist of 7-bit ASCII, and the password uses only lowercase characters.
ciphertext_bytes = parse_input("p059_cipher.txt")
set_param("parallel.enable", True)
set_param("parallel.threads.max", os.cpu_count())
s = Solver()
ctx = s.ctx
password = IntVector('x', 3, ctx)
ciphertext = IntVector('c', len(ciphertext_bytes), ctx)
plaintext = IntVector('p', len(ciphertext_bytes), ctx)
for x in password:
    s.add(x >= 97)
    s.add(x <= 122)
for i, value in enumerate(ciphertext_bytes):
    ciphertext[i] = Int(value, ctx)
    password_index = i % len(password)
    password_char = BitVecRef(Z3_mk_int2bv(ctx.ref(), 8, password[password_index].as_ast()), ctx)
    ciphertext_char = BitVecRef(Z3_mk_int2bv(ctx.ref(), 8, ciphertext[i].as_ast()), ctx)
    plaintext_char = BitVecRef(Z3_mk_int2bv(ctx.ref(), 8, plaintext[i].as_ast()), ctx)
    s.add(password_char ^ ciphertext_char == plaintext_char)
    s.add(plaintext[i] >= 0)
    s.add(plaintext[i] <= 127)
    s.add(ciphertext[i] >= 0)
    s.add(ciphertext[i] <= 255)
print(s.check())
print(s.model())
The s.check() call does not terminate in any reasonable amount of time however.
As the specified problem is (currently still) brute-forceable with 26*3 tries (guessing each of the password chars separately and checking the plaintext after) I must have made a mistake somewhere.
Why is this code slow?
Why does Z3 not use multi-threading here?
This isn't really a suitable problem for an SMT solver, alas. The issue here is not that you can't solve the puzzle using SMT; but rather it does not buy you anything new. Note that in an XOR based encryption, the equality:
cipher = plain ^ key
has a solution for every value of cipher and key: you simply get a plaintext as a sequence of bytes, and the SMT solver has no way of making sure that it's valid English. (Yes, you can constrain it to be 7 bits per character etc., but that doesn't cut down the search space in any meaningful way.) The trick here is to make sure the plaintext is "meaningful" English, as is the password. But there's no inherent knowledge in the SMT solver to tell you whether a sequence of bytes corresponds to meaningful English text, or text in any known natural language.
The best way to solve this problem, then, is to use classic cryptanalysis techniques such as frequency analysis or dictionary-based enumeration. None of that is really going to gain anything from an SMT solver.
Alternatively, given this is a Project Euler problem, your best bet is probably to scan through the dictionary file on your computer and exhaustively search for a suitable solution, since there are only so many 3-letter words that can be used as valid passwords.
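As a rough illustration of the frequency-analysis route, here is a minimal Python sketch that guesses each key character separately (26*3 tries, as the question notes). The names guess_key, xor_with_key and LIKELY are made up for this example, and the scoring rule (count letters and spaces) is an assumption that works for ordinary English text, not a general-purpose classifier.

```python
import string
from itertools import cycle

# Assumption: ordinary English text is dominated by letters and spaces.
LIKELY = set((string.ascii_letters + " ").encode("ascii"))

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (encrypts and decrypts alike)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def guess_key(ciphertext: bytes, key_len: int = 3) -> bytes:
    """Pick, for each key position, the lowercase letter that makes
    the corresponding slice of the plaintext look most like text."""
    key = bytearray()
    for pos in range(key_len):
        column = ciphertext[pos::key_len]  # bytes encrypted with key[pos]
        best = max(
            string.ascii_lowercase.encode("ascii"),
            key=lambda c: sum((b ^ c) in LIKELY for b in column),
        )
        key.append(best)
    return bytes(key)

if __name__ == "__main__":
    sample = (b"the quick brown fox jumps over the lazy dog "
              b"and then runs far away from the old farm")
    secret = xor_with_key(sample, b"key")
    recovered = guess_key(secret)
    print(recovered, xor_with_key(secret, recovered))
```

On very short or punctuation-heavy ciphertexts this scoring can pick the wrong letter, in which case a dictionary check on the decrypted text would be the next thing to try.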
Regarding the "speed" issue you're observing: note that int2bv is an expensive operation. You should avoid it and simply use bit-vectors. But again, this will not help you here, because you'll "quickly" get a nonsense solution.
Some time ago I found this function (unfortunately, I don't remember where it came from, most likely from some Python framework) that compares two strings and returns a bool value. It's quite simple to understand what's going on here.
XORing two characters gives a non-zero value if they do not match, and 0 if they do.
def cmp_strings(str1, str2):
    return len(str1) == len(str2) and sum(ord(x)^ord(y) for x, y in zip(str1, str2)) == 0
But why is this function used? Isn't it the same as str1==str2?
It takes a similar amount of time to compare any strings that have the same length. It's used for security when the strings are sensitive. Usually it's used to compare password hashes.
If == is used, Python stops comparing characters when the first one not matching is found. This is bad for hashes because it could reveal how close a hash was to matching. This would help an attacker to brute force a password.
This is how hmac.compare_digest works.
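For reference, a minimal sketch of using hmac.compare_digest from the standard library to compare password hashes. The function name digests_match and the choice of SHA-256 are assumptions made for this example; a real password store would use a salted, slow password-hashing scheme.

```python
import hashlib
import hmac

def digests_match(stored_hex: str, candidate: bytes) -> bool:
    """Hash the candidate and compare it to the stored hex digest
    without stopping at the first mismatching character."""
    candidate_hex = hashlib.sha256(candidate).hexdigest()
    return hmac.compare_digest(stored_hex, candidate_hex)

if __name__ == "__main__":
    stored = hashlib.sha256(b"correct horse").hexdigest()
    print(digests_match(stored, b"correct horse"))
    print(digests_match(stored, b"wrong guess"))
```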
The security issue that is being addressed by XOR comparison is known as a Timing Attack. ...This is where you observe how much time it takes the Compare function to succeed|fail, and use that knowledge to gain an advantage over the system.
There are 95 printable ASCII characters. If you have an 8 character password, there are 95^8 (6,634,204,312,890,625) possible combinations ...If the correct password is the last one in your list, and you can try 1 billion passwords per second, it will take you about 77 days to Brute Force the password ...That's too long - so we need a shortcut!
There are an infinite number of ways to store a string - and probably a dozen in popular use {length-prefixed, nul-terminated, ...}{Unicode, UTF-8, ASCII, ,...}. For this working example, I will use the ubiquitous 'NUL-terminated array of bytes using ASCII encoding' ...IE. "ABC" will be stored as "ABC"NUL, or {65, 66, 67, 0} ...but whatever storage/encoding standard you use, the problem is essentially the same.
Syntactically, there are as many ways to compare two strings as there are languages, eg. if str1 == str2 or if (strcmp(str1, str2) == 0) etc. ...but when you look at how they work internally, they are all pretty-much the same. Here is some simple (but realistic) pseudo-code to perform a classic (non-security) string compare:
index = 0
LOOP FOREVER {
    IF ( (str1[index] == 0) AND (str2[index] == 0) ) THEN return 'same'
    IF (str1[index] != str2[index]) THEN return 'different'
    index = index + 1
}
Assuming the secret password is "BY3"NUL ...Let's try some passwords, and notice how many operations the Compare function has to do to establish success|fail.
1. "A"NUL ... returns 'different' when 1st char is checked (A) [zero chars are correct]
2. "B"NUL ... returns 'different' when 2nd char is checked (NUL) [first char must be correct]
3. "BX"NUL ... returns 'different' when 2nd char is checked (X) [first char must be correct]
4. "BY"NUL ... returns 'different' when 3rd char is checked (NUL) [first two chars must be correct]
5. "BY1"NUL ... returns 'different' when 3rd char is checked (1) [first two chars must be correct]
6. "BY2"NUL ... returns 'different' when 3rd char is checked (2) [first two chars must be correct]
7. "BY3"NUL ... returns 'same' when the 4th character is checked (NUL) [all three chars are correct]
You can see that guess 1 fails the 1st time around the loop, guesses 2 & 3 fail the 2nd time around the loop ...guesses 4, 5, 6 fail the 3rd time around the loop ...and guess 7 succeeds the 4th time around the loop.
By observing how much time it takes the Compare function to fail, we can tell which character is wrong! This means we can actually guess the password one character at a time.
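The loop counts in the list above can be reproduced with a small Python sketch of the same early-exit compare. It counts loop iterations rather than measuring wall-clock time, which is a simplification; naive_compare is just an illustrative name.

```python
def naive_compare(str1: bytes, str2: bytes):
    """Early-exit compare of two NUL-terminated byte strings.

    Returns the verdict and the number of loop iterations used."""
    index = 0
    while True:
        if str1[index] == 0 and str2[index] == 0:
            return "same", index + 1
        if str1[index] != str2[index]:
            return "different", index + 1
        index += 1

SECRET = b"BY3\0"
GUESSES = (b"A\0", b"B\0", b"BX\0", b"BY\0", b"BY1\0", b"BY2\0", b"BY3\0")

for guess in GUESSES:
    verdict, iterations = naive_compare(SECRET, guess)
    print(guess, verdict, iterations)
```

The iteration count grows with the number of leading characters that are correct, which is exactly the information a timing attack extracts.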
Again, let's assume an 8 character password made up of the 95 printable characters, and our last guess will be correct ...Because we can now guess the password one character at a time, it will take 95*8 (760) guesses. At 1 billion guesses per second, it will take about 0.76 microseconds to find the password [it takes about 100 ms to blink] ...which is a significant advantage over 77 days ...For a laugh work out the advantage for a 20 character password (95^20 vs 95 * 20).
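The guess counts and durations can be recomputed in a few lines of Python. The constants are the ones used in the text (95 printable characters, 8 characters, 1 billion guesses per second); the variable names are only for this sketch.

```python
ALPHABET = 95            # printable ASCII characters
LENGTH = 8               # password length
RATE = 1_000_000_000     # guesses per second

brute_force_guesses = ALPHABET ** LENGTH   # every combination
timing_guesses = ALPHABET * LENGTH         # one character at a time

brute_force_days = brute_force_guesses / RATE / 86_400
timing_seconds = timing_guesses / RATE

print(brute_force_guesses, "guesses,", round(brute_force_days, 1), "days")
print(timing_guesses, "guesses,", timing_seconds, "seconds")
```

Changing LENGTH to 20 answers the exercise at the end of the paragraph.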
So how do we stop an attacker from using a Timing Attack? [Spoiler: XOR]
The first thing we need to do is to make both strings the same length; and secondly, we must ALWAYS check EVERY character before returning 'same' or 'different' ...This is surprisingly difficult to do without introducing a new Timing Attack. But rather than show you lots of ways to get it wrong, let's see a way to do it right.
Passwords should (where possible) be stored as Hashes ...{DES, MD5, SHA-1, ...} have now been shown to have cryptographic flaws, {SHA-256, SHA-3, Whirlpool, ...} are still in good favour [Oct 2021] ...You may know that ALL Hashes (generated by a given algorithm) are the same length ...So if we Hash the guess and compare the Guess-Hash against the Stored-Hash, we have solved the first problem - the 'strings' (array of bytes) we need to compare are now ALWAYS the same length.
Secondly. How to make sure our Compare function ALWAYS takes the same amount of time to reach its decision ...There are probably a lot of ways to do this, but the most common solution is to use XOR like this:
result = 0
index = 0
LOOP WHILE (index < hashLength) {
    result = result OR ( secretHash[index] XOR guessHash[index] )
    index = index + 1
}
IF result == 0 THEN return 'same' ELSE return 'different'
And this way ALL calls to the compare function take the same length of time to run ...No more Timing Attack!
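A direct Python transcription of that pseudo-code might look like the sketch below. It assumes both inputs are hash digests of equal length (SHA-256 is used here only as an example), and it illustrates the logic only: an interpreter gives no hard timing guarantees, so production code should normally call hmac.compare_digest from the standard library instead.

```python
import hashlib

def constant_time_equal(secret_hash: bytes, guess_hash: bytes) -> bool:
    """Compare two equal-length digests, always visiting every byte."""
    if len(secret_hash) != len(guess_hash):
        # Digests from the same algorithm always have the same length,
        # so this branch reveals nothing about the secret.
        return False
    result = 0
    for a, b in zip(secret_hash, guess_hash):
        result |= a ^ b   # stays non-zero once any byte differs
    return result == 0

if __name__ == "__main__":
    stored = hashlib.sha256(b"BY3").digest()
    print(constant_time_equal(stored, hashlib.sha256(b"BY3").digest()))
    print(constant_time_equal(stored, hashlib.sha256(b"BY1").digest()))
```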
Footnote:
For readers not familiar with Boolean Logic - go and read up; but the essence here is:
If A and B are the same, (A XOR B) gives a result of 0
If A and B are different, (A XOR B) gives a non-0 result
If A and B are both 0, (A OR B) gives a result of 0
If either A or B is non-0, (A OR B) gives a non-0 result
So (looking at the second code block) the first time the XOR returns non-0 (different), the result becomes non-0 (different) and can never return to 0 (same).
A search for "cve timing attack" will provide you with a list of real-life examples.
It appears to be doing a correlation (XOR sum) character-wise between the strings, given they are of the same length. It could be required in situations where you need to know 'similarity' and not equality. Maybe that was the plan. The author might have wanted to extend this function further.
I ran into a problem where my RC6 implementation does not produce the ciphertext it should (according to the spec document). To be clearer, let me give you an example.
As you see, when the plaintext and key consist of zero bytes, it passes both tests (the ciphertext test and the decryption test).
To clarify further: the cipher values (both correct and wrong) are also ordered in little-endian fashion after encryption.
So my question is: where should I look for the invalid code?
I have a feeling that it has something to do with the byte ordering before passing the data to the encryption or key-scheduling functions.
The values I pass to the key-scheduling and encryption functions are straightforward arrays of 32-bit words (e.g. [0x00,0x10,0x00,0x00]), and I pass them straight to the algorithm (which I wrote by following the pseudo-code), so no other formatting is done before that.
They also start as follows:
def encrypt(plaintext,S):
    A,C = plaintext[0],plaintext[2]
    B = modulus(plaintext[1]+S[0])
    D = modulus(plaintext[3]+S[1])
    for i in range(1,r+1):
        ....
def keyGenerator(L):
    c = len(L)
    S = [int(0)]* (2*r+4)
    S[0] = P
    ....
I could use any help..
Thank you in advance!
By the way, the official test vectors can be found in THIS document's appendix
So I found out what was wrong in this case. It was indeed a byte-swapping problem. An all-zero input is symmetric under byte swapping, so it passed the tests, while inputs with mixed values ran but gave the wrong answer.
def swap32(x):
    return (((x << 24) & 0xFF000000) |((x << 8) & 0x00FF0000) |
            ((x >> 8) & 0x0000FF00) |((x >> 24) & 0x000000FF))
This function, which reverses the byte order of a 32-bit (4-byte) word, was very useful in my case. I had to swap the key bytes, the plaintext bytes at the beginning of encryption and again at the end of encryption, and then likewise at the beginning and at the end of decryption.
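As a possible alternative to the shift-and-mask version, the same swap can be written with int.to_bytes / int.from_bytes, and whole byte blocks can be loaded as little-endian 32-bit words with struct. This is a sketch in Python 3; bytes_to_words_le is a hypothetical helper name, and the assumption that the test vectors are given as byte strings to be read little-endian should be checked against the spec.

```python
import struct

def swap32(x: int) -> int:
    """Reverse the byte order of a 32-bit word."""
    return int.from_bytes(x.to_bytes(4, "big"), "little")

def bytes_to_words_le(block: bytes) -> list:
    """Split a byte string into little-endian 32-bit words.

    Assumes len(block) is a multiple of 4."""
    return list(struct.unpack("<%dI" % (len(block) // 4), block))

if __name__ == "__main__":
    print(hex(swap32(0x12345678)))
    print([hex(w) for w in bytes_to_words_le(bytes.fromhex("0123456789abcdef"))])
```

Loading the input with the right byte order once, at the boundary, may avoid having to swap at four separate places.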
I hope someone will find this useful in the future and won't be stuck in the same place like I was..
Cheers
I have a hex string f6befc34e3de2d30. I want to convert it to a signed long long, but
x['id'], = struct.unpack('>q', 'f6befc34e3de2d30'.decode('hex'))
gives:
-0b100101000001000000111100101100011100001000011101001011010000
whereas I expected:
0b1111011010111110111111000011010011100011110111100010110100110000
Thanks!
You could do long('f6befc34e3de2d30', 16)
bin(long('f6befc34e3de2d30', 16))
>>> '0b1111011010111110111111000011010011100011110111100010110100110000'
Edit: Following up on @Paul Panzer's comment. That would be true with a C-style long backed by ALU hardware: you could not have a signed integer larger than 2^63 - 1. However, Python's implementation is different: it represents big numbers as arrays of digits and uses the Karatsuba algorithm for multiplying large values. That is why this method works.
Edit 2: Following the OP's questions. There is no notion of "first bit as sign" here. In your question you explicitly want to use Python's long type, whose implementation is not the one you expect: the representation it uses isn't the same as what you may be familiar with in C. Instead, it represents large integers as an array. So if you want to implement some kind of first-bit logic, you have to do it yourself. I have no background or experience in that whatsoever, so the following may look completely wrong to someone who knows their stuff, but let me give you my take on this anyway.
I see two ways of proceeding. In the first one, you agree on a convention for the max long you want to work with, and then implement the same kind of logic the ALU does. Let us say, for the sake of argument, that we want to work with signed longs in the range [-2^127, 2^127-1]. We can do the following:
MAX_LONG = long('1' + "".join([str(0)]*127), 2)
def parse_nb(s):
    # returns the first bit and the significand in the case of a usual
    # integer representation
    b = bin(long(s, 16))
    if len(b) < 130: # deal with the case where the leading zeros are absent
        return "0", b[2:]
    else:
        return b[2], b[3:]
def read_long(s):
    # takes an hexadecimal representation of a string, and return
    # the corresponding long with the convention stated above
    sign, mant = parse_nb(s)
    b = "0b" + mant
    if sign == "0":
        return long(b, 2)
    else:
        return -MAX_LONG + long(b, 2)
read_long('5')
>>> 5L
# fffffffffffffffffffffffffffffffb is the representation of -5 using the
# usual integer representation, extended to 128 bits integers
read_long("fffffffffffffffffffffffffffffffb")
>>> -5L
For the second approach, you don't assume that there is a MAX_LONG, but that the first bit is always the sign bit. Then you would have to modify the parse_nb method above. I leave that as an exercise :).
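For comparison, the first approach (a fixed width agreed in advance) can be sketched more compactly with plain integer arithmetic. This version is written for Python 3, where int replaces long; hex_to_signed is an illustrative name, and the width is passed in explicitly rather than fixed at 128 bits.

```python
import struct

def hex_to_signed(hex_string: str, bits: int) -> int:
    """Interpret a hex string as a two's-complement integer of the
    given width (the top bit of that width is the sign bit)."""
    value = int(hex_string, 16)
    if value >= 1 << bits:
        raise ValueError("value does not fit in %d bits" % bits)
    if value & (1 << (bits - 1)):   # sign bit set: wrap into negatives
        value -= 1 << bits
    return value

if __name__ == "__main__":
    print(hex_to_signed("5", 128))
    print(hex_to_signed("f" * 31 + "b", 128))
    # For 64 bits this agrees with struct's signed '>q' format.
    raw = bytes.fromhex("f6befc34e3de2d30")
    print(hex_to_signed("f6befc34e3de2d30", 64) == struct.unpack(">q", raw)[0])
```

If the unsigned value is what is wanted, int(hex_string, 16) on its own is enough.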
I'm still on my RSA project, and now I can successfully create the keys, and encrypt a string with them
def encrypt(clear_message, public_key):
    clear_list = convert_into_unicode(clear_message)
    n = public_key[0]
    e = public_key[1]
    encrypted_message = str()
    for i, value in enumerate(clear_list):
        encrypted_value = str(pow(int(value), e, n))
        encrypted_message += encrypted_value
    return encrypted_message
def convert_into_unicode(clear_message):
    str_unicode = ''
    for car in clear_message:
        str_unicode += str(ord(car))
    # pad with zeros so the digit string splits into blocks of 5
    if len(str_unicode) % 5 != 0:
        str_unicode += (5 - len(str_unicode) % 5) * '0'
    clear_list = []
    i = 5
    while i <= len(str_unicode):
        clear_list.append(str_unicode[i-5:i])
        i += 5
    return clear_list
For example, encrypting the message 'Hello World' returns ['72101', '10810', '81113', '28711', '11141', '08100', '32330'] as clear_list then
'3863 111 1616 3015 1202 341 4096' as encrypted_message
The encrypt () function uses the other function to convert the string into a list of the Unicode values but put in blocks because I've read that otherwise, it would be easy to find the clear message only with frequency analysis.
Is it really that easy?
And as it probably is, I come to my main question. As you know, the Unicode values of a character are either double-digits or triple-digits. Before the encryption, the Unicode values are separated into blocks of 5 digits ('stack' -> '115 116 97 99 107' -> '11511 69799 10700')
But the problem is when I want to decrypt this, how do I know where I have to separate that string so that one number represents one character?
I mean, the original Unicode value could be either 11 or 115 (I know it couldn't really be 11, but that's only an example). So to decrypt and then get back the character, the problem is that I don't know how many digits I have to take.
I had thought of adding a 0 when the Unicode value is < 100, but:
1. Then it's easy to do the same frequency analysis as before.
2. Still, when I encrypt it, '087' can result in '467' and '089' can result in '046', so the problem is still there.
You're trying to solve real-world problems with a toy RSA implementation. The frequency analysis can be performed because no random padding of the plaintext message has been used. Random padding is required to make RSA secure.
For this kind of problem it is enough to directly use the Unicode code point (an integer value) per character as input to RSA. RSA can, however, only directly encrypt values in the range [0..N), where N is the modulus. If you input a larger value x, it will effectively be reduced to x mod N. In that case you lose information, and decryption will no longer recover the original value.
As for the ciphertext, just make this the string representation of the integer values separated by spaces and split them to read them in. This will take more space, but RSA always has a certain overhead.
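A minimal sketch of that scheme, using a deliberately tiny textbook key (p = 61, q = 53, which is an assumption for illustration and offers no security). Each code point is encrypted on its own and the results are joined with spaces, so decryption only has to split on whitespace.

```python
# Toy key for illustration only: p = 61, q = 53.
N = 61 * 53      # modulus (3233)
E = 17           # public exponent
D = 2753         # private exponent: (E * D) % ((61 - 1) * (53 - 1)) == 1

def encrypt(message: str) -> str:
    """Encrypt each code point separately; join the results with spaces."""
    blocks = []
    for ch in message:
        m = ord(ch)
        if m >= N:
            raise ValueError("code point %d does not fit below N" % m)
        blocks.append(str(pow(m, E, N)))
    return " ".join(blocks)

def decrypt(ciphertext: str) -> str:
    """Split on whitespace and decrypt each number back to a character."""
    return "".join(chr(pow(int(block), D, N)) for block in ciphertext.split())

if __name__ == "__main__":
    secret = encrypt("stack")
    print(secret)
    print(decrypt(secret))
```

Because there is no random padding, equal characters still encrypt to equal numbers, so frequency analysis remains possible, as noted above.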
If you want to implement secure RSA, then please read up on the PKCS#1 standard and beware of timing attacks etc. And, as Wyzard already indicated, please use hybrid cryptography (using symmetric encryption in addition to RSA).
Or use a standard library, now that you understand how RSA works in principle.
Your convert_into_unicode function isn't really converting anything "into" Unicode. Assuming clear_message is a Unicode string (the default string type in Python 3, or u'' in Python 2), it's (naturally) Unicode already, and you're using an awkward way of turning it into a sequence of bytes that you can encrypt. If clear_message is a byte string (the default in Python 2, or b'' in Python 3), all the characters fit in a byte already, so the whole process is unnecessary.
It's true that a Unicode string needs to be encoded as a byte sequence before you can encrypt it. The normal way to do that is with an encoding such as UTF-8 or UTF-16. You can do that by calling clear_message.encode('utf-8'). After decrypting, you can turn the decrypted byte string back into a Unicode string with decrypted_bytes.decode('utf-8').
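A short Python 3 sketch of that round trip. The helper names to_byte_values and from_byte_values are made up for this example; each byte value is below 256, so it fits under any realistic RSA modulus.

```python
def to_byte_values(clear_message: str) -> list:
    """Encode the text as UTF-8 and return the byte values (0..255)."""
    return list(clear_message.encode("utf-8"))

def from_byte_values(values) -> str:
    """Rebuild the text from decrypted byte values."""
    return bytes(values).decode("utf-8")

if __name__ == "__main__":
    values = to_byte_values("stack")
    print(values)
    print(from_byte_values(values))
```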
You don't need the convert_into_unicode function at all.
I am new to crypto and I am trying to interpret the below code. Namely, what does <xor> mean?
I have a secret key, secret_key. I also have a unique_id. I create a pad using the code below.
pad = hmac.new(secret_key, msg=unique_id, digestmod=hashlib.sha1).digest()
Once the pad is created, I have a price e.g. 1000. I am trying to follow this instruction which is pseudocode:
enc_price = pad <xor> price
In Python, what is the code to implement enc_price = pad <xor> price? What is the logic behind doing this?
As a note, a complete description of what I want to do is here:
https://developers.google.com/ad-exchange/rtb/response-guide/decrypt-price
Thanks
The binary (I assume that's what you need) xor operator is ^ in Python:
>>> 6 ^ 12
10
Binary xor works like this (numbers represented in binary):
     1234
 6 = 0110
12 = 1100
10 = 1010
For every pair of bits, if their sum is 1 (bits 1 and 3 in my example), the resulting bit is 1. Otherwise, it's 0.
The pad, and the plaintext "price" are each to be interpreted as a stream of bits. For each corresponding bit in the two streams, you take the "exclusive OR" of the pair of bits - if the bits are the same, you emit 0, if the bits are different, you emit 1. This operation is interesting because it's reversible: plaintext XOR pad -> ciphertext, and ciphertext XOR pad -> plaintext.
However, in Python, you won't usually do the XORing yourself because it's tedious and overly complex for a newbie; you want to use a popular encryption library such as PyCrypto to do the work.
You mean "Binary bitwise operations"?
The & operator yields the bitwise AND of its arguments, which must be plain or long integers. The arguments are converted to a common type.
The ^ operator yields the bitwise XOR (exclusive OR) of its arguments, which must be plain or long integers. The arguments are converted to a common type.
The | operator yields the bitwise (inclusive) OR of its arguments, which must be plain or long integers. The arguments are converted to a common type.
[update]
Since you can't xor a string and a number, you should either:
1. convert the number to a string padded to the same size and xor each byte (this may give you all sorts of strange "escape" problems with some chars, for example accidentally generating invalid unicode)
2. use the raw value (20 byte integer?) of the digest to xor and make a hexdigest of the resulting number.
Something like this (untested):
pad = hmac.new(secret_key, msg=unique_id, digestmod=hashlib.sha1).digest()
rawpad = reduce(lambda x, y: (x << 8) + y,
                [ b for b in struct.unpack('B' * len(pad), pad)])
enc_price = "%X" % (rawpad ^ price)
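In Python 3, where digest() returns bytes, the XOR step can also be done byte by byte. The sketch below assumes the price is encoded as an 8-byte big-endian integer and XORed with the first 8 bytes of the pad; that layout is an assumption made for illustration and should be checked against the linked guide, which describes further steps not shown here. The function names are hypothetical.

```python
import hashlib
import hmac

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, stopping at the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_pad(secret_key: bytes, unique_id: bytes) -> bytes:
    return hmac.new(secret_key, msg=unique_id, digestmod=hashlib.sha1).digest()

def encrypt_price(secret_key: bytes, unique_id: bytes, price: int) -> bytes:
    # Assumption: the price is an 8-byte big-endian integer.
    price_bytes = price.to_bytes(8, "big")
    return xor_bytes(make_pad(secret_key, unique_id)[:8], price_bytes)

def decrypt_price(secret_key: bytes, unique_id: bytes, enc_price: bytes) -> int:
    # XOR with the same pad undoes the encryption.
    return int.from_bytes(xor_bytes(make_pad(secret_key, unique_id)[:8], enc_price), "big")

if __name__ == "__main__":
    enc = encrypt_price(b"secret", b"unique-id", 1000)
    print(enc.hex(), decrypt_price(b"secret", b"unique-id", enc))
```

The pad depends on unique_id, so reusing a unique_id with the same key would reuse the pad and leak the XOR of the two prices.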
[update]
The OP wants to implement "DoubleClick Ad Exchange Real-Time Bidding Protocol".
This very article says that some sample Python code is available:
Initial Testing
You can test your bidding application internally using requester.tar.gz. This is a test python program that sends requests to a bidding application and checks the responses. The program is available on request from your Ad Exchange representative.
I did it like this:
def strxor(s1,s2):
    size = min(len(s1),len(s2))
    res = ''
    for i in range(size):
        res = res + '%c' % (ord(s1[i]) ^ ord(s2[i]))
    return res