I'm trying to guess the secret key to decrypt a message using Python 3. I know the message is going to be something like crypto{1XXXXXXX}, where the XXXXXXX part of the message is unknown.
The encrypted message is '0e0b213f26041e480b26217f27342e175d0e070a3c5b103e2526217f27342e175d0e077e263451150104' and I have the following code:
from pwn import xor

flkey = bytes.fromhex('0e0b213f26041e480b26217f27342e175d0e070a3c5b103e2526217f27342e175d0e077e263451150104')
print(flkey)
# xor() wants bytes, not str; pad the guess to the ciphertext length
y = xor(flkey, b'crypto{1' + b'x' * 33 + b'}')
print(y)
print(xor(flkey, y))  # the result of this call was previously discarded
My question is: how can I find the rest of the message, knowing only some part of it? I'm quite new to this topic of XOR.
EDIT: when I print(y) I obtain:
b'crypto{1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx}'
So I guess the length between the brackets is 34.
The weak point of the XOR operation in cryptography is that A XOR B XOR A = B. So when you know part of the plaintext message M for the corresponding encrypted message C, you immediately obtain that part of the key as K = M XOR C.
In particular:
>>> cypher = bytes.fromhex('0e0b213f26041e480b26217f27342e175d0e070a3c5b103e2526217f27342e175d0e077e263451150104')
>>> plaintext = b'crypto{1'
>>> key = ''.join(chr(c ^ m) for c, m in zip(cypher, plaintext))
>>> key
'myXORkey'
The chances are high that this is the entire key (it actually is, which is left as an exercise). This string will repeat as many times as needed to match the plain text length.
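Assuming those eight recovered bytes really are the whole key, a minimal sketch of the decryption (itertools.cycle handles the repetition for us):

from itertools import cycle

cypher = bytes.fromhex('0e0b213f26041e480b26217f27342e175d0e070a3c5b103e2526217f27342e175d0e077e263451150104')
key = b'myXORkey'

# XOR each ciphertext byte with the key byte at the same position,
# cycling the key as many times as needed to cover the message.
plaintext = bytes(c ^ k for c, k in zip(cypher, cycle(key)))
print(plaintext)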
Suppose now that this was not the entire key. We know, however, that the key repeats in a loop, so the part we already know, myXORkey, will be reused somewhere later. We can apply it at various offsets in the cypher and see where the output starts making sense. That gives us the key length and parts of the message. There are a few ways forward from here; the simplest is that, because we know some parts of the plaintext, we can fill in the missing parts by sense, and from there recover the remaining part of the key.
The following properties may help:
the key is sufficiently short
the key makes some sense
you know the language the plain text is written in
If the key is as long as the message, is truly random, and used only once, the cypher cannot be broken (See One-time pad).
In the generic case, when the plaintext and/or the key length is unknown, there is a more sophisticated method based on the Hamming distance and transposition. (The method was first discovered in the 19th century by Friedrich Kasiski to analyze the Vigenère cipher.)
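For example, a common way to estimate the key length in the spirit of Kasiski's method is to compare the normalized Hamming distance between consecutive ciphertext blocks for each candidate length; the true key length tends to score noticeably lower. A rough sketch (function names are mine):

def hamming(a, b):
    # number of differing bits between two equal-length byte strings
    return sum(bin(x ^ y).count('1') for x, y in zip(a, b))

def score_key_length(cipher, n):
    # average normalized distance between consecutive n-byte blocks
    blocks = [cipher[i * n:(i + 1) * n] for i in range(len(cipher) // n)]
    pairs = list(zip(blocks, blocks[1:]))
    return sum(hamming(a, b) for a, b in pairs) / (n * len(pairs))

cypher = bytes.fromhex('0e0b213f26041e480b26217f27342e175d0e070a3c5b103e2526217f27342e175d0e077e263451150104')
for n in range(2, 13):
    print(n, round(score_key_length(cypher, n), 3))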
Related
I would like to know why the following code snippet is a bad hashing function.
def computeHash(self, s):
    h = 0
    for ch in s:
        h = h + ord(ch)  # ord gives the ASCII value of character ch
    return h % HASH_TABLE_SIZE
If I dramatically increase the size of the hash table will this make up for the inadequacies of the hash function?
It's a bad hashing function because strings are order-sensitive, but the hash is not; "ab" and "ba" would hash identically, and for longer strings the collisions just get worse; all of "abc", "acb", "bac", "bca", "cab", "cba" would share the same hash.
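A quick demonstration (HASH_TABLE_SIZE is just a stand-in value here):

HASH_TABLE_SIZE = 1024

def compute_hash(s):
    # the hash from the question, as a free function
    h = 0
    for ch in s:
        h += ord(ch)
    return h % HASH_TABLE_SIZE

# every permutation collides
print(compute_hash('abc'), compute_hash('bca'), compute_hash('cba'))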
For an order-insensitive data structure (e.g. frozenset) a strategy like this isn't as bad, but it's still too easy to produce collisions by simply reducing one ordinal by one and increasing another by one, or by simply putting a NUL character in there; frozenset({'\0', 'a'}) would hash identically to just frozenset({'a'}); typically this is countered by incorporating the length of the collection into the hash in some manner.
Secure hashes (e.g. Python uses SipHash) are the best solution; a randomized seed combined with an algorithm that conceals the seed while incorporating it into the resulting hash makes it not only harder to accidentally create collisions (which simple improvements like hashing the index as well as the ordinal would help with, to make the hash order and length sensitive), but also makes it nigh impossible for malicious data sources to intentionally cause collisions (which the simple solutions don't handle at all).
The other problem with this hash is that it doesn't distribute the bits evenly; short strings mean only low bits are set in the hash. This means that increasing the table size is completely pointless when the strings are all relatively short. If all the strings are ASCII and 100 characters or less, the largest possible raw hash value is 12700; if you need to store a million such strings, you'll average nearly 79 collisions per bucket in the first 12,700 buckets (in practice far more in common buckets: there will be humps with many more collisions in the middling values, fewer near the beginning, and almost none at the end, since something like '\x7f' * 100 is the only way to reach that maximum value). No matter how many more buckets you have, they'll never be used. Technically, an open-addressing hash table might use them, but that would be largely equivalent to separate chaining per bucket, since all indices past 12700 would only be reached by the open-addressing "bounce around" algorithm; if that's badly designed, e.g. linear scanning, you might end up linearly scanning the whole table even if no entries actually collide for your particular hash (your bucket was filled by chaining, and it has to linearly scan until it finds an empty slot or the matching element).
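The arithmetic is easy to check:

# 100 ASCII characters sum to at most 127 * 100, so only the first
# 12,701 buckets are ever reachable, regardless of table size.
max_raw = 127 * 100
print(max_raw)                    # 12700
print(1_000_000 / (max_raw + 1))  # ~78.7 entries per reachable bucket on average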
Bad hashing function:
1. 'AC' and 'BB' would give the same result, and for big strings there can be many permutations in which the sum of ASCII values is the same.
2. Even strings of different lengths can give the same hash: 'A ' ('A' + space, 65 + 32) hashes the same as 'a' (97).
3. Any rearrangement of the characters in a string gives the same hash.
This is a bad hashing function. One big problem: re-arranging any or all characters returns the exact same hash.
Increasing the TABLE_SIZE does nothing to adjust for this.
I am searching for a library where I can hash a string and get plain numbers rather than alphanumeric output.
eg:
Input string: hello world
Salt value: 5467865390
Output value: 9223372036854775808
I have searched many libraries, but they all produce alphanumeric output, and I need plain numbers.
Is there any such library? Having only numbers as output means a higher chance of collisions, but that is fine for my business use case.
EDIT 1:
Also, I need to control the number of digits in the output. I want to store the value in a database column with a Numeric datatype, so I need to limit the number of digits to fit within that type's range.
Hexadecimal hash codes can be interpreted as (rather large) numbers:
import hashlib
hex_hash = hashlib.sha1('hello world'.encode('utf-8')).hexdigest()
int_hash = int(hex_hash, 16) # convert hexadecimal to integer
print(hex_hash)
print(int_hash)
outputs
2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
243667368468580896692010249115860146898325751533
EDIT: As asked in the comments, to limit the number to a certain range, you can simply use the modulus operator. Note, of course, that this will increase the possibility of collisions. For instance, we can limit the "hash" to 0 .. 9,999,999 with modulus 10,000,000.
limited_int_hash = int_hash % 10_000_000
print(limited_int_hash)
outputs
5751533
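Tying this back to the salt and the Numeric column from the question, here is a minimal sketch; how the salt should be mixed in was not specified, so simple prefixing is an assumption:

import hashlib

def numeric_hash(s, salt, digits=12):
    # deterministic numeric hash limited to `digits` decimal digits
    hex_hash = hashlib.sha1((salt + s).encode('utf-8')).hexdigest()
    return int(hex_hash, 16) % 10 ** digits

print(numeric_hash('hello world', '5467865390'))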
I think there is no need for libraries. You can simply accomplish this with the built-in hash() function in Python.
InputString="Hello World!!"
HashValue=hash(InputString)
print(HashValue)
print(type(HashValue))
Output:
8831022758553168752
<class 'int'>
Solution for the problem based on the latest EDIT:
The above method is the simplest solution, but note that the hash of a string changes on each run of the interpreter (hash randomization); this is what helps prevent attackers from deliberately engineering collisions against our application.
If you would like to switch off the randomization, you can do so by setting the
PYTHONHASHSEED environment variable to zero.
For information on switching off the randomization check the official docs https://docs.python.org/3.3/using/cmdline.html#cmdoption-R
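A quick way to see both behaviours; hash() has to be run in separate interpreter processes, because the seed is fixed for the lifetime of a process (this assumes PYTHONHASHSEED is not already set in your environment):

import os
import subprocess
import sys

cmd = [sys.executable, '-c', "print(hash('hello world'))"]

# Differs between runs: each interpreter picks a fresh random seed.
print(subprocess.check_output(cmd))
print(subprocess.check_output(cmd))

# Identical between runs once the seed is pinned.
env = {**os.environ, 'PYTHONHASHSEED': '0'}
print(subprocess.check_output(cmd, env=env))
print(subprocess.check_output(cmd, env=env))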
I thought that this would be a fairly common and straightforward problem, but I searched and was not able to find it.
I am a novice Python user, mostly self-taught. I'm trying what I thought would be a fairly straightforward exercise: generating a hash value from an input phrase. Here is my code:
import hashlib
target = input("Give me a phrase: ").encode('utf-8')
hashed_target = hashlib.sha256(target)
print(hashed_target)
I execute this and get the prompt:
Give me a phrase:
I entered the phrase "Give me liberty or give me death!" and got the hash output 0x7f8ed43d6a80.
Just to test, I tried again with the same phrase, but got a different output: 0x7f1cc23bca80.
I thought that was strange, so I copied the original input and pasted it in, and got a third, different hash output: 0x7f358aabea80.
I'm sure there must be a simple explanation. I'm not getting any errors, and the code looks straightforward, but the hashes, while similar, are definitely different.
Can someone help?
You are directly printing the hash object itself, whose __repr__ string contains its memory address. You need to use the hexdigest() or digest() method to get the hash:
>>> import hashlib
>>> testing=hashlib.sha256(b"sha256 is much longer than 12 hex characters")
>>> testing
<sha256 HASH object @ 0x7f31c1c64670>
>>> hashed_testing=testing.hexdigest()
>>> hashed_testing
'a0798cfd68c7463937acd7c08e5c157b7af29f3bbe9af3c30c9e62c10d388e80'
>>>
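Applied to the code from the question, the only change needed is to print the digest rather than the hash object:

import hashlib

target = input("Give me a phrase: ").encode('utf-8')
hashed_target = hashlib.sha256(target)
print(hashed_target.hexdigest())  # now stable for the same input phrase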
While trying to understand how to use lambda, I came across a reply in which the poster said that there is nothing you can do using lambda that you can't do using normal functions.
I have been trying hard to understand how to call a function from within itself in Python. I'm no expert, but I'm learning, and I have come across a few problems that require recursive functions, called multiple times, to get a certain answer.
A guy used a lambda function to do that. I tried to understand it but failed, so I thought that if the same thing can be implemented using normal functions, it would be easier to start understanding lambda from that point on.
Let's take this sentence for example:
print"\n".join(" ".join([(lambda f:(lambda x:f(lambda*r:x(x)(*r)))(lambda x:f(lambda*r:x(x)(*r))))(lambda f:lambda q,n:len(q)<=n and q or f(q[len(q)/2:],n)+f(q[:len(q)/2],n))(k,z+1)for z,k in enumerate(i[:-1].split())]) for i in list(s)[1:])
This has been used in the Facebook hacker cup, I couldn't solve this problem as I was lost in the loops.
This sentence takes a few words, let's say "Stackoverflow rocks and it is great"
The problem statement in Facebook is :
You've intercepted a series of transmissions encrypted using an interesting and stupid method, which you have managed to decipher. The messages contain only spaces and lowercase English characters, and are encrypted as follows: for all words in a sentence, the ith word (1-based) is replaced with the word generated by applying the following recursive operation f(word, i):
If the length of word is less than or equal to i,
return word.
Otherwise, return f(right half of word, i) +
f(left half of word, i).
If word is of odd length, it is split such that the right side is longer. You've decided to have a little fun with whoever is sending the messages, and to broadcast your own messages encrypted in the same style that they are using.
Input
Your input will begin with an integer N, followed by a newline and then N test cases. Each case consists of an unencrypted sentence containing only spaces and lowercase letters, and cases are newline-separated. There will be no leading or trailing spaces in a sentence and there will be at most 1 space character between any otherwise-adjacent characters
Output
Output, for each case and separated by newlines, the contents of the encrypted sentence after applying the encoding method described above to it. You may ignore traditional capitalization rules and stick to all lowercase letters.
Constraints
5 ≤ N ≤ 25
Sentences will contain no more than 100 characters.
Python lambdas are simply syntactic sugar. "Regular" functions have the same capabilities such as closures, because, remember, you can define them inside another function, just as lambda does.
def some_func():
    some_expr_using(lambda args: 42)

# becomes:

def some_func():
    def unique_name(args):
        return 42
    some_expr_using(unique_name)
Except that when inspecting the lambda object, its name is set to "<lambda>" rather than unique_name as above; the remaining differences are equally superficial, relating to how the source code is spelled rather than how it behaves.
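You can see the naming difference directly:

f = lambda args: 42

def unique_name(args):
    return 42

print(f.__name__)            # '<lambda>'
print(unique_name.__name__)  # 'unique_name'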
Your code could be written as:
def y(f):
    def a(x):
        def b(*r):
            return x(x)(*r)
        return f(b)
    return a(a)

def fx(f):
    def x(q, n):
        # changed "a and b or c": different semantics if b can be falsy
        if len(q) <= n:
            return q
        else:
            return f(q[len(q) / 2:], n) + f(q[:len(q) / 2], n)
    return x

print "\n".join(
    " ".join(y(fx)(k, z + 1) for z, k in enumerate(i[:-1].split()))
    for i in list(s)[1:])
(But only if I've translated it correctly; double-check. :P)
This code is an example of a fixed-point combinator, which I only barely understand, and it's hard to give better names without knowing more context (I didn't try to decipher the actual problem statement). It can be unraveled to a recursive function which calls itself by name directly.
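Named recursion makes the underlying operation much easier to follow; here is the same transformation written directly in Python 3 (function names are mine):

def encrypt_word(word, i):
    # f(word, i) from the problem statement: recurse until the piece
    # is no longer than i, swapping the left and right halves as we go
    if len(word) <= i:
        return word
    mid = len(word) // 2  # floor division: the right half is longer for odd lengths
    return encrypt_word(word[mid:], i) + encrypt_word(word[:mid], i)

def encrypt_sentence(sentence):
    # the ith word (1-based) is transformed with f(word, i)
    return ' '.join(encrypt_word(w, i) for i, w in enumerate(sentence.split(), 1))

print(encrypt_sentence('stackoverflow rocks and it is great'))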
I have a massive string that I'm trying to parse as a series of tokens in string form, and I found a problem: because many of the strings are alike, sometimes doing string.replace() will cause previously replaced characters to be replaced again.
Say the string being replaced is 'goto', and it gets replaced by '41' (hex), which is converted into ASCII ('A'). Later on, the string 'A' is also to be replaced, so that already-converted token gets replaced again, causing problems.
What would be the best way to get the strings replaced only once? Breaking each token off the original string and searching for them one at a time takes very long.
This is the code I have now. Although it more or less works, it's not very fast:
# The largest token is 8 ASCII chars long.
# 'out' is the string with the final outputs.
while len(data) != 0:
    length = 8
    # sorry THC4k, I used your code at first, but it didn't work out
    # for this and I was too lazy to change it
    while reverse_search(data[:length]) is None:
        length -= 1
    out += reverse_search(data[:length])
    data = data[length:]
If you're trying to substitute strings at once, you can use a dictionary:
translation = {'PRINT': '32', 'GOTO': '41'}
code = ' '.join(translation.get(i, i) for i in code.split(' '))
Splitting and joining are linear in the length of the string, and each dictionary lookup is O(1) on average, so the whole substitution runs in roughly O(|S|) time. Very fast, although memory usage for the split list could be quite substantial. Keeping a record of what has already been substituted would equally let you solve the re-replacement problem in linear time.
Unless there is a function in Python to translate strings via dictionaries that I don't know about, this seems to be the simplest way of putting it.
It turns
10 PRINT HELLO
20 GOTO 10
into
10 32 HELLO
20 41 10
I hope this has something to do with your problem.
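If the tokens are not cleanly space-separated, a single pass with re.sub sidesteps the re-replacement problem entirely, because each position of the input is consumed at most once. A sketch:

import re

translation = {'PRINT': '32', 'GOTO': '41'}

# One alternation pattern over all keys, longest first so that no key
# shadows a longer key sharing its prefix; re.sub scans the input left to
# right in a single pass, so replaced text is never revisited.
keys = sorted(translation, key=len, reverse=True)
pattern = re.compile('|'.join(map(re.escape, keys)))
result = pattern.sub(lambda m: translation[m.group(0)], '10 PRINT HELLO\n20 GOTO 10')
print(result)  # 10 32 HELLO / 20 41 10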