I wrote this to brute-force Caesar ciphers using a dictionary file from http://www.math.sjsu.edu/~foster/dictionary.txt. It uses three functions: lang_lib() loads the text of the dictionary into a list, and isEnglish() checks what percentage of the words in a phrase appear in the dictionary and returns True if at least 60% of them match. Using these, a Caesar cipher function runs through all shifts and checks each result for English words. It should return the result with the highest percentage, but it only seems to work for shifts 1-18. I can't figure out why it isn't working.
def lang_lib():
    file = open('dictionary.txt', 'r')
    file_read = file.read()
    file_split = file_read.split()
    words = []
    for word in file_split:
        words.append(word)
    file.close()
    return words
dictionary = lang_lib()
def isEnglish(text):
    split_text = text.lower().split()
    counter = 0
    not_in_dict = []
    for word in split_text:
        if word in dictionary:
            counter += 1
        else:
            not_in_dict.append(word)
    length = len(split_text)
    text_percent = ((counter / length) * 100)
    #print(text_percent)
    if text_percent >= 60.0:
        return True
    else:
        return False
alphabet = "abcdefghijklmnopqrstuvwxyz0123456789!##$%/."
def caeser(text): #Put in text, and it will spit out all possible values
    lower_text = text.lower()
    ciphertext = "" #stores current cipher value
    matches = [] #stores possible matches
    for i in range(len(alphabet)): #loops for the length of input alphabet
        for c in lower_text:
            if c in alphabet:
                num = alphabet.find(c)
                newnum = num - i
                if newnum >= len(alphabet):
                    newnum -= len(alphabet)
                elif newnum < 0:
                    newnum += len(alphabet)
                ciphertext = ciphertext + alphabet[newnum]
            else:
                ciphertext = ciphertext + c
            testing = isEnglish(ciphertext)
            for text in ciphertext:
                if testing == True and len(ciphertext) == len(lower_text):
                    matches.append(ciphertext)
                    return i, matches
        ciphertext = "" #clears ciphertext so it doesn't get cluttered
print(caeser('0x447 #0x$x 74w v0%5')) #shift of 19
print(caeser('zw336 #zw9w 63v uz#4')) #shift of 18
Thanks guys.
This part is indented too far, as @tripleee suggested:
    testing = isEnglish(ciphertext)
    for text in ciphertext:
        if testing == True:
            matches.append(ciphertext)
            return i, matches
Also you don't need to check the length if you have the indentation right and let the previous loop complete....
I found out that the dictionary.txt does not contain 2 or 3 letter words, so it would skew long inputs with many of these words, and return False. I added a list of common words, so now all inputs work accurately.
If anyone wants to help me make this code more efficient, I'd love some pointers. I am very new to Python.
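On the efficiency question, one possible direction is sketched below. It is only a sketch under assumptions: it uses a plain 26-letter alphabet rather than the longer alphabet above, a tiny stand-in word set (`WORDS`, hypothetical) instead of dictionary.txt, and hypothetical helper names. Storing the words in a set makes each `in` check a hash lookup instead of a scan of the whole list, and scoring every shift before picking the best one returns the highest-percentage result rather than the first one over the threshold.

```python
import string

ALPHABET = string.ascii_lowercase

# Hypothetical stand-in for the words loaded from dictionary.txt.
WORDS = {"hello", "there", "world", "the", "a"}


def english_score(text, words=WORDS):
    """Return the fraction of whitespace-separated tokens found in `words`."""
    tokens = text.lower().split()
    if not tokens:  # avoid dividing by zero on empty input
        return 0.0
    return sum(token in words for token in tokens) / len(tokens)


def shift_text(text, shift):
    """Shift each letter back by `shift` places; leave other characters alone."""
    out = []
    for ch in text.lower():
        if ch in ALPHABET:
            # The modulo handles wrap-around in both directions.
            out.append(ALPHABET[(ALPHABET.index(ch) - shift) % len(ALPHABET)])
        else:
            out.append(ch)
    return "".join(out)


def best_shift(ciphertext):
    """Try every shift and return (shift, candidate) with the highest score."""
    candidates = [(s, shift_text(ciphertext, s)) for s in range(len(ALPHABET))]
    return max(candidates, key=lambda pair: english_score(pair[1]))


print(best_shift("khoor wkhuh zruog"))
```

With the stand-in word set, only a shift of 3 turns every token into a known word, so that candidate wins.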
I have been working for a while on a cipher program in Python for an online course. I keep going back and forth between successes and setbacks, and recently thought I had figured it out. That is, until I compared the output I was getting to what the course said I should actually be getting. When I input "The crow flies at midnight!" and a key of "boom", I should be getting back "Uvs osck rmwse bh auebwsih!" but instead get back "Tvs dfci tzufg mu auebwsih!" I am at a loss for what my program is doing, and could use a second look at my program from someone. Unfortunately, I don't have a person in real life to go to lol. Any help is greatly appreciated.
alphabet = "abcdefghijklmnopqrstuvwxyz"
def alphabet_position(letter):
    lower_letter = letter.lower() #Makes any input lowercase.
    return alphabet.index(lower_letter) #Returns the position of input as a number.
def vigenere(text,key):
    m = len(key)
    newList = ""
    for i in range(len(text)):
        if text[i] in alphabet:
            text_position = alphabet_position(text[i])
            key_position = alphabet_position(key[i % m])
            value = (text_position + key_position) % 26
            newList += alphabet[value]
        else:
            newList += text[i]
    return newList
print (vigenere("The crow flies at midnight!", "boom"))
# Should print out Uvs osck rmwse bh auebwsih!
# Actually prints out Tvs dfci tzufg mu auebwsih!
OK. The problem is that the expected cipher skips non-alphabetical characters and continues on the next letter with the same key position. In your implementation, a non-alphabetical character uses up a key letter too.
The crow
boo mboo // expected
boo boom // your version
So here is the corrected code:
alphabet = "abcdefghijklmnopqrstuvwxyz"
def alphabet_position(letter):
    lower_letter = letter.lower() #Makes any input lowercase.
    return alphabet.index(lower_letter) #Returns the position of input as a number.
def vigenere(text,key):
    text_lower = text.lower()
    m = len(key)
    newList = ""
    c = 0 #Counts only the letters encrypted so far, so it indexes the key.
    for i in range(len(text)):
        if text_lower[i] in alphabet:
            text_position = alphabet_position(text[i])
            key_position = alphabet_position(key[c % m])
            value = (text_position + key_position) % 26
            if text[i].isupper():
                newList += alphabet[value].upper()
            else:
                newList += alphabet[value]
            c += 1
        else:
            newList += text[i]
    return newList
print (vigenere("The crow flies at midnight!", "boom"))
# Prints out Uvs osck rmwse bh auebwsih!
In your vigenere function, set text = text.lower().
To find such problems, just follow one letter through and see what happens. It is easy to see that it doesn't work because 'T' is not in the alphabet but 't' is, so you should convert the text to lowercase.
It looks like the problem is that you didn't remember to handle the spaces. The "m" of "boom" should be used to encrypt the "c" of "crow", not the space between "The" and "crow".
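The point about spaces can also be expressed by drawing key letters from an iterator, so that a key letter is consumed only when a plaintext letter is actually encrypted. This is a minimal sketch, not the course's reference solution; `vigenere_encrypt` is a hypothetical name, and it assumes the key contains only ASCII letters.

```python
from itertools import cycle
import string

ALPHABET = string.ascii_lowercase


def vigenere_encrypt(text, key):
    """Encrypt `text`, advancing the key only when a letter is consumed."""
    # Endless stream of key shifts: b -> 1, o -> 14, o -> 14, m -> 12, b -> 1, ...
    key_shifts = cycle(ALPHABET.index(k) for k in key.lower())
    out = []
    for ch in text:
        if ch.lower() in ALPHABET:
            position = ALPHABET.index(ch.lower())
            shifted = ALPHABET[(position + next(key_shifts)) % 26]
            out.append(shifted.upper() if ch.isupper() else shifted)
        else:
            out.append(ch)  # spaces and punctuation do not use up a key letter
    return "".join(out)


print(vigenere_encrypt("The crow flies at midnight!", "boom"))
```

Because `next(key_shifts)` is called only inside the letter branch, there is no separate counter to keep in sync with the text index.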
def caesar_cipher(offset, string):
    words = string.replace(" ", " ")
    cipher_chars = "abcdefghijklmnopqrstuvwxyz"
    word_i = 0
    while word_i < len(words):
        word = words[word_i]
        letter_i = 0
        while letter_i < len(word):
            char_i = ord(word[letter_i]) - ord("c")
            new_char_i = (char_i + offset) % 26
            value = chr(new_char_i + ord("c"))
            letter_i += 1
        word_i += 1
    return words.join(value)
print caesar_cipher(3, "abc")
Hey everyone, for some reason my Caesar cipher is only printing the last letter of my string when I want it to encipher the whole string. For example, if I use an offset of 3 with the string "abc", it should print def, but instead it just prints f. Any help is greatly appreciated!
value is overwritten in the loop. You want to create a list passed to join (ATM you're joining only 1 character):
value = []
then
value.append(chr(new_char_i + ord("c")))
the join statement is also wrong: just do:
return "".join(value)
Note that there are other issues in your code. It seems intended to process several words, but it doesn't, so a lot of the loops don't actually loop (there's no list of words, just a single word). What you are doing can be summarized as follows (using a simple list comprehension):
def caesar_cipher(offset, string):
    return "".join([chr((ord(letter) - ord("c") + offset) % 26 + ord("c")) for letter in string])
and for a sentence:
print(" ".join([caesar_cipher(3, w) for w in "a full sentence".split()]))
As a nice commenter noted, using c as the start letter is not correct: any letter whose shifted result should wrap around to a or b comes out as a character past z instead (with an offset of 3, x and y are trashed). There's no reason not to start from a (the results are the same for the rest of the letters):
def caesar_cipher(offset, string):
    return "".join([chr((ord(letter) - ord("a") + offset) % 26 + ord("a")) for letter in string])
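The one-liner above shifts every character it is given, including spaces and punctuation, which is why the sentence example splits on spaces first. A possible variant that passes non-letters through unchanged and keeps case is sketched here; `caesar_shift` is a hypothetical name, and the sketch assumes only ASCII letters should be shifted.

```python
def caesar_shift(offset, text):
    """Shift ASCII letters by `offset`; pass every other character through."""
    result = []
    for ch in text:
        if "a" <= ch <= "z":
            result.append(chr((ord(ch) - ord("a") + offset) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            result.append(chr((ord(ch) - ord("A") + offset) % 26 + ord("A")))
        else:
            result.append(ch)  # spaces, digits and punctuation are kept as-is
    return "".join(result)


print(caesar_shift(3, "abc"))
print(caesar_shift(3, "xyz, Hello!"))
print(caesar_shift(-3, "def"))  # a negative offset undoes the shift
```

Python's `%` always returns a non-negative result for a positive modulus, so the same function handles decryption with a negative offset.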
Aside: a quick similar algorithm is rot13. Not really a cipher but it's natively supported:
import codecs
print(codecs.encode("a full sentence","rot13"))
(apply on the encoded string to decode it)
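A short round trip illustrates the remark about decoding: because rot13 shifts by half the alphabet, applying it a second time restores the original text. The variable names are only for illustration.

```python
import codecs

encoded = codecs.encode("a full sentence", "rot13")
decoded = codecs.encode(encoded, "rot13")  # same call, applied to the output

print(encoded)
print(decoded)
```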
The first two functions are mine:
def rotated(n: int):
    '''Returns a rotated letter if parameter is greater than 26'''
    ALPHABET = 'abcdefghijklmnopqrstuvwxyz'
    if n >= 26:
        n %= 26
    return ALPHABET[n:26] + ALPHABET[:n]
assert rotated(0) == 'abcdefghijklmnopqrstuvwxyz'
assert rotated(26) == 'abcdefghijklmnopqrstuvwxyz'
def Caesar_decrypt(text: str, key: int) -> str:
    '''Returns a decryption of parameter text and key'''
    text = text.lower()
    key_to_zero = str.maketrans(rotated(key),rotated(0))
    return text.translate(key_to_zero)
But my partner worked on the 3rd function:
def Caesar_break(code: str)-> str:
    'Decrypts the coded text without a key'
    file = open('wordlist.txt', 'r')
    dic = []
    dlist = file.readlines()
    wl = []
    l = []
    cl = []
    swl = []
    sw = ''
    for words in code:
        if words.isalnum() or words.isspace():
            l.append(words)
        else:
            l.append(' ')
    Ncode = ''.join(l)
    codelist = Ncode.split()
    high = 0
    for i in range(1,27):
        highesthit = 0
        hit = 0
        out = Caesar_decrypt(Ncode, i)
        e = 0
        l = 0
        while l < len(dlist):
            dic.append(dlist[l].split()[0])
            l += 1
        while e < len(dic):
            if out == dic[e]:
                hit += 1
            e += 1
        if hit > highesthit:
            high = i
            highesthit = hit
    return(Caesar_decrypt(Ncode, high))
I can't contact him right now, so I was wondering if there is a simpler way to break the Caesar code using brute force. My partner used too many random letters in his code, so I can't really understand it.
Note: "wordlist.txt" is a document we downloaded containing all of the words in the dictionary. Here is the link for reference.
The Caesar_break code is supposed to work like this:
Caesar_break('amknsrcp qagclac') == 'computer science'
Code breaking! Yay! The simplest way to break the Caesar cipher is to assume that your encoded text is representative of the actual language it's in with respect to the frequency of letters. In English, that relative frequency looks kind of like:
"etaoinshrdlcumwfgypbvkjxqz"
# most to least common characters in English according to
# https://en.wikipedia.org/wiki/Letter_frequency
The fastest way to break a Caesar cipher, then, is to create a collections.Counter of the letters in your encrypted phrase, take the most common few, and assume each one (in turn) is e. Calculate your difference from there, and apply the decrypt cipher. Test to see if it's valid English, and ta-da!
import collections

def difference(a: str, b: str) -> int:
    """Return how far b is from a in the alphabet (may be negative)."""
    a, b = a.lower(), b.lower()
    return ord(b) - ord(a)

def english_test(wordlist: "sequence of valid english words",
                 text: str) -> bool:
    """english_test checks that every word in `text` is in `wordlist`"""
    # split() so that whole words are tested, not individual characters
    return all(word in wordlist for word in text.split())

def bruteforce_caeser(text: str) -> str:
    with open('path/to/wordlist.txt') as words:
        wordlist = {word.strip() for word in words}
        # set comprehension!
    c = collections.Counter(filter(lambda ch: not ch.isspace(), text))
    most_common = c.most_common() # ordered by most -> least common
    for ch, _ in most_common:
        diff = difference('e', ch)
        # Caesar_decrypt is the function defined in the question
        plaintext = Caesar_decrypt(text, diff)
        if english_test(wordlist, plaintext):
            return plaintext
There's a subtle logic error in this code though, w.r.t. an assumption made about the input text. I'll leave it as an exercise to the student to find the logic error and think of what small change could be made to ensure a result on any input. As a hint: try rotating and then decrypting the following phrase:
Judy I don't think it's right for you to contact such a marksman as this man, for without warning this marksman could shoot and kill you.
If you are sure that the ciphertext was encrypted with Caesar, (x+3) mod 26, you can just shift the letters back. I would make all the text lowercase first, then get the ASCII values of all the characters. For example, ascii(a)=97, so make it 97-97=0; for b make it 98-97=1. Then I would make 2 arrays: 1 for the characters and 1 for the integer values of the characters....
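The arithmetic described above (map a to 0, b to 1, and so on, then work modulo 26) can be combined with simply trying all 26 shifts and keeping the candidate with the most dictionary hits. The following is a rough sketch under assumptions: `WORDS` is a hypothetical stand-in for the contents of wordlist.txt, and `shift_letters` and `caesar_break` are illustrative names, not the asker's functions.

```python
# Hypothetical stand-in for the contents of wordlist.txt.
WORDS = {"computer", "science"}


def shift_letters(text, shift):
    """Map a..z to 0..25, add `shift` modulo 26, and map back to letters."""
    out = []
    for ch in text.lower():
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - 97 + shift) % 26 + 97))
        else:
            out.append(ch)  # keep spaces and punctuation
    return "".join(out)


def caesar_break(ciphertext, words=WORDS):
    """Return the candidate plaintext with the most dictionary hits."""
    best, best_hits = ciphertext, -1
    for shift in range(26):
        candidate = shift_letters(ciphertext, shift)
        hits = sum(word in words for word in candidate.split())
        if hits > best_hits:
            best, best_hits = candidate, hits
    return best


print(caesar_break("amknsrcp qagclac"))
```

Since every shift is tried, this does not depend on any particular letter being frequent in the message; the cost is 26 passes over the text, which is negligible for short inputs.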
Eventually I will be able to post simple questions like this in a chat room, but for now I must post it here. I am still struggling with comparison issues in Python. I have a list containing strings that I obtained from a file. I have a function which takes in the word list (previously created from a file) and some 'ciphertext'. I am trying to brute-force crack the ciphertext using a shift cipher. My issue is the same as with comparing integers. Although I can see, when debugging with print statements, that my ciphertext gets shifted to a word in the word list, the comparison never evaluates to True. I am probably comparing two different variable types, or a \n is throwing the comparison off. Sorry for all of the posts today; I am doing lots of practice problems in preparation for an upcoming assignment.
def shift_encrypt(s, m):
    shiftAmt = s % 26
    msgAsNumList = string2nlist(m)
    shiftedNumList = add_val_mod26(msgAsNumList, shiftAmt)
    print 'Here is the shifted number list: ', shiftedNumList
    # Take the shifted number list and convert it back to a string
    numListtoMsg = nlist2string(shiftedNumList)
    msgString = ''.join(numListtoMsg)
    return msgString
def add_val_mod26(nlist, value):
    newValue = value % 26
    print 'Value to Add after mod 26: ', newValue
    listLen = len(nlist)
    index = 0
    while index < listLen:
        nlist[index] = (nlist[index] + newValue) % 26
        index = index + 1
    return nlist
def string2nlist(m):
    characters = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    numbers = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
    newList = []
    msgLen = len(m) # var msgLen will be an integer of the length
    index = 0 # iterate through message length in while loop
    while index < msgLen:
        letter = m[index] # iterate through message m
        i = 0
        while i < 26:
            if letter == characters[i]:
                newList.append(numbers[i])
            i = i + 1
        index = index + 1
    return newList
def nlist2string(nlist):
    characters = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    numbers = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
    newList = []
    nListLen = len(nlist)
    index = 0
    while index < nListLen:
        num = nlist[index]
        newNum = num % 26
        i = 0
        while i < 26:
            num1 = newNum
            num2 = numbers[i]
            if (num1 == num2):
                newList.append(characters[i])
            i = i + 1
        index = index + 1
    return newList
def wordList(filename):
    fileObject = open(filename, "r+")
    wordsList = fileObject.readlines()
    return wordsList
def shift_computePlaintext(wlist, c):
    index = 0
    while index < 26:
        newCipher = shift_encrypt(index, c)
        print 'The new cipher text is: ', newCipher
        wordlistLen = len(wlist)
        i = 0
        while i < wordlistLen:
            print wlist[i]
            if newCipher == wlist[i]:
                return newCipher
            else:
                print 'Word not found.'
            i = i + 1
        index = index + 1
print 'Take Ciphertext and Find Plaintext from Wordlist Function: \n'
list = wordList('test.txt')
print list
plainText = shift_computePlaintext(list, 'vium')
print 'The plaintext was found in the wordlist: ', plainText
When the shift amount = 18, the ciphertext = name which is a word in my wordlist, but it never evaluates to True. Thanks for any help in advance!!
It's hard to be sure with the information we have so far, but here's a guess:
wordsList = fileObject.readlines()
This is going to return you a list of strings with the newlines preserved, like:
['hello\n', 'my\n', 'name\n', 'is\n', 'jesi\n']
So, inside shift_computePlaintext, when you iterate over wlist looking for something that matches the decrypted 'vium', you're looking for a string that matches 'name', and none of them match, including 'name\n'.
In other words, exactly what you suspected.
There are a few ways to fix this, but the most obvious are to use wlist[i].strip() instead of wlist[i], or to strip everything in the first place by using something like wordsList = [line.strip() for line in fileObject] instead of wordsList = fileObject.readlines().
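The effect of the trailing newlines can be reproduced without the real file. In this small illustration, `io.StringIO` is a stand-in for the opened test.txt (an assumption about its contents, based on the list shown above), and it is written in Python 3 syntax.

```python
import io

# Hypothetical stand-in for open('test.txt'); its lines end in '\n'.
fake_file = io.StringIO("hello\nmy\nname\nis\njesi\n")

raw_lines = fake_file.readlines()
print('name' in raw_lines)   # the list holds 'name\n', so this is False

stripped = [line.strip() for line in raw_lines]
print('name' in stripped)    # after stripping, the comparison succeeds
```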
A few side notes:
There is almost never a good reason to call readlines(). That returns a list of lines that you can iterate over… but the file object itself was already an iterable of lines that you can iterate over. If you really need to make sure it's a list instead of some other kind of iterable, or make a separate copy for later, or whatever, just call list on it, as you would with any other iterable.
You should almost never write a loop like this:
index = 0
while index < 26:
    # ...
    index = index + 1
Instead, just do this:
for index in range(26):
It's easier to read, harder to get wrong (subtle off-by-one errors are responsible for half the frustrating debugging you will do in your lifetime), etc.
And if you're looping over the length of a collection, don't even do that. Instead of this:
wordlistLen = len(wlist)
i = 0
while i < wordlistLen:
    # ...
    word = wlist[i]
    # ...
    i = i + 1
… just do this:
for word in wlist:
… or, if you need both i and word (which you occasionally do):
for i, word in enumerate(wlist):
Meanwhile, if the only reason you're looping over a collection is to check each of its values, you don't even need that. Instead of this:
wordlistLen = len(wlist)
while i < wordlistLen:
    print wlist[i]
    if newCipher == wlist[i]:
        return newCipher
    else:
        print 'Word not found.'
    i = i + 1
… just do this:
if newCipher in wlist:
    return newCipher
else:
    print 'Word not found.'
Here, you've actually got one of those subtle bugs: you print 'Word not found' over and over, instead of only printing it once at the end if it wasn't found.
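Putting the side notes together, the search function might end up looking something like the sketch below. It is written in Python 3 syntax, unlike the Python 2 code in the question, and `shift_word` and `find_plaintext` are hypothetical names rather than the asker's functions. It assumes the ciphertext contains only lowercase letters.

```python
import string

ALPHABET = string.ascii_lowercase


def shift_word(shift, word):
    """Shift each lowercase letter of `word` forward by `shift`, modulo 26."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + shift) % 26] for ch in word)


def find_plaintext(wordlist, ciphertext):
    """Return (shift, word) for the first shift found in `wordlist`, else None."""
    words = {w.strip() for w in wordlist}  # drop trailing newlines once
    for shift in range(26):
        candidate = shift_word(shift, ciphertext)
        if candidate in words:
            return shift, candidate
    print('Word not found.')  # reached only after all 26 shifts have failed
    return None


print(find_plaintext(['hello\n', 'my\n', 'name\n', 'is\n', 'jesi\n'], 'vium'))
```

The "not found" message sits after the loop, so it is printed at most once per call.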