Python String Comparisons Using A Word List

Eventually I will be able to post simple questions like this in a chat room, but for now I must post it here. I am still struggling with comparison issues in Python. I have a list of strings that I read from a file, and a function that takes that word list and some 'ciphertext'. I am trying to brute-force the ciphertext using a shift cipher. My issue is the same as the one I had comparing integers: although my debugging print statements show the ciphertext being shifted to a word that is in the word list, the comparison never evaluates to True. I am probably comparing two different types, or a \n is throwing the comparison off. Sorry for all of the posts today; I am doing lots of practice problems in preparation for an upcoming assignment.
def shift_encrypt(s, m):
    shiftAmt = s % 26
    msgAsNumList = string2nlist(m)
    shiftedNumList = add_val_mod26(msgAsNumList, shiftAmt)
    print 'Here is the shifted number list: ', shiftedNumList
    # Take the shifted number list and convert it back to a string
    numListtoMsg = nlist2string(shiftedNumList)
    msgString = ''.join(numListtoMsg)
    return msgString
def add_val_mod26(nlist, value):
    newValue = value % 26
    print 'Value to Add after mod 26: ', newValue
    listLen = len(nlist)
    index = 0
    while index < listLen:
        nlist[index] = (nlist[index] + newValue) % 26
        index = index + 1
    return nlist
def string2nlist(m):
    characters = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    numbers = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
    newList = []
    msgLen = len(m) # var msgLen will be an integer of the length
    index = 0 # iterate through message length in while loop
    while index < msgLen:
        letter = m[index] # iterate through message m
        i = 0
        while i < 26:
            if letter == characters[i]:
                newList.append(numbers[i])
            i = i + 1
        index = index + 1
    return newList
def nlist2string(nlist):
    characters = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    numbers = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
    newList = []
    nListLen = len(nlist)
    index = 0
    while index < nListLen:
        num = nlist[index]
        newNum = num % 26
        i = 0
        while i < 26:
            num1 = newNum
            num2 = numbers[i]
            if (num1 == num2):
                newList.append(characters[i])
            i = i + 1
        index = index + 1
    return newList
def wordList(filename):
    fileObject = open(filename, "r+")
    wordsList = fileObject.readlines()
    return wordsList
def shift_computePlaintext(wlist, c):
    index = 0
    while index < 26:
        newCipher = shift_encrypt(index, c)
        print 'The new cipher text is: ', newCipher
        wordlistLen = len(wlist)
        i = 0
        while i < wordlistLen:
            print wlist[i]
            if newCipher == wlist[i]:
                return newCipher
            else:
                print 'Word not found.'
            i = i + 1
        index = index + 1
print 'Take Ciphertext and Find Plaintext from Wordlist Function: \n'
list = wordList('test.txt')
print list
plainText = shift_computePlaintext(list, 'vium')
print 'The plaintext was found in the wordlist: ', plainText
When the shift amount is 18, the shifted ciphertext is name, which is a word in my word list, but the comparison never evaluates to True. Thanks in advance for any help!

It's hard to be sure with the information we have so far, but here's a guess:
wordsList = fileObject.readlines()
This is going to return you a list of strings with the newlines preserved, like:
['hello\n', 'my\n', 'name\n', 'is\n', 'jesi\n']
So, inside shift_computePlaintext, when you iterate over wlist looking for something that matches the decrypted 'vium', you're looking for a string that matches 'name', and none of them match, including 'name\n'.
In other words, exactly what you suspected.
There are a few ways to fix this, but the most obvious are to use wlist[i].strip() instead of wlist[i], or to strip everything in the first place by using something like wordsList = [line.strip() for line in fileObject] instead of wordsList = fileObject.readlines().
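The effect of the trailing newline can be sketched in a few lines. This is only an illustration written for Python 3 (the question's code is Python 2); the helper name load_words and the temporary word list are assumptions made for the example, not part of the original code.

```python
import os
import tempfile


def load_words(filename):
    """Return the words in the file, one per line, without trailing newlines."""
    with open(filename) as file_object:
        # strip() removes the trailing '\n' and any other surrounding whitespace
        return [line.strip() for line in file_object]


# Write a small hypothetical word list to a temporary file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello\nmy\nname\nis\njesi\n")
    path = tmp.name

with open(path) as f:
    raw_lines = f.readlines()
stripped_words = load_words(path)
os.remove(path)

print(raw_lines)
print("name" in raw_lines)       # False: the list holds 'name\n'
print("name" in stripped_words)  # True
```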
A few side notes:
There is almost never a good reason to call readlines(). That returns a list of lines that you can iterate over… but the file object itself was already an iterable of lines that you can iterate over. If you really need to make sure it's a list instead of some other kind of iterable, or make a separate copy for later, or whatever, just call list on it, as you would with any other iterable.
You should almost never write a loop like this:
index = 0
while index < 26:
    # ...
    index = index + 1
Instead, just do this:
for index in range(26):
It's easier to read, harder to get wrong (subtle off-by-one errors are responsible for half the frustrating debugging you will do in your lifetime), etc.
And if you're looping over the length of a collection, don't even do that. Instead of this:
wordlistLen = len(wlist)
i = 0
while i < wordlistLen:
    # ...
    word = wlist[i]
    # ...
    i = i + 1
… just do this:
for word in wlist:
… or, if you need both i and word (which you occasionally do):
for i, word in enumerate(wlist):
Meanwhile, if the only reason you're looping over a collection is to check each of its values, you don't even need that. Instead of this:
wordlistLen = len(wlist)
while i < wordlistLen:
    print wlist[i]
    if newCipher == wlist[i]:
        return newCipher
    else:
        print 'Word not found.'
    i = i + 1
… just do this:
if newCipher in wlist:
    return newCipher
else:
    print 'Word not found.'
Here, you've actually got one of those subtle bugs: you print 'Word not found' over and over, instead of only printing it once at the end if it wasn't found.
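Putting these side notes together, the whole brute-force search might look roughly like the following. It is a sketch in Python 3 rather than a drop-in replacement: shift and find_plaintext are hypothetical names, the word list is hard-coded, and only lowercase letters are shifted. The not-found case is handled once, after every shift has been tried.

```python
import string

ALPHABET = string.ascii_lowercase


def shift(text, amount):
    """Shift each lowercase letter forward by `amount` positions, wrapping at 'z'."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + amount) % 26] if ch in ALPHABET else ch
        for ch in text
    )


def find_plaintext(words, ciphertext):
    """Try all 26 shifts; return the first candidate found in `words`, else None."""
    known = set(words)  # set membership is faster than scanning a list
    for amount in range(26):
        candidate = shift(ciphertext, amount)
        if candidate in known:
            return candidate
    # Reached only when no shift produced a known word.
    return None


words = ["hello", "my", "name", "is", "jesi"]
print(find_plaintext(words, "vium"))  # name
print(find_plaintext(words, "zzzz"))  # None
```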

Related

How to group consecutive letters in a string in Python?

For example: string = aaaacccc, then I need the output to be 4a4c. Is there a way to do this without using any advanced methods, such as libraries or functions?
Also, if someone knows how to do the reverse, turning "4a4c" into aaaacccc, that would be great to know.

This will do the work in one iteration.
Keep two temporary variables, one for the current character and one for the count of that character, plus one variable for the result.
Iterate through the string and keep increasing the count while the character matches the previous one.
If it doesn't match, append the count and the character to the result, then reset the current character and the count.
Finally, append the last character and its count to the result. Done!
input_str = "aaaacccc"
if input_str.isalpha():
    current_str = input_str[0]
    count = 0
    final_string = ""
    for i in input_str:
        if i==current_str:
            count+=1
        else:
            final_string+=str(count)+current_str
            current_str = i
            count = 1
    final_string+=str(count)+current_str
    print (final_string)

Here is another solution; I also included a patchwork reverse operation, as you mentioned in your post. Both run in O(n) and are fairly simple to understand. The encode is basically identical to the one posted by Akanasha, who was just a bit faster in posting while I was writing decode().
def encode(x):
    if not x.isalpha():
        raise ValueError()
    output = ""
    current_l = x[0]
    counter = 0
    for pos in x:
        if current_l != pos:
            output += str(counter) + current_l
            counter = 1
            current_l = pos
        else:
            counter += 1
    return output + str(counter) + current_l
def decode(x):
    output = ""
    i = 0
    while i < len(x):
        if x[i].isnumeric():
            n = i + 1
            while x[n].isnumeric():
                n += 1
            output += int(x[i:n])*x[n]
            i = n
        i += 1
    return output
test = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasasggggggbbbbdd"
test1 = encode(test)
print(test1)
test2 = decode(test1)
print(test2)
print(test == test2)

Yes, you do not need any libraries:
list1 = list("aaaacccc")
letters = []
for i in list1:
    if i not in letters:
        letters.append(i)
string = ""
for i in letters:
    string += str(list1.count(i))
    string+=str(i)
print(string)
Basically, it loops through the list, finds the unique letters and then prints the count with the letter itself. Reversing would be the same function, just print the amount.
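One thing worth checking with this approach is that list.count gives the total number of occurrences of a letter, not the length of a consecutive run, so the result only matches the question's expected output when each letter appears in a single block. The comparison below is a sketch; count_letters and encode_runs are hypothetical names used only for this illustration.

```python
def count_letters(text):
    """The approach above: total count of each distinct letter, in first-seen order."""
    seen = []
    for ch in text:
        if ch not in seen:
            seen.append(ch)
    return "".join(str(text.count(ch)) + ch for ch in seen)


def encode_runs(text):
    """Run-length encoding: count only consecutive repeats."""
    if not text:
        return ""
    pieces = []
    current, count = text[0], 0
    for ch in text:
        if ch == current:
            count += 1
        else:
            pieces.append(str(count) + current)
            current, count = ch, 1
    pieces.append(str(count) + current)
    return "".join(pieces)


print(count_letters("aaaacccc"), encode_runs("aaaacccc"))  # 4a4c 4a4c
print(count_letters("aabaa"), encode_runs("aabaa"))        # 4a1b 2a1b2a
```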

Brute Force Dictionary Attack Caesar Cipher Python Code not working past 18'th shift

This was made to brute-force Caesar ciphers using a dictionary file from http://www.math.sjsu.edu/~foster/dictionary.txt. It runs through three functions: lang_lib(), which loads the text of the dictionary into a list of words; isEnglish(), which checks what percentage of the phrase's words appear in the dictionary and returns True if at least 60% of them match; and caeser(), which runs through all shifts and checks each result for English words. It should return the result with the highest percentage, but it only seems to work for shifts 1-18. I can't figure out why it isn't working.
def lang_lib():
    file = open('dictionary.txt', 'r')
    file_read = file.read()
    file_split = file_read.split()
    words = []
    for word in file_split:
        words.append(word)
    file.close()
    return words
dictionary = lang_lib()
def isEnglish(text):
    split_text = text.lower().split()
    counter = 0
    not_in_dict = []
    for word in split_text:
        if word in dictionary:
            counter += 1
        else:
            not_in_dict.append(word)
    length = len(split_text)
    text_percent = ((counter / length) * 100)
    #print(text_percent)
    if text_percent >= 60.0:
        return True
    else:
        return False
alphabet = "abcdefghijklmnopqrstuvwxyz0123456789!##$%/."
def caeser(text): #Put in text, and it will spit out all possible values
    lower_text = text.lower()
    ciphertext = "" #stores current cipher value
    matches = [] #stores possible matches
    for i in range(len(alphabet)): #loops for the length of input alphabet
        for c in lower_text:
            if c in alphabet:
                num = alphabet.find(c)
                newnum = num - i
                if newnum >= len(alphabet):
                    newnum -= len(alphabet)
                elif newnum < 0:
                    newnum += len(alphabet)
                ciphertext = ciphertext + alphabet[newnum]
            else:
                ciphertext = ciphertext + c
            testing = isEnglish(ciphertext)
            for text in ciphertext:
                if testing == True and len(ciphertext) == len(lower_text):
                    matches.append(ciphertext)
                    return i, matches
        ciphertext = "" #clears ciphertext so it doesn't get cluttered
print(caeser('0x447 #0x$x 74w v0%5')) #shift of 19
print(caeser('zw336 #zw9w 63v uz#4')) #shift of 18
Thanks guys.

This part is indented too far, as @tripleee suggested. It should sit one level out, after the inner loop over the characters has finished:
testing = isEnglish(ciphertext)
for text in ciphertext:
    if testing == True:
        matches.append(ciphertext)
        return i, matches
Also, you don't need to check the length if you have the indentation right and let the previous loop complete.
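The idea of finishing each candidate before testing it can be sketched as follows. This is not the asker's code: the alphabet is simplified to lowercase letters, WORDS is a tiny hypothetical stand-in for dictionary.txt, and the names english_fraction, shift_back and crack are made up for the example. It also keeps the best-scoring shift rather than returning the first one that passes, which is what the question says the program should do.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # simplified; the question's alphabet also has digits and symbols
WORDS = {"the", "quick", "brown", "fox"}  # hypothetical stand-in for dictionary.txt


def english_fraction(text):
    """Fraction of whitespace-separated words found in WORDS."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word in WORDS for word in words) / len(words)


def shift_back(text, amount):
    """Decrypt by moving each alphabet character `amount` positions backwards."""
    out = []
    for ch in text.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - amount) % len(ALPHABET)])
        else:
            out.append(ch)
    return "".join(out)


def crack(ciphertext, threshold=0.6):
    """Return (shift, plaintext) for the best-scoring shift, or None below threshold."""
    best = None
    for amount in range(len(ALPHABET)):
        candidate = shift_back(ciphertext, amount)  # finish the whole string first
        score = english_fraction(candidate)         # then test it once
        if score >= threshold and (best is None or score > best[0]):
            best = (score, amount, candidate)
    return None if best is None else best[1:]


print(crack("wkh txlfn eurzq ira"))  # (3, 'the quick brown fox')
```

Storing the words in a set also makes each lookup much cheaper than scanning a list, which may matter with a full dictionary file.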

I found out that the dictionary.txt does not contain 2 or 3 letter words, so it would skew long inputs with many of these words, and return False. I added a list of common words, so now all inputs work accurately.
If anyone wants to help me make this code more efficient, I'd love some pointers. I am very new to Python.

Reversing words in place

Ex. Input: rat the ate cat the
Output: the cat ate the rat
Here's my code so far:
def reverse_message(starting, ending, msg):
    while(starting < ending):
        msg[starting], msg[ending] = msg[ending], msg[starting]
        starting += 1
        ending -= 1
def reverse_words(msg):
    # Decode the message by reversing the words
    # reverse entire message
    reverse_message(0, len(msg) - 1, msg)
    #reverse each word
    starting = 0
    for i in range(len(msg)):
        if ((msg[i] == ' ') or (i == len(msg) - 1)):
            reverse_message(starting, i-1, msg)
            starting = i+1
What am I doing wrong? Any help would be highly appreciated.

This can be done in a single line:
str=' '.join(list(input().split(' '))[::-1])

To begin with, I would avoid explicitly passing a starting and ending index and rely on the message itself instead: the starting index is the first index of the string and the ending index is the last. I will also pass the string as a list, since strings are immutable and cannot be changed in place, but lists can.
def reverse_word(msg):
    starting = 0
    ending = len(msg)-1
    while(starting < ending):
        tmp = msg[starting]
        msg[starting] = msg[ending]
        msg[ending] = tmp
        starting += 1
        ending -= 1
    return msg
After that, to reverse the string, I will reverse the entire string first, and then reverse each word in the string in place, and then stitch the string back together for the output.
def reverse_message(msg):
    #Convert the string into list of characters
    chars = list(msg)
    #Reverse entire list
    chars = reverse_word(chars)
    starting = 0
    i = 0
    result = []
    #Iterate through the reversed list, and pick individual words based on
    #whitespace, and then reverse them in place
    while i < len(chars):
        if chars[i] == ' ':
            #Append all reversed words to another list
            result += reverse_word(chars[starting:i]) + [' ']
            starting = i+1
        i+=1
    #Reverse the last remaining word
    result += reverse_word(chars[starting:i])
    #Stitch the list back to string and return it
    return ''.join(result)
The resultant output will look like.
print(reverse_message('rat the ate cat the'))
#the cat ate the rat
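As for what goes wrong in the question's own code: when i is the last index, the final word ends at i, not at i - 1, so its last character is left out of the reversal. A minimal in-place variant, assuming the message is passed as a list of characters, could look like this (reverse_range is a hypothetical rename of the question's helper):

```python
def reverse_range(chars, start, end):
    """Reverse chars[start..end] (inclusive) in place."""
    while start < end:
        chars[start], chars[end] = chars[end], chars[start]
        start += 1
        end -= 1


def reverse_words(chars):
    """Reverse the order of space-separated words in a list of characters, in place."""
    reverse_range(chars, 0, len(chars) - 1)
    start = 0
    # Iterate one position past the end so the last word is handled like the others.
    for i in range(len(chars) + 1):
        if i == len(chars) or chars[i] == " ":
            reverse_range(chars, start, i - 1)
            start = i + 1


message = list("rat the ate cat the")
reverse_words(message)
print("".join(message))  # the cat ate the rat
```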

Split a string, loop through it character by character, and replace specific ones?

I'm working on an assignment and have gotten stuck on a particular task. I need to write two functions that do similar things. The first needs to correct capitalization at the beginning of a sentence, and count when this is done. I've tried the below code:
def fix_capitalization(usrStr):
    count = 0
    fixStr = usrStr.split('.')
    for sentence in fixStr:
        if sentence[0].islower():
            sentence[0].upper()
            count += 1
    print('Number of letters capitalized: %d' % count)
    print('Edited text: %s' % fixStr)
But I receive an "Index out of range" error and am not sure why. Shouldn't sentence[0] simply reference the first character in that particular string in the list?
I also need to replace certain characters with others, as shown below:
def replace_punctuation(usrStr):
    s = list(usrStr)
    exclamationCount = 0
    semicolonCount = 0
    for sentence in s:
        for i in sentence:
            if i == '!':
                sentence[i] = '.'
                exclamationCount += 1
            if i == ';':
                sentence[i] = ','
                semicolonCount += 1
    newStr = ''.join(s)
    print(newStr)
    print(semicolonCount)
    print(exclamationCount)
But I'm struggling to figure out how to actually do the replacing once the character is found. Where am I going wrong here?
Thank you in advance for any help!

I would use str.capitalize over str.upper on one character. It also works correctly on empty strings. The other major improvement would be to use enumerate to also track the index as you iterate over the list:
def fix_capitalization(s):
    sentences = [sentence.strip() for sentence in s.split('.')]
    count = 0
    for index, sentence in enumerate(sentences):
        capitalized = sentence.capitalize()
        if capitalized != sentence:
            count += 1
            sentences[index] = capitalized
    result = '. '.join(sentences)
    return result, count
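One caveat with str.capitalize is that it also lowercases the rest of the string, so a sentence containing an acronym or a name would be altered and counted. If only the first letter should change, a variant along these lines may be closer to the assignment; it is a sketch under that assumption, and it keeps the original spacing instead of re-joining with '. '.

```python
def fix_capitalization(text):
    """Uppercase the first letter of each '.'-separated sentence; return (text, count)."""
    parts = text.split(".")
    count = 0
    for index, part in enumerate(parts):
        stripped = part.lstrip()
        if stripped and stripped[0].islower():
            lead = part[: len(part) - len(stripped)]  # keep the original leading spaces
            parts[index] = lead + stripped[0].upper() + stripped[1:]
            count += 1
    return ".".join(parts), count


print("i saw NASA today. it was big.".capitalize())  # I saw nasa today. it was big.
print(fix_capitalization("i saw NASA today. it was big."))
# ('I saw NASA today. It was big.', 2)
```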
You can take a similar approach to replacing punctuation:
replacements = {'!': '.', ';': ','}
def replace_punctuation(s):
    l = list(s)
    counts = dict.fromkeys(replacements, 0)
    for index, item in enumerate(l):
        if item in replacements:
            l[index] = replacements[item]
            counts[item] += 1
    print("Replacement counts:")
    for k, v in counts.items():
        print("{} {:>5}".format(k, v))
    return ''.join(l)

There are better ways to do these things but I'll try to change your code minimally so you will learn something.
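The first function's issue is that when you split a sentence like "Hello." you get two items in your fixStr list, the last of which is an empty string, so indexing the first character of that empty string is out of range. Note also that sentence[0].upper() returns a new string rather than changing the sentence, so the result has to be stored back in the list. Fix it by doing this:
def fix_capitalization(usrStr):
    count = 0
    fixStr = usrStr.split('.')
    for index, sentence in enumerate(fixStr):
        # changed lines: skip empty strings, and store the new string back,
        # because upper() does not modify the original
        if sentence != "" and sentence[0].islower():
            fixStr[index] = sentence[0].upper() + sentence[1:]
            count += 1
    print('Number of letters capitalized: %d' % count)
    print('Edited text: %s' % fixStr)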
In the second snippet, when you pass a string to list() you get a list of the characters of that string. So all you need to do is iterate over the elements of the list, replace the ones you want, and then build the string from the result.
def replace_punctuation(usrStr):
    newStr = ""
    s = list(usrStr)
    exclamationCount = 0
    semicolonCount = 0
    for c in s:
        if c == '!':
            c = '.'
            exclamationCount += 1
        if c == ';':
            c = ','
            semicolonCount += 1
        newStr = newStr + c
    print(newStr)
    print(semicolonCount)
    print(exclamationCount)
Hope I helped!

Python has a nice built-in method for this:
for text in sentences:
    new_text = text.replace('!', '.').replace(';', ',')
You can write a one-liner to get a new list:
new_list = [text.replace('!', '.').replace(';', ',') for text in sentences]
You could also go for the split/join method:
new_text = '.'.join(text.split('!'))
new_text = ','.join(new_text.split(';'))
To count capitalized letters you could do:
result = len([ch for ch in text if ch.isupper()])
And to capitalize a sentence, just use:
text.capitalize()
Hope this works out for you.
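Since the assignment also asks for the number of replacements, str.count can be combined with str.replace. The following is a small sketch; the function returns its results instead of printing them, which is a choice made for this example rather than something the question requires.

```python
def replace_punctuation(text):
    """Replace '!' with '.' and ';' with ','; return the new text and both counts."""
    exclamation_count = text.count("!")
    semicolon_count = text.count(";")
    new_text = text.replace("!", ".").replace(";", ",")
    return new_text, exclamation_count, semicolon_count


print(replace_punctuation("Wait; what! Really!"))
# ('Wait, what. Really.', 2, 1)
```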

What's the role of string = "" in a program Python

I know the title may not be the best, as I'm not exactly sure how to explain my problem in a few words. I was recently looking at some code online and didn't understand why a certain line was used. I tried searching the internet, but as I don't know what that part of the code is called, I have no idea what to search for, so you guys are my last hope.
In this function
def NumIntoChar(LineLis):
    for n in LineLis:
        string = "" # Here is what im not sure. why is this used here ?
        for i in range(n):
            string += '-'
        print(string)
I'm unsure why string = "" is used between the two for loops.
Another example is:
message = """SAHH""" # Add Code
message = message.upper()
keyShift = 1
encryptedMsg = ""
result = {}
while keyShift <= 26:
    encryptedMsg = ""
    for character in message:
        if character.isalpha() is True:
            x = ord(character) - 65
            x += keyShift
            x = x % 26
            encryptedMsg += chr(x + 65)
        else:
            encryptedMsg += character
    result[keyShift] = encryptedMsg
    keyShift += 1
for r in result.keys():
    print(r,result[r])
Here we see ' encryptedMsg = "" ' being used just like in the previous code.

Just below that line of code, you have this for loop:
for i in range(n):
    string += '-'
The x += y operator is syntactic sugar for x = x + y. In order to use this operator, x must have a defined value first.
For the first iteration of the loop, string will essentially be assigned like this:
string = string + '-'
In order to avoid NameError being thrown, string first needs to be declared and assigned some value, which is what string = "" does. The expression in the first iteration of the loop then essentially becomes:
string = '' + '-'
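The difference can be demonstrated with two small functions. The names dashes_without_init and dashes are hypothetical and only serve this sketch; inside a function the error raised is UnboundLocalError, which is a subclass of NameError.

```python
def dashes_without_init(n):
    for _ in range(n):
        line += "-"  # fails: line has no value yet
    return line


def dashes(n):
    line = ""  # start from an empty string so += has something to extend
    for _ in range(n):
        line += "-"
    return line


try:
    dashes_without_init(3)
except NameError as error:  # UnboundLocalError is a subclass of NameError
    print(type(error).__name__)  # UnboundLocalError

print(dashes(3))  # ---
print("-" * 3)    # the same result without a loop
```

Resetting the string inside the outer loop, as both examples in the question do, also ensures that each line or each key shift starts from empty instead of appending to the previous result.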

Here you initialize a variable with an empty string using var = ''.
This is common when you have to concatenate content iteratively to form a bigger string: the code starts by initializing the empty string, and the content is appended to it within the loop. For example:
my_str = ""
while repeat:
    my_str += some_str
    # Do some stuff
Another scenario in which you might need it is when the string should default to empty but be reset based on some condition. For example:
my_name = ''
if user.is_logged_in():
    my_name = user.name
Also read: Initialize a string variable in Python: “” or None?
