I am having a bit of trouble with some Python code. I have a large text file called "big.txt". My code iterates over it to collect each word into a list and then iterates over that list to strip every character that is not a letter of the alphabet. I also have a function called worddistance, which compares two words and returns a score for how similar they are (it adds 1 to a counter whenever a difference is found, so the lower the score, the more similar the words). I have another function called autocorrect. I want to pass this function a misspelled word and have it print a 'Did you mean...' sentence listing the words that got a low score from worddistance.
Strangely, I keep getting the error:
"IndexError: string index out of range"
I am at a loss as to what is going on!
My code is below.
Thanks in advance for the replies,
Samuel Naughton
f = open("big.txt", "r")
words = list()
temp_words = list()
for line in f:
    for word in line.split():
        temp_words.append(word.lower())
allowed_characters = 'abcdefghijklmnopqrstuvwxyz'
for item in temp_words:
    temp_new_word = ''
    for char in item:
        if char in allowed_characters:
            temp_new_word += char
        else:
            continue
    words.append(temp_new_word)
list(set(words)).sort()
def worddistance(word1, word2):
    counter = 0
    if len(word1) > len(word2):
        counter += len(word1) - len(word2)
        new_word1 = word1[:len(word2) + 1]
        for char in range(0, len(word2) + 1) :
            if word2[char] != new_word1[char]:
                counter += 1
            else:
                continue
    elif len(word2) > len(word1):
        counter += len(word2) - len(word1)
        new_word2 = word2[:len(word1) + 1]
        for char in range(0, len(word1) + 1):
            if word1[char] != word2[char]:
                counter += 1
            else:
                continue
    return counter
def autocorrect(word):
    word.lower()
    if word in words:
        print("The spelling is correct.")
        return
    else:
        suggestions = list()
        for item in words:
            diff = worddistance(word, item)
            if diff == 1:
                suggestions.append(item)
        print("Did you mean: ", end = ' ')
        if len(suggestions) == 1:
            print(suggestions[0])
            return
        else:
            for i in range(0, len(suggestions)):
                if i == len(suggestons) - 1:
                    print("or " + suggestions[i] + "?")
                    return
                print(suggestions[i] + ", ", end="")
            return
In worddistance(), it looks like for char in range(0, len(word1) + 1): should be:
for char in range(len(word1)):
And for char in range(0, len(word2) + 1) : should be:
for char in range(len(word2)):
And by the way, list(set(words)).sort() is sorting a temporary list, which is probably not what you want. It should be:
words = sorted(set(words))
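To see why the original line has no effect, here is a small sketch. The sample list and the names result and unique_sorted are made up for illustration:

```python
words = ["pear", "apple", "pear", "fig"]

# list.sort() sorts in place and returns None. Here it sorts a temporary
# list that is thrown away immediately, so `words` is left unchanged.
result = list(set(words)).sort()

# sorted() returns a new list, which can be bound to a name.
unique_sorted = sorted(set(words))

print(result)         # None
print(words)          # ['pear', 'apple', 'pear', 'fig']
print(unique_sorted)  # ['apple', 'fig', 'pear']
```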
As mentioned in the other comment, you should use range(len(word1)).
In addition to that:
- You should handle the case where word1 and word2 have the same length (len(word2) == len(word1)).
- You should also take care with naming. In the second branch of the worddistance function,
if word1[char] != word2[char]:
you should be comparing against new_word2:
if word1[char] != new_word2[char]:
- In autocorrect, word.lower() returns a new string, so you should assign the result back: word = word.lower()
words = []
for item in temp_words:
    temp_new_word = ''
    for char in item:
        if char in allowed_characters:
            temp_new_word += char
        else:
            continue
    words.append(temp_new_word)
words = sorted(set(words))
def worddistance(word1, word2):
    counter = 0
    if len(word1) > len(word2):
        counter += len(word1) - len(word2)
        new_word1 = word1[:len(word2) + 1]
        for char in range(len(word2)):
            if word2[char] != new_word1[char]:
                counter += 1
    elif len(word2) > len(word1):
        counter += len(word2) - len(word1)
        new_word2 = word2[:len(word1) + 1]
        for char in range(len(word1)):
            if word1[char] != new_word2[char]:  # compare against new_word2, not word2
                counter += 1
    else:  # len(word2) == len(word1): the case you missed
        for char in range(len(word1)):
            if word1[char] != word2[char]:
                counter += 1
    return counter
def autocorrect(word):
    word = word.lower()  # str.lower() returns a new string, so reassign it
    if word in words:
        print("The spelling is correct.")
    else:
        suggestions = list()
        for item in words:
            diff = worddistance(word, item)
            # print(diff)  # uncomment to inspect the scores while debugging
            if diff == 1:
                suggestions.append(item)
        print("Did you mean: ")
        if len(suggestions) == 1:
            print(suggestions[0])
        else:
            for i in range(len(suggestions)):
                if i == len(suggestions) - 1:
                    print("or " + suggestions[i] + "?")
                else:
                    print(suggestions[i] + ", ")
Next time, try to use Python built-ins: for example, enumerate instead of for i in range(len(list)) followed by list[i], and len instead of a manual counter.
E.g.:
Your distance function could be written this way, or even more simply.
def distance(word1, word2):
    counter = max(len(word1), len(word2)) - min(len(word1), len(word2))
    if len(word1) > len(word2):
        counter += len([x for x, z in zip(list(word2), list(word1[:len(word2) + 1])) if x != z])
    elif len(word2) > len(word1):
        counter += len([x for x, z in zip(list(word1), list(word2[:len(word1) + 1])) if x != z])
    else:
        counter += len([x for x, z in zip(list(word1), list(word2)) if x != z])
    return counter
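Since zip stops at the end of the shorter input, the three branches and the slicing are not strictly needed. A minimal sketch of the same position-by-position count, assuming that is the intended metric (it is not a true edit distance such as Levenshtein); the names length_gap and mismatches are only illustrative:

```python
def distance(word1, word2):
    """Count differing positions plus the difference in length."""
    # Every extra character in the longer word counts as one difference.
    length_gap = abs(len(word1) - len(word2))
    # zip() pairs characters up to the length of the shorter word;
    # True counts as 1 when summed.
    mismatches = sum(a != b for a, b in zip(word1, word2))
    return length_gap + mismatches


print(distance("cat", "cut"))   # 1
print(distance("cat", "cats"))  # 1
```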
Related
I want to find runs of consecutive identical characters and, when a run is longer than 3, print it as letter#count; otherwise I want to print the letters unchanged.
I want to get: B#6CCCBBB
But I get B#5CCCBBB as output. I am missing the 0th element.
str1 = "BBBBBBCCCBBB"
def consecutive_alpha(str1):
    count = 0
    new_string = ""
    n = 3
    for i in range(0, len(str1)-1):
        if str1[i] == str1[i+1]:
            count += 1
            if i == (len(str1)-2):
                if count > n:
                    new_string = new_string + str1[i] +"#" + str(count)
                else:
                    new_string = new_string + str1[i]*count
        else:
            if count > n:
                new_string = new_string + str1[i] +"#" + str(count)
            else:
                new_string = new_string + str1[i]*count
            count = 1
    print new_string
consecutive_alpha(str1)
Why not just use itertools.groupby?
from itertools import groupby
def strict_groupby(iterable, **kwargs):
    for key, group in groupby(iterable, **kwargs):
        yield (key, ''.join(group))
def consecutive_alpha(string):
    return ''.join(f'{key}#{len(group)}'
                   if len(group) > 3
                   else group
                   for key, group in strict_groupby(string))
consecutive_alpha('BBBBBBCCCBBB')
Output:
'B#6CCCBBB'
In case you want to try a one-liner:
from itertools import groupby
''.join(k + '#' + str(len(l)) if len(l) > 3 else ''.join(l) for k, l in ((k, list(g)) for k, g in groupby(str1)))
#B#6CCCBBB
You're getting B#5 because you initialize count = 0. So you're not counting the first character. You get it right when you do count = 1 later in the loop.
You have another problem. If the last character isn't part of a repeated sequence, you never print it, since the loop stops early.
def consecutive_alpha(str1):
    count = 1
    new_string = ""
    n = 3
    for i in range(0, len(str1)-1):
        if str1[i] == str1[i+1]:
            count += 1
            if i == (len(str1)-2):
                if count > n:
                    new_string += str1[i] +"#" + str(count)
                else:
                    new_string += str1[i]*count
        else:
            if count > n:
                new_string += str1[i] + "#" + str(count)
            else:
                new_string += str1[i]*count
            count = 1
    # Add last character if necessary
    if len(str1) > 1 and str1[-1] != str1[-2]:
        new_string += str1[-1]
    print(new_string)
consecutive_alpha("BBBBBBCCCBBBD")
consecutive_alpha("BBBBBBCCCAAAABBBXXXXX")
Given a paragraph of space-separated lowercase English words and a list of unique lowercase English keywords, find the minimum length of a substring that contains all the keywords, separated by spaces, in any order.
I wrote the following code. Where is the error? How can I decrease the time complexity?
import sys
def minimumLength(text, keys):
    answer = 10000000
    text += " $"
    for i in xrange(len(text) - 1):
        dup = list(keys)
        word = ""
        if i > 0 and text[i - 1] != ' ':
            continue
        for j in xrange(i, len(text)):
            if text[j] == ' ':
                for k in xrange(len(dup)):
                    if dup[k] == word:
                        del(dup[k])
                        break
                word = ""
            else:
                word += text[j]
            if not dup:
                answer = min(answer, j - i)
                break
    if(answer == 10000000):
        answer = -1
    return answer
text = raw_input()
keyWords = int(raw_input())
keys = []
for i in xrange(keyWords):
    keys.append(raw_input())
print(minimumLength(text, keys))
The trick is to scan from left to right and, once you find a window containing all the keys, try to reduce it on the left and enlarge it on the right preserving the property that all the terms remain inside the window.
Using this strategy you can solve the task in linear time.
The following code is a draft that I tested on a few strings; I hope the comments are enough to highlight the most critical steps. Note that it measures the interval in words, not in characters:
def minimum_length(text, keys):
    assert isinstance(text, str) and (isinstance(keys, set) or len(keys) == len(set(keys)))
    minimum_length = None
    key_to_occ = dict((k, 0) for k in keys)
    text_words = [word if word in key_to_occ else None for word in text.split()]
    missing_words = len(keys)
    left_pos, last_right_pos = 0, 0
    # find an interval with all the keys
    for right_pos, right_word in enumerate(text_words):
        if right_word is None:
            continue
        key_to_occ[right_word] += 1
        occ_word = key_to_occ[right_word]
        if occ_word == 1:  # the first time we see this word in the current interval
            missing_words -= 1
            if missing_words == 0:  # we saw all the words in this interval
                key_to_occ[right_word] -= 1
                last_right_pos = right_pos
                break
    if missing_words > 0:
        return None
    # reduce the interval on the left and enlarge it on the right preserving the property that all the keys are inside
    for right_pos in xrange(last_right_pos, len(text_words)):
        right_word = text_words[right_pos]
        if right_word is None:
            continue
        key_to_occ[right_word] += 1
        while left_pos < right_pos:  # let's try to reduce the interval on the left
            left_word = text_words[left_pos]
            if left_word is None:
                left_pos += 1
                continue
            if key_to_occ[left_word] == 1:  # reduce the interval only if it doesn't decrease the number of occurrences
                interval_size = right_pos + 1 - left_pos
                if minimum_length is None or interval_size < minimum_length:
                    minimum_length = interval_size
                break
            else:
                left_pos += 1
                key_to_occ[left_word] -= 1
    return minimum_length
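For comparison, here is a compact Python 3 sketch of the same left-to-right window idea that reports the length in characters, which is what the code in the question measures. It is a sketch under stated assumptions, not a tested solution to the original judge: it assumes the words are separated by single spaces, and the name min_window_chars and its -1 convention for "no such window" are chosen here for illustration.

```python
from collections import Counter


def min_window_chars(text, keys):
    """Character length of the shortest run of whole words in `text`
    that contains every keyword, or -1 if no such run exists."""
    words = text.split()
    needed = set(keys)
    if not needed:
        return 0
    counts = Counter()  # keyword occurrences inside the current window
    covered = 0         # distinct keywords currently inside the window
    chars = 0           # total letters of the words inside the window
    best = -1
    left = 0
    for right, word in enumerate(words):
        chars += len(word)
        if word in needed:
            counts[word] += 1
            if counts[word] == 1:
                covered += 1
        # Shrink from the left while the window still holds every keyword.
        while covered == len(needed):
            # letters plus one space between each pair of adjacent words
            length = chars + (right - left)
            if best == -1 or length < best:
                best = length
            dropped = words[left]
            chars -= len(dropped)
            if dropped in needed:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1
    return best


print(min_window_chars("a b c a", ["a", "c"]))  # 3, the window "c a"
```

Each word enters and leaves the window at most once, so the scan is linear in the number of words.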
I'm trying to create a game where the score is dependent on what the letters are worth. I'm having trouble with keeping a count on the side while still recursing to the next letter of the string. I'm really stuck & I hope you can help!
def net_zero():
    guess_prompt = input('Guess a string: ')
    win_display = 'Congratulations you win'
    low_vowels = "aeiou" # +1
    low_constants = "bcdfghjklmnpqrstvwxyz" # -1
    up_vowels = "AEIOU" # +2
    up_constants = "BCDFGHJKLMNPQRSTVWXYZ" # -2
    ten_digits = "0123456789" # +3
    #else -3
    count = 0
    if len(guess_prompt) == 0:
        return count
    elif guess_prompt[0] in low_vowels:
        return (count + 1) + guess_prompt[1:]
    elif guess_prompt[0] in low_constants:
        return (count - 1) + guess_prompt[1:]
    elif guess_prompt[0] in up_vowels:
        return (count + 2) + guess_prompt[1:]
    elif guess_prompt[0] in up_constants:
        return (count - 2) + guess_prompt[1:]
    elif guess_prompt[0] in ten_digits:
        return (count + 3) + guess_prompt[1:]
    else: return (count - 3) + guess_prompt[1:]
I think you want to do the following:
count = 0
if len(guess_prompt) == 0:
    return count
for letter in guess_prompt:
    if letter in low_vowels:
        count +=1
    if letter in low_constants:
        count -=1
    ...
return count
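If you do want to keep the recursion from the question, one option is to drop the running count variable and let each call return the value of its first character plus the score of the rest of the string. A minimal sketch, assuming the scoring table from the question; the function names char_value and score are made up for illustration:

```python
def char_value(ch):
    """Value of a single character under the question's scoring rules."""
    if ch in "aeiou":
        return 1
    if ch in "bcdfghjklmnpqrstvwxyz":
        return -1
    if ch in "AEIOU":
        return 2
    if ch in "BCDFGHJKLMNPQRSTVWXYZ":
        return -2
    if ch in "0123456789":
        return 3
    return -3  # anything else


def score(text):
    """Recursively add up the values of all characters in text."""
    if not text:  # base case: nothing left to score
        return 0
    # value of the first character plus the score of the remainder
    return char_value(text[0]) + score(text[1:])


print(score("aaB4??BBBBB"))  # -13
```

The original attempt failed because it added an integer to the string guess_prompt[1:]; here the remainder is passed to the recursive call instead. For very long inputs the loop version is the safer choice, since Python limits recursion depth (1000 frames by default).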
I think you can use a dict instead of searching the strings for each lookup. It will improve lookup time.
guess_prompt = "aaB4??BBBBB"
value = {}
for char in "aeiou":
    value[char] = 1
for char in "bcdfghjklmnpqrstvwxyz":
    value[char] = -1
for char in "AEIOU":
    value[char] = 2
for char in "BCDFGHJKLMNPQRSTVWXYZ":
    value[char] = -2
for char in "0123456789":
    value[char] = 3
count = 0
for char in guess_prompt:
    count = count + value.get(char, -3)  # default value -3
print(count)  ## PRINTS -13 ##
So I have a stored word. And the user is invited to check if a letter of their choice is in this word. My code for this is the following
storedword = "abcdeef"
word = list(storedword)
print (word)
merge = input("letter please")
print ("your letter is", merge)
counter = int(0)
letterchecker = int(0)
listlength = len(word)
while counter < listlength and merge != word[counter]:
    counter +=1
if counter <listlength:
    print ("found")
else:
    print ("not found")
How can I alter this code to check how many times the user's letter occurs in this word? I can only use if statements and while loops, and I cannot use .count.
Can you use a Counter?
from collections import Counter
storedword = "abcdeef"
wordcounter = Counter(list(storedword))
merge = input("letter please ")
print("your letter is %s" % merge)
print('It occurs %d times' % wordcounter[merge])
len([w for w in word if w == merge])
is short for
x = []
for w in word:
    if w == merge:
        x.append(w)
len(x)
A similar approach with a while loop:
i = x = 0
while i < len(word):
    if word[i] == merge:
        x += 1
    i += 1
counter = 0
letter_count = 0
while counter < len(word):
    if word[counter] == merge:
        letter_count += 1
    counter += 1
Try this:
counter = 0
for c in word:
    if c == merge:
        counter += 1
If you can't use for, use:
counter = 0
ind = 0
while ind < len(word):
    if word[ind] == merge:
        counter += 1
    ind += 1
I'm trying to make a program that tests if a word is a palindrome using a recursive function. I pretty much have it working but I'm just having trouble with getting it to move on to the next letter if the first and last are the same.
word = input("enterword")
word = word.lower()
def palindrom(word):
    if len(word) == 1 or len(word) == 0:
        return 0;
    if word[0] == word[-1]:
        print(word[0], word[-1])
        palindrom(word);
    else:
        return 1;
test = palindrom(word)
if test == 0:
    print("Yes")
elif test == 1:
    print("No")
So right now it tests if the first and last letter are the same and if so, should run the function again. I just need to have it then check word[1] and word[-2] but I'm having some trouble. I tried splitting word and just popping the letters but it kept looking at the list as a length of 1. So if there is a way to get it to get the length of the whole split list, that would work as well.
You're just missing the return statement when you call your method recursively and the correct slice:
def palindrom(word):
    if len(word) == 1 or len(word) == 0:
        return 0
    if word[0] == word[-1]:
        print(word[0], word[-1])
        return palindrom(word[1:-1])
    else:
        return 1
You are missing the return statement, and you also need to pass the reduced word, palindrom(word[1:-1]), on each recursion.
word = "viooiv"
word = word.lower()
def palindrom(word):
    if len(word) == 1 or len(word) == 0:
        return 0
    if word[0] == word[-1]:
        print(word[0], word[-1])
        return palindrom(word[1:-1])
    else:
        return 1
test = palindrom(word)
if test == 0:
    print("Yes")
elif test == 1:
    print("No")
Output:
('v', 'v')
('i', 'i')
('o', 'o')
Yes
Try calling palindrome on the word with this function:
def palindrome(word, i = 0):
    if i >= len(word) // 2:  # every mirrored pair has been compared
        return True
    if word[i] == word[-i - 1]:
        return palindrome(word, i + 1)
    else:
        return False
A slightly shorter version:
def checkpalindrome(value):
    valuelen = len(value)
    if valuelen < 2:
        print("palindrome")
    else:
        checkpalindrome(value[1:valuelen-1]) if value[0] == value[-1] else print('Not palindrome')