Longest repeating substring using for-loops and if-statements - python

I'm in an introductory level programming class that teaches python. I was introduced to a longest repeating substring problem for a project and I can't seem to crack it. I've looked on here for a solution, but I haven't learned suffix trees yet so I wouldn't be able to use them. So far, I've gotten here:
msg = "kalhfdlakdhfklajdf"  # (anything)
reps = []
for i in range(len(msg)):
    if msg[i] == msg[i + 1]:
        reps.append(msg[i])
What this does is scan my string, msg, and check whether the character at the counter matches the next character in sequence. If the characters match, it appends msg[i] to the list "reps". My problem is that:
a) The code always appends one fewer than the number of repetitions, and
b) my program always crashes because msg[i+1] goes out of bounds once it reaches the last position in the string.
In essence, I want my program to find repeats and append them to a list, so that the most-repeated character can be counted and returned to the user.

You need to use len(msg)-1 as your range so that msg[i + 1] stays in bounds, but your condition will still omit one character from each run. To get rid of that, you can add another condition to your code that checks the preceding character too:
With your condition you'll have 8 h in reps although there are 9 in msg:
>>> msg = "kalhfdlakdhhhhhhhhhfklajdf"
>>> reps = []
>>> for i in range(len(msg)-1):
...     if msg[i] == msg[i + 1]:
...         reps.append(msg[i])
...
>>> reps
['h', 'h', 'h', 'h', 'h', 'h', 'h', 'h']
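And with another condition (note that at i = 0, msg[i - 1] is msg[-1], the last character of the string, and that the loop still never visits the final character):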
>>> reps=[]
>>> for i in range(len(msg)-1):
...     if msg[i] == msg[i + 1] or msg[i] == msg[i - 1]:
...         reps.append(msg[i])
...
>>> reps
['h', 'h', 'h', 'h', 'h', 'h', 'h', 'h', 'h']
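A sketch of the same idea with explicit bounds checks, assuming the goal is to collect every character that belongs to a run of two or more. The name collect_repeats is made up for illustration. It never reads msg[-1] when i is 0, and it also visits the last character:

```python
def collect_repeats(msg):
    """Return every character of msg that equals one of its neighbours."""
    reps = []
    for i in range(len(msg)):
        # Only look left when there is a character to the left.
        same_as_prev = i > 0 and msg[i] == msg[i - 1]
        # Only look right when there is a character to the right.
        same_as_next = i < len(msg) - 1 and msg[i] == msg[i + 1]
        if same_as_prev or same_as_next:
            reps.append(msg[i])
    return reps


msg = "kalhfdlakd" + "h" * 9 + "fklajdf"
print(collect_repeats(msg))
```

With a string such as "abca" this returns an empty list, whereas a check against msg[i - 1] at i = 0 would compare the first character with the last one.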

For the groupby answer I alluded to on @Kasra's excellent response:
from itertools import groupby
msg = "kalhfdlakdhhhhhhhhhfklajdf"
maxcount = 0
for substring in groupby(msg):
    lett, count = substring[0], len(list(substring[1]))
    if count > maxcount:
        maxcountlett = lett
        maxcount = count
result = [maxcountlett] * maxcount
But note that this only works for substrings of length 1. msg = 'hahahaha' should give ['ha', 'ha', 'ha', 'ha'] by my understanding.
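For the multi-character case, a brute-force sketch using only loops and if-statements might look like this. It assumes "longest repeating substring" means the back-to-back repetition that covers the most characters, with ties going to the shorter unit. The name longest_repeated_block is hypothetical, and the approach is roughly cubic in the length of msg, so it is only meant for short strings:

```python
def longest_repeated_block(msg):
    """Return (unit, count) for the consecutive repetition covering the most characters.

    Returns ("", 0) when nothing is repeated back to back.
    """
    best_unit, best_count = "", 0
    n = len(msg)
    for size in range(1, n + 1):              # candidate unit length
        for start in range(n - size + 1):     # candidate starting position
            unit = msg[start:start + size]
            count = 1
            pos = start + size
            # Count how many times the unit follows itself immediately.
            while msg[pos:pos + size] == unit:
                count += 1
                pos += size
            # Strict ">" keeps the shorter unit when two candidates tie.
            if count >= 2 and size * count > len(best_unit) * best_count:
                best_unit, best_count = unit, count
    return best_unit, best_count


unit, count = longest_repeated_block("hahahaha")
print([unit] * count)
```

For 'hahahaha' this picks the unit 'ha' with a count of 4 rather than 'haha' with a count of 2, because both cover eight characters and the shorter unit is found first.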

a) Think about what is happening when it makes the first match.
For example, given abcdeeef it sees that msg[4] matches msg[5]. It then goes and appends msg[4] to reps. Then msg[5] matches msg[6] and it appends msg[5] to reps. However, msg[6] does not match msg[7] so it does not append msg[6]. You are one short.
In order to fix this you need to append one extra for each string of matches. A good way to do this is to check if the character you're currently matching already exists in reps. If it does, append only the current one. If it does not, append it twice. (This assumes each character forms at most one run; a character that already appeared in an earlier run will come up one short again.)
if msg[i] == msg[i+1]:
    if msg[i] in reps:
        reps.append(msg[i])
    else:
        reps.append(msg[i])
        reps.append(msg[i])
b) You need to ensure that you do not exceed your boundaries. This can be accomplished by taking 1 off of your range.
for i in range(len(msg)-1):
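If the end goal is just the most-repeated character and its count, one option is to skip the reps list and keep a running counter instead. This is a sketch under that assumption, and longest_run is a hypothetical name. Comparing each character with the previous one (starting at index 1) avoids the out-of-bounds access entirely:

```python
def longest_run(msg):
    """Return (character, length) of the longest run of one repeated character."""
    if not msg:
        return "", 0
    best_char, best_len = msg[0], 1
    run_len = 1
    for i in range(1, len(msg)):
        if msg[i] == msg[i - 1]:
            run_len += 1      # still inside the same run
        else:
            run_len = 1       # a new run starts at this character
        if run_len > best_len:
            best_char, best_len = msg[i], run_len
    return best_char, best_len


print(longest_run("abcdeeef"))
```

For abcdeeef this reports the character 'e' with a run length of 3, and [char] * length rebuilds the list of repeats if that is still needed.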

Related

List modification doesn't change list

I'm trying to reverse a string, so I converted the string into a list and was trying to send the last element to the front, 2nd to last element to the 2nd space, etc.
word = input("Enter a word: ")
word = list(word)
count = 0
while count < len(word):
    word.insert(count, word.pop())
    count = count + 1
print(word)
It just returns the original string in list form, even though I'm saving the last letter and inserting it before popping it off of the string? Does word.pop() not capture the last letter of a string before deleting it or am I overlooking something?
Well, the simplest way to do what you are trying is to slice the string in reverse order; this does not even require converting it into a list:
word = input("Enter a word: ")
print(word[::-1])
Here's an experiment:
>>> word = list('python')
>>> word.insert(0, word[-1])
>>> word
['n', 'p', 'y', 't', 'h', 'o', 'n']
>>> word.remove(word[-1])
>>> word
['p', 'y', 't', 'h', 'o', 'n']
Wait, what?!
>>> help(word.remove)
Help on built-in function remove:
remove(value, /) method of builtins.list instance
    Remove first occurrence of value.
    Raises ValueError if the value is not present.
Remove first occurrence of value.
So, you inserted word[-1] at the beginning of the list, and then word.remove immediately removes the first occurrence of word[-1], which is now at the beginning of the list because you've just inserted it there!
You're setting the variables inside the while-loop to the same value. Also, use list.pop to remove the element from the list. For example:
word = input("Enter a word: ")
word = list(word)
count = 0
while count < len(word):
    word.insert(count, word.pop())
    count = count + 1
print(word)
Prints:
Enter a word: book
['k', 'o', 'o', 'b']
Here is the docstring for list.remove:
>>> help(list.remove)
Help on method_descriptor:
remove(self, value, /)
    Remove first occurrence of value.
    Raises ValueError if the value is not present.
>>>
As you can see, list.remove removes the first occurrence of the given value from the list. All your backwards function does right now is take the last character of the word, add it to the front and then immediately remove it from the front again. You do this once for every character in the word, the net result being no change.
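To see the difference side by side, here is a sketch that assumes the failing loop used insert followed by remove, as in the experiment above. Both function names are hypothetical, and reverse_with_remove is a reconstruction for illustration only:

```python
def reverse_with_remove(text):
    """Insert the last letter at position count, then remove() it by value."""
    letters = list(text)
    for count in range(len(letters)):
        letters.insert(count, letters[-1])
        # remove() deletes the FIRST matching value, which here is the copy
        # that was just inserted, so the list ends up unchanged.
        letters.remove(letters[-1])
    return letters


def reverse_with_pop(text):
    """Move the last letter to position count with pop(), which removes by position."""
    letters = list(text)
    for count in range(len(letters)):
        letters.insert(count, letters.pop())
    return letters


print(reverse_with_remove("python"))
print(reverse_with_pop("python"))
```

The first call prints the letters in their original order and the second prints them reversed, because pop() always takes the element at the end of the list regardless of its value.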

How do I check if a sequence of characters exists in a list?

I have a string with some characters that have sequences that reoccur. I know that strings are immutable so I turn the string into the list. However, I'm not sure how to iterate through the list, find the occurrence and change the first letter of the occurrence.
message: DDMCAXQVEKGYBNDDMZUH
Occurrence is: DDM
list: ['D', 'D', 'M', 'C', 'A', 'X', 'Q', 'V', 'E', 'K', 'G', 'Y', 'B', 'N', 'D', 'D', 'M', 'Z', 'U', 'H']
What I have currently simply turns the message into a list. I've tried different ways, which were unsuccessful; that's why I didn't post them. I'm not really asking you to write the code, but at least to explain how to achieve this.
It's a lot easier to check if a string exists in another string since you can simply use the in operator:
if 'DDM' in message:
    pass  # do something
But since your goal is to change the first letter of the occurrence, you can use the str.index method to obtain the index of the occurrence and then assemble a new string with slices of the current string and the new letter:
try:
    i = message.index('DDM')
    message = message[:i] + new_letter + message[i + 1:]
except ValueError:
    raise RuntimeError("Sequence 'DDM' not found in message.")
You can use re.sub():
import re
s = 'DDMCAXQVEKGYBNDDMZUH'
re.sub(r'DDM', '$DM', s)
# $DMCAXQVEKGYBN$DMZUH
A simple solution with a for-loop would be:
msg = 'DDMCAXQVEKGYBNDDMZUH'
occ = 'DDM'
for i in range(len(msg)):
    if msg[i:i+len(occ)] == occ:
        msg = msg[:i] + 'x' + msg[i+1:]
resulting in xDMCAXQVEKGYBNxDMZUH
This also works with overlapping substrings. For example:
msg = 'AAABAA'
occ = 'AA'
will give xxABxA
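Since the question starts from a list of characters, the same loop can also be written against a list. This is a sketch with hypothetical names (mark_occurrences, new_letter). It matches against the original message rather than the partly modified one, which gives the same results for the two examples above:

```python
def mark_occurrences(message, seq, new_letter):
    """Replace the first letter of every occurrence of seq with new_letter."""
    if not seq:
        return message            # nothing to look for
    letters = list(message)       # strings are immutable, lists are not
    for i in range(len(message) - len(seq) + 1):
        if message[i:i + len(seq)] == seq:
            letters[i] = new_letter
    return "".join(letters)


print(mark_occurrences("DDMCAXQVEKGYBNDDMZUH", "DDM", "x"))
print(mark_occurrences("AAABAA", "AA", "x"))
```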
The simplest way would be to use the string replace() method.
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
message = "DDMCAXQVEKGYBNDDMZUH"
print(message.replace("DDM", "ABC", 1))
With a count of 1, replace() replaces only the first occurrence of DDM in the message string.
output: ABCCAXQVEKGYBNDDMZUH
If I read your question carefully, you want to find the first occurrence of DDM in your message and replace its first character. In that case use the following:
message = "DDMCAXQVEKGYBNDDMZUH"
print(message.replace("DDM", "ADM", 1))
output: ADMCAXQVEKGYBNDDMZUH

Complexity of reverse sentence algorithm

I was working on a data-structure problem in Python where I have to reverse the order of the words in the array in the most efficient manner. I came up with the following solution to the problem
def reverse(arr, st, end):
    while st < end:
        arr[st], arr[end] = arr[end], arr[st]
        end -= 1
        st += 1
def reverse_arr(arr):
    arr = arr[::-1]
    st_index = 0
    length = len(arr)
    for i, val in enumerate(arr):
        if val == ' ':
            end_index = i-1
            reverse(arr, st_index, end_index)
            st_index = end_index + 2
        if i == length - 1:
            reverse(arr, st_index, length-1)
    return arr
If the arr is:
arr = [ 'p', 'e', 'r', 'f', 'e', 'c', 't', ' ',
'm', 'a', 'k', 'e', 's', ' ',
'p', 'r', 'a', 'c', 't', 'i', 'c', 'e' ]
It returns:
['p', 'r', 'a', 'c', 't', 'i', 'c', 'e', ' ',
'm', 'a', 'k', 'e', 's', ' ',
'p', 'e', 'r', 'f', 'e', 'c', 't']
The solution works fine but I don't understand how the complexity of this algorithm is O(n). It's written that traversing the array twice with a constant number of actions for each item is linear i.e. O(n) where n is the length of the array.
I think it should be more than O(n) because the length of each word is not fixed, and the time to reverse each word depends on its length. Can someone explain this in a better way?
reverse will get called once for each word. During that call, it will do a constant amount of work per character.
You can either represent this in terms of the number of words and average length of words (i.e. O(wordCount*averageWordLength)), or in terms of the total number of characters in the array. If you do the latter, it's easy to see that you're still doing a constant amount of work per character (since both reverse and reverse_arr do a constant amount of work per character, and no two reverse calls will include the same character), leading to O(characterCount) complexity.
I would not assume that "the length of the array" in the explanation refers to the number of words, but rather the number of characters, or they're assuming the word length has a fixed upper bound (in which case the complexity is indeed O(wordCount)).
TL;DR: n in O(n) is characterCount, not wordCount.
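One way to convince yourself is to count the swaps. The sketch below is an instrumented variant of the question's code, with hypothetical names (reverse_counting, reverse_words_counting), that returns the number of swaps alongside the result. Since no character belongs to two words, the total can never exceed half the number of characters:

```python
def reverse_counting(arr, st, end):
    """Reverse arr[st..end] in place and return how many swaps were made."""
    swaps = 0
    while st < end:
        arr[st], arr[end] = arr[end], arr[st]
        st += 1
        end -= 1
        swaps += 1
    return swaps


def reverse_words_counting(arr):
    """Reverse the word order of arr; return (result, total swaps)."""
    arr = arr[::-1]               # one pass over all characters
    swaps = 0
    st_index = 0
    for i, val in enumerate(arr):
        if val == ' ':
            swaps += reverse_counting(arr, st_index, i - 1)
            st_index = i + 1
    swaps += reverse_counting(arr, st_index, len(arr) - 1)   # last word
    return arr, swaps


result, swaps = reverse_words_counting(list("perfect makes practice"))
print(''.join(result), swaps, len(result))
```

For the 22-character example the words need 4 + 2 + 3 = 9 swaps in total, however the characters are divided among the words.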
def reverse(arr, st, end):
    while st < end:
        arr[st], arr[end] = arr[end], arr[st]
        end -= 1
        st += 1
def reverse_Cha(arr):
    arr = arr[::-1]
    st_index = 0
    length = len(arr)
    for i, val in enumerate(arr):
        if val == ' ':
            end_index = i-1
            reverse(arr, st_index, end_index)
            st_index = end_index + 2
        if i == length - 1:
            reverse(arr, st_index, length-1)
    return arr
def reverse_Jon(arr):
    r = [ch for word in ' '.join(''.join(arr).split()[::-1]) for ch in word]
    return r
def reverse_Nua(arr):
    rev_arr = list(' '.join(''.join(arr).split()[::-1]))
    return rev_arr
Consider the 3 proposed solutions: yours as reverse_Cha, Jon Clements' as reverse_Jon, and mine as reverse_Nua.
Note that we have O(n) whenever we use [::-1], whenever we examine each element of a list (of length n), etc.
reverse_Cha uses [::-1], then examines each element twice (to read, then to exchange); the complexity thus depends on the total number of elements (O(3n+c), which we write as O(n); +c comes from the O(1) operations).
reverse_Jon uses [::-1], then examines each element twice (each character of each word); the complexity thus depends on the total number of elements and the number of words (O(3n+m), which we write as O(n+m), with m the number of words; since m is at most n, this is still O(n)).
reverse_Nua uses [::-1], then sticks to Python list functions; the complexity thus still depends on the total number of elements (just O(n) directly this time).
In terms of performance (1e6 loops), we got reverse_Cha: 2.785867s; reverse_Jon: 4.11845s (due to the for loop); reverse_Nua: 1.185973s.
I assume this is a purely theoretical question, because in real world applications you would probably rather split your list into one-word sublists, then rejoin the sublists in reverse order - that requires more memory, but is much faster.
Having said that, I'd like to point out that the algorithm you've shown is, indeed, O(n): it depends on the total length of your words, not on the lengths of individual words. In other words, it will take the same time for 20 3-letter words, 6 10-letter words, 10 6-letter words… You always go through every letter only twice: once during reversal of the whole array (the arr[::-1] slice in reverse_arr) and once during reversal of individual words (the calls to reverse).

Can this function return an int value by linking the elements of a list?

I am creating a function as part of a tiny word game, and I got stuck when I tried to write the body. I was looking for information about whether I can write a return statement in Python that does this. It seems that it is possible, but I didn't find anything clear about it. This is my current progress. Am I close, or should I try another method?
def num_words_on_board(board, words):
    """ (list of list of str, list of str) -> int
    Return how many words appear on board.
    >>> num_words_on_board([['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']], ['ANT', 'BOX', 'SOB', 'TO'])
    3
    """
    count = 0
    for word_list in board:
        if words in ''.join(word_list):
            count = count + 1
    return count
Your question is lacking in explanation, but I'll answer the best I understood.
I think you are trying to do something like a wordsearch solver, mixed with a scramble puzzle?
Anyways, my recommendation is to make multiple functions for everything you need to solve. For example:
The way I see it, you need to know if the letters in the board can make up each of the words in the words variable. That can be done by one function. If you don't really need the order of the word, just the length, then we can do it like this.
def same(word, letters):
    temp = []
    for every_letter in word:
        if every_letter in letters and every_letter not in temp:
            temp.append(every_letter)
    return len(temp) >= len(word)
This function takes only one word and a "bunch of letters" (for example a list from board ;) ) as parameters. It then checks each letter in the word against the "bunch of letters", and if it finds a match it adds the letter to a temp variable. At the end of the iterations, if the temp variable has at least the same count of letters as the initial word, then it's safe to say that the word can be built.
*There is a problem with this function: if the original word has repeated letters, for example the word "butter", then this function will not work. But since this is not your case, we are good to continue.
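If repeated letters ever do matter, one possible way around that limitation is to compare letter counts instead of collecting distinct letters. This is only a sketch using collections.Counter from the standard library, and can_build is a hypothetical name:

```python
from collections import Counter


def can_build(word, letters):
    """Return True if letters contains every letter of word often enough."""
    needed = Counter(word)        # how many of each letter the word requires
    available = Counter(letters)  # how many of each letter are on hand
    # A missing letter simply counts as 0 in a Counter.
    return all(available[ch] >= n for ch, n in needed.items())


print(can_build("ANT", ['A', 'N', 'T', 'T']))
print(can_build("BUTTER", list("BUTER")))
```

Like the function above, this ignores the order of the letters; it only checks that enough of each one is available.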
For the second part we have to use that function for every word in words, so we'll use another function for that:
def num_words_on_board(board, words):
    count = 0
    for word in words:
        for letters in board:
            if same(word, letters):
                count += 1
    print(count)  # This is not needed, just here for testing.
    return count
And there we go. This function should return the count which is 3. I hope this helps you.
(if anyone wanna correct my code please feel free, it's not optimal by any means. Just thought it would be easy to understand like this, rather than the answer in the duplicate question mentioned by Stefan Pochmann)
:)
I had a previous function; can I use it to create this new one?
My previous function is this:
def board_contains_word(board, word):
    """ (list of list of str, str) -> bool
    Return True if and only if word appears in board.
    Precondition: board has at least one row and one column.
    >>> board_contains_word([['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']], 'ANT')
    True
    >>> board_contains_word([['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']], 'NNT')
    False
    >>> board_contains_word([['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']], 'NTT')
    True
    """
    for word_list in board:
        if word in ''.join(word_list):
            return True
    return False
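Reusing that helper is probably the most direct route: loop over the words and count the ones the helper accepts. The sketch below makes one assumption that goes beyond the function shown above: it also reads words top to bottom in columns, because the expected value of 3 in the num_words_on_board docstring is only reached if 'TO' counts. The column check via zip(*board) is my addition, not part of the original helper:

```python
def board_contains_word(board, word):
    """Return True if word appears left-to-right in a row or top-to-bottom in a column."""
    rows = [''.join(row) for row in board]
    columns = [''.join(col) for col in zip(*board)]   # assumption: columns count too
    for line in rows + columns:
        if word in line:
            return True
    return False


def num_words_on_board(board, words):
    """Return how many of the given words appear on board."""
    count = 0
    for word in words:
        if board_contains_word(board, word):
            count = count + 1
    return count


board = [['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']]
print(num_words_on_board(board, ['ANT', 'BOX', 'SOB', 'TO']))
```

Here 'ANT' and 'SOB' are found in rows, 'TO' in the third column, and 'BOX' nowhere, which gives 3.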

Eliminating last element in array

So I am working on a small hangman text based game.
The problem I am currently dealing with is calling random words from my text file. Each word has one additional character for a new line (\n).
For instance, running through my function that separates a string's letters into individual elements I get something to the effect of:
from text file: guess
answer = arrange_word(guess)
>>>>> ['g', 'u', 'e', 's', 's', '\n']
however, when joining the array back together the following is shown:
print(''.join(answer))
>>>>> guess
as you can see, it is a bit difficult to guess an element that does not show up.
For clarity here is my function for arrange_word:
def arrange_word(word):
    ##########
    # This collects the mystery word and breaks it into an array of
    # individual letters.
    ##########
    word_length = len(word)
    break_up = ["" for x in range(word_length)]
    for i in range(0, word_length):
        break_up[i] = word[i]
    return break_up
What I am stuck on is that when trying to guess letters, the \n is impossible to guess. The win condition of my game is based on the guess being identical to the answer word. However the \n keeps that from working because they are of different length.
These answer arrays are of different length as well, since I am just pulling random lines from a text file of ~1000 words. After hours of searching I cannot seem to find out how to drop the last element of an array.
For this line here:
word_length = len(word)
Before you take the length, what you can do is this first:
word = word.strip()
Explanation:
strip removes leading and trailing whitespace.
>>> s = "bob\n"
>>> s
'bob\n'
>>> s.strip()
'bob'
With all this in mind, you don't need the rest of this code anymore:
word_length = len(word)
break_up = ["" for x in range(word_length)]
for i in range(0, word_length):
    break_up[i] = word[i]
Applying the strip will give you your word without the whitespace character, then all you want to do after this to have a list of characters, is simply:
>>> s = "bob"
>>> list(s)
['b', 'o', 'b']
So your method can now simply be:
def arrange_word(word):
    return list(word.strip())
Demo:
arrange_word("guess")
Output:
['g', 'u', 'e', 's', 's']
All these answers are fine for specifically stripping whitespace characters from a string, but more generally, Python lists implement standard stack/queue operations, and you can make your word into a list just by calling the list() constructor without needing to write your own function:
In [38]: letters = list('guess\n')
letters.pop()
letters
Out[38]: ['g', 'u', 'e', 's', 's']
Use list slicing:
arr = [1,2,3,4]
print(arr[:-1:])
List slicing syntax is [start:stop:step] (a step of 2, for example, means every second element). So arr[:-1:] means: start at the beginning of the list and go up to, but not including, the last element, taking every element. In your case you could write:
return break_up[:-1:]
You can access the last element with index -1, like:
guess[-1]
and you can delete it with:
del guess[-1]
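The three ways of dropping the last element mentioned so far differ mainly in whether they change the list in place. A small comparison, using made-up variable names:

```python
letters = ['g', 'u', 'e', 's', 's', '\n']

# Slicing builds a new list and leaves the original untouched.
trimmed = letters[:-1]

# pop() removes the last element in place and returns it.
popped_from = list(letters)
last = popped_from.pop()

# del removes the last element in place and returns nothing.
deleted_from = list(letters)
del deleted_from[-1]

print(trimmed, popped_from, deleted_from, repr(last))
```

All three drop whatever the last element happens to be, so they only help when the trailing newline is guaranteed to be there; strip() is the safer choice if some lines might lack it.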
Just strip the word:
word = 'guess\n'
word = word.strip() ## 'guess' without new line character, or spaces
Maybe first line of your arrange_word function should be
word = word.strip()
to remove all leading/trailing whitespace characters.
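It may be simpler still to strip the newline when the words are read, so that nothing downstream ever sees it. The sketch below assumes one word per line. pick_word is a hypothetical name, and it takes any iterable of lines (such as an open file object) because the question does not show how the file is read:

```python
import random


def pick_word(lines):
    """Return a random word from lines, ignoring blank lines and trailing newlines."""
    words = []
    for line in lines:
        word = line.strip()       # drops '\n' and any surrounding spaces
        if word:                  # skip blank lines
            words.append(word)
    return random.choice(words)


answer = list(pick_word(["guess\n", "\n", "hangman\n"]))
print(answer)
```

With the newline removed at this point, the answer list and the guessed letters have the same length, so the win condition can compare them directly.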
