How to find the largest repeating substring given character in Python?

How to find the largest repeating substring given character in Python? - python

Given some string say 'aabaaab', how would I go about finding the largest substring of a. So it should return 'aaa'. Any help would be greatly appreciated.
def sub_string(s):
best_run = 0
current_run = 0
for char in s:
if char == 'a'
current_run += 1
else:
current_letter = char
return(best_run)
I have something like the one above. Not sure where I can fix it up.

not the most efficient, but a straightforward solution:
word = "aasfgaaassaasdsddaaaaaafff"
substr_count = 0
substr_counts = []
character = "f"
for i, letter in enumerate(word):
if (letter == character):
substr_count += 1
else:
substr_counts.append(substr_count)
substr_count = 0
if (i == len(word) - 1):
substr_counts.append(substr_count)
print(max(substr_counts))

If you want a short method using standard python tools (and avoid writing loops to reconstruct the string as you iterate), you can use regex to split the string by any non-a characters than get the max() according to len:
import re
test_string = 'aabaaab'
split_string_list = re.split( '[^a]', test_string )
longest_string_subset = max( split_string_list, key=len )
print( longest_string_subset )
The re library is for regex, the '[^a]' is a regex statement for any non-a character. Basically, the 'aabaaab' is being split into a list according to any matches on the regex statement, so that it becomes [ 'aa' 'aaa' '' ]. Then, the max() statement looks for the longest string based on len (aka length).
You can read more about functions like re.split() in the docs: https://docs.python.org/2/library/re.html

Related

Replace sequence of the same letter with single one

I am trying to replace the number of letters with a single one, but seems to be either hard either I am totally block how this should be done
So example of input:
aaaabbcddefff
The output should be abcdef
Here is what I was able to do, but when I went to the last piece of the string I can't get it done. Tried different variants, but I am stucked. Can someone help me finish this code?
text = "aaaabbcddefff"
new_string = ""
count = 0
while text:
for i in range(len(text)):
l = text[i]
for n in range(len(text)):
if text[n] == l:
count += 1
continue
new_string += l
text = text.replace(l, "", count)
break
count = 0
break

Using regex
re.sub(r"(.)(?=\1+)", "", text)
>>> import re
>>> text = "aaaabbcddefff"
>>> re.sub(r"(.)(?=\1+)", "", text)
abcdeaf

Side note: You should consider building your string up in a list and then joining the list, because it is expensive to append to a string, since strings are immutable.
One way to do this is to check if every letter you look at is equal to the previous letter, and only append it to the new string if it is not equal:
def remove_repeated_letters(s):
if not s: return ""
ret = [s[0]]
for index, char in enumerate(s[1:], 1):
if s[index-1] != char:
ret.append(char)
return "".join(ret)
Then, remove_repeated_letters("aaaabbcddefff") gives 'abcdef'.
remove_repeated_letters("aaaabbcddefffaaa") gives 'abcdefa'.
Alternatively, use itertools.groupby, which groups consecutive equal elements together, and join the keys of that operation
import itertools
def remove_repeated_letters(s):
return "".join(key for key, group in itertools.groupby(s))

How to use a list of numbers as index inputs

So I have a list of numbers (answer_index) which correlate to the index locations (indicies) of a characters (char) in a word (word). I would like to use the numbers in the list as index inputs later (indexes) on in code to replace every character except my chosen character(char) with "*" so that the final print (new_word) in this instance would be (****ee) instead of (coffee). it is important that (word) maintains it's original value while (new_word) becomes the modified version. Does anyone have a solution for turning a list into valid index inputs? I will also except easier ways to meet my goal. (Note: I am extremely new to python so I'm sure my code looks horrendous) Code below:
word = 'coffee'
print(word)
def find(string, char):
for i, c in enumerate(string):
if c == char:
yield i
string = word
char = "e"
indices = (list(find(string, char)))
answer_index = (list(indices))
print(answer_index)
for t in range(0, len(answer_index)):
answer_index[t] = int(answer_index[t])
indexes = [(answer_index)]
new_character = '*'
result = ''
for i in indexes:
new_word = word[:i] + new_character + word[i+1:]
print(new_word)

You hardly ever need to work with indices directly:
string = "coffee"
char_to_reveal = "e"
censored_string = "".join(char if char == char_to_reveal else "*" for char in string)
print(censored_string)
Output:
****ee
If you're trying to implement a game of hangman, you might be better off using a dictionary which maps characters to other characters:
string = "coffee"
map_to = "*" * len(string)
mapping = str.maketrans(string, map_to)
translated_string = string.translate(mapping)
print(f"All letters are currently hidden: {translated_string}")
char_to_reveal = "e"
del mapping[ord(char_to_reveal)]
translated_string = string.translate(mapping)
print(f"'{char_to_reveal}' has been revealed: {translated_string}")
Output:
All letters are currently hidden: ******
'e' has been revealed: ****ee

The easiest and fastest way to replace all characters except some is to use regular expression substitution. In this case, it would look something like:
import re
re.sub('[^e]', '*', 'coffee') # returns '****ee'
Here, [^...] is a pattern for negative character match. '[^e]' will match (and then replace) anything except "e".
Other options include decomposing the string into an iterable of characters (#PaulM's answer) or working with bytearray instead

In Python, it's often not idiomatic to use indexes, unless you really want to do something with them. I'd avoid them for this problem and instead just iterate over the word, read each character and and create a new word:
word = "coffee"
char_to_keep = "e"
new_word = ""
for char in word:
if char == char_to_keep:
new_word += char_to_keep
else:
new_word += "*"
print(new_word)
# prints: ****ee

Remove string character after run of n characters in string

Suppose you have a given string and an integer, n. Every time a character appears in the string more than n times in a row, you want to remove some of the characters so that it only appears n times in a row. For example, for the case n = 2, we would want the string 'aaabccdddd' to become 'aabccdd'. I have written this crude function that compiles without errors but doesn't quite get me what I want:
def strcut(string, n):
for i in range(len(string)):
for j in range(n):
if i + j < len(string)-(n-1):
if string[i] == string[i+j]:
beg = string[:i]
ends = string[i+1:]
string = beg + ends
print(string)
These are the outputs for strcut('aaabccdddd', n):
n
output
expected
1
'abcdd'
'abcd'
2
'acdd'
'aabccdd'
3
'acddd'
'aaabccddd'
I am new to python but I am pretty sure that my error is in line 3, 4 or 5 of my function. Does anyone have any suggestions or know of any methods that would make this easier?

This may not answer why your code does not work, but here's an alternate solution using regex:
import re
def strcut(string, n):
return re.sub(fr"(.)\1{{{n-1},}}", r"\1"*n, string)
How it works: First, the pattern formatted is "(.)\1{n-1,}". If n=3 then the pattern becomes "(.)\1{2,}"
(.) is a capture group that matches any single character
\1 matches the first capture group
{2,} matches the previous token 2 or more times
The replacement string is the first capture group repeated n times
For example: str = "aaaab" and n = 3. The first "a" is the capture group (.). The next 3 "aaa" matches \1{2,} - in this example a{2,}. So the whole thing matches "a" + "aaa" = "aaaa". That is replaced with "aaa".
regex101 can explain it better than me.

you can implement a stack data structure.
Idea is you add new character in stack, check if it is same as previous one or not in stack and yes then increase counter and check if counter is in limit or not if yes then add it into stack else not. if new character is not same as previous one then add that character in stack and set counter to 1
# your code goes here
def func(string, n):
stack = []
counter = None
for i in string:
if not stack:
counter = 1
stack.append(i)
elif stack[-1]==i:
if counter+1<=n:
stack.append(i)
counter+=1
elif stack[-1]!=i:
stack.append(i)
counter = 1
return ''.join(stack)
print(func('aaabbcdaaacccdsdsccddssse', 2)=='aabbcdaaccdsdsccddsse')
print(func('aaabccdddd',1 )=='abcd')
print(func('aaabccdddd',2 )=='aabccdd')
print(func('aaabccdddd',3 )=='aaabccddd')
output
True
True
True
True

The method I would use is creating a new empty string at the start of the function and then everytime you exceed the number of characters in the input string you just not insert them in the output string, this is computationally efficient because it is O(n) :
def strcut(string,n) :
new_string = ""
first_c, s = string[0], 0
for c in string :
if c != first_c :
first_c, s= c, 0
s += 1
if s > n : continue
else : new_string += c
return new_string
print(strcut("aabcaaabbba",2)) # output : #aabcaabba

Simply, to anwer the question
appears in the string more than n times in a row
the following code is small and simple, and will work fine :-)
def strcut(string: str, n: int) -> str:
tmp = "*" * (n+1)
for char in string:
if tmp[len(tmp) - n:] != char * n:
tmp += char
print(tmp[n+1:])
strcut("aaabccdddd", 1)
strcut("aaabccdddd", 2)
strcut("aaabccdddd", 3)
Output:
abcd
aabccdd
aaabccddd
Notes:
The character "*" in the line tmp = "*"*n+string[0:1] can be any character that is not in the string, it's just a placeholder to handle the start case when there are no characters.
The print(tmp[n:]) line simply removes the "*" characters added in the beginning.

You don't need nested loops. Keep track of the current character and its count. include characters when the count is less or equal to n, reset the current character and count when it changes.
def strcut(s,n):
result = '' # resulting string
char,count = '',0 # initial character and count
for c in s: # only loop once on the characters
if c == char: count += 1 # increase count
else: char,count = c,1 # reset character/count
if count<=n: result += c # include character if count is ok
return result

Just to give some ideas, this is a different approach. I didn't like how n was iterating each time even if I was on i=3 and n=2, I still jump to i=4 even though I already checked that character while going through n. And since you are checking the next n characters in the string, you method doesn't fit with keeping the strings in order. Here is a rough method that I find easier to read.
def strcut(string, n):
for i in range(len(string)-1,0,-1): # I go backwards assuming you want to keep the front characters
if string.count(string[i]) > n:
string = remove(string,i)
print(string)
def remove(string, i):
if i > len(string):
return string[:i]
return string[:i] + string[i+1:]
strcut('aaabccdddd',2)

Check special symbols in string endings

How to check special symbols such as !?,(). in the words ending? For example Hello??? or Hello,, or Hello! returns True but H!??llo or Hel,lo returns False.
I know how to check the only last symbol of string but how to check if two or more last characters are symbols?

You may have to use regex for this.
import re
def checkword(word):
m = re.match("\w+[!?,().]+$", word)
if m is not None:
return True
return False
That regex is:
\w+ # one or more word characters (a-zA-z)
[!?,().]+ # one or more of the characters inside the brackets
# (this is called a character class)
$ # assert end of string
Using re.match forces the match to begin at the beginning of the string, or else we'd have to use ^ before the regular expression.

You can try something like this:
word = "Hello!"
def checkSym(word):
return word[-1] in "!?,()."
print(checkSym(word))
The result is:
True
Try giving different strings as input and check the results.
In case you want to find every symbol from the end of the string, you can use:
def symbolCount(word):
i = len(word)-1
c = 0
while word[i] in "!?,().":
c = c + 1
i = i - 1
return c
Testing it with word = "Hello!?.":
print(symbolCount(word))
The result is:
3

If you want to get a count of the 'special' characters at the end of a given string.
special = '!?,().'
s = 'Hello???'
count = 0
for c in s[::-1]:
if c in special:
count += 1
else:
break
print("Found {} special characters at the end of the string.".format(count))

You can use re.findall:
import re
s = "Hello???"
if re.findall('\W+$', s):
pass

You could try this.
string="gffrwr."
print(string[-1] in "!?,().")

Counting words starting with a character

Write a function that accepts a string and a character as input and
returns the count of all the words in the string which start with the
given character. Assume that capitalization does not matter here. You
can assume that the input string is a sentence i.e. words are
separated by spaces and consists of alphabetic characters.
This is my code:
def count_input_character (input_str, character):
input_str = input_str.lower()
character = character.lower()
count = 0
for i in range (0, len(input_str)):
if (input_str[i] == character and input_str[i - 1] == " "):
count += 1
return (count)
#Main Program
input_str = input("Enter a string: ")
character = input("Enter character whose occurances are to be found in the given input string: ")
result = count_input_character(input_str, character)
#print(result)
The only part missing here is that how to check if the first word of the sentence is stating with the user given character. consider this output:
Your answer is NOT CORRECT Your code was tested with different inputs. > For example when your function is called as shown below:
count_input_character ('the brahman the master of the universe', 't')
####### Your function returns ############# 2 The returned variable type is: type 'int'
### Correct return value should be ######## 3 The returned variable type is: type 'int'

You function misses the first t because in this line
if (input_str[i] == character and input_str[i - 1] == " "):
when i is 0, then input_str[i - 1] is input_str[-1] which Python will resolve as the last character of the string!
To fix this, you could change your condition to
if input_str[i] == character and (i == 0 or input_str[i - 1] == " "):
Or use str.split with a list comprehension. Or a regular expression like r'(?i)\b%s', with (?i) meaning "ignore case", \b is word boundary and %s a placeholder for the character..

Instead of looking for spaces, you could split input_str on whitespace, this would produce a list of words that you could then test against character. (Pseudocode below)
function F sentence, character {
l = <sentence split by whitespace>
count = 0
for word in l {
if firstchar(word) == character {
count = count + 1
}
}
return count
}

Although it doesn't fix your specific bug, for educational purposes, please note you could rewrite your function like this using list comprehension:
def count_input_character (input_str, character):
return len([x for x in input_str.lower().split() if x.startswith(character.lower())])
or even more efficiently(thanks to tobias_k)
def count_input_character (input_str, character):
sum(w.startswith(character.lower()) for w in input_str.lower().split())

def c_upper(text, char):
text = text.title() #set leading char of words to uppercase
char = char.upper() #set given char to uppercase
k = 0 #counter
for i in text:
if i.istitle() and i == char: #checking conditions for problem, where i is a char in a given string
k = k + 1
return k

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to find the largest repeating substring given character in Python? - python

Related

Replace sequence of the same letter with single one

How to use a list of numbers as index inputs

Remove string character after run of n characters in string

Check special symbols in string endings

Counting words starting with a character

Categories

Resources