Substring of a string from a point where character starts to repeat - python

I am a sophomore CS student practicing for interviews. In this problem, I am trying to print the substrings of an input string, split at the point where a character starts to repeat. In other words, for a string like 'college', I want to print 'col', 'lege', 'colleg', 'e'.
My implementation is shown below, but I also wanted to ask how to approach these types of problems, because they are really tricky, and whether there are particular algorithms that help to get the hang of them quickly.
def checkrepeat(word):
    i = 0
    temp_w =''
    check_char = {}
    my_l = list()
    while i < len(word)-1:
        if word[i] not in check_char:
            temp_w += word[i]
            check_char[word[i]] = i
        else:
            my_l.append(temp_w)
            temp_w=''
            i = check_char[word[i]]
            check_char.pop(word[i])
        i+=1
    return my_l
print(checkrepeat('college'))

This may not be best practice, but it seems functional:
def checkrepeat(word):
    for letter in set(word):
        split_word = []
        copyword = word
        # Peel off pieces from the right until the letter occurs only once.
        while copyword.count(letter) > 1:
            split_loc = copyword.rfind(letter)
            split_word.insert(0, copyword[split_loc:])
            copyword = copyword[:split_loc]
        if len(split_word) > 0:
            split_word.insert(0, copyword)
            print(split_word)
checkrepeat('college')
set(word) gives us the set of unique characters in word. We create an empty list (split_word) to hold the separate sections of the word. count tells us how many times a letter appears in a string; we want to keep splitting until the remaining prefix contains the given letter only once.
We work on a copy of word (we need to repeat the exercise for each duplicated letter, so we don't want to tamper with the original word variable) and add the end section of the word, from the last occurrence of our letter onwards, to the start of our list. We repeat this until copyword contains our letter only once, at which point we exit the while loop. The remaining characters of copyword are then added to the start of our list, and we print the list. Because sets are unordered, the order of the output lines may vary. This example prints:
['colleg', 'e']
['col', 'lege']

EDIT2 - The Working Solution, that's semi-elegant and almost Pythonic:
def split_on_recursion(your_string, repeat_character): #Recursive function
    temp_string = ''
    i = 0
    for character in your_string:
        if repeat_character == character:
            if i==1:
                return split_on_recursion(temp_string, repeat_character) #Recursion
            else:
                i += 1
        temp_string += character
    return temp_string
def split_on_repeat(your_string):
    temp_dict = {}
    your_dict = {}
    your_end_strings = []
    for char in set(your_string):
        temp_dict[char] = your_string.count(char) #Notice temp_dict
    for key in temp_dict:
        if temp_dict[key] >= 2:
            your_dict[key] = temp_dict[key] #Isolate only the characters which repeat
    if your_dict != {}:
        for key in your_dict:
            pre_repeat_string = split_on_recursion(your_string,key)
            post_repeat_string = your_string.replace(pre_repeat_string,'')
            your_end_strings.append((pre_repeat_string, post_repeat_string))
    else:
        your_end_strings = [(your_string)]
    return your_end_strings
Use:
>>> print(split_on_repeat('Innocent'))
[('In', 'nocent')]
>>> print(split_on_repeat('College'))
[('Colleg', 'e'), ('Col', 'lege')]
>>> print(split_on_repeat('Python.py'))
[('Python.p', 'y')]
>>> print(split_on_repeat('Systems'))
[('System', 's')]
As written, the solution is case-sensitive, but that is a minor issue.
To fathom the solution, though, you need to understand how recursion works. If you don't, this might not be a great example; I would recommend starting with math problems.
But here's some quick context about how indexing and slicing work in Python:
'Word'[:2] == 'Wo'
'Word'[-1] == 'd'
'Word'[:-1] == 'Wor'
This works for every sequence type, such as strings, lists, and tuples.
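As a quick illustration of these slicing rules, here is a minimal sketch (the variable names are only for this example). The point to keep in mind is that the stop index of a slice is exclusive, which is what makes splitting a string at position i into two pieces so convenient:

```python
word = 'Word'

# The stop index is exclusive, so [:1] keeps only the first character.
first_char = word[:1]      # 'W'
first_two = word[:2]       # 'Wo'
last_char = word[-1]       # 'd'
all_but_last = word[:-1]   # 'Wor'

# Any split point i divides the string into two pieces that rejoin losslessly.
i = 2
head, tail = word[:i], word[i:]
print(first_char, first_two, last_char, all_but_last, head, tail)
```

This is the same head/tail split that the answers here use to cut a word at the position of a repeated character.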

Solution derived from @asongtoruin's original idea:
import collections
def checkrepeat(word):
    # Count how often each character occurs.
    out = collections.defaultdict(int)
    for c in word:
        out[c] += 1
    # Keep only the repeated characters, each mapped to an empty list of pieces.
    out = {k: [] for (k, v) in out.items() if v > 1}
    for letter, split_word in out.items():
        copyword = word
        while copyword.count(letter) > 1:
            split_loc = copyword.rfind(letter)
            split_word.insert(0, copyword[split_loc:])
            copyword = copyword[:split_loc]
        if len(split_word) > 0:
            split_word.insert(0, copyword)
    return out
for word in ["bloomberg", "college", "systems"]:
    print(checkrepeat(word))
Output (Python 3.7+, where dicts keep insertion order):
{'b': ['bloom', 'berg'], 'o': ['blo', 'omberg']}
{'l': ['col', 'lege'], 'e': ['colleg', 'e']}
{'s': ['sy', 'stem', 's']}

def split_repeated(string):
    visited = set()
    res = []
    for i, c in enumerate(string):
        # A character that was seen before marks a split point.
        if c in visited:
            res.append([string[0:i], string[i:]])
        visited.add(c)
    return res
Output:
split_repeated("college")
#=> [['col', 'lege'], ['colleg', 'e']]
split_repeated("hello world")
#=> [['hel', 'lo world'], ['hello w', 'orld'], ['hello wor', 'ld']]
If you need to split the string only the first time each repeated letter is met:
def split_repeated_unique(string):
    visited = set()
    shown = set()
    res = []
    for i, c in enumerate(string):
        if c in visited:
            # Report each repeated character only once.
            if c not in shown:
                res.append([string[0:i], string[i:]])
                shown.add(c)
        else:
            visited.add(c)
    return res
The key difference is the following:
split_repeated("Hello, Dolly")
#=> [['Hel', 'lo, Dolly'], ['Hello, D', 'olly'], ['Hello, Do', 'lly'], ['Hello, Dol', 'ly']]
split_repeated_unique("Hello, Dolly")
#=> [['Hel', 'lo, Dolly'], ['Hello, D', 'olly']]

Related

python given query string find a set of strings with same beginning

Edit: I appreciate all the answers but could anyone tell me why my solution is not working? I wanted to try to do this without the .startswith() thank you!
I am trying to complete this exercise:
Implement an autocomplete system. That is, given a query string and a set of all possible query strings,
return all strings in the set that have s as a prefix.
For example, given the query string de and the set of strings [dog, deer, deal], return [deer, deal].
Hint: Try preprocessing the dictionary into a more efficient data structure to speed up queries.
But I get an empty list. What could I be doing wrong? I thought this would give me [deer, deal].
def autocomplete(string,set):
    string_letters = []
    letter_counter = 0
    list_to_return = []
    for letter in string:
        string_letters.append(letter)
    for words in set:
        for letter in words:
            if letter_counter == len(string):
                list_to_return.append(words)
            if letter == string_letters[letter_counter]:
                letter_counter += 1
            else:
                break
    return list_to_return
print(autocomplete("de", ["dog","deer","deal"]))
output:
[]
Here is how I would accomplish what you are trying to do:
import re
strings = ['dog', 'deer', 'deal']
search = 'de'
pattern = re.compile('^' + re.escape(search))  # escape so regex metacharacters in the query are matched literally
[x for x in strings if pattern.match(x)]
RESULT: ['deer', 'deal']
However, for a use case like this you will usually want to ignore case in both the search string and the candidate strings.
import re
strings = ['dog', 'Deer', 'deal']
search = 'De'
pattern = re.compile('^' + re.escape(search), re.IGNORECASE)
[x for x in strings if pattern.match(x)]
RESULT: ['Deer', 'deal']
To see why your code does not work, it helps to add some debug output:
def autocomplete(string,set):
    string_letters = []
    letter_counter = 0
    list_to_return = []
    for letter in string:
        string_letters.append(letter)
    for word in set:
        print(word)
        for letter in word:
            print(letter, letter_counter, len(string))
            if letter_counter == len(string):
                list_to_return.append(word)
            if letter == string_letters[letter_counter]:
                letter_counter += 1
            else:
                print('hit break')
                break
    return list_to_return
print(autocomplete("de", ["dog","deer","deal"]))
Output (Python 3):
dog
d 0 2
o 1 2
hit break
deer
d 1 2
hit break
deal
d 1 2
hit break
[]
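As you can see in the output, for dog the 'd' matched but the 'o' did not, which left letter_counter at 1. Then for deer, 'd' != 'e', so the loop breaks immediately, and the same happens for every later word because the counter is never reset. The stale counter can also cause wrong matches: a word such as 'eel' checked right after 'dog' would match on its 'e', get appended, and then raise an IndexError on the next letter. To fix this, reset letter_counter at the start of the loop over each word, and break as soon as the whole query has matched so that you never index past the end of string_letters.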
def autocomplete(string,set):
    string_letters = []
    list_to_return = []
    for letter in string:
        string_letters.append(letter)
    for word in set:
        # Reset letter_counter as it is only relevant to this word.
        letter_counter = 0
        print(word)
        for letter in word:
            print(letter, letter_counter, len(string))
            if letter == string_letters[letter_counter]:
                letter_counter += 1
            else:
                # We did not match, so break early.
                break
            if letter_counter == len(string):
                # We matched all letters: append and break.
                list_to_return.append(word)
                break
    return list_to_return
print(autocomplete("de", ["dog","deer","deal"]))
I notice the hint, but it's not stated as a requirement, so:
def autocomplete(string,set):
    return [s for s in set if s.startswith(string)]
print(autocomplete("de", ["dog","deer","deal"]))
str.startswith(n) will return a boolean value, True if the str starts with n, otherwise, False.
You can just use the startswith string function and avoid all those counters, like this:
def autocomplete(string, set):
    list_to_return = []
    for word in set:
        if word.startswith(string):
            list_to_return.append(word)
    return list_to_return
print(autocomplete("de", ["dog","deer","deal"]))
Simplify.
def autocomplete(string, set):
    back = []
    for elem in set:
        # Compare against the whole query, not just its first character.
        if elem.startswith(string):
            back.append(elem)
    return back
print(autocomplete("de", ["dog","deer","deal","not","this","one","dasd"]))
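The hint in the exercise about preprocessing is not addressed by the answers above. One possible reading, sketched here under the assumption that a sorted list counts as the "more efficient data structure" (a trie would be another option), is to sort the words once and binary-search for the prefix. The names build_index and autocomplete_sorted are made up for this sketch, and the comparison uses slicing so that str.startswith is not needed:

```python
from bisect import bisect_left

def build_index(words):
    # Preprocess once: sorting places all words that share a prefix next to each other.
    return sorted(words)

def autocomplete_sorted(prefix, index):
    # Binary search for the first word that is >= prefix.
    start = bisect_left(index, prefix)
    matches = []
    for word in index[start:]:
        # Compare by slicing; stop at the first word that no longer has the prefix.
        if word[:len(prefix)] != prefix:
            break
        matches.append(word)
    return matches

index = build_index(["dog", "deer", "deal"])
print(autocomplete_sorted("de", index))  # ['deal', 'deer']
```

Note that the results come back in sorted order rather than in the order of the original list. Each query costs a binary search plus the number of matches, instead of a scan over every word.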

multiplying letter of string by digits of number

I want to multiply the letters of a string by the digits of a number. For example, for the word "number" and the number "123"
the output would be "nuummmbeerrr". How do I create a function that does this? My code is not useful, because it doesn't do what I need.
This is all I have:
def new_word(s):
    b=""
    for i in range(len(s)):
        if i % 2 == 0:
            b = b + s[i] * int(s[i+1])
    return b
For new_word('a3n5z1') the output is aaannnnnz.
Using list comprehension and without itertools:
number = 123
word = "number"
new_word = "".join([character*n for (n, character) in zip(([int(c) for c in str(number)]*len(word))[0:len(word)], word)])
print(new_word)
# > 'nuummmbeerrr'
What it does (with more details) is the following:
number = 123
word = "number"
# the first trick is to link each character in the word to the number that we want
# for this, we repeat the list of digits and slice it so that we get a list...
# ... with length equal to the length of the word
# (repeating len(word) times guarantees there are enough digits for any word)
numbers_to_characters = ([int(c) for c in str(number)]*len(word))[0:len(word)]
print(numbers_to_characters)
# > [1, 2, 3, 1, 2, 3]
# then, we initialize an empty list to contain the repeated characters of the new word
repeated_characters_as_list = []
# we loop over each number in numbers_to_characters and each character in the word
for (n, character) in zip(numbers_to_characters, word):
    repeated_characters_as_list.append(character*n)
print(repeated_characters_as_list)
# > ['n', 'uu', 'mmm', 'b', 'ee', 'rrr']
new_word = "".join(repeated_characters_as_list)
print(new_word)
# > 'nuummmbeerrr'
This will solve your issue, feel free to modify it to fit your needs.
from itertools import cycle
numbers = cycle("123")
word = "number"
output = []
for letter in word:
    output += [letter for _ in range(int(next(numbers)))]
string_output = ''.join(output)
EDIT:
Since you're a beginner, this will be easier for you to understand, though I suggest reading up on the itertools module, since it's the right tool for this kind of task.
number = "123"
word = "number"
output = []
i = 0
for letter in word:
    # Wrap around to the first digit once all digits have been used.
    if(i == len(number)):
        i = 0
    output += [letter for _ in range(int(number[i]))]
    i += 1
string_output = ''.join(output)
print(string_output)
You can use zip to match each digit to its respective character in the word (using itertools.cycle in case the word is longer than the number), then multiply the character by that digit, and finally join everything into a single string.
Try this:
from itertools import cycle
word = "number"
number = 123
number_digits = [int(d) for d in str(number)]
result = "".join(letter*num for letter,num in zip(word,cycle(number_digits)))
print(result)
Output:
nuummmbeerrr
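Since the question asks for a function, the zip and cycle approach could be wrapped up as in the following sketch. The name repeat_letters is hypothetical, and the code assumes that number is a non-negative integer (or a string of digits):

```python
from itertools import cycle

def repeat_letters(word, number):
    # Digits of the number, reused cyclically when the word is longer than the number.
    digits = [int(d) for d in str(number)]
    return "".join(letter * n for letter, n in zip(word, cycle(digits)))

print(repeat_letters("number", 123))  # nuummmbeerrr
```

Because zip stops at the shorter input and cycle never ends, the result always covers exactly the letters of word.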

How to remove Triplicate Letters in Python

So I'm a little confused as far as putting this small code together. My teacher gave me this info:
Iterate over the string and remove any triplicated letters (e.g.
"byeee mmmy friiiennd" becomes "bye my friennd"). You may assume any
immediate following same letters are a triplicate.
I've mostly only seen examples for duplicates, so how do I remove triplicates? My code doesn't return anything when I run it.
def removeTriplicateLetters(i):
    result = ''
    for i in result:
        if i not in result:
            result.append(i)
    return result
def main():
    print(removeTriplicateLetters('byeee mmmy friiiennd'))
main()
I have generalized the scenario with "n". In your case, you can pass n=3 as below
def remove_n_plicates(input_string, n):
    i=0
    final_string = ''
    if not input_string:
        return final_string
    while(True):
        final_string += input_string[i]
        # If the next n characters are all the same, skip the whole run.
        if input_string[i:i+n] == input_string[i]*n:
            i += n
        else:
            i += 1
        if i >= len(input_string):
            break
    return final_string
input_string = "byeee mmmy friiiennd"
output_string = remove_n_plicates(input_string, 3)
print(output_string)
# bye my friennd
You can use this for any "n" value now (where n > 0 and n < length of input string)
Your code returns an empty string because that's exactly what you coded:
result = ''
for i in result:
    ...
return result
Since result is an empty string, you never enter the loop at all.
Even if you did enter the loop, you could not add anything:
for i in result:
    if i not in result:
The if makes no sense: to reach that statement, i must already be in result, so the condition is never true. (Strings also have no append method; you would need result += i.)
Instead, do as @newbie showed you. Iterate through the string, looking at a 3-character slice. If the slice is equal to 3 copies of its first character, then you've identified a triplet.
if input_string[i:i+n] == input_string[i]*n:
Without going into writing the code to resolve the problem:
As you iterate over the string, add each character to a new string.
If the next character is the same as the one you just added, do not add it to the new string.
This will catch both the triple and the double characters in your input.
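The idea described in words above could be sketched as follows (collapse_runs is a name chosen for this example, not from the answer). Be aware that it collapses every run of identical letters, so it is not quite what the assignment asks for: the double 'nn' is reduced as well, giving 'friend' rather than the expected 'friennd'.

```python
def collapse_runs(text):
    result = ''
    for char in text:
        # Skip the character if it repeats the one that was just added.
        if result and result[-1] == char:
            continue
        result += char
    return result

print(collapse_runs('byeee mmmy friiiennd'))  # bye my friend
```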
Tweaked a previous answer to remove a few lines that were not needed.
def remove_n_plicates(input_string, n):
    i=0
    result = ''
    while(True):
        result += input_string[i]
        if input_string[i:i+n] == input_string[i]*n:
            i += n
        else:
            i += 1
        if i >= len(input_string):
            break
    return result
input_string = "byeee mmmy friiiennd"
output_string = remove_n_plicates(input_string, 3)
print(output_string)
# bye my friennd
Here's a fun way using itertools.groupby:
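from itertools import groupby
def removeTriplicateLetters(s):
    # For each run of length l, keep one letter per full triple plus any leftover letters.
    return ''.join(k*(l//3+l%3) for k,l in ((k,len(list(g))) for k, g in groupby(s)))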
>>> removeTriplicateLetters('byeee mmmy friiiennd')
'bye my friennd'
Just modifying @newbie's solution to use a stack data structure:
def remove_n_plicates(input_string, n):
    if input_string =='' or n<1:
        return None
    w = ''
    c = 0
    if input_string!='':
        tmp =[]
        for i in range(len(input_string)):
            # A full run of n equal letters collapses to a single letter.
            if c==n:
                w+=str(tmp[-1])
                tmp=[]
                c =0
            if tmp==[]:
                tmp.append(input_string[i])
                c = 1
            else:
                if input_string[i]==tmp[-1]:
                    tmp.append(input_string[i])
                    c+=1
                elif input_string[i]!=tmp[-1]:
                    w+=str(''.join(tmp))
                    tmp=[input_string[i]]
                    c = 1
        # Flush the last group; collapse it too if the string ends on a full run.
        w+=tmp[-1] if c==n else ''.join(tmp)
    return w
input_string = "byeee mmmy friiiennd nnnn"
output_string = remove_n_plicates(input_string, 3)
print(output_string)
Output:
bye my friennd nn
So this is a bit dirty, but it's short and works:
def removeTriplicateLetters(i):
    result,string = i[:2],i[2:]
    for k in string:
        # Two trailing copies plus k make a triple: drop one and skip k.
        if result[-1]==k and result[-2]==k:
            result=result[:-1]
        else:
            result+=k
    return result
print(removeTriplicateLetters('byeee mmmy friiiennd'))
# bye my friennd
You already have a working solution, but here is another way to achieve your goal.
def removeTriplicateLetters(sentence):
    """
    :param sentence: The sentence to transform.
    :return: The sentence with every letter that occurs three or more times in a word reduced to one.
    """
    words = sentence.split(" ")  # split the sentence into words
    new_words = []  # the final words of the new sentence
    for word in words:  # loop through words of the sentence
        new_word = list(word)  # start from the unchanged word so words without triplicates are kept
        for char in word:  # loop through characters in a word
            position = word.index(char)
            if word.count(char) >= 3:
                new_word = [i for i in word if i != char]
                new_word.insert(position, char)
        new_words.append(''.join(new_word))
    return ' '.join(new_words)
def main():
    print(removeTriplicateLetters('byeee mmmy friiiennd'))
main()
Output: bye my friennd

How to write my own split function without using .split and .strip function?

How do I write my own split function? I think I need to treat spaces, '\t' and '\n' as separators, but I don't know enough to work out how to do it.
Here is the original question:
Write a function split(string) that returns a list of words in the
given string. Words may be separated by one or more spaces ' ' , tabs
'\t' or newline characters '\n' .
And there are examples:
words = split('duff_beer 4.00') # ['duff_beer', '4.00']
words = split('a b c\n') # ['a', 'b', 'c']
words = split('\tx y \n z ') # ['x', 'y', 'z']
Restrictions: Don't use the str.split method! Don't use the str.strip method
Some of the comments on your question provide really interesting ideas to solve the problem with the given restrictions.
But assuming you should not use any python builtin split function, here is another solution:
def split(string, delimiters=' \t\n'):
    result = []
    word = ''
    for c in string:
        if c not in delimiters:
            word += c
        elif word:
            # A delimiter ends the current word; empty words are skipped.
            result.append(word)
            word = ''
    if word:
        result.append(word)
    return result
Example output:
>>> split('duff_beer 4.00')
['duff_beer', '4.00']
>>> split('a b c\n')
['a', 'b', 'c']
>>> split('\tx y \n z ')
['x', 'y', 'z']
I think using regular expressions is your best option as well.
I would try something like this:
import re
def split(string):
    return re.findall(r'\S+', string)
This returns a list of all runs of non-whitespace characters in your string.
Example output:
>>> split('duff_beer 4.00')
['duff_beer', '4.00']
>>> split('a b c\n')
['a', 'b', 'c']
>>> split('\tx y \n z ')
['x', 'y', 'z']
This is what you can do by building up a list. This was tested on Python 3.6.
Below is just an example:
values = 'This is a sentence'
split_values = []
tmp = ''
for words in values:
    if words == ' ':
        split_values.append(tmp)
        tmp = ''
    else:
        tmp += words
if tmp:
    split_values.append(tmp)
print(split_values)
Desired output:
$ ./splt.py
['This', 'is', 'a', 'sentence']
You can use the following function that sticks to the basics, as your professor apparently prefers:
def split(s):
    output = []
    delimiters = {' ', '\t', '\n'}
    delimiter_found = False
    for c in s:
        if c in delimiters:
            delimiter_found = True
        elif output:
            if delimiter_found:
                output.append('')
                delimiter_found = False
            output[-1] += c
        else:
            output.append(c)
            # Reset the flag so leading delimiters do not split the first word.
            delimiter_found = False
    return output
so that:
print(split('duff_beer 4.00'))
print(split('a b c\n'))
print(split('\tx y \n z '))
would output:
['duff_beer', '4.00']
['a', 'b', 'c']
['x', 'y', 'z']
One approach would be to iterate over every character until you find a separator, build a string from those characters, and append it to the output list, like this:
def split(input_str):
    out_list = []
    word = ""
    for c in input_str:
        if c not in "\t\n ":
            word += c
        elif word:
            # Only store non-empty words, so repeated separators are skipped.
            out_list.append(word)
            word = ""
    if word:
        out_list.append(word)
    return out_list
a = "please\nsplit\tme now"
print(split(a))
# will print: ['please', 'split', 'me', 'now']
Another thing you could do is use a regex:
import re
def split(input_str):
    out_list = []
    for m in re.finditer(r'\S+', input_str):
        out_list.append(m.group(0))
    return out_list
a = "please\nsplit\tme now"
print(split(a))
# will print: ['please', 'split', 'me', 'now']
The regex \S+ is looking for any sequence of non whitespace characters and the function re.finditer returns an iterator with MatchObject instances over all non-overlapping matches for the regex pattern.
Please find my solution, it is not the best one, but it works:
def convert_list_to_string(b):
    localstring=""
    for i in b:
        localstring+=i
    return localstring
def convert_string_to_list(b):
    locallist=[]
    for i in b:
        locallist.append(i)
    return locallist
def mysplit(inputString, separator):
    listFromInputString=convert_string_to_list(inputString)
    part=[]
    result=[]
    j=0
    for i in range(0, len(listFromInputString)):
        if listFromInputString[i]==separator:
            part=listFromInputString[j:i]
            j=i+1
            result.append(convert_list_to_string(part))
        else:
            pass
    if j != 0:
        result.append(convert_list_to_string(listFromInputString[j:]))
    if len(result)==0:
        result.append(inputString)
    return result
Test:
mysplit("deesdfedefddfssd", 'd')
Result: ['', 'ees', 'fe', 'ef', '', 'fss', '']
Some of your solutions are very good, but it seems to me that there are simpler alternatives that don't need a function:
values = 'This is a sentence'
split_values = []
tmp = ''
for words in values:
    if words == ' ':
        split_values.append(tmp)
        tmp = ''
    else:
        tmp += words
if tmp:
    split_values.append(tmp)
print(split_values)
Here a is the string and s is the separator pattern:
a="Tapas Pall Tapas TPal TapP al Pala"
s="Tapas"
def fun(a,s):
    st=""
    l=len(s)
    li=[]
    lii=[]
    for i in range(0,len(a)):
        if a[i:i+l]!=s:
            st=st+a[i]
        elif i+l>len(a):
            st=st+a[i]
        else:
            li.append(st)
            i=i+l
            st=""
    li.append(st)
    lii.append(li[0])
    # Every piece after the first still starts with the tail of the separator; strip it.
    for i in li[1:]:
        lii.append(i[l-1:])
    return lii
print(fun(a,s))
print(a.split(s))
This handles whitespace in strings and returns an empty list when the input contains no words:
def mysplit(strng):
    result = []
    words = ''
    for char in strng:
        if char != ' ':
            words += char
        else:
            if words:
                result.append(words)
                words = ''
    result.append(words)
    for item in result:
        if item == '':
            result.remove(item)
    return result
print(mysplit("To be or not to be, that is the question"))
print(mysplit("To be or not to be,that is the question"))
print(mysplit(" "))
print(mysplit(" abc "))
print(mysplit(""))
def mysplit(strng):
    my_string = ''
    liste = []
    for x in range(len(strng)):
        my_string += "".join(strng[x])
        if strng[x] == ' ' or x+1 == len(strng):
            liste.append(my_string.strip())
            my_string = ''
    liste = [elem for elem in liste if elem!='']
    return liste
It is always a good idea to write out the algorithm before coding.
This is the procedure for splitting words on delimiters without using any Python built-in string method:
Step 0: Initialize an empty list called result, which will hold the resulting words, and an empty string called word, which accumulates the characters of the current word.
Step 1: Keep appending characters to word as long as no delimiter is reached.
Step 2: When you reach a delimiter and len(word) == 0, skip to the next iteration. This removes leading and repeated delimiters.
Step 3: When you reach a delimiter and len(word) != 0, append word to result, reset word, and skip to the next iteration.
Step 4: After the loop, append any remaining word and return result.
def my_split(s, delimiter = (" ", "\t", "\n")):
    result,word = [], "" # Step 0
    for i in range(len(s)):
        if s[i] in delimiter and len(word) == 0: # Step 2
            continue # Step 2: Skip, jump to the next iteration
        if s[i] in delimiter and len(word) != 0: # Step 3
            result.append(word) # Step 3
            word = "" # Step 3
            continue # Step 3: Skip, jump to the next iteration
        word = word + s[i] # Step 1.
    if word: # Step 4: keep the last word when the string does not end with a delimiter
        result.append(word)
    return result
print(my_split(" how are you? please split me now! "))
All the above answers are good; here is a similar solution that uses an extra list:
def my_split(s):
    l1 = []
    l2 = []
    word = ''
    spaces = ['', '\t', ' ']
    for letters in s:
        if letters != ' ':
            word += letters
        else:
            l1.append(word)
            word = ''
    if word:
        l1.append(word)
    # Filter out the empty entries produced by consecutive spaces.
    for words in l1:
        if words not in spaces:
            l2.append(words)
    return l2
my_string = ' The old fox jumps into the deep river'
y = my_split(my_string)
print(y)

Using a For Loop to Change Words in Strings to List Items

I am trying to use a for loop to find every word in a string that contains exactly one letter e.
My guess is that I need to use a for loop to first separate each word in the string into its own list (for example, this is a string into ['this'], ['is'], ['a'], ['string'])
Then, I can use another For Loop to check each word/list.
My string is stored in the variable joke.
I'm having trouble structuring my For Loop to make each word into its own list. Any suggestions?
j2 = []
for s in joke:
    if s[0] in j2:
        j2[s[0]] = joke.split()
    else:
        j2[s[0]] = s[0]
print(j2)
This is a classic case for list comprehensions. To generate a list of words containing exactly one letter 'e', you would use the following source.
words = [w for w in joke.split() if w.count('e') == 1]
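For finding words with exactly one letter 'e', you can also use a regex:
import re
# [^\We] matches any word character except 'e', so each match is a whole word containing a single 'e'.
mywords = re.findall(r"\b[^\We]*e[^\We]*\b", 'this is your e string e')
print(mywords)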
I would use Counter:
from collections import Counter
joke = "A string with some words they contain letters"
j2 = []
for w in joke.split():
    d = Counter(w)
    if 'e' in d.keys():
        if d['e'] == 1:
            j2.append(w)
print(j2)
This results in:
['some', 'they']
A different way to do it, using numpy, which avoids an explicit for loop:
import numpy as np
s = 'Where is my chocolate pizza'
s_np = np.array(s.split())
# Keep only the words whose count of 'e' is exactly one.
result = s_np[np.char.count(s_np, 'e') == 1]
This is one way:
mystr = 'this is a test string'
[i for i in mystr.split() if sum(k=='e' for k in i) == 1]
# test
If you need an explicit loop:
result = []
for i in mystr.split():
    if sum(k=='e' for k in i) == 1:
        result.append(i)
sentence = "The cow jumped over the moon."
new_str = sentence.split()
count = 0
for i in new_str:
    # Count only the words that contain exactly one 'e'.
    if i.count('e') == 1:
        count+=1
        print(i)
print(count)
