python given query string find a set of strings with same beginning

Edit: I appreciate all the answers but could anyone tell me why my solution is not working? I wanted to try to do this without the .startswith() thank you!
I am trying to complete this exercise:
Implement an autocomplete system. That is, given a query string and a set of all possible query strings,
return all strings in the set that have s as a prefix.
For example, given the query string de and the set of strings [dog, deer, deal], return [deer, deal].
Hint: Try preprocessing the dictionary into a more efficient data structure to speed up queries.
But I get an empty list. What could I be doing wrong? I thought this would give me [deer, deal].
def autocomplete(string,set):
    string_letters = []
    letter_counter = 0
    list_to_return = []
    for letter in string:
        string_letters.append(letter)
    for words in set:
        for letter in words:
            if letter_counter == len(string):
                list_to_return.append(words)
            if letter == string_letters[letter_counter]:
                letter_counter += 1
            else:
                break
    return list_to_return
print(autocomplete("de", ["dog","deer","deal"]))
output:
[]

Here is how I would accomplish what you are trying to do:
import re
strings = ['dog', 'deer', 'deal']
search = 'de'
# re.escape keeps characters such as '.' or '*' in the search text literal
pattern = re.compile('^' + re.escape(search))
[x for x in strings if pattern.match(x)]
RESULT: ['deer', 'deal']
However, in most use cases like this one you will want to ignore the case of both the search string and the strings being searched.
import re
strings = ['dog', 'Deer', 'deal']
search = 'De'
pattern = re.compile('^' + re.escape(search), re.IGNORECASE)
[x for x in strings if pattern.match(x)]
RESULT: ['Deer', 'deal']
To answer the part of why your code does not work, it helps to add some verbosity to the code:
def autocomplete(string,set):
    string_letters = []
    letter_counter = 0
    list_to_return = []
    for letter in string:
        string_letters.append(letter)
    for word in set:
        print(word)
        for letter in word:
            print(letter, letter_counter, len(string))
            if letter_counter == len(string):
                list_to_return.append(word)
            if letter == string_letters[letter_counter]:
                letter_counter += 1
            else:
                print('hit break')
                break
    return list_to_return
print(autocomplete("de", ["dog","deer","deal"]))
Output:
dog
('d', 0, 2)
('o', 1, 2)
hit break
deer
('d', 1, 2)
hit break
deal
('d', 1, 2)
hit break
[]
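As the output shows, for dog the 'd' matched but the 'o' did not, which left letter_counter at 1. For deer, 'd' is then compared with string_letters[1], which is 'e', so the loop breaks immediately, and the same happens for deal. Because the counter is never reset, a word such as 'eer' placed after 'dog' would actually be appended (and the next comparison would then raise an IndexError). To fix this, reset letter_counter for every word, and check for a complete match right after incrementing so that you break before the index runs past the end of string_letters.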
def autocomplete(string,set):
    string_letters = []
    list_to_return = []
    for letter in string:
        string_letters.append(letter)
    for word in set:
        # Reset letter_counter as it is only relevant to this word.
        letter_counter = 0
        print(word)
        for letter in word:
            print(letter, letter_counter, len(string))
            if letter == string_letters[letter_counter]:
                letter_counter += 1
            else:
                # We did not match, so break early.
                break
            if letter_counter == len(string):
                # We matched all letters: append and break.
                list_to_return.append(word)
                break
    return list_to_return
print(autocomplete("de", ["dog","deer","deal"]))
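Since the question asks for a version without .startswith(), the counter logic can also be replaced by a slice comparison. This is only a minimal sketch; the parameter names prefix and words are my own choice, not from the question.

```python
def autocomplete(prefix, words):
    """Return the words that begin with prefix, in their original order."""
    n = len(prefix)
    # word[:n] is at most n characters long, so words shorter than the
    # prefix simply fail the comparison instead of raising IndexError.
    return [word for word in words if word[:n] == prefix]


print(autocomplete("de", ["dog", "deer", "deal"]))  # ['deer', 'deal']
```

No counter has to be reset here because each word is compared in a single step.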

I notice the hint, but it's not stated as a requirement, so:
def autocomplete(string,set):
    return [s for s in set if s.startswith(string)]
print(autocomplete("de", ["dog","deer","deal"]))
str.startswith(n) will return a boolean value, True if the str starts with n, otherwise, False.

You can just use the startswith string function and avoid all those counters, like this:
def autocomplete(string, set):
    list_to_return = []
    for word in set:
        if word.startswith(string):
            list_to_return.append(word)
    return list_to_return
print(autocomplete("de", ["dog","deer","deal"]))

Simplify.
def autocomplete(string, set):
    back = []
    for elem in set:
        # Compare against the whole query; string[0] would only test the first letter
        if elem.startswith(string):
            back.append(elem)
    return back
print(autocomplete("de", ["dog","deer","deal","not","this","one","dasd"]))
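None of the answers above follows the hint about preprocessing the dictionary. One possible reading of that hint is a trie (prefix tree); the sketch below is an assumption about what the exercise intends, and PrefixIndex and complete are hypothetical names.

```python
class PrefixIndex:
    """Hypothetical helper: a trie stored as nested dicts."""

    _END = object()  # sentinel key marking the end of a stored word

    def __init__(self, words):
        self._root = {}
        for word in words:
            node = self._root
            for ch in word:
                node = node.setdefault(ch, {})
            node[self._END] = word

    def complete(self, prefix):
        """Return all stored words starting with prefix, sorted."""
        node = self._root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        # Iterative depth-first walk over everything below the prefix node
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            for key, child in current.items():
                if key is self._END:
                    found.append(child)
                else:
                    stack.append(child)
        return sorted(found)


index = PrefixIndex(["dog", "deer", "deal"])
print(index.complete("de"))  # ['deal', 'deer']
```

Building the index costs time proportional to the total length of the words, but each query then only walks the prefix and the matching subtree instead of scanning every string.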

Related

Comparing the Nth letter to Nth letters of multiple strings in python

I can't quite figure this one out.
I have multiple five letter long strings and I want to compare each of the letters of the strings to a single string, and then to know if any of the Nth letters of the strings are equal to the Nth letter of the string I'm comparing them to, like this:
string_1 = 'ghost'
string_2 = 'media'
string_3 = 'blind'
the_word = 'shine'
if the_word[0] == string_1[0] or the_word[0] == string_2[0] or the_word[0] == string_3[0] or the_word[1] == string_1[1] or the_word[1] == string_2[1]... and so on...
    print('The Nth letter of some of the strings is equal to the Nth letter of the_word')
else:
    print('None of the letters positions correspond')
If there are many strings to compare, the if statement gets very long, so there must be a better way of doing this.
I would also like to know what the corresponding letters are (in this case they would be H (string_1[1] == the_word[1]), I (string_3[2] == the_word[2]) and N (string_3[3] == the_word[3])).
If there is more than one corresponding letter, I would like the return value to be a list containing all of the letters.
Also, I don't need to know the position of a corresponding letter in the word, only whether there are any corresponding letters (and what they are).
I find this kind of hard to explain, so sorry for any confusion; I will be happy to elaborate.
Thank you!

IIUC, you can get to what you want using zip -
base_strings = zip(string_1, string_2, string_3)
for cmp_pair in zip(the_word, base_strings):
    if (cmp_pair[0] in cmp_pair[1]):
        print(cmp_pair[0])
Output
h
i
n

You can extract the logic to a dedicated function and call it over each character of the string to be checked:
string_1 = 'ghost'
string_2 = 'media'
string_3 = 'blind'
the_word = 'shine'
def check_letter(l, i, words):
    match = []
    for w in words:
        if w[i] == l:
            match.append(w)
    return match
for i in range(len(the_word)):
    l = the_word[i]
    print("checking letter: {}".format(l))
    match = check_letter(l, i, [string_1, string_2, string_3])
    if (len(match) > 0):
        print("found in: {}".format(match))
    else:
        print("found in: -")
The above code results in:
$ python3 test.py
checking letter: s
found in: -
checking letter: h
found in: ['ghost']
checking letter: i
found in: ['blind']
checking letter: n
found in: ['blind']
checking letter: e
found in: -

Maybe this answers your question:
strings = ['ghost', 'media', 'blind']
the_word = 'shine'
for s in strings:
    check = []
    lett = []
    for i in range(len(s)):
        if s[i] == the_word[i]:
            check.append(i)
            lett.append(s[i])
    if check:
        print('The letters {0} (position {1}) of the string {2} match the word {3}'.format(lett,check,s,the_word))
    else:
        print('No match between {0} and {1}'.format(s,the_word))

Well, one straightforward way would be the following:
string_1 = 'ghost'
string_2 = 'media'
string_3 = 'blind'
string_4 = 'trenn'
the_word = 'shine'
string_list = [string_1, string_2, string_3]
duplicate_letters_list = []
for string in string_list:
    for i in range(5):
        if the_word[i] == string[i]:
            print(f'{i}th letter is in {string} is a duplicate')
            if the_word[i] not in duplicate_letters_list:
                duplicate_letters_list.append(the_word[i])
print(duplicate_letters_list)
Output
1th letter is in ghost is a duplicate
2th letter is in blind is a duplicate
3th letter is in blind is a duplicate
['h', 'i', 'n']
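The zip idea from the first answer can be wrapped in a function that returns the list the question asks for. This is a sketch only; matching_letters is a hypothetical name, and it assumes all strings are at least as long as the_word (zip silently stops at the shortest input otherwise).

```python
def matching_letters(the_word, strings):
    """Return the letters of the_word found at the same index in any string."""
    matches = []
    # zip(*strings) yields one tuple per position: the Nth letter of every string
    for letter, column in zip(the_word, zip(*strings)):
        if letter in column:
            matches.append(letter)
    return matches


print(matching_letters('shine', ['ghost', 'media', 'blind']))  # ['h', 'i', 'n']
```

An empty result list is falsy, so the same call also answers the yes/no part of the question.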

How to remove Triplicate Letters in Python

So I'm a little confused as far as putting this small code together. My teacher gave me this info:
Iterate over the string and remove any triplicated letters (e.g.
"byeee mmmy friiiennd" becomes "bye my friennd"). You may assume any
immediate following same letters are a triplicate.
I've mostly only seen examples for duplicates, so how do I remove triplicates? My code doesn't return anything when I run it.
def removeTriplicateLetters(i):
    result = ''
    for i in result:
        if i not in result:
            result.append(i)
    return result
def main():
    print(removeTriplicateLetters('byeee mmmy friiiennd'))
main()

I have generalized the scenario with "n". In your case, you can pass n=3 as below
def remove_n_plicates(input_string, n):
    i=0
    final_string = ''
    if not input_string:
        return final_string
    while(True):
        final_string += input_string[i]
        if input_string[i:i+n] == input_string[i]*n:
            i += n
        else:
            i += 1
        if i >= len(input_string):
            break
    return final_string
input_string = "byeee mmmy friiiennd"
output_string = remove_n_plicates(input_string, 3)
print(output_string)
# bye my friennd
You can use this for any "n" value now (where n > 0 and n < length of input string)

Your code returns an empty string because that's exactly what you coded:
result = ''
for i in result:
    ...
return result
Since result is an empty string, you don't enter the loop at all.
Even if you did enter the loop, you would never add anything to result:
for i in result:
    if i not in result:
The if makes no sense: to get to that statement, i must be in result.
Instead, do as @newbie showed you. Iterate through the string, looking at a 3-character slice. If the slice is equal to 3 copies of the first character, then you've identified a triplet.
if input_string[i:i+n] == input_string[i]*n:

Without going into the code needed to solve the problem, the idea is this:
When you iterate over the string, add the current character to a new string.
If the next character is the same as the previous one, do not add it to the new string.
This will catch both the triple and the double characters in your problem.
I tweaked a previous answer to remove a few lines that were not needed.
def remove_n_plicates(input_string, n):
    i = 0
    result = ''
    # Looping on the index also handles an empty input_string safely
    while i < len(input_string):
        result += input_string[i]
        if input_string[i:i+n] == input_string[i]*n:
            i += n
        else:
            i += 1
    return result
input_string = "byeee mmmy friiiennd"
output_string = remove_n_plicates(input_string, 3)
print(output_string)
# bye my friennd

Here's a fun way using itertools.groupby:
from itertools import groupby
def removeTriplicateLetters(s):
    # l//3 collapses each full triple to one letter; l%3 keeps the leftover letters
    return ''.join(k*(l//3+l%3) for k,l in ((k,len(list(g))) for k, g in groupby(s)))
>>> removeTriplicateLetters('byeee mmmy friiiennd')
'bye my friennd'

Just modifying @newbie's solution to use a stack data structure:
def remove_n_plicates(input_string, n):
    if input_string =='' or n<1:
        return None
    w = ''
    c = 0
    if input_string!='':
        tmp =[]
        for i in range(len(input_string)):
            if c==n:
                w+=str(tmp[-1])
                tmp=[]
                c =0
            if tmp==[]:
                tmp.append(input_string[i])
                c = 1
            else:
                if input_string[i]==tmp[-1]:
                    tmp.append(input_string[i])
                    c+=1
                elif input_string[i]!=tmp[-1]:
                    w+=str(''.join(tmp))
                    tmp=[input_string[i]]
                    c = 1
        # A full run of n at the very end of the string must be collapsed as well
        w += tmp[-1] if c == n else ''.join(tmp)
    return w
input_string = "byeee mmmy friiiennd nnnn"
output_string = remove_n_plicates(input_string, 3)
print(output_string)
output
bye my friennd nn

so this is a bit dirty but it's short and works
def removeTriplicateLetters(i):
    result,string = i[:2],i[2:]
    for k in string:
        if result[-1]==k and result[-2]==k:
            result=result[:-1]
        else:
            result+=k
    return result
print(removeTriplicateLetters('byeee mmmy friiiennd'))
bye my friennd

You already have a working solution, but here is another way to achieve your goal.
def removeTriplicateLetters(sentence):
    """
    :param sentence: The sentence to transform.
    :return: The sentence with every letter that occurs three or more times in a word reduced to one occurrence.
    """
    words = sentence.split(" ") # split the sentence into words
    new_words = []
    for word in words: # loop through words of the sentence
        new_word = list(word) # start from the unchanged word so words without triplicates are kept
        for char in word: # loop through characters in a word
            position = word.index(char)
            # Note: count() looks at the whole word, not only at consecutive letters
            if word.count(char) >= 3:
                new_word = [i for i in word if i != char]
                new_word.insert(position, char)
        new_words.append(''.join(new_word))
    return ' '.join(new_words)
def main():
    print(removeTriplicateLetters('byeee mmmy friiiennd'))
main()
Output: bye my friennd
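A regular expression with a backreference is another way to express "three identical characters in a row". This is a sketch that assumes, like the exercise, that only exact runs of three are collapsed; remove_triplicates is a hypothetical name.

```python
import re


def remove_triplicates(text):
    # (.) captures one character; \1\1 requires the same character twice more.
    # Each run of three is replaced by a single copy of that character.
    return re.sub(r'(.)\1\1', r'\1', text)


print(remove_triplicates('byeee mmmy friiiennd'))  # bye my friennd
```

A run of four such as 'nnnn' becomes 'nn', because the first three letters collapse to one and the fourth is left as it is.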

How to access each string in the lists of list

I was given a task to input multiple lines, each consisting of multiple words. The task is to uppercase the words with an odd length and lowercase the words with an even length.
My code now looks like this. Can you help me solve it correctly?
first = []
while True:
    line = input().split()
    first.append(line)
    if len(line) < 1:
        break
for i in first:
    for j in i:
        if len(line[i][j]) % 2 == 0:
            line[i][j] = line[i][j].lower()
        elif len(line[i][j]) % 2 != 0:
            line[i][j] = line[i][j].upper()
    print(first[i])
it should look like this

i and j are not indexes; they are the sublists and the words themselves. You can do:
for i in first: # i is a list of strings
    for j in range(len(i)): # you do need the index to mutate the list
        if len(i[j]) % 2 == 0:
            i[j] = i[j].lower()
        else:
            i[j] = i[j].upper()
    print(' '.join(i))

Looking at the input and output in your image, here is a better solution:
sentences = []
while True:
    word_list = input().split()
    if len(word_list) < 1:
        break
    sentences.append(word_list)
Now that you have your input from the command line, you can do
[[word.upper() if len(word)%2 == 1 else word.lower() for word in word_list] for word_list in sentences]
or you could extract the logic into a function
def apply_case(word):
    if len(word)%2:
        return word.upper()
    return word.lower()
# Keep one inner list per line so that each line can be joined separately below
new_sentences = [[apply_case(word) for word in word_list] for word_list in sentences]
Now you can print it like
output = "\n".join([" ".join(word_list) for word_list in new_sentences])
print(output)

You forgot to join the lines back together. Furthermore, from a software design point of view, you are doing too much in one code fragment: it is better to encapsulate the functionality in functions, like:
def wordcase(word):
    if len(word) % 2 == 0: # even
        return word.lower()
    else: # odd
        return word.upper()
Then we can even perform the processing "online" (that is, line by line):
while True:
    line = input()
    if not line:
        break
    else:
        print(' '.join(wordcase(word) for word in line.split()))
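To try the same logic without typing input interactively, it can be applied to a multi-line string. This is only a sketch: recase_lines is a hypothetical name, and wordcase is repeated here so that the block runs on its own.

```python
def wordcase(word):
    """Uppercase words of odd length, lowercase words of even length."""
    return word.upper() if len(word) % 2 else word.lower()


def recase_lines(text):
    """Apply wordcase to every word, keeping the original line breaks."""
    return '\n'.join(
        ' '.join(wordcase(word) for word in line.split())
        for line in text.splitlines()
    )


print(recase_lines("Hello world\nthis is a Test"))
# HELLO WORLD
# this is A test
```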

I don't think you need to be using i or j. You can just loop over the words in your string.
Further, although it probably won't speed things up, you don't need the elif; you can just use an else. There are only two options, odd and even, so you only need to check once.
sentence = 'I am using this as a test string with many words'
wordlist = sentence.split()
fixed_wordlist = []
for word in wordlist:
    if len(word)%2==0:
        fixed_wordlist.append(word.lower())
    else:
        fixed_wordlist.append(word.upper())
print(sentence, '\n', wordlist, '\n', fixed_wordlist)

Substring of a string from a point where character starts to repeat

I am a sophomore CS student and I was practicing for interviews. In this problem, I am trying to print the substrings of an input parameter, split at the point where a character starts to repeat. In other words, for a string like 'college', I want to print 'col', 'lege', 'colleg', 'e'.
My implementation is shown below, but I also wanted to ask how to approach these types of problems, because they are really tricky, and whether there are certain algorithms that help you get the hang of them quickly.
def checkrepeat(word):
    i = 0
    temp_w =''
    check_char = {}
    my_l = list()
    while i < len(word)-1:
        if word[i] not in check_char:
            temp_w += word[i]
            check_char[word[i]] = i
        else:
            my_l.append(temp_w)
            temp_w=''
            i = check_char[word[i]]
            check_char.pop(word[i])
        i+=1
    return my_l
print(checkrepeat('college'))

This may not be best practice, but it seems functional:
def checkrepeat(word):
    for letter in set(word):
        split_word = []
        copyword = word
        while copyword.count(letter) > 1:
            split_loc = copyword.rfind(letter)
            split_word.insert(0, copyword[split_loc:])
            copyword = copyword[:split_loc]
        if len(split_word) > 0:
            split_word.insert(0, copyword)
            print(split_word)
checkrepeat('college')
set(word) gives us the set of unique characters in word. We create an empty list (split_word) to hold the separate sections of the word. count lets us count the number of times a letter appears in a word; we want to split our word until every substring contains the given letter only once.
We work on a copy of word (we need to repeat the exercise for each duplicated letter, so we don't want to tamper with the original word variable), and add the end section of the word, from the last occurrence of our letter onwards, to the start of our list. We repeat this until copyword contains our letter only once, at which point we exit the while loop. The remaining characters of copyword must then be added to the start of our list, and we print the result. This example prints:
['colleg', 'e']
['col', 'lege']

EDIT2 - The Working Solution, that's semi-elegant and almost Pythonic:
def split_on_recursion(your_string, repeat_character): #Recursive function
    temp_string = ''
    i = 0
    for character in your_string:
        if repeat_character == character:
            if i==1:
                return split_on_recursion(temp_string, repeat_character) #Recursion
            else:
                i += 1
        temp_string += character
    return temp_string
def split_on_repeat(your_string):
    temp_dict = {}
    your_dict = {}
    your_end_strings = []
    for char in set(your_string):
        temp_dict[char] = your_string.count(char) #Notice temp_dict
    for key in temp_dict:
        if temp_dict[key] >= 2:
            your_dict[key] = temp_dict[key] #Isolate only the characters which repeat
    if your_dict != {}:
        for key in your_dict:
            pre_repeat_string = split_on_recursion(your_string,key)
            post_repeat_string = your_string.replace(pre_repeat_string,'')
            your_end_strings.append((pre_repeat_string, post_repeat_string))
    else:
        your_end_strings = [(your_string)]
    return your_end_strings
Use:
>>> print(split_on_repeat('Innocent'))
[('In', 'nocent')]
>>> print(split_on_repeat('College'))
[('Colleg', 'e'), ('Col', 'lege')]
>>> print(split_on_repeat('Python.py'))
[('Python.p', 'y')]
>>> print(split_on_repeat('Systems'))
[('System', 's')]
As it stands, the solution is case-sensitive, but that is a minor issue.
To fathom the solution, though, you need to understand how recursion works. If you don't, this might not be a great example; I would recommend starting with math problems.
But here's some quick context about how indexing and slicing work in Python:
'Word'[:2] == 'Wo'
'Word'[-1] == 'd'
'Word'[:-1] == 'Wor'
This works for every object that supports indexing and slicing.
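Every split in this thread comes down to slicing a string at one index. A minimal sketch of that step, where split_at is a hypothetical helper name:

```python
def split_at(text, index):
    """Split text into the part before index and the part from index onward."""
    return text[:index], text[index:]


word = 'college'
# The second 'l' is at index 3 and the second 'e' is at index 6
print(split_at(word, 3))  # ('col', 'lege')
print(split_at(word, 6))  # ('colleg', 'e')
```

The two halves always concatenate back to the original string, whatever index is used.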

Solution derived from @asongtoruin's original idea:
import collections
def checkrepeat(word):
    out = collections.defaultdict(int)
    for c in word:
        out[c] += 1
    # Keep only the letters that occur more than once
    out = {k: [] for (k, v) in out.items() if v > 1}
    for letter, split_word in out.items():
        copyword = word
        while copyword.count(letter) > 1:
            split_loc = copyword.rfind(letter)
            split_word.insert(0, copyword[split_loc:])
            copyword = copyword[:split_loc]
        if len(split_word) > 0:
            split_word.insert(0, copyword)
    return out
for word in ["bloomberg", "college", "systems"]:
    print(checkrepeat(word))
Output:
{'b': ['bloom', 'berg'], 'o': ['blo', 'omberg']}
{'e': ['colleg', 'e'], 'l': ['col', 'lege']}
{'s': ['sy', 'stem', 's']}

def split_repeated(string):
    visited = set()
    res = []
    for i, c in enumerate(string):
        if c in visited: res.append([string[0:i], string[i:]])
        visited.add(c)
    return res
Output:
split_repeated("college")
#=> [['col', 'lege'], ['colleg', 'e']]
split_repeated("hello world")
#=> [['hel', 'lo world'], ['hello w', 'orld'], ['hello wor', 'ld']]
If you need to split the string only the first time each repeated letter is met:
def split_repeated_unique(string):
    visited = set()
    shown = set()
    res = []
    for i, c in enumerate(string):
        if c in visited:
            if c not in shown:
                res.append([string[0:i], string[i:]])
                shown.add(c)
        else:
            visited.add(c)
    return res
The key difference is the following:
split_repeated("Hello, Dolly")
#=> [['Hel', 'lo, Dolly'], ['Hello, D', 'olly'], ['Hello, Do', 'lly'], ['Hello, Dol', 'ly']]
split_repeated_unique("Hello, Dolly")
#=> [['Hel', 'lo, Dolly'], ['Hello, D', 'olly']]

Swapping uppercase and lowercase in a string [duplicate]

I would like to change the chars of a string from lowercase to uppercase and vice versa.
My code is below. The output I get with my code is a; could you please tell me where I am wrong and explain why?
Thanks in advance
test = "AltERNating"
def to_alternating_case(string):
    words = list(string)
    for word in words:
        if word.isupper() == True:
            return word.lower()
        else:
            return word.upper()
print(to_alternating_case(test))

If you want to invert the case of that string, try this:
>>> 'AltERNating'.swapcase()
'aLTernATING'
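For comparison, here is a hand-written loop that builds the whole result before returning, checked against the built-in. It is only a sketch; swap_case is a hypothetical name, and the equality with str.swapcase() is only claimed for the plain ASCII input used here.

```python
def swap_case(text):
    """Return text with the case of every letter inverted."""
    swapped = []
    for ch in text:
        # Collect every converted character; returning here would stop at the first one
        swapped.append(ch.lower() if ch.isupper() else ch.upper())
    return ''.join(swapped)


print(swap_case('AltERNating'))  # aLTernATING
print(swap_case('AltERNating') == 'AltERNating'.swapcase())  # True
```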

There are two answers to this: an easy one and a hard one.
The easy one
Python has a built-in string method that does exactly this:
string.swapcase()
The hard one
You define your own function. Your function is wrong because iterating over a string yields it letter by letter, and you return on the first letter instead of continuing the iteration.
def to_alternating_case(string):
    temp = ""
    for character in string:
        if character.isupper() == True:
            temp += character.lower()
        else:
            temp += character.upper()
    return temp

Your loop iterates over the characters in the input string. It then returns from the very first iteration. Thus, you always get a 1-char return value.
test = "AltERNating"
def to_alternating_case(string):
    words = list(string)
    rval = ''
    for c in words:
        if c.isupper():
            rval += c.lower()
        else:
            rval += c.upper()
    return rval
print(to_alternating_case(test))

That's because your function returns the first character only: the return keyword ends your for loop on its first iteration.
Also, note that it is unnecessary to convert the string into a list with words = list(string), because you can iterate over a string just as you would over a list.
If you're looking for an algorithmic solution instead of swapcase(), modify your function this way:
test = "AltERNating"
def to_alternating_case(string):
    res = ""
    for word in string:
        if word.isupper() == True:
            res = res + word.lower()
        else:
            res = res + word.upper()
    return res
print(to_alternating_case(test))

You are returning after the first letter of the string, which is not what you expect. As others have suggested, you can loop directly over the string rather than converting it to a list, and the expression if <variable-name> == True can be simplified to if <variable-name>. The answer with these modifications is as follows:
test = "AltERNating"
def to_alternating_case(string):
    result = ''
    for word in string:
        if word.isupper():
            result += word.lower()
        else:
            result += word.upper()
    return result
print(to_alternating_case(test))
OR using a list comprehension:
def to_alternating_case(string):
    result =[word.lower() if word.isupper() else word.upper() for word in string]
    return ''.join(result)
OR using map and lambda:
def to_alternating_case(string):
    result = map(lambda word:word.lower() if word.isupper() else word.upper(), string)
    return ''.join(result)

You should do that like this:
test = "AltERNating"
def to_alternating_case(string):
    newstring = ""
    # Visit every character and return only after the loop has finished
    for word in string:
        if word.isupper():
            newstring += word.lower()
        else:
            newstring += word.upper()
    return newstring
print(to_alternating_case(test))

def myfunc(string):
    i=0
    newstring=''
    for x in string:
        if i%2==0:
            newstring=newstring+x.lower()
        else:
            newstring=newstring+x.upper()
        i+=1
    return newstring

contents='abcdefgasdfadfasdf'
temp=''
ss=list(contents)
for item in range(len(ss)):
    if item%2==0:
        temp+=ss[item].lower()
    else:
        temp+=ss[item].upper()
print(temp)
You can also put this code inside a function and use the return keyword in place of print.

string=input("enter string:")
temp=''
ss=list(string)
for item in range(len(ss)):
    if item%2==0:
        temp+=ss[item].lower()
    else:
        temp+=ss[item].upper()
print(temp)

Here is a short form of the hard way:
alt_case = lambda s : ''.join([c.upper() if c.islower() else c.lower() for c in s])
print(alt_case('AltERNating'))
I was actually looking for a way to turn an all-uppercase or all-lowercase string into alternating case, so here is a solution to that problem:
alt_case = lambda s : ''.join([c.upper() if i%2 == 0 else c.lower() for i, c in enumerate(s)])
print(alt_case('alternating'))

You could use swapcase() method
string_name.swapcase()
or you could be a little bit fancy and use list comprehension
string = "thE big BROWN FoX JuMPeD oVEr thE LAZY Dog"
y = "".join([val.upper() if val.islower() else val.lower() for val in string])
print(y)
>>> 'THe BIG brown fOx jUmpEd OveR THe lazy dOG'

This doesn't use any 'pythonic' methods and gives the answer in a basic logical format using ASCII codes:
sentence = 'aWESOME is cODING'
words = sentence.split(' ')
sentence = ' '.join(reversed(words))
ans =''
for s in sentence:
    if ord(s) >= 97 and ord(s) <= 122:
        ans = ans + chr(ord(s) - 32)
    elif ord(s) >= 65 and ord(s) <= 90 :
        ans = ans + chr(ord(s) + 32)
    else :
        ans += ' '
print(ans)
So, the output will be : Coding IS Awesome
