More accurate alternative to findline?

I have a list (words.txt) for which I need a method to search that is more exact than findline.
My current function (shown at the bottom) uses findline to search through the list. The problem is this: instead of returning an exact match, findline returns the first string that contains the whole word, regardless of whether there are better matches following it.
Example:
I enter 'BEES' and findline returns 'BAUBEES' because it is the first string to contain the sub-string ('BEES'). Of course, this completely ruins the function.
What I need is a function or (preferably) a built-in method that looks alphabetically for an exact match. So if 'BEES' is in the list (which I assure you it is), I want it to return 'BEES'. Or alternately, if 'BAUBEES' and 'BEESWAX' were the only substring matches in the list, the ideal function would return 'BEESWAX' if only because the second letter in 'BEES' is 'E' NOT 'A' (as in 'BAUBEES').
def iswholeword(word):
    openfile = open('/media/Gianson/Python Programs/words.txt','r')
    linz = openfile.readlines()[:]
    openfile.close()
    hit = findline(word,linz)[:]
    print 'hit', hit
    if len(hit)-1 == len(word):
        return True
    else:
        return False

import re

r = re.compile(r"\b%s" % re.escape(word))
for line in openfile:
    hit = r.search(line)
    if hit:
        pass  # whatever
Explanation: this builds a regular expression from \b (word boundary) and the word under consideration, then searches for it in each line of the file. It finds the first word in the line that starts with word and returns a regexp match object. Appending a second \b after the word restricts the match to the exact word.
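If the word list is sorted (or can be sorted once after loading), the standard bisect module gives the alphabetical lookup the question describes. The following is a minimal sketch, not a replacement for the original findline: the helper name find_word and the sample list are assumptions made for illustration.

```python
import bisect

def find_word(word, sorted_words):
    """Return `word` if it is in the list, otherwise the first entry that
    starts with `word`, otherwise None. `sorted_words` must be sorted."""
    i = bisect.bisect_left(sorted_words, word)
    if i < len(sorted_words) and sorted_words[i].startswith(word):
        return sorted_words[i]
    return None

# Hypothetical sample data standing in for words.txt
words = sorted(["BAUBEES", "BEES", "BEESWAX", "BEET"])
print(find_word("BEES", words))                    # BEES
print(find_word("BEES", ["BAUBEES", "BEESWAX"]))   # BEESWAX
```

When the list comes from readlines(), each entry still carries its trailing newline, so strip the lines before sorting and searching.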

Related

I have a problem with the task of reversing words and removing parentheses

Task
Write a program that will decode the secret message by reversing text
between square brackets. The message may contain nested brackets (that
is, brackets within brackets, such as One[owT[Three[ruoF]]]). In
this case, innermost brackets take precedence, similar to parentheses
in mathematical expressions, e.g. you could decode the aforementioned
example like this:
One[owT[Three[ruoF]]]
One[owT[ThreeFour]]
One[owTruoFeerhT]
OneThreeFourTwo
In order to make your own task slightly easier and less tricky, you
have already replaced all whitespaces in the original text with
underscores (“_”) while copying it from the paper version.
Input description
The first and only line of the standard input
consists of a non-empty string of up to 2 · 10^6 characters which may
be letters, digits, basic punctuation (“,.?!’-;:”), underscores (“_”)
and square brackets (“[]”). You can safely assume that all square
brackets are paired correctly, i.e. every opening bracket has exactly
one closing bracket matching it and vice versa.
Output description
The standard output should contain one line – the
decoded secret message without any square brackets.
Example
For sample input:
A[W_[y,[]]oh]o[dlr][!]
the correct output is:
Ahoy,_World!
Explanation
This example contains empty brackets. Of course, an empty string, when
reversed, remains empty, so we can simply ignore them. Then, as
previously, we can decode this example in stages, first reversing the
innermost brackets to obtain A[W_,yoh]o[dlr][!]. Afterwards, there
are no longer any nested brackets, so the remainder of the task is
trivial.
Below is my program that doesn't quite work
word = input("print something: ")
word_reverse = word[::-1]
while("[" in word and "]" in word):
    open_brackets_index = word.index("[")
    close_brackets_index = word_reverse.index("]")*(-1)-1
    # print(word)
    # print(open_brackets_index)
    # print(close_brackets_index)
    reverse_word_into_quotes = word[open_brackets_index+1:close_brackets_index:][::-1]
    word = word[:close_brackets_index]
    word = word[:open_brackets_index]
    word = word+reverse_word_into_quotes
    word = word.replace("[","]").replace("]","[")
    print(word)
print(word)
Unfortunately my code only works with one pair of parentheses and I don't know how to fix it.
Thank you in advance for your help

Assuming the re module can be used, this code does the job:
import re

text = 'A[W_[y,[]]oh]o[dlr][!]'
# This scary regular expression does all the work:
# It says find a sequence that starts with [ and ends with ] and
# contains anything BUT [ and ]
# (raw string, so the backslashes reach the regex engine unchanged)
pattern = re.compile(r'\[([^\[\]]*)\]')
while True:
    m = re.search(pattern, text)
    if m:
        # Here a single pattern like [String], if any, is replaced with gnirtS
        text = re.sub(pattern, m[1][::-1], text, count=1)
    else:
        break
print(text)
Which prints this line:
Ahoy,_World!

I realize that my previous answer has been accepted but, for completeness, I'm submitting a second solution that does NOT use the re module:
text = 'A[W_[y,[]]oh]o[dlr][!]'

def find_pattern(text):
    # Find [...] and return the locations of [ (start) ] (end)
    # and the in-between str (content)
    content = ''
    for i, c in enumerate(text):
        if c == '[':
            content = ''
            start = i
        elif c == ']':
            end = i
            return start, end, content
        else:
            content += c
    return None, None, None

while True:
    start, end, content = find_pattern(text)
    if start is None:
        break
    # Replace the content between [] with its reverse
    text = "".join((text[:start], content[::-1], text[end+1:]))
print(text)
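Another way to handle the nesting is to scan the message once and keep a stack of partial results, one per open bracket. This is a minimal sketch under the task's guarantee that brackets are paired correctly; the function name decode is an assumption made for illustration.

```python
def decode(message):
    """Reverse the text inside every pair of square brackets, innermost first."""
    stack = [[]]  # stack[0] collects the final output
    for ch in message:
        if ch == '[':
            stack.append([])          # start collecting a new bracketed part
        elif ch == ']':
            inner = stack.pop()
            inner.reverse()           # the closed part is reversed as a whole
            stack[-1].extend(inner)   # and handed to the enclosing level
        else:
            stack[-1].append(ch)
    return ''.join(stack[0])

print(decode('A[W_[y,[]]oh]o[dlr][!]'))  # Ahoy,_World!
print(decode('One[owT[Three[ruoF]]]'))   # OneThreeFourTwo
```

Each character is read only once, although text inside deeply nested brackets is still reversed once per enclosing level.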

My code is incorrectly removing a string from a larger string

"""
This code takes two strings and returns a copy of the first string with
all instances of the second string removed
"""
# This function removes the letter from the word in the event that the
# word has the letter in it
def remove_all_from_string(word, letter):
    while letter in word:
        find_word = word.find(letter)
        word_length = len(word)
        if find_word == -1:
            continue
        else:
            word = word[:find_word] + word[find_word + word_length:]
    return word

# This call of the function states the word and what letter will be
# removed from the word
print(remove_all_from_string("bananas", "an"))
This code is meant to remove a defined string from a larger defined string. In this case the larger string is "bananas" and the smaller string which is removed is "an".
In this case the smaller string is removed multiple times. I believe I am very close to the solution of getting the correct output, but I need the code to output "bas". Instead, it outputs "ba".
The code is supposed to remove all instances of "an" and print whatever is left, however it does not do this. Any help is appreciated.

Your word_length should be len(letter), and since the while condition already guarantees that the substring is present, you don't need to test the value of find_word.
def remove_all_from_string(word, replacement):
    word_length = len(replacement)
    while replacement in word:
        find_word = word.find(replacement)
        word = word[:find_word] + word[find_word + word_length:]
    return word
Note that str.replace exists
def remove_all_from_string(word, replacement):
    return word.replace(replacement, "")

You can simply use the .replace() method of Python strings.
def remove_all_from_string(word, letter):
    word = word.replace(letter, "")
    return word

print(remove_all_from_string("bananas", "an"))
Output: bas

The Python language has built-in utilities to do that in a single expression.
The fact that you need to do it by hand indicates you are doing some exercise to better understand coding, and that is important. (Hint: to do it in a single step, just use the string replace method.)
So, first thing: avoid using built-in tools that perform more than basic tasks. In your tentative code you are using the string find method. It is powerful, but combining it to find and remove all occurrences of a substring is harder than doing so step by step.
So, what you need is variables to track the state of your search and your result. Variables are "free": do not hesitate to create as many as you need, and update them inside the proper if blocks to keep track of your solution.
In this case, you can start with a "position = 0" and increase it until you are at the end of the parent string. You check the character at that position: if it matches the starting character of your substring, you update other variables indicating you are "inside a match", and start a new "position_at_substring" index to track the "matchee". If at any point the character in the main string does not correspond to the character in the substring, it is not an occurrence and you bail out (and copy the skipped characters to your result, so you also have to accumulate all skipped characters in a "match_check" substring).
Build your code with the simplest 'while', 'if' and variable updates, and stick it all inside a function, so that whenever it works you can reuse it at will with no effort, and you will have learned a lot.
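The step-by-step approach described above could look roughly like the sketch below. It is one possible reading of that description rather than the only one: the name remove_all and the variables position and offset are illustrative assumptions, and a simple offset counter stands in for the separate "match_check" buffer.

```python
def remove_all(text, sub):
    """Return `text` with every non-overlapping occurrence of `sub` removed,
    scanning left to right without str.find or str.replace."""
    if not sub:
        return text  # nothing to remove; also avoids an endless loop
    result = ""
    position = 0
    while position < len(text):
        # Compare the substring with the text character by character
        offset = 0
        while (offset < len(sub)
               and position + offset < len(text)
               and text[position + offset] == sub[offset]):
            offset += 1
        if offset == len(sub):
            position += len(sub)      # full match: skip it
        else:
            result += text[position]  # no match here: keep this character
            position += 1
    return result

print(remove_all("bananas", "an"))  # bas
```

Because this makes a single left-to-right pass, it behaves like str.replace. A loop that rescans the string after every removal can delete more, for example when removing "ab" from "aabb".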

Substring replacements based on replace and no-replace rules

I have a string and rules/mappings for replacement and no-replacements.
E.g.
"This is an example sentence that needs to be processed into a new sentence."
"This is a second example sentence that shows how 'sentence' in 'sentencepiece' should not be replaced."
Replacement rules:
replace_dictionary = {'sentence': 'processed_sentence'}
no_replace_set = {'example sentence'}
Result:
"This is an example sentence that needs to be processed into a new processed_sentence."
"This is a second example sentence that shows how 'processed_sentence' in 'sentencepiece' should not be replaced."
Additional criteria:
Only replace if case is matched, i.e. case matters.
Whole-word replacement only; punctuation should be ignored, but kept after replacement.
What would be the cleanest way to solve this problem in Python 3.x?

Based on the answer of demongolem.
UPDATE
I am sorry, I missed the fact that only whole words should be replaced. I updated my code and even generalized it for use in a function.
import re

def replace_whole(sentence, replace_token, replace_with, dont_replace):
    # Raw f-string, so the backslashes reach the regex engine unchanged
    rx = rf"[\"'.,:; ]({replace_token})[\"'.,:; ]"
    out_sentence = sentence  # start from the input so untouched text is kept
    found = []
    indices = []
    for m in re.finditer(rx, sentence):
        indices.append(m.start(0))
        found.append(m.group())
    context_size = len(dont_replace)
    for i in range(len(indices)):
        # max() stops a negative start index from wrapping around to the end
        context = sentence[max(0, indices[i]-context_size):indices[i]+context_size]
        if dont_replace in context:
            continue
        else:
            # First replace the word only in the substring found
            to_replace = found[i].replace(replace_token, replace_with)
            # Then replace the word in the context found, so any special token like "" or . gets taken over and the context does not change
            replace_val = context.replace(found[i], to_replace)
            # finally replace the context found with the replacing context
            out_sentence = out_sentence.replace(context, replace_val)
    return out_sentence
Use regular expressions to find all occurrences and values of your string (as we need to check whether it is a whole word or embedded in another word) by using finditer(). You might need to adjust rx to match your definition of "whole word". Then get the context around each match, sized by your no_replace rule, and check whether the context contains your no_replace string.
If not, you may replace it: use replace() on the word only, then replace the occurrence of the word in the context, then replace the context in the whole text. That way the replacement is nearly unique and no weird behaviour should happen.
Using your examples, this leads to:
replace_whole(sen2, "sentence", "processed_sentence", "example sentence")
>>>"This is a second example sentence that shows how 'processed_sentence' in 'sentencepiece' should not be replaced."
and
replace_whole(sen1, "sentence", "processed_sentence", "example sentence")
>>>'This is an example sentence that needs to be processed into a new processed_sentence.'

After some research, this is what I believe to be the best and cleanest solution to my problem. The solution works by calling match_fun whenever a match has been found, and match_fun performs the replacement only if no "no-replace phrase" overlaps with the current match. Let me know if you need more clarification or if you believe something can be improved.
import re

replace_dict = ... # The code below assumes you already have this
no_replace_dict = ...# The code below assumes you already have this
text = ... # The text on input.

def match_fun(match: re.Match):
    str_match: str = match.group()
    if str_match not in no_replace_dict:
        return replace_dict[str_match]
    for no_replace in no_replace_dict[str_match]:
        no_replace_matches_iter = re.finditer(r'\b' + no_replace + r'\b', text)
        for no_replace_match in no_replace_matches_iter:
            if no_replace_match.start() >= match.start() and no_replace_match.start() < match.end():
                return str_match
            if no_replace_match.end() > match.start() and no_replace_match.end() <= match.end():
                return str_match
    return replace_dict[str_match]

for replace in replace_dict:
    pattern = re.compile(r'\b' + replace + r'\b')
    text = pattern.sub(match_fun, text)
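A self-contained variant of the same idea is sketched below. It takes the question's replace_dictionary and no_replace_set directly, records the spans of the no-replace phrases once, and skips any match that overlaps one of them. The function name replace_with_exceptions is an assumption for illustration, and the sketch assumes the keys and phrases begin and end with word characters so that \b behaves as expected.

```python
import re

def replace_with_exceptions(text, replace_dict, no_replace_phrases):
    """Replace whole-word keys of `replace_dict`, except where a match
    overlaps an occurrence of one of `no_replace_phrases`."""
    if not replace_dict:
        return text

    # Spans (start, end) of every protected phrase in the original text
    protected = []
    for phrase in no_replace_phrases:
        for m in re.finditer(r'\b' + re.escape(phrase) + r'\b', text):
            protected.append((m.start(), m.end()))

    def substitute(match):
        for start, end in protected:
            # Two half-open intervals overlap when each starts before the other ends
            if match.start() < end and start < match.end():
                return match.group()  # leave the protected word untouched
        return replace_dict[match.group()]

    # Longest keys first, so a longer key wins over a shorter one it starts with
    keys = sorted(replace_dict, key=len, reverse=True)
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
    return pattern.sub(substitute, text)

replace_dictionary = {'sentence': 'processed_sentence'}
no_replace_set = {'example sentence'}
sen1 = "This is an example sentence that needs to be processed into a new sentence."
print(replace_with_exceptions(sen1, replace_dictionary, no_replace_set))
```

The whole text is processed in a single sub() call, so the positions in protected stay valid while matches are being checked.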

How to check subsequent elements of string in python using iterators?

I have a sentence that I want to parse to check for some conditions:
a) If there is a period and it is followed by a whitespace followed by a lowercase letter
b) If there is a period internal to a sequence of letters with no adjacent whitespace (i.e. www.abc.com)
c) If there is a period followed by a whitespace followed by an uppercase letter and preceded by a short list of titles (i.e. Mr., Dr. Mrs.)
Currently I am iterating through the string (line) and using the next() function to see whether the next character is a space or lowercase, etc. And then I just loop through the line. But how would I check to see what the next, next character would be? And how would I find the previous ones?
line = "This is line.1 www.abc.com. Mr."
t = iter(line)
b = next(t)
for i in line[:len(line)-1]:
    a = next(t)
    if i == "." and (a.isdigit()): #for example, this checks to see if the value after the period is a number
        print("True")
Any help would be appreciated. Thank you.

Regular expressions are what you want.
Since you're going to check for a pattern in a string, you can make use of Python's built-in support for regular expressions through the re library.
Example:
# To check if there is a period internal to a sequence of letters with no adjacent whitespace
import re
text = 'www.google.com'
pattern = r'[A-Za-z]\.[A-Za-z]'
obj = re.compile(pattern)
if obj.search(text):
    print("Pattern matched")
Similarly generate patterns for the conditions you want to check in your string.
# If there is a period and it is followed by a whitespace followed by a lowercase letter
regex = r'.*\. [a-z].*'
You can generate and test your regular expressions with an online regex tester.
Read more about the re library in the Python documentation.
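The three conditions from the question can each be written as a pattern that captures the period itself, so its position can be reported. This is a minimal sketch: the labels, the helper name find_periods, the title list (Mr, Mrs, Dr) and the sample line are all assumptions chosen for illustration.

```python
import re

# Each pattern captures the period in group 1; lookaheads keep the
# following characters available for the next match.
PATTERNS = {
    "lowercase_follows": re.compile(r'(\.)(?=\s[a-z])'),          # condition a
    "internal": re.compile(r'[A-Za-z](\.)(?=[A-Za-z])'),          # condition b
    "title": re.compile(r'\b(?:Mr|Mrs|Dr)(\.)(?=\s[A-Z])'),       # condition c
}

def find_periods(line):
    """Return sorted (index, label) pairs for every period that matches
    one of the three conditions."""
    results = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(line):
            results.append((m.start(1), label))
    return sorted(results)

line = "See www.abc.com. Mr. Smith left. then returned."
for index, label in find_periods(line):
    print(index, label)
```

Periods that satisfy none of the conditions, such as the one at the very end of the line, are simply not reported.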

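You can call next more than once to look further ahead. Each call consumes a character, so carry the lookahead values along from one pass to the next:
line = "This is line.1 www.abc.com. Mr."
t = iter(line)
b = next(t)      # skip the first character; the loop variable covers it
a = next(t, "")  # character after the current one
for i in line:
    c = next(t, "")  # character two positions ahead ("" once the line runs out)
    if i == "." and (a.isdigit()): #for example, this checks to see if the value after the period is a number
        print("True")
    a = c  # slide the lookahead forward
You can get previous ones by saving your iterations to a temporary list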
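Looking both backwards and forwards can also be done by indexing into the string instead of calling next(). The sketch below assumes a hypothetical helper named neighbours that yields each character together with the one before it and the two after it, using an empty string at the edges.

```python
def neighbours(line):
    """Yield (previous, current, following, after_that) for each character.
    Positions outside the string are reported as ''."""
    for idx, current in enumerate(line):
        previous = line[idx - 1] if idx > 0 else ""
        following = line[idx + 1] if idx + 1 < len(line) else ""
        after_that = line[idx + 2] if idx + 2 < len(line) else ""
        yield previous, current, following, after_that

line = "This is line.1 www.abc.com. Mr."
for previous, current, following, after_that in neighbours(line):
    # Same check as in the question: a period followed by a digit
    if current == "." and following.isdigit():
        print("True")
```

An empty string answers False to isdigit(), islower() and isspace(), so checks at the edges of the line need no special casing.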

deleting letters from strings without string methods or imports?

This is a homework question. I need to define a function that takes a word and letter and deletes all occurrences of that letter in the word. I can't use stuff like regex or the string library. I've tried...
def delete(word,letter):
    word = []
    char = ""
    if char != letter:
        word+=char
    return word
and
def delete(word,letter):
    word = []
    char = ""
    if char != letter: #I also tried "if char not letter" for both
        word = word.append(char)
    return word
Both don't give any output. What am I doing wrong?

Well, look at your functions closely:
def delete(word,letter):
    word = []
    char = ""
    if char != letter:
        word+=char # or `word = word.append(char)` in 2nd version
    return word
So, the function gets a word and a letter passed in. The first thing you do is throw away the word, because you are overwriting the local variable with a different value (a new empty list). Next, you are initializing an empty string char and compare its content (it’s empty) with the passed letter. If they are not equal, i.e. if letter is not an empty string, the empty string in char is added to the (empty list) word. And then word is returned.
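Also note that the + operation on lists is only implemented to combine two lists, so word + char would fail for a string. (word += char does run, because += extends the list with each character of the string, but that is easy to get wrong.) Calling append is the usual way to add one item to a list, but append returns None, so word = word.append(char) throws the list away. Given that you want a string as a result, it makes more sense to just store the result as one to begin with.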
Instead of adding an empty string to an empty string/list when something completely unrelated to the passed word happens, what you rather want to do is keep the original word intact and somehow look at each character. You basically want to loop through the word and keep all characters that are not the passed letter; something like this:
def delete(word, letter):
    newWord = '' # let's not overwrite the passed word
    for char in word:
        # `char` is now each character of the original word.
        # Here you now need to decide if you want to keep the
        # character for `newWord` or not.
        pass  # placeholder so the loop body is valid until you fill it in
    return newWord
The for var in something will basically take the sequence something and execute the loop body for each value of that sequence, identified using the variable var. Strings are sequences of characters, so the loop variable will contain a single character and the loop body is executed for each character within the string.
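As a small illustration of that loop on a different task (deliberately not the homework itself), the sketch below counts how often a letter occurs in a word. The function name count_letter is an assumption for illustration.

```python
def count_letter(word, letter):
    """Count the occurrences of `letter` in `word` with a plain for loop."""
    count = 0
    for char in word:       # `char` takes each character of `word` in turn
        if char == letter:
            count += 1
    return count

print(count_letter("banana", "a"))  # 3
```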

You're not doing anything with word passed to your function. Ultimately, you need to iterate over the word passed into your function (for character in word: doSomething_with_character) and build your output from that.

def delete(word, ch):
    # In Python 3, filter returns an iterator, so join the kept characters back into a string
    return ''.join(filter(lambda c: c != ch, word))
Basically, just a linear pass over the string, dropping out letters that match ch.
filter takes a function and an iterable. A string is an iterable and iterating over it iterates over the characters it contains. filter drops the elements of the iterable for which the function returns False.
In this case, we filter out all characters that are equal to the passed ch argument.

I like the functional style of #TC1 and #user2041448, which is worth understanding. Here's another implementation:
def delete( letter, string ):
    s2 = []
    for c in string:
        if c!=letter:
            s2.append( c )
    return ''.join(s2)

Your first function uses + operator with a list which probably isn't the most appropriate choice. The + operator should probably be reserved for strings (and use .append() function with lists).
If the intent is to return a string, assign "" instead of [], and use + operators.
If the intent is to return a list of characters assign [], and use .append() function.
Change the name of the variable you are using to construct the returned value.
Assigning anything to word gets rid of the content that was given to the function as an argument.
so make it result=[] OR result="" etc..
ALSO:
The way you seem to be attempting to solve this requires you to loop over the characters in the original string, but the code you posted does not loop at all.
You could use a for loop with this type of semantic:
for characterVar in stringVar:
    controlled-code-here
code-after-loop
You can/should change the names of course, but I named them in a way that should help you understand. In your case stringVar would be replaced with word, and you would append or add characterVar to result if it isn't the deleted character. Any code that you wish to be contained in the loop must be indented. The first unindented line following the control line indicates to Python that the code comes AFTER the loop.

This is what I came up with:
def delete(word, letter):
    new_word = ""
    for i in word:
        if i != letter:
            new_word += i
    return new_word
