Replacing each match with a different word - python

I have a regular expression like this:
findthe = re.compile(r" the ")
replacement = ["firstthe", "secondthe"]
sentence = "This is the first sentence in the whole universe!"
What I am trying to do is to replace each occurrence with an associated replacement word from a list so that the end sentence would look like this:
>>> print sentence
This is firstthe first sentence in secondthe whole universe
I tried using re.sub inside a for loop enumerating over replacement, but re.sub replaces all occurrences at once. Can someone tell me how to do this efficiently?

If using a regex is not required, then you can try the following code:
replacement = ["firstthe", "secondthe"]
sentence = "This is the first sentence in the whole universe!"
words = sentence.split()
counter = 0
for i, word in enumerate(words):
    if word == 'the':
        words[i] = replacement[counter]
        counter += 1
sentence = ' '.join(words)
Or something like this will work too:
import re
findthe = re.compile(r"\b(the)\b")
print(re.sub(findthe, replacement[1], re.sub(findthe, replacement[0], sentence, 1), 1))
And lastly:
re.sub(findthe, lambda matchObj: replacement.pop(0), sentence)

Artsiom's last answer destroys the replacement list, because it pops items from it. Here's a way to do it without emptying replacement:
re.sub(findthe, lambda m, r=iter(replacement): next(r), sentence)

You can pass a callback function as the repl argument of re.sub; see how at:
http://docs.python.org/library/re.html#re.sub
Then keep a counter and choose the replacement depending on the counter value.
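The counter idea above could be sketched roughly as follows. This is only a minimal sketch: make_replacer is a hypothetical helper name, the pattern uses \b word boundaries instead of the spaces in the question, and leaving extra matches unchanged once the list runs out is an assumption.

```python
import re

def make_replacer(words):
    """Return a callback that hands out words[0], words[1], ... on successive matches."""
    index = 0

    def replacer(match):
        nonlocal index
        # Assumption: once the list is exhausted, keep the matched text as it is
        if index >= len(words):
            return match.group(0)
        word = words[index]
        index += 1
        return word

    return replacer

sentence = "This is the first sentence in the whole universe!"
result = re.sub(r"\bthe\b", make_replacer(["firstthe", "secondthe"]), sentence)
print(result)  # This is firstthe first sentence in secondthe whole universe!
```

Because the counter lives inside the closure, the original list is not modified and a fresh replacer can be created for each sentence.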

Related

How to remove a group of characters from a string in python including shifting

For instance, I have a string that needs a certain keyword removed. Let's say my string is whatthemomooofun and I want to delete the word moo from it. I tried removing it, but that only removes "moo" one time, so now my string is whatthemoofun and I can't seem to remove the rest. Is there any way I can do that?
You can use the built-in replace method:
def str_manipulate(word, key):
    while key in word:
        word = word.replace(key, '')
    return word
Have you tried using a while loop? You could loop through it until it doesn't find your keyword anymore then break out.
Edit
Since the answer could be improved by an example, here is the general approach:
Example
s = 'whatthemomooofun'
while 'moo' in s:
    s = s.replace('moo', '')
print(s)
Output
whatthefun
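To see why a single replace call is not enough, here is a small sketch comparing one pass with the loop. The name remove_all and the guard for an empty key are assumptions added for illustration, not part of the answers above.

```python
def remove_all(text, key):
    """Repeat str.replace until no occurrence of key is left in text."""
    if not key:
        # Assumption: an empty key removes nothing (it would otherwise loop forever)
        return text
    while key in text:
        text = text.replace(key, "")
    return text

s = "whatthemomooofun"
once = s.replace("moo", "")    # removing one "moo" joins "mo" + "o" into a new "moo"
print(once)                    # whatthemoofun
print(remove_all(s, "moo"))    # whatthefun
```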
original_string = "!(Hell#o)"
characters_to_remove = "!()#"
new_string = original_string
for character in characters_to_remove:
    new_string = new_string.replace(character, "")
print(new_string)
OUTPUT
Hello

How to avoid .replace replacing a word that was already replaced

Given a string, I have to reverse every word, but keeping them in their places.
I tried:
def backward_string_by_word(text):
    for word in text.split():
        text = text.replace(word, word[::-1])
    return text
But if I have the string Ciao oaiC, when it tries to reverse the second word, that word is identical to the first word after it has already been reversed, so it gets replaced again. How can I avoid this?
You can use join in one line plus generator expression:
text = "test abc 123"
text_reversed_words = " ".join(word[::-1] for word in text.split())
s.replace(x, y) is not the correct method to use here:
It does two things:
find x in s
replace it with y
But you do not really need to find anything here, since you already have the word you want to replace. The problem is that replace starts searching for x from the beginning of the string each time, not from the position you are currently at, so it finds the word you have already replaced, not the one you want to replace next.
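The failure described above can be traced step by step on the string from the question. This is just an illustrative trace; the steps list is a name introduced here to record the intermediate values.

```python
text = "Ciao oaiC"
steps = []

# text.split() is evaluated once, on the original string: ['Ciao', 'oaiC']
for word in text.split():
    # replace() searches the whole string, so every occurrence is swapped
    text = text.replace(word, word[::-1])
    steps.append(text)

print(steps)  # ['oaiC oaiC', 'Ciao Ciao']
```

After the first pass both words are oaiC, so the second pass turns both of them into Ciao.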
The simplest solution is to collect the reversed words in a list, and then build a new string out of this list by concatenating all reversed words. You can concatenate a list of strings and separate them with spaces by using ' '.join().
def backward_string_by_word(text):
    reversed_words = []
    for word in text.split():
        reversed_words.append(word[::-1])
    return ' '.join(reversed_words)
If you have understood this, you can also write it more concisely by skipping the intermediate list with a generator expression:
def backward_string_by_word(text):
    return ' '.join(word[::-1] for word in text.split())
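One thing to be aware of: split() followed by ' '.join() collapses runs of whitespace into single spaces. If the original spacing matters (an assumption, the question does not say so), a variant using re.sub with a callback might look like the sketch below. The function name reverse_words_keep_spacing is hypothetical.

```python
import re

def reverse_words_keep_spacing(text):
    # \S+ matches each run of non-whitespace characters, i.e. one word;
    # the callback reverses only the matched text, so whitespace is left untouched
    return re.sub(r"\S+", lambda m: m.group(0)[::-1], text)

print(reverse_words_keep_spacing("Ciao  oaiC"))  # oaiC  Ciao
```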
Splitting a string converts it to a list. You can just reassign each value of that list to the reverse of that item. See below:
text = "The cat tac in the hat"
def backwards(text):
    split_word = text.split()
    for i in range(len(split_word)):
        split_word[i] = split_word[i][::-1]
    return ' '.join(split_word)
print(backwards(text))

substring with a small change

I'm trying to solve a problem where I'm given a set of strings and have to count how many times a certain word, like 'code', appears within a string. The program should also count any variant where the 'd' changes, like 'coze', but something like 'coz' doesn't count. This is what I wrote:
def count(word):
    count=0
    for i in range(len(word)):
        lo=word[i:i+4]
        if lo=='co': # this is what gives me trouble
            count+=1
    return count
Test if the first two characters match co and the 4th character matches e.
def count(word):
    count = 0
    for i in range(len(word) - 3):
        # word[i:i+2] is the two-character slice starting at i
        if word[i:i+2] == 'co' and word[i+3] == 'e':
            count += 1
    return count
The loop only goes up to len(word)-3 so that word[i+3] won't go out of range.
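The effect of that loop bound can be checked with a small sketch. The name count_code_like and the stop_early switch are assumptions added only to show both behaviours side by side.

```python
def count_code_like(text, stop_early=True):
    """Count positions where 'co', any one character, then 'e' appear in a row."""
    # stop_early=True uses the len(text) - 3 bound; False scans every index
    limit = len(text) - 3 if stop_early else len(text)
    total = 0
    for i in range(limit):
        if text[i:i + 2] == "co" and text[i + 3] == "e":
            total += 1
    return total

print(count_code_like("codexxcoze"))  # 2
print(count_code_like("xxco"))        # 0

try:
    count_code_like("xxco", stop_early=False)
except IndexError:
    # 'co' matches at the very end, so text[i + 3] points past the string
    print("IndexError without the len(text) - 3 bound")
```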
You could use regex for this, through the re module.
import re
string = 'this is a string containing the words code, coze, and coz'
re.findall(r'co.e', string)
['code', 'coze']
From there you could write a function such as:
def count(string, word):
    return len(re.findall(word, string))
Regex is the answer to your question, as mentioned above, but you need a more refined pattern. Since you are looking for occurrences of a whole word, you need to match word boundaries. So your pattern should be something like this:
pattern = r'\bco.e\b'
This way your search will not match inside words like testcodetest or cozetest; it will only match standalone words such as code, coze, or coke, with no leading or trailing characters.
If you are going to run the search many times, it's better to use a compiled pattern, so the pattern is compiled once and then reused.
In [1]: import re
In [2]: string = 'this is a string containing the codeorg testcozetest words code, coze, and coz'
In [3]: pattern = re.compile(r'\bco.e\b')
In [4]: pattern.findall(string)
Out[4]: ['code', 'coze']
Hope that helps.

Trying to make sure certain symbols aren't in a word

I currently have the following to filter words with square and normal brackets and can't help but think there must be a tidier way to do this..
words = [word for word in random.choice(headlines).split(" ")[1:-1] if "[" not in word and "]" not in word and "(" not in word and ")" not in word]
I tried creating a list or tuple of symbols and doing
if symbol not in word
But it dies because I'm comparing a list with a string. I appreciate I could explode this out and do a compare like:
for word in random.choice(headlines).split(" ")[1:-1]:
    popIn = 1
    for symbol in symbols:
        if symbol in word:
            popIn = 0
    if popIn == 1:
        words.append(word)
But it seems like overkill in my head. I appreciate I'm a novice programmer so anything I can do to tidy either method up would be very helpful.
Use set intersection.
brackets = set("[]()")
words = [word for word in random.choice(headlines).split(" ")[1:-1] if not brackets.intersection(word)]
The intersection is empty if word does not contain any of the characters in brackets.
You might also consider using itertools instead of a list comprehension (filterfalse is the Python 3 name; in Python 2 it is ifilterfalse).
words = list(itertools.filterfalse(brackets.intersection,
                                   random.choice(headlines).split(" ")[1:-1]))
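The set-intersection test above can be tried on a fixed string. In this sketch the headline is made-up sample data and has_bracket is a hypothetical helper name; a fixed headline replaces random.choice(headlines) so the result is reproducible.

```python
brackets = set("[]()")

def has_bracket(word):
    # set.intersection accepts any iterable, so the string is treated as its characters;
    # an empty result set is falsy
    return bool(brackets.intersection(word))

headline = "Breaking [video] markets rally (again) today now"  # sample data, not from the question
words = [w for w in headline.split(" ")[1:-1] if not has_bracket(w)]
print(words)  # ['markets', 'rally', 'today']
```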
I'm not sure what you want to filter, but I advise you to use Python's regular expression module.
import re
r = re.compile(r"\w*[\[\]\(\)]+\w*")
test = ['foo', '[bar]', 'f(o)o']
result = [word for word in test if not r.match(word)]
print(result)
The output is ['foo'].

How might I create an acronym by splitting a string at the spaces, taking the character indexed at 0, joining it together, and capitalizing it?

My code
beginning = input("What would you like to acronymize? : ")
second = beginning.upper()
third = second.split()
fourth = "".join(third[0])
print(fourth)
I can't seem to figure out what I'm missing. The code is supposed to take the phrase the user inputs, put it all in caps, split it into words, join the first character of each word together, and print it. I feel like there should be a loop somewhere, but I'm not entirely sure if that's right or where to put it.
Say the input is "Federal Bureau of Agencies".
Typing third[0] gives you the first element of the split, which is "FEDERAL" (the input was already uppercased). You want the first character of each element in the split. Use a generator expression or list comprehension to apply [0] to each item in the list:
val = input("What would you like to acronymize? ")
print("".join(word[0] for word in val.upper().split()))
In Python, it would not be idiomatic to use an explicit loop here. Generator expressions are shorter and easier to read, and do not require the use of an explicit accumulator variable.
When you run the code third[0], Python will index the variable third and give you the first part of it.
The results of .split() are a list of strings. Thus, third[0] is a single string, the first word (all capitalized).
You need some sort of loop to get the first letter of each word, or else you could do something with regular expressions. I'd suggest the loop.
Try this:
fourth = "".join(word[0] for word in third)
There is a little for loop inside the call to .join(). Python calls this a "generator expression". The variable word will be set to each word from third, in turn, and then word[0] gets you the char you want.
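To make the relationship between the two styles concrete, here is a sketch that writes the same logic once as an explicit loop and once as a generator expression. Both function names are hypothetical.

```python
def acronym_loop(phrase):
    # Explicit loop with an accumulator list
    letters = []
    for word in phrase.upper().split():
        letters.append(word[0])
    return "".join(letters)

def acronym_genexp(phrase):
    # Same logic, with the loop folded into the call to join()
    return "".join(word[0] for word in phrase.upper().split())

phrase = "Federal Bureau of Agencies"
print(acronym_loop(phrase))    # FBOA
print(acronym_genexp(phrase))  # FBOA
```

Since split() with no arguments never yields empty strings, word[0] is safe even when the phrase has leading, trailing, or repeated spaces.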
works for me this way:
>>> a = "What would you like to acronymize?"
>>> a.split()
['What', 'would', 'you', 'like', 'to', 'acronymize?']
>>> ''.join([i[0] for i in a.split()]).upper()
'WWYLTA'
>>>
One intuitive approach would be:
get the sentence using input (or raw_input in python 2)
split the sentence into a list of words
get the first letter of each word
join the letters with a space string
Here is the code:
sentence = input('What would you like to acronymize?: ')
words = sentence.split()  # split the sentence into words
just_first_letters = []  # a list containing just the first letter of each word
# traverse the list of words, adding the first letter of
# each word to just_first_letters
for word in words:
    just_first_letters.append(word[0])
result = " ".join(just_first_letters)  # join the list of first letters
print(result)
# acronym2.py
# illustrating how to design an acronym (ported to Python 3)
import string
def main():
    sent = input("Enter the sentence: ")  # take input sentence with spaces
    # capwords() capitalizes each word; split() turns the sentence into a list of words
    for word in string.capwords(sent).split():
        # print the first letter of each word on the same line to build the acronym
        print(word[0], end="")
    print()
main()
name = input("Enter a name with uppercase and lowercase letters: ")
print(f'The original string = {name}')
def uppercase(name):
    # keep only the uppercase characters of the name
    res = [char for char in name if char.isupper()]
    print("The uppercase characters in string are : " + "".join(res))
uppercase(name)
