I am trying to replace any i's in a string with capital I's. I have the following code:
str.replace('i ','I ')
However, it does not replace anything in the string. I am looking to include a space after the I to differentiate between any I's in words and out of words.
Thanks if you can provide help!
The exact code is:
new = old.replace('i ','I ')
new = old.replace('-i-','-I-')
new = old.replace('i ','I ')
new = old.replace('-i-','-I-')
You throw away the first new when you assign the result of the second operation over it.
Either do
new = old.replace('i ','I ')
new = new.replace('-i-','-I-')
or
new = old.replace('i ','I ').replace('-i-','-I-')
or use regex.
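The underlying reason the original call seems to do nothing is that Python strings are immutable: replace() returns a new string and leaves the one it was called on unchanged. A minimal sketch of that behaviour (the sample text is made up for illustration, not taken from the question):

```python
# Illustrative sample text, not from the original question.
old = "i said Sam-i-am and i agree"

# replace() returns a new string; the result is discarded here, so nothing changes.
old.replace("i ", "I ")

# Chain the calls (or reassign each result) so that both replacements apply.
new = old.replace("i ", "I ").replace("-i-", "-I-")
print(new)
```

After this runs, old still holds the original text and only new contains the replacements.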
I think you need something like this.
>>> import re
>>> s = "i am what i am, indeed."
>>> re.sub(r'\bi\b', 'I', s)
'I am what I am, indeed.'
This replaces only a standalone 'i' with 'I'; any 'i' that is part of another word is left untouched.
For your example from comments, you may need something like this:
>>> s = 'i am sam\nsam I am\nThat Sam-i-am! indeed'
>>> re.sub(r'\b(-?)i(-?)\b', r'\1I\2', s)
'I am sam\nsam I am\nThat Sam-I-am! indeed'
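To see why the word-boundary version is safer than replacing 'i ' (with a trailing space), here is a small comparison. The sample sentence is an assumption chosen to show the two failure cases:

```python
import re

# Illustrative sample: "hi" ends in "i ", and the last "i" has no trailing space.
text = "hi there, i think i"

# Plain replace also rewrites the "i " at the end of "hi" and misses the final "i".
naive = text.replace("i ", "I ")

# \b matches only at the start or end of a word, so "hi" and "think" are left alone.
bounded = re.sub(r"\bi\b", "I", text)

print(naive)
print(bounded)
```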
Is there a way to replace a word within a string without using a "string replace function," e.g., string.replace(string,word,replacement)?
[out] = forecast('This snowy weather is so cold.','cold','awesome')
out => 'This snowy weather is so awesome.'
Here the word cold is replaced with awesome.
This is from my MATLAB homework, which I am trying to do in Python. When doing this in MATLAB we were not allowed to use strrep().
In MATLAB, I can use strfind to find the index and work from there. However, I noticed that there is a big difference between lists and strings. Strings are immutable in Python, so I will likely have to import some module or convert the string to a different data type to work with it the way I want without using a string replace function.
just for fun :)
st = 'This snowy weather is so cold .'.split()
given_word = 'awesome'
for i, word in enumerate(st):
    if word == 'cold':
        st[i] = given_word  # overwrite the matched word in place
        break  # stop after the first match
print(' '.join(st))
Here's another answer that might be closer to the solution you described using MATLAB:
st = 'This snow weather is so cold.'
given_word = 'awesome'
word_to_replace = 'cold'
n = len(word_to_replace)
index_of_word_to_replace = st.find(word_to_replace)
print(st[:index_of_word_to_replace] + given_word + st[index_of_word_to_replace + n:])
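One thing the find-and-slice approach has to watch for is that find() returns -1 when the word is absent, and slicing with -1 would then silently produce a wrong result. A minimal sketch that wraps the same idea in a function (the name replace_first is hypothetical):

```python
def replace_first(text, old, new):
    """Replace the first occurrence of `old` in `text` using find() and slicing."""
    index = text.find(old)
    if index == -1:  # find() returns -1 when `old` does not occur
        return text
    return text[:index] + new + text[index + len(old):]

print(replace_first('This snowy weather is so cold.', 'cold', 'awesome'))
```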
You can convert your string into a list object, find the index of the word you want to replace and then replace the word.
sentence = "This snowy weather is so cold"
# Split the sentence into a list of the words
words = sentence.split(" ")
# Get the index of the word you want to replace
word_to_replace_index = words.index("cold")
# Replace the target word with the new word based on the index
words[word_to_replace_index] = "awesome"
# Generate a new sentence
new_sentence = ' '.join(words)
Using Regex and a list comprehension.
import re
def strReplace(sentence, toReplace, toReplaceWith):
    return " ".join([re.sub(toReplace, toReplaceWith, i) if re.search(toReplace, i) else i for i in sentence.split()])
print(strReplace('This snowy weather is so cold.', 'cold', 'awesome'))
Output:
This snowy weather is so awesome.
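Note that the regex version above treats toReplace as a pattern, so characters such as '.' act as wildcards, and it also matches inside longer words. A hedged variant, assuming you want a literal, whole-word match (the function name replace_word is made up for this sketch):

```python
import re

def replace_word(sentence, old, new):
    """Replace whole-word, literal occurrences of `old` with `new`."""
    # re.escape() makes characters like '.' match literally;
    # \b keeps "cold" from matching inside "scolding".
    pattern = r"\b" + re.escape(old) + r"\b"
    # A function as the replacement avoids backslash processing in `new`.
    return re.sub(pattern, lambda match: new, sentence)

print(replace_word('It is cold. Very scolding cold.', 'cold', 'awesome'))
```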
I am writing Python code to extract the name of a road, street, or highway from a sentence. For example, given "There is an accident along Uhuru Highway", I want my code to extract the name of the highway mentioned. I have written the code below.
sentence = "there is an accident along uhuru highway"
listw = sentence.lower().split()
for i in range(len(listw)):
    if listw[i] == "highway":
        print(listw[i-1] + " " + listw[i])
I can achieve this, but my code is not optimized. I am thinking of using regular expressions. Any help, please?
'uhuru highway' can be found as follows
import re
m = re.search(r'\S+ highway', sentence) # non-white-space followed by ' highway'
print(m.group())
# 'uhuru highway'
If the location you want to extract will always have highway after it, you can use:
>>> sentence = "there is an accident along uhuru highway"
>>> a = re.search(r'.* ([\w\s\d\-\_]+) highway', sentence)
>>> print(a.group(1))
uhuru
You can do the following without using regexes:
sentence.split("highway")[0].strip().split(' ')[-1]
First split according to "highway". You'll get:
['there is an accident along uhuru ', '']
And now you can easily extract the last word from the first part.
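Since the question mentions roads and streets as well as highways, the regex approach can be extended with alternation and a case-insensitive flag. This is only a sketch: the list of road types and the second road name in the sample sentence are assumptions for illustration.

```python
import re

# The set of road types is an assumption based on the question's wording.
ROAD_PATTERN = re.compile(r"(\w+) (highway|road|street)\b", re.IGNORECASE)

def find_roads(sentence):
    """Return every 'name type' pair found, e.g. 'Uhuru Highway'."""
    return [match.group(0) for match in ROAD_PATTERN.finditer(sentence)]

# Sample sentence extended with a second road purely for illustration.
print(find_roads("There is an accident along Uhuru Highway near Moi Road"))
```

This only captures a single word before the road type; multi-word names would need a different pattern.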
I am a Python beginner. The following code does exactly what I want, but it looks a little dumb because of the three for loops. Can somebody show me a smarter/shorter way to achieve it? Maybe a single function, or parallelizing the for loops.
import re
from collections import Counter
def getWordListAndCounts(text):
    words = []
    for t in text:
        for tt in t:
            for ttt in re.split(r"\s+", str(tt)):
                words.append(str(ttt))
    return Counter(words)
text = [['I like Apple' , 'I also like Google']]
getWordListAndCounts(text)
First, remove the redundant outer list (this removes one level of nesting from the comprehension).
Since there is no need to store the intermediate result in a list, a generator expression is the preferable and more efficient choice.
Check this one-line approach:
import re
from collections import Counter
text = ['I like Apple' , 'I also like Google']
print(Counter(str(ttt) for t in text for ttt in re.split(r"\s+", str(t))))
Use meaningful variable names. Names like t, tt and ttt do not make the code readable.
Why not use "for phrase in text" then "for word in phrase"?
Why are you using double encoded strings? Unless it is already in this format when you are reading it, I would suggest you not to do this.
import re
from collections import Counter
def getWordListAndCounts(text):
    # Joins the phrases of the first inner list only, then splits on whitespace
    return Counter(re.split(r'\s+', ' '.join(text[0])))
text = [['I like Apple' , 'I also like Google']]
print(getWordListAndCounts(text))
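If the nested list can hold more than one inner list, another option is to flatten it with itertools.chain and use str.split(), which already splits on runs of whitespace, so no regex is needed. A minimal sketch (the function name count_words is hypothetical):

```python
from collections import Counter
from itertools import chain

def count_words(groups):
    """Count words across a list of lists of phrases."""
    phrases = chain.from_iterable(groups)  # flatten one level of nesting
    # split() with no arguments splits on any run of whitespace
    return Counter(word for phrase in phrases for word in phrase.split())

text = [['I like Apple', 'I also like Google']]
print(count_words(text))
```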
Just say I have a string such as:
Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP
I want to pull out every word which occurs before "/NNP/". This would mean my output is
Lecture, UNESCO, House
I tried:
print(re.findall(r'/NNP/', string))
and then working backwards from each match, but I can't make that work for arbitrary input. There is always a blank space before each word, which might help.
Edit: removed error in list.
Try this:
s = 'Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP'
re.findall(r'(\S+)/NNP/', s)
=> ['Lecture', 'UNESCO', 'House']
Use a lookahead. The word needs its own capturing group; otherwise each match also includes the leading space consumed by (?:\s|^).
>>> re.findall(r'(?:\s|^)([^/]+)(?=/NNP/)', 'Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP')
['Lecture', 'UNESCO', 'House']
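Because every token is separated by whitespace and its fields by slashes, the same result can also be had without a regex by splitting twice. A sketch under that assumption about the format (the function name words_tagged is made up here):

```python
def words_tagged(tagged_text, tag="NNP"):
    """Return the words whose first tag field equals `tag` exactly."""
    result = []
    for token in tagged_text.split():
        fields = token.split("/")  # e.g. ['Lecture', 'NNP', 'B-NP', 'O']
        if len(fields) > 1 and fields[1] == tag:
            result.append(fields[0])
    return result

s = ('Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP '
     'the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP '
     'in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP')
print(words_tagged(s))
```

Paris is skipped because its tag field is NNP-LOC rather than NNP, which matches the desired output in the question.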
I wish to let the user ask a simple question, so I can extract a few standard elements from the string entered.
Examples of strings to be entered:
Who is the director of The Dark Knight?
What is the capital of China?
Who is the president of USA?
As you can see sometimes it is "Who", sometimes it is "What". I'm most likely looking for the "|" operator. I'll need to extract two things from these strings. The word after "the" and before "of", as well as the word after "of".
For example:
1st sentence: I wish to extract "director" and place it in a variable called Relation, and extract "The Dark Knight" and place it in a variable called Concept.
Desired output:
RelationVar = "director"
ConceptVar = "The Dark Knight"
2nd sentence: I wish to extract "capital", assign it to variable "Relation".....and extract "China" and place it in variable "Concept".
RelationVar = "capital"
ConceptVar = "China"
Any ideas on how to use the re.match function? or any other method?
You're correct that you want to use | for who/what. The rest of the regex is straightforward. The group names are there for clarity, but you could use r"(?:Who|What) is the (.+) of (.+)[?]" instead.
>>> import re
>>> r = r"(?:Who|What) is the (?P<RelationVar>.+) of (?P<ConceptVar>.+)[?]"
>>> l = ['Who is the director of The Dark Knight?', 'What is the capital of China?', 'Who is the president of USA?']
>>> [re.match(r, i).groupdict() for i in l]
[{'RelationVar': 'director', 'ConceptVar': 'The Dark Knight'}, {'RelationVar': 'capital', 'ConceptVar': 'China'}, {'RelationVar': 'president', 'ConceptVar': 'USA'}]
Change (?:Who|What) to (Who|What) if you also want to capture whether the question uses who or what.
Actually extracting the data and assigning it to variables is very simple:
>>> m = re.match(r, "What is the capital of China?")
>>> d = m.groupdict()
>>> relation_var = d["RelationVar"]
>>> concept_var = d["ConceptVar"]
>>> relation_var
'capital'
>>> concept_var
'China'
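Keep in mind that re.match returns None when the input does not fit the pattern, so calling groupdict() on free-form user input can raise an AttributeError. A minimal sketch that guards against that (the function name parse_question and the lowercase group names are assumptions for illustration):

```python
import re

QUESTION = re.compile(r"(?:Who|What) is the (?P<relation>.+) of (?P<concept>.+)\?")

def parse_question(question):
    """Return (relation, concept), or None if the question does not match."""
    match = QUESTION.match(question)
    if match is None:  # re.match returns None when there is no match
        return None
    return match.group("relation"), match.group("concept")

print(parse_question("Who is the director of The Dark Knight?"))
print(parse_question("Where is Paris?"))
```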
Here is a script. You can use | inside the parentheses to match either alternative.
This worked fine for me
import re
questions = ['Who is the director of The Dark Knight?','What is the capital of China?','Who is the president of USA?']
# (?:...) keeps Who/What out of the captured groups; \? keeps the question mark out of the concept
pattern = re.compile(r'(?:What|Who) is the (.+) of (.+)\?')
for question in questions:
    nodes = pattern.findall(question)
    Relation = nodes[0][0]
    Concept = nodes[0][1]
    print(Relation)
    print(Concept)
    print('----')
Best Regards:)