This question already has answers here:
Replace all the occurrences of specific words
(4 answers)
Find substring in string but only if whole words?
(8 answers)
Closed 6 years ago.
I want to replace certain words in a string, but I keep getting the following result:
String: "This is my sentence."
User types in what they want to replace: "is"
User types what they want to replace word with: "was"
New string: "Thwas was my sentence."
How can I make sure it only replaces the word "is" instead of any string of the characters it finds?
Code function:
import string
def replace(word, new_word):
    new_file = string.replace(word, new_word[1])
    return new_file
Any help is much appreciated, thank you!
Use a regular expression with word boundaries:
import re
print(re.sub(r"\bis\b","was","This is my sentence"))
This is better than a simple split because it also works with punctuation:
print(re.sub(r"\bis\b","was","This is, of course, my sentence"))
gives:
This was, of course, my sentence
Note: don't skip the r prefix, or your regex would be corrupt: \b would be interpreted as backspace.
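Since the word to replace comes from user input here, the pattern has to be built at run time. A minimal sketch, assuming the user's word should be taken literally rather than as a regex (the helper name replace_word is hypothetical, not from the question):

```python
import re

def replace_word(text, word, new_word):
    # Hypothetical helper: replace `word` only where it stands alone as a whole word.
    # re.escape keeps characters such as "." in the user's word from acting as regex syntax.
    pattern = r"\b" + re.escape(word) + r"\b"
    # A function replacement keeps backslashes in `new_word` from being read as escapes.
    return re.sub(pattern, lambda match: new_word, text)

print(replace_word("This is my sentence.", "is", "was"))  # This was my sentence.
```

Note that \b only behaves as expected when the word itself starts and ends with a letter, digit or underscore.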
A simple but less general solution (as given by Jean-François Fabre) that does not use regular expressions:
' '.join(x if x != word else new_word for x in string.split())
This question already has answers here:
What do ^ and $ mean in a regular expression?
(2 answers)
Closed 2 years ago.
I've got a problem with carets and dollar signs in Python.
I want to find every word which starts with a number and ends with a letter
Here is what I've tried already:
import re
text = "Cell: 415kkk -555- 9999ll Work: 212-555jjj -0000"
phoneNumRegex = re.compile(r'^\d+\w+$')
print(phoneNumRegex.findall(text))
Result is an empty list:
[]
The result I want:
415kkk, 9999ll, 555jjj
Where is the problem?
Problems with your regex:
^...$ anchors the pattern to the start and end of the whole string, so only a match spanning the entire string would count. Get rid of the anchors.
r'\w+' matches one or more word characters: letters (either case), digits, and the underscore '_'. So a purely numeric token such as '555' would also match, with '55' consumed by r'\d+' and the last '5' by r'\w+', and would be added to the result.
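Both problems can be seen by running the two variants on the text from the question. This is only an illustration; the variable names anchored and unanchored are made up for it:

```python
import re

text = "Cell: 415kkk -555- 9999ll Work: 212-555jjj -0000"

# With ^ and $ the whole string would have to be digits followed by word characters.
anchored = re.findall(r'^\d+\w+$', text)
# Without anchors, \w+ still accepts digits, so number-only tokens match too.
unanchored = re.findall(r'\d+\w+', text)

print(anchored)    # []
print(unanchored)  # ['415kkk', '555', '9999ll', '212', '555jjj', '0000']
```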
You need this instead:
import re
text = "Cell: 415kkk -555- 9999ll Work: 212-555jjj -0000"
phoneNumRegex = re.compile(r'\b\d+[a-zA-Z]+\b')
print(phoneNumRegex.findall(text))
Output:
['415kkk', '9999ll', '555jjj']
The '\b' are word boundaries, so you do not match '1111efg' inside 'abcd1111efg'.
Further reading:
re-syntax
regex101.com - a regex tester website that can handle Python syntax
This question already has answers here:
How to match a whole word with a regular expression?
(4 answers)
Closed 3 years ago.
I want to replace only a specific word in a string. However, some other words contain that word, and I don't want them to be changed.
For example, in the string z below I only want to replace x with y. How can I do that?
x = "the"
y = "a"
z = "This is the thermometer"
import re
pattern=r'\bthe\b' # \b - start and end of the word
repl='a'
string = 'This is the thermometer'
string=re.sub(pattern, repl, string)
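In your case you can use re.sub(r'\b' + re.escape(x) + r'\b', y, z). A plain re.sub(x, y, z) would also change the "the" inside "thermometer".
See the documentation of the re module for more information.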
This question already has answers here:
How can I tell if a string repeats itself in Python?
(13 answers)
Closed 3 years ago.
I need to split a string by using repeated characters.
For example:
My string is "howhowhow"
I need output as 'how,how,how'.
I can't use 'how' directly in my regular expression because my input varies. I need to check whether the string consists of a repeated sequence of characters and, if so, split it on that sequence.
import re
string = "howhowhow"
print(','.join(re.findall(re.search(r"(.+?)\1", string).group(1), string)))
OUTPUT
howhowhow -> how,how,how
howhowhowhow -> how,how,how,how
testhowhowhow -> how,how,how # not clearly defined by OP
The pattern is non-greedy so that howhowhowhow doesn't map to howhow,howhow which is also legitimate. Remove the ? if you prefer the longest match.
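The one-liner raises an AttributeError when nothing repeats, because re.search then returns None. A hedged sketch of a safer wrapper follows; the name split_repeats and the choice to return the input unchanged when there is no repeat are assumptions, not part of the answer:

```python
import re

def split_repeats(s):
    # Hypothetical wrapper around the one-liner above.
    match = re.search(r"(.+?)\1", s)
    if match is None:
        # No unit occurs twice in a row: return the input unchanged (an assumption).
        return s
    unit = match.group(1)
    # Escape the unit so characters like "." in the input are matched literally.
    return ','.join(re.findall(re.escape(unit), s))

print(split_repeats("howhowhow"))     # how,how,how
print(split_repeats("howhowhowhow"))  # how,how,how,how
print(split_repeats("abc"))           # abc
```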
lengthofRepeatedChar = 3
str1 = 'howhowhow'
HowmanyTimesRepeated = int(len(str1)/lengthofRepeatedChar)
((str1[:lengthofRepeatedChar]+',')*HowmanyTimesRepeated)[:-1]
'how,how,how'
This works when you know the length of the repeated sequence.
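When the length is not known in advance, it can be derived without a regex. The sketch below swaps in a different technique, the (s + s).find(s, 1, -1) trick, and assumes the whole string is made of the repeated unit; the function names are hypothetical:

```python
def repeated_unit(s):
    # If s is built from a repeated unit, s reappears inside s + s
    # at an offset equal to the length of the shortest such unit.
    i = (s + s).find(s, 1, -1)
    return None if i == -1 else s[:i]

def join_repeats(s):
    unit = repeated_unit(s)
    if unit is None:
        # Not a pure repetition: return the input unchanged (an assumption).
        return s
    return ','.join([unit] * (len(s) // len(unit)))

print(join_repeats("howhowhow"))  # how,how,how
print(join_repeats("abc"))        # abc
```

Unlike the regex version, this does not handle a leading prefix such as "testhowhowhow".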
This question already has answers here:
Best way to strip punctuation from a string
(32 answers)
Closed 6 years ago.
I have been working on a file which has a lot of punctuation, and we need to ignore the punctuation so we can count the actual length of the words.
Example:
Is this stack overflow! ---> Is this stack overflow
While doing this I wrote a separate case for each and every punctuation character, which made my code slow. So I am looking for an efficient way to do the same thing using a module or function.
Code snippet :
with open(file_name,'r') as f:
    for line in f:
        for word in line.split():
            #print word
            '''
            Handling Punctuation
            '''
            word = word.replace('.','')
            word = word.replace(',','')
            word = word.replace('!','')
            word = word.replace('(','')
            word = word.replace(')','')
            word = word.replace(':','')
            word = word.replace(';','')
            word = word.replace('/','')
            word = word.replace('[','')
            word = word.replace(']','')
            word = word.replace('-','')
This is the logic I have written. Is there any way to shorten it?
This question is a "classic", but a lot of answers don't work in Python 3 because the string.maketrans function has been removed from Python 3. A Python 3-compliant solution is:
Use string.punctuation to get the list of punctuation characters and str.translate to remove them:
import string
"hello, world !".translate({ord(k):"" for k in string.punctuation})
results in:
'hello world '
In Python 3, the argument of translate is a dictionary. Each key is the Unicode code point of a character (as returned by ord), and the value is its replacement string. I created it using a dictionary comprehension.
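Although string.maketrans is gone, Python 3 still offers the static method str.maketrans, which builds an equivalent table. A small sketch applied to the word-length counting from the question (the names table and lengths are only for illustration):

```python
import string

# The third argument of str.maketrans lists characters to delete.
table = str.maketrans('', '', string.punctuation)

print("hello, world !".translate(table))  # 'hello world '

# Count word lengths with punctuation ignored.
lengths = [len(word) for word in "Is this stack overflow!".translate(table).split()]
print(lengths)  # [2, 4, 5, 8]
```

Building the table once and reusing it for every line avoids repeating the work inside the loop over the file.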
You can use a regular expression with a character class to remove the characters:
>>> import re
>>> re.sub(r'[]!,:)([/-]', '', string)
'Is this stack overflow'
[]!,:)([/-] is a character class which matches ] or ! or , and so on. Each match is replaced with ''.
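Rather than typing the class by hand, it could be built from string.punctuation. This is a sketch under the assumption that every ASCII punctuation character should go; the names punct_re and strip_punctuation are hypothetical:

```python
import re
import string

# re.escape protects characters such as ], ^, - and \ inside the character class.
punct_re = re.compile('[' + re.escape(string.punctuation) + ']')

def strip_punctuation(s):
    # Remove every ASCII punctuation character from s.
    return punct_re.sub('', s)

print(strip_punctuation("Is this stack overflow!"))  # Is this stack overflow
```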
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to refer to “\” sign in python string
I have a fairly large amount of string data from which I have to remove all characters other than A-Z, a-z and 0-9.
I'm able to remove almost every character, but '\' is a problem: every other character is removed, while '\' is not.
def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text
reps = {' ':'-','.':'-','"':'-',',':'-','/':'-',
'<':'-',';':'-',':':'-','*':'-','+':'-',
'=':'-','_':'-','?':'-','%':'-','!':'-',
'$':'-','(':'-',')':'-','\#':'-','[':'-',
']':'-','\&':'-','#':'-','\W':'-','\t':'-'}
x.name = x.name.lower()
x1 = replace_all(x.name,reps)
I've quite large string data in which I've to remove all characters other than A-Z,a-z and 0-9
In other words, you want to keep only those characters.
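The string class already provides a test "is every character a letter or number?", called .isalnum(). So, we can just filter with that (the ''.join is needed in Python 3, where filter returns an iterator rather than a string):
>>> ''.join(filter(str.isalnum, 'foo-bar\\baz42'))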
'foobarbaz42'
If you have a string:
a = 'hi how \\are you'
you can remove the backslash by doing:
a.replace('\\','')
>'hi how are you'
If you have a specific context where you are having trouble, I recommend posting a bit more detail.
birryee is correct, you need to escape the backslash with a second backslash.
to remove all characters other than A-Z, a-z and 0-9
Instead of trying to list all the characters you want to remove (that would take a long time), use a regular expression to specify those characters you wish to keep:
import re
text = re.sub('[^0-9A-Za-z]', '-', text)
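If runs of unwanted characters should become a single '-', a + quantifier can be added to the class. This is an optional variation and a sketch only; the sample string and the names raw, one_per_char and collapsed are made up for illustration:

```python
import re

raw = 'foo bar\\baz!!42'  # contains one literal backslash

# One '-' per unwanted character, as in the answer above.
one_per_char = re.sub(r'[^0-9A-Za-z]', '-', raw)
# One '-' per run of unwanted characters.
collapsed = re.sub(r'[^0-9A-Za-z]+', '-', raw)

print(one_per_char)  # foo-bar-baz--42
print(collapsed)     # foo-bar-baz-42
```

Because the class is negated, the backslash needs no special handling: it is simply one more character that is not in the allowed set.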