Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
I have the following strings.
string1 = "按照由 GPV 提供的相关报告; 世界卫生组织 WHO 发布的有关研究"
string2 = "\n\n 介绍 INTRODUCTION"
How can I remove the spaces between Chinese characters and English acronyms?
The expected result is:
"按照由GPV提供的相关报告; 世界卫生组织WHO发布的有关研究".
However, the re pattern should not remove the space between 介绍 and INTRODUCTION since there are no Chinese characters on the right side of INTRODUCTION.
If you can use the third-party regex module, it supports \p{Script} tokens, which make this task easy:
\p{Han}+\s+\p{Latin}+\s+\p{Han}+
Unfortunately, Python's built-in re module doesn't support these.
To remove the spaces, use capturing groups to select the surrounding words and refer to them in the replacement pattern:
Match (\p{Han}+)\s+(\p{Latin}+)\s+(\p{Han}+)
Replace by \1\2\3
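If installing the regex module is not an option, a rough equivalent can be sketched with the built-in re module. This sketch assumes that the range \u4e00-\u9fff (the main CJK Unified Ideographs block) is a good enough stand-in for \p{Han} and that acronyms are plain ASCII letters; the helper name strip_acronym_spaces is made up for illustration. It uses a lookahead for the right-hand Chinese text so that the same characters can also serve as the left-hand side of a later match.

```python
import re

# Assumption: the main CJK Unified Ideographs block approximates \p{Han}.
HAN = r"[\u4e00-\u9fff]"

# Chinese text, spaces, ASCII letters, spaces, then (lookahead) a Chinese character.
PATTERN = re.compile(rf"({HAN}+)\s+([A-Za-z]+)\s+(?={HAN})")

def strip_acronym_spaces(text):
    """Remove spaces around an acronym that has Chinese text on both sides."""
    return PATTERN.sub(r"\1\2", text)

string1 = "按照由 GPV 提供的相关报告; 世界卫生组织 WHO 发布的有关研究"
string2 = "\n\n 介绍 INTRODUCTION"

print(strip_acronym_spaces(string1))
print(strip_acronym_spaces(string2) == string2)  # True: nothing Chinese follows INTRODUCTION
```

Characters outside that block (rare ideographs, extension blocks) would not be recognised by this approximation, which is where \p{Han} from the regex module is more reliable.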
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 months ago.
I have a string that may have any of the following formats (examples):
1111__1111
1111__1111_11
111_11A_11
I have added the following checks:
import re
print(bool(re.match(r"\d__\d", "1111_1111")))
print(bool(re.match(r"\d__\d_\d", "1111_1111_11")))
print(bool(re.match(r"\d_\d[A-Za-z]_\d", "111_11A_11")))
I don't think the regexes are correct because, for example, when I introduce an extra character in the first check it always returns True.
Can you please point me to a solution?
Thank you
It returns True because re.match only anchors the pattern at the start of the string: once a prefix of the string matches, anything after it is ignored.
The following regular expressions anchor both ends and fix the digit counts, so they match only the three exact formats:
print(bool(re.match(r"^\d{4}__\d{4}$", "1111__1111")))
print(bool(re.match(r"^\d{4}_\d{4}_\d{2}$", "1111_1111_11")))
print(bool(re.match(r"^\d{3}_\d{2}[A-Z]_\d{2}$", "111_11A_11")))
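An alternative worth considering is re.fullmatch, which requires the whole string to match and so makes the ^ and $ anchors unnecessary. The sketch below is only one way to arrange it; the names PATTERNS and is_valid are made up for illustration, and the digit counts are taken from the three patterns above.

```python
import re

# One compiled pattern per accepted format (counts follow the answer above).
PATTERNS = [
    re.compile(r"\d{4}__\d{4}"),
    re.compile(r"\d{4}_\d{4}_\d{2}"),
    re.compile(r"\d{3}_\d{2}[A-Z]_\d{2}"),
]

def is_valid(value):
    """Return True if the whole string matches one of the accepted formats."""
    return any(p.fullmatch(value) for p in PATTERNS)

print(is_valid("1111__1111"))    # True
print(is_valid("1111__1111x"))   # False: trailing character is rejected
print(is_valid("111_11A_11"))    # True

# For comparison: re.match ignores whatever follows the matched prefix.
print(bool(re.match(r"\d__\d", "1__1abc")))  # True
```

Unlike a trailing $, fullmatch also rejects a string that ends with a newline.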
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
Given a regex pattern that can match several times in a text, how can one choose and distinguish between the matches in these cases?
choose the first match.
ignore the peripherals.
select only one match by index.
Example:
regex_pattern = r"(AB.+?AbB)|(CD.+?CdD)|(EF.+?EfF)"
text = "for finding several CDadjacent CDtagsCdDCdD in a text this is an ABexampleAbB text"
The first match is CDadjacent CDtagsCdD,
while one might want to match both:
CDadjacent CDtagsCdDCdD
and CDtagsCdD
Found the answer faster than I thought.
Using round brackets '(' and ')' lets you treat part of the regex as one unit, and adding '+' (plus sign) after the group matches one or more consecutive occurrences of it.
For this example, the following pattern solves the issue:
r"(AB.+?(AbB)+)|(CD.+?(CdD)+)|(EF.+?(EfF)+)"
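A quick way to see the difference is to run both patterns over the example text. This is a small sketch; the variable names and the helper all_matches are chosen here for illustration, and the text is written on a single line.

```python
import re

text = "for finding several CDadjacent CDtagsCdDCdD in a text this is an ABexampleAbB text"

original_pattern = r"(AB.+?AbB)|(CD.+?CdD)|(EF.+?EfF)"
grouped_pattern = r"(AB.+?(AbB)+)|(CD.+?(CdD)+)|(EF.+?(EfF)+)"

def all_matches(pattern, s):
    """Return the full text of every non-overlapping match, left to right."""
    return [m.group(0) for m in re.finditer(pattern, s)]

print(all_matches(original_pattern, text))
# ['CDadjacent CDtagsCdD', 'ABexampleAbB']
print(all_matches(grouped_pattern, text))
# ['CDadjacent CDtagsCdDCdD', 'ABexampleAbB']

# Picking one match by index, e.g. the first one:
print(all_matches(grouped_pattern, text)[0])
```

The lazy .+? still stops at the first closing tag, but the greedy (CdD)+ then absorbs any closing tags that follow immediately.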
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a simple regex to grab a string. It works fine for single words without spaces, like HelloWorld. How can I grab text that contains spaces or more than one word, like Hello World?
Text File
FAN PS-2 is NOT PRESENT #like this 'NOT PRESENT'
Regex
Value FAN_PS (\S*) # regex
Start
^FAN PS is ${FAN_PS_1}
What should I change in my regex so I can grab more than 1 word?
Thank you.
You have to change your regular expression. The current expression \S* matches a run of non-whitespace characters, so it is expected that you only get the first word.
If you change \S* to [\S ]*, you will capture multiple words. You can even use the simpler .* if you do not need to exclude any characters.
See the Python regex reference for information on the different character classes.
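The difference between the three character classes can be checked with Python's re module directly. This is a sketch outside the template itself: the sample line comes from the question, and the surrounding "is " prefix in each pattern is only there to locate the value.

```python
import re

line = "FAN PS-2 is NOT PRESENT"

# \S* stops at the first whitespace character.
one_word = re.search(r"is (\S*)", line).group(1)

# [\S ]* also accepts plain spaces, so it continues across words.
several_words = re.search(r"is ([\S ]*)", line).group(1)

# .* accepts any character except a newline.
rest_of_line = re.search(r"is (.*)", line).group(1)

print(one_word)       # NOT
print(several_words)  # NOT PRESENT
print(rest_of_line)   # NOT PRESENT
```

Note that [\S ]* matches spaces but not tabs, while .* matches both; which one fits depends on how the input is delimited.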
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Is it possible to remove periods from the middle of a string (sentence), leaving the ending period?
The answers that I have seen, basically strip all of the periods.
Remove periods at the end of sentences in python
If I understand correctly, this should do what you want:
import re
string = 'You can. use this to .remove .extra dots.'
string = re.sub(r'\.(?!$)', '', string)
It uses a regex to remove every dot except one at the end of the string. (?!$) is a negative lookahead, so the regex matches any dot not directly followed by $ (the end of the string, since re.MULTILINE is not set).
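If the sentence may be followed by trailing whitespace, a slightly more tolerant variant can be sketched by allowing \s* inside the lookahead. The function name strip_inner_periods is made up for illustration, and treating trailing whitespace as "still the end" is an assumption about the input, not part of the answer above.

```python
import re

def strip_inner_periods(sentence):
    """Remove every period except one that ends the string
    (optionally followed by trailing whitespace)."""
    return re.sub(r"\.(?!\s*$)", "", sentence)

print(strip_inner_periods("You can. use this to .remove .extra dots."))
# You can use this to remove extra dots.
print(repr(strip_inner_periods("a. b.c. \n")))
# 'a bc. \n'
```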
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I need to write regex matching code that returns true if there is exactly one '+' between two words and nothing else. I have written code to check whether there is only one '+' in the string, but how do I check that it is between two words?
The code is below:
import re
inputStr = "ali+ahmedafaw+"
inputStr2 = "hello+world+again"
plus = re.findall(r'[+]', inputStr)
print(plus)
l_plus = len(plus)
print("The length is", l_plus)
if l_plus <= 1:
    print("True")
else:
    print("False")
Actually, it depends on what you mean by a word. If you mean one or more letters, you can simply put [a-zA-Z]+ on each side of the + character, or use another pattern such as \w+ to match word characters (letters, digits, and underscore).
re.search(r'[a-zA-Z]+\+[a-zA-Z]+', input_str)
But if you just want to make sure the + does not appear at the start or end of your text, you can use negative lookarounds:
re.search(r'(?<!^)\+(?!$)', input_str)
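Both searches above succeed as soon as any '+' in the string qualifies, so they do not by themselves rule out a second '+'. If "one '+' between two words and nothing else" is meant strictly, one possible sketch is to match the whole string with re.fullmatch. The function name is hypothetical, and "word" is assumed here to mean ASCII letters only.

```python
import re

# Assumption: a "word" is one or more ASCII letters.
WORD_PLUS_WORD = re.compile(r"[A-Za-z]+\+[A-Za-z]+")

def has_single_plus_between_words(s):
    """True only if the entire string is word, '+', word."""
    return WORD_PLUS_WORD.fullmatch(s) is not None

print(has_single_plus_between_words("ali+ahmed"))          # True
print(has_single_plus_between_words("ali+ahmedafaw+"))     # False: trailing '+'
print(has_single_plus_between_words("hello+world+again"))  # False: two '+'
```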