Regex - Find 2 words or more using regex [closed] - python

I have a simple regex to grab a string. It works fine with a single word without spaces, like HelloWorld. How can I grab text that contains spaces, i.e. more than one word, like Hello World?
Text File
FAN PS-2 is NOT PRESENT #like this 'NOT PRESENT'
Regex
Value FAN_PS (\S*) # regex
Start
^FAN PS is ${FAN_PS_1}
What should I change in my regex so I can grab more than 1 word?
Thank you.

You have to change your regular expression. The current expression \S* matches a run of non-whitespace characters, so it is expected that you only get the first word.
If you change \S* to [\S ]*, the capture also accepts spaces and you will get multiple words. You can even change it to the simpler .* if you do not care which characters are captured.
Read the Python regex reference for information on the different character classes.
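The difference between the three captures can be checked with Python's re module. This is only a sketch: the original template looks like a TextFSM-style file, and here the line is matched directly with re.search, with the prefix "FAN PS-2 is " hard-coded as an assumption.

```python
import re

line = "FAN PS-2 is NOT PRESENT"

# \S* stops at the first whitespace character, so only one word is captured.
single = re.search(r"^FAN PS-2 is (\S*)", line).group(1)

# [\S ]* also accepts literal spaces, so the capture runs on to the end.
multi = re.search(r"^FAN PS-2 is ([\S ]*)", line).group(1)

# .* accepts any character except a newline.
anything = re.search(r"^FAN PS-2 is (.*)", line).group(1)

print(single)    # NOT
print(multi)     # NOT PRESENT
print(anything)  # NOT PRESENT
```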


Replace a string in a URL with python [closed]

I have a text containing a URL that needs to be reworked.
text='dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
I need to replace programmatically the id value (in this example 1812, which is unknown before the execution) with a fixed substring (e.g. 189). So the end result must be
'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":189}}'
As I'm programming in Python, I guess that I should use the regular expression (module re) to automatically replace that value between "id": and }} but I couldn't find one that works for this use case.
I assume you are always generating the same URL with that pattern, and the value to 'change' is always in {"id":X}. One way to solve this particular problem is with a positive lookbehind + re.sub replacement.
import re
pattern = re.compile(r"(?<=\"id\":)\d+")
string = "dfs:/?url=https://myserver/c12&ofg={\"tes\":{\"id\":1812}}"
print(pattern.sub("desired_value", string))
The generated output will contain desired_value in place of 1812. regex101 gives a good step-by-step explanation, but here is a quick recap of what the pattern does:
It matches one or more digits only if they are immediately preceded by "id":, and the lookbehind itself does not consume any characters.
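For comparison, here is a sketch of the same replacement written two ways: with the lookbehind from this answer, and with a capturing group whose text is written back in the replacement. The capturing-group variant is an alternative added for illustration, not part of the original answer, and the variable names are made up.

```python
import re

text = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'

# Lookbehind: only the digits are part of the match, so only they are replaced.
with_lookbehind = re.sub(r'(?<="id":)\d+', "189", text)

# Capturing group: the prefix is matched too and restored with \g<1>.
# \g<1> is used rather than \1 because the digits that follow ("189")
# would otherwise be read as part of the group number.
with_group = re.sub(r'("id":)\d+', r"\g<1>189", text)

print(with_lookbehind)
print(with_group)
```

Both calls should produce the string from the question with 189 in place of 1812.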
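What about simply splitting the string twice? E.g.:
my_string = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
# Split around the id value and rebuild the string, so that the same digits
# appearing elsewhere in the URL are not replaced by accident.
head, tail = my_string.split('"id":', 1)
old_id, rest = tail.split('}}', 1)
print(head + '"id":' + "189" + '}}' + rest)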

Handle several adjacent regex matches [closed]

Given a regex pattern that can match several times in a text, how would one choose and distinguish between the matches in the following cases?
choose the first match.
ignore the peripherals.
only one match using index.
Example:
regex_pattern = r"(AB.+?AbB)|(CD.+?CdD)|(EF.+?EfF)"
text = "for finding several CDadjacent CDtagsCdDCdD in a text this is an ABexampleAbB text"
The first match is CDadjacent CDtagsCdD,
while one might want to match both:
CDadjacent CDtagsCdDCdD
and CDtagsCdD
Found the answer faster than I thought.
Using '(' and ')' (round brackets) lets you treat part of the regex as one unit, and by adding a '+' (plus sign) after the group you can match one or more consecutive occurrences of it.
For this example, this pattern solves the issue:
r"(AB.+?(AbB)+)|(CD.+?(CdD)+)|(EF.+?(EfF)+)"
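A quick way to compare the two patterns is to list the full matches each one produces on the sample text. This is a sketch; re.finditer with group(0) is used here because re.findall would return tuples of groups for patterns like these.

```python
import re

text = ("for finding several CDadjacent CDtagsCdDCdD in a "
        "text this is an ABexampleAbB text")

original = r"(AB.+?AbB)|(CD.+?CdD)|(EF.+?EfF)"
grouped = r"(AB.+?(AbB)+)|(CD.+?(CdD)+)|(EF.+?(EfF)+)"

# group(0) is the whole match, regardless of which alternative matched.
original_matches = [m.group(0) for m in re.finditer(original, text)]
grouped_matches = [m.group(0) for m in re.finditer(grouped, text)]

print(original_matches)  # ['CDadjacent CDtagsCdD', 'ABexampleAbB']
print(grouped_matches)   # ['CDadjacent CDtagsCdDCdD', 'ABexampleAbB']
```

The repeated closing tag is absorbed because (CdD)+ is greedy, while the preceding .+? stays lazy.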

Match Before and After a pattern in Python RE [closed]

I have the following strings.
string1 = "按照由 GPV 提供的相关报告; 世界卫生组织 WHO 发布的有关研究"
string2 = "\n\n 介绍 INTRODUCTION"
How can I remove the spaces between Chinese characters and English acronyms?
The expected result is:
"按照由GPV提供的相关报告; 世界卫生组织WHO发布的有关研究".
However, the re pattern should not remove the space between 介绍 and INTRODUCTION since there are no Chinese characters on the right side of INTRODUCTION.
If you can use the third-party module regex, it supports \p{script} tokens, which make this task easy:
\p{Han}+\s+\p{Latin}+\s+\p{Han}+
Python's built-in re module unfortunately doesn't support these.
To remove the spaces, use capturing groups to select the surrounding words and refer to them in your replacement pattern:
Match (\p{Han}+)\s+(\p{Latin}+)\s+(\p{Han}+)
Replace by \1\2\3
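If installing regex is not an option, the same match-and-replace idea can be roughly sketched with the built-in re module by substituting explicit character ranges for the script properties. The ranges below are assumptions made for illustration: HAN covers only the basic CJK Unified Ideographs block and LATIN only ASCII letters, so this is narrower than \p{Han} and \p{Latin}. The function name is hypothetical.

```python
import re

HAN = r"[\u4e00-\u9fff]"  # assumption: basic CJK Unified Ideographs only
LATIN = r"[A-Za-z]"       # assumption: ASCII letters only

pattern = re.compile(rf"({HAN}+)\s+({LATIN}+)\s+({HAN}+)")

def remove_inner_spaces(text):
    """Drop the whitespace around a Latin run that has Han text on both sides."""
    return pattern.sub(r"\1\2\3", text)

string1 = "按照由 GPV 提供的相关报告; 世界卫生组织 WHO 发布的有关研究"
string2 = "\n\n 介绍 INTRODUCTION"

print(remove_inner_spaces(string1))
print(repr(remove_inner_spaces(string2)))  # unchanged: no Han text after INTRODUCTION
```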

How to remove periods from the middle of sentences [closed]

Is it possible to remove periods from the middle of a string (sentence), leaving the ending period?
The answers that I have seen, basically strip all of the periods.
Remove periods at the end of sentences in python
If I understand correctly, this should do what you want:
import re
string = 'You can. use this to .remove .extra dots.'
string = re.sub(r'\.(?!$)', '', string)
It uses a regex to remove all dots, except a dot at the end of the string. (?!$) is a negative lookahead, so the regex matches any dot that is not directly followed by $ (the end of the string, unless re.MULTILINE is set).
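The behaviour can be checked on the sample sentence, and on a string that ends with a newline, where $ also matches just before that final newline. A minimal sketch; the helper name is made up for illustration.

```python
import re

def strip_inner_periods(text):
    # Remove every '.' that is not immediately followed by the end of the string.
    return re.sub(r"\.(?!$)", "", text)

sentence = strip_inner_periods("You can. use this to .remove .extra dots.")
with_newline = strip_inner_periods("a.b.\n")

print(sentence)            # You can use this to remove extra dots.
print(repr(with_newline))  # 'ab.\n'
```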

Regex Matching - Python [closed]

I need to write a regex that returns true only if there is exactly one '+' between two words and nothing else. I have written code that checks whether there is only one '+' in the string, but how do I check that it is between two words?
The code is below:
import re
inputStr = "ali+ahmedafaw+"
inputStr2 = "hello+world+again"
plus = re.findall(r'[+]', inputStr)  # every '+' in the string
print(plus)
l_plus = len(plus)
print("The length is", l_plus)
if l_plus <= 1:
    print("True")
else:
    print("False")
It depends on what you mean by a word. If you mean a run of one or more letters, you can simply put [a-zA-Z]+ on either side of the + character. Other patterns match different characters, e.g. \w for word characters.
re.search(r'[a-zA-Z]+\+[a-zA-Z]+', input_str)
But if you only want to make sure the + does not appear at the start or the end of your text, you can use negative lookarounds:
re.search(r'(?<!^)\+(?!$)', input_str)
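Note that re.search finds the pattern anywhere in the text, so both expressions above still accept strings with extra '+' characters. If "and nothing else" is taken to mean that the whole string must be word, plus, word (an assumption about the question's intent), re.fullmatch is one way to sketch it. The function name is hypothetical.

```python
import re

def plus_between_words(text):
    # The entire string must be: letters, one '+', letters.
    return re.fullmatch(r"[a-zA-Z]+\+[a-zA-Z]+", text) is not None

print(plus_between_words("ali+ahmed"))          # True
print(plus_between_words("ali+ahmedafaw+"))     # False
print(plus_between_words("hello+world+again"))  # False

# By contrast, the lookaround search only needs one '+' that is not at an edge.
loose = re.search(r"(?<!^)\+(?!$)", "ali+ahmedafaw+") is not None
print(loose)  # True
```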
