Handle several adjacent regex matches [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
Given a regex pattern that can be found several times in a text, how would one choose and distinguish between the matches, for example to:
choose the first match;
ignore the peripheral matches;
select only one match by index.
Example:
regex_pattern = r"(AB.+?AbB)|(CD.+?CdD)|(EF.+?EfF)"
text = "for finding several CDadjacent CDtagsCdDCdD in a text this is an ABexampleAbB text"
The first match is CDadjacent CDtagsCdD,
whereas one might want to match both
CDadjacent CDtagsCdDCdD
and CDtagsCdD.
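A minimal sketch of the first and third options with the standard re module, using the question's own pattern and text (joined onto one line). Because the pattern contains several groups, re.findall would return tuples of groups, so re.finditer with group(0) is used here to get the whole matched text:

```python
import re

regex_pattern = r"(AB.+?AbB)|(CD.+?CdD)|(EF.+?EfF)"
text = "for finding several CDadjacent CDtagsCdDCdD in a text this is an ABexampleAbB text"

# re.search returns only the first (leftmost) match
first = re.search(regex_pattern, text)
print(first.group(0))  # CDadjacent CDtagsCdD

# re.finditer yields every non-overlapping match; group(0) is the whole match
all_matches = [m.group(0) for m in re.finditer(regex_pattern, text)]
print(all_matches)  # ['CDadjacent CDtagsCdD', 'ABexampleAbB']

# Pick a single match by its position in the list
print(all_matches[1])  # ABexampleAbB
```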

Found the answer faster than I thought.
Using '(' and ')' (round brackets) lets you treat part of the regex as one unit, and adding '+' (the plus sign) after that unit matches one or more consecutive occurrences of it.
For this example, the following pattern solves the issue:
r"(AB.+?(AbB)+)|(CD.+?(CdD)+)|(EF.+?(EfF)+)"
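A quick check of the pattern above against the same text, again sketched with re.finditer. The lazy .+? still stops at the first CdD, but the greedy (CdD)+ then absorbs the adjacent repeat:

```python
import re

text = "for finding several CDadjacent CDtagsCdDCdD in a text this is an ABexampleAbB text"
fixed_pattern = r"(AB.+?(AbB)+)|(CD.+?(CdD)+)|(EF.+?(EfF)+)"

# group(0) is the whole match, including every repeated closing tag
matches = [m.group(0) for m in re.finditer(fixed_pattern, text)]
print(matches)  # ['CDadjacent CDtagsCdDCdD', 'ABexampleAbB']
```

Note that matches are still non-overlapping, so CDtagsCdD is not reported on its own; it is consumed by the longer match.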

Related

Replace a string in a URL with python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 months ago.
I have a text containing a URL that needs to be reworked.
text='dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
I need to programmatically replace the id value (in this example 1812, which is not known before execution) with a fixed substring (e.g. 189), so the end result must be
'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":189}}'
As I'm programming in Python, I guess I should use a regular expression (the re module) to replace the value between "id": and }}, but I couldn't find one that works for this use case.
I assume you always generate the URL with the same pattern, and that the value to change is always in {"id":X}. One way to solve this particular problem is with a positive lookbehind plus re.sub.
import re
pattern = re.compile(r"(?<=\"id\":)\d+")
string = "dfs:/?url=https://myserver/c12&ofg={\"tes\":{\"id\":1812}}"
print(pattern.sub("desired_value", string))
The output contains desired_value in place of 1812. regex101 gives a detailed explanation of the pattern, but in short:
it matches one or more digits (\d+) only if they are immediately preceded by "id":. The lookbehind (?<=\"id\":) checks for that prefix without consuming it, so the prefix itself is not replaced.
What about simply splitting the string twice? E.g.:
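If you would rather avoid a lookbehind, a capturing-group variant (an alternative, not the method used in the answer) captures the "id": prefix and writes it back in the replacement. The value 189 is the one from the question:

```python
import re

string = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'

# Capture the '"id":' prefix and put it back in front of the new value.
# \g<1> is used instead of \1 because "\1189" would be read as group 11.
result = re.sub(r'("id":)\d+', r'\g<1>189', string)
print(result)  # dfs:/?url=https://myserver/c12&ofg={"tes":{"id":189}}
```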
my_string = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
substring = my_string.split('"id":',1)[1]
substring = substring.split('}}')[0]
print(my_string.replace(substring, "189"))
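One caveat with the split approach: str.replace rewrites every occurrence of the extracted value, so if the same digits also appear elsewhere in the string they change too (for example, with an id of 12, the c12 in the path would become c189). A sketch that rebuilds the string around the split points with str.partition instead, so only the id value is touched; replace_id is a hypothetical helper name:

```python
def replace_id(s, new_id):
    """Hypothetical helper: swap the value between '"id":' and the next '}}'."""
    # Split once around the marker, then once around the closing braces.
    head, marker, rest = s.partition('"id":')
    if not marker:
        return s  # marker not found: leave the string unchanged
    _old_id, closing, tail = rest.partition('}}')
    return head + marker + new_id + closing + tail

my_string = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
print(replace_id(my_string, "189"))
# dfs:/?url=https://myserver/c12&ofg={"tes":{"id":189}}

# With an id of 12, str.replace would also rewrite "c12" in the path; this does not.
print(replace_id('dfs:/?url=https://myserver/c12&ofg={"tes":{"id":12}}', "189"))
# dfs:/?url=https://myserver/c12&ofg={"tes":{"id":189}}
```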

Check if string follow a strict format via Regex Python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 months ago.
I have a string that might have any of the following formats (examples):
1111__1111
1111__1111_11
111_11A_11
I have added the following checks:
import re
print(bool(re.match(r"\d__\d", "1111_1111")))
print(bool(re.match(r"\d__\d_\d", "1111_1111_11")))
print(bool(re.match(r"\d_\d[A-Za-z]_\d", "111_11A_11")))
I don't think the regexes are correct, because when I introduce a character into the first check, for example, it always returns True.
Can you please point me to a solution?
Thank you
It returns True because \d matches exactly one digit and re.match only anchors the pattern at the start of the string: any string that begins with a matching prefix passes, and whatever follows is ignored. Fixing the digit counts with {n} and anchoring both ends with ^ and $ gives exact matches.
The following regular expressions match the three formats exactly:
print(bool(re.match(r"^\d{4}__\d{4}$", "1111__1111")))
print(bool(re.match(r"^\d{4}_\d{4}_\d{2}$", "1111_1111_11")))
print(bool(re.match(r"^\d{3}_\d{2}[A-Z]_\d{2}$", "111_11A_11")))
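A variant worth knowing: re.fullmatch requires the entire string to match, so no ^ and $ anchors are needed, and unlike $ it does not accept a trailing newline. A minimal sketch using the three patterns from the answer; the FORMATS table and the matching_format helper are hypothetical names for illustration:

```python
import re

# Hypothetical table of the answer's three formats (no anchors needed).
FORMATS = {
    "NNNN__NNNN": r"\d{4}__\d{4}",
    "NNNN_NNNN_NN": r"\d{4}_\d{4}_\d{2}",
    "NNN_NNA_NN": r"\d{3}_\d{2}[A-Z]_\d{2}",
}

def matching_format(s):
    """Return the name of the first format that s matches exactly, or None."""
    for name, pattern in FORMATS.items():
        if re.fullmatch(pattern, s):
            return name
    return None

print(matching_format("1111__1111"))    # NNNN__NNNN
print(matching_format("1111_1111_11"))  # NNNN_NNNN_NN
print(matching_format("111_11A_11"))    # NNN_NNA_NN
print(matching_format("1111__1111x"))   # None: extra trailing character
print(matching_format("1111__1111\n"))  # None, although ^...$ would accept it
```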

Match Before and After a pattern in Python RE [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have the following strings.
string1 = "按照由 GPV 提供的相关报告; 世界卫生组织 WHO 发布的有关研究"
string2 = "\n\n 介绍 INTRODUCTION"
How can I remove the spaces between Chinese characters and English acronyms?
The expected result is:
"按照由GPV提供的相关报告; 世界卫生组织WHO发布的有关研究".
However, the pattern should not remove the space between 介绍 and INTRODUCTION, since there are no Chinese characters to the right of INTRODUCTION.
If you can use the third-party regex module, it supports \p{Script} tokens (such as \p{Han} and \p{Latin}), which make this task easy:
\p{Han}+\s+\p{Latin}+\s+\p{Han}+
Unfortunately, Python's built-in re module doesn't support these.
To remove the spaces, use capturing groups to select the surrounding words and refer to them in the replacement pattern:
Match (\p{Han}+)\s+(\p{Latin}+)\s+(\p{Han}+)
Replace with \1\2\3
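If installing regex is not an option, a rough approximation with the built-in re module is to spell out character ranges instead of script properties. This is a swapped-in technique, not the answer's: it assumes the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is enough for your text (it misses the CJK extension blocks) and that the acronyms are ASCII letters:

```python
import re

# Stand-ins for \p{Han} and \p{Latin}, which the built-in re module lacks.
# Assumption: [\u4e00-\u9fff] covers the Chinese text; [A-Za-z] is ASCII-only.
HAN = r"[\u4e00-\u9fff]"
LATIN = r"[A-Za-z]"
pattern = re.compile(rf"({HAN}+)\s+({LATIN}+)\s+({HAN}+)")

string1 = "按照由 GPV 提供的相关报告; 世界卫生组织 WHO 发布的有关研究"
string2 = "\n\n 介绍 INTRODUCTION"

fixed1 = pattern.sub(r"\1\2\3", string1)
fixed2 = pattern.sub(r"\1\2\3", string2)
print(fixed1)             # 按照由GPV提供的相关报告; 世界卫生组织WHO发布的有关研究
print(fixed2 == string2)  # True: no Han character follows INTRODUCTION
```

Like the answer's pattern, this consumes the trailing Han run, so in a chain such as 一 A 二 B 三 only the first acronym is joined in a single pass.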

Regex - Find 2 words or more using regex [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a simple regex to grab a string. It works fine with words that contain no spaces, like HelloWorld, but how can I grab text that contains spaces or more than one word, like Hello World?
Text File
FAN PS-2 is NOT PRESENT #like this 'NOT PRESENT'
Regex
Value FAN_PS (\S*) # regex
Start
^FAN PS is ${FAN_PS_1}
What should I change in my regex so I can grab more than one word?
Thank you.
You need to change your regular expression. \S* matches a run of non-whitespace characters, so it stops at the first space and you only get the first word.
If you change \S* to [\S ]*, plain spaces are allowed too and you will get multiple words. You can also use the simpler .* if you are happy to capture every character up to the end of the line.
See the Python re module documentation for details on the different character classes.
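To see the difference outside the template, here is a small sketch that applies the three value patterns directly with Python's re module to the sample line (without its trailing comment):

```python
import re

line = "FAN PS-2 is NOT PRESENT"

# \S* stops at the first whitespace, so only one word is captured
one_word = re.match(r"FAN PS-2 is (\S*)", line).group(1)
# [\S ]* also accepts plain spaces, so the capture runs across words
words = re.match(r"FAN PS-2 is ([\S ]*)", line).group(1)
# .* accepts any character except a newline
rest = re.match(r"FAN PS-2 is (.*)", line).group(1)

print(one_word)  # NOT
print(words)     # NOT PRESENT
print(rest)      # NOT PRESENT
```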

Python - how to replace 'p' in a number(4p5) with '.' (4p5->4.5)? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a program that prints out some data with 'p's in place of decimal points, and also some other information. I was trying to replace the 'p's with '.'s.
Example output by the program:
out_info = 'value is approximately 34p55'
I would like to change it to:
out_info_updated = 'value is approximately 34.55'
I tried using re.search to extract the number and replace the p with a ., but plugging it back in became a problem. I couldn't figure out which pattern to use with re.sub to do the job.
Can anyone please help?
Here you go:
import re
out_info = "value is approximately 34p55"
re.sub(r'(\d+)p(\d+)', r'\1.\2', out_info)
The output is:
'value is approximately 34.55'
That says "Look for one or more digits, followed by a p, followed by one or more digits, then replace all that with the first set of digits, followed by a ., followed by the second set of digits."
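Because the pattern requires digits on both sides of the p, it leaves other p's alone and handles several numbers in one string. A small sketch wrapping the same re.sub call in a hypothetical helper, p_to_decimal:

```python
import re

def p_to_decimal(text):
    """Hypothetical helper: turn e.g. 34p55 into 34.55, leaving other p's alone."""
    # Same pattern as above: digits, a literal p, digits.
    return re.sub(r"(\d+)p(\d+)", r"\1.\2", text)

print(p_to_decimal("value is approximately 34p55"))  # value is approximately 34.55
print(p_to_decimal("between 1p5 and 2p25, step p5"))  # between 1.5 and 2.25, step p5
```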
