Regular Expressions(extraction values) [closed] - python

I want to extract the content between two pieces of text.
For example:
<html><title>lol</title></html>
I want to extract what is located between <title> and </title>. Which regular expression do I need?

You can use better tools than regular expressions.
Read about HTMLParser
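As a rough sketch of the parser route, assuming Python 3's standard html.parser module: the class name TitleParser and the helper extract_titles are made up for illustration and are not part of the library.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside every <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "title":
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current title.
        if self._in_title:
            self.titles[-1] += data


def extract_titles(html):
    # Hypothetical helper: feed the whole document and return the titles found.
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


print(extract_titles("<html><title>lol</title></html>"))  # ['lol']
```

Unlike a regex, this keeps working when the tag carries attributes or is written in a different case.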
EDIT:
But if you want to use regular expressions:
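import re
def get_tag_body(tagname, text):
    # Non-greedy match of <tagname>...</tagname>; escape the name in case it contains regex metacharacters.
    name = re.escape(tagname)
    regexp = r'<%s>(.*?)</%s>' % (name, name)
    rx_obj = re.search(regexp, text, re.IGNORECASE | re.DOTALL)
    # re.search returns None when nothing matches, so guard before reading the group.
    return rx_obj.group(1) if rx_obj else None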

Related

How to get all text between [CODE][/CODE]? [closed]

I have text with tags, such as [CODE]something here[/CODE].
How can I take only the part between these tags, using the re module?
With re.findall and a regular expression that has one capture group, we get back a list of all the captured matches.
import re
regex = r"\[CODE\](.+?)\[/CODE\]"
test_str = "[CODE]Noob[/CODE] Nb[CODE]something here[/CODE]"
matches = re.findall(regex, test_str)
# Prints "Noob" and then "something here", one per line.
for match in matches:
    print(match)
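One thing to watch for with (.+?): it needs at least one character, so an empty [CODE][/CODE] pair makes the match run on to the next closing tag. A small sketch of the difference, using a made-up sample string; switching to (.*?) is my suggestion, not something the question asked for.

```python
import re

# Made-up sample with an empty tag pair followed by a normal one.
sample = "[CODE][/CODE] x [CODE]a[/CODE]"

# ".+?" must consume at least one character, so it skips past the first closing tag.
with_plus = re.findall(r"\[CODE\](.+?)\[/CODE\]", sample)

# ".*?" may match nothing, so each tag pair is captured separately.
with_star = re.findall(r"\[CODE\](.*?)\[/CODE\]", sample)

print(with_plus)  # ['[/CODE] x [CODE]a']
print(with_star)  # ['', 'a']
```

If empty tags can appear in your text, (.*?) is probably the safer choice.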

Grab specific text from string in Python [closed]

How can I grab the invite code from this string?
{awarded:1,inviteURL:https:\/\/www.example.com\/refer\/invite\/111A111A\/}
The expected output would be "111A111A".
Any help is appreciated
I tried a simple approach. If you give more details, it can be improved.
s = r"{awarded:1,inviteURL:https:\/\/www.example.com\/refer\/invite\/111A111A\/}"
# The code is the 8 characters just before the trailing \/} so slice from the end.
print(s[-11:-3])
This will do it with a regex:
import re
def findInvite(s):
    # Lookbehind for /invite\/ and lookahead for the closing \/ around the code.
    return re.search(r"(?<=/invite\\/).*(?=\\/)", s).group()
assert findInvite(r"{awarded:1,inviteURL:https:\/\/www.example.com\/refer\/invite\/111A111A\/}") == "111A111A"
And if this isn't a string but a dict, then change the function to:
def findInvite(d):
    s = d["inviteURL"]
    return re.search(r"(?<=/invite\\/).*(?=\\/)", s).group()
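Both versions above raise AttributeError when there is no match, and they expect the slashes to be escaped as \/. Here is a sketch of a more forgiving variant; the name find_invite_code is hypothetical, and I am assuming the code itself never contains a slash or backslash.

```python
import re

# "invite", an optional backslash, a slash, then everything up to the next slash or backslash.
INVITE_CODE = re.compile(r"invite\\?/([^\\/]+)")


def find_invite_code(s):
    # Hypothetical helper: returns the code, or None when the text has no invite part.
    m = INVITE_CODE.search(s)
    return m.group(1) if m else None


escaped = r"{awarded:1,inviteURL:https:\/\/www.example.com\/refer\/invite\/111A111A\/}"
plain = "https://www.example.com/refer/invite/111A111A/"
print(find_invite_code(escaped))  # 111A111A
print(find_invite_code(plain))    # 111A111A
```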

Extract Data Enclosed between three asterisks in python [closed]

I want the data enclosed between two sets of three asterisks, where the text starts with the word description.
For example, I have data like:
description ***tCore-DFON_P.17-18>dPLUC80115_S19P1>>><<<dPDCL80121_S17P1<100G.IPT.NTTA.SEA.ASE+PC1.LUC/PLD-SEA/PLD_100GEL064.263568***;
I want only:
tCore-DFON_P.17-18>dPLUC80115_S19P1>>><<<dPDCL80121_S17P1<100G.IPT.NTTA.SEA.ASE+PC1.LUC/PLD-SEA/PLD_100GEL064.263568
You may use re.findall here:
import re
inp = "description ***tCore-DFON_P.17-18>dPLUC80115_S19P1>>><<<dPDCL80121_S17P1<100G.IPT.NTTA.SEA.ASE+PC1.LUC/PLD-SEA/PLD_100GEL064.263568***;"
matches = re.findall(r'\bdescription\s+\*{3}(.*?)\*{3}', inp, flags=re.DOTALL)
print(matches)
This prints:
['tCore-DFON_P.17-18>dPLUC80115_S19P1>>><<<dPDCL80121_S17P1<100G.IPT.NTTA.SEA.ASE+PC1.LUC/PLD-SEA/PLD_100GEL064.263568']
Note that I use DOTALL mode (re.DOTALL) in the regex, in case your expected matches span more than one line.
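To see what the flag changes, here is a small comparison on a made-up two-line input (the sample text is an assumption, not taken from the question):

```python
import re

pattern = r'\bdescription\s+\*{3}(.*?)\*{3}'

# Made-up input where the payload spans two lines.
text = "description ***first line\nsecond line***;"

# Without DOTALL, "." stops at the newline, so the closing *** is never reached.
without_dotall = re.findall(pattern, text)

# With DOTALL, "." also matches the newline and the whole payload is captured.
with_dotall = re.findall(pattern, text, flags=re.DOTALL)

print(without_dotall)  # []
print(with_dotall)     # ['first line\nsecond line']
```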

Regex in python to catch a string like MMS-2839 [closed]

How can I match strings like these with a regex in Python?
M1Sxs-2839
McS-28S9213
Both the first and the second part (divided by the -) can contain letters and numbers (case insensitive).
You may try re.match as shown below.
re.match(r"(?i)[A-Z0-9]+-[A-Z0-9]+$", st)
(?i) makes the match case-insensitive. Since re.match only matches at the start of the input, you don't need to add the start-of-line anchor ^ explicitly.
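A small caveat: $ also matches just before a trailing newline, so "MMS-2839\n" would pass the check above. If that matters, re.fullmatch is one alternative. A sketch, where the function name is_ticket_id is made up for illustration:

```python
import re

# Letters/digits, a hyphen, then letters/digits; IGNORECASE replaces the inline (?i).
TICKET = re.compile(r"[A-Z0-9]+-[A-Z0-9]+", re.IGNORECASE)


def is_ticket_id(st):
    # fullmatch requires the whole string to match, trailing newline included.
    return TICKET.fullmatch(st) is not None


print(is_ticket_id("M1Sxs-2839"))   # True
print(is_ticket_id("McS-28S9213"))  # True
print(is_ticket_id("MMS-2839\n"))   # False
```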

grep with python to match string inside quotes in html files [closed]

I am a newbie with grep, but I'm familiar with Python. My problem is to find every string inside quotes, like "text", and replace it with <em>text</em>.
The source file is HTML.
Thanks
That'll do the trick:
import re
s = '"text" "some"'
# re.sub returns the new string directly; \1 is the text captured between the quotes.
res = re.sub(r'"([^"]*)"', r'<em>\1</em>', s)
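Since the source is HTML, quotes inside tags (attribute values such as href="...") would be rewritten too. One possible workaround is to match whole tags first and put them back unchanged. This is only a sketch: the name emphasize_quoted is hypothetical, and it assumes no attribute value contains a > character.

```python
import re

# Either a whole tag (group 1) or a quoted string outside a tag (group 2).
TAG_OR_QUOTED = re.compile(r'(<[^>]*>)|"([^"]*)"')


def emphasize_quoted(html):
    def repl(m):
        # Tags are returned untouched, so quotes inside attributes survive.
        if m.group(1) is not None:
            return m.group(1)
        return "<em>%s</em>" % m.group(2)

    return TAG_OR_QUOTED.sub(repl, html)


print(emphasize_quoted('<a href="x.html">"text"</a>'))
# <a href="x.html"><em>text</em></a>
```

For anything more complicated than this, a real HTML parser is likely the better tool.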
