Match all occurrences of a regular expression [duplicate] - python

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Regex to first occurrence only? [duplicate]
(4 answers)
Closed 4 years ago.
I want to extract all occurrences of a pattern in Python.
Here is what I have done:
import re
string="Any information <p>sent to the server as clear text</p>, may be stolen and used later for <p>identity theft</p> or user impersonation. In addition, several privacy regulations state that sensitive information such as user<p> credentials will always be sent encrypted </p> to the web site."
regex='<p>.*</p>' # greedy: matches from the first <p> to the last </p>
if re.findall(regex, string):
    print(re.findall(regex, string))
else:
    print('no match found')
I want to extract all the occurrences of paragraph tags. That is, the output should be a list that looks like this:
['<p>sent to the server as clear text</p>', '<p>identity theft</p>', '<p> credentials will always be sent encrypted </p>']
I've found a few similar questions, but they do not serve this purpose:
Find all occurrences of a substring in Python
Finding multiple occurrences of a string within a string in Python

Change your regex to use the non-greedy quantifier .*? so that each match stops at the nearest </p>:
regex=r"<p>.*?</p>"
It gives output like:
['<p>sent to the server as clear text</p>', '<p>identity theft</p>',
'<p> credentials will always be sent encrypted </p>']
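To see why the question mark matters, here is a minimal sketch comparing the two quantifiers. The shortened sample text and the variable names are made up for illustration and are not from the original post.

```python
import re

# Shortened sample text (an assumption for illustration).
text = "a <p>one</p> b <p>two</p> c <p> three </p> d"

# Greedy: .* takes as much as possible, so a single match runs
# from the first <p> to the last </p>.
greedy = re.findall(r"<p>.*</p>", text)

# Lazy: .*? takes as little as possible, so each match stops
# at the nearest </p>.
lazy = re.findall(r"<p>.*?</p>", text)

print(greedy)
print(lazy)
```

Note that . does not match a newline by default, so a paragraph spanning several lines would also need the re.DOTALL flag.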

Related

Find Numbers after a specific text using regex [duplicate]

This question already has answers here:
How to grab number after word in python
(4 answers)
Closed 2 months ago.
I receive text like the following:
CRM NO: 23542536 crmno:# 3542536 crmno:_ 3542536... crm no 43653768754
My desired output is:
23542536
3542536
3542536
43653768754
I want to write a regex to extract only the number after the string 'CRM NO'.
The prefix also comes in variations such as CRM NO, crmno, or crm no.
I have tried the regex ((?<=CRM NO)\D+\d+), but it does not work for all the entries.
You can use a capture group with a case-insensitive match, and match the leading part with an optional space:
(?i)\bCRM ?NO\D+(\d+)\b
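As a quick check, the pattern can be run against the sample text from the question with re.findall. This is only a sketch; the variable names are illustrative.

```python
import re

# Sample input copied from the question.
text = "CRM NO: 23542536 crmno:# 3542536 crmno:_ 3542536... crm no 43653768754"

# (?i) makes the match case-insensitive; " ?" allows "CRM NO" or "CRMNO";
# \D+ skips the separator characters; (\d+) captures the number.
pattern = re.compile(r"(?i)\bCRM ?NO\D+(\d+)\b")

# With exactly one capture group, findall returns only the captured text.
numbers = pattern.findall(text)
print(numbers)
```

Because the pattern has a single capture group, the prefix and separators are matched but left out of the result.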

URL Regex for Python [duplicate]

This question already has answers here:
Regex match everything after question mark?
(7 answers)
Closed 12 months ago.
I am trying to compare webpage URLs using regex. I am using the below method.
regex_url = r'https://www.website.com/books/\w{8}$'
is_read = re.match(regex_url, request.url) is not None
if not is_read:
    add_to_read(token)
Everything works well with the above regex, but there is now a new URL pattern for which I can't seem to get the regex right.
The new URL pattern is
https://www.website.com/books/Ab7us83xI?varient=web
9 characters followed by a question mark and then the word 'varient' and then '=web'. Can anyone help me get the correct regex for this?
Only the first 9 characters change every time. Apologies if this is a stupid question.
Many thanks.
Is this what you need?
https://www.website.com/books/\w{9}\?varient=web$
\w{9} - match 9 word characters (letters, digits, or underscore)
\? - match a literal question mark
varient=web$ - match varient=web at the end of the string
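A small sketch of how that pattern could be checked in Python. The helper name is_book_url is hypothetical, and the dots in the host name are escaped here so they only match a literal ".", which is a tightening that is not in the pattern above.

```python
import re

# Dots are escaped so they match a literal "." (an addition for
# illustration, not part of the pattern in the answer).
regex_url = r"https://www\.website\.com/books/\w{9}\?varient=web$"


def is_book_url(url):
    """Return True if url has the new 9-character form (hypothetical helper)."""
    return re.match(regex_url, url) is not None


# 9 characters before the question mark: accepted.
print(is_book_url("https://www.website.com/books/Ab7us83xI?varient=web"))
# Only 8 characters before the question mark: rejected.
print(is_book_url("https://www.website.com/books/Ab7us83x?varient=web"))
```

If both the old 8-character and the new 9-character URLs should be accepted, the two patterns would need to be combined or tested separately.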

Email Regex Validation fails in python [duplicate]

This question already has answers here:
How can I validate an email address using a regular expression?
(79 answers)
Closed 2 years ago.
I am using Python to extract emails from the web with the re library. It does its job, but it also extracts links that match the pattern. For example:
/images/paramproofs/services/pgp/logo_black_16@2x.png
/images/paramproofs/services/twitter/logo_black_16@2x.png
/images/paramproofs/services/github/logo_black_16@2x.png
/images/paramproofs/services/reddit/logo_black_16@2x.png
/images/paramproofs/services/web/logo_black_16@2x.png
/images/paramproofs/services/web/logo_black_16@2x.png
/images/paramproofs/services/stellar/logo_black_16@2x.png
/images/badges/install-badge-windows-168-56@2x.png
/images/badges/install-badge-windows-168-56@3x.png
This is the pattern I use:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[ a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
I don't know where you got that regex from, but according to emailregex.com this should suffice for almost all cases (including yours):
(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)
The line anchors (^ for the beginning of the line and $ for the end) are the key here.
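A minimal sketch of the effect of the anchors, using the emailregex.com-style pattern with the usual @ separator and assuming each candidate has already been pulled out as its own string. The candidates list and the address user@example.com are hypothetical.

```python
import re

# ^ and $ force the whole string to look like an address, so a path
# that merely contains an "@" somewhere is rejected.
EMAIL_RE = re.compile(r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)")

candidates = [
    "/images/paramproofs/services/pgp/logo_black_16@2x.png",  # image path
    "user@example.com",  # hypothetical address for illustration
]

# Keep only the candidates that match from start to end.
emails = [c for c in candidates if EMAIL_RE.match(c)]
print(emails)
```

The image path is rejected because "/" is not allowed in the part before the @, so the match fails at the very first character.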

Numeric pattern search in regular expression using Python [duplicate]

This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
Closed 2 years ago.
I have the text below:
my_text = "My telephone number is 408-555-1234"
on which I am searching for the pattern
re.findall(r'\d{3}-\d{1,}',my_text)
My intention was to search for a three-digit number followed by - and then another run of one or more digits. Hence I was expecting the result to be ['408-555','555-1234'].
However, the result I am getting is only ['408-555'].
Could anyone tell me what is wrong in my understanding here, and suggest a pattern that would serve my purpose?
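You can use a lookahead with a capture group. Because the lookahead consumes no characters, the matches are allowed to overlap:
re.findall(r'(?=(\d{3}-\d+))', my_text)
Output: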
['408-555', '555-1234']
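A short side-by-side sketch of the two behaviours, using the text from the question (the variable names plain and overlapping are illustrative):

```python
import re

my_text = "My telephone number is 408-555-1234"

# A plain pattern consumes "408-555", so the next search starts at
# "-1234" and finds nothing more.
plain = re.findall(r"\d{3}-\d+", my_text)

# The lookahead matches an empty string at each position and only
# captures what follows, so "555" can start a second match.
overlapping = re.findall(r"(?=(\d{3}-\d+))", my_text)

print(plain)
print(overlapping)
```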

Regex - so close, yet so far away [duplicate]

This question already has answers here:
Regular expression to find URLs within a string
(35 answers)
What is the best regular expression to check if a string is a valid URL?
(62 answers)
Closed 4 years ago.
Here is my current regex: (?:ht|f)tps?:[\S]*\/?(?:\w+)
I need to refine it such that it pulls the following link correctly from the quoted text below: http://www.purdue.edu/transcom/index.php
Any thoughts on how I can improve my current regex? Thanks in advance!
Additional information about the experimental protocol and results is
provided in the companion files and the TransCom project web site
(http://www.purdue.edu/transcom/index.php).The results of the Level 1
experiments presented here are grouped into two broad categories
I have not tested your regex thoroughly, and it is not clear to me why your current regex is failing.
But to catch a URL in general, I would match the scheme and then repeat a group made of a slash followed by the allowed characters (the characters permitted in a URL minus the slash, for example [a-zA-Z0-9.]),
something like
r'(?:ht|f)tps?://[a-zA-Z0-9.]+(?:/[a-zA-Z0-9.]*)*'
and possibly a positive lookahead assertion if the URL is always inside quotes or parentheses.
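One possible way to make that idea concrete is sketched below, run on part of the quoted text. It assumes the narrow character class [a-zA-Z0-9.] mentioned above, so it would not handle hyphens, query strings, or other characters that real URLs may contain.

```python
import re

# Part of the quoted text from the question.
text = (
    "provided in the companion files and the TransCom project web site "
    "(http://www.purdue.edu/transcom/index.php).The results of the Level 1"
)

# Scheme, "://", a host, then any number of "/segment" groups.
# The character class is deliberately narrow (an assumption for
# illustration), which is what stops the match at the closing ")".
url_re = re.compile(r"(?:ht|f)tps?://[a-zA-Z0-9.]+(?:/[a-zA-Z0-9.]*)*")

# All groups are non-capturing, so findall returns whole matches.
urls = url_re.findall(text)
print(urls)
```

Restricting the allowed characters, rather than using \S, is what keeps the closing parenthesis and the following text out of the match.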
Url Similar Splitter
Matches URL-like strings and splits each into its address and parameters
by deme72
([--:\w?@%&+~#=]*\.[a-z]{2,4}\/{0,2})((?:[?&](?:\w+)=(?:\w+))+|[--:\w?@%&+~#=]+)?
Source: regexr.com community
