Email Regex Validation fails in Python [duplicate]

This question already has answers here:
How can I validate an email address using a regular expression?
(79 answers)
Closed 2 years ago.
I am using Python to extract emails from the web using the re library. It does its job, but it also extracts links that match the pattern. For example:
/images/paramproofs/services/pgp/logo_black_16@2x.png
/images/paramproofs/services/twitter/logo_black_16@2x.png
/images/paramproofs/services/github/logo_black_16@2x.png
/images/paramproofs/services/reddit/logo_black_16@2x.png
/images/paramproofs/services/web/logo_black_16@2x.png
/images/paramproofs/services/web/logo_black_16@2x.png
/images/paramproofs/services/stellar/logo_black_16@2x.png
/images/badges/install-badge-windows-168-56@2x.png
/images/badges/install-badge-windows-168-56@3x.png
This is the pattern I use:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[ a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

I don't know where you got that regex from, but according to emailregex.com this should suffice for almost all cases (including yours):
(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)
The line anchors (^ for the beginning of the line and $ for the end) are the key here: the whole string has to be an address, and since / is not in the allowed character set, the image paths no longer match.
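As a minimal sketch of how the anchored pattern could be used to filter candidates, the snippet below applies it to one of the image paths from the question and to a made-up address. The pattern is written with a literal @ between the local part and the domain; the helper name keep_emails and the address someone@example.com are assumptions for illustration only.

```python
import re

# Pattern quoted in the answer (from emailregex.com), compiled once.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")


def keep_emails(candidates):
    """Return only the candidates that match the pattern from start to end."""
    return [c for c in candidates if EMAIL_RE.match(c)]


candidates = [
    "/images/badges/install-badge-windows-168-56@2x.png",  # from the question
    "someone@example.com",  # hypothetical address
]
print(keep_emails(candidates))  # ['someone@example.com']
```

Because of the anchors, this only works on strings that are already isolated candidates; it will not find an address embedded in a longer line of text.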

Related

Find Numbers after a specific text using regex [duplicate]

This question already has answers here:
How to grab number after word in python
(4 answers)
Closed 2 months ago.
I receive text like the following:
CRM NO: 23542536 crmno:# 3542536 crmno:_ 3542536... crm no 43653768754
My desired output is:
23542536
3542536
3542536
43653768754
I want to write a regex that extracts only the number after the string 'CRM NO'.
The CRM NO prefix also comes in variations such as CRM NO, crmno, or crm no.
I have tried the regex ((?<=CRM NO)\D+\d+), but it does not work for all the entries.
You can use a capture group with a case-insensitive match, and match the leading part with an optional space:
(?i)\bCRM ?NO\D+(\d+)\b
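A short sketch of how that pattern could be applied with re.findall, using the sample text from the question. The variable names are illustrative; because the pattern has a single capture group, findall returns only the captured digits.

```python
import re

# (?i) makes the match case-insensitive; " ?" allows "CRM NO" and "crmno";
# \D+ skips the separator characters; (\d+) captures the number itself.
CRM_RE = re.compile(r"(?i)\bCRM ?NO\D+(\d+)\b")

text = "CRM NO: 23542536 crmno:# 3542536 crmno:_ 3542536... crm no 43653768754"
numbers = CRM_RE.findall(text)
print(numbers)  # ['23542536', '3542536', '3542536', '43653768754']
```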

String pattern in Python [duplicate]

This question already has answers here:
How do I validate a date string format in python?
(5 answers)
Closed 6 months ago.
I'm trying to check whether a user's input follows the pattern integer/integer/integer (like month/day/year), but I don't know exactly how to use the match function to specify that the pattern contains a number, then "/", then another number and "/", and so on.
Check out https://regex101.com/ for a neat website to test your regex! In Python, regular expressions are implemented by the re library: https://docs.python.org/3/library/re.html
In your case, the pattern would be [0-9]{1,2}\/[0-9]{1,2}\/[0-9]{2,4} (the backslashes before / are optional in Python).
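A minimal sketch of how the pattern might be used, assuming the whole input should have that shape (hence re.fullmatch rather than re.match). The regex only checks the shape, so something like 99/99/999 still passes; the second helper, which assumes month/day/four-digit-year order, leans on datetime.strptime to reject impossible dates. Both function names are made up for this example.

```python
import re
from datetime import datetime

# Same pattern as above, without the optional backslashes before "/".
DATE_SHAPE_RE = re.compile(r"[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}")


def looks_like_date(text):
    """True if the whole string has the shape number/number/number."""
    return DATE_SHAPE_RE.fullmatch(text) is not None


def is_real_date(text):
    """True if the string parses as month/day/year (assumed order)."""
    try:
        datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        return False
    return True


print(looks_like_date("12/31/2020"))  # True
print(looks_like_date("12-31-2020"))  # False
print(is_real_date("2/30/2020"))      # False: February has no 30th
```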

Python regex with multiple matches in the same string [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Python non-greedy regexes
(7 answers)
Closed 3 years ago.
test = '<tag>part1</tag><tag can have random stuff here>part2</tag>'
print(re.findall("<tag.*>(.*)</tag>", test))
It outputs:
['part2']
The text can have any amount of "parts". I want to return all of them, not only the last one. What's the best way to do it?
You could change your .* to be .*? so that they are non-greedy. That will make your original example work:
import re
test = '<tag>part1</tag><tag can have random stuff here>part2</tag>'
print(re.findall(r'<tag.*?>(.*?)</tag>', test))
Output:
['part1', 'part2']
Though it would probably be best to not try to parse this with just regex, but instead use a proper HTML parser library.
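For comparison, here is a rough sketch of the parser-based route using html.parser from the standard library. The class and function names are hypothetical, and the code assumes the elements of interest are literally named tag and are not nested.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside each <tag>...</tag> element."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "tag":
            self._inside = True
            self.parts.append("")  # start a new part

    def handle_endtag(self, tag):
        if tag == "tag":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.parts[-1] += data


def extract_parts(markup):
    collector = TagTextCollector()
    collector.feed(markup)
    collector.close()
    return collector.parts


test = '<tag>part1</tag><tag can have random stuff here>part2</tag>'
print(extract_parts(test))  # ['part1', 'part2']
```

The parser treats the extra words inside the opening tag as attributes, so no special handling is needed for them.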

Regex - so close, yet so far away [duplicate]

This question already has answers here:
Regular expression to find URLs within a string
(35 answers)
What is the best regular expression to check if a string is a valid URL?
(62 answers)
Closed 4 years ago.
Here is my current regex: (?:ht|f)tps?:[\S]*\/?(?:\w+)
I need to refine it such that it pulls the following link correctly from the quoted text below: http://www.purdue.edu/transcom/index.php
Any thoughts on how I can improve my current regex? Thanks in advance!
Additional information about the experimental protocol and results is
provided in the companion files and the TransCom project web site
(http://www.purdue.edu/transcom/index.php).The results of the Level 1
experiments presented here are grouped into two broad categories
I have not tested your regex thoroughly, and it is not clear to me why your current regex is failing.
But to catch a URL in general, I would repeat a group made of the allowed URL characters minus the slash (something like [a-zA-Z0-9.]) followed by a slash,
something like
r'(?:ht|f)tps?://(?:[_html_authorized_chars]+/?)*'
and possibly a positive lookahead assertion if the URL is always inside quotes or parentheses.
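One way to make that idea concrete is sketched below. The regex from the question appears to overshoot because [\S]* is greedy and the sample text has no whitespace between the closing parenthesis and "The", so the match runs on to "...index.php).The". Restricting the match to an explicit character class stops it at the parenthesis. The class used here is an assumption (a small subset of the characters that can appear in URLs), not a complete definition.

```python
import re

# Scheme, then a host, then zero or more "/segment" groups.
# The character classes are a deliberately small, assumed subset.
URL_RE = re.compile(r"(?:ht|f)tps?://[a-zA-Z0-9.-]+(?:/[a-zA-Z0-9._~%-]*)*")

text = (
    "provided in the companion files and the TransCom project web site\n"
    "(http://www.purdue.edu/transcom/index.php).The results of the Level 1\n"
)
urls = URL_RE.findall(text)
print(urls)  # ['http://www.purdue.edu/transcom/index.php']
```

A URL that ends a sentence would still pick up the trailing period, since . is in the class, so some post-processing may be needed for other inputs.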
Url Similar Splitter
matches url similars and splits it into its address and parameters
by deme72
([--:\w?#%&+~#=]*\.[a-z]{2,4}\/{0,2})((?:[?&](?:\w+)=(?:\w+))+|[--:\w?#%&+~#=]+)?
Source: regexr.com community

How to use python to replace 'http://xyz.example.com' to 'http://example.com' with regular expression [duplicate]

This question already has answers here:
Python urlparse -- extract domain name without subdomain
(7 answers)
Closed 5 years ago.
How to use python to replace 'http://xyz.example.com' to 'http://example.com' with regular expression
Note: 'xyz' is just a placeholder; it may be '123' or 'abc-123'.
This would do it:
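import re
# "url" and "result" avoid shadowing the built-in input()
url = 'http://xyz.example.com'
# remove everything between "http://" and the first dot, dot included
result = re.sub(r'(?<=http:\/\/).*?\.', '', url)
print(result)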
(?<=http:\/\/) is a positive lookbehind for http://
.*?\. lazily matches any characters except newlines, up to and including the first .
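One caveat with the substitution above: it removes the first label even when there is no subdomain, so 'http://example.com' would become 'http://com'. The sketch below is one possible alternative built on urllib.parse, in the spirit of the linked duplicate. It assumes, naively, that the registered domain is always the last two labels, which is wrong for suffixes such as co.uk; the function name is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_first_label(url):
    """Drop the leftmost host label when the host has more than two labels.

    Assumption: the last two labels form the registered domain.
    """
    parts = urlsplit(url)
    labels = parts.netloc.split(".")
    if len(labels) > 2:
        labels = labels[1:]
    return urlunsplit(parts._replace(netloc=".".join(labels)))


print(strip_first_label("http://xyz.example.com"))      # http://example.com
print(strip_first_label("http://abc-123.example.com"))  # http://example.com
print(strip_first_label("http://example.com"))          # http://example.com
```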
