URL Regex for Python [duplicate] - python

This question already has answers here:
Regex match everything after question mark?
(7 answers)
Closed 12 months ago.
I am trying to compare webpage URLs using regex. I am using the below method.
regex_url = r'https://www.website.com/books/\w{8}$'
is_read = re.match(regex_url, request.url) is not None
if not is_read:
    add_to_read(token)
Everything works well for the above regex. But there is now a new URL pattern for which I can't seem to get the regex right.
The new URL pattern is
https://www.website.com/books/Ab7us83xI?varient=web
9 characters followed by a question mark and then the word 'varient' and then '=web'. Can anyone help me get the correct regex for this?
Only the first 9 characters change every time. Apologies if this is a stupid question.
Many thanks.

Is this what you need?
https://www.website.com/books/\w{9}\?varient=web$
\w{9} - match 9 characters
\? - match question mark
varient=web - match varient=web
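For reference, a minimal Python sketch of how this pattern might be used. The helper name `matches_book_url` is hypothetical, and the sketch assumes the ID is always exactly 9 word characters, as described in the question. It also escapes the dots in the host name, since an unescaped `.` matches any character, and uses `re.fullmatch` so no trailing `$` is needed.

```python
import re

# Assumption: the book ID is exactly 9 word characters ([A-Za-z0-9_]).
# Dots are escaped so they match a literal "." only.
BOOK_URL = re.compile(r'https://www\.website\.com/books/\w{9}\?varient=web')

def matches_book_url(url: str) -> bool:
    """Return True if the whole URL matches the new pattern (hypothetical helper)."""
    return BOOK_URL.fullmatch(url) is not None

print(matches_book_url("https://www.website.com/books/Ab7us83xI?varient=web"))
```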

Related

Regex for cleaning a list of prohibited words from a string [duplicate]

This question already has answers here:
Match a whole word in a string using dynamic regex
(1 answer)
Word boundary with words starting or ending with special characters gives unexpected results
(2 answers)
Closed 1 year ago.
I'm following the accepted answer in this link:
Replace all words from word list with another string in python
Despite following the code exactly as described in the above solution, I can't seem to remove the characters from my string. I am not receiving any errors in the console. Could anybody point out what I am doing wrong? Here is a reproducible example. Thank you.
example = "(-) This is an example of a string € which is + not being cleaned // correctly"
prohibited_strings = ["(-)","€","+","//"]
regex_cleaner = re.compile(r'\b%s\b' % r'\b|\b'.join(map(re.escape, prohibited_strings)))
example = regex_cleaner.sub("", example)
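A likely cause, in line with the linked duplicate on word boundaries: `\b` only matches between a word character and a non-word character, so it never matches around tokens such as `+` or `//` that sit between spaces. A minimal sketch of one workaround follows. It assumes the prohibited strings always appear as whitespace-delimited tokens (an assumption not stated in the question) and swaps `\b` for whitespace lookarounds; the final whitespace normalisation is also an addition.

```python
import re

example = "(-) This is an example of a string € which is + not being cleaned // correctly"
prohibited_strings = ["(-)", "€", "+", "//"]

# (?<!\S) and (?!\S) require that the token is not touching a non-space
# character, i.e. it is bounded by whitespace or the string edges.
alternatives = "|".join(map(re.escape, prohibited_strings))
regex_cleaner = re.compile(r'(?<!\S)(?:%s)(?!\S)' % alternatives)

cleaned = regex_cleaner.sub("", example)
# Collapse the doubled spaces left behind by the removals.
cleaned = " ".join(cleaned.split())
print(cleaned)
```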

Validating Regex US ZIP CODE with constraints in Python [duplicate]

This question already has answers here:
regex for zip-code
(3 answers)
Closed 2 years ago.
I'm trying to write a regex that follows these constraints:
Exactly 5 digits
Sometimes, but not always, followed by a dash with 4 more digits
Zip code needs to be preceded by at least one whitespace
Cannot be at the start of a text
I've arrived at this but it's not giving me the output I want:
r"^[A-Za-z].*\s.*\d{5}(?:[-\s]\d{4})?$"
I would use:
(?<=[ \t])((?:\d{5}(?=[^\d-]|$))|(?:\d{5}-\d{4}(?=[^\d-]|$)))
Demo and explanation
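A short Python sketch of the suggested pattern in use. The helper name `find_zip_codes` and the sample strings are made up for illustration. The lookbehind requires a space or tab before the ZIP code (which also rules out the start of the text), and the lookaheads reject a sixth digit or a dangling dash.

```python
import re

# Pattern taken from the answer above; group 1 holds the ZIP code itself.
ZIP_RE = re.compile(r'(?<=[ \t])((?:\d{5}(?=[^\d-]|$))|(?:\d{5}-\d{4}(?=[^\d-]|$)))')

def find_zip_codes(text):
    """Return all ZIP codes found in text (hypothetical helper)."""
    return [m.group(1) for m in ZIP_RE.finditer(text)]

print(find_zip_codes("Ship to 12345 or 12345-6789, not 123456."))
```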

Numeric pattern search in regular expression using Python [duplicate]

This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
Closed 2 years ago.
I have the text below:
my_text = "My telephone number is 408-555-1234"
on which I am searching for the pattern
re.findall(r'\d{3}-\d{1,}',my_text)
My intention was to search for a three-digit number followed by - and then one or more digits. Hence I was expecting the result to be ['408-555', '555-1234'].
However, the result I am getting is only ['408-555'].
Could anyone tell me what is wrong in my understanding here, and suggest a pattern that would serve my purpose?
You can use:
re.findall(r'(?=(\d{3}-\d+))', my_text)
Output:
['408-555', '555-1234']
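The difference comes from how `re.findall` scans: a normal match consumes its characters, so the search resumes after `408-555` and the second `555` is no longer available. A lookahead is zero-width, so the scan advances one position at a time and the capturing group inside it can report overlapping text. A small sketch comparing the two (variable names `plain` and `overlapping` are chosen here for illustration):

```python
import re

my_text = "My telephone number is 408-555-1234"

# Consuming match: "408-555" is used up, so "555-1234" cannot start a new match.
plain = re.findall(r'\d{3}-\d+', my_text)

# Zero-width lookahead with a capturing group: nothing is consumed,
# so every starting position is tried and overlaps are reported.
overlapping = re.findall(r'(?=(\d{3}-\d+))', my_text)

print(plain)
print(overlapping)
```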

Regex - so close, yet so far away [duplicate]

This question already has answers here:
Regular expression to find URLs within a string
(35 answers)
What is the best regular expression to check if a string is a valid URL?
(62 answers)
Closed 4 years ago.
Here is my current regex: (?:ht|f)tps?:[\S]*\/?(?:\w+)
I need to refine it such that it pulls the following link correctly from the quoted text below: http://www.purdue.edu/transcom/index.php
Any thoughts on how I can improve my current regex? Thanks in advance!
Additional information about the experimental protocol and results is
provided in the companion files and the TransCom project web site
(http://www.purdue.edu/transcom/index.php).The results of the Level 1
experiments presented here are grouped into two broad categories
I have not tested your regex thoroughly, and it is not clear why your current regex is failing.
But to catch a URL in general, I would repeat a group made of the characters allowed in a URL, excluding the slash (something like [a-zA-Z0-9.]), followed by the slash.
Something like:
r'(?:ht|f)tps?://(?:[_html_authorized_chars]+/?)*'
and possibly a positive lookahead assertion if the URL is always inside quotes or parentheses.
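One way this idea could look in Python, as a rough sketch rather than a general URL matcher. The character classes below are an assumption chosen to cover the example link: letters, digits, dots and hyphens in the host, plus a few common path characters. Because `)` is not in either class, the match stops before the closing parenthesis.

```python
import re

text = ("provided in the companion files and the TransCom project web site "
        "(http://www.purdue.edu/transcom/index.php).The results of the Level 1")

# Scheme, then a host, then zero or more "/segment" groups.
# Assumed character sets; real URLs can contain more (query strings, etc.).
URL_RE = re.compile(r'(?:ht|f)tps?://[A-Za-z0-9.-]+(?:/[A-Za-z0-9._~%-]*)*')

match = URL_RE.search(text)
url = match.group(0) if match else None
print(url)
```

Note that `.` is in the allowed set, so a URL followed directly by a sentence-ending period would include that period; stripping trailing punctuation afterwards is one option.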
Url Similar Splitter
matches url similars and splits it into its address and parameters
by deme72
([--:\w?#%&+~#=]*\.[a-z]{2,4}\/{0,2})((?:[?&](?:\w+)=(?:\w+))+|[--:\w?#%&+~#=]+)?
Source: regexr.com community

Excluding words using regex without excluding its variants [duplicate]

This question already has answers here:
Find substring in string but only if whole words?
(8 answers)
Closed 4 years ago.
I am trying to exclude the word ‘define’ without excluding other forms of the word like ‘defined’ or ‘defining’, but the regex below doesn’t work. Help.
Regex :
^((?!define).)*$
Use word boundaries around the word define:
^((?!\bdefine\b).)*$
You could also write this pattern as:
^(?!.*\bdefine\b).*$
Demo
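A quick Python check of the difference, with made-up sample lines and a hypothetical helper `allowed`. The original pattern rejects any line containing the letters `define`, including inside `defined`; the word-boundary version rejects only the standalone word.

```python
import re

NAIVE = re.compile(r'^((?!define).)*$')           # pattern from the question
WHOLE_WORD = re.compile(r'^(?!.*\bdefine\b).*$')  # pattern from the answer

def allowed(pattern, line):
    """Return True if the line passes the filter (hypothetical helper)."""
    return pattern.match(line) is not None

for line in ["a defined term", "defining terms", "please define it"]:
    print(line, allowed(NAIVE, line), allowed(WHOLE_WORD, line))
```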
