This question already has an answer here:
Restricting character length in a regular expression
(1 answer)
Closed 4 years ago.
So I have a regex that goes like:
regex1 = re.compile(r'\S+#\S+')
This works perfectly, but I am trying to add a character limit so that the total number of characters has to be less than 20.
I tried re.compile(r'\S+#\S+{5,20}') but it keeps giving me an error. It seems like a simple fix, but I can't see what I am doing wrong.
You can't stack a counted quantifier on top of another quantifier: \S+{5,20} applies {5,20} to something that is already repeated by +, so it is not a valid pattern. Even if it were valid, it would only limit the part after the #, not the total length. If you're doing this in Python, I'd suggest just using the len(...) function on the string in addition to the regex to verify. For example:
if regex1.match(email) and (len(email) < 20):
    ...
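A minimal sketch of two ways to enforce the limit, assuming "less than 20" means at most 19 characters in total and that the whole string (not just a prefix) must match. The helper names is_valid_len and is_valid_lookahead and the sample strings are made up for illustration.

```python
import re

regex1 = re.compile(r'\S+#\S+')

def is_valid_len(text):
    # Option 1: keep the regex as is and check the length separately.
    return bool(regex1.fullmatch(text)) and len(text) < 20

# Option 2: a lookahead caps the total length at 19 characters
# before the original pattern is applied.
regex_limited = re.compile(r'(?=.{1,19}$)\S+#\S+')

def is_valid_lookahead(text):
    return regex_limited.fullmatch(text) is not None

too_long = 'a' * 15 + '#' + 'b' * 15  # 31 characters

print(is_valid_len('user#example'))        # True
print(is_valid_lookahead('user#example'))  # True
print(is_valid_len(too_long))              # False
print(is_valid_lookahead(too_long))        # False
```

The separate len(...) check is usually easier to read; the lookahead version is mainly useful when only a single pattern can be supplied.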
This question already has answers here:
How to use regex with optional characters in python?
(5 answers)
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have a text string similar to the example below:
I have 5-6 year of experience with 2-3 years experience in Java
I used the regex below to match it:
import re
string = "I have 5-6 year of experience with 2-3 years experience in Java"
pattern = r'\d{1}-\d{1} year'
[(m.start(0), m.end(0), 'Experience') for m in re.finditer(pattern, string)]
# results
5-6 year
2-3 year (In this case it's missing out the 's'.)
How can I modify this pattern to match both 'year' and 'years', whichever is longest?
Add an optional "s": '\d{1,2}-\d{1,2}\s*years?'. I also changed '\d{1}' to '\d{1,2}', which means "one or two digits" (it's hard to imagine someone having more than 99 years of experience), and replaced the single space with '\s*', which matches any number of whitespace characters, including none.
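A quick sketch of the suggested pattern run against the sample sentence from the question; the variable names text, spans and matches are chosen here for illustration only.

```python
import re

text = "I have 5-6 year of experience with 2-3 years experience in Java"

# "years?" makes the trailing "s" optional; the greedy "?" takes it when present.
pattern = r'\d{1,2}-\d{1,2}\s*years?'

spans = [(m.start(), m.end(), 'Experience') for m in re.finditer(pattern, text)]
matches = re.findall(pattern, text)

print(matches)  # ['5-6 year', '2-3 years']
```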
This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
Closed 2 years ago.
I have the text below:
my_text = "My telephone number is 408-555-1234"
on which I am searching for the pattern
re.findall(r'\d{3}-\d{1,}',my_text)
My intention was to search for a three-digit number followed by - and then another run of one or more digits. Hence I was expecting the result to be ['408-555','555-1234'].
However, the result I am getting is only ['408-555'].
Could anyone explain what is wrong in my understanding here, and suggest a pattern that would serve my purpose?
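re.findall only returns non-overlapping matches: once '408-555' has been consumed, the search resumes after it, so '555-1234' can no longer be found. A lookahead consumes no characters, so you can use:
re.findall(r'(?=(\d{3}-\d+))', my_text)
Output: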
['408-555', '555-1234']
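A side-by-side sketch of the two calls on the string from the question; the variable names plain and overlapping are illustrative.

```python
import re

my_text = "My telephone number is 408-555-1234"

# Plain findall consumes "408-555", so the scan resumes after it and
# "555-1234" can no longer start a match.
plain = re.findall(r'\d{3}-\d{1,}', my_text)

# The lookahead itself matches the empty string at each position;
# the capture group inside it records the text that findall returns.
overlapping = re.findall(r'(?=(\d{3}-\d+))', my_text)

print(plain)        # ['408-555']
print(overlapping)  # ['408-555', '555-1234']
```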
This question already has answers here:
Remove Last instance of a character and rest of a string
(5 answers)
Closed 3 years ago.
I have a string such as:
string="lcl|NC_011588.1_cds_YP_002321424.1_1"
and I would like to keep only: "YP_002321424.1"
So I tried:
string=re.sub(".*_cds_","",string)
string=re.sub(r"_\d","",string)
But the first _ is removed too. Does someone have an idea?
Note: the numbers can change (they are not fixed).
"Ordinary" split, as proposed in the other answer, is not enough,
because you also want to strip the trailing _1, so the part to capture
should end after a dot and digit.
Try the following pattern:
(?<=_cds_)\w+\.\d
For a working example see https://regex101.com/r/U2QsFH/1
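A minimal sketch of the lookbehind pattern applied with re.search, using the string from the question; the variable names m and result are illustrative.

```python
import re

string = "lcl|NC_011588.1_cds_YP_002321424.1_1"

# (?<=_cds_) requires "_cds_" immediately before the match without including it;
# \w+\.\d then takes the identifier up to the dot and one digit after it.
m = re.search(r'(?<=_cds_)\w+\.\d', string)
result = m.group(0) if m else None

print(result)  # YP_002321424.1
```

If the version after the dot can have more than one digit, \d+ in place of \d would be the natural adjustment; that is an assumption about the data, not something stated in the question.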
Don't bother with regexes; a simple
string.split('_cds_')[1]
will be enough to drop the prefix. Note that this leaves the trailing _1 in place.
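A split-only sketch, assuming the trailing _1 should be removed as well. The rsplit step is an addition for illustration and is not part of the answer above.

```python
string = "lcl|NC_011588.1_cds_YP_002321424.1_1"

# Everything after the "_cds_" marker.
after_prefix = string.split('_cds_')[1]

# Split once from the right to cut off the final "_<number>" suffix.
trimmed = after_prefix.rsplit('_', 1)[0]

print(after_prefix)  # YP_002321424.1_1
print(trimmed)       # YP_002321424.1
```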
This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 3 years ago.
I want to achieve the following:
Say I have two regexes, regex1 and regex2. I want to construct a new regex equivalent to 'prefix_regex1|prefix_regex2'. What syntax should I use to share the prefix? I tried 'prefix_(regex1|regex2)', but it isn't working; I think the parentheses are being treated as a group rather than just raising the precedence of |.
Example:
I have two strings that should both match the pattern:
prefix_123
prefix_abc
I wrote this pattern: prefix_(\d*|\D*), which tries to capture both cases, but when I run it against prefix_abc it only matches prefix_, not the entire string.
This site might help with this problem (and others). It lets you tinker with the regex and see the result both graphically and in code: https://www.debuggex.com/
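For example, I changed your regex to this: prefix_(\d+|\D+), which requires one or more digits or one or more non-digits after "prefix_". With \d*, the first alternative succeeds by matching zero digits, so the match stops at "prefix_". Not sure if that's what you are looking for, but it's easy to experiment with the site I shared above.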
Hope it helps.
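A small sketch of the difference, assuming the goal is to match the whole string. The names bad and good are illustrative, and the non-capturing group (?:...) is an addition here for the case where re.findall should return the full match rather than the group contents.

```python
import re

# \d* can match the empty string, so the first alternative always succeeds
# and the second alternative is never tried.
bad = re.compile(r'prefix_(\d*|\D*)')

# Requiring at least one character forces the alternation to pick the
# branch that actually fits; (?:...) groups without capturing.
good = re.compile(r'prefix_(?:\d+|\D+)')

bad_match = bad.match('prefix_abc').group(0)
good_abc = good.fullmatch('prefix_abc').group(0)
good_123 = good.fullmatch('prefix_123').group(0)

# With a capturing group, findall returns the group, not the whole match.
captured = re.findall(r'prefix_(\d+|\D+)', 'prefix_abc')
whole = good.findall('prefix_abc')

print(bad_match)  # prefix_
print(good_abc)   # prefix_abc
print(good_123)   # prefix_123
print(captured)   # ['abc']
print(whole)      # ['prefix_abc']
```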
This question already has answers here:
Regular expression to find URLs within a string
(35 answers)
What is the best regular expression to check if a string is a valid URL?
(62 answers)
Closed 4 years ago.
Here is my current regex: (?:ht|f)tps?:[\S]*\/?(?:\w+)
I need to refine it such that it pulls the following link correctly from the quoted text below: http://www.purdue.edu/transcom/index.php
Any thoughts on how I can improve my current regex? Thanks in advance!
Additional information about the experimental protocol and results is
provided in the companion files and the TransCom project web site
(http://www.purdue.edu/transcom/index.php).The results of the Level 1
experiments presented here are grouped into two broad categories
I have not tested your regex thoroughly, and it is not clear why your current regex is failing.
But to catch a URL in general, I would repeat a group made of a slash followed by the authorized URL characters other than the slash (something like [a-zA-Z0-9.]),
something like
r'(?:ht|f)tps?:/(?:/[_html_authorized_chars]+)*'
and possibly a positive lookahead assertion if the URL is always inside quotes or parentheses...
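A sketch of the repeated-group idea on the quoted text from the question. The character class [A-Za-z0-9._~%-] is an assumption standing in for the placeholder above; it is not a complete list of characters allowed in URLs and will not handle query strings.

```python
import re

text = (
    "provided in the companion files and the TransCom project web site\n"
    "(http://www.purdue.edu/transcom/index.php).The results of the Level 1\n"
    "experiments presented here are grouped into two broad categories"
)

# The question's regex: the greedy [\S]* runs through ")." to the next space.
original = re.compile(r'(?:ht|f)tps?:[\S]*\/?(?:\w+)')

# Scheme, ":/", then one or more "/segment" groups; ")" is not in the
# class, so the match stops before the closing parenthesis.
url_pattern = re.compile(r'(?:ht|f)tps?:/(?:/[A-Za-z0-9._~%-]+)+')

overshoot = original.findall(text)
urls = url_pattern.findall(text)

print(overshoot)  # ['http://www.purdue.edu/transcom/index.php).The']
print(urls)       # ['http://www.purdue.edu/transcom/index.php']
```

This also shows the likely reason the original pattern pulls too much: \S matches the closing parenthesis and the period, so the match only ends at whitespace.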
URL Similar Splitter
Matches URL-like strings and splits each one into its address and parameters
by deme72
([--:\w?#%&+~#=]*\.[a-z]{2,4}\/{0,2})((?:[?&](?:\w+)=(?:\w+))+|[--:\w?#%&+~#=]+)?
Source: regexr.com community