Python Regular Expressions for a question mark [duplicate] - python

This question already has answers here:
Python regex with question mark literal
(5 answers)
Closed 4 years ago.
I am working on a data set with a column named next review date. This column has missing fields represented by a question mark (?).
I want to capture this ? with a regular expression, then separate all rows with no review date from the rest of the data.
Question: What is the expression to match a literal question mark (?)?

A backslash before a question mark means "literally match a question mark":
\?
Also, putting a question mark into a character class means it is matched literally rather than having its usual "0 or 1 of the previous" meaning:
[?]
Thus:
bcd[?]
bcd\?
Will both match data that looks like:
abcd?efg
 ^^^^
If you want to match data that is just a question mark and nothing else, use the start ^ and end $ markers:
^\?$
Consider, though, that if that's literally all you're doing, it may be faster to skip regex and do a simple "string contains" check for a question mark, since you don't need complex pattern matching or value capture.
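As a rough sketch of how these options look in Python, the snippet below separates rows whose review date is a lone question mark from the rest. The rows list and the next_review_date key are made up for illustration; they are not from the question.

```python
import re

# Hypothetical sample rows; the key name is an assumption for illustration.
rows = [
    {"id": 1, "next_review_date": "2021-03-01"},
    {"id": 2, "next_review_date": "?"},
    {"id": 3, "next_review_date": "2020-11-15"},
    {"id": 4, "next_review_date": "?"},
]

# The whole value is a single literal question mark.
only_qmark = re.compile(r"^\?$")

missing = [r for r in rows if only_qmark.search(r["next_review_date"])]
present = [r for r in rows if not only_qmark.search(r["next_review_date"])]

# Both spellings match a literal "?" anywhere inside a longer string.
escaped_hit = re.search(r"bcd\?", "abcd?efg").group()
class_hit = re.search(r"bcd[?]", "abcd?efg").group()

# Plain string checks, no regex involved.
missing_plain = [r for r in rows if r["next_review_date"] == "?"]
has_qmark = "?" in "abcd?efg"
```

For a check this simple, the equality test in missing_plain gives the same split as the regex version.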

Related

python regular expression doesn't match all letters after "or" group [duplicate]

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 1 year ago.
The community reviewed whether to reopen this question 1 year ago and left it closed:
Original close reason(s) were not resolved
I'm trying to match FD or MD in a string by doing:
matches = re.findall(r"(F|M)D",myString)
Suppose myString = 'MD'. Then, matches becomes
matches = ['M']
Why does it ignore D?
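That's because (F|M) is a capturing group, and D is not part of it. When a pattern contains exactly one capturing group, re.findall returns the text of that group rather than the whole match.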
Use this instead:
matches = re.findall(r"((?:F|M)D)",myString)
For a visual representation of the differences between these two patterns, I really like to use Regexper.com:
(F|M)D
((?:F|M)D)
The Python documentation on regular expressions has a lot more information.
Note that ?: indicates that F|M is a "non-capturing" group. If the pattern were ((F|M)D) instead, then matches would be [('MD', 'M')] (which doesn't sound like what you want).
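The difference between these patterns is easy to check interactively. This is a small sketch; the sample string and the variable names are chosen only for illustration.

```python
import re

my_string = "MD and FD"

# One capturing group: findall returns only the group's text.
one_group = re.findall(r"(F|M)D", my_string)

# Non-capturing group, no other groups: findall returns the whole match.
non_capturing = re.findall(r"(?:F|M)D", my_string)

# A character class expresses the same single-letter alternative.
char_class = re.findall(r"[FM]D", my_string)

# Two capturing groups: findall returns one tuple per match.
nested = re.findall(r"((F|M)D)", my_string)

print(one_group)      # ['M', 'F']
print(non_capturing)  # ['MD', 'FD']
print(char_class)     # ['MD', 'FD']
print(nested)         # [('MD', 'M'), ('FD', 'F')]
```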

Numeric pattern search in regular expression using Python [duplicate]

This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
Closed 2 years ago.
I have the text below:
my_text = "My telephone number is 408-555-1234"
on which I am searching for the pattern
re.findall(r'\d{3}-\d{1,}',my_text)
My intention was to search for a three-digit number followed by - and then another run of one or more digits. Hence I was expecting the result to be ['408-555', '555-1234'].
However, the result I am getting is only ['408-555'].
Could anyone explain what is wrong in my understanding here, and suggest a pattern that would serve my purpose?
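re.findall returns non-overlapping matches, so once 408-555 has been consumed, the 555 is no longer available to start a second match. A lookahead consumes no characters, so you can wrap the pattern in a lookahead with a capturing group:
re.findall(r'(?=(\d{3}-\d+))', my_text)
Output: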
['408-555', '555-1234']
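A short side-by-side comparison of the two searches, using the text from the question (the variable names plain and overlapping are just for this sketch):

```python
import re

my_text = "My telephone number is 408-555-1234"

# Ordinary search: each match consumes its characters, so "555" cannot be reused.
plain = re.findall(r"\d{3}-\d+", my_text)

# Lookahead with a capturing group: the match itself is empty, so the scan
# advances one character at a time and the captured texts may overlap.
overlapping = re.findall(r"(?=(\d{3}-\d+))", my_text)

print(plain)        # ['408-555']
print(overlapping)  # ['408-555', '555-1234']
```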

Re.sub in python (remove last _) [duplicate]

This question already has answers here:
Remove Last instance of a character and rest of a string
(5 answers)
Closed 3 years ago.
I have a string such as:
string="lcl|NC_011588.1_cds_YP_002321424.1_1"
and I would like to keep only: "YP_002321424.1"
So I tried:
string=re.sub(".*_cds_","",string)
string=re.sub("_\d","",string)
But the first _ is removed too.
Does anyone have an idea?
Note: The numbers can change (they are not fixed).
"Ordinary" split, as proposed in the other answer, is not enough,
because you also want to strip the trailing _1, so the part to capture
should end after a dot and digit.
Try the following pattern:
(?<=_cds_)\w+\.\d
For a working example see https://regex101.com/r/U2QsFH/1
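Alternatively, skip regexes and use a simple split:
string.split('_cds_')[1]
Note that this returns "YP_002321424.1_1", so the trailing _1 still has to be stripped separately.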
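The approaches can be compared with a quick sketch. It assumes, as in the example string, that the wanted part ends in a dot followed by a single digit and that the unwanted suffix is an underscore plus digits at the very end. The variable names are illustrative only.

```python
import re

string = "lcl|NC_011588.1_cds_YP_002321424.1_1"

# Lookbehind pattern: start right after "_cds_", stop after ".<digit>".
match = re.search(r"(?<=_cds_)\w+\.\d", string)
via_regex = match.group() if match else None

# Split on the marker, then cut at the last underscore.
via_split = string.split("_cds_")[1].rsplit("_", 1)[0]

# Two substitutions, with the second anchored to the end of the string
# so that only the trailing "_1" is removed.
via_sub = re.sub(r"_\d+$", "", re.sub(r".*_cds_", "", string))

print(via_regex)  # YP_002321424.1
print(via_split)  # YP_002321424.1
print(via_sub)    # YP_002321424.1
```

The unanchored "_\d" from the question also matches the "_0" inside YP_002321424, which is why the first underscore disappeared.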

Python regular expression to capture last word with missing line feed [duplicate]

This question already has answers here:
Python csv string to array
(10 answers)
In regex, match either the end of the string or a specific character
(2 answers)
Closed 3 years ago.
I need to capture words separated by tabs.
The expression (.*?)[\t|\n] works well, except for the last line, where a line feed is missing. Can anyone suggest a modification of the regular expression to also match the last word, i.e. Cheyenne?
Replace [\t|\n] with (\t|$).
BTW, [\t|\n] is a character class, so the pipe | is literal here. You probably meant [\t\n].
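A minimal sketch of the idea, on made-up sample text (only Cheyenne comes from the question). It uses a non-capturing group and lists \n explicitly, so it does not depend on the re.MULTILINE flag, and it uses .+? so that empty matches are skipped. Treat it as one possible variant rather than the exact pattern from the answer.

```python
import re

# Hypothetical tab-separated text; the last line has no trailing line feed.
text = "Denver\tBoise\nAustin\tCheyenne"

# A word ends at a tab, a line feed, or the end of the string.
words = re.findall(r"(.+?)(?:\t|\n|$)", text)

# Simpler alternative: runs of characters that are neither tab nor line feed.
words_simple = re.findall(r"[^\t\n]+", text)

print(words)         # ['Denver', 'Boise', 'Austin', 'Cheyenne']
print(words_simple)  # ['Denver', 'Boise', 'Austin', 'Cheyenne']
```

Both variants drop empty fields; if empty cells matter, splitting each line on "\t" keeps them.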

Regex for matching email addresses [duplicate]

This question already has an answer here:
Restricting character length in a regular expression
(1 answer)
Closed 4 years ago.
So I have a regex that goes like:
regex1 = re.compile(r'\S+@\S+')
This works perfectly, but I am trying to add a character limit so that the total number of characters has to be less than 20.
I tried re.compile(r'\S+@\S+{5,20}') but it keeps giving me an error. It seems like a simple fix, but I can't see what I am doing wrong.
You can't stack a second quantifier ({5,20}) directly on top of + (i.e., \S+{5,20} is not a valid pattern), and a quantifier only counts repetitions of the token before it, not the length of the whole match. If you're doing this in Python, I'd suggest just using the len(...) function on the string in addition to the regex to verify. For example:
if regex1.match(email) and (len(email) < 20):
    ...
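Here is a hedged sketch of both ways to enforce the limit, assuming the pattern is \S+@\S+ and that "less than 20" means at most 19 characters. The helper names and sample addresses are invented for illustration. It uses fullmatch rather than match so that the whole string has to fit the pattern.

```python
import re

EMAIL_RE = re.compile(r"\S+@\S+")

def is_short_email(email, max_len=20):
    """Regex check plus a separate length check, as suggested in the answer."""
    return EMAIL_RE.fullmatch(email) is not None and len(email) < max_len

# Regex-only variant: the lookahead checks the total length (1 to 19 characters)
# before the rest of the pattern is tried.
SHORT_EMAIL_RE = re.compile(r"(?=.{1,19}$)\S+@\S+")

def is_short_email_regex(email):
    return SHORT_EMAIL_RE.fullmatch(email) is not None

print(is_short_email("bob@example.com"))                      # True
print(is_short_email("a.very.long.name@example.com"))         # False
print(is_short_email_regex("bob@example.com"))                # True
print(is_short_email_regex("a.very.long.name@example.com"))   # False
```

The len(...) version is easier to read and to adjust; the lookahead version is mainly useful when only a single pattern string can be supplied.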
