Python regular expression for string with the following format

I'm having trouble making a Regex to find a string matching the format of:
'One or more numeric digits///Any combination of alphanumerics and non-alphanumerics///Any combination of alphanumerics and non-alphanumerics///Any combination of alphanumerics and non-alphanumerics'
A more specific example would be:
'Number of transactions///Total Revenue///Product name///Cost of Supplies'
Which would look something like:
'1002///1502.34///Coca-Cola-12.Oz///902.23'
The strings contain no whitespace.
I've tried the Regex: r'\d+///\d+.\d+///\w+///\d+.\d+'
The problem is the \w+ section, because the product name can sometimes contain non-alphanumeric characters.

You can use a character class between the runs of word characters to specify which separator characters are allowed. In this case you could add a . and a -
Note that the dot between the digits must be escaped to match it literally.
\d+///\d+\.\d+///\w+(?:[.-]\w+)*///\d+\.\d+
Other options are to use only the character class or, if you just want to disallow / in that field, a negated character class:
\d+///\d+\.\d+///[\w.-]+///\d+\.\d+
\d+///\d+\.\d+///[^/\n]+///\d+\.\d+
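One way to compare these patterns is to run them against the sample line from the question. The following is a minimal sketch using Python's re module; the dictionary keys and variable names are illustrative only, re.fullmatch is used so that the whole string must conform, and the negated class is given a + quantifier so that it can cover more than one character.

```python
import re

sample = "1002///1502.34///Coca-Cola-12.Oz///902.23"

patterns = {
    "word groups": r"\d+///\d+\.\d+///\w+(?:[.-]\w+)*///\d+\.\d+",
    "character class": r"\d+///\d+\.\d+///[\w.-]+///\d+\.\d+",
    "negated class": r"\d+///\d+\.\d+///[^/\n]+///\d+\.\d+",
}

# fullmatch() only succeeds when the pattern covers the entire string.
results = {name: bool(re.fullmatch(p, sample)) for name, p in patterns.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")

# The pattern from the question fails: \w+ cannot cross '-' or '.'.
original = r"\d+///\d+.\d+///\w+///\d+.\d+"
original_matches = bool(re.fullmatch(original, sample))
print("original:", original_matches)
```

The three variants differ in strictness: the first requires the product name to start and end with a word character, the second allows . and - anywhere, and the third accepts anything except a slash or a newline.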

Related

Detect strings containing only digits, letters and one or more question marks

I am writing a Python regex that matches only strings consisting of letters, digits, and one or more question marks.
For example, regex1: ^[A-Za-z0-9?]+$ returns strings with or without ?.
I want a regex2 that matches expressions such as ABC123?A, 1AB?CA?, ?2ABCD, ???, 123? but not ABC123, ABC.?1D1, ABC(a)?1d.
In MySQL, I did the following and it works:
select *
from (
select * from norm_prod.skill_patterns
where pattern REGEXP '^[A-Za-z0-9?]+$') AS XXX
where XXX.pattern not REGEXP '^[A-Za-z0-9]+$'

How about something like this:
^(?=.*\?)[a-zA-Z0-9\?]+$
As you can see here at regex101.com
Explanation
The (?=.*\?) is a positive lookahead that tells the regex that the start of the match should be followed by 0 or more characters and then a ? - i.e., there should be a ? somewhere in the match.
The [a-zA-Z0-9\?]+ matches one-or-more occurrences of the characters given in the character class i.e. a-z, A-Z and digits from 0-9, and the question mark ?.
Altogether, the regex first checks if there is a question mark somewhere in the string to be matched. If yes, then it matches the characters mentioned above. If either the ? is not present, or there is some foreign character, then the string is not matched.
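The lookahead approach can be checked against the examples from the question. This is a small sketch, assuming Python's re module; the list names are made up for the illustration. Inside a character class the ? needs no backslash, so it is written unescaped here.

```python
import re

# The lookahead demands a '?' somewhere; the class restricts every character.
pattern = re.compile(r"^(?=.*\?)[a-zA-Z0-9?]+$")

should_match = ["ABC123?A", "1AB?CA?", "?2ABCD", "???", "123?"]
should_fail = ["ABC123", "ABC.?1D1", "ABC(a)?1d"]

matched = [s for s in should_match + should_fail if pattern.search(s)]
print(matched)
```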

You can validate an alphanumeric string with one or more question marks using
where pattern REGEXP '^[A-Za-z0-9]*([?][A-Za-z0-9]*)+$'
In Python:
re.search(r'^[A-Za-z0-9]*(?:\?[A-Za-z0-9]*)+$', text)
Details:
^ - start of string
[A-Za-z0-9]* - zero or more letters or digits
([?][A-Za-z0-9]*)+ - one or more repetitions of a ? char and then zero or more letters or digits
$ - end of string.
If you plan to apply this to any Unicode string, consider using POSIX character classes:
where pattern REGEXP '^[[:alnum:]]*([?][[:alnum:]]*)+$'
where [[:alnum:]] matches any letters and digits. In Python:
re.search(r'^[^\W_]*(?:\?[^\W_]*)+$', text)
In Python, all shorthand character classes are Unicode aware by default, and the [^\W_] pattern is a \w (that matches letters, digits, connector punctuation) with _ subtracted from it.
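The difference between the ASCII-only and the Unicode-aware variants shows up as soon as a non-ASCII letter appears. A minimal sketch, assuming Python 3 str patterns (where \w is Unicode aware by default); the sample strings are invented for the illustration.

```python
import re

ascii_only = re.compile(r"^[A-Za-z0-9]*(?:\?[A-Za-z0-9]*)+$")
unicode_aware = re.compile(r"^[^\W_]*(?:\?[^\W_]*)+$")

samples = ["ABC123?A", "héllo?", "a_b?", "ABC123"]

ascii_verdicts = [bool(ascii_only.search(s)) for s in samples]
unicode_verdicts = [bool(unicode_aware.search(s)) for s in samples]

for s, a, u in zip(samples, ascii_verdicts, unicode_verdicts):
    print(f"{s!r}: ascii={a} unicode={u}")
```

Only the second sample is treated differently: é is a letter for [^\W_] but falls outside A-Za-z0-9. The underscore is rejected by both patterns, and a string without any ? never matches.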

If there should be at least a single question mark present using MySQL or Python:
^[A-Za-z0-9]*\?[A-Za-z0-9?]*$
Explanation
^ Start of string
[A-Za-z0-9]* Match optional chars A-Z a-z 0-9
\? Match a question mark
[A-Za-z0-9?]* Match optional chars A-Z a-z 0-9 or ?
$ End of string
In MySQL, double the backslash:
REGEXP '^[A-Za-z0-9]*\\?[A-Za-z0-9?]*$'
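The three Python-compatible patterns suggested in this thread should accept exactly the same strings. One hedged way to gain confidence in that is a brute-force comparison over every short string built from a tiny alphabet. The alphabet, the length limit, and the variable names below are arbitrary choices for this sketch, not part of the answers.

```python
import re
from itertools import product

lookahead = re.compile(r"^(?=.*\?)[A-Za-z0-9?]+$")
repeated = re.compile(r"^[A-Za-z0-9]*(?:\?[A-Za-z0-9]*)+$")
first_mark = re.compile(r"^[A-Za-z0-9]*\?[A-Za-z0-9?]*$")

alphabet = "a1?."  # a letter, a digit, the question mark, one foreign char
disagreements = []
for length in range(0, 5):
    for chars in product(alphabet, repeat=length):
        s = "".join(chars)
        verdicts = {bool(p.search(s)) for p in (lookahead, repeated, first_mark)}
        if len(verdicts) != 1:  # the patterns gave different answers
            disagreements.append(s)

print(disagreements)
```

An empty list only shows agreement on the strings that were tried; it is a sanity check, not a proof.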

Regex that match any string except specific string [duplicate]

I need a regular expression able to match everything but a string starting with a specific pattern (specifically index.php and what follows, like index.php?id=2342343).

Regex: match everything but:
a string starting with a specific pattern (e.g. any - empty, too - string not starting with foo):
Lookahead-based solution for NFAs:
^(?!foo).*$
^(?!foo)
Negated character class based solution for regex engines not supporting lookarounds:
^(([^f].{2}|.[^o].|.{2}[^o]).*|.{0,2})$
^([^f].{2}|.[^o].|.{2}[^o])|^.{0,2}$
a string ending with a specific pattern (say, no world. at the end):
Lookbehind-based solution:
(?<!world\.)$
^.*(?<!world\.)$
Lookahead solution:
^(?!.*world\.$).*
^(?!.*world\.$)
POSIX workaround:
^(.*([^w].{5}|.[^o].{4}|.{2}[^r].{3}|.{3}[^l].{2}|.{4}[^d].|.{5}[^.])|.{0,5})$
([^w].{5}|.[^o].{4}|.{2}[^r].{3}|.{3}[^l].{2}|.{4}[^d].|.{5}[^.]$|^.{0,5})$
a string containing specific text (say, not match a string having foo):
Lookaround-based solution:
^(?!.*foo)
^(?!.*foo).*$
POSIX workaround:
Use the online regex generator at www.formauri.es/personal/pgimeno/misc/non-match-regex
a string containing specific character (say, avoid matching a string having a | symbol):
^[^|]*$
a string equal to some string (say, not equal to foo):
Lookaround-based:
^(?!foo$)
^(?!foo$).*$
POSIX:
^(.{0,2}|.{4,}|[^f]..|.[^o].|..[^o])$
a sequence of characters:
PCRE (match any text but cat): /cat(*SKIP)(*FAIL)|[^c]*(?:c(?!at)[^c]*)*/i or /cat(*SKIP)(*FAIL)|(?:(?!cat).)+/is
Other engines allowing lookarounds: (cat)|[^c]*(?:c(?!at)[^c]*)* (or (?s)(cat)|(?:(?!cat).)*, or (cat)|[^c]+(?:c(?!at)[^c]*)*|(?:c(?!at)[^c]*)+[^c]*) and then check with language means: if Group 1 matched, it is not what we need, else, grab the match value if not empty
a certain single character or a set of characters:
Use a negated character class: [^a-z]+ (any char other than a lowercase ASCII letter)
Matching any char(s) but |: [^|]+
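Several of the lookaround-based patterns from the list can be tried directly in Python. The sketch below assumes the re module; the helper hits() and the sample strings are hypothetical names introduced for the illustration. The last part shows the "check Group 1" technique for keeping any text but cat.

```python
import re

def hits(pattern, text):
    """True when the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

not_starting_foo = r"^(?!foo).*$"
not_ending_world = r"^.*(?<!world\.)$"
not_containing_foo = r"^(?!.*foo).*$"
not_equal_foo = r"^(?!foo$).*$"

starts = (hits(not_starting_foo, "foobar"), hits(not_starting_foo, "barfoo"))
ends = (hits(not_ending_world, "hello world."), hits(not_ending_world, "hello world"))
contains = (hits(not_containing_foo, "a foo b"), hits(not_containing_foo, "a bar b"))
equals = (hits(not_equal_foo, "foo"), hits(not_equal_foo, "foobar"))
print(starts, ends, contains, equals)

# "Any text but cat": the unwanted word is captured in group 1, and only
# non-empty matches where group 1 did not participate are kept.
splitter = re.compile(r"(cat)|[^c]*(?:c(?!at)[^c]*)*")
kept = [m.group(0) for m in splitter.finditer("the cat sat on a couch")
        if m.group(1) is None and m.group(0)]
print(kept)
```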
Demo note: the newline \n is used inside negated character classes in demos to avoid match overflow to the neighboring line(s). They are not necessary when testing individual strings.
Anchor note: In many languages, use \A to define the unambiguous start of string, and \z (in Python, it is \Z, in JavaScript, $ is OK) to define the very end of the string.
Dot note: In many flavors (but not POSIX, TRE, TCL), . matches any char but a newline char. Make sure you use a corresponding DOTALL modifier (/s in PCRE/Boost/.NET/Python/Java and /m in Ruby) for the . to match any char including a newline.
Backslash note: In languages where you have to declare patterns with C strings allowing escape sequences (like \n for a newline), you need to double the backslashes escaping special characters so that the engine could treat them as literal characters (e.g. in Java, world\. will be declared as "world\\.", or use a character class: "world[.]"). Use raw string literals (Python r'\bworld\b'), C# verbatim string literals @"world\.", or slashy strings/regex literal notations like /world\./.
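The anchor, dot, and backslash notes can be observed in Python with a few one-liners. This is a sketch assuming the standard re module; the variable names are illustrative.

```python
import re

# Anchor note: in Python, $ also matches just before a trailing newline,
# while \Z matches only at the very end of the string.
dollar_hit = re.search(r"foo$", "foo\n") is not None
z_hit = re.search(r"foo\Z", "foo\n") is not None

# Dot note: . does not match a newline unless re.DOTALL (inline: (?s)) is set.
plain_dot = re.match(r"a.b", "a\nb") is not None
dotall_dot = re.match(r"a.b", "a\nb", flags=re.DOTALL) is not None

# Backslash note: a raw string and a doubled backslash spell the same pattern.
same_pattern = r"world\." == "world\\."

print(dollar_hit, z_hit, plain_dot, dotall_dot, same_pattern)
```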

You could use a negative lookahead from the start, e.g., ^(?!foo).*$ shouldn't match anything starting with foo.

You can put a ^ in the beginning of a character set to match anything but those characters.
[^=]*
will match everything but =
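As a small illustration in Python (the sample string is made up), [^=]* anchored at the start grabs everything up to the first =, while [^=]+ with findall returns each run between the = signs. Note that the * form can also match an empty string.

```python
import re

line = "key=value=more"

# re.match anchors at the start, so this stops right before the first '='.
before_equals = re.match(r"[^=]*", line).group()

# With +, every non-empty run of characters other than '=' is returned.
pieces = re.findall(r"[^=]+", line)

print(before_equals, pieces)
```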

Just match /^index\.php/, and then reject whatever matches it.
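In Python, that "match and then reject" idea could look like the sketch below. The list of paths is invented for the example; only the index.php prefix comes from the question.

```python
import re

index_php = re.compile(r"^index\.php")

urls = ["index.php?id=2342343", "index.php", "about.php", "blog/index.php"]

# Keep everything the pattern does NOT match at the start of the string.
kept = [u for u in urls if not index_php.match(u)]
print(kept)
```

Inverting the result in code avoids lookarounds entirely, which helps when the regex engine does not support them.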

In Python:
>>> import re
>>> p=r'^(?!index\.php\?[0-9]+).*$'
>>> s1='index.php?12345'
>>> re.match(p,s1)
>>> s2='index.html?12345'
>>> re.match(p,s2)
<_sre.SRE_Match object at 0xb7d65fa8>

I came across this thread after a long search. I had this problem when searching for and replacing several occurrences in one string: the pattern I used kept matching through to the end. Example below:
import re
text = "start![image]xxx(xx.png) yyy xx![image]xxx(xxx.png) end"
replaced_text = re.sub(r'!\[image\](.*)\(.*\.png\)', '*', text)
print(replaced_text)
gave
start* end
Basically, the regex was matching from the first ![image] to the last .png, swallowing the yyy in the middle.
I used the method posted above (https://stackoverflow.com/a/17761124/429476, by Firish) to break the match between the occurrences. Here the space is excluded from the match, since the words are separated by spaces.
replaced_text = re.sub(r'!\[image\]([^ ]*)\([^ ]*\.png\)', '*', text)
and got what I wanted
start* yyy xx* end
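A different technique that gives the same result on this input is a lazy (non-greedy) quantifier, .*?, which stops at the first possible closing part instead of the last. This is an alternative sketch, not the method used in the post above, and it can still over-match on inputs where an ![image] tag is not followed by its own (...png) part.

```python
import re

text = "start![image]xxx(xx.png) yyy xx![image]xxx(xxx.png) end"

# Greedy: .* runs to the last ".png)" in the string.
greedy = re.sub(r"!\[image\](.*)\(.*\.png\)", "*", text)

# Lazy: .*? expands only as far as needed to reach the next ".png)".
lazy = re.sub(r"!\[image\].*?\(.*?\.png\)", "*", text)

print(greedy)
print(lazy)
```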

Replacing string if it contains specified pattern

I need to replace ravi.jhon#piramal.com| or sam.jennifer#piramal.com| with '' (an empty string). I have written the following regex, but it is unable to deal with ., - or spaces in the strings.
My regex is \w+#ongoose.com["|"]
The question is how to include ., space and - along with alphanumeric characters.
My final output should be: ravi.jhon#piramal.com| replaced with ''

Add the characters you want to match to a character class: [\w.-].
In your example you want to match piramal, but your regex matches ongoose. To match both of them you might use an alternation (?:ongoose|piramal), or match any non-whitespace characters using \S+, and replace with an empty string.
To match a dot literally, you have to escape it: \.
[\w.-]+#\S+\.com\|
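Applied with re.sub in Python, the replacement might look like the sketch below. The third sample string and the stricter alternation variant are assumptions added for the illustration; the patterns themselves are the ones discussed above.

```python
import re

loose = re.compile(r"[\w.-]+#\S+\.com\|")
strict = re.compile(r"[\w.-]+#(?:ongoose|piramal)\.com\|")

samples = [
    "ravi.jhon#piramal.com|",
    "sam.jennifer#piramal.com|",
    "id=7;ravi.jhon#piramal.com|rest",
]

# Replace every match with an empty string.
cleaned = [loose.sub("", s) for s in samples]
print(cleaned)

# The strict variant leaves other domains alone.
untouched = strict.sub("", "someone#example.com|")
print(untouched)
```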

python regex match a group or not match it

I want to match the string:
from string as string
It may or may not contain as.
The current code I have is
r'(?ix) from [a-z0-9_]+ [as ]* [a-z0-9_]+'
But this code matches a single a or s. So something like from string a little will also be in the result.
I wonder what is the correct way of doing this.

You may use
(?i)from\s+[a-z0-9_]+\s+(?:as\s+)?[a-z0-9_]+
Note that you use x "verbose" (free spacing) modifier, and all spaces in your pattern became formatting whitespaces that the re engine omits when parsing the pattern. Thus, I suggest using \s+ to match 1 or more whitespaces. If you really want to use single regular spaces, just omit the x modifier and use the regular space. If you need the x modifier to insert comments, escape the regular spaces:
r'(?ix) from\ [a-z0-9_]+\ (?:as\ )?[a-z0-9_]+'
Also, to match a sequence of chars, you need to use a grouping construct rather than a character class. Here, (?:as\s+)? defines an optional non-capturing group that matches 1 or 0 occurrences of as + space substring.
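Both spellings can be compared in Python. This is a minimal sketch; the variable names and test strings are illustrative. The last check shows what the x modifier does to unescaped spaces: they vanish from the pattern, so it no longer matches text that contains a real space.

```python
import re

pattern = re.compile(r"(?i)from\s+[a-z0-9_]+\s+(?:as\s+)?[a-z0-9_]+")
verbose = re.compile(r"(?ix) from\ [a-z0-9_]+\ (?:as\ )?[a-z0-9_]+")

with_as = pattern.fullmatch("from string as string")
without_as = pattern.fullmatch("FROM string string")
verbose_hit = verbose.fullmatch("from string as string")

# Under (?x) the unescaped spaces are dropped, leaving from[a-z0-9_]+
dropped = re.search(r"(?ix) from [a-z0-9_]+", "from string")
glued = re.search(r"(?ix) from [a-z0-9_]+", "fromstring")

print(bool(with_as), bool(without_as), bool(verbose_hit), dropped, bool(glued))
```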
