Python Regex stop at '|' character [duplicate]


This question already has answers here:
Python regular expression again - match URL
(7 answers)
Closed 8 years ago.
I am trying to find URLs in a DokuWiki page using a Python regex. DokuWiki formats external links like this:
[['insert URL'|Name of External Link]]
I need to design a Python regex that captures the URL but stops at '|'.
I could try to type out every non-alphanumeric character except '|'
(something like this: (https?://[\w|\.|\-|\?|\/|\=|\+|\!|\#|\#|\$|\%|^|&]*) )
However, that sounds really tedious, and I might miss one.
Thoughts?

You can use negated character sets, written [^things not to match].
In this case, you don't want to match |, so you would use [^|].
import re
bool(re.match("[^|]", "a"))
#>>> True
bool(re.match("[^|]", "|"))
#>>> False
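Applied to a DokuWiki link like the one in the question, [^|]+ collects everything up to the first |, without listing any URL characters. A minimal sketch; the example URL is made up for illustration:

```python
import re

link = "[[http://example.com/page?x=1&y=2|Name of External Link]]"

# \[\[ matches the opening brackets; ([^|]+) captures every character up to
# (but not including) the first '|', whatever characters the URL contains.
m = re.match(r"\[\[([^|]+)\|", link)
url = m.group(1) if m else None
print(url)  # http://example.com/page?x=1&y=2
```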

You expect one or more characters that are not |, followed by a | and one or more characters that are not ], all enclosed in double square brackets. This translates to:
import re
pattern = re.compile(r'\[\[([^|]+)\|([^\]]+)\]\]')
print(pattern.match("[[http://bla.org/path/to/page|Name of External Link]]").groups())
This would print:
('http://bla.org/path/to/page', 'Name of External Link')
If you don't need the name of the link, you can just remove the parentheses around the second group. See the Python re module documentation for more on regular expressions.
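To pull every external-link URL out of a whole page, you can keep only the first group and use re.findall, which returns the captured group for each match. This is a sketch under an assumption of mine: the URL itself never contains | or ]. The URL class is therefore written [^|\]]+ (a small change from the answer's [^|]+) so that a pipe-less internal link such as [[wiki:start]] cannot run on to a later |. The sample page text is invented.

```python
import re

# Only the URL is captured; the link name is matched but left ungrouped.
URL_ONLY = re.compile(r"\[\[([^|\]]+)\|[^\]]+\]\]")

page = """See [[http://a.example/one|First]] and [[wiki:start]],
then [[https://b.example/two?q=1|Second link]] for details."""

urls = URL_ONLY.findall(page)
print(urls)  # ['http://a.example/one', 'https://b.example/two?q=1']
```

With the plain [^|]+ class, the [[wiki:start]] link would be swallowed together with the following URL into a single bogus match.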

Related

What is the regex expression to find everything between --[[ and ]]? [duplicate]

This question already has answers here:
RegEx Get string between two strings that has line breaks
(2 answers)
Closed 4 months ago.
I'm trying to clean up some Lua code files by removing comments, using a Python script and regex. I'm using the following regular expression to find multiline comments: "--\[\[[^]\]]+"
For example:
--[[ This is a comment
on multiple lines
that needs to be removed ]]
The expression picks these up without a problem. However, there are also comments like this:
--[[
if thing == "whatever" or thing == "whateeeever" then
self:print( ">" .. thing.. params[2] .. " something " )
-- printing the thing
]]
On a comment like this, the regex only captures up to the first ] at the end of params[2] instead of all the way to ]].
Can anyone provide me a working regex that captures everything, including square brackets?

You might try a pattern like
^[^\S\n]*--\[\[.*(?:\n(?!.*\]\]).*)*\n.*\]\]
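Used from Python, the pattern needs re.MULTILINE so that ^ anchors at the start of every line rather than only at the start of the file; re.sub then deletes each matched comment. A minimal sketch on a shortened version of the Lua snippet from the question (the surrounding x/y lines are made up):

```python
import re

# Pattern from the answer; ^ only matches at each line start with re.MULTILINE.
BLOCK_COMMENT = re.compile(
    r"^[^\S\n]*--\[\[.*(?:\n(?!.*\]\]).*)*\n.*\]\]", re.MULTILINE
)

lua = '''x = 1
--[[
if thing == "whatever" then
self:print( ">" .. thing.. params[2] .. " something " )
-- printing the thing
]]
y = 2
'''

# The lookahead (?!.*\]\]) lets lines with a single ']' (params[2]) through;
# only a line containing ']]' ends the comment.
cleaned = BLOCK_COMMENT.sub("", lua)
print(cleaned)
```

Note that the final \n.*\]\] requires at least one line break, so a one-line --[[ ... ]] comment is not matched by this pattern, and any line inside the comment that contains ]] ends the match there.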

Make part of a regex match in python optional [duplicate]

This question already has answers here:
How to use regex with optional characters in python?
(5 answers)
Closed 5 years ago.
I'm trying to match a URL using re but am having trouble in regards to making part of the match optional.
import re
x = input('Link: ')
reg = r'(http|https)://(iski|www\.iskis|iskis)\.(in|com)/[A-Za-z0-9?&=/?_]+'
if re.match(reg, x):
    print('True')
Currently, the above code would match something like:
https://iskis.com/?loc=shop_view_item&item=220503032
I would like to alter the regular expression to make the trailing part, [A-Za-z0-9?&=/?_]+, optional, so that nothing after the domain is required and the following also matches:
https://iskis.com
I'm sure there is a simple solution but I don't know how to go about solving this.

reg = r'(http|https)://(iski|www\.iskis|iskis)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$'
Should do it. Surround the slash and the character class with () so they form a group, put a ? after it so the group matches zero or one time, and put a $ at the end so the match must extend to the end of the string.
EDIT:
Come to think of it, you could use optional matches elsewhere in your regex too:
reg = r'(https?)://(www\.)?(iskis?)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$'
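A quick check of the second regex against both URLs from the question plus two made-up ones. The last URL shows the $ anchor rejecting a trailing character (#) that is not in the character class:

```python
import re

# s? and (www\.)? make those parts optional; the whole "/path" group is
# optional thanks to (...)?, and $ forces the match to reach the end.
reg = r'(https?)://(www\.)?(iskis?)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$'

tests = [
    "https://iskis.com/?loc=shop_view_item&item=220503032",
    "https://iskis.com",
    "http://www.iskis.in",
    "https://iskis.com/page#frag",  # '#' is not in the character class
]
results = {url: bool(re.match(reg, url)) for url in tests}
for url, ok in results.items():
    print(ok, url)
```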

re.sub() doesn't replace middle of string [duplicate]

This question already has answers here:
How to replace only the contents within brackets using regular expressions?
(2 answers)
Closed 6 years ago.
I am trying to remove the brackets and their contents from a string. The code I am using right now is like this:
import re
tstString = "OUTPUT:TRACK[:STATE]?"
modString = re.sub(r"[\[\]]", "", tstString)
When I print the results, I get:
OUTPUT:TRACK:STATE?
But I want the result to be:
OUTPUT:TRACK?
How can I do this?

I guess this one will work fine. The regex now matches any string inside []. Note the ? after *: it makes * non-greedy, so each match stops at the first ].
import re
tstString = "OUTPUT:TRACK[:STATE]?"
modString = re.sub(r"\[.*?\]", "", tstString)
print(modString)
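The non-greedy ? only makes a visible difference when a string has more than one bracketed part. A small sketch on a hypothetical string with two bracket groups, also showing a negated class [^\]]* as an equivalent way to stop at the first ]:

```python
import re

s = "OUTPUT:TRACK[:STATE]?[:MODE]"  # hypothetical string with two bracket groups

greedy = re.sub(r"\[.*\]", "", s)       # .* runs to the LAST ']'
lazy = re.sub(r"\[.*?\]", "", s)        # .*? stops at the FIRST ']'
negated = re.sub(r"\[[^\]]*\]", "", s)  # [^\]]* cannot cross a ']' at all

print(greedy)   # OUTPUT:TRACK
print(lazy)     # OUTPUT:TRACK?
print(negated)  # OUTPUT:TRACK?
```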

Your regular expression "[\[\]]" says 'any of these characters: "[", "]"'.
But you want to delete what's between the square brackets too, so you should use something like r"\[:\w+\]". It says '[, then :, then one or more word characters (letters, digits, or underscore), then ]'.
And please, always use raw strings (r in front of the quotes) when working with regular expressions, to avoid surprises from Python's own backslash-escape processing.
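A short sketch of both points: the suggested pattern applied to the question's string, and a case where a missing r prefix silently changes the pattern. The word-boundary example is my own illustration, not from the answer:

```python
import re

tstString = "OUTPUT:TRACK[:STATE]?"
# '[', ':', one or more word characters (letters, digits, underscore), ']'
removed = re.sub(r"\[:\w+\]", "", tstString)
print(removed)  # OUTPUT:TRACK?

# Without the r prefix, Python turns "\b" into a backspace character
# before re ever sees it, so the word-boundary pattern no longer works.
plain = re.search("\bcat\b", "a cat here")
raw = re.search(r"\bcat\b", "a cat here")
print(plain)        # None
print(raw.group())  # cat
```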

understanding this python regular expression re.compile(r'[ :]') [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
Hi, I am trying to understand some Python code that uses the regular expression re.compile(r'[ :]'). I tried quite a few strings and couldn't find one that matches. Can someone please give an example of a text that matches this pattern?

The expression simply matches a single space or a single : (or rather, a string containing either). That’s it. […] is a character class.

The [] matches any of the characters in the brackets. So [ :] will match one character that is either a space or a colon.
So these strings would have a match:
"Hello World"
"Field 1:"
etc...
These would not:
"This_string_has_no_spaces_or_colons"
"100100101"
Edit:
For more info on regular expressions: https://docs.python.org/2/library/re.html
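A quick check of the strings listed above. One possible reason for not finding a match (a guess, since the question does not show the test code) is using re.match, which only tries position 0 and therefore fails on "Hello World" even though search succeeds. The split call is an illustration I added of a typical use of such a pattern:

```python
import re

pat = re.compile(r'[ :]')

samples = ["Hello World", "Field 1:", "This_string_has_no_spaces_or_colons", "100100101"]
found = {s: bool(pat.search(s)) for s in samples}
print(found)

# match() is anchored at the start of the string: 'H' is neither ' ' nor ':'.
anchored = bool(pat.match("Hello World"))
print(anchored)  # False

# Splitting on either separator; the adjacent ': ' yields an empty piece.
parts = pat.split("key: value pair")
print(parts)  # ['key', '', 'value', 'pair']
```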

matching parentheses in python regular expression [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 1 year ago.
I have a line that ends with something like store(s), for example "1 store(s)".
I want to match it using a Python regular expression.
I tried something like re.match('store\(s\)$', text), but it's not working.
This is the code I tried:
import re
s = '1 store(s)'
if re.match(r'store\(s\)$', s):
    print('match')

In reply to your comment, try this:
import re
s = '1 store(s)'
if re.search(r'store\(s\)$', s):
    print('match')
The solution is to use re.search instead of re.match: re.match only tries to match at the beginning of the string, while re.search looks for a substring anywhere in the string that matches the expression.

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).
Straight from the docs, but it does come up a lot.

Have you considered re.match(r'(.*)store\(s\)$', text)?
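The options from the answers above, side by side on the question's string. re.fullmatch (Python 3.4+) is added here as a further option not mentioned in the answers; the \d+ prefix in that pattern is my own assumption about what precedes store(s):

```python
import re

s = "1 store(s)"
pattern = r"store\(s\)$"

m_match = re.match(pattern, s)               # None: match() only tries position 0
m_search = re.search(pattern, s)             # finds 'store(s)' at the end
m_prefix = re.match(r"(.*)store\(s\)$", s)   # (.*) absorbs the leading '1 '
m_full = re.fullmatch(r"\d+ store\(s\)", s)  # the whole string must match

print(m_match)                   # None
print(m_search.group())          # store(s)
print(repr(m_prefix.group(1)))   # '1 '
print(bool(m_full))              # True
```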
