This question already has answers here:
RegEx Get string between two strings that has line breaks
(2 answers)
Closed 4 months ago.
I'm trying to clean up some Lua code files by removing comments with a Python script and regex. I'm using the following regular expression to find multiline comments: "--\[\[[^]\]]+"
For example:
--[[ This is a comment
on multiple lines
that needs to be removed ]]
The expression picks these up without a problem. However, there are also comments like this:
--[[
if thing == "whatever" or thing == "whateeeever" then
self:print( ">" .. thing.. params[2] .. " something " )
-- printing the thing
]]
On a comment like this, the regex only captures up to the first ] (the one in params[2]) instead of continuing all the way to ]].
Can anyone provide me a working regex that captures everything, including square brackets?
You might try a pattern like this (with the multiline flag, so that ^ matches at the start of every line):
^[^\S\n]*--\[\[.*(?:\n(?!.*\]\]).*)*\n.*\]\]
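A minimal sketch of how that pattern could be applied from Python, assuming the Lua file has already been read into a string. It also shows the question's original pattern stopping inside params[2], and a lazy re.S alternative that is a substitution, not part of the answer. The names BLOCK_COMMENT, ORIGINAL, LAZY_BLOCK_COMMENT and lua_source are illustrative.

```python
import re

# Pattern from the answer; re.M makes ^ match at the start of every line.
BLOCK_COMMENT = re.compile(r"^[^\S\n]*--\[\[.*(?:\n(?!.*\]\]).*)*\n.*\]\]", re.M)

# The question's pattern, written as [^\]] (same meaning as [^]\]]): stops at the first ].
ORIGINAL = re.compile(r"--\[\[[^\]]+")

# Alternative (not from the answer): a lazy match that may cross newlines via re.S.
LAZY_BLOCK_COMMENT = re.compile(r"--\[\[.*?\]\]", re.S)

lua_source = """local x = 1
--[[
if thing == "whatever" or thing == "whateeeever" then
self:print( ">" .. thing.. params[2] .. " something " )
-- printing the thing
]]
local y = 2
"""

print(ORIGINAL.search(lua_source).group(0)[-8:])   # 'params[2'
print(repr(BLOCK_COMMENT.sub("", lua_source)))      # 'local x = 1\n\nlocal y = 2\n'
print(repr(LAZY_BLOCK_COMMENT.sub("", lua_source)))
```

Note that the answer's pattern expects the closing ]] on a later line than --[[, so a one-line --[[ ... ]] comment is not handled correctly by it, while the lazy version covers both cases. Neither pattern understands Lua's leveled long brackets such as --[==[ ... ]==], or a --[[ that appears inside a string literal.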
This question already has answers here:
Using regex to remove all text after the last number in a string
(2 answers)
Closed 4 years ago.
I was searching for a way to remove all characters after a certain pattern match. I know there are many similar questions here on SO, but I was unable to find one that works for me. Basically, I have a fixed pattern (\w\w\d\d\d\d), and I want to remove everything after it but keep the pattern itself.
I've tried using:
test = 'PP1909dfgdfgd'
done = re.sub ('(\w\w\d\d\d\d/w*)', '\w\w\d\d\d\d/', test)
but I still get the same string back.
Example:
dirty = 'AA1001dirtydata'
dirty2 = 'AA1001222%^&*'
Desired output:
clean = 'AA1001'
You can use re.match() instead of re.sub():
re.match(r'\w\w\d\d\d\d', dirty).group(0) # returns 'AA1001'
Note: match will look for the regular expression at the beginning of the string you provide and only "match" the characters corresponding to the pattern. If you want to find the pattern partway through the string you can use re.search().
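A slightly more defensive sketch, assuming the code may appear anywhere in the string. re.match and re.search return None when nothing matches, so checking the result before calling .group(0) avoids an AttributeError. The re.sub version shows how the original attempt could work: capture the code in a group and use \1 as the replacement. keep_code and strip_after_code are hypothetical helper names.

```python
import re

CODE = re.compile(r"\w\w\d{4}")  # same as \w\w\d\d\d\d

def keep_code(text):
    """Return the first code found anywhere in text, or None if there is none."""
    m = CODE.search(text)
    return m.group(0) if m else None

def strip_after_code(text):
    """re.sub variant: capture the code in group 1 and drop everything after it."""
    return re.sub(r"(\w\w\d{4}).*", r"\1", text, count=1, flags=re.S)

for dirty in ["AA1001dirtydata", "AA1001222%^&*", "PP1909dfgdfgd"]:
    print(keep_code(dirty), strip_after_code(dirty))
```

One difference: if some text comes before the code, strip_after_code keeps that prefix, while keep_code returns only the code.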
This question already has answers here:
How to use regex with optional characters in python?
(5 answers)
Closed 5 years ago.
I'm trying to match a URL using re, but I'm having trouble making part of the match optional.
import re
x = input('Link: ')
reg = r'(http|https)://(iski|www\.iskis|iskis)\.(in|com)/[A-Za-z0-9?&=/?_]+'
if re.match(reg, x):
    print('True')
Currently, the above code would match something like:
https://iskis.com/?loc=shop_view_item&item=220503032
I would like to alter the regular expression so that the trailing /[A-Za-z0-9?&=/?_]+ part is optional. Nothing after the domain should be required, so the following should also match:
https://iskis.com
I'm sure there is a simple solution but I don't know how to go about solving this.
reg = r'(http|https)://(iski|www\.iskis|iskis)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$'
Should do it. Wrap the slash and the character class in () so they form a group, put a ? after the group so it matches zero or one time, and add a $ at the end so the regex has to match all the way to the end of the string.
EDIT:
Come to think of it, you could use optional matches elsewhere in your regex too:
reg = r'(https?)://(www\.)?(iskis?)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$'
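A quick check of the second pattern against the example URLs, written as a raw string so the \. escapes reach re unchanged. The last two URLs are made-up test inputs.

```python
import re

# Second pattern from the answer; the path group (with its slash) is optional.
URL_RE = re.compile(r"(https?)://(www\.)?(iskis?)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$")

samples = [
    "https://iskis.com/?loc=shop_view_item&item=220503032",
    "https://iskis.com",
    "http://www.iski.in/shop",
    "https://iskis.com/bad path",  # the space is not in the character class, so $ fails
]
for url in samples:
    print(url, bool(URL_RE.match(url)))
```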
This question already has answers here:
Reference - What does this regex mean?
(1 answer)
Decyphering a simple regex
(3 answers)
Closed 5 years ago.
I'm new to learning regex, and I came across a problem that I solved, although I'm not sure why it was a problem and would just like to learn a bit more!
I'm using Python for this regex. The relevant portion of the text to be captured looks like this (I've changed the exact numbers):
Evaluation Type: InterimContract Percent Complete: 30%Period of Performance Being Assessed: 05/27/2013 -
I'm looking to capture Interim and 05/27/2013. The regex that I was using that did NOT work was
match = re.search(
"Evaluation Type:[\s\n]*(.*?)[\s\n]*Contract Percent[.]*"
"Period of Performance Being Assessed:[\s\n]*(.*?)[\s\n]*-"
, page_content)
The code that does work is
match = re.search(
"Evaluation Type:[\s\n]*(.*?)[\s\n]*Contract Percent.*"
"Period of Performance Being Assessed:[\s\n]*(.*?)[\s\n]*-"
, page_content)
As you may notice, the only difference is that I removed the square brackets around the . at the end of the second line.
I understand that the brackets weren't actually needed (they just helped me visualize the regex while I was writing it), but I'm not sure why they broke it. The first version gave no match at all, while the second matched perfectly. I'm sure it's something simple, but my searches online didn't turn up the cause (though I may not understand enough yet to know what to look for).
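[.]* means zero or more literal dots.
.* means zero or more of any character except newline.
A dot inside a character class loses its special meaning. In your text, "Contract Percent" is followed by " Complete: 30%", not by dots, so [.]* matches nothing and the regex then requires "Period of Performance" to follow immediately, which fails.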
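A small sketch reproducing the difference on the sample text from the question (page_content here is just that one line). It also replaces [\s\n]* with \s*, which is equivalent because \s already includes the newline character.

```python
import re

page_content = ("Evaluation Type: InterimContract Percent Complete: 30%"
                "Period of Performance Being Assessed: 05/27/2013 -")

# [.]* : a character class containing only a literal dot.
broken = re.search(r"Contract Percent[.]*Period", page_content)

# .* : any run of characters except newline, so it can skip " Complete: 30%".
working = re.search(
    r"Evaluation Type:\s*(.*?)\s*Contract Percent.*"
    r"Period of Performance Being Assessed:\s*(.*?)\s*-",
    page_content,
)

print(broken)            # None
print(working.groups())  # ('Interim', '05/27/2013')
```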
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
Hi, I am trying to understand some Python code that contains the regular expression re.compile(r'[ :]'). I tried quite a few strings and couldn't find one that matches. Can someone please give an example of text that matches this pattern?
The expression simply matches a single space or a single : (or rather, a string containing either). That’s it. […] is a character class.
The [] matches any of the characters in the brackets. So [ :] will match one character that is either a space or a colon.
So these strings would contain a match (found with re.search):
"Hello World"
"Field 1:"
etc...
These would not
"This_string_has_no_spaces_or_colons"
"100100101"
Edit:
For more info on regular expressions: https://docs.python.org/2/library/re.html
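A sketch checking the strings listed above. One possible reason for finding no match is using re.match, which only tries position 0, instead of re.search, which scans the whole string; the loop prints both results side by side.

```python
import re

pattern = re.compile(r"[ :]")

should_match = ["Hello World", "Field 1:"]
should_not_match = ["This_string_has_no_spaces_or_colons", "100100101"]

for text in should_match + should_not_match:
    # search() scans the whole string; match() only checks the first character.
    print(repr(text), bool(pattern.search(text)), bool(pattern.match(text)))

# findall() lists every single character the class matched, in order.
print(pattern.findall("key: value"))  # [':', ' ']
```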
This question already has answers here:
Python regular expression again - match URL
(7 answers)
Closed 8 years ago.
I am trying to find URLs in a DokuWiki page using a Python regex. DokuWiki formats external links like this:
[['insert URL'|Name of External Link]]
I need a Python regex that captures the URL but stops at '|'.
I could type out every non-alphanumeric character except '|'
(something like this: (https?://[\w|\.|\-|\?|\/|\=|\+|\!|\#|\#|\$|\%|^|&]*) )
However, that sounds really tedious, and I might miss one.
Thoughts?
You can use a negated character set, written [^things to not match].
In this case, you want to not match |, so you would have [^|].
import re
bool(re.match("[^|]", "a"))
#>>> True
bool(re.match("[^|]", "|"))
#>>> False
You expect one or more characters that are not |, followed by a | and one or more characters that are not ], all enclosed in double square brackets. This translates to:
import re
pattern = re.compile(r'\[\[([^|]+)\|([^\]]+)\]\]')
print(pattern.match("[[http://bla.org/path/to/page|Name of External Link]]").groups())
This would print:
('http://bla.org/path/to/page', 'Name of External Link')
If you don't need the link name, you can just remove the parentheses around the second group. There is more on regular expressions in the Python re module documentation.
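If the goal is to pull every URL out of a page, re.findall with a single capture group returns just the URLs. This is a sketch on a made-up wiki_text string, and it assumes every link has the [[URL|Name]] form; links without a |Name part, such as [[http://example.com]], are not matched by this pattern. Excluding ] from the URL class keeps a match from running from one link into the next.

```python
import re

# Hypothetical wiki text containing two external links in [[URL|Name]] form.
wiki_text = (
    "See [[http://bla.org/path/to/page|Name of External Link]] and "
    "[[https://example.com/?q=a&b=c|Search]] for details."
)

# Group 1 captures the URL; the link name is matched but not captured.
LINK_URL = re.compile(r"\[\[([^|\]]+)\|[^\]]+\]\]")

print(LINK_URL.findall(wiki_text))
# ['http://bla.org/path/to/page', 'https://example.com/?q=a&b=c']
```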