Hi all,
How does this expression actually work?
urlpatterns = patterns('',
url(r'^get/(?P<app_id>\d+)/$', 'app.views.app'),
...
)
I understand what it does, at least that it maps a URL entered by the user to the app() function in the app's views module. I also understand it is a regular expression that ends up taking the id of the app from the URL. But where is this function going? What is going on with the r'^...?P.../$'? (I get that \d+ is a digit regex matching the id itself, but that's about it.)
I also understand this url function draws from the django.conf.urls module.
Perhaps my misunderstanding stems from my lack of regex experience. Nonetheless, I need help! I do not like using things I do not understand, and here I am guilty of it.
Let's take a look: r'^get/(?P<app_id>\d+)/$'
The r'' prefix makes it a raw string: every character inside the quotes is taken literally, so backslashes are not treated as escape sequences.
The ^ character anchors the match to the beginning of the string. For example, forget/123/ won't match the expression because it doesn't start with get. Without the ^, it would match, because the pattern would no longer require the string to begin with get, only that get/... appear somewhere in it.
The $ character anchors the match to the end of the string. Without it, get/123/xd would also match, which is not desired.
(?P<name>...) gives a name to a capturing group. Here the digits matched by \d+ are captured under the name app_id.
You should read Python's regular expression documentation. Regular expressions are well worth knowing because they're very useful.
Hope this helps!
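A minimal sketch of how the anchors and the named group behave, using Python's re module directly. It uses search(), which may start matching anywhere in the string, so only the anchors keep extra text out; extract_app_id is a hypothetical helper for illustration, not part of Django.

```python
import re

# The pattern from the question; the r'' prefix keeps the backslash in \d.
APP_URL = re.compile(r'^get/(?P<app_id>\d+)/$')


def extract_app_id(path):
    """Hypothetical helper: return the captured app_id, or None on no match."""
    # search() may start anywhere in the string, so only ^ and $ stop
    # extra text before or after the part we want.
    match = APP_URL.search(path)
    return match.group('app_id') if match else None


print(extract_app_id('get/42/'))     # 42
print(extract_app_id('forget/42/'))  # None: ^ requires the path to start with get/
print(extract_app_id('get/42/xd'))   # None: $ forbids trailing text
```

Note that the captured value is the string '42', not an integer.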
The r prefix just changes how the following string literal is interpreted: backslashes (\) are not treated as escape sequences, so the regex in the string is used as is.
^ at the beginning and $ at the end match the beginning and the end of the string, respectively.
(?P<name>...) is a named capturing group: it lets you cut out part of the URL and pass it to the view as a keyword argument. See the Django docs on named groups for more.
Hope that helps.
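To see where the named group ends up, here is a simplified model (not Django's actual resolver code) in which the match's named groups become keyword arguments of the view. The function app below is a hypothetical stand-in for app.views.app.

```python
import re


def app(request, app_id):
    # Hypothetical stand-in for app.views.app; a real view receives an
    # HttpRequest and returns an HttpResponse.
    return 'showing app %s' % app_id


pattern = re.compile(r'^get/(?P<app_id>\d+)/$')
match = pattern.search('get/7/')
kwargs = match.groupdict()    # {'app_id': '7'}; the value is a string
result = app(None, **kwargs)  # roughly how the view gets called
print(result)                 # showing app 7
```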
Related
I have a file named Document.pdf and sometimes it is called Document-12345678.pdf where -12345678 is a random number.
I want to check whether the file has finished downloading into a folder. While the download is unfinished, the file is named Document.pdf.fkasfmq or Document-12345678.pdf.fkasfmq, where .fkasfmq is a random hash added by the downloader, and I don't want those names to match.
I tried a regex like r'Document(?:[\-0-9]+).pdf', but when I test it with either Document.pdf or Document-12345678.pdf it always returns false.
As I understand it, (?:[\-0-9]+) means an optional part, from a set that matches any hyphen and any digits, before .pdf. Is that correct? I am very, very rusty with regex...
The parentheses only perform grouping, not optionality. If you want to make the expression optional, the ? quantifier does that (and actually the parentheses are unnecessary, as the character class is a single expression). Though as @anubhava notes in a comment, you might as well use the * quantifier then.
r'Document[-0-9]*\.pdf'
Notice also the backslash to match a literal dot; an unescaped . matches any character (other than newline). Inside a character class, an initial or final hyphen does not need to be backslash-escaped.
On the other hand, perhaps prefer a more precise expression:
r'^Document(-\d+)?\.pdf$'
which says: optionally, a hyphen followed by digits, and nothing before or after.
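A quick check of the anchored pattern against the file names from the question, written with \d+ so that a multi-digit suffix is accepted. The extra name DocumentXpdf is made up to show why the dot is escaped.

```python
import re

# Optional "-<digits>" part, an escaped (literal) dot, nothing before or after.
FINISHED = re.compile(r'^Document(-\d+)?\.pdf$')

names = [
    'Document.pdf',                   # finished
    'Document-12345678.pdf',          # finished
    'Document.pdf.fkasfmq',           # still downloading
    'Document-12345678.pdf.fkasfmq',  # still downloading
    'DocumentXpdf',                   # would match if the dot were unescaped
]
for name in names:
    print(name, bool(FINISHED.match(name)))
```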
You should mark it as optional with the "?" quantifier. Otherwise, you are requiring the name to contain the hyphen-and-digits part.
r'Document(?:[\-0-9]+)?\.pdf'
Or as @anubhava pointed out in the comments, it can be simplified to:
r'Document[\-0-9]*\.pdf'
This way, it will also match e.g. "Document.pdf".
Also, you should consider adding "$" to anchor the end of the string, so that it doesn't match e.g. "Document.pdf.fkasfmq".
r'^Document(?:[\-0-9]+)?\.pdf$'
Or
r'^Document[\-0-9]*\.pdf$'
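To tie this back to the original goal of checking a folder, here is a minimal sketch that applies the anchored pattern to directory entries. The helper names finished_names and is_downloaded are hypothetical, and it assumes the files sit directly in the folder (no recursion into subfolders).

```python
import os
import re

FINISHED = re.compile(r'^Document[\-0-9]*\.pdf$')


def finished_names(names):
    """Return only the names that look like a completed download."""
    return [name for name in names if FINISHED.match(name)]


def is_downloaded(folder):
    """Hypothetical helper: True once a finished Document*.pdf is in folder."""
    return bool(finished_names(os.listdir(folder)))
```

Because of the $ anchor, a name that still carries the downloader's random suffix is never reported as finished.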
You can just use (\d{8}) to see if there's a document there with 8 digits in the filename.
My problem is the following:
Inside my urls.py I have defined these url patterns:
url(r'^image/upload', 'main.views.presentations.upload_image'),
url(r'^image/upload-from-url', 'main.views.presentations.upload_image_from_url'),
The problem is that when I request the URL
myowndomain:8000/image/upload-from-url
Django always executes the first pattern (r'^image/upload').
Is there any solution to my problem?
Django uses the first matching pattern, and your ^image/upload pattern doesn't include anything to stop it matching the longer text. The solution is to require that your pattern also match the end of the string:
r'^image/upload$'
By convention, Django URLs generally have a trailing slash as well, but that's not strictly required:
r'^image/upload/$'
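A simplified model of the first-match-wins behaviour, assuming the path is matched without its leading slash, as Django's URL patterns imply. The resolve function is a hypothetical stand-in, not Django's real resolver.

```python
import re


def resolve(path, urlpatterns):
    """Return the view name of the first pattern that matches, trying them in order."""
    for regex, view in urlpatterns:
        if re.search(regex, path):
            return view
    return None


unanchored = [(r'^image/upload', 'upload_image'),
              (r'^image/upload-from-url', 'upload_image_from_url')]
anchored = [(r'^image/upload$', 'upload_image'),
            (r'^image/upload-from-url$', 'upload_image_from_url')]

print(resolve('image/upload-from-url', unanchored))  # upload_image
print(resolve('image/upload-from-url', anchored))    # upload_image_from_url
```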
You need to insert the dollar sign "$" at the end of the pattern. The dollar sign doesn't match a character; it matches a position, namely the end of the string. Because ^image/upload matches both image/upload and image/upload-from-url, you need to say explicitly where the pattern stops.
I'm working on a file parser that needs to strip comments from JavaScript code. The catch is that it has to be smart enough not to treat a '//' sequence inside a string as the beginning of a comment. I have the following idea:
Iterate through lines.
Find the '//' sequence first, then find all strings surrounded by quotes (' or ") in the line, and iterate through the string matches to check whether the '//' sequence is inside or outside each of them. If it is outside all of them, it must be the beginning of a real comment.
When testing the code on the following line (part of a bigger JS file, of course):
document.getElementById("URL_LABEL").innerHTML="<a name=\"link\" href=\"http://"+url+"\" target=\"blank\">"+url+"</a>";
I've encountered a problem. My regular expression code:
re_strings=re.compile(""" "
(?:
\\.|
[^\\"]
)*
"
|
'
(?:
[^\\']|
\\.
)*
'
""",re.VERBOSE);
for s in re.finditer(re_strings, line):
    print(s.group(0))
In Python 3.2.3 (and 3.1.4) it prints the following strings:
"URL_LABEL"
"<a name=\"
" href=\"
"+url+"
" target=\"
">"
"</a>"
This is obviously wrong, because \" should not exit the string. I've been debugging my regex for quite a long time and it SHOULDN'T exit here. So I used RegexBuddy (with Python compatibility) and the Python regex tester at http://re-try.appspot.com/ for reference.
The most peculiar thing is that both return the same, correct results, unlike my code:
"URL_LABEL"
"<a name=\"link\" href=\"http://"
"\" target=\"blank\">"
"</a>"
My question is: what is the cause of those differences? What have I overlooked? I'm rather a beginner in both Python and regular expressions, so maybe the answer is simple...
P.S. I know that finding if the '//' sequence is inside string quotes can be accomplished with one, bigger regex. I've already tried it and met the same problem.
P.P.S. I would like to know what I'm doing wrong and why my code behaves differently from the regex test applications, not to find other ways to parse JavaScript code.
You just need to use a raw string to create the regex:
re_strings=re.compile(r""" "
etc.
"
""",re.VERBOSE);
The way you've got it, \\.|[^\\"] becomes the regex \.|[^\"], which matches a literal dot (.) or anything that's not a quotation mark ("). Add the r prefix to the string literal and it works as you intended.
(I also used a raw string for the target line, to make sure the backslashes appeared in it. I don't know how you arranged that in your tests, but the backslashes are obviously present in your input; the problem is that they're missing from your regex.)
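A cut-down reproduction makes the difference visible. It keeps only the double-quoted branch of the question's verbose pattern, and the test line is a shortened, made-up piece of JavaScript written as a raw string so its backslashes survive.

```python
import re

# Same text, once as a normal string literal and once as a raw one.
# In the normal literal, \\. becomes \. (a literal dot) and [^\\"] becomes [^\"].
cooked = re.compile(""" " (?: \\.| [^\\"] )* " """, re.VERBOSE)
raw = re.compile(r""" " (?: \\.| [^\\"] )* " """, re.VERBOSE)

# Raw string so the backslashes really are part of the JavaScript text.
line = r'x="<a name=\"link\">"+url'

print([m.group(0) for m in cooked.finditer(line)])  # stops at the escaped quote
print([m.group(0) for m in raw.finditer(line)])     # one string, escapes included
```

The cooked version reproduces the question's symptom: the match ends at the first \", because the regex never sees a backslash to consume.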
You cannot reliably match quotes with a regex. In fact, you cannot guarantee matching pairs of anything (nested pairs especially); you need a more sophisticated state machine for that (LLVM, etc.).
Source: lots of CS classes.
Also see "Matching pair tag with regex" for a more detailed explanation.
I know it's not what you wanted to hear, but that's basically just the way it is. And yes, different regex implementations can return different results for things that regex can't really do.
I have a URL that looks like united-states/boulder-21781/tool-&-anchor/mulligan-21/. Assuming the best strategy is to encode the &, the URL becomes united-states/boulder-21781/tool-%26-anchor/mulligan-21/
I'm trying to write a url conf that will accept this, but the regex I'm using isn't working. I have:
url(r'^%(regex)s/%(regex)s-(\d+)/%(regex)s/%(regex)s-(\d+)/$' % {'regex': '(?i)([\.\-\_\w]+)'}, 'view_tip_page', name='tip_page'),
What do I add to match the %? Or should I just include the &?
My first recommendation would be to not do it. As you yourself are demonstrating, not everybody knows that & is a perfectly valid character in a URI before the first ?, and you are bound to get into trouble. It also looks ugly, is harder to type, and is more jarring than, say, and, or even just n. Having said that, if you really want it in there, just add it to the character class.
Unrelated to your question, the way you're building that regex is weird: you're not capturing any of the bits of the path for use by the view. You're also including the (?i) global modifier four times, and specifying _, which is already part of \w. I dunno, I'd expect something like
r'(?i)(?P<country>[.\w-]+)/(?P<city>[.\w-]+)-(?P<cityno>\d+)/...etc...'
but maybe I'm missing something.
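A sketch of the named-group style suggested above, with & simply added to the character class of the tip segment. The group names country, city and cityno come from the answer; tip, item and itemno are hypothetical names for the remaining segments, and (?i) is dropped because \w already covers both cases.

```python
import re

TIP_URL = re.compile(
    r'^(?P<country>[-.\w]+)/'
    r'(?P<city>[-.\w]+)-(?P<cityno>\d+)/'
    r'(?P<tip>[-.&\w]+)/'                  # & added to the class for this segment
    r'(?P<item>[-.\w]+)-(?P<itemno>\d+)/$'
)

match = TIP_URL.match('united-states/boulder-21781/tool-&-anchor/mulligan-21/')
print(match.groupdict())
```

The greedy [-.\w]+ first swallows boulder-21781 and then backs off until -(\d+)/ can match, which is why city ends up as boulder.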
Well currently there is no way for you to match % or & in your regex. Depending on whether it is encoded or not, you will need to add one or the other to the character class in your regex, and it should match.
I might change it to something like the following:
r'(?i)^%(regex)s/%(regex)s-(\d+)/%(regex)s/%(regex)s-(\d+)/$' % {'regex': r'([-.%\w]+)'}
And proof that it works:
>>> import re
>>> pattern = re.compile(r'(?i)^%(regex)s/%(regex)s-(\d+)/%(regex)s/%(regex)s-(\d+)/$' % {'regex': r'([-.%\w]+)'})
>>> s = 'united-states/boulder-21781/tool-%26-anchor/mulligan-21/'
>>> match = pattern.match(s)
>>> match.groups()
('united-states', 'boulder', '21781', 'tool-%26-anchor', 'mulligan', '21')
A few comments on your regex:
The (?i) isn't really doing anything, since you are using \w, which already matches both upper and lowercase. If you do want to use (?i), I would move it out of the replacement string and into the format string ('(?i)...' % {'regex': '...'} instead of '...' % {'regex': '(?i)...'}), since otherwise it will show up multiple times.
Note that the character class was changed from [\.\-\_\w] to [-.%\w]: underscores are already included in \w, you don't need to escape the hyphen if it comes at the beginning of the character class, and you don't need to escape the . inside a character class.
Also, \w does match digits so technically to match something like 'boulder-21781' you could just use %(regex)s instead of %(regex)s-(\d+), but I didn't want to change that in case it was intentionally adding some additional verification of the format.
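One way to convince yourself that the simplified class only adds %: a small check over the printable ASCII characters (non-ASCII word characters are left out of this comparison).

```python
import re
import string

old_class = re.compile(r'[\.\-\_\w]')  # class from the question
new_class = re.compile(r'[-.%\w]')     # simplified class from the answer

# Characters that one class accepts and the other rejects.
difference = {
    c for c in string.printable
    if bool(old_class.fullmatch(c)) != bool(new_class.fullmatch(c))
}
print(difference)  # {'%'}
```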
'[A-Za-z0-9-_]*'
'^[A-Za-z0-9-_]*$'
I want to check that a string contains only the characters in the expressions above; I just want to make sure no weird characters like #%&/() are in the string.
I am wondering: is there any difference between these two regular expressions? Do the beginning and ending signs matter? Will they affect the result somehow?
With re.match, Python regular expressions are anchored at the beginning of the string, so the ^ sign at the beginning doesn't make any difference there (it would with re.search). However, the $ sign very much does: if you don't include it, you're only going to match the beginning of your string, and the end could contain anything, including the characters you want to exclude. Just try re.match("[a-z0-9]", "abcdef/%&").
In addition, you may want to use a regular expression that simply excludes the characters you're testing for, which is much safer (hence [^#%&/()]; the parentheses need no escaping inside a character class).
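A short comparison of the variants discussed, using a made-up test string. re.fullmatch, available since Python 3.4, is an alternative to writing ^ and $ by hand; the last line follows the exclusion idea by searching for any forbidden character instead of matching the allowed ones.

```python
import re

s = 'abc#def'

print(bool(re.match(r'[A-Za-z0-9-_]*', s)))      # True: only the prefix 'abc' has to match
print(bool(re.match(r'^[A-Za-z0-9-_]*$', s)))    # False: $ forces the whole string to match
print(bool(re.fullmatch(r'[A-Za-z0-9-_]*', s)))  # False: fullmatch anchors both ends
print(bool(re.search(r'[#%&/()]', s)))           # True: a forbidden character is present
```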
The ^ and $ signs match the beginning and end of the string.
The first will match any string that contains zero or more occurrences of the class [A-Za-z0-9-_] (basically any string whatsoever...).
The second will match the empty string or a string made only of characters in [A-Za-z0-9-_], but not one that contains any other character.
Yes, it will. A regex can match anywhere in its input (with re.search), so a string containing # will still match your first regex.