Python refuses text.replace() in one environment

I've been mucking about with the following bit of dirty support code for a Pylons app, which works fine in a Python shell, in a separate Python file, and when running under paster. Now we've put the application online through mod_wsgi and Apache, and this specific piece of code stopped working completely. First off, the code itself:
def fixStyle(self, text):
    t = text.replace('<p>', '<p style="%s">' % (STYLEDEF,))
    t = t.replace('class="wide"', 'style="width: 125px; %s"' % (STYLEDEF,))
    t = t.replace('<td>', '<td style="%s">' % (STYLEDEF,))
    t = t.replace('<a ', '<a style="%s" ' % (LINKSTYLE,))
    return t
It seems pretty straightforward, and to be honest, it is. So what happens when I put a piece of text in it, for example:
<table><tr><td>Test!</td></tr></table>
The output should be:
<table><tr><td style="stuff-from-styledef">Test!</td></tr></table>
and it is, on most systems. When we put it through the app on Apache/mod_wsgi though, the following happens:
<table><tr><td>Test!</td></tr></table>
You guessed it.
I have put logging at the start that outputs the text, and at the end that outputs both the original text and the t variable. It shows exactly what I describe here: on most systems t is changed; in the Apache environment it isn't.
Of course I made sure to restart Apache (to get it to reload the .py files) after every change, and the changes were reflected in the logging output.
I'm currently at a loss and have no idea where to go next. Googling doesn't really work out, so I'm hoping you can help out and perhaps point out a fundamental issue with whatever is causing this.
If anything is missing I'll edit it in.

Add some print statements and examine the Apache logs:
def fixStyle(self, text):
    print "text:", text
    print "STYLEDEF", STYLEDEF
    t = text.replace('<p>', '<p style="%s">' % (STYLEDEF,))
    print "t:", t

I have no idea what your problem is, but I find the repetition of replace() questionable: if all four patterns are present in the string, a new string is created four times. IMO, rather than this:
def fixStyle(self, text):
    t = text.replace('<p>', '<p style="%s">' % (STYLEDEF,))
    t = t.replace('class="wide"', 'style="width: 125px; %s"' % (STYLEDEF,))
    t = t.replace('<td>', '<td style="%s">' % STYLEDEF)
    t = t.replace('<a ', '<a style="%s" ' % (LINKSTYLE,))
    return t
the following should be better:
import re

STYLEDEF = 'stuff-from-styledef'
LINKSTYLE = 'VVVV'

def aux(m, dic = {'<p'          : ('<p style="', STYLEDEF),
                  '<td'         : ('<td style="', STYLEDEF),
                  'class="wide"': ('style="width: 125px; ', STYLEDEF),
                  '<a'          : ('<a style="', LINKSTYLE)} ):
    return '%s%s"' % dic[m.group()]

pat = re.compile('<p(?=>)|class="wide"|<td(?=>)|<a(?= )')

ch = '<table><tr><td>Test!</td></tr></table><a type="brown" >'
print ch
print fixStyle(None, ch)
print pat.sub(aux, ch)
result
<table><tr><td>Test!</td></tr></table><a type="brown" >
<table><tr><td style="stuff-from-styledef">Test!</td></tr></table><a style="VVVV" type="brown" >
<table><tr><td style="stuff-from-styledef">Test!</td></tr></table><a style="VVVV" type="brown" >
I think re.sub() does the replacements in only one pass over the string.
Defining the parameter dic with a default argument means the value is bound to dic when aux() is defined and doesn't change afterwards. At each call, no argument is passed for dic from the outer level: the value is kept inside the function.
Also, the function aux() doesn't need to go out and look up the values of STYLEDEF and LINKSTYLE outside the function.
All that should increase the execution speed.
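A quick way to see the default-argument behaviour described above (a minimal sketch, not from the original answer): the default is evaluated exactly once, when the def statement runs.
import time

def stamp(cached=time.time()):   # default evaluated once, at definition time
    return cached

first = stamp()
time.sleep(0.01)
second = stamp()
print(first == second)   # True: the same default value is reused on every call, never re-evaluated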
EDIT:
Since ' style="' and STYLEDEF are common to several of the returned results, I tried to shorten the list of them and came up with:
def aux(m, dic = {'<p'          : '<p style="%s"',
                  '<td'         : '<td style="%s"',
                  'class="wide"': 'style="width: 125px; %s"'} ):
    if m.group(1):
        return '<a style="%s"' % LINKSTYLE
    else:
        return dic[m.group()] % STYLEDEF

pat = re.compile('<p(?=>)|class="wide"|<td(?=>)|(<a)(?= )')
With the aim of taking out the conditional lines, I wrote the preceding solution and, I don't know why, I stopped there. The interesting part of that solution was the regular expression string, with lookahead assertions, which allows writing the solution John Machin noticed, but I polluted it with these oafish tuples.
There is also this solution:
def aux(m, STY = STYLEDEF, LIN = LINKSTYLE):
    return ('style="width: 125px; ' if m.group(3) else m.group(1) + ' style="') + \
           (LIN if m.group(2) else STY) + '"'

pat = re.compile('(<p(?=>)|<td(?=>)|(<a(?= )))|(class="wide")')
But the clearer and simpler solution is, as John Machin noticed:
def aux(m, dic = {'<p'          : '<p style="%s"' % STYLEDEF,
                  '<td'         : '<td style="%s"' % STYLEDEF,
                  '<a'          : '<a style="%s"' % LINKSTYLE,
                  'class="wide"': 'style="%s"' % ('width: 125px; ' + STYLEDEF)} ):
    return dic[m.group()]

pat = re.compile('<p(?=>)|class="wide"|<td(?=>)|<a(?= )')
The values in dic are calculated only once, when the definition of aux() is executed.
In fact, they are very close to the arguments of the replace() calls.

Sorry, but: Descriptions of debugging that don't mention repr() are not credible. Ensure that you are logging repr(text) and repr(t), NOT text and t.
Run the non-working environment and at least one of the working environments on the same piece of data and edit your question to show the actual code that you used and the actual logging output.
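For example (a hypothetical illustration of why repr() matters, not output from the actual app): two strings can look identical when logged plainly yet differ by an invisible character, which would make replace('<td>', ...) find nothing.
text = u'<table><tr><td\u200b>Test!</td></tr></table>'   # hypothetical input containing a zero-width space

print(text == u'<table><tr><td>Test!</td></tr></table>')  # False, yet both look identical when logged as plain text
print(repr(text))                                          # repr() exposes the hidden u'\u200b' character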

Related

Compare XML documents for equality [duplicate]

I am new to programming in Python, and I have some trouble understanding the concept. I wish to compare two XML files. These XML files are quite large.
I will give an example of the type of files I wish to compare.
xmlfile1:
<xml>
<property1>
<property2>
<property3>
</property3>
</property2>
</property1>
</xml>
xmlfile2:
<xml>
<property1>
<property2>
<property3>
<property4>
</property4>
</property3>
</property2>
</property1>
</xml>
The property1, property2 names I have used are different from the ones that are actually in the file. There are a lot of properties within the XML file.
And I wish to compare the two XML files.
I am using the lxml parser to try to compare the two files and to print out the difference between them.
I do not know how to parse them and compare them automatically.
I tried reading through the lxml documentation, but I couldn't understand how to apply it to my problem.
Can someone please tell me how I should proceed with this problem.
Code snippets can be very useful.
One more question: am I following the right concept, or am I missing something else? Please correct me on any new concepts that you know about.
This is actually a reasonably challenging problem (due to what "difference" means often being in the eye of the beholder here, as there will be semantically "equivalent" information that you probably don't want marked as differences).
You could try using xmldiff, which is based on work in the paper Change Detection in Hierarchically Structured Information.
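A minimal usage sketch, assuming the modern xmldiff package (pip install xmldiff) and its main.diff_files() helper; the file names are placeholders for your own data:
from xmldiff import main

# Returns a list of edit actions needed to turn the first document into the second.
actions = main.diff_files('xmlfile1.xml', 'xmlfile2.xml')
for action in actions:
    print(action)   # an empty list means the documents are considered equal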
My approach to the problem was transforming each XML into a xml.etree.ElementTree and iterating through each of the layers.
I also included the functionality to ignore a list of attributes while doing the comparison.
The first block of code holds the class used:
import xml.etree.ElementTree as ET
import logging

class XmlTree():

    def __init__(self):
        # Note: the logger itself has to be created and wired to the handler,
        # otherwise self.logger used below would not exist.
        self.logger = logging.getLogger('xml-comparison')
        self.hdlr = logging.FileHandler('xml-comparison.log')
        self.formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
        self.hdlr.setFormatter(self.formatter)
        self.logger.addHandler(self.hdlr)
        self.logger.setLevel(logging.DEBUG)

    @staticmethod
    def convert_string_to_tree(xmlString):
        return ET.fromstring(xmlString)

    def xml_compare(self, x1, x2, excludes=[]):
        """
        Compares two xml etrees
        :param x1: the first tree
        :param x2: the second tree
        :param excludes: list of string of attributes to exclude from comparison
        :return:
            True if both files match
        """
        if x1.tag != x2.tag:
            self.logger.debug('Tags do not match: %s and %s' % (x1.tag, x2.tag))
            return False
        for name, value in x1.attrib.items():
            if not name in excludes:
                if x2.attrib.get(name) != value:
                    self.logger.debug('Attributes do not match: %s=%r, %s=%r'
                                      % (name, value, name, x2.attrib.get(name)))
                    return False
        for name in x2.attrib.keys():
            if not name in excludes:
                if name not in x1.attrib:
                    self.logger.debug('x2 has an attribute x1 is missing: %s'
                                      % name)
                    return False
        if not self.text_compare(x1.text, x2.text):
            self.logger.debug('text: %r != %r' % (x1.text, x2.text))
            return False
        if not self.text_compare(x1.tail, x2.tail):
            self.logger.debug('tail: %r != %r' % (x1.tail, x2.tail))
            return False
        cl1 = x1.getchildren()
        cl2 = x2.getchildren()
        if len(cl1) != len(cl2):
            self.logger.debug('children length differs, %i != %i'
                              % (len(cl1), len(cl2)))
            return False
        i = 0
        for c1, c2 in zip(cl1, cl2):
            i += 1
            if not c1.tag in excludes:
                if not self.xml_compare(c1, c2, excludes):
                    self.logger.debug('children %i do not match: %s'
                                      % (i, c1.tag))
                    return False
        return True

    def text_compare(self, t1, t2):
        """
        Compare two text strings
        :param t1: text one
        :param t2: text two
        :return:
            True if a match
        """
        if not t1 and not t2:
            return True
        if t1 == '*' or t2 == '*':
            return True
        return (t1 or '').strip() == (t2 or '').strip()
The second block of code holds a couple of XML examples and their comparison:
xml1 = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"
xml2 = "<note><to>Tove</to><from>Daniel</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"
tree1 = XmlTree.convert_string_to_tree(xml1)
tree2 = XmlTree.convert_string_to_tree(xml2)
comparator = XmlTree()
if comparator.xml_compare(tree1, tree2, ["from"]):
    print "XMLs match"
else:
    print "XMLs don't match"
Most of the credit for this code must be given to syawar
If your intent is to compare the XML content and attributes, and not just compare the files byte-by-byte, there are subtleties to the question, so there is no solution that fits all cases.
You have to know something about what is important in the XML files.
The order of attributes listed in an element tag is generally not supposed to matter. That is, two XML files that differ only in the order of element attributes generally ought to be judged the same.
But that's the generic part.
The tricky part is application-dependent. For instance, it may be that white-space formatting of some elements of the file doesn't matter, and white-space might be added to the XML for legibility. And so on.
Recent versions of the ElementTree module have a function canonicalize(), which can take care of simpler cases, by putting the XML string into a canonical format.
I used this function in the unit tests of a recent project, to compare a known XML output with output from a package that sometimes changes the order of attributes. In this case, white space in the text elements was unimportant, but it was sometimes used for formatting.
import xml.etree.ElementTree as ET

def _canonicalize_XML( xml_str ):
    """ Canonicalizes XML strings, so they are safe to
        compare directly.
        Strips white space from text content."""
    if not hasattr( ET, "canonicalize" ):
        raise Exception( "ElementTree missing canonicalize()" )
    root = ET.fromstring( xml_str )
    rootstr = ET.tostring( root )
    return ET.canonicalize( rootstr, strip_text=True )
To use it, something like this:
file1 = ET.parse('file1.xml')
file2 = ET.parse('file2.xml')
canon1 = _canonicalize_XML( ET.tostring( file1.getroot() ) )
canon2 = _canonicalize_XML( ET.tostring( file2.getroot() ) )
print( canon1 == canon2 )
In my distribution, the Python 2 ElementTree doesn't have canonicalize(), but the Python 3 one does (it was added in Python 3.8).
Another script using xml.etree. It's awful but it works :)
#!/usr/bin/env python
import sys
import xml.etree.ElementTree as ET
from termcolor import colored

tree1 = ET.parse(sys.argv[1])
root1 = tree1.getroot()

tree2 = ET.parse(sys.argv[2])
root2 = tree2.getroot()

class Element:
    def __init__(self, e):
        self.name = e.tag
        self.subs = {}
        self.atts = {}
        for child in e:
            self.subs[child.tag] = Element(child)
        for att in e.attrib.keys():
            self.atts[att] = e.attrib[att]
        print "name: %s, len(subs) = %d, len(atts) = %d" % ( self.name, len(self.subs), len(self.atts) )

    def compare(self, el):
        if self.name != el.name:
            raise RuntimeError("Two names are not the same")
        print "----------------------------------------------------------------"
        print self.name
        print "----------------------------------------------------------------"
        for att in self.atts.keys():
            v1 = self.atts[att]
            if att not in el.atts.keys():
                v2 = '[NA]'
                color = 'yellow'
            else:
                v2 = el.atts[att]
                if v2 == v1:
                    color = 'green'
                else:
                    color = 'red'
            print colored("first:\t%s = %s" % ( att, v1 ), color)
            print colored("second:\t%s = %s" % ( att, v2 ), color)
        for subName in self.subs.keys():
            if subName not in el.subs.keys():
                # 'magenta' here: termcolor has no 'purple' color name
                print colored("first:\thas got %s" % ( subName ), 'magenta')
                print colored("second:\thasn't got %s" % ( subName ), 'magenta')
            else:
                self.subs[subName].compare( el.subs[subName] )

e1 = Element(root1)
e2 = Element(root2)

e1.compare(e2)

How to write an exception in the following case in Python3?

I need to revise some Python code (given by a programmer) which I would like to use for a genealogy project of mine. I am very new to Python and am only starting to be able to read code. Yet, I do not know how to fix the following thing.
I get the following error message when executing the code:
self['gebort'] += ", Taufe: %s" % place.get_title()
KeyError: 'gebort'
The issue is that for one of the persons in my database only the date of baptism (here: Taufe) is known, but not the date of birth. This is where the code fails.
This is the relevant snippet of the code basis:
birth_ref = person.get_birth_ref()
if birth_ref:
    birth = database.get_event_from_handle(birth_ref.ref)
    self['gjahr'] = birth.get_date_object().get_year()
    if self['gjahr'] >= 1990:
        self['mindj'] = True
    self['gebdat'] = dd.display(birth.get_date_object())
    self['plaingebdat'] = self['gebdat']
    place_handle = birth.get_place_handle()
    self['geborthandle'] = place_handle
    place = database.get_place_from_handle(place_handle)
    if place:
        self['gebort'] = place.get_title()
        self['plaingebort'] = self['gebort']
for eventref in person.get_event_ref_list():
    event = database.get_event_from_handle(eventref.ref)
    if event.get_type() in (gramps.gen.lib.EventType.CHRISTEN, gramps.gen.lib.EventType.BAPTISM):
        self['gebdat'] += ", Taufe: %s" % dd.display(event.get_date_object())
        place_handle = event.get_place_handle()
        place = database.get_place_from_handle(place_handle)
        if place:
            self['gebort'] += ", Taufe: %s" % place.get_title()
Now, I do not know how to add exception handling for when no birth date/place is found, so that the code does not output any birth values. Would somebody be able to point me in the right direction?
Instead of:
dict_name['gebort'] += ", Taufe: %s" % place.get_title()
you could write
dict_name['gebort'] = dict_name.get('gebort', '') + ", Taufe: %s" % place.get_title()
As already written, naming something self is not clever, unless the code above is from a class derived from dict. Using .get you can define what is returned in case there is no key of that name, in the example an empty string.
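Since the question explicitly asks about exceptions, the same guard can also be written with try/except KeyError; a minimal sketch, where record and taufe_ort are stand-ins for the dict-like object and the place title in the original code:
record = {}                    # stand-in for the dict-like object the original code calls self
taufe_ort = "Musterstadt"      # stand-in for place.get_title()

try:
    record['gebort'] += ", Taufe: %s" % taufe_ort
except KeyError:               # no birth place was recorded for this person
    record['gebort'] = "Taufe: %s" % taufe_ort

print(record['gebort'])        # -> Taufe: Musterstadt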

Method returns Unicode object, gets assigned into a NoneType. In Python

This script reads from a source with lines consisting of artist names followed by a parenthesis with information about whether the artist cancelled and which country they come from.
A normal sentence may look like:
Odd Nordstoga (NO) (Cancelled), 20-08-2012, Blå
As I import the data I decode it from UTF-8, and this works fine. Uncommenting the second commented print in the else block of the remove_extra() function shows that all variables are of type unicode.
However, when a value is returned and put into another variable and the type of this is tested, the majority of the variables seem to be of NoneType.
Why does this happen? And how can it be corrected? It seems to be an error happening between the method's return and the assignment to the new variable.
# -*- charset: utf-8 -*-
import re

f1 = open("oya_artister_2011.csv")
artister = []
navnliste = []

PATTERN = re.compile(r"(.*)(\(.*\))")
TEST_PAT = re.compile(r"\(.*\)")

def remove_extra(tekst):
    if re.search(PATTERN, tekst) > 1:
        after = re.findall(PATTERN, tekst)[0][0]
        #print "tekst is: %s " % tekst
        #print "and of type: %s" % type(tekst)
        remove_extra(after)
    else:
        #print "will return: ", tekst
        #print "of type: %s" % type(tekst)
        return tekst

for line in f1:
    navn, _rest = line.split(",", 1)
    navn = navn.decode("utf-8")
    artister.append(navn)

for artist in artister:
    ny_artist = remove_extra(artist)
    #print "%s" % ny_artist
    print "of type: %s" % type(ny_artist)
Try
return remove_extra(after)
instead of just
remove_extra(after)
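The reason, shown in a minimal sketch (not the original data): a Python function that never reaches a return statement gives back None, so the result of a recursive call has to be passed back up explicitly.
def broken(n):
    if n > 0:
        broken(n - 1)          # result of the recursion is discarded
    else:
        return "done"          # only the innermost call ever sees this

def fixed(n):
    if n > 0:
        return fixed(n - 1)    # pass the result back up the call chain
    else:
        return "done"

print(broken(3))   # None
print(fixed(3))    # done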

string comprehension in Python

I am working with images that have multiple layers, which are described in their metadata. The metadata looks like this:
print layers
Cube1[visible:true, mode:Normal]{r:Cube1.R, g:Cube1.G, b:Cube1.B, a:Cube1.A}, Ground[visible:true, mode:Lighten, opacity:186]{r:Ground.R, g:Ground.G, b:Ground.B, a:Ground.A}, Cube3[visible:true, mode:Normal]{r:Cube3.R, g:Cube3.G, b:Cube3.B, a:Cube3.A}
I'm wondering if this formatting could be recognized by Python as more than a string. Ideally I would like to call up the properties of any one of the layers. For example:
print layers[0].mode
"Normal"
On another post someone showed me how to get the names of each layer, which was very helpful, but now I'm looking to use the other info.
PS: if it helps I don't care about any of the info inside the {}
Thanks
print type(layers)
<type 'str'>
In case you don't want to deal with regex ...
layers = "Cube1[visible:true, mode:Normal]{r:Cube1.R, g:Cube1.G, b:Cube1.B, a:Cube1.A}, Ground[visible:true, mode:Lighten, opacity:186]{r:Ground.R, g:Ground.G, b:Ground.B, a:Ground.A}, Cube3[visible:true, mode:Normal]{r:Cube3.R, g:Cube3.G, b:Cube3.B, a:Cube3.A}"

layer_dict = {}
parts = layers.split('}')
for part in parts:
    part = part.strip(', ')
    name_end = part.find('[')
    if name_end < 1:
        continue
    name = part[:name_end]
    attrs_end = part.find(']')
    attrs = part[name_end+1:attrs_end].split(', ')
    layer_dict[name] = {}
    for attr in attrs:
        attr_parts = attr.split(':')
        layer_dict[name][attr_parts[0]] = attr_parts[1]

print 'Cube1 ... mode:', layer_dict.get('Cube1').get('mode')
print 'Ground ... opacity:', layer_dict.get('Ground').get('opacity')
print 'Cube3', layer_dict.get('Cube3')
output ...
Cube1 ... mode: Normal
Ground ... opacity: 186
Cube3 {'visible': 'true', 'mode': 'Normal'}
Parsing (Pyparsing et al) is surely the correct and extensible way to go, but here's a fast-and-dirty object and constructors using regexes and comprehensions to parse properties and bolt them on with setattr(). All constructive criticisms welcome!
import re
#import string

class Layer(object):
    @classmethod
    def make_list_from_string(cls, s):
        all_layers_params = re.findall(r'(\w+)\[([^\]]+)\]', s)
        return [cls(lname, largs) for (lname, largs) in all_layers_params]

    def __init__(self, name, args):
        self.name = name
        for (larg, lval) in re.findall(r'(\w+):(\w+)(?:,\w*)?', args):
            setattr(self, larg, lval)

    def __str__(self):
        return self.name + '[' + ','.join('%s:%s' % (k, v) for k, v in self.__dict__.iteritems() if k != 'name') + ']'

    def __repr__(self):
        return self.__str__()

t = 'Cube1[visible:true, mode:Normal]{r:Cube1.R, g:Cube1.G, b:Cube1.B, a:Cube1.A}, Ground[visible:true, mode:Lighten, opacity:186]{r:Ground.R, g:Ground.G, b:Ground.B, a:Ground.A}, Cube3[visible:true, mode:Normal]{r:Cube3.R, g:Cube3.G, b:Cube3.B, a:Cube3.A}'

layers = Layer.make_list_from_string(t)
I moved all the imperative code into __init__() or the classmethod Layer.make_list_from_string().
Currently it stores all args as strings; it doesn't work out that opacity is an int/float, but that's just an extra try...except block.
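If you wanted that coercion, a rough sketch of the extra try...except (a hypothetical helper, not part of the class above) could look like this; you would then call setattr(self, larg, _coerce(lval)) in __init__():
def _coerce(value):
    # Best-effort conversion of an attribute string to int, then float.
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value   # non-numeric values like 'true' or 'Normal' stay strings

print(_coerce('186'))      # 186
print(_coerce('0.5'))      # 0.5
print(_coerce('Normal'))   # 'Normal'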
Hey, it does the job you wanted. And as a bonus it throws in mutability:
print layers[0].mode
'Normal'
print layers[1].opacity
'186'
print layers[2]
Cube3[visible:true,mode:Normal]
layers[0].mode = 'Weird'
print layers[0].mode
'Weird'
"I'm wondering if this formatting could be recognizable by Python as more then a string."
Alternatively, I was thinking if you tweaked the format a little, eval()/exec() could be used, but that's yukkier, slower and a security risk.

Django, custom template filters - regex problems

I'm trying to implement a WikiLink template filter in Django that queries the database model to give different responses depending on Page existence, identical to Wikipedia's red links. The filter does not raise an Error but instead doesn't do anything to the input.
WikiLink is defined as: [[ThisIsAWikiLink | This is the alt text]]
Here's a working example that does not query the database:
from django import template
from django.template.defaultfilters import stringfilter
from sites.wiki.models import Page
import re

register = template.Library()

@register.filter
@stringfilter
def wikilink(value):
    return re.sub(r'\[\[ ?(.*?) ?\| ?(.*?) ?\]\]', r'\2', value)

wikilink.is_safe = True
The input (value) is a multi-line string, containing HTML and many WikiLinks.
The expected output is substituting [[ThisIsAWikiLink | This is the alt text]] with
<a href="ThisIsAWikiLink">This is the alt text</a>
or, if "ThisIsAWikiLink" doesn't exist in the database:
<a href="ThisIsAWikiLink" class="redlink">This is the alt text</a>
and returning value.
Here's the non-working code (edited in response to comments/answers):
from django import template
from django.template.defaultfilters import stringfilter
from sites.wiki.models import Page
import re

register = template.Library()

@register.filter
@stringfilter
def wikilink(value):
    m = re.match(r'\[\[ ?(.*?) ?\| ?(.*?) ?\]\]', value)
    if(m):
        page_alias = m.group(2)
        page_title = m.group(3)
        try:
            page = Page.objects.get(alias=page_alias)
            return re.sub(r'(\[\[)(.*)\|(.*)(\]\])', r'\3', value)
        except Page.DoesNotExist:
            return re.sub(r'(\[\[)(.*)\|(.*)(\]\])', r'\3', value)
    else:
        return value

wikilink.is_safe = True
What the code needs to do is:
extract all the WikiLinks in value
query the Page model to see if the page exists
substitute all the WikiLinks with normal links, styled dependent on each wikipage existence.
return the altered value
The updated question is:
What regular expression (method) can return a python List of WikiLinks, which can be altered and used to substitute the original matches (after being altered).
Edit:
I'd like to do something like this:
def wikilink(value):
    regex = re.magic_method(r'\[\[ ?(.*?) ?\| ?(.*?) ?\]\]', value)
    foreach wikilink in regex:
        alias = wikilink.group(0)
        text = wikilink.group(1)
        if(alias exists in Page):
            regex.sub("<a href="+alias+">"+ text +"</a>")
        else:
            regex.sub("<a href="+alias+" class='redlink'>"+ text +"</a>")
    return value
If your string contains other text in addition to the wiki-link, your filter won't work because you are using re.match instead of re.search. re.match matches at the beginning of the string. re.search matches anywhere in the string. See matching vs. searching.
Also, your regex uses the greedy *, so it won't work if one line contains multiple wiki-links. Use *? instead to make it non-greedy:
re.search(r'\[\[(.*?)\|(.*?)\]\]', value)
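A quick illustration of the difference, using a made-up string:
import re

pattern = r'\[\[(.*?)\|(.*?)\]\]'
value = 'Some intro text [[Target|label]] and more text'

print(re.match(pattern, value))    # None: match() only matches at the start of the string
print(re.search(pattern, value))   # a match object: search() scans the whole string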
Edit:
As for tips on how to fix your code, I suggest that you use re.sub with a callback. The advantages are:
It works correctly if you have multiple wiki-links in the same line.
One pass over the string is enough. You don't need a pass to find wiki-links, and another one to do the replacement.
Here is a sketch of the implementation:
import re

WIKILINK_RE = re.compile(r'\[\[(.*?)\|(.*?)\]\]')

def wikilink(value):
    def wikilink_sub_callback(match_obj):
        alias = match_obj.group(1).strip()
        text = match_obj.group(2).strip()
        if(alias exists in Page):
            class_attr = ''
        else:
            class_attr = ' class="redlink"'
        return '<a href="%s"%s>%s</a>' % (alias, class_attr, text)
    return WIKILINK_RE.sub(wikilink_sub_callback, value)
This is the type of problem that falls quickly to a small set of unit tests.
Pieces of the filter that can be tested in isolation (with a bit of code restructuring):
Determining whether or not value contains the pattern you're looking for
What string gets generated if there is a matching Page
What string gets generated if there isn't a matching Page
That would help you isolate where things are going wrong. You'll probably find that you'll need to rewire the regexps to account for optional spaces around the |.
Also, on first glance it looks like your filter is exploitable. You're claiming the result is safe, but you haven't filtered the alt text for nasties like script tags.
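One way to close that hole (a sketch, assuming django.utils.html.escape; render_wikilink is a hypothetical helper, not part of the filter above) is to escape both user-controlled pieces before building the tag:
from django.utils.html import escape

def render_wikilink(alias, text, exists):
    # Escape the user-controlled pieces before claiming the result is safe.
    class_attr = '' if exists else ' class="redlink"'
    return '<a href="%s"%s>%s</a>' % (escape(alias), class_attr, escape(text))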
Code:
import re

def page_exists(alias):
    if alias == 'ThisIsAWikiLink':
        return True
    return False

def wikilink(value):
    if value == None:
        return None
    for alias, text in re.findall('\[\[\s*(.*?)\s*\|\s*(.*?)\s*\]\]', value):
        if page_exists(alias):
            value = re.sub('\[\[\s*%s\s*\|\s*%s\s*\]\]' % (alias, text),
                           '<a href="%s">%s</a>' % (alias, text), value)
        else:
            value = re.sub('\[\[\s*%s\s*\|\s*%s\s*\]\]' % (alias, text),
                           '<a href="%s" class="redlink">%s</a>' % (alias, text), value)
    return value
Sample results:
>>> import wikilink
>>> wikilink.wikilink(None)
>>> wikilink.wikilink('')
''
>>> wikilink.wikilink('Test')
'Test'
>>> wikilink.wikilink('[[ThisIsAWikiLink | This is the alt text]]')
'<a href="ThisIsAWikiLink">This is the alt text</a>'
>>> wikilink.wikilink('[[ThisIsABadWikiLink | This is the alt text]]')
'<a href="ThisIsABadWikiLink" class="redlink">This is the alt text</a>'
>>> wikilink.wikilink('[[ThisIsAWikiLink | This is the alt text]]\n[[ThisIsAWikiLink | This is another instance]]')
'<a href="ThisIsAWikiLink">This is the alt text</a>\n<a href="ThisIsAWikiLink">This is another instance</a>'
General comments:
findall is the magic re function you're looking for
Change page_exists to run whatever query you want
Vulnerable to HTML injection (as mentioned by Dave W. Smith above)
Having to recompile the regex on each iteration is inefficient
Querying the database each time is inefficient
I think you'd run into performance issues pretty quickly with this approach.
This is the working code in case someone needs it:
from django import template
from django.template.defaultfilters import stringfilter
from sites.wiki.models import Page
import re

register = template.Library()

@register.filter
@stringfilter
def wikilink(value):
    WIKILINK_RE = re.compile(r'\[\[ ?(.*?) ?\| ?(.*?) ?\]\]')
    def wikilink_sub_callback(match_obj):
        alias = match_obj.group(1).strip()
        text = match_obj.group(2).strip()
        class_attr = ''
        try:
            Page.objects.get(alias=alias)
        except Page.DoesNotExist:
            class_attr = ' class="redlink"'
        return '<a href="%s"%s>%s</a>' % (alias, class_attr, text)
    return WIKILINK_RE.sub(wikilink_sub_callback, value)

wikilink.is_safe = True
Many thanks for all the answers!
