I am using bottle and peewee as the framework in back end, sqlite3 as the DB. And the summernote is the text editor at the front. Succeed in saving the code into DB, but failed to display the text when retrieving from the DB.
DB data:
The Draft column is the html
Source code:
Front:
$('#summernote').summernote("code", "{{content}}");
Backend:
template('apps_post_editer', appName='Post New', pid=newPost.id, title=('' if newPost.title is None else str(newPost.title)), content=('' if newPost.draft is None else unicode(str(newPost.draft), "utf-8")))
I thought it was the coding problem at the beginning, so i use unicode to turn the value in utf-8, but does not work. And also failed only str(newPost.draft)
The result:
You can see that the html code is not converted
Question:
Why it happens like that? Is there any solution?
Thanks very much.
Update: sorry it is my first question, don't know why the picture don't display...Please click the link to get more details...
OK...need 10 reputation
When you want to render HTML that comes from the database with bottle, you have to tell the rendering engine that the content is safe to render in order to avoid XSS attacks.
With bootle you can disable escaping for expressions like this:
{{! summernotecontent}}
in your case that would be:
$('#summernote').summernote("code", "{{! content}}");
You can find the documentation on this topic in bottle here
Related
(The title may be in error here, but I believe that the problem is related to escaping characters)
I'm using webpy to create a VERY simple todo list using peewee with Sqlite to store simple, user submitted todo list items, such as "do my taxes" or "don't forget to interact with people", etc.
What I've noticed is that the DELETE request fails on certain inputs that contain specific symbols. For example, while I can add the following entries to my Sqlite database that contains all the user input, I cannot DELETE them:
what?
test#
test & test
This is a test?
Any other user input with any other symbols I'm able to DELETE with no issues. Here's the webpy error message I get in the browser when I try to DELETE the inputs list above:
<class 'peewee.UserInfoDoesNotExist'> at /del/test
Instance matching query does not exist: SQL: SELECT "t1"."id", "t1"."title" FROM "userinfo" AS t1 WHERE ("t1"."title" = ?) PARAMS: [u'test']
Python /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/peewee.py in get, line 2598
Web POST http://0.0.0.0:7700/del/test
When I view the database file (called todoUserList.db) in sqlitebrowser, I can see that these entries do exist with the symbols, they're all there.
In my main webpy app script, I'm using a regex to search through the db to make a DELETE request, it looks like this:
urls = (
'/', 'Index',
'/del/(.*?)', 'Delete'
)
I've tried variations of the regex, such as '/del/(.*)', but still get the same error, so I don't think that's the problem.
Given the error message above, is webpy not "seeing" certain symbols in the user input because they're not being escaped properly?
Confused as to why it seems to only happen with the select symbols listed above.
Depending on how the URL escaping is functioning it could be an issue in particular with how "?" and "&" are interpreted by the browser (in a typical GET style request & and ? are special character used to separate query string parameters)
Instead of passing those in as part of the URL itself you should pass them in as an escaped querystring. As far as I know, no web server is going to respect wacky values like that as part of a URL. If they are escaped and put in the querystring (or POST body) you'll be fine, though.
I have a server as app and all works ok except when I save ajax forms. If I save from python script - with right input - data are returned as unicode. But the data from js is strange: on pipe should be only bytes(that's the only data type http knows) , but bottle show me str (it is not utf-8) and I can't encode/decode to get correct value. On js side I try with jquery and form.serialise, works for other frameworks.
#post('/agt/save')
def saveagt():
a = Agent({x: request.forms.get(x) for x in request.forms})
print(a.nume, a.nume.encode())
return {'ret': ags.add(a)}
... and a name like „țânțar” become „ÈânÈar”.
It may be a simple problem, but I think I didn't drink enough coffee yet.
If anyone is curious, bottle don't handle corect the url.
So urllib.parse.unquote(request.body.read().decode()) solve problem.
or
d = urllib.parse.parse_qs(request.body.read().decode())
a = Agent({x: d[x][0] for x in d})
in my case.
Is it a bug of bottle? Or should I tell him to decode URI and I don't know how?
Use
request.forms.getunicode('some_form_field_name')
as shorthand, if you want to get around the character conversion to latin-1.
I am trying to migrate a forum to phpbb3 with python/xpath. Although I am pretty new to python and xpath, it is going well. However, I need help with an error.
(The source file has been downloaded and processed with tagsoup.)
Firefox/Firebug show xpath: /html/body/table[5]/tbody/tr[position()>1]/td/a[3]/b
(in my script without tbody)
Here is an abbreviated version of my code:
forumfile="morethread-alte-korken-fruchtweinkeller-89069-6046822-0.html"
XPOSTS = "/html/body/table[5]/tr[position()>1]"
t = etree.parse(forumfile)
allposts = t.xpath(XPOSTS)
XUSER = "td[1]/a[3]/b"
XREG = "td/span"
XTIME = "td[2]/table/tr/td[1]/span"
XTEXT = "td[2]/p"
XSIG = "td[2]/i"
XAVAT = "td/img[last()]"
XPOSTITEL = "/html/body/table[3]/tr/td/table/tr/td/div/h3"
XSUBF = "/html/body/table[3]/tr/td/table/tr/td/div/strong[position()=1]"
for p in allposts:
unreg=0
username = None
username = p.find(XUSER).text #this is where it goes haywire
When the loop hits user "tompson" / position()=11 at the end of the file, I get
AttributeError: 'NoneType' object has no attribute 'text'
I've tried a lot of try except else finallys, but they weren't helpful.
I am getting much more information later in the script such as date of post, date of user registry, the url and attributes of the avatar, the content of the post...
The script works for hundreds of other files/sites of this forum.
This is no encode/decode problem. And it is not "limited" to the XUSER part. I tried to "hardcode" the username, then the date of registry will fail. If I skip those, the text of the post (code see below) will fail...
#text of getpost
text = etree.tostring(p.find(XTEXT),pretty_print=True)
Now, this whole error would make sense if my xpath would be wrong. However, all the other files and the first numbers of users in this file work. it is only this "one" at position()=11
Is position() uncapable of going >10 ? I don't think so?
Am I missing something?
Question answered!
I have found the answer...
I must have been very tired when I tried to fix it and came here to ask for help. I did not see something quite obvious...
The way I posted my problem, it was not visible either.
the HTML I downloaded and processed with tagsoup had an additional tag at position 11... this was not visible on the website and screwed with my xpath
(It probably is crappy html generated by the forum in combination with tagsoups attempt to make it parseable)
out of >20000 files less than 20 are afflicted, this one here just happened to be the first...
additionally sometimes the information is in table[4], other times in table[5]. I did account for this and wrote a function that will determine the correct table. Although I tested the function a LOT and thought it working correctly (hence did not inlcude it above), it did not.
So I made a better xpath:
'/html/body/table[tr/td[#width="20%"]]/tr[position()>1]'
and, although this is not related, I ran into another problem with unxpected encoding in the html file (not utf-8) which was fixed by adding:
parser = etree.XMLParser(encoding='ISO-8859-15')
t = etree.parse(forumfile, parser)
I am now confident that after adjusting for strange additional and multiple , and tags my code will work on all files...
Still I will be looking into lxml.html, as I mentioned in the comment, I have never used it before, but if it is more robust and may allow for using the files without tagsoup, it might be a better fit and save me extensive try/except statements and loops to fix the few files screwing with my current script...
I'm trying to learn to use Python to create dynamic web content. Problem I'm having right out the door, though, is that when I try to do a mySQL query, absolutely nothing happens. There's no error message... it looks like the script simply stops running when I import the module that enables connection to the database.
This does do exactly what I'd expect when I try to run it from the command line.
#!/usr/bin/python
print "Content-Type: text/xml"
print
#if I type, for example, print "<b>test</b>" here, it appears in the browser window
#msql contains the credentials for connecting to database
#it is NOT in public_html
import msql
#no print instructions after this point are followed
connex=msql.msqlConn()
db=msql.MySQLdb
cursor=connex.cursor(db.cursors.DictCursor)
cursor.execute("SELECT * FROM userActions")
#run the query
xmlOutput=""
rows=cursor.fetchall()
#output the results
for row in rows:
xmlOutput+="<action>"
xmlOutput+="<actionId>"+str(row["actionId"])+"</actionId>"
xmlOutput+="<userId>"+str(row["userId"])+"</userId>"
xmlOutput+="<actText>"+str(row["action"])+"</actText>"
xmlOutput+="<date>"+str(row["dateStamp"])+"</date>"
xmlOutput+="</action>"
xmlOutput="<list>"+xmlOutput+"</list>"
print xmlOutput
This would be my first stab at this, so it merely stands to reason that this should work. I've found nothing online that would suggest otherwise, though.
Please, take 24h to learn something like Django. Django has an ORM and a XML serializer that will make your life easier ensuring proper (and legal) xml, it really pays up.
You can enable cgi traceback to see what happens:
import cgitb
cgitb.enable()
(I don't think it causes your problem, and clients understand \n as well but it is better to use sys.write("...\r\n") instead of print to print HTTP headers)
Edited: try to add Content-length: XXXX\r\n to the header.
What is the best way to sanitize user input for a Python-based web application? Is there a single function to remove HTML characters and any other necessary characters combinations to prevent an XSS or SQL injection attack?
Here is a snippet that will remove all tags not on the white list, and all tag attributes not on the attribues whitelist (so you can't use onclick).
It is a modified version of http://www.djangosnippets.org/snippets/205/, with the regex on the attribute values to prevent people from using href="javascript:...", and other cases described at http://ha.ckers.org/xss.html.
(e.g. <a href="ja vascript:alert('hi')"> or <a href="ja vascript:alert('hi')">, etc.)
As you can see, it uses the (awesome) BeautifulSoup library.
import re
from urlparse import urljoin
from BeautifulSoup import BeautifulSoup, Comment
def sanitizeHtml(value, base_url=None):
rjs = r'[\s]*(&#x.{1,7})?'.join(list('javascript:'))
rvb = r'[\s]*(&#x.{1,7})?'.join(list('vbscript:'))
re_scripts = re.compile('(%s)|(%s)' % (rjs, rvb), re.IGNORECASE)
validTags = 'p i strong b u a h1 h2 h3 pre br img'.split()
validAttrs = 'href src width height'.split()
urlAttrs = 'href src'.split() # Attributes which should have a URL
soup = BeautifulSoup(value)
for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
# Get rid of comments
comment.extract()
for tag in soup.findAll(True):
if tag.name not in validTags:
tag.hidden = True
attrs = tag.attrs
tag.attrs = []
for attr, val in attrs:
if attr in validAttrs:
val = re_scripts.sub('', val) # Remove scripts (vbs & js)
if attr in urlAttrs:
val = urljoin(base_url, val) # Calculate the absolute url
tag.attrs.append((attr, val))
return soup.renderContents().decode('utf8')
As the other posters have said, pretty much all Python db libraries take care of SQL injection, so this should pretty much cover you.
Edit: bleach is a wrapper around html5lib which makes it even easier to use as a whitelist-based sanitiser.
html5lib comes with a whitelist-based HTML sanitiser - it's easy to subclass it to restrict the tags and attributes users are allowed to use on your site, and it even attempts to sanitise CSS if you're allowing use of the style attribute.
Here's now I'm using it in my Stack Overflow clone's sanitize_html utility function:
http://code.google.com/p/soclone/source/browse/trunk/soclone/utils/html.py
I've thrown all the attacks listed in ha.ckers.org's XSS Cheatsheet (which are handily available in XML format at it after performing Markdown to HTML conversion using python-markdown2 and it seems to have held up ok.
The WMD editor component which Stackoverflow currently uses is a problem, though - I actually had to disable JavaScript in order to test the XSS Cheatsheet attacks, as pasting them all into WMD ended up giving me alert boxes and blanking out the page.
The best way to prevent XSS is not to try and filter everything, but rather to simply do HTML Entity encoding. For example, automatically turn < into <. This is the ideal solution assuming you don't need to accept any html input (outside of forum/comment areas where it is used as markup, it should be pretty rare to need to accept HTML); there are so many permutations via alternate encodings that anything but an ultra-restrictive whitelist (a-z,A-Z,0-9 for example) is going to let something through.
SQL Injection, contrary to other opinion, is still possible, if you are just building out a query string. For example, if you are just concatenating an incoming parameter onto a query string, you will have SQL Injection. The best way to protect against this is also not filtering, but rather to religiously use parameterized queries and NEVER concatenate user input.
This is not to say that filtering isn't still a best practice, but in terms of SQL Injection and XSS, you will be far more protected if you religiously use Parameterize Queries and HTML Entity Encoding.
Jeff Atwood himself described how StackOverflow.com sanitizes user input (in non-language-specific terms) on the Stack Overflow blog: https://blog.stackoverflow.com/2008/06/safe-html-and-xss/
However, as Justin points out, if you use Django templates or something similar then they probably sanitize your HTML output anyway.
SQL injection also shouldn't be a concern. All of Python's database libraries (MySQLdb, cx_Oracle, etc) always sanitize the parameters you pass. These libraries are used by all of Python's object-relational mappers (such as Django models), so you don't need to worry about sanitation there either.
I don't do web development much any longer, but when I did, I did something like so:
When no parsing is supposed to happen, I usually just escape the data to not interfere with the database when I store it, and escape everything I read up from the database to not interfere with html when I display it (cgi.escape() in python).
Chances are, if someone tried to input html characters or stuff, they actually wanted that to be displayed as text anyway. If they didn't, well tough :)
In short always escape what can affect the current target for the data.
When I did need some parsing (markup or whatever) I usually tried to keep that language in a non-intersecting set with html so I could still just store it suitably escaped (after validating for syntax errors) and parse it to html when displaying without having to worry about the data the user put in there interfering with your html.
See also Escaping HTML
If you are using a framework like django, the framework can easily do this for you using standard filters. In fact, I'm pretty sure django automatically does it unless you tell it not to.
Otherwise, I would recommend using some sort of regex validation before accepting inputs from forms. I don't think there's a silver bullet for your problem, but using the re module, you should be able to construct what you need.
To sanitize a string input which you want to store to the database (for example a customer name) you need either to escape it or plainly remove any quotes (', ") from it. This effectively prevents classical SQL injection which can happen if you are assembling an SQL query from strings passed by the user.
For example (if it is acceptable to remove quotes completely):
datasetName = datasetName.replace("'","").replace('"',"")