Creating views in django (string indentation problems) - python

I am new to both Python (and django) - but not to programming.
I am having no end of problems with identation in my view. I am trying to generate my html dynamically, so that means a lot of string manipulation. Obviously - I cant have my entire HTML page in one line - so what is required in order to be able to dynamically build an html string, i.e. mixing strings and other variables?
For example, using PHP, the following trivial example demonstrates generating an HTML doc containing a table
<?php
$output = '<html><head><title>Getting worked up over Python indentations</title></head><body>';
output .= '<table><tbody>'
for($i=0; $i< 10; $i++){
output .= '<tr class="'.(($i%2) ? 'even' : 'odd').'"><td>Row: '.$i;
}
$output .= '</tbody></table></body></html>'
echo $output;
I am trying to do something similar in Python (in my views.py), and I get errors like:
EOL while scanning string literal (views.py, line 21)
When I put everything in a single line, it gets rid of the error.
Could someone show how the little php script above will be written in python?, so I can use that as a template to fix my view.
[Edit]
My python code looks something like this:
def just_frigging_doit(request):
html = '<html>
<head><title>What the funk<title></head>
<body>'
# try to start builing dynamic HTML from this point onward...
# but server barfs even further up, on the html var declaration line.
[Edit2]
I have added triple quotes like suggested by Ned and S.Lott, and that works fine if I want to print out static text. If I want to create dynamic html (for example a row number), I get an exception - cannot concatenate 'str' and 'int' objects.

I am trying to generate my html dynamically, so that means a lot of string manipulation.
Don't do this.
Use Django's templates. They work really, really well. If you can't figure out how to apply them, do this. Ask a question showing what you want to do. Don't ask how to make dynamic HTML. Ask about how to create whatever page feature you're trying to create. 80% of the time, a simple {%if%} or {%for%} does everything you need. The rest of the time you need to know how filters and the built-in tags work.
Use string.Template if you must fall back to "dynamic" HTML. http://docs.python.org/library/string.html#template-strings Once you try this, you'll find Django's is better.
Do not do string manipulation to create HTML.
cannot concatenate 'str' and 'int' objects.
Correct. You cannot.
You have three choices.
Convert the int to a string. Use the str() function. This doesn't scale well. You have lots of ad-hoc conversions and stuff. Unpleasant.
Use the format() method of a string to insert values into the string. This is slightly better than complex string manipulation. After doing this for a while, you figure out why templates are a good idea.
Use a template. You can try string.Template. After a while, you figure out why Django's are a good idea.
my_template.html
<html><head><title>Getting worked up over Python indentations</title></head><body>
<table><tbody>
{%for object in objects%}
<tr class="{%cycle 'even' 'odd'%}"><td>Row: {{object}}</td></tr>
{%endfor%}
</tbody></table></body></html>
views.py
def myview( request ):
render_to_response( 'my_template.html',
{ 'objects':range(10) }
)
I think that's all you'd need for a mockup.

In Python, a string can span lines if you use triple-quoting:
"""
This is a
multiline
string
"""
You probably want to use Django templates to create your HTML. Read a Django tutorial to see how it's done.
Python is strongly typed, meaning it won't automatically convert types for you to make your expressions work out, the way PHP will. So you can't concatenate strings and numbers like this: "hello" + num.

Related

What is the Python way to express a GREL line that is creating as many tags as needed in an XML document?

I'm using Open Refine to do something that I KNOW Python can do. I'm using it to convert a csv into an XML metadata document. I can figure out most of it, but the one thing that trips me up, is this GREL line:
{{forEach(cells["subjectTopicsLocal"].value.split('; '), v, '<subject authority="local"><topic>'+v.escape("xml")+'</topic></subject>')}}
What this does, is beautiful for me. I've got a "subject" field in my Excel spreadsheet. My volunteers enter keywords, separated with a "; ". I don't know how many keywords they'll come up with, and sometimes there is only one. That GREL line creates a new <subject authority="local"><topic></topic></subject> for each term created, and of course slides it into the field.
I know there has to be a Python expression that can do this. Could someone recommend best practice for this? I'd appreciate it!
Basically you want to use 'split' in Python to convert the string from your subject field into a Python list, and then you can iterate over the list.
So assuming you've read the content of the 'subject' field from a line in your csv/excel document already and assigned it to a string variable 'subj' you could do something like:
subjList = subj.split(";")
for subject in subjList:
#do what you need to do to output 'subject' in an xml element here
This Python expression is the equivalent to your GREL expression:
['<subject authority="local"><topic>'+escape(v)+'</topic></subject>') for v in split(value,'; ')]
It will create an array of XML snippets containing your subjects. It assumes that you've created or imported an appropriate escape function, such as
from xml.sax.saxutils import escape

Scraping data from a http & javaScript site

I currently want to scrape some data from an amazon page and I'm kind of stuck.
For example, lets take this page.
https://www.amazon.com/NIKE-Hyperfre3sh-Athletic-Sneakers-Shoes/dp/B01KWIUHAM/ref=sr_1_1_sspa?ie=UTF8&qid=1546731934&sr=8-1-spons&keywords=nike+shoes&psc=1
I wanted to scrape every variant of shoe size and color. That data can be found opening the source code and searching for 'variationValues'.
There we can see sort of a dictionary containing all the sizes and colors and, below that, in 'asinToDimentionIndexMap', every product code with numbers indicating the variant from the variationValues 'dictionary'.
For example, in asinToDimentionIndexMap we can see
"B01KWIUH5M":[0,0]
Which means that the product code B01KWIUH5M is associated with the size '8M US' (position 0 in variationValues size_name section) and the color 'Teal' (same idea as before)
I want to scrape both the variationValues and the asinToDimentionIndexMap, so i can associate the IndexMap numbers to the variationValues one.
Another person in the site (thanks for the help btw) suggested doing it this way.
script = response.xpath('//script/text()').extract_frist()
import re
# capture everything between {}
data = re.findall(script, '(\{.+?\}_')
import json
d = json.loads(data[0])
d['products'][0]
I can sort of understand the first part. We get everything that's a 'script' as a string and then get everything between {}. The issue is what happens after that. My knowledge of json is not that great and reading some stuff about it didn't help that much.
Is it there a way to get, from that data, 2 dictionaries or lists with the variationValues and asinToDimentionIndexMap? (maybe using some regular expressions in the middle to get some data out of a big string). Or explain a little bit what happens with the json part.
Thanks for the help!
EDIT: Added photo of variationValues and asinToDimensionIndexMap
I think you are close Manuel!
The following code will turn your scraped source into easy-to-select boxes:
import json
d = json.loads(data[0])
JSON is a universal format for storing object information. In other words, it's designed to interpret string data into object data, regardless of the platform you are working with.
https://www.w3schools.com/js/js_json_intro.asp
I'm assuming where you may be finding things a challenge is if there are any errors when accessing a particular "box" inside you json object.
Your code format looks correct, but your access within "each box" may look different.
Eg. If your 'asinToDimentionIndexMap' object is nested within a smaller box in the larger 'products' object, then you might access it like this (after running the code above):
d['products'][0]['asinToDimentionIndexMap']
I've hacked and slash a little bit so you can better understand the structure of your particular json file. Take a look at the link below. On the right-hand side, you will see "which boxes are within one another" - which is precisely what you need to know for accessing what you need.
JSON Object Viewer
For example, the following would yield "companyCompliancePolicies_feature_div":
import json
d = json.loads(data[0])
d['updateDivLists']['full'][0]['divToUpdate']
The person helping you before outlined a general case for you, but you'll need to go in an look at structure this way to truly find what you're looking for.
variationValues = re.findall(r'variationValues\" : ({.*?})', ' '.join(script))[0]
asinVariationValues = re.findall(r'asinVariationValues\" : ({.*?}})', ' '.join(script))[0]
dimensionValuesData = re.findall(r'dimensionValuesData\" : (\[.*\])', ' '.join(script))[0]
asinToDimensionIndexMap = re.findall(r'asinToDimensionIndexMap\" : ({.*})', ' '.join(script))[0]
dimensionValuesDisplayData = re.findall(r'dimensionValuesDisplayData\" : ({.*})', ' '.join(script))[0]
Now you can easily convert them to json as use them combine as you wish.

How to place the iterated string variable in another string

Quick overview: I am writing a a very simple script using Python and Selenium to view Facebook Metrics for multiple Facebook pages.
I am trying to find a clean way to loop through the pages and output their results (it's only one number that I am collecting).
Here is what I have right now but it is not working.
# Navigate to metrics page
pages = ["page_example_1", "page_example_2", "page_example_3"]
for link in pages:
browser.get(('https://www.facebook.com/{link}/insights/?section=navVideos'))
# Navigate to metrics page
pages = ["page_example_1", "page_example_2", "page_example_3"]
for link in pages:
browser.get('https://www.facebook.com/'+ link + '/insights/?section=navVideos')
its just string concatenation
or if you are so much inclined to use that syntax, have a look at the comment by #heather
It didn't work for you, because you aimed to use Python 3.6's f-strings, but forgot to prepend your string with the f char - crucial for this syntax. E.g. your code should be (only the relevant part):
browser.get(f'https://www.facebook.com/{link}/insights/?sights/?section=navVideos')
Alternatively you could use string formatting (e.g. the established approach before 3.6):
browser.get('https://www.facebook.com/{}/insights/?sights/?section=navVideos'.format(link))
In general, string concatenation - 'string1' + variable + 'string2' - is discouraged in python for performance and readability reasons.
BTW, in your sample code you had brackets around the get()'s argument - it is browser.get((arg)), which essentially turned it to a tuple, and might've caused error in the call. Not sure was it a typo or on purpose, as you can see I and the other responders have removed it.

Template strings python 2.5 error

#!/usr/bin/python
from string import Template
s = Template('$x, go home $x')
s.substitute(x='lee')
print s
error i get is
<string.Template object at 0x81abdcc>
desired results i am looking for is : lee, go home lee
You need to look at the return value of substitute. It gives you the string with substitutions performed.
print s.substitute(x='lee')
The template object itself (s) is not changed. This gives you the ability to perform multiple substitutions with the same template object.
You're not getting an error: you're getting exactly what you're asking for -- the template itself. To achieve your desired result,
print s.substitute(x='lee')
Templates, like strings, are not mutable objects: any method you call on a template (or string) can never alter that template -- it can only produce a separate result which you can use. This, of course, applies to the .substitute method. You're calling it, but ignoring the result, and then printing the template -- no doubt you expect the template itself to be somehow altered, but that's just not how it works.
print s.substitute(x='lee')

Sanitising user input using Python

What is the best way to sanitize user input for a Python-based web application? Is there a single function to remove HTML characters and any other necessary characters combinations to prevent an XSS or SQL injection attack?
Here is a snippet that will remove all tags not on the white list, and all tag attributes not on the attribues whitelist (so you can't use onclick).
It is a modified version of http://www.djangosnippets.org/snippets/205/, with the regex on the attribute values to prevent people from using href="javascript:...", and other cases described at http://ha.ckers.org/xss.html.
(e.g. <a href="ja vascript:alert('hi')"> or <a href="ja vascript:alert('hi')">, etc.)
As you can see, it uses the (awesome) BeautifulSoup library.
import re
from urlparse import urljoin
from BeautifulSoup import BeautifulSoup, Comment
def sanitizeHtml(value, base_url=None):
rjs = r'[\s]*(&#x.{1,7})?'.join(list('javascript:'))
rvb = r'[\s]*(&#x.{1,7})?'.join(list('vbscript:'))
re_scripts = re.compile('(%s)|(%s)' % (rjs, rvb), re.IGNORECASE)
validTags = 'p i strong b u a h1 h2 h3 pre br img'.split()
validAttrs = 'href src width height'.split()
urlAttrs = 'href src'.split() # Attributes which should have a URL
soup = BeautifulSoup(value)
for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
# Get rid of comments
comment.extract()
for tag in soup.findAll(True):
if tag.name not in validTags:
tag.hidden = True
attrs = tag.attrs
tag.attrs = []
for attr, val in attrs:
if attr in validAttrs:
val = re_scripts.sub('', val) # Remove scripts (vbs & js)
if attr in urlAttrs:
val = urljoin(base_url, val) # Calculate the absolute url
tag.attrs.append((attr, val))
return soup.renderContents().decode('utf8')
As the other posters have said, pretty much all Python db libraries take care of SQL injection, so this should pretty much cover you.
Edit: bleach is a wrapper around html5lib which makes it even easier to use as a whitelist-based sanitiser.
html5lib comes with a whitelist-based HTML sanitiser - it's easy to subclass it to restrict the tags and attributes users are allowed to use on your site, and it even attempts to sanitise CSS if you're allowing use of the style attribute.
Here's now I'm using it in my Stack Overflow clone's sanitize_html utility function:
http://code.google.com/p/soclone/source/browse/trunk/soclone/utils/html.py
I've thrown all the attacks listed in ha.ckers.org's XSS Cheatsheet (which are handily available in XML format at it after performing Markdown to HTML conversion using python-markdown2 and it seems to have held up ok.
The WMD editor component which Stackoverflow currently uses is a problem, though - I actually had to disable JavaScript in order to test the XSS Cheatsheet attacks, as pasting them all into WMD ended up giving me alert boxes and blanking out the page.
The best way to prevent XSS is not to try and filter everything, but rather to simply do HTML Entity encoding. For example, automatically turn < into <. This is the ideal solution assuming you don't need to accept any html input (outside of forum/comment areas where it is used as markup, it should be pretty rare to need to accept HTML); there are so many permutations via alternate encodings that anything but an ultra-restrictive whitelist (a-z,A-Z,0-9 for example) is going to let something through.
SQL Injection, contrary to other opinion, is still possible, if you are just building out a query string. For example, if you are just concatenating an incoming parameter onto a query string, you will have SQL Injection. The best way to protect against this is also not filtering, but rather to religiously use parameterized queries and NEVER concatenate user input.
This is not to say that filtering isn't still a best practice, but in terms of SQL Injection and XSS, you will be far more protected if you religiously use Parameterize Queries and HTML Entity Encoding.
Jeff Atwood himself described how StackOverflow.com sanitizes user input (in non-language-specific terms) on the Stack Overflow blog: https://blog.stackoverflow.com/2008/06/safe-html-and-xss/
However, as Justin points out, if you use Django templates or something similar then they probably sanitize your HTML output anyway.
SQL injection also shouldn't be a concern. All of Python's database libraries (MySQLdb, cx_Oracle, etc) always sanitize the parameters you pass. These libraries are used by all of Python's object-relational mappers (such as Django models), so you don't need to worry about sanitation there either.
I don't do web development much any longer, but when I did, I did something like so:
When no parsing is supposed to happen, I usually just escape the data to not interfere with the database when I store it, and escape everything I read up from the database to not interfere with html when I display it (cgi.escape() in python).
Chances are, if someone tried to input html characters or stuff, they actually wanted that to be displayed as text anyway. If they didn't, well tough :)
In short always escape what can affect the current target for the data.
When I did need some parsing (markup or whatever) I usually tried to keep that language in a non-intersecting set with html so I could still just store it suitably escaped (after validating for syntax errors) and parse it to html when displaying without having to worry about the data the user put in there interfering with your html.
See also Escaping HTML
If you are using a framework like django, the framework can easily do this for you using standard filters. In fact, I'm pretty sure django automatically does it unless you tell it not to.
Otherwise, I would recommend using some sort of regex validation before accepting inputs from forms. I don't think there's a silver bullet for your problem, but using the re module, you should be able to construct what you need.
To sanitize a string input which you want to store to the database (for example a customer name) you need either to escape it or plainly remove any quotes (', ") from it. This effectively prevents classical SQL injection which can happen if you are assembling an SQL query from strings passed by the user.
For example (if it is acceptable to remove quotes completely):
datasetName = datasetName.replace("'","").replace('"',"")

Categories

Resources