How to deal with params in URL schema in Django application

How to deal with params in URL schema in Django application - python

As we know most URL schemes base their URL syntax on this nine-part general format:
<scheme>://<user>:<password>#<host>:<port>/<path>;<params>?<query>#<frag>
I know I can get the query strings in Django request.GET dict. But how to get parameters? It seems HttpRequest in Django can not deal with params defined in URL.
For example: http://www.example.com/hello;param1=aaaa;param2=bbbb
Is there any way to extract params like this in Django without writing own regex in url pattern.
Any help will be appreciated.

The path segment parameters you're talking about were defined in RFC 2396. This RFC has been obsolete since 2005. The new standard, RFC 3986, does not define a specific way to define path segment parameters. The relevant section states:
Aside from dot-segments in hierarchical paths, a path segment is
considered opaque by the generic syntax. [...] For
example, the semicolon (";") and equals ("=") reserved characters are
often used to delimit parameters and parameter values applicable to
that segment. [...] Parameter types may be defined by scheme-specific
semantics, but in most cases the syntax of a parameter is specific to
the implementation of the URI's dereferencing algorithm.
Django uses the new standard and does not define any specific way to parse path segment parameters. However, Python defines the urlparse() function, which parses the url according to RFC2396. You can use this to extract the path parameters:
from urllib.parse import urlparse # urlparse.urlparse on Python 2
params = urlparse(request.path).params

to capture params you've to define url patterns as per that.
lets assume your url is http://myurl.com/mydashboard/250?sort=-1&q=this so here 250 is param which you want.
To capture the same you've to define url like below
url(r'^mydashboard/(?P<weeknum>\d+)/$',
mydashboard, name='mydashboard'),
and now the view definition of mydashboard will have another input param like below
def nagdashboard(request, weeknum):
And to get GET params or POST params you can use.
request.GET.get('sort')
For further reading on patterns refer to here

Related

Pass a URL as parameter in Django Urls

I'm working on a Django(2) project in which I need to pass an URL as a parameter in a Django URL,
Here's what i have tried:
urls.py:
urlpatterns = [
path('admin/', admin.site.urls),
url(r'^api/(?P<address>.*)/$', PerformImgSegmentation.as_view()),
]
views.py:
class PerformImgSegmentation(generics.ListAPIView):
def get(self, request, *args, **kwargs):
img_url = self.kwargs.get('address')
print(img_url)
print('get request')
return 'Done'
But it doesn't work, I have passed an argument with the name as address via postman, but it failed.
It returns this error:
Not Found: /api/
[05/Sep/2018 15:28:06] "GET /api/ HTTP/1.1" 404 2085

Django 2.0 and later use now path func constructors to specify URLs. I'm not sure if there's still backwards compatibility; you can try that with a simple example. However if you are starting to write an app you should use path:
path('api/<str:encoded_url>/', view_action)
To avoid confusion with a stantard path to view in your app, I do not recommend using the path converter instead of the str (the former lets you match /, while the other does not).
You can get more help for transitioning from url to path with this article.
Second step, get the encoded_url as an argument in the view. You need to decode it: to pass a url inside the get url, you use ASCII encoding that substitutes certain reserved characters for others (for example, the forward slash).
You can encode and decode urls easily with urllib (there are other modules as well). For Python 3.7 syntax is as follows (docs here)
>>> urllib.parse.quote("http://www.google.com")
'http%3A//www.google.com'
>>> urllib.parse.unquote('http%3A//www.google.com')
'http://www.google.com'
Remember: if you pass the url without quoting it won't match: you are not accepting matches for slashes with that path expression. (Edit: quote method default does not convert forward slashes, for that you need to pass: quote(<str>, safe='')
So for example your GET call should look: /api/http%3A%2F%2Fwww.google.com. However it's better if you pass the URL as a get parameter and in the paths you only care aboubt readability (for example /api/name_to_my_method?url=http%3A%2F%2Fwww.google.com). Path engineering is important for readability and passing a quoted URL through is usually not ebst practice (though perfectly possible).

Django 2.0 is providing the Path Converters to convert the path parameters into appropriate types, which also includes a converter for urls, take a look at the docs.
So, your URLs can be like this:
path('api/<path:encoded_url>/', PerformImgSegmentation.as_view()),
So, the path converter will Matches any non-empty string, including the path separator, '/'. This allows you to match against a complete URL path rather than just a segment of a URL path as with str.
Then in the view, we can simply get our URL values from kwargs like this:
img_url = self.kwargs.get('encoded_url')
print(img_url)

Are you trying to pass the url https://i.imgur.com/TGJHFe1.jpg as a parameter from the django template to the view ?
You could simple write in your app's url.py:
path(api/<path:the_url_you_want_to_pass>', PerformImgSegmentation.as_view())
Please have a look at this article.

pass a multi-value as parameters in pyramid URL Dispatch (add_route)

how to configure and use multidict in pyramid.
config.add_route('show_choosed_categories', '/categories/[list]')
and generate the urls like
${request.route_url('show_choosed_categories', categories=[1, 2] )}
in view i would use
request.GET.getall('categories')
pyramid seems to support it by webob.multidict – multi-value dictionary object https://docs.pylonsproject.org/projects/webob/en/stable/api/multidict.html
but how to use it with URL Dispatch.

The route you have configured would match only on a string literal [list]. Routes cannot match on Python objects, only strings and replacement markers. From Route Pattern Syntax under URL Dispatch:
A pattern segment (an individual item between / characters in the pattern) may either be a literal string (e.g., foo) or it may be a replacement marker (e.g., {foo}), or a certain combination of both.
Nonetheless you can extract a multidict from the request object.
# Conjugation of English verbs is horrible
config.add_route('show_chosen_categories', '/categories/')
Assuming you have a list of checkboxes named the same or a select multiple input in a form, with either input being named category, then your request parameters would be generated to look like this:
category=1&category=2
Then any URL that begins with categories would be matched, and request parameters would be usable in your view, depending on your form action:
# form action="POST"
request.POST.getall('category')
# form action="GET"
request.GET.getall('category')
>>> [1, 2]
See Multidict under Request and Response Objects for further information.

You are probably looking for config.add_route('foo', '/categories/*subpath') and request.route_url('foo', subpath=(1, 2, 3). Support for this is limited but it does work if it fits your use cases. Note that an empty list is valid here so you need to handle that.

Regular expression url Django

I would like to have one or two urls in following format in django.
e.g. /view/q/node/c/programming....
Here q, c, etc are specific get params and latter on I'll be accessing them either through js or in view.
I have tried following pattern, but it didn't work out.
r'view/(P<q>\w+)/(P<c>\w+)/(P<l>\w+)/(P<o>\w+)/$'
I know that all the get params can be accessed in view if we have ? in url. But I don't want to have ? in url.
Can anyone suggest how to achieve this? Please note that the url can be without any params also. e.g. view/.
Thanks.

You're missing the ? from your Named groups. From the Django docs:
In Python regular expressions, the syntax for named regular-expression groups is
(?P<name>pattern), where name is the name of the group and pattern is some pattern to match.
Thus, referring to your example, it should be:
r'^view/(?P<q>\w+)/(?P<c>\w+)/(?P<l>\w+)/(?P<o>\w+)/$'

Pyramids route_url with additional query arguments

In Pyramids framework, functions route_path and route_url are used to generate urls from routes configuration. So, if I have route:
config.add_route('idea', 'ideas/{idea}')
I am able to generate the url for it using
request.route_url('idea', idea="great");
However, sometimes I may want to add additional get parameters to generate url like:
idea/great?sort=asc
How to do this?
I have tried
request.route_url('idea', idea='great', sort='asc')
But that didn't work.

You can add additional query arguments to url passing the _query dictionary
request.route_url('idea', idea='great', _query={'sort':'asc'})

If you are using Mako templates, _query={...} won't work; instead you need to do:
${request.route_url('idea', idea='great', _query=(('sort', 'asc'),))}
The tuple of 2-tuples works as a dictionary.

Sanitising user input using Python

What is the best way to sanitize user input for a Python-based web application? Is there a single function to remove HTML characters and any other necessary characters combinations to prevent an XSS or SQL injection attack?

Here is a snippet that will remove all tags not on the white list, and all tag attributes not on the attribues whitelist (so you can't use onclick).
It is a modified version of http://www.djangosnippets.org/snippets/205/, with the regex on the attribute values to prevent people from using href="javascript:...", and other cases described at http://ha.ckers.org/xss.html.
(e.g. <a href="ja vascript:alert('hi')"> or <a href="ja vascript:alert('hi')">, etc.)
As you can see, it uses the (awesome) BeautifulSoup library.
import re
from urlparse import urljoin
from BeautifulSoup import BeautifulSoup, Comment
def sanitizeHtml(value, base_url=None):
rjs = r'[\s]*(&#x.{1,7})?'.join(list('javascript:'))
rvb = r'[\s]*(&#x.{1,7})?'.join(list('vbscript:'))
re_scripts = re.compile('(%s)|(%s)' % (rjs, rvb), re.IGNORECASE)
validTags = 'p i strong b u a h1 h2 h3 pre br img'.split()
validAttrs = 'href src width height'.split()
urlAttrs = 'href src'.split() # Attributes which should have a URL
soup = BeautifulSoup(value)
for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
# Get rid of comments
comment.extract()
for tag in soup.findAll(True):
if tag.name not in validTags:
tag.hidden = True
attrs = tag.attrs
tag.attrs = []
for attr, val in attrs:
if attr in validAttrs:
val = re_scripts.sub('', val) # Remove scripts (vbs & js)
if attr in urlAttrs:
val = urljoin(base_url, val) # Calculate the absolute url
tag.attrs.append((attr, val))
return soup.renderContents().decode('utf8')
As the other posters have said, pretty much all Python db libraries take care of SQL injection, so this should pretty much cover you.

Edit: bleach is a wrapper around html5lib which makes it even easier to use as a whitelist-based sanitiser.
html5lib comes with a whitelist-based HTML sanitiser - it's easy to subclass it to restrict the tags and attributes users are allowed to use on your site, and it even attempts to sanitise CSS if you're allowing use of the style attribute.
Here's now I'm using it in my Stack Overflow clone's sanitize_html utility function:
http://code.google.com/p/soclone/source/browse/trunk/soclone/utils/html.py
I've thrown all the attacks listed in ha.ckers.org's XSS Cheatsheet (which are handily available in XML format at it after performing Markdown to HTML conversion using python-markdown2 and it seems to have held up ok.
The WMD editor component which Stackoverflow currently uses is a problem, though - I actually had to disable JavaScript in order to test the XSS Cheatsheet attacks, as pasting them all into WMD ended up giving me alert boxes and blanking out the page.

The best way to prevent XSS is not to try and filter everything, but rather to simply do HTML Entity encoding. For example, automatically turn < into <. This is the ideal solution assuming you don't need to accept any html input (outside of forum/comment areas where it is used as markup, it should be pretty rare to need to accept HTML); there are so many permutations via alternate encodings that anything but an ultra-restrictive whitelist (a-z,A-Z,0-9 for example) is going to let something through.
SQL Injection, contrary to other opinion, is still possible, if you are just building out a query string. For example, if you are just concatenating an incoming parameter onto a query string, you will have SQL Injection. The best way to protect against this is also not filtering, but rather to religiously use parameterized queries and NEVER concatenate user input.
This is not to say that filtering isn't still a best practice, but in terms of SQL Injection and XSS, you will be far more protected if you religiously use Parameterize Queries and HTML Entity Encoding.

Jeff Atwood himself described how StackOverflow.com sanitizes user input (in non-language-specific terms) on the Stack Overflow blog: https://blog.stackoverflow.com/2008/06/safe-html-and-xss/
However, as Justin points out, if you use Django templates or something similar then they probably sanitize your HTML output anyway.
SQL injection also shouldn't be a concern. All of Python's database libraries (MySQLdb, cx_Oracle, etc) always sanitize the parameters you pass. These libraries are used by all of Python's object-relational mappers (such as Django models), so you don't need to worry about sanitation there either.

I don't do web development much any longer, but when I did, I did something like so:
When no parsing is supposed to happen, I usually just escape the data to not interfere with the database when I store it, and escape everything I read up from the database to not interfere with html when I display it (cgi.escape() in python).
Chances are, if someone tried to input html characters or stuff, they actually wanted that to be displayed as text anyway. If they didn't, well tough :)
In short always escape what can affect the current target for the data.
When I did need some parsing (markup or whatever) I usually tried to keep that language in a non-intersecting set with html so I could still just store it suitably escaped (after validating for syntax errors) and parse it to html when displaying without having to worry about the data the user put in there interfering with your html.
See also Escaping HTML

If you are using a framework like django, the framework can easily do this for you using standard filters. In fact, I'm pretty sure django automatically does it unless you tell it not to.
Otherwise, I would recommend using some sort of regex validation before accepting inputs from forms. I don't think there's a silver bullet for your problem, but using the re module, you should be able to construct what you need.

To sanitize a string input which you want to store to the database (for example a customer name) you need either to escape it or plainly remove any quotes (', ") from it. This effectively prevents classical SQL injection which can happen if you are assembling an SQL query from strings passed by the user.
For example (if it is acceptable to remove quotes completely):
datasetName = datasetName.replace("'","").replace('"',"")

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to deal with params in URL schema in Django application - python

Related

Pass a URL as parameter in Django Urls

pass a multi-value as parameters in pyramid URL Dispatch (add_route)

Regular expression url Django

Pyramids route_url with additional query arguments

Sanitising user input using Python

Categories

Resources