Python string templater

I'm using this REST web service, which returns various templated strings as urls, for example:
"http://api.app.com/{foo}"
In Ruby, I can then use
url = Addressable::Template.new("http://api.app.com/{foo}").expand('foo' => 'bar')
to get
"http://api.app.com/bar"
Is there any way to do this in Python? I know about %(foo)s-style templates, but obviously they don't work here.

In Python 2.6 you can do this if you need exactly that syntax:
from string import Formatter
f = Formatter()
f.format("http://api.app.com/{foo}", foo="bar")
If you need to use an earlier Python version then you can either copy the 2.6 Formatter class or hand-roll a parser/regex to do it.

Don't use a quick hack.
What is used there (and implemented by Addressable) are URI Templates. There seem to be several libs for this in python, for example: uri-templates. described_routes_py also has a parser for them.
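For instance, with the uritemplate package from PyPI (one RFC 6570 implementation among several; the package name and exact API here are an assumption, so check whichever library you pick):

# pip install uritemplate  -- assumes the PyPI 'uritemplate' distribution
from uritemplate import expand

url = expand("http://api.app.com/{foo}", foo="bar")
print(url)  # http://api.app.com/bar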

I cannot give you a perfect solution but you could try using string.Template.
You either pre-process your incoming URL and then use string.Template directly, like
In [5]: import string, re
In [6]: url = "http://api.app.com/{foo}"
In [7]: up = string.Template(re.sub("{", "${", url))
In [8]: up.substitute({"foo": "bar"})
Out[8]: 'http://api.app.com/bar'
taking advantage of the default "${...}" syntax for replacement identifiers. Or you subclass string.Template to control the identifier pattern, like
class MyTemplate(string.Template):
delimiter = ...
pattern = ...
but I haven't figured that out.
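For completeness, here is one way that subclass can be filled in. string.Template's metaclass compiles pattern with re.VERBOSE and expects the four named groups escaped/named/braced/invalid, so a sketch matching {foo} directly could look like this (a recipe adapted to this question, not tested against every corner case):

import string

class CurlyTemplate(string.Template):
    delimiter = '{'
    # string.Template compiles this with re.VERBOSE and requires the four
    # named groups below; here both {name} forms map to an identifier.
    pattern = r'''
    \{(?:
      (?P<escaped>\{) |                  # {{ gives a literal {
      (?P<named>[_a-z][_a-z0-9]*)\} |    # {identifier}
      (?P<braced>[_a-z][_a-z0-9]*)\} |   # same as named here
      (?P<invalid>)                      # anything else is an error
    )
    '''

print(CurlyTemplate("http://api.app.com/{foo}").substitute(foo="bar"))
# http://api.app.com/bar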

Related

Python Variable For Get Request?

I am trying to move some API calls I had working in Postman over to Python, but I am having some issues making a variable usable by my next GET request. I've found a few things while searching but never a definitive answer on how to reference the environment variable in the GET request. Is it correct to use {{TEST}} to call that var? Example below.
Test = Myaccoount
Json_Response_Test = requests.get('https://thisisjustatesttoaccessmyaccount/{{Test}}')
How can I carry over Test into the request?
Your code will almost work as you have it if you use a feature of newer versions of Python called formatted string literals, or "f-strings" (available since Python 3.6). These are denoted by an f at the beginning of the string and work like this in such versions of Python:
Test = Myaccoount
Json_Response_Test = requests.get(f'https://thisisjustatesttoaccessmyaccount/{Test}')
as long as Myaccoount is a valid value that can be expanded by Python into the format string.
If you're using an older version of Python, you could do something like this:
Test = Myaccoount
Json_Response_Test = requests.get('https://thisisjustatesttoaccessmyaccount/{}'.format(Test))
BTW, it's not good form to start variable names with an uppercase letter. The convention is to reserve capitalised names for classes and types, and to use lowercase for variable and field names.

Replacing strings with variables inside file in Python

I have a bunch of files with many tags inside of the form {my_var}, {some_var}, etc. I am looking to open them, and replace them with my_var and some_var that I've read into Python.
To do these sorts of things I've been using inspect.cleandoc():
import inspect, markdown
my_var='this'
some_var='that'
something=inspect.cleandoc(f'''
All my vars are {some_var} and {my_var}. This is all.
''')
print(something)
#All my vars are that and this. This is all.
But I'd like to do this by reading files file1.md and file2.md
### file1.md
There are some strings such as {my_var} and {some_var}.
Done.
### file2.md
Here there are also some vars: {some_var}, {my_var}. Also done.
Here's the Python code:
import inspect, markdown
my_var='this'
some_var='that'
def filein(file):
    with open(file, 'r') as file:
        data = file.read()
    return data

for filei in ['file1.md', 'file2.md']:
    fin = filein(filei)
    pre = inspect.cleandoc(f'''{fin}''')
However, the above does not evaluate the strings inside filei and replace them with this (my_var) and that (some_var), and instead keeps them as strings {my_var} and {some_var}.
What am I doing wrong?
You can use the .format method.
You can use ** to pass it a dictionary containing the variables.
Therefore you can use locals() or globals(), which are dictionaries of all the local and global variables.
e.g.
text = text.format(**globals())
Complete code:
my_var="this"
some_var="that"
for file in ["file1.md", "file2.md"]:
with open(file, "r") as f:
text = f.read()
text = text.format(**globals())
print(text)
f-strings are a static replacement mechanism; they're an intrinsic part of the bytecode, not a general-purpose templating mechanism.
I've no idea what you think inspect.cleandoc does, but it does not do that.
Python generally avoids magic, meaning it really doesn't give a rat's ass about your local variables unless you specifically make it, which is not the case here. Python generally works with explicitly provided dicts (mappings of some term to its replacement).
I guess what you want here is the format/format_map methods, which do apply to format strings using {}, e.g.
filein(file).format(my_var=my_var, some_var=some_var)
This can be risky if the files you're reading are under the control of a third party though: str.format allows attribute access and thus ultimately provides tools for arbitrary code execution. In that case, tools like string.Template, old-style string substitution (%) or a proper template engine might be a better idea.
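To make the risk concrete, here is a small demonstration (the hostile template content is made up for illustration):

# The danger is in the *template*, not the values: str.format lets the
# template reach into attributes of the arguments you pass in.
hostile = "{my_var.__class__.__mro__}"
print(hostile.format(my_var="this"))   # leaks object internals

# string.Template only does plain name substitution, so the same trick
# simply isn't expressible in its syntax.
from string import Template
print(Template("$my_var").substitute(my_var="this"))  # just 'this'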

Python - How to make a parametrized string factory

What are good recipes or light-weight libraries to - given a specified schema/template - compile strings from parameters and parse parameters from strings?
This is especially useful when working with URIs (file paths, URLs, etc.). One would like to define the template, along with any needed value converters, and be able to
validate if a string obeys the template's rules
produce (i.e. compile) a string given the template's parameters
extract parameters from a valid string (i.e. parse)
It seems the builtin string.Formatter has a lot of what's needed (not the parameter extraction though), but the URI compiling and parsing use case seems so common that I'd be surprised if there wasn't already a go-to library for this.
Working example
I'm looking for something that would do the following
>>> ln = LinearNaming('/home/{user}/fav/{num}.txt', # template
... format_dict={'user': '[^/]+', 'num': r'\d+'},
... process_info_dict={'num': int} # param conversion
... )
>>> ln.is_valid('/home/USER/fav/123.txt')
True
>>> ln.is_valid('/home/US/ER/fav/123.txt')
False
>>> ln.is_valid('/home/US/ER/fav/not_a_number.txt')
False
>>> ln.mk('USER', num=123) # making a string (with args or kwargs)
'/home/USER/fav/123.txt'
>>> # Note: but ln.mk('USER', num='not_a_number') would fail because num is not valid
>>> ln.info_dict('/home/USER/fav/123.txt') # note in the output, 123 is an int, not a string
{'user': 'USER', 'num': 123}
>>>
>>> ####### prefix methods #######
>>> ln.is_valid_prefix('/home/USER/fav/')
True
>>> ln.is_valid_prefix('/home/USER/fav/12')
False # too long
>>> ln.is_valid_prefix('/home/USER/fav')
False # too short
>>> ln.is_valid_prefix('/home/')
True # just right
>>> ln.is_valid_prefix('/home/USER/fav/123.txt') # full path, so output same as is_valid() method
True
>>>
>>> ln.mk_prefix('ME')
'/home/ME/fav/'
>>> ln.mk_prefix(user='YOU', num=456) # full specification, so output same as same as mk() method
'/home/YOU/fav/456.txt'
(The example above uses LinearNaming of https://gist.github.com/thorwhalen/7e6a967bde2a8ae4ddf8928f1c9d8ea5. Works, but the approach is ugly and doesn't use string.Formatter)
Werkzeug is a set of tools for making web applications, and it provides a lot of functionality for dealing with urls.
For example:
url = "https://mywebsite.com/?location=paris&season=winter"
from werkzeug import urls
params = urls.url_parse(url).decode_query()
params['season']
>>> 'winter'
Give it a look:
https://werkzeug.palletsprojects.com/en/0.14.x/urls/
Edit: As for generic templating, another library from the flask toolset, namely Jinja, could be a good option.
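A minimal Jinja sketch (the template string is invented for illustration):

from jinja2 import Template  # pip install Jinja2

t = Template('/home/{{ user }}/fav/{{ num }}.txt')
print(t.render(user='USER', num=123))  # /home/USER/fav/123.txt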
f-strings are also a simple tool for the job.
Now, the specific use-case you presented is basically a combination of templating, like what you see in Jinja or f-strings, with regexes for validating the variables. I don't know how something so specific could be accomplished more simply than through something equivalent to the LinearNaming library. You need to call a function (at least if you don't want to monkey-patch the string class), you need to define the regexes (otherwise the library cannot distinguish significant characters in the template like '/'), and you need to pass the variable name/value pairs.
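To illustrate, here is a rough sketch of that "equivalent" machinery built on the stdlib: string.Formatter.parse walks the template, str.format compiles, and a generated regex validates and parses. The class and method names mirror the question's example but are otherwise made up:

import re
from string import Formatter

class LinearNaming(object):
    """Sketch only: compile/validate/parse strings against one template."""
    def __init__(self, template, format_dict, process_info_dict=None):
        self.template = template
        self.process = process_info_dict or {}
        # One anchored regex: literal text escaped, one named group per field.
        parts = ['^']
        for literal, field, _, _ in Formatter().parse(template):
            parts.append(re.escape(literal))
            if field is not None:
                parts.append('(?P<%s>%s)' % (field, format_dict[field]))
        parts.append('$')
        self._regex = re.compile(''.join(parts))

    def mk(self, **params):
        return self.template.format(**params)

    def is_valid(self, s):
        return self._regex.match(s) is not None

    def info_dict(self, s):
        m = self._regex.match(s)
        if m is None:
            return None
        return {k: self.process.get(k, str)(v) for k, v in m.groupdict().items()}

ln = LinearNaming('/home/{user}/fav/{num}.txt',
                  format_dict={'user': '[^/]+', 'num': r'\d+'},
                  process_info_dict={'num': int})
print(ln.mk(user='USER', num=123))              # /home/USER/fav/123.txt
print(ln.is_valid('/home/US/ER/fav/123.txt'))   # False
print(ln.info_dict('/home/USER/fav/123.txt'))   # {'user': 'USER', 'num': 123}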

recursive nested expression in Python

I am using Python 2.6.4.
I have a series of select statements in a text file and I need to extract the field names from each select query. This would be easy if some of the fields didn't use nested functions like to_char() etc.
Given select statement fields that could have several nested parentheses, like "ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name," or the simple case of just "base_field_name" as a field, is it possible to use Python's re module to write a regex to extract base_field_name? If so, what would the regex look like?
Regular expressions are not suitable for parsing "nested" structures. Try, instead, a full-fledged parsing kit such as pyparsing -- examples of using pyparsing specifically to parse SQL can be found here and here, for example (you'll no doubt need to take the examples just as a starting point, and write some parsing code of your own, but, it's definitely not too difficult).
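A rough sketch of the pyparsing route (this uses nestedExpr rather than the linked SQL examples, and the innermost-first-token rule is inferred from the question's example):

from pyparsing import nestedExpr  # pip install pyparsing

def base_field_name(expr):
    # Wrap in parens so nestedExpr can parse even a bare field name,
    # then descend to the innermost list and take its first token.
    parsed = nestedExpr().parseString('(' + expr + ')').asList()[0]
    while any(isinstance(x, list) for x in parsed):
        parsed = next(x for x in parsed if isinstance(x, list))
    return parsed[0].rstrip(',')

print(base_field_name('ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name'))
# base_field_name
print(base_field_name('base_field_name'))
# base_field_name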
>>> import re
>>> string = 'ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name'
>>> rx = re.compile(r'^(.*?\()*(.+?)(,.*?)*(,|\).*?)*$')
>>> rx.search(string).group(2)
'base_field_name'
>>> rx.search('base_field_name').group(2)
'base_field_name'
Either a table-driven parser as Alex Martelli suggests or a hand-written recursive descent parser. They're not hard and quite rewarding to write.
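For illustration, a tiny hand-written recursive-descent sketch covering just the grammar implied by the question (name, or name '(' args ')'; a made-up helper, not production code):

def parse_field(s, i=0):
    # Read an identifier starting at position i.
    j = i
    while j < len(s) and (s[j].isalnum() or s[j] == '_'):
        j += 1
    name = s[i:j]
    if j < len(s) and s[j] == '(':
        # name(...) -- recurse into the first argument, which is either
        # a bare field or another function call.
        inner, j = parse_field(s, j + 1)
        depth = 1
        while j < len(s) and depth:  # skip the remaining arguments
            if s[j] == '(':
                depth += 1
            elif s[j] == ')':
                depth -= 1
            j += 1
        return inner, j
    return name, j

print(parse_field('ltrim(rtrim(to_char(base_field_name, format)))')[0])
# base_field_name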
This may be good enough:
import re
print re.match(r".*\(([^\)]+)\)", "ltrim(to_char(field_name, format)))").group(1)
You would need to do further processing. For example pick up the function name as well and pull the field name according to function signature.
.*(\w+)\(([^\)]+)\)
Here's a really hacky parser that does what you want.
It works by calling 'eval' on the text to be parsed, mapping all identifiers to a function which returns its first argument (which I'm guessing is what you want given your example).
class FakeFunction(object):
    def __init__(self, name):
        self.name = name
    def __call__(self, *args):
        return args[0]
    def __str__(self):
        return self.name

class FakeGlobals(dict):
    def __getitem__(self, x):
        return FakeFunction(x)

def ExtractBaseFieldName(x):
    return eval(x, FakeGlobals())

print ExtractBaseFieldName('ltrim(rtrim(to_char(base_field_name, format)))')
Do you really need regular expressions? To get the one you've got up there I'd use
s[s.rfind('(')+1:s.find(')')].split(',')[0]
with 's' containing the original string.
Of course, it's not a general solution, but...

Sanitising user input using Python

What is the best way to sanitize user input for a Python-based web application? Is there a single function to remove HTML characters and any other necessary characters combinations to prevent an XSS or SQL injection attack?
Here is a snippet that will remove all tags not on the whitelist, and all tag attributes not on the attribute whitelist (so you can't use onclick).
It is a modified version of http://www.djangosnippets.org/snippets/205/, with the regex on the attribute values to prevent people from using href="javascript:...", and other cases described at http://ha.ckers.org/xss.html.
(e.g. <a href="ja&#x09;vascript:alert('hi')"> or <a href="ja vascript:alert('hi')">, etc.)
As you can see, it uses the (awesome) BeautifulSoup library.
import re
from urlparse import urljoin
from BeautifulSoup import BeautifulSoup, Comment

def sanitizeHtml(value, base_url=None):
    rjs = r'[\s]*(&#x.{1,7})?'.join(list('javascript:'))
    rvb = r'[\s]*(&#x.{1,7})?'.join(list('vbscript:'))
    re_scripts = re.compile('(%s)|(%s)' % (rjs, rvb), re.IGNORECASE)
    validTags = 'p i strong b u a h1 h2 h3 pre br img'.split()
    validAttrs = 'href src width height'.split()
    urlAttrs = 'href src'.split()  # Attributes which should have a URL
    soup = BeautifulSoup(value)
    for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
        # Get rid of comments
        comment.extract()
    for tag in soup.findAll(True):
        if tag.name not in validTags:
            tag.hidden = True
        attrs = tag.attrs
        tag.attrs = []
        for attr, val in attrs:
            if attr in validAttrs:
                val = re_scripts.sub('', val)  # Remove scripts (vbs & js)
                if attr in urlAttrs:
                    val = urljoin(base_url, val)  # Calculate the absolute url
                tag.attrs.append((attr, val))
    return soup.renderContents().decode('utf8')
As the other posters have said, pretty much all Python db libraries take care of SQL injection, so this should pretty much cover you.
Edit: bleach is a wrapper around html5lib which makes it even easier to use as a whitelist-based sanitiser.
html5lib comes with a whitelist-based HTML sanitiser - it's easy to subclass it to restrict the tags and attributes users are allowed to use on your site, and it even attempts to sanitise CSS if you're allowing use of the style attribute.
Here's how I'm using it in my Stack Overflow clone's sanitize_html utility function:
http://code.google.com/p/soclone/source/browse/trunk/soclone/utils/html.py
I've thrown all the attacks listed in ha.ckers.org's XSS Cheatsheet (which are handily available in XML format) at it after performing Markdown to HTML conversion using python-markdown2, and it seems to have held up ok.
The WMD editor component which Stackoverflow currently uses is a problem, though - I actually had to disable JavaScript in order to test the XSS Cheatsheet attacks, as pasting them all into WMD ended up giving me alert boxes and blanking out the page.
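A minimal bleach sketch (the tag and attribute lists here are illustrative, not bleach's defaults; API as of bleach 3.x/4.x):

import bleach  # pip install bleach

dirty = '<a href="javascript:alert(1)" onclick="evil()">hi</a><script>bad()</script>'
clean = bleach.clean(dirty,
                     tags=['a', 'b', 'i', 'p'],
                     attributes={'a': ['href']})
print(clean)
# <a>hi</a>&lt;script&gt;bad()&lt;/script&gt;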
The best way to prevent XSS is not to try to filter everything, but rather simply to do HTML entity encoding. For example, automatically turn < into &lt;. This is the ideal solution assuming you don't need to accept any HTML input (outside of forum/comment areas where it is used as markup, it should be pretty rare to need to accept HTML); there are so many permutations via alternate encodings that anything but an ultra-restrictive whitelist (a-z, A-Z, 0-9, for example) is going to let something through.
SQL injection, contrary to other opinion, is still possible if you are just building out a query string. For example, if you just concatenate an incoming parameter onto a query string, you will have SQL injection. The best way to protect against this is also not filtering, but rather religiously using parameterised queries and NEVER concatenating user input.
This is not to say that filtering isn't still a best practice, but in terms of SQL injection and XSS, you will be far more protected if you religiously use parameterised queries and HTML entity encoding.
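Both pieces of advice are one-liners in practice (Python 3 shown; Python 2 had cgi.escape instead of html.escape):

from html import escape
import sqlite3

# HTML entity encoding: the markup arrives as inert text.
print(escape('<script>alert("xss")</script>'))
# &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;

# Parameterised query: the driver quotes the value; never concatenate.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT)')
name = "Robert'); DROP TABLE users;--"
conn.execute('INSERT INTO users (name) VALUES (?)', (name,))  # safe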
Jeff Atwood himself described how StackOverflow.com sanitizes user input (in non-language-specific terms) on the Stack Overflow blog: https://blog.stackoverflow.com/2008/06/safe-html-and-xss/
However, as Justin points out, if you use Django templates or something similar then they probably sanitize your HTML output anyway.
SQL injection also shouldn't be a concern. All of Python's database libraries (MySQLdb, cx_Oracle, etc) always sanitize the parameters you pass. These libraries are used by all of Python's object-relational mappers (such as Django models), so you don't need to worry about sanitation there either.
I don't do web development much any longer, but when I did, I did something like so:
When no parsing is supposed to happen, I usually just escape the data so it won't interfere with the database when I store it, and escape everything I read back from the database so it won't interfere with the HTML when I display it (cgi.escape() in Python).
Chances are, if someone tried to input html characters or stuff, they actually wanted that to be displayed as text anyway. If they didn't, well tough :)
In short always escape what can affect the current target for the data.
When I did need some parsing (markup or whatever) I usually tried to keep that language in a non-intersecting set with HTML, so I could still store it suitably escaped (after validating for syntax errors) and parse it to HTML when displaying, without having to worry about the user's data interfering with my HTML.
See also Escaping HTML
If you are using a framework like django, the framework can easily do this for you using standard filters. In fact, I'm pretty sure django automatically does it unless you tell it not to.
Otherwise, I would recommend using some sort of regex validation before accepting inputs from forms. I don't think there's a silver bullet for your problem, but using the re module, you should be able to construct what you need.
To sanitize a string input which you want to store to the database (for example a customer name) you need either to escape it or plainly remove any quotes (', ") from it. This effectively prevents classical SQL injection which can happen if you are assembling an SQL query from strings passed by the user.
For example (if it is acceptable to remove quotes completely):
datasetName = datasetName.replace("'","").replace('"',"")
