Python - How to make a parametrized string factory

What are good recipes or lightweight libraries that, given a specified schema/template, compile strings from parameters and parse parameters from strings?
This is especially useful when working with URIs (file paths, URLs, etc.). One would like to define the template, along with any needed value converters, and be able to
validate if a string obeys the template's rules
produce (i.e. compile) a string given the template's parameters
extract parameters from a valid string (i.e. parse)
It seems the builtin string.Formatter has a lot of what's needed (not the parameter extraction though), but the URI compiling and parsing use case seems so common that I'd be surprised if there wasn't already a go-to library for this.
Working example
I'm looking for something that would do the following
>>> ln = LinearNaming('/home/{user}/fav/{num}.txt',  # template
...                   format_dict={'user': '[^/]+', 'num': r'\d+'},
...                   process_info_dict={'num': int}  # param conversion
...                   )
>>> ln.is_valid('/home/USER/fav/123.txt')
True
>>> ln.is_valid('/home/US/ER/fav/123.txt')
False
>>> ln.is_valid('/home/US/ER/fav/not_a_number.txt')
False
>>> ln.mk('USER', num=123) # making a string (with args or kwargs)
'/home/USER/fav/123.txt'
>>> # Note: but ln.mk('USER', num='not_a_number') would fail because num is not valid
>>> ln.info_dict('/home/USER/fav/123.txt') # note in the output, 123 is an int, not a string
{'user': 'USER', 'num': 123}
>>>
>>> ####### prefix methods #######
>>> ln.is_valid_prefix('/home/USER/fav/')
True
>>> ln.is_valid_prefix('/home/USER/fav/12')
False # goes partway into the {num} field
>>> ln.is_valid_prefix('/home/USER/fav')
False # stops partway through the literal '/fav/'
>>> ln.is_valid_prefix('/home/')
True # ends exactly at a field boundary
>>> ln.is_valid_prefix('/home/USER/fav/123.txt') # full path, so output same as is_valid() method
True
>>>
>>> ln.mk_prefix('ME')
'/home/ME/fav/'
>>> ln.mk_prefix(user='YOU', num=456) # full specification, so output same as the mk() method
'/home/YOU/fav/456.txt'
(The example above uses LinearNaming from https://gist.github.com/thorwhalen/7e6a967bde2a8ae4ddf8928f1c9d8ea5. It works, but the approach is ugly and doesn't use string.Formatter.)

Werkzeug is a set of tools for making web applications, and it provides a lot of functionality for dealing with URLs.
For example:
>>> from werkzeug import urls
>>> url = "https://mywebsite.com/?location=paris&season=winter"
>>> params = urls.url_parse(url).decode_query()
>>> params['season']
'winter'
Give it a look:
https://werkzeug.palletsprojects.com/en/0.14.x/urls/
Edit: as for generic templating, another library from the Flask toolset, namely Jinja, could be a good option.
f-strings are also a simple tool for the job.
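By way of illustration, both as quick one-liners (assuming jinja2 is installed; the URL is just an example):
>>> from jinja2 import Template
>>> Template("http://api.app.com/{{ foo }}").render(foo="bar")
'http://api.app.com/bar'
>>> foo = "bar"
>>> f"http://api.app.com/{foo}"  # f-strings resolve names from the enclosing scope
'http://api.app.com/bar'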
Now, the specific use case you presented is basically a combination of templating, like what you see in Jinja or f-strings, with regexes for validating the variables. I don't know how something so specific could be accomplished more simply than through something that ends up being equivalent to the LinearNaming library: you need to call a function (at least if you don't want to monkey-patch the string class), you need to define the regexes (otherwise the library cannot distinguish significant characters in the template, such as '/'), and you need to pass the variable name/value pairs.
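For what it's worth, here is a minimal sketch of that combination, using string.Formatter to split the template and re for validation and extraction (StringTemplate and its method names are illustrative, not an existing library's API):

import re
from string import Formatter

class StringTemplate:
    """Compile, validate and parse strings against a '{field}' template (sketch)."""
    def __init__(self, template, field_patterns, converters=None):
        self.template = template
        self.converters = converters or {}
        # Build one regex from the template's literal parts and field patterns.
        regex = ''
        for literal, field, _, _ in Formatter().parse(template):
            regex += re.escape(literal)
            if field is not None:
                regex += '(?P<%s>%s)' % (field, field_patterns[field])
        self._regex = re.compile('^' + regex + '$')

    def is_valid(self, s):
        return self._regex.match(s) is not None

    def mk(self, **params):
        s = self.template.format(**params)
        if not self.is_valid(s):
            raise ValueError('parameters do not satisfy the field patterns')
        return s

    def info_dict(self, s):
        m = self._regex.match(s)
        if m is None:
            raise ValueError('string does not match the template')
        return {k: self.converters.get(k, str)(v) for k, v in m.groupdict().items()}

>>> t = StringTemplate('/home/{user}/fav/{num}.txt',
...                    {'user': '[^/]+', 'num': r'\d+'},
...                    {'num': int})
>>> t.is_valid('/home/USER/fav/123.txt')
True
>>> t.mk(user='USER', num=123)
'/home/USER/fav/123.txt'
>>> t.info_dict('/home/USER/fav/123.txt')
{'user': 'USER', 'num': 123}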

Related

Can we distinguish string/int values in a YAML file using yaml.BaseLoader?

I have a yaml file with the following data:
apple: 1
banana: '2'
cat: "3"
My project parses it using Python (yaml.BaseLoader), and I want to deduce that "apple" is associated with an integer, using isinstance().
But in my case, isinstance(config['apple'], int) is False and isinstance(config['apple'], str) is True.
That makes sense, since we are using BaseLoader. So is there a way to get integers without replacing BaseLoader, given that the project's parsing script is used in many places?
As you've noticed, the base loader doesn't distinguish between scalar types (the behavior is via BaseConstructor.construct_scalar).
I'm not quite sure if this is what you want, but...
There's no safe way (that wouldn't affect other libraries using BaseLoader) to add integers to BaseLoader, but if you're willing to do a single search-and-replace to replace the use of BaseLoader with something else, you can do
import re
from yaml import BaseLoader
from yaml.constructor import SafeConstructor

class OurLoader(BaseLoader):
    pass

OurLoader.add_constructor(
    "tag:yaml.org,2002:int", SafeConstructor.construct_yaml_int,
)

# Borrowed from the yaml module itself:
YAML_INT_RE = re.compile(
    r"""
    ^(?:[-+]?0b[0-1_]+
       |[-+]?0[0-7_]+
       |[-+]?(?:0|[1-9][0-9_]*)
       |[-+]?0x[0-9a-fA-F_]+
       |[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+)$""",
    re.X,
)

OurLoader.add_implicit_resolver(
    "tag:yaml.org,2002:int", YAML_INT_RE, list("-+0123456789")
)
to end up with a loader that knows integers but nothing else.
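Hypothetical usage with the question's data (assuming the snippet above has run):

import yaml

config = yaml.load("apple: 1\nbanana: '2'\ncat: \"3\"", Loader=OurLoader)
print(isinstance(config['apple'], int))   # True: plain integer scalars now resolve to int
print(isinstance(config['banana'], str))  # True: quoted scalars remain strings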

Why can't yaml load the value as expected?

I'll use the following minimal example to explain my problem:
test.py
#! /usr/bin/python3
import jinja2
import yaml
from yaml import CSafeLoader as SafeLoader
devices = [
    "usb_otg_path: 1:8",
    "usb_otg_path: m1:8",
    "usb_otg_path: 18",
]

for device in devices:
    template = jinja2.Template(device)
    device_template = template.render()
    print(device_template)
    obj = yaml.load(device_template, Loader=SafeLoader)
    print(obj)
The run result is:
root@pie:~# python3 test.py
usb_otg_path: 1:8
{'usb_otg_path': 68}
usb_otg_path: m1:8
{'usb_otg_path': 'm1:8'}
usb_otg_path: 18
{'usb_otg_path': 18}
You can see that when the value of device_template is usb_otg_path: 1:8, then after yaml.load, the 1:8 becomes 68, apparently because of the : in it. The other two inputs are fine.
The above is a simplification of a complex system, in which "usb_otg_path: 1:8" is an input value I cannot change, and yaml.load is the basic mechanism it uses to turn a string into a Python object.
Is it possible for me to get {'usb_otg_path': '1:8'} with some small change (we need to upstream it to that project, so we can't make big changes that affect others)? Something like changing a parameter of yaml.load, or something else?
YAML allows numerical literals (scalars) formatted as x:y:z and interprets them as "sexagesimal," that is to say: base 60.
1:8 is thus interpreted by YAML as 1*60**1 + 8*60**0, obviously giving you 68.
Notably, you also get m1:8 as a string and 18 as a number. It sounds like you want everything as strings? This might be useful:
yaml.load(device_template, Loader=yaml.BaseLoader)
This disables automatic value conversion, since BaseLoader "does not resolve or support any tags and construct only basic Python objects: lists, dictionaries, and Unicode strings" (quoting the PyYAML documentation).
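Applied to the question's inputs, a quick check (reusing the variables from test.py):

for device in devices:
    print(yaml.load(device, Loader=yaml.BaseLoader))
# {'usb_otg_path': '1:8'}
# {'usb_otg_path': 'm1:8'}
# {'usb_otg_path': '18'}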

How to parse and then unparse a url query string so that it ends up in the same format/encoding as before?

Is there a way to take a URL, parse it to get the query, edit the query with Python, then remake the URL so that it's in exactly the same format (same encoding, etc.)? Here is what I have tried using urllib functions:
>>> working_url
'https://<some-netloc>/reports/sales-order-history?page=&sort_direction=&sort_column=&filter%5Bsearch%5D=&filter%5Bofficial%5D%5B0%5D%5Bname%5D=status&filter%5Bofficial%5D%5B0%5D%5Bvalue%5D=Pending%2CProcessing%2CReady%20to%20ship%2CDelivering%2CDelivered%2CCompleted&filter%5Bofficial%5D%5B1%5D%5Bname%5D=orderDate&filter%5Bofficial%5D%5B1%5D%5Bvalue%5D=2020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z'
>>> working_parse = urlparse(working_url)
>>> working_parse
ParseResult(scheme='https', netloc='<some-netloc>', path='/reports/sales-order-history', params='', query='page=&sort_direction=&sort_column=&filter%5Bsearch%5D=&filter%5Bofficial%5D%5B0%5D%5Bname%5D=status&filter%5Bofficial%5D%5B0%5D%5Bvalue%5D=Pending%2CProcessing%2CReady%20to%20ship%2CDelivering%2CDelivered%2CCompleted&filter%5Bofficial%5D%5B1%5D%5Bname%5D=orderDate&filter%5Bofficial%5D%5B1%5D%5Bvalue%5D=2020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z', fragment='')
>>> working_query_dict = parse_qs(working_parse.query)
Here is where I would edit working_query_dict, to change those timestamps for instance. Then I use urlencode to encode the dictionary again and urlunparse to turn it back into a real working URL.
>>> working_query_dict
{'filter[official][0][name]': ['status'], 'filter[official][0][value]': ['Pending,Processing,Ready to ship,Delivering,Delivered,Completed'], 'filter[official][1][name]': ['orderDate'], 'filter[official][1][value]': ['2020-05-10T07:00:00.000Z,2020-05-18T06:59:59.999Z']}
>>> urlunparse((working_parse.scheme,working_parse.netloc,working_parse.path,working_parse.params,urlencode(working_query_dict),working_parse.fragment))
'https://<some-net-loc>/reports/sales-order-history?filter%5Bofficial%5D%5B0%5D%5Bname%5D=%5B%27status%27%5D&filter%5Bofficial%5D%5B0%5D%5Bvalue%5D=%5B%27Pending%2CProcessing%2CReady+to+ship%2CDelivering%2CDelivered%2CCompleted%27%5D&filter%5Bofficial%5D%5B1%5D%5Bname%5D=%5B%27orderDate%27%5D&filter%5Bofficial%5D%5B1%5D%5Bvalue%5D=%5B%272020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z%27%5D'
However, the URL that gets formed doesn't work: it doesn't resolve to the same place on the website. Even looking at it, you can tell it's changed, even though I changed no attributes at all.
I'm thinking maybe I need to detect the encoding or format when doing parse_qs, and then use that format when doing urlencode? How can I do this?
OK, the key is the urlencode flag quote_via=urllib.parse.quote. Additionally, parse_qs can be changed to parse_qsl in order to preserve the ordering of parameters, and passing keep_blank_values=True to that function keeps even the blank parameters if you want an absolutely true match.
So now this works for me:
>>> from urllib.parse import quote, parse_qsl,urlencode
>>> urlencode(parse_qsl(working_parse.query,keep_blank_values=True),quote_via=quote) == working_parse.query
True
It takes a complicated query (whose attributes you could edit if you want), parses it out, and urlencodes it back to the original query string.
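A self-contained round trip along those lines (a sketch with a made-up URL; the edit step is hypothetical):

from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse, quote

url = "https://example.com/reports?page=&sort=&filter%5Ba%5D=x%2Cy"
parts = urlparse(url)
params = parse_qsl(parts.query, keep_blank_values=True)  # list of (key, value) pairs, blanks kept
params = [(k, "z" if k == "filter[a]" else v) for k, v in params]  # edit one value
new_query = urlencode(params, quote_via=quote)  # quote (not quote_plus) keeps %20-style escapes
print(urlunparse(parts._replace(query=new_query)))
# https://example.com/reports?page=&sort=&filter%5Ba%5D=z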

recursive nested expression in Python

I am using Python 2.6.4.
I have a series of select statements in a text file and I need to extract the field names from each select query. This would be easy if some of the fields didn't use nested functions like to_char() etc.
Given select statement fields that could have several nested parentheses, like "ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name", or the simple case of just "base_field_name" as a field, is it possible to use Python's re module to write a regex to extract base_field_name? If so, what would the regex look like?
Regular expressions are not suitable for parsing "nested" structures. Try, instead, a full-fledged parsing kit such as pyparsing; examples of using pyparsing specifically to parse SQL can be found here and here. You'll no doubt need to take the examples just as a starting point and write some parsing code of your own, but it's definitely not too difficult.
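As a tiny illustration (a sketch assuming the pyparsing package; nestedExpr turns balanced parentheses into nested Python lists):

import pyparsing as pp

s = "ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name"
for tokens, _, _ in pp.nestedExpr("(", ")").scanString(s):
    print(tokens.asList())
# nested lists mirror the parenthesisation; note that the default
# whitespace-based tokenizer leaves the comma attached ('base_field_name,')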
>>> import re
>>> string = 'ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name'
>>> rx = re.compile(r'^(.*?\()*(.+?)(,.*?)*(,|\).*?)*$')
>>> rx.search(string).group(2)
'base_field_name'
>>> rx.search('base_field_name').group(2)
'base_field_name'
Either a table-driven parser as Alex Martelli suggests or a hand-written recursive descent parser. They're not hard and quite rewarding to write.
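For this particular grammar, a hand-written version can be quite small. A sketch (base_field and its tokenizer are illustrative, not a library API):

import re

def base_field(expr):
    # grammar: expr := NAME [ '(' expr (',' expr)* ')' ]
    tokens = re.findall(r"[A-Za-z_]\w*|[(),]", expr)

    def parse(i):
        name = tokens[i]
        i += 1
        if i < len(tokens) and tokens[i] == '(':
            inner, i = parse(i + 1)  # the first argument carries the base field
            depth = 1
            while depth:             # skip the remaining arguments
                if tokens[i] == '(':
                    depth += 1
                elif tokens[i] == ')':
                    depth -= 1
                i += 1
            return inner, i
        return name, i

    return parse(0)[0]

print(base_field('ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name'))
# base_field_name
print(base_field('base_field_name'))
# base_field_name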
This may be good enough:
import re
print re.match(r".*\(([^\)]+)\)", "ltrim(rtrim(to_char(field_name, format)))").group(1)
You would need to do further processing, for example picking up the function name as well and pulling the field name according to the function signature:
.*(\w+)\(([^\)]+)\)
Here's a really hacky parser that does what you want.
It works by calling eval on the text to be parsed, mapping all identifiers to a function which returns its first argument (which I'm guessing is what you want, given your example).
class FakeFunction(object):
    def __init__(self, name):
        self.name = name
    def __call__(self, *args):
        return args[0]
    def __str__(self):
        return self.name

class FakeGlobals(dict):
    def __getitem__(self, x):
        return FakeFunction(x)

def ExtractBaseFieldName(x):
    return eval(x, FakeGlobals())

print ExtractBaseFieldName('ltrim(rtrim(to_char(base_field_name, format)))')
Do you really need regular expressions? To get the one you've got up there I'd use
s[s.rfind('(')+1:s.find(')')].split(',')[0]
with 's' containing the original string.
Of course, it's not a general solution, but...

Python string templater

I'm using this REST web service, which returns various templated strings as urls, for example:
"http://api.app.com/{foo}"
In Ruby, I can then use
url = Addressable::Template.new("http://api.app.com/{foo}").expand('foo' => 'bar')
to get
"http://api.app.com/bar"
Is there any way to do this in Python? I know about %() templates, but obviously they won't work here.
In Python 2.6 you can do this, if you need exactly that syntax:
from string import Formatter
f = Formatter()
f.format("http://api.app.com/{foo}", foo="bar")
If you need to support an earlier Python version, you can either copy the 2.6 Formatter class or hand-roll a parser/regex to do it.
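A hand-rolled fallback might look like this (a sketch; expand is a made-up helper that assumes plain {name} fields with no escaping):

import re

def expand(template, **params):
    return re.sub(r"\{(\w+)\}", lambda m: str(params[m.group(1)]), template)

print(expand("http://api.app.com/{foo}", foo="bar"))  # http://api.app.com/bar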
Don't use a quick hack.
What is used there (and implemented by Addressable) are URI Templates. There seem to be several libs for this in Python, for example uri-templates; described_routes_py also has a parser for them.
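With the uritemplate package from PyPI, for instance (one of several implementations, named here as an assumed example), the Ruby one-liner translates to:

from uritemplate import URITemplate

print(URITemplate("http://api.app.com/{foo}").expand(foo="bar"))
# http://api.app.com/bar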
I cannot give you a perfect solution but you could try using string.Template.
You either pre-process your incoming URL and then use string.Template directly, like
In [6]: url="http://api.app.com/{foo}"
In [7]: up=string.Template(re.sub("{", "${", url))
In [8]: up.substitute({"foo":"bar"})
Out[8]: 'http://api.app.com/bar'
taking advantage of the default "${...}" syntax for replacement identifiers. Or you subclass string.Template to control the identifier pattern, like
class MyTemplate(string.Template):
    delimiter = ...
    pattern = ...
but I haven't figured that out.
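That said, a subclass along these lines appears to work (a sketch; the four group names are required by string.Template's matching machinery, and the pattern string is compiled with re.VERBOSE, so the comments are ignored):

import string

class CurlyTemplate(string.Template):
    delimiter = '{'  # used when un-escaping '{{'
    pattern = r"""
    \{(?:
      (?P<escaped>\{)                |  # '{{' yields a literal '{'
      (?P<named>[_a-z][_a-z0-9]*)\}  |  # {foo}
      (?P<braced>[_a-z][_a-z0-9]*)\} |  # same as named; the group must exist
      (?P<invalid>)                     # any other '{' is an error
    )
    """

print(CurlyTemplate("http://api.app.com/{foo}").substitute(foo="bar"))
# http://api.app.com/bar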
