Can we distinguish string/int values in a YAML file using yaml.BaseLoader? - python

I have a yaml file with the following data:
apple: 1
banana: '2'
cat: "3"
My project parses it in Python with yaml.BaseLoader, and I want to determine, using isinstance(), that "apple" is associated with an integer.
But in my case, isinstance(config['apple'], int) is False and isinstance(config['apple'], str) is True.
That makes sense, since we are using BaseLoader. Is there a way to make it recognize integers without replacing BaseLoader? The project's parsing script is used in many places.

As you've noticed, the base loader doesn't distinguish between scalar types (this behavior comes from BaseConstructor.construct_scalar).
I'm not quite sure if this is what you want, but...
There's no safe way (that wouldn't affect other libraries using BaseLoader) to add integers to BaseLoader, but if you're willing to do a single search-and-replace to replace the use of BaseLoader with something else, you can do
import re

from yaml import BaseLoader
from yaml.constructor import SafeConstructor

class OurLoader(BaseLoader):
    pass

# Build Python ints for nodes that carry the YAML int tag.
OurLoader.add_constructor(
    "tag:yaml.org,2002:int", SafeConstructor.construct_yaml_int,
)
# Borrowed from the yaml module itself:
YAML_INT_RE = re.compile(
    r"""
    ^(?:[-+]?0b[0-1_]+
    |[-+]?0[0-7_]+
    |[-+]?(?:0|[1-9][0-9_]*)
    |[-+]?0x[0-9a-fA-F_]+
    |[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+)$""",
    re.X,
)
# Tag unquoted scalars that look like integers as ints.
OurLoader.add_implicit_resolver(
    "tag:yaml.org,2002:int", YAML_INT_RE, list("-+0123456789")
)
to end up with a loader that knows integers but nothing else.

Related

Why the yaml can't load value as expected?

I use the following minimal example to explain my problem:
test.py
#! /usr/bin/python3
import jinja2
import yaml
from yaml import CSafeLoader as SafeLoader
devices = [
"usb_otg_path: 1:8",
"usb_otg_path: m1:8",
"usb_otg_path: 18",
]
for device in devices:
    template = jinja2.Template(device)
    device_template = template.render()
    print(device_template)
    obj = yaml.load(device_template, Loader=SafeLoader)
    print(obj)
The run result is:
root#pie:~# python3 test.py
usb_otg_path: 1:8
{'usb_otg_path': 68}
usb_otg_path: m1:8
{'usb_otg_path': 'm1:8'}
usb_otg_path: 18
{'usb_otg_path': 18}
As you can see, when device_template is usb_otg_path: 1:8, yaml.load turns 1:8 into 68, apparently because of the : in it. The other two inputs are fine.
The above is a simplification of a complex system in which "usb_otg_path: 1:8" is an input value that I cannot change, and yaml.load is the basic mechanism it uses to turn a string into a Python object.
Is it possible to get {'usb_otg_path': '1:8'} with some small changes (we need to upstream this to that project, so we probably can't make big changes that affect others)? For example, by changing a parameter of yaml.load, or something else?
YAML 1.1, the version PyYAML implements, allows integer scalars written as x:y:z and interprets them as "sexagesimal", that is to say: base 60.
1:8 is thus interpreted by YAML as 1*60**1 + 8*60**0, obviously giving you 68.
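The arithmetic can be reproduced in a few lines. This is a minimal sketch of the base-60 rule only: sexagesimal_to_int is a hypothetical helper name, not a PyYAML function, and it ignores the signs and underscores that the real resolver also accepts.

```python
def sexagesimal_to_int(text: str) -> int:
    """Interpret 'a:b:c' as a base-60 integer, most significant part first."""
    value = 0
    for part in text.split(":"):
        # Shift what we have so far by one base-60 digit, then add the next part.
        value = value * 60 + int(part)
    return value


print(sexagesimal_to_int("1:8"))      # 1*60 + 8 = 68
print(sexagesimal_to_int("1:30:00"))  # 1*3600 + 30*60 + 0 = 5400
```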
Notably, m1:8 is loaded as a string and 18 as a number. It sounds like you want all values as strings? Then this might be useful:
yaml.load(device_template, Loader=yaml.BaseLoader)
This disables automatic value conversion, as BaseLoader "does not resolve or support any tags and construct only basic Python objects: lists, dictionaries, and Unicode strings" (PyYAML documentation).
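A short comparison, assuming PyYAML is installed, shows how the two loaders treat the inputs from the question. The pure-Python SafeLoader is used here instead of CSafeLoader so the sketch does not depend on the C extension being available.

```python
import yaml

documents = ["usb_otg_path: 1:8", "usb_otg_path: m1:8", "usb_otg_path: 18"]

# SafeLoader applies the YAML 1.1 implicit typing rules, including base 60.
typed = [yaml.load(doc, Loader=yaml.SafeLoader) for doc in documents]

# BaseLoader leaves every scalar as a string.
untyped = [yaml.load(doc, Loader=yaml.BaseLoader) for doc in documents]

for doc, with_types, as_strings in zip(documents, typed, untyped):
    print(f"{doc!r}: SafeLoader={with_types!r} BaseLoader={as_strings!r}")
```

The trade-off is that 18 also comes back as the string '18', so any caller that relied on getting an int would need to convert it explicitly.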

Replacing strings with variables inside file in Python

I have a bunch of files with many tags inside of the form {my_var}, {some_var}, etc. I am looking to open them, and replace them with my_var and some_var that I've read into Python.
To do these sorts of things I've been using inspect.cleandoc():
import inspect, markdown
my_var='this'
some_var='that'
something=inspect.cleandoc(f'''
All my vars are {some_var} and {my_var}. This is all.
''')
print(something)
#All my vars are that and this. This is all.
But I'd like to do this by reading files file1.md and file2.md
### file1.md
There are some strings such as {my_var} and {some_var}.
Done.
### file2.md
Here there are also some vars: {some_var}, {my_var}. Also done.
Here's the Python code:
import inspect, markdown
my_var='this'
some_var='that'
def filein(file):
    with open(file, 'r') as file:
        data = file.read()
    return data
for filei in ['file1.md','file2.md']:
    fin=filein(filei)
    pre=inspect.cleandoc(f'''{fin}''')
However, the above does not evaluate the strings inside filei and replace them with this (my_var) and that (some_var), and instead keeps them as strings {my_var} and {some_var}.
What am I doing wrong?
You can use the .format method.
You can use ** to pass it a dictionary containing the variable.
So you can pass locals() or globals(), which are dictionaries of all the local and global variables.
e.g.
text = text.format(**globals())
Complete code:
my_var="this"
some_var="that"
for file in ["file1.md", "file2.md"]:
    with open(file, "r") as f:
        text = f.read()
    text = text.format(**globals())
    print(text)
f-strings are a static replacement mechanism; they're an intrinsic part of the bytecode, not a general-purpose templating mechanism.
I've no idea what you think inspect.cleandoc does, but it does not do that.
Python generally avoids magic, meaning it really doesn't give a rat's ass about your local variables unless you specifically make it, which is not the case here. Python generally works with explicitly provided dicts (mappings of some term to its replacement).
I guess what you want here is the format/format_map methods, which do apply to format strings using {} e.g.
filein(file).format(my_var=my_var, some_var=some_var)
This can be risky if the files you're reading are under the control of a third party, though: str.format allows attribute and item access on the objects you pass in, so a hostile template can reach data you never meant to expose. In that case, tools like string.Template, old-style string substitution (%) or a proper template engine might be a better idea.
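For untrusted files, string.Template is the most conservative of the options just listed. A minimal sketch follows. Note that it uses $name placeholders rather than {name}, so the files would have to be written that way (an assumption, since the question's files use braces). The template text is inlined here instead of being read from disk.

```python
from string import Template

values = {"my_var": "this", "some_var": "that"}

# Stand-in for the contents of a file, using $-style placeholders.
text = "There are some strings such as $my_var and $some_var.\nDone."

# substitute() raises KeyError for a missing name;
# safe_substitute() leaves unknown placeholders untouched.
print(Template(text).substitute(values))
print(Template("Hello $unknown").safe_substitute(values))
```

Template only does plain name lookup in the mapping, with no attribute or index access, which is what makes it the more cautious choice here.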

Python - How to make a parametrized string factory

What are good recipes or light-weight libraries to - given a specified schema/template - compile strings from parameters and parse parameters from strings?
This is especially useful when working with URIs (file paths, URLs, etc.). One would like to define the template, along with any needed value converters, and be able to:
validate if a string obeys the template's rules
produce (i.e. compile) a string given the template's parameters
extract parameters from a valid string (i.e. parse)
It seems the builtin string.Formatter has a lot of what's needed (not the parameter extraction though), but the URI compiling and parsing use case seems so common that I'd be surprised if there wasn't already a go-to library for this.
Working example
I'm looking for something that would do the following
>>> ln = LinearNaming('/home/{user}/fav/{num}.txt', # template
... format_dict={'user': '[^/]+', 'num': r'\d+'},
... process_info_dict={'num': int} # param conversion
... )
>>> ln.is_valid('/home/USER/fav/123.txt')
True
>>> ln.is_valid('/home/US/ER/fav/123.txt')
False
>>> ln.is_valid('/home/US/ER/fav/not_a_number.txt')
False
>>> ln.mk('USER', num=123) # making a string (with args or kwargs)
'/home/USER/fav/123.txt'
>>> # Note: but ln.mk('USER', num='not_a_number') would fail because num is not valid
>>> ln.info_dict('/home/USER/fav/123.txt') # note in the output, 123 is an int, not a string
{'user': 'USER', 'num': 123}
>>>
>>> ####### prefix methods #######
>>> ln.is_valid_prefix('/home/USER/fav/')
True
>>> ln.is_valid_prefix('/home/USER/fav/12')
False # too long
>>> ln.is_valid_prefix('/home/USER/fav')
False # too short
>>> ln.is_valid_prefix('/home/')
True # just right
>>> ln.is_valid_prefix('/home/USER/fav/123.txt') # full path, so output same as is_valid() method
True
>>>
>>> ln.mk_prefix('ME')
'/home/ME/fav/'
>>> ln.mk_prefix(user='YOU', num=456) # full specification, so output same as mk() method
'/home/YOU/fav/456.txt'
(The example above uses LinearNaming of https://gist.github.com/thorwhalen/7e6a967bde2a8ae4ddf8928f1c9d8ea5. Works, but the approach is ugly and doesn't use string.Formatter)
Werkzeug is a set of tools for making web applications, and it provides a lot of functionality for dealing with urls.
For example:
url = "https://mywebsite.com/?location=paris&season=winter"
from werkzeug import urls
params = urls.url_parse(url).decode_query()
params['season']  # 'winter'
Give it a look:
https://werkzeug.palletsprojects.com/en/0.14.x/urls/
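If pulling in Werkzeug only for query parsing is not desirable, the standard library covers the same case. This sketch uses urllib.parse, which is an alternative to the Werkzeug call above rather than the answer's own method. Newer Werkzeug releases have deprecated most of werkzeug.urls in favor of urllib.parse, so the snippet above applies to older versions such as the linked 0.14 documentation.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://mywebsite.com/?location=paris&season=winter"

# parse_qs maps each key to a list, because a key may repeat in a query string.
params = parse_qs(urlsplit(url).query)

print(params["season"][0])  # winter
```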
Edit: As for generic templating, another library from the flask toolset, namely Jinja, could be a good option.
f-strings are also a simple tool for the job.
Now, the specific use-case you presented is basically a combination of templating, like what you see in Jinja or f-strings, with regex for validating the variables. I don't know how something so specific could be accomplished in a simpler way than through something that will end up being equivalent to the LinearNaming library. You need to call a function (at least if you don't want to monkey patch the string class), you need to define the regex (otherwise the library cannot distinguish important characters in the template like '/'), and you need to pass the variable name/value pairs.
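To make that combination concrete, here is a minimal sketch that builds on string.Formatter, as the question hoped. PathTemplate and its method names are hypothetical, chosen to mirror the question's example; this is not the LinearNaming implementation, and it leaves out the prefix methods.

```python
import re
from string import Formatter


class PathTemplate:
    """Hypothetical helper: build, validate and parse strings from one template."""

    def __init__(self, template, patterns, converters=None):
        self.template = template
        self.converters = converters or {}
        regex = ""
        # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
        for literal, field, _spec, _conv in Formatter().parse(template):
            regex += re.escape(literal)
            if field is not None:
                regex += f"(?P<{field}>{patterns[field]})"
        self.regex = re.compile(regex)

    def is_valid(self, text):
        return self.regex.fullmatch(text) is not None

    def mk(self, **params):
        text = self.template.format(**params)
        if not self.is_valid(text):
            raise ValueError(f"parameters do not satisfy the template: {params}")
        return text

    def info_dict(self, text):
        match = self.regex.fullmatch(text)
        if match is None:
            raise ValueError(f"{text!r} does not match {self.template!r}")
        # Apply the per-field converter, defaulting to str.
        return {
            name: self.converters.get(name, str)(value)
            for name, value in match.groupdict().items()
        }


pt = PathTemplate(
    "/home/{user}/fav/{num}.txt",
    patterns={"user": "[^/]+", "num": r"\d+"},
    converters={"num": int},
)
print(pt.is_valid("/home/USER/fav/123.txt"))
print(pt.is_valid("/home/US/ER/fav/123.txt"))
print(pt.mk(user="USER", num=123))
print(pt.info_dict("/home/USER/fav/123.txt"))
```

The same template string drives both directions: str.format produces the string, and a regex derived from the template's fields validates and parses it. That keeps the two from drifting apart.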

Access python dict value in yaml with tags

Is it possible to load the value from a python dict in yaml?
I can access variable by using:
!!python/name:mymodule.myfile.myvar
but this gives the whole dict.
I tried to use the dict's get method like so:
test: &TEST !!python/object/apply:mymod.myfile.mydict.get ['mykey']
which gives me the following error:
yaml.constructor.ConstructorError: while constructing a Python object cannot find module 'mymod.myfile.mydict' (No module named 'mymod.myfile.mydict'; 'mymod.myfile' is not a package)
I'm trying to do this because I have a bunch of YAML files that define my project settings. One holds directory paths, and I need to load it into some of the other YAML files, but it looks like you can't load a YAML variable from another YAML file.
EDIT:
I have found one solution: creating my own function that returns the value from the dict and calling it like so:
test: &TEST !!python/object/apply:mymod.myfile.get_dict_value ['mykey']
There is no mechanism in YAML to refer to one document from another YAML document.
You'll have to do that by interpreting information in the document in the program that loads the initial YAML document. Whether you do that by explicit logic, or by using some tag doesn't make a practical difference.
Please be aware that it is unsafe to allow interpreting tags of the form !!python/name:... (via yaml=YAML(typ='unsafe') in ruamel.yaml, or load() with an unsafe Loader in PyYAML), and that doing so is never really necessary.

Working with Parameters containing Escaped Characters in Python Config file

I have a config file that I'm reading using the following code:
import configparser as cp
config = cp.ConfigParser()
config.read('MTXXX.ini')
MT=identify_MT(msgtext)
schema_file = config.get(MT,'kbfile')
fold_text = config.get(MT,'fold')
The relevant section of the config file looks like this:
[536]
kbfile=MT536.kb
fold=:16S:TRANSDET\n
Later I try to find text in a dictionary that matches the 'fold' parameter. I look for that text using the following function:
def test(find_text):
    return {k for k, v in dictionary.items() if find_text in v}
I get different results if I call that function in one of two ways:
test(fold_text)
Fails to find the data I want, but:
test(':16S:TRANSDET\n')
returns the results I know are there.
And, if I print the content of the dictionary, I can see that it is, as expected, shown as
:16S:TRANSDET\n
So, it matches when I enter the search text directly, but doesn't find a match when I load the same text in from a config file.
I'm guessing that there's some magic being applied here when reading/handling the \n character pattern in from the config file, but don't know how to get it to work the way I want it to.
I want to be able to parameterise using escape characters but it seems I'm blocked from doing this due to some internal mechanism.
Is there some switch I can apply to the config reader, or some extra parsing I can do to get the behavior I want? Or perhaps there's an alternate solution. I do find the configparser module convenient to use, but perhaps this is a limitation that requires an alternative, or even self-built module to lift data out of a parameter file.
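One way to see what is happening, offered as a sketch rather than a definitive fix: configparser does not interpret backslash escapes, so the value read from the file contains a literal backslash followed by n, while the Python literal ':16S:TRANSDET\n' ends in a real newline. Decoding the escapes after reading is one option. The unicode_escape step below is an assumption about what fits this project, and as written it only handles ASCII values. The config and the dictionary contents are inlined, made-up stand-ins to keep the example self-contained.

```python
import configparser

config = configparser.ConfigParser()
# "\\n" puts a literal backslash + n into the file content, as in MTXXX.ini.
config.read_string("[536]\nkbfile=MT536.kb\nfold=:16S:TRANSDET\\n\n")

raw = config.get("536", "fold")
print(repr(raw))      # backslash and 'n' are two separate characters

# Turn escape sequences such as \n into the characters they stand for.
decoded = raw.encode("ascii").decode("unicode_escape")
print(repr(decoded))  # now ends with a real newline

# Hypothetical dictionary content containing a real newline.
dictionary = {"block1": "...:16S:TRANSDET\n..."}
print({k for k, v in dictionary.items() if raw in v})      # no match
print({k for k, v in dictionary.items() if decoded in v})  # matches block1
```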
