Working with Parameters containing Escaped Characters in Python Config file

I have a config file that I'm reading using the following code:
import configparser as cp
config = cp.ConfigParser()
config.read('MTXXX.ini')
MT=identify_MT(msgtext)
schema_file = config.get(MT,'kbfile')
fold_text = config.get(MT,'fold')
The relevant section of the config file looks like this:
[536]
kbfile=MT536.kb
fold=:16S:TRANSDET\n
Later I search a dictionary for values that contain the 'fold' parameter, using the following function:
def test(find_text):
    return {k for k, v in dictionary.items() if find_text in v}
I get different results if I call that function in one of two ways:
test(fold_text)
Fails to find the data I want, but:
test(':16S:TRANSDET\n')
returns the results I know are there.
And, if I print the content of the dictionary, I can see that it is, as expected, shown as
:16S:TRANSDET\n
So, it matches when I enter the search text directly, but doesn't find a match when I load the same text in from a config file.
I'm guessing that there's some magic being applied here when reading/handling the \n character pattern in from the config file, but don't know how to get it to work the way I want it to.
I want to be able to parameterise using escape characters but it seems I'm blocked from doing this due to some internal mechanism.
Is there some switch I can apply to the config reader, or some extra parsing I can do to get the behavior I want? Or perhaps there's an alternate solution. I do find the configparser module convenient to use, but perhaps this is a limitation that requires an alternative, or even self-built module to lift data out of a parameter file.
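
One likely explanation, offered as a sketch rather than a confirmed diagnosis of the asker's setup: configparser does not interpret backslash escapes, so the value read from the file ends with the two characters backslash and n, while the Python literal ':16S:TRANSDET\n' ends with a single newline character. The example below assumes that only \n needs translating. The in-memory INI text, the sample dictionary and the decode_newlines helper are all hypothetical names introduced for illustration.

```python
import configparser

# Hypothetical in-memory copy of the config section from the question.
# The raw string keeps "\n" as two characters, just as it is stored on disk.
INI_TEXT = r"""
[536]
kbfile=MT536.kb
fold=:16S:TRANSDET\n
"""


def decode_newlines(value):
    """Turn the two-character sequence backslash + n into a real newline."""
    return value.replace("\\n", "\n")


config = configparser.ConfigParser()
config.read_string(INI_TEXT)

raw_fold = config.get("536", "fold")   # still ends with backslash + n
fold_text = decode_newlines(raw_fold)  # now ends with a newline character

# Hypothetical stand-in for the dictionary being searched in the question.
dictionary = {"block1": ":16S:TRANSDET\nmore text"}

matches_raw = {k for k, v in dictionary.items() if raw_fold in v}
matches_decoded = {k for k, v in dictionary.items() if fold_text in v}
```

With this sample data, the search using the raw value finds nothing, while the search using the decoded value finds the entry. If other escape sequences are needed as well, each one would need its own translation step.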

Related

Extract text from a config file [duplicate]

I'm using a config file to inform my Python script of a few key-values, for use in authenticating the user against a website.
I have three variables: the URL, the user name, and the API token.
I've created a config file with each key on a different line, so:
url:<url string>
auth_user:<user name>
auth_token:<API token>
I want to be able to extract the text after the key words into variables, also stripping any "\n" that exist at the end of the line. Currently I'm doing this, and it works but seems clumsy:
from re import match
from sys import argv

with open(argv[1], mode='r') as config_file:
    lines = config_file.readlines()
for line in lines:
    url_match = match('jira_url:', line)
    if url_match:
        jira_url = line[9:].split("\n")[0]
    user_match = match('auth_user:', line)
    if user_match:
        auth_user = line[10:].split("\n")[0]
    token_match = match('auth_token:', line)
    if token_match:
        auth_token = line[11:].split("\n")[0]
Can anybody suggest a more elegant solution? Specifically it's the ... = line[10:].split("\n")[0] lines that seem clunky to me.
I'm also slightly confused why I can't reuse my match object within the for loop, and have to create new match objects for each config item.

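You could use a .yml file and read the values with the yaml.load() function:
import yaml
with open('settings.yml') as file:
    settings = yaml.load(file, Loader=yaml.FullLoader)
Now you can access elements like settings["url"] and so on. Note that YAML expects a space after the colon (url: <url string>) for a line to be read as a key-value pair.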

If the format is always <tag>:<value>, you can easily parse it by splitting each line at the first colon and filling up a dictionary:
settings = dict()
with open(filename, "r") as config_file:
    for line in config_file:
        # Split only at the first colon so that values such as URLs keep theirs.
        key, _, value = line.rstrip("\n").partition(":")
        settings[key] = value
So you get a dictionary that has the tags as keys and the values as values. You can then just refer to these dictionary entries in your program
(e.g. if you need the auth_token, just use settings["auth_token"]).

If you can add one line to the config file, configparser is a good choice:
https://docs.python.org/3/library/configparser.html
[1] config file : 1.cfg
# configparser needs a section header in the config file
[DEFAULT]
url:<url string>
auth_user:<user name>
auth_token:<API token>
[2] python scripts
import configparser
config = configparser.ConfigParser()
config.read('1.cfg')
print(config.get('DEFAULT','url'))
print(config.get('DEFAULT','auth_user'))
print(config.get('DEFAULT','auth_token'))
[3] output
<url string>
<user name>
<API token>
configparser's methods are also useful when you can't guarantee that the config file is always complete.
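
As a rough illustration of that last point, here is a minimal sketch that uses the fallback argument of config.get() and the has_option() method to cope with a missing key. The config text, the example URL and the user name are made up for the example; a real script would call config.read() on a file instead.

```python
import configparser

# Hypothetical config text in which auth_token is deliberately missing.
CONFIG_TEXT = """
[DEFAULT]
url:https://example.invalid
auth_user:alice
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

url = config.get("DEFAULT", "url")
# fallback is returned instead of raising NoOptionError when the key is absent.
auth_token = config.get("DEFAULT", "auth_token", fallback=None)
# has_option lets the script check for a key before using it.
has_user = config.has_option("DEFAULT", "auth_user")
```

The script can then decide what to do when auth_token is None, for example by asking the user for it.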

You have a couple of great answers already, but I wanted to step back and provide some guidance on how you might approach these problems in the future. Getting quick answers sometimes prevents you from understanding how those people knew about the answers in the first place.
When you zoom out, the first thing that strikes me is that your task is to provide config, using a file, to your program. Software has the remarkable property of solve-once, use-anywhere. Config files have been a problem worth solving for at least 40 years, so you can bet your bottom dollar you don't need to solve this yourself. And already-solved means someone has already figured out all the little off-by-one and edge-case dramas like stripping line endings and dealing with expected input. The challenge of course, is knowing what solution already exists. If you haven't spent 40 years peeling back the covers of computers to see how they tick, it's difficult to "just know". So you might have a poke around on Google for "config file format" or something.
That would lead you to one of the most prevalent config file systems on the planet - the INI file. Just as useful now as it was 30 years ago, and as a bonus, looks not too dissimilar to your example config file. Then you might search for "read INI file in Python" or something, and come across configparser and you're basically done.
Or you might see that sometime in the last 30 years, YAML became the more trendy option, and wouldn't you know it, PyYAML will do most of the work for you.
But none of this gets you any better at using Python to extract from text files in general. So zooming in a bit, you want to know how to extract parts of lines in a text file. Again, this problem is an age-old problem, and if you were to learn about this problem (rather than just be handed the solution), you would learn that this is called parsing and often involves tokenisation. If you do some research on, say "parsing a text file in python" for example, you would learn about the general techniques that work regardless of the language, such as looping over lines and splitting each one in turn.
Zooming in one more step closer, you're looking to strip the new line off the end of the string so it doesn't get included in your value. Once again, this ain't a new problem, and with the right keywords you could dig up the well-trodden solutions. This is often called "chomping" or "stripping", and with some careful search terms, you'd find rstrip() and friends, and not have to do awkward things like splitting on the '\n' character.
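
To make the comparison concrete, here is a small sketch with a made-up input line. It shows the split-based approach from the question next to rstrip, which removes the trailing newline directly.

```python
# Hypothetical line as it would be read from the config file.
line = "auth_user:alice\n"

# The approach from the question: split on the newline and keep the first part.
via_split = line[10:].split("\n")[0]

# rstrip("\n") removes only trailing newline characters.
via_rstrip = line[10:].rstrip("\n")

# rstrip() with no argument removes all trailing whitespace, spaces included.
via_rstrip_all = "auth_user:alice  \n"[10:].rstrip()
```

All three variables end up holding the same value here. Which rstrip variant fits depends on whether trailing spaces in a value are meaningful.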
Your final question is about re-using the match object. This is much harder to research. But again, the "solution" won't necessarily show you where you went wrong. What you need to keep in mind is that the statements in the for loop are sequential. To think them through you should literally execute them in your mind, one after another, and imagine what's happening. Each time you call match, it either returns None or a Match object. You never use the object, except to check for truthiness in the if statement. And next time you call match, you do so with different arguments so you get a new Match object (or None). Therefore, you don't need to keep the object around at all. You can simply do:
if match('jira_url:', line):
    jira_url = line[9:].split("\n")[0]
if match('auth_user:', line):
    auth_user = line[10:].split("\n")[0]
and so on. Not only that, if the first if triggered then you don't need to bother calling match again - it will certainly not trigger any of the other matches for the same line. So you could do:
if match('jira_url:', line):
    jira_url = line[9:].rstrip()
elif match('auth_user:', line):
    auth_user = line[10:].rstrip()
and so on.
But then you can start to think - why bother doing all these matches on the colon, only to then manually split the string at the colon afterwards? You could just do:
# Split at the first colon only, so a URL value keeps its own colons.
tokens = line.rstrip().split(':', 1)
if tokens[0] == 'jira_url':
    jira_url = tokens[1]
elif tokens[0] == 'auth_user':
    auth_user = tokens[1]
If you keep making these improvements (and there's lots more to make!), eventually you'll end up re-writing configparser, but at least you'll have learned why it's often a good idea to use an existing library where practical!
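
The refinements above can be collected into one small sketch. It is not a replacement for configparser, just an illustration of looping over lines, stripping them and splitting at the first colon. The function name parse_config_lines and the sample values are hypothetical.

```python
def parse_config_lines(lines):
    """Return a dict built from 'key:value' lines, skipping unusable lines."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        key, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines that contain no colon at all
        settings[key.strip()] = value.strip()
    return settings


# Made-up sample input; the last line has no trailing newline on purpose.
sample = [
    "jira_url:https://jira.example.invalid\n",
    "auth_user:alice\n",
    "\n",
    "auth_token:s3cr3t",
]
settings = parse_config_lines(sample)
```

Because partition splits only at the first colon, the URL keeps its own colon, and the individual values are then available as settings["jira_url"], settings["auth_user"] and settings["auth_token"].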

Replacing strings with variables inside file in Python

I have a bunch of files with many tags inside of the form {my_var}, {some_var}, etc. I am looking to open them, and replace them with my_var and some_var that I've read into Python.
To do these sorts of things I've been using inspect.cleandoc():
import inspect, markdown
my_var='this'
some_var='that'
something=inspect.cleandoc(f'''
All my vars are {some_var} and {my_var}. This is all.
''')
print(something)
#All my vars are that and this. This is all.
But I'd like to do this by reading files file1.md and file2.md
### file1.md
There are some strings such as {my_var} and {some_var}.
Done.
### file2.md
Here there are also some vars: {some_var}, {my_var}. Also done.
Here's the Python code:
import inspect, markdown
my_var='this'
some_var='that'
def filein(file):
    with open(file, 'r') as f:
        data = f.read()
    return data
for filei in ['file1.md','file2.md']:
    fin=filein(filei)
    pre=inspect.cleandoc(f'''{fin}''')
However, the above does not evaluate the strings inside filei and replace them with this (my_var) and that (some_var), and instead keeps them as strings {my_var} and {some_var}.
What am I doing wrong?

You can use the .format method.
You can use ** to pass it a dictionary containing the variables.
So you can use locals() or globals(), which return dictionaries of all local and global variables respectively.
e.g.
text = text.format(**globals())
Complete code:
my_var="this"
some_var="that"
for file in ["file1.md", "file2.md"]:
    with open(file, "r") as f:
        text = f.read()
    text = text.format(**globals())
    print(text)

f-strings are a static replacement mechanism. They're an intrinsic part of the bytecode, not a general-purpose templating mechanism.
I've no idea what you think inspect.cleandoc does, but it does not do that.
Python generally avoids magic, meaning it really doesn't give a rat's ass about your local variables unless you specifically make it, which is not the case here. Python generally works with explicitly provided dicts (mappings of some term to its replacement).
I guess what you want here is the format/format_map methods, which do apply to format strings using {} e.g.
filein(file).format(my_var=my_var, some_var=some_var)
This can be risky if the files you're reading are under the control of a third party though: str.format allows attribute access and thus ultimately provides tools for arbitrary code execution. In that case, tools like string.Template, old-style string substitution (%) or a proper template engine might be a better idea.
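
A minimal sketch of the two options mentioned above, using an explicit mapping instead of globals(). The sample text is taken from file1.md in the question. Note that string.Template uses $name placeholders, so the second variant assumes the files could be rewritten to that syntax, which the question does not confirm.

```python
from string import Template

# Explicit mapping: only these names are visible to the substitution.
values = {"my_var": "this", "some_var": "that"}

# Option 1: str.format_map works with the existing {name} placeholders.
text = "There are some strings such as {my_var} and {some_var}."
formatted = text.format_map(values)

# Option 2: string.Template supports plain name lookup only, with no
# attribute access, but it expects $name placeholders.
template = Template("There are some strings such as $my_var and $some_var.")
substituted = template.safe_substitute(values)

# safe_substitute leaves unknown placeholders untouched instead of raising.
partial = Template("$my_var and $unknown").safe_substitute(values)
```

Passing an explicit dict keeps the set of substitutable names small and obvious, which is easier to reason about than exposing every global variable.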

Access python dict value in yaml with tags

Is it possible to load a value from a Python dict in YAML?
I can access a variable by using:
!!python/name:mymodule.myfile.myvar
but this gives the whole dict.
Trying to use the dict's get method like so:
test: &TEST !!python/object/apply:mymod.myfile.mydict.get ['mykey']
gives me the following error:
yaml.constructor.ConstructorError: while constructing a Python object cannot find module 'mymod.myfile.mydict' (No module named 'mymod.myfile.mydict'; 'mymod.myfile' is not a package)
I'm trying to do that because I have a bunch of YAML files which define my project settings. One of them holds directory paths, and I need to load it into some other YAML files, but it looks like you can't load a YAML variable from another YAML file.
EDIT:
I have found one solution: creating my own function which returns the value from the dict, and calling it like so:
test: &TEST !!python/object/apply:mymod.myfile.get_dict_value ['mykey']

There is no mechanism in YAML to refer to one document from another YAML document.
You'll have to do that by interpreting information in the document in the program that loads the initial YAML document. Whether you do that by explicit logic, or by using some tag doesn't make a practical difference.
Please be aware that it is unsafe to allow interpreting tags of the form !!python/name:..... (via yaml=YAML(typ='unsafe') in ruamel.yaml, or load() in PyYAML), and that doing so is never really necessary.

Incremental Saves

I am trying to write up a script on incremental saves but there are a few hiccups that I am running into.
If the file name is "aaa.ma", I get the following error - ValueError: invalid literal for int() with base 10: 'aaa' # and it does not happen if my file is named "aaa_0001"
This happens if I write my code in this format: Link
To rectify the above problem, I added an if..else.. statement - Link. It seems to have resolved the issue at hand, but I was wondering if there is a better approach to this?
Any advice will be greatly appreciated!

Use regexes for better flexibility, especially for file rename scripts like these.
In your case, since you know that the expected filename format is "some_file_name_<increment_number>", you can use regexes to do the searching and matching for you. The reason we should do this is that users are not machines, and may not stick to the exact naming conventions that our scripts expect. For example, the user may name the file aaa_01.ma or even aaa001.ma instead of the aaa_0001 that your script currently expects. To build this flexibility into your script, you can use regexes. For your use case, you could do:
import os
import re

# name = lastIncFile.partition(".")[0] # Use os.path.splitext instead
name, ext = os.path.splitext(lastIncFile) # ext keeps its leading dot, e.g. ".ma"
match_object = re.search("([a-zA-Z]*)_*([0-9]*)$", name)
# Here ([a-zA-Z]*) would be group(1) and would have "aaa" for ex.
# and ([0-9]*) would be group(2) and would have "0001" for ex.
# _* indicates that there may be an _, or not.
# The $ indicates that ([0-9]*) would be the LAST part of the name.
padding = 4 # Try and parameterize as many components as possible for easy maintenance
default_starting = 1
verName = str(default_starting).zfill(padding) # Default verName
if match_object: # True if the pattern matched
    name = match_object.group(1)
    version_component = match_object.group(2)
    if version_component:
        verName = str(int(version_component) + 1).zfill(padding)
newFileName = "%s_%s%s" % (name, verName, ext) # ext already contains the dot
incSaveFilePath = os.path.join(curFileDir, newFileName)
Check out this nice tutorial on Python regexes to get an idea what is going on in the above block. Feel free to tweak, evolve and build the regex based on your use cases, tests and needs.
Extra tips:
Call cmds.file(renameToSave=True) at the beginning of the script. This will ensure that the file does not get saved over itself accidentally, and forces the script/user to rename the current file. Just a safety measure.
If you want to go a little fancy with your regex expression and make them more readable, you could try doing this:
match_object = re.search("(?P<name>[a-zA-Z]*)_*(?P<version>[0-9]*)$", name)
name = match_object.group('name')
version_component = match_object.group('version')
Here we use the ?P<var_name>... syntax to assign a name to the matching group. This makes for better readability when you access it - mo.group('version') is much clearer than mo.group(2).
Make sure to go through the official docs too.
Save using Maya's commands. This will ensure Maya does all its checks before and while saving:
cmds.file(rename=incSaveFilePath)
cmds.file(save=True)
Update-2:
If you want space to be checked here's an updated regex:
match_object = re.search("(?P<name>[a-zA-Z]*)[_ ]*(?P<version>[0-9]*)$", name)
Here [_ ]* will check for zero or more occurrences of _ or a space. For more regex stuff, trying things out and learning on your own is the best way. Check out the links in this post.
Hope this helps.
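
For experimenting outside Maya, the increment logic can be isolated in a plain function. The sketch below is one possible variation, not the exact code above. The function name next_incremental_name is made up, and the pattern is a slightly different one (a lazy name group anchored at both ends) chosen so that names containing underscores or digits, such as my_file_0009, keep their full prefix.

```python
import os
import re


def next_incremental_name(file_name, padding=4, default_start=1):
    """Return file_name with its trailing version number incremented.

    If the name has no trailing number, default_start is appended instead.
    """
    base, ext = os.path.splitext(file_name)  # ext keeps its leading dot
    # Lazy name, optional "_" or space separators, optional trailing digits.
    match = re.match(r"^(?P<name>.*?)[_ ]*(?P<version>[0-9]*)$", base)
    name = match.group("name")
    version = match.group("version")
    number = int(version) + 1 if version else default_start
    return "%s_%s%s" % (name, str(number).zfill(padding), ext)


first_save = next_incremental_name("aaa.ma")
second_save = next_incremental_name("aaa_0001.ma")
loose_name = next_incremental_name("aaa001.ma")
long_name = next_incremental_name("my_file_0009.ma")
```

The result could then be joined with the directory and passed to cmds.file(rename=...) as in the tips above.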

XML to store system paths in Python with lxml

I'm using an xml file to store configurations for a software.
One of these configurations would be a system path like
set_value = "c:\\test\\3 tests\\test"
I can store it by using:
setting = etree.SubElement(settings, "setting", name=tmp_set_name, type=set_type, value=set_value)
If I use
doc.write(output_file, method='xml',encoding = 'utf-8', compression=0)
the file would be:
<setting type="str" name="MyPath" value="c:\test\3 tests\test"/>
Now I read it again with the etree.parse method. I obtain an etree child object with a string value, but the string contains the \3 character sequence, and if I try to write it to XML again it will be interpreted! So I cannot use it as a path anymore.
Maybe I'm only missing a simple string operation, but I cannot see it =)
How would you solve it in a smart way?
This is an example, but what do you think is the best way to store paths in XML and parse them with lxml?
Thank you !!

"Now I read it again with the etree.parse method. I obtain an etree child object with a string value, but the string contains the \3 character and if I try to use it to write again to xml it will be interpreted !!!!!"
I just tried that, and it doesn't get "interpreted". The element's attributes, as returned after parsing, are:
{'type': 'str', 'name': 'yowza!', 'value': 'c:\\test\\3 tests\\test'}
So as you can see, this works just as you expected it to work. If you really have this problem, you are doing something other than what you are saying. Show us the real code, or make a small example that demonstrates the problem and use that.
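
A small round-trip check along those lines is sketched below. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml, which is an assumption made to keep the example self-contained; the Element, SubElement, tostring and fromstring calls used here have counterparts in lxml.etree.

```python
import xml.etree.ElementTree as ET

# The path from the question, written as a Python literal with escaped backslashes.
set_value = "c:\\test\\3 tests\\test"

settings = ET.Element("settings")
ET.SubElement(settings, "setting", name="MyPath", type="str", value=set_value)

# Serialize to text, then parse that text again.
xml_text = ET.tostring(settings, encoding="unicode")
parsed = ET.fromstring(xml_text)
round_tripped = parsed.find("setting").get("value")
```

The doubled backslashes in the attribute dictionary shown above come from Python displaying the repr of the string. The string itself contains single backslashes, and printing round_tripped shows the path exactly as it was stored.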
