Replacing strings with variables inside file in Python - python

I have a bunch of files with many tags inside of the form {my_var}, {some_var}, etc. I am looking to open them, and replace them with my_var and some_var that I've read into Python.
To do these sorts of things I've been using inspect.cleandoc():
import inspect, markdown
my_var='this'
some_var='that'
something=inspect.cleandoc(f'''
All my vars are {some_var} and {my_var}. This is all.
''')
print(something)
#All my vars are that and this. This is all.
But I'd like to do this by reading files file1.md and file2.md
### file1.md
There are some strings such as {my_var} and {some_var}.
Done.
### file2.md
Here there are also some vars: {some_var}, {my_var}. Also done.
Here's the Python code:
import inspect, markdown
my_var='this'
some_var='that'
def filein(file):
    with open(file, 'r') as file:
        data = file.read()
    return data
for filei in ['file1.md','file2.md']:
    fin=filein(filei)
    pre=inspect.cleandoc(f'''{fin}''')
However, the above does not evaluate the strings inside filei and replace them with this (my_var) and that (some_var), and instead keeps them as strings {my_var} and {some_var}.
What am I doing wrong?

You can use the str.format method.
Use ** to unpack a dictionary that maps the variable names to their values into keyword arguments.
That means you can pass locals() or globals(), which return dictionaries of all local and all global variables respectively.
For example:
text = text.format(**globals())
Complete code:
my_var="this"
some_var="that"
for file in ["file1.md", "file2.md"]:
    with open(file, "r") as f:
        text = f.read()
    text = text.format(**globals())
    print(text)
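If you would rather not expose every global to the template, a minimal sketch with an explicit dictionary could look like this. The names values and render are made up for illustration, and the file contents are inlined as strings so the snippet runs on its own:

```python
# Hypothetical mapping of placeholder names to replacement values.
values = {"my_var": "this", "some_var": "that"}


def render(text, values):
    """Fill {name} placeholders in text from an explicit mapping."""
    # format_map looks up each field name in the mapping it is given,
    # so only the names in `values` can be referenced by the template.
    return text.format_map(values)


# Stand-ins for the contents of file1.md and file2.md.
file1_text = "There are some strings such as {my_var} and {some_var}.\nDone.\n"
file2_text = "Here there are also some vars: {some_var}, {my_var}. Also done.\n"

for text in (file1_text, file2_text):
    print(render(text, values), end="")
```

A placeholder that is missing from values raises KeyError, and any literal braces in the files have to be doubled ({{ and }}) for format/format_map to leave them alone.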

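f-strings are a static replacement mechanism: they are compiled into the bytecode of the code that contains them, so they are not a general-purpose templating mechanism. In f'''{fin}''' the only expression that gets evaluated is fin itself; the text it holds is not parsed a second time.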
I've no idea what you think inspect.cleandoc does, but it does not do that.
Python generally avoids magic, meaning it really doesn't give a rat's ass about your local variables unless you specifically make it, which is not the case here. Python generally works with explicitly provided dicts (mappings of some term to its replacement).
I guess what you want here is the format/format_map methods, which do apply to format strings using {} e.g.
filein(file).format(my_var=my_var, some_var=some_var)
This can be risky if the files you're reading are under the control of a third party though: str.format allows attribute and index access on the arguments you pass in, so a crafted template can read data you never meant to expose, and that can be a stepping stone to worse. In that case, tools like string.Template, old-style string substitution (%) or a proper template engine might be a better idea.
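To make the risk concrete, here is a rough sketch that compares str.format with a deliberately dumb regex substitution that only accepts plain {name} fields. PLACEHOLDER and render_plain are hypothetical names, not part of any library, and this is only one of several ways to restrict what a template can do:

```python
import re

# Matches only simple {identifier} fields; no dots, brackets or format specs.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_plain(text, values):
    """Replace {name} with values[name]; leave everything else untouched."""
    def replace(match):
        name = match.group(1)
        if name in values:
            return str(values[name])
        return match.group(0)  # unknown name: keep the placeholder as-is
    return PLACEHOLDER.sub(replace, text)


values = {"my_var": "this", "some_var": "that"}
template = "{my_var} / {my_var.__class__} / {unknown}"

# str.format follows the attribute lookup written in the template.
leaky = "{my_var.__class__}".format(**values)
# The regex version only ever substitutes plain names.
plain = render_plain(template, values)
print(leaky)
print(plain)
```

The first print shows that the template reached an attribute of the value, while the second leaves the dotted field and the unknown name as they were written.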

Related

Python: Pass string into function with re

I am new to Python. I created the below function to search a text using regex in a file. The result is then written to an excel sheet.
But I get the error 'NoneType' object has no attribute 'group' (which means no match was found) for this line:
b_list=re.split('\s+', str(b.group()))
However, when I use the function as normal code, I am able to find the text. So it means the passed values into the function didn't work.
How do I pass strings or variables correctly into the function? Thank you.
The complete code as below.
import re
import openpyxl
def eval_text(mh, search_text, excel_sht, excel_col):
    b_regex=re.compile(r'(?<=mh ).+')
    b=b_regex.search(search_text)
    b_list=re.split('\s+', str(b.group()))
    if abs(b)>1:
        cell_b=excel_sht.cell(row=i, column=excel_col).value='OK'
    elif abs(b)<1:
        cell_b=excel_sht.cell(row=i, column=excel_col).value='Not OK'
wb=openpyxl.load_workbook('test.xlsm', data_only=True, read_only=False, keep_vba=True)
sht=wb['test']
url=sht.cell(row=1, column=1).value
with open(url, 'r') as b:
    diag_text_lines=b.readlines()
diag_text="".join(diag_text_lines)
eval_text('jame', diag_text, sht, 9)
Since the mh parameter is not used anywhere else in the function, I assume that you expected it to get automatically inserted in place of the mh in the regular expression r'(?<=mh ).+'. However, this does not happen! You have to use a format string, e.g. f'(?<={mh} ).+' (note that besides the {...} I replaced the "raw" r prefix, which you do not really need here, with f).
def eval_text(mh, search_text, excel_sht, excel_col):
    b_regex=re.compile(f'(?<={mh} ).+')
    b = b_regex.search(search_text)
    ...
For older versions of Python, use the format method instead. If there are more {...} used in the regex, this might not work, though. In the worst case, you can still concatenate the string yourself: r'(?<=' + mh + r' ).+' or use the old % format r'(?<=%s ).+' % mh.
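As a rough sketch of the same idea with two extra safeguards: re.escape keeps any regex metacharacters in mh literal, and the None check avoids the 'NoneType' error when nothing matches. The helper name find_after_name and the sample text are invented for this example, and the Excel part is left out:

```python
import re


def find_after_name(mh, search_text):
    """Return the whitespace-separated tokens that follow '<mh> ', or None."""
    # re.escape makes characters such as '.' or '+' in mh match literally.
    pattern = re.compile(rf"(?<={re.escape(mh)} ).+")
    match = pattern.search(search_text)
    if match is None:
        # search() returns None when there is no match, so check before .group()
        return None
    return match.group().split()


sample = "id 1\njame 2.5 ok\n"  # made-up input text
print(find_after_name("jame", sample))
print(find_after_name("john", sample))
```

Returning None (instead of calling .group() unconditionally) lets the caller decide what to write to the sheet when the name is not found.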

Python Style - Should statements be nested inside a context manager that do not require the context?

Is there style guidance or reason to prefer one of these patterns over the other?
Minimizing the amount of code under the context manager "feels" cleaner to me, but I can't point to a specific reason why. It may be that this is just preference and there is no official guidance on the matter.
1) All code inside with context.
with open(file) as f:
    text = f.read()
    data = text.split(',')
    result = my_func(data)
    # etc.
2) Only necessary code inside with context.
with open(file) as f:
    text = f.read()
data = text.split(',')
result = my_func(data)
# etc.
I think readability is always the guideline in the absence of any "style guide" statements - you want to easily see all uses of the context manager variable ('f' above) while it's in scope. The difference between a one and four line block to that visibility isn't significant, but between three lines and 50 it probably is.
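For what it's worth, here is a small sketch of the second pattern that also shows what leaving the block does: the file is closed at that point, while the text that was read stays usable. The temporary file and its contents are made up so the snippet runs on its own:

```python
import os
import tempfile

# Create a throwaway input file (hypothetical data) to read back.
path = os.path.join(tempfile.mkdtemp(), "values.csv")
with open(path, "w") as f:
    f.write("1,2,3")

# Only the read needs the open file.
with open(path) as f:
    text = f.read()

# The file object is closed here, but `text` is an ordinary string.
file_closed = f.closed
data = text.split(",")
print(file_closed, data)
```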

Working with Parameters containing Escaped Characters in Python Config file

I have a config file that I'm reading using the following code:
import configparser as cp
config = cp.ConfigParser()
config.read('MTXXX.ini')
MT=identify_MT(msgtext)
schema_file = config.get(MT,'kbfile')
fold_text = config.get(MT,'fold')
The relevant section of the config file looks like this:
[536]
kbfile=MT536.kb
fold=:16S:TRANSDET\n
Later I try to find text contained in a dictionary that matches the 'fold' parameter. I look for that text using the following function:
def test(find_text):
    return {k for k, v in dictionary.items() if find_text in v}
I get different results if I call that function in one of two ways:
test(fold_text)
Fails to find the data I want, but:
test(':16S:TRANSDET\n')
returns the results I know are there.
And, if I print the content of the dictionary, I can see that it is, as expected, shown as
:16S:TRANSDET\n
So, it matches when I enter the search text directly, but doesn't find a match when I load the same text in from a config file.
I'm guessing that there's some magic being applied here when reading/handling the \n character pattern in from the config file, but don't know how to get it to work the way I want it to.
I want to be able to parameterise using escape characters but it seems I'm blocked from doing this due to some internal mechanism.
Is there some switch I can apply to the config reader, or some extra parsing I can do to get the behavior I want? Or perhaps there's an alternate solution. I do find the configparser module convenient to use, but perhaps this is a limitation that requires an alternative, or even self-built module to lift data out of a parameter file.
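A sketch of what is probably going on, and one possible workaround: configparser does not interpret backslash escapes, so the value comes back with a literal backslash followed by n, whereas ':16S:TRANSDET\n' typed in Python source ends with a real newline. The section content is inlined here instead of reading MTXXX.ini, and decoding with unicode_escape is an assumption that only suits simple ASCII escapes:

```python
import codecs
import configparser

# Inline copy of the relevant section; the raw string keeps "\n" as two
# characters, just as it appears in the .ini file on disk.
INI_TEXT = r"""
[536]
kbfile=MT536.kb
fold=:16S:TRANSDET\n
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

raw_fold = config.get("536", "fold")
# Assumption: the value only uses simple ASCII escapes such as \n or \t.
fold_text = codecs.decode(raw_fold, "unicode_escape")

print(repr(raw_fold))
print(repr(fold_text))
```

Comparing the two repr() outputs shows the difference: the raw value ends in a backslash and an n, the decoded one in a newline character, which is what the dictionary values contain.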

How to turn Perl blessed objects into YAML that Python can read

We have a REST web service written in Perl Dancer. It returns perl data structures in YAML format and also takes in parameters in YAML format - it is supposed to work with some other teams who query it using Python.
Here's the problem -- if I'm passing back just a regular old perl hash by Dancer's serialization everything works completely fine. JSON, YAML, XML... they all do the job.
HOWEVER, sometimes we need to pass Perl objects back that the Python can later pass back in as a parameter to help with unnecessary loading, etc. I played around and found that YAML is the only one that works with Perl's blessed objects in Dancer.
The problem is that Python's YAML can't parse through the YAMLs of the Perl objects (whereas it can handle regular old perl hash YAMLs without an issue).
The perl objects start out like this in YAML:
First one:
--- &1 !!perl/hash:Sequencing_API
Second:
--- !!perl/hash:SDB::DBIO
It errors out like this.
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:perl/hash:SDB::DBIO'
The regular files seem to get passed through like this:
---
fields:
library:
It seems like the extra stuff after --- are causing the issues. What can I do to address this? Or am I trying to do too much by passing around Perl objects?
The short answer is:
!! is YAML shorthand for tag:yaml.org,2002: so !!perl/hash is really tag:yaml.org,2002:perl/hash.
Now you need to tell Python's yaml module how to deal with this type, so you add a constructor for it, as follows:
import yaml
def construct_perl_object(loader, suffix, node):
    # A multi-constructor also receives the part of the tag after the registered prefix, e.g. "SDB::DBIO"
    print("S:", suffix, "N:", node)
    # Untested guess: treat the blessed hash as a plain mapping
    return loader.construct_mapping(node, deep=True)
yaml.add_multi_constructor(u"tag:yaml.org,2002:perl/hash:", construct_perl_object, Loader=yaml.SafeLoader)
yaml.safe_load(yaml_string)
Or maybe just parse it out, or return None. It's hard to test with just that line, but that may be what you are looking for.

Python string templater

I'm using this REST web service, which returns various templated strings as urls, for example:
"http://api.app.com/{foo}"
In Ruby, I can then use
url = Addressable::Template.new("http://api.app.com/{foo}").expand('foo' => 'bar')
to get
"http://api.app.com/bar"
Is there any way to do this in Python? I know about %() templates, but obviously they're not working here.
In Python 2.6 and later you can do this if you need exactly that syntax:
from string import Formatter
f = Formatter()
f.format("http://api.app.com/{foo}", foo="bar")
If you need to use an earlier python version then you can either copy the 2.6 formatter class or hand roll a parser/regex to do it.
Don't use a quick hack.
What is used there (and implemented by Addressable) are URI Templates. There seem to be several libs for this in python, for example: uri-templates. described_routes_py also has a parser for them.
I cannot give you a perfect solution but you could try using string.Template.
You either pre-process your incoming URL and then use string.Template directly, like
In [6]: url="http://api.app.com/{foo}"
In [7]: up=string.Template(re.sub("{", "${", url))
In [8]: up.substitute({"foo":"bar"})
Out[8]: 'http://api.app.com/bar'
taking advantage of the default "${...}" syntax for replacement identifiers. Or you subclass string.Template to control the identifier pattern, like
class MyTemplate(string.Template):
    delimiter = ...
    pattern = ...
but I haven't figured that out.
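For the subclassing route, something along these lines might work. It is only a sketch: BraceTemplate is a made-up name, the pattern handles nothing beyond simple {name} fields (so it is not a full URI Template implementation), and string.Template expects the four group names escaped, named, braced and invalid to be present in a custom pattern:

```python
import string


class BraceTemplate(string.Template):
    """string.Template variant that understands {name} instead of $name."""

    # Returned for the escaped form, so "{{" produces a literal "{".
    delimiter = "{"
    # string.Template compiles this with re.VERBOSE and re.IGNORECASE.
    pattern = r"""
    \{(?:
        (?P<escaped>\{)                   |  # "{{" is an escaped brace
        (?P<named>[_a-z][_a-z0-9]*)\}     |  # "{name}"
        (?P<braced>[_a-z][_a-z0-9]*)\}    |  # same form; group name is required
        (?P<invalid>)                        # any other use of "{"
    )
    """


url = BraceTemplate("http://api.app.com/{foo}").substitute(foo="bar")
print(url)
```

With this pattern a stray "{" that is not followed by a name and a closing brace makes substitute() raise ValueError, which seems preferable to silently producing a broken URL.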
