I am new to Python. I created the function below to search for text in a file using a regex; the result is then written to an Excel sheet.
But I get the error "'NoneType' object has no attribute 'group'" (which means no match was found) on this line:
b_list=re.split('\s+', str(b.group()))
However, when I run the same code outside the function, the text is found. So it seems the values passed into the function are not being used.
How do I pass strings or variables into the function correctly? Thank you.
The complete code is below.
import re
import openpyxl
def eval_text(mh, search_text, excel_sht, excel_col):
    b_regex=re.compile(r'(?<=mh ).+')
    b=b_regex.search(search_text)
    b_list=re.split('\s+', str(b.group()))
    if abs(b)>1:
        cell_b=excel_sht.cell(row=i, column=excel_col).value='OK'
    elif abs(b)<1:
        cell_b=excel_sht.cell(row=i, column=excel_col).value='Not OK'
wb=openpyxl.load_workbook('test.xlsm', data_only=True, read_only=False, keep_vba=True)
sht=wb['test']
url=sht.cell(row=1, column=1).value
with open (url, 'r') as b:
    diag_text_lines=b.readlines()
    diag_text="".join(diag_text_lines)
eval_text('jame', diag_text, sht, 9)
Since the mh parameter is not used anywhere else in the function, I assume you expected it to be inserted automatically in place of mh in the regular expression r'(?<=mh ).+'. However, this does not happen! You have to use an f-string, e.g. f'(?<={mh} ).+' (note that besides the {...}, I replaced the "raw" r prefix, which you do not really need here, with f).
def eval_text(mh, search_text, excel_sht, excel_col):
    b_regex=re.compile(f'(?<={mh} ).+')
    b = b_regex.search(search_text)
    ...
For older versions of Python, use the format method instead. If there are more {...} used in the regex, this might not work, though. In the worst case, you can still concatenate the string yourself: r'(?<=' + mh + r' ).+' or use the old % format r'(?<=%s ).+' % mh.
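For instance, a minimal sketch of those non-f-string variants (the re.escape call is an extra precaution I'm adding in case mh ever contains regex metacharacters; it is not required for plain names like 'jame'):
import re

mh = 'jame'
pattern = '(?<={} ).+'.format(re.escape(mh))   # str.format variant
# pattern = r'(?<=%s ).+' % re.escape(mh)      # old-style % variant
b_regex = re.compile(pattern)
print(b_regex.search('jame went home').group())  # -> went home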
Related
I have a bunch of files with many tags inside of the form {my_var}, {some_var}, etc. I am looking to open them and replace the tags with the values of my_var and some_var that I've defined in Python.
To do these sorts of things I've been using inspect.cleandoc():
import inspect, markdown
my_var='this'
some_var='that'
something=inspect.cleandoc(f'''
    All my vars are {some_var} and {my_var}. This is all.
    ''')
print(something)
#All my vars are that and this. This is all.
But I'd like to do this by reading files file1.md and file2.md
### file1.md
There are some strings such as {my_var} and {some_var}.
Done.
### file2.md
Here there are also some vars: {some_var}, {my_var}. Also done.
Here's the Python code:
import inspect, markdown
my_var='this'
some_var='that'
def filein(file):
    with open(file, 'r') as file:
        data = file.read()
    return data

for filei in ['file1.md','file2.md']:
    fin=filein(file)
    pre=inspect.cleandoc(f'''{fin}''')
However, the above does not evaluate the tags inside filei and replace them with this (my_var) and that (some_var); instead it keeps them as the literal strings {my_var} and {some_var}.
What am I doing wrong?
You can use the .format method.
You can use ** to pass it a dictionary containing the variables.
Therefore you can use locals() or globals(), which are dictionaries of all the local and global variables.
e.g.
text = text.format(**globals())
Complete code:
my_var="this"
some_var="that"
for file in ["file1.md", "file2.md"]:
with open(file, "r") as f:
text = f.read()
text = text.format(**globals())
print(text)
f-strings are a static replacement mechanism; they're an intrinsic part of the bytecode, not a general-purpose templating mechanism.
I've no idea what you think inspect.cleandoc does, but it does not do that.
Python generally avoids magic, meaning it really doesn't give a rat's ass about your local variables unless you specifically make it, which is not the case here. Python generally works with explicitly provided dicts (mappings of some term to its replacement).
I guess what you want here is the format/format_map methods, which do apply to format strings using {} e.g.
filein(file).format(my_var=my_var, some_var=some_var)
This can be risky if the files you're reading are under the control of a third party though: str.format allows attribute access and thus ultimately provides tools for arbitrary code execution. In that case, tools like string.Template, old-style string substitution (%) or a proper template engine might be a better idea.
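For example, a minimal sketch of the string.Template route (note: Template uses $-style placeholders by default, so the files would need $my_var rather than {my_var}, or you would subclass Template to change the delimiter):
from string import Template

my_var = "this"
some_var = "that"

# Hypothetical file contents using $-style placeholders, e.g. "vars are $some_var and $my_var"
with open("file1.md") as f:
    text = f.read()

# safe_substitute leaves unknown placeholders in place instead of raising KeyError
print(Template(text).safe_substitute(my_var=my_var, some_var=some_var))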
I'm trying to get my regular expression to work but can't figure out what I'm doing wrong. I am trying to find any file that is NOT in a specific format. For example all files are dates that are in this format MM-DD-YY.pdf (ex. 05-13-17.pdf). I want to be able to find any files that are not written in that format.
I can create a regex to find those with:
(\d\d-\d\d-\d\d\.pdf)
I tried using the negative lookahead so it looked like this:
(?!\d\d-\d\d-\d\d\.pdf)
That stops it from matching those names, but it doesn't match the files that don't follow the format.
I also tried adding a .* after the group but then that finds the whole list.
(?!\d\d-\d\d-\d\d\.pdf).*
I'm searching through a small list right now for testing:
05-17-17.pdf Test.pdf 05-48-2017.pdf 03-14-17.pdf
Is there a way to accomplish what I'm looking for?
Thanks!
You can try this:
import re
s = "Test.docx 04-05-2017.docx 04-04-17.pdf secondtest.pdf"
new_data = re.findall(r"[a-zA-Z]+\.[a-zA-Z]+|\d{1,}-\d{1,}-\d{4}\.[a-zA-Z]+", s)
Output:
['Test.docx', '04-05-2017.docx', 'secondtest.pdf']
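Alternatively, a minimal sketch of a filter-style approach (assuming the names are available as a list and Python 3.4+ for re.fullmatch): a name is off-format exactly when it does not fully match MM-DD-YY.pdf.
import re

filenames = ["05-17-17.pdf", "Test.pdf", "05-48-2017.pdf", "03-14-17.pdf"]
pattern = re.compile(r"\d\d-\d\d-\d\d\.pdf")

# Keep only the names that do NOT fully match the expected format
not_matching = [name for name in filenames if not pattern.fullmatch(name)]
print(not_matching)  # ['Test.pdf', '05-48-2017.pdf']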
First find all the names that match, then remove them from your list separately. The firstFindtheMatching method below finds the matching names using the re library:
def firstFindtheMatching(listoffiles):
    """
    :listoffiles: str of space-separated file names to check against the format
    :final_string: every file that matches the format 01-01-17.pdf (MM-DD-YY.pdf) is put into one str output. (ALSO) I'm returning listoffiles so that you can see the whole output in one place, but you won't really need it.
    """
    import re
    matchednames = re.findall(r"\d{1,2}-\d{1,2}-\d{1,2}\.pdf", listoffiles)
    # connect all output in one string for simpler handling using sets
    final_string = ' '.join(matchednames)
    return(final_string, listoffiles)
Here is the output:
('05-08-17.pdf 04-08-17.pdf 08-09-16.pdf', '05-08-17.pdf Test.pdf 04-08-17.pdf 08-09-16.pdf 08-09-2016.pdf some-all-letters.pdf')
set(['08-09-2016.pdf', 'some-all-letters.pdf', 'Test.pdf'])
I've used the main below in case you'd like to regenerate the results. A good thing about doing it this way is that you can add more regexes to firstFindtheMatching(). It helps you keep things separate.
def main():
    filenames = "05-08-17.pdf Test.pdf 04-08-17.pdf 08-09-16.pdf 08-09-2016.pdf some-all-letters.pdf"
    [matchednames, alllist] = firstFindtheMatching(filenames)
    print(matchednames, alllist)
    notcommon = set(filenames.split()) - set(matchednames.split())
    print(notcommon)

if __name__ == '__main__':
    main()
I have a config file that I'm reading using the following code:
import configparser as cp
config = cp.ConfigParser()
config.read('MTXXX.ini')
MT=identify_MT(msgtext)
schema_file = config.get(MT,'kbfile')
fold_text = config.get(MT,'fold')
The relevant section of the config file looks like this:
[536]
kbfile=MT536.kb
fold=:16S:TRANSDET\n
Later I try to find text contained in a dictionary that matches the 'fold' parameter. I've found that if I search for that text using the following function:
def test(find_text):
    return {k for k, v in dictionary.items() if find_text in v}
I get different results if I call that function in one of two ways:
test(fold_text)
Fails to find the data I want, but:
test(':16S:TRANSDET\n')
returns the results I know are there.
And, if I print the content of the dictionary, I can see that it is, as expected, shown as
:16S:TRANSDET\n
So, it matches when I enter the search text directly, but doesn't find a match when I load the same text in from a config file.
I'm guessing that there's some magic being applied here when reading/handling the \n character sequence from the config file, but I don't know how to get it to work the way I want it to.
I want to be able to parameterise using escape characters, but it seems I'm blocked from doing this by some internal mechanism.
Is there some switch I can apply to the config reader, or some extra parsing I can do to get the behavior I want? Or perhaps there's an alternate solution. I do find the configparser module convenient to use, but perhaps this is a limitation that requires an alternative, or even a self-built module to lift data out of a parameter file.
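One possible workaround (a sketch, assuming Python 3, an ASCII value, and that the string read from the .ini really contains the two literal characters backslash and n, since configparser does not interpret escape sequences): decode the escapes after reading.
import codecs

fold_raw = config.get(MT, 'fold')                      # ':16S:TRANSDET\\n' (literal backslash + n)
fold_text = codecs.decode(fold_raw, 'unicode_escape')  # ':16S:TRANSDET\n' (real newline)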
I am trying to write up a script on incremental saves but there are a few hiccups that I am running into.
If the file name is "aaa.ma", I will get the following error - ValueError: invalid literal for int() with base 10: 'aaa' # and it does not happen if my file is named "aaa_0001"
And this happens if I write my code in this format: Link
As such, to rectify the above problem, I put in an if..else.. statement - Link. It seems to have resolved the issue at hand, but I was wondering if there is a better approach to this?
Any advice will be greatly appreciated!
Use regexes for better flexibility, especially for file rename scripts like these.
In your case, since you know that the expected filename format is "some_file_name_<increment_number>", you can use regexes to do the searching and matching for you. The reason we should do this is that people/users are not machines, and may not stick to the exact naming conventions that our scripts expect. For example, the user may name the file aaa_01.ma or even aaa001.ma instead of the aaa_0001 that your script currently expects. To build this flexibility into your script, you can use regexes. For your use case, you could do:
import os
import re

# name = lastIncFile.partition(".")[0]  # Use os.path.splitext instead
name, ext = os.path.splitext(lastIncFile)

match_object = re.search("([a-zA-Z]*)_*([0-9]*)$", name)
# Here ([a-zA-Z]*) would be group(1) and would have "aaa" for ex.
# and ([0-9]*) would be group(2) and would have "0001" for ex.
# _* indicates that there may be an _, or not.
# The $ indicates that ([0-9]*) would be the LAST part of the name.

padding = 4  # Try and parameterize as many components as possible for easy maintenance
default_starting = 1
verName = str(default_starting).zfill(padding)  # Default verName

if match_object:  # True if the version string was found
    name = match_object.group(1)
    version_component = match_object.group(2)
    if version_component:
        verName = str(int(version_component) + 1).zfill(padding)

newFileName = "%s_%s%s" % (name, verName, ext)  # ext from splitext already includes the dot
incSaveFilePath = os.path.join(curFileDir, newFileName)
Check out this nice tutorial on Python regexes to get an idea of what is going on in the above block. Feel free to tweak, evolve and build the regex based on your use cases, tests and needs.
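As a quick sanity check (hypothetical names, just to show how the groups come out for the name variations mentioned above):
import re

for test_name in ["aaa", "aaa_0001", "aaa001", "aaa_01"]:
    mo = re.search("([a-zA-Z]*)_*([0-9]*)$", test_name)
    print("%s -> name=%r version=%r" % (test_name, mo.group(1), mo.group(2)))
# aaa      -> name='aaa' version=''
# aaa_0001 -> name='aaa' version='0001'
# aaa001   -> name='aaa' version='001'
# aaa_01   -> name='aaa' version='01'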
Extra tips:
Call cmds.file(renameToSave=True) at the beginning of the script. This will ensure that the file does not get saved over itself accidentally, and forces the script/user to rename the current file. Just a safety measure.
If you want to get a little fancy with your regex and make it more readable, you could try this:
match_object = re.search("(?P<name>[a-zA-Z]*)_*(?P<version>[0-9]*)$", name)
name = match_object.group('name')
version_component = match_object.group('version')
Here we use the ?P<var_name>... syntax to assign a dict key name to the matching group. It makes for better readability when you access it - mo.group('version') is much clearer than mo.group(2).
Make sure to go through the official docs too.
Save using Maya's commands. This will ensure Maya does all its checks before and while saving:
cmds.file(rename=incSaveFilePath)
cmds.file(save=True)
Update-2:
If you want space to be checked here's an updated regex:
match_object = re.search("(?P<name>[a-zA-Z]*)[_ ]*(?P<version>[0-9]*)$", name)
Here [_ ]* will check for zero or more occurrences of _ or a space. For more regex stuff, trying and learning on your own is the best way. Check out the links in this post.
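For example (hypothetical names, just to show the effect of [_ ]*):
import re

for test_name in ["aaa 0001", "aaa_0001", "aaa0001"]:
    mo = re.search("(?P<name>[a-zA-Z]*)[_ ]*(?P<version>[0-9]*)$", test_name)
    print("%s -> version=%r" % (test_name, mo.group("version")))
# aaa 0001 -> version='0001'
# aaa_0001 -> version='0001'
# aaa0001  -> version='0001'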
Hope this helps.
I am trying to parse some unicode text from an Excel 2007 cell read using xlrd (actually xlsxrd).
For some reason xlrd attaches "text: " to the beginning of the unicode string and is making it difficult for me to type cast. I eventually want to reverse the order of the string since it is a name and will be put in alphabetical order with several others. Any help would be greatly appreciated, thanks.
here is a simple example of what I'm trying to do:
>>> import xlrd, xlsxrd
>>> book = xlsxrd.open_workbook('C:\\fileDir\\fileName.xlsx')
>>> book.sheet_names()
[u'Sheet1', u'Sheet2']
>>> sh = book.sheet_by_index(1)
>>> print sh
<xlrd.sheet.Sheet object at 0x(hexaddress)>
>>> name = sh.cell(0, 0)
>>> print name
text: u'First Last'
From here I would like to parse "name", exchanging 'First' with 'Last', or just separating the two for storage in two different vars, but every attempt I have made to type cast the unicode gives an error. Perhaps I am going about it the wrong way?
Thanks in advance!
I think you may need
name = sh.cell(0,0).value
to get the unicode object. Then, to split it into two variables, you can obtain a list with the first and last name, using a space as the separator:
split_name = name.split(' ')
print split_name
This gives [u'First', u'Last']. You can easily reverse the list in place (note that reverse() returns None, so don't reassign its result):
split_name.reverse()
print split_name
giving [u'Last', u'First'].
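Putting it together (a minimal sketch, assuming the cell always holds exactly 'First Last' with a single space):
name = sh.cell(0, 0).value            # u'First Last'
first, last = name.split(' ')
sort_key = u'%s, %s' % (last, first)  # u'Last, First' - handy for alphabetical ordering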
Read about the Cell class in the xlrd documentation. Work through the tutorial that you can get via www.python-excel.org.