I'd like to make the 'pyparsing' parsing result come out as a dictionary without needing to post-process it. For this, I need to define my own key strings. The following is the best I could come up with that produces the desired result.
Line to parse:
%ADD22C,0.35X*%
Code:
import pyparsing as pyp
floatnum = pyp.Regex(r'([\d\.]+)')
comma = pyp.Literal(',').suppress()
cmd_app_def = pyp.Literal('AD').setParseAction(pyp.replaceWith('aperture-definition'))
cmd_app_def_opt_circ = pyp.Group(pyp.Literal('C') +
                                 comma).setParseAction(pyp.replaceWith('circle'))
circular_apperture = pyp.Group(cmd_app_def_opt_circ +
                               pyp.Group(pyp.Empty().setParseAction(pyp.replaceWith('diameter')) + floatnum) +
                               pyp.Literal('X').suppress())
<the grammar for the entire line>
The result is:
['aperture-definition', '20', ['circle', ['diameter', '0.35']]]
What I consider a hack here is
pyp.Empty().setParseAction(pyp.replaceWith('diameter'))
which always matches and is empty, but then I assign my desired key name to it.
Is this the best way to do this? Am I abusing pyparsing to do something it's not meant to do?
If you want to name your floatnum as "diameter", you can use named results:
cmd_app_def_opt_circ = pyp.Group(pyp.Literal('C') +
                                 comma)("circle")
circular_apperture = pyp.Group(cmd_app_def_opt_circ +
                               pyp.Group(floatnum)("diameter") +
                               pyp.Literal('X').suppress())
In this way, every time the parser encounters floatnum in the circular_apperture context, the result is named diameter. As shown above, you can name circle in the same fashion. Does this work for you?
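To get the nested dictionary the question asks for, named results can be combined with Group: a named Group becomes a nested ParseResults, and asDict() turns it into a nested dict. The sketch below assumes a full-line grammar (the handling of '%', 'D' and '*%' is a guess, since the original post elides its grammar), so adapt it to the real one.

```python
import pyparsing as pyp

# Assumed grammar for "%ADD22C,0.35X*%"; the real full-line grammar is not shown
floatnum = pyp.Regex(r"[\d.]+")
comma = pyp.Suppress(",")

# A named Group keeps the circle's own named fields together as a sub-result
circle = pyp.Group(pyp.Suppress("C") + comma + floatnum("diameter")
                   + pyp.Suppress("X"))("circle")

aperture_definition = (pyp.Suppress("%") + pyp.Literal("AD")("command")
                       + pyp.Suppress("D") + pyp.Word(pyp.nums)("D")
                       + circle + pyp.Suppress("*%"))

result = aperture_definition.parseString("%ADD22C,0.35X*%")
# asDict() recurses into named groups, producing nested dicts
print(result.asDict())
print(result.circle.diameter)
```

Values stay strings in this sketch; attaching a parse action such as setParseAction(lambda t: float(t[0])) to floatnum would convert them at parse time.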
See comments in the posted code.
import pyparsing as pyp
comma = pyp.Literal(',').suppress()
# use parse actions to do type conversion at parse time, so that results fields
# can immediately be used as ints or floats, without additional int() or float()
# calls
floatnum = pyp.Regex(r'([\d\.]+)').setParseAction(lambda t: float(t[0]))
integer = pyp.Word(pyp.nums).setParseAction(lambda t: int(t[0]))
# define the command keyword - I assume there will be other commands too, they
# should follow this general pattern (define the command keyword, then all the
# options, then define the overall command)
aperture_defn_command_keyword = pyp.Literal('AD')
# define a results name for the matched integer - I don't know what this
# option is, wasn't in your original post
d_option = 'D' + integer.setResultsName('D')
# shortcut for defining a results name is to use the expression as a
# callable, and pass the results name as the argument (I find this much
# cleaner and keeps the grammar definition from getting messy with lots
# of calls to setResultsName)
circular_aperture_defn = 'C' + comma + floatnum('diameter') + 'X'
# define the overall command
aperture_defn_command = aperture_defn_command_keyword("command") + d_option + pyp.Optional(circular_aperture_defn)
# use searchString to skip over '%'s and '*'s, gives us a ParseResults object
test = "%ADD22C,0.35X*%"
appData = aperture_defn_command.searchString(test)[0]
# ParseResults can be accessed directly just like a dict
print(appData['command'])
print(appData['D'])
print(appData['diameter'])
# or if you prefer attribute-style access to results names
print(appData.command)
print(appData.D)
print(appData.diameter)
# convert ParseResults to an actual Python dict, removes all unnamed tokens
print(appData.asDict())
# dump() prints out the parsed tokens as a list, then all named results
print(appData.dump())
Prints:
AD
22
0.35
AD
22
0.35
{'diameter': 0.34999999999999998, 'command': 'AD', 'D': 22}
['AD', 'D', 22, 'C', 0.34999999999999998, 'X']
- D: 22
- command: AD
- diameter: 0.35
Related
I'm new to Python and relatively new to programming. I'm trying to replace part of a file path with a different file path. If possible, I'd like to avoid regex as I don't know it. If not, I understand.
I want the part of each item in the Python list that comes before the word PROGRAM to be replaced with the replaceWith variable.
How would you go about doing this?
Current Python List []
item1ToReplace1 = \\server\drive\BusinessFolder\PROGRAM\New\new.vb
item1ToReplace2 = \\server\drive\BusinessFolder\PROGRAM\old\old.vb
Variable to replace part of the Python list path
replaceWith = 'C:\ProgramFiles\Microsoft\PROGRAM'
Desired results for Python List []:
item1ToReplace1 = C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
item1ToReplace2 = C:\ProgramFiles\Microsoft\PROGRAM\old\old.vb
Thank you for your help.
The following code does what you ask. Note that I changed your '\' to '\\': you need to account for the backslash in your code, since it is used as an escape character in Python.
import os

# '\\' is one literal backslash, so the UNC prefix \\ needs '\\\\'
item1ToReplace1 = '\\\\server\\drive\\BusinessFolder\\PROGRAM\\New\\new.vb'
item1ToReplace2 = '\\\\server\\drive\\BusinessFolder\\PROGRAM\\old\\old.vb'
replaceWith = 'C:\\ProgramFiles\\Microsoft\\PROGRAM'
keyword = "PROGRAM\\"

def replacer(rp, s, kw):
    # split once on the keyword; ss[1] is everything after it
    ss = s.split(kw, 1)
    if len(ss) > 1:
        tail = ss[1]
        # on Windows, os.path.join inserts a backslash separator
        return os.path.join(rp, tail)
    else:
        return ""

print(replacer(replaceWith, item1ToReplace1, keyword))
print(replacer(replaceWith, item1ToReplace2, keyword))
The code splits each path on your keyword and appends the part after it to the replacement path.
If your keyword is not in the string, your result will be an empty string.
Result:
C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
C:\ProgramFiles\Microsoft\PROGRAM\old\old.vb
One way would be:
item_ls = item1ToReplace1.split("\\")
idx = item_ls.index("PROGRAM")
result = ["C:", "ProgramFiles", "Microsoft"] + item_ls[idx:]
result = "\\".join(result)
Resulting in:
>>> item1ToReplace1 = r"\\server\drive\BusinessFolder\PROGRAM\New\new.vb"
... # the above
>>> print(result)
C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
Note the use of r"..." to avoid having to escape the escape characters (i.e. the \) in your input. The split/join separator, on the other hand, is a normal string literal, so the backslash has to be written as a double backslash ("\\").
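On Python 3.4+ the standard-library pathlib module can do the splitting for you: PureWindowsPath understands Windows and UNC paths even when the script runs on another OS, and its parts tuple gives the components without any backslash bookkeeping. A sketch that applies this to the whole list (replace_before is a hypothetical helper, and leaving paths without PROGRAM unchanged is an assumed choice; the first answer returns an empty string instead):

```python
from pathlib import PureWindowsPath

def replace_before(path, new_prefix, marker="PROGRAM"):
    """Replace everything up to and including marker with new_prefix."""
    parts = PureWindowsPath(path).parts
    if marker not in parts:
        return path  # assumption: leave unrelated paths untouched
    tail = parts[parts.index(marker) + 1:]
    return str(PureWindowsPath(new_prefix, *tail))

paths = [r"\\server\drive\BusinessFolder\PROGRAM\New\new.vb",
         r"\\server\drive\BusinessFolder\PROGRAM\old\old.vb"]
replaceWith = r"C:\ProgramFiles\Microsoft\PROGRAM"

new_paths = [replace_before(p, replaceWith) for p in paths]
for p in new_paths:
    print(p)
```

Because new_prefix already ends in PROGRAM, only the components after the marker are appended.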
I have a text file ("input.param") which serves as an input file for a package. I need to modify the value of one argument. The lines that need to be changed are:
param1 0.01
model_name run_param1
I need to find the argument param1 and change its value from 0.01 to each of a range of values; model_name must change accordingly for each value of param1. For example, if param1 is changed to 0.03, then model_name is changed to 'run_param1_p03'. Below is my attempt:
import numpy as np
import os
param1_range = np.arange(0.01,0.5,0.01)
with open('input.param', 'r') as file :
    filedata = file.read()
for p_value in param1_range:
    filedata.replace('param1 0.01', 'param1 ' + str(p_value))
    filedata.replace('model_name run_param1', 'model_name run_param1' + '_p0' + str(int(round(p_value*100))))
    with open('input.param', 'w') as file:
        file.write(filedata)
    os.system('./bin/run_app param/input.param')
However, this is not working. I guess the main problem is that the replace command cannot recognize the space, but I do not know how to search for the argument param1 or model_name and change their values.
I'm editing this answer to more accurately answer the original question, which it did not adequately do.
The problem is that "the replace command cannot recognize the space". The re (regex) module can be of great help here. Your document is composed of entries and their values, separated by whitespace:
param1 0.01
model_name run_param1
In regex, a general capture would look like so:
import re
someline = 'param1 0.01'
pattern = re.match(r'^(\S+)\s+(\S+)$', someline)
pattern.groups()
# ('param1', '0.01')
The regex functions as follows:
^ matches the start of the string
\S is any non-whitespace character, i.e. anything other than ' ', '\t', '\n', '\r', '\f', '\v'
+ means one or more, as a greedy search (it keeps going until the pattern stops matching)
\s is any whitespace character (the opposite of \S; note the case), so \s+ matches a run of spaces or tabs
$ matches the end of the string
() indicate groups, i.e. the parts of the match you want to capture
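A quick way to see how the pattern copes with the "space" problem: \s+ absorbs any run of spaces or tabs, and a line that does not have exactly two fields (for example a blank line) simply fails to match. The helper name split_entry is hypothetical; it returns None for such lines, whereas calling .groups() directly on a failed match raises AttributeError.

```python
import re

ENTRY = re.compile(r"^(\S+)\s+(\S+)$")

def split_entry(line):
    """Return (name, value) for a 'name value' line, or None if it doesn't fit."""
    match = ENTRY.match(line.strip())
    return match.groups() if match else None

print(split_entry("param1 0.01"))                  # ('param1', '0.01')
print(split_entry("model_name\t   run_param1\n"))  # ('model_name', 'run_param1')
print(split_entry(""))                             # None
```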
The groups make it fairly easy for you to unpack your arguments into variables if you so choose. To apply this to the code you have already:
import numpy as np
import re

param1_range = np.arange(0.01, 0.5, 0.01)

filedata = []
with open('input.param', 'r') as file:
    # This will put the lines in a list
    # so you can use ^ and $ in the regex
    for line in file:
        filedata.append(line.strip())  # get rid of trailing newlines

# filedata now looks like:
# ['param1 0.01', 'model_name run_param1']

# It might be easier to use a dictionary to keep all of your param vals
# since you aren't changing the names, just the values
groups = [re.match(r'^(\S+)\s+(\S+)$', x).groups() for x in filedata]

# Now you have a list of tuples which can be fed to dict()
my_params = dict(groups)
# {'param1': '0.01', 'model_name': 'run_param1'}

# Now just use that dict for setting your params
for p_value in param1_range:
    my_params['param1'] = str(p_value)
    my_params['model_name'] = 'run_param1_p0' + str(int(round(p_value*100)))
    # And for the formatting back into the file (once per value),
    # you can do some quick padding to get the format you want
    with open('somefile.param', 'w') as fh:
        content = '\n'.join([k.ljust(20) + v.rjust(20) for k, v in my_params.items()])
        fh.write(content)
The padding is done using str.ljust and str.rjust methods so you get a format that looks like so:
for k, v in dict(groups).items():
    intstr = k.ljust(20) + v.rjust(20)
    print(intstr)
param1                              0.01
model_name                    run_param1
Though you could arguably leave out the rjust if you felt so inclined.
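One more thing worth knowing: str.replace (like re.sub) never changes filedata in place; it returns a new string, so in the question's loop the results are thrown away. And if the result were assigned back, the text 'param1 0.01' would no longer exist on the next iteration. A minimal sketch that avoids both problems by always rendering from the original text, assuming the file holds exactly the two lines shown (render and the two-digit "_pNN" naming are assumptions):

```python
import re

# Assumed contents of input.param (read it once, before the loop)
TEMPLATE = "param1 0.01\nmodel_name run_param1\n"

def render(template, value):
    """Return a copy of template with param1 and model_name set for value."""
    # re.M lets ^ match at the start of every line; \s+ accepts any spacing
    text = re.sub(r"^(param1\s+)\S+",
                  lambda m: m.group(1) + "%.2f" % value,
                  template, flags=re.M)
    # re.sub returns a new string, so the result must be assigned
    text = re.sub(r"^(model_name\s+)\S+",
                  lambda m: m.group(1) + "run_param1_p%02d" % round(value * 100),
                  text, flags=re.M)
    return text

print(render(TEMPLATE, 0.03))
```

Inside the question's loop you would write render(template, p_value) to input.param and then run the app, so every iteration starts from the unmodified template.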
I have user input statements which I would like to parse for arguments, using regex if possible.
I have read a lot about functools.partial on Stack Overflow, but could not find anything about argument parsing. For regex, I also could not find how to check for a match but exclude the delimiting tokens (here the parentheses) from the result. The Python tokenizer seems too heavy for my purpose.
import functools
import re

import psutil

def getarguments(statement):
    prog = re.compile("([(].*[)])")
    result = prog.search(statement)
    m = result.group()
    # m = '(interval=1, percpu=True)'
    # or m = "('/')"
    # strip the parentheses, ugly but it works
    return statement[result.start()+1:result.end()-1]
stm = 'psutil.cpu_percent(interval=1, percpu=True)'
arg_list = getarguments(stm)
print(arg_list) # returns : interval=1, percpu=True
# But combining single and double quotes like
stm = "psutil.disk_usage('/').percent"
arg_list = getarguments(stm) # in debug value is "'/'"
print(arg_list) # when printed value is : '/'
callfunction = psutil.disk_usage
args = []
args.append(arg_list)
# args.append('/')
funct1 = functools.partial(callfunction, *args)
perc = funct1().percent
print(perc)
This results an error :
builtins.FileNotFoundError: [Errno 2] No such file or directory: "'/'"
But
callfunction = psutil.disk_usage
args = []
#args.append(arg_list)
args.append('/')
funct1 = functools.partial(callfunction, *args)
perc = funct1().percent
print(perc)
This returns 20.3 (for me), which is correct.
So there is a difference somewhere.
The weird thing is, if I view the content in my IDE (WingIDE) the result is "'/'" and then, if I want to view the details then the result is '/'
I use Python 3.4.0. What is happening here, and how can I solve it?
Your help is really appreciated.
getarguments("psutil.disk_usage('/').percent") returns the three-character string "'/'": the quotes from the source text are part of the result. You can check this by printing len(arg_list), which gives 3.
Your IDE shows it wrapped in double quotes because it displays the string's repr: strings are normally shown enclosed in single quotes, but this string itself contains single quotes, so the repr uses double quotes instead.
Note that '/' is not equal to "'/'". The former is a string of 1 character, the latter is a string of 3 characters. So to get things right you need to strip the quotes (both double and single ones) in getarguments. You can do it with the following snippet:
# s is the argument string returned by getarguments
if len(s) >= 2 and ((s.startswith("'") and s.endswith("'")) or
                    (s.startswith('"') and s.endswith('"'))):
    s = s[1:-1]
I'm reading a file and I need to replace certain empty tags ([[Image:]]).
The problem is every replacement has to be unique.
Here's the code:
import re
import codecs
re_imagematch = re.compile(r'(\[\[Image:([^\]]+)?\]\])')
wf = codecs.open('converted.wiki', "r", "utf-8")
wikilines = wf.readlines()
wf.close()
imgidx = 0
for i in range(0, len(wikilines)):
    if re_imagematch.search(wikilines[i]):
        print('MATCH #######################################################')
        print(wikilines[i])
        wikilines[i] = re_imagematch.sub('[[Image:%s_%s.%s]]' % ('outname', imgidx, 'extension'), wikilines[i])
        print(wikilines[i])
        imgidx += 1
This does not work, as there can be many tags in one line:
Here's the input file.
[[Image:]][[Image:]]
[[Image:]]
This is what the output should look like:
[[Image:outname_0.extension]][[Image:outname_1.extension]]
[[Image:outname_2.extension]]
This is what it currently looks like:
[[Image:outname_0.extension]][[Image:outname_0.extension]]
[[Image:outname_1.extension]]
I tried using a replacement function, but the problem is that this function only gets called once per line by re.sub.
You can use itertools.count here and take advantage of the fact that default arguments are evaluated only once, when the function is defined, so a mutable default argument keeps its state between function calls.
from itertools import count

def rep(m, cnt=count()):
    return '[[Image:%s_%s.%s]]' % ('outname', next(cnt), 'extension')
This function will be invoked for each match found and it'll use a new value for each replacement.
So, you simply need to change this line in your code:
wikilines[i] = re_imagematch.sub(rep, wikilines[i])
Demo:
>>> import re
>>> def rep(m, count=count()):
...     return str(next(count))
...
>>> re.sub(r'a', rep, 'aaa')
'012'
To get the current counter value:
>>> from copy import copy
>>> next(copy(rep.__defaults__[0])) - 1
2
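If relying on a mutable default argument feels too clever, the counter can live in an enclosing function instead. Because one counter is shared by all lines, the numbering continues across the whole file, which is what the desired output needs. A sketch that reuses the question's pattern (which also matches non-empty tags such as [[Image:foo.png]]); number_images, outname and extension are illustrative names:

```python
import re
from itertools import count

IMAGE_TAG = re.compile(r"\[\[Image:([^\]]+)?\]\]")

def number_images(lines, outname="outname", extension="extension"):
    """Give every [[Image:...]] tag in lines a unique, increasing index."""
    counter = count()  # one counter for the whole file, not per line

    def rep(match):
        # re.sub calls this once per match, so each tag gets the next number
        return "[[Image:%s_%d.%s]]" % (outname, next(counter), extension)

    return [IMAGE_TAG.sub(rep, line) for line in lines]

lines = ["[[Image:]][[Image:]]\n", "[[Image:]]\n"]
print("".join(number_images(lines)), end="")
```

Unlike the default-argument version, each call to number_images starts counting from 0 again.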
I'd use a simple string replacement wrapped in a while loop:
s = '[[Image:]][[Image:]]\n[[Image:]]'
pattern = '[[Image:]]'
i = 0
while s.find(pattern) >= 0:
    # replace only the first remaining occurrence, then bump the counter
    s = s.replace(pattern, '[[Image:outname_' + str(i) + '.extension]]', 1)
    i += 1
print(s)
I set up a dictionary, and filled it from a file, like so:
import ast

filedusers = {}  # cheap way to keep track of users, not for production
FILE = open(r"G:\School\CS442\users.txt", "r")
filedusers = ast.literal_eval("\"{" + FILE.readline().strip() + "}\"")
FILE.close()
then later I did a test on it, like this:
if not filedusers.get(words[0]):
where words[0] is a string for a username, but I get the following error:
'str' object has no attribute 'get'
but I verified already that after the FILE.close() I had a dictionary, and it had the correct values in it.
Any idea what's going on?
ast.literal_eval takes a string containing a Python literal and converts it into the corresponding Python object. So the following is true:
>>> ast.literal_eval('{"a" : 1}')
{'a': 1}
However, you are adding quotation marks that aren't needed. If your file simply contained an empty dictionary ({}), the string you create would look like this:
>>> ast.literal_eval('"{}"')  # the extra quotes make it return the string "{}"
'{}'
So, the solution would be to change the line to...
ast.literal_eval("{" + FILE.readline().strip() + "}")
...or...
ast.literal_eval(FILE.readline().strip())
...depending on your file layout. Otherwise, literal_eval sees your input as an actual string literal, because of the quotes.
>>> import ast
>>> username = "asd: '123'"
>>> filedusers = ast.literal_eval("\"{" + username + "}\"")
>>> print(filedusers, type(filedusers))
{asd: '123'} <class 'str'>
You don't have a dictionary, it just looks like one. You have a string.
Python is dynamically typed: it does not require you to define variables as a specific type. And it lets you define variables implicitly. What you are doing is defining filedusers as a dictionary, and then redefining it as a string by assigning the result of ast.literal_eval to it.
EDIT: You need to remove those quotes. ast.literal_eval('"{}"') evaluates to a string. ast.literal_eval('{}') evaluates to a dictionary.
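Even with the extra quotes removed, literal_eval only accepts Python literals, so the keys in the file must themselves be quoted: the sample line "asd: '123'" above would become {asd: '123'}, and because asd is a bare name, literal_eval raises ValueError. A minimal sketch, assuming the file's first line holds comma-separated entries like 'asd': '123' (load_users is a hypothetical helper), that also checks it really got a dict:

```python
import ast

def load_users(line):
    """Parse a line like "'asd': '123', 'bob': '456'" into a dict."""
    users = ast.literal_eval("{" + line.strip() + "}")
    # "{...}" could also be a set literal, e.g. "'asd', 'bob'"
    if not isinstance(users, dict):
        raise ValueError("expected dict entries, got %s" % type(users).__name__)
    return users

print(load_users("'asd': '123', 'bob': '456'\n"))  # {'asd': '123', 'bob': '456'}

try:
    load_users("asd: '123'")
except ValueError as exc:
    print("unquoted key rejected:", exc)
```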