I'm trying to parse a procedure's documentation comment block with pyparsing.
proc test {a b c} {
    # Proc description
    # :args: a - argument a, b - second argument
    # c - third argument
    # :return: nothing
    puts $a
}
I created the tokens below:
EOL = pp.Suppress(pp.LineEnd())
line = pp.SkipTo(EOL)
commentStart = pp.Suppress('#')
commentLine = commentStart + pp.restOfLine
startDocsReturn = commentStart + pp.Keyword(":return:").suppress()
docsReturnLine = startDocsReturn + line
startDocsArgs = commentStart + pp.Keyword(":args:").suppress()
docsArgsLine = startDocsArgs + line
docsDescription = pp.OneOrMore(commentLine, stopOn=startDocsArgs).setParseAction(join_lines)
It parses correctly only if the args block is a single line. If the block spans multiple lines, wrapping the line token in docsArgsLine in OneOrMore doesn't work, because each continuation line starts with the hash character.
What is the correct expression to parse a multiline keyword block whose lines start with a special character?
After watching the ArjanCodes video on dataclasses,
I've been trying to load variables from a JSON config file into a Python dataclass, to format the font style of a print function, printT, in JupyterLab.
I use ANSI escapes for the formatting, but they no longer work once I load the variables into the dataclass. Instead of formatting the text, the ANSI codes get printed out.
# config.json
{
"lb" : "\n",
"solid_line" : "'___'*20 + config.lb",
"dotted_line" : "'---'*20 + config.lb",
"BOLD" : "\\033[1m",
"END" : "\\033[0m"
}
# config.py
from dataclasses import dataclass
import json
@dataclass
class PrintConfig:
    lb : str
    solid_line : str
    dotted_line : str
    BOLD : str
    END : str
def read_config(config_file : str) -> PrintConfig:
    with open(config_file, 'r') as file:
        data = json.load(file)
    return PrintConfig(**data)
# helper.py
from config import read_config
config = read_config('config.json')
def printT(title, linebreak=True, addLine=True, lineType=config.solid_line, toDisplay=None):
    '''
    Prints a line break, the input text and a solid line.
    Inputs:
    title = as string
    linebreak = True(default) or False; Adds a line break before printing the title
    addLine = True(default) or False; Adds a line after printing the title
    lineType = solid_line(default) or dotted_line; Defines line type
    toDisplay = displays input, doesnt work with df.info(),because info executes during input
    '''
    if linebreak:
        print(config.lb)
    print(config.BOLD + title + config.END)
    if addLine:
        print(lineType)
    if toDisplay is not None:
        display(toDisplay)
# test.ipynb
from helper import printT
printT('Hello World')
Output
\033[1mHello World\033[0m
'___'*20 + config.lb
Desired result
Hello World
It works if I use eval, i.e. if addLine: print(eval(lineType)), but I'd like to get deeper insight into the mechanics here. Is there a way of getting it to work without eval?
Also, this part feels wrong: "solid_line" : "'___'*20 + config.lb".
Markdown as alternative to ANSI
Here's a basic configuration system. I won't add the output since it would need a screenshot, but it works on bash/macOS. Inspired by the Stack Overflow answer credited in the code below and by tip_colors_and_formatting.
From https://misc.flogisoft.com/bash/tip_colors_and_formatting:
In Bash, the <Esc> character can be obtained with the following syntaxes:
\e
\033
\x1B
\e didn't work, so I went on to use \x1B since that worked in the linked SE answer. \033 works too, I checked.
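As a quick check of the observation above, here is a minimal Python sketch. The variable names are illustrative and not part of the original answer. It suggests why \033 and \x1B behave the same while \e does not: the first two (and \u001b) are spellings of the same ESC character, whereas Python has no \e escape.

```python
# Three Python spellings of the ESC control character (code point 27).
ESC_OCTAL = "\033"
ESC_HEX = "\x1B"
ESC_UNICODE = "\u001b"
same_char = ESC_OCTAL == ESC_HEX == ESC_UNICODE == chr(27)

# Bash understands \e, but Python has no such escape sequence, so the
# backslash and the letter stay as two literal characters. A raw string
# is used here to spell those two characters without a syntax warning.
NOT_AN_ESCAPE = r"\e"

# ESC + "[1m" switches bold on, ESC + "[0m" resets the formatting.
bold_hello = f"{ESC_HEX}[1mHello World{ESC_HEX}[0m"

print(same_char, len(NOT_AN_ESCAPE))
print(bold_hello)
```

On a terminal that honours ANSI sequences, the last line should appear in bold.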
from dataclasses import dataclass
PREFIX = "\x1B["
#these aren't configurable, they are ANSI constants so probably
#not useful to put them in a config json
CODES = dict(
    prefix = PREFIX,
    bold = "1",
    reset = f"{PREFIX}0m",
    red = "31",
    green = "32",
)
@dataclass
class PrintConfig:
    bold : bool = False
    color : str = ""
    def __post_init__(self):
        # these are calculated variables, none of client code's
        # business:
        self.start = self.end = ""
        start = ""
        if self.bold:
            start += CODES["bold"] + ";"
        if self.color:
            start += CODES[self.color.lower()] + ";"
        if start:
            self.end = CODES["reset"]
            #add the escape prefix, then the codes and close with m
            self.start = f"{CODES['prefix']}{start}".rstrip(";") + "m"
    def print(self,v):
        print(f"{self.start}{v}{self.end}")
normal = PrintConfig()
normal.print("Hello World")
bold = PrintConfig(bold=1)
print(f"{bold=}:")
bold.print(" Hello World")
boldred = PrintConfig(bold=1,color="red")
print(f"{boldred=}:")
boldred.print(" Hello bold red")
#this is how you would do it from json
green = PrintConfig(**dict(color="green"))
green.print(" Little Greenie")
#inspired from https://stackoverflow.com/a/287934
print("\n\ninspired by...")
CSI = "\x1B["
print(CSI+"31;40m" + "Colored Text" + CSI + "0m")
print(CSI+"1m" + "Colored Text" + CSI + "0m")
This string consists of an actual backslash followed by the digits 033, etc.:
"BOLD" : "\\033[1m",
To turn on bold on an ANSI terminal, you need an escape character (octal 33) followed by [1m. In Python, you can write that escape code with a single backslash: "\033[1m". JSON has no octal escapes, so in a JSON file you must provide the Unicode code point of the escape character, \u001b. If the rest is in order, you'll see boldface.
"BOLD" : "\u001b[1m",
"END" : "\u001b[0m"
As for the eval part, you have a string containing the expression you need to evaluate. I assume you wrote it this way because you first tried without the double quotes, e.g.
"dotted_line" : '---'*20 + config.lb,
and you got a JSON syntax error. That's not surprising: JSON files are data, not code, and they cannot incorporate expressions or variable references. Either place your config in a Python file that you import instead of loading JSON, or move the dependencies to the code. Or both.
In a Python file, config.py:
config = {
    "lb": "\n",
    "solid_line" : '___'*20,
    ...
In helper.py:
...
if addLine:
    print(lineType + config.lb)
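One possible way to "move the dependencies to the code" while still loading JSON is sketched below. It is only an illustration under stated assumptions: the line_width field and the CONFIG_JSON string are hypothetical, and the JSON is held in a string here instead of a config.json file so that the example is self-contained. The JSON carries plain data only (including the \u001b escapes), and the dataclass derives the line strings itself, so no eval is needed.

```python
import json
from dataclasses import dataclass, field

# Hypothetical config content; in practice this would live in config.json.
# The raw string keeps the backslashes exactly as they would appear on disk.
CONFIG_JSON = r'''
{
    "lb": "\n",
    "line_width": 60,
    "BOLD": "\u001b[1m",
    "END": "\u001b[0m"
}
'''

@dataclass
class PrintConfig:
    lb: str = "\n"
    line_width: int = 60          # assumed field, not in the original config
    BOLD: str = "\x1b[1m"
    END: str = "\x1b[0m"
    # Derived values: computed in code, never read from the JSON file.
    solid_line: str = field(init=False)
    dotted_line: str = field(init=False)

    def __post_init__(self):
        self.solid_line = "_" * self.line_width + self.lb
        self.dotted_line = "-" * self.line_width + self.lb

config = PrintConfig(**json.loads(CONFIG_JSON))

print(config.BOLD + "Hello World" + config.END)
print(config.solid_line)
```

Because solid_line and dotted_line are declared with init=False, passing them in the JSON would raise a TypeError, which keeps expressions out of the data file.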
I am writing a function for deleting selected text (in a special way) when vim is running in a ssh session:
python << EOF
def delSelection():
    buf = vim.current.buffer
    (lnum1, col1) = buf.mark('<')
    (lnum2, col2) = buf.mark('>')
    # get selected text
    # lines = vim.eval('getline({}, {})'.format(lnum1, lnum2))
    # lines[0] = lines[0][col1:]
    # lines[-1] = lines[-1][:col2+1]
    # selected = "\n".join(lines) + "\n"
    # passStrNc(selected)
    # delete selected text
    lnum1 -= 1
    lnum2 -= 1
    firstSeletedLine = buf[lnum1]
    firstSeletedLineNew = buf[lnum1][:col1]
    lastSelectedLine = buf[lnum2]
    lastSelectedLineNew = buf[lnum2][(col2 + 1):]
    newBuf = ["=" for i in range(lnum2 - lnum1 + 1)]
    newBuf[0] = firstSeletedLineNew
    newBuf[-1] = lastSelectedLineNew
    print(len(newBuf))
    print(len(buf[lnum1:(lnum2 + 1)]))
    buf[lnum1:(lnum2 + 1)] = newBuf
EOF
function! DelSelection()
python << EOF
delSelection()
EOF
endfunction
python << EOF
import os
sshTty = os.getenv("SSH_TTY")
if sshTty:
    cmd6 = "vnoremap d :call DelSelection()<cr>"
    vim.command(cmd6)
EOF
Apparently vim is calling the function on every line selected, which defeats the whole purpose of the function. How should I do this properly?
That's because the : automatically inserts the '<,'> range when issued in visual mode. The canonical way to clear that is by prepending <C-u> to the mapping:
cmd6 = "vnoremap d :<C-u>call DelSelection()<cr>"
Alternatively, you can append the range keyword to the :function definition, so that the function is called only once for the whole range; see :help a:firstline.
OK, I got it. I just need to add an <esc> before calling the function:
python << EOF
import os
sshTty = os.getenv("SSH_TTY")
if sshTty:
    cmd6 = "vnoremap d <esc>:call DelSelection()<cr>"
    vim.command(cmd6)
EOF
I have a neat little script in Python that I would like to port to Ruby, and I think it's highlighting my noobishness at Ruby. I'm getting an error that there is an unexpected end statement, but I don't see how this can be so. Perhaps there is a keyword that requires an end, or something that doesn't want an end, that I forgot about. Here is all of the code leading up to the offending line. The offending line is marked with a comment.
begin
  require base64
  require base32
rescue LoadError
  puts "etext requires base32. use 'gem install --remote base32' and try again"
end
# Get a string from a text file from disk
filename = ARGV.first
textFile = File.open(filename)
text = textFile.read()
mailType = "text only" # set the default mailType
#cut the email up by sections
textList1 = text.split(/\n\n/)
header = textList1[0]
if header.match (/MIME-Version/)
  mailType = "MIME"
end
#If mail has no attachments, parse as text-only. This is the class that does this
class TextOnlyMailParser
  def initialize(textList)
    a = 1
    body = ""
    header = textList[0]
    @parsedEmail = Email.new(header)
    while a < textList.count
      body += ('\n' + textList[a] + '\n')
      a += 1
    end
    @parsedEmail.body = body
  end
end
def separate(text,boundary = nil)
  # returns list of strings and lists containing all of the parts of the email
  if !boundary #look in the email for "boundary= X"
    text.scan(/(?<=boundary=).*/) do |bound|
      textList = recursiveSplit(text,bound)
    end
    return textList
  end
  if boundary
    textList = recursiveSplit(text,boundary)
  end
end
def recursiveSplit(chunk,boundary)
  if chunk.is_a? String
    searchString = "--" + boundary
    ar = cunk.split(searchString)
    return ar
  elsif chunk.is_a? Array
    chunk do |bit|
      recursiveSplit(bit,boundary);
    end
  end
end
class MIMEParser
  def initialize(textList)
    @textList = textList
    @nestedItems = []
    newItem = NestItem.new(self)
    newItem.value = @textList[0]
    newItem.contentType = "Header"
    @nestedItems.push(newItem)
    #setup parsed email
    @parsedEmail = Email.new(newItem.value)
    self._constructNest
  end
  def checkForContentSpecial(item)
    match = item.value.match (/Content-Disposition: attachment/)
    if match
      filename = item.value.match (/(?<=filename=").+(?=")/)
      encoding = item.value.match (/(?<=Content-Transfer-Encoding: ).+/)
      data = item.value.match (/(?<=\n\n).*(?=(\n--)|(--))/m)
      dataGroup = data.split(/\n/)
      dataString = ''
      i = 0
      while i < dataGroup.count
        dataString += dataGroup[i]
        i ++
      end #<-----THIS IS THE OFFENDING LINE
      @parsedEmail.attachments.push(Attachment.new(filename,encoding,dataString))
    end
Your issue is the i ++ line: Ruby does not have pre- or post-increment/decrement operators, and the line fails to parse. Ruby reads the ++ as a binary + followed by a unary +, so it keeps looking for an operand on the next line and finds end instead, which is why the error is reported there. I can't personally account for why i++ appears to be accepted in IRB while i ++ performs no action.
Instead, replace your ++ operators with += 1, making that last while loop:
while i < dataGroup.count
  dataString += dataGroup[i]
  i += 1
end
But also think about the Ruby way: if you're just appending the pieces to a string, why not use dataString = dataGroup.join instead of looping with a while construct?
I am trying to figure out how to use this nifty lib to parse BigIP config files...
The grammar should be something like this:
stanza :: name { content }
name :: several words, might contain alphas nums dot dash underscore or slash
content:: stanza OR ZeroOrMore(printable characters)
To make things slightly more complicated, one exception:
If name starts with "rule ", then content cannot be "stanza"
I started with this:
from pyparsing import *
def parse(config):
    def BNF():
        """
        Example:
        ...
        ltm virtual /Common/vdi.uis.test.com_80_vs {
            destination /Common/1.2.3.4:80
            http-class {
                /Common/http2https
            }
            ip-protocol tcp
            mask 255.255.255.255
            profiles {
                /Common/http { }
                /Common/tcp { }
            }
            vlans-disabled
        }
        ...
        """
        lcb, rcb, slash, dot, underscore, dash = [c for c in '{}/._-']
        name_word = Word(alphas + nums + dot + underscore + slash + dash)
        name = OneOrMore(name_word).setResultsName("name")
        stanza = Forward()
        content = OneOrMore(stanza | ZeroOrMore(OneOrMore(Word(printables)))).setResultsName("content")
        stanza << Group(name + lcb + content + rcb).setResultsName("stanza")
        return stanza
    return [x for x in BNF().scanString(config)]
The code above seems to lock up in an infinite loop. It also doesn't handle my requirement to not look for a "stanza" if "name" starts with "rule ".
OneOrMore(ZeroOrMore(OneOrMore(Word(printables)))) will always match, because ZeroOrMore succeeds even when it consumes nothing; the outer OneOrMore then repeats forever without advancing, which leads to the infinite loop.
Also, printables includes a closing curly bracket, which gets consumed by the content term and is no longer available for the stanza. (If your content can include a closing bracket, you need to define some way to escape it, to distinguish a content bracket from a stanza bracket.)
To address the name rule, you need another content definition, one that doesn't include stanza, and a "rule rule".
def parse(config):
    def BNF():
        lcb, rcb, slash, dot, underscore, dash = [c for c in '{}/._-']
        printables_no_rcb = Word(printables, excludeChars=rcb)
        name_word = Word(alphas + nums + dot + underscore + slash + dash)
        name = OneOrMore(name_word).setResultsName("name")
        rule = Group(Literal('rule') + name).setResultsName("name")
        rule_content = OneOrMore(printables_no_rcb).setResultsName("content")
        stanza = Forward()
        content = OneOrMore(stanza | OneOrMore(printables_no_rcb)).setResultsName("content")
        stanza << Group(rule + lcb + rule_content + rcb | name + lcb + content + rcb).setResultsName("stanza")
        return stanza
    return [x for x in BNF().scanString(config)]
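To make the intended structure concrete without relying on pyparsing, here is a small hand-written, line-based sketch in plain Python. It is a different technique from the grammar above, not a replacement for it, and it rests on assumptions that are not in the question: every stanza header ends its line with {, a closing } sits on its own line (apart from one-line stanzas such as /Common/http { }), and the body of a stanza whose name starts with "rule " is kept as raw lines. The function names are hypothetical.

```python
def parse_stanzas(text):
    """Return a list of plain lines and (name, body) tuples."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    items, _ = _parse_items(lines, 0, top_level=True)
    return items

def _parse_items(lines, pos, top_level=False):
    items = []
    while pos < len(lines):
        line = lines[pos]
        pos += 1
        if line == "}":                      # end of the enclosing stanza
            if top_level:
                raise ValueError("unbalanced '}'")
            return items, pos
        if line.endswith("{"):               # stanza header: "name {"
            name = line[:-1].strip()
            if name.startswith("rule "):     # rule bodies are not parsed further
                body, pos = _read_raw_body(lines, pos)
            else:
                body, pos = _parse_items(lines, pos)
            items.append((name, body))
        elif "{" in line and line.endswith("}"):   # one-line stanza: "name { }"
            name, _, rest = line.partition("{")
            inner = rest[:-1].strip()
            items.append((name.strip(), [inner] if inner else []))
        else:                                # ordinary content line
            items.append(line)
    if not top_level:
        raise ValueError("missing closing '}'")
    return items, pos

def _read_raw_body(lines, pos):
    """Collect raw lines until the brace that closes the rule stanza."""
    depth, body = 1, []
    while pos < len(lines):
        line = lines[pos]
        pos += 1
        depth += line.count("{") - line.count("}")
        if depth <= 0:
            return body, pos
        body.append(line)
    raise ValueError("missing closing '}'")

sample = """
ltm virtual /Common/vs {
    destination /Common/1.2.3.4:80
    profiles {
        /Common/http { }
    }
}
rule /Common/redirect {
    when HTTP_REQUEST {
        HTTP::redirect https://example.com
    }
}
"""
result = parse_stanzas(sample)
print(result)
```

Counting braces inside a rule body is what lets nested braces in the rule text pass through untouched, which corresponds to the "content cannot be stanza" exception in the question.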
If I have a keyword, how can I get the lexer, once it encounters that keyword, to grab the rest of the line and return it as a string? That is, once it reaches the end of the line, it should return everything that followed the keyword on that line.
Here is the line I'm looking at:
description here is the rest of my text to collect
Thus, when the lexer encounters description, I would like "here is the rest of my text to collect" returned as a string
I have the following defined, but it seems to be throwing an error:
states = (
    ('bcdescription', 'exclusive'),
)
def t_bcdescription(t):
    r'description '
    t.lexer.code_start = t.lexer.lexpos
    t.lexer.level = 1
    t.lexer.begin('bcdescription')
def t_bcdescription_close(t):
    r'\n'
    t.value = t.lexer.lexdata[t.lexer.code_start:t.lexer.lexpos+1]
    t.type="BCDESCRIPTION"
    t.lexer.lineno += t.value.count('\n')
    t.lexer.begin('INITIAL')
    return t
This is part of the error being returned:
File "/Users/me/Coding/wm/wm_parser/ply/lex.py", line 393, in token
raise LexError("Illegal character '%s' at index %d" % (lexdata[lexpos],lexpos), lexdata[lexpos:])
ply.lex.LexError: Illegal character ' ' at index 40
Finally, if I wanted this functionality for more than one token, how could I accomplish that?
Thanks for your time
There is no big problem with your code; in fact, I copied your code and ran it, and it works well:
import ply.lex as lex
states = (
    ('bcdescription', 'exclusive'),
)
tokens = ("BCDESCRIPTION",)
def t_bcdescription(t):
    r'\bdescription\b'
    t.lexer.code_start = t.lexer.lexpos
    t.lexer.level = 1
    t.lexer.begin('bcdescription')
def t_bcdescription_close(t):
    r'\n'
    t.value = t.lexer.lexdata[t.lexer.code_start:t.lexer.lexpos+1]
    t.type="BCDESCRIPTION"
    t.lexer.lineno += t.value.count('\n')
    t.lexer.begin('INITIAL')
    return t
def t_bcdescription_content(t):
    r'[^\n]+'
    # matches the rest of the line; nothing is returned, so no token is emitted here
lexer = lex.lex()
data = 'description here is the rest of my text to collect\n'
lexer.input(data)
while True:
    tok = lexer.token()
    if not tok: break
    print(tok)
and the result is:
LexToken(BCDESCRIPTION,' here is the rest of my text to collect\n',1,50)
So maybe you can check the other parts of your code.
If you want this functionality for more than one token, you can simply capture words, and when a word that is one of those keywords appears, start capturing the rest of the line with the code above.
It is not obvious why you need to use a lexer/parser for this without further information.
>>> x = 'description here is the rest of my text to collect'
>>> a, b = x.split(' ', 1)
>>> a
'description'
>>> b
'here is the rest of my text to collect'
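The same split idea extends to several keywords. The following is a minimal sketch; the KEYWORDS set, the function name, and the sample text are made up for illustration and would need to be adapted to the real input.

```python
KEYWORDS = {"description", "comment"}  # hypothetical keyword set

def split_keyword_lines(text):
    """Yield (keyword, rest_of_line) for lines that start with a known keyword."""
    for line in text.splitlines():
        # partition splits at the first space only, like x.split(' ', 1),
        # but never fails when the line has no space at all.
        keyword, _, rest = line.strip().partition(" ")
        if keyword in KEYWORDS:
            yield keyword, rest.strip()

sample = (
    "description here is the rest of my text to collect\n"
    "other stuff\n"
    "comment second line"
)
pairs = list(split_keyword_lines(sample))
print(pairs)
```

Lines that do not start with one of the keywords are simply skipped, so this only suits input where each keyword opens its own line.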