How to write a PLY interface for hand-written lexer? - python

I'm writing a compiler in Python, and I made a hand-written lexer, because I can't figure out how to parse indentation in PLY. Also, my lexer uses some yield statements like so:
def scan(self):
    ...
    for i in tokens:
        if i[0]: yield Token(self.line, i[0] if i[0] in keywords else "ident", i[0])
        elif i[1]:
            if "e" in i[1]:
                base, exp = i[1].split("e")
                val = float(base) * 10 ** int(exp)
            else: val = float(i[1])
            yield Token(self.line, "float", val)
        ... other cases ...
However, I realized that the PLY parser requires a token method, so I made one that looks like this:
def token(self):
    return next(self.scan())
The actual scanning using scan() takes an average of 124 ms, according to my tests, but when I use the PLY parser, parsing still hasn't got anywhere after a few minutes. It appears that my token() method has a problem.
I also tried renaming the scan() method so that it would serve as the interface directly. Python then raises something like
AttributeError: 'generator' object has no attribute 'type'
So it appears that PLY needs a method that will return a single token at a time.
Is there any way to rewrite the token() method so that it would return the next iteration of scan() and not be that slow?

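Every call to self.scan() creates a brand-new generator, so your token() starts scanning from the beginning each time and always returns the first token. You need to create the generator once and save it somewhere, like:
def start(...):
    self.lexer = self.scan()
def token(...):
    return next(self.lexer)
Disclaimer: I don't know anything about PLY.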
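To make the idea above concrete, here is a minimal self-contained sketch. The HandLexer class, its whitespace-splitting scan() and the Token fields are stand-ins invented for illustration, not the asker's real scanner. As far as I know, PLY's parser calls input() when you pass a string to parse() and treats a None return from token() as end of input, but treat both as assumptions and check the PLY documentation.

```python
class Token:
    """Minimal token object; PLY reads the .type and .value attributes."""
    def __init__(self, lineno, type, value):
        self.lineno = lineno
        self.type = type
        self.value = value
        self.lexpos = 0  # placeholder; a real lexer would track the offset


class HandLexer:
    """Hypothetical wrapper that exposes a generator-based scanner via token()."""

    def __init__(self, text=""):
        self.input(text)

    def input(self, text):
        self.text = text
        # Create the generator exactly once per input and keep it.
        self._tokens = self.scan()

    def scan(self):
        # Stand-in scanner: every whitespace-separated word is an "ident".
        for lineno, line in enumerate(self.text.splitlines(), 1):
            for word in line.split():
                yield Token(lineno, "ident", word)

    def token(self):
        # Resume the saved generator; None signals that the input is exhausted.
        return next(self._tokens, None)


lexer = HandLexer("a b\nc")
values = []
tok = lexer.token()
while tok is not None:
    values.append(tok.value)
    tok = lexer.token()
```

Using next(self._tokens, None) instead of a bare next() avoids a StopIteration escaping from token() once the scanner runs out of tokens.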

Related

I can't put "continue" command in a definition?

Let's say,
def sample():
    if a==1:
        print(a)
    else:
        continue
for i in language:
    a=i
    sample()
I want to use this function in a loop, but the continue command gives me an error because there is no loop. What can I do?
Return a boolean from the function and, based on the return value, either continue or not, because continue must be inside a loop.
The continue keyword in Python is only valid inside for or while loops. Also, your function reads a as a global variable that the loop happens to set; it is clearer to pass it in as an argument.
I don't know what you want to achieve but assuming your code, you want to extract a condition into a function, something like this:
def condition(a):
    return a == 1
def sample(a):
    print(a)
for i in language:
    a=i
    if condition(a):
        sample(a)
    else:
        continue
There are several best-practice patterns of exactly how to do this, depending on your needs.
0. Factor your code better
Before doing any of the below, stop and ask yourself if you can just do this instead:
def sample(a):
    print(a)
for i in language:
    if i != 1:
        continue
    sample(i)
This is so much better:
it's clearer to the reader (everything you need to understand the loop's control flow is entirely local to the loop - it's right there in the loop, we don't have to look anywhere else farther away like a function definition to know when or why or how the loop will do the next thing),
it's cleaner (less boilerplate code than any of the solutions below),
it's more efficient, technically (not that this should matter until you measure a performance problem, but this might appeal to you; going into a function and coming back out of it, plus somehow telling the loop outside the function to continue - that's more work to achieve the same thing), and
it's simpler (objectively: there is less code complected together - the loop behavior is no longer tied to the body of the sample function, for example).
But, if you must:
1. Add boolean return
The simplest change that works with your example is to return a boolean:
def sample(a):
    if a==1:
        print(a)
    else:
        return True
    return False
for i in language:
    if sample(i):
        continue
However, don't just mindlessly always use True for continue - for each function, use the one that fits with the function. In fact, in well-factored code, the boolean return value will make sense without even knowing that you are using it in some loop to continue or not.
For example, if you have a function called check_if_valid, then the boolean return value just makes sense without any loops - it tells you if the input is valid - and at the same time, either of these loops is sensible depending on context:
for thing in thing_list:
    if check_if_valid(thing):
        continue
    ... # do something to fix the invalid things
for thing in thing_list:
    if not check_if_valid(thing):
        continue
    ... # do something only with valid things
2. Reuse existing return
If your function already returns something, or you can rethink your code so that returns make sense, then you can ask yourself: is there a good way to decide to continue based on that return value?
For example, let's say inside your sample function you were actually trying to do something like this:
def sample(a):
    record = select_from_database(a)
    if record.status == 1:
        print(record)
    else:
        continue
Well then you can rewrite it like this:
def sample(a):
    record = select_from_database(a)
    if record.status == 1:
        print(record)
    return record
for i in language:
    record = sample(i)
    if record.status != 1:
        continue
Of course in this simple example, it's cleaner to just not have the sample function, but I am trusting that your sample function is justifiably more complex.
3. Special "continue" return
If no existing return value makes sense, or you don't want to couple the loop to the return value of your function, the next simplest pattern is to create and return a special unique "sentinel" object instance:
_continue = object()
def sample(a):
    if a==1:
        print(a)
    else:
        return _continue
for i in language:
    result = sample(i)
    if result is _continue:
        continue
(If this is part of a module's API, which is something that you are saying if you name it like sample instead of like _sample, then I would name the sentinel value continue_ rather than _continue... But I also would not make something like this part of an API unless I absolutely had to.)
(If you're using a type checker and it complains about returning an object instance conflicting with your normal return value, you can make a Continue class and return an instance of that instead of an instance of object(). Then the type hinting for the function return value can be a type union between your normal return type and the Continue type. The same approach extends naturally if you have multiple control flow constructs in your code that you want to smuggle across function call lines like this.)
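A minimal sketch of that typed variant might look like the following. The Continue class, the CONTINUE instance, the string return value and the run() helper are all names made up for illustration, and this is only one way to arrange it.

```python
from typing import List, Union


class Continue:
    """Marker type: tells the calling loop to skip to the next item."""


CONTINUE = Continue()


def sample(a: int) -> Union[str, Continue]:
    # Normal result is a string; anything else asks the caller to continue.
    if a == 1:
        return f"value {a}"
    return CONTINUE


def run(language: List[int]) -> List[str]:
    results = []
    for i in language:
        result = sample(i)
        if isinstance(result, Continue):
            continue
        # After the isinstance check, a type checker can narrow result to str.
        results.append(result)
    return results


collected = run([1, 2, 1, 3])
```

Checking with isinstance rather than identity is what lets a type checker narrow the union for the rest of the loop body.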
4. Wrap return value (and "monads")
Sometimes, if the type union thing isn't good enough for some reason, you may want to create a wrapper object, and have it store either your original return value, or indicate control flow. I only mention this option for completeness, without examples, because I think the previous options are better most of the time in Python. But if you take the time to learn about "Option types" and "maybe monads", it's kinda like that.
(Also, notice that in all of my examples, I fixed your backdoor argument passing through a global variable to be an explicit clearly passed argument. This makes the code easier to understand, predict, and verify for correctness - you might not see that yet but keep an eye out for implicit state passing making code harder to follow and keep correct as you grow as a developer, read more code by others, and deal with bugs.)
This is because the function's scope doesn't know that it is being called from a loop. You have to put the continue keyword inside the loop itself.
The continue keyword cannot be used inside a function on its own. It must be inside a loop. There is a similar question here. Maybe you can do something like the following.
language = [1,1,1,2,3]
a = 1
def sample():
    if a == 1:
        print(a)
        return False
    else:
        return True
for i in language:
    if sample():
        continue
    else:
        a = i
OR something like this:
language = [1,1,1,2,3]
def gen(base):
    for item in base:
        if item == 1:
            yield item
        else:
            continue
for i in gen(language):
    a = i
    print(a)

Simplify chained expression in ANTLR listener

I have an ANTLR listener for C++ where I want to get the name of a member declarator. Currently, I'm using this approach:
def enterMemberDeclarator(self, ctx: CPP14Parser.MemberDeclaratorContext):
    id = ctx.declarator().pointerDeclarator().noPointerDeclarator().noPointerDeclarator().declaratorid().idExpression().unqualifiedId()
which is just a horrible expression. I feel like there should be some way of getting the id immediately without having to go down that rabbit hole. Additionally, some of these expressions might be None so I fear that I would have to make even more effort to get to the result...
The grammar is from here
ANTLR4 supports XPath expressions to find specific nodes (see the documentation). That's somewhat easier to read than your expression, especially when you have to check for null:
ids = XPath.findAll(ctx, "/declarator/pointerDeclarator/noPointerDeclarator/noPointerDeclarator/declaratorid/idExpression/unqualifiedId")
(this is just pseudo code, I don't know python well).
This definitely looks very brittle (specific to a particular example).
You might consider a recursive method that checks the type of the context being passed in and chooses the appropriate attribute.
I'm not a Python programmer, so (in pseudoCode) something like the following:
function getMDName(ctx: <appropriate ANTLR base class>) -> String
    if ctx is MemberDeclaratorContext
        return getMDName(ctx.declarator())
    if ctx is DeclaratorContext
        if ctx.pointerDeclarator() != null
            return getMDName(ctx.pointerDeclarator())
        else
            return getMDName(ctx.noPointerDeclarator())
    if ctx is NoPointerDeclaratorContext
        if ctx.declaratorid() != null
            return getMDName(ctx.declaratorid())
        else if ctx.pointerDeclarator() != null
            return getMDName(ctx.pointerDeclarator())
        else
            return getMDName(ctx.noPointerDeclarator())
    if ctx is PointerDeclaratorContext
        return getMDName(ctx.noPointerDeclarator())
    if ctx is DeclaratoridContext
        return ctx.idExpression().unqualifiedId().getText()
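In Python, a more generic variant of the same recursive idea is to search the subtree for the first node of the wanted type instead of spelling out each rule. The sketch below assumes tree nodes expose getChildCount(), getChild(i) and getText(), as I believe the ANTLR4 Python runtime's contexts do; the Node classes here are stand-ins so the example runs without a generated parser.

```python
def find_first(ctx, ctx_type):
    """Depth-first search: return ctx or its first descendant of ctx_type, else None."""
    if ctx is None:
        return None
    if isinstance(ctx, ctx_type):
        return ctx
    for i in range(ctx.getChildCount()):
        found = find_first(ctx.getChild(i), ctx_type)
        if found is not None:
            return found
    return None


# Stand-in node classes that mimic the tree interface assumed above.
class Node:
    def __init__(self, text="", children=()):
        self.text = text
        self.children = list(children)

    def getChildCount(self):
        return len(self.children)

    def getChild(self, i):
        return self.children[i]

    def getText(self):
        return self.text or "".join(c.getText() for c in self.children)


class DeclaratorNode(Node):
    pass


class UnqualifiedIdNode(Node):
    pass


tree = DeclaratorNode(children=[Node(children=[UnqualifiedIdNode("member_name")])])
found = find_first(tree, UnqualifiedIdNode)
name = found.getText() if found is not None else None
```

In the listener this would presumably be called as find_first(ctx, CPP14Parser.UnqualifiedIdContext), and the None result covers the case where some intermediate rule is absent.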

searching python object hierarchy

I'm using an unstable python library that's undergoing changes to the API in its various git submodules. Every few weeks, some member or method's location will get changed or renamed. For example, the expression
vk.rinkront[0].asyncamera.printout()
that worked a few weeks back, now has the asyncamera member located in
vk.toplink.enabledata.tx[0].asyncamera[0].print()
I manage to figure out the method's new location by grepping, git diffing, IPython autocomplete, and generally bumbling my way around. This process is painful, because the repository has been quite abstracted and the member names observable from the Python shell do not necessarily appear in the git submodule code.
Is there a Python routine that performs a graph traversal of the object hierarchy while checking for keywords (e.g. print)?
(If not, I'll hack together some BFS/DFS that checks the child node type before pushing/appending the contents of __dict__, arrays, dir(), etc. onto a queue/stack. But someone must have come across this problem before and come up with a more elegant solution.)
-------EDITS----------
to sum up, is there an existing library such that the following code would work?
import unstable_library
import object_inspecter.search
vk = unstable_library.create() #initializing vk
object_inspecter.search(vk,"print") #searching for lost method
Oy, I feel for you... You probably should just lock down on a version, use it until the API becomes stable, and then upgrade later. That's why complex projects focus on dependency version control.
There is a way, but it's not really standard. The dir() built-in function returns a list of the names of all attributes and methods of an object. With a little bit of wrangling, you could write a script that recursively digs down.
The ultimate issue, though, is that you're going to run into infinite recursive loops when you try to investigate class members. You'll need to add in smarts to either recognize the pattern or limit the depth of your searching.
you can use fairly simple graph traversal to do this
import re
import types

def find_thing(ob,pattern,seen=None,name=None,curDepth=0,maxDepth=99,isDict=True):
    if seen is None:
        seen = set()  # fresh set per top-level call, shared by all recursive calls
    if ob is None:
        return []
    if id(ob) in seen:
        return []
    seen.add(id(ob))
    name = str(name or getattr(ob,"__name__",str(ob)))
    base_case = check_base_cases(ob,name,pattern)
    if base_case is not None:
        return base_case
    if curDepth>=maxDepth:
        return []
    return recursive_step(ob,pattern,name=name,curDepth=curDepth,isDict=isDict,seen=seen,maxDepth=maxDepth)
now just define your two steps (check_base_cases and recursive_step)... something like
def check_base_cases(ob,name,pattern):
    if isinstance(ob,str):
        if re.match(pattern,ob):
            return [ob]
        else:
            return []
    if isinstance(ob,(int,float)):  # on Python 2, add long to this tuple
        if ob == pattern or str(ob) == pattern:
            return [ob]
        else:
            return []
    if isinstance(ob,types.FunctionType):
        if re.match(pattern,name):
            return [name]
        else:
            return []
    # falling through returns None: not a base case, so keep recursing
def recursive_step(ob,pattern,name,curDepth,isDict=True,seen=None,maxDepth=99):
    matches = []
    if isinstance(ob,(list,tuple)):
        for i,sub_ob in enumerate(ob):
            matches.extend(find_thing(sub_ob,pattern,seen=seen,name='%s[%s]'%(name,i),curDepth=curDepth+1,maxDepth=maxDepth))
        return matches
    if isinstance(ob,dict):
        for key,item in ob.items():
            if re.match(pattern,str(key)):
                matches.append('%s.%s'%(name,key) if not isDict else '%s["%s"]'%(name,key))
            else:
                matches.extend(find_thing(item,pattern,seen=seen,
                    name='%s["%s"]'%(name,key) if isDict else '%s.%s'%(name,key),
                    curDepth=curDepth+1,maxDepth=maxDepth))
        return matches
    else:
        # treat any other object as a dict of its public attributes
        data = dict([(x,getattr(ob,x)) for x in dir(ob) if not x.startswith("_")])
        return find_thing(data,pattern,seen=seen,name=name,curDepth=curDepth+1,maxDepth=maxDepth,isDict=False)
finally you can test it like so
print(find_thing(vk,".*print.*"))
I used the following Example
class vk(object):
    class enabledata:
        class camera:
            class object2:
                def printX(*args):
                    pass
            asynccamera = [object2]
        tx = [camera]
    toplink = {'enabledata':enabledata}
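running the following
print( find_thing(vk,'.*print.*') )
#['vk.toplink["enabledata"].camera.asynccamera[0].printX']
Each object is reported only once, so depending on the order in which attributes are visited you may get the equivalent path vk.enabledata.camera.asynccamera[0].printX instead.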

Python AST, ast.NodeTransformer ,TypeError: required field 'lineno' missing from stmt

I'd be very grateful to learn something useful here, as so far I've been moving blindly.
The problem lies in Python's ast.NodeTransformer. I was wondering whether one can add a function to an existing class this way without going mad.
This is how I have proceeded so far.
import ast, inspect, cla # cla is the name of the class to which we want to add a new function
klass = inspect.getsource(cla)
tree = ast.parse(klass)
st = '''def function(): return 1'''
Foo = ast.parse(st)
class AddFunc(ast.NodeTransformer):
    def visit_ClassDef(self, node):
        return node, node.body + Foo.body
        self.generic_visit(node)
inst = AddFunc()
stuff = inst.visit(tree)
# now the trouble begins, at compiling..
co = compile(stuff, filename='<ast>', mode='exec')
# I get TypeError: required "lineno" missing from stmt
I have tried (unsuccessfully, as you could probably guess) to handle this by using the ast library helper functions ast.fix_missing_locations() and ast.copy_location(), but in most cases I've ended up guessing or facing an AttributeError caused by the tuple returned inside the AddFunc class.
Does anybody have an idea how to manage this?
kolko's answer is correct, in this instance, you need to return an AST node. Foo in this case is a module, and its body should be the statements constructing the function, which can simply be spliced into the class definition.
It helps to understand that ast.NodeTransformer is very particular:
it calls the visit method by the specific class name, ignoring hierarchy
it evaluates the return value with isinstance(AST) and isinstance(list), anything else is ignored!
You may also get this error or the closely related "lineno missing from expr" even when you're using ast.fix_missing_locations to add location data.
If this happens, it's because your AST has an element that's not allowed in that place. Use astunparse.dump to carefully examine the structure. Within a module body, there can only be statements. Within a function def, there can only be statements. Then there are specific places that you can have an expression.
To solve this, write out the python you're expecting to generate, parse it, dump it and check it against what you're generating.
I found the answer on Twitter:
https://twitter.com/mitsuhiko/status/91169383254200320
mitsuhiko: "TypeError: required field "lineno" missing from stmt" — no, what you actually mean is "tuple is not a statement'. #helpfulpython
You are returning a tuple here, but you must return an AST node:
class AddFunc(ast.NodeTransformer):
    def visit_ClassDef(self, node):
        node.body += Foo.body
        return node
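For reference, here is a self-contained sketch of the whole round trip with that fix applied. The class Cla and its source string are made up so the example does not depend on inspect.getsource, and self is added to the spliced function so it can be called as a method.

```python
import ast

# Hypothetical class source; in the question this came from inspect.getsource(cla).
source = '''
class Cla:
    pass
'''

# Parsing yields a Module whose body holds the FunctionDef statement.
new_func = ast.parse("def function(self): return 1")


class AddFunc(ast.NodeTransformer):
    def visit_ClassDef(self, node):
        self.generic_visit(node)    # visit nested nodes before returning
        node.body += new_func.body  # splice the function into the class body
        return node                 # return an AST node, never a tuple


tree = AddFunc().visit(ast.parse(source))
ast.fix_missing_locations(tree)     # fills in any missing lineno/col_offset

namespace = {}
exec(compile(tree, filename="<ast>", mode="exec"), namespace)
result = namespace["Cla"]().function()
```

Because the spliced nodes come from ast.parse they already carry location data, so fix_missing_locations is only a safeguard here; it matters once you construct nodes by hand.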

Twisted inlineCallbacks and remote generators

I have used defer.inlineCallbacks in my code as I find it much easier to read and debug than using addCallbacks.
I am using PB (Perspective Broker) and have hit a problem when returning data to the client. The data is about 18 MB in size and I get a BananaError because of the length of the string being returned.
What I want to do is to write a generator so I can just keep calling the function and return some of the data each time the function is called.
How would I write that with inlineCallbacks already in use? Is it actually possible? What if I return a value instead? Would something like the following work?
@defer.inlineCallbacks
def getLatestVersions(self):
    returnlist = []
    try:
        latest_versions = yield self.cur.runQuery("""SELECT id, filename,path,attributes ,MAX(version) ,deleted ,snapshot , modified, size, hash,
chunk_table, added, isDir, isSymlink, enchash from files group by filename, path""")
    except:
        logger.exception("problem querying latest versions")
    for result in latest_versions:
        returnlist.append(result)
        if len(returnlist) >= 10:
            yield returnlist
            returnlist = []
    yield returnlist
A generator function decorated with inlineCallbacks returns a Deferred - not a generator. This is always the case. You can never return a generator from a function decorated with inlineCallbacks.
See the pager classes in twisted.spread.util for ideas about another approach you can take.
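Independently of the pager classes, one simple way to keep each response small is to have the remote method return a single fixed-size page per call and let the client ask for page 0, 1, 2 and so on until it gets an empty list. The helper below is plain Python with made-up names (get_page, page_size); it uses no Twisted API and only illustrates the slicing.

```python
def get_page(rows, page, page_size=10):
    """Return the page-th slice of rows; an empty list means there is no more data."""
    start = page * page_size
    return rows[start:start + page_size]


# Stand-in for the rows a database query would return.
rows = list(range(25))

pages = []
page = 0
while True:
    chunk = get_page(rows, page)
    if not chunk:
        break
    pages.append(chunk)
    page += 1
```

Inside a function decorated with inlineCallbacks you would still yield the query Deferred as before and then return just the requested slice as the function's result.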
