How to build a Full C Parser using pyparsing? [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am trying to build a full C Parser using pyparsing.
What I actually want for my project is to identify certain lines of code in a C program that are of interest to me, e.g. complex assignment statements with typecasting, pointer dereferences, etc.
I thought, since I am investing the effort, I will implement the Full C Grammar in pyparsing, and use just what I need.
I referred to this C Grammar for YACC and wrote it according to pyparsing (to the best of my limited understanding of pyparsing).
http://www.lysator.liu.se/c/ANSI-C-grammar-y.html#translation-unit
However, pyparsing gets stuck in an infinite loop. I have uploaded the Python code here:
https://gist.github.com/gkernel/18cd1d38376d07db989a
I need help in this. Please also tell me an alternative approach to solve my problem if you know any.
EDIT:
To be clear, there could be a bug in the code, but I have already invested effort in checking that I have written the correct grammar. I basically want to ask if pyparsing can be used for something as complicated as this.
One of the things I have done is Forward() declare all the non-terminals in the grammar, and I want to know if this is the right approach. I did this because Python would complain of some names being undefined.

As far as I know, pyparsing builds recursive-descent parsers. A recursive-descent parser will go into an infinite loop if it is given a left-recursive grammar, and it is most likely that the rather ancient C grammar you unearthed (like any more modern C grammar) is left-recursive, since such grammars are easier to write and are acceptable input to LALR(1) and GLR parser generators, like bison.
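To illustrate the left-recursion point, here is a minimal hand-written sketch in plain Python (not pyparsing, and not taken from the linked gist). A YACC-style rule such as expr : expr '-' term | term would make a recursive-descent function call itself before consuming any input. The usual workaround, shown below, is to rewrite the rule as term (('+' | '-') term)* and loop instead. The function names and the toy grammar are assumptions made for this example only.

```python
import re


def tokenize(text):
    # Split the input into integer literals and the operators + and -.
    return re.findall(r"\d+|[-+]", text)


def parse_term(tokens, pos):
    # term : INTEGER
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"expected an integer at token {pos}")
    return int(tokens[pos]), pos + 1


def parse_expr(tokens, pos=0):
    # Left-recursive original:  expr : expr ('+' | '-') term | term
    # A direct translation would call parse_expr() first and never return.
    # Iterative rewrite:        expr : term (('+' | '-') term)*
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        tree = (op, tree, right)  # fold to the left to keep left associativity
    return tree, pos


tree, consumed = parse_expr(tokenize("1 - 2 - 3"))
print(tree)
```

The loop builds the same left-associative tree that the left-recursive rule describes, which is why this rewrite is commonly applied when porting a YACC grammar to a recursive-descent tool.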
C is not an easy language to parse, and more so if you don't understand the basics of parsing theory. If your goal is to learn parsing theory, I'd suggest that you try a simpler language. If your only goal is to parse C, as indicated in your question, then I'd suggest you use one of the available tools; both gcc and clang come with (unfortunately underdocumented) mechanisms to access the parse tree for a C program, and there are commercial products as well if you have a budget.


python static code analysis tools - code analysis (preliminary research question) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Disclaimer:
I've just started researching this area, so I have no idea what exactly it's called; from a Google search, I believe it has to do with static code analysis, or is at least related to it.
My question is:
Given some Python code (a file, script, module, or package), is there a tool that can produce a report from it detailing:
how many classes, functions, built-in functions, decorators, if/for/while statements, etc. are used?
To give you an analogy most of us can relate to:
Given a text file: find all the verbs / nouns / adjectives / adverbs / proper noun.
NLP tools like spaCy or NLTK have the ability to do that for natural languages.
But what about programming languages? Is there a tool for that?
Can a tool like pylint do that?
UPDATE
As I expected, such tools exist; one of them, as @BoarGules suggested in his comment, is the ast module. It's the hint I needed to go further in my research; any further suggestions are welcome. BTW, ast stands for abstract syntax tree.
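As a rough sketch of what such a report could look like with the ast module, the snippet below walks the syntax tree of a source string and tallies a few node types. The function name count_constructs and the SAMPLE source are made up for this example; only ast.parse, ast.walk, and the node classes come from the standard library.

```python
import ast
from collections import Counter

# Hypothetical input, used only to exercise the counter below.
SAMPLE = '''
import functools

class Greeter:
    @functools.lru_cache(maxsize=None)
    def greet(self, n):
        for i in range(n):
            if i % 2:
                print(i)

def main():
    while False:
        pass
'''


def count_constructs(source):
    """Count a handful of syntactic constructs in Python source code."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            counts["class"] += 1
            counts["decorator"] += len(node.decorator_list)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            counts["function"] += 1
            counts["decorator"] += len(node.decorator_list)
        elif isinstance(node, (ast.If, ast.For, ast.While)):
            counts[type(node).__name__.lower()] += 1
    return counts


print(dict(count_constructs(SAMPLE)))
```

Note that an elif branch is represented as a nested ast.If node, so it would be counted as an additional "if" by this sketch.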
Given a python code - file - script - module - package. Is there a tool that can produce a report out of it detailing: how many classes are used...
There cannot be an exact tool for that, since Python has an eval primitive.
When that primitive is executed, the set of classes or functions of your Python program can increase.
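A small sketch of why this limits any purely static count: the source string below defines a class only when it runs, so walking its syntax tree finds no class definitions. The example uses exec (the statement-level sibling of eval) as a stand-in for the "eval primitive" mentioned above; the names SOURCE and Hidden are invented for the illustration.

```python
import ast

# The class only comes into existence when this code is executed.
SOURCE = 'exec("class Hidden: pass")\n'

# Static view: no ClassDef node appears in the syntax tree.
static_classes = [
    node.name
    for node in ast.walk(ast.parse(SOURCE))
    if isinstance(node, ast.ClassDef)
]

# Dynamic view: run the code in a scratch namespace and look for classes.
namespace = {}
exec(SOURCE, namespace)
runtime_classes = [
    name for name, value in namespace.items() if isinstance(value, type)
]

print(static_classes)
print(runtime_classes)
```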
Be aware of Rice's theorem.
Consider using abstract interpretation and type inference techniques in your Python static analyzer.
Consider also using (painfully) Frama-C on the source code (the code written in C) of the Python interpreter. With a lot of work, Frama-C could be extended to analyze Python source code.
(but someone needs to do that work, or to pay for it)
Read also recent proceedings of ACM SIGPLAN conferences.

Is it possible to implement heuristic virus scanning in Python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am trying to create a virus scanner in Python, and I know that signature-based detection is possible. But is heuristic-based detection possible in Python, i.e. running a program in a safe environment, scanning the program's code, or checking how the program behaves, and then deciding whether the program is a virus or not?
Python is described as a general-purpose programming language, so yes, this is definitely possible, but it is not necessarily the best implementation. In programming, just as in a trade, you should use the best tools for the job.
I would recommend prototyping your application with Python and Clamd, and then considering a move to another language if you want a closed-source solution that you can sell while protecting your intellectual property.
Newb quotes:
"Anything written in python is typically quite easy to reverse-engineer, so it won't do for real protection."
I disagree, quite strongly in fact, though I suppose it is up for debate. It really depends on how the developer packages the application.
Yes, it is possible.
...and...
No, it is probably not the easiest, fastest, best performing, or most efficient way to accomplish the task.
Well, sure it's possible. Python is Turing-complete, so you can use it to the same ends as other programming languages like C++. And you can certainly do a primitive signature-based or code-inspecting check in Python without great difficulty. So the answer to that question is yes.
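As a minimal sketch of the primitive signature-based check mentioned above, the snippet below compares a file's SHA-256 digest against a set of known-bad digests and searches for known byte patterns. Everything here is hypothetical: the digest set, the byte pattern, and the function name scan_bytes are invented for illustration and are not real malware signatures.

```python
import hashlib

# Made-up "signatures", for illustration only.
KNOWN_BAD_SHA256 = {hashlib.sha256(b"pretend-malware-payload").hexdigest()}
BAD_PATTERNS = [b"EVIL_MARKER"]


def scan_bytes(data):
    """Return a list of findings for the given file contents."""
    findings = []
    # Whole-file match against known digests.
    if hashlib.sha256(data).hexdigest() in KNOWN_BAD_SHA256:
        findings.append("known-hash")
    # Substring match against known byte patterns.
    for pattern in BAD_PATTERNS:
        if pattern in data:
            findings.append("pattern:" + pattern.decode("ascii"))
    return findings


print(scan_bytes(b"harmless text"))
print(scan_bytes(b"prefix EVIL_MARKER suffix"))
```

To scan a file on disk, the bytes could be read with open(path, "rb").read() and passed to scan_bytes. Heuristic detection would need considerably more than this, such as sandboxed execution or behavioural monitoring.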
Now for the deeper question: are you asking whether Python is a good tool for this job? I don't think so, primarily because Python Code is Hard to Obscure, which means that if you develop an anti-virus in Python, it becomes weak the moment you give it to other people. That's because a virus developer will find it easy to inspect your anti-virus engine, since you will not be able to obscure your python code. That means that they can find vulnerabilities in your virus scanner easily.
Indeed, one of the key components of a good anti-virus is making it as hard to reverse-engineer as possible, so that virus developers won't figure out what the weaknesses of your anti-virus engine are. Anything written in python is typically quite easy to reverse-engineer, so it won't do for real protection.

Can Python Theoretically be "Decompiled" to C [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Since Python itself is written in C, is it theoretically possible to "decompile" any Python program into C, for whatever reason? Not translate, (which would be taking the semantics of the program and writing another program in C that does the same thing) but truly decompile (use a program to find the appropriate C functions for each Python operation and implement them in a syntactically correct manner).
Any programming language can theoretically be translated to any other programming language. This theoretical possibility says nothing about how easy it is, or about whether any existing tools allow you to do it.
It's also ambiguous what counts as "decompiling". For example, I can use boost::python and embed a python program as a string in a C++ program. Now I have a C++ program completely equivalent to that python code. That hardly counts as a proper translation, though.
There are some things no translator will be able to do (well):
    if ask_user():
        a = 1
    else:
        a = "hi"
    print(a)
Because the type of a cannot be determined at compile time, any equivalent C program will need some elaborate data structures carrying run-time type information.
Yes. Of course you could translate Python to C. Parts of what the interpreter does would end up in your C program. If you restrict your Python to RPython it gets a lot easier, as some things in full Python don't translate well. Mostly I don't see much point, though.
Check out https://code.google.com/p/py2c/ to convert python to c.

Is there a standard lexer/parser tool for Python? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
A volunteer job requires us to convert a large number of LaTeX documents into the ePub file format. It's a series of open-source fiction books which has so far been produced only on paper via a print-on-demand service. We'd like to be able to offer the books to users of book-reader devices (such as Kindle) which require the ePub format for best results.
Fortunately, ePub is a very simple format, however there's no trivial way for LaTeX to produce the XHTML output required.
We experimented with alternative LaTeX compilers (e.g. plastex) but in the end we figured that it would probably be a lot easier to simply write our own compiler which understands a tiny subset of the LaTeX language and compiles directly to XHTML / ePub.
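To give a sense of how small such a subset compiler can start out, here is a sketch using only the standard library. It handles two commands, \emph and \textbf, including nesting, and escapes everything else as text. The command table, the function name to_xhtml, and the choice of target tags are assumptions made for this example; a real converter would need paragraphs, sectioning, plain brace groups, and much more.

```python
import html
import re

# Hypothetical subset: LaTeX command name -> XHTML element name.
TAGS = {"emph": "em", "textbf": "strong"}

# Matches either the start of a command group (\name{) or a closing brace.
TOKEN = re.compile(r"\\([a-zA-Z]+)\{|\}")


def to_xhtml(text):
    """Convert a tiny LaTeX subset to an XHTML fragment."""
    out = []
    open_tags = []  # stack of elements waiting for their closing brace
    pos = 0
    for match in TOKEN.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        pos = match.end()
        name = match.group(1)
        if name is not None:
            if name not in TAGS:
                raise ValueError(f"unsupported command: \\{name}")
            open_tags.append(TAGS[name])
            out.append(f"<{TAGS[name]}>")
        else:
            if not open_tags:
                raise ValueError("unbalanced closing brace")
            out.append(f"</{open_tags.pop()}>")
    out.append(html.escape(text[pos:]))
    if open_tags:
        raise ValueError("unclosed command group")
    return "".join(out)


print(to_xhtml(r"A \emph{very \textbf{bold}} claim & more"))
```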
Previously I used a tool on Windows called GOLD. This allowed me to go directly from BNF grammars to a stub parser. It also allowed me to implement the parser in any language I liked. (I'd choose Python.)
This product has to work on Linux, so I'm wondering if there's an equivalent toolchain that works as well under Ubuntu / Eclipse / Python. The idea is that we will take the grammar of TeX and just implement a teeny subset of that, but we do not want to spend a huge amount of time worrying about grammar and parsing. A parser generator would obviously save us a great deal of time.
Sal
UPDATE 1: Bonus marks for a solution with excellent documentation or tutorials.
UPDATE 2: Extra bonus if there is grammar file for TeX already available, since all I'd have to do is implement the functions we care about.
Try pyparsing.
See http://pyparsing.wikispaces.com/WhosUsingPyparsing and search for TeX. That page mentions a project where pyparsing is used to parse a subset of TeX syntax.
For documentation, I recommend the "Getting started with pyparsing" e-book, by pyparsing's author.
EDIT: According to PaulMcG, Pyparsing is no longer hosted on wikispaces.com. Go to the new GitHub site
Try PLY.
I once used tex4ht to convert LaTeX to XHTML+MathML. It worked quite nicely. From there, you could use the output HTML as the base for the ePub.
Of course, this breaks the Python toolchain, so it might not become your favorite method...

What features are considered advanced Python? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I do basic Python programming and now I want to get deeper into the language's features. I consider the following to be advanced Python capabilities and am learning them now:
Decorator
Iterator
Generator
Meta Class
Anything else to be added/considered to the above list?
First, this thread should be community wiki.
Second, iterators and generators are pretty basic Python IMHO. I agree with you on decorators and metaclasses. But I'm not a very good programmer, so I probably find this more difficult to wrap my brain around than others.
Third, I would add threading/multiprocessing to the list. That's really tricky :)
There are some useful core concepts that can be added to your list, and that I would not necessarily teach in an introductory Python class (from the most common to the more specific):
the various protocols (sequence, iterator, context,…)
properties
packages
Some points related to important standard modules:
Making your classes compatible with the standard copy and pickle modules.
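As one possible illustration of the copy and pickle point, the sketch below shows a class that holds an attribute which cannot be pickled (a threading.Lock) and uses __getstate__ and __setstate__ to leave it out and recreate it. The class name SafeCounter is invented for this example; the two hook methods are the standard protocol used by both the pickle and copy modules.

```python
import copy
import pickle
import threading


class SafeCounter:
    """A counter guarded by a lock, which itself cannot be pickled or copied."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1

    def __getstate__(self):
        # Drop the lock from the saved state.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and create a fresh lock.
        self.__dict__.update(state)
        self._lock = threading.Lock()


def pickle_roundtrip(obj):
    # Serialize and immediately deserialize the object.
    return pickle.loads(pickle.dumps(obj))


original = SafeCounter(5)
clone = copy.deepcopy(original)
clone.increment()
print(original.value, clone.value)
```

Without the two hook methods, both copy.deepcopy and pickle.dumps would fail on the lock attribute.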
The first 3 are intermediate Python, not advanced. For advanced add the stuff in the Importing Modules and Python Language Services sections of the library reference.
I think you'll find that there isn't a good answer to your question. What's great about Python is that all of its features are fairly easy to understand. But there's enough stuff in the language and the library that you never get around to learning it all. So it really boils down to which you've had occasion to use, and which you've only heard about.
If you haven't used decorators or generators, they sound advanced. But once you actually have to use them in a real-world situation, you'll realize that they're really quite simple, and wonder how you managed to live without them before.
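For readers who have not met them yet, here is a short sketch of both features. The decorator count_calls and the generator countdown are names made up for this example.

```python
import functools


def count_calls(func):
    """Decorator that records how many times func has been called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def square(x):
    return x * x


def countdown(n):
    """Generator that yields n, n-1, ..., 1 lazily."""
    while n > 0:
        yield n
        n -= 1


print(square(4), square.calls)
print(list(countdown(3)))
```

The decorator is just a function that takes a function and returns a replacement, and the generator is a function whose body is suspended at each yield until the next value is requested.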
