Parsing a .proto file without creating the descriptor

Parsing a .proto file without creating the descriptor - python

I understand that the normal way of using protobuf is to create the .proto and then compile it into the relevant class - Java, Python, etc. I have a requirement which might need to parse the .proto file in Python code. Has anyone tried creating own parser for the .proto file? Will it be recommended to always compile the class instead of directly parsing the .proto?

It probably won't help you directly, but yes, I've written my own parser (live demo, parser source). This code is C# hence why it probably won't help, but it clearly is possible. I started that branch 9 days ago, and now it is basically feature-complete including parser, generator, and an interactive web-site with syntax-error highlighting - so it isn't necessarily a huge amount of work.
However! You may find it easier just to shell execute "protoc" (available on maven). If you use the -oFILE / --descriptor_set_out=FILE switch (same thing, alternative syntax), then it parses the input .proto file and writes a file that is a serialized FileDescriptorSet from descriptor.proto. This means you can use your regular tools to generate code in your chosen language for descriptor.proto, then deserialize the file as a FileDescriptorSet instance. Once you've done that: you can just walk the object model to see the files, messages, enums, fields, etc. IIRC some protobuf implementations support working entirely from a descriptor (which is what protoc emits), without the codegen step.

Related

Static analyse and extract the difference between two versions of same Python Library

I'm doing a research project on detecting breaking changes from Python library upgrades. One of the steps is to extract the difference between two major versions of the same Python library by using static analysis(Coule be AST-based or not), in order to triage the pattern of change. The detection should not only find the difference from .py files, but also the difference from other project files including config files, resources, etc. Ideally, a scenario like if a .py file moved to another module should also be included. So I have two questions here:
Is there a tool that can do a similar job and also support flexible configuration for analysis?
If not, what will be the best strategy to search for that kind of difference and identify the category of this difference(e.g. variable, function, etc.)
Sorry, this might be a silly question, I'm not coming from a Python background, really running out of thoughts here. Any thoughts, ideas, and inputs are welcome. Thanks in advance.

Just spit balling some ideas here:
I don't think I'd be so concerned about detecting changes in the source files up front. There are a lot of ways to move code around among files without changing the interface to the module. For example you can put all of the code in __init__.py or, you can split it up into any number of files and subdirectories. However, the programmatic interface will stay the same.
Instead, you could use the dir() built-in to detect changes in the public classes and methods in the module. This will work well for libraries that used named arguments, but won't work well for functions which just use def func(*args, **kwargs) (this is why that should be avoided, all you former perl programmers!)
If the module uses the new type hinting, you can really get some mileage out of detecting change in types. If you use some tool that actually parses the python and infers types, that would work as well. I would guess VSCode probably contains such a library that it uses to give context-sensitive help.

Add own realtime custom parser to Python to generate and compile AST

My task is to add switch statement and remove mandatory colons from functions, classes, loops in Python.
Maybe to add some other nice features from Coffeescript.
The .py files with custom syntax must be imported with python interpreter, than parsed with a custom parser (just like Coffeescript compiler does).
(I already had a little experience in writing Python-like "for" syntax to already created custom parser, corrected several bugs. But it takes a long time to read all code and get it. So I decided to ask advice first.)
I searched a long time through internet, found several helpful answers, but still don't know how to implement it better.
Some from what I found:
Parse a .py file, read the AST, modify it, then write back the modified source code
Python's tokenize module
Python's ast module
Python's c-like preprocessor with import hook
What I think to do:
Rewrite Coffeescript parser or Python parser into pure Python
Make import hook to parse files to AST by my own parser.
Continue import (compile AST and import it to module)
(like Coffeescript does it)
So I have such questions:
- Is there a Python parser written in Python (not to rewrite all Coffeescript parser) ?
- Maybe is there any way to make ast.AST class frow own parser not rewriting ast library from C into Python ?
- How can I do it better and easier ? (except modifying Python's sources, all must be done in runtime and be totally compatible with all other Python interpreters)
- Maybe there are already some libraries, that help modifying Python's syntax ?
Thank you very much.
Best regards, Serj.

File extension naming: .p vs .pkl vs .pickle

When reading and writing pickle files, I've noticed that some snippets use .p others .pkl and some the full .pickle. Is there one most pythonic way of doing this?
My current view is that there is no one right answer, and that any of these will suffice. In fact, writing a filename of awesome.pkl or awesome.sauce won't make a difference when running pickle.load(open(filename, "rb")). This is to say, the file extension is just a convention which doesn't actually affect the underlying data. Is that right?
Bonus: What if I saved a PNG image as myimage.jpg? What havoc would that create?

The extension makes no difference because "The Pickle Protocol" runs every time.
That is to say whenever pickle.dumps or pickle.loads is run the objects are serialized/un-serialized according to the pickle protocol.
(The pickle protocol is a serialization format)
The pickle protocol is python specific(and there are several versions). It's only really designed for a user to re-use data themselves -> if you sent the pickled file to someone else who happened to have a different version of pickle/Python then the file might not load correctly and you probably can't do anything useful with that pickled file in another language like Java.
So, use what extensions you like because the pickler will ignore them.
JSON is another more popular way of serializing data, it can also be used by other languages unlike pickle - however it does not cater directly to python and so certain variable types are not understood by it :/
source if you want to read more
EDIT: while you could use any name, what should you use?
1 As mentioned by #Mike Williamson .pickle is used in the pickle docs
2 The python standard library json module loads files named with a .json extension. So it would follow the the pickle module would load a .pickle extension.
3 Using .pickle would also minimise any chance of accidental use by other programs.
.p extensions are used by some other programs, most notably MATLAB as the suffix for binary run-time files[sources: one, two]. Some risk of conflict
.pkl is used by some obscure windows "Migration Wizard Packing List file"[ source]. Incredibly low risk of conflict.
.pickle is only used for python pickling[source]. No risk of conflict.

What does "i" represent in Python .pyi extension?

In Python, what does "i" represent in .pyi extension?
In PEP-484, it mentions .pyi is "a stub file" but no mnemonic help on the extension. So does the "i" mean "Include"? "Implementation"? "Interface"?

I think the i in .pyi stands for "Interface"
Definition for Interface in Java:
An interface in the Java programming language is an abstract type that
is used to specify a behaviour that classes must implement
From Python typeshed github repository:
Each Python module is represented by a .pyi "stub". This is a normal
Python file (i.e., it can be interpreted by Python 3), except all the
methods are empty.
In 'Mypy' repository, they explicitly mention "stub" files as public interfaces:
A stubs file only contains a description of the public interface of
the module without any implementations.
Because "Interfaces" do not exist in Python (see this SO question between Abstract class and Interface) I think the designers intended to dedicate a special extension for it.
pyi implements "stub" file (definition from Martin Fowler)
Stubs: provide canned answers to calls made during the test, usually
not responding at all to anything outside what's programmed in for the
test.
But people are more familiar with Interfaces than "stub" files, therefore it was easier to choose .pyi rather than .pys to avoid unnecessary confusion.

Apparently PyCharm creates .pyi file for its own purposes:
The *.pyi files are used by PyCharm and other development tools to provide
more information, such as PEP 484 type hints, than it is able to glean from
introspection of extension types and methods. They are not intended to be
imported, executed or used for any other purpose other than providing info
to the tools. If you don't use use a tool that makes use of .pyi files then
you can safely ignore this file.
See: https://www.python.org/dev/peps/pep-0484/
https://www.jetbrains.com/help/pycharm/2016.1/type-hinting-in-pycharm.html
This comment was found in: python27/Lib/site-packages/wx/core.pyi

The i in .pyi stands for ‘interface’.
The .pyi extension was first mentioned in this GitHub issue thread where JukkaL says:
I'd probably prefer an extension with just a single dot. It also needs to be something that is not in use (it should not be used by cython, etc.). .pys seems to be used in Windows (or was). Maybe .pyi, where i stands for an interface definition?

Another way to explain the contents of a module that Wing can't figure out is with a pyi Python Interface file. This file is merely a Python skeleton with the proper structure, call signature, and return values to correspond to the functions, attributes, classes, and methods specified in a module.

Save Workspace - save all variables to a file. Python doesn't have it)

I cannot understand it. Very simple, and obvious functionality:
You have a code in any programming language, You run it. In this code You generate variables, than You save them (the values, names, namely everything) to a file, with one command. When it's saved You may open such a file in Your code also with simple command.
It works perfect in matlab (save Workspace , load Workspace ) - in python there's some weird "pickle" protocol, which produces errors all the time, while all I want to do is save variable, and load it again in another session (?????)
f.e. You cannot save class with variables (in Matlab there's no problem)
You cannot load arrays in cPickle (but YOu can save them (?????) )
Why don't make it easier?
Is there a way to save the current variables with values, and then load them?

What you are describing is Matlab environment feature not a programming language.
What you need is a way to store serialized state of some object which could be easily done in almost any programming language. In python world pickle is the easiest way to achieve it and if you could provide more details about the errors it produces for you people would probably be able to give you more details on that.
In general for object oriented languages (including python) it is always a good approach to incapsulate a your state into single object that could be serialized and de-serialized and then store/load an instance of such class. Pickling and unpickling of such objects works perfectly for many developers so this must be something specific to your implementation.

Since you're talking about Matlab, you probably want to try out IPython, which is a shell for Python offering much more functionality than the standard interpreter shell you get when executing Python.
Among this functionality is the ability to load/save workspace sessions, create macros out of session input etc., which is probably more like what you are used to in Matlab (I actually use both and find IPython to be much more elegant, but YMMV):
http://ipython.scipy.org

PiCloud has implemented a fancier pickle, but I can't find the code. I saw a poster session.
Generally in Python instantiated objects don't have any one way to recreate them, and in some cases its particularly difficult (like an open file) as it takes several steps to recreate.

I take issue with the statement that the saving of variables in Matlab is an environment function. the "save" statement in matlab is a function and part of the matlab language not just a command. It is a very useful function as you don't have to worry about the trivial minutia of file i/o and it handles all sorts of variables from scalar, matrix, objects, structures.

It's an old thread, but thought I should throw it out there anyway - Spyder the Scientific Python development environment allows you to do just this through the Variable explorer. There's a button there Save data that packs your whole workspace up in a .spydata file that you can later reload. Works like a charm when you're switching between projects!

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.