I am trying to transfer large amounts of structured data from Java to Python. That includes many objects that are related to each other in some form or another. When I receive them in my Python code, it's quite ugly to work with the types that are provided by protobuf. Vim crashed when I tried to use autocomplete on the types, PyCharm doesn't complete anything, and generally it just seems absurd that they don't provide some clean class definition for the different types.
Is there a way to get IDE support while working with protobuf messages in python? I'm looking at 20+ methods handling complex messages and without IDE support I might as well code with notepad.
I understand that protobuf is using metaclasses (although I don't know why they do that). Maybe there is a way to generate Python class files from that data, or maybe there is something similar to TypeScript typing files.
Did I maybe misuse protobuf? I believed I would describe my domain model in a way that may be used across languages. In Java I am happy with the generated classes and I can use them easily. Should I maybe have used something like swagger.io instead?
If you are using a recent Python (3.7+) then https://github.com/danielgtaylor/python-betterproto (disclaimer: I'm the author) will generate very clean Python dataclasses as output which will give you proper typing and IDE completion support.
For example, this input:
syntax = "proto3";
package hello;
// Greeting represents a message you can tell a user.
message Greeting {
  string message = 1;
}
Would generate the following output:
# Generated by the protocol buffer compiler. DO NOT EDIT!
# sources: hello.proto
# plugin: python-betterproto
from dataclasses import dataclass
import betterproto
@dataclass
class Greeting(betterproto.Message):
    """Greeting represents a message you can tell a user."""
    message: str = betterproto.string_field(1)
In general the output of this plugin mimics the *.proto input and is very easy to read if you happen to jump to definition on a message or field. It's been a huge improvement for me personally over the official Google compiler plugin, and supports async gRPC out of the box as well.
As of now, nothing like that is available. You might want to follow this issue to stay up to date: https://github.com/google/protobuf/issues/2638
mypy-protobuf generates the type hint files. But as discussed here this works only from protobuf 3.0 and python 2.7 onwards.
Question in short:
Do you maybe know, or have any idea how I could implement the below mentioned, existing libraries in Python, whether that be through bindings or any other possible solution?
Description:
I'm working on a project and I have a very large quantity of custom built, fast-changing C++ Qt Libraries (version 5.15.2) that I need to use in Python.
I have done a lot of research on the topic over the past few weeks. However I can't seem to find a suitable solution on how to perform the bindings in an appropriate way.
I have mainly researched Shiboken, as that is the python bindings solution that the Qt Framework officially supports and encourages.
However Shiboken requires me to handwrite a type-system and header file for each library with each method or required part of that library that I want to use in my bindings. This is an issue for me as I require every method that is present in the C++ version to be present in the Python version, and hand writing that file is practically impossible.
They will need to be called from Python scripts as any other library would, or in a similar manner.
Shiboken could be a good choice. Yes, you need to specify each class/struct/enum/namespace in a type-system and header file. But not each method. If you specify a class in a type-system file, all the class methods will be extracted to Python. For example, this definition is enough:
<?xml version="1.0"?>
<typesystem package="mylibrary">
<object-type name="MyClass1"/>
<object-type name="MyClass2"/>
</typesystem>
Also, you can look at the PySide sources. A large number of Qt libraries are exposed to Python there, so they make a good example of type-system files in practice.
I have some Python source code, and want to find out the type of a variable. For example given the string
"""
greeting = "Hello"
"""
I want to have get_type('greeting') == str. Or a more complex example:
"""
def test(input: str):
    output = len(input)
    return str
"""
In pseudocode, I want to be able to do something like:
>>> m = parse_module()
>>> m.functions['test'].locals['output'].get_type()
int
It seems this should be possible with type annotations and MyPy in Python 3, but I can't figure out how. IDEs like VS Code have become very good at guessing the types in Python code, which is why I'm guessing there must be an exposed way to do this.
There seems to be a module, typed-ast, which is also used by MyPy, that gets me part of the way there. However, it does no type inference or propagation; as far as I understand, it just gives me the explicit annotations. MyPy has an API, but it only lets you run the checker, and it returns the same error messages as the command-line tool. I am looking for a way to "reach into" MyPy and get some of the inferred information out, or some alternative solution I haven't thought of.
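To illustrate the limitation described above, here is a minimal sketch that collects only the explicit annotations from a source string. It uses the standard-library ast module (Python 3.9+ for ast.unparse) rather than typed-ast, and the helper name explicit_annotations and the sample source are made up for this example.

```python
import ast

# Sample source: `greeting` and `input` are annotated, `output` is not.
SOURCE = '''
greeting: str = "Hello"

def test(input: str):
    output = len(input)
    return output
'''


def explicit_annotations(source):
    """Map each explicitly annotated name to its annotation as written."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        # Annotated assignments such as `greeting: str = "Hello"`.
        if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            found[node.target.id] = ast.unparse(node.annotation)
        # Annotated function parameters such as `input: str`.
        elif isinstance(node, ast.arg) and node.annotation is not None:
            found[node.arg] = ast.unparse(node.annotation)
    return found


annotations = explicit_annotations(SOURCE)
print(annotations)
```

The unannotated local `output` does not show up in the result, because nothing here infers that len() returns an int. That inference step is the part a type checker would have to supply.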
Mypy currently has an extremely primitive, bare-bones API, which you can find "documented" within the source code here: https://github.com/python/mypy/blob/master/mypy/api.py. To use it, you essentially need to write your string to a temporary file which you later clean up.
You can perhaps combine this with the reveal_type(...) special directive (and perhaps even the hidden --shadow-file option) to typecheck your string.
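A rough sketch of the temporary-file plus reveal_type(...) approach might look like the following. It assumes mypy is installed and relies only on mypy.api.run, which returns a (stdout, stderr, exit_status) tuple; the helper name reveal_types is hypothetical, and it returns None when mypy cannot be imported.

```python
import os
import tempfile

# Source to check; reveal_type(...) is understood by mypy, not by Python itself.
SOURCE = '''
def test(input: str):
    output = len(input)
    reveal_type(output)
    return output
'''


def reveal_types(source):
    """Run mypy on `source` and return its report, or None without mypy."""
    try:
        from mypy import api  # assumption: mypy is installed
    except ImportError:
        return None
    # mypy's API works on files, so write the string to a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(source)
        path = handle.name
    try:
        stdout, _stderr, _exit_status = api.run([path])
    finally:
        os.remove(path)  # clean up the temporary file afterwards
    return stdout


report = reveal_types(SOURCE)
print(report)
```

The inferred type arrives as a human-readable note in the report, so any caller still has to parse text. That is the main weakness of this route compared with a real programmatic API.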
The other alternative is to reverse engineer and re-implement pieces of mypy's main.py, essentially hijacking their internal API. I don't really think this will be hard, just somewhat ugly and fragile.
(Note that mypy can theoretically support typechecking arbitrary strings, and the core devs aren't opposed to extending the API for mypy in principle -- it's just that mypy is still under active development which means implementing an API has been very low priority for a while now. And since mypy is still actively being worked on/extended, the devs are somewhat reluctant to commit to implementing a more complex API that they'll subsequently have to support. You can find more context and details regarding the current state of the API in mypy's issue tracker.)
I understand that the normal way of using protobuf is to create the .proto and then compile it into the relevant class - Java, Python, etc. I have a requirement which might need to parse the .proto file in Python code. Has anyone tried creating own parser for the .proto file? Will it be recommended to always compile the class instead of directly parsing the .proto?
It probably won't help you directly, but yes, I've written my own parser (live demo, parser source). The code is C#, which is why it probably won't help, but it shows that this is clearly possible. I started that branch 9 days ago, and it is now basically feature-complete, including a parser, a generator, and an interactive website with syntax-error highlighting, so it isn't necessarily a huge amount of work.
However! You may find it easier just to shell execute "protoc" (available on maven). If you use the -oFILE / --descriptor_set_out=FILE switch (same thing, alternative syntax), then it parses the input .proto file and writes a file that is a serialized FileDescriptorSet from descriptor.proto. This means you can use your regular tools to generate code in your chosen language for descriptor.proto, then deserialize the file as a FileDescriptorSet instance. Once you've done that: you can just walk the object model to see the files, messages, enums, fields, etc. IIRC some protobuf implementations support working entirely from a descriptor (which is what protoc emits), without the codegen step.
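On the Python side, walking such a descriptor file could be sketched as below. This assumes the protobuf package is installed (it ships a generated descriptor_pb2 module) and that the bytes come from a file written by protoc with --descriptor_set_out; the function name summarize_descriptor_set is made up for this example, and it only looks at top-level messages.

```python
def summarize_descriptor_set(serialized):
    """Map each top-level message name to its field names."""
    # Imported here so the sketch can be loaded without protobuf installed.
    from google.protobuf import descriptor_pb2

    descriptor_set = descriptor_pb2.FileDescriptorSet()
    descriptor_set.ParseFromString(serialized)

    summary = {}
    for proto_file in descriptor_set.file:          # one entry per .proto file
        for message in proto_file.message_type:     # top-level messages only
            summary[message.name] = [field.name for field in message.field]
    return summary
```

To use it, read the file that protoc wrote in binary mode and pass the bytes in. Nested messages and enums live under nested_type and enum_type on each message and would need a recursive walk.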
Long story short, a piece of code that I'm working with at work has the line:
from System import System
with a later bit of code of:
desc_ = System()
xmlParser = Parser(desc_.getDocument())
# xmlParser.setEntityBase(self.dtdBase)
for featureXMLfile in featureXmlList.split(","):
    print featureXMLfile
    xmlParser.parse(featureXMLfile)
feat = desc_.get(featureName)
return feat
Parser is an XML parser in Java (it's included in a different import), but I don't get what the desc_ bit is doing. I mean obviously, it somehow holds the feature that we're trying to pull out, but I don't entirely see where. Is System a standard library in Python or Java, or am I looking at something custom?
Unfortunately, everyone else in my group is out for Christmas Eve vacation, so I can't ask them directly. Thank you for your help. I'm still not horribly familiar with Python.
This isn't from the standard library, so you'll need to check your system (Python has plenty of introspection to help you with that).
You can tell because modules in Python's standard library use lowercase names, as per PEP 8, or you can check by searching the library reference.
Note as well that Python has its own XML parsing tools, which will be much nicer to work with in Python than Java's.
Edit: As you have noted in the comments you are using Jython, it seems likely this is Java's System package.
millimoose indicated the correct answer in his comment, but neglected to submit it as an answer, so I'm posting to indicate the correct answer. It was indeed a custom module built by my company. I was able to determine this by typing import System; print(System) into the interpreter.
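The same check can be sketched as a small helper. It uses json and sys as stand-ins, since the custom System module isn't available here, and the name describe_module is hypothetical. The output shown in this thread came from Jython, so results on CPython may look different.

```python
import importlib


def describe_module(name):
    """Import a module by name and report where it was loaded from."""
    module = importlib.import_module(name)
    return {
        "name": module.__name__,
        # Modules loaded from disk expose __file__; built-in ones may not.
        "file": getattr(module, "__file__", None),
        "repr": repr(module),
    }


info = describe_module("json")
print(info["repr"])
print(info["file"])
```

A path inside your own project or site-packages points to custom or third-party code, while a path inside the interpreter's library directory points to the standard library.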
Is there any officially supported way to get the parent message for a given ProtoBuf message in Python? The way the Python protobuf interface is designed, we are guaranteed that each message will have at most one parent. It would be nice to be able to navigate from a message to its parent without building an external index.
Clearly, this information is present, and I can use the following code to get a weak pointer to the parent of any given message:
>>> my_parent = my_message._listener._parent_message_weakref
However, this uses internal attributes -- I would much rather use officially supported methods if possible.
If there is no officially supported way to do this, then I'll need to decide whether to build an external child→parent index (which could hurt performance), or to use this "hackish" method (appropriately wrapped).
After looking into this further (reading the source code), it's clear that there's no officially supported way to do this in Python.