I have a proto file defining some GPB (proto buffer) messages.
I want to implement a simple Python script that goes over the different messages and writes to an external file (let's say a JSON file) the basic information about each message's fields (name, type, default value, etc.).
I searched the web and found that once I have the GPB descriptor the rest should be relatively easy.
However, I have no idea how to get the descriptor itself.
Can someone help me here?
Thanks!
protoc has an option --descriptor_set_out which writes the descriptors as a FileDescriptorSet as described in descriptor.proto from the Protobuf source code. See protoc --help for more info.
Alternatively, you might consider actually writing your script as a code generator plugin. In this case, you wouldn't be generating code, but just a JSON file (or whatever), but the mechanism is the same.
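To illustrate the first option, here is a minimal sketch, assuming the descriptor set was written with protoc --descriptor_set_out=my_protos.pb and that the protobuf Python package is installed (the file names are placeholders, and only top-level messages are walked):

    import json
    from google.protobuf import descriptor_pb2

    # Load the FileDescriptorSet produced by protoc --descriptor_set_out.
    with open("my_protos.pb", "rb") as f:
        fd_set = descriptor_pb2.FileDescriptorSet()
        fd_set.ParseFromString(f.read())

    # Collect basic information about every field of every top-level message.
    summary = []
    for file_proto in fd_set.file:
        for msg in file_proto.message_type:
            for field in msg.field:
                summary.append({
                    "message": msg.name,
                    "name": field.name,
                    "type": descriptor_pb2.FieldDescriptorProto.Type.Name(field.type),
                    "default": field.default_value,
                })

    with open("fields.json", "w") as out:
        json.dump(summary, out, indent=2)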
I am having a hard time with this. Is there a way to get a compiled protocol buffer file's (pb2.py) contents into Excel?
Your question lacks detail and does not demonstrate your attempt to solve the problem, so it is likely to be closed.
Presumably (!?) your intent is to start with serialized binary/wire-format protocol buffer messages, unmarshal these into Python objects and then, using a Python package that can interact with Excel, enter these objects as rows into Excel.
The Python (pb2.py) file generated by the protocol buffer compiler (protoc) from a .proto file contains everything you need to marshal and unmarshal messages in the binary/wire format to the Python objects that represent the messages etc. that are defined by the .proto file. The protocol buffer documentation is comprehensive and explains this well (link).
Once you've unmarshaled the data into one or more Python objects, you will need to use the Python package for Excel of your choosing to output these objects into the spreadsheet(s).
It is unclear whether you have flat or hierarchical data. If you have anything non-trivial, you'll also need to decide how to represent the structure in the spreadsheet's table-oriented structure.
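As a rough illustration of those two steps, here is a hedged sketch; the message type Person, the module person_pb2, the input file person.bin, and the choice of openpyxl are all placeholder assumptions, not details from the question:

    # Sketch only: unmarshal one serialized message and write its fields as rows.
    from openpyxl import Workbook
    import person_pb2  # generated by: protoc --python_out=. person.proto

    # Unmarshal a message from the binary/wire format.
    with open("person.bin", "rb") as f:
        person = person_pb2.Person()
        person.ParseFromString(f.read())

    # Write one row per populated field into a spreadsheet.
    wb = Workbook()
    ws = wb.active
    ws.append(["field", "value"])
    for field, value in person.ListFields():
        ws.append([field.name, str(value)])
    wb.save("person.xlsx")

For hierarchical messages you would need to decide how to flatten nested fields before appending rows.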
The full explanation of what I want to do and why would take a while to explain. Basically, I want to use a private SSL connection in a publicly distributed application, and not hand out my private SSL keys, because that negates the purpose! I.e. I want secure remote database operations which no one can see into - including the client.
My core question is: how can I make the Python ssl module use data held in memory containing the SSL PEM file contents, instead of hard file system paths to them?
The constructor for class SSLSocket calls load_verify_locations(ca_certs) and load_cert_chain(certfile, keyfile) which I can't trace into because they are .pyd files. In those black boxes, I presume those files are read into memory. How might I short circuit the process and pass the data directly? (perhaps swapping out the .pyd?...)
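For reference, this is roughly the path-based usage in question, sketched with the SSLContext API (the host and file names are placeholders); it is these calls that end up reading the files inside the C extension:

    import socket
    import ssl

    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Both of these calls take file system paths and read the PEM files from disk.
    context.load_verify_locations(cafile="ca.pem")
    context.load_cert_chain(certfile="client.pem", keyfile="client.key")

    with socket.create_connection(("db.example.com", 5432)) as sock:
        with context.wrap_socket(sock, server_hostname="db.example.com") as tls:
            print(tls.version())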
Other thoughts I had were: I could use io.StringIO to create a virtual file, and then pass the file object around. I've used that concept with classes that will take a file object rather than a path. Unfortunately, these classes aren't designed that way.
Or, maybe use a virtual file system / ram drive? That could be trouble though because I need this to be cross platform. Plus, that would probably negate what I'm trying to do if someone could access those paths from any external program...
I suppose I could keep them as real files, but "hide" them somewhere in the file system.
I can't be the first person to have this issue.
UPDATE
I found the source for the "black boxes"...
https://github.com/python/cpython/blob/master/Modules/_ssl.c
They work as expected. They just read the file contents from the paths, but you have to dig down into the C layer to get to this.
I can write in C, but I've never tried to recompile the underlying Python source. It looks like maybe I should follow the directions here https://devguide.python.org/ to pull the Python repo and make changes to it. I guess I can then submit my update to the Python community to see if they want to make a new standardized feature like I'm describing... Lots of work ahead, it seems...
It took some effort, but I did, in fact, solve this in the manner I suggested. I revised the underlying code in the _ssl.c Python module / extension and rebuilt Python as a whole. After figuring out the process for building Python from source, I had to learn the details for how to pass variables between Python and C, and I needed to dig into guts of OpenSSL (over which the Python module is a wrapper).
Fortunately, OpenSSL already has functions for this exact purpose, so it was just a matter of swapping out how Python passes file paths into the C layer, bypassing the file-reading step and jumping straight to using the CA/cert/key data directly instead.
For the moment, I only did this for Windows. Since I'm ultimately creating a cross platform program, I'll have to repeat the build process for the other platforms I'll support - so that's a hassle. Consider how badly you want this, if you are going to pursue it yourself...
Note that when I rebuilt Python, I didn't use that as my actual Python installation. I just kept it off to the side.
One thing that was really nice about this process was that after that rebuild, all I needed to do was drop the single new _ssl.pyd into my working directory. With that file in place, I could pass my direct cert data. If I removed it, I could pass the normal file paths instead. It will use either the normal Python source, or implicitly use the override if the .pyd file is simply put in the program's directory.
Has someone implemented CSV handling for Flyway? It was requested some time ago (Flyway specific migration with csv files). Flyway now mentions it as a possibility via MigrationResolver and MigrationExecutor, but it does not seem to be implemented.
I've tried to do it myself with Flyway 4.2, but I'm not very good with Java. I got as far as creating my own jar using the sample and making it accessible to Flyway. But how does Flyway distinguish when to use the SqlMigrator and when to use my CsvMigrator? I thought I had to register my own prefix/suffix (as the question above writes), but FlywayConfiguration seems to be read-only; at least I did not see any API calls for doing this :(.
How to connect the different Resolvers to the different migration file types? (.sql to the migration using Sql and .csv/.py to the loading of Csv and executing python scripts)
After some shedding of tears and blood, it looks like I came up with something on this. I can't make the whole code available because it uses a proprietary file format, but here are the main ideas:
implement ConfigurationAware as well, and use the setFlywayConfiguration implementation to catalog the extra files you want to handle (i.e. .csv). This is executed only once during the run.
during this cataloging I could not use the scanner or LoadableResources, there's some Java magic I do not understand. All the classes and methods seem to be available and accessible, even when using .getMethods() runtime... but when trying to actually call them during a run it throws java.lang.NoSuchMethodError and java.lang.NoClassDefFoundError. I've wasted a whole day on this - don't do that, just copy-paste the code from org.flywaydb.core.internal.util.scanner.filesystem.FileSystemScanner.
use Set<String> instead of LoadableResources[]; it's way easier to work with, especially since there's no access to LoadableResources anyway and working with [] was a nightmare.
the python/shell call will go to the execute(). Some tips:
any exception or faulty exit code needs to be translated to an SQLException.
the build enforces Java 1.6, so new ProcessBuilder(cmd).inheritIO() cannot be used. If you want to print the STDOUT/STDERR, look at the solutions in "ProcessBuilder: Forwarding stdout and stderr of started processes without blocking the main thread".
to compile flyway including your custom module, clone the whole flyway repo from git, edit the main pom.xml to include your module as well and use this command to compile: "mvn install -P-CommercialDBTest -P-CommandlinePlatformAssemblies -DskipTests=true" (I found this in another stackoverflow question.)
what I haven't done yet is the checksum part; I don't know yet what that requires.
I'm working with a Windows program that has its own language with minimal interfacing options for external code, but it can read and write files. I am looking for a method to send a set of configuration values like "12,43,47,62" to Python 3 code, query data in Pandas, and return the associated results.
Someone mentioned this could possibly be done through a file interface, where inputs are written to a file from the originating program and values are read back from an alternate file. I have a couple of questions regarding this concept that hopefully someone could clarify for me.
How well does this method handle simultaneous access where multiple calls are being made for different queries?
What is the correct terminology for this type of task?
Is there a way to do it so the Python code senses the change as opposed to repeatedly checking for changes?
1) Poorly. You should put each query in its own file, responses in their own files, and encode request IDs or other information in the file names.
2) I'm not sure there is one. "File Based Communication" maybe.
3) Yes, with Python watchdog - see the sketch below.
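A minimal sketch of that file-based flow using watchdog; the directory layout, the .req/.resp naming, and the Pandas query are illustrative assumptions, not part of the answer:

    import time
    import pandas as pd
    from watchdog.observers import Observer
    from watchdog.events import FileSystemEventHandler

    df = pd.read_csv("data.csv")  # the data being queried

    class RequestHandler(FileSystemEventHandler):
        def on_created(self, event):
            # React to new request files only (a real version should also wait
            # until the writer has finished writing the file).
            if event.is_directory or not event.src_path.endswith(".req"):
                return
            with open(event.src_path) as f:
                ids = [int(x) for x in f.read().split(",")]  # e.g. "12,43,47,62"
            result = df[df["id"].isin(ids)]
            # The response file name mirrors the request so callers can pair them up.
            result.to_csv(event.src_path.replace(".req", ".resp"), index=False)

    observer = Observer()
    observer.schedule(RequestHandler(), "requests", recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()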
I am working on a series of command line tools which connect to the same server and do related but different things. I'd like users to be able to have a single configuration file where they can place common arguments such as connection information that can be shared across all the tools. Ideally, I'd like something that does the following for me:
If the server address is specified at the command line, use this and ignore any other values.
If the server address is not specified at the command line but is in a config file that is specified at the command line, use this address. Ignore any other values.
If the server address is not specified at the command line or in a config file specified at the command line, but is available in a config file in the user's home directory (say .myapprc), use this value.
If the server address is not specified by any of the above mechanisms, exit with an error message.
The closest I've seen to this is the configparse module, which from what I can tell offers an option parser that will also look at config files, but does not seem to have the notion of "Must be specified somewhere" which I need.
Does anyone know of an existing module that can cover my use case above? If not, a simple extension to optparse, configparse, or some other module I have not reviewed would also be greatly appreciated.
Third-party module configparse is written to extend optparse from the standard Python library. As the optparse docs I pointed to mention, "optparse doesn’t prevent you from implementing required options, but doesn’t give you much help at it either" (though it follows with a couple of URLs that show you ways to do it). Simplest is to use the default value functionality: specify a default value that's not actually a legal value (for something like a server's address, that's pretty easy) -- then, once options are processed, verify that the specified value is legal (which is a good idea anyway!-) and raise the appropriate exception otherwise.
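A rough sketch of that default-value approach with optparse (the option name and the use of None as the "illegal" default are illustrative):

    from optparse import OptionParser

    parser = OptionParser()
    # None can never be a real server address, so it serves as the illegal default.
    parser.add_option("--server", dest="server", default=None)
    options, args = parser.parse_args()

    # After parsing, verify that the value was supplied somewhere and is legal.
    if options.server is None:
        parser.error("a server address must be supplied")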
I've used opster's middleware feature together with SafeConfigParser to achieve a similar (but slightly simpler) effect as you ask. You have to implement the specific logic you described yourself, but it assists you enough to make it relatively painless. An example of opster's middleware use is in its test/test.py example.
Use a dict to store the options to your program.
First parse the options file in the user's home directory and store every option in a dict (configparse or any other module is welcome). Then parse the command line (using any module you want; optparse might fit well). If an argument specifies a config file, parse that file into a dict and update your options with what you read (dict.update is really handy for merging two dicts). Then store all other arguments into another dict and merge them again (dict.update again...).
This way, you are sure that the dict in which you stored the options contains the values you want, each read either from the user's file, from the specified config file, or directly from the command line. If it does not contain a required value, exit with an error.
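A rough sketch of that layered merge; the file names, the [defaults] section, the --server/--config options, and the use of the standard library configparser with optparse are placeholder assumptions:

    import os
    import sys
    from configparser import ConfigParser
    from optparse import OptionParser

    def read_config(path):
        # Return the [defaults] section of an INI file as a dict (empty if absent).
        cp = ConfigParser()
        cp.read(path)  # silently ignores missing files
        return dict(cp["defaults"]) if cp.has_section("defaults") else {}

    options = {}

    # 1. Lowest priority: config file in the user's home directory.
    options.update(read_config(os.path.expanduser("~/.myapprc")))

    # 2. Next: a config file named on the command line.
    parser = OptionParser()
    parser.add_option("--config", dest="config")
    parser.add_option("--server", dest="server")
    opts, args = parser.parse_args()
    if opts.config:
        options.update(read_config(opts.config))

    # 3. Highest priority: values given directly on the command line.
    options.update({k: v for k, v in vars(opts).items() if v is not None and k != "config"})

    # 4. The required value must have come from somewhere.
    if not options.get("server"):
        sys.exit("error: no server address given on the command line or in a config file")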