Python protobuf converted to json

Python protobuf converted to json - python

I am trying to convert message object
message = [id: "ff90608b-bb1f-463b-ad26-e0027e67e826"
byte_content: "PK\003\004\024\000\000\000\010\000\360\206\322R\007AMb\201\...00\000\310\031\000\000\000\000"
file_type: "application/pdf"
file_name: "cumulative-essentials-visit.pdf"
]
by
from google.protobuf.json_format import MessageToDict
dict_obj = MessageToDict(message_obj)
to json but got an error
message_descriptor = message.DESCRIPTOR
AttributeError: 'google.protobuf.pyext._message.RepeatedCompositeCo' object has no attribute 'DESCRIPTOR'
Is there any idea?
Thanks

Here is a working example as well as reproducing the exception above.
Step 1: todolist.proto file with following content:
syntax = "proto3";
// Not necessary for Python but should still be declared to avoid name collisions
// in the Protocol Buffers namespace and non-Python languages
package protoblog;
// Style guide prefers prefixing enum values instead of surrounding
// with an enclosing message
enum TaskState {
TASK_OPEN = 0;
TASK_IN_PROGRESS = 1;
TASK_POST_PONED = 2;
TASK_CLOSED = 3;
TASK_DONE = 4;
}
message TodoList {
int32 owner_id = 1;
string owner_name = 2;
message ListItems {
TaskState state = 1;
string task = 2;
string due_date = 3;
}
repeated ListItems todos = 3;
}
Step 2: Generate python specific code from todolist.proto file by running following:
protoc -I=. --python_out=. todolist.proto
This will generate a file in the current directory todolist_pb2.py
Step 3: Create a python project and copy todolist_pb2.py to it.
Step 4: Create a python module proto_test.py with following content:
import json
from google.protobuf.json_format import Parse
from google.protobuf.json_format import MessageToDict
from todolist_pb2 import TodoList
todolist_json_message = {
"ownerId": "1234",
"ownerName": "Tim",
"todos": [
{
"state": "TASK_DONE",
"task": "Test ProtoBuf for Python",
"dueDate": "31.10.2019"
}
]
}
todolist_proto_message = Parse(json.dumps(todolist_json_message), TodoList())
print(todolist_proto_message)
# Successfully converts the message to dictionary
todolist_proto_message_dict = MessageToDict(todolist_proto_message)
print(todolist_proto_message_dict)
# If you try to convert a field from your message rather than entire message,
# you will get object has no attribute 'DESCRIPTOR exception'
# Examples:
# Eg.1: Produces AttributeError: 'google.protobuf.pyext._message.RepeatedCompositeCo' object has no attribute
# 'DESCRIPTOR.'
todos_as_dict = MessageToDict(todolist_proto_message.todos)
# Eg.2: Produces AttributeError: 'int' object has no attribute 'DESCRIPTOR'
owner_id_as_dict = MessageToDict(todolist_proto_message.owner_id)
Step 5: Run the proto_test.py module and you can see the failing behavior and the successful behavior.
So it seems like you are not converting your actual message rather you are converting a field of type list from your message/response. So try to convert the entire message and then retrieve the field you are interested in.
Please let me know if it helps.
NOTE: You need to ensure protoc compiler is installed in your machine to compile .proto file to python specific code as mentioned in step 2.
Installation instruction can be found below:
MacOS/Linux
Windows

Related

Unable to parse JSON from file using Python3

I'm trying to get the value of ["pooled_metrics"]["vmaf"]["harmonic_mean"] from a JSON file I want to parse using python. This is the current state of my code:
for crf in crf_ranges:
vmaf_output_crf_list_log = job['config']['output_dir'] + '/' + build_name(stream) + f'/vmaf_{crf}.json'
# read the vmaf_output_crf_list_log file and get value from ["pooled_metrics"]["vmaf"]["harmonic_mean"]
with open(vmaf_output_crf_list_log, 'r') as json_vmaf_file:
# load the json_string["pooled_metrics"] into a python dictionary
vm = json.loads(json_vmaf_file.read())
vmaf_values.append((crf, vm["pooled_metrics"]["vmaf"]["harmonic_mean"]))
This will give me back the following error:
AttributeError: 'dict' object has no attribute 'loads'
I always get back the same AttributeError not matter if I use "load" or "loads".
I validated the contents of the JSON, which is valid using various online validators, but still, I am not able to load the JSON for further parsing operations.
I expect that I can load a file that contains valid JSON data. The content of the file looks like this:
{
"frames": [
{
"frameNum": 0,
"metrics": {
"integer_vif_scale2": 0.997330,
}
},
],
"pooled_metrics": {
"vmaf": {
"min": 89.617207,
"harmonic_mean": 99.868023
}
},
"aggregate_metrics": {
}
}
Can somebody provide me some advice onto this behavior, what does it seem so absolutely impossible to load this JSON file?

loads is a method for the json library as the docs say https://docs.python.org/3/library/json.html#json.loads. In this case you are having a AttributeError this means that probably you have created another variable named "json" and when you call json.loads is calling that variable hence it won't have a loads method.

Expanding a Scribunto module that doesn't have a function

I want to get the return value of this Wikimedia Scribunto module in Python. Its source code is roughly like this:
local Languages = {}
Languages = {
["aa"] = {
name = "afarština",
dir = "ltr",
name_attr_gen_pl = "afarských"
},
-- More languages...
["zza"] = {
name = "zazaki",
dir = "ltr"
}
}
return Languages
In the Wiktextract library, there is already Python code to accomplish similar tasks:
def expand_template(sub_domain: str, text: str) -> str:
import requests
# https://www.mediawiki.org/wiki/API:Expandtemplates
params = {
"action": "expandtemplates",
"format": "json",
"text": text,
"prop": "wikitext",
"formatversion": "2",
}
r = requests.get(f"https://{sub_domain}.wiktionary.org/w/api.php",
params=params)
data = r.json()
return data["expandtemplates"]["wikitext"]
This works for languages like French because there the Scribunto module has a well-defined function that returns a value, as an example here:
Scribunto module:
p = {}
function p.affiche_langues_python(frame)
-- returns the needed stuff here
end
The associated Python function:
def get_fr_languages():
# https://fr.wiktionary.org/wiki/Module:langues/analyse
json_text = expand_template(
"fr", "{{#invoke:langues/analyse|affiche_langues_python}}"
)
json_text = json_text[json_text.index("{") : json_text.index("}") + 1]
json_text = json_text.replace(",\r\n}", "}") # remove tailing comma
data = json.loads(json_text)
lang_data = {}
for lang_code, lang_name in data.items():
lang_data[lang_code] = [lang_name[0].upper() + lang_name[1:]]
save_json_file(lang_data, "fr")
But in our case we don't have a function to call.
So if we try:
def get_cs_languages():
# https://cs.wiktionary.org/wiki/Modul:Languages
json_text = expand_template(
"cs", "{{#invoke:Languages}}"
)
print(json_text)
we get <strong class="error"><span class="scribunto-error" id="mw-scribunto-error-0">Chyba skriptu: Musíte uvést funkci, která se má zavolat.</span></strong> usage: get_languages.py [-h] sub_domain lang_code get_languages.py: error: the following arguments are required: sub_domain, lang_code. (Translated as "You have to specify a function you want to call. But when you enter a function name as a parameter like in the French example, it complains that that function does not exist.)
What could be a way to solve this?

The easiest and most general way is to get the return value of the module as JSON and parse it in Python.
Make another module that exports a function dump_as_json that takes the name of the first module as a frame argument and returns the first module as JSON. In Python, expand {{#invoke:json module|dump_as_json|Module:module to dump}} using the expandtemplates API and parse the return value of the module invocation as JSON with json.loads(data["expandtemplates"]["wikitext"]).
Text of Module:json module (call it what you want):
return {
dump_as_json = function(frame)
local module_name = frame.args[0]
local json_encode = mw.text.jsonEncode
-- json_encode = require "Module:JSON".toJSON
return json_encode(require(module_name))
end
}
With pywikibot:
from pywikibot import Site
site = Site(code="cs", fam="wiktionary")
languages = json.loads(site.expand_text("{{#invoke:json module|dump_as_json|Module:module to dump}}")
If you get the error Lua error: Cannot pass circular reference to PHP, this means that at least one of the tables in Module:module to dump is referenced by another table more than once, like if the module was
local t = {}
return { t, t }
To handle these tables, you will have to get a pure-Lua JSON encoder function to replace mw.text.jsonEncode, like the toJSON function from Module:JSON on English Wiktionary.
One warning about this method that is not relevant for the module you are trying to get: string values in the JSON will only be accurate if they were NFC-normalized valid UTF-8 with no special ASCII control codes (U+0000-U+001F excluding tab U+0009 and LF U+000A) when they were returned from Module:module to dump. As on a wiki page, the expandtemplates API will replace ASCII control codes and invalid UTF-8 with the U+FFFD character, and will NFC-normalize everything else. That is, "\1\128e" .. mw.ustring.char(0x0301) would be modified to the equivalent of mw.ustring.char(0xFFFD, 0xFFFD, 0x00E9). This doesn't matter in most cases (like if the table contains readable text), but if it did matter, the JSON-encoding module would have to output JSON escapes for non-NFC character sequences and ASCII control codes and find some way to encode invalid UTF-8.
If, like the module you are dumping, Module:module to dump is a pure table of literal values with no references to other modules or to Scribunto-only global values, you could also get its raw wikitext with the Revisions API and parse it in Lua on your machine and pass it to Python. I think there is a Python extension that allows you to directly use a Lua state in Python.
Running a module with dependencies on the local machine is not possible unless you go to the trouble of setting up the full Scribunto environment on your machine, and figuring out a way to download the module dependencies and make them available to the Lua state. I have sort of done this myself, but it isn't necessary for your use case.

extracting 'blob' python serialized tuple data in php

I'm trying to utilize a database from another program in a php based website tool, and apparently the original was built in python and puts some of it's data into a python tuple and serializes it to store it as a blob in the sql table.
I'm not a python programmer so I'm not sure how to even see what is in this blob, but I do know that some of the 'type' indicators for the data field are stored in there and I want to extract them and anything else useful.
Is there any way to 'unserialize' a python tuple in php?

The blob data turned out to be a pickled tuple (part of the reason I despise python - both data types that only python can read! Python programmers: 'standardized conventions? Who needs standardized conventions?!?!')
I came up with a cludgy way to 'unpickle' the data and json serialize it using a command line. To get the binary blob data into the command line, I base64 encode it. It's janky but it works for what I need:
/**
* use a python exec call to 'unpickle' the blob_data
* to get the binary blob into a command line argument, base64 encode it
* to get the data back out of python, json serialize it
* #param string $blob binary blob data
* #return mixed
*/
public static function unpickle($blob) {
$cmd = sprintf("import pickle; import base64; import json; print(json.dumps(pickle.loads(base64.b64decode('%s'))))", base64_encode($blob));
$pcmd = sprintf("python -c \"%s\"", $cmd);
$result = exec($pcmd);
$resdec = json_decode($result);
return $resdec;
}

With a little more playing on this concept, I gave myself a few more alternatives. First is, I took the command line version above and made it into a little more functional python script:
unpickle.py:
#!/usr/bin/env python3
import pickle
import json
import sys
import base64
import select
def isBase64(s):
try:
return s == base64.b64encode(base64.b64decode(s)).decode('ascii')
except Exception:
return False
bblob = None
if (len(sys.argv) > 1) and isBase64(sys.argv[1]):
bblob = base64.b64decode(sys.argv[1])
elif select.select([sys.stdin, ], [], [], 0.0)[0]:
try:
with open(0, 'rb') as f:
bblob = f.read()
except Exception as e:
err_unknown(e)
if bblob != None:
unpik = pickle.loads(bblob)
jsout = json.dumps(unpik)
print(jsout)
This script allows you to either specify the blob data from the pickled tuple 'byte' as a base64 encoded string on the command line, or you can pipe raw blob data into the script. Both variations will output json if the data is valid and formatted properly. (null if not)
You can convert this to a self-contained binary to plop on systems without python using pyinstaller -F if need be. To play with it in the event I am running it on systems with the pyinstaller binary vs one with the python script vs one with just python, I created the following static methods in my laravel model. (I'll eventually move it into a service module)
/**
* call either a pyinstaller binary or python script with raw blob data to be unpickled
*
* #param string $b binary data of blob
* #return false|mixed
*/
public static function unpickle($b)
{
$cmd = base_path(env('UNPICKLE_BINARY', 'bin/unpickle'));
if(!(is_file($cmd) && is_executable($cmd))) { // make sure unpickle cmd exists
// check for UNPICKLE_BINARY with .py after and python binary
$pyExe = env('PYTHON_EXE', '/usr/bin/python');
if (is_file($cmd.".py") && (is_file($pyExe) && is_executable($pyExe))) {
$cmd = sprintf("%s %s.py", $pyExe, $cmd);
} else
return static::unpyckle($b); // try direct python call
}
$descriptorspec = [
["pipe", "r"],
["pipe", "w"],
["pipe", "w"]
];
$cwd = dirname($cmd);
$env = [];
$process = proc_open($cmd, $descriptorspec, $pipes, $cwd, $env);
if (is_resource($process)) {
fwrite($pipes[0], $b);
fclose($pipes[0]);
$output = stream_get_contents($pipes[1]);
fclose($pipes[1]);
$return_value = proc_close($process);
if(static::isJson($output))
return json_decode($output);
else
return false;
}
return false;
}
/**
* use a python exec call to 'unpickle' the blob_data
* to get the binary blob into a command line argument, base64 encode it
* to get the data back out of python, json serialize it
* #param string $blob binary blob data
* #return mixed
*/
public static function unpyckle($blob) {
$pyExe = env('PYTHON_EXE', '/usr/bin/python');
if (!(is_file($pyExe) && is_executable($pyExe)))
throw new Exception('python executable not found!');
$bblob = base64_encode($blob);
$cmd = sprintf("import pickle; import base64; import json; print(json.dumps(pickle.loads(base64.b64decode('%s'))))", $bblob);
$pcmd = sprintf("%s -c \"%s\"", $pyExe, $cmd);
$result = exec($pcmd);
$resdec = json_decode($result);
return $resdec;
}
/**
* try to detect if a string is a json string
*
* #param $str
* #return bool
*/
public static function isJson($str) {
if(is_string($str) && !empty($str)) {
json_decode($str);
return (json_last_error() == JSON_ERROR_NONE);
}
return false;
}
example .env values:
UNPICKLE_BINARY=bin/unpickle
PYTHON_EXE=/usr/bin/python3
basically showing three different ways to call python to do essentially the same thing...

google.protobuf.message.DecodeError: Wrong wire type in tag Error in Protocol Buffer

I trying decrypt my data using google protocol buffer in python
sample.proto file:-
syntax = "proto3";
message SimpleMessage {
string deviceID = 1;
string timeStamp = 2;
string data = 3;
}
After that, I have generated python files using the proto command:-
protoc --proto_path=./ --python_out=./ simple.proto
My Python code below:-
import json
import simple_pb2
import base64
encryptedData = 'iOjEuMCwic2VxIjoxODEsInRtcyI6IjIwMjEtMDEtMjJUMTQ6MDY6MzJaIiwiZGlkIjoiUlFI'
t2 = bytes(encryptedData, encoding='utf8')
print(encryptedData)
data = base64.b64decode(encryptedData)
test = simple_pb2.SimpleMessage()
v1 = test.ParseFromString(data)
While executing above code getting error:- google.protobuf.message.DecodeError: Wrong wire type in tag Error
What i am doing wrong. can anyone help?

Your data is not "encrypted", it's just base64-encoded. If you use your example code and inspect your data variable, then you get:
import base64
data = base64.b64decode(b'eyJ2ZXIiOjEuMCwic2VxIjoxODEsInRtcyI6IjIwMjEtMDEtMjJUMTQ6MDY6MzJaIiwiZGlkIjoiUlFIVlRKRjAwMDExNzY2IiwiZG9wIjoxLjEwMDAwMDAyMzg0MTg1NzksImVyciI6MCwiZXZ0IjoiVE5UIiwiaWdzIjpmYWxzZSwibGF0IjoyMi45OTI0OTc5OSwibG5nIjo3Mi41Mzg3NDgyOTk5OTk5OTUsInNwZCI6MC4wfQo=')
print(data)
> b'{"ver":1.0,"seq":181,"tms":"2021-01-22T14:06:32Z","did":"RQHVTJF00011766","dop":1.1000000238418579,"err":0,"evt":"TNT","igs":false,"lat":22.99249799,"lng":72.538748299999995,"spd":0.0}\n'
Which is evidently a piece of of JSON data, not a binary-serialized protocol buffer - which is what ParseFromString expects. Also, looking at the names and types of the fields, it looks like this payload just doesn't match the proto definition you've shown.
There are certainly ways to parse a JSON into a proto, and even to control the field names in that transformation, but not even the number of fields match directly. So you first need to define what you want: what proto message would you expect this JSON object to represent?

JSON to Protobuf in Python

Hey I know there is a solution for this in Java, I'm curious to know if anyone knows of a Python 3 solution for converting a JSON object or file into protobuf format. I would accept either or as converting to an object is trivial. Searching the stackoverflow site, I only found examples of protobuf->json, but not the other way around. There is one extremely old repo that may do this but it is in Python 2 and our pipeline is Python 3. Any help is as always, appreciated.

The library you're looking for is google.protobuf.json_format. You can install it with the directions in the README here. The library is compatible with Python >= 2.7.
Example usage:
Given a protobuf message like this:
message Thing {
string first = 1;
bool second = 2;
int32 third = 3;
}
You can go from Python dict or JSON string to protobuf like:
import json
from google.protobuf.json_format import Parse, ParseDict
d = {
"first": "a string",
"second": True,
"third": 123456789
}
message = ParseDict(d, Thing())
# or
message = Parse(json.dumps(d), Thing())
print(message.first) # "a string"
print(message.second) # True
print(message.third) # 123456789
or from protobuf to Python dict or JSON string:
from google.protobuf.json_format import MessageToDict, MessageToJson
message_as_dict = MessageToDict(message)
message_as_dict['first'] # == 'a string'
message_as_dict['second'] # == True
message_as_dict['third'] # == 123456789
# or
message_as_json_str = MessageToJson(message)
The documentation for the json_format module is here.

Here is a much simpler way by using xia-easy-proto module. No need to pre-define anything.
pip install xia-easy-proto
And then
from xia_easy_proto import EasyProto
if __name__ == '__main__':
songs = {"composer": {'given_name': 'Johann', 'family_name': 'Pachelbel'},
"title": 'Canon in D',
"year": [1680, 1681]}
song_class, song_payload = EasyProto.serialize(songs)
print(song_class) # It is the message class
print(song_payload) # It is the serialized message

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python protobuf converted to json - python

Related

Unable to parse JSON from file using Python3

Expanding a Scribunto module that doesn't have a function

extracting 'blob' python serialized tuple data in php

google.protobuf.message.DecodeError: Wrong wire type in tag Error in Protocol Buffer

JSON to Protobuf in Python

Categories

Resources