JSON to Protobuf in Python - python

Hey, I know there is a solution for this in Java. I'm curious whether anyone knows of a Python 3 solution for converting a JSON object or file into protobuf format. I would accept either, as converting a file to an object is trivial. Searching Stack Overflow, I only found examples of protobuf->JSON, not the other way around. There is one extremely old repo that may do this, but it is in Python 2 and our pipeline is Python 3. Any help is, as always, appreciated.

The library you're looking for is google.protobuf.json_format. You can install it with the directions in the README here. The library is compatible with Python >= 2.7.
Example usage:
Given a protobuf message like this:
message Thing {
    string first = 1;
    bool second = 2;
    int32 third = 3;
}
You can go from Python dict or JSON string to protobuf like:
import json
from google.protobuf.json_format import Parse, ParseDict
d = {
    "first": "a string",
    "second": True,
    "third": 123456789
}
message = ParseDict(d, Thing())
# or
message = Parse(json.dumps(d), Thing())
print(message.first) # "a string"
print(message.second) # True
print(message.third) # 123456789
or from protobuf to Python dict or JSON string:
from google.protobuf.json_format import MessageToDict, MessageToJson
message_as_dict = MessageToDict(message)
message_as_dict['first'] # == 'a string'
message_as_dict['second'] # == True
message_as_dict['third'] # == 123456789
# or
message_as_json_str = MessageToJson(message)
The documentation for the json_format module is here.

Here is a simpler way, using the xia-easy-proto module. There is no need to pre-define a message type.
pip install xia-easy-proto
And then
from xia_easy_proto import EasyProto
if __name__ == '__main__':
    songs = {"composer": {'given_name': 'Johann', 'family_name': 'Pachelbel'},
             "title": 'Canon in D',
             "year": [1680, 1681]}
    song_class, song_payload = EasyProto.serialize(songs)
    print(song_class)  # It is the message class
    print(song_payload)  # It is the serialized message

Related

Expanding a Scribunto module that doesn't have a function

I want to get the return value of this Wikimedia Scribunto module in Python. Its source code is roughly like this:
local Languages = {}
Languages = {
    ["aa"] = {
        name = "afarština",
        dir = "ltr",
        name_attr_gen_pl = "afarských"
    },
    -- More languages...
    ["zza"] = {
        name = "zazaki",
        dir = "ltr"
    }
}
return Languages
In the Wiktextract library, there is already Python code to accomplish similar tasks:
def expand_template(sub_domain: str, text: str) -> str:
    import requests
    # https://www.mediawiki.org/wiki/API:Expandtemplates
    params = {
        "action": "expandtemplates",
        "format": "json",
        "text": text,
        "prop": "wikitext",
        "formatversion": "2",
    }
    r = requests.get(f"https://{sub_domain}.wiktionary.org/w/api.php",
                     params=params)
    data = r.json()
    return data["expandtemplates"]["wikitext"]
This works for languages like French because there the Scribunto module has a well-defined function that returns a value, as an example here:
Scribunto module:
p = {}
function p.affiche_langues_python(frame)
-- returns the needed stuff here
end
The associated Python function:
def get_fr_languages():
    # https://fr.wiktionary.org/wiki/Module:langues/analyse
    json_text = expand_template(
        "fr", "{{#invoke:langues/analyse|affiche_langues_python}}"
    )
    json_text = json_text[json_text.index("{") : json_text.index("}") + 1]
    json_text = json_text.replace(",\r\n}", "}")  # remove trailing comma
    data = json.loads(json_text)
    lang_data = {}
    for lang_code, lang_name in data.items():
        lang_data[lang_code] = [lang_name[0].upper() + lang_name[1:]]
    save_json_file(lang_data, "fr")
But in our case we don't have a function to call.
So if we try:
def get_cs_languages():
    # https://cs.wiktionary.org/wiki/Modul:Languages
    json_text = expand_template(
        "cs", "{{#invoke:Languages}}"
    )
    print(json_text)
we get <strong class="error"><span class="scribunto-error" id="mw-scribunto-error-0">Chyba skriptu: Musíte uvést funkci, která se má zavolat.</span></strong> usage: get_languages.py [-h] sub_domain lang_code get_languages.py: error: the following arguments are required: sub_domain, lang_code. (The Czech error translates as "You have to specify a function you want to call." But when you enter a function name as a parameter, as in the French example, it complains that the function does not exist.)
What could be a way to solve this?
The easiest and most general way is to get the return value of the module as JSON and parse it in Python.
Make another module that exports a function dump_as_json that takes the name of the first module as a frame argument and returns the first module as JSON. In Python, expand {{#invoke:json module|dump_as_json|Module:module to dump}} using the expandtemplates API and parse the return value of the module invocation as JSON with json.loads(data["expandtemplates"]["wikitext"]).
Text of Module:json module (call it what you want):
return {
    dump_as_json = function(frame)
        -- Scribunto numbers positional #invoke arguments from 1
        local module_name = frame.args[1]
        local json_encode = mw.text.jsonEncode
        -- json_encode = require "Module:JSON".toJSON
        return json_encode(require(module_name))
    end
}
With pywikibot:
import json
from pywikibot import Site
site = Site(code="cs", fam="wiktionary")
languages = json.loads(site.expand_text("{{#invoke:json module|dump_as_json|Module:module to dump}}"))
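Without pywikibot, the same invocation can go through the expandtemplates API. Here is a minimal sketch of the two pure-Python pieces involved: building the #invoke wikitext and decoding the API response. The function names and the sample payload are hypothetical, and the response shape is assumed to match the one used by expand_template in the question.

```python
import json


def build_invoke_text(json_module: str, target_module: str) -> str:
    """Build the wikitext that asks json_module to dump target_module."""
    return "{{#invoke:%s|dump_as_json|%s}}" % (json_module, target_module)


def parse_expandtemplates(payload: dict) -> dict:
    """Decode the JSON text the module returned inside the API response."""
    wikitext = payload["expandtemplates"]["wikitext"]
    if "scribunto-error" in wikitext:
        # Scribunto reports failures as HTML inside the wikitext field
        raise ValueError("module invocation failed: " + wikitext)
    return json.loads(wikitext)


# Hypothetical response, shaped like the expandtemplates output
sample_payload = {
    "expandtemplates": {
        "wikitext": '{"aa": {"name": "afarština", "dir": "ltr"}}'
    }
}

text = build_invoke_text("json module", "Module:module to dump")
languages = parse_expandtemplates(sample_payload)
print(text)
print(languages["aa"]["name"])
```

Checking for the scribunto-error marker first gives a clearer failure than letting json.loads choke on the HTML error span.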
If you get the error Lua error: Cannot pass circular reference to PHP, this means that at least one of the tables in Module:module to dump is referenced by another table more than once, like if the module was
local t = {}
return { t, t }
To handle these tables, you will have to get a pure-Lua JSON encoder function to replace mw.text.jsonEncode, like the toJSON function from Module:JSON on English Wiktionary.
One warning about this method that is not relevant for the module you are trying to get: string values in the JSON will only be accurate if they were NFC-normalized valid UTF-8 with no special ASCII control codes (U+0000-U+001F excluding tab U+0009 and LF U+000A) when they were returned from Module:module to dump. As on a wiki page, the expandtemplates API will replace ASCII control codes and invalid UTF-8 with the U+FFFD character, and will NFC-normalize everything else. That is, "\1\128e" .. mw.ustring.char(0x0301) would be modified to the equivalent of mw.ustring.char(0xFFFD, 0xFFFD, 0x00E9). This doesn't matter in most cases (like if the table contains readable text), but if it did matter, the JSON-encoding module would have to output JSON escapes for non-NFC character sequences and ASCII control codes and find some way to encode invalid UTF-8.
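To see what that sanitizing does to a string, here is a rough local approximation in Python. It is only a sketch of the behaviour described above, not MediaWiki's actual code, and the function name is made up for illustration.

```python
import unicodedata


def approximate_wiki_sanitize(raw: bytes) -> str:
    """Rough local approximation of the sanitizing described above."""
    # Invalid UTF-8 bytes become U+FFFD
    text = raw.decode("utf-8", errors="replace")
    # ASCII control codes other than tab and LF become U+FFFD
    cleaned = "".join(
        "\ufffd" if ord(ch) < 0x20 and ch not in "\t\n" else ch
        for ch in text
    )
    # Everything else is NFC-normalized
    return unicodedata.normalize("NFC", cleaned)


# Same bytes as the Lua example: "\1\128e" followed by U+0301
raw = b"\x01\x80e" + "\u0301".encode("utf-8")
result = approximate_wiki_sanitize(raw)
print([hex(ord(ch)) for ch in result])  # ['0xfffd', '0xfffd', '0xe9']
```

Running a few representative strings from the module through a check like this shows quickly whether the warning applies to your data.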
If, like the module you are dumping, Module:module to dump is a pure table of literal values with no references to other modules or to Scribunto-only global values, you could also get its raw wikitext with the Revisions API and parse it in Lua on your machine and pass it to Python. I think there is a Python extension that allows you to directly use a Lua state in Python.
Running a module with dependencies on the local machine is not possible unless you go to the trouble of setting up the full Scribunto environment on your machine, and figuring out a way to download the module dependencies and make them available to the Lua state. I have sort of done this myself, but it isn't necessary for your use case.

Using python and suds, data not read by server side because element is not defined as an array

I am a very inexperienced programmer with no formal education. Details will be extremely helpful in any responses.
I have made several basic python scripts to call SOAP APIs, but I am running into an issue with a specific API function that has an embedded array.
Here is a sample excerpt from a working XML format to show nested data:
<bomData xsi:type="urn:inputBOM" SOAP-ENC:arrayType="urn:bomItem[]">
<bomItem>
<item_partnum></item_partnum>
<item_partrev></item_partrev>
<item_serial></item_serial>
<item_lotnum></item_lotnum>
<item_sublotnum></item_sublotnum>
<item_qty></item_qty>
</bomItem>
<bomItem>
<item_partnum></item_partnum>
<item_partrev></item_partrev>
<item_serial></item_serial>
<item_lotnum></item_lotnum>
<item_sublotnum></item_sublotnum>
<item_qty></item_qty>
</bomItem>
</bomData>
I have tried 3 different things to get this to work to no avail.
I can generate nearly the exact XML from my script, but a key attribute is missing: the 'SOAP-ENC:arrayType="urn:bomItem[]"' in the above XML example.
Option 1 was using MessagePlugin, but I get an error because my section is the third element and the plugin always injects into the first element. I have tried body[2], but this throws an error.
Option 2: I am trying to create the object(?). I read a lot of Stack Overflow, but I might be missing something for this.
Option 3 looked simple enough, but also failed. I tried setting the values in the JSON directly. I got these examples by converting an XML sample to JSON.
I have also tried several other minor things to get it working, but they are not worth mentioning. Although, if there is a way to somehow do the following, then I'm all ears:
bomItem[]: bomData = {"bomItem"[{...,...,...}]}
Here is a sample of my script:
# for python 3
# using pip install suds-py3
from suds.client import Client
from suds.plugin import MessagePlugin
# Config
#option 1: trying to set it as an array using plugin
class MyPlugin(MessagePlugin):
    def marshalled(self, context):
        body = context.envelope.getChild('Body')
        bomItem = body[0]
        bomItem.set('SOAP-ENC:arrayType', 'urn:bomItem[]')
URL = "http://localhost/application/soap?wsdl"
client = Client(URL, plugins=[MyPlugin()])
transact_info = {
"username":"",
"transaction":"",
"workorder":"",
"serial":"",
"trans_qty":"",
"seqnum":"",
"opcode":"",
"warehouseloc":"",
"warehousebin":"",
"machine_id":"",
"comment":"",
"defect_code":""
}
#WIP - trying to get bomData below working first
inputData = {
"dataItem":[
{
"fieldname": "",
"fielddata": ""
}
]
}
#option 2: trying to create the element here and define as an array
#inputbom = client.factory.create('ns3:inputBOM')
#inputbom._type = "SOAP-ENC:arrayType"
#inputbom.value = "urn:bomItem[]"
bomData = {
#Option 3: trying to set the type and array type in JSON
#"#xsi:type":"urn:inputBOM",
#"#SOAP-ENC:arrayType":"urn:bomItem[]",
"bomItem":[
{
"item_partnum":"",
"item_partrev":"",
"item_serial":"",
"item_lotnum":"",
"item_sublotnum":"",
"item_qty":""
},
{
"item_partnum":"",
"item_partrev":"",
"item_serial":"",
"item_lotnum":"",
"item_sublotnum":"",
"item_qty":""
}
]
}
try:
    response = client.service.transactUnit(transact_info,inputData,bomData)
    print("RESPONSE: ")
    print(response)
    #print(client)
    #print(envelope)
except Exception as e:
    #handle error here
    print(e)
I appreciate any help and hope it is easy to solve.
I have found the answer I was looking for. At least a working solution.
In any case, option 1 worked out. I read up on it at the following link:
https://suds-py3.readthedocs.io/en/latest/
See the 'MessagePlugin' section there.
I found a solution to get message plugin working from the following post:
unmarshalling Error: For input string: ""
A user posted an example how to crawl through the XML structure and modify it.
Here is my modified example to get my script working:
#Using MessagePlugin to modify elements before sending to server
class MyPlugin(MessagePlugin):
    # created method that could be reused to modify sections with similar
    # structure/requirements
    def addArrayType(self, dataType, arrayType, transactUnit):
        # this is the code that is key to crawling through the XML - I get
        # the child of each parent element until I am at the right level for
        # modification
        data = transactUnit.getChild(dataType)
        if data:
            data.set('SOAP-ENC:arrayType', arrayType)

    def marshalled(self, context):
        # Alter the envelope so that the xsd namespace is allowed
        context.envelope.nsprefixes['xsd'] = 'http://www.w3.org/2001/XMLSchema'
        body = context.envelope.getChild('Body')
        transactUnit = body.getChild("transactUnit")
        if transactUnit:
            self.addArrayType('inputData', 'urn:dataItem[]', transactUnit)
            self.addArrayType('bomData', 'urn:bomItem[]', transactUnit)
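The same "walk down to the element, then set the attribute" idea can be tried without a SOAP server, using only the standard library. This is a sketch under assumptions: the envelope below is a simplified, hypothetical stand-in for what suds generates (no namespace on the payload elements), and add_array_type is an illustrative helper, not part of suds.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"

# Keep readable prefixes when the tree is serialized again
ET.register_namespace("SOAP-ENV", SOAP_ENV)
ET.register_namespace("SOAP-ENC", SOAP_ENC)


def add_array_type(envelope_xml: str, data_type: str, array_type: str) -> str:
    """Set SOAP-ENC:arrayType on <data_type> inside Body/transactUnit."""
    root = ET.fromstring(envelope_xml)
    body = root.find("{%s}Body" % SOAP_ENV)
    transact_unit = body.find("transactUnit") if body is not None else None
    data = transact_unit.find(data_type) if transact_unit is not None else None
    if data is not None:
        data.set("{%s}arrayType" % SOAP_ENC, array_type)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, simplified envelope
envelope = (
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="%s">'
    "<SOAP-ENV:Body><transactUnit>"
    "<bomData><bomItem><item_partnum>P1</item_partnum></bomItem></bomData>"
    "</transactUnit></SOAP-ENV:Body></SOAP-ENV:Envelope>"
) % SOAP_ENV

patched = add_array_type(envelope, "bomData", "urn:bomItem[]")
print(patched)
```

Looking elements up by name rather than by position (body[0], body[2]) is what makes the approach robust when the target is not the first child.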

Python protobuf converted to json

I am trying to convert message object
message = [id: "ff90608b-bb1f-463b-ad26-e0027e67e826"
byte_content: "PK\003\004\024\000\000\000\010\000\360\206\322R\007AMb\201\...00\000\310\031\000\000\000\000"
file_type: "application/pdf"
file_name: "cumulative-essentials-visit.pdf"
]
by
from google.protobuf.json_format import MessageToDict
dict_obj = MessageToDict(message_obj)
to json but got an error
message_descriptor = message.DESCRIPTOR
AttributeError: 'google.protobuf.pyext._message.RepeatedCompositeCo' object has no attribute 'DESCRIPTOR'
Is there any idea?
Thanks
Here is a working example as well as reproducing the exception above.
Step 1: todolist.proto file with following content:
syntax = "proto3";
// Not necessary for Python but should still be declared to avoid name collisions
// in the Protocol Buffers namespace and non-Python languages
package protoblog;
// Style guide prefers prefixing enum values instead of surrounding
// with an enclosing message
enum TaskState {
    TASK_OPEN = 0;
    TASK_IN_PROGRESS = 1;
    TASK_POST_PONED = 2;
    TASK_CLOSED = 3;
    TASK_DONE = 4;
}
message TodoList {
    int32 owner_id = 1;
    string owner_name = 2;
    message ListItems {
        TaskState state = 1;
        string task = 2;
        string due_date = 3;
    }
    repeated ListItems todos = 3;
}
Step 2: Generate python specific code from todolist.proto file by running following:
protoc -I=. --python_out=. todolist.proto
This will generate a file in the current directory todolist_pb2.py
Step 3: Create a python project and copy todolist_pb2.py to it.
Step 4: Create a python module proto_test.py with following content:
import json
from google.protobuf.json_format import Parse
from google.protobuf.json_format import MessageToDict
from todolist_pb2 import TodoList
todolist_json_message = {
"ownerId": "1234",
"ownerName": "Tim",
"todos": [
{
"state": "TASK_DONE",
"task": "Test ProtoBuf for Python",
"dueDate": "31.10.2019"
}
]
}
todolist_proto_message = Parse(json.dumps(todolist_json_message), TodoList())
print(todolist_proto_message)
# Successfully converts the message to dictionary
todolist_proto_message_dict = MessageToDict(todolist_proto_message)
print(todolist_proto_message_dict)
# If you try to convert a field of your message rather than the entire message,
# you will get an "object has no attribute 'DESCRIPTOR'" exception.
# Examples:
# Eg.1: Produces AttributeError: 'google.protobuf.pyext._message.RepeatedCompositeCo' object has no attribute
# 'DESCRIPTOR'
todos_as_dict = MessageToDict(todolist_proto_message.todos)
# Eg.2: Produces AttributeError: 'int' object has no attribute 'DESCRIPTOR'
owner_id_as_dict = MessageToDict(todolist_proto_message.owner_id)
Step 5: Run the proto_test.py module and you can see the failing behavior and the successful behavior.
So it seems that you are not converting the actual message; you are converting a repeated (list-like) field of your message/response. Convert the entire message first, then retrieve the field you are interested in from the resulting dictionary.
Please let me know if it helps.
NOTE: The protoc compiler must be installed on your machine to compile the .proto file to Python code, as mentioned in step 2.
Installation instruction can be found below:
MacOS/Linux
Windows

Attribute error DESCRIPTOR while trying to convert google vision response to dictionary with python

I am on Windows, using Python 3.8.6rc1, protobuf version 3.13.0 and google-cloud-vision version 2.0.0.
My Code is :
from google.protobuf.json_format import MessageToDict
from google.cloud import vision
client = vision.ImageAnnotatorClient()
response = client.annotate_image({
'image': {'source': {'image_uri': 'https://images.unsplash.com/photo-1508138221679-760a23a2285b?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=800&q=60'}},
})
MessageToDict(response)
It fails at MessageToDict(response), I have an attribute error: "DESCRIPTOR". It seems like the response is not a valid protobuf object. Can someone help me? Thank you
This does not really answer my question but I find that one way to solve it and access the protobuf object is to use response._pb so the code becomes:
response = client.annotate_image({
'image': {'source': {'image_uri': 'https://images.unsplash.com/photo-1508138221679-760a23a2285b?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=800&q=60'}},
})
MessageToDict(response._pb)
See step 3 in particular:
Step 1: Import this lib
from google.protobuf.json_format import MessageToDict
Step 2: Send request
keyword_ideas = keyword_plan_idea_service.generate_keyword_ideas(
request=request
)
Step 3: Convert the response to JSON [note the added "._pb"]
keyword_ideas_json = MessageToDict(keyword_ideas._pb)  # add ._pb at the end of the object
Step 4: Do whatever you want with that json
print(keyword_ideas_json)
Github for this same issue: here
maybe have a look at this post
json_string = type(response).to_json(response)
# Alternatively
import proto
json_string = proto.Message.to_json(response)
From the GitHub issue @FriedrichSal posted, you can see that proto does the job, and this is still valid in 2022 (the library name is proto-plus):
All message types are now defined using proto-plus, which uses different methods for serialization and deserialization.
import proto
objects = client.object_localization(image=image)
json_obs = proto.Message.to_json(objects)
dict_obs = proto.Message.to_dict(objects)
The MessageToJson(objects._pb) still works, but maybe someone prefers not to depend on a "hidden" property.

Convert JSON to XML in Python

I see a number of questions on SO asking about ways to convert XML to JSON, but I'm interested in going the other way. Is there a python library for converting JSON to XML?
Edit: Nothing came back right away, so I went ahead and wrote a script that solves this problem.
Python already allows you to convert from JSON into a native dict (using json or, in versions < 2.6, simplejson), so I wrote a library that converts native dicts into an XML string.
https://github.com/quandyfactory/dict2xml
It supports int, float, boolean, string (and unicode), array and dict data types and arbitrary nesting (yay recursion).
I'll post this as an answer once 8 hours have passed.
If you don't have such a package, you can try:
def json2xml(json_obj, line_padding=""):
    result_list = list()
    json_obj_type = type(json_obj)
    if json_obj_type is list:
        for sub_elem in json_obj:
            result_list.append(json2xml(sub_elem, line_padding))
        return "\n".join(result_list)
    if json_obj_type is dict:
        for tag_name in json_obj:
            sub_obj = json_obj[tag_name]
            result_list.append("%s<%s>" % (line_padding, tag_name))
            result_list.append(json2xml(sub_obj, "\t" + line_padding))
            result_list.append("%s</%s>" % (line_padding, tag_name))
        return "\n".join(result_list)
    return "%s%s" % (line_padding, json_obj)
For example:
import json
s = '{"main" : {"aaa" : "10", "bbb" : [1,2,3]}}'
j = json.loads(s)
print(json2xml(j))
Result:
<main>
	<aaa>
		10
	</aaa>
	<bbb>
		1
		2
		3
	</bbb>
</main>
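The hand-rolled function above does not escape characters such as < or & in values. If that matters, a variant built on the standard library's xml.etree.ElementTree handles escaping for you. This is a minimal sketch, not a drop-in replacement: it assumes the top-level JSON value is an object and that keys are valid XML names, it wraps everything in a hypothetical root tag, and it uses a different list convention (each list item becomes a repeated sibling element).

```python
import json
import xml.etree.ElementTree as ET


def build_element(tag, value):
    """Recursively turn a JSON value into an Element named tag."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, sub_value in value.items():
            elem.extend(build_children(key, sub_value))
    else:
        elem.text = "" if value is None else str(value)
    return elem


def build_children(tag, value):
    """A list becomes repeated sibling elements; anything else becomes one."""
    if isinstance(value, list):
        return [build_element(tag, item) for item in value]
    return [build_element(tag, value)]


def json_to_xml(json_text, root_tag="root"):
    """Parse JSON text and serialize it as an XML string."""
    root = build_element(root_tag, json.loads(json_text))
    return ET.tostring(root, encoding="unicode")


xml_text = json_to_xml('{"main": {"aaa": "1 < 2", "bbb": [1, 2, 3]}}')
print(xml_text)
# <root><main><aaa>1 &lt; 2</aaa><bbb>1</bbb><bbb>2</bbb><bbb>3</bbb></main></root>
```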
Load it into a dict using json.loads then use anything from this question...
Serialize Python dictionary to XML
Use dicttoxml to convert JSON directly to XML
Installation
pip install dicttoxml
or
easy_install dicttoxml
In [2]: from json import loads
In [3]: from dicttoxml import dicttoxml
In [4]: json_obj = '{"main" : {"aaa" : "10", "bbb" : [1,2,3]}}'
In [5]: xml = dicttoxml(loads(json_obj))
In [6]: print(xml)
<?xml version="1.0" encoding="UTF-8" ?><root><main type="dict"><aaa type="str">10</aaa><bbb type="list"><item type="int">1</item><item type="int">2</item><item type="int">3</item></bbb></main></root>
In [7]: xml = dicttoxml(loads(json_obj), attr_type=False)
In [8]: print(xml)
<?xml version="1.0" encoding="UTF-8" ?><root><main><aaa>10</aaa><bbb><item>1</item><item>2</item><item>3</item></bbb></main></root>
For more information on dicttoxml
I found xmltodict to be useful. Looks like it was released after some of the posts here. https://pypi.org/project/xmltodict/
import xmltodict
import json
sample_json = {"note": {"to": "Tove", "from": "Jani", "heading": "Reminder", "body": "Don't forget me this weekend!"}}
#############
#json to xml
#############
json_to_xml = xmltodict.unparse(sample_json)
print(json_to_xml)
#############
#xml to json
#############
x_to_j_dict = xmltodict.parse(json_to_xml)
x_to_j_string = json.dumps(x_to_j_dict)
back_to_json = json.loads(x_to_j_string)
print(back_to_json)
from json import loads
from dicttoxml import dicttoxml
s='{"main" : {"aaa" : "10", "bbb" : [1,2,3]}}'
xml = dicttoxml(loads(s))
Or if your data is stored in a pandas DataFrame, as mine often is:
import json
df['xml'] = df['json'].apply(lambda s: dicttoxml(json.loads(s)))
