I want to make a protobuf Event message that can contain several different event types. Here's an example:
message Event {
  required int32 event_id = 1;
  oneof EventType {
    FooEvent foo_event = 2;
    BarEvent bar_event = 3;
    BazEvent baz_event = 4;
  }
}
This works fine, but one thing that bugs me is that EventType is optional: I can encode an object with only an event_id and protobuf won't complain.
>>> e = test_pb2.Event()
>>> e.IsInitialized()
False
>>> e.event_id = 1234
>>> e.IsInitialized()
True
Is there any way to require the EventType to be set? I'm using Python, if that matters.
According to the Protocol Buffers documentation, the required field rule is not recommended, and it has been removed entirely in proto3.
Required Is Forever
You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal.
And as the above document says, you should consider using application-specific validation instead of marking the fields as required.
There is no way to mark a oneof as "required" (even in proto2) because at the time oneof was introduced, it was already widely accepted that fields probably should never be "required", and so the designers did not bother implementing a way to make a oneof required.
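In Python, such an application-specific check is easy to write by hand with WhichOneof(), which returns the name of the field that is set in a oneof, or None if none is set. A minimal sketch (the validate_event helper is just an illustration):
def validate_event(event):
    # WhichOneof() takes the oneof's name and returns the name of the field
    # that is currently set, or None if the oneof is empty.
    if event.WhichOneof("EventType") is None:
        raise ValueError("Event must have one of its EventType fields set")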
Using the options syntax, there are ways to specify validation rules and auto-generate the code for the validation routines.
You can use https://github.com/envoyproxy/protoc-gen-validate like this:
import "validate/validate.proto";
message Event {
required int32 event_id = 1;
oneof EventType {
option (validate.required) = true;
FooEvent foo_event = 2;
BarEvent bar_event = 3;
BazEvent baz_event = 4;
}
}
"You should consider writing application-specific custom validation routines for your buffers instead." And here we are auto-generating such custom validation routines.
But wait, is this going against the spirit of the protobuf spec? Why is required bad and validate good? My own answer is that the protobuf spec cares very much about "proxies", i.e. software which serializes/deserializes messages but has almost no business logic of its own. Such software can simply skip the validation (it's an option), but it cannot skip required: a missing required field renders the message unparseable.
On the business-logic side, none of this is a big problem in my experience.
I want to (de)serialize simple objects in Python to a human-readable (e.g. JSON) format. The data may come from an untrusted source. I really like how the Rust library, serde, works:
#[derive(Serialize, Deserialize, Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let point = Point { x: 1, y: 2 };

    // Convert the Point to a JSON string.
    let serialized = serde_json::to_string(&point).unwrap();

    // Prints serialized = {"x":1,"y":2}
    println!("serialized = {}", serialized);

    // Convert the JSON string back to a Point.
    let deserialized: Point = serde_json::from_str(&serialized).unwrap();

    // Prints deserialized = Point { x: 1, y: 2 }
    println!("deserialized = {:?}", deserialized);
}
I'd like to achieve something like this in Python. Since Python is not statically typed, I'd expect the syntax to be something like:
deserialized = library.loads(data_str, ClassName)
where ClassName is the expected class.
jsonpickle is bad, bad, bad. It performs no sanitization at all, and its use leads to arbitrary code execution.
There are serialization libraries, such as lima, marshmallow, and kim, but all of them require manually defining serialization schemas. That, in fact, leads to code duplication, which is bad.
Is there anything I could use for simple, generic yet secure serialization in Python?
EDIT: other requirements, which were implicit before
Handle nested serialization (serde can do it: https://gist.github.com/63bcd00691b4bedee781c49435d0d729)
Handle built-in types, i.e. be able to serialize and deserialize everything that the built-in json module can, without special treatment of built-in types.
Since Python doesn't require type annotations, any such library would need to either
use its own classes, or
take advantage of type annotations.
The latter would be the perfect solution but I have not found any library doing that.
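For illustration, here is a minimal sketch of what such an annotation-driven loads could look like using only the standard library (both loads and Point here are hypothetical, not from an existing library):
import json
from typing import get_type_hints

class Point:
    x: int
    y: int

def loads(data_str, cls):
    # Parse the JSON, then check every value against the class's type
    # annotations before setting it on a fresh instance.
    raw = json.loads(data_str)
    obj = cls()
    for name, expected_type in get_type_hints(cls).items():
        value = raw[name]
        if not isinstance(value, expected_type):
            raise TypeError(f"{name}: expected {expected_type.__name__}, "
                            f"got {type(value).__name__}")
        setattr(obj, name, value)
    return obj

point = loads('{"x": 1, "y": 2}', Point)  # Point with x=1, y=2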
I did find a module, though, which requires defining only one class as a model: https://github.com/dimagi/jsonobject
Usage example:
import jsonobject

class Node(jsonobject.JsonObject):
    id = jsonobject.IntegerProperty(required=True)
    name = jsonobject.StringProperty(required=True)

class Transaction(jsonobject.JsonObject):
    provider = jsonobject.ObjectProperty(Node)
    requestor = jsonobject.ObjectProperty(Node)

req = Node(id=42, name="REQ")
prov = Node(id=24, name="PROV")

tx = Transaction(provider=prov, requestor=req)

js = tx.to_json()
tx2 = Transaction(js)

print(tx)
print(tx2)
For Python, I would start just by checking the size of the input. The only security risk in running json.load() is a DoS from being sent an enormous file.
Once the JSON is parsed, consider running a schema validator such as PyKwalify.
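A sketch of that two-step approach (the 1 MiB cap and the schema below are assumptions for illustration; PyKwalify's Core accepts in-memory source_data and schema_data):
import json
from pykwalify.core import Core

MAX_INPUT_BYTES = 1024 * 1024  # assumed cap to guard against DoS

def load_checked(data_str):
    if len(data_str) > MAX_INPUT_BYTES:
        raise ValueError("input too large")
    parsed = json.loads(data_str)
    # Example schema: a mapping with required integer fields x and y.
    schema = {
        "type": "map",
        "mapping": {
            "x": {"type": "int", "required": True},
            "y": {"type": "int", "required": True},
        },
    }
    Core(source_data=parsed, schema_data=schema).validate()
    return parsed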
I have the following .proto file:
syntax = "proto3";
import "google/protobuf/any.proto";
message Request {
google.protobuf.Any request_parameters = 1;
}
How can I create Request object and populate its fields? I tried this:
import ma_pb2
from google.protobuf.any_pb2 import Any
parameters = {"a": 1, "b": 2}
Request = ma_pb2.Request()
some_any = Any()
some_any.CopyFrom(parameters)
Request.request_parameters = some_any
But I have an error:
TypeError: Parameter to CopyFrom() must be instance of same class: expected google.protobuf.Any got dict.
UPDATE
Following @Kevin's prompts, I added a new message to the .proto file:
message Small {
  string a = 1;
}
Now code looks like this:
Request = ma_pb2.Request()
small = ma_pb2.Small()
small.a = "1"
some_any = Any()
some_any.Pack(small)
Request.request_parameters = small
But at the last assignment I have an error:
Request.request_parameters = small
AttributeError: Assignment not allowed to field "request_parameters" in protocol message object.
What did I do wrong?
Any is not a magic box for storing arbitrary keys and values. The purpose of Any is to denote "any" message type, in cases where you might not know which message you want to use until runtime. But at runtime, you still need to have some specific message in mind. You can then use the .Pack() and .Unpack() methods to convert that message into an Any, and at that point you would do something like Request.request_parameters.CopyFrom(some_any).
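To make that concrete, here is a minimal sketch using the Small message from the question. Note that an embedded message field like request_parameters cannot be assigned directly (hence the AttributeError above), but you can call methods on it:
import ma_pb2

small = ma_pb2.Small()
small.a = "1"

request = ma_pb2.Request()
# Pack the Small message into the Any field in place.
request.request_parameters.Pack(small)

# On the receiving side, unpack it back into a Small.
unpacked = ma_pb2.Small()
if request.request_parameters.Unpack(unpacked):
    print(unpacked.a)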
So, if you want to store this specific dictionary:
{"a": 1, "b": 2}
...you'll need a .proto file which describes some message type that has integer fields named a and b. Personally, I'd see that as overkill; just throw your a and b fields directly into the Request message, unless you have a good reason for separating them out. If you "forget" one of these keys, you can always add it later, so don't worry too much about completeness.
If you really want a "magic box for storing arbitrary keys and values" rather than what I described above, you could use a Map instead of Any. This has the advantage of not requiring you to declare all of your keys upfront, which is useful in cases where the set of keys might include arbitrary strings (for example, HTTP headers). It has the disadvantage of being harder to lint or type-check (especially in statically-typed languages), because it is easier to misspell a string than an attribute. As shown in the linked resource, Maps are basically syntactic sugar for a repeated field like the following (that is, the on-wire representation is exactly the same as what you'd get from doing this, so it's backwards compatible with clients that don't support Maps):
message MapFieldEntry {
  key_type key = 1;
  value_type value = 2;
}

repeated MapFieldEntry map_field = N;
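From Python, a map field then behaves like a dict. This sketch assumes a hypothetical declaration of map<string, int32> request_parameters = 1; in Request instead of the Any field:
request = ma_pb2.Request()
# Map fields behave like Python dicts; keys don't need to be declared upfront.
request.request_parameters["a"] = 1
request.request_parameters["b"] = 2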
I find myself stuck on this problem, and repeated Googling, checking SO, and reading numerous docs has not helped me get the right answer, so I hope this isn't a bad question.
One entity I want to create is an event taking place during a convention. I'm giving it the property start_time = ndb.TimeProperty(). I also have a property date = messages.DateProperty(), and I'd like to keep the two discrete (in other words, not using DateTimeProperty).
When a user enters information to create an event, I want to specify defaults for any fields they do not enter at creation, and I'd like to set the default time to midnight, but I can't seem to format it correctly so the service accepts it (I get a constant 503 Service Unavailable response when I try it in the API explorer).
Right now I've set things up like this (some unnecessary details removed):
event_defaults = {
    ...
    "start_time": 0000,
    ...
}
and then I try looping over my default values to enter them into a dictionary which I'll use to .put() the info on the server.
data = {field.name: getattr(request, field.name) for field in request.all_fields()}
for default in event_defaults:
    if data[default] in (None, []):
        data[default] = event_defaults[default]
        setattr(request, default, event_defaults[default])
In the logs, I see the error Encountered unexpected error from ProtoRPC method implementation: BadValueError (Expected time, got 0). I have also tried using the time and datetime modules, but I must be using them incorrectly, because I still receive errors.
I suppose I could work around this problem by using ndb.StringProperty() instead, and just deal with strings, but then I'd feel like I would be missing out on a chance to learn more about how GAE and NDB work (all of this is for a project on udacity.com, so learning is certainly the point).
So, how can I structure my default time properly for midnight? Sorry for the wall of text.
Link to code on github. The conference.py file contains the code I'm having the trouble with, and models.py contains my definitions for the entities I'm working with.
Update: I'm a dummy. I had my model class using a TimeProperty() and the corresponding message class using a StringField(), but I was never making the proper conversion between expected types. That's why I could never seem to give it the right thing, but it expected two different things at different points in the code. Issue resolved.
TimeProperty expects a datetime.time value
import datetime

event_defaults = {
    ...
    # datetime.time() with no arguments defaults to midnight (00:00:00)
    "start_time": datetime.time(),
    ...
}
More in the docs: https://cloud.google.com/appengine/docs/python/ndb/entity-property-reference#Date_and_Time
Use the datetime module to convert it into a valid ndb time property value:
from datetime import datetime

if data['time']:
    data['time'] = datetime.strptime(data['time'][:5], "%H:%M").time()
else:
    data['time'] = datetime.now().time()
P.S. Don't forget to replace data['time'] with your field name.
Let's say I have 2 Protobuf-Messages, A and B. Their overall structure is similar, but not identical. So we moved the shared stuff out into a separate message we called Common. This works beautifully.
However, I'm now facing the following problem: A special case exists where I have to process a serialized message, but I don't know whether it's a message of type A or type B. I have a working solution in C++ (shown below), but I failed to find a way to do the same thing in Python.
Example:
// file: Common.proto
// contains some kind of shared struct that is used by all messages:
message Common {
  ...
}

// file: A.proto
import "Common.proto";

message A {
  required int32 FormatVersion = 1;
  optional bool SomeFlag = 2 [default = true];
  optional Common CommonSettings = 3;
  ... A-specific Fields ...
}

// file: B.proto
import "Common.proto";

message B {
  required int32 FormatVersion = 1;
  optional bool SomeFlag = 2 [default = true];
  optional Common CommonSettings = 3;
  ... B-specific Fields ...
}
Working Solution in C++
In C++ I'm using the reflection API to get access to the CommonSettings field like this:
namespace gp = google::protobuf;
...
const Common& getCommonBlock(const gp::Message& paMessage)
{
    const gp::FieldDescriptor* paFieldDescriptor =
        paMessage.GetDescriptor()->FindFieldByNumber(3);
    const gp::Reflection* paReflection = paMessage.GetReflection();
    return dynamic_cast<const Common&>(
        paReflection->GetMessage(paMessage, paFieldDescriptor));
}
The method getCommonBlock uses FindFieldByNumber() to get hold of the descriptor of the field I'm trying to read. Then it uses reflection to fetch the actual data. getCommonBlock can process messages of type A, B, or any future type, as long as the Common field stays at field number 3.
My Question is: Is there a way to do a similar thing Python? I've been looking at the Protobuf documentation, but couldn't figure out a way to do it.
I know this is an old thread, but I'll respond anyway for posterity:
Firstly, as you know, it's not possible to determine the type of a protocol buffer message purely from its serialized form. The only information in the serialized form you have access to is the field numbers, and their serialized values.
Secondly, the "right" way to do this would be to have a proto that contains both, like
message Parent {
  required int32 FormatVersion = 1;
  optional bool SomeFlag = 2 [default = true];
  optional Common CommonSettings = 3;

  oneof letters_of_alphabet {
    A a_specific = 4;
    B b_specific = 5;
  }
}
This way, there's no ambiguity: you just parse the same proto (Parent) every time.
Anyway, if it's too late to change that, what I recommend you do is define a new message with only the shared fields, like
message Shared {
  required int32 FormatVersion = 1;
  optional bool SomeFlag = 2 [default = true];
  optional Common CommonSettings = 3;
}
You should then be able to pretend that the message (either A or B) is in fact a Shared, and parse it accordingly. The unknown fields will be irrelevant.
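In Python, that approach could look like this sketch (the module name shared_pb2 is an assumption based on the file name):
import shared_pb2

def parse_as_shared(serialized_bytes):
    # Whether the bytes came from an A or a B, the shared fields parse the
    # same way; the A/B-specific fields just land in the unknown field set.
    shared = shared_pb2.Shared()
    shared.ParseFromString(serialized_bytes)
    return shared.CommonSettings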
One of the advantages of Python over a statically-typed language like C++ is that you don't need to use any special reflection code to get an attribute of an object of unknown type: you just ask the object. The built-in function that does this is getattr, so you can do:
settings_value = getattr(obj, 'CommonSettings')
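If you want to mirror the C++ lookup by field number rather than by name, the generated descriptor exposes fields_by_number (a sketch, assuming the generated module is a_pb2):
import a_pb2

def get_common_block(serialized_bytes):
    msg = a_pb2.A()
    msg.ParseFromString(serialized_bytes)
    # Equivalent of FindFieldByNumber(3): look up the field descriptor,
    # then fetch the value from the message by its name.
    field_name = a_pb2.A.DESCRIPTOR.fields_by_number[3].name
    return getattr(msg, field_name)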
I had a similar problem.
What I did was to create a new message, with an enum specifying the type:
enum TYPE {
  A = 0;
  B = 1;
}

message Base {
  required TYPE type = 1;
  ... Other common fields ...
}
Then create specific message types:
message A {
  required TYPE type = 1 [default = A];
  ... other A fields ...
}
And:
message B {
  required TYPE type = 1 [default = B];
  ... other B fields ...
}
Be sure to define the 'Base' message correctly, or you won't be binary compatible if you add fields later (as you would have to shift the inheriting messages' fields too).
That way, you can receive a generic message:
msg = ... receive message from net ...

# detect message type
packet = Base()
packet.ParseFromString(msg)

# check for type
if packet.type == TYPE.A:
    # parse message as appropriate type
    packet = A()
    packet.ParseFromString(msg)
else:
    # this is a B message... or whatever
    pass

# ... continue with your business logic ...
Hope this helps.
How about "concatenating" two protocol buffers in a header+payload format, e.g. header as the common data follows by either message A or B as suggested by protobuf techniques?
This is how I did it with various types of payload as blob within mqtt message.
I'm trying to use Google's protocol buffers (protobuf) with Python in a networked program, using bare sockets. My question is: after the transmitting side sends a message, how does the receiving side knows what kind of message was transmitted? For example, say I have message definitions:
message StrMessage {
  required string str = 1;
}

message IntMessage {
  required int32 num = 1;
}
Now the transmitter makes a StrMessage, serializes it, and sends the serialized bytes over the network. How does the receiver know to deserialize the bytes with StrMessage rather than IntMessage? I've tried doing two things:
// Proposal 1: send a one-byte header to indicate the type
enum MessageType {
  STR_MESSAGE = 1;
  INT_MESSAGE = 2;
}

// Proposal 2: use a wrapper message
message Packet {
  optional StrMessage m_str = 1;
  optional IntMessage m_int = 2;
}
Neither of these seems very clean, though, and both require me to list all the message types by hand. Is there a canonical/better way to handle this problem?
Thanks!
This has been discussed before, for example in this thread on the protobuf list, but simply put, there is no canonical / de facto way of doing this.
Personally, I like the Packet approach, as it keeps everything self-contained. (Indeed, in protobuf-net I have specific methods to process data in that format, returning just the StrMessage / IntMessage and leaving the Packet layer as an unimportant implementation detail that never actually gets used.) But since that previous proposal by Kenton never got implemented (AFAIK), it is entirely a matter of personal taste.
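For reference, a minimal Python sketch of the Packet approach (the module name messages_pb2 is an assumption; in proto3 you would typically wrap the two fields in a oneof):
import messages_pb2

def encode(msg):
    packet = messages_pb2.Packet()
    if isinstance(msg, messages_pb2.StrMessage):
        packet.m_str.CopyFrom(msg)
    else:
        packet.m_int.CopyFrom(msg)
    return packet.SerializeToString()

def decode(data):
    packet = messages_pb2.Packet()
    packet.ParseFromString(data)
    # HasField() reveals which submessage the sender populated.
    if packet.HasField("m_str"):
        return packet.m_str
    return packet.m_int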