Safe and generic serialization in Python - python

I want to (de)serialize simple objects in Python to a human-readable (e.g. JSON) format. The data may come from an untrusted source. I really like how the Rust library, serde, works:
#[derive(Serialize, Deserialize, Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let point = Point { x: 1, y: 2 };

    // Convert the Point to a JSON string.
    let serialized = serde_json::to_string(&point).unwrap();
    // Prints serialized = {"x":1,"y":2}
    println!("serialized = {}", serialized);

    // Convert the JSON string back to a Point.
    let deserialized: Point = serde_json::from_str(&serialized).unwrap();
    // Prints deserialized = Point { x: 1, y: 2 }
    println!("deserialized = {:?}", deserialized);
}
I'd like to achieve something like this in Python. Since Python is not statically typed, I'd expect the syntax to be something like:
deserialized = library.loads(data_str, ClassName)
where ClassName is the expected class.
jsonpickle is not an option: it performs no sanitization whatsoever, and feeding it untrusted input leads to arbitrary code execution.
There are serialization libraries such as lima, marshmallow and kim, but all of them require manually defining serialization schemas. In practice that duplicates the class definitions, which is exactly what I want to avoid.
Is there anything I could use for simple, generic yet secure serialization in Python?
EDIT: other requirements, which were implicit before
Handle nested serialization (serde can do it: https://gist.github.com/63bcd00691b4bedee781c49435d0d729)
Handle built-in types, i.e. be able to serialize and deserialize everything that the built-in json module can, without special treatment of built-in types.

Since Python doesn't require type annotations, any such library would need to either
use its own classes, or
take advantage of type annotations.
The latter would be the perfect solution, but I have not found any library that does it.
I did find a module, though, which only requires defining a single model class: https://github.com/dimagi/jsonobject
Usage example:
import jsonobject

class Node(jsonobject.JsonObject):
    id = jsonobject.IntegerProperty(required=True)
    name = jsonobject.StringProperty(required=True)

class Transaction(jsonobject.JsonObject):
    provider = jsonobject.ObjectProperty(Node)
    requestor = jsonobject.ObjectProperty(Node)

req = Node(id=42, name="REQ")
prov = Node(id=24, name="PROV")
tx = Transaction(provider=prov, requestor=req)

js = tx.to_json()
tx2 = Transaction(js)

print(tx)
print(tx2)

For Python, I would start by simply checking the size of the input. The only real security risk in running json.load() on untrusted data is a denial of service from an enormous input.
Once the JSON is parsed, consider running a schema validator such as PyKwalify.
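A minimal sketch of that approach for a Point-like payload; the 1 MiB size cap and the schema layout are assumptions on my part, and the result stays a plain dict rather than an instance of a class:

import json
from pykwalify.core import Core  # pip install pykwalify

MAX_SIZE = 1 << 20  # assumed cap on untrusted input (1 MiB)

POINT_SCHEMA = {
    "type": "map",
    "mapping": {
        "x": {"type": "int", "required": True},
        "y": {"type": "int", "required": True},
    },
}

def load_point(data_str):
    if len(data_str) > MAX_SIZE:
        raise ValueError("input too large")
    data = json.loads(data_str)
    # validate() raises an exception if the parsed document does not match the schema
    Core(source_data=data, schema_data=POINT_SCHEMA).validate()
    return data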

Related

Reading Binary data type from Redis published by Streambase(Java)

Here is the Java code that publishes data to Redis:
import com.streambase.sb.util.ByteOrderedDataOutput;
byte[] valuebuffer=null;
ByteOrderedDataOutput boutput = new ByteOrderedDataOutput(0,tuple.getByteOrder());
tuple.serialize(boutput,0);
valuebuffer = boutput.getBuffer();
byte[] keybuffer = null;
String keyvalue = redisStream+"."+keyFieldStr;
keybuffer = keyvalue.getBytes();
strLuaCommands += "redis.call('set',KEYS["+ (++keyCount) +"],ARGV["+ (++argCount) +"])";
keys.add(keybuffer);
args.add(valuebuffer);
I was able to read the data back with Python's struct module, but it does not come out in the correct format.
import redis, struct
redis_client = redis.StrictRedis(host="abc.com", port=6379, db=0)
temp = redis_client.get('samplekey')
struct.unpack("<" + ("s" * (len(temp))), temp)
Tuple.serialize() uses the com.streambase.sb.util.ByteOrderedDataOutput class, which has never been part of the StreamBase public API. Therefore the Tuple.serialize() methods shouldn't be considered part of the public API, either.
Also, there's no particular reason to believe that the Python struct.unpack() method knows how to understand StreamBase's ByteOrderedDataOutput, whatever that is. So it's not surprising that what you are unpacking is not what you want.
One workaround that comes to mind would be to use the StreamBase Python Operator to convert your StreamBase tuple into Python objects, and then use a Python script to write whatever you want into Redis. Since you would then be encoding and decoding the data with the same complementary Python library functions, you'll have a much better chance of not mangling it.
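A rough sketch of that idea, assuming the operator hands you the tuple as a plain dict; the key name and host are placeholders copied from the question:

import json
import redis

r = redis.StrictRedis(host="abc.com", port=6379, db=0)

def write_tuple(tuple_as_dict):
    # encode with json so the reader can decode with the same library
    r.set("samplekey", json.dumps(tuple_as_dict))

def read_tuple():
    return json.loads(r.get("samplekey"))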

Passing a function to an Object at Instantiation in C#

I am currently working on a project where I am trying to make an GUI for programming an I2C device. I have a bunch of register addresses and values in those registers both in hex strings like so:
[Address, Value][0x00, 0x01][0x01, 0xFF][0x02, 0xA0]... //not code, just abstract representation
Each register value represents something different. I have to convert each register value from a hex string to its associated human-understandable representation. My plan for this is to create a dictionary whose keys are the register addresses and whose values are a register object. The register object would contain information about the register like a description and a conversion function which takes in the hex string and outputs a converted value. The conversion function is different for every register since they represent different things. I want to create a generic register class and pass in each register's unique conversion function at instantiation. Passing this function is where I'm not sure if I'm making things more complicated than they need to be.
My code currently looks like this:
private class my_register
{
    public string Description { get; }

    // using a delegate to be able to store a function specified outside of the class
    public delegate string convert_del(string reg_val);
    public convert_del conversion_func;

    // the constructor uses the delegate to store the function passed at instantiation
    public my_register(string desc, Func<string, string> convert_func_input)
    {
        this.Description = desc;
        this.conversion_func = new convert_del(convert_func_input);
    }
}

// Now I can create the object and pass in the function
static void Main()
{
    // the lambda is just a simple placeholder to show that I can pass a function
    my_register first_reg = new my_register("Temp",
        reg_val => (Convert.ToInt32(reg_val, 16) + 10).ToString());

    Console.WriteLine(first_reg.conversion_func("0x0A")); // output is 20
}
This code works at least for the minimal testing I ran. This took me a long time to figure out, perhaps because I just didn't understand delegates well, but I am wondering if this is a convoluted way of going about this in C#. I come from a python background, though I'm not particularly skilled in that either. The way I would do this in python is as follows:
class my_register(object):
    def __init__(self, desc, conversion):
        self.Description = desc
        self.conversion = conversion

def temp_convert(reg_val):
    return str(int(reg_val, 0) + 10)

first_reg = my_register("Temp", temp_convert)
first_reg.conversion("0x0A")  # returns '20'
The Pythonic way just seems far simpler, so I'm wondering if there is a better, more canonical way of achieving this function passing in C#, or if there is a way to avoid passing the function altogether?

Working with Your Own Types - Python

I am trying to understand the following topic and have some outstanding questions. Can anyone help me?:
class MyObj(object):

    def __init__(self, s):
        self.s = s

    def __repr__(self):
        return '<MyObj(%s)>' % self.s
====================================
import json
import json_myobj

obj = json_myobj.MyObj('instance value goes here')

print 'First attempt'
try:
    print json.dumps(obj)
except TypeError, err:
    print 'ERROR:', err

def convert_to_builtin_type(obj):
    print 'default(', repr(obj), ')'
    # Convert objects to a dictionary of their representation
    d = {'__class__': obj.__class__.__name__,
         '__module__': obj.__module__,
         }
    d.update(obj.__dict__)
    return d

print
print 'With default'
print json.dumps(obj, default=convert_to_builtin_type)
Question: what is the purpose of the following code?
d = {'__class__': obj.__class__.__name__,
     '__module__': obj.__module__,
     }
d.update(obj.__dict__)
I think there are two things you need to know to understand this code snippet.
JSON serialization and deserialization.
JSON is a data-exchange format. In particular, it is text-based, which means that if you want to save your data to a text file, you have to decide how to represent the data as text (the serialization process). Likewise, when you load data from a text file, you need to decide how to parse that text back into in-memory structures (the deserialization process). Luckily, Python's json module handles most of the built-in data types by default (scalars, lists, dicts, and so on). But in your case you have created your own data type, so you have to specify how to serialize it yourself. That is what the function convert_to_builtin_type does.
Python data model
Now we come to the problem of how to serialize the self-defined object MyObj. There is no single answer to this, but the baseline is that you must be able to recover (deserialize) your object from the serialized text. In your case:
d = {'__class__': obj.__class__.__name__,
     '__module__': obj.__module__,
     }
d.update(obj.__dict__)
obj.__dict__ is the built-in dictionary that stores the attributes of obj; see the Data Model section of the Python documentation. The intention here is to provide enough information to recover obj. For example:
__class__=<c> provides the name of the class
__module__=<m> provides the module to find the class.
s=<v> provides the attribute and value of MyObj.s
With these three pieces, you can recover the object you previously stored. For the special (double-underscore) attributes, check the Python documentation.
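To make that concrete, here is a minimal sketch (not from the original post) of the reverse direction: an object_hook that rebuilds the instance from those three pieces, using obj and convert_to_builtin_type from the code above:

import importlib
import json

def convert_from_builtin_type(d):
    # only dicts carrying our markers are treated as serialized objects
    if '__class__' in d and '__module__' in d:
        module = importlib.import_module(d.pop('__module__'))
        cls = getattr(module, d.pop('__class__'))
        instance = cls.__new__(cls)    # create the instance without calling __init__
        instance.__dict__.update(d)    # restore the stored attributes (here: s)
        return instance
    return d

text = json.dumps(obj, default=convert_to_builtin_type)
restored = json.loads(text, object_hook=convert_from_builtin_type)
print(repr(restored))  # <MyObj(instance value goes here)>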
Hope this would be helpful.

Using google.protobuf.Any in python file

I have such .proto file
syntax = "proto3";
import "google/protobuf/any.proto";
message Request {
google.protobuf.Any request_parameters = 1;
}
How can I create Request object and populate its fields? I tried this:
import ma_pb2
from google.protobuf.any_pb2 import Any
parameters = {"a": 1, "b": 2}
Request = ma_pb2.Request()
some_any = Any()
some_any.CopyFrom(parameters)
Request.request_parameters = some_any
But I have an error:
TypeError: Parameter to CopyFrom() must be instance of same class: expected google.protobuf.Any got dict.
UPDATE
Following prompts of #Kevin I added new message to .proto file:
message Small {
    string a = 1;
}
Now code looks like this:
Request = ma_pb2.Request()
small = ma_pb2.Small()
small.a = "1"
some_any = Any()
some_any.Pack(small)
Request.request_parameters = small
But at the last assignment I have an error:
Request.request_parameters = small
AttributeError: Assignment not allowed to field "request_parameters" in protocol message object.
What did I do wrong?
Any is not a magic box for storing arbitrary keys and values. The purpose of Any is to denote "any" message type, in cases where you might not know which message you want to use until runtime. But at runtime, you still need to have some specific message in mind. You can then use the .Pack() and .Unpack() methods to convert that message into an Any, and at that point you would do something like Request.request_parameters.CopyFrom(some_any).
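As a sketch, using the Small message from your update (I have not run this against your generated ma_pb2 module, but this is the usual pattern):

import ma_pb2

small = ma_pb2.Small()
small.a = "1"

request = ma_pb2.Request()
# direct assignment to a message field is not allowed, but packing into it is
request.request_parameters.Pack(small)

# on the receiving side, check the type and unpack back into a Small
unpacked = ma_pb2.Small()
if request.request_parameters.Is(ma_pb2.Small.DESCRIPTOR):
    request.request_parameters.Unpack(unpacked)
print(unpacked.a)  # "1"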
So, if you want to store this specific dictionary:
{"a": 1, "b": 2}
...you'll need a .proto file which describes some message type that has integer fields named a and b. Personally, I'd see that as overkill; just throw your a and b fields directly into the Request message, unless you have a good reason for separating them out. If you "forget" one of these keys, you can always add it later, so don't worry too much about completeness.
If you really want a "magic box for storing arbitrary keys and values" rather than what I described above, you could use a Map instead of Any. This has the advantage of not requiring you to declare all of your keys upfront, in cases where the set of keys might include arbitrary strings (for example, HTTP headers). It has the disadvantage of being harder to lint or type-check (especially in statically-typed languages), because you can misspell a string more easily than an attribute. As shown in the linked resource, Maps are basically syntactic sugar for a repeated field like the following (that is, the on-wire representation is exactly the same as what you'd get from doing this, so it's backwards compatible to clients which don't support Maps):
message MapFieldEntry {
    key_type key = 1;
    value_type value = 2;
}

repeated MapFieldEntry map_field = N;
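If you went the map route, a hypothetical field declared as map<string, int32> request_parameters = 1; would behave like a dict from Python, with no Pack/Unpack step:

request = ma_pb2.Request()
request.request_parameters["a"] = 1  # map fields support dict-style access
request.request_parameters["b"] = 2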

Python framework for object-XML mapping through decorators?

After coming up impressed with the ease of use of the XML serialization framework Simple XML in Java, I've tried looking for a Python counterpart that would facilitate implementing classes and their XML serialization in a similar fashion. So far, I've come up more or less empty-handed, although there are interesting candidates (but none conveniently using decorators, as far as I could tell); for example, I started looking at dexml, but I got stumped with an example as simple as implementing a class that would allow deserialization of
<Error Code="0">OK</Error>
With Simple in Java, I could write a class such as
@Root(name="Error")
public class Error {

    @Attribute(name = "Code")
    private int code;           // public getter and setter

    @Text(required = false)
    private String description; // public getter and setter
}
Is there already a similar flavored framework in Python as Simple for Java? I have preference for Python 2.6 support, although that's not mandatory; if it's supported only for Python 3 I'll also look into it.
Actually, this syntax is supported in dexml. It took me a while to figure it out (reading the source code helped).
class Error(dexml.Model):
    code = dexml.fields.String()
    value = dexml.fields.String(tagname=".")
And the following will produce the desired XML rendering:
e = Error(code="0", value="OK")
print e.render(fragment=True)
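For the round trip, dexml models also expose a parse() classmethod as the counterpart of render(); an untested sketch:

e2 = Error.parse(e.render(fragment=True))
print e2.code, e2.value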
Don't have an answer but will confirm the difficulty of using dexml to parse your example. It doesn't look like there is a way to parse an element with attributes and a text node. Defining the Code attribute is simple:
class Error(dexml.Model):
    code = dexml.fields.String(attrname="Code")
But there is no way to reference a child text node. One would like to do something like:
class Error(dexml.Model):
    code = dexml.fields.String(attrname="Code")
    text = dexml.fields.String(textnode=True)
One not very satisfactory way to capture the text would be to wrap it in extra tags:
<Error Code="0"><text>OK</text></Error>
Then you could define the class as:
class Error(dexml.Model):
    code = dexml.fields.String(attrname="Code")
    text = dexml.fields.String(tagname="text")
