Why are extension attributes inaccessible in python protobuf objects?

I am attempting to read and analyze GTFS-realtime data from the NYC subway in Python. So far, I have successfully used both gtfs-realtime.proto and nyct-subway.proto to generate the proper Python classes and parsed the protobuf data into Python objects.
My problem comes when trying to access certain fields in these objects. For example, the header (feed.header) looks like this:
gtfs_realtime_version: "1.0"
incrementality: FULL_DATASET
timestamp: 1533111586
[nyct_feed_header] {
  nyct_subway_version: "1.0"
  trip_replacement_period {
    route_id: "A"
    replacement_period {
      end: 1533113386
...
I can access the first three attributes using dot access, but not nyct_feed_header. I suspect this is because it is part of the nyct-subway.proto extension, while the other three are part of the original.
I have found this attribute accessible in feed.header.ListFields(), but since that returns a list of (name, attribute) pairs, it is at best awkward to access.
Why aren't attributes from extensions accessible by dot access like the rest of them? Is there a better or more elegant way to access them than by using ListFields?

Extensions are accessed via the Extensions property on an object (see docs). E.g. with GTFS and the NYCT extensions:
import gtfs_realtime_pb2 as gtfs
import nyct_subway_pb2 as nyct
feed = gtfs.FeedMessage()
feed.ParseFromString(...)
feed.entity[0].trip_update.trip.Extensions[nyct.nyct_trip_descriptor].direction
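If you do still need ListFields(), note that it returns (FieldDescriptor, value) pairs for every set field, so you can build a name-keyed dict once and then index it normally. A minimal sketch; the stand-in classes below exist only for illustration, while in real code you would pass a parsed protobuf message:

```python
def fields_by_name(message):
    """Map field name -> value from a protobuf message's ListFields() output."""
    return {descriptor.name: value for descriptor, value in message.ListFields()}

# Stand-ins for illustration only; a real protobuf message provides
# ListFields() with the same (descriptor, value) shape.
class FakeDescriptor:
    def __init__(self, name):
        self.name = name

class FakeMessage:
    def ListFields(self):
        return [(FakeDescriptor("gtfs_realtime_version"), "1.0"),
                (FakeDescriptor("timestamp"), 1533111586)]

fields = fields_by_name(FakeMessage())
print(fields["timestamp"])  # 1533111586
```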


Is it safe to use the default load in ruamel.yaml

I can load and dump YAML files with tags using ruamel.yaml, and the tags are preserved.
If I let my customers edit the YAML document, will they be able to exploit the YAML vulnerabilities because of arbitrary python module code execution? As I understand it ruamel.yaml is derived from PyYAML that has such vulnerabilities according to its documentation.
From your question I deduce you are using the .load() method on a YAML() instance, as in:
import ruamel.yaml
yaml = ruamel.yaml.YAML()
data = yaml.load(some_path)
and not the old PyYAML-compatible load function (which cannot handle unknown tags, and which can be unsafe). The short answer is: yes, that is safe. No tag-dependent code is called to interpret the tags; they just get assigned (as strings) to instances of known types (CommentedMap, CommentedSeq, TaggedScalar for mapping, sequence and scalar respectively).
The vulnerability in PyYAML comes from its Constructor, which is part of the unsafe Loader. This was the default for PyYAML for the longest time, and most people used it even when they could have used the SafeLoader because they were not using any tags (or could have registered the tags they needed against the SafeLoader/SafeConstructor). That unsafe Constructor is a subclass of SafeConstructor, and what makes it unsafe are the multi-methods registered for the interpretation of !!python/{name,module,object,apply,new}:... tags, essentially dynamically interpreting these tags to load modules and run code (i.e. execute functions/methods/instantiate classes).
The initial idea behind ruamel.yaml was its RoundTripConstructor, which is also a subclass of the SafeConstructor. You used to get this using the now deprecated round_trip_load function and nowadays via the .load() method after using YAML(typ='rt'); this is also the default for a YAML() instance without a typ argument. That RoundTripConstructor does not register any tags or have any code that interprets tags above and beyond the normal tags that the SafeConstructor uses.
Only the old PyYAML load and ruamel.yaml using typ='unsafe' can execute arbitrary Python code on doctored YAML input by executing code via the !!python/.. tags.
When you use typ='rt' the tag on nodes is preserved, even when a tag is not registered. This is possible because while processing nodes (in round-trip mode), those tags will just be assigned as strings to an attribute on the constructed type of the tagged node (mapping, sequence, or scalar, with the exception of tagged null). And when dumping these types, if they have a tag, it gets re-inserted into the representation processing code. At no point is the tag evaluated, used to import code, etc.

Work with nested objects using couchdb-python

Disclaimer: Both Python and CouchDB are new for me. So far my "programming" has mostly consisted of Bash scripts.
I'm trying to create a small script that updates objects in a CouchDB database. The objects however aren't created by my script but by an App called Tap Forms that uses CouchDB for sync. Basically I'm trying to automatically update the content of the app. That also means I can't really influence the structure or names of the objects in CouchDB.
The Database is mostly filled with objects of this structure:
{
  "_id": "rec-3b17...",
  "_rev": "21-cdf6...",
  "values": {
    "fld-c3d4...": 4,
    "fld-1def...": 1000000000000,
    "fld-bb44...": 760000000000,
    "fld-a44f...": "admin,name",
    "fld-5fc0...": "SSD",
    "fld-642c...": true
  },
  "deviceName": "MacBook Air",
  "dateModified": "2019-02-08T14:47:06.051Z",
  "dateCreated": "2019-02-08T11:33:00.018Z",
  "type": "frm-7ff3...",
  "dbID": "db-1435...",
  "form": "frm-7ff3..."
}
I shortened the numbers a bit and removed some entries to increase readability.
Now the actual values I'm trying to update are within the "values" : {...} array (or object, or list, guess I don't have much experience with JSON either).
As I know some of these values, I managed to create a view that finds the _id of an object on the server. I then use the python-couchdb module as described in the documentation:
for item in db.view('CustomViews/test2', key="GENERIC"):
    doc = db[item.id]
This gives me the object. However I want to update one of the values within the values array, let's say fld-c3d4.... But how? Using doc['values'] = 'new_value' replaces the whole array. I tried other (seemingly logical) ways along the lines of doc['values['fld-c3d4']'] = 'new_value' but couldn't wrap my head around it. I couldn't find an example in any documentation.
So here's an example of how to update fld-c3d4.
Your document is a dictionary containing a nested dictionary.
If you want to get the values, you will do something like this:
values = doc['values']
Now the variable values points to the values in your document.
From there, you can access a sub value:
values['fld-c3d4'] = 'new value'
If you want to directly update the value from the doc, you just have to chain those operations:
doc['values']['fld-c3d4'] = 'new value'
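Putting those pieces together as a runnable sketch: the nested assignment mutates the dictionary in place, so the change still has to be written back with python-couchdb's db.save(). The plain dict below stands in for a document fetched via doc = db[item.id]; the save call is commented out because it needs a live server:

```python
# Stand-in for a document fetched with `doc = db[item.id]`.
doc = {
    "_id": "rec-3b17...",
    "values": {"fld-c3d4...": 4, "fld-5fc0...": "SSD"},
}

doc["values"]["fld-c3d4..."] = "new value"  # update one nested field in place
print(doc["values"])

# db.save(doc)  # persist the change back to CouchDB (needs a live server)
```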

Access python dict value in yaml with tags

Is it possible to load the value from a python dict in yaml?
I can access a variable by using:
!!python/name:mymodule.myfile.myvar
but this gives the whole dict.
Trying to use the dict .get method like so:
test: &TEST !!python/object/apply:mymod.myfile.mydict.get ['mykey']
gives me the following error:
yaml.constructor.ConstructorError: while constructing a Python object cannot find module 'mymod.myfile.mydict' (No module named 'mymod.myfile.mydict'; 'mymod.myfile' is not a package)
I'm trying to do this because I have a bunch of yaml files which define my project settings (one is for directory paths), and I need to load those values into some other yaml files; it looks like you can't load a yaml variable from another yaml.
EDIT:
I have found one solution: writing my own function that returns a value from the dict and calling it like so:
test: &TEST !!python/object/apply:mymod.myfile.get_dict_value ['mykey']
There is no mechanism in YAML to refer to one document from another YAML document.
You'll have to do that by interpreting information in the document in the program that loads the initial YAML document. Whether you do that by explicit logic, or by using some tag doesn't make a practical difference.
Please be aware that it is unsafe to allow interpretation of tags of the form !!python/name:... (via YAML(typ='unsafe') in ruamel.yaml, or load() in PyYAML), and it is never really necessary.
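The explicit-logic route can be as simple as loading the settings document first and substituting its values into the other documents in Python. A sketch; the ${...} placeholder convention is my own invention, not a YAML feature, and the dicts below stand in for documents parsed with a safe YAML loader:

```python
import re

def resolve_placeholders(node, settings):
    """Recursively replace ${key} markers in strings with values from settings."""
    if isinstance(node, dict):
        return {key: resolve_placeholders(value, settings) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_placeholders(value, settings) for value in node]
    if isinstance(node, str):
        return re.sub(r"\$\{([^}]+)\}", lambda m: str(settings[m.group(1)]), node)
    return node

# These dicts stand in for documents loaded with a safe YAML loader:
settings = {"data_dir": "/srv/data"}
config = {"paths": {"input": "${data_dir}/in", "output": "${data_dir}/out"}}

resolved = resolve_placeholders(config, settings)
# resolved["paths"]["input"] is now "/srv/data/in"
```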

Is parsing a json naively into a Python class or struct secure?

Some background first: I have a few rather simple data structures which are persisted as json files on disk. These json files are shared between applications of different languages and different environments (like web frontend and data manipulation tools).
For each of the files I want to create a Python "POPO" (Plain Old Python Object), and a corresponding data mapper class for each item should implement some simple CRUD like behavior (e.g. save will serialize the class and store as json file on disk).
I think a simple mapper (which only knows about basic types) will work. However, I'm concerned about security. Some of the json files will be generated by a web frontend, so there is a possible security risk if a user feeds me some bad json.
Finally, here is the simple mapping code (found at How to convert JSON data into a Python object):
class User(object):
    def __init__(self, name, username):
        self.name = name
        self.username = username

import json
j = json.loads(your_json)
u = User(**j)
What possible security issues do you see?
NB: I'm new to Python.
Edit: Thanks all for your comments. I've found out that I have one json where I have 2 arrays, each having a map. Unfortunately this starts to look like it gets cumbersome when I get more of these.
I'm extending the question to mapping a json input to a recordtype. The original code is from here: https://stackoverflow.com/a/15882054/1708349.
Since I need mutable objects, I'd change it to use a namedlist instead of a namedtuple:
import json
from namedlist import namedlist
data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'
# Parse JSON into an object with attributes corresponding to dict keys.
x = json.loads(data, object_hook=lambda d: namedlist('X', d.keys())(*d.values()))
print(x.name, x.hometown.name, x.hometown.id)
Is it still safe?
There's not much wrong that can happen in the first case. You're limiting what arguments can be provided and it's easy to add validation/conversion right after loading from JSON.
The second example is a bit worse. Packing things into records like this will not help you in any way. You don't inherit any methods, because each type you define is new. You can't compare values easily, because the field order is taken from the dict's keys. You don't know if you have all arguments handled, or if there is some extra data, which can lead to hidden problems later.
So in summary: with User(**data), you're pretty safe. With namedlist there's space for ambiguity and you don't really gain anything. (compared to bare, parsed json)
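The "limiting what arguments can be provided" point is worth seeing concretely: with **-unpacking, an unexpected key fails loudly instead of silently becoming an attribute (User as defined in the question):

```python
import json

class User(object):
    def __init__(self, name, username):
        self.name = name
        self.username = username

good = User(**json.loads('{"name": "Ann", "username": "ann42"}'))

# A key the constructor does not know about is rejected immediately:
rejected = False
try:
    User(**json.loads('{"name": "Ann", "username": "ann42", "is_admin": true}'))
except TypeError:  # unexpected keyword argument 'is_admin'
    rejected = True
```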
If you blindly accept a user's json input without a sanity check, you are at risk of becoming a json injection victim.
See a detailed explanation of the json injection attack here: https://www.acunetix.com/blog/web-security-zone/what-are-json-injections/
Besides the security vulnerability, parsing JSON into a Python object this way is not type safe.
With your example User class, I would assume you expect both fields name and username to be of string type. What if the json input is like this:
{
  "name": "my name",
  "username": 1
}
j = json.loads(your_json)
u = User(**j)
type(u.username) # int
You have gotten an object with an unexpected type.
One solution to ensure type safety is to use JSON Schema to validate the input json. More about JSON Schema: https://json-schema.org/
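A lightweight alternative to a full JSON Schema dependency is a manual check right after json.loads. A sketch under my own assumptions (the FIELDS spec and load_user helper are illustrative, not part of any library):

```python
import json

class User(object):
    def __init__(self, name, username):
        self.name = name
        self.username = username

# Expected field names and types (illustrative spec, not a standard).
FIELDS = {"name": str, "username": str}

def load_user(raw):
    data = json.loads(raw)
    if set(data) != set(FIELDS):
        raise ValueError("unexpected or missing fields: %s" % sorted(set(data) ^ set(FIELDS)))
    for key, expected in FIELDS.items():
        if not isinstance(data[key], expected):
            raise TypeError("%s should be %s, got %s"
                            % (key, expected.__name__, type(data[key]).__name__))
    return User(**data)

user = load_user('{"name": "my name", "username": "un"}')
# load_user('{"name": "my name", "username": 1}') would raise TypeError
```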

How can I change a Python object into XML?

I am looking to convert a Python object into XML data. I've tried lxml, but eventually had to write custom code for saving my object as xml which isn't perfect.
I'm looking for something more like pyxser. Unfortunately pyxser xml code looks different from what I need.
For instance I have my own class Person
class Person:
    name = ""
    age = 0
    ids = []
and I want to convert it into xml code looking like
<Person>
  <name>Mike</name>
  <age> 25 </age>
  <ids>
    <id>1234</id>
    <id>333333</id>
    <id>999494</id>
  </ids>
</Person>
I didn't find any method in lxml.objectify that takes object and returns xml code.
Best is rather subjective and I'm not sure it's possible to say what's best without knowing more about your requirements. However Gnosis has previously been recommended for serializing Python objects to XML so you might want to start with that.
From the Gnosis homepage:
Gnosis Utils contains several Python modules for XML processing, plus other generally useful tools:
xml.pickle (serializes objects to/from XML, API compatible with the standard pickle module)
xml.objectify (turns arbitrary XML documents into Python objects)
xml.validity (enforces XML validity constraints via DTD or Schema)
xml.indexer (full text indexing/searching)
many more...
Another option is lxml.objectify.
Mike,
you can either implement rendering of the object into XML yourself:
class Person:
    ...
    def toXml(self):
        print('<Person>')
        print('\t<name>...</name>')
        ...
        print('</Person>')
or you can transform Gnosis or pyxser output using XSLT.
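If a third-party dependency is undesirable, the standard library's xml.etree.ElementTree can also produce the XML shown in the question. A minimal sketch for the Person class above (the constructor is added here for the example):

```python
import xml.etree.ElementTree as ET

class Person:
    def __init__(self, name, age, ids):
        self.name = name
        self.age = age
        self.ids = ids

def person_to_xml(person):
    """Serialize a Person into the <Person> XML shape from the question."""
    root = ET.Element("Person")
    ET.SubElement(root, "name").text = person.name
    ET.SubElement(root, "age").text = str(person.age)
    ids = ET.SubElement(root, "ids")
    for value in person.ids:
        ET.SubElement(ids, "id").text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_text = person_to_xml(Person("Mike", 25, [1234, 333333, 999494]))
print(xml_text)
# <Person><name>Mike</name><age>25</age><ids><id>1234</id>...</ids></Person>
```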
