How to turn Perl blessed objects into YAML that Python can read

We have a REST web service written in Perl Dancer. It returns Perl data structures in YAML format and also takes in parameters in YAML format; it is supposed to work with some other teams who query it using Python.
Here's the problem: if I'm passing back just a regular old Perl hash, Dancer's serialization works completely fine. JSON, YAML, XML... they all do the job.
HOWEVER, sometimes we need to pass Perl objects back that the Python side can later pass back in as a parameter, to avoid unnecessary loading, etc. I played around and found that YAML is the only serializer that works with Perl's blessed objects in Dancer.
The problem is that Python's YAML parser can't handle the YAML produced for the Perl objects (whereas it handles regular old Perl hash YAML without an issue).
The perl objects start out like this in YAML:
First one:
--- &1 !!perl/hash:Sequencing_API
Second:
--- !!perl/hash:SDB::DBIO
It errors out like this.
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:perl/hash:SDB::DBIO'
The regular files seem to get passed through like this:
---
fields:
library:
It seems like the extra stuff after --- is causing the issues. What can I do to address this? Or am I trying to do too much by passing around Perl objects?

The short answer is: !! is YAML shorthand for tag:yaml.org,2002:, so !!perl/hash is really tag:yaml.org,2002:perl/hash.
Now you need to tell Python's YAML how to deal with this type, so you add a constructor for it, as follows:
import yaml

def construct_perl_object(loader, tag_suffix, node):
    # tag_suffix is whatever follows the registered prefix, e.g. "SDB::DBIO"
    print("S:", tag_suffix, "N:", node)
    # likely wrong for your use case, but it turns the blessed hash into a plain dict
    return loader.construct_mapping(node, deep=True)

# register for every tag starting with the perl/hash prefix
yaml.add_multi_constructor(u"tag:yaml.org,2002:perl/hash:", construct_perl_object)
yaml.load(yaml_string, Loader=yaml.Loader)
Or maybe just parse it out, or return None... it's hard to test with just that line, but that may be what you are looking for.
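For example, with the multi-constructor above registered, a document shaped like the one in the question should come back as a plain dict (a rough sketch; the payload under the tag is made up here):
doc = """\
--- !!perl/hash:SDB::DBIO
fields:
  library: some_value
"""
print(yaml.load(doc, Loader=yaml.Loader))
# -> {'fields': {'library': 'some_value'}}, assuming the constructor above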

Related

Is there a Protostuff-equivalent Protobuff library for Python?

In Java, libraries like protostuff allow you to generate buffers from a Java POJO approximately like so:
Schema<Foo> schema = RuntimeSchema.getSchema(Foo.class);
...
protostuff = ProtostuffIOUtil.toByteArray(foo, schema, buffer);
I've been trying to find a similar solution for Python, but except for attempting to programmatically build Descriptors and FieldDescriptors (which come with their own challenges and problems since 4.x.x), I couldn't find anything. Is this simply impossible in Python, just isn't implemented anywhere, or am I missing something obvious here?
The alternative Python protobuf libraries pure-protobuf and python-betterproto both have their own syntax that can be used directly from Python, without a .proto file.
However, neither works with completely plain Python objects, as you still need to specify the field types and tag numbers (example from pure-protobuf):
# imports as in the pure-protobuf 2.x documentation; adjust for the version you use
from dataclasses import dataclass
from pure_protobuf.dataclasses_ import field, message
from pure_protobuf.types import int32

@message
@dataclass
class SearchRequest:
    query: str = field(1, default='')
    page_number: int32 = field(2, default=int32(0))
    result_per_page: int32 = field(3, default=int32(0))

Is it safe to use the default load in ruamel.yaml

I can load and dump YAML files with tags using ruamel.yaml, and the tags are preserved.
If I let my customers edit the YAML document, will they be able to exploit the YAML vulnerabilities because of arbitrary Python module code execution? As I understand it, ruamel.yaml is derived from PyYAML, which has such vulnerabilities according to its documentation.
From your question I deduce you are using the .load() method on a YAML() instance, as in:
import ruamel.yaml
yaml = ruamel.yaml.YAML()
data = yaml.load(some_path)
and not the old PyYAML-compatible load function (which cannot handle unknown tags, and which can be unsafe). The short answer is that yes, that is safe: no interpretation of the tags is done and no tag-dependent code is called; the tags just get assigned (as strings) to known types (CommentedMap, CommentedSeq, TaggedScalar for mapping, sequence and scalar respectively).
The vulnerability in PyYAML comes from its Constructor, which is part of the unsafe Loader. This was the default for PyYAML for the longest time, and most people used it even when they could have used the SafeLoader because they were not using any tags (or could have registered the tags they needed against the SafeLoader/SafeConstructor). That unsafe Constructor is a subclass of SafeConstructor, and what makes it unsafe are the multi-methods registered for the interpretation of python/{name,module,object,apply,new}:... tags, essentially dynamically interpreting these tags to load modules and run code (i.e. execute functions/methods/instantiate classes).
The initial idea behind ruamel.yaml was its RoundTripConstructor, which is also a subclass of the SafeConstructor. You used to get this via the now deprecated round_trip_load function, and nowadays via the .load() method after using YAML(typ='rt'); this is also the default for a YAML() instance without a typ argument. That RoundTripConstructor does not register any tags or have any code that interprets tags above and beyond the normal tags that the SafeConstructor uses.
Only the old PyYAML load and ruamel.yaml using typ='unsafe' can execute arbitrary Python code on doctored YAML input, by executing code via the !!python/.. tags.
When you use typ='rt' the tag on nodes is preserved, even when a tag is not registered. This is possible because, while processing nodes in round-trip mode, those tags are just assigned as strings to an attribute on the constructed type of the tagged node (mapping, sequence, or scalar, with the exception of tagged null). And when dumping these types, if they have a tag, it gets re-inserted into the representation processing code. At no point is the tag evaluated or used to import code.
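A minimal sketch of that round-trip preservation, assuming a reasonably recent ruamel.yaml (the tag name here is made up):
import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()          # typ='rt' (round-trip) is the default
data = yaml.load("""\
config: !SomeUnregisteredTag
  key: value
""")
print(type(data['config']))        # a CommentedMap; the tag is kept as a string
yaml.dump(data, sys.stdout)        # !SomeUnregisteredTag is written back out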

Access python dict value in yaml with tags

Is it possible to load the value from a python dict in yaml?
I can access a variable by using:
!!python/name:mymodule.myfile.myvar
but this gives the whole dict.
Trying to use the dict get method like so:
test: &TEST !!python/object/apply:mymod.myfile.mydict.get ['mykey']
gives me the following error:
yaml.constructor.ConstructorError: while constructing a Python object cannot find module 'mymod.myfile.mydict' (No module named 'mymod.myfile.mydict'; 'mymod.myfile' is not a package)
I'm trying to do that because I have a bunch of YAML files which define my project settings; one is for path directories, and I need to load it into some other YAML files, and it looks like you can't load a YAML variable from another YAML file.
EDIT:
I have found one solution: creating my own function that returns the values in the dict, and calling it like so:
test: &TEST !!python/object/apply:mymod.myfile.get_dict_value ['mykey']
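For reference, a minimal sketch of such a helper (module, dict and key names are just the placeholders used above):
# mymod/myfile.py -- hypothetical module matching the tag above
mydict = {'mykey': '/some/path'}

def get_dict_value(key):
    # invoked by PyYAML via !!python/object/apply:mymod.myfile.get_dict_value ['mykey']
    return mydict[key]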
There is no mechanism in YAML to refer to one document from another YAML document.
You'll have to do that by interpreting information in the document in the program that loads the initial YAML document. Whether you do that by explicit logic, or by using some tag doesn't make a practical difference.
Please be aware that it is unsafe to allow interpreting tags of the form !!python/name:..... (via yaml = YAML(typ='unsafe') in ruamel.yaml, or load() in PyYAML), and it is never really necessary.
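A minimal sketch of the explicit-logic approach, loading the settings document first and substituting its values in the program (file names and keys are hypothetical):
import yaml  # PyYAML; ruamel.yaml's safe load would work the same way

# Load the path settings, then use them while processing the other documents
# in ordinary Python code -- no !!python tags required.
with open('paths.yaml') as f:
    paths = yaml.safe_load(f)

with open('project.yaml') as f:
    project = yaml.safe_load(f)

project['data_dir'] = paths['mykey']   # hypothetical keys, for illustration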

How can I change a Python object into XML?

I am looking to convert a Python object into XML data. I've tried lxml, but I eventually had to write custom code for saving my object as XML, which isn't perfect.
I'm looking for something more like pyxser. Unfortunately the pyxser XML output looks different from what I need.
For instance, I have my own class Person:
class Person:
    name = ""
    age = 0
    ids = []
and I want to convert it into XML code looking like:
<Person>
<name>Mike</name>
<age> 25 </age>
<ids>
<id>1234</id>
<id>333333</id>
<id>999494</id>
</ids>
</Person>
I didn't find any method in lxml.objectify that takes object and returns xml code.
Best is rather subjective and I'm not sure it's possible to say what's best without knowing more about your requirements. However Gnosis has previously been recommended for serializing Python objects to XML so you might want to start with that.
From the Gnosis homepage:
Gnosis Utils contains several Python modules for XML processing, plus other generally useful tools:
xml.pickle (serializes objects to/from XML; API compatible with the standard pickle module)
xml.objectify (turns arbitrary XML documents into Python objects)
xml.validity (enforces XML validity constraints via DTD or Schema)
xml.indexer (full text indexing/searching)
many more...
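Since xml.pickle advertises a pickle-compatible API, usage would presumably look something like this (a sketch only; I have not verified it against current Gnosis releases):
from gnosis.xml import pickle as xml_pickle   # assumed import path

p = Person()
p.name, p.age, p.ids = 'Mike', 25, [1234, 333333, 999494]
print(xml_pickle.dumps(p))   # an XML serialization of the object, in Gnosis's own format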
Another option is lxml.objectify.
Mike, you can either implement object rendering into XML yourself:
class Person:
    ...
    def toXml(self):
        print('<Person>')
        print('\t<name>...</name>')
        ...
        print('</Person>')
or you can transform Gnosis or pyxser output using XSLT.
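A more concrete version of the hand-rolled approach, using the standard library's xml.etree.ElementTree instead of print statements (a sketch matching the layout asked for above, not Gnosis or pyxser output):
import xml.etree.ElementTree as ET

def person_to_xml(person):
    # Build the <Person> element tree by hand from the object's attributes.
    root = ET.Element('Person')
    ET.SubElement(root, 'name').text = person.name
    ET.SubElement(root, 'age').text = str(person.age)
    ids = ET.SubElement(root, 'ids')
    for i in person.ids:
        ET.SubElement(ids, 'id').text = str(i)
    return ET.tostring(root, encoding='unicode')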

Python: Matching & Stripping port number from socket data

I have data coming in to a Python server via a socket. Within this data is the string '<port>80</port>', or whichever port is being used.
I wish to extract the port number into a variable. The data coming in is not XML; I just used the tag approach to identify data for future XML use if needed. I do not wish to use an XML Python library, but simply use something like regexp and strings.
What would you recommend is the best way to match and strip this data?
I am currently using this code with no luck:
p = re.compile('<port>\w</port>')
m = p.search(data)
print m
Thank you :)
Regex can't parse XML and shouldn't be used to parse fake XML. You should do one of the following:
Use a serialization method that is nicer to work with in the first place, such as JSON (see the sketch after this list) or an ini file with the ConfigParser module.
Really use XML and not something that just sort of looks like XML and really parse it with something like lxml.etree.
Just store the number in a file if this is the entirety of your configuration. This solution isn't really easier than just using JSON or something, but it's better than the current one.
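A minimal sketch of the first option, assuming the sender is changed to emit a JSON object such as {"port": 80}:
import json

# data is the bytes received from the socket, assumed to hold e.g. b'{"port": 80}'
msg = json.loads(data.decode('utf-8'))
port = msg['port']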
Implementing a bad solution now for future needs that you have no way of defining or accurately predicting is always a bad approach. You will be kept busy enough trying to write and maintain software now that there is no good reason to try to satisfy unknown future needs. I have never seen a case where "I'll put this in for later" has led to less headache later on, especially when I put it in by doing something completely wrong. YAGNI!
As to what's wrong with your snippet other than the overall approach: \w matches only a single word character, so '<port>\w</port>' can never match a multi-digit port like 80; you would need something like \d+.
Though Mike Graham is correct that using regex for XML is not recommended, the following will work:
(I have defined searchType as 'd' for numerals)
import re

searchStr = 'port'
if searchType == 'd':
    retPattern = r'(<%s>)(\d+)(</%s>)'
else:
    retPattern = r'(<%s>)(.+?)(</%s>)'

searchPattern = re.compile(retPattern % (searchStr, searchStr))
found = searchPattern.search(data)   # search the incoming data, not the tag name
retVal = found.group(2)
(note the complete lack of error checking, that is left as an exercise for the user)
