I'm trying to parse custom XML file formats with PyXB. So, I first wrote the following XML schema:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="outertag" minOccurs="0" maxOccurs="1">
<xs:complexType>
<xs:all>
<xs:element name="innertag0"
minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="innertag1"
minOccurs="0"
maxOccurs="unbounded"/>
</xs:all>
</xs:complexType>
</xs:element>
</xs:schema>
I used the following pyxbgen command to generate the Python module's source, py_schema_module.py:
pyxbgen -m py_schema_module -u schema.xsd
I then wrote the following script for parsing an XML file I call example.xml:
#!/usr/bin/env python2.7
import py_schema_module
if __name__ == "__main__":
    with open("example.xml", "r") as f:
        py_schema_module.CreateFromDocument(f.read())
I use that script to determine the legality of example.xml's syntax. For instance, the following example.xml file has legal syntax per the schema:
<outertag>
<innertag0></innertag0>
<innertag1></innertag1>
</outertag>
So does this:
<outertag>
<innertag1></innertag1>
<innertag0></innertag0>
</outertag>
However, the following syntax is illegal:
<outertag>
<innertag1></innertag1>
<innertag0></innertag0>
<innertag1></innertag1>
</outertag>
So is this:
<outertag>
<innertag0></innertag0>
<innertag1></innertag1>
<innertag0></innertag0>
</outertag>
I am able to write innertag0 and then innertag1. I am also able to write innertag1 and then innertag0. I can also repeat the instances of innertag0 and innertag1 arbitrarily (examples not shown for the sake of brevity). However, what I cannot do is switch between innertag0 and innertag1.
Let's assume I want the format to support this functionality. How should I alter my XML schema file?
The following XML Schema (XSD) 1.0 accepts innertag0 and innertag1 in either order. The default value for both minOccurs and maxOccurs is 1, so each element must appear exactly once.
Useful link: XML schema, why xs:group can't be child of xs:all?
XML
<outertag>
<innertag1></innertag1>
<innertag0></innertag0>
</outertag>
XSD
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="outertag">
<xs:complexType>
<xs:all>
<xs:element name="innertag0" type="xs:string"/>
<xs:element name="innertag1" type="xs:string"/>
</xs:all>
</xs:complexType>
</xs:element>
</xs:schema>
Your schema processor doesn't seem to be doing very careful checking against the spec.
If I try to process your schema as an XSD 1.0 schema with Saxon, it tells me there are four errors:
Error at xs:element on line 3 column 59 of test.xsd:
Attribute @minOccurs is not allowed on element <xs:element>
Error at xs:element on line 3 column 59 of test.xsd:
Attribute @maxOccurs is not allowed on element <xs:element>
Error at xs:all on line 5 column 15 of test.xsd:
Within <xs:all>, an <xs:element> must have @maxOccurs equal to 0 or 1
Error at xs:all on line 5 column 15 of test.xsd:
Within <xs:all>, an <xs:element> must have @maxOccurs equal to 0 or 1
Schema processing failed: 4 errors were found while processing the schema
The first two say that minOccurs and maxOccurs are not allowed on a global element declaration.
The second two say that maxOccurs must be 1 within xs:all - XSD 1.0 doesn't allow an element to repeat when the content model is xs:all. Your processor told you it was an error in the XML instance, but it's actually an error in your schema.
XSD 1.1 does allow multiple occurrences within xs:all. If I correct the global element declaration by deleting @minOccurs and @maxOccurs, the schema is now valid under XSD 1.1, and allows the interleaved instance examples that you were having trouble with.
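If the tooling only supports XSD 1.0, one common alternative is to replace xs:all with a repeating xs:choice, which lets the two elements appear any number of times in any order. The sketch below is not a schema validator: it only reads the allowed names out of a hypothetical XSD 1.0 schema using the standard library and checks the children of an instance against them. The names SCHEMA, allowed_children and is_accepted are made up for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical XSD 1.0 variant of the schema: a repeating xs:choice
# instead of xs:all, and no occurrence attributes on the global element.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="outertag">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="innertag0"/>
        <xs:element name="innertag1"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

INTERLEAVED = """<outertag>
  <innertag1></innertag1>
  <innertag0></innertag0>
  <innertag1></innertag1>
</outertag>"""


def allowed_children(schema_text):
    """Collect the element names listed inside the repeating xs:choice."""
    root = ET.fromstring(schema_text)
    choice = root.find(".//{%s}choice" % XS)
    return {el.get("name") for el in choice.findall("{%s}element" % XS)}


def is_accepted(instance_text, names):
    """Rough stand-in for validation: every child must be one of the choices."""
    return all(child.tag in names for child in ET.fromstring(instance_text))


print(is_accepted(INTERLEAVED, allowed_children(SCHEMA)))
```

A real check would still go through a schema processor; this only illustrates why a repeating choice places no constraint on order or count.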
I have a WSDL with an ArrayOfVEHICLE type:
<xs:complexType name="ArrayOfVEHICLE">
<xs:sequence>
<xs:choice maxOccurs="unbounded" minOccurs="0">
<xs:element name="VEHICLE" nillable="true" type="tns:VEHICLE"/>
<xs:element name="VEHICLEV2" nillable="true" type="tns:VEHICLEV2"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
I am trying to create an element of that type with zeep:
vehicle_v2_type = client.get_type("ns0:ArrayOfVEHICLE")
vehicle_v2 = vehicle_v2_type(VEHICLEV2={...})
And I get an error:
TypeError: {http://www.vsk.ru}ArrayOfVEHICLE() got an unexpected keyword argument 'VEHICLE2'. Signature: `({VEHICLE: {http://www.vsk.ru}VEHICLE} | {VEHICLEV2: {http://www.vsk.ru}VEHICLEV2})[]`
I have tried using the _value_1 approach from the zeep docs like this:
vehicle_v2 = vehicle_v2_type(_value_1={"VEHICLEV2": {...}})
And I get another error:
TypeError: No complete xsd:Sequence found for the xsd:Choice '_value_1'.
The signature is: ({VEHICLE: {http://www.vsk.ru}VEHICLE} | {VEHICLEV2: {http://www.vsk.ru}VEHICLEV2})[]
Does anybody know how to create that element with zeep?
OK, I got it. My WSDL says that the choice element has to be a list, because of this declaration:
<xs:choice maxOccurs="unbounded" minOccurs="0">
The easy way is to pass a nested list under _value_1, without factories in my case:
client.service.SomeService(
    ...
    vehicles={  # Element with ArrayOfVEHICLE type
        "_value_1": [
            {
                "VEHICLEV2": {...}
            }
        ]
    }
)
Hope this will help someone.
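As a minimal sketch of the structure described above, the nested list can be built in plain Python before it is handed to the service call. The helper name choice_items and the payload field example_field are hypothetical; the element names come from the WSDL signature, and the real payload fields depend on the VEHICLE and VEHICLEV2 types.

```python
def choice_items(*pairs):
    """Turn (element_name, payload) pairs into the list expected under _value_1."""
    return [{name: payload} for name, payload in pairs]


# Placeholder payloads; the real fields are defined by the WSDL types.
vehicles = {
    "_value_1": choice_items(
        ("VEHICLEV2", {"example_field": "a"}),
        ("VEHICLE", {"example_field": "b"}),
    )
}

print(vehicles)
```

Each list entry is a dict with exactly one key, which mirrors the "one of VEHICLE or VEHICLEV2, repeated" shape of the signature.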
I am currently trying to parse an XSD file in Python using the lxml library.
For testing purposes I put together the following file:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3schools.com" elementFormDefault="qualified">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:simpleType name="BaselineShiftValueType">
<xs:annotation>
<xs:documentation>The actual definition is
baseline | sub | super | &lt;percentage&gt; | &lt;length&gt; | inherit
not sure that union can do this
</xs:documentation>
</xs:annotation>
<xs:restriction base="string"/>
</xs:simpleType>
</xs:schema>
Now I tried to get the children of the root (schema), which would be: xs:element and xs:simpleType.
By iterating over the children of the root, everything works fine:
root = self.XMLTree.getroot()
for child in root:
    print("{}: {}".format(child.tag, child.attrib))
This leads to the output:
{http://www.w3.org/2001/XMLSchema}element: {'name': 'note'}
{http://www.w3.org/2001/XMLSchema}simpleType: {'name': 'BaselineShiftValueType'}
But when I want to have only children of a certain type, it does not work:
root = self.XMLTree.getroot()
element = self.XMLTree.find("element")
print(str(element))
This gives me the following output:
None
Also using findall or writing ./element or .//element does not change the result.
I am quite sure I am missing something. What is the right way to do this?
You are missing the namespace. Unprefixed selectors are considered as belonging to no namespace, so "element" never matches {http://www.w3.org/2001/XMLSchema}element. Note that register_namespace does not help here: in lxml it is a module-level function (etree.register_namespace) that only controls the prefix used when serializing. The most direct fix is to put the namespace URI into the selector:
element = self.XMLTree.find("{http://www.w3.org/2001/XMLSchema}element")
Following up on @helderdarocha's answer, you can also define your namespace in a dictionary and pass it to your search functions, as in the Python xml.etree.ElementTree docs:
ns = {'xs': "http://www.w3.org/2001/XMLSchema"}
element = self.XMLTree.find("xs:element", ns)
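For reference, here is a small self-contained sketch of the prefix-mapping approach using the standard library's xml.etree.ElementTree, whose find and findall take the same kind of mapping. The trimmed-down SCHEMA string is an assumption made to keep the example short.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the schema in the question.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note"/>
  <xs:simpleType name="BaselineShiftValueType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
</xs:schema>"""

# The prefix used here only has to match the one used in the selectors.
ns = {"xs": "http://www.w3.org/2001/XMLSchema"}
root = ET.fromstring(SCHEMA)

element = root.find("xs:element", ns)            # namespaced lookup
simple_types = root.findall("xs:simpleType", ns)
unprefixed = root.find("element")                # no namespace, no match

print(element.get("name"))
print(unprefixed)
```

The prefix in the dictionary is local to the lookup; it does not need to be the same prefix the document itself uses.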
I am having a hard time getting started with PyXB.
Say I have an XSD file (an XML schema). I would like to:
Use PyXB to define Python objects according to the schema.
Save those objects to disk as XML files that satisfy the schema.
How can I do this with PyXB? Below is a simple example of an XSD file (from Wikipedia) that encodes an address, but I am having a hard time even getting started.
<?xml version="1.0" encoding="utf-8"?>
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Address">
<xs:complexType>
<xs:sequence>
<xs:element name="FullName" type="xs:string" />
<xs:element name="House" type="xs:string" />
<xs:element name="Street" type="xs:string" />
<xs:element name="Town" type="xs:string" />
<xs:element name="County" type="xs:string" minOccurs="0" />
<xs:element name="PostCode" type="xs:string" />
<xs:element name="Country" minOccurs="0">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="IN" />
<xs:enumeration value="DE" />
<xs:enumeration value="ES" />
<xs:enumeration value="UK" />
<xs:enumeration value="US" />
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Update
Once I run
pyxbgen -u example.xsd -m example
I get an example.py that exposes the following names:
example.Address example.STD_ANON
example.CTD_ANON example.StringIO
example.CreateFromDOM example.pyxb
example.CreateFromDocument example.sys
example.Namespace
I think I understand what CreateFromDocument does (it presumably reads an XML document and creates the corresponding Python object), but which class do I use to create a new object and then save it to XML?
A simple google search brings this: http://pyxb.sourceforge.net/userref_pyxbgen.html#pyxbgen
In particular the part that says:
Translate this into Python with the following command:
pyxbgen -u po1.xsd -m po1
The -u parameter identifies a schema
document describing contents of a namespace. The parameter may be a
path to a file on the local system, or a URL to a network-accessible
location like
http://www.weather.gov/forecasts/xml/DWMLgen/schema/DWML.xsd. The -m
parameter specifies the name to be used by the Python module holding
the bindings generated for the namespace in the preceding schema.
After running this, the Python bindings will be in a file named
po1.py.
EDIT Following your update:
Now that you have your generated Address class and all the associated helpers, look at http://pyxb.sourceforge.net/userref_usebind.html in order to learn how to use them. For your specific question, you want to study the "Creating Instances in Python Code" paragraph. Basically to generate XML from your application data you simply do:
import example
address = example.Address()
address.FullName = "Jo La Banane"
# fill other members of address
# ...
# toxml("utf-8") returns encoded bytes, so open the file in binary mode
with open('myoutput.xml', 'wb') as f:
    f.write(address.toxml("utf-8"))
Now it's up to you to be curious and read the code being generated, pyxb's doc, call the various generated methods and experiment!
I'm having a problem with soaplib.
I have the following function exposed by a web service:
@soap(Integer, Integer, _returns=Integer)
def test(self, n1, n2):
    return n1 + n2
The corresponding declaration for the datatypes in the generated WSDL file is
<xs:complexType name="test">
<xs:sequence>
<xs:element name="n1" type="xs:integer" minOccurs="0" nillable="true"/>
<xs:element name="n2" type="xs:integer" minOccurs="0" nillable="true"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="testResponse">
<xs:sequence>
<xs:element name="testResult" type="xs:integer" minOccurs="0" nillable="true"/>
</xs:sequence>
</xs:complexType>
When I use some IDE (Visual Studio, PowerBuilder) to generate code from that WSDL file, whatever the IDE, it generates two classes for test and for testResponse, whose attributes are Strings.
Does anyone know if I can tweak my Python declaration so that I avoid complexType and obtain real Integer datatype on my client side?
I checked your code, but I get the same output. I am using suds to parse the values.
In [3]: from suds import client
In [4]: cl = client.Client('http://localhost:8080/?wsdl')
In [5]: cl.service.test(10,2)
Out[5]: 12
But when I check the type of that value:
In [6]: type(cl.service.test(10,2))
Out[6]: <class 'suds.sax.text.Text'>
So soaplib returns a string-like value, but you can convert it based on the declared type.
I checked the response by writing this:
@soap(_returns=Integer)
def test(self):
    return 12
In the SOA Client plugin for Firefox, I get this response:
<?xml version='1.0' encoding='utf-8'?>
<senv:Envelope
xmlns:wsa="http://schemas.xmlsoap.org/ws/2003/03/addressing"
xmlns:plink="http://schemas.xmlsoap.org/ws/2003/05/partner-link/"
xmlns:xop="http://www.w3.org/2004/08/xop/include"
xmlns:senc="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:s12env="http://www.w3.org/2003/05/soap-envelope/"
xmlns:s12enc="http://www.w3.org/2003/05/soap-encoding/"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:senv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/">
<senv:Body>
<tns:testResponse>
<tns:testResult>
12
</tns:testResult>
</tns:testResponse>
</senv:Body>
</senv:Envelope>
You can't get raw integer data from XML; the value always arrives as text.
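The conversion itself is a one-liner on the client. The following sketch parses a response fragment with the standard library and converts the element text with int(). The namespace URI http://example.com/tns is a placeholder, added only so the fragment is well-formed on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only to make this fragment well-formed.
TNS = "http://example.com/tns"
BODY = """<tns:testResponse xmlns:tns="http://example.com/tns">
  <tns:testResult>
    12
  </tns:testResult>
</tns:testResponse>"""

result = ET.fromstring(BODY).find("{%s}testResult" % TNS)
value = int(result.text)  # int() ignores the surrounding whitespace

print(type(value), value)
```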
OK, not all datatypes of XSD are defined in soaplib.
Integer is defined in soaplib and is seen in the WSDL file as an integer, which the .NET framework (used by PowerBuilder) failed to understand.
Int is OK for .NET/PowerBuilder, but it isn't defined in soaplib.
Thus, I moved from soaplib to rpclib. Those libs are very close (one is a fork of the other).
Fought with the same thing, but couldn't move away from soaplib.
So, I monkeypatch it this way:
from soaplib.serializers.primitive import Integer
class BetterInteger(Integer):
    __type_name__ = "int"
Integer = BetterInteger
And then move on with life.
However, the XSD spec defines both 'integer' ("Represents a signed integer. Values may begin with an optional "+" or "-" sign. Derived from the decimal datatype.") and 'int' ("Represents a 32-bit signed integer in the range [-2,147,483,648, 2,147,483,647]. Derived from the long datatype.").
So, the better solution is:
from soaplib.serializers.primitive import Integer
class Int32(Integer):
    __type_name__ = "int"
And use your new 'Int32' class to type your input parameters.
[soaplib 1.0.0]
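Because xs:int is bounded while xs:integer is not, a service that advertises "int" may want to guard against values that do not fit. A minimal sketch of such a check, using the range quoted above (the names XS_INT_MIN, XS_INT_MAX and fits_xs_int are made up for illustration and are not part of soaplib):

```python
# Bounds of the 32-bit signed range quoted for xs:int.
XS_INT_MIN = -2**31      # -2,147,483,648
XS_INT_MAX = 2**31 - 1   #  2,147,483,647


def fits_xs_int(value):
    """Return True if value is representable as an xs:int."""
    return XS_INT_MIN <= value <= XS_INT_MAX


print(fits_xs_int(12), fits_xs_int(2**31))
```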
I am using lxml to parse an xsd file and am looking for an easy way to remove the URL namespace attached to each element name. Here's the xsd file:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" version="2.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="rootelement">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element minOccurs="1" maxOccurs="1" name="element1">
<xs:complexType>
<xs:all>
<xs:element name="subelement1" type="xs:string" />
<xs:element name="subelement2" type="xs:integer" />
<xs:element name="subelement3" type="xs:dateTime" />
</xs:all>
<xs:attribute name="id" type="xs:integer" use="required" />
</xs:complexType>
</xs:element>
</xs:choice>
<xs:attribute fixed="2.0" name="version" type="xs:decimal" use="required" />
</xs:complexType>
</xs:element>
</xs:schema>
and using this code:
from lxml import etree
parser = etree.XMLParser()
data = etree.parse(open("testschema.xsd"),parser)
root = data.getroot()
rootelement = root.getchildren()[0]
rootelementattribute = rootelement.getchildren()[0].getchildren()[1]
print "root element tags"
print rootelement[0].tag
print rootelementattribute.tag
elements = rootelement.getchildren()[0].getchildren()[0].getchildren()
elements_attribute = elements[0].getchildren()[0].getchildren()[1]
print "element tags"
print elements[0].tag
print elements_attribute.tag
subelements = elements[0].getchildren()[0].getchildren()[0].getchildren()
print "subelements"
print subelements
I get the following output
root element tags
{http://www.w3.org/2001/XMLSchema}complexType
{http://www.w3.org/2001/XMLSchema}attribute
element tags
{http://www.w3.org/2001/XMLSchema}element
{http://www.w3.org/2001/XMLSchema}attribute
subelements
[<Element {http://www.w3.org/2001/XMLSchema}element at 0x7f2998fb16e0>, <Element {http://www.w3.org/2001/XMLSchema}element at 0x7f2998fb1780>, <Element {http://www.w3.org/2001/XMLSchema}element at 0x7f2998fb17d0>]
I don't want "{http://www.w3.org/2001/XMLSchema}" to appear at all when I pull the tag data (altering the xsd file is not an option). The reason I need the xsd tag info is that I am using it to validate column names from a series of flat files. At the "element" level there are multiple elements that I'm pulling, as well as subelements, and I am using a dictionary to validate the columns. Also, any suggestions on improving the code above would be greatly appreciated, such as a way to use fewer "getchildren" calls, or just to make it more organized.
I'd use:
print elem.tag.split('}')[-1]
But you could also use the xpath function local-name():
print elem.xpath('local-name()')
As for fewer getchildren() calls: just leave them out. getchildren() is a deprecated way of making a list of the direct children (you should just use list(elem) instead if you actually want this).
You can iterate over an element, or index into it, directly. For example, rootelement[0] will give you the first child element of rootelement (and it is more efficient than rootelement.getchildren()[0], because that acts like list(rootelement) and creates a new list first).
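Putting both suggestions together, here is a short sketch that strips the namespace from every tag while walking the tree without getchildren(). It uses the standard library's xml.etree.ElementTree and a cut-down schema so that it runs on its own; the helper name local_name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the schema in the question.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="rootelement">
    <xs:complexType>
      <xs:attribute name="version" type="xs:decimal"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def local_name(tag):
    """Drop a leading '{namespace}' part; tags without one are returned as-is."""
    return tag.rpartition("}")[2]


root = ET.fromstring(XSD)

# iter() walks the whole tree in document order, no getchildren() needed.
names = [local_name(el.tag) for el in root.iter()]

# Direct indexing replaces chained getchildren()[0] calls.
first_grandchild = local_name(root[0][0].tag)

print(names)
print(first_grandchild)
```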
I wonder why etree.XMLParser(ns_clean=True) doesn't work. It did not work for me, so I took the namespace from root.nsmap, wrapped it in braces, and replaced it with an empty string:
print rootelement[0].tag.replace('{%s}' %root.nsmap['xs'], '')
The easiest thing to do is to just use string slicing to remove namespace prefix:
>>> print rootelement[0].tag[34:]
complexType
If the URI might change in the future (for some unknown reason or you're truly paranoid), consider the following:
print "root element tags"
tag, nsmap, prefix = rootelement[0].tag, rootelement[0].nsmap, rootelement[0].prefix
tag = tag[len(nsmap[prefix]) + 2:]
print tag
This is a very unlikely case, but who knows?