Python create xml from xsd

Python create xml from xsd - python

I am new with python and need to implement an interface to an accounting tool. I received some XSD Files which describes the interface.
What is the easiest way to generate the XML according to the XSD?
Is there any module I can use?
Do I have to create the XML all by myself and I can use the XSD just to verify it?
How do I best proceed?

I think, generateDS is the solution to your problem.
Starting from chapter 5, the command
python generateDS.py -o people.py -s peoplesubs.py people.xsd
reads the XSD file and creates several classes and subclasses. It generates many data structures and getters and setters for accessing and using data :)
If there is any XML file that complies with that XSD, it can be read straight away by using
import people
rootObject = people.parse('people.xml')
within the code. More information is given in chapter 12.
The aforementioned classes also provide methods to export data as an XML format.
The level of documentation is good and it is highly suggested to use this for any future project.

There are some projects on github that do that by using xmlschema library, for instance fortesp/xsd2xml or miaozn/xsd2xml (python2)
For instance with the former:
xmlgenerator = XMLGenerator('resources/pain.001.001.09.xsd', True, DataFacet())
print(xmlgenerator.execute()) # Output to console
xmlgenerator.write('filename.xml') # Output to file
Unfortunately none of these are properly packaged though.

Related

Parse a python file to a XML template in Apache Velocity

I have a Python source file which I would like to convert, according to a certain template written in Apache Velocity, to an XML file.
Is it possible to read the contents of a python code file and extract the necessary information in native Velocity language? Or do I need to write a python script (or some other language)to parse the python file to the xml template?

As you have identified, you need to have two distinct phases:
Extract information from the Python source file (aka parsing)
Publish this information back to an XML file
Apache Velocity can help you accomplish step 1, but knows nothing about parsing.
There are several ways to achieve the parsing step:
you could parse it with whatever tool you like, and publish it in a format easily understandable for Velocity, like a Properties file that you would put in the context. If you need to use hierarchical properties, have a look at the ValueParser tool, which can return submaps from sets of properties like foo.bar=woogie and foo.schmoo=wiggie.
you could give a try to Stillness, a parsing tool that uses a Velocity-like syntax (disclaimer: I wrote it).

Map xml file with extracted information

I am trying to map one xml file to another, based on a configuration file (that too can be an xml file).
Input
<ia>
<ib>...</ib>
<ic>...</ic>
</ia>
Output
<oa>
<ob>...</ob>
<oc>...</oc>
</oa>
Config
<config>
<conf>
<input>ia</input>
<output>oa</output>
</conf>
<conf>
<input>ib</input>
<output>ob</output>
</conf>
.....
</config>
So, the intention is to parse an xml file and retrieve the information interesting to me, and write into another xml file where the mapping information is specified in the config file.
Due to the scripting nature (and extending with plugins lateron), and the support for xml processing I was considering python. I just learned the syntax and basics of language, and came to know about lxml
One way of doing this
parse the config file (where , tag can have xpath to the node I am interested in)
read the input file
write into output, using etbuilder based on the config file
Being new to python, and not seeing xpath support for etbuilder I wonder is this the best approach. Also not sure all the exceptional cases. Is there an easier way, or native support in any other libraries. If possible, I do not want to spend too much time on this task as I could focus on the core task.
thanks ins advance.

If you wish to transform an XML file into another XML file then XSLT was made for this purpose. You have to define a .xslt file that describes the transformation of XML content and what the eventual output should look like. That's one way to do it.
You can also read the XML file using lxml and generate the output XML with lxml.etree.ElementTree. I'm not familiar with etbuilder but I don't think generating the desired output is that difficult. Once you have parsed the input files, you can build the config XML and write it to a file.
XPath is primarily for reading XML content, you don't need it for constructing XML files. In fact, if you use a proper XML parser then you don't need XPath either to read the file contents, although XPath could make life a bit easier.

How do I deal with conflicting names when building python docs with doxygen

I'm having a problem with Doxygen for Windows with Python where input files with the same failename cause a conflict wth the output files. This seems to be a bug in doxygen - is there a way to work-around this problem?
Background
We build docs for our API using Doxygen. Our project is overwhelmingly written in python and the only components that our clients care about are python. Due to accidents of history our classes often have unfortunate naming conventions.
For example we have a classes whose fully-qualified name are:
tools.b.foo.Foo
tools.b.bar.Bar
Later this class was re-implemented and put into a new module:
tools.c.foo.Foo_improved
tools.c.bar.Bar_improved
When we want to build our tools API documentation we have a process which checks out tools.* into a directory on the build-server and then we call doxygen with a fairly standard configuration file.
We'd expect that there should be four HTML files in the output, two for foo and two for bar. However what we get is only two files. Both sets of sripts are parsed, however since the module names are the same the documentation for the old version ends up over-writing the documentation which was generated for the new versions. As a result in every case where a python module name is duplicated (but in a different sub-package) we are only getting a single doc file for every file name.
FYI, we are using doxygen 1.7.1 on Windows XP 32bit with Python 2.4.4
Config file is here:
http://pastebin.me/002f3ec3145f4e1896a9cf79e7179493
UPDATE 1: In the generated doc index I can see entries for all four files, however if I follow the links to both Foo and Foo_improved both point to the same file.

You could try explicitly declaring a class w/ full namespace
http://www.doxygen.nl/manual/commands.html#cmdclass

How To: View MFC Doc File in Python

I want to use Python to access MFC document files generically? Can CArchive be used to query a file and view the structure, or does Python, in opening the document, need to know more about the document structure in order to view the contents?

I think that the Python code needs to know the document structure.
Maybe you should make a python wrapper of your c++ code.
In this case, I would recommend to use http://sourceforge.net/projects/pycpp/>pycpp which is my opinion a great library for making python extensions in c++.

ElementTree in Python 2.6.2 Processing Instructions support?

I'm trying to create XML using the ElementTree object structure in python. It all works very well except when it comes to processing instructions. I can create a PI easily using the factory function ProcessingInstruction(), but it doesn't get added into the elementtree. I can add it manually, but I can't figure out how to add it above the root element where PI's are normally placed. Anyone know how to do this? I know of plenty of alternative methods of doing it, but it seems that this must be built in somewhere that I just can't find.

Try the lxml library: it follows the ElementTree api, plus adds a lot of extras. From the compatibility overview:
ElementTree ignores comments and processing instructions when parsing XML, while etree will read them in and treat them as Comment or ProcessingInstruction elements respectively. This is especially visible where comments are found inside text content, which is then split by the Comment element.
You can disable this behaviour by passing the boolean remove_comments and/or remove_pis keyword arguments to the parser you use. For convenience and to support portable code, you can also use the etree.ETCompatXMLParser instead of the default etree.XMLParser. It tries to provide a default setup that is as close to the ElementTree parser as possible.
Not in the stdlib, I know, but in my experience the best bet when you need stuff that the standard ElementTree doesn't provide.

With the lxml API it couldn't be easier, though it is a bit "underdocumented":
If you need a top-level processing instruction, create it like this:
from lxml import etree
root = etree.Element("anytagname")
root.addprevious(etree.ProcessingInstruction("anypi", "anypicontent"))
The resulting document will look like this:
<?anypi anypicontent?>
<anytagname />
They certainly should add this to their FAQ because IMO it is another feature that sets this fine API apart.

Yeah, I don't believe it's possible, sorry. ElementTree provides a simpler interface to (non-namespaced) element-centric XML processing than DOM, but the price for that is that it doesn't support the whole XML infoset.
There is no apparent way to represent the content that lives outside the root element (comments, PIs, the doctype and the XML declaration), and these are also discarded at parse time. (Aside: this appears to include any default attributes specified in the DTD internal subset, which makes ElementTree strictly-speaking a non-compliant XML processor.)
You can probably work around it by subclassing or monkey-patching the Python native ElementTree implementation's write() method to call _write on your extra PIs before _writeing the _root, but it could be a bit fragile.
If you need support for the full XML infoset, probably best stick with DOM.

I don't know much about ElementTree. But it is possible that you might be able to solve your problem using a library I wrote called "xe".
xe is a set of Python classes designed to make it easy to create structured XML. I haven't worked on it in a long time, for various reasons, but I'd be willing to help you if you have questions about it, or need bugs fixed.
It has the bare bones of support for things like processing instructions, and with a little bit of work I think it could do what you need. (When I started adding processing instructions, I didn't really understand them, and I didn't have any need for them, so the code is sort of half-baked.)
Take a look and see if it seems useful.
http://home.avvanta.com/~steveha/xe.html
Here's an example of using it:
import xe
doc = xe.XMLDoc()
prefs = xe.NestElement("prefs")
prefs.user_name = xe.TextElement("user_name")
prefs.paper = xe.NestElement("paper")
prefs.paper.width = xe.IntElement("width")
prefs.paper.height = xe.IntElement("height")
doc.root_element = prefs
prefs.user_name = "John Doe"
prefs.paper.width = 8
prefs.paper.height = 10
c = xe.Comment("this is a comment")
doc.top.append(c)
If you ran the above code and then ran print doc here is what you would get:
<?xml version="1.0" encoding="utf-8"?>
<!-- this is a comment -->
<prefs>
<user_name>John Doe</user_name>
<paper>
<width>8</width>
<height>10</height>
</paper>
</prefs>
If you are interested in this but need some help, just let me know.
Good luck with your project.

f = open('D:\Python\XML\test.xml', 'r+')
old = f.read()
f.seek(44,0) #place cursor after xml declaration
f.write('<?xml-stylesheet type="text/xsl" href="C:\Stylesheets\expand.xsl"?>'+ old[44:])
I was facing the same problem and came up with this crude solution after failing to insert the PI into the .xml file correctly even after using one of the Element methods in my case root.insert (0, PI) and trying multiple ways to cut and paste the inserted PI to the correct location only to find the data to be deleted from unexpected locations.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.