How To: View MFC Doc File in Python - python

I want to use Python to access MFC document files generically. Can CArchive be used to query a file and view its structure, or does Python need to know more about the document structure before it can open the document and view the contents?

I think that the Python code needs to know the document structure.
Maybe you should make a Python wrapper around your C++ code.
In that case, I would recommend pycpp (http://sourceforge.net/projects/pycpp/), which is in my opinion a great library for writing Python extensions in C++.
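To illustrate why the reader has to know the layout: as far as I know, CArchive writes plain values as raw bytes without type tags, so a reader must already know their order and sizes. Below is a minimal sketch, assuming a hypothetical document that stores a 32-bit count followed by that many (x, y) pairs of little-endian 32-bit integers. The layout and the name read_points are made up for illustration; files that serialize CObject-derived classes or CString values carry extra data that this does not handle.

```python
import io
import struct


def read_points(stream):
    """Read a hypothetical layout: uint32 count, then count (int32 x, int32 y) pairs."""
    (count,) = struct.unpack("<I", stream.read(4))
    points = []
    for _ in range(count):
        x, y = struct.unpack("<ii", stream.read(8))
        points.append((x, y))
    return points


# Build sample bytes in the same assumed layout, standing in for a real file.
sample = struct.pack("<I", 2) + struct.pack("<ii", 10, 20) + struct.pack("<ii", -5, 7)
points = read_points(io.BytesIO(sample))
print(points)
```

For a real file you would open it in binary mode and pass the file object instead of the BytesIO stand-in.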

Related

Python create xml from xsd

I am new to Python and need to implement an interface to an accounting tool. I received some XSD files which describe the interface.
What is the easiest way to generate the XML according to the XSD?
Is there any module I can use?
Do I have to create the XML all by myself and use the XSD only to verify it?
How do I best proceed?
I think generateDS is the solution to your problem.
Starting from chapter 5, the command
python generateDS.py -o people.py -s peoplesubs.py people.xsd
reads the XSD file and creates several classes and subclasses. It generates many data structures and getters and setters for accessing and using data :)
If there is any XML file that complies with that XSD, it can be read straight away by using
import people
rootObject = people.parse('people.xml')
within the code. More information is given in chapter 12.
The aforementioned classes also provide methods to export data as XML.
The level of documentation is good and it is highly suggested to use this for any future project.
There are some projects on GitHub that do that using the xmlschema library, for instance fortesp/xsd2xml or miaozn/xsd2xml (Python 2).
For instance with the former:
xmlgenerator = XMLGenerator('resources/pain.001.001.09.xsd', True, DataFacet())
print(xmlgenerator.execute()) # Output to console
xmlgenerator.write('filename.xml') # Output to file
Unfortunately none of these are properly packaged though.
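For the "create the XML all by myself" route from the question, the standard library is enough to build the document, and the XSD is then only needed for the verification step. A minimal sketch, assuming a made-up invoice structure (the element names, the currency attribute and the function build_invoice are hypothetical, not taken from any real accounting schema):

```python
import xml.etree.ElementTree as ET


def build_invoice(number, amount, currency):
    """Build a small XML tree by hand; the element names are illustrative only."""
    root = ET.Element("invoice")
    ET.SubElement(root, "number").text = number
    amount_el = ET.SubElement(root, "amount", currency=currency)
    amount_el.text = f"{amount:.2f}"
    return root


xml_text = ET.tostring(build_invoice("2024-001", 99.5, "EUR"), encoding="unicode")
print(xml_text)
```

Validation against the XSD is not covered here; ElementTree does not do schema validation, so that step needs a third-party library.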

What python modules should I use to edit a word document then turn it to a pdf?

I want users to be able to create the report template in Microsoft Word; I'll then probably add document fields. The script then evaluates a number of things, adds the appropriate text to the fields, and creates a PDF of the filled-in form.
So which modules would be best for this? I've looked at reportlab but I need to work from a pre-generated template and that doesn't seem feasible.
If you will only use it under Windows with Word installed, you could use PyWin32, which lets you access the suite's API. You could also try IronPython as suggested here.
If you need to read a docx template regardless of the platform you could try this outdated extension.
If it suits your application to use a cloud service to populate Doc/DocX files, there is a commercial system called Docmosis that can populate plain-text (or merge) fields and stream populated PDF documents back to your Python system, or deliver them via email etc.
You would upload your "template" Doc files to Docmosis via the website (or API calls), then invoke Docmosis using an HTTPS POST from your Python code.
Please note I work for the company that created Docmosis.
Hope that helps.
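For the platform-independent route, a .docx file is a zip archive whose main text lives in word/document.xml, so simple placeholders can be swapped with the standard library alone. A rough sketch, assuming the template uses {{name}}-style markers (a convention invented here, not Word document fields) and that Word has not split a marker across several runs, which it sometimes does. The function fill_placeholders is hypothetical, and the PDF conversion step is not covered.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_placeholders(docx_bytes, values):
    """Return a copy of the archive with {{key}} markers replaced in word/document.xml."""
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")
                for key, value in values.items():
                    # Escape &, < and > so the XML stays well-formed.
                    text = text.replace("{{" + key + "}}", escape(value))
                data = text.encode("utf-8")
            dst.writestr(item, data)
    return out_buf.getvalue()


# Stand-in archive with only the one member; a real .docx has more parts.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("word/document.xml", "<w:t>Dear {{name}}</w:t>")

filled = fill_placeholders(demo.getvalue(), {"name": "A & B"})
with zipfile.ZipFile(io.BytesIO(filled)) as z:
    result_xml = z.read("word/document.xml").decode("utf-8")
print(result_xml)
```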

What's a good document standard to use programmatically?

I'm writing a program that requires input in the form of a document, it needs to replace a few values, insert a table, and convert it to PDF. It's written in Python + Qt (PyQt). Is there any well known document standard which can be easily used programmatically? It must be cross platform, and preferably open.
I have looked into Microsoft Doc and Docx, which are binary formats and I can't edit them. Python has bindings for it, but they're only on Windows.
Open Office's ODT/ODF is XML inside a zip archive, so I can edit that one, but there are no command line utilities or any other way to programmatically convert the file to a PDF. Open Office provides bindings, but you need to run Open Office from the command line, start a server, etc. And my clients may not have Open Office installed.
RTF is readable from Python, but I couldn't find any way/libraries to convert RTF documents to PDF.
At the moment I'm exporting from Microsoft Word to HTML, replacing the values and using PyQt to convert it to a PDF. However, it loses formatting features and looks awful. I'm surprised there isn't a well-known library which lets you edit a variety of document formats and convert them into other formats. Am I missing something?
Update: Thanks for the advice, I'll have a look at using Latex.
Thanks,
Jackson
Have you looked into using LaTeX documents?
They are perfect to use programmatically (compiling documents? You gotta love that...), and you have several Python frameworks you can use such as plasTeX and PyTex.
Exporting a LaTeX document to PDF is almost immediate.
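The replace-values-and-insert-a-table workflow from the question could look roughly like this with a LaTeX template. This is only a sketch: the @@CUSTOMER@@ and @@ROWS@@ markers, the template text and the functions render and compile_pdf are invented for illustration, values are not escaped for LaTeX special characters, and the PDF step assumes a pdflatex executable is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path

TEMPLATE = r"""\documentclass{article}
\begin{document}
Report for @@CUSTOMER@@

\begin{tabular}{lr}
Item & Price \\
@@ROWS@@
\end{tabular}
\end{document}
"""


def render(template, customer, rows):
    """Fill the markers; each row becomes one line of the tabular body."""
    table = "\n".join(f"{name} & {price:.2f} \\\\" for name, price in rows)
    return template.replace("@@CUSTOMER@@", customer).replace("@@ROWS@@", table)


def compile_pdf(tex_source, workdir):
    """Write report.tex and run pdflatex if available; return the PDF path or None."""
    if shutil.which("pdflatex") is None:
        return None
    workdir = Path(workdir)
    (workdir / "report.tex").write_text(tex_source, encoding="utf-8")
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", "report.tex"],
        cwd=workdir,
        check=True,
    )
    return workdir / "report.pdf"


tex_source = render(TEMPLATE, "ACME", [("Widget", 9.5), ("Gadget", 12.0)])
print(tex_source)
```

Calling compile_pdf(tex_source, some_directory) afterwards produces report.pdf in that directory when pdflatex is installed.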
Since you're already using PyQt anyway, it might be worth looking at Qt's built-in rich text processing module, which looks decent. Here's the documentation on detailed content manipulation, including inserting tables. Also, the QPrinter class's default print-to-file format happens to be PDF.
Without knowing more about your particular needs it's hard to say if these would do what you want, but since your application already has PyQt as a dependency, seems silly to introduce any more without evaluating the functionality you've already got available.
The non-GUI parts of the Qt framework are often overlooked though.
edit: included more links.
You might want to try ReportLab. The open source version can write PDFs, and the commercial version has a lot of really nice abstractions to allow output to a variety of different formats from a single input.
I don't know what kind of audience your program has; TeX is good and I would go with it.
Another possible choice is the Excel format, parsing it with xlrd.
I've used it a couple of times and it's pretty straightforward.
An Excel file is a good choice for the following reasons:
Well-known format that is easy to edit
You could prepare a predefined template with constraints and tables
Another option is creating XML documents, transforming them to XSL-FO and rendering with FOP or RenderX. If you use DocBook as the primary input, there are toolchains freely available for converting that to PDF, RTF, HTML and so forth.
It is rather quirky to use and not my idea of fun, but it does deliver and can be embedded in an application, AFAICT.
Creating DocBook is very straightforward as it has a wide range of semantic tags, table support etc. to give a "meaningful" markup which can be reliably formatted. The XSL stylesheets are modular and allow parts to be customized or replaced to generate your own look and feel.
It works well for relatively free-flowing documents with lots of text.
For fill-in-the-blanks kinds of documents, a regular reporting engine may be a better fit, or some straightforward XSL stylesheets spitting out the XSL-FO directly.

Using Sphinx to create context-sensitive help files in HTML

I am currently using AsciiDoc for documenting my software projects because it supports PDF and HTML help generation. I am currently running it through Cygwin so that the a2x toolchain functions properly. This works well for me but is a pain to set up on other Windows computers. I have been looking for alternative methods and recently revisited Sphinx. Noticing that it now produces HTML help files, I gave it a try and it seems to work well in the small tests I performed.
My question is, is there a way to specify map IDs for context-sensitive help in the text so that my Windows programs can call the proper help API and the file is launched and opened to the desired location?
In AsciiDoc I am using pass::[<?dbhh topicname="_about" topicid="801"?>]. By using these constructs a context.h and alias.h are generated along with the other HTML help files (context sensitive help information).
I do not know much about AsciiDoc, but in Sphinx you can reference arbitrary locations by placing anchors where you need them. See the :ref: role.

Are there any libraries for generating Python source?

I'd like to write a code generation tool that will allow me to create sourcefiles for dynamically generated classes. I can create the class and use it in code, but it would be nice to have a sourcefile both for documentation and to allow something to import.
Does such a thing exist? I've seen sourcecodegen, but I'd rather avoid messing with the ast trees as they're not portable.
I'm not aware of any off-the-shelf library, but have a look at the Python templating engines Mako and Jinja2. They can both generate Python source behind the scenes (they convert text templates to Python code and then to Python bytecode).
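The same template idea works without extra dependencies. Here is a small sketch using the standard library's string.Template instead of Mako or Jinja2. The template text and the function generate_class_source are made up for illustration, and field names are assumed to be a non-empty list of valid Python identifiers.

```python
from string import Template

CLASS_TEMPLATE = Template('''class $name:
    """Generated class with fields: $doc_fields."""

    def __init__(self, $params):
$assignments
''')


def generate_class_source(name, fields):
    """Return Python source text for a simple class with the given fields."""
    assignments = "\n".join(f"        self.{field} = {field}" for field in fields)
    return CLASS_TEMPLATE.substitute(
        name=name,
        doc_fields=", ".join(fields),
        params=", ".join(fields),
        assignments=assignments,
    )


source = generate_class_source("Point", ["x", "y"])
print(source)

# Sanity check: the generated text compiles and the class can be used.
namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)
point = namespace["Point"](1, 2)
```

Writing source to a .py file gives you something to import and to read as documentation.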
