pdf to chm handling libraries for python3 - python

I tried to find software that converts my PDF e-books to CHM, but ended up disappointed.
So, as a Pythonista, I decided to write my own program to convert PDF files to CHM. However, all the PDF/CHM libraries I found are Python 2 libraries.
Are there Python 3 libraries for handling PDF/CHM files?

If you want to write CHM, then afaik only Free Pascal (and therefore Delphi with minimal effort) has a free CHM generator library.
All other tools use the Microsoft commandline tool behind the scenes.
For reading there is chmlib; I assume there is a Python wrapper for it somewhere.

Related

Opening and Parsing a QXDM .isf File using Python

I'm working on a project where I need to open and parse multiple .isf files. Is there a way to import the data from the .isf file format in Python? If not, is there any other way I could do it?
The premise of the project is to open multiple .isf files, parse them into one big file, and also analyze the data.
I was also looking for QXDM ISF parsing mechanisms, and found only two:
For Windows with QXDM installed, search for manual 80-V5627-1 (ISF Processing Interfaces). "This provides a scripting framework through which any COM-compliant scripting language (VBScript, JScript, PERL, etc.) can access and manipulate files created."
If you need to work on Linux or do not have QXDM installed, I found it mentioned in the Interface Control Documents (ICDs) that those who purchase an ICD are also offered a QXDM Professional ISF Access SDK at no extra charge. This SDK allows you to process proprietary ISF logs directly without conversion to DLF and does not require QXDM Professional to be installed. A copy of the document can be found here.
Since we haven't purchased the SDK and we work in Linux, we convert from ISF to DLF by hand and parse the DLF files in Linux.

Providing Standard Library for embedded Python

I've successfully embedded Python in a multi-platform C++ project.
This required linking to a libpython, which needs to be provided for each platform I'm targeting. For OSX it was easy, I just pulled it out of some homebrew folder.
But I would like my Python scripts to use imports from the standard library (e.g. this one)
What is that going to involve?
Standard Library documentation for Python 3 says that the standard library is a mix of compiled units and .py files, so I'm expecting I will have to maybe link my project against a second library, and somehow inform the Python runtime of the location of the folder containing the standard library's .py files.
But is it really going to be this simple? Is this process documented anywhere?
Am I going to run into trouble on mobile platforms? It looks as though Kivy might be on their way towards solving this problem...
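On the point about telling the runtime where the standard library's .py files live: a minimal sketch, run inside a regular (non-embedded) interpreter of the same version, that prints the directories you would need to ship. The function name stdlib_locations is made up for illustration. An embedding application would typically bundle copies of these directories and point the runtime at them, for example through the PYTHONHOME environment variable.

```python
import sys
import sysconfig


def stdlib_locations():
    """Return the directories this interpreter loads the standard library from.

    "stdlib" holds the pure-Python .py files; "platstdlib" is the
    platform-specific counterpart, where compiled modules usually sit.
    """
    return {
        "stdlib": sysconfig.get_path("stdlib"),
        "platstdlib": sysconfig.get_path("platstdlib"),
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    for name, path in stdlib_locations().items():
        print(name, "->", path)
```

The exact layout differs per platform, so treat the output as a starting point for what to bundle rather than a fixed recipe.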

Insert Image into PPT using Standard Libraries

I know this is possible using additional libraries such as win32com or python-pptx, but I was wondering if anyone knows of a way to insert an image into a PowerPoint slide using only the standard library. Lots of googling has indicated that the best solution is probably win32com, but since I cannot guarantee that every system this script will be deployed to will have win32com, I am looking for an implementation that uses only libraries every system with a standard Python 2.7 install will have.
It is probably possible to modify a .pptx file with the standard library without much effort: this newer generation of files is meant to be zip-compressed XML plus external image files, and can be handled with the zipfile module and the standard XML parsers.
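A rough sketch of the first half of that idea, using only the standard zipfile module: list the parts inside a .pptx and write a copy of the package with one extra image file added. The function names and the ppt/media/ target folder are assumptions for illustration. This alone does not make the picture show up on a slide; for that you would still have to add a relationship entry for the slide, a picture element in the slide XML, and a content type for the image extension.

```python
import zipfile


def list_parts(pptx_path):
    """Return the names of all files stored inside the .pptx package."""
    with zipfile.ZipFile(pptx_path) as archive:
        return archive.namelist()


def copy_with_media(src_path, dst_path, media_name, image_bytes):
    """Copy the package to dst_path, adding image_bytes as ppt/media/<media_name>.

    Zip archives cannot be edited in place, so every existing entry is
    copied into a fresh archive before the new file is appended.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            dst.writestr(item, src.read(item.filename))
        dst.writestr("ppt/media/" + media_name, image_bytes)
```

Usage would look like copy_with_media("deck.pptx", "deck_out.pptx", "image1.png", open("logo.png", "rb").read()), with the file names here being placeholders.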
Legacy .ppt files, however, are a closed binary format with little documentation and hundreds of corner cases. It would always "be possible" to change them, since they are still just bytes, but it would take considerable effort.
That said, starting with Python 3.4, the package installer pip is included by default with the language install. Probably the best way to go would be to script the installation of external libraries using the built-in pip; that way one would not have to avoid external libraries altogether.
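Scripting the install could look roughly like this. It is a sketch for Python 3, assuming pip is available to the running interpreter and the machine has network access; ensure_module and pip_install_command are made-up helper names, not part of any library.

```python
import importlib
import subprocess
import sys


def pip_install_command(package):
    """Build the command that installs a package with this interpreter's pip."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_module(module_name, package=None):
    """Import module_name, installing it with pip first if it is missing.

    package is the name on the package index when it differs from the
    import name (python-pptx, for instance, is imported as pptx).
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        subprocess.check_call(pip_install_command(package or module_name))
        # Let the import system notice the freshly installed files.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

A deployment script would then start with something like pptx = ensure_module("pptx", package="python-pptx").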

How to read/write .sit files with Python in Linux

How to read/write a .sit archive using Python in Linux?
For dealing with older archive formats I tend to fall back on command line utilities. You should be able to find sit manipulation tools such as this one:
http://ctan.binkerton.com/ctan.readme.php?filename=tools/unstuff/unsit.c
As to making them, I'd suggest using an alternative format. You probably have a specific purpose in mind, but it's a fairly outdated format and you'd be better off with ZIP or TAR.GZ.
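For the writing side, both suggested alternatives are covered by the standard library. A minimal sketch (the helper names are my own) that packs a list of files into a ZIP or a TAR.GZ archive, storing each file under its base name:

```python
import os
import tarfile
import zipfile


def make_zip(paths, zip_path):
    """Write the given files into a deflate-compressed ZIP archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=os.path.basename(path))


def make_tar_gz(paths, tar_path):
    """Write the given files into a gzip-compressed tar archive."""
    with tarfile.open(tar_path, "w:gz") as archive:
        for path in paths:
            archive.add(path, arcname=os.path.basename(path))
```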

Read content of RAR file into memory in Python

I'm looking for a way to read specific files from a rar archive into memory. Specifically they are a collection of numbered image files (I'm writing a comic reader). While I can simply unrar these files and load them as needed (deleting them when done), I'd prefer to avoid that if possible.
That all said, I'd prefer a solution that's cross platform (Windows/Linux) if possible, but Linux is a must. Just as importantly, if you're going to point out a library to handle this for me, please understand that it must be free (as in beer) or OSS.
See the rarfile module:
http://grue.l-t.ee/~marko/src/rarfile/README.html
http://pypi.python.org/pypi/rarfile/
https://github.com/markokr/rarfile
The real answer is that there isn't a library, and you can't make one. You can use rarfile, or you can use 7zip unRAR (which is less free than 7zip, but still free as in beer), but both approaches require an external executable. The license for RAR basically requires this, as while you can get source code for unRAR, you cannot modify it in any way, and turning it into a library would constitute illegal modification.
Also, solid RAR archives (the best compressed) can't be randomly accessed, so you have to unarchive the entire thing anyhow. WinRAR presents a UI that seems to avoid this, but really it's just unpacking and repacking the archive in the background.
The pyUnRAR2 library can extract files from RAR archives to memory (and disk if you want). It's available under the MIT license and simply wraps UnRAR.dll on Windows and unrar on Unix. Click "QuickTutorial" for usage examples.
On Windows, it is able to extract to memory (and not disk) with the (included) UnRAR.dll by setting a callback using RARSetCallback() and then calling RARProcessFile() with the RAR_TEST option instead of the RAR_EXTRACT option to avoid extracting any files to disk. The callback then watches for UCM_PROCESSDATA events to read the data. From the documentation for UCM_PROCESSDATA events: "Process unpacked data. It may be used to read a file while it is being extracted or tested without actual extracting file to disk."
On Unix, unrar can simply print the file to stdout, so the library just reads from a pipe connected to unrar's stdout. The unrar binary you need is the one that has the "p" for "Print file to stdout" command. Use "apt-get install unrar" to install it on Ubuntu.
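The Unix approach described above can be sketched without the library, assuming an unrar binary with the "p" command is on the PATH. The helper names are made up, and the -inul switch (silence unrar's own messages so only file data reaches stdout) is written from memory, so check it against the help output of your unrar.

```python
import subprocess


def unrar_print_command(archive_path, member_name):
    """Build the unrar call that prints one archive member to stdout."""
    # "p" prints the file to stdout; "-inul" is assumed to suppress messages.
    return ["unrar", "p", "-inul", archive_path, member_name]


def read_member(archive_path, member_name):
    """Return the bytes of one file in the archive without writing it to disk.

    Raises subprocess.CalledProcessError if unrar exits with an error.
    """
    return subprocess.check_output(unrar_print_command(archive_path, member_name))
```

For a comic reader, read_member("issue.cbr", "001.jpg") would give you the raw image bytes to hand to your image loader; both names are placeholders.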
It seems like the limitation that rarsoft imposes on derivative works is that you may not use the unrar source code to create a variation of the RAR COMPRESSION algorithm. From the context, it would appear that it's specifically allowing folks to use his code (modified or not) to decompress files, but you cannot use them if you intend to write your own compression code. Here is a direct quote from the license.txt file I just downloaded:
The UnRAR sources may be used in any software to handle RAR
archives without limitations free of charge, but cannot be used
to re-create the RAR compression algorithm, which is proprietary.
Distribution of modified UnRAR sources in separate form or as a
part of other software is permitted, provided that it is clearly
stated in the documentation and source comments that the code may
not be used to develop a RAR (WinRAR) compatible archiver.
Seeing as everyone seemed to just want something that would allow them to write a comic viewer capable of handling reading images from CBR (rar) files, I don't see why people think there's anything keeping them from using the provided source code.
RAR is a proprietary format; I don't think there are any public specs, so third-party tool and library support is poor to nonexistent.
You're much better off using ZIP; it's completely free, has an accurate public spec, the compression library is available everywhere (zlib is one of the most widely-deployed libraries in the world), and it's very easy to code for.
http://docs.python.org/library/zipfile.html
The free 7zip library is also able to handle RAR files.
Look at the Python "struct" module. You can then interpret the RAR file format directly in your Python program, allowing you to retrieve the content inside the RAR without depending on external software to do it for you.
EDIT: This is of course vanilla Python - there are alternatives which use third-party modules (as already posted).
EDIT 2: According to Wikipedia's article my answer would require you to have permission from the author.
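To give an idea of what the struct approach looks like, here is a small sketch that only identifies a RAR file and decodes one 7-byte block header; it does not decompress anything. The signature bytes and the header layout (CRC, type, flags, size, all little-endian) are taken from my reading of the older RAR 4.x format notes, so treat them as assumptions to verify. The function names are made up.

```python
import io
import struct

RAR4_SIGNATURE = b"Rar!\x1a\x07\x00"
RAR5_SIGNATURE = b"Rar!\x1a\x07\x01\x00"

# Assumed RAR 4.x block header: CRC (2 bytes), type (1), flags (2), size (2).
BLOCK_HEADER = struct.Struct("<HBHH")


def identify_rar(stream):
    """Return "rar4", "rar5", or None based on the first bytes of the stream."""
    magic = stream.read(len(RAR5_SIGNATURE))
    if magic.startswith(RAR5_SIGNATURE):
        return "rar5"
    if magic.startswith(RAR4_SIGNATURE):
        return "rar4"
    return None


def unpack_block_header(raw):
    """Decode the fixed-size fields at the start of a RAR 4.x block."""
    crc, block_type, flags, size = BLOCK_HEADER.unpack(raw[:BLOCK_HEADER.size])
    return {"crc": crc, "type": block_type, "flags": flags, "size": size}


if __name__ == "__main__":
    # The RAR 4.x signature is itself laid out like a block header.
    print(identify_rar(io.BytesIO(RAR4_SIGNATURE)))
    print(unpack_block_header(RAR4_SIGNATURE))
```

Walking the headers this way gets you file names and offsets; the compressed data itself still needs the RAR algorithm, which is where the licensing question from EDIT 2 comes in.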
