I'm looking for a way to read specific files from a rar archive into memory. Specifically they are a collection of numbered image files (I'm writing a comic reader). While I can simply unrar these files and load them as needed (deleting them when done), I'd prefer to avoid that if possible.
That all said, I'd prefer a solution that's cross platform (Windows/Linux) if possible, but Linux is a must. Just as importantly, if you're going to point out a library to handle this for me, please understand that it must be free (as in beer) or OSS.
See the rarfile module:
http://grue.l-t.ee/~marko/src/rarfile/README.html
http://pypi.python.org/pypi/rarfile/
https://github.com/markokr/rarfile
The real answer is that there isn't a library, and you can't make one. You can use rarfile, or you can use 7zip unRAR (which is less free than 7zip, but still free as in beer), but both approaches require an external executable. The license for RAR basically requires this, as while you can get source code for unRAR, you cannot modify it in any way, and turning it into a library would constitute illegal modification.
Also, solid RAR archives (the best compressed) can't be randomly accessed, so you have to unarchive the entire thing anyhow. WinRAR presents a UI that seems to avoid this, but really it's just unpacking and repacking the archive in the background.
The pyUnRAR2 library can extract files from RAR archives to memory (and disk if you want). It's available under the MIT license and simply wraps UnRAR.dll on Windows and unrar on Unix. Click "QuickTutorial" for usage examples.
On Windows, it is able to extract to memory (and not disk) with the (included) UnRAR.dll by setting a callback using RARSetCallback() and then calling RARProcessFile() with the RAR_TEST option instead of the RAR_EXTRACT option to avoid extracting any files to disk. The callback then watches for UCM_PROCESSDATA events to read the data. From the documentation for UCM_PROCESSDATA events: "Process unpacked data. It may be used to read a file while it is being extracted or tested without actual extracting file to disk."
On Unix, unrar can simply print the file to stdout, so the library just reads from a pipe connected to unrar's stdout. The unrar binary you need is the one that has the "p" for "Print file to stdout" command. Use "apt-get install unrar" to install it on Ubuntu.
It seems like the limitation that rarsoft imposes on derivative works is that you may not use the unrar source code to create a variation of the RAR COMPRESSION algorithm. From the context, it would appear that it's specifically allowing folks to use his code (modified or not) to decompress files, but you cannot use them if you intend to write your own compression code. Here is a direct quote from the license.txt file I just downloaded:
The UnRAR sources may be used in any software to handle RAR
archives without limitations free of charge, but cannot be used
to re-create the RAR compression algorithm, which is proprietary.
Distribution of modified UnRAR sources in separate form or as a
part of other software is permitted, provided that it is clearly
stated in the documentation and source comments that the code may
not be used to develop a RAR (WinRAR) compatible archiver.
Seeing as everyone seemed to just want something that would allow them to write a comic viewer capable of handling reading images from CBR (rar) files, I don't see why people think there's anything keeping them from using the provided source code.
RAR is a proprietary format; I don't think there are any public specs, so third-party tool and library support is poor to non-existant.
You're much better off using ZIP; it's completely free, has an accurate public spec, the compression library is available everywhere (zlib is one of the most widely-deployed libraries in the world), and it's very easy to code for.
http://docs.python.org/library/zipfile.html
The free 7zip library is also able to handle RAR files.
Look at the Python "struct" module. You can then interpret the RAR file format directly in your Python program, allowing you to retrieve the content inside the RAR without depending on external software to do it for you.
EDIT: This is of course vanilla Python - there are alternatives which use third-party modules (as already posted).
EDIT 2: According to Wikipedia's article my answer would require you to have permission from the author.
Related
I know similar questions have been popular in the past, but none refers to my problem. I'm looking for a way to read data from Excel file in Python, but I'm strongly against using non-builtin modules.
The reason why is that in my case Python is a component of another software, so incorporating additional module would require from every user knowledge about how to use pip, which Python installation on your pc should one install module into, etc. The solution must not require any additional actions from user.
I can read CSV files with Python builtin easily, so that could work, but how can I convert Excel to CSV in the first place? Or is there a way to read Excel directly?
Edit: It is Python 2, that is used in this software.
Edit2:
Anyone minds explaining the downvote? I think this isn't a question about a ready solution or module, but rather a method and is well detailed. It is not always possible to use external modules, so this is an actual problem. If it is not possible at all though, then I would simply expect an answer instead of -1.
Not really the prettiest solution, but you could download the complete code repository of one of the excel handling packages for python (openpyxl for example) and put these files in the same directory as the python script that you're going to run. Subsequently you can do an import of these local package files in your script.
Note: if the excel handling package has dependencies on other packages, then you'll need to download these as well.
I know this is possible to do using additional libraries such as win32com or python-pptx, but I wasn wondering if anyone knew of a way to insert an image into a powerpoint slide using the standard libraries. Lots of googling has indicated that the best solution is probably win32com, but since I can guarantee that every system this script will be deployed to will have win32com, I am looking for an implemention leveraging libraries all systems with a standard python 2.7 install will have.
It is probably possible to modify a .pptx file with the standard library without much effort: these new generation of files are meant to be zip-compressed XML + external images files, and can be handled by ziplib and standard xml parsers.
Legacy .ppt files however are a binary closed format, with little documentation, and hundrededs of corner cases. It would alwasys "be possible" to change them, since they are still just bytes, but it would take considerable effort.
That said, starting with Python 3.4, the Python installer "PIP" comes default with the language install: probably the best way to go would be to script the installation of external libraries based on the built-in PIP - that way one would not have to all external library usage.
For a website i am developing in django i need users to be able to upload .wav or .aif files. I, of course, have to make sure these files really are what they pretend to be - audiofiles. The files then are provided on the webpage, where i need them to be either .ogg or .mp3
While searching for a solution i stumbled across some fearsome possibilities, like using ctypes to handle external libraries. I also found, of course, PyMedia, which i cannot use because i develop on MacOSX. And the python audio tools provide a lot of functionality i do not need.
So far i can see a few possibilities that would satisfy me and are within reach of my programming capabilities:
1 Get PyMedia to run on MacOSX
2 Find a way to use some modules of the python audio tools without the need to use libcdio
3 use python subprocess to run the command line tools of the converters
As i have used none of those tools yet, i can't tell which would possibly be the quickest way to solve my problem. If you Python-Audio-Gurus are out there, could you please share some thoughts? Or maybe you even have a fantastic 1-step-to-happiness solution?
Not strictly a pythonic answer, but perhaps take a look at sox which is a simple command line audio file converter. It can do resampling of audio files for you as well.
Check out the command line options of sox for details. This will of course involve calling the external program using the subprocess module(or other method).
I'm wondering if it wouldn't be a better if Python would store the compiled code in a file stream of the original source file. This would work on file systems supporting forks/data-streams, and fall-back if this is not possible.
On Windows using ADS (Alternative Data Streams)
On OS X using resource forks
On Linux using extended file attributes if compiled file is under 32k
Doing this will solve the problem of polluting the source tree or having problems like after the removal of a .py the .pyc remained and was loaded and used.
What do you think about this, sounds like a good idea or not? What issues to do see.
You sure do sacrifice an awful lot of portability this way -- right now .pyc files are uncommonly portable (often used by heterogeneous systems on a LAN through some kind of network file system arrangement, for example, though I've never been a fan of the performance characteristics of that approach), while your approach would only work on very specific filesystems and (I suspect) never across a network mount on heterogenous machines.
So, it would be a dire mistake to make the behavior you want the default one -- but it would surely be neat to have it as an option available for specific request if your deployment environment doesn't care about all of the above issues and does care about some of those you mention. Another "cool option to have", that I would actually use about 100 times more often, is to put the .pyc "files" in a database instead of having them in filesystems.
The cool thing is that this is (relatively) easily accomplished as an add-on "import hack" one way or another (depending on Python versons) -- most easily in recent-enough versions with importlib, Brett Cannon's masterpiece (but that might make backporting to older Python versions harder than other ways... too much depends on exactly what versions you need to support, a detail which I don't see in your Q, so I won't go into the implementation details, but the general idea doesn't change much across implementations).
One problem I forsee is that it then means that each platform has different behaviour.
The next is that not every filesystem OS X supports also supports resource forks (and the way it stores them in non-hfs filesystems is universally hated by everyone else: ._ )
Having said that, I have often been bitten by a .pyc file being used by apache because the apache process can't read the .py file I have replaced. But I think that this is not the solution: a better deployment process is ;)
How to read/write a .sit archive using Python in Linux?
For dealing with older library formats I tend to fall back on command line utilities. You should be able to find sit manipulation tools such as this one:
http://ctan.binkerton.com/ctan.readme.php?filename=tools/unstuff/unsit.c
As to making them, I'd suggest using an alternative format. You probably have a specific purpose in mind, but it's a fairly outdated format and you'd be better off with ZIP or TAR.GZ.