Detecting audio file integrity with Python

If I download an audio file from the web and something goes wrong during the download, how can I efficiently detect with Python that the audio file is incomplete?
There are some ideas, such as using the file command in linux:
file audio.mp4
But it recognizes that it's mp4:
audio.mp4: ISO Media, MPEG v4 system, version 2
Even mplayer detects the mp4 audio type, but fails when trying to play it. I don't think launching mplayer from Python and checking whether it failed is a scalable solution, though.
Here is a sample of broken file:
https://www.dropbox.com/s/5rpscb9r1xrrx4t/They
The sample above fails with mutagen and mp4file, causing them to hang indefinitely. It has to do with fileObject.tell().

There are many different audio file formats, and container formats for things that may or may not be audio files.
Fortunately, there are libraries that can handle a wide variety of different kinds of files. And there are Python wrappers for:
Portable command-line tools like ffmpeg and mplayer.
Portable libraries like libavcodec (what ffmpeg uses).
Platform-specific libraries like Core Audio or QuickTime or Windows Media.
If you're willing to use separate wrappers for separate file types, there are even more choices (e.g., libmp4v2 is great for MP4 files, but useless for anything else).
Of course there are huge tradeoffs—the more powerful libraries are often going to be more complex, or have more prerequisites. Do some searching at http://pypi.python.org/ to see what turns up; you should be able to find something that does everything you want.
For one really simple example, mp4file will attempt to parse any MPEG4 container. If it's incomplete, or has any invalid atoms, you'll get an exception. So, the check is just one line, mp4file.Mp4File(path). If it succeeds, it's complete; if it throws an exception, it's incomplete or invalid. But of course this will accept a complete MPEG4 video file, or MPEG4 with no audio or video in it, and it will reject a complete MP3, or even a complete M4A with one broken metadata tag.
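If you'd rather not depend on a parser for a quick first pass, the same idea can be roughed out with only the standard library. This is a minimal sketch, not what mp4file does internally: it assumes the usual MP4 layout of top-level atoms (a 4-byte big-endian size followed by a 4-byte type) and only checks that each declared size fits inside the file. The function name is made up for illustration.

```python
import os
import struct


def mp4_top_level_atoms_complete(path):
    """Return True if every top-level atom's declared size fits in the file.

    Assumption: the file starts with a top-level atom at offset 0. Nested
    atoms and codec data are not inspected.
    """
    file_size = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as f:
        while offset < file_size:
            f.seek(offset)
            header = f.read(8)
            if len(header) < 8:
                return False  # file ends in the middle of an atom header
            size, _atom_type = struct.unpack(">I4s", header)
            if size == 1:
                # A size of 1 means a 64-bit size follows the type field.
                extended = f.read(8)
                if len(extended) < 8:
                    return False
                size = struct.unpack(">Q", extended)[0]
                if size < 16:
                    return False
            elif size == 0:
                # A size of 0 means the atom runs to the end of the file.
                return True
            elif size < 8:
                return False  # invalid size; also prevents an endless loop
            if offset + size > file_size:
                return False  # atom claims more bytes than the file holds
            offset += size
    return file_size > 0
```

A truncated download typically fails the `offset + size > file_size` check, because the last atom it reached promises more data than arrived. A file can pass this and still be unplayable, so treat it as a cheap filter rather than a full validation.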

Related

How to convert wav with custom codec to mp3?

I've received a set of .wav files which are encoded with a custom codec. I wish to convert these files to .mp3 (or really any standard audio format that other software will recognize).
The codec has been provided to me as a .msi file, which allows me to install the codec in Windows. Running the installer creates a .acm file (in its own folder, not in the standard codec location which seems to be c:\windows\system32 ). Once I do so, Windows Media Player will play the audio in the files. No other software will play them (I've tried Audacity, VLC, QuickTime, ...).
I've tried to convert the files using ffmpeg, but it doesn't recognize the format either.
I've attempted the technique of burning the audio to a CD (with the intent of ripping the CD as mp3), but WMP gives an error when it attempts to burn them.
I'm proficient in Python and Javascript so I don't mind coding a solution if that's possible--I'm not really familiar with the theory of audio codecs. I looked at a couple python libraries but they didn't seem able to do anything with the .msi file I've been given.
Can anyone suggest a solution? Solutions might include:
How to point ffmpeg or Audacity or VLC to the installed codec so that it is able to play/convert the files?
Point me to some resources/libraries/code snippets for coding a converter based on the supplied .acm file?
How to convince WMP to save these files in some other format?

No download module to manipulate sound with MP3 in Python?

I'm looking to make an audio editor using Python for a project in which I'm not allowed to use modules that need to be downloaded (I can only do a simple import).
I want users to be able to upload a file (preferably in MP3 or some other format common to all operating systems) and be able to play it back and edit it. I also need to write out an MP3 file with the new audio.
Would this be feasible in Python 2.7 without outside modules?
EDIT: This will be hosted online if that makes any difference.
Would any of these audio modules help?
audioop is a built-in module to manipulate raw audio data. It requires a format such as .WAV but you could either convert that separately or in the program.
If you want to simply upload wav files to be used with audioop, consider looking at the wave module.
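As a rough illustration of editing WAV data with nothing but built-in modules, here is a minimal sketch that halves the volume of a file using wave and array. It assumes 16-bit PCM input, and the function name is made up for the example. It is written for Python 3; the same modules exist under 2.7, but a few calls differ there.

```python
import array
import sys
import wave


def halve_volume(src_path, dst_path):
    """Copy a 16-bit PCM WAV file, scaling every sample by one half."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        frames = src.readframes(params.nframes)

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Floor division keeps every value inside the 16-bit range.
    scaled = array.array("h", (s // 2 for s in samples))
    if sys.byteorder == "big":
        scaled.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(scaled.tobytes())
```

Reading and writing MP3 is a different matter: the wave module only understands WAV, so the MP3 encoding and decoding would still have to happen somewhere else.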

Best python solution to play ALL kinds of audio on Windows (and Linux)?

I'm trying to write some scripts to play some of my music collection with python. Finding python modules that will play ogg and mp3 is not a problem. However, I'm having repeated failures with aac-encoded m4a files from iTunes (not DRM). pygame's audio machinery doesn't support them, so I tried pymedia:
a = pymedia.player.Player()
a.start()
a.startPlayback("myM4a.m4a", format='aac')
I've tried several versions of the last line of code, including omitting the format argument, changing the files to mp4, etc. MP3s work fine, however.
pymedia even claims to support aac encoded files, but the project appears to have been abandoned anyway.
Is there a good, up to date, solution for playing ALL types of audio in python? What is used by existing python media centers/players?
I should add that I intend to use this primarily on windows, so windows support for the library is a must, but cross-platform would obviously be preferable.
You should look at the GStreamer API. It has plugins for many major audio types, is used by many audio players including Banshee and Rhythmbox, and runs on Linux, Windows and Mac. It has Python bindings as well as bindings for many other languages:
http://gstreamer.freedesktop.org/bindings/
MPlayer plays most known audio formats, and there's a Python wrapper for it:
http://code.google.com/p/python-mplayer/
And a list of audio codecs supported by MPlayer:
http://www.mplayerhq.hu/DOCS/codecs-status.html#ac

Quick way to validate and convert Audio Files with Python?

For a website I am developing in Django, I need users to be able to upload .wav or .aif files. I, of course, have to make sure these files really are what they pretend to be: audio files. The files are then served on the web page, where I need them to be either .ogg or .mp3.
While searching for a solution I stumbled across some fearsome possibilities, like using ctypes to handle external libraries. I also found, of course, PyMedia, which I cannot use because I develop on Mac OS X. And the Python Audio Tools provide a lot of functionality I do not need.
So far I can see a few possibilities that would satisfy me and are within reach of my programming capabilities:
1. Get PyMedia to run on Mac OS X
2. Find a way to use some modules of the Python Audio Tools without needing libcdio
3. Use Python's subprocess module to run the converters' command-line tools
As I have used none of these tools yet, I can't tell which would be the quickest way to solve my problem. If you Python audio gurus are out there, could you please share some thoughts? Or maybe you even have a fantastic 1-step-to-happiness solution?
Not strictly a pythonic answer, but perhaps take a look at sox which is a simple command line audio file converter. It can do resampling of audio files for you as well.
Check out the command line options of sox for details. This will of course involve calling the external program using the subprocess module (or some other method).
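Calling sox from Python could look roughly like this. It is a minimal sketch that assumes a sox executable is on the PATH; the two function names are made up for the example. In its simplest form sox takes an input file and an output file and picks the formats from the file extensions.

```python
import subprocess


def build_sox_command(src_path, dst_path):
    """Build the argument list for a plain format conversion."""
    # sox infers the input and output formats from the extensions.
    return ["sox", src_path, dst_path]


def convert_with_sox(src_path, dst_path):
    """Run sox and raise subprocess.CalledProcessError if it fails."""
    subprocess.run(build_sox_command(src_path, dst_path), check=True)
```

Passing the arguments as a list avoids going through a shell, which matters when the file names come from user uploads. A non-zero exit status on an upload can also serve as a crude check that the file is not readable audio. Whether .mp3 output works depends on how your sox build was compiled, so .ogg may be the easier target.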

Read content of RAR file into memory in Python

I'm looking for a way to read specific files from a rar archive into memory. Specifically they are a collection of numbered image files (I'm writing a comic reader). While I can simply unrar these files and load them as needed (deleting them when done), I'd prefer to avoid that if possible.
That all said, I'd prefer a solution that's cross platform (Windows/Linux) if possible, but Linux is a must. Just as importantly, if you're going to point out a library to handle this for me, please understand that it must be free (as in beer) or OSS.
See the rarfile module:
http://grue.l-t.ee/~marko/src/rarfile/README.html
http://pypi.python.org/pypi/rarfile/
https://github.com/markokr/rarfile
The real answer is that there isn't a library, and you can't make one. You can use rarfile, or you can use 7zip unRAR (which is less free than 7zip, but still free as in beer), but both approaches require an external executable. The license for RAR basically requires this, as while you can get source code for unRAR, you cannot modify it in any way, and turning it into a library would constitute illegal modification.
Also, solid RAR archives (the best compressed) can't be randomly accessed, so you have to unarchive the entire thing anyhow. WinRAR presents a UI that seems to avoid this, but really it's just unpacking and repacking the archive in the background.
The pyUnRAR2 library can extract files from RAR archives to memory (and disk if you want). It's available under the MIT license and simply wraps UnRAR.dll on Windows and unrar on Unix. Click "QuickTutorial" for usage examples.
On Windows, it is able to extract to memory (and not disk) with the (included) UnRAR.dll by setting a callback using RARSetCallback() and then calling RARProcessFile() with the RAR_TEST option instead of the RAR_EXTRACT option to avoid extracting any files to disk. The callback then watches for UCM_PROCESSDATA events to read the data. From the documentation for UCM_PROCESSDATA events: "Process unpacked data. It may be used to read a file while it is being extracted or tested without actual extracting file to disk."
On Unix, unrar can simply print the file to stdout, so the library just reads from a pipe connected to unrar's stdout. The unrar binary you need is the one that has the "p" for "Print file to stdout" command. Use "apt-get install unrar" to install it on Ubuntu.
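The Unix approach can be sketched without any wrapper library. This is only an illustration of the pipe idea, not pyUnRAR2's actual code; it assumes an unrar binary on the PATH, and the function names are hypothetical.

```python
import subprocess


def build_unrar_print_command(archive_path, member_name):
    """Build the unrar call that writes one archive member to stdout."""
    # "p" prints the file to stdout; "-inul" silences unrar's own messages
    # so that only the file's bytes come through the pipe.
    return ["unrar", "p", "-inul", archive_path, member_name]


def read_member(archive_path, member_name):
    """Return the member's contents as bytes, without touching the disk."""
    result = subprocess.run(
        build_unrar_print_command(archive_path, member_name),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

For a comic reader, something like `read_member("issue.cbr", "001.jpg")` would return the image bytes, ready to hand to whatever image library you use.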
It seems like the limitation that rarsoft imposes on derivative works is that you may not use the unrar source code to create a variation of the RAR COMPRESSION algorithm. From the context, it would appear that it's specifically allowing folks to use his code (modified or not) to decompress files, but you cannot use them if you intend to write your own compression code. Here is a direct quote from the license.txt file I just downloaded:
The UnRAR sources may be used in any software to handle RAR
archives without limitations free of charge, but cannot be used
to re-create the RAR compression algorithm, which is proprietary.
Distribution of modified UnRAR sources in separate form or as a
part of other software is permitted, provided that it is clearly
stated in the documentation and source comments that the code may
not be used to develop a RAR (WinRAR) compatible archiver.
Seeing as everyone seemed to just want something that would allow them to write a comic viewer capable of handling reading images from CBR (rar) files, I don't see why people think there's anything keeping them from using the provided source code.
RAR is a proprietary format; I don't think there are any public specs, so third-party tool and library support is poor to non-existent.
You're much better off using ZIP; it's completely free, has an accurate public spec, the compression library is available everywhere (zlib is one of the most widely-deployed libraries in the world), and it's very easy to code for.
http://docs.python.org/library/zipfile.html
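For the comic-reader case, reading images straight into memory with zipfile takes only a few lines. This is a minimal sketch; the function name and the list of extensions are assumptions for the example.

```python
import zipfile


def read_pages(archive, extensions=(".jpg", ".jpeg", ".png", ".gif")):
    """Return a list of (name, bytes) pairs for the image members.

    `archive` may be a path or a file-like object. Nothing is written
    to disk.
    """
    pages = []
    with zipfile.ZipFile(archive) as zf:
        for name in sorted(zf.namelist()):
            if name.lower().endswith(extensions):
                pages.append((name, zf.read(name)))
    return pages
```

The plain sort gives the right page order only if the file names are zero-padded (001.jpg, 002.jpg, ...); otherwise you'd want a numeric sort key.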
The free 7zip library is also able to handle RAR files.
Look at the Python "struct" module. You can then interpret the RAR file format directly in your Python program, allowing you to retrieve the content inside the RAR without depending on external software to do it for you.
EDIT: This is of course vanilla Python - there are alternatives which use third-party modules (as already posted).
EDIT 2: According to Wikipedia's article my answer would require you to have permission from the author.
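As a small first step in that direction, recognising a RAR file by its leading signature bytes needs nothing beyond plain file reads. This is a minimal sketch that assumes the signature sits at offset 0 (which would not hold for self-extracting archives), and it does not attempt any decompression; the function name is made up.

```python
# Signatures used by the two generations of the format.
RAR4_SIGNATURE = b"Rar!\x1a\x07\x00"
RAR5_SIGNATURE = b"Rar!\x1a\x07\x01\x00"


def rar_version(path):
    """Return 5, 4, or None depending on the file's leading bytes."""
    with open(path, "rb") as f:
        head = f.read(len(RAR5_SIGNATURE))
    if head.startswith(RAR5_SIGNATURE):
        return 5
    if head.startswith(RAR4_SIGNATURE):
        return 4
    return None
```

Everything past this point (block headers, and above all the compressed data itself) is where the real work and the licensing question begin.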
