Python Audio Edit

I am searching for a way to write a simple Python program to perform an automatic edit on an audio file.
I have already written automatic picture resizing to a predefined size with PIL.
I would like to write the same for automatically re-encoding a file to a predefined bitrate.
Similarly, I would like to write a Python program that can stretch an audio file and re-encode it.
Do I have to parse MP3s myself, or is there a library that can be used for this?

Rather than doing this natively in Python, I strongly recommend leaving the heavy lifting up to FFMPEG, by executing it from your script.
It can chop, encode, and decode just about anything you throw at it. You can find a list of common parameters here: http://howto-pages.org/ffmpeg/
This way, you can leave your Python program to figure out the logic of what you want to cut and where, and not spend a decade writing code to deal with all of the audio formats available.
If you don't like the idea of directly executing it, there is also a Python wrapper available for FFMPEG.
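As a minimal sketch of the "execute it from your script" approach, the helper below only builds an ffmpeg argument list for re-encoding to a fixed audio bitrate and, optionally, changing the tempo. The function names are hypothetical, and it assumes an ffmpeg build that accepts the `-b:a` option and the `atempo` audio filter; check the options against your installed version.

```python
import subprocess


def build_ffmpeg_command(src, dst, bitrate_kbps=None, tempo=None):
    """Build an ffmpeg argument list (hypothetical helper, not part of ffmpeg)."""
    cmd = ["ffmpeg", "-y", "-i", src]  # -y: overwrite the output file if it exists
    if tempo is not None:
        # atempo changes playback speed without changing pitch
        cmd += ["-filter:a", "atempo=%s" % tempo]
    if bitrate_kbps is not None:
        cmd += ["-b:a", "%dk" % bitrate_kbps]  # target audio bitrate
    cmd.append(dst)
    return cmd


def run_ffmpeg(cmd):
    """Run the command and raise CalledProcessError if ffmpeg fails."""
    subprocess.run(cmd, check=True)
```

For example, `run_ffmpeg(build_ffmpeg_command("in.mp3", "out.mp3", bitrate_kbps=64))` would re-encode to 64 kbps, provided ffmpeg is on the PATH.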

There is pydub. It's an easy to use library.

Related

What is the Python equivalent of Lame MP3 Converter?

I need to convert mp3 audio files to 64kbps on the server side.
Right now, I am using subprocess to call lame, but I wonder if there are any good alternatives?

There seems to be a slightly old thread on that topic here: http://www.dreamincode.net/forums/topic/72083-lame-mp3-encoder-for-python/
The final conclusion was to create a custom binding to lame_enc.dll via Python->C bindings.
The reason for that conclusion was that the existing binding libraries (pymedia/py-lame) have not been maintained.
Unfortunately the guy didn't get it to work :)
Maybe you should continue to use subprocess. You could take advantage of that choice, abstract your encoding at a slightly higher level, and reuse the code/strategy to optionally execute other command line encoding tools (such as ogg or shn tools).
I've seen several audio ripping tools adopt that strategy.
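One way to read the "abstract your encoding at a slightly higher level" suggestion is a small registry that maps an output format to a function building the command line. This is only a sketch: the function and dictionary names are hypothetical, and it assumes the `lame` and `oggenc` command-line tools with their `-b` (bitrate in kbps) options; `oggenc` in particular expects uncompressed input such as WAV.

```python
import subprocess


def lame_command(src, dst, bitrate_kbps):
    # lame takes the bitrate in kbps via -b, then input and output paths
    return ["lame", "-b", str(bitrate_kbps), src, dst]


def oggenc_command(src, dst, bitrate_kbps):
    # oggenc takes a nominal bitrate via -b and the output path via -o
    return ["oggenc", "-b", str(bitrate_kbps), src, "-o", dst]


# Registry of command builders, keyed by output format (names are illustrative)
ENCODERS = {"mp3": lame_command, "ogg": oggenc_command}


def build_command(fmt, src, dst, bitrate_kbps):
    """Look up the builder for fmt and return its argument list."""
    try:
        builder = ENCODERS[fmt]
    except KeyError:
        raise ValueError("no encoder registered for %r" % fmt)
    return builder(src, dst, bitrate_kbps)


def encode(fmt, src, dst, bitrate_kbps):
    """Run the external encoder; raises CalledProcessError on failure."""
    subprocess.run(build_command(fmt, src, dst, bitrate_kbps), check=True)
```

Adding another tool then means registering one more builder rather than touching the calling code.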

I've been working with Python Audio Tools, which is capable of converting between different audio formats.
I've already used it to convert .wav files into mp3, .flac and .m4a.

If you want to use LAME to encode your MP3s (and not PyMedia), you can always use ctypes to wrap the lame encoder DLL (or .so if you are on Linux). The exact wrapper code you'll use is going to be tied to the LAME DLL version (and there are many of these flying around, unfortunately), so I can't really give you any example, but the ctypes docs should be clear enough about wrapping DLLs.
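To give a feel for the ctypes route without committing to a full encoder wrapper, here is a small sketch that only loads the library and asks for its version string. It assumes the shared library can be found under the name `mp3lame` and that it exports `get_lame_version` (declared in `lame.h`); on Windows the DLL name would probably have to be passed to `ctypes.CDLL` directly. The helper name `lame_version` is made up for this example.

```python
import ctypes
import ctypes.util


def lame_version():
    """Return the LAME version string, or None if the library is unavailable."""
    # find_library resolves e.g. "mp3lame" to libmp3lame.so.0 on Linux
    path = ctypes.util.find_library("mp3lame")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        get_version = lib.get_lame_version
    except (OSError, AttributeError):
        # library could not be loaded, or the symbol is missing in this build
        return None
    get_version.restype = ctypes.c_char_p  # the C function returns const char*
    get_version.argtypes = []
    return get_version().decode("ascii", "replace")
```

Wrapping the actual encoding calls follows the same pattern (declare `restype` and `argtypes`, then call), but the details depend on the LAME version you link against.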

Caveat: relatively new programmer here and I haven't had a need to convert audio files before.
However, if I understand correctly what you mean by server-side, you might be looking for a good approach to managing mass conversions, and your interest in a Python solution might be, in part, to better manage resource use or to integrate it into your processing chain. I had a similar problem/goal, which I resolved using a mix of Merlyn's recommendation and Celery. I don't use django-celery, but if this is for a Django-based project, that might appeal to you as well. You can find out more about Celery here:
http://celeryproject.org/community.html
http://ask.github.com/celery/getting-started/introduction.html
Depending on what you have set up already, there may be a little upfront time needed to get going. To take advantage of everything you'll need rabbitmq/erlang installed, but if you follow the guides on the sites above, it's pretty quick now.
Here's an example of how I use celery with subprocess to address a similar issue. Similar to the poster's suggestion above, I use subprocess to call ffmpeg, which is as good as it gets for video tools, and probably would actually be as good as it gets for audio tools too. I'm including a bit more than necessary here to give you a feel for how you might configure your own a little.
# Example of configuring an option: here I'm selecting how much I want to adjust
# the bitrate based on my input's format.
def generate_command_line_method(self):
    if self.bitrate:
        compression_dict = {'.mp4': 1.5, '.rm': 1.5, '.avi': 1.2,
                            '.mkv': 1.2, '.mpg': 1, '.mpeg': 1}
        # Unlisted extensions fall back to a factor of 1 (an assumption).
        compression_factor = compression_dict.get(self.ext.lower(), 1)
        # Making a list to send to the command line through subprocess.
        # Note: outpath is assumed to be defined elsewhere.
        ffscript = ['ffmpeg',
                    '-i', self.fullpath,
                    '-b', str(self.bitrate * compression_factor),
                    '-qscale', '3',  # quality factor, based on trial and error
                    '-g', '90',      # I-frame roughly every 3 seconds
                    '-intra',
                    outpath
                    ]
        return ffscript
# The celery side of things: I'd have a celeryconfig.py file in the
# same directory as the script that points to the following function, so my task
# queue would know the specifics of the function I'll call through it. You can
# see example configs on the sites above, but it's basically just going to be
# a tuple that says, here are the modules I want you to look in, celery, e.g.
# CELERY_MODULES = ("exciting_asynchronous_module.py",). This file then contains:
from celery.decorators import task
from mymodule import myobject
from subprocess import Popen

@task(time_limit=600)  # say, for example, 10 mins
def run_ffscript(ffscript):
    # Note: we wait because we don't want to compound the asynchronous aspect
    # (we don't want celery to launch the subprocess and think it has finished).
    some_result = Popen(ffscript).wait()
    return some_result
# Then I start up celery/rabbitmq and go into my interactive shell (ipython shown).
# I'll have some generator feeding these ffscript command lines, then process them
# with something like:
In [1]: for generated_ffscript in generator:
   ...:     run_ffscript.delay(generated_ffscript)
Let me know if this was useful to you. I'm relatively new to answering questions here and not sure if my attempts are helpful or not. Good luck!

Well, Gstreamer has the "ugly plugin" lamemp3enc and there are python bindings for Gstreamer (gst-python 1.2, supports python 3.3). I haven't tried going this route myself so I'm not really in a position to recommend anything... Frankly, a subprocess solution seems a lot simpler, if not "cleaner", to me.

python library to compare a .wav with mic input?

I need a Python library or module or whatever that can compare a .wav file to microphone input without too much code. Sample code would be cool too. TY!

I don't know if there's a library already for that, but if you have the microphone input as a WAV file as well, you can use the wave and audioop modules and write it yourself.
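A rough sketch of the "write it yourself" idea, assuming the microphone input has already been saved as a 16-bit PCM WAV file. Because `audioop` was removed from the standard library in Python 3.13, this version uses only `wave`, `struct`, and `math`. The function names are made up for the example, and the root-mean-square difference is just one simple similarity measure; it is very sensitive to timing offsets and volume.

```python
import math
import struct
import wave


def read_samples(path):
    """Return the 16-bit PCM samples of a WAV file as a list of ints."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wav.readframes(wav.getnframes())
    # "<h" = little-endian signed 16-bit, one entry per sample
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))


def rms_difference(path_a, path_b):
    """Root-mean-square difference over the overlapping part of two files."""
    a = read_samples(path_a)
    b = read_samples(path_b)
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    total = sum((x - y) ** 2 for x, y in zip(a[:n], b[:n]))
    return math.sqrt(total / n)
```

A result of 0.0 means the overlapping samples are identical; real recordings will almost never get there, so in practice you would compare against a threshold chosen by experiment.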

If you're trying to compare something like, what words people are saying, this would have to be a fairly complex piece of code. You could directly compare them at a frequency/wave level, but you'll very rarely if ever get a match.

Using Imagemagick without making files?

I'm working in Python to create images from text. I've already been back and forth with PIL and frankly, its font and alignment options need a lot of work.
I can subprocess Imagemagick and it works great, except that it seems to always need to write a file to disk. I would like to subprocess the image creation and just get the data returned to Python, keeping everything in memory.
I've looked into a number of supposed Python wrappers for ImageMagick, but they're all hopelessly years out of date or not documented whatsoever. Even searching extensively on SO doesn't seem to clearly point to a de facto way to use ImageMagick with Python. So I think going for subprocessing is the best way forward.

convert and the other ImageMagick commands can output image data to stdout if you specify format:- as the output file. You can capture that output in Python using the subprocess module.
For instance:
import subprocess
cmd = ["convert", "test.bmp", "jpg:-"]
output_stream = subprocess.Popen(cmd, stdout=subprocess.PIPE).stdout
image_data = output_stream.read()  # the JPEG bytes, kept in memory

It would be a lot more work than piping data to ImageMagick, but there are several Pango-based solutions. I used pango and pygtk a while back, and I am pretty sure you could develop a headless gtk or gdk application to render text to a pixbuf.
A simpler solution might be to use the Python cairo bindings.
Pango works at a pretty low level, so simple stuff can be a lot more complicated, but rendering quality is hard to beat, and it gives you a lot of fine grained control over the layout.

Python library to modify MP3 audio without transcoding

I am looking for some general advice about the mp3 format before I start a small project to make sure I am not on a wild-goose chase.
My understanding of the internals of the mp3 format is minimal. Ideally, I am looking for a library that would abstract those details away. I would prefer to use Python (but could be convinced otherwise).
I would like to modify a set of mp3 files in a fairly simple way. I am not so much interested in the ID3 tags but in the audio itself. I want to be able to delete sections (e.g. drop 10 seconds from the 3rd minute), and insert sections (e.g. add credits to the end.)
My understanding is that the mp3 format is lossy, and so decoding it to (for example) PCM format, making the modifications, and then encoding it again to MP3 will lower the audio quality. (I would love to hear that I am wrong.)
I conjecture that if I stay in mp3 format, there will be some sort of minimum frame or packet-size to deal with, so the granularity of the operations may be coarser. I can live with that, as long as I get an accuracy of within a couple of seconds.
I have looked at PyMedia, but it requires me to migrate to PCM to process the data. Similarly, LAME wants to help me encode, but not access the data in place. I have seen several other libraries that only deal with the ID3 tags.
Can anyone recommend a Python MP3 library? Alternatively, can you disabuse me of my assumption that going to PCM and back is bad and avoidable?

If you want to do things low-level, use pymad. It turns MP3s into a buffer of sample data.
If you want something a little higher-level, use the Echo Nest Remix API (disclosure: I wrote part of it for my dayjob). It includes a few examples. If you look at the cowbell example (i.e., MoreCowbell.dj), you'll see a fork of pymad that gives you a NumPy array instead of a buffer. That datatype makes it easier to slice out sections and do math on them.

I got three quality answers, and I thank you all for them. I haven't chosen any as the accepted answer, because each addressed one aspect, so I wanted to write a summary.
Do you need to work in MP3?
Transcoding to PCM and back to MP3 is unlikely to result in a noticeable drop in quality.
Don't optimise audio-quality prematurely; test it with a simple prototype and listen to it.
Working in MP3
Wikipedia has a summary of the MP3 File Format.
MP3 frames are short (1152 samples, or about 26 ms at 44.1 kHz), allowing for moderate precision at that level.
However, Wikipedia warns that "Frames are not independent items ("byte reservoir") and therefore cannot be extracted on arbitrary frame boundaries."
Existing libraries are unlikely to be of assistance, if I really want to avoid decoding.
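The frame-granularity point can be checked with a little arithmetic. The sketch below assumes 1152 samples per frame (the MPEG-1 Layer III value quoted above; MPEG-2 Layer III uses 576) and uses hypothetical function names.

```python
import math

SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def frame_duration_ms(sample_rate_hz):
    """Duration of one frame in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


def frames_covering(seconds, sample_rate_hz):
    """Number of whole frames needed to cover a span of audio."""
    return math.ceil(seconds * sample_rate_hz / SAMPLES_PER_FRAME)
```

At 44.1 kHz a frame lasts roughly 26 ms, so dropping 10 seconds means removing 383 frames, which is well within the "couple of seconds" accuracy asked for in the question.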
Working in PCM
There are several libraries at this level:
LAME (latest release: October 2017)
PyMedia (latest release: February 2006)
PyMad (Linux only? Decoder only? Latest release: January 2007)
Working at a higher level
Echo Nest Remix API (Mac or Linux only, at the moment) is an API to a web-service that supports quite sophisticated operations (e.g. finding the locations of music beats and tempo, etc.)
mp3DirectCut (Windows only) is a GUI that apparently performs the operations I want, but as an app. It is not open-source. (I tried to run it, got an Access Denied installer error, and didn't follow up. A GUI isn't suitable for me, as I want to repeatedly run these operations on a changing library of files.)
My plan is now to start out in PyMedia, using PCM.

MP3 is lossy, but it is lossy in a very specific way. The algorithms used are designed to discard certain parts of the audio which your ears are unable to hear (or find very difficult to hear). Re-doing the compression process at the same level of compression over and over is likely to yield nearly identical results for a given piece of audio. However, some additional losses may slowly accumulate. If you're going to be modifying files a lot, this might be a bad idea. It would also be a bad idea if you were concerned about quality, but then using MP3 if you are concerned about quality is a bad idea overall.
You could construct a test using an encoder and a decoder to re-encode a few different MP3 files a few times and watch how they change. This could help you determine the rate of deterioration and figure out if it is acceptable to you. It sounds like you already have libraries you could use to run this simple test.
MP3 files are composed of "frames" of audio, so it should be possible, with some effort, to remove entire frames with minimal processing (remove the frame, update some minor details in the file header). I believe frames are pretty short (a few tens of milliseconds each), which would give the precision you're looking for. So doing some reading on the MP3 file format should give you enough information to code your own Python library to do this. This is a fair bit different from traditional "audio processing" (since you don't care about precision), so you're unlikely to find an existing library that does this. Most, as you've found, will decompress the audio first so you can have complete fine-grained control.
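To make the frame-level idea concrete, here is a sketch of parsing a single 4-byte frame header to learn how long the frame is, which is the first step in walking through (or skipping) frames. It handles only MPEG-1 Layer III, ignores ID3 tags, free-format streams, and the byte-reservoir dependency between frames, and uses made-up names; treat it as an illustration of the header layout, not a working cutter.

```python
# Bitrate (kbps) and sample-rate tables for MPEG-1 Layer III; None marks
# "free format" or reserved index values, which this sketch rejects.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def parse_frame_header(header):
    """Return (bitrate_kbps, sample_rate_hz, frame_length_bytes)."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    b0, b1, b2 = header[0], header[1], header[2]
    # The header starts with 11 set bits (the frame sync)
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        raise ValueError("no frame sync")
    version = (b1 >> 3) & 0x03  # 0b11 means MPEG-1
    layer = (b1 >> 1) & 0x03    # 0b01 means Layer III
    if version != 0b11 or layer != 0b01:
        raise ValueError("only MPEG-1 Layer III is handled here")
    bitrate = BITRATES_KBPS[(b2 >> 4) & 0x0F]
    sample_rate = SAMPLE_RATES_HZ[(b2 >> 2) & 0x03]
    if bitrate is None or sample_rate is None:
        raise ValueError("unsupported bitrate or sample-rate index")
    padding = (b2 >> 1) & 0x01
    # 1152 samples per frame / 8 bits per byte = 144
    frame_length = 144 * bitrate * 1000 // sample_rate + padding
    return bitrate, sample_rate, frame_length
```

A 128 kbps, 44.1 kHz frame without padding comes out at 417 bytes, so the next header would be expected 417 bytes after the current one.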

Not a direct answer to your needs, but check the mp3DirectCut software that does what you want (as a GUI app). I think that the source code is available, so even if you don't find a library, you could build one of your own, or build a python extension using code from mp3DirectCut.

As for removing or extracting mp3 segments from an mp3 file while staying in the MP3 domain (that is, without conversion to PCM format and back), there is also the open source package PyMp3Cut.
As for splicing MP3 files together (e.g. adding 'Credits' to the end or beginning of an MP3 file), I've found you can simply concatenate the MP3 files, provided that the files have the same sampling rate (e.g. 44.1 kHz) and the same number of channels (e.g. both are stereo or both are mono).
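The concatenation trick amounts to a plain byte-level copy. A minimal sketch, with a hypothetical function name, assuming the inputs already share a sampling rate and channel count as described above; any ID3 tags in the later files end up in the middle of the output, which most players seem to tolerate but which is not guaranteed.

```python
import shutil


def concatenate_mp3(paths, out_path):
    """Append the raw bytes of each input file to out_path, in order."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # streamed copy, no decoding
```

For example, `concatenate_mp3(["episode.mp3", "credits.mp3"], "episode_with_credits.mp3")` would append the credits to the end.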

Editing Photoshop PSD text layers programmatically

I have a multi-layered PSD, with one specific layer being non-rasterized text. I'm trying to figure out a way I can, from a bash/perl/python/whatever-else program:
load the PSD
edit the text in said layer
flatten all layers in the image
save as a web-friendly format like PNG or JPG
I immediately thought of ImageMagick, but I don't think I can edit the text layer through IM. If I can accomplish the first two steps some other programmatic way, I can always use ImageMagick to perform the last two steps.
After a couple of hours of googling and searching CPAN and PyPI, I still have found nothing promising. Does anyone have advice or ideas on the subject?

If you don't like to use the officially supported AppleScript, JavaScript, or VBScript, then there is also the possibility to do it in Python. This is explained in the article Photoshop scripting with Python, which relies on Photoshop's COM interface.
I have not tried it, so in case it does not work for you:
If your text is preserved after conversion to SVG then you can simply replace it by whatever tool you like. Afterwards, convert it to PNG (eg. by inkscape --export-png=...).

The only way I can think of to automate the changing of text inside of a PSD would be to use a regex based substitution.
Create a very simple picture in Photoshop, perhaps a white background and a text layer, with the text being a known length.
Search the file for your text, and with a hex editor, search nearby for the length of the text (which may or may not be part of the file format).
Try changing the text, first to a string of the same length, then to something shorter/longer.
Open in Photoshop after each change to see if the file is corrupt.
This method, if viable, will only work if the layer in question contains a known string, which can be substituted for your other value. Note that I have no idea whether this will work, as I don't have Photoshop on this computer to try this method out. Perhaps you can make it work?
As for converting to PNG, I am at a loss. If the replacing script is in Python, you may be able to do it with the Python Imaging Library (PIL, which seems to support it), but otherwise you may just have to open Photoshop to do the conversion. Which means that it probably wouldn't be worth it to change the text programmatically in the first place.

Have you considered opening and editing the image in The GIMP? It has very good PSD support, and can be scripted in several languages.
Which one you use depends in part on your platform; the Perl interface didn't work on Windows the last I knew. I believe Scheme is supported in all ports.

You can use Photoshop itself to do this with OLE. You will need to install Photoshop, of course. Win32::OLE in Perl or similar module in Python. See http://www.adobe.com/devnet/photoshop/pdfs/PhotoshopScriptingGuide.pdf

If you're going to automate Photoshop, you pretty much have to use Photoshop's own scripting systems. I don't think there's a way around that.
Looking at the problem a different way, can you export from Photoshop to some other format which supports layers, like PNG, which is editable by ImageMagick?

You can also try this using Node.js. I made a PSD command-line tool.
One-line command install (needs NodeJS/NPM installed)
npm install -g psd-cli
You can then use it by typing in your terminal
psd myfile.psd -t
You can check out the code to use it from another Node script, or use it through your shell from another Bash/Perl/whatever script.
