Is there any tool/library for Python that will allow me to manipulate sound files (wav/mp3)?
Desired operations are:
Create a new audio file
Place sounds on a timeline with specified volume level, allowing them to overlap
The ideal tool would be used like:
result = AudioFile(12) # New 12 sec audio file
sounds = [load_sound(fname) for fname in soundfiles]
result.add(sounds[0], start_time=0)
result.add(sounds[1], start_time=2, volume_level=0.6)
result.save('result.wav')
The result.wav should now be a 12 seconds audio composed of sounds 0 and 1 that will overlap if sound 0 is longer than 2 seconds.
Q: Is there something like this out there?
First, you can almost do this with just the standard library.
wave can parse and create WAV files. It can't do MP3 (or AAC or other file formats you probably care about); if that's a critical feature you'll need to turn to a third-party library, but there are tons of options. (pymad was the first one that came up in a search, but you should do your own searches on PyPI and/or Google, because SO is not well-suited to get recommendations and opinions.)
audioop lets you do simple operations on audio buffers—nothing too fancy, but enough to normalize, scale, and merge. And you can build what you want easily out of that.
If you want to do things at a higher level, there are bindings for well-known tools like sox, libavcodec/ffmpeg, etc. In my experience, every time I've needed to write something beyond a quick hack, I couldn't find anything with complete-enough, stable-enough bindings that met the relevant licensing requirements, but again, you'll have to search for yourself. Or, alternatively, just call the command-line tools with subprocess, which is usually a whole lot simpler.
Related
I'm trying to write something that catches the audio being played to the speakers/headphones/soundcard and see whether it is playing and what the longest silence is. This is to test an application that is being executed and to see if it stops playing audio after a certain point, as such i don't actually need to really know what the audio itself is, just whether or not there is audio playing.
I need this to be fully programmatic (so not requiring the use of GUI tools or the like, to set up an environment). I know applications like projectM do this, I just can't for the life of me find anything anywhere that denotes how.
An audio level meter would also work for this, as would ossiliscope data or the like, really would take any recommendation.
Here is a very similar question: record output sound in python
You could try to route your output to a new device with jack and record this with portaudio. There are Python Bindings for portaudio called pyaudio and for jack called PyJack. I have never used the latter one but pyaudio works great.
I'd like to seperate the channels of a mp3 file in Python and save it in two other files.
Does anybody know a library for this.
Thanks in advance.
I assume you want to split the channels losslessly, without decoding MP3 and re-encoding it - otherwise you would not have mentioned MP3 at all and would have easily found many tools like Audacity to do that.
There are 4 channel modes of MP3 frames - this means 4 types of MP3 files: simple stereo, joint-stereo, dual-channel, mono. joint-stereo files can't be split without loss. mono files doesn't need splitting. The rest: stereo and dual-channel, consists of less than 0.1% of all MP3 files, technically can be split into 2 files, each for a channel, without loss. However there is not any tool on the Internet to do that - not any command line tool nor any GUI tool, because few need the function.
There are not any python library for you neither. Most libraries abstracted MP3 files into a common audio which you can manipulate, after decoding. pymad is the only one specific to MP3 file, and it can tell if a file is using any of the 4 channel modes, but does not offer to extract a channel without decoding it. If you write a new tool, you will have to work on raw MP3 files or produce a library for it.
And it is not easy to write a tool or library for it. It's one stream with 2 channels and not two streams interleaved on a frame level. You cannot simply work on MP3 frames, drop some frames, keep others, and manage to extract a channel out that way. It's a task for a professional, and perhaps best happen in a decoder project (like lame or libmad) and not in a file manipulation project (like mp3info or the python eyeD3). In other words, this feature is likely written in C, not python.
Implementaiton Note:
The task to build such a tool thus suits well for a computer science C-programming language course project:
1. it takes a lot of time to do;
2. it requires every skill learned from C programming course;
3. it can get wrong easily;
4. it is likely built on the work of other projects, a lesson of adaptating existing work;
5. is a damn-hard endeavor that no-one did before and thus very rewarding
6. perhaps can be done in 300 difficult lines of code instead of bloated simple Visual Basic code, thus is a good lession of modesty and quality;
7. and finally: nobody is waiting in an hurry for a working implementation.
All condition fits perfectly for a C-programming course project.
Implementation Note 2:
some bit-rates are only possible in mono mode (80kbps), and some bit-rates are only possible in stereo mode (e.g. 320kpbs). Luckily this does not present a problem in this task, because all dual-mp3 bit-rate can be mapped into a fitting mono-mp3 bit-rate -- but not vice versa!
As a personal project (in order to learn python better), I starter working on a duplicate file remover (especially for .mp3 files since I thought of it while trying to organise my full-of-duplicates music collection). Now, I'm fairly clear on how to proceed, matching file names and offering for deletion only those that present more that 0.7 similarity ratio, and using md5 sums for those files that are the same but have completely different names (eg: "metallica-nothing else matters" and "Track1"). The problem is that I don't know what to do about those files that have different names and they are a bit different from one another, for example, "nothing else matters" and "Track1" are the same except for the fact that "Track1" has 2 seconds of silence at the end. My question is: Is there some kind of method or algorithm that checks similarities between files themselves? Something like string matching but on files? Doesn't matter if it's a complicated algorithm, the harder the better since I'm doing this only to learn :D
You could use Chromaprint, that computes a fingerprint for a piece of music. It should be able to find similar music files.
If you want to push this further, you could use the api of musicbrainz to find the exact information about a piece of music.
These libraries are used in two greats music library tagging and sorting applications I use : picard and beets.
you can also look at win32 module, here is the link
http://timgolden.me.uk/python/index.html
I need to convert mp3 audio files to 64kbps on the server side.
Right now, I am using subprocess to call lame, but I wonder if there are any good alternatives?
There seems to be a slightly old thread on that topic here: http://www.dreamincode.net/forums/topic/72083-lame-mp3-encoder-for-python/
The final conclusion was to create a custom binding to lame_enc.dll via Python->C bindings.
The reason for that conclusion was that the existing binding libraries (pymedia/py-lame) have not been maintained.
Unfortunately the guy didn't get it to work :)
Maybe you should continue to use subprocess. You could take advantage of that choice, abstract your encoding at a slightly higher level, and reuse the code/strategy to optionally execute other command line encoding tools (such as ogg or shn tools).
I've seen several audio ripping tools adopt that strategy.
I've been working with Python Audio Tools, which is capable of make conversions between different audio formats.
I've already used it to convert .wav files into mp3, .flac and .m4a.
If you want to use LAME to encode your MP3s (and not PyMedia), you can always use ctypes to wrap the lame encoder DLL (or .so if you are on Linux). The exact wrapper code you'll use is going to be tied to the LAME DLL version (and there are many of these flying around, unfortunately), so I can't really give you any example, but the ctypes docs should be clear enough about wrapping DLLs.
Caveat: relatively new programmer here and I haven't had a need to convert audio files before.
However, if I understand what you mean by server-side, correctly, you might be looking for a good approach to manage mass conversions, and your interest in a python solution might be in part to be able to better manage the resource use or integrate into your processing chain. I had a similar problem/goal, which I resolved using a mix of Merlyn's recommendation and Celery. I don't use django-celery, but if this is for a django-based project, that might appeal to you as well. You can find out more about celery here:
http://celeryproject.org/community.html
http://ask.github.com/celery/getting-started/introduction.html
Depending on what you have setup already, there may be a little upfront time needed to get setup. To take advantage of everything you'll need rabbitmq/erlang installed, but if you follow the guide on the sites above, it's pretty quick now.
Here's an example of how I use celery with subprocess to address a similar issue. Similar to the poster's suggestion above, I use subprocess to call ffmpeg, which is as good as it gets for video tools, and probably would actually be as good as it gets for audio tools too. I'm including a bit more than necessary here to give you a feel for how you might configure your own a little.
#example of configuring an option, here I'm selecting how much I want to adjust bitrate
#based on my input's format
def generate_command_line_method(self):
if self.bitrate:
compression_dict = {'.mp4':1.5, '.rm':1.5, '.avi': 1.2,
'.mkv': 1.2, '.mpg': 1, '.mpeg':1}
if self.ext.lower() in compression_dict.keys():
compression_factor = compression_dict[self.ext.lower()]
#Making a list to send to the command line through subprocess
ffscript = ['ffmpeg',
'-i', self.fullpath,
'-b', str(self.bitrate * compression_factor),
'-qscale', '3', #quality factor, based on trial and error
'-g', '90', #iframe roughly per 3 seconds
'-intra',
outpath
]
return ffscript
#The celery side of things, I'd have a celeryconfig.py file in the
#same directory as the script that points to the following function, so my task
#queue would know the specifics of the function I'll call through it. You can
#see example configs on the sites above, but it's basically just going to be
#a tuple that says, here are the modules I want you to look in, celery, e.g.
#CELERY_MODULES = ("exciting_asynchronous_module.py",). This file then contains,
from celery.decorators import task
from mymodule import myobject
from subprocess import Popen
#task(time_limit=600) #say, for example, 10 mins
def run_ffscript(ffscript):
some_result = Popen(ffscript).wait()
#Note: we'll wait because we don't want to compound
#the asynchronous aspect (we don't want celery to launch the subprocess and think
#it has finished.
#Then I start up celery/rabbitmq, and got into my interactive shell (ipython shown):
#I'll have some generator feeding these ffscripts command lines, then process them
#with something like:
In[1]: for generated_ffscript in generator:
run_ffscript.delay(generated_ffscript)
Let me know if this was useful to you. I'm relatively new to answering questions here and not sure if my attempts are helpful or not. Good luck!
Well, Gstreamer has the "ugly plugin" lamemp3enc and there are python bindings for Gstreamer (gst-python 1.2, supports python 3.3). I haven't tried going this route myself so I'm not really in a position to recommend anything... Frankly, a subprocess solution seems a lot simpler, if not "cleaner", to me.
I am looking for some general advice about the mp3 format before I start a small project to make sure I am not on a wild-goose chase.
My understanding of the internals of the mp3 format is minimal. Ideally, I am looking for a library that would abstract those details away. I would prefer to use Python (but could be convinced otherwise).
I would like to modify a set of mp3 files in a fairly simple way. I am not so much interested in the ID3 tags but in the audio itself. I want to be able to delete sections (e.g. drop 10 seconds from the 3rd minute), and insert sections (e.g. add credits to the end.)
My understanding is that the mp3 format is lossy, and so decoding it to (for example) PCM format, making the modifications, and then encoding it again to MP3 will lower the audio quality. (I would love to hear that I am wrong.)
I conjecture that if I stay in mp3 format, there will be some sort of minimum frame or packet-size to deal with, so the granularity of the operations may be coarser. I can live with that, as long as I get an accuracy of within a couple of seconds.
I have looked at PyMedia, but it requires me to migrate to PCM to process the data. Similarly, LAME wants to help me encode, but not access the data in place. I have seen several other libraries that only deal with the ID3 tags.
Can anyone recommend a Python MP3 library? Alternatively, can you disabuse me of my assumption that going to PCM and back is bad and avoidable?
If you want to do things low-level, use pymad. It turns MP3s into a buffer of sample data.
If you want something a little higher-level, use the Echo Nest Remix API (disclosure: I wrote part of it for my dayjob). It includes a few examples. If you look at the cowbell example (i.e., MoreCowbell.dj), you'll see a fork of pymad that gives you a NumPy array instead of a buffer. That datatype makes it easier to slice out sections and do math on them.
I got three quality answers, and I thank you all for them. I haven't chosen any as the accepted answer, because each addressed one aspect, so I wanted to write a summary.
Do you need to work in MP3?
Transcoding to PCM and back to MP3 is unlikely to result in a drop in quality.
Don't optimise audio-quality prematurely; test it with a simple prototype and listen to it.
Working in MP3
Wikipedia has a summary of the MP3 File Format.
MP3 frames are short (1152 samples, or just a few milliseconds) allowing for moderate precision at that level.
However, Wikipedia warns that "Frames are not independent items ("byte reservoir") and therefore cannot be extracted on arbitrary frame boundaries."
Existing libraries are unlikely to be of assistance, if I really want to avoid decoding.
Working in PCM
There are several libraries at this level:
LAME (latest release: October 2017)
PyMedia (latest release: February 2006)
PyMad (Linux only? Decoder only? Latest release: January 2007)
Working at a higher level
Echo Nest Remix API (Mac or Linux only, at the moment) is an API to a web-service that supports quite sophisticated operations (e.g. finding the locations of music beats and tempo, etc.)
mp3DirectCut (Windows only) is a GUI that apparently performs the operations I want, but as an app. It is not open-source. (I tried to run it, got an Access Denied installer error, and didn't follow up. A GUI isn't suitably for me, as I want to repeatedly run these operations on a changing library of files.)
My plan is now to start out in PyMedia, using PCM.
Mp3 is lossy, but it is lossy in a very specific way. The algorithms used as designed to discard certain parts of the audio which your ears are unable to hear (or are very difficult to hear). Re-doing the compression process at the same level of compression over and over is likely to yield nearly identical results for a given piece of audio. However, some additional losses may slowly accumulate. If you're going to be modifying files a lot, this might be a bad idea. It would also be a bad idea if you were concerned about quality, but then using MP3 if you are concerned about quality is a bad idea over all.
You could construct a test using an encoder and a decoder to re-encode a few different mp3 files a few times and watch how they change, this could help you determine the rate of deterioration and figure out if it is acceptable to you. Sounds like you have libraries you could use to run this simple test already.
MP3 files are composed of "frames" of audio and so it should be possible, with some effort, to remove entire frames with minimal processing (remove the frame, update some minor details in the file header). I believe frames are pretty short (a few milliseconds each) which would give the precision you're looking for. So doing some reading on the MP3 File Format should give you enough information to code your own python library to do this. This is a fair bit different than traditional "audio processing" (since you don't care about precision) and so you're unlikely to find an existing library that does this. Most, as you've found, will decompress the audio first so you can have complete fine-grained control.
Not a direct answer to your needs, but check the mp3DirectCut software that does what you want (as a GUI app). I think that the source code is available, so even if you don't find a library, you could build one of your own, or build a python extension using code from mp3DirectCut.
As for removing or extracting mp3 segments from an mp3 file while staying in the MP3 domain (that is, without conversion to PCM format and back), there is also the open source package PyMp3Cut.
As for splicing MP3 files together (adding e.g. 'Credits' to the end or beginning of an mp3 file) I've found you can simply concatenate the MP3 files providing that the files have the same sampling rate (e.g. 44.1khz) and the same number of channels (e.g. both are stereo or both are mono).