I feel like this is a fairly common problem, but I haven't yet found a suitable answer. I have many audio files of human speech that I would like to break on words. This can be done heuristically by looking at pauses in the waveform, but can anyone point me to a function or library in Python that does this automatically?

An easier way to do this is with the pydub module. Its recently added silence utilities do all the heavy lifting, such as setting the silence threshold and the minimum silence length, and simplify the code significantly compared with the other methods mentioned.
Here is a demo implementation, with inspiration from here.
I had an audio file with spoken English letters from A to Z in the file "a-z.wav". A sub-directory splitAudio was created in the current working directory. Upon executing the demo code, the file was split into 26 separate files, with each audio file storing one syllable.
Some of the syllables were cut off, possibly needing modification of the following parameters:
min_silence_len=500
silence_thresh=-16
One may want to tune these to one's own requirements.
Demo Code:
from pydub import AudioSegment
from pydub.silence import split_on_silence

sound_file = AudioSegment.from_wav("a-z.wav")
audio_chunks = split_on_silence(sound_file,
    # must be silent for at least half a second
    min_silence_len=500,
    # consider it silent if quieter than -16 dBFS
    silence_thresh=-16
)

for i, chunk in enumerate(audio_chunks):
    out_file = ".//splitAudio//chunk{0}.wav".format(i)
    # a single string argument prints the same way in Python 2 and Python 3
    print("exporting " + out_file)
    chunk.export(out_file, format="wav")
Output:
Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
exporting .//splitAudio//chunk0.wav
exporting .//splitAudio//chunk1.wav
exporting .//splitAudio//chunk2.wav
exporting .//splitAudio//chunk3.wav
exporting .//splitAudio//chunk4.wav
exporting .//splitAudio//chunk5.wav
exporting .//splitAudio//chunk6.wav
exporting .//splitAudio//chunk7.wav
exporting .//splitAudio//chunk8.wav
exporting .//splitAudio//chunk9.wav
exporting .//splitAudio//chunk10.wav
exporting .//splitAudio//chunk11.wav
exporting .//splitAudio//chunk12.wav
exporting .//splitAudio//chunk13.wav
exporting .//splitAudio//chunk14.wav
exporting .//splitAudio//chunk15.wav
exporting .//splitAudio//chunk16.wav
exporting .//splitAudio//chunk17.wav
exporting .//splitAudio//chunk18.wav
exporting .//splitAudio//chunk19.wav
exporting .//splitAudio//chunk20.wav
exporting .//splitAudio//chunk21.wav
exporting .//splitAudio//chunk22.wav
exporting .//splitAudio//chunk23.wav
exporting .//splitAudio//chunk24.wav
exporting .//splitAudio//chunk25.wav
exporting .//splitAudio//chunk26.wav
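To pick a sensible silence_thresh it can help to measure how loud the pauses in a recording actually are. The sketch below is not pydub code. It is a minimal stand-alone illustration, assuming 16-bit signed samples, of how a level in dBFS (decibels relative to full scale) can be computed from the RMS amplitude. The function name rms_dbfs and the constant are made up for this example.

```python
import math

FULL_SCALE_16BIT = 32768  # assumption: 16-bit signed PCM samples


def rms_dbfs(samples):
    """Return the RMS level of integer samples in dBFS (0 dBFS = full scale)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(rms / FULL_SCALE_16BIT)


# A constant signal at half of full scale sits about 6 dB below full scale.
half_scale = [16384] * 100
print(round(rms_dbfs(half_scale), 1))  # -6.0
```

With silence_thresh=-16, anything quieter than -16 dBFS counts as silence, which may include the softer parts of ordinary speech. That could be one reason some syllables were cut off.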

You could look at Audiolab. It provides a decent API to convert the voice samples into numpy arrays.
The Audiolab module uses the libsndfile C library to do the heavy lifting.
You can then scan the arrays for runs of low-amplitude values to find the pauses.
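As a rough sketch of that last step (this is not part of Audiolab's API), the function below scans a sequence of samples for runs that stay below an amplitude threshold for a minimum duration. The name find_pauses and its default values are assumptions chosen for illustration. It works on a plain list as well as on a one-dimensional numpy array.

```python
def find_pauses(samples, sample_rate, threshold=0.02, min_pause_s=0.2):
    """Return (start, end) sample-index pairs where |sample| stays below
    `threshold` for at least `min_pause_s` seconds. `end` is exclusive."""
    min_len = int(round(min_pause_s * sample_rate))
    pauses = []
    run_start = None
    for i, value in enumerate(samples):
        if abs(value) < threshold:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                pauses.append((run_start, i))
            run_start = None
    # Handle a quiet run that lasts until the end of the recording.
    if run_start is not None and len(samples) - run_start >= min_len:
        pauses.append((run_start, len(samples)))
    return pauses


# Toy signal at 10 samples per second: one long pause and one short one.
signal = [0.5] * 10 + [0.0] * 5 + [0.5] * 10 + [0.0] * 2 + [0.4] * 3
print(find_pauses(signal, sample_rate=10, min_pause_s=0.3))  # [(10, 15)]
```

Cutting the audio at the midpoint of each returned range gives word-like segments, provided the speaker actually pauses between words.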

Use IBM STT. With timestamps=true you get the word breakdown along with the times at which the system detected each word being spoken.
There are a lot of other cool features, like word_alternatives_threshold to get other possible words and word_confidence to get the confidence with which the system predicts each word. Set word_alternatives_threshold to between 0.01 and 0.1 to get a real idea.
This requires signing up, after which you can use the generated username and password.
IBM STT is already part of the speechrecognition module mentioned, but to get the word timestamps you will need to modify the function.
An extracted and modified form looks like:
# Python 3 imports needed by the function below
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

import speech_recognition as sr

# IBM_USERNAME and IBM_PASSWORD are assumed to be defined elsewhere with your own credentials.
def extracted_from_sr_recognize_ibm(audio_data, username=IBM_USERNAME, password=IBM_PASSWORD, language="en-US", show_all=False, timestamps=False,
                                    word_confidence=False, word_alternatives_threshold=0.1):
    assert isinstance(username, str), "``username`` must be a string"
    assert isinstance(password, str), "``password`` must be a string"

    flac_data = audio_data.get_flac_data(
        convert_rate=None if audio_data.sample_rate >= 16000 else 16000,  # audio samples should be at least 16 kHz
        convert_width=None if audio_data.sample_width >= 2 else 2  # audio samples should be at least 16-bit
    )
    url = "https://stream-fra.watsonplatform.net/speech-to-text/api/v1/recognize?{}".format(urlencode({
        "profanity_filter": "false",
        "continuous": "true",
        "model": "{}_BroadbandModel".format(language),
        "timestamps": "{}".format(str(timestamps).lower()),
        "word_confidence": "{}".format(str(word_confidence).lower()),
        "word_alternatives_threshold": "{}".format(word_alternatives_threshold)
    }))
    request = Request(url, data=flac_data, headers={
        "Content-Type": "audio/x-flac",
        "X-Watson-Learning-Opt-Out": "true",  # prevent requests from being logged, for improved privacy
    })
    authorization_value = base64.standard_b64encode("{}:{}".format(username, password).encode("utf-8")).decode("utf-8")
    request.add_header("Authorization", "Basic {}".format(authorization_value))

    try:
        response = urlopen(request, timeout=None)
    except HTTPError as e:
        raise sr.RequestError("recognition request failed: {}".format(e.reason))
    except URLError as e:
        raise sr.RequestError("recognition connection failed: {}".format(e.reason))
    response_text = response.read().decode("utf-8")
    result = json.loads(response_text)

    # return results (pass show_all=True to get the raw response, including timestamps)
    if show_all: return result
    if "results" not in result or len(result["results"]) < 1 or "alternatives" not in result["results"][0]:
        raise Exception("Unknown Value Exception")

    transcription = []
    for utterance in result["results"]:
        if "alternatives" not in utterance:
            raise Exception("Unknown Value Exception. No Alternatives returned")
        for hypothesis in utterance["alternatives"]:
            if "transcript" in hypothesis:
                transcription.append(hypothesis["transcript"])
    return "\n".join(transcription)
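Once the raw response is available (for example by calling the function with show_all=True and timestamps=True), the word boundaries can be pulled out of it. The sketch below assumes each alternative carries a "timestamps" list of [word, start, end] entries. The helper name extract_word_timings and the sample response are hypothetical and only illustrate that shape.

```python
def extract_word_timings(result):
    """Return a list of (word, start_seconds, end_seconds) tuples
    from a decoded speech-to-text response dictionary."""
    timings = []
    for utterance in result.get("results", []):
        alternatives = utterance.get("alternatives", [])
        if not alternatives:
            continue
        # Only the first (most likely) alternative is used here.
        for word, start, end in alternatives[0].get("timestamps", []):
            timings.append((word, start, end))
    return timings


# Hypothetical response fragment, for illustration only.
sample = {
    "results": [
        {"alternatives": [{
            "transcript": "hello world ",
            "timestamps": [["hello", 0.15, 0.52], ["world", 0.60, 1.05]],
        }]}
    ]
}
for word, start, end in extract_word_timings(sample):
    print("{}: {:.2f}s to {:.2f}s".format(word, start, end))
```

The start and end times can then be used to slice the original audio into one segment per word.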

pyAudioAnalysis can segment an audio file if the words are clearly separated (this is rarely the case in natural speech). The package is relatively easy to use:
python pyAudioAnalysis/pyAudioAnalysis/audioAnalysis.py silenceRemoval -i SPEECH_AUDIO_FILE_TO_SPLIT.mp3 --smoothing 1.0 --weight 0.3
More details on my blog.

My variant of the function, which will probably be easier to modify for your needs:
from scipy.io.wavfile import write as write_wav
import numpy as np
import librosa


def zero_runs(a):
    # Return [start, end) index pairs, one row per run of zeros in `a`
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges


def split_in_parts(audio_path, out_dir):
    # Some constants
    min_length_for_silence = 0.01  # seconds
    percentage_for_silence = 0.01  # eps value for silence
    required_length_of_chunk_in_seconds = 60  # Chunk will be around this value, not exact
    sample_rate = 16000  # Set to None to use default

    # Load audio
    waveform, sampling_rate = librosa.load(audio_path, sr=sample_rate)

    # Create mask of silence (1 where the sample is quiet)
    eps = waveform.max() * percentage_for_silence
    silence_mask = (np.abs(waveform) < eps).astype(np.uint8)

    # Find where silence starts and ends.
    # zero_runs looks for runs of zeros, so invert the mask (silence is marked with 1).
    runs = zero_runs(1 - silence_mask)
    lengths = runs[:, 1] - runs[:, 0]

    # Keep only large silence ranges
    min_length_for_silence = min_length_for_silence * sampling_rate
    large_runs = runs[lengths > min_length_for_silence]
    lengths = lengths[lengths > min_length_for_silence]

    # Mark only the center of each silence
    silence_mask[...] = 0
    for start, end in large_runs:
        center = (start + end) // 2
        silence_mask[center] = 1

    min_required_length = required_length_of_chunk_in_seconds * sampling_rate
    chunks = []
    prev_pos = 0
    for i in range(min_required_length, len(waveform), min_required_length):
        start = i
        end = i + min_required_length
        next_pos = start + silence_mask[start:end].argmax()
        part = waveform[prev_pos:next_pos].copy()
        prev_pos = next_pos
        if len(part) > 0:
            chunks.append(part)

    # Add last part of waveform
    part = waveform[prev_pos:].copy()
    chunks.append(part)
    print('Total chunks: {}'.format(len(chunks)))

    new_files = []
    for i, chunk in enumerate(chunks):
        out_file = out_dir + "chunk_{}.wav".format(i)
        print("exporting", out_file)
        write_wav(out_file, sampling_rate, chunk)
        new_files.append(out_file)

    return new_files


Unable to find headers of jpg while reading a raw disk image (dd)

I am reading a raw disk image using Python 3. My task is to retrieve (carve) jpgs as individual files from the disk image. I know the header pattern of a jpg (\xff followed by \xd8\xff\xe0 or \xd8\xff\xe1), and I want to find the offsets at which it occurs while reading the file.
fobj = open('carve.dd', 'rb')
data = fobj.read(32)
while data != '':
    head_loc = findheader(data)
    data = fobj.read(32)

def findheader(data) : # to find header in each 32 bytes of data of raw disk image
    for i in range(0, len(data) - 3) :
        if data[i] == b'\xff' :
            if data[i+1:i+4] == b'\xd8\xff\xe0' or data[i+1:i+4] == b'\xd8\xff\xe1' :
                return i
    return -1
The same code works fine in Python 2, where I can find the headers in the image in just a few seconds. Can someone help me work out what the problem is in Python 3?
This code snippet is actually from https://github.com/darth-cheney/JPEG-Recover/blob/master/jpegrecover2.py
It runs fine in Python 2 but not in Python 3. Please ignore the inconsistent-tab error you get when you run the code from the link; I retyped it in VS Code.
Like the old saying goes, I've got some bad news and some good news. The bad is I can't figure out why your code doesn't work the same in both version 2 and version 3 of Python.
The good is that I was able to reproduce the problem using the sample data you provided, but—more importantly—able to devise something that not only works consistently in both versions, it's likely much faster because it doesn't use a for loop to search through each chunk of data looking for the .jpg header patterns.
from __future__ import print_function

LIMIT = 100000  # Number of chunks (for testing).
CHUNKSIZE = 32  # Bytes.
HDRS = b'\xff\xd8\xff\xe0', b'\xff\xd8\xff\xe1'
IMG_PATH = r'C:\vols\Files\Temp\carve.dd.002'

with open(IMG_PATH, 'rb') as file:
    chunk_index = 0
    found = 0
    while True:
        data = file.read(CHUNKSIZE)
        if not data:
            break

        # Search for each of the headers in each chunk.
        for hdr in HDRS:
            offset = 0
            while offset < (CHUNKSIZE - len(hdr)):
                try:
                    head_loc = data[offset:].index(hdr)
                except ValueError:  # Not found.
                    break

                found += 1
                # head_loc is relative to data[offset:], so add offset as well.
                file_offset = chunk_index*CHUNKSIZE + offset + head_loc
                print('found: #{} at {:,}'.format(found, file_offset))
                offset += (head_loc + len(hdr))

        chunk_index += 1
        if LIMIT and (chunk_index == LIMIT): break  # Stop after this many chunks.

print('total found {}'.format(found))
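One likely cause of the difference between the two versions: in Python 3, indexing a bytes object returns an int, so a comparison such as data[i] == b'\xff' is never true, while slicing still returns bytes. The sketch below illustrates that and shows one way to search with bytes.find while keeping a small overlap between reads, so that a header split across two chunks is still found. The function name find_jpeg_headers and the overlap approach are illustrative assumptions, not taken from the linked script.

```python
import io

HEADERS = (b"\xff\xd8\xff\xe0", b"\xff\xd8\xff\xe1")

# In Python 3, indexing bytes gives an int, slicing gives bytes.
assert b"\xff\xd8"[0] == 0xFF
assert b"\xff\xd8"[0:1] == b"\xff"


def find_jpeg_headers(stream, chunk_size=4096):
    """Yield absolute offsets of JPEG headers in a binary stream,
    including headers that straddle a chunk boundary."""
    overlap = max(len(h) for h in HEADERS) - 1
    tail = b""  # last few bytes of the previous window
    base = 0    # absolute offset of the first byte of tail + chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        hits = []
        for hdr in HEADERS:
            pos = window.find(hdr)
            while pos != -1:
                hits.append(base + pos)
                pos = window.find(hdr, pos + 1)
        for hit in sorted(hits):
            yield hit
        # The tail is shorter than any header, so no header is reported twice.
        tail = window[-overlap:]
        base += len(window) - len(tail)


# Toy image: the first header starts at byte 30 and crosses the 32-byte boundary.
data = b"\x00" * 30 + HEADERS[0] + b"\x00" * 10 + HEADERS[1] + b"\x00" * 5
print(list(find_jpeg_headers(io.BytesIO(data), chunk_size=32)))  # [30, 44]
```

For a real disk image the stream would be the file opened in 'rb' mode, and a larger chunk size should cut down the number of reads considerably.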

Python - Mix two audio chunks

I have two bytes objects.
One comes from using the Wave module to read a "chunk" of data:
def get_wave_from_file(filename):
    import wave
    original_wave = wave.open(filename, 'rb')
    return original_wave
The other uses MIDI information and a Synthesizer module (fluidsynth)
def create_wave_from_midi_info(sound_font_path, notes):
    import fluidsynth
    import numpy as np
    s = []
    fl = fluidsynth.Synth()
    sfid = fl.sfload(sound_font_path)  # Loads a soundfont
    fl.program_select(track=0, soundfontid=sfid, banknum=0, presetnum=0)  # Selects the soundfont
    for n in notes:
        fl.noteon(0, n['midi_num'], n['velocity'])
        s = np.append(s, fl.get_samples(int(44100 * n['duration'])))  # Gives the note the correct duration, based on a sample rate of 44.1 kHz
        fl.noteoff(0, n['midi_num'])
    samps = fluidsynth.raw_audio_string(s)
    return samps
The two files are of different length.
I want to combine the two waves, so that both are heard simultaneously.
Specifically, I would like to do this "one chunk at a time".
Here is my setup:
def get_a_chunk_from_each(wave_object, bytes_from_midi, chunk_size=1024, starting_sample=0):
    from_wav_data = wave_object.readframes(chunk_size)
    from_midi_data = bytes_from_midi[starting_sample:starting_sample + chunk_size]
    return from_wav_data, from_midi_data
Info about the return from get_a_chunk_from_each():
type(from_wav_data), type(from_midi_data)
len(from_wav_data), len(from_midi_data)
4096 1024
Firstly, I'm confused as to why the lengths are different: the one generated from wave_object.readframes(1024) is exactly 4 times longer than the one generated by manually slicing bytes_from_midi[0:1024]. This may be part of the reason I have been unsuccessful.
Secondly, I want to create the function which combines the two chunks. The following "pseudocode" illustrates what I want to happen:
def combine_chunks(chunk1, chunk2):
    mixed = chunk1 + chunk2
    # OR, probably more like:
    mixed = (chunk1 + chunk2) / 2
    # To prevent clipping?
    return mixed
It turns out there is a very, very simple solution.
I simply used the library audioop:
import audioop
and used its "add" function (the last argument is the sample width in bytes. Since this is 16-bit audio, that's 16 / 8 = 2 bytes). The width has to be passed positionally:
audioop.add(chunk1, chunk2, 2)
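Two details are worth spelling out. First, readframes(n) returns n frames, and each frame holds one sample per channel, so for 16-bit stereo audio 1024 frames take 1024 * 2 * 2 = 4096 bytes, whereas slicing 1024 elements from a bytes object yields 1024 bytes. That the file is 16-bit stereo is an assumption here, but it matches the factor of 4 in the question. Second, audioop was removed from the standard library in Python 3.13 and expects both fragments to have the same length. The sketch below does a similar mix with struct only. The helper names are hypothetical, and unlike audioop.add it averages the samples instead of summing them.

```python
import struct


def frames_to_bytes(n_frames, n_channels, sample_width):
    """Number of bytes occupied by n_frames of interleaved PCM audio."""
    return n_frames * n_channels * sample_width


def mix_pcm16(chunk1, chunk2):
    """Average two chunks of little-endian signed 16-bit PCM.
    The shorter chunk is padded with silence."""
    def unpack(chunk):
        count = len(chunk) // 2  # two bytes per 16-bit sample
        return list(struct.unpack("<{}h".format(count), chunk[:count * 2]))

    a, b = unpack(chunk1), unpack(chunk2)
    length = max(len(a), len(b))
    a += [0] * (length - len(a))
    b += [0] * (length - len(b))
    # Averaging keeps every result inside the 16-bit range, so nothing clips.
    mixed = [(x + y) // 2 for x, y in zip(a, b)]
    return struct.pack("<{}h".format(length), *mixed)


print(frames_to_bytes(1024, n_channels=2, sample_width=2))  # 4096

c1 = struct.pack("<3h", 1000, -1000, 32767)
c2 = struct.pack("<2h", 3000, 1000)
print(struct.unpack("<3h", mix_pcm16(c1, c2)))  # (2000, 0, 16383)
```

To mix chunk by chunk, both sources should advance by the same number of bytes per step, for example by slicing 4096 bytes from the MIDI buffer for every 1024 frames read from the wave file.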

Instructables open source code: Python IndexError: list index out of range

I've seen this error on several other questions but couldn't find the answer.
I'm a complete stranger to Python, but I'm following the instructions from a site and I keep getting this error when I try to run the script:
IndexError: list index out of range
Here's the script:
##//txt to stl conversion - 3d printable record
##//by Amanda Ghassaei
##//Dec 2012
## * This program is free software; you can redistribute it and/or modify
## * it under the terms of the GNU General Public License as published by
## * the Free Software Foundation; either version 3 of the License, or
## * (at your option) any later version.
import wave
import math
import struct

bitDepth = 8#target bitDepth
frate = 44100#target frame rate
fileName = "bill.wav"#file to be imported (change this)

#read file and get data
w = wave.open(fileName, 'r')
numframes = w.getnframes()
frame = w.readframes(numframes)#w.getnframes()
frameInt = map(ord, list(frame))#turn into array

#separate left and right channels and merge bytes
frameOneChannel = [0]*numframes#initialize list of one channel of wave
for i in range(numframes):
    frameOneChannel[i] = frameInt[4*i+1]*2**8+frameInt[4*i]#separate channels and store one channel in new list
    if frameOneChannel[i] > 2**15:
        frameOneChannel[i] = (frameOneChannel[i]-2**16)
    elif frameOneChannel[i] == 2**15:
        frameOneChannel[i] = 0
    else:
        frameOneChannel[i] = frameOneChannel[i]

#convert to string
audioStr = ''
for i in range(numframes):
    audioStr += str(frameOneChannel[i])
    audioStr += ","#separate elements with comma

fileName = fileName[:-3]#remove .wav extension
text_file = open(fileName+"txt", "w")
text_file.write("%s"%audioStr)
text_file.close()
Thanks a lot,
Leart - check these, they may help:
Is your input file in the correct format? As I see it, you need to produce that file beforehand before you can use it in this program. Post that file here as well.
Check that your bit rate and frame rate are correct.
Just for debugging purposes (if the code is correct, this may not produce correct results, but it is good for testing): you are accessing frameInt[4*i+1], with the index i multiplied by 4 and then 1 added, which eventually goes beyond the end of frameInt. The stride of 4 assumes 16-bit stereo data (4 bytes per frame); a mono or 8-bit file has fewer bytes than that.
Add an 'if' to check the size before accessing the array element in frameInt:
if len(frameInt) > 4*i+1:
Add that statement right after the first occurrence of "for i in range(numframes):" and just before "frameOneChannel[i] = frameInt[4*i+1]*2**8+frameInt[4*i]#separate channels and store one channel in new list"
*watch tab spaces
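A slightly more robust variant is to derive the frame size from the file instead of hard-coding 4. The sketch below is one way to do that, not the original author's method: it decodes the first channel of 16-bit PCM data with struct, using the channel count and sample width that the wave module reports. The function name first_channel_samples is made up for this example.

```python
import struct


def first_channel_samples(frame_bytes, n_channels, sample_width):
    """Decode the first channel of interleaved little-endian 16-bit PCM."""
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit samples")
    bytes_per_frame = n_channels * sample_width
    n_frames = len(frame_bytes) // bytes_per_frame
    samples = []
    for i in range(n_frames):
        # The first channel's sample sits at the start of each frame.
        (value,) = struct.unpack_from("<h", frame_bytes, i * bytes_per_frame)
        samples.append(value)
    return samples


# Typical use with the wave module (file name taken from the script above):
#   w = wave.open("bill.wav", "rb")
#   data = w.readframes(w.getnframes())
#   samples = first_channel_samples(data, w.getnchannels(), w.getsampwidth())

raw = struct.pack("<4h", 1, 2, -3, 4)
print(first_channel_samples(raw, n_channels=2, sample_width=2))  # [1, -3]
print(first_channel_samples(raw, n_channels=1, sample_width=2))  # [1, 2, -3, 4]
```

Because the loop bound comes from the actual byte count, a mono file no longer runs off the end of the data.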

Generate 2d images of molecules from PubChem FTP data

Rather than crawl PubChem's website, I'd prefer to be nice and generate the images locally from the PubChem ftp site:
The only problem is that I'm limited to OSX and Linux and I can't seem to find a way of programmatically generating the 2d images that they have on their site. See this example:
Under the heading "2D Structure" we have this image here:
That is what I'm trying to generate.
If you want something working out of the box I would suggest using molconvert from ChemAxon's Marvin (https://www.chemaxon.com/products/marvin/), which is free for academics. It can be used easily from the command line and it supports plenty of input and output formats. So for your example it would be:
molconvert "png" -s "C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl" -o cdnb.png
Resulting in the following image:
It also allows you to set parameters such as width, height, quality, background color and so on.
However, if you are a programmer, I would definitely recommend RDKit. The following code generates images for a pair of compounds given as SMILES.
from rdkit import Chem
from rdkit.Chem import Draw
ms_smis = [["C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl", "cdnb"],
           ["C1=CC(=CC(=C1)N)C(=O)N", "3aminobenzamide"]]
ms = [[Chem.MolFromSmiles(x[0]), x[1]] for x in ms_smis]
for m in ms: Draw.MolToFile(m[0], m[1] + ".svg", size=(800, 800))
This gives you the following images:
So I also emailed the PubChem guys and they got back to me very quickly with this response:
The only bulk access we have to images is through the download
service: https://pubchem.ncbi.nlm.nih.gov/pc_fetch/pc_fetch.cgi
You can request up to 50,000 images at a time.
Which is better than I was expecting, but still not amazing since it requires downloading things that I in theory could generate locally. So I'm leaving this question open until some kind soul writes an open source library to do the same.
I figure I might as well save people some time if they are doing the same thing as I am. I've created a Ruby Gem built on Mechanize to automate the downloading of images. Please be kind to their servers and only download what you need.
gem install pubchem
An open source option is the Indigo Toolkit, which also has pre-compiled packages for Linux, Windows, and MacOS and language bindings for Python, Java, .NET, and C libraries. I chose the 1.4.0 beta.
I had a similar interest to yours in converting SMILES to 2D structures and adapted my Python to address your question and to capture timing information. It uses the PubChem FTP (Compound/Extras) download of CID-SMILES.gz. The following script is an implementation of a local SMILES-to-2D-structure converter that reads a range of rows from the PubChem CID-SMILES file of isomeric SMILES (which contains over 102 million compound records) and converts the SMILES to PNG images of the 2D structures. In three tests with 1000 SMILES-to-structure conversions, it took 35, 50, and 60 seconds to convert 1000 SMILES at file row offsets of 0, 100,000, and 10,000,000 on my Windows 10 laptop (Intel i7-7500U CPU, 2.70GHz) with a solid state drive and running Python 3.7.4. The 3000 files totaled 100 MB in size.
from indigo import *
from indigo.renderer import *
import subprocess
import datetime
def timerstart():
    # start timer and print time, return start time
    start = datetime.datetime.now()
    print("Start time =", start)
    return start

def timerstop(start):
    # end timer and print time and elapsed time, return elapsed time
    endtime = datetime.datetime.now()
    elapsed = endtime - start
    print("End time =", endtime)
    print("Elapsed time =", elapsed)
    return elapsed

numrecs = 1000
recoffset = 0 # 10000000 # record offset
starttime = timerstart()
indigo = Indigo()
renderer = IndigoRenderer(indigo)
# set render options
indigo.setOption("render-atom-color-property", "color")
indigo.setOption("render-coloring", True)
indigo.setOption("render-comment-position", "bottom")
indigo.setOption("render-comment-offset", "20")
indigo.setOption("render-background-color", 1.0, 1.0, 1.0)
indigo.setOption("render-output-format", "png")
# set data path (including data file) and output file path
datapath = r'../Download/CID-SMILES'
pngpath = r'./2D/'
# read subset of rows from data file
mycmd = "head -" + str(recoffset+numrecs) + " " + datapath + " | tail -" + str(numrecs)
(out, err) = subprocess.Popen(mycmd, stdout=subprocess.PIPE, shell=True).communicate()
lines = str(out.decode("utf-8")).split("\n")
count = 0
for line in lines:
    try:
        cols = line.split("\t")  # split on tab
        key = cols[0]  # cid in cols[0]
        smiles = cols[1]  # smiles in cols[1]
        mol = indigo.loadMolecule(smiles)
        s = "CID=" + key
        indigo.setOption("render-comment", s)
        #indigo.setOption("render-image-size", 200, 250)
        #indigo.setOption("render-image-size", 400, 500)
        renderer.renderToFile(mol, pngpath + key + ".png")
        count += 1
    except Exception:
        # e.g. a blank trailing line or a SMILES string that fails to load
        print("Error processing line after", str(count), ":", line)
elapsedtime = timerstop(starttime)
print("Converted", str(count), "SMILES to PNG")
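The script above shells out to head and tail to pick a range of rows, and those commands may not be available on every system. As a hedged alternative, the same range can be read in pure Python with itertools.islice. The function name read_rows is made up for this sketch, which assumes an uncompressed, tab-separated file of CID and SMILES like the one used above.

```python
import itertools


def read_rows(path, offset, count):
    """Return `count` (cid, smiles) pairs starting at zero-based row `offset`."""
    rows = []
    with open(path, "r", encoding="utf-8") as handle:
        # islice skips the first `offset` lines without keeping them in memory.
        for line in itertools.islice(handle, offset, offset + count):
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            cid, smiles = line.split("\t", 1)
            rows.append((cid, smiles))
    return rows


# Example call, using the path and range from the script above:
#   for cid, smiles in read_rows("../Download/CID-SMILES", 0, 1000):
#       ...
```

For the compressed download, gzip.open(path, "rt") could take the place of open, at the cost of decompressing every row before the offset.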

How can I produce real-time audio output from music made with Music21?

How can I produce real-time audio output from music made with Music21? Failing that, how can I produce ANY audio output from music made with Music21 via open-source software? Thanks for the help.
As you've seen, music21 isn't designed to be a music playback system, but it IS designed to be embedded within other playback systems or to call them from within the system. We're not planning on putting too much work into playback systems (because of the hardware support, our being a tiny research lab, the work still needing to be done on musical analysis, etc.), but your solution is so elegant that it is now included in all versions of music21 (post v1.1) as the music21.midi.realtime module. Here's an example that takes music21's ability to dynamically allocate midi channels with different pitch-bend objects in order to simulate microtonal playback (a major problem for most midi playback):
# Set up a detuned piano
# (where each key has a random
# but consistent detuning from 30 cents flat to sharp)
# and play a Bach Chorale on it in real time.
from music21 import *
import random
keyDetune = []
for i in range(0, 128):  # MIDI pitch numbers run from 0 to 127
    keyDetune.append(random.randint(-30, 30))

b = corpus.parse('bach/bwv66.6')
for n in b.flat.notes:
    n.pitch.microtone = keyDetune[n.pitch.midi]  # the microtone is set on the note's pitch
sp = midi.realtime.StreamPlayer(b)
sp.play()
The StreamPlayer's .play() function can also take busyFunction and busyArgs and busyWaitMilliseconds arguments which specify a function to call with arguments at most every busyWaitMilliseconds (could be more if your system is slower). There is also an endFunction and endArgs that will be called at the end, in case you want to set up some sort of threaded playback. -- Myke Cuthbert (Music21 creator)
So here's what I found out. Here's a python script that works on Windows XP. It needs pygame in addition to music21.
# genPlayM21Score.py Generates and Plays 2 Music21 Scores "on the fly".
# see way below for source notes
from music21 import *
# we create the music21 Bottom Part, and do this explicitly, one object at a time.
n1 = note.Note('e4')
n1.duration.type = 'whole'
n2 = note.Note('d4')
n2.duration.type = 'whole'
m1 = stream.Measure()
m2 = stream.Measure()
m1.append(n1)
m2.append(n2)
partLower = stream.Part()
partLower.append(m1)
partLower.append(m2)

# For the music21 Upper Part, we automate the note creation procedure
data1 = [('g4', 'quarter'), ('a4', 'quarter'), ('b4', 'quarter'), ('c#5', 'quarter')]
data2 = [('d5', 'whole')]
data = [data1, data2]
partUpper = stream.Part()

def makeUpperPart(data):
    # appends one measure per entry of data to the global partUpper
    for mData in data:
        m = stream.Measure()
        for pitchName, durType in mData:
            n = note.Note(pitchName)
            n.duration.type = durType
            m.append(n)
        partUpper.append(m)

makeUpperPart(data)

# Now, we can add both Part objects into a music21 Score object.
sCadence = stream.Score()
sCadence.insert(0, partUpper)
sCadence.insert(0, partLower)
# Now, let's play the MIDI of the sCadence Score [from memory, ie no file write necessary] using pygame
import cStringIO
# for music21 <= v.1.2:
if hasattr(sCadence, 'midiFile'):
    sCadence_mf = sCadence.midiFile
else: # for >= v.1.3:
    sCadence_mf = midi.translate.streamToMidiFile(sCadence)
sCadence_mStr = sCadence_mf.writestr()
sCadence_mStrFile = cStringIO.StringIO(sCadence_mStr)
import pygame
freq = 44100 # audio CD quality
bitsize = -16 # signed 16 bit (a negative size means signed samples)
channels = 2 # 1 is mono, 2 is stereo
buffer = 1024 # number of samples
pygame.mixer.init(freq, bitsize, channels, buffer)
# optional volume 0 to 1.0
pygame.mixer.music.set_volume(0.8)

def play_music(music_file):
    """
    stream music with mixer.music module in blocking manner
    this will stream the sound from disk while playing
    """
    clock = pygame.time.Clock()
    try:
        pygame.mixer.music.load(music_file)
        print("Music file %s loaded!" % music_file)
    except pygame.error:
        print("File %s not found! (%s)" % (music_file, pygame.get_error()))
        return
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        # check if playback has finished
        clock.tick(30)

# play the midi file we just saved
play_music(sCadence_mStrFile)

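# now let's make a new music21 Score by reversing the upperPart notes
data1.reverse()
data2 = [('d5', 'whole')]
data = [data1, data2]
partUpper = stream.Part()
makeUpperPart(data)
sCadence2 = stream.Score()
sCadence2.insert(0, partUpper)
sCadence2.insert(0, partLower)

# now let's play the new Score
if hasattr(sCadence2, 'midiFile'): # for music21 <= v.1.2
    sCadence2_mf = sCadence2.midiFile
else: # for >= v.1.3
    sCadence2_mf = midi.translate.streamToMidiFile(sCadence2)
sCadence2_mStr = sCadence2_mf.writestr()
sCadence2_mStrFile = cStringIO.StringIO(sCadence2_mStr)
play_music(sCadence2_mStrFile)
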
## There are 3 sources for this mashup:
# 1. Source for the Music21 Score Creation http://web.mit.edu/music21/doc/html/quickStart.html#creating-notes-measures-parts-and-scores
# 2. Source for the Music21 MidiFile Class Behaviour http://mit.edu/music21/doc/html/moduleMidiBase.html?highlight=midifile#music21.midi.base.MidiFile
# 3. Source for the pygame player: http://www.daniweb.com/software-development/python/code/216979/embed-and-play-midi-music-in-your-code-python

