Mixing audio files causes clipping in Python

I have some audio files and I mixed them together like this:
import pydub

# `files` is a list of paths to the audio files to mix
for idx, f in enumerate(files):
    if idx == 0:
        sound = pydub.AudioSegment.from_file(f)
    else:
        temp = pydub.AudioSegment.from_file(f)
        sound = sound.overlay(temp, position=0)

sound.export("totakmix.wav", format="wav")
None of the individual audio files clip, but the mixed file does. Is there any way to prevent this?

The easiest thing you can do to prevent clipping while using overlay is to apply negative gain correction with gain_during_overlay like this:
sound = sound.overlay(temp, position=0, gain_during_overlay=-3)
This reduces the audio by 3 dB while overlaying. Why 3 dB? A 3 dB change corresponds to roughly a factor of two in power, so it compensates for summing two tracks; if neither original track was clipping on its own, the mixed result should not clip either.
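If you are mixing more than two tracks, another option (a rough sketch I am adding here, not part of the original answer; the file names are hypothetical) is to attenuate every track before overlaying, scaling the reduction with the number of tracks so the summed power stays roughly at the level of a single track:
import math
import pydub

files = ["a.wav", "b.wav", "c.wav"]   # hypothetical file names

# reduce each track by about 10*log10(n) dB so that summing n tracks
# keeps the total power roughly at the level of a single track
attenuation_db = 10 * math.log10(len(files))

mix = None
for f in files:
    track = pydub.AudioSegment.from_file(f).apply_gain(-attenuation_db)
    mix = track if mix is None else mix.overlay(track, position=0)

mix.export("mix.wav", format="wav")
The trade-off is that the whole mix ends up quieter; you can normalize it afterwards if needed.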

Related

How to process a wav file for WebRTC VAD in Python

I'm learning how to use the py-webrtcvad library for Python and trying to pick out segments containing speech in a .wav file. Right now, I don't know how to split a .wav file into smaller frames to feed into a vad object. Here's my code:
import wave
import webrtcvad

spf = wave.open("test4.wav", "rb")
signal = spf.readframes(-1)
vad = webrtcvad.Vad(3)
sample_rate = 16000
frame_duration = 10  # ms
# TODO: turn signal into segments and run the below print statement in a for loop
print('Contains speech: %s' % (vad.is_speech(signal, sample_rate)))
As I have read at https://github.com/wiseman/py-webrtcvad, the frames fed into the vad object must be 16-bit and either 10, 20, or 30 ms long. My sample is about 9 s long and I don't know how to split it into the appropriate frames. I have tested the example.py in the repo, but it is a bit too complicated for me, so I tried to write a very simple example first.
Thank you.
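For reference, here is one way to do the splitting: a minimal sketch (my own, not an accepted answer) that assumes the .wav file is already 16-bit mono PCM at one of the sample rates webrtcvad supports (8000, 16000, 32000 or 48000 Hz):
import wave
import webrtcvad

spf = wave.open("test4.wav", "rb")
sample_rate = spf.getframerate()           # must be 8000, 16000, 32000 or 48000
signal = spf.readframes(spf.getnframes())  # raw 16-bit PCM bytes

vad = webrtcvad.Vad(3)
frame_duration = 10                        # ms; must be 10, 20 or 30
frame_bytes = int(sample_rate * frame_duration / 1000) * 2   # 2 bytes per 16-bit sample

# walk over the signal in fixed-size frames and classify each one
for offset in range(0, len(signal) - frame_bytes + 1, frame_bytes):
    frame = signal[offset:offset + frame_bytes]
    print('Contains speech: %s' % vad.is_speech(frame, sample_rate))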

Trying to split audio into 20 ms chunks and make a spectrogram, but it looks weird

I basically have this audio file that is 16-bit PCM WAV mono at 44100 Hz, and I'm trying to convert it into a spectrogram. I want a spectrogram of the audio every 20 ms (I'm trying this for speech recognition), but whenever I compare what I have to Audacity, it looks really different. I'm kind of new to Python, so I was trying to base this on my Java knowledge. Any help would be appreciated. I think I'm splitting the read samples incorrectly (what I did was split the array every 220 elements, since I believe audio data is just samples in the time domain, to get 20 ms of audio).
Here's the code I have right now:
import librosa.display
import matplotlib.pyplot as plt
import numpy

audioPath = 'C:\\Users\\pawar\\Desktop\\Resister.wav'
audioData, sampleRate = librosa.load(audioPath, sr=None)
print(sampleRate)

new = numpy.zeros(shape=(220, 1))   # overwritten in the loop below
counter = 0
for i in range(0, len(audioData), 882):
    new = audioData[i: i + 882]     # 882 samples = 20 ms at 44.1 kHz
    STFT = librosa.stft(new, n_fft=882)
    print(type(STFT))
    audioDatainDB = librosa.amplitude_to_db(abs(STFT))
    print(type(audioDatainDB))
    librosa.display.specshow(audioDatainDB, sr=sampleRate, x_axis='time', y_axis='hz')
    # plt.figure(figsize=(20, 10))
    plt.show()
    counter += 1
    print("Your local debug print statement ", counter)
As for the values, I was playing around with them quite a bit trying to get this to work, but without any luck so far.
Here's the output:
https://i.stack.imgur.com/EVntx.png
And here's what Audacity shows:
https://i.stack.imgur.com/GIGy8.png
I know it's not 20 ms in the Audacity one, but you can see the two don't look even remotely similar.
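One approach that may get closer to what Audacity shows (a sketch of my own, not an accepted answer; the file path is a placeholder) is to compute a single STFT over the whole signal with a 20 ms window and a suitable hop, then plot it once, instead of plotting a separate spectrogram per chunk:
import librosa
import librosa.display
import matplotlib.pyplot as plt

audioData, sampleRate = librosa.load('Resister.wav', sr=None)   # placeholder path

win = int(0.020 * sampleRate)   # 20 ms window -> 882 samples at 44.1 kHz
hop = win // 2                  # 50% overlap between consecutive windows

STFT = librosa.stft(audioData, n_fft=1024, win_length=win, hop_length=hop)
audioDatainDB = librosa.amplitude_to_db(abs(STFT))

librosa.display.specshow(audioDatainDB, sr=sampleRate, hop_length=hop,
                         x_axis='time', y_axis='hz')
plt.show()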

How to apply transparency to clips in moviepy?

So I'm trying to create a clip with moviepy where five semi-transparent clips are overlaid on each other using CompositeVideoClip.
The output should be a clip of length equal to the longest clip, where all the layers of the composite clip are visible.
My code looks something like this:
from moviepy.editor import *

clip_1 = VideoFileClip('some\\path\\here.mp4')
clip_2 = VideoFileClip('some\\path\\here.mp4')
clip_3 = VideoFileClip('some\\path\\here.mp4')
clip_4 = VideoFileClip('some\\path\\here.mp4')
clip_5 = VideoFileClip('some\\path\\here.mp4')

list_of_clips = [clip_1, clip_2, clip_3, clip_4, clip_5]
for index, clip in enumerate(list_of_clips):
    list_of_clips[index] = clip.set_opacity(.20)

output_clip = CompositeVideoClip(list_of_clips)
output_clip.write_videofile('some\\path\\here.mp4')
Code runs fine, except transparency is not applied.
Neither does this work:
clip = VideoFileClip('some\\path\\here.mp4').set_opacity(.30)
clip.write_videofile('some\\path\\here.mp4')
Export works fine, but clip is fully opaque.
Any suggestions for how to achieve transparency in clip outputs?
The MP4 format (I'm assuming H.264) does not offer transparency. WebM (VP9) and some variants of H.265 do offer transparency.
I'm not sure exactly what you are trying to do, but perhaps creating the overlaid videos as WebM (which supports transparency) and then converting to H.264 at the end might work for you.
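As a rough illustration of that suggestion (my sketch; the output path and ffmpeg flags are assumptions, not from the answer), you could export the composite as WebM with an alpha-capable pixel format and only convert to H.264 at the very end if you need to:
output_clip = CompositeVideoClip(list_of_clips)
output_clip.write_videofile('composite.webm',            # hypothetical output path
                            codec='libvpx',              # VP8; VP8/VP9 can carry alpha
                            ffmpeg_params=['-pix_fmt', 'yuva420p'])
Whether the alpha channel actually survives the export depends on your moviepy and ffmpeg versions, so treat this as a starting point rather than a guaranteed fix.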

pydub audio glitches when splitting/joining mp3

I'm experimenting with pydub, which I like very much; however, I am having a problem when splitting/joining an mp3 file.
I need to generate a series of small snippets of audio on the server, which will be sent in sequence to a web browser and played via an <audio/> element. I need the audio playback to be 'seamless' with no audible joins between the separate pieces. At the moment however, the joins between the separate bits of audio are quite obvious, sometimes there is a short silence and sometimes a strange audio glitch.
In my proof of concept code I have taken a single large mp3 and split it into 1-second chunks as follows:
from pydub import AudioSegment
import StringIO

song = AudioSegment.from_mp3('my.mp3')
song_pos = 0
while song_pos < 100:
    p1 = song_pos * 1000
    p2 = p1 + 1000
    segment = song[p1:p2]  # 1 second of audio
    output = StringIO.StringIO()
    segment.export(output, format="mp3")
    client_data = output.getvalue()  # send this to client
    song_pos += 1
The client_data values are streamed to the browser over a long-lived http connection:
socket.send("HTTP/1.1 200 OK\r\nConnection: Keep-Alive\r\nContent-Type: audio/mp3\r\n\r\n")
and then for each new chunk of audio
socket.send(client_data)
Can anyone explain the glitches that I am hearing, and suggest a way to eliminate them?
Upgrading my comment to an answer:
The primary issue is that MP3 codecs used by ffmpeg add silence to the end of the encoded audio (and your approach is producing multiple individual audio files).
If possible, use a lossless format like WAV and then reduce the file size with gzip or similar. You may also be able to use lossless audio compression (for example, FLAC), but it probably depends on how the encoder works.
I don't have a conclusive explanation for the audible artifacts you're hearing, but it could be that you're splitting the audio at a point where the signal is non-zero. If a sound begins with a sample with a value of 100 (for example), that produces a digital popping sound. The MP3 compression may also alter the sound, especially at lower bit rates. If this is the issue, a 1 ms fade-in will eliminate the pop without a noticeable audible "fade" (though it may introduce other artifacts); a longer fade-in (like 20 or 50 ms) would avoid strange frequency-domain artifacts but would introduce a noticeable "fade in".
If you're willing to do a little more (coding) work, you can search for a "zero crossing" (basically, a place where the signal is at a zero point naturally) and split the audio there.
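A minimal numpy sketch of that zero-crossing search (my own illustration, not the answerer's code), operating on the sample array you can get from pydub's get_array_of_samples():
import numpy as np

def nearest_zero_crossing(samples, target_idx, search=1000):
    # look within +/- `search` samples of the intended split point
    lo = max(1, target_idx - search)
    hi = min(len(samples) - 1, target_idx + search)
    window = samples[lo:hi]
    sign = np.signbit(window)
    # indices where consecutive samples change sign
    crossings = np.where(sign[:-1] != sign[1:])[0] + lo
    if len(crossings) == 0:
        return target_idx                       # fall back to the original split point
    return int(crossings[np.argmin(np.abs(crossings - target_idx))])

# `song` is the AudioSegment from the question; assumes mono audio
# (stereo samples are interleaved and would need de-interleaving first)
samples = np.array(song.get_array_of_samples())
split_at = nearest_zero_crossing(samples, 44100)    # split near the 1 s mark at 44.1 kHz
# convert back to milliseconds before slicing the AudioSegment:
split_ms = split_at * 1000 // song.frame_rate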
Probably the best approach if it's possible:
Encode the entire signal as a single, compressed file, and send the bytes (of that one file) down to the client in chunks for playback as a single stream. If you use constant bitrate mp3 encoding (CBR), you can send almost perfectly 1-second-long chunks just by counting bytes: with 256 kbps CBR, one second is 256 kilobits, i.e. roughly 32 KB, so send about 32 KB at a time.
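A hedged sketch of that byte-counting approach (my own; the file name is a placeholder, and the socket is the long-lived HTTP socket from the question):
CHUNK_BYTES = 256 * 1000 // 8     # 256 kbps CBR -> about 32,000 bytes per second

with open("my_cbr.mp3", "rb") as f:       # hypothetical single CBR-encoded file
    while True:
        chunk = f.read(CHUNK_BYTES)       # roughly one second of audio per read
        if not chunk:
            break
        socket.send(chunk)                # reuse the long-lived HTTP connection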
So, I could be totally wrong (I don't usually mess with audio files), but it could be an indexing issue. Try:
p2 = p1 + 1001
but you may need to invert the concatenation process for it to work, unless you add an extra millisecond on the end.
The only other thing I can think it could be is an artifact that enters the stream when you convert the bytes to a string. Try using the AudioSegment().raw_data property for a bytes representation of the audio.
Sound is a waveform, and you are joining two waves that are out of phase with one another, so you get a step discontinuity, and that makes the pop.
I'm unfamiliar with this software but codifying Nils Werner's suggestions, you might try:
song = AudioSegment.from_mp3('my.mp3')
song_pos = 0
# begin with a short block of silence (long enough to cover the first crossfade)
segment = AudioSegment.silent(duration=50)
# append all your pieces to it
while song_pos < 100:
    p1 = song_pos * 1000
    p2 = p1 + 1000
    # append a chunk to your segment with several milliseconds of crossfade
    # (append returns a new AudioSegment, so reassign it)
    segment = segment.append(song[p1:p2], crossfade=50)
    song_pos += 1

# then pass it on to your client outside of your loop
output = StringIO.StringIO()
segment.export(output, format="mp3")
client_data = output.getvalue()  # send this to client
Depending on how low or high the frequency of what you're joining is, you'll need to adjust the crossfade time to blend; lower frequencies will require a longer fade.

Sound generation / synthesis with python?

Is it possible to get python to generate a simple sound like a sine wave?
Is there a module available for this? If not, how would you go about creating your own?
Also, would you need some kind of host environment for Python to run in, in order to play sound, or can it be achieved just by making calls from the terminal?
If the answer is OS-dependent, I'm using a mac.
I was looking for the same thing. In the end, I wrote this code, which is working fine.
import math      # import needed modules
import pyaudio   # sudo apt-get install python-pyaudio

PyAudio = pyaudio.PyAudio  # initialize pyaudio

# See https://en.wikipedia.org/wiki/Bit_rate#Audio
BITRATE = 16000    # number of frames per second/frameset
FREQUENCY = 500    # Hz, waves per second, 261.63 = C4 note
LENGTH = 1         # seconds to play sound

BITRATE = max(BITRATE, FREQUENCY + 100)
NUMBEROFFRAMES = int(BITRATE * LENGTH)
RESTFRAMES = NUMBEROFFRAMES % BITRATE
WAVEDATA = ''

# generating waves
for x in xrange(NUMBEROFFRAMES):
    WAVEDATA = WAVEDATA + chr(int(math.sin(x / ((BITRATE / FREQUENCY) / math.pi)) * 127 + 128))

for x in xrange(RESTFRAMES):
    WAVEDATA = WAVEDATA + chr(128)

p = PyAudio()
stream = p.open(format=p.get_format_from_width(1),
                channels=1,
                rate=BITRATE,
                output=True)
stream.write(WAVEDATA)
stream.stop_stream()
stream.close()
p.terminate()
I know I'm a little late to the game on this one, but this is a pretty fantastic python project for synthesis and audio composition: https://github.com/hecanjog/pippi
It's still actively being developed, but it's been going for a while.
After wasting time on some uncompilable or non-existent projects, I discovered the python module wavebender, which offers generation of single or multiple channels of sine, square and combined waves. The results can be written either to a wavefile or to sys.stdout, from where they can be interpreted directly by aplay in real-time. Some useful examples are explained here, and are included at the project's github page.
The Python In Music wiki page has not been terribly well-kept-up, but it's a good starting point.
http://wiki.python.org/moin/PythonInMusic
I am working on a powerful synthesizer in Python. I used custom functions to write directly to a .wav file. There are built-in functions that can be used for this purpose. You will need to modify the .wav header to reflect the sample rate, bits per sample, number of channels, and duration of synthesis.
Here is an early version of a sine wave generator that outputs a list of values that, after applying bytearray, becomes suitable for writing to the data parameter of a wave file. [edit] A conversion function will need to transform the list into little-endian values before applying the bytearray (a rough sketch of this follows the code below). See the WAVE PCM soundfile format link below for details on the .wav specification. [/edit]
import math

def sin_basic(freq, time=1, amp=1, phase=0, samplerate=44100, bitspersample=16):
    bytelist = []
    TwoPiDivSamplerate = 2 * math.pi / samplerate
    increment = TwoPiDivSamplerate * freq
    incadd = phase * increment
    maxval = 2 ** (bitspersample - 1) - 1   # largest positive sample value
    for i in range(int(samplerate * time)):
        # keep the phase accumulator within the representable range
        if incadd > maxval:
            incadd = maxval - (incadd - maxval)
        elif incadd < -maxval:
            incadd = -maxval + (-maxval - incadd)
        bytelist.append(int(round(amp * maxval * math.sin(incadd))))
        incadd += increment
    return bytelist
A newer version can use waveforms to modulate the frequency, amplitude, and phase of the waveform parameters. The data format makes it trivial to blend and concatenate waves together. If this seems up your alley, check out WAVE PCM soundfile format.
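As a rough illustration of the conversion step mentioned above (my sketch, not the answerer's code; the output file name is a placeholder), the standard library's struct and wave modules can pack the sample list as little-endian 16-bit PCM and write it out:
import struct
import wave

samples = sin_basic(440, time=1)             # list of ints from the generator above

with wave.open('sine.wav', 'wb') as wav:     # hypothetical output file name
    wav.setnchannels(1)                      # mono
    wav.setsampwidth(2)                      # 16 bits per sample
    wav.setframerate(44100)                  # matches samplerate=44100 above
    # '<h' = little-endian signed 16-bit, matching bitspersample=16
    wav.writeframes(struct.pack('<%dh' % len(samples), *samples))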
I like PyAudiere, which lets you play numpy arrays as sounds... I guess it jives well with my Matlab background. I believe it's cross-platform. I think scikits.audiolab does the same thing, and may be more current / better supported... it seems easier to me than trying to save things as wav files or write them to buffers and use Python's built-in sound library.
I found these two Python repositories very useful; you might want to have a look at them:
python https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder
ipython : https://timsainb.github.io/spectrograms-mfccs-and-inversion-in-python.html
[EDIT] As pointed out, here is an explanation of the two links.
The Python one seems to have an error, but many people were able to make it run, so I'm not sure. The IPython one worked like a charm, so I hope you can run it.
Both of the links take audio as input, preferably a .wav file, and featurize it (using an FFT of size 512 and a step size of 512/8) to obtain a spectrogram (which you can even visualize). The spectrogram is a 2D matrix, and you can then train your machine learning models, or do whatever you want, using a matrix that represents the original audio. If at any point you want to hear what those vectors represent, you can resynthesize the audio as well.
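For reference, the featurization described (FFT size 512, step size 512/8) corresponds to an ordinary short-time Fourier transform; here is a minimal sketch, assuming librosa and a hypothetical input.wav:
import numpy as np
import librosa

y, sr = librosa.load('input.wav', sr=None)           # hypothetical input file
S = librosa.stft(y, n_fft=512, hop_length=512 // 8)  # FFT 512, step 512/8
spectrogram = np.abs(S)        # 2D matrix: frequency bins x time frames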
