my program reduces music speed by 50% but only in one channel - python

I am using the wave library in Python to attempt to reduce the speed of audio by 50%. I have been successful, but only in the right channel; the left channel is a whole bunch of static.
import wave, os, math
r = wave.open(r"C:\Users\A\My Documents\LiClipse Workspace\Audio compression\Audio compression\aha.wav", "r")
w = wave.open(r"C:\Users\A\My Documents\LiClipse Workspace\Audio compression\Audio compression\ahaout.wav", "w")
frames = r.readframes(r.getnframes())
newframes = bytearray()
w.setparams(r.getparams())
for i in range(0, len(frames) - 1):
    newframes.append(frames[i])
    newframes.append(frames[i])
w.writeframesraw(newframes)
Why is this? Since I am just copying and pasting raw data, surely I can't generate static?
Edit: I've been looking for ages and I finally found a useful resource for the wave format: http://soundfile.sapp.org/doc/WaveFormat/
If I want to preserve stereo sound, it looks like I need to copy a whole 4-byte frame at a time: there are two channels of 2-byte samples, so each frame takes up 4 bytes instead of 2.
import wave
r = wave.open(r"C:\Users\A\My Documents\LiClipse Workspace\Audio compression\Audio compression\aha.wav", "r")
w = wave.open(r"C:\Users\A\My Documents\LiClipse Workspace\Audio compression\Audio compression\ahaout.wav", "w")
frames = r.readframes(r.getnframes())
newframes = bytearray()
w.setparams(r.getparams())
w.setframerate(r.getframerate())
print(r.getsampwidth())
for i in range(0, len(frames) - 4, 4):
    newframes.append(frames[i])
    newframes.append(frames[i + 1])
    newframes.append(frames[i + 2])
    newframes.append(frames[i + 3])
    newframes.append(frames[i])
    newframes.append(frames[i + 1])
    newframes.append(frames[i + 2])
    newframes.append(frames[i + 3])
w.writeframesraw(newframes)
Edit 2:
Okay, I have no idea what drove me to do this, but I am already enjoying the freedom it gives me. I chose to copy the wav file into memory, edit the copy directly, and write it to an output file. I am incredibly happy with the results. I can import a wav, repeat the audio once, and write it to an output file in only 0.2 seconds. Reducing the speed by half now takes only 9 seconds instead of the 30+ seconds with my old code using the wave module :) Here's the code, still kind of unoptimized I guess, but it's better than what it was.
import struct
import time as t

t.clock()
r = open(r"C:/Users/apier/Documents/LiClipse Workspace/audio editing software/main/aha.wav", "rb")
w = open(r"C:/Users/apier/Documents/LiClipse Workspace/audio editing software/main/output.wav", "wb")
rbuff = bytearray(r.read())

def replacebytes(array, bites, stop):
    # overwrite the bytes of array ending at index stop with bites
    length = len(bites)
    start = stop - length
    for i in range(start, stop):
        array[i] = bites[i - start]

def write(audio):
    w.write(audio)

def repeat(audio, repeats):
    if repeats == 1:
        return audio
    if repeats == 0:
        return audio[:44]
    # patch the data-chunk size field (bytes 40-44 of the WAV header)
    replacebytes(audio, struct.pack('<I', struct.unpack('<I', audio[40:44])[0] * repeats), 44)
    return audio + (audio[44:len(audio) - 58] * (repeats - 1))

def slowhalf(audio):
    buff = bytearray()
    replacebytes(audio, struct.pack('<I', struct.unpack('<I', audio[40:44])[0] * 2), 44)
    # duplicate each 4-byte frame (2 channels x 2-byte samples)
    for i in range(44, len(audio) - 62, 4):
        buff.append(audio[i])
        buff.append(audio[i + 1])
        buff.append(audio[i + 2])
        buff.append(audio[i + 3])
        buff.append(audio[i])
        buff.append(audio[i + 1])
        buff.append(audio[i + 2])
        buff.append(audio[i + 3])
    return audio[:44] + buff

rbuff = slowhalf(rbuff)
write(rbuff)
print(t.clock())
I am surprised at how small the code is.

Each element of the bytes object returned by readframes is a single byte, even though indexing it gives you an int. An audio sample is typically 2 bytes, so by doubling up each byte instead of each whole sample, you get noise.
I have no idea why one channel would work; with the code shown in the question it should be all noise.
This is a partial fix. It still intermixes the left and right channels, but it will give you an idea of what will work.
for i in range(0, len(frames) - 1, 2):
    newframes.append(frames[i])
    newframes.append(frames[i + 1])
    newframes.append(frames[i])
    newframes.append(frames[i + 1])
Edit: here's the code that should work in stereo. It copies 4 bytes at a time, 2 for the left channel and 2 for the right, then does it again to double them up. This keeps the two channels' samples from getting mixed together.
for i in range(0, len(frames), 4):
    for _ in range(2):
        for j in range(4):
            newframes.append(frames[i + j])
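For reference, here's a minimal sketch of the same idea that derives the frame size from the file's own parameters instead of hard-coding 4 bytes. The file names are placeholders, and it assumes Python 3 (where indexing bytes gives ints and wave objects support the with statement):

import wave

with wave.open("aha.wav", "rb") as r, wave.open("ahaout.wav", "wb") as w:
    w.setparams(r.getparams())
    # bytes per frame = sample width * channel count, e.g. 2 * 2 = 4 for 16-bit stereo
    frame_size = r.getsampwidth() * r.getnchannels()
    frames = r.readframes(r.getnframes())
    newframes = bytearray()
    # write every complete frame twice, keeping each frame's channel samples together
    for i in range(0, len(frames) - frame_size + 1, frame_size):
        frame = frames[i:i + frame_size]
        newframes += frame + frame
    w.writeframes(newframes)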

Related

Speeding up grabbing the alpha channel

I don't know if there is anything that can be done to speed up my code at all, probably not by much, but I thought I would ask here.
I am working on a Python script for a program that uses a custom embedded Python interpreter, so I can only use the default libraries. External libraries like Pillow and NumPy don't work because the program changed the name of the Python DLL, so the precompiled libraries can't interact with it.
This program doesn't support pasting transparent images from the clipboard outside of its own proprietary format, so I'm writing a script to cover that feature. It grabs the CF_DIBv5 format from the clipboard using ctypes and checks that it is 32bpp and that an alpha mask exists.
Here's the slow part. I then need to isolate the alpha channel and save it as its own separate image. I can do this easily enough: grab a long from the byte string, & it with the mask to get the alpha channel, and pack it back into my new bitmap byte string. On a small 300x300 image, this takes close to 10 seconds. That isn't horrible, and I will gladly live with it, but I fear it's going to be horribly slow on larger megapixel images.
I'm not showing the complete code here because it's a horrible ugly mess and most of it is just defining the structures I'm using for my bitmap class and getting ctypes working. But here are the important parts where I loop over the data.
rowsizemask = calcRowSize(24, bmp.header.bV5Width)  # returns bytes per row needed
rowmaskpadding = b'\x00' * (rowsizemask - bmp.header.bV5Width * 3)  # creates padding bytes
# loop over image data
for y in range(bmp.header.bV5Height):
    for x in range(bmp.header.bV5Width):
        offset, color = unpack(offset, ">L", buff)  # calls struct.unpack in a custom function
        color = color[0] & bmp.header.bV5AlphaMask  # gets alpha channel
        newbmp.pixels += struct.pack(">3B", color, color, color)  # creates 24bpp listing
    newbmp.pixels += rowmaskpadding  # pad row to meet BMP specs
So what do you think? Am I missing something obvious? Or is this about as good as it's going to get with pure python only?
Okay, so after some more digging I realized I could use ctypes.create_string_buffer to create a binary string of the perfect size and then use slices to change the values.
There are more tiny optimizations and code cleanups I can do but this has taken it from a script that can easily take several minutes to complete on a 900x900 pixel image, to just a few seconds.
Is this the best option? No idea, but it works. And it's faster than I had thought possible. See the edited code here. The changes are minor.
rowSizeMask = calcRowSize(24, bmp.header.bV5Width)  # returns bytes per row needed
paddingLength = rowSizeMask - bmp.header.bV5Width * 3
rowMaskPadding = b'\x00' * paddingLength  # creates padding bytes
writeOffset = 0
# create pixel buffer
# rowSizeMask includes padding, multiply by height for total byte count
newBmp.pixels = ctypes.create_string_buffer(bmp.header.bV5Height * rowSizeMask)
# loop over image data
for y in range(bmp.header.bV5Height):
    for x in range(bmp.header.bV5Width):
        offset, color = unpack(offset, ">L", buff)  # calls struct.unpack in a custom function
        color = color[0] & bmp.header.bV5AlphaMask  # gets alpha channel
        newBmp.pixels[writeOffset:writeOffset + 3] = struct.pack(">3B", color, color, color)  # creates 24bpp listing
        writeOffset += 3
    # slice-assign the padding too, since a string buffer has a fixed size
    newBmp.pixels[writeOffset:writeOffset + paddingLength] = rowMaskPadding  # pad row to meet BMP specs
    writeOffset += paddingLength
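If the DIB is guaranteed to be 32bpp with the alpha byte in a fixed position, another pure-Python trick worth trying is pulling the whole alpha plane out with a single extended slice instead of unpacking pixel by pixel. This is only a sketch: it assumes BGRA byte order (alpha every 4th byte), and buff, width, height, and rowMaskPadding stand in for the question's variables.

# grab the alpha byte of every pixel in one slice
alpha = buff[3::4]

newPixels = bytearray()
for y in range(height):
    row = alpha[y * width:(y + 1) * width]
    # expand each alpha value to a grey 24bpp pixel (R = G = B = alpha)
    newPixels += bytes(b for a in row for b in (a, a, a))
    newPixels += rowMaskPadding  # pad each row to the BMP boundary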

Load an image.png in a few milliseconds

I need to run a function on images in less than 1 second. The problem: just loading a 1000x1000 image as a matrix into the program takes about 1 second.
The function I use to load it is as follows:
import png

def load(fname):
    with open(fname, mode='rb') as f:
        reader = png.Reader(file=f)
        w, h, png_img, _ = reader.asRGB8()
        img = []
        for line in png_img:
            l = []
            for i in range(0, len(line), 3):
                l += [(line[i], line[i+1], line[i+2])]
            img += [l]
        return img
How can I modify it so that opening the image takes no more than a few milliseconds?
IMPORTANT NOTE: I cannot import functions outside of this (this is a university exercise and therefore there are rules -.-), so I have to write one myself.
You can use PIL to do this for you; it's highly optimized and fast.
from PIL import Image

def load(path):
    return Image.open(path)
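Note that Image.open is lazy and returns an Image object, not a matrix. If the rest of the exercise expects the same list-of-rows-of-tuples structure as the original load, a sketch of the conversion using Pillow's standard API:

from PIL import Image

def load(path):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    pixels = list(img.getdata())  # flat list of (R, G, B) tuples
    # reshape the flat list into rows to match the original function's output
    return [pixels[y * w:(y + 1) * w] for y in range(h)]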
Appending to a list one element at a time is slow - read about Shlemiel the painter's algorithm. You can replace the inner loop with zip and slicing.
for line in png_img:
    img.append(list(zip(line[0::3], line[1::3], line[2::3])))
I'm not sure it is remotely possible to run a Python script that opens a file, etc., in just a few milliseconds. On my computer, even the simplest program takes several tens of milliseconds.
Without knowing more about the specifics of your problem and the reasons for your constraint, it is hard to answer. You should consider what you are trying to do, in the context of the way your program really works, and then formulate a strategy to achieve your goal.
The total context here is, you're asking the computer to:
run python, load your code and interpret it
load any modules you want to use
find your image file and read it from disk
give those bytes some meaning as an image abstraction - parse those bytes, etc.
do some kind of transform or "work" on the image
export your result in some way
You need to figure out which of those steps is it that really needs to be lightning fast. After that, maybe someone can make a suggestion.
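For example, a quick way to see where the time actually goes, using only the standard library (process is a hypothetical stand-in for whatever work you do on the image):

import time

t0 = time.perf_counter()
img = load("image.png")   # the load() from the question
t1 = time.perf_counter()
result = process(img)     # hypothetical: the transform you apply
t2 = time.perf_counter()

print("load: %.1f ms, work: %.1f ms" % ((t1 - t0) * 1e3, (t2 - t1) * 1e3))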

Sounddevice's wait() method not working for short sounds

For a project including a moving robot arm, I need a "Geiger-Müller counter"-like distance-based alarm.
For that I wrote a Python module and tried to add the possibility that, if the robot arm is on the left side of an object, the sound comes only out of the left speaker, and analogously for the right side.
For that I looked into the sounddevice library, where easy channel mapping is possible.
As can be seen in code snippet 1 below, and as in the documentation, I am calling sd.play() and waiting with sd.wait() until the sound finishes.
When I use a sample sound with a length of 6 seconds, everything works fine. But if I use the desired sound, a short beep (under 1 second), the code does not work. The script ends after ~1 second without any sound.
I can fix this by adding a sleep statement (commented out in the code).
But for this context I need a playing window smaller than 0.5 seconds.
Does anyone know how I can fix or work around this?
I tried to use PyAudio instead, but wasn't able to get the mono .wav file into a stereo byte array for switching the audio dynamically between the left and right speakers (code snippet 2).
There I always encounter an error ("ascii codec can't decode byte on pos 2: ordinal not in range(128)").
Snippet 1:
import sounddevice as sd
import soundfile as sf
import numpy as np

data, fs = sf.read(args.filename, dtype='float32')
byte = np.array(data)
sd.play(byte, fs)
#time.sleep(0.5)
status = sd.wait()
Snippet 2:
data = wave.readframes(chunkSize)
stereo_signal = np.zeros([len(data), 2])
stereo_signal[:,0] = data[:]
stereo_signal[:,1] = 0
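The decode error in snippet 2 most likely comes from mixing raw bytes into a float array. One way the mono-to-stereo conversion could look, sketched under the assumption of a 16-bit PCM mono file (beep.wav is a placeholder):

import wave
import numpy as np

wav = wave.open("beep.wav", "rb")  # assumed mono, 16-bit PCM
data = wav.readframes(wav.getnframes())

# interpret the raw bytes as 16-bit samples before doing any array math
mono = np.frombuffer(data, dtype=np.int16)

stereo = np.zeros((len(mono), 2), dtype=np.int16)
stereo[:, 0] = mono  # left speaker gets the signal
stereo[:, 1] = 0     # right speaker stays silent

raw = stereo.tobytes()  # byte string ready for a PyAudio stream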

pydub audio glitches when splitting/joining mp3

I'm experimenting with pydub, which I like very much; however, I am having a problem when splitting/joining an mp3 file.
I need to generate a series of small snippets of audio on the server, which will be sent in sequence to a web browser and played via an <audio/> element. I need the audio playback to be 'seamless' with no audible joins between the separate pieces. At the moment however, the joins between the separate bits of audio are quite obvious, sometimes there is a short silence and sometimes a strange audio glitch.
In my proof of concept code I have taken a single large mp3 and split it into 1-second chunks as follows:
import StringIO
from pydub import AudioSegment

song = AudioSegment.from_mp3('my.mp3')
song_pos = 0
while song_pos < 100:
    p1 = song_pos * 1000
    p2 = p1 + 1000
    segment = song[p1:p2]  # 1 second of audio
    output = StringIO.StringIO()
    segment.export(output, format="mp3")
    client_data = output.getvalue()  # send this to client
    song_pos += 1
The client_data values are streamed to the browser over a long-lived http connection:
socket.send("HTTP/1.1 200 OK\r\nConnection: Keep-Alive\r\nContent-Type: audio/mp3\r\n\r\n")
and then for each new chunk of audio
socket.send(client_data)
Can anyone explain the glitches that I am hearing, and suggest a way to eliminate them?
Upgrading my comment to an answer:
The primary issue is that MP3 codecs used by ffmpeg add silence to the end of the encoded audio (and your approach is producing multiple individual audio files).
If possible, use a lossless format like WAV and then reduce the file size with gzip or similar. You may also be able to use lossless audio compression (for example, FLAC), but it probably depends on how the encoder works.
I don't have a conclusive explanation for the audible artifacts you're hearing, but it could be that you're splitting the audio at a point where the signal is non-zero. If a chunk begins with a sample value of 100 (for example), that discontinuity sounds like a digital pop. The MP3 compression may also alter the sound, especially at lower bit rates. If this is the issue, a 1 ms fade-in will eliminate the pop without a noticeable audible "fade" (though it could potentially introduce other artifacts); a longer fade-in (like 20 or 50 ms) would avoid strange frequency-domain artifacts but would introduce a noticeable "fade in".
If you're willing to do a little more (coding) work, you can search for a "zero crossing" (basically, a place where the signal is at a zero point naturally) and split the audio there.
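A minimal sketch of that idea with pydub, assuming a mono segment (for stereo you would need to compare whole frames) and using get_array_of_samples():

def nearest_zero_crossing(segment, target_ms):
    # find a sample index at or after target_ms where the signal crosses zero
    samples = segment.get_array_of_samples()
    start = int(target_ms * segment.frame_rate / 1000)
    for i in range(start, len(samples) - 1):
        if samples[i] == 0 or (samples[i] > 0) != (samples[i + 1] > 0):
            return i
    return start  # no crossing found; fall back to the requested split point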
Probably the best approach if it's possible:
Encode the entire signal as a single, compressed file, and send the bytes (of that one file) down to the client in chunks for playback as a single stream. If you use constant bitrate mp3 encoding (CBR) you can send almost perfectly 1-second-long chunks just by counting bytes. E.g., with 256 kbps CBR, one second is 256,000 bits = 32,000 bytes, so just send 32 KB at a time.
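A sketch of that byte-counting approach (send_to_client is a hypothetical stand-in for writing to the HTTP socket):

CHUNK_BYTES = 256000 // 8  # 256 kbps CBR -> 32,000 bytes per second of audio

with open("my.mp3", "rb") as f:
    while True:
        chunk = f.read(CHUNK_BYTES)
        if not chunk:
            break
        send_to_client(chunk)  # hypothetical: socket.send(chunk)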
So, I could be totally wrong (I don't usually mess with audio files), but it could be an indexing issue. Try
p2 = p1 + 1001
but you may need to invert the concatenation process for it to work, unless you add an extra millisecond on the end.
The only other thing I would think it could be is an artifact that enters the stream when you convert the bytes to a string. Try using the AudioSegment().raw_data attribute for a bytes representation of the audio.
Sound is a waveform, and you are connecting two waves that are out of phase with one another, so you get a step discontinuity, and that makes the pop.
I'm unfamiliar with this software but codifying Nils Werner's suggestions, you might try:
song = AudioSegment.from_mp3('my.mp3')
song_pos = 0
# begin with a millisecond of silence
segment = AudioSegment.silent(duration=1)
# append all the pieces to it
while song_pos < 100:
    p1 = song_pos * 1000
    p2 = p1 + 1000
    # append each piece with several milliseconds of crossfade;
    # append() returns a new segment, and the crossfade cannot be longer
    # than either of the segments being joined
    segment = segment.append(song[p1:p2], crossfade=min(50, len(segment)))
    song_pos += 1
# then pass it on to your client outside of your loop
output = StringIO.StringIO()
segment.export(output, format="mp3")
client_data = output.getvalue()  # send this to client
Depending on how low/high the frequency of what you're joining is, you'll need to adjust the crossfade time to blend; low frequencies will require a longer fade.

splitting wav file in python

I'm trying to split a wav file programmatically in Python. Based on hints from Stack Overflow as well as the documentation for the Python wave module, I'm doing the following:
import wave
origAudio = wave.open('inputFile.wav','r')
frameRate = origAudio.getframerate()
nChannels = origAudio.getnchannels()
sampWidth = origAudio.getsampwidth()
start = float(someStartVal)
end = float(someEndVal)
origAudio.setpos(start*frameRate)
chunkData = origAudio.readframes(int((end-start)*frameRate))
chunkAudio = wave.open('outputFile.wav','w')
chunkAudio.setnchannels(nChannels)
chunkAudio.setsampwidth(sampWidth)
chunkAudio.setframerate(frameRate)
chunkAudio.writeframes(chunkData)
chunkAudio.close()
I iterate through a number of different start and end values, and extract chunks of audio from the original file in this manner. What's weird is that the technique works perfectly fine for some chunks, and produces garbage white noise for others. Also there's no obvious pattern of which start and end positions produce white noise, just that it happens consistently for an input file.
Anyone experienced this sort of behaviour before? Or know what I'm doing wrong? Suggestions on better ways of splitting an audio file programmatically are welcome.
Thanks in advance.
This may have to do with start*frameRate being a float when calling setpos. Perhaps after readframes you should use tell to find the current location of the file pointer instead.
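A sketch of that fix, rounding the split points to whole frames before seeking (variable names follow the question):

# round the split points to whole frames before seeking
startFrame = int(round(start * frameRate))
endFrame = int(round(end * frameRate))
origAudio.setpos(startFrame)
chunkData = origAudio.readframes(endFrame - startFrame)
# origAudio.tell() now reports the exact frame position for the next chunk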
