I'm trying to split a wav file programmatically in Python. Based on hints from stackoverflow as well as the documentation from the Python wave module I'm doing the following
import wave
origAudio = wave.open('inputFile.wav','r')
frameRate = origAudio.getframerate()
nChannels = origAudio.getnchannels()
sampWidth = origAudio.getsampwidth()
start = float(someStartVal)
end = float(someEndVal)
origAudio.setpos(start*frameRate)
chunkData = origAudio.readframes(int((end-start)*frameRate))
chunkAudio = wave.open('outputFile.wav','w')
chunkAudio.setnchannels(nChannels)
chunkAudio.setsampwidth(sampWidth)
chunkAudio.setframerate(frameRate)
chunkAudio.writeframes(chunkData)
chunkAudio.close()
I iterate through a number of different start and end values, and extract chunks of audio from the original file in this manner. What's weird is that the technique works perfectly fine for some chunks, and produces garbage white noise for others. Also there's no obvious pattern of which start and end positions produce white noise, just that it happens consistently for an input file.
Anyone experienced this sort of behaviour before? Or know what I'm doing wrong? Suggestions on better ways of splitting an audio file programmatically are welcome.
Thanks in advance.
This may be because start*frameRate is a float when you call setpos, which expects an integer frame index; try wrapping it in int(). You could also call tell after readframes to check where the file pointer actually ended up.
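As a rough sketch of that suggestion, the helper below converts the start and end times to whole frame indices before calling setpos and readframes. The function name split_wav and its parameters are made up for illustration, and it assumes the float position is actually what causes the noise, which the question does not confirm.

```python
import wave


def split_wav(src_path, dst_path, start_s, end_s):
    """Copy the audio between start_s and end_s (seconds) into dst_path."""
    with wave.open(src_path, "rb") as src:
        frame_rate = src.getframerate()
        # Convert seconds to whole frame indices before touching the file.
        start_frame = int(start_s * frame_rate)
        end_frame = min(int(end_s * frame_rate), src.getnframes())
        src.setpos(start_frame)
        chunk = src.readframes(max(end_frame - start_frame, 0))
        params = src.getparams()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(params.framerate)
        dst.writeframes(chunk)  # the header frame count is patched on close
```

Because every position is an integer number of frames, each chunk starts on a frame boundary regardless of the start and end values passed in.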
I have a 16-bit PCM mono WAV file at 44100 Hz and I'm trying to convert it into a spectrogram. I want a spectrogram for every 20 ms of audio (this is for speech recognition), but whenever I compare what I get to Audacity, it's really different. I'm fairly new to Python, so I was basing this off my Java knowledge. Any help would be appreciated. I think I'm splitting the read samples incorrectly (I split the array every 220 elements, since I believe audio data is just samples in the time domain, to get 20 ms of audio).
Here's the code I have right now:
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy
audioPath = 'C:\\Users\\pawar\\Desktop\\Resister.wav'
audioData, sampleRate = librosa.load(audioPath, sr=None)
print(sampleRate)
new = numpy.zeros(shape=(220, 1))
counter = 0
# 882 samples correspond to 20 ms at 44100 Hz
for i in range(0, len(audioData), 882):
    new = (audioData[i: i + 882])
    STFT = librosa.stft(new, n_fft=882)
    print(type(STFT))
    audioDatainDB = librosa.amplitude_to_db(abs(STFT))
    print(type(audioDatainDB))
    librosa.display.specshow(audioDatainDB, sr=sampleRate, x_axis='time', y_axis='hz')
    #plt.figure(figsize=(20,10))
    plt.show()
    counter += 1
    print("Your local debug print statement ", counter)
As for the values, well I was playing around with them quite a bit trying to get it to work. Doesn't seem to be of any luck ;/
Here's the output:
https://i.stack.imgur.com/EVntx.png
And here's what Audacity shows:
https://i.stack.imgur.com/GIGy8.png
I know it's not 20 ms in the Audacity one, but you can see the two don't look even a bit similar.
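For reference, here is a small sketch of the frame-size arithmetic behind the 20 ms split, using plain Python lists instead of librosa. The helper names samples_per_frame and split_into_frames are hypothetical, and the frames are assumed to be non-overlapping, which is not how Audacity's spectrogram is computed.

```python
def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of samples covering frame_ms milliseconds at sample_rate_hz."""
    return int(round(sample_rate_hz * frame_ms / 1000))


def split_into_frames(samples, frame_len):
    """Yield consecutive, non-overlapping slices of frame_len samples."""
    for start in range(0, len(samples), frame_len):
        yield samples[start:start + frame_len]


frame_len = samples_per_frame(44100, 20)  # 20 ms at 44100 Hz
frames = list(split_into_frames(list(range(2000)), frame_len))
```

At 44100 Hz a 20 ms frame works out to 882 samples rather than 220, and the last frame is shorter whenever the signal length is not a multiple of the frame length.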
I need to run a function on images in less than 1 second. The problem is that just loading a 1000x1000 image as a matrix in the program takes 1 second.
The function I use to load it is as follows:
import png
def load(fname):
    with open(fname, mode='rb') as f:
        reader = png.Reader(file=f)
        w, h, png_img, _ = reader.asRGB8()
        img = []
        for line in png_img:
            l = []
            for i in range(0, len(line), 3):
                l+=[(line[i], line[i+1], line[i+2])]
            img+=[l]
    return img
How can I modify it so that opening the image takes only a little more than a few milliseconds?
IMPORTANT NOTE: I cannot import other functions outside of this (this is a university exercise and therefore there are rules -.-). So I have to write one myself.
You can use PIL to do this for you; it's highly optimized and fast.
from PIL import Image
def load(path):
    return Image.open(path)
Building each row one pixel at a time in a Python loop is slow (read about Shlemiel the painter's algorithm for why repeated concatenation can hurt). You can replace the inner loop with zip and slicing.
for line in png_img:
    img.append(list(zip(line[0::3], line[1::3], line[2::3])))
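To show what the slicing does, here is a minimal self-contained sketch that works on hand-made rows instead of a real PNG. The function name rows_to_pixels and the sample data are made up for illustration; the rows are assumed to be flat R, G, B sequences like those yielded by asRGB8.

```python
def rows_to_pixels(rows):
    """Turn flat R,G,B,R,G,B,... rows into rows of (R, G, B) tuples."""
    return [list(zip(row[0::3], row[1::3], row[2::3])) for row in rows]


flat_rows = [bytearray([255, 0, 0, 0, 255, 0]), bytearray([0, 0, 255, 9, 9, 9])]
image = rows_to_pixels(flat_rows)
```

Each slice picks out one colour channel of the row, and zip regroups the three channels into per-pixel tuples without an explicit index loop.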
I'm not sure it is even remotely possible to run a Python script that opens a file, etc., in just a few ms. On my computer, the simplest program takes several tens of milliseconds.
Without knowing more about the specifics of your problem and the reasons for your constraint, it is hard to answer. You should consider what you are trying to do, in the context of the way your program really works, and then formulate a strategy to achieve your goal.
The total context here is, you're asking the computer to:
run python, load your code and interpret it
load any modules you want to use
find your image file and read it from disk
give those bytes some meaning as an image abstraction (parse them, etc.)
do some kind of transform or "work" on the image
export your result in some way
You need to figure out which of those steps really needs to be lightning fast. After that, maybe someone can make a suggestion.
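One way to find out which step dominates is to time each one separately. The sketch below uses a hypothetical helper called timed, with the built-in sum standing in for a real step such as parsing the image; it is only meant to show the measuring pattern.

```python
import time


def timed(label, func, *args):
    """Run func(*args), print how long it took, and return its result."""
    start = time.perf_counter()
    result = func(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print("{}: {:.2f} ms".format(label, elapsed_ms))
    return result


# Stand-in for a real step such as "parse the image bytes".
total = timed("sum 1..1000", sum, range(1, 1001))
```

Wrapping the file read, the decoding, and the processing in separate timed calls would show where the second is actually going.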
I am using the wave library in Python to attempt to reduce the speed of audio by 50%. I have been successful, but only in the right channel. The left channel is a whole bunch of static.
import wave,os,math
r=wave.open(r"C:\Users\A\My Documents\LiClipse Workspace\Audio compression\Audio compression\aha.wav","r")
w=wave.open(r"C:\Users\A\My Documents\LiClipse Workspace\Audio compression\Audio compression\ahaout.wav","w")
frames=r.readframes(r.getnframes())
newframes=bytearray()
w.setparams(r.getparams())
for i in range(0,len(frames)-1):
    newframes.append(frames[i])
    newframes.append(frames[i])
w.writeframesraw(newframes)
Why is this? Since I am just copying and pasting raw data, surely I can't generate static?
Edit: I've been looking for ages and I finally found a useful resource for the wave format: http://soundfile.sapp.org/doc/WaveFormat/
If I want to preserve stereo sound, it looks like I need to copy whole 4-byte frames twice. This is because there are two channels of 2 bytes each, so one frame takes up 4 bytes instead of 2.
import wave
r=wave.open(r"C:\Users\A\My Documents\LiClipse Workspace\Audio compression\Audio compression\aha.wav","r")
w=wave.open(r"C:\Users\A\My Documents\LiClipse Workspace\Audio compression\Audio compression\ahaout.wav","w")
frames=r.readframes(r.getnframes())
newframes=bytearray()
w.setparams(r.getparams())
w.setframerate(r.getframerate())
print(r.getsampwidth())
for i in range(0,len(frames)-4,4):
    newframes.append(frames[i])
    newframes.append(frames[i+1])
    newframes.append(frames[i+2])
    newframes.append(frames[i+3])
    newframes.append(frames[i])
    newframes.append(frames[i+1])
    newframes.append(frames[i+2])
    newframes.append(frames[i+3])
w.writeframesraw(newframes)
Edit 2:
Okay, I have no idea what drove me to do this, but I am already enjoying the freedom it is giving me. I chose to copy the wav file into memory, edit the copy directly, and write it to an output file. I am incredibly happy with the results. I can import a wav, repeat the audio once, and write it to an output file in only 0.2 seconds. Reducing the speed by half now takes only 9 seconds instead of the 30+ seconds with my old code using the wave module :) Here's the code. It's still kind of unoptimized, I guess, but it's better than what it was.
import struct
import time as t
start_time=t.perf_counter()  # t.clock() was removed in Python 3.8
r=open(r"C:/Users/apier/Documents/LiClipse Workspace/audio editing software/main/aha.wav","rb")
w=open(r"C:/Users/apier/Documents/LiClipse Workspace/audio editing software/main/output.wav","wb")
rbuff=bytearray(r.read())
def replacebytes(array,bites,stop):
    # Overwrite the len(bites) bytes of array that end just before index stop.
    length=len(bites)
    start=stop-length
    for i in range(start,stop):
        array[i]=bites[i-start]
def write(audio):
    w.write(audio)
def repeat(audio,repeats):
    if(repeats==1):
        return(audio)
    if(repeats==0):
        return(audio[:44])
    # Bytes 40-43 hold the data chunk size, assuming a plain 44-byte header.
    replacebytes(audio, struct.pack('<I', struct.unpack('<I',audio[40:44])[0]*repeats), 44)
    # The -58 offset is specific to this input file.
    return(audio+(audio[44:len(audio)-58]*(repeats-1)))
def slowhalf(audio):
    buff=bytearray()
    # Double the data chunk size stored in the header.
    replacebytes(audio, struct.pack('<I', struct.unpack('<I',audio[40:44])[0]*2), 44)
    # Copy each 4-byte stereo frame twice; the -62 offset is specific to this file.
    for i in range(44,len(audio)-62,4):
        buff.append(audio[i])
        buff.append(audio[i+1])
        buff.append(audio[i+2])
        buff.append(audio[i+3])
        buff.append(audio[i])
        buff.append(audio[i+1])
        buff.append(audio[i+2])
        buff.append(audio[i+3])
    return(audio[:44]+buff)
rbuff=slowhalf(rbuff)
write(rbuff)
r.close()
w.close()
print(t.perf_counter()-start_time)
I am surprised at how small the code is.
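As a side note, the header patching can be written a little more compactly with struct.pack_into. This is only a sketch under the assumption of a canonical 44-byte PCM header as described in the WaveFormat page linked above (ChunkSize at offset 4, Subchunk2Size at offset 40); files with extra chunks will not match this layout. The name patch_sizes and the dummy header are made up for illustration.

```python
import struct

HEADER_LEN = 44  # assumes a canonical PCM header with no extra chunks


def patch_sizes(header, data_len):
    """Return a copy of a 44-byte WAV header updated for data_len bytes of audio."""
    patched = bytearray(header[:HEADER_LEN])
    struct.pack_into("<I", patched, 4, 36 + data_len)  # RIFF ChunkSize
    struct.pack_into("<I", patched, 40, data_len)      # data Subchunk2Size
    return patched


# Build a dummy header purely for demonstration.
dummy = bytearray(HEADER_LEN)
dummy[0:4] = b"RIFF"
dummy[8:12] = b"WAVE"
dummy[36:40] = b"data"
new_header = patch_sizes(dummy, 1000)
```

Updating both size fields keeps the RIFF chunk size consistent with the data chunk size, which some players check.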
Each of the elements returned by readframes is a single byte, even though the type is int. An audio sample is typically 2 bytes. By doubling up each byte instead of each whole sample, you get noise.
I have no idea why one channel would work; with the code shown in the question it should all be noise.
This is a partial fix. It still intermixes the left and right channel, but it will give you an idea of what will work.
for i in range(0,len(frames)-1,2):
    newframes.append(frames[i])
    newframes.append(frames[i+1])
    newframes.append(frames[i])
    newframes.append(frames[i+1])
Edit: here's the code that should work in stereo. It copies 4 bytes at a time, 2 for the left channel and 2 for the right, then does it again to double them up. This keeps the left and right channel data from getting mixed up.
for i in range(0, len(frames), 4):
    for _ in range(2):
        for j in range(4):
            newframes.append(frames[i+j])
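The same idea can be generalized so the frame size is derived from the channel count and sample width rather than hard-coded as 4. This is a sketch on a plain bytes object; the function name duplicate_frames is hypothetical, and in real use the two parameters would come from getnchannels() and getsampwidth().

```python
def duplicate_frames(frames, n_channels, samp_width):
    """Repeat every whole frame once, halving the playback speed."""
    frame_size = n_channels * samp_width  # bytes per frame
    out = bytearray()
    for i in range(0, len(frames) - frame_size + 1, frame_size):
        frame = frames[i:i + frame_size]
        out += frame
        out += frame
    return bytes(out)


# Two stereo 16-bit frames: (L=0x0201, R=0x0403) and (L=0x0605, R=0x0807).
stereo = bytes([1, 2, 3, 4, 5, 6, 7, 8])
slowed = duplicate_frames(stereo, n_channels=2, samp_width=2)
```

Slicing whole frames also avoids the per-byte append calls, and any trailing partial frame is simply dropped.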
I am trying to read raw image data from a CR2 (Canon raw image file). I want to read the data only (no header, etc.), as unprocessed as possible (i.e. pre-Bayer, the most native unprocessed data), and store it in a numpy array. I have tried a bunch of libraries such as opencv, rawkit and rawpy, but nothing seems to work correctly.
Any suggestion on how I should do this? What I should use? I have tried a bunch of things.
Thank you
Since libraw/dcraw can read cr2, it should be easy to do. With rawpy:
#!/usr/bin/env python
import rawpy
raw = rawpy.imread("/some/path.cr2")
bayer = raw.raw_image # with border
bayer_visible = raw.raw_image_visible # just visible area
Both bayer and bayer_visible are then 2D numpy arrays.
You can use rawkit to get this data, however, you won't be able to use the actual rawkit module (which provides higher level APIs for dealing with Raw images). Instead, you'll want to use mostly the libraw module which allows you to access the underlying LibRaw APIs.
It's hard to tell exactly what you want from this question, but I'm going to assume the following: Raw bayer data, including the "masked" border pixels (which aren't displayed, but are used to calculate various things about the image). Something like the following (completely untested) script will allow you to get what you want:
#!/usr/bin/env python
import ctypes
from rawkit.raw import Raw
with Raw(filename="some_file.CR2") as raw:
    raw.unpack()
    # For more information, see the LibRaw docs:
    # http://www.libraw.org/docs/API-datastruct-eng.html#libraw_rawdata_t
    rawdata = raw.data.contents.rawdata
    data_size = rawdata.sizes.raw_height * rawdata.sizes.raw_width
    data_pointer = ctypes.cast(
        rawdata.raw_image,
        ctypes.POINTER(ctypes.c_ushort * data_size)
    )
    data = data_pointer.contents
    # Grab the first few pixels for demonstration purposes...
    for i in range(5):
        print('Pixel {}: {}'.format(i, data[i]))
There's a good chance that I'm misunderstanding something and the size is off, in which case this will segfault eventually, but this isn't something I've tried to make LibRaw do before.
More information can be found in this question on the LibRaw forums, or in the LibRaw struct docs.
Storing the data in a numpy array I leave as an exercise for the reader, or for a follow-up answer (I have no experience with numpy).
I have a .tar file containing several hundred pictures (.png). I need to process them with opencv.
I am wondering whether, for efficiency reasons, it is possible to process them without going through the disk. In other words, I want to read the pictures from the memory stream related to the tar file.
Consider for instance
import tarfile
import cv2
tar0 = tarfile.open('mytar.tar')
im = cv2.imread( tar0.extractfile('fname.png').read() )
The last line doesn't work as imread expects a file name rather than a stream.
Consider that this way of reading directly from the tar stream can be achieved e.g. for text (see e.g. this SO question).
Any suggestion to open the stream with the correct png encoding?
Untarring to ramdisk is of course an option, although I was looking for something more cachable.
Thanks to the suggestion of @abarry and this SO answer I managed to find the answer.
Consider the following
import tarfile
import cv2
import numpy as np
def get_np_array_from_tar_object(tar_extractfl):
    '''converts a buffer from a tar file into an np.array'''
    return np.asarray(
        bytearray(tar_extractfl.read())
        , dtype=np.uint8)
tar0 = tarfile.open('mytar.tar')
# flag 0 decodes the image as grayscale
im0 = cv2.imdecode(
    get_np_array_from_tar_object(tar0.extractfile('fname.png'))
    , 0 )
Perhaps use imdecode with a buffer coming out of the tar file? I haven't tried it but seems promising.
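The tar side of this can be tried out without opencv or any file on disk. The sketch below builds a tiny archive in memory and reads one member back as bytes; the helper name read_member_bytes and the fake payload are made up for illustration, and the payload is not a valid PNG.

```python
import io
import tarfile


def read_member_bytes(tar, name):
    """Return the raw bytes of one archive member without extracting to disk."""
    member = tar.extractfile(name)
    if member is None:  # directories and special files have no data stream
        raise ValueError("{} is not a regular file".format(name))
    with member:
        return member.read()


# Build a small in-memory archive purely for demonstration.
payload = b"\x89PNG fake image bytes"
archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w") as tar:
    info = tarfile.TarInfo(name="fname.png")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

archive.seek(0)
with tarfile.open(fileobj=archive, mode="r") as tar:
    data = read_member_bytes(tar, "fname.png")
```

The returned bytes are what would be wrapped in a uint8 array and handed to cv2.imdecode, as in the answer above.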