I am currently working on processing .wav files with Python, using PyAudio for streaming the audio and the Python wave library for loading the file data.
I plan to later add processing of the individual stereo channels, with regard to the amplitude of the signal and the panning of the stereo signal, but for now I'm just trying to separate the two channels of the wave file and stitch them back together, hopefully ending up with data that is identical to the input data.
The method getRawSample works perfectly fine, and I can stream audio through that function.
The problem is my getSample method. Somewhere along the line, where I'm separating the two channels of audio and joining them back together, the audio gets distorted. I have even commented out the part where I do the amplitude and panning adjustment, so in theory it's data in -> data out.
Below is an example of my code:
import struct
import threading
import wave

import numpy as np


class Sample(threading.Thread):
    def __init__(self, filepath, chunk):
        super(Sample, self).__init__()
        self.CHUNK = chunk
        self.filepath = filepath
        self.wave = wave.open(self.filepath, 'rb')
        self.amp = 0.5  # varies from 0 to 1
        self.pan = 0  # varies from -pi to pi
        self.WIDTH = self.wave.getsampwidth()
        self.CHANNELS = self.wave.getnchannels()
        self.RATE = self.wave.getframerate()
        self.MAXFRAMEFEEDS = self.wave.getnframes()/self.CHUNK  # maximum even number of chunks
        self.unpstr = '<{0}h'.format(self.CHUNK*self.WIDTH)  # format for unpacking the sample byte string
        self.pckstr = '<{0}h'.format(self.CHUNK*self.WIDTH)  # format for packing the sample byte string
        self.framePos = 0  # keeps track of how many chunks of data fed

    # panning and amplitude adjustment of input sample data
    def panAmp(self, data, panVal, ampVal):  # when panning, using constant power panning
        [left, right] = self.getChannels(data)
        #left = np.multiply(0.5, left)  # (np.sqrt(2)/2)*(np.cos(panVal) + np.sin(panVal))
        #right = np.multiply(0.5, right)  # (np.sqrt(2)/2)*(np.cos(panVal) - np.sin(panVal))
        outputList = self.combineChannels(left, right)
        dataResult = struct.pack(self.pckstr, *outputList)
        return dataResult

    def getChannels(self, data):
        dataPrepare = list(struct.unpack(self.unpstr, data))
        left = dataPrepare[0::self.CHANNELS]
        right = dataPrepare[1::self.CHANNELS]
        return [left, right]

    def combineChannels(self, left, right):
        stereoData = left
        for i in range(0, self.CHUNK/self.WIDTH):
            index = i*2+1
            stereoData = np.insert(stereoData, index, right[i*self.WIDTH:(i+1)*self.WIDTH])
        return stereoData

    def getSample(self, panVal, ampVal):
        data = self.wave.readframes(self.CHUNK)
        self.framePos += 1
        if self.framePos > self.MAXFRAMEFEEDS:  # if no more audio samples to process
            self.wave.rewind()
            data = self.wave.readframes(self.CHUNK)
            self.framePos = 1
        return self.panAmp(data, panVal, ampVal)

    def getRawSample(self):  # for debugging, bypasses pan and amp functions
        data = self.wave.readframes(self.CHUNK)
        self.framePos += 1
        if self.framePos > self.MAXFRAMEFEEDS:  # if no more audio samples to process
            self.wave.rewind()
            data = self.wave.readframes(self.CHUNK)
            self.framePos = 1
        return data
I suspect that the error is in the way I stitch the left and right channels back together, but I'm not sure.
I load the project with 16-bit, 44100 Hz .wav files.
Below is a link to an audio file so that you can hear the resulting audio output.
The first part is two files (both two-channel) running through the getSample method, while the next part is those same files running through the getRawSample method.
https://dl.dropboxusercontent.com/u/24215404/pythonaudiosample.wav
Based on the audio, as said earlier, it seems like the stereo file gets distorted. Looking at the waveform of the above file, it seems as though the right and left channels are exactly the same after going through the getSample method.
If needed, I can also post my code including the main function.
Hopefully my question isn't too vague, but I am grateful for any help or input!
As so often happens, I slept on it and woke up the next day with a solution.
The problem was in the combineChannels function.
Following is the working code:
def combineChannels(self, left, right):
    stereoData = left
    for i in range(0, self.CHUNK):
        index = i*2+1
        stereoData = np.insert(stereoData, index, right[i:(i+1)])
    return stereoData
The changes are:
For loop bounds: as I have 1024 items (the same as my chunk size) in the lists left and right, I of course need to iterate through every one of those.
index: the index definition remains the same.
stereoData: Again, here I remember that I'm working with lists, each containing a frame of audio. The code in the question assumes that my list is stored as a byte string, but this is of course not the case. And as you see, the resulting code is much simpler.
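For what it's worth, the per-frame insert loop can also be replaced by a single vectorized interleave. A minimal sketch, assuming left and right are equal-length lists of 16-bit samples and keeping the same method signature as above:

import numpy as np

def combineChannels(self, left, right):
    # interleave the two channels sample by sample: [L0, R0, L1, R1, ...]
    stereo = np.empty(len(left) + len(right), dtype=np.int16)
    stereo[0::2] = left
    stereo[1::2] = right
    return stereo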
Related
I am currently working on a research project involving neural networks operating on EEG datasets. I am using the BCICIV 2a dataset, which consists of a series of files containing trial data from subjects. Each file contains 25 channels and a very long (~600,000 time step) array of signals.

I have been writing code to preprocess this data into something I can pass to the neural network, but have run into some efficiency issues. Currently, I have code that determines the location in the array of all the trials in a file, then attempts to extract a 3D NumPy array that is stored in another array. When I attempt to run this code, however, it is ridiculously slow. I am not very familiar with NumPy; the majority of my experience at this point is in C. My intention was to write the results of the preprocessing to a separate file that can be loaded later to avoid repeating the preprocessing. From a C perspective, all that would be necessary is to move the pointers around to format the data appropriately, so I am not sure why NumPy is so slow.

Any suggestions would be very helpful: currently it takes ~2 minutes to extract 1 trial from a file, and with 288 trials in a file and 9 files, this would take much longer than I would like. I am not very comfortable with my knowledge of how to make good use of NumPy's efficiency improvements over generic lists. Thanks!
import glob, os
import numpy as np
import mne

DURATION = 313
XDIM = 7
YDIM = 6
IGNORE = ('EOG-left', 'EOG-central', 'EOG-right')

def getIndex(raw, tagIndex):
    return int(raw.annotations[tagIndex]['onset']*250)

def isEvent(raw, tagIndex, events):
    for event in events:
        if (raw.annotations[tagIndex]['description'] == event):
            return True
    return False

def getSlice1D(raw, channel, dur, index):
    if (type(channel) == int):
        channel = raw.ch_names[channel]
    return raw[channel][0][0][index:index+dur]

def getSliceFull(raw, dur, index):
    trial = np.zeros((XDIM, YDIM, dur))
    for channel in raw.ch_names:
        if not channel in IGNORE:
            x, y = convertIndices(channel)
            trial[x][y] = getSlice1D(raw, channel, dur, index)
    return trial

def convertIndices(channel):
    xDict = {'EEG-Fz':3, 'EEG-0':1, 'EEG-1':2, 'EEG-2':3, 'EEG-3':4, 'EEG-4':5, 'EEG-5':0, 'EEG-C3':1, 'EEG-6':2, 'EEG-Cz':3, 'EEG-7':4, 'EEG-C4':5, 'EEG-8':6, 'EEG-9':1, 'EEG-10':2, 'EEG-11':3, 'EEG-12':4, 'EEG-13':5, 'EEG-14':2, 'EEG-Pz':3, 'EEG-15':4, 'EEG-16':3}
    yDict = {'EEG-Fz':0, 'EEG-0':1, 'EEG-1':1, 'EEG-2':1, 'EEG-3':1, 'EEG-4':1, 'EEG-5':2, 'EEG-C3':2, 'EEG-6':2, 'EEG-Cz':2, 'EEG-7':2, 'EEG-C4':2, 'EEG-8':2, 'EEG-9':3, 'EEG-10':3, 'EEG-11':3, 'EEG-12':3, 'EEG-13':3, 'EEG-14':4, 'EEG-Pz':4, 'EEG-15':4, 'EEG-16':5}
    return xDict[channel], yDict[channel]

data_files = glob.glob('../datasets/BCICIV_2a_gdf/*.gdf')

try:
    raw = mne.io.read_raw_gdf(data_files[0], verbose='ERROR')
except IndexError:
    print("No data files found")

event_times = []
for i in range(len(raw.annotations)):
    if (isEvent(raw, i, ('769', '770', '771', '772'))):
        event_times.append(getIndex(raw, i))

data = np.empty((len(event_times), XDIM, YDIM, DURATION))
print(len(event_times))
for i, event in enumerate(event_times):
    data[i] = getSliceFull(raw, DURATION, event)
EDIT:
I wanted to come back and add some more details on the structure of the dataset. There is the 25 x ~600,000 array that contains the data, and a much shorter annotations object that includes event tags and relates them to times within the larger array. Specific events indicate a motor imagery cue, which is the trial my network is being trained on; I am attempting to extract a 3D slice that includes the relevant channels, formatted appropriately, with a temporal dimension that is 313 timesteps long. The annotations give me the relevant timesteps to investigate. The results of the profiling recommended by Ian showed that the main compute time is in the getSlice1D() function, particularly where I index into the raw object. The code that extracts the event times from the annotations is comparably negligible.
This is a partial answer, because the formatting available in comments is too limited, but:
def getIndex(raw, tagIndex):
    return int(raw.annotations[tagIndex]['onset']*250)

def isEvent(raw, tagIndex, events):
    for event in events:
        if (raw.annotations[tagIndex]['description'] == event):
            return True
    return False

for i in range(len(raw.annotations)):
    if (isEvent(raw, i, ('769', '770', '771', '772'))):
        event_times.append(getIndex(raw, i))
Notice how you're iterating over the annotation indices one at a time in Python. What you could do instead is:
def getEventIndices(raw_annotations_desc, raw_annotations_onset, events):
    # raw_annotations_desc and raw_annotations_onset are NumPy arrays holding
    # the 'description' and 'onset' fields of raw.annotations
    flag_container = []
    for event in events:  # Iterate through all the events
        # Do a vectorized comparison across all the indices
        flag_container.append(raw_annotations_desc == event)
    # At this point flag_container is of shape (|events|, len(raw_annotations_desc))
    flag_container = np.asarray(flag_container)  # Change raw list to np array
    # Assuming I understand correctly: for a given index, if ANY of the events
    # matches, we keep that index. Python treats False as 0 and True as 1, so
    # summing over the event axis gives the number of matches per index.
    matches = flag_container.sum(axis=0) > 0  # shape (len(raw_annotations_desc),)
    # Boolean-index the onsets and convert them to sample indices, exactly as
    # getIndex() did one element at a time
    return (raw_annotations_onset[matches] * 250).astype(int)
Something to that effect :) That should speed things up a bit
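Since the profiling edit above points at getSlice1D() and the repeated indexing into the raw object, here is a minimal sketch of the other half, assuming the full 25 x ~600,000 array fits in memory: pull the data out of the Raw object once with raw.get_data() and slice a plain NumPy array per trial. The names all_data, ch_index and getSliceFull_fast are just illustrative; convertIndices, IGNORE, XDIM and YDIM are the ones defined in the question.

import numpy as np

all_data = raw.get_data()  # shape (n_channels, n_samples), read once
ch_index = {name: i for i, name in enumerate(raw.ch_names)}

def getSliceFull_fast(dur, index):
    trial = np.zeros((XDIM, YDIM, dur))
    for channel in raw.ch_names:
        if channel not in IGNORE:
            x, y = convertIndices(channel)
            # plain NumPy slicing instead of indexing into the Raw object
            trial[x, y] = all_data[ch_index[channel], index:index + dur]
    return trial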
I've tried to merge two MIDI files into one, such that the second MIDI file will not be heard when playing the output MIDI file.
I've managed to get a result that plays only the first song, but I wonder if there is a way to hide (encrypt) the second file with the option to recover it from the output file.
With my approach, there is no option for recovery, because all of the second song's notes are at velocity 0, so unless I write the original velocities to an external file or something like that, it's impossible to recover them.
from mido import MidiFile

def merge(first, second):
    mid1 = MidiFile(first)
    mid2 = MidiFile(second)
    output = MidiFile(ticks_per_beat=mid1.ticks_per_beat, clip=mid1.clip, charset=mid1.charset, type=mid1.type)
    for i, track in enumerate(mid2.tracks):
        for j, msg in enumerate(mid2.tracks[i]):
            try:
                msg.velocity = 0
            except Exception as e:
                pass
        output.tracks.append(track)
    for i, track in enumerate(mid1.tracks):
        output.tracks.append(track)
    print(output.length)
    output.save(filename="merged.mid")
Thank you!
I've managed to hide the velocities with Meta-Messages.
For every note message I took its velocity and created a new MetaMessage with a JSON object "{index_of_note_in_track: old_velocity}", and put those MetaMessages at the end of the track.
from mido import MetaMessage, MidiFile

def merge(first, second):
    mid1 = MidiFile(first)
    mid2 = MidiFile(second)
    output = MidiFile(ticks_per_beat=mid1.ticks_per_beat, clip=mid1.clip, charset=mid1.charset, type=mid1.type)
    for i, track in enumerate(mid2.tracks):
        new_msgs = []
        for j, msg in enumerate(mid2.tracks[i]):
            if "velocity" in msg.dict().keys():
                new_msgs.append(MetaMessage('text', **{'text': f'{{"{j}":{str(msg.velocity)}}}'}))
                msg.velocity = 0
        for msg in new_msgs:
            track.insert(len(track), msg)
        output.tracks.append(track)
    for i, track in enumerate(mid1.tracks):
        output.tracks.append(track)
    print(output.length)
    output.save(filename="merged.mid")
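Not shown above is the recovery step. A minimal sketch of reading the hidden velocities back out of the text MetaMessages, assuming the JSON payload format written above (the function name recover_velocities is just illustrative):

import json
from mido import MidiFile

def recover_velocities(merged_path):
    mid = MidiFile(merged_path)
    for track in mid.tracks:
        # collect the hidden {index: velocity} pairs from the text meta-messages
        hidden = {}
        for msg in track:
            if msg.is_meta and msg.type == 'text':
                try:
                    payload = json.loads(msg.text)
                except ValueError:
                    continue  # some other text message, not one of our payloads
                if isinstance(payload, dict):
                    hidden.update({int(k): v for k, v in payload.items()})
        # restore the original velocities on the messages they belong to
        for j, msg in enumerate(track):
            if j in hidden and hasattr(msg, 'velocity'):
                msg.velocity = hidden[j]
    return mid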
Can you think of any other way to hide one MIDI file in another?
I have never done any video-based programming before, and although this SuperUser post provides a way to do it on the command line, I prefer a programmatic approach, preferably with Python.
I have a bunch of sub-videos. Suppose one of them is called 1234_trimmed.mp4 which is a short segment cut from the original, much-longer video 1234.mp4. How can I figure out the start and end timestamps of 1234_trimmed.mp4 inside 1234.mp4?
FYI, the videos are all originally on YouTube anyway ("1234" corresponds to the YouTube video ID) if there's any shortcut that way.
I figured it out myself with cv2. My strategy was to take the first and last frames of the subvideo and iterate over each frame of the original video, comparing the current frame's dhash against those of the first and last frames (using the minimum Hamming distance instead of checking for equality, in case of resizing and other transformations). I'm sure there are optimization opportunities, but I needed this yesterday.
import cv2

original_video_fpath = '5 POPULAR iNSTAGRAM BEAUTY TRENDS (DiY Feather Eyebrows, Colored Mascara, Drippy Lips, Etc)-vsNVU7y6dUE.mp4'
subvideo_fpath = 'vsNVU7y6dUE_trimmed-out.mp4'

def dhash(image, hashSize=8):
    # resize the input image, adding a single column (width) so we
    # can compute the horizontal gradient
    resized = cv2.resize(image, (hashSize + 1, hashSize))
    # compute the (relative) horizontal gradient between adjacent
    # column pixels
    diff = resized[:, 1:] > resized[:, :-1]
    # convert the difference image to a hash
    return sum([2 ** i for (i, v) in enumerate(diff.flatten()) if v])

def hamming(a, b):
    return bin(a ^ b).count('1')

def get_video_frame_by_index(video_cap, frame_index):
    # get total number of frames
    totalFrames = video_cap.get(cv2.CAP_PROP_FRAME_COUNT)
    if frame_index < 0:
        frame_index = int(totalFrames) + frame_index
    # check for valid frame number
    if 0 <= frame_index <= totalFrames:
        # set frame position
        video_cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index)
    _, frame = video_cap.read()
    return frame

def main():
    cap_original_video = cv2.VideoCapture(original_video_fpath)
    cap_subvideo = cv2.VideoCapture(subvideo_fpath)
    first_frame_subvideo = get_video_frame_by_index(cap_subvideo, 0)
    last_frame_subvideo = get_video_frame_by_index(cap_subvideo, -1)
    first_frame_subvideo_gray = cv2.cvtColor(first_frame_subvideo, cv2.COLOR_BGR2GRAY)
    last_frame_subvideo_gray = cv2.cvtColor(last_frame_subvideo, cv2.COLOR_BGR2GRAY)
    hash_first_frame_subvideo = dhash(first_frame_subvideo_gray)
    hash_last_frame_subvideo = dhash(last_frame_subvideo_gray)
    min_hamming_dist_with_first_frame = float('inf')
    closest_frame_index_first = None
    closest_frame_timestamp_first = None
    min_hamming_dist_with_last_frame = float('inf')
    closest_frame_index_last = None
    closest_frame_timestamp_last = None
    frame_index = 0
    while cap_original_video.isOpened():
        frame_exists, curr_frame = cap_original_video.read()
        if frame_exists:
            timestamp = cap_original_video.get(cv2.CAP_PROP_POS_MSEC) // 1000
            curr_frame_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
            hash_curr_frame = dhash(curr_frame_gray)
            hamming_dist_with_first_frame = hamming(hash_curr_frame, hash_first_frame_subvideo)
            hamming_dist_with_last_frame = hamming(hash_curr_frame, hash_last_frame_subvideo)
            if hamming_dist_with_first_frame < min_hamming_dist_with_first_frame:
                min_hamming_dist_with_first_frame = hamming_dist_with_first_frame
                closest_frame_index_first = frame_index
                closest_frame_timestamp_first = timestamp
            if hamming_dist_with_last_frame < min_hamming_dist_with_last_frame:
                min_hamming_dist_with_last_frame = hamming_dist_with_last_frame
                closest_frame_index_last = frame_index
                closest_frame_timestamp_last = timestamp
            frame_index += 1
        else:
            print('processed {} frames'.format(frame_index + 1))
            break
    cap_original_video.release()
    print('timestamp_start={}, timestamp_end={}'.format(closest_frame_timestamp_first, closest_frame_timestamp_last))

if __name__ == '__main__':
    main()
MP4 utilizes relative timestamps. When the file was trimmed the old timestamps were lost, and the new file now begins at time stamp zero.
So the only way to identify where this file may overlap with another file is to use computer vision or perceptual hashing. Both options are too complex to describe in a single stackoverflow answer.
If they were simply -codec copy'd, the timestamps should be as they were in the original file. If they weren't, ffmpeg is not the tool for the job. In that case, you should look into other utilities that can find an exactly matching video and audio frame in both files and get the timestamps from there.
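As a quick check of whether a trimmed file kept its original timestamps, here is a hedged sketch that asks ffprobe for the container-level start_time; it assumes ffprobe is on the PATH, and a value of 0 suggests the trim reset the timestamps:

import subprocess

def container_start_time(path):
    # ffprobe prints the format-level start_time as a bare number
    result = subprocess.run(
        ['ffprobe', '-v', 'error',
         '-show_entries', 'format=start_time',
         '-of', 'default=noprint_wrappers=1:nokey=1',
         path],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()

print(container_start_time('1234_trimmed.mp4'))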
I have two bytes objects.
One comes from using the wave module to read a "chunk" of data:
def get_wave_from_file(filename):
    import wave
    original_wave = wave.open(filename, 'rb')
    return original_wave
The other uses MIDI information and a synthesizer module (fluidsynth):
import numpy as np

def create_wave_from_midi_info(sound_font_path, notes):
    import fluidsynth
    s = []
    fl = fluidsynth.Synth()
    sfid = fl.sfload(sound_font_path)  # Loads a soundfont
    fl.program_select(track=0, soundfontid=sfid, banknum=0, presetnum=0)  # Selects the soundfont
    for n in notes:
        fl.noteon(0, n['midi_num'], n['velocity'])
        s = np.append(s, fl.get_samples(int(44100 * n['duration'])))  # Gives the note the correct duration, based on a sample rate of 44.1 kHz
        fl.noteoff(0, n['midi_num'])
    fl.delete()
    samps = fluidsynth.raw_audio_string(s)
    return samps
The two files are of different length.
I want to combine the two waves, so that both are heard simultaneously.
Specifically, I would like to do this "one chunk at a time".
Here is my setup:
def get_a_chunk_from_each(wave_object, bytes_from_midi, chunk_size=1024, starting_sample=0):
    from_wav_data = wave_object.readframes(chunk_size)
    from_midi_data = bytes_from_midi[starting_sample:starting_sample + chunk_size]
    return from_wav_data, from_midi_data
Info about the return from get_a_chunk_from_each():
type(from_wav_data), type(from_midi_data)
len(from_wav_data), len(from_midi_data)
4096 1024
Firstly, I'm confused as to why the lengths are different (the one generated from wave_object.readframes(1024) is exactly 4 times longer than the one generated by manually slicing bytes_from_midi[0:1024]). This may be part of the reason I have been unsuccessful.
Secondly, I want to create the function which combines the two chunks. The following "pseudocode" illustrates what I want to happen:
def combine_chunks(chunk1, chunk2):
    mixed = chunk1 + chunk2
    # OR, probably more like:
    mixed = (chunk1 + chunk2) / 2
    # To prevent clipping?
    return mixed
It turns out there is a very, very simple solution.
I simply used the library audioop:
https://docs.python.org/3/library/audioop.html
and used their "add" function (the third argument is the sample width in bytes; since this is 16-bit audio, that's 16 / 8 = 2 bytes):
audioop.add(chunk1, chunk2, 2)
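A small usage sketch tying this back to get_a_chunk_from_each() above. audioop.add() requires both fragments to be the same length, so the shorter chunk is padded with silence here; the function and variable names are just illustrative:

import audioop

def combine_chunks(chunk1, chunk2, width=2):
    # audioop.add needs equal-length fragments, so pad the shorter one
    # with zero bytes (silence for signed PCM)
    if len(chunk1) < len(chunk2):
        chunk1 += b'\x00' * (len(chunk2) - len(chunk1))
    elif len(chunk2) < len(chunk1):
        chunk2 += b'\x00' * (len(chunk1) - len(chunk2))
    return audioop.add(chunk1, chunk2, width)

mixed = combine_chunks(from_wav_data, from_midi_data, width=2)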
I am extracting 150 different cell values from 350,000 ASCII raster files (20 kB each). My current code is fine for processing the 150 cell values from hundreds of the ASCII files; however, it is very slow when running on the full data set.
I am still learning Python, so are there any obvious inefficiencies, or suggestions to improve the code below?
I have tried closing the 'dat' file in the second function; no improvement:
dat = None
First: I have a function which returns the row and column locations from a Cartesian grid.
def world2Pixel(gt, x, y):
    ulX = gt[0]
    ulY = gt[3]
    xDist = gt[1]
    yDist = gt[5]
    rtnX = gt[2]
    rtnY = gt[4]
    pixel = int((x - ulX) / xDist)
    line = int((ulY - y) / xDist)
    return (pixel, line)
Second: a function to which I pass lists of 150 'id', 'x' and 'y' values in a for loop. The first function is called within it and used to extract the cell value, which is appended to a new list. I also have a list of files, 'asc_list', and corresponding times in 'date_list'. Please ignore count / enumerate, as I use them later; unless they are impeding efficiency.
def asc2series(id, x, y):
    #count = 1
    ls_id = []
    ls_p = []
    ls_d = []
    for n, (asc, date) in enumerate(zip(asc, date_list)):
        dat = gdal.Open(asc_list)
        gt = dat.GetGeoTransform()
        pixel, line = world2Pixel(gt, east, nort)
        band = dat.GetRasterBand(1)
        #dat = None
        value = band.ReadAsArray(pixel, line, 1, 1)[0, 0]
        ls_id.append(id)
        ls_p.append(value)
        ls_d.append(date)
Many thanks
In world2Pixel you are setting rtnX and rtnY, which you don't use.
You probably meant gdal.Open(asc) -- not asc_list.
You could move gt = dat.GetGeoTransform() out of the loop. (Rereading made me realize you can't really.)
You could cache calls to world2Pixel.
You're opening the dat file for each pixel -- you should probably turn the logic around to open each file only once and look up all the pixels mapped to that file (see the sketch at the end of this answer).
Benchmark, check the links in this podcast to see how: http://talkpython.fm/episodes/show/28/making-python-fast-profiling-python-code
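A minimal sketch of that restructuring, assuming id_list, x_list, y_list, asc_list and date_list are your existing inputs and world2Pixel is the function from the question (the name asc2series_fast is just illustrative):

from osgeo import gdal

def asc2series_fast(id_list, x_list, y_list, asc_list, date_list):
    records = []
    for asc, date in zip(asc_list, date_list):
        dat = gdal.Open(asc)
        gt = dat.GetGeoTransform()
        arr = dat.GetRasterBand(1).ReadAsArray()  # the whole 20 kB raster, read once
        # pixel/line positions depend only on this file's geotransform,
        # so they are computed once per file for all 150 points
        for pid, x, y in zip(id_list, x_list, y_list):
            pixel, line = world2Pixel(gt, x, y)
            records.append((pid, arr[line, pixel], date))
        dat = None  # close the dataset
    return records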