I have audio from a video that I've loaded with PyTorch. Given a starting index and ending index corresponding to the video segment of interest, along with the video FPS and audio sampling rate, how would I go about extracting the slice of audio that matches the segment of interest of the video?
My intuition is to convert frames to time via:
start_time = frame_start / fps
end_time = frame_end / fps
then convert time to sample position with:
start_sample = int(math.floor(start_time * sr))
end_sample = int(math.floor(end_time * sr))
Is this correct? Or is there something I'm missing? I'm worried that there will be loss of information since I'm converting the samples into ints with floor.
Let's say you have
fs = 44100 # audio sampling frequency
vfr = 24 # video frame rate
frame_start = 10 # index of first frame
frame_end = 10 # index of last frame
audio = np.arange(44100) # audio in form of ndarray
you can calculate at which points in time you want to slice the audio
time_start = frame_start / vfr
time_end = frame_end / vfr # or (frame_end + 1) / vfr for inclusive cut
and then to which samples those points in time correspond:
sample_start_idx = int(time_start * fs)
sample_end_idx = int(time_end * fs)
It's up to you whether to be more precise and account for the fact that the audio corresponding to a given frame arguably starts half a frame before that frame and ends half a frame after it.
In such a case use:
time_start = np.clip((frame_start - 0.5) / vfr, 0, np.inf)
time_end = (frame_end + 0.5) / vfr
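As a rough sketch of the half-frame variant above, the helper below turns a frame range into sample indices. The name centered_sample_range is hypothetical (it is not from any library), and the sketch assumes a constant frame rate and that audio sample 0 lines up with the timestamp of frame 0.

```python
import numpy as np

def centered_sample_range(frame_start, frame_end, vfr, fs):
    """Return (start, stop) sample indices for frames centred on their timestamps.

    Hypothetical helper: assumes a constant frame rate and that audio
    sample 0 coincides with the timestamp of frame 0.
    """
    # Clip at zero so the first frame does not ask for audio before t = 0
    time_start = max(frame_start - 0.5, 0) / vfr
    time_end = (frame_end + 0.5) / vfr
    return int(time_start * fs), int(time_end * fs)

fs = 44100                   # audio sampling frequency
vfr = 24                     # video frame rate
audio = np.arange(fs)        # one second of placeholder audio
start, stop = centered_sample_range(10, 10, vfr, fs)
segment = audio[start:stop]  # roughly fs / vfr samples for a single frame
```

With frame_start equal to frame_end, this still returns about one frame's worth of audio, whereas the exclusive cut without the half-frame offsets would return an empty slice.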
Your solution is just fine. Flooring moves each boundary by less than one sample period, so at a sample rate of 16000 the video/audio desync stays below 1/16000 = 6.25e-05 seconds (4.166e-05 seconds in the example below), which is orders of magnitude below what human ears are able to discern.
import math
fps = 60
frame_start = 121
frame_end = 181
sr=16000
start_time = frame_start / fps
end_time = frame_end / fps
start_sample = int(math.floor(start_time * sr))
end_sample = int(math.floor(end_time * sr))
print(end_time-end_sample/sr) # 4.166666666671759e-05
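To tie this back to the loaded audio from the question, here is a minimal sketch of the actual slicing step. frame_range_to_samples is a made-up helper name, and the sketch assumes the waveform is laid out as (channels, samples). NumPy is used so the snippet runs without PyTorch installed, but the same slice expression should work on a torch tensor.

```python
import math
import numpy as np

def frame_range_to_samples(frame_start, frame_end, fps, sr, inclusive=False):
    """Map a video frame range to a (start, stop) audio sample range.

    Hypothetical helper. Multiplying before dividing keeps the
    intermediate value exact for integer inputs.
    """
    last = frame_end + 1 if inclusive else frame_end
    start = math.floor(frame_start * sr / fps)
    stop = math.floor(last * sr / fps)
    return start, stop

fps = 60
sr = 16000
waveform = np.zeros((2, 5 * sr))        # placeholder: 2 channels, 5 seconds
start, stop = frame_range_to_samples(121, 181, fps, sr)
segment = waveform[..., start:stop]     # slice along the last (samples) axis
print(segment.shape)
```

Sixty frames at 60 fps span exactly one second, so the segment holds one second of samples per channel even though both boundaries were floored.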
How can I execute ser.readline() at a controlled rate, say every 0.002 seconds? The Python code below returns a list of a different size on every run, meaning the sampling rate varies each time. Is there a controlled way of reading from the serial port at a desired sampling rate of 500 scans/second?
import numpy as np
from time import time
import serial
ser = serial.Serial('COM3', 115200, timeout=1)
ser.flushInput()
digital_data = np.array([])
# Set the end time 60 seconds from start
te = time() + 60
# While loop runs for 60 seconds
while time() <= te:
    digital_data = np.append(digital_data, ser.readline().decode('utf-8'))
print(len(digital_data)) # Varies in size for each run
This may not be exact, since timer resolution depends on the OS you're running, but you can pad each iteration so that it lasts at least 0.002 seconds:
import numpy as np
import time
import serial
ser = serial.Serial('COM3', 115200, timeout=1)
ser.flushInput()
digital_data = np.array([])
# Set the end time 60 seconds from start
te = time.time() + 60
delay = 0.002
while time.time() <= te:
    start = time.time()
    digital_data = np.append(digital_data, ser.readline().decode('utf-8'))
    duration = time.time() - start
    # Sleep off whatever is left of the 2 ms period
    if duration < delay:
        time.sleep(delay - duration)
print(len(digital_data)) # At most about 60 / delay = 30000 lines per run
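One way to limit the drift that builds up when every iteration sleeps independently is to schedule reads against absolute deadlines and to read a fixed number of samples. The sketch below runs without any hardware: sample_at_fixed_rate and read_fn are hypothetical names, and read_fn stands in for something like lambda: ser.readline().decode('utf-8'). It cannot go faster than the device sends lines, since readline() blocks until a line arrives or the timeout expires.

```python
import time

def sample_at_fixed_rate(read_fn, period, n_samples):
    """Call read_fn n_samples times, aiming for one call every `period` seconds.

    Deadlines are absolute (start + k * period), so a slow iteration is
    followed by shorter sleeps instead of pushing every later read back.
    """
    samples = []
    next_deadline = time.perf_counter()
    for _ in range(n_samples):
        samples.append(read_fn())
        next_deadline += period
        remaining = next_deadline - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)
    return samples

t0 = time.perf_counter()
# Dummy reader used in place of a serial port for this demonstration
samples = sample_at_fixed_rate(lambda: "dummy line", period=0.002, n_samples=100)
elapsed = time.perf_counter() - t0
print(len(samples), round(elapsed, 3))
```

For the original goal this would be called with period=0.002 and n_samples=30000, which fixes the list length by construction. The actual timing accuracy still depends on the OS scheduler.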
I'm new to Vizard. I'm trying to write a simple script that performs two tasks in sequence, each for a set duration:
A black image for 0.8 seconds
A sequence of images (from a folder) taken randomly, for 1.5 seconds.
I can perform these tasks separately, but I can't merge them together. If anyone has suggestions, thank you.
import viz
import vizact
import vizinfo
import viztask # needed for viztask.waitTime and viztask.schedule below
import random
viz.setMultiSample(4)
viz.fov(60)
viz.go()
vizinfo.InfoPanel()
viz.clearcolor(viz.BLACK)
FRAME_RATE = 0.667 # in Hertz
r = list(range(7))
random.shuffle(r)
movieImages = viz.cycle( [ viz.addTexture('sequence_IMG/img%d.jpg' % i) for i in r ] )
screen = viz.addTexQuad()
screen.setPosition([0, 1.82, 1.5])
screen.setScale([4.0/3, 1, 1])
def executeExperiment():
for trialNumber in range(3):
yield Dark() #wait for doTrial to finish
yield vizact.ontimer(1.0/FRAME_RATE, NextMovieFrame)
print('Trial Done: ', trialNumber)
print('done with experiment')
#Setup timer to swap texture at specified frame rate
def NextMovieFrame():
screen.texture(movieImages.next())
def Dark():
yield viztask.waitTime(1) #wait for 1 second
viz.clearcolor(viz.BLACK)
vizact.ontimer(1.0/FRAME_RATE, NextMovieFrame)
viztask.schedule(executeExperiment())
How do I build a performant video stream buffer that I can do numpy array operations on?
This is my implementation currently - I just shift the previous array forward 1 frame and assign the last element to the current frame.
import numpy as np
import cv2
import time
cap = cv2.VideoCapture(0)
status, frame = cap.read()
buffer = np.empty([100, frame.shape[0], frame.shape[1], frame.shape[2]])
i=0
total = 100
while i < total:
    if not i:
        start = time.time()
    status, frame = cap.read()
    t = time.time()
    if i < total/2:
        buffer[i] = frame
    else:
        buffer[:-1] = buffer[1:]
        buffer[-1] = frame
    if i == total/2:
        middle = t
    i += 1
    # Calculations on the buffer omitted for brevity but include mean, std, etc.
stop = time.time()
print((middle-start)/(total/2))
print((stop-middle)/(total/2))
It takes about 350X longer to shift the array as opposed to simply assigning the values of a frame to an element of the array. I know this is because I am shifting all the pointers in the array which is unnecessary and expensive. Keeping the frames in order is nice but not necessary.
One surprisingly simple way to make a minor improvement to this is to use a Python List for the actual shifting/appending, then re-instantiate the buffer as a new NumPy array, like so:
import numpy as np
import cv2
import itertools
import time
cap = cv2.VideoCapture(0)
status, frame = cap.read()
buffer = np.empty([100, frame.shape[0], frame.shape[1], frame.shape[2]])
i=0
total = 100
middle = 0
while i < total:
    if not i:
        start = time.time()
    status, frame = cap.read()
    t = time.time()
    if i < total/2:
        buffer[i] = frame
    else:
        list_buffer = [item for item in buffer[1:]]
        list_buffer.append(frame)
        buffer = np.asanyarray(list_buffer)
    if i == total/2:
        middle = t
    i += 1
    # Calculations on the buffer omitted for brevity but include mean, std, etc.
stop = time.time()
print((middle-start)/(total/2))
print((stop-middle)/(total/2))
On my machine that takes the second time total from 1.7 seconds down to about 1.36 seconds. Not a huge improvement, but not insignificant either (~20% speedup).
However, if we instead use the list_buffer in the whole loop to keep track of the contents of the buffer and simply do both our slicing and appending on that:
import numpy as np
import cv2
import itertools
import time
cap = cv2.VideoCapture(0)
status, frame = cap.read()
buffer = np.empty([100, frame.shape[0], frame.shape[1], frame.shape[2]])
i=0
total = 100
middle = 0
list_buffer = []
while i < total:
    if not i:
        start = time.time()
    status, frame = cap.read()
    t = time.time()
    if i < total/2:
        buffer[i] = frame
        list_buffer.append(frame)
    else:
        list_buffer = list_buffer[1:]
        list_buffer.append(frame)
        buffer = np.asanyarray(list_buffer)
    if i == total/2:
        middle = t
    i += 1
    # Calculations on the buffer omitted for brevity but include mean, std, etc.
stop = time.time()
print((middle-start)/(total/2))
print((stop-middle)/(total/2))
suddenly our output looks like this:
>>> 0.08505516052246094
>>> 0.08459827899932862
Hope that helps!
Reformatting the list as a numpy array costs very little. The deque/linked list was not really any more efficient with 100 frames in the buffer.
import numpy as np
import cv2
import time
import collections
cap = cv2.VideoCapture(0)
i=0
buff_len = 100
# buffer = [] #Standard list
# buffer = collections.deque() #linked list
status, frame = cap.read() #numpy array - replaces the first frame once it reaches the last frame
buffer = np.empty([buff_len, frame.shape[0], frame.shape[1], frame.shape[2]])
times_through = 3
start = time.time()
while i < times_through*buff_len:
    t = time.time()
    status, frame = cap.read()
    # buffer.append(frame) #list and linked list
    buffer[i%(buff_len)] = frame #numpy array
    # if i >= buff_len: #list and linked list
    #     buffer.pop(0) #list
    #     buffer.popleft() #linked list
    if i == buff_len:
        full = t
    i += 1
    # np.int was removed in NumPy 1.24; the builtin int is equivalent
    print(i, np.mean(buffer, dtype=int), int((time.time()-t)*100)/100.)
stop = time.time()
print((full-start)/(buff_len))
print((stop-full)/(buff_len*(times_through-1)))
print(len(buffer))
Results in seconds/frame:
# list
# 0.19624330043792726
# 0.3691681241989136
# linked list
# 0.19301403045654297
# 0.3468885350227356
# numpy Array
# 0.316677029132843
# 0.30973124504089355
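A minimal sketch of the circular-buffer idea from the NumPy variant above, wrapped in a small class. FrameRingBuffer and its methods are hypothetical names. The sketch assumes all frames share one shape and stores them as uint8 (the dtype OpenCV typically returns for camera frames) rather than the float64 that np.empty defaults to.

```python
import numpy as np

class FrameRingBuffer:
    """Fixed-size frame buffer that overwrites the oldest frame in place."""

    def __init__(self, capacity, frame_shape, dtype=np.uint8):
        self.data = np.empty((capacity, *frame_shape), dtype=dtype)
        self.capacity = capacity
        self.count = 0  # total number of frames written so far

    def append(self, frame):
        # Only one frame is copied; nothing already stored is moved
        self.data[self.count % self.capacity] = frame
        self.count += 1

    def ordered(self):
        """Return the stored frames from oldest to newest (makes a copy once full)."""
        if self.count < self.capacity:
            return self.data[:self.count]
        head = self.count % self.capacity  # index of the oldest frame
        return np.concatenate((self.data[head:], self.data[:head]))

# Synthetic frames instead of a camera so the example runs anywhere
buf = FrameRingBuffer(capacity=3, frame_shape=(4, 4, 3))
for value in range(5):
    buf.append(np.full((4, 4, 3), value, dtype=np.uint8))
oldest_to_newest = buf.ordered()
print(oldest_to_newest[:, 0, 0, 0], buf.data.mean())
```

Order-independent statistics such as mean or std can be computed directly on buf.data once the buffer is full, so ordered() is only needed when the chronological order actually matters.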
I have a function that has to loop through individual pixels of an image and calculate some geometry. This function takes a very long time to run (~5 hours on a 24 Megapixel image) but seems like it should be easy to run in parallel on multiple cores. However, I can't for the life of me find a well documented, well explained example of doing something like this using the Multiprocessing package. Here is the code I am running right now as a toy example:
import numpy as np
import matplotlib.pyplot as plt
from scipy import misc
from skimage import color
import multiprocessing
from multiprocessing import Process
#Some dumb stand in function for this exercise
def dumb_func(image):
    ny, nx = image.shape
    temp = np.empty_like(image)
    for y in range(ny):
        for x in range(nx):
            temp[y, x] = np.square(image[y, x])
    return temp
#Convert image to greyscale
img = color.rgb2gray(misc.ascent())
#Resize the image
ns = 2048 #Pixel size
img = misc.imresize(img, size = (ns, ns))
#Split the image into equal chunks...not sure how this works for arrays that
#are weird shapes and aren't the same size in each dimension
divs = 4
init_split = np.array_split(img, divs, axis = 0)
side = init_split[0].shape[0]
chunked = np.empty((divs, divs, side, side))
cur = 0
for i in range(divs):
    split = np.array_split(init_split[i], divs, axis = 1)
    for j in range(divs):
        chunked[i, j, :, :] = split[j]
        cur +=1
#Pull core count and divide by two to be safe
cores = int(multiprocessing.cpu_count() / 2)
result = np.empty_like(chunked)
idxs = np.array(np.meshgrid(np.arange(0, divs, 1),
np.arange(0, divs, 1))).T.reshape(-1, 2)
Basically this code loads in an image, converts it to greyscale, makes it bigger, and then chunks it up. The chunked array is of shape (i, j, ny, nx) where i and j are indices that identify the chunk of the image I am working with, and ny,nx describe the size in pixels of each chunk.
Additionally, I am creating an array called idxs that stores all possible indices into the chunked array to pull the chunked images out.
What I want to do is run a function (in this case the dumb_func as an example) over the chunks in parallel and store the results in the results array of the same shape. The way I imagined doing it was to loop over the idxs array and assign processes the chunks belonging to those indexes up to the number of cores, wait for those cores to finish, then feed the cores more processes until finished. I got stuck because I couldn't A) figure out how to access the return value in the function, and B) how to handle a situation where I might have 16 chunks and 5 cores leading to the last iteration only requiring a single process.
How can I go about doing this? I've spent the last 6-7 hours reading about Multiprocessing Pool, Process, Map, Starmap, etc... and can't for the life of me understand how to implement this.
Edit for Reedinationer:
This is my updated code and runs without error. However the new_data array is never updated. I filled it with a value of 100 and at the end of the routine new_data is exactly how it was initialized.
import numpy as np
import matplotlib.pyplot as plt
from scipy import misc
from multiprocessing import Process, JoinableQueue
from time import time
#Some dumb stand in function for this exercise
def dumb_func(q, new_data):
    while True:
        index, image = q.get()
        temp = image **2
        new_data[index[0], index[1], :, :] = temp
        q.task_done()
if __name__ == "__main__":
    start = time()
    q = JoinableQueue()
    img = misc.ascent()
    #Resize the image
    ns = 2048 #Pixel size
    img = misc.imresize(img, size = (ns, ns))
    #Split the image into equal chunks...not sure how this works for arrays that
    #are weird shapes and aren't the same size in each dimension
    divs = 4
    init_split = np.array_split(img, divs, axis = 0)
    side = init_split[0].shape[0]
    chunked = np.empty((divs, divs, side, side))
    cur = 0
    for i in range(divs):
        split = np.array_split(init_split[i], divs, axis = 1)
        for j in range(divs):
            chunked[i, j, :, :] = split[j]
            cur +=1
    new_data = np.full(chunked.shape, 100)
    idxs = np.array(np.meshgrid(np.arange(0, divs, 1),
                                np.arange(0, divs, 1))).T.reshape(-1, 2)
    for i in range(len(idxs)):
        q.put((idxs[i], chunked[idxs[i][0], idxs[i][1], :, :]))
    print ('starting workers')
    worker_count = len(idxs)
    processes = []
    for i in range(worker_count):
        p = Process(target=dumb_func, args=[q, new_data])
        p.daemon = True
        p.start()
    print('main thread waiting')
    q.join()
    end = time()
    print('{:.3f} seconds elapsed'.format(end - start))
I'd do something like this, starting with dependencies:
from multiprocessing import Pool
import numpy as np
from PIL import Image
# and some for testing
from random import random
from time import sleep
first I define a function to divide an image up into "chunks", sort of as you talked about:
def chunkit(ys, xs, blocksize=64):
    for y in range(0, ys, blocksize):
        yt = (y, min(ys, y + blocksize))
        for x in range(0, xs, blocksize):
            xt = (x, min(xs, x + blocksize))
            yield yt, xt
it's a lazy iterator, so this can go on for a while.
I then define my worker function:
def dumb_func(cc):
    (y0,y1), (x0,x1) = cc
    # convert to floats for ease of processing
    chunk = image[y0:y1,x0:x1] / 255.
    # random slow down for testing
    # sleep(random() ** 6)
    res = chunk ** 2
    # convert back to bytes for efficiency
    return cc, (res * 255).astype(np.uint8)
I make sure the source array stays as close to original format as possible for efficiency and send it back in the same format (this might take some fiddling, if you're dealing with other pixel formats obviously).
then I put it together:
if __name__ == '__main__':
    source = Image.open('tmp.jpeg')
    image = np.asarray(source)
    print("loaded", image.shape, image.dtype)
    with Pool() as pool:
        resit = pool.imap_unordered(
            dumb_func, chunkit(*image.shape[:2]))
        output = np.empty_like(image)
        for cc, res in resit:
            (y0,y1), (x0,x1) = cc
            output[y0:y1,x0:x1] = res
    im = Image.fromarray(output, 'RGB')
    im.save('out.jpeg')
this churns through a 15Mpixel image in a couple of seconds, with most of that spent loading/saving the image. it could probably be a lot more clever with array strides and cache friendliness, but hope that helps!
note: this code relies on Unix-style process forking so that the workers inherit the global image from the parent. with the spawn start method (the default on Windows and macOS), image is only assigned inside the __main__ block, so it would not be defined in the worker processes
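If relying on fork is a concern, one option is to send each worker the pixel data itself instead of only the bounds. The sketch below is a variation under that assumption, not the code from the answer above: chunk_bounds, square_chunk and process_image are hypothetical names, and a random array stands in for the loaded image.

```python
from multiprocessing import Pool
import numpy as np

def chunk_bounds(ys, xs, blocksize=64):
    """Yield ((y0, y1), (x0, x1)) bounds that tile a ys-by-xs image."""
    for y in range(0, ys, blocksize):
        for x in range(0, xs, blocksize):
            yield (y, min(ys, y + blocksize)), (x, min(xs, x + blocksize))

def square_chunk(task):
    """Worker: gets the bounds plus the chunk's pixels, returns bounds and result."""
    bounds, chunk = task
    res = (chunk / 255.0) ** 2
    return bounds, (res * 255).astype(np.uint8)

def process_image(image, blocksize=64, processes=None):
    # Each task carries its own slice, so workers need no shared global
    tasks = (((ys, xs), image[ys[0]:ys[1], xs[0]:xs[1]])
             for ys, xs in chunk_bounds(*image.shape[:2], blocksize))
    output = np.empty_like(image)
    with Pool(processes) as pool:
        for ((y0, y1), (x0, x1)), res in pool.imap_unordered(square_chunk, tasks):
            output[y0:y1, x0:x1] = res
    return output

if __name__ == '__main__':
    rng = np.random.default_rng(0)
    # Placeholder image; a real one would come from np.asarray(Image.open(...))
    image = rng.integers(0, 256, size=(200, 300, 3), dtype=np.uint8)
    output = process_image(image)
    print(output.shape, output.dtype)
```

The trade-off is that every chunk is pickled on its way to a worker and again on the way back, so this is likely slower than the fork-based version for cheap per-pixel work, but it does not depend on the start method.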
I've been working on code for basically this same thing. Right now the goal is just to replace white pixels with transparent ones, but it seems to replace the entire image so there is a bug somewhere...It doesn't get an error within the multiprocessing module anymore though, so maybe it could serve as an example of how to load a Queue and then have your worker processes work on it!
from PIL import Image
from multiprocessing import Process, JoinableQueue
from threading import Thread
from time import time
def worker_function(q, new_data):
    while True:
        # print("Items in queue: {}".format(q.qsize()))
        index, pixel = q.get()
        if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
            out_pixel = (0, 0, 0, 0)
        else:
            out_pixel = pixel
        new_data[index] = out_pixel
        q.task_done()
if __name__ == "__main__":
    start = time()
    q = JoinableQueue()
    my_image = Image.open('InputImage.jpg')
    my_image = my_image.convert('RGBA')
    datas = list(my_image.getdata())
    new_data = [0] * len(datas) # make a blank array the size of our image to fill later
    print('putting image into queue')
    for count, item in enumerate(datas):
        q.put((count, item))
    print('starting workers')
    worker_count = 50
    processes = []
    for i in range(worker_count):
        p = Process(target=worker_function, args=[q, new_data])
        p.daemon = True
        p.start()
    print('main thread waiting')
    q.join()
    my_image.putdata(new_data)
    my_image.save('output.png', "PNG")
    end = time()
    print('{:.3f} seconds elapsed'.format(end - start))
I think it's important to "protect" your code inside the if __name__ == "__main__" block otherwise the spawned processes seem to run it.
update
It looks like you need to implement a Manager() (or there are probably other ways I am ignorant of as well!). I got my code to run by altering it into:
from PIL import Image
from multiprocessing import Process, JoinableQueue, Manager
from threading import Thread
from time import time
def worker_function(q, new_data):
    while True:
        # print("Items in queue: {}".format(q.qsize()))
        index, pixel = q.get()
        if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
            out_pixel = (0, 0, 0, 0)
        else:
            out_pixel = pixel
        new_data[index] = out_pixel
        q.task_done()
if __name__ == "__main__":
    start = time()
    q = JoinableQueue()
    my_image = Image.open('InputImage.jpg')
    my_image = my_image.convert('RGBA')
    datas = list(my_image.getdata())
    # new_data = [(0, 0, 0, 0)]*len(datas)
    manager = Manager()
    new_data = manager.list([(0, 0, 0, 0)]*len(datas))
    print(new_data)
    print('putting image into queue')
    for count, item in enumerate(datas):
        q.put((count, item))
    print('starting workers')
    worker_count = 50
    processes = []
    for i in range(worker_count):
        p = Process(target=worker_function, args=[q, new_data])
        p.daemon = True
        p.start()
    print('main thread waiting')
    q.join()
    print("Saving Image")
    my_image.putdata(new_data)
    my_image.save('output.png', "PNG")
    end = time()
    print('{:.3f} seconds elapsed'.format(end - start))
Although this doesn't seem like the fastest option! I'm sure there are other ways to increase speed. My code to do the same thing with Threads looks VERY similar:
from PIL import Image
from threading import Thread
from queue import Queue
import time
start = time.time()
q = Queue()
planeIm = Image.open('InputImage.jpg')
planeIm = planeIm.convert('RGBA')
datas = planeIm.getdata()
new_data = [0] * len(datas)
print('putting image into queue')
for count, item in enumerate(datas):
q.put((count, item))
def worker_function():
    while True:
        # print("Items in queue: {}".format(q.qsize()))
        index, pixel = q.get()
        if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
            out_pixel = (0, 0, 0, 0)
        else:
            out_pixel = pixel
        new_data[index] = out_pixel
        q.task_done()
print('starting workers')
worker_count = 100
for i in range(worker_count):
    t = Thread(target=worker_function)
    t.daemon = True
    t.start()
print('main thread waiting')
q.join()
print('Queue has been joined')
planeIm.putdata(new_data)
planeIm.save('output.png', "PNG")
end = time.time()
elapsed = end - start
print('{:3.3} seconds elapsed'.format(elapsed))
Yet, processing my image takes ~23 seconds with threads and ~170 seconds with multiprocessing!! I suspect this comes from the larger overhead needed to start Process objects, and from the fact that my algorithm for processing each pixel is simple for now (just the if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240: bit), so I'm likely not seeing the speed improvements that a complex pixel processing algorithm would get me. Also note what the multiprocessing documentation says about managers:
a single manager can be shared by processes on different computers over a network. They are, however, slower than using shared memory.
Which leads me to believe that there are alternatives that are faster.
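For a per-pixel test as simple as this one, a different technique altogether may be worth trying: do the comparison on the whole image at once with NumPy instead of queueing individual pixels to workers. This is a sketch of that swapped-in approach, not a fix for the Manager version; white_to_transparent is a hypothetical name, and a tiny hand-written array stands in for the image.

```python
import numpy as np

def white_to_transparent(rgba, threshold=240):
    """Return a copy of an (H, W, 4) RGBA array with near-white pixels zeroed.

    A pixel counts as near-white when R, G and B are all above `threshold`,
    mirroring the per-pixel test used in the worker functions above.
    """
    rgba = np.asarray(rgba)
    near_white = (rgba[..., :3] > threshold).all(axis=-1)
    out = rgba.copy()
    out[near_white] = 0  # sets all four channels to 0, i.e. (0, 0, 0, 0)
    return out

# 2x2 placeholder image in place of InputImage.jpg
pixels = np.array([[[255, 255, 255, 255], [250, 10, 250, 255]],
                   [[241, 241, 241, 255], [0, 0, 0, 255]]], dtype=np.uint8)
result = white_to_transparent(pixels)
print(result[..., 3])
```

With Pillow, the input could come from np.asarray(my_image.convert('RGBA')) and the result could be turned back into an image with Image.fromarray(result) before saving.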