I have never done any video-based programming before, and although this SuperUser post provides a way to do it on the command line, I prefer a programmatic approach, preferably with Python.
I have a bunch of sub-videos. Suppose one of them is called 1234_trimmed.mp4 which is a short segment cut from the original, much-longer video 1234.mp4. How can I figure out the start and end timestamps of 1234_trimmed.mp4 inside 1234.mp4?
FYI, the videos are all originally on YouTube anyway ("1234" corresponds to the YouTube video ID) if there's any shortcut that way.
I figured it out myself with cv2. My strategy was to grab the first and last frames of the sub-video, then iterate over every frame of the original video and compare each frame's dHash against those two (using minimum Hamming distance rather than exact equality, to tolerate resizing and other transformations). I'm sure there are optimization opportunities, but I need this yesterday.
import cv2

original_video_fpath = '5 POPULAR iNSTAGRAM BEAUTY TRENDS (DiY Feather Eyebrows, Colored Mascara, Drippy Lips, Etc)-vsNVU7y6dUE.mp4'
subvideo_fpath = 'vsNVU7y6dUE_trimmed-out.mp4'


def dhash(image, hashSize=8):
    # resize the input image, adding a single column (width) so we
    # can compute the horizontal gradient
    resized = cv2.resize(image, (hashSize + 1, hashSize))
    # compute the (relative) horizontal gradient between adjacent
    # column pixels
    diff = resized[:, 1:] > resized[:, :-1]
    # convert the difference image to a hash
    return sum([2 ** i for (i, v) in enumerate(diff.flatten()) if v])


def hamming(a, b):
    return bin(a ^ b).count('1')


def get_video_frame_by_index(video_cap, frame_index):
    # get total number of frames
    totalFrames = video_cap.get(cv2.CAP_PROP_FRAME_COUNT)
    if frame_index < 0:
        frame_index = int(totalFrames) + frame_index
    # check for valid frame number (use a chained comparison, not bitwise &)
    if 0 <= frame_index <= totalFrames:
        # set frame position
        video_cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index)
        _, frame = video_cap.read()
        return frame


def main():
    cap_original_video = cv2.VideoCapture(original_video_fpath)
    cap_subvideo = cv2.VideoCapture(subvideo_fpath)
    first_frame_subvideo = get_video_frame_by_index(cap_subvideo, 0)
    last_frame_subvideo = get_video_frame_by_index(cap_subvideo, -1)
    # hash the grayscale versions so dhash operates on a single channel
    first_frame_subvideo_gray = cv2.cvtColor(first_frame_subvideo, cv2.COLOR_BGR2GRAY)
    last_frame_subvideo_gray = cv2.cvtColor(last_frame_subvideo, cv2.COLOR_BGR2GRAY)
    hash_first_frame_subvideo = dhash(first_frame_subvideo_gray)
    hash_last_frame_subvideo = dhash(last_frame_subvideo_gray)
    min_hamming_dist_with_first_frame = float('inf')
    closest_frame_index_first = None
    closest_frame_timestamp_first = None
    min_hamming_dist_with_last_frame = float('inf')
    closest_frame_index_last = None
    closest_frame_timestamp_last = None
    frame_index = 0
    while cap_original_video.isOpened():
        frame_exists, curr_frame = cap_original_video.read()
        if frame_exists:
            timestamp = cap_original_video.get(cv2.CAP_PROP_POS_MSEC) // 1000
            curr_frame_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
            hash_curr_frame = dhash(curr_frame_gray)
            hamming_dist_with_first_frame = hamming(hash_curr_frame, hash_first_frame_subvideo)
            hamming_dist_with_last_frame = hamming(hash_curr_frame, hash_last_frame_subvideo)
            if hamming_dist_with_first_frame < min_hamming_dist_with_first_frame:
                min_hamming_dist_with_first_frame = hamming_dist_with_first_frame
                closest_frame_index_first = frame_index
                closest_frame_timestamp_first = timestamp
            if hamming_dist_with_last_frame < min_hamming_dist_with_last_frame:
                min_hamming_dist_with_last_frame = hamming_dist_with_last_frame
                closest_frame_index_last = frame_index
                closest_frame_timestamp_last = timestamp
            frame_index += 1
        else:
            print('processed {} frames'.format(frame_index))
            break
    cap_original_video.release()
    print('timestamp_start={}, timestamp_end={}'.format(closest_frame_timestamp_first, closest_frame_timestamp_last))


if __name__ == '__main__':
    main()
MP4 uses relative timestamps. When the file was trimmed, the old timestamps were lost, and the new file now begins at timestamp zero.
So the only way to identify where this file may overlap with another file is to use computer vision or perceptual hashing. Both options are too complex to describe in a single Stack Overflow answer.
If they were simply -codec copy'd, the timestamps should be as they were in the original file. If they weren't, ffmpeg is not the tool for the job. In that case, you should look into other utilities that can find an exactly matching video and audio frame in both files and get the timestamps from there.
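For what it's worth, one quick sanity check is to probe the trimmed file's container-level start_time: if the streams were stream-copied with the original timestamps intact, it may be non-zero, while a re-encoded trim typically starts at zero. A minimal sketch using ffprobe via subprocess (the helper name and file path are just placeholders, and this assumes ffprobe is on the PATH):

    import json
    import subprocess

    def get_start_time(fpath):
        # ask ffprobe for the container-level start_time, in seconds
        out = subprocess.check_output([
            'ffprobe', '-v', 'error',
            '-show_entries', 'format=start_time',
            '-of', 'json', fpath,
        ])
        # note: some containers report 'N/A' here, so this may need guarding
        return float(json.loads(out)['format']['start_time'])

    print(get_start_time('vsNVU7y6dUE_trimmed-out.mp4'))  # 0.0 means the offset was not preserved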
I am trying to concatenate a group of images with associated audio, with a video clip at the start and end of the video. Whenever I concatenate an image with its associated audio, it doesn't play back correctly in VLC media player: the image is only displayed for a single frame before the video cuts to black while the audio keeps playing. I came across this GitHub issue: https://github.com/kkroening/ffmpeg-python/issues/274 where the accepted solution is the one I implemented, but one of the comments mentions this same incorrect-playback issue and an error on YouTube.
import os
import ffmpeg


def generate_clip(img):
    '''
    Generates a clip from an image and a wav file, helper function for export_video
    '''
    transition_cond = os.path.exists("static/transitions/" + img + ".mp4")
    chart_path = os.path.exists("charts/" + img + ".png")
    if transition_cond:
        clip = ffmpeg.input("static/transitions/" + img + ".mp4")
    elif chart_path:
        clip = ffmpeg.input("charts/" + img + ".png")
    else:
        clip = ffmpeg.input("static/transitions/Transition.jpg")
    audio_clip = ffmpeg.input("audio/" + img + ".wav")
    clip = ffmpeg.concat(clip, audio_clip, v=1, a=1)
    clip = ffmpeg.filter(clip, "setdar", "16/9")
    return clip
def export_video(CHARTS):
    '''
    Combines the charts from charts/ and the audio from audio/ to generate
    one final video that will be uploaded to Youtube
    '''
    clips = []
    intro = generate_clip("Intro")
    clips.append(intro)
    for key in CHARTS.keys():
        value = CHARTS.get(key)
        value.insert(0, key)
        subclip = []
        for img in value:
            subclip.append(generate_clip(img))
        concat_clip = ffmpeg.concat(*subclip)
        clips.append(concat_clip)
    outro = generate_clip("Outro")
    clips.append(outro)
    concat_clip = ffmpeg.concat(*clips)
    concat_clip.output("export/export.mp4").run(overwrite_output=True)
It is unfortunate that the concat filter does not offer a shortest option like overlay does. Anyway, the issue here is that the image2 demuxer uses 25 fps by default, so a video stream built from a single image only lasts 1/25 of a second. There are several ways to address this, but you first need to get the duration of the paired audio files (see the sketch after the list below). To incorporate the duration information into the ffmpeg command, you can:
Use the tpad filter on each video (in series with setdar) to make the video duration match the audio. The padded amount should be 1/25 of a second less than the audio duration.
Specify the -loop 1 input option so the image loops (indefinitely), then add a -t {duration} input option to limit the number of loops. Be aware that the resulting video duration may not be exact.
Specify -r {1/duration} so the single image lasts as long as the audio, and use the fps filter on each input to convert to the output frame rate.
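All three options need the audio duration of each clip. One way (certainly not the only one) to read it from Python is to call ffprobe; this is only a rough sketch, assuming ffprobe is on the PATH, and wav_duration and the example path are placeholder names:

    import json
    import subprocess

    def wav_duration(path):
        # return the duration of an audio file in seconds, via ffprobe
        out = subprocess.check_output([
            'ffprobe', '-v', 'error',
            '-show_entries', 'format=duration',
            '-of', 'json', path,
        ])
        return float(json.loads(out)['format']['duration'])

    duration = wav_duration('audio/Intro.wav')  # e.g. pass to -t, or compute -r as 1/duration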
I'm not familiar with ffmpeg-python, so I cannot provide its solution, but if you're interested, I'd be happy to post equivalent code using my ffmpegio package.
[edit]
ffmpegio Solution
Here is how I'd code the 3rd solution with ffmpegio:
import ffmpegio
from os import path


def generate_clip(img):
    """
    Generates a clip from an image and a wav file,
    helper function for export_video
    """
    transition_cond = path.exists("static/transitions/" + img + ".mp4")
    chart_path = path.exists("charts/" + img + ".png")
    if transition_cond:
        video_file = "static/transitions/" + img + ".mp4"
    elif chart_path:
        video_file = "charts/" + img + ".png"
    else:
        video_file = "static/transitions/Transition.jpg"
    audio_file = "audio/" + img + ".wav"

    video_opts = {}
    if not transition_cond:
        # audio_streams_basic() returns audio duration in seconds as Fraction
        # set the "framerate" of the video to be the reciprocal
        info = ffmpegio.probe.audio_streams_basic(audio_file)
        video_opts["r"] = 1 / info[0]["duration"]

    return [(video_file, video_opts), (audio_file, None)]


def export_video(CHARTS):
    """
    Combines the charts from charts/ and the audio from audio/
    to generate one final video that will be uploaded to Youtube
    """
    # get all input files (video/audio pairs)
    clips = [
        generate_clip("Intro"),
        *(generate_clip(img) for key, value in CHARTS.items() for img in value),
        generate_clip("Outro"),
    ]

    # number of clips
    nclips = len(clips)

    # filter chains to set DAR and fps of all video streams
    vfilters = (f"[{2*n}:v]setdar=16/9,fps=30[v{n}]" for n in range(nclips))

    # concatenation filter input: [v0][1:a][v1][3:a][v2][5:a]...
    concatfilter = "".join((f"[v{n}][{2*n+1}:a]" for n in range(nclips))) + f"concat=n={nclips}:v=1:a=1[vout][aout]"

    # form the full filtergraph
    fg = ";".join((*vfilters, concatfilter))

    # set output file and options
    output = ("export/export.mp4", {"map": ["[vout]", "[aout]"]})

    # run ffmpeg
    ffmpegio.ffmpegprocess.run(
        {
            "inputs": [input for pair in clips for input in pair],
            "outputs": [output],
            "global_options": {"filter_complex": fg},
        },
        overwrite=True,
    )
Since this code does not use the read/write features, the ffmpegio-core package suffices:
pip install ffmpegio-core
Make sure that the FFmpeg binary can be found by ffmpegio. See the installation doc.
Here are the direct links to the documentations of the functions used:
ffmpegprocess.run
ffmpeg_args dict argument
probe.audio_streams_basic (ignore the documentation error: duration and start_time are both of Fraction type)
The code has not been fully validated. If you encounter a problem, it might be easiest to post it on the GitHub Discussions page to proceed.
I'm new to Python and new to OpenCV, which I'm going to use for my master's thesis, and I've already run into some problems using OpenCV's VideoCapture object.
Situation:
I have two folders containing corresponding images (taken with RGB and infrared cameras). I want to display them side by side in a window using a while loop. The problem arises when some images are missing from one of the image sequences (due to problems while recording or whatever; I don't really know, but that should be of no importance). My idea was to use the boolean return value of the .read() function to check whether there is a frame to be read and, if not, replace the image with a black one. This is what I did:
Code:
import cv2
import numpy as np

pathRGB = "Bilder/RGB"
pathIR = "Bilder/IR"
# the paths to the folders containing the images

capRGB = cv2.VideoCapture(pathRGB + "/frame_%06d.jpg")
capIR = cv2.VideoCapture(pathIR + "/frame_%06d.jpg")
# setting up the VideoCapture-elements with the according format

shapeRGB = capRGB.read()[1].shape
shapeIR = capIR.read()[1].shape
# get the shape of the first image in each folder to later create the black
# dummy-image

dtypeRGB = capRGB.read()[1].dtype
dtypeIR = capIR.read()[1].dtype
# get the type of the first image in each folder to later create the black
# dummy-image

if (capRGB.isOpened() is False):
    print("Error opening RGB images")
if (capIR.isOpened() is False):
    print("Error opening IR images")

cv2.namedWindow("frames", cv2.WINDOW_NORMAL)

while capRGB.isOpened() and capIR.isOpened() is True:
    retRGB, imgRGB = capRGB.read()
    retIR, imgIR = capIR.read()
    # read both images
    if retRGB is True and retIR is False:
        imgIR = np.zeros(shapeIR, dtype=dtypeIR)
        # if there is no IR image, create a dummy one
    if retIR is True and retRGB is False:
        imgRGB = np.zeros(shapeRGB, dtype=dtypeRGB)
        # if there is no RGB image, create a dummy one
    if retRGB is False and retIR is False:
        break
    imgCombined = np.hstack((imgRGB, imgIR))
    # put both images together
    cv2.imshow("frames", imgCombined)
    k = cv2.waitKey(1)
    if k == ord("q"):
        break

capRGB.release()
capIR.release()
cv2.destroyAllWindows()
Problem:
From my understanding, the problem arises when capIR.read() attempts to read a missing image (in my case the 527th): instead of just returning False/None, it attempts to read the same image over and over again. Up to the missing frame, everything works fine, and the right "IR" image even turns black, but then the video playback begins to slow down. While I can still close the window by pressing 'q', the Spyder IDE freezes, and if I wait too long I even have to shut it down. The console prints "[image2 # 000002a7af8f0480] Could not open file : Bilder/IR/frame_000527.jpg" over and over again, so much that I can't scroll to the top.
I guess what I'm asking is: is there any way to make the .read() function attempt just one read and, after it fails, continue with the next frame?
Best regards and thank you very much in advance!
I simulated this for testing with different files and directory names.
The code retrieves the largest frame number across both directories and then iterates over all frame numbers, reading the file from each directory if it exists.
import os
import cv2
import re
import glob

image_dir1 = 'test1'
image_dir2 = 'test2'

# retrieve all frames in both directories
frames_cap1 = glob.glob(os.path.join(image_dir1, "frame_*.jpg"))
frames_cap2 = glob.glob(os.path.join(image_dir2, "frame_*.jpg"))

# sort ascending
frames_cap1.sort()
frames_cap2.sort()

# retrieve last frame No for both directories
last_frame_cap1 = frames_cap1[-1]
last_frame_cap2 = frames_cap2[-1]

# extract integer counter as a group
# modify regex to match file name if required
match_cap1 = re.search(r'frame_(\d+).jpg', last_frame_cap1)
match_cap2 = re.search(r'frame_(\d+).jpg', last_frame_cap2)
last_frame_no_cap1 = int(match_cap1.group(1))
last_frame_no_cap2 = int(match_cap2.group(1))

# retrieve max frame No
max_frame_no = max(last_frame_no_cap1, last_frame_no_cap2)

for i in range(max_frame_no + 1):
    # adapt formatting of frame number to digit count in file name
    # here: 6 digits with leading zeros
    image_path_cap1 = os.path.join(image_dir1, f"frame_{i:06d}.jpg")
    image_path_cap2 = os.path.join(image_dir2, f"frame_{i:06d}.jpg")

    if not os.path.isfile(image_path_cap1):
        print(f"handle missing file: '{image_path_cap1}'")
        # ...
    else:
        img1 = cv2.imread(image_path_cap1)
        # …

    if not os.path.isfile(image_path_cap2):
        print(f"handle missing file: '{image_path_cap2}'")
        # ...
    else:
        img2 = cv2.imread(image_path_cap2)
        # …

    # …
Assuming that the images in directory1 have the same names as the images in directory2, but knowing that some images may not be present in both directories...
import glob, os, cv2
import numpy as np

path1 = "folder1/"
path2 = "folder2/"

# change directory to path1
os.chdir(path1)
l1 = glob.glob("*.jpg")  # get a list of image names
os.chdir("../")  # go one directory up

blackimg = cv2.imread("blackimg.jpg")

for fname in l1:
    # check if image1 exists, then read it; otherwise im1 = blackimg
    if os.path.isfile(path1 + fname):
        im1 = cv2.imread(path1 + fname)
    else:
        im1 = blackimg
    # check if image2 exists, then read it; otherwise im2 = blackimg
    if os.path.isfile(path2 + fname):
        im2 = cv2.imread(path2 + fname)
    else:
        im2 = blackimg
    imgCombined = np.hstack((im1, im2))
    cv2.imshow("Combined", imgCombined)
    print("press any key to continue, q to exit")
    k = cv2.waitKey(0)
    if k == ord("q"):
        break

cv2.destroyAllWindows()
I'm working on a facial recognition project which creates a database of face encodings, then upon selection of a target photo will encode that photo as well and match it against the known encodings.
The program works correctly, except that it only provides the best match, if one is found within the set tolerance. The problem is that this is not always correct: even when I know that the target face is in my database, the target picture can be different enough to cause a false positive.
Because of this, I would like to list the top 3 or 5 results to hopefully get the correct result within those top 5.
Here's the code.
import pickle
import tkinter.filedialog

import cv2
import numpy as np
import face_recognition


def recognize():
    # define path for target photo
    path = tkinter.filedialog.askopenfilename(filetypes=[("Image File", '.jpg .png')])

    with open('dataset_faces.dat', 'rb') as f:
        encoding = pickle.load(f)

    def classify_face(im):
        faces = encoding
        faces_encoded = list(faces.values())
        known_face_names = list(faces.keys())

        img = cv2.imread(im, 1)
        img = cv2.resize(img, (600, 600), fx=0.5, fy=0.5)
        #img = img[:,:,::-1]

        face_locations = face_recognition.face_locations(img, number_of_times_to_upsample=2, model="cnn")
        unknown_face_encodings = face_recognition.face_encodings(img, face_locations, num_jitters=100)

        face_names = []
        for face_encoding in unknown_face_encodings:
            # See if the face is a match for the known face(s)
            name = "Unknown"
            # use the known face with the smallest distance to the new face
            face_distances = face_recognition.face_distance(faces_encoded, face_encoding)
            best_match_index = np.argmin(face_distances)
            # distance between known faces and target face: the lower the distance,
            # the better the match; a higher distance means more error
            if face_distances[best_match_index] < 0.60:
                name = known_face_names[best_match_index]
            face_names.append(name)
            print(name)
I have tried adding code like
top_3_matches = np.argsort(face_distances)[:3]
top3 = face_names.append(top_3_matches)
print(top3)
However, this gives me no hits.
Any ideas?
list.append does not return anything, so you should not try to assign that expression to a variable.
names = known_face_names[top_3_matches]
face_names.append(names)
print(names)
should do the same thing as
name = known_face_names[best_match_index]
face_names.append(name)
print(name)
for three elements instead of one.
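One caveat (not stated above): indexing with top_3_matches like this only works if known_face_names is a NumPy array; indexing a plain Python list with a NumPy index array raises a TypeError, which is what the follow-up below fixes. A tiny illustration with made-up names and distances:

    import numpy as np

    known_face_names = ["alice", "bob", "carol", "dave"]
    face_distances = np.array([0.4, 0.1, 0.7, 0.2])
    top_3_matches = np.argsort(face_distances)[:3]          # array([1, 3, 0])

    # known_face_names[top_3_matches]                       # TypeError: a list cannot be indexed with an index array
    names = np.array(known_face_names)[top_3_matches]       # array(['bob', 'dave', 'alice'], dtype='<U5')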
The following code solves the issue. The problem was that I was using NumPy functions on a list that hadn't been converted into a NumPy array, as per Aubergine's answer.
def classify_face(im):
    faces = encoding
    faces_encoded = list(faces.values())
    known_face_names = list(faces.keys())
    # make lists into numpy arrays
    n_faces_encoded = np.array(faces_encoded)
    n_known_face_names = np.array(known_face_names)
and to sort the numpy array for the 3 lowest values:
n_face_distances = face_recognition.face_distance(n_faces_encoded, face_encoding)
top_3_matches = np.argsort(n_face_distances)[:3]
printing the best 3 matches:
other_matches = n_known_face_names[top_3_matches]
print(other_matches)
I am currently working on processing .wav files with Python, using PyAudio for streaming the audio and the Python wave library for loading the file data.
I plan to later include processing of the individual stereo channels, with regard to the amplitude of the signal and panning of the stereo signal, but for now I'm just trying to separate the two channels of the wave file and stitch them back together, hopefully ending up with data that is identical to the input data.
Below is my code.
The method getRawSample works perfectly fine, and I can stream audio through that function.
The problem is my getSample method. Somewhere along the line, where I'm separating the two channels of audio and joining them back together, the audio gets distorted. I have even commented out the part where I do amplitude and panning adjustment, so in theory it's data in -> data out.
Below is an example of my code:
import threading
import wave
import struct
import numpy as np


class Sample(threading.Thread):
    def __init__(self, filepath, chunk):
        super(Sample, self).__init__()
        self.CHUNK = chunk
        self.filepath = filepath
        self.wave = wave.open(self.filepath, 'rb')
        self.amp = 0.5  # varies from 0 to 1
        self.pan = 0  # varies from -pi to pi
        self.WIDTH = self.wave.getsampwidth()
        self.CHANNELS = self.wave.getnchannels()
        self.RATE = self.wave.getframerate()
        self.MAXFRAMEFEEDS = self.wave.getnframes() / self.CHUNK  # maximum even number of chunks
        self.unpstr = '<{0}h'.format(self.CHUNK * self.WIDTH)  # format for unpacking the sample byte string
        self.pckstr = '<{0}h'.format(self.CHUNK * self.WIDTH)  # format for packing the sample byte string
        self.framePos = 0  # keeps track of how many chunks of data fed

    # panning and amplitude adjustment of input sample data
    def panAmp(self, data, panVal, ampVal):  # when panning, using constant power panning
        [left, right] = self.getChannels(data)
        #left = np.multiply(0.5, left)  # (np.sqrt(2)/2)*(np.cos(panVal) + np.sin(panVal))
        #right = np.multiply(0.5, right)  # (np.sqrt(2)/2)*(np.cos(panVal) - np.sin(panVal))
        outputList = self.combineChannels(left, right)
        dataResult = struct.pack(self.pckstr, *outputList)
        return dataResult

    def getChannels(self, data):
        dataPrepare = list(struct.unpack(self.unpstr, data))
        left = dataPrepare[0::self.CHANNELS]
        right = dataPrepare[1::self.CHANNELS]
        return [left, right]

    def combineChannels(self, left, right):
        stereoData = left
        for i in range(0, self.CHUNK / self.WIDTH):
            index = i * 2 + 1
            stereoData = np.insert(stereoData, index, right[i * self.WIDTH:(i + 1) * self.WIDTH])
        return stereoData

    def getSample(self, panVal, ampVal):
        data = self.wave.readframes(self.CHUNK)
        self.framePos += 1
        if self.framePos > self.MAXFRAMEFEEDS:  # if no more audio samples to process
            self.wave.rewind()
            data = self.wave.readframes(self.CHUNK)
            self.framePos = 1
        return self.panAmp(data, panVal, ampVal)

    def getRawSample(self):  # for debugging, bypasses pan and amp functions
        data = self.wave.readframes(self.CHUNK)
        self.framePos += 1
        if self.framePos > self.MAXFRAMEFEEDS:  # if no more audio samples to process
            self.wave.rewind()
            data = self.wave.readframes(self.CHUNK)
            self.framePos = 1
        return data
I suspect that the error is in the way I stitch together the left and right channels, but I'm not sure.
I load the project with 16-bit, 44.1 kHz .wav files.
Below is a link to an audio file so that you can hear the resulting audio output.
The first part is running two files (both two channel) through the getSample method, while the next part is running those same files, through the getRawSample method.
https://dl.dropboxusercontent.com/u/24215404/pythonaudiosample.wav
Based on the audio, as said earlier, it seems like the stereo file gets distorted. Looking at the waveform of the above file, it seems as though the right and left channels are exactly the same after going through the getSample method.
If needed, I can also post my code including the main function.
Hopefully my question isn't too vague; I am grateful for any help or input!
As so often happens, I slept on it and woke up the next day with a solution.
The problem was in the combineChannels function.
Following is the working code:
def combineChannels(self, left, right):
    stereoData = left
    for i in range(0, self.CHUNK):
        index = i * 2 + 1
        stereoData = np.insert(stereoData, index, right[i:(i + 1)])
    return stereoData
The changes are:
For loop bounds: as I have 1024 items (the same as my chunk size) in the lists left and right, I of course need to iterate through every one of them.
index: the index definition remains the same.
stereoData: again, I have to remember that I'm working with lists, each containing a frame of audio. The code in the question assumed that my list was stored as a byte string, but this is of course not the case. As you can see, the resulting code is much simpler.
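As an aside (not part of the original fix): once both channels are plain, equal-length lists of samples, the interleaving can also be done without a per-sample loop. A rough sketch with made-up data, assuming 16-bit samples already unpacked to integers:

    import numpy as np

    # hypothetical example data: four samples per channel
    left = [0, 1, 2, 3]
    right = [10, 11, 12, 13]

    # stack the channels as columns, then flatten row by row to get L R L R ... order
    stereoData = np.column_stack((left, right)).ravel()
    # -> array([ 0, 10,  1, 11,  2, 12,  3, 13])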
I am trying to write a script to remove cells in a part in ABAQUS if the cell volume is smaller than a given value.
Is there a simple command to delete a cell?
This is what I have tried:
# Keeps cells bigger than a certain minimum value 'paramVol': paramVol=volCell/part_volume_r
cellsVolume = []
pfacesInter_clean = []
allCells = pInterName.cells
mask_r = pInter.cells.getMask()
cellobj_sequence_r = pInter.cells.getSequenceFromMask(mask=mask_r)
part_volume_r = pInterName.getVolume(cells=cellobj_sequence_r)
volume_sliver = 0
# get faces
for i in range(0, len(allCells)):
    volCell = allCells[i].getSize()
    cellsVolume.append(volCell)
    paramVol = volCell / part_volume_r
    print 'paramVol= ' + str(paramVol)
    if paramVol < 0.01:
        print 'Sliver volume'
        #session.viewports['Viewport: 1'].setColor(initialColor='#FF0000')  # --> RED
        faces = allCells[i].getFaces()
        highlight(allCells[i].getFaces())
        #pfacesInter_clean = [x for i, x in enumerate(pfacesInter) if i not in faces]
        volume_sliver += volCell
    else:
        print 'Not a sliver volume'
Thanks!
How about this, assuming pInter is a Part object:
pInter.RemoveFaces(faceList=[pInter.faces[j] for j in pInter.cells[i].getFaces()])
Update: once the common face of two cells is deleted, both cells cease to exist. Therefore, we need to do a little workaround:
faces_preserved = # List of faces that belong to cells with 'big' volume.
for cell in pInter.cells:
    pInter.RemoveFaces(faceList=[face for face in pInter.faces if \
                                 face not in faces_preserved])