So I'm trying to extract every frame of a video, use ffprobe to see when each frame is displayed within the video, and then stitch the video back together from those extracted images and the ffprobe output.
Right now, I have this batch file:
for %%a in (*.mp4) do (
    mkdir "%%~na_images" 2> NUL
    ffmpeg.exe -hide_banner -i "%%a" -t 100 "%%~na_images\image-%%d.png"
    ffprobe.exe "%%a" -hide_banner -show_entries frame=coded_picture_number,best_effort_timestamp_time -of csv > "%%~na_frames.txt"
)
First, a directory is made for the images.
Then ffmpeg extracts the frames of the video (the first 100 seconds of it, per -t 100) to individual PNG files, which are numbered sequentially starting at 1.
Lastly, ffprobe reports when each frame is first shown within the video (i.e. frame 1 is shown at 0 seconds, and at, say, 60 fps, frame 2 is shown at 0.016667 seconds). The output looks like this:
frame,0.000000,0
frame,0.000000
frame,0.017000,1
frame,0.023220
where the middle value is the timestamp (e.g. 0.017000 is the time the second video frame appears) and the last value is the frame number. The rows with only two fields are most likely audio frames, which have no picture number.
Now my problem is using ffmpeg to take each frame and place it at the proper time within the video. I can do this using another language (probably Python). My best guess is to loop through the ffprobe output file, get the frame time and image number, place that frame at the point where it appears, then move on to the next frame and time. Using the frame data above as an example, it'd be something like this:
for line in lines:
    mySplit = line.split(',')
    # get image number 0 and insert it at time 0.000000
This is the part I'm not sure how to do programmatically. I can read in and parse the lines of the ffprobe output text file, but I have no idea how to insert the frames at specific points in a video using ffmpeg or a similar tool.
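One way to do the stitching (not from the original post, so treat it as a sketch): ffmpeg's concat demuxer accepts per-image duration directives, so the ffprobe output can be turned into a concat list. The sketch below assumes the image-%d.png naming from the batch file (ffmpeg numbers the images from 1), a hypothetical frames file video_frames.txt, and that the three-field rows appear in the same order the images were written:

# Build an ffmpeg concat-demuxer list from the ffprobe CSV above.
times = []
with open('video_frames.txt') as f:        # hypothetical name for the ffprobe output
    for line in f:
        fields = line.strip().split(',')
        if len(fields) == 3:               # video rows: 'frame', timestamp, picture number
            times.append(float(fields[1]))

with open('list.txt', 'w') as out:
    for i, t in enumerate(times):
        out.write(f"file 'image-{i + 1}.png'\n")
        if i + 1 < len(times):             # each image lasts until the next timestamp
            out.write(f'duration {times[i + 1] - t:.6f}\n')

The list can then be rendered with something like:
ffmpeg -f concat -i list.txt -vsync vfr restitched.mp4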
You need to tell the system there is more than one token, i.e.:
for /f "tokens=1-4 delims=," %%a in ...
Here you tell it that you want to grab tokens 1 to 4, and that the delimiter is a comma (,).
So, for an example file containing 1,100,1000,10000, it will assign %%a to the first token (being 1), %%b to the second (being 100), and so on.
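If you end up doing this step in Python instead (as the question suggests), the same tokenizing is just a split; a minimal sketch using the example line above:

# Python analogue of for /f "tokens=1-4 delims=,": split on commas and unpack.
a, b, c, d = '1,100,1000,10000'.split(',')
print(a, b, c, d)  # 1 100 1000 10000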
I am using ffmpeg on a Jetson Xavier NX to split a video into frames as follows:
ffmpeg -i input.mkv -r 30 %2d.jpg
This generates output as
1.jpg
2.jpg
3.jpg...etc
However, I want to start the counter from a custom number, not 1.
For example
32.jpg
33.jpg
....etc
The starting number of the frames should be based on a variable that I initialize, and labeling should proceed sequentially from there. How do I do this?
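For what it's worth, ffmpeg's image2 muxer has a -start_number output option that does exactly this; a sketch, with 32 standing in for your variable:

ffmpeg -i input.mkv -r 30 -start_number 32 %2d.jpg

In a shell script you can substitute your variable, e.g. -start_number "$START".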
Assume an H.264-encoded video stream is stored in an mp4 container. What is a straightforward way of detecting frame types and associating them with parts of the stored data?
I can extract the frame types using the below command. I would like to associate them with specific data segments (e.g. byte X to byte Y) so that I can apply different amounts of noise to I-, P-, and B-frames. At the moment, I'm using a simple Python script to flip random bits in the stored data at a fixed error rate, regardless of the frame type.
ffprobe -show_frames <filename>.mp4
As I later discovered, one way of identifying the bytes of each I-frame is by using ffprobe and its pkt_pos and pkt_size parameters. The first gives the position of the frame's first byte, while their sum minus one gives the frame's last byte. In Python, a frame's bytes can then be extracted using:
with open(f'{name_root}.{name_extension}', 'rb') as f:
    data = list(f.read())
# pkt_pos and pkt_size come out of ffprobe as strings, so convert them to int first
frame = data[pkt_pos:pkt_pos + pkt_size]
Dealing with multiple frames makes matters more complicated. The command below will display the positions of all I-frames:
ffprobe -show_frames <filename>.mp4 | grep "=I" -B 18 -A 11 | grep "pkt_pos"
I decided to copy the output to a CSV which I then open in Python.
Note, pkt_pos in the above can be substituted with pkt_size or any other parameter.
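As an alternative to the fixed-offset grep (which breaks if ffprobe adds or removes fields), the same data can be pulled out programmatically; a sketch assuming ffprobe's JSON writer, with input.mp4 as a placeholder:

import json
import subprocess

# Ask ffprobe for each video frame's type, byte offset, and size as JSON.
out = subprocess.check_output([
    'ffprobe', '-v', 'quiet', '-select_streams', 'v:0',
    '-show_entries', 'frame=pict_type,pkt_pos,pkt_size',
    '-of', 'json', 'input.mp4',
])
frames = json.loads(out)['frames']

# Byte ranges of the I-frames: first byte is pkt_pos, last is pkt_pos + pkt_size - 1.
i_frame_ranges = [
    (int(f['pkt_pos']), int(f['pkt_pos']) + int(f['pkt_size']) - 1)
    for f in frames if f.get('pict_type') == 'I'
]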
I'm trying to get the number of audio tracks in a video file. The video has multiple tracks (like different, selectable languages for the same movie). So if there are three optional languages for the video, I'd like to end up with the number 3, no matter whether the audio is stereo, mono, or 5.1.
So far I have tried to do it with moviepy. I found only the attribute reader.nchannels, but that counts only the first audio track's left and right channels, so I get the number 2 every time.
The code right now is really simple, it looks like this:
from moviepy.editor import *
from moviepy.audio import *
clip = VideoFileClip(source)
audio_tracks = clip.audio.reader.nchannels
I also tried to get every info from the audio like this:
audio = AudioFileClip(source)
tracks = audio.reader.infos  # note: audio, not audio_tracks (an int has no .reader)
The output for this looks like this:
"'audio_found': True, 'audio_fps': 48000}"
tburrows13, thanks for pointing me in the right direction.
I was able to get the number of streams and store it in a variable through a Python script. Maybe this is not the most elegant solution, but it works, so here it is in case someone needs it. You have to import subprocess and use ffprobe with it; ffprobe comes with ffmpeg.
To get the number of streams the command goes like this:
ffprobe <filename here> -show_entries format=nb_streams
This will give you the number of streams in the file, not just the audios, but the video streams too. There is an option to get the data only for the audio streams, but this was not necessary for my project.
You can call this command from a Python script. The command needs to be a string, and you can store it in a variable too. To get and store the output of this command in another variable, you can use this:
variable = subprocess.check_output(subprocesscommand)  # subprocesscommand is the string version of the command written above
(On non-Windows systems, pass the command as a list of arguments instead, or add shell=True; otherwise subprocess will look for the whole string as a single executable name.)
If you print this variable now, the output will be something like: b'[FORMAT]\r\nnb_streams=3\r\n[/FORMAT]\r\n'
Now you just need to slice the string value to get the number of streams.
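For what it's worth, ffprobe can also count only the audio streams directly, which yields the desired 3 without any slicing; a sketch, with movie.mp4 as a placeholder filename:

import subprocess

# -select_streams a restricts ffprobe to audio streams; csv=p=0 prints one
# bare stream index per line, so counting the lines counts the tracks.
out = subprocess.check_output([
    'ffprobe', '-v', 'error', '-select_streams', 'a',
    '-show_entries', 'stream=index', '-of', 'csv=p=0', 'movie.mp4',
])
audio_tracks = len(out.split())
print(audio_tracks)  # e.g. 3 for three language tracks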
Thanks again for your help!
I have to use FFmpeg to detect shot changes in a video, and also save the timestamps and scores of the detected shot changes. How can I do this with a single command?
EDIT
I jumped straight to my use case, since it was solved directly with FFmpeg, without needing the raw frames.
The best solution I came across after reading tons of Q&As:
Simply use the command:
ffmpeg -i inputvideo.mp4 -filter_complex "select='gt(scene,0.3)',metadata=print:file=time.txt" -vsync vfr img%03d.png
This will save just the relevant information in the time.txt file like below:
frame:0 pts:108859 pts_time:1.20954
lavfi.scene_score=0.436456
frame:1 pts:285285 pts_time:3.16983
lavfi.scene_score=0.444537
frame:2 pts:487987 pts_time:5.42208
lavfi.scene_score=0.494256
frame:3 pts:904654 pts_time:10.0517
lavfi.scene_score=0.462327
frame:4 pts:2533781 pts_time:28.1531
lavfi.scene_score=0.460413
frame:5 pts:2668916 pts_time:29.6546
lavfi.scene_score=0.432326
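Since the time.txt lines come in frame/score pairs, pulling the timestamps and scores back into Python is short; a sketch assuming the exact format shown above:

import re

# Parse metadata=print output: a frame line with pts_time, then a score line.
with open('time.txt') as f:
    lines = [l for l in f.read().splitlines() if l.strip()]

shots = []
for frame_line, score_line in zip(lines[::2], lines[1::2]):
    t = float(re.search(r'pts_time:(\S+)', frame_line).group(1))
    score = float(score_line.split('=')[1])
    shots.append((t, score))

print(shots)  # [(1.20954, 0.436456), (3.16983, 0.444537), ...]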
I am attempting to build a MP3 decoder / parser in Python which supports files encoded by LAME or FFMPEG.
My encoding shell script is shown here:
#!/bin/bash
for i in wav/*.wav; do
i=${i##*/};
lame --nores --strictly-enforce-ISO -t --cbr -b 64 -h "wav/${i}" "mpeg/lame/${i%.wav}.mp3";
ffmpeg -i "wav/${i}" -codec:a libmp3lame -qscale:a 2 "mpeg/ffmpeg/${i%.wav}.mp3";
done
This script reads WAVE files located in ./wav/ and produces a constant-bitrate 64 kbps MP3 in my ./mpeg/lame/ directory and a variable-bitrate MP3 of quality 2 in my ./mpeg/ffmpeg/ directory.
I have written a Python script that iterates through both resultant MP3s, counting the number of frames and samples. Both the LAME and FFMPEG results are equivalent (in terms of frames and samples), but their binary files are different.
The LAME/FFMPEG sample count was done by iterating through the binary MP3 files, locating and parsing the frame header, then using the MP3 spec to determine the number of samples per frame.
Number of MP3 data-frames: 112 (ignoring the Xing/Info first frame)
Number of output samples: 112 * 576 = 64512
Here is a comparison of the sample count for a single 4-second input file:
Input WAV # of samples = 62996
Output LAME/FFMPEG # of samples = 64512
Difference = 1516
I understand that, according to the LAME FAQ, resultant MP3 files are zero-padded at the front and back to make sure the inverse MDCT is performed properly, and also because the windows overlap.
What I can't ascertain from the above FAQ, or from any previous StackOverflow post, is how to compute the number of artificially added samples. If I can be sure that all 1516 of these samples are zeros, and I can be sure of their position in the bytestream, I'd like to be able to confidently toss them out. Since there are 1516 "extra" samples and there are 576 samples per frame for a V2 LIII (MPEG-2 Layer III) encoding, more than two (but fewer than three) frames' worth of padding must be present.
Is anyone here savvy enough with MPEG encoding/decoding to know how many samples are added, and which frames those samples end up in? In other words, will only the first and last frames contain blank data, or are there more?
The easiest way to do this is to decode the resultant MP3s with ffmpeg with the log level set to debug:
ffmpeg -i file.mp3 -f null - -v 48
Within the console output, you'll see a line like this:
[mp3 @ 0000000002be28c0] pad 576 1105
This doesn't include the fixed encoder delay.
So the actual skipped sample count is shown by these two lines
Start padding in first frame:
[mp3 @ 0000000002e6bb80] skip 1105/1152 samples
End padding in last frame:
[mp3 @ 0000000002e6bb80] discard 576/1152 samples
This info is only present if the Xing header is written.
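To avoid reading the log by eye, the two counts can be scraped from ffmpeg's stderr; a minimal sketch assuming the skip/discard line format shown above (here -v 48 is placed before -i so it cannot be dropped as a trailing option):

import re
import subprocess

# Decode to null at debug verbosity; ffmpeg writes its log to stderr.
proc = subprocess.run(
    ['ffmpeg', '-v', '48', '-i', 'file.mp3', '-f', 'null', '-'],
    capture_output=True, text=True)
skip = re.search(r'skip (\d+)/\d+ samples', proc.stderr)
discard = re.search(r'discard (\d+)/\d+ samples', proc.stderr)
print('encoder delay:', skip.group(1) if skip else 'not reported')
print('end padding:', discard.group(1) if discard else 'not reported')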