I'm trying to get the number of audio tracks in a video file. The video has multiple tracks (like different, selectable languages for the same movie). So if there are three optional languages for the video, I'd like to get the number 3 in the end, no matter whether the audio is in stereo, mono, or 5.1.
So far I have tried to do it with moviepy. I found only the function "reader.nchannels", but that counts only the first audio track's left and right channels, so I get the number 2 every time.
The code right now is really simple; it looks like this:
from moviepy.editor import *
from moviepy.audio import *
clip = VideoFileClip(source)
audio_tracks = clip.audio.reader.nchannels
I also tried to get all the info from the audio like this:
audio = AudioFileClip(source)
tracks = audio.reader.infos
The output for this looks like this:
{'audio_found': True, 'audio_fps': 48000}
tburrows13, thanks for pointing me in the right direction.
I was able to get the number of audio tracks and store it in a variable through a Python script. Maybe this is not the most elegant solution, but it works, so here it is if someone needs it. You have to import "subprocess" and use ffprobe with it. ffprobe comes with ffmpeg.
To get the number of streams the command goes like this:
ffprobe <filename here> -show_entries format=nb_streams
This will give you the number of streams in the file, not just the audio streams but the video streams too. There is an option to get the data only for the audio streams, but that was not necessary for my project.
You can call this command from a Python script. The command needs to be a string; you can store it in a variable too. To get and store the output of this command in another variable you can use this:
variable = subprocess.check_output(subprocesscommand) # subprocesscommand is the string version of the command written above
If you now print this variable, the output will be something like: b'[FORMAT]\r\nnb_streams=3\r\n[/FORMAT]\r\n'
Now you just need to slice the string value to get the number of streams.
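Putting it all together, here's a minimal sketch of the whole approach (the filename "video.mp4" is a placeholder; it assumes ffprobe is on the PATH, and passing the command as a list avoids shell-quoting issues):
import subprocess

# Placeholder filename; format=nb_streams prints the total stream count.
subprocesscommand = ["ffprobe", "video.mp4", "-show_entries", "format=nb_streams"]
output = subprocess.check_output(subprocesscommand)
# output looks like b'[FORMAT]\r\nnb_streams=3\r\n[/FORMAT]\r\n'
nb_streams = int(output.decode("utf-8").split("nb_streams=")[1].split()[0])
print(nb_streams)  # e.g. 3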
Thanks again for your help!
While recording my screen with OBS capture, I accumulated a large quantity of videos that had been subject to a forced system shutdown, leaving them unfinalized. The videos were created in the .flv format, so when I play them in VLC they play flawlessly; however, they are missing an end time (video length). Instead, the videos show the running time as they play but maintain the 00:00 end time, despite the actual video playing for several minutes.
From my understanding, unlike .mp4 formatting, an .flv-formatted video can be recovered if it has not been finalized (as in the case of my footage stopped by unexpected shutdowns). Since I have a large quantity of unfinalized videos, I need an automated solution to fix them.
Using MoviePy write_videofile
I attempted to fix the videos by using the MoviePy write_videofile command in the Python shell, with the directory set to the directory of the bad video:
from moviepy.editor import * #no error
vid = VideoFileClip("oldVideoName.flv") #no error
vid.write_videofile("corrected.mp4") #IndexError
The final line briefly created a file "correctedTEMP_MPY_wvf_snd.mp3" (only 1 KB, unplayable in Audacity), shortly before throwing an exception. I received a massive traceback with the final tier reading:
File "\Python37-32\lib\site-packages\moviepy\audio\io\readers.py", line 168, in get_frame
"Accessing time t=%.02f-%.02f seconds, "%(tt[0], tt[-1])+
IndexError: index 0 is out of bounds for axis 0 with size 0
I assumed that this was caused by a problem with an audio reader not accepting the supposed 00:00 timestamp as the length of the video.
Using MoviePy subclip
I attempted to see if there was a way that I could manually feed MoviePy the start and end timestamps, using the subclip method. I know the video is at least 4 seconds long, so I used that as a control test:
clip = vid.subclip("00:00:00", "00:00:05") #no error
clip.write_videofile("corrected.mp4") #OSError
The write_videofile method again threw an exception:
File "\Python37-32\lib\site-packages\moviepy\audio\io\readers.py", line 169, in get_frame
"with clip duration=%d seconds, "%self.duration)
OSError: Error in file oldVideoName.flv,
Accessing time t=0.00-0.04 seconds, with clip duration=0 seconds,
Even if this method were to work, I would need to find a way to automate the process of discovering the video end time.
Using OpenCV CAP_PROP_FRAME_COUNT
One possible solution to finding the end time (video length) is to use cv2, per this post.
import cv2 #no error
vid=cv2.VideoCapture("oldVideoName.flv") #no error
vid.get(cv2.CAP_PROP_FRAME_COUNT) #returns -5.534023222112865e+17
I was not expecting to receive a negative float for this value. Further tests revealed that this float does not correspond at all to the length of the video, as all unfinalized videos return the same float for this request. (Normal videos do return their length for this method call.) This is useful for iterating over a directory to identify unfinalized videos.
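A quick sketch of such a scan (assuming the videos sit in the current directory; any non-positive frame count is treated as unfinalized):
import cv2
import glob

# Flag unfinalized videos: intact files report a real frame count, broken ones don't.
for name in glob.glob("*.flv"):
    cap = cv2.VideoCapture(name)
    frame_count = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    cap.release()
    if frame_count <= 0:
        print("unfinalized:", name)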
Is using MoviePy to correct a large quantity of unfinalized videos a viable or even possible solution? Is it better to use cv2 (Python OpenCV) for solving this problem?
I was able to fix the video files using yamdi, an open source metadata injector for FLV files. After downloading and installing yamdi, I can use the following command to repair an .flv file named oldVideoName.flv:
yamdi -i oldVideoName.flv -o corrected.flv
The command leaves oldVideoName.flv untouched, and saves a repaired file as corrected.flv.
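With many files to repair, the command is easy to automate; a minimal sketch (assuming yamdi is on the PATH, the .flv files sit in the current directory, and the "-fixed" suffix is an arbitrary choice):
import glob
import subprocess

# Run yamdi on every .flv, writing the repaired copy alongside the original.
for name in glob.glob("*.flv"):
    subprocess.run(["yamdi", "-i", name, "-o", name[:-4] + "-fixed.flv"], check=True)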
I have to use FFmpeg to detect shot changes in a video, and also save the timestamps and scores of the detected shot changes. How can I do this with a single command?
EDIT
I jumped directly to my use case, as it was solved using FFmpeg alone, without the need for raw frames.
The best solution I came across after reading tonnes of Q&As:
Simply use the command:
ffmpeg -i inputvideo.mp4 -filter_complex "select='gt(scene,0.3)',metadata=print:file=time.txt" -vsync vfr img%03d.png
This will save just the relevant information in the time.txt file like below:
frame:0 pts:108859 pts_time:1.20954
lavfi.scene_score=0.436456
frame:1 pts:285285 pts_time:3.16983
lavfi.scene_score=0.444537
frame:2 pts:487987 pts_time:5.42208
lavfi.scene_score=0.494256
frame:3 pts:904654 pts_time:10.0517
lavfi.scene_score=0.462327
frame:4 pts:2533781 pts_time:28.1531
lavfi.scene_score=0.460413
frame:5 pts:2668916 pts_time:29.6546
lavfi.scene_score=0.432326
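If you also want the timestamps and scores in Python, time.txt is easy to parse; a sketch assuming the file alternates a frame line and a score line exactly as above:
import re

# Collect (pts_time, scene_score) pairs from the metadata=print output.
shots = []
with open("time.txt") as f:
    lines = f.read().splitlines()
for frame_line, score_line in zip(lines[0::2], lines[1::2]):
    pts_time = float(re.search(r"pts_time:([\d.]+)", frame_line).group(1))
    score = float(re.search(r"scene_score=([\d.]+)", score_line).group(1))
    shots.append((pts_time, score))
print(shots)  # e.g. [(1.20954, 0.436456), ...]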
So I'm trying to extract every frame of a video, then use ffprobe to see when each frame is shown within the video, and then be able to stitch that video back together using those extracted images and the ffprobe output.
Right now, I have this batch file:
for %%a in (*.mp4) do (
mkdir "%%~na_images" > NUL
ffmpeg.exe -hide_banner -i "%%a" -t 100 "%%~na_images\image-%%d.png"
ffprobe.exe "%%a" -hide_banner -show_entries frame=coded_picture_number,best_effort_timestamp_time -of csv > "%%~na_frames.txt"
)
First, a directory is made for the images.
Then ffmpeg extracts all the frames of the video to individual PNG files, which are numbered appropriately.
Lastly, ffprobe sees when each frame is first shown within that video (i.e. frame 1 is shown at 0 seconds, but at, say, 60 fps, frame 2 is shown at 0.016667 seconds into the video). The output looks like this:
frame,0.000000,0
frame,0.000000
frame,0.017000,1
frame,0.023220
where the first number is the time the frame first appears (e.g. 0.017000 is when the second frame appears) and the second number is the frame number.
Now my problem is using ffmpeg to take each frame and place it at the proper time within the video. I can do this using another language (probably Python), but my best guess is to make a loop that iterates through the ffprobe output file, gets the frame time and image number, places that frame at the point where it appears, and then moves on to the next frame and time placement. Looking at the frame data I used as an example above, it'd be something like this:
for line in lines:
    mySplit = line.split(',')
    # Get image number 0 and insert at time 0.000000
This is the part that I'm not sure how to do in a coding sense. I can read in and parse the lines of the ffprobe output text file, but I have no idea how to insert the frames at certain points in a video using ffmpeg or similar solutions.
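One idea I could try is ffmpeg's concat demuxer, which accepts a list file with a duration line per image. A sketch (untested against this exact output; `entries` stands for the parsed (time, frame number) pairs, and the +1 matches the image-%d.png numbering, which starts at 1):
# Turn consecutive timestamps into per-image durations for the concat demuxer.
entries = [(0.000000, 0), (0.017000, 1)]  # placeholder parsed data
with open("list.txt", "w") as f:
    for (time, num), (next_time, _) in zip(entries, entries[1:]):
        f.write("file 'image-%d.png'\n" % (num + 1))
        f.write("duration %f\n" % (next_time - time))
    f.write("file 'image-%d.png'\n" % (entries[-1][1] + 1))
# Then run: ffmpeg -f concat -i list.txt -vsync vfr output.mp4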
You need to tell the system there is more than one token, i.e.
for /f "tokens=1-4 delims=," %%a in ...
Here you tell it that you want to grab tokens 1 to 4 and that the delimiter is ,
So, for example, with a file containing 1,100,1000,10000 it will assign %%a to the first token (being 1), %%b to the second (being 100), etc.
I am using the wave library in Python to attempt to reduce the speed of audio by 50%. I have been successful, but only in the right channel; in the left channel there is a whole bunch of static.
import wave, os, math

r = wave.open(r"C:\Users\A\My Documents\LiClipse Workspace\Audio compression\Audio compression\aha.wav", "r")
w = wave.open(r"C:\Users\A\My Documents\LiClipse Workspace\Audio compression\Audio compression\ahaout.wav", "w")
frames = r.readframes(r.getnframes())
newframes = bytearray()
w.setparams(r.getparams())
for i in range(0, len(frames) - 1):
    newframes.append(frames[i])
    newframes.append(frames[i])
w.writeframesraw(newframes)
Why is this? Since I am just copying and pasting raw data, surely I can't generate static?
Edit: I've been looking for ages and I finally found a useful resource for the WAVE format: http://soundfile.sapp.org/doc/WaveFormat/
If I want to preserve stereo sound, it looks like I need to copy the full 4-byte frame twice. This is because there are two channels, and together they take up 4 bytes instead of 2.
import wave

r = wave.open(r"C:\Users\A\My Documents\LiClipse Workspace\Audio compression\Audio compression\aha.wav", "r")
w = wave.open(r"C:\Users\A\My Documents\LiClipse Workspace\Audio compression\Audio compression\ahaout.wav", "w")
frames = r.readframes(r.getnframes())
newframes = bytearray()
w.setparams(r.getparams())
w.setframerate(r.getframerate())
print(r.getsampwidth())
for i in range(0, len(frames) - 4, 4):
    newframes.append(frames[i])
    newframes.append(frames[i+1])
    newframes.append(frames[i+2])
    newframes.append(frames[i+3])
    newframes.append(frames[i])
    newframes.append(frames[i+1])
    newframes.append(frames[i+2])
    newframes.append(frames[i+3])
w.writeframesraw(newframes)
Edit 2:
Okay, I have no idea what drove me to do this, but I am already enjoying the freedom it gives me. I chose to copy the WAV file into memory, edit the copy directly, and write it to an output file. I am incredibly happy with the results. I can import a WAV, repeat the audio once, and write it to an output file in only 0.2 seconds. Reducing the speed by half now takes only 9 seconds instead of the 30+ seconds of my old code using the wave library. :) Here's the code, still somewhat unoptimized I guess, but it's better than it was.
import struct
import time as t

t.clock()
r = open(r"C:/Users/apier/Documents/LiClipse Workspace/audio editing software/main/aha.wav", "rb")
w = open(r"C:/Users/apier/Documents/LiClipse Workspace/audio editing software/main/output.wav", "wb")
rbuff = bytearray(r.read())

def replacebytes(array, bites, stop):
    # Overwrite the bytes ending at index stop with the given bytes.
    length = len(bites)
    start = stop - length
    for i in range(start, stop):
        array[i] = bites[i - start]

def write(audio):
    w.write(audio)

def repeat(audio, repeats):
    if repeats == 1:
        return audio
    if repeats == 0:
        return audio[:44]
    # Scale the data-chunk size field (bytes 40-44 of the canonical WAV header).
    replacebytes(audio, struct.pack('<I', struct.unpack('<I', audio[40:44])[0] * repeats), 44)
    return audio + (audio[44:len(audio) - 58] * (repeats - 1))

def slowhalf(audio):
    buff = bytearray()
    replacebytes(audio, struct.pack('<I', struct.unpack('<I', audio[40:44])[0] * 2), 44)
    # Duplicate each 4-byte frame (left + right sample) after the 44-byte header.
    for i in range(44, len(audio) - 62, 4):
        buff.append(audio[i])
        buff.append(audio[i+1])
        buff.append(audio[i+2])
        buff.append(audio[i+3])
        buff.append(audio[i])
        buff.append(audio[i+1])
        buff.append(audio[i+2])
        buff.append(audio[i+3])
    return audio[:44] + buff

rbuff = slowhalf(rbuff)
write(rbuff)
print(t.clock())
I am surprised at how small the code is.
Each of the elements returned by readframes is a single byte, even though the type is int. An audio sample is typically 2 bytes. By doubling up each byte instead of each whole sample, you get noise.
I have no idea why one channel would work; with the code shown in the question it should be all noise.
This is a partial fix. It still intermixes the left and right channel, but it will give you an idea of what will work.
for i in range(0, len(frames) - 1, 2):
    newframes.append(frames[i])
    newframes.append(frames[i+1])
    newframes.append(frames[i])
    newframes.append(frames[i+1])
Edit: here's the code that should work in stereo. It copies 4 bytes at a time, 2 for the left channel and 2 for the right, then does it again to double them up. This will keep the channel data from interleaving.
for i in range(0, len(frames), 4):
    for _ in range(2):
        for j in range(4):
            newframes.append(frames[i+j])
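Putting the whole fix together, a minimal sketch (assuming 16-bit stereo input, i.e. 4-byte frames):
import wave

r = wave.open("aha.wav", "rb")
w = wave.open("ahaout.wav", "wb")
w.setparams(r.getparams())
frames = r.readframes(r.getnframes())
newframes = bytearray()
# Write every 4-byte frame (2 bytes left + 2 bytes right) twice to halve the speed.
for i in range(0, len(frames), 4):
    newframes.extend(frames[i:i+4] * 2)
w.writeframesraw(bytes(newframes))
r.close()
w.close()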
I've used pydub to output a file (chopping the file into a shorter one), and everything is great, but the bitrate has changed from 256k to 124k (why do I get this number instead of 128k?). I know that AudioSegment's export() has an argument to set the bitrate, but I just want the same bitrate as the source instead of setting it manually every time. Any way to fix this issue?
This has mainly to do with ffmpeg/avlib, but you can pass a flag to the AudioSegment().export() method to specify the bitrate you'd like:
from pydub import AudioSegment
from pydub.utils import mediainfo
source_file = "/path/to/sound.mp3"
original_bitrate = mediainfo(source_file)['bit_rate']
sound = AudioSegment.from_mp3(source_file)
sound.export("/path/to/output.mp3", format="mp3", bitrate=original_bitrate)
I was unable to use the example above with the mediainfo object. I just found the formula for calculating the bitrate of WAV files here and used that.
Translating it into Python and pydub (sample_width is the number of bytes per sample of a single channel), and assuming the pydub object is called wav, you get:
bitrate = str((wav.frame_rate * wav.sample_width * 8 * wav.channels) / 1000)
Then you could pass it forward into the export function and not set it manually. Hope it helps :)
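For completeness, a usage sketch (the paths are placeholders; the "k" suffix tells ffmpeg the value is in kbps):
from pydub import AudioSegment

wav = AudioSegment.from_wav("/path/to/sound.wav")
# kbps = sample rate * bytes per sample * 8 bits * channels / 1000
bitrate = str(int(wav.frame_rate * wav.sample_width * 8 * wav.channels / 1000))
wav.export("/path/to/output.mp3", format="mp3", bitrate=bitrate + "k")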