This script is now available as an ongoing project that can be downloaded. Please see the Video Encoding Project. This post describes the methods and mathematics behind the script.
After much searching, I found almost nothing regarding how to target a specific file size with ffmpeg, such as 700MiB to fit on a CD. Almost everything I found suggested to use a negative bitrate for mencoder, such as -700000. However, this did not work for me at all (even by copying and pasting verbatim). Because of this, I set about doing it manually with some mathematics.
What I am going to do here assumes a single audio stream, a single video stream and nothing else in the output file at all (i.e. no subtitles, no alternate languages, no director’s commentaries, etc).
When encoding media, audio uses a constant bit rate. This means that you can calculate how much space it will take if you know which bit rate you are using and how long the stream is in seconds. This also means that to know how big the video stream should be, we can calculate how big the audio stream will be and subtract it from the target file size. For example, if the target file size is 350MiB and the audio will take up 30MiB, we know the video should aim to be 320MiB (ignoring overheads in the multiplexing, container headers, etc). We can use the same calculations in reverse to work out an average or constant bit rate from a file size and a length in seconds.
In this example, I will be using Dad’s Army Series 6 Episode 1 – The Deadly Attachment. I have already ripped this from the DVD with SlySoft AnyDVD and PgcDemux on Windows. Automating PgcDemux rips is worth a whole entry in itself. Note that I have bought this legally and I do not intend to sell or share it. I just want a copy that I can store on my network and play from any computer. Storing 14 DVD images on hard drives takes quite a lot of space, which is where encoding comes in handy. In fact, some of my box sets contain over 60 DVDs (roughly 270GiB).
———-
To install ffmpeg on Debian, it is best to get it from the Debian Multimedia repository. Add
deb http://www.debian-multimedia.org lenny main
deb-src http://www.debian-multimedia.org lenny main
to your /etc/apt/sources.list and then run
# apt-get update
# apt-get install debian-multimedia-keyring
# apt-get update
# apt-get install ffmpeg
———-
The first thing we need to do to put this into practice is to get the length of the stream. If you supply the file to ffmpeg without telling it to do anything, it will give you an error. However, this error already contains all of the information we need:
$ ffmpeg -i da_6_01.VOB
FFmpeg version SVN-r13582, Copyright (c) 2000-2008 Fabrice Bellard, et al.
configuration: --prefix=/usr ...
built on May 3 2009 12:07:18, gcc: 4.3.2
Input #0, mpeg, from 'da_6_01.VOB':
Duration: 00:29:23.90, start: 0.287267, bitrate: 5987 kb/s
Stream #0.0[0x1e0]: Video: mpeg2video, yuv420p, 720x576 [PAR 16:15 DAR 4:3], 9800 kb/s, 25.00 tb(r)
Stream #0.1[0x80]: Audio: ac3, 48000 Hz, stereo, 192 kb/s
Must supply at least one output file |
We can see that the duration is 00:29:23.90.
As this is an error, we need to redirect stderr to stdout. This can be done with 2>&1. We then need the line that contains the “Duration”, which we can get with grep. We can cut this line up to take the time out.
$ ffmpeg -i da_6_01.VOB 2>&1 | grep Duration | cut -d ' ' -f 4 | cut -d '.' -f 1
00:29:23 |
This can be converted into seconds with the formula s' = (h * 60 + m) * 60 + s.
I used a python script to do this and to perform the necessary calculations mentioned earlier. Note that most of this python script is just for the convenience of getopts and that, if needed, it can be cut down significantly at the cost of flexibility. If you always want the same output size and audio bitrate, this script can be simplified to just a couple of lines.
calc_bitrate.py:
#!/usr/bin/python
from getopt import getopt, GetoptError
from sys import argv, exit
def secs(h, m, s):
return (h * 60 + m) * 60 + s
def calc_video_bitrate(target_size, abr, time):
target_bytes = target_size * 1024 * 1024
audio_bytes = ((abr * 1000) / 8) * time
return (target_bytes - audio_bytes) / time * 8
try:
opts, args = getopt(argv[1:], "s:a:t:", ["size=", "abr=", "time="])
except GetoptError, err:
print str(err)
exit(2)
target_size = 350 #MiB
abr = 128 #kbps
time = 0
for o, a in opts:
if o in ("-s", "--size"):
target_size = int(a)
elif o in ("-a", "--abr"):
abr = int(a)
if len(args) < 1:
print "Usage: " + argv[0] + " [-s <size (M)>] [-a <audio bit rate (k)>] [[hh:]mm:]ss"
exit(2)
time_part = args[0].split(":")
if len(time_part) < 2: time_part = [0] + time_part
if len(time_part) < 3: time_part = [0] + time_part
time = secs(int(time_part[0]), int(time_part[1]), int(time_part[2]))
print "%d" % calc_video_bitrate(target_size, abr, time) |
A typical usage for this would be ./calc_python.py -s 350 -a 128 29:23.
We can then use a combination of these inside a shell script easily. I’m using bash syntax just to nest the commands into a single line but for compatibility with other shells, backticks (`) can be used and this is what I normally prefer. This script loops through all of the .VOB files in the directory and gives their target video bitrate for a 350MiB output and 128kbps audio.
#!/bin/bash
for vob in *.VOB
do
BR=$(./calc_bitrate.py -s 350 -a 128 $(ffmpeg -i $vob 2>&1 | grep Duration | cut -d ' ' -f 4 | cut -d '.' -f 1))
echo $vob $BR
done |
I use this in 2-pass VBR mode with the libx264 video encoder and libfaac audio encoder in an mp4 container just by replacing the “echo” line of the above script with the following 2 lines:
ffmpeg -i $vob -an -pass 1 -vcodec libx264 -vpre fastfirstpass -b $BR -bt $BR -deinterlace -threads 0 -y pass1.mp4
ffmpeg -i $vob -acodec libfaac -ab 128k -ac 2 -pass 2 -vcodec libx264 -vpre hq -b $BR -bt $BR -deinterlace -threads 0 -y `echo $vob | sed 's/VOB/mp4/'` |
I have 2 versions of this script, one with “-deinterlace” and one without. PAL video (576i – used in Europe), such as Dad’s Army needs deinterlacing. NTSC video (480 – used in North America) usually does not.
For ripping DVDs to H.264 and maintaining the quality, I wouldn’t use a bitrate lower than 1000k or an audio bitrate lower than 128k stereo. This means that for getting media onto a CD, you probably want no smaller than 350MiB for 30 to 40 minutes or 700MiB for 1 hour or more, depending on your resolution. Also note that I only intend to play these in computers and fitting them onto CDs is just a convenience – standalone DVD players will not be able to play H.264 or AAC (Blu-ray and HDDVD players may be able to – mp4 containers and reading CDs would be more of an issue).
Update:
I have successfully tested this in a LG BD370 Blu-ray player.
I have not tried actually burning these onto CDs yet so if 2 won’t fit on a 700MiB CD, even with overburn (because of overheads that were ignored, as stated earlier) then the simplest solution is to just target a smaller file size with something like “-s 348″.
The result
Dad’s Army episodes are not all of the same length. Some are several minutes longer than others. The following shows how no matter the size of the input file, they all produce the same size output file:
$ ll da_5_*.VOB
-r--r--r-- 1 mythtv mythtv 1364660224 2009-06-04 00:01 da_5_01.VOB
-r--r--r-- 1 mythtv mythtv 1360699392 2009-06-04 00:01 da_5_02.VOB
-r--r--r-- 1 mythtv mythtv 956084224 2009-06-04 00:02 da_5_03.VOB
-r--r--r-- 1 mythtv mythtv 1355888640 2009-06-04 00:02 da_5_04.VOB
-r--r--r-- 1 mythtv mythtv 1114351616 2009-06-03 23:59 da_5_05.VOB
-r--r--r-- 1 mythtv mythtv 1034289152 2009-06-04 00:00 da_5_06.VOB
-r--r--r-- 1 mythtv mythtv 1045202944 2009-06-04 00:00 da_5_07.VOB
-r--r--r-- 1 mythtv mythtv 1345318912 2009-06-04 00:00 da_5_08.VOB
-r--r--r-- 1 mythtv mythtv 1391529984 2009-06-04 00:02 da_5_09.VOB
-r--r--r-- 1 mythtv mythtv 1342748672 2009-06-04 00:03 da_5_10.VOB
-r--r--r-- 1 mythtv mythtv 1158017024 2009-06-04 00:03 da_5_11.VOB
-r--r--r-- 1 mythtv mythtv 1157097472 2009-06-04 00:03 da_5_12.VOB
-r--r--r-- 1 mythtv mythtv 1180098560 2009-06-04 00:04 da_5_13.VOB
$ ll -h da_5_*.mp4
-rw-r--r-- 1 mythtv mythtv 352M 2009-06-17 06:18 da_5_01.mp4
-rw-r--r-- 1 mythtv mythtv 352M 2009-06-17 06:55 da_5_02.mp4
-rw-r--r-- 1 mythtv mythtv 352M 2009-06-17 07:30 da_5_03.mp4
-rw-r--r-- 1 mythtv mythtv 352M 2009-06-17 08:08 da_5_04.mp4
-rw-r--r-- 1 mythtv mythtv 352M 2009-06-17 08:44 da_5_05.mp4
-rw-r--r-- 1 mythtv mythtv 352M 2009-06-17 09:21 da_5_06.mp4
-rw-r--r-- 1 mythtv mythtv 352M 2009-06-17 09:58 da_5_07.mp4
-rw-r--r-- 1 mythtv mythtv 352M 2009-06-17 10:36 da_5_08.mp4
-rw-r--r-- 1 mythtv mythtv 352M 2009-06-17 11:15 da_5_09.mp4
-rw-r--r-- 1 mythtv mythtv 352M 2009-06-17 11:54 da_5_10.mp4
-rw-r--r-- 1 mythtv mythtv 352M 2009-06-17 12:29 da_5_11.mp4
-rw-r--r-- 1 mythtv mythtv 352M 2009-06-17 13:05 da_5_12.mp4
-rw-r--r-- 1 mythtv mythtv 352M 2009-06-17 13:41 da_5_13.mp4
Update:
A pure python replacement script is available here that simplifies use.