Introducing mp4pack Or How to sort out your media files
Posted by moonwatcher at Wednesday, June 23, 2010
Labels: code, featured, video
This is a preliminary introduction to mp4pack, a script that automates everything I have learned about managing audio and video collections over the last 10 years. Say you're surprised? Say you like it? Say it's just what you wanted? Because it's yours. Because it's GPL.
Overture
The bash version of mp4pack was written mainly for internal testing and research, and it is very inefficient in the way it uses some CLI tools. I am currently rewriting it from scratch as a python script, which lets me integrate the various scripts into a more cohesive application and significantly improve speed as well as error handling. It also opens the door to using the C libraries behind some utilities instead of their CLI interfaces, which is faster and more robust. The python implementation will be easier to expand, allow finer control over which files get processed by each command, and have proper config files instead of a bunch of global variables at the beginning of the file.
That said, since 99% of the time spent running mp4pack goes to either x264 encoding by HandBrake or matroska muxing, you can say it's perfectly usable right now, in its lame bash implementation. I am releasing it to get feedback and to satisfy the curiosity of some friends.
Introduction
Media files today contain video and audio streams encoded with many different codecs and packaged in many different containers. I have had the pleasure of being part of the open source audio/video community for more than 10 years and have watched many tradeoffs come and go as the available software and hardware kept changing: encoding and decoding complexity, compression efficiency, memory buffer limitations, compatibility with hardware decoders, metadata tags, subtitles in many languages and character encodings, chapters, cover images and so on. We have covered a long distance since the days of the now obsolete avi container, which could only manage one video stream (without bi-directional predicted frames) and one audio stream. We went through many proprietary formats that could achieve more but were all lacking in some respects.
Looking back at those days, we are now in video heaven. We have an open source container that can do almost anything and is supported across the board: matroska. We have a standardized MPEG container that is rapidly catching up with all that matroska can do and is even more widely supported by hardware decoders and mobile devices. We have a standardized video codec, h.264, and an extremely efficient (I am running out of superlatives here) open source implementation of it: x264. We have tools to manipulate metadata, chapter markers and attachments (for cover art). We can multiplex utf-8 text subtitles directly into the containers. We have online, community maintained databases with rich data on movies, entire TV shows and audio CDs that can dish out all their riches as neat xml files. We even have almost perfect unique key registrars for the media files, like imdb.
Contrary to the dark ages of the late 90's and the first half of the 2000s, all those tools are standard libraries that can be compiled and run on any OS, rather than being built around the horrible Microsoft VFW API with its filter tangle hell no sane person could make sense of. The result of having all this goodness available as libraries and command line tools is that we now see more and more highly integrated frameworks that amalgamate all those tools into super scripts and utilities that can take care of the entire processing pipeline. Still, the number of useful combinations is practically infinite and depends on what your desired inputs and outputs are.
Without any intention of inciting a flamewar, I want to focus on the Apple environment with all its desktop computer glory and satellite devices like the appletv, the iPad, the iPod touch and the iPhone. I will also focus on the latest versions of all operating systems, software and utilities. Almost everything that follows should also work on most linux distributions, with possibly a few exceptions, and I will draw your attention to those when we get there. Apple chose to adopt the standardized MPEG mp4 container as its native media format. h.264 video compression (different hardware devices support different profiles), aac audio (appletv can also play ac3 from mp4), chapters, metadata and utf-8 subtitles all work in a harmony that, in my humble opinion, did not exist anywhere before in the space time continuum. With maybe a few exceptions, we will use open source implementations for encoding and packaging and standard Apple software for decoding, playback, browsing and control.
iTunes can be very efficient at managing HUGE collections of movies, TV show episodes, music tracks, podcasts, ePub books and audio books. Its scope is constantly growing and the name iTunes has become somewhat deceiving. It can also make that whole metadata-rich collection available to the assorted apple hardware devices, and it can keep track of which videos you already watched, when, and where you left off. It can remember which audio tracks you listen to and how often, and it can manage syncing all that data to all your devices. If you ever tried to manage a big collection of media, by now you should be quite excited that all this is not only possible but also available as out-of-the-box functionality. Some will argue the apple experience comes with a price tag. I will make the counter claim that, considering the quality, robustness and stability of the implementation, not to mention how often it's updated, that price tag is perfectly reasonable and in fact second to none. But let's leave financial arguments outside the scope of this debate; what I am about to describe is mostly aimed at people who agree with my conclusion above.
Apple implemented all this primarily to secure its position as one of the biggest content distribution conglomerates in the world. Unlike some other companies with similar aspirations, they chose to use standardized codecs and containers, which in turn allowed the open source community to tap into the Apple environment and feed it with alternative/complementary sources of content. Since apple also makes money by selling the hardware and software, you might say this created a mutually beneficial, symbiotic relationship between apple and its customer base. Its offerings can satisfy a wide variety of consumers, ranging from the least knowledgeable user to the most sophisticated geek. I personally think this symbiosis is the source of apple's power and the reason for their meteoric success. This too will be left outside the scope of the debate.
Declared Goals
mp4pack glues together several open source tools to implement a processing pipeline that takes almost any input format sufficiently popular on the internet (and decodable by ffmpeg) and outputs media files that conform to the apple universe. To focus the discussion further, I will summarize the inputs and outputs it currently targets:
input
- container: avi, mkv, mp4
- video: mostly h.264 and h.263 but potentially anything decodable by ffmpeg
- audio: mostly mp3, aac, ac3 and dts, but possibly a few others for which open source decoders are available
- subtitles: mostly srt, but ass, ssa and sub formats can be easily transcoded (it's just text after all)
- chapters: any sane text based format, it's just a name and a timestamp after all
- artwork: jpg and png
- metadata: thetvdb.com for tv shows, themoviedb.org for movies, imdb as a unique key registrar.
output
- container: mkv, mp4
- video: h.264 implemented by the open source x264 codec
- audio: aac and ac3
- subtitles: utf-8 encoded, tx3g subtitles tracks in mp4 container
- chapters: native mp4 chapter markers with artwork
- artwork: embedded in the mp4 container
- metadata: we can mimic almost all the tagging you see on files coming from the iTunes store.
Note: apple's proprietary core aac encoder, available on OS X as a framework, is regarded by many in the community as far superior to the open source faac, so we can use it if we want in the latest versions of HandBrake.
Repository Structure
Like any pipeline, we need to establish some conventions about our data structure. Nothing too fancy, but we need a canonic directory structure and file naming scheme so that the scripts know what we feed them and we know where to find the files they produce. Just specifying input and output directories is not enough, because we often assemble our output from multiple input files. Media files tend to be very big and consume large amounts of storage space, and while storage has become ridiculously cheap, it still comes in chunks of 1 to 2 terabytes. So we define the concept of a volume, replicate the entire directory structure on each volume, and allow as many volumes as we want. Because all the operations are performed in a shell environment, we can use symlinks to make it all easy to navigate, and our volumes can span network storage over nfs, afp or any other mounting mechanism that is transparent to the shell.
root/volume/media kind/container/...
- root: the folder containing symlinks to all the volumes.
- volume: an atomic storage unit. The directory structure is the same on all volumes
- media kind: TV Show, Movie, Music, AudioBook or anything iTunes supports and categorizes
- container: m4v, m4a, mkv, artwork, chapter, srt, xml. m4a and m4v are just names for mp4, nothing more
Below the container level things branch out differently for each case.
.../(mkv|m4v|m4a)/profile/...
Video and audio containers are further broken down into profiles. A profile is the set of encoder settings that created the files. A profile can be appletv, denoting that the files under it were transcoded to be compatible with playback on the appletv, or ipod, denoting that they were scaled down to 320x480 and constrained accordingly to play nicely on the ipod hardware. Once you establish the constraints for your target device, you can define an additional profile and ask mp4pack to transcode an entire directory subtree to that profile. Another use of profiles is when you simply check in files that were encoded with unknown settings and you want to prepare them for transcoding into your preferred profile. In this case you can simply file them under a 1080, 720 or sd profile for future reference.
.../(artwork|xml|chapter)/...
Those directories are normally managed by mp4pack automatically. They store caches of the downloaded xml and artwork files used for tagging, and chapter markers extracted from files or provided by any other means. The internal directory structure obviously differs between tv shows and movies because of their different topology.
for movies: .../srt/profile/language/...
for tv shows: .../srt/tv show/season/profile/language/...
Because we don't yet have a universal method of extracting subtitles from mp4 into srt format, and because subtitles are essentially text files and are very small, we can keep a copy of them in our repository. This makes things easier and more flexible. Profiles have slightly different semantics for subtitles. Subtitles can be obtained from many online, community maintained websites and sometimes require some intervention to fit your existing media files: frame rate conversion from PAL to NTSC, applying a fixed time offset, or just filtering out unwanted text with regular expressions. iPods b0rk on empty subtitles so we want to get rid of those too, which requires renumbering the remaining subtitles to keep the indexes in the srt file consistent. mp4pack can do most of that with very little intervention and put the modified subtitles in a different profile, so you can choose to multiplex those into your files later but still keep the originals and redo the process if you find out you did something horribly wrong.
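The subtitle fixes described above (rescaling for a frame rate change, a fixed offset, dropping empty subtitles and renumbering) can be sketched in a few lines of python. This is only an illustration and not the code mp4pack or subclean actually runs; the names Cue, parse_srt, retime, drop_empty and render_srt are made up for the example, and the PAL to NTSC factor assumes subtitles timed for 25 fps video being fitted to 24000/1001 fps video.

```python
import re
from dataclasses import dataclass

# Matches an srt timestamp such as 00:01:02,345
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")

@dataclass
class Cue:
    start_ms: float
    end_ms: float
    text: str

def to_ms(stamp):
    """Convert an srt timestamp to milliseconds."""
    h, m, s, ms = map(int, TIME_RE.search(stamp).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def to_stamp(ms):
    """Convert milliseconds back to an srt timestamp, clamping at zero."""
    ms = max(0, round(ms))
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def parse_srt(data):
    """Split srt text into cues; the original index numbers are discarded."""
    cues = []
    for block in re.split(r"\n\s*\n", data.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # not a well formed cue, skip it
        start, end = lines[1].split("-->")
        cues.append(Cue(to_ms(start), to_ms(end), "\n".join(lines[2:])))
    return cues

def retime(cues, scale=1.0, offset_ms=0):
    """Apply a frame rate scale factor and then a fixed offset to every cue."""
    return [Cue(c.start_ms * scale + offset_ms, c.end_ms * scale + offset_ms, c.text)
            for c in cues]

def drop_empty(cues):
    """Remove cues that carry no visible text."""
    return [c for c in cues if c.text.strip()]

def render_srt(cues):
    """Write cues back out, renumbering from 1 so the indexes stay consistent."""
    blocks = [f"{i}\n{to_stamp(c.start_ms)} --> {to_stamp(c.end_ms)}\n{c.text}\n"
              for i, c in enumerate(cues, start=1)]
    return "\n".join(blocks)

# Assumed PAL (25 fps) to NTSC film (24000/1001 fps) stretch factor
PAL_TO_NTSC = 25 * 1001 / 24000
```

A typical use would be render_srt(retime(drop_empty(parse_srt(text)), scale=PAL_TO_NTSC)), writing the result under a different subtitle profile so the original stays untouched.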
Unique keys and canonic file names
You must have been asking yourself by now (shame on you if you haven't): how the hell does mp4pack know if a given video file contains Citizen Kane or the 3rd episode of the second season of Dexter? If you have some database background, you know we need a primary key, and we need it in the file name because we can't make too many assumptions about what can be stored inside the file (avi files, for instance, can't store metadata). All we need is a canonic form mp4pack can understand and extract from the file name. The canonic form we use is different for tv shows and movies. This is the small amount of manual labour you will need to perform to allow the rest to happen automatically. With a little shell experience even that can be reduced to a mere few seconds of typing.
For movies we use the imdb id. It is fairly easy to find, any movie has one (well, almost any movie), and because imdb was there first, most other metadata aggregators can fetch by it. So file names look like this:
IMDb<imdb id> <movie name>.<file type>
in fact you don't even need to do that, it's enough to rename the file to:
IMDb<imdb id>.<file type>
and ask mp4pack to rename it to the canonic form.
For tv shows it's a bit trickier. We have a config file mapping the thetvdb.com tv show id to the tv show name (you only need to add an entry once, when you start collecting episodes for that show) and the episodes are named accordingly:
<tv show name> s<season with leading zero>e<episode with leading zero> <episode name>.<file type>
As before, it's enough to rename it to:
<tv show name> s<season with leading zero>e<episode with leading zero>.<file type>
and let mp4pack rename it to the canonic form.
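The two canonic forms above are regular enough to be pulled apart with a pair of regular expressions. The following is a rough sketch of that idea, not mp4pack's actual parser; the function names parse_name and canonical_name and the record keys are invented for the example, and the pattern assumes the imdb id looks like tt followed by digits, as in the Citizen Kane example used in this post.

```python
import re

# IMDb<imdb id> [<movie name>].<file type>
MOVIE_RE = re.compile(
    r"^IMDb(?P<imdb_id>tt\d+)(?: (?P<name>.+))?\.(?P<kind>[^.]+)$"
)
# <tv show name> s<season>e<episode> [<episode name>].<file type>
EPISODE_RE = re.compile(
    r"^(?P<show>.+) s(?P<season>\d{2,})e(?P<episode>\d{2,})"
    r"(?: (?P<name>.+))?\.(?P<kind>[^.]+)$"
)

def parse_name(file_name):
    """Return a dict describing the file, or None if the name is not canonic."""
    for media, pattern in (("movie", MOVIE_RE), ("tvshow", EPISODE_RE)):
        match = pattern.match(file_name)
        if match:
            record = match.groupdict()
            record["media"] = media
            for key in ("season", "episode"):
                if key in record:
                    record[key] = int(record[key])
            return record
    return None

def canonical_name(record, title):
    """Build the full canonic name once the title is known (e.g. from a metadata lookup)."""
    if record["media"] == "movie":
        return f"IMDb{record['imdb_id']} {title}.{record['kind']}"
    return (f"{record['show']} s{record['season']:02d}"
            f"e{record['episode']:02d} {title}.{record['kind']}")
```

With this, parse_name("IMDbtt0033467.m4v") yields the key needed for a lookup, and canonical_name can then produce the long form once the title has been fetched.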
once you rename the files into their canonic names you can ask mp4pack to import them and it will copy them into their proper place in the repository. let's sum up the subject with an example:
+ pool
  + alpha
    + movie
      + artwork
        + tt0033467
          + poster.jpg
      + chapter
        + IMDbtt0033467 Citizen Kane.txt
      + m4v
        + appletv
          + IMDbtt0033467 Citizen Kane.m4v
      + srt
        + original
          + eng
            + IMDbtt0033467 Citizen Kane.srt
          + heb
            + IMDbtt0033467 Citizen Kane.srt
        + clean
          + eng
            + IMDbtt0033467 Citizen Kane.srt
          + heb
            + IMDbtt0033467 Citizen Kane.srt
      + xml
        + tt0033467
          + info.xml
    + tvshow
      + artwork
      + chapter
        + Dexter
          + 2
            + Dexter s02e03 An Inconvenient Lie.txt
      + m4v
        + appletv
          + Dexter
            + 2
              + Dexter s02e03 An Inconvenient Lie.m4v
        + ipod
          + Dexter
            + 2
              + Dexter s02e03 An Inconvenient Lie.m4v
      + srt
        + Dexter
          + 2
            + clean
              + eng
                + Dexter s02e03 An Inconvenient Lie.srt
              + heb
                + Dexter s02e03 An Inconvenient Lie.srt
            + original
              + eng
                + Dexter s02e03 An Inconvenient Lie.srt
              + heb
                + Dexter s02e03 An Inconvenient Lie.srt
      + xml
        + Dexter
          + en.xml
          + actors.xml
          + banners.xml
          + en.xml
    + music
      + m4a
        + ipod
          + Bob Dylan
            + Nashville Skyline
              + 06 Lay Lady Lay.m4a
        + lossless
          + Bob Dylan
            + Nashville Skyline
              + 06 Lay Lady Lay.m4a
  + beta
  + gama
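The layout in the example tree can also be written down as two small path building functions. This is a sketch of the convention as described in this post, not code taken from mp4pack; the function names and argument order are my own, and the music branch (artist/album) is left out for brevity.

```python
from pathlib import PurePosixPath

def media_path(root, volume, media, container, profile, file_name,
               show=None, season=None):
    """root/volume/media kind/container/profile[/show/season]/file"""
    base = PurePosixPath(root) / volume / media / container / profile
    if media == "tvshow":
        # tv shows get two extra levels: show name and season number
        base = base / show / str(season)
    return base / file_name

def subtitle_path(root, volume, media, profile, language, file_name,
                  show=None, season=None):
    """movies:   .../srt/profile/language/file
       tv shows: .../srt/show/season/profile/language/file"""
    base = PurePosixPath(root) / volume / media / "srt"
    if media == "tvshow":
        base = base / show / str(season)
    return base / profile / language / file_name
```

For example, media_path("pool", "alpha", "movie", "m4v", "appletv", "IMDbtt0033467 Citizen Kane.m4v") gives the location of the appletv encode shown in the tree.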
Utilities
Ok, so mp4pack doesn't actually do all that work on its own. It just calls lots of other tools made by many other people and tells those tools all they need to know, so we need to talk about which tools mp4pack depends on. Most of them you can download precompiled, but I recommend compiling all of them yourself from the latest snapshots if you can. Hopefully I will write about how to do that too at some point, but for now RTFM...
You will need to edit the global variables at the beginning of mp4pack to make sure they point to the paths where you stored all the utilities and config files.
show.thetvdb
A file that maps tv show names to their thetvdb id. I know it's ugly, mp4pack.py will make this better...
mp4pack.movie and mp4pack.tvshow
mp4pack.movie and mp4pack.tvshow implement functionality specific to each media type and are loaded accordingly when running the script. As lame as it sounds, it would have been much more complicated to implement differently in bash, and at that point I already had plans to move to python, so think of it as a temporary solution. It also means you can't operate directly on a directory that has both tvshow and movie type files at the same level (the corresponding driver is loaded by probing the path for either /tvshow/ or /movie/), but since the script is called recursively for each directory, you can just call it on the parent directory and it will work.
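To make the driver selection concrete, here is a small python sketch of the same idea: decide per directory which driver applies by looking for a tvshow or movie component in its path, and visit every directory below a starting point. It is an approximation and not a port of the bash logic; pick_driver and plan are hypothetical names, and matching on whole path components is slightly different from probing for the /tvshow/ or /movie/ substring.

```python
import os

DRIVERS = ("tvshow", "movie")

def pick_driver(path):
    """Return the driver name whose directory appears in the path, or None."""
    parts = path.replace(os.sep, "/").split("/")
    for driver in DRIVERS:
        if driver in parts:
            return driver
    return None

def plan(directory):
    """Yield (subdirectory, driver) for every directory below `directory`
    that belongs to one of the drivers. Directories above the media kind
    level (a volume, for instance) have no driver and are skipped."""
    for current, _dirs, _files in os.walk(directory):
        driver = pick_driver(current)
        if driver is not None:
            yield current, driver
```

Calling plan on a volume directory therefore handles the movie and tvshow subtrees separately, each with its own driver, which is the effect the recursive call has in the bash version.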
mkvmerge and mkvextract
The matroska tools from the mkvtoolnix package. It's not a very short walk to build on your own, mainly because it depends on boost, which takes ages to build even on a mac pro, but you can install it from macports.
HandBrake
The command line version of HandBrake, the super video transcoder with apple friendly settings. You need the CLI version. You should probably not use an official release, because those are quite old. Best is to build it on your own; it's not very difficult, and the build scripts are quite friendly and documented. Second best is the developer snapshots.
Subler
Subler builds on mp4v2 and can manage chapter markers, tags and stream multiplexing, including subtitle tracks, in mp4 files. You need the command line version, SublerCLI, which you can either download or build with xcode (not difficult). Subler is written in Objective-C and only works on OS X. The command line version doesn't depend on Cocoa and could potentially be built on linux. If you know how to do that, drop me a line.
mp4v2
mp4v2 utilities for manipulating mp4 files. Subler is slowly taking over the functionality provided by those and uses the mp4v2 library internally.
mplayer
A command line version of mplayer for probing data about the streams inside files. Building it on OS X makes launching the space shuttle look like an easy task, but I am sure you can find a ready made statically built binary somewhere. Alternatively you can install the dynamic version from macports. I hope to remove this requirement in mp4pack.py.
subclean
A subtitle manipulation script in python, bundled with mp4pack. It was inspired by subcli and will eventually be integrated into mp4pack.py. The config files contain regular expressions for lines to be removed and phrases to be replaced. Those files need to be placed somewhere, and the SUBCLEAN variable needs to reflect their correct location.
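As an illustration of the kind of rules such config files hold, here is a minimal python sketch of regex driven cleaning. It does not reproduce subclean's real config format or rule set; DROP_PATTERNS, REPLACEMENTS and clean_text are hypothetical names, and the rules themselves are invented examples.

```python
import re

# Hypothetical rules: lines matching any of these are removed entirely
DROP_PATTERNS = [
    re.compile(r"^subtitles? by\b", re.IGNORECASE),
    re.compile(r"www\.\S+", re.IGNORECASE),
]

# Hypothetical rules: (pattern, replacement) pairs applied to the lines we keep
REPLACEMENTS = [
    (re.compile(r"\bl\b"), "I"),   # a lone lowercase L is usually a misread "I"
    (re.compile(r"\s{2,}"), " "),  # collapse runs of whitespace
]

def clean_text(text):
    """Clean the text of one subtitle; may return an empty string."""
    kept = []
    for line in text.split("\n"):
        if any(pattern.search(line) for pattern in DROP_PATTERNS):
            continue
        for pattern, replacement in REPLACEMENTS:
            line = pattern.sub(replacement, line)
        line = line.strip()
        if line:
            kept.append(line)
    return "\n".join(kept)
```

A subtitle whose cleaned text comes back empty is exactly the kind of empty subtitle that has to be dropped before the file goes anywhere near an iPod.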
mkvdts2ac3
A script for adding an ac3 sound track transcoded from a DTS sound track. We need this because HandBrake tends to lose sync when transcoding DTS into aac. This will be integrated into mp4pack.py. mkvdts2ac3 requires libdca for DTS decoding and aften for ac3 encoding. Both are quite easy to build.
mp4pack is hosted on github and released under GPL.
