When the FATE initiative went live, I asked the guru how to handle the H.264 conformance suite samples: should I just dump all of them into the database whether they were working or not, or should I only enter the samples that worked with the current version of FFmpeg? His answer was far more complicated than I could have anticipated:
- Enter all currently working samples
- If a particular H.264 conformance vector used to work with FFmpeg, add the sample and enter a new issue in the tracker
- Otherwise, don’t add the test yet
Whoa. As you know, I got task #1 accomplished relatively easily. Now I’m back to take on task #2.
Hypothesis: most of the code that can make or break the H.264 decoding process lives in files named libavcodec/h264*. Thus, test the sample suite against every single FFmpeg revision in which one of those files was touched.
for file in libavcodec/h264*; do svn log "$file"; done | grep -E '^r[0-9]+ \|.*lines?$' | sed -e 's/^r\([0-9]*\).*$/\1/' | sort -n -u
That produces just over 400 different FFmpeg revisions that need testing. I had better get started early.
Algorithm outline:
- create a script that takes the above revision list and the directory full of H.264 conformance vectors
- create a list of standard test names based on the convention already in the database
- query the database to obtain a complete list of all tests known to work currently
- remove the working tests from the list of all tests
- for each revision:
  - get the FFmpeg code corresponding to that revision
  - build FFmpeg, and use ccache to hopefully gain a little speedup in the process
  - test FFmpeg against all of the non-working samples, output results in a CSV format: “revision, 0, 0, 0, 1, 0, 0, 0, 0, 0,…”; this should facilitate analysis and serve to illustrate that the non-working samples have been broken from the get-go
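The outline above could be sketched roughly as follows in Python. This is only an illustration under several assumptions, not the actual tool: the function names are hypothetical, the build step assumes a Subversion checkout configured with `--cc="ccache gcc"`, and a sample counts as "working" if `ffmpeg` merely exits with status 0, which is a much weaker check than comparing the decoded output against the conformance suite's reference frames. The database query is left out; the list of working tests is simply passed in.

```python
import csv
import subprocess
import sys
from pathlib import Path


def non_working_tests(all_tests, working_tests):
    """Return the tests not known to work, preserving the order of all_tests."""
    working = set(working_tests)
    return [name for name in all_tests if name not in working]


def csv_row(revision, results):
    """Build one CSV row: the revision, then 1 (pass) or 0 (fail) per sample."""
    return [revision] + [1 if ok else 0 for ok in results]


def build_revision(checkout_dir, revision):
    """Update the checkout to `revision` and rebuild; True if every step succeeds.

    Assumption: gcc is the compiler and ccache is installed.
    """
    steps = [
        ["svn", "update", "-r", str(revision)],
        ["./configure", "--cc=ccache gcc"],
        ["make"],
    ]
    for cmd in steps:
        if subprocess.run(cmd, cwd=checkout_dir).returncode != 0:
            return False
    return True


def sample_decodes(checkout_dir, sample):
    """Assumed pass criterion: the freshly built ffmpeg exits with status 0."""
    cmd = ["./ffmpeg", "-i", str(Path(sample).resolve()), "-f", "null", "-"]
    proc = subprocess.run(
        cmd,
        cwd=checkout_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode == 0


def run_survey(checkout_dir, revisions, samples, out=sys.stdout,
               build=build_revision, check=sample_decodes):
    """Write a header line, then one CSV line per revision."""
    writer = csv.writer(out)
    writer.writerow(["revision"] + [Path(s).name for s in samples])
    for revision in revisions:
        if build(checkout_dir, revision):
            results = [check(checkout_dir, s) for s in samples]
        else:
            # A revision that fails to build is recorded as all zeros,
            # which is indistinguishable from "every sample failed".
            results = [False] * len(samples)
        writer.writerow(csv_row(revision, results))
```

The `build` and `check` parameters exist so the loop can be tried out with stand-in functions before committing hours of machine time to 400-odd real builds.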
Hey, computing cycles are cheap, right? Perhaps the same ambitious strategy can be employed as a one-time brute force method to learn when other FFmpeg components broke so that they can be fixed and subsequently tested via FATE. And there’s no reason I have to do this on my own; I know certain FFmpeg developers who like to brag about their cumulative 27 or so underworked CPU cores lying around their flats (you devs know who you are).
i got a monkey wrench for you
how about only testing the known ‘not working’ samples once every 10? 50? 100? commits… or maybe after each h264.c commit?
bwahaha
oops, posted too soon, ignore my whole post haha.
If you mean on a forward-going basis, I actually am doing that. The tool that I developed to dump the initial working samples into the database can be re-run at any time. In fact, I added 3 newly fixed tests to the database a little more than a week ago.