I’ve been feeling a bit scattered for the last week since I was fired from my volunteer position as the FFmpeg QA manager, wondering if there is anything I should attempt to do with the project. It can’t be denied that the new system is working well. But one area I’ve wondered about is test coverage.
Under my old regime, I tracked test coverage on a wiki page, which was a highly flawed method: tedious and error-prone. There are those two adjectives again, tedious and error-prone; whenever I see them, I start looking for ways to automate. I think automation is more plausible now thanks to the new FATE's tighter integration with the FFmpeg build system.
I don’t think anyone is working on this problem so I wanted to toss out a brainstorm:
- First, run ‘ffmpeg -formats’, ‘ffmpeg -codecs’, etc. and parse the output to collect a list of all the features (full list: -formats, -codecs, -bsfs, -protocols, -filters, -pix_fmts). Transform these lists into a standardized list of features, e.g.,
"DEVSD ffvhuff Huffyuv FFmpeg variant"
represents features 'decode-video-ffvhuff', 'encode-video-ffvhuff', 'ffvhuff-horizband', and 'ffvhuff-dr1'.
- Next, tag each individual test spec with the features that it exercises. E.g., test 'fate-vqa-cc' exercises features 'demux-wsvqa', 'decode-video-vqavideo', and 'decode-audio-adpcm_ima_ws'.
- Finally, compare the data from parts 1 and 2. Print a list of all the features that are not exercised in FATE (a rough sketch of the whole idea follows below).
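To make this a bit more concrete, here is a rough Python sketch of steps 1-3. It only handles the 'ffvhuff'-style codec flags and the 'fate-vqa-cc' tags from the examples above; the exact '-codecs' output layout varies between FFmpeg versions, and the per-test tags would really have to live in the test specs rather than in a hardcoded table:

```python
#!/usr/bin/env python3
# Rough sketch only: the flag layout is taken from the 'DEVSD ffvhuff' example
# above and will vary between FFmpeg versions; TEST_FEATURES is placeholder
# data standing in for real per-test tags.
import re
import subprocess

def codec_features(ffmpeg="./ffmpeg"):
    """Turn 'ffmpeg -codecs' output into standardized feature names."""
    out = subprocess.check_output([ffmpeg, "-codecs"], universal_newlines=True)
    features = set()
    for line in out.splitlines():
        # e.g. " DEVSD ffvhuff    Huffyuv FFmpeg variant"
        # flags: decode, encode, type (V/A/S), draw_horiz_band, direct rendering
        m = re.match(r"^ ([D ])([E ])([VAS])([S ])([D ])\s+(\S+)", line)
        if not m:
            continue
        dec, enc, kind, horiz, dr1, name = m.groups()
        kind = {"V": "video", "A": "audio", "S": "subtitle"}[kind]
        if dec == "D":
            features.add("decode-%s-%s" % (kind, name))
        if enc == "E":
            features.add("encode-%s-%s" % (kind, name))
        if horiz == "S":
            features.add("%s-horizband" % name)
        if dr1 == "D":
            features.add("%s-dr1" % name)
    return features

# Step 2 would tag each test spec; hypothetical example from the post:
TEST_FEATURES = {
    "fate-vqa-cc": ["demux-wsvqa", "decode-video-vqavideo",
                    "decode-audio-adpcm_ima_ws"],
}

if __name__ == "__main__":
    exercised = set(f for feats in TEST_FEATURES.values() for f in feats)
    for feature in sorted(codec_features() - exercised):
        print(feature)   # step 3: features no FATE test exercises
```

The same parsing idea would extend to -formats, -bsfs, -protocols, -filters, and -pix_fmts; the interesting output is really just the set difference at the end.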
I think a lot of this could be implemented at the GNU make level. Then again, I’m no expert on GNU make syntax so I may be overestimating its capabilities. Or there might be simpler ways to automatically track test coverage stats based on the improved testing infrastructure.
Have you had a look at http://www.testcocoon.org? The idea is to measure which source code lines were actually executed in each source file during testing. It would be useful for finding coding methods that are never exercised, even though we do test that particular codec.
Isn't the --coverage gcc option, with gcov for post-processing, a way to do test coverage?
Post-processing to map source files to codecs/formats, and considering a feature covered when its files reach, say, > 50% gcov line coverage, should be possible with reasonable effort.
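A minimal sketch of that post-processing, assuming the tree was configured with --coverage, a FATE run has already produced the .gcda data next to the objects, and that a file name like vqavideo.c can crudely stand in for the 'vqavideo' codec:

```python
#!/usr/bin/env python3
# Sketch of the ">50%" idea: run gcov over libavcodec and flag files whose
# line coverage falls below a threshold.  Assumes a --coverage build and a
# completed test run; the file-to-codec mapping is only a rough guess.
import glob
import re
import subprocess

THRESHOLD = 50.0

def file_coverage(source):
    """Return the 'Lines executed' percentage gcov reports for one file."""
    out = subprocess.check_output(["gcov", "-n", source],
                                  universal_newlines=True)
    m = re.search(r"Lines executed:\s*([\d.]+)% of \d+", out)
    return float(m.group(1)) if m else 0.0

for source in sorted(glob.glob("libavcodec/*.c")):
    pct = file_coverage(source)
    if pct < THRESHOLD:
        # crude file -> codec mapping: vqavideo.c stands for 'vqavideo'
        codec = source.rsplit("/", 1)[-1][:-2]
        print("%-20s %5.1f%% covered" % (codec, pct))
```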
Agreed, we do need to track total code coverage. This reminds me that I have some sub-projects queued up to that effect. Thanks for the reminder, and thanks for the tip about TestCocoon; I'll investigate it in addition to gcov, which is what I had been working with.
Btw. one of the things I have been thinking about but never had the time/motivation to work on was a kind of “bug hotspot” statistic.
E.g. fuzzing files and marking where crashes happened, or tracking commits that fixed a security issue (or even ordinary bugs) and which lines they changed, and thus coming up with statistics like "code related to task A had a lot of issues; maybe we can think of a more reliable way to handle this?"
I think most developers do this kind of thing when deciding to rewrite or rearchitect some code, but I wonder how much help an automated system could provide there…
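Something like this toy script is roughly what I have in mind for the commit-tracking half (assuming a git copy of the tree; the --grep pattern for what counts as a "fix" commit is only my guess):

```python
#!/usr/bin/env python3
# Toy version of the "bug hotspot" idea: count how often each file is
# touched by commits whose log message looks like a crash/security fix.
# The grep pattern and the notion of a "fix commit" are assumptions.
import collections
import subprocess

log = subprocess.check_output(
    ["git", "log", "--all", "-i", "-E",
     "--grep=security|overflow|crash|CVE",
     "--name-only", "--pretty=format:"],
    universal_newlines=True)

hotspots = collections.Counter(line for line in log.splitlines() if line)

for path, hits in hotspots.most_common(20):
    print("%4d  %s" % (hits, path))
```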
@Reimar: This makes me think of Mike's project of testing the Picsearch files that once upon a time crashed ffmpeg. I wonder how well the latest SVN would do… Maybe Mike is still the QA manager; it is just that one project was spun off from his department ;)
@Vitor: Thanks for reminding me that I don’t necessarily have to plunge into a downward spiral of alcoholism and drug addiction after the transfer of FATE power. There’s still plenty of work left to do for testing FFmpeg. I can always start by combing back through all of my blog posts tagged with FATE and collect all of my brainstorms that I would like to do just as soon as FATE has a better infrastructure… which it now has.