Last December, I set about the task of downloading and testing a huge number of files that were known, at one point, to crash FFmpeg. I devised a system for automatically running the files and determining whether they still crash FFmpeg. Quite a few of them did. Then, I sort of let the project sit.
I finally got around to running a new round of tests with the utility I created in December and compared the results with those from 4 months ago. Today’s test was conducted with FFmpeg SVN-r18707, a 32-bit build compiled with gcc 4.0.1 (Apple Inc. build 5484), run on Mac OS X.
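For context, the categories in the table below are derived from how each ffmpeg process terminates. Here is a minimal sketch of that kind of classification; the ffmpeg command line and helper name are illustrative, not the actual utility:

```python
import signal
import subprocess

def classify(ffmpeg_bin, sample, timeout=60):
    """Run one sample through ffmpeg and report how the process ended."""
    try:
        proc = subprocess.run(
            [ffmpeg_bin, "-i", sample, "-f", "null", "-"],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
            timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child when the timeout expires
        return "SIGKILL (timeout)"
    rc = proc.returncode
    if rc == 0:
        return "Success"
    if rc > 0:
        return "FFmpeg error"  # clean rejection, e.g. return code 1
    # on POSIX, a negative return code is the number of the fatal signal
    return signal.Signals(-rc).name
```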
| Result | December 8, 2008 | April 27, 2009 |
|---|---|---|
| Success | 2148 | 2781 |
| FFmpeg error | 1333 | 1389 |
| SIGABRT | 6 | 6 |
| SIGFPE | 376 | 1 |
| SIGKILL (timeouts) | 16 | 17 |
| SIGBUS | 7 | 97 |
| SIGSEGV | 529 | 123 |
Great progress, especially on those floating point exceptions. I’m pretty sure nearly all of those were attributable to one or a few problems in the Real demuxer that have since been addressed. The only remaining problem in the FPE category is an AVI file.
The timeout category represents the number of files that ran longer than a minute (the harness needs to keep the process moving). The “FFmpeg error” category (return code 1) is on the rise. I surmise that’s because FFmpeg is getting better at cleanly rejecting errant files rather than crashing on them. I should really formulate a query that reveals which files’ status changed, and how, between runs.
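Something like the following would do it, assuming the results were stored in an SQLite database with one row per (run, file); the schema and filenames here are hypothetical:

```python
import sqlite3

# Hypothetical schema: results(run_date TEXT, file TEXT, status TEXT)
QUERY = """
    SELECT old.file, old.status, new.status
      FROM results AS old
      JOIN results AS new ON new.file = old.file
     WHERE old.run_date = '2008-12-08'
       AND new.run_date = '2009-04-27'
       AND new.status <> old.status
"""

conn = sqlite3.connect("crash-tests.db")  # hypothetical database file
for path, before, after in conn.execute(QUERY):
    print("%s: %s -> %s" % (path, before, after))
```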
A big reason I sat on this project for so long is that I didn’t know how to proceed. Should I start testing the problem files manually, collecting stack traces, and flooding the FFmpeg issue tracker with hundreds of new reports? I don’t want to deal with that kind of manual labor, and I don’t think my co-devs want to deal with that volume of (possibly redundant) bug traffic.
Since December, I have developed another idea: automatically running the problem files through gdb and looking for patterns. For example, I manually checked the 6 crashers that threw SIGABRT (the same 6 files in each run, BTW, and all ASF files). They all seem to fail as follows:
```
Program received signal SIGABRT, Aborted.
0x96dbbe42 in __kill ()
(gdb) bt
#0  0x96dbbe42 in __kill ()
#1  0x96dbbe34 in kill$UNIX2003 ()
#2  0x96e2e23a in raise ()
#3  0x96e3a679 in abort ()
#4  0x96e2f3db in __assert_rtn ()
#5  0x00026529 in ff_asf_parse_packet (s=0x1002600, pb=0xa00200, pkt=0xbfffe954)
    at /Users/melanson/ffmpeg/ffmpeg-main/libavformat/asfdec.c:709
```
It would be nice to create a script that identifies that all 6 of those files suffer from the same, or a similar, problem and groups them together in a report. I am not sure if gdb offers non-interactive options that are conducive to this situation. I know it has a -batch mode, but I’m not really sure what that’s for. If need be, I can always create a Python script that opens gdb in interactive mode and has a stdin/stdout conversation with it.
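As a first experiment, something along these lines might work. The -batch and -ex flags are real gdb options (-ex requires a reasonably recent GNU gdb; with older builds the same commands can go in a file passed via -x), and the ffmpeg arguments are just illustrative:

```python
import subprocess

def gdb_backtrace(ffmpeg_bin, sample):
    """Run one sample under gdb non-interactively and capture the backtrace."""
    cmd = ["gdb", "-batch",
           "-ex", "run -i %s -f null -" % sample,  # run ffmpeg on the sample
           "-ex", "bt",                            # then dump the stack
           ffmpeg_bin]
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT)
    return proc.stdout.decode("utf-8", "replace")
```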
Comments:
Once you get through with playback, how about testing encode and streamcopy/audio extraction? Just in normal use I’ve found tons of files that immediately break with timestamp errors.
Maybe it would be a good idea to also test these files under valgrind. Linux and Mac OS are particularly forgiving about reads a few bytes beyond the end of allocated memory, so it might well be that some of the files that didn’t crash on Linux would crash on Cygwin.
Another advantage is that valgrind can distinguish invalid reads from invalid writes (security holes).
-Vitor
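@Vitor: A minimal sketch of what such a valgrind pass might look like (the decode-to-null command line and output parsing are illustrative):

```python
import re
import subprocess

def valgrind_kinds(ffmpeg_bin, sample):
    """Decode one sample under memcheck and collect the access kinds seen."""
    cmd = ["valgrind", "--error-exitcode=99",
           ffmpeg_bin, "-i", sample, "-f", "null", "-"]
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                          stderr=subprocess.PIPE)
    report = proc.stderr.decode("utf-8", "replace")
    # memcheck prints lines such as "Invalid read of size 4" or
    # "Invalid write of size 4"; the writes are the likely security holes
    return sorted(set(re.findall(r"Invalid (read|write) of size", report)))
```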
There is a tool called breakpad that can collect these kinds of things.
http://code.google.com/p/google-breakpad/
And you are right: we need to group the errors to be able to handle the load.
I always wanted some kind of Bayesian filter able to find dupes among build logs or backtraces, but I haven’t had much luck yet. The main problem arises when you don’t have a fixed release binary, so you get different addresses in the traces…
But anyway, the first step is to remove the part of the trace that gets in the way; in this case frames 0-5 are useless. If you run the files through gdb (rather than using post-mortem core analysis), you should probably break on abort() to avoid adding more frames.
@Benjamin: While that breakpad tool sounds useful in theory, its precise operation seems to be none of our business. I can’t find any useful information about how to apply the tool (though I can download and compile it without incident).
@Flameeyes: At the most basic level, I was planning to strip out the addresses and rely only on symbol names. This would also entail stripping out function parameters, since those will vary between runs.
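Something like this is what I have in mind; a sketch that matches the gdb frame format shown above:

```python
import re

# matches gdb frames such as:
#   #5  0x00026529 in ff_asf_parse_packet (s=0x1002600, ...) at .../asfdec.c:709
# and keeps only the symbol name, discarding addresses and parameters
FRAME = re.compile(r"#\d+\s+(?:0x[0-9a-f]+\s+in\s+)?(\S+)\s*\(")

def signature(backtrace_text):
    """Reduce a backtrace to a tuple of symbol names for grouping."""
    return tuple(FRAME.match(line.strip()).group(1)
                 for line in backtrace_text.splitlines()
                 if FRAME.match(line.strip()))

# two crashers with equal signatures can then be grouped into one report
```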
@Mike: They have this example, but I’m not sure it is easy to use.
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/testdata/test_app.cc
It’s something, even if it is C++. Thanks for the pointer. I’ll try to work with that.
Maybe the --command= option to GDB would be of use for running commands and creating log files of the output.