Revised FATE Test Types

Let’s examine the types of tests I am deploying in the next revision of FATE, their specific syntax, and how they will be executed, both locally and remotely. Read through these specs and see if your idea of how to test FFmpeg is already listed. Otherwise, please leave a comment discussing more tests.

This is a long one…

Regular test:
The most basic type of test, this just specifies a command line to be executed through a shell and captures the return code, stdout and stderr for storage into the database. FATE substitutes the variables $FFMPEG_PATH and $SAMPLES_PATH for the paths of the FFmpeg binary and the sample suite, respectively. While Python substitutes these manually at runtime, variables in this form make it simple to copy and paste test specs to a shell prompt, assuming the variables are defined in the shell environment.

If running the test on a local machine through FATE, just substitute the variables and pass the command line to the shell.

If running the test on a remote machine, do the same as for a local machine, but prefix the command with ‘ssh’.
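As a rough sketch, the dispatch logic for a regular test might look something like this in Python (the `run_regular_test` helper and its signature are hypothetical illustrations, not FATE's actual code):

```python
import subprocess

def run_regular_test(spec, ffmpeg_path, samples_path, remote_host=None):
    """Substitute the FATE variables and run the command line, locally or
    via ssh.  Returns (status code, stdout, stderr) for the database."""
    cmd = spec.replace("$FFMPEG_PATH", ffmpeg_path) \
              .replace("$SAMPLES_PATH", samples_path)
    if remote_host:
        # Remote execution: same command, prefixed with ssh.
        argv = ["ssh", remote_host, cmd]
    else:
        # Local execution: hand the command line to a shell.
        argv = ["/bin/sh", "-c", cmd]
    proc = subprocess.run(argv, capture_output=True)
    return proc.returncode, proc.stdout, proc.stderr
```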

Filesize test:
This type of test spec queries the size of a file and stores it in stdout. The syntax for this test spec is ‘{FILESIZE} [path]’, where [path] may also include the variables outlined above. This is useful for tracking the size of a built binary over time. stderr is irrelevant for this test. The status code is set to 0 if the file exists and non-zero if it does not.

This was the first type of meta-test I implemented in FATE because it was easy to do. The first 3 test specs I ever entered in the database track the sizes of the ffmpeg, ffplay, and ffserver binaries. This test spec isn’t actually that useful. I have had to disable the filesize checks for the latter 2 binaries since they aren’t built on all systems. Further, I eventually moved to building shared libraries where feasible (based on platform) instead of monolithic binaries. This makes tracking the main binary size less useful. Tracking the size of individual shared libraries is tougher due to varying library file extensions depending on platform.

This test spec is implemented by using Python’s file system functions to query the size of a local file. Remote execution is not considered for this test spec since binaries are always built on a local machine.
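In Python, the whole test reduces to a file system query; a minimal sketch (the `run_filesize_test` helper is hypothetical) might be:

```python
import os

def run_filesize_test(path, ffmpeg_path, samples_path):
    """Implements '{FILESIZE} [path]': stdout holds the size as a string,
    status code is 0 if the file exists, non-zero otherwise."""
    path = path.replace("$FFMPEG_PATH", ffmpeg_path) \
               .replace("$SAMPLES_PATH", samples_path)
    try:
        size = os.path.getsize(path)
        return 0, str(size), ""
    except OSError:
        # File does not exist (or is unreadable): non-zero status.
        return 1, "", ""
```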

MD5 test:
The syntax for this test spec is ‘{MD5} [regular test spec]’. This will execute [regular test spec] as outlined above, calculate an MD5 hash of the stdout, and replace the stdout with the MD5 hash. The status code and stderr from the executed test are untouched. This type of test spec is useful for testing bit-exactness of PCM output.

Note that while MD5 is considered deprecated for security applications, security (i.e., the possibility of a deliberate hash collision) is not a concern here (though it would probably be a remarkable programming feat). MD5 is just the first hash that came to mind when I was working on this, and these types of tests used to be implemented as regular tests with their stdout piped through the command line ‘md5sum’ command. In the current implementation, Python computes the hash internally which has the added benefit that the checksum calculation does not count against the test’s CPU runtime.

This test functions identically to the regular test spec in terms of local or remote execution. In both cases, the Python script ends up with the stdout data and computes MD5 manually.
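The in-process hashing amounts to little more than a call into `hashlib`; a minimal sketch (the helper name is hypothetical):

```python
import hashlib
import subprocess

def run_md5_test(spec, ffmpeg_path, samples_path):
    """Implements '{MD5} [regular test spec]': run the command, then replace
    its stdout with the MD5 hex digest of that stdout.  The status code and
    stderr pass through untouched."""
    cmd = spec.replace("$FFMPEG_PATH", ffmpeg_path) \
              .replace("$SAMPLES_PATH", samples_path)
    proc = subprocess.run(["/bin/sh", "-c", cmd], capture_output=True)
    digest = hashlib.md5(proc.stdout).hexdigest()
    return proc.returncode, digest, proc.stderr
```

Since the hash runs inside the Python harness rather than in a piped `md5sum` process, it never shows up in the test's measured CPU time.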

However, while researching this post, I eventually realized that calculating MD5 external to the program is somewhat superfluous. When I was first writing test specs, I must have been unaware of the ‘-f crc’ muxer type which does an adequate job and is built into the program. Thus, I might deprecate this test in the next FATE revision and revise the existing MD5 tests to use ‘-f crc’ instead.

Regression suite:
This is a one-trick test spec: it runs the ‘make test’ command. The syntax is simply ‘{MAKETEST}’. It captures the status code and, if the status indicates failure, also captures the last 30 lines from each of the stdout and stderr channels. It also allows the exact syntax of the ‘make test’ command line to be spelled out (e.g., change to some directory and then run ‘/usr/bin/make test’).

This test spec has some shortcomings and will be deprecated in the next revision in favor of the next type of test spec.

Make invocation test:
The previous {MAKETEST} test spec was limited in at least 2 ways: 1) It could only execute ‘make test’ (and not any other test target within the Makefile); 2) A FATE configuration file could only specify one method for running ‘make’ which governed all configurations for that installation. This is at odds with my proposed parallelization refactoring that would allow one configuration to more flexibly manage building and testing for many different configurations on both local and remote machines.

The proposed syntax for the new make invocation test is ‘{MAKE} [target]’. E.g., ‘{MAKE} test’ will do the same as the existing ‘{MAKETEST}’ test spec. Other tests such as ‘{MAKE} checkheaders’ will also be possible. The capture policy will be the same as for the original ‘{MAKETEST}’ command: if the test returns non-zero, capture and store up to 30 lines from both stdout and stderr.

Further, while the configuration file will define one primary command for invoking [GNU] ‘make’, individual configurations will be allowed to override it.
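Putting those pieces together, the ‘{MAKE} [target]’ handling might be sketched as follows (the `DEFAULT_MAKE` configuration item, the override parameter, and the helper name are all hypothetical):

```python
import subprocess

# Hypothetical configuration: one primary make command, which individual
# configurations may override via the make_command parameter.
DEFAULT_MAKE = "make"

def run_make_test(target, make_command=None, tail_lines=30):
    """Implements '{MAKE} [target]': invoke make on the given target; on
    failure, keep only the last `tail_lines` lines of stdout and stderr."""
    make = make_command or DEFAULT_MAKE
    proc = subprocess.run(["/bin/sh", "-c", "%s %s" % (make, target)],
                          capture_output=True, text=True)
    out, err = proc.stdout, proc.stderr
    if proc.returncode != 0:
        out = "\n".join(out.splitlines()[-tail_lines:])
        err = "\n".join(err.splitlines()[-tail_lines:])
    else:
        # Success: only the status code needs to be stored.
        out, err = "", ""
    return proc.returncode, out, err
```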

As for local vs. remote testing, I have it on good authority that FFmpeg’s build system is already quite intelligent in that area (i.e., prefixing ‘ssh’ where appropriate).

MD5 file test:
The proposed syntax for this test spec is ‘{MD5FILE,[path]} [regular test spec]’. This is where test specs begin to get a little complicated, but this will also enable a huge category of new FATE tests. This spec performs [regular test spec] as defined above and then computes an MD5 hash of the file on disk indicated by [path]. This will be used for validating the output of encoding tests. Thus far, FATE has dumped encoded test output to stdout and computed MD5 from that channel. However, that approach is only valid with muxers that never need to seek backwards in the output file, and seeking backwards is actually a fairly common operation during muxing.

All of the usual substitution rules apply, for both the regular test spec and for the path. For testing locally, use Python APIs to open, read, and compute the MD5 hash of the file. Remote testing is a little trickier, but I suppose there are a few ways to accomplish this. The most brute force method I can think of is ‘ssh [remotehost] /bin/cat [path]’ which will transmit the file data via the stdout channel so that Python can compute the MD5 hash. Obviously, this assumes that /bin/cat exists on the remote host. This probably needs to be a configurable item as well.
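A minimal sketch of both paths, assuming the brute-force `ssh`/`cat` approach for the remote case (helper name and the `cat_path` configurable are hypothetical):

```python
import hashlib
import subprocess

def run_md5file_test(path, spec, remote_host=None, cat_path="/bin/cat"):
    """Implements '{MD5FILE,[path]} [regular test spec]': run the test, then
    replace stdout with the MD5 hex digest of the file at [path]."""
    proc = subprocess.run(["/bin/sh", "-c", spec], capture_output=True)
    if remote_host:
        # Brute-force remote read: stream the file back over ssh via cat,
        # then hash it locally.  Assumes cat_path exists on the remote host.
        data = subprocess.run(["ssh", remote_host, cat_path, path],
                              capture_output=True).stdout
    else:
        with open(path, "rb") as f:
            data = f.read()
    return proc.returncode, hashlib.md5(data).hexdigest(), proc.stderr
```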

1-off test:
The proposed syntax for this test spec is ‘{1OFF,[ref file],[precision]} [regular test spec]’. This executes [regular test spec] as described above and collects the output on stdout. It then computes the absolute difference between the stdout and the [ref file], treating the 2 buffers as vectors of integers whose precision is defined by [precision]. [precision] can be 8, 16, 24, or 32. Reference files will always be signed; precision > 8 will always be treated as little endian.

The test is successful (and the status code is 0) if no absolute value in the difference vector is greater than 1. Upon failure, the status code is set to non-zero and the stdout contains a string describing the failure (e.g., the stdout size and the ref file size were different; n / total absolute values differed by more than one).

In both the local and remote instances, the test will send back decoded data via stdout for Python to process.
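The comparison itself might be sketched like this (the helper name is hypothetical; signed little-endian interpretation follows the rules above):

```python
def one_off_compare(stdout_data, ref_data, precision):
    """Core of the '{1OFF,...}' check: treat both buffers as vectors of
    signed little-endian integers of the given bit precision (8, 16, 24,
    or 32) and pass only if no element differs by more than 1."""
    if len(stdout_data) != len(ref_data):
        return 1, "size mismatch: %d vs. %d" % (len(stdout_data), len(ref_data))
    nbytes = precision // 8
    def to_ints(buf):
        return [int.from_bytes(buf[i:i+nbytes], "little", signed=True)
                for i in range(0, len(buf), nbytes)]
    a, b = to_ints(stdout_data), to_ints(ref_data)
    bad = sum(1 for x, y in zip(a, b) if abs(x - y) > 1)
    if bad:
        return 1, "%d / %d absolute values differed by more than one" % (bad, len(a))
    return 0, ""
```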

RMS test:
The proposed syntax for this test spec is ‘{RMS,[ref file],[threshold]} [regular test spec]’. This type of test spec runs [regular test spec], captures the stdout, computes the RMS between the stdout and [ref file], and passes the test if the RMS is below [threshold]. More discussion of this test spec can be found in the blog post Numerical Gymnastics Redux.

As I recall, this test type is at a bit of a standstill. According to my experiments, when the RMS calculation is performed in various integer precision number spaces, default FFmpeg computes passing RMS numbers. When performed in the floating point space, default FFmpeg fails RMS tests. FFmpeg must be modified to output higher precision integers from the MPEG audio decoders for the float-space RMS calculations to pass. So should we perform the less adequate integer-space RMS or try to get a change into FFmpeg that will allow us to test float-space RMS correctly?

For both local and remote instances, as in the 1-off test type, the test will send back decoded data via stdout for Python to process.
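For reference, the integer-space variant of the calculation might be sketched as follows (helper name hypothetical; the same signed little-endian interpretation as in the 1-off test is assumed):

```python
import math

def rms_test(stdout_data, ref_data, threshold, precision=16):
    """Core of the '{RMS,...}' check: compute the RMS of the difference
    between the stdout buffer and the reference file, treating both as
    signed little-endian integer vectors; pass if RMS is below threshold."""
    nbytes = precision // 8
    def to_ints(buf):
        return [int.from_bytes(buf[i:i+nbytes], "little", signed=True)
                for i in range(0, len(buf), nbytes)]
    a, b = to_ints(stdout_data), to_ints(ref_data)
    if len(a) != len(b):
        return 1, "size mismatch"
    rms = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
    return (0 if rms < threshold else 1), "RMS = %f" % rms
```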

Seek test:
I’m a little confused on this one because I still don’t know how the current seek testing works in the default ‘make test’ regression suite. It seems that seek_test is just an extra utility program that is run against files generated early in ‘make test’. Frankly, I think this — and possibly other custom tools — can be accommodated by the test types already outlined, for both local and remote execution.

Extra tools will have to be built (either for local execution or properly cross-compiled, just like ‘ffmpeg’) and transported to the remote machine along with ‘ffmpeg’ in the remote case. The tests can then be executed just as ‘ffmpeg’ is executed in a standard test spec with the status code, stdout, and stderr captured and logged.

Anything else?

10 thoughts on “Revised FATE Test Types”

  1. Reimar

    I don’t think -f crc is such a great thing, crc is only 32 bit of data and the bit patterns that make it miss a change aren’t that complicated.
    I admit it is still very unlikely to miss a regression, but probably a new format based on the existing md5, sha or rc4 functions in libavutil might be better.
    For consistency md5 might be best, even if it is weak from the crypto standpoint.
    The crc muxer still has the disadvantage that it is a bit inconsistent about packets: e.g. if you split an audio packet into two, it will give the same crc, but if you interleave audio and video differently it will give different crcs.

  2. boris

    I’ve found most implementations of SHA1 are on par with MD5 as far as CPU-use goes. Given the cost of using SHA1 is the same as MD5, its superior hashing abilities should be considered.

  3. Multimedia Mike Post author

    @Reimar: Thanks for submitting the new hash format muxer to ffmpeg-devel. It actually makes me wonder if I should think harder about including some of these other test types directly into the program somehow. FATE has a little more clout now than when I first started it and I can make demands like this. :-)

  4. Reimar

    Hm, it might be possible to implement RMS and one-off as muxers, too. You could make the muxer compare stream i against stream i+1.
    The command line would likely be a bit ugly though…

  5. Ramiro

    Will it be possible for the “test” to be passed as a parameter somehow to {MAKE}?

    The mingw32 tests run on cygwin, which in turn calls an msys shell to run make test, so the test command looks something like:

    MAKE_TEST_COMMAND = "/msys/bin/sh.exe --login -c \"cd $BUILD_PATH && /usr/bin/make test\""

    “test” needs to be inside the quotes.

  6. Multimedia Mike Post author

    @Ramiro: Interesting. If you were to run another make target (like ‘make checkheaders’), would all the cruft still be the same? If so, I don’t think this will pose a problem.

    Eventually, I hope to do away with ‘make test’ in FATE. However, I still want to run other make-related tests like ‘make checkheaders’.

  7. Michael Kostylev

    @Ramiro: If I understand correctly you’re asking about something like this
    MAKE_COMMAND = "/msys/bin/sh.exe --login -c '/usr/bin/make -C %s $@' make" % BUILD_PATH

  8. Ramiro

    @Mike: yes for make checkheaders I’d still have to cwd from within msys’ sh.exe (or use -C like michael suggested).

    @Reimar: I don’t think so, the thing is that msys is getting lost in its cwd when sh.exe starts. I haven’t tested if make will work alright from within cygwin’s shell.

    @Michael: would $@ be “test” and such? If so I think that would be ok.

  9. Michael Kostylev

    @Ramiro: I hope so. At least you can test whether
    MAKE_TEST_COMMAND = "%s test" % MAKE_COMMAND
    works ok.
