Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


Meta:

Pictures of FATE Machines

March 8th, 2010 by Multimedia Mike

Apologies for not properly crediting ideas here: Someone once suggested that it would be useful to have a MultimediaWiki page that collected information about all of the various FATE machines, rather than keeping it in the central FATE database to be displayed in a very inept fashion through the main website. Further, someone (possibly the same person) thought it would be neat to have pictures of all the machines performing FATE duty. I have done both on the new FATE Machines page. Eventually, FATE will simply link over to that wiki page rather than its own internal page.



The machine shown above is pretty much the hardest working computer on the FATE farm. It sits on the floor of my living room and constantly churns away, rebuilding and testing FFmpeg for 22 different compiler configurations.

Other FATE machine administrators are welcome to edit their machine’s descriptions and upload pictures (provided they have physical access to their machines; I’m really not sure about the arrangements that some of you have).

Posted in FATE Server | No Comments »

350 Tests

February 24th, 2010 by Multimedia Mike

Another milestone of sorts — an even 350 active FATE tests. Thanks to Vitor for figuring out what was wrong with my ea-tgq test. It seems that I was being overzealous with my application of the ‘-idct simple’ option. While normally standard testing procedure for DCT-type codecs, the simple IDCT option made this test overflow, according to Vitor.

I’m really starting to run out of FATE tests to add before I’m forced to stop putting off the fundamental upgrades that would allow me to test the remaining stuff (mostly encoders, muxers, and bit-inexact audio).

I learned something else related to FATE: Don’t mount a suite of FATE samples over wireless if such an arrangement can be avoided. I was able to save around 4 minutes per test cycle on my Mac Mini by not mounting the share with 300 MB of FATE test samples via wireless-G, but instead rsyncing locally. Thus, the Mac Mini, which only has to worry about 2 configurations, tends to be the most frequent builder.

Eating my own rsync repository has the benefit of allowing me to properly test that samples are staged before I activate them, which has bitten us repeatedly.

Posted in FATE Server | 3 Comments »

Bink Video in FFmpeg

February 21st, 2010 by Multimedia Mike

Today was the day: Kostya committed his Bink video decoder to FFmpeg. Here’s just one little screenshot:


Screenshot of the attract mode Bink video from Indiana Jones and the Emperor's Tomb

Of course, this is just one Bink file out of the literal thousands of software titles that have incorporated Bink video (the above comes from Indiana Jones and the Emperor’s Tomb for Windows). For this reason, it’s entirely possible that the Bink video decoder (not to mention the Bink audio decoder and the Bink file format demuxer) might not cover all the cases out there. This is especially relevant considering intel I have received from a guy who has talked to the guy who invented Bink and described the development process. The upshot is that there could conceivably be a lot of custom Bink versions out there. That’s why Kostya hopes for a lot of testing with as many different Bink files that people can throw at this system. To that end, I started with my old Multimedia Exploration Journal and did a text search for every game that I recorded as using Bink.

Just think: The next time that YouTube and assorted other video uploading services update their video conversion backends, they can finally be flooded with Bink videos. (I know it seems silly, but I sometimes feel like my biggest contribution to open source multimedia has been to allow people to upload to YouTube video files that they found on their old Sega Saturn CD-ROMs).

As for FATE, is it plausible to get a basic decoding test staged at this point? I ran a simple sample through my RPC testing tool and learned that the video output is bit exact across platforms. Test staged.

(Aside: Thanks to Vitor Sessak, Valgrinder extraordinaire, for locating a memory bug in the Musepack v7 demuxer. Since I created and staged a v7 sample at the same time I staged a sample for the Musepack v8 demuxer, I have already activated a Musepack v7 demuxing test.)

Here’s a project for someone that likes text processing and searching puzzles: Find a simple, efficient method for comparing my list of DOS/Windows games (here’s the HTML list and here it is in CSV) against the big list of known Bink titles and find all the Bink games in my PC game collection. I have already harvested samples from: Alien vs. Predator Gold Edition, Disney’s Atlantis, Gabriel Knight 3, Gods & Generals, Halo 3 (Xbox 360), In Cold Blood, Indiana Jones and the Emperor’s Tomb, Monsters Inc. Wreck Room Arcade, Starlancer, Tony Hawk Pro Skater 2, Uru: Ages Beyond Myst.

Posted in FATE Server, Game Hacking, Open Source Multimedia | 15 Comments »

More on Adjunct Profiling

February 20th, 2010 by Multimedia Mike

People offered a lot of constructive advice about my recent systematic profiling idea. As in many engineering situations, there’s a strong desire to get things correct at the start while at the same time, some hard decisions need to be made or else the idea will never get off the ground.

Code Coverage
A hot topic in the comments of the last post dealt with my selection of samples for the profiling project. It seems that the Big Buck Bunny encodes use a very sparse selection of features, at least when it comes to the H.264 files. The consensus seems to be that, to do this profiling project “right”, I should select samples that exercise as many decoder features as possible.

I’m not entirely sure I agree with this position. Code coverage is certainly an important part of testing that should receive even more consideration as FATE expands its general test coverage. But for the sake of argument, how would I go about encoding samples for maximum H.264 code coverage, or at least samples that exercise a wider set of features than the much-derided Apple encoder is known to support?

At least this experiment has introduced me to the concept of code coverage tools. Right now I’m trying to figure out how to make the GNU code coverage (gcov) tool work. It’s a bumpy ride.

Memory Usage
I think this project would also be a good opportunity to profile memory usage as well as CPU usage. Obvious question: How to do that? I see that on Linux, /proc/<pid>/status contains a field called VmPeak which is supposed to advertise the maximum amount of memory that the process has allocated. This might be useful if I can keep the process from dying after it has completed so that the parent process can read its status file one last time. Otherwise, I suppose the parent script can periodically poll the file and track the largest value seen. Since this is testing long running processes and I think that, ideally, a lot of necessary memory will be allocated up front, this approach might work. However, if my early FATE memories are correct, the child process is likely to hang around as a zombie until the final status poll(). Thus, check the status file before the poll.

Unless someone has a better idea.

Posted in FATE Server | 10 Comments »

Another Round of Samples and Tests

February 18th, 2010 by Multimedia Mike

Thanks to all of the advice in the comments of the last post about filling in gaps in the FATE test coverage, I have staged 11 new FATE test specs:

Regarding that group of 6 Sun raster files, it’s interesting to note that the 24-bit raw Sun raster file sample is smaller than the 24-bit RLE version.

I encountered a few problems with the suggestions from the last post. Among them:

  • ami_stuff came through with a sample of a Fibonacci-encoded 8svx file. Unfortunately, it’s attached to a bug report because it’s not presently working. Test not staged.
  • I downloaded the free Command & Conquer games from EA and looked into Tiberian Sun specifically. It looks like all the game resources are wrapped up into .MIX files. Not a big deal– I wrote a program years ago to take these apart. Unfortunately, the files are in a different MIX file format, apparently. So I’m still stuck on trying to get the audio samples I need.
  • Carl Eugen pointed to a sample of Blu-Ray PCM (mpegts+h264+++trunc_read_packet_loop.m2ts). Thing is, the file has 9 streams; the pcm_bluray stream is right in the middle. I still don’t know how to tell FFmpeg to select that stream.
  • On the subject of files that have more than 1 audio and 1 video stream, most of these samples with subtitles have the same problem as encountered with the last item– I don’t know how to tell FFmpeg to process the subtitle stream. In fact, the sample from the previous item also has 4 pgssub streams. How do I select one? And will I be able to cleanly mux a subtitle stream into the framecrc format?

I think FFmpeg’s -map option may hold the key. But I’m a little too tired and annoyed to read the source code which I’m certain is the only true documentation for how it works.

Posted in FATE Server | 4 Comments »

Call for Samples

February 14th, 2010 by Multimedia Mike

In my last post regarding recently-staged FATE tests, a number of Amiga sentimentalists expressed willingness to help me track down multimedia formats that were prevalent on that platform. To that end, I ask: Where do I find Fibonacci-encoded 8svx files? 8svx files can contain several audio codecs, but I have been unable to find ones with the Fibonacci format.

While we’re on the subject, I may as well put out a general call for samples that have eluded me:

  • Fibonacci-encoded 8svx samples, as mentioned above
  • ASS/SSA samples; plus, is there any good way to test ASS/SSA subtitles using ‘ffmpeg’?
  • ADTS AAC: How do I generate that? I thought faac was supposed to help me with that but I couldn’t seem to get ADTS out of it.
  • raw Ingenient MJPEG
  • I know how to generate MPC (vs. MPC8 files, which I have already covered); the demuxer just doesn’t seem to work correctly right now.
  • There are a number of formats like NC camera feed format, rtsp, and sdp that I suspect are impossible to test from disk rather than network.
  • TXD: I think this is a raw format and that I have to supply parameters from the command line to decode it properly. I think these are valid TXD files but I don’t know their resolution (or, indeed, if they’re single images since TXD is supposed to be a texture dictionary).
  • pcm_bluray and pcm_dvd: any VOBs in the archive with these data types?
  • pcm_s16le_planar: Based on my code excavation, this is used in certain EA chunked formats, such as in NBA Live 2003 according to our wiki page on EA formats. We lack samples in the archive for that game. However, this reminds me that I really should modify the FILM/CPK demuxer so that it outputs planar audio instead of interleaving the audio in the demuxer (maybe someone else wants to get on top of that, if they’re looking for an easy task).
  • pgssub, xsub: again, where are samples and how do I test subtitle formats?
  • Sunplus JPEG (SP5X)
  • Sun Rasterfile image
  • Westwood Audio (SND1)

There are plenty of formats not covered yet according to the FATE test coverage page. For formats which have both an encoder and decoder in FFmpeg, I plan to have a better system in place in the next FATE version for testing those (which will also obviate the need for the {MAKETEST} test spec). Then there are the non-bitexact formats that require more advanced testing features which are in development.

Meanwhile, I learned that MPEG-4 ALS actually does have a formal conformance suite available (you can usually count on that for MPEG standards; take that, Xiph). So I will be disabling the current ad-hoc test spec and have staged 6 of the conformance vectors known to be correct (based on features that have been implemented thus far): 00, 01, 02, 03, 04, and 05. Further, 2 more new specs: iff-byterun1 and frwu are ready to go.

Posted in FATE Server | 17 Comments »

Indeo 5 and Partial Bink in FFmpeg

February 11th, 2010 by Multimedia Mike

There have been some great additions to FFmpeg in recent weeks. Most notable is an Indeo 5 video decoder. Congratulations to everyone who worked hard to reverse engineer this codec that was used in quite a few video games. The sample I selected for a FATE test spec is called Educ_Movie_DeadlyForce.avi:


SWAT 3: Deadly force Indeo 5 video

The video is much funnier in its original context (though it’s no longer posted there). Thankfully, the math behind Indeo 5 is bit exact which allows me to enter a test spec right away.

While Indeo 5 was used in quite a few PC games through the years, no game-related format can touch Bink. FFmpeg now includes a Bink file demuxer. Further, FFmpeg now has decoders for both variations of Bink audio (designated DCT and RDFT), which can also occur in Smacker files.

So I added new FATE test specs to cover those new additions. I also went through the FATE test coverage wiki page and eliminated a bunch of low-hanging fruit. Sometimes, there were samples (some difficult to find) at the samples archive; other times, it was necessary to do a Google search for “filetype:<file extension>”. To give you an idea of the current trends in the shifting sands of the internet, such searches invariably seem to yield Facebook pages as their top hits.

These are the new FATE tests:

Michael has been at work fixing more formal H.264 conformance vectors. 2 new tests that reflect this work are h264-conformance-frext-frext_mmco4_sony_b and h264-conformance-frext-frext2_panasonic_b. Further, I am in the process of amending the ea-mad (now ea-mad-adpcm-ea-r1) test to use a sample that has EA R1 ADPCM in addition to EA Madcow video. The new sample is staged and I will update the spec to reflect that new sample when I activate the new specs.

Regarding the iff-ilbm test, I could only find one sample on the internet for that format. It’s a bit weird:


lms-matriks

It came from a demoscene archive. I wonder if this immortalized test vector is self-deprecating humor of one’s own demo group or slander of a rival demo group?

Posted in FATE Server | 17 Comments »

Security Memory

February 4th, 2010 by Multimedia Mike

I dug up this old security alert. It’s very dear to me in that I’m directly responsible for the security problem outlined. Whenever I feel like my work doesn’t matter, I just have to remind myself that I have written code that has become widespread enough that it warrants security notices. Many programmers likely go their whole career without making that kind of impact. (That kind of positive spin might be similar to not knowing or caring about the difference between positive and negative attention.)

For the curious, I wrote an AIFF demuxer (among many others) for the xine project. For some reason, I allocated a static buffer of 100 bytes on the stack and proceeded to read a number of bytes from user input, a number that was also determined by the same user input. Big no-no, and I really don’t know what I was thinking; hardcoded, arbitrary constants (char buffer[100]) aren’t usually my style. After that was found, I audited the rest of my demuxers for similar mistakes and found none. It may seem like this would only be a problem if a user directly loaded a malicious file into xine. However, since AIFF has a MIME type, and because there was a Mozilla plugin version of xine, it would have been possible to send a malicious AIFF through a web page.

The reason I was reflecting on this was due to a major security problem I found in FATE recently as I was investigating another problem. It has to do with the data logging script that receives FFmpeg build and test information from FATE clients. I’ll let my commit message to my private git repository tell the tale:

    Get rid of mind-boggling security hazard that actually prints out the
    user's actual hash key when passed an invalid hash. This was obviously
    added for debugging purposes and was only triggered if a user had access
    to insert data for a particular configuration.

If an attacker knew a valid username, the system would cheerfully reveal the corresponding hash key if the HMAC failed. Using this vector, an attacker could have polluted the FATE database with loads of bad data. Not a huge deal in the grand scheme of things. But given that this is the only attack that the system is trying to guard against, a total failure in context.

Honestly, sometimes I can’t believe people let me anywhere near a programming environment.

One last — and fascinating — note about that AIFF exploit: It was the result of an infamous university course (perhaps this one?) given by D. J. Bernstein in which students were required to find 10 security holes in open source software programs during the term. Reportedly, all of the students failed the class since none actually found 10 holes. I don’t know if the class was ever held again.

Posted in FATE Server, Programming | 1 Comment »

What’s So Hard About 0xA9?

February 3rd, 2010 by Multimedia Mike

No matter how much I think I know about about character encoding or trying to work around issues arising from the same, I’ll always get bitten.

For a long time, one particular FATE configuration has shown 87/127 tests succeeding, even though the total number of test specs has crept up over to 300. I investigated this errant configuration on the client side and concluded that it was, in fact, executing all of the tests and sending all the results over the server in one neat package. Apparently, the problem was on the server side. Since it was an older Intel C compiler configuration, I didn’t care about investigating much further.

At one point, some bad bit of code was checked into FFmpeg and all of the results started showing xy/127 tests succeeded. This made the issue a bit more pressing. Mans discovered that the problem had to do with the svq3 test spec failing. The bad code affecting the SVQ3 test was quickly fixed so I didn’t worry about it again until yesterday when, once again, FATE’s various configs were only reporting that 127 tests had been run.

Here’s what was happening: FATE stores the stderr output of a test only if the test spec fails. This is a key data point since everything is fine when the test is successful and FATE tosses the output. The sample used for the SVQ3 test outputs the following metadata (among other data) in the stderr (seen, for example, in this test result):

    copyright       : ? Vertical Online 2001
    copyright-eng   : ? Vertical Online 2001

Those mystery characters map to the byte 0xA9 which is the c-in-a-circle copyright symbol according to UTF tables I can find (at least, U+00A9 is). That byte is making the system choke somewhere along the line, which annoys me greatly. When the client-side Python script executes the test and stuffs the stdout and stderr into the SQLite database, the relevant field is supposed to be a blob– a binary large object. The receiving PHP script on the server is also supposed to honor that blob schema.

Mans’ solution is to specifically encode the stdout/stderr blobs as UTF8 strings in the client-side Python script. That fixes this problem. But I’m confused as to why this is necessary in the first place. Was the PHP script doing its best to interpret the data inside the blob and falling over? Or was the SQLite engine on the server confused by the 0xA9 character in the blob?

Also, I suddenly find myself wondering how the A9 search company got its name.

Update: Thanks for MichaelK. for pointing out the problem. While I was properly converting (since that’s necessary) stdout/stderr from build records to binary type, I never did the same for test result stdout/stderr. I had to do it for the build record output since that was compressed before going into the database. Since the test result data is “just strings”, or so I thought, no reason to do so.

Posted in FATE Server | 3 Comments »

Systematic Benchmarking Adjunct to FATE

January 26th, 2010 by Multimedia Mike

Pursuant to my rant on the futility of comparing, performance-wise, the output of various compilers, I wholly acknowledge the utility of systematically benchmarking FFmpeg. FATE is not an appropriate mechanism for doing so, at least not in its normal mode of operation. The “normal mode” would have each of every configuration (60 or so) running certain extended test specs during every cycle. Quite a waste.

Hypothesis: By tracking the performance of a single x86_64 configuration, we should be able to catch performance regressions in FFmpeg.

Proposed methodology: Create a new script that watches for SVN commits. For each and every commit (no skipping), check out the code, build it, and run a series of longer tests. Log the results and move on to the next revision.

What compiler to use? I’m thinking about using gcc 4.2.4 for this. In my (now abandoned) controlled benchmarks, it was the worst performer by a notable margin. I’m thinking that the low performance might help to accentuate performance regressions. Is this a plausible theory? 2 years of testing via FATE haven’t revealed any other major problems with this version.

What kind of samples to test? Thankfully, Big Buck Bunny is available in 4 common formats:

  • MP4/MPEG-4 part 2 video/AC3 audio
  • MP4/H.264 video/AAC audio
  • Ogg/Theora video/Vorbis audio
  • AVI/MS MPEG-4 video/MP3 audio

I have the 1080p versions of all those files, though I’m not sure if it’s necessary to decode all 10 minutes of each. It depends on what kind of hardware I select to run this on.

Further, I may wish to rip an entire audio CD as a single track, encode it with MP3, Vorbis, AAC, WMA, FLAC, and ALAC, and decode each of those.

What other common formats would be useful to track? Note that I only wish to benchmark decoding. My reasoning for this is that decoding should, on the whole, only ever get faster, never slower. Encoding might justifiably get slower as algorithmic trade-offs are made.

I’m torn on the matter of whether to validate the decoding output during the benchmarking test. The case against validation says that computing framecrc’s is going to impact the overall benchmarking process; further, validation is redundant since that’s FATE’s main job. The case for validation says that since this will always be run on the same configuration, there is no need to worry about off-by-1 rounding issues; further, if a validation fails, that data point can be scrapped (which will also happen if a build fails) and will not count towards the overall trend. An errant build could throw off the performance data. Back on the ‘against’ side, that’s exactly what statistical methods like weighted moving averages are supposed to help smooth out.

I’m hoping that graphing this idea for all to see will be made trivial thanks do Google’s Visualization API.

The script would run continuously, waiting for new SVN commits. When it’s not busy with new code, it would work backwards through FFmpeg’s history to backfill performance data.

So, does this whole idea hold water?

If I really want to run this on every single commit, I’m going to have to do a little analysis to determine a reasonable average number of FFmpeg SVN commits per day over the past year and perhaps what the rate of change is (I’m almost certain the rate of commits has been increasing). If anyone would like to take on that task, that would be a useful exercise (’svn log’, some text manipulation tools, and a spreadsheet should do the trick; you could even put it in a Google Spreadsheet and post a comment with a link to the published document).

Posted in FATE Server | 16 Comments »

« Previous Entries