Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


FFmpeg and Multiple Build Threads

November 19th, 2009 by Multimedia Mike

I got bored today and decided to empirically determine how much FFmpeg compilation time can be improved by using multiple build threads, i.e., ‘make -jN’ where N > 1. I also wanted to see if the old rule of “number of CPUs plus 1” makes any worthwhile difference. The thinking behind that latter rule is that there should always be one more build job queued up ready to be placed on the CPU if one of the current build jobs has to access the disk. I think I first learned this from reading the Gentoo manuals. I didn’t find that it made a significant improvement. But then, Gentoo is for ricers.


FFmpeg being built with multiple threads on a 2x Core 2 Duo


FFmpeg being built with multiple threads on a Core 2 Duo


FFmpeg being built with multiple threads on a dual-core hyperthreaded Atom

I think the most interesting thing to observe about these graphs is the CPU time (the amount of time the build jobs are actually spending on the combined CPUs). The number is roughly steady for the Core 2 CPUs regardless of number of jobs while the Hyperthreaded Atom CPU sees a marked increase in total CPU time. Is that an artifact of Hyperthreading? Or maybe I just didn’t put together a stable testing methodology.
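The post does not show how the numbers were collected. As a rough sketch of one way to gather wall-clock and CPU time per job count, assuming a POSIX system, a source tree where `make clean` resets the build, and hypothetical helper names `time_command` and `benchmark_make`:

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (wall_seconds, cpu_seconds).

    CPU time is the user + system time charged to child processes,
    which includes descendants that those children waited for.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    wall = time.perf_counter() - start
    after = os.times()
    cpu = ((after.children_user - before.children_user)
           + (after.children_system - before.children_system))
    return wall, cpu


def benchmark_make(max_jobs):
    """Time 'make -jN' for N = 1..max_jobs, cleaning before each run.

    Assumption: the current directory is a configured source tree.
    """
    results = {}
    for jobs in range(1, max_jobs + 1):
        subprocess.run(["make", "clean"], check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        results[jobs] = time_command(["make", "-j%d" % jobs])
    return results
```

Comparing the CPU figure across job counts is what would expose an effect like the one seen on the Atom; disk cache state between runs is one variable this sketch does not control.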

Posted in Programming | 10 Comments »

Process Runner Redux

November 12th, 2009 by Multimedia Mike

Pursuant to yesterday’s conundrum of creating a portable process runner in Python for FATE that can be reliably killed when exceeding time constraints, I settled on a solution. As Raymond Tau reminded us in the ensuing discussion, Python won’t use a shell to launch the process if the program can supply the command and its arguments as a sequence data structure. I knew this but was intentionally avoiding it. It seems like a simple problem to break up a command line into a sequence of arguments: just split on spaces. However, I hope to test metadata options eventually which could include arguments such as ‘-title “Hey, is this thing on?”’ where splitting on spaces clearly isn’t the right solution.

I got frustrated enough with the problem that I decided to split on spaces anyway. Hey, I control this system from top to bottom, so new rule: No command line arguments in test specs will have spaces with quotes around them. I already enforce the rule that no sample files can have spaces in their filenames since that causes trouble with remote testing. When I get to the part about testing metadata, said metadata will take the form of ‘-title “HeyIsThisThingOn?”’ (which will then fail to catch myriad bugs related to FFmpeg’s incorrect handling of whitespace in metadata arguments, but this is all about trade-offs).
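For comparison, Python's standard library has a `shlex` module that splits a string using shell-like quoting rules. This is not what the post settled on; it is only a small sketch of the difference between the two approaches, with `naive_split` and `quoted_split` as hypothetical names:

```python
import shlex


def naive_split(command_line):
    """Split on whitespace only, as the post decides to do."""
    return command_line.split()


def quoted_split(command_line):
    """Split with shell-like quoting rules via the standard shlex module."""
    return shlex.split(command_line)


command = 'ffmpeg -title "Hey, is this thing on?" -i input.avi'
print(naive_split(command))
print(quoted_split(command))
```

The whitespace split breaks the title into six fragments, while `shlex.split` keeps it as a single argument with the quotes removed.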

So the revised Python process runner seems to work correctly on Linux. The hangaround.c program simulates a badly misbehaving program by eating the TERM signal, so it must be dealt with using the KILL signal. The last line in these examples is a list containing return code, stdout, stderr, and CPU time. For Linux:

$ ./upr.py
['./hangaround', '40']
process ID = 2645
timeout, sending TERM
timeout, really killing
[-9, '', '', 0]

The unmodified code works the same on Mac OS X:

$ ./upr.py
['./hangaround', '40']
process ID = 94866
timeout, sending TERM
timeout, really killing
[-9, '', '', 0]
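The upr.py source is not shown in the post. What follows is a minimal sketch of a runner with the behavior described above, not the author's code. It assumes a POSIX system and modern Python 3 (the `timeout` argument to `communicate()` did not exist in 2009). The names `run_with_timeout` and `STUBBORN_CHILD` are hypothetical; the latter is a stand-in for hangaround.c.

```python
import os
import subprocess
import sys

# Stand-in for hangaround.c: ignore TERM, then sleep for 40 seconds.
STUBBORN_CHILD = [
    sys.executable, "-c",
    "import signal, time; "
    "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
    "time.sleep(40)",
]


def run_with_timeout(argv, timeout, grace=1.0):
    """Run argv without a shell.

    Returns [return code, stdout, stderr, child CPU seconds].
    On timeout, send TERM first; if the process is still alive after
    'grace' seconds, send KILL, which cannot be caught or ignored.
    """
    before = os.times()
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM
        try:
            out, err = proc.communicate(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL
            out, err = proc.communicate()
    after = os.times()
    cpu = ((after.children_user - before.children_user)
           + (after.children_system - before.children_system))
    return [proc.returncode, out.decode(), err.decode(), cpu]
```

Because argv is passed as a list, no shell sits between the runner and the program, so the signals reach the program itself. A negative return code is how `subprocess` reports death by signal on POSIX.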

Now a bigger test: Running the upr.py script on Linux in order to launch the hangaround process remotely on Mac OS X via SSH:

$ ./upr.py
['/usr/bin/ssh', 'foster-home', './hangaround', '40']
process ID = 2673
timeout, sending TERM
[143, '', '', 50]

So that’s good… sort of. Monitoring the process on the other end reveals that hangaround is still doing just that, even after SSH goes away. This occurs whether or not hangaround is ignoring the TERM signal. This is still suboptimal.

It would be possible to open a separate SSH session to send a TERM or KILL signal to the original process… except that I wouldn’t know the PID of the remote process. Or could I? I’m open to Unix shell magic tricks on this problem since anything responding to SSH requests is probably going to be acceptably Unix-like. I would rather not go the ‘killall ffmpeg’ route because that could interfere with some multiprocessing ideas I’m working on.

Here’s a brute force brainstorm: When operating in remote-SSH mode, prefix the command with ‘ln -s ffmpeg ffmpeg-<unique-key>’ and then execute the symbolic link instead of the main binary. Then the script should be able to open a separate SSH session and execute ‘killall ffmpeg-<unique-key>’ without interfering with other processes. Outlandish but possibly workable.
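A sketch of how the two SSH command lines for that brainstorm might be assembled. This is untested against real remote hosts. It assumes the binary sits in the remote working directory and that the remote `killall` matches on the name the process was launched under. `remote_commands` is a hypothetical helper.

```python
import shlex
import uuid

SSH = "/usr/bin/ssh"


def remote_commands(host, binary, args, key=None):
    """Build (run_argv, kill_argv) for the symlink-and-killall idea.

    run_argv creates a uniquely named symlink to the binary on the
    remote side and executes the link; kill_argv targets only that name.
    """
    key = key or uuid.uuid4().hex[:8]
    alias = "%s-%s" % (binary, key)
    quoted_args = " ".join(shlex.quote(a) for a in args)
    launch = "ln -s %s %s && ./%s %s" % (
        shlex.quote(binary), shlex.quote(alias),
        shlex.quote(alias), quoted_args)
    run_argv = [SSH, host, launch]
    kill_argv = [SSH, host, "killall %s" % shlex.quote(alias)]
    return run_argv, kill_argv
```

Both lists can be handed directly to `subprocess.Popen` without `shell=True`. The sketch leaves the symlink behind, so a real implementation would also want to remove it afterwards.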

Posted in FATE Server, Python | 12 Comments »

Process of Confusion

November 12th, 2009 by Multimedia Mike

I am working hard at designing a better FATE right now. But first things first: I'm revisiting an old problem and hoping to conclusively determine certain process-related behavior.

I first described the problem in this post and claimed in this post that I had hacked around the problem. Here's the thing: When I spin off a new process to run an FFmpeg command line, Python's process object specifies a PID. Who does this PID belong to? The natural assumption would be that it belongs to FFmpeg. However, I learned empirically that it actually belongs to the shell interpreter that launches the FFmpeg command line; FFmpeg itself gets a PID 1 greater than the shell's. So my quick and dirty solution was to assume that the actual FFmpeg PID was 1 greater than the PID returned from Python's subprocess.Popen() call.

Bad assumption. The above holds true for Linux but not for Mac OS X, where the FFmpeg command line has the returned PID. I'm not sure what Windows does.

This all matters for the timeout killer. FATE guards against the possibility of infinite loops by specifying a timeout for each test. Timeouts don't do much good when they trigger TERM and KILL signals to the wrong PID. I tested my process runner carefully when first writing FATE (on Linux) and everything worked okay with using the same PID returned by the API. I think that was because I was testing the process runner using the built-in 'sleep' shell command. This time, I wrote a separate program called 'hangaround' that takes a number of seconds to hang around before exiting. This is my testing methodology:

PYTHON:
>>> import subprocess
>>> process = subprocess.Popen("./hangaround 30",
...     shell=True,
...     stdout=subprocess.PIPE,
...     stderr=subprocess.PIPE)
>>> process.pid
21433

From another command line:

$ ps ax|grep hangaround
21433 pts/2    S+     0:00 /bin/sh -c ./hangaround 30
21434 pts/2    S+     0:00 ./hangaround 30
21436 pts/0    R+     0:00 grep hangaround

That's Linux; for Mac OS X:

>>> process.pid
82079

$ ps ax|grep hangaround
82079 s005  S+     0:00.01 ./hangaround 30
82084 s006  R+     0:00.00 grep hangaround

So, the upshot is that I'm a little confused about how I'm going to create a general solution to work around this problem-- a problem that doesn't occur very often but makes FATE fail hard when it does show up.
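The Python documentation notes that with shell=True, the pid attribute is the process ID of the spawned shell. What happens after that depends on how the platform's /bin/sh runs the command. Passing the command as a list avoids the shell entirely. A small sketch to check this, assuming a POSIX system and using a child Python process as a stand-in for FFmpeg (`spawn_and_compare` is a hypothetical name):

```python
import subprocess
import sys


def spawn_and_compare():
    """Launch a child from an argument list (no shell).

    Returns (pid reported by Popen, pid the child reports for itself).
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", "import os; print(os.getpid())"],
        stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return proc.pid, int(out)


popen_pid, child_pid = spawn_and_compare()
print(popen_pid, child_pid)
```

When the two numbers match, TERM and KILL signals sent to the PID from `Popen` land on the program itself rather than on an intermediate shell.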

Posted in FATE Server, Python | 20 Comments »

State of the Art Compiler Optimization

November 8th, 2009 by Multimedia Mike

Felix von Leitner delivered a talk at the 2009 Linux Kongress about the state of the art in compiler optimization (link to PDF slides). Presentation slides by themselves are not a good way to understand a talk, so it would be good to know whether video of the actual talk is posted somewhere. Compiler optimization (or lack thereof) is fairly important to FFmpeg developers.

The talk analyzes how LLVM, icc, MSVC, Sun C, and gcc generate fast code in this day and age. One basic theme I gathered is that coders should forgo clever C optimizations as they tend to be counterproductive. I wish I could believe that, but there was that recent episode where I optimized FFmpeg's Theora decoder by removing structure dereferences. I'm sure that other performance-minded multimedia hackers will have other nits to pick with the broad generalizations in the presentation. I call your attention to the fighting words (which I have taken out of context since it's such a fun quote) on slide 41: "Note: gcc is smarter than the video codec programmer on all platforms." Further, slides 53-55 specifically call out madplay for inline ASM that allegedly didn't improve efficiency vs. what the compiler could achieve with the raw C code.

On the whole, the findings are probably quite accurate for the kind of C code that most people need to write (e.g., "No need to write a >> 2 when you mean a/4!").

Speaking of compilers, FATE now covers Intel's 11.1 series C compiler for both 32- and 64-bit icc. I have also updated the stale snapshots of the gcc-svn for my machines (I still need to write a tool to do that for me automatically and continuously).

Posted in FATE Server, Programming | 7 Comments »

Roketz VQM

October 20th, 2009 by Multimedia Mike

I've been on a gaming kick lately. I found a game in my collection called Roketz; the full DOS game can be downloaded from the publisher. The game has 2 files bearing the extension VQM that appear to be FMV files. Wiki and samples.

Roketz comes from a company called Bluemoon. According to their website, they're also responsible for building the technological foundations for 2 well-known pieces of software: Kazaa and Skype. I've never used either, personally, but I understand that Skype uses a custom vocoder called SILK. Maybe Roketz and VQM is where the team got their start in codec technology?

Posted in Game Hacking | 3 Comments »

iPhone Developments

October 10th, 2009 by Multimedia Mike

Wikipedia's knowledge of compilers credits the first compiler to Grace Hopper in 1952 (for a language called A-0). I suspect that if blogs existed in 1952, we would have been treated to rants such as:

I, personally, have a problem with a developer who feels entitled to be able to develop for a computer without investing the time to learn the machine opcodes or punch card formats that the machine was built around.

This is one of the arguments I have been hearing this past week after my employer announced an upcoming method for exporting Flash/AS3 projects as iPhone apps. It strikes me as an age-old argument between low vs. high level languages, that's all. I chortle when recalling how certain people urged me to construct the FATE system in POSIX-compliant, ANSI C for maximum portability and speed instead of using a language like Python. Nowadays, that Python code is testing FFmpeg on a dozen different CPUs running 10 different operating systems. It makes me shudder to think of how much work it would have been to write the FATE script in straight C and how little benefit doing so would have brought.

In other groundbreaking iPhone news, Mans recently announced that FFmpeg can be built for the iPhone out of the SVN tree with only a minor modification to Apple's iPhone toolchain (call the SDK police!). This is a feat that has thus far proved challenging, as Mans outlined here. I understand it will be a little difficult to continuously test FFmpeg on either a real iPhone or an emulator. However, I'm planning a revision to FATE's architecture so that certain configurations can be marked "build-only" and forgo the test phase. This will also be useful for Hitachi SH-4 and perhaps other architectures that FFmpeg supports but for which we don't have access to hardware for the sake of continuous testing.

Whenever the notion of compiling and running FFmpeg on the iPhone crops up, it prompts me to wonder why. Why do people care about this? Are they transcoding media on the iPhone? Are they republishing old games and using FFmpeg's numerous game-oriented decoders for direct playback instead of doing the sensible thing and transcoding the original media to MP4/CAF/H.264/AAC for native playback through the platform's frameworks and hardware acceleration? Is it just a point of academic curiosity thanks to the fact that FFmpeg is quickly becoming a standardized metric of compiler quality? Why?

Posted in FATE Server | 4 Comments »

13 Architectures

October 7th, 2009 by Multimedia Mike

An impromptu query from the FATE database:

SQL:
mysql> SELECT DISTINCT(architecture)
    ->   FROM web_config_cache
    ->   WHERE revision IS NOT NULL
    ->   ORDER BY architecture;

This yields 13 architectures currently being continuously tested in FATE:

+--------------+
| architecture |
+--------------+
| Alpha        |
| ARMv5TE      |
| ARMv7        |
| AVR32        |
| ia64         |
| MIPS         |
| PA-RISC      |
| PowerPC      |
| PowerPC 64   |
| Sparc        |
| Sparc64      |
| x86_32       |
| x86_64       |
+--------------+

I suppose it's questionable to treat ARMv5TE and ARMv7 as truly separate architectures. Still, it's not a bad list of CPU coverage. It makes me wonder how it stacks up to the Linux kernel in terms of CPU support. According to Wikipedia, Linux still has the advantage.

One day I'll figure out a way to continuously test FFmpeg on a Hitachi SH-4 using my old Sega Dreamcast. That'll bring us closer.

Posted in FATE Server | 5 Comments »

A Lot Of New FATE Machines

October 3rd, 2009 by Multimedia Mike

Thanks to Thibaut VARÈNE for bringing an incredible number of new machines to the FATE table:

  • PowerPC / Mac OS X
  • ia64 / Linux
  • PA-RISC / Linux
  • Sparc / Linux
  • Sparc64 / Linux

As of this writing, the Sparc/64 machine is having trouble getting its first results uploaded to the FATE server. Those will hopefully start showing up soon.

Right now, none of the ia64 configurations compile successfully. This is indirectly how Thibaut learned of FATE (via this Roundup issue). No configuration is too marginal for us to track as long as someone has the resources to continuously run FATE cycles. If this is ever in doubt, just remember that Michael K. is testing FFmpeg on (Free)DOS via FATE.

Posted in FATE Server | 5 Comments »

I Need 16 Optimal Huffman Trees

October 2nd, 2009 by Multimedia Mike

Actually, I need 80 optimal Huffman trees, but let's take it one step at a time.

The VP3 video codec -- the basis of Theora -- employs 80 different Huffman trees. There are 16 for DC coefficients and 16 each for 4 different AC coefficient groups. An individual VP3/Theora frame gets to select 4 different Huffman trees: one for Y-plane DC, one for C-plane DC, one for Y-plane AC, and one for C-plane AC. VP3 hardcodes these tables. Theora allows more flexibility and an encoder is free to either use the default VP3 trees or create its own set and encode them into the header of the container (typically an Ogg file).

Generating an optimal Huffman tree for a particular set of input is rather well established; any introduction to Huffman codes covers that much. What I'm curious about is how one would go about creating a set of, e.g., 16 optimal Huffman trees for a given input. The first solution that comes to mind is to treat this as a vector quantization (VQ) problem. I have no idea if this idea holds water, or if it even has any sane basis in mathematics, but when has that ever stopped me from running with a brainstorm?
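For reference, the well-established single-tree case fits in a few lines. This is a generic textbook sketch, not code from FFmpeg or libtheora. It computes only the code length per symbol, which is all that is needed to count bits. `huffman_code_lengths` and `total_bits` are hypothetical names.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Map each symbol with a nonzero count to its Huffman code length."""
    tiebreak = count()  # keeps heap entries comparable when counts tie
    heap = [(f, next(tiebreak), [sym]) for sym, f in freqs.items() if f > 0]
    if len(heap) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {heap[0][2][0]: 1}
    lengths = {entry[2][0]: 0 for entry in heap}
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), syms1 + syms2))
    return lengths


def total_bits(freqs, lengths):
    """Bits needed to code the given symbol counts with the given lengths."""
    return sum(f * lengths[sym] for sym, f in freqs.items() if f > 0)


counts = {"a": 5, "b": 2, "c": 1, "d": 1}
lengths = huffman_code_lengths(counts)
print(lengths, total_bits(counts, lengths))
```

For these counts the most frequent symbol gets a 1-bit code and the two rarest get 3-bit codes, for 15 bits in total.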

Here's the pitch:

  • Modify FFmpeg's VP3/Theora decoder to print after each frame decode the count of each type of token that was decoded from the stream (for each of the 5 coefficient groups, and for each of the plane types), as well as the number of bits that token was encoded with. This will allow tallying of the actual number of bits used for encoding tokens in each frame.
  • Create a separate tool to process the data by applying a basic VQ codebook training algorithm. It will be necessary to treat all of the Y-plane AC tokens as single vectors and do the same with the C-plane AC tokens, even though each AC token vector needs to be composed of 4 separate AC group vectors. Re-use some existing E/LBG code for this step.
  • Generate Huffman trees from the resulting vectors and count the number of bits per token for each.
  • Iterate through the frequency vectors captured from the first step and match them to the codebooks using a standard distance algorithm.
  • Tally the bits from using the new vectors and see if there is any improvement versus the default vectors (Huffman tables).
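The training and matching steps above can be sketched roughly as follows. This uses plain Lloyd-style iteration with squared Euclidean distance on normalized token histograms, a simplification standing in for FFmpeg's ELBG code. The per-frame histograms are assumed to come from the instrumented decoder in the first step. All names are hypothetical, and whether this distance measure is a sensible proxy for coded bits is exactly the open question the post raises.

```python
import random


def normalize(hist):
    """Scale a token-count histogram so its entries sum to 1."""
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(hist)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, codebook):
    """Index of the codebook entry closest to vector."""
    return min(range(len(codebook)),
               key=lambda c: squared_distance(vector, codebook[c]))


def train_codebook(histograms, k, iterations=20, seed=0):
    """Cluster frame histograms into k representative frequency vectors.

    Returns (codebook, assignment), where assignment[i] is the index of
    the codebook entry matched to histograms[i].
    """
    vectors = [normalize(h) for h in histograms]
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        assignment = [nearest(v, codebook) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # leave an empty cluster's entry unchanged
                codebook[c] = [sum(col) / len(members)
                               for col in zip(*members)]
    assignment = [nearest(v, codebook) for v in vectors]
    return codebook, assignment
```

Each resulting codebook entry is a frequency vector from which a Huffman tree can be generated, and the assignment tells which tree each frame would select when tallying bits against the default tables.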

I don't know if I'll have time to get around to trying this experiment in the near future but I wanted to throw it out there anyway. With all of the improvements that the new Theora encoder brings to the table, it seems that the custom Huffman trees feature is one that is left un-exercised per my reading of the documentation and source code. From studying the Big Buck Bunny Theora encodes (my standard Theora test vectors these days), I see that they use the default VP3 tables. The 1080p variant occupied 866 MB. Could there be any notable space savings from generating custom Huffman tables? Or was this a pointless feature to add to the Theora spec?

Posted in VP3/Theora | 5 Comments »

Star-Shaped Discs

October 1st, 2009 by Multimedia Mike

I purchased a Sony PlayStation 3 recently. I thoroughly read the accompanying manual on a train ride and a particular detail caught this optical media aficionado's eye:


Sony PlayStation 3 manual -- disc shape notice

Wait... what? Star-shaped discs? Heart-shaped ones as well? Are those real? How would those even work? I know about 80 mm discs that fit in the smaller groove of a CD tray. I also know about the business card-shaped CDs; I even have a few games that were published on such a form factor (for example). But a star has points. And a heart? How?

A brief bit of Googling for "star shaped disc" leads me directly to the Wikipedia article on shaped CDs, which happens to showcase a heart-shaped CD. But how would a star-shaped disc work? That (typically) has 5 points. Where would the circular track go, the one that holds data? I figure there could be sort of a fat star, a circle with 5 points. This turns out to be the correct idea as this disc manufacturing page indicates.


star-shaped-cd

Check out the page and see the oddest shape-- the house CD.

Posted in General | 3 Comments »
