
The Best Type Of Compression

The best type of compression is to encode no data at all.


[diagram: stdout flow]

I’m a little embarrassed to admit that this didn’t occur to me until just now, 2 months after I first deployed the FATE Server. Each test specification in the database has an expected stdout text blob associated with it. The server sends this to the client, who compares the expected stdout with the actual stdout gathered from running a test. The client then sends the actual stdout text back to the server.

Wait! There’s no reason to send the actual stdout back to the server. At least, not if the test was successful. Logically, that means that (actual stdout) == (expected stdout). Send back a special code to indicate that the stdout matched. The server can decide to clone the expected stdout into the actual stdout column in the database under that condition.
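
In code, the client-side logic could look something like this minimal sketch (the sentinel value and record fields are invented for illustration):

    PASS_SENTINEL = "*MATCH*"   # hypothetical code sent in place of the text

    def build_result_record(test_id, expected_stdout, actual_stdout):
        # on a match, ship the tiny sentinel instead of the whole stdout blob
        if actual_stdout == expected_stdout:
            return {'test_id': test_id, 'passed': 1, 'stdout': PASS_SENTINEL}
        return {'test_id': test_id, 'passed': 0, 'stdout': actual_stdout}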

Wait, #2! There’s no reason to store the actual stdout at all if the test is successful. Logically, it’s the same as the data already sitting in the expected stdout field, which, BTW, exists in the database only once, whereas actual stdout data occurs many times over in a different table.
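
On the server side, that might translate into inserts along these lines (the table and column names here are hypothetical); a passing test stores a NULL instead of a clone of the expected text:

    def store_result(cursor, test_id, build_id, passed, actual_stdout):
        if passed:
            # the expected stdout already lives, exactly once, in the test
            # spec table; don't duplicate it for every run
            cursor.execute("INSERT INTO test_result "
                           "(test_id, build_id, passed, stdout) "
                           "VALUES (%s, %s, 1, NULL)",
                           (test_id, build_id))
        else:
            cursor.execute("INSERT INTO test_result "
                           "(test_id, build_id, passed, stdout) "
                           "VALUES (%s, %s, 0, %s)",
                           (test_id, build_id, actual_stdout))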

As you can see, I have been considering optimization strategies. After a client is finished running all the tests for a given configuration, it logs the results for all the tests. There are presently only 90 tests and it seems to take about 30 seconds, give or take 10, to log all the results. That’s a measly 3 records per second, which is annoying, especially since I want this suite to embody hundreds upon hundreds of individual tests eventually. This issue is sort of blocking me from really ramping up on the number of test cases.

Right now, the test clients use the direct MySQL protocol through Python and I doubt that it is being compressed over the wire. I hope to revise the infrastructure so that the test results will be serialized, compressed, and sent to a CGI script on the FATE server. The CGI script will decompress, deserialize, and enter the test results from a position much closer to the actual database server. Hopefully, this will improve performance. If nothing else, it will set the stage for running the FATE client on machines that don’t have working Python MySQLdb libraries, or that can’t access the MySQL port directly due to firewalling.
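
Something like this sketch, using only Python’s standard library (the URL and record format are placeholders, not the real interface):

    import urllib
    import zlib
    try:
        import cPickle as pickle
    except ImportError:
        import pickle

    def submit_results(results):
        # results: the list of per-test records gathered during the run
        payload = zlib.compress(pickle.dumps(results))
        # urllib.urlopen() performs a POST when handed a second argument
        urllib.urlopen("http://fate.example.com/cgi-bin/log-results.cgi",
                       payload)

The CGI script would simply reverse the steps (decompress, deserialize) and perform the inserts from right next to the database.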

So that will hopefully address the bandwidth concerns. There is still the issue of disk storage. As discussed previously, raw disk space is really not an issue. I could swallow a gigabyte or 2 per month and still be okay for several years. But it would still be nice for the database to remain a manageable size for the purpose of responsible backups. The idea of storing just a bool to indicate that the stdout checked out, rather than the actual text, will help reduce storage requirements. However, I think I should also institute a schedule of “retiring” the stdout/stderr data from old build records and test results.
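
The retirement pass could be as simple as a periodic query that blanks out the bulky text columns on sufficiently old records (column names assumed for illustration):

    def retire_old_output(cursor, days=90):
        # keep the pass/fail verdicts forever; just drop the big text blobs
        cursor.execute("UPDATE test_result "
                       "SET stdout = NULL, stderr = NULL "
                       "WHERE run_date < DATE_SUB(NOW(), INTERVAL %s DAY)",
                       (days,))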

Someone showed up on ffmpeg-devel yesterday with a bug report that ‘ffmpeg -h’ crashes the program on Solaris. It seems quite reasonable to add a test spec for that simple case with a NULL for expected stdout, which would indicate “don’t care”. I would be concerned about filling up so much space with the help command (stdout of ‘ffmpeg -h’ is presently about 28K) on each run. But I might not mind so much if I could retire (ruthlessly delete) the data later.
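
The “don’t care” convention would only cost a couple of extra lines in a hypothetical pass/fail check:

    def test_passed(return_code, expected_stdout, actual_stdout):
        if return_code != 0:
            return False            # e.g., 'ffmpeg -h' crashing on Solaris
        if expected_stdout is None:
            return True             # NULL spec means "don't care"
        return actual_stdout == expected_stdout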

Summer of Code 2008

Google has announced their Summer of Code, 2008 edition. The mentor organization application process begins on Monday, March 3 and I will be right there, ready with our group’s app. My FFmpeg cohorts and I have been busily assembling a new Wiki page detailing what prospective students might do for the project, should FFmpeg be accepted as a mentoring organization for a third year. Like last year, we will be enforcing the requirement that students must successfully complete a qualification task in order to be considered for a project. I feel like I live in sort of a bubble these days, but I am becoming increasingly aware that straight-up, performance-minded C programming seems to be a dying art, and we can’t necessarily count on students knowing the language already when they apply for the project.


Of course, you are free to add to the list, either in the 2nd tier proposals section or the qualification tasks. But don’t bother adding anything under the 1st tier proposals unless you are willing to mentor the project. Anyone can toss out any idea. But we need project ideas that can be plausibly completed by a talented student over the course of a summer, and we need mentors who will commit. As for qualification tasks, remember that these are bite-sized pieces of work that would ideally take a seasoned FFmpeg developer a few hours at most to complete. If you create a new qualification task, please put some detail into it. Look at it from the perspective of a new student who may not be up to speed on FFmpeg. A qualification task of “fix XYZ thing” is quite bewildering. Expound just a little bit. Remember to link to other pages within the Wiki.

I logged into my SoC mentor dashboard for the first time in a long time. It claimed that I had not completed my final program survey from last year, even though I’m quite certain that I did (I wonder if that’s why I never saw my mentor money?). Anyway, one of the questions:

What advice would you give to future would-be Summer of Code mentoring organizations? (required)

I don’t remember my answer last time, but this is the most honest answer that came to mind this time: “Don’t try to compete with us for prize students. FFmpeg is a sexier project — an alpha project, if you will — and you won’t beat us.”

Git Chat

Actual IM conversation:

[me]: ever use git?
[them]: Why would I do such a thing?
[me]: peer pressure, because all of your co-devs 
      told you it was the cool thing to do

I thought that developing the software that drives FATE would serve as a good opportunity to learn the Git source control software. Foolish.

Git terrifies me. Thing is, I make mistakes. Lots of mistakes. I need a source control management system (SCM) that is sympathetic to my incompetence. As it stands, when I make a mistake, I have to dig through 140 git-* commands on my system to try to guess which one just might offer a shimmering hope of redemption. If I choose poorly, I will only exacerbate the situation as well as pollute the official history log. Such was the case when I tried to revert one particular commit. I can’t remember how that worked out exactly. I guess I got the correct code back eventually, but the log file tells a sordid tale.

More recently, I edited a file but decided I didn’t want the changes; I wanted the previously committed version back. Perhaps use git-revert, like most other SCMs? Goodness, no. Maybe git-reset? Guess again. Turns out git-checkout is what I was looking for (thanks, Mans). This time, I made the mistake of using git-commit in such a way that it committed more files than I thought it would (serves me right for following examples instead of reading the pedantic documentation first). Now I find myself wanting to undo the commit for one particular file without actually losing the changes.
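
For the record, this is the sort of incantation that seems to cover my two cases, assuming nothing has been pushed anywhere yet (no guarantees; consult the pedantic documentation first, unlike me):

    # throw away uncommitted edits to one file (what git-checkout did for me)
    git checkout -- some/file.c

    # pull one file back out of the last commit without losing the edits:
    # un-stage that file's change, then rewrite the commit without it
    git reset HEAD^ -- some/file.c
    git commit --amend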

Here’s a solution that can’t fail: ‘rm -rf .git/’, followed by a re-reading of how to initialize a local Subversion repository. And whose idea was it to tag revisions with random 160-bit hex codes like 488dfe6a946bbbbb4e095a5d758ad9808f7336b1? (Yeah, I know, they’re SHA-1 codes or some such. I don’t care; it’s still not human-friendly). I hope FFmpeg never gets around to making the switch.

Prompt FATE

If you have visited the FATE server, the first thing you will notice is how abysmally slow the main page is. That’s because of the absurd number of queries required to put together that concise summary page. It’s hard to get people to take the system seriously when the front page takes 30-60 seconds to load. Look, what can I tell you? I’m a multimedia hacker, not a DBA.

But it all changed tonight. That’s right: it’s all about caching! Give the FATE Server another look. Give it many looks, in fact, because the main page will load immediately. Be advised that the cache is only refreshed every 15 minutes, so the trade-off is that new results take a little while to show up.
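
Nothing fancy is going on behind the scenes; conceptually, it’s just a job run from cron every 15 minutes that performs the expensive queries once and writes out a static page (paths and queries invented for illustration):

    import MySQLdb

    def build_summary_page():
        # run the pile of queries here, once, instead of on every page load
        conn = MySQLdb.connect(host="localhost", db="fate")
        cursor = conn.cursor()
        cursor.execute("SELECT COUNT(*) FROM test_result")
        (total,) = cursor.fetchone()
        return "<html><body><p>%d results logged</p></body></html>" % total

    # e.g., crontab entry: */15 * * * * python build-cache.py
    open("/var/www/fate/index.html", "w").write(build_summary_page())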