Category Archives: Programming

ISO Compromise

Engineering is about trade-offs and compromises. One of the most fundamental trade-offs to be made when designing a storage format is whether multi-byte numbers will be encoded as little-endian or big-endian. But have you ever studied the data structures involved in ISO-9660, the standard filesystem format for optical discs? It seems that the committee tasked with developing this standard was unwilling to make this one tough decision and specified all multi-byte numbers as omni-endian. I just made that term up; maybe it could be called bi-endian or multi-endian. The raw detail is that each multi-byte number is stored first in little-endian format and then again in big-endian. For example, 0x11223344 is stored using 8 bytes: 0x44 0x33 0x22 0x11 0x11 0x22 0x33 0x44.
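To make the layout concrete, here is a minimal sketch (in Python, purely for illustration; the function names are mine) of packing and unpacking such a both-endian 32-bit field:

```python
import struct

def pack_both_endian_u32(value):
    # ISO-9660 stores the value twice: little-endian bytes, then big-endian
    return struct.pack('<I', value) + struct.pack('>I', value)

def unpack_both_endian_u32(data):
    le = struct.unpack('<I', data[:4])[0]
    be = struct.unpack('>I', data[4:8])[0]
    if le != be:
        raise ValueError('both-endian halves disagree; corrupt field?')
    return le

encoded = pack_both_endian_u32(0x11223344)
# encoded is b'\x44\x33\x22\x11\x11\x22\x33\x44', matching the example above
```

One small consolation of the redundancy: a reader can use the disagreement of the two halves as a cheap corruption check.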


CD-ROM

Do any other filesystems make this compromise? I am not that well-versed. I have studied the odd game-related optical filesystem; I had to write a manual ext2 searching tool for a sysadmin class; and I once had to try to recover a virus-corrupted FAT16 filesystem (to no avail; that virus cleanly chewed up some of the most important sectors).

Anyway, if I were to go ahead and try for a new FUSE driver for ISO-9660 (or modify an existing driver), I would want to go after the main format. Plus, I would want to natively interpret that CISO format on the fly. Further, I would use this as a platform to understand what is so special about the apparent ISO-9660 data ripped from a Sega Dreamcast GD-ROM.

Are there any other ISO bastardizations to target for such a tool?

Linus Is Still The Man

Linus Torvalds: a legendary figure who sat down one day and wrote an operating system. To many ordinary programmers like myself, he is a distant figurehead, difficult to comprehend. Every now and then, however, we catch a glimpse that helps us humanize the mighty coder. And I don’t know about you, but I love a good knock-down, drag-out C vs. C++/Java/OOP flame war, and this thread does not disappoint: Linus tells it like it is on the topic of C++.

Perhaps I’m too harsh on C++. In fact, there is one instance where I really appreciate good, solid C++ coding: when a binary target that I wish to reverse engineer was originally authored in C++, compiled, and still has the mangled C++ symbols. The GNU binutils do a fabulous job of recovering the original class and method names, as well as argument lists.

Sometimes I think I should get off my high horse with regards to C. After all, this article from May listed C programming as one of the top 10 dead or dying computer skills, right up there with Cobol and OS/2. This is not the first time that I have encountered the sentiment that C is going the way of raw assembler. I think it’s all a conspiracy perpetrated by the computer book publishing industry; the C language simply does not move anywhere near as many books as the latest flavor-of-the-month fad language.

Language Scavenger Hunt

So many fun programming languages out there, and more emerging all the time. But who has time to learn them all? I certainly don’t, but I still want to learn. One issue I have is that I don’t learn that well by reading through a language reference or a tutorial. I learn best by doing, and when it comes to learning a new language, I learn best when I have a specific task I am trying to accomplish. To that end, I was thinking that it would be nice to have a list of essential, yet simple, programming exercises, preferably ones that are suited for higher level languages. These would give me concrete goals to research whilst attempting to learn a new language. Further, I would build up a small repository of sample programs to which I could refer later. When I need to write something in Perl, my chief method of refreshing my skill is to look back on similar code I wrote as many as 10 years ago.

I did a cursory Google search for something along these lines, but came up empty-handed. It may be necessary to start assembling a list of my own. This would include items like:

  • Read a file line by line and process each line through a regular expression; or perhaps a more complicated, less sed-style textual processing application
  • Open a socket to a web server and fetch a web page; perhaps screen-scrape something useful off of the fetched data
  • Write a simple web server (and consider carefully the security implications of what you produce)
  • If the language has a graphics API (e.g., through something like gd or SDL), create a canvas, draw some dots, lines, and shapes, load a font and write some text, load an image and blit it; decode a video in real time using FFmpeg
  • If the language has an API for accessing your favorite database, use it to connect to the db server, SELECT, INSERT, UPDATE, DELETE, etc.; understand how the API can organize query data into the language’s native data structures (e.g., Perl’s DBI can fetch results into a hash, which is extremely useful and intuitive)

These are just a few ideas off the top of my head. Another important aspect would be specific exercises targeted at understanding the language’s native data structures since those tend to be a key selling point of many very high level languages.

Revenge Of The Autobuilds

Takis has been a busy FFmpeg hacker: He recently established an experimental server to automatically build the current source-controlled copy of FFmpeg and perform some rudimentary tests with the output. This is some great initiative on his part.

(Oh, and look what else Takis has been up to while no one is looking: a graph of FFmpeg code change over time.)

I have wanted to build an automated building and testing infrastructure for FFmpeg for a long time now. I got my first concept up and running late last November. I just realized that I never blogged about it although I did announce it on the ffmpeg-devel mailing list. The concept lives at http://builds.multimedia.cx/, though be advised that the script that updates it went offline in late December.

Predictably, people seemed to think the autobuild system was a good idea but that my implementation needed a lot of work. And they were right. The reason that I never blogged about it is likely that I figured I was about to deploy a better concept very soon.

It is now July and I have had months to brainstorm ideas for an improved autobuild and test infrastructure. Unfortunately, as can often happen with revision 2 of an unproven idea, I fear my concept has devolved into an exercise in architecture astronomy.


Architecture Astronomy

Read Joel Spolsky’s excellent essay, “Don’t Let Architecture Astronauts Scare You”. It’s about people who heavily theorize in the abstract but rarely accomplish anything useful. Personally, I consider it a clear indicator of architecture astronomy when a program’s fundamental paradigm revolves around the idea that, “Everything is an object (or module)!” It is my opinion that declaring everything in your architecture to be an object is the abstraction endgame (to be more specific, everything is a swappable, user-configurable module, even the central engine of the program that is supposed to coordinate everything between other modules).

I’ll explain the evolution of my autobuild idea: It started simply enough with a script that iterated through a bunch of compiler versions and ran the configure/make commands to build each. It logged stdout and stderr separately and logged general information about success/failure, SVN version, etc. into a rudimentary database table that could be simply queried with a PHP script.

I soon realized that this is wholly inadequate to the overall goals I wished to accomplish in this endeavor (building and testing on many platforms). Security is a major issue, which I blogged about before, and which I solved in the first iteration using the most paranoid policies of chroot’ing the configure/make steps and prohibiting network access during the process. Another problem is the eventuality of infinite loop bugs. Any build or test step could conceivably encounter such a condition.

This realization led me to redesign the autobuild/test system as a series of individual executable steps, all stored in a database, of which the primary script has no hardcoded knowledge. And this is where the “Everything is a module” philosophy comes into play. Unfortunately, the further I plot this out on paper, the harder it becomes because the execution module concept is too generic; it’s hard to do certain specific things. I realize I need to back off a bit on the abstraction.
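For what it’s worth, the “series of executable steps stored in a database” idea might look something like the following minimal sketch. The table layout and runner are hypothetical, and notably generic, which is exactly the trap described above:

```python
import sqlite3
import subprocess

def run_all_steps(db):
    # the primary script has no hardcoded knowledge of the steps; it just
    # executes them in sequence and stops at the first failure
    steps = db.execute(
        'SELECT name, command FROM steps ORDER BY sequence').fetchall()
    for name, command in steps:
        rc = subprocess.run(command, shell=True).returncode
        if rc != 0:
            return name  # report which step failed
    return None
```

The generality is the problem: once configure, make, chroot setup, and test runs are all anonymous rows in a table, expressing anything step-specific (timeouts, environment, dependencies between steps) gets awkward fast.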