Monthly Archives: September 2011

Space Adventure CD-ROM

I acquired a CD-ROM entitled Space Adventure by Knowledge Adventure (I like these people; they make decent, entertaining educational games). The physical media displays a copyright date of 1993, very early in the multimedia era.



This 1993 CD-ROM makes proud use of multimedia files. What kind? There’s a movies/ directory with 17 .mov files. It would be way too simple if these were QuickTime files, though. They use a custom format that is video-only, since a separate sounds/ directory contains .snd files with filenames corresponding to the .mov files. The .snd files are actually Creative Voice (a.k.a. VOC) files. As for this MOV format, see the wiki page and samples.



I was also surprised to find the binary ultrasnd.exe file among the drivers on the disc. The Gravis UltraSound was released in 1992. The sound setup utility does not have an option for the GUS, however. No matter since DOSBox has great SB/Pro/16 emulation.

I’m also a bit puzzled about why the DOSBox screenshots are 720 x 480 (posted here are various croppings and resizings).

What’s So Hard About Building?

I finally had a revelation as to why building software can be so difficult: build systems are typically built on programming languages that you don’t normally use in your day-to-day programming activities. If the project is simple enough, the build system usually takes care of the complexities. If there are subtle complexities (and there always are), then you have to figure out how to customize the build system to meet your needs.

First, there’s the Makefile. It’s easy to forget that the syntax which comprises a Makefile pretty well qualifies as a programming language. I wonder if it’s Turing-complete? But writing and maintaining Makefiles manually is arduous and many systems have been created to generate Makefiles for you. At the end of the day, running ‘make’ still requires the presence of a Makefile and in the worst case scenario, you’re going to have to inspect and debug what was automatically generated for that Makefile.

So there is the widespread GNU build system, a.k.a. “the autotools”, named for its principal components such as autoconf and automake. In this situation, you have no fewer than 3 distinct languages at work. You write your general build instructions using a set of m4 macros (language #1). These get processed by the autotools in order to generate a shell script (language #2) called configure. When this is executed by the user, it eventually generates a Makefile (language #3).

Over the years, a few challengers have attempted to dethrone autotools. One is CMake which configures a project using its own custom programming language that you will need to learn. Configuration generates a standard Makefile. So there are 2 languages involved in this approach.

Another option is SCons, which is Python-based, top to bottom. Only one programming language is involved in the build system; there’s no Makefile generated and run. Until I started writing this, I was guessing that the Python component generated a Makefile, but no.

That actually makes SCons look fairly desirable, at least if your only metric when choosing a build system is to minimize friction against rarely-used programming languages.

I should also make mention of a few others: Apache Ant is a build system in which the build process is described by an XML file. XML doesn’t qualify as a programming language (though that apparently doesn’t stop some people from using it as such). I see there’s also qmake, related to the Qt system. This system uses its own custom syntax.

Playing With File

I played with the ‘file’ utility a long time ago because I wanted to make it recognize a large number of multimedia formats. I had trouble getting my changes to take. But I’m prepared to try again after many years.

Aiming at the Corpus
In my local mirror of the MPlayerHQ samples archive, I find 9853 unique files. So I run all of them through the ‘file’ command:

  find /path/to/samples -type f -print0 | xargs -0 file --no-pad

My Ubuntu installation has file v5.04. I also tested against 5.07 and the latest, 5.08. Here is the number of files each version was unable to identify (generically marking as ‘data’):

5.04  1521
5.07  1405
5.08  1501
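Tallying numbers like these is easy to script. Here is a minimal sketch, assuming the output of the command above was captured as one `name: description` line per file (which is what `--no-pad` gives you). The function name `count_unidentified` and the sample lines are hypothetical, not taken from the actual run.

```python
def count_unidentified(lines):
    """Count lines of `file` output that end in the generic ': data' verdict."""
    count = 0
    for line in lines:
        # A description such as "gzip compressed data" does not match,
        # because it is not preceded directly by the ": " separator.
        if line.rstrip().endswith(": data"):
            count += 1
    return count


# Hypothetical captured output, for illustration only.
sample = [
    "a/unknown.bin: data",
    "b/clip.avi: RIFF (little-endian) data, AVI",
    "c/other.xyz: data",
]
print(count_unidentified(sample))
```

Running the same function over the output of each `file` version would give one count per version.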

That seemed like a regression for v5.08 until I dug into the details and saw quite a few items like this, indicating that the MPEG detection could use some work:

-mov/mov-demux-infinite-loop.mpg: DOS-executable (
+mov/mov-demux-infinite-loop.mpg: data
-image-samples/UNeedQT4.pntg: DOS-executable (
+image-samples/UNeedQT4.pntg: data
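A comparison like this can also be automated. The following is a rough sketch, assuming two captured runs of `file --no-pad` over the same files and assuming that no path contains the `": "` separator. The helper names and the sample descriptions are hypothetical.

```python
def parse_file_output(lines):
    """Map each file name to its description from `file --no-pad` output."""
    result = {}
    for line in lines:
        # Assumption: the first ": " separates the path from the description.
        name, sep, description = line.rstrip("\n").partition(": ")
        if sep:
            result[name] = description
    return result


def newly_unidentified(old_lines, new_lines):
    """Names the old run identified but the new run reports as plain 'data'."""
    old = parse_file_output(old_lines)
    new = parse_file_output(new_lines)
    return sorted(
        name
        for name, description in new.items()
        if description == "data" and old.get(name, "data") != "data"
    )


# Hypothetical runs, for illustration only.
old_run = ["one.mpg: some format", "two.bin: data"]
new_run = ["one.mpg: data", "two.bin: data"]
print(newly_unidentified(old_run, new_run))
```

Note that a file moving to "data" is not necessarily a regression: as the items above show, the old identification may simply have been wrong.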

Workflow
These are just notes to myself and perhaps anyone else who wants to add new file formats to be identified by the ‘file’ command.

First, download either the latest release from the FTP site or clone from GitHub. Do the usual unpack, ‘./configure’, ‘make’ routine. To use this newly-built version and its associated magic file:

  ./src/file --magic-file magic/magic.mgc <file>

To add a new format for ID, first, run the foregoing command to ensure that it’s not already identified. Then, check over the files in magic/Magdir and see which one might pertain to what you’re doing (it’s unlikely that your format will merit a new file in this directory). For example, for this round, I modified animation, audio, iff, and riff. Add or modify existing specs based on the copious examples in the directory and by consulting the appropriate man page (‘man 5 magic’).

Finally, run ‘make’ again which will regenerate the magic file. Invoke the above command again to use the modified magic file.
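For anyone unfamiliar with what a magic entry boils down to, the simplest kind just compares the leading bytes of a file against a known signature. The sketch below illustrates that idea in Python; it is not the magic file syntax and not the entries I actually added. The signature bytes are assumptions drawn from commonly documented headers for these formats, and `identify` is a hypothetical name.

```python
# Leading-byte signatures. These are assumptions based on commonly
# documented file headers, not copied from the real magic entries.
SIGNATURES = [
    (b"SMK2", "RAD Game Tools Smacker Multimedia version 2"),
    (b"wvpk", "WavPack Lossless Audio"),
    (b"TTA1", "True Audio Lossless Audio"),
    (b"FILM", "Sega FILM/CPK Multimedia"),
]


def identify(header):
    """Return a description for the first matching signature, else 'data'."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    # Same generic fallback that `file` reports for unknown content.
    return "data"


print(identify(b"wvpk" + bytes(28)))
print(identify(bytes(32)))
```

Real magic entries can go further, testing values at arbitrary offsets and printing fields such as dimensions, which is what ‘man 5 magic’ documents.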

Before and After
On a selection of formats taken from the samples archive (renamed and cut down to a kilobyte because detection typically only relies on the first few bytes), here is the “before”:

amv:            RIFF (little-endian) data
armovie:        data
bbc-dirac:      data
interplay-mve:  data
mtv:            data
nintendo-thp:   data
nullsoft-video: data
redcode:        data
sega-film:      data
smacker:        data
trueaudio:      data
vqa:            IFF data
wavpack:        data
wc3-mve:        IFF data
wtv:            data

And the “after”:

amv:            RIFF (little-endian) data, AMV 
armovie:        ARMovie
bbc-dirac:      BBC Dirac Video
interplay-mve:  Interplay MVE Movie
mtv:            MTV Multimedia File
nintendo-thp:   Nintendo THP Multimedia
nullsoft-video: Nullsoft Video
redcode:        REDCode Video
sega-film:      Sega FILM/CPK Multimedia, 320 x 224
smacker:        RAD Game Tools Smacker Multimedia version 2, 320 x 200, 100 frames
trueaudio:      True Audio Lossless Audio
vqa:            IFF data, Westwood Studios VQA Multimedia, 418 video frames, 320 x 200
wavpack:        WavPack Lossless Audio
wc3-mve:        IFF data, Wing Commander III Video, PC version
wtv:            Windows Television DVR Media

After rerunning ‘file’ on the mphq corpus using the modified magic file, only 1329 files remain unidentified (down from 1501).
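Cutting samples down to a kilobyte for a test set like the one above is trivial to script. This is a minimal sketch, assuming that keeping the first 1024 bytes is all that is wanted; `truncate_copy` and the file names in the demo are hypothetical.

```python
import os
import tempfile


def truncate_copy(src_path, dst_path, size=1024):
    """Copy at most `size` leading bytes of src_path to dst_path."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read(size))


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "sample.bin")
    shortened = os.path.join(workdir, "smacker")
    with open(original, "wb") as f:
        f.write(b"SMK2" + bytes(5000))
    truncate_copy(original, shortened)
    print(os.path.getsize(shortened))
```

Files shorter than the limit are copied whole, since `read(size)` returns whatever is available.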

Going Forward
As mentioned, MPEG detection could probably be strengthened. However, a major weakness is QuickTime/MP4. Many files are not detected, probably owing to the many ways that QuickTime files can begin.

Started Programming Young

I have some of the strangest memories of my struggles to jump into computer programming.

Back To BASIC
I remember doing some Logo programming on Apple II computers at school in 5th grade (1987 timeframe). But that was mostly driving turtle graphics. Then I remember doing some TRS-80 BASIC in 7th grade, circa 1989. Emboldened by what very little I had learned in perhaps the week or 2 we took in a science class to do this, I tried a little GW-BASIC on my family’s “IBM-PC compatible” computer (they were still called that back then). I still remember what my first program consisted of. Even back then I was interested in manipulating graphics and color on a computer screen. Thus:

10 color 1
20 print "This is color 1"
30 color 2
40 print "This is color 2"
...

And so on through 15 colors. Hey, it did the job– it demonstrated the 15 different colors you could set in text mode.

What’s FOR For?
That 7th grade computer unit in science class wasn’t very thick on computer science details. I recall working with a lab partner to transcribe code listings into a computer (and also saving my work to a storage cassette). We also developed form processing programs that would print instructions to input text followed by an “INPUT I$” statement to obtain the user’s input.

I remember there was some situation where we needed a brief delay between input and printing. The teacher told us to use a construct of the form:

10 FOR I = 1 TO 20000
20 NEXT I

We had to calibrate the number based on our empirical assessment of how long it lasted but I recall that the number couldn’t be much higher than about 32000, for reasons that would become clearer much later.

Imagine my confusion when I would read and try to comprehend BASIC program code I would find in magazines. I would of course see that FOR..NEXT construct all over the place but obviously not in the context of introducing deliberate execution delays. Indeed, my understanding of one of the fundamental building blocks of computer programming — iteration — was completely skewed because of this early lesson.

Refactoring
Somewhere along the line, I figured out that the FOR..NEXT could be used to do the same thing a bunch of times, possibly with different values. A few years after I had written that color program, I found it again and realized that I could write it as:

10 for I = 1 to 15
20 color I
30 print I
40 next I

It still took me a few more years to sort out the meaning of WHILE..WEND, though.