Author Archives: Multimedia Mike

Programming Language Levels

I’ve been doing this programming thing for some 20 years now. Things sure do change. One change I ponder from time to time is the matter of programming language levels. Allow me to explain.

The 1990s
When I first took computer classes in the early 1990s, my texts would classify computer languages into 3 categories, or levels. The lower the level, the closer to the hardware; the higher the level, the more abstract (and presumably, easier to use). I recall that the levels went something like this:

  • High level: Pascal, BASIC, Logo, Fortran
  • Medium level: C, Forth
  • Low level: Assembly language

Keep in mind that these were the same texts which took the time to explain the history of computers from mainframes -> minicomputers -> a relatively recent phenomenon called microcomputers or “PCs”.

Somewhere in the mid-late 1990s, when I was at university, I was introduced to a new tier:

  • Very high level: Perl, shell scripting

I think there was some debate among my peers about whether C++ and Java were properly classified as high or very high level. The distinction between high and very high, in my observation, seemed to be that very high level languages had more complex data structures (at the very least, a hash / dictionary / associative array / key-value map) built into the language, as well as implicit memory management.

Modern Day
These days, the old hierarchy is apparently forgotten (much like minicomputers). I observe that there is generally a much simpler 2-tier classification:

  • Low level: C, assembly language
  • High level: absolutely every other programming language in wide use today

I find myself wondering where C++ and Objective-C fit in this classification scheme. Then I remember that it doesn’t matter and this is all academic.

Relevancy
I think about this because I have pretty much stuck to low-level programming all of my life, mostly due to my interest in game and multimedia-type programming. But the trends in computing have favored many higher level languages and programming paradigms. I woke up one day and realized that the kind of work I often do — lower level stuff — is not very common.

I’m not here to argue that low or high level is superior. You know I’m all about using the appropriate tool for the job. But I sometimes find myself caught between worlds, having the defend and explain one to the other.

  • On one hand, it’s not unusual for the multitudes of programmers working at the high level to gasp and wonder why I or anyone else would ever use C or assembly language for anything when there are so many beautiful high level languages. I patiently explain that those languages have to be written in some other language (at first) and that they need to run on some operating system and that most assuredly won’t be written in a high level language. For further reading, I refer them to Joel Spolsky’s great essay called Back to Basics which describes why it can be useful to know at least a little bit about how the computer does what it does at the lowest levels.
  • On the other hand, believe it or not, I sometimes have to defend the merits of high level languages to my low level brethren. I’ll often hear variations of, “Any program can be written in C. Using a high level language to achieve the same will create a slow and bloated solution.” I try to explain that the trade-off in time to complete the programming task weighed against the often-negligible performance hit of what is often an I/O-bound operation in the first place makes it worthwhile to use the high level language for a wide variety of tasks.

    Or I just ignore them. That’s actually the best strategy.

Cloaked Archive Wiki

Google’s Chrome browser has made me phenomenally lazy. I don’t even attempt to type proper, complete URLs into the address bar anymore. I just type something vaguely related to the address and let the search engine take over. I saw something weird when I used this method to visit Archive Team’s site:



There’s greater detail when you elect to view more results from the site:



As the administrator of a MediaWiki installation like the one that archiveteam.org runs on, I was a little worried that they might have a spam problem. However, clicking through to any of those out-of-place pages does not indicate anything related to pharmaceuticals. Viewing source also reveals nothing amiss.

I quickly deduced that this is a textbook example of website cloaking. This is when a website reports different content to a search engine than it reports to normal web browsers (humans, presumably). General pseudocode:

if (web_request.user_agent_string == CRAWLER_USER_AGENT)
  return cloaked_data;
else
  return real_data;

You can verify this for yourself using the wget command line utility:


$ wget --quiet --user-agent="Mozilla/5.0" \
http://www.archiveteam.org/index.php?title=Geocities -O - | grep \<title\>
<title>GeoCities - Archiveteam</title>

$ wget --quiet --user-agent="Googlebot/2.1" \
http://www.archiveteam.org/index.php?title=Geocities -O - | grep \<title\>
<title>Cheap xanax | Online Drug Store, Big Discounts</title>

I guess the little web prank worked because the phaux-pharma stuff got indexed. It makes we wonder if there’s a MediaWiki plugin that does this automatically.

For extra fun, here’s a site called the CloakingDetector which purports to be able to detect whether a page employs cloaking. This is just one humble observer’s opinion, but I don’t think the site works too well:
Continue reading

Curator of the Samples Archive

Remember how I mirrored the world-famous MPlayerHQ samples archive a few months ago? Due to a series of events, the original archive is no longer online. However, me and the people who control the mplayerhq.hu domain figured out how to make samples.mplayerhq.hu point to samples.multimedia.cx.

That means… I’m the current owner and curator of our central multimedia samples repository. Such power! This should probably be the fulfillment of a decade-long dream for me, having managed swaths of the archive, most notably the game formats section.

How This Came To Be

If you pay any attention to the open source multimedia scene, you might have noticed that there has been a smidge of turmoil. Heated words were exchanged, authority was questioned, some people probably said some things they didn’t mean, and the upshot is that, where once there was one project (FFmpeg), there are now 2 projects (also Libav). And to everyone who has wanted me to mention it on my blog– there, I finally broke my silence and formally acknowledged the schism.

For my part, I was just determined to ensure that the samples archive remained online, preferably at the original samples.mplayerhq.hu address. There are 10 years worth of web links out there pointing into the original repository.

Better Solution

I concede that it’s not entirely optimal to host the repository here at multimedia.cx. While I can offer a crazy amount of monthly bandwidth, I can’t offer rsync (invaluable for keeping mirrors in sync), nor can the server provide anonymous FTP or allow me to offer accounts to other admins who can manage the repository.

The samples archive is also mirrored at samples.libav.org/samples. I understand that service is provided by VideoLAN. Right now, both repositories are known to be static. I’m open to brainstorms about how to improve the situation.

Creating A Lossless SMC Encoder

Look, I can’t explain how or why I come up with this stuff. For some reason, I thought it would be interesting to write a new encoder for the Apple SMC video codec. I can’t even remember why. I just sat down the other day, started writing, and now I have a lossless SMC encoder that I’m not sure what to do with. Maybe this is to be my new thing— writing encoders for marginal multimedia formats.

Introduction
SMC is a vector quantizer (a lossy method) but I decided to attack it from the angle of lossless encoding. A.k.a. Apple Graphics Codec, SMC operates on 4×4 blocks in an 8-bit paletted colorspace. Each 4×4 block can be encoded with 1, 2, 4, 8, or 16 colors. Blocks can also be skipped (copied from previous frame) or copied from blocks rendered immediately prior within the same frame.

Step 1: Validating Infrastructure
The goal of this step is to encode the most braindead SMC frame possible and see if FFmpeg/libav’s QuickTime muxer can create a valid file. I think the simplest frame would be one in which each vector is encoded with the single-color mode, starting with color 0 and incrementing through the palette.

Status: Successful. The only ‘trick’ was to set avctx->bits_per_coded_sample to 8. (For fun, this can also be set to 40 (8 | 0x20) to specify a grayscale palette.)



Step 2: Preprocessing
The video frames will arrive at the encoder as 32-bit RGB. These will need to be converted to a paletted colorspace before encoding. I don’t want to use FFmpeg’s default dithering approach as this will result in a substantial loss of quality as described in this post. I would rather maintain a palette built from observed colors throughout successive frames. If the total number of unique observed colors ever exceeds 256, error out.

That’s what I would like to do. However, I noticed that FFmpeg/libav’s QuickTime muxer has never taken into account the possibility of encoding palettes. The path of least resistance in this case is to dither the input to match QuickTime’s default 8-bit palette (if a paletted QuickTime file does not specify a palette, a default 1-, 2-, 4-, or 8-bit palette is selected).

Status: Successful, if slow. I definitely need to optimize this step later.

Step 3: Most Naive Encoding
The most basic encoding is to “encode” each block as a 16-color block. This will actually result in a slightly larger frame size than a raw encoding since each 4×4 block will be prepended by a byte opcode (0xE0 in this case) to indicate encoding mode. This should demonstrate that the encoder is functioning at the most basic level.

Status: Successful. Try not to laugh too hard at the Big Buck Bunny dithered to an 8-bit palette:



Step 4: Better Representation Continue reading