Category Archives: Programming

Programming Language Levels

I’ve been doing this programming thing for some 20 years now. Things sure do change. One change I ponder from time to time is the matter of programming language levels. Allow me to explain.

The 1990s
When I first took computer classes in the early 1990s, my texts would classify computer languages into 3 categories, or levels. The lower the level, the closer to the hardware; the higher the level, the more abstract (and presumably, easier to use). I recall that the levels went something like this:

  • High level: Pascal, BASIC, Logo, Fortran
  • Medium level: C, Forth
  • Low level: Assembly language

Keep in mind that these were the same texts which took the time to explain the history of computers from mainframes -> minicomputers -> a relatively recent phenomenon called microcomputers or “PCs”.

Somewhere in the mid-late 1990s, when I was at university, I was introduced to a new tier:

  • Very high level: Perl, shell scripting

I think there was some debate among my peers about whether C++ and Java were properly classified as high or very high level. The distinction between high and very high, in my observation, seemed to be that very high level languages had more complex data structures (at the very least, a hash / dictionary / associative array / key-value map) built into the language, as well as implicit memory management.

Modern Day
These days, the old hierarchy is apparently forgotten (much like minicomputers). I observe that there is generally a much simpler 2-tier classification:

  • Low level: C, assembly language
  • High level: absolutely every other programming language in wide use today

I find myself wondering where C++ and Objective-C fit in this classification scheme. Then I remember that it doesn’t matter and this is all academic.

Relevancy
I think about this because I have pretty much stuck to low-level programming all of my life, mostly due to my interest in game and multimedia-type programming. But the trends in computing have favored many higher level languages and programming paradigms. I woke up one day and realized that the kind of work I often do — lower level stuff — is not very common.

I’m not here to argue that low or high level is superior. You know I’m all about using the appropriate tool for the job. But I sometimes find myself caught between worlds, having the defend and explain one to the other.

  • On one hand, it’s not unusual for the multitudes of programmers working at the high level to gasp and wonder why I or anyone else would ever use C or assembly language for anything when there are so many beautiful high level languages. I patiently explain that those languages have to be written in some other language (at first) and that they need to run on some operating system and that most assuredly won’t be written in a high level language. For further reading, I refer them to Joel Spolsky’s great essay called Back to Basics which describes why it can be useful to know at least a little bit about how the computer does what it does at the lowest levels.
  • On the other hand, believe it or not, I sometimes have to defend the merits of high level languages to my low level brethren. I’ll often hear variations of, “Any program can be written in C. Using a high level language to achieve the same will create a slow and bloated solution.” I try to explain that the trade-off in time to complete the programming task weighed against the often-negligible performance hit of what is often an I/O-bound operation in the first place makes it worthwhile to use the high level language for a wide variety of tasks.

    Or I just ignore them. That’s actually the best strategy.

Of ctors and dtors

I haven’t given up on the Sega Dreamcast programming. I was able to compile a bunch of homebrew code for the DC many years ago and I can’t make it work anymore. Again, I was working with a purpose-built, open source RTOS named KallistiOS (or KOS). I can make the programs compile but not run. I had ELF files left over from years ago which still executed. But when I tried to build new ELF files, no luck– the programs crashed before even reaching my main() function.

I found the problem: ELF files are comprised of a number of sections and 2 of these sections are named ‘.ctors’ and ‘.dtors’ which stand for constructors and destructors. The KOS RTOS performs a manual traversal of .ctors section during program initialization and this is where things go bad. The traversal code doesn’t seem to account for a .ctors section that only contains a single entry. I commented out the function that does the traversal and programs started to work, at least until it was time to exit the program and return control to the program loader. That’s when the counterpart .dtors section traversal code ran and demonstrated the same problem. I’ll exhibit the problematic code at the end of this post.

So I’m finally tinkering with Sega Dreamcast programming once again and with a slightly better grasp of software engineering than the first time I did this.

Portable and Compatible C?
If nothing else, this low-level embedded stuff exposes you to some serious toolchain arcana, the likes of which you will likely never see working strictly in the desktop arena.

Still, this exercise makes me wonder why C code from a decade ago doesn’t compile reliably now. Part of it is because gcc has gotten stricter about the syntax it will accept. In the case of this specific crashing problem, I suspect it comes down to a difference in the way the linker generates the final ELF file. I’ve written a list of items I have had to modify in the KOS codebase in order to get it to compile on more recent gcc versions. I wonder if it would be worth publishing the specifics, or if anyone would ever find the information useful? Oh, who am I kidding? Of course I’ll write it up, perhaps publish a new version of the code, if only because that’s the best chance I have of finding my own work again some years down the road.

Problematic C Code
See if this code makes any sense to you. It somehow traverse a list of 32-bit function pointers (in different directions, depending on constructors or destructors), executing each in turn. However, it appears to fall over if the list of pointers consists of a single entry.
Continue reading

Heroic Defender of the Stack

Problem Statement

I have been investigating stack smashing and countermeasures (stack smashing prevention, or SSP). Briefly, stack smashing occurs when a function allocates a static array on the stack and writes past the end of it, onto other local variables and eventually onto other function stack frames. When it comes time to return from the function, the return address has been corrupted and the program ends up some place it really shouldn’t. In the best case, the program just crashes; in the worst case, a malicious party crafts code to exploit this malfunction.

Further, debugging such a problem is especially obnoxious because by the time the program has crashed, it has already trashed any record (on the stack) of how it got into the errant state.

Preventative Countermeasure

GCC has had SSP since version 4.1. The computer inserts SSP as additional code when the -fstack-protector command line switch is specified. Implementation-wise, SSP basically inserts a special value (the literature refers to this as the ‘canary’ as in “canary in the coalmine”) at the top of the stack frame when entering the function, and code before leaving the function to make sure the canary didn’t get stepped on. If something happens to the canary, the program is immediately aborted with a message to stderr about what happened. Further, gcc’s man page on my Ubuntu machine proudly trumpets that this functionality is enabled per default ever since Ubuntu 6.10.

And that’s really all there is to it. Your code is safe from stack smashing by default. Or so the hand-wavy documentation would have you believe.

Not exactly

Exercising the SSP

I wanted to see the SSP in action to make sure it was a real thing. So I wrote some code that smashes the stack in pretty brazen ways so that I could reasonably expect to trigger the SSP (see later in this post for the code). Here’s what I learned that wasn’t in any documentation: Continue reading

Learn Multimedia Programming By Writing A JPEG Decoder

For those of you who hack on multimedia tech, how did you get started? Did you begin by studying the mathematical underpinnings of multimedia codec algorithms? Or did you find a practical problem and jump right in by writing code? (Personally, I was always more of a nuts & bolts hacker than a math guy.) I ask because I occasionally get emails from aspiring multimedia hackers who want to know where to begin. Invariably, they want to go the math-first route. I heavily discourage this approach.

I have a crazy idea for anyone who wants a crash course on multimedia hacking: write a JPEG decoder. In doing so, you will be exposed to a lot of key domain concepts such as bitstream parsing, Huffman decoding, dequantization, zigzagging, the dreaded (inverse) discrete cosine transform, YUV vs. RGB colorspaces, macroblock organization, delta coding, and run length coding.

Sure, JPEG decoding is a solved problem. But that’s hardly the point. Why would you enter an unfamiliar field and hope to come up to speed on the basics by leaping straight into the domain’s unsolved problems? If you are successful in this exercise, no one will ever use the fruits of your labor, but that doesn’t really matter.

So, do you want to learn multimedia hacking quickly? Then grab a JPEG file (maybe create a few contrived ones that are small, have friendly dimensions, and feature predictable patterns), grab a good JPEG reference, and implement the decoding algorithm in the language and platform of your choice.

On the matter of the reference, my personal favorite reference has always been A note about the JPEG decoding algorithm by Cristi Cuturicu. The English grammar is a bit dodgy but overall, it might be the best reference you’ll find on the matter– as simple as it needs to be, but no simpler.

Good luck!