For those of you who hack on multimedia tech, how did you get started? Did you begin by studying the mathematical underpinnings of multimedia codec algorithms? Or did you find a practical problem and jump right in by writing code? (Personally, I was always more of a nuts & bolts hacker than a math guy.) I ask because I occasionally get emails from aspiring multimedia hackers who want to know where to begin. Invariably, they want to go the math-first route. I heavily discourage this approach.
I have a crazy idea for anyone who wants a crash course on multimedia hacking: write a JPEG decoder. In doing so, you will be exposed to a lot of key domain concepts such as bitstream parsing, Huffman decoding, dequantization, zigzagging, the dreaded (inverse) discrete cosine transform, YUV vs. RGB colorspaces, macroblock organization, delta coding, and run length coding.
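To give a taste of two of those concepts, here is a minimal Python sketch of zigzagging and dequantization for a single 8x8 block. The function names `zigzag_order` and `dequantize` are made up for illustration, and the sketch assumes the coefficients and the quantization table both arrive in zigzag order, which is how a baseline JPEG file stores them.

```python
def zigzag_order(n=8):
    """Return the natural (row-major) index visited at each zigzag position."""
    order = []
    for s in range(2 * n - 1):           # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                   # even diagonals run from bottom-left to top-right
            rows = reversed(rows)
        for r in rows:
            order.append(r * n + (s - r))
    return order


def dequantize(coeffs_zz, qtable_zz):
    """Scale 64 zigzag-ordered coefficients and place them in row-major order.

    Assumption: both inputs are 64-entry sequences in zigzag order.
    """
    block = [0] * 64
    for i, pos in enumerate(zigzag_order()):
        block[pos] = coeffs_zz[i] * qtable_zz[i]
    return block


if __name__ == "__main__":
    coeffs = [5, -2, 3] + [0] * 61       # a DC term and two AC terms
    qtable = [16] * 64                   # contrived flat quantization table
    block = dequantize(coeffs, qtable)
    for row in range(8):
        print(block[row * 8:row * 8 + 8])
```

The resulting block is what you would feed to the inverse DCT, which this sketch leaves out.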
Sure, JPEG decoding is a solved problem. But that’s hardly the point. Why would you enter an unfamiliar field and hope to come up to speed on the basics by leaping straight into the domain’s unsolved problems? If you are successful in this exercise, no one will ever use the fruits of your labor, but that doesn’t really matter.
So, do you want to learn multimedia hacking quickly? Then grab a JPEG file (maybe create a few contrived ones that are small, have friendly dimensions, and feature predictable patterns), grab a good JPEG reference, and implement the decoding algorithm in the language and platform of your choice.
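A reasonable first step is simply listing the marker segments in the file before trying to decode anything. The following is a rough Python sketch, not a complete parser: `list_segments` and the `MARKER_NAMES` table are names invented here, the table covers only a handful of common markers, and the walk stops at the start-of-scan marker because entropy-coded data follows it.

```python
import struct

# Partial table of marker codes (the byte that follows 0xFF); illustrative only.
MARKER_NAMES = {
    0xC0: "SOF0", 0xC2: "SOF2", 0xC4: "DHT", 0xDA: "SOS",
    0xDB: "DQT", 0xDD: "DRI", 0xE0: "APP0", 0xFE: "COM",
}


def list_segments(data):
    """Return (name, payload_length) pairs for the header segments of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    segments = [("SOI", 0)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:               # fill byte before the real marker
            pos += 1
            continue
        # The 16-bit big-endian length counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        name = MARKER_NAMES.get(marker, "0x%02X" % marker)
        segments.append((name, length - 2))
        if marker == 0xDA:               # compressed scan data follows SOS
            break
        pos += 2 + length
    return segments


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        for name, size in list_segments(f.read()):
            print(name, size)
```

Run it with the path to one of your contrived test files as the only argument. Once the DQT, DHT, SOF, and SOS segments show up as expected, you know where each table you need to parse lives.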
On the matter of the reference, my personal favorite has always been “A note about the JPEG decoding algorithm” by Cristi Cuturicu. The English grammar is a bit dodgy, but overall it might be the best reference you’ll find on the matter: as simple as it needs to be, but no simpler.
Good luck!
Mike! I’ve been following for a while but I had to comment on this one. I started encoding a long time before I did any programming, but about halfway through my first high school Java class I took on the task of writing a JPEG encoder on the side. It took a couple of months and I never did figure out the final Huffman coding and bitstream stages, but otherwise it could compress and decompress images up through RLE/DPCM. Definitely a good learning exercise, and it let me do some fun stuff like messing with the block size and transform algorithm to produce some really wild images. It would have been nice to reach that milestone of producing a file readable by other decoders, but oh well.
Ha! I started with JPEG too.
For learning purposes, I must add one more link: codecdictionary.com, which any of you multimedia guys can consult for quick help while working with codecs. Mike! Since you mentioned http://www.opennet.ru/docs/formats/jpeg.txt , I think it’s really a good and comprehensive guide. On the other hand, I won’t argue over whether to start with a JPEG decoder or an encoder! Both can teach the essentials of multimedia.
You could bypass some of the pain of Huffman decoding by using RTjpeg instead, but I guess there’s hardly any useful documentation about it (I’d like to claim that the FFmpeg decoder written by me should serve well as documentation, but I am biased I guess)…
@Reimar: I’m not too familiar with RTjpeg but it sounds like it might be similar to PSX MDEC in terms of complexity. I.e., JPEG without the messy Huffman part (most of the time).
Unfortunately, that’s sparsely documented as well and the samples tend to be a mess.
Funny you should mention the PSX MDEC because that was my gateway codec into the world of multimedia hacking. JPEG has more layers to its format which might be confusing for newcomers, so I feel lucky I got to start with the simpler PSX STR format. It introduced me to all the crazy acronyms and terms (IDCT, DCT, VLC, Huffman, AC, DC, ZRLC, qscale, qtab, etc.), and those core concepts you listed that are used in the most popular codecs today. Related to that, I still have a half-finished blog post in response to your post about the DCT http://multimedia.cx/eggs/dct-pr/ .
“PSX MDEC…sparsely documented”
I’d like to think my 45 page document about the format provides everything one would need to understand it–but I’m probably a little biased. ;)
I completely agree, Mike! JPEG decoding and encoding was also one of my first multimedia programming excursions. Then I started modifying it to play around with wavelets and different forms of entropy coding. It’s amazing the first time you see an image on the screen and know it had to go through your code to get there. :)
TIFF is also fun because the file format is simple and there are so many strange varieties to implement.
Are there any documents that describe “progressive JPEG”? That has been by far the hardest thing for me to understand. I have a one-pass decoder done.
The relatively small C++ “jpgd” project on Google Code can decompress progressive JPEG files:
http://code.google.com/p/jpgd/
See the method jpeg_decoder::decode_scan() (and the methods around it) here:
http://code.google.com/p/jpgd/source/browse/trunk/jpegdecoder.cpp