<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Breaking Eggs And Making Omelettes &#187; Codec Technology</title>
	<atom:link href="http://multimedia.cx/eggs/category/codec-technology/feed/" rel="self" type="application/rss+xml" />
	<link>http://multimedia.cx/eggs</link>
	<description>Topics On Multimedia Technology and Reverse Engineering</description>
	<lastBuildDate>Sun, 29 Apr 2012 05:01:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>On ALAC&#8217;s Open Sourcing</title>
		<link>http://multimedia.cx/eggs/alac-open-sourced/</link>
		<comments>http://multimedia.cx/eggs/alac-open-sourced/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 06:52:34 +0000</pubDate>
		<dc:creator>Multimedia Mike</dc:creator>
				<category><![CDATA[Codec Technology]]></category>

		<guid isPermaLink="false">http://multimedia.cx/eggs/?p=3608</guid>
		<description><![CDATA[Thoughts and observations about Apple's open sourcing of their ALAC (Apple Lossless) codec]]></description>
			<content:encoded><![CDATA[<p>Apple open sourced their <a href="http://wiki.multimedia.cx/index.php?title=Apple_Lossless_Audio_Coding">lossless audio codec</a> last week. <a href="http://alac.macosforge.org/">Pretty awesome!</a> I have a theory that, given enough time, absolutely every codec will be open source in one way or another.</p>
<p>I know I shouldn&#8217;t bother reading internet conversation around any news related to multimedia technology. And if I do read it, I shouldn&#8217;t waste any effort getting annoyed about it. But here are some general corrections:</p>
<ul>
<li>ALAC is not in the same league as &#8212; nor is it a suitable replacement for &#8212; MP3/AAC/Vorbis or any other commonly used perceptual audio codec. It&#8217;s not a matter of better or worse; they&#8217;re just different families of codecs designed for different purposes.</li>
<li>Apple open sourced ALAC, not AAC. It&#8217;s an easy mistake to make, but there&#8217;s nothing to &#8216;open source&#8217; about AAC (though people can, and will, argue about its absolute &#8216;open-ness&#8217;).</li>
<li>There&#8217;s not much technical room to argue between ALAC and FLAC, the leading open source lossless audio compressor. Both perform similarly in terms of codec speeds (screamingly fast) and compression efficiency (results vary slightly depending on source material).</li>
<li>Perhaps the most frustrating facet is the blithe ignorance about ALAC&#8217;s current open source status. While this event simply added an official &#8220;open source&#8221; status to the codec, ALAC has effectively been open source for a very long time. According to my notes, the ALAC decoding algorithm was <a href="http://multimedia.cx/eggs/apple-lossless-audio-codec-red/">reverse engineered in 2005 and added into FFmpeg in March of the same year</a>. Then in 2008, Google &#8212; through their Summer of Code program &#8212; <a href="http://multimedia.cx/eggs/gsoc-showcase-alac/">sponsored an open source ALAC encoder</a>.</li>
</ul>
<p>Among the multimedia-savvy who are versed in these concepts, the conversation revolves around which would win in a fight, ALAC or <a href="http://flac.sourceforge.net/">FLAC</a>? And who between Apple and FFmpeg/Libav has the faster ALAC decoder? The faster and more efficient ALAC encoder? I contend that these issues don&#8217;t really matter. If you have any experience working with lossless audio encoders, you know that they tend to be ridiculously fast to both encode and decode and that many different lossless codecs compress at roughly the same ratios.</p>
<p>As for which encoder is the fastest: use whatever encoder is handiest and most familiar, either iTunes or FFmpeg/Libav.</p>
<p>As for whether to use FLAC or ALAC: if you&#8217;ve already been using one or the other for years, keep on using it. Support isn&#8217;t going to vanish. If you&#8217;re deciding which to use for a new project, again, perhaps choose based on software you&#8217;re already familiar with. Also, consider hardware support: ALAC enjoys iPod support, while FLAC is probably better supported in a variety of non-iPod devices, though that may change going forward due to this open sourcing event.</p>
<p>For my part, I&#8217;m just ecstatic that the question of moral superiority based on open source status has been removed from the equation.</p>
<p>Code-wise, I&#8217;m interested in studying the official ALAC code to see if it has any corner-case modes that the existing open source decoders don&#8217;t yet account for. The source makes mention of multichannel (i.e., greater than stereo) configurations, but I don&#8217;t know if that&#8217;s in FFmpeg/Libav.</p>
]]></content:encoded>
			<wfw:commentRss>http://multimedia.cx/eggs/alac-open-sourced/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>More Cinepak Madness</title>
		<link>http://multimedia.cx/eggs/more-cinepak-madness/</link>
		<comments>http://multimedia.cx/eggs/more-cinepak-madness/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 06:13:36 +0000</pubDate>
		<dc:creator>Multimedia Mike</dc:creator>
				<category><![CDATA[Codec Technology]]></category>

		<guid isPermaLink="false">http://multimedia.cx/eggs/?p=3601</guid>
		<description><![CDATA[Cataloging different types of Cinepak video codec data]]></description>
			<content:encoded><![CDATA[<p>Fellow digital archaeologist <a href="http://clone2727.blogspot.com/">Clone2727</a> found a <em>possible</em> fifth variant of the <a href="http://wiki.multimedia.cx/index.php?title=Cinepak">Cinepak video codec</a>. He asked me if I cared to investigate the sample. I assured him I wouldn&#8217;t be able to die a happy multimedia nerd unless I have cataloged all possible Cinepak variants known to exist in the wild. I&#8217;m sure there are chemistry nerds out there who are ecstatic when another element is added to the periodic table. Well, that&#8217;s me, except with weird multimedia formats.</p>
<p><strong>Background</strong><br />
Cinepak is a video codec that saw widespread use in the early days of digital multimedia. To date, we have cataloged 4 variants of Cinepak in the wild. This distinction is useful when trying to write and maintain an all-in-one decoder. The variants are:</p>
<ol>
<li>The standard type: Most Cinepak data falls into this category. It decodes to a modified/simplified YUV 4:2:0 planar colorspace and is often seen in AVI and QuickTime/MOV files.</li>
<li>8-bit greyscale: Essentially the same as the standard type but with only a Y plane. This has only been identified in AVI files and is distinguished by the file header&#8217;s video bits/pixel field being set to 8 instead of 24.</li>
<li>8-bit paletted: Again, this is identified by the video header specifying 8 bits/pixel for a Cinepak stream. There is essentially only a Y plane in the data; however, each 8-bit value is a palette index. The palette is transported along with the video header. To date, only one known sample of this format has ever been spotted in the wild, and it&#8217;s classified as NSFW. It is also a QuickTime/MOV file.</li>
<li><a href="http://wiki.multimedia.cx/index.php?title=Sega_FILM">Sega/FILM CPK</a> data: Sega Saturn games often used CPK files which stored a variant of Cinepak that, while very close to standard Cinepak, couldn&#8217;t be decoded with standard decoder components.</li>
</ol>
<p>So, a flexible Cinepak decoder has to identify if the file&#8217;s video header specified 8 bits/pixel. How does it distinguish between greyscale and paletted? If a file is paletted, a custom palette should have been included with the video header. Thus, if video bits/pixel is 8 and a palette is present, use paletted; else, use greyscale. Beyond that, the Cinepak decoder has a heuristic to determine how to handle the standard type of data, which might deviate slightly if it comes from a Sega CPK file.</p>
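<p>The selection logic described above can be sketched in a few lines. This is only an illustration in Python, not FFmpeg/Libav code: the function name, the argument names, and the string labels are all invented for the example, and the Sega CPK heuristic is deliberately left out.</p>

```python
def pick_cinepak_mode(bits_per_pixel, palette=None):
    """Choose how to interpret Cinepak data from the container's video header.

    bits_per_pixel: the bits/pixel field from the video header.
    palette: the palette shipped with the video header, or None if absent.
    (Both parameter names are hypothetical, chosen for this sketch.)
    """
    if bits_per_pixel == 8:
        # 8 bpp plus a palette means each Y value is a palette index;
        # 8 bpp without a palette means plain greyscale.
        return "paletted" if palette else "greyscale"
    # Anything else is treated as the standard YUV-style variant.
    return "standard"


if __name__ == "__main__":
    print(pick_cinepak_mode(24))
    print(pick_cinepak_mode(8))
    print(pick_cinepak_mode(8, palette=[(0, 0, 0)] * 256))
```

<p>An empty palette is treated the same as a missing one here; whether a real demuxer can hand over an empty palette is an open question this sketch does not answer.</p>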
<p><strong>The Fifth Variant?</strong><br />
Now, regarding this fifth variant: the issue came up because of that aforementioned heuristic. Basically, a Cinepak chunk is supposed to store the length of the entire chunk in its header. The data from a Sega CPK file plays fast and loose with this chunk size and the discrepancy makes it easy to determine if the data requires special handling. However, a handful of files discovered in a Macintosh game called <a href="http://www.mobygames.com/game/macintosh/journeyman-project-pegasus-prime">&#8220;The Journeyman Project: Pegasus Prime&#8221;</a> have chunk lengths which are sometimes in disagreement with the lengths reported in the containing QuickTime file&#8217;s stsz atom. This trips the heuristic, and the decoder tries to apply the CPK rules to Cinepak data which, aside from the weird chunk length, is perfectly compliant.</p>
<p>Here are the first few chunk sizes, as reported by the file header (stsz atom) and the chunk:</p>
<pre>
size from stsz = 7880 (0x1EC8); from header = 3940 (0xF64)
size from stsz = 3940 (0xF64); from header = 3940 (0xF64)
size from stsz = 15792 (0x3DB0); from header = 3948 (0xF6C)
size from stsz = 11844 (0x2E44); from header = 3948 (0xF6C)
</pre>
<p>Hey, there&#8217;s a pattern here. If they don&#8217;t match, then the stsz size is a whole multiple of the chunk size (2x, 3x, or 4x in my observation). I suppose I could revise the heuristic to state that if the stsz size is 2x, 3x, 4x, or equal to the size in the chunk header, qualify it as compliant Cinepak data.</p>
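<p>A rough sketch of that revised check, in Python rather than decoder C. The function name and the <code>max_multiple</code> parameter are invented for illustration; the size pairs are the four listed above.</p>

```python
def sizes_look_compliant(stsz_size, header_size, max_multiple=4):
    """Return True if the stsz size equals the chunk header size or is
    2x, 3x, ... max_multiple-x of it (the pattern seen in these files)."""
    if header_size == 0:
        return False
    multiple, remainder = divmod(stsz_size, header_size)
    return remainder == 0 and multiple in range(1, max_multiple + 1)


# (size from stsz, size from chunk header), copied from the listing above
OBSERVED = [(7880, 3940), (3940, 3940), (15792, 3948), (11844, 3948)]

if __name__ == "__main__":
    for stsz_size, header_size in OBSERVED:
        print(stsz_size, header_size, sizes_look_compliant(stsz_size, header_size))
```

<p>Capping the multiple at 4 simply mirrors what was observed; a file with a 5x chunk would fail this test and fall back to the CPK handling.</p>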
<p>Of course it feels impure, but software engineering is rarely about programmatic purity. A decade of special cases in the <a href="http://ffmpeg.org/">FFmpeg</a> / <a href="http://libav.org/">Libav</a> codebases is a testament to that.</p>
<p><strong>What&#8217;s A Variant?</strong><br />
Suddenly, I find myself contemplating what truly constitutes a variant. Maybe this was just a broken encoder program making these files? And for that, I assign it the designation of distinct variant, like some sort of special, unique snowflake?</p>
<p>Then again, I documented <a href="http://wiki.multimedia.cx/index.php?title=Flic_Video#Magic_Carpet">Magic Carpet FLIC</a> as being a distinct variant of the broader FLIC format (which has <a href="http://www.compuphase.com/flic.htm">an enormous number of variants</a> as well).</p>
]]></content:encoded>
			<wfw:commentRss>http://multimedia.cx/eggs/more-cinepak-madness/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>DCT PR</title>
		<link>http://multimedia.cx/eggs/dct-pr/</link>
		<comments>http://multimedia.cx/eggs/dct-pr/#comments</comments>
		<pubDate>Fri, 03 Jul 2009 06:36:05 +0000</pubDate>
		<dc:creator>Multimedia Mike</dc:creator>
				<category><![CDATA[Codec Technology]]></category>

		<guid isPermaLink="false">http://multimedia.cx/eggs/?p=1625</guid>
		<description><![CDATA[Multimedia compression is more than discrete cosine transforms]]></description>
			<content:encoded><![CDATA[<p>Some people think that multimedia compression is basically all <a href="http://en.wikipedia.org/wiki/Discrete_cosine_transform">discrete cosine transform (DCT)</a> and little else.</p>
<p>2 years ago at <a href="http://www.linuxtag.org/">LinuxTag</a>, I gave a fairly general presentation regarding <a href="http://ffmpeg.org/">FFmpeg</a> and open source multimedia hacking (I just noticed that the main page <em>still</em> uses a photo of me and my presentation). I theorized that one problem our little community has when it comes to attracting new multimedia hacking talent is that the field seems scary and mathematically unapproachable. I have this perception that this is what might happen when a curious individual wants to get into multimedia hacking:</p>
<blockquote><p>
I wonder how multimedia compression works?</p>
<p>Well, I&#8217;ve heard that everyone uses something called MPEG for multimedia compression.</p>
<p>Further, I have heard something about how MPEG is based around the discrete cosine transform (DCT).</p>
<p>Let&#8217;s look up what the DCT is, exactly&#8230;
</p></blockquote>
<p><center><br />
<a href="http://blog.lib.umn.edu//mcfa0086/discretecosine/"><img src="http://multimedia.cx/eggs/wp-content/uploads/2009/07/discete-cosine-transform.jpg" alt="Discrete cosine transform written out on a chalkboard" title="Discrete cosine transform written out on a chalkboard" width="400" height="80" class="aligncenter size-full wp-image-1628" /></a><br />
<em>Clever photo cribbed from a blog actually entitled <a href="http://blog.lib.umn.edu//mcfa0086/discretecosine/">Discrete Cosine</a></em><br />
</center></p>
<p>At which point the prospective contributor screams and runs away from the possibility of ever being productive in the field. </p>
<p>Now, the original talk discussed how that need not be the case, because DCT is really a minor part of multimedia technology overall; how there are lots and lots of diverse problems in the field yet to solve; and how there is room for lots of different types of contributors.</p>
<p>The notion of DCT&#8217;s paramount importance in the grand scheme of multimedia compression persists to this day. While reading the HTML5 spec development mailing list, I saw <a href="http://blog.gingertech.net/">Silvia Pfeiffer</a> express <a href="http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-June/020667.html">this same line of thinking vis-à-vis Theora</a>:</p>
<blockquote><p>
Even if there is no vendor right now who produces an ASIC for Theora, the components of the Theora codec are not fundamentally different to the components of other DCT based codecs. Therefore, AISCs <em>[sic]</em> that were built for other DCT based codecs may well be adaptable by the ASIC vendor to support Theora.
</p></blockquote>
<p>This prompted me to recall something I read in <a href="http://www.faqs.org/faqs/mpeg-faq/part2/">the MPEG FAQ</a> a long time ago:</p>
<blockquote><p>
<strong>MPEG is a DCT based scheme?</strong></p>
<p>The DCT and Huffman algorithms receive the most press coverage (e.g. &#8220;MPEG is a DCT based scheme with Huffman coding&#8221;), but are in fact less significant when compared to the variety of coding modes signaled to the decoder as context-dependent side information. The MPEG-1 and MPEG-2 IDCT has the same definition as H.261, H.263, JPEG.
</p></blockquote>
<p>A few questions later, the FAQ describes no fewer than 18 different considerations that help compress video data in MPEG; only the first one deals with transforms. Theora is much the same way. When I wrote the <a href="http://multimedia.cx/vp3-format.txt">document about Theora&#8217;s foundation codec, VP3</a>, I started by listing off all of the coding methods involved: DCT, quantization, run length encoding, zigzag reordering, predictive coding, motion compensation, Huffman entropy coding, and variable length run length Booleans. Theora adds a few more concepts (such as encoding the large amount of stream-specific configuration data).</p>
<p>I used to have the same idea, though: I was one of the first people to download <a href="http://www.duck.com/vpvision/">On2&#8217;s VpVision package</a> (the release of their VP3 code) and try to understand the algorithm. I remember focusing on the DCT and trying to find DCT-related code, assuming that it was central to the codec. I was surprised and confused to find that a vast amount of logic was devoted to simply reversing DC coefficient prediction. At the end of a huge amount of frame reconstruction code was a small, humble call to an IDCT function.</p>
<p>What I would like to get across here is that Theora is rather different than most video codecs, in just about every way you can name (no, wait: the base quantization matrix for golden frames is the same as the quantization matrix found in JPEG). As for the idea that most DCT-based codecs are all fundamentally the same, ironically, you can&#8217;t even count on that with Theora: its DCT is different than the one found in MPEG-1/2/4, H.263, and JPEG (which all use the same DCT). This was likely done in On2&#8217;s valiant quest to make everything about the codec <em>just different enough</em> from every other popular codec, which runs quite contrary to the hope that ASIC vendors should be able to simply reuse a bunch of stuff from other codecs.</p>
]]></content:encoded>
			<wfw:commentRss>http://multimedia.cx/eggs/dct-pr/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Reverse Engineering Math Formulas</title>
		<link>http://multimedia.cx/eggs/reverse-engineering-math-formulas/</link>
		<comments>http://multimedia.cx/eggs/reverse-engineering-math-formulas/#comments</comments>
		<pubDate>Mon, 23 Mar 2009 01:04:15 +0000</pubDate>
		<dc:creator>Multimedia Mike</dc:creator>
				<category><![CDATA[Codec Technology]]></category>
		<category><![CDATA[zmbv]]></category>

		<guid isPermaLink="false">http://multimedia.cx/eggs/?p=1296</guid>
		<description><![CDATA[I need pictures to help me understand math]]></description>
			<content:encoded><![CDATA[<p>Even though I have been studying and working on multimedia technology since 2000 -- reverse engineering, documenting, and reimplementing a variety of audio and video codecs -- I didn't actually begin to understand <strong><em>why</em></strong> various algorithms achieved their compression until about 2003. I'm just like that -- I study the practice first, and then the underlying theory eventually becomes clear to me (maybe; it has been 9 years and I still couldn't explain everything about the <a href="http://en.wikipedia.org/wiki/Discrete_cosine_transform">discrete cosine transform</a> if you asked).</p>
<p>I happened to be looking back over the <a href="http://wiki.multimedia.cx/index.php?title=DosBox_Capture_Codec">ZMBV (DOSBox) video codec</a> today. <span id="more-1296"></span> The <a href="http://multimedia.cx/eggs/zmbv-tinkering/">last time I looked at ZMBV</a>, I thought of a slightly better method for computing error between 2 blocks. My method involved summing the number of pixels that differed between 2 blocks. <a href="http://guru.multimedia.cx/">Michael</a> promptly did me one better by implementing a "0<sup>th</sup>-order entropy approximator". When I took a closer look at the method today, I thought it was functionally identical to my algorithm. It is XOR'ing 2 vectors of pixels, maintaining a histogram of the results (i.e., the number of pixel pairs whose bitwise XOR was 1, 2, etc.), then summing the numbers in the histogram array.</p>
<p>Except not quite. A closer examination of the summation revealed "sum+= score_tab[histogram[i]]", and it occurs on elements 1..255. What is score_tab[]? Why, it's:</p>
<pre>
/* for i from 1..255 */
    score_tab[i] = -i * log(i/(double)(ZMBV_BLOCK*ZMBV_BLOCK)) * (256/M_LN2);
</pre>
<p>(ZMBV_BLOCK = 16, M_LN2 = ln(2)).</p>
<p>I want pictures. Pictures help me a lot. I don't do a lot of heavy duty math lifting and am not up to date on modern tools. But I figured that in this day and age, there must be plenty of graphing tools on the web. This is the top <a href="http://www.walterzorn.com/grapher/grapher_e.htm">Google hit for "graph a function"</a> and it offers the following graph:</p>
<p><center><br />
<img src="http://multimedia.cx/eggs/wp-content/uploads/2009/03/zmbv-me-graph.png" alt="y = -x * ln (x / 256) * (256 / ln (2) ) " title="y = -x * ln (x / 256) * (256 / ln (2) ) " width="383" height="369" class="aligncenter size-full wp-image-1299" /><br />
</center></p>
<p>So that's interesting. Now it's time to reconcile this with <a href="http://codecs.multimedia.cx/">Kostya's</a> helpful <a href="http://multimedia.cx/eggs/zmbv-tinkering/#comment-105328">description of the method</a>:</p>
<blockquote><p>
First of all, Michael’s code explained: he used table to estimate entropy (well, base term from the Shannon’s theory of information) of the XOR’ed block. In other words, the higher count of the same symbols, the lower entropy and the better compression of the block. 0-th order means he didn’t use context to estimate entropy.
</p></blockquote>
<p>I think I might be starting to understand. My method was predicated on trying to generate an error vector that had as many 0s as possible grouped into strings of 8 bits. I.e., I was trying to maximize 0 bytes without regard to what the non-zero bytes contained. Michael's method is also concerned with maximizing 0 bytes but also cares about what is happening in those non-zero bytes. That graph indicates that the method prefers values at the top and bottom of the unsigned 8-bit range. Why would that be? I suspect that it may have something to do with the fact that the higher and lower values have longer strings of repeating symbols (e.g., 1 = 00000001; 255 = 11111111).</p>
<p>Or I might be completely wrong.</p>
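<p>One way to test the guess is to run the numbers. Taking the quoted summation at face value, score_tab[] is indexed by histogram[i], that is, by how many times an XOR value occurs in the block, not by the XOR value itself. Under that reading, the x axis of the graph is an occurrence count. The Python sketch below rebuilds the table and the sum under those assumptions. The function and variable names are invented, the table entries are rounded for illustration rather than converted the way the C code does, and a 16x16 block is assumed as stated above.</p>

```python
import math

BLOCK_PIXELS = 16 * 16  # ZMBV_BLOCK * ZMBV_BLOCK

# score_tab[n] = 256 * n * -log2(n / 256): the estimated cost, in units of
# 1/256 bit, of all n occurrences of a symbol seen n times in the block.
# Entry 0 stays 0 (a symbol that never occurs costs nothing).
score_tab = [0] * (BLOCK_PIXELS + 1)
for n in range(1, BLOCK_PIXELS + 1):
    score_tab[n] = round(-n * math.log(n / BLOCK_PIXELS) * (256 / math.log(2)))


def block_score(cur, prev):
    """0th-order entropy estimate of cur XOR prev; lower compresses better."""
    histogram = [0] * 256
    for a, b in zip(cur, prev):
        histogram[a ^ b] += 1
    # Like the quoted summation, skip element 0 and sum elements 1..255.
    return sum(score_tab[histogram[v]] for v in range(1, 256))


if __name__ == "__main__":
    prev = [0] * BLOCK_PIXELS
    same_value = [0x55] * BLOCK_PIXELS      # one XOR value, 256 times
    all_distinct = list(range(BLOCK_PIXELS))  # every XOR value exactly once
    print(block_score(prev, prev))
    print(block_score(same_value, prev))
    print(block_score(all_distinct, prev))
```

<p>In this sketch a block whose XOR is the same non-zero byte everywhere scores 0, just like an identical block, while a block whose XOR bytes are all different scores highest. That fits Kostya&#8217;s description: what gets rewarded is a high count of the same symbol, whatever its bit pattern happens to be.</p>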
<p>Sorry if this is kindergarten-level stuff to some of you readers. But you know I like to use this space as a personal technical journal for working out these lessons.</p>
]]></content:encoded>
			<wfw:commentRss>http://multimedia.cx/eggs/reverse-engineering-math-formulas/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Special QuickTime Features</title>
		<link>http://multimedia.cx/eggs/special-quicktime-features/</link>
		<comments>http://multimedia.cx/eggs/special-quicktime-features/#comments</comments>
		<pubDate>Mon, 12 Jan 2009 07:21:40 +0000</pubDate>
		<dc:creator>Multimedia Mike</dc:creator>
				<category><![CDATA[Video Codecs]]></category>
		<category><![CDATA[apple]]></category>
		<category><![CDATA[gain]]></category>
		<category><![CDATA[video codec]]></category>

		<guid isPermaLink="false">http://multimedia.cx/eggs/?p=867</guid>
		<description><![CDATA[I found real-world samples showcasing some interesting QuickTime features]]></description>
			<content:encoded><![CDATA[<p>I processed some more unknown samples today, the ones that came from <a href="http://multimedia.cx/eggs/i-heart-picsearch-and-python/">last month's big Picsearch score</a>. I found some interesting QuickTime specimens. One of them was filed under video codec FourCC 'fire'. The sample only contained one frame of type fire and that frame was very small (238 bytes) and looked to contain a number of small sub-atoms. Since the sample had a .mov extension, I decided to check it out in Apple's QuickTime Player. It played fine, and you can see the result on the <a href="http://wiki.multimedia.cx/index.php?title=FIRE">new fire page I made in the MultimediaWiki</a>. Apparently, it's built into QuickTime. The file also features a single frame of <a href="http://wiki.multimedia.cx/index.php?title=RPZA">RPZA video data</a>. My guess is that the logo on display is encoded with RPZA while the fire block defines parameters for a fire animation.</p>
<p>Moving right along, I got to another set of QuickTime samples that were filed under 'gain' video codec. This appears to be another meta-codec and this is what it looks like in action:</p>
<p><center><br />
<img src="http://multimedia.cx/eggs/wp-content/uploads/2009/01/apple-gain-codec.jpg" alt="Apple QuickTime Player using the gain/fade feature" title="Apple QuickTime Player using the gain/fade feature" width="359" height="680" class="aligncenter size-full wp-image-866" /><br />
</center></p>
<p>I decided to post this pretty screenshot here since I didn't feel like creating another Wiki page for what I perceive to be not a "real" video codec. The <a href="http://canto.de/main_menu/products/cumulus/movie/CumulusQuickTimeSlideshow.mov">foregoing CumulusQuickTimeSlideshow.mov sample comes from here</a> and actually contains 5 separate trak atoms: 2 define 'jpeg' data, 1 is 'gain', 1 is 'dslv' and the last is 'text', which defines ASCII strings containing the filenames on the bottom of the slideshow. I have no idea what the dslv atom is for, but something, somewhere in the file defines whether this so-called alpha gain effect will use a cross fade (as seen with the Cumulus shapes) or if it will use an Iris transitional effect (as seen in <a href="http://idonotlike.tv/2003/na_visit03.mov">the sample na_visit03.mov here</a>).</p>
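<p>For poking at files like these, a tiny atom walker goes a long way. The sketch below assumes only the basic QuickTime layout: each atom starts with a 32-bit big-endian size and a four-character type, a size of 1 means a 64-bit size follows the type, and a size of 0 means the atom runs to the end of its parent. The function names are invented, and the test buffer is synthetic, not taken from either sample.</p>

```python
import struct


def iter_atoms(buf, start=0, end=None):
    """Yield (type, payload_start, payload_end) for each atom in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while end - pos >= 8:
        size, kind = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:
            # 64-bit extended size stored right after the type
            if end - pos >= 16:
                (size,) = struct.unpack_from(">Q", buf, pos + 8)
                header = 16
            else:
                break
        elif size == 0:
            # atom extends to the end of the enclosing span
            size = end - pos
        if header > size or pos + size > end:
            break  # malformed atom; stop rather than guess
        yield kind.decode("latin-1"), pos + header, pos + size
        pos += size


def count_traks(buf):
    """Count trak atoms directly inside every top-level moov atom."""
    total = 0
    for kind, lo, hi in iter_atoms(buf):
        if kind == "moov":
            total += sum(1 for k, _, _ in iter_atoms(buf, lo, hi) if k == "trak")
    return total


def make_atom(kind, payload=b""):
    """Build a synthetic atom, for testing only."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


if __name__ == "__main__":
    fake_moov = make_atom(b"moov", make_atom(b"trak") * 5 + make_atom(b"udta"))
    fake_file = make_atom(b"wide") + fake_moov
    print(count_traks(fake_file))
```

<p>Telling a 'gain' track from a 'jpeg' track would mean descending further into each trak, down to its sample description, which this sketch does not attempt.</p>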
<p>So much about the QuickTime format remains a mystery.</p>
]]></content:encoded>
			<wfw:commentRss>http://multimedia.cx/eggs/special-quicktime-features/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
<enclosure url="http://canto.de/main_menu/products/cumulus/movie/CumulusQuickTimeSlideshow.mov" length="120205" type="video/quicktime" />
<enclosure url="http://idonotlike.tv/2003/na_visit03.mov" length="293460" type="video/quicktime" />
		</item>
		<item>
		<title>Video Coding Concepts: YUV and RGB Colorspaces And Pixel Formats</title>
		<link>http://multimedia.cx/eggs/yuv-and-rgb/</link>
		<comments>http://multimedia.cx/eggs/yuv-and-rgb/#comments</comments>
		<pubDate>Fri, 16 Nov 2007 01:18:51 +0000</pubDate>
		<dc:creator>Multimedia Mike</dc:creator>
				<category><![CDATA[Codec Technology]]></category>
		<category><![CDATA[Video Codecs]]></category>

		<guid isPermaLink="false">http://multimedia.cx/eggs/yuv-and-rgb/</guid>
		<description><![CDATA[Understanding all of those different pixel formats...]]></description>
			<content:encoded><![CDATA[<p>If you have any experience in programming computer graphics, you probably know all about red/green/blue (RGB) video modes and pixel formats. Guess what? It is all useless now that you are working on video codec technology!</p>
<p>No, that's not entirely true. Some video codecs operate on RGB video natively. A majority of modern codecs use some kind of YUV colorspace. We will get to that. Since many programmers are familiar with RGB pixel formats, let's use that as a starting point.</p>
<p><strong>RGB Colors</strong></p>
<p>To review, computers generally display RGB pixels. These pixels have red (R), green (G), and blue (B) components to them. Here are the various combinations of R, G, and B components at their minimum (0) and maximum (255/0xFF) values:</p>
<table border="1" cellpadding="2">
<tr>
<th>R</th>
<th>G</th>
<th>B</th>
<th>color</th>
<th>notes</th>
</tr>
<tr>
<td>0x00</td>
<td>0x00</td>
<td>0x00</td>
<td width="256" height="32" bgcolor="#000000"></td>
<td><em>absence of R, G, and B = full black</em></td>
</tr>
<tr>
<td>0x00</td>
<td>0x00</td>
<td>0xFF</td>
<td width="256" height="32" bgcolor="#0000ff"></td>
<td><em>full blue</em></td>
</tr>
<tr>
<td>0x00</td>
<td>0xFF</td>
<td>0x00</td>
<td width="256" height="32" bgcolor="#00ff00"></td>
<td><em>full green</em></td>
</tr>
<tr>
<td>0x00</td>
<td>0xFF</td>
<td>0xFF</td>
<td width="256" height="32" bgcolor="#00ffff"></td>
<td><em>full green + full blue = cyan</em></td>
</tr>
<tr>
<td>0xFF</td>
<td>0x00</td>
<td>0x00</td>
<td width="256" height="32" bgcolor="#ff0000"></td>
<td><em>full red</em></td>
</tr>
<tr>
<td>0xFF</td>
<td>0x00</td>
<td>0xFF</td>
<td width="256" height="32" bgcolor="#ff00ff"></td>
<td><em>full red + full blue = magenta</em></td>
</tr>
<tr>
<td>0xFF</td>
<td>0xFF</td>
<td>0x00</td>
<td width="256" height="32" bgcolor="#ffff00"></td>
<td><em>full red + full green = yellow</em></td>
</tr>
<tr>
<td>0xFF</td>
<td>0xFF</td>
<td>0xFF</td>
<td width="256" height="32" bgcolor="#ffffff"></td>
<td><em>full R, G, and B combine to make full white</em></td>
</tr>
</table>
<p><strong>YUV Colors</strong><br />
If you are used to dealing with RGB colors, YUV will seem a bit unintuitive at first. What does YUV stand for? Nothing you would guess. It turns out Y stands for intensity (brightness). U carries the blue color difference and V carries the red color difference, so neither one is a plain blue or red level the way B and R are. U is also denoted as C<sub>b</sub> and V is also denoted as C<sub>r</sub>. So YUV is sometimes written as YC<sub>b</sub>C<sub>r</sub>.</p>
<p>Here are the various combinations of Y, U, and V components at their minimum (0) and maximum (255/0xFF) values:</p>
<table border="1" cellpadding="2">
<tr>
<th>Y</th>
<th>U/<br />C<sub>b</sub></th>
<th>V/<br />C<sub>r</sub></th>
<th>color</th>
<th>notes</th>
</tr>
<tr>
<td>0x00</td>
<td>0x00</td>
<td>0x00</td>
<td><img src="/eggs/images/yuv-00-00-00.png" alt="" /></td>
</tr>
<tr>
<td>0x00</td>
<td>0x00</td>
<td>0xFF</td>
<td><img src="/eggs/images/yuv-00-00-ff.png" alt="" /></td>
</tr>
<tr>
<td>0x00</td>
<td>0xFF</td>
<td>0x00</td>
<td><img src="/eggs/images/yuv-00-ff-00.png" alt="" /></td>
</tr>
<tr>
<td>0x00</td>
<td>0xFF</td>
<td>0xFF</td>
<td><img src="/eggs/images/yuv-00-ff-ff.png" alt="" /></td>
</tr>
<tr>
<td>0xFF</td>
<td>0x00</td>
<td>0x00</td>
<td><img src="/eggs/images/yuv-ff-00-00.png" alt="" /></td>
<td><em>full green</em></td>
</tr>
<tr>
<td>0xFF</td>
<td>0x00</td>
<td>0xFF</td>
<td><img src="/eggs/images/yuv-ff-00-ff.png" alt="" /></td>
</tr>
<tr>
<td>0xFF</td>
<td>0xFF</td>
<td>0x00</td>
<td><img src="/eggs/images/yuv-ff-ff-00.png" alt="" /></td>
</tr>
<tr>
<td>0xFF</td>
<td>0xFF</td>
<td>0xFF</td>
<td><img src="/eggs/images/yuv-ff-ff-ff.png" alt="" /></td>
</tr>
<tr>
<td>0x00</td>
<td>0x80</td>
<td>0x80</td>
<td><img src="/eggs/images/yuv-00-80-80.png" alt="" /></td>
<td><em>full black</em></td>
</tr>
<tr>
<td>0x80</td>
<td>0x80</td>
<td>0x80</td>
<td><img src="/eggs/images/yuv-80-80-80.png" alt="" /></td>
</tr>
<tr>
<td>0xFF</td>
<td>0x80</td>
<td>0x80</td>
<td><img src="/eggs/images/yuv-ff-80-80.png" alt="" /></td>
<td><em>full white</em></td>
</tr>
</table>
<p>So, all minimum and all maximum components do not generate intuitive (read: similar to RGB) results. In fact, all 0s in the YUV colorspace result in a dull green rather than black. That last point is useful to understand when a video is displaying a lot of green block errors-- that probably means that the decoder is skipping blocks of data completely and leaving the underlying YUV data as all 0.</p>
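<p>The green result falls straight out of the conversion arithmetic. The post does not say which conversion produced the swatches above, so the Python sketch below assumes the common full-range BT.601 (JPEG-style) equations; the function names are invented for the example.</p>

```python
def clamp8(x):
    """Round to the nearest integer and clamp to the 0..255 byte range."""
    return max(0, min(255, int(round(x))))


def yuv_to_rgb(y, u, v):
    """Convert one full-range YCbCr pixel to RGB (BT.601 coefficients assumed)."""
    # U and V are centered on 128, so 0x80 means "no color difference".
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp8(r), clamp8(g), clamp8(b)


if __name__ == "__main__":
    print(yuv_to_rgb(0x00, 0x00, 0x00))  # all zeros
    print(yuv_to_rgb(0x00, 0x80, 0x80))  # full black
    print(yuv_to_rgb(0xFF, 0x80, 0x80))  # full white
```

<p>With these coefficients, all-zero YUV comes out as RGB (0, 135, 0): red and blue are pushed below zero and clamp to 0, while both chroma terms add to green. Setting U and V to 0x80 removes the color difference entirely, which is why (0x00, 0x80, 0x80) is black and (0xFF, 0x80, 0x80) is white.</p>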
<p><strong>Further Reading:</strong></p>
<ul>
<li><a href="http://wiki.multimedia.cx/index.php?title=YCbCr">YC<sub>b</sub>C<sub>r</sub></a> at the MultimediaWiki</li>
<li><a href="http://wiki.multimedia.cx/index.php?title=Category:YCbCr_Formats">YC<sub>b</sub>C<sub>r</sub> Formats </a> category page at the MultimediaWiki</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://multimedia.cx/eggs/yuv-and-rgb/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Palette Communication</title>
		<link>http://multimedia.cx/eggs/palette-communication/</link>
		<comments>http://multimedia.cx/eggs/palette-communication/#comments</comments>
		<pubDate>Thu, 08 Nov 2007 04:31:39 +0000</pubDate>
		<dc:creator>Multimedia Mike</dc:creator>
				<category><![CDATA[Codec Technology]]></category>
		<category><![CDATA[Open Source Multimedia]]></category>
		<category><![CDATA[colors]]></category>
		<category><![CDATA[multimedia]]></category>
		<category><![CDATA[palette]]></category>

		<guid isPermaLink="false">http://multimedia.cx/eggs/palette-communication/</guid>
		<description><![CDATA[If there is one meager accomplishment I think I can claim in the realm of open source multimedia, it would be as the point-man on palette support in xine, MPlayer, and FFmpeg. Problem statement: Many multimedia formats -- typically older formats -- need to deal with color palettes alongside compressed video. There are generally three [...]]]></description>
			<content:encoded><![CDATA[<p>If there is one meager accomplishment I think I can claim in the realm of open source multimedia, it would be as the point-man on palette support in <a href="http://xinehq.de/">xine</a>, <a href="http://mplayerhq.hu/">MPlayer</a>, and <a href="http://ffmpeg.org/">FFmpeg</a>.</p>
<p><center><br />
<img src="http://wiki.multimedia.cx/images/3/34/Noia_palette.png" alt="Palette icon" /><br />
</center></p>
<p>Problem statement: Many multimedia formats -- typically older formats -- need to deal with color palettes alongside compressed video. There are generally three situations arising from paletted video codecs:</p>
<ol>
<li>The palette is encoded in the video codec's data stream. This makes palette handling easy since the media player does not need to care about ferrying special data between layers. Examples: <a href="http://wiki.multimedia.cx/index.php?title=Flic_Video">Autodesk FLIC</a> and <a href="http://wiki.multimedia.cx/index.php?title=VQA">Westwood VQA</a>.
</li>
<li>The palette is part of the transport container's header data. Generally, a modular media player will need to communicate the palette from the file demuxer layer to the video decoder layer via an out-of-band/extradata channel provided by the program's architecture. Examples: <a href="http://wiki.multimedia.cx/index.php?title=QuickTime_container">QuickTime files</a> containing <a href="http://wiki.multimedia.cx/index.php?title=Apple_QuickTime_RLE">Apple Animation (RLE)</a> or <a href="http://wiki.multimedia.cx/index.php?title=Apple_SMC">Apple Video (SMC)</a> data.
</li>
<li>The palette is stored separately from the video data and must be transported between the demuxer and the video decoder. However, the palette could potentially change at any time during playback. This can provide a challenge if the media player is designed with the assumption that a palette would only occur at initialization. Examples: <a href="http://wiki.multimedia.cx/index.php?title=Microsoft_Audio/Video_Interleaved">AVI files</a> containing paletted video data (such as <a href="http://wiki.multimedia.cx/index.php?title=Microsoft_RLE">MS RLE</a>) and <a href="http://wiki.multimedia.cx/index.php?title=Wing_Commander_III_MVE">Wing Commander III MVE</a>.
</li>
</ol>
<p>Transporting the palette from the demuxer layer to the decoder layer is only part of the battle. In some applications, such as FFmpeg, the palette data also needs to travel from the decoder layer to the video output layer, the part that creates a final video frame to either be displayed or converted. This used to cause a problem for the multithreaded ffplay component of FFmpeg. The original mechanism (that I put into place) was not thread-safe-- palette changes ended up occurring sooner than they were supposed to. The primary ffmpeg command line conversion tool is single-threaded so it does not have the same problem. xine is multi-threaded but does not suffer from the ffplay problem because all data sent from the video decoder layer to the video output layer must be in a YUV format, thus paletted images are converted before leaving the decoder layer. I'm not sure about MPlayer these days, but when I implemented a paletted format (FLIC), I rendered the data in higher bit depths in the decoder layer. I would be interested to know if MPlayer's video output layer can handle palettes directly these days.</p>
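<p>One way to keep a palette change from showing up too early is to let every decoded frame carry its own snapshot of the palette instead of sharing one mutable table between threads. The sketch below is only an illustration of that idea. <code>Packet</code>, <code>Frame</code>, <code>PalettedDecoder</code> and <code>render_rgb</code> are hypothetical names, not the API of xine, MPlayer, or FFmpeg, and the "decoding" is a pass-through of palette indices.</p>

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Palette = List[Tuple[int, int, int]]  # 256 (R, G, B) entries


@dataclass
class Packet:
    """Hypothetical demuxer output: data plus optional out-of-band palette."""
    indices: bytes                       # stand-in for compressed video data
    new_palette: Optional[Palette] = None


@dataclass
class Frame:
    """Hypothetical decoder output that owns a snapshot of its palette."""
    indices: bytes
    palette: Palette


class PalettedDecoder:
    def __init__(self, initial_palette: Palette):
        self.palette = list(initial_palette)

    def decode(self, packet: Packet) -> Frame:
        # A palette change takes effect starting with the packet that carries it.
        if packet.new_palette is not None:
            self.palette = list(packet.new_palette)
        # Copy the palette into the frame so that a later change cannot leak
        # backward into frames still waiting in the output queue.
        return Frame(packet.indices, list(self.palette))


def render_rgb(frame: Frame) -> List[Tuple[int, int, int]]:
    """Video output step: expand indices using the frame's own palette."""
    return [frame.palette[i] for i in frame.indices]


gray = [(i, i, i) for i in range(256)]
red = [(i, 0, 0) for i in range(256)]
decoder = PalettedDecoder(gray)
queued = [decoder.decode(Packet(bytes([10, 200]))),
          decoder.decode(Packet(bytes([10, 200]), new_palette=red))]
# Both frames are rendered only after the palette change has been decoded.
print([render_rgb(frame) for frame in queued])
```

<p>The first queued frame still renders in gray even though the decoder has already moved on to the red palette. The cost is a 256-entry copy per frame, which is small next to the frame itself.</p>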
<p>I hope this has been educational from a practical multimedia hacking perspective.</p>
]]></content:encoded>
			<wfw:commentRss>http://multimedia.cx/eggs/palette-communication/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>VQ Case Study: Textures</title>
		<link>http://multimedia.cx/eggs/vq-case-study-textures/</link>
		<comments>http://multimedia.cx/eggs/vq-case-study-textures/#comments</comments>
		<pubDate>Sat, 28 Apr 2007 01:47:37 +0000</pubDate>
		<dc:creator>Multimedia Mike</dc:creator>
				<category><![CDATA[Codec Technology]]></category>
		<category><![CDATA[Vector Quantization]]></category>
		<category><![CDATA[Video Codecs]]></category>

		<guid isPermaLink="false">http://multimedia.cx/eggs/vq-case-study-textures/</guid>
		<description><![CDATA[Per my understanding, a lot of 3D hardware operates by allowing the programmer to specify a set of vertices between which the graphics chip draws lines. Then, the programmer can specify that a bitmap needs to be plotted between some of those lines. In 3D graphics parlance, those bitmaps are called textures. More textures make [...]]]></description>
			<content:encoded><![CDATA[<p>Per my understanding, a lot of 3D hardware operates by allowing the programmer to specify a set of vertices between which the graphics chip draws lines. Then, the programmer can specify that a bitmap needs to be plotted between some of those lines. In 3D graphics parlance, those bitmaps are called textures. More textures make a game prettier, but a graphics card only has so much memory for storing these textures. In order to stretch the video RAM budget, some graphics cards allow for compressing textures using vector quantization.</p>
<p>A specific example of VQ in 3D graphics hardware is the <a href="http://en.wikipedia.org/wiki/Dreamcast">Sega Dreamcast</a> with its <a href="http://en.wikipedia.org/wiki/PowerVR">PowerVR2</a> graphics hardware. Textures can be specified in a number of pixel formats including, but not limited to, RGB555, RGB565, and VQ. In the VQ mode, a 256-entry vector codebook is initialized somewhere in video RAM. Each vector is 8 bytes large and specifies a 2x2 block of pixels in either RGB555 or RGB565 (can't remember which, or it might be configurable). For the texture in video RAM that is specified as VQ, each byte is actually an index into the codebook. Instant 8:1 compression, notwithstanding the 2048-byte codebook overhead which can be negligible depending on how many textures leverage the codebook and how large those textures are.</p>
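<p>As a rough sketch of what the hardware is doing conceptually, here is a software expansion of such a texture along with the compression arithmetic. It assumes the 2x2 blocks are stored in plain raster order and that each codebook entry lists its pixels as top-left, top-right, bottom-left, bottom-right; the real chip may well order things differently. <code>decode_vq_texture</code> and <code>vq_ratio</code> are names invented for this example.</p>

```python
def decode_vq_texture(indices, codebook, width, height):
    """Expand one-byte codebook indices into a height x width grid of 16-bit pixels.

    Assumes raster-ordered 2x2 blocks and codebook entries laid out as
    (top-left, top-right, bottom-left, bottom-right).
    """
    blocks_per_row = width // 2
    image = [[0] * width for _ in range(height)]
    for n, index in enumerate(indices):
        bx = (n % blocks_per_row) * 2
        by = (n // blocks_per_row) * 2
        tl, tr, bl, br = codebook[index]
        image[by][bx], image[by][bx + 1] = tl, tr
        image[by + 1][bx], image[by + 1][bx + 1] = bl, br
    return image


def vq_ratio(width, height, textures_sharing_codebook=1):
    """Compression ratio versus raw 16-bit pixels, counting the 2048-byte codebook."""
    raw = width * height * 2 * textures_sharing_codebook
    packed = (width // 2) * (height // 2) * textures_sharing_codebook + 256 * 8
    return raw / packed


codebook = [(i, i, i, i) for i in range(256)]  # toy codebook: flat 2x2 blocks
print(decode_vq_texture(bytes([1, 2, 3, 4]), codebook, 4, 4))
print(round(vq_ratio(256, 256), 2), round(vq_ratio(256, 256, 8), 2))
```

<p>A single 256x256 texture lands at roughly 7.1:1 once the codebook is counted. Eight such textures sharing one codebook get to about 7.9:1, which is what I mean by the overhead becoming negligible.</p>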
]]></content:encoded>
			<wfw:commentRss>http://multimedia.cx/eggs/vq-case-study-textures/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>VQ Case Study: RoQ</title>
		<link>http://multimedia.cx/eggs/vq-case-study-roq/</link>
		<comments>http://multimedia.cx/eggs/vq-case-study-roq/#comments</comments>
		<pubDate>Thu, 26 Apr 2007 14:26:58 +0000</pubDate>
		<dc:creator>Multimedia Mike</dc:creator>
				<category><![CDATA[Codec Technology]]></category>
		<category><![CDATA[Vector Quantization]]></category>
		<category><![CDATA[Video Codecs]]></category>

		<guid isPermaLink="false">http://multimedia.cx/eggs/vq-case-study-roq/</guid>
		<description><![CDATA[RoQ was first developed for the FMV-based adventure game The 11th hour and was later adopted by Id for the Quake III engine and derivative games. RoQ operates in a YUV 4:2:0 space. However, it was developed for a game released in the late 1994/early 1995 timeframe. Back then, cutting edge video was 640x480 at [...]]]></description>
			<content:encoded><![CDATA[<p>RoQ was first developed for the FMV-based adventure game <a href="http://www.mobygames.com/game/11th-hour"><em>The 11th hour</em></a> and was later adopted by Id for the <a href="http://www.mobygames.com/game/quake-iii-arena"><em>Quake III</em></a> engine and derivative games.</p>
<p>RoQ operates in a YUV 4:2:0 space. However, it was developed for a game released in the late 1994/early 1995 timeframe. Back then, cutting edge video was 640x480 at 256 colors or <em>maybe</em> 64K colors, and it was not feasible to take a large video frame and convert the entire thing from YUV -> RGB 30, 24, or even 15 times per second. However, RoQ's design solved some of these problems.</p>
<p><span id="more-429"></span></p>
<p>The first RoQ frame contains a full vector codebook. The codebook contains 2x2 pixel vectors and 4x4 pixel vectors. The remainder of the encoded stream contains codes for tiling these vectors in order to reconstruct an image. Further, interframes can take advantage of motion compensation. Frames also have the option of transmitting new vectors to be used in the codebook.</p>
<p>An important design decision is that the vectors are decoded into the codebook before any frame is reconstructed, so they can be converted to any format necessary at that point. I.e., if the graphics hardware demanded RGB555 or RGB565, as was probably the original assumption for the codec, the decoder can convert all the vectors from YUV 4:2:0 to the appropriate colorspace during the file initialization. Then it can simply tile using these vectors without concerning itself with further colorspace conversion unless replacement vectors show up in successive frames.</p>
<p>However, since it is a proper YUV colorspace, <a href="http://ffmpeg.org/">FFmpeg</a> just decodes and outputs the raw YUV 4:2:0 data.</p>
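<p>The convert-once-then-tile idea can be sketched in a few lines. Everything here is a simplification under stated assumptions: each 2x2 vector is taken to be four luma samples plus one shared U and V, the color constants are the full-range BT.601 ones, and the "stream" is just a raster-ordered list of codebook indices with no motion compensation or 4x4 vectors. The function names are hypothetical.</p>

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 0..255 byte range."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb565(y, u, v):
    """Full-range BT.601-style conversion (an assumption), packed as RGB565."""
    r = clamp8(y + 1.402 * (v - 128))
    g = clamp8(y - 0.344136 * (u - 128) - 0.714136 * (v - 128))
    b = clamp8(y + 1.772 * (u - 128))
    return (r // 8) * 2048 + (g // 4) * 32 + (b // 8)


def convert_codebook(yuv_vectors):
    """Convert every 2x2 vector once, up front.

    Each input vector is assumed to be (y0, y1, y2, y3, u, v).
    """
    converted = []
    for y0, y1, y2, y3, u, v in yuv_vectors:
        converted.append(tuple(yuv_to_rgb565(y, u, v) for y in (y0, y1, y2, y3)))
    return converted


def tile(codes, codebook, blocks_per_row):
    """Paste pre-converted 2x2 vectors into a frame; no color math per pixel."""
    rows = (len(codes) // blocks_per_row) * 2
    frame = [[0] * (blocks_per_row * 2) for _ in range(rows)]
    for n, code in enumerate(codes):
        bx, by = (n % blocks_per_row) * 2, (n // blocks_per_row) * 2
        tl, tr, bl, br = codebook[code]
        frame[by][bx:bx + 2] = [tl, tr]
        frame[by + 1][bx:bx + 2] = [bl, br]
    return frame


rgb_codebook = convert_codebook([(0, 0, 0, 0, 128, 128),
                                 (255, 255, 255, 255, 128, 128)])
print(tile([0, 1, 1, 0], rgb_codebook, 2))
```

<p>The colorspace work scales with the size of the codebook rather than the size of the frame, which is the whole point on a 486-class machine.</p>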
<p>Wired ran a portion of the <a href="http://www.wired.com/wired/archive/3.08/shipping.html">codec's creator's diary</a> (one <a href="http://www.mobygames.com/developer/sheet/view/developerId,2913/">Graeme Devine</a>) that describes how imperative it was to squeeze performance out of the encoder. When you have a lot of video to encode and you wish to meet a shipping deadline, AND you are encoding using VQ, performance tweaks count:</p>
<blockquote><p>A technical Daily Report. Sorry, it's the way I'm thinking right now.</p>
<p>Well, I didn't think that I could meet my goal of knocking a second out of drawing 100 dense quad structures, but I did. Twice. Tonight it's down to 5.7 seconds. A total of two seconds faster.</p>
<p>What does this mean? A dense quad is around 10 times larger than a typical frame chunk quad - it contains essentially lossless data. These dense quads make good profiling data, since they overload the overhead in the decoder nicely.</p>
<p>Most of the speedups came by fixing 486 CPU stalls. The 486 will "stall" and wait for a cycle if the previous machine instruction is somehow required for the next. Fixing stalls is tricky. Writing stall-free code is almost impossible. You have to think below the assembly-code level and think about how the CPU is dealing with data. </p></blockquote>
<p>An observation regarding the inversely proportional complexities of the encoder and the decoder:</p>
<blockquote><p>Verrry, verry tricky right now. The encoder is now getting as complex as a MPEG encoder, which is making decoding simpler (as opposed to MPEG). Basically cutting down drastically on memory usage per frame for the same rez frame.</p></blockquote>
<p>VQ houses at the time were obsessive about getting ahold of the fastest hardware on the market (things like high-end 486 computers or very early Pentiums):</p>
<blockquote><p>
Encoding is going slowly because if we add machines it crashes. Those super-fast HP systems would make a real difference right now - we need some number crunchers online asap!</p></blockquote>
<p>...and...</p>
<blockquote><p>
Encoding video is one of the few problems you can throw fast hardware at and get results.</p>
<p>Today we got in 3 HP 735/125 systems. They run fast, not just fast in fact, but stinkingly disgustingly fast. Around three times the speed of a 90-MHz Pentium, probably faster if they were not network and disk i/o bound. It's the fastest piece of hardware I've ever used. Between the Envoy and the 735, this has been a great week for new toys. The 735 machines will allow 11h to ship much sooner. </p></blockquote>
<p>And here's the part where he must have added the code to convert the YUV 4:2:0 vectors to RGB24 when unpacking, along with the appropriate copying code:</p>
<blockquote><p>
Tonight I added true 24-bit support for 640-by-320-by-30fps at 24-bit from a 150K/second drive. Of course, you'll need a Pentium system for this to work, but it looks unbelievable. 24-bit support gives us better-than-TV color depth and a picture that finally looks non-blocky, non-pixelly, and about as good as it gets. This makes The 11th Hour the first game with true full-motion video software playback at 30fps and the first 24-bit fmv on a home-class computer.
</p></blockquote>
<p>After all the R&#038;D, these were the final specs decided for the shipping version of <em>The 11th Hour</em>:</p>
<blockquote><p>
At first we supported all the way from 32-bit color down to 8-bit color, but, after surveying the video cards sold, installed bases, and how the game looks, we're only going to support those with a 16-bit or 24-bit DAC at 640-by-320 or 320-by-160.
</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://multimedia.cx/eggs/vq-case-study-roq/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>VQ Case Study: Sorenson Video 1</title>
		<link>http://multimedia.cx/eggs/vq-case-study-sorenson-video-1/</link>
		<comments>http://multimedia.cx/eggs/vq-case-study-sorenson-video-1/#comments</comments>
		<pubDate>Wed, 25 Apr 2007 16:37:32 +0000</pubDate>
		<dc:creator>Multimedia Mike</dc:creator>
				<category><![CDATA[Codec Technology]]></category>
		<category><![CDATA[Vector Quantization]]></category>
		<category><![CDATA[Video Codecs]]></category>

		<guid isPermaLink="false">http://multimedia.cx/eggs/vq-case-study-sorenson-video-1/</guid>
		<description><![CDATA[Sorenson Video 1 (SVQ1) makes me sentimental. It had a lot to do with why I started multimedia hacking. Strange that it all seems so simple now. SVQ1 is a stark contrast to our last subject, Cinepak. SVQ1 does not store its codebooks in the encoded video bitstream. Rather, the codebooks are a hardwired characteristic [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://wiki.multimedia.cx/index.php?title=Sorenson_Video_1">Sorenson Video 1</a> (SVQ1) <a href="http://multimedia.cx/eggs/the-star-wars-quicktime-connection/">makes me sentimental</a>. It had a lot to do with why I started multimedia hacking. Strange that it all seems so simple now.</p>
<p>SVQ1 is a stark contrast to our last subject, <a href="http://multimedia.cx/eggs/vq-case-study-cinepak/">Cinepak</a>. SVQ1 does not store its codebooks in the encoded video bitstream. Rather, the codebooks are a hardwired characteristic of the coding scheme. That's actually a really good thing considering that the algorithm is a hierarchical multistage vector quantizer.</p>
<p><span id="more-426"></span></p>
<p><em>"Whoa! Just when I was getting over my aversion to the words 'vector' and 'quantizer' strung together, they throw in 'hierarchical' and 'multistage'."</em> Don't worry, it's also pretty simple even though it sounds complicated. "Hierarchical", in this context, simply means that a block of pixels is broken down into smaller blocks of pixels. In SVQ1, the block size starts at 16x16. The encoder may decide to encode a block at that size, or slice it in half, into 2 16x8 blocks, and code each of those individually. The encoder may further subdivide the blocks to obtain 8x8, 8x4, 4x4, and 4x2 sizes.</p>
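<p>The chain of block sizes falls out of alternately halving the height and the width. A tiny sketch, with <code>block_sizes</code> being a name made up for this example:</p>

```python
def block_sizes(width=16, height=16, smallest=(4, 2)):
    """List the block sizes produced by halving height first, then width, and so on."""
    sizes = [(width, height)]
    halve_height = True
    while (width, height) != smallest:
        if halve_height:
            height //= 2
        else:
            width //= 2
        halve_height = not halve_height
        sizes.append((width, height))
    return sizes


print(block_sizes())
```

<p>That walks from 16x16 through 16x8, 8x8, 8x4 and 4x4 down to 4x2, six levels in all.</p>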
<p>The "multistage" word refers to the fact that SVQ1 has several layers, or stages, of codebooks that can be combined in order to reconstruct a given vector. In fact, for each block size from 4x2 up through 8x8, there are 6 stages of codebooks, and each codebook contains 16 vectors.</p>
<p>SVQ1 has one trick up its sleeve that I have never seen used anywhere else so far: Vector <a href="http://wiki.multimedia.cx/index.php?title=Mean_Removal">mean removal</a>. Again, this may sound fancy, but all it means is to average all the numbers in a vector and subtract the average from each number. If all of the numbers in the vector are relatively similar, the mean-removed vector will have rather small components and thus be more efficient to code.</p>
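<p>Here is mean removal on a made-up 4x2 block of nearly flat samples. The integer (floor) mean is my assumption for the sake of the example, not a statement about how the codec rounds.</p>

```python
def remove_mean(vector):
    """Split a vector into its integer mean and the mean-removed residual."""
    mean = sum(vector) // len(vector)
    return mean, [sample - mean for sample in vector]


def restore_mean(mean, residual):
    """Undo remove_mean by adding the mean back to every component."""
    return [sample + mean for sample in residual]


block = [200, 202, 199, 201, 203, 198, 200, 201]  # a flat-ish 4x2 block
mean, residual = remove_mean(block)
print(mean, residual)
print(restore_mean(mean, residual) == block)
```

<p>The samples hover around 200, but the residual only spans -2 to 3, and small numbers like those are much cheaper to represent.</p>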
<p>Remember how a major problem in VQ deals with matching vectors from a source image to the best possible vector from a generated codebook? This problem is exacerbated in SVQ1 due to the layers of codebooks that the encoder must evaluate in combination with the hierarchy. The encoder must remove the mean for a vector of samples and encode the mean-removed vector using the sum of between 0 and 6 vectors from different layers of codebooks, or decide to split the block in half and repeat the process, or find some combination of hierarchical encoding combined with codebook sequences, and then move on to the next vector.</p>
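<p>To give a feel for the search, here is a greedy sketch of the multistage part alone: remove the mean, then let each stage pick whichever of its 16 vectors best cancels the error left by the earlier stages, stopping when a stage no longer helps. This is one plausible strategy, not the method of any real SVQ1 encoder. The codebooks are random stand-ins for the real hardcoded tables, and the decision to split a block is left out entirely.</p>

```python
import random


def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_multistage(vector, stages):
    """Greedy search: returns the mean and one codebook index per stage used."""
    mean = sum(vector) // len(vector)
    target = [s - mean for s in vector]   # mean-removed vector still to be matched
    zero = [0] * len(vector)
    choices = []
    for codebook in stages:
        best = min(range(len(codebook)),
                   key=lambda i: squared_error(target, codebook[i]))
        # Stop if even the best vector of this stage would not reduce the error.
        if squared_error(target, codebook[best]) >= squared_error(target, zero):
            break
        choices.append(best)
        target = [t - c for t, c in zip(target, codebook[best])]
    return mean, choices


def decode_multistage(mean, choices, stages, length):
    """Rebuild a vector as the mean plus the sum of the chosen stage vectors."""
    out = [mean] * length
    for codebook, index in zip(stages, choices):
        out = [o + c for o, c in zip(out, codebook[index])]
    return out


# Hypothetical codebooks: 6 stages of 16 random 8-component vectors,
# with smaller magnitudes in later stages.
random.seed(1)
stages = [[[random.randint(-(32 // 2 ** k), 32 // 2 ** k) for _ in range(8)]
           for _ in range(16)] for k in range(6)]

block = [200, 202, 199, 201, 203, 198, 200, 201]
mean, choices = encode_multistage(block, stages)
rebuilt = decode_multistage(mean, choices, stages, len(block))
print(mean, choices, rebuilt)
```

<p>Note how lopsided the work is: the decoder just adds up a handful of table entries, while the encoder evaluates 16 candidates per stage, and a real encoder would also have to weigh all of that against splitting the block.</p>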
<p>SVQ1 interframes can also take advantage of motion vectors and the residual between a block from the previous frame and the current frame is encoded with the same intraframe hierarchical multistage method.</p>
<p>It all sounds like a lot of trouble to encode, but the reward was that relatively slow machines from the 1998-1999 timeframe could easily play back, e.g., 480x212 video on CPUs that would still struggle with MPEG-type data at a similar resolution, and the files were feasibly distributable via the dialup-dominated internet.</p>
<p>Where do the codebooks come from? As mentioned, the codebooks are a hardcoded component of the SVQ1 coding scheme. I strongly suspect (or would like to think) that the codebooks were constructed as a result of a long and arduous design phase that involved countless pieces of sample media drawn from numerous diverse genres of material. So out of the 2 classic VQ problems (finding optimal codebooks and matching source material to the codebooks), at least that's one less problem that the encoder needs to worry about. I, for one, am thankful for that since SVQ1 is, to date, the only encoder I have written for the <a href="http://ffmpeg.org/">FFmpeg project</a>. Encoding the codebooks would also incur huge overhead in the final encoded bitstream.</p>
<p>SVQ1 has a few weaknesses, to be sure (or "trade-offs" to be neutral about it). I never appreciated the fact that the various planes are stored separately. All the Y data is encoded in the stream, followed by all the U data, and then the V data. This means that the entire frame must be decoded before any of it may be processed (e.g., converted to RGB) for display. Owing to its large vector size (16x16) and also its native colorspace (<a href="http://wiki.multimedia.cx/index.php?title=YCbCr_4:1:0">YUV 4:1:0</a>, where a 4x4 block shares one U and one V sample), I can understand why it would be implausible to chop the data into macroblock-type units. However, it might have been nice for the bitstream header to encode relative offsets to the start of the U and V data segments. A decoder could have made use of this to optimize for things like slice dispatch.</p>
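<p>For reference, this is how the three planes of a decoded 4:1:0 frame would be laid out if stored back to back. Rounding the chroma dimensions up to cover partial 4x4 blocks is an assumption on my part, and <code>plane_layout_410</code> is a name made up for this example.</p>

```python
def plane_layout_410(width, height):
    """Return (offset, size) in bytes for the Y, U and V planes stored back to back.

    In 4:1:0 each 4x4 block of luma samples shares one U and one V sample.
    Chroma dimensions are rounded up, which is an assumption of this sketch.
    """
    chroma_w = (width + 3) // 4
    chroma_h = (height + 3) // 4
    y_size = width * height
    c_size = chroma_w * chroma_h
    return {"Y": (0, y_size),
            "U": (y_size, c_size),
            "V": (y_size + c_size, c_size)}


print(plane_layout_410(480, 212))
```

<p>In the decoded frame these offsets are trivially computable. In the compressed stream they are not, because the size of each encoded plane depends on the content, which is why offsets in the header would have been welcome.</p>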
<p>Another problem (that I perceive, and I'm pretty sure <a href="http://guru.multimedia.cx/">the guru</a> agrees with me on this point) is that SVQ1 uses a breadth-first algorithm for traversing the vector hierarchy, vs. a depth-first algorithm. It's a technical nitpick, sure, and I can't remember all the details. But I strongly recall that it was a headache for coding.</p>
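<p>To illustrate the difference in traversal order (and only that; this is not the actual SVQ1 bitstream syntax), take a 16x16 block whose top half is split again while the bottom half is not. The tree and the labels are invented for this example.</p>

```python
from collections import deque

# Each node is (label, children).
tree = ("16x16", [("16x8 top", [("8x8 top-left", []),
                                ("8x8 top-right", [])]),
                  ("16x8 bottom", [])])


def depth_first(node):
    """Visit a block, then everything beneath it, before moving to its sibling."""
    label, children = node
    order = [label]
    for child in children:
        order.extend(depth_first(child))
    return order


def breadth_first(root):
    """Visit all blocks of one level before any block of the next level."""
    order, queue = [], deque([root])
    while queue:
        label, children = queue.popleft()
        order.append(label)
        queue.extend(children)
    return order


print(depth_first(tree))
print(breadth_first(tree))
```

<p>Depth-first maps naturally onto a recursive function. Breadth-first forces the coder to keep a queue of pending blocks for the next level, which matches my memory of it being a headache.</p>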
<p>Through it all, SVQ1 is still one of my favorite codecs.</p>
<p><strong>Further Reading:</strong></p>
<ul>
<li><a href="http://wiki.multimedia.cx/index.php?title=Sorenson_Video_1">Description of the Sorenson Video 1 codec at the MultimediaWiki</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://multimedia.cx/eggs/vq-case-study-sorenson-video-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

