Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


Archives:

Announcing the World’s Worst VP8 Encoder

October 4th, 2010 by Multimedia Mike

I wanted to see if I could write an extremely basic VP8 encoder. It turned out to be one of the hardest endeavors I have ever attempted (and arguably one of the least successful).

Results
I started with the Big Buck Bunny title image:



And this is the best encoding that this experiment could yield:



Squint hard enough and you can totally make out the logo. Pretty silly effort, I know. It should also be noted that the resultant .webm file holding that single 400×225 image was 191324 bytes. When FFmpeg decoded it to a PNG, it was only 187200 bytes.

The Story
Remember my post about a naive SVQ1 encoder? Long story short, I set out to do the same thing with VP8. (I wanted to do the same thing with VP3/Theora for years. But take a good look at what it would entail to create even the most basic bitstream. As involved as VP8 may be, its bitstream is absolutely trivial compared to VP3/Theora.)
Read the rest of this entry »

Posted in Outlandish Brainstorms, VP8 | 4 Comments »

My Own Offline RSS Reader

March 28th, 2010 by Multimedia Mike

My current living situation saddles me with a rather lengthy commute. More time to work on my old Asus Eee PC 701 (still can’t think of a reason to get a better netbook). It would be neat if I could read RSS feeds offline using Ubuntu-based Linux on this thing. But with all the Linux software I can find, that’s just not to be. I think the best hope I had was Google Reader in offline mode using Google Gears. But I couldn’t get it installed in Firefox and the Linux version of Gears doesn’t support the Linux version of Chrome. I did a bunch of searching beside and all I could find were forum posts with similar laments: Offline RSS readers don’t allow you to read things offline. Actually, to be fair, I think these offline RSS readers operate exactly as advertised: They allow you to read the RSS feeds offline. The problem is that an RSS feed doesn’t usually contain much meat, just a title, a synopsis, and a link to the main content. What I (and, I suspect, most people) want in an “offline reader” is a program that follows those links, downloads the HTML pages, and downloads any supporting images and stylesheets, all for later browsing.

I didn’t want to have to reinvent this particular wheel, but here goes.

Here’s the pitch: Create a text file with a list of RSS feeds. Create a Python script that retrieves each. Use Python’s XML capabilities (which I have already had success with) to iterate through each item in an RSS feed. For each item, parse the corresponding link. Fetch the link and parse through the HTML. For each CSS, JS, or IMG reference, download that data as well. Compute a hash of that supporting data and replace the link with that hash. Dump that data in a local SQLite database (you knew that was coming). Dump the modified HTML page into that database as well.

Part 2 is to create a Python-based webserver that serves up this data from a localhost address.

One nifty aspect of this idea is that my Eee PC does not have to do the actual RSS updating. If the relevant scripts and the SQLite database are stored on a Flash drive, the updating process can be run on any system with standard Python.

See Also:

  • Part 2, where I get this idea to a minimally functioning state
  • GhettoRSS, what I eventually called the software when I released it

Posted in Outlandish Brainstorms | 7 Comments »

Designing a Codec For Parallelized Entropy Decoding

February 9th, 2010 by Multimedia Mike

When I was banging my head hard on FFmpeg’s VP3/Theora VLC decoder, I kept wondering if there’s a better way to do it. Libtheora shows us the way in that there obviously must be, since it can decode so much faster than FFmpeg.

In the course of brainstorming better VLC approaches, I wondered about the concept of parallelizing the decoding operation somehow. This sounds awfully crazy, I know, and naturally, it’s intractable to directly parallelize VLC decoding since you can’t know where item (n+1) begins in the bitstream until you have decoded item (n). I thought it might be plausible to make a pass through the bitstream decoding all of the tokens and ancillary information into fixed lenth arrays and then make a second pass in order to sort out the information in some kind of parallelized manner. Due to the somewhat context-adaptive nature of Theora’s VLCs, I don’t know if I can make a go of that.

Why can’t VLC decoding be done in parallel, either through SIMD or thread-level parallelization? I think the answer is clearly, “because the codecs weren’t designed for it”. Here’s the pitch: Create a codec bitstream format where the frame header contains indices pointing into the bitstream. For example, Sorenson Video 1 encodes the Y, U, and V planes in order and could be effectively decoded in parallel if only the frame header encoded offsets to the start of the U and V planes. Similarly, if a Theora frame header encoded offsets to certain coefficient groups in the header, that could facilitate multithreaded coefficient VLC decoding. Some other modifications would need to occur in the encoder, e.g., guaranteeing that no end-of-block runs cross coefficient group boundaries as they often do now.

Taking this a step farther, think about arithmetic/range coding. Decoding a bitstream encoded with such an algorithm notoriously requires a multiplication per value decoded. How about designing the bitstream in such a way that 2 or 4 multiplications can be done in parallel through any number of SIMD instruction sets?

Do these ideas have any plausibility? I have this weird feeling that if they do, they’re probably already patented. If they aren’t, well, maybe this post can serve as prior art.

Posted in Outlandish Brainstorms | 2 Comments »

Better Compression For ISOs

December 19th, 2009 by Multimedia Mike

Pursuant to my project idea of archiving a bunch of obscure, old, CD-ROM-based games, I got an idea about possibly compressing ISO-9660 filesystems more efficiently than could be done with standard lossless compressors such as gzip, bzip2, and lzma. This is filed under “outlandish brainstorms” since I don’t intend to get around to doing this anytime in the near future, but I wanted to put the idea out there anyway.

Game developers throughout the era of optical disc-based games have been notoriously lazy about data efficiency. A typical manifestation of this is when one opens a disc to find dozens or hundreds of sizable, uncompressed WAV audio files. Same goes for uncompressed graphical assets. General lossless compressors are able to squeeze some bytes out of uncompressed audio and video data but specialized algorithms (like FLAC for audio and PNG for images) perform better.

Here’s the pitch: Create a format that analyzes individual files in an ISO filesystem and uses more appropriate compression algorithms. If there’s a 16-bit PCM WAV file, use FLAC to compress it. If there’s an uncompressed BMP file, compress as a PNG file. Stuff them all in a new file with an index and a mapping to their original files.

As an additional constraint, it obviously needs to be possible to reconstruct a bit-exact ISO from one of these compressed images. So the index will need to store information about what sector ranges different files occupied.

The only other attempt I have seen for a specialized ISO compression format is the CISO format used for compressing ISO filesystems for storage on flash memory to be read in PSP units. That format uses zlib to compress sector by sector which has the advantage of being able to mount and use a CISO image in place without decompressing the entire filesystem. That should be possible for this filesystem a well by using a FUSE driver. Mount the filesystem and the driver presents the file tree upon request and decompressed the files on demand.

Many compression algorithms have assorted methods of operation depending on what is appropriate for a particular range of data. This scheme is merely an extension of that principle. I have no idea if this idea would hold water in the general case. But thanks to my archiving brainstorm, I expect I will have quite a lot of data to analyze.

See Also:

Posted in Outlandish Brainstorms | 11 Comments »

FATE Opportunities In The Cloud

August 20th, 2009 by Multimedia Mike

Just a few months ago, I couldn’t see any value in this fad known as cloud computing. But then, it was in the comments for that blog post that Tomer Gabel introduced me to the notion of using cloud computing for, well, computing. Prior to that, I had only heard of cloud computing as it pertained to data storage. A prime example of cloud computing resources is Amazon’s Elastic Compute Cloud (EC2).

After reading up on EC2, my first thought vis-à-vis FATE was to migrate my 32- and 64-bit x86 Linux compilation duties to a beefy instance in the cloud. However, the smallest instance currently costs $73/month to leave running continously. That doesn’t seem cost effective.

My next idea was to spin up an instance for each 32- or 64-bit x86 Linux build on demand when there was new FFmpeg code in need of compilation. That would mean 20 separate instances right now, instances that each wouldn’t have to run very long. This still doesn’t seem like a very good idea since instance computing time is billed by the hour and it’s rounded up. Thus, even bringing up an instance for 8 minutes of build/test time incurs a full billable hour. I’m unclear on whether bring-up/tear-down cycles of, say, 1 minute each, are each billed as separate compute hours, but it still doesn’t sound like a worthwhile solution no matter how I slice it.

A different architecture occurred to me recently: For each new code revision, spin up a new, single core compute instance and run all 20 build/test configurations serially. This would at long last guarantee that each revision is being built, at least for 32- and 64-bit x86 Linux. It’s interesting to note that I could easily quantify the financial cost of an SVN commit– if it took, say, 2-3 hours to build and test the 20 configs, that would amount to $.30 per code commit.

Those aren’t the only costs, though. There are additional costs for bandwidth, both in and out (at different rates depending on direction). Fortunately, I designed FATE to minimize bandwidth burden, so I wouldn’t be worried about that cost. Understand, though, that data on these compute instances is not persistent. If you need persistent storage, that’s a separate service called Elastic Block Store. I can imagine using this for ccache output. Pricing for this service is fascinating: Amazon charges for capacity used on a monthly basis, naturally, but also $.10 per million I/O requests.

This is filed under “Outlandish Brainstorms” for the time being, mostly because of the uncertain and possibly intractable costs (out of my own pocket). But the whole cloud thing is worth keeping an eye on since it’s a decent wager that prices will only decline from this point on and make these ideas and others workable. How about a dedicated instance loaded with the mphq samples archive, iterating through it periodically and looking for crashers? How about loading up a small compute instance’s 160 GB of space with a random corpus of multimedia samples found all over the web (I think I know people who could hook us up) which would start periodically and process random samples from the corpus while tracking and logging statistics about performance?

Posted in Outlandish Brainstorms | 1 Comment »

iOpenAppStoreMetaData

August 2nd, 2009 by Multimedia Mike

For my latest silly little project, I created an offline database containing information about Apple’s iPhone App Store, made it browseable offline with full text search, and even generated some nifty charts and tables about the apps. Wanna see?

Go here for all of the juicy data.

Introduction
Have you ever gotten one little spark of an idea and started to research and prototype it, only to have it snowball into something absolutely unrecognizable in a short period of time? As I write up this idea, I can scarcely remember how I started down the path of creating an offline, web browser-accessible version of Apple’s iPhone App Store. On the way, I learned a bunch about modern web programming and even full text search.

What’s wrong with me anyway? Why can’t I do simple exercises to come up to speed on certain well-established concepts? I think normal people would start out by developing trivial websites showcasing pictures of their cats when trying to get up to speed on modern web development. But no, not me. No, I just have to punish myself by dreaming up the most outlandish scenario in which to learn some technology, purely as an ancillary goal to a bizarre primary focus.

The Pitch
Look, here’s how it happened: Remember, I contribute heavily to this video game database called MobyGames. The database excels in archiving data about arcane systems and obscure, archaic titles. That, and Barbie games, thanks to my tireless efforts. The database has challenges keeping up with all the latest and greatest releases for all kinds of systems. However, this whole iPhone App Store thing is really throwing us a curveball. In just a little over a year, over 13,000 unique titles categorized as “games” have accumulated for the iPhone/iPod Touch system. At the time of this writing, MobyGames has cataloged a meager 111 of these titles.

Read the rest of this entry »

Posted in Outlandish Brainstorms | 10 Comments »

FATE on Twitter?

March 27th, 2009 by Multimedia Mike

For the most part, I haven’t been able to abide the notion of Twitter, where people “micro-blog” little messages of up to 140 characters each. It just seems like a bunch of “look at me” nonsense with no real purpose. But then, I used to think the same about blogs in general until I found a few interesting blogs. Lately, I have finally found a few interesting people to follow on Twitter.


Twitter logo

Then I read mentions of “Twitter clients” and quickly realized that there is an entire software ecosystem around these little messages. Indeed, there is an officially sanctioned HTTP/REST-based API for writing your own client apps.

Naturally, where I’m going with this is: Would it be useful to adapt FATE to post, ahem, “tweets” regarding state transitions (something either broke or got fixed with an individual build)? Would anyone care, i.e., does anyone already actively follow Twitter? I’m getting close to the point where I believe I can implement an email notification system, most likely to a separate mailing list. But this new channel might not be too difficult to implement at the same time. (Actually, I’m still trying to figure out from the documentation whether or not it’s possible to post a new message through the API; I can’t find the right function, or perhaps I just don’t understand all the Twitter-specific jargon yet.)

This is a little more outlandish, but as I was looking at a list of tweets today, I suddenly wondered about the possibility of sending encrypted messages through such a channel. I’m not the only person who was curious. This person beat me to the brainstorm, even going so far as to hack up a proof of concept that encodes a message of arbitrary length into multiple tweets.

Posted in FATE Server, Outlandish Brainstorms | 7 Comments »

Escape From HappyFaceLand

March 4th, 2009 by Multimedia Mike

I delved deep into my personal programming archives and was reminded of the brief stint I served at my dream job as a video game developer. The game I worked on was entitled Escape From HappyFaceLand. The stories you have no doubt heard about the game industry are true– Ridiculous hours, breakneck development cycles, struggles with arcane gaming technology, haphazard coding just to get something that barely works in time to meet an artificial deadline, only to be promptly forgotten about, even though the game is an unqualified success at release time.

Read the rest of this entry »

Posted in Outlandish Brainstorms | 6 Comments »

The Parallelized Elephants In The Room

February 23rd, 2008 by Multimedia Mike

I think it’s time to face up to the fact that this whole parallelization fad is probably not going to go away. There was a recent thread of ffmpeg-devel regarding the possibility of ‘porting’ FFmpeg to something called the Nvidia Tesla. This discussion rekindled a dormant interest I have regarding what optimization possibilities could be in store for the Cell processor on board the Sony PlayStation 3, and whether there should be effort directed toward making FFmpeg capable of using such features.

SPE Element SPE Element
SPE Element PPE Element PPE Element SPE Element
SPE Element SPE Element

I finally took some time to read through many of the basic and advanced tutorials on offer and finally have a feel for what the system is set up to do. Unfortunately, it’s not always clear what these parallel architectures are capable of, a situation only exacerbated by vague, impenetrable marketing materials. Too many people confuse the Cell architecture with a homogenous multiprocessor environment, as is common today, which is simply not the case. In order to take advantage of the machine’s full power, an app has to be written with a special awareness of the fact that the Cell has a primary core (PPE) and 6 little helper coprocessors (SPEs), as is half-heartedly illustrated above. The PPE is a dual-threaded general-purpose CPU (64-bit PowerPC) and can do anything. Meanwhile, each SPE is essentially another 64-bit PPC that has its own pool of 256 kilobytes of memory (LS) and a special memory controller (MFC) that coordinates contact with the outside world. To take advantage of the SPEs, the PPE has to load programs into their memory space and tell them to execute the code. The Cell also features DMA facilities to efficiently shuttle data between main memory and the SPEs’ local memory, and there are mailbox facilities and interrupts to facilitate communication between the PPE and the SPEs.

I don’t know about a general parallelized architecture for FFmpeg that would take advantage of multiple architectures like Cell and Tesla (because I still can’t figure out how Tesla is supposed to work). However, in a media playback application, it might be possible to assign one SPE the task of decoding perceptual audio. Another SPE might be performing inverse transform operations for a video codec, while another SPE does postprocessing and yet another handles YUV -> RGB conversion. On the opposite end, it seems reasonable that SPEs could be put to work at tasks like motion estimation for video encoding.

Would this qualify as a Google Summer of Code project for FFmpeg? There is precedent for this– see “Development assistant for the ‘Ghost’ audio codec” which was essentially a lab rat for Monty’s (of Vorbis fame) newer audio coding ideas. Fortunately, a prospective student would not require a PS3 for this project; just a Linux machine. For it seems that IBM has a freely downloadable tool called the Cell Simulation Environment. I’m still working on getting the program running (it’s distributed as an RPM and is most happy on a Red Hat system).

I am a little surprised that there is not a PS3 Media Center project, in the spirit of the Xbox Media Center, at least not that I have been able to locate via web searches. I have been pondering the technical plausibility of such an endeavor. It almost seems as though the PS3 gives the guest OS just enough of a confined playground environment that it can’t possibly blossom into a reasonably high-end enviroment. While real-time video playback must be possible, is it possible to run at, say, full 1080p resolution at 30 fps? With all of the processing power, I trust that the Cell can handle any kind of video decoding, though I heard an unsubstantiated rumor once that it takes the PPE and 4 SPEs to decode HD H.264 video from a Blu-Ray disc. The PS3′s native HD player would have a slight advantage since it would presumably use the video hardware’s full feature set, which likely allows the PS3 to pass through raw 12-bit YUV data to be handled by the video hardware, in one way or another. In Linux under the hypervisor, you basically get to play with a big RGB frame buffer. That means that not only to you have to convert YUV -> RGB, but you also have to shuffle 2.5x as much raw video data to the video memory for each frame. That works out to upwards of 250 MB of data shuffling each second ((1920 * 1080 pixels/frame) * (4 bytes/pixel) * (30 frames/second)). I have read conflicting sources about whether it’s possible for Linux under the PS3 hypervisor to DMA data from main RAM to system RAM. Some sources contend that there is work ongoing while other sources claim that this feature was fixed in later firmware revisions (i.e., no longer possible).

One possible dealbreaker in the proposal to use the PS3′s guest OS mode to install Linux and a general purpose media player is that, from everything I have read, the hypervisor only allows the guest OS to output stereo audio. This might be a long shot, but perhaps it would be possible to transcode super-stereo (more than 2 channels) audio to Dolby Pro Logic II to be sent out to a capable decoder module. Hey, it’s sort of like true surround sound.

If you are interested in the hard technical details of running Linux on a PlayStation 3 and programming its Cell Processor, this directory at kernel.org seems to be fairly authoritative on the matter. The latest iteration of the tech documents (dated 2008-02-01) are here.

Posted in Outlandish Brainstorms | 8 Comments »

Revenge Of PAVC

November 18th, 2007 by Multimedia Mike

Remember my old PAVC idea? I have been thinking about it again. As a refresher, this idea concerns efficiently and losslessly compressing RGB video frames output from an emulator for early video game systems such as the Nintendo Entertainment System (NES) and the Super NES. This time, I have been considering backing off my original generalized approach and going with a PPU-specific approach. (PPU stands for picture processing unit, which is what they used to call the video hardware in these old video games systems.) Naturally, I would want to start this experiment (again) with my favorite — nay — the greatest video game console of all time, the NES. Time for more obligatory, if superfluous, NES screenshots.


Little Samson (NES) game map
Little Samson
, all-around awesome game

Here’s the pitch: Modify an emulator (I’m working with FCE-Ultra) to dump PPU data to a file. Step 2 is to take that data and run it through a compression tool. What kind of data would I care about for step 1? On the first frame, dump out all of the interesting areas of the PPU memory space. This may sound huge, but it is only about 9-12 kilobytes, depending on the cartridge hardware. Also, dump the initial states of a few key PPU registers that are mapped into the CPU’s memory space. As the game runs, watch all of these memory and register values and log changes. This really isn’t as difficult as it sounds since FCEU already cares deeply when one of these values changes. When something changes, mark it as “dirty” and dump that value during the next scanline update.

With that data captured, the next challenge is to compress it. I am open to suggestions on how best to encode this change data. I would hope that we could come up with something a little better than shoving a frame of change data through zlib.

Decompression and playback would entail unraveling whatever was performed in step 2 above. Then, the decoder simulates the NES PPU by drawing scanline by scanline, and applying state change data between scanlines.

What are the benefits to this approach? Ideally, I am aiming not only for lossless compression, but for better compression than what is ordinarily achieved with the large files distributed via BitTorrent and coordinated at tasvideos.org. When I first started investigating this idea over 2 years ago, MPEG-4 variants were still popular for compressing the videos. These days, H.264 seems to have taken over, which performs much better, even on this type of data (allegedly, H.264′s 4×4 transform allows for lower artifacts on sharp edge data such as material from old video game consoles).


Sword Master (NES)
Sword Master
, mediocre game with great graphics

There are also some benefits from the perspective of NES purists. The most flexible NES emulators allow the user to switch palettes in order to get one that is “just right”. A decoder for this type of data could offer the same benefits.

Of course, an encoder is not much use without an analogous decoder that end users can easily install and use. I think this is less of an issue due to the possibility of creating a decoder in Flash or Java.

Posted in Outlandish Brainstorms, PAVC | 7 Comments »

« Previous Entries