Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


Archives:

Pushing Projects to Github

February 16th, 2012 by Multimedia Mike

I finally got around to importing some old projects into my Github account. I guess it’s good to have a backup out there in the cloud.

GhettoRSS
https://github.com/multimediamike/GhettoRSS
I describe this as a true offline RSS reader. Technically, it’s arguably not a true offline RSS reader. Rather, it does what most people actually want an offline RSS reader to do.

I wrote this about 2 years ago when I had a long daily train ride with a disconnected netbook. I quickly learned that I couldn’t count on offline RSS readers simply because most RSS feeds do not contain much meat. Thus, I created a program that follows the URLs in RSS feeds, downloads the web pages along with their supporting images and CSS files, and caches them in an offline database that can be read via a local web browser.
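
The core loop is simple enough to sketch. Here is a rough illustration of the idea (not GhettoRSS’s actual code), using the feedparser library and a throwaway SQLite cache; the feed URL is a placeholder, and the real program also fetches each page’s images and CSS and rewrites the links:

import sqlite3
import urllib.request

import feedparser

FEED_URL = "http://example.com/feed.xml"   # placeholder feed

# cache fetched articles in a local SQLite database for offline reading
db = sqlite3.connect("offline-cache.db")
db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html BLOB)")

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    # the feed item usually carries only a summary; fetch the full article
    with urllib.request.urlopen(entry.link) as response:
        html = response.read()
    db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (entry.link, html))

db.commit()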

I wrote more information about this little project 2 years ago (here is part 1 and here is part 2). I fixed a few bugs in preparation for posting it but I probably won’t work on this anymore since I don’t have any use for it (the commute is long gone, but I didn’t even use it when I was commuting because I decided I just didn’t care enough to read the feeds on the train).

xbfuse
https://github.com/multimediamike/xbfuse
This is a FUSE module for mounting Xbox/360 optical disc filesystems. Here is when I first discussed it. The tool has had its own little homepage for a long time. This tool has seen some development, as I learned from Googling for “xbfuse”. Regrettably, no one who has modified the tool has ever contacted me about it (at least, not that I can recall). This is unfortunate because the patches I have seen floating around which fix my xbfuse for various installations usually boil down to replacing many occurrences of an include path in the autotool-generated build system. There is probably a simpler, cleaner fix.

gcfuse
https://github.com/multimediamike/gcfuse
Written prior to xbfuse, this is a FUSE module for mounting GameCube optical disc filesystems. I first discussed this here and here. This tool has not seen too much direct development although someone eventually used it as the basis for WiiFuse which, as you can predict, mounts optical disc filesystems from Nintendo Wii games.

Posted in Game Hacking, Python | No Comments »

Samples RSS And Flashback Samples

December 21st, 2011 by Multimedia Mike

I made good on my claim that I would create an RSS feed for the samples repository.

Here is the link to the samples RSS feed [ http://samples.mplayerhq.hu/samples-rss.xml ]. Also, here is the Python source code I threw together for the task.
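
The general shape of such a script is easy to sketch (this is a simplified illustration, not the linked code): walk the samples tree, sort by modification time, and emit the newest files as RSS items. The local path is a placeholder.

import os
import xml.etree.ElementTree as ET
from email.utils import formatdate

SAMPLES_ROOT = "/path/to/samples"          # placeholder local path
BASE_URL = "http://samples.mplayerhq.hu/"  # public URL of the repository

# gather (mtime, path) pairs for everything in the tree, newest first
files = []
for dirpath, _, names in os.walk(SAMPLES_ROOT):
    for name in names:
        path = os.path.join(dirpath, name)
        files.append((os.path.getmtime(path), path))
files.sort(reverse=True)

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "New samples"
ET.SubElement(channel, "link").text = BASE_URL
ET.SubElement(channel, "description").text = "Newly added files in the samples archive"

for mtime, path in files[:20]:             # 20 newest files
    rel = os.path.relpath(path, SAMPLES_ROOT)
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = rel
    ET.SubElement(item, "link").text = BASE_URL + rel
    ET.SubElement(item, "pubDate").text = formatdate(mtime)

ET.ElementTree(rss).write("samples-rss.xml", encoding="utf-8", xml_declaration=True)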

I just want to check: I’m not the only person who still relies on RSS these days, right? The tech press has been cheerfully proclaiming its demise for some time now. But then, they have been proclaiming the same for Adobe Flash as well.

I’m no expert in RSS. If you have any suggestions for how to improve the features presented in the feed, please let me know. And, of course, keep the samples coming. This script should help provide more visibility for a broader audience.

Mario and Flashback Samples
Thanks to LuigiBlood, who sent in some samples that allowed me to test out my new script for automatically syncing the repositories and updating the samples RSS feed. First, there are CPC multimedia files from the Japanese 3DO port of Flashback: The Quest for Identity. Then, there is an Interplay MVE file on the CD version of Mario Teaches Typing whose video doesn’t decode correctly.

LuigiBlood also sent in another file from the latter game. It’s big and has the extension .AV. It could be a multimedia file as it appears to have a palette and PCM audio inside. But there’s no header and I’m a bit unsure about how to catalog it.

Posted in Game Hacking, Python | 14 Comments »

Basic Video Palette Conversion

August 19th, 2011 by Multimedia Mike

How do you take a 24-bit RGB image and convert it to an 8-bit paletted image for the purpose of compression using a codec that requires 8-bit input images? Seems simple enough and that’s what I’m tackling in this post.

Ask FFmpeg/Libav To Do It
Ideally, FFmpeg / Libav should be able to handle this automatically. Indeed, FFmpeg used to be able to, at least at the time I wrote this post about ZMBV and was unhappy with FFmpeg’s default results. Somewhere along the line, FFmpeg and Libav lost the ability to do this. I suspect it got removed during some swscale refactoring.

Still, there’s no telling if the old system would have computed palettes correctly for QuickTime files.

Distance Approach
When I started writing my SMC video encoder, I needed to convert RGB (from PNG files) to PAL8 colorspace. The path of least resistance was to match the pixels in the input image to the default 256-color palette that QuickTime assumes (and is hardcoded into FFmpeg/Libav).

How to perform the matching? Find the palette entry that is closest to a given input pixel, where “closest” is the minimum distance as computed by the usual distance formula (square root of the sum of the squares of the diffs of all the components).



That means for each pixel in an image, check the pixel against 256 palette entries (early termination is possible if an acceptable threshold is met). As you might imagine, this can be a bit time-consuming. I wondered about a faster approach…
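
In code, the matching step looks roughly like this (a minimal sketch; ‘palette’ is assumed to be a list of 256 (r, g, b) tuples such as the default QuickTime palette, and no early-termination threshold is shown):

def closest_palette_index(pixel, palette):
    r, g, b = pixel
    best_index = 0
    best_dist = float("inf")
    for i, (pr, pg, pb) in enumerate(palette):
        # squared Euclidean distance; the square root can be skipped
        # when the distances are only being compared against each other
        dist = (r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2
        if dist < best_dist:
            best_dist = dist
            best_index = i
    return best_index

def rgb_to_pal8(pixels, palette):
    """Map an iterable of (r, g, b) pixels to a list of palette indices."""
    return [closest_palette_index(p, palette) for p in pixels]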

Lookup Table
Read the rest of this entry »

Posted in General, Python | 14 Comments »

A Better Process Runner

December 31st, 2010 by Multimedia Mike

I was recently processing a huge corpus of data. It went like this: for each file in a large set, run 'cmdline-tool <file>', capture the output, and log the results to a database, including whether the tool crashed. I wrote it in Python. I have done this exact type of thing enough times in Python that I’m starting to notice a pattern.

Every time I start writing such a program, I always begin by using Python’s commands module because it’s the easiest thing to do. Then I always have to abandon the module when I remember the hard way that whatever ‘cmdline-tool’ is, it might run errant and try to execute forever. That’s when I import (rather, copy over) my process runner from FATE, the one that is able to kill a process after it has been running too long. I have used this module enough times that I wonder if I should spin it off into a new Python module.
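
The pattern itself is compact. Roughly this (a simplified sketch, not the actual FATE module, and with the CPU-time accounting omitted): launch the tool with its arguments as a list, poll it, and escalate from TERM to KILL if it outlives the time limit.

import subprocess
import time

def run_with_timeout(args, timeout):
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    deadline = time.time() + timeout
    while proc.poll() is None and time.time() < deadline:
        time.sleep(0.1)
    if proc.poll() is None:
        proc.terminate()                 # polite SIGTERM first
        time.sleep(1)
        if proc.poll() is None:
            proc.kill()                  # then SIGKILL for the truly stubborn
    stdout, stderr = proc.communicate()
    return proc.returncode, stdout, stderr

# e.g.: run_with_timeout(['cmdline-tool', 'some-file'], 60)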

Or maybe I’m going about this the wrong way. Perhaps when the data set reaches a certain size, I’m really supposed to throw it on some kind of distributed cluster rather than task it to a Python script (a multithreaded one, to be sure, but one that runs on a single machine). Running the job on a distributed architecture wouldn’t obviate the need for such early termination. But hopefully, such architectures already have that functionality built in. It’s something to research in the new year.

I guess there are also process limits, enforced by the shell. I don’t think I have ever gotten those to work correctly, though.
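
For reference, Python can also impose one kind of limit itself via the resource module; a sketch, with the caveat that RLIMIT_CPU caps CPU time rather than wall-clock time, so it will not catch a process that is merely blocked (‘cmdline-tool’ is the stand-in name from above, and the 60-second cap is arbitrary):

import resource
import subprocess

def limit_cpu():
    # runs in the child between fork and exec
    resource.setrlimit(resource.RLIMIT_CPU, (60, 60))

proc = subprocess.Popen(["cmdline-tool", "some-file"], preexec_fn=limit_cpu)
proc.wait()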

Posted in Python | 4 Comments »

FFmpeg and Code Coverage Tools

August 21st, 2010 by Multimedia Mike

Code coverage tools likely occupy the same niche as profiling tools: tools that you’re supposed to use somewhere during the software engineering process but probably never quite get around to using, usually because you’re too busy adding features or fixing bugs. But there may come a day when you wish to learn how much of your code is actually being exercised in normal production use. For example, the team charged with continuously testing the FFmpeg project would be curious to know how much code is being exercised, especially since many of the FATE test specs explicitly claim to be “exercising XYZ subsystem”.

The primary GNU code coverage tool is called gcov and is probably already on your GNU-based development system. I set out to determine how much FFmpeg source code is exercised while running the full FATE suite. I ran into some problems when trying to use gcov on a project-wide scale. I spackled around those holes with some very ad-hoc solutions. I’m sure I was just overlooking some more obvious solutions about which you all will be happy to enlighten me.
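
The core of such an ad-hoc approach can be as simple as running gcov over each C file in an instrumented build (compiled with -fprofile-arcs -ftest-coverage and then run through the test suite) and scraping the summary line from its output. A sketch, not my actual script:

import re
import subprocess

def coverage_for(c_file):
    """Return (percent_covered, line_count) as reported by gcov, or None."""
    out = subprocess.run(["gcov", c_file], capture_output=True, text=True).stdout
    match = re.search(r"Lines executed:\s*([\d.]+)% of (\d+)", out)
    if not match:
        return None
    return float(match.group(1)), int(match.group(2))

# e.g.: coverage_for("libavcodec/utils.c")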

Results
I’ve learned to cut to the chase earlier in blog posts (results first, methods second). With that, here are the results I produced from this experiment. This Google spreadsheet contains 3 sheets: The first contains code coverage stats for a bunch of FFmpeg C files sorted first by percent coverage (ascending), then by number of lines (descending), thus highlighting which files have the most uncovered code (ffserver.c currently tops that chart). The second sheet has files for which no stats were generated. The third sheet has “problems”. These files were rejected by my ad-hoc script.

Here’s a link to the data in CSV if you want to play with it yourself.

Using gcov with FFmpeg Read the rest of this entry »

Posted in FATE Server, Python | 10 Comments »

Brute Force Dimensional Analysis

July 14th, 2010 by Multimedia Mike

I was poking at the data files of a really bad (is there any other kind?) interactive movie video game known simply by one letter: D. The Sega Saturn version of the game is comprised primarily of Sega FILM/CPK files, about which I wrote the book. The second most prolific file type bears the extension ‘.dg2’. Cursory examination of sample files revealed an apparently headerless format. Many of the video files are 288×144 in resolution. Multiplying that width by that height and then doubling it (as in, 2 bytes/pixel) yields 82944, which happens to be the size of a number of these DG2 files. Now, if only I had a tool that could take a suspected raw RGB file and convert it to a more standard image format.

Here’s the FFmpeg conversion recipe I used:

 ffmpeg -f rawvideo -pix_fmt rgb555 -s 288x144 -i raw_file -y output.png

So that covers the files that are suspected to be 288×144 in dimension. But what about other file sizes? My brute force approach was to try all possible dimensions that would yield a particular file size. The Python code for performing this operation is listed at the end of this post.
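
The gist of it is short enough to sketch here (a simplified version, not the full script listed at the end): find every width/height pair that multiplies out to the file’s pixel count and let ffmpeg render each guess to a PNG.

import os
import subprocess
import sys

raw_file = sys.argv[1]
pixel_count = os.path.getsize(raw_file) // 2    # 2 bytes per rgb555 pixel

for width in range(1, pixel_count + 1):
    if pixel_count % width:
        continue
    height = pixel_count // width
    # one PNG per candidate resolution; absurd aspect ratios included
    subprocess.call([
        "ffmpeg", "-f", "rawvideo", "-pix_fmt", "rgb555",
        "-s", "%dx%d" % (width, height), "-i", raw_file,
        "-y", "guess-%dx%d.png" % (width, height),
    ])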

It’s interesting to view the progression as the script converts the file at the different candidate resolutions:



That ‘D’ is supposed to be red. So right away, we see that rgb555(le) is not the correct input format. Annoyingly, FFmpeg cannot handle rgb5[5|6]5be as a raw input format. But this little project worked well enough as a proof of concept.

If you want to toy around with these files (and I know you do), I have uploaded a selection at: http://multimedia.cx/dg2/.

Here is my quick Python script for converting one of these files to every acceptable resolution.

work-out-resolution.py:
Read the rest of this entry »

Posted in Game Hacking, Python | 13 Comments »

Multiprocess FATE Revisited

June 25th, 2010 by Multimedia Mike

I thought I had brainstormed a simple, elegant, multithreaded, deadlock-free refactoring for FATE in a previous post. However, I sort of glossed over the test ordering logic which I had not yet prototyped. The grim, possibly deadlock-afflicted reality is that the main thread needs to be notified as tests are completed. So, the main thread sends test specs through a queue to be executed by n tester threads and those threads send results to a results aggregator thread. Additionally, the results aggregator will need to send completed test IDs back to the main thread.



But when I step back and look at the graph, I can’t rationalize why there should be a separate results aggregator thread. That was added to cut down on deadlock possibilities since the main thread and the tester threads would not be waiting for data from each other. Now that I’ve come to terms with the fact that the main and the testers need to exchange data in realtime, I think I can safely eliminate the result thread. Adding more threads is not the best way to guard against race conditions and deadlocks. Ask xine.
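
Here is a minimal sketch of that simplified arrangement (placeholder logic, not FATE’s actual code): one queue carries test specs out to the tester threads and another carries results straight back to the main thread, with no middleman.

import queue
import threading

NUM_TESTERS = 4
test_queue = queue.Queue()
result_queue = queue.Queue()

def run_test(spec):
    # stand-in for actually building and running one test spec
    return (spec, "ok")

def tester():
    while True:
        spec = test_queue.get()
        if spec is None:             # sentinel: no more tests
            break
        result_queue.put(run_test(spec))

threads = [threading.Thread(target=tester) for _ in range(NUM_TESTERS)]
for t in threads:
    t.start()

test_specs = ["test-%d" % i for i in range(100)]
for spec in test_specs:
    test_queue.put(spec)
for _ in threads:
    test_queue.put(None)

# the main thread is notified directly as each test completes
for _ in test_specs:
    spec, status = result_queue.get()

for t in threads:
    t.join()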



Read the rest of this entry »

Posted in FATE Server, Python | 4 Comments »

Monster Battery Power Revisited

May 27th, 2010 by Multimedia Mike

So I have this new fat netbook battery and I performed an experiment to determine how long it really lasts. In my last post on the matter, it was suggested that I should rely on the information that gnome-power-manager is giving me. However, I have rarely seen GPM report more than about 2 hours of charge; even on a full battery, it only reports 3h25m when I profiled it as lasting over 5 hours in my typical use. So I started digging to understand how GPM gets its numbers and determine if, perhaps, it’s not getting accurate data from the system.

I started poking around /proc for the data I wanted. You can learn a lot in /proc as long as you know the right question to ask. I had to remember what the power subsystem is called — ACPI — and this led me to /proc/acpi/battery/BAT0/state which has data such as:

present:                 yes
capacity state:          ok
charging state:          charged
present rate:            unknown
remaining capacity:      100 mAh
present voltage:         8326 mV

“Remaining capacity” rated in mAh is a little odd; I would later determine that this should actually be expressed as a percentage (i.e., 100% charge at the time of this reading). Examining the GPM source code, it seems to determine remaining battery time as a function of the current CPU load (queried via /proc/stat) and the battery state (queried via a facility called devicekit). I couldn’t immediately find any source code for the latter, but I was able to install a utility called ‘devkit-power’. Mostly, it appears to rehash data already found in the above /proc file.

Curiously, the file /proc/acpi/battery/BAT0/info, which displays essential information about the battery, reports the design capacity of my battery as only 4400 mAh which is true for the original battery; the new monster battery is supposed to be 10400 mAh. I can imagine that all of these data points could be conspiring to under-report my remaining battery life.

Science project: Repeat the previous power-related science project but also parse and track the remaining capacity and present voltage fields from the battery state proc file.
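
The proc-scraping piece of that boils down to a loop like this (a bare sketch; the full script at the end of the post also drives the xdotool workload). The field names match the proc dump shown above.

import time

STATE_FILE = "/proc/acpi/battery/BAT0/state"

def read_battery_state():
    fields = {}
    with open(STATE_FILE) as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

while True:
    state = read_battery_state()
    # log a timestamp plus the two fields being tracked
    print(time.time(),
          state.get("remaining capacity"),
          state.get("present voltage"))
    time.sleep(60)      # sample once a minute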

Let’s skip straight to the results (which are consistent with my last set of results in terms of longevity):



So there is definitely something strange going on with the reporting: the 4400 mAh battery reports discharge at a linear rate while the 10400 mAh battery reports a precipitous dropoff after 60%.

Another curious item is that my script broke at first when there was 20% power remaining which, as you can imagine, is a really annoying time to discover such a bug. At that point, the “time to empty” reported by devkit-power jumped from 0 seconds to 20 hours (the first state change observed for that field).

Here’s my script, this time elevated from Bash script to Python. It requires xdotool and devkit-power to be installed (both should be available in the package manager for a distro).
Read the rest of this entry »

Posted in Python, Science Projects | 1 Comment »

My Own Offline RSS Reader (Part 2)

March 29th, 2010 by Multimedia Mike

About that “true” offline RSS reader that I pitched in my last post, I’ll have you know that I made a minimally functioning system based on that outline.

These are the primary challenges/unknowns that I assessed from the outset:

  1. Manipulating relative URLs of supporting files
  2. Parsing HTML in Python
  3. Searching and replacing within the HTML file
  4. Downloading .js files that include other .js files

For #1, Python’s urlparse library works wonders. For #2 and #3, look no further than Python’s HTMLParser module. This blog post helped me greatly. I have chosen not to address #4 at this time. I’m not downloading any JavaScript files right now; the CSS and supporting images are mostly adequate.
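
To illustrate #1 through #3, here is a tiny sketch in current Python 3 naming (urllib.parse and html.parser, the successors of the modules named above); the URLs are made up:

from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # resolve relative URLs against the page's own URL
                self.resources.append(urljoin(self.base_url, value))

parser = LinkCollector("http://example.com/articles/post.html")
parser.feed('<img src="../images/photo.png"><link href="style.css">')
print(parser.resources)
# ['http://example.com/images/photo.png', 'http://example.com/articles/style.css']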

Further, it turned out not to be necessary to manually build an XML parser. Whenever I encountered a task that felt like it was going to be too much work — like manually parsing the XML feeds using Python’s low-level XML systems — a little searching revealed that all the hard work was already done. In the case of parsing the RSS files, the task was rendered trivial thanks to FeedParser.

Brief TODO list, for my own reference:

  • Index the database tables in a sane manner
  • Deal with exceptions thrown by malformed HTML
  • Update the post table to indicate that a post has been “read” when it is accessed
  • Implement HTTP redirection (since some RSS feeds apparently do that)
  • Implement cache control so that the browser will properly refresh feed lists
  • Add a stylesheet that will allow the server to control the appearance of links depending on whether or not the posts have been read
  • Take into account non-ASCII encoding (really need to train myself to do this from the get-go)
  • Forge user agent and referrer strings in HTTP requests, for good measure
  • Slap some kind of UI prettiness on top of the whole affair; I’m thinking an accordion widget containing tables might work well and I think there are a number of JavaScript libraries that could make that happen

Once I get that far, I’ll probably put some code out there. Based on what I have read, I’m not the only person who is looking for a solution like this.

I eventually released this software. Find it on Github.

Posted in Python | 7 Comments »

Process Runner Redux

November 12th, 2009 by Multimedia Mike

Pursuant to yesterday’s conundrum of creating a portable process runner in Python for FATE, one that can reliably kill a child process when it exceeds its time constraints, I settled on a solution. As Raymond Tau reminded us in the ensuing discussion, Python won’t use a shell to launch the process if the program can supply the command and its arguments as a sequence data structure. I knew this but was intentionally avoiding it. It seems like a simple problem to break up a command line into a sequence of arguments: just split on spaces. However, I hope to test metadata options eventually, which could include arguments such as ‘-title “Hey, is this thing on?”’, where splitting on spaces clearly isn’t the right solution.

I got frustrated enough with the problem that I decided to split on spaces anyway. Hey, I control this system from top to bottom, so new rule: No command line arguments in test specs will have spaces with quotes around them. I already enforce the rule that no sample files can have spaces in their filenames since that causes trouble with remote testing. When I get to the part about testing metadata, said metadata will take the form of ‘-title “HeyIsThisThingOn?”‘ (which will then fail to catch myriad bugs related to FFmpeg’s incorrect handling of whitespace in metadata arguments, but this is all about trade-offs).

So the revised Python process runner seems to work correctly on Linux. The hangaround.c program simulates a badly misbehaving program by eating the TERM signal and must be dealt with using the KILL signal. The last line in these examples is a tuple containing return code, stdout, stderr, and CPU time. For Linux:

$ ./upr.py 
['./hangaround', '40']
process ID = 2645
timeout, sending TERM
timeout, really killing
[-9, '', '', 0]

The unmodified code works the same on Mac OS X:

$ ./upr.py
['./hangaround', '40']
process ID = 94866
timeout, sending TERM
timeout, really killing
[-9, '', '', 0]

Now a bigger test: Running the upr.py script on Linux in order to launch the hangaround process remotely on Mac OS X via SSH:

$ ./upr.py 
['/usr/bin/ssh', 'foster-home', './hangaround', '40']
process ID = 2673
timeout, sending TERM
[143, '', '', 50]

So that’s good… sort of. Monitoring the process on the other end reveals that hangaround is still doing just that, even after SSH goes away. This occurs whether or not hangaround is ignoring the TERM signal. This is still suboptimal.

It would be possible to open a separate SSH session to send a TERM or KILL signal to the original process… except that I wouldn’t know the PID of the remote process. Or could I? I’m open to Unix shell magic tricks on this problem since anything responding to SSH requests is probably going to be acceptably Unix-like. I would rather not go the ‘killall ffmpeg’ route because that could interfere with some multiprocessing ideas I’m working on.

Here’s a brute force brainstorm: When operating in remote-SSH mode, prefix the command with ‘ln -s ffmpeg ffmpeg-<unique-key>’ and then execute the symbolic link instead of the main binary. Then the script should be able to open a separate SSH session and execute ‘killall ffmpeg-<unique-key>’ without interfering with other processes. Outlandish but possibly workable.
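
In sketch form, the brainstorm might look like this (placeholders throughout, and only as workable as the idea itself; ‘foster-home’ is the host alias from the example above):

import subprocess
import uuid

host = "foster-home"                         # remote host alias
unique = "ffmpeg-" + uuid.uuid4().hex[:8]    # uniquely named symlink
# placeholder ffmpeg invocation, executed via the symlink name
remote_cmd = "ln -sf ./ffmpeg %s && ./%s -i input -y output.avi" % (unique, unique)

proc = subprocess.Popen(["ssh", host, remote_cmd])
try:
    proc.wait(timeout=60)
except subprocess.TimeoutExpired:
    # the original SSH session did not take the remote child with it,
    # so open a second session and kill the uniquely named process
    subprocess.call(["ssh", host, "killall", unique])
    proc.wait()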

Posted in FATE Server, Python | 12 Comments »

« Previous Entries