Category Archives: General

Visualizing Call Graphs Using Gephi

When I was at university studying computer science, I took a basic chemistry course. During an accompanying lab, the teaching assistant chatted me up and asked about my major. He then said, “Computer science? Well, that’s just typing stuff, right?”

My impulsive retort: “Sure, and chemistry is just about mixing together liquids and coming up with different colored liquids, as seen on the cover of my high school chemistry textbook, right?”

In fact, pure computer science has precious little to do with typing (as is joked in CS circles, computer science is about computers in the same way that astronomy is about telescopes). However, people who study computer science often pursue careers as programmers, or to put it in fancier professional language, software engineers.

So, what’s a software engineer’s job? Isn’t it just typing? That’s where I’ve been going with this overly long setup. After thinking about it for long enough, I like to say that a software engineer’s trade is managing complexity.

A few years ago, I discovered Gephi, an open source tool for graph and data visualization. It looked neat but I didn’t have much use for it at the time. Recently, however, I was trying to get a better handle on a large codebase. I.e., I was trying to manage the project’s complexity. And then I thought of Gephi again.

Prior Work
One way to get a grip on a large C codebase is to instrument it for profiling and extract details from the profiler. On Linux systems, this means compiling and linking the code using the -pg flag. After running the executable, there will be a gmon.out file which is post-processed using the gprof command.

GNU software development tools have a reputation for being rather powerful and flexible, but also extremely raw. This first hit home when I was learning how to use the GNU tool for code coverage — gcov — and the way it outputs very raw data that you need to massage with other tools in order to get really useful intelligence.

And so it is with gprof output. The output gives you a list of functions sorted by the amount of processing time spent in each. Then it gives you a flattened call tree. This is arranged as “during the profiled executions, function c was called by functions a and b and called functions d, e, and f; function d was called by function c and called functions g and h”.

How can this call tree data be represented in a more instructive manner that is easier to navigate? My first impulse (and I don’t think I’m alone in this) is to convert the gprof call tree into a representation suitable for interpretation by Graphviz. Unfortunately, doing so tends to generate some enormous and unwieldy static images.

Feeding gprof Data To Gephi
I learned of Gephi a few years ago and recalled it when I developed an interest in gaining better perspective on a large base of alien C code. To understand what this codebase is doing for a particular use case, instrument it with gprof, gather execution data, and then study the code paths.

How could I feed the gprof data into Gephi? Gephi supports numerous graphing formats including an XML-based format named GEXF.

Thus, the challenge becomes converting gprof output to GEXF.

Which I did.

Demonstration
I have been absent from FFmpeg development for a long time, which is a pity because a lot of interesting development has occurred over the last 2-3 years after a troubling period of stagnation. I know that 2 big video codec developments have been HEVC (next in the line of MPEG codecs) and VP9 (heir to VP8’s throne). FFmpeg implements them both now.

I decided I wanted to study the code flow of VP9. So I got the latest FFmpeg code from git and built it using the options "--extra-cflags=-pg --extra-ldflags=-pg". Annoyingly, I also needed to specify "--disable-asm" because gcc complains of some register allocation snafus when compiling inline ASM in profiling mode (and this is on x86_64). No matter; ASM isn’t necessary for understanding overall code flow.

After compiling, the binary ‘ffmpeg_g’ will have symbols and be instrumented for profiling. I grabbed a sample from this VP9 test vector set and went to work.

./ffmpeg_g -i vp90-2-00-quantizer-00.webm -f null /dev/null
gprof ./ffmpeg_g > vp9decode.txt
convert-gprof-to-gexf.py vp9decode.txt > ~/bigdisk/vp9decode.gexf

Gephi loads vp9decode.gexf with no problem. Using Gephi, however, can be a bit challenging if one is not versed in any data exploration jargon. I recommend this Gephi getting starting guide in slide deck form. Here’s what the default graph looks like:

Not very pretty or helpful. BTW, that beefy arrow running from mid-top to lower-right is the call from decode_coeffs_b -> iwht_iwht_4x4_add_c. There were 18774 from the former to the latter in this execution. Right now, the edge thicknesses correlate to number of calls between the nodes, which I’m not sure is the best representation.

Continue reading →

Server Move For multimedia.cx

I made a big change to multimedia.cx last week: I moved hosting from a shared web hosting plan that I had been using for 10 years to a dedicated virtual private server (VPS). In short, I now have no one to blame but myself for any server problems I experience from here on out.

The tipping point occurred a few months ago when my game music search engine kept breaking regardless of what technology I was using. First, I had an admittedly odd C-based CGI solution which broke due to mysterious binary compatibility issues, the sort that are bound to occur when trying to make a Linux binary run on heterogeneous distributions. The second solution was an SQLite-based solution. Like the first solution, this worked great until it didn’t work anymore. Something else mysteriously broke vis-à-vis PHP and SQLite on my server. I started investigating a MySQL-based full text search solution but couldn’t make it work, and decided that I shouldn’t have to either.

Ironically, just before I finished this entire move operation, I noticed that my SQLite-based FTS solution was working again on the old shared host. I’m not sure when that problem went away. No matter, I had already thrown the switch.

How Hard Could It Be?
We all have thresholds for the type of chores we’re willing to put up with and which we’d rather pay someone else to perform. For the past 10 years, I felt that administering a website’s underlying software is something that I would rather pay someone else to worry about. To be fair, 10 years ago, I don’t think VPSs were a thing, or at least a viable thing in the consumer space, and I wouldn’t have been competent enough to properly administer one. Though I would have been a full-time Linux user for 5 years at that point, I was still the type to build all of my own packages from source (I may have still been running Linux From Scratch 10 years ago) which might not be the most tractable solution for server stability.

These days, VPSs are a much more affordable option (easily competitive with shared web hosting). I also realized I know exactly how to install and configure all the software that runs the main components of the various multimedia.cx sites, having done it on local setups just to ensure that my automated backups would actually be useful in the event of catastrophe.

All I needed was the will to do it.

The Switchover Process
Here’s the rough plan:

Investigate options for both VPS providers and mail hosts– I might be willing to run a web server but NOT a mail server
Start plotting several months in advance of my yearly shared hosting renewal date
Screw around for several months, playing video games and generally finding reasons to put off the move
Panic when realizing there are only a few days left before the yearly renewal comes due

So that’s the planning phase. BTW, I chose Digital Ocean for VPS and Zoho for email hosting. Here’s the execution phase I did last week:

Register with Digital Ocean and set up DNS entries to point to the old shared host for the time being
Once the D-O DNS servers respond correctly using a manual ‘dig’ command, use their servers as the authoritative ones for multimedia.cx
Create a new Droplet (D-O VPS), install all the right software, move the databases, upload the files; and exhaustively document each step, gotcha, and pitfall; treat a VPS as necessarily disposable and have an eye towards iterating the process with a new VPS
Use /etc/hosts on a local machine to point DNS to the new server and verify that each site is working correctly
After everything looks all right, update the DNS records to point to the new server

Finally, flip the switch on the MX record by pointing it to the new email provider.

Improvements and Problems
Hosting on Digital Ocean is quite amazing so far. Maybe it’s the SSDs. Whatever it is, all the sites are performing far better than on the old shared web host. People who edit the MultimediaWiki report that changes get saved in less than the 10 or so seconds required on the old server.

Again, all problems are now my problems. A sore spot with the shared web host was general poor performance. The hosting company would sometimes complain that my sites were using too much CPU. I would have loved to try to optimize things. However, the cPanel interface found on many shared hosts don’t give you a great deal of data for debugging performance problems. However, same sites, same software, same load on the VPS is considerably more performant.

Problem: I’ve already had the MySQL database die due to a spike in usage. I had to manually restart it. I was considering a cron-based solution to check if the server is running and restart it if not. In response to my analysis that my databases are mostly read and not often modified, so db crashes shouldn’t be too disastrous, a friend helpfully reminded me that, “You would not make a good sysadmin with attitudes like ‘an occasional crash is okay’.”

To this end, I am planning to migrate the database server to a separate VPS. This is a strategy that even Digital Ocean recommends. I’m hoping that the MySQL server isn’t subject to such memory spikes, but I’ll continue to monitor it after I set it up.

Overall, the server continues to get modest amounts of traffic. I predict it will remain that way unless Dark Shikari resurrects the x264dev blog. The biggest spike that multimedia.cx ever saw was when Steve Jobs linked to this WebM post.

Dropped Sites
There are a bunch of subdomains I dropped because I hadn’t done anything with them for years and I doubt anyone will notice they’re gone. One notable section that I decided to drop is the samples.mplayerhq.hu archive. It will live on, but it will be hosted by samples.ffmpeg.org, which had a full mirror anyway. The lower-end VPS instances don’t have the 53 GB necessary.

Going Forward
Here’s to another 10 years of multimedia.cx, even if multimedia isn’t as exciting as it was 10 years ago (personal opinion; I’ll have another post on this later). But at least I can get working on some other projects now that this is done. For the past 4 months or so, whenever I think of doing some other project, I always remembered that this server move took priority over everything else.

Playing With Emscripten and ASM.js

The last 5 years or so have provided a tremendous amount of hype about the capabilities of JavaScript. I think it really kicked off when Google announced their Chrome web browser in September, 2008 along with its V8 JS engine. This seemed to spark an arms race in JS engine performance along with much hyperbole that eventually all software could, would, and/or should be written in straight JavaScript for maximum portability and future-proofing, perhaps aided by Emscripten, a tool which magically transforms C and C++ code into JS. The latest round of rhetoric comes courtesy of something called asm.js which purports to narrow the gap between JS and native code performance.

I haven’t been a believer, to express it charitably. But I wanted to be certain, so I set out to devise my own experiment to test modern JS performance.

Up Front Summary
I was extremely surprised that my experiment demonstrated JS performance FAR beyond my expectations. There might be something to these claims of magnficent JS speed in numerical applications. Basically, here were my thoughts during the process:

There’s no way that JavaScript can come anywhere close to C performance for a numerically intensive operation; a simple experiment should demonstrate this.
Here’s a straightforward C program to perform a simple yet numerically intensive operation.
Let’s compile the C program on gcc and get some baseline performance numbers.
Let’s use Emscripten to convert the C program to JavaScript and run it under Chrome.
Ha! Pitiful JS performance, just as I expected!
Try the same program under Firefox, since Firefox is supposed to have some crazy optimization for asm.js code, allegedly emitted by Emscripten.
LOL! Firefox performs even worse than Chrome!
Wait a minute… the Emscripten documentation mentioned using optimization levels for generating higher performance JS, so try ‘-O1’.
Umm… wow: Chrome’s performance increased dramatically! What about Firefox? Not only is Firefox faster than Chrome, it’s faster than the gcc-generated code!
As my faith in C is suddenly shaken to its core, I remembered to compile the gcc version with an explicit optimization level. The native C version pulled ahead of Firefox again, but the Firefox code is still close.
Aha! This is just desktop– but what about mobile? One of the leading arguments for converting everything to pure JavaScript is that such programs will magically run perfectly in mobile browsers. So I wager that this is where the experiment will fall over.
I proceed to try the same converted program on a variety of mobile platforms.
The mobile platforms perform rather admirably as well.
I am surprised.

The Experiment
I wanted to run a simple yet numerically-intensive and relevant benchmark, and something I am familiar with. I settled on JPEG image decoding. Again, I wanted to keep this simple, ideally in a single file because I didn’t know how hard it might be to deal with Emscripten. I found NanoJPEG, which is a straightforward JPEG decoder contained in a single C file.
Continue reading →

Long Overdue MediaWiki Upgrade

What do I do? What I do? This library book is 42 years overdue!
I admit that it’s mine, yet I can’t pay the fine,
Should I turn it in or should I hide it again?
What do I do? What do I do?

I internalized the forgoing paean to the perils of procrastination by Shel Silverstein in my formative years. It’s probably why I’ve never paid a single cent in late fees in my entire life.

However, I have been woefully negligent as the steward of the MediaWiki software that drives the world famous MultimediaWiki, the internet’s central repository of obscure technical knowledge related to multimedia. It is currently running of version 1.6 software. The latest version is 1.22.

The Story So Far
According to my records, I first set up the wiki late in 2005. I don’t know which MediaWiki release I was using at the time. I probably conducted a few upgrades in the early days, but that went by the wayside perhaps in 2007. My web host stopped allowing shell access and the MediaWiki upgrade process pretty much requires running a PHP script from a command line. Upgrade time came around and I put off the project. Weeks turned into months turned into years until, according to some notes, the wiki abruptly stopped working in July, 2011. Suddenly, there were PHP errors about “Namespace” being a reserved word.

While I finally laid out a plan to upgrade the wiki after all these years, I eventually found that the problem had been caused when my webhost upgraded from PHP 5.2 -> 5.3. I also learned of a small number of code changes that caused the problem to go away, thus kicking the can down the road once more.

Then a new problem showed up last week. I think it might be related to a new version of PHP again. This time, a few other things on my site broke, and I learned that my webhost now allows me to select a PHP version to use (with the version then set to “auto”, which didn’t yield much information). Rolling back to an earlier version of PHP might have solved the problem easily.

But NO! I made the determination that this goes no further. I want this wiki upgraded.

The Arduous Upgrade Path
There are 2 general upgrade paths I can think of:
Continue reading →

Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering

Category Archives: General

Visualizing Call Graphs Using Gephi

Server Move For multimedia.cx

Playing With Emscripten and ASM.js

Long Overdue MediaWiki Upgrade