September 27th, 2009 by Multimedia Mike
Let’s examine the types of tests I am deploying in the next revision of FATE, their specific syntax, and how they will be executed, both locally and remotely. Read through these specs and see if your idea of how to test FFmpeg is already listed. Otherwise, please leave a comment discussing more tests.
This is a long one…
Posted in FATE Server | 10 Comments »
September 26th, 2009 by Multimedia Mike
I staged 19 new FATE tests today which will finally push FATE over 300 individual test specs once I activate them tomorrow. The first 12 were a dozen more fidelity range extension (“FRExt”) H.264 conformance vectors. Thanks to Carl Eugen Hoyos for doing the validation on these vectors and informing me of the right command line options to get the output correct.
These are the other tests I entered:
I’d also like to recognize Michael K. once more for his FATE contributions in the testing “other” platforms category. In the last week, he started contributing FATE results for x86_64 variations of both OpenSolaris and OpenBSD.
Posted in FATE Server | Comments Off
September 25th, 2009 by Multimedia Mike
No sooner did I press “Publish” on my last post pertaining to multithreading FFmpeg’s Theora decoder than I received an email from the theora-dev list regarding the release of Theora 1.1.0. It took them many, many years to release the first official version and about 10 months to get the second version out, so congratulations on that matter. This release includes the much-vaunted Thusnelda encoder which is supposed to offer substantial encoding improvements vs. the original 1.0 encoder.
So, fair warning: Be prepared for a new round of “Theora Bests H.264 / HTML5 Video Poised To Conquer Internet” type of stories.
Since I have been doing a bunch of optimizations to the FFmpeg Theora decoder this past week (a.k.a. the Theora decoder that most people actually use), I thought this would be the perfect opportunity to benchmark Theora 1.1 alongside FFmpeg’s decoder. Fortunately, libtheora has an example tool called dump_video that decodes video directly to YUV4MPEG2 format, the same way I was testing FFmpeg’s decoder.
FFmpeg command line:
ffmpeg -threads n -i big_buck_bunny_1080p_stereo.ogg \
  -f yuv4mpegpipe -an -y /dev/null
Libtheora command line:
dump_video big_buck_bunny_1080p_stereo.ogg > /dev/null
The results (on my Core 2 Duo Mac Mini) were thus:
6:44 - FFmpeg, 1 thread
6:09 - FFmpeg, 2 threads *
4:51 - libtheora 1.1
* multithreaded version isn’t complete yet
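To put those wall-clock times in proportion, here is a quick sketch that converts them to seconds and expresses each as a multiple of libtheora's time. The helper name `to_seconds` is just an illustration; the only data used are the three timings listed above.

```python
def to_seconds(mmss):
    """Convert an "M:SS" wall-clock string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

# Timings copied from the results above.
timings = {
    "FFmpeg, 1 thread": "6:44",
    "FFmpeg, 2 threads": "6:09",
    "libtheora 1.1": "4:51",
}

# Use libtheora's time as the baseline for comparison.
baseline = to_seconds(timings["libtheora 1.1"])
ratios = {name: to_seconds(t) / baseline for name, t in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: {to_seconds(timings[name])} s, {ratio:.2f}x libtheora's time")
```

By this arithmetic the single-threaded FFmpeg run takes about 1.39 times as long as libtheora, and the two-thread run about 1.27 times as long.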
Mind you, libtheora’s decoder is single-threaded and only has basic MMX SIMD optimizations. After seeing libtheora’s relative performance, I think I blacked out. Or maybe I just went to bed since it was so late; it’s sort of a blur. I awoke in a confused stupor wondering what I’m doing wrong in the FFmpeg Theora decoder. Why is it so slow? Actually, I know why: unpack_vlcs(), which continues to dominate profiling statistics. Perhaps the question I should start with is, how does libtheora unpack VLCs so quickly? That’s a good jumping-off point for a future investigation.
Posted in VP3/Theora | 10 Comments »
September 24th, 2009 by Multimedia Mike
As briefly mentioned in my last Theora post, I think FFmpeg’s Theora decoder can exploit multiple CPUs in a few ways: 1) Perform all of the DC prediction reversals in a separate thread while the main thread is busy decoding the AC coefficients (meanwhile, I have committed an optimization where the reversal occurs immediately after DC decoding in order to exploit CPU cache); 2) create n separate threads and assign each (num_slices / n) slices to decode (where a slice is a row of the image that is 16 pixels high).
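The slice arithmetic in option 2 can be sketched as follows. This is only the bookkeeping, not FFmpeg's threading API: the function names `slice_count` and `partition_slices` are hypothetical, and handing leftover slices to the first few threads is one assumed way of dealing with a slice count that does not divide evenly by n.

```python
def slice_count(height, slice_height=16):
    """Number of 16-pixel-high slices needed to cover the frame (rounding up)."""
    return (height + slice_height - 1) // slice_height


def partition_slices(num_slices, n_threads):
    """Split range(num_slices) into n_threads contiguous (start, end) ranges.

    End is exclusive. The first (num_slices % n_threads) threads get one
    extra slice, so the per-thread load differs by at most one slice.
    """
    if n_threads < 1:
        raise ValueError("need at least one thread")
    base, extra = divmod(num_slices, n_threads)
    ranges = []
    start = 0
    for i in range(n_threads):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


slices = slice_count(1080)  # a 1080-line frame
print(slices, partition_slices(slices, 2))
```

For a 1080-line frame this gives 68 slices, so two threads would each take 34.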
So there’s the plan. Now, how to take advantage of FFmpeg’s threading API (which supports POSIX threads, Win32 threads, BeOS threads, and even OS/2 threads)? Would it surprise you to learn that this aspect is not extensively documented? Time to reverse engineer the API.
I also did some Googling regarding multithreaded FFmpeg. I mostly found forum posts complaining that FFmpeg isn’t effectively leveraging however many n cores someone’s turbo-charged machine happens to present to the OS, as demonstrated by their CPU monitoring tool. Since I suspect this post will rise in Google’s top search hits on the topic, allow me to apologize to searchers in advance by explaining that multimedia processing, while certainly CPU-intensive, does not necessarily lend itself to multithreading/multiprocessing. There are a few bits here and there in the encode or decode processes that can be parallelized but the entire operation overall tends to be rather serial.
So this is the goal:
…to see FFmpeg break through the 99.9% barrier in the CPU monitor. As an aside, it briefly struck me as ironic that people want FFmpeg to use as much of as many available CPUs as possible but scorn the project from my day job for being quite capable of doing the same.
Moving right along, let’s see what can be done about exploiting the limited multithreading opportunities that Theora affords.
First off: it’s necessary to explicitly enable threading at configure-time (e.g., “--enable-pthreads” for POSIX threads on Unix flavors). Not sure why this is, but there it is.
Posted in VP3/Theora | 18 Comments »
September 23rd, 2009 by Multimedia Mike
FFmpeg crossed the 20,000 commit threshold today. Mans captured the distinction when he submitted an ARM NEON optimization for int32_to_float_fmul_scalar(). Does that warrant a prize? Diego presented the statistics:
It took 7 years to get to r10000, but only two more to get to r20000.
FFmpeg is approaching warp 6 :-)
Today was also the day I noticed that YouTube upgraded their backend conversion system somewhere along the line. Nearly 3 years ago, I started poking at YouTube to see what kind of multimedia files it can convert and cataloged my findings at the MultimediaWiki.
Today, I was clicking around on some of my old videos and noticed that this video which came from an Ogg Theora source now looks correct. Actually, according to the comments (and I receive enough between all my videos that I rarely pay attention to any of them), this was working over a year ago.
It’s interesting to note that this means that YouTube/Google keeps all of the source material that users upload. When it was time to recode, they obviously had to go back to the original material.
I found that CSCD, KMVC, 3iv2, ZMBV, and VP6 video codecs all work; Vivo files, Westwood v2 VQAs, Real files with RV40, and the bastardized FLIC files from The Magic Carpet are all fine as well; Wing Commander III MVE files, id CIN files, and Interplay MVE files all transcode with audio but with either missing or glitched video.
Sorry if I seem a bit sentimental about this but it all still amazes me. When I was writing the bulk of the subsystems for all manner of bizarre formats circa 2001-2003, I never could have imagined that there would be a website that would take the weird video formats as input and convert them to a standard video format for anyone to view.
Posted in Open Source Multimedia | 3 Comments »
September 19th, 2009 by Multimedia Mike
Google released the third version of their year-old Chrome browser this past week. This reminded me that they incorporate FFmpeg into the software (and thanks to the devs for making various fixes available to us). Chrome uses FFmpeg for decoding HTML5/video tag-type video and accompanying audio. This always makes me wonder, why would they use FFmpeg’s Theora decoder? It sucks. I should know; I wrote it.
Last year, Reimar discovered that the VP3/Theora decoder spent the vast majority of its time decoding the coefficient stream. He proposed a fix that made it faster. I got a chance to check out the decoder tonight and profile it with OProfile and FFmpeg’s own internal timer facilities. It turns out that the function named unpack_vlcs() is still responsible for 44-50% of the decoding time, depending on machine and sample file. This is mildly disconcerting considering the significant amount of effort I put forth to even make it that fast (it took a lot of VLC magic).
So a function in a multimedia program is slow? Well, throw assembly language and SIMD instructions at the problem! Right? It’s not that simple with entropy decoders.
Reimar had a good idea in his patch and I took it to its logical conclusion: Optimize away the arrows, i.e., structure dereferences. The function insists on repeatedly grabbing items out of arrays from a context structure. Thus, create local pointers to the same array and save a bunch of dereferences through each of the innumerable iterations.
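The actual patch is C, where the "arrows" are `->` dereferences of the decoder context. As a loose Python analogue only, the same shape of change is looking up the arrays on the context object once and then working through local names inside the loop. `DecoderContext`, its fields, and the doubling "decode" step are all made up for illustration and have nothing to do with the real unpack_vlcs().

```python
class DecoderContext:
    """Hypothetical stand-in for a decoder context holding coefficient arrays."""

    def __init__(self, n):
        self.coeffs = [0] * n
        self.tokens = list(range(n))


def unpack_naive(ctx):
    # Reaches through the context object on every iteration.
    for i in range(len(ctx.tokens)):
        ctx.coeffs[i] = ctx.tokens[i] * 2


def unpack_hoisted(ctx):
    # Look the arrays up once, then work through local names.
    coeffs = ctx.coeffs
    tokens = ctx.tokens
    for i in range(len(tokens)):
        coeffs[i] = tokens[i] * 2


a = DecoderContext(8)
b = DecoderContext(8)
unpack_naive(a)
unpack_hoisted(b)
print(a.coeffs == b.coeffs)
```

Both functions produce the same output; the second simply does less lookup work per iteration.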
Results were positive– both OProfile and the TSC-based internal counter showed notable improvements.
Ideas for further improvements: Multithreading is all the rage for video decoders these days. Unfortunately, entropy decoding is always a serial proposition. However, VP3/Theora is in a unique position to take advantage of another multithreading opportunity: It could call reverse_dc_prediction() in a separate thread after all the DC coefficients are decoded. Finally, an upside to the algorithm’s unorthodox bitstream format! According to my OProfile reports, reverse_dc_prediction() consistently takes around 6-7% of the decode time. So it would probably be of benefit to remove that from the primary thread which would be busy with the AC coefficients.
Taking advantage of multiple threads would likely help with the render_slice() function. One thing at a time, though. Wish me luck with presenting the de-dereferencing patch to the list.
Posted in Programming, VP3/Theora | 4 Comments »
September 18th, 2009 by Multimedia Mike
Here’s a little project of absolutely no use to anyone (a specialty of mine, as if you didn’t know): Pure Python classes for writing and reading bitstreams. This was just one of those things where I was sitting around wondering what it would take to accomplish, and a cursory Google search didn’t reveal anything useful (though it’s probably out there), so I sat down and pounded out the code.
To what end? Oh, I don’t know– reimplement FFmpeg in Python; go crazy. Behold brute force bit banging in Python:
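The classes themselves are in the full entry. As a rough idea of what such a pair might look like, here is a minimal sketch. The names `BitWriter` and `BitReader`, the MSB-first bit order, and the zero-padding of the last byte are all assumptions made for this example, not details of the original code.

```python
class BitWriter:
    """Accumulates bits MSB-first and packs them into bytes."""

    def __init__(self):
        self._bytes = bytearray()
        self._acc = 0      # bits not yet flushed to a whole byte
        self._nbits = 0    # number of valid bits in _acc (0..7)

    def write(self, value, count):
        """Append the low `count` bits of `value`, most significant first."""
        if count < 0 or value < 0 or value >> count:
            raise ValueError("value does not fit in count bits")
        for shift in range(count - 1, -1, -1):
            self._acc = (self._acc << 1) | ((value >> shift) & 1)
            self._nbits += 1
            if self._nbits == 8:
                self._bytes.append(self._acc)
                self._acc = 0
                self._nbits = 0

    def getvalue(self):
        """Return the stream as bytes, zero-padding the final partial byte."""
        out = bytearray(self._bytes)
        if self._nbits:
            out.append(self._acc << (8 - self._nbits))
        return bytes(out)


class BitReader:
    """Reads bits MSB-first from a bytes-like object."""

    def __init__(self, data):
        self._data = bytes(data)
        self._pos = 0  # current position, in bits

    def read(self, count):
        """Return the next `count` bits as an unsigned integer."""
        if self._pos + count > len(self._data) * 8:
            raise EOFError("not enough bits left")
        value = 0
        for _ in range(count):
            byte = self._data[self._pos >> 3]
            bit = (byte >> (7 - (self._pos & 7))) & 1
            value = (value << 1) | bit
            self._pos += 1
        return value


# Round trip: write a 3-bit, a 5-bit, and a 1-bit field, then read them back.
writer = BitWriter()
writer.write(5, 3)
writer.write(31, 5)
writer.write(1, 1)
packed = writer.getvalue()

reader = BitReader(packed)
fields = [reader.read(3), reader.read(5), reader.read(1)]
print(packed.hex(), fields)
```

Working one bit at a time is about as brute force as it gets, which keeps the logic easy to check at the cost of speed.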
Read the rest of this entry »
Posted in Python | 2 Comments »
September 16th, 2009 by Multimedia Mike
Once upon a time, all the way back in 1998, I remember downloading a demo version of BeOS on some kind of live HD partition hosted under Windows. I booted into it twice and couldn’t find a good reason to do it a third time. However, there is that bustling community of developers building the BeOS clone named Haiku. This article at Ars Technica leads me to believe that the Haiku OS has reached some kind of development milestone (R1 alpha1).
Of course, this all reminds me that FFmpeg does have 1 or 2 developers who like to make sure that the application still builds and runs on Haiku. But are there any takers for running FATE continuously on Haiku? I installed the ISO image in a VMware session but was unable to connect to a network. I’m a little surprised Haiku doesn’t at least support the VMware network device (or does it? Perhaps I need to manually configure it somehow).
I think I may finally understand the compelling reason to continue supporting gcc 2.95 in FFmpeg: that’s the default one installed in BeOS. This strikes me as odd since BeOS was alleged to be based largely on C++ and gcc’s C++ language support as of 2.95 was known to be less than stellar. Perhaps the OS builders simply limited themselves to a sane subset of the language which could conceivably make Be programming halfway tolerable.
For my part, I’m wondering how to program Haiku/Be in the first place. Haiku is supposed to reimplement Be’s C++ API, but where is that defined? Is O’Reilly’s online Be programming book the last word on the matter? I should check my boxes and see if I still have a giant book of Be that a friend gave me a long time ago for no good reason. He must have gotten the impression I was interested in hacking operating systems or something.
Posted in FATE Server | 4 Comments »
September 15th, 2009 by Multimedia Mike
I have never really figured out what role Reddit plays in the grand scheme of things. But someone over there has taken an interest in figuring out the Treasure Master code system, something on which I have previously hypothesized.
It’s a determined bunch and I’m impressed with the headway they seem to be making. I never had time to get to the bottom of this. I’m eagerly watching to see if they can crack this ancient and useless puzzle.
Posted in Game Hacking | Comments Off
September 1st, 2009 by Multimedia Mike
In reading Ars Technica’s lengthy, thorough review of Apple’s new Snow Leopard, I noticed the addition of screen recording to QuickTime. The screenshots indicate that it is configurable for “medium” and “high” quality. Naturally, I bring this up because I wonder what format the video is saved in. QuickTime’s extensive suite of default video codecs does not include a lossless, screen video-oriented codec (per my recollection). And since the feature is out there, people are going to expect FFmpeg and all of its descendant apps to be able to transcode it.
Posted in Multimedia PressWatch | 5 Comments »