I have a friend who was considering purchasing a Mac Mini recently. At the time of this writing, there are 3 desktop models (and 2 more “server” models).
The cheapest one is a Core i5 at 2.5 GHz. Then there are 2 Core i7 models: 2.3 GHz and 2.6 GHz. The latter 2 are separated by US$100, and the only appreciable technical difference between them is the extra 0.3 GHz, so the choice came down to those 2.
He asked me which one would be able to play HD video at full frame rate. I found this query puzzling. But then, I have been “in the biz” for a bit too long. Whether or not a computer or device can play a video well depends on a lot of factors.
First of all, looking at the raw speed of the general-purpose CPU inside a computer as a gauge of video playback performance is generally misguided in this day and age. In general, we have a video standard (H.264, which I’ll focus on for this post) and many bits of hardware are able to accelerate decoding. So the question is not whether the CPU can decode the data in real time, but whether any other hardware in the device (likely the graphics hardware) can handle it. These machines have Intel HD 4000 graphics and, per my reading of the literature, that chip is capable of accelerating H.264 video decoding.
Great, so the hardware supports accelerated decoding. So it’s a done deal, right? Not quite…
Operating System Support
An application can’t do anything pertaining to hardware without permission from the operating system. So the next question is: Does Mac OS X allow an application to access accelerated video decoding hardware if it’s available? This used to be a contentious matter (notably, Adobe Flash Player was unable to accelerate H.264 playback on Mac in the absence of such an API) but then Apple released an official API detailed in Technical Note TN2267.
So, does this mean that video is magically accelerated? Nope, we’re still not there yet…
It’s great that all of these underlying pieces are in place, but if an individual application chooses to decode the video directly on the CPU, it’s all for naught. An application needs to query the facilities and direct data through the API if it wants to leverage the acceleration. Obviously, at this point it becomes a matter of “which application?”
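To make that concrete, here is a rough sketch of what “directing data through the API” looks like with the VDA framework from TN2267. This is illustrative, not production code: the dimensions are placeholders, the required H.264 extradata (kVDADecoderConfiguration_avcCData) is omitted, and there is no actual frame decoding. The interesting part is the return code from VDADecoderCreate(), which is how an application learns whether the hardware path even exists on a given machine.

```c
/*
 * Sketch: probing for hardware H.264 decode via the VDA API (TN2267).
 * Mac OS X only; build with something like:
 *   clang probe.c -framework VideoDecodeAcceleration \
 *                 -framework CoreFoundation -framework CoreVideo
 * The avcC extradata a real decoder must supply is omitted for brevity,
 * so treat this as an outline of the calls, not a working decoder.
 */
#include <VideoDecodeAcceleration/VDADecoder.h>
#include <CoreFoundation/CoreFoundation.h>
#include <stdio.h>

/* Called once per decoded frame; a real player queues `image` for display. */
static void output_cb(void *refcon, CFDictionaryRef frame_info,
                      OSStatus status, uint32_t flags, CVImageBufferRef image)
{
}

int main(void)
{
    int32_t width = 1920, height = 1080;  /* placeholder stream geometry */
    int32_t source_format = 'avc1';       /* H.264 */

    CFMutableDictionaryRef config = CFDictionaryCreateMutable(
        kCFAllocatorDefault, 4,
        &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
    CFNumberRef w = CFNumberCreate(NULL, kCFNumberSInt32Type, &width);
    CFNumberRef h = CFNumberCreate(NULL, kCFNumberSInt32Type, &height);
    CFNumberRef f = CFNumberCreate(NULL, kCFNumberSInt32Type, &source_format);
    CFDictionarySetValue(config, kVDADecoderConfiguration_Width, w);
    CFDictionarySetValue(config, kVDADecoderConfiguration_Height, h);
    CFDictionarySetValue(config, kVDADecoderConfiguration_SourceFormat, f);

    VDADecoder decoder = NULL;
    OSStatus err = VDADecoderCreate(config, NULL,
        (VDADecoderOutputCallback *)output_cb, NULL, &decoder);

    /* This return code is the whole point: it tells the app whether the
       accelerated path exists for this stream on this machine. */
    if (err == kVDADecoderHardwareNotSupportedErr)
        printf("no hardware H.264 acceleration on this machine\n");
    else if (err == 0)
        printf("hardware decode path available\n");
    else
        printf("VDADecoderCreate failed: %d\n", (int)err);

    if (decoder)
        VDADecoderDestroy(decoder);
    CFRelease(w); CFRelease(h); CFRelease(f); CFRelease(config);
    return 0;
}
```

An app that gets a failure here falls back to software decoding, which is exactly the divergence in CPU usage the tests below expose.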
My friend eventually opted to get the pricier of the desktop Mac Mini models and we ran some ad-hoc tests since I was curious how widespread the acceleration support is among Mac multimedia players. Here are some programs I wanted to test, playing 1080p H.264:
- Apple QuickTime Player
- YouTube with Flash Player (any browser)
- YouTube with Safari/HTML5
- YouTube with Chrome/HTML5
- YouTube with Firefox/HTML5
- VLC
- Netflix with Firefox/Silverlight
I didn’t take exhaustive notes but my impromptu tests revealed QuickTime Player was, far and away, the most performant player, occupying only around 5% of the CPU according to the Mac OS X System Profiler graph (which is likely largely spent on audio decoding).
VLC consistently required 20-30% CPU, so it’s probably leveraging some acceleration facilities. I think that Flash Player and the various HTML5 elements performed similarly (their multi-process architectures can make such a trivial profiling test difficult).
The outlier was Netflix running in Firefox via Microsoft’s Silverlight plugin. Of course, the inner workings of Netflix’s technology are opaque to outsiders; I have never seen any data one way or another about how Netflix encodes its video, and we don’t even know if it uses H.264. It may very well use Microsoft’s VC-1, which is not a capability provided by the Mac OS X acceleration API (and it doesn’t look like the Intel HD 4000 chip can handle it either). Whatever the codec, I was able to see that Netflix required an enormous amount of CPU muscle on the Mac platform.
The foregoing is a slight simplification of the video playback pipeline. There are some other considerations, most notably how the video is displayed afterwards. To circle back around to the original question: Can the Mac Mini handle full HD video playback? As my friend found, the meager Mac Mini can do an admirable job at playing full HD video without loading down the CPU.
Hmm, mine had a hard time playing 1280×720 H.264 files. It’s a 1.42 GHz G4, though :)
XBMC has great hardware support now. My 2010 Mac mini can now play anything I throw at it without any frame drops.
You should have told your friend that the 35 USD Raspberry Pi model B rev 2 (with Ethernet and 512MB) decodes 1080p H.264 video (streamed over the network) in real time, and he could have saved 865 USD.
@Z.T.: Thanks for the guidance. However, I probably should have stated in the original blog post that playing video was an ancillary concern he was curious about.
This individual’s primary purpose for the Mac Mini purchase was iOS development, which falls well outside the RPi’s charter. :-)
Hm, 20-30% CPU sounds like pure software decode, and I am assuming you mean 20-30% of a single core.
20-30% CPU across all cores would not only be software decode, it would be bad software decode.
But for hardware decode there is also the difference between decoding directly to the GPU and decoding into main memory.
The latter can, in some configurations, be (much) slower than decoding in software, and at least some time ago it was the only method VLC supported.
@Reimar: I think you have a point. I started composing this post over a month ago but just got around to publishing it. I thought I remembered checking the VLC source code at that time and finding that it used the accelerated API. However, I just checked again and I can’t find any reference to VDADecoderCreate(), the first step to using the acceleration API.
Something else I wanted to discuss in the post was a description of different video acceleration strategies. I.e., in addition to offloading decode, apps could do pure software decode but then accelerate scaling and presentation using either YUV overlays or shader-based rendering. I think VLC takes the latter route.
How do you account for Silverlight’s poor performance (as in, pegging a single CPU core, and it’s not even full screen)? I’m guessing pure software decode, software colorspace conversion, and software scaling, all without any SIMD-optimized assistance.
To my knowledge FFmpeg contains a full VDA decoder, so it is possible to use it without being able to find any VDA code in VLC itself. I don’t know about the performance, since that variant ends up copying the video data back from the GPU into main memory.
I could imagine it to be slower than CPU decode in many cases, though it should have lower CPU usage.
I’m not sure about Silverlight. A single core doesn’t sound like it should be enough to do everything without SIMD, unless it is a fairly low-resolution video. They might even be doing some things in .NET, and even though .NET code gets compiled before execution, that is still likely to make things a good bit slower. But most likely? I’d say they are using their VC-1 decoder, compiled without any assembly and in fairly bad/unoptimized C, and then pushing the result into the normal OS X display queue.
But testing whether software scaling is involved is fairly easy: does it get a lot slower/faster depending on the output size?