If you wish to follow along in this example, the target file is:
wmv9dmod.dll, 807032 bytes, MD5: 292beecb089f13b70af8e44e5bfefa5c

This file is available on the MPlayer Codec page in the Win32 Codecpack (the "lite" version will do).
First, disassemble the file:
scd wmv9dmod.dll > wmv9dmod.dll.txt

Next, run the disassembly output through the scd-addresses.pl script:
scd-addresses.pl < wmv9dmod.dll.txt > wmv9dmod-functions.txt

The next step is a little more complicated: this is where the execution data gets gathered. The general idea is to set the trap flag (TF, bit 8 of the EFLAGS register) just before calling the function you wish to monitor and to clear it afterwards. While TF is set, an Intel x86 CPU raises a single-step debug exception after every instruction, which Linux delivers to the process as SIGTRAP. A custom SIGTRAP handler then logs the instruction pointer address of each step to a file.
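The trap-flag mechanism can be sketched in C for Linux on x86 or x86-64 with GCC-style inline assembly. This is an illustration under stated assumptions, not the author's actual tool: the names trace_call, demo_workload and TRACE_MAX are hypothetical, and where the original handler logs each address to a file, this sketch buffers addresses in memory so the handler stays async-signal-safe and avoids one system call per instruction.

```c
/* Sketch: single-step a function by toggling the x86 trap flag (TF,
 * bit 8 of EFLAGS) and recording each instruction pointer in a SIGTRAP
 * handler. Assumes Linux on x86/x86-64 with GCC or Clang. */
#define _GNU_SOURCE            /* exposes REG_EIP / REG_RIP */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "this sketch only supports x86 and x86-64"
#endif

#define TRACE_MAX 1000000      /* hypothetical buffer size */

static uintptr_t trace_buf[TRACE_MAX];
static volatile size_t trace_len;

/* Runs after every single-stepped instruction. Linux clears TF while the
 * handler runs and restores it on return, so the handler is not traced. */
static void trap_handler(int sig, siginfo_t *info, void *ctx)
{
    const ucontext_t *uc = ctx;
    (void)sig;
    (void)info;
#if defined(__x86_64__)
    uintptr_t ip = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#else
    uintptr_t ip = (uintptr_t)uc->uc_mcontext.gregs[REG_EIP];
#endif
    if (trace_len < TRACE_MAX)
        trace_buf[trace_len++] = ip;
}

/* noinline keeps pushf/popf in a tiny leaf function with no locals, so
 * the push cannot clobber red-zone data of a caller it was inlined into. */
static __attribute__((noinline)) void set_trap_flag(void)
{
#if defined(__x86_64__)
    __asm__ volatile("pushfq\n\torq $0x100, (%%rsp)\n\tpopfq" ::: "memory", "cc");
#else
    __asm__ volatile("pushfl\n\torl $0x100, (%%esp)\n\tpopfl" ::: "memory", "cc");
#endif
}

static __attribute__((noinline)) void clear_trap_flag(void)
{
#if defined(__x86_64__)
    __asm__ volatile("pushfq\n\tandq $~0x100, (%%rsp)\n\tpopfq" ::: "memory", "cc");
#else
    __asm__ volatile("pushfl\n\tandl $~0x100, (%%esp)\n\tpopfl" ::: "memory", "cc");
#endif
}

/* Install the handler, single-step fn(), return the number of addresses. */
size_t trace_call(void (*fn)(void))
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = trap_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTRAP, &sa, NULL) != 0)
        return 0;

    trace_len = 0;
    set_trap_flag();
    fn();
    clear_trap_flag();
    return trace_len;
}

/* Hypothetical stand-in for the function being profiled. */
void demo_workload(void)
{
    volatile unsigned sum = 0;
    for (unsigned i = 0; i < 100; i++)
        sum += i;
}
```

After trace_call(demo_workload) returns, trace_buf[0] through trace_buf[n-1] hold the address trace, which could then be written out (one hex address per line, say) for the binning step. The trace also contains a few addresses from trace_call itself and from the set/clear helpers, which a later pass can simply ignore.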
With some special formatting, a little patience, and a LOT of disk space, the data is ready to be churned through another Perl script, sort-addresses.pl. This script reads a file containing a list of addresses (generated by whatever method is convenient) and one or more files containing function boundaries, such as the output of scd-addresses.pl. It outputs a series of profiles showing how many of the executed instructions fell within each function, which serves as a proxy for how much time the program spent there.
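The heart of a binning script like sort-addresses.pl might look like the following C sketch. It assumes the function boundaries have already been parsed (for example from wmv9dmod-functions.txt) into ranges sorted by start address and non-overlapping; the struct and function names are hypothetical, and the real script's input format and internals are not shown in this article. A binary search keeps each lookup logarithmic, which matters when a trace holds tens of millions of addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* One function boundary, plus a tally of trace hits inside it. */
struct func_range {
    uint32_t start;        /* first address of the function */
    uint32_t end;          /* last address, inclusive */
    unsigned long count;   /* trace entries that landed in this range */
};

/* Binary search over ranges sorted by start and non-overlapping.
 * Returns the index of the range containing addr, or -1 if none. */
long find_range(const struct func_range *r, size_t n, uint32_t addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < r[mid].start)
            hi = mid;              /* a containing range can only be left */
        else if (addr > r[mid].end)
            lo = mid + 1;          /* ... or only be right */
        else
            return (long)mid;
    }
    return -1;
}

/* Add each trace address to its range's count; return how many matched none. */
unsigned long bin_addresses(struct func_range *r, size_t n,
                            const uint32_t *trace, size_t len)
{
    unsigned long unmatched = 0;
    for (size_t i = 0; i < len; i++) {
        long idx = find_range(r, n, trace[i]);
        if (idx >= 0)
            r[idx].count++;
        else
            unmatched++;
    }
    return unmatched;
}
```

If the last range is given an upper bound of 0xFFFFFFFF, every address above its start (including host code loaded high in the address space) is counted there instead of being reported as unmatched.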
The wmv9dmod.dll file conforms to the Microsoft DirectX Media Object (DMO) API. This type of binary module exports very few public functions, and they typically return structures containing pointers to internal functions that do the real work. Such is the case here: module initialization yields the addresses of ProcessInput() and ProcessOutput(). These sound like the two most interesting functions to monitor.
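Those "structures with pointers" are COM vtables: an interface pointer such as IMediaObject points to an object whose first member is a pointer to a table of method addresses. Assuming the standard COM layout and the method order declared for IMediaObject in mediaobj.h (QueryInterface, AddRef and Release in slots 0 to 2, then GetStreamCount through Lock), ProcessInput() and ProcessOutput() would sit in slots 21 and 22. The C sketch below reads a slot; the slot numbers are an assumption worth checking against your own header, and all names are hypothetical.

```c
#include <stddef.h>

/* Generic function-pointer type for reading vtable slots. */
typedef void (*slot_fn)(void);

/* Assumed 0-based slots: the three IUnknown methods come first, then
 * IMediaObject's own methods in declaration order (check mediaobj.h). */
enum {
    DMO_SLOT_PROCESSINPUT  = 21,
    DMO_SLOT_PROCESSOUTPUT = 22
};

/* A COM object begins with a pointer to its table of method addresses. */
struct com_object {
    const slot_fn *vtbl;
};

/* Return the method address stored in the given vtable slot of obj. */
slot_fn com_method(const void *obj, size_t slot)
{
    const struct com_object *o = obj;
    return o->vtbl[slot];
}
```

Given an IMediaObject pointer obtained during module initialization, com_method(obj, DMO_SLOT_PROCESSINPUT) would return the ProcessInput() entry point: the address at which a monitor would start single-stepping.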
For this experiment, I took 5 Microsoft WMV files, identical except that they are encoded at different bitrates: 56K, 128K, 300K, 500K, and 700K. I set up the profiling facilities to trap execution data while decoding the first 4 frames of each file. The first frames of this particular video are mostly dark. The thinking here is that the first frame of each file will be a keyframe, which will flex certain decoding functions, while the subsequent 3 frames should be interframes that differ very little from the first frame and so will exercise different areas of code.
I was surprised by the relative sizes of the profiling output files:
size (bytes)  file
    93614956  wmv9-56k.txt
    81672616  wmv9-128k.txt
   125989596  wmv9-300k.txt
   115864856  wmv9-500k.txt
   115944496  wmv9-700k.txt

My guess was that progressively higher bitrates would require more and more instructions to decode, but that was not always the case: the 56K trace is larger than the 128K trace, and the 300K trace is larger than both the 500K and 700K traces. There might be other factors at work here. Perhaps a lower frame rate in the 56K file meant its fade-in began sooner (in frame terms) than in the 128K file, so that a subsequent frame required more decoding logic in the 56K file.
The next surprise was that I couldn't coax Perl into doing quite what I expected. Actually, that is not a big surprise. Until I get the bugs ironed out of this script, I can post the typical results from profiling ProcessInput() on the first frame. These functions always dominated:
*************************************************************************
 Profile: ProcessInput(), frame #0
 total addresses executed: 6573909
*************************************************************************
(no name): 00440430 -> 004406AF, count = 1175863 (17.8868158959913%)
(no name): 0043CDE0 -> 0043F8FF, count = 991800 (15.0869140415543%)
(no name): 0044E840 -> 0044EB6F, count = 943158 (14.346988983267%)
(no name): 0044EB70 -> 0044ED9F, count = 685402 (10.4260950372145%)
(no name): 004A4B20 -> FFFFFFFF, count = 594077 (9.03689114041585%)
(no name): 00432840 -> 004333BF, count = 477467 (7.26306068428997%)
(no name): 00440870 -> 0044092F, count = 408885 (6.21981533361657%)
(no name): 004406B0 -> 0044086F, count = 234320 (3.56439372677657%)
(no name): 004334A0 -> 004337FF, count = 228732 (3.47939102899051%)
(no name): 00458CC0 -> 0045A45F, count = 186976 (2.84421338962861%)
(no name): 00434D30 -> 00434FAF, count = 116100 (1.76607251484619%)
(no name): 00432520 -> 0043283F, count = 100977 (1.53602673842914%)
(no name): 00440370 -> 0044042F, count = 98678 (1.50105515607229%)
(no name): 00440930 -> 00440AAF, count = 85124 (1.29487645782745%)
(no name): 004307F0 -> 0043085F, count = 66600 (1.0130958612296%)
[...]

The fifth entry, the range 004A4B20 -> FFFFFFFF, is a catch-all rather than a single function: its upper bound is the top of the 32-bit address space, and it represents any executed addresses that did not fall into the other bins. These are the pieces of native support code needed to run the binary module. Eventually, these functions should be broken out in the profile as well.
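Each percentage in the listing is simply count / total × 100; for the top entry, 1175863 / 6573909 × 100 ≈ 17.8868%. A C sketch of the report step sorts the bins by count and prints them in roughly the listing's layout. It assumes the total is passed in separately as the number of trace entries for the run; the struct, function names and exact number formatting are hypothetical, not taken from sort-addresses.pl.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

/* One function boundary and the number of trace hits inside it. */
struct func_range {
    uint32_t start;
    uint32_t end;
    unsigned long count;
};

/* Share of the trace, in percent, that landed in one bin. */
double percent_of(unsigned long count, unsigned long total)
{
    return total ? 100.0 * (double)count / (double)total : 0.0;
}

/* qsort comparator: larger counts first. */
static int by_count_desc(const void *a, const void *b)
{
    const struct func_range *x = a, *y = b;
    return (x->count < y->count) - (x->count > y->count);
}

/* Sort bins by count (in place) and print every non-empty bin. */
void print_profile(FILE *out, const char *title,
                   struct func_range *r, size_t n, unsigned long total)
{
    qsort(r, n, sizeof *r, by_count_desc);
    fprintf(out, "Profile: %s\ntotal addresses executed: %lu\n", title, total);
    for (size_t i = 0; i < n && r[i].count > 0; i++)
        fprintf(out, "(no name): %08" PRIX32 " -> %08" PRIX32
                     ", count = %lu (%.13g%%)\n",
                r[i].start, r[i].end, r[i].count,
                percent_of(r[i].count, total));
}
```

Sorting in place keeps the sketch short; a caller that still needs the bins in address order (for further lookups) would sort a copy instead.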