People offered a lot of constructive advice about my recent systematic profiling idea. As in many engineering situations, there’s a strong desire to get things right from the start, while at the same time some hard decisions need to be made or the idea will never get off the ground.
Code Coverage
A hot topic in the comments of the last post dealt with my selection of samples for the profiling project. It seems that the Big Buck Bunny encodes use a very sparse selection of features, at least when it comes to the H.264 files. The consensus seems to be that, to do this profiling project “right”, I should select samples that exercise as many decoder features as possible.
I’m not entirely sure I agree with this position. Code coverage is certainly an important part of testing that should receive even more consideration as FATE expands its general test coverage. But for the sake of argument, how would I go about encoding samples for maximum H.264 code coverage, or at least samples that exercise a wider set of features than the much-derided Apple encoder is known to support?
At least this experiment has introduced me to the concept of code coverage tools. Right now I’m trying to figure out how to make the GNU code coverage (gcov) tool work. It’s a bumpy ride.
Memory Usage
I think this project would also be a good opportunity to profile memory usage alongside CPU usage. Obvious question: how to do that? I see that on Linux, /proc/<pid>/status contains a field called VmPeak which reports the peak virtual memory size the process has used. This might be useful if I can keep the process from dying after it has completed so that the parent process can read its status file one last time. Otherwise, I suppose the parent script can periodically poll the file and track the largest value seen. Since this is testing long-running processes and I think that, ideally, most of the necessary memory will be allocated up front, this approach might work. However, if my early FATE memories are correct, the child process is likely to hang around as a zombie until the final status poll(). Thus, check the status file before the poll.
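Here is a minimal sketch of that polling approach, assuming a Linux-style /proc layout; the file name (vmpeak.c) and helper (read_vmpeak) are made up for illustration and this is not part of FATE:

    /* vmpeak.c: hypothetical sketch -- launch a command and report its VmPeak.
     * Polls /proc/<pid>/status while the child runs and remembers the largest
     * value seen; VmPeak is reported by the kernel in kB. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    /* Read the VmPeak field (in kB) from /proc/<pid>/status, or -1 on failure. */
    static long read_vmpeak(pid_t pid)
    {
        char path[64], line[256];
        long kb = -1;
        FILE *f;

        snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
        f = fopen(path, "r");
        if (!f)
            return -1;
        while (fgets(line, sizeof(line), f)) {
            if (sscanf(line, "VmPeak: %ld kB", &kb) == 1)
                break;
        }
        fclose(f);
        return kb;
    }

    int main(int argc, char *argv[])
    {
        pid_t pid;
        long peak = -1, kb;
        int status;

        if (argc < 2) {
            fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
            return 1;
        }

        pid = fork();
        if (pid < 0)
            return 1;
        if (pid == 0) {
            execvp(argv[1], &argv[1]);
            _exit(127);
        }

        /* Poll until the child is reaped, keeping the largest value seen;
         * once the child has exited, the Vm* fields are no longer available,
         * so read_vmpeak() simply returns -1 and the recorded peak stands. */
        while (waitpid(pid, &status, WNOHANG) == 0) {
            kb = read_vmpeak(pid);
            if (kb > peak)
                peak = kb;
            usleep(100000); /* 100 ms between polls */
        }

        printf("VmPeak: %ld kB\n", peak);
        return 0;
    }

The FATE parent script could of course do the same parsing directly in Python rather than via a separate C helper; the sketch just shows what it would have to read.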
Unless someone has a better idea.
> Unless someone has a better idea.
Run it under valgrind --tool=massif and you’ll get detailed heap stats. --tool=callgrind might give code coverage as well.
There is another tool for valgrind as well, called freya, that should have even better results. But you should ask lu_zero about that as I haven’t followed what it’s supposed to do anyway.
FFmpeg already uses getrusage; you could simply extend that to also use the memory usage parts of the result.
Of course that would only work on Linux for the moment.
For Windows GetProcessMemoryInfo would do it.
And getrusage works on far more than Linux I guess, but POSIX only specifies the CPU usage parts.
Actually, I already made a patch. Will send it after testing.
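For reference, a rough sketch of the idea (not the actual patch): getrusage() already reports CPU time portably, and on Linux and the BSDs the same call also fills in ru_maxrss with the peak resident set size (kilobytes on Linux and the BSDs, bytes on Mac OS X). The function name print_report is made up for illustration:

    /* Hedged sketch, not the actual FFmpeg patch: report CPU and memory
     * usage for the current process via getrusage(). */
    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    static void print_report(void)
    {
        struct rusage ru;

        if (getrusage(RUSAGE_SELF, &ru) < 0)
            return;

        /* CPU usage: the part POSIX guarantees. */
        printf("utime=%ld.%06lds stime=%ld.%06lds\n",
               (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
               (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);

        /* Memory usage: non-standard; kB on Linux/BSD, bytes on OS X. */
        printf("maxrss=%ld\n", (long)ru.ru_maxrss);
    }

    int main(void)
    {
        print_report();
        return 0;
    }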
Haven’t used it, but there’s BSD process accounting:
If you say Y here, a user level program will be able to instruct the kernel (via a special system call) to write process accounting information to a file: whenever a process exits, information about that process will be appended to the file by the kernel. The information includes things such as creation time, owning user, command name, memory usage, controlling terminal etc. (the complete list is in the struct acct in <linux/acct.h>). It is up to the user level program to do useful things with this information.
For automatic stats collection along with the normal tests, having ffmpeg itself print it is by far the simplest solution. You’ll get trouble on the embedded targets otherwise.
I like the getrusage() idea and having FFmpeg print the result. I hope to see it get into the tree soon.
Well, it’s in. Now you only need to send a patch if you want the output on stderr instead of stdout.
For the h.264 clip, instead of Apple trailers, I would obviously use some high profile output of the leading open source encoder, of course :D It’s the one that matters after all. Something made with --preset veryslow (more references, bframes, cabac). It might be a good idea to have one with large resolution but relatively low bitrate and one with relatively high bitrate, to better isolate the speed impact of entropy coding with cabac, maybe.
A separate, simpler stream (cavlc) should probably be used too…