{"id":45,"date":"2005-03-13T18:20:14","date_gmt":"2005-03-14T01:20:14","guid":{"rendered":"\/?p=45"},"modified":"2006-05-18T10:47:46","modified_gmt":"2006-05-18T17:47:46","slug":"automated-memory-excavation","status":"publish","type":"post","link":"https:\/\/multimedia.cx\/eggs\/automated-memory-excavation\/","title":{"rendered":"Automated Memory Excavation"},"content":{"rendered":"<p>Sometimes, multimedia programs are modular which facilitates reverse engineering. But what if they are not modular or are just standalone programs that do one thing well?<\/p>\n<p><!--more--><\/p>\n<p>Many multimedia program architectures are highly modular in order to allow easy expandability for new codecs and container formats. Since the modules conform to well-defined interfaces, usually exporting one or more functions from a standard API, such modular platforms provide an excellent jumping-off point for reverse engineering. But what happens when you encounter a standalone player where all of the decoding code is hidden in there somewhere amidst megabytes of unrelated code? Or, in the case of Apple&#8217;s QuickTime Player, many of the core codecs are rolled up into one mega-DLL?<\/p>\n<p>In situations like this, I like to use a chase-the-data-through-the-buffers strategy. I will need to come up with a better name for that. It operates on the following simple logical premise:<\/p>\n<ul>\n<li>Somewhere in this big, binary module, there is the entry point to a decoding function, a decoding function that I want.<\/li>\n<li>One of the parameters to that decoding function will need to be a pointer to encoded data.<\/li>\n<li>Question: Where does that data come from? Answer: On disk.<\/li>\n<li>Solution: Intercept the data when it is read from the disk.<\/li>\n<\/ul>\n<p>I once had access to a debugger that could be made to stop when a particular area of memory was read from. I would put a breakpoint on the standard Windows ReadFile() function. 
I would wait until I recognized the buffer of data being read from the file (since I had the sample file open in a hex editor in a different window). Then I would put a read breakpoint over the entire data buffer, which would lead me straight to the core decoding function.<\/p>\n<p>I do not have access to that debugger anymore, and it seems to be a fairly rare feature in modern debuggers. Besides, using debuggers just encourages one to step through the code. I want cool software tools that do tedious work for me.<\/p>\n<p>So, let&#8217;s brainstorm a new program. Operating on a similar logical premise as above, feed it a buffer pattern you wish to match. The pattern would be, for example, the first 16 bytes of the first keyframe of a file that contains data in a format that is still only decodable by a binary codec. The program would then set the x86 CPU trap flag, so that every instruction raises a single-step exception, and a custom handler for that exception would search memory for the pattern.<\/p>\n<p>Seems like a big task. 
I think I would break it down like this:<\/p>\n<ul>\n<li>wait for data to be moved into a register<\/li>\n<li>if new data in register appears to be a reasonable data memory location (need some heuristics on this one), start doing a pattern match starting at that address<\/li>\n<li>log to a file every match (likely more than one occurrence), and the address of the mov instruction that placed the data buffer into a register<\/li>\n<li>log to a separate file the last memory location that was about to be checked&#8211; use this intelligence on subsequent runs after the profiling program has accessed illegal memory<\/li>\n<\/ul>\n<p>Nutty, and not something I am likely to get started on soon.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How about a tool that searches for a pattern through memory as a program is running?<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"class_list":["post-45","post","type-post","status-publish","format-standard","hentry","category-reverse-engineering"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/45","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/comments?post=45"}],"version-history":[{"count":0,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/45\/revisions"}],"wp:attachment":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/media?parent=45"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/categories?post=45"},{"taxo
nomy":"post_tag","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/tags?post=45"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}