I have been assisting The Translator in the translation of another mid-1990s adventure game. This one isn’t quite as multimedia-heavy as the last title, and the challenges are a bit different. I wanted to compose this post in order to describe my thought process and mental model in approaching this problem. Hopefully, this will help some others understand my approach since what I’m doing here often appears as magic to some of my correspondents.
High Level Model
At the highest level, it is valuable to understand the code and the data at play. The code is the game’s engine and the data refers to the collection of resources that comprise the game’s graphics, sound, text, and other assets.
Simplistic high-level game engine model
Ideally, we want to change the data in such a way that the original game engine adopts it as its own because it has the same format as the original data. It is very undesirable to have to modify the binary engine executable in any way.
Modifying The Game Data Directly
How to modify the data? If we modify the text strings for the sake of language translation, one approach might be to search for strings within the game data files and change them directly. This model assumes that the text strings are stored in a plain, uncompressed format. Some games might store these strings in a text format which can be easily edited with any text editor. Other games will store them as binary data.
In the latter situation, a game hacker can scan through data files with utilities like Unix ‘strings’ to find the resources with the desired strings. Then, use a hex editor to edit the strings directly. For example, change “Original String”…
0098F800 00 00 00 00 00 00 00 4F 72 69 67 69 6E 61 6C 20 .......Original 0098F810 53 74 72 69 6E 67 00 00 00 00 00 00 00 00 00 00 String..........
…to “Short String” and pad the difference in string lengths using spaces (0x20):
0098F800 00 00 00 00 00 00 00 53 68 6F 72 74 20 53 74 72 .......Short Str 0098F810 69 6E 67 20 20 20 00 00 00 00 00 00 00 00 00 00 ing ..........
This has some obvious problems. First, translated strings need to be of equal our smaller length compared to the original. What if we want to encode “Much Longer String”?
0098F800 00 00 00 00 00 00 00 4D 75 63 68 20 4C 6F 6E 67 .......Much Long 0098F810 65 72 20 53 74 72 00 00 00 00 00 00 00 00 00 00 er Str..........
It won’t fit. The second problem pertains to character set limitations. If the font in use was only designed for ASCII, it’s going to be inadequate for expressing nearly any other language.
So a better approach is needed.
Understanding The Data Structures
An alternative to the approach outlined above is to understand the game’s resources so they can be modified at a deeper level. Here’s a model to motivate this investigation:
Model of the game resource archive format
This is a very common layout for such formats: there is a file header, a sequence of resource blocks, and a trailing index which describes the locations and types of the foregoing blocks.
What use is understanding the data structures? In doing so, it becomes possible to write new utilities that disassemble the data into individual pieces, modify the necessary pieces, and then reassemble them into a form that the original game engine likes.
It’s important to take a careful, experimental approach to this since mistakes can be ruthlessly difficult to debug (unless you relish the thought of debugging the control flow through an opaque DOS executable). Thus, the very first goal in all of this is to create a program that can disassemble and reassemble the resource, thus creating an identical resource file. This diagram illustrates this complex initial process:
Rewriting the game resource file
So, yeah, this is one of the most complicated “copy file” operations that I can possibly code. But it forms an important basis, since the next step is to carefully replace one piece at a time.
Modifying a specific game resource
This diagram shows a simplistic model of a resource block that contains a series of message strings. The header contains pointers to each of the strings within the block. Instead of copying this particular resource block directly to the new file, a proposed modification utility will intercept it and rewrite the entire thing, writing new strings of arbitrary length and creating an adjusted header which will correctly point to the start of each new string. Thus, translated strings can be longer than the original strings.
Further Work
Exploiting this same approach, we can intercept and modify other game resources including fonts, images, and anything else that might need to be translated. I will explore specific examples in a later blog post.
Followup
- Translating Return to Ringworld, in which I apply the ideas expressed in this post.
Didn’t you use any disassembler for the work? You got lucky those packages aren’t encrypted AND/OR compressed, crc-ed etc.
@Bartosz; Indeed, if the strings were not plainly visible, the task would have been a lot harder. Fortunately, games of this period tended not to use a lot of obfuscation in their formats.
For this particular target I didn’t have to do any major reverse engineering because the ScummVM project did all the hard work already. I just had to read the ScummVM source code. I will describe the specifics in a later post.
As an aside, I clicked on your link– do you perform language translations for games on a professional basis?
I’m not doing translations, I’ve worked on many games internals to provide tools for translators and gfx guys, so they can add localized fonts, translated strings, switch images in the packages, all without any source code with lots of reverse engineering and low level programming (adding missing features). So I know the pleasure of discovering game formats and its modification ;)
Too bad you have stopped blogging, I always like this blog.
Oh, I haven’t quit blogging. I just greatly reduced the frequency. I always have plenty of projects going on simultaneously. I don’t get around to posting until I can post something meaty, hopefully when a project or sub-project is coming to fruition.