I have a confession– it has been a very long time since I created any meaningful backup of the FATE database. The thing has been operational for over 8 months now and has collected hordes of data. I already had one adventure with dramatically trimming the data size. But when I try to create a backup, it just goes on and on and on.
I finally thought to look up what MySQL facilities might be able to help me diagnose this. Here’s something: SHOW TABLE STATUS. It turns out that the table that stores the build records is nearly 600 MB large. The table that holds the test results exceeds 230 MB (at least it’s not 2.3 GB like the last time I visited the size issue). The same culprit is to blame– stdout and stderr:
mysql> SELECT
-> SUM(LENGTH(stdout)),
-> SUM(LENGTH(stderr))
-> FROM build_record;
+---------------------+---------------------+
| SUM(LENGTH(stdout)) | SUM(LENGTH(stderr)) |
+---------------------+---------------------+
| 424930434 | 165123451 |
+---------------------+---------------------+
The database has evolved before and it’s time for the next evolution, based on the premise that there is almost no reason whatsoever to store the stdout data. I know of one circumstance where it helps– it is useful to read the configure script output to verify configuration options (especially after I update the gcc SVN configurations). And there is definitely a good reason to keep stderr around– if the build fails, the stderr is the first stop for diagnosing the problem. Even if the build succeeds, it should be theoretically possible to compare stderr across builds to find common warnings that should be eliminated (possible future expansion).
So, my solution: Periodically retire old data. In this case, I was planning to retire stdout/stderr on build records from before September. But, well… mistakes were made and I managed to retire ALL the stdout/stderr data for all existing build records, thus further underscoring the need for responsible periodic backups. Eh, I don’t think anyone looked at that data anyway. And the table is nicely compact now:
+---------------------+---------------------+
| SUM(LENGTH(stdout)) | SUM(LENGTH(stderr)) |
+---------------------+---------------------+
| 665532 | 665532 |
+---------------------+---------------------+
It’s actually possible to back it up now. A curious observation: MySQL’s SHOW TABLE STATUS still reports a Data_length for this table of 591471720 bytes, which concerned me. But then I spied the Data_free column which reports 588306412 bytes. I think the numbers are related.
I’m still deciding on more efficient policies for the test results table, especially since I have plans to expand to other platforms soon.