Category Archives: FATE Server

Good FATE Housekeeping

The front page of FATE may not look all that different, save for some new tests every few days. But I assure you I am doing a lot of work on the backend to improve the whole thing. I finally deployed the optimizations outlined in this post, which put the brakes on the unrestrained growth of the database (2.5 GB in about three months). I also thought of another –ahem– groundbreaking optimization: if the client-side test script compresses the stdout or stderr for a build record and the compressed size is larger than 64K, there’s really no reason to send it over and store it in the database– the database won’t be able to decompress the data block anyway after it is truncated to 64K. This is the case for every single icc build; Intel’s C compiler insists on operating in a pedantic mode that I can’t seem to disable, despite the options documentation. Fortunately, I have been logging the total stderr line count for several weeks now (though I haven’t made it available via the web interface yet).

So what to do about this enormous database? At first, I suspected that all those specious icc stderr blobs had a significant impact. But no: they only occupy 13 MB. Still, that useless data is now gone.
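Purging them is simple enough. A sketch of the measurement and the cleanup, assuming the build logs live in a build_record table and that icc configurations can be picked out by a config_name column (both names invented for illustration):

mysql> SELECT SUM(LENGTH(stderr))
  FROM build_record
  WHERE config_name LIKE '%icc%';

mysql> UPDATE build_record
  SET stderr=NULL
  WHERE config_name LIKE '%icc%';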

The real space pigs are the cumulative stderr/stdout blobs logged prior to my space-saving epiphany:

mysql> SELECT 
  SUM(LENGTH(stdout)), 
  SUM(LENGTH(stderr)) 
  FROM test_result;
+---------------------+---------------------+
| SUM(LENGTH(stdout)) | SUM(LENGTH(stderr)) |
+---------------------+---------------------+
|          1337031227 |          1096300800 |
+---------------------+---------------------+

Whoa. So that’s where all the space went. That’s right: ~1.3 GB and ~1.1 GB, respectively. Getting rid of the stderr blobs is pretty straightforward. I don’t care about stderr if the test was successful:

mysql> UPDATE test_result
  SET stderr=NULL
  WHERE test_passed=1;

The stdout blobs are a bit trickier. If the test failed, the stdout is always a keeper, just like the stderr. If the test succeeded, I still want the stdout data if the corresponding test spec listed the expected stdout as NULL– that means the stdout is supposed to be retained for logging purposes. This is presently used for tracking the assorted binaries’ filesizes over time and I don’t want to drop that data. So it seems I need data from another table in order to make this query work. Sounds like a job for a JOIN operation, which didn’t seem to be possible in an UPDATE query per the raw syntax spec. I’m glad I turned out to be wrong, as indicated by this blog post. So I cloned a small test table and went to work prototyping the query backwards and forwards, lest I lose 3 months of precious filesize data.

mysql> UPDATE test_result result 
  INNER JOIN test_spec spec
  ON result.test_spec=spec.id
  SET result.stdout=NULL 
  WHERE result.test_passed=1
  AND spec.expected_stdout IS NOT NULL;
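Incidentally, cloning a scratch copy of a table for this kind of prototyping is cheap in MySQL; a sketch, with an invented table name:

mysql> CREATE TABLE scratch_result LIKE test_result;

mysql> INSERT INTO scratch_result
  SELECT * FROM test_result LIMIT 1000;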

Things are a bit more manageable now and I’m happy to report that I have a fighting chance of implementing a regular, automated backup schedule for FATE:

+---------------------+---------------------+
| SUM(LENGTH(stdout)) | SUM(LENGTH(stderr)) |
+---------------------+---------------------+
|             2046425 |             4292739 |
+---------------------+---------------------+

I hope you have enjoyed yet another installment of The New Adventures Of The Amazingly Inept DBA.

Feigning Transactions

There is some collateral damage showing up on FATE due to that indexing solution deployed last week, a side effect that should also be obvious to anyone with one or two computer science courses behind them: indexing a column of a table means that the index must be updated during each insert, which makes the insert operation slower. The net effect is that it widens the window in which the FATE page cache generator can erroneously report too few tests corresponding to a build record. A box that ought to be yellow (meaning that one or more tests failed) shows up green, but closer inspection reveals that only 21 of the test results had been counted: 21/21 succeeded.

What is happening is that, when a FATE build/test cycle is complete for a configuration, the script enters a new build record and, if the build succeeded, proceeds to enter a new test result record for each of the (currently) 111 tests. When the page cache update script kicks in on its usual 15-minute interval, a client script might still be inserting test result records for a build, leading to a race condition. I mitigate this possibility with the following algorithm, sketched here in Python (count_test_results() is a stand-in for the real query):

  import sys
  import time

  # count_test_results() stands in for the actual query:
  #   SELECT COUNT(*) FROM test_result WHERE build_record = ?
  current_count = count_test_results(build_record)
  for _ in range(10):
      new_count = count_test_results(build_record)
      if new_count == current_count:
          break  # count is stable; assume the client is done inserting
      current_count = new_count
      time.sleep(4)  # wait 4 seconds and try again
  else:
      sys.exit()  # give up after 10 tries; it'll try from cron again anyway

The heuristic actually works quite well. However, sometimes the server is extremely bogged down for one reason or another and the insert operations occur more than 4 seconds apart, or perhaps the client lost its connection before it could enter all the test results. Either way, the count stabilizes prematurely and the cache is generated from an incomplete result set.

The proper solution to this would be database transactions. MySQL is not renowned for its transactional support. True, version 5 is supposed to support them, and I am on version 5. But it requires special configuration that I don’t know how to perform and am not sure if I even have the access privileges to set up. But I have determined empirically that transactions are not supported with my current database and configuration (method: create a new table; insert a new record; start a transaction; update the record; query the record from a different session; the record has been updated, ergo, the START TRANSACTION was not honored).
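For reference, that experiment looks something like this (table and column names invented). The likely explanation: the default MyISAM storage engine has no transaction support, so the transaction statements are accepted but silently ignored; transactions require InnoDB tables, which is presumably the “special configuration” in question:

-- session 1
mysql> CREATE TABLE txn_test (id INT, value INT);
mysql> INSERT INTO txn_test VALUES (1, 0);
mysql> START TRANSACTION;
mysql> UPDATE txn_test SET value=1 WHERE id=1;
-- no COMMIT yet

-- session 2, connected at the same time
mysql> SELECT value FROM txn_test WHERE id=1;
-- returns 1: the uncommitted update is already visible,
-- ergo the START TRANSACTION was not honored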

Idea: Feign a transaction by adding a new field to the build record table, a boolean flag to indicate if the record is complete. When the build record is first entered, the flag is cleared. Only after all the test results have been entered will the flag be manually set true. Using this method, FATE will easily be able to find build records that were completed. This has the downside of leaving specious, “zombie” data in the tables and I will probably need to create a process for periodically cleaning the cruft in lieu of proper transaction/rollback support.
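A sketch of the scheme, with invented column names (complete, build_date):

mysql> ALTER TABLE build_record
  ADD COLUMN complete TINYINT NOT NULL DEFAULT 0;

-- client: enter the build record, then all of its test results,
-- and only then flip the flag (6742 being the build record just finished)
mysql> UPDATE build_record SET complete=1 WHERE id=6742;

-- page cache generator: only trust finished records
mysql> SELECT id FROM build_record WHERE complete=1;

-- periodic cruft cleanup: zombie records that never finished
mysql> DELETE FROM build_record
  WHERE complete=0 AND build_date < NOW() - INTERVAL 1 DAY;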

A perfect hack of a solution, I hope. We make do with what we have because MySQL is just so fast and free.

An Object Lesson In Database Optimization

I have a tendency to regard a database engine as a black box. I just formulate my queries and count on the engine to make them fast, somehow. I think this is similar to the faith that people tend to place in language compilers– just write the C code and the compiler will magically optimize it. And if the code isn’t fast enough, maybe you should use a higher optimization level. Of course, a computer scientist ought to be able to analyze algorithmic running efficiency and spot opportunities for theoretical improvement, rather than relying on the compiler to insert a faster machine instruction here or there.

I started reading up on MySQL optimization strategies. There are a few things to understand about how the database works under the covers, things that are quite intuitive to anyone who has a semester of data structures and algorithms coursework. The FATE database is getting slower as it grows larger. The table growing the fastest is test_result. Each build currently generates 111 new rows, one for each active test specification.

mysql> SELECT COUNT(test_spec) 
       FROM test_result 
       WHERE build_record=6742;
+------------------+
| COUNT(test_spec) |
+------------------+
|              111 |
+------------------+
1 row in set (4.12 sec)
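The resolution is already spoiled by the “Feigning Transactions” entry above: the fix was to index the column being searched. A minimal sketch of such an index (the index name is invented):

mysql> CREATE INDEX build_record_idx
  ON test_result (build_record);

With the index in place, looking up the rows for one build record becomes a quick B-tree descent rather than a scan of the entire ever-growing table.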


How Many IDs In A Database?

Current snapshot of the FATE database:

[snapshot table not preserved]

And we’re just getting started. This might be construed as either long-term planning or silly paranoia, but I have started to wonder what it would take to overflow the id field of the test_result table. I’m not even sure how large it is. MySQL simply reports the database field as being type “int(11)”. I have read various bits of literature which do not give a definitive answer on just how many bits that is. Worst case, I am assuming 32 bits, signed, with a useful positive range around 2 billion. Suppose I ramp up to around 500 unique tests in the database (hey, with all the individual regression tests yet to be imported, as well as various official conformance suites, that’s actually a fairly conservative estimate) and add 6 more configurations to round out to 20. That means each build/test cycle will generate 500 * 20 = 10000 test results. If there are 10 cycles on an average day, that means 100,000 test results per day and 3 million per month. That would last the 31-bit range for about 715 months, or nearly 60 years.
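For the record, MySQL’s “int(11)” is always a 32-bit signed integer underneath; the (11) is merely a display width, so the worst-case assumption above is in fact the exact case. And MySQL can do the back-of-the-envelope math itself:

mysql> SELECT POW(2, 31) / 3000000 AS months_until_overflow;

which works out to about 715.8 months of headroom at 3 million new rows per month.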

Okay, I guess I will put off worrying about the implications for the time being. But I still need to revise the test_result table to be more efficient (i.e., quit storing the stdout field if it’s the same as was specified in the test specification).