I admit that a major issue I had with launching the second iteration of my build/test infrastructure was getting over my severe phobia of web database programming. I did a little web database programming circa 2001 but lost the thread shortly thereafter. Meanwhile, in the intervening years, I have read story after IT horror story about database-driven websites being subverted with something called SQL injection attacks. However, the problem is not quite as scary once you understand the key causes and what language features are in place to prevent abuse.
I now recognize that my web database programming from 2001 was probably vulnerable to a variety of SQL injections. I should dig up those old PHP scripts for an audit.
For the uninitiated, an SQL injection occurs when user input is passed to an SQL query without proper (or, indeed, any validation). If your program constructs an SQL query that looks like:
SELECT this, that FROM some_table WHERE field="$user_input"
At issue is that the user might enter data such as ‘ “; DROP DATABASE <well_known_database_name>;’. I.e., prematurely end the query and insert malicious commands. Now, most database installations these days have default counter-measures in place that protect against such severe consequences (like not allowing stacked statements as illustrated; it’s also a basic security consideration to restrict the script’s SQL privileges so it can’t drop the database or tables). However, more subtle attacks are still possible. The pedagogical example is to input something similar to ‘ ” OR ” ” = ” ‘, which would complete the query as:
SELECT this, that FROM some_table WHERE field=" " OR " " = " "
Since the final condition is always true, all rows in the table will be selected, thus subverting the query and possibly exposing too much information. The classical solution to this problem is to escape quotes in user input, so that the original query would become:
SELECT this, that FROM some_table WHERE field=" \" OR \" \" = \" "
which would likely be safer.
Still paranoid about these issues, I designed my new build/test reporting PHP script with numerical arguments only; no strings. Guess what? That doesn’t help. In fact, I successfully launched an SQL injection against my own script as follows:
- http://builds.multimedia.cx/index.php?machine=2 executes the following query: “SELECT * FROM machine WHERE id=${_GET[‘machine’]}”; thus, it returns record #2, corresponding to x86_32/Linux.
- Manually hacking the URL to read http://builds.multimedia.cx/index.php?machine=2+OR+1=1 creates a query of “…WHERE id=2 OR 1=1” which selects all rows and the first one returned happens to be record #1, corresponding to powerpc/Linux.
I have since fixed the injection, and others, by verifying that the user input field is 100% numeric. But it’s just that easy to have an SQL injection. It really played havoc on certain other queries such as the build record query where every single test result could be returned and printed.
I will, however, need to write some PHP-backed forms that drop text data into a database field, so the first type of SQL injection will become an issue. PHP includes a function called mysql_escape_string() that is supposed to cure all of your string input worries. Unfortunately, it is deprecated due to security issues. Then there is mysql_real_escape_string(). I was about to say that this function is also deprecated, but I can’t find any reference to that now. However, if you examine the example code in the linked manual page, you will notice that it’s only one piece of the overall sanitization process.
Other languages have nifty functions that get around this issue. Perl and Python are able to construct queries that have parameters. I can’t remember the proper name for this, but in Python, it allows one to write statements such as:
cursor.execute("INSERT INTO table (id, user_data) VALUES ( %s, %s )", (id, big_scary_user_input_blob))
Perl’s API does something similar but with ‘?’ characters instead of ‘%s’, as memory serves. The difference (per my understanding, or perhaps per my supposition) is that rather than constructing a new string to pass to the SQL server, these higher level APIs use these placeholders to tell the database, “Here’s a memory pointer to the data; please read ‘n’ bytes from that pointer and put them in your table.” The input could have all the quotes, newlines, and other tainted characters it likes; the database won’t even notice (it’s not the database that notices in the case of an injection, actually; it’s the scripting language that is interpreting the string).
PHP doesn’t seem to have anything comparable to these placeholders. If it did, they would probably be the recommended solution to SQL injections rather than a hodge-podge of regular expressions and sometimes-deprecated security functions. I originally wanted to write the CGI reporting scripts in Python. However, my ISP did not have the MySQLdb-python module installed. Even if they did install it, Python CGI scripts have to run out of cgi-bin/ off the main directory which would undermine the friendliness of visiting http://builds.multimedia.cx/ as a top-level page.
One last story: The first time I discovered a web injection attack was when I was at university in 1997. The web was still young but I had read a little about injection attacks. Our school’s Unix servers, to which we all had logins, had a command line directory program which allowed us to look up data about other students. There was also a web/CGI version of this program whose output looked awfully familiar. In fact, it looked as if the form just took user input and passed it straight to the Unix command line program. Sure enough, I was able to enter the URL:
http://biguni.edu/tools/cgi-bin/lookup.cgi?name=mike+melanson%0d%0a/bin/ls
And lo, I saw my student information, plus a listing of the directory that held the script. Fortunately, I knew people on the school’s IT staff and reported it. The problem was fixed eventually. Hopefully, lessons were learned as well.
Hi…
I’m not sure your understanding of how Python can handle “tainted” input is correct.
Regarding, “The difference (per my understanding, or perhaps per my supposition…”: as far as i know all ‘cursor.execute(…)’ does is escape the string parameters (using a “known good” algorithm) and then include them in the SQL string supplied to the DBMS.
–Phil.
Thanks for the input. It would be good to get this straightened out before Google decides that this is a top authoritative article on the matter. :-)
mysql_real_escape_string really is the official escaping function. There’s also an official library called PEAR which has parameterized queries.
(mysql_real_escape_string is my second-favorite PHP function name after date_sunrise)
In the case of python, the “%s” part actual varies by database module. I think there are something four different possibilities. The MySQL module uses “%s”, while the sqlite module uses “?”. I wish they would have just settled on one.
The proper name for using placeholders like that is “prepared statements”.
http://dev.mysql.com/tech-resources/articles/4.1/prepared-statements.html
Thanks, Corey– that’s exactly the term I was searching for.