PHP, MySQL, and persistent connections

15 January 2008

This is a pretty long story, so buckle in...

One of the reporting setups I'm in charge of here at work involves a trio of machines, let's call them LXA, WDX, and WSX. WDX is my local box, where my development happens; WSX hosts a reasonably high-usage (by internal standards) MySQL database with some pertinent statistics; LXA is a Linux box which, among other things, hosts the Apache/MySQL setup for the whole gig.

The report itself consists of a bare-bones HTML framework, with AJAX posting data into the skeleton. Two separate scripts provide different datasets, one refreshing once per minute and the other every ten seconds. In development, our this setup worked flawlessly. WDX connected to WSX via mysql_connect(...) and dumped the data, returning in ~250ms; the display was fresh and there was much joy. This success in hand, we pushed "live" (internally) to LXA and were confronted by an immediate 20s latency on server returns. No biggie — ruins the point of 10-second refreshes, but it's still bearable and can easily be attributed to other loads on the server.

Time goes by, summer turns to autumn turns to winter, and we forget about the issues. Two weeks ago, I receive a panicked e-mail, “[reporting system] is broken! Nothing's showing up!” It's nothing mission-critical, so I'm not terribly concerned, but people seem to enjoy the numbers, so I decide to be kind and look into it. I fire up the reporting tool and — FireBug is a beautiful thing — discover that my little 20s latencies have grown to 60s, and valid XML has become empty server timeouts.

mysqld-nt is showing 99% CPU consistently, so we kill it and restart. We heave a quick sigh of relief, until a few hours later we're back in troubletown. Rinse and repeat a few times, and it's apparent we've got an issue. Using Win32::Process::Info (link), we whip up a quick Perl script to diagnose the problem; in the process, we discover that usage jumps around lunchtime, and eventually subsides from 99% after everyone has gone home. Coincidence, much?

Armed with this knowledge, we send out the obvious e-mail, “Please refrain from using [reporting system] unless absolutely necessary.” Lo and behold, everything is fine. For a day. I should state that I attack MySQL from the PHP end, and not the other way around; that's why we're still trying to fix the webserver end at this point. We swap out all of the mysql_* calls in the PHP with their MySQLi equivalents, and notice a 25% (give or take) improvement. Sadly, not enough improvement to actually work.

We finally begin traversing the MySQL documentation, and hit upon the (obvious if you know it) command show processlist; which, unsurprisingly, lists active database connections and their states. Here's a cap of the results during this period:

+----+----------------------+----------------+---------+------------------+
| Id | User                 | Host           | Command | Info             |
+----+----------------------+----------------+---------+------------------+
|  2 | write                | localhost:2085 | Query   | show processlist |
| 13 | unauthenticated user | LXA:41333      | Connect | NULL             |
| 14 | unauthenticated user | LXA:41334      | Connect | NULL             |
| 15 | unauthenticated user | LXA:41337      | Connect | NULL             |
| 16 | unauthenticated user | LXA:41339      | Connect | NULL             |
| 17 | unauthenticated user | LXA:41341      | Connect | NULL             |
| 18 | unauthenticated user | LXA:41342      | Connect | NULL             |
| 21 | unauthenticated user | LXA:41344      | Connect | NULL             |
| 22 | unauthenticated user | LXA:41349      | Connect | NULL             |
| 23 | unauthenticated user | LXA:41350      | Connect | NULL             |
| 24 | unauthenticated user | LXA:41354      | Connect | NULL             |
| 25 | unauthenticated user | LXA:41355      | Connect | NULL             |
| 27 | unauthenticated user | LXA:41356      | Connect | NULL             |
| 28 | unauthenticated user | LXA:41357      | Connect | NULL             |
| 29 | unauthenticated user | LXA:41358      | Connect | NULL             |
| 30 | unauthenticated user | LXA:41359      | Connect | NULL             |
| 31 | unauthenticated user | LXA:41360      | Connect | NULL             |
| 32 | unauthenticated user | LXA:41361      | Connect | NULL             |
+----+----------------------+----------------+---------+------------------+
18 rows in set (0.00 sec)

(edit: scrubbed a little bit to remove unneeded information) What? 16 connections stuck in Connect state? We test the LXA - WSX link: ping is <1ms. Hm. We find that all queries are executing near-instantly (as close as expected). Is the problem only that the connections won't take? Some quick research into this revealed no known solution, but I expect to be proven wrong at some point (help!).

After fumbling for a bit, we found the mysql_pconnect(...) function; it's not hidden, by any means, but it's not really advertised. What mysql_pconnect(...) offers is a persistent database connection, which is hand-wavingly claimed to be unique by host, username, and password. If you're not familiar with persistent database connections, here's a primer: every time you connect to a database, a certain overhead is required. If the database is on the same machine, or even on a small network, the overhead is generally minimal and not much of a worry. If, however, the connection overhead is high, there is a possible advantage to never throwing out old connections: allowing them to persist (hence, “persistent”). There are definitely risks to this scheme; PHP actually recommends against it:

Using persistent connections can require a bit of tuning of your Apache and MySQL configurations to ensure that you do not exceed the number of connections allowed by MySQL.

And in their section on Persistent Database Connections has this little gem of direction:

People who aren't thoroughly familiar with the way web servers work and distribute the load may mistake persistent connects for what they're not. In particular, they do not give you an ability to open 'user sessions' on the same link, they do not give you an ability to build up a transaction efficiently, and they don't do a whole lot of other things. In fact, to be extremely clear about the subject, persistent connections don't give you any functionality that wasn't possible with their non-persistent brothers.

That said, our specific situation — high connection time with no other symptoms, and an otherwise little-used database server — seemed ripe for this solution. The implementation is identical to that of the original PHP MySQL API, so we rolled back the MySQLi changes (yes, it's incompatible; no, that makes no sense) and ran %s/mysql_connect/mysql_pconnect/g in vi. We quickly pushed the code live, as it couldn't get much worse, and latency was reduced from 90s to 0.2s. WOW.

Excited, we ran show processlist; on WSX:

+-------+-------+----------------+---------+------------------+
| Id    | User  | Host           | Command | Info             |
+-------+-------+----------------+---------+------------------+
|     7 | write | localhost:2453 | Sleep   | NULL             |
| 97744 | write | localhost:2954 | Query   | show processlist |
| 98986 | read  | LXA:49417      | Sleep   | NULL             |
| 99000 | read  | LXA:51757      | Sleep   | NULL             |
| 99002 | read  | LXA:51770      | Sleep   | NULL             |
| 99034 | read  | LXA:36885      | Sleep   | NULL             |
| 99040 | read  | LXA:36921      | Sleep   | NULL             |
| 99042 | read  | LXA:36924      | Sleep   | NULL             |
| 99080 | read  | LXA:59473      | Sleep   | NULL             |
| 99094 | read  | LXA:46164      | Sleep   | NULL             |
| 99102 | read  | LXA:46306      | Sleep   | NULL             |
| 99124 | read  | LXA:43139      | Sleep   | NULL             |
| 99196 | read  | LXA:59307      | Sleep   | NULL             |
| 99248 | read  | LXA:48019      | Sleep   | NULL             |
| 99427 | read  | LXA:48323      | Sleep   | NULL             |
| 99433 | read  | LXA:48387      | Sleep   | NULL             |
+-------+-------+----------------+---------+------------------+
16 rows in set (0.00 sec)

Wait, what? PHP says the connection should be reused if it's the same host (check), user (check), and password (check). After some discussion, we add some IP logging to the database records (to avoid the nasty Apache logfile, which we don't have access to anyway), and discover 7 simultaneous viewers of the report. Aha! 7 viewers times 2 AJAX scripts per viewer == 14 connections! With the scripts refreshing every 10s - 60s, and connections taking >60s to create, it must be that a whole new persistent connection was created for each view (as there were none in cache). We place the XML on standby, restart the MySQL server, and run a one-time script to create a database link from LXA to WSX. One connection in place, we bring the XML back and wait with baited breath...

Immediately there are 13 new connections. What? This one took some time, and I'm still not 100% on it: PHP reuses the database link if it can find one that's not already in use. Because these scripts are executing so closely, the database link is always occupied by another viewer; there will be as many database links as viewers!

In our case, this doesn't matter. With a small numbers of viewers, it's no problem — and if it is, we can work out a caching scheme. Using mysql_pconnect(...) for database link creation happened to solve all of our symptoms, but we have still failed to address any of the root issues.

For now, we're happy and in green pastures. But still, we're concerned (not enough to put an actual DBA on it, though; if time is money, DBA time is money and a half). Any thoughts on the root issue?

Included \(\LaTeX\) graphics are generated at LaTeX to png or by MathJax.

contemporary entries

comments

Opening and closing connections can have significant impact on database performance. I think, PHP layer is scalable because you can just add processes. Each process does not store anything. But, it also causes the problem that each process must open its own database connection for persistence.

Then, I hit upon a solution followed by Postgres, IBM etc. Make a middleware that simply forwards queries and results. But, this middleware must open only a limited number of connections to the database. Since this is just a forwarding process, it does not get loaded heavily even with thousands of incoming connections. The SQL commands can be send to a free database connection. If the number of incoming connections are more than db connections, SQL queries can be queued and executed on a first come first serve basis.

Thomas George (24 February 2008, 2:16am)
Hi Thomas,

Interesting. Is there any accepted (read: open-source) implementation of that, or is it just something to code up internally? Is there anything that might cause that to lag considerably?
Kyle (13 March 2008, 2:48am)
Hello!

My site is a very high traffic consuming site. My webhosts say that it is consuming a lot of CPU resource. Will usage of mysql_pconnect improve it?

Thanks.
AURO (19 April 2008, 3:54am)
Hi Auro,

There's a chance, but it's more likely to be some unoptimized queries than latent connect times (in my non-professional opinion). If you can't monitor DB load -- if you don't have access to the underlying shell -- then you might be able to wrap some timing loops around some common queries, and watch how they perform.

Then again, you can always try _pconnect, but that's just one possible problem of out many.

K
Kyle (5 May 2008, 7:47pm)

Sorry, further commenting on this post has been disabled. For more information, contact me.