Sites that curl themselves, and debugging them

This one is a full-on facepalm of fun: a client recently redirected a site that we didn't host to one that we do. At some point Googlebot had indexed their old site, including URLs pointing to individual files on it, and once those URLs were fetched through the redirect, they took down our site.

The old site had set up a redirect that carried the full requested URL over to a page that notified the user of the move. For example, RewriteRule ^(.*)$$1 [L,R=301].

Some background: procedural PHP used to be a big thing. Many applications written by talented developers used procedural code, especially before PHP's OO capabilities came into serious use (think early PHP 5), so the only real way to get something like MVC was to home-grow it.

Our site was running a very old version of code that checked, among other things, whether the requested URL was actually a file. If it wasn't (this was before things like controllers and whatnot), a curl request was made to another URL that carried the requested URL as a parameter. This other URL acted like a controller, performing some different checks, but it ultimately fell through to a check that made yet another curl request to the same URL so that it could rewrite some links in the content (don't ask, I didn't write it).

Each of these requests was handed from Apache to PHP-FPM, and each eventually failed with an empty response after a 30-second timeout. However, our webservers were configured with a maximum of only 2 PHP-FPM processes. The first two requests used up every available PHP-FPM process, leaving the third request blocked while it waited for a free one. That in turn caused failures for anyone else trying to browse the site (Pingdom, anyone?).
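
To make the worker arithmetic concrete, here is a toy PHP model of the chain. It is an illustration only, not the original code: simulateChain() and its parameters are hypothetical names. It assumes each hop keeps its PHP-FPM worker busy while it waits on curl (curl_exec() is synchronous), and it treats $maxChildren as the pool's worker cap, which in PHP-FPM is the pm.max_children setting.

```php
<?php
/**
 * Toy model (hypothetical, not the original code): every hop in the
 * self-curl chain holds one PHP-FPM worker while it waits for the next hop,
 * so a single page view needs $chainDepth workers at the same time.
 */
function simulateChain(int $maxChildren, int $chainDepth): array
{
    $busy = 0;
    for ($hop = 1; $hop <= $chainDepth; $hop++) {
        if ($busy >= $maxChildren) {
            // No free worker: this hop sits in the queue until an earlier hop times out.
            return ['completed' => false, 'blockedAtHop' => $hop, 'busyWorkers' => $busy];
        }
        $busy++; // this hop takes a worker, then curls the next hop and waits
    }
    return ['completed' => true, 'blockedAtHop' => null, 'busyWorkers' => $busy];
}

// The situation described above: 2 workers, 3 chained requests.
$result = simulateChain(2, 3);
echo $result['completed'] ? "completed\n" : "blocked at hop {$result['blockedAtHop']}\n";
// prints "blocked at hop 3"
```

Under this model, a pool of 3 workers lets a single page view through, but two crawler hits arriving together would need 6 workers at once.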

The fix (setting aside the fact that the code on these old sites really needs upgrading to something that doesn't curl against itself multiple times) was two-fold:

  1. Anywhere we curl ourselves, set a short timeout with something like curl_setopt($ch, CURLOPT_TIMEOUT, 3); so that these requests can't hang for long.
  2. Increase the number of PHP-FPM worker processes available to that site.
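
A minimal sketch of the first fix, assuming the self-requests go through PHP's curl extension as the curl_setopt() call suggests. fetchInternal() is a hypothetical helper name; the 3-second default mirrors the value above, and CURLOPT_CONNECTTIMEOUT is an extra guard that the original fix does not mention.

```php
<?php
/**
 * Hypothetical self-request helper with short timeouts.
 * Returns the response body, or null if the request failed or timed out.
 */
function fetchInternal(string $url, int $timeoutSeconds = 3): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of echoing it
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // limit time spent establishing the connection
        CURLOPT_TIMEOUT        => $timeoutSeconds, // limit the whole transfer, connect included
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        // Log and give up quickly so this worker is released.
        error_log('Internal request failed: ' . curl_error($ch));
        return null;
    }
    return $body;
}
```

A caller would treat a null return as "skip the link rewriting" rather than retrying, so a stalled hop gives its worker back after a few seconds instead of holding it for the full 30.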

Overall, we spent approximately 4 hours of developer time debugging this issue. It involved some quantum debugging, hair-pulling, and exclamations of "why me." In the end, we have another item to tack onto our list of what-not-to-do.