WordPress and hphp: Part II

In my last post I described how to work around some issues when compiling WordPress 3.2.1 with Hiphop-Php. Unfortunately it turned out that the compiled binary suffered from a memory leak, which took me quite some time to find and fix.

As it turned out, hphp has a regular expression cache which keeps every compiled regular expression indefinitely, so the cache is only cleared when you shut down the application. In principle this is not a problem for an application that uses only a limited set of static regular expression patterns (which should be the case for most applications). But once a regex pattern becomes a runtime option, the cache grows without bound. This seems to be due to the fact that hphp compares cache entries according to their regex-pattern hash, and there is no guarantee that two equal, dynamically allocated regex-pattern strings have the same hash, so the same pattern can be cached over and over again. In the specific case of WordPress you have the runtime option to specify the date format, which is mangled into a regex pattern somewhere inside the mysql2date function.

The obvious workaround is to limit the number of cache entries. The specific commit can be found in my hiphop-php branch; as its title says, it makes the PCRECache a least recently used cache, so the oldest entries are evicted once the limit is reached. I strongly recommend that those running an hphp-compiled WordPress apply that patch. Feedback is as always welcome :)
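To illustrate the idea, here is a minimal C++ sketch of a least recently used cache. It is not the code from the patch: the names LruPatternCache, CompiledPattern and simulate_dynamic_patterns are made up for this example, the cache is keyed by the pattern text itself, and the "compiled pattern" is just a copy of the string instead of a real PCRE handle.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a compiled regex (a real cache would hold a PCRE handle).
struct CompiledPattern {
    std::string source;
};

// Bounded cache keyed by pattern text; evicts the least recently used entry.
class LruPatternCache {
public:
    // A capacity of 0 is treated as 1 so that get() can always store its result.
    explicit LruPatternCache(std::size_t capacity)
        : capacity_(capacity == 0 ? 1 : capacity) {}

    // Returns the cached entry for `pattern`, "compiling" it on a miss.
    const CompiledPattern& get(const std::string& pattern) {
        auto it = index_.find(pattern);
        if (it != index_.end()) {
            // Hit: move the entry to the front to mark it as most recently used.
            entries_.splice(entries_.begin(), entries_, it->second);
            return it->second->second;
        }
        // Miss: drop the least recently used entry once the limit is reached.
        if (entries_.size() >= capacity_) {
            index_.erase(entries_.back().first);
            entries_.pop_back();
        }
        entries_.emplace_front(pattern, CompiledPattern{pattern});
        index_[pattern] = entries_.begin();
        return entries_.front().second;
    }

    bool contains(const std::string& pattern) const {
        return index_.count(pattern) != 0;
    }
    std::size_t size() const { return entries_.size(); }
    std::size_t capacity() const { return capacity_; }

private:
    using Entry = std::pair<std::string, CompiledPattern>;
    std::size_t capacity_;
    std::list<Entry> entries_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};

// Feeds `distinct` different patterns through the cache, the way runtime-generated
// patterns would, and returns how many entries are still held afterwards.
std::size_t simulate_dynamic_patterns(std::size_t capacity, int distinct) {
    LruPatternCache cache(capacity);
    for (int i = 0; i < distinct; ++i) {
        cache.get("/pattern-" + std::to_string(i) + "/");
        assert(cache.size() <= cache.capacity());
    }
    return cache.size();
}
```

With an unbounded cache the number of entries would grow with every new pattern; here simulate_dynamic_patterns(4, 1000) ends up holding only 4 entries. The list plus hash map combination keeps both lookup and eviction at constant time, which matters because the cache is hit on every regex call.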

Compiling WordPress with Hiphop-Php

This is a project that I started last weekend, and I just want to share some of the insights I had, because compiling with Hiphop-Php (hphp) is not as straightforward as compiling an application with gcc or clang ;)

The first thing you realize when looking at the github hiphop-php page is that it has a long list of dependencies, which I wanted to reduce to a minimum. So I ended up forking hiphop-php and adjusting it to my needs: it should work with a minimal set of dependencies and it should be easy to deploy. At the moment my list of dependencies that are not provided by CentOS 5 is down to libevent, curl, oniguruma and libmemcached. I had to sacrifice the ICU SpoofChecker, but as it isn’t used by WordPress this shouldn’t be a problem. Additionally I’ve chosen to link these dependencies statically, because I compile this stuff in a separate virtual machine and I don’t want to mess with rpath issues.

Once you get to the point where you have a working hphp and try to compile WordPress 3.2.1, you will notice that the function SpellChecker::loopback won’t compile. Introducing a temporary variable fixes the issue:

$ret = func_get_args();
return $ret;

Now you are at the point where you can compile WordPress :) ... but it won’t work :D Some of the SQL queries will fail, and the best workaround I could come up with is to set

$q['suppress_filters'] = true;

in query.php.

So was this all worth it? Given the current viewership numbers of this blog I wouldn’t say so, but it was quite fun :D According to apachebench this blog is now able to serve 50 requests per second instead of 10.

At the end some last remarks about hphp:

  • Using the mentioned approach generates huge binaries, so a normal WordPress blog needs about 40-50 MB. The problem seems to be that some files, especially the dynamic_*.cpp ones, accumulate the references to symbols in other files. This prevents the linker from stripping the unneeded sections, because the compiler by default puts all functions of the same source file into one section. There are compiler flags, namely “-ffunction-sections” and “-fdata-sections”, which in combination with the linker flag “-Wl,--gc-sections” can change this behavior, but so far I haven’t tried them.
  • The upstream hphp has some issues with the source files not being present at runtime, see this commit.
  • I personally don’t like having to run cmake in the root path of hphp :)