Of painting less

Quite some time has passed since my last blog post about optimizing KWin’s performance, so I felt the need to write a new one :) In the meantime KDE SC 4.8, including KWin 4.8, has been released with all the features I described in that post, but new optimizations have already landed in the kde-workspace git repository, which I want to explain briefly:

  • The first thing isn’t even new; it is already part of KWin 4.8 :) It is a window property called _NET_WM_OPAQUE_REGION that allows an application with a translucent window to give the window manager a hint about which parts of the window are still opaque. This gives KWin more room for optimizations. It is part of the EWMH spec, and I hope that more applications/styles will adopt it. So far the only style I’m aware of that uses this feature is oxygen-transparent.
  • I ported the TaskbarThumbnail, SlidingPopups and WobblyWindows effects to the faster code path that uses paintSimpleScreen. This was a long overdue step that I would really have liked to have in 4.8, but it required some nontrivial changes to paintSimpleScreen. The actual painting is now done in one pass instead of two, so that more window-transforming effects can be ported to use this function.
  • In the spirit of optimizing paintSimpleScreen, I also tried to cut down the total number of OpenGL calls. For example, KWin no longer emits a single OpenGL call if the damaged part of a window is fully occluded. To achieve this I used apitrace, which, by the way, really rocks.
  • Faster repaint support for move/resize events in combination with oxygen-transparent has finally landed. Before this patch, KWin always had to invalidate the blur texture cache of a window if it overlapped with the area of a moving window, even though the blurry window might have been below the moving one, which is usually the case. With several blurry windows stacked on top of each other, this meant that moving a window could slow down KWin considerably. At the moment I’m working on porting several window-transforming effects (e.g. WobblyWindows) to use this new and faster method.
That’s it for now :) For all the other cool new features in KWin, let me refer you to Martin’s blog.
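The stacking-order rule from the last point can be sketched in Python. Everything here is a simplified, hypothetical model rather than KWin code: Rect, needs_blur_invalidation and the window names are illustrative, windows are plain rectangles, the stack is a bottom-to-top list, and the blur of a window is assumed to depend only on what lies behind it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle covering [x, x+w) x [y, y+h)."""
    x: int
    y: int
    w: int
    h: int

    def intersects(self, other: "Rect") -> bool:
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


def needs_blur_invalidation(stack, blurry, blur_area, moving, old_geo, new_geo):
    """Decide whether the cached blurred background of `blurry` is stale.

    stack           -- window ids ordered bottom to top (hypothetical model)
    blur_area       -- screen area whose background `blurry` blurs
    old_geo/new_geo -- geometry of `moving` before and after the move
    """
    # The blur only samples what lies *behind* the blurry window, so a window
    # stacked above it cannot change the cached background.
    if stack.index(moving) > stack.index(blurry):
        return False
    # A window below only matters if it touched the blur area before or after the move.
    return blur_area.intersects(old_geo) or blur_area.intersects(new_geo)


stack = ["desktop", "konsole", "dolphin"]  # bottom to top
konsole_blur = Rect(0, 0, 400, 300)

# Dragging dolphin (stacked above konsole) across konsole keeps the cache valid.
above = needs_blur_invalidation(stack, "konsole", konsole_blur, "dolphin",
                                Rect(100, 100, 200, 200), Rect(120, 110, 200, 200))
```

In this model the area uncovered by the moving window still has to be repainted, but the cached blur of the window below it can be reused instead of being recomputed.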

WordPress and hphp: Part II

In my last post I described how to circumvent some issues when compiling WordPress 3.2.1 with Hiphop-Php. Unfortunately, it turned out that the compiled binary suffered from a memory leak, which took me quite some time to find and fix.

As it turned out, hphp has a regular expression cache that keeps every regular expression indefinitely, so the cache can only be cleared by shutting down the application. In principle this is not a problem for an application with a limited set of static regular expression patterns (which should be the case for most applications). But once the regex pattern becomes a runtime option, the cache fails. This seems to be because hphp compares cache entries by the hash of their regex pattern, and there is no guarantee that two equal, dynamically allocated pattern strings have the same hash. In the specific case of WordPress, you have the runtime option to specify the date format, which is mangled into a regex pattern somewhere inside the mysql2date function.

The obvious workaround is to limit the number of cache entries. The corresponding commit can be found in my hiphop-php branch; as its title says, it turns the PCRECache into a least recently used (LRU) cache. I strongly recommend that anyone running an hphp-compiled WordPress apply that patch. Feedback is, as always, welcome :)
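The patch itself is C++ inside hphp; what follows is only a Python sketch of the same least recently used idea. The class name LRUPatternCache and the capacity are hypothetical and not taken from hphp. This sketch keys entries by the pattern’s contents, but the capacity bound alone is what stops the cache from growing without limit.

```python
import re
from collections import OrderedDict


class LRUPatternCache:
    """Bounded cache of compiled regexes; evicts the least recently used entry."""

    def __init__(self, capacity=128):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries = OrderedDict()  # pattern string -> compiled pattern

    def get(self, pattern):
        # Keyed by the pattern's contents, so equal strings built at runtime
        # (e.g. from a user-configured date format) share one entry.
        compiled = self._entries.get(pattern)
        if compiled is not None:
            self._entries.move_to_end(pattern)  # mark as most recently used
            return compiled
        compiled = re.compile(pattern)
        self._entries[pattern] = compiled
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used
        return compiled

    def __contains__(self, pattern):
        return pattern in self._entries

    def __len__(self):
        return len(self._entries)


cache = LRUPatternCache(capacity=2)
cache.get(r"\d{4}")
cache.get(r"\d{2}")
cache.get(r"\d{4}")   # hit: now the most recently used entry
cache.get(r"[a-z]+")  # over capacity: evicts r"\d{2}"
```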

Compiling WordPress with Hiphop-Php

This is a project I started last weekend, and I just want to share some of the insights I gained, because compiling with Hiphop-Php (hphp) is not as straightforward as compiling an application with gcc or clang ;)

The first thing you realize when looking at the hiphop-php page on GitHub is that it has a long list of dependencies, which I wanted to reduce to a minimum. So I ended up forking hiphop-php and adjusting it to my needs: it should work with a minimal set of dependencies and it should be easy to deploy. At the moment, my list of dependencies not provided by CentOS 5 is down to libevent, curl, oniguruma and libmemcached. I had to sacrifice the ICU SpoofChecker, but as it isn’t used by WordPress this shouldn’t be a problem. Additionally, I’ve chosen to use the static library versions of these dependencies, because I compile this stuff in a separate virtual machine and I don’t want to mess with rpath issues.

Once you have a working hphp and try to compile WordPress 3.2.1, you will notice that the function SpellChecker::loopback won’t compile. Introducing a temporary variable fixes the issue:

$ret = func_get_args();
return $ret;

Now you can compile WordPress :) … but it won’t work :D Some of the SQL queries will fail, and the best workaround I could come up with is to set

$q['suppress_filters'] = true;

in query.php.

So was this all worth it? Given the current readership of this blog I wouldn’t say so, but it was quite fun :D According to apachebench, this blog can now serve 50 requests per second instead of 10.

Finally, some last remarks about hphp:

  • The approach described above generates huge binaries: a normal WordPress blog needs about 40-50 MB. The problem seems to be that some files, especially the dynamic_*.cpp ones, accumulate references to symbols in other files. This prevents the linker from stripping unneeded sections, because by default the compiler puts all functions of a source file into the same section. The compiler flags “-ffunction-sections” and “-fdata-sections”, combined with the linker flag “-Wl,--gc-sections”, can change this behavior, but so far I haven’t tried them.
  • Upstream hphp has some issues when the source files are not present at runtime; see this commit.
  • I personally don’t like having to execute cmake in the root directory of hphp :)

Optimizing KWin 4.8

Since KDE SC is currently in feature freeze, I thought it might be a good idea to blog about my contributions to KWin 4.8. As some of you may have noticed, it is also my first post on this blog and, in particular, on planetKDE :)

Many of my commits optimize the existing code base, so apart from the (hopefully) increased performance you should not see any changes. In other words: this will be a more technical blog post.

Occlusion Culling in KWin

It all started with Martin pointing out to me that KWin did not process XDamage events per window, as they are reported by the X server, but instead gathered all events and then updated the corresponding screen region. This could lead to the strange behavior that KWin was busy repainting the background again and again although your current virtual desktop was empty, just because your video player was running maximized on another virtual desktop. Clearly this is a waste of resources.
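The difference between gathering damage screen-wide and handling it per window can be sketched in Python. The model is hypothetical (DamageTracker, Window and ALL_DESKTOPS are illustrative names, and rectangles are plain (x, y, w, h) tuples), and for simplicity it assumes that switching desktops triggers a full repaint, so damage from windows on other desktops can simply be dropped.

```python
from collections import defaultdict
from dataclasses import dataclass

ALL_DESKTOPS = 0  # hypothetical marker for windows shown on every desktop


@dataclass(frozen=True)
class Window:
    wid: int
    desktop: int


class DamageTracker:
    """Keeps damage per window, as the X server reports it, instead of one screen region."""

    def __init__(self):
        self._pending = defaultdict(list)  # wid -> list of (x, y, w, h) rects

    def on_damage(self, wid, rect):
        self._pending[wid].append(rect)

    def collect_repaint(self, windows, current_desktop):
        """Return the rects that need a repaint now and forget all pending damage."""
        repaint = []
        for w in windows:
            if w.desktop in (ALL_DESKTOPS, current_desktop):
                repaint.extend(self._pending.get(w.wid, []))
        # Damage of windows on other desktops is dropped (assumes a desktop
        # switch repaints the whole screen anyway).
        self._pending.clear()
        return repaint


video = Window(wid=1, desktop=2)   # maximized video player on desktop 2
editor = Window(wid=2, desktop=1)  # the only window on the current desktop
tracker = DamageTracker()

tracker.on_damage(video.wid, (0, 0, 1920, 1080))
idle = tracker.collect_repaint([video, editor], current_desktop=1)  # → []

tracker.on_damage(editor.wid, (10, 10, 200, 50))
busy = tracker.collect_repaint([video, editor], current_desktop=1)
```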

So the solution was to process the events on a per-window basis, which required changing two of KWin’s main functions: paintGenericScreen and paintSimpleScreen. It is important to know that whenever the screen is repainted, one of these two functions is called, regardless of whether you use OpenGL or XRender for compositing. As a nice side effect, this means that the optimizations described here apply equally to the XRender backend.

  • paintGenericScreen is the general implementation, which simply draws the window stack bottom to top, doing the preprocessing and the rendering in the same pass. This has the advantage that it can draw every scene, at the cost of not being really optimized. Fullscreen effects in particular use this code path.
  • paintSimpleScreen is restricted to cases where no window is transformed by an effect. The actual rendering is done in three passes. The first one is the preprocessing pass, where all effects not only get informed about what will be painted but also have the opportunity to change this data (e.g. making the window transparent). The second pass then draws all the opaque windows top to bottom. Finally, the third pass paints all the remaining translucent parts bottom to top. The most crucial point here is proper clipping when splitting the rendering into these two passes.

Changing paintGenericScreen was straightforward (just accumulating the damage bottom to top), but changing paintSimpleScreen needed a bit more work because of the aforementioned clipping. More precisely, in the top-to-bottom pass one has to gather all the damaged translucent regions while cutting off all regions that have already been rendered. The last pass then only has to render the remaining damaged translucent area. In summary, KWin now implements a form of occlusion culling.
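The two rendering passes and the clipping between them can be sketched in Python. To keep it short and easy to check, regions are modelled as sets of pixel cells rather than real region types, the preprocessing pass (where effects adjust the data) is left out, and all names (Win, rect, screen_damage, paint_simple_screen) are hypothetical. A window’s opaque set stands for the whole window if it is opaque, or for the part announced via _NET_WM_OPAQUE_REGION if it is translucent. As a first step, per-window damage that lies behind an opaque window above is discarded, so fully occluded damage produces no draw calls at all.

```python
from dataclasses import dataclass


def rect(x, y, w, h):
    """Toy region: the set of pixel cells covered by an x/y/width/height rectangle."""
    return frozenset((i, j) for i in range(x, x + w) for j in range(y, y + h))


@dataclass(frozen=True)
class Win:
    name: str
    region: frozenset  # cells covered by the window
    opaque: frozenset  # cells known to be opaque (subset of region)


def screen_damage(windows, window_damage):
    """Union of per-window damage, minus what opaque windows above hide."""
    covered, result = set(), set()
    for w in reversed(windows):  # top to bottom
        result |= set(window_damage.get(w.name, ())) - covered
        covered |= w.opaque
    return result


def paint_simple_screen(windows, damage):
    """windows: stacking order bottom to top; damage: set of damaged cells.

    Returns (opaque_draws, translucent_draws), each a list of (name, cells)
    in the order the draws would be issued.
    """
    remaining = set(damage)  # damaged cells not yet hidden by an opaque paint
    opaque_draws, translucent = [], []
    for w in reversed(windows):  # pass 2: top to bottom
        visible = w.region & remaining
        if not visible:
            continue  # undamaged or fully occluded: no draw call at all
        solid = visible & w.opaque
        see_through = visible - w.opaque
        if solid:
            opaque_draws.append((w.name, solid))
            remaining -= solid  # windows below cannot show through here
        if see_through:
            translucent.append((w.name, see_through))
    translucent.reverse()  # pass 3: translucent parts bottom to top
    return opaque_draws, translucent


desktop = Win("desktop", rect(0, 0, 4, 4), rect(0, 0, 4, 4))
video = Win("video", rect(0, 0, 2, 2), rect(0, 0, 2, 2))
editor = Win("editor", rect(0, 0, 4, 4), rect(0, 0, 4, 4))
# A video playing behind an opaque editor: nothing is drawn at all.
stack_a = [desktop, video, editor]
a_opaque, a_trans = paint_simple_screen(stack_a, screen_damage(stack_a, {"video": video.region}))

# A translucent terminal whose only opaque part is its title row.
term = Win("term", rect(0, 0, 4, 4), rect(0, 0, 4, 1))
stack_b = [desktop, term]
b_opaque, b_trans = paint_simple_screen(stack_b, screen_damage(stack_b, {"desktop": desktop.region}))
```

In the first scenario both lists stay empty. In the second, the damaged desktop under the terminal is painted in the opaque pass and the terminal is blended over it in the translucent pass, while the terminal’s opaque title row is skipped because nothing visible under it changed.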

Blur effect

Nearly everybody who asks in a KDE-related chat why KWin is performing poorly is advised to deactivate the blur effect. The good news is that this should no longer be necessary in KWin 4.8 :)

The main reason for the poor performance was that the blur effect requires the windows to be painted bottom to top and was therefore limited to the unoptimized paintGenericScreen. So in KWin 4.7 not a single frame is painted with paintSimpleScreen if the blur effect is used. Hence my first objective was to port the blur effect to paintSimpleScreen. Fortunately, KWin allows effects to change not only the painting region in the preprocessing pass but also the clipping area. This way the blur effect can now mimic the paintGenericScreen behavior and control which regions of the screen get painted bottom to top.

But just porting to paintSimpleScreen was still not satisfactory, mainly because the blur effect still suffered from something I would call an avalanche effect: once the blurry region was damaged, we had to repaint the whole region, so a small damaged region could lead to a big repaint (e.g. a damaged system tray icon forcing KWin to repaint the entire system tray). KWin now avoids this by buffering the blurred background in a texture, which then gets updated partially.
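Why a partial update is enough can be shown with a one-dimensional analogue in Python. This is a sketch, not KWin’s shader-based blur: a box blur of radius r over a row of pixel values, with hypothetical names (blur_at, box_blur, BlurCache). Because each blurred pixel depends only on source pixels at most r away, a damaged source span [start, end) only invalidates the blurred pixels in [start - r, end + r), and everything else can be reused from the cache.

```python
def blur_at(src, i, radius):
    """Box blur of pixel i: mean of src[i - radius .. i + radius], clamped to the row."""
    lo = max(0, i - radius)
    hi = min(len(src), i + radius + 1)
    window = src[lo:hi]
    return sum(window) / len(window)


def box_blur(src, radius):
    """Full (uncached) blur of the whole row."""
    return [blur_at(src, i, radius) for i in range(len(src))]


class BlurCache:
    """Keeps the blurred row around and refreshes only what a damage event touches."""

    def __init__(self, src, radius):
        self.src = list(src)
        self.radius = radius
        self.blurred = box_blur(self.src, radius)
        self.last_recomputed = 0  # blurred pixels refreshed by the last update

    def damage(self, start, end, new_values):
        """Replace src[start:end] and update just the affected blurred pixels."""
        if len(new_values) != end - start:
            raise ValueError("new_values must cover exactly src[start:end]")
        self.src[start:end] = new_values
        # Blurred pixel i reads src[i - r .. i + r], so only i in [start - r, end + r) changes.
        lo = max(0, start - self.radius)
        hi = min(len(self.src), end + self.radius)
        for i in range(lo, hi):
            self.blurred[i] = blur_at(self.src, i, self.radius)
        self.last_recomputed = hi - lo


row = list(range(20))
cache = BlurCache(row, radius=2)
cache.damage(5, 7, [100, 100])  # e.g. a tray icon changed
```

In this sketch the cost of an update scales with the damaged span plus the blur radius (here 6 of 20 pixels) instead of with the whole blurred region, which is exactly what breaks the avalanche.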

That’s it. :) Last but not least I want to thank Martin Gräßlin, Fredrik Höglund and Thomas Lübking for fruitful discussions and especially for taking the time to review all these changes.

Just another random blog …..

As the blog name suggests, this blog is about some random thoughts of mine. Given that the entropy of these thoughts wouldn’t justify /dev/random, I found it appropriate to call it “/dev/urandom thoughts”.

Those hoping for blog posts about random numbers, encryption or information theory will probably be disappointed, because so far I’m not planning any posts on these topics. The main topics I had in mind while setting up this blog are:

  • KDE-related thoughts, especially about KWin
  • Maybe some physics
  • All the rest I haven’t thought about yet.

I hope you enjoy reading :)

Regards,

Philipp