The KARL project has spent much of the last year on some performance and scalability issues. It’s a reasonably big database, ZODB-atop-RelStorage-atop-PostgreSQL. It’s also heavily security-centric with a decent write load, so CDNs and other page caching weren’t going to help.
I personally re-learned the ZODB lesson that the objects needed for your view had better be in memory. Our hosting provider is 32-bit-only, which made us learn a little more and think about the tradeoffs. Adding more threads means higher memory usage per process. We get some concurrency as PG requests release the GIL, but that’s hard to bank on. The alternative is a bunch of single-connection processes, but then you get little cache affinity. One bad view pulls in 20k objects, the user hits reload a few times, and your site is hung until PG is finished.
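To make the memory math concrete, here is a minimal sketch (not KARL’s actual configuration: the pool size, cache size, and per-object estimate are invented for illustration) of how the ZODB connection pool and the per-connection object cache multiply on a 32-bit box:

```python
# A minimal sketch, not KARL's real numbers: how per-connection caches add up.
import ZODB
from ZODB.DemoStorage import DemoStorage

storage = DemoStorage()      # stand-in for the real RelStorage/PostgreSQL storage
db = ZODB.DB(
    storage,
    pool_size=2,             # connections (roughly, worker threads) per process
    cache_size=20000,        # ZODB object-cache entries *per connection*
)

# Every connection keeps its own object cache, so the footprint grows with
# pool_size * cache_size * average object size. At a (made-up) 2 KB average,
# two connections already hold on the order of 80 MB of objects.
avg_object_bytes = 2048
print("rough cache footprint: %d MB"
      % (2 * 20000 * avg_object_bytes // (1024 * 1024)))
```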
We use memcache as well, but on a 32-bit host that is also limited to 2 GB, so we will likely need to spread the cache across a few processes.
We decided to time various scenarios with the hardware and dataset we had: a completely cold PG server (nothing in the OS read buffers), a warm PG, the pickles in memcache, the pickles in the RelStorage local cache, and the objects in the ZODB connection cache. Here is what Shane found:
- 179 seconds when PG has to fetch a lot of the data from disk.
- 21 seconds when PG has all the data in its own cache.
- 6 seconds when memcache has all the pickles.
- 2 seconds when the local client cache has all the pickles.
- 1.2 seconds when the ZODB cache is filled.
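For a feel of how the warm end of that spread can be measured (a sketch only: render_view is a hypothetical stand-in for the real KARL view, and Shane’s actual harness is not shown), it is enough to time the same traversal with the ZODB connection cache dropped and then kept:

```python
# Illustrative only: time the same view work with a cold vs. warm connection cache.
import time
import transaction

def timed(label, conn, render_view, cold=False):
    if cold:
        conn.cacheMinimize()            # drop the in-memory object cache first
    start = time.time()
    render_view(conn.root())            # touches whatever objects the view needs
    transaction.abort()                 # read-only measurement, discard everything
    print("%s: %.1f seconds" % (label, time.time() - start))

def compare(db, render_view):
    conn = db.open()
    try:
        timed("cold connection cache", conn, render_view, cold=True)
        timed("warm connection cache", conn, render_view)
    finally:
        conn.close()
```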
We had previously thought that unpickling was a bottleneck. It wasn’t. We then did some research on the compression in the Python standard library at its lowest compression levels. It turns out we can get decent compression at very high speed. So we took a look at RelStorage’s oft-overlooked “local cache”, an in-memory, process-wide pickle cache. By default its size is set to a low number.
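Here is the flavor of that experiment (a sketch, not the actual benchmark: the payload is a made-up stand-in for a ZODB pickle and the loop count is arbitrary), comparing zlib’s fastest level against the slower ones:

```python
# Illustrative only: zlib at level 1 trades a little ratio for a lot of speed.
import time
import zlib

# Stand-in payload; real ZODB pickles are similarly repetitive and compress well.
sample = (b"catalog entry " * 400) + (b"some attribute values " * 200)

for level in (1, 6, 9):
    start = time.time()
    for _ in range(10000):
        compressed = zlib.compress(sample, level)
    elapsed = time.time() - start
    print("level %d: %5.2fs for 10,000 rounds, %d -> %d bytes"
          % (level, elapsed, len(sample), len(compressed)))
```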
The local cache had an enticing aspect: the code was tinkerable, since it was under Shane’s control. What if we could play with some ideas? For example, only cache objects under a certain size: one big PDF might take up the space of thousands of (far more important) catalog objects. RelStorage gained a knob for setting a size threshold. Compression, though, is the really interesting part: with it we can fit many more objects in the cache without paying much of a price.
Both of these (the size limit and the compression level) are now in RelStorage. It will take a while to work out the right combination of ZODB connections vs. client cache vs. local cache, and which numbers to turn up or down before hitting the 2 GB range, but we’ve already had a big impact on performance.
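For reference, the knobs look roughly like this when RelStorage is wired up from Python (a hedged sketch: the DSN and the values are invented, and the exact option names and defaults should be checked against the RelStorage documentation for your version):

```python
# Hypothetical wiring; the values are examples, not recommendations.
from relstorage.adapters.postgresql import PostgreSQLAdapter
from relstorage.options import Options
from relstorage.storage import RelStorage
from ZODB import DB

options = Options(
    cache_servers='127.0.0.1:11211',     # memcache, if any
    cache_local_mb=300,                  # size of the in-process pickle cache
    cache_local_object_max=16384,        # skip pickles bigger than this (bytes)
    cache_local_compression='zlib',      # compress what does get cached
)
adapter = PostgreSQLAdapter(dsn="dbname='karl'", options=options)
storage = RelStorage(adapter, options=options)
db = DB(storage, cache_size=20000)       # per-connection ZODB object cache
```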
And a plug for some other work that the KARL project funded Shane for: packing improvements that went into b2 a couple of days ago, plus perfmetrics decorators that let you spew all kinds of ZODB-oriented stats to graphite/DataDog.
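In practice the perfmetrics side looks roughly like this (a sketch: the statsd URI and the function names are placeholders), with a decorator per hot code path feeding timings and call counts to a statsd listener that Graphite or Datadog can graph:

```python
# Illustrative only: decorate hot code paths and point perfmetrics at statsd.
from perfmetrics import metric, metricmethod, set_statsd_client

# Send stats to a local statsd daemon; Graphite/Datadog read from there.
set_statsd_client('statsd://localhost:8125')

@metric
def rebuild_catalog_index(index):
    # timings and call counts show up under this function's name
    pass

class CommunityFolder(object):
    @metricmethod
    def reindex(self):
        # method timings are namespaced by class
        pass
```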
