Cassandra 1.2 moves internals off-heap

Performance improvements in Cassandra 1.2 | DataStax

Disk capacities have been increasing. RAM capacities have been increasingly roughly in step. But the JVM’s ability to manage a large heap has not kept pace. So as Cassandra clusters deploy more and more data per node, we’ve been moving storage engine internal structures off-heap, managing them manually in native memory instead. 1.2 moves the two biggest remaining culprits off-heap: compression metadata and per-row bloom filters. Compression metadata takes about 20GB of memory per TB of compressed data. Moving this into native memory is especially important now that compression is enabled by default. Bloom filters help Cassandra avoid scanning data files that can’t possibly include the rows being queried. They weigh in at 1-2GB per billion rows, depending on how aggressively they are tuned. Both of these use the existing sstable reference counting with minor tweaking to free native resources when the sstable they are associated with is compacted away.

Advertisements