refactoring and your runtime’s performance optimizations

Asked “What is Refactoring?”, Martin Fowler answers: “Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior.”

Cyrus Najmabadi (a developer at Microsoft) writes on his blog, Cyrus’ Blather: “… if you perform an ‘extract method’ of some code that sits in a tight loop, and that the jit doesn’t inline, then you’ll have changed the external behavior with respect to its performance. However, there’s no good way to detect this, and most people don’t care. Similarly, if your code makes decisions based on runtime type information (i.e. it uses reflection), then it’s possible to change its external behavior because suddenly this alteration of internal structure becomes visible. However, customers have spoken and have stated that they’re fine with these caveats and so we accept them.” He then goes into the issue of what tools should do if the code being refactored is broken.

I disagree: I think developers do care. It’s just that there *are* no tools that provide such information, although it is true that there is “no good way to detect” refactorings that could potentially affect performance. The major problem lies with the runtime providers themselves (CLR, Hotspot, J9, …): they just don’t publish the heuristics used in subsystems like JIT compilers. Hotspot and, to some extent, JRockit expose some of their internal metrics through interfaces like JMX and tools like jvmstat/visualgc, but it’s all just runtime information (for now?), which makes it hard to infer things like compiler heuristics.

What’s clear is that one should watch out for performance when refactoring, as code structure and size matter (e.g. a PermGen size increase can cause Full GCs in Hotspot, the invocation threshold for JIT compilation is a tunable in some runtimes, each JIT has its own inlining heuristics, etc.), especially now that concepts like Refactoring to Patterns are gaining momentum. What’s known is that, if not wisely applied, patterns increase code complexity. And code complexity, as we all know, is second cousin to poor performance.
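To make the reflection caveat from Cyrus’ quote concrete, here is a minimal sketch. The class and method names are hypothetical and do not come from his post. The two classes compute the same result, but an “extract method” adds a declared method that any reflection-based code can see.

```java
import java.lang.reflect.Method;

// Hypothetical classes for illustration only.
class RefactoringVisibility {

    // Before the refactoring: all the work happens inside one method.
    static class Before {
        int sumOfSquares(int[] values) {
            int total = 0;
            for (int v : values) {
                total += v * v;
            }
            return total;
        }
    }

    // After "extract method": same results, one more declared method.
    static class After {
        int sumOfSquares(int[] values) {
            int total = 0;
            for (int v : values) {
                total += square(v); // whether this gets inlined is up to the JIT
            }
            return total;
        }

        private int square(int v) {
            return v * v;
        }
    }

    // Counts the methods declared directly on the class, skipping compiler-generated ones.
    static int declaredMethodCount(Class<?> type) {
        int count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (!m.isSynthetic()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        // Same external result: prints true
        System.out.println(new Before().sumOfSquares(data) == new After().sumOfSquares(data));
        // Different shape under reflection: prints 1 vs 2
        System.out.println(declaredMethodCount(Before.class) + " vs " + declaredMethodCount(After.class));
    }
}
```

The performance half of the caveat is harder to show in a few lines, since whether square() is inlined depends on the heuristics of the particular JIT.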

(edited) JavaOne talk on performance myths

Notes from the Azul Systems talk (presented by Cliff Click).

Java Performance Myths.

[In response to the comment about the original notes being a little too scant, I’ve decided to edit this entry. Sun will provide audio & slides, which I will link to once available.]

. I heard (or googled) that making fields or methods final (or private) will help performance (or that it will allow more inlining).
…Wrong. With or without ‘final’, every inlinable method is inlined by the runtime compilers.

. I heard (or googled) that try/catch blocks are free (or very expensive).
…The reality is: it depends. In general, try to avoid try/catch blocks in tight loops. Don’t use exceptions to end loops, or for null checking on, say, list traversal. You’d be defeating JIT optimizations and duplicating the automatic range check.
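As an illustration of the “don’t use exceptions to end loops” advice, here is a small sketch with hypothetical method names of my own. Both methods return the same sum. The first relies on the array range check throwing at the end, while the second states the bound explicitly, which is the shape the notes recommend.

```java
class LoopTermination {

    // Anti-pattern: lets ArrayIndexOutOfBoundsException terminate the loop.
    static long sumUntilException(int[] values) {
        long total = 0;
        try {
            for (int i = 0; ; i++) {
                total += values[i];
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // Running off the end of the array is used as the exit condition.
        }
        return total;
    }

    // Preferred: an explicit bound, no exception on the normal path.
    static long sumWithBound(int[] values) {
        long total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumUntilException(data)); // prints 6
        System.out.println(sumWithBound(data));      // prints 6
    }
}
```

How much slower the first version is will vary by VM, so treat this as a shape to avoid and measure it yourself.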

. I heard (or googled) that using RTTI is better than instanceof (and/or better than a v-call (virtual call)).
…RTTI (Runtime Type Information; google for samples) is an ugly hack from C++. Don’t do it unless you need to squeeze out the last 10% of performance. RTTI wins, but it’s too ugly (in the OO sense). Use a v-call if you can.
…A v-call is more expensive in Hotspot than in other VMs; Hotspot implements better subtype checking (efficient switch).
…The bottom line is that runtime compilers are optimized for the common coding patterns out there, so stick to clean design/coding and use OO principles.

v-call:
    v_call(); // dynamic dispatch here
instanceof:
    if (this instanceof Child1)
        ((Child1) this).non_v_call();
RTTI:
    switch (_rtti) { case 1: // hand-inline Child1 specific ...
. Should I avoid synchronization at all costs?
…The average system today spends 55-110 ns on uncontended lock/unlock operations, so it’s not free, but it’s not terribly expensive either.
…Hotspot’s synchronization operations on Xeons are apparently slow (~275 cycles); IBM/BEA/Azul are much better at it.
…In light-contention situations, BEA outperforms the other VMs; Sun performs poorly.
…In general, synchronization is better than bugs (especially now that we’re in the multi-core era)! So be aware of the costs (and profile if you can) and try to use the new concurrency API in 1.5, but don’t avoid synchronization, as threading bugs like race conditions are notoriously hard to fix. Think more about your algorithm.
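To illustrate the “synchronization is better than bugs” point, here is a sketch of a shared counter done two ways: with a plain synchronized method, and with AtomicLong from java.util.concurrent.atomic (part of the 1.5 concurrency API). The class and helper names are hypothetical, and the lambdas assume a newer Java than the 1.5 discussed in the talk. Both variants always produce the exact total, which an unsynchronized counter would not guarantee.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class CounterDemo {

    // Classic approach: every access goes through the object's monitor.
    static class SynchronizedCounter {
        private long value;

        synchronized void increment() { value++; }

        synchronized long get() { return value; }
    }

    // Runs 'step' perThread times on each of 'threads' worker threads and waits for them.
    static void runInParallel(int threads, int perThread, Runnable step) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    step.run();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    static long countWithSynchronized(int threads, int perThread) {
        SynchronizedCounter counter = new SynchronizedCounter();
        runInParallel(threads, perThread, counter::increment);
        return counter.get();
    }

    static long countWithAtomic(int threads, int perThread) {
        AtomicLong counter = new AtomicLong();
        runInParallel(threads, perThread, counter::incrementAndGet);
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithSynchronized(4, 10000)); // prints 40000
        System.out.println(countWithAtomic(4, 10000));       // prints 40000
    }
}
```

Which of the two is cheaper under contention depends on the VM and the hardware, so profile before picking one for a hot path.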

. I heard (or googled) that I should use object pools to, say, help the garbage collector. Should I reuse objects, or create new ones and assume that the GC is smart enough to take care of the cleanup efficiently?
…It depends on the cost of initializing the object and on the turnover rate. Don’t do it for small objects, but it may be a win for large ones or those with a heavy initialization cost (JPanel?). As always, profile: use JConsole or VisualGC.
…Don’t pool objects like Hashtables.
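For the cases where pooling might pay off, a minimal pool could look like the sketch below. Everything in it is an assumption for illustration: SimplePool and ExpensiveBuffer are hypothetical names, and the 1 MiB buffer is just a stand-in for an object with a heavy initialization cost.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

class PoolDemo {

    // Hypothetical stand-in for an object that is costly to construct.
    static class ExpensiveBuffer {
        final byte[] data = new byte[1 << 20]; // 1 MiB allocated up front
    }

    // A deliberately small pool: hands out idle instances and creates new ones on demand.
    static class SimplePool<T> {
        private final ArrayDeque<T> idle = new ArrayDeque<>();
        private final Supplier<T> factory;
        private int created;

        SimplePool(Supplier<T> factory) {
            this.factory = factory;
        }

        synchronized T acquire() {
            T item = idle.pollFirst(); // null when no idle instance is available
            if (item == null) {
                item = factory.get();
                created++;
            }
            return item;
        }

        synchronized void release(T item) {
            idle.addFirst(item); // the caller must not use 'item' after releasing it
        }

        synchronized int createdCount() {
            return created;
        }
    }

    public static void main(String[] args) {
        SimplePool<ExpensiveBuffer> pool = new SimplePool<>(ExpensiveBuffer::new);
        ExpensiveBuffer first = pool.acquire();
        pool.release(first);
        ExpensiveBuffer second = pool.acquire();
        System.out.println(first == second);     // prints true: the instance was reused
        System.out.println(pool.createdCount()); // prints 1
    }
}
```

Note that every acquire and release takes a lock, and a released object has to be reset by hand. That overhead is part of why pooling small, cheap objects tends not to be worth it.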

. How much of a performance impact do the 5.0 features have on Java code?
…The foreach construct and autoboxing are FREE! (no additional cost incurred). They’re syntactic sugar.
…Note that enums on Xeon with Sun’s Hotspot have issues when iterating over them (more on this in a later post).
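One way to read “syntactic sugar” is that the compiler expands these constructs into code you could have written by hand. The sketch below (hypothetical method names) puts the sugared form next to roughly what it expands to, so any cost is that of the expanded code rather than of the syntax.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SugarDemo {

    // Java 5 style: for-each loop plus auto-unboxing of each Integer.
    static int sumWithSugar(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Roughly the hand-written equivalent: explicit Iterator and intValue().
    static int sumDesugared(List<Integer> values) {
        int total = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            Integer boxed = it.next();
            total += boxed.intValue();
        }
        return total;
    }

    // Autoboxing: the int arguments are converted to Integer implicitly.
    static List<Integer> boxWithSugar(int a, int b) {
        return Arrays.asList(a, b);
    }

    // The same conversion written out with Integer.valueOf.
    static List<Integer> boxDesugared(int a, int b) {
        return Arrays.asList(Integer.valueOf(a), Integer.valueOf(b));
    }

    public static void main(String[] args) {
        List<Integer> values = boxWithSugar(20, 22);
        System.out.println(sumWithSugar(values));                     // prints 42
        System.out.println(sumDesugared(values));                     // prints 42
        System.out.println(values.equals(boxDesugared(20, 22)));      // prints true
    }
}
```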

. Other predictions/advice
…Pause times are going down and GCs are getting more efficient. Concurrent GC will be the default in most VMs in the coming years, so if possible use it now!
…Escape analysis (an optimization that can improve the allocation and reclamation of objects) is being integrated into VMs: one less reason to pool objects.
…Locking is getting cheaper, but multi-cores will make it more expensive: a CAS instruction on x86 takes ~200+ cycles.
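For readers who haven’t met CAS (compare-and-swap) directly: in Java it surfaces as compareAndSet on the java.util.concurrent.atomic classes. The sketch below, with a hypothetical CasMax class of my own, shows the usual retry loop: read the current value, compute the new one, and retry if another thread got in first.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example: tracks the largest value offered so far without using a lock.
class CasMax {

    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    // Returns true if 'candidate' became the new maximum.
    boolean offer(int candidate) {
        while (true) {
            int current = max.get();
            if (candidate <= current) {
                return false; // nothing to update
            }
            if (max.compareAndSet(current, candidate)) {
                return true; // the CAS succeeded: nobody changed 'max' in between
            }
            // Another thread updated 'max' first; loop and re-read.
        }
    }

    int get() {
        return max.get();
    }

    public static void main(String[] args) {
        CasMax tracker = new CasMax();
        System.out.println(tracker.offer(5)); // prints true
        System.out.println(tracker.offer(3)); // prints false
        System.out.println(tracker.offer(9)); // prints true
        System.out.println(tracker.get());    // prints 9
    }
}
```

Each successful update costs at least one CAS, and contended updates cost more because of the retries, so the cycle count quoted in the talk applies per attempt.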
