Fast Infoset

Fast Infoset (FI) is an open, standards-based binary format for the efficient interchange of XML that is based on the XML Information Set (Infoset). In general, Fast Infoset can be used when it is necessary to retain the XML property of self-description (or the structure), and yet boost parsing speed and reduce document size.

Fast Infoset and the Pragmatic SOA Approach


IBM (quietly) releases their JDK5.0 (beta)

{disclaimer: I don’t work for IBM}

IBM has done it again. Yes, they’ve (quietly) released a beta of their JDK5.0 with a brand new version of the famed J9 VM: 2.3, to be exact. (Note: J9 VMs have been around since 1.4.1/2, but IBM has been timid about it; e.g. it wasn’t on by default.) [As of this writing only Linux (AMD64, x86, PowerPC) and AIX betas are available for download.]

J9 (according to this presentation) is a Sun IP-free and “Java Powered” compliant JVM (Harmony anyone?). It’s apparently the work of their Ottawa (OTI) and Toronto labs.

Someone needs to give this release a much more thorough review (arstechnica?) but at first glance it appears to have what it takes to give the competition a run for its money.

Googling around (since there’s not much more than the SDK guide as far as docs go for this release), we found the presentation “IBM Java Technology” (delivered 2003?) by Kevin Stoodley, IBM Fellow and CTO in the compiler division. It is mostly about the Testarossa compiler, which ships as part of J9. I don’t want to rehash what’s on his slides, but the claim is that it uses every trick in the book for optimizing compilers (including those in Sun’s Hotspot, like feedback-based dynamic recompilation) along with what IBM has learned about writing compilers over the years.

J9 seems to have support for some sort of “Multi-VM” deployment. The term has been used and misused in all corners, so it’s not entirely clear what they mean by “Multi-VM” (comments welcome!), but according to their “SDK guide” J9 does support “shar[ing] bootstrap and application classes between VMs, by storing them in a cache in shared memory”. Class sharing has been available in Sun’s Hotspot since 1.5. The difference is that Hotspot only supports a “set of classes from the system jar file”.

As far as GC algorithms go, this release of J9 supports 3 different strategies, plus a “subpool” option that isn’t covered here (-Xgcpolicy:[optthruput|optavgpause|gencon|subpool]).

  • “optthruput” appears to be the classic mark-sweep
  • “optavgpause” “… reduces the time that is spent in these garbage collection pauses” (aka Concurrent GC)
  • the new kid on the block, “gencon”, which essentially “combine[s] use of concurrent and generational GC to help minimize the time that is spent in any garbage collection pause”.

According to their support folks, “gencon” not only supports “concurrent mark for the Tenured (old) generation” but also parallel (mark-compact) collection of the entire heap during “global collections” (i.e. full GCs). This last one, as far as collectors go, is what puts J9 in the same league as its competition. “gencon” has fine-grained heap-sizing control with no fewer than 15 different options (see SDK guide). [The debate on whether to expose so many control options continues… ]
Sun’s Hotspot supports all three types of collectors, except that, in the current release of Hotspot (5.0), only the “nursery” is collected in parallel.
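One way to see which collectors a JVM actually runs under a given policy is to ask the standard java.lang.management API (new in Java 5). The following is only a minimal sketch: the class and method names (GcReport, describeCollectors, churn) are made up for illustration, and the collector names it prints are whatever the JVM vendor chooses to report.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of any IBM or Sun distribution.
class GcReport {

    /** Builds one summary line per collector the running JVM exposes. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time may be -1 if the JVM does not track them.
            lines.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    /** Allocates short-lived 64 KB blocks so the collectors have work to do. */
    static long churn(int rounds) {
        long checksum = 0;
        for (int i = 0; i < rounds; i++) {
            byte[] block = new byte[64 * 1024];
            block[0] = (byte) i;
            checksum += block[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        churn(20000);
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Running it once per policy (for example `java -Xgcpolicy:gencon GcReport`, then again with `optthruput`) should give a rough feel for how the collection counts and times shift, though a toy allocation loop like this is no substitute for a real benchmark.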

J9, according to Stoodley, supports “escape analysis”, which these days seems to be the hot topic in Java discussion forums. He covers the topic in some detail on his slides, but there are plenty of papers out there describing various approaches. Escape analysis is essentially a technique for finding allocations in a method that cannot escape to other threads or to the method’s caller, and optimizing them away (e.g. allocating them on the stack instead of the heap). It’s also used to eliminate superfluous synchronization.
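To make the escaping vs. non-escaping distinction concrete, here is a small sketch. All names (EscapeDemo, Point, and the methods) are invented for illustration, and the comments describe what an escape-analysing JIT could do in principle, not what J9 or Hotspot is guaranteed to do with this exact code.

```java
// Hypothetical demo class; the optimizations named in comments are
// candidates only, and the JIT is free to skip them.
class EscapeDemo {

    /** Small value holder used only to create allocations. */
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    /** The Point never leaves this method: a candidate for stack allocation. */
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            Point p = new Point(i, i);   // not stored, not returned, not passed on
            total += (long) p.x * p.y;
        }
        return total;
    }

    /** The Point is returned, so it escapes to the caller. */
    static Point escapesToCaller(int v) {
        return new Point(v, v);
    }

    /** A static field is reachable from other threads. */
    static Point shared;

    /** The Point is published through a static field, so it escapes to other threads. */
    static void escapesToOtherThreads(int v) {
        shared = new Point(v, v);
    }

    /** The lock object is local, so no other thread can ever contend for it. */
    static int localLock(int v) {
        Object lock = new Object();
        synchronized (lock) {            // candidate for synchronization removal
            return v + 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(3));
        System.out.println(escapesToCaller(2).x);
        escapesToOtherThreads(5);
        System.out.println(shared.y);
        System.out.println(localLock(41));
    }
}
```

The results are the same whether or not the optimizations kick in; the only observable difference would be in allocation rate and lock overhead, which is exactly why it takes a profiler or GC logs to tell if the JIT did anything.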

What we’ll need to see is the Eclipse Foundation’s performance benchmark numbers on this JDK (I’ve already tried running Eclipse 3.1.1 on it and, although startup was a little sluggish, overall responsiveness was great). We’ll also need server-class benchmark data to see how it measures up to the competition.

[Update – Oct 12]
IBM says this release has “a number of enhancements to GC”. Apparently, the “gencon policy is the only one that employs a Generational structure to the heaps”.

They’ve also fixed some “issues with fragmentation” and improved the Large Object Area (LOA) algorithm and configuration. LOA is an “allocation mechanism designed to optimize the management of large objects by separating them from small ones”.

Chris Bailey, their forum support contact, says “if there’s enough interest, I’ll try and organise a series of articles that detail the enhancements that have been put into the Java 5.0 Runtime”. Yes, yes and yes! We want details.