Gosling leaves Oracle… joins Google

“As to why I left, it’s difficult to answer: just about anything I could say that would be accurate and honest would do more harm than good.”

He soldiered on till the end… then time came to turn off the lights and call it a night… Keep on truckin’, James!

transactional malloc()/free() in Intel’s STM Prototype v3.0?

Why can’t I see a transactional malloc()/free()? – Intel® Software Network

Intel STM Prototype v3.0 does provide transactional malloc() and free(). Any time malloc and free are used in a __tm_atomic region, or in a function marked with the tm_callable attribute, they are replaced with transaction-safe versions. For functions marked with the tm_safe attribute, the compiler assumes the user guarantees the safety of that function and does not replace malloc and free; in that case the user must supply their own transaction-safe malloc and free.
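A minimal sketch of the three cases the answer describes, assuming Intel's STM prototype compiler (the `__tm_atomic` keyword and attribute spellings follow its Linux syntax and are not portable C; a standard compiler will not accept this):

```c
#include <stdlib.h>

/* tm_callable: the compiler instruments the body, so malloc/free
   inside are replaced with transaction-safe versions automatically. */
__attribute__((tm_callable))
void *make_node(size_t n) {
    return malloc(n);            /* rewritten to transactional malloc */
}

/* tm_safe: the compiler trusts this function as-is; malloc here is
   NOT replaced, so the author must call a transaction-safe allocator. */
__attribute__((tm_safe))
void *make_node_manual(size_t n) {
    return malloc(n);            /* not rewritten -- unsafe unless this
                                    malloc is itself transaction-safe */
}

void example(void) {
    __tm_atomic {                /* malloc/free in this region are also
                                    replaced with transactional versions */
        void *p = malloc(32);
        free(p);
    }
}
```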

Oracle says “BEA JRockit JVM will not be available stand-alone… but will continue to be (ed. was it ever?) bundled with Oracle products.”

FAQ

Will Oracle continue to invest in JRockit technology?
Absolutely! JRockit is a strategic product for Oracle and its customers. Oracle will continue to invest in it significantly. Oracle will continue to pay attention to legacy-BEA and Oracle customers' needs, and customers should expect a strong roadmap for the JRockit JVM and all JRockit products.

Going forward, is JRockit going to be free or a for-charge product?
Consistent with Oracle policies, JRockit Mission Control and JRockit Real Time, which include the JRockit JVM and access to significant value-adds such as operations diagnostics and real-time features, are available free for development and evaluation. The JRockit JVM is also bundled with many other commercial products from Oracle.

… what is so hard about scaling the Twitter service?

Hueniverse: Scaling a Microblogging Service – Part I

The social web is creating demand for new scaling tools and technologies. Current databases and caching solutions are simply unable to handle a complex network of multiple relationships between objects. While databases are still a good solution for persistent storage of social data, each retrieval requires heavy calculation.

always try things yourself and profile

int64.org » Scalability isn’t everything

… application needed a queue of small objects, and on a modern quad-core CPU the cache misses were hurting performance so much that although a lock-free queue did have near 100% scalability, the overall operation was completing 165% faster with a locked queue with zero scalability.

The next best thing is to combine the best of both worlds: design a queue with low overhead and medium scalability. Using a reader-writer lock with a combination of lock-free operations, I came up with a queue that only needs to do a full lock once every 32 or 64 operations. The result? Scalability 5% lower than a lock-free queue, with overall performance 210% better.

OK, I’ll admit it: I cheated, somewhat. Lock-free algorithms are good for more than just scalability. They also offer immunity to nasty effects like deadlock, livelock, and priority inversion. In my case I wasn’t in a situation to worry about these, but you might be. The lesson here is to know your situation and decide carefully, and don’t trust what others tell you: always try things yourself and profile.
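The hybrid design the post describes (a lock-free fast path, with the full lock amortized over a block of 32 or 64 operations) can be sketched roughly as follows. This is a simplified, hypothetical single-producer flow, not the post's actual code; the real design used a reader-writer lock and would need a per-slot publication handshake for multiple producers, both of which this elides:

```c
#include <stdatomic.h>

#define BLOCK 64   /* the full lock is taken once per BLOCK pushes */

/* Hypothetical hybrid queue: slots in the current block are claimed
   lock-free with fetch_add; only the push that fills the last slot
   takes the exclusive lock, hands the block downstream, and reopens it. */
typedef struct {
    int buf[BLOCK];
    atomic_int idx;          /* next free slot in the current block */
    atomic_flag lock;        /* stand-in for the full (exclusive) lock */
    long flushed;            /* items handed downstream under the lock */
} hqueue;

void hq_init(hqueue *q) {
    atomic_init(&q->idx, 0);
    atomic_flag_clear(&q->lock);
    q->flushed = 0;
}

void hq_push(hqueue *q, int v) {
    for (;;) {
        int i = atomic_fetch_add(&q->idx, 1);   /* lock-free fast path */
        if (i < BLOCK) {
            q->buf[i] = v;
            if (i == BLOCK - 1) {               /* block full: full lock */
                while (atomic_flag_test_and_set(&q->lock))
                    ;                           /* spin */
                q->flushed += BLOCK;            /* flush the whole block */
                atomic_store(&q->idx, 0);       /* reopen for new pushes */
                atomic_flag_clear(&q->lock);
            }
            return;
        }
        /* another thread is flushing; retry until the block reopens */
    }
}
```

The payoff is exactly the trade the author measured: most pushes cost one atomic increment, and the lock's cache-line traffic is paid only once per 64 operations.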

Scaling Audiogalaxy…

Scaling Audiogalaxy to 80 million daily page views | Spiteful.com

For our most heavily accessed data set, we had an extremely good read/write ratio, so we were able to fan out to about 20 slaves from a single master. This particular database had several hundred million rows, which challenged the limits of our hardware (periodically, we had to clean out stale data when it got too large), so one trick we used was index-segmentation. Different sets of slaves had different indexes, and our database access layer could pick a different cluster based on the necessary index. Specifically, the tables in this database generally had an ID and a string, but the index on the string was only necessary for some queries. So, on some slaves we simply didn’t have the string index. This allowed those machines to keep the entire ID index in memory, which was a huge performance boost.

We used sharding to scale our databases in other areas.

A new malloc(3) implementation for FreeBSD 7.0

FreeBSD 7.0-RELEASE Release Notes

A new malloc(3) implementation has been introduced. This implementation, sometimes referred to as “jemalloc”, was designed to improve the performance of multi-threaded programs, particularly on SMP systems, while preserving the performance of single-threaded programs. Due to the use of different algorithms and data structures, jemalloc may expose some previously-unknown bugs in userland code, although most of the FreeBSD base system and common ports have been tested and/or fixed. Note that jemalloc uses mmap(2) to obtain memory and only uses sbrk(2) under limited circumstances (and then only for 32-bit architectures). As a result, the datasize resource limit has little practical effect for typical applications. The vmemoryuse resource limit, however, can be used to bound the total virtual memory used by a process, as described in limits(1).
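Because jemalloc obtains memory with mmap(2) rather than sbrk(2), capping the data segment ("datasize") barely bites; capping total virtual memory is what takes effect. A small sketch of that distinction, assuming a Linux-style RLIMIT_AS as the analogue of FreeBSD's vmemoryuse limit:

```c
#include <sys/resource.h>
#include <stdlib.h>
#include <stddef.h>

/* Lower the process's virtual-memory limit, then attempt one large
   allocation. Since large mallocs are serviced via mmap(2), they are
   bounded by the address-space limit, not the data-segment limit.
   Returns 1 if the allocation fit under the limit, 0 if it did not. */
int alloc_under_vm_limit(size_t limit_bytes, size_t alloc_bytes) {
    struct rlimit rl = { limit_bytes, limit_bytes };
    if (setrlimit(RLIMIT_AS, &rl) != 0)
        return -1;
    void *p = malloc(alloc_bytes);   /* mmap-backed for large sizes */
    int ok = (p != NULL);
    free(p);
    return ok;
}
```

On FreeBSD the equivalent knob is the vmemoryuse limit described in limits(1); the datasize limit would leave the mmap-backed allocation untouched.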

IBM triples Nutch performance with virtualized j9 JVM

IBM Research | dgrove | Libra: A Library Operating System for a JVM in a Virtualized Execution Environment

Libra is an execution environment specialized for IBM’s J9 JVM. Libra does not replace the entire operating system. Instead, Libra and J9 form a single statically-linked image that runs in a hypervisor partition. Libra provides the services necessary to achieve good performance for the Java workloads of interest but relies on an instance of Linux in another hypervisor partition to provide a networking stack, a filesystem, and other services. The expense of remote calls is offset by the fact that Libra’s services can be customized for a particular workload; for example, on the Nutch search engine, we show that two simple customizations improve application throughput by a factor of 2.7.

IBM’s Cloneable JVM: JSR 121 implementation on Linux gives sub-5-second startup

VEE ’07: Cloneable JVM: a new …

Java has been successful particularly for writing applications in the server environment. However, isolation of multiple applications has not been efficiently achieved in Java. Many customers require that their applications are guarded by independent OS processes, but starting a Java application with a new process results in a long sequence of initializations being repeated each time. To date, there has been no way to quickly start a new Java application as an isolated OS process. In this paper, we propose a new isolation approach called Cloneable JVM to eliminate this startup overhead in Java. The key idea is to create a new Java application by copying, or cloning, the already-initialized image of the primary JVM process. Since the clone is already initialized, it can begin actual operations immediately as a new isolated process. This cloning abstraction can support new scenarios for Java, such as user isolation and transaction isolation. We implemented a prototype of the Cloneable JVM by modifying a production JVM on Linux, which provides a new API for cloning constructed on the Isolate API defined in JSR 121. Using this cloning API, several Java applications, including a large production J2EE application server, were modified to demonstrate the isolation scenarios. Evaluations using these prototypes showed that new ready-to-serve Java applications can start up as a new process in less than 5 seconds, which is 4 to 170 times faster than starting these applications from scratch.
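The paper's cloning abstraction is essentially the fork(2) idea lifted into the JVM: pay initialization once in a primary image, then copy the warm image so each new isolated process starts ready to serve. A native-code analogy of just that idea (a sketch, not the paper's Isolate API):

```c
#include <unistd.h>
#include <sys/wait.h>

/* Analogy only: initialize a primary image once, then clone it so each
   new isolated process skips the long initialization sequence. */
static int initialized = 0;

void expensive_init(void) {
    initialized = 1;   /* stands in for classloading, JIT warm-up, etc. */
}

/* Clone the warm image; the child needs no re-initialization.
   Returns 1 if the clone came up already initialized. */
int spawn_from_warm_image(void) {
    pid_t pid = fork();
    if (pid == 0)
        _exit(initialized ? 0 : 1);   /* child: serve immediately */
    if (pid < 0)
        return 0;
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The hard part the paper actually solves is making this safe for a JVM, where threads, JIT state, and open resources make a naive process copy incorrect.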