High Performance Networking in Google Chrome

High Performance Networking in Google Chrome – igvita.com

Chrome’s multi-process architecture carries important implications for how each network request is handled within the browser. Under the hood, Chrome actually supports four different execution models that determine how process allocation is performed. By default, desktop Chrome browsers use the process-per-site model, which isolates different sites from each other but groups all instances of the same site into the same process. However, to keep things simple, let’s assume one of the simplest cases: one distinct process for each open tab. From the network performance perspective, the differences here are not substantial, but the process-per-tab model is much easier to understand. The architecture dedicates one render process to each tab, which itself contains an instance of the WebKit open-source layout engine for interpreting and laying out the HTML (labeled “HTML Renderer” in the original article’s diagram), an instance of the V8 JavaScript engine, and the glue code to bridge these and a few other components. If you are curious, the Chromium wiki contains a great introduction to the plumbing.
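To make the difference between the two allocation policies concrete, here is a toy Java sketch (not Chrome code) that assigns tabs to render processes under process-per-tab and process-per-site. It assumes a “site” can be approximated by the URL’s host; Chrome actually defines a site as scheme plus registrable domain and applies further rules, such as a cap on the total number of processes. The class and method names are hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Toy model of two process-allocation policies; hypothetical names, not Chrome's code. */
public class ProcessModels {

    /** Process-per-tab: every tab gets its own render process (process id = tab index). */
    public static Map<Integer, Integer> perTab(List<String> tabUrls) {
        Map<Integer, Integer> tabToProcess = new HashMap<>();
        for (int tab = 0; tab < tabUrls.size(); tab++) {
            tabToProcess.put(tab, tab);
        }
        return tabToProcess;
    }

    /** Process-per-site: tabs on the same site (approximated here by host) share one process. */
    public static Map<Integer, Integer> perSite(List<String> tabUrls) {
        Map<String, Integer> siteToProcess = new HashMap<>();
        Map<Integer, Integer> tabToProcess = new HashMap<>();
        for (int tab = 0; tab < tabUrls.size(); tab++) {
            String site = URI.create(tabUrls.get(tab)).getHost();
            Integer pid = siteToProcess.get(site);
            if (pid == null) {                 // first tab for this site: allocate a new process
                pid = siteToProcess.size();
                siteToProcess.put(site, pid);
            }
            tabToProcess.put(tab, pid);
        }
        return tabToProcess;
    }

    /** Number of distinct render processes in a tab-to-process assignment. */
    public static int countProcesses(Map<Integer, Integer> tabToProcess) {
        return new HashSet<>(tabToProcess.values()).size();
    }

    public static void main(String[] args) {
        List<String> tabs = List.of(
                "https://www.example.com/a",
                "https://www.example.com/b",
                "https://news.example.org/");
        System.out.println("process-per-tab:  " + countProcesses(perTab(tabs)) + " render processes");
        System.out.println("process-per-site: " + countProcesses(perSite(tabs)) + " render processes");
    }
}
```

With three tabs, two of them on www.example.com, process-per-tab yields three render processes while process-per-site yields two, since both www.example.com tabs share one.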

Cassandra 1.2 moves internals off-heap

Performance improvements in Cassandra 1.2 | DataStax

Disk capacities have been increasing. RAM capacities have been increasing roughly in step. But the JVM’s ability to manage a large heap has not kept pace. So as Cassandra clusters deploy more and more data per node, we’ve been moving storage engine internal structures off-heap, managing them manually in native memory instead. Cassandra 1.2 moves the two biggest remaining culprits off-heap: compression metadata and per-row Bloom filters. Compression metadata takes about 20GB of memory per TB of compressed data. Moving this into native memory is especially important now that compression is enabled by default. Bloom filters help Cassandra avoid scanning data files that can’t possibly include the rows being queried. They weigh in at 1-2GB per billion rows, depending on how aggressively they are tuned. Both of these use the existing sstable reference counting, with minor tweaking, to free native resources when the sstable they are associated with is compacted away.
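The quoted sizes are easy to sanity-check. The Java sketch below uses the textbook formula for an optimally sized Bloom filter, m/n = -ln(p) / (ln 2)^2 bits per element for a false-positive rate p, together with the 20GB-per-TB figure quoted above for compression metadata. The formula is an assumption standing in for Cassandra’s own filter layout, which differs in detail, and the class name is hypothetical.

```java
/** Back-of-the-envelope sizing for the two structures moved off-heap (hypothetical helper). */
public class OffHeapEstimates {

    /** Optimal Bloom filter bits per element for target false-positive rate p (textbook formula). */
    public static double bitsPerElement(double p) {
        return -Math.log(p) / (Math.log(2) * Math.log(2));
    }

    /** Estimated Bloom filter size in GB (10^9 bytes) for the given number of rows. */
    public static double bloomFilterGB(long rows, double p) {
        return rows * bitsPerElement(p) / 8 / 1e9;
    }

    /** Compression metadata at the quoted ~20 GB per TB of compressed data. */
    public static double compressionMetadataGB(double compressedTB) {
        return 20.0 * compressedTB;
    }

    public static void main(String[] args) {
        long rows = 1_000_000_000L;
        for (double p : new double[] {0.1, 0.01, 0.001}) {
            System.out.printf("p=%.3f -> %.2f bits/row, %.2f GB per billion rows%n",
                    p, bitsPerElement(p), bloomFilterGB(rows, p));
        }
        System.out.printf("4 TB compressed -> ~%.0f GB of compression metadata%n",
                compressionMetadataGB(4));
    }
}
```

For false-positive rates between 1% and 0.1%, the formula gives roughly 1.2 to 1.8 GB per billion rows, consistent with the 1-2GB range quoted above. A node holding 4 TB of compressed data would need on the order of 80 GB for compression metadata, which shows why keeping it on the JVM heap was a problem.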

Performance triage

Performance triage (David Dice’s Weblog)

Let’s say I have a running application and I want to better understand its behavior and performance. We’ll presume it’s warmed up, is under load, and is running in an execution mode representative of what we think the norm would be. It should be in steady state, if a steady state even exists. On Solaris, the very first thing I’ll do is take a set of “pstack” samples. Pstack briefly stops the process and walks each of the stacks, reporting symbolic information (if available) for each frame. For Java, pstack has been augmented to understand Java frames, and even report inlining. A few pstack samples can provide powerful insight into what’s actually going on inside the program. You’ll be able to see calling patterns, which threads are blocked on what system calls or synchronization constructs, memory allocation, etc. If your code is CPU-bound, then you’ll get a good sense of where the cycles are being spent.
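pstack is Solaris-specific, but the same idea, a handful of whole-process stack snapshots, can be approximated from inside a JVM. The sketch below is a swapped-in technique, not pstack: it uses the standard Thread.getAllStackTraces() to snapshot every thread a few times and tallies each thread’s state and top frame. The class name and the toy workload are hypothetical, and because HotSpot collects these traces at safepoints, the samples are biased toward safepoint locations.

```java
import java.util.HashMap;
import java.util.Map;

/** In-process stack sampler in the spirit of "take a few pstack samples" (hypothetical helper). */
public class StackSampler {

    /** Takes {@code samples} snapshots of all threads, {@code intervalMs} apart, counting "STATE topFrame" keys. */
    public static Map<String, Integer> sampleTopFrames(int samples, long intervalMs) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                StackTraceElement[] stack = e.getValue();
                if (stack.length == 0) {
                    continue; // thread has no Java frames to report
                }
                String key = e.getKey().getState() + " " + stack[0];
                counts.merge(key, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy workload: one CPU-bound thread and one thread blocked in sleep.
        Thread busy = new Thread(() -> {
            double acc = 0;
            for (long i = 0; !Thread.currentThread().isInterrupted(); i++) {
                acc += Math.sqrt(i); // arbitrary CPU work
            }
        }, "busy");
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // exit when interrupted
            }
        }, "sleeper");
        busy.setDaemon(true);
        sleeper.setDaemon(true);
        busy.start();
        sleeper.start();

        Map<String, Integer> counts = sampleTopFrames(5, 20);
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));

        busy.interrupt();
        sleeper.interrupt();
    }
}
```

A thread that keeps showing up RUNNABLE in the same frame is where the cycles go, while TIMED_WAITING, WAITING or BLOCKED entries show what threads are waiting on, which is the same reading of the samples described above.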

Spanner: Google’s Globally-Distributed Database

Spanner is Google’s scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.
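The Spanner paper calls this time API TrueTime: TT.now() returns an interval [earliest, latest] guaranteed to contain the absolute time, and TT.after(t) / TT.before(t) report whether t has definitely passed or has definitely not arrived. The toy Java class below mimics that shape and the “commit wait” rule the paper builds on it. It is only a sketch: the class name, the injectable local clock and the fixed error bound epsilon are hypothetical simplifications, whereas Spanner derives its bound from GPS and atomic-clock time masters.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.LongSupplier;

/** Toy interval clock modelled on the TrueTime interface from the Spanner paper (hypothetical). */
public class IntervalClock {

    /** Bounds, in microseconds, assumed to contain the true absolute time. */
    public static final class Interval {
        public final long earliestMicros;
        public final long latestMicros;

        Interval(long earliestMicros, long latestMicros) {
            this.earliestMicros = earliestMicros;
            this.latestMicros = latestMicros;
        }
    }

    private final LongSupplier localMicros; // local clock reading in microseconds
    private final long epsilonMicros;       // assumed bound on |local clock - true time|

    public IntervalClock(LongSupplier localMicros, long epsilonMicros) {
        this.localMicros = localMicros;
        this.epsilonMicros = epsilonMicros;
    }

    /** Like TT.now(): the local reading widened by the error bound on both sides. */
    public Interval now() {
        long t = localMicros.getAsLong();
        return new Interval(t - epsilonMicros, t + epsilonMicros);
    }

    /** Like TT.after(t): true only if t has definitely passed. */
    public boolean after(long t) {
        return t < now().earliestMicros;
    }

    /** Like TT.before(t): true only if t has definitely not arrived. */
    public boolean before(long t) {
        return t > now().latestMicros;
    }

    /**
     * Commit wait: choose timestamp s = now().latest, then wait until after(s) holds,
     * so s is definitely in the past before the commit is made visible.
     */
    public long commitWait() {
        long s = now().latestMicros;
        while (!after(s)) {
            LockSupport.parkNanos(100_000); // back off ~0.1 ms between checks
        }
        return s;
    }

    public static void main(String[] args) {
        // Hypothetical 4 ms uncertainty on top of the JVM wall clock.
        IntervalClock clock = new IntervalClock(() -> System.currentTimeMillis() * 1_000L, 4_000);
        long start = System.nanoTime();
        long s = clock.commitWait();
        System.out.printf("commit timestamp %d us, waited %.1f ms%n", s, (System.nanoTime() - start) / 1e6);
    }
}
```

With an error bound of epsilon, commit wait lasts roughly twice epsilon, which is why keeping clock uncertainty small matters so much for write latency in this design.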

JavaOne ’12

Yes, the big names, the legends and the heroes are no more, but fret not: Oracle has managed to pull together a not-so-shabby lineup of speakers for JavaOne 2012. Yes, it’s maturing, and maybe there’s just less to talk about; Java’s a teenager now!

So if I do get to J1 this year (thank you, Oracle, for the pass!), here are the talks I’ll be attending:

CON3586 – Dealing with JVM Limitations in Apache Cassandra
Jonathan Ellis – CTO, DataStax

CON3753 – Delivering Performance and Reliability at the World’s Leading Futures Exchange
Rene Perrin – Technical Specialist Software Engineer, CME Group

CON6583 – G1 Garbage Collector Performance Tuning
Charlie Hunt – Architect, Performance Engineering, Salesforce.com
Monica Beckwith – Principal Member of Technical Staff, Oracle

CON11233 – Detecting Memory Leaks in Applications Spanning Multiple JVMs
Albert Mavashev – CTO, Nastel Technologies, Inc.

CON6465 – JVM Support for Multitenant Applications
Graeme Johnson – Cloud JVM Architect, IBM Corporation

CON6703 – ARM: Eight Billion Served—“Want That Java Superoptimized?”
Andrew Sloss – Senior Principal Engineer, ARM
Bertrand Delsart – Consulting Member of Technical Staff, Oracle

BOF6308 – Showdown at the JVM Corral
John Duimovich – Java CTO, IBM Canada Ltd.
Mikael Vidstedt – JVM Architect, Oracle

I don’t want to die in a language I can’t understand – Dick Gabriel

Dick Gabriel, a legend – “scholar, scientist, poet, performance artist, entrepreneur, musician, essayist, and yes, hacker…” speaks at Clojure/West.

Richard P. Gabriel expands upon “Mixin-based Inheritance” by G. Bracha and W. Cook, observing that software engineering precedes science and that incommensurability can be used to detect paradigm shifts.
http://www.infoq.com/presentations/Mixin-based-Inheritance

The ultimate reference book on Java performance is out! ‘The definitive master class in performance tuning Java applications…’, James Gosling.

Java Performance

 

Hot off the press – October 2011! 

http://www.amazon.com/Java-Performance-Charlie-Hunt/dp/0137142528

 

“The definitive master class in performance tuning Java applications…if you love all the gory details, this is the book for you.”
–James Gosling, creator of the Java Programming Language

Improvements in the Java platform and new multicore/multiprocessor hardware have made it possible to dramatically improve the performance and scalability of Java software.

Java™ Performance covers the latest Oracle and third-party tools for monitoring and measuring performance on a wide variety of hardware architectures and operating systems. The authors present dozens of tips and tricks you’ll find nowhere else.

You’ll learn how to construct experiments that identify opportunities for optimization, interpret the results, and take effective action. You’ll also find powerful insights into microbenchmarking–including how to avoid common mistakes that can mislead you into writing poorly performing software. Then, building on this foundation, you’ll walk through optimizing the Java HotSpot VM, standard and multitiered applications, Web applications, and more.

Coverage includes

  • Taking a proactive approach to meeting application performance and scalability goals
  • Monitoring Java performance at the OS level in Windows, Linux, and Oracle Solaris environments
  • Using modern Java Virtual Machine (JVM) and OS observability tools to profile running systems, with almost no performance penalty
  • Gaining “under the hood” knowledge of the Java HotSpot VM that can help you address most Java performance issues
  • Integrating JVM-level and application monitoring
  • Mastering Java method and heap (memory) profiling
  • Tuning the Java HotSpot VM for startup, memory footprint, response time, and latency
  • Determining when Java applications require rework to meet performance goals
  • Systematically profiling and tuning performance in both Java SE and Java EE applications
  • Optimizing the performance of the Java HotSpot VM

Using this book, you can squeeze maximum performance and value from all your Java applications–no matter how complex they are, what platforms they’re running on, or how long you’ve been running them.
About the Authors

Charlie Hunt is the JVM performance lead engineer at Oracle. He is responsible for improving the performance of the HotSpot JVM and Java SE class libraries. He has also been involved in improving the performance of the Oracle GlassFish and Oracle WebLogic Server. A regular JavaOne speaker on Java performance, he also coauthored NetBeans™ IDE Field Guide (Prentice Hall, 2005).

Binu John is a senior performance engineer at Ning, Inc., where he focuses on improving the performance and scalability of the Ning platform to support millions of page views per month. Before that, he spent more than a decade working on Java-related performance issues at Sun Microsystems, where he served on Sun’s Enterprise Java Performance team. John has contributed to developing industry standard benchmarks such as SPECjms2007 and SPECJAppServer2010; published several performance whitepapers; and contributed to java.net’s XMLTest and WSTest benchmark projects.

ParallelGCThreads = (ncpus

On GC Threads (via Hiroshi Yamauchi):

  • Since ParallelCMSThreads is computed from the value of ParallelGCThreads, overriding ParallelGCThreads when using CMS also changes ParallelCMSThreads and, with it, CMS performance.
  • Knowing how the default values of these flags are computed helps you better tune both the parallel GC and the CMS GC. The Sun JVM engineers probably determined the defaults empirically in certain environments, so they are not necessarily the best for yours.
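To see how the two flags interact, here is a sketch of the default heuristics as I understand them from HotSpot of that era (JDK 6/7): ParallelGCThreads equals the CPU count up to 8 and then grows by 5/8 of a thread per additional CPU, and the CMS concurrent thread count is derived as (ParallelGCThreads + 3) / 4. Treat the formulas as an assumption to check against your own JVM with -XX:+PrintFlagsFinal; later JDKs replaced ParallelCMSThreads with ConcGCThreads, added further adjustments, and removed CMS entirely in JDK 14. The class name is hypothetical.

```java
/** Sketch of HotSpot's (JDK 6/7 era) default GC thread heuristics; verify with -XX:+PrintFlagsFinal. */
public class GcThreadDefaults {

    /** Default ParallelGCThreads: one per CPU up to 8, then 5/8 of a thread per additional CPU. */
    public static int parallelGcThreads(int ncpus) {
        return ncpus <= 8 ? ncpus : 8 + ((ncpus - 8) * 5) / 8;
    }

    /** Default CMS concurrent threads: roughly a quarter of ParallelGCThreads, rounded up. */
    public static int parallelCmsThreads(int parallelGcThreads) {
        return Math.max(1, (parallelGcThreads + 3) / 4);
    }

    public static void main(String[] args) {
        System.out.println("ncpus  ParallelGCThreads  ParallelCMSThreads");
        for (int ncpus : new int[] {2, 4, 8, 16, 32, 64}) {
            int gc = parallelGcThreads(ncpus);
            System.out.printf("%5d  %17d  %18d%n", ncpus, gc, parallelCmsThreads(gc));
        }
        // Overriding ParallelGCThreads also shifts the derived CMS default.
        int overridden = 2; // e.g. -XX:ParallelGCThreads=2 on a 16-CPU machine
        System.out.println("16 CPUs, ParallelGCThreads=2 -> ParallelCMSThreads="
                + parallelCmsThreads(overridden)
                + " instead of " + parallelCmsThreads(parallelGcThreads(16)));
    }
}
```

On a 16-CPU machine this gives 13 parallel GC threads and 4 CMS threads by default; forcing ParallelGCThreads=2 quietly drops the CMS default to 1, which is the coupling the first bullet warns about.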