SOA @ Yahoo!

InfoQ: QCon: REST for SOA at Yahoo!

Instead of replicating data between a backend master database and the frontend database, the frontend boxes now issue requests through a cache to backend API servers, all via HTTP. Because of this, there is now a single source of truth. The cache replicates the data once it has been requested – a pull model rather than a push model. When questioned whether this is a RESTful API, Mark stressed that he views issues around REST as a philosophical discussion, but conceded that the backend APIs are, in fact, RESTful. (He has expressed this view before in a blog entry called “REST issues: Real and Imagined”.) User-generated content is pushed through to the backend, and adding capacity becomes easy.

As one example of simply using HTTP correctly instead of getting into a philosophical REST discussion, Mark gave caching intermediaries. The caching features built into HTTP are quite advanced, and they become immediately usable for well-designed HTTP applications. Examples of advantages are freshness (the data is pulled from the backend whenever it needs to be) and validation (asking “has this changed?” is a quick HTTP-based question to the backend). It is also possible to provide “recalculated” results, which are validated against the ETag of the calculation input. Having a standards-based cache also enables the collection of metrics and load balancing. (For a great introduction to HTTP caching, see Mark’s own Caching Tutorial for Web Authors and Webmasters.)
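
The validation step mentioned above boils down to a conditional GET. Below is a minimal sketch of that exchange, assuming a hypothetical backend URL and Python’s third-party requests library (neither is mentioned in the talk); the details of Yahoo!’s actual APIs are not public.

    # Conditional GET: revalidate a cached copy with an ETag instead of
    # re-downloading the full representation.
    import requests

    URL = "http://api.backend.example/articles/42"  # hypothetical endpoint

    # First fetch: store the body and remember the ETag the backend sent.
    first = requests.get(URL)
    etag = first.headers.get("ETag")
    body = first.content

    # Later: ask "has this changed?" as a cheap HTTP question.
    check = requests.get(URL, headers={"If-None-Match": etag})

    if check.status_code == 304:
        # 304 Not Modified -- the cached copy is still valid, serve it as-is.
        print("serving cached copy")
    else:
        # The resource changed; the response carries a new body and ETag.
        body, etag = check.content, check.headers.get("ETag")
        print("cache updated")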

Mark also commented on some more advanced techniques used at Yahoo! Media Group. Multi-GB memory caches are not at all uncommon, and sometimes they are put into groups that are kept in sync via cache peering, i.e. the synchronization of more than one cache in a group. (There are numerous common cache peering protocols, such as ICP.) Another advanced concept is negative caching: if there’s an error out of the API server, the cache will cache the error, reducing the load on the backend. Collapsed forwarding means that multiple requests from the frontend can be collapsed into a single one, which according to Mark is another great way to mitigate traffic overload from the frontend. While the cache is refreshing something from the backend, it can return a stale copy, a concept called stale-while-revalidate. Similarly, stale-if-error means that if there’s a problem on the backend box, the cache can serve a stale copy, too. Another concept is an invalidation channel, an out-of-band mechanism to tell the cache that something has become stale.
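
Some of these behaviours can be requested by the origin server itself via Cache-Control directives (stale-while-revalidate and stale-if-error were later standardized as HTTP Cache-Control extensions in RFC 5861). The snippet below is a minimal sketch of a backend building such a header; the TTL values are purely illustrative, not Yahoo!’s actual configuration.

    # Build a Cache-Control header that opts a response into
    # stale-while-revalidate and stale-if-error handling by the cache.
    def cache_control(max_age: int, swr: int, sie: int) -> str:
        # max_age: seconds the response counts as fresh
        # swr:     seconds a stale copy may be served while the cache
        #          revalidates against the backend in the background
        # sie:     seconds a stale copy may be served if the backend errors
        return (f"max-age={max_age}, "
                f"stale-while-revalidate={swr}, "
                f"stale-if-error={sie}")

    headers = {
        "Cache-Control": cache_control(max_age=300, swr=60, sie=3600),
        "ETag": '"v17"',  # lets the cache revalidate cheaply later
    }
    print(headers["Cache-Control"])
    # -> max-age=300, stale-while-revalidate=60, stale-if-error=3600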

eBay does not use database transactions.

Martin Fowler: Transactionless

The rationale for not using transactions is that they harm performance at the sort of scale that eBay deals with. This effect is exacerbated by the fact that eBay heavily partitions its data into many, many physical databases. As a result, using transactions would mean using distributed transactions, which is a common thing to be wary of.

This heavy partitioning, and the database’s central role in performance issues, means that eBay doesn’t use many other database facilities. Referential integrity and sorting are done in application code. There are hardly any triggers or stored procedures.
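
As an illustration of what moving this work into the application tier can look like, here is a minimal sketch in Python, assuming listings have already been fetched from two hypothetical partitions as plain dictionaries; it is not eBay’s actual code or schema.

    import heapq

    # Each partition returns its own rows, pre-sorted by end_time within
    # that partition; the global sort happens in application code.
    partition_a = [{"item_id": 1, "seller_id": 10, "end_time": 5},
                   {"item_id": 3, "seller_id": 11, "end_time": 9}]
    partition_b = [{"item_id": 2, "seller_id": 10, "end_time": 7}]

    merged = list(heapq.merge(partition_a, partition_b,
                              key=lambda row: row["end_time"]))

    # Referential integrity in application code: validate the "foreign key"
    # before writing, instead of relying on a database constraint.
    known_sellers = {10, 11}
    new_item = {"item_id": 4, "seller_id": 12, "end_time": 11}
    if new_item["seller_id"] not in known_sellers:
        print("rejecting listing with unknown seller", new_item["seller_id"])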

My immediate follow-up to the news of transactionless was to ask what the consequences were for the application programmer, in particular the overall feeling about transactionlessness. The reply was that it was odd at first, but ended up not being a big deal – much less of a problem than you might think. You have to pay attention to the order of your commits, getting the more important ones in first. At each commit you have to check that it succeeded and decide what to do if it fails.
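
A minimal sketch of that discipline, with a hypothetical in-memory DAO standing in for the partitioned databases (this is not eBay’s actual code), might look like this:

    # Ordered commits without a wrapping transaction: most important write
    # first, check each commit, decide per step what failure means.
    class FakeDao:
        """Pretend data-access layer; each save_* call is its own commit."""
        def __init__(self):
            self.bids, self.auctions = [], []
            self.retries, self.reconcile = [], []
        def save_bid(self, bid):
            self.bids.append(bid); return True
        def save_auction(self, auction):
            self.auctions.append(auction); return True
        def save_notification(self, note):
            return False  # simulate a failed commit
        def mark_for_reconciliation(self, bid):
            self.reconcile.append(bid)
        def queue_retry(self, note):
            self.retries.append(note)

    def place_bid(dao, bid, auction, notification):
        # 1. Most important commit first: record the bid itself.
        if not dao.save_bid(bid):
            return False  # nothing else has been written yet
        # 2. Next: the auction's current high bid. There is no distributed
        #    transaction to roll back, so the application decides what a
        #    failure means -- here, flag the bid for later reconciliation.
        if not dao.save_auction(auction):
            dao.mark_for_reconciliation(bid)
            return False
        # 3. Least important: the notification can simply be retried later.
        if not dao.save_notification(notification):
            dao.queue_retry(notification)
        return True

    dao = FakeDao()
    print(place_bid(dao, {"amount": 42}, {"high_bid": 42}, {"to": "seller"}))
    print("queued for retry:", dao.retries)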

Scalability problems hit Dow Jones Indexes

Computer glitch made market drop seem worse

The problem began at 1:50 p.m. Tuesday amid heavy selling and caused a 70-minute time lag in calculating the value of the DJIA, according to a statement from Dow Jones Indexes. A system that feeds market data to the computer calculating the DJIA suddenly began experiencing delays. “While the DJIA was still being calculated and disseminated, the calculator was not receiving the underlying component prices of the DJIA on a timely basis.”

When the problem was identified, Dow Jones Indexes switched over to a back-up computer and the result was a sudden 200-point plunge in the DJIA as the system caught up with the latest market data. “This switch-over caused prices that were received during the latency period to be processed all at once, bringing the index immediately in line with its underlying component stocks.

“Dow Jones Indexes is continuing to investigate the latency issue to correct the root cause of the problem.”