Lessons in systems scalability: Inside MySpace.

Inside MySpace.com

MySpace started small, with two Web servers talking to a single database server. Originally, they were 2-processor Dell servers loaded with 4 gigabytes of memory, according to Benedetto. But at 500,000 accounts, which MySpace reached in early 2004, the workload became too much for a single database… Closing in on 2 million, the service began knocking up against the input/output (I/O) capacity of the database servers… The next database architecture was built around the concept of vertical partitioning, with separate databases… Hitting 3 million registered users, the vertical partitioning solution couldn’t last… every database had to have its own copy of the users table… That meant when a new user registered, a record for that account had to be created on nine different database servers…
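The cost of that replicated users table can be sketched as follows. This is an illustrative model, not MySpace's actual code; the database names and the `FakeDb` class are invented, and only the fan-out behavior the article describes is modeled.

```python
# Each vertically partitioned feature database keeps its own copy of the
# users table, so one registration fans out into one insert per database.

FEATURE_DBS = ["profiles", "mail", "blogs", "friends", "music",
               "photos", "events", "groups", "search"]  # nine hypothetical partitions

class FakeDb:
    """Stand-in for a database connection; just records inserted rows."""
    def __init__(self, name):
        self.name = name
        self.users = []

    def insert_user(self, user_id, user_name):
        self.users.append((user_id, user_name))

def register_user(dbs, user_id, user_name):
    # Every partition needs the row before the account works anywhere,
    # so registration cost grows linearly with the number of partitions.
    for db in dbs:
        db.insert_user(user_id, user_name)

dbs = [FakeDb(n) for n in FEATURE_DBS]
register_user(dbs, 1, "tom")
assert all(db.users == [(1, "tom")] for db in dbs)  # nine copies of one row
```

Nine synchronized inserts per signup is exactly the coupling that made the vertical scheme untenable as registration volume grew.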

By mid-2004, … the “scale up” versus “scale out” decision… But a successful scale-out architecture requires solving complicated distributed computing problems, and large Web site operators such as Google, Yahoo and Amazon.com have had to invent a lot of their own technology to make it work. … So, MySpace gave serious consideration to a scale-up strategy, spending a month and a half studying the option… Unfortunately, that high-end server hardware was just too expensive—many times the cost of buying the same processor power and memory spread across multiple servers. Besides… even a super-sized database could ultimately be overloaded… MySpace began splitting its user base into chunks of 1 million accounts and putting all the data keyed to those accounts in a separate instance of SQL Server. Today, MySpace actually runs two copies of SQL Server on each server computer, for a total of 2 million accounts per machine… There is still a single database that contains the user name and password credentials for all users.
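The routing arithmetic behind that scheme is simple enough to sketch. This is a hedged reconstruction from the numbers in the article (1 million accounts per SQL Server instance, two instances per machine); the function names are illustrative, and the article does not say how MySpace actually mapped account IDs to databases.

```python
# Range-based sharding as the article describes it: accounts are grouped
# in blocks of 1 million, each block lives in its own SQL Server instance,
# and two instances per physical box gives 2 million accounts per machine.
# A single lookup database still holds every user's credentials.

CHUNK = 1_000_000
INSTANCES_PER_MACHINE = 2

def shard_index(account_id):
    """Which 1M-account database instance owns this account."""
    return account_id // CHUNK

def machine_index(account_id):
    """Which physical server hosts that instance (two instances per box)."""
    return shard_index(account_id) // INSTANCES_PER_MACHINE

print(shard_index(2_500_000))    # account 2.5M lands in instance 2
print(machine_index(2_500_000))  # which sits on machine 1
```

The appeal over vertical partitioning is that a new user's data lands in exactly one shard; only the small central credentials database is touched by everyone.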

… 9 million accounts, in early 2005, … C# … under ASP.NET. … 150 servers running the new code were able to do the same work that had previously required 246… 10 million accounts, it began to see storage bottlenecks again. Implementing a SAN had solved some early performance problems, but now the Web site’s demands were starting to periodically overwhelm the SAN’s I/O capacity… For example, the seventh 1 million-account database MySpace brought online wound up being filled in just seven days, largely because of the efforts of one Florida band that was particularly aggressive in urging fans to sign up… 17 million accounts, in the spring of 2005 MySpace added a caching tier…
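A caching tier of the kind described here is typically a cache-aside layer: reads check memory first and fall through to the database only on a miss. The article doesn't detail MySpace's implementation, so the sketch below is a minimal illustration with invented names, showing why the tier relieves database I/O.

```python
# Cache-aside sketch: hot objects are served from memory, so repeated
# reads of the same profile never touch the database after the first hit.

class CacheTier:
    def __init__(self, backing_fetch):
        self._cache = {}
        self._fetch = backing_fetch   # falls through to the database on a miss
        self.db_reads = 0

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._fetch(key)
            self.db_reads += 1        # only misses cost a database read
        return self._cache[key]

def load_profile_from_db(user_id):
    # Stand-in for a SQL query against the user's shard.
    return {"id": user_id, "name": f"user{user_id}"}

tier = CacheTier(load_profile_from_db)
for _ in range(1000):
    tier.get(42)                      # 1,000 page views...
assert tier.db_reads == 1             # ...one database hit
```

The trade-off, of course, is staleness: cached copies must be invalidated or expired when the underlying row changes.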

Benedetto admits he had to learn the hard way. “I’m a database and storage guy, so my answer tended to be, let’s put everything in the database,” he says, but putting inappropriate items such as session tracking data in the database only bogged down the Web site. In mid-2005, … 26 million accounts, MySpace switched to SQL Server 2005 … still in beta… The main reason was … 64-bit… “It was that we were so bottlenecked by memory.”
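Benedetto's point about session data is worth making concrete. Session state is short-lived and updated on nearly every page view, so persisting it in the database generates constant write load for data that never needs to survive a restart. A minimal alternative, with hypothetical names (the article doesn't say what MySpace moved to), is an in-memory store with a time-to-live:

```python
# In-memory session store with TTL expiry: session writes stay out of
# the database entirely, and stale entries simply age out.

import time

class SessionStore:
    def __init__(self, ttl_seconds=1800):
        self._ttl = ttl_seconds
        self._sessions = {}           # session_id -> (expiry_time, data)

    def put(self, session_id, data):
        self._sessions[session_id] = (time.monotonic() + self._ttl, data)

    def get(self, session_id):
        entry = self._sessions.get(session_id)
        if entry is None or entry[0] < time.monotonic():
            self._sessions.pop(session_id, None)  # expired or never existed
            return None
        return entry[1]

store = SessionStore(ttl_seconds=1800)
store.put("abc", {"user_id": 7, "last_page": "/home"})
assert store.get("abc")["user_id"] == 7
assert store.get("missing") is None
```

Losing sessions on a restart is an acceptable cost here precisely because, as Benedetto notes elsewhere, a free service can trade some durability for speed.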

As of November, MySpace was exceeding the number of simultaneous connections supported by SQL Server, causing the software to crash. Benedetto candidly admits that 100% reliability is not necessarily his top priority.
“That’s one of the benefits of not being a bank, of being a free service,” he says. MySpace has configured SQL Server to extend the time between the “checkpoint” operations it uses to permanently record updates to disk storage—even at the risk of losing anywhere between 2 minutes and 2 hours of data—because this tweak makes the database run faster. And because it’s virtually impossible to do realistic load testing at this scale, the testing that they do perform is typically targeted at a subset of live users on the Web site, who become unwitting guinea pigs for a new feature or tweak to the software, he explains.