Hotmail : 100 sysadmins, +10,000 servers, multiple petabytes of data.

A Conversation with Phil Smoot

At every level—failing LUNs (logical unit numbers), failing servers, failing services, failing networks—you want to make sure that all calling services don’t tie up all of their resources on any particular component. And you want to make sure that operationally you can isolate a broken or failing hardware component or server or service from the rest of the network. Most of this experience comes with time. We try to build our bench with folks who understand what mistakes not to make.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s