Introduction

There is relatively little prior art in print that addresses the problems of large infrastructures in any holistic sense. Thanks to the work of many dedicated people, we now see extensive coverage of individual tools, techniques, and policies [nemeth] [frisch] [stern] [dns] [evard] [limoncelli] [anderson]. But in practice it is difficult to find a "how to put it all together" treatment that addresses groups of machines larger than a few dozen.

Since we could find little prior art, we set out to create it. Over the course of several years of deploying, reworking, and administering large mission-critical infrastructures, we developed a methodology and toolset. We began thinking of an entire infrastructure as one large enterprise cluster, rather than as a collection of individual hosts. This change of perspective, and the decisions it prompted, made a world of difference in cost and ease of administration.

We recognize that there really is no "standard" way to assemble or manage large infrastructures of UNIX machines. While the components that make up a typical infrastructure are generally well known, professional infrastructure architects tend to use those components in radically different ways to accomplish the same ends. In the process, we usually write a great deal of code to glue those components together, duplicating each other's work in incompatible ways.

Because infrastructures are usually ad hoc, setting up a new infrastructure or attempting to harness an existing unruly infrastructure can be bewildering for new sysadmins. The sequence of steps needed to develop a comprehensive infrastructure is relatively straightforward, but the discovery of that sequence can be time-consuming and fraught with error. Moreover, mistakes made in the early stages of setup or migration can be difficult to remove for the lifetime of the infrastructure.

We will discuss the sequence that we developed and offer a brief glimpse into a few of the many tools and techniques this perspective generated. If nothing else, we hope to provide a lightning rod for future discussion. We operate a web site (www.infrastructures.org) and mailing list for collaborative evolution of infrastructure designs. Many of the details missing from this document should show up on the web site.

In our search for answers, we were heavily influenced by the MIT Athena project [athena], the OSF Distributed Computing Environment [dce], and by work done at Carnegie Mellon University [sup] [afs] and the National Institute of Standards and Technology [depot].

Checklist

Version Control
Gold Server
Host Install Tools
Ad Hoc Change Tools
Directory Servers
Authentication Servers
Time Synchronization
Network File Servers
File Replication Servers
Client File Access
Client O/S Update
Client Configuration Management
Client Application Management
Mail
Printing
Monitoring
© Copyright 1994-2007 Steve Traugott, Joel Huddleston, Joyce Cao Traugott
In partnership with TerraLuna, LLC and CD International