From stevegt@TerraLuna.Org Wed Apr 24 17:07:38 2002
Date: Wed, 24 Apr 2002 17:07:38 -0700
From: Steve Traugott <stevegt@TerraLuna.Org>
To: "Alva L. Couch" <couch@eecs.tufts.edu>
Cc: Joyce Cao <joyce@TerraLuna.Org>, joyce.cao@idt.com
Subject: Re: ordering
Message-ID: <20020424170738.A25282@scramjet.TerraLuna.Org>
References: <20020419205752.E4D2ECFC4@piano.eecs.tufts.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20020419205752.E4D2ECFC4@piano.eecs.tufts.edu>; from couch@eecs.tufts.edu on Fri, Apr 19, 2002 at 04:57:52PM -0400
Status: RO
Content-Length: 9491

Hi Alva!

Joyce and I have both been digesting this message and your next.  Some
answers so far...

On Fri, Apr 19, 2002 at 04:57:52PM -0400, Alva L. Couch wrote:
> I think at the outset that I agree somewhat with the conclusions but
> not with the mathematics.  There are some mistakes.  - if you have N
> things over all orderings, there are N! possible outcomes, not N^2.

Of course.  My mistake.  Fixed.  But I was talking about number of
possible disk states, given an N-bit disk.  That would be 2^N possible
disk states.  I had it backwards -- silly error.  Thank you.

On the other hand, you make a very good point -- ordering of
operations is itself numerically interesting, because of that
factorial.  So I'm adding that, and elaborating below.  

An HACMP cluster node build is around 121 discrete operations, for
instance, if you only count the number of stanzas in the ISconf
makefile.  That would be 121!, or 8.09*10^200 possible orderings.

These produce a disk content which uses 2Gb of space, and has
2^(8*2*10^9) possible bit states.  I don't yet know what magnitude
that number is -- I've had a perl one-liner chewing CPU for several
hours so far.

> - One thing that you ignore is the concept of orthogonality.  Two
>   systems are orthogonal if changes to one can't affect the other, and
>   vice versa.  The effect of orthogonality is that operations on
>   orthogonal subsystems always commute.  E.g., a*b = b*a.  The effect
>   of orthogonality is to reduce the space of possible sequences.
>   E.g., I think everyone would agree that the 'networking subsystem'
>   is orthogonal to vi.  Changes to vi can't affect networking, and
>   vice versa.  Thus the results of a change to both is the same in
>   either order.

Yes, but in practice, who decides whether, or to what degree, two
subsystems, or operations on them, are related?  For example, the
latest version of VIM does contain networking code, to support remote
slaved edit sessions.  I would never have known about this possible
dependency if I hadn't read the docs.  

I suspect the most reliable and fast way to ensure that two sequences
produce the same outcome is via regression testing.  As the number of
possible sequences increases, so does the required number of
regression test iterations.

Going back to that HACMP build, a full regression test run to
determine equivalency of all possible orderings is problematic to say
the least; call it 1 hour for each rebuild, that's 9.24*10^196 years.
(Had to install Math::BigInt to get that.)  ;-)

Testing an HACMP cluster actually takes several hours after the build
itself is done.  That extra time would only hurt us by 1 additional
order of magnitude, so I'm neglecting that. 

Looking at the HACMP makefile dependency trees, 61 of the operations
are "leaves".  Only a few of those would be orthogonal to each other.
But let's assume that they all are.  We know that order matters in the
remaining 121-61 = 60 operations.  So that lowers us to 9.5*10^77
years of regression testing to determine which sequences of those 60
operations are equivalent.

Of course, much of that regression testing could be eliminated by
pruning "dead branches", sequences which include subsequences which
are known to be bad.  Let's say that 99.999% can be instantly culled
by recognition, before testing.  That still leaves 9.5*10^72 years,
unless I've slipped a decimal somewhere.

Let's say disk, network, and CPU speeds increase by a couple orders of
magnitude, and those 1-hour builds each take 1 second instead.  That
brings us down to 9.5*10^69 years to process the culled order space.

But this all presupposes that we already know the entire set of
operations to be performed.  When doing builds or upgrades this is
never the case.  The actions needed are always discovered in the
course of growing an infrastructure.  In the case of ISconf, this
discovery is by the person who's writing and debugging the makefile;
"Oh, looks like bar needs to be installed before I can install foo; I
didn't know until now that we even needed bar.  Ok, I'll add bar to
the makefile as a prereq for foo."  
  
This also presupposes that discovering a viable order of actions is
hard, and that exhaustive exploration of possible orderings is a
desireable way to yield an optimum outcome.  In practice, the most
viable order is the order in which the operations were discovered and
executed in the  first place.  At every point in this process, we
already have an order that will work.

And this also presupposes that the list of operations is a fixed set,
regardless of order.  In reality, changing the order will require
changing the membership of the set -- for example, failing to mount
/opt before populating /opt/foo will require a later migration onto
the 'real' /opt.  This migration is an additional operation which
would have to be discovered and added to the set.

This is all reaching the point from an extreme direction; the point
being that, for building or maintaining a given host class, it's
easiest in practice to constrain the ordering to the one known,
working sequence, test the heck out of it, and only add new actions to
the end of the sequence.

New host classes can then be created based on variations of the known
good sequence.  In ISconf, new host classes get created by creating
new makefile stanzas, naming ordered sets of existing stanzas as
prereqs.  In this case, we benefit from known working subsequences,
and the internal ordering of those subsequences is preserved.

> I have been working on the concept of ordering more than you 
> realize, but can't as the chair publish a paper on it this year. 
> In particular, I have designed a formalism that portrays the 
> behavioral effects of configuration, then analyzed the structure
> of operations on behavior as a semigroup.  This semigroup can 
> be decomposed by the factor theorem into subsemigroups describing
> changes to disjoint subsystems. 
> 
> This fancy math demonstratese that storing the complete ordering is
> 'more work than is required', though cfengine stores 'less than is
> required'.  All one has to do is to serialize the changes to
> orthogonal subsystems.  Since operations on orthogonal subsystems
> commute, one can always express them as groups of operations on one
> subsystem at a time.

I think, after looking at it over the last few years, that this is a
lot of what make is doing for ISconf.  I preach that targets should
only get added to the end of prereq lists when editing a makefile.
This means that changes within orthogonal subsets are always
deterministically serialized.  

More importantly (I think), human error in determining orthogonality
tends to be masked by the deterministic way that make iterates over
multiple prereqs in the same sequence every time. This way, even
though two actions might not be explicitly linked in the dependency
tree, they are nevertheless serialized identically on every run,
removing the possibility of indeterminate behavior.

For example, the deepest dependency graph explicitly coded in that
HACMP makefile is 8.  There are deeper HACMP dependencies not
explicitly coded; they are being maintained by make's deterministic
prereq list behavior.  I suspect the actual depth is at least 30.  If
I were to rearrange the order of any of those non-explicit prereq
lists, experience tells me that in most cases the build will break.  

If make's behavior were indeterminate, if it randomized the order of
things not explicitly stated as dependencies, or if I were to run make
with a -j flag, allowing it to parallelize, experience tells me that
the resulting outcome would be indeterminate and non-reproducable.  At
best, the build would cleanly break.  At worst, the build would
complete, and I would need to do extensive testing to determine
whether the machine is actually capable of doing its job.

My strongest driver is the need to remove the possibility of
indeterminate behavior -- it's a business requirement.  When I'm the
consultant flipping the switch to demonstrate to the customer what I
can do, indeterminate behavior is the greatest risk I can think of.
It doesn't matter how long the build or upgrade takes, as long as it's
less than an hour or two.  It doesn't matter if the predictable,
plodding nature of the build introduces security holes at
predetermined times; I haven't seen that exploited yet and don't
expect to, not for a few more years.  But I have seen unforseen
dependencies between subsystems cause build failures; it's a common
failure mode.

> You haven't convinced me that equivalence of the behavioral effects of
> randomly ordered configuration operations is undecidable.  Just
> complex. You might be able to do this. 

I think of it from the other direction -- beyond a certain number N,
it's infeasible to prove empirically that N configuration operations
can be randomly ordered and produce the same behavioral outcome.  N
appears to be as low as 8 or 9.

Genetic algorithms must have a fitness function which executes
quickly.  If the fitness function takes too long, then the time
required to arrive at a solution is prohibitive.  Building and testing
a host is a 1-hour fitness function.

For example, with a DNA size of only 8 operations, executing a 1-hour
fitness function for each combination requires 8!/24/365 = 4.6 years.

What am I missing?

Steve
-- 
Stephen G. Traugott 
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org   
http://www.stevegt.com

From stevegt@TerraLuna.Org Sun Apr 28 00:06:50 2002
Date: Sun, 28 Apr 2002 00:06:50 -0700
From: Steve Traugott <stevegt@TerraLuna.Org>
To: "Alva L. Couch" <couch@eecs.tufts.edu>
Cc: Joyce Cao <joyce@TerraLuna.Org>
Subject: Re: more about turing equivalence.
Message-ID: <20020428000650.A13515@scramjet.TerraLuna.Org>
References: <20020420002305.0706BCFC4@piano.eecs.tufts.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20020420002305.0706BCFC4@piano.eecs.tufts.edu>; from couch@eecs.tufts.edu on Fri, Apr 19, 2002 at 08:23:05PM -0400
Status: RO
Content-Length: 13259

Hi Alva!

I'm lagging two messages behind you because I want to follow your
thought processes as closely as I can.  It's possible (likely) that
I'll answer points that you've already addressed in your next two
messages.  This one I've been thinking about all week.  ;-)

On Fri, Apr 19, 2002 at 08:23:05PM -0400, Alva L. Couch wrote:
> I took a short walk and reflected on your paper some more.
> 
> a) Cfengine's configuration operations aren't turing equivalent, nor
>    are the operations posited by maelstrom.  Both are "state machines"
>    in the language of the Chomsky hierarchy.  Recall that there are
>    three levels of complexity in the Chomsky model, provably different:

I usually see four levels -- you're lumping linear bounded automata in
with turing machines, right?  I don't think it matters in this case,
unless you've seen something I've missed.

>    i) state machines/regular expression grammars.  Most cfengine
>       configurations live here.  It takes some trouble to make one
>       that isn't a state machine. 

Yes.  And the state which cfengine detects is based on spot checks of
the current disk content.  I suspect that, due to those 2^N possible
bit states of an N-bit disk, any method which uses current disk
content as the input to a state machine is only going to be able to
take samples.

This sampling means that cfengine is not really responding to the full
disk state of the machine.  During any given state transition,
cfengine responds to only a very small subset S of the disk state.
Therefore, any given rule in cfengine's state machine can actually be
triggered by any one of 2^(N-S) disk states.  Those multiple possible
disk states cause the outcome of any conventionally-configured
cfengine run to be non-deterministic.  (Ouch.)

> Likewise, makefiles can usually be
>       REDUCED to state machines by looking at what they do on each
>       architecture and codifying that as state transition rules. 

By intent, ISconf has always wrapped and used makefiles as
deterministic state machines, only.  The only bits on the disk that
the makefile reads to determine state are timestamp files explicitly
created by the makefile itself and maintained for the life of the
target host.  

This means that ISconf uses the full set of all prior actions,
accumulated over the entire life of the target host, in order to
determine current state and trigger the next state transition.  Since
there is only one possible disk state that could be represented by a
given set of timestamp files, we get deterministic state transitions.

An objection frequently raised is that this approach does not have the
ability to detect and correct out-of-band changes.  In a fully
automated environment, out-of-band changes to root-owned portions of
disk are indicators of a security breach.  The only reliable way to
inspect a disk for security breaches is to reboot off of an untainted
disk.  No tool which executes within the context of a breached disk
can be relied on to detect and correct the breach, so this objection
applies to cfengine, make, and even manual administration methods.  

>    ii) pushdown automata/context-free grammars.  Right now I know of
>       no configuration management tools in this category. 

Something's nagging at me here, having to do with LIFO ordering and a
configuration tool with rollback capabilities etc.  It could be that
the "save original files for uninstall" capability of many package
managers is verging on pushdown automata, but with nowhere near an
accurate implementation.  

>    iii) turing machines/unconstrained grammars.
>       The actual machines being configured live here.  This is where
>       undecidability may lurk.

Yes.  I suspect *all* self-administered hosts are one-tape turing
machines, regardless of administration tool.  I'll get back to that in
a moment.

>    This makes the CONFIGURATION model simpler than the EXECUTION
>    model.  While the execution model remains turing equivalent, the
>    configuration model is fundamentally simpler in structure.

Yes.  Further, I suspect a cfengine script is equivalent to the
nonvolatile ruleset of a one-tape turing machine, while the target
machine's disk is the turing tape -- it reads a bit, alters the tape,
reads again, etc.  

I suspect an ISconf makefile is equivalent to the ruleset of a
two-tape turing machine.  The directory (/var/isconf/stamps) where
ISconf keeps its stamp files is a read/write tape, while the rest of
the disk is a write-only tape.  ISconf itself doesn't care what's out
there on that write-only tape -- it drives blindly.  (Though the tools
which are called by 'make' might care.  For instance, it's perfectly
valid to call cfengine from within make.) 

So, yes, the configuration model and the target machine might fit
different Chomsky levels when viewed in isolation from each other.  

But these analogies collapse into one-tape machines when you
take into account the fact that any administrative action has the
ability to modify the behavior of the administrative tool itself.

Let's equate a one-tape turing machine's nonvolatile ruleset with the
CPU instruction set of a conventional Von Neumann machine.  The disk
content is the turing tape.  The operating system and subsidiary
programs reside on the same tape.  The configuration tool itself is by
intent a finite automata "state machine", emulated by the turing
machine.  This state machine's ruleset and control code reside on the
same tape as the operating system and other applications.

That configuration tool has the ability to modify the content of the
rest of the turing tape -- that's its job.  It can also directly
modify its own ruleset -- it can upgrade itself.

The content of the rest of the turing tape determines how the turing
machine interprets the configuration tool's ruleset.  In practical
terms, this means that the behavior of the configuration tool is
dependent on the operating system kernel, shared libraries, and
subsidiary applications.  

So the configuration tool can modify its own behavior in two ways;
directly, by replacing its own ruleset; or indirectly, by altering the
way that ruleset is interpreted.

Therefore, a configuration tool is capable of the self-modifying
behavior of a turing machine.  

The instances in which a configuration tool will modify its own
behavior indirectly, by modifying code it depends on, are difficult to
predict.  Doing so would require a full understanding of all parts of
our "turing tape" which affect the behavior of the tool.  In practice
this would mean scoping and conducting an exhaustive analysis of all
pertinent code in the kernel, all applications and utilities which the
configuration tool calls, as well as all shared libraries and
configuration files these components depend upon.  

Unless this full understanding exists, a configuration tool residing
on, executing within, and modifying the behavior of a one-tape turing
machine *must* be assumed to behave like a turing machine, rather than
a finite automata.

> b) Even with this simplification, the sequencing problem remains.  It
>    is easy to construct a sequence of invocations of N state machines
>    such that only the order x1..xn gives the appropriate outcome on
>    the turing machine tape.  This is the same thing as what you're
>    saying; without more simplifications, the outcome of every order is
>    potentially different.

Yes.

Further, if the state machines are themselves executing within the
context of the turing machine tape, and their states interpreted by
code which they manage, then the order of invocations becomes doubly
important:  If state machine x42 alters the ruleset of state machine
x43, then invoking x43 first, followed by x42, will cause a very
different outcome than x42 followed by x43.

Cfengine does not consider its prior actions of a month or a year ago
to be part of the "tape" in order to avoid hitting this particular
problem: it can *only* rely on those spot checks.  

In Mark's LISA workshop we discussed whether cfengine 1.X might be
configured to maintain long-term state, or if cfengine 2.X might
support this natively, in order to make it more deterministic.  I'm
not sure how many people present understood what this debate was
really about.  I think we're doing a pretty good job of addressing it
here.

> c) Determining an appropriate order for arbitrary state machines to
>    achieve a given outcome for configuration is INTRACTIBLE but not
>    UNDECIDABLE.  Hard but not impossible.  All one has to do is to try
>    all the orders and choose the one that produces the appropriate
>    outcome.  

Yes, if you can afford N!.  In practice, we don't need to try very
many of the orders at all -- the correct order is the one which worked
when each state machine was itself discovered to work.  In other
words, if the order P currently works, and we develop, debug, and test
a new state machine xn which produces the desired outcome when run
immediately after P, then the new order P' should include only P and
xn, in that order.  

The way xn gets coded in an ISconf makefile is by adding the new state
machine into the makefile as an independent stanza or internally
dependent set of stanzas.  Then we add the xn target name to the tail
end of the P prereq list, creating P'.

>    It becomes undecidable if other processes are SIMULTANEOUSLY
>    modifying configurations, asynchronously.

Absolutely.  Manual administration Bad, 'make -j' Bad...  And using
two different auto-administration tools on the same machine is Really
Really Bad.  ;-)

> d) Determining whether an appropriate order achieves a particular
>    behavioral outcome remains UNDECIDABLE.  This is just another
>    flavor of Church's thesis.

I haven't been able to find a reference for this.  Do you mean
"determining in advance"?  If so then I agree.  If not, then I'd have
to think about it longer.

> I think the real issue here is that VALIDATION of behavior only
> applies to a specific sequence of operations.  If one changes the
> sequence of operations, then one must prove that the other sequence
> reproduced the same disk state, bit for bit, before one can claim that
> the prior validation applies to the new state.  

YES!  This is very probably a core, if not *the* core, idea that I'd
like to get across.  Sampling disk state alone can not ensure bit for
bit reproduction of disk contents nor resulting behavior.  But
reproducing order of configuration changes will reproduce the same
disk state and behavior every time.

I further suspect that "A machine's set of behaviors will *always*
vary if the disk contents vary".  This might be proved by including the
disk content itself as part of the output set, which I think is valid
if the machine's programming includes introspection functions.
Machines with different disk contents will certainly always exhibit
different behaviors in response to 'cat /dev/*'.

> This is a claim of "transitivity of validation", the same claim
> utilized by the Linux Standard Base (LSB).  

Hmmmm...

>  Now
> 
> a) In the absence of further assumptions, whether two different
>    configurations exhibit the same behavior is UNDECIDABLE(can prove).

I agree, but haven't thought of a direct proof.  An indirect proof is
what I've been building with this whole chain of thought over the last
few months...  

Recap:

- sampling disk state alone isn't enough to determine disk state,
  unless you sample the *whole* disk

- ...so state machines driven by only disk samples produce
  non-deterministic behavior

- ...so the equivalence of behavior of two different state machines
  driven only by disk samples is undecidable  (that's easy)

- configuration driven by only deterministic ordering of individual
  state machines produces deterministic behavior

- because the state machines reside on and execute within the context
  of the driven disk, they can modify each other, as well as
  themselves

- ...so the behavior of individual state machines is dependent on the
  order of invocation of all state machines

- ...so whether two different orders of configuration operations
  exhibit the same behavior is undecideable (takes me a while to get
  here)

> b) so whether two different orders of configuration operations exhibit
>    the same behavior remains undecidable by lifting.

Exactly.

> The most important force in the world to system administrators is
> not configuration management, but configuration validation. 

Yes.  In this context, it's amazing how many shops lack test
environments, but instead "try out" new administrative code in
production.  In hindsight I can see why I've been harping on the need
for test environments for the last several years.  Interesting that
it's this easy (hah) to justify that need theoretically.  ;-)

> It is this validation that suffers from turing equivalence problems,
> not the act of configuration itself. 

If you take into account the x43,x42 ordering problem I described
above, does this change your mind about the act of configuration
itself having a turing equivalence problem?

Good stuff.  

Steve
-- 
Stephen G. Traugott 
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org   
http://www.stevegt.com

From stevegt@TerraLuna.Org Sun Apr 28 14:20:28 2002
Date: Sun, 28 Apr 2002 14:20:28 -0700
From: Steve Traugott <stevegt@TerraLuna.Org>
To: "Alva L. Couch" <couch@eecs.tufts.edu>
Cc: Joyce Cao <joyce@TerraLuna.Org>
Subject: Re: ordering
Message-ID: <20020428142028.A28982@scramjet.TerraLuna.Org>
References: <20020425130710.95FC7CFC4@piano.eecs.tufts.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20020425130710.95FC7CFC4@piano.eecs.tufts.edu>; from couch@eecs.tufts.edu on Thu, Apr 25, 2002 at 09:07:10AM -0400
Status: RO
Content-Length: 9336

Hi Alva,

Next in sequence...  ;-)

On Thu, Apr 25, 2002 at 09:07:10AM -0400, Alva L. Couch wrote:
> The only way to assure that two subsystems are orthogonal is by design. 
> Testing cannot assure that.  This is a fundamental premise of 
> software testing. 
> 
> I think that this is the ONLY way to assure that a realistic build is
> reliable.  But worse, there is NO way to assure the build is
> consistent across an enterprise of hosts, other than to assure
> homogeneity of operations and even perhaps hardware.

Enacting all administrative change via serialized, tested and
reproducable sequences of operations constitutes homogeneity of
administrative operations.  This will ensure consistent builds across
an enterprise.  

Restating that, "if we do the same things in the same order to
identical machines, then they will remain identical in behavior".
Machines which aren't identical, or which by intent need different
builds, are created by variations of the tested base sequence.

This allows us to ignore orthogonality, rather than needing to
determine whether the subsystem developer designed for it.

I think you cover most of this later from a different angle, in your
discussion of transitive validation.

> > Going back to that HACMP build, a full regression test run to
> > determine equivalency of all possible orderings is problematic to say
> > the least; call it 1 hour for each rebuild, that's 9.24*10^196 years.
> > (Had to install Math::BigInt to get that.)  ;-)
> 
> Kind of cute but perhaps irrelevant.  I claim that we know that some
> orderings are equivalent, by design. The problem is that we have no 
> information about order invariance except by design. If two subsystems
> are designed not to interact, e.g., they're in separate boxes and 
> controlled by separate drivers and configuration, then this means that the 
> configuration operations for the subsystems commute with one another. 
> Instead of N!, one is left with N1! * N2! * .... * NK! where 
> N1,N2,...,NK are the sizes of sets of operations for the subsystems. 

Makes sense.  

Commutability and orthogonality are difficult for ordinary sysadmins
to prove or disprove in the field.  For a sysadmin to use a tool which
requires them to make these determinations, they must understand
correct use of the tool, the design of the subsystems they're
configuring, as well as the desired configuration result, and how to
test for it to make sure they succeeded.

For a sysadmin to use a tool which ensures strict serialization of
sequences of operations, they only need to understand correct use,
desired result, and how to test for it.  The "correct use" part is the
only additional thing they need to learn beyond what they would
normally do when using manual methods.  This is trainable.  In
practice, I've been able to orient sysadmins in makefile and CVS usage
in as little as a day, on-site.

If we require that sysadmins understand subsystem design well enough
to determine orthogonality, then we're expecting them to become more
of an engineer in order to use the tool.  This is less trainable.  I
would not be able to teach someone how to do this in a limited period
of time, on-site, in the middle of a rollout.

> > I think, after looking at it over the last few years, that this is a
> > lot of what make is doing for ISconf.  I preach that targets should
> > only get added to the end of prereq lists when editing a makefile.
> > This means that changes within orthogonal subsets are always
> > deterministically serialized.  
> 
> Yes.  Actually if you do this you don't even need to know anything
> about orthogonality. 

Exactly.

> > More importantly (I think), human error in determining orthogonality
> > tends to be masked by the deterministic way that make iterates over
> > multiple prereqs in the same sequence every time. This way, even
> > though two actions might not be explicitly linked in the dependency
> > tree, they are nevertheless serialized identically on every run,
> > removing the possibility of indeterminate behavior.
> 
> Agreed. 

I'll get back to this in a later message -- the response you sent me
this morning disagreed with a restating of this paragraph, which makes
me think I stated something unclearly in last night's message.  I'll
have to read both again.

> > For example, the deepest dependency graph explicitly coded in that
> > HACMP makefile is 8.  There are deeper HACMP dependencies not
> > explicitly coded; they are being maintained by make's deterministic
> > prereq list behavior.  I suspect the actual depth is at least 30.  If
> > I were to rearrange the order of any of those non-explicit prereq
> > lists, experience tells me that in most cases the build will break.  
> 
> This is important to note and document. Tell us a little about what breaks
> and how subtle the breaks are. 

Will do.

> > If make's behavior were indeterminate, if it randomized the order of
> > things not explicitly stated as dependencies, or if I were to run make
> > with a -j flag, allowing it to parallelize, experience tells me that
> > the resulting outcome would be indeterminate and non-reproducable.  At
> > best, the build would cleanly break.  At worst, the build would
> > complete, and I would need to do extensive testing to determine
> > whether the machine is actually capable of doing its job.
> 
> This is important to note and document.  I would say not that "make" 
> is necessary, but that "serialization" is necessary, and I can probably
> simplify your syntax with a simpler tool than make...

Agreed that "serialization" is necessary.  Agreed that 'make' isn't
the only way to go -- but didn't want to replace 'make' until I had
time and understood the problem well enough to do it properly.  Based
on past experience I suspect any other tool will also have to support:

- preserving serialization of reusable subsequences

- implicit assertion test of zero return codes for external commands

- implicit reproducable serialization of operations not explicitly
  serialized

- semaphores which child processes can use to signal async events to
  parent processes (like 'rebuild kernel and reboot when make is done')

- a hierarchical grouping of host attributes, so that host function
  can be determined quickly by eye

- re-usable ordered sets of grouped subsequences, so that new hosts
  can be created by prototype rather than by class (This is important.
  True class-based configuration tools don't seem to work in the
  field, while protoype-based systems consistently do.  I don't think
  I'm able to explain why yet; it may be nothing more than "this is
  the way sysadmins think", or there may be a more theoretical basis.
  I suspect, again, that it has to do with testing.)

These are things which ISconf/make already does.  In addition, over
the years those of us using ISconf have concluded that we also need: 

- postrequisites (like 'do foo after I'm done')

- decentralized state machine specification (rather than a monolithic
  makefile or script)

- lexically bound syntax (I want to be able to specify each operation
  in the language most suitable for that operation)

- separation of action code from site-specific configuration data

- decentralized editing of state machine specifications (no need to
  log into gold server to update makefile)

- state machine and file transport language integrated, as in
  cfengine, to remove need for NFS mounts to get packages on demand

- embedded documentation, like POD; including dynamically generated
  runbooks and training checklists generated per host class (the
  latter actually looks easy -- name a person to be trained as a
  "target host" and "build" them, checking off training actions as
  they complete)

> > > You haven't convinced me that equivalence of the behavioral effects of
> > > randomly ordered configuration operations is undecidable.  Just
> > > complex. You might be able to do this. 
> > 
> > I think of it from the other direction -- beyond a certain number N,
> > it's infeasible to prove empirically that N configuration operations
> > can be randomly ordered and produce the same behavioral outcome.  N
> > appears to be as low as 8 or 9.
> 
> This is an intractibility argument, not an undecidability argument;
> you don't need turing equivalence for this.  In fact, it follows even
> if the machines being CONFIGURED are simple state machines!  This is
> what bothered me about your math.  Turing equivalence is irrelevant.

Yes, turing equivalence is irrelevant for this math.  It is proving
itself to be a useful model for illustrating the behaviors of a
self-administering system though.  I don't think we could have had a
conversation this productive without that foundation.

I've gotten similar feedback from other people over the past few
months.  Folks seem to need to work with this abstraction; previously,
these discussions have always bogged down in behaviors of individual
packages and operating systems.  Falling back to Turing's theoretical
model seems to give us more neutral material to work with.

I know it's helped me immensely over the last few months in
understanding what I've been doing for the last several years.  ;-)

Steve
-- 
Stephen G. Traugott 
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org   
http://www.stevegt.com

From stevegt@TerraLuna.Org Tue Apr 23 20:15:14 2002
Date: Tue, 23 Apr 2002 20:15:14 -0700
From: Steve Traugott <stevegt@TerraLuna.Org>
To: Tom Cavin <tec@ai.mit.edu>
Cc: infrastructures@terraluna.org
Subject: Re: [Infrastructures] Re: The Turing Synthesis and Automated Administration
Message-ID: <20020423201514.A23959@scramjet.TerraLuna.Org>
References: <200204211901.MAA01482@roton.terraluna.org> <Pine.GSO.4.33.0204211659430.12245-100000@jumpgate.scorec.rpi.edu> <15556.37718.823408.23363@lap1-wccf.mit.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <15556.37718.823408.23363@lap1-wccf.mit.edu>; from tec@ai.mit.edu on Mon, Apr 22, 2002 at 06:48:54PM -0400
Status: RO
Content-Length: 8165

On Mon, Apr 22, 2002 at 06:48:54PM -0400, Tom Cavin wrote:
> I've been following this discussion (and reading the papers from LISA XV)
> and I think I have a different perspective that may prove useful in these
> discussions.  Steve and Joel's original infrastructures paper (LISA XII)
> deals with trading room systems in an environment where the systems and
> needs are both well defined and centrally controlled.  At the other
> extreme, Mark Burgess is in an academic environment where the systems and
> needs are not well defined or are fairly volitile and where central control
> is very limited.
> 
> These two different environments have lead to two different approaches to
> infrastructures.

Exactly, Tom.

The job of academic environments is to teach and do research.  But
academic computing environments rarely try to intentionally replicate
the operating procedures needed for mission-critical business
reliability.  A student's time isn't considered as valuable as, say, a
trader's.  So student administrators are inadvertently taught to treat
users poorly, and to create and administer ad-hoc systems with lower
reliability and higher labor requirements than what a business would
normally expect.

Academic-based research focuses on fixing the inherent security and
reliability problems of these same ad-hoc, indeterminate environments.
As a result, university research efforts tend to try to solve the
problems of university computing, not those of mission-critical
business environments.

This extends beyond universities, to government research labs, due to
their academic roots.  My favorite recent example is when a local lab
performed extensive planning and analysis for the upgrade of a
heavily-used login server.  This planning incorporated the idea of a
three-day shutdown during the upgrade.  Horrified on general
principle, I proposed some alternatives, but didn't expect to make a
dent.  The extended outage took place as planned.  This sort of
upgrade path would be unacceptable in a mission-critical business
environment.  The interns in that lab learn very bad habits.  These
habits will cut into their quality of life -- they simply aren't
gaining the tools they need to be more effective.

And so the systems administrators produced by our educational
institutions must be retaught when they land in a business
environment.  I've long recognized that the "best" administrators
often have no formal C.S. training at all -- they weren't formally
taught ad-hoc habits.

There is no strong feedback loop in place to correct this mismatch
between education and workplace.  For there to be one, academic
computing would need to be more open to those of us on the receiving
end of their efforts.

A common response I get when I mention mission-critical reliability is
"we can't afford that", as if I were talking about *spending* money.
I'm talking about saving it.  A well-managed computer costs a certain
amount of money, and gives you a certain return on that investment.
If you buy the same computer, and then manage it badly, it will cost
you more in labor and provide a lower ROI.  Which is cheaper?

Sure, expensive hardware solutions like HA and SAN will help you
squeeze out a few more hours of uptime per year.  But in most cases,
procedural changes will avoid most outages.

> Needless to say, the tools needed to manage these two cases are likely to
> be rather different.  While both tools are designed to manage the
> infrastructure, the goal of Isconf is to take systems from one known state
> to another in an orderly fashion, and the goal of Cfengine is to get a
> system to a partially known state from an unknown state.

Yes.  

And "partially known" is not a comfortable foundation for business
operations.  There's enough uncertainty in venture financing, product
development, marketing, and staffing -- why add to it by messing with
the reliability and predictability of the underlying infrastructure?  

Years after launch is way too late to try to get a grip on a business
-- this is true whether we're talking about infrastructure controls or
financial controls.  

> It is also the case that much of the published research since Steve and
> Joel's LISA XII paper has been done in the academic world.  

I suspect it will continue to be, until we ourselves do something
about it.  The fact is, most professionals just don't have the time or
energy to write down what they're doing, and most employers will
neither give permission to publish, nor pay the author to do it on
company time.

This heavily weights the ratio of academic versus professional authors
who are able to publish at LISA, for instance.

> The whole concept of an infrastructure is a really sweet idea, and that
> first paper was a real eye-opener for many folks, especially the academic
> sys-admins and CS faculty.  The fact that there is a real, practical
> solution to the problems of maintaining large groups of systems in a
> scalable manner in one environment seems to have acted as a challenge to
> people to extend this solution to other environments.  No one in academia
> seems to have looked at the problem since MIT's Project Athena (which gave
> us X windows, Hesiod - a precursor to LDAP, Zephyr - a precursor to instant
> messaging, simple unattended installation, and automatic unattended updates
> of systems) in the late 1980s and early 1990s.  Athena, though still used
> at MIT, never caught on elsewhere.  I think the reason for this is that
> there is a steep learning curve to get everything running all at once.  The
> infrastructures paper introduced something that is less ambitious than
> Athena, and more comprehensible.

It's interesting you should mention Athena -- it was one of my own
inspirations for the work we described in that paper.  I agree; Athena
was a wonderful exception.

In our own lab we played with parts of Athena, like Hesiod or Zephyr,
or of CMU's work, like SUP or AFS.  We knew we couldn't roll most of
these out to production, due to conflicting legacy and/or user
requirements.  SUP is the only one of these which actually made it
into production, and we used it heavily (it's still a better technical
fit than anon rsync.)

We were intentionally making the time to do masters-equivalent
research, exploring existing art and figuring out why it worked and
why it didn't, and then building on what we'd learned.  We *made* the
opportunity to do this; on the org chart were were just a small admin
group.  It meant a lot of late nights.  For several months, I carried
a whole bundle of past LISA papers around, reading them on the train.

The whole goal was to implement fully automated administration using
open standards with as little complexity as possible.  In hindsight, I
think that we were lucky to hit on the makefile thing early on -- and
I probably wouldn't have tried that if I hadn't just left USL.  The
UNIX source code makes extensive use of VPATH in makefiles, to handle
different hardware platforms.  I thought we could do something similar
at the admin layer.  That never worked out, but the makefiles
themselves stayed and grew.

				* * *

I think there's a larger story here, something more fundamental
missing from systems administrator education: a business foundation.
In today's businesses, the systems administrators daily make decisions
which have a fundamental impact on the ability of the business to
function.  As clerical jobs have been automated and migrated into
systems, systems administrators have become defacto business
administrators, making sure that business information flows from
source to sink.  They usually do this in isolation from the executive
team, and often don't have the background or authority to incorporate
business requirements into these decisions.  This translates into less
effective businesses and limited career paths for IT professionals --
a CIO rarely makes CEO.

Again, fixing this for the whole industry is going to require
fundamental academic change, starting way down at the undergrad level.

Steve
-- 
Stephen G. Traugott 
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org   
http://www.stevegt.com

From stevegt@TerraLuna.Org Tue Apr 23 22:41:35 2002
Date: Tue, 23 Apr 2002 22:41:35 -0700
From: Steve Traugott <stevegt@TerraLuna.Org>
To: Tom Cavin <tec@ai.mit.edu>
Cc: infrastructures@terraluna.org
Subject: Re: [Infrastructures] Re: The Turing Synthesis and Automated Administration
Message-ID: <20020423224135.B23959@scramjet.TerraLuna.Org>
References: <200204211901.MAA01482@roton.terraluna.org> <Pine.GSO.4.33.0204211659430.12245-100000@jumpgate.scorec.rpi.edu> <15556.37718.823408.23363@lap1-wccf.mit.edu> <20020423201514.A23959@scramjet.TerraLuna.Org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20020423201514.A23959@scramjet.TerraLuna.Org>; from stevegt@TerraLuna.Org on Tue, Apr 23, 2002 at 08:15:14PM -0700
Status: RO
Content-Length: 2190


On Tue, Apr 23, 2002 at 08:15:14PM -0700, Steve Traugott wrote:
> On Mon, Apr 22, 2002 at 06:48:54PM -0400, Tom Cavin wrote:
> > I've been following this discussion (and reading the papers from LISA XV)
> > and I think I have a different perspective that may prove useful in these
> > discussions.  Steve and Joel's original infrastructures paper (LISA XII)
> > deals with trading room systems in an environment where the systems and
> > needs are both well defined and centrally controlled.  At the other
> > extreme, Mark Burgess is in an academic environment where the systems and
> > needs are not well defined or are fairly volitile and where central control
> > is very limited.
> > 
> > These two different environments have lead to two different approaches to
> > infrastructures.
> 
> Exactly, Tom.
> 
> The job of academic environments is to teach and do research.  But
> academic computing environments rarely try to intentionally replicate
> the operating procedures needed for mission-critical business
> reliability.  

Just came back from dinner.  Joyce looked at this previous message and
said that she thinks it's not too bad, but that a lot of people are
still going to disagree with me.  I admit that a lot of it is worded
strongly.  Let me try again:

- Tom's original, very-well-written point was that business and
  academia are very different environments and, as such, need to be
  managed differently, with different toolsets and procedures.  I
  agree.  

- But after reading Tom's message, I had to ask myself "Why?  Why are
  they different?"  
  
- Should they be the same?  Should business environments remain
  ad-hoc, like academia, the way they have been for years?  Or should
  academic computing catch up to what we've started doing in business,
  and become more structured, to better represent the environment most
  students will graduate into?

- Or should the two remain different, in order to each serve their own
  needs more completely?  If so, how do we educate systems
  administrators for business environments?

Steve
-- 
Stephen G. Traugott 
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org   
http://www.stevegt.com

From stevegt@TerraLuna.Org Fri Apr 19 18:11:33 2002
Date: Fri, 19 Apr 2002 18:11:33 -0700
From: Steve Traugott <stevegt@TerraLuna.Org>
To: Jon Stearley <jrstear@sandia.gov>
Cc: "Michael J. Carter" <mcarter@lanl.gov>, infrastructures@terraluna.org,
	luke@madstop.com, labrown@nc.rr.com
Subject: Re: [Infrastructures] The Turing Synthesis and Automated Administration
Message-ID: <20020419181133.A22600@scramjet.TerraLuna.Org>
References: <20020419103809.A4927@scramjet.TerraLuna.Org> <20020419184634.GD4514@sandia.gov> <1019242735.32636.69.camel@guglielmo> <20020419201348.GF4514@sandia.gov>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20020419201348.GF4514@sandia.gov>; from jrstear@sandia.gov on Fri, Apr 19, 2002 at 02:13:48PM -0600
Status: RO
Content-Length: 5326

Hi Jon!

On Fri, Apr 19, 2002 at 02:13:48PM -0600, Jon Stearley wrote:
> > be available, who decides when it needs to be used?
> > 
> > I don't see how a self-administering host can know that it has deviated
> > from the norm (presumably due to an "order error") and needs to start at
> > square one.
> > 
> > If the admin has to push reset, then the host isn't self-administering.
> 
> absolutely true.  but is fully/truely/super-duper self-administering
> infrastructures really the goal?  there are precious few (*if any*)
> human creations for which periodic human resets are not necessary and
> accepted practice.  steve's points (and the whole discussion) is
> extremely valueable, don't get me wrong.  industry chooses something
> along the lines of maximum effectiveness/(cost/benfit), that's all.
> my 2c.

I think this is a key place where my point has been missed a few times
over the years.  Self-administering can't be done halfway -- doing
some things automatically and some things manually is the *most*
expensive way to go, because you have the overhead of the automation
as well as the indeterminate behavior of human error.  It's the worst
of both worlds.

Fully self-administering is what I do.  It's all I do.  To me, it's
not a "goal" any more -- it's instead been a practice for 8 years.
Things are getting better, but for most of that 8 years I've felt like
a 1905-era pilot being told that what he does is impossible because
'that thang can't never fly'.  ;-)

At the other extreme, I've seen some really skull-cracking analysis of
how self-administering systems *might* be done at some point in the
future.  These are important from a research perspective, and I've
gotten good ideas from them, but I've tried very hard to stay right in
the middle, always sticking with tools and techniques usable in
current production -- as a consultant, I can't get too far ahead or I
don't get paid.

As most of you know by now, all I've usually ever used is makefiles
and some wrapper scripts that run at boot -- the sysadmin equivalent
of cloth wings and piano wire.  We can work together on building
aluminum wings and jet engines, but in order for us to do that we
first need to finally agree that human flight is feasible and
desireable.

I don't rely on "if all else fails, re-install".  I can see how the
"virtual machine" idea in the bootstrapping paper may have given this
misimpression -- at one point I think I even use the phrase
"rebuilding a virtual machine".

Re-installing is a disaster-recovery response.  It's expensive in
labor, expensive in political capital unless there's actually been a
disaster, and can be horribly expensive in terms of reliability and
unscheduled downtime.  From my perspective, jumpstarting machines to
get them back into a known state is a worst-case scenario.  Once
you've crashed the airplane, you can rebuild it from scratch, but the
damage you've done to the passengers by that point is irreversible.

"Hey, I've just fubared all 2000 of the machines on the trading floor;
can someone stop the markets for a few hours while I re-install them?"
That's a situation I wanted to avoid at all costs.  ;-)

It's been 8 years, thousands of machines, and several organizations
since I started doing deterministic, ordered changes, and I can't
recall a time when I've ever re-installed the same operating system on
a managed production desktop or server, short of hardware failure.
Joel, can you think of anything?

I suspect deterministic ordering is the airfoil of self-administering
systems.  If I'm right, then purposely randomized ordering, while
fascinating, might be roughly equivalent to wing-flapping machanisms.
It took centuries for aerodynamics experimenters to figure out and
prove to each other that the flapping wasn't important, but the
airfoil was.  I think we may be in a similar situation here.

I know that to encourage things to get out of sequence is to risk
breaking production machines in such a way that users will notice.
This is exactly what we're supposed to prevent.  It's a sysadmin's
version of the hippocratic oath; "First, do no harm".

"But ordering isn't as important as long as you test your changes
before putting them in production."  Sure.  But to do that, the
production machines have to have been built *exactly* the same way
that the test environment was, or else the test is invalid.  The only
way to ensure that test and production are built the same way is to,
well, build them the same way, applying changes in the same order in
each case.

"But you can just test in production, and if the change broke
something, you can always back it out.  If the backout fails, then
just re-install".  If anyone still feels this way, then they should
re-read the above paragraphs about reliability and downtime.  Testing
in production, and relying on scheduled downtime and backout windows,
eats into uptime numbers and precludes 24x7 operation.  A global
economy has no "off hours".  I think about this every time I'm waiting
in line at the Hertz counter after arriving in Tampa at 2 a.m. on a
Friday night, right smack in the middle of the Hertz scheduled
maintenance downtime.

Does any of this make sense?

Steve
-- 
Stephen G. Traugott 
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org   
http://www.stevegt.com

From stevegt@TerraLuna.Org Sat Apr 20 10:34:56 2002
Date: Sat, 20 Apr 2002 10:34:56 -0700
From: Steve Traugott <stevegt@TerraLuna.Org>
To: Daniel Hagerty <hag@linnaean.org>
Subject: Re: [Infrastructures] The Turing Synthesis and Automated Administration
Message-ID: <20020420103456.A24714@scramjet.TerraLuna.Org>
References: <20020419103809.A4927@scramjet.TerraLuna.Org> <20020420161429.120E61BA1B@perdition.linnaean.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20020420161429.120E61BA1B@perdition.linnaean.org>; from hag@linnaean.org on Sat, Apr 20, 2002 at 12:14:28PM -0400
Status: RO
Content-Length: 3032

Hi Dan!

On Sat, Apr 20, 2002 at 12:14:28PM -0400, Daniel Hagerty wrote:
>  > *No* language or syntax can completely specify administrative actions
>  > to be applied to a machine without also specifying the complete
>  > history and order of previous changes.
> 
>     Are you trying to tell me that wasn't obvious to all of you?

It's something I've always assumed, but I was really surprised myself
to see that at least three very key people in LISA leadership hadn't
yet got that far.  I heard over and over during this past LISA that "a
sufficiently descriptive language" would solve any problem, removing
the need for strict ordering etc.  It's taken me the last few months
since then to marshal the synapses enough to make a case.

> I've had that in the back of my mind for at least two years now in
> building my local half assed infrastructure based on your ideas.  It
> was very clear that the total state of the machine is very important
> to ongoing mutation.  No, I haven't actually done work towards this, I
> just noticed it in how my infrastructure was failing.

;-)

>     As an example, if your system doesn't track it's full state, how
> does it know when a human has come along and performed an uncontrolled
> mutation?

How do/would you track full state?  I've thought about how nice it
would be to use a loadable kernel module to intercept open() calls
etc.  That's the only way I've thought of to really track *all* of the
changes made to a machine in real time.  I could do it for Linux, but
someone would have to write an equivalent module for every other
stinking version of every variant of UNIX.

That still doesn't solve the problem of what to *do* with that
information -- if someone edits /etc/hosts on a target machine, would
you propagate the new version to all other machines, or store it on
the gold server as a machine-specific variant for that host only?  Or
just blow up in their face and prevent them from editing in the first
place?

So I haven't yet tried to track out-of-band changes.  I use the rules
of "don't change things manually" and "if a human changes something
then it's a security problem and needs to be dealt with accordingly".
This gets the point across, and I've noticed people accepting this
more readily just within the last year.  In production environments
people always get the idea real quick.  I think the biggest resistance
comes from campus admins who are used to using student labor.

But I think that some kernel-level support is going to be needed to
ultimately solve the problem.

>     I didn't make last lisa because of funding issues, and this is now
> the second thing I've hard that I'm very sorry I missed.

Last year was the first time I was there since 1998 -- it's changed
for the better, but that just means that the debate has been elevated
to new levels.  At least people are finally over the push vs.
pull thing.  ;-)

Steve
-- 
Stephen G. Traugott 
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org   
http://www.stevegt.com

From stevegt@TerraLuna.Org Sat Apr 20 14:44:34 2002
Date: Sat, 20 Apr 2002 14:44:34 -0700
From: Steve Traugott <stevegt@TerraLuna.Org>
To: Dylan Northrup <docx@io.com>
Cc: Jon Stearley <jrstear@sandia.gov>,
	"Michael J. Carter" <mcarter@lanl.gov>,
	infrastructures@TerraLuna.Org
Subject: Re: [Infrastructures] The Turing Synthesis and Automated Administration
Message-ID: <20020420144434.A6759@scramjet.TerraLuna.Org>
References: <20020419181133.A22600@scramjet.TerraLuna.Org> <Pine.LNX.4.44.0204192117280.10113-100000@eris.io.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.LNX.4.44.0204192117280.10113-100000@eris.io.com>; from docx@io.com on Fri, Apr 19, 2002 at 09:29:23PM -0500
Status: RO
Content-Length: 6049

On Fri, Apr 19, 2002 at 09:29:23PM -0500, Dylan Northrup wrote:
> An infinite number of monkeys in the guise of Steve Traugott wrote:
> 
> :=I think this is a key place where my point has been missed a few times
> :=over the years.  Self-administering can't be done halfway -- doing
> :=some things automatically and some things manually is the *most*
> :=expensive way to go, because you have the overhead of the automation
> :=as well as the indeterminate behavior of human error.  It's the worst
> :=of both worlds.
> 
> I disagree.  It's a law of diminishing returns.  The closer you get to fully
> automating your system, the harder it becomes to completely automate away
> the rest of it.  It's the computer version of Xeno's Paradox where you
> really *can* only get half the way through what's left.

We have to be using different meanings of the term "automate".  

Automated systems administration is very straightforward.  There is
only one way to change the contents of disk or RAM in a running UNIX
machine -- the syscall interface.  The task of automated
administration is simply to make sure that each machine's kernel gets
the right system calls, in the right order, to make it be the machine
you want it to be.  

There are no system calls used in the course of administration which
can't be issued from a shell script, a perl script, or (in very rare
cases) a piece of C code.  This makes the process of putting
automation in place itself very straightforward.  In my case, rather
than typing a command at a shell prompt, I put it into a makefile
stanza.  

If it's a complex series of commands, with conditionals and loops
etc., then it goes into a script, the script goes into CVS, and the
script name goes into the makefile.

Putting the CVS and makefile framework in place is itself also very
straightforward, as we just proved at Caterpillar earlier this year --
Luke took 6 hours to go from scratch to a running installation of
isconf, with me looking over his shoulder.  Luke is an exceptionally
smart guy, (Hi Luke!)  ;-P   But I think just about anyone on this
list should be able to get similar results.  

I was there for a week; the first 4 days were spent on political
buy-in, only the last on technology.  This is important -- the only
major hurdles I have ever run into for the last few years are
political rather than technological.  The major improvement I've had
to make in myself is in learning how to show both management and
sysadmins that this stuff can be easy.

People look for ways to prove the viewpoint they hold.  This can
happen unconsciously.  They may want to automate, but their own
perception of the difficulty can block them, even to the point of
convincing their own management that it's not worth trying.  

And so the industry consensus is that fully automated administration
is hard.  This consensus must be broken on a per-site basis, at least
until we reach a critical mass of automated sites.

In my experience, the more tasks you automate, the easier it gets to
automate the rest.  

> Also, if you really *can* automate *everything* then you're not solving
> interesting problems, or your problems aren't evolving, or your customers
> are very boring and completely non-standard indeed.  I find that once you've
> conquered/automated a particular problem, there's always one more problem to
> solve . . . a wrinkle I hadn't thought of or a new requirement I didn't know
> about.  You can never fully automate a sufficiently complex system. . .
> especially when users are involved ;-)

I separate problem-solving and corrective action from each other.

Problem solving is an intelligence function.  This is the job of
humans.

Corrective action is a manipulative function.  This is the job of
code.

A systems administrator solves a problem and implements corrective
action via a shell prompt, on one or more machines, making the
corrective action more or less non-repeatable.

An infrastructure architect solves a problem and implements corrective
action in repeatable code, which then executes on all applicable
machines.  Execution is within a control framework (like make) and
"applicable" is defined by configuration files within that framework
(hosts.conf in isconf version 2i).

> Now, this isn't to say you shouldn't automate as much as possible.  Fix a
> problem once and either solve the problem completely or automate the work
> around.  Your toolkit should always be expanding.  But the tools won't do
> the job without proper guidance from the mechanic.

ISconf hasn't needed to expand very much over the last 8 years, unless
you count the makefile stanzas themselves.  It *has* been rewritten a
few times though, as I got better at understanding what it does.  

I jotted some notes on my HP palmtop on the train into NYC in early
1994 and rolled a makefile and 'configure' script out to a few hundred
machines globally that same year, thinking "this thing's a hack but
it'll help me get a handle on the chaos".  

That "hack" worked real well compared to everything else I've tried
before and since, and it's taken me a long time to figure out exactly
why.  I think I only really grokked it in full during the last year.

It's funny -- I had looked at cfengine in 1994, and almost used it
instead.  It wasn't until years later, in 1998, that I *tried* to use
cfengine as a replacement for isconf and realized that the two solve
orthogonal problems and aren't interchangeable with each other.

The only major addition I've needed to add so far is some perl
networking code to solve the "barrier" problem of synchronizing
administrative events on two or more machines at a time, such as when
building nodes in an HA cluster.

But the core mechanism is still just plain old, simple make.  The only
innovation there has been to use GNU make, to get conditionals into
the makefiles themselves.  The Perl rewrite might use Make.pm or a
derivative.

Steve
-- 
Stephen G. Traugott 
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org   
http://www.stevegt.com

From hag@linnaean.org Sat Apr 20 11:57:44 2002
Received: from perdition.linnaean.org (h00d0b71b83ad.ne.client2.attbi.com [65.96.132.240])
	by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id LAA25256
	for <stevegt@TerraLuna.Org>; Sat, 20 Apr 2002 11:57:44 -0700
Received: by perdition.linnaean.org (Postfix, from userid 31013)
	id 7A2931BA1E; Sat, 20 Apr 2002 14:57:47 -0400 (EDT)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: Steve Traugott <stevegt@TerraLuna.Org>
Subject: Re: [Infrastructures] The Turing Synthesis and Automated Administration
In-Reply-To: <20020420103456.A24714@scramjet.TerraLuna.Org>
References: <20020419103809.A4927@scramjet.TerraLuna.Org>
	<20020420161429.120E61BA1B@perdition.linnaean.org>
	<20020420103456.A24714@scramjet.TerraLuna.Org>
X-Mailer: VM 6.34 under Emacs 20.7.1
Reply-To: Daniel Hagerty <hag@linnaean.org>
From: Daniel Hagerty <hag@linnaean.org>
Sender: Daniel Hagerty <hag@linnaean.org>
Date: Sat, 20 Apr 2002 14:57:47 -0400
Message-Id: <20020420185747.7A2931BA1E@perdition.linnaean.org>
Status: O
Content-Length: 3079

 > yet got that far.  I heard over and over during this past LISA that "a
 > sufficiently descriptive language" would solve any problem, removing
 > the need for strict ordering etc.  It's taken me the last few months
 > since then to marshal the synapses enough to make a case.

    The first half of the sentence makes sense.  Since strict ordering
strikes me as part of the problem, the second half does not, or
rather, ordering is part of what the language describes.

 > How do/would you track full state?  I've thought about how nice it
 > would be to use a loadable kernel module to intercept open() calls
 > etc.  That's the only way I've thought of to really track *all* of the
 > changes made to a machine in real time.  I could do it for Linux, but
 > someone would have to write an equivalent module for every other
 > stinking version of every variant of UNIX.

    As you note, it's truely intractible without instrumenting at a
pretty low level.  For the most part, I want to cheat -- the system
knows what it did and should be able to detect a mutation that it
didn't do, and declare fault.  While some things may be manageable as
a merge deal, I don't want to go there.

 > That still doesn't solve the problem of what to *do* with that
 > information -- if someone edits /etc/hosts on a target machine, would
 > you propagate the new version to all other machines, or store it on
 > the gold server as a machine-specific variant for that host only?  Or
 > just blow up in their face and prevent them from editing in the first
 > place?

    That's a policy issue.  There's a lot of room to argue about how
to do things, and what works for one situation may not work for others.

 > But I think that some kernel-level support is going to be needed to
 > ultimately solve the problem.

    I don't think it's wise to predicate everything on that.  We
should develop techniques to manage in the presence of a system
that isn't as cooperative as we like; I know oracle doesn't listen to
me when I tell them that everybody would benefit if they'd do "this".

    Remember that microsoft isn't going away anytime soon, and we have
to push their boxes around too.  I really doubt their ideas of where
to take management are fully in line with ours.

 > Last year was the first time I was there since 1998 -- it's changed
 > for the better, but that just means that the debate has been elevated
 > to new levels.  At least people are finally over the push vs.
 > pull thing.  ;-)

    No, not entirely.   Have you looked arusha?  I haven't looked in
depth, but some of the admin specified object model stuff they've done
looks very interesting.

    But at the core of it all is this "we're push vs pull neutral"
thing that just has world of hurt written all over it.  Sigh.

    Anyway, I'll catch up on the rest of infrastructures soon.  This
was a hell week of deadlines, and now that midnight friday has passed,
I can start cleaning up the damage I did to make externally reinforced
business and telco realities happen under completely unrealistic
constraints :-)


From jrstear@sandia.gov Thu Apr 25 16:15:19 2002
Received: from mm02snlnto.sandia.gov (mm02snlnto.sandia.gov [132.175.109.21])
	by roton.terraluna.org (8.9.3/8.9.3) with SMTP id QAA02519;
	Thu, 25 Apr 2002 16:15:19 -0700
Received: from 132.175.109.1 by mm02snlnto.sandia.gov with ESMTP (
 Tumbleweed MMS SMTP Relay (MMS v4.7)); Thu, 25 Apr 2002 17:14:55 -0600
X-Server-Uuid: 95b8ca9b-fe4b-44f7-8977-a6cb2d3025ff
Received: from mercy.sandia.gov (mercy.sandia.gov [134.253.242.218]) by
 sass165.sandia.gov (8.12.3/8.12.3) with ESMTP id g3PNFXP7015748; Thu,
 25 Apr 2002 17:15:33 -0600 (MDT)
Received: from jrstear by mercy.sandia.gov with local (Exim 3.33 #1 (
 Debian)) id 170sSZ-0007Xh-00; Thu, 25 Apr 2002 17:15:31 -0600
Date: Thu, 25 Apr 2002 17:15:30 -0600
To: "Steve Traugott" <stevegt@terraluna.org>
cc: infrastructures@terraluna.org, luke@madstop.com, labrown@nc.rr.com
Subject: Re: [Infrastructures] The Turing Synthesis and Automated
 Administration
Message-ID: <20020425231530.GI2481@sandia.gov>
References: <20020419103809.A4927@scramjet.TerraLuna.Org>
MIME-Version: 1.0
In-Reply-To: <20020419103809.A4927@scramjet.TerraLuna.Org>
User-Agent: Mutt/1.3.24i
From: "Jon Stearley" <jrstear@sandia.gov>
X-Filter-Version: 1.8 (sass165)
X-WSS-ID: 10D652651816631-01-01
Content-Type: text/plain; 
 charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
Status: O
Content-Length: 1063

> Self-administering hosts are in effect Turing machines; they do brain
> surgery on themselves.  They don't behave according to strict
> Von-Neumann rules, but have complex feedback loops, recursion, and
> self-modifying code.
> 
> *No* language or syntax can completely specify administrative actions
> to be applied to a machine without also specifying the complete
> history and order of previous changes.  

haven't been able to keep on the whole thread but wanted to say that
after more though, i realize that my infrastructure does indeed have a
very extensive state graph, full of dependancies and ordering.  in my
case, i chose debian which has a very extensive pkg dependancy system
which runs nightly as directed by cfengine.  cfengine also does the
relatively cosmetic stuff (selecting which packages should be
installed, their site-specific configuration, misc symlinks and other
config, cleaning and etc - i LOVE cfengine), but without apt-get doing
the brain surgery i would be _much_afraid_.  so "you go" steve, this
is an excellent topic!

-jon


From couch@eecs.tufts.edu Fri Apr 19 13:57:42 2002
Received: from lin17.eecs.tufts.edu (IDENT:postfix@lin17.EECS.Tufts.EDU [130.64.23.47])
	by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id NAA15910
	for <stevegt@terraluna.org>; Fri, 19 Apr 2002 13:57:42 -0700
Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40])
	by lin17.eecs.tufts.edu (Postfix) with ESMTP id 44D5B1FEA8
	for <stevegt@terraluna.org>; Fri, 19 Apr 2002 16:57:53 -0400 (EDT)
Received: by piano.eecs.tufts.edu (Postfix, from userid 30)
	id E4D2ECFC4; Fri, 19 Apr 2002 16:57:52 -0400 (EDT)
To: stevegt@terraluna.org
Subject: ordering
Message-Id: <20020419205752.E4D2ECFC4@piano.eecs.tufts.edu>
Date: Fri, 19 Apr 2002 16:57:52 -0400 (EDT)
From: couch@eecs.tufts.edu (Alva L. Couch)
Status: RO
X-Status: A
Content-Length: 2789

I've read http://www.infrastructures.org/papers/turing.html
at the urging of my staff who are following the infrastructures list. 

I think at the outset that I agree somewhat with the conclusions but
not with the mathematics.  There are some mistakes.
- if you have N things over all orderings, there are N! possible
  outcomes, not N^2.  The N^2 in my paper is an embedding theorem, not
  an outcome theorem.  While all N! permutations can be embedded in a
  sequence of length N^2, all N! permutations may have unique
  outcomes as sequences.  The reduction to N^2 outcomes comes from
  strong convergence properties of the N operations, something that I
  know you don't think is practical to assure. 
- One thing that you ignore is the concept of orthogonality.  Two
  systems are orthogonal if changes to one can't affect the other, and
  vice versa.  The effect of orthogonality is that operations on
  orthogonal subsystems always commute.  E.g., a*b = b*a.  The effect
  of orthogonality is to reduce the space of possible sequences.
  E.g., I think everyone would agree that the 'networking subsystem'
  is orthogonal to vi.  Changes to vi can't affect networking, and
  vice versa.  Thus the results of a change to both is the same in
  either order.
- My paper does not advocate randomization of ordering except in cases
  where that ordering exposes an unknown and otherwise unobservable
  factor in existing configuration.  Frode's paper, however, utilizes
  randomization as an obscurity mechanism.  If you look carefully
  at my assumptions, you'll see that one can prove that my lack
  of ordering is *unobservable* in the final configuration. 
  That's how strong my assumptions are. 

I have been working on the concept of ordering more than you 
realize, but can't as the chair publish a paper on it this year. 
In particular, I have designed a formalism that portrays the 
behavioral effects of configuration, then analyzed the structure
of operations on behavior as a semigroup.  This semigroup can 
be decomposed by the factor theorem into subsemigroups describing
changes to disjoint subsystems. 

This fancy math demonstratese that storing the complete ordering is
'more work than is required', though cfengine stores 'less than is
required'.  All one has to do is to serialize the changes to
orthogonal subsystems.  Since operations on orthogonal subsystems
commute, one can always express them as groups of operations on one
subsystem at a time.

You're onto something but have to be careful about the math. 

You haven't convinced me that equivalence of the behavioral effects of
randomly ordered configuration operations is undecidable.  Just
complex. You might be able to do this. 

Feel free to keep up correspondence. I think there is a paper here. 
-alva


From couch@eecs.tufts.edu Fri Apr 19 17:22:56 2002
Received: from lin16.eecs.tufts.edu (IDENT:postfix@lin16.EECS.Tufts.EDU [130.64.23.46])
	by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id RAA25355
	for <stevegt@terraluna.org>; Fri, 19 Apr 2002 17:22:56 -0700
Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40])
	by lin16.eecs.tufts.edu (Postfix) with ESMTP id 7431047C29
	for <stevegt@terraluna.org>; Fri, 19 Apr 2002 20:23:06 -0400 (EDT)
Received: by piano.eecs.tufts.edu (Postfix, from userid 30)
	id 0706BCFC4; Fri, 19 Apr 2002 20:23:05 -0400 (EDT)
To: stevegt@terraluna.org
Subject: more about turing equivalence.
Message-Id: <20020420002305.0706BCFC4@piano.eecs.tufts.edu>
Date: Fri, 19 Apr 2002 20:23:05 -0400 (EDT)
From: couch@eecs.tufts.edu (Alva L. Couch)
Status: RO
X-Status: A
Content-Length: 3047

I took a short walk and reflected on your paper some more.

a) Cfengine's configuration operations aren't turing equivalent, nor
   are the operations posited by maelstrom.  Both are "state machines"
   in the language of the Chomsky hierarchy.  Recall that there are
   three levels of complexity in the Chomsky model, provably
   different:
   i) state machines/regular expression grammars.  Most cfengine
      configurations live here.  It takes some trouble to make one
      that isn't a state machine.  Likewise, makefiles can usually be
      REDUCED to state machines by looking at what they do on each
      architecture and codifying that as state transition rules. 
   ii) pushdown automata/context-free grammars.  Right now I know of
      no configuration management tools in this category. 
   iii) turing machines/unconstrained grammars.
      The actual machines being configured live here.  This is where
      undecidability may lurk.
   This makes the CONFIGURATION model simpler than the EXECUTION
   model.  While the execution model remains turing equivalent, the
   configuration model is fundamentally simpler in structure.

b) Even with this simplification, the sequencing problem remains.  It
   is easy to construct a sequence of invocations of N state machines
   such that only the order x1..xn gives the appropriate outcome on
   the turing machine tape.  This is the same thing as what you're
   saying; without more simplifications, the outcome of every order is
   potentially different.

c) Determining an appropriate order for arbitrary state machines to
   achieve a given outcome for configuration is INTRACTIBLE but not
   UNDECIDABLE.  Hard but not impossible.  All one has to do is to try
   all the orders and choose the one that produces the appropriate
   outcome.  It becomes undecidable if other processes are
   SIMULTANEOUSLY modifying configurations, asynchronously.

d) Determining whether an appropriate order achieves a particular
   behavioral outcome remains UNDECIDABLE.  This is just another
   flavor of Church's thesis.

I think the real issue here is that VALIDATION of behavior only
applies to a specific sequence of operations.  If one changes the
sequence of operations, then one must prove that the other sequence
reproduced the same disk state, bit for bit, before one can claim that
the prior validation applies to the new state.  This is a claim of
"transitivity of validation", the same claim utilized by the Linux
Standard Base (LSB).  Now

a) In the absence of further assumptions, whether two different
   configurations exhibit the same behavior is UNDECIDABLE(can prove).

b) so whether two different orders of configuration operations exhibit
   the same behavior remains undecidable by lifting.

The most important force in the world to system
administrators is not configuration management, but configuration
validation. It is this validation that suffers from turing 
equivalence problems, not the act of configuration itself. 

I'll keep thinking about this. 

-alva


From couch@eecs.tufts.edu Thu Apr 25 06:06:56 2002
Received: from lin16.eecs.tufts.edu (IDENT:postfix@lin16.EECS.Tufts.EDU [130.64.23.46])
	by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id GAA08449
	for <stevegt@TerraLuna.Org>; Thu, 25 Apr 2002 06:06:56 -0700
Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40])
	by lin16.eecs.tufts.edu (Postfix) with ESMTP id 0CFB447BFD
	for <stevegt@TerraLuna.Org>; Thu, 25 Apr 2002 09:07:11 -0400 (EDT)
Received: by piano.eecs.tufts.edu (Postfix, from userid 30)
	id 95FC7CFC4; Thu, 25 Apr 2002 09:07:10 -0400 (EDT)
To: stevegt@TerraLuna.Org
Subject: Re: ordering
Message-Id: <20020425130710.95FC7CFC4@piano.eecs.tufts.edu>
Date: Thu, 25 Apr 2002 09:07:10 -0400 (EDT)
From: couch@eecs.tufts.edu (Alva L. Couch)
Status: RO
X-Status: A
Content-Length: 13545

> On Fri, Apr 19, 2002 at 04:57:52PM -0400, Alva L. Couch wrote:
> > I think at the outset that I agree somewhat with the conclusions but
> > not with the mathematics.  There are some mistakes.  - if you have N
> > things over all orderings, there are N! possible outcomes, not N^2.
> 
> Of course.  My mistake.  Fixed.  But I was talking about number of
> possible disk states, given an N-bit disk.  That would be 2^N possible
> disk states.  I had it backwards -- silly error.  Thank you.
> 
> On the other hand, you make a very good point -- ordering of
> operations is itself numerically interesting, because of that
> factorial.  So I'm adding that, and elaborating below.  
> 
> An HACMP cluster node build is around 121 discrete operations, for
> instance, if you only count the number of stanzas in the ISconf
> makefile.  That would be 121!, or 8.09*10^200 possible orderings.
> 
> These produce a disk content which uses 2Gb of space, and has
> 2^(8*2*10^9) possible bit states.  I don't yet know what magnitude
> that number is -- I've had a perl one-liner chewing CPU for several
> hours so far.
> 
> > - One thing that you ignore is the concept of orthogonality.  Two
> >   systems are orthogonal if changes to one can't affect the other, and
> >   vice versa.  The effect of orthogonality is that operations on
> >   orthogonal subsystems always commute.  E.g., a*b = b*a.  The effect
> >   of orthogonality is to reduce the space of possible sequences.
> >   E.g., I think everyone would agree that the 'networking subsystem'
> >   is orthogonal to vi.  Changes to vi can't affect networking, and
> >   vice versa.  Thus the results of a change to both is the same in
> >   either order.
> 
> Yes, but in practice, who decides whether, or to what degree, two
> subsystems, or operations on them, are related?  For example, the
> latest version of VIM does contain networking code, to support remote
> slaved edit sessions.  I would never have known about this possible
> dependency if I hadn't read the docs.  
> 
> I suspect the most reliable and fast way to ensure that two sequences
> produce the same outcome is via regression testing.  As the number of
> possible sequences increases, so does the required number of
> regression test iterations.

The only way to assure that two subsystems are orthogonal is by design. 
Testing cannot assure that.  This is a fundamental premise of 
software testing. 

I think that this is the ONLY way to assure that a realistic build is
reliable.  But worse, there is NO way to assure the build is
consistent across an enterprise of hosts, other than to assure
homogeneity of operations and even perhaps hardware.

> Going back to that HACMP build, a full regression test run to
> determine equivalency of all possible orderings is problematic to say
> the least; call it 1 hour for each rebuild, that's 9.24*10^196 years.
> (Had to install Math::BigInt to get that.)  ;-)

Kind of cute but perhaps irrelevant.  I claim that we know that some
orderings are equivalent, by design. The problem is that we have no 
information about order invariance except by design. If two subsystems
are designed not to interact, e.g., they're in separate boxes and 
controlled by separate drivers and configuration, then this means that the 
configuration operations for the subsystems commute with one another. 
Instead of N!, one is left with N1! * N2! * .... * NK! where 
N1,N2,...,NK are the sizes of sets of operations for the subsystems. 

> Testing an HACMP cluster actually takes several hours after the build
> itself is done.  That extra time would only hurt us by 1 additional
> order of magnitude, so I'm neglecting that. 
> 
> Looking at the HACMP makefile dependency trees, 61 of the operations
> are "leaves".  Only a few of those would be orthogonal to each other.
> But let's assume that they all are.  We know that order matters in the
> remaining 121-61 = 60 operations.  So that lowers us to 9.5*10^77
> years of regression testing to determine which sequences of those 60
> operations are equivalent.

Kind of cute but probably irrelevant. We need to know instead what the 
perturbational effects are. We know one ordering that works. What
does that tell us about other orderings that work?  Knowing that
one works is a lot of information...

> Of course, much of that regression testing could be eliminated by
> pruning "dead branches", sequences which include subsequences which
> are known to be bad.  Let's say that 99.999% can be instantly culled
> by recognition, before testing.  That still leaves 9.5*10^72 years,
> unless I've slipped a decimal somewhere.
> 
> Let's say disk, network, and CPU speeds increase by a couple orders of
> magnitude, and those 1-hour builds each take 1 second instead.  That
> brings us down to 9.5*10^69 years to process the culled order space.
> 
> But this all presupposes that we already know the entire set of
> operations to be performed.  When doing builds or upgrades this is
> never the case.  The actions needed are always discovered in the
> course of growing an infrastructure.  In the case of ISconf, this
> discovery is by the person who's writing and debugging the makefile;
> "Oh, looks like bar needs to be installed before I can install foo; I
> didn't know until now that we even needed bar.  Ok, I'll add bar to
> the makefile as a prereq for foo."  
>   
> This also presupposes that discovering a viable order of actions is
> hard, and that exhaustive exploration of possible orderings is a
> desireable way to yield an optimum outcome.  In practice, the most
> viable order is the order in which the operations were discovered and
> executed in the  first place.  At every point in this process, we
> already have an order that will work.

Yes. Discovering a valid ordering is the same as Church's thesis/the
halting problem.  The reason is that each time you create an ordering, 
you potentially program the machine differently as a result, and have
the validation problem that's hard. 
  
> And this also presupposes that the list of operations is a fixed set,
> regardless of order.  In reality, changing the order will require
> changing the membership of the set -- for example, failing to mount
> /opt before populating /opt/foo will require a later migration onto
> the 'real' /opt.  This migration is an additional operation which
> would have to be discovered and added to the set.

Yes. 
  
> This is all reaching the point from an extreme direction; the point
> being that, for building or maintaining a given host class, it's
> easiest in practice to constrain the ordering to the one known,
> working sequence, test the heck out of it, and only add new actions to
> the end of the sequence.
> 
> New host classes can then be created based on variations of the known
> good sequence.  In ISconf, new host classes get created by creating
> new makefile stanzas, naming ordered sets of existing stanzas as
> prereqs.  In this case, we benefit from known working subsequences,
> and the internal ordering of those subsequences is preserved.

Yes. This is perturbational analysis, as noted above. 
  
> > I have been working on the concept of ordering more than you 
> > realize, but can't as the chair publish a paper on it this year. 
> > In particular, I have designed a formalism that portrays the 
> > behavioral effects of configuration, then analyzed the structure
> > of operations on behavior as a semigroup.  This semigroup can 
> > be decomposed by the factor theorem into subsemigroups describing
> > changes to disjoint subsystems. 
> > 
> > This fancy math demonstratese that storing the complete ordering is
> > 'more work than is required', though cfengine stores 'less than is
> > required'.  All one has to do is to serialize the changes to
> > orthogonal subsystems.  Since operations on orthogonal subsystems
> > commute, one can always express them as groups of operations on one
> > subsystem at a time.
> 
> I think, after looking at it over the last few years, that this is a
> lot of what make is doing for ISconf.  I preach that targets should
> only get added to the end of prereq lists when editing a makefile.
> This means that changes within orthogonal subsets are always
> deterministically serialized.  

Yes.  Actually if you do this you don't even need to know anything
about orthogonality. 
  
> More importantly (I think), human error in determining orthogonality
> tends to be masked by the deterministic way that make iterates over
> multiple prereqs in the same sequence every time. This way, even
> though two actions might not be explicitly linked in the dependency
> tree, they are nevertheless serialized identically on every run,
> removing the possibility of indeterminate behavior.

Agreed. 
  
> For example, the deepest dependency graph explicitly coded in that
> HACMP makefile is 8.  There are deeper HACMP dependencies not
> explicitly coded; they are being maintained by make's deterministic
> prereq list behavior.  I suspect the actual depth is at least 30.  If
> I were to rearrange the order of any of those non-explicit prereq
> lists, experience tells me that in most cases the build will break.  

This is important to note and document. Tell us a little about what breaks
and how subtle the breaks are. 
  
> If make's behavior were indeterminate, if it randomized the order of
> things not explicitly stated as dependencies, or if I were to run make
> with a -j flag, allowing it to parallelize, experience tells me that
> the resulting outcome would be indeterminate and non-reproducable.  At
> best, the build would cleanly break.  At worst, the build would
> complete, and I would need to do extensive testing to determine
> whether the machine is actually capable of doing its job.

This is important to note and document.  I would say not that "make" 
is necessary, but that "serialization" is necessary, and I can probably
simplify your syntax with a simpler tool than make...

> My strongest driver is the need to remove the possibility of
> indeterminate behavior -- it's a business requirement.  When I'm the
> consultant flipping the switch to demonstrate to the customer what I
> can do, indeterminate behavior is the greatest risk I can think of.
> It doesn't matter how long the build or upgrade takes, as long as it's
> less than an hour or two.  It doesn't matter if the predictable,
> plodding nature of the build introduces security holes at
> predetermined times; I haven't seen that exploited yet and don't
> expect to, not for a few more years.  But I have seen unforseen
> dependencies between subsystems cause build failures; it's a common
> failure mode.

Yes.  Another driver is enterprise-wide consistency.  In a Microsoft
world your result is obvious to anyone who's tried to configure a
complex network binding with Win2k.
  
> > You haven't convinced me that equivalence of the behavioral effects of
> > randomly ordered configuration operations is undecidable.  Just
> > complex. You might be able to do this. 
> 
> I think of it from the other direction -- beyond a certain number N,
> it's infeasible to prove empirically that N configuration operations
> can be randomly ordered and produce the same behavioral outcome.  N
> appears to be as low as 8 or 9.

This is an intractibility argument, not an undecidability argument;
you don't need turing equivalence for this.  In fact, it follows even
if the machines being CONFIGURED are simple state machines!  This is
what bothered me about your math.  Turing equivalence is irrelevant.

This is the known cyclomatic complexity limit for McCabe's white-box
basis testing algorithm.  I don't think this is a coincidence.

> Genetic algorithms must have a fitness function which executes
> quickly.  If the fitness function takes too long, then the time
> required to arrive at a solution is prohibitive.  Building and testing
> a host is a 1-hour fitness function.
> 
> For example, with a DNA size of only 8 operations, executing a 1-hour
> fitness function for each combination requires 8!/24/365 = 4.6 years.

Yes, if you presume that you have "zero knowledge" of sequencing.  But
you don't have "zero knowledge". This simplifies things a lot. 
  
> What am I missing?

There's only one thing that I can see that you're missing.  Some
subsystems are orthogonal to all others and even order-invariant by
design.  This means that they can be configured in an order separate
from other orderings, or even configured in an order-invariant way.
In order to verify this one has to know that:
a) the subsystem you're configuring isn't turing equivalent (i.e.,
   you're not "programming" anything).  Most good ones are
   state-machine equivalent, where all you're doing is adding or
   removing states.
b) the subsystem is memoryless ("non-markovian").
c) the software basis for the subsystem is invariant.

A good example of this would be DHCP.  I claim that DHCP is
order-invariant provided that the software that one uses to run it
(e.g., dhcpd) doesn't change, and that it doesn't matter how many
times one enables or disables it.  The reason for this is that the
regression test has already been done.

I claim that this kind of invariance is a new kind of "modularity"
specific to system administration.  If one constructs modules of this
sort, then order doesn't matter outside the modules.  If one
constructs modules appropriately, order doesn't matter inside them,
either. "This is the way to construct systems". But current systems
aren't constructed that way. 

-alva


From couch@eecs.tufts.edu Thu Apr 25 10:01:04 2002
Received: from lin19.eecs.tufts.edu (IDENT:postfix@lin19.EECS.Tufts.EDU [130.64.23.49])
	by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id KAA02563
	for <stevegt@terraluna.org>; Thu, 25 Apr 2002 10:01:04 -0700
Received: from darkmatter.eecs.tufts.edu (darkmatter.eecs.tufts.edu [10.3.1.7])
	by lin19.eecs.tufts.edu (Postfix) with ESMTP id 723DE819E
	for <stevegt@terraluna.org>; Thu, 25 Apr 2002 13:01:16 -0400 (EDT)
Received: by darkmatter.eecs.tufts.edu (Postfix, from userid 30)
	id 6D43F2AA07; Thu, 25 Apr 2002 13:01:15 -0400 (EDT)
To: stevegt@terraluna.org
Subject: intractability of predictability of state machines
Message-Id: <20020425170115.6D43F2AA07@darkmatter.eecs.tufts.edu>
Date: Thu, 25 Apr 2002 13:01:15 -0400 (EDT)
From: couch@eecs.tufts.edu (Alva L. Couch)
Status: O
Content-Length: 2006
Lines: 38

Alas, I spoke too soon about the intractability of ordering being
invariant of whether the configured system is turing-equivalent or
just a state machine.  I'll have to think about this, but I think
there's a reduction for state machines that avoids most of the
complexity of validation.  The simplicity of this case comes from the
fact that modifications of independent substructures of a state
machine (without modifying links between them) commute....

The real issue here is "behavioral commutivity" of operations.  Two
operations X and Y are behaviorally commutative if XY and YX applied
to a machine exhibit the same final behavior. Of course, if XY and YX 
are bit-for-bit identical, they better be behaviorally identical. 

What Maelstrom did is to mathematically *construct* operations that
are behaviorally commutative.  It did this by positing a *homogeneity*
criterion:  two scripts that change a parameter always change it to
the same value.  This of course means that the behavioral result of XY
is always the same as the result of YX, if Y and X succeed in making
their modifications.

I think that to capture the interest of the LISA audience, though,
your paper has to take a different tack.  The premise that I think
that you should explore is the changing role of testing and validation
in IT infrastructure.  Point out that testing is no longer the sole
role of the vendor, but a responsibility of the system administrator.
Point out the while the vendor does "unit testing", we must do
"integration testing".  Explain why that's difficult, and give your
evidence (both anecdotal and mathematical) about the difficulty of the
problem. Point out that the existence of testing and validation as a process
forces one to look at the problem of network administration from a new 
perspective, and then explain the implications. 

I recently bought a new book "Testing IT" by Watson?.  This book
serves as a really good starting point for an essay on validation and
its implications.

-alva


From couch@eecs.tufts.edu Sun Apr 28 07:56:16 2002
Received: from lin16.eecs.tufts.edu (IDENT:postfix@lin16.EECS.Tufts.EDU [130.64.23.46])
	by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id HAA25883
	for <stevegt@TerraLuna.Org>; Sun, 28 Apr 2002 07:56:16 -0700
Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40])
	by lin16.eecs.tufts.edu (Postfix) with ESMTP id DCEF947C01
	for <stevegt@TerraLuna.Org>; Sun, 28 Apr 2002 10:56:32 -0400 (EDT)
Received: by piano.eecs.tufts.edu (Postfix, from userid 30)
	id 47E30CFC4; Sun, 28 Apr 2002 10:56:32 -0400 (EDT)
To: stevegt@TerraLuna.Org
Subject: Re: more about turing equivalence.
Message-Id: <20020428145632.47E30CFC4@piano.eecs.tufts.edu>
Date: Sun, 28 Apr 2002 10:56:32 -0400 (EDT)
From: couch@eecs.tufts.edu (Alva L. Couch)
Status: O
Content-Length: 22949
Lines: 481

> > a) Cfengine's configuration operations aren't turing equivalent, nor
> >    are the operations posited by maelstrom.  Both are "state machines"
> >    in the language of the Chomsky hierarchy.  Recall that there are
> >    three levels of complexity in the Chomsky model, provably different:
> 
> I usually see four levels -- you're lumping linear bounded automata in
> with turing machines, right?  I don't think it matters in this case,
> unless you've seen something I've missed.

I don't think it matters. I studied from an older book:)
  
> >    i) state machines/regular expression grammars.  Most cfengine
> >       configurations live here.  It takes some trouble to make one
> >       that isn't a state machine. 
> 
> Yes.  And the state which cfengine detects is based on spot checks of
> the current disk content.  I suspect that, due to those 2^N possible
> bit states of an N-bit disk, any method which uses current disk
> content as the input to a state machine is only going to be able to
> take samples.

Alas, this is not true of typical cfengine scripts. They sample
everything they change. It can be difficult to engineer a script
this way, but once engineered, there are no samples. 

The 'sampling' comes from poor interoperability between cfengine and
makefiles.  This could be eliminated by rewriting 'install' to be 
convergent! Then "make install" would not change anything that was 
already in compliance....
  
> This sampling means that cfengine is not really responding to the full
> disk state of the machine.  During any given state transition,

But this is an error in configuring the machine, not a property of
cfengine proper.  It's "poor practice" that I hope to stamp out in my
next book.

> cfengine responds to only a very small subset S of the disk state.
> Therefore, any given rule in cfengine's state machine can actually be
> triggered by any one of 2^(N-S) disk states.  Those multiple possible
> disk states cause the outcome of any conventionally-configured
> cfengine run to be non-deterministic.  (Ouch.)

True, but this is the result of stupid admins rather than stupid 
cfengine. 

I'm serious: making "install" (and rpm --install) convergent
would fix this. 

> > Likewise, makefiles can usually be
> >       REDUCED to state machines by looking at what they do on each
> >       architecture and codifying that as state transition rules. 
> 
> By intent, ISconf has always wrapped and used makefiles as
> deterministic state machines, only.  The only bits on the disk that
> the makefile reads to determine state are timestamp files explicitly
> created by the makefile itself and maintained for the life of the
> target host.  
> 
> This means that ISconf uses the full set of all prior actions,
> accumulated over the entire life of the target host, in order to
> determine current state and trigger the next state transition.  Since
> there is only one possible disk state that could be represented by a
> given set of timestamp files, we get deterministic state transitions.

We can get that in cfengine as well....it's just possible to abuse it. 
  
> An objection frequently raised is that this approach does not have the
> ability to detect and correct out-of-band changes.  In a fully
> automated environment, out-of-band changes to root-owned portions of
> disk are indicators of a security breach.  The only reliable way to
> inspect a disk for security breaches is to reboot off of an untainted
> disk.  No tool which executes within the context of a breached disk
> can be relied on to detect and correct the breach, so this objection
> applies to cfengine, make, and even manual administration methods.  

I claim that one can make isconf convergent in the cfengine sense. 
Then it will correct out-of-band changes and will still have the 
properties that it has now. I think that the combination of 
sequence determination and convergence will be the long-term
solution to all ills. 

> >    ii) pushdown automata/context-free grammars.  Right now I know of
> >       no configuration management tools in this category. 
> 
> Something's nagging at me here, having to do with LIFO ordering and a
> configuration tool with rollback capabilities etc.  It could be that
> the "save original files for uninstall" capability of many package
> managers is verging on pushdown automata, but with nowhere near an
> accurate implementation.  

Yes. 
  
> >    iii) turing machines/unconstrained grammars.
> >       The actual machines being configured live here.  This is where
> >       undecidability may lurk.
> 
> Yes.  I suspect *all* self-administered hosts are one-tape turing
> machines, regardless of administration tool.  I'll get back to that in
> a moment.

Sure, they certainly are, but what's the *minimal* model? 
That determines complexity. 

> >    This makes the CONFIGURATION model simpler than the EXECUTION
> >    model.  While the execution model remains turing equivalent, the
> >    configuration model is fundamentally simpler in structure.
> 
> Yes.  Further, I suspect a cfengine script is equivalent to the
> nonvolatile ruleset of a one-tape turing machine, while the target
> machine's disk is the turing tape -- it reads a bit, alters the tape,
> reads again, etc.  

Cfengine's base operations are simpler than that. 
They just assert state according to a template. 

> I suspect an ISconf makefile is equivalent to the ruleset of a
> two-tape turing machine.  The directory (/var/isconf/stamps) where
> ISconf keeps its stamp files is a read/write tape, while the rest of
> the disk is a write-only tape.  ISconf itself doesn't care what's out
> there on that write-only tape -- it drives blindly.  (Though the tools
> which are called by 'make' might care.  For instance, it's perfectly
> valid to call cfengine from within make.) 

This is dangerous thinking. According to the power theorems one- and two-
tape turing machines are equivalent. 
  
> So, yes, the configuration model and the target machine might fit
> different Chomsky levels when viewed in isolation from each other.  
> 
> But these analogies collapse into one-tape machines when you
> take into account the fact that any administrative action has the
> ability to modify the behavior of the administrative tool itself.

To say nothing of the n-tape to 1-tape equivalence theorem...
  
> Let's equate a one-tape turing machine's nonvolatile ruleset with the
> CPU instruction set of a conventional Von Neumann machine.  The disk
> content is the turing tape.  The operating system and subsidiary
> programs reside on the same tape.  The configuration tool itself is by
> intent a finite automata "state machine", emulated by the turing
> machine.  This state machine's ruleset and control code reside on the
> same tape as the operating system and other applications.
> 
> That configuration tool has the ability to modify the content of the
> rest of the turing tape -- that's its job.  It can also directly
> modify its own ruleset -- it can upgrade itself.

Yes, but the key to all successful administrative strategies is to 
simplify that model somewhat.  That model is provably too complex
for a human to use....
  
> The content of the rest of the turing tape determines how the turing
> machine interprets the configuration tool's ruleset.  In practical
> terms, this means that the behavior of the configuration tool is
> dependent on the operating system kernel, shared libraries, and
> subsidiary applications.  
> 
> So the configuration tool can modify its own behavior in two ways;
> directly, by replacing its own ruleset; or indirectly, by altering the
> way that ruleset is interpreted.
> 
> Therefore, a configuration tool is capable of the self-modifying
> behavior of a turing machine.  

Yes, but I know of no practical application of this. Things get really
confusing if you allow cfengine to modify itself. 
  
> The instances in which a configuration tool will modify its own
> behavior indirectly, by modifying code it depends on, are difficult to
> predict.  Doing so would require a full understanding of all parts of
> our "turing tape" which affect the behavior of the tool.  In practice
> this would mean scoping and conducting an exhaustive analysis of all
> pertinent code in the kernel, all applications and utilities which the
> configuration tool calls, as well as all shared libraries and
> configuration files these components depend upon.  
> 
> Unless this full understanding exists, a configuration tool residing
> on, executing within, and modifying the behavior of a one-tape turing
> machine *must* be assumed to behave like a turing machine, rather than
> a finite automata.

But this does not take into account that there are things that we as 
humans agree not to do in order not to get into this quandary. 
There is a pattern of practice that avoids this situation. 

> > b) Even with this simplification, the sequencing problem remains.  It
> >    is easy to construct a sequence of invocations of N state machines
> >    such that only the order x1..xn gives the appropriate outcome on
> >    the turing machine tape.  This is the same thing as what you're
> >    saying; without more simplifications, the outcome of every order is
> >    potentially different.
> 
> Yes.
> 
> Further, if the state machines are themselves executing within the
> context of the turing machine tape, and their states interpreted by
> code which they manage, then the order of invocations becomes doubly
> important:  If state machine x42 alters the ruleset of state machine
> x43, then invoking x43 first, followed by x42, will cause a very
> different outcome than x42 followed by x43.

Again, we as humans try to avoid such confusion. This kind of confusion
is possible in 'make', which is why I avoid it...
  
> Cfengine does not consider its prior actions of a month or a year ago
> to be part of the "tape" in order to avoid hitting this particular
> problem: it can *only* rely on those spot checks.  

No. This is a mis-perception. It is true that most *users* of cfengine
rely on spot checks. The more appropriate thing to do is to reverse-engineer
the total effect of a make and then express it as a series of copies. 
This is not a spot check; it's instead a complete convergent installation
of all components. The fact that people don't do this is a fault
of the practice, not of the tool. 

I've talked with Mark about making this possible. 
  
> In Mark's LISA workshop we discussed whether cfengine 1.X might be
> configured to maintain long-term state, or if cfengine 2.X might
> support this natively, in order to make it more deterministic.  I'm
> not sure how many people present understood what this debate was
> really about.  I think we're doing a pretty good job of addressing it
> here.

Yes. The verdict so far is that cfengine 1.x is dead and that cfengine 2.x
will probably track long-term state. 
  
> > c) Determining an appropriate order for arbitrary state machines to
> >    achieve a given outcome for configuration is INTRACTIBLE but not
> >    UNDECIDABLE.  Hard but not impossible.  All one has to do is to try
> >    all the orders and choose the one that produces the appropriate
> >    outcome.  
> 
> Yes, if you can afford N!.  In practice, we don't need to try very
> many of the orders at all -- the correct order is the one which worked
> when each state machine was itself discovered to work.  In other
> words, if the order P currently works, and we develop, debug, and test
> a new state machine xn which produces the desired outcome when run
> immediately after P, then the new order P' should include only P and
> xn, in that order.  
> 
> The way xn gets coded in an ISconf makefile is by adding the new state
> machine into the makefile as an independent stanza or internally
> dependent set of stanzas.  Then we add the xn target name to the tail
> end of the P prereq list, creating P'.
> 
> >    It becomes undecidable if other processes are SIMULTANEOUSLY
> >    modifying configurations, asynchronously.
> 
> Absolutely.  Manual administration Bad, 'make -j' Bad...  And using
> two different auto-administration tools on the same machine is Really
> Really Bad.  ;-)

It's just "turing equivalent". This is what's "bad" because in making
something *more* than simple state modifucation, one makes the system
unpredictable. 
  
> > d) Determining whether an appropriate order achieves a particular
> >    behavioral outcome remains UNDECIDABLE.  This is just another
> >    flavor of Church's thesis.
> 
> I haven't been able to find a reference for this.  Do you mean
> "determining in advance"?  If so then I agree.  If not, then I'd have
> to think about it longer.

I mean that even if you fix the order, you have to validate that
the resulting system works. You can't ever do anything, even in a
deterministic sequence, that you don't have to validate. 
  
> > I think the real issue here is that VALIDATION of behavior only
> > applies to a specific sequence of operations.  If one changes the
> > sequence of operations, then one must prove that the other sequence
> > reproduced the same disk state, bit for bit, before one can claim that
> > the prior validation applies to the new state.  
> 
> YES!  This is very probably a core, if not *the* core, idea that I'd
> like to get across.  Sampling disk state alone can not ensure bit for
> bit reproduction of disk contents nor resulting behavior.  But
> reproducing order of configuration changes will reproduce the same
> disk state and behavior every time.
> 
> I further suspect that "A machine's set of behaviors will *always*
> vary if the disk contents vary".  This might be proved by including the
> disk content itself as part of the output set, which I think is valid
> if the machine's programming includes introspection functions.
> Machines with different disk contents will certainly always exhibit
> different behaviors in response to 'cat /dev/*'.
> 
> > This is a claim of "transitivity of validation", the same claim
> > utilized by the Linux Standard Base (LSB).  
> 
> Hmmmm...

This is really cool.  LSB is arranged so that an application validated
with respect to one instance of the standard environment is guaranteed
to work in any other environment set up to the same standards.  One
can "debug once", "run everywhere".

> >  Now
> > 
> > a) In the absence of further assumptions, whether two different
> >    configurations exhibit the same behavior is UNDECIDABLE(can prove).
> 
> I agree, but haven't thought of a direct proof.  An indirect proof is
> what I've been building with this whole chain of thought over the last
> few months...  
> 
> Recap:
> 
> - sampling disk state alone isn't enough to determine disk state,
>   unless you sample the *whole* disk

Or unless you sample only the parts that can change, with respect 
to a base configuration. E.g., CFengine with only copy, links, and 
dirs stanzas (user modules and file editing do not obey this properly). 
  
> - ...so state machines driven by only disk samples produce
>   non-deterministic behavior

Correct, and this is bad cfengine use (and, for that matter, slink use) 
as well.  This is part of why my slink is simpler than cfengine....
  
> - ...so the equivalence of behavior of two different state machines
>   driven only by disk samples is undecidable  (that's easy)

Yes. 
  
> - configuration driven by only deterministic ordering of individual
>   state machines produces deterministic behavior

But so do operations that are strongly convergent such as convergent 
copying.  The fact that some operations in cfengine don't support this
means that these operations (e.g., file editing) should be avoided
for validation purposes. 
  
> - because the state machines reside on and execute within the context
>   of the driven disk, they can modify each other, as well as
>   themselves

Yes, but this is considered awful practice. 
  
> - ...so the behavior of individual state machines is dependent on the
>   order of invocation of all state machines

only limited to this in cases where ordering isn't commutative. 
  
> - ...so whether two different orders of configuration operations
>   exhibit the same behavior is undecideable (takes me a while to get
>   here)
>
> > b) so whether two different orders of configuration operations exhibit
> >    the same behavior remains undecidable by lifting.
> 
> Exactly.
> 
> > The most important force in the world to system administrators is
> > not configuration management, but configuration validation. 
> 
> Yes.  In this context, it's amazing how many shops lack test
> environments, but instead "try out" new administrative code in
> production.  In hindsight I can see why I've been harping on the need
> for test environments for the last several years.  Interesting that
> it's this easy (hah) to justify that need theoretically.  ;-)

I think this will be the theme of this conference. It's emergent 
in the work of several people. 
  
> > It is this validation that suffers from turing equivalence problems,
> > not the act of configuration itself. 
> 
> If you take into account the x43,x42 ordering problem I described
> above, does this change your mind about the act of configuration
> itself having a turing equivalence problem?

Nope.  I think that the turing equivalence problem comes from bad
practice rather than being an invariant.

Let me recap the discussion in a different way. 

The ultimate enemy of system administrators is unpredictability.  We
need to know that things will work the way they should.  By nature,
machines tend to behave unpredictably if administered in a haphazard
fashion. The art of system administration is to avoid this natural 
tendency of the machines and create a stable operating environment. 

This unpredictability is rooted deeply in the nature of the machines
that we administer.  All machines are Turing equivalent and the act of
administering them via any automated means has the potential to create
situations in which the machines become self-programming.  This means
that the machines can enter unforeseen states where behavior is
under-determined and not well understood.  Due to this Turing
equivalence, the correct functioning of a machine is undecidable.  It
cannot be "proven", but must be assured via rigorous testing in
realistic situations. It can never completely be tested, but instead
always contains unknown responses to particularly strange inputs. 

The three tools a system administrator has against these odds are
validation, repeatability, and invariance.  One *must* validate the
function of any configuration, and be able to *repeat* that situation
on various machines.

Throughout this process there is a knowledge that certain kinds of
state changes *don't* affect the function of the machine.  E.g., we
know that nothing in user directories affects the operating system.
These are the "invariances".  Without them, a system is never
validated.  An invariance is a *theoretical* assertion (or axiom) that
certain kinds of changes can't possibly create changes in behavior.
These invariances can be as simple as "comments in inetd.conf don't
affect operation" or as complex as "gcc has nothing to do with rpm".

Without a concept of invariance, there is no possibility of
validation.  In particular, subsystems that change constantly can't be
validated in any one state.  The only way that a validation can occur
is if these changing components are considered *theoretically* as
invariants.

All successful strategies for system maintenance are *practices* that
obey the principles of validation and repeatability with respect to
accepted invariances or axioms.  In other words, we don't restore the
user disk when something breaks in the operating system.

There are vast differences in what people are comfortable in calling
"invariances" or accepting as "axioms of decoupling".  This depends
upon each administrator's level of trust in specific services.  E.g.,
I "trust inetd" in the sense that I theoretically posit that "a
commented line does nothing".  So if I comment out tftp, I expect the
service to go away.  This is a kind of invariance:  a level of trust
in the theoretical correctness of a subsystem.  At the same time, long
experience has taught me not to trust linux dhcpd.  It doesn't always
do what it says, and is frightfully dependent upon its environment and
the ethernet cards to which it talks.

Where we differ is in how far we're willing to trust the axiomatic
behavior of subsystems in validating systems.  Inetd could be buggy
and it's conceivable that a commented line could get erroniously
interpreted in a buggy inetd.  In your system, everything is subject
to validation.  In mine, I *presume* that inetd is working properly if
its configuration has been validated.  So that
a) minor bit deviations in /etc/inetd.conf (that are considered
   invariant) can be ignored.  In particular, I posit that
   - "comments are comments".
   - "order of lines doesn't matter".
b) minor bit deviations in /usr/sbin/inetd are *not* ignorable.  If
   these occur, I've probably been hacked.
What you're saying is that in your validation model, you can't afford
to ignore bit deviations in /etc/inetd.conf.  That's fine, but a bit
of an extreme view.

Validation of complex systems is inherently difficult.  We simplify
the inherent complexity of systems by three methods:
a) axiomatic assertions.  Certain things can't happen and need not be
   tested.
b) transitive validation claims:  this works in this other known
   situation, and I know that what it needs is provided now, so I know
   it'll work here.  Thus validating preconditions for a subsystem is
   sufficient to validate the subsystem.
I think that what you're doing is asserting a transitive claim.  You
say that if your system works at state 5, and you make serial changes
after state 5 that result in state 8, then only the subsystems changed
between states 5 and 8 need be validated.  This is true.

But let's look at the other side of that.  Suppose that we decompose
ISConf operations into the file content assertions that they imply,
and code that as a cfengine script.  Let's utilize only pure
convergent operations (copy, links, dirs) and avoid file editing,
instead pumping images into place.  Then
a) order of operations doesn't matter, by design (but remember, we've
   abstracted things so that this is true by looking at results rather
   than incremental process.
b) if we change this, we have to change the whole process to a new
   convergent state.  We can't just tack things onto the end.
c) the transitive validation property is that if we change things,
   only subsystems that use files that change have to be validated.
   This can be a lot of files, but it can be very few. 

This is the way cfengine is supposed to be used. Current users don't
use it that way, and "get what they deserve". 

-alva


From couch@eecs.tufts.edu Sun Apr 28 17:32:33 2002
Received: from lin16.eecs.tufts.edu (IDENT:postfix@lin16.EECS.Tufts.EDU [130.64.23.46])
	by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id RAA32528
	for <stevegt@TerraLuna.Org>; Sun, 28 Apr 2002 17:32:33 -0700
Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40])
	by lin16.eecs.tufts.edu (Postfix) with ESMTP id 0DFCE47BBE
	for <stevegt@TerraLuna.Org>; Sun, 28 Apr 2002 20:32:50 -0400 (EDT)
Received: by piano.eecs.tufts.edu (Postfix, from userid 30)
	id 8B5D8CFC4; Sun, 28 Apr 2002 20:32:49 -0400 (EDT)
To: stevegt@TerraLuna.Org
Subject: Re: ordering
Message-Id: <20020429003249.8B5D8CFC4@piano.eecs.tufts.edu>
Date: Sun, 28 Apr 2002 20:32:49 -0400 (EDT)
From: couch@eecs.tufts.edu (Alva L. Couch)
Status: O
Content-Length: 7572
Lines: 163

> On Thu, Apr 25, 2002 at 09:07:10AM -0400, Alva L. Couch wrote:
> > The only way to assure that two subsystems are orthogonal is by design. 
> > Testing cannot assure that.  This is a fundamental premise of 
> > software testing. 
> > 
> > I think that this is the ONLY way to assure that a realistic build is
> > reliable.  But worse, there is NO way to assure the build is
> > consistent across an enterprise of hosts, other than to assure
> > homogeneity of operations and even perhaps hardware.
> 
> Enacting all administrative change via serialized, tested and
> reproducable sequences of operations constitutes homogeneity of
> administrative operations.  This will ensure consistent builds across
> an enterprise.  

This is consistency, not homogeneity, according to my definition.
Homogeneity is a property of independent processes that must agree.
Arusha does this by taking all data from a common repository.
Consistency is weaker; the last process wins and the last process is
always the same one.  So the winner is the same.  There are no races
in a system of homogeneous configuration actions.

> Restating that, "if we do the same things in the same order to
> identical machines, then they will remain identical in behavior".
> Machines which aren't identical, or which by intent need different
> builds, are created by variations of the tested base sequence.

Yes. This makes sense. 
  
> This allows us to ignore orthogonality, rather than needing to
> determine whether the subsystem developer designed for it.

I prefer to exploit any orthogonality I can find and ignore any
that doesn't stare me in the face:)
  
> Commutability and orthogonality are difficult for ordinary sysadmins
> to prove or disprove in the field.  For a sysadmin to use a tool which
> requires them to make these determinations, they must understand
> correct use of the tool, the design of the subsystems they're
> configuring, as well as the desired configuration result, and how to
> test for it to make sure they succeeded.

Agreed. We either use them when they are gifts, and ignore them
otherwise, or we have to document such properties. 
  
> For a sysadmin to use a tool which ensures strict serialization of
> sequences of operations, they only need to understand correct use,
> desired result, and how to test for it.  The "correct use" part is the
> only additional thing they need to learn beyond what they would
> normally do when using manual methods.  This is trainable.  In
> practice, I've been able to orient sysadmins in makefile and CVS usage
> in as little as a day, on-site.

There is one problem with a serialized system, and I don't refer to
the breakin problem.  It can't be reconfigured 'live' because
configuration instructions may change things necessary for proper
system operation.  It can't even necessarily be repeated twice to
identical effect. How do you assure repeatability in ISconf? 
It has to be some kind of consistency criterion....

> If we require that sysadmins understand subsystem design well enough
> to determine orthogonality, then we're expecting them to become more
> of an engineer in order to use the tool.  This is less trainable.  I
> would not be able to teach someone how to do this in a limited period
> of time, on-site, in the middle of a rollout.

Agreed.  This is something that we either "document" or "forget".
  
> Based on past experience I suspect any other tool will also have to
> support:
> 
> - preserving serialization of reusable subsequences
> 
> - implicit assertion test of zero return codes for external commands
> 
> - implicit reproducable serialization of operations not explicitly
>   serialized
> 
> - semaphores which child processes can use to signal async events to
>   parent processes (like 'rebuild kernel and reboot when make is done')
> 
> - a hierarchical grouping of host attributes, so that host function
>   can be determined quickly by eye
> 
> - re-usable ordered sets of grouped subsequences, so that new hosts
>   can be created by prototype rather than by class (This is important.
>   True class-based configuration tools don't seem to work in the
>   field, while protoype-based systems consistently do.  I don't think
>   I'm able to explain why yet; it may be nothing more than "this is
>   the way sysadmins think", or there may be a more theoretical basis.
>   I suspect, again, that it has to do with testing.)
> 
> These are things which ISconf/make already does.  In addition, over
> the years those of us using ISconf have concluded that we also need: 
> 
> - postrequisites (like 'do foo after I'm done')
> 
> - decentralized state machine specification (rather than a monolithic
>   makefile or script)
> 
> - lexically bound syntax (I want to be able to specify each operation
>   in the language most suitable for that operation)
> 
> - separation of action code from site-specific configuration data
> 
> - decentralized editing of state machine specifications (no need to
>   log into gold server to update makefile)
> 
> - state machine and file transport language integrated, as in
>   cfengine, to remove need for NFS mounts to get packages on demand
> 
> - embedded documentation, like POD; including dynamically generated
>   runbooks and training checklists generated per host class (the
>   latter actually looks easy -- name a person to be trained as a
>   "target host" and "build" them, checking off training actions as
>   they complete)

This goes into the paper for sure. 

> > > > You haven't convinced me that equivalence of the behavioral effects of
> > > > randomly ordered configuration operations is undecidable.  Just
> > > > complex. You might be able to do this. 
> > > 
> > > I think of it from the other direction -- beyond a certain number N,
> > > it's infeasible to prove empirically that N configuration operations
> > > can be randomly ordered and produce the same behavioral outcome.  N
> > > appears to be as low as 8 or 9.
> > 
> > This is an intractibility argument, not an undecidability argument;
> > you don't need turing equivalence for this.  In fact, it follows even
> > if the machines being CONFIGURED are simple state machines!  This is
> > what bothered me about your math.  Turing equivalence is irrelevant.
> 
> Yes, turing equivalence is irrelevant for this math.  It is proving
> itself to be a useful model for illustrating the behaviors of a
> self-administering system though.  I don't think we could have had a
> conversation this productive without that foundation.

Agreed. 
  
> I've gotten similar feedback from other people over the past few
> months.  Folks seem to need to work with this abstraction; previously,
> these discussions have always bogged down in behaviors of individual
> packages and operating systems.  Falling back to Turing's theoretical
> model seems to give us more neutral material to work with.

It's a bit dangerous though, because most people don't understand it
well enough to reason from it. There are constant traps and pitfalls
into which one can fall via imprecise thinking. 
  
> I know it's helped me immensely over the last few months in
> understanding what I've been doing for the last several years.  ;-)

Sure.  It may not be the best way to "sell" things, though.  Our
current audience is going to puke...

I think you've got some really important ideas here, that are crucial
to turning network administration from an "art" into a "profession".
My thinking is surprisingly similar.  Check out my students' abstracts
when they're done (at the last minute:).

-alva


From lisa02chair@usenix.org Fri May 24 09:53:51 2002
Received: from lin17.eecs.tufts.edu (IDENT:postfix@lin17.EECS.Tufts.EDU [130.64.23.47])
	by roton.TerraLuna.Org (8.11.6/8.9.3) with ESMTP id g4OGrpv14069
	for <stevegt@terraluna.org>; Fri, 24 May 2002 09:53:51 -0700
Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40])
	by lin17.eecs.tufts.edu (Postfix) with ESMTP
	id 336151FE83; Fri, 24 May 2002 12:53:50 -0400 (EDT)
Received: by piano.eecs.tufts.edu (Postfix, from userid 418)
	id DED87CFD7; Fri, 24 May 2002 12:52:50 -0400 (EDT)
To: stevegt@terraluna.org
From: lisa02chair@usenix.org
Subject: LISA submission 050
Cc: paul@dcs.ed.ac.uk, couch@eecs.tufts.edu
Message-Id: <20020524165250.DED87CFD7@piano.eecs.tufts.edu>
Date: Fri, 24 May 2002 12:52:50 -0400 (EDT)
Status: RO
Content-Length: 19022
Lines: 394

Dear LISA'02 author, 

It is my great pleasure as Program Chair of LISA'02 to inform you 
that your paper:

    050: Why Order Matters: Turing Equivalence in Automated Systems Administration

has been accepted as a refereed paper for LISA'02, which will be held
from November 3-8 in Philadelphia, PA, USA.  Congratulations!

At least one author must attend the conference to present the paper. 

Your talk has been allocated 30 minutes (a 'long talk') at the
conference.  Please plan on presenting your paper in 22-25 minutes and
save the remaining time for questions and comments from the audience.
Your talk will tentatively occur during the session at

    Wed 2002/11/06 4:00 - 5:30


The chair for your session is: 

    Paul Anderson (paul@dcs.ed.ac.uk)

Shepherds for your paper include: 

   Paul Anderson (paul@dcs.ed.ac.uk)
   Alva Couch (couch@eecs.tufts.edu)

All of these people will receive a copy of this message and will
contact you directly soon after you receive this message.

Important dates:

- July 12, 2002:  near-final draft of paper due to shepherds for comments. 
- Aug 1, 2002:  (optional) Last date to submit your paper for copy editing.
- Aug 12, 2002:  final draft due to typesetter; only minor corrections
  are permitted after this time.
- Nov 3-8, 2002: LISA'02!

Important notes:

- All paper acceptances are contingent upon addressing any
  deficiencies that are discussed in the reviewer comments listed
  below.  Please take a moment to read these comments before beginning
  to write or revise your paper.  Reviews do not identify the reviewer
  unless specifically requested by the reviewer.

- Papers are due to the typesetter in final form by Aug 12, 2002.  You
  will be mailed instructions on how to submit your paper under
  separate cover, from our typesetter Rob Kolstad (kolstad@ace.delos.com).
  *All* papers (long talk and short talk) are subject to the same
  length restrictions (16 sides in the proceedings).

- At least one author must attend the conference to present the paper.
  Please register early for the conference.  One author of each
  accepted paper is entitled to complementary registration for the
  technical program.  To keep conference expenses down and fees low,
  we respectfully request that refereed paper speakers first ask their
  employers to pay conference registration fees.  The USENIX
  Conference Department will contact you regarding complimentary
  registration in cases where your employer will not cover
  registration costs.

- Authors who are full-time students are welcome to apply for a
  stipend to cover more costs, including travel and lodging.  Please
  see http://www.usenix.org/students/stipend.html for more information
  about the student stipend program.

- Your session chair will introduce you at the conference and manage
  the session in which you present your paper.  Time limits are
  non-negotiable.  Please provide the session chair with a brief
  biography of the speaker that can be read to introduce your paper at
  the conference.

- Shepherds stand ready to help you produce the best possible final
  paper.  Shepherds will contact you directly to discuss how they
  might best aid you in accomplishing this.  At minimum, shepherds
  will read a near-final draft of your paper and provide comments.
  Please provide a near-final draft of each paper to the shepherds and
  session chair by July 12, 2002.  Shepherds are also available to
  answer any questions you might have about writing or reviewer
  comments, and will be happy to comment upon intermediate versions of
  your paper upon request.

- At your option, you may also enlist the aid of our official LISA'02
  copy editor, Jeff Allen.  If you wish your paper to be edited
  carefully for spelling, grammar, etc, please mail an editable copy
  to lisa02copy@usenix.org before August 1, 2002.  The paper will be
  edited and sent back to you for final approval.  Mailing a paper to
  the copy editor is *not* a way to submit it for printing.

- I will contact you later with more details on your presentation for
  the conference.  The presentation hall will have a projection screen
  for a VGA laptop and an overhead projector for overhead
  transparencies.  There are no loaner laptops on site -- you must
  bring your own laptop or arrange to borrow one.  It is always wise
  to prepare overhead slides even if you plan to use PowerPoint.

- For any and all other problems, please feel free to contact me for 
  assistance (lisa02chair@usenix.org). 

We received many fine abstracts this year!
Thanks for helping to make LISA'02 a success!

Alva L. Couch
LISA'02 Program Chair
lisa02chair@usenix.org 

=======================================================================
Reviewer comments:

Title:   Why Order Matters: Turing Equivalence in Automated Systems Administration
Authors: Steve Traugott, TerraLuna, LLC; Lance Brown, National Institutes of Health
Email:   stevegt@terraluna.org

I think it's very important to define self-adminstration early in the
paper. Without a good definition, the paper lets the reader's imagination
fill in the blanks -- and that seems like a bad idea.

Many of the ideas are put forth as 'laws' that sound like absolute
truths (e.g., rule number 9 about a machine changing its disk contents
means that future changes will be made in a different way).  While we
can cite examples where this is true, I think we can also cite examples
when it isn't true.  Maybe using the word 'can' or 'might' would make
the rule a bit easier to understand.

The pivotal rules 3 and 4 in the implications for self-administration
presume that changes are not 'independent'.  I think there's probably some
benefit to be had by asserting that requiring independence is a difficult
task, fraught with peril in the real-world.

The prescription -- a starting bitmap + a bitstream of all possible changes
is easy to see as a theoretical sort of argument.  I'm a bit more concerned
about the pragmatic aspects of such an idea.  Maybe the paper should show
some of the ways that real upgrades, real software, real-world
administrative techniques can use this data.  I don't see how such a thing
would work in the framework as described.

Finally, I don't think the paper derives much 'bang' from the fundamental
notion of Turing equivalence.  That seems like a neat result that might
have some predictive properties -- maybe they can be explored?

=======================================================================

The theory of system administration is much needed but has received
little attention until very recently.  This paper adds to that theory
with a detailed discussion of the theoretical ramifications of
"machines that repair themselves." The authors point out rightly that
this aspect of system administration poses serious theoretical
questions that may be undecidable in the computer science sense.  They
contend that we can avoid some of the undecidability in this situation
by adopting a strict order of operations for configuration actions.

This is quite true.  It is not the only way, however.

Throughout the development of the theory of computer science, we have
found that difficult problems become easier with the addition of
appropriate constraints.  While it is true that unconstrained
automated processes are Turing equivalent, it is also well known that
such processes become easier to analyze when simple constraints are
imposed.  For example, limiting one's self to copying files and links
in the 'cfengine' sense makes one's configuration process equivalent to
a simple state machine rather than a turing machine.  Deterministic
ordering is *another* way to achieve this reduction for arbitrarily
complex configuration actions.  There are many others.

One must be *very* careful with language.  "Turing equivalence" is an
ideal state for ideal machines.  The "infinite" nature of such
machines makes "undecidability" possible.  This is not an attribute of
real machines (with finite memory), for which determining the
correctness of a configuration is simply "intractable."

I have serious problems with the writing style, which is choppy, 
with many one-sentence paragraphs that fail to fully develop ideas. 
Paragraphs need to be combined and developed so that they flow. 

Make sure that you include references to the many papers in the
literature that disagree with your conclusions!

=======================================================================

This paper advances the premise that the order of application is important
for configuration changes, since it is not possible to prove that two
configurations are identical if changes are applied in different orders.
This argument is used to advocate a configuration system in which all hosts
start as identical disk clones and have changes applied in identical sequences.

This is an interesting concept for anyone involved with configuration
systems, particularly in highly critical environments with large numbers of
identical machines. However, there are a number of objections to this
approach which are not addressed in the abstract, that I would like to see
covered in any final paper:

1) Machines are not completely deterministic in practice (they get external
interrupts) and even machines which have had identical configuration
changes applied, will not have identical disk images or even identical
sequences of system calls. 

2) For the purposes of system configuration we are interested in
equivalence  between certain important high-level behavioural properties,
and it may be possible to prove these, without requiring equivalence at the
bit level.

3) I would like to see some concrete examples where alternatice sequencing
has caused a problem in practice.Is this a significant problem compared to
other system configuration difficulties?

4) I would like to see how these ideas might be applied in practice (the
abstract does suggest that this is intended)

5) How can we deal with systems with very diverse and rapidly changing
configurations?

I personally found the discussion of Turing and Von Neumann machines a
little confusing, and it might be worth trying to express this in a more
concise way.


=======================================================================

This paper is an enthusiastic attempt at arguing that system modifications 
must always be ordered. However, I do not think that it succeeds in doing 
so in an effective or convincing way. There are a number of major problems 
with the abstract:

* The two halves of the paper--pages 1-top of 3, and 3-end--are 
unconnected, despite assertions to the contrary. The argument in the 
second half of the abstract--which is its true point--does not really 
depend on or even use the Turing/von Neumann concepts presented briefly in 
the first half of the abstract.

* The paper does not discuss its position in the context of the large 
amount of related work. Rather, it seems to simply take up an conversation 
which occurred last December. Keep in mind that most LISA attendees will 
have no knowledge whatsoever of this context.

* At some points, the paper makes unwarranted and often unacknowleged 
assumptions. For example, point 4 on page 3, which asserts that ABC
is always non-equivalent to ACB. Constructing a counterexample is trivial 
(just choose ABC to be independant--not hard at all). The rest of the 
argument follows from this dubious, invalid assumption. Indeed, this is 
the whole point of contention in the first place, so simply asserting that 
it is so begs the question. Logically, the argument is:

   o  We must decide between A and not A.
   o  A is obviously true.
   o  Therefore, A follows.
   o  Therefore, A's consequences follow.

I agree that if order matters and it is ignored, then bad things happen. 
But order does not always matter. Deciding when it does, the extent to 
which it is important, and the best tradeoff of complexity vs. risk of 
unforseen interactions is the hard part. The authors need to spend a lot 
more time thinking about these topics. They ought to be ready to submit a 
much better abstract next year.

I recommend rejecting this paper.

Finally, the first sentence is bizarre; I'd rethink it and revise it into 
something with a little more meat and less cuteness.

=======================================================================

Very well written, good detail and references.  (I am amused to see Turing's 
1939 work cited. :-) )  Very nicely done.


=======================================================================


This paper presents me with a dilemma. On the one hand, it addresses
an interesting problem and does so in a provocative way that I approve
of. It purports to offer a counterpoint to cfengine's recently much
written-about ideas of configuration "convergence". I would like to
see a paper that does this succeed in making its points clearly.

The problem with this abstract, as presented, is that it is full of
technical errors, misunderstandings and untrue assertions about systems.
It might be possible to salvage something of it, with significant
sherpherding, but as it stands it is just misleading and factually
incorrect.

Let me take some examples. 

"A 100k sample cannot determine the state of a 2GB disk."
This is obviously not true. A rudimentary program (e.g. rm -rf /)
takes only a few bytes to implement, but very clearly determines
the whole disk. 

More generally, the authors are assuming that every bit of a
configuration is significant. It is not. Turing machines would not
work at all unless algorithms could be coded into significantly less space
than the size of their tapes (actually their tapes are infinite, so
von Neumann machines only approximate Turing machines). Compression
and coding need to be taken into account. This is an information
theoretical limitation, nothing to do with Turing machines. The authors
do not discuss the amount of information required to describe a 
configurastion at all. This has to do with the amount of entropy in the
disk: if the disk is to be empty, it is very easy to keep it that way.
That is still a valid configuration. Clearly there is some middle
ground here. Not all bytes are independent, so it is not true that one
needs the complete history to explain a state like that resulting
from rm -rf / 

"The entire past history of changes to the machine are important...
if we expect deterministic, reliable behaviour in operation..."
The authors seem to be forgetting all of the changes made by users.
Are they talking about only a part of the computer, or all if it?
What about all of the hidden variables caused by network communications,
resulting in changes to log files and other effects etc?

"If we were to allow the order of changes to be unsequenced, we must test
each possible sequence of changes. This is an intractable problem."
This is wrong. Indeed, Couch showed how this could be done at LISA 2001,
and this is the same approach used by cfengine. The number of orderings
might be M!, but that is not the number of interest, because it neglects
the coding and the amount of information in those bits.

"..must continue to test [hosts] individually henceforth ... this is
very expensive. It is easier to avoid unsequenced changes"
First of all, the idea of a sequence assumes that there is a prior
dependency chain at work. This is only true if the operations being
performed are intertwined -- that is only true of complex operations,
and is usually an artifact of software. The idea that testing the
configuration of a host is expensive is an assertion, not backed up
with any figures. Cfengine seems to get along fine with this.
Also the assertion that it is easier to avoid that approach is just
an opinion, not a justified conclusion.

The authors miss some points which would be in their favour - such
as how ordered information within critical policy
files fits their arguments much better than arguing about Turing machines
and systems as a whole. Critical policy files can determine the entire
behaviour of the system, but they are just a tiny part of it, and
they can lead to critical dependencies. It is not so much order
that matters as the chain of dependency.

I would drop the whole Turing argument - it is a red Herring. If the
authors want to criticize cfengine's approach, they should attack it 
where it is weakest: in the issue of hidden dependencies through
unpredictable software and its relatioship to configuration files.
Unpredictable software can turn predictable intentions into chaos.


=======================================================================

This is an important paper for the advancement of theoretical sysadmin.

It unfortunately provides more problems than answers, but sometimes one 
has to walk deeper into the forest before finding ones way home.

There's a case to be made that a workshop is a better place to work on 
theoretical sysadmin, especially for papers like this with "intermediate 
results". I'm eager to see the fruit of the theoretical sysadmin movement, 
but I'm unwilling to see LISA turn into a collection of dry, inaccessible, 
mathemtical arguments. This paper threatens to take us down that road.


=======================================================================

This is a strange paper. The writing often consists of a series of
one-sentence paragraphs, and they sentences look like points of a 
manifesto.

It continues a debate from LISA �01 that this reviewer did not participate 
in, on whether or not a self-administering system needs to be 
deterministic in actions. The authors suggest this point is a touchstone 
upon which the community must agree or else little useful progress can be 
made. The crux of the argument seems to be whether you need to include the 
history of changes to a disk in a self-administering system that can 
modify anything on the disk.

It then switches to a tutorial on Turing machines, followed by the author�s
observations on Turing machine equivalence. We agree that a von Neumann
computer is a Turing Machine, but are surprised that anyone in the last 50
years thinks it necessary to make that point in a paper. They seem to think
that the Turing tape corresponds to disk. A simpler (and widespread)
alternative interpretation is memory is the equivalent to tape. Perhaps the
authors are impressed with the nonvolatility of paper tape and disks?

This paper is an op-ed piece that was inspired by discussions at the last
LISA, with cfengine the apparent target of its wrath.

The program committee will have to judge this paper on whether it will 
lead to useful discussions at the next LISA. I think the probability of 
lively discussion is a perfectly valid reason for the PC to accept a 
paper.  It is not clear to this reviewer what are the important 
contributions of this paper beyond a potentially lively debate, and 
whether it sheds light or just adds heat.

=======================================================================


From couch@eecs.tufts.edu Fri May 24 09:59:39 2002
Received: from lin18.eecs.tufts.edu (IDENT:postfix@lin18.EECS.Tufts.EDU [130.64.23.48])
	by roton.TerraLuna.Org (8.11.6/8.9.3) with ESMTP id g4OGxdv17436
	for <stevegt@terraluna.org>; Fri, 24 May 2002 09:59:39 -0700
Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40])
	by lin18.eecs.tufts.edu (Postfix) with ESMTP
	id 568306F98B; Fri, 24 May 2002 12:59:38 -0400 (EDT)
Received: by piano.eecs.tufts.edu (Postfix, from userid 418)
	id 4350BCFC4; Fri, 24 May 2002 12:59:38 -0400 (EDT)
To: stevegt@terraluna.org
Subject: Re: LISA submission 050
Cc: paul@dcs.ed.ac.uk
Message-Id: <20020524165938.4350BCFC4@piano.eecs.tufts.edu>
Date: Fri, 24 May 2002 12:59:38 -0400 (EDT)
From: couch@eecs.tufts.edu (Alva L. Couch)
Status: RO
Content-Length: 211
Lines: 5

It should be rather obvious that things are going to get really
interesting around here.  I am willing to work much more closely with
you than a typical shepherd. Keep the ideas bouncing back and forth. 
-alva


From kolstad@ace.DELOS.COM Mon Jul  8 09:00:27 2002
From kolstad@ace.DELOS.COM  Mon Jul  8 09:00:27 2002
Received: from ace.DELOS.COM (ace.delos.com [65.102.83.114])
	by roton.TerraLuna.Org (8.11.6/8.9.3) with ESMTP id g68G0Pf29175
	for <stevegt@terraluna.org>; Mon, 8 Jul 2002 09:00:25 -0700
Received: (from kolstad@localhost)
	by ace.DELOS.COM (8.10.1/8.10.1) id g68FK7f22738
	for stevegt@terraluna.org; Mon, 8 Jul 2002 09:20:07 -0600 (MDT)
Date: Mon, 8 Jul 2002 09:20:07 -0600 (MDT)
From: Rob Kolstad <kolstad@ace.DELOS.COM>
Message-Id: <200207081520.g68FK7f22738@ace.DELOS.COM>
To: stevegt@terraluna.org
Subject: --- USENIX LISA Conference Paper Submission Instructions ---
Status: RO
Content-Length: 17095

Hi!

Congratulations on your paper acceptance for the LISA conference in
Philadelphia.  Please let me introduce myself as the formatter
(typesetter) for your paper.

--> Your paper is due on Monday, August 12, 2002.  Really. <--

I'm enclosing detailed instructions below.  This is really easy stuff
-- just fight the temptation to format the paper yourself!

I will be your contact for all typesetting issues.  Please feel free
to write me with any questions.

					Rob

   ====================================================================
         /\      Rob Kolstad                     http://www.delos.com
      /\/  \     kolstad@delos.com          15235 Roller Coaster Road
     /  \   \    +1 719-481-6542          Colorado Springs, CO  80921
   ====================================================================

------------------------------------------------------------------------------

                           INSTRUCTIONS FOR USENIX
                             AUTHORS AND SPEAKERS
                              Rev. June 8, 2002
                  LISA XVI, 2002, Philadelphia, Pennsylvania

               Eric Allman, University of California, Berkeley
                     Judy DesHarnais, USENIX Association
                 Rob Kolstad, Berkeley Software Design, Inc.
                      Evi Nemeth, University of Colorado

                                   ABSTRACT

     This paper presents instructions for authors of papers at USENIX
     technical conferences and guidelines for speaking at those
     conferences.  It discusses details of preparing papers for inclusion
     in the proceedings and of preparing visual aids for use during the
     talk.  Careful adherence to these instructions will result in a
     calmer interaction between submitters and conference organizers, to
     everyone's benefit. NO MATTER WHAT, please read the enclosed
     checklist and follow its instructions.

Final Paper Submission Deadline: August 12, 2002

1.  General

     Please do not spend time making the paper look just the way you want it.
In fact, don't spend any time at all.  We'll end up re-paginating and re-
formatting it, anyway.  Really; every single time.  If it's convenient or if
you have tricky tables, diagrams, or pictures, send a Postscript copy of the
entire paper along so that the typesetter can see what it looks like.

     Note that you are not sending a final draft or a paper to which you still
have edits to apply.  You're submitting a final paper which will be returned
as time allows to you for a single `final typesetting check' before being
reproduced.  Please don't ask to make very many changes after you've sent your
paper in unless it's a true emergency (e.g., company secrets shouldn't be
disclosed).

1.1  Reviewing typeset papers

     We will e-mail your formatted paper back to you (in PostScript) as soon
as it is formatted. Please review it quickly (i.e., one day) and let us know
if there are any problems or not. Most papers will be returned within a week
of their submission. Please designate an alternate reviewer if you can't
return an `OK' to us almost immediately. Three days is the absolute maximum
time for this review as we're on a tight deadline.

2.  Text Submission

     To ensure a uniform look in the conference proceedings, most papers will
be typeset for you by Rob Kolstad. This really ends up making it easier for
you if you follow the directions below - don't work hard!  Just put the words
in a file and we do the rest.

     Authors are requested to submit electronic copy in one of many forms:
          o ASCII
          o ASCII with markups for italics, etc. (e.g., <i> ... </i> or
                                                          \fI...\fP)
          o Troff
          o TeX (subset of LaTeX standard macros)
          o Microsoft Word (but not Microsoft Word illustrations - only
                                                          postscript
                                                          illustrations)
The typesetter will create both PostScript (for proceedings and archives) and
HTML output (for the archives). Submissions will also be published in the
USENIX online library on the World Wide Web.

     Please do not spend any time at all formatting your paper other than
getting the content of the diagrams to look nice (they are inserted without
change, usually) and somehow marking those words that are in italic, bold, or
fixed-width type. The typesetter does a fine job and will send you back a
formatted copy for review and comment.

     You probably won't be able to get a good test of what your final output
will be like since character width tables, for example, differ from system to
system. In fact, you're better off sending your paper early and letting the
typesetter do his magic and get it back to you quickly (often within 24 hours
if you beat the deadline by more than a week).

     ASCII is a fine submission medium. FrameMaker, MSWord, and other WYSIWYG
editors all have an ASCII output option - feel free to use it.  Send along a
postscript file of the document with properly displayed italic words, bold
words, and the like so we get it typeset correctly.

     Some authors feel so strongly about typesetting their papers that they
are reluctant to submit them for re-typesetting. If you are one of those very
few authors, we will paste page numbers onto your paper and print it as
submitted. We strongly prefer not to do this and have not done so for almost a
decade. If you require it, please send us both PostScript for your paper (and
for the archives) and ASCII (for the archives and WWW pages).

     If you can not submit electronic copy, please contact Rob Kolstad
<kolstad@delos.com> [Mountain time: 719-481-6542] IMMEDIATELY.
         => ALL SPEAKERS MUST PROVIDE A PAPER FOR THE PROCEEDINGS <=
If you can not submit a suitable copy of your paper by the deadline, your talk
will be struck from the conference.

3.  Formatting

     Please DO NOT SPEND TIME FORMATTING YOUR PAPER. We just end up
reformatting it anyway. Counterintuitive as this might seem, it really is the
fastest way for everyone concerned.

     While the typesetting dude will be using troff, hardly anyone else does
any more. Please send a quick note if you'd like to submit in native troff
macros. Otherwise, feel free to use -me or -ms, but don't make it fancy!  Feel
free to submit pidgin troff with any consistent use of macros.

     If you'd like to submit in TeX, we use a converter that reads a subset of
LaTeX and converts it to our standard format. Please do not use other TeX
features (like macros and defines)! Please do not create your own clever
macros! This includes abbrevations of any sort. Please use only these macros:
      abstract            emph             list            subsection
      and                 end              maketitle       subsubsection
      author              enumerate        newblock        table
      begin               evensidemargin   nocite          tabular
      bf                  figure           noindent        texttt
      bibitem             footnote         normalsize      textwidth
      bibliographystyle   footnotesize     oddsidemargin   thanks
      center              hline            ref             title
      cite                hyphenation      reg             tm
      date                it               section         tt
      description         item             sl              verb
      document            itemize          sloppy          verbatim
      documentstyle       label            sloppypar
      em                  ldots            small
Please use the standard font changes, not new ones that you define.

     If you'd like to submit in Microsoft Word, enclose the file as some sort
of an attachment and it will be converted. Note that we can accept
illustrations only in encapsulated Postscript - MSWord illustrations are
almost always unusable.

     If you don't want to submit in troff, TeX or MSWord, PLEASE SUBMIT AN
ASCII FILE (maybe with markups to indicate typesetting directives like
`italic' or `bold') - it takes less time, on average, to typeset an ASCII file
from scratch than it does to typeset page numbers onto your pages. Double
space between paragraphs; mark your headers in some consistent format that a
human can decode. It takes about 10 minutes for us to add typesetting commands
to an ASCII file. It can take hours to untangle clever macro invocations
and/or formatting commands from other formats.

     We can not decipher PostScript versions of entire papers for re-
typesetting purposes.

4.  Diagram/Illustration Submission

     Submit diagrams as Encapsulated Postscript files. Note that just putting
a diagram on a page and capturing printer output is not encapsulated
postscript. In fact, various packages now use so many tricks that it is often
impossible to extract the essence of the diagram. Encapsulated postscript
truly is the answer.

     Furthermore, FrameMaker pictures are really painful to deal with - their
encapsulated postscript doesn't work very well.

     Please do not put captions with your diagrams or boxes around them.  Put
the captions in the text (as comments if you need to), not the diagram. The
typesetter will insert the diagram name (e.g., ``Listing 1'') and caption on
the diagram. Feel free to send us GIFs or TIFFs (e.g., screen images) for
making into Encapsulated Postscript - our tools work great.

     Normally, your paper will end up being typeset in double columns. If you
can confine your tables and pictures to about 3.25 inches wide, that'd be
great. Otherwise, we'll put the larger figures and diagrams across the two
columns - most likely floated to the bottom or top of the page. Be sure that
we can decipher the names of your figures and where they should appear in your
text.

5.  Details

5.1  Page Count

     Please limit the length of your paper to 16 typeset pages (surfaces)
total (some papers are more limited; you've been informed if this is the
case). Some dispensations may be available. Note that the USENIX macros do a
good job of compressing your paper from other printing methods. Many papers
see page compression ratios as high as 2:1 or higher (i.e., 30 typewriter
pages turn into 15 typeset pages). We do not anticipate page length to be a
major issue.  Please contact the typesetter if you think you might exceed this
limit.

5.2  Author's Biographies

     Author biographies follow each paper. Here is a sample biography:
   Tom Christiansen left the University of Wisconsin in 1987 with an MS-CS
   where he had been a system administrator for 6 years. He joined CONVEX
   Computer Corporation in Richardson, Texas where he is a software development
   engineer in the Internal Tools Group there.  Reach him via U.S. Mail at
   CONVEX Computer Corporation; POB 833851; 3000 Waterview Parkway; Richardson,
   TX 75083-3851.  Reach him electronically at <tc@convex.com>.

5.3  Availability

     Please include a section at the end of your paper that provides software
availability information. If the system you describe is available to others or
if more information (reports, etc.) may be obtained, please indicate terms and
contact information.

5.4  Submitting your Paper

     Submissions should be mailed electronically to: <kolstad@delos.com>.

     Please mail the ``big paper'' in a single file along with separate .eps
files using MIME or some other reliable transfer mechanism (not PGP).  All
received e-mail is unpacked and acknowledged as soon as it appears that the
files are received correctly. Illustrations are checked immediately as they
are often the main problem in transferring info to us.

     In case you mail us hardcopy of your paper (due to complex figures and
tables, difficult typographical features, or to show bold/italic words), we
have learned that the US mail system and Federal Express are not always as
gentle with your paper as you might like. We have found that the following
guidelines maximize your paper's chance of survival: no paper clips; no
staples; correctly sized envelope, not too big; cardboard stiffeners front and
back. This applies particularly to photographs that you may mail to us.

5.5  Trademarks

     Please do not include special trademark notices throughout your paper
(e.g., ``UNIX is a trademark of ...''). Instead, please note the trademark
references near the top of your paper in comments (or send them under separate
cover that notes that the trademarks are used). We'll include all the
trademarks in one section at the beginning of the proceedings.

5.6  Licensed Material

     The matter of publishing licensed material is of grave concern to USENIX.
One incident has already come dangerously close to causing serious problems.
The Association does not intend for this to happen again. The final copy will
be reviewed for such content, and if, in the opinion of those responsible for

worrying about such things, the submission contains licensed material, it can
not be printed in the Proceedings. The deadlines are such that it is unlikely
one could amend a submission and get it back in time, so look before you send
it. If there is ANY question on this matter, please contact your program chair
immediately.

5.7  Author Registration for Conferences

     Registration info has been sent to you under separate cover.

6.  Deadlines

     In order to publish the most timely of papers, the publication schedule
is very tight. We require your cooperation in order to have the Proceedings
available at conference registration. To meet the necessary internal
deadlines, we must have:
    o The electronic copy for your paper (including biographies),
    o Hard copy of it for reference (rarely required),
    o Hard copy of any graphics embedded in the paper (rarely required), and
    o The completed checklist (electronic copy is fine); see next paragraph
We must have these in hand at the close of business on August 12, 2002.
Manuscripts arriving after this date will be at risk.

     On the enclosed checklist that you will include with your paper, please
be sure to indicate: your name; your mailing address; your electronic mailing
address; your phone numbers, preferably both day and evening numbers (which we
really do use when required); and any other information the people printing
the Proceedings might need to contact you (such as "on vacation next week,
please contact second author at ..."). It is vitally important that we have a
way of contacting you quickly if there are any problems with your submission.

7.  Contacts
The program chair is Alva Couch (couch@eecs.tufts.edu)
The proceedings typesetter is Rob Kolstad <kolstad@delos.com> [Mountain time:
719-481-6542].
For other issues,such as audio/visual requests and registering for the
conference, contact conference@usenix.org [Pacific time:  714-588-8649x30].

     Please note: laptops are not provided. If you wish to discuss any special
A/V needs, please contact Greta Thurman; conference@usenix.org; +1
714-588-8649x30

     If you need to contact any of us, the preferred route is electronic mail.

                      CHECKLIST FOR USENIX AUTHORS AND SPEAKERS
                     LISA XVI, 2002, Philadelphia, Pennsylvania

          1.  Electronic Submission Checklist
          All these items are due by August 12, 2002.  If you can not
          submit by then, please contact the formatter immediately.  Please
          submit them electronically to <kolstad@delos.com>.

          1.1  Author's Contact Information
          Please provide contact information for at least one author who
          will be in town the week following the deadline for paper
          submission:
          Name:         ____________________________
          Address:      ____________________________
                        ____________________________
                        ____________________________
          Work Phone:   ____________________________
          Home Phone:   ____________________________
          Email:        ____________________________

          1.2  Items for the paper
          _____ text of your paper in ASCII, troff, TeX, or MSWord
          _____ Text includes biography for each author (in the paper itself)
          _____ TeX users: include output after running your text through refer or bib (if you use them)
          _____ Troff users: pic input for your graphics (in the paper, if you use them)
          _____ Encapsulated PostScript files for all diagrams & pictures (if you use them)
          _____ Special instructions for formatting your paper (if required)
          _____ 'Consent to publish' form sent to USENIX HQ

          2.  Audio Visual
          _____ While a microphone, overhead projector, and `beamer'
              (computer display projector) are available, laptops are not
              provided. If you wish to discuss any special A/V needs,
              please contact Greta Thurman; conference@usenix.org; +1
              714-588-8649x30


From stevegt@TerraLuna.Org Wed Jul 10 02:39:42 2002
Date: Wed, 10 Jul 2002 02:39:42 -0700
From: Steve Traugott <stevegt@TerraLuna.Org>
To: Paul Anderson <paul@dcs.ed.ac.uk>, couch@eecs.tufts.edu
Cc: lance@TerraLuna.Org
Subject: Progress on LISA 050 (Turing)
Message-ID: <20020710023942.B23398@scramjet.TerraLuna.Org>
References: <200207041229.g64CTIZ04264@moondog.dcs.ed.ac.uk> <20020705144722.B30520@scramjet.TerraLuna.Org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20020705144722.B30520@scramjet.TerraLuna.Org>; from stevegt@TerraLuna.Org on Fri, Jul 05, 2002 at 02:47:22PM -0700
Status: RO
Content-Length: 2620

Hi All,

Taking longer than I expected, but making progress.  I wouldn't waste
your time looking at it yet, but I just checked the latest (very)
rough draft out of CVS at:

   http://www.infrastructures.org/papers/turing/turing.html

...should be more readable by the end of this week.   I want to get
this stabilized soon so I can focus on shepherding other papers.

Two interesting thoughts (neither of which is in the text yet):

  - I think we can now demonstrate a clean NP, P, NP-complete
    progression for finding one good sequence of changes, validating
    that sequence, or finding all good sequences, respectively.  The
    idea here is that it's slightly expensive to figure out what order
    to make changes on a host to get a desired behavior, less
    expensive to test for that behavior, but intractably expensive to
    figure out the set of all sequences which will produce the same
    behavior.  This would seem to imply that you will greatly reduce
    costs if you can constrain yourself to the first sequence you
    validated.  This is nice to be able to show, since it's one of the
    core intuitions we started with last year.  (Thanks for pushing me
    on this, Alva.)

  - A network-connected host which autonomously fetches and executes
    privileged-mode code from other hosts, and uses that code to
    decide what code to fetch and execute later, may be closer to
    emulating an infinite-tape Turing machine than any of us has
    suspected (it's a two-tape machine, with a finite, local r/w tape
    and an effectively infinite read tape).  This seems especially
    true if privileged-mode code has an intrinsic ability to fetch and
    install new copies of itself from arbitrary servers elsewhere on
    the Internet (think debian, for instance).  Still thinking about
    this one.

G'night,

Steve


On Fri, Jul 05, 2002 at 02:47:22PM -0700, Steve Traugott wrote:
> ...slaving away at it over this 4-day weekend...  ;-)  Hope to get a
> draft to both of you by Monday.
> 
> Steve
> 
> On Thu, Jul 04, 2002 at 01:29:18PM +0100, Paul Anderson wrote:
> > 
> >   :-)
> > 
> > 	Paul
> > 
> > couch@eecs.tufts.edu (Alva L. Couch) says ..
> > 
> > >Happy 4th of July.  Now would be a good time for shepherds to remind
> > >authors that they'd like to see a near-final draft before the end of
> > >July.
> > 
> > 
> > 
> 
> -- 
> Stephen G. Traugott 
> UNIX/Linux Infrastructure Architect, TerraLuna LLC
> stevegt@TerraLuna.Org   
> http://www.stevegt.com

-- 
Stephen G. Traugott 
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org   
http://www.stevegt.com