
<section name='faq' title='Frequently Asked Questions'>

	<section type='faq' name='just-test' title="But is
		ordering really that important?  Can't you just test
		your changes before putting them in production?"> Sure.
		But to do that, the production machines have to have
		been built *exactly* the same way that the test
		environment was, or else the test is invalid.  The only
		way to ensure that test and production are built the
		same way is to, well, build them the same way, applying
		changes in the same order in each case.  </section>

	<section type='faq' name='just-dd' title="Okay, if I test
		a machine and like the results, why can't I just dd the
		disk image to a production box?"> You can -- you've just
		described the "starting bit state" [#bitstate].  But if
		you need to make any changes to that machine later,
		you'll need to have some way of testing them outside of
		production. [#testprocedure] </section>

	<section type='faq' name='just-backout' title="But why not just
		test in production?  If the change broke something,
		just back it out.  If the backout fails, then
		just re-install.">  XXX If anyone still feels this way, then
		they should re-read the above paragraphs about
		reliability and downtime.  Testing in production, and
		relying on scheduled downtime and backout windows, eats
		into uptime numbers and precludes 24x7 operation.  A
		global economy has no "off hours".  I think about this
		every time I'm waiting in line at the Hertz counter
		after arriving in Tampa at 2 a.m. on a Friday night,
		right smack in the middle of the Hertz scheduled
		maintenance downtime. </section>


	<section type='faq' name="whodecides" title="Does every
		change really need to be strictly sequenced?  Aren't
		some changes orthogonal?">
		
		<p> While it is true that some changes will always be
			orthogonal, we cannot prove orthogonality in advance,
			due to the halting problem.  [#halting] </p>

		<p> It might appear that some changes are "obviously
			unrelated" and therefore not subject to sequencing
			issues or the halting problem.  The problem is, who
			decides?  </p>

		<p> An anonymous reviewer of this paper agreed that "I
			think there's probably some benefit to be had by
			asserting that requiring independence is a difficult
			task, fraught with peril in the real-world." </p>

		<p> Another reviewer suggested that network and editor
			changes could be considered orthogonal.  However, VIM
			6.0 now supports remote slaved edit sessions via TCP
			[vim].  Again -- in practice, in the field, when
			deploying a change, who decides?  </p>
		
		<p> In practice, the most highly-skilled Infrastructure
			Architect [#iarch] in the world can't be depended on
			to always catch otherwise-unforeseen dependencies.  In
			practice, the decisions for a particular change will
			usually not even be made by an architect, but by an
			Infrastructure Administrator [#iadmin].  It would be
			irresponsible for us to assume that either of these
			individuals has 20/20 foresight.  </p>

		<p> In other words, it's likely true that there are
			changes which are fully orthogonal.  But we don't
			know, and in practice won't know, which ones they are.
			Lacking this knowledge, in the field we must assume
			that all changes are subject to sequencing issues.
		</p>
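
		<p> As a rough illustration of the "who decides?" problem,
			here is a minimal sketch in Python.  Everything in it is
			hypothetical: the host is modeled as a dictionary of file
			paths, and the two change functions (install_editor and
			configure_network) are invented for this example rather
			than taken from any real tool.  The editor change looks
			unrelated to the network change, but it quietly inspects
			the network configuration, so the two orderings leave the
			host in different states.  </p>

```python
# Hypothetical model: a host is a dict mapping file paths to contents,
# and a change is a function that mutates that dict in place.

def install_editor(host):
    """Install an editor; its config depends on whether networking exists yet."""
    host["/usr/bin/editor"] = "editor-binary"
    # Hidden dependency: the installer checks for a network config.
    if "/etc/network.conf" in host:
        host["/etc/editor.conf"] = "remote_sessions=on"
    else:
        host["/etc/editor.conf"] = "remote_sessions=off"


def configure_network(host):
    """Write a network configuration file."""
    host["/etc/network.conf"] = "tcp=enabled"


def build(changes):
    """Build a host from an empty starting state by applying changes in order."""
    host = {}
    for change in changes:
        change(host)
    return host


# Same two changes, applied in different orders.
test_host = build([configure_network, install_editor])
prod_host = build([install_editor, configure_network])

# List the files whose contents differ between the two builds.
differing = sorted(p for p in test_host if test_host[p] != prod_host.get(p))
print(differing)  # prints ['/etc/editor.conf']
```

		<p> Nothing in the names of the two changes hints at the
			interaction; it only shows up when the resulting states
			are compared.  </p>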

	</section>

	<section type='faq' name="rule8020" title="Isn't all of
		this being a little too perfect?  Can't we settle for
		being correctly sequenced 80% of the time, and manually
		find and fix the other 20%?">

		<p> That 20% is divergence.  [#divergence]  Once you
			have allowed divergence to occur, you now need to
			start, and perpetually maintain, a convergence
			process.  [#convergence]  You no longer have the
			ability to make deterministic, predictable changes.
			[#deterministic]  This means that you now no longer
			have a production infrastructure [#production], but
			are instead, by definition, now running your
			production applications and users in a test
			environment.  </p>

		<p> If you allow 20%, 10%, or even 1% of your changes to
			be randomly sequenced, even once, then you will need
			to do regression and other testing for <b>all</b>
			changes on <b>all</b> of the hosts which received the
			random sequencing, both now and in the future,
			[#testing] to determine if any of those unsequenced
			changes broke something.  Keep in mind that, because
			they were randomly sequenced, unforeseen dependencies
			may have broken only some hosts.  Without testing, you
			will not know which hosts, or which changes, are
			involved, if any.  [#whodecides] </p>
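
		<p> To get a feel for the size of that testing burden, here
			is a small sketch in Python.  It assumes, for the sake of
			illustration, that n distinct changes may land on a host
			in any order, so that up to n! sequences are possible;
			the function name possible_orderings is our own invention
			for this example.  </p>

```python
import math


def possible_orderings(unsequenced_changes):
    """Number of distinct orders in which n unsequenced changes can be applied."""
    return math.factorial(unsequenced_changes)


# Each ordering is a sequence that may need its own regression test,
# because any one of them could expose an unforeseen dependency.
for n in (1, 2, 5, 10, 20):
    print(n, possible_orderings(n))
```

		<p> Under these assumptions, ten unsequenced changes
			already allow 3,628,800 possible orderings, which is why
			testing each sequence does not scale.  </p>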

		<p> Due to possible barrier problems [#barrier], you may
			also need to test hosts which only received sequenced
			changes.  </p>

		<p> Because convergence processes do not
			close [#closure], you will reach the point of
			diminishing returns as you try to increase your level
			of automation.  [#diminishing] </p>

		<p> In short, you will no longer have an Enterprise
			Virtual Machine [#evm], capable of being managed as a
			single predictable unit.  The purpose of this paper is
			to enable EVMs.  You will instead have a conventional
			convergent set of hosts.  Convergent infrastructures
			have been covered in [burgessXXX1] [burgessXXX2]
			[burgessXXX3].  </p>


	</section>
</section>

<h2>Review Comments XXX</h2>

Note to reviewers:  This section will go away.  Most of
these comments are those received from the program
committee; we've inserted initial reactions into [ square
brackets ].  While the PC accepted the paper, a great many
of these comments are highly critical.  Most of these
criticisms are due to a clear misunderstanding of the
wording of the paper itself, not of the intent.  Some of
this we can fix.  Some we can't -- if someone tries hard
enough to misunderstand something, there's no way we can
stop them.  But we're going to distill most of these review
comments into the above FAQ section, to keep ourselves
honest by making sure that our own responses to these
arguments are published.  Hopefully one or more of these FAQ
entries will help someone else.

	<pre>


The prescription -- a starting bitmap + a bitstream of all possible changes
is easy to see as a theoretical sort of argument.  I'm a bit more concerned
about the pragmatic aspects of such an idea.  Maybe the paper should show
some of the ways that real upgrades, real software, real-world
administrative techniques can use this data.  I don't see how such a thing
would work in the framework as described.


Finally, I don't think the paper derives much 'bang' from
the fundamental notion of Turing equivalence.  That seems
like a neat result that might have some predictive
properties -- maybe they can be explored?  [ explain why it
is necessary for architects to understand and account for
turing effects ]

=======================================================================

The theory of system administration is much needed but has
received little attention until very recently.  This paper
adds to that theory with a detailed discussion of the
theoretical ramifications of "machines that repair
themselves." The authors point out rightly that this aspect
of system administration poses serious theoretical questions
that may be undecidable in the computer science sense.  They
contend that we can avoid some of the undecidability in this
situation by adopting a strict order of operations for
configuration actions.

This is quite true.  It is not the only way, however.

Throughout the development of the theory of computer
science, we have found that difficult problems become easier
with the addition of appropriate constraints.  While it is
true that unconstrained automated processes are Turing
equivalent, it is also well known that such processes become
easier to analyze when simple constraints are imposed.  For
example, limiting one's self to copying files and links in
the 'cfengine' sense makes one's configuration process
equivalent to a simple state machine rather than a turing
machine.  [ explain why copying files and links is also
subject to turing effects ]

Deterministic ordering is *another* way to achieve
this reduction for arbitrarily complex configuration
actions.  There are many others. [ what are they? ]

One must be *very* careful with language.  "Turing
equivalence" is an ideal state for ideal machines.  The
"infinite" nature of such machines makes "undecidability"
possible.  This is not an attribute of real machines (with
finite memory), for which determining the correctness of a
configuration is simply "intractable."  [ explain that the
infinite tape size is in fact represented, when you have
gold servers, image servers, debian servers, etc. in the mix
]

I have serious problems with the writing style, which is
choppy, with many one-sentence paragraphs that fail to fully
develop ideas.  Paragraphs need to be combined and developed
so that they flow. [ finish sentences, eradicate long
sentences, break up paragraphs ]

Make sure that you include references to the many papers in
the literature that disagree with your conclusions!  [ yes.
show cfengine/immunology, for instance ]

=======================================================================

This paper advances the premise that the order of
application is important for configuration changes, since it
is not possible to prove that two configurations are
identical if changes are applied in different orders.  This
argument is used to advocate a configuration system in which
all hosts start as identical disk clones and have changes
applied in identical sequences.  [ reviewer didn't
understand...  explain that all hosts do not need to be
identical, that we want them to be unique, that uniqueness
needs to be managed deterministically, that these
techniques are already in production use ]

This is an interesting concept for anyone involved with
configuration systems, particularly in highly critical
environments with large numbers of identical machines.  [
not identical ]

However, there are a number of objections to this approach
which are not addressed in the abstract, that I would like
to see covered in any final paper:

1) Machines are not completely deterministic in practice
(they get external interrupts) and even machines which have
had identical configuration changes applied, will not have
identical disk images or even identical sequences of system
calls.  [ we are talking about writes to root-owned portions
of disk.  yes, there are portions of user data which can
also cause perturbations, and there are root-owned portions
(logs) which we don't care as much about, but that's where
theory and practice need to be cleanly defined ]

2) For the purposes of system configuration we are
interested in equivalence  between certain important
high-level behavioral properties, and it may be possible to
prove these, without requiring equivalence at the bit level.
[ can't, due to halting problem ]

3) I would like to see some concrete examples where
alternative sequencing has caused a problem in practice.  Is
this a significant problem compared to other system
configuration difficulties?  [ yes, once you've removed all
the low-hanging fruit, you run into sequencing issues.
Think of some examples. ]

4) I would like to see how these ideas might be applied in
practice (the abstract does suggest that this is intended)
[ examples section ]

5) How can we deal with systems with very diverse and
rapidly changing configurations?  [ that's what this is all
for -- static configurations aren't interesting from a
change management point of view ]

I personally found the discussion of Turing and Von Neumann
machines a little confusing, and it might be worth trying to
express this in a more concise way. [ move turing and vn
definitions to glossary, make sure people read it, test for
understanding throughout doc ]


=======================================================================

This paper is an enthusiastic attempt at arguing that system
modifications must always be ordered. However, I do not
think that it succeeds in doing so in an effective or
convincing way. There are a number of major problems with
the abstract:

* The two halves of the paper--pages 1-top of 3, and
3-end--are unconnected, despite assertions to the contrary.
The argument in the second half of the abstract--which is
its true point--does not really depend on or even use the
Turing/Von Neumann concepts presented briefly in the first
half of the abstract.  [ use [#refs] to glue them together ]

* The paper does not discuss its position in the context of
the large amount of related work. Rather, it seems to simply
take up a conversation which occurred last December. Keep
in mind that most LISA attendees will have no knowledge
whatsoever of this context.

* At some points, the paper makes unwarranted and often
unacknowledged assumptions. For example, point 4 on page 3,
which asserts that ABC is always non-equivalent to ACB.
Constructing a counterexample is trivial (just choose ABC to
be independent--not hard at all).  [ yes, but who decides? ]

The rest of the argument follows from this dubious, invalid
assumption. Indeed, this is the whole point of contention in
the first place, so simply asserting that it is so begs the
question. Logically, the argument is:
   o  We must decide between A and not A.
   o  A is obviously true.
   o  Therefore, A follows.
   o  Therefore, A's consequences follow.

I agree that if order matters and it is ignored, then bad
things happen.  But order does not always matter. Deciding
when it does, the extent to which it is important, and the
best tradeoff of complexity vs. risk of unforeseen
interactions is the hard part.  [ exactly ]

The authors need to spend a lot more time thinking about
these topics.  [ show that we did ]

Finally, the first sentence is bizarre; I'd rethink it and
revise it into something with a little more meat and less
cuteness. [ hmmm... ]

=======================================================================

Very well written, good detail and references.  (I am amused
to see Turing's 1939 work cited. :-) )  Very nicely done.  [
more antique citations ]

=======================================================================

This paper presents me with a dilemma. On the one hand, it
addresses an interesting problem and does so in a
provocative way that I approve of. It purports to offer a
counterpoint to cfengine's recently much written-about ideas
of configuration "convergence". I would like to see a paper
that does this succeed in making its points clearly.

The problem with this abstract, as presented, is that it is
full of technical errors, misunderstandings and untrue
assertions about systems.  It might be possible to salvage
something of it, with significant shepherding, but as it
stands it is just misleading and factually incorrect. [ but
we can knock down all of his points... ]

Let me take some examples. 

"A 100k sample cannot determine the state of a 2GB disk."
This is obviously not true. A rudimentary program (e.g. rm
-rf /) takes only a few bytes to implement, but very clearly
determines the whole disk.  [ semantics -- reviewer didn't
realize paper meant "detect" ]

More generally, the authors are assuming that every bit of a
configuration is significant. It is not. [ yes, but who
decides? ]

Turing machines would not work at all unless algorithms
could be coded into significantly less space than the size
of their tapes (actually their tapes are infinite, so von
Neumann machines only approximate Turing machines).  [
standalone vn machines can only emulate limited universal
turing machines; but vn machines which can get or put
executables and data over network-reachable storage can
emulate full turing machines; turing machines can fully
emulate vn machines ]

Compression and coding need to be taken into account. This
is an information theoretical limitation, nothing to do with
Turing machines. The authors do not discuss the amount of
information required to describe a configuration at all.  [
yes we do -- starting state + deltas - equates to starting
state of turing machine + XXX; add a fetch/upgrade
instruction to turing? ]

This has to do with the amount of entropy in the disk: if
the disk is to be empty, it is very easy to keep it that
way.  That is still a valid configuration. Clearly there is
some middle ground here. Not all bytes are independent, [
but who decides? ] so it is not true that one needs the
complete history to explain a state like that resulting from
rm -rf /  [ if you rm -rf /, then you have executed a
monotonic change -- you cannot afterwards detect any
information about disk state whatsoever -- your tools are
gone ]

"The entire past history of changes to the machine are
important...  if we expect deterministic, reliable behaviour
in operation..." The authors seem to be forgetting all of
the changes made by users.  Are they talking about only a
part of the computer, or all of it?  What about all of the
hidden variables caused by network communications, resulting
in changes to log files and other effects etc?  [ In
practice we're talking about root-owned non-log files.  This
is slightly fuzzy.  The world is slightly fuzzy. ]

"If we were to allow the order of changes to be unsequenced,
we must test each possible sequence of changes. This is an
intractable problem." This is wrong. Indeed, Couch showed
how this could be done at LISA 2001, and this is the same
approach used by cfengine. The number of orderings might be
M!, but that is not the number of interest, because it
neglects the coding and the amount of information in those
bits.  [ reviewer is confused.  couch did not attempt to
demonstrate this.  couch agrees that testing the results of
each N! possible sequence is intractable ]

"..must continue to test [hosts] individually henceforth ...
this is very expensive. It is easier to avoid unsequenced
changes" First of all, the idea of a sequence assumes that
there is a prior dependency chain at work. This is only true
if the operations being performed are intertwined -- that is
only true of complex operations, [ who decides what is
complex? ] and is usually an artifact of software [ huh? ].
The idea that testing the configuration of a host is
expensive is an assertion, not backed up with any figures.
[ okay, so you do the regression testing of every stinking
application on a host ] Cfengine seems to get along fine
with this.  [ cfengine is an immunology tool, not a build
tool -- it gets along with continual tweaking and testing in
production -- what, you deploy cfengine changes without even
testing?  see halting problem ] Also the assertion that it
is easier to avoid that approach is just an opinion, not a
justified conclusion.  [ show prior anecdotal experience ]

The authors miss some points which would be in their favour
- such as how ordered information within critical policy
files [ isconf ] fits their arguments much better than
arguing about Turing machines and systems as a whole.
[ show why we had to make the turing argument ]

Critical policy files can determine the entire behaviour of
the system, but they are just a tiny part of it, and they
can lead to critical dependencies. It is not so much order
that matters as the chain of dependency.  [ explain that
dependencies are unknown/undecidable, so are unreliable if
they are used as the only mechanism ]

I would drop the whole Turing argument - it is a red
herring. [ it has allowed the debate to progress beyond
untheoretical assertions ] If the authors want to criticize
cfengine's approach, they should attack it where it is
weakest: in the issue of hidden dependencies through
unpredictable software and its relationship to configuration
files.  [ all software is unpredictable in a new build, due
to halting problem ] Unpredictable software can turn
predictable intentions into chaos. [ and predictable
companies into bankruptcies.  that's the point of this whole
paper ]

=======================================================================

This is an important paper for the advancement of
theoretical sysadmin.

It unfortunately provides more problems than answers, but
sometimes one has to walk deeper into the forest before
finding one's way home.

There's a case to be made that a workshop is a better place
to work on theoretical sysadmin, especially for papers like
this with "intermediate results". I'm eager to see the fruit
of the theoretical sysadmin movement, but I'm unwilling to
see LISA turn into a collection of dry, inaccessible,
mathematical arguments. This paper threatens to take us down
that road.  [ sysadmin vs. IA ] [ try to make it accessible ]


=======================================================================

This is a strange paper. The writing often consists of a
series of one-sentence paragraphs, and the sentences look
like points of a manifesto. [ turn it into numbered sections
overall ]

It continues a debate from LISA 01 that this reviewer did
not participate in, on whether or not a self-administering
system needs to be deterministic in actions. The authors
suggest this point is a touchstone upon which the community
must agree or else little useful progress can be made. The
crux of the argument seems to be whether you need to include
the history of changes to a disk in a self-administering
system that can modify anything on the disk. [ yes ]

It then switches to a tutorial on Turing machines, followed
by the author's observations on Turing machine equivalence.
We agree that a von Neumann computer is a Turing Machine,
but are surprised that anyone in the last 50 years thinks it
necessary to make that point in a paper.  [ vn machine is
not a turing machine -- a turing machine can emulate a vn
machine though; reviewer is confused ]  

They seem to think that the Turing tape corresponds to disk.
A simpler (and widespread) alternative interpretation is
memory is the equivalent to tape. Perhaps the authors are
impressed with the nonvolatility of paper tape and disks?  [
explain why we structure the disk/tape explanation this way
]

This paper is an op-ed piece that was inspired by
discussions at the last LISA, with cfengine the apparent
target of its wrath. [ less wrath ]

The program committee will have to judge this paper on
whether it will lead to useful discussions at the next LISA.
I think the probability of lively discussion is a perfectly
valid reason for the PC to accept a paper.  It is not clear
to this reviewer what are the important contributions of
this paper beyond a potentially lively debate, and whether
it sheds light or just adds heat.  [ shed light.  inform.
etc. ]

=======================================================================

	</pre>


	<section type='faq' name='truths' title="A lot of what
		you're saying sounds like 'laws' or absolute truths --
		are they really?">  No.  A lot of this is fuzzy.  The
		world is fuzzy.  Quantum mechanics is fuzzy.  You need
		to do your own research.  The best we can do is relate
		what we think we've learned.  Your mileage will vary.
	</section>


</section>

