From stevegt@TerraLuna.Org Wed Apr 24 17:07:38 2002 Date: Wed, 24 Apr 2002 17:07:38 -0700 From: Steve Traugott To: "Alva L. Couch" Cc: Joyce Cao , joyce.cao@idt.com Subject: Re: ordering Message-ID: <20020424170738.A25282@scramjet.TerraLuna.Org> References: <20020419205752.E4D2ECFC4@piano.eecs.tufts.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20020419205752.E4D2ECFC4@piano.eecs.tufts.edu>; from couch@eecs.tufts.edu on Fri, Apr 19, 2002 at 04:57:52PM -0400 Status: RO Content-Length: 9491 Hi Alva! Joyce and I have both been digesting this message and your next. Some answers so far... On Fri, Apr 19, 2002 at 04:57:52PM -0400, Alva L. Couch wrote: > I think at the outset that I agree somewhat with the conclusions but > not with the mathematics. There are some mistakes. - if you have N > things over all orderings, there are N! possible outcomes, not N^2. Of course. My mistake. Fixed. But I was talking about number of possible disk states, given an N-bit disk. That would be 2^N possible disk states. I had it backwards -- silly error. Thank you. On the other hand, you make a very good point -- ordering of operations is itself numerically interesting, because of that factorial. So I'm adding that, and elaborating below. An HACMP cluster node build is around 121 discrete operations, for instance, if you only count the number of stanzas in the ISconf makefile. That would be 121!, or 8.09*10^200 possible orderings. These produce a disk content which uses 2Gb of space, and has 2^(8*2*10^9) possible bit states. I don't yet know what magnitude that number is -- I've had a perl one-liner chewing CPU for several hours so far. > - One thing that you ignore is the concept of orthogonality. Two > systems are orthogonal if changes to one can't affect the other, and > vice versa. The effect of orthogonality is that operations on > orthogonal subsystems always commute. E.g., a*b = b*a. The effect > of orthogonality is to reduce the space of possible sequences. > E.g., I think everyone would agree that the 'networking subsystem' > is orthogonal to vi. Changes to vi can't affect networking, and > vice versa. Thus the results of a change to both is the same in > either order. Yes, but in practice, who decides whether, or to what degree, two subsystems, or operations on them, are related? For example, the latest version of VIM does contain networking code, to support remote slaved edit sessions. I would never have known about this possible dependency if I hadn't read the docs. I suspect the most reliable and fast way to ensure that two sequences produce the same outcome is via regression testing. As the number of possible sequences increases, so does the required number of regression test iterations. Going back to that HACMP build, a full regression test run to determine equivalency of all possible orderings is problematic to say the least; call it 1 hour for each rebuild, that's 9.24*10^196 years. (Had to install Math::BigInt to get that.) ;-) Testing an HACMP cluster actually takes several hours after the build itself is done. That extra time would only hurt us by 1 additional order of magnitude, so I'm neglecting that. Looking at the HACMP makefile dependency trees, 61 of the operations are "leaves". Only a few of those would be orthogonal to each other. But let's assume that they all are. We know that order matters in the remaining 121-61 = 60 operations. So that lowers us to 9.5*10^77 years of regression testing to determine which sequences of those 60 operations are equivalent. Of course, much of that regression testing could be eliminated by pruning "dead branches", sequences which include subsequences which are known to be bad. Let's say that 99.999% can be instantly culled by recognition, before testing. That still leaves 9.5*10^72 years, unless I've slipped a decimal somewhere. Let's say disk, network, and CPU speeds increase by a couple orders of magnitude, and those 1-hour builds each take 1 second instead. That brings us down to 9.5*10^69 years to process the culled order space. But this all presupposes that we already know the entire set of operations to be performed. When doing builds or upgrades this is never the case. The actions needed are always discovered in the course of growing an infrastructure. In the case of ISconf, this discovery is by the person who's writing and debugging the makefile; "Oh, looks like bar needs to be installed before I can install foo; I didn't know until now that we even needed bar. Ok, I'll add bar to the makefile as a prereq for foo." This also presupposes that discovering a viable order of actions is hard, and that exhaustive exploration of possible orderings is a desireable way to yield an optimum outcome. In practice, the most viable order is the order in which the operations were discovered and executed in the first place. At every point in this process, we already have an order that will work. And this also presupposes that the list of operations is a fixed set, regardless of order. In reality, changing the order will require changing the membership of the set -- for example, failing to mount /opt before populating /opt/foo will require a later migration onto the 'real' /opt. This migration is an additional operation which would have to be discovered and added to the set. This is all reaching the point from an extreme direction; the point being that, for building or maintaining a given host class, it's easiest in practice to constrain the ordering to the one known, working sequence, test the heck out of it, and only add new actions to the end of the sequence. New host classes can then be created based on variations of the known good sequence. In ISconf, new host classes get created by creating new makefile stanzas, naming ordered sets of existing stanzas as prereqs. In this case, we benefit from known working subsequences, and the internal ordering of those subsequences is preserved. > I have been working on the concept of ordering more than you > realize, but can't as the chair publish a paper on it this year. > In particular, I have designed a formalism that portrays the > behavioral effects of configuration, then analyzed the structure > of operations on behavior as a semigroup. This semigroup can > be decomposed by the factor theorem into subsemigroups describing > changes to disjoint subsystems. > > This fancy math demonstratese that storing the complete ordering is > 'more work than is required', though cfengine stores 'less than is > required'. All one has to do is to serialize the changes to > orthogonal subsystems. Since operations on orthogonal subsystems > commute, one can always express them as groups of operations on one > subsystem at a time. I think, after looking at it over the last few years, that this is a lot of what make is doing for ISconf. I preach that targets should only get added to the end of prereq lists when editing a makefile. This means that changes within orthogonal subsets are always deterministically serialized. More importantly (I think), human error in determining orthogonality tends to be masked by the deterministic way that make iterates over multiple prereqs in the same sequence every time. This way, even though two actions might not be explicitly linked in the dependency tree, they are nevertheless serialized identically on every run, removing the possibility of indeterminate behavior. For example, the deepest dependency graph explicitly coded in that HACMP makefile is 8. There are deeper HACMP dependencies not explicitly coded; they are being maintained by make's deterministic prereq list behavior. I suspect the actual depth is at least 30. If I were to rearrange the order of any of those non-explicit prereq lists, experience tells me that in most cases the build will break. If make's behavior were indeterminate, if it randomized the order of things not explicitly stated as dependencies, or if I were to run make with a -j flag, allowing it to parallelize, experience tells me that the resulting outcome would be indeterminate and non-reproducable. At best, the build would cleanly break. At worst, the build would complete, and I would need to do extensive testing to determine whether the machine is actually capable of doing its job. My strongest driver is the need to remove the possibility of indeterminate behavior -- it's a business requirement. When I'm the consultant flipping the switch to demonstrate to the customer what I can do, indeterminate behavior is the greatest risk I can think of. It doesn't matter how long the build or upgrade takes, as long as it's less than an hour or two. It doesn't matter if the predictable, plodding nature of the build introduces security holes at predetermined times; I haven't seen that exploited yet and don't expect to, not for a few more years. But I have seen unforseen dependencies between subsystems cause build failures; it's a common failure mode. > You haven't convinced me that equivalence of the behavioral effects of > randomly ordered configuration operations is undecidable. Just > complex. You might be able to do this. I think of it from the other direction -- beyond a certain number N, it's infeasible to prove empirically that N configuration operations can be randomly ordered and produce the same behavioral outcome. N appears to be as low as 8 or 9. Genetic algorithms must have a fitness function which executes quickly. If the fitness function takes too long, then the time required to arrive at a solution is prohibitive. Building and testing a host is a 1-hour fitness function. For example, with a DNA size of only 8 operations, executing a 1-hour fitness function for each combination requires 8!/24/365 = 4.6 years. What am I missing? Steve -- Stephen G. Traugott UNIX/Linux Infrastructure Architect, TerraLuna LLC stevegt@TerraLuna.Org http://www.stevegt.com From stevegt@TerraLuna.Org Sun Apr 28 00:06:50 2002 Date: Sun, 28 Apr 2002 00:06:50 -0700 From: Steve Traugott To: "Alva L. Couch" Cc: Joyce Cao Subject: Re: more about turing equivalence. Message-ID: <20020428000650.A13515@scramjet.TerraLuna.Org> References: <20020420002305.0706BCFC4@piano.eecs.tufts.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20020420002305.0706BCFC4@piano.eecs.tufts.edu>; from couch@eecs.tufts.edu on Fri, Apr 19, 2002 at 08:23:05PM -0400 Status: RO Content-Length: 13259 Hi Alva! I'm lagging two messages behind you because I want to follow your thought processes as closely as I can. It's possible (likely) that I'll answer points that you've already addressed in your next two messages. This one I've been thinking about all week. ;-) On Fri, Apr 19, 2002 at 08:23:05PM -0400, Alva L. Couch wrote: > I took a short walk and reflected on your paper some more. > > a) Cfengine's configuration operations aren't turing equivalent, nor > are the operations posited by maelstrom. Both are "state machines" > in the language of the Chomsky hierarchy. Recall that there are > three levels of complexity in the Chomsky model, provably different: I usually see four levels -- you're lumping linear bounded automata in with turing machines, right? I don't think it matters in this case, unless you've seen something I've missed. > i) state machines/regular expression grammars. Most cfengine > configurations live here. It takes some trouble to make one > that isn't a state machine. Yes. And the state which cfengine detects is based on spot checks of the current disk content. I suspect that, due to those 2^N possible bit states of an N-bit disk, any method which uses current disk content as the input to a state machine is only going to be able to take samples. This sampling means that cfengine is not really responding to the full disk state of the machine. During any given state transition, cfengine responds to only a very small subset S of the disk state. Therefore, any given rule in cfengine's state machine can actually be triggered by any one of 2^(N-S) disk states. Those multiple possible disk states cause the outcome of any conventionally-configured cfengine run to be non-deterministic. (Ouch.) > Likewise, makefiles can usually be > REDUCED to state machines by looking at what they do on each > architecture and codifying that as state transition rules. By intent, ISconf has always wrapped and used makefiles as deterministic state machines, only. The only bits on the disk that the makefile reads to determine state are timestamp files explicitly created by the makefile itself and maintained for the life of the target host. This means that ISconf uses the full set of all prior actions, accumulated over the entire life of the target host, in order to determine current state and trigger the next state transition. Since there is only one possible disk state that could be represented by a given set of timestamp files, we get deterministic state transitions. An objection frequently raised is that this approach does not have the ability to detect and correct out-of-band changes. In a fully automated environment, out-of-band changes to root-owned portions of disk are indicators of a security breach. The only reliable way to inspect a disk for security breaches is to reboot off of an untainted disk. No tool which executes within the context of a breached disk can be relied on to detect and correct the breach, so this objection applies to cfengine, make, and even manual administration methods. > ii) pushdown automata/context-free grammars. Right now I know of > no configuration management tools in this category. Something's nagging at me here, having to do with LIFO ordering and a configuration tool with rollback capabilities etc. It could be that the "save original files for uninstall" capability of many package managers is verging on pushdown automata, but with nowhere near an accurate implementation. > iii) turing machines/unconstrained grammars. > The actual machines being configured live here. This is where > undecidability may lurk. Yes. I suspect *all* self-administered hosts are one-tape turing machines, regardless of administration tool. I'll get back to that in a moment. > This makes the CONFIGURATION model simpler than the EXECUTION > model. While the execution model remains turing equivalent, the > configuration model is fundamentally simpler in structure. Yes. Further, I suspect a cfengine script is equivalent to the nonvolatile ruleset of a one-tape turing machine, while the target machine's disk is the turing tape -- it reads a bit, alters the tape, reads again, etc. I suspect an ISconf makefile is equivalent to the ruleset of a two-tape turing machine. The directory (/var/isconf/stamps) where ISconf keeps its stamp files is a read/write tape, while the rest of the disk is a write-only tape. ISconf itself doesn't care what's out there on that write-only tape -- it drives blindly. (Though the tools which are called by 'make' might care. For instance, it's perfectly valid to call cfengine from within make.) So, yes, the configuration model and the target machine might fit different Chomsky levels when viewed in isolation from each other. But these analogies collapse into one-tape machines when you take into account the fact that any administrative action has the ability to modify the behavior of the administrative tool itself. Let's equate a one-tape turing machine's nonvolatile ruleset with the CPU instruction set of a conventional Von Neumann machine. The disk content is the turing tape. The operating system and subsidiary programs reside on the same tape. The configuration tool itself is by intent a finite automata "state machine", emulated by the turing machine. This state machine's ruleset and control code reside on the same tape as the operating system and other applications. That configuration tool has the ability to modify the content of the rest of the turing tape -- that's its job. It can also directly modify its own ruleset -- it can upgrade itself. The content of the rest of the turing tape determines how the turing machine interprets the configuration tool's ruleset. In practical terms, this means that the behavior of the configuration tool is dependent on the operating system kernel, shared libraries, and subsidiary applications. So the configuration tool can modify its own behavior in two ways; directly, by replacing its own ruleset; or indirectly, by altering the way that ruleset is interpreted. Therefore, a configuration tool is capable of the self-modifying behavior of a turing machine. The instances in which a configuration tool will modify its own behavior indirectly, by modifying code it depends on, are difficult to predict. Doing so would require a full understanding of all parts of our "turing tape" which affect the behavior of the tool. In practice this would mean scoping and conducting an exhaustive analysis of all pertinent code in the kernel, all applications and utilities which the configuration tool calls, as well as all shared libraries and configuration files these components depend upon. Unless this full understanding exists, a configuration tool residing on, executing within, and modifying the behavior of a one-tape turing machine *must* be assumed to behave like a turing machine, rather than a finite automata. > b) Even with this simplification, the sequencing problem remains. It > is easy to construct a sequence of invocations of N state machines > such that only the order x1..xn gives the appropriate outcome on > the turing machine tape. This is the same thing as what you're > saying; without more simplifications, the outcome of every order is > potentially different. Yes. Further, if the state machines are themselves executing within the context of the turing machine tape, and their states interpreted by code which they manage, then the order of invocations becomes doubly important: If state machine x42 alters the ruleset of state machine x43, then invoking x43 first, followed by x42, will cause a very different outcome than x42 followed by x43. Cfengine does not consider its prior actions of a month or a year ago to be part of the "tape" in order to avoid hitting this particular problem: it can *only* rely on those spot checks. In Mark's LISA workshop we discussed whether cfengine 1.X might be configured to maintain long-term state, or if cfengine 2.X might support this natively, in order to make it more deterministic. I'm not sure how many people present understood what this debate was really about. I think we're doing a pretty good job of addressing it here. > c) Determining an appropriate order for arbitrary state machines to > achieve a given outcome for configuration is INTRACTIBLE but not > UNDECIDABLE. Hard but not impossible. All one has to do is to try > all the orders and choose the one that produces the appropriate > outcome. Yes, if you can afford N!. In practice, we don't need to try very many of the orders at all -- the correct order is the one which worked when each state machine was itself discovered to work. In other words, if the order P currently works, and we develop, debug, and test a new state machine xn which produces the desired outcome when run immediately after P, then the new order P' should include only P and xn, in that order. The way xn gets coded in an ISconf makefile is by adding the new state machine into the makefile as an independent stanza or internally dependent set of stanzas. Then we add the xn target name to the tail end of the P prereq list, creating P'. > It becomes undecidable if other processes are SIMULTANEOUSLY > modifying configurations, asynchronously. Absolutely. Manual administration Bad, 'make -j' Bad... And using two different auto-administration tools on the same machine is Really Really Bad. ;-) > d) Determining whether an appropriate order achieves a particular > behavioral outcome remains UNDECIDABLE. This is just another > flavor of Church's thesis. I haven't been able to find a reference for this. Do you mean "determining in advance"? If so then I agree. If not, then I'd have to think about it longer. > I think the real issue here is that VALIDATION of behavior only > applies to a specific sequence of operations. If one changes the > sequence of operations, then one must prove that the other sequence > reproduced the same disk state, bit for bit, before one can claim that > the prior validation applies to the new state. YES! This is very probably a core, if not *the* core, idea that I'd like to get across. Sampling disk state alone can not ensure bit for bit reproduction of disk contents nor resulting behavior. But reproducing order of configuration changes will reproduce the same disk state and behavior every time. I further suspect that "A machine's set of behaviors will *always* vary if the disk contents vary". This might be proved by including the disk content itself as part of the output set, which I think is valid if the machine's programming includes introspection functions. Machines with different disk contents will certainly always exhibit different behaviors in response to 'cat /dev/*'. > This is a claim of "transitivity of validation", the same claim > utilized by the Linux Standard Base (LSB). Hmmmm... > Now > > a) In the absence of further assumptions, whether two different > configurations exhibit the same behavior is UNDECIDABLE(can prove). I agree, but haven't thought of a direct proof. An indirect proof is what I've been building with this whole chain of thought over the last few months... Recap: - sampling disk state alone isn't enough to determine disk state, unless you sample the *whole* disk - ...so state machines driven by only disk samples produce non-deterministic behavior - ...so the equivalence of behavior of two different state machines driven only by disk samples is undecidable (that's easy) - configuration driven by only deterministic ordering of individual state machines produces deterministic behavior - because the state machines reside on and execute within the context of the driven disk, they can modify each other, as well as themselves - ...so the behavior of individual state machines is dependent on the order of invocation of all state machines - ...so whether two different orders of configuration operations exhibit the same behavior is undecideable (takes me a while to get here) > b) so whether two different orders of configuration operations exhibit > the same behavior remains undecidable by lifting. Exactly. > The most important force in the world to system administrators is > not configuration management, but configuration validation. Yes. In this context, it's amazing how many shops lack test environments, but instead "try out" new administrative code in production. In hindsight I can see why I've been harping on the need for test environments for the last several years. Interesting that it's this easy (hah) to justify that need theoretically. ;-) > It is this validation that suffers from turing equivalence problems, > not the act of configuration itself. If you take into account the x43,x42 ordering problem I described above, does this change your mind about the act of configuration itself having a turing equivalence problem? Good stuff. Steve -- Stephen G. Traugott UNIX/Linux Infrastructure Architect, TerraLuna LLC stevegt@TerraLuna.Org http://www.stevegt.com From stevegt@TerraLuna.Org Sun Apr 28 14:20:28 2002 Date: Sun, 28 Apr 2002 14:20:28 -0700 From: Steve Traugott To: "Alva L. Couch" Cc: Joyce Cao Subject: Re: ordering Message-ID: <20020428142028.A28982@scramjet.TerraLuna.Org> References: <20020425130710.95FC7CFC4@piano.eecs.tufts.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20020425130710.95FC7CFC4@piano.eecs.tufts.edu>; from couch@eecs.tufts.edu on Thu, Apr 25, 2002 at 09:07:10AM -0400 Status: RO Content-Length: 9336 Hi Alva, Next in sequence... ;-) On Thu, Apr 25, 2002 at 09:07:10AM -0400, Alva L. Couch wrote: > The only way to assure that two subsystems are orthogonal is by design. > Testing cannot assure that. This is a fundamental premise of > software testing. > > I think that this is the ONLY way to assure that a realistic build is > reliable. But worse, there is NO way to assure the build is > consistent across an enterprise of hosts, other than to assure > homogeneity of operations and even perhaps hardware. Enacting all administrative change via serialized, tested and reproducable sequences of operations constitutes homogeneity of administrative operations. This will ensure consistent builds across an enterprise. Restating that, "if we do the same things in the same order to identical machines, then they will remain identical in behavior". Machines which aren't identical, or which by intent need different builds, are created by variations of the tested base sequence. This allows us to ignore orthogonality, rather than needing to determine whether the subsystem developer designed for it. I think you cover most of this later from a different angle, in your discussion of transitive validation. > > Going back to that HACMP build, a full regression test run to > > determine equivalency of all possible orderings is problematic to say > > the least; call it 1 hour for each rebuild, that's 9.24*10^196 years. > > (Had to install Math::BigInt to get that.) ;-) > > Kind of cute but perhaps irrelevant. I claim that we know that some > orderings are equivalent, by design. The problem is that we have no > information about order invariance except by design. If two subsystems > are designed not to interact, e.g., they're in separate boxes and > controlled by separate drivers and configuration, then this means that the > configuration operations for the subsystems commute with one another. > Instead of N!, one is left with N1! * N2! * .... * NK! where > N1,N2,...,NK are the sizes of sets of operations for the subsystems. Makes sense. Commutability and orthogonality are difficult for ordinary sysadmins to prove or disprove in the field. For a sysadmin to use a tool which requires them to make these determinations, they must understand correct use of the tool, the design of the subsystems they're configuring, as well as the desired configuration result, and how to test for it to make sure they succeeded. For a sysadmin to use a tool which ensures strict serialization of sequences of operations, they only need to understand correct use, desired result, and how to test for it. The "correct use" part is the only additional thing they need to learn beyond what they would normally do when using manual methods. This is trainable. In practice, I've been able to orient sysadmins in makefile and CVS usage in as little as a day, on-site. If we require that sysadmins understand subsystem design well enough to determine orthogonality, then we're expecting them to become more of an engineer in order to use the tool. This is less trainable. I would not be able to teach someone how to do this in a limited period of time, on-site, in the middle of a rollout. > > I think, after looking at it over the last few years, that this is a > > lot of what make is doing for ISconf. I preach that targets should > > only get added to the end of prereq lists when editing a makefile. > > This means that changes within orthogonal subsets are always > > deterministically serialized. > > Yes. Actually if you do this you don't even need to know anything > about orthogonality. Exactly. > > More importantly (I think), human error in determining orthogonality > > tends to be masked by the deterministic way that make iterates over > > multiple prereqs in the same sequence every time. This way, even > > though two actions might not be explicitly linked in the dependency > > tree, they are nevertheless serialized identically on every run, > > removing the possibility of indeterminate behavior. > > Agreed. I'll get back to this in a later message -- the response you sent me this morning disagreed with a restating of this paragraph, which makes me think I stated something unclearly in last night's message. I'll have to read both again. > > For example, the deepest dependency graph explicitly coded in that > > HACMP makefile is 8. There are deeper HACMP dependencies not > > explicitly coded; they are being maintained by make's deterministic > > prereq list behavior. I suspect the actual depth is at least 30. If > > I were to rearrange the order of any of those non-explicit prereq > > lists, experience tells me that in most cases the build will break. > > This is important to note and document. Tell us a little about what breaks > and how subtle the breaks are. Will do. > > If make's behavior were indeterminate, if it randomized the order of > > things not explicitly stated as dependencies, or if I were to run make > > with a -j flag, allowing it to parallelize, experience tells me that > > the resulting outcome would be indeterminate and non-reproducable. At > > best, the build would cleanly break. At worst, the build would > > complete, and I would need to do extensive testing to determine > > whether the machine is actually capable of doing its job. > > This is important to note and document. I would say not that "make" > is necessary, but that "serialization" is necessary, and I can probably > simplify your syntax with a simpler tool than make... Agreed that "serialization" is necessary. Agreed that 'make' isn't the only way to go -- but didn't want to replace 'make' until I had time and understood the problem well enough to do it properly. Based on past experience I suspect any other tool will also have to support: - preserving serialization of reusable subsequences - implicit assertion test of zero return codes for external commands - implicit reproducable serialization of operations not explicitly serialized - semaphores which child processes can use to signal async events to parent processes (like 'rebuild kernel and reboot when make is done') - a hierarchical grouping of host attributes, so that host function can be determined quickly by eye - re-usable ordered sets of grouped subsequences, so that new hosts can be created by prototype rather than by class (This is important. True class-based configuration tools don't seem to work in the field, while protoype-based systems consistently do. I don't think I'm able to explain why yet; it may be nothing more than "this is the way sysadmins think", or there may be a more theoretical basis. I suspect, again, that it has to do with testing.) These are things which ISconf/make already does. In addition, over the years those of us using ISconf have concluded that we also need: - postrequisites (like 'do foo after I'm done') - decentralized state machine specification (rather than a monolithic makefile or script) - lexically bound syntax (I want to be able to specify each operation in the language most suitable for that operation) - separation of action code from site-specific configuration data - decentralized editing of state machine specifications (no need to log into gold server to update makefile) - state machine and file transport language integrated, as in cfengine, to remove need for NFS mounts to get packages on demand - embedded documentation, like POD; including dynamically generated runbooks and training checklists generated per host class (the latter actually looks easy -- name a person to be trained as a "target host" and "build" them, checking off training actions as they complete) > > > You haven't convinced me that equivalence of the behavioral effects of > > > randomly ordered configuration operations is undecidable. Just > > > complex. You might be able to do this. > > > > I think of it from the other direction -- beyond a certain number N, > > it's infeasible to prove empirically that N configuration operations > > can be randomly ordered and produce the same behavioral outcome. N > > appears to be as low as 8 or 9. > > This is an intractibility argument, not an undecidability argument; > you don't need turing equivalence for this. In fact, it follows even > if the machines being CONFIGURED are simple state machines! This is > what bothered me about your math. Turing equivalence is irrelevant. Yes, turing equivalence is irrelevant for this math. It is proving itself to be a useful model for illustrating the behaviors of a self-administering system though. I don't think we could have had a conversation this productive without that foundation. I've gotten similar feedback from other people over the past few months. Folks seem to need to work with this abstraction; previously, these discussions have always bogged down in behaviors of individual packages and operating systems. Falling back to Turing's theoretical model seems to give us more neutral material to work with. I know it's helped me immensely over the last few months in understanding what I've been doing for the last several years. ;-) Steve -- Stephen G. Traugott UNIX/Linux Infrastructure Architect, TerraLuna LLC stevegt@TerraLuna.Org http://www.stevegt.com From stevegt@TerraLuna.Org Tue Apr 23 20:15:14 2002 Date: Tue, 23 Apr 2002 20:15:14 -0700 From: Steve Traugott To: Tom Cavin Cc: infrastructures@terraluna.org Subject: Re: [Infrastructures] Re: The Turing Synthesis and Automated Administration Message-ID: <20020423201514.A23959@scramjet.TerraLuna.Org> References: <200204211901.MAA01482@roton.terraluna.org> <15556.37718.823408.23363@lap1-wccf.mit.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <15556.37718.823408.23363@lap1-wccf.mit.edu>; from tec@ai.mit.edu on Mon, Apr 22, 2002 at 06:48:54PM -0400 Status: RO Content-Length: 8165 On Mon, Apr 22, 2002 at 06:48:54PM -0400, Tom Cavin wrote: > I've been following this discussion (and reading the papers from LISA XV) > and I think I have a different perspective that may prove useful in these > discussions. Steve and Joel's original infrastructures paper (LISA XII) > deals with trading room systems in an environment where the systems and > needs are both well defined and centrally controlled. At the other > extreme, Mark Burgess is in an academic environment where the systems and > needs are not well defined or are fairly volitile and where central control > is very limited. > > These two different environments have lead to two different approaches to > infrastructures. Exactly, Tom. The job of academic environments is to teach and do research. But academic computing environments rarely try to intentionally replicate the operating procedures needed for mission-critical business reliability. A student's time isn't considered as valuable as, say, a trader's. So student administrators are inadvertently taught to treat users poorly, and to create and administer ad-hoc systems with lower reliability and higher labor requirements than what a business would normally expect. Academic-based research focuses on fixing the inherent security and reliability problems of these same ad-hoc, indeterminate environments. As a result, university research efforts tend to try to solve the problems of university computing, not those of mission-critical business environments. This extends beyond universities, to government research labs, due to their academic roots. My favorite recent example is when a local lab performed extensive planning and analysis for the upgrade of a heavily-used login server. This planning incorporated the idea of a three-day shutdown during the upgrade. Horrified on general principle, I proposed some alternatives, but didn't expect to make a dent. The extended outage took place as planned. This sort of upgrade path would be unacceptable in a mission-critical business environment. The interns in that lab learn very bad habits. These habits will cut into their quality of life -- they simply aren't gaining the tools they need to be more effective. And so the systems administrators produced by our educational institutions must be retaught when they land in a business environment. I've long recognized that the "best" administrators often have no formal C.S. training at all -- they weren't formally taught ad-hoc habits. There is no strong feedback loop in place to correct this mismatch between education and workplace. For there to be one, academic computing would need to be more open to those of us on the receiving end of their efforts. A common response I get when I mention mission-critical reliability is "we can't afford that", as if I were talking about *spending* money. I'm talking about saving it. A well-managed computer costs a certain amount of money, and gives you a certain return on that investment. If you buy the same computer, and then manage it badly, it will cost you more in labor and provide a lower ROI. Which is cheaper? Sure, expensive hardware solutions like HA and SAN will help you squeeze out a few more hours of uptime per year. But in most cases, procedural changes will avoid most outages. > Needless to say, the tools needed to manage these two cases are likely to > be rather different. While both tools are designed to manage the > infrastructure, the goal of Isconf is to take systems from one known state > to another in an orderly fashion, and the goal of Cfengine is to get a > system to a partially known state from an unknown state. Yes. And "partially known" is not a comfortable foundation for business operations. There's enough uncertainty in venture financing, product development, marketing, and staffing -- why add to it by messing with the reliability and predictability of the underlying infrastructure? Years after launch is way too late to try to get a grip on a business -- this is true whether we're talking about infrastructure controls or financial controls. > It is also the case that much of the published research since Steve and > Joel's LISA XII paper has been done in the academic world. I suspect it will continue to be, until we ourselves do something about it. The fact is, most professionals just don't have the time or energy to write down what they're doing, and most employers will neither give permission to publish, nor pay the author to do it on company time. This heavily weights the ratio of academic versus professional authors who are able to publish at LISA, for instance. > The whole concept of an infrastructure is a really sweet idea, and that > first paper was a real eye-opener for many folks, especially the academic > sys-admins and CS faculty. The fact that there is a real, practical > solution to the problems of maintaining large groups of systems in a > scalable manner in one environment seems to have acted as a challenge to > people to extend this solution to other environments. No one in academia > seems to have looked at the problem since MIT's Project Athena (which gave > us X windows, Hesiod - a precursor to LDAP, Zephyr - a precursor to instant > messaging, simple unattended installation, and automatic unattended updates > of systems) in the late 1980s and early 1990s. Athena, though still used > at MIT, never caught on elsewhere. I think the reason for this is that > there is a steep learning curve to get everything running all at once. The > infrastructures paper introduced something that is less ambitious than > Athena, and more comprehensible. It's interesting you should mention Athena -- it was one of my own inspirations for the work we described in that paper. I agree; Athena was a wonderful exception. In our own lab we played with parts of Athena, like Hesiod or Zephyr, or of CMU's work, like SUP or AFS. We knew we couldn't roll most of these out to production, due to conflicting legacy and/or user requirements. SUP is the only one of these which actually made it into production, and we used it heavily (it's still a better technical fit than anon rsync.) We were intentionally making the time to do masters-equivalent research, exploring existing art and figuring out why it worked and why it didn't, and then building on what we'd learned. We *made* the opportunity to do this; on the org chart were were just a small admin group. It meant a lot of late nights. For several months, I carried a whole bundle of past LISA papers around, reading them on the train. The whole goal was to implement fully automated administration using open standards with as little complexity as possible. In hindsight, I think that we were lucky to hit on the makefile thing early on -- and I probably wouldn't have tried that if I hadn't just left USL. The UNIX source code makes extensive use of VPATH in makefiles, to handle different hardware platforms. I thought we could do something similar at the admin layer. That never worked out, but the makefiles themselves stayed and grew. * * * I think there's a larger story here, something more fundamental missing from systems administrator education: a business foundation. In today's businesses, the systems administrators daily make decisions which have a fundamental impact on the ability of the business to function. As clerical jobs have been automated and migrated into systems, systems administrators have become defacto business administrators, making sure that business information flows from source to sink. They usually do this in isolation from the executive team, and often don't have the background or authority to incorporate business requirements into these decisions. This translates into less effective businesses and limited career paths for IT professionals -- a CIO rarely makes CEO. Again, fixing this for the whole industry is going to require fundamental academic change, starting way down at the undergrad level. Steve -- Stephen G. Traugott UNIX/Linux Infrastructure Architect, TerraLuna LLC stevegt@TerraLuna.Org http://www.stevegt.com From stevegt@TerraLuna.Org Tue Apr 23 22:41:35 2002 Date: Tue, 23 Apr 2002 22:41:35 -0700 From: Steve Traugott To: Tom Cavin Cc: infrastructures@terraluna.org Subject: Re: [Infrastructures] Re: The Turing Synthesis and Automated Administration Message-ID: <20020423224135.B23959@scramjet.TerraLuna.Org> References: <200204211901.MAA01482@roton.terraluna.org> <15556.37718.823408.23363@lap1-wccf.mit.edu> <20020423201514.A23959@scramjet.TerraLuna.Org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20020423201514.A23959@scramjet.TerraLuna.Org>; from stevegt@TerraLuna.Org on Tue, Apr 23, 2002 at 08:15:14PM -0700 Status: RO Content-Length: 2190 On Tue, Apr 23, 2002 at 08:15:14PM -0700, Steve Traugott wrote: > On Mon, Apr 22, 2002 at 06:48:54PM -0400, Tom Cavin wrote: > > I've been following this discussion (and reading the papers from LISA XV) > > and I think I have a different perspective that may prove useful in these > > discussions. Steve and Joel's original infrastructures paper (LISA XII) > > deals with trading room systems in an environment where the systems and > > needs are both well defined and centrally controlled. At the other > > extreme, Mark Burgess is in an academic environment where the systems and > > needs are not well defined or are fairly volitile and where central control > > is very limited. > > > > These two different environments have lead to two different approaches to > > infrastructures. > > Exactly, Tom. > > The job of academic environments is to teach and do research. But > academic computing environments rarely try to intentionally replicate > the operating procedures needed for mission-critical business > reliability. Just came back from dinner. Joyce looked at this previous message and said that she thinks it's not too bad, but that a lot of people are still going to disagree with me. I admit that a lot of it is worded strongly. Let me try again: - Tom's original, very-well-written point was that business and academia are very different environments and, as such, need to be managed differently, with different toolsets and procedures. I agree. - But after reading Tom's message, I had to ask myself "Why? Why are they different?" - Should they be the same? Should business environments remain ad-hoc, like academia, the way they have been for years? Or should academic computing catch up to what we've started doing in business, and become more structured, to better represent the environment most students will graduate into? - Or should the two remain different, in order to each serve their own needs more completely? If so, how do we educate systems administrators for business environments? Steve -- Stephen G. Traugott UNIX/Linux Infrastructure Architect, TerraLuna LLC stevegt@TerraLuna.Org http://www.stevegt.com From stevegt@TerraLuna.Org Fri Apr 19 18:11:33 2002 Date: Fri, 19 Apr 2002 18:11:33 -0700 From: Steve Traugott To: Jon Stearley Cc: "Michael J. Carter" , infrastructures@terraluna.org, luke@madstop.com, labrown@nc.rr.com Subject: Re: [Infrastructures] The Turing Synthesis and Automated Administration Message-ID: <20020419181133.A22600@scramjet.TerraLuna.Org> References: <20020419103809.A4927@scramjet.TerraLuna.Org> <20020419184634.GD4514@sandia.gov> <1019242735.32636.69.camel@guglielmo> <20020419201348.GF4514@sandia.gov> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20020419201348.GF4514@sandia.gov>; from jrstear@sandia.gov on Fri, Apr 19, 2002 at 02:13:48PM -0600 Status: RO Content-Length: 5326 Hi Jon! On Fri, Apr 19, 2002 at 02:13:48PM -0600, Jon Stearley wrote: > > be available, who decides when it needs to be used? > > > > I don't see how a self-administering host can know that it has deviated > > from the norm (presumably due to an "order error") and needs to start at > > square one. > > > > If the admin has to push reset, then the host isn't self-administering. > > absolutely true. but is fully/truely/super-duper self-administering > infrastructures really the goal? there are precious few (*if any*) > human creations for which periodic human resets are not necessary and > accepted practice. steve's points (and the whole discussion) is > extremely valueable, don't get me wrong. industry chooses something > along the lines of maximum effectiveness/(cost/benfit), that's all. > my 2c. I think this is a key place where my point has been missed a few times over the years. Self-administering can't be done halfway -- doing some things automatically and some things manually is the *most* expensive way to go, because you have the overhead of the automation as well as the indeterminate behavior of human error. It's the worst of both worlds. Fully self-administering is what I do. It's all I do. To me, it's not a "goal" any more -- it's instead been a practice for 8 years. Things are getting better, but for most of that 8 years I've felt like a 1905-era pilot being told that what he does is impossible because 'that thang can't never fly'. ;-) At the other extreme, I've seen some really skull-cracking analysis of how self-administering systems *might* be done at some point in the future. These are important from a research perspective, and I've gotten good ideas from them, but I've tried very hard to stay right in the middle, always sticking with tools and techniques usable in current production -- as a consultant, I can't get too far ahead or I don't get paid. As most of you know by now, all I've usually ever used is makefiles and some wrapper scripts that run at boot -- the sysadmin equivalent of cloth wings and piano wire. We can work together on building aluminum wings and jet engines, but in order for us to do that we first need to finally agree that human flight is feasible and desireable. I don't rely on "if all else fails, re-install". I can see how the "virtual machine" idea in the bootstrapping paper may have given this misimpression -- at one point I think I even use the phrase "rebuilding a virtual machine". Re-installing is a disaster-recovery response. It's expensive in labor, expensive in political capital unless there's actually been a disaster, and can be horribly expensive in terms of reliability and unscheduled downtime. From my perspective, jumpstarting machines to get them back into a known state is a worst-case scenario. Once you've crashed the airplane, you can rebuild it from scratch, but the damage you've done to the passengers by that point is irreversible. "Hey, I've just fubared all 2000 of the machines on the trading floor; can someone stop the markets for a few hours while I re-install them?" That's a situation I wanted to avoid at all costs. ;-) It's been 8 years, thousands of machines, and several organizations since I started doing deterministic, ordered changes, and I can't recall a time when I've ever re-installed the same operating system on a managed production desktop or server, short of hardware failure. Joel, can you think of anything? I suspect deterministic ordering is the airfoil of self-administering systems. If I'm right, then purposely randomized ordering, while fascinating, might be roughly equivalent to wing-flapping machanisms. It took centuries for aerodynamics experimenters to figure out and prove to each other that the flapping wasn't important, but the airfoil was. I think we may be in a similar situation here. I know that to encourage things to get out of sequence is to risk breaking production machines in such a way that users will notice. This is exactly what we're supposed to prevent. It's a sysadmin's version of the hippocratic oath; "First, do no harm". "But ordering isn't as important as long as you test your changes before putting them in production." Sure. But to do that, the production machines have to have been built *exactly* the same way that the test environment was, or else the test is invalid. The only way to ensure that test and production are built the same way is to, well, build them the same way, applying changes in the same order in each case. "But you can just test in production, and if the change broke something, you can always back it out. If the backout fails, then just re-install". If anyone still feels this way, then they should re-read the above paragraphs about reliability and downtime. Testing in production, and relying on scheduled downtime and backout windows, eats into uptime numbers and precludes 24x7 operation. A global economy has no "off hours". I think about this every time I'm waiting in line at the Hertz counter after arriving in Tampa at 2 a.m. on a Friday night, right smack in the middle of the Hertz scheduled maintenance downtime. Does any of this make sense? Steve -- Stephen G. Traugott UNIX/Linux Infrastructure Architect, TerraLuna LLC stevegt@TerraLuna.Org http://www.stevegt.com From stevegt@TerraLuna.Org Sat Apr 20 10:34:56 2002 Date: Sat, 20 Apr 2002 10:34:56 -0700 From: Steve Traugott To: Daniel Hagerty Subject: Re: [Infrastructures] The Turing Synthesis and Automated Administration Message-ID: <20020420103456.A24714@scramjet.TerraLuna.Org> References: <20020419103809.A4927@scramjet.TerraLuna.Org> <20020420161429.120E61BA1B@perdition.linnaean.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20020420161429.120E61BA1B@perdition.linnaean.org>; from hag@linnaean.org on Sat, Apr 20, 2002 at 12:14:28PM -0400 Status: RO Content-Length: 3032 Hi Dan! On Sat, Apr 20, 2002 at 12:14:28PM -0400, Daniel Hagerty wrote: > > *No* language or syntax can completely specify administrative actions > > to be applied to a machine without also specifying the complete > > history and order of previous changes. > > Are you trying to tell me that wasn't obvious to all of you? It's something I've always assumed, but I was really surprised myself to see that at least three very key people in LISA leadership hadn't yet got that far. I heard over and over during this past LISA that "a sufficiently descriptive language" would solve any problem, removing the need for strict ordering etc. It's taken me the last few months since then to marshal the synapses enough to make a case. > I've had that in the back of my mind for at least two years now in > building my local half assed infrastructure based on your ideas. It > was very clear that the total state of the machine is very important > to ongoing mutation. No, I haven't actually done work towards this, I > just noticed it in how my infrastructure was failing. ;-) > As an example, if your system doesn't track it's full state, how > does it know when a human has come along and performed an uncontrolled > mutation? How do/would you track full state? I've thought about how nice it would be to use a loadable kernel module to intercept open() calls etc. That's the only way I've thought of to really track *all* of the changes made to a machine in real time. I could do it for Linux, but someone would have to write an equivalent module for every other stinking version of every variant of UNIX. That still doesn't solve the problem of what to *do* with that information -- if someone edits /etc/hosts on a target machine, would you propagate the new version to all other machines, or store it on the gold server as a machine-specific variant for that host only? Or just blow up in their face and prevent them from editing in the first place? So I haven't yet tried to track out-of-band changes. I use the rules of "don't change things manually" and "if a human changes something then it's a security problem and needs to be dealt with accordingly". This gets the point across, and I've noticed people accepting this more readily just within the last year. In production environments people always get the idea real quick. I think the biggest resistance comes from campus admins who are used to using student labor. But I think that some kernel-level support is going to be needed to ultimately solve the problem. > I didn't make last lisa because of funding issues, and this is now > the second thing I've hard that I'm very sorry I missed. Last year was the first time I was there since 1998 -- it's changed for the better, but that just means that the debate has been elevated to new levels. At least people are finally over the push vs. pull thing. ;-) Steve -- Stephen G. Traugott UNIX/Linux Infrastructure Architect, TerraLuna LLC stevegt@TerraLuna.Org http://www.stevegt.com From stevegt@TerraLuna.Org Sat Apr 20 14:44:34 2002 Date: Sat, 20 Apr 2002 14:44:34 -0700 From: Steve Traugott To: Dylan Northrup Cc: Jon Stearley , "Michael J. Carter" , infrastructures@TerraLuna.Org Subject: Re: [Infrastructures] The Turing Synthesis and Automated Administration Message-ID: <20020420144434.A6759@scramjet.TerraLuna.Org> References: <20020419181133.A22600@scramjet.TerraLuna.Org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from docx@io.com on Fri, Apr 19, 2002 at 09:29:23PM -0500 Status: RO Content-Length: 6049 On Fri, Apr 19, 2002 at 09:29:23PM -0500, Dylan Northrup wrote: > An infinite number of monkeys in the guise of Steve Traugott wrote: > > :=I think this is a key place where my point has been missed a few times > :=over the years. Self-administering can't be done halfway -- doing > :=some things automatically and some things manually is the *most* > :=expensive way to go, because you have the overhead of the automation > :=as well as the indeterminate behavior of human error. It's the worst > :=of both worlds. > > I disagree. It's a law of diminishing returns. The closer you get to fully > automating your system, the harder it becomes to completely automate away > the rest of it. It's the computer version of Xeno's Paradox where you > really *can* only get half the way through what's left. We have to be using different meanings of the term "automate". Automated systems administration is very straightforward. There is only one way to change the contents of disk or RAM in a running UNIX machine -- the syscall interface. The task of automated administration is simply to make sure that each machine's kernel gets the right system calls, in the right order, to make it be the machine you want it to be. There are no system calls used in the course of administration which can't be issued from a shell script, a perl script, or (in very rare cases) a piece of C code. This makes the process of putting automation in place itself very straightforward. In my case, rather than typing a command at a shell prompt, I put it into a makefile stanza. If it's a complex series of commands, with conditionals and loops etc., then it goes into a script, the script goes into CVS, and the script name goes into the makefile. Putting the CVS and makefile framework in place is itself also very straightforward, as we just proved at Caterpillar earlier this year -- Luke took 6 hours to go from scratch to a running installation of isconf, with me looking over his shoulder. Luke is an exceptionally smart guy, (Hi Luke!) ;-P But I think just about anyone on this list should be able to get similar results. I was there for a week; the first 4 days were spent on political buy-in, only the last on technology. This is important -- the only major hurdles I have ever run into for the last few years are political rather than technological. The major improvement I've had to make in myself is in learning how to show both management and sysadmins that this stuff can be easy. People look for ways to prove the viewpoint they hold. This can happen unconsciously. They may want to automate, but their own perception of the difficulty can block them, even to the point of convincing their own management that it's not worth trying. And so the industry consensus is that fully automated administration is hard. This consensus must be broken on a per-site basis, at least until we reach a critical mass of automated sites. In my experience, the more tasks you automate, the easier it gets to automate the rest. > Also, if you really *can* automate *everything* then you're not solving > interesting problems, or your problems aren't evolving, or your customers > are very boring and completely non-standard indeed. I find that once you've > conquered/automated a particular problem, there's always one more problem to > solve . . . a wrinkle I hadn't thought of or a new requirement I didn't know > about. You can never fully automate a sufficiently complex system. . . > especially when users are involved ;-) I separate problem-solving and corrective action from each other. Problem solving is an intelligence function. This is the job of humans. Corrective action is a manipulative function. This is the job of code. A systems administrator solves a problem and implements corrective action via a shell prompt, on one or more machines, making the corrective action more or less non-repeatable. An infrastructure architect solves a problem and implements corrective action in repeatable code, which then executes on all applicable machines. Execution is within a control framework (like make) and "applicable" is defined by configuration files within that framework (hosts.conf in isconf version 2i). > Now, this isn't to say you shouldn't automate as much as possible. Fix a > problem once and either solve the problem completely or automate the work > around. Your toolkit should always be expanding. But the tools won't do > the job without proper guidance from the mechanic. ISconf hasn't needed to expand very much over the last 8 years, unless you count the makefile stanzas themselves. It *has* been rewritten a few times though, as I got better at understanding what it does. I jotted some notes on my HP palmtop on the train into NYC in early 1994 and rolled a makefile and 'configure' script out to a few hundred machines globally that same year, thinking "this thing's a hack but it'll help me get a handle on the chaos". That "hack" worked real well compared to everything else I've tried before and since, and it's taken me a long time to figure out exactly why. I think I only really grokked it in full during the last year. It's funny -- I had looked at cfengine in 1994, and almost used it instead. It wasn't until years later, in 1998, that I *tried* to use cfengine as a replacement for isconf and realized that the two solve orthogonal problems and aren't interchangeable with each other. The only major addition I've needed to add so far is some perl networking code to solve the "barrier" problem of synchronizing administrative events on two or more machines at a time, such as when building nodes in an HA cluster. But the core mechanism is still just plain old, simple make. The only innovation there has been to use GNU make, to get conditionals into the makefiles themselves. The Perl rewrite might use Make.pm or a derivative. Steve -- Stephen G. Traugott UNIX/Linux Infrastructure Architect, TerraLuna LLC stevegt@TerraLuna.Org http://www.stevegt.com From hag@linnaean.org Sat Apr 20 11:57:44 2002 Received: from perdition.linnaean.org (h00d0b71b83ad.ne.client2.attbi.com [65.96.132.240]) by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id LAA25256 for ; Sat, 20 Apr 2002 11:57:44 -0700 Received: by perdition.linnaean.org (Postfix, from userid 31013) id 7A2931BA1E; Sat, 20 Apr 2002 14:57:47 -0400 (EDT) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit To: Steve Traugott Subject: Re: [Infrastructures] The Turing Synthesis and Automated Administration In-Reply-To: <20020420103456.A24714@scramjet.TerraLuna.Org> References: <20020419103809.A4927@scramjet.TerraLuna.Org> <20020420161429.120E61BA1B@perdition.linnaean.org> <20020420103456.A24714@scramjet.TerraLuna.Org> X-Mailer: VM 6.34 under Emacs 20.7.1 Reply-To: Daniel Hagerty From: Daniel Hagerty Sender: Daniel Hagerty Date: Sat, 20 Apr 2002 14:57:47 -0400 Message-Id: <20020420185747.7A2931BA1E@perdition.linnaean.org> Status: O Content-Length: 3079 > yet got that far. I heard over and over during this past LISA that "a > sufficiently descriptive language" would solve any problem, removing > the need for strict ordering etc. It's taken me the last few months > since then to marshal the synapses enough to make a case. The first half of the sentence makes sense. Since strict ordering strikes me as part of the problem, the second half does not, or rather, ordering is part of what the language describes. > How do/would you track full state? I've thought about how nice it > would be to use a loadable kernel module to intercept open() calls > etc. That's the only way I've thought of to really track *all* of the > changes made to a machine in real time. I could do it for Linux, but > someone would have to write an equivalent module for every other > stinking version of every variant of UNIX. As you note, it's truely intractible without instrumenting at a pretty low level. For the most part, I want to cheat -- the system knows what it did and should be able to detect a mutation that it didn't do, and declare fault. While some things may be manageable as a merge deal, I don't want to go there. > That still doesn't solve the problem of what to *do* with that > information -- if someone edits /etc/hosts on a target machine, would > you propagate the new version to all other machines, or store it on > the gold server as a machine-specific variant for that host only? Or > just blow up in their face and prevent them from editing in the first > place? That's a policy issue. There's a lot of room to argue about how to do things, and what works for one situation may not work for others. > But I think that some kernel-level support is going to be needed to > ultimately solve the problem. I don't think it's wise to predicate everything on that. We should develop techniques to manage in the presence of a system that isn't as cooperative as we like; I know oracle doesn't listen to me when I tell them that everybody would benefit if they'd do "this". Remember that microsoft isn't going away anytime soon, and we have to push their boxes around too. I really doubt their ideas of where to take management are fully in line with ours. > Last year was the first time I was there since 1998 -- it's changed > for the better, but that just means that the debate has been elevated > to new levels. At least people are finally over the push vs. > pull thing. ;-) No, not entirely. Have you looked arusha? I haven't looked in depth, but some of the admin specified object model stuff they've done looks very interesting. But at the core of it all is this "we're push vs pull neutral" thing that just has world of hurt written all over it. Sigh. Anyway, I'll catch up on the rest of infrastructures soon. This was a hell week of deadlines, and now that midnight friday has passed, I can start cleaning up the damage I did to make externally reinforced business and telco realities happen under completely unrealistic constraints :-) From jrstear@sandia.gov Thu Apr 25 16:15:19 2002 Received: from mm02snlnto.sandia.gov (mm02snlnto.sandia.gov [132.175.109.21]) by roton.terraluna.org (8.9.3/8.9.3) with SMTP id QAA02519; Thu, 25 Apr 2002 16:15:19 -0700 Received: from 132.175.109.1 by mm02snlnto.sandia.gov with ESMTP ( Tumbleweed MMS SMTP Relay (MMS v4.7)); Thu, 25 Apr 2002 17:14:55 -0600 X-Server-Uuid: 95b8ca9b-fe4b-44f7-8977-a6cb2d3025ff Received: from mercy.sandia.gov (mercy.sandia.gov [134.253.242.218]) by sass165.sandia.gov (8.12.3/8.12.3) with ESMTP id g3PNFXP7015748; Thu, 25 Apr 2002 17:15:33 -0600 (MDT) Received: from jrstear by mercy.sandia.gov with local (Exim 3.33 #1 ( Debian)) id 170sSZ-0007Xh-00; Thu, 25 Apr 2002 17:15:31 -0600 Date: Thu, 25 Apr 2002 17:15:30 -0600 To: "Steve Traugott" cc: infrastructures@terraluna.org, luke@madstop.com, labrown@nc.rr.com Subject: Re: [Infrastructures] The Turing Synthesis and Automated Administration Message-ID: <20020425231530.GI2481@sandia.gov> References: <20020419103809.A4927@scramjet.TerraLuna.Org> MIME-Version: 1.0 In-Reply-To: <20020419103809.A4927@scramjet.TerraLuna.Org> User-Agent: Mutt/1.3.24i From: "Jon Stearley" X-Filter-Version: 1.8 (sass165) X-WSS-ID: 10D652651816631-01-01 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: 7bit Status: O Content-Length: 1063 > Self-administering hosts are in effect Turing machines; they do brain > surgery on themselves. They don't behave according to strict > Von-Neumann rules, but have complex feedback loops, recursion, and > self-modifying code. > > *No* language or syntax can completely specify administrative actions > to be applied to a machine without also specifying the complete > history and order of previous changes. haven't been able to keep on the whole thread but wanted to say that after more though, i realize that my infrastructure does indeed have a very extensive state graph, full of dependancies and ordering. in my case, i chose debian which has a very extensive pkg dependancy system which runs nightly as directed by cfengine. cfengine also does the relatively cosmetic stuff (selecting which packages should be installed, their site-specific configuration, misc symlinks and other config, cleaning and etc - i LOVE cfengine), but without apt-get doing the brain surgery i would be _much_afraid_. so "you go" steve, this is an excellent topic! -jon From couch@eecs.tufts.edu Fri Apr 19 13:57:42 2002 Received: from lin17.eecs.tufts.edu (IDENT:postfix@lin17.EECS.Tufts.EDU [130.64.23.47]) by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id NAA15910 for ; Fri, 19 Apr 2002 13:57:42 -0700 Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40]) by lin17.eecs.tufts.edu (Postfix) with ESMTP id 44D5B1FEA8 for ; Fri, 19 Apr 2002 16:57:53 -0400 (EDT) Received: by piano.eecs.tufts.edu (Postfix, from userid 30) id E4D2ECFC4; Fri, 19 Apr 2002 16:57:52 -0400 (EDT) To: stevegt@terraluna.org Subject: ordering Message-Id: <20020419205752.E4D2ECFC4@piano.eecs.tufts.edu> Date: Fri, 19 Apr 2002 16:57:52 -0400 (EDT) From: couch@eecs.tufts.edu (Alva L. Couch) Status: RO X-Status: A Content-Length: 2789 I've read http://www.infrastructures.org/papers/turing.html at the urging of my staff who are following the infrastructures list. I think at the outset that I agree somewhat with the conclusions but not with the mathematics. There are some mistakes. - if you have N things over all orderings, there are N! possible outcomes, not N^2. The N^2 in my paper is an embedding theorem, not an outcome theorem. While all N! permutations can be embedded in a sequence of length N^2, all N! permutations may have unique outcomes as sequences. The reduction to N^2 outcomes comes from strong convergence properties of the N operations, something that I know you don't think is practical to assure. - One thing that you ignore is the concept of orthogonality. Two systems are orthogonal if changes to one can't affect the other, and vice versa. The effect of orthogonality is that operations on orthogonal subsystems always commute. E.g., a*b = b*a. The effect of orthogonality is to reduce the space of possible sequences. E.g., I think everyone would agree that the 'networking subsystem' is orthogonal to vi. Changes to vi can't affect networking, and vice versa. Thus the results of a change to both is the same in either order. - My paper does not advocate randomization of ordering except in cases where that ordering exposes an unknown and otherwise unobservable factor in existing configuration. Frode's paper, however, utilizes randomization as an obscurity mechanism. If you look carefully at my assumptions, you'll see that one can prove that my lack of ordering is *unobservable* in the final configuration. That's how strong my assumptions are. I have been working on the concept of ordering more than you realize, but can't as the chair publish a paper on it this year. In particular, I have designed a formalism that portrays the behavioral effects of configuration, then analyzed the structure of operations on behavior as a semigroup. This semigroup can be decomposed by the factor theorem into subsemigroups describing changes to disjoint subsystems. This fancy math demonstratese that storing the complete ordering is 'more work than is required', though cfengine stores 'less than is required'. All one has to do is to serialize the changes to orthogonal subsystems. Since operations on orthogonal subsystems commute, one can always express them as groups of operations on one subsystem at a time. You're onto something but have to be careful about the math. You haven't convinced me that equivalence of the behavioral effects of randomly ordered configuration operations is undecidable. Just complex. You might be able to do this. Feel free to keep up correspondence. I think there is a paper here. -alva From couch@eecs.tufts.edu Fri Apr 19 17:22:56 2002 Received: from lin16.eecs.tufts.edu (IDENT:postfix@lin16.EECS.Tufts.EDU [130.64.23.46]) by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id RAA25355 for ; Fri, 19 Apr 2002 17:22:56 -0700 Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40]) by lin16.eecs.tufts.edu (Postfix) with ESMTP id 7431047C29 for ; Fri, 19 Apr 2002 20:23:06 -0400 (EDT) Received: by piano.eecs.tufts.edu (Postfix, from userid 30) id 0706BCFC4; Fri, 19 Apr 2002 20:23:05 -0400 (EDT) To: stevegt@terraluna.org Subject: more about turing equivalence. Message-Id: <20020420002305.0706BCFC4@piano.eecs.tufts.edu> Date: Fri, 19 Apr 2002 20:23:05 -0400 (EDT) From: couch@eecs.tufts.edu (Alva L. Couch) Status: RO X-Status: A Content-Length: 3047 I took a short walk and reflected on your paper some more. a) Cfengine's configuration operations aren't turing equivalent, nor are the operations posited by maelstrom. Both are "state machines" in the language of the Chomsky hierarchy. Recall that there are three levels of complexity in the Chomsky model, provably different: i) state machines/regular expression grammars. Most cfengine configurations live here. It takes some trouble to make one that isn't a state machine. Likewise, makefiles can usually be REDUCED to state machines by looking at what they do on each architecture and codifying that as state transition rules. ii) pushdown automata/context-free grammars. Right now I know of no configuration management tools in this category. iii) turing machines/unconstrained grammars. The actual machines being configured live here. This is where undecidability may lurk. This makes the CONFIGURATION model simpler than the EXECUTION model. While the execution model remains turing equivalent, the configuration model is fundamentally simpler in structure. b) Even with this simplification, the sequencing problem remains. It is easy to construct a sequence of invocations of N state machines such that only the order x1..xn gives the appropriate outcome on the turing machine tape. This is the same thing as what you're saying; without more simplifications, the outcome of every order is potentially different. c) Determining an appropriate order for arbitrary state machines to achieve a given outcome for configuration is INTRACTIBLE but not UNDECIDABLE. Hard but not impossible. All one has to do is to try all the orders and choose the one that produces the appropriate outcome. It becomes undecidable if other processes are SIMULTANEOUSLY modifying configurations, asynchronously. d) Determining whether an appropriate order achieves a particular behavioral outcome remains UNDECIDABLE. This is just another flavor of Church's thesis. I think the real issue here is that VALIDATION of behavior only applies to a specific sequence of operations. If one changes the sequence of operations, then one must prove that the other sequence reproduced the same disk state, bit for bit, before one can claim that the prior validation applies to the new state. This is a claim of "transitivity of validation", the same claim utilized by the Linux Standard Base (LSB). Now a) In the absence of further assumptions, whether two different configurations exhibit the same behavior is UNDECIDABLE(can prove). b) so whether two different orders of configuration operations exhibit the same behavior remains undecidable by lifting. The most important force in the world to system administrators is not configuration management, but configuration validation. It is this validation that suffers from turing equivalence problems, not the act of configuration itself. I'll keep thinking about this. -alva From couch@eecs.tufts.edu Thu Apr 25 06:06:56 2002 Received: from lin16.eecs.tufts.edu (IDENT:postfix@lin16.EECS.Tufts.EDU [130.64.23.46]) by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id GAA08449 for ; Thu, 25 Apr 2002 06:06:56 -0700 Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40]) by lin16.eecs.tufts.edu (Postfix) with ESMTP id 0CFB447BFD for ; Thu, 25 Apr 2002 09:07:11 -0400 (EDT) Received: by piano.eecs.tufts.edu (Postfix, from userid 30) id 95FC7CFC4; Thu, 25 Apr 2002 09:07:10 -0400 (EDT) To: stevegt@TerraLuna.Org Subject: Re: ordering Message-Id: <20020425130710.95FC7CFC4@piano.eecs.tufts.edu> Date: Thu, 25 Apr 2002 09:07:10 -0400 (EDT) From: couch@eecs.tufts.edu (Alva L. Couch) Status: RO X-Status: A Content-Length: 13545 > On Fri, Apr 19, 2002 at 04:57:52PM -0400, Alva L. Couch wrote: > > I think at the outset that I agree somewhat with the conclusions but > > not with the mathematics. There are some mistakes. - if you have N > > things over all orderings, there are N! possible outcomes, not N^2. > > Of course. My mistake. Fixed. But I was talking about number of > possible disk states, given an N-bit disk. That would be 2^N possible > disk states. I had it backwards -- silly error. Thank you. > > On the other hand, you make a very good point -- ordering of > operations is itself numerically interesting, because of that > factorial. So I'm adding that, and elaborating below. > > An HACMP cluster node build is around 121 discrete operations, for > instance, if you only count the number of stanzas in the ISconf > makefile. That would be 121!, or 8.09*10^200 possible orderings. > > These produce a disk content which uses 2Gb of space, and has > 2^(8*2*10^9) possible bit states. I don't yet know what magnitude > that number is -- I've had a perl one-liner chewing CPU for several > hours so far. > > > - One thing that you ignore is the concept of orthogonality. Two > > systems are orthogonal if changes to one can't affect the other, and > > vice versa. The effect of orthogonality is that operations on > > orthogonal subsystems always commute. E.g., a*b = b*a. The effect > > of orthogonality is to reduce the space of possible sequences. > > E.g., I think everyone would agree that the 'networking subsystem' > > is orthogonal to vi. Changes to vi can't affect networking, and > > vice versa. Thus the results of a change to both is the same in > > either order. > > Yes, but in practice, who decides whether, or to what degree, two > subsystems, or operations on them, are related? For example, the > latest version of VIM does contain networking code, to support remote > slaved edit sessions. I would never have known about this possible > dependency if I hadn't read the docs. > > I suspect the most reliable and fast way to ensure that two sequences > produce the same outcome is via regression testing. As the number of > possible sequences increases, so does the required number of > regression test iterations. The only way to assure that two subsystems are orthogonal is by design. Testing cannot assure that. This is a fundamental premise of software testing. I think that this is the ONLY way to assure that a realistic build is reliable. But worse, there is NO way to assure the build is consistent across an enterprise of hosts, other than to assure homogeneity of operations and even perhaps hardware. > Going back to that HACMP build, a full regression test run to > determine equivalency of all possible orderings is problematic to say > the least; call it 1 hour for each rebuild, that's 9.24*10^196 years. > (Had to install Math::BigInt to get that.) ;-) Kind of cute but perhaps irrelevant. I claim that we know that some orderings are equivalent, by design. The problem is that we have no information about order invariance except by design. If two subsystems are designed not to interact, e.g., they're in separate boxes and controlled by separate drivers and configuration, then this means that the configuration operations for the subsystems commute with one another. Instead of N!, one is left with N1! * N2! * .... * NK! where N1,N2,...,NK are the sizes of sets of operations for the subsystems. > Testing an HACMP cluster actually takes several hours after the build > itself is done. That extra time would only hurt us by 1 additional > order of magnitude, so I'm neglecting that. > > Looking at the HACMP makefile dependency trees, 61 of the operations > are "leaves". Only a few of those would be orthogonal to each other. > But let's assume that they all are. We know that order matters in the > remaining 121-61 = 60 operations. So that lowers us to 9.5*10^77 > years of regression testing to determine which sequences of those 60 > operations are equivalent. Kind of cute but probably irrelevant. We need to know instead what the perturbational effects are. We know one ordering that works. What does that tell us about other orderings that work? Knowing that one works is a lot of information... > Of course, much of that regression testing could be eliminated by > pruning "dead branches", sequences which include subsequences which > are known to be bad. Let's say that 99.999% can be instantly culled > by recognition, before testing. That still leaves 9.5*10^72 years, > unless I've slipped a decimal somewhere. > > Let's say disk, network, and CPU speeds increase by a couple orders of > magnitude, and those 1-hour builds each take 1 second instead. That > brings us down to 9.5*10^69 years to process the culled order space. > > But this all presupposes that we already know the entire set of > operations to be performed. When doing builds or upgrades this is > never the case. The actions needed are always discovered in the > course of growing an infrastructure. In the case of ISconf, this > discovery is by the person who's writing and debugging the makefile; > "Oh, looks like bar needs to be installed before I can install foo; I > didn't know until now that we even needed bar. Ok, I'll add bar to > the makefile as a prereq for foo." > > This also presupposes that discovering a viable order of actions is > hard, and that exhaustive exploration of possible orderings is a > desireable way to yield an optimum outcome. In practice, the most > viable order is the order in which the operations were discovered and > executed in the first place. At every point in this process, we > already have an order that will work. Yes. Discovering a valid ordering is the same as Church's thesis/the halting problem. The reason is that each time you create an ordering, you potentially program the machine differently as a result, and have the validation problem that's hard. > And this also presupposes that the list of operations is a fixed set, > regardless of order. In reality, changing the order will require > changing the membership of the set -- for example, failing to mount > /opt before populating /opt/foo will require a later migration onto > the 'real' /opt. This migration is an additional operation which > would have to be discovered and added to the set. Yes. > This is all reaching the point from an extreme direction; the point > being that, for building or maintaining a given host class, it's > easiest in practice to constrain the ordering to the one known, > working sequence, test the heck out of it, and only add new actions to > the end of the sequence. > > New host classes can then be created based on variations of the known > good sequence. In ISconf, new host classes get created by creating > new makefile stanzas, naming ordered sets of existing stanzas as > prereqs. In this case, we benefit from known working subsequences, > and the internal ordering of those subsequences is preserved. Yes. This is perturbational analysis, as noted above. > > I have been working on the concept of ordering more than you > > realize, but can't as the chair publish a paper on it this year. > > In particular, I have designed a formalism that portrays the > > behavioral effects of configuration, then analyzed the structure > > of operations on behavior as a semigroup. This semigroup can > > be decomposed by the factor theorem into subsemigroups describing > > changes to disjoint subsystems. > > > > This fancy math demonstratese that storing the complete ordering is > > 'more work than is required', though cfengine stores 'less than is > > required'. All one has to do is to serialize the changes to > > orthogonal subsystems. Since operations on orthogonal subsystems > > commute, one can always express them as groups of operations on one > > subsystem at a time. > > I think, after looking at it over the last few years, that this is a > lot of what make is doing for ISconf. I preach that targets should > only get added to the end of prereq lists when editing a makefile. > This means that changes within orthogonal subsets are always > deterministically serialized. Yes. Actually if you do this you don't even need to know anything about orthogonality. > More importantly (I think), human error in determining orthogonality > tends to be masked by the deterministic way that make iterates over > multiple prereqs in the same sequence every time. This way, even > though two actions might not be explicitly linked in the dependency > tree, they are nevertheless serialized identically on every run, > removing the possibility of indeterminate behavior. Agreed. > For example, the deepest dependency graph explicitly coded in that > HACMP makefile is 8. There are deeper HACMP dependencies not > explicitly coded; they are being maintained by make's deterministic > prereq list behavior. I suspect the actual depth is at least 30. If > I were to rearrange the order of any of those non-explicit prereq > lists, experience tells me that in most cases the build will break. This is important to note and document. Tell us a little about what breaks and how subtle the breaks are. > If make's behavior were indeterminate, if it randomized the order of > things not explicitly stated as dependencies, or if I were to run make > with a -j flag, allowing it to parallelize, experience tells me that > the resulting outcome would be indeterminate and non-reproducable. At > best, the build would cleanly break. At worst, the build would > complete, and I would need to do extensive testing to determine > whether the machine is actually capable of doing its job. This is important to note and document. I would say not that "make" is necessary, but that "serialization" is necessary, and I can probably simplify your syntax with a simpler tool than make... > My strongest driver is the need to remove the possibility of > indeterminate behavior -- it's a business requirement. When I'm the > consultant flipping the switch to demonstrate to the customer what I > can do, indeterminate behavior is the greatest risk I can think of. > It doesn't matter how long the build or upgrade takes, as long as it's > less than an hour or two. It doesn't matter if the predictable, > plodding nature of the build introduces security holes at > predetermined times; I haven't seen that exploited yet and don't > expect to, not for a few more years. But I have seen unforseen > dependencies between subsystems cause build failures; it's a common > failure mode. Yes. Another driver is enterprise-wide consistency. In a Microsoft world your result is obvious to anyone who's tried to configure a complex network binding with Win2k. > > You haven't convinced me that equivalence of the behavioral effects of > > randomly ordered configuration operations is undecidable. Just > > complex. You might be able to do this. > > I think of it from the other direction -- beyond a certain number N, > it's infeasible to prove empirically that N configuration operations > can be randomly ordered and produce the same behavioral outcome. N > appears to be as low as 8 or 9. This is an intractibility argument, not an undecidability argument; you don't need turing equivalence for this. In fact, it follows even if the machines being CONFIGURED are simple state machines! This is what bothered me about your math. Turing equivalence is irrelevant. This is the known cyclomatic complexity limit for McCabe's white-box basis testing algorithm. I don't think this is a coincidence. > Genetic algorithms must have a fitness function which executes > quickly. If the fitness function takes too long, then the time > required to arrive at a solution is prohibitive. Building and testing > a host is a 1-hour fitness function. > > For example, with a DNA size of only 8 operations, executing a 1-hour > fitness function for each combination requires 8!/24/365 = 4.6 years. Yes, if you presume that you have "zero knowledge" of sequencing. But you don't have "zero knowledge". This simplifies things a lot. > What am I missing? There's only one thing that I can see that you're missing. Some subsystems are orthogonal to all others and even order-invariant by design. This means that they can be configured in an order separate from other orderings, or even configured in an order-invariant way. In order to verify this one has to know that: a) the subsystem you're configuring isn't turing equivalent (i.e., you're not "programming" anything). Most good ones are state-machine equivalent, where all you're doing is adding or removing states. b) the subsystem is memoryless ("non-markovian"). c) the software basis for the subsystem is invariant. A good example of this would be DHCP. I claim that DHCP is order-invariant provided that the software that one uses to run it (e.g., dhcpd) doesn't change, and that it doesn't matter how many times one enables or disables it. The reason for this is that the regression test has already been done. I claim that this kind of invariance is a new kind of "modularity" specific to system administration. If one constructs modules of this sort, then order doesn't matter outside the modules. If one constructs modules appropriately, order doesn't matter inside them, either. "This is the way to construct systems". But current systems aren't constructed that way. -alva From couch@eecs.tufts.edu Thu Apr 25 10:01:04 2002 Received: from lin19.eecs.tufts.edu (IDENT:postfix@lin19.EECS.Tufts.EDU [130.64.23.49]) by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id KAA02563 for ; Thu, 25 Apr 2002 10:01:04 -0700 Received: from darkmatter.eecs.tufts.edu (darkmatter.eecs.tufts.edu [10.3.1.7]) by lin19.eecs.tufts.edu (Postfix) with ESMTP id 723DE819E for ; Thu, 25 Apr 2002 13:01:16 -0400 (EDT) Received: by darkmatter.eecs.tufts.edu (Postfix, from userid 30) id 6D43F2AA07; Thu, 25 Apr 2002 13:01:15 -0400 (EDT) To: stevegt@terraluna.org Subject: intractability of predictability of state machines Message-Id: <20020425170115.6D43F2AA07@darkmatter.eecs.tufts.edu> Date: Thu, 25 Apr 2002 13:01:15 -0400 (EDT) From: couch@eecs.tufts.edu (Alva L. Couch) Status: O Content-Length: 2006 Lines: 38 Alas, I spoke too soon about the intractability of ordering being invariant of whether the configured system is turing-equivalent or just a state machine. I'll have to think about this, but I think there's a reduction for state machines that avoids most of the complexity of validation. The simplicity of this case comes from the fact that modifications of independent substructures of a state machine (without modifying links between them) commute.... The real issue here is "behavioral commutivity" of operations. Two operations X and Y are behaviorally commutative if XY and YX applied to a machine exhibit the same final behavior. Of course, if XY and YX are bit-for-bit identical, they better be behaviorally identical. What Maelstrom did is to mathematically *construct* operations that are behaviorally commutative. It did this by positing a *homogeneity* criterion: two scripts that change a parameter always change it to the same value. This of course means that the behavioral result of XY is always the same as the result of YX, if Y and X succeed in making their modifications. I think that to capture the interest of the LISA audience, though, your paper has to take a different tack. The premise that I think that you should explore is the changing role of testing and validation in IT infrastructure. Point out that testing is no longer the sole role of the vendor, but a responsibility of the system administrator. Point out the while the vendor does "unit testing", we must do "integration testing". Explain why that's difficult, and give your evidence (both anecdotal and mathematical) about the difficulty of the problem. Point out that the existence of testing and validation as a process forces one to look at the problem of network administration from a new perspective, and then explain the implications. I recently bought a new book "Testing IT" by Watson?. This book serves as a really good starting point for an essay on validation and its implications. -alva From couch@eecs.tufts.edu Sun Apr 28 07:56:16 2002 Received: from lin16.eecs.tufts.edu (IDENT:postfix@lin16.EECS.Tufts.EDU [130.64.23.46]) by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id HAA25883 for ; Sun, 28 Apr 2002 07:56:16 -0700 Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40]) by lin16.eecs.tufts.edu (Postfix) with ESMTP id DCEF947C01 for ; Sun, 28 Apr 2002 10:56:32 -0400 (EDT) Received: by piano.eecs.tufts.edu (Postfix, from userid 30) id 47E30CFC4; Sun, 28 Apr 2002 10:56:32 -0400 (EDT) To: stevegt@TerraLuna.Org Subject: Re: more about turing equivalence. Message-Id: <20020428145632.47E30CFC4@piano.eecs.tufts.edu> Date: Sun, 28 Apr 2002 10:56:32 -0400 (EDT) From: couch@eecs.tufts.edu (Alva L. Couch) Status: O Content-Length: 22949 Lines: 481 > > a) Cfengine's configuration operations aren't turing equivalent, nor > > are the operations posited by maelstrom. Both are "state machines" > > in the language of the Chomsky hierarchy. Recall that there are > > three levels of complexity in the Chomsky model, provably different: > > I usually see four levels -- you're lumping linear bounded automata in > with turing machines, right? I don't think it matters in this case, > unless you've seen something I've missed. I don't think it matters. I studied from an older book:) > > i) state machines/regular expression grammars. Most cfengine > > configurations live here. It takes some trouble to make one > > that isn't a state machine. > > Yes. And the state which cfengine detects is based on spot checks of > the current disk content. I suspect that, due to those 2^N possible > bit states of an N-bit disk, any method which uses current disk > content as the input to a state machine is only going to be able to > take samples. Alas, this is not true of typical cfengine scripts. They sample everything they change. It can be difficult to engineer a script this way, but once engineered, there are no samples. The 'sampling' comes from poor interoperability between cfengine and makefiles. This could be eliminated by rewriting 'install' to be convergent! Then "make install" would not change anything that was already in compliance.... > This sampling means that cfengine is not really responding to the full > disk state of the machine. During any given state transition, But this is an error in configuring the machine, not a property of cfengine proper. It's "poor practice" that I hope to stamp out in my next book. > cfengine responds to only a very small subset S of the disk state. > Therefore, any given rule in cfengine's state machine can actually be > triggered by any one of 2^(N-S) disk states. Those multiple possible > disk states cause the outcome of any conventionally-configured > cfengine run to be non-deterministic. (Ouch.) True, but this is the result of stupid admins rather than stupid cfengine. I'm serious: making "install" (and rpm --install) convergent would fix this. > > Likewise, makefiles can usually be > > REDUCED to state machines by looking at what they do on each > > architecture and codifying that as state transition rules. > > By intent, ISconf has always wrapped and used makefiles as > deterministic state machines, only. The only bits on the disk that > the makefile reads to determine state are timestamp files explicitly > created by the makefile itself and maintained for the life of the > target host. > > This means that ISconf uses the full set of all prior actions, > accumulated over the entire life of the target host, in order to > determine current state and trigger the next state transition. Since > there is only one possible disk state that could be represented by a > given set of timestamp files, we get deterministic state transitions. We can get that in cfengine as well....it's just possible to abuse it. > An objection frequently raised is that this approach does not have the > ability to detect and correct out-of-band changes. In a fully > automated environment, out-of-band changes to root-owned portions of > disk are indicators of a security breach. The only reliable way to > inspect a disk for security breaches is to reboot off of an untainted > disk. No tool which executes within the context of a breached disk > can be relied on to detect and correct the breach, so this objection > applies to cfengine, make, and even manual administration methods. I claim that one can make isconf convergent in the cfengine sense. Then it will correct out-of-band changes and will still have the properties that it has now. I think that the combination of sequence determination and convergence will be the long-term solution to all ills. > > ii) pushdown automata/context-free grammars. Right now I know of > > no configuration management tools in this category. > > Something's nagging at me here, having to do with LIFO ordering and a > configuration tool with rollback capabilities etc. It could be that > the "save original files for uninstall" capability of many package > managers is verging on pushdown automata, but with nowhere near an > accurate implementation. Yes. > > iii) turing machines/unconstrained grammars. > > The actual machines being configured live here. This is where > > undecidability may lurk. > > Yes. I suspect *all* self-administered hosts are one-tape turing > machines, regardless of administration tool. I'll get back to that in > a moment. Sure, they certainly are, but what's the *minimal* model? That determines complexity. > > This makes the CONFIGURATION model simpler than the EXECUTION > > model. While the execution model remains turing equivalent, the > > configuration model is fundamentally simpler in structure. > > Yes. Further, I suspect a cfengine script is equivalent to the > nonvolatile ruleset of a one-tape turing machine, while the target > machine's disk is the turing tape -- it reads a bit, alters the tape, > reads again, etc. Cfengine's base operations are simpler than that. They just assert state according to a template. > I suspect an ISconf makefile is equivalent to the ruleset of a > two-tape turing machine. The directory (/var/isconf/stamps) where > ISconf keeps its stamp files is a read/write tape, while the rest of > the disk is a write-only tape. ISconf itself doesn't care what's out > there on that write-only tape -- it drives blindly. (Though the tools > which are called by 'make' might care. For instance, it's perfectly > valid to call cfengine from within make.) This is dangerous thinking. According to the power theorems one- and two- tape turing machines are equivalent. > So, yes, the configuration model and the target machine might fit > different Chomsky levels when viewed in isolation from each other. > > But these analogies collapse into one-tape machines when you > take into account the fact that any administrative action has the > ability to modify the behavior of the administrative tool itself. To say nothing of the n-tape to 1-tape equivalence theorem... > Let's equate a one-tape turing machine's nonvolatile ruleset with the > CPU instruction set of a conventional Von Neumann machine. The disk > content is the turing tape. The operating system and subsidiary > programs reside on the same tape. The configuration tool itself is by > intent a finite automata "state machine", emulated by the turing > machine. This state machine's ruleset and control code reside on the > same tape as the operating system and other applications. > > That configuration tool has the ability to modify the content of the > rest of the turing tape -- that's its job. It can also directly > modify its own ruleset -- it can upgrade itself. Yes, but the key to all successful administrative strategies is to simplify that model somewhat. That model is provably too complex for a human to use.... > The content of the rest of the turing tape determines how the turing > machine interprets the configuration tool's ruleset. In practical > terms, this means that the behavior of the configuration tool is > dependent on the operating system kernel, shared libraries, and > subsidiary applications. > > So the configuration tool can modify its own behavior in two ways; > directly, by replacing its own ruleset; or indirectly, by altering the > way that ruleset is interpreted. > > Therefore, a configuration tool is capable of the self-modifying > behavior of a turing machine. Yes, but I know of no practical application of this. Things get really confusing if you allow cfengine to modify itself. > The instances in which a configuration tool will modify its own > behavior indirectly, by modifying code it depends on, are difficult to > predict. Doing so would require a full understanding of all parts of > our "turing tape" which affect the behavior of the tool. In practice > this would mean scoping and conducting an exhaustive analysis of all > pertinent code in the kernel, all applications and utilities which the > configuration tool calls, as well as all shared libraries and > configuration files these components depend upon. > > Unless this full understanding exists, a configuration tool residing > on, executing within, and modifying the behavior of a one-tape turing > machine *must* be assumed to behave like a turing machine, rather than > a finite automata. But this does not take into account that there are things that we as humans agree not to do in order not to get into this quandary. There is a pattern of practice that avoids this situation. > > b) Even with this simplification, the sequencing problem remains. It > > is easy to construct a sequence of invocations of N state machines > > such that only the order x1..xn gives the appropriate outcome on > > the turing machine tape. This is the same thing as what you're > > saying; without more simplifications, the outcome of every order is > > potentially different. > > Yes. > > Further, if the state machines are themselves executing within the > context of the turing machine tape, and their states interpreted by > code which they manage, then the order of invocations becomes doubly > important: If state machine x42 alters the ruleset of state machine > x43, then invoking x43 first, followed by x42, will cause a very > different outcome than x42 followed by x43. Again, we as humans try to avoid such confusion. This kind of confusion is possible in 'make', which is why I avoid it... > Cfengine does not consider its prior actions of a month or a year ago > to be part of the "tape" in order to avoid hitting this particular > problem: it can *only* rely on those spot checks. No. This is a mis-perception. It is true that most *users* of cfengine rely on spot checks. The more appropriate thing to do is to reverse-engineer the total effect of a make and then express it as a series of copies. This is not a spot check; it's instead a complete convergent installation of all components. The fact that people don't do this is a fault of the practice, not of the tool. I've talked with Mark about making this possible. > In Mark's LISA workshop we discussed whether cfengine 1.X might be > configured to maintain long-term state, or if cfengine 2.X might > support this natively, in order to make it more deterministic. I'm > not sure how many people present understood what this debate was > really about. I think we're doing a pretty good job of addressing it > here. Yes. The verdict so far is that cfengine 1.x is dead and that cfengine 2.x will probably track long-term state. > > c) Determining an appropriate order for arbitrary state machines to > > achieve a given outcome for configuration is INTRACTIBLE but not > > UNDECIDABLE. Hard but not impossible. All one has to do is to try > > all the orders and choose the one that produces the appropriate > > outcome. > > Yes, if you can afford N!. In practice, we don't need to try very > many of the orders at all -- the correct order is the one which worked > when each state machine was itself discovered to work. In other > words, if the order P currently works, and we develop, debug, and test > a new state machine xn which produces the desired outcome when run > immediately after P, then the new order P' should include only P and > xn, in that order. > > The way xn gets coded in an ISconf makefile is by adding the new state > machine into the makefile as an independent stanza or internally > dependent set of stanzas. Then we add the xn target name to the tail > end of the P prereq list, creating P'. > > > It becomes undecidable if other processes are SIMULTANEOUSLY > > modifying configurations, asynchronously. > > Absolutely. Manual administration Bad, 'make -j' Bad... And using > two different auto-administration tools on the same machine is Really > Really Bad. ;-) It's just "turing equivalent". This is what's "bad" because in making something *more* than simple state modifucation, one makes the system unpredictable. > > d) Determining whether an appropriate order achieves a particular > > behavioral outcome remains UNDECIDABLE. This is just another > > flavor of Church's thesis. > > I haven't been able to find a reference for this. Do you mean > "determining in advance"? If so then I agree. If not, then I'd have > to think about it longer. I mean that even if you fix the order, you have to validate that the resulting system works. You can't ever do anything, even in a deterministic sequence, that you don't have to validate. > > I think the real issue here is that VALIDATION of behavior only > > applies to a specific sequence of operations. If one changes the > > sequence of operations, then one must prove that the other sequence > > reproduced the same disk state, bit for bit, before one can claim that > > the prior validation applies to the new state. > > YES! This is very probably a core, if not *the* core, idea that I'd > like to get across. Sampling disk state alone can not ensure bit for > bit reproduction of disk contents nor resulting behavior. But > reproducing order of configuration changes will reproduce the same > disk state and behavior every time. > > I further suspect that "A machine's set of behaviors will *always* > vary if the disk contents vary". This might be proved by including the > disk content itself as part of the output set, which I think is valid > if the machine's programming includes introspection functions. > Machines with different disk contents will certainly always exhibit > different behaviors in response to 'cat /dev/*'. > > > This is a claim of "transitivity of validation", the same claim > > utilized by the Linux Standard Base (LSB). > > Hmmmm... This is really cool. LSB is arranged so that an application validated with respect to one instance of the standard environment is guaranteed to work in any other environment set up to the same standards. One can "debug once", "run everywhere". > > Now > > > > a) In the absence of further assumptions, whether two different > > configurations exhibit the same behavior is UNDECIDABLE(can prove). > > I agree, but haven't thought of a direct proof. An indirect proof is > what I've been building with this whole chain of thought over the last > few months... > > Recap: > > - sampling disk state alone isn't enough to determine disk state, > unless you sample the *whole* disk Or unless you sample only the parts that can change, with respect to a base configuration. E.g., CFengine with only copy, links, and dirs stanzas (user modules and file editing do not obey this properly). > - ...so state machines driven by only disk samples produce > non-deterministic behavior Correct, and this is bad cfengine use (and, for that matter, slink use) as well. This is part of why my slink is simpler than cfengine.... > - ...so the equivalence of behavior of two different state machines > driven only by disk samples is undecidable (that's easy) Yes. > - configuration driven by only deterministic ordering of individual > state machines produces deterministic behavior But so do operations that are strongly convergent such as convergent copying. The fact that some operations in cfengine don't support this means that these operations (e.g., file editing) should be avoided for validation purposes. > - because the state machines reside on and execute within the context > of the driven disk, they can modify each other, as well as > themselves Yes, but this is considered awful practice. > - ...so the behavior of individual state machines is dependent on the > order of invocation of all state machines only limited to this in cases where ordering isn't commutative. > - ...so whether two different orders of configuration operations > exhibit the same behavior is undecideable (takes me a while to get > here) > > > b) so whether two different orders of configuration operations exhibit > > the same behavior remains undecidable by lifting. > > Exactly. > > > The most important force in the world to system administrators is > > not configuration management, but configuration validation. > > Yes. In this context, it's amazing how many shops lack test > environments, but instead "try out" new administrative code in > production. In hindsight I can see why I've been harping on the need > for test environments for the last several years. Interesting that > it's this easy (hah) to justify that need theoretically. ;-) I think this will be the theme of this conference. It's emergent in the work of several people. > > It is this validation that suffers from turing equivalence problems, > > not the act of configuration itself. > > If you take into account the x43,x42 ordering problem I described > above, does this change your mind about the act of configuration > itself having a turing equivalence problem? Nope. I think that the turing equivalence problem comes from bad practice rather than being an invariant. Let me recap the discussion in a different way. The ultimate enemy of system administrators is unpredictability. We need to know that things will work the way they should. By nature, machines tend to behave unpredictably if administered in a haphazard fashion. The art of system administration is to avoid this natural tendency of the machines and create a stable operating environment. This unpredictability is rooted deeply in the nature of the machines that we administer. All machines are Turing equivalent and the act of administering them via any automated means has the potential to create situations in which the machines become self-programming. This means that the machines can enter unforeseen states where behavior is under-determined and not well understood. Due to this Turing equivalence, the correct functioning of a machine is undecidable. It cannot be "proven", but must be assured via rigorous testing in realistic situations. It can never completely be tested, but instead always contains unknown responses to particularly strange inputs. The three tools a system administrator has against these odds are validation, repeatability, and invariance. One *must* validate the function of any configuration, and be able to *repeat* that situation on various machines. Throughout this process there is a knowledge that certain kinds of state changes *don't* affect the function of the machine. E.g., we know that nothing in user directories affects the operating system. These are the "invariances". Without them, a system is never validated. An invariance is a *theoretical* assertion (or axiom) that certain kinds of changes can't possibly create changes in behavior. These invariances can be as simple as "comments in inetd.conf don't affect operation" or as complex as "gcc has nothing to do with rpm". Without a concept of invariance, there is no possibility of validation. In particular, subsystems that change constantly can't be validated in any one state. The only way that a validation can occur is if these changing components are considered *theoretically* as invariants. All successful strategies for system maintenance are *practices* that obey the principles of validation and repeatability with respect to accepted invariances or axioms. In other words, we don't restore the user disk when something breaks in the operating system. There are vast differences in what people are comfortable in calling "invariances" or accepting as "axioms of decoupling". This depends upon each administrator's level of trust in specific services. E.g., I "trust inetd" in the sense that I theoretically posit that "a commented line does nothing". So if I comment out tftp, I expect the service to go away. This is a kind of invariance: a level of trust in the theoretical correctness of a subsystem. At the same time, long experience has taught me not to trust linux dhcpd. It doesn't always do what it says, and is frightfully dependent upon its environment and the ethernet cards to which it talks. Where we differ is in how far we're willing to trust the axiomatic behavior of subsystems in validating systems. Inetd could be buggy and it's conceivable that a commented line could get erroniously interpreted in a buggy inetd. In your system, everything is subject to validation. In mine, I *presume* that inetd is working properly if its configuration has been validated. So that a) minor bit deviations in /etc/inetd.conf (that are considered invariant) can be ignored. In particular, I posit that - "comments are comments". - "order of lines doesn't matter". b) minor bit deviations in /usr/sbin/inetd are *not* ignorable. If these occur, I've probably been hacked. What you're saying is that in your validation model, you can't afford to ignore bit deviations in /etc/inetd.conf. That's fine, but a bit of an extreme view. Validation of complex systems is inherently difficult. We simplify the inherent complexity of systems by three methods: a) axiomatic assertions. Certain things can't happen and need not be tested. b) transitive validation claims: this works in this other known situation, and I know that what it needs is provided now, so I know it'll work here. Thus validating preconditions for a subsystem is sufficient to validate the subsystem. I think that what you're doing is asserting a transitive claim. You say that if your system works at state 5, and you make serial changes after state 5 that result in state 8, then only the subsystems changed between states 5 and 8 need be validated. This is true. But let's look at the other side of that. Suppose that we decompose ISConf operations into the file content assertions that they imply, and code that as a cfengine script. Let's utilize only pure convergent operations (copy, links, dirs) and avoid file editing, instead pumping images into place. Then a) order of operations doesn't matter, by design (but remember, we've abstracted things so that this is true by looking at results rather than incremental process. b) if we change this, we have to change the whole process to a new convergent state. We can't just tack things onto the end. c) the transitive validation property is that if we change things, only subsystems that use files that change have to be validated. This can be a lot of files, but it can be very few. This is the way cfengine is supposed to be used. Current users don't use it that way, and "get what they deserve". -alva From couch@eecs.tufts.edu Sun Apr 28 17:32:33 2002 Received: from lin16.eecs.tufts.edu (IDENT:postfix@lin16.EECS.Tufts.EDU [130.64.23.46]) by roton.terraluna.org (8.9.3/8.9.3) with ESMTP id RAA32528 for ; Sun, 28 Apr 2002 17:32:33 -0700 Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40]) by lin16.eecs.tufts.edu (Postfix) with ESMTP id 0DFCE47BBE for ; Sun, 28 Apr 2002 20:32:50 -0400 (EDT) Received: by piano.eecs.tufts.edu (Postfix, from userid 30) id 8B5D8CFC4; Sun, 28 Apr 2002 20:32:49 -0400 (EDT) To: stevegt@TerraLuna.Org Subject: Re: ordering Message-Id: <20020429003249.8B5D8CFC4@piano.eecs.tufts.edu> Date: Sun, 28 Apr 2002 20:32:49 -0400 (EDT) From: couch@eecs.tufts.edu (Alva L. Couch) Status: O Content-Length: 7572 Lines: 163 > On Thu, Apr 25, 2002 at 09:07:10AM -0400, Alva L. Couch wrote: > > The only way to assure that two subsystems are orthogonal is by design. > > Testing cannot assure that. This is a fundamental premise of > > software testing. > > > > I think that this is the ONLY way to assure that a realistic build is > > reliable. But worse, there is NO way to assure the build is > > consistent across an enterprise of hosts, other than to assure > > homogeneity of operations and even perhaps hardware. > > Enacting all administrative change via serialized, tested and > reproducable sequences of operations constitutes homogeneity of > administrative operations. This will ensure consistent builds across > an enterprise. This is consistency, not homogeneity, according to my definition. Homogeneity is a property of independent processes that must agree. Arusha does this by taking all data from a common repository. Consistency is weaker; the last process wins and the last process is always the same one. So the winner is the same. There are no races in a system of homogeneous configuration actions. > Restating that, "if we do the same things in the same order to > identical machines, then they will remain identical in behavior". > Machines which aren't identical, or which by intent need different > builds, are created by variations of the tested base sequence. Yes. This makes sense. > This allows us to ignore orthogonality, rather than needing to > determine whether the subsystem developer designed for it. I prefer to exploit any orthogonality I can find and ignore any that doesn't stare me in the face:) > Commutability and orthogonality are difficult for ordinary sysadmins > to prove or disprove in the field. For a sysadmin to use a tool which > requires them to make these determinations, they must understand > correct use of the tool, the design of the subsystems they're > configuring, as well as the desired configuration result, and how to > test for it to make sure they succeeded. Agreed. We either use them when they are gifts, and ignore them otherwise, or we have to document such properties. > For a sysadmin to use a tool which ensures strict serialization of > sequences of operations, they only need to understand correct use, > desired result, and how to test for it. The "correct use" part is the > only additional thing they need to learn beyond what they would > normally do when using manual methods. This is trainable. In > practice, I've been able to orient sysadmins in makefile and CVS usage > in as little as a day, on-site. There is one problem with a serialized system, and I don't refer to the breakin problem. It can't be reconfigured 'live' because configuration instructions may change things necessary for proper system operation. It can't even necessarily be repeated twice to identical effect. How do you assure repeatability in ISconf? It has to be some kind of consistency criterion.... > If we require that sysadmins understand subsystem design well enough > to determine orthogonality, then we're expecting them to become more > of an engineer in order to use the tool. This is less trainable. I > would not be able to teach someone how to do this in a limited period > of time, on-site, in the middle of a rollout. Agreed. This is something that we either "document" or "forget". > Based on past experience I suspect any other tool will also have to > support: > > - preserving serialization of reusable subsequences > > - implicit assertion test of zero return codes for external commands > > - implicit reproducable serialization of operations not explicitly > serialized > > - semaphores which child processes can use to signal async events to > parent processes (like 'rebuild kernel and reboot when make is done') > > - a hierarchical grouping of host attributes, so that host function > can be determined quickly by eye > > - re-usable ordered sets of grouped subsequences, so that new hosts > can be created by prototype rather than by class (This is important. > True class-based configuration tools don't seem to work in the > field, while protoype-based systems consistently do. I don't think > I'm able to explain why yet; it may be nothing more than "this is > the way sysadmins think", or there may be a more theoretical basis. > I suspect, again, that it has to do with testing.) > > These are things which ISconf/make already does. In addition, over > the years those of us using ISconf have concluded that we also need: > > - postrequisites (like 'do foo after I'm done') > > - decentralized state machine specification (rather than a monolithic > makefile or script) > > - lexically bound syntax (I want to be able to specify each operation > in the language most suitable for that operation) > > - separation of action code from site-specific configuration data > > - decentralized editing of state machine specifications (no need to > log into gold server to update makefile) > > - state machine and file transport language integrated, as in > cfengine, to remove need for NFS mounts to get packages on demand > > - embedded documentation, like POD; including dynamically generated > runbooks and training checklists generated per host class (the > latter actually looks easy -- name a person to be trained as a > "target host" and "build" them, checking off training actions as > they complete) This goes into the paper for sure. > > > > You haven't convinced me that equivalence of the behavioral effects of > > > > randomly ordered configuration operations is undecidable. Just > > > > complex. You might be able to do this. > > > > > > I think of it from the other direction -- beyond a certain number N, > > > it's infeasible to prove empirically that N configuration operations > > > can be randomly ordered and produce the same behavioral outcome. N > > > appears to be as low as 8 or 9. > > > > This is an intractibility argument, not an undecidability argument; > > you don't need turing equivalence for this. In fact, it follows even > > if the machines being CONFIGURED are simple state machines! This is > > what bothered me about your math. Turing equivalence is irrelevant. > > Yes, turing equivalence is irrelevant for this math. It is proving > itself to be a useful model for illustrating the behaviors of a > self-administering system though. I don't think we could have had a > conversation this productive without that foundation. Agreed. > I've gotten similar feedback from other people over the past few > months. Folks seem to need to work with this abstraction; previously, > these discussions have always bogged down in behaviors of individual > packages and operating systems. Falling back to Turing's theoretical > model seems to give us more neutral material to work with. It's a bit dangerous though, because most people don't understand it well enough to reason from it. There are constant traps and pitfalls into which one can fall via imprecise thinking. > I know it's helped me immensely over the last few months in > understanding what I've been doing for the last several years. ;-) Sure. It may not be the best way to "sell" things, though. Our current audience is going to puke... I think you've got some really important ideas here, that are crucial to turning network administration from an "art" into a "profession". My thinking is surprisingly similar. Check out my students' abstracts when they're done (at the last minute:). -alva From lisa02chair@usenix.org Fri May 24 09:53:51 2002 Received: from lin17.eecs.tufts.edu (IDENT:postfix@lin17.EECS.Tufts.EDU [130.64.23.47]) by roton.TerraLuna.Org (8.11.6/8.9.3) with ESMTP id g4OGrpv14069 for ; Fri, 24 May 2002 09:53:51 -0700 Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40]) by lin17.eecs.tufts.edu (Postfix) with ESMTP id 336151FE83; Fri, 24 May 2002 12:53:50 -0400 (EDT) Received: by piano.eecs.tufts.edu (Postfix, from userid 418) id DED87CFD7; Fri, 24 May 2002 12:52:50 -0400 (EDT) To: stevegt@terraluna.org From: lisa02chair@usenix.org Subject: LISA submission 050 Cc: paul@dcs.ed.ac.uk, couch@eecs.tufts.edu Message-Id: <20020524165250.DED87CFD7@piano.eecs.tufts.edu> Date: Fri, 24 May 2002 12:52:50 -0400 (EDT) Status: RO Content-Length: 19022 Lines: 394 Dear LISA'02 author, It is my great pleasure as Program Chair of LISA'02 to inform you that your paper: 050: Why Order Matters: Turing Equivalence in Automated Systems Administration has been accepted as a refereed paper for LISA'02, which will be held from November 3-8 in Philadelphia, PA, USA. Congratulations! At least one author must attend the conference to present the paper. Your talk has been allocated 30 minutes (a 'long talk') at the conference. Please plan on presenting your paper in 22-25 minutes and save the remaining time for questions and comments from the audience. Your talk will tentatively occur during the session at Wed 2002/11/06 4:00 - 5:30 The chair for your session is: Paul Anderson (paul@dcs.ed.ac.uk) Shepherds for your paper include: Paul Anderson (paul@dcs.ed.ac.uk) Alva Couch (couch@eecs.tufts.edu) All of these people will receive a copy of this message and will contact you directly soon after you receive this message. Important dates: - July 12, 2002: near-final draft of paper due to shepherds for comments. - Aug 1, 2002: (optional) Last date to submit your paper for copy editing. - Aug 12, 2002: final draft due to typesetter; only minor corrections are permitted after this time. - Nov 3-8, 2002: LISA'02! Important notes: - All paper acceptances are contingent upon addressing any deficiencies that are discussed in the reviewer comments listed below. Please take a moment to read these comments before beginning to write or revise your paper. Reviews do not identify the reviewer unless specifically requested by the reviewer. - Papers are due to the typesetter in final form by Aug 12, 2002. You will be mailed instructions on how to submit your paper under separate cover, from our typesetter Rob Kolstad (kolstad@ace.delos.com). *All* papers (long talk and short talk) are subject to the same length restrictions (16 sides in the proceedings). - At least one author must attend the conference to present the paper. Please register early for the conference. One author of each accepted paper is entitled to complementary registration for the technical program. To keep conference expenses down and fees low, we respectfully request that refereed paper speakers first ask their employers to pay conference registration fees. The USENIX Conference Department will contact you regarding complimentary registration in cases where your employer will not cover registration costs. - Authors who are full-time students are welcome to apply for a stipend to cover more costs, including travel and lodging. Please see http://www.usenix.org/students/stipend.html for more information about the student stipend program. - Your session chair will introduce you at the conference and manage the session in which you present your paper. Time limits are non-negotiable. Please provide the session chair with a brief biography of the speaker that can be read to introduce your paper at the conference. - Shepherds stand ready to help you produce the best possible final paper. Shepherds will contact you directly to discuss how they might best aid you in accomplishing this. At minimum, shepherds will read a near-final draft of your paper and provide comments. Please provide a near-final draft of each paper to the shepherds and session chair by July 12, 2002. Shepherds are also available to answer any questions you might have about writing or reviewer comments, and will be happy to comment upon intermediate versions of your paper upon request. - At your option, you may also enlist the aid of our official LISA'02 copy editor, Jeff Allen. If you wish your paper to be edited carefully for spelling, grammar, etc, please mail an editable copy to lisa02copy@usenix.org before August 1, 2002. The paper will be edited and sent back to you for final approval. Mailing a paper to the copy editor is *not* a way to submit it for printing. - I will contact you later with more details on your presentation for the conference. The presentation hall will have a projection screen for a VGA laptop and an overhead projector for overhead transparencies. There are no loaner laptops on site -- you must bring your own laptop or arrange to borrow one. It is always wise to prepare overhead slides even if you plan to use PowerPoint. - For any and all other problems, please feel free to contact me for assistance (lisa02chair@usenix.org). We received many fine abstracts this year! Thanks for helping to make LISA'02 a success! Alva L. Couch LISA'02 Program Chair lisa02chair@usenix.org ======================================================================= Reviewer comments: Title: Why Order Matters: Turing Equivalence in Automated Systems Administration Authors: Steve Traugott, TerraLuna, LLC; Lance Brown, National Institutes of Health Email: stevegt@terraluna.org I think it's very important to define self-adminstration early in the paper. Without a good definition, the paper lets the reader's imagination fill in the blanks -- and that seems like a bad idea. Many of the ideas are put forth as 'laws' that sound like absolute truths (e.g., rule number 9 about a machine changing its disk contents means that future changes will be made in a different way). While we can cite examples where this is true, I think we can also cite examples when it isn't true. Maybe using the word 'can' or 'might' would make the rule a bit easier to understand. The pivotal rules 3 and 4 in the implications for self-administration presume that changes are not 'independent'. I think there's probably some benefit to be had by asserting that requiring independence is a difficult task, fraught with peril in the real-world. The prescription -- a starting bitmap + a bitstream of all possible changes is easy to see as a theoretical sort of argument. I'm a bit more concerned about the pragmatic aspects of such an idea. Maybe the paper should show some of the ways that real upgrades, real software, real-world administrative techniques can use this data. I don't see how such a thing would work in the framework as described. Finally, I don't think the paper derives much 'bang' from the fundamental notion of Turing equivalence. That seems like a neat result that might have some predictive properties -- maybe they can be explored? ======================================================================= The theory of system administration is much needed but has received little attention until very recently. This paper adds to that theory with a detailed discussion of the theoretical ramifications of "machines that repair themselves." The authors point out rightly that this aspect of system administration poses serious theoretical questions that may be undecidable in the computer science sense. They contend that we can avoid some of the undecidability in this situation by adopting a strict order of operations for configuration actions. This is quite true. It is not the only way, however. Throughout the development of the theory of computer science, we have found that difficult problems become easier with the addition of appropriate constraints. While it is true that unconstrained automated processes are Turing equivalent, it is also well known that such processes become easier to analyze when simple constraints are imposed. For example, limiting one's self to copying files and links in the 'cfengine' sense makes one's configuration process equivalent to a simple state machine rather than a turing machine. Deterministic ordering is *another* way to achieve this reduction for arbitrarily complex configuration actions. There are many others. One must be *very* careful with language. "Turing equivalence" is an ideal state for ideal machines. The "infinite" nature of such machines makes "undecidability" possible. This is not an attribute of real machines (with finite memory), for which determining the correctness of a configuration is simply "intractable." I have serious problems with the writing style, which is choppy, with many one-sentence paragraphs that fail to fully develop ideas. Paragraphs need to be combined and developed so that they flow. Make sure that you include references to the many papers in the literature that disagree with your conclusions! ======================================================================= This paper advances the premise that the order of application is important for configuration changes, since it is not possible to prove that two configurations are identical if changes are applied in different orders. This argument is used to advocate a configuration system in which all hosts start as identical disk clones and have changes applied in identical sequences. This is an interesting concept for anyone involved with configuration systems, particularly in highly critical environments with large numbers of identical machines. However, there are a number of objections to this approach which are not addressed in the abstract, that I would like to see covered in any final paper: 1) Machines are not completely deterministic in practice (they get external interrupts) and even machines which have had identical configuration changes applied, will not have identical disk images or even identical sequences of system calls. 2) For the purposes of system configuration we are interested in equivalence between certain important high-level behavioural properties, and it may be possible to prove these, without requiring equivalence at the bit level. 3) I would like to see some concrete examples where alternatice sequencing has caused a problem in practice.Is this a significant problem compared to other system configuration difficulties? 4) I would like to see how these ideas might be applied in practice (the abstract does suggest that this is intended) 5) How can we deal with systems with very diverse and rapidly changing configurations? I personally found the discussion of Turing and Von Neumann machines a little confusing, and it might be worth trying to express this in a more concise way. ======================================================================= This paper is an enthusiastic attempt at arguing that system modifications must always be ordered. However, I do not think that it succeeds in doing so in an effective or convincing way. There are a number of major problems with the abstract: * The two halves of the paper--pages 1-top of 3, and 3-end--are unconnected, despite assertions to the contrary. The argument in the second half of the abstract--which is its true point--does not really depend on or even use the Turing/von Neumann concepts presented briefly in the first half of the abstract. * The paper does not discuss its position in the context of the large amount of related work. Rather, it seems to simply take up an conversation which occurred last December. Keep in mind that most LISA attendees will have no knowledge whatsoever of this context. * At some points, the paper makes unwarranted and often unacknowleged assumptions. For example, point 4 on page 3, which asserts that ABC is always non-equivalent to ACB. Constructing a counterexample is trivial (just choose ABC to be independant--not hard at all). The rest of the argument follows from this dubious, invalid assumption. Indeed, this is the whole point of contention in the first place, so simply asserting that it is so begs the question. Logically, the argument is: o We must decide between A and not A. o A is obviously true. o Therefore, A follows. o Therefore, A's consequences follow. I agree that if order matters and it is ignored, then bad things happen. But order does not always matter. Deciding when it does, the extent to which it is important, and the best tradeoff of complexity vs. risk of unforseen interactions is the hard part. The authors need to spend a lot more time thinking about these topics. They ought to be ready to submit a much better abstract next year. I recommend rejecting this paper. Finally, the first sentence is bizarre; I'd rethink it and revise it into something with a little more meat and less cuteness. ======================================================================= Very well written, good detail and references. (I am amused to see Turing's 1939 work cited. :-) ) Very nicely done. ======================================================================= This paper presents me with a dilemma. On the one hand, it addresses an interesting problem and does so in a provocative way that I approve of. It purports to offer a counterpoint to cfengine's recently much written-about ideas of configuration "convergence". I would like to see a paper that does this succeed in making its points clearly. The problem with this abstract, as presented, is that it is full of technical errors, misunderstandings and untrue assertions about systems. It might be possible to salvage something of it, with significant sherpherding, but as it stands it is just misleading and factually incorrect. Let me take some examples. "A 100k sample cannot determine the state of a 2GB disk." This is obviously not true. A rudimentary program (e.g. rm -rf /) takes only a few bytes to implement, but very clearly determines the whole disk. More generally, the authors are assuming that every bit of a configuration is significant. It is not. Turing machines would not work at all unless algorithms could be coded into significantly less space than the size of their tapes (actually their tapes are infinite, so von Neumann machines only approximate Turing machines). Compression and coding need to be taken into account. This is an information theoretical limitation, nothing to do with Turing machines. The authors do not discuss the amount of information required to describe a configurastion at all. This has to do with the amount of entropy in the disk: if the disk is to be empty, it is very easy to keep it that way. That is still a valid configuration. Clearly there is some middle ground here. Not all bytes are independent, so it is not true that one needs the complete history to explain a state like that resulting from rm -rf / "The entire past history of changes to the machine are important... if we expect deterministic, reliable behaviour in operation..." The authors seem to be forgetting all of the changes made by users. Are they talking about only a part of the computer, or all if it? What about all of the hidden variables caused by network communications, resulting in changes to log files and other effects etc? "If we were to allow the order of changes to be unsequenced, we must test each possible sequence of changes. This is an intractable problem." This is wrong. Indeed, Couch showed how this could be done at LISA 2001, and this is the same approach used by cfengine. The number of orderings might be M!, but that is not the number of interest, because it neglects the coding and the amount of information in those bits. "..must continue to test [hosts] individually henceforth ... this is very expensive. It is easier to avoid unsequenced changes" First of all, the idea of a sequence assumes that there is a prior dependency chain at work. This is only true if the operations being performed are intertwined -- that is only true of complex operations, and is usually an artifact of software. The idea that testing the configuration of a host is expensive is an assertion, not backed up with any figures. Cfengine seems to get along fine with this. Also the assertion that it is easier to avoid that approach is just an opinion, not a justified conclusion. The authors miss some points which would be in their favour - such as how ordered information within critical policy files fits their arguments much better than arguing about Turing machines and systems as a whole. Critical policy files can determine the entire behaviour of the system, but they are just a tiny part of it, and they can lead to critical dependencies. It is not so much order that matters as the chain of dependency. I would drop the whole Turing argument - it is a red Herring. If the authors want to criticize cfengine's approach, they should attack it where it is weakest: in the issue of hidden dependencies through unpredictable software and its relatioship to configuration files. Unpredictable software can turn predictable intentions into chaos. ======================================================================= This is an important paper for the advancement of theoretical sysadmin. It unfortunately provides more problems than answers, but sometimes one has to walk deeper into the forest before finding ones way home. There's a case to be made that a workshop is a better place to work on theoretical sysadmin, especially for papers like this with "intermediate results". I'm eager to see the fruit of the theoretical sysadmin movement, but I'm unwilling to see LISA turn into a collection of dry, inaccessible, mathemtical arguments. This paper threatens to take us down that road. ======================================================================= This is a strange paper. The writing often consists of a series of one-sentence paragraphs, and they sentences look like points of a manifesto. It continues a debate from LISA ’01 that this reviewer did not participate in, on whether or not a self-administering system needs to be deterministic in actions. The authors suggest this point is a touchstone upon which the community must agree or else little useful progress can be made. The crux of the argument seems to be whether you need to include the history of changes to a disk in a self-administering system that can modify anything on the disk. It then switches to a tutorial on Turing machines, followed by the author’s observations on Turing machine equivalence. We agree that a von Neumann computer is a Turing Machine, but are surprised that anyone in the last 50 years thinks it necessary to make that point in a paper. They seem to think that the Turing tape corresponds to disk. A simpler (and widespread) alternative interpretation is memory is the equivalent to tape. Perhaps the authors are impressed with the nonvolatility of paper tape and disks? This paper is an op-ed piece that was inspired by discussions at the last LISA, with cfengine the apparent target of its wrath. The program committee will have to judge this paper on whether it will lead to useful discussions at the next LISA. I think the probability of lively discussion is a perfectly valid reason for the PC to accept a paper. It is not clear to this reviewer what are the important contributions of this paper beyond a potentially lively debate, and whether it sheds light or just adds heat. ======================================================================= From couch@eecs.tufts.edu Fri May 24 09:59:39 2002 Received: from lin18.eecs.tufts.edu (IDENT:postfix@lin18.EECS.Tufts.EDU [130.64.23.48]) by roton.TerraLuna.Org (8.11.6/8.9.3) with ESMTP id g4OGxdv17436 for ; Fri, 24 May 2002 09:59:39 -0700 Received: from piano.eecs.tufts.edu (piano.eecs.tufts.edu [130.64.23.40]) by lin18.eecs.tufts.edu (Postfix) with ESMTP id 568306F98B; Fri, 24 May 2002 12:59:38 -0400 (EDT) Received: by piano.eecs.tufts.edu (Postfix, from userid 418) id 4350BCFC4; Fri, 24 May 2002 12:59:38 -0400 (EDT) To: stevegt@terraluna.org Subject: Re: LISA submission 050 Cc: paul@dcs.ed.ac.uk Message-Id: <20020524165938.4350BCFC4@piano.eecs.tufts.edu> Date: Fri, 24 May 2002 12:59:38 -0400 (EDT) From: couch@eecs.tufts.edu (Alva L. Couch) Status: RO Content-Length: 211 Lines: 5 It should be rather obvious that things are going to get really interesting around here. I am willing to work much more closely with you than a typical shepherd. Keep the ideas bouncing back and forth. -alva From kolstad@ace.DELOS.COM Mon Jul 8 09:00:27 2002 From kolstad@ace.DELOS.COM Mon Jul 8 09:00:27 2002 Received: from ace.DELOS.COM (ace.delos.com [65.102.83.114]) by roton.TerraLuna.Org (8.11.6/8.9.3) with ESMTP id g68G0Pf29175 for ; Mon, 8 Jul 2002 09:00:25 -0700 Received: (from kolstad@localhost) by ace.DELOS.COM (8.10.1/8.10.1) id g68FK7f22738 for stevegt@terraluna.org; Mon, 8 Jul 2002 09:20:07 -0600 (MDT) Date: Mon, 8 Jul 2002 09:20:07 -0600 (MDT) From: Rob Kolstad Message-Id: <200207081520.g68FK7f22738@ace.DELOS.COM> To: stevegt@terraluna.org Subject: --- USENIX LISA Conference Paper Submission Instructions --- Status: RO Content-Length: 17095 Hi! Congratulations on your paper acceptance for the LISA conference in Philadelphia. Please let me introduce myself as the formatter (typesetter) for your paper. --> Your paper is due on Monday, August 12, 2002. Really. <-- I'm enclosing detailed instructions below. This is really easy stuff -- just fight the temptation to format the paper yourself! I will be your contact for all typesetting issues. Please feel free to write me with any questions. Rob ==================================================================== /\ Rob Kolstad http://www.delos.com /\/ \ kolstad@delos.com 15235 Roller Coaster Road / \ \ +1 719-481-6542 Colorado Springs, CO 80921 ==================================================================== ------------------------------------------------------------------------------ INSTRUCTIONS FOR USENIX AUTHORS AND SPEAKERS Rev. June 8, 2002 LISA XVI, 2002, Philadelphia, Pennsylvania Eric Allman, University of California, Berkeley Judy DesHarnais, USENIX Association Rob Kolstad, Berkeley Software Design, Inc. Evi Nemeth, University of Colorado ABSTRACT This paper presents instructions for authors of papers at USENIX technical conferences and guidelines for speaking at those conferences. It discusses details of preparing papers for inclusion in the proceedings and of preparing visual aids for use during the talk. Careful adherence to these instructions will result in a calmer interaction between submitters and conference organizers, to everyone's benefit. NO MATTER WHAT, please read the enclosed checklist and follow its instructions. Final Paper Submission Deadline: August 12, 2002 1. General Please do not spend time making the paper look just the way you want it. In fact, don't spend any time at all. We'll end up re-paginating and re- formatting it, anyway. Really; every single time. If it's convenient or if you have tricky tables, diagrams, or pictures, send a Postscript copy of the entire paper along so that the typesetter can see what it looks like. Note that you are not sending a final draft or a paper to which you still have edits to apply. You're submitting a final paper which will be returned as time allows to you for a single `final typesetting check' before being reproduced. Please don't ask to make very many changes after you've sent your paper in unless it's a true emergency (e.g., company secrets shouldn't be disclosed). 1.1 Reviewing typeset papers We will e-mail your formatted paper back to you (in PostScript) as soon as it is formatted. Please review it quickly (i.e., one day) and let us know if there are any problems or not. Most papers will be returned within a week of their submission. Please designate an alternate reviewer if you can't return an `OK' to us almost immediately. Three days is the absolute maximum time for this review as we're on a tight deadline. 2. Text Submission To ensure a uniform look in the conference proceedings, most papers will be typeset for you by Rob Kolstad. This really ends up making it easier for you if you follow the directions below - don't work hard! Just put the words in a file and we do the rest. Authors are requested to submit electronic copy in one of many forms: o ASCII o ASCII with markups for italics, etc. (e.g., ... or \fI...\fP) o Troff o TeX (subset of LaTeX standard macros) o Microsoft Word (but not Microsoft Word illustrations - only postscript illustrations) The typesetter will create both PostScript (for proceedings and archives) and HTML output (for the archives). Submissions will also be published in the USENIX online library on the World Wide Web. Please do not spend any time at all formatting your paper other than getting the content of the diagrams to look nice (they are inserted without change, usually) and somehow marking those words that are in italic, bold, or fixed-width type. The typesetter does a fine job and will send you back a formatted copy for review and comment. You probably won't be able to get a good test of what your final output will be like since character width tables, for example, differ from system to system. In fact, you're better off sending your paper early and letting the typesetter do his magic and get it back to you quickly (often within 24 hours if you beat the deadline by more than a week). ASCII is a fine submission medium. FrameMaker, MSWord, and other WYSIWYG editors all have an ASCII output option - feel free to use it. Send along a postscript file of the document with properly displayed italic words, bold words, and the like so we get it typeset correctly. Some authors feel so strongly about typesetting their papers that they are reluctant to submit them for re-typesetting. If you are one of those very few authors, we will paste page numbers onto your paper and print it as submitted. We strongly prefer not to do this and have not done so for almost a decade. If you require it, please send us both PostScript for your paper (and for the archives) and ASCII (for the archives and WWW pages). If you can not submit electronic copy, please contact Rob Kolstad [Mountain time: 719-481-6542] IMMEDIATELY. => ALL SPEAKERS MUST PROVIDE A PAPER FOR THE PROCEEDINGS <= If you can not submit a suitable copy of your paper by the deadline, your talk will be struck from the conference. 3. Formatting Please DO NOT SPEND TIME FORMATTING YOUR PAPER. We just end up reformatting it anyway. Counterintuitive as this might seem, it really is the fastest way for everyone concerned. While the typesetting dude will be using troff, hardly anyone else does any more. Please send a quick note if you'd like to submit in native troff macros. Otherwise, feel free to use -me or -ms, but don't make it fancy! Feel free to submit pidgin troff with any consistent use of macros. If you'd like to submit in TeX, we use a converter that reads a subset of LaTeX and converts it to our standard format. Please do not use other TeX features (like macros and defines)! Please do not create your own clever macros! This includes abbrevations of any sort. Please use only these macros: abstract emph list subsection and end maketitle subsubsection author enumerate newblock table begin evensidemargin nocite tabular bf figure noindent texttt bibitem footnote normalsize textwidth bibliographystyle footnotesize oddsidemargin thanks center hline ref title cite hyphenation reg tm date it section tt description item sl verb document itemize sloppy verbatim documentstyle label sloppypar em ldots small Please use the standard font changes, not new ones that you define. If you'd like to submit in Microsoft Word, enclose the file as some sort of an attachment and it will be converted. Note that we can accept illustrations only in encapsulated Postscript - MSWord illustrations are almost always unusable. If you don't want to submit in troff, TeX or MSWord, PLEASE SUBMIT AN ASCII FILE (maybe with markups to indicate typesetting directives like `italic' or `bold') - it takes less time, on average, to typeset an ASCII file from scratch than it does to typeset page numbers onto your pages. Double space between paragraphs; mark your headers in some consistent format that a human can decode. It takes about 10 minutes for us to add typesetting commands to an ASCII file. It can take hours to untangle clever macro invocations and/or formatting commands from other formats. We can not decipher PostScript versions of entire papers for re- typesetting purposes. 4. Diagram/Illustration Submission Submit diagrams as Encapsulated Postscript files. Note that just putting a diagram on a page and capturing printer output is not encapsulated postscript. In fact, various packages now use so many tricks that it is often impossible to extract the essence of the diagram. Encapsulated postscript truly is the answer. Furthermore, FrameMaker pictures are really painful to deal with - their encapsulated postscript doesn't work very well. Please do not put captions with your diagrams or boxes around them. Put the captions in the text (as comments if you need to), not the diagram. The typesetter will insert the diagram name (e.g., ``Listing 1'') and caption on the diagram. Feel free to send us GIFs or TIFFs (e.g., screen images) for making into Encapsulated Postscript - our tools work great. Normally, your paper will end up being typeset in double columns. If you can confine your tables and pictures to about 3.25 inches wide, that'd be great. Otherwise, we'll put the larger figures and diagrams across the two columns - most likely floated to the bottom or top of the page. Be sure that we can decipher the names of your figures and where they should appear in your text. 5. Details 5.1 Page Count Please limit the length of your paper to 16 typeset pages (surfaces) total (some papers are more limited; you've been informed if this is the case). Some dispensations may be available. Note that the USENIX macros do a good job of compressing your paper from other printing methods. Many papers see page compression ratios as high as 2:1 or higher (i.e., 30 typewriter pages turn into 15 typeset pages). We do not anticipate page length to be a major issue. Please contact the typesetter if you think you might exceed this limit. 5.2 Author's Biographies Author biographies follow each paper. Here is a sample biography: Tom Christiansen left the University of Wisconsin in 1987 with an MS-CS where he had been a system administrator for 6 years. He joined CONVEX Computer Corporation in Richardson, Texas where he is a software development engineer in the Internal Tools Group there. Reach him via U.S. Mail at CONVEX Computer Corporation; POB 833851; 3000 Waterview Parkway; Richardson, TX 75083-3851. Reach him electronically at . 5.3 Availability Please include a section at the end of your paper that provides software availability information. If the system you describe is available to others or if more information (reports, etc.) may be obtained, please indicate terms and contact information. 5.4 Submitting your Paper Submissions should be mailed electronically to: . Please mail the ``big paper'' in a single file along with separate .eps files using MIME or some other reliable transfer mechanism (not PGP). All received e-mail is unpacked and acknowledged as soon as it appears that the files are received correctly. Illustrations are checked immediately as they are often the main problem in transferring info to us. In case you mail us hardcopy of your paper (due to complex figures and tables, difficult typographical features, or to show bold/italic words), we have learned that the US mail system and Federal Express are not always as gentle with your paper as you might like. We have found that the following guidelines maximize your paper's chance of survival: no paper clips; no staples; correctly sized envelope, not too big; cardboard stiffeners front and back. This applies particularly to photographs that you may mail to us. 5.5 Trademarks Please do not include special trademark notices throughout your paper (e.g., ``UNIX is a trademark of ...''). Instead, please note the trademark references near the top of your paper in comments (or send them under separate cover that notes that the trademarks are used). We'll include all the trademarks in one section at the beginning of the proceedings. 5.6 Licensed Material The matter of publishing licensed material is of grave concern to USENIX. One incident has already come dangerously close to causing serious problems. The Association does not intend for this to happen again. The final copy will be reviewed for such content, and if, in the opinion of those responsible for worrying about such things, the submission contains licensed material, it can not be printed in the Proceedings. The deadlines are such that it is unlikely one could amend a submission and get it back in time, so look before you send it. If there is ANY question on this matter, please contact your program chair immediately. 5.7 Author Registration for Conferences Registration info has been sent to you under separate cover. 6. Deadlines In order to publish the most timely of papers, the publication schedule is very tight. We require your cooperation in order to have the Proceedings available at conference registration. To meet the necessary internal deadlines, we must have: o The electronic copy for your paper (including biographies), o Hard copy of it for reference (rarely required), o Hard copy of any graphics embedded in the paper (rarely required), and o The completed checklist (electronic copy is fine); see next paragraph We must have these in hand at the close of business on August 12, 2002. Manuscripts arriving after this date will be at risk. On the enclosed checklist that you will include with your paper, please be sure to indicate: your name; your mailing address; your electronic mailing address; your phone numbers, preferably both day and evening numbers (which we really do use when required); and any other information the people printing the Proceedings might need to contact you (such as "on vacation next week, please contact second author at ..."). It is vitally important that we have a way of contacting you quickly if there are any problems with your submission. 7. Contacts The program chair is Alva Couch (couch@eecs.tufts.edu) The proceedings typesetter is Rob Kolstad [Mountain time: 719-481-6542]. For other issues,such as audio/visual requests and registering for the conference, contact conference@usenix.org [Pacific time: 714-588-8649x30]. Please note: laptops are not provided. If you wish to discuss any special A/V needs, please contact Greta Thurman; conference@usenix.org; +1 714-588-8649x30 If you need to contact any of us, the preferred route is electronic mail. CHECKLIST FOR USENIX AUTHORS AND SPEAKERS LISA XVI, 2002, Philadelphia, Pennsylvania 1. Electronic Submission Checklist All these items are due by August 12, 2002. If you can not submit by then, please contact the formatter immediately. Please submit them electronically to . 1.1 Author's Contact Information Please provide contact information for at least one author who will be in town the week following the deadline for paper submission: Name: ____________________________ Address: ____________________________ ____________________________ ____________________________ Work Phone: ____________________________ Home Phone: ____________________________ Email: ____________________________ 1.2 Items for the paper _____ text of your paper in ASCII, troff, TeX, or MSWord _____ Text includes biography for each author (in the paper itself) _____ TeX users: include output after running your text through refer or bib (if you use them) _____ Troff users: pic input for your graphics (in the paper, if you use them) _____ Encapsulated PostScript files for all diagrams & pictures (if you use them) _____ Special instructions for formatting your paper (if required) _____ 'Consent to publish' form sent to USENIX HQ 2. Audio Visual _____ While a microphone, overhead projector, and `beamer' (computer display projector) are available, laptops are not provided. If you wish to discuss any special A/V needs, please contact Greta Thurman; conference@usenix.org; +1 714-588-8649x30 From stevegt@TerraLuna.Org Wed Jul 10 02:39:42 2002 Date: Wed, 10 Jul 2002 02:39:42 -0700 From: Steve Traugott To: Paul Anderson , couch@eecs.tufts.edu Cc: lance@TerraLuna.Org Subject: Progress on LISA 050 (Turing) Message-ID: <20020710023942.B23398@scramjet.TerraLuna.Org> References: <200207041229.g64CTIZ04264@moondog.dcs.ed.ac.uk> <20020705144722.B30520@scramjet.TerraLuna.Org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20020705144722.B30520@scramjet.TerraLuna.Org>; from stevegt@TerraLuna.Org on Fri, Jul 05, 2002 at 02:47:22PM -0700 Status: RO Content-Length: 2620 Hi All, Taking longer than I expected, but making progress. I wouldn't waste your time looking at it yet, but I just checked the latest (very) rough draft out of CVS at: http://www.infrastructures.org/papers/turing/turing.html ...should be more readable by the end of this week. I want to get this stabilized soon so I can focus on shepherding other papers. Two interesting thoughts (neither of which is in the text yet): - I think we can now demonstrate a clean NP, P, NP-complete progression for finding one good sequence of changes, validating that sequence, or finding all good sequences, respectively. The idea here is that it's slightly expensive to figure out what order to make changes on a host to get a desired behavior, less expensive to test for that behavior, but intractably expensive to figure out the set of all sequences which will produce the same behavior. This would seem to imply that you will greatly reduce costs if you can constrain yourself to the first sequence you validated. This is nice to be able to show, since it's one of the core intuitions we started with last year. (Thanks for pushing me on this, Alva.) - A network-connected host which autonomously fetches and executes privileged-mode code from other hosts, and uses that code to decide what code to fetch and execute later, may be closer to emulating an infinite-tape Turing machine than any of us has suspected (it's a two-tape machine, with a finite, local r/w tape and an effectively infinite read tape). This seems especially true if privileged-mode code has an intrinsic ability to fetch and install new copies of itself from arbitrary servers elsewhere on the Internet (think debian, for instance). Still thinking about this one. G'night, Steve On Fri, Jul 05, 2002 at 02:47:22PM -0700, Steve Traugott wrote: > ...slaving away at it over this 4-day weekend... ;-) Hope to get a > draft to both of you by Monday. > > Steve > > On Thu, Jul 04, 2002 at 01:29:18PM +0100, Paul Anderson wrote: > > > > :-) > > > > Paul > > > > couch@eecs.tufts.edu (Alva L. Couch) says .. > > > > >Happy 4th of July. Now would be a good time for shepherds to remind > > >authors that they'd like to see a near-final draft before the end of > > >July. > > > > > > > > -- > Stephen G. Traugott > UNIX/Linux Infrastructure Architect, TerraLuna LLC > stevegt@TerraLuna.Org > http://www.stevegt.com -- Stephen G. Traugott UNIX/Linux Infrastructure Architect, TerraLuna LLC stevegt@TerraLuna.Org http://www.stevegt.com