Burgess M. Principles of Network and System Administration. 2004.

6.7. COMPETITION, IMMUNITY AND CONVERGENCE

with the smooth running of systems. When two manual administrators have a difference of opinion, there can be contention. The relevance of interpersonal skills in system administration teamwork was considered in ref. [168] and a cooperative shell environment for helping to discipline work habits was considered in ref. [2].

6.6.3 Central control

Another approach to system administration is the use of control systems, in the manner of the star model. Tivoli, HP OpenView and Sun Solstice are examples of these. In the control approach, the system administrator follows the state of the network by defining error conditions to look for. A process on each host reports errors as they occur to the administrator. In this way the administrator has an overview of every problem on the network from his/her single location and can either fix the problems by hand as they occur (if the system supports remote login), or distribute scripts and antidotes which provide a partial automation of the process. The disadvantage with this system is that a human administrator usually has to start the repair procedures by hand and this creates a bottleneck: all the alarms go to one place to be dealt with serially. With this approach, the amount of work required to run the system increases roughly linearly with the number of hosts on the network.

6.6.4 Immunology (self-maintenance)

A relatively new approach to system management, which is growing in popularity, is the idea of equipping networked operating systems with a simple immune system. By analogy with the human body, an immune system is an automatic system that every host possesses, which attempts to deal with emergencies. An immune system plays the role of the Fire, Police and Paramedic services, as well as the garbage collection agencies. In an immune system, every host is responsible for automatically repairing its own problems, without raising alarms to a human about what is going on. This avoids the serial bottleneck created by a human administrator. The time spent on implementing and running this model is independent of the number of hosts on the network.
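The scaling contrast between the central-control model and the immunity model can be made concrete with a toy calculation. The sketch below is a minimal illustration under stated assumptions: the fault rate and handling times are made-up constants, every host is assumed to fail at the same rate, and local repairs are assumed to run in parallel on each host.

```python
"""Toy model contrasting the central-control and immunity models.

Assumptions (illustrative, not measured): every host suffers the same
number of faults per day, a human needs a fixed time to handle each
alarm, and each host's own agent needs a fixed time per fault.
"""

FAULTS_PER_HOST_PER_DAY = 2      # hypothetical fault rate
HUMAN_MINUTES_PER_ALARM = 15.0   # hypothetical handling time per alarm
AGENT_MINUTES_PER_FAULT = 0.5    # hypothetical local repair time


def central_admin_minutes(hosts: int) -> float:
    """All alarms queue at one administrator and are handled serially."""
    alarms = hosts * FAULTS_PER_HOST_PER_DAY
    return alarms * HUMAN_MINUTES_PER_ALARM


def immune_wallclock_minutes(hosts: int) -> float:
    """Each host repairs itself in parallel, so wall-clock time per day
    depends only on the per-host load, not on the number of hosts."""
    return FAULTS_PER_HOST_PER_DAY * AGENT_MINUTES_PER_FAULT


for n in (10, 100, 1000):
    print(f"{n:5d} hosts: central {central_admin_minutes(n):8.0f} min, "
          f"immune {immune_wallclock_minutes(n):4.1f} min")
```

Multiplying the number of hosts by ten multiplies the central administrator's queue by ten, while the per-host repair time stays fixed.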

6.7 Competition, immunity and convergence

All collective systems (including all biological phenomena) are moderated and stabilized by a cooperative principle of feedback regulation. This regulating principle is sometimes called the prey–predator scenario, or a game, because it is about competition between different parts of a system. When one part of the system starts to grow out of control, it tends to favor the production of an antidote which keeps that part in check. Similarly, the antidote cannot exist without the original system, so it cannot go so far as to destroy the original system, since it destroys itself in the process. A balance is therefore found between the original part of the system and its antidote. The classical example of a prey–predator model is that of populations of foxes and rabbits. If the number of rabbits increases suddenly, then foxes feed well and grow in numbers, eating more rabbits, thus stabilizing
the numbers. If rabbits grow scarce, then foxes die and thus an equilibrium is maintained. Another example of this type of behavior is to be found in the body’s own repair and maintenance systems. The name ‘immunity’ is borrowed from the idea that systems of biological complexity are able to repair themselves in such a way as to maintain an equilibrium called health. The relative immunity of, for instance, the human body to damage and disease is due to a continual equilibrium between death, cleanup and renewal. Immunity from disease is usually attributed to an immune system, which is composed of cells that fight invading organisms, though it has become clear over the years that the phenomenon of immunity is a function of many cooperating systems throughout the entire human organism, and that disease does not distinguish between self and non-self (body and invader) as was previously thought. In the immunity model, we apply this principle to the problem of system maintenance.
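The fox and rabbit example is conventionally written as the Lotka–Volterra prey–predator equations (the text does not name them). The sketch below integrates them with a simple Euler step to show the feedback loop keeping both populations bounded. The parameters a, b, c, d and the starting populations are arbitrary assumptions, and explicit Euler slowly drifts outward, so this is a qualitative illustration rather than an accurate solver.

```python
"""Euler integration of the Lotka-Volterra prey-predator equations.

    dR/dt = a*R - b*R*F      (rabbits breed, and are eaten by foxes)
    dF/dt = d*R*F - c*F      (foxes grow by eating rabbits, die otherwise)

Parameter values are arbitrary, chosen only to show the feedback loop.
The equilibrium is R* = c/d = 30 rabbits, F* = a/b = 5 foxes.
"""


def simulate(rabbits=40.0, foxes=9.0, a=0.1, b=0.02, c=0.3, d=0.01,
             dt=0.01, steps=20000):
    history = []
    for _ in range(steps):
        dr = (a * rabbits - b * rabbits * foxes) * dt
        df = (d * rabbits * foxes - c * foxes) * dt
        rabbits, foxes = rabbits + dr, foxes + df
        history.append((rabbits, foxes))
    return history


hist = simulate()
r_vals = [r for r, _ in hist]
f_vals = [f for _, f in hist]
print(f"rabbits oscillate between {min(r_vals):.1f} and {max(r_vals):.1f}")
print(f"foxes oscillate between {min(f_vals):.1f} and {max(f_vals):.1f}")
```

Neither population runs away or dies out: each swings around the equilibrium because growth of one part of the system feeds its own counter-force.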

The immunity model is about self-sufficient maintenance and is of central importance to all scalable approaches to network management, since it is the only model which scales trivially with the number of networked hosts. The idea behind immunity is to automate host maintenance in such a way as to give each host responsibility for its own configuration. A level of automation is introduced to every host in such a way as to bring each host into an ideal state. What we mean by an ideal state is not fixed: it depends on local system policy, but the central idea of the immunity model is to keep hosts as close to their ideal state as possible.

The immunity model has its origins in the work of John von Neumann, the architect of modern computer systems. He was the first person to recognize the analogy between living organisms and computers [313, 314], and clearly understood the conceptual implications of computing machines which could repair and maintain themselves, as early as 1948.

Automatic systems maintenance has been an exercise in tool-building for many years. The practice of automating basic maintenance procedures has been commonplace in the Unix world, see section 7.8.1. Following von Neumann’s insights, the first theoretical work on this topic, addressing the need for convergence, appears in refs. [41, 55]. The biological analogy between computers and human immune systems has been used to inspire models for the detection of viruses, principally in insecure operating systems. This was first discussed in 1994 by Kephart of IBM in ref. [175] and later expanded upon by Forrest et al. [118, 291, 121, 119, 156, 290, 317, 155, 94, 120, 238, 93]. The analogy between system administration and immunology was discussed independently by Burgess in [43, 44], in the wider context of general system maintenance. References [44, 42] also discuss how computer systems can be thought of as statistical mechanical systems, drawing on a wide body of knowledge from theoretical physics. Interestingly, ref. [44] and ref. [291], which appeared slightly earlier, point out many of the same ideas independently, both speculating freely on the lessons learned from human immunology, though the latter authors do not seem to appreciate the wider validity of their work to system maintenance.

The idea of immunity requires a notion of convergence. Convergence means that maintenance work (the counter-force or antidote) tends to bring a host to a state of equilibrium, i.e. a stable state, which is the state we would actually like the system
to be in. The more maintenance that is performed, the closer we approach the ideal state of the system. When the ideal state is reached, maintenance work stops, or at least has no further effect. The reason for calling this the immunity model is that this is precisely the way that biological maintenance works. As long as there is damage or the system is threatened, a counter-force is mobilized, followed by a garbage collection and a repair team. There is a direct analogy between medicine and computer maintenance. Computer maintenance is just somewhat simpler.
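Convergence can be pictured as repeated maintenance passes that move a host step by step towards its ideal state, until a pass finds nothing to do. In the minimal sketch below, the host state and the ideal state are hypothetical dictionaries, and repairing one deviation per pass is an assumption chosen to mimic gradual repair.

```python
"""Convergent maintenance: repeated passes drive a host to its ideal
state, after which further passes change nothing (a fixed point).

The host 'state' and 'policy' here are hypothetical dictionaries."""

IDEAL = {"sshd": "running", "/etc/motd": "0644", "tmp_files": 0}


def maintenance_pass(state: dict, ideal: dict) -> tuple[dict, int]:
    """Fix at most one deviation per pass, to mimic gradual repair.
    Returns the new state and the number of changes made."""
    new_state = dict(state)
    for key, wanted in ideal.items():
        if new_state.get(key) != wanted:
            new_state[key] = wanted
            return new_state, 1
    return new_state, 0  # already ideal: the pass has no effect


state = {"sshd": "stopped", "/etc/motd": "0666", "tmp_files": 412}
passes = 0
while True:
    state, changed = maintenance_pass(state, IDEAL)
    if changed == 0:
        break
    passes += 1

print(passes, state == IDEAL)  # 3 True
```

Once the ideal state is reached, running the pass again is harmless: it returns the same state and reports zero changes.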

Critics of the convergence approach to system administration argue that systems should be controlled absolutely and not allowed to simply meander into a stable state. Traugott has argued that many users are not disciplined enough to make convergence adequate for ensuring predictability, and that hosts should be managed absolutely by wiping out hosts that deviate from specification and rebuilding them step by step. This approach is called congruence rather than convergence [304]. Convergence proponents retort that convergence by ensuring sequences of commuting atomic operations is the only reliable way to achieve a guaranteeable state.
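The claim about commuting atomic operations can be checked on a small example. The sketch below assumes hypothetical "ensure" primitives, each of which sets a distinct piece of state to an absolute value; under that assumption every ordering of the operations reaches the same final state, and repeating the whole sequence changes nothing.

```python
"""Order independence of convergent atomic operations.

Each operation below is a hypothetical 'ensure' primitive touching a
distinct piece of state and setting it to an absolute value, so any
ordering of the sequence yields the same final state."""

from itertools import permutations


def ensure(key, value):
    """Return an operation that sets state[key] to value."""
    def op(state):
        new_state = dict(state)
        new_state[key] = value
        return new_state
    return op


def apply_all(state, ops):
    for op in ops:
        state = op(state)
    return state


ops = [ensure("ntpd", "running"), ensure("umask", "022"), ensure("quota", "on")]
start = {"ntpd": "stopped", "umask": "000"}

# Collect the distinct final states reached by every possible ordering.
results = {tuple(sorted(apply_all(start, order).items()))
           for order in permutations(ops)}
print(len(results))  # 1: every ordering converges to the same state
```

Two operations that set the same key to different values would not commute, which is why convergent policies avoid such conflicts.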

6.8 Policy and configuration automation

Automating configuration from a high-level policy was the idea behind cfengine. Prior to cfengine, several authors had explored the possibilities for automation and abstraction without combining all the elements into an integrated framework [138, 114, 154, 14]; most of these were too specific or too low-level to be generally useful.

Cfengine and PIKT [231] are system administration tools consisting of two elements: a language and a configuration engine. Together these are used to instruct all hosts on a network how to configure and maintain themselves. Rather than being a cloning mechanism, cfengine takes a broader view of system configuration, enabling host configurations to be built from scratch on classes of host.

Cfengine is about defining the way we want all the hosts on our network to be configured, and having them do the work themselves. PIKT is similar, but allows a mixture of declarative and imperative programming to define host policy. These and other tools are for automation and for definition. Because they include language for describing system configuration at a high level, they can also be used to express system policy in formal terms. The correct way to use cfengine is therefore to specify and automate system policy in terms of concrete actions. See section 7.11.

What makes declarative languages different from scripting languages is the high level at which they operate. Rather than allowing complete programming generality, they usually provide a set of intelligent primitives for configuring and maintaining systems. An important feature of cfengine primitives is that they satisfy, as far as possible, the principle of convergence (see section 6.7). This means that a policy expressed by a cfengine program can easily be made to embody a convergent behavior. As a system inevitably drifts from its ideal state, a cfengine policy brings it back to that ideal state. When it reaches that state, cfengine becomes quiescent and does no more.
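The difference between an imperative step and a convergent primitive shows up as soon as a step is run twice. The sketch below is written in Python rather than in cfengine's own language, works on an in-memory list of lines, and uses a hypothetical resolver entry; it is not cfengine's implementation.

```python
"""Imperative versus convergent (declarative) edits to a config file.

Operates on an in-memory list of lines; 'nameserver 10.0.0.1' is a
hypothetical entry."""


def append_line(lines, line):
    # Imperative: "do this step". Running it twice duplicates the line.
    return lines + [line]


def ensure_line(lines, line):
    # Declarative: "this line should be present". A no-op once satisfied,
    # so repeated runs are quiescent.
    return lines if line in lines else lines + [line]


conf = ["search example.org"]
entry = "nameserver 10.0.0.1"

imperative = append_line(append_line(conf, entry), entry)
declarative = ensure_line(ensure_line(conf, entry), entry)

print(imperative.count(entry), declarative.count(entry))  # 2 1
```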

Policy-based administration works from a central configuration, maintained from one location. That central configuration describes the entire network by referring to classes and types of host. Many abstraction mechanisms are provided for mapping out the networks. The work of configuration and maintenance is performed by each host separately. Each host is thus given responsibility for its own state and the work of configuration is completely distributed. This means that a cfengine or PIKT policy, for instance, scales trivially with the number of hosts, or put another way, the addition of extra hosts does not affect the ability of other hosts to maintain themselves. Traffic on servers increases at most linearly with the number of hosts and the network is relied upon as little as possible. This is not true of network-based control models, for instance, where network resource consumption increases at least in proportion to the total number of hosts and the model is completely reliant on network integrity (see section 6.3).
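One central policy that refers to classes of host can be evaluated independently by every host. The sketch below is loosely in the spirit of cfengine classes but is not its syntax or semantics; the class names, group membership and rule texts are all hypothetical.

```python
"""Class-based policy resolution, loosely in the spirit of cfengine
classes. Host attributes, class names and rules are all hypothetical."""

import platform
import socket

# One central policy: each rule applies to hosts in a given class.
POLICY = [
    ("any",        "ensure ntpd running"),
    ("linux",      "ensure /etc/resolv.conf lists site nameservers"),
    ("webservers", "ensure httpd running"),
]

# Central mapping of named groups to host names (hypothetical).
GROUPS = {"webservers": {"www1", "www2"}}


def host_classes(hostname: str, os_name: str) -> set[str]:
    """Each host works out for itself which classes it belongs to."""
    classes = {"any", hostname, os_name.lower()}
    classes |= {g for g, members in GROUPS.items() if hostname in members}
    return classes


def applicable_rules(classes: set[str]) -> list[str]:
    """Select the rules of the central policy that apply to these classes."""
    return [action for cls, action in POLICY if cls in classes]


# Every host evaluates the same central policy locally.
me = host_classes(socket.gethostname(), platform.system())
for action in applicable_rules(me):
    print(action)
```

Adding a host adds no work for the others: it only evaluates the shared policy against its own classes.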

6.9 Integrating multiple OSs

Combining radically different operating systems in a network environment is a challenge both to users and administrators. Each operating system serves a specific function well, and if we are to allow users to move from operating system to operating system with access to their personal data, we need to balance the convenience of availability with the caution of differentiation. It ought to be clear to users where they are, and what system they are using, to avoid unfortunate mistakes. Combining different Unix-like systems is challenge enough, but adding Windows hosts or Macintosh technology to a primarily Unix-based network, or vice versa, requires careful planning [37]. Integrating radically different network technologies is not worth the effort unless there is some particular need. It is always possible to move data between two hosts using the universally supported FTP protocol. But do we need to have open file sharing or software compatibility?

6.9.1 Compatible naming

Different operating systems use quite different naming schemes for objects. Until the late 1990s, Unix names could not be represented in MSDOS unless they were no longer than eight characters. Some operating systems did not allow spaces in filenames. Some reserve special meanings for certain characters. The Internet URL scheme has its own convention for naming objects, which takes into account the service or communications channel used to access the object:

Channel://Object-name
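The channel and object-name parts of a URL can be pulled apart with Python's standard urllib.parse module. The URLs below are made-up examples; urlparse calls the channel the scheme, and splits the object name into a network location and a path.

```python
from urllib.parse import urlparse

# Split URLs into the access channel (scheme) and the object name.
for url in ("ftp://ftp.example.org/pub/README",
            "http://www.example.org/index.html",
            "file:///home/alice/notes.txt"):
    parts = urlparse(url)
    print(f"{parts.scheme:5} {parts.netloc or '(local)':16} {parts.path}")
```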

File names are often, but not always, hierarchical. Windows introduced the notion of ‘drives’, for instance: A:, B:, C: and so on. The Internet Protocol family uses a hierarchical naming scheme encoded into IP addresses. The general problem of naming objects in distributed systems has great importance to being able to locate resources and express their locations. See ref. [71] for a discussion of this, for example.
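The contrast between a single rooted hierarchy and Windows drive letters can be seen with Python's pathlib, which can parse both styles of name on any platform. The paths below are illustrative examples.

```python
from pathlib import PurePosixPath, PureWindowsPath

unix = PurePosixPath("/home/alice/report.txt")
win = PureWindowsPath(r"C:\Users\alice\report.txt")

# A Unix path hangs from a single root "/" and has no drive component.
print(unix.drive or "(none)", unix.parts)
# A Windows path starts from a drive letter such as "C:".
print(win.drive, win.parts)
```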

Names can play a fundamental role in how we choose to integrate resources within a system. They address both cultural and practical issues.

6.9.2 Filesystem sharing

Sharing of filesystems between different operating systems can be useful in a variety of circumstances. File-servers, which host and share users’ files, need to be fast, stable and capable machines. Workstations for end-users, on the other hand, are chosen for quite different reasons. They might be chosen to run some particular software, on economic grounds, or perhaps for user-friendliness. The Macintosh has always been a favorite workstation for multi-media applications. It is often the preferred platform for music and graphical applications. Windows operating systems are cheap and have a wide and successful software base.

There are other reasons for wanting to keep an inhomogeneous (heterogeneous) network. An organization might need a mainframe or vector processor for intensive computation, whose disks need to be available to workstations for collecting data. There might be legacy systems waiting to be replaced with new machinery, which we have to accommodate in order to run old software, or development groups supporting software across multiple platforms. There are a dozen reasons for integration.

What about solutions? Most solutions to the file-sharing problem are software based. Client and server software is available for implementing network-sharing protocols across platform boundaries. For example, client software for the Unix NFS filesystem has been implemented for both Windows (PCNFS) and Macintosh System 7/8/9. This enables Windows and Macintosh workstations to use Unix-like hosts as file and printer servers, in much the same way as Windows servers or Novell Netware servers provide those services. These services are adequate for insecure operating systems, since there is no need to map file permissions across foreign filesystems. Windows is more of a problem, however. Windows ACLs cannot be represented in a simple fashion on a Unix filesystem.
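Why ACLs do not fit into Unix permissions can be seen by trying to collapse one. A Unix mode has exactly three permission slots (owner, group, other), while an ACL may name any number of principals. The ACL format below is a hypothetical simplification, not the real Windows ACL structure; the sketch reports the entries that have nowhere to go.

```python
"""Why an ACL does not fit into Unix permission bits.

A Unix mode has exactly three permission classes (owner, group, other).
The ACL format below is a hypothetical simplification of a Windows ACL:
a list of (principal, rights) entries for any number of principals."""

import stat


def mode_string(mode: int) -> str:
    """Render permission bits as ls-style text for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


def acl_to_mode(acl, owner, group):
    """Collapse an ACL to mode bits; report entries that cannot be kept."""
    bits = {"r": 4, "w": 2, "x": 1}
    mode, lost = 0, []
    for principal, rights in acl:
        value = sum(bits[c] for c in rights)
        if principal == owner:
            mode |= value << 6
        elif principal == group:
            mode |= value << 3
        elif principal == "Everyone":
            mode |= value
        else:
            lost.append(principal)  # no slot for a second named user
    return mode, lost


acl = [("alice", "rw"), ("staff", "r"), ("bob", "rw"), ("Everyone", "")]
mode, lost = acl_to_mode(acl, owner="alice", group="staff")
print(mode_string(mode), "lost:", lost)  # -rw-r----- lost: ['bob']
```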

The converse, that of making Unix files available to PCs, has the reverse problem. While NT is capable of representing Unix file permissions, Windows 9x and the Macintosh are not. Insecure operating systems are always a risk in network sharing. Samba is a free software package which implements Unix file semantics in terms of the Windows SMB (Server Message Block) protocols.

Netware provides a client for NT called NDS (Novell Directory Services), which allows NT domain servers to understand the Novell object directory model. Clearly, there is already filesystem compatibility between PC servers. Conversely, NT provides Netware clients, and other server products can be purchased to provide access to AS/400 mainframes. Both Novell and NT provide Macintosh clients, and Macintosh products can also talk to NT and Unix servers. GNU/Linux has made a valiant attempt to link up with most existing sharing protocols on Unix, PCs and Apple hosts.

Mechanisms clearly exist to implement cross-platform sharing. The main question is, how easy are these systems to implement and maintain? Are they worth the cost in time and money?

6.9.3 User IDs and passwords

If we intend to implement sharing across such different operating systems as Unix and Windows, we need to have common usernames on both systems. Cross-platform user authentication is usually based on the understanding that username text can be mapped across operating systems. Clearly, numerical Unix user IDs and Windows security IDs cannot map meaningfully between systems without some glue to match them: that glue is the username. To achieve sharing, then, we must standardize usernames. Unix-like systems often require usernames to be no more than eight characters, so this is a good limit to keep to if Unix-like operating systems are involved or might become involved.
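The role of the username as glue can be shown with two account tables joined on the name. All account names, UIDs and SIDs below are invented for illustration; the eight-character limit is the conservative one suggested in the text.

```python
"""Username as the glue between Unix UIDs and Windows SIDs.

All identifiers below are made up for illustration."""

unix_accounts = {"alice": 1001, "bob": 1002, "carolyn99": 1003}
windows_accounts = {
    "alice": "S-1-5-21-1111111111-2222222222-3333333333-1104",
    "bob": "S-1-5-21-1111111111-2222222222-3333333333-1105",
}

MAX_NAME = 8  # conservative limit for older Unix-like systems


def uid_to_sid(uid):
    """Map a Unix UID to a Windows SID via the shared username."""
    for name, u in unix_accounts.items():
        if u == uid:
            return windows_accounts.get(name)
    return None


def naming_problems():
    """Return (names that are too long, names missing on one side)."""
    problems = [n for n in unix_accounts.keys() | windows_accounts.keys()
                if len(n) > MAX_NAME]
    unmatched = sorted(unix_accounts.keys() ^ windows_accounts.keys())
    return sorted(problems), unmatched


print(uid_to_sid(1002))
print(naming_problems())  # (['carolyn99'], ['carolyn99'])
```

Without a common username there is nothing to join on: UID 1003 has no Windows counterpart and maps to None.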

Principle 36 (One name for one object II). Each user should have the same unique name on every host. Multiple names lead to confusion and mistaken identity. A unique username makes it clear which user is responsible for which actions.

Maintaining common passwords across multiple platforms is much harder than disk sharing, and it is a much more questionable practice (see below).

6.9.4 User authentication

Making passwords work across different operating systems is often a pernicious problem in a scheme for complete integration. The password mechanisms for Unix and Windows are completely different and basically incompatible. The new Mac OS X Server is based on 4.4BSD, so its integration with other Unix-like operating systems should be relatively painless. Windows, however, remains the odd one out. Whether or not it is correct to merge the password files of two separate operating systems is a matter for policy. The user base of one operating system is often different from that of another. From a security perspective, making access easy is not always the right thing to do.

Passwords are incompatible between Windows and Unix for two reasons: NT passwords can be longer than Unix passwords and the form of encryption used to store them is different. The encryption mechanisms which are used to store passwords are one-way transformations, so it is not possible to convert one into the other. There is no escaping the fact that these systems are basically incompatible.
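The one-way nature of stored passwords can be illustrated with two stand-in schemes built from Python's hashlib. Neither function is the real Unix crypt(3) or NT algorithm: scheme A only mimics the eight-character limit of traditional DES-based crypt, and scheme B is an arbitrary different one-way function. The point is that a stored hash can only be checked by re-hashing a cleartext guess, never converted into the other scheme's form.

```python
"""Two stand-in password-storage schemes, showing why stored hashes
cannot be converted from one system's format to the other's.

Scheme A only mimics the 8-character limit of traditional DES-based
Unix crypt(3); neither function is the real Unix or NT algorithm."""

import hashlib
import os


def scheme_a(password: str, salt: bytes) -> bytes:
    # Like traditional crypt(3), only the first 8 characters count.
    return hashlib.sha256(salt + password[:8].encode()).digest()


def scheme_b(password: str, salt: bytes) -> bytes:
    # Uses the whole password and a different one-way function.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


salt = os.urandom(8)
stored_a = scheme_a("correcthorse", salt)

# Verifying works only by re-hashing a cleartext guess in the same scheme.
print(scheme_a("correcthorse", salt) == stored_a)   # True
print(scheme_a("correcthXXXXX", salt) == stored_a)  # True: truncation!
# There is no function from stored_a to the scheme_b hash without the
# cleartext password, so the two password databases cannot be merged.
```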

A fairly recent development is the invention of Pluggable Authentication Modules (PAM) in Solaris, and their subsequent adoption in other flavors of Unix. PAM is an indirection mechanism for exchanging or supplementing authentication mechanisms, for users and for network services, simply by adding modules to the configuration file /etc/pam.conf.

Instead of being prompted for a Unix password on login, users are connected to one or more password modules. Each module prompts for a password and grants security credentials if the password is correctly received. Thus, for instance, users could be immediately prompted for a Unix password, a Kerberos password and a DCE password on login, thus removing the necessity for a manual login to these extra systems later. PAM also supports the idea of mapped passwords, so that
a single strong password can be used to trigger the automatic login to several stacked modules, each with its own private password stored in a PAM database.
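The idea of stacking modules can be sketched as a list of checks evaluated in order. The control-flag names below (required, requisite, sufficient, optional) are those PAM uses, but the evaluation rules are a simplification and real PAM implementations have further edge cases. The three modules are hypothetical stubs standing in for Unix, Kerberos and DCE checks.

```python
"""Simplified model of a PAM 'auth' stack.

Control flags follow the names PAM uses (required, requisite,
sufficient, optional), but the evaluation rules are simplified; real
PAM implementations have further edge cases. The modules are stubs."""

from typing import Callable

Module = Callable[[str, str], bool]


def run_stack(stack: list[tuple[str, Module]], user: str, password: str) -> bool:
    failed = False      # a required module has failed
    succeeded = False   # at least one module has succeeded
    for flag, module in stack:
        ok = module(user, password)
        succeeded = succeeded or ok
        if flag == "requisite" and not ok:
            return False                 # stop immediately
        if flag == "required" and not ok:
            failed = True                # fail, but keep running the stack
        if flag == "sufficient" and ok and not failed:
            return True                  # enough on its own
        # an "optional" failure never causes the stack to fail here
    return succeeded and not failed


# Hypothetical stub modules standing in for Unix, Kerberos and DCE checks.
unix_db = {"alice": "s3cret"}


def unix_auth(user, password):
    return unix_db.get(user) == password


def kerberos_auth(user, password):
    return user == "alice" and password == "s3cret"


def dce_auth(user, password):
    return False  # e.g. the DCE server is unreachable


stack = [("required", unix_auth), ("required", kerberos_auth),
         ("optional", dce_auth)]
print(run_stack(stack, "alice", "s3cret"))  # True
print(run_stack(stack, "alice", "wrong"))   # False
```

One login thus obtains credentials from several systems, and an unavailable optional module does not lock the user out.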

This is a very exciting possibility, tempered only by a conspicuous lack of documentation about how to write modules for PAM. PAM could clearly help in the integration of Unix with Windows if a module for Windows-style authentication could be written for Unix.

6.10 A model checklist

A model of system administration that encompasses cooperation and delegation must pass some basic tests.

What technologies are supported by the model?

What human practices are supported by the model?

Will the model survive a reinstallation or upgrade of the major hardware at our site?

Will the model survive a reinstallation or upgrade of the major software at our site?

Will the network and its productivity survive the loss of any component?

Do any of the solutions or choices compromise security or open any backdoors?

What is more important: user freedom or system well-being (convenience or security)?

Do users understand their responsibilities with regard to the network? (Do they need to be educated as part of the model?)

Have we observed all moral and technical responsibilities with respect to the larger network?

Is the system easy to understand for users and for system administrators?

Does the system function predictably and fail predictably?

If it fails one of these tests, one could easily find oneself starting again in a year or so.

Exercises

Self-test objectives

1. What are the objectives of computer management?

2. What is the difference (if any) between management and regulation?

3. What is meant by an information model? What information needs to be modeled in a human–computer system?

4. What is a directory service?

5. What is meant by White Pages and Yellow Pages?

6. Describe the X.500 information model.

7. What are current popular implementations of directory services and how do they differ?

8. What is the main problem with Sun Microsystems’ Network Information Service (NIS) today?

9. What is meant by system infrastructure?

10. Argue for and against homogeneity of hardware and software in a computer network.

11. What is meant by load balancing?

12. What is an ad hoc network?

13. What is meant by a peer-to-peer network?

14. Explain why a peer-to-peer network is a client-server technology, in spite of what is sometimes claimed.

15. Explain what is meant by the star model of network management.

16. Describe the ability of the star model to cope with large numbers of machines.

17. How does intermittent connectivity (e.g. mobile communications) affect the ability of the star model to cope with large numbers of devices?

18. What is meant by a hierarchical topology?

19. What is meant by a mesh topology?

20. Describe the SNMP management model.

21. What is an MIB?

22. Describe the Jini system of device management. How does it differ from SNMP?

23. What are the main principles of stable infrastructure?

24. Explain the virtual machine model of human–computer networks.

25. What role does revision control play in system administration?

26. What is meant by a push model of host management? What is meant by a pull model?

27. Describe the OSI model for network management.

28. What is TMN?

29. What does convergence mean in the context of system administration?

30. Describe the issues in integrating multiple operating systems.

Problems

1. Discuss why system homogeneity is a desirable feature of network infrastructure models. How does homogeneity simplify the issues of configuration and maintenance? What limits have to be placed on homogeneity, i.e. why can’t hosts all be exactly identical?

2. Draw an information hierarchy for your company, college or university. Use it to draw up a schema for building a directory service.

3. Explain what is meant by Traugott and Huddleston’s virtual machine view of the network. Compare this view of a computer system with that of a living organism, formed from many cooperating organs.

4. Compare the file system naming conventions in Windows and Unix. How are devices named? How are they referred to? Are there any basic incompatibilities between Unix and Windows names today?

5. Explain what is meant by the term convergence in configuration management. What are the advantages of convergence? Are there any disadvantages?

6. What is a directory service? What is the difference between a directory service and a name service?

7. Discuss how a directory service can bind together an organization.

8. In an administrative environment, it is often important to have the ability to undo changes which have been made. Discuss how you might implement a version control scheme for your system in which you could roll out and then roll back a system to a previous state. Describe how you would implement a scheme using

(a) A convergent policy declaration (e.g. cfengine).

(b) An imperative specification of steps (e.g. make and Perl).

9. Explain the difference between a push model and a pull model of system administration. What are the security implications of these and how well do they allow for delegation of responsibility in the network?

10. Discuss what special problems have to be solved in a heterogeneous network, i.e. one composed of many different operating systems.

11. Evaluate the cfengine language primitives: are these natural and sufficient for writing a policy that maintains any operating system? If not, what extra primitives are needed?

12. What are the advantages of a central point of control and configuration in network management? What are the disadvantages?

13. Suppose you have at your disposal four Unix workstations, all of the same type. One of them has twice the amount of memory. What would you use for DNS? Which would be a web server? Which would be an NFS server? Explain your reasoning.

14. These days, network communities consist of many PCs with large disks that utilize their space very poorly. Discuss a strategy for putting this spare disk capacity to use, e.g. as a possible backup medium. Consider both the practical and security aspects of your plan.

15. Formulate a plan for delegation of tasks within a system administration team. Create an information model that tries to prevent members of a team from interfering with one another, but at the same time gives no one administrator too much power or responsibility. If one of the team falls ill, will the team continue to function? Is the same true if a new member comes into the group? What personal considerations are important?

Chapter 7

Configuration and maintenance

We are now faced with two overlapping issues: how to make a computer system operate in the way we have intended, and how to keep it in that state over a period of time. Configuration management is the administration of state in hosts or network hardware. Host state is configured by a variety of methods:

Configuration text file

XML file

Database format (registry)

Transmitted protocol (ASN.1).
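The same piece of host state can be carried by quite different representations. The sketch below reads one hypothetical setting from a configuration text file (INI style) and from an XML file, using Python's standard configparser and xml.etree.ElementTree modules; the section, element and option names are invented for illustration.

```python
"""The same piece of host state expressed in two of the formats listed
above: a plain configuration text file (INI style) and an XML file.
The section/option names are hypothetical."""

import configparser
import xml.etree.ElementTree as ET

INI_TEXT = """
[ntp]
server = ntp1.example.org
"""

XML_TEXT = """<config><ntp server="ntp1.example.org"/></config>"""

ini = configparser.ConfigParser()
ini.read_string(INI_TEXT)
from_ini = ini["ntp"]["server"]

from_xml = ET.fromstring(XML_TEXT).find("ntp").get("server")

print(from_ini == from_xml, from_ini)  # True ntp1.example.org
```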

Configuration and maintenance are clearly related issues. Maintenance is simply configuration in the face of creeping decay. All systems tend to decay into chaos with time. There are many reasons for this decline, from deep theoretical reasons about thermodynamics, to the more intuitive notions about wear and tear. Put briefly, the number of ways in which a system can be in order is far fewer than the number of ways in which it can be in a state of disorder, so, statistically, any random change in the system is far more likely to move it into disorder than the other way around. We can even elevate this to a principle to emphasize its inevitability:

Principle 37 (Disorder). Systems tend to a state of disorder unless a disciplined policy is maintained, because they are exposed to random noise through contact with users.
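The counting argument behind this principle can be made concrete. The sketch below assumes a deliberately crude model: a host is n independent yes/no settings, exactly one combination of which is ideal, and random noise flips one setting at a time.

```python
"""Counting argument behind the disorder principle.

Model a host as n independent yes/no configuration settings, exactly one
combination of which is the ideal state (a deliberately crude assumption).
"""

from math import comb


def disorder_probability(n: int, k: int) -> float:
    """Chance that one randomly chosen setting, when flipped, moves a host
    with k wrong settings further from the ideal (k -> k + 1)."""
    return (n - k) / n


n = 20
print(2 ** n - 1)                    # states that are not ideal: 1048575
print(comb(n, 1), comb(n, 10))       # states 1 and 10 settings away: 20 184756
print(disorder_probability(n, 0))    # from the ideal state, always worse: 1.0
print(disorder_probability(n, 2))    # near the ideal, mostly worse: 0.9
```

A random change only becomes as likely to help as to hurt once about half the settings are already wrong, which is why an unattended system drifts towards disorder.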

Whether by creeping laziness or through undisciplined cooperation in a team [242, 270, 340, 106], poor communication or whatever, the system will degenerate as small errors and changes drive it forward. That degeneration can be counteracted by repair work which either removes or mitigates the errors.

Principle 38 (Equilibrium). Deviation from a system’s ideal state can be smoothed out by a counteractive response. If these two effects are in balance, the system will stay in equilibrium.

Equilibrium is a ‘fixed point’ of the system behavior. System administration is about finding such fixed points and using them to develop policy. The time scales over which errors occur and which repairs are made are clearly important. If we correct the system too slowly, it will run away from us; there is thus an inherent potential for instability in computer networks.
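The balance between drift and repair, and the danger of repairing too slowly, can be explored with a small stochastic simulation. In this sketch the host is again modeled as n yes/no settings; the drift and repair probabilities are hypothetical parameters, and the result is the average number of wrong settings at which the system settles.

```python
"""Drift versus repair in an n-setting host model.

Each time step, random noise flips one setting with probability p_drift,
and a maintenance agent repairs one wrong setting with probability
p_repair. Both rates are hypothetical parameters."""

import random


def simulate(n=50, p_drift=0.5, p_repair=0.6, steps=10_000, seed=1):
    rng = random.Random(seed)
    wrong = 0
    total = 0
    for _ in range(steps):
        if rng.random() < p_drift:
            # flipping a random setting: worse if it was right, better if wrong
            wrong += 1 if rng.random() < (n - wrong) / n else -1
        if wrong > 0 and rng.random() < p_repair:
            wrong -= 1  # counteractive response fixes one deviation
        total += wrong
    return total / steps  # average number of wrong settings


fast = simulate(p_repair=0.6)
slow = simulate(p_repair=0.05)
print(f"fast repair: {fast:.1f} settings wrong on average")
print(f"slow repair: {slow:.1f} settings wrong on average")
```

With fast repair the equilibrium sits close to the ideal state; with slow repair it settles far away, near the point where random flips alone would leave the system.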

7.1 System configuration policy

So far our analysis of networks has been about mapping out which machines performed what function on the network (see chapter 3 and section 3.8.6). Another side of network setup is the policies, practices and procedures which are used to make changes to or to maintain the system as a whole, i.e. what humans decide as part of the system administration process.

System administration is often a collaborative effort between several administrators. It is therefore important to have agreed policies for working so that everyone knows how to respond to ‘situations’ which can arise, without working against one another. A system policy also has the role of summarizing the attitudes of an organization to its members and its surroundings and often embodies security issues. As Howell quotes from Pogo [161], ‘We have met the enemy, and he is us!’ A system policy should contain the issues we have been discussing in the foregoing chapters. There are issues to be addressed at each level: network level, host level, user level.

Principle 39 (Policy). A clear expression of goals and responses prepares a site for future trouble and documents intent and procedure. Policy should be a protocol for achieving system predictability.

It is crucial that everyone agrees on policy matters. Although a policy can easily be an example of blind rule-making, it is also a form of communication. A policy documents acceptable behavior, but it should also document what response is appropriate in a crisis. Only then are we assured of an orchestrated response to a problem, free of conflicts and disagreements. What is important is that the document does not simply become an exercise in bureaucracy, but is a living guide to the practice of network community administration. A system policy can include some or all of the following:

Organization: What responsibility will the organization take for its users’ actions? What responsibility will the organization take for the users’ safety? Who is responsible for what? Has the organization upheld its responsibilities to the wider network community? Measures to prevent damage to others and from others.

Users: Allowing and forbidding certain types of software. Rigid control over space (quotas) or allow freedom, but police the system with controls. Choice of