Reliable communication in the presence of failures
The design and correctness of a communication facility for a distributed computer system are reported on. The facility provides support for fault-tolerant process groups in the form of a family of reliable multicast protocols that can be used in both local- and wide-area networks. These protocols attain high levels of concurrency, while respecting application-specific delivery ordering constraints, and have varying cost and performance that depend on the degree of ordering desired. In particular, a protocol that enforces causal delivery orderings is introduced and shown to be a valuable alternative to conventional asynchronous communication protocols. The facility also ensures that the processes belonging to a fault-tolerant process group will observe consistent orderings of events affecting the group as a whole, including process failures, recoveries, migration, and dynamic changes to group properties like member rankings. A review of several uses for the protocols in the ISIS system, which supports fault-tolerant resilient objects and bulletin boards, illustrates the significant simplification of higher-level algorithms made possible by our approach.
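The abstract names causal delivery ordering but does not describe how a protocol enforces it. As a rough illustration only, the Python sketch below shows causal delivery of group broadcasts using vector timestamps, a widely used technique. It is an assumption here and is not presented as the protocol described in this report. The names `CausalProcess` and `Message` are hypothetical, and the sketch ignores failures, membership changes, and the other orderings the report covers. A message is held back until the receiver has delivered everything the sender had delivered before sending it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    sender: int
    vt: List[int]  # sender's vector timestamp when the message was sent
    payload: str


class CausalProcess:
    """Hypothetical group member that delivers broadcasts in causal order."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.vt = [0] * n               # vt[j] = messages from member j delivered here
        self.pending: List[Message] = []  # received but not yet deliverable
        self.delivered: List[str] = []

    def send(self, payload: str) -> Message:
        # Count our own broadcast and stamp it with the updated vector.
        self.vt[self.pid] += 1
        self.delivered.append(payload)  # the sender delivers its own message at once
        return Message(self.pid, list(self.vt), payload)

    def _deliverable(self, m: Message) -> bool:
        # m must be the next message from its sender ...
        if m.vt[m.sender] != self.vt[m.sender] + 1:
            return False
        # ... and every message the sender had delivered must already be delivered here.
        return all(m.vt[k] <= self.vt[k] for k in range(len(self.vt)) if k != m.sender)

    def receive(self, m: Message) -> None:
        self.pending.append(m)
        progress = True
        while progress:  # delivering one message may unblock others
            progress = False
            for p in list(self.pending):
                if self._deliverable(p):
                    self.pending.remove(p)
                    self.vt[p.sender] = p.vt[p.sender]
                    self.delivered.append(p.payload)
                    progress = True


# m2 is sent after its sender delivered m1, so m2 causally follows m1.
p0, p1, p2 = (CausalProcess(i, 3) for i in range(3))
m1 = p0.send("m1")
p1.receive(m1)
m2 = p1.send("m2")
p2.receive(m2)  # arrives first but is buffered until m1 is delivered
p2.receive(m1)
print(p2.delivered)  # ['m1', 'm2']
```

The hold-back queue is what separates causal delivery from plain asynchronous delivery. In this sketch, member `p2` receives `m2` before `m1` but still delivers them in the order `m1`, `m2`, because `m2`'s timestamp records that its sender had already seen `m1`.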
Document ID
19900020661
Acquisition Source
Legacy CDMS
Document Type
Contractor Report (CR)
Authors
Birman, Kenneth P. (Cornell Univ. Ithaca, NY, United States)
Joseph, Thomas A. (Cornell Univ. Ithaca, NY, United States)
Date Acquired
September 6, 2013
Publication Date
January 1, 1987
Subject Category
Computer Systems
Report/Patent Number
NASA-CR-183046
NAS 1.26:183046