|
Emulating Shared-Memory Do-All Algorithms in Asynchronous Message-Passing Systems.
|
Dariusz R. Kowalski, Mariam Momenzadeh, Alexander A. Shvartsman
|
| |
|
|
Abstract
A fundamental problem in distributed computing is performing a set of tasks despite failures and delays. Stated abstractly, the problem is to perform N tasks using P failure-prone processors. This paper studies the efficiency of emulating shared-memory task-performing algorithms on asynchronous message-passing processors with quantifiable message latency. Efficiency is measured in terms of work and communication, and the challenge is to obtain subquadratic work and message complexity. While prior solutions assumed synchrony and constant delays, the solutions given here yields subquadratic efficiency with asynchronous processors when the delays and failures is suitably constrained. The solutions replicate shared objects using a quorum system, provided it is not disabled. One algorithm has subquadratic work and communication when the delays and the number of processors, K, owning object replicas, are O(P[0.41]). It tolerates [K-1/2] crashes. It is also shown that there exists an algorithm that has subquadratic work and communication and that tolerates o(P) failures, provided message delays are sublinear.
|
| |
|
|
| |
|
|
Cooperative computing with fragmentable and mergeable groups.
|
Chryssis Georgiou, Alexander A. Shvartsman
|
| |
|
|
Abstract
This work considers the problem of performing a set of N tasks on a set of P cooperating message-passing processors (P ≤ N). The processors use a group communication service (GCS) to coordinate their activity in the setting where dynamic changes in the underlying network topology cause the processor groups to change over time. GCSs have been recognized as effective building blocks for fault-tolerant applications in such settings. Our results explore the efficiency of fault-tolerant cooperative computation using GCSs. The original investigation of this area by (Dolev et al., Dynamic load balancing with group communication, in: Proc. of the 6th International Colloquium on Structural Information and Communication Complexity, 1999) focused on competitive lower bounds, non-redundant task allocation schemes and work-efficient algorithms in the presence of fragmentation regroupings. In this work we investigate work-efficient and message-efficient algorithms for fragmentation and merge regroupings. We present an algorithm that uses GCSs and implements a coordinator-based strategy. For the analysis of our algorithm we introduce the notion of view-graphs that represent the partially-ordered view evolution history witnessed by the processors. For fragmentations and merges, the work of the algorithm (defined as the worst case total number of task executions counting multiplicities) is not more than min{N ċ f + N, N ċ P}, and the message complexity is no worse than 4(N ċ f + N + P ċ m), where f and m denote the number of new groups created by fragmentations and merges, respectively. Note that the constants are very small and that, interestingly, while the work efficiency depends on the number of groups f created as the result of fragmentations, work does not depend on the number of groups m created as the result of merges. |
| |
|
|
| |
|
|
Performing Work with Asynchronous Processors: Message-Delay-Sensitive Bounds
|
D. Kowalski and A.A. Shvartsman
|
| |
|
|
Abstract
This paper considers the problem of performing tasks in asynchronous distributed settings. This problem, called Do-All, has been substantially studied in synchronous models, but there is a dearth of efficient algorithms for asynchronous message-passing processors. Do-All can be trivially solved without any communication by an algorithm where each processor performs all tasks. Assuming p processors and t tasks, this requires work Θ(p . t). Thus, it is important to develop subquadratic solutions (when p and t are comparable) by trading computation for communication. Following the observation that it is not possible to obtain subquadratic work when the message delay d is substantial, e.g., d = Θ(t), this work pursues a message-delay-sensitive approach. Here, the upper bounds on work and communication are given as functions ofp, t, and d, the upper bound on message delays, however, algorithms have no knowledge of d and they cannot rely on the existence of an upper bound on d. This paper presents two families of asynchronous algorithms achieving, for the first time, subquadratic work as long as d = o(t). The first family uses as its basis a shared-memory algorithm without having to emulate atomic registers assumed by that algorithm. These deterministic algorithms have work O(tp[ε] + p d [t/d][ε]) for any e > 0. The second family uses specific permutations of tasks, with certain combinatorial properties, to sequence the work of the processors. These randomized (deterministic) algorithms have expected (worst-case) work O(t log p + p d log(2 + t/d)). Another important contribution in this work is the first delay-sensitive lower bound for this problem that helps explain the behavior of our algorithms: any randomized (deterministic) algorithm has expected (worst-case) work of Ω(t + pd log[d+1] t). |
| |
|
|
| |
|
|
Distributed Cooperation and Adversity: Complexity Trade-Offs
|
Chryssis Georgiou, Alexander Russell, Alexander A. Shvartsman
|
| |
|
|
Abstract
The problem of cooperatively performing a collection of tasks in a decentralized setting where the computing medium is subject to adversarial perturbations is one of the fundamental problems in distributed computing. Such perturbations can be caused by processor failures, unpredictable delays, and communication breakdowns.(i)~failure-sensitive bounds for distributed cooperation problems for synchronous processors subject to crash failures.These research results are motivated by the earlier work of the third author with Paris C. Kanellakis at Brown University. |
| |
|
|
| |
|
|
Efficient Gossip and Robust Distributed Computation
|
Ch. Georgiou, D. Kowalski and A.A. Shvartsman
|
| |
|
|
Abstract
This paper presents an efficient deterministic gossip algorithm for p synchronous, crash-prone, message-passing processors. The algorithm has time complexity T = O(log2p) and message complexity M = O(p1+ε), for any ε > 0. This substantially improves the message complexity of the previous best algorithm that has M = O(p1.77), while maintaining the same time complexity.The strength and utility of the new result is demonstrated by constructing a deterministic algorithm for performing n tasks in this distributed setting. Previous solutions used coordinator or checkpointing approaches, immediately incurring a work penalty Ω(n + f ċ p) for f crashes, or relied on strong communication primitives, such as reliable broadcast, or had work too close to the trivial Θ(p ċ n) bound of oblivious algorithms. The new algorithm uses p crash-prone processors to perform n similar and idempotent tasks so long as one processor remains active. The work of the algorithm is W = O(n + p ċ min{f + 1, log3p}) and its message complexity is M = O(fpepsiv; + p min{f + 1, log p}), for any ε > 0. This substantially improves the work complexity of previous solutions using simple point-to-point messaging, while "meeting or beating" the corresponding message complexity bounds.The new algorithms use communication graphs and permutations with certain combinatorial properties that are shown to exist. The algorithms are correct for any permutations, and in particular, the same expected bounds can be achieved using random permutations. |
| |
|
|
| |
|
|
GeoQuorums: Implementing Atomic Memory in Mobile Ad Hoc Networks
|
Shlomi Dolev, Seth Gilbert, Nancy A. Lynch, Alexander A. Shvartsman, Jennifer L. Welch
|
| |
|
|
Abstract
We present a new approach, the GeoQuorums approach, for implementing atomic read/write shared memory in ad hoc networks. Our approach is based on abstract nodes associated with certain geographic locations. We assume the existence of focal points, geographic areas that are normally populated. by mobile hosts. For example, a focal point may be a road junction, a scenic observation point, or a water resource in the desert. Mobile hosts that happen to populate a focal point participate in implementing shared atomic put/get objects, using a replicated state machine approach. These objects are then used to implement atomic read/write operations. The GeoQuorums algorithm defines certain intersecting sets of focal points, known as quorums. The quorum systems are used to maintain the consistency of the shared memory. We present a mechanism for changing quorum systems on the fly, thus improving efficiency. Overall, the new GeoQuorums algorithm efficiently implements read and write operations in a highly dynamic, mobile network. |
| |
|
|
| |
|
|
RAMBO II: Rapidly Reconfigurable Atomic Memory for Dynamic Networks
|
S. Gilbert, N. Lynch and A. Shvartsman
|
| |
|
|
Abstract
This paper presents a new algorithm implementing reconfigurable atomic read/write memory for highly dynamic environments. The original RAMBO algorithm, recently developed by Lynch and Shvartsman [15, 16], guarantees atomicity for arbitrary patterns of asynchrony, message loss, and node crashes. RAMBO II implements a different approach to establishing new configurations: instead of operating sequentially, the new algorithm reconfigures aggressively, transferring information from old configurations to new configurations in parallel. This improvement substantially reduces the time to establish a new configuration and to remove obsolete configurations. This, in turn, substantially increases fault tolerance and reduces the latency of read/write operations when the network is unstable or reconfiguration is bursty. This paper presents RAMBO II, a correctness proof, and a conditional analysis of its performance. Preliminary empirical studies illustrate the advantages of the new algorithm. |
| |
|
|
| |
|
|
Work-Competitive Scheduling for Cooperative Computing with Dynamic Groups
|
Ch. Georgiou, A. Russell and A. Shvartsman
|
| |
|
|
Abstract
The problem of cooperatively performing a set of t tasks in a decentralized computing environment subject to failures is one of the fundamental problems in distributed computing. The setting with partitionable networks is especially challenging, as algorithmic solutions must accommodate the possibility that groups of processors become disconnected (and, perhaps, reconnected) during the computation. The efficiency of task-performing algorithms is often assessed in terms of work: the total number of tasks, counting multiplicities, performed by all of the processors during the computation. In general, the scenario where the processors are partitioned into g disconnected components causes any task-performing algorithm to have work $\Omega(t\cdot g)$ even if each group of processors performs no more than the optimal number of $\Theta(t)$ tasks. Given that such pessimistic lower bounds apply to any scheduling algorithm, we pursue a competitive analysis. Specifically, this paper studies a simple randomized scheduling algorithm for p asynchronous processors, connected by a dynamically changing communication medium, to complete t known tasks. The performance of this algorithm is compared against that of an omniscient off-line algorithm with full knowledge of the future changes in the communication medium. The paper describes a notion of computation width, which associates a natural number with a history of changes in the communication medium, and shows both upper and lower bounds on work-competitiveness in terms of this quantity. Specifically, it is shown that the simple randomized algorithm obtains the competitive ratio $(1+\mathbf{cw}/e)$, where $\mathbf{cw}$ is the computation width and $e$ is the base of the natural logarithm ($e=2.7182\ldots$); this competitive ratio is then shown to be tight. |
| |
|
|
| |
|
|
The Complexity of Synchronous Iterative Do-All with Crashes
|
Ch. Georgiou, A. Russell and A. Shvartsman
|
| |
|
|
Abstract
The ability to cooperate on common tasks in a distributed setting is key to solving a broad range of computation problems ranging from distributed search such as SETI to distributed simulation and multi-agent collaboration. Do-All, an abstraction of such cooperative activity, is the problem of performing N tasks in a distributed system of P failure-prone processors. Many distributed and parallel algorithms have been developed for this problem and several algorithm simulations have been developed by iterating Do-All algorithms. The efficiency of the solutions for Do-All is measured in terms of work complexity where all processing steps taken by all processors are counted. Work is ideally expressed as a function of N, P, and f, the number of processor crashes. However the known lower bounds and the upper bounds for extant algorithms do not adequately show how work depends on f. We present the first non-trivial lower bounds for Do-All that capture the dependence of work on N, P and f. For the model of computation where processors are able to make perfect load-balancing decisions locally, we also present matching upper bounds. We define the r-iterative Do-All problem that abstracts facts the repeated use of Do-All such as found in typical algorithm simulations. Our f-sensitive analysis enables us to derive tight bounds for r-iterative Do-All work (that are stronger than the r-fold work complexity of a single Do-All). Our approach that models perfect load-balancing allows for the analysis of specific algorithms to be divided into two parts: (i) the analysis of the cost of tolerating failures while performing work under "free" load-balancing, and (ii) the analysis of the cost of implementing load-balancing. We demonstrate the utility and generality of this approach by improving the analysis of two known efficient algorithms. We give an improved analysis of an efficient message-passing algorithm. We also derive a tight and complete analysis of the best known Do-All algorithm for the synchronous shared-memory model. Finally we present a new upper bound on simulations of synchronous shared-memory algorithms on crash-prone processors. |
| |
|
|
| |
|
|
Communication and Data Sharing for Dynamic Distributed Systems
|
N. Lynch and A. Shvartsman
|
| |
|
|
Abstract
This research direction aims to develop and analyze algorithms to solve problems of
communication and data sharing in highly dynamic distributed environments. The term
dynamic here encompasses many types of changes, including changing network topology,
processor mobility, changing sets of participating client processes, a wide range of
types of processor and network failures, and timing variations. Constructing distributed
applications for such environments is a difficult programming problem. In practice,
considerable effort is required to make applications resilient to changes in client requirements
and to evolution of the underlying computing medium. We focus our work
on distributed services that provide useful guarantees and that make the construction of
sophisticated distributed applications easier. The properties we study include ordering
and reliability guarantees for communication and coherence guarantees for data sharing.
To describe inherent limitations on what problems can be solved, and at what cost, the
algorithmic results will be accompanied by lower bound and impossibility results.
One example of our approach is the new dynamic atomic shared-memory service for
message-passing systems. We formally specified the service and developed algorithms
implementing the service. A system implementation is under development. The service
is reconfigurable in the sense that the set of owners of data can be changed dynamically
and concurrently with the ongoing read and write operations.We proved the correctness
of the implementation for arbitrary patterns of asynchrony and crashes, and we analyzed
its performance conditioned on assumptions about timing and failures. |
| |
|
|
| |
|
|
|
Computer
Science and Engineering
University of Connecticut
191 Auditorium Road, U-155
Storrs, Connecticut 06269-3155
|
Office: ITE227 Telephone
: +1 860 486 5570
Fax : +1 860 486 4817
Webmaster: |
Last
Updated: March 1, 2007 |
|
|