JOURNAL OF COMPUTER AND SYSTEM SCIENCES 32, 15-78 (1986)

Reliable Computation with Cellular Automata

PETER GÁCS*

Boston University, Boston, Massachusetts 02215

* This work was supported in part by the National Science Foundation Grants MCS 8110430, MCS 8104008, MCS 8302874, and DCR 8405270, and in part by DARPA Grant N00014-82-K-0193 monitored by the ONR. Part of this research was done while the author was with the University of Rochester. This paper was requested for the Special STOC 83 Issue, but because of scheduling delays could not be included in that issue.
Received December 1, 1983; revised July 30, 1985
We construct a one-dimensional array of cellular automata on which arbitrarily large computations can be implemented reliably, even though each automaton at each step makes an error with some constant probability. In statistical physics, this construction leads to the refutation of the "positive probability conjecture," which states that any one-dimensional infinite particle system with positive transition probabilities is ergodic. Our approach takes its origin from Kurdyumov's ideas for this refutation. To compute reliably with unreliable components, von Neumann proposed Boolean circuits whose intricate interconnection pattern (arising from the error-correcting organization) he had to assume to be immune to errors. In a uniform cellular medium, the error-correcting organization exists only in "software"; therefore, errors threaten to disable it. The real technical novelty of the paper is therefore the construction of a self-repairing organization. © 1986 Academic Press, Inc.
1. INTRODUCTION

Can we avoid the accumulation of errors in arbitrarily large computations using unreliable components? The formal statement of this problem is based on the assumption that computing devices of arbitrary size must be built from a few elementary components. Each component makes errors with some frequency independent of the size of the device to be built. What are the architectures enabling us to deal with all combinations of errors likely to arise for devices of a given size?

We will consider the case in which a failure does not incapacitate the component permanently, but only causes it, in the step when it occurs, to violate its rule of operation. In the following steps, the component obeys its rule of operation again, until the next error. The case of permanent component failure may be of greater practical importance, but it has not been investigated in the same generality. (However, see [10] for some elegant geometrical results in a similar model.) There are reasons to believe that many of the techniques developed for the case of transient failure will be applicable to the case of permanent failure. Another justification of this model is the interest it holds from the point of view of statistical physics.
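To make the transient-fault model concrete, here is a minimal simulation sketch in Python. It is not part of the paper's construction; the majority rule, the error probability EPS, the array size, and the number of steps are purely illustrative assumptions. Each cell applies its transition rule at every step except when it fails, in which case its new state is arbitrary (here, random) for that one step only.

import random

EPS = 0.05        # probability of failure per cell per step (illustrative)
N = 100           # number of cells (illustrative)
STEPS = 50

def rule(left, center, right):
    # placeholder local rule: majority over the three-cell neighborhood
    return 1 if left + center + right >= 2 else 0

def step(config):
    new = []
    for i in range(len(config)):
        left = config[(i - 1) % len(config)]
        right = config[(i + 1) % len(config)]
        if random.random() < EPS:
            new.append(random.randint(0, 1))   # transient fault: the rule is violated in this step only
        else:
            new.append(rule(left, config[i], right))
    return new

config = [1] * N                 # try to remember the all-ones configuration
for _ in range(STEPS):
    config = step(config)
print(sum(config) / N)           # fraction of cells still holding a 1

A local rule as simple as this majority vote is not expected to protect the stored information for long; the point of the paper is to design a transition rule under which such a noisy one-dimensional medium nevertheless preserves and processes information reliably.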
Reliable computation with unreliable components must use massive parallelism. Information temporarily stored anywhere during computation is subject to decay and therefore must be actively maintained. In 1953, von Neumann designed some reliable Boolean circuits. In his model, each component had some constant probability of failure. For a circuit consisting of n perfect components, he built a circuit out of O(n log n) unreliable components computing the same function. (For an efficient realization of his idea, see [1].) In 1968, Taylor, using Gallager's low-density parity-check codes, constructed a Boolean circuit out of O(K) unreliable components and memory elements, capable of holding K bits of information for a polynomial number of steps. This construction was improved by Kuznietsov, using an idea of Pinsker, increasing the storage time to an exponential function of K.
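Von Neumann's redundancy idea, mentioned above, can be illustrated in a very reduced form. The sketch below is neither his construction nor the cellular construction of this paper: a single bit is carried by a bundle of M wires, every gate acting on a wire fails independently with probability EPS, and the bundle is periodically restored by noisy majority votes over small random subsets of wires. M, EPS, and the number of stages are illustrative assumptions.

import random

EPS = 0.01        # per-gate failure probability (illustrative)
M = 301           # number of wires in the bundle (illustrative)
STAGES = 1000

def noisy(bit):
    # a gate that computes the identity but fails with probability EPS
    return bit ^ 1 if random.random() < EPS else bit

bundle = [1] * M                  # the bundle initially encodes the bit 1
for _ in range(STAGES):
    bundle = [noisy(b) for b in bundle]      # a stage of faulty computation
    bundle = [noisy(int(sum(random.sample(bundle, 3)) >= 2))    # noisy majority restoration
              for _ in range(M)]
print(sum(bundle) / M)            # fraction of wires still carrying a 1

For small EPS the bundle retains a large majority of correct wires through many stages, whereas a single unprotected wire subjected to the same noise would be essentially randomized after a few hundred stages.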