A Timeout-Based Message Ordering Protocol for a Lightweight Software Implementation of TMR Systems
Paul D. Ezhilchelvan†, Francisco V. Brasileiro‡, and Neil A. Speirs†
‡Departamento de Sistemas e Computação
Universidade Federal da Paraíba - UFPb/Campus II
Av. Aprígio Veloso s/n, Campina Grande - Pb, 58108-970, Brazil
Abstract
Replicated processing with majority voting is a well known method for achieving reliability
and availability - two key attributes of dependability. Triple Modular Redundant (TMR)
processing is the most commonly used version of that method. Replicated processing requires
that the replicas reach agreement on the order in which input requests are to be processed.
Almost all synchronous and deterministic ordering protocols published in the literature are
time-based in the sense that they require replicas’ clocks to be kept synchronised within some
known bound. We present a protocol for TMR systems that is based on timeouts and does not
require the clocks be kept in bounded synchronism. The protocol design exploits features that
are so characteristic of a TMR system. Consequently, the worst-case input ordering delay
turns out to be the smallest possible; i.e., no symmetric protocol that works only with
unsynchronised clocks, can provide a smaller worst-case delay.
Keywords and Phrases
Byzantine failures, fault tolerance, Triple Modular Redundancy (TMR), process replication,
agreement, message ordering, physical and logical clocks.
1. Introduction
We consider the task of designing and implementing a system that continues to provide
services in the presence of a bounded number of processor failures. One of the problems is
that a faulty processor might fail in an arbitrary or Byzantine [Pease80] manner. N modular
redundant (NMR) processing is one of the most effective ways to mask the effects of
Byzantine failures [Powell88]. Considering that non-faulty processors can generate
unforgeable signatures, the basic idea here is to use in place of a single processor a set of N,
N ≥ 3, processors to mask the failures of at most (N - 1)/2 processors. A Triple modular
redundant (TMR) system with N = 3 is the most practical version of an NMR system. The
three processors of a TMR system communicate with each other only by message passing and
are assumed to fail independent of each other. These processors execute every given task in
parallel and the results produced by them are subject to a majority vote. If the voted result is
regarded as the output from the TMR system, the system functions correctly provided (i) at
least two of its constituent processors are non-faulty, (ii) all non-faulty ones produce identical
results after executing any given task, and (iii) the voting performed is correct.
Process replication within a TMR system assumes the well understood state machine
model, for which precise requirements for supporting replicated processing are known
[Sch90]. One of the requirements is that non-faulty processors of a TMR system process
messages that are input to the system, in an identical order so that their results are identical.
This means that non-faulty processors must agree on the relative order in which each input
message is to be processed. In this paper, we develop a message ordering protocol for the
authenticated Byzantine fault model – perhaps the most well-known sub-class of the
Byzantine fault model – in which a faulty processor cannot undetectably forge a non-faulty
processor’s signature; the protocol also assumes the following synchronous environment: (i)
communication delays between non-faulty processors are bounded by a known constant; (ii)
processing and scheduling delays within a non-faulty processor can also be bounded; and, (iii)
each non-faulty processor has a local read-only physical clock whose running rate with
respect to the passage of real by a small and known bound.
Message ordering in the presence of failures requires ability to detect late and absent
messages. In the time-based approach [Lampo84], this capability is usually achieved by
synchronising non-faulty processors’ clocks within some known bound, e. Since non-faulty
processors need not have clocks with identical running rates, they should periodically execute
a synchronisation protocol (e.g., [Vasant88, Veris97]) to adjust their clock readings by
appropriate amounts. This synchronisation involves (i) periodic exchange of messages,
consuming network bandwidth, (ii) using data abstractions [Dolev84] to adjust the readings
of a physical clock (a read-only object), and (iii) implementing amortisation techniques
[Schmu90]. An alternative to using synchronised clocks (and therefore to having to keep the
clock differences within a known bound) is to use timeouts and to employ the
(unsynchronised) physical clocks only for measuring timeouts. We will adopt this timeout-
based approach for designing the TMR message ordering protocol.
In asynchronous systems, where the bounds on transmission, processing and scheduling
delays cannot be known with certainty, timeouts are commonly employed for dealing with
failures [Chandra96, Fetzer97] so that non-faulty processors can order messages identically in
a fault-tolerant manner. However, for synchronous systems, the time-based approach is
generally favoured. Practical systems, such as MARS [Kopetz89] and Air Traffic Control
System [Cristian96], have taken the time-based approach. [Echtle87] and AMp [Veris91] are
some of the early exceptions to this time-based trend in the synchronous context. AMp is a
timeout-based, fault-tolerant message ordering protocol, designed and implemented for
commercial applications. It provides the same message ordering guarantees as our protocol in
1Real time is measured in an assumed Newtonian time frame that cannot be directly observed.
a general n-processor system but assumes a benign fault model where processors either crash
or occasionally omit to produce responses. Our assumption of authenticated Byzantine faults
is a weaker fault model and, as argued in [Pease80], any further weakening of this fault model
would make the desired form of message ordering impossible to achieve in a three-processor
One of the designers of AMp later analysed time vs. timeout-based approaches which led
to the following conclusion that has both theoretical and practical significance [Veris96]:
though the synchronous, timeout-based protocols cannot be perfect substitutes for their time-
based counterparts in all circumstances, they can however provide attractive alternatives in a
number of application settings. This conclusion is based on two observations: timeout-based
protocols are less precise in preserving the temporal ordering of events (the order in which
events occur according to Newtonian time); however, the precision they offer is sufficient in
many settings where external events are being interpreted to the system by slow functioning
interface devices such as valves or by cyclic processes operating with large periodicity; for
example, a cyclic process that observes sensor changes every 10 ms, will interpret and report
all changes that occurred in the last 10 ms period to have occurred at the ‘same time’; this
means that, as far as the computing and ordering processes are concerned, two external events
occur either at the same time or at least 10 ms apart, and that any protocol with precision
smaller than 10 ms will be equally useful. Secondly, timeout-based protocols are faster under
certain ‘favourable’ conditions. In the case of our protocol, these favourable conditions
(derived in section 5) turn out to be: all processors are non-faulty; and, the maximum
difference (λ) within which non-faulty processors receive a given input from the environment
and the actual maximum transmission delay (da) that currently holds within the system, are
such that λ=< 2(e- da). λ is typically defined as tightness [Veris00] and will be small when processors receive inputs (from the system environment) via a broadcast medium. We
therefore believe the protocol presented here enhances the tool-kit available for a TMR
system builder. Further, our implementation experience [Speirs93, Brasil95] suggests that
building a TMR system using our protocol is easier as clocks need not be synchronised.
The rest of the paper is organised as follows. Section 2 describes the structure of the TMR
system, the basic assumptions and the message ordering requirements. Section 3 develops and
presents the protocol. Section 4 describes the design efforts involved in keeping the ordering
delays as small as possible. Section 5 presents proofs of correctness and also proves that no
(symmetric) protocol that works only with unsynchronised clocks can guarantee smaller
worst-case ordering delays. Section 6 concludes the paper.
2. System Description and Assumptions
We will suppose that the TMR system is made up of processors named Pi, Pj and Pk.
These processors are uniquely ordered and the ordering is known to them. Each processor is
connected directly to the other two processors of the system by internal links. Also, it is
connected to the ‘outside world’ or the system environment from which input messages are
received and output messages are sent to. Figure 1 shows a TMR system whose processors
are connected to the environment via a bus. The unit (shown as a black square in the figure)
that connects a processor to the bus is called the network attachment controller (NAC).
Assumption 1: Within a TMR system, at least 2 out of 3 processors are non-faulty and never Assumption 2: The internal links that connect processors never fail. (Alternatively, a link
failure can be thought of as the failure of one of the two processors connected by that link.)
Processors are reliably connected to the system environment. We further assume that a NAC
does not allow the attached processor to use the bus continuously for a long time, and thereby
it contains the effects of the “babbling idiot” syndrome: the faulty processor may generate
messages randomly and try to transmit them all onto the bus, thus preventing other processors
Assumption 3: Each processor can sign the messages it sends, and authenticate the signed
messages it receives [Rivest78, Tsudik92] such that:
A non-faulty processor’s signature for a given message is unique and cannot
Any attempt to alter the contents of a non-faulty processor’s signed message is
detected by any other non-faulty processor.
In this paper we adopt the style of writing real time values in Greek letters and clock time
values in italicised lower case Roman letters; the term ‘clock’ will always refer to a
Assumption 4: If a non-faulty Pi prepares and transmits m at real time τi, every non-faulty destination Pj will receive m at real time τj, τi ≤ τj < (τi + δ), where δ, δ > 0, is known. δ
bounds the message queuing delays at the sending and receiving processors, and the
propagation delay from the sender to receivers. A violation of this assumption between a pair
of processors means that one of the processors is faulty.
Assumption 5: A non-faulty Pi’s clock measures an interval x in a real time interval x(1 + ρi), where |ρi| ≤ ρ and ρ is a known positive constant. Assumption 6: Processing and scheduling delays are bounded and known for a non-faulty
processor; more precisely, a non-faulty processor: i) performs a local computation (e.g.,
processing a received message) within a bounded and known amount of time; and, ii)
schedules a computational task within a bounded and known amount of time.
Remarks: The last three assumptions make the system context a synchronous one. In
asynchronous models, however, these assumptions can be violated infinitely often even
within a non-faulty processor and consequently the order protocol will terminate only when
such violations cease for a sufficiently long time (see [Chandra96, Cristian99]); an exception
is the timed asynchronous model of [Cristian99] which does not permit violation of
assumption 5 within a non-crashed processor. In our case, a violation of any of these three
assumptions within a processor would render the processor faulty; consequently, the order
protocol is always guaranteed to terminate in a known and bounded amount of time, provided
the number of faulty processors is restricted to one (assumption 1).
2.1 Input Message Ordering
Non-faulty processors can receive the input messages in different order. Referring to figure
1, Pi can receive µ1 followed by µ2 and Pk in the reverse order. So, when a processor
receives an input message from the system environment, it must first decide the processing
order for that message. For that, it forms an internal message m that contains the received
input message in the data field m.µ and sends m to all other processors in the system using the
order protocol that guarantees the following two conditions:
Validity: If a non-faulty Pi forms and sends m, all non-faulty processors (including Pi) will
decide on an order for m, within a known and bounded real time interval ∆; Unanimity: If a non-faulty Pi decides on an order for a given m, then every non-faulty Pj
These conditions ensure that an input µ supplied to a non-faulty P gets identically ordered
in the form of P’s internal message m, m.µ = µ, by all non-faulty processors of the TMR
system within ∆. It is possible that the faulty processor generates and orders an internal
message for a fictitious input or for an old input message that has already been processed by
non-faulty processors. So, to ensure that no duplicate or fictitious input gets processed, an
ordered internal m, m.µ = µ, can initiate the (ordered) processing of input µ only after the µ
contained in it, has been verified to be valid, i.e. not a fictitious or duplicate input. We refer
the reader to [Shri92] for details on how a non-faulty processor (i) derives an ordered stream
of inputs from an ordered stream of internal messages, and (ii) generates a voted output from
the results it computes. This paper will focus only on the (timeout-based) ordering of internal
messages. Note that since the TMR system can have at most one faulty processor, an input µ
must be supplied to at least two processors within the system; we will assume that every µ is
sent to all three processors in the TMR system.
3. The protocol
The protocol has three aspects to it: (i) message counters maintained by processors, (ii)
message diffusion that enables non-faulty processors to receive each other’s messages, and
(iii) timeliness checks to assess the timeliness of a received message. 3.1. Message Counter and Diffusion
Each processor within the TMR system, say Pi, maintains a counter called the message counter, denoted as MCi, which holds an integer value and is initialised to INIT_VAL (usually 1) when the system is first started. For every given input µ received from the environment, Piforms an internal m in the following manner: the data field m.µ is set to µ, the originator field m.O to i; the timestamp field m.TS to Mci; further, MCi is incremented by 1. Incrementing of MCi soon after timestamping m ensures that any message Pi later forms gets a timestamp
larger than m.TS. An internal message m formed is accepted with its copy being entered into a message list called the acceptedi and is then sent to other processors using the send(m)
primitive. An invocation of this primitive discards m if m.S contains two signatures, and
performs the following actions if m.S contains no or one signature: it generates the signature of Pi for m, appends the generated signature to any signature that may already be in m.S and transmits the resulting m to processors in the system that have not signed m. Pi receives
internal messages from other processors by executing receive(m) primitive which blocks until
it can return an authentic m that is received via an internal link and has the authentic signatures of one or two distinct processors other than Pi.
Whenever Pi receives m, it checks whether m is timely. (Procedures for checking the
timeliness of a received message are described in the next sub-section.) If m is found untimely, it is discarded; otherwise, the following three actions are performed: (i) MCi is set to the maximum of {MCi, m.TS + 1}, (ii) m is accepted by entering a copy of m into acceptedi; and, (iii) m is ‘diffused’ by executing the primitive send(m). Thus, any accepted
message that is received with one signature is ‘diffused’ to the processor that appears not to
have ‘seen’ the message. As there can be at most one faulty processor in the TMR system, diffusion of single-signed messages ensures that if m of a faulty processor enters acceptedi of a non-faulty Pi, then the other non-faulty processor receives m at least once.
From the description above, it is obvious that: (i) a message sent or received by a non-
faulty processor will have either one or two authentic signatures; and (ii) every sent message,
but no received message, carries the host processor’s signature. For a given message m,
path(m) is defined as the ordered sequence of processors that have signed m. Thus, if m is a double-signed message that is formed by Pj and diffused by Pk, then path(m) = Pj:Pk. The
first processor in path(m) is called the originator of m and the last processor the immediate sender of m. Note that the originator and the immediate sender of m are one and the same if
m is single-signed. Two paths are said to intersect if they contain one or more processors
commonly in them. 3.2. Timeliness Checks
These checks enable a processor to determine the timeliness of a received m. Before
presenting them, we will define a clock time interval d such that by measuring d in its local
clock a non-faulty processor is guaranteed to measure a real time interval of at least δ
duration, i.e., d ≥ δ/(1 - ρ); d is known to, and identical for all non-faulty processors of the
system. We will assume, for simplicity, that a processor takes zero time to execute any
instruction of the protocol and the send(m) and receive(m) primitives. (Realising this
assumption in practice will require an increase in the value of d, which is possible as the
protocol does not impose any upper bound on the value of d.)
Fixing the processor Pi in the TMR system to be non-faulty, we will describe the checks
by which Pi determines the timeliness of a received message. Suppose that Pi receives a message m at its local clock time ti. There can arise one of three possible situations depending on the value of MCi at ti: MCi < m.TS or MCi = m.TS or MCi > m.TS. If MCi < m.TS or MCi = m.TS then m is a ‘future’ or a ‘present’ message respectively and is considered by Pi as timely; if, on the other hand, MCi is already larger than m.TS when m is being received, then
m is a ‘past’ message, and its timeliness should be judged based on how much time has elapsed since MCi first became larger than m.TS. So, timeliness checks are needed only for messages received with past timestamps. Let us suppose that the m received at ti is a past
Let m’, be the message whose acceptance by Pi caused MCi to become larger than m.TS
for the first time. That is, just before Pi accepted m’, MCi ≤ m.TS. Note that Pi could have
either received m’ from another processor or itself formed and sent m’; and also that m’.TS ≥ m.TS because acceptance of m’ has caused MCi to exceed m.TS. Let Pi accept m’ at its clock time ti’, ti’ < ti. For m to be considered timely, (ti - ti’) must be less than a certain
bound, called the timeliness bound, whose length depends on path(m’) and path(m). Table 1
summarises these timeliness bounds for various combinations of path(m’) and path(m). For example, if m’ is a single signed message from Pk and m is a single signed message from Pj, then the entry B2 (row B, column 2) indicates that (ti - ti’) must be less than 2d for Pi to
consider m timely. We will denote an entry of table 1 by treating the table as a matrix: Table1[Pr, Pc] denotes the value in the row corresponding to the path Pr and in the column corresponding to the path Pc.
Pk:Pj (E) d
Table 1: Pi’s timeliness bounds for a past m given that MCi first exceeded m.TS due to accepting m’.3.3. Protocol Description
To perform the timeliness checks on received messages, non-faulty Pi maintains path
counters, denoted as PCi[p], for every path p through which Pi can receive a message. Thus, Pi maintains four path counters, one for each p ∈ {Pk, Pj, Pj:Pk, Pk:Pj}. These counters have
integer values which are initialised to (INIT_VAL - 1) when the system is formed. A path counter PCi[p] is set to T, to indicate that any m, m.TS ≤ T and path(m) = p, to be received
thereafter, must not be considered timely. A primitive update(path: p, timestamp: T) is used which, upon being invoked, sets PCi[p] = maximum of {PCi[p], T}. The instruction
“schedule op at t” schedules the operation op at (local) clock time t, if t is a future time when the instruction is executed. Whenever Pi forms and sends m’ or receives a timely m’, executions of the update primitive are scheduled to set PCi[p] = maximum of {PCi[p], m’.TS}, for every p ∈ {Pk, Pj, Pj:Pk, Pk:Pj} after the elapse of time intervals indicated in
Table1[path(m’), p]. With the path counters being updated at appropriate timing instances, the timeliness of a received m is checked simply by referring to PCi[path(m)]: m is timely only if m.TS > PCi[path(m)] when m was received.
We define PCi,min = minimum of {PCi[p], ∀p | p ∈ {Pk, Pj, Pj:Pk, Pk:Pj}}. Any m that
is received after PCi,min has become larger than or equal to m.TS is untimely irrespective of path(m) and cannot enter acceptedi. So, after PCi,min ≥ T, every m, m.TS ≤ T, that is already in acceptedi is said to have become ‘stable’ and can be safely ordered. Stable messages with a given timestamp T, are grouped together into a list called stablei(T)for ordering. The entries of stablei(T) are first stripped of the signatures they have, and then duplicate entries are
Two messages m and m’ are said to be spurious if m.O = m’.O, m.TS = m’.TS and
m.µ ≠ m’.µ. Spurious messages are generated when the faulty processor gives the same timestamp to distinct messages m and m’ it forms. Spurious messages in stablei(T) are discarded. The stablei(T) will now contain at most one message originating from a given
processor. If it contains more than one message, they are ordered according to their source processors. Messages of stablei(T) are ordered only after the contents of all stablei(T’),
T’ < T, have been ordered. This is ensured by using the stability counterSC which is initialised to (INIT_VAL - 1). The protocol for Pi is presented in figure 2.
do /* Broadcast process: handles external messages */
get µ from outside the system; m.µ = µ; m.TS = MCi; m.O = i; MCi = MCi + 1; t = clocki; for every p ∈ {Pk, Pj, Pj:Pk, Pk:Pj} do
schedule update(PCi[p], m.TS) at t + Table1[path(m), p];
end for send(m); acceptedi = acceptedi ∪ {m}; end do /* Broadcast process */ || do /* Diffuse process: handles internal messages */
receive(m); if (m.TS ≤ PCi[path(m)]) then discard(m) else
MCi = maximum of {MCi, m.TS + 1}; t = clocki; for every p ∈ {Pk, Pj, Pj:Pk, Pk:Pj} do
schedule update(PCi[p], m.TS) at t + Table1[path(m), p];
end for send(m); acceptedi = acceptedi ∪ {m}; end do /* Diffuse process */ || do /* Order process: identifies and orders stable messages */
PCi,min = minimum of {PCi[p], ∀=p | p ∈={Pk, Pj, Pj:Pk, Pk:Pj}}; while SCi ≤ PCi,min do
SCi = SCi + 1; stablei = {m | m ∈ acceptedi and m.TS = SCi}; acceptedi = acceptedi - stablei; /* remove stable messages */ if stablei ≠ ∅ then
distinct = ∅; for every m in stablei do
m.S = ∅; /* strip off signatures from m */if m ∉ distinct then insert m in distinct end if
end for stablei = distinct; spurious = {m, m’|m=∈=stablei ∧=m’∈=stablei ∧=m’.O=m.O ∧==m’µ≠m.µ}; stablei = stablei - spurious; /* remove spurious messages */ if stablei≠∅ then ∀=m|m=∈=stablei deliver m ordered by m.O end if end while end do /* Order process */
Figure 2. The Timeout-based Protocol for Message Ordering.
4. Estimating Timeliness Bounds
Central to protocol correctness is the accurate estimation of safe timeliness bounds which
also need to be small if the protocol is to order messages fast. This section presents the
arguments for, and the rationale behind the derivation of the smallest safe bounds. It is done
in three parts. First, the basic condition that should be met for the protocol to be correct is
stated. We then analyse how meeting this condition is made difficult by various possible
failure modes of the faulty processor; this analysis leads to the identification of the safety requirement. The third part shows in two stages how the timeliness bounds used in the
protocol are derived to be safe. First, the bounds are derived in the most straightforward
manner; next, the contexts where bounds can be safely reduced are identified and the bounds
reduced. Note that the smaller the timeliness bounds used, the earlier the accepted messages
are stabilised and ordered, resulting in smaller ∆ (the upper bound on message ordering
4.1. The Basic Condition
The following condition must be met for the protocol to be correct:
Unanimous acceptance: m enters the accepted list of a non-faulty processor if and only if m
or a message equivalent to m entersthe accepted list of every other non-faulty processor in
A message equivalent to m is denoted as equiv(m) and is defined as any other message that
differs from m only in the signature field; that is, equiv(m).µ = m.µ, equiv(m).O = m.O,
equiv(m).TS = m.TS and equiv(m).S ≠ m.S. Since a non-faulty processor does not accept an
incoming message containing its own signature, it accepts at most one equiv(m) after having
accepted m. Note that equiv(m) and m become identical once their signatures are stripped off
(i.e., once the signature fields are set to Φ).
It can be noted from the protocol description that there exists a local time when a non-
faulty processor stops accepting messages of certain timestamp value. When that time comes
for a given timestamp value T, an already accepted message with timestamp T becomes stable. If the basic condition holds, then any two non-faulty processors, say, Pi and Pk,
construct an identical stable(T) of signature-stripped and non-spurious messages for every
given T, T ≥ INIT_VAL. Given that processor ordering is unique and known to non-faulty
processors, the order determined by them for a (signature-stripped) m will be identical.
4.2. Effects of Failures
A faulty processor, say Pj, can fail in the following ways to prevent the above basic
F1 (impersonating a non-faulty processor): Pj generates a signed message on behalf of a
non-faulty processor and attempts to deceive the other non-faulty processor into accepting the forged message. For example, Pj fabricates a message with Pk’s (forged) signature and diffuses it to Pi, with path(m) = Pk:Pj. If Pi were to accept m, it will not diffuse m to Pk as
the accepted message is already double-signed. However, by assumption A3, such attempts by faulty Pj are detectable and hence failures of this type are not a threat to meeting the basic
F2 (delayed sending of own messages): Pj arbitrarily delays the sending of a message m it
generates. A failure of this type can lead to one of the following outcomes: (i) both the non-
faulty processors find m untimely and hence both discard m; (ii) both find m timely and enter a copy into their respective accepted list; or (iii) one non-faulty processor, say Pi, finds it
timely and the other does not. Note that the first two outcomes do not pose any problem since
both the non-faulty processors are unanimous in accepting or not accepting m. The last
outcome lacks this unanimity and is depicted in figure 3(a). The situation of m being entered only in acceptedi and not in acceptedk, has to be dealt with if identical ordering is to be
Figure 3. (a) Pi accepts m but not Pk. (b) diffusion failure.
F3 (two-facing while sending own messages): Pj sends a properly signed m to say, Pi, and
sends to Pk either (a) an inauthentic version of m, or (b) nothing, or (c) a different, authentic message m’ (that is never sent to Pi). The effect of a two-facing failure is the same as the last
outcome of the previous category: a 1-signed message enters the accepted list of one non-
faulty processor and neither that message nor its equivalent ever enters the accepted list of the
F4 (failure during diffusion): Pj fails while diffusing a message m which it received from,
say, Pi as shown in figure 3(b). It can fail by altering the contents of m (call this failure type
F4.1) or by delaying the diffusion by an arbitrary amount of time (call this type F4.2). Assumption 3(b) reduces the impact of F4.1 by enabling Pk to detect Pj’s tampered message as inauthentic and discard the diffused message. F4.2 can result in Pk not receiving the
diffused message at all, or receiving it but finding it untimely. Thus, both F4.1 and F4.2 can prevent equiv(m) from entering acceptedk.
To deal with the problematic outcomes caused by failures of F2, F3, and F4 types, the
timeliness checks are required to be safe:
safety requirement: a non-faulty processor finds a received m timely if the immediate sender
Let us suppose that the timeliness checks meet the safety requirement. Consider the third
outcome caused by an F2 type failure (see figure 3(a)). Pi will diffuse m after it has entered it into acceptedi. By the safety property of timeliness checks, Pk is guaranteed to find the Pi’s diffused message timely; thus, equiv(m) is guaranteed to enter acceptedk. This argument also
shows that an F3 type failure is not a threat when timeliness checks are safe. Let us next consider figure 3(b), which depicts an F4 type failure. Since Pi, the originator as well as immediate sender of m, is non-faulty, the safety property guarantees that Pk finds m timely and that m enters acceptedk as well. Thus, if the timeliness checks are safe, the above-
mentioned basic condition is met and therefore identical ordering is achieved despite the
arbitrary failures of the faulty processor.
It can be observed from the protocol description that the value of a non-faulty processor’s
MC (message counter) is always one more than the largest timestamp of messages accepted
so far; also, that the value of any of its PC (path counter) can never exceed the largest
timestamp of the accepted messages. So, a non-faulty processor’s MC will be at least one
more than any of its PC. This means that when a received m is a future or present message, it
is guaranteed to be found timely. That is, the safety requirement is trivially met if the received
message is a present or future one. So, in what follows, we will show that the timeliness bounds used are safe when a non-faulty Pi finds a received m to be a past message.
4.3. Derivation of Timeliness Bounds
We will present our derivations in the context in which the bounds of Table 1 are
presented: Pi is non-faulty and accepts m’ at its clock time ti’ and receives m, m.TS ≤ m’.TS, at ti, ti > ti’; just before ti’, MCi ≤ m.TS, and at ti’ MCi > m’.TS. To keep the description
simple, we assume the following. (i) The minimum message transmission delay (as measured
by a non-faulty clock) is zero. (ii) The clocks of all non-faulty processors have identical
running rates. (This assumption is removed in subsection 4.3.3.) (iii) Unless stated explicitly, time is measured according to the clock of Pi. So, when we say an event happened at (time) t, it means that the event happened at time t according to Pi’s clock. (iv) Finally, the subscript i is dropped from ti and ti’, where the context is obvious.
Lemma 4.1. Given that Piaccepts m’ at t’, any non-faulty Pkwill have MCk > m’.TS before Proof: If Pk has signed m’, then MCk > m’.TS when it signed m’. So, MCk > m’.TS becomes true before t’ or at t’ if m’ has experienced zero transmission delay in reaching Pi from Pk. If Pk has not signed m’, it will receive the diffused m’ from Pi before t’ + d. Upon the reception of m’, if MCk is not already greater than m’.TS then m’ is either a present or a future message, and therefore must be accepted by Pk which will then set MCk > m’.TS.
Say, non-faulty Pk forms and sends m, m.TS ≤ m’.TS. By lemma 4.1, MCk becomes larger than m’.TS before t’ + d. So, m should have been sent by Pk before t’ + d. Even if m experiences the maximum delay of just less than d time, Pi should receive Pk’s single-signed m before t’ + 2d. It suggests that if Pi receives a single-signed message m at t such that
(t - t’) < 2d, it must consider m timely. This scenario is depicted in figure 4(a) where time
progresses from left to right and the labels “>m’.TS” and “=m.TS+1” indicate the earliest
instances when a non-faulty message counter exceeds m’.TS and when it becomes equal to
m.TS + 1, respectively; “<nd” labels an interval of length less than nd, for some integer n ≥ 1. On the time-line of Pk (in figure 4(a)), the timing instance labelled “=m.TS+1” cannot be on the right hand side of that labelled “>m’.TS” since m.TS ≤ m’.TS. So, Pi can receive a timely m from Pk at t, (t - t’) < 2d.
Figure 4. m.TS ≤ m’.TS. (a) Non-faulty Pk forms and sends m. (b) Pk diffuses Pj’s m.
Suppose that m originates from a faulty Pj and that Pk receives it within 2d time after
MCk > m’.TS has become true. (See figure 4(b).) Just like in figure 4(a) where Pi accepts Pk’s m because m arrives within 2d time after MCi > m’.TS, Pk must now consider Pj’s m
timely. (Note that a non-faulty processor cannot know whether another processor is non-faulty or faulty.) When Pk diffuses m, Pi must find the diffused message timely for the safety property to be met. From figure 4(b), Pi can receive the diffused m at t, t < t’ + 4d. So, Pi’s
timeliness check for accepting a double signed m is (t - t’) < 4d.
4.3.1. Motivation for reducing the bounds
The timeliness bounds of 2d for 1-signed messages and 4d for 2-signed messages are safe
for the protocol. Reducing these bounds further (without compromising safety) can lead to smaller ∆. To illustrate this, let us evaluate ∆ for the following scenario: Pi receives and accepts 1-signed m’ from Pk at time t’. Suppose that there is no m, m.TS ≤ m’.TS, left in the system for ordering. Non-faulty Pi deduces that it will never accept a 1-signed m and a 2-signed m at t’ + 2d and t’ + 4d respectively. So, if the 1-signed m’ from Pk had taken maximum transmission time to reach Pi, the delay (in real-time) for Pi to order m’ would be δ=+ 4d with ρ having been assumed to be zero; i.e., ∆ ≥ =δ +4 d. Suppose that we are able to show that it is enough for Pi to wait until t’ + 3d to deduce that no 1- signed or 2-signed m, m.TS ≤ m’.TS, will ever be accepted; this smaller waiting period reduces the ordering delay
for m’ to δ=+ 3d. Suppose also that we are able to show that the ordering delay for all types of m’ that Pi can possibly receive and accept, can be reduced as well, then ∆ itself can be
reduced, resulting in a faster ordering protocol. Below, we will explain the intuition behind
our attempts at reducing timeliness bounds.
Given that Pi has received and accepted m’ at t’, let the originator of m’ be Pk and m’ be
1-signed or 2-signed. Pi can now expect any of Pk’s 1-signed m, m.TS < m’.TS, to be received before t’+ d, if Pk is non-fa Pk sends the messages it forms, in the increasing order of message timestamps; another non-faulty Pi must receive the earlier
one within at most d after it has received the later one. So, the conclusion is: given that m’ is
accepted at t’, the timeliness bound for a 1-signed m such that m.TS ≤ m’.TS and the
originator of m’ ≡ originator of m, is d. Note that this bound is d smaller than the generic
bound we have earlier derived for a 1-signed m.
It can be observed that we have achieved the above reduction (i) by considering the
slowest possible behaviour of a message m from another non-faulty processor, with a hypothesis that Pi has already accepted m’, m’.TS ≥ m.TS; and, (ii) for the case where
path(m’) intersects path(m). It turns out that whenever (ii) holds, reduction is possible.
Intuitively, the timing behaviour of a potential m from another non-faulty processor can be
reasoned out more precisely, if the potential m and the accepted m’ (with m’.TS ≥ m.TS) had
been handled by the same processor. Because there are only three processors in the TMR
system, it is guaranteed that path(m’) intersects path(m) whenever m is double signed and m’ is not formed by Pi. This means that when the accepted m’ is not locally formed, Pi needs to
wait only for a smaller period (smaller by at least d) in order to be sure that no 2-signed m will any longer be accepted; so, even if Pi needs to wait until t’+ 2d for 1-signed messages, it
can start ordering m’ by t’+ 3d.
There are fourteen combinations of path(m’) and path(m) for which reduction is feasible.
They correspond to those entries in table 1 where the bounds are neither 2d nor 4d, and to the
entries D3 and E4 which have 2d for a double-signed m. In the next sub-section, we explain
the rationale behind the reduction by considering three representative path combinations; for
each, we also indicate the dual cases for which the explanation can be readily applied - thus
covering eight path combinations in total. For space reasons, we omit the explanations for the
remaining six combinations which are not dissimilar to the combinations considered. Note
that this omission does not leave the paper incomplete, as Section 5 proves that the protocol
is correct, i.e., all reductions applied are correct. (A reader who is not interested in the
reduction rationale but only in the overall correctness, can proceed straight to section 5.)
4.3.2. Rationale for reducing the bounds
We will make a default assumption that the immediate sender of m (taken mostly as Pk) is
non-faulty and name a particular path combination after its corresponding entry in table 1.
Case B1: path(m’) = path(m) = Pk. Since both m and m’ originate from non-faulty Pk, m.TS
and m’.TS cannot be equal. By hypothesis, m.TS cannot be larger than m’.TS; so, only m.TS < m’.TS is possible. So, Pk must have sent m first and then m’. (See figure 5(a).) Since
2Note that the safety requirement is void if the immediate sender of m is faulty.
message transmission can take as small as zero time and as large as just less than d time, (t - t’) < d. Similar argument applies also for the case D1, where path(m’) = Pj:Pk and path(m) = Pk. Here, the non-faulty Pk does not form m’ but receives it from Pj and diffuses it after setting its MCj > m’.TS. So, it must have sent m before diffusing m’ and therefore (t - t’) < d. Replacing Pk by Pj as the immediate sender of m (and therefore considering Pj as non-faulty) in these two cases leads to the bound of d for cases C2 (path(m’) = path(m) = Pj) and E2 (path(m’) = Pk:Pj and path(m) = Pj).
Case E1: path(m’) = Pk:Pj and path(m) = Pk. Both m and m’ originate from non-faulty Pk which must have sent m first and then m’ since m.TS < m’.TS. But Pi receives m’ first and then m. This can happen, as illustrated in figure 5(b), when Pk sends m and m’ close to each other and the transmission delay of m’ from Pk to Pj and then from Pj to Pi is less than the transmission delay of m from Pk to Pi. Since message delays between non-faulty processors are bounded by d,Pi must receive m before t’ + d. Hence the timeliness bound for this case is d. It also applies for the case D2 which is the dual of E1 with Pj being the non-faulty
Case B4: path(m’) = Pk and path(m) = Pk:Pj. Observe that Pj here is the non-faulty immediate sender of m and Pk can be faulty or non-faulty. Since Pi accepts m’ (by hypothesis), it will diffuse the message to Pj. Pj also diffuses m to Pi. So, there are two cases to consider: Pj receives the single-signed m from Pk either (i) before or (ii) after it receives the diffused m’ from Pi. The subcase (i) is shown in figure 6(a). Pj can receive the diffused m’ from Pi at any time before t’ + d. Even if it diffuses m just before receiving m’, Pi will
receive the diffused m before t’ + d + d; that is, (t - t’) < 2d.
Figure 6(b) illustrates the second subcase where Pj receives the single-signed m from Pk after it has received the double-signed m’ from Pi. Since Pj diffuses m to Pi, it must have found
the single-signed m timely. Suppose that it also finds the double-signed m’ timely. This means that MCj > m’.TS becomes true by the time Pj accepts the double-signed m’. When it
receives the single-signed m after having accepted the double-signed m’, it is in the same
situation as Pi in case E1 where path(m’) = Pk:Pj, path(m) = Pk and the timeliness bound is d. So, for the non-faulty Pj to find the single-signed m timely, it must have received m within d time after MCj > m’.TS became true; that is, the time elapsed between receiving m’ and m
must be less than d. From figure 6(b), (t - t’) < 3d is possible for this subcase. We now have to show that Pj does find the double-signed m’ timely, which is done as lemma A1 in
appendix A. The proof is based on the observation that the double-signed messages have
larger timeliness bounds than the single-signed ones; so, it is not possible for a non-faulty
processor to reject a double-signed message as untimely and then accept later a single-signed
message of smaller or equal timestamp as timely.
Choosing the largest of the bounds estimated for the two subcases, 3d becomes the timeliness bound for B4; this bound also applies for C3 where Pk is substituted for Pj as the non-faulty
Figure 6. Case B4. (a) Pj receives m before m’. (b) Pj receives m after m’.
4.3.3. Adjusting for non-identical clock rates
We now focus on our earlier assumption that non-faulty processors’ clocks have identical
running rates. When the readings of non-faulty clocks can drift apart with the passing of real
time, the differences accumulated over a period of time must be compensated. The way we do
this compensation is by increasing the value of d by the maximum difference which the non-
faulty clocks could develop during the ‘life-time’ of a non-faulty processor’s message within
the ordering protocol, i.e., during the maximum real-time within which a non-faulty
processor’s message gets ordered by every non-faulty processor in the system. Appendix B
presents proofs to show that δ/(1 - 5ρ) is the smallest and safe lower bound required on d to
compensate for the non-identical running rates of non-faulty clocks. Below, we present the
‘worst-case’ scenario which gives rise to this bound on d.
Suppose that non-faulty Pk’s clock is running fast with ρk = -ρ, and non-faulty Pi’s clock
is running slow with ρi = ρ. Let Pi form and send m’ at local clock time ti’, and let m’ be
transmitted in zero time; let Pk accept m’ at its clock time tk’. Since m’ is assumed to be transmitted in zero time, the clocks of Pi and Pk read ti’ and tk’ respectively at the same real
Let faulty Pj behave in the most disruptive manner in sending its m, m.TS ≤ m’.TS: it
forms m, delays its sending for a while, and sends only to Pi - the processor with the slow clock. Let Pi receive the single-signed m from Pj at the latest time for accepting such m: just before its clock time ti’ + 2d. So, Pi accepts m and diffuses double-signed m to Pk. For Pk to accept the 2-signed m from Pi, its clock, despite running fast, should not be reading past tk’ + 3d (see entry B3 of table 1, interchanging the suffixes k and i) when it receives that 2-signed m, even if the transmission from Pi to Pk is to take just less than δ=- the maximum permitted time (see assumption 4). So, d must be large enough to ensure that τ + 3d(1 + ρk) ≥ τ + 2d(1 + ρi) + δ, i.e., 3d(1 - ρ) ≥ 2d(1 + ρ) + δ; that is d ≥ δ/(1 - 5ρ).
5. Protocol Correctness and Relative Performance Lemma 1: The protocol guarantees the validity condition with ∆== 4d(1 + ρ), provided d ≥ δ/(1 - 5ρ). Proof: Consider an execution of the protocol in which Pi and Pk are non-faulty. Let Pi form and send a message m at real-time τi. Pk receives m before τi + δ (by assumption 4). Lemmas
in the appendix B prove that the timeliness bounds of table 1 are safe when d ≥ δ/(1 - 5ρ). So, Pk will find the received m timely and accept it. Note that when a non-faulty processor
accepts a message m, none of its path counters stays below m.TS after some finite time. So, the message m accepted by Pi and Pk, will be taken up for ordering and be ordered because a faulty processor cannot form and send another message that could make Pi and Pk consider m
as spurious (due to assumption 3). This shows that the validity condition is met within a finite time after τi.
The schedule instructions in the Broadcast process indicate that PCi,min reaches m.TS within 4d clock time after τi. So, Pi orders m no later than τi + 4d(1 + ρi). The schedule instructions in the Diffuse process indicate that PCk,min reaches m.TS within 3d clock time after Pk receives m which is before τi + δ.So, Pk orders m no later than τi + δ=+ 3d(1 + ρk). Thus the message m sent by Pi gets ordered by Pi and Pk, no later than τi + maximum of {4d(1 + ρi), δ + 3d(1 + ρk)}. By assumption 5, |ρi| ≤ ρ and |ρk| ≤ ρ. So, ∆== maximum of {4d(1 + ρ), δ + 3d(1 + ρ)} = 4d(1 + ρ).
Let Stablei(T), for some T, T ≥ INIT_VAL, denote the set stablei which non-faulty Pi first constructs during an execution of the protocol when SCi = T. The code for the Order process indicates that (a1) Pi strips off signatures of every m in Stablei(T) and removes duplicates; and then, (a2) it removes spurious messages from Stablei(T). Let SigFreei(T) and SpuFreei(T) denote the resulting Stablei(T) after a1 and a2 are carried out, respectively: Definition 1: SigFreei(T) = {m | m ∈=Stablei(T)=∧=m.S = ∅}. Definition 2: SpuFreei(T) = {m | m ∈=SigFreei(T)=∧=(¬∃m’ ∈=SigFreei(T):=m.O = m’.O)}. Lemma 2: Consider an execution of the protocol in which Pi and Pk are non-faulty. Suppose that: m ∈=Stablei(T) equiv(m) ∈=Stablek(T), and m ∈=Stablek(T)
equiv(m) ∈=Stablei(T).
Then, SpuFreei(T) = SpuFreek(T). Proof: In reducing Stablei(T) to SigFreei(T), Pi empties the m.S field of every m in Stablei(T)
and then discards duplicates. m and equiv(m) are distinguished only by their signature fields,
and setting this field to empty will make them identical. (See the definition of equiv(m) in section 4.1.) If Stablei(T) contains m and equiv(m), the duplicate removal ensures that only one of the identical copies is retained in SigFreei(T). So, by the hypothesis of the lemma, SigFreei(T) = SigFreek(T). Contrary to lemma, assume that SpuFreei(T) ≠ SpuFreek(T); also assume that, without loss of generality, there is an m such that m ∈=SpuFreei(T) and m ∉=SpuFreek(T). By definition 2, SpuFreei(T) ⊆ SigFreei(T). We have established that SigFreei(T) = SigFreek(T). So, m ∈=SigFreek(T). But m ∉=SpuFreek(T); so, by definition 2, there must have been an m’ in SigFreek(T) such that m.O = m’.O. Since SigFreei(T) = SigFreek(T), m’ ∈=SigFreei(T). By definition 2, m cannot be in SpuFreei(T).
Lemma 3: The protocol guarantees the unanimity condition, provided d ≥ δ/(1 - 5ρ). Proof: Consider an execution of the protocol in which Pi and Pk are non-faulty. Say, Pi accepts some m’. If m’ is signed by Pk, then equiv(m’) is already accepted by Pk. If m’ is not signed by Pk, then Pi will sign and diffuse m’ to Pk. Lemmas in the appendix B prove that the timeliness bounds of table 1 are safe when d ≥ δ/(1 - 5ρ). So, Pk will accept the diffused
message due to the safety property of the timeliness bounds. Thus, for every m’ accepted by Pi, Pk accepts equiv(m’). Observe (from figure 2) that a non-faulty processor accepts no m,
m.TS ≤ T, once its stability counter(SC) becomes equal to T, T ≥ INIT_VAL. So, for every m’ accepted by Pi, Pk should accept equiv(m’) before SCk becomes equal to m’.TS. Therefore, for any T ≥ INIT_VAL, when Pi and Pk form Stablei(T) and Stablek(T) respectively: m ∈=Stablei(T)
equiv(m) ∈=Stablek(T).
By similar arguments, we can show: m” ∈=Stablek(T) equiv(m”) ∈=Stablei(T). By lemma 2, SpuFreei(T) = SpuFreek(T) = SpuFree(T) (say) for every T ≥ INIT_VAL. By definition 2,
SpuFree(T) will contain at most one message originating from a given processor. Since processor ordering is unique and known, Pi and Pk will order the entries of SpuFree(T)
identically. Further, entries of SpuFree(T) are ordered only after the entries of all SpuFree(T’), T’ < T, have been ordered. So, Pi orders m’ before m if and only if Pk orders m’
before m. Thus the protocol satisfies the unanimity condition.
Theorem 1: The protocol guarantees unanimity and validity conditions with ∆== 4d(1 + ρ), provided d ≥ δ/(1 - 5ρ). Proof: Follows from lemmas 1 and 3.
Ordering Delays with Synchronised Clocks
Let non-faulty processors’ clocks be synchronised within e. According to the classical
time-based ordering protocol of [Cristian85], when a non-faulty Pi forms and broadcasts a
message m, it sets m.TS to the current reading of its synchronised clock; any non-faulty
processor can stabilise an accepted m only at its clock time m.TS + 2(d + e). Therefore, a
message m whose broadcast has been initiated at real time τ is ordered by its non-faulty originator at τ + 2(d + e)(1 + ρm.O) and by any non-faulty Pj, j ≠ m.O, at some real time within the interval [τ + 2(d + e)(1 + ρj) - e(1 + ρj), τ + 2(d + e)(1 + ρj) + e(1 + ρj)], depending on the actual difference between the readings of Pj’s and m.O’s clocks (which is bounded by e) at the time m is ordered by Pj. So, the upper bound on time-based ordering
delays is (2d + 3e)(1 + ρ), or 2d if e and ρ are negligibly small compared to d; if only ρ is
negligibly small compared to d, then the upper bound becomes 2(d + e) + e.
We will now establish the favourable conditions listed in section 1. Let λ and da be as
defined there, and all processors be non-faulty. Suppose also that an input µ is first received
by a processor within the TMR system at real-time τ, and that ρ is negligibly small. By τ + λ, every processor forms and broadcasts a message m with m.µ = µ. By τ + λ + 2da, every processor, say Pi, accepts two double-signed messages m’ and m” such that path(m’) ≠ path(m”) and m’.µ = m”.µ = µ; the one with a lower timestamp, say m’, becomes stable at Pi by τ + λ + 2da + 2d: if Pi accepts m’ at τ’, τ’ ≤ τ + λ + 2da, say, with path(m’) = Pj:Pk, then
by the entry D3 of Table 1, it will not accept a double-signed m with m.TS ≤ m’.TS and path(m) = Pj:Pk, after τ' + 2d ≤ τ + λ + 2da+ 2d; further, since Pi accepts m” with path(m”) = Pk:Pj by time τ + λ + 2da, it will not accept any double-signed m with path(m) = Pk:Pj and m.TS ≤ m”.TS > m’.TS, after τ + λ + 2da + 2d. Thus, every processor orders input µ by τ + λ + 2da + 2d; with the time based protocol, no correct processor can order µ before=τ + 2(d + e). Thus, our protocol is guaranteed to be faster when λ=< 2(e- da).
5.1. Optimal Upper Bound
We next show that the ordering bound of our protocol is the smallest achievable when (i)
clocks are not synchronised, (ii) ordering is symmetric, and (iii) a processor cannot deduce
temporal order between concurrent messages it receives. Each of these premises is defined
Unsynchronised Clocks: Non-faulty processors’ clocks are not synchronised, where clock is a device which a processor uses for observing time. To state formally, let ci(τ) denote the reading of Pi’s clock at real time τ. Clocks of non-faulty Pi and Pj are said to be unsynchronised during an interval ι if |ci(τ) - cj(τ)| is arbitrary for every τ=in ι.
Symmetric Ordering: Our protocol, like [Cristian85], is symmetric in the sense that the
correct processors run the same program (except for process identities and signatures). In
contrast, in an asymmetric protocol (e.g. [Echtle87]), correct processors can execute different
code and hence play different roles: one processor, termed the sequencer, decides and
disseminates the message ordering for other processors to accept what it has decided. In
failure-free executions, asymmetric ordering will be the fastest if the non-sequencer
processors can confirm the sequencer’s correct behaviour without delaying message ordering;
such delay-free confirmation is possible only when the sequencer is guaranteed to fail in a
benign manner (e.g. by crash). However, when (authenticated) Byzantine failures are
permitted, the non-sequencer processors must exchange messages to confirm that the
sequencer had behaved correctly so far; that is, message diffusion must precede message
ordering. Further, when the sequencer fails, message ordering is delayed until the failure is
detected and a new sequencer is elected. Though not proved here, it appears that asymmetric
ordering cannot offer a better worst-case ordering delay.
Non-deductibility of Temporal Order: Based on the definition of ‘happened before’ in
[Lampo78], we define two messages to be concurrent iff neither one can be said to have
happened before the other. Temporal order on messages is an order that is based on the
Newtonian time instants at which messages were generated: for any two messages in the
system, one message is before another in the temporal order if the first one was generated
earlier in a Newtonian time-frame (see [Veris00] for a formal definition). We assume that no
processor can deduce temporal order between concurrent messages it receives
For simplicity we will assume ρ = 0 and consider a system in which communication delays
between two non-faulty processors can be at most δm. Let us define δm- such that δm- < δm and (δm - δm-) is infinitely small.
Theorem 2: Any symmetric ordering protocol that works only with unsynchronised clocks, will have executions in which the ordering delay can be 3δm + δm-. Proof: By contradiction. Assume that there is such a protocol which guarantees that ordering delays are always smaller than 3δm + δm-. Consider two distinct executions of this protocol during real-time intervals ι1 and ι2 respectively. By hypothesis, non-faulty processors’ clocks
remain unsynchronised throughout each interval
In the first execution (see figure 7(a)), Pi fails only by not sending its messages to Pj and not receiving Pj’s messages. Pi sends mi at its clock time ti. Let mi take zero time to reach Pk. Suppose that Pk’s clock reads tk when Pk receives mi. (Since mi takes zero time, when Pk’s clock reads tk, Pi’s clock reads ti.) Let Pk accept and diffuse mi to Pj and the diffused message take δm time to be received by Pj. Just before Pj receives the diffused mi, i.e. when Pk’s clock reads tk + δm-, suppose that Pj‘s clock reads tj and that Pj forms and sends mj which takes δm time to be received by Pk. So, Pk’s clock reads (tk + δm- + δm) when Pk receives mj. Note that Pj sends mj before it receives the diffused mi from Pk, and therefore neither mi nor mj happened before [Lampo78] the other. Since Pk cannot deduce that mi 3 According to [Lampo78], two messages need not be produced at the same newtonian time, for them to be deemed concurrent. 4 By this, we exclude a class of protocols which permit clock synchronisation messages to be piggybacked onto the order protocol messages and thus achieve clock synchronisation during the order protocol execution.
originated before mj in real-time, we will assume (without loss of generality) that mj is ordered before mi by all non-faulty processors.
Figure 7. Execution Scenarios. (a) First execution. (b) Second execution.
In the second execution (see figure 7(b)), Pj fails only by not sending its messages to Pi and not receiving Pi’s messages. Pi sends mi at its clock time ti. mi takes δm time to reach Pk and is not received by Pj. Suppose that Pk’s clock reads tk when Pk receives mi, i.e., when Pi’s clock reads ti + δm. (Note: this is possible with unsynchronised clocks whose readings can
differ by an arbitrary amount; to prove the impossibility we have chosen the difference to be a convenient amount.) Assume that the mi diffused by Pk takes δm time to be received by Pj.
When Pk’s clock reads tk + δm-, suppose that Pj‘s clock reads tj and that Pj forms and sends mj only to Pk which takes δm time to be received. That is, Pk’s clock reads (tk + δm- + δm) when Pk receives mj. For Pk, this execution is indistinguishable from the first must order mj before mi. Since ordering delays are always smaller than 3δm + δm-, non-faulty Pi must order its own mi before ti + 3δm + δm-. Say Pk’s diffused mj takes δm time to be received by Pi. That is, Pi can receive mj (for the first time), only when its clock reads ti + 3δm + δm-. So, Pi can accept mj only at or after ti + 3δm + δm-. Hence Pi cannot order mj before ti + 3δm + δm-, and therefore before mi. This violates the unanimity condition.
By theorem 2, the upper bound ∆ on ordering delays must be at least (3δm + δm-), i.e.
∆ ≥ (3δm + δm-). Since δm is not known directly, but only its upper bound δ (see assumption 4), ∆ ≥ 4δ. Theorem 1 establishes ∆ of our protocol to be 4d(1 + ρ) with d recommended to
be d ≥ δ/(1 - 5ρ). When d is chosen to be δ/(1 - 5ρ) and when ρ = 0, we have ∆ = 4δ. Thus, if
we ignore the (small) increments on d made to compensate for non-zero ρ, our protocol is
The compensation needed to account for non-zero ρ varies from protocol to protocol and
is influenced very much by the design and the structure of a given protocol. We do not
therefore claim that the compensation needed for our protocol is the smallest required by any
5 This is true even if the sender of a message m timestamps m with the local send time.
protocol. Instead, we show in Appendix B that the compensation factor of (1/(1 - 5ρ)) is the
smallest needed for our protocol.
6. Concluding Remarks
Replicated processing with majority voting is a well-known technique for masking the
effects of Byzantine faults in processors. Message ordering is a central requirement for
implementing this concept in a multi-client environment. We have presented a timeout-based
message ordering protocol for TMR systems. It is a two-round protocol, as it has to be
[Dolev82], since there can be at most one faulty processor in the TMR system. We have also
presented in detail the design efforts in keeping the timeliness bounds small; consequently the
protocol provides an upper bound on ordering delays, which is the smallest achievable when
non-faulty processor’s clocks are not synchronised within a known and finite bound.
We conclude the paper by exposing two drawbacks in generalising the protocol to a
system of N processors, where at most π out of N processors can fail. We restrict the context
of generalisation to the one that preserves the minimality of the message cost of the ordering
protocol. (This restriction rules out generalisations into protocols that guarantee early
ordering or smaller latencies at the expense of increased message cost.) A generalised
protocol (with minimum message cost) must however allow (π=+ 1) rounds of messages exchange. This means that the number of path counters which a processor has to maintain
increases exponentially with π and N: one counter for every possible path through which a
message with at most (π + 1) signatures can be received. Secondly, the generalisation must
first assume that a processor must allow a timeliness bound of (2d)i, 1 ≤ i ≤ π + 1, for any
i-signed message it receives. This follows from the arguments presented (in Section 4.3) for
the TMR protocol before the bounds were reduced. Thus, when a non-faulty P first receives a
timely message m from another processor Q, it has to wait for 2d(π + 1) time before m can be
ordered. This means that the ordering delay bound is ∆ = 2d(π + 1) + δ. Reducing the bound
to 2d(π + 1) as we did for the TMR protocol in Section 4.3.2, is possible only in the following
circumstance: N and π must be such that for any m and any (π + 1)-signed m’ that P may
accept, m’.TS ≤ m.TS, path(m) must intersect with path(m’). To guarantee that a particular m
and all possible (π + 1)-signed m’ have intersecting paths, we must have (π + 1) ≥ N – 1. Note
that the ordering problem is vacuous for π > N – 2, and hence the realistic upper bound on π
is N - 2. So, the ordering bound can be reduced to 2d(π + 1), only when π = N – 2. With π > 1, the reduction is not possible for an NMR system where N has to be at least 2π + 1. In other words, π = 1 is the only case for which ordering delays can be reduced by δ and
replicated processing with majority voting is feasible. That is, the design of the protocol
presented exploits features that are so characteristic of the TMR system in providing the
smallest upper bound on message ordering delays. Acknowledgements
This work has been supported in part by grants from CNPq/Brazil.
References
F. V. Brasileiro, P. D. Ezhilchelvan and N. A. Speirs, “TMR processing without
explicit clock synchronisation,” Proceedings of the 14th Symposium on Reliable Distributed Systems, Bad Neuenahr, Germany, pp. 186-195, September 1995.
[Chandra96] T. D. Chandra and S. Toueg, "Unreliable Failure Detectors for Reliable Distributed
Systems", Journal of the ACM, 43(2), pp. 225 - 267, March 1996.
[Cristian85] F. Cristian, H. Aghili, R. Strong and D. Dolev, “Atomic Broadcast: from Simple
Message Diffusion to Byzantine Agreement,” Digest of Papers, FTCS-15, Ann Arbor, USA, pp. 200-206, June 1985.
[Cristian96] F. Cristian, B. Dancey, and J Dehn, "Fault-Tolerance in Air Traffic Control Systems",
ACM Transactions on Computer Systems, 14(3), pp. 265-286, August 1996.
[Cristian99] F. Cristian and C. Fetzer, "The Timed Asynchronous Distributed System Model", IEEE
Transactions on Parallel and Distributed Systems, 10(6), June 1999.
D. Dolev and H. R. Strong, “Requirements for Agreement in a Distributed System,” Proceedings of the 2nd Symposium on Distributed Databases, pp. 115-129, September 1982.
Dolev, J. Halpern, and H. R. Strong, “On the Possibility and Impossibility of
Achieving Clock Synchronisation,” Proceedings of the 16th Annual ACM STOC, pp. 504-511, April 1984.
K. Echtle, “Fault Masking and Sequence Agreement by a Voting Protocol with low Message Number,” Proceedings of the 6th Symposium on Reliability in Distributed Software and Database Systems, pp. 149-160, March 1987.
C. Fetzer and F. Cristian, "Fail Awareness: An Approach to Construct Fail-safe Applications", Digest of Papers, FTCS-27, Seattle, June 1997.
[Kopetz89] H. Kopetz, A. Damm, C. Koza, M. Mulazzani, W. Schwabl, C. Senft, and R.
Zainlinger, "Distributed Fault-Tolerant Real-Time Systems: The Mars Approach", IEEE Micro, pp. 25-41, 1989.
[Lampo78] L. Lamport, “Time, Clocks, and Ordering of Events in a Distributed System,”
Communications of the ACM, 21(7), pp. 558-565, July 1978.
L. Lamport, “Using Time Instead of Timeout for Fault-Tolerant Distributed Systems,” ACM Transactions on Programming Languages and Systems, 6(2), pp. 254-280, April 1984.
M. Pease, R. Shostak, and L. Lamport, “Reaching Agreement in the Presence of Faults,” Journal of the ACM, 27(2), pp. 228-234, April 1980.
[Powell88] D. Powell, P. Verissimo, G. Bonn, F. Waeselynck, and D. Seaton, “The Delta-4
Approach to Dependability in Open Distributed Computing Systems,” Digest of Papers, FTCS-18, Tokyo, pp. 246-251, June 1988.
R. L. Rivest, A. Shamir, and L. Adleman, “A Method for Obtaining Digital Signatures and Public Key Cryptosystems,” Communications of the ACM, 31(2), pp. 120-126, February 1978.
F. Schmuck and F. Cristian, “Continuous Clock Amortization Need Not Affect the Precision of a Clock Synchronisation Algorithm,” Proceedings of the 9th ACM Symposium on Principles of Distributed Computing, pp. 133-141, August 1990.
F. B. Schneider, “Implementing Fault Tolerant Services Using the State Machine Approach: a Tutorial,” ACM Computing Surveys, 22(4), pp. 299-319, December 1990.
S. K. Shrivastava, P. D. Ezhilchelvan, N. A. Speirs, S. Tao, and A. Tully, “Principle Features of the Voltan Family of Reliable System Architectures for Distributed Systems,” IEEE Transactions on Computers, 41(5), pp. 542-549, May 1992.
N.A. Speirs, S. Tao, F.V. Brasileiro, P.D. Ezhilchelvan, and S.K Shrivastava, “The Design and Implementation of Voltan Fault-Tolerant Systems for Distributed Systems, Transputer Communications, 1(2), pp. 1-17, November 1993.
G Tsudik, “Message Authentication With One-Way Hash Functions”, ACM Computer Communications Review, 22(5), 1992.
N. Vasanthavada and P. N. Marinos, “Synchronisation of Fault-Tolerant Clocks in the Presence of Malicious Failures,” IEEE Transactions on Computers, Vol. C-37(4), pp. 440-448, April 1988.
P. Verissimo, L. Rodrigues, and J. Rufino, “The Atomic Multicast Protocol (AMp),” In Delta-4: A Generic Architecture for Dependable Distributed Computing, D. Powell (Ed.) ESPRIT Research Papers, Springer-Verlag, pp. 267-294, 1991.
P. Verissimo, “Causal Delivery Protocols in Real-Time Systems: a Generic Model,” Journal of Real Time Systems, 10(1), pp. 45-73, 1996.
P. Verissimo, L. Rodrigues, and A. Casimoro, “Cesium Spray: A Precise and Accurate Global Clock Service of Large Scale Systems,” Journal of Real Time Systems, 11(3), 1997.
P. Verissimo and M. Raynal, “Time in Distributed System Models and Algorithms,” in ‘Advances in Distributed Systems’, S. Krakowiak and S. K. Shrivastava (Eds.) LNCS 1752, Springer-Verlag, pp. 1-32, 2000.
Appendix A. Relationship Between Timeliness Checks of Messages Lemma A1: Let t1 and t2 be the local clock times when a non-faulty Pi receives a single- signed m1 and a double-signed m2, respectively; also, let m1.TS ≤ m2.TS and t1 > t2. If Pi finds m1 timely then it must also find m2 timely.
Figure A1. m1.TS ≤ m2.TS ≤ m’.TS.
Proof: We will measure time according to Pi’s local clock and prove the lemma by contradiction. Suppose that Pi finds m1 timely and m2 late. Let m2 be late by y, y > 0, time units, i.e., if Pi had received m2 at any time before (t2 - y) then it would have found m2 timely. (See figure A1.) That m2 received at t2 was found late implies that there exists a message m’, m’.TS ≥ m2.TS, which was accepted by Pi at (t2 - y - tb2), where tb2 is the
timeliness bound indicated by the entry of table 1 whose row corresponds to the path(m’) and
column to the path(m2). Let tb1 be the timeliness bound indicated by the entry of table 1 whose row corresponds to the path(m’) and column to the path(m1).
Since m1.TS ≤ m2.TS ≤ m’.TS, m1 must be received by Pi before (t2 - y - tb2 + tb1) for it to
be considered timely. For any given path(m’), i.e. in any given row of table 1, the timeliness
bound for a single-signed message is smaller than that for a double-signed-message. That is,
tb1 < tb2. This means, (t2 - y - tb2 + tb1) < t2; by given, t2 < t1. So, t1 > (t2 - y - tb2 + tb1). This means that m1 is received by Pi after (t2 - y - tb2 + tb1) and cannot be found timely by Pi.
B. Safe Timeliness Bounds
We next show that the timeliness bounds satisfy the safety requirement of section 4.2
when d ≥ δ/(1 - 5ρ). This is done in two stages: lemma B1 proves that a non-faulty processor
always finds another non-faulty processor’s 1-signed message timely; and lemma B3 shows
that when a non-faulty processor diffuses a 2-signed message to another non-faulty processor,
the latter finds the diffused message timely. As in the main paper, we adopt the style of
writing real time values in Greek and clock time values in italicised lower case Roman letters.
The term ‘clock’ always refers to a processor’s physical clock.
We assume the following notations: STARTi(p,≥T) denotes the smallest real time instance
when PCi[p] for path p becomes larger than or equal to T, T ≥ INIT_VAL. That is, just before real time STARTi(p,≥T), PCi[p] is less than T. ENDi(≤T) denotes the largest real time instance when MCi is less than or equal to T. That is, just after real time ENDi(≤T), MCi is larger than T and Pi will not form and send any m, m.T ≤ T. We also retain the notation ρi, |ρi| ≤ ρ, to denote the rate with which the clock of a processor Pi drifts from real time. Lemma B0: Say a non-faulty Pi accepts m at real time τi. If m is not signed by a non-faulty Pk, Pk then receives m or equiv(m) at real time τk such that τk < τi + δ. Proof: If m is Pi’s own message, Pi is required to send m to both Pj and Pk. If m is signed by Pj, Pi is required to diffuse double-signed equiv(m) to Pk. By assumption 4, Pk then receives m or equiv(m) before τi + δ. Lemma B1: When a non-faulty Pk forms and sends m, another non-faulty Pi finds m timely,
provided d ≥ δ/(1 - ρ). Proof: By lemma B0, Pi receives m or equiv(m) within δ real-time after Pk sends m. Say, m.TS = T. If Pi has not accepted any message with timestamp larger than or equal to T until it receives m, then MCi ≤ T when it receives m and Pi finds m timely. Let us suppose that MCi is already larger than T when Pi receives m. Since Pi has accepted one or more messages with
timestamp larger than or equal to T, there must exist a message, say m’, m’.TS ≥ T, whose acceptance causes Pi to set PCi[Pk] ≥ T at real time STARTi(Pk,≥T). If we show that STARTi(Pk,≥T) - ENDk(≤T) ≥ δ for every possible path(m’), then the lemma is proved. We
present the proof by categorising the values of path(m’) in two cases. For each case, we construct the proof in the following manner. We define τi to be the real time when Pi accepts m’ and α = STARTi(Pk,≥T) - τi. (Figure B1 shows these values along the real-time axis for Pi.) We derive two inequalities – inequality (1) involving τi and ENDk(≤T) and inequality (2) involving τi and STARTi(Pk,≥T). We then relate these two inequalities through the common element τi, to show what is required.
Figure B1. Pi accepting m’ at τi sets PCi(Pk) ≥=T at STARTi(Pk,≥T).
Case I. path(m’) = Pk or Pk:Pj or Pj:Pk. Just before sending m’, m’.TS ≥ T, Pk sets MCk
to m’.TS + 1. Therefore,
ENDk(≤T) ≤ τi
For all the considered values of path(m’), α = d(1 + ρi). (See entries B1, E1 and D1 of table
STARTi(Pk,≥T) = τi + d(1 + ρi) STARTi(Pk,≥T) ≥ ENDk(≤T) + d(1 + ρi).
Since d ≥ δ/(1 - ρ) implies d(1 + ρi) ≥ δ, we have
STARTi(Pk,≥T) - ENDk(≤T) ≥ δ. Case II. path(m’) = Pj or Pi. Let Pk receive m’ (when path(m’) = Pi) or equiv(m’) (when
path(m’) = Pj) from Pi, at real time τk. By lemma B0, τk < τi + δ. Just before τk, we have either MCk ≤=T or MCk > T. In the first case ENDk(≤T) = τk, and in the latter case ENDk(≤T) < τk. So,
ENDk(≤T) - δ < τi
From entries A1 and C1 of table 1, α = 2d(1 + ρi). So,
STARTi(Pk,≥T) = τi + 2d(1 + ρi) STARTi(Pk,≥T) > ENDk(≤T) - δ=+=2d(1 + ρi).=
When d ≥ δ/(1 - ρ), 2d(1 + ρi) ≥ 2δ, thus
STARTi(Pk,≥T) - ENDk(≤T) ≥ δ. Lemma B2: STARTi(Pj:Pk,≥T) ≥ STARTi(Pj,≥T) for a non-faulty Pi and T ≥ INIT_VAL. Proof: Let m’, m’.TS ≥ T be the message whose acceptance at time τ causes Pi to set PCi[Pj:Pk] ≥ T at STARTi(Pj:Pk,≥T). From table 1, τ ≥ STARTi(Pj:Pk,≥T) – 2d(1 + ρi); also, from the protocol in figure 2, Pi will schedule an update of PCi[Pj] to at least m’.TS at latest by τ +2d(1 + ρi), thus STARTi(Pk,≥T) ≤ τ + 2d(1 + ρi), and since d > 0, STARTi(Pj:Pk,≥T) ≥ STARTi(Pk,≥T). Lemma B3: When a non-faulty Pi diffuses a double-signed m to non-faulty Pk, Pk finds m
timely, provided d ≥ δ/(1 - 5ρ). Proof: As in lemma B1, we will let m.TS = T and suppose that MCk is already larger than T when Pk receives m. Since Pk has accepted message(s) with timestamp larger than or equal to T, there must exist a message, say m’, m’.TS ≥=T, whose acceptance causes Pk to set PCk[Pj:Pi] ≥=T at real time STARTk(Pj:Pi,≥T). Let τk be the real time when Pk accepts m’ and α = STARTk(Pj:Pi,≥T) - τk. Figure B2 shows these values along the real-time axis for Pk. STARTk(Pj:Pi,≥T)
k[Pj:Pi] < T) PC
Figure B2. Pk accepting m’ at τk sets PCk(Pj:Pi) ≥=T at STARTk(Pj:Pi,≥T).
Note that Pi must have diffused m to Pk before STARTi(Pj,≥T), and that Pk will find m
timely if it receives m before STARTk(Pj:Pi,≥T). We need only to show that STARTk(Pj:Pi,≥T) - STARTi(Pj,≥T) ≥ δ for every possible path(m’). We show this by
considering three cases regarding the originator of m’, m’.O. In all cases, there is a real time
τi when Pi either sends m’ or receives equiv(m’); in the same way we did to prove lemma B1, for each case we construct two inequalities: inequality (3) involving τi and STARTk(Pj:Pi,≥T), and inequality 4 involving τi and STARTi(Pj,≥T). These equalities are then related through τi to show what is required.
We first present the timeliness bounds that a non-faulty Pk uses to check the timeliness of
a message m it receives. They are presented in the same way that we did before for a non-faulty Pi, and are obtained simply by interchanging the suffixes k and i in table 1. Table B1 shows the timeliness bounds used by Pk.
Pi:Pj (E) d
Table B1: Pk’s timeliness bounds for a past m given that MCk first exceeded m.TS due to accepting m’.Case I. m’.O = Pk. From entry A3 of table B1, α = 4d(1 + ρk) and Pk must have formed
and sent m’ at real time STARTk(Pj:Pi,≥T) - 4d(1 + ρk). Let Pi receive m’ at τi, by lemma B0,
τi < STARTk(Pj:Pi,≥T) - 4d(1 + ρk) + δ=
From lemma B1, Pi accepts m’; so from entry B2 of table 1, we have
STARTi(Pj,≥T) ≤ τi + 2d(1 + ρi) STARTi(Pj,≥T) < STARTk(Pj:Pi,≥T) - 4d(1 + ρk) + δ + 2d(1 + ρi), and
STARTk(Pj:Pi,≥T) - STARTi(Pj,≥T) > 2d(1 + 2ρk - ρi) - δ
STARTk(Pj:Pi,≥T) - STARTi(Pj,≥T) > δ,
whenever d ≥ δ/(1 + 2ρk - ρi), which is always true since d ≥ δ/(1 - 5ρ) > δ/(1 + 2ρk - ρi).
Case II. m’.O = Pi. Note that the m’ that Pk accepts can be single or double-signed. Let Pi
form and send m’ (if m’ is single-signed) or the single-signed equiv(m’) (if m’ is double-signed) at real time τi. Whether m’ is single or double-signed, the entries B3 and E3 of table B1 indicate that α = 3d(1 + ρk). Since τi ≤ τk,
τi ≤STARTk(Pj:Pi,≥T) - 3d(1 + ρk) STARTi(Pj,≥T) ≤ τi + 2d(1 + ρi) STARTi(Pj,≥T) ≤ STARTk(Pj:Pi,≥T) - 3d(1 + ρk) + 2d(1 + ρi), and
STARTk(Pj:Pi,≥T) - STARTi(Pj,≥T) ≥ d(1 + 3ρk - 2ρi)
STARTk(Pj:Pi,≥T) - STARTi(Pj,≥T) ≥ δδ,
whenever d ≥ δ/(1 + 3ρk - 2ρi), which is true since d ≥=δ/(1 - 5ρ) ≥=δ/(1 + 3ρk - 2ρi).
Case III: m’.O = Pj. Say path(m’) = Pj. From entry C3 of table B1, α = 3d(1 + ρk). Let Pi
receive the diffused equiv(m’) from Pk at real time τi; from lemma B0,
τi <STARTk(Pj:Pi,≥T) - 3d(1 + ρk) + δ= =
(III.3a).
If Pi accepts equiv(m’) at τi, then from entry D2 of table 1, τi ≥ STARTi(Pj,≥T) - d(1 + ρi); otherwise, τi > STARTi(Pj:Pk,≥T) and from lemma B2, τi > STARTi(Pj,≥T). Since d > 0, in
STARTi(Pj,≥T) ≤ τi + d(1 + ρi) (III.4a). STARTi(Pj,≥T) < STARTk(Pj:Pi,≥T) - 3d(1 + ρk) + δ=+ d(1 + ρi), and
STARTk(Pj:Pi,≥T) - STARTi(Pj,≥T) > 2d(1 + 3ρk/2 - ρi/2) - δ,
STARTk(Pj:Pi,≥T) - STARTi(Pj,≥T) > δδ,
whenever d ≥ δ/(1 + 3ρk/2 - ρi/2), which is true since d ≥=δ/(1 - 5ρ) > δ/(1 + 3ρk/2 - ρi/2).
Say path(m’) = Pj:Pi. α = 2d(1 + ρk) from entry D3 of table B1. Let Pi diffuse m’ to Pk at
τi ≤STARTk(Pj:Pi,≥T) - 2d(1 + ρk) (III.3b). STARTi(Pj,≥T) ≤ τi + d(1 + ρi) (III.4b). STARTi(Pj,≥T) ≤ STARTk(Pj:Pi,≥T) - 2d(1 + ρk) + d(1 + ρi), and
STARTk(Pj:Pi,≥T) - STARTi(Pj,≥T) ≥ d(1 + 2ρk - ρi),
STARTk(Pj:Pi,≥T) - STARTi(Pj,≥T) ≥ δδ,
whenever d ≥ δ/(1 + 2ρk - ρi), which is true since d ≥=δ/(1 - 5ρ) > δ/(1 + 2ρk - ρi).
GRANT PROGRESS REPORT SUMMARY 01602: Longitudinal Study Investigating the Progression and Pathogenesis of Atypical Hyperadrenocorticism in Scottish Terriers Principal Investigator: Research Institution: Virginia-Maryland Regional College of Veterinary Medicine Grant Amount: Start Date: 1/1/2012 End Date: 12/31/2013 Progress Report: Mid-Year 2 Report Due: Receiv
How To Treat Digital DermatitisDigital Dermatitis (DD), also known as Hairy Foot Warts, Strawberry Foot Rot, Mortellaro’s Disease and Rasperry Heel, has become the most prevalent infectious hoof disorder on many Canadian dairy farms. Dr. Paul Greenough, in his new book ‘Bovine Laminitis and Lameness’ (see sidebar on next page) recom-mends the following treatment for animals infected wit