## Srds-dc

**A Timeout-Based Message Ordering Protocol for a Lightweight **
**Software Implementation of TMR Systems **
Paul D. Ezhilchelvan†, Francisco V. Brasileiro‡, and Neil A. Speirs†
‡Departamento de Sistemas e Computação
Universidade Federal da Paraíba - UFPb/Campus II
Av. Aprígio Veloso s/n, Campina Grande - Pb, 58108-970, Brazil

**Abstract **
Replicated processing with majority voting is a well known method for achieving reliability
and availability - two key attributes of dependability. Triple Modular Redundant (TMR)
processing is the most commonly used version of that method. Replicated processing requires
that the replicas reach agreement on the order in which input requests are to be processed.
Almost all synchronous and deterministic ordering protocols published in the literature are
time-based in the sense that they require replicas’ clocks to be kept synchronised within some
known bound. We present a protocol for TMR systems that is based on timeouts and does not
require the clocks be kept in bounded synchronism. The protocol design exploits features that
are so characteristic of a TMR system. Consequently, the worst-case input ordering delay
turns out to be the smallest possible; i.e., no symmetric protocol that works only with
unsynchronised clocks, can provide a smaller worst-case delay.

**Keywords and Phrases **
Byzantine failures, fault tolerance, Triple Modular Redundancy (TMR), process replication,
agreement, message ordering, physical and logical clocks.

**1. Introduction **
We consider the task of designing and implementing a system that continues to provide
services in the presence of a bounded number of processor failures. One of the problems is
that a faulty processor might fail in an arbitrary or Byzantine [Pease80] manner.

*N* modular
redundant (NMR) processing is one of the most effective ways to mask the effects of

*Byzantine* failures [Powell88]. Considering that non-faulty processors can generate
unforgeable signatures, the basic idea here is to use in place of a single processor a set of

*N*,

*N* ≥ 3, processors to mask the failures of at most (

*N* - 1)/2 processors. A Triple modular
redundant (TMR) system with

*N* = 3 is the most practical version of an NMR system. The
three processors of a TMR system communicate with each other only by message passing and
are assumed to fail independent of each other. These processors execute every given task in
parallel and the results produced by them are subject to a majority vote. If the voted result is
regarded as the output from the TMR system, the system functions correctly provided (i) at
least two of its constituent processors are non-faulty, (ii) all non-faulty ones produce identical
results after executing any given task, and (iii) the voting performed is correct.
Process replication within a TMR system assumes the well understood state machine
model, for which precise requirements for supporting replicated processing are known
[Sch90]. One of the requirements is that non-faulty processors of a TMR system process
messages that are input to the system, in an identical order so that their results are identical.
This means that non-faulty processors must

*agree* on the relative

*order* in which each input
message is to be processed. In this paper, we develop a message ordering protocol for the

*authenticated Byzantine* fault model – perhaps the most well-known sub-class of the
Byzantine fault model – in which a faulty processor cannot undetectably forge a non-faulty
processor’s signature; the protocol also assumes the following

*synchronous* environment: (i)
communication delays between non-faulty processors are bounded by a known constant; (ii)
processing and scheduling delays within a non-faulty processor can also be bounded; and, (iii)
each non-faulty processor has a local read-only physical clock whose running rate with
respect to the passage of real by a small and known bound.
Message ordering in the presence of failures requires ability to detect late and absent
messages. In the time-based approach [Lampo84], this capability is usually achieved by
synchronising non-faulty processors’ clocks within some known bound,

*e*. Since non-faulty
processors need not have clocks with identical running rates, they should periodically execute
a synchronisation protocol (e.g., [Vasant88, Veris97]) to adjust their clock readings by
appropriate amounts. This synchronisation involves (i) periodic exchange of messages,
consuming network bandwidth, (ii) using data abstractions [Dolev84] to adjust the readings
of a physical clock (a read-only object), and (iii) implementing amortisation techniques
[Schmu90]. An alternative to using synchronised clocks (and therefore to having to keep the
clock differences within a known bound) is to use timeouts and to employ the
(unsynchronised) physical clocks only for measuring timeouts. We will adopt this timeout-
based approach for designing the TMR message ordering protocol.
In

*asynchronous* systems, where the bounds on transmission, processing and scheduling
delays cannot be known with certainty, timeouts are commonly employed for dealing with
failures [Chandra96, Fetzer97] so that non-faulty processors can order messages identically in
a fault-tolerant manner. However, for synchronous systems, the time-based approach is
generally favoured. Practical systems, such as MARS [Kopetz89] and Air Traffic Control
System [Cristian96], have taken the time-based approach. [Echtle87] and AMp [Veris91] are
some of the early exceptions to this time-based trend in the synchronous context. AMp is a
timeout-based, fault-tolerant message ordering protocol, designed and implemented for
commercial applications. It provides the same message ordering guarantees as our protocol in
1Real time is measured in an assumed Newtonian time frame that cannot be directly observed.
a general

*n*-processor system but assumes a benign fault model where processors either crash
or occasionally omit to produce responses. Our assumption of authenticated Byzantine faults
is a weaker fault model and, as argued in [Pease80], any further weakening of this fault model
would make the desired form of message ordering impossible to achieve in a three-processor
One of the designers of AMp later analysed time vs. timeout-based approaches which led
to the following conclusion that has both theoretical and practical significance [Veris96]:
though the synchronous, timeout-based protocols cannot be perfect substitutes for their time-
based counterparts in

*all* circumstances, they can however provide attractive alternatives in a
number of application settings. This conclusion is based on two observations: timeout-based
protocols are less precise in preserving the temporal ordering of events (the order in which
events occur according to Newtonian time); however, the precision they offer is sufficient in
many settings where external events are being interpreted to the system by slow functioning
interface devices such as valves or by cyclic processes operating with large periodicity; for
example, a cyclic process that observes sensor changes every 10 ms, will interpret and report
all changes that occurred in the last 10 ms period to have occurred at the ‘same time’; this
means that, as far as the computing and ordering processes are concerned, two external events
occur either at the same time or at least 10 ms apart, and that any protocol with precision
smaller than 10 ms will be equally useful. Secondly, timeout-based protocols are faster under
certain ‘favourable’ conditions. In the case of our protocol, these favourable conditions
(derived in section 5) turn out to be: all processors are non-faulty; and, the maximum
difference (λ) within which non-faulty processors receive a given input from the environment
and the actual maximum transmission delay (

*da*) that currently holds within the system, are
such that λ=< 2(

*e- da*). λ is typically defined as

*tightness* [Veris00] and will be small when processors receive inputs (from the system environment) via a broadcast medium. We
therefore believe the protocol presented here enhances the tool-kit available for a TMR
system builder. Further, our implementation experience [Speirs93, Brasil95] suggests that
building a TMR system using our protocol is easier as clocks need not be synchronised.
The rest of the paper is organised as follows. Section 2 describes the structure of the TMR
system, the basic assumptions and the message ordering requirements. Section 3 develops and
presents the protocol. Section 4 describes the design efforts involved in keeping the ordering
delays as small as possible. Section 5 presents proofs of correctness and also proves that no
(symmetric) protocol that works only with unsynchronised clocks can guarantee smaller
worst-case ordering delays. Section 6 concludes the paper.

**2. System Description and Assumptions **
We will suppose that the TMR system is made up of processors named

*P*i,

*P*j and

*P*k.
These processors are uniquely ordered and the ordering is known to them. Each processor is
connected directly to the other two processors of the system by

*internal* links. Also, it is
connected to the ‘outside world’ or the system environment from which input messages are
received and output messages are sent to. Figure 1 shows a TMR system whose processors
are connected to the environment via a bus. The unit (shown as a black square in the figure)
that connects a processor to the bus is called the network attachment controller (NAC).

**Assumption 1**: Within a TMR system, at least 2 out of 3 processors are non-faulty and never

**Assumption 2**: The internal links that connect processors never fail. (Alternatively, a link

failure can be thought of as the failure of one of the two processors connected by that link.)
Processors are reliably connected to the system environment. We further assume that a NAC
does not allow the attached processor to use the bus continuously for a long time, and thereby
it contains the effects of the “babbling idiot” syndrome: the faulty processor may generate
messages randomly and try to transmit them all onto the bus, thus preventing other processors

**Assumption 3**: Each processor can sign the messages it sends, and authenticate the signed

messages it receives [Rivest78, Tsudik92] such that:
A non-faulty processor’s signature for a given message is unique and cannot
Any attempt to alter the contents of a non-faulty processor’s signed message is
detected by any other non-faulty processor.
In this paper we adopt the style of writing real time values in Greek letters and clock time
values in italicised lower case Roman letters; the term ‘clock’ will always refer to a

**Assumption 4**: If a non-faulty

*P*i prepares and transmits

*m* at real time τi

*,* every non-faulty

destination

*P*j will receive

*m* at real time τj

*,* τi ≤ τj < (τi + δ), where δ, δ > 0, is known. δ

bounds the message queuing delays at the sending and receiving processors, and the
propagation delay from the sender to receivers. A violation of this assumption between a pair
of processors means that one of the processors is faulty.

**Assumption 5**: A non-faulty

*P*i’s clock measures an interval

*x* in a real time interval

*x*(1 + ρi), where |ρi| ≤ ρ and ρ is a known positive constant.

**Assumption 6**: Processing and scheduling delays are bounded and known for a non-faulty

processor; more precisely, a non-faulty processor: i) performs a local computation (e.g.,
processing a received message) within a bounded and known amount of time; and, ii)
schedules a computational task within a bounded and known amount of time.

**Remarks**: The last three assumptions make the system context a synchronous one. In

asynchronous models, however, these assumptions can be violated infinitely often even
within a non-faulty processor and consequently the order protocol will terminate only when
such violations cease for a sufficiently long time (see [Chandra96, Cristian99]); an exception
is the timed asynchronous model of [Cristian99] which does not permit violation of
assumption 5 within a non-crashed processor. In our case, a violation of any of these three
assumptions within a processor would render the processor faulty; consequently, the order
protocol is always guaranteed to terminate in a known and bounded amount of time, provided
the number of faulty processors is restricted to one (assumption 1).

**2.1 Input Message Ordering **
Non-faulty processors can receive the input messages in different order. Referring to figure
1,

*P*i can receive µ1 followed by µ2 and

*P*k in the reverse order. So, when a processor
receives an input message from the system environment, it must first decide the processing
order for that message. For that, it forms an

*internal *message

*m* that contains the received
input message in the data field

*m*.µ and sends

*m* to all other processors in the system using the
order protocol that guarantees the following two conditions:

**Validity**: If a non-faulty

*P*i forms and sends

*m*, all non-faulty processors (including

*P*i) will

decide on an order for

*m*, within a known and bounded real time interval ∆;

**Unanimity**: If a non-faulty

*P*i decides on an order for a given

*m*, then every non-faulty

*P*j

These conditions ensure that an input µ supplied to a non-faulty

*P* gets identically ordered
in the form of

*P*’s internal message

*m*,

*m*.µ = µ, by all non-faulty processors of the TMR
system within ∆. It is possible that the faulty processor generates and orders an internal
message for a fictitious input or for an old input message that has already been processed by
non-faulty processors. So, to ensure that no duplicate or fictitious input gets processed, an
ordered internal

*m*,

*m*.µ = µ, can initiate the (ordered) processing of input µ only after the µ
contained in it, has been verified to be valid, i.e. not a fictitious or duplicate input. We refer
the reader to [Shri92] for details on how a non-faulty processor (i) derives an ordered stream
of inputs from an ordered stream of internal messages, and (ii) generates a voted output from
the results it computes. This paper will focus only on the (timeout-based) ordering of internal
messages. Note that since the TMR system can have at most one faulty processor, an input µ
must be supplied to at least two processors within the system; we will assume that every µ is
sent to all three processors in the TMR system.

**3. The protocol **
The protocol has three aspects to it: (i)

*message counters* maintained by processors, (ii)
message

*diffusion* that enables non-faulty processors to receive each other’s messages, and
(iii)

*timeliness checks *to assess the timeliness of a received message.

**3.1. Message Counter and Diffusion **
Each processor within the TMR system, say

*P*i, maintains a counter called the

*message *
*counter*, denoted as

*MC*i, which holds an integer value and is initialised to

*INIT_VAL *(usually 1) when the system is first started. For every given input µ received from the environment,

*P*i

* *
*forms* an internal

*m* in the following manner: the data field

*m*.µ is set to µ, the originator field

*m*.

*O* to

*i*; the timestamp field

*m*.

*TS* to

*Mc*i; further,

*MC*i is incremented by 1. Incrementing of

*MC*i soon after timestamping

*m* ensures that any message

*P*i later forms gets a timestamp
larger than

*m*.

*TS*. An internal message

*m* formed is accepted with its copy being entered into a message list called the

*accepted*i and is then sent to other processors using the

*send(m)*
primitive. An invocation of this primitive discards

*m* if

*m.S *contains two signatures, and
performs the following actions if

*m.S *contains no or one signature: it generates the signature of

*P*i for

*m*, appends the generated signature to any signature that may already be in

*m*.

*S* and transmits the resulting

*m* to processors in the system that have not signed

*m*.

*P*i receives
internal messages from other processors by executing

*receive*(

*m*) primitive which blocks until
it can return an authentic

*m* that is received via an internal link and has the authentic signatures of one or two distinct processors other than

*P*i.
Whenever

*P*i receives

*m*, it checks whether

*m* is timely. (Procedures for checking the
timeliness of a received message are described in the next sub-section.) If

*m* is found untimely, it is discarded; otherwise, the following three actions are performed: (i)

*MC*i is set to the

*maximum of* {

*MC*i,

*m*.

*TS *+ 1}, (ii)

*m* is

*accepted* by entering a copy of

*m* into

*accepted*i; and, (iii)

*m* is ‘diffused’ by executing the primitive

*send*(

*m*). Thus, any accepted
message that is received with one signature is ‘diffused’ to the processor that appears not to
have ‘seen’ the message. As there can be at most one faulty processor in the TMR system, diffusion of single-signed messages ensures that if

*m* of a faulty processor enters

*accepted*i of a non-faulty

*P*i, then the other non-faulty processor receives

*m* at least once.
From the description above, it is obvious that: (i) a message sent or received by a non-
faulty processor will have either one or two authentic signatures; and (ii) every sent message,
but no received message, carries the host processor’s signature. For a given message

*m*,

*path*(

*m*) is defined as the ordered sequence of processors that have signed

*m*. Thus, if

*m* is a double-signed message that is formed by

*P*j and diffused by

*P*k, then

*path*(

*m*) =

*P*j:

*P*k. The
first processor in

*path*(

*m*) is called the

*originator* of

*m* and the last processor the

*immediate *
*sender* of

*m*. Note that the

*originator* and the

*immediate sender* of

*m* are one and the same if

*m* is single-signed. Two paths are said to

*intersect* if they contain one or more processors
commonly in them.

**3.2. Timeliness Checks **
These checks enable a processor to determine the timeliness of a received

*m*. Before
presenting them, we will define a clock time interval

*d* such that by measuring

*d* in its local
clock a non-faulty processor is guaranteed to measure a real time interval of at least δ
duration, i.e.,

*d* ≥ δ/(1 - ρ);

*d* is known to, and identical for all non-faulty processors of the
system. We will assume, for simplicity, that a processor takes zero time to execute any
instruction of the protocol and the

*send*(

*m*) and

*receive*(

*m*) primitives. (Realising this
assumption in practice will require an increase in the value of

*d*, which is possible as the
protocol does not impose any upper bound on the value of

*d*.)
Fixing the processor

*P*i in the TMR system to be non-faulty, we will describe the checks
by which

*P*i determines the timeliness of a received message. Suppose that

*P*i receives a message

*m* at its local clock time

*t*i. There can arise one of three possible situations depending on the value of

*MC*i at

*t*i:

*MC*i <

*m*.

*TS* or

*MC*i =

*m*.

*TS* or

*MC*i >

*m*.

*TS*. If

*MC*i <

*m*.

*TS* or

*MC*i =

*m*.

*TS* then

*m* is a ‘future’ or a ‘present’ message respectively and is considered by

*P*i as timely; if, on the other hand,

*MC*i is already larger than

*m*.

*TS* when

*m* is being received, then

*m* is a ‘past’ message, and its timeliness should be judged based on how much time has elapsed since

*MC*i first became larger than

*m*.

*TS*. So, timeliness checks are needed only for messages received with past timestamps. Let us suppose that the

*m* received at

*t*i is a past
Let

*m*’, be the message whose acceptance by

*P*i caused

*MC*i to become larger than

*m.TS*
for the first time. That is, just before

*P*i accepted

*m*’,

*MC*i ≤

*m.TS*. Note that

*P*i could have
either received

*m*’ from another processor or itself formed and sent

*m*’; and also that

*m*’.

*TS *≥

*m.TS* because acceptance of

*m*’ has caused

*MC*i to exceed

*m.TS*. Let

*P*i accept

*m*’ at its clock time

*t*i

*’*,

*t*i

*’* <

*t*i. For

*m* to be considered timely, (

*t*i -

*t*i

*’*) must be less than a certain
bound, called the

*timeliness bound*, whose length depends on

*path*(

*m*’) and

*path*(

*m*). Table 1
summarises these timeliness bounds for various combinations of

*path*(

*m*’) and

*path*(

*m*). For example, if

*m*’ is a single signed message from

*P*k and

*m* is a single signed message from

*P*j, then the entry B2 (row B, column 2) indicates that (

*t*i -

*t*i

*’*) must be less than 2

*d* for

*P*i to
consider

*m* timely. We will denote an entry of table 1 by treating the table as a matrix: Table1[

*P*r,

*P*c] denotes the value in the row corresponding to the path

*P*r and in the column corresponding to the path

*P*c.

*P*k:

*P*j

* *(E)

* d *
Table 1:

*P*i’s timeliness bounds for a past

*m* given that

*MC*i first exceeded

*m*.

*TS *due to accepting

*m*’

*.*
**3.3. Protocol Description **
To perform the timeliness checks on received messages, non-faulty

*P*i maintains path
counters, denoted as

*PC*i[

*p*], for every path

*p* through which

*P*i can receive a message. Thus,

*P*i maintains four path counters, one for each

*p* ∈ {

*P*k,

*P*j,

*P*j:

*P*k,

*P*k:

*P*j}. These counters have
integer values which are initialised to (

*INIT_VAL *- 1) when the system is formed. A path counter

*PC*i[

*p*] is set to

*T*, to indicate that any

*m*,

*m*.

*TS* ≤

*T* and

*path*(

*m*) =

*p*, to be received
thereafter, must not be considered timely. A primitive

*update(path: p*,

*timestamp: T)* is used which, upon being invoked, sets

*PC*i[

*p*] =

*maximum of *{

*PC*i[

*p*],

*T*}

*. *The instruction
“

*schedule op at t*” schedules the operation

*op* at (local) clock time

*t,* if

*t *is a future time when the instruction is executed. Whenever

*P*i forms and sends

*m*’ or receives a timely

*m*’, executions of the

*update* primitive are scheduled to set

*PC*i[

*p*] =

*maximum of *{

*PC*i[

*p*],

*m*’.

*TS*}, for every

*p* ∈ {

*P*k,

*P*j,

*P*j:

*P*k,

*P*k:

*P*j} after the elapse of time intervals indicated in
Table1[

*path*(

*m*’),

*p*]. With the path counters being updated at appropriate timing instances, the timeliness of a received

*m* is checked simply by referring to

*PC*i[

*path*(

*m*)]:

*m* is timely only if

*m*.

*TS* >

*PC*i[

*path*(

*m*)] when

*m* was received.
We define

*PC*i,min =

*minimum of* {

*PC*i[

*p*], ∀

*p* |

*p* ∈ {

*P*k,

*P*j,

*P*j:

*P*k,

*P*k:

*P*j}}. Any

*m* that
is received after

*PC*i,min has become larger than or equal to

*m*.

*TS* is untimely irrespective of

*path*(

*m*) and cannot enter

*accepted*i. So, after

*PC*i,min ≥

*T*, every

*m*,

*m*.

*TS* ≤

*T*, that is already in

*accepted*i is said to have become ‘stable’ and can be safely ordered. Stable messages with a given timestamp

*T*, are grouped together into a list called

*stable*i(

*T*)

* *for ordering. The entries of

*stable*i(

*T*) are first stripped of the signatures they have, and then duplicate entries are
Two messages

*m* and

*m*’ are said to be

*spurious* if

*m*.

*O* =

*m*’.

*O*,

*m*.

*TS* =

*m*’.

*TS* and

*m*.µ ≠

*m*’.µ. Spurious messages are generated when the faulty processor gives the same timestamp to distinct messages

*m* and

*m*’ it forms. Spurious messages in

*stable*i(

*T*) are discarded. The

*stable*i(

*T*) will now contain at most one message originating from a given
processor. If it contains more than one message, they are ordered according to their source processors. Messages of

*stable*i(

*T*) are ordered only after the contents of all

*stable*i(

*T*’),

*T*’

* < T,* have been ordered. This is ensured by using the

*stability counter* *SC* which is initialised to (

*INIT_VAL* - 1). The protocol for

*P*i is presented in figure 2.

**do **/* Broadcast process: handles external messages */

get µ from outside the system;

m.µ = µ; m.TS = MCi; m.O = i;

MCi = MCi + 1;

t = clocki;

**for **every p ∈ {Pk, Pj, Pj:Pk, Pk:Pj} do

schedule update(PCi[p], m.TS) at t + Table1[path(m), p];

**end **for

send(m);

acceptedi = acceptedi ∪ {m};

**end **do /* Broadcast process */

||

**do **/* Diffuse process: handles internal messages */

receive(m);

**if **(m.TS ≤ PCi[path(m)])

**then **discard(m)

**else**
MCi = maximum of {MCi, m.TS + 1};

t = clocki;

**for **every p ∈ {Pk, Pj, Pj:Pk, Pk:Pj} do

schedule update(PCi[p], m.TS) at t + Table1[path(m), p];

**end **for

send(m);

acceptedi = acceptedi ∪ {m};

**end **do /* Diffuse process */

||

**do **/* Order process: identifies and orders stable messages */

PCi,min = minimum of {PCi[p], ∀=p | p ∈={Pk, Pj, Pj:Pk, Pk:Pj}};

**while **SCi ≤ PCi,min do

SCi = SCi + 1;

stablei = {m | m ∈ acceptedi and m.TS = SCi};

acceptedi = acceptedi - stablei; /* remove stable messages */

**if **stablei ≠ ∅

**then**
distinct = ∅;

**for **every m in stablei do

m.S = ∅; /* strip off signatures from m */if m ∉ distinct then insert m in distinct end if

**end **for

stablei = distinct;

spurious = {m, m’|m=∈=stablei ∧=m’∈=stablei ∧=m’.O=m.O ∧==m’µ≠m.µ};

stablei = stablei - spurious; /* remove spurious messages */

**if **stablei≠∅

**then **∀=m|m=∈=stablei deliver m ordered by m.O

**end **if

**end **while

**end **do /* Order process */

Figure 2. The Timeout-based Protocol for Message Ordering.

**4. Estimating Timeliness Bounds **
Central to protocol correctness is the accurate estimation of safe timeliness bounds which
also need to be small if the protocol is to order messages fast. This section presents the
arguments for, and the rationale behind the derivation of the smallest safe bounds. It is done
in three parts. First, the basic condition that should be met for the protocol to be correct is
stated. We then analyse how meeting this condition is made difficult by various possible
failure modes of the faulty processor; this analysis leads to the identification of the

*safety *
*requirement*. The third part shows in two stages how the timeliness bounds used in the
protocol are derived to be safe. First, the bounds are derived in the most straightforward
manner; next, the contexts where bounds can be safely reduced are identified and the bounds
reduced. Note that the smaller the timeliness bounds used, the earlier the accepted messages
are stabilised and ordered, resulting in smaller ∆ (the upper bound on message ordering

**4.1. The Basic Condition **
The following condition must be met for the protocol to be correct:

**Unanimous acceptance**:

*m* enters the

*accepted* list of a non-faulty processor if and only if

*m*
or a message equivalent to

*m* enters

* *the

*accepted* list of every other non-faulty processor in
A message equivalent to

*m* is denoted as

*equiv*(

*m*) and is defined as any other message that
differs from

*m* only in the signature field; that is,

*equiv*(

*m*).µ =

*m*.µ,

*equiv*(

*m*).

*O* =

*m*.

*O*,

*equiv*(

*m*).

*TS* =

*m*.

*TS* and

*equiv*(

*m*).

*S* ≠

*m*.

*S*. Since a non-faulty processor does not accept an
incoming message containing its own signature, it accepts at most one

*equiv*(

*m*) after having
accepted

*m*. Note that

*equiv*(

*m*) and

*m* become identical once their signatures are stripped off
(i.e., once the signature fields are set to Φ).
It can be noted from the protocol description that there exists a local time when a non-
faulty processor stops accepting messages of certain timestamp value. When that time comes
for a given timestamp value

*T*, an already accepted message with timestamp

*T *becomes stable. If the basic condition holds, then any two non-faulty processors, say,

*P*i and

*P*k,
construct an identical

*stable*(

*T*) of signature-stripped and non-spurious messages for every
given

*T*,

*T *≥

*INIT_VAL*. Given that processor ordering is unique and known to non-faulty
processors, the order determined by them for a (signature-stripped)

*m* will be identical.

**4.2. Effects of Failures **
A faulty processor, say

*P*j, can fail in the following ways to prevent the above basic
F1 (

*impersonating a non-faulty processor*):

*P*j generates a signed message on behalf of a
non-faulty processor and attempts to deceive the other non-faulty processor into accepting the forged message. For example,

*P*j fabricates a message with

*P*k’s (forged) signature and diffuses it to

*P*i, with

*path*(

*m*) =

*P*k:

*P*j. If

*P*i were to accept

*m*, it will not diffuse

*m* to

*P*k as
the accepted message is already double-signed. However, by assumption A3, such attempts by faulty

*P*j are detectable and hence failures of this type are not a threat to meeting the basic
F2 (

*delayed sending of own messages*):

*P*j arbitrarily delays the sending of a message

*m* it
generates. A failure of this type can lead to one of the following outcomes: (i) both the non-
faulty processors find

*m* untimely and hence both discard

*m*; (ii) both find

*m* timely and enter a copy into their respective

*accepted* list; or (iii) one non-faulty processor, say

*P*i, finds it
timely and the other does not. Note that the first two outcomes do not pose any problem since
both the non-faulty processors are unanimous in accepting or not accepting

*m*. The last
outcome lacks this unanimity and is depicted in figure 3(a). The situation of

*m* being entered only in

*accepted*i and not in

*accepted*k, has to be dealt with if identical ordering is to be
Figure 3. (a)

*P*i accepts

*m* but not

*P*k. (b) diffusion failure.
F3 (

*two-facing while sending own messages*):

*P*j sends a properly signed

*m* to say,

*P*i, and
sends to

*P*k either (a) an inauthentic version of

*m*, or (b) nothing, or (c) a different, authentic message

*m*’ (that is never sent to

*P*i). The effect of a two-facing failure is the same as the last
outcome of the previous category: a 1-signed message enters the

*accepted* list of one non-
faulty processor and neither that message nor its equivalent ever enters the

*accepted* list of the
F4 (

*failure during diffusion*):

*P*j fails while diffusing a message

*m* which it received from,
say,

*P*i as shown in figure 3(b). It can fail by altering the contents of

*m* (call this failure type
F4.1) or by delaying the diffusion by an arbitrary amount of time (call this type F4.2). Assumption 3(b) reduces the impact of F4.1 by enabling

*P*k to detect

*P*j’s tampered message as inauthentic and discard the diffused message. F4.2 can result in

*P*k not receiving the
diffused message at all, or receiving it but finding it untimely. Thus, both F4.1 and F4.2 can prevent

*equiv*(

*m*) from entering

*accepted*k.
To deal with the problematic outcomes caused by failures of F2, F3, and F4 types, the
timeliness checks are required to be

*safe*:

**safety requirement**: a non-faulty processor finds a received

*m* timely if the immediate sender

Let us suppose that the timeliness checks meet the safety requirement. Consider the third
outcome caused by an F2 type failure (see figure 3(a)).

*P*i will diffuse

*m* after it has entered it into

*accepted*i. By the safety property of timeliness checks,

*P*k is guaranteed to find the

*P*i’s diffused message timely; thus,

*equiv*(

*m*) is guaranteed to enter

*accepted*k. This argument also
shows that an F3 type failure is not a threat when timeliness checks are safe. Let us next consider figure 3(b), which depicts an F4 type failure. Since

*P*i, the originator as well as immediate sender of

*m*, is non-faulty, the safety property guarantees that

*P*k finds

*m* timely and that

*m* enters

*accepted*k as well. Thus, if the timeliness checks are safe, the above-
mentioned basic condition is met and therefore identical ordering is achieved despite the
arbitrary failures of the faulty processor.
It can be observed from the protocol description that the value of a non-faulty processor’s

*MC* (message counter) is always one more than the largest timestamp of messages accepted
so far; also, that the value of any of its

*PC* (path counter) can never exceed the largest
timestamp of the accepted messages. So, a non-faulty processor’s

*MC* will be at least one
more than any of its

*PC*. This means that when a received

*m* is a future or present message, it
is guaranteed to be found timely. That is, the safety requirement is trivially met if the received
message is a present or future one. So, in what follows, we will show that the timeliness bounds used are safe when a non-faulty

*P*i finds a received

*m* to be a past message.

**4.3. Derivation of Timeliness Bounds **
We will present our derivations in the context in which the bounds of Table 1 are
presented:

*P*i is non-faulty and accepts

*m*’ at its clock time

*t*i

*’* and receives

*m*,

*m*.

*TS* ≤

*m*’.

*TS*, at

*t*i,

*t*i >

*t*i’; just before

*t*i’,

*MC*i ≤

*m.TS*, and at

*t*i’

*MC*i >

*m*’

*.TS.* To keep the description
simple, we assume the following. (i) The minimum message transmission delay (as measured
by a non-faulty clock) is zero. (ii) The clocks of all non-faulty processors have identical
running rates. (This assumption is removed in subsection 4.3.3.) (iii) Unless stated explicitly, time is measured according to the clock of

*P*i. So, when we say an event happened at (time)

*t*, it means that the event happened at time

*t* according to

*P*i’s clock. (iv) Finally, the subscript i is dropped from

*t*i and

*t*i’, where the context is obvious.

**Lemma 4.1**. Given that

*P*i

* *accepts

* m*’ at

*t*’, any non-faulty

* P*k

* *will have

* MC*k

* > m’.TS before *
*Proof*: If

*P*k has signed

*m*’, then

*MC*k >

*m*’.

*TS* when it signed

*m*’. So,

*MC*k >

*m*’.

*TS* becomes true before

*t*’ or at

*t*’ if

*m*’ has experienced zero transmission delay in reaching

*P*i from

*P*k. If

*P*k has not signed

*m*’, it will receive the diffused

*m*’ from

*P*i before

*t*’ +

*d*. Upon the reception of

*m*’, if

*MC*k is not already greater than

*m*’.

*TS* then

*m*’ is either a present or a future message, and therefore must be accepted by

*P*k which will then set

*MC*k >

*m*’.

*TS*.
Say, non-faulty

*P*k forms and sends

*m, m*.

*TS* ≤

*m*’.

*TS*. By lemma 4.1,

*MC*k becomes larger than

*m*’.

*TS* before

*t*’ +

*d*. So,

*m* should have been sent by

*P*k before

*t*’ +

*d*. Even if

*m *experiences the maximum delay of just less than

*d* time,

*P*i should receive

*P*k’s single-signed

*m* before

*t*’ + 2

*d*. It suggests that if

*P*i receives a single-signed message

*m* at

*t* such that
(

*t* -

*t*’) < 2

*d*, it must consider

*m* timely. This scenario is depicted in figure 4(a) where time
progresses from left to right and the labels “>

*m*’.

*TS*” and “=

*m.TS*+1” indicate the earliest
instances when a non-faulty message counter exceeds

*m*’.

*TS *and when it becomes equal to

*m*.

*TS* + 1, respectively; “<

*nd*” labels an interval of length less than

*nd*, for some integer

*n* ≥ 1. On the time-line of

*P*k (in figure 4(a)), the timing instance labelled “=

*m.TS*+1” cannot be on the right hand side of that labelled “>

*m*’.

*TS*” since

*m*.

*TS* ≤

*m*’.

*TS*. So,

*P*i can receive a timely

*m* from

*P*k at

*t*, (

*t* -

*t*’) < 2

*d*.
Figure 4.

*m*.

*TS* ≤

*m*’.

*TS*. (a) Non-faulty

*P*k forms and sends

*m*. (b)

*P*k diffuses

*P*j’s

*m*.
Suppose that

*m* originates from a faulty

*P*j and that

*P*k receives it within 2

*d* time after

*MC*k >

*m*’.

*TS* has become true. (See figure 4(b).) Just like in figure 4(a) where

*P*i accepts

*P*k’s

*m* because

*m* arrives within 2

*d* time after

*MC*i >

*m*’.

*TS*,

*P*k must now consider

*P*j’s

*m*
timely. (Note that a non-faulty processor cannot know whether another processor is non-faulty or faulty.) When

*P*k diffuses

*m*,

*P*i must find the diffused message timely for the safety property to be met. From figure 4(b),

*P*i can receive the diffused

*m* at

*t*,

*t* <

*t*’ + 4

*d*. So,

*P*i’s
timeliness check for accepting a double signed

*m* is (

*t* -

*t*’) < 4

*d*.

**4.3.1. Motivation for reducing the bounds **
The timeliness bounds of 2

*d* for 1-signed messages and 4

*d* for 2-signed messages are safe
for the protocol. Reducing these bounds further (without compromising safety) can lead to smaller ∆. To illustrate this, let us evaluate ∆ for the following scenario:

*P*i receives and accepts 1-signed

*m*’ from

*P*k at time

*t*’. Suppose that there is no

*m, m*.

*TS* ≤

*m*’.

*TS*, left in the system for ordering. Non-faulty

*P*i deduces that it will never accept a 1-signed

*m* and a 2-signed

*m* at

*t*’ + 2

*d* and

*t*’ + 4

*d* respectively. So, if the 1-signed

*m*’ from

*P*k had taken maximum transmission time to reach

*P*i, the delay (in real-time) for

*P*i to order

*m*’ would be δ=+ 4

*d* with ρ having been assumed to be zero; i.e., ∆ ≥ =δ +4

*d.* Suppose that we are able to show that it is enough for

*P*i to wait until

*t*’ + 3

*d* to deduce that no 1- signed or 2-signed

*m, *
*m*.

*TS* ≤

*m*’.

*TS,* will ever be accepted; this smaller waiting period reduces the ordering delay
for

*m*’ to δ=+ 3

*d*. Suppose also that we are able to show that the ordering delay for all types of

*m*’ that

*P*i can possibly receive and accept, can be reduced as well, then ∆ itself can be
reduced, resulting in a faster ordering protocol. Below, we will explain the intuition behind
our attempts at reducing timeliness bounds.
Given that

*P*i has received and accepted

*m*’ at

*t*’, let the originator of

*m*’ be

*P*k and

*m*’ be
1-signed or 2-signed.

*P*i can now expect any of

*P*k’s 1-signed

*m, m*.

*TS* <

*m*’.

*TS*, to be received before

*t*’+

*d*, if

*P*k is non-fa

*P*k sends the messages it forms, in the increasing order of message timestamps; another non-faulty

*P*i must receive the earlier
one within at most

*d* after it has received the later one. So, the conclusion is: given that

*m*’ is
accepted at

*t*’, the timeliness bound for a 1-signed

*m *such that

*m*.

*TS* ≤

*m*’.

*TS* and the

*originator* of

*m*’ ≡

*originator *of

*m*, is

*d*. Note that this bound is

*d* smaller than the generic
bound we have earlier derived for a 1-signed

*m*.
It can be observed that we have achieved the above reduction (i) by considering the
slowest possible behaviour of a message

*m* from another non-faulty processor, with a hypothesis that

*P*i has already accepted

*m*’,

* m’*.

*TS* ≥

*m*.

*TS*; and, (ii) for the case where

*path(m*’

*)* intersects

*path(m)*. It turns out that whenever (ii) holds, reduction is possible.
Intuitively, the timing behaviour of a potential

*m* from another non-faulty processor can be
reasoned out more precisely, if the potential

*m* and the accepted

*m*’ (with

* m’*.

*TS* ≥

*m*.

*TS*) had
been handled by the same processor. Because there are only three processors in the TMR
system, it is guaranteed that

*path(m*’

*)* intersects

*path(m)* whenever

*m* is double signed and

*m*’ is not formed by

*P*i. This means that when the accepted

*m*’ is not locally formed,

*P*i needs to
wait only for a smaller period (smaller by at least

*d*) in order to be sure that no 2-signed

*m* will any longer be accepted; so, even if

*P*i needs to wait until

*t*’+ 2

*d* for 1-signed messages, it
can start ordering

*m*’ by

*t*’+ 3

*d*.
There are fourteen combinations of

*path(m*’

*)* and

*path(m)* for which reduction is feasible.
They correspond to those entries in table 1 where the bounds are neither 2

*d* nor 4

*d*, and to the
entries D3 and E4 which have 2

*d* for a double-signed

*m*. In the next sub-section, we explain
the rationale behind the reduction by considering three representative path combinations; for
each, we also indicate the dual cases for which the explanation can be readily applied - thus
covering eight path combinations in total. For space reasons, we omit the explanations for the
remaining six combinations which are not dissimilar to the combinations considered. Note
that this omission does not leave the paper incomplete, as Section 5 proves that the protocol
is correct, i.e., all reductions applied are correct. (A reader who is not interested in the
reduction rationale but only in the overall correctness, can proceed straight to section 5.)

**4.3.2. Rationale for reducing the bounds **
We will make a default assumption that the immediate sender of

*m* (taken mostly as

*P*k) is
non-faulty and name a particular path combination after its corresponding entry in table 1.
Case B1:

*path(m*’

*)* =

*path(m)* =

*P*k. Since both

*m* and

*m*’ originate from non-faulty

*P*k,

*m*.

*TS*
and

*m*’.

*TS *cannot be equal. By hypothesis,

*m*.

*TS* cannot be larger than

*m*’.

*TS*; so, only

*m*.

*TS* <

*m*’.

*TS *is possible. So,

*P*k must have sent

*m* first and then

*m*’. (See figure 5(a).) Since
2Note that the safety requirement is void if the immediate sender of

*m* is faulty.

message transmission can take as small as zero time and as large as just less than

*d* time, (

*t* -

*t*’) <

*d*. Similar argument applies also for the case D1, where

*path(m*’

*)* =

*P*j:

*P*k and

*path(m)* =

*P*k. Here, the non-faulty

*P*k does not form

*m*’ but receives it from

*P*j and diffuses it after setting its

*MC*j >

*m*’.

*TS*. So, it must have sent

*m* before diffusing

*m*’ and therefore (

*t* -

*t*’) <

*d*. Replacing

*P*k by

*P*j as the immediate sender of

*m* (and therefore considering

*P*j as non-faulty) in these two cases leads to the bound of

*d* for cases C2 (

*path(m*’

*)* =

*path(m)* =

*P*j) and E2 (

*path*(

*m*’) =

*P*k:

*P*j and

*path*(

*m*) =

*P*j).
Case E1:

*path*(

*m*’) =

*P*k:

*P*j and

*path(m)* =

*P*k. Both

*m* and

*m*’ originate from non-faulty

*P*k which must have sent

*m* first and then

*m*’ since

*m*.

*TS* <

*m*’.

*TS*. But

*P*i receives

*m*’ first and then

*m*. This can happen, as illustrated in figure 5(b), when

*P*k sends

*m* and

*m*’ close to each other and the transmission delay of

*m*’ from

*P*k to

*P*j and then from

*P*j to

*P*i is less than the transmission delay of

*m* from

*P*k to

*P*i. Since message delays between non-faulty processors are bounded by

*d,* *P*i must receive

*m* before

*t*’ +

*d*. Hence the timeliness bound for this case is

*d*. It also applies for the case D2 which is the dual of E1 with

*P*j being the non-faulty
Case B4:

*path*(

*m*’) =

*P*k and

*path(m)* =

*P*k:

*P*j. Observe that

*P*j here is the non-faulty immediate sender of

*m* and

*P*k can be faulty or non-faulty. Since

*P*i accepts

*m*’ (by hypothesis), it will diffuse the message to

*P*j.

*P*j also diffuses

*m* to

*P*i. So, there are two cases to consider:

*P*j receives the single-signed

*m* from

*P*k either (i) before or (ii) after it receives the diffused

*m*’ from

*P*i. The subcase (i) is shown in figure 6(a).

*P*j can receive the diffused

*m*’ from

*P*i at any time before

*t*’ +

*d*. Even if it diffuses

*m* just before receiving

*m*’,

*P*i will
receive the diffused

*m* before

*t*’ +

*d* +

*d*; that is, (

*t* -

*t*’) < 2

*d*.
Figure 6(b) illustrates the second subcase where

*P*j receives the single-signed

*m* from

*P*k after it has received the double-signed

*m*’ from

*P*i. Since

*P*j diffuses

*m* to

*P*i, it must have found
the single-signed

*m* timely. Suppose that it also finds the double-signed

*m*’ timely. This means that

*MC*j >

*m*’.

*TS* becomes true by the time

*P*j accepts the double-signed

*m*’. When it
receives the single-signed

*m* after having accepted the double-signed

*m*’, it is in the same

situation as

*P*i in case E1 where

*path*(

*m*’) =

*P*k:

*P*j,

*path(m)* =

*P*k and the timeliness bound is

*d*. So, for the non-faulty

*P*j to find the single-signed

*m* timely, it must have received

*m* within

*d* time after

*MC*j >

*m*’.

*TS* became true; that is, the time elapsed between receiving

*m*’ and

*m*
must be less than

*d*. From figure 6(b), (

*t* -

*t*’) < 3

*d* is possible for this subcase. We now have to show that

*P*j does find the double-signed

*m*’ timely, which is done as lemma A1 in
appendix A. The proof is based on the observation that the double-signed messages have
larger timeliness bounds than the single-signed ones; so, it is not possible for a non-faulty
processor to reject a double-signed message as untimely and then accept later a single-signed
message of smaller or equal timestamp as timely.
Choosing the largest of the bounds estimated for the two subcases, 3

*d* becomes the timeliness bound for B4; this bound also applies for C3 where

*P*k is substituted for

*P*j as the non-faulty
Figure 6. Case B4. (a)

*P*j receives

*m* before

*m*’. (b)

*P*j receives

*m* after

*m*’.

**4.3.3. Adjusting for non-identical clock rates **
We now focus on our earlier assumption that non-faulty processors’ clocks have identical
running rates. When the readings of non-faulty clocks can drift apart with the passing of real
time, the differences accumulated over a period of time must be compensated. The way we do
this compensation is by increasing the value of

*d* by the maximum difference which the non-
faulty clocks could develop during the ‘life-time’ of a non-faulty processor’s message within
the ordering protocol, i.e., during the maximum real-time within which a non-faulty
processor’s message gets ordered by every non-faulty processor in the system. Appendix B
presents proofs to show that δ/(1 - 5ρ) is the smallest and safe lower bound required on

*d* to
compensate for the non-identical running rates of non-faulty clocks. Below, we present the
‘worst-case’ scenario which gives rise to this bound on

*d*.
Suppose that non-faulty

*P*k’s clock is running fast with ρk = -ρ, and non-faulty

*P*i’s clock
is running slow with ρi = ρ. Let

*P*i form and send

*m*’ at local clock time

*t*i’, and let

*m*’ be
transmitted in zero time; let

*P*k accept

*m*’ at its clock time

*t*k’. Since

*m*’ is assumed to be transmitted in zero time, the clocks of

*P*i and

*P*k read

*t*i’ and

*t*k’ respectively at the same real
Let faulty

*P*j behave in the most disruptive manner in sending its

*m*,

* m*.

*TS *≤

*m*’

*.TS*: it
forms

*m*, delays its sending for a while, and sends only to

*P*i - the processor with the slow clock. Let

*P*i receive the single-signed

*m* from

*P*j at the latest time for accepting such

*m*: just before its clock time

*t*i’ + 2

*d*. So,

*P*i accepts

*m *and diffuses double-signed

*m* to

*P*k. For

*P*k to accept the 2-signed

*m* from

*P*i, its clock, despite running fast, should not be reading past

*t*k’ + 3

*d* (see entry B3 of table 1, interchanging the suffixes k and i) when it receives that 2-signed

*m*, even if the transmission from

*P*i to

*P*k is to take just less than δ=- the maximum permitted time (see assumption 4). So,

*d* must be large enough to ensure that τ + 3

*d*(1 + ρk) ≥ τ + 2

*d*(1 + ρi) + δ, i.e., 3

*d*(1 - ρ) ≥ 2

*d*(1 + ρ) + δ; that is

*d* ≥ δ/(1 - 5ρ).

**5. Protocol Correctness and Relative Performance **
**Lemma 1: **The protocol guarantees the validity condition with ∆== 4

*d*(1 + ρ), provided

*d* ≥ δ/(1 - 5ρ).

*Proof*: Consider an execution of the protocol in which

*P*i and

*P*k are non-faulty. Let

*P*i form

and send a message

*m* at real-time τi.

*P*k receives

*m* before τi + δ (by assumption 4). Lemmas

in the appendix B prove that the timeliness bounds of table 1 are safe when

*d* ≥ δ/(1 - 5ρ). So,

*P*k will find the received

*m* timely and accept it. Note that when a non-faulty processor
accepts a message

*m*, none of its path counters stays below

*m.TS* after some finite time. So, the message

*m* accepted by

*P*i and

*P*k, will be taken up for ordering and be ordered because a faulty processor cannot form and send another message that could make

*P*i and

*P*k consider

*m*
as spurious (due to assumption 3). This shows that the validity condition is met within a finite time after τi.
The

*schedule* instructions in the

*Broadcast* process indicate that

*PC*i,min reaches

*m*.

*TS* within 4

*d* clock time after τi. So,

*P*i orders

*m* no later than τi + 4

*d*(1 + ρi). The

*schedule* instructions in the

*Diffuse* process indicate that

*PC*k,min reaches

*m*.

*TS* within 3

*d* clock time after

*P*k receives

*m* which is before τi + δ.

* *So,

*P*k orders

*m* no later than τi + δ=+ 3

*d*(1 + ρk). Thus the message

*m* sent by

*P*i gets ordered by

*P*i and

*P*k, no later than τi +

*maximum of *{4

*d*(1 + ρi), δ + 3

*d*(1 + ρk)}. By assumption 5, |ρi| ≤ ρ and |ρk| ≤ ρ. So, ∆==

*maximum of* {4

*d*(1 + ρ), δ + 3

*d*(1 + ρ)} = 4

*d*(1 + ρ).
Let

*Stable*i(

*T*), for some

*T*,

*T* ≥

*INIT_VAL*, denote the set

*stable*i which non-faulty

*P*i first

constructs during an execution of the protocol when

*SC*i =

*T*. The code for the Order process

indicates that (a1)

*P*i strips off signatures of every

*m* in

*Stable*i(

*T*) and removes duplicates;

and then, (a2) it removes spurious messages from

*Stable*i(

*T*). Let

*SigFree*i(

*T*) and

* SpuFree*i(

*T*)

denote the resulting

*Stable*i(

*T*) after a1 and a2 are carried out, respectively:

**Definition 1**:

*SigFree*i(

*T*) = {

*m* |

*m* ∈=

*Stable*i(

*T*)=∧=

*m.S* = ∅}.

**Definition 2**:

*SpuFree*i(

*T*) = {

*m *|

*m* ∈=

*SigFree*i(

*T*)=∧=(¬∃

*m*’ ∈=

*SigFree*i(

*T*):=

*m*.O =

*m*’.O)}.

**Lemma 2: **Consider an execution of the protocol in which

*P*i and

*P*k are non-faulty. Suppose

that:

*m* ∈=

*Stable*i(

*T*)

*equiv*(

*m*) ∈=

*Stable*k(

*T*), and

*m* ∈=

*Stable*k(

*T*)

*equiv*(

*m*) ∈=

*Stable*i(

*T*).
Then,

*SpuFree*i(

*T*) =

*SpuFree*k(

*T*).

** **

*Proof*: In reducing

*Stable*i(

*T*) to

*SigFree*i(

*T*),

*P*i empties the

*m*.

*S* field of every

*m* in

*Stable*i(

*T*)

and then discards duplicates.

*m* and

*equiv*(

*m*) are distinguished only by their signature fields,
and setting this field to empty will make them identical. (See the definition of

*equiv*(

*m*) in section 4.1.) If

*Stable*i(

*T*) contains

*m* and

*equiv*(

*m*), the duplicate removal ensures that only one of the identical copies is retained in

*SigFree*i(

*T*). So, by the hypothesis of the lemma,

*SigFree*i(

*T*) =

*SigFree*k(

*T*). Contrary to lemma, assume that

*SpuFree*i(

*T*) ≠

*SpuFree*k(

*T*); also assume that, without loss of generality, there is an

*m* such that

*m* ∈=

*SpuFree*i(

*T*) and

*m* ∉=

*SpuFree*k(

*T*). By definition 2,

*SpuFree*i(

*T*) ⊆

*SigFree*i(

*T*). We have established that

*SigFree*i(

*T*) =

*SigFree*k(

*T*). So,

*m* ∈=

*SigFree*k(

*T*). But

*m* ∉=

*SpuFree*k(

*T*); so, by definition 2, there must have been an

*m*’ in

*SigFree*k(

*T*) such that

*m*.O =

*m*’.O. Since

*SigFree*i(

*T*) =

*SigFree*k(

*T*),

*m*’ ∈=

*SigFree*i(

*T*). By definition 2,

*m* cannot be in

*SpuFree*i(

*T*).

**Lemma 3: **The protocol guarantees the unanimity condition, provided

*d* ≥ δ/(1 - 5ρ).

*Proof*: Consider an execution of the protocol in which

*P*i and

*P*k are non-faulty. Say,

*P*i

accepts some

*m*’. If

*m*’ is signed by

*P*k, then

*equiv*(

*m*’) is already accepted by

*P*k. If

*m*’ is not

signed by

*P*k, then

*P*i will sign and diffuse

*m*’ to

*P*k. Lemmas in the appendix B prove that

the timeliness bounds of table 1 are safe when

*d* ≥ δ/(1 - 5ρ). So,

*P*k will accept the diffused

message due to the safety property of the timeliness bounds. Thus, for every

*m*’ accepted by

*P*i,

*P*k accepts

*equiv*(

*m*’). Observe (from figure 2) that a non-faulty processor accepts no

*m*,

*m*.

*TS* ≤

*T*, once its stability counter

* *(

*SC*) becomes equal to

*T*,

*T* ≥

*INIT_VAL*. So, for every

*m*’ accepted by

*P*i,

*P*k should accept

*equiv*(

*m*’) before

*SC*k becomes equal to

*m*’.

*TS*. Therefore, for any

*T* ≥

*INIT_VAL*, when

*P*i and

*P*k form

*Stable*i(

*T*) and

*Stable*k(

*T*) respectively:

*m* ∈=

*Stable*i(

*T*)

*equiv*(

*m*) ∈=

*Stable*k(

*T*).
By similar arguments, we can show:

*m*” ∈=

*Stable*k(

*T*)

* equiv*(

*m*”) ∈=

*Stable*i(

*T*). By lemma 2,

*SpuFree*i(

*T*) =

*SpuFree*k(

*T*) =

*SpuFree*(

*T*) (say) for every

*T* ≥

*INIT_VAL*. By definition 2,

*SpuFree*(

*T*) will contain at most one message originating from a given processor. Since processor ordering is unique and known,

*P*i and

*P*k will order the entries of

*SpuFree*(

*T*)
identically. Further, entries of

*SpuFree*(

*T*) are ordered only after the entries of all

*SpuFree*(

*T*’),

*T*’ <

*T*, have been ordered. So,

*P*i orders

*m*’ before

*m* if and only if

*P*k orders

*m*’
before

*m*. Thus the protocol satisfies the unanimity condition.

**Theorem 1: **The protocol guarantees unanimity and validity conditions with ∆== 4

*d*(1 + ρ),

provided

*d* ≥ δ/(1 - 5ρ).

*Proof*: Follows from lemmas 1 and 3.

**Ordering Delays with Synchronised Clocks **
Let non-faulty processors’ clocks be synchronised within

*e*. According to the classical
time-based ordering protocol of [Cristian85], when a non-faulty

*P*i forms and broadcasts a
message

*m*, it sets

*m*.

*TS* to the current reading of its synchronised clock; any non-faulty
processor can stabilise an accepted

*m* only at its clock time

*m*.

*TS* + 2(

*d* +

*e*). Therefore, a
message

*m* whose broadcast has been initiated at real time τ is ordered by its non-faulty originator at τ + 2(

*d* +

*e*)(1 + ρ

*m*.O) and by any non-faulty

*P*j,

*j* ≠

*m*.O, at some real time within the interval [τ + 2(

*d* +

*e*)(1 + ρj) -

*e*(1 + ρj), τ + 2(

*d* +

*e*)(1 + ρj) +

*e*(1 + ρj)], depending on the actual difference between the readings of

*P*j’s and

*m*.O’s clocks (which is bounded by

*e*) at the time

*m* is ordered by

*P*j. So, the upper bound on time-based ordering
delays is (2

*d* + 3

*e*)(1 + ρ), or 2

*d* if

*e* and ρ are negligibly small compared to

*d*; if only ρ is
negligibly small compared to

*d*, then the upper bound becomes 2(

*d* +

*e*) +

*e*.
We will now establish the favourable conditions listed in section 1. Let λ and

* d*a be as
defined there, and all processors be non-faulty. Suppose also that an input µ is first received
by a processor within the TMR system at real-time τ, and that ρ is negligibly small. By τ + λ, every processor forms and broadcasts a message

*m* with

*m*.µ = µ. By τ + λ + 2

*d*a, every processor, say

*P*i, accepts two double-signed messages

*m*’ and

* m*” such that

*path*(

*m*’) ≠

*path*(

*m*”) and

*m*’.µ =

*m*”.µ = µ; the one with a lower timestamp, say

*m*’

*, *becomes stable at

*P*i by τ + λ + 2

*d*a + 2

*d*: if

*P*i accepts

*m*’ at τ’, τ’ ≤ τ + λ + 2

*d*a, say, with

*path*(

*m*’) =

*P*j:

*P*k, then
by the entry D3 of Table 1, it will not accept a double-signed

*m* with

*m*.TS ≤

*m*’.TS and

*path*(

*m*) =

*P*j:

*P*k, after τ' + 2

*d *≤ τ + λ + 2

*d*a+ 2

*d*; further, since

*P*i accepts

*m*” with

*path*(

*m*”) =

*P*k:

*P*j by time τ + λ + 2

*d*a, it will not accept any double-signed

*m* with

*path*(

*m*) =

*P*k:

*P*j and

*m*.TS ≤

*m*”.TS >

*m*’.TS, after τ + λ + 2

*d*a + 2

*d*. Thus, every processor orders input µ by τ + λ + 2

*d*a + 2

*d*; with the time based protocol, no correct processor can order µ before=τ + 2(

*d* +

*e)*. Thus, our protocol is guaranteed to be faster when λ=< 2(

*e- d*a).

**5.1. Optimal Upper Bound **
We next show that the ordering bound of our protocol is the smallest achievable when (i)
clocks are not synchronised, (ii) ordering is symmetric, and (iii) a processor cannot deduce
temporal order between concurrent messages it receives. Each of these premises is defined

*Unsynchronised Clocks:* Non-faulty processors’ clocks are not synchronised, where clock is a device which a processor uses for observing time. To state formally, let

*c*i(τ) denote the reading of

*P*i’s clock at real time τ. Clocks of non-faulty

*P*i and

*P*j are said to be unsynchronised during an interval ι if |

*c*i(τ) -

*c*j(τ)| is arbitrary for every τ=in ι.

*Symmetric Ordering*: Our protocol, like [Cristian85], is

*symmetric* in the sense that the
correct processors run the same program (except for process identities and signatures). In
contrast, in an asymmetric protocol (e.g. [Echtle87]), correct processors can execute different
code and hence play different roles: one processor, termed the sequencer, decides and
disseminates the message ordering for other processors to accept what it has decided. In
failure-free executions, asymmetric ordering will be the fastest if the non-sequencer
processors can confirm the sequencer’s correct behaviour without delaying message ordering;
such delay-free confirmation is possible only when the sequencer is guaranteed to fail in a
benign manner (e.g. by crash). However, when (authenticated) Byzantine failures are
permitted, the non-sequencer processors must exchange messages to confirm that the
sequencer had behaved correctly so far; that is, message diffusion must precede message
ordering. Further, when the sequencer fails, message ordering is delayed until the failure is
detected and a new sequencer is elected. Though not proved here, it appears that asymmetric
ordering cannot offer a better worst-case ordering delay.

*Non-deductibility of Temporal Order*: Based on the definition of ‘happened before’ in
[Lampo78], we define two messages to be

*concurrent* iff neither one can be said to have
happened before the other. Temporal order on messages is an order that is based on the
Newtonian time instants at which messages were generated: for any two messages in the
system, one message is before another in the temporal order if the first one was generated
earlier in a Newtonian time-frame (see [Veris00] for a formal definition). We assume that no
processor can deduce temporal order between concurrent messages it receives
For simplicity we will assume ρ = 0 and consider a system in which communication delays
between two non-faulty processors can be at most δm. Let us define δm- such that δm- < δm and (δm - δm-) is infinitely small.

**Theorem 2**: Any symmetric ordering protocol that works only with unsynchronised clocks,

will have executions in which the ordering delay can be 3δm + δm-.

*Proof*: By contradiction. Assume that there is such a protocol which guarantees that ordering delays are

*always* smaller than 3δm + δm-. Consider two distinct executions of this protocol during real-time intervals ι1 and ι2 respectively. By hypothesis, non-faulty processors’ clocks
remain unsynchronised throughout each interval
In the first execution (see figure 7(a)),

*P*i fails

*only *by not sending its messages to

*P*j and not receiving

*P*j’s messages.

*P*i sends

*m*i at its clock time

*t*i. Let

*m*i take zero time to reach

*P*k. Suppose that

*P*k’s clock reads

*t*k when

*P*k receives

*m*i. (Since

*m*i takes zero time, when

*P*k’s clock reads

*t*k,

*P*i’s clock reads

*t*i.) Let

*P*k accept and diffuse

*m*i to

*P*j and the diffused message take δm time to be received by

*P*j. Just before

*P*j receives the diffused

*m*i, i.e. when

*P*k’s clock reads

*t*k + δm-, suppose that

*P*j‘s clock reads

*t*j and that

*P*j forms and sends

*m*j which takes δm time to be received by

*P*k. So,

*P*k’s clock reads (

*t*k + δm- + δm) when

*P*k receives

*m*j. Note that

*P*j sends

*m*j before it receives the diffused

*m*i from

*P*k, and therefore neither

*m*i nor

*m*j happened before [Lampo78] the other. Since

*P*k cannot deduce that

*m*i 3 According to [Lampo78], two messages need not be produced at the same newtonian time, for them to be deemed concurrent. 4 By this, we exclude a class of protocols which permit clock synchronisation messages to be piggybacked onto the order protocol messages and thus achieve clock synchronisation

*during* the order protocol execution.
originated before

* m*j in real-time, we will assume (without loss of generality) that

*m*j is ordered before

*m*i by all non-faulty processors.
Figure 7. Execution Scenarios. (a) First execution. (b) Second execution.
In the second execution (see figure 7(b)),

*P*j fails

*only* by not sending its messages to

*P*i and not receiving

*P*i’s messages.

*P*i sends

*m*i at its clock time

*t*i.

*m*i takes δm time to reach

*P*k and is not received by

*P*j. Suppose that

*P*k’s clock reads

*t*k when

*P*k receives

*m*i, i.e., when

*P*i’s clock reads

*t*i + δm. (Note: this is possible with unsynchronised clocks whose readings can
differ by an arbitrary amount; to prove the impossibility we have chosen the difference to be a convenient amount.) Assume that the

*m*i diffused by

*P*k takes δm time to be received by

*P*j.
When

*P*k’s clock reads

*t*k + δm-, suppose that

*P*j‘s clock reads

*t*j and that

*P*j forms and sends

*m*j only to

*P*k which takes δm time to be received. That is,

*P*k’s clock reads (

*t*k + δm- + δm) when

*P*k receives

*m*j. For

*P*k, this execution is indistinguishable from the first must order

*m*j before

*m*i. Since ordering delays are always smaller than 3δm + δm-, non-faulty

*P*i must order its own

*m*i before

*t*i + 3δm + δm-. Say

*P*k’s diffused

*m*j takes δm time to be received by

*P*i. That is,

*P*i can receive

*m*j (for the first time), only when its clock reads

*t*i + 3δm + δm-. So,

*P*i can accept

*m*j only at or after

*t*i + 3δm + δm-. Hence

*P*i cannot order

*m*j before

*t*i + 3δm + δm-, and therefore before

*m*i. This violates the unanimity condition.
By theorem 2, the upper bound ∆ on ordering delays must be at least (3δm + δm-), i.e.
∆ ≥ (3δm + δm-). Since δm is not known directly, but only its upper bound δ (see assumption 4), ∆ ≥ 4δ. Theorem 1 establishes ∆ of our protocol to be 4

*d*(1 + ρ) with

*d* recommended to
be

*d* ≥ δ/(1 - 5ρ). When

*d* is chosen to be δ/(1 - 5ρ) and when ρ = 0, we have ∆ = 4δ. Thus, if
we ignore the (small) increments on

*d* made to compensate for non-zero ρ, our protocol is
The compensation needed to account for non-zero ρ varies from protocol to protocol and
is influenced very much by the design and the structure of a given protocol. We do not
therefore claim that the compensation needed for our protocol is the smallest required by any
5 This is true even if the sender of a message

*m* timestamps

*m* with the local send time.
protocol. Instead, we show in Appendix B that the compensation factor of (1/(1 - 5ρ)) is the
smallest needed for

*our* protocol.

**6. Concluding Remarks **
Replicated processing with majority voting is a well-known technique for masking the
effects of Byzantine faults in processors. Message ordering is a central requirement for
implementing this concept in a multi-client environment. We have presented a timeout-based
message ordering protocol for TMR systems. It is a two-round protocol, as it has to be
[Dolev82], since there can be at most one faulty processor in the TMR system. We have also
presented in detail the design efforts in keeping the timeliness bounds small; consequently the
protocol provides an upper bound on ordering delays, which is the smallest achievable when
non-faulty processor’s clocks are not synchronised within a known and finite bound.
We conclude the paper by exposing two drawbacks in generalising the protocol to a
system of

*N* processors, where at most π out of

*N* processors can fail. We restrict the context
of generalisation to the one that preserves the minimality of the message cost of the ordering
protocol. (This restriction rules out generalisations into protocols that guarantee early
ordering or smaller latencies at the expense of increased message cost.) A generalised
protocol (with minimum message cost) must however allow (π=+ 1) rounds of messages exchange. This means that the number of path counters which a processor has to maintain
increases exponentially with π and

*N*: one counter for every possible path through which a
message with at most (π + 1) signatures can be received. Secondly, the generalisation must
first assume that a processor must allow a timeliness bound of (2

*d*)

*i*, 1 ≤

*i* ≤ π + 1, for any

*i*-signed message it receives. This follows from the arguments presented (in Section 4.3) for
the TMR protocol before the bounds were reduced. Thus, when a non-faulty

*P* first receives a
timely message

*m* from another processor

*Q*, it has to wait for 2

*d*(π + 1) time before

*m* can be
ordered. This means that the ordering delay bound is ∆ = 2

*d*(π + 1) + δ. Reducing the bound
to 2

*d*(π + 1) as we did for the TMR protocol in Section 4.3.2, is possible only in the following
circumstance:

*N* and π must be such that for any

*m* and any (π + 1)-signed

*m*’ that

*P* may
accept,

*m*’.

*TS *≤

*m*.

*TS*,

* path*(

*m*) must intersect with

*path*(

*m*’). To guarantee that a particular

*m*
and all possible (π + 1)-signed

*m*’ have intersecting paths, we must have (π + 1) ≥

*N* – 1. Note
that the ordering problem is vacuous for π >

*N* – 2, and hence the realistic upper bound on π
is

*N* - 2. So, the ordering bound can be reduced to 2

*d*(π + 1), only when π =

*N* – 2. With π > 1, the reduction is not possible for an NMR system where

*N* has to be at least 2π + 1. In other words, π = 1 is the

*only* case for which ordering delays can be reduced by δ

*and*
replicated processing with majority voting is feasible. That is, the design of the protocol
presented exploits features that are so characteristic of the TMR system in providing the
smallest upper bound on message ordering delays.

**Acknowledgements **
This work has been supported in part by grants from CNPq/Brazil.

**References **
F. V. Brasileiro, P. D. Ezhilchelvan and N. A. Speirs, “TMR processing without
explicit clock synchronisation,” Proceedings of the 14th Symposium on Reliable Distributed Systems, Bad Neuenahr, Germany, pp. 186-195, September 1995.
[Chandra96] T. D. Chandra and S. Toueg, "Unreliable Failure Detectors for Reliable Distributed
Systems", Journal of the ACM, 43(2), pp. 225 - 267, March 1996.
[Cristian85] F. Cristian, H. Aghili, R. Strong and D. Dolev, “Atomic Broadcast: from Simple
Message Diffusion to Byzantine Agreement,” Digest of Papers, FTCS-15, Ann Arbor, USA, pp. 200-206, June 1985.
[Cristian96] F. Cristian, B. Dancey, and J Dehn, "Fault-Tolerance in Air Traffic Control Systems",
ACM Transactions on Computer Systems, 14(3), pp. 265-286, August 1996.
[Cristian99] F. Cristian and C. Fetzer, "The Timed Asynchronous Distributed System Model", IEEE
Transactions on Parallel and Distributed Systems, 10(6), June 1999.
D. Dolev and H. R. Strong, “Requirements for Agreement in a Distributed System,” Proceedings of the 2nd Symposium on Distributed Databases, pp. 115-129, September 1982.
Dolev, J. Halpern, and H. R. Strong, “On the Possibility and Impossibility of
Achieving Clock Synchronisation,” Proceedings of the 16th Annual ACM STOC, pp. 504-511, April 1984.
K. Echtle, “Fault Masking and Sequence Agreement by a Voting Protocol with low Message Number,” Proceedings of the 6th Symposium on Reliability in Distributed Software and Database Systems, pp. 149-160, March 1987.
C. Fetzer and F. Cristian, "Fail Awareness: An Approach to Construct Fail-safe Applications", Digest of Papers, FTCS-27, Seattle, June 1997.
[Kopetz89] H. Kopetz, A. Damm, C. Koza, M. Mulazzani, W. Schwabl, C. Senft, and R.
Zainlinger, "Distributed Fault-Tolerant Real-Time Systems: The Mars Approach", IEEE Micro, pp. 25-41, 1989.
[Lampo78] L. Lamport, “Time, Clocks, and Ordering of Events in a Distributed System,”
Communications of the ACM, 21(7), pp. 558-565, July 1978.
L. Lamport, “Using Time Instead of Timeout for Fault-Tolerant Distributed Systems,” ACM Transactions on Programming Languages and Systems, 6(2), pp. 254-280, April 1984.
M. Pease, R. Shostak, and L. Lamport, “Reaching Agreement in the Presence of Faults,” Journal of the ACM, 27(2), pp. 228-234, April 1980.
[Powell88] D. Powell, P. Verissimo, G. Bonn, F. Waeselynck, and D. Seaton, “The Delta-4
Approach to Dependability in Open Distributed Computing Systems,” Digest of Papers, FTCS-18, Tokyo, pp. 246-251, June 1988.
R. L. Rivest, A. Shamir, and L. Adleman, “A Method for Obtaining Digital Signatures and Public Key Cryptosystems,” Communications of the ACM, 31(2), pp. 120-126, February 1978.
F. Schmuck and F. Cristian, “Continuous Clock Amortization Need Not Affect the Precision of a Clock Synchronisation Algorithm,” Proceedings of the 9th ACM Symposium on Principles of Distributed Computing, pp. 133-141, August 1990.
F. B. Schneider, “Implementing Fault Tolerant Services Using the State Machine Approach: a Tutorial,” ACM Computing Surveys, 22(4), pp. 299-319, December 1990.
S. K. Shrivastava, P. D. Ezhilchelvan, N. A. Speirs, S. Tao, and A. Tully, “Principle Features of the Voltan Family of Reliable System Architectures for Distributed Systems,” IEEE Transactions on Computers, 41(5), pp. 542-549, May 1992.
N.A. Speirs, S. Tao, F.V. Brasileiro, P.D. Ezhilchelvan, and S.K Shrivastava, “The Design and Implementation of Voltan Fault-Tolerant Systems for Distributed Systems, Transputer Communications, 1(2), pp. 1-17, November 1993.
G Tsudik, “Message Authentication With One-Way Hash Functions”, ACM Computer Communications Review, 22(5), 1992.
N. Vasanthavada and P. N. Marinos, “Synchronisation of Fault-Tolerant Clocks in the Presence of Malicious Failures,” IEEE Transactions on Computers, Vol. C-37(4), pp. 440-448, April 1988.
P. Verissimo, L. Rodrigues, and J. Rufino, “The Atomic Multicast Protocol (AMp),” In

*Delta-4: A Generic Architecture for Dependable Distributed Computing*, D. Powell (Ed.) ESPRIT Research Papers, Springer-Verlag, pp. 267-294, 1991.
P. Verissimo, “Causal Delivery Protocols in Real-Time Systems: a Generic Model,” Journal of Real Time Systems, 10(1), pp. 45-73, 1996.
P. Verissimo, L. Rodrigues, and A. Casimoro, “Cesium Spray: A Precise and Accurate Global Clock Service of Large Scale Systems,” Journal of Real Time Systems, 11(3), 1997.
P. Verissimo and M. Raynal, “Time in Distributed System Models and Algorithms,” in ‘Advances in Distributed Systems’, S. Krakowiak and S. K. Shrivastava (Eds.) LNCS 1752, Springer-Verlag, pp. 1-32, 2000.

**Appendix **
**A. Relationship Between Timeliness Checks of Messages **
**Lemma A1**: Let

*t*1 and

*t*2 be the local clock times when a non-faulty

*P*i receives a single-

signed

*m*1 and a double-signed

*m*2, respectively; also, let

*m*1.

*TS* ≤

*m*2.

*TS* and

*t*1 >

*t*2. If

*P*i

finds

*m*1 timely then it must also find

*m*2 timely.

Figure A1.

*m*1.

*TS* ≤

*m*2.

*TS* ≤

*m*’.

*TS*.

**Proof: **We will measure time according to

*P*i’s local clock and prove the lemma by

contradiction. Suppose that

*P*i finds

*m*1 timely and

*m*2 late. Let

*m*2 be late by

*y*,

*y* > 0, time

units, i.e., if

*P*i had received

*m*2 at any time before (

*t*2 -

*y*) then it would have found

*m*2

timely. (See figure A1.) That

*m*2 received at

*t*2 was found late implies that there exists a

message

*m*’,

*m*’.

*TS *≥

* m*2.

*TS*, which was accepted by

*P*i at (

*t*2 -

*y* -

*tb*2), where

*tb*2 is the

timeliness bound indicated by the entry of table 1 whose row corresponds to the

*path*(

*m*’) and
column to the

*path*(

*m*2). Let

*tb*1 be the timeliness bound indicated by the entry of table 1 whose row corresponds to the

*path*(

*m*’) and column to the

*path*(

*m*1).
Since

*m*1.

*TS* ≤

*m*2.

*TS* ≤

*m*’.

*TS*,

*m*1 must be received by

*P*i before (

*t*2 -

*y* -

*tb*2 +

*tb*1) for it to
be considered timely. For any given

*path*(

*m*’), i.e. in any given row of table 1, the timeliness
bound for a single-signed message is smaller than that for a double-signed-message. That is,

*tb*1 <

*tb*2. This means, (

*t*2 -

*y* -

*tb*2 +

*tb*1) <

*t*2; by given,

*t*2 <

*t*1. So,

*t*1 > (

*t*2 -

*y* -

*tb*2 +

*tb*1). This means that

*m*1 is received by

*P*i after (

*t*2 -

*y* -

*tb*2 +

*tb*1) and cannot be found timely by

*P*i.

**B. Safe Timeliness Bounds **
We next show that the timeliness bounds satisfy the safety requirement of section 4.2
when

*d *≥ δ/(1 - 5ρ). This is done in two stages: lemma B1 proves that a non-faulty processor
always finds another non-faulty processor’s 1-signed message timely; and lemma B3 shows
that when a non-faulty processor diffuses a 2-signed message to another non-faulty processor,
the latter finds the diffused message timely. As in the main paper, we adopt the style of
writing real time values in Greek and clock time values in italicised lower case Roman letters.
The term ‘clock’ always refers to a processor’s physical clock.
We assume the following notations:

*START*i(

*p*,≥

*T*) denotes the smallest real time instance
when

*PC*i[

*p*] for path

*p* becomes larger than or equal to

*T*,

*T* ≥

*INIT_VAL*. That is, just before

real time

*START*i(

*p*,≥

*T*),

*PC*i[

*p*] is less than

*T*.

*END*i(≤

*T*) denotes the largest real time

instance when

*MC*i is less than or equal to

*T*. That is, just after real time

*END*i(≤

*T*),

*MC*i is

larger than

*T* and

*P*i will not form and send any

*m*,

*m*.

*T* ≤

*T*. We also retain the notation ρi,

|ρi| ≤ ρ, to denote the rate with which the clock of a processor

*P*i drifts from real time.

**Lemma B0**: Say a non-faulty

*P*i accepts

*m* at real time τi. If

*m* is not signed by a non-faulty

*P*k,

*P*k then receives

*m* or

*equiv*(

*m*) at real time τk such that τk < τi + δ.

**Proof: **If

*m* is

*P*i’s own message,

*P*i is required to send

*m* to both

*P*j and

*P*k. If

*m* is signed by

*P*j,

*P*i is required to diffuse double-signed

*equiv*(

*m*) to

*P*k. By assumption 4,

*P*k then receives

*m* or

*equiv*(

*m*) before τi + δ.

**Lemma B1**: When a non-faulty

*P*k forms and sends

*m*, another non-faulty

*P*i finds

*m* timely,

provided

*d* ≥ δ/(1 - ρ).

**Proof:** By lemma B0,

*P*i receives

*m *or

*equiv*(

*m*) within δ real-time after

*P*k sends

*m*. Say,

*m*.

*TS* =

*T*. If

*P*i has not accepted any message with timestamp larger than or equal to

*T* until it

receives

*m*, then

*MC*i ≤

*T* when it receives

*m* and

*P*i finds

*m* timely. Let us suppose that

*MC*i

is already larger than

*T* when

*P*i receives

*m*. Since

*P*i has accepted one or more messages with

timestamp larger than or equal to

*T*, there must exist a message, say

*m’*,

*m’*.

*TS* ≥

*T*, whose

acceptance causes

*P*i to set

*PC*i[

*P*k] ≥

*T* at real time

*START*i(

*P*k

*,*≥

*T*). If we show that

**START****i(***P***k***,*≥

**T****) - ***END***k(**≤

**T****) **≥ δ for every possible

*path*(

*m’*), then the lemma is proved. We

present the proof by categorising the values of

*path*(

*m’*) in two cases. For each case, we construct the proof in the following manner. We define τi to be the real time when

*P*i accepts

*m’* and α =

*START*i(

*P*k

*,*≥

*T*) - τi. (Figure B1 shows these values along the real-time axis for

*P*i.) We derive two inequalities – inequality (1) involving τi and

*END*k(≤

*T*) and inequality (2) involving τi and

*START*i(

*P*k

*,*≥

*T*). We then relate these two inequalities through the common element τi, to show what is required.
Figure B1.

*P*i accepting

*m’* at τi sets

*PC*i(

*P*k) ≥=

*T* at

*START*i(

*P*k

*,*≥

*T*).

*Case I*.

*path*(

*m’*) =

*P*k or

*P*k:

*P*j or

*P*j:

*P*k. Just before sending

*m’*,

*m’*.

*TS* ≥

*T*,

*P*k sets

*MC*k
to

*m’*.

*TS* + 1. Therefore,

**END****k(**≤

**T****) **≤ τ

**i **
For all the considered values of

*path*(

*m’*), α =

*d*(1 + ρi). (See entries B1, E1 and D1 of table

**START****i(***P***k***,*≥

**T****) = **τ

**i + ***d***(1 + **ρ

**i) **
*START*i(

*P*k

*,*≥

*T*) ≥

*END*k(≤

*T*) +

*d*(1 + ρi).
Since

*d* ≥ δ/(1 - ρ) implies

*d*(1 + ρi) ≥ δ, we have

**START****i(***P***k***,*≥

**T****) - ***END***k(**≤

**T****) **≥ δ.

*Case II*.

*path*(

*m’*) =

*P*j or

*P*i. Let

*P*k receive

*m’* (when

*path*(

*m’*) =

*P*i) or

*equiv*(

*m’*) (when

*path*(

*m’*) =

*P*j) from

*P*i, at real time τk. By lemma B0, τk < τi + δ. Just before τk, we have either

*MC*k ≤=

*T *or

*MC*k >

*T*. In the first case

*END*k(≤

*T*) = τk, and in the latter case

*END*k(≤

*T*) < τk. So,

**END****k(**≤

**T****) - **δ

** < **τ

**i **
From entries A1 and C1 of table 1, α = 2

*d*(1 + ρi). So,

**START****i(***P***k***,*≥

**T****) = **τ

**i + 2***d***(1 + **ρ

**i) **
*START*i(

*P*k

*,*≥

*T*) >

*END*k(≤

*T*) - δ=+=2

*d*(1 + ρi).=
When

*d* ≥ δ/(1 - ρ), 2

*d*(1 + ρi) ≥ 2δ, thus

**START****i(***P***k***,*≥

**T****) - ***END***k(**≤

**T****) **≥ δ.

**Lemma B2**:

*START*i(

*P*j:

*P*k

*,*≥

*T*) ≥

*START*i(

*P*j

*,*≥

*T*) for a non-faulty

*P*i and

*T* ≥

*INIT_VAL*.

**Proof:** Let

*m*’,

*m*’.

*TS* ≥

*T* be the message whose acceptance at time τ causes

*P*i to set

*PC*i[

*P*j:

*P*k] ≥

*T* at

*START*i(

*P*j:

*P*k

*,*≥

*T*). From table 1, τ ≥

*START*i(

*P*j:

*P*k

*,*≥

*T*) – 2

*d*(1 + ρi); also,

from the protocol in figure 2,

*P*i will schedule an update of

*PC*i[

*P*j] to at least

*m*’.

*TS* at latest

by τ +2

*d*(1 + ρi), thus

*START*i(

*P*k

*,*≥

*T*) ≤ τ + 2

*d*(1 + ρi), and since

*d* > 0,

*START*i(

*P*j:

*P*k

*,*≥

*T*) ≥

*START*i(

*P*k

*,*≥

*T*).

**Lemma B3**: When a non-faulty

*P*i diffuses a double-signed

*m* to non-faulty

*P*k,

*P*k finds

*m*
timely, provided

*d* ≥ δ/(1 - 5ρ).

**Proof:** As in lemma B1, we will let

*m*.

*TS* =

*T* and suppose that

*MC*k is already larger than

*T* when

*P*k receives

*m*. Since

*P*k has accepted message(s) with timestamp larger than or equal to

*T*, there must exist a message, say

*m’*,

*m’*.

*TS* ≥=

*T*, whose acceptance causes

*P*k to set

*PC*k[

*P*j:

*P*i] ≥=

*T *at real time

*START*k(

*P*j:

*P*i

*,*≥

*T*). Let τk be the real time when

*P*k accepts

*m’* and α =

*START*k(

*P*j:

*P*i

*,*≥

*T*) - τk. Figure B2 shows these values along the real-time axis for

*P*k.

*START*k(

*P*j:

*P*i,≥

*T*)
k[

*P*j:

*P*i] <

*T*)

*PC*
Figure B2.

*P*k accepting

*m’* at τk sets

*PC*k(

*P*j:

*P*i) ≥=

*T* at

*START*k(

*P*j:

*P*i

*,*≥

*T*).
Note that

*P*i must have diffused

*m* to

*P*k before

*START*i(

*P*j

*,*≥

*T*), and that

*P*k will find

*m*
timely if it receives

*m* before

*START*k(

*P*j:

*P*i

*,*≥

*T*). We need only to show that

**START****k(***P***j:***P***i***,*≥

**T****) - ***START***i(***P***j***,*≥

**T****) **≥ δ for every possible

*path*(

*m’*). We show this by

considering three cases regarding the

* originator *of

*m’*,

*m’*.

*O*. In all cases, there is a real time
τi when

*P*i either sends

*m*’ or receives

*equiv*(

*m*’); in the same way we did to prove lemma B1, for each case we construct two inequalities: inequality (3) involving τi and

*START*k(

*P*j:

*P*i

*,*≥

*T*), and inequality 4 involving τi and

*START*i(

*P*j

*,*≥

*T*). These equalities are then related through τi to show what is required.
We first present the timeliness bounds that a non-faulty

*P*k uses to check the timeliness of
a message

*m* it receives. They are presented in the same way that we did before for a non-faulty

*P*i, and are obtained simply by interchanging the suffixes k and i in table 1. Table B1 shows the timeliness bounds used by

*P*k.

*P*i:

*P*j

* *(E)

* d *
Table B1:

*P*k’s timeliness bounds for a past

*m* given that

*MC*k first exceeded

*m.TS *due to accepting

*m*’

*.*
*Case I*.

*m’*.

*O* =

*P*k. From entry A3 of table B1, α = 4

*d*(1 + ρk) and

*P*k must have formed
and sent

*m’* at real time

*START*k(

*P*j:

*P*i

*,*≥

*T*) - 4

*d*(1 + ρk). Let

*P*i receive

*m’* at τi, by lemma B0,
τ

**i < ***START***k(***P***j:***P***i***,*≥

**T****) - 4***d***(1 + **ρ

**k) + **δ=

From lemma B1,

*P*i accepts

*m’*; so from entry B2 of table 1, we have

**START****i(***P***j***,*≥

**T****) **≤ τ

**i + 2***d***(1 + **ρ

**i) **
*START*i(

*P*j

*,*≥

*T*) <

*START*k(

*P*j:

*P*i

*,*≥

*T*) - 4

*d*(1 + ρk) + δ + 2

*d*(1 + ρi), and

*START*k(

*P*j:

*P*i

*,*≥

*T*) -

*START*i(

*P*j

*,*≥

*T*) > 2

*d*(1 + 2ρk - ρi) - δ

**START****k(***P***j:***P***i***,*≥

**T****) - ***START***i(***P***j***,*≥

**T****) > **δ,

whenever

*d* ≥ δ/(1 + 2ρk - ρi), which is always true since

*d* ≥ δ/(1 - 5ρ) > δ/(1 + 2ρk - ρi).

*Case II*.

*m’*.

*O* =

*P*i. Note that the

*m*’ that

*P*k accepts can be single or double-signed. Let

*P*i
form and send

*m’* (if

*m’* is single-signed) or the single-signed

*equiv*(

*m*’) (if

*m*’ is double-signed) at real time τi. Whether

*m’* is single or double-signed, the entries B3 and E3 of table B1 indicate that α = 3

*d*(1 + ρk). Since τi ≤ τk,
τ

**i **≤

** ***START***k(***P***j:***P***i***,*≥

**T****) - 3***d***(1 + **ρ

**k) **
**START****i(***P***j***,*≥

**T****) **≤ τ

**i + 2***d***(1 + **ρ

**i) **
*START*i(

*P*j

*,*≥

*T*) ≤

*START*k(

*P*j:

*P*i

*,*≥

*T*) - 3

*d*(1 + ρk) + 2

*d*(1 + ρi), and

*START*k(

*P*j:

*P*i

*,*≥

*T*) -

*START*i(

*P*j

*,*≥

*T*) ≥

*d*(1 + 3ρk - 2ρi)

**START****k(***P***j:***P***i***,*≥

**T****) - ***START***i(***P***j***,*≥

**T****) **≥ δδ,

whenever

*d* ≥ δ/(1 + 3ρk - 2ρi), which is true since

*d *≥=δ/(1 - 5ρ) ≥=δ/(1 + 3ρk - 2ρi).

*Case III*:

*m’*.

*O* =

*P*j. Say

*path*(

*m’*) =

*P*j. From entry C3 of table B1, α = 3

*d*(1 + ρk). Let

*P*i
receive the diffused

*equiv*(

*m’*) from

*P*k at real time τi; from lemma B0,
τ

**i **<

** ***START***k(***P***j:***P***i***,*≥

**T****) - 3***d***(1 + **ρ

**k) + **δ= =

(

**III.3a)**.

If

*P*i accepts

*equiv*(

*m*’) at τi, then from entry D2 of table 1, τi ≥

*START*i(

*P*j

*,*≥

*T*) -

*d*(1 + ρi); otherwise, τi >

*START*i(

*P*j:

*P*k

*,*≥

*T*) and from lemma B2, τi >

*START*i(

*P*j

*,*≥

*T*). Since

*d* > 0, in

**START****i(***P***j***,*≥

**T****) **≤ τ

**i + ***d***(1 + **ρ

**i) **
** (III.4a)**.

*START*i(

*P*j

*,*≥

*T*) <

*START*k(

*P*j:

*P*i

*,*≥

*T*) - 3

*d*(1 + ρk) + δ=+

*d*(1 + ρi), and

*START*k(

*P*j:

*P*i

*,*≥

*T*) -

*START*i(

*P*j

*,*≥

*T*) > 2

*d*(1 + 3ρk/2 - ρi/2) - δ,

**START****k(***P***j:***P***i***,*≥

**T****) - ***START***i(***P***j***,*≥

**T****) **> δδ,

whenever

*d* ≥ δ/(1 + 3ρk/2 - ρi/2), which is true since

*d *≥=δ/(1 - 5ρ) > δ/(1 + 3ρk/2 - ρi/2).
Say

*path*(

*m’*) =

*P*j:

*P*i

*. *α = 2

*d*(1 + ρk) from entry D3 of table B1. Let

*P*i diffuse

*m’* to

*P*k at
τ

**i **≤

** ***START***k(***P***j:***P***i***,*≥

**T****) - 2***d***(1 + **ρ

**k) **
** (III.3b)**.

**START****i(***P***j***,*≥

**T****) **≤ τ

**i + ***d***(1 + **ρ

**i) **
** (III.4b)**.

*START*i(

*P*j

*,*≥

*T*) ≤

*START*k(

*P*j:

*P*i

*,*≥

*T*) - 2

*d*(1 + ρk) +

*d*(1 + ρi), and

*START*k(

*P*j:

*P*i

*,*≥

*T*) -

*START*i(

*P*j

*,*≥

*T*) ≥

*d*(1 + 2ρk - ρi),

**START****k(***P***j:***P***i***,*≥

**T****) - ***START***i(***P***j***,*≥

**T****) **≥ δδ,

whenever

*d* ≥ δ/(1 + 2ρk - ρi), which is true since

*d *≥=δ/(1 - 5ρ) > δ/(1 + 2ρk - ρi).

Source: http://fubica.lsd.ufcg.edu.br/hp/publicacoes/artigos/ebs.pdf

GRANT PROGRESS REPORT SUMMARY 01602: Longitudinal Study Investigating the Progression and Pathogenesis of Atypical Hyperadrenocorticism in Scottish Terriers Principal Investigator: Research Institution: Virginia-Maryland Regional College of Veterinary Medicine Grant Amount: Start Date: 1/1/2012 End Date: 12/31/2013 Progress Report: Mid-Year 2 Report Due: Receiv

How To Treat Digital DermatitisDigital Dermatitis (DD), also known as Hairy Foot Warts, Strawberry Foot Rot, Mortellaro’s Disease and Rasperry Heel, has become the most prevalent infectious hoof disorder on many Canadian dairy farms. Dr. Paul Greenough, in his new book ‘Bovine Laminitis and Lameness’ (see sidebar on next page) recom-mends the following treatment for animals infected wit