Search for question
Question

10. An intern in the Independent Party campaign designs an independent

SWIM/ping-based failure detection protocol, for an asynchronous distributed

system, that works as follows. Assume there are N=M*K*R processes in the

system (M, K, R, are positive integers, each > 2). Arrange these N processes in a

MxKxR 3-dimensional matrix (tesseract), with M processes in each column, and

K processes in each row, and R processes in the 3rd dimension (aisles). All

processes maintain a full membership list, however pinging is partial. Each

process Pijk (in i-th row and j-th column and k-th aisle) periodically (every T time

units) marks a subset of its membership list as its Monitoring Set. The monitoring

set of a given process, once selected, does not change. The monitoring set of Pijk

contains: i) all the processes in in its own column Pjk, ii) all the other processes in

its own row Prk, and ii) all the processes in in its own aisle Pij. At this point,

there are two options available to you: Option 1 - Each process sends heartbeats

to its monitoring set members. Option 2 - Each process periodically pings all its

monitoring set members; pings are responded to by acks, just like in the SWIM

protocol (but there are no indirect pings or indirect acks.). Failure detection

timeouts work as usual: Option 1 has the heartbeat receiver timeout waiting for a

heartbeat, while Option 2 has the pinging process (pinger) time out. The

suspected process is immediately marked as failed. This is run in an

asynchronous distributed system.

a. How many failures does Option 1 take to violate completeness? That is,

find the value L so that if there are (L-1) simultaneous failures, all of them

will be detected, but if there are L simultaneous failures then not all of

them may be detected.

b. Answer the same above question for Option 2.

c. An opposition party candidate claims that for K-R=2, both Option 1 and

Option 2 provide completeness for all scenarios with up to (and including)

9 simultaneous failures. You gently respond that they are wrong and that

it also depends on M. What are all the values of M (given K=R-2) for

which your opponent's claim above is true? Justify your answer clearly.

Fig: 1