Paxos Algorithm - Ceph



 


Paxos

The Paxos algorithm is a consensus algorithm that is used to ensure that a distributed system comes to agreement on a particular value. It was first described by Leslie Lamport in a series of papers in the late 1990s and is used in many distributed systems, including Ceph.

The algorithm is designed to work in a system with a group of nodes, where each node can propose a value and all nodes must agree on a single value. The Paxos algorithm consists of three phases:

  1. Propose: A node (called the proposer) proposes a value that it would like the group to agree on. This value is sent to a special node called the acceptor.
  2. Prepare: The acceptor sends a "prepare" message to the other nodes in the group, indicating that it has received a proposal. The other nodes then send back a "promise" message, indicating that they will not accept any more proposals until a decision has been made.
  3. Accept: Once a quorum of nodes have sent back a promise message, the proposer sends an "accept" message to the acceptor, including the value that was proposed. The acceptor then sends an "accepted" message to the other nodes, indicating that the proposed value has been accepted.

At this point, the Paxos algorithm has reached consensus and all the monitor nodes agree on the new configuration of the cluster. This allows the cluster to function correctly, even if some of the monitor nodes are down or unavailable, as a quorum of monitors need to agree before making a change.

By using the Paxos algorithm, Ceph can automatically maintain a consistent view of the cluster state and make sure that data is available to clients. It allows the cluster to scale and adapt to changes, such as adding new OSDs, without interruption of service.

Paxos in ceph

In Ceph, the Paxos algorithm is used by the monitor nodes to maintain a consistent view of the cluster state and to elect a leader among the monitor nodes. The Paxos algorithm is responsible for agreeing on the configuration of the cluster, including the number and location of objects, and the placement of replicas.

Here's how the Paxos algorithm works in a Ceph cluster:

  1. A proposer monitor node initiates a proposal for a new configuration, such as adding a new OSD (Object Storage Daemon) to the cluster.
  2. The proposer sends a "prepare" message to the other monitor nodes, indicating that it has a proposal for a new configuration.
  3. The other monitor nodes send back a "promise" message, indicating that they will not accept any more proposals until a decision has been made.
  4. Once a quorum of monitor nodes have sent back a promise message, the proposer sends an "accept" message to the acceptors, including the proposed configuration.
  5. The acceptors then sends an "accepted" message to the other monitor nodes, indicating that the proposed configuration has been accepted.
  6. The Paxos leader (elected by the Paxos algorithm) will update the cluster map with the new configuration and make it available to clients via the metadata servers.

At this point, the Paxos algorithm has reached consensus and all the monitor nodes agree on the new configuration of the cluster. This allows the cluster to function correctly, even if some of the monitor nodes are down or unavailable, as a quorum of monitors need to agree before making a change.

By using the Paxos algorithm, Ceph can automatically maintain a consistent view of the cluster state and make sure that data is available to clients. It allows the cluster to scale and adapt to changes, such as adding new OSDs, without interruption of service.

What is Quorum

Quorum refers to the minimum number of nodes in a distributed system that must be in agreement in order for the system to function correctly. It is a mechanism used to ensure that a distributed system reaches a consistent state and that data is available to clients, even in the presence of node failures or network partitions.

Quorum is a term that comes from the Latin word "quorum," which means "of whom" and it's used to indicate the minimum number of members that must be present at a meeting before official business can be conducted. In distributed systems, quorum is the minimum number of nodes that must be in agreement on a particular value or decision, in order for the system to reach a consistent state.

In a distributed system, quorum is used to establish a minimum number of nodes that must be available and in agreement, before the system can perform a specific action or make a specific decision. This helps to ensure that the system can continue to function even if some nodes are down or unavailable. By having a quorum of nodes agree on a decision, it ensures that the decision is made by a majority of the nodes and it's less likely that a faulty decision is made.

Quorum is a general concept that can be implemented in different ways depending on the requirements of the system. Some systems use a fixed quorum size, while others use dynamic quorums that adapt to the number of available nodes. Some systems use odd number of nodes and others use even number of nodes. Additionally, the quorum size can vary depending on the context and specific requirements of a distributed system.

"Out of quorum" mean

In a distributed system, if a particular node or group of nodes is "out of quorum," it means that the minimum number of nodes required for the system to function correctly is not currently present or in agreement.

When a node or group of nodes is out of quorum, it can affect the availability and consistency of the system. For example, if a quorum of nodes is needed to make a decision and a node goes down, the remaining nodes may not be able to reach a quorum and the system may not be able to make a decision or perform a specific action. In this case, the system may not be able to function as intended and could experience downtime or data loss.

Additionally, "out of quorum" can also refer to a situation where some nodes have diverging view of the state of the system, it might happen because of a network partition or clock skews. Nodes that don't agree on the state of the system may be unable to reach a quorum, which could lead to issues with consistency and availability.

To prevent out of quorum situations, distributed systems often employ mechanisms like replication, heartbeating, and leader election that help to maintain the availability and consistency of the system even in the presence of node failures or network partitions. Additionally, some systems may have failover mechanisms, such as manual failover, that can be used to bring the system back into quorum in the event of a failure.


Source: Chat GPT

Damasukma Trihanandi
Damasukma Trihanandi
How you think about what you are doing affect how you do it, or whether you do it at all.
Tags:
Link copied to clipboard.