# Kafka的复制机制

Kafka是一种高吞吐量的分布式发布订阅消息系统，有如下特性：

1. 通过O(1)的磁盘数据结构提供消息的持久化，这种结构对于即使数以TB的消息存储也能够保持长时间的稳定性能。
2. 高吞吐量：即使是非常普通的硬件Kafka也可以支持每秒数百万的消息。
3. 支持通过Kafka服务器和消费机集群来分区消息。
4. 支持流式处理。

7年过去了， kafka已经成为一个羽翼丰满的发布订阅平台、消息存储、流处理的工具。财富500强企业中有三分之一的公司使用了kafka平台。也就是在昨天(2017年11月1日)，kafka发布了它的1.0.0版本。

1. Producer可以继续发布消息
2. Consumer可以继续接收消息

1、基于quorum的方式延迟(latency)可能会好于primary-backup,因为基于quorum的方式只需要部分节点写入成功就可以返回。
2、在同样多的节点下基于primary-backup的复制可以容忍更多的节点失败，只要有一个节点活着就可以工作。
3、primary-backup在两个节点的情况下就可以提供容错，而基于quorum的方式至少需要三个节点。

Kafka采用了第二种方式，也就是主从模式， 主要是基于容错的考虑，并且在两个节点的情况下也可以提供高可用。

kafka的复制是针对分区的。比如上图中有四个broker,一个topic,2个分区，复制因子是3。当producer发送一个消息的时候，它会选择一个分区，比如topic1-part1分区，将消息发送给这个分区的leader， broker2、broker3会拉取这个消息，一旦消息被拉取过来，slave会发送ack给master，这时候master才commit这个log。

Kafka引入了 ISR的概念。ISR是in-sync replicas的简写。ISR的副本保持和leader的同步，当然leader本身也在ISR中。初始状态所有的副本都处于ISR中，当一个消息发送给leader的时候，leader会等待ISR中所有的副本告诉它已经接收了这个消息，如果一个副本失败了，那么它会被移除ISR。下一条消息来的时候，leader就会将消息发送给当前的ISR中节点了。

There are 3 cases of leader failure which should be considered -

1. The leader crashes before writing the messages to its local log. In this case, the client will timeout and resend the message to the new leader.
2. The leader crashes after writing the messages to its local log, but before sending the response back to the client
Atomicity has to be guaranteed: Either all the replicas wrote the messages or none of them
The client will retry sending the message. In this scenario, the system should ideally ensure that the messages are not written twice. Maybe, one of the replicas had written the message to its local log, committed it, and it gets elected as the new leader.
3. The leader crashes after sending the response. In this case, a new leader will be elected and start receiving requests.

When this happens, we need to perform the following steps to elect a new leader.

1.Each surviving replica in ISR registers itself in Zookeeper.

1. The replica that registers first becomes the new leader. The new leader chooses its LEO as the new HW.
2. Each replica registers a listener in Zookeeper so that it will be informed of any leader change. Everytime a replica is notified about a new leader:
If the replica is not the new leader (it must be a follower), it truncates its log to its HW and then starts to catch up from the new leader.
3. The leader waits until all surviving replicas in ISR have caught up or a configured time has passed. The leader writes the current ISR to Zookeeper and opens itself up for both reads and writes.
(Note, during the initial startup when ISR is empty, any replica can become the leader.)