Kafka

Apache Kafka is a streaming platform that can be used to ship messages around the place.

Calculating (deprecated) DefaultPartititioner assigned partition

When creating a Kafka message, it’s possible to set a key in order to ensure that all messages with that same key end up on the same partition.

Previously, the default strategy (aptly named DefaultPartitioner) used to be that the message key attached to a message would be hashed using murmur2.

You could use this calculator combined with the default Kafka seed (9747b28c) to figure out the murmur hash and then modulo by the number of partitions you had.

For example:

0974728 (seed) + 12413413 (input) -> 3242098085 (hash)
3242098085 (hash) % 16 (partition)
= 5

This would mean that a message with a key of 12413413 sent to a topic with 16 partitions, would be assigned to Partition 5

Calculating hash strategy for kafka-go

Nowadays, the default partition is apparently UniformStickyPartitioner although at the time of writing IBM/sarama defaults to HashPartitioner and segmentio/kafka-go has round-robin as its default.

When it comes to the hash strategy, I recently found myself wondering how to determine the assigned partition for a program using kafka-go’s hash partition.

This lead to me slicing up the default hasher implementation into a small Go playground program.

NOTE: While I’ve manually run the testcases through this small program and they all passed, I haven’t exercised it in any depth. It’s just a reference for myself in the future. You should probably use the actual library directly.

package main

import (
	"fmt"
	"hash"
	"hash/fnv"
)

func main() {
	partitions := generatePartitions(3)
	key := "blah"

	hasher := fnv.New32a().(hash.Hash32)
	hasher.Reset()

	if _, err := hasher.Write([]byte(key)); err != nil {
		panic(err)
	}

	partition := int32(hasher.Sum32()) % int32(len(partitions))
	if partition < 0 {
		partition = -partition
	}
	fmt.Println(partition)
}

func generatePartitions(partitionCount int) []int {
	partitions := []int{}
	for i := 0; i <= partitionCount-1; i++ {
		partitions = append(partitions, i)
	}
	return partitions
}