Performance Benchmark Report

An analysis of the K8s-Ceph-Production-Sim Cluster as of June 26, 2025.

Executive Summary

This report details the performance benchmarks conducted on a newly provisioned 4-node Kubernetes cluster utilizing a Rook Ceph storage backend. The results indicate a healthy, stable, and resilient cluster, with performance consistent with expectations for its configuration (3x data replication over a 10GbE network). The primary performance bottleneck was identified as the 10GbE network, not the underlying NVMe storage. The cluster is performing well and is ready for production workloads that require high availability and data durability.

Test Environment Configuration

💻 Servers & CPU: 4 nodes / Intel Xeon D-1528

🧠 RAM: 128 GB per node

💾 Storage: 8x Samsung 980 1TB NVMe

🌐 Network & Ceph: 10GbE / 3x replication
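For context, the throughput ceilings implied by this hardware can be sketched with rough back-of-the-envelope numbers. The per-drive figure below is the vendor's sequential-read spec for the Samsung 980, not a measured value, and both ceilings are approximations:

```python
# Back-of-the-envelope throughput ceilings for the test environment.
# Assumed figures: ~3.5 GB/s sequential read per Samsung 980 (vendor spec)
# and 10 Gbit/s line rate per node; both are approximations.

GBIT_BYTES = 1e9 / 8  # bytes per second in one gigabit per second

nvme_per_drive_gbs = 3.5                       # GB/s, vendor sequential spec (approx.)
drives_total = 8
network_per_node_gbs = 10 * GBIT_BYTES / 1e9   # 10GbE ≈ 1.25 GB/s

# Aggregate raw NVMe bandwidth across all 8 drives in the cluster.
nvme_aggregate_gbs = nvme_per_drive_gbs * drives_total

print(f"Aggregate NVMe ceiling:   {nvme_aggregate_gbs:.1f} GB/s")
print(f"Per-node network ceiling: {network_per_node_gbs:.2f} GB/s")
# The per-node network ceiling (~1.25 GB/s) sits well below even a single
# drive's spec, which is consistent with the report's finding that the
# 10GbE fabric, not the NVMe storage, caps performance.
```

This is only a sanity check on raw line rates; real-world Ceph throughput is further reduced by protocol overhead and replication traffic.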

Analysis

Conclusion & Key Finding

Network is the Primary Bottleneck

The benchmark results confirm that the cluster is healthy, resilient, and performing well for its configuration. The high-speed NVMe drives are significantly underutilized, with performance being limited by the 10GbE network fabric.

This is most evident in write performance, which is impacted by Ceph's 3x replication factor: as the replication diagram below illustrates, a single write request from an application fans out into three separate write operations across the cluster network, multiplying both traffic and latency.
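The amplification can be quantified with a short sketch. The 4 MiB client write is a hypothetical example size; the 3x factor is the cluster's configured replication setting:

```python
# Write amplification under Ceph 3x replication (illustrative numbers).

MIB = 1024 * 1024
replication_factor = 3
client_write_bytes = 4 * MIB  # hypothetical single application write

# Every client write is persisted replication_factor times across OSDs.
bytes_written_cluster_wide = client_write_bytes * replication_factor

# Network hops for one write: client -> primary OSD,
# then primary -> each of the (replication_factor - 1) replicas.
network_transfers = 1 + (replication_factor - 1)
bytes_on_the_wire = client_write_bytes * network_transfers

print(f"Client write:         {client_write_bytes // MIB} MiB")
print(f"Persisted on disk:    {bytes_written_cluster_wide // MIB} MiB")
print(f"Crossing the network: {bytes_on_the_wire // MIB} MiB")
```

So every megabyte the application writes becomes three megabytes of traffic competing for the same 10GbE links, which is why replicated writes feel the network ceiling first.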

Recommendation

Future performance enhancements should focus on upgrading the network infrastructure (e.g., to 25GbE or higher) before considering faster storage.

Ceph 3x Write Replication (over 10GbE network)

App Write Request
  └─> Primary OSD Write
        ├─> Replica 1 Write
        └─> Replica 2 Write