Optimizing My Setup on a K3s Cluster with Ceph Storage

Context: I currently have a K3s cluster deployed across two main nodes, with a third node acting solely as an HA node. The two main nodes each have NVMe drives and serve as the primary storage nodes; the third node exists purely for high availability and does not run workloads. The nodes are not on the same local network; they are connected over the internet using Tailscale.

My goal is to run a few lightweight services like Nextcloud, Vaultwarden, and some other small apps for personal use. This setup is used by a small group of people (3-4 users).

Problem: I have deployed Rook Ceph on this cluster to manage storage. Read speeds are excellent (~14.6 Gb/s), but write speeds are significantly slower, averaging about 100 MB/s. Here are the relevant details:

  1. Network:
    • Nodes are connected via Tailscale over a 1 Gbps internet connection.
    • Latency between nodes is around 13-14ms.
  2. Storage setup:
    • The Ceph pool is set up with a replication factor of 2.
    • I have tried increasing the placement groups (PGs) from 32 to 64 to improve data distribution, but the write speed stayed the same.
    • Reducing the replication factor to 1 improved write speed significantly (~230 MB/s), but it sacrifices the redundancy that is critical for data safety.
    • I have also experimented with various Ceph settings, including recovery limits, write sizes, and other performance-related options, but the issue persists. (A sketch of the relevant pool commands follows this list.)
  3. Hardware:
    • Both primary nodes have NVMe drives dedicated solely to storage; the operating system is installed on separate disks.
    • Rook Ceph uses the NVMe drives for OSDs.
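
For reference, these pool settings can be checked and changed from the Rook toolbox pod roughly as follows. This is a sketch only: the pool name `replicapool` is just the Rook example default (yours may differ), and recent Ceph releases adjust `pgp_num` automatically when `pg_num` changes.

```bash
# Open a shell in the Rook toolbox (assumes the default rook-ceph namespace
# and the standard rook-ceph-tools deployment).
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

# Current replication factor and PG count for the block pool
ceph osd pool get replicapool size
ceph osd pool get replicapool pg_num

# The changes described above (pool name is illustrative)
ceph osd pool set replicapool pg_num 64   # 32 -> 64: write speed unchanged
ceph osd pool set replicapool size 1      # ~230 MB/s, but no redundancy
                                          # (newer Ceph may require --yes-i-really-mean-it
                                          #  and mon_allow_pool_size_one=true for size 1)
ceph osd pool set replicapool size 2      # back to the redundant setting (~100 MB/s)
```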

Current Challenges:

  • The primary bottleneck appears to be the 1 Gbps Tailscale link between the nodes (and its ~13-14 ms latency), since every write has to be replicated across it and is only acknowledged once the second copy is persisted. A quick way to sanity-check this is sketched below.
  • Even with tuned Ceph settings (e.g., limiting recovery/backfill activity and adjusting write sizes), write speed does not exceed ~100 MB/s with a replication factor of 2, which is roughly what a 1 Gbps link can carry (≈0.8 Gbps) once each write is shipped to the second node.
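
A quick way to sanity-check that bottleneck is to measure the raw Tailscale link and then run a Ceph-level benchmark against the same pool. Sketch only: hostnames are placeholders, and the `rados bench` commands assume the toolbox pod and pool name from the earlier snippet.

```bash
# Raw link between the nodes over Tailscale
# ("node-b" stands for the other node's Tailscale hostname or IP).
ping -c 10 node-b            # expect the ~13-14 ms quoted above
iperf3 -s                    # run on node-b
iperf3 -c node-b -t 30       # run on node-a; roughly 1 Gbps minus WireGuard overhead

# Ceph-level write/read benchmark from the toolbox pod
rados bench -p replicapool 30 write --no-cleanup
rados bench -p replicapool 30 seq
rados -p replicapool cleanup
```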

Question: Is Rook Ceph the right solution for this use case, or would a different technology be better suited given the following goals?

  1. High Availability: If one node goes down, the other should take over seamlessly.
  2. Good Write Speeds: Optimize for better write performance, even with the current network setup.
  3. Data Redundancy: Minimize the risk of data loss while ensuring performance.
  4. Ease of Management: Since this is a small-scale personal setup, simplicity is important.

Potential Alternatives I've Considered:

  1. Longhorn: I have already tried it, but write speeds are limited in the same way (a comparable two-replica StorageClass is sketched below).
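
For reference, a two-replica Longhorn StorageClass along these lines would be the apples-to-apples equivalent of Ceph's size=2 and goes through the same cross-node replication path. This is a sketch; the class name is arbitrary.

```bash
# Sketch: Longhorn StorageClass with two replicas, mirroring Ceph's size=2.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2-replicas        # arbitrary name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880"
EOF
```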

I’m looking for advice on:

  • Whether there are tweaks I can make to my current setup to improve write performance.
  • If there are better storage solutions for this architecture and use case.
  • Best practices for handling a high-availability setup over an internet-based connection.

Any suggestions or insights are greatly appreciated!