Preface
  What this book covers
  What you need for this book
  Who this book is for
  Conventions
  Reader feedback
  Customer support
    Downloading the example code
    Downloading the color images of this book
    Errata
    Piracy
    Questions

Planning for Ceph
  What is Ceph?
  How Ceph works
  Ceph use cases
    Replacing your storage array with Ceph
    Performance
    Reliability
    The use of commodity hardware
  Specific use cases
    OpenStack- or KVM-based virtualization
    Large bulk block storage
    Object storage
    Object storage with custom application
    Distributed filesystem - web farm
    Distributed filesystem - SMB file server replacement
  Infrastructure design
    SSDs
      Consumer
      Prosumer
      Enterprise SSDs
        Enterprise - read intensive
        Enterprise - general usage
        Enterprise - write intensive
    Memory
    CPU
    Disks
    Networking
      10G networking requirement
      Network design
    OSD node sizes
      Failure domains
      Price
      Power supplies
  How to plan a successful Ceph implementation
    Understanding your requirements and how they relate to Ceph
    Defining goals so that you can gauge if the project is a success
    Choosing your hardware
    Training yourself and your team to use Ceph
    Running a PoC to determine if Ceph has met the requirements
    Following best practices to deploy your cluster
    Defining a change management process
    Creating a backup and recovery plan
  Summary

Deploying Ceph
  Preparing your environment with Vagrant and VirtualBox
    System requirements
    Obtaining and installing VirtualBox
    Setting up Vagrant
  The ceph-deploy tool
  Orchestration
  Ansible
    Installing Ansible
    Creating your inventory file
    Variables
    Testing
    A very simple playbook
    Adding the Ceph Ansible modules
    Deploying a test cluster with Ansible
  Change and configuration management
  Summary

BlueStore
  What is BlueStore?
  Why was it needed?
    Ceph's requirements
    Filestore limitations
    Why is BlueStore the solution?
  How BlueStore works
    RocksDB
    Deferred writes
    BlueFS
  How to use BlueStore
    Upgrading an OSD in your test cluster
  Summary

Erasure Coding for Better Storage Efficiency
  What is erasure coding?
    K+M
  How does erasure coding work in Ceph?
  Algorithms and profiles
    Jerasure
    ISA
    LRC
    SHEC
  Where can I use erasure coding?
    Creating an erasure-coded pool
  Overwrites on erasure code pools with Kraken
    Demonstration
  Troubleshooting the 2147483647 error
    Reproducing the problem
  Summary

Developing with Librados
  What is librados?
  How to use librados
  Example librados application
    Example of the librados application with atomic operations
    Example of the librados application that uses watchers and notifiers
  Summary

Distributed Computation with Ceph RADOS Classes
  Example applications and the benefits of using RADOS classes
  Writing a simple RADOS class in Lua
  Writing a RADOS class that simulates distributed computing
    Preparing the build environment
    RADOS class
    Client librados applications
      Calculating MD5 on the client
      Calculating MD5 on the OSD via RADOS class
    Testing
  RADOS class caveats
  Summary

Monitoring Ceph
  Why it is important to monitor Ceph
  What should be monitored
    Ceph health
    Operating system and hardware
    Smart stats
    Network
    Performance counters
  PG states - the good, the bad, and the ugly
    The good
      The active state
      The clean state
      Scrubbing and deep scrubbing
    The bad
      The inconsistent state
      The backfilling, backfill_wait, recovering, and recovery_wait states
      The degraded state
      Remapped
    The ugly
      The incomplete state
      The down state
      The backfill_toofull state
  Monitoring Ceph with collectd
    Graphite
    Grafana
    collectd
    Deploying collectd with Ansible
    Sample Graphite queries for Ceph
      Number of up and in OSDs
      Showing the most deviant OSD usage
      Total number of IOPs across all OSDs
      Total MBps across all OSDs
      Cluster capacity and usage
      Average latency
    Custom Ceph collectd plugins
  Summary

Tiering with Ceph
  Tiering versus caching
  How Ceph's tiering functionality works
    What is a bloom filter?
    Tiering modes
      Writeback
      Forward
      Read-forward
      Proxy
      Read-proxy
  Use cases
  Creating tiers in Ceph
  Tuning tiering
    Flushing and eviction
    Promotions
      Promotion throttling
    Monitoring parameters
  Tiering with erasure-coded pools
  Alternative caching mechanisms
  Summary

Tuning Ceph
  Latency
  Benchmarking
    Benchmarking tools
      Fio
      Sysbench
      Ping
      iPerf
    Network benchmarking
    Disk benchmarking
    RADOS benchmarking
    RBD benchmarking
  Recommended tunings
    CPU
    Filestore
      VFS cache pressure
      WBThrottle and/or nr_requests
      Filestore queue throttling
        filestore_queue_low_threshhold
        filestore_queue_high_threshhold
        filestore_expected_throughput_ops
        filestore_queue_high_delay_multiple
        filestore_queue_max_delay_multiple
    PG splitting
    Scrubbing
    OP priorities
    The network
    General system tuning
    Kernel RBD
      Queue depth
      ReadAhead
    PG distributions
  Summary

Troubleshooting
  Repairing inconsistent objects
  Full OSDs
  Ceph logging
  Slow performance
    Causes
      Increased client workload
      Down OSDs
      Recovery and backfilling
      Scrubbing
      Snaptrimming
      Hardware or driver issues
    Monitoring
      iostat
      htop
      atop
    Diagnostics
  Extremely slow performance or no IO
    Flapping OSDs
    Jumbo frames
    Failing disks
    Slow OSDs
    Investigating PGs in a down state
    Large monitor databases
  Summary

Disaster Recovery
  What is a disaster?
  Avoiding data loss
  What can cause an outage or data loss?
  RBD mirroring
    The journal
    The rbd-mirror daemon
    Configuring RBD mirroring
    Performing RBD failover
    RBD recovery
  Lost objects and inactive PGs
  Recovering from a complete monitor failure
  Using the Ceph object store tool
  Investigating asserts
    Example assert
  Summary