Last updated
Key Knowledge Areas:
- Understand the goals of High Availability and Site Reliability Engineering
- Understand common cluster architectures
- Understand recovery and cluster reorganization mechanisms
- Design an appropriate cluster architecture for a given purpose
- Understand application aspects of high availability
- Understand operational considerations of high availability

Partial list of the used files, terms and utilities:
Key Knowledge Areas:
- Understand the concepts of LVS / IPVS
- Understand the basics of VRRP
- Configure keepalived
- Configure ldirectord
- Configure backend server networking
- Understand HAProxy
- Configure HAProxy

Partial list of the used files, terms and utilities:
Key Knowledge Areas:
- Understand the architecture and components of Pacemaker (CIB, CRMd, PEngine, LRMd, DC, STONITHd)
- Manage Pacemaker cluster configurations
- Understand Pacemaker resource classes (OCF, LSB, Systemd, Service, STONITH, Nagios)
- Manage Pacemaker resources
- Manage resource rules and constraints (location, order, colocation)
- Manage advanced resource features (templates, groups, clone resources, multi-state resources)
- Obtain node information and manage node health
- Manage quorum and fencing in a Pacemaker cluster
- Configure the Split Brain Detector on shared storage
- Manage Pacemaker using pcs
- Manage Pacemaker using crmsh
- Configure and manage corosync in conjunction with Pacemaker
- Awareness of Pacemaker ACLs
- Awareness of other cluster engines (OpenAIS, Heartbeat, CMAN)

Partial list of the used files, terms and utilities:
Key Knowledge Areas:
- Understand the DRBD architecture
- Understand DRBD resources, states and replication modes
- Configure DRBD disks and devices
- Configure DRBD networking connections and meshes
- Configure DRBD automatic recovery and error handling
- Configure DRBD quorum and handlers for split brain and fencing
- Manage DRBD using drbdadm
- Understand the principles of drbdsetup and drbdmeta
- Restore and verify the integrity of a DRBD device after an outage
- Integrate DRBD with Pacemaker
- Understand the architecture and features of LINSTOR

Partial list of the used files, terms and utilities:
Key Knowledge Areas:
- Understand the concepts of Storage Area Networks
- Understand the concepts of Fibre Channel, including Fibre Channel Topologies
- Understand and manage iSCSI targets and initiators
- Understand and configure Device Mapper Multipath I/O (DM-MPIO)
- Understand the concept of a Distributed Lock Manager (DLM)
- Understand and manage clustered LVM
- Manage DLM and LVM with Pacemaker

Partial list of the used files, terms and utilities:
Key Knowledge Areas:
- Understand the principles of cluster file systems and distributed file systems
- Understand the Distributed Lock Manager
- Create, maintain and troubleshoot GFS2 file systems in a cluster
- Create, maintain and troubleshoot OCFS2 file systems in a cluster
- Awareness of the O2CB cluster stack
- Awareness of other commonly used clustered file systems, such as AFS and Lustre

Partial list of the used files, terms and utilities:
Key Knowledge Areas:
- Understand the architecture and components of GlusterFS
- Manage GlusterFS peers, trusted storage pools, bricks and volumes
- Mount and use an existing GlusterFS
- Configure high availability aspects of GlusterFS
- Scale up a GlusterFS cluster
- Replace failed bricks
- Recover GlusterFS from a physical media failure
- Restore and verify the integrity of a GlusterFS cluster after an outage
- Awareness of GNFS

Partial list of the used files, terms and utilities:
Key Knowledge Areas:
- Understand the architecture and components of Ceph
- Manage OSD, MGR, MON and MDS
- Understand and manage placement groups and pools
- Understand storage backends (FileStore and BlueStore)
- Initialize a Ceph cluster
- Create and manage RADOS Block Devices
- Create and manage CephFS volumes, including snapshots
- Mount and use an existing CephFS
- Understand and adjust CRUSH maps
- Configure high availability aspects of Ceph
- Scale up a Ceph cluster
- Restore and verify the integrity of a Ceph cluster after an outage
- Understand key concepts of Ceph updates, including update order, tunables and features

Partial list of the used files, terms and utilities:
Key Knowledge Areas:
- Understand and monitor S.M.A.R.T. values using smartmontools, including triggering frequent disk checks
- Configure system shutdown at specific UPS events
- Configure monit for alerts in case of resource exhaustion

Partial list of the used files, terms and utilities:
Key Knowledge Areas:
- Manage RAID devices using various RAID levels, including hot spare disks, partitionable RAIDs and RAID containers
- Add and remove devices from an existing RAID
- Change the RAID level of an existing device
- Recover a RAID device after a failure
- Understand various metadata formats and RAID geometries
- Understand availability and performance properties of various RAID levels
- Configure mdadm monitoring and reporting

Partial list of the used files, terms and utilities:
Key Knowledge Areas:
- Understand and manage LVM, including linear and striped volumes
- Extend, grow, shrink and move LVM volumes
- Understand and manage LVM snapshots
- Understand and manage LVM thin and thick pools
- Understand and manage LVM RAIDs

Partial list of the used files, terms and utilities:
Key Knowledge Areas:
- Understand and configure network interface bonding
- Understand network bond modes and algorithms (active-backup, balance-tlb, balance-alb, 802.3ad, balance-rr, balance-xor, broadcast)
- Configure switches for high availability, including RSTP
- Configure VLANs on regular and bonded network interfaces
- Persist bonding and VLAN configuration
- Understand the principle of autonomous systems and BGP to manage external redundant uplinks
- Awareness of traffic shaping and control capabilities of Linux

Partial list of the used files, terms and utilities: