stonith_admin
stonith_admin is a command-line tool for managing STONITH (Shoot The Other Node In The Head) devices and operations in Pacemaker-based high-availability clusters. It is used to query, configure, and trigger STONITH devices, which fence failed nodes to preserve cluster integrity.
Syntax
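In general, stonith_admin takes a single command flag plus any options that command requires. A minimal sketch of the invocation form (exact flags vary between Pacemaker versions):

```
stonith_admin <command> [options]
```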
Common Options and Commands
-L, --list-all List all STONITH devices configured in the cluster.
-Q, --query Query the status of a specific STONITH device.
-H, --history Display the history of STONITH operations, showing which nodes were fenced and when.
-F, --fence Fence (power off or reboot) a specific node in the cluster manually.
-U, --unfence Unfence (re-enable) a specific node in the cluster.
-R, --register Register a new STONITH device in the cluster. You can specify the STONITH agent (e.g., external/ipmi) and options for the agent.
-D, --deregister Deregister (remove) a STONITH device from the cluster.
-S, --status Display the status of all STONITH devices.
-t, --test Simulate a STONITH action without actually fencing a node.
-v, --verbose Run the command with more detailed output.
Example Usage
List all STONITH devices in the cluster:
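For example, using the short form of the flag listed above (long option names can vary between Pacemaker releases):

```
stonith_admin -L
```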
Query a specific STONITH device (e.g., my_stonith_device):
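For example, using the -Q flag described above to check the device's status:

```
stonith_admin -Q my_stonith_device
```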
Fence (power off) a node named node1:
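For example (fencing is disruptive, so only run this against a node you actually intend to power off or reboot):

```
stonith_admin -F node1
```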
Register a new STONITH device for IPMI-based fencing:
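A sketch of a register call. The device name ipmi_fence and the parameter values are illustrative placeholders; the exact parameters (passed as -o NAME=VALUE pairs) an agent accepts are documented by the agent itself:

```
stonith_admin -R ipmi_fence -a external/ipmi \
  -o hostname=node1 \
  -o ipaddr=192.0.2.10 \
  -o userid=admin \
  -o passwd=secret
```

Note that registering a device this way configures the fencer directly rather than the cluster configuration (CIB); for a persistent setup, fencing resources are normally created with a higher-level tool such as pcs or crm.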
Display STONITH operation history:
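For example (recent Pacemaker versions also accept '*' in place of a node name to show the history for all nodes, though this is worth verifying against your installed version):

```
stonith_admin -H node1
```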
STONITH Agents
There are various STONITH agents available depending on the type of fencing hardware or software being used. Examples include:
external/ipmi: For fencing via IPMI-based power management.
fence_vmware: For fencing virtual machines using VMware tools.
fence_xvm: For Xen-based virtualization environments.
Each agent requires specific parameters (e.g., IP address, login credentials, etc.) to interact with the fencing device.
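To discover which parameters a particular agent expects, its metadata can be queried. A minimal sketch, assuming the -M/--metadata and -a/--agent flags are available in your stonith_admin build:

```
stonith_admin -M -a fence_xvm
```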
Conclusion
stonith_admin is an essential tool for managing fencing operations in a Pacemaker-based cluster. It provides mechanisms to manually fence nodes, query fencing devices, and maintain high availability by ensuring that misbehaving or failing nodes are removed from the cluster before they can cause data corruption or other issues.