The Cluster Command Control (C3) tools are a suite of cluster tools developed at Oak Ridge National Laboratory that are useful for both administration and application support. The suite includes tools for cluster-wide command execution, file distribution and gathering, process termination, remote shutdown and restart, and system image updates.
A short description of each tool follows:

cexec       executes a given command string on each node of a cluster
cget        retrieves files from each node of a cluster
ckill       terminates a user-specified process on each node of a cluster
cpush       distributes files or directories to each node of a cluster
cpushimage  updates the system image on each node of a cluster
crm         removes files from each node of a cluster
cshutdown   shuts down or restarts each node of a cluster
clist       lists the clusters described in the configuration file
cname       returns the node names at the given positions
cnum        returns the positions of the given node names
The default method of execution for the tools is to run the command on all cluster nodes concurrently. However, a serial version of cexec is also provided that may be useful for deterministic execution and debugging. To invoke the serial version of cexec, type cexecs instead of cexec. For more information on how to use each tool, see the man page for the specific tool.
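For example, a simple command such as hostname could be run either way; the serial form simply visits each node in turn:
$ cexec hostname
$ cexecs hostname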
Specific instances of C3 commands identify their compute nodes with the help of cluster configuration files: files that name a set of accessible clusters and describe the machines in each of those clusters. The default cluster configuration file, /etc/c3.conf, consists of a list of cluster descriptor blocks: syntactic objects that each name and describe a single cluster accessible to that system's users. The following is an example of a default configuration file containing exactly one cluster descriptor block, which describes a cluster with a head node and 64 compute nodes:
cluster cartman {
	cartman-head:node0   #head node
	node[1-64]           #compute nodes
}
The cluster tag denotes a new cluster descriptor block, and the word that follows is the name of the cluster (in this example, ``cartman''). The first line inside the block defines the head node: the external interface name, a colon, and then the internal interface name (for example, an outside user can log in to the cluster by ssh'ing to ``cartman-head.mydomain.com''). If only one name is specified, it is assumed to be both the external and the internal interface. The node definitions start on the next line. Nodes can be declared either as ranges or as single machines; the above example uses a range, node1 through node64.
Two tags are used to mark nodes as offline: exclude and dead. exclude marks offline nodes that were declared as part of a range, while dead marks a single node declaration as offline. The example below declares 32 nodes in a range, several of them offline, followed by four more nodes listed singly, two of which are offline.
cluster kenny {
	node0              #head node
	dead placeholder   #change command line to 1 indexing
	node[1-32]         #first set of nodes
	exclude 30         #offline nodes in the range
	exclude [5-10]
	node100            #single node definition
	dead node101       #offline node
	dead node102
	node103
}
One other thing to note is the use of a placeholder node. When specifying ranges on the command line, a node's position in the configuration file is what matters, and command-line ranges are zero-indexed.
For example, in the cartman cluster (the first example), node1 occupies position 0, which may not be very intuitive to a user. Placing an offline placeholder node in front of the real compute nodes shifts the indexing of the C3 command-line ranges: in the kenny cluster (the second example), node1 occupies position 1.
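To illustrate, the same command-line range selects different machines in the two example clusters because of the placeholder:
$ cexec cartman:1-4 hostname   #runs on node2 through node5
$ cexec kenny:1-4 hostname     #runs on node1 through node4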
For a more detailed example, see the c3.conf man page.
Positions can be specified in two ways: as a range or as a single node. Ranges take the format m-n, where m is a non-negative integer and n is a number larger than m. Single positions are given as just the position number.
If discontinuous ranges are needed, each range must be separated by a ``,''. The range ``0-5, 9, 11'' would execute on positions 0, 1, 2, 3, 4, 5, 9, and 11 (nodes marked as offline will not participate in execution).
Two tools help keep track of which node is at which position: cname(1) and cnum(1). cnum assumes that you know the node names and want to know their positions; cname takes a range argument and returns the node names at those positions (if no range is specified, it assumes you want all the nodes in the cluster). See their man pages for details of use.
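For instance, with the cartman cluster as the default, a session might look something like the following (the exact argument forms are documented in the man pages):
$ cname :0-2    #prints the node names at positions 0, 1, and 2
$ cnum node3    #prints the position that node3 occupies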
NOTE: ranges begin at zero!
Machine definitions are how C3 specifies clusters and ranges on the command line. A machine definition can take one of four forms. The first is that none is specified at all; in this case C3 executes on all the nodes of the default cluster (the default cluster is the first cluster in the configuration file). An example would be as follows:
$ cexec ls -l
The second case is a range on the default cluster. This takes the form :range. An example cexec would be as follows:
$ cexec :1-4,6 ls -l
This would execute ls on the nodes at positions 1, 2, 3, 4, and 6 of the default cluster. The third method is specifying a specific cluster; this takes the form cluster_name:. An example cexec would be as follows:
$ cexec cartman: ls -l
This would execute ls on every node in cluster cartman. The fourth (and final) way of specifying a machine definition is a range on a specific cluster. This takes the form cluster_name:range. An example cexec would be as follows:
$ cexec cartman:2-4,10 ls -l
This would execute ls on the nodes at positions 2, 3, 4, and 10 of cluster cartman. These four methods can be mixed on a single command line. For example,
$ cexec :0-4 stan: kyle:1-5 ls -l
is a valid command. It would execute ls on the nodes at positions 0 through 4 of the default cluster, on all of stan, and on the nodes at positions 1 through 5 of kyle (the stan and kyle cluster configuration blocks are not shown here). In this way one could easily do things such as add a user to several clusters or read /var/log/messages for an event across many clusters. See the c3-range man page for more detail.
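For instance, assuming the stan and kyle clusters are also defined in /etc/c3.conf, scanning every node's system log for errors across several clusters could be done with a single command along these lines:
$ cexec cartman: stan: kyle: grep -i error /var/log/messages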
In most cases, a C3 tool tries to mimic the standard Linux command it is based on, to make using the cluster as transparent as possible. One notable difference is the handling of interactive options: because one would not want to answer yes or no once for every node, C3 asks only once when an interactive option is specified. This is very important to remember when running commands such as ``crm -all -R /tmp/foo'' (recursively delete /tmp/foo on every node in every cluster you have access to).
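As a sketch, assuming crm accepts an interactive flag spelled -i like the rm command it mimics, the single confirmation would look like this:
$ crm -i -all -R /tmp/foo   #prompts once, then removes /tmp/foo on every node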
Use of multiple clusters does not have to mean listing specific nodes; nodes can also be grouped based on role, essentially forming a meta-cluster. For example, if one wishes to single out PBS server nodes for specific tasks, it is possible to create a cluster called pbs-servers and execute a given command only on that cluster. It is also useful to separate nodes out based on hardware (e.g., fast-ether/gig-ether), software (e.g., some nodes may have a specific compiler), or role (e.g., pbs-servers).
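As a sketch, such a role-based grouping could be expressed as an additional cluster descriptor block in /etc/c3.conf (the node names here are placeholders):

cluster pbs-servers {
	cartman-head:node0   #head node
	node[1-4]            #nodes that run PBS servers
}

A command such as ``cexec pbs-servers: uptime'' would then address only those nodes.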