Tuesday, February 7, 2012

Windows Server 2008 R2 Cluster Terminology

The following list contains the many terms associated with Windows

Server 2008 R2 clustering technologies:

Cluster—A cluster is a group of independent servers (nodes) that are accessed and

presented to the network as a single system.

Node—A node is an individual server that is a member of a cluster.

Cluster resource—A cluster resource is a service, application, IP address, disk, or

network name defined and managed by the cluster. Within a cluster, cluster

resources are grouped and managed together using cluster resource groups, now

known as Services and Applications groups.

Services and Applications group—Cluster resources are contained within a cluster

in a logical set called a Services and Applications group, historically referred to as a

cluster group. Services and Applications groups are the units of failover within the

cluster. When a cluster resource fails and cannot be restarted automatically, the

Services and Applications group this resource is a part of will be taken offline, moved

to another node in the cluster, and the group will be brought back online.
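
The offline, move, online sequence described above can be sketched as a toy model. This is only an illustration of the order of operations, not how the Failover Clustering feature is implemented; the `Group` class, the `fail_over` function, and the node names are all hypothetical, and picking the first other node is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Toy model of a Services and Applications group (illustrative only)."""
    name: str
    owner: str                     # node currently hosting the group
    online: bool = True
    log: list = field(default_factory=list)


def fail_over(group, nodes):
    """Take the group offline, move it to another node, bring it back online."""
    candidates = [n for n in nodes if n != group.owner]
    if not candidates:
        raise RuntimeError("no other node available to host the group")
    group.online = False
    group.log.append(f"offline on {group.owner}")
    group.owner = candidates[0]    # assumption: simply pick the first other node
    group.log.append(f"moved to {group.owner}")
    group.online = True
    group.log.append(f"online on {group.owner}")
    return group


g = fail_over(Group("FileServer", owner="NODE1"), ["NODE1", "NODE2"])
print(g.owner, g.log)
```

The group, not the individual resource, is what moves, which is why the model tracks a single owner per group.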

Client Access Point—A Client Access Point is a term used in Windows Server 2008

R2 failover clusters that represents the combination of a network name and associated

IP address resource. By default, when a new Services and Applications group is

defined, a Client Access Point is created with a name and an IPv4 address. IPv6 is

supported in failover clusters but an IPv6 resource either needs to be added to an

existing group or a generic Services and Applications group needs to be created with

the necessary resources and resource dependencies.
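
As a rough illustration of the name-plus-address pairing, a Client Access Point can be modeled as a small data structure. The class name, the network name `FILESRV01`, and the addresses (taken from the documentation ranges) are hypothetical; only Python's standard `ipaddress` module is used.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class ClientAccessPoint:
    """Illustrative model: a network name plus its IP address resources."""
    network_name: str
    addresses: list = field(default_factory=list)

    def add_address(self, text):
        # ip_address() validates the string and records whether it is IPv4 or IPv6
        self.addresses.append(ipaddress.ip_address(text))

    def versions(self):
        return {addr.version for addr in self.addresses}


cap = ClientAccessPoint("FILESRV01")   # hypothetical network name
cap.add_address("192.0.2.10")          # default case: name + IPv4 address
cap.add_address("2001:db8::10")        # IPv6 resource added afterwards
print(cap.network_name, sorted(cap.versions()))
```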

Virtual cluster server—A virtual cluster server is a Services and Applications group

that contains a Client Access Point, a disk resource, and at least one additional

service or application-specific resource. Virtual cluster server resources are accessed

either by the domain name system (DNS) name or a NetBIOS name that references

an IPv4 or IPv6 address. A virtual cluster server can in some cases also be directly

accessed using the IPv4 or IPv6 address. The name and IP address remain the same

regardless of which cluster node the virtual server is running on.

Active node—An active node is a node in the cluster that is currently running at

least one Services and Applications group. A Services and Applications group can

only be active on one node at a time and all other nodes that can host the group are

considered passive for that particular group.

Passive node—A passive node is a node in the cluster that is currently not running

any Services and Applications groups.

Active/passive cluster—An active/passive cluster is a cluster that has at least one

node running a Services and Applications group and additional nodes that can host

the group but are currently in a waiting state. This is a typical configuration

when only a single Services and Applications group is deployed on a failover cluster.

Active/active cluster—An active/active cluster is a cluster in which each node is

actively hosting or running at least one Services and Applications group. This is a

typical configuration when multiple groups are deployed on a single failover cluster

to maximize server or system usage. The downside is that when an active system

fails, the remaining system or systems need to host all of the groups and provide the

services and/or applications on the cluster to all necessary clients.
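
To make the active/passive versus active/active distinction and the capacity downside concrete, here is a minimal sketch. The function names and the abstract "load units" are assumptions for illustration, and summing spare capacity across survivors is a simplification, since a real group cannot be split between nodes.

```python
def classify(hosting):
    """hosting maps node name -> list of groups the node currently runs."""
    active = [node for node, groups in hosting.items() if groups]
    if not active:
        return "no groups online"
    return "active/active" if len(active) == len(hosting) else "active/passive"


def survivors_can_absorb(load, capacity, failed):
    """True if the remaining nodes have enough spare capacity for the failed node's load."""
    spare = sum(capacity[n] - load[n] for n in load if n != failed)
    return spare >= load[failed]


print(classify({"NODE1": ["SQL"], "NODE2": []}))
print(classify({"NODE1": ["SQL"], "NODE2": ["FileServer"]}))

capacity = {"NODE1": 100, "NODE2": 100}
print(survivors_can_absorb({"NODE1": 60, "NODE2": 50}, capacity, failed="NODE1"))
print(survivors_can_absorb({"NODE1": 40, "NODE2": 50}, capacity, failed="NODE1"))
```

In the third call NODE2 has only 50 units spare for a 60-unit load, which is the situation the active/active downside warns about.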

Cluster heartbeat—The cluster heartbeat is a term used to represent the communication

that is kept between individual cluster nodes that is used to determine node

status. Heartbeat communication can occur on a designated network but is also

performed on the same network as client communication. Due to this internode

communication, network monitoring software and network administrators should

be forewarned of the amount of network chatter between the cluster nodes. The

amount of traffic that is generated by heartbeat communication is not large in terms

of data size, but the frequency of the communication might ring some

network alarm bells.
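
The idea of judging node status from heartbeats, and why the message count adds up, can be sketched as follows. The one-second interval, the threshold of five missed heartbeats, and the assumption that every node sends to every other node are illustrative values chosen for this example, not settings taken from this article.

```python
def node_status(last_seen, now, interval=1.0, threshold=5):
    """Report 'up' until `threshold` heartbeat intervals have passed with no heartbeat."""
    missed = int((now - last_seen) // interval)
    return "down" if missed >= threshold else "up"


def heartbeats_per_minute(node_count, interval=1.0):
    """Message count per minute, assuming each node sends to every other node each interval."""
    directed_pairs = node_count * (node_count - 1)
    return int(directed_pairs * 60 / interval)


print(node_status(last_seen=10.0, now=12.5))   # 2 intervals missed
print(node_status(last_seen=10.0, now=15.0))   # 5 intervals missed
print(heartbeats_per_minute(4))
```

Each message is tiny, but under these assumptions a four-node cluster already produces hundreds of them per minute, which is the kind of pattern a monitoring tool may flag.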

Cluster quorum—The cluster quorum maintains the definitive cluster configuration

data and the current state of each node, each Services and Applications group,

and each resource and network in the cluster. Furthermore, when each node reads

the quorum data, depending on the information retrieved, the node determines if it

should remain available, shut down the cluster, or activate any particular Services

and Applications groups on the local node. To extend this even further, failover clusters

can be configured to use one of four different cluster quorum models and essentially

the quorum type chosen for a cluster defines the cluster. For example, a cluster

that utilizes the Node and Disk Majority Quorum can be called a Node and Disk

Majority cluster.
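
For the majority-based quorum models, the decision of whether the cluster stays up comes down to counting votes. The sketch below assumes one vote per node plus one optional witness vote and requires strictly more than half of all votes; the function name and parameters are hypothetical, and a model that does not rely on a majority is not covered by it.

```python
def has_quorum(nodes_up, total_nodes, witness_configured=False, witness_up=False):
    """True if strictly more than half of all configured votes are present."""
    total_votes = total_nodes + (1 if witness_configured else 0)
    votes_present = nodes_up + (1 if witness_configured and witness_up else 0)
    return votes_present * 2 > total_votes


# Two nodes plus a witness disk: one surviving node and the witness keep quorum.
print(has_quorum(nodes_up=1, total_nodes=2, witness_configured=True, witness_up=True))
# Same cluster, but the witness is also unreachable: quorum is lost.
print(has_quorum(nodes_up=1, total_nodes=2, witness_configured=True, witness_up=False))
# Five nodes without a witness: three nodes are enough, two are not.
print(has_quorum(nodes_up=3, total_nodes=5), has_quorum(nodes_up=2, total_nodes=5))
```

The witness vote is what lets a cluster with an even number of nodes survive the loss of half its nodes.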

Cluster witness disk or file share—The cluster witness disk or the witness file share is

used to store the cluster configuration information and to help determine the state

of the cluster when some, if not all, of the cluster nodes cannot be contacted.

Generic cluster resources—Generic cluster resources were created to define and

add new or undefined services, applications, or scripts that are not already included

as available cluster resources. Adding a custom resource provides the ability for that

resource to be failed over between cluster nodes when another resource in the same

Services and Applications group fails. Also, when the group the custom resource is a

member of moves to a different node, the custom resource will follow. One disadvantage

or lack of functionality with custom resources is that the Failover Clustering

feature cannot actively monitor the resource and, therefore, cannot provide the

same level of resilience and recoverability as with predefined cluster resources.

Generic cluster resources include the generic application, generic script, and generic

service resource.

Shared storage—Shared storage is a term used to represent the disks and volumes

presented to the Windows Server 2008 R2 cluster nodes as LUNs. In particular,

shared storage can be accessed by each node on the cluster, but not simultaneously.

Cluster Shared Volumes—A Cluster Shared Volume is a disk or LUN defined

within the cluster that can be accessed by multiple nodes in the cluster simultaneously.

This is unlike any other cluster volume, which normally can only be accessed

by one node at a time. Currently, the Cluster Shared Volume feature is only used

on Hyper-V clusters but its usage will be extended in the near future to any failover

cluster that will support live migration.
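
The difference between ordinary shared storage (one node at a time) and a Cluster Shared Volume (several nodes at once) can be shown with a toy access model. The `ClusterDisk` class, the disk names, and the node names are hypothetical, and the model ignores everything about how access is actually coordinated.

```python
class ClusterDisk:
    """Toy access model; class, disk, and node names are illustrative only."""

    def __init__(self, name, shared_volume=False):
        self.name = name
        self.shared_volume = shared_volume   # True models a Cluster Shared Volume
        self.accessors = set()

    def open_from(self, node):
        in_use_elsewhere = bool(self.accessors - {node})
        if in_use_elsewhere and not self.shared_volume:
            raise PermissionError(f"{self.name} is already in use by another node")
        self.accessors.add(node)


standard = ClusterDisk("Cluster Disk 1")
standard.open_from("NODE1")
try:
    standard.open_from("NODE2")
    second_access = "allowed"
except PermissionError:
    second_access = "refused"

shared = ClusterDisk("Volume1", shared_volume=True)
shared.open_from("NODE1")
shared.open_from("NODE2")
print(second_access, sorted(shared.accessors))
```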

LUN—LUN stands for Logical Unit Number. A LUN is used to identify a disk or a

disk volume that is presented to a host server or multiple hosts by a shared storage

array or a SAN. LUNs provided by shared storage arrays and SANs must meet many

requirements before they can be used with failover clusters but when they do, all

active nodes in the cluster must have exclusive access to these LUNs.

Failover—Failover is the process of a Services and Applications group moving from

the current active node to another available node in the cluster when a cluster

resource fails. Failover occurs when a server becomes unavailable or when a resource

in the cluster group fails and cannot recover within the failure threshold.
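
The "cannot recover within the failure threshold" condition can be pictured as a counter of recent failures. In this sketch the threshold is modeled as a maximum number of failures inside a sliding time period; the class name, the limit of two failures, and the 900-second period are assumptions made for the example.

```python
from collections import deque


class FailureTracker:
    """Decide between restarting in place and failing the group over."""

    def __init__(self, max_failures, period):
        self.max_failures = max_failures   # failures tolerated within the period
        self.period = period               # length of the sliding window in seconds
        self.failures = deque()

    def record_failure(self, now):
        self.failures.append(now)
        # Forget failures that are older than the period.
        while now - self.failures[0] > self.period:
            self.failures.popleft()
        return "failover" if len(self.failures) > self.max_failures else "restart"


tracker = FailureTracker(max_failures=2, period=900)
decisions = [tracker.record_failure(t) for t in (0, 100, 200)]
print(decisions)

spaced_out = FailureTracker(max_failures=2, period=900)
print([spaced_out.record_failure(t) for t in (0, 100, 2000)])
```

Three failures in quick succession exceed the limit and trigger a failover, while the same number of failures spread over a longer time only leads to restarts.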

Failback—Failback is the process of a cluster group automatically moving back to a

preferred node after the preferred node resumes operation. Failback is a nondefault

configuration that can be enabled within the properties of a Services and

Applications group. The cluster group must have a preferred node defined and a failback

threshold defined as well, for failback to function. A preferred node is the node

you would like your cluster group to be running or hosted on during regular cluster

operation when all cluster nodes are available. When a group is failing back, the

cluster performs the same failover operation, but the operation is triggered by the preferred

node rejoining or resuming cluster operation instead of by a resource failure on the

currently active node.
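
The conditions for failback can be summarized in a short check. This is a simplified sketch with hypothetical names; it leaves out the failback threshold and only captures that failback is off by default, needs a preferred node, and only applies once that node is back in the cluster.

```python
def should_fail_back(owner, preferred, nodes_up, enabled=False):
    """True if the group should move back to its preferred node."""
    if not enabled or preferred is None:
        return False               # failback is a nondefault setting
    if owner == preferred:
        return False               # already running where it belongs
    return preferred in nodes_up   # preferred node must have rejoined


# NODE1 is preferred, failed earlier, and has now rejoined the cluster.
print(should_fail_back("NODE2", "NODE1", {"NODE1", "NODE2"}, enabled=True))
# Same situation, but failback was never enabled on the group.
print(should_fail_back("NODE2", "NODE1", {"NODE1", "NODE2"}))
# Failback enabled, but the preferred node is still down.
print(should_fail_back("NODE2", "NODE1", {"NODE2"}, enabled=True))
```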

Live Migration—Live Migration is a new feature of Hyper-V that is enabled when

Virtual Machines are deployed on a Windows Server 2008 R2 failover cluster. Live

Migration enables Hyper-V virtual machines on the failover cluster to be moved

between cluster nodes without disrupting communication or access to the virtual

machine. Live Migration utilizes a Cluster Shared Volume that is accessed by all

nodes in the group simultaneously and it transfers the memory between the nodes

during active client communication to maintain availability. Live Migration is

currently only used with Hyper-V failover clusters but will most likely extend to

many other Microsoft services and applications in the near future.
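
One way to picture memory being transferred while clients stay connected is an iterative copy: memory pages are sent while the virtual machine keeps running, pages that change in the meantime are sent again, and the final switch happens once little is left. The sketch below assumes that approach and uses made-up numbers; the function name, the dirty percentage, and the cut-over limit are all hypothetical.

```python
def precopy_rounds(total_pages, dirty_percent, cutover_pages=50, max_rounds=10):
    """Return (pages sent in each round, pages left for the final brief switch-over)."""
    remaining = total_pages
    sent_per_round = []
    for _ in range(max_rounds):
        sent_per_round.append(remaining)
        # While a round is being copied, the running VM changes some pages again.
        remaining = remaining * dirty_percent // 100
        if remaining <= cutover_pages:
            break
    return sent_per_round, remaining


rounds, final_pages = precopy_rounds(total_pages=10_000, dirty_percent=10)
print(rounds, final_pages)
```

Each round has less to send than the one before, so only a small remainder has to be moved during the switch itself.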

Quick Migration—With Hyper-V virtual machines on failover clusters, Quick

Migration provides the option for failover cluster administrators to move the virtual

machine to another node without shutting the virtual machine off. This utilizes the

virtual machine’s shutdown settings options and if set to Save, the default setting,

performing a Quick Migration will save the current memory state, move the virtual

machine to the desired node, and resume operation shortly. End users should only

encounter a short disruption in service and should reconnect without issue depending

on the service or application hosted within that virtual machine. Quick

Migration does not require Cluster Shared Volumes to function.

Geographically dispersed clusters—These are clusters that span physical locations

and sometimes networks to provide failover functionality in remote buildings and

data centers, usually across a WAN link. These clusters can now span different

networks and can provide failover functionality, but network response and throughput

must be good and data replication is not handled by the cluster.

Multisite cluster—Geographically dispersed clusters are commonly referred to as

multisite clusters as cluster nodes are deployed in different Active Directory sites.

Multisite clusters can provide access to resources across a WAN and can support

automatic failover of Services and Applications groups defined within the cluster.

Stretch clusters—A stretch cluster is a common term that, in some cases, refers to

geographically dispersed clusters in which different subnets are used but each of the

subnets is part of the same Active Directory site—hence, the term stretch, as in

stretching the AD site across the WAN. In other cases, this term is used to describe

a geographically dispersed cluster, as in the cluster stretches between geographic

locations.