By Brad McGehee
If your mission-critical SQL Server should experience a motherboard failure, how long will it be down? One hour, four hours, a day, or longer? How much will this cost your business in lost sales or productivity? And perhaps even more important to you, what will it do to your stress level?
Being a SQL Server DBA can be demanding and stressful, especially as the success of your company is often a function of your SQL Server’s uptime. While we, as DBAs, have some control over the uptime of our SQL Servers, we don’t have full control. There is not much we can do if a motherboard fails on a server, other than be prepared.
As you may already be aware, there is one way to help boost your SQL Server’s uptime, and that is by clustering SQL Servers. This way, should one SQL Server fail in the cluster, another clustered server will automatically take over, keeping downtime to minutes, instead of hours or more.
The purpose of this article is to introduce you to SQL Server clustering, along with its pros and cons. If you are considering clustering SQL Server to help reduce potential downtime, this article is a good place to start.
What is Clustering?
Clustering is best described as a technology that allows one physical server to automatically take over the tasks and responsibilities of another physical server that has failed. Given that all computer hardware and software will eventually fail, the obvious goal is to ensure that users running mission-critical applications experience little or no downtime when such a failure occurs. Downtime can be very expensive, and our goal as DBAs is to reduce it as much as possible.
More specifically, clustering refers to a group of two or more servers (generally called nodes) that work together and present themselves to the network as a single virtual server. In other words, when a client connects to clustered SQL Servers, it thinks there is only a single SQL Server, not more than one. When one of the nodes fails, its responsibilities are taken over by another server in the cluster, and the end user notices little, if any, difference before, during, and after the failover.
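To make the idea of a single virtual server more concrete, here is a toy Python sketch. It is only a conceptual model, not how the Windows cluster software is actually implemented, and every name in it (Node, VirtualServer, SQLVIRTUAL1, NODE_A, NODE_B) is made up for illustration. It shows a client sending the same query to the same virtual name before and after the active node fails.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One physical server in the cluster (hypothetical model)."""
    name: str
    healthy: bool = True


@dataclass
class VirtualServer:
    """The single name clients connect to; it hides which node is active."""
    name: str
    nodes: list
    active_index: int = 0

    def active_node(self):
        return self.nodes[self.active_index]

    def failover_if_needed(self):
        """If the active node has failed, hand its role to a healthy node."""
        if self.active_node().healthy:
            return False
        for i, node in enumerate(self.nodes):
            if node.healthy:
                self.active_index = i
                return True
        raise RuntimeError("no healthy node left in the cluster")

    def query(self, sql):
        """Clients only ever see the virtual server name in the reply."""
        self.failover_if_needed()
        return f"{self.name} answered: {sql}"


cluster = VirtualServer("SQLVIRTUAL1", [Node("NODE_A"), Node("NODE_B")])
before = cluster.query("SELECT 1")
cluster.nodes[0].healthy = False  # simulate a motherboard failure on NODE_A
after = cluster.query("SELECT 1")
```

In this sketch the client's view (the before and after replies) is identical, even though the work has moved from NODE_A to NODE_B. A real failover also involves moving ownership of the shared disk and restarting services, which this model leaves out.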
Microsoft added clustering features to its operating system when it introduced Windows NT Server 4.0 Enterprise Edition several years ago. The actual clustering feature was called MSCS (Microsoft Cluster Server). While some brave folks actually put the software into production, I personally avoided it, as it was not as dependable as Microsoft led you to believe. About this same time, SQL Server 6.5 Enterprise Edition was released, and it could be clustered. This was a very crude attempt at clustering SQL Server that was rarely implemented in the real world.
Later, when SQL Server 7.0 became available, there were major improvements in SQL Server clustering, but it was still not good enough. Windows NT Server 4.0 Enterprise Edition’s MSCS was still being used as its foundation, and it just wasn’t dependable enough for maintaining high-availability servers.
Fortunately, Microsoft’s second attempt at clustering, now called the Microsoft Cluster Service, in Windows 2000 Advanced Server and Windows 2000 Datacenter Server, is much better. While I won’t claim it is perfect, I am now at least willing to use it to cluster mission-critical SQL Servers. Cluster Service not only works well when clustering SQL Server 7.0, it is even better if you want to cluster SQL Server 2000, which has greatly enhanced clustering support.
One very important aspect of clustering that often gets overlooked is that it is not a complete backup system for your applications. It is only one part of a multi-part strategy required to ensure minimum downtime and 100% recoverability.
The main benefit that clustering provides is the ability to recover from failed server hardware (excluding the shared disk) and failed software, such as failed services or a...