Which Exchange 2007 High Availability Solution should I choose?

by Shijaz Abdulla on 14.05.2009 at 16:06

For those of you still upgrading to Exchange Server 2007 or consolidating your Exchange servers, and considering one of the high availability solutions, I have a clear recommendation:

Avoid Single Copy Cluster (SCC); Use CCR (Cluster Continuous Replication) or SCR (Standby Continuous Replication) instead!

Why?

  • SCC is not a complete HA solution: the shared storage is a single point of failure! In CCR or SCR, there are two copies of the same data.
  • You don’t need shared storage for CCR or SCR.
  • You don’t need third-party replication software (such as Double-Take) to span the database across two data centers.
  • Failover behavior is improved compared to SCC.
  • Installation is easier than SCC, and you do not need to perform additional hardware validation because shared storage is not required.
  • Easier to manage.
  • Backup performance improves, because backups can run from the passive copy of the data.
  • Single Copy Cluster (SCC) is being discontinued in Exchange Server 2010.

So, for all future Exchange 2007 HA designs, please keep this in mind!

Fig 1. (below) Single Copy Cluster

Fig 2. (below) Cluster Continuous Replication

Implementing a two-node single copy cluster in Exchange Server 2007

by Shijaz Abdulla on 03.05.2009 at 17:29

This article used to exist on www.shijaz.com before it was taken down in May 2009. Originally published in July 2007.

This article explains, step by step, how to implement a Single Copy Cluster in Microsoft Exchange Server 2007.

Background

For Exchange Server 2003 administrators:

Short and sweet: A two-node Single Copy Cluster in Exchange Server 2007 works just about the same way a two-node Active-Passive cluster works in Exchange Server 2003.

For newbie Exchange administrators:

It is assumed you know what the following terms mean:

  • Cluster

  • Node

  • Failover

  • Storage Group

  • SAN

A two-node single copy cluster (SCC) is a clustered mailbox server that uses shared storage in a failover cluster configuration to allow multiple servers to manage a single copy of the storage groups. In short, the Exchange data is stored on a shared storage device (such as a Storage Area Network, or SAN) that is connected to two servers but can be accessed by only one of them at a time. The server that has access to the storage at any given point in time is called the Active node, and the server that is not active is called the Passive node. When the active node fails, the passive node gains access to the shared storage and the Exchange services start on it. The passive node then becomes the active node; this process is called failover.
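To make the active/passive idea concrete, here is a minimal Python sketch of the behavior described above. It is a toy model, not how the Windows cluster service actually works (there is no heartbeat, quorum or disk arbitration here), and the class and node names are hypothetical. It shows three things: only one node owns the shared storage at a time, losing the active node moves ownership to the passive node, and losing the shared storage takes the mailbox server offline no matter which node is up.

```python
from dataclasses import dataclass, field


@dataclass
class SingleCopyCluster:
    """Toy model of a two-node SCC: one shared disk, one owner at a time."""
    nodes: list
    active: str = ""
    storage_ok: bool = True
    down: set = field(default_factory=set)

    def __post_init__(self):
        # The first node listed starts out as the active node.
        self.active = self.nodes[0]

    def fail_node(self, node):
        """Mark a node as failed; fail over if it was the active node."""
        self.down.add(node)
        if node == self.active:
            survivors = [n for n in self.nodes if n not in self.down]
            # The passive node takes ownership of the shared storage.
            self.active = survivors[0] if survivors else ""

    def mailbox_online(self):
        # Service needs a live owner node AND healthy shared storage.
        return bool(self.active) and self.storage_ok


cluster = SingleCopyCluster(nodes=["NODE1", "NODE2"])
cluster.fail_node("NODE1")
print(cluster.active, cluster.mailbox_online())  # NODE2 True
cluster.storage_ok = False  # the shared storage itself fails
print(cluster.mailbox_online())  # False
```

The last check is the single point of failure that the shared storage represents in an SCC: failing over to the other node does not help when the one copy of the data is gone.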


Procedure

Task 1: Configure Network Cards

Configure two network cards in each node: a public network card for the clients and a private network card for the two server nodes.

  1. To configure a cluster, you need a minimum of two network cards on each node. Verify that you have at least two on each of your two servers.

  2. To easily identify the network cards, rename one card to "Public" and the other to "Private". The Public NIC on each server connects to your LAN and has an IP address on your local LAN. The Private NIC on each server connects to a private network shared between your two nodes. This can be a crossover cable connected directly between the Private NIC of Node1 and the Private NIC of Node2. For the Private interfaces, use an IP address scheme that is different from your LAN IP range. The Private interface is used for "heartbeat" communication between the nodes (to check whether the other node is "alive").
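Before assigning addresses, it is worth checking the plan for overlaps. Here is a minimal Python sketch using the standard ipaddress module; the subnets, node names and addresses are hypothetical placeholders for your own plan.

```python
import ipaddress

# Hypothetical addressing plan: replace with your own LAN and heartbeat ranges.
LAN = ipaddress.ip_network("192.168.1.0/24")
HEARTBEAT = ipaddress.ip_network("10.10.10.0/29")

NODES = {
    "NODE1": {"Public": "192.168.1.11", "Private": "10.10.10.1"},
    "NODE2": {"Public": "192.168.1.12", "Private": "10.10.10.2"},
}


def check_nic_plan(lan, heartbeat, nodes):
    """Return a list of problems found in a two-network cluster addressing plan."""
    problems = []
    if lan.overlaps(heartbeat):
        problems.append("private (heartbeat) range overlaps the LAN range")
    for name, nics in nodes.items():
        if ipaddress.ip_address(nics["Public"]) not in lan:
            problems.append(f"{name}: Public NIC is not on the LAN")
        if ipaddress.ip_address(nics["Private"]) not in heartbeat:
            problems.append(f"{name}: Private NIC is not on the heartbeat network")
    return problems


print(check_nic_plan(LAN, HEARTBEAT, NODES) or "plan looks consistent")
```

An empty list means the heartbeat range is separate from the LAN and every NIC sits on the network it is named after.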

Task 2: Configure Shared Storage

Configure shared data storage, and assign the same drive letter to the shared disks on both nodes in the SCC cluster.

  1. Configure your shared storage device and create volumes for use by the Exchange cluster. For information on how to do this, refer to your hardware documentation or vendor.

  2. Once the volumes have been created, map them to the same drive letter on both servers using Disk Management (right-click My Computer > Manage > Disk Management).
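One way to double-check step 2 is to compare the volume-to-drive-letter mapping recorded on each node. This Python sketch assumes you have already collected the mappings yourself (for example, by reading them off Disk Management on each node); the volume labels, letters and function name are hypothetical.

```python
# Hypothetical mappings of shared volume label -> drive letter, per node.
node_mappings = {
    "NODE1": {"ExchData": "S:", "ExchLogs": "T:"},
    "NODE2": {"ExchData": "S:", "ExchLogs": "L:"},
}


def letter_mismatches(mappings):
    """Return {volume: {node: letter}} for volumes not mapped identically on every node."""
    volumes = set().union(*(m.keys() for m in mappings.values()))
    mismatches = {}
    for vol in sorted(volumes):
        # A volume missing on a node shows up as None, which also counts as a mismatch.
        letters = {node: m.get(vol) for node, m in mappings.items()}
        if len(set(letters.values())) != 1:
            mismatches[vol] = letters
    return mismatches


print(letter_mismatches(node_mappings))
# {'ExchLogs': {'NODE1': 'T:', 'NODE2': 'L:'}}
```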

Task 3: Create Windows Cluster User Account

Create a Windows cluster service account that the Cluster service will use to start and stop services during failover. The necessary permissions for this account are granted when you configure the cluster.

  1. Open Active Directory Users & Computers

  2. Create a user, for example CLUSTERADMIN.

  3. Set Password Never Expires for this user. You don’t want the password time bomb to blow up in your face!
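For the curious: ticking Password Never Expires in Active Directory Users & Computers sets the ADS_UF_DONT_EXPIRE_PASSWD bit (0x10000) in the account's userAccountControl attribute. The Python sketch below only shows the bit arithmetic, so you can sanity-check the value if you read the attribute with whatever directory tool you prefer; reading it from AD is not shown, and the helper name is my own.

```python
# Bit flags from the userAccountControl attribute (ADS_USER_FLAG_ENUM).
ADS_UF_NORMAL_ACCOUNT = 0x0200
ADS_UF_DONT_EXPIRE_PASSWD = 0x10000


def password_never_expires(user_account_control):
    """True if the 'Password never expires' bit is set in userAccountControl."""
    return bool(user_account_control & ADS_UF_DONT_EXPIRE_PASSWD)


uac = ADS_UF_NORMAL_ACCOUNT | ADS_UF_DONT_EXPIRE_PASSWD
print(uac, password_never_expires(uac))  # 66048 True
print(password_never_expires(ADS_UF_NORMAL_ACCOUNT))  # False
```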

Task 4: Create the Cluster

Create a new cluster on the first node by using the graphical Cluster Administrator tool or the cluster.exe command-line tool.

  1. See my article "How to setup an Exchange 2003 cluster" and follow only step 1 and step 2 to create the cluster.

  2. Add the second node to the cluster by specifying the computer name and the password for the cluster service account. If you want to create a multi-node cluster, add all the nodes in this step.

Task 5: Install Mailbox Server Role

Install the Exchange Server 2007 Mailbox Server role on the active node.

  1. Start Exchange Server 2007 setup and choose Custom Exchange Server Installation. Select the Active Clustered Mailbox Role.

  2. During installation, you will be prompted for the clustered mailbox server's virtual name and virtual IP address. This is the "virtual" hostname/IP that stays online regardless of which node is up. The virtual hostname and virtual IP address are created as resources on the cluster, and clients are configured to use this virtual hostname.
    Note: You can also run setup from the command prompt with the following options: /newCMS /CMSName:<ClusteredMailboxServerName> /CMSIPAddress:<ClusteredMailboxServerIPAddress> /CMSSharedStorage /CMSDataPath:<CMSDataPath>

  3. If applicable, move existing storage groups and mailbox databases to the active node by using the Move-StorageGroupPath and Move-DatabasePath cmdlets in the Exchange Management Shell. Brief syntax is as follows:
    Move-StorageGroupPath -Identity <StorageGroupIdParameter> [-ConfigurationOnly <SwitchParameter>] [-CopyLogFolderPath <NonRootLocalLongFullPath>] [-CopySystemFolderPath <NonRootLocalLongFullPath>] [-DomainController <Fqdn>] [-Force <SwitchParameter>] [-LogFolderPath <NonRootLocalLongFullPath>] [-SystemFolderPath <NonRootLocalLongFullPath>]
    Move-DatabasePath -Identity <DatabaseIdParameter> [-ConfigurationOnly <SwitchParameter>] [-CopyEdbFilePath <EdbFilePath>] [-DomainController <Fqdn>] [-EdbFilePath <EdbFilePath>] [-Force <SwitchParameter>]

  4. Your first node is now ready. Install the Mailbox Server role on the passive node: select Custom Exchange Server Installation and choose the Passive Clustered Mailbox Role option. Once setup completes, you will be able to fail over from the active node to the passive node. Test the failover using the Move-ClusteredMailboxServer cmdlet.
    Important: Always test the failover (also called ‘Handoff’) using the Move-ClusteredMailboxServer cmdlet on Exchange Server 2007. It is recommended NOT to use the Move Group option in Cluster Administrator.

  5. Move mailboxes or create new mailboxes on the active node.
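If you script the installation, the command line from the note in step 2 can be assembled in code. This Python sketch only builds the argument list and does not run anything. build_scc_setup_args is a hypothetical helper, the CMS name, IP address and data path are placeholders, and the exact switch spellings (in particular /CMSDataPath:) are my assumption, so check them against the Setup.com help on your Exchange 2007 build before relying on them.

```python
import ipaddress


def build_scc_setup_args(cms_name, cms_ip, data_path):
    """Hypothetical helper: Setup.com arguments for an SCC clustered mailbox server."""
    ip = ipaddress.ip_address(cms_ip)  # raises ValueError on a mistyped address
    return [
        "Setup.com",
        "/newCMS",
        f"/CMSName:{cms_name}",
        f"/CMSIPAddress:{ip}",
        "/CMSSharedStorage",          # SCC: the data lives on shared disks
        f"/CMSDataPath:{data_path}",  # a folder on the shared volume
    ]


args = build_scc_setup_args("EXCMS01", "192.168.1.20", r"S:\ExchangeData")
print(" ".join(args))
# Setup.com /newCMS /CMSName:EXCMS01 /CMSIPAddress:192.168.1.20 /CMSSharedStorage /CMSDataPath:S:\ExchangeData
```

Validating the IP address up front catches a typo before setup is ever launched.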

MSExchangeSA Event 9396 while generating Offline Address Book

by Shijaz Abdulla on 28.01.2009 at 09:49

January 28, 2009

Log Name:      Application
Source:        MSExchangeSA
Date:          1/28/2009 10:18:59 AM
Event ID:      9396
Task Category: OAL Generator
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      servername
Description:
OALGen is running on a single-copy cluster (SCC) node that does not have the registry value ‘SYSTEM\CurrentControlSet\Services\MSExchangeSA\Parameters\servername\OabDropFolderLocation’ or it is set to a non-existing path. Offline address book generation will not be performed.

This typically happens on an SCC cluster when:

  • the OabDropFolderLocation registry value does not exist (for example, it was accidentally deleted). The value should exist on all nodes with the same data.
  • the folder named in this registry value was deleted or renamed.
  • the value points to a non-shared disk (such as C:). On an SCC cluster, this folder should always be on a shared disk.

To fix the problem, recreate the registry entry if it doesn’t exist, or edit the value so that it points to a valid location.
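The three causes above map neatly onto a small check. The Python sketch below is only the decision logic: it assumes you have already read the OabDropFolderLocation value (under SYSTEM\CurrentControlSet\Services\MSExchangeSA\Parameters\<servername>) and listed the existing folders and shared drive letters yourself. The function name and messages are hypothetical.

```python
from pathlib import PureWindowsPath


def diagnose_oab_drop_folder(value, existing_folders, shared_drives):
    """Map the causes of event 9396 to a message; all inputs are supplied by the caller."""
    if not value:
        return "registry value missing: recreate it on every node with the same data"
    path = PureWindowsPath(value)
    # Windows paths are case-insensitive, so compare lowercased strings.
    known = {str(PureWindowsPath(f)).lower() for f in existing_folders}
    if str(path).lower() not in known:
        return "folder does not exist: it was deleted or renamed"
    if path.drive.upper() not in {d.upper() for d in shared_drives}:
        return f"{path.drive} is not a shared disk: move the folder to shared storage"
    return "ok"


print(diagnose_oab_drop_folder(r"C:\OAB", {r"C:\OAB"}, {"S:"}))
# C: is not a shared disk: move the folder to shared storage
```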


FACT: DFS Replication is not cluster-aware

by Shijaz Abdulla on 07.10.2008 at 07:46

The Distributed File System (DFS) included with Windows Server 2003 and Windows Server 2003 R2 is not cluster-aware.

You will not be able to add the virtual name of a file server cluster when defining a namespace server. Even if you try to get around this by typing in the virtual name instead of browsing for it, you will most likely end up with this error:

The object identifier does not represent a valid object.

This means you cannot replicate between a file server cluster and a standalone file server (or another file server cluster).

To sum up, the Microsoft DFS Replication service is not cluster-aware, so you cannot replicate data stored on shared storage. Even if you manage to do it, remember that storing replicated folders on shared storage is NOT supported.

Adding a node on a SQL 2005 cluster

by Shijaz Abdulla on 18.09.2008 at 23:22

In the absence of a dedicated SQL DBA at the place where I work, I take care of the SQL Servers too. (No comment.)

One of the passive nodes of a SQL Server 2005 cluster had died a sudden death (hard drive and RAID failed under mysterious circumstances) which necessitated a total rebuild of the failed node.

So I went ahead and evicted the failed passive node using Cluster Administrator on the surviving active node. After the server rebuild was complete, I configured Windows clustering on the rebuilt node using Cluster Administrator. Next, I started setup on the active node (Control Panel > Add/Remove Programs > SQL Server 2005 > Change).

I reached the point in the setup wizard where I chose to add a node to the existing virtual server/cluster. After a while, I got the following error message:

Setup failed to start on the remote machine. Check the Task scheduler event log on the remote machine.

Upon checking the Task Scheduler event log on the node being rebuilt, I found this:

"SQL Server Remote Setup .job" (setup.exe) 9/18/2008 11:36:24 PM ** ERROR **
    Unable to start task.
    The specific error is:
    0x80070005: Access is denied.
    Try using the Task page Browse button to locate the application.

Now that’s very helpful, isn’t it?

A few minutes of head-scratching and web-searching revealed what I had missed: I was working on the servers over Remote Desktop! For setup to start successfully on the remote (new) node, that node must not have any active Remote Desktop sessions. I closed all RDP sessions on the node being rebuilt using Task Manager (Users tab) and also logged off the session I was connected to.

Another retry from the first node, and setup now progressed without any errors.

The day the Exchange cluster died

by Shijaz Abdulla on 24.09.2007 at 08:48

I installed Windows Server 2003 Service Pack 2 on a client’s Exchange Server 2003 cluster on Thursday night (yeah, I hear you: what a way to spend a weekend!). Everything went well: installation completed, the nodes rebooted, and everything was alive and kicking.

…until Friday morning, when the Exchange HTTP Virtual Server Instance resource failed. Since this resource was configured to ‘affect the group’, the failure forced a failover of the whole Exchange cluster group to the passive node.

In no time, the Exchange HTTP Virtual Server Instance failed again, this time on the passive node! Someone press the Panic button!! The initial understanding of the situation was clear: installing Windows Server 2003 Service Pack 2 had brought the mighty Exchange cluster to its knees.

I rebooted both nodes and normal operation ensued. But after a couple of hours it happened again. In the event logs, I could see things like:

Event Type: Warning
Event Source: MSExchangeIS Mailbox Store
Event Category: General
Event ID: 1115
Description:
Error 0xfffffbbe returned from closing database table, called from function JTAB_BASE::EcCloseTable on table DeletedFolders. For more information, click http://www.microsoft.com/contentredirect.asp.

Event Type: Error
Event Source: MSExchangeCluster
Event Category: Services
Event ID: 1005
Description: Exchange HTTP Virtual Server Instance 100 (servername): The IsAlive check for this resource failed. For more information, click http://www.microsoft.com/contentredirect.asp.

Event Type: Error
Event Source: Srv
Event Category: None
Event ID: 2019
Description: The server was unable to allocate from the system nonpaged pool because the pool was empty. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

I couldn’t find much on these errors on the Internet, and this is the reason for this post. Here’s what the problem is.

My client is running Windows Server 2003 on a 32-bit server. 32-bit versions of Windows, as we all know, can address a maximum of 4 GB of virtual memory. By default, Windows slices this address space right down the middle: 2 GB is reserved for the OS (kernel) and 2 GB for applications. Out of the OS's 2 GB, up to 256 MB is available for non-paged pool memory.

My client is using the /3GB switch, which limits Windows to 1 GB of address space and lets applications use 3 GB. But this also reduces the maximum non-paged pool from 256 MB to 128 MB.

Now, 128 MB is a tight little space. IIS uses non-paged pool memory to process requests, and on Windows Server 2003 and Windows Vista it stops processing requests once the available non-paged pool memory drops below 20 MB. Event 2019 shows that the pool had indeed run dry.
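The arithmetic behind this is worth spelling out. This short Python calculation uses only the figures quoted in this post (a 256 MB non-paged pool ceiling by default, 128 MB with /3GB, and IIS refusing requests below 20 MB free) to show how much pool everything on the server can consume, combined, before IIS gives up.

```python
# Figures quoted in this post for 32-bit Windows Server 2003, in MB.
NONPAGED_POOL_MAX_MB = {"default": 256, "/3GB": 128}
IIS_MIN_FREE_MB = 20  # IIS stops processing requests below this much free pool


def headroom_mb(boot_mode):
    """Non-paged pool (MB) that can be consumed before IIS stops serving requests."""
    return NONPAGED_POOL_MAX_MB[boot_mode] - IIS_MIN_FREE_MB


for mode in NONPAGED_POOL_MAX_MB:
    print(f"{mode}: {headroom_mb(mode)} MB usable before IIS stops")
# default: 236 MB usable before IIS stops
# /3GB: 108 MB usable before IIS stops
```

With /3GB, the headroom shrinks by more than half, so a leaking driver reaches the IIS cut-off much sooner.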

Of course you know, Exchange relies heavily on IIS. So that explains why the Exchange HTTP Virtual Server resource went down! But wait – what’s hogging up the non-paged pool memory? And how do we fix this?

That’s when Microsoft sent in their Poolmon utility, which shows what is consuming the pool. The culprit? Broadcom’s NetXtreme II network card driver! It was incompatible with the scalable networking features bundled with Windows Server 2003 SP2 (and the Windows Scalable Networking Pack) and caused a memory leak. I disabled TCP Chimney with the following command:

Netsh int ip set chimney DISABLED

I also set the registry value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\EnableTCPA to zero on both nodes, and followed the other steps mentioned in KB936594. That was all it took to solve the problem!

See my earlier related post: Delayed Logins: Change Password feature in ISA 2006