Enterprise Server Design
Those folks over on the Unix side of the shop have some other valid complaints when it comes to using Windows in the Enterprise. If you ever get a look at their system setups, they don’t appear to be much better than what you’re doing on your side. You’ll probably note that nearly everything they run is built by Sun and sits in a tidy chassis with redundant power supplies, built-in hot-swappable disk arrays, etc.
Yes, your systems should be similarly built. No ‘workgroup server’ chassis should be sitting in your data center and be touted as a ‘mission critical’ system to anyone in your organization. That kind of hardware simply doesn’t support the claim and it will, eventually, make a liar out of you. There are some features of a Sun system that are simply not available to you in the Intel chassis yet (and these deficiencies affect Linux operators, too). For instance, a Sun system allows out-of-band access: so long as the power supplies are up and power is available, the system is still reachable remotely. Even a “panicked” Solaris system (a panic is the Unix equivalent of the Blue Screen of Death) can still be reached through a special interface. On your Windows system, there is still no built-in way to handle a BSOD without showing up on site to start working on the machine. Even a device like Compaq’s Remote Insight Management Lights Out board does not fully expose the system to you, and enabling it means sacrificing some system stability.
Those are basic, very basic, truths when considering the design of enterprise systems. In our Windows world, old myths persist, too. You’ll be advised to configure systems with a RAID 1 mirror for the operating system and then a RAID 5 stripe set for the data partition in the interest of keeping system performance high.
The truth is simpler. Those rules are largely dogma now, because when they were made, the hardware that would let you build a more basic configuration and still deliver high performance without taking those measures simply wasn’t available.
These days, meeting those service requirements is easier. Redundant power supplies, multiple processors, large RAM installations and a single RAID 5 stripe set are realistic and reliable ways to build an Intel-based server. And you can do it all in hardware. There is no longer any need to build an Enterprise server using the software RAID tools that come with Windows NT variants. Besides, want some real fun? Try swapping a disk or breaking a mirrored set in NT and still have a system that will simply boot. Using the software tools in NT to keep a system alive has more to do with intense study and black magic than with good planning.
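To put rough numbers on the two layouts, here is a minimal sketch of the usable-capacity arithmetic only; it says nothing about performance or reliability. The function names, the six-drive chassis and the 18 GB drive size are all made up for illustration. The only facts relied on are that a two-disk RAID 1 mirror yields one disk of usable space and that RAID 5 gives up one disk's worth of space to parity.

```python
def raid1_usable(disk_gb: int) -> int:
    """Usable space of a two-disk RAID 1 mirror: one disk's worth."""
    return disk_gb


def raid5_usable(disk_count: int, disk_gb: int) -> int:
    """Usable space of a RAID 5 set: one disk's worth goes to parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disk_count - 1) * disk_gb


def compare_layouts(disk_count: int, disk_gb: int) -> dict:
    """Compare a mirror-plus-RAID-5 split with one RAID 5 set on the same disks."""
    # Split layout: two disks form the OS mirror, the rest form the data set.
    split = raid1_usable(disk_gb) + raid5_usable(disk_count - 2, disk_gb)
    # Single layout: every disk joins one RAID 5 set.
    single = raid5_usable(disk_count, disk_gb)
    return {"split_gb": split, "single_gb": single}


if __name__ == "__main__":
    # Hypothetical chassis: six 18 GB drives.
    print(compare_layouts(disk_count=6, disk_gb=18))
```

With these assumed numbers the split layout leaves 72 GB usable and the single set leaves 90 GB, with one fewer array to configure and monitor.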
Another reason to glance at that beautiful Sun system (which never seems to fail) is to look at the software running on it. Solaris (the Unix variant that Sun ships as its operating system) is a strong, stable system that is delivered with excellent tools, and many Enterprise-critical services either ship with it or are readily available for it: BIND for DNS services, a solid DHCP system, file sharing based on NFS, etc.
Even the Solaris environment suffers from the quality of third-party software, though. Large database-backed systems like CRM (Customer Relationship Management) tools offer the Enterprise great advantages and are often considered to be ‘Mission Critical’. Unfortunately, the software itself and its underlying database system rarely meet the demands placed on truly critical tools. This is true in both the Unix and Windows worlds. These are the tools that will make your phone ring in the middle of the night.
So our only real way to manage emergencies before they arise is to go back to the basics of system design. Here are some very basic principles or questions to ask when designing an Enterprise server. The list is presented with the highest priorities first:
- Make the system stable with only the minimal set of options and services installed. If a service allows you remote access to the box, is secure and stable then it’s probably a good idea to install it. However, why would you put the security and availability risk represented by Outlook on your server? Why would you sacrifice system performance, stability, etc., by installing a web server on your DNS system? Minimize the system configuration!
- Where possible, put as much of the system’s configuration in hardware as you can.
- Critical elements, like power supplies, should be redundant and hot-swappable.
- Do not create complicated disk sets in software. Plan to purchase disk controllers that are SCSI based, allow for hot-swapping drives and store the array configuration on each disk. Do not use the Windows NT/2000 disk tools for drive configurations unless you are working with attached storage offered through a SAN.
- Be leery of mixing Enterprise services like DNS, DHCP, etc., on the same system that is offering other application services like a database.
- Do not overly complicate the hardware or software mix on the system. For instance, the Compaq management tool described above makes remotely managing the server much easier. Unfortunately, it also increases the odds that you will have to remotely manage the server, because of its poor quality drivers and services.
- Performance: do you want a fast car or one that lasts a while with minimal required attention? Of course you don’t want everyone mailing you with a complaint that logins are extremely slow. But you also don’t want everyone to be unable to email you because the fast, critical system isn’t even available. Performance is secondary to everything else on this list. If you have a performance issue when everything else is right, you probably need to rethink the architecture of all your systems, not just this one, because the implication is that you are doing too many of the wrong things on many of your machines. You’re also likely to find that when performance suffers badly, some of these other guidelines are not being satisfied either.
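The checklist above can be turned into a simple review aid. What follows is a minimal sketch, not a real inventory tool: `ServerPlan`, `audit`, the service groupings and the finding messages are all hypothetical names chosen for illustration, and the checks cover only the items on the list that reduce to a yes/no question. Findings come back in the same priority order as the list.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed groupings, for illustration only.
INFRASTRUCTURE = {"dns", "dhcp"}
APPLICATIONS = {"database", "web"}
DESKTOP_SOFTWARE = {"outlook"}


@dataclass
class ServerPlan:
    """Hypothetical description of a proposed server build."""
    name: str
    services: List[str] = field(default_factory=list)
    hardware_raid: bool = False
    redundant_power: bool = False
    software_disk_sets: bool = False


def audit(plan: ServerPlan) -> List[str]:
    """Return findings, highest priority first, for a proposed build."""
    services = {s.lower() for s in plan.services}
    findings = []
    if services & DESKTOP_SOFTWARE:
        findings.append("desktop software installed on a server")
    if not plan.hardware_raid:
        findings.append("disk configuration is not handled in hardware")
    if not plan.redundant_power:
        findings.append("power supplies are not redundant")
    if plan.software_disk_sets:
        findings.append("disk sets are built in software")
    if services & INFRASTRUCTURE and services & APPLICATIONS:
        findings.append("infrastructure and application services are mixed")
    return findings


if __name__ == "__main__":
    # A DNS box that someone also wants to run a web server on.
    plan = ServerPlan("dns01", services=["DNS", "Web"],
                      hardware_raid=True, redundant_power=True)
    for finding in audit(plan):
        print(f"{plan.name}: {finding}")
```

Performance is deliberately absent from the checks: per the list, it only becomes the question once everything else passes.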
Using these suggestions as my basic criteria, I have been able to create and maintain Windows NT and 2000 systems that have required a reboot only for the installation of patches and the like. In the cases where I have been forced to install a third-party or questionable tool as an Enterprise service, my crew and I have been very successful in creating systems whose only emergencies arise because of those tools and not because the basic system has failed. Following these guidelines lets us approach the high availability, stability and performance that your buddies over in the Unix group demand and enjoy.