Is there a Linux cluster in your future? We'll find out, with a look at the state of Linux multiprocessing, paying particular attention to the role of clustering over the next year.
The what, how, and why of clustering

At its simplest, a cluster is just a group of computers harnessed together to act as one. This simplicity clouds quickly, though, as we begin to examine how and why clustering is done. Several related but distinct concepts blur the boundaries of what exactly a cluster is. Users cluster computers to get more -- more processing power, more reliability, or more manageability. Start by looking at a concrete example, say, a Web site that's beginning to gag on its hit load. It needs more cycles. How do you get them? The most immediate choices all involve some form of multiprocessing.
Multiprocessing is any arrangement that uses more than one processor to solve a single problem. The most familiar variant now is symmetric multiprocessing, or SMP, in which two or more processors share the same backplane. SMP is a commodity in the contemporary 80x86 world, where dual-Pentium boxes are standard catalog items.

A multiprocessing operating system is hard to get right the first time; both Windows NT and Linux had problems with their early SMP installations. In this area, kernel release 2.2.0 is a watershed for Linux. Its impact will continue to unfold in the year ahead, but it's already significant. Linux 2.2.0 makes VA Research founder Larry Augustin feel "comfortable" up to four-way processing. VA Research Chief Technology Officer Leonard Zubkoff is one of those now massaging bus logic drivers to enable the kernel to scale properly to eight processors, and there are plenty of plans afoot for the kernel to manage dozens of processors. And it isn't just insiders who are aware of the importance of 2.2.0. (Linux inventor Linus Torvalds called the kernel a "big weight off my back.") PC Week was recently one of several mainstream magazines to herald Linux 2.2.0 as "enterprise-ready" on the strength of its multiprocessing capability.

Remember that SMP boxes support the closest possible teamwork between processors. At the other extreme are several software schemes for heterogeneous distributed processing. These facilitate applications that do most of their work on an isolated node, with results occasionally communicated over data links of no-better-than-Internet reliability and speed. Such arrangements sometimes receive press coverage after locating another prime number or decrypting an intractable test message. Slightly closer couplings appear in the categories of job-scheduling or load-leveling software. These typically distribute atomic tasks -- one HTTP delivery or graphical rendering, say -- to whichever host in a compute farm is ready for another assignment (a sketch of this dispatch pattern appears below). Clustering splits the difference between load leveling and SMP. It generally involves relatively homogeneous hardware, specialized fast interconnects whenever possible, and a range of useful scaling factors.

To solve our example problem of an overloaded Web server, different sites have taken each of these approaches (along with reliance on several special-purpose products tuned exclusively for scaling Web service). For academic surveys of the trade-offs, see Distributed and Parallel Computing and Designing and Building Parallel Programs (see Resources), authoritative textbooks that explain the software issues involved in cooperative computing.

That's how the landscape of multiprocessing technologies looks from cruising altitude. Now, what are your specific challenges, and how might clustering help with them?
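To make the load-leveling idea concrete, here's a minimal sketch in Python. The "hosts" are simulated as threads pulling from a shared work queue; the node names and the render_page() stand-in for real work are invented for illustration, and an actual compute farm would dispatch tasks over the network rather than within one process.

```python
# Load-leveling sketch: atomic tasks (fake "HTTP deliveries") go into a shared
# queue, and each worker -- standing in for a host in a compute farm -- pulls
# the next task as soon as it finishes the previous one.
import queue
import threading
import time

tasks = queue.Queue()
for request_id in range(20):          # twenty pending HTTP deliveries
    tasks.put(request_id)

def render_page(request_id):
    time.sleep(0.05)                  # pretend to do the actual work
    return "page for request %d" % request_id

def worker(hostname):
    while True:
        try:
            request_id = tasks.get_nowait()
        except queue.Empty:
            return                    # the queue is drained; this host goes idle
        render_page(request_id)
        print("%s served request %d" % (hostname, request_id))

hosts = [threading.Thread(target=worker, args=("node%d" % n,)) for n in range(4)]
for h in hosts:
    h.start()
for h in hosts:
    h.join()
```

Whichever worker happens to be free grabs the next job, which is exactly why load leveling scales so naturally for independent tasks -- and why it tells you little about the tighter coupling that SMP and true clustering provide.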
Intensive scientific computing

Supercomputers traditionally perform this kind of intensive scientific calculation, shipping with specialized hardware and software to manage the complexity of high-performance computing (HPC). They do so at a considerable price, however. For much less money you can lash together commodity hardware with Linux and Linux-specific clustering software to achieve the same computational throughput. That's the achievement of the Beowulf clustering project, and the competitive advantages of Linux clustering in HPC are on course to keep growing for the foreseeable future.

Beowulf, which originated at the NASA Goddard Space Flight Center, is known to many as the "Extreme Linux" package that Red Hat retails for $29.95. Beowulf has already achieved several impressive milestones. The Avalon project won a prize in the 1998 Gordon Bell competition for the supercomputing performance of its Beowulf cluster. Oak Ridge National Laboratory has received extensive press coverage for its Stone SouperComputer, a useful supercomputer built from obsolete parts.

Beowulf-style supercomputing is still a small minority of the field, and inertia isn't the only reason. The proprietary HPC vendors typically bundle plenty of value beyond raw computational cycles.
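Programming a Beowulf cluster typically means message passing. As a rough illustration only -- assuming the mpi4py binding to MPI rather than any particular piece of the Extreme Linux package -- here is a sketch in which every node computes part of a numerical integration of 4/(1+x^2) and the partial sums are combined into an estimate of pi:

```python
# Minimal message-passing sketch in the Beowulf spirit.  Every process runs
# this same program; only its rank differs.  Launch across the cluster with
# something like: mpirun -np 16 python pi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()        # this process's position in the cluster
size = comm.Get_size()        # total number of cooperating processes

n = 1000000                   # integration steps, split across ranks
h = 1.0 / n
local_sum = sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2)
                for i in range(rank, n, size)) * h

pi = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print("pi is approximately %.10f" % pi)
```

The essential pattern is many identical commodity boxes cooperating on one calculation, with the network carrying only the occasional exchange of partial results.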
It's likely that these value-added technologies will become more and more widely diffused and "open." Nothing intrinsic to Linux keeps it from incorporating such pieces. Augustin offers a mundane example of the spread of such techniques: several customers have come to VA Research needing to pack a lot of boards into a small space. The need might arise from simple constraints on office real estate, or from limits on acceptable interconnect latency. His company has responded to this market by mastering the details necessary to fill a standard seven-foot cabinet with economical processor cards while keeping uptime high. There are no particular secrets to this; Augustin proudly states that his company's value-add lies in its execution of what's publicly known -- all the software VA Research writes is turned back for inclusion in the Linux core source.

We all know Linux makes it easy to turn old, unused Windows machines into valuable e-mail and Web servers. This has partly enabled the Internet boom of the last few years. Beowulf technology has a similar potential to help create qualitatively different uses for computers. Red Hat Director of Technical Alliances Robert Hart speculates that Extreme Linux opens up the possibility of dramatic new achievements in computing-intensive fields.
Hart surmises this technology will actually be the "'killer application' for Linux in the first five years of the next century, quite possibly sooner." While the race between Beowulf-style HPC and proprietary alternatives will be an interesting one for years to come, Microsoft appears to be a bystander. It has arranged a few publicity events to demonstrate the ability of NT clusters to take on computationally intensive jobs, and its Web pages on the subject are cogently written. However, as the Microsoft Cluster Server (MSCS) Overview states, the "algorithms and features in the current software must be extended and thoroughly tested on larger clusters before customers can reliably use a multinode MSCS cluster for production work, or gain enhanced cluster benefits." The software publicly available to support HPC on NT is surprisingly primitive, even when compared with Beowulf.
Linux availability

There's no question that availability is important. Plenty of applications need to be up around the clock, but they don't all require the flat-out horsepower of HPC. Think of all the Web servers, factory controllers, telecommunications switches, ATMs, medical and military monitors, flight-control systems, and stock-transaction data stores in the world. A blue screen of death on any of these generally has swift consequences: somebody loses money, a job, or more.

A distinction is often made between reliability and availability. A component is reliable if it lasts a long time before failing; a system is "highly available" if its pieces can fail safely. Microsoft's Wolfpack clustering technology, for example, "can automatically detect the failure of an application or server, and quickly restart it on a surviving server," as its Web site explains well. Failover capability like this is what most buyers think they're getting when they begin asking about Linux clusters. They're in for disappointment: there's simply no standard failover for Linux now.

That unpromising reality isn't the end of the story, though. It's true that Linux software has a lot of catching up to do in managing availability, even compared with Wolfpack, let alone with more robust operating systems such as OpenVMS and Solaris. However, three points are relevant here.
The following sections explain why.
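To make the failover concept concrete, here is a minimal, hand-rolled sketch; the host name, port, and service command are all invented for illustration. A standby node watches the primary's heartbeat and starts its own copy of the service if the primary stops answering. Real cluster managers such as Wolfpack also handle shared storage, IP address takeover, and split-brain protection, none of which appears here.

```python
# Standby-node heartbeat sketch: poll the primary's service port and, after a
# few consecutive failures, start a local copy of the service.
import socket
import subprocess
import time

PRIMARY = ("primary.example.com", 80)   # hypothetical primary Web server
CHECK_INTERVAL = 5                      # seconds between heartbeats
MAX_FAILURES = 3                        # tolerate brief network hiccups

def primary_alive():
    try:
        with socket.create_connection(PRIMARY, timeout=2):
            return True
    except OSError:
        return False

failures = 0
while True:
    failures = 0 if primary_alive() else failures + 1
    if failures >= MAX_FAILURES:
        # Take over: start the local service (hypothetical init script).
        subprocess.run(["/etc/init.d/httpd", "start"], check=False)
        break
    time.sleep(CHECK_INTERVAL)
```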
Linux's reliability
Even before such availability enhancements arrive, very high reliability can substitute for high availability for some jobs. Sam Ockman is president of Penguin Computing and a former employee of VA Research. Penguin, like VA, packages turnkey Linux servers, and much of Penguin's business has to do with putting together boxes that simply don't break. Ockman rather gleefully describes how he's packed as many as 18 fans into a single host to ensure that operating temperature stays within bounds. What has this attention to detail achieved? "I don't want to jinx myself, but I'll tell you this -- we've never had to have a computer shipped back to us to be fixed."

There are plenty of other ways to improve reliability apart from failover or more sophisticated clustering schemes. Disk drives are the first component to go in a well-run shop, and RAID completely answers this frailty. Clean, stable power also solves many problems.
Cluster manageability
Is clustering right for you?
Linux, frankly, isn't ready for high-availability requirements right now. Check back in a year, though, and this might have changed. It also means that if you have good ideas about Linux clustering, now is the time to make them real, so they can become part of the standard distributions.
(c) 1999 LinuxWorld, published by Web Publishing Inc.