Volume Number: 18 (2002)
Issue Number: 06
Column Tag: Network Management
by John C. Welch
Heterogeneous Networks as a Defense Mechanism
Why a genetically diverse network has advantages that a network of clones can’t match
Welcome
Crackers. Viral attacks. Inside attacks. The computing world is fraught with peril these days. Yet as much as we would like to hide from these things or ignore them, we have to deal with computers, computer networks, and the security and privacy issues they bring. The question is always: how do we counter all the bad things in the computing world so that we can get to work? There are quite a few methods, some more common than others, and none of them is a panacea. The most common methods are prophylactic, that is, external defenses and protections, and while they have achieved a certain amount of success, such measures, in the end, guarantee a certain amount of failure as well.
External Defenses and associated issues
The way most people defend their networks and computers is by creating barriers to external attacks. For example, to defend against virii, we buy antivirus applications. We use firewalls to prevent crackers from gaining access. We use arcane password requirements and policies, and stringent policing of internal network usage, to keep employees and other authorized users from causing damage. More concisely, we use prophylactic hardware, software, and so on. Now, I’m not going to say these are bad or unnecessary actions. In fact, they are quite intelligent actions. But, as we will see, on their own they are never going to be enough.
Problems
Virii
Virus defense is a particularly thorny issue. In the last few years, you’ve gone from updating virus definitions monthly to updating them daily. You have to scan almost every file that you download or use, and you have to repeatedly scan the same files, because virus writers get smarter all the time. Viruses are also becoming smarter, or at least more flexible in design and execution. They change their code each time they run, they hop from sector to sector of the drive, and some will even dodge virus scans.
It’s not even an operating system issue any more. Application-based virii have surpassed operating system virii in number and infection rates. It’s also not just a Microsoft application issue. QuickTime, Adobe Acrobat, and other applications from different vendors are being targeted. Services are getting slammed the same way. IIS and other Web/Internet services are being used as viral delivery mechanisms. While things like IIS attacks are limited to a single operating system, the way services like IIS work means that the virii targeting them can also hurt non-Windows services by creating a denial-of-service (DoS) situation as they probe Apache, iPlanet, WebSTAR, and other Web servers looking for a valid entry point.
Unfortunately, the image of a virus creator as some demented savant with no social skills, sitting alone in the dark wreaking havoc on a cruel world, is absolutely incorrect. There are now virus ‘kits’ on the Internet that allow anyone who can work a GUI, and who possesses only minimal technical ability, to create virii that are quite advanced in nature. The ‘Kournikova’ virus is an example. The creator of this Microsoft Outlook virus was not a cracker, or any other kind of programmer. He wasn’t a computer expert of any kind. He was an Anna Kournikova fan who wanted to spread her fame. The method he chose, while damaging and annoying, was the modern version of Led Zeppelin fans in the 1970s spray-painting ‘ZoSo’ on bridges and highway overpasses. So almost anyone with Internet access can create viruses.
While antivirus vendors have done an admirable job, with on-the-fly scanners, email attachment scanning, server scanning, and heuristic analysis (which looks for virus-like behavior as opposed to static ‘signatures’), there is still a problem with relying on antivirus utilities as the sole form of protection: delay. There is always a delay between the discovery of a virus and the release of the inoculation definitions. As well, there is a delay between the announcement of the release and the downloading and vaccination of infected systems. There are also differences in release times for new definitions between antivirus vendors. In the case of a virus like Code Red, which spread at amazingly high speeds, this means that network administrators have to shut down vulnerable systems until the systems can be cleaned. With Code Red, this meant shutting down entire networks. Clearly, the measure-countermeasure-counter-countermeasure dance of virus and antivirus is not a winning strategy. (This should in no way be interpreted as saying that antiviral utilities are not worth the effort. Any security and protection plan should be layered, and antiviral utilities are a critical part of at least one of those layers. Even on platforms with a normally low rate of infection, if there is an antiviral utility available and you aren’t using it because ‘you never see virii on that platform’, you are simply gambling that your luck holds.)
Security and crackers
Non-viral attack points are everywhere as well. The recent announcements of vulnerabilities in SNMP and PHP affect any operating system that can run those services (which is almost all of them and, in the case of SNMP, includes non-Mac OS X Macs). Security holes are endemic on all platforms, even if they don’t get the same level of publicity as the Microsoft security holes.
While all of these holes can be patched, there is the same delay as with virii. The vendor has to be notified of the vulnerability; the vulnerability has to be analyzed and fixed; and the fix has to be tested and distributed. In some cases, the patch creates other problems, which then have to go through the same notification, analysis, fix, test, and distribution cycle. Meanwhile, crackers are able to use these holes to break into systems and, at the very least, increase an already overburdened IT department’s workload. That almost ensures that mistakes will be made in patch implementation, creating situations where systems that aren’t patched are thought to be patched, and become ‘stealth holes’ into a network.
A major problem with defending against crackers in this manner is one that you can’t patch or secure against: the attitude that the non-IT world takes toward them. For the most part, crackers are viewed either as misguided, or as crusaders helping overworked, or (more commonly) inept, IT people secure their networks. Because people don’t think of computer data as ‘real’, and because time has never been seen to have an inherent cost in IT (we are all paid on salary, after all), the time that these criminals take from IT people is seen as the IT person’s just reward for not doing their job correctly. One cracker, Adrian Lamo, has gained a measure of fame as a ‘security crusader’ by breaking into systems, changing data, informing the owners of the system of his attack, and then offering his services to fix the holes.
Now, I think that any intelligent network administrator will, at some point, hire someone to try to penetrate their security. That is the only way to ensure that you have done things correctly. But the important point here is that the people running the attacks have been hired for that purpose. If someone cracks your network without being authorized to do so, then there is a more precise term for their actions: breaking and entering.
I am completely serious about this. If you came home and found someone sitting in your Barcalounger, watching your TV, drinking your beer, and they told you that they had broken into your home to show you how bad your security is, and that they’ll happily fix it for you if you hire them as a security consultant, you would first laugh at them, then call the cops. There is no difference save the physicality of the house break-in. But because computers aren’t ‘real’, breaking into one isn’t a ‘crime’ in most people’s eyes. That attitude needs to change. Professionals don’t break into my systems if they want me to hire them. They come to me and offer their services. If I accept, then they try to find my security holes and help me close them. If I don’t accept, then they go away, except for the annoying sales calls. If I get a visitor telling me how they broke into my network, and offering their services so it can’t happen again, the first thing I’m doing is calling the cops. The second thing I’m doing is hiring a professional security consultant to work with me to patch those holes. If nothing else, am I seriously supposed to place professional trust in someone who uses a lack of ethics as a sales tool?
But increased security isn’t the ‘uber-fix’ that people think it is, either. In some cases, security can create its own problems if applied incorrectly. Overzealous security policies, applied without thought to their effects on the people who have to live with them, can make daily use of computer resources so difficult that employees feel they have to find ways around them just to get work done. Email scanning for dangerous payloads is often used as an excuse to scan email for ‘inappropriate content’, which is defined so nebulously that employees feel the need to use non-company email accounts while at work, which creates more security holes that have to be patched, and still more onerous security policies.
One of the sillier side effects of the virus problem that gets passed off as security is the severe restriction or even banning of email attachments. This is not just an IS policy issue. One of Microsoft’s first anti-viral Outlook updates essentially made it impossible to use attachments with Outlook. Obviously this is not a sane answer. Attachments make email far more useful than it would otherwise be. Yes, they can be misused, but so can HTML email, and no one seems to be starting a trend to ban that. If you restrict email to the point where no one wants to use it, then you are just killing a critical tool to avoid a problem that you will end up with anyway.
Remember that, in general, the more secure a system is, the harder it is to use. This is particularly evident when you see companies implementing physical security policies like removing floppy/CD drives, or placing locks on them so that you can’t casually use them. While a company is perfectly within its legal rights to do this, such a blunt indication that the employees can’t be trusted is never good policy unless you have other security requirements that force you to do it.
So security is even more of a balancing act than antivirus defenses are. If you go too far, the system becomes unusable. If you don’t go far enough, then you are under attack constantly, by your own machines in many cases.
Human issues
Human issues are not only the most complex, but they will also cause you the most problems if not dealt with correctly. First, you have user training. If you do not train the people on the network correctly and hold them accountable to that training, then you don’t have a prayer of any other external protections working. But there are inherent problems that crop up constantly with user training as a solution.
Training isn’t cheap. External training can cost from a couple of hundred to a couple of thousand dollars per person per course. This gets prohibitively expensive for larger corporations. In-house trainers may be cheaper, but what department wants to deal with that headache and expense? You still need facilities, equipment, lesson plans, courseware, study materials, and the rest. As well, the first budget to be cut, and the last to be restored, is always the training budget. So what ends up happening is that a new employee gets a list of ‘Don’ts’ that they read just enough to find the part where they are supposed to sign, acknowledging that they have indeed absorbed these important items into the very fiber of their being, hand it back to their boss or HR, and then forget it ever existed. Training could be one of the best defenses against network break-ins and viral attacks, but not until it is seen as being as critical as power, water, and the CEO’s bonus.
While eliminating training for the general user population is as ignorant and short-sighted as it seems, it pales next to the way most corporations treat the people tasked with keeping the network safe and running efficiently. IT departments are told that they are critical to a company’s operations, then get their budgets slashed for reasons ranging from the economy to the CEO hearing from a golf partner that outsourcing will save him more money than the company makes in a year. The IT staff has to deal with every facet of a network and all attached devices, yet they get no more of a training budget than anyone else, namely none. In addition, since they are usually looked at as a drain on the company’s bottom line, their requests for additional funding get scrutinized more than almost any other department’s.
IT departments are perennially short-staffed, even as their workload increases yearly, monthly, sometimes daily. Companies tell you up front that you get paid for 40 hours, but the minimum work week is 60 hours, with mandatory overtime. If you try to do your job in eight hours and go home, you are seen as ‘not a team player’, and let go at the first opportunity. The problem this creates is high turnover rates, with some companies replacing entire departments every couple of years when the numbers are averaged out. As a result, there is almost no ‘institutional’ memory in corporate IT departments, because that requires that your senior IT staffer not be the only one to have been there for over a year.
Ideally, the IT people will document what they do, so that when they are gone, there is a history of what has been done to and with the network. The reality is that when you are as overworked as most IT people are, documentation never even enters the picture, much less actually gets done. So every time an IT person leaves, that knowledge leaves with them and has to be relearned by the next person, who then leaves, and the cycle continues. Some companies are trying to do something about this, but they are too few, and too far ahead of the curve, for it to become standard practice.
What this means is that you cannot always rely on having an IT staff that is intimately familiar with your network, because chances are they either haven’t been employed long enough to have achieved that level of knowledge, or are on the way out, and no longer care.
End results
The end result of these factors is that prophylactic protections, because of inherent implementation issues, useless training initiatives, and IT staff turnover, simply cannot work on their own. But there is another cause that accelerates these results, and it is genetic in nature.
Network homogeneity is a root enabler for network vulnerability
Almost any article on increasing your IT efficiency, improving your ROI, decreasing IT expenditures, or making your network easier to manage and protect will eventually recommend that you take, among other human or technical actions, the step of standardizing on a single client and server platform. I propose that regardless of which platform you settle on, by making your network completely, or almost completely, homogeneous, you are creating vulnerabilities that no amount of protection can fix.
The best examples of this are in the non-computing world. Human and animal diseases spread fastest when there is a degree of genetic uniformity among the species being attacked. It is well known among animal and plant breeders that any group that is too inbred is more vulnerable to diseases and conditions than a group with a more diverse genetic background. The histories of plant infestations demonstrate how a single species can be nearly destroyed within a relatively short time by a single disease. The Irish potato famine of the mid-1800s is a rather extreme example. The potato crops in Ireland were nearly destroyed by a blight which, because of the Irish people’s almost total dependence on that one crop, caused a massive famine.
Other examples of the problems that excessive genetic similarity creates are the conditions and diseases that affect only certain groups of people, or affect certain groups in disproportionate numbers, such as sickle-cell anemia. This example can easily be applied to network design and implementation.
If you have a network that is all, or mostly, homogeneous, then a new virus has a guaranteed population to infect and spread from. A homogeneous network with high-speed connections and heavy traffic spreads virii at astounding rates before the infection is discovered and dealt with. Melissa, Code Red, Kournikova, and the rest are perfect examples of this. The worldwide rate of Code Red infection should have been a wake-up call, and it was, but only lately is it becoming the right kind of wake-up call.
No matter how good your defenses, if your network is nothing but a series of clones, then they all have exactly the same weaknesses to virii or crackers. This has nothing to do with platform. An all-Sun or all-Mac OS X network is just as vulnerable to a Sun- or Mac OS X-specific attack as an all-Windows network is to something like Code Red or Melissa. Because the virus or attacker only has to deal with a single set of possible entry points, the job of the cracker or virus creator is greatly simplified. All they have to do is construct the virus or attack around a given platform’s weaknesses, and they are assured of at least early success. If the people breaking into your network or writing virii are actually talented or skilled, they can use that target specificity to make avoiding detection even easier. If all you have to deal with is Windows, or Solaris, or HP-UX, then you have a much better chance of avoiding detection, simply because the conditions you have to deal with are drastically reduced. Popularity of platform, not quality of platform, is why most virii center on Windows.
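To make that concrete, here is a toy simulation, a sketch of my own rather than data from any real incident, of a worm that can only infect one platform spreading through a randomly connected network. The infection model and every number in it are invented for illustration; the point is simply that the fraction of machines the worm can ever reach is capped by the fraction running the vulnerable platform, which is exactly the firebreak effect described above.

import random

def simulate(num_hosts, vulnerable_fraction, contacts_per_round=3, rounds=20, seed=1):
    # Toy worm model: each infected host probes a few random hosts per round
    # and infects any that run the vulnerable platform. Purely illustrative.
    random.seed(seed)
    vulnerable = [random.random() < vulnerable_fraction for _ in range(num_hosts)]
    # Start the outbreak on some vulnerable host.
    patient_zero = next(i for i, v in enumerate(vulnerable) if v)
    infected = {patient_zero}
    for _ in range(rounds):
        newly = set()
        for host in infected:
            for target in random.sample(range(num_hosts), contacts_per_round):
                if vulnerable[target] and target not in infected:
                    newly.add(target)
        infected |= newly
    return len(infected) / num_hosts

for fraction in (1.0, 0.5, 0.25):
    reach = simulate(num_hosts=1000, vulnerable_fraction=fraction)
    print(f"{fraction:.0%} of hosts vulnerable -> worm reaches {reach:.0%} of the network")

In this toy model, a 100 percent monoculture is completely overrun within a few rounds, while the 25 percent case tops out at roughly a quarter of the network; the non-vulnerable machines act as dead ends for the worm.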
We don’t (for the most part) approve of cloning for humans or domesticated animals, so why do we not only approve of it for computing, but champion it as the answer to all our problems? If genetic homogeneity is a threat to well-being everywhere outside of computers, how can any intelligent person think that those problems magically disappear just because it’s a computer? And it’s not just computers where this falls down. If the US Air Force consisted of nothing but F-16s and B-1Bs, then defeating the USAF in battle would go from an extremely difficult goal to one that is relatively simple. The level of homogeneity that exists in some computer networks would be thought of as either illegal or the height of stupidity in any other area, so why continue to use a method that is so obviously flawed?
Money spells it out quite well. By having single sources for hardware, software, and service, your up-front costs for a computing environment are greatly reduced. Regardless of platform, if everything comes from one place, you save money, initially. Up-front costs are also the most obvious. How do you show that you saved money because something didn’t happen? You almost can’t, short of letting the attack or infection happen, tallying the costs, and using that as justification for implementing diversity. While that would clearly show the hidden costs of over-homogenizing your network, the lack of ethics inherent in such an action would, and should, get the people involved fired, if not arrested and sued. To put it bluntly, you cannot really show the cost of something that didn’t happen. The best you can do is use the misfortune of others as your justification.
Genetic diversity in networks as a strength against attack
So, how do you go about implementing genetic diversity on your network? You have to correctly analyze your needs. Too many IT divisions get suckered into using the preferred platform as the basis for determining network implementation. If you look at everything from the standpoint of “How do we get this done on our standard platform?”, then you have already ruled out most of the possible solutions before you have even defined the problem.
Define the problem correctly
Rather than think about it from the platform, think about the problem on its own. What is the problem? We need faster file servers? We need better network management? Define the problem on its own. The need for faster file servers has nothing to do with Windows or Linux, unless the current servers are running those operating systems. Even then, that should only be an historical note. At this stage, no platform should be excluded from consideration.
Advantages
The advantages of correct problem definition are numerous. First, you can avoid going for the quick solution that may only mask the problem. I have seen problems with speed that are really caused by an overburdened infrastructure get patched by adding servers so that each server has less work to do. The infrastructure is still overburdened, but the problem is hidden because the individual servers are more lightly loaded.
Another advantage is that you often find the problem is not nearly as complex as it initially seemed. For example, there was a company with a normally reliable 16Mbps Token Ring network that one day started to go down almost constantly, for no apparent reason. One of the first proposals was to yank out all the Token Ring infrastructure and replace it with Ethernet. Luckily, the expense that solution entailed kept it from being implemented immediately. The problem turned out to be unlabeled jacks. The company had recently increased the number of network drops, and each faceplate had two network drops along with a phone jack, but labeling the jacks had been put off until last so the jacks could be installed marginally faster. Both the phone system and the Token Ring cabling used RJ-type connectors, the phones using RJ-11 and the Token Ring cables using RJ-45. So a single user, not realizing the difference, plugged a phone into a Token Ring port. There was just enough contact between the plug and the jack that the net result was a network that kept going down, seemingly without cause. The correct solution ended up being essentially free, and far less traumatic than a complete infrastructure replacement would have been.
So defining the problem correctly, without preconceived solutions, is critical to correctly implementing a genetically diverse network.
Analyze every possible solution
So you have defined the problem. Next, see what the possible solutions are. Platform should still not be a factor here. You need to look at all possible solutions for appropriateness in a way that is not specific to any platform. In our file server speed example, one possible solution might be to give all users 100GB UltraSCSI RAID stacks on their desks. However, this is impractical for many reasons, none of which have to do with the platform on the user’s desk. But all the solutions need to be looked at. There are too many instances of unconventional solutions turning out to be the perfect answer to a problem. While it may be a trite and overused term, ‘thinking outside the box’ is the best description of what should happen here.
Winnow the list of possible solutions objectively
That’s not to say that standard solutions should be tossed aside, either. All solutions, both conventional and unconventional, need to be looked at with the same objectivity. Don’t worry that you will have such a large list of solutions that you won’t be able to pick one. There are always going to be requirements that help determine the solution. For example, while a Fibre Channel SAN may be a faster way to serve files, if you don’t have a Fibre Channel infrastructure in place, the fiscal and time costs of such a solution may remove it from the list.
Space limitations are an example of a factor that applies to any solution and is platform neutral. If you only have a small amount of space in your server room for a new server, then a solution that involves hundred-station Linux clusters may not be practical at this time. Network limitations are another example. That reconditioned AS/400 may indeed be a cheap and fast server, but if it can only use Twinax and you can only implement Ethernet, then it is not a good solution.
The point is, make sure that you use the unavoidable limitations to winnow your solutions list first. Network, physical, fiscal: these are all limits that should take precedence over the operating system and platform of the server.
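If it helps to see that winnowing order mechanically, here is a trivial sketch. The candidate list, the numbers, and the limits are all invented for illustration, not recommendations; the only point is that the hard, platform-neutral limits are applied first, and platform preference never appears as a filter at all.

# Hypothetical candidates and limits, purely to illustrate the order of
# operations: hard constraints first, platform preference never.
candidates = [
    {"name": "Fibre Channel SAN",       "rack_units": 12, "cost": 90000, "link": "fibre"},
    {"name": "Hundred-station cluster", "rack_units": 80, "cost": 60000, "link": "ethernet"},
    {"name": "Reconditioned AS/400",    "rack_units": 10, "cost": 8000,  "link": "twinax"},
    {"name": "Single 2U file server",   "rack_units": 2,  "cost": 12000, "link": "ethernet"},
]

limits = {"rack_units": 8, "budget": 50000, "links": {"ethernet"}}

survivors = [c for c in candidates
             if c["rack_units"] <= limits["rack_units"]      # physical space
             and c["cost"] <= limits["budget"]                # fiscal
             and c["link"] in limits["links"]]                # network plant

for c in survivors:
    print("still on the list:", c["name"])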
Standards are good, but in moderation
So now you have a manageable solutions list. What about computing standards? Well, as long as you don’t go overboard, and apply standards only where they need to be applied, they can be an aid. Too many companies standardize on an operating system or application when they would be better off standardizing on data formats. If you want to ensure uniformity of final output, standardizing on Windows and Word, or Solaris and LaTeX, will not do nearly as much good as standardizing your fonts, image formats, and data formats. Limiting your font usage, standardizing on a small number of image and media formats, such as TIFF, JPEG, MPEG, and EPS, and using Acrobat as your final output format of choice is going to give you all the benefits of standardization, but will leave you with a far more capable toolbox than a single platform and application will. It also means that if the preferred application or platform is under attack, you can still get work done.
This does not mean just randomly seeding different platforms around your network. If you have 100 people in a group using Windows and one person using a Mac, all you create are problems for yourself. Implementing a different platform has to be done in a planned, logical way. Willy-nilly is a fun way to do many things, but network design is not one of them.
Common knowledge is always wrong
One of the signs that a solution is going to be bad is when it starts with any variant of “Well, everyone knows…” There is nothing in computing that “everyone knows” that is ever universally correct. “Everyone knows” that AppleTalk is chatty, and that getting rid of it will make things better. Well, to quote an Apple engineer on the AppleShare IP team:
“On Bandwidth: An idle connection to an AppleShare server (via IP) sends 2 tickle packets of about 64 bytes in size every 30 seconds (call it 4 bytes/second or 0.00024% of a 10 Mega Bit [10 Base-T] connection <I may be off by a factor of 10 either way, its early>). When transferring files, AFP is just as efficient as any other well implemented IP protocol, a single client can, under ideal conditions fill the pipe with minimal overhead. For a 16k read/write we have 28 bytes of AFP/DSI protocol info on top of 780 bytes of TCP/IP protocol info for a payload efficiency of about 91% (it takes 12 packets to move 16k of data).”
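Those figures are easy enough to sanity-check. Here is a quick back-of-the-envelope calculation; the per-frame Ethernet overhead and the ACK traffic are my own assumptions, not part of the quote, and the point is only that the idle traffic is negligible and the bulk-transfer efficiency lands in the low-to-mid 90s whichever way you count.

# Back-of-the-envelope check on the AppleShare numbers quoted above.
# Assumed (not from the quote): ~38 bytes of Ethernet framing per frame
# (header + FCS + preamble + inter-frame gap) and one ~64-byte TCP ACK
# for every two data packets. A sanity check, not a measurement.

LINK_BYTES_PER_SEC = 10_000_000 / 8        # 10BASE-T

# Idle connection: 2 tickle packets of ~64 bytes every 30 seconds.
tickle_bytes_per_sec = (2 * 64) / 30
print(f"idle traffic: {tickle_bytes_per_sec:.1f} bytes/sec, "
      f"{tickle_bytes_per_sec / LINK_BYTES_PER_SEC:.5%} of the link")

# 16k read/write: 28 bytes AFP/DSI + 780 bytes TCP/IP over 12 packets (quoted).
payload = 16 * 1024
afp_dsi, tcp_ip, packets = 28, 780, 12
framing = packets * 38                      # assumed Ethernet overhead
acks = (packets // 2) * 64                  # assumed ACK traffic

print(f"efficiency, counting only the quoted headers: "
      f"{payload / (payload + afp_dsi + tcp_ip):.1%}")
print(f"efficiency, adding framing and ACKs: "
      f"{payload / (payload + afp_dsi + tcp_ip + framing + acks):.1%}")

Counting only the quoted header bytes gives roughly 95 percent; folding in framing and ACK traffic, under these particular assumptions, pulls it down to about 91 percent. Either way, the ‘chatty’ reputation does not survive contact with arithmetic.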
So maybe everybody knows nothing. The point here is: don’t assume anything. A solution is good or bad based on the facts, not on assumptions, attitudes, personal prejudices, the magnetic pull of the Moon, Kentucky windage, and so on. If you let anything but the facts and reality guide your selection of a solution, then you may as well throw darts at the list; you’ll have as much luck that way as any other, and it’s probably more fun.
Implementing Diversity
So, the solution to the file server problem is a new server. It has to authenticate against your Active Directory servers transparently, it has to support Windows clients smoothly, and it has to fit into your Veritas backup solution. Congratulations, you have a multitude of platforms to pick from. Samba can easily handle Windows clients and use an upstream Active Directory server to handle authentication requests. Veritas supports a wide range of client platforms, including Mac OS X, so you can freely pick from any of them.
You’ll get a new, faster file server. Your users will be happier because they get to their files faster, with the same reliability. Because you can choose from a wider range of vendors, you get better competition for your business, which means your upfront costs are smaller. But even better, what happens when a Windows virus comes ripping through your network and hits that Linux file server?
It dies. Stops cold. That machine isn’t infected. Those files are safe. What happens when some cracker larva, using the latest Solaris hack to get past your firewall, hits your Mac OS X file server? The same thing. He now has to figure out what this operating system is, and then find ways to crack it. He may indeed find a new crack, but the Solaris-specific hole he was using is closed, at least here.
By having a genetically diverse network, you aren’t losing anything; the only things you give up are the insecurity and the limited capability that come with a building full of genetically identical clones. Your up-front costs don’t have to be any higher. You may have a learning curve on the new platform, but learning curves aren’t as bad as they seem, and the Internet has terabytes of information to help you along. By having a mixture of server platforms and client platforms, you create firebreaks on your network. You ensure that no single attack targeted at a single vulnerability is able to completely compromise your entire network unchecked. While Code Red may load down an iPlanet web server, it’s certainly not going to abuse it the way it will an unpatched IIS server. Outlook viruses become merely amusing if you aren’t using Outlook as your only email client.
In addition, you gain a whole host of capabilities that you simply cannot achieve in a homogeneous environment. Unix, Windows, Mac OS X, AS/400s, NetWare, *BSD, Linux, et al. all have unique strengths and weaknesses that can complement each other. There are products that only exist on a single platform, or a small number of platforms, that can be of great use to you, but only if you have that platform available.
Even better, when you combine a genetically diverse network with a well-thought-out set of prophylactic measures, such as antivirus programs and intrusion monitors, both methods become more effective and more secure. Even if one of your low-infection platforms does get infected, the damage will be limited, because the other platforms won’t be infected. Not only is the damage mitigated, but you also have more time to prevent a similar problem on the other platforms. The prophylactic protections don’t take up as much of your time, because they have a genetic backup in place. They have less to watch out for, because your firebreaks are intercepting and halting much of the damage before it hits vulnerable systems. As well, the other prophylactic measures can help you stop the problem before it ever hits the vulnerable or targeted systems.
Genetic diversity is just another, less common, way of removing a single point of failure. If you aren’t willing to do that, then why bother with RAID, failover servers, and the rest?
Conclusion
Genetic diversity isn’t just some fad, or a keen idea with no basis in reality. It has millennia of proof behind it as a good tactic. It is a proven way to keep any population healthy and functional, whether you are talking about potatoes, humans, trees, cows, military forces, or computers. It may take more work than a clone farm, but the benefits are real, tangible, and undeniable. It’s not a panacea, but it can, and will, make your network stronger and more capable.
In the end, there is no magic bullet. Every form of protection, including genetic diversity on your network, has a weakness. You have to combine network genetics and prophylactic measures, along with a lot of planning, to achieve the best results. Use both. The next time you buy a new box, if you already have a lot of that platform, see if you can get the same, or even better, results from a different platform. You’ll learn more, you’ll gain more capabilities, and the next time the Legion Of Bad People unleashes some hideous Windows or Linux virus, you’ll have a much better time of it than your counterparts in the lands of Windows and Linux clones.
John Welch jwelch@mit.edu is the IT manager for the MIT Police department, and the Chief Know-It-All for TackyShirt. He has over fifteen years of experience at making computers work. John specializes in figuring out ways in which to make the Mac do what nobody thinks it can, showing that the Mac is the superior administrative platform, and teaching others how to use it in interesting, if sometimes frightening ways. He also does things that don’t involve computertry on occasion, or at least that’s the rumor.