Volume Number: 6
Issue Number: 5
Column Tag: Programmer's Workshop
Programmer, Heal Thyself
Diagnosing Virus Infections
By Mike Morton, University of Hawaii
Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.
Introduction
Macintosh users have a variety of defenses against infection by computer viruses. Public-domain, shareware, and commercial applications are available to prevent, detect, and repair virus infection. But most of these solutions require some effort on the user’s part, and users who have never had their software infected are (understandably) not motivated to use these tools. This article describes how your application can detect if it’s been infected, and gives sample code in Think C. Self-diagnosis is not “the solution”, but it can be part of it -- it helps alert users early so they can start repairing their disks as soon as possible.
Many thanks to Andrew Levin and virus maven John Norstad for their comments on this project.
Checksums
All known Macintosh viruses work by adding new CODE resources or expanding existing CODE resources. (Of course, there might be other methods of infection, but why think up new methods to encourage the turkeys who write viruses?) At first, I thought the obvious method to detect changed code was with a checksum: just add up all the bytes in all the code resources. There are a couple of problems with this.
First, your code is going to look something like:
    #define EXPECTED_SUM 12345

    short actualSum;

    actualSum = sum_of_CODE ();
    if (actualSum != EXPECTED_SUM)
        virus_alert (actualSum);
During development, you don’t know the sum. When you run the application, it’ll compute the actual checksum and display it. So you change the #define and recompile the code. But now the code is different (because the constant is different), and sums to a new value. There might be no end of this cycle, because the checksum includes its own value. There are ways around this moving-target problem, but it’s still a hassle.
Second, let’s assume that the detection technique described here becomes standard. A widely-used technique is likely to become the victim of viruses designed specifically to thwart it. A clever virus could attach CODE which is “checksum-neutral” -- the additional bytes sum to zero. No matter what the checksum algorithm is, you can extend the summed data with virus code and then append constants to keep the checksum the same. Sure, you can make the checksum method more and more convoluted, but this only encourages virus authors and wastes everyone’s time.
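To make the "checksum-neutral" idea concrete, here is a minimal sketch in portable C. It is not from the accompanying source disks: the 16-bit additive checksum and the names byte_sum and append_neutral are assumptions chosen for illustration, and a real checksum might be more elaborate. The point is only that extra bytes plus a little computed padding can leave a simple sum unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical checksum: the sum of all bytes, modulo 65536. */
static uint16_t byte_sum(const unsigned char *data, size_t len)
{
    uint16_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum = (uint16_t)(sum + data[i]);
    return sum;
}

/* Append `extra` after the first `len` bytes of `buf`, then append padding
   bytes chosen so that byte_sum() over the result equals byte_sum() over
   the original `len` bytes. Returns the new length, or 0 if the result
   would not fit in `cap` bytes. */
static size_t append_neutral(unsigned char *buf, size_t len, size_t cap,
                             const unsigned char *extra, size_t extra_len)
{
    uint16_t added;
    uint32_t need;
    size_t pad_len;

    assert(buf != NULL && extra != NULL);

    added = byte_sum(extra, extra_len);
    need = (65536u - added) & 0xFFFFu;  /* what the padding must add up to */
    pad_len = (need + 254u) / 255u;     /* each pad byte adds at most 255 */

    if (len > cap || extra_len > cap - len || pad_len > cap - len - extra_len)
        return 0;

    memcpy(buf + len, extra, extra_len);
    len += extra_len;
    while (need > 0) {
        unsigned char b = (unsigned char)(need > 255u ? 255u : need);
        buf[len++] = b;
        need -= b;
    }
    return len;
}
```

Calling append_neutral on a buffer and then re-running byte_sum over the longer buffer gives the same value as before, even though the data has grown. The length, of course, has changed.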
Checksums and related check-methods are intended to catch unintentional changes made to data, and you can make the chance of undetected errors vanishingly small. But I don’t think they’re always appropriate against malicious changes.
Length sums
Instead, you can make life much harder for the virus by taking advantage of the fact that it must somehow add code to an application. It’s hard for virus code to replace the existing code in an application without crippling it. So what if you sum the total length of the CODE resources? This detects the changes wrought by every known Macintosh virus, and doesn’t suffer from the self-reference problem (where changing the expected checksum also changes the real checksum).
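The length-sum idea can be sketched without the Toolbox. The following is a simplified model, not the article's listing: struct segment and lengths_changed are hypothetical names, and each code segment is represented only by its length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one code segment: only its length matters. */
struct segment {
    long length;    /* size of the segment, in bytes */
};

/* Return 1 if the number of segments or their total length differs from
   what was recorded at build time, else 0. */
static int lengths_changed(const struct segment *segs, size_t count,
                           size_t expected_count, long expected_total)
{
    long total = 0;
    size_t i;

    assert(segs != NULL || count == 0);

    for (i = 0; i < count; i++)
        total += segs[i].length;
    return count != expected_count || total != expected_total;
}
```

With three segments of 40, 9000, and 960 bytes and expected values of 3 and 10000, the function returns 0. Growing any one segment, or adding a fourth, makes it return 1. Changing the expected constants in the source does not change the byte count of the compiled constants in the usual case, which is why the moving-target problem mostly goes away (the listing's header notes an obscure exception).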
Of course, you can sum other types of resources besides CODE. You probably don’t want to checksum things like MENU or ICON, since that would prevent users from legitimately changing them. But if you store executable code in a custom control -- a CDEF resource -- for instance, you might want to check for tampering with those resources.
Using the diagnosis routines
The diagnostic code (see the source listing for “virusCheck.c” and also details in “virusCheck.h”) is fairly simple, but there are some tricks to calling it.
Probably the ugliest problem shows up under Think C, or any other development system which runs applications “inside it”. If you run an application inside Think C, and you use a resource file (named “project.rsrc”), the current resource file will be this file -- not your application -- when the application starts up. So, the first trick is never to try to do diagnosis in this environment. If you use such a resource file (and it has no CODE resources of its own), a quick-and-dirty check for whether the application is standalone is to do this at startup:
    if (Count1Resources ('CODE') == 0)
        ;   /* we're under Think C */
    else
        ;   /* we're standalone */
The demo application checks this and refuses to continue running inside Think C. You’ll probably want something in your application like:
    if (Count1Resources ('CODE'))
    {
        /* check for viruses */
    }
The simplest diagnosis routine is vCodeCheck (), which checks if the CODE resources in the current resource file look right -- whether there are the right number of them and they have the right total length. You pass the count and length you expect, and a flag to say whether it should report errors with a debugger. A sample invocation might be:
    /* Expected values: */
    #define COUNT  10
    #define LENGTH 10000

    if (Count1Resources ('CODE'))
        if (vCodeCheck (COUNT, LENGTH, 1))
            Alert ( );
Of course, the count of code segments and the total length of them will be different for your application. Make up any values you like for the first time around. (In Think C, the count of code segments seems to be two greater than the number of segments visible in the project -- one for the jump table, one for initialization?) Build a standalone application and run it with a debugger installed. If the values aren’t right (and they’re probably not), the diagnosis routines will spit out messages in your debugger like:
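Got count <right> for resource type ‘CODE’ instead of <wrong>
and
Got length <right> for resource type ‘CODE’ instead of <wrong>
(Here <right> stands for the actual value found in the application, and <wrong> for the expected value you passed in.)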
You’ll also probably get the message from your Alert call, since the function will report an infection.
Go back to the source, and change the #define’s for the count and length to be the right values, from the debugger output. Then set the last argument to vCodeCheck () to be -1, to prevent debugger messages. Lastly, build your application and run it again. You shouldn’t see any message from the debugger or the Alert call.
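When calibrating, it helps to remember which number in the message is which: the first is the actual value (the one to copy into your #define), and the second is the value you passed. Here is a small sketch of that message format in portable C. It uses snprintf rather than the Toolbox string routines in the listing, and format_mismatch is a hypothetical name, not a function from virusCheck.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build a calibration message of the form the article describes:
     Got <kind> <actual> for resource type '<type>' instead of <expected>
   Returns 1 if the whole message fit in `out`, else 0. */
static int format_mismatch(char *out, size_t cap, const char *kind,
                           const char *type, long actual, long expected)
{
    int n;

    assert(out != NULL && kind != NULL && type != NULL);

    n = snprintf(out, cap, "Got %s %ld for resource type '%s' instead of %ld",
                 kind, actual, type, expected);
    return n >= 0 && (size_t)n < cap;
}
```

For example, passing "count", "CODE", 12, and 10 produces a message saying the application really has 12 CODE resources, so 12 is the value to put in the source.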
[Why -1 and 1 for the flag instead of “true” and “false”? Because the code which Think C emits to pass “true” and “false” differs in length, so turning off debugger output would change the length of the CODE. Most compilers will pass “1” and “-1” with code of the same length. Sorry.]
You should do this every time you’re ready to distribute a standalone version of the application. If you build new versions often, you might want to conditionally compile out the code to save a lot of time, then put it back before shipping.
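One way to compile the check out during day-to-day builds is an ordinary preprocessor switch. The sketch below is an assumption about how you might arrange it, not something taken from the demo: SELF_CHECK_ENABLED and self_check are hypothetical names, and the actual count and length are passed in as parameters here so the example stands alone (in a real application they would come from the diagnosis routine).

```c
#include <assert.h>

/* Hypothetical build switch: define as 0 for everyday builds and leave
   as 1 for the build you calibrate and ship. */
#ifndef SELF_CHECK_ENABLED
#define SELF_CHECK_ENABLED 1
#endif

#define EXPECTED_COUNT  10
#define EXPECTED_LENGTH 10000L

/* Return 1 if the values look wrong, 0 if they look right or if the
   check has been compiled out. */
static int self_check(short actual_count, long actual_length)
{
    assert(actual_count >= 0 && actual_length >= 0);
#if SELF_CHECK_ENABLED
    return actual_count != EXPECTED_COUNT || actual_length != EXPECTED_LENGTH;
#else
    (void)actual_count;     /* unused when the check is disabled */
    (void)actual_length;
    return 0;
#endif
}
```

Bear in mind that turning the switch on or off changes the compiled code, and so presumably the total CODE length as well. Calibrate with the check enabled, in exactly the configuration you intend to ship.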
Be sure to test that it diagnoses correctly. Make a couple of working copies of it; use ResEdit to add a CODE resource into one and type a few extra bytes onto the end of a CODE resource in another. Launch both and make sure they diagnose themselves. Note that some development environments (like Think C) will lock and/or protect resources, and you may need to undo this protection before altering them with ResEdit.
Lastly, a slightly more general form of checking for any kind of resource can be had with the vResCheck () function. This is identical to vCodeCheck (), except that you pass the type of resource as a first argument. You may want to use this for resources of type CDEF, LDEF, WDEF, or in general anything which contains executable code.
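If you check several resource types, a small table of expected values keeps the calls in one place. The following is a simplified model in portable C rather than Toolbox code: struct res, struct expectation, type_changed, and any_changed are all hypothetical names, and a negative length is used here to stand for a resource that could not be read. Like the listing, the sketch treats such an error as apparent tampering.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one resource: its four-character type and its
   length. A negative length models a resource that could not be read. */
struct res {
    const char *type;
    long length;
};

/* Expected count and total length for one resource type. */
struct expectation {
    const char *type;
    short count;
    long total_len;
};

/* Return 1 if resources of `type` don't match the expected count and
   total length, or if any of them is unreadable; else 0. */
static int type_changed(const struct res *file, size_t n, const char *type,
                        short expected_count, long expected_len)
{
    short count = 0;
    long total = 0;
    size_t i;

    assert(file != NULL && type != NULL);

    for (i = 0; i < n; i++) {
        if (strcmp(file[i].type, type) != 0)
            continue;               /* some other type: not our business */
        if (file[i].length < 0)
            return 1;               /* error: be conservative */
        count++;
        total += file[i].length;
    }
    return count != expected_count || total != expected_len;
}

/* Check every type listed in the table; types not listed (MENU, ICON,
   and so on) are deliberately ignored. */
static int any_changed(const struct res *file, size_t n,
                       const struct expectation *table, size_t n_table)
{
    int bad = 0;
    size_t i;

    for (i = 0; i < n_table; i++)
        if (type_changed(file, n, table[i].type,
                         table[i].count, table[i].total_len))
            bad = 1;
    return bad;
}
```

A table listing only CODE and CDEF lets users edit menus freely while still flagging any growth in the executable resources.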
Notes on the code
The file “virusCheck.h” contains important notes on using this technique, and you should at least skim it. A few especially important notes:
• It suggests a few ways to make non-functional changes to the code, so that viruses won’t evolve which recognize and disable the code.
• You must verify that the System is 3.2 or newer, or that the ROM is Mac Plus or newer.
• You can call the check function from more than one place at more than one time.
There’s very little interesting in “virusCheck.c” -- just a straightforward walk through the resources in the application’s resource file. Note that in case of errors, the function chooses to be conservative and reports this as an infection.
One interesting problem which I haven’t resolved is whether all the GetResource calls will have any significant effect. The function sets ResLoad to false, so no resources are actually loaded. But some proofreaders have pointed out that extra master pointers are allocated. For CODE resources, this is no big deal -- your application’s main loop will probably unload them anyway. But I’d be interested in comments on what to do here to minimize side-effects.
This is pretty generic Mac code, and it should port easily to any other C system, and reasonably to any language which can access the Toolbox. Probably the thing you’ll have to watch out for the most is how constants are compiled. For instance, as pointed out above, the Think C version of this can’t use “true” and “false” because they compile to different lengths.
Notes on the sample application
The demonstration program is pretty simple. It sets the count and length of CODE resources with #defines, and calls vCodeCheck (). (First, it asks if you want errors reported via the debugger, so you can calibrate the #defined values for your version of the C compiler and libraries.) Then it calls the function two more times to make sure it detects an incorrect count and length of CODE. That’s all!
It doesn’t demonstrate checking of types of resources other than CODE, but that’s pretty straightforward.
Other methods
You can do things to protect your application at the time you build it, as well as at the time it runs.
John Norstad, principal author of Disinfectant, suggests that you mark resources as “protected” and/or “locked” -- Think C already does this for you.
In Disinfectant, John wants to discourage any changes at all, so he:
• marks the application’s resource map “read only”
• checksums all resources at startup (this means you can’t modify menus, etc.)
• marks the application “shared” and “locked”
Of course, the usual defenses are still important -- keep several up-to-date virus-diagnosis applications on a locked floppy disk and inspect all your disks regularly. If you’re producing a commercial product, do everything you can think of -- twice -- to check the master disk of an application you are about to ship.
You may also want to “manually” inspect your application with ResEdit, to help spot new kinds of viruses.
Breaking the news
So, your application thinks it’s infected. Should you refuse to run? I think not -- what if some weird INIT or some change in system software is tripping you up? You might not really be infected. And the user may desperately need to use your application. (AppleLink refuses to run, but that’s because it might transmit an infection.) You might choose to offer the user the option of continuing or quitting, but in a recent application, I stuck with this terse explanation:
It’s probably not worth suggesting a particular application to repair the infection, since there’s no easy way to know whether the virus is too recent for that application to handle.
Above all, don’t be flippant -- some users have no idea what a virus is or what it means. Make it clear that they should get help from an expert. If your application includes a manual, you might want to devote a little space to recommending a user group or two for help. Also, to be on the safe side, if your manual describes the self-diagnosis feature, remind the user that self-diagnosis shouldn’t be relied upon as a substitute for more general diagnosis applications, which are regularly updated.
The last word
In the war on biological viruses, new antibiotics eventually bring about the evolution of resistant strains. The same is true of software defenses: copy-protection, system security, and virus diagnosis/repair all are made obsolete, though by intentional evolution instead of haphazard selection.
Apple’s Developer Technical Support even declines to offer much in the way of specific help. A note I got from them in October 1989 said, “Supporting an anti-Virus procedure in an application is something that MacDTS will not be able to support. It is similar to attempting to support copy protection: it will be a never ending battle.”
I agree that an officially-supported diagnosis or protection method is a bad idea, but not for the same reason. A never-ending battle per se isn’t a problem (does Apple quit competing with IBM just because there’s no end in sight?). But any standard solution invites attacks targeted specifically for its methods. While I hope the “virusCheck” functions see wide use, I hope that people will modify them to prevent viruses from recognizing them -- and improve them to detect future strains.
Evolving new diagnosis and protection techniques, remember, is only part of the picture. User education, careful administration of shared machines (such as in academic computing centers), and active efforts to find virus authors are needed. (But see Jim Matthews’ sobering letter in the September ’89 Communications of the ACM for thoughts on overreacting to digital diseases.)
It’s sad that some developers have to spend their time producing high-quality defenses against viruses, but sadder still that equally-talented developers waste everyone’s time because they don’t have the maturity to put their abilities to good use. As Spock said to Trelane in the Star Trek episode The Squire of Gothos, “I object to you. I object to intellect without discipline. I object to power without constructive purpose.” Let’s hope MacTutor doesn’t need many follow-ups to this article.
Listing: virusCheck.h

/* virusCheck.h -- Functions for self-diagnosis of virus infections.
   Copyright © 1989 by Michael S. Morton.
   Special thanks to John Norstad and Andrew Levin for advice.
   You may copy, alter, use, and distribute these routines if you leave
   this file unchanged up to this line.

   Think C version.

   Notes:
   ----
   • You are STRONGLY urged to make non-functional changes to both C
     functions, to discourage the invention of viruses which recognize
     this code and disable it. Specifically:
     - all parameters and local variables are now declared "register";
       delete the 'register' keyword for randomly-chosen variables
     - declare your own variables and pepper the code with assignments
       involving them -- "a = b+c/d*e+f". (Be sure to avoid division
       by 0.)
     - reorder pairs of lines which are preceded by this comment:
         You can swap the order of the next two lines
     - test your application (see below) after all these changes
     - remember that you can call this function from more than one
       place in your application

   • To calibrate your application:
     - set the application's calls to vResCheck or vCodeCheck to pass
       "1" for the "report" parameter
     - build a standalone, double-clickable application
     - make sure that you have a debugger installed which can intercept
       calls to DebugStr () -- MacsBug or TMON will do
     - run the application; if you get messages of the form:
         Got count CC for resource type '<type>', instead of <expected>
         Got length LL for resource type '<type>', instead of <expected>
       then change the arguments in your calls to vCodeCheck() and
       vResCheck() to pass CC for count and LL for long

   • To test your application's virus-detection:
     - calibrate it as above
     - change the application's calls to pass "-1" for reporting
     - build a standalone, double-clickable application
     - use ResEdit to add a CODE resource from anywhere to your
       application
     - launch the application and make sure it detects and reports
       infection
     - delete the added CODE resource or build the application again

   • Both of these C functions require EITHER a Mac Plus or 512KE or
     later, OR System file 3.2 or later (for the "one deep" resource
     calls). The application must check this before calling these.
     N.B.: I'm not 100% sure that System 3.2 will work on 128K/512K
     ROM; please try it if you expect your application to work on this
     configuration.

   • The "report" parameter takes 1 and -1, not 1 and 0, because many
     compilers will compile a parameter 0 in less space than a
     parameter 1.

   • If these functions encounter an unexpected error, they act
     conservatively and assume there's an infection.

   • If you're working under Think C, the checksum will be different
     depending on whether your application is running as a project or
     as a standalone application. You may want to use this technique
     (invented by David Oster, I think) which tests whether the Think C
     environment is present. It relies on the fact that your project's
     resource file is the current resource file when you start up, and
     it contains no CODE resources.

       if (Count1Resources ('CODE'))  -- are we standalone or project?
         if (vCodeCheck ( ))          -- standalone: do the check
           { }                        -- check failed: report virus

     For this to work, you must have a resource file "project.rsrc".

   • In certain obscure cases, you may find that changing the arguments
     changes the code length. For example, changing:
       vCodeCheck (3, 0L, 1);
     to
       vCodeCheck (3, 12345L, 1);
     will do this. If this is a problem, move the constant out of the
     code with something like:
       static long expected = 12345L;
       vCodeCheck (3, expected, 1);
*/

#ifndef _virusCheck_            /* already seen this */
#define _virusCheck_            /* yes: don't define it again */

/* vCodeCheck -- Check for apparent alteration of CODE resources.
   Return TRUE if the count/length do NOT match, meaning an apparent
   infection. */
extern Boolean vCodeCheck (
    short expectedCount,        /* expected number of CODEs */
    long expectedLen,           /* expected total size of CODEs */
    short report);              /* >0 => report errors to developer */

/* vResCheck -- Check for apparent alteration of resources.
   Return TRUE if the count/length do NOT match, meaning an apparent
   infection. */
extern Boolean vResCheck (
    ResType type,               /* type of resource to sum */
    short expectedCount,        /* expected number of resources */
    long expectedLen,           /* expected total size of resources */
    short report);              /* >0 => report errors to developer */

#endif /* _virusCheck_ */
Listing: virusCheck.c

/* virusCheck.c -- Functions for self-diagnosis of virus infections.
   Copyright © 1989 by Michael S. Morton.
   You may copy, alter, use, and distribute these routines if you leave
   this file unchanged up to this line.

   See notes in the ".h" file.

   Think C 3.0 version. */

/* History:
   26-Nov-89 -- MM -- No longer needs strings library.
                      Various small documentation changes.
    6-Nov-89 -- MM -- First version.

   Enhancements needed:
   • Should we do a ReleaseResource for some resources, to ditch the
     master pointer which gets allocated only because we asked for it?
     How can we know when to do this? For CODE resources, it's not a
     big problem, since there aren't vast numbers of them.
   • Consider checking if the ROM/System file is recent enough for us
*/

#include "virusCheck.h"         /* get our own prototypes */

/* Local prototypes: */
static void fail (char *kind, ResType type, long actual, long expected);
static void append (char **pPtr, char *s);

/* vResCheck -- Check that the resources of a specified type in the
   application haven't been altered. Return TRUE if there's apparent
   tampering. */
extern Boolean vResCheck (type, expectedCount, expectedLen, report)
    register ResType type;          /* INPUT: type of resource to sum */
    register short expectedCount;   /* INPUT: expected number of resources */
    register long expectedLen;      /* INPUT: exp. total len of resources */
    register short report;          /* INPUT: >0 => report errors w/debugger */
{
    register short actCount;        /* actual count of rsrcs of this type */
    register long actLen;           /* actual total length of resources */
    register Handle rsrc;           /* resource to check */
    register Boolean failFlag = false;  /* any problems encountered? */
    register short oldResFile;      /* for preserving current resource file */
    register Boolean oldResLoad;    /* for preserving "ResLoad" flag */

    /* Switch to the application's resource file. Note that all resource
       calls from here on are the "one deep" calls from Inside Mac,
       vol. IV. */

    /* You can swap the order of the next two lines: */
    oldResFile = CurResFile ();     /* remember initial resource file */
    oldResLoad = ResLoad;           /* remember "ResLoad" state */

    /* You can swap the order of the next two lines: */
    UseResFile (CurApRefNum);       /* search application for resources */
    SetResLoad (false);             /* don't load resources right away */

    /* You can swap the order of the next two lines: */
    actLen = 0;                     /* initialize length */
    actCount = Count1Resources (type);  /* how many of this type are there? */

    if (actCount != expectedCount)
    {
        if (report > 0)             /* is the developer listening? */
            fail ("count", type, actCount, expectedCount);
        failFlag = true;            /* TAMPERING DETECTED */
    }                               /* end of mismatched resource count */

    while (actCount)                /* loop actCount down to 1 */
    {
        /* Get the resource's handle, but don't load it. */
        rsrc = Get1IndResource (type, actCount);    /* see if it's already in memory */
        if (! rsrc)                 /* not available? */
        {
            if (report > 0)         /* is the developer listening? */
                DebugStr ("\pResource not available!");
            failFlag = true;        /* error detected; ASSUME TAMPERING */
            goto EXIT;              /* sorry, Dr. Dijkstra */
        }

        /* You can swap the order of the next two lines: */
        actLen += SizeResource (rsrc);  /* sum up length of rsrcs of this type */
        --actCount;                 /* get next index number */
    }                               /* end of loop through resources */

    if (actLen != expectedLen)
    {
        if (report > 0)             /* is the developer listening? */
            fail ("length", type, actLen, expectedLen);
        failFlag = true;            /* TAMPERING DETECTED */
    }

EXIT:                               /* goto here on tampering or error */
    /* You can swap the order of the next two lines: */
    UseResFile (oldResFile);        /* restore original resource file */
    SetResLoad (oldResLoad);        /* restore original loading state */

    return failFlag;                /* TRUE => error or tampering */
}                                   /* end of vResCheck () */

/* vCodeCheck -- Check CODE resources haven't been altered. */
extern Boolean vCodeCheck (expectedCount, expectedLen, report)
    register short expectedCount;   /* expected number of CODEs */
    register long expectedLen;      /* expected total size of CODEs */
    register short report;          /* INPUT: >0 => report errors w/debugger */
{
    return vResCheck ('CODE', expectedCount, expectedLen, report);
}                                   /* end of vCodeCheck () */

/* fail -- dump a string like:
     Got <kind> <actual> for resource type '<type>', instead of <expected>
*/
static void fail (kind, type, actual, expected)
    char *kind;                     /* INPUT: "count" or "length" */
    ResType type;                   /* INPUT: resource type which failed */
    long actual, expected;          /* INPUT: counts or lengths */
{
    char buffer [100];              /* for accumulating output message */
    char *bufp;                     /* pointer into buffer[] */
    Str255 actualText, expectedText;    /* formatted from params */
    union                           /* to get ResType to be like string */
    {
        char resName [5];
        ResType theType;
    } u;

    NumToString ((long) actual, & actualText);
    PtoCstr ((char *) & actualText);
    NumToString ((long) expected, & expectedText);
    PtoCstr ((char *) & expectedText);

    u.theType = type;               /* set up resource type */
    u.resName [4] = '\0';           /* to be a NUL-ended C string */

    bufp = buffer;                  /* point to output buffer */
    append (& bufp, "Got ");
    append (& bufp, kind);
    append (& bufp, " ");
    append (& bufp, (char *) & actualText);
    append (& bufp, " for resource type '");
    append (& bufp, u.resName);
    append (& bufp, "' instead of ");
    append (& bufp, (char *) & expectedText);
    *bufp++ = '\0';

    CtoPstr (buffer);
    DebugStr (buffer);
}                                   /* end of fail () */

/* append -- Append a string to an output buffer. This routine lets us
   avoid pulling in the "strings" library. */
static void append (pPtr, s)
    char **pPtr;                    /* UPDATE: VAR ptr to output */
    register char *s;               /* INPUT: string to append */
{
    register char *p;               /* output ptr */
    register char c;

    p = *pPtr;                      /* pick up output pointer */
    while (c = *s++)                /* loop through all non-nulls */
        *p++ = c;                   /* storing them in buffer */
    *pPtr = p;                      /* return updated output pointer */
}                                   /* end of append () */
Listing: virusDemo.c

/* virusDemo.c -- Demonstration of virus self-diagnosis.
   Copyright © 1989 by Michael S. Morton. */

/* History:
   26-Nov-89 -- MM -- First version. */

#include "virusCheck.h"         /* get our own prototypes */

/* Local and C library prototypes: */
void main (void);
int printf (char *formatn, ...);
int getche (void);

/* NOTE: The total length of the CODE resources will vary with your
   development system, so you'll have to calibrate this. */
#define CODELENGTH 15948L       /* actual length of CODE resources */

/* NOTE: The number of CODE resources will vary with how you arrange
   your project. This count assumes that all source files and libraries
   are grouped into a single segment. The count is higher because of
   the resources which Think C adds. */
#define CODECOUNT (1+2)         /* actual count of CODE resources */

#define WRONGLENGTH (CODELENGTH+1)  /* guaranteed to fail */
#define WRONGCOUNT (CODECOUNT+1)    /* ditto */

void main ()
{
    char dbgResponse;           /* user response for debugger query */
    short debugger;             /* debugger installed? 1=y, -1=n */

    if (! Count1Resources ('CODE'))
    {
        SysBeep (5);
        printf ("This demo must be run standalone, not within Think C\n");
        printf ("Press any key to exit ");
        getche ();
        ExitToShell ();
    }

    printf ("Report unexpected errors with the debugger? ");
    dbgResponse = getche ();
    printf ("\n");
    if ((dbgResponse == 'y') || (dbgResponse == 'Y'))
        debugger = 1;
    else
        debugger = -1;

    printf ("\nChecking -- shouldn't fail ");
    if (vCodeCheck (CODECOUNT, CODELENGTH, debugger))
        printf ("FAILED! *** This application is apparently infected. ***\n");
    else
        printf ("didn't fail.\n");

    printf ("\nChecking -- should fail on count ");
    if (vCodeCheck (WRONGCOUNT, CODELENGTH, -1))
        printf ("failed.\n");
    else
        printf ("DIDN'T FAIL!\n");

    printf ("\nChecking -- should fail on length ");
    if (vCodeCheck (CODECOUNT, WRONGLENGTH, -1))
        printf ("failed.\n");
    else
        printf ("DIDN'T FAIL!\n");

    printf ("\nPress any key to exit ");
    getche ();
}                               /* end of main () */