Text File | 1992-06-29 | 26KB | 479 lines
Mutation Engine Report (2)
Copyright (c) 1992 by VDS Advanced Research Group
All Rights Reserved
P.O. Box 9393
Baltimore, MD 21228
(410) 247-7117
prepared by
Tarkan Yetiser
This report is provided to satisfy the curiosity of the
public. We were approached by some third parties to perform an
analysis on MtE. We would like to share the results of our
analysis with everyone. If you find an error or inaccuracy in
this report, please feel free to contact us. All constructive
criticism is welcome. We thank all those who took the time to
read and bring inaccurate or ambiguous parts of this report
to our attention.
TABLE of CONTENTS
I. Mutation Engine and Viruses
II. How to Catch Viruses and MtE-based Viruses
III. Mutation Types and Detection Algorithms
A. Sample Decryptor Disassembly
IV. Live Tests and Results
A. Comments on Test Results
V. A Simple Message
------------------------------------------------------------------------
I. Mutation Engine and Viruses
We have analyzed the so-called MtE (Mutation Engine by a "Dark
Avenger" from Bulgaria), and sample viruses based on it; namely,
Pogue and Dedicated. We have also conducted tests to examine what
kind of a potential this miscreant has, and collected empirical
data on how popular scanners deal with the MtE. We have also implemented a
little program (CatchMtE) that can recognize MtE-based code using an
algorithmic technique. The program in executable form is available free of
charge as a service to the public. Due to possible misuse, the source code
and a more detailed (at the opcode and bit-mask level) analysis with
decryptor samples and algorithms necessary to detect MtE will be made
available in a limited fashion. Under no circumstances will actual virus
samples be provided, except that missed samples can be sent to known
anti-viral product developers who wish to enhance their programs.
For those who are not familiar with the MtE, some preliminary
information will be presented first:
MtE is NOT a virus per se, but an object module that can be
linked into a virus to give it polymorphic capabilities. MtE
expects to be called as a routine that can encrypt a certain
portion of code and can generate a suitable decryption routine. It
uses a random number generator to vary each mutation so that it
will not be possible to recognize the new variant by using simple
scanning techniques. The random number generator is not part of the
MtE object module. A sample pseudo-random number generator is
included with the archive Dark Avenger distributes. A virus writer
could also supply his own random number generator.
It's not trivial to take an existing virus and link it with MtE
and turn it into a polymorphic virus. The concern is that having
MtE available greatly reduces the effort needed to craft such a virus.
It's more like an "off-the-shelf" component that other virus writers
can use. Many of those virus writers would probably not be able to code
a polymorphic virus from scratch. Analysis of many viruses indicates
that their authors are not that skilled at all. It is this majority
that could benefit from something like MtE.
Though all this may sound ordinary, MtE got so much attention
not because it is just another encryptive virus but because it can
provide even simple viruses with a feature that makes it difficult
to scan for them. MtE is just like a library routine that you link
into your virus and call when needed. It is a little over 2K in an
object module named MTE.OBJ. A person who calls himself "Dark
Avenger" claims to have developed MtE, and distributes it by
uploading to BBSes in Bulgaria. The archive contains fairly
detailed documentation on how to use MtE, and even includes a
demonstration virus, a non-resident COM infector known as "Dedicated".
Shortly after MtE made its appearance, a modified copy of this virus
called "Fear" was also seen. We have been informed that a few other
MtE-based viruses have been reported recently.
Why this person is engaged in such potentially harmful activity,
or how he/she gets away with it is not something we know about. Curious
individuals who would like to learn more about the history of virus
production in Bulgaria and other social as well as technical issues
are invited to read an excellent paper written by anti-virus researcher
Mr. Vesselin Bontchev of Virus Testing Center, University of Hamburg.
The paper is titled "Bulgarian Virus Factory", and it is available via
anonymous FTP. It provides insight into some of the cultural aspects of
the virus underground in Bulgaria. Mr. Bontchev's contribution to
anti-virus research is much appreciated; otherwise, we probably would
have never known what goes on inside the Bulgarian virus factories.
II. How to Catch Viruses and MtE-based Viruses
Scanning for many known viruses is usually a trivial task.
You disassemble a sample, extract a sequence of bytes that would
exist in each infected executable object, put it into a pattern
matching engine, and then look for that pattern in executable
objects that that virus is known to target. This method proved to
be quite useful in fighting many viruses seen in the wild. Assuming
a carefully chosen scan string, you can find the virus easily
without too many false positives. Not so for polymorphic viruses.
These viruses try to defeat common scanning methods. They keep
their body encrypted to defy analysis, and encrypt the new copy
inserted into an executable object using a different key so that it
will "look" as if a different virus infection has occurred. However,
even these viruses require a plaintext code that will decrypt the rest
of the virus. Scanners can use strings extracted from the plaintext
portion of the virus to identify them. It is usually necessary to
include wildcard bytes (don't-care bytes) to be able to deal with the
varying parts of the decryption routine. Naturally, false alarms are
more likely to occur. MtE-based viruses are more advanced than such
viruses seen before.
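The scan-string approach described above, including don't-care bytes, can be
sketched in a few lines of Python. This is only an illustration and not the
matching engine of any particular scanner: the function name find_signature,
the use of None as the wildcard marker, and the sample byte sequences are all
assumptions made up for this example.

```python
from typing import Optional, Sequence

def find_signature(data: bytes, pattern: Sequence[Optional[int]]) -> int:
    """Return the offset of the first match of pattern in data, or -1.

    Each pattern element is a byte value (0..255) or None, where None
    is a wildcard (don't-care) byte that matches anything.
    """
    n, m = len(data), len(pattern)
    for start in range(n - m + 1):
        if all(p is None or data[start + i] == p for i, p in enumerate(pattern)):
            return start
    return -1

# Hypothetical decryptor fragment: "mov bx,<imm16>" followed by "xor di,bx".
# The 16-bit immediate differs between copies, so it is wildcarded.
PATTERN = [0xBB, None, None, 0x33, 0xFB]

copy_a = bytes([0x90, 0xBB, 0x20, 0xF4, 0x33, 0xFB, 0x90])  # made-up sample
copy_b = bytes([0xBB, 0x11, 0x22, 0x33, 0xFB])              # made-up sample
clean = bytes([0x90, 0x90, 0xBB, 0x20, 0xF4, 0x90, 0x90])   # no match expected

for name, blob in (("copy_a", copy_a), ("copy_b", copy_b), ("clean", clean)):
    print(name, find_signature(blob, PATTERN))
```

Every wildcard added to a pattern widens the set of unrelated files it can
match, which is why false alarms become more likely.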
We would like to emphasize that the contents of each mutation
and the corresponding decryption routine MtE generates are far too
variable to extract a simple (or even wildcard) scan string. It is
necessary to analyze the MtE itself as well as many sample mutations.
After that, certain characteristics of the code MtE generates can be
used as telltale signs to detect its presence. Avoiding false positives
while maintaining a 100% detection rate is quite difficult.
Armed with an 80x86 instruction set guide (we used Turbo
Assembler 3.0 Quick Reference Guide), and a good disassembler (we
used Mr. Zandt's DIS86 available via anonymous FTP), and a few
known viruses based on MtE (Pogue and Dedicated with payload
removed), we analyzed the MtE code, and the mutations generated.
Tests were conducted on a 40MHz 386 with a 100 meg HD and MS-DOS
5.0, and a 4.77MHz IBM/XT with a 30 meg HD and PC-DOS 3.3 installed.
A simple program that generated decoys (small, fully functional programs)
was used to create a large number of samples.
In the case of Pogue, the virus was allowed to remain resident and
infect each decoy program as it was created. Since the Dedicated
virus is not resident, it was necessary to create decoys first and
then infect them by running the virus (infects in the current
directory). After the tests, we archived the samples and stored
them on floppy diskettes, and removed them from the hard drives of
the test machines.
In the Intel 80x86 architecture, it is possible to express a
computation in very dissimilar ways. This is possible because
certain registers can be substituted in place of another one and
still achieve the same result. For example, you can index an array
by using SI, DI, BP or BX registers. Or you could XOR a certain
value at a given memory location by loading that value in AX, BX,
CX or DX first, and performing the XOR on that register, and then
putting the result back into memory, etc. Even other possibilities
exist. When stepping through elements in an array, you can
increment the index register by ADDing to it, INCing it, or ADDing
and then SUBtracting from it. It should be clear that such
flexibility helps MtE significantly. Of course, variability is
something string scanners do not handle too well, since there are
many combinations to search for.
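As a rough illustration of this point (not taken from MtE itself), the
following Python sketch models three ways of advancing a 16-bit index
register by 2. The helper names and the VARIANTS table are hypothetical, and
the byte encodings shown are the usual short 8086 forms as we understand
them; treat them as illustrative rather than authoritative. Only the
register value is modeled, not the flags.

```python
MASK16 = 0xFFFF  # the index register is 16 bits wide

def add16(reg, n):
    """16-bit addition with wrap-around."""
    return (reg + n) & MASK16

def sub16(reg, n):
    """16-bit subtraction with wrap-around."""
    return (reg - n) & MASK16

# name -> (illustrative machine-code bytes, function computing the new BX)
VARIANTS = {
    "inc bx / inc bx":     (bytes.fromhex("4343"),         lambda bx: add16(add16(bx, 1), 1)),
    "add bx,2":            (bytes.fromhex("83c302"),       lambda bx: add16(bx, 2)),
    "add bx,5 / sub bx,3": (bytes.fromhex("83c30583eb03"), lambda bx: sub16(add16(bx, 5), 3)),
}

for bx in (0x0000, 0xF420, 0xFFFE):
    results = {name: step(bx) for name, (_, step) in VARIANTS.items()}
    print(f"{bx:04x}h ->", {name: f"{value:04x}h" for name, value in results.items()})
```

All three variants leave the same value in the register, yet their byte
sequences differ in both content and length, so no single fixed scan string
covers them.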
MtE goes even further than that. The size of the decryption
routine is also variable, making it infeasible to assume certain
things that would hold for many polymorphic viruses. It also sets
up a lengthy sequence of redundant instructions before the decryptor
enters the decryption loop.
For over 90% of the mutations, MtE generates a convoluted
16-bit XOR-type encryption; however, in many cases it uses indirect
ways to apply the XOR mask to a memory value. For example, it
computes the mask, and then gets the value to be decrypted into a
register, applies the mask and puts the result back into that memory
location. Besides, memory access is done using many different
instructions such as MOV and XCHG. Sometimes, it uses a SUB and NEG
instruction sequence instead of XOR. There are also many redundant
instructions peppered freely throughout the decryptor.
In some cases (5.5%), MtE generates a decryptor with a null
effect. The decryptor does not actually decrypt anything, and the
virus code is in plaintext. The frequency of such cases seems to
depend on the random number generator. It is funny to note that some
popular scanners misidentify such extreme cases where the virus
is not even encrypted. To handle these mutations, it is sufficient
to extract a signature from the MtE itself. It is also possible to
extract one from known MtE-based viruses and identify the virus
directly. At any rate, a scan string from MtE itself should be used
in case a future virus creates a plaintext variant.
We must also mention that even these plaintext mutations
contained a fully working copy of MtE. They successfully propagated
and generated encrypted mutations in future generations. MtE appears
to generate correct code in all cases. The deviation between new
generations started using plaintext parents and new generations started
using encrypted parents was negligible.
III. Mutation Types and Detection Algorithms
MtE generates 3 "types" of mutations. They are as follows:
1. Double-reference (detectable using Method-1) ( ~ 93.0% )
2. Single-reference (detectable using Method-2) ( ~ 1.5% )
3. Plaintext or no-reference ( ~ 5.5% )
By implementing two algorithms and one scan string for the
plain mutations, it is possible to recognize MtE-based viruses
while keeping false positives to an acceptable level. We have one
such program that achieved a 100% hit rate during our tests. Some
others also claim a 100% hit rate, and we have tested them as well.
Double-reference mutations contain two memory accesses. The
first one usually (but not always) fetches a word to be decrypted.
The second one updates the same memory location after the word is
decrypted. We must mention that neither fetching nor updating is
always by simple MOVs; in fact, XCHG as well as SUB and NEG
instructions are used sometimes.
Single-reference mutations contain only one memory reference.
The instruction that decrypts using one memory access is XOR. It
simply applies an XOR-mask, computed in a very convoluted way, to
a memory location. There are some irregularities in this type of
mutation. Popular scanners we tested had difficulty in dealing with
this type of sample.
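The difference between the two encrypted types can be pictured with a toy
model in Python. This is a simplified sketch under our own assumptions, not a
reproduction of MtE's output: the TracedMemory class and both decrypt
functions are hypothetical, a fixed mask stands in for the convoluted mask
computation, and plain loads and stores stand in for the MOV, XCHG and other
forms mentioned above.

```python
class TracedMemory:
    """Word-addressable memory that logs each memory-referencing instruction."""

    def __init__(self, words):
        self.words = list(words)
        self.refs = []  # one entry per instruction that touches memory

    def load(self, i):                # stands in for e.g. mov ax,[bx]
        self.refs.append(("load", i))
        return self.words[i]

    def store(self, i, value):        # stands in for e.g. mov [bx],ax
        self.refs.append(("store", i))
        self.words[i] = value & 0xFFFF

    def xor_in_place(self, i, mask):  # stands in for e.g. xor [bx],di
        self.refs.append(("xor", i))
        self.words[i] = (self.words[i] ^ mask) & 0xFFFF

def decrypt_double_reference(mem, mask):
    """Fetch each word into a register, apply the mask, write it back."""
    for i in range(len(mem.words)):
        value = mem.load(i)
        mem.store(i, value ^ mask)

def decrypt_single_reference(mem, mask):
    """Apply the mask to each word with a single memory-operand XOR."""
    for i in range(len(mem.words)):
        mem.xor_in_place(i, mask)

encrypted = [0x1234, 0xABCD, 0x0F0F]  # made-up data
a, b = TracedMemory(encrypted), TracedMemory(encrypted)
decrypt_double_reference(a, 0x7DCC)
decrypt_single_reference(b, 0x7DCC)
print(a.words == b.words, len(a.refs), len(b.refs))
```

Both routines produce the same plaintext; they differ only in how many
memory-referencing instructions each loop iteration contains.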
Our CatchMtE implements extra checks to reduce false alarms while
maintaining a 100% hit rate on this type of mutation. Many self-modifying
programs (especially the compressed ones) trigger a false alarm, since
they may contain similar memory references while they are modifying
themselves. However, there is one characteristic of MtE that easily
gives it away and simplifies positive identification. Once you verify
the presence of that characteristic, you can achieve 100% hit rate even
if MtE is enhanced! We will not go into further detail here for security
reasons.
In the case of plaintext mutations, there are two subcategories:
1. Null-effect decryptor with no memory reference
2. Null-effect decryptor with memory reference but with a 0-key
The mutations in the first category contain nothing more than a
series of MOVs, ROLs, ANDs and so on. Their overall effect is null since
no reference to the code to be decrypted (in plaintext in this case) is
present. The body of the virus as well as the MtE itself are both in
plaintext and can be disassembled directly.
The mutations in the second category are a little more interesting
in that the decryptor actually fetches from and stores words to memory. In
binary arithmetic, if you XOR a value with 0, that value does not change;
just like if you multiply a number by 1, you get the same number. Again,
the overall effect of the decryptor is null since it uses a 0-key. The
virus code as well as the MtE itself are both in plaintext.
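The null effect of a 0-key is easy to check. The short Python sketch below
uses a hypothetical helper xor_words and made-up data words; it shows that
XORing with 0 leaves every word unchanged, while a nonzero key changes the
words and a second pass with the same key restores them.

```python
def xor_words(words, key):
    """Apply a 16-bit XOR key to every word in the list."""
    return [(w ^ key) & 0xFFFF for w in words]

plain = [0x20BB, 0xBFF4, 0x0DEE]      # made-up 16-bit words

same = xor_words(plain, 0x0000)       # 0-key: nothing changes
scrambled = xor_words(plain, 0x7DCC)  # nonzero key: words change
restored = xor_words(scrambled, 0x7DCC)

print(same == plain, scrambled != plain, restored == plain)
```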
A more detailed analysis of mutation types is not made public
due to possible misuse of such information. We will only show here
a sample decryptor and its commented disassembly.
A. Sample Decryptor Disassembly
Following is a sample decryption routine MtE generated during a live
Pogue test. The rest of the disassembly is omitted for security reasons.
This sample was missed by SCAN 91 and F-PROT 2.04, though F-PROT 2.04a
detects it. The following code is NOT a virus.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Virus : Pogue mutation, encrypted, resident COM infector, 21h,1Ch
; Analyst : -
; Date : June 1992
; Type : Single-reference
; File name : PMUT1220.COM, in PDF
; File size : 3608
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
***** START of DISASSEMBLY *****
COM_PROG_ENTRY:
; In determining the program entry point, it is necessary to use a
; "sliding-window" approach since the first instruction is not a JMP
; or CALL. Second byte is a JMP (e9) and the offset follows it.
0000:0100  4d           dec  bp
0000:0101  e9 fc 01     jmp  DECRYPTOR_ENTRY (0300)
; junk deleted
; In this mutation, the BX register is used to step through the code a word
; at a time. Note the way the XOR-mask is computed: on the first iteration
; DI will be 7DCCh (0DEEh XOR F420h = F9CEh, and F9CEh - 7C02h = 7DCCh),
; and it will change as BX is incremented.
; We are looking at a piece of code that uses BASED-addressing.
; Since the XOR instruction at 0000:030c includes a constant offset 0f18h,
; we must add that value to the initial value of BX, which is set to f420h
; before the loop. Now if you add 0f18h and f420h, you will get 0338h! Why?
; Because BX is a 16-bit register, and it will wrap around after 64K with
; the carry flag set during the process. CF is not important here.
; In other words, the block that is being decrypted is at 0000:0338h.
; The constant 0dee is XORed with the current value of index, and another
; constant 7c02 is subtracted to obtain the final XOR-mask to be applied
; to the word at 0f18+BX.
DECRYPTOR_ENTRY:
0000:0300  bb 20 f4     mov  bx,f420
DEC_LOOP_TOP:
0000:0303  bf ee 0d     mov  di,0dee
0000:0306  33 fb        xor  di,bx
0000:0308  81 ef 02 7c  sub  di,7c02
0000:030c  31 bf 18 0f  xor  [0f18+bx],di
; Following is an example of redundant instructions peppered throughout the
; decryptors. After these two instructions, BX will not change, and SI
; is not used anyway. In a non-virus program, you would not often see such a
; sequence.
0000:0310  8b f3        mov  si,bx
0000:0312  8b de        mov  bx,si
; Now BX will be incremented by 2 to step a word at a time.
; This is the simple case; in some mutations, the two INC instructions
; are not used at all.
0000:0314  43           inc  bx
0000:0315  43           inc  bx
0000:0316  75 eb        jnz  DEC_LOOP_TOP (0303)
; Now how do we calculate the number of iterations? Not too difficult.
; Above you see an implicit terminating condition for the loop. Again,
; it uses the fact that BX is a 16-bit register, and when it hits 64K
; it will wrap around to 0, in which case the loop will be done.
; Since the initial BX is F420h, and we are stepping 2 bytes at a time,
; the number of iterations is: (64K - F420h) / 2 = 1520 in decimal
; In other words, the size of the block being decrypted is 3040 bytes.
; Let's check to see if this makes sense numerically.
; The file size on this sample was 3608 bytes. Adjusting for 100h ORG,
; we can see that 0218h bytes are above the loop end.
;
; 3608 - 536 (0218h) = 3072 bytes from loop end to end of file
; 3608 - 568 (0238h) = 3040 bytes in encrypted block
;
; There is a discrepancy of 32 bytes? Not really, they are padding bytes
; as documented by Dark Avenger in MtE! 32 (MAX_ADD_LEN) bytes, as it says.
; This is only one of the ways it achieves variable length.
; Following are the padding bytes, which are all valid instructions!
0000:0318  bd 22 b9     mov  bp,b922
0000:031b  b1 03        mov  cl,03
0000:031d  d3 c5        rol  bp,cl
0000:031f  8b fd        mov  di,bp
0000:0321  bd ae 4f     mov  bp,4fae
0000:0324  81 e5 25 e7  and  bp,e725
0000:0328  8b dd        mov  bx,bp
0000:032a  8b ee        mov  bp,si
0000:032c  81 f5 60 fb  xor  bp,fb60
0000:0330  8a cb        mov  cl,bl
0000:0332  d3 c5        rol  bp,cl
0000:0334  8b cf        mov  cx,di
0000:0336  d3 c5        rol  bp,cl
; from this point on, the code is encrypted
0000:0338
***** END of DISASSEMBLY *****
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
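The arithmetic in the comments of the listing can be re-checked with a small
Python model of the registers. This is a sketch for verification only: it
tracks BX and DI and never touches any code or memory, and the function names
near_jmp_target and trace_decryptor are our own. The constants are the ones
that appear in the listing.

```python
MASK16 = 0xFFFF

def near_jmp_target(instr_addr, rel16):
    """Target of a 3-byte near JMP (e9 lo hi): next IP plus the 16-bit offset."""
    return (instr_addr + 3 + rel16) & MASK16

def trace_decryptor(bx=0xF420, disp=0x0F18, xor_const=0x0DEE, sub_const=0x7C02):
    """Step through the loop at 0303h..0316h, tracking registers only.

    Returns (first_mask, first_addr, last_addr, iterations).
    """
    first_mask = first_addr = last_addr = None
    iterations = 0
    while True:
        di = ((xor_const ^ bx) - sub_const) & MASK16  # mov di / xor di,bx / sub di
        addr = (disp + bx) & MASK16                   # effective address [0f18+bx]
        if iterations == 0:
            first_mask, first_addr = di, addr
        last_addr = addr
        iterations += 1
        bx = (bx + 2) & MASK16                        # inc bx / inc bx
        if bx == 0:                                   # ZF set, so jnz falls through
            return first_mask, first_addr, last_addr, iterations

FILE_SIZE = 3608  # from the listing header
ORG = 0x100       # COM programs load at offset 100h

first_mask, first_addr, last_addr, iterations = trace_decryptor()
block_bytes = 2 * iterations

print(f"jmp target      : {near_jmp_target(0x0101, 0x01FC):04x}h")
print(f"first XOR-mask  : {first_mask:04x}h")
print(f"block starts at : {first_addr:04x}h")
print(f"iterations      : {iterations}")
print(f"block size      : {block_bytes} bytes")
print(f"size check      : {FILE_SIZE - (first_addr - ORG)} bytes from block start to end of file")
```

Under these assumptions the model gives a jump target of 0300h, a first mask
of 7DCCh, a block start of 0338h, 1520 iterations and a 3040-byte block, in
agreement with the listing. The last word touched is at 0F16h, so the block
ends exactly at the end of the 3608-byte file (0F18h).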
IV. Live Tests and Results
Test #1   Base Virus Name: Dedicated
                     SCAN 91   F-PROT 2.04   CatchMTE 1.0
by Name (1)               67            69             60
as MtE (2)               933           931            940
misidentified             -0            -0            N/A
missed                    -0            -0             -0
Hit Rate                100%          100%           100%
(1) SCAN91 --> [Mut], F-PROT 2.04 --> Dedicated, CatchMTE --> Dedicated
(2) SCAN91 --> [DAME], F-PROT 2.04 --> MtE, CatchMTE --> MtE-based
Test #2   Base Virus Name: Pogue
                     SCAN 91   F-PROT 2.04   CatchMTE 1.0
by Name (1)                0             0             56
as MtE (2)               935           936            944
misidentified (3)        -65           -61            N/A
missed                    -0            -3             -0
Hit Rate               93.5%         93.6%           100%
(1) SCAN91 --> N/A, F-PROT 2.04 --> N/A, CatchMTE --> Pogue
(2) SCAN91 --> [DAME], F-PROT 2.04 --> MtE, CatchMTE --> MtE-based
(3) SCAN91 --> [7S], F-PROT 2.04 --> Gotcha, CatchMTE --> N/A
Test #3   Base Virus Name: Pogue
                     SCAN 91  F-PROT 2.04a   CatchMTE 1.0
by Name (1)                0             0            141
as MtE (2)              2399          2398           2259
misidentified (3)          ?             ?            N/A
missed                    -1            -2             -0
Hit Rate              99.96%        99.92%           100%
(1) SCAN91 --> N/A, F-PROT 2.04a --> N/A, CatchMTE --> Pogue
(2) SCAN91 --> [DAME], F-PROT 2.04a --> MtE, CatchMTE --> MtE-based
(3) Not counted
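The hit rates in the tables follow directly from the counts. The Python
sketch below recomputes them; the function name hit_rate and the RESULTS
table layout are our own, a "hit" is taken to mean "by Name" plus "as MtE",
and the uncounted misidentifications in test #3 are entered as 0, which is an
assumption.

```python
def hit_rate(by_name, as_mte, misidentified, missed):
    """Return (total samples, hit rate in percent rounded to 2 places)."""
    total = by_name + as_mte + misidentified + missed
    hits = by_name + as_mte
    return total, round(100 * hits / total, 2)

# (test, scanner) -> (by name, as MtE, misidentified, missed)
RESULTS = {
    ("#1", "SCAN 91"):      (67, 933, 0, 0),
    ("#1", "F-PROT 2.04"):  (69, 931, 0, 0),
    ("#1", "CatchMTE 1.0"): (60, 940, 0, 0),
    ("#2", "SCAN 91"):      (0, 935, 65, 0),
    ("#2", "F-PROT 2.04"):  (0, 936, 61, 3),
    ("#2", "CatchMTE 1.0"): (56, 944, 0, 0),
    ("#3", "SCAN 91"):      (0, 2399, 0, 1),   # misidentified not counted
    ("#3", "F-PROT 2.04a"): (0, 2398, 0, 2),   # misidentified not counted
    ("#3", "CatchMTE 1.0"): (141, 2259, 0, 0),
}

for (test, scanner), counts in RESULTS.items():
    total, rate = hit_rate(*counts)
    print(f"Test {test}  {scanner:<13} {total:>5} samples  {rate}%")
```

The totals come to 1000 samples in tests #1 and #2 and 2400 samples in
test #3 for every scanner.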
A. Comments on Test Results
The amount of time it took each program to complete the tests
is not published, so as not to distract from the main purpose of these
tests: to determine if they can achieve a 100% hit rate.
It seems that both F-PROT 2.04 and SCAN 91 misidentify some
Pogue mutations that are in plaintext. F-PROT "quickscan" missed
ALL mutations. You are advised to use the SECURE scan mode of this
product. The extra speed comes with a 0% hit rate on MtE-based viruses!
In test #2, F-PROT 2.04 missed three encrypted Pogue mutations.
We examined these samples and found them to be of Single-reference type,
and detectable using Method-2. The samples worked as expected. We can
only speculate that F-PROT lacks a Method-2 detection algorithm and uses
a heuristic in such cases.
One of those three that were missed was called "suspicious" and
guessed to be a variant of the Gotcha virus. Surprisingly, Virx 2.3
missed one of these same mutations. We did not include Virx 2.3 in our
full test suite.
Further testing was done with F-PROT 2.04a. This new release
caught all the ones that the previous one missed in test #2; but still
some others, of single-reference type again, were missed in test #3.
SCAN 91 also missed 1 mutation completely. This mutation was NOT one
of the two that F-PROT 2.04a missed; SCAN 91 identified those two as [DAME].
It should be noted that misidentification of about 6% of Pogue
mutations (65 of 1000 by SCAN 91 and 61 of 1000 by F-PROT 2.04 in
test #2) is a little alarming. All these misidentified mutations
were found to be working and capable of generating new mutations.
Some people may take an MtE-based infestation more seriously than an
infection by the "Seventh Son" virus. This leaves a chance for an unrecognized
mutation to escape detection. We must remember that it takes only one
virus to continue the infection unless other precautions such as
integrity checking are in place.
Someone brought to our attention that TBSCAN 3.3 can use the MtE AVR
module in VIRSCAN.DAT (revision 920617) to detect MtE-based viruses. A
copy was downloaded from an FTP site, and it was tested against the
sample set in test #3. It detected no viruses at all!
One of our researchers mentioned that this product is highly
optimized, and that it may not be checking for MtE if the file size
is less than a certain number of bytes. Samples in test #3 were of
varying sizes: as small as 3500 bytes and never larger than 4000 bytes.
When we tested this scanner with a few larger samples (11K), TBSCAN 3.3
seemed to be able to detect MtE. We did not conduct comprehensive tests
using TBSCAN.
This reminded us of the famous saying of legendary algorist and
computer scientist Dr. Donald Knuth that "premature optimization is the
root of all evil" in programming. Improving the performance of one's scanner
is always desirable, as long as one does not try to "cut corners" and
sacrifice reliability. F-PROT quickscan also missed all MtE mutations,
but at least F-PROT has a secure mode that catches most of them. TBSCAN
has no such mode; in fact, it has an even faster mode that missed
the samples faster!
It is not acceptable for a scanner not to find viruses that it
has scan strings for (or algorithms in this case). The primary function
of a virus scanner is to find viruses known to that scanner. If it performs
this task in a speedy manner, that's nice indeed; and it may even give it
a competitive edge. Failing to perform its primary function, on the other
hand, defeats the purpose of such a tool.
V. A Simple Message
It is dangerous to assume that scanning is adequate since
there are some products that can detect MtE-based viruses 100% of
the time. We identified at least two ways to make MtE less
predictable. Of course, such information will not be disseminated.
However, considering the availability of MtE to the hackers all
around the world, and the "glory" Dark Avenger will enjoy due to
media hype, it's only a matter of time before such improvements are
discovered by irresponsible individuals. Besides, this may start a
new trend among virus writers to create things like MtE. Keeping up
with new virus signatures was hard enough (though manageable), but
keeping up with many mutation engines is not going to be trivial.
Unfortunately, locking up these "mutant engineers" is not a practical
solution, and not even legally possible in many parts of the world.
The message is clear. The first line of defense against
viruses is NOT using scanners. Although they proved to be very
useful, you are highly encouraged to consider other approaches such
as integrity checkers as a first line of defense. Even the less
sophisticated integrity checkers have a better chance to catch
mutating viruses, long before their developers get a chance to
analyze the virus samples. The reason is that viruses have a tendency
to modify existing code to propagate in most cases. Their spread
can be controlled using a non-virus-specific solution that
concentrates on the main characteristic of most viruses. Such an
approach is not only more cost-effective but also more secure. If
your company still relies on a virus scanner to protect its PC-based
computing resources against viruses, you are walking on thin ice.
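The idea behind an integrity checker can be sketched in a few lines of
Python. This is a minimal illustration under our own assumptions, not a
description of any product: the function names are hypothetical, files are
represented as an in-memory dictionary of name to contents, and SHA-256 from
the standard hashlib module stands in for whatever checksum a real integrity
checker would use.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Checksum of a file's contents (SHA-256 as a stand-in)."""
    return hashlib.sha256(data).hexdigest()

def build_baseline(files: dict) -> dict:
    """Record a fingerprint for every known-clean executable."""
    return {name: fingerprint(data) for name, data in files.items()}

def changed_files(baseline: dict, files: dict) -> list:
    """Names that were modified, added, or removed since the baseline."""
    names = set(baseline) | set(files)
    return sorted(
        n for n in names
        if n not in files or baseline.get(n) != fingerprint(files[n])
    )

# Made-up "executables": just a few bytes each.
clean = {"A.COM": b"\x90\x90\xc3", "B.COM": b"\xb8\x00\x4c\xcd\x21"}
baseline = build_baseline(clean)

# Simulate any modification of B.COM, here 16 appended bytes.
modified = dict(clean)
modified["B.COM"] = clean["B.COM"] + b"\x00" * 16

print(changed_files(baseline, clean))     # nothing changed
print(changed_files(baseline, modified))  # B.COM is flagged
```

The check needs no knowledge of what changed the file or how the added code
looks, which is why mutation does not help a virus evade it.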