- Path: sparky!uunet!think.com!yale.edu!jvnc.net!netnews.upenn.edu!netnews.cc.lehigh.edu!news
- From: bontchev@fbihh.informatik.uni-hamburg.de (Vesselin Bontchev)
- Newsgroups: comp.virus
- Subject: How to measure polymorphism?
- Message-ID: <0022.9301071651.AA16031@barnabas.cert.org>
- Date: 6 Jan 93 21:21:12 GMT
- Sender: virus-l@lehigh.edu
- Lines: 120
- Approved: news@netnews.cc.lehigh.edu
-
- Hello, everybody!
-
- There are already two polymorphic engines available (MtE and TPE) and
- we are going to see more and more polymorphic viruses in the future.
- An interesting question arises - how to determine how polymorphic a
- virus is? How to determine which of two viruses is "more polymorphic"?
- In other words - how to measure polymorphism in an objective way?
-
- There are several ways one could think about:
-
- 1) Determining the total number of different instances that a
- particular virus can take. Obviously, a virus that can take 6
- different appearances (V-Sign) is less polymorphic than a virus that
- can take 65,536 (Cascade).
-
- Unfortunately, this does not work. First, for some viruses it is
- almost impossible (well, at least too difficult) to count the total
- number of appearances. Nobody has yet succeeded in determining the
- exact number of different decryptors that MtE can generate.
-
- Second, a variably encrypted virus with the following decryptor
-
- mov si,code_end
- mov cx,code_length
- mov ax,key
- decode:
- xor word ptr [si],ax
- jmp short skip
-
- garbage db ?
-
- skip:
- dec si
- dec si
- loop decode
-
- where the byte 'garbage' can take any value, is not very polymorphic,
- even though the number of possible decryptors is 256. If we change
-
- garbage db ?
-
- to
-
- garbage dw ?
-
- the number of possible decryptors becomes 65,536, but the virus is
- no more polymorphic than before...
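-
- A minimal Python sketch of this argument, assuming a decryptor can be
- modelled as constant code around a run of garbage bytes. The byte
- values are arbitrary placeholders, not real code, and the names
- (variants, PREFIX, SUFFIX) are made up for the illustration.
-
```python
from itertools import product

def variants(prefix: bytes, suffix: bytes, garbage_len: int):
    """Yield every instance of prefix + garbage + suffix."""
    for garbage in product(range(256), repeat=garbage_len):
        yield prefix + bytes(garbage) + suffix

# Arbitrary placeholder bytes standing in for the constant code
# (an assumption for this sketch, not taken from any real sample).
PREFIX = b"\x10\x20\x30\x40"
SUFFIX = b"\x50\x60\x70"

one_byte = set(variants(PREFIX, SUFFIX, 1))   # 'garbage db ?'
two_byte = set(variants(PREFIX, SUFFIX, 2))   # 'garbage dw ?'

print(len(one_byte))   # 256
print(len(two_byte))   # 65536

# The same plain (non-wildcard) scan string still finds every instance.
assert all(PREFIX in v for v in one_byte)
assert all(PREFIX in v for v in two_byte)
```
-
- The count grows by a factor of 256, yet the effort needed to detect
- the instances does not change at all.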
-
- 2) The second idea is to divide polymorphism into classes. Class 0
- means no polymorphism, class 1 means variable encryption with constant
- decryptor, class 2 means variable encryption with a decryptor that can
- be detected by a wildcard scan string, class 3 is a virus that cannot
- be detected even with a wildcard string. E.g., Vienna (class 0) is
- less polymorphic than Cascade (class 1), which is less polymorphic
- than Suomi (class 2), which is less polymorphic than V2P2 (class 3).
-
- Unfortunately, this is not good enough. First, what should we do with
- viruses that use a limited set of decryptors, one of which is selected
- randomly (Whale)? Such viruses are obviously more polymorphic than
- Cascade. But are they more or less polymorphic than Suomi? They can be
- detected by a set of non-wildcard strings...
-
- Second, what about Bad Boy? It consists of 9 segments of code, 8 of
- which can appear in any order. This gives 8! = 40,320 variants. But
- the virus is not even encrypted, so it can be detected with a simple
- (non-wildcard) scan string...
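-
- The arithmetic can be checked with a short Python sketch. The
- segments here are toy placeholder strings, and only four movable
- segments are enumerated to keep the run small; both choices are
- assumptions made for the illustration.
-
```python
from itertools import permutations
from math import factorial

# Toy stand-ins: one fixed segment followed by movable ones.
FIXED = b"<head>"
MOVABLE = [b"<seg1>", b"<seg2>", b"<seg3>", b"<seg4>"]

def orderings(fixed: bytes, movable: list):
    """Yield the fixed segment followed by each ordering of the rest."""
    for order in permutations(movable):
        yield fixed + b"".join(order)

all_variants = set(orderings(FIXED, MOVABLE))
print(len(all_variants))   # 24, i.e. 4!
print(factorial(8))        # 40320, the figure for 8 movable segments

# Any single unencrypted segment works as a plain scan string,
# whatever the ordering.
scan_string = MOVABLE[0]
assert all(scan_string in v for v in all_variants)
```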
-
- Third, what is a "wildcard string"? A string allowing "don't care"
- bytes? A string allowing "don't know how many don't care bytes"? A
- string allowing "don't care" bits? For instance, Maltese Amoeba cannot
- be detected by the user-defined scan strings of SCAN or F-Prot, but
- can be with the wildcard scan strings of HTScan/TbScan... So, is
- Maltese Amoeba class 2 or class 3?
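-
- How much the answer depends on the wildcard language can be seen in
- a small Python sketch. The pattern syntax below ('??' for one "don't
- care" byte, '*' for any number of them) is hypothetical and is not
- the syntax of SCAN, F-Prot, HTScan or TbScan; the data bytes are
- arbitrary.
-
```python
import re

def compile_scan_string(pattern: str) -> "re.Pattern":
    """Translate a toy scan-string syntax into a bytes regex.

    Tokens (hypothetical syntax, invented for this sketch):
      'AB'  one literal byte, written in hex
      '??'  exactly one "don't care" byte
      '*'   any number of "don't care" bytes
    """
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")
        elif token == "*":
            parts.append(b".*?")
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that '.' also matches the byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)

data = bytes.fromhex("10 20 99 30 77 88 40")

plain = compile_scan_string("10 20 30")
one_gap = compile_scan_string("10 20 ?? 30 ?? 40")
any_gap = compile_scan_string("10 20 ?? 30 * 40")

print(plain.search(data) is not None)    # False
print(one_gap.search(data) is not None)  # False
print(any_gap.search(data) is not None)  # True
```
-
- The same bytes are "undetectable by a wildcard string" for a scanner
- that offers only '??' and detectable for one that also offers '*'.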
-
- Fourth, what about V2P2 and V2P6Z? Neither can be detected by a
- scan string (even a wildcard one), but most experts agree that V2P6Z
- is "more polymorphic" than V2P2, because the length of the decryptor
- can vary, as can the length of the virus, and many other things...
-
- 3) The third idea was proposed by Dr. Solomon. He proposes to use the
- length of the longest constant byte sequence that is present in the
- decryptor. In addition, one could use the sum of the lengths of all
- constant byte sequences in the decryptor, if there are several short
- ones that can appear in any order (as in V2P2).
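-
- A rough Python sketch of the first variant of this measure, assuming
- it is estimated from a collection of sample decryptors. The function
- name longest_common_run and the sample bytes are made up for the
- illustration. A finite set of samples can only overestimate the true
- value, since a further sample may break the common run.
-
```python
from typing import Sequence

def longest_common_run(samples: Sequence[bytes]) -> bytes:
    """Return the longest byte run present in every sample (brute force)."""
    if not samples:
        return b""
    shortest = min(samples, key=len)
    # Try candidate lengths from longest to shortest; the first hit wins.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in samples):
                return candidate
    return b""

# Arbitrary placeholder samples sharing the run AA BB CC.
samples = [
    bytes.fromhex("01 aa bb cc 02 03"),
    bytes.fromhex("04 05 aa bb cc 06"),
    bytes.fromhex("aa bb cc 07 08 09"),
]

run = longest_common_run(samples)
print(run.hex())   # aabbcc
print(len(run))    # 3
```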
-
- This is already better, but still not good enough. According to this
- criterion, the MtE-based viruses have polymorphism 1 - because all
- decryptors contain only a single constant byte (the JNZ instruction),
- which does not appear at a constant place. (Obviously, by this
- criterion, the higher the number, the less polymorphic the virus is.)
-
- Unfortunately, as I said, this is still not good enough. There are
- other viruses with only a single constant byte in the decryptor, which
- are much less polymorphic than the MtE-based ones. On the other hand,
- the TPE-based viruses can have 0 constant bytes in the decryptor, but
- this does not imply that they are more polymorphic than the MtE-based
- ones... In fact, if one looks at them very carefully, it is possible
- to see that the different decryptors are pretty similar, if you remove
- the garbage ("do-nothing") instructions...
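-
- The last observation can be sketched in Python as a comparison made
- before and after filtering out garbage. Everything here is an
- assumption for the illustration: the decryptors are toy lists of
- mnemonics, and the GARBAGE set is an invented stand-in for whatever
- a given engine uses as "do-nothing" instructions.
-
```python
from difflib import SequenceMatcher

# Assumed set of "do-nothing" mnemonics (hypothetical, for this sketch).
GARBAGE = {"nop", "clc", "stc", "cld"}

def strip_garbage(instrs):
    """Drop every instruction whose mnemonic is in the garbage set."""
    return [i for i in instrs if i.split()[0] not in GARBAGE]

def similarity(a, b) -> float:
    """Similarity ratio in [0, 1] between two instruction lists."""
    return SequenceMatcher(None, a, b).ratio()

# Two toy "decryptors" that differ only in their garbage.
first = ["nop", "mov cx,N", "clc", "xor [si],ax", "stc", "inc si", "loop top"]
second = ["mov cx,N", "cld", "nop", "xor [si],ax", "inc si", "nop", "loop top"]

raw = similarity(first, second)
cleaned = similarity(strip_garbage(first), strip_garbage(second))

print(raw < 1.0)   # True
print(cleaned)     # 1.0
```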
-
- It seems that what each anti-virus researcher considers as "level of
- polymorphism" is dependent somehow on the technique he uses in his
- anti-virus product to detect polymorphic viruses. For instance, V2P6Z
- might look "more difficult" (i.e., more polymorphic) than the
- MtE-based viruses to those people whose scanners depend on the number
- of addressing modes used by the instruction that performs the actual
- decoding...
-
- This article is not meant to provide a solution to the problem. I am
- just trying to explain the problem and am asking for solutions. Any
- ideas are welcome - we really need an objective way to measure the
- level of polymorphism...
-
- Regards,
- Vesselin
- - --
- Vesselin Vladimirov Bontchev Virus Test Center, University of Hamburg
- Tel.:+49-40-54715-224, Fax: +49-40-54715-226 Fachbereich Informatik - AGN
- < PGP 2.1 public key available on request. > Vogt-Koelln-Strasse 30, rm. 107 C
- e-mail: bontchev@fbihh.informatik.uni-hamburg.de D-2000 Hamburg 54, Germany
-