January 14, 2002
How does a computer virus scan work?
Geoff Kuenning, a professor of computer science at Harvey Mudd College, provides this explanation.
Malicious software comes in several flavors, distinguished primarily by how they propagate. The two most pervasive forms are viruses and worms. A virus attaches itself to an existing program so that, when that program is executed, the virus's own code runs as well and bad things happen. Like a biological virus, it cannot live without a host. In contrast, a worm is an independent program that reproduces itself without requiring a host program. Depending on the form, a worm may be able to propagate without any action on the victim's part. Most malicious software today consists of worms rather than viruses.
Worms and viruses require slightly different protection mechanisms because of their different propagation methods. A virus scanner operates by searching for the signatures of known viruses. A signature is a characteristic pattern that occurs in every copy of a virus. It might be a string of characters, such as a message that the virus will display on the screen when activated, or it might be binary computer code or even a particular bit of data that is embedded in the virus. These patterns are identified by technicians at organizations specializing in computer security and are then made available on security Web sites. Virus scanners can then download the patterns to bring their internal pattern lists up to date.
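As a rough illustration of signature matching, the Python sketch below searches a block of bytes for each pattern in a small list. The signature names and byte patterns are invented for this example and do not correspond to any real virus. The function names `scan_bytes` and `update_signatures` are likewise hypothetical, and a real scanner would use far more efficient multi-pattern matching.

```python
from typing import Dict, List

# Hypothetical pattern list: signature name -> byte pattern.
# These are made-up values for illustration, not real virus signatures.
SIGNATURES: Dict[str, bytes] = {
    "Example.A": b"YOU HAVE BEEN INFECTED BY EXAMPLE-A",  # a displayed message
    "Example.B": bytes.fromhex("deadbeef90909090cafe"),   # a run of binary code
}


def scan_bytes(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all signatures whose pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


def update_signatures(current: Dict[str, bytes],
                      downloaded: Dict[str, bytes]) -> Dict[str, bytes]:
    """Merge a freshly downloaded pattern list into a copy of the current one."""
    merged = dict(current)
    merged.update(downloaded)
    return merged
```

Calling `scan_bytes` on the contents of a file returns an empty list when no known pattern is present. `update_signatures` stands in for the step in which the scanner brings its internal list up to date.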
There are three complications with this scheme. The first is that the patterns, if ill chosen, can legitimately appear in uninfected files. For example, a pattern containing just the word "hello" would not be very useful. Part of the technicians' job is to find patterns that are unique to the viruses.
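One way to picture the technicians' task is as a check of each candidate pattern against two collections: samples known to be infected and files known to be clean. The sketch below is only a toy version of that idea. The function names and the sample data are assumptions made for this example, not a description of how any security organization actually works.

```python
from typing import Iterable


def false_positive_count(pattern: bytes, clean_samples: Iterable[bytes]) -> int:
    """Count how many known-clean samples contain the candidate pattern."""
    return sum(1 for sample in clean_samples if pattern in sample)


def is_usable_signature(pattern: bytes,
                        infected_samples: Iterable[bytes],
                        clean_samples: Iterable[bytes]) -> bool:
    """A candidate is usable only if it occurs in every infected sample
    and in none of the clean ones."""
    infected = list(infected_samples)
    in_all_infected = bool(infected) and all(pattern in s for s in infected)
    return in_all_infected and false_positive_count(pattern, clean_samples) == 0
```

With this check, a pattern such as `b"hello"` is rejected as soon as any clean file happens to contain that word, while a longer and more unusual byte sequence has a better chance of passing.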
The second complication is that virus writers do not want their viruses to be detected, so they engage in a war of stealth techniques. For example, many viruses store themselves in an encrypted form, varying the encryption key as they travel so that the encrypted patterns are different on each victim machine. Virus scanners can beat this technique either by setting their patterns to search for the part of the program that decrypts the virus (this code must necessarily be unencrypted) or by duplicating the decryption operation before doing their matching.
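Both countermeasures can be sketched with a deliberately simple stand-in for encryption. The example assumes a single-byte XOR cipher, which is far weaker than what real viruses use, and every constant and function name in it is hypothetical. It builds harmless test data only: a plaintext "decryptor" marker followed by an XOR-scrambled body.

```python
from typing import Optional

# Hypothetical patterns for a harmless toy sample, not real malware.
BODY_SIGNATURE = b"EXAMPLE-VIRUS-BODY"        # pattern in the decrypted body
STUB_SIGNATURE = b"\xEB\x10TOY-DECRYPT-LOOP"  # pattern in the unencrypted decryptor


def xor_bytes(data: bytes, key: int) -> bytes:
    """Toy cipher (an assumption of this sketch): XOR every byte with one key byte."""
    return bytes(b ^ key for b in data)


def make_toy_sample(key: int) -> bytes:
    """Build test data: plaintext decryptor stub, then the body scrambled with key."""
    return STUB_SIGNATURE + xor_bytes(BODY_SIGNATURE, key)


def matches_stub(data: bytes) -> bool:
    """Strategy 1: look for the decryptor, which cannot itself be encrypted."""
    return STUB_SIGNATURE in data


def find_key_by_decrypting(data: bytes) -> Optional[int]:
    """Strategy 2: repeat the decryption for each possible key, then match.
    Returns the key that exposes the body pattern, or None if there is none."""
    for key in range(256):
        if BODY_SIGNATURE in xor_bytes(data, key):
            return key
    return None
```

Two samples built with different keys share no encrypted bytes in the body, so a plain search for `BODY_SIGNATURE` misses both. Either strategy above still finds them. Trying every key is feasible here only because the toy cipher has 256 possibilities.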
The third complication has to do with performance. Theoretically, a virus could attach itself to any executable program. On a modern computer, there may be hundreds or even thousands of potential host programs. Scanning every one of these programs every time the virus scanner is run would take an unreasonably long time. So virus scanners usually limit themselves to a smaller list of probable hosts. For example, floppy disks and other removable disks are common virus vectors, so removable disks are usually scanned whenever they are inserted. On Microsoft Windows, programs in the \WINDOWS\SYSTEM folder are popular virus targets, so a virus scanner will usually check those files. The scanner's internal pattern list can also identify other files that are known to be targets of a particular virus.
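The idea of narrowing a scan to probable hosts might look something like the following sketch. The policy values are assumptions chosen for illustration: the `C:` drive letter, the list of executable suffixes, and the function names are not taken from any particular scanner.

```python
from pathlib import PureWindowsPath
from typing import Iterable, List

# Assumed policy for illustration only. The drive letter is a guess;
# the folder name comes from the example in the text.
PRIORITY_DIRS = (PureWindowsPath(r"C:\WINDOWS\SYSTEM"),)
EXECUTABLE_SUFFIXES = {".exe", ".com", ".dll"}


def is_probable_host(path: str, on_removable_disk: bool = False) -> bool:
    """Decide whether a file is worth scanning under the assumed policy."""
    p = PureWindowsPath(path)
    if p.suffix.lower() not in EXECUTABLE_SUFFIXES:
        return False  # not an executable type on our (hypothetical) list
    if on_removable_disk:
        return True   # removable disks are scanned whenever they are inserted
    # PureWindowsPath compares case-insensitively, as Windows itself does.
    return any(d in p.parents for d in PRIORITY_DIRS)


def select_targets(paths: Iterable[str]) -> List[str]:
    """Keep only the fixed-disk files that fall under a priority folder."""
    return [p for p in paths if is_probable_host(p)]
```

A scanner built this way trades completeness for speed: a virus that infects a program outside the chosen folders would be missed until a full scan is run.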
Because worms are independent programs, they are somewhat easier to detect than viruses. Being independent, they must reside in a file of their own somewhere and that file must be constructed such that the computer will automatically execute it. These constraints place limits on such characteristics as where the file can appear and how it is named. The scanner can simply check those well-known places and then apply the same pattern-matching techniques that are used for viruses.
Present-day scanners also look for known vectors for worms. Since most worms propagate through e-mail, a scanner can be set up to look at incoming e-mail before it is delivered to the user and to scan outgoing messages as they are sent. If a worm is detected, it can be removed from the message. If the worm is in an outgoing e-mail, it must, of course, also be removed from the infected computer.
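A mail filter of this kind could be sketched with Python's standard `email` package, as below. The worm pattern is invented for the example, and the function names are hypothetical. The sketch covers only removing an infected attachment from a message. Cleaning up the sending computer is a separate job that it does not attempt.

```python
from email.message import EmailMessage
from typing import List

# Hypothetical pattern for illustration, not a real worm signature.
WORM_SIGNATURE = b"EXAMPLE-WORM-PAYLOAD"


def attachment_is_infected(part: EmailMessage) -> bool:
    """Decode one attachment and look for the known pattern in its bytes."""
    data = part.get_payload(decode=True) or b""  # None for nested multiparts
    return WORM_SIGNATURE in data


def strip_infected_attachments(msg: EmailMessage) -> List[str]:
    """Remove infected attachments from msg in place; return their filenames."""
    if not msg.is_multipart():
        return []
    infected = [p for p in msg.iter_attachments() if attachment_is_infected(p)]
    if infected:
        infected_ids = {id(p) for p in infected}
        # Rebuild the list of parts without the infected ones.
        msg.set_payload([p for p in msg.get_payload() if id(p) not in infected_ids])
    return [p.get_filename() or "<unnamed>" for p in infected]
```

The same function could be applied to incoming mail before delivery and to outgoing mail as it is sent. A non-empty result on an outgoing message is the signal that the sending machine itself needs to be cleaned.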