BACKGROUND OF THE INVENTION
A conventional virtual-machine monitor (VMM) typically runs on a computer and presents to other software the abstraction of one or more virtual machines. Each virtual machine may function as a self-contained platform, running its own "guest operating system" (i.e., an operating system hosted by the VMM). The guest operating system expects to operate as if it were running on a dedicated computer rather than a virtual machine. That is, the guest operating system expects to control various computer operations and have access to hardware resources during these operations. The hardware resources may include processor-resident resources (e.g., control registers) and resources that reside in memory (e.g., descriptor tables).
In a virtual-machine environment, the VMM should be able to have ultimate control over these resources to provide proper operation of virtual machines and for protection from and between virtual machines. To achieve this, the VMM typically intercepts and arbitrates all accesses made by guest software to the hardware resources. Specifically, when guest software requests an operation that requires access to a protected hardware resource, the control over this operation is transferred to the VMM which then assures the validity of the access, emulates the functionality desired by guest software and transfers control back to the guest software, thereby protecting the hardware resources and virtualizing accesses of guest software to hardware resources. Because the number of hardware resource elements that need to be protected from accesses by guest software is large and such accesses may be frequent, there is a significant performance cost associated with this protection and virtualization.
One example of a hardware resource that is frequently accessed by guest software is a control register. For instance, in the instruction-set architecture (ISA) of the Intel Pentium IV (referred to herein as the IA-32 ISA), there are a number of control registers that are used to configure the processor operating mode, control the memory subsystem configuration and hardware resources, etc. Typically, when guest software attempts to access a bit in a control register, the control is transferred to the VMM which is responsible for maintaining consistency between write and read operations initiated by the guest software with respect to this bit. That is, the VMM controls the value that guest software is allowed to write to each bit of the control register and the value that guest software reads from each bit. Such virtualization of control register accesses creates significant performance overheads.
DESCRIPTION OF EMBODIMENTS
A method and apparatus for controlling accesses of guest software to registers in a virtual-machine architecture are described. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention can be practiced without these specific details.
Some portions of the detailed descriptions that follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer system's registers or memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or the like, may refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer-system memories or registers or other such information storage, transmission or display devices.
In the following detailed description of the embodiments, reference is made to the accompanying drawings that show, by way of illustration, specific embodiments in which the invention may be practiced. In the drawings, like numerals describe substantially similar components throughout the several views. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the present invention. Moreover, it is to be understood that the various embodiments of the invention, although different, are not necessarily mutually exclusive. For example, a particular feature, structure, or characteristic described in one embodiment may be included within other embodiments. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims, along with the full scope of equivalents to which such claims are entitled.
FIG. 1 illustrates one embodiment of a virtual-machine environment 100, in which the present invention may operate. In this embodiment, bare platform hardware 116 comprises a computing platform, which may be capable, for example, of executing a standard operating system (OS) or a virtual-machine monitor (VMM), such as a VMM 112. The VMM 112, though typically implemented in software, may emulate and export a bare machine interface to higher level software. Such higher level software may comprise a standard or real-time OS, may be a highly stripped down operating environment with limited operating system functionality, may not include traditional OS facilities, etc. Alternatively, for example, the VMM 112 may be run within, or on top of, another VMM. VMMs and their typical features and functionality are well-known by those skilled in the art and may be implemented, for example, in software, firmware or by a combination of various techniques.
The platform hardware 116 includes a processor 118 and memory 120. Processor 118 can be any type of processor capable of executing software, such as a microprocessor, digital signal processor, microcontroller, or the like. The processor 118 may include microcode or hardcoded logic for performing the execution of method embodiments of the present invention.
The platform hardware 116 can be of a personal computer (PC), mainframe, handheld device, portable computer, set-top box, or any other computing system.
Memory 120 can be a hard disk, a floppy disk, random access memory (RAM), read only memory (ROM), flash memory, or any other type of machine medium readable by processor 118. Memory 120 may store instructions for performing the execution of method embodiments of the present invention.
The VMM 112 presents to other software (i.e., "guest" software) the abstraction of one or more virtual machines (VMs), which may provide the same or different abstractions to the various guests. FIG. 1 shows two VMs, 102 and 114. The guest software running on each VM may include a guest OS such as a guest OS 104 or 106 and various guest software applications 108–110. Each of the guest OSs 104 and 106 expects to control access to physical resources (e.g., processor registers, memory and memory-mapped I/O devices) within the VMs 102 and 114 on which the guest OS 104 or 106 is running and to perform other functions.
The VMM 112 facilitates functionality desired by guest software while retaining ultimate control over privileged hardware resources within the platform hardware 116. Specifically, once guest software attempts to access a privileged resource, the control over the processor is transferred to the VMM 112, which then decides whether to perform a requested operation (e.g., emulate it for the guest software, proxy the operation directly to the platform hardware 116, etc.) or deny access to the resource to facilitate security, reliability or other mechanisms. The act of facilitating the functionality for the guest software may include a wide variety of activities on the part of the VMM 112. The activities of the VMM 112 as well as its characteristics should not limit the scope of the present invention.
In one embodiment, the transfer of control from guest software to VMM is dictated by control bit settings in a virtual machine control structure (VMCS) 122. Settings in the VMCS 122 may prevent guest software from performing operations that may result in its access of certain privileged hardware resources. Different guest software may execute with different control bit settings in different VMCS memory images, though only one such VMCS is shown in FIG. 1. The VMCS 122 resides in memory 120 and is maintained by the processor 118. It should be noted that any other data structure (e.g., an on-chip cache, a file, a lookup table, etc.) may be used to store the VMCS 122 or the fields associated with each designated hardware resource without loss of generality.
When guest software attempts to perform an operation which accesses protected resources, control is transferred to the VMM 112. The VMM 112 has access to all platform hardware 116. When such a transition occurs, the VMM 112 receives control over the operation initiated by guest software. The VMM 112 then may perform this operation or deny access as described above, and may transfer control back to guest software by executing a special instruction. The control of guest software through this mechanism is referred to herein as VMX operation and the transfer of control from the guest software to VMM is referred to herein as a VM exit.
In one embodiment, the execution of certain instructions, certain exceptions and interrupts and certain platform events may cause a VM exit. These potential causes of VM exits are referred to herein as virtualization events. For example, a VM exit may be generated when guest software attempts to perform an operation (e.g., an instruction) that may result in its access of certain privileged hardware resources (e.g., a control register or an IO port).
In an embodiment, when a VM exit occurs, components of the processor state used by guest software are saved, and components of the processor state required by the VMM 112 are loaded. This saving and loading of processor state may, depending on the processor ISA, have the effect of changing the active address space (e.g., in the IA-32 ISA, the active address space is determined by the values in the control registers, which may be saved and restored on VM exit). In one embodiment, the components of the processor state used by guest software are stored in a guest-state area of VMCS 122 and the components of the processor state required by the VMM 112 are stored in a monitor-state area of VMCS 122.
In one embodiment, when a transition from the VMM to guest software occurs, the processor state that was saved at the VM exit is restored and control is returned to the guest OS 104 or 106 or guest applications 108 or 110.
In an embodiment, when a VM exit occurs, control is passed to the VMM 112 at a specific entry point (e.g., an instruction pointer value) delineated in the VMCS 122. In another embodiment, control is passed to the VMM 112 after vectoring through a redirection structure (e.g., the interrupt-descriptor table in the IA-32 ISA). Alternatively, any other mechanism known in the art can be used to transfer control from the guest software to the VMM 112.
Because the number of hardware resource elements that need to be protected from accesses by guest software is large and such accesses may be frequent, there is a significant performance cost associated with this protection and virtualization. In addition, an operation initiated by guest software may involve access to a privileged resource, which may pose no problem to the security and proper operation of the VMs 102 and 114. For example, in the IA-32 ISA, control register 0 (CR0) includes a task-switch (TS) bit that is used to optimize context switching by avoiding saving and restoring floating-point state until the state is accessed. The update of the TS bit by the guest OS through the Clear Task-Switched Flag (CLTS) instruction is unlikely to pose a problem to system security and proper operation of the VMs 102 and 114. In contrast, the paging enable (PG) bit of CR0 configures the processor operating mode and as such must be controlled exclusively by the VMM 112. In some cases, the VMM 112 may not allow the guest software to disable paging and therefore must control attempts of the guest software to do so.
In one embodiment, a filtering mechanism is provided for reducing the number of VM exits caused by accesses of guest software to such hardware resources as registers (e.g., control registers, general purpose registers, model-specific registers, etc.) or memory-based resources (e.g., paging control fields in memory, etc.). It should be noted that while an exemplary embodiment of the present invention is described below with reference to a register, the teachings of the present invention may be applied to any other hardware resource without loss of generality.
The filtering mechanism functions using one or more fields associated with each designated hardware resource as will be described in greater detail below. In one embodiment, the fields associated with each designated hardware resource are contained in a VMCS 122.
FIG. 2 is a flow diagram of one embodiment of a process 200 for filtering accesses of guest software to a hardware resource such as a register. The process may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, microcode, etc.), software (such as run on a general purpose computer system or a dedicated machine), or a combination of both.
Referring to FIG. 2, process 200 begins with processing logic receiving a command pertaining to one or more portions of a register from guest software (processing block 202). A register portion may be a particular single bit of the register or multiple (contiguous or non-contiguous) bits of the register. The command pertaining to the register portions may be a read command requesting to read data from the register portions or a write command requesting to write data to the register portions. The register may represent a control register (e.g., CR0 or CR4 in the IA-32 ISA), an integer register, or any other register or memory-based resource.
Next, processing logic reads corresponding indicators from a mask field (processing block 204). The mask field includes a set of indicators corresponding to portions of the register. For example, if the register is a 32-bit control register (e.g., CR0 or CR4 in the IA-32 ISA), the mask field may include 32 indicators, with each indicator corresponding to a particular bit of the control register. Alternatively, the mask field may have fewer indicators than the number of bits in the register because the register may have unused bits, some indicators in the mask field may correspond to two or more bits, or for any other reason. Each indicator in the mask field provides information on whether a portion is under guest control (i.e., the guest software is permitted to access the corresponding portion of the register) or under control of the VMM. In an embodiment of the invention, bits in the register that do not have a corresponding mask bit are assumed to be under guest control. In another embodiment of the invention, they are assumed to be under VMM control.
At decision box 206, processing logic determines whether guest software is permitted to access all of the requested register portions based on the corresponding indicators from the mask field. If the determination is positive, processing logic executes the command on the requested register portions (processing block 208). That is, processing logic reads data from, or writes data to, the requested register portions.
Otherwise, if the determination made at decision box 206 is negative, then in one embodiment, processing logic transfers control to the VMM (processing block 210).
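By way of non-limiting illustration, the decision made at decision box 206 may be sketched in Python as follows. The function name, the return strings and the 32-bit width are hypothetical and are not part of the described embodiments. The sketch further assumes that a set indicator marks a register bit as being under VMM control, which is the convention used in the description of FIG. 5.

```python
ALL_BITS = 0xFFFFFFFF  # assumed 32-bit register width


def filter_access(mask_field: int, requested_bits: int) -> str:
    """Decide how a guest access to the requested register bits is handled.

    Assumption: a set bit in mask_field marks a VMM-controlled register bit.
    """
    if mask_field & requested_bits & ALL_BITS:
        # At least one requested bit is under VMM control (processing block 210).
        return "transfer_to_vmm"
    # Every requested bit is under guest control (processing block 208).
    return "execute"
```

In this sketch a single bitwise AND stands in for the per-indicator check, so an access that touches only guest-controlled bits proceeds without involving the VMM.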
In an alternative embodiment, an extra field is used to further reduce the number of situations in which control is transferred to the VMM. The extra field is referred to herein as a shadow value field. Each portion of the shadow value field corresponds to a particular portion of the register and stores the value that guest software expects to see in this portion of the register. In an embodiment of the invention, the value of the shadow value field is maintained by the VMM and is stored in the VMCS. In one embodiment, only the register portions with the indicators in the mask field that indicate the inability of guest software to access these register portions have corresponding portions in the shadow value field. For example, in the IA-32 ISA, if guest software is not permitted to access bits 1 through 10 in CR0 as reflected by values of indicators in the mask field, the size of the shadow value field will be limited to 10 bits that correspond to bits 1 through 10 of CR0. In another embodiment, each register portion with an indicator (regardless of its value) in the mask field has a corresponding portion in the shadow value field.
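As a hedged illustration of the embodiment in which the shadow value field holds only the VMM-controlled portions, the following Python sketch spreads a compact shadow value onto the register bit positions selected by the mask field. The function name and the packing order (lowest compact bit to lowest masked position) are assumptions made for this example only. A set mask bit is assumed to denote VMM control.

```python
def expand_shadow(compact: int, mask_field: int) -> int:
    """Spread the low-order bits of compact onto the set bit positions of mask_field."""
    result = 0
    src_index = 0   # next unread bit of the compact shadow value
    position = 0    # current bit position in the register
    while mask_field >> position:
        if (mask_field >> position) & 1:
            if (compact >> src_index) & 1:
                result |= 1 << position
            src_index += 1
        position += 1
    return result


# Bits 1 through 10 under VMM control, as in the CR0 example above.
MASK_BITS_1_TO_10 = 0x7FE
```

With this packing, a 10-bit compact value is sufficient to describe what guest software expects to see in bits 1 through 10 of the register.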
FIG. 4 is a flow diagram of one embodiment of a process 400 for providing an additional filtering of guest software accesses to a hardware resource such as a register. In one embodiment, process 400 replaces block 210 in FIG. 2. The process may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, microcode, etc.), software (such as run on a general purpose computer system or a dedicated machine), or a combination of both.
Referring to FIG. 4, process 400 begins at processing block 402 with processing logic determining that one or more register portions being accessed are under control of the VMM based on corresponding one or more indicators in a mask field, as discussed above in conjunction with FIG. 2.
Next, at decision box 403, processing logic determines whether the access is a command to write data to the requested register portions. If the determination is positive, i.e., the access is a write command, processing logic determines whether the data that the guest wishes to write to each of the portions is equal to data stored in corresponding portions of the shadow value field for all portions under VMM control (decision box 404). If this determination is positive for all requested portions that are under VMM control, the guest is allowed to write data to all portions of the actual register resource that are under guest control (as determined by the corresponding bits in the mask field) (processing block 405) and then process 400 ends. If the determination is negative for any requested portions that are under VMM control, processing logic transfers control to the VMM (processing block 406). The VMM then updates the corresponding portion of the shadow value field and actual register resource as necessary according to its implementation requirements and transfers control back to guest software.
Alternatively, if the command initiated by guest software is a command to read data from the requested register portions, control is not transferred to the VMM. Specifically, processing logic accesses the corresponding portions of the shadow value field for all requested portions that are under VMM control (processing block 412) and returns data stored in these portions of the shadow value field combined with values from the actual register resource for portions of the resource that are under guest control to guest software (processing block 414).
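The read and write paths of process 400 may be sketched, for illustration only, as the following Python functions. All names are hypothetical, a 32-bit register is assumed, and a set mask bit is assumed to mean that the corresponding register bit is under VMM control. A return value of None from the write function stands in for the transfer of control to the VMM at processing block 406.

```python
from typing import Optional

ALL_BITS = 0xFFFFFFFF  # assumed 32-bit register width


def guest_read(mask: int, shadow: int, reg: int, requested: int) -> int:
    """Return shadow bits for VMM-controlled positions and register bits otherwise."""
    combined = (mask & shadow) | (~mask & reg & ALL_BITS)
    return combined & requested


def guest_write(mask: int, shadow: int, reg: int, src: int,
                requested: int) -> Optional[int]:
    """Return the new register value, or None if control must go to the VMM."""
    vmm_bits = mask & requested
    if (src & vmm_bits) != (shadow & vmm_bits):
        return None  # decision box 404 negative: transfer control to the VMM
    guest_bits = ~mask & requested & ALL_BITS
    # Only guest-controlled bits of the actual register are updated.
    return (reg & ~guest_bits & ALL_BITS) | (src & guest_bits)
```

In this sketch a read never transfers control to the VMM, and a write does so only when the guest attempts to change a VMM-controlled bit away from the value recorded in the shadow value field.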
One embodiment in which the transfer of control to the VMM is supported via VMX operation, discussed above with reference to FIG. 1, will now be described in more detail.
In one embodiment, the VMM maintains a set of control bits to configure which virtualization events will cause a VM exit. This set of control bits is referred to herein as a redirection map. In one embodiment, the redirection map is contained in the VMCS 122 of FIG. 1. Once an occurrence of a virtualization event is detected, the redirection map is consulted to find an unconditional exit bit associated with this virtualization event. The bit indicates whether this virtualization event will unconditionally result in a VM exit. For example, the redirection map may include two bits for each control register, with one bit controlling VM exits on guest requests to read data from the control register and the other bit controlling VM exits on guest requests to write data to the control register.
In addition, in one embodiment, for each designated resource (e.g., CR0 or CR4 in the IA-32 ISA), the redirection map includes a bit indicating whether a mask field will be used for this resource and a bit indicating whether a shadow value field will be used for this resource.
FIG. 5 is a flow diagram of one embodiment of a process 500 for controlling access to a hardware resource such as a register during VMX operation using a redirection map. The process may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, microcode, etc.), software (such as run on a general purpose computer system or a dedicated machine), or a combination of both.
Referring to FIG. 5, process 500 begins at processing block 502 with processing logic identifying an occurrence of a virtualization event caused by a request of guest software to access a portion of a hardware resource such as a control register. This request is either a command to read data from one or more portions of a particular register or a command to write data to one or more portions of a particular register.
At processing block 503, processing logic consults the redirection map to determine if the unconditional exit bit associated with this virtualization event is set (decision box 506). If this bit is set, processing logic triggers a VM exit (processing block 522). For example, in the IA-32 ISA the redirection map may include bits to unconditionally cause VM exits on writes to CR2, reads from CR0, writes to CR4, etc.
Alternatively, if the unconditional exit bit is not set, processing logic further determines whether a mask field is to be used for the register (decision box 508). This determination is made using a designated bit in the redirection map. For example, in the IA-32 ISA there may be bits in the redirection map indicating if a mask is used for CR0, if a mask is used for CR4, etc. If the mask field is not to be used for this register, processing logic executes the requested read or write command on the requested register portions (processing block 514). Otherwise, processing logic reads mask field bits corresponding to the requested register portions (processing block 510). These bits are referred to as the requested mask field bits. The requested mask field bits are examined to determine if one or more of them are set (indicating that one or more of the corresponding register portions are under VMM control) (decision box 512).
If none of the requested mask field bits are set, i.e., guest software is allowed to access all of the requested register portions, processing logic executes the requested read or write command on the register portions (processing block 514). Otherwise, if any bits in the requested mask field are set, processing logic determines whether a shadow value field will be used for the register based on a designated bit in the redirection map (decision box 516). For example, in the IA-32 ISA there may be bits in the redirection map to indicate if a shadow value is used for CR0 accesses, for CR4 accesses, etc. If the shadow value field is not to be used for the register, processing logic triggers a VM exit (processing block 522).
If the shadow value field is to be used for the register and the request initiated by guest software is a read command (decision box 517), processing logic reads the bits of the shadow value field that correspond to those register portions that are set in the requested mask field and hence are under VMM control (processing block 518). These bits from the shadow value field are combined with the bits from the actual register that correspond to bits in the requested mask field that are not set and hence are under guest control. These combined values are then returned to guest software. Values of bits in the protected resource which are not represented in the mask and/or shadow value field may be read from the register.
If the shadow value field is to be used for the register but the request initiated by guest software is a write command, processing logic compares the value requested to be written to register bits under VMM control with the value of corresponding bits in the shadow value field (decision box 520). If these two values are the same, the requested register portions that are under guest control are written (processing block 519). That is, the bits under guest control are written; those under VMM control remain unchanged. In one embodiment, bits in the register which are not represented in the mask and/or shadow value field may be written if they are assumed to be under guest control. In another embodiment, data is not written to the unrepresented bits because they are assumed to be under VMM control. Otherwise, if the two values compared at decision box 520 are different, processing logic triggers a VM exit (processing block 522).
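For purposes of illustration, the decision sequence of process 500 may be sketched in Python as shown below. The class names, field names and return convention are hypothetical. The sketch assumes a 32-bit register, treats every access as touching the entire register, and models the redirection map as one small record per register rather than as any particular bit layout.

```python
from dataclasses import dataclass

ALL_BITS = 0xFFFFFFFF  # assumed 32-bit register width
VM_EXIT = ("vm_exit", None)


@dataclass
class RedirectionEntry:
    """Hypothetical per-register view of the redirection map bits."""
    exit_on_read: bool = False
    exit_on_write: bool = False
    use_mask: bool = False
    use_shadow: bool = False


@dataclass
class RegisterState:
    value: int
    mask: int = 0     # set bit: register bit is under VMM control
    shadow: int = 0   # value guest software expects to see


def access(entry, state, is_write, src=0):
    """Return ("done", value_or_None) or VM_EXIT for a whole-register access."""
    # Decision box 506: unconditional exit bit for this event.
    if entry.exit_on_write if is_write else entry.exit_on_read:
        return VM_EXIT
    # Decision boxes 508 and 512: is any accessed bit under VMM control?
    mask = state.mask if entry.use_mask else 0
    if mask == 0:
        if is_write:
            state.value = src & ALL_BITS
            return ("done", None)
        return ("done", state.value)
    # Decision box 516: without a shadow value field, exit to the VMM.
    if not entry.use_shadow:
        return VM_EXIT
    if not is_write:
        # Processing block 518: combine shadow bits with guest-controlled bits.
        return ("done", (mask & state.shadow) | (~mask & state.value & ALL_BITS))
    # Decision box 520: compare VMM-controlled bits with the shadow value.
    if (src & mask) != (state.shadow & mask):
        return VM_EXIT
    # Processing block 519: write only the guest-controlled bits.
    state.value = (state.value & mask) | (src & ~mask & ALL_BITS)
    return ("done", None)
```

Each early return in the sketch corresponds to one branch of the flow diagram, so a VM exit is reached only when no cheaper path applies.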
In one embodiment, a set of criteria is predefined by the VMM for filtering VM exits. The criteria are based on combinations of values stored in a mask field and a shadow value field and a value that guest software wishes to write to the register. FIG. 3 is a flow diagram of one embodiment of a process 300 for filtering VM exits using a set of criteria. The process may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, microcode, etc.), software (such as run on a general purpose computer system or a dedicated machine), or a combination of both.
Referring to FIG. 3, process 300 begins with processing logic determining whether the access of guest software is a request to write data to a register (decision box 302). If the determination is negative, i.e., the access is a request to read data from the register, the resulting value of the read request is determined using the following expression:
DEST=(MF AND SVF) OR (NOT MF AND CRVAL),
where AND, NOT and OR are bitwise Boolean operators, MF is a value of the mask field, SVF is a value of the shadow value field, and CRVAL is the current value of the actual protected register. A bit in the mask field is set if a corresponding bit in the register is controlled by the VMM. Otherwise, if a bit in the register is controlled by guest software, then a corresponding bit in the mask field is equal to zero. A bit in the shadow field has the value that guest software expects to see in the corresponding bit of the register and may be different from the current value of the corresponding bit of the actual register.
According to the above expression, if the requested bit is controlled by guest software, the data is read from the register, and if the requested bit is controlled by the VMM, the data is read from the shadow value field.
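The read expression above may be illustrated with the following Python sketch. The function and constant names are hypothetical. The bit positions used for the TS and PG flags (bits 3 and 31 of CR0) are those of the IA-32 architecture, and the scenario shown, in which the VMM keeps paging enabled while guest software believes it is disabled, is an assumed example rather than a required configuration.

```python
CR0_TS = 1 << 3    # task-switched flag, bit 3 of CR0
CR0_PG = 1 << 31   # paging enable flag, bit 31 of CR0


def read_value(mf: int, svf: int, crval: int, width: int = 32) -> int:
    """DEST = (MF AND SVF) OR (NOT MF AND CRVAL), truncated to the register width."""
    all_bits = (1 << width) - 1
    return ((mf & svf) | (~mf & crval)) & all_bits


# Assumed scenario: PG is VMM-controlled and shadowed as 0; TS is guest-controlled.
guest_view = read_value(mf=CR0_PG, svf=0, crval=CR0_PG | CR0_TS)
```

Here guest_view contains the TS bit taken from the actual register and a cleared PG bit taken from the shadow value field, and no transfer of control to the VMM is needed to produce it.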
If processing logic determines at decision box 302 that the access of guest software is a write request, processing logic combines the values of the mask field and shadow value field (processing block 304) as follows:
INT1=MF AND SVF.
In addition, processing logic combines the value of the mask field with the value that guest software wishes to write to the register (processing block 306) using the following expression:
INT2=MF AND SRC,
where SRC is the value that the guest wishes to write to the register.
Further, processing logic compares the two combinations at decision box 308. If the two combinations are equal, i.e., all bits in the register are either controlled by guest software, or controlled by the VMM and the value of the corresponding bit in shadow value field is equal to the value that guest software wishes to write to the register, then processing logic executes the following expression at processing block 312:
CR=(MF AND CRVAL) OR (NOT MF AND SRC).
According to this expression, if a bit in the register is controlled by guest software, the bit will be updated with the value that guest software wishes to write. Otherwise, the value of the bit in the register will remain the same and will not be updated.
Alternatively, if the two combinations are not equal, i.e., at least one bit in the register is controlled by the VMM and the value of the corresponding bit in shadow value field is not equal to the value that guest software wishes to write to the register, then processing logic triggers a VM exit at processing block 310.
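The write path of process 300 may be sketched in Python as follows, purely by way of example. The function name and the use of None to represent a VM exit are assumptions of this sketch. The variable names mirror the symbols MF, SVF, CRVAL, SRC, INT1 and INT2 used in the expressions above.

```python
from typing import Optional


def write_register(mf: int, svf: int, crval: int, src: int,
                   width: int = 32) -> Optional[int]:
    """Return the new register value, or None where a VM exit would be triggered."""
    all_bits = (1 << width) - 1
    int1 = mf & svf   # processing block 304
    int2 = mf & src   # processing block 306
    if int1 != int2:
        return None   # processing block 310: VM exit
    # Processing block 312: CR = (MF AND CRVAL) OR (NOT MF AND SRC)
    return ((mf & crval) | (~mf & src)) & all_bits
```

For instance, with MF = 1100, SVF = 0100 and CRVAL = 1001 (binary), a write of SRC = 0110 matches the shadow value on the VMM-controlled bits and yields a register value of 1010, whereas a write of SRC = 1110 differs on a VMM-controlled bit and results in a VM exit.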
Note that the description of the process 300 is simplified by using the entire register (e.g., 32 bits for the CR0 register in the IA-32 ISA) and mask and shadow value fields that are 32 bits wide. A person of ordinary skill in the art will understand that embodiments of the present invention can apply to read and write operations that access only a limited subset of the register bits or that access bits in multiple registers. Additionally, those skilled in the art will see application of the invention to situations where there is not a bit-for-bit correspondence between the various elements involved (e.g., if bits in the mask apply to multiple bits in the protected resource).
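One way in which a mask without bit-for-bit correspondence might be handled is sketched below in Python. This is an assumed arrangement for illustration only: each mask indicator is taken to cover a fixed-size group of contiguous register bits, and the function name and parameters are hypothetical. The expanded result is a bit-for-bit mask of the kind used in the expressions of process 300.

```python
def expand_mask(coarse: int, bits_per_indicator: int, width: int) -> int:
    """Expand a coarse mask, one indicator per group of bits, to a per-bit mask."""
    full = 0
    group = (1 << bits_per_indicator) - 1
    for i in range(width // bits_per_indicator):
        if (coarse >> i) & 1:
            # Indicator i covers register bits starting at i * bits_per_indicator.
            full |= group << (i * bits_per_indicator)
    return full
```

For example, under these assumptions an 8-bit register with two indicators of four bits each and only the upper indicator set would expand to a per-bit mask covering bits 4 through 7.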