Full virtualization has proven highly successful for a) sharing a computer system among multiple users, b) isolating users from each other (and from the control program) and c) emulating new hardware to achieve improved reliability, security and productivity.
VMware Workstation, Server, and ESX run guest operating systems on the host more efficiently than the two common alternatives. Emulators such as Bochs simulate each CPU instruction of the target machine one by one. Dynamic recompilation, the approach taken by Microsoft Virtual PC for Mac OS X, compiles a block of machine instructions the first time it executes and reuses the translated code whenever that block runs again. VMware software does not emulate an instruction set for hardware that is not physically present. This significantly boosts performance, but it can cause problems when a virtual machine guest is moved between hosts whose CPUs support different instruction-set features (as 64-bit Intel and AMD CPUs do), or between hosts with a different number of CPUs. Stopping (powering off) the guest before moving it to a different CPU type generally avoids these issues.
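To make the contrast between the two alternatives concrete, here is a toy sketch. It assumes a hypothetical three-instruction ISA (`addi`, `jnz`, `halt`) and uses generated Python source as a stand-in for native host code; it is not how Bochs or Virtual PC are actually built. The interpreter decodes every instruction each time it runs, while the recompiler translates each basic block once, caches it by guest address, and reuses it.

```python
"""Toy contrast: instruction-by-instruction interpretation versus dynamic
recompilation with a translation cache. The ISA is hypothetical."""

from typing import Callable, Dict, List, Optional, Tuple

Instr = Tuple  # e.g. ("addi", "r1", 1), ("jnz", "r0", 0), ("halt",)
Block = Callable[[Dict[str, int]], Optional[int]]

# Hypothetical guest program: loop while r0 != 0, counting iterations in r1.
PROGRAM: List[Instr] = [
    ("addi", "r1", 1),   # 0: r1 += 1
    ("addi", "r0", -1),  # 1: r0 -= 1
    ("jnz", "r0", 0),    # 2: if r0 != 0, jump to 0
    ("halt",),           # 3: stop
]


def interpret(program: List[Instr], regs: Dict[str, int]) -> int:
    """Emulator style: decode and dispatch every instruction each time it
    runs. Returns the number of instructions decoded."""
    pc, decoded = 0, 0
    while True:
        op = program[pc]
        decoded += 1
        if op[0] == "addi":
            regs[op[1]] += op[2]
            pc += 1
        elif op[0] == "jnz":
            pc = op[2] if regs[op[1]] != 0 else pc + 1
        elif op[0] == "halt":
            return decoded
        else:
            raise ValueError(f"unknown instruction {op!r}")


def translate_block(program: List[Instr], start: int) -> Block:
    """Compile the basic block at `start` (up to the next jnz/halt) into a
    host function that returns the next guest pc, or None on halt."""
    lines = ["def block(regs):"]
    pc = start
    while True:
        op = program[pc]
        if op[0] == "addi":
            lines.append(f"    regs[{op[1]!r}] += {op[2]}")
        elif op[0] == "jnz":
            lines.append(f"    return {op[2]} if regs[{op[1]!r}] != 0 else {pc + 1}")
            break
        elif op[0] == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown instruction {op!r}")
        pc += 1
    namespace: Dict[str, object] = {}
    exec("\n".join(lines), namespace)  # "compile" the block once
    return namespace["block"]  # type: ignore[return-value]


class Recompiler:
    """Translate each block the first time it runs; reuse it afterwards."""

    def __init__(self, program: List[Instr]) -> None:
        self.program = program
        self.cache: Dict[int, Block] = {}
        self.translations = 0

    def run(self, regs: Dict[str, int]) -> None:
        pc: Optional[int] = 0
        while pc is not None:
            block = self.cache.get(pc)
            if block is None:
                block = translate_block(self.program, pc)
                self.cache[pc] = block
                self.translations += 1
            pc = block(regs)
```

Starting from `r0 = 3`, the interpreter decodes 10 instructions, while the recompiler translates only two blocks (the loop body and the `halt`) and reuses the loop body on every iteration.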
VMware's products use the CPU to run code directly whenever possible (for example, user-mode and virtual-8086-mode code on x86). When direct execution is not possible, as with kernel-level and real-mode code, VMware products rewrite the code dynamically, a process VMware calls "binary translation" (BT). BT modifies x86 code on the fly, replacing instructions that would "pierce the virtual machine" with a different, virtual-machine-safe sequence of instructions; this is what provides the appearance of full virtualization. The translated code is stored in spare memory, typically at the end of the address space, where segmentation can protect it and hide it from the guest. For these reasons, VMware runs dramatically faster than emulators, at more than 80% of the speed the guest operating system would achieve running directly on the same hardware. In one study, VMware claims a slowdown relative to native execution of 0 to 6 percent for VMware ESX Server.
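The core idea of binary translation can be sketched in a few lines. This is a simplified illustration, not VMware's translator: instructions are modeled as tuples, the `SENSITIVE` set is a hypothetical subset of the x86 instructions that misbehave when a guest kernel runs deprivileged, and `vmm_emulate_*` are hypothetical monitor routines. Safe instructions are copied unchanged (and would run natively); sensitive ones become calls into the monitor, and each translated block is cached by guest address.

```python
"""Toy binary translator: sensitive instructions are replaced with calls
into the virtual machine monitor; everything else is copied unchanged."""

from typing import Dict, List, Tuple

Instr = Tuple[str, ...]

# Hypothetical subset of x86 instructions that fault or misbehave when a
# guest kernel runs deprivileged; a real translator's list is longer.
SENSITIVE = {"cli", "sti", "popf", "pushf", "sgdt", "sidt", "smsw"}


def translate(block: List[Instr]) -> List[Instr]:
    """Rewrite one guest basic block into a VM-safe sequence."""
    out: List[Instr] = []
    for instr in block:
        if instr[0] in SENSITIVE:
            # `vmm_emulate_*` are hypothetical VMM routines that apply the
            # instruction to the *virtual* CPU state instead of the real one.
            out.append(("call", f"vmm_emulate_{instr[0]}", *instr[1:]))
        else:
            out.append(instr)  # safe: would run natively at full speed
    return out


class TranslationCache:
    """Translated blocks keyed by guest address, so each block is translated
    once and then reused (VMware keeps this code in protected memory)."""

    def __init__(self) -> None:
        self._blocks: Dict[int, List[Instr]] = {}
        self.misses = 0

    def lookup(self, guest_addr: int, block: List[Instr]) -> List[Instr]:
        if guest_addr not in self._blocks:
            self.misses += 1
            self._blocks[guest_addr] = translate(block)
        return self._blocks[guest_addr]
```

Because most instructions in a block pass through untouched, the translated code runs close to native speed, which is the source of the performance figures quoted above.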
VMware's approach avoids some of the difficulties of virtualization on x86-based platforms. A virtual machine monitor can deal with offending instructions either by replacing them or by simply running kernel code in user mode. Replacing instructions in place risks the code failing to find the expected contents if it reads itself: code cannot be protected against reading while still being allowed to execute normally, and in-place replacement quickly becomes complicated. Running the code unmodified in user mode also fails, because most instructions that merely read machine state (SGDT and SMSW, for example) do not cause an exception and so betray the real state of the machine, and certain instructions (such as POPF) silently change behavior in user mode. The translator must therefore always rewrite such code into a separate copy, simulating the program counter at its original location when necessary and (notably) remapping hardware code breakpoints.
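The bookkeeping described above can be sketched as follows, under simplifying assumptions: the original guest code is left untouched (so a guest that reads its own code sees the expected bytes), the translation lives in a separate buffer, and a sorted map records where each guest instruction's translation begins. Instruction sizes, the `SENSITIVE` set, and the 5-byte `CALL_SIZE` are hypothetical. The map answers two questions: which guest address to report when the guest inspects its program counter, and where a breakpoint set at a guest address must really be placed.

```python
"""Sketch: keep guest code untouched, translate into a separate buffer, and
map between translated (host) offsets and original guest addresses."""

from bisect import bisect_right
from typing import List, Tuple

SENSITIVE = {"popf", "sgdt"}   # hypothetical subset
CALL_SIZE = 5                  # hypothetical size of the emulation call


def translate_with_map(guest_base: int, block: List[Tuple[str, int]]):
    """`block` lists (mnemonic, size_in_bytes) at consecutive guest addresses.
    Returns (translated, pc_map); pc_map holds sorted (host_offset, guest_addr)
    pairs marking where each guest instruction's translation starts."""
    translated: List[Tuple[str, int]] = []
    pc_map: List[Tuple[int, int]] = []
    host_off, guest_addr = 0, guest_base
    for mnemonic, size in block:
        pc_map.append((host_off, guest_addr))
        if mnemonic in SENSITIVE:
            # Replacement is longer than the original, so offsets drift.
            item = (f"call vmm_emulate_{mnemonic}", CALL_SIZE)
        else:
            item = (mnemonic, size)
        translated.append(item)
        host_off += item[1]
        guest_addr += size
    return translated, pc_map


def guest_pc(pc_map: List[Tuple[int, int]], host_offset: int) -> int:
    """Guest address to report when the guest inspects its own pc (e.g. on
    a fault) while actually executing at `host_offset`."""
    i = bisect_right([h for h, _ in pc_map], host_offset) - 1
    return pc_map[i][1]


def host_offset_for(pc_map: List[Tuple[int, int]], guest_addr: int) -> int:
    """Where a hardware code breakpoint set at `guest_addr` must be placed."""
    for host, guest in pc_map:
        if guest == guest_addr:
            return host
    raise KeyError(hex(guest_addr))
```

For a block of `mov` (3 bytes), `popf` (1 byte), `add` (2 bytes) at guest address `0x1000`, the `add` starts at guest address `0x1004` but at host offset 8, so a breakpoint on it has to be moved accordingly.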
Although VMware virtual machines run in user mode, VMware Workstation itself requires the installation of various drivers in the host operating system, notably to switch the GDT and IDT (global and interrupt descriptor tables) dynamically.
The VMware product line can also run different operating systems on a dual-boot system simultaneously by booting one partition natively while using the other as a guest within VMware Workstation.
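Each VMware guest is stored on the host as a small set of files, chiefly one or more virtual disk (.vmdk) files and a configuration (.vmx) file, usually kept together in one directory. Copy those files while the guest is powered off and you have a backup of the whole system.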
In 1998, VMware introduced a hypervisor for machines using the Intel x86 instruction set. The x86 architecture used in most PC systems is particularly difficult to virtualize. Full virtualization (presenting the illusion of a complete set of standard hardware) on x86 has significant costs in hypervisor complexity and run-time performance. Starting in 2005 and 2006, CPU vendors added hardware virtualization assistance to their processors: Intel's is called Intel VT (codenamed Vanderpool) and AMD's is called AMD-V (codenamed Pacifica). These extensions address the parts of x86 that are difficult or inefficient to virtualize, giving the hypervisor additional support. This allows simpler virtualization code and higher performance for full virtualization.
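On an x86 Linux host, one practical way to see whether the CPU offers these extensions is the `flags` line of `/proc/cpuinfo`: Linux reports Intel VT-x as `vmx` and AMD-V as `svm`. The sketch below parses that text; the function name `hw_virt_support` is our own. A missing flag may also mean the feature is disabled in firmware or hidden by an outer hypervisor, and this sketch does not distinguish those cases.

```python
from pathlib import Path
from typing import Optional


def hw_virt_support(cpuinfo_text: str) -> Optional[str]:
    """Return "Intel VT-x", "AMD-V", or None, based on the first `flags`
    line of /proc/cpuinfo-style text (x86 Linux only)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            if "vmx" in flags:
                return "Intel VT-x"
            if "svm" in flags:
                return "AMD-V"
            return None
    return None


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(hw_virt_support(path.read_text()) or "no vmx/svm flag found")
```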
An alternative approach requires modifying the guest operating system to make explicit calls to the hypervisor, rather than executing machine I/O instructions that the hypervisor must then trap and simulate. Xen calls this approach paravirtualization, Parallels Workstation calls such a call a "hypercall", and IBM's VM calls it a "DIAGNOSE code". VMware smooths over the slowest rough corners of virtualization by supplying device drivers for the guest. All of these are really the same thing: a system call to the hypervisor below. Some microkernels, such as Mach and L4, are flexible enough that "paravirtualization" of guest operating systems is possible.
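A toy model shows why replacing trapped I/O instructions with explicit calls pays off. Everything here is hypothetical (class and method names, and port `0x3F8`, the conventional COM1 port, chosen only as an example); only the number of guest-to-hypervisor transitions is modeled, since each one is expensive on real hardware. The unmodified guest traps once per byte written, while the paravirtualized guest hands over the whole buffer in a single call.

```python
"""Toy comparison of trap-and-emulate I/O versus a paravirtual hypercall.
All names are hypothetical; only guest-to-hypervisor transitions are modeled."""

from typing import List, Tuple


class Hypervisor:
    def __init__(self) -> None:
        self.exits = 0                               # guest -> hypervisor transitions
        self.device_log: List[Tuple[int, int]] = []  # (port, byte) seen by the device

    def trap_out(self, port: int, byte: int) -> None:
        # Unmodified guest: a real OUT instruction faulted; the hypervisor
        # decodes it and simulates the device access.
        self.exits += 1
        self.device_log.append((port, byte))

    def hypercall_write(self, port: int, data: bytes) -> None:
        # Paravirtualized guest: one explicit call covers the whole buffer.
        self.exits += 1
        self.device_log.extend((port, b) for b in data)


def unmodified_guest_write(hv: Hypervisor, port: int, data: bytes) -> None:
    for b in data:  # one OUT instruction, and therefore one trap, per byte
        hv.trap_out(port, b)


def paravirt_guest_write(hv: Hypervisor, port: int, data: bytes) -> None:
    hv.hypercall_write(port, data)
```

Writing `b"hello"` costs five transitions in the first case and one in the second, while the device sees exactly the same data.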
Xen is an example of this approach; it was originally a software-only hypervisor built around paravirtualization. Xen runs directly on the hardware, with a privileged guest (usually Linux, called domain 0 or dom0) managing it, and it can run both paravirtualized and, with the help of hardware virtualization extensions such as Intel VT-x or AMD-V, fully virtualized (that is, unmodified) operating systems. The Xen distribution already contains versions of FreeBSD, Linux, NetBSD, and Plan 9 from Bell Labs that have been modified in this way. User programs continue to work on Xen without change. Xen has also been ported to the OpenSolaris operating system, as of build 75.
DMA controllers are special hardware (now embedded in the chip in modern integrated processors) that manage data transfers and arbitrate access to the system bus. A controller is programmed with source and destination pointers (where to read and write the data), a counter that tracks the number of bytes transferred, and control settings, which include the I/O and memory types involved, interrupts, and the states used for CPU bus cycles.
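The programming model above can be sketched as a small register set. This is a hypothetical channel, not any real controller's layout: it moves one byte per step between two locations in a single `bytearray` standing in for memory, advances both pointers, decrements the counter, and at terminal count marks the transfer done and raises an interrupt if that setting is enabled.

```python
from dataclasses import dataclass


@dataclass
class DmaChannel:
    """Hypothetical register set for one DMA channel (one byte per transfer)."""
    src: int                   # source pointer: where to read
    dst: int                   # destination pointer: where to write
    count: int                 # bytes remaining to transfer
    irq_on_done: bool = True   # setting: interrupt the CPU at terminal count
    done: bool = False
    irq_raised: bool = False

    def transfer_one(self, memory: bytearray) -> bool:
        """Move one byte and update the registers.
        Returns True while bytes remain to be transferred."""
        if self.count == 0:
            return False
        memory[self.dst] = memory[self.src]
        self.src += 1
        self.dst += 1
        self.count -= 1
        if self.count == 0:
            self.done = True                  # terminal count reached
            self.irq_raised = self.irq_on_done
        return self.count > 0
```

Programming `DmaChannel(src=0, dst=4, count=4)` and stepping it until `transfer_one` returns `False` copies the first four bytes to the next four, with no involvement from the code that set it up.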
A transfer begins when some event notifies the DMA controller of the need to move data to memory (for example, a disk controller with a block of data ready, or a network card that has received a packet). The controller asserts a DMA request signal to the CPU to gain use of the system bus. The CPU completes its current operation and yields the bus to the DMA controller via a DMA acknowledge signal. The controller then reads and writes data and drives the control signals as if it were the CPU, whose bus outputs are tri-stated (electrically disconnected) for the duration. When the transfer is complete, the DMA controller de-asserts the DMA request signal, and the CPU in turn removes its DMA acknowledge signal and resumes control of the bus.
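The request/acknowledge sequence can be traced as a short event log. This sketch follows the signal names used in the text; real controllers differ in detail (the class and method names are hypothetical), and the copy itself is reduced to a slice assignment.

```python
"""Event-level sketch of the DMA request/acknowledge handshake.
Signal names follow the text; real controllers differ in detail."""

from typing import List


class Cpu:
    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.dack = False
        self.owns_bus = True

    def dma_request(self, asserted: bool) -> None:
        if asserted:
            self.log.append("cpu: complete current operation")
            self.owns_bus = False          # bus outputs tri-stated
            self.dack = True
            self.log.append("cpu: assert DMA acknowledge")
        else:
            self.dack = False
            self.owns_bus = True
            self.log.append("cpu: remove DMA acknowledge, resume bus control")


class DmaController:
    def __init__(self, cpu: Cpu, log: List[str]) -> None:
        self.cpu = cpu
        self.log = log

    def transfer(self, memory: bytearray, src: int, dst: int, n: int) -> None:
        self.log.append("dma: assert DMA request")
        self.cpu.dma_request(True)
        if not self.cpu.dack or self.cpu.owns_bus:
            raise RuntimeError("bus not granted")
        memory[dst:dst + n] = memory[src:src + n]  # controller drives the bus
        self.log.append(f"dma: moved {n} bytes")
        self.log.append("dma: de-assert DMA request")
        self.cpu.dma_request(False)
```

After `transfer` returns, the log lists the six steps in the order described above and the CPU owns the bus again.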
DMA is implemented in computer bus architectures to speed up computer operations and to allow multitasking. Without DMA, the CPU is fully occupied for the duration of every read or write it performs; DMA allows data to be read from and written to internal memory, external memory, and peripherals without CPU involvement, leaving the processor free for other tasks. Because moving data to and from memory is one of the most common computer operations, relieving the CPU of this overhead can significantly improve performance.
DMA is useful in real-time computing applications where critical operations must proceed concurrently. Stream processing is another application of DMA, in which data transfer and data processing take place simultaneously. Many hardware devices use DMA, including floppy and hard disk drive controllers, graphics cards and graphics processing units, network cards, and sound cards.
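A common way to overlap transfer and processing in stream processing is double ("ping-pong") buffering: in each period the DMA engine fills one buffer while the CPU processes the buffer filled in the previous period. The sketch below is a sequential simulation of that schedule (the function name `ping_pong` and the log format are our own); on real hardware the two actions in each period run at the same time.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

R = TypeVar("R")


def ping_pong(chunks: List[bytes],
              process: Callable[[bytes], R]) -> Tuple[List[str], List[R]]:
    """Simulate double buffering. Returns (schedule, results): one schedule
    entry per period, and the CPU's result for each chunk in order."""
    if not chunks:
        return [], []
    buffers: List[Optional[bytes]] = [None, None]
    schedule: List[str] = []
    results: List[R] = []
    for period in range(len(chunks) + 1):
        fill, drain = period % 2, 1 - period % 2
        step: List[str] = []
        if period < len(chunks):
            buffers[fill] = chunks[period]     # DMA fills one buffer...
            step.append(f"DMA->buf{fill}")
        if period > 0:
            data = buffers[drain]              # ...while the CPU drains the other
            assert data is not None
            results.append(process(data))
            step.append(f"CPU<-buf{drain}")
        schedule.append(" | ".join(step))
    return schedule, results
```

For three chunks the schedule is `DMA->buf0`, `DMA->buf1 | CPU<-buf0`, `DMA->buf0 | CPU<-buf1`, `CPU<-buf0`: after the first period, transfer and processing always overlap.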
Synchronous DMA (often called cycle stealing) moves one byte or word at a time between system memory and a peripheral. After each transfer, the DMA controller waits for the I/O port to signal that it is ready for another transaction. In this set-up, the DMA controller and the CPU share bus cycles, with the DMA controller winning any contest for control of the system bus.
Burst-mode DMA assumes that both the source and the destination can accept transfers as fast as the controller can make them. The CPU sets up the controller, and after a signal from the I/O port the entire block of data is copied to the destination. The DMA controller has sole access to the system bus during the transfer, which is very fast compared with synchronous DMA.
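The difference between the two modes shows up clearly as a per-cycle bus timeline (`D` = DMA controller, `C` = CPU). This sketch assumes, hypothetically, that the I/O port always needs a fixed `gap` of cycles between units in synchronous mode and that the CPU uses those cycles; in burst mode the controller simply holds the bus until the block is done.

```python
def synchronous_timeline(n_transfers: int, gap: int) -> str:
    """Bus owner per cycle in synchronous mode: one DMA cycle per unit, then
    `gap` cycles (hypothetical, fixed) while the I/O port gets ready, which
    the CPU can use."""
    cycles = []
    for i in range(n_transfers):
        cycles.append("D")
        if i < n_transfers - 1:
            cycles.append("C" * gap)
    return "".join(cycles)


def burst_timeline(n_transfers: int) -> str:
    """Burst mode: the controller holds the bus until the block is done."""
    return "D" * n_transfers
```

For three transfers and a two-cycle gap, synchronous mode yields `DCCDCCD` (seven cycles, four of them usable by the CPU), while burst mode yields `DDD`: the block finishes sooner, but the CPU gets no bus cycles until it does.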
Flyby DMA, which not all controllers support, puts out only the source or the destination address and then initiates a simultaneous read and write cycle, so that the data passes directly from source to destination without being stored in the controller. Flyby transfers are very fast because the read cycle and the write cycle are combined into a single bus cycle. Flyby can be used with both burst and synchronous transactions.
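The cycle savings can be counted directly. The sketch below contrasts a conventional "fetch and deposit" transfer, in which the controller first reads each unit into an internal latch and then writes it out (two bus cycles), with flyby, in which the device drives the data bus while memory latches it (one bus cycle). Since flyby puts out only one address, the other party is typically an I/O device selected by the acknowledge signal; the function names and the list-based memory here are hypothetical.

```python
from typing import List


def fetch_and_deposit(device_data: List[int], memory: List[int], dst: int) -> int:
    """Two bus cycles per unit. Returns the number of bus cycles used."""
    cycles = 0
    for value in device_data:
        latch = value          # read cycle: device -> controller's internal latch
        cycles += 1
        memory[dst] = latch    # write cycle: controller latch -> memory
        dst += 1
        cycles += 1
    return cycles


def flyby(device_data: List[int], memory: List[int], dst: int) -> int:
    """One combined read/write bus cycle per unit; data never enters the
    controller. Returns the number of bus cycles used."""
    cycles = 0
    for value in device_data:
        memory[dst] = value    # device drives the data bus, memory latches it
        dst += 1
        cycles += 1
    return cycles
```

Moving three units takes six bus cycles the conventional way and three with flyby, and both leave the same data in memory.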
© Nachum Danzig 2010