QEMU, a Fast and Portable Dynamic Translator - Fabrice Bellard - Translation

2023-03-07

Abstract

We present the internals of QEMU, a fast machine emulator using an original portable dynamic translator. It emulates several CPUs (x86, PowerPC, ARM and Sparc) on several hosts (x86, PowerPC, ARM, Sparc, Alpha and MIPS). QEMU supports full system emulation in which a complete and unmodified operating system is run in a virtual machine and Linux user mode emulation where a Linux process compiled for one target CPU can be run on another CPU.

1 Introduction

QEMU is a machine emulator: it can run an unmodified target operating system (such as Windows or Linux) and all its applications in a virtual machine. QEMU itself runs on several host operating systems such as Linux, Windows and Mac OS X. The host and target CPUs can be different.

The primary usage of QEMU is to run one operating system on another, such as Windows on Linux or Linux on Windows. Another usage is debugging because the virtual machine can be easily stopped, and its state can be inspected, saved and restored. Moreover, specific embedded devices can be simulated by adding new machine descriptions and new emulated devices.

QEMU also integrates a Linux specific user mode emulator. It is a subset of the machine emulator which runs Linux processes for one target CPU on another CPU. It is mainly used to test the result of cross compilers or to test the CPU emulator without having to start a complete virtual machine.

[Translator's note: user mode is not needed for our purposes; our work concerns full system emulation.]

QEMU is made of several subsystems:

- CPU emulator (currently x86, PowerPC, ARM and Sparc)

- Emulated devices (e.g. VGA display, 16450 serial port, PS/2 mouse and keyboard, IDE hard disk, NE2000 network card, ...)

- Generic devices (e.g. block devices, character devices, network devices) used to connect the emulated devices to the corresponding host devices

- Machine descriptions (e.g. PC, PowerMac, Sun4m) instantiating the emulated devices

- Debugger

- User interface

This article examines the implementation of the dynamic translator used by QEMU. The dynamic translator performs a runtime conversion of the target CPU instructions into the host instruction set. The resulting binary code is stored in a translation cache so that it can be reused. The advantage compared to an interpreter is that the target instructions are fetched and decoded only once.

Usually dynamic translators are difficult to port from one host to another because the whole code generator must be rewritten. This represents about the same amount of work as adding a new target to a C compiler. QEMU is much simpler because it just concatenates pieces of machine code generated offline by the GNU C Compiler.

A CPU emulator also faces other more classical but difficult [2] problems:

- management of the translated code cache

- register allocation

- condition code optimizations

- direct block chaining

- memory management

- self-modifying code support

- exception support

- hardware interrupts

- user mode emulation

2 Portable dynamic translation

2.1 Description

The first step is to split each target CPU instruction into a few simpler instructions called micro operations. Each micro operation is implemented by a small piece of C code. This small C source code is compiled by GCC to an object file. The micro operations are chosen so that their number is much smaller (typically a few hundred) than all the combinations of instructions and operands of the target CPU. The translation from target CPU instructions to micro operations is done entirely with hand-written code. The source code is optimized for readability and compactness because the speed of this stage is less critical than in an interpreter.

A compile time tool called dyngen uses the object file containing the micro operations as input to generate a dynamic code generator. This dynamic code generator is invoked at runtime to generate a complete host function which concatenates several micro operations.

The process is similar to [1], but more work is done at compile time to get better performance. In particular, a key idea is that in QEMU constant parameters can be given to micro operations. For that purpose, dummy code relocations are generated with GCC for each constant parameter. This enables the dyngen tool to locate the relocations and generate the appropriate C code to resolve them when building the dynamic code. Relocations are also supported to enable references to static data and to other functions in the micro operations.

2.2 Example

Consider the case where we must translate the following PowerPC instruction to x86 code:

    addi r1,r1,-16   # r1 = r1 - 16

The following micro operations are generated by the PowerPC code translator:

    movl_T0_r1       # T0 = r1
    addl_T0_im -16   # T0 = T0 - 16
    movl_r1_T0       # r1 = T0

[Translator's note: in effect, every small step of the instruction is written out explicitly.]

The number of micro operations is minimized without impacting the quality of the generated code much. For example, instead of generating every possible move between each pair of the 32 PowerPC registers, we just generate moves to and from a few temporary registers. These registers T0, T1, T2 are typically stored in host registers by using the GCC static register variable extension.
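
The decomposition of the addi example can be sketched as a small translator front end. This is a minimal sketch, not QEMU's actual translator: the enum values, the function name translate_addi and the "base index plus register number" numbering are assumptions made for illustration, and the sketch ignores the PowerPC rule that a source register number of 0 in addi denotes the literal value 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical micro operation indices (not QEMU's real numbering). */
enum {
    OP_MOVL_T0_R0 = 0,                   /* T0 = regs[n] is OP_MOVL_T0_R0 + n */
    OP_MOVL_R0_T0 = OP_MOVL_T0_R0 + 32,  /* regs[n] = T0 is OP_MOVL_R0_T0 + n */
    OP_ADDL_T0_IM = OP_MOVL_R0_T0 + 32   /* T0 += constant parameter */
};

/* Emit the micro operation stream for "addi rd, ra, imm".
 * Opcodes go to opc[], constant parameters to opparam[].
 * Returns the number of opcodes written; *nparams is advanced. */
static size_t translate_addi(int rd, int ra, long imm,
                             int *opc, long *opparam, size_t *nparams)
{
    size_t n = 0;

    assert(rd >= 0 && rd < 32 && ra >= 0 && ra < 32);
    opc[n++] = OP_MOVL_T0_R0 + ra;   /* T0 = ra */
    opc[n++] = OP_ADDL_T0_IM;        /* T0 = T0 + imm */
    opparam[(*nparams)++] = imm;     /* the constant travels separately */
    opc[n++] = OP_MOVL_R0_T0 + rd;   /* rd = T0 */
    return n;
}
```

With 32 registers this scheme needs 64 move operations plus one add, instead of one operation per (destination, source, immediate) combination.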

The micro operation movl_T0_r1 is typically coded as:

    void op_movl_T0_r1(void)
    {
        T0 = env->regs[1];
    }

env is a structure containing the target CPU state. The 32 PowerPC registers are stored in the array env->regs[32].

addl_T0_im is more interesting because it uses a constant parameter whose value is determined at runtime:

    extern int __op_param1;

    void op_addl_T0_im(void)
    {
        /* the address of __op_param1 stands in for the constant parameter */
        T0 = T0 + ((long)(&__op_param1));
    }

The code generator generated by dyngen takes a micro operation stream pointed to by opc_ptr and outputs the host code at position gen_code_ptr. Micro operation parameters are pointed to by opparam_ptr:

    [...]
    for(;;) {
        switch(*opc_ptr++) {
        [...]
        case INDEX_op_movl_T0_r1:
            {
                extern void op_movl_T0_r1();
                /* no parameter: the 3 bytes of host code are copied as is */
                memcpy(gen_code_ptr, (char *)&op_movl_T0_r1+0, 3);
                gen_code_ptr += 3;
                break;
            }
        case INDEX_op_addl_T0_im:
            {
                long param1;
                extern void op_addl_T0_im();
                memcpy(gen_code_ptr, (char *)&op_addl_T0_im+0, 6);
                /* patch the 32 bit immediate at offset 2 with the runtime parameter */
                param1 = *opparam_ptr++;
                *(uint32_t *)(gen_code_ptr + 2) = param1;
                gen_code_ptr += 6;
                break;
            }
        [...]
        }
    }
    [...]
    }

For most micro operations such as movl_T0_r1, the host code generated by GCC is just copied. When constant parameters are used, dyngen uses the fact that relocations to __op_param1 are generated by GCC to patch the generated code with the runtime parameter (here it is called param1).

When the code generator is run, the following host code is output:

    # movl_T0_r1
    # ebx = env->regs[1]
    mov 0x4(%ebp),%ebx
    # addl_T0_im -16
    # ebx = ebx - 16
    add $0xfffffff0,%ebx
    # movl_r1_T0
    # env->regs[1] = ebx
    mov %ebx,0x4(%ebp)

On x86, T0 is mapped to the ebx register and the CPU state context to the ebp register.
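
The copy-and-patch step that produces this output can be sketched without executing anything, by treating the three host fragments as byte templates. This is an illustrative sketch under stated assumptions: the byte arrays are the standard IA-32 encodings of the three instructions shown above, written out by hand rather than extracted from a GCC object file as dyngen would do, and the names tmpl_*, patch_imm32 and gen_addi_r1 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hand-written byte templates (assumed, not taken from QEMU). */
static const uint8_t tmpl_movl_T0_r1[] = { 0x8b, 0x5d, 0x04 };        /* mov 0x4(%ebp),%ebx */
static const uint8_t tmpl_addl_T0_im[] = { 0x81, 0xc3, 0, 0, 0, 0 };  /* add $imm32,%ebx */
static const uint8_t tmpl_movl_r1_T0[] = { 0x89, 0x5d, 0x04 };        /* mov %ebx,0x4(%ebp) */

/* Store a 32 bit immediate in little-endian byte order, as x86 expects. */
static void patch_imm32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Concatenate the fragments for "addi r1,r1,imm" into out[].
 * Returns the number of bytes written. */
static size_t gen_addi_r1(uint8_t *out, size_t cap, int32_t imm)
{
    uint8_t *p = out;

    assert(cap >= 12);
    memcpy(p, tmpl_movl_T0_r1, 3);
    p += 3;
    memcpy(p, tmpl_addl_T0_im, 6);
    patch_imm32(p + 2, (uint32_t)imm);   /* the immediate lives at offset 2 */
    p += 6;
    memcpy(p, tmpl_movl_r1_T0, 3);
    p += 3;
    return (size_t)(p - out);
}
```

The sizes (3 and 6 bytes) and the patch offset (2) match the constants in the generated switch statement above.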

2.3 Dyngen implementation

The dyngen tool is the key to the QEMU translation process. The following tasks are carried out when running it on an object file containing micro operations:

- The object file is parsed to get its symbol table, its relocation entries and its code section. This pass depends on the host object file format (dyngen supports ELF (Linux), PE-COFF (Windows) and MACH-O (Mac OS X)).

- The micro operations are located in the code section using the symbol table. A host specific method is executed to get the start and the end of the copied code. Typically, the function prologue and epilogue are skipped.

- The relocations of each micro operation are examined to get the number of constant parameters. The constant parameter relocations are detected by the fact that they use the specific symbol name __op_paramN.

- A memory copy in C is generated to copy the micro operation code. The relocations of the code of each micro operation are used to patch the copied code so that it is properly relocated. The relocation patches are host specific.

- For some hosts such as ARM, constants must be stored near the generated code because they are accessed with PC relative loads with a small displacement. A host specific pass is done to relocate these constants in the generated code.
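
The parameter-counting task in the list above can be sketched as a scan over relocation symbol names. This is a hedged sketch: the function names are hypothetical, the relocation table is reduced to an array of strings, and the upper bound of 100 is an arbitrary sanity limit chosen for the example. A real tool would also have to account for object formats that decorate symbol names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return N if sym is exactly "__op_paramN" with N >= 1, otherwise 0. */
static int op_param_index(const char *sym)
{
    static const char prefix[] = "__op_param";
    const size_t len = sizeof(prefix) - 1;
    char *end;
    long n;

    if (strncmp(sym, prefix, len) != 0)
        return 0;
    if (sym[len] < '0' || sym[len] > '9')   /* at least one digit required */
        return 0;
    n = strtol(sym + len, &end, 10);
    if (*end != '\0' || n < 1 || n > 100)   /* 100: arbitrary sanity bound */
        return 0;
    return (int)n;
}

/* The number of constant parameters of a micro operation is taken to be
 * the highest N seen among its relocation symbols. */
static int count_op_params(const char *const *reloc_syms, size_t nrelocs)
{
    int max = 0;

    for (size_t i = 0; i < nrelocs; i++) {
        int n = op_param_index(reloc_syms[i]);
        if (n > max)
            max = n;
    }
    return max;
}
```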

When compiling the micro operation code, a set of GCC flags is used to manipulate the generation of function prologue and epilogue code into a form that is easy to parse. A dummy assembly macro forces GCC to always terminate the function corresponding to each micro operation with a single return instruction. Code concatenation would not work if several return instructions were generated in a single micro operation.

3 Implementation details

3.1 Translated blocks and translation cache

When QEMU first encounters a piece of target code, it translates it to host code up to the next jump or instruction modifying the static CPU state in a way that cannot be deduced at translation time. We call these basic blocks Translated Blocks (TBs).

A 16 MByte cache holds the most recently used TBs. For simplicity, it is completely flushed when it is full.

The static CPU state is defined as the part of the CPU state that is considered as known at translation time when entering the TB. For example, the program counter (PC) is known at translation time on all targets. On x86, the static CPU state includes more data to be able to generate better code. It is important for example to know if the CPU is in protected or real mode, in user or kernel mode, or if the default operand size is 16 or 32 bits.

3.2 Register allocation

QEMU uses a fixed register allocation. This means that each target CPU register is mapped to a fixed host register or memory address. On most hosts, we simply map all the target registers to memory and only store a few temporary variables in host registers. The allocation of the temporary variables is hard coded in each target CPU description. The advantage of this method is simplicity and portability.

Future versions of QEMU will use a dynamic temporary register allocator to eliminate some unnecessary moves in the case where the target registers are directly stored in host registers.

3.3 Condition code optimization

Good emulation of the CPU condition codes (the eflags register on x86) is critical for good performance.

QEMU uses lazy condition code evaluation: instead of computing the condition codes after each x86 instruction, it just stores one operand (called CC_SRC), the result (called CC_DST) and the type of operation (called CC_OP). For a 32 bit addition such as R = A + B, we have:

CC_SRC=A

CC_DST=R

CC_OP=CC_OP_ADDL

Knowing that we had a 32 bit addition from the constant stored in CC_OP, we can recover A, B and R from CC_SRC and CC_DST. Then all the corresponding condition codes such as zero result (ZF), negative result (SF), carry (CF) or overflow (OF) can be recovered if they are needed by the next instructions.
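
The recovery of the flags for the 32 bit addition case can be sketched as follows. This is a minimal sketch rather than QEMU's code: the struct LazyCC and the helper names are assumptions, only CC_OP_ADDL is handled, and SF is computed as the sign bit of the result. B is recovered as R - A in modular 32 bit arithmetic.

```c
#include <assert.h>
#include <stdint.h>

enum { CC_OP_ADDL = 1 };   /* only the 32 bit addition is sketched */

typedef struct {
    uint32_t cc_src;   /* first operand A */
    uint32_t cc_dst;   /* result R */
    int cc_op;         /* kind of operation that produced R */
} LazyCC;

/* What the translated add does: record operand and result, compute no flag. */
static uint32_t do_addl(LazyCC *cc, uint32_t a, uint32_t b)
{
    uint32_t r = a + b;

    cc->cc_src = a;
    cc->cc_dst = r;
    cc->cc_op = CC_OP_ADDL;
    return r;
}

/* Flags are derived only when some later instruction asks for them. */
static int cc_zf(const LazyCC *cc) { return cc->cc_dst == 0; }
static int cc_sf(const LazyCC *cc) { return (int)(cc->cc_dst >> 31); }

static int cc_cf(const LazyCC *cc)
{
    assert(cc->cc_op == CC_OP_ADDL);
    return cc->cc_dst < cc->cc_src;   /* unsigned wrap-around means carry */
}

static int cc_of(const LazyCC *cc)
{
    uint32_t a = cc->cc_src, r = cc->cc_dst;
    uint32_t b = r - a;               /* recover the second operand */

    assert(cc->cc_op == CC_OP_ADDL);
    /* overflow: both operands have the same sign and the result differs */
    return (int)(((a ^ r) & (b ^ r)) >> 31);
}
```

The add itself costs three stores; the comparisons and shifts are paid only by instructions that actually consume a flag.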

The condition code evaluation is further optimized at translation time by using the fact that the code of a complete TB is generated at a time. A backward pass is done on the generated code to see if CC_OP, CC_SRC or CC_DST are not used by the following code. At the end of the TB we consider that these variables are used. Then we delete the assignments whose value is not used in the following code.
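
The backward pass is a small liveness analysis, which can be sketched as below. The sketch makes simplifying assumptions that are not from the source: CC_OP, CC_SRC and CC_DST are tracked as a single unit, each micro operation carries two hypothetical flags saying whether it assigns or consumes them, and "dead" marks only the condition code assignment, not the whole operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool writes_cc;   /* assigns CC_OP / CC_SRC / CC_DST */
    bool reads_cc;    /* consumes the condition codes */
    bool dead;        /* output: the cc assignment can be dropped */
} MicroOp;

/* Walk the TB backwards and mark the cc assignments nobody reads.
 * Returns the number of assignments marked dead. */
static size_t mark_dead_cc_writes(MicroOp *ops, size_t n)
{
    bool live = true;      /* at the end of the TB the values count as used */
    size_t removed = 0;

    for (size_t i = n; i-- > 0; ) {
        if (ops[i].writes_cc) {
            if (!live) {
                ops[i].dead = true;
                removed++;
            }
            live = false;  /* values from before this op are overwritten */
        }
        if (ops[i].reads_cc)
            live = true;   /* this op needs the values computed before it */
    }
    return removed;
}
```

For a sequence "add, add, conditional jump, add", only the first add loses its condition code assignment: the second one feeds the jump and the last one is live at the end of the TB.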

3.4 Direct block chaining

After each TB is executed, QEMU uses the simulated Program Counter (PC) and the other information of the static CPU state to find the next TB using a hash table. If the next TB has not been already translated, then a new translation is launched. Otherwise, a jump to the next TB is done.
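
The lookup described above can be sketched with a chained hash table keyed on the simulated PC and the remaining static state. All names, the table size and the hash function are assumptions made for the example; the static CPU state other than the PC is reduced to one flags word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TB_HASH_SIZE 1024   /* arbitrary power of two */

typedef struct TB {
    uint32_t pc;            /* simulated PC at TB entry */
    uint32_t flags;         /* rest of the static CPU state */
    struct TB *hash_next;   /* collision chain */
} TB;

static TB *tb_hash[TB_HASH_SIZE];

static unsigned tb_hash_func(uint32_t pc, uint32_t flags)
{
    return (pc ^ (pc >> 10) ^ flags) & (TB_HASH_SIZE - 1);
}

/* Return the TB translated for (pc, flags), or NULL if a new
 * translation has to be launched. */
static TB *tb_find(uint32_t pc, uint32_t flags)
{
    for (TB *tb = tb_hash[tb_hash_func(pc, flags)]; tb != NULL; tb = tb->hash_next)
        if (tb->pc == pc && tb->flags == flags)
            return tb;
    return NULL;
}

/* Register a freshly translated TB. */
static void tb_insert(TB *tb)
{
    unsigned h = tb_hash_func(tb->pc, tb->flags);

    assert(tb_find(tb->pc, tb->flags) == NULL);
    tb->hash_next = tb_hash[h];
    tb_hash[h] = tb;
}
```

Including the flags in the key matters: the same PC translated in a different static state (for example 16 bit versus 32 bit operand size on x86) yields a different TB.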

In order to accelerate the most common case where the new simulated PC is known (for example after a conditional jump), QEMU can patch a TB so that it jumps directly to the next one.

The most portable code uses an indirect jump. On some hosts (such as x86 or PowerPC), a branch instruction is directly patched so that the block chaining has no overhead.
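
The portable variant, where each TB ends with an indirect jump through a patchable slot, can be modelled with plain pointers. This is only a data structure sketch: the two successor slots per TB, the names and the run_chained helper are assumptions, and no host code is generated or executed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ChainTB {
    unsigned long pc;
    struct ChainTB *jmp_next[2];   /* assumed: up to two successors per TB */
} ChainTB;

/* Patch exit n of tb so that it goes straight to next. */
static void tb_add_jump(ChainTB *tb, int n, ChainTB *next)
{
    assert(n == 0 || n == 1);
    tb->jmp_next[n] = next;
}

/* Undo the chaining: every exit returns to the lookup loop again. */
static void tb_reset_jump(ChainTB *tb)
{
    tb->jmp_next[0] = NULL;
    tb->jmp_next[1] = NULL;
}

/* Follow exit n from tb and count how many TBs run before control
 * would fall back to the hash table lookup. */
static int run_chained(ChainTB *tb, int n, int max_steps)
{
    int steps = 0;

    while (tb != NULL && steps < max_steps) {
        steps++;
        tb = tb->jmp_next[n];
    }
    return steps;
}
```

On hosts where the branch instruction itself is patched, the pointer store in tb_add_jump would be replaced by rewriting the branch target in the generated code.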

3.5 Memory management

For system emulation, QEMU uses the mmap() system call to emulate the target MMU. It works as long as the emulated OS does not use an area reserved by the host OS.

In order to be able to launch any OS, QEMU also supports a software MMU. In that mode, the MMU virtual to physical address translation is done at every memory access. QEMU uses an address translation cache to speed up the translation.
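
An address translation cache of this kind can be sketched as a direct-mapped table indexed by the virtual page number. The page size, the table size and all names are assumptions made for the example; access permissions and the slow path that walks the target page tables are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_PAGE_BITS 12                       /* assumed 4 KByte pages */
#define TLB_PAGE_MASK ((1u << TLB_PAGE_BITS) - 1)
#define TLB_SIZE      256                      /* arbitrary power of two */

typedef struct {
    uint32_t vpage;   /* virtual page number */
    uint32_t ppage;   /* physical page number */
    bool valid;
} TLBEntry;

static TLBEntry tlb[TLB_SIZE];

/* Record a translation after the slow path has resolved it. */
static void tlb_fill(uint32_t vaddr, uint32_t paddr)
{
    TLBEntry *e = &tlb[(vaddr >> TLB_PAGE_BITS) & (TLB_SIZE - 1)];

    e->vpage = vaddr >> TLB_PAGE_BITS;
    e->ppage = paddr >> TLB_PAGE_BITS;
    e->valid = true;
}

/* Fast path run at every memory access: true on a hit. */
static bool tlb_lookup(uint32_t vaddr, uint32_t *paddr)
{
    const TLBEntry *e = &tlb[(vaddr >> TLB_PAGE_BITS) & (TLB_SIZE - 1)];

    if (!e->valid || e->vpage != (vaddr >> TLB_PAGE_BITS))
        return false;
    *paddr = (e->ppage << TLB_PAGE_BITS) | (vaddr & TLB_PAGE_MASK);
    return true;
}

/* Called when the MMU mappings change. */
static void tlb_flush(void)
{
    for (int i = 0; i < TLB_SIZE; i++)
        tlb[i].valid = false;
}
```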

To avoid flushing the translated code each time the MMU mappings change, QEMU uses a physically indexed translation cache. It means that each TB is indexed with its physical address.

When MMU mappings change, the chaining of the TBs is reset (i.e. a TB can no longer jump directly to another one) because the physical address of the jump targets may change.

3.6 Self-modifying code and translated code invalidation

On most CPUs, self-modifying code is easy to handle because a specific code cache invalidation instruction is executed to signal that code has been modified. It suffices to invalidate the corresponding translated code.

However, on CPUs such as the x86, where no instruction cache invalidation is signaled by the application when code is modified, self-modifying code is a special challenge.

When translated code is generated for a TB, the corresponding host page is write protected if it is not already read-only. If a write access is made to the page, then QEMU invalidates all the translated code in it and reenables write accesses to it.

Correct translated code invalidation is done efficiently by maintaining a linked list of every translated block contained in a given page. Other linked lists are also maintained to undo direct block chaining.

When using a software MMU, the code invalidation is more efficient: if a given code page is invalidated too often because of write accesses, then a bitmap representing all the code inside the page is built. Every store into that page checks the bitmap to see if the code really needs to be invalidated. It avoids invalidating the code when only data is modified in the page.
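
The per-page bitmap check can be sketched as follows, with one bit per byte of the page. The page size, the structure and the function names are assumptions for illustration; the decision of when a page is invalidated "too often" and the actual invalidation are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CODE_PAGE_SIZE 4096   /* assumed page size */

typedef struct {
    uint8_t code_bitmap[CODE_PAGE_SIZE / 8];   /* bit set: byte belongs to translated code */
} CodePage;

/* Mark the bytes [start, start + len) as covered by a TB. */
static void page_mark_code(CodePage *p, unsigned start, unsigned len)
{
    assert(start + len <= CODE_PAGE_SIZE);
    for (unsigned i = start; i < start + len; i++)
        p->code_bitmap[i >> 3] |= (uint8_t)(1u << (i & 7));
}

/* Checked by every store into the page: true only if the written
 * bytes overlap translated code, so that invalidation is needed. */
static bool store_hits_code(const CodePage *p, unsigned off, unsigned len)
{
    assert(off + len <= CODE_PAGE_SIZE);
    for (unsigned i = off; i < off + len; i++)
        if (p->code_bitmap[i >> 3] & (1u << (i & 7)))
            return true;
    return false;
}
```

A store to data that merely shares a page with code then costs a few bit tests instead of a full invalidation and retranslation.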

3.7 Exception support

longjmp() is used to jump to the exception handling code when an exception such as division by zero is encountered. When not using the software MMU, host signal handlers are used to catch the invalid memory accesses.
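
The longjmp() mechanism can be sketched with a helper that raises a division by zero exception from deep inside the execution of a micro operation. The structure CPUStateSketch, the exception numbers and the function names are hypothetical; the sketch shows only the control flow and does not handle signed overflow in the division.

```c
#include <assert.h>
#include <setjmp.h>

enum { EXCP_NONE = 0, EXCP_DIVZ = 1 };   /* hypothetical exception numbers */

typedef struct {
    jmp_buf jmp_env;       /* where the main loop wants to resume */
    int exception_index;   /* which exception was raised */
} CPUStateSketch;

/* Called from anywhere below the main loop; never returns. */
static void raise_exception(CPUStateSketch *env, int excp)
{
    env->exception_index = excp;
    longjmp(env->jmp_env, 1);
}

/* A helper as a micro operation might call it. */
static int op_divl(CPUStateSketch *env, int a, int b)
{
    if (b == 0)
        raise_exception(env, EXCP_DIVZ);
    return a / b;
}

/* Stand-in for the CPU main loop: runs one "TB" and reports the
 * exception that interrupted it, if any. */
static int cpu_exec_div(CPUStateSketch *env, int a, int b, int *result)
{
    env->exception_index = EXCP_NONE;
    if (setjmp(env->jmp_env) == 0) {
        *result = op_divl(env, a, b);
        return EXCP_NONE;
    }
    /* reached through longjmp(): hand over to the exception handling code */
    return env->exception_index;
}
```

Because the jump unwinds directly to the main loop, the generated code needs no error checks after each operation that might fault.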

QEMU supports precise exceptions in the sense that it is always able to retrieve the exact target CPU state at the time the exception occurred. Nothing has to be done for most of the target CPU state because it is explicitly stored and modified by the translated code. The target CPU state S which is not explicitly stored (for example the current Program Counter) is retrieved by re-translating the TB where the exception occurred in a mode where S is recorded before each translated target instruction. The host program counter where the exception was raised is used to find the corresponding target instruction and the state S.

3.8 Hardware interrupts

In order to be faster, QEMU does not check at every TB if a hardware interrupt is pending. Instead, the user must asynchronously call a specific function to tell that an interrupt is pending. This function resets the chaining of the currently executing TB. It ensures that the execution will return soon to the main loop of the CPU emulator. Then the main loop tests if an interrupt is pending and handles it.
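
The interplay between the interrupt request and the chaining can be modelled in a single thread as below. This is a simplified sketch with hypothetical names: chained TBs are represented by a linked list, "executing" a TB only advances along it, and the races that a truly asynchronous caller would introduce are ignored.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

typedef struct IrqTB {
    struct IrqTB *jmp_next;   /* chained successor, NULL means back to the main loop */
} IrqTB;

typedef struct {
    volatile sig_atomic_t interrupt_request;
    IrqTB *current_tb;        /* TB being executed, if any */
} IrqCPU;

/* Called by the device model: flag the interrupt and unchain the
 * running TB so that execution leaves the translated code soon. */
static void cpu_interrupt_sketch(IrqCPU *cpu)
{
    cpu->interrupt_request = 1;
    if (cpu->current_tb != NULL)
        cpu->current_tb->jmp_next = NULL;
}

/* One iteration of the main loop: run chained TBs starting at tb,
 * then test for a pending interrupt. Returns the number of TBs run. */
static int main_loop_step(IrqCPU *cpu, IrqTB *tb, int *handled)
{
    int executed = 0;

    *handled = 0;
    while (tb != NULL) {
        cpu->current_tb = tb;
        executed++;
        tb = tb->jmp_next;    /* no interrupt check between chained TBs */
    }
    cpu->current_tb = NULL;
    if (cpu->interrupt_request) {
        cpu->interrupt_request = 0;
        *handled = 1;         /* the interrupt would be delivered here */
    }
    return executed;
}
```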

3.9 User mode emulation

QEMU supports user mode emulation in order to run a Linux process compiled for one target CPU on another CPU. At the CPU level, user mode emulation is just a subset of the full system emulation. No MMU simulation is done because QEMU supposes the user memory mappings are handled by the host OS. QEMU includes a generic Linux system call converter to handle endianness issues and 32/64 bit conversions. Because QEMU supports exceptions, it emulates the target signals exactly. Each target thread is run in one host thread.

4 Porting work

In order to port QEMU to a new host CPU, the following must be done:

- dyngen must be ported (see section 2.2).

- The temporary variables used by the micro operations may be mapped to host specific registers in order to optimize performance.

- Most host CPUs need specific instructions in order to maintain coherency between the instruction cache and the memory.

- If direct block chaining is implemented with patched branch instructions, some specific assembly macros must be provided.

The overall porting complexity of QEMU is estimated to be about the same as that of a dynamic linker.

5 Performance

In order to measure the overhead due to emulation, we compared the performance of the BYTEmark benchmark for Linux on an x86 host in native mode, and then under the x86 target user mode emulation.

User mode QEMU (version 0.4.2) was measured to be about 4 times slower than native code on integer code. On floating point code, it is 10 times slower. This can be understood as a result of the lack of the x86 FPU stack pointer in the static CPU state.

In full system emulation, the cost of the software MMU induces a slowdown of a factor of 2. In full system emulation, QEMU is approximately 30 times faster than Bochs. User mode QEMU is 1.2 times faster than valgrind --skin=none version 1.9.6, a hand coded x86 to x86 dynamic translator normally used to debug programs. The --skin=none option ensures that Valgrind does not generate debug code.

6 Conclusion and future work

QEMU has reached the point where it is usable in everyday work, in particular for the emulation of commercial x86 OSes such as Windows. The PowerPC target is close to launching Mac OS X and the Sparc one is beginning to launch Linux. No other dynamic translator to date has supported so many targets on so many hosts, mainly because the porting complexity was underestimated. The QEMU approach seems a good compromise between performance and complexity.

The following points still need to be addressed in the future:

- Porting: QEMU is well supported on PowerPC and x86 hosts. The other ports on Sparc, Alpha, ARM and MIPS need to be polished. QEMU also depends very much on the exact GCC version used to compile the micro operations definitions.

- Full system emulation: ARM and MIPS targets need to be added.

- Performance: the software MMU performance can be increased. Some critical micro operations can also be hand coded in assembler without much modifications in the current translation framework. The CPU main loop can also be hand coded in assembler.

- Virtualization: when the host and target are the same, it is possible to run most of the code as is. The simplest implementation is to emulate the target kernel code as usual but to run the target user code as is.

- Debugging: cache simulation and cycle counters could be added to make a debugger as in SIMICS.
