Monday, November 8, 2010

ROM-less emulation progress

Over the last two weeks I have implemented most of the high-level emulation framework that I alluded to in my last post here as well as a large number of tracing wrappers for the original ROM calls. In the next stage I will start replacing some of those wrappers with re-implementations, starting with some easy ones.

It turns out I was somewhat optimistic; so far I have wrapped over 450 distinct ROM entry points (the actual current number of wrappers is 513 but there are some error catchers and possible duplicates). Creating the wrappers and writing and debugging the framework took more effort then I expected, but it was worth it: every call to a ROM entry point described or implied by the Green Book or OS-9 documentation is now wrapped with a high-level emulation function that so far does nothing except calling the original ROM routine and tracing its input/output register values.

Surely there aren't that many application-callable API functions, I can hear you think? Well actually there are, for sufficiently loose definitions of "application-callable". You see, the Green Book specifies CD-RTOS as being OS-9 and every "trick" normally allowed under OS-9 is theoretically legal in a CD-i title. That includes bypassing the OS-supplied file managers and directly calling device drivers; there are many CD-i titles that do some of this (the driver interfaces are specified by the Green Book). In particular, all titles using the Balboa library do this.

I wanted an emulation framework that could handle this so my framework is built around the idea of replacing the OS-9 module internals but retaining their interfaces, including all the documented (and possibly some undocumented) data structures. One of the nice features of this approach is that native ROM code can be replaced by high-level emulation on a routine-by-routine basis.

How does it really work? As a start, I've enhanced the 68000 emulation to possibly invoke emulation modules whenever an emulated instruction generates one of the following processor exceptions: trap, illegal instruction, line-A, line-F.

The emulation modules can operate in two modes: either copy an existing ROM module and wrap its entry points, or generate an entirely new memory module. In both cases, the emulation module will emit line-A instructions at the appropriate points. The emitted modules will go into a memory area appropriately called "emurom" that the OS-9 kernel scans for modules. Giving the emitted modules identical names but higher revision numbers then the ROM modules will cause the OS-9 kernel to use the emitted modules.

This approach works for every module except the kernel itself, because it is entered by the boot code before the memory scan for modules is even performed. The kernel emulation module will actually patch the ROM kernel entry point so that it jumps to the emitted kernel module.

The emitted line-A instructions are recognized by the emulator disassembler; they are called "modcall" instructions (module call). Each such instruction corresponds to a single emulation function; entry points into the function (described below) are indicated by the word immediately following it in memory. For example, the ROM routine that handles the F$CRC system call now disassembles like this:

modcall kernel:CRC:0
jsr XXX.l
modcall kernel:CRC:$
rts

Here the XXX is the absolute address of the original ROM routine for this system call; the two modcall instructions trace the input and output registers of this handler. If the system call were purely emulated (no fallback to the original ROM routine) it would look like this:

modcall kernel:CRC:0
modcall kernel:CRC:$
rts

Both modcall instructions remain, although technically the latter is now unnecessary, but the jsr instruction has disappeared. Technically, the rts instruction could also be eliminated but it looks more comprehensible this way.

One could view the approach as adding a very powerful "OS-9 coprocessor" to the system.

If an emulation function has to make inter-module calls, complications arise. High-level emulation context cannot cross module boundaries, because the called module may be native (and in many cases even intra-module calls can raise this issue). For this reason, emulation functions need additional entry points where the emulation can resume after making such a call. The machine language would like this, e.g. for the F$Open system call:

modcall kernel:Open:0
modcall kernel:Open:25
modcall kernel:Open:83
modcall kernel:Open:145
modcall kernel:Open:$
rts

The numbers following the colon are relative line numbers in the emulation function. When the emulation function needs to make a native call, it pushes the address of one such modcall instruction on the native stack, sets the PC register to the address it wants to call and resumes instruction emulation. When the native routine returns, it will return to the modcall instruction which will re-enter the emulation function at the appropriate point.

One would expect that emulation functions making native calls need to be coded very strangely: a big switch statement on the entry code (relative line number), followed by the appropriate code. However, a little feature of the C and C++ languages allows the switch statement to be mostly hidden. The languages allow the case labels of a switch statement to be nested arbitrarily deep into the statements inside the switch.

The entire contents of emulation functions are encapsulated inside a switch statement on the entry number (hidden by macros):

switch (entrynumber)
{
case 0:
...
}

On the initial call, zero is passed for entrynumber so the function body starts executing normally. Where a native call needs to be made, the processor registers are set up (more on this below) and a macro is invoked:

MOD_CALL(address);

This macro expands to something like this:

MOD_PARAMS.SetJumpAddress(address);
MOD_PARAMS.SetReturnLine(__LINE__);
return eMOD_CALL;
case __LINE__:

Because this is a macro expansion, both invokations of the __LINE__ macro will expand to the line number of the MOD_CALL macro invokation.

What this does is to save the target address and return line inside MOD_PARAMS and then return from the emulation function with value eMOD_CALL. This value causes the wrapper code to push the address of the appropriate modcall instruction and jump to the specified address. When that modcall instruction executes after the native call returns, it passes the return line to the emulation function as the entry number which will dutifully switch on it and control will resume directly after the MOD_CALL macro.

In reality, the code uses not __LINE__ but __LINE__ - MOD_BASELINE which will use relative line numbers instead of absolute ones; MOD_BASELINE is a constant defined as the value of __LINE__ at the start of the emulation function.

The procedure described above has one serious drawback: emulation functions cannot have "active" local variables at the point where native calls are made (the compiler will generate errors complaining that variable initialisations are being skipped). However, the emulated processor registers are available as temporaries (properly saved and restored on entry and exit of the emulation function if necessary) which should be good enough. Macros are defined to make accessing these registers easy.

When native calls need to be made, the registers must be set up properly. This would lead to constant "register juggling" before and after each call, which is error-prone and tedious. To avoid it, it is possible to use two new sets of registers: the parameter set and the results set. Before a call, the parameter registers must be set up properly; the call will then use these register values as inputs and the outputs will be stored in the results registers (register juggling will be done by the wrapper code). The parameter registers are initially set to the values of the emulated processor registers and also set from the results registers after each call.

The following OS-9 modules are currently wrapped:

kernel nrf nvdrv cdfm cddrv ucm vddrv ptdrv kbdrv pipe scf scdrv

The *drv modules are device drivers; their names must be set to match the ones used in the current system ROM in order to properly override those. The *.brd files in the sys directory have been extended to include this information like this:

** Driver names for ROM emulation.
set cddrv.name=cdapdriv
set vddrv.name=video
set ptdrv.name=pointer
set kbdrv.name=kb1driv

The kernel emulation module avoids knowledge of system call handler addresses inside the kernel by trapping the first "system call" so that it can hook all the handler addresses in the system and user mode dispatch tables to their proper emulation stubs. This first system call is normally the I$Open call for the console device.

File manager and driver emulation routines hook all the entry points by simply emitting a new entry point table and putting the offset to it in the module header. The offsets in the new table point to the entry point stubs (the addresses of the original ROM routines are obtained from the original entry point table).

The above works fine for most modules, but there was a problem with the video driver because it is larger then 64KB (the offsets in the entry point are 16-bit values relative to the start of the module). Luckily there is a text area near the beginning of the original module (it is actually just after the original entry point table) that can be used for a "jump table" so all entry point offsets fit into 16 bits. After this it should have worked, but it didn't because it turns out that UCM has a bug that requires the entry point table to *also* be in the first 64KB of the module (it ignores the upper 16-bits of the 32-bit offset to this table in the module header). This was fixed by simply reusing the original entry point table in this case.

One further complication arose because UCM requires the initialisation routines of drivers to also store the absolute addresses of their entry points in UCM variables. These addresses were "hooked" by adding code to the initialisation emulation routine that changes these addresses to point to the appropriate modcall instructions.

All file managers and drivers contain further dispatching for the SetStat and GetStat routines, based on the contents of one or two registers. Different values in these registers will invoke entirely separate functions with different register conventions; they really must be redirected to different emulation functions. This is achieved by lifting the dispatching to the emulation wrapper code (it is all table-driven).

Most of the above has been implemented, and CD-i emulator now traces all calls to ROM routines (when emurom is being used). A simple call to get pointing device coordinates would previously trace as follows (when trap tracing was turned on with the "et trp" command):

@00DF87E4(cdi_app) TRAP[5812] #0 I$GetStt <= d0.w=7 d1.w=SS_PT d2.w=PT_Coord
@00DF87E8(cdi_app) TRAP[5812] #0 I$GetStt => d0.w=$8000 d1.l=$1EF00FD

Here the input value d0.w=7 is the path number of the pointing device; the resulting mouse coordinates are in d1.l and correspond to (253,495),

When modcall tracing is turned on, this "simple" call will trace as follows:

@00DF87E4(cdi_app) TRAP[5812] #0 I$GetStt <= d0.w=7 d1.w=SS_PT d2.w=PT_Coord
@00F86EE0(kernel) MODCALL[16383] kernel:GetStt:0 <= d0.w=7 d1.w=$59 [Sys]
@00F86D10(kernel) MODCALL[16384] kernel:CCtl:0 <= d0.l=2 [NoTrap]
@00F86D1A(kernel) MODCALL[16384] kernel:CCtl:$ =>
@00F8A460(ucm) MODCALL[16385] ucm:GetPointer:0 <= u_d0.w=7 u_d2.w=0
@00FA10A4(pointer) MODCALL[16386] pointer:PtCoord:0 <= d0.w=7
@00FA10AE(pointer) MODCALL[16386] pointer:PtCoord:$ => d0.w=$8000 d1.l=$1EF00FD
@00F8A46A(ucm) MODCALL[16385] ucm:GetPointer:$ =>
@00F86D10(kernel) MODCALL[16387] kernel:CCtl:0 <= d0.l=5 [NoTrap]
@00F86D1A(kernel) MODCALL[16387] kernel:CCtl:$ =>
@00F86EEA(kernel) MODCALL[16383] kernel:GetStt:$ =>
@00DF87E8(cdi_app) TRAP[5812] #0 I$GetStt => d0.w=$8000 d1.l=$1EF00FD

You can see that the kernel dispatches this system call to kernel:GetStt, the handler for the I$GetStt system call. It starts by doing some cache control and then calls the GetStat entry point of the ucm modules, which dispatches it to its GetPointer routine. This routine in turn calls the GetStat routine of the pointer driver, which dispatches it to its PtCoord routine. This final routine performs the actual work and returns the results, which are then ultimately returned by the system call, after another bit of cache control.

The calls to ucm:GetStat and pointer:GetStat are no longer visible in the above trace as the emulation wrapper code directly dispatches them to ucm:GetPointer and pointer:PtCoord, respectively; it doesn't even trace the dispatching because this would result in another four lines of tracing output.

As a sidenote, all of the meticulous cache and address space control done by the kernel is really wasted, as CD-i systems do not need these. But the calls are still being made, which makes the kernel needlessly slow; one major reason why calling device drivers directly is often done. Newer versions of OS-9 eliminate these calls by using different kernel flavors for different processors and hardware configurations.

The massive amount of tracing needs to be curtailed somewhat before further work can productively be done; this is what I will start with next.

I have already generated fully documented stub functions for the OS-9 kernel from the OS-9 technical documentation; I will also need to generate for all file manager and driver calls, based on the digital Green Book.

It is perhaps noteworthy that some kernel calls are not described in any of the OS-9 version 2.4 documentation that I was able to find, but they *are* described in the online OS-9/68000 version 3.0 documentation.

Some calls made by the native ROMs remain undocumented but those mostly seem to be CD-i system-control (for example, one of them sets the front display text). Of the OS-9 kernel calls, only the following ones are currently undocumented:

F$AllRAM
F$FModul
F$POSK

Their existence was inferred by the appropriate constants existing in the compiler library files, but I have not seen any calls to them (yet).