Sunday, September 4, 2011

MPEG decoding, state save/restore, NRF emulation, ...

It's been a while since I wrote anything here, but that doesn't mean that work on CD-i Emulator has stopped. On the contrary, a lot has happened in the last month and describing all of it will take a very long blog post. So here goes…

Last January an annoying date-checking bug was found which forced me to release beta2 somewhat earlier than anticipated. After that I did no further work on CD-i Emulator. There were various reasons for this, but the most important one was a very busy period at my day job.

After a well-earned vacation I resumed CD-i related work in early August. First I spent a few days on Walter Hunt's OS-9 port of gcc, the GNU C/C++ Compiler that I found in October of last year. Getting it working on a modern Cygwin installation was interesting and something very different from my usual line of work. The result could be useful for homebrew activities: it's a much more usable C compiler than the Microware OS-9 one and supports C++ as a bonus. I intend to use this for ROM-less emulation validation some day; see also below. The sources need to be released but I haven’t gotten to that stage yet.

After that I had another go at the Digital Video cartridge emulation. At the point where I left off last year the major stumbling block was the presumed picture / frame buffering logic of the MPEG video driver. When the appropriate interrupt status bits are set the driver starts copying a bulk of status information to an array of device registers and it will sometimes also read from those registers. This is all controlled by several status and timing registers that are also referenced elsewhere and I previously could not get a handle on it.

My first attempt this time was spending another few days staring at it and tracing it, but this did not gain me much new understanding. Finally I decided to just leave it for now and see how far I could get without understanding this part of the driver. I decided to once again attempt to get "CD-i Full Motion Video Technical Aspects" working.

This CD-i was produced by Philips to give future Full Motion Video (as the new MPEG playback functions were called at the time) developers a demonstration of the technical capabilities of the new hardware, at a time when this hardware was still in the early beta phase. The CD-i actually contains the compiler libraries necessary for making FMV calls from CD-i applications, as these had not previously been widely distributed.

It is not a very slick disc visually, being intended for developers, but it demonstrates a number of FMV techniques such as regular playback, playback control including pause, slow motion and single step, freeze frame and forward/backward scan, special effects like scrolling the FMV window, a seamless jump and a sample of overlay effects with the CD-i base case video planes.

I had previously tried to run this disc on CD-i Emulator, but it always crashed for an unknown reason that I attributed to MPEG device emulation problems. This time I traced back the crash and it turned out to have nothing at all to do with FMV playback. Instead it was caused by an incorrect emulation of the 68000 instruction "move ea,ccr", which is supposed to set the condition code register (ccr) to the value specified by the effective address (ea). In the processor manual this is classified as a word instruction and I had emulated it as such. That turned out to be wrong: it caused a word write to the full status register, where it should have been a byte write to the lower eight bits of it, which hold the condition codes.

The problem manifested itself when the application called the math trap handler for some mundane number calculations, which were naturally supposed to set the condition codes. The value written to the status register inadvertently changed the processor from user to system mode (and also scrambled the active interrupt masking level). This caused an instant stack switch, which in turn caused a bus error when the trap handler attempted to return to the application program (the cpu took the return address from the wrong stack and got garbage instead).
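
The difference between the two behaviours can be sketched in a few lines of C++. This is only an illustration and not the actual CD-i Emulator code; the function and constant names are made up for the example. It relies on the documented 68000 behaviour that a move to the ccr fetches a word but only uses its low byte, leaving the upper (system) byte of the status register alone.

```cpp
#include <cstdint>

// 68000 status register: the upper (system) byte holds, among others, the
// supervisor bit and the interrupt mask; the lower byte holds the condition
// codes X N Z V C in bits 4..0.
constexpr std::uint16_t kSupervisorBit = 0x2000;
constexpr std::uint16_t kInterruptMask = 0x0700;
constexpr std::uint16_t kCcrMask       = 0x001F;

// Buggy variant (hypothetical name): treats "move ea,ccr" as a word write
// to the whole status register.
std::uint16_t MoveToCcrBuggy(std::uint16_t sr, std::uint16_t source)
{
    (void)sr;        // the old value is lost completely
    return source;
}

// Fixed variant (hypothetical name): a word is fetched from the effective
// address, but only its low byte reaches the condition codes; the system
// byte of the status register is preserved.
std::uint16_t MoveToCcrFixed(std::uint16_t sr, std::uint16_t source)
{
    return static_cast<std::uint16_t>((sr & 0xFF00) | (source & kCcrMask));
}
```

With a user-mode status register of 0x0000 and a source word of 0x2704, the buggy variant returns 0x2704 (supervisor bit set, interrupt mask 7), while the fixed variant returns 0x0004 (only the Z flag set).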

Most CD-i applications probably don't use the math trap handler so the problem went undetected for a long time. Now that it's fixed some other titles have probably started working but I haven't tested that.

After this, the FMV Technical Aspects application would get to its main menu screen, allowing me to start FMV playback operations. Regular playback worked fine until the end of the video clip, where there turned out to be status bit generation issues that prevented the application from properly detecting the end of video clip condition (the decoder is supposed to send a "buffer underflow" signal, among others, after the end of the MPEG data and my emulation didn't do that yet).

This was not very easy to fix because of the way that MPEG data buffering and decoding is handled inside CD-i Emulator, which I'll get into below. So it took me some time.

Regular play working fine, I started worrying about window control. This was the area where I feared the picture buffering stuff, but it turned out that this was easily bypassed. The horizontal / vertical scrolling functions were ideal to test this but it took me some time to get it working. There were bugs in several areas, including my integration of the MPEG video decoding code, which I took from the well-known mpeg2dec package. This code is written to decode a single video sequence and consequently did not handle image size changes without some re-initialization calls at the appropriate times. Failing that, it mostly just crashed (at the Windows application level) due to out-of-bounds video buffer accesses.

Another issue was the timing of device register updates for image size changes; it turned out that I had the basic mechanism wrong and consequently the driver would keep modifying the window parameters to incorrect values.

Having all of the above fixed, I returned my attention to playback control. So far I can get the video playback properly paused, but I haven't been able to get it properly resumed. For some reason the application resumes the MPEG playback but it doesn't resume the disc playback. Since the driver waits for new data to arrive from disc before actually resuming MPEG playback nothing happens (this is documented as such). The application is presumably expecting some signal from the driver to get into the proper state for resuming disc playback, but I haven't found it yet.

At this point, it seemed promising to look at other CD-i titles using playback control and the Philips Video CD application is an obvious candidate. Again, regular playback appears to work fine, but playback control (including pause/resume) does not. It turns out that this application uses a different driver call (it uses MV_ChSpeed instead of MV_Pause, probably in preparation for possible slow motion or single step), which never completes successfully, probably again because of device status signaling. Similar issues appear to block playback control in a few other titles I tried.

I've given some thought to tracing driver calls and signals on an actual player to see what CD-i Emulator is doing wrong, and it appears to be relatively simple. There's just a bandwidth issue, because all of the trace output will have to go out the serial port, which can go no higher than 19200 baud. Some kind of data compression is obviously needed and I've determined a relatively simple scheme that should be enough (the CD-i player side will all need to be coded in 68000 machine language so simplicity is important!), but I haven't actually written any code for it yet.

I know there are issues with the proper timing of some video status signals. Things like start-of-sequence, end-of-sequence and start-of-picture-group should be delayed until display of the corresponding picture; at present they are delivered at decoding time, which can be a few pictures early. But that does not really affect the titles I've tried so far, because they do not attempt picture-synced operations. An application like The Lost Ride might be sensitive to things like this, though, and it needs to be fixed at some time. Similar issues are probably present with time code delivery. In addition, the last-picture-displayed and buffer-underflow signals are not always properly sent; I'm fixing these as I go along.
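
One way to delay such signals until display time is to park them with the decoded picture and only raise them when that picture is shown. The C++ sketch below illustrates the idea under assumptions: the class name, the picture ids and the status bit values are all hypothetical and do not reflect the real driver or chip.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical status bits; the real bit assignments are not shown here.
constexpr std::uint32_t kSeqStart = 1u << 0;
constexpr std::uint32_t kSeqEnd   = 1u << 1;
constexpr std::uint32_t kGopStart = 1u << 2;

struct PendingPicture {
    int picture_id;          // identifies the decoded picture
    std::uint32_t signals;   // status bits noticed while decoding it
};

class SignalDelayQueue {
public:
    // Called at decoding time: remember the signals instead of raising them.
    void OnDecoded(int picture_id, std::uint32_t signals) {
        pending_.push_back({picture_id, signals});
    }

    // Called when a picture is actually displayed: returns the bits that
    // should be raised now, or 0 if nothing is pending for this picture.
    std::uint32_t OnDisplayed(int picture_id) {
        for (auto it = pending_.begin(); it != pending_.end(); ++it) {
            if (it->picture_id == picture_id) {
                std::uint32_t signals = it->signals;
                pending_.erase(it);
                return signals;
            }
        }
        return 0;
    }

private:
    std::deque<PendingPicture> pending_;
};
```

Looking pictures up by id rather than assuming first-in first-out order matters because MPEG display order can differ from decoding order.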

In the process, I decided that the magenta border was getting annoying and tried to fix it. That turned out to be harder than I thought. The MPEG chip has a special border color register that is written by the MV_BColor driver call and it seemed enough to just pass the color value to the MPEG window overlay routines. Well, not so. Again the issue turned out to be timing of decoder status signals, but of a different kind. The driver doesn't write the border color registers until it has seen some progress in certain timing registers related to the picture buffering thing, presumably to avoid visual flashes or something on the actual hardware. Fortunately, it turned out to be easy to simulate that progress, taking care not to trigger the complicated picture buffer code that I so far managed to keep dormant.

At some point, possibly related to slow motion or freeze frame, I might need to actually tackle that code, but by that time I hope to have gained more understanding of the supposed workings of the MPEG chip.

Looking at the above, you might think that all of the difficulties are with the MPEG video decoding and that is indeed mostly true. I did have to fix something in the MPEG audio decoding, related to the pause/resume problems, and that was the updating of the audio decoder clock. When audio and video playback are synchronized the MPEG video driver uses the MPEG audio clock as its timing reference, which means that it has to be stopped and restarted when video playback control operations occur. Since I had never before seriously tested this, the audio clock wasn't stopped at all and the video driver obligingly continued decoding and displaying pictures until it ran out of buffered data.
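
A clock that can be stopped and restarted in this way only has to accumulate the time during which it was running. The following C++ sketch shows that bookkeeping; the class name and the tick-based interface are assumptions made for the example, not the emulator's actual audio clock.

```cpp
#include <cstdint>

// Hypothetical pausable clock. "now" is the emulated time in arbitrary
// ticks; the clock only advances while it is running.
class AudioClock {
public:
    void Start(std::uint64_t now) {
        if (!running_) {
            started_at_ = now;
            running_ = true;
        }
    }

    void Stop(std::uint64_t now) {
        if (running_) {
            elapsed_ += now - started_at_;   // bank the time that has run
            running_ = false;
        }
    }

    // Current clock value; frozen while the clock is stopped.
    std::uint64_t Read(std::uint64_t now) const {
        return running_ ? elapsed_ + (now - started_at_) : elapsed_;
    }

private:
    std::uint64_t elapsed_ = 0;
    std::uint64_t started_at_ = 0;
    bool running_ = false;
};
```

A video driver that uses such a clock as its timing reference sees no progress between Stop and the next Start, so it stops decoding new pictures while playback is paused.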

There is currently just one known problem with the MPEG audio decoding: the audio isn't properly attenuated as specified by the driver. This causes little audio distortions at some stream transitions and when buffers run out. There is also a problem with base case audio synchronization but that is hard to trigger and possibly not even audible in many titles so I'll worry about that much later.

Above I promised to get into the MPEG data buffering and decoding issue. The basic problem is one of conceptual mismatch: the CD-i decoding hardware gets data "pushed" into it (by DMA or direct device I/O) at the behest of the driver, whereas the MPEG decoding code (based on the publicly available mpeg2dec and musicout programs from the MPEG Software Simulation Group) expects to "pull" the data it needs during decoding. Things get messy when the decoding runs out of data, as the code does not expect to ever do so (it was originally written to decode from a disk file, which of course never runs out of data until the end of the sequence). Some obvious solutions include putting the decoding in a separate thread (which given multi-core processors might be a good idea anyway from a performance perspective) and modifying it to become restartable at some previous sync point (most easily this would be the start of an audio frame or a picture or picture slice). Both options are somewhat involved although they have obvious benefits, and it may turn out that I will need to do one of them anyway at some point.

For now I've avoided the problems by carefully timing calls into the MPEG decoding code so that enough data to decode a single audio frame or video picture should always be available. The MPEG data stream at the system level contains enough timestamp and buffering information to make this possible: in particular, it specifies the exact decoding time of every audio frame or video picture in relation to the timing of the data stream, so those calls into the decoding code can be made at a time when a valid MPEG data stream will already have filled the buffers far enough.
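
The timing-based workaround can be pictured as a small gatekeeper between the pushed data and the pull-style decoder. The C++ sketch below is a simplified model under assumptions: all names are hypothetical, and the "decoder" is reduced to consuming a known number of bytes per audio frame or video picture.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// One audio frame or video picture as announced by the system stream.
struct AccessUnit {
    std::uint64_t decode_time;  // decoding timestamp, in system clock ticks
    std::size_t size;           // bytes needed to decode this unit
};

class PushFedDecoder {
public:
    // Driver side: data is pushed in (DMA or device I/O on real hardware).
    void PushData(std::size_t bytes) { buffered_ += bytes; }
    void Announce(const AccessUnit& unit) { units_.push_back(unit); }

    // Emulation side: called as emulated time advances. A unit is handed to
    // the pull-style decoder only when its decoding time has arrived and
    // all of its data is buffered. Returns the number of units decoded.
    int Advance(std::uint64_t now) {
        int decoded = 0;
        while (!units_.empty() &&
               units_.front().decode_time <= now &&
               units_.front().size <= buffered_) {
            buffered_ -= units_.front().size;  // the decoder "pulls" this much
            units_.pop_front();
            ++decoded;
        }
        return decoded;
    }

    std::size_t buffered() const { return buffered_; }

private:
    std::deque<AccessUnit> units_;
    std::size_t buffered_ = 0;
};
```

In this model the decoder is never entered with too little data, which is the property the timed calls are meant to guarantee for a valid stream. The size check only matters for streams that violate their own buffering information.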

The approach depends on the timing of the MPEG data entering the decoder, which means that it does not handle buffer underflow conditions unless you add some kind of automatic decoding that continues even if no more MPEG data appears, and this is basically what I’ve done. In the end it was just a relatively straightforward extension of the automatic decoding that was already there to handle the fact that MPEG audio streams do not have to explicitly timestamp every single audio frame (the CD-i Green Book does not even allow this unless you waste massive amounts of space in each MPEG audio data sector). It would have been needed anyway to correctly decode the last pictures of a sequence, but that had never been tested before.
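
For audio frames without an explicit timestamp, the decoding time follows from the last timestamped frame and the fixed frame duration. The C++ sketch below shows that arithmetic, assuming the standard 90 kHz MPEG timestamp resolution and the standard frame lengths of 384 samples (layer I) and 1152 samples (layer II); the function name is hypothetical.

```cpp
#include <cstdint>

// Samples per MPEG-1 audio frame.
constexpr std::uint64_t kSamplesLayerI  = 384;
constexpr std::uint64_t kSamplesLayerII = 1152;
constexpr std::uint64_t kSystemClockHz  = 90000;  // timestamp resolution

// Implied decoding time of the n-th frame after the last explicitly
// timestamped one, in 90 kHz ticks (rounded down).
std::uint64_t ImpliedDecodeTime(std::uint64_t last_timestamp,
                                std::uint64_t frames_after,
                                std::uint64_t samples_per_frame,
                                std::uint64_t sample_rate_hz)
{
    return last_timestamp +
           frames_after * samples_per_frame * kSystemClockHz / sample_rate_hz;
}
```

For a layer II stream at 44100 Hz, each frame lasts about 2351 ticks, so the frame following one timestamped at 1000 is due at 3351.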

For performance and possible patent reasons I have taken care to edit the MPEG decoding code (placing appropriate #ifdef lines at the right places) so that only MPEG 1 video and audio layer I/II decoding code is compiled into the CD-i Emulator executable. This is all that is needed for CD-i anyway and MPEG 2 video and audio layer III greatly complicate the decoding and thus significantly enlarge the compiled code.
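
The effect of such #ifdef lines can be sketched as follows. The macro and function names are invented for this example; the names used in the real decoder sources may well differ.

```cpp
// Hypothetical build switch: set to 1 to compile layer III support in.
#define CDI_AUDIO_LAYER3 0

enum class AudioLayer { LayerI = 1, LayerII = 2, LayerIII = 3 };

// Reports whether this build contains decoding code for the given layer.
bool IsLayerSupported(AudioLayer layer)
{
    switch (layer) {
    case AudioLayer::LayerI:
    case AudioLayer::LayerII:
        return true;
#if CDI_AUDIO_LAYER3
    case AudioLayer::LayerIII:
        return true;   // would pull in the much larger layer III code
#endif
    default:
        return false;
    }
}
```

With the switch left at 0, the layer III branch is not compiled at all, so none of the code it would call ends up in the executable.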

Being somewhat stymied at the FMV front, I next decided to spend some time on another lingering issue. During testing, I often have to do the same exact sequence of mouse actions to get a CD-i application to a problem point and this is starting to be annoying. Input recording and playback are a partial solution to this but then you still have to wait while the application goes through it, which is also annoying and can sometimes take quite some time anyway. The obvious solution is a full emulation state save/restore feature, which I've given some thought and started implementing. It's nowhere near finished, though.

During the MESS collaboration I spent some time investigating the MESS save/restore mechanism. If at all possible I would love to be compatible for CD-i emulation states, but it turns out to be quite hard to do. The basic internal mechanism is quite similar in spirit to what I developed for CD-i Emulator, but it's the way the data is actually saved that makes compatibility very hard. Both approaches basically boil down to saving and restoring all the relevant emulation state variables, which includes easy things like the contents of cpu, memory and device registers but also internal device state variables. The latter are of course not identical between different emulators but they could probably be converted if some effort was thrown at it and for a typical device they aren't very complex anyway. The MESS implementation uses an initialization-time registration of all state variables; at save/restore time it just walks the registrations and saves or restores the binary contents of those variables. CD-i Emulator has a somewhat more flexible approach; at save/restore time it calls a device-specific serialize function to save or restore the contents of the state variables. The actual registration / serialization codes are structurally similar in the two emulators (a simple list of macro/function calls on the state variables) but the code runs at different times.
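
The two styles can be contrasted in a small C++ sketch. It is a simplification under assumptions: the class names are hypothetical, values are copied as raw bytes, and neither half is the real MESS or CD-i Emulator code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Style 1, registration: variables are registered once at initialization;
// save/restore later walks the list and copies raw bytes.
class Registry {
    struct Item { void* data; std::size_t size; };
    std::vector<Item> items_;
public:
    void Register(void* data, std::size_t size) { items_.push_back(Item{data, size}); }
    Bytes Save() const {
        Bytes out;
        for (const Item& item : items_) {
            const std::uint8_t* p = static_cast<const std::uint8_t*>(item.data);
            out.insert(out.end(), p, p + item.size);
        }
        return out;
    }
    void Restore(const Bytes& in) {
        std::size_t pos = 0;
        for (const Item& item : items_) {
            assert(pos + item.size <= in.size());
            std::memcpy(item.data, in.data() + pos, item.size);
            pos += item.size;
        }
    }
};

// Style 2, serialization: at save/restore time a device-specific function
// is called, and the same code handles both directions.
class Archive {
    bool saving_ = true;
    Bytes bytes_;
    std::size_t pos_ = 0;
public:
    Archive() = default;                                   // saving
    explicit Archive(const Bytes& saved) : saving_(false), bytes_(saved) {}
    template <typename T>
    void Value(T& value) {
        std::uint8_t* p = reinterpret_cast<std::uint8_t*>(&value);
        if (saving_) {
            bytes_.insert(bytes_.end(), p, p + sizeof(T));
        } else {
            assert(pos_ + sizeof(T) <= bytes_.size());
            std::memcpy(p, bytes_.data() + pos_, sizeof(T));
            pos_ += sizeof(T);
        }
    }
    const Bytes& bytes() const { return bytes_; }
};

struct TimerDevice {   // hypothetical device with two state variables
    std::uint16_t counter = 0;
    std::uint8_t control = 0;
    void Serialize(Archive& ar) { ar.Value(counter); ar.Value(control); }
};
```

In both styles the per-device code is just a list of calls on the state variables. The difference is that the serialize function runs at save/restore time and can therefore make decisions, such as checking a version number, that a fixed registration list cannot.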

The real problem is that MESS includes very little meta information in the save files: only a single checksum of all the names and types of registered state variables in registration order. This is enough to validate the save data at restore time, but only if the state variables of the saving emulator exactly match those of the restoring emulator, because there is no information to implement skipping or conversions. That is already a limitation between different versions, or in some cases even configurations, of MESS emulators, and it is even more of one between MESS and CD-i Emulator! The meta information could of course be obtained from the MESS source code (relatively simple macro modifications could cause it to be written out) but that would require exact tracking of MESS versions because every version could have its own checksum corresponding to different meta information (in this case CD-i Emulator would need meta information sets for every MESS checksum value it wants to support).
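
To see why a single checksum only supports an all-or-nothing comparison, consider the C++ sketch below. The FNV-1a hash is used purely as a stand-in and is an assumption of this example; it is not claimed to be the checksum that MESS computes, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using VarInfo = std::pair<std::string, std::string>;  // (name, type)

// One FNV-1a step per byte. FNV-1a is only a stand-in here and not
// necessarily what MESS uses.
std::uint32_t Fnv1a(std::uint32_t hash, const std::string& text)
{
    for (unsigned char c : text) {
        hash ^= c;
        hash *= 16777619u;
    }
    return hash;
}

// Checksum over all names and types in registration order.
std::uint32_t MetaChecksum(const std::vector<VarInfo>& vars)
{
    std::uint32_t hash = 2166136261u;
    for (const VarInfo& v : vars) {
        hash = Fnv1a(hash, v.first);
        hash = Fnv1a(hash, ":");
        hash = Fnv1a(hash, v.second);
        hash = Fnv1a(hash, ";");
    }
    return hash;
}
```

Changing the type of one variable changes the checksum, and in practice so does adding, removing or reordering variables. A mismatch only says that something differs, not what, so the restoring side has nothing to base skipping or conversion on.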

I want CD-i Emulator to be more flexible, especially during development, so I decided to make full meta information an option in the save file. The saved state of every device is always versioned, which allows the save/restore code to implement explicit conversion where needed, but during development this isn't good enough. With full meta information turned on, the name and type of every state variable precedes the save data for that variable in the save file. This allows more-or-less automatic skipping of unknown state variables and when properly implemented the restore code can also handle variable reordering. At release time, I will fix the version numbers and save full meta information sets for those version numbers, so that the same automatic skipping and handling of reordering can be done even if the meta information isn't in the save file. It probably won't be, because of file size considerations, although that may turn out to be a non-issue: save files need to include the full RAM contents anyway, which is 1 MB of data in the simplest case without any compression (which is of course an option).
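
A save format with per-variable meta information might behave like the following C++ sketch. It is an illustration under assumptions: the record layout, the type tag "u16" and the function names are all hypothetical, and a real save file would be a byte stream rather than a vector of records.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One saved variable together with its meta information.
struct SavedVar {
    std::string name;
    std::string type;                 // e.g. "u16"
    std::vector<std::uint8_t> data;   // little-endian value bytes
};

using SaveFile = std::vector<SavedVar>;

void SaveU16(SaveFile& file, const std::string& name, std::uint16_t value)
{
    SavedVar var;
    var.name = name;
    var.type = "u16";
    var.data.push_back(static_cast<std::uint8_t>(value & 0xFF));
    var.data.push_back(static_cast<std::uint8_t>(value >> 8));
    file.push_back(var);
}

// Restores by name, so the order in the file does not matter, and variables
// the restoring code does not know about are never asked for and thus
// skipped. Returns false when the variable is missing or has an unexpected
// type, leaving the caller's current value in place.
bool RestoreU16(const SaveFile& file, const std::string& name,
                std::uint16_t& value)
{
    for (const SavedVar& var : file) {
        if (var.name == name && var.type == "u16" && var.data.size() == 2) {
            value = static_cast<std::uint16_t>(var.data[0] | (var.data[1] << 8));
            return true;
        }
    }
    return false;
}
```

The price is the extra name and type text per variable, which is the file size consideration mentioned above.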

In addition to all of the above, I made some progress on the ROM-less emulation front. First I spent some time reading up on the internals of OS-9 file managers, because writing a replacement for the NRF file manager (NRF = Nonvolatile RAM File manager) seemed the logical next step. Actually writing it turned out not to be that hard, but there were of course bugs in the basic ROM emulation code. Most of them had to do with handlers not calling into the original ROM, which totally screwed up the tracing code. Some new functionality was also needed to properly read/write OS-9 data structures inside the emulated machine from the ROM emulation code; I wanted to implement this in such a way that compilation to "native" 68000 code remains a future option for ROM emulation modules. And of course the massive tracing described in the previous blog post had to be curtailed because it was impossible to see the relevant information in the morass of tracing output.

The new emulated NRF stores its files in the PC file system and it currently works fine when you start it with no stored files (i.e., the player will boot). In that case it will write out a proper "csd" (Configuration Status Descriptor) file. However, if this file already exists, the player crashes, although I have so far not found any fault in the NRF code. The origin of the problem probably lies elsewhere; I suspect it has to do with the hidden "player_shell_settings.prf" file. This file is read and written by the ROM bootstrap even before OS-9 is running; it does this by directly accessing the NVRAM memory (the file never changes size and is always the first one in NVRAM). Since the bootstrap accesses of this file do not go through the NRF file manager or even the NVRAM driver they are not redirected by the OS-9 emulation. However, later accesses by the player shell *are* redirected and the player shell does not seem able to handle the file not existing in the PC file system in the case where a csd file already exists. Solutions include extending the emulated NRF to always access this particular file from the NVRAM instead of the PC file system or somehow synchronizing the two locations for the file. The latter is probably the easiest route given the fixed location and size of the file, but the former is also useful as it would provide a full reimplementation of the original NRF that could in principle be compiled to native 68000 code to replace the "original" NRF in ROM (this is where gcc comes in as alluded to earlier, since all emulation code is written in C++).

In either case, I do not want the file manager to directly access emulated NVRAM although it could do so easily, as there is already an internal CNvramPort interface that provides just such access independent of the actual emulated NVRAM chip. The NRF file manager should instead call the NVRAM driver, which means that I need to implement cross-module calling first. It's not really hard in principle and the design has been done, but there are a lot of little details to get right (the most obvious implementation uses at least 66 bytes of emulated stack space on each such call which I find excessive and might not even work; smarter implementations require some finicky register mask management or a "magic cookie" stacking approach, the latter having the best performance in the emulation case but being impossible in the native 68000 compilation case). When cross-module calling is working, I can also have the file manager allocate emulated memory and separate out the filename parsing functions by using the OS-9 system calls that provide these functions (the current emulated NRF does not allocate emulated memory which is arguably an emulation error and has the filename parsing coded out explicitly).

When everything works correctly with the emulated NRF, I have to find some way of integrating it in the user experience. You could always start over without any NVRAM files, but I'd like to have some way of migrating files between the two possible locations without having to run CD-i Emulator with weird options. Extending the CD-i File Extractor (cdifile) by incorporating (part of) the emulated NRF seems the obvious choice, which would also provide me with some impetus to finally integrate it with the CD-i File Viewer (wcdiview) program that's supposed to be a GUI version of cdifile but so far is just a very thin skeleton barely able to graphically display a single CD-i IFF image file passed on the command line (it doesn't even have a File Open menu) and will often crash. A proper implementation would look like Windows Explorer with a tree view on the left (CD-i file system, real-time channels and records, IFF chunk structure, etc) and a variable content display on the right (raw data view, decoded sector view, code disassembly view, graphical image view, audio playback, slideshow playback, decoded MPEG view, MPEG playback, etc).

That touches on another area in which I did some work last month: the saving of CD-i IFF image files for each emulated video frame. The motivation for this was to bring full-resolution real-time frame saving into the realm of the possible, as it would write only about 2 x (1024 + 280 x (384 + 32)) = 230 KB of raw CD-i video and DCP data per frame instead of 560 x 768 x 3 = 1260 KB of raw RGB. At least on my PC this has turned out not to be the case, however. The data is written out fine, which is not as easy as it sounds since video line data size can vary with each line because of pixel repeat and run-length encoding, but it's still too slow. That being so, I am not really very motivated to extend the CD-i IFF decoding implementation to actually decode this information. Some kind of compression could be an option, but that takes processor time and makes things even harder and possibly slower. Perhaps using another thread for this would be a solution; on a multi-core machine this should not greatly impact the basic emulation performance nor the debugging complexity, as the compression code would be independent of the emulation itself.

So there is still a lot of work to be done, but it's all quite interesting and will provide for some entertaining evenings and weekends in the coming weeks or possibly months.
