This project is dedicated to the memory of William Morris (aka Frags), who was the main contributor to the bounty but was unable to see the final result.

Tuesday, October 23, 2012

Address Unknown

Finally, all the normal addressing modes are done in this update. As I mentioned in the previous post, the final missing pieces were the 68020 (complex) addressing modes.
This took me a lot longer than I expected, not just because these addressing modes are complicated, but also because memory reading is involved.

Secretly I hoped that this would fix the issue with the OS, but no. I am not sure why I expected that. :P
The OS still behaves the same: reboot loop.

You can stop reading now; technical details follow.

Why does memory access make everything much more complicated?

The answer is: every memory access has to go through the memory handling functions in the emulator. I already implemented the possibility of direct memory access, but we cannot depend on that every time (or rather most of the time).
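To make the distinction concrete, here is a minimal sketch in C of a memory read that takes a direct host pointer when one exists and otherwise falls back to a handler function. All names here (mem_bank, emu_read_long, the demo banks and addresses) are made up for illustration and are not the emulator's real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one emulated memory region. */
typedef struct mem_bank {
    uint32_t (*read_long)(uint32_t addr); /* handler function, used when direct is NULL */
    uint8_t  *direct;                     /* host pointer, or NULL if direct access is impossible */
    uint32_t  base;                       /* first emulated address of the region */
    uint32_t  size;                       /* region size in bytes */
} mem_bank;

/* The 68k is big-endian, so assemble the long word byte by byte. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a long word: fast path through the host pointer, slow path through the handler. */
static uint32_t emu_read_long(const mem_bank *bank, uint32_t addr)
{
    assert(addr >= bank->base && addr - bank->base + 4u <= bank->size);
    if (bank->direct != NULL)
        return load_be32(bank->direct + (addr - bank->base));
    return bank->read_long(addr);
}

/* Demo data: plain RAM is directly accessible, an I/O region is not. */
static uint8_t demo_ram[8] = { 0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0 };
static unsigned handler_calls;

static uint32_t demo_io_read(uint32_t addr)
{
    handler_calls++;                       /* a real handler would emulate hardware here */
    return 0xC0DE0000u | (addr & 0xFFFFu);
}

static const mem_bank ram_bank = { NULL, demo_ram, 0x1000u, sizeof demo_ram };
static const mem_bank io_bank  = { demo_io_read, NULL, 0xDFF000u, 0x200u };
```

Reading from ram_bank never leaves the fast path, while every read from io_bank ends up in a C function. That second case is the one that causes trouble for the translated code.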

Then why is that causing any headaches? Because these functions live outside of the translated code and are written in C, so by the time they return, the contents of all assigned temporary registers are gone.

Previously, for the MOVE instructions, I worked around the situation by remapping and reloading only the required temporary registers as soon as the memory operation was done. This approach simply does not work for the addressing mode translation: the addressing modes are completely isolated subroutines that do not know any details of the already allocated and mapped registers (which should be restored).

For now I implemented a different workaround for the addressing modes, which is far less optimal: instead of dropping all the temporary registers, I save them on the stack, then restore them when the execution returns to the translated code.

Needless to say, saving and reloading all mapped temporary registers is much slower than dealing only with the registers that are absolutely required. I am not satisfied with this solution, but at least it is working. Unless the called function expects a proper stack frame, that is, because there won't be any. (At least this is not the case for OS4, and most likely not for MorphOS either.) This solution is not strictly compatible with the SysV ABI.
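The idea can be modelled in plain C. The sketch below is only an illustration, not the actual JIT code: the temporary registers are an array, the "stack" is a local array, and temp_regs, mem_helper, clobbering_read and call_preserving_temps are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TEMP_REGS 8

/* Model of the host temporary registers used by the translated code. */
typedef struct {
    uint32_t regs[NUM_TEMP_REGS];
    uint8_t  mapped_mask;   /* bit n set: temporary register n holds a live value */
} temp_regs;

typedef uint32_t (*mem_helper)(temp_regs *cpu, uint32_t addr);

/* Stand-in for a C memory handler: it is free to trash every temporary. */
static uint32_t clobbering_read(temp_regs *cpu, uint32_t addr)
{
    int i;
    for (i = 0; i < NUM_TEMP_REGS; i++)
        cpu->regs[i] = 0xFFFFFFFFu;
    return addr + 1u;
}

/* Save every mapped temporary, call the helper, then restore them all. */
static uint32_t call_preserving_temps(temp_regs *cpu, mem_helper helper, uint32_t addr)
{
    uint32_t saved[NUM_TEMP_REGS];   /* lives on the C stack, like the spill area */
    uint32_t result;
    int i;

    assert(helper != NULL);
    for (i = 0; i < NUM_TEMP_REGS; i++)
        if (cpu->mapped_mask & (1u << i))
            saved[i] = cpu->regs[i];

    result = helper(cpu, addr);

    for (i = 0; i < NUM_TEMP_REGS; i++)
        if (cpu->mapped_mask & (1u << i))
            cpu->regs[i] = saved[i];
    return result;
}
```

The cost is visible in the two loops: every mapped register is written out and read back, even if the code after the call needs only one of them.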

It would probably make more sense to store the temporary registers in the static context structure. There is no need to be reentrant, because there are no callbacks to the translated code. Maybe I can revisit this whole piece of code at a later stage.
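A rough sketch of that alternative, again as a plain C model with invented names (jit_temps, jit_save_area, call_with_static_save): the save area is a single static structure, and an assertion documents the assumption that the helper never calls back into the translated code.

```c
#include <assert.h>
#include <stdint.h>

#define JIT_NUM_TEMPS 8

/* Model of the host temporary registers used by the translated code. */
typedef struct {
    uint32_t regs[JIT_NUM_TEMPS];
    uint8_t  mapped_mask;   /* bit n set: temporary register n holds a live value */
} jit_temps;

/* Hypothetical static save area; one copy is enough without reentrancy. */
static struct {
    uint32_t saved[JIT_NUM_TEMPS];
    int      in_use;
} jit_save_area;

typedef uint32_t (*jit_helper)(jit_temps *cpu, uint32_t addr);

/* Stand-in for a C memory handler that destroys the temporaries. */
static uint32_t zeroing_read(jit_temps *cpu, uint32_t addr)
{
    int i;
    for (i = 0; i < JIT_NUM_TEMPS; i++)
        cpu->regs[i] = 0u;
    return addr * 2u;
}

static uint32_t call_with_static_save(jit_temps *cpu, jit_helper helper, uint32_t addr)
{
    uint32_t result;
    int i;

    /* A nested call would overwrite the saved values, so reject it. */
    assert(!jit_save_area.in_use);
    jit_save_area.in_use = 1;

    for (i = 0; i < JIT_NUM_TEMPS; i++)
        if (cpu->mapped_mask & (1u << i))
            jit_save_area.saved[i] = cpu->regs[i];

    result = helper(cpu, addr);

    for (i = 0; i < JIT_NUM_TEMPS; i++)
        if (cpu->mapped_mask & (1u << i))
            cpu->regs[i] = jit_save_area.saved[i];

    jit_save_area.in_use = 0;
    return result;
}
```

No stack space is touched by the save itself, which is the point of the idea. The price is that the scheme breaks as soon as a helper can re-enter the translated code.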

Monday, October 8, 2012

Jinxed it (more Apples)

Speaking of apples: it seems I somehow managed to upset the Gods with the previous post, and they unleashed their wrath on me. Apple released iOS 6.

The funny part was that the developer pre-release was all hunky-dory with our app, but then came the final release and things changed dramatically overnight. Since I have been working as a mobile app developer recently, I had to spend a lot of time on it, even some weekend day(s). Okay, I'll stop grumbling; we finally managed to get everything under control. It was stressful and annoying.

So you can probably guess why the recent update is so thin: when I finally managed to get home I was fully drained and had no strength to look at one more line of code.
I have implemented three missing addressing modes, which are included in this update. I also started to work on the complex addressing modes (also called 68020 addressing modes), which are... well... really complex. I am less than halfway through them, but in the meantime I didn't want to hold back this:

G5 support

While I suffered in deep agony, luckily Tobias Netzel was busy again and managed to fix up E-UAE for the G5 (PowerPC 970) processors. He got rid of the mcrxr instruction, which is unsupported on this processor type and has to be emulated (see the chapter about mcrxr). He also did some optimizations regarding the usage of microcoded instructions, but we will probably need to do more about that at some point.

The situation was very similar to the 68060 and some unsupported FPU instructions which were emulated by the OS. If you remember how much Oxypatcher increased the speed of some floating-point-intensive apps back then, you can understand why E-UAE was so sluggish on the G5 before: almost half of the emulated instructions make use of the mcrxr instruction for emulating some arithmetic flags.
With the recent changes this PPC instruction is not used for anything if the emulator is compiled for the G5, and the fix helps a lot with the interpretive emulation too.
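For reference, this is what the instruction does, written as a small C model with 32-bit bit numbering: it copies the top four bits of XER (SO, OV, CA and one reserved bit) into the selected condition register field and then clears them in XER. The function name model_mcrxr is mine, and this is only a sketch of the semantics that any replacement has to reproduce, not necessarily the instruction sequence Tobias used.

```c
#include <assert.h>
#include <stdint.h>

#define XER_SO 0x80000000u   /* summary overflow */
#define XER_OV 0x40000000u   /* overflow */
#define XER_CA 0x20000000u   /* carry */

/* Model of "mcrxr crfD": CR field crfD = XER[0..3], then XER[0..3] = 0. */
static void model_mcrxr(uint32_t *cr, uint32_t *xer, unsigned crfd)
{
    unsigned shift;
    uint32_t field;

    assert(crfd < 8u);
    shift = 28u - 4u * crfd;            /* CR field 0 is the most significant nibble */
    field = (*xer >> 28) & 0xFu;        /* SO, OV, CA and the reserved bit */

    *cr  = (*cr & ~(0xFu << shift)) | (field << shift);
    *xer &= 0x0FFFFFFFu;                /* the copied bits are cleared in XER */
}
```

For example, with OV and CA set in XER, moving into field 2 of a CR holding 0x12345678 gives 0x12645678 and leaves only the low XER bits behind.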

Some benchmarks from Tobias using the Mandelbrot test (G5, 2.1 GHz):

Interpretive emulation:
  • using mcrxr: 5:02 min
  • without mcrxr: 59 sec
JIT without flag optimization:
  • using mcrxr: 7:46 min (yes, even slower than the interpretive...)
  • without mcrxr: 27.5 sec
JIT with flag optimization:
  • using mcrxr: 1:47 min
  • without mcrxr: 18 sec
Well done, Tobias! The PPC Mac users will be grateful again.
By the way, these changes might have an effect on the Cell PPE and Xenon processors too. Any volunteering developers?

Bounty

Recently, I had a cautious look at the bounty page for the project and I was a bit shocked by the fact that Mr. William Morris donated 1000 EUR. That was half of the previously collected bounty. Very generous of you.

Maybe this post is not the best opportunity to thank you all for your support. I hope I can fulfill the expectations sooner rather than later.