This project is dedicated to the memory of William Morris (aka Frags), who was the main contributor to the bounty but was unable to see the final result.

Monday, November 7, 2011

While you are waiting (FAQ)

I know you can hardly wait to fire up the JIT-crafted UAE; still, I need to ask for more patience. Things are improving, yet it is not as mature as this admired audience deserves, so there is no pre-release just yet.

Some of you are more excited than others and have asked interesting questions that are worth answering. The same questions come up again and again, so let me save some time and pondering: here are the answers for the...

Frequently Asked Questions

Last updated on 11 January 2014.


Q: How can I help?
A: You can always show your support by sending a small donation to this project on AmigaBounty. You can also spread the word to users of other platforms; maybe they would like to hear about it too.
In case you feel the urge, you can fiddle with the sources. See the Q/A below related to getting the current sources.

Q: Will this JIT translation work on AmigaOS3.x PPC/MorphOS/Macintosh/Linux/game consoles? The bounty is about AmigaOS4!
A: Absolutely, I have no intention to limit the implementation to AmigaOS4.
However, I cannot promise that I will release any E-UAE builds other than the one for AmigaOS4. (Simply because I have neither the experience nor much time or intention to dig deep into development for different platforms.)
The sources are released under the GPL license; you can do your own build or merge it into your favorite UAE derivative.

Q: Will you fix the bug XYZ in E-UAE? Will you improve the chipset emulation?
A: No. I never promised anything other than PowerPC JIT translation and I still intend to stick to this. I have fixed some minor things in the original E-UAE implementation that were related to AmigaOS4, but don’t expect any substantial changes.

I am not going to take over the maintenance of the E-UAE source code base. Somebody else has to stand up and accept this task.

Q: Will this JIT thing improve the speed of the emulation of my favorite (OCS) game?
A: Possibly, but I must point out that, as with the x86 implementation, the JIT translation only works on processors that have an instruction (CPU) cache built in and turned on (68020 and up). So, if your game cannot cope with these processors and the turned-on processor cache, then I am sorry, but it won’t help much.


Q: I tried to run Kickstart 1.3 and it crashed with the JIT enabled! Why? I tried to run <some ancient Amiga 500 game> and it crashed with the JIT enabled! Why?
A: Before Kickstart 2.05 the OS was not aware of the processor (CPU) cache; essentially these Kickstart versions wouldn't work with 68020 processors at all.
The processor cache is disabled when the processor resets and starts executing the ROM, but since the OS is not aware of the cache flags, it turns the cache on when it fiddles with the processor control registers.
Why did it work with the interpretive emulator when a 68020 processor was selected? The processor cache is not emulated in the interpretive emulator, so it doesn't matter at all whether it is on or off.


Q: Why is the JIT depending on the processor (CPU) cache?
A: Because it is very hard to detect self-modifying code, code overwriting and similar situations without help from the OS or the running program itself. When the cache is flushed, the previously compiled code is dropped and can be recompiled.
The JIT implementation depends on the processor (CPU) cache emulation to overcome the issue of self-modifying and generated/copied code. The JIT compiled code behaves much like actual 68020 execution, so if an OS version or a game does not work on a 68020 (caches enabled) then it won't work with the JIT either. (I had my sweet time with this issue while I was developing Petunia, back in the day. It is still not resolved completely there, but it is good enough for now.)
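
To illustrate the idea, here is a minimal Python sketch of a JIT that trusts its compiled blocks until the emulated program flushes the instruction cache. It is not taken from the E-UAE sources; the names `TranslationCache`, `lookup` and `flush` are made up for this example.

```python
class TranslationCache:
    """Compiled blocks stay valid until the emulated cache is flushed."""

    def __init__(self, translate):
        self.translate = translate   # callback: 68k address -> compiled block
        self.blocks = {}             # 68k address -> compiled block

    def lookup(self, address):
        # Compile on first use, reuse the result afterwards.
        if address not in self.blocks:
            self.blocks[address] = self.translate(address)
        return self.blocks[address]

    def flush(self):
        # Called when the emulated program flushes the instruction cache:
        # everything compiled so far may be stale, so forget all of it.
        self.blocks.clear()


compiled = []   # records every address that really got compiled
cache = TranslationCache(lambda addr: compiled.append(addr) or f"block@{addr:#x}")

cache.lookup(0x1000)
cache.lookup(0x1000)   # served from the cache, no recompilation
cache.flush()          # e.g. after the program rewrote its own code
cache.lookup(0x1000)   # compiled again
```

A program that modifies its code without flushing the cache keeps running the stale block, which is the same way it would misbehave on a real cached processor.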

Q: Will this JIT thing improve the speed of the emulation of my favorite (OS-friendly) 68k application?
A: Most likely yes. No promises, though.


Q: When will it be ready?
A: When it’s done… ;) If everything goes very well then it will be finished around the bounty deadline: 02/21/2012. (Maybe I should start a countdown, but it would be too embarrassing when I miss the deadline. ;) Well, the deadline was missed, so let's say: when it's done.


Q: Is the implementation derived from Petunia?
A: No, it is completely different code. It uses neither C or assembly source code from the JIT implementation in AmigaOS4 nor any dependency on it.
(Actually, this is not entirely true: I was way too lazy to create an instruction description table file and its parsing tool again, so these two pieces were borrowed from Petunia. My original plan was to use the file and parser already present in E-UAE, but they are heavily tuned to the interpretive emulation, and my JIT implementation is significantly different from the x86 version, so I just skipped them.
But apart from those two files the emulation itself is completely different.)

Q: Will it be faster than Petunia?
A: I have no idea. It is more likely that some apps will run faster with Petunia and others will run faster with the E-UAE JIT. There are too many factors that might affect the results. We will see when it is finished.


Q: Where can I get the sources?
A: Check out the SourceForge project: https://sourceforge.net/projects/euaeppcjit 

Q: Where can I get the executable?
A: There is no final release yet; the project is in the Beta stage.

Q: I want to try the emulation RIGHT NOW! How can I get it?
A: Go ahead and download the beta from the SourceForge project page:
https://sourceforge.net/projects/euaeppcjit/files
Please note: it is still Beta. You can give it a try, but be prepared that it might not work as you expect.

Q: But... but... I don't know how to set up the JIT compiling. What would you suggest?
A: There are some crucial settings for the JIT in the E-UAE configuration file. Without these settings the JIT compiling won't work properly. I suggest adding the following block to your E-UAE configuration file:

cpu_speed=max
cpu_type=68020
cpu_compatible=false
cpu_cycle_exact=false
blitter_cycle_exact=false
comp_optimize=no
cachesize=8192
comp_constjump=yes
comp_trustbyte=indirect
comp_trustword=indirect
comp_trustlong=indirect

ppc.use_tbc=false

These are the necessary settings for now; they might (or most likely will) change in the future.


Q: How can I be sure that the JIT is working while I am running the program?
A: There is an LED for it! Yes, if you turn on the LEDs of the on-screen display (OSD) then you will see an LED that says "JIT". If this LED is green (read: not completely black) then the JIT is on. The brighter the green, the more JIT code is being executed.
If the LED is red then the JIT compiling is not working at the moment.
To turn the LEDs on, put this into your configuration file:

show_leds=true

If you are interested in the details of how the JIT LED works, have a look at this post: http://euaejit.blogspot.com/2013/02/watch-for-led.html
The LED behavior changed slightly in Beta01, so have a look at this page too:
http://euaejit.blogspot.com/2014/01/ppcjitbeta01-happy-new-year-2014.html

(Bonus questions, asked almost every time… :)
Q: Do you like to live in New Zealand?
A: Very much so. :)


Q: Have you met the hobbits yet?
A: No, not yet, but I am looking forward to meeting them. We have already visited Hobbiton, though; that was fun!


Q: Have you met any sheep yet?
A: Yes, a lot.


I will add more questions and answers to this post as soon as I get more.
It seems pretty boring for now. Maybe I should post some naked ladi^H^H^H^H facts to this page.

Comments? :)

Sunday, October 2, 2011

Let It Rip

I spent a lot of time trying to understand and reshape the x86 JIT compiler and fit the PPC code emitter into it. No, no and no. The source code is so deeply convoluted, riddled with residue from debugging, useless functions and scary/cryptic workarounds that I had to give it up.
So, I finally decided: I will rip the x86 part out of the recent E-UAE sources. The PPC code emitter will come as a brand new part that won't depend on its x86 counterpart.

PS. I would like to thank Douglas McLaughlin for the Synergy client for OS4. Great work, it helps a lot.

Sunday, September 11, 2011

The Big X on the Map

Just a very quick update: things are slowly evolving, so DON'T PANIC.

Some more words about what has happened recently: I had some trouble with my Amiga, which turned out to be a PSU problem. Luckily it is already fixed with a "new" case for the poor girl.

I am fighting with the rather chaotic E-UAE sources, so I took a few steps to make the whole project easier to edit: I managed to put together a cross-compiling environment on Windows. *booo-booo*
Compiling is much faster on the muscular x86 processor (*booo-booo*) and there are some advanced tools to do the heavy lifting when it comes to source code editing. I must also admit that since I use source version control at work, I can barely get by without it.

With the help of Cygwin, Eclipse CDT, TortoiseSVN, VisualSVN Server, FileZilla Server and a simple script on the Amiga side I was able to assemble a semi-automatic build and test deployment environment.

Thanks to Zerohero for the detailed cross-compiling setup description.

Thursday, July 14, 2011

Growing flowers

Today is the 10th anniversary of the first public demo of Project Petunia. The very first demonstration happened at the Fyanica #6 party on 14th July 2001. You can find some pictures and videos of the demonstration here. (The page is in Hungarian, so be prepared… ;)


 *Sigh* Ten years of my life. Like it was yesterday.

Friday, July 1, 2011

...beep...

Sorry for being silent for a long time. At the moment I am struggling with my personal life, but I will be back shortly. (I had to find a new place and move in, which wasn't that simple actually, but finally we are over it. All I need now is a desk to put my computers on...)

In the meantime I stole a few hours to dig deep into the E-UAE JIT implementation. As it seems, I can reuse the already implemented bits and pieces. The "only" part that needs to be implemented is the actual PowerPC code translator.

So, stay tuned. (Or you can lay back, if you prefer.)

Monday, May 2, 2011

Is it alive, Igor?

While I had my expectations for setting up the cross-compiling environment, I failed to fulfill my own agenda: it just doesn't want to work. I have tried to set up a Cygwin environment, but the configuration script behaves oddly, and after fooling around for a few hours I lost the battle. Not a big issue yet, since I wasn't able to set up a network connection on the Amiga anyway, and transferring files on a flash drive isn't exactly comfortable.

So, instead of wasting more time on that, I started to create the test environment for the JIT.
To limit the scope of the must-be-implemented instructions, I figured that I could prepare a spoofed Kickstart ROM file that contains exactly what I want to test at the moment. After a while I found my age-old Mandelbrot test, which uses only Motorola 68000 instructions and was written in a sufficiently hardware-banging fashion: the best candidate.
The development environment for the test code: PhxAss for assembling, vlink (from the ominous VBCC package) for linking.

You might ask why vlink was necessary when PhxAss is able to do the linking. The answer lies in the Kickstart ROM file format: it is a plain raw binary dump that starts at $FFF80000 in memory. This format can be produced in a number of ways, but PhxAss is not exactly prepared for it.
I had never tried to create a raw binary from a relocatable Amiga executable before, so I had to tinker with the linker script for a little while first, but I finally managed it.
E-UAE has some restrictions on the size of the ROM file: it must match one of the legitimate sizes from the real hardware, either 256 or 512 KB. (There are other special sizes for the Amiga 1000 or the CD32, but generally these wouldn't help much.)

Finally, the ROM file was ready. I knew that it gets loaded to that high memory address ($FFF80000), and I vaguely remembered that right after reset it gets overlaid onto the CHIP RAM (at address zero of the memory address space, actually) with some hardware trickery.
The reason is quite simple: right after start-up the Motorola processor takes the first two longwords from memory as the initial supervisor stack pointer (SSP) and the initial program counter (PC). To put some meaningful data at these addresses, either the ROM has to start at the beginning of the address space (technically that would be address $00000000), or some hardware magic has to be done. The latter is the case on Amiga systems. Once the ROM is running from its real address, the overlay is switched off by setting one of the custom registers... But which register exactly? Starting up an Amiga right from a clean reset is not exactly common code... Luckily, others have already done the hard work: check out this link from the section "A Quick explanation of what happens and why".
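
For the curious, here is a small Python sketch of what the processor does with the start of the image at reset: the longword at offset 0 becomes the initial SSP and the longword at offset 4 becomes the initial PC, both big-endian. The `reset_vectors` helper and the sample values are made up for this illustration; they are not taken from the test ROM.

```python
import struct

def reset_vectors(image: bytes):
    """Return (initial SSP, initial PC) from the first two big-endian longwords."""
    ssp, pc = struct.unpack_from(">II", image, 0)
    return ssp, pc

# Dummy image: stack at $00000400, code starting right after the two vectors.
rom = struct.pack(">II", 0x00000400, 0x00000008) + bytes(8)
ssp, pc = reset_vectors(rom)
print(f"SSP=${ssp:08X} PC=${pc:08X}")
```

This is why something meaningful has to be visible at address zero at the moment of reset, no matter where the ROM really lives.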

Everything seemed nice; however, I had never got around to figuring out how to turn on the AGA chipset after a plain reset. Until you start the SetPatch command, the system behaves as the OCS chipset does, and now I don't even have the SetPatch command, since I don't have any part of the system without the actual ROM. Not good, not good!

I spent a whole day finding the answer. Time well spent indeed, by the way: I refreshed my memory about Amiga hardware and coding for the legacy machines. The result is included in the sources, in case you are curious. (The "magic" FMODE register.)

You can download this initial test package from the SourceForge page of the project; it comes in handy if you would like to start writing your own Amiga emulator (haha).
Please note again: you will need PhxAss and the VBCC package for the 68k target if you would like to rebuild. To do the compiling and linking just type "build mandel_hw" on the command line and off we go. (If you have the compiling tools somewhere in your path, of course.)
Before you rush to shovel it into E-UAE: while WinUAE doesn't bother checking the Kickstart ROM, so you can basically load anything into it, E-UAE tries to verify the checksum of the binary image. I was too lazy to create the proper checksum, so I modified the E-UAE sources to skip the check instead. (Meaning: you won't be able to load the test into the usual E-UAE or its derivatives for now.)
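
For anyone who would rather fix the image than patch the emulator, here is a rough Python sketch of how such a checksum could be produced. It rests on assumptions I have not verified against the E-UAE sources: that the check adds up all big-endian longwords with the carry wrapped around and expects $FFFFFFFF, and that the checksum longword sits 24 bytes before the end of the image. The function names and the `slot` parameter are made up, and the data is a dummy, not a real ROM.

```python
import struct

def rom_checksum(image: bytes) -> int:
    """Sum all big-endian longwords with end-around carry."""
    total = 0
    for (word,) in struct.iter_unpack(">I", image):
        total += word
        if total > 0xFFFFFFFF:               # carry out of 32 bits wraps around
            total = (total & 0xFFFFFFFF) + 1
    return total

def fix_checksum(image: bytes, slot: int) -> bytes:
    """Return a copy whose longword at `slot` makes the sum come out as $FFFFFFFF."""
    patched = bytearray(image)
    struct.pack_into(">I", patched, slot, 0)             # neutralize the old value
    struct.pack_into(">I", patched, slot, 0xFFFFFFFF - rom_checksum(bytes(patched)))
    return bytes(patched)

image = bytes(range(256)) * 4                # 1 KB of dummy data
fixed = fix_checksum(image, len(image) - 24) # assumed slot: 24 bytes before the end
```

The image length and the slot offset both have to be multiples of four, otherwise the longword sum no longer lines up.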

I have to add that I don't know exactly who created this Mandelbrot set calculation. It wasn't me, I got the source from a magazine some time ago. I guess the author won't be bothered that I published his/her(?) work again. So, let's say these sources are Public Domain for now.

How to proceed... Well, I still haven't managed to prepare the poor man's JIT compiling. It is not working yet; there are some issues with it. This is coming up next.

Sunday, March 27, 2011

Rainy Day with Copper-rendered Rainbows

The sky is cloudy, sometimes sudden showers are soaking the insanely green grass and I am forced into the shabby unit we rented half a year ago. Time to move - popping into my mind quite often these days, but that is a completely different matter.
There are only a few things distracting me from working on E-UAE, and I tried to concentrate on it with great strength. (Apage Satanas: Facebook! I clearly miss the times when I was developing Flamingo. I was sitting in my curtained flat all day, no internet, no phone, no tv: the best environment for creating your own little pet creature and taking over the world, MUHAHAH...)

I have been trying to compile E-UAE since my last post, but I am clearly not really fond of the multi-platform environment. The configure script is a bad dog: it never behaves. (Linux developers must have an insane amount of time and patience.)
Finally, after going around the never-ending circle of deleting everything, copying it back, adjusting, configuring and running make for the hundredth time, I was able to compile the E-UAE sources, and the binary even works. YAY! \o/

I am happy that I can start on tinkering with the JIT implementation, but I have to admit that my Amiga developer skills became a bit rusty in the last three years, since I hardly had time to spend on development.
I fought big time with the paths of the include files, and I am still getting flooded with complaints about missing prototypes for the AOS functions, but at least those are only warnings. (Not that I can live with compile-time warnings in my code, but I really don't have enough strength to find out what minor thing needs fine-tuning in the build to get rid of these warnings safely.)
It took me an hour today to find out that the TimeDelay function is part of the (obsolete) libamiga and this is why the linking failed every time. There was an easy fix for that: just remove the check for AOS4 from the sources and let it fall back to the DOS/Delay() function. Not a proper solution, because Delay() is badly inaccurate, but at least I was able to compile it and fire it up.

Compiling on the uA1 is awesomely slow and eats up all the memory, so I had to turn on swapping. I am seriously considering cross-compiling from my Windows laptop and letting the A1 do the testing only. It has been a very long time since I tried cross-compiling; that was for the OS kernel, back in the days when it was impossible to compile it on AOS4. It was painful: I had to use Cygwin and a number of tricks to get it working. Then I copied the compiled binary to my Amiga 4000 on a floppy disk for testing... *Sigh* I was soo patient, I am amazed now. ;)
Speaking of which: I still need a network connection for the uA1; that would be essential for cross-compiling.

Anyway, let's go back to experimenting. The first test will be implementing a very, very simple form of the poor man's JIT, which will do nothing more than "compile" the executed code into a series of calls to the interpretive instruction implementation functions. The outcome will be an environment where I still have all the already implemented instructions, but I can replace them one by one with the real JIT instructions. This is something I needed when I developed Petunia, but there was no interpretive emulator to back up the missing instructions.

Sunday, March 6, 2011

Blueprints

It is time to celebrate: finally my Amiga configuration is complete. It took more than half a year after moving to the opposite side of the globe (from Hungary to New Zealand). I dragged my poor old A1-XE with me in a suitcase; unfortunately it didn't survive the flight. :(
Stephen “Cobra” Fellner lent me a uA1 in a compact case; I couldn't be grateful enough for it. (Not to mention the tons of help they gave us to start our new life here from scratch...)
Piece by piece the machine was completed; the final item was an old Philips monitor from TradeMe.

It is also time to reorganize priorities in my (rather limited) free time. Cut back on beaches, bushwalks and especially on Facebook. :P
But enough whining; this is a technical blog after all, and autumn is coming rapidly with lotsa rain...

Let's scheme!

A JIT compiler is similar to a programming language compiler; the main difference is the source: while a programming language source is human-readable text (or that would be the goal, at least), the source in this case is machine code for a different processor.
The upside is that machine code has very strict rules, so it is easier to interpret. The downside is that it must be known precisely, in every minor detail, and there are undocumented features and side effects that have to be implemented correctly. (How to find these out is a good question; usually with countless hours of debugging crashing applications.)

Why JIT is called just-in-time compiling: the program code is translated to natively executable code while it is being emulated. It means that every time the execution reaches a point in the program that wasn’t reached before, the compiler takes a chunk of the emulated code and translates it to native code, then the execution flow goes on. (Some JIT compilers are able to translate the whole executable right after loading, but that is only possible in special cases.)
The compilation process can be either fairly simple or overly complicated, depending on the actual method. The final result must be directly executable on the host system; otherwise it would be inefficient and probably useless. (It would require an interpreter to execute, which rarely makes any sense.)
There are different approaches to the compilation process; which one is the best choice depends on multiple factors.

Poor man’s JIT

For example, the most simplistic method produces a series of jump instructions for fetching the source data, executing the operation and storing the result. Everything else is done by a library of functions in the emulation environment. This solution gets rid of decoding every instruction at every (re-)execution, but it is next to impossible to do any optimization that would involve the source and destination handling together with the operation, not to mention a wider span of instructions.
Why would anybody try such an inefficient solution? If the interpreter is already implemented, then it takes significantly less work to turn it into a JIT compiler this way. Also, translated code needs (lots of) memory, while such an implementation is light on memory usage.
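
A toy version of this idea in Python, just to make it concrete. The two "instructions", the handler names and the tuple format are invented for this sketch; a real implementation would emit native call instructions instead of building a Python list.

```python
def op_moveq(regs, reg, value):
    regs[reg] = value

def op_add(regs, src, dst):
    regs[dst] = (regs[dst] + regs[src]) & 0xFFFFFFFF

# The interpreter's existing handlers, looked up by instruction name.
HANDLERS = {"moveq": op_moveq, "add": op_add}

def compile_block(program):
    """Decode once: turn each instruction into a ready-to-call (handler, args) pair."""
    return [(HANDLERS[name], args) for name, *args in program]

def run_block(block, regs):
    # No decoding happens here any more, only the calls remain.
    for handler, args in block:
        handler(regs, *args)

program = [("moveq", 0, 5), ("moveq", 1, 7), ("add", 0, 1)]
block = compile_block(program)
regs = [0] * 8
run_block(block, regs)
print(regs[1])
```

The block can be executed again and again without touching the decoder, but every instruction is still a separate call, so nothing can be optimized across instruction boundaries.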

Stamping Lil’ Roses and Rainbows

A slightly better approach is creating templates for certain addressing modes and operations that can be copied into the translated code (almost) directly. Not every instruction incarnation needs its own template; it is possible to fill in some gaps with the specifics of the actual instruction, such as the registers involved.
This is the most common approach; it is flexible and efficient if implemented properly. With some tweaking it is even possible to adjust the templates to handle special cases, such as optimizing arithmetic flag emulation away when consecutive instructions would overwrite the flags anyway. This is how Petunia works and, as I found out recently, WinUAE implements something similar.
However, templates are a bit rigid sometimes, and it is still not too easy (or even impossible) to join more than two instructions together in a specific optimization that would make use of a certain aspect of the target processor. Creating truly flexible templates could result in a big mess in the translation functions.
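
As a tiny illustration of "filling in the gaps", here is a Python sketch that treats the PowerPC `add rD,rA,rB` instruction as a template: the opcode bits are fixed, the three register fields are the gaps. The `emit_add` helper is made up for this example and says nothing about how Petunia or WinUAE store their templates.

```python
import struct

# Fixed part of PowerPC "add rD,rA,rB": primary opcode 31, extended opcode 266.
ADD_TEMPLATE = (31 << 26) | (266 << 1)

def emit_add(rd: int, ra: int, rb: int) -> bytes:
    """Fill the register gaps of the template and return the 4 instruction bytes."""
    for reg in (rd, ra, rb):
        if not 0 <= reg <= 31:
            raise ValueError("PowerPC has 32 general purpose registers")
    word = ADD_TEMPLATE | (rd << 21) | (ra << 16) | (rb << 11)
    return struct.pack(">I", word)

code = emit_add(3, 4, 5)      # add r3,r4,r5
print(code.hex())
```

Which host register stands in for which emulated register is decided by the translator; the template itself doesn't care.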

Big Planz I haz it

After finishing (the never-ever finished) Petunia I spent a couple of years playing around in my mind with scenarios where the emulated code builds up this way or that way. Finally, I came to the conclusion that there is possibly a better way to implement JIT compiling, and that it is similar to the microcode that is often used in CISC processors.
Microcode reduces the complexity of a machine code instruction by implementing it as a series of very simple “wired” instructions, and an “interpreter” executes the simple instructions one by one, one at each clock cycle.
Combining this technique with the VLIW approach, where the simple instructions can be executed out of order or even eliminated completely, might make the generated code a lot more optimal.
How do I intend to implement it… Similarly to the templates, each emulated instruction will be prepared as a series of virtual instructions. In the first round the compiler collects the virtual instructions for each emulated instruction into a buffer for a defined chunk of the emulated code.
In the second round an optimizer runs through the buffer, trying to apply modifications to the virtual instructions according to predefined rules. At this level the virtual instructions and the original (emulated) instructions have no connection anymore; each virtual instruction can be handled, reorganized or eliminated on its own.
The third round is the code emitter: it turns the virtual instructions into natively executable code using actual code templates.
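
The three rounds can be sketched in a few lines of Python. Everything here is a placeholder: the virtual instruction names, the expansion table and the single optimizer rule (dropping a flag update that gets overwritten before anything reads it) are invented for the example, and the "emitter" only produces strings.

```python
# Each emulated instruction expands to a series of simple virtual instructions.
EXPANSIONS = {
    "add": ["load_src", "load_dst", "alu_add", "set_flags", "store_dst"],
    "bcc": ["use_flags", "branch"],
}

def collect(emulated):
    """Round 1: flatten a chunk of emulated code into one virtual instruction buffer."""
    return [virt for insn in emulated for virt in EXPANSIONS[insn]]

def optimize(buffer):
    """Round 2: drop a set_flags whose result is overwritten before anyone reads it."""
    kept = []
    flags_needed = True                  # assume the flags are live at the end of the chunk
    for virt in reversed(buffer):
        if virt == "set_flags":
            if not flags_needed:
                continue                 # dead flag update, eliminate it
            flags_needed = False
        elif virt == "use_flags":
            flags_needed = True
        kept.append(virt)
    kept.reverse()
    return kept

def emit(buffer):
    """Round 3: stand-in for the code emitter, one pseudo native line per virtual op."""
    return [f"native_{virt}" for virt in buffer]

virtual = collect(["add", "add", "bcc"])
optimized = optimize(virtual)
native = emit(optimized)
```

Note that the optimizer never looks at the emulated instructions; it only sees the flat buffer, which is what makes rules that span several instructions possible.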

I am sure I wasn’t the first to think of this solution, but I have never read about a similar approach in the context of JIT compiling. Programming language compilers do a similar code translation to keep the compiler portable between different processor architectures. (The code emitter has to be adapted; the rest of the compiler needs no modification.)

Predicted Roadblocks

Once I heard: if you are not able to summarize the problem then you won't be able to find the solution either. So let's find out the possible problems with JIT compiling for a complete machine emulation.

1. Memory access emulation
Unlike the 68k emulation in OS4, UAE is a complete machine emulation. While an application is running it reads and writes memory (no news to anybody, I guess). If the accessed memory is plain data then there is not much to do with it: the application can do whatever it was planned for.
Unfortunately, there are two types of memory access that need special care: accessing hardware registers and writing into the executed code area. For the latter see self-modifying code below; the former is a lot easier to handle.
Solution: basically what is needed is to incorporate into the translated code the UAE functions that are called for each memory access.
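
A simplified Python sketch of what "a function call for each memory access" means. The class names and the byte-only interface are invented for this example and do not mirror the actual UAE functions; the only real-world detail is that the Amiga custom chip registers live at $DFF000.

```python
class RamBank:
    """Plain memory: reads and writes have no side effects."""
    def __init__(self, size):
        self.data = bytearray(size)
    def read_byte(self, offset):
        return self.data[offset]
    def write_byte(self, offset, value):
        self.data[offset] = value & 0xFF

class RegisterBank:
    """Stand-in for hardware registers: every access is recorded in a log."""
    def __init__(self):
        self.log = []
    def read_byte(self, offset):
        self.log.append(("read", offset))
        return 0
    def write_byte(self, offset, value):
        self.log.append(("write", offset, value & 0xFF))

class AddressSpace:
    def __init__(self):
        self.banks = []                      # list of (start, size, bank)
    def map_bank(self, start, size, bank):
        self.banks.append((start, size, bank))
    def find(self, address):
        for start, size, bank in self.banks:
            if start <= address < start + size:
                return bank, address - start
        raise ValueError(f"unmapped address ${address:08X}")
    def read_byte(self, address):
        bank, offset = self.find(address)
        return bank.read_byte(offset)
    def write_byte(self, address, value):
        bank, offset = self.find(address)
        bank.write_byte(offset, value)

mem = AddressSpace()
chipset = RegisterBank()
mem.map_bank(0x000000, 0x80000, RamBank(0x80000))   # 512 KB of RAM at address zero
mem.map_bank(0xDFF000, 0x200, chipset)              # custom chip registers
mem.write_byte(0x000100, 0x42)                      # plain data
mem.write_byte(0xDFF180, 0x0F)                      # register write, gets logged
```

The translated code cannot simply poke the host memory; it has to go through something like `write_byte`, because only the bank knows whether the access has a side effect.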

2. Self-modifying code
The 68k processors make no difference between data and code, although AmigaOS itself is able to recognize which part of a loaded executable contains actual code. The difference was never enforced on developers, with all of its upsides and downsides, and they found out the hard way what a can of worms was hidden there when processors with caches (like the 68020) appeared in Amigas. Several self-modifying games and demos fail when the code memory is cached, because the code cache and the data cache are separate.
In AmigaOS4 I was able to get information from DOS.library regarding the loaded and removed code segments. By using this information I could tell which memory areas were cleared, and I simply dropped the translated code for those.
With UAE the situation is completely different: any byte in memory is a potential target for modification. It means the translated code must be dropped and retranslated; otherwise it would preserve the previously translated state.
Solution: I probably must extend the memory access checking and drop the translated code when it gets modified. I have to revisit this topic later on; checking for translated code at every memory access might be too slow.
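
One possible way to keep that check cheap, sketched in Python: remember which memory pages contain translated code, so that a write to any other page costs a single lookup. This is only a thought experiment, not the planned implementation; `CodeTracker` and the 256-byte page size are assumptions made for the example.

```python
PAGE_SHIFT = 8   # hypothetical granularity: 256-byte pages

class CodeTracker:
    def __init__(self):
        self.blocks = {}   # block start address -> end address (exclusive)
        self.pages = {}    # page number -> set of block start addresses

    def _pages_of(self, start, end):
        return range(start >> PAGE_SHIFT, ((end - 1) >> PAGE_SHIFT) + 1)

    def add_block(self, start, end):
        """Register the emulated address range covered by a translated block."""
        self.blocks[start] = end
        for page in self._pages_of(start, end):
            self.pages.setdefault(page, set()).add(start)

    def on_write(self, address):
        """Drop every translated block that contains the written address."""
        starts = self.pages.get(address >> PAGE_SHIFT)
        if not starts:
            return []                        # fast path: no translated code nearby
        dropped = [s for s in starts if s <= address < self.blocks[s]]
        for start in dropped:
            end = self.blocks.pop(start)
            for page in self._pages_of(start, end):
                self.pages[page].discard(start)
        return dropped

tracker = CodeTracker()
tracker.add_block(0x1000, 0x1040)
tracker.add_block(0x2000, 0x2010)
untouched = tracker.on_write(0x5000)     # plain data write, nothing to drop
dropped = tracker.on_write(0x1020)       # hits the first block
```

The dropped block simply gets translated again the next time the execution reaches it.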

3. Translated code lookup
When the execution jumps or branches to a new memory address, the emulator has to know whether there is already translated code for it or the new address was never hit before and a new translation is needed. The translated parts sometimes follow each other, but often programs wander around in memory with no logic to follow.
Solution: Petunia had the same problem. I created a two-level look-up table for the translated code segments covering each address in the address space; it is a bit memory-hungry, but very quick at finding the address of the translated code.
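
A minimal Python sketch of such a two-level table. The 16/16 bit split and the names are assumptions for the example; they are not the layout Petunia uses.

```python
LOW_BITS = 16                      # hypothetical split: 64K entries per second-level table
LOW_MASK = (1 << LOW_BITS) - 1

class TwoLevelTable:
    def __init__(self):
        # First level covers the whole 32-bit address space and is allocated up front.
        self.top = [None] * (1 << (32 - LOW_BITS))

    def insert(self, address, translated):
        hi, lo = address >> LOW_BITS, address & LOW_MASK
        if self.top[hi] is None:
            self.top[hi] = [None] * (1 << LOW_BITS)   # second level, allocated on first use
        self.top[hi][lo] = translated

    def lookup(self, address):
        """Two index operations, no searching: None means 'not translated yet'."""
        second = self.top[address >> LOW_BITS]
        return None if second is None else second[address & LOW_MASK]

table = TwoLevelTable()
table.insert(0x00F80010, "translated block A")
hit = table.lookup(0x00F80010)
miss = table.lookup(0x00F80012)
```

The memory hunger comes from the second-level tables: each one is allocated in full as soon as a single address in its range gets translated.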

As Michael Jackson would say: This is it. These are the initial plans; more details must follow, but first things first: let’s try to compile E-UAE… :)

Monday, February 21, 2011

Hello and welcome to the E-UAE JIT developer blog!

The purpose of this blog is to document the development of an extension to the E-UAE Amiga emulator that adds Motorola 680x0 processor emulation based on Just-In-Time compiling.
A short history lesson
WinUAE, the Amiga emulator for Windows, has had JIT compiling for many years now. Unfortunately, it is closely tied to the Intel x86 architecture, because the most efficient way of implementing JIT compiling is kinda similar to an actual programming language compiler: the end result is machine code, which is executed directly. It is possible to implement a processor-independent JIT compiler, but squeezing more speed out of the executed code in a general compiling model is much more complex.
Recent Amiga(-like) computers use PowerPC processors, and porting the WinUAE solution to the PPC processor would be nearly as hard as implementing a brand new solution. Not to mention that there are special requirements from the environment of the emulation that cannot be simply resolved.
Since not many coders in the whole world have experience with JIT compiling, the line of applicants for creating JIT compiling for the PowerPC processor was pretty short for years.
On the other side, users needed JIT compiling in UAE for running all sorts of legacy applications that cannot be run on the recent incarnations of the AmigaOS systems (because they are buggy, not system-friendly, hit the hardware directly, and for a number of other reasons). The demand was so high that even a bounty was set up for this specific project on the AmigaBounty site, yet nobody wanted to take the job.
The insane with a rattle
You might ask why anybody would invest countless hours into this project. And I must say it is a valid question.
I already have experience in this field: I completed the similar component in AmigaOS4 (code name: Petunia). If you are interested in the fine details of JIT compiling, then I suggest checking out my Project Petunia web page.
There are a few other reasons: some folks have been bugging me about it for years, and it is a bit of a challenge. Also, there is this typical feeling in every coder’s head when a project is finished: implementing it a second time, the outcome would be much better because of the previous experience.
Well, here is the time to see how well it goes…
Principles to follow
There are some ground rules I will try to keep to make as many future users happy as possible. These are:
  • The source files will be freely available to anybody under the GPL license, or directly from me with a different license on request.
  • While the development will be done on AmigaOS4, I won’t use any AmigaOS4-specific feature that would make it harder to port the solution to another platform. (It means my previous project won’t be used directly in any form.)
  • I will try to keep the solution as flexible as possible, to let others make use of it without E-UAE.
  • I won’t sacrifice compatibility for speed.
  • I will implement some possible adjustments to let the user fine-tune the JIT and therefore improve compatibility with the old applications.
That is all for today, more technical details will follow in the next post.