Saturday, April 8, 2017

JSRF Ingame

First of all, sorry it's been a while since I've updated the blog.  Life hasn't exactly been fair as of late, so to speak.  But since I'm here, I'm going to share this bit of news that surfaced a few weeks ago, so some of you already know about this.

Go ahead and watch the video.  While it's buggy it's still impressive, considering that this is something I've never successfully done in my own lifetime.  I'm actually surprised the music works as well as it does.

Needless to say, SoullessSentinel is definitely better than I am.  He's done several things that I never thought to do and did it better than I could have.  So the torch rightfully belongs to him right now.

Cheers, to SoullessSentinel and Cxbx, the emulator that never says die!


Sunday, January 22, 2017

The macOS Experiment (Part 2)

Around the beginning of the new year, I mentioned what I titled "The macOS Experiment" and some of what I had planned for it.  Over the weekend, I wanted to put a bit of effort into it and see what would happen.  Did I get it working?  Well, almost.  And I'll get to that in a moment.  First, I'll share the git repo for those that want to see what I'm talking about:

There isn't much there but the bare minimum, and it was crappily written (no need to point out the obvious) because I just wanted to quickly see if I could get it up and running.

Basically what it does is this: It serves as a way to launch 32-bit code from the desired base address (for the sake of Xbox, 0x10000) by calling mmap to reserve that desired memory location and range as well as give us read, write and execute permissions on it, then injects the code to be run at the given base address (in this case, a .xbe file).  From there, we point to the .xbe file's main function (not the entry point, which is what Xeon did) with a function pointer and call it.  Actually before we call that function, we need to insert our wrapper functions to HLE the necessary functions in the .xbe.  For now, I just hard coded those to see if I could get a quick proof of concept going.

So to start off, I took the simplest .xbe to deal with, the first D3D tutorial in the XDK, CreateDevice.  All it does is create the D3D device, clear the screen to a random colour, then present it to us.  This is actually simpler than it sounds, considering that you don't have to literally create a D3D device.  All the D3D functions are C-based functions using the __stdcall calling convention.  The only parameter to even worry about there is D3DPRESENT_PARAMETERS anyway to initialize an OpenGL context that we want to match those parameters.  Clearing and swapping the screen are also very trivial things, but as you can see, I didn't literally add any of that code.  Why is that?  Because I needed to make sure that my hooked functions get called correctly, and this is where things got a bit tricky.

Let's take a look at this code where the "magic" begins (see macbox_xbe.cpp on the repo):

void macbox_install_wrapper( void* function_addr, void* wrapper_addr )
    uint8_t* func = (uint8_t*) function_addr;
    *(uint8_t*) &func[0] = 0xE9;    // JMP rel32 (quick and easy)
    *(uint32_t*)&func[1] = (uint32_t) wrapper_addr - (uint32_t) function_addr;

Just like Cxbx does, my code goes to the top of the function that we need to hook, and places a jmp rel32 instruction right there to immediately redirect it to our wrapper function.  Quite easy to do when you know how to encode x86 instructions.

So let's move on.  Next, let's hook some functions:

/* TODO: Move this elsewhere and don't hard code it either */
macbox_install_wrapper( reinterpret_cast(0x195B0), reinterpret_cast(Direct3D_CreateDevice) );
macbox_install_wrapper( reinterpret_cast(0x1A270), reinterpret_cast(D3DDevice_Clear) );
macbox_install_wrapper( reinterpret_cast(0x1ABC0), reinterpret_cast(D3DDevice_Swap) );

How did I get these offsets?  I loaded the .xbe up in IDA Pro using the flirt signatures to spot the D3D functions right away.  It really saved my arse and my sanity when working on Cxbx. 

After this, I was ready to go, or at least I thought I was.  When Direct3D_CreateDevice was first called, the parameters contained all gibberish.  On top of that, it crashed soon after the function returned.  I made sure that all the parameters were correct and the same as they were for Windows (at least, from a byte perspective).  But then I remembered that these functions have to use __stdcall in order to work on Windows for Cxbx.  So I had to find the equivalent for Mac.  This is what WINE used to define __stdcall (or also known as CALLBACK) for Mac:

#define CALLBACK __attribute__((__stdcall__)) __attribute__((__force_align_arg_pointer__))

Kinda long, isn't it?  So I used this, and it caused the immediate crash to go away.  The parameters are still gibberish for that particular function (Direct3D_CreateDevice) but with D3DDevice_Clear, the parameters were perfect!  Then after that function returned, something went wonky and the EIP ended up in a weird state resulting in another crash.  This brings me to the next issue.

From what I have read, the stack alignment in macOS is always 16-byte aligned, whereas with Xbox code it was 4-byte aligned.  This seems like it could be the issue.  I took a moment to get some assembly code screen shotted.

From IDA, this is the code that calls Direct3D_CreateDevice:

And this is the code that XCode generated...

Once more, IDA's output from whence D3DDevice_Clear was called:

And this is what XCode generated...

So the green line is actually where the breakpoint is placed at the beginning of the function.  So is the stack getting screwed up somewhere or what?  I'm quite sure there's a way around it considering that WINE was able to pull it off somehow.  Maybe I should ask them on their forum?  Well, I want to figure something out.

Since I didn't have all day to dedicate to this, I thought I'd at least share with you all what's going on.  The sooner I can fix this, the sooner I can move forward.  Frankly, I wouldn't be surprised if it was stupidly simple what I was missing.  If worse comes to worse, I guess I could do something super hacky to fix it, but I want to avoid that if I can.

So that's it for now.  Thanks for reading, and let me know what you think.


Monday, January 2, 2017

The macOS experiment (Part 1)

Lately, I haven't been doing much emulation wise.  Well, to be frank, I haven't done any serious emulation work in a long time (and just incase you're reading this JayFox, I'm not talking about xqemu, nor have I ~ever~ said I was doing any real work on it aside from a few work arounds here and there; hell I've even stated that I'm not an xqemu dev multiple times, so pleeeeeeeeeease spare me of your usual "3 lines of code" lecture because I'm really not in the mood, thanks and no disrespect), but since I'm in between jobs/paid projects yet again, I decided to pick up one of the previous experiments I started before out of curiosity.

What is this experiment?  Well, the goal of this little coding experiment was to claim the typical base address of 0x10000 that every .xbe uses.  For those who don't quite see what I'm getting at, let's take a look at how both Cxbx and Xeon, the first two Xbox emulators, worked from an internal perspective.  They both use HLE, yeah that's right sherlock, in order to run code natively on the host CPU, they both use a .exe to reserve the memory range beginning at 0x10000 and at least 64mb beyond that.  For Cxbx, the generated .exe from the .xbe has the base address set when the header is written.  For Xeon, the main .exe simply reserves that memory address range by forcing it to load at 0x10000 w/ a big global static array and it's overwritten with the .xbe contents.  The actual emulator code is run from a .dll that's loaded into memory well out of that range.  That's the only way they knew how to do it at the time (for windows at least).  Naturally, I started to wonder how would one do that on a Mac?

Since similar experiments for this have already been done on Linux, so I figured Mac could do it also with similar methods.  So I started a thread on ngemu a long while back, and asked for suggestions, hints and ideas.  At the time, I was quite new to Mac development, but at the time of writing this, I have grown to be quite experienced with it (although still not quite as knowledgeable of the lower level and kernel level aspects).  Instinctively, my first try was to use mmap() directly, or vm_allocate().  That didn't work.

Among one of the respondents to my thread back then was Ben Vanik, the author of Xenia.  He was running linux, and recommended this code:


namespace {
    void * const MEMORY_ADDRESS = reinterpret_cast(0x10000);
    size_t const MEMORY_SIZE = 1 << 20;

int main() {
    if (p != reinterpret_cast(-1)) {
        std::cout << "Memory allocated" << std::endl;
        munmap(p, MEMORY_SIZE);
    } else {
        std::cout << "Memory could not be allocated" << std::endl;

It worked for him, but still didn't work for me on macOS (OSX), so I was still back where I started.  Keep in mind that all these flags were necessary.  I needed to be able to read, write and execute from this memory range.  It also needed to be at that exact address (hence the MAP_FIXED flag), and so forth.  Many were against that last bit, which I'll explain in a moment.

After that, I eventually forgot about it and moved on to other projects.  Mostly working on this indie game that I started less than 2 months of starting that thread.  I eventually did get it working, and the reason it wasn't working was actually quite a testament to my ignorance and stupidity... I forgot to switch the compiler setting from x86_64 to i386!  Duh, no wonder it wasn't working.  I wanted to slap myself afterwards.  IIRC, there's a flag called MAP_32BIT which allows 64-bit programs to access the 32-bit memory address space, but it appears to only be available in Linux, not macOS.  Oh well, no big loss for now.

Now, what I mean by "work" is that mmap() stop failing.  I would finally get that base address, but it would simply crash on a call to std::cout.  Even commenting this out, unmapping the pointer and letting it return from main also resulted in a crash.  I'm sure you could theorize why, but finding a solution was what I was interested in.

Since there weren't many Mac devs on your every day emulation/gaming forum (most of them are extremely anti-Mac; many for ignorant and uninformed reasons, and few for legitimate and well thought out reasons) so I ended up going to a forum that was dedicated to Mac related programming topics;  While most of them did give me some sound advice, they didn't quite understand why I needed this exact functionality.  Many would suggest removing MAP_FIXED but that would give me a memory address out of the range I'm looking for and would essentially defeat the purpose.  I had to start two threads altogether, and only one person actually understood what I was trying to do, as he was also in the need to write a basic VM while lacking the proper resources to do it.  He tried this code in XCode, and it worked for him, but only after adding the following compiler flag in the "other linker flags" section:

-pagezero_size 0x10000000 -segprot __PAGEZERO rwx rwx

Frankly, I do not fully understand what this did, even though it generally looks pretty obvious.  A google search didn't yield much results either, but it worked.

Now, I understand that what I'm doing is considered "unsafe" and risky.  Even on Linux, it is considered the same to do so.  It's all part of some experiment I put a bit of free time into.  Was there any particular goal to it?  Well, for starters, I wanted to see at one point whether it were at all feasible or possible to bring Cxbx to Mac since using WINE wasn't enough to do it due to the memory map requirements (my assumption; it always crashed for me).  Another idea was to provide proof of concept that HLE and direct code execution was indeed possible on an Intel Mac without the need for a VM library or driver.  There's a VM framework for macOS now, but it requires a 2010 Mac with 10.10+ installed.  Since neither of my comps meet the former requirement, that wasn't an option for me.  Third, I've had the itch to implement just enough APIs in an attempt to run Azurik: Rize of Perathia on my Mac.  I am greatly intrigued by the possibility of playing this game in some hacked form of 1080p or 4k using certain Apple exclusive OpenGL extensions, but hesitant to re-live the pain of trying to get this game working on Cxbx.

So far, it appears to work without issues, but I'm sure there could be some hidden caveats somewhere.  What I'd like to do later on is experiment with a simple .xbe that simply creates a D3D device, and clears the screen and HLE that as proof of concept.  I don't really plan on going too much farther than that, but I am curious to find out if there's anything else that can evolve out of this, even if it doesn't mean emulating Xbox.  Who knows?  Maybe an HLE emulator of something else?  Like one of Sega's recent arcade hardware machines like Europa?  Hey, I just like to learn, and emulating Xbox even on an HLE level has helped me tremendously personally and in my professional career (which is part of the reason I have this job interview coming up; I'll blog about what I mean about this statement later).

I know it's late but happy new year, and happy coding.  It's late so I'm off to bed.


Tuesday, October 11, 2016

Cxbx-Reloaded Proof of Concept release

While I'm at it, I wanted to let you know that SoullessSentinel has uploaded a proof of concept release of Cxbx-Reloaded on his website and on github.  As it says, don't expect much at this point, as the next release will be vastly different as he plans to remove much of the HLE stuff.


It's also highly encouraged that you contribute to the compatibility list.  The website gathers information on your game's .xbe, lets you choose the status of the game, and then saves the result, it's that easy.  So have at it.

As proof of concept from my end, I shared a few screens of Innocent Tears *almost* going ingame.  It would be awesome if it did considering that not only is this an Xbox exclusive title, but was almost never released to begin with AFAIK.

So have at it.  I'd try more, but I don't really have access to many games while I'm at work.  Looking forward to what else he's got planned.


Sunday, September 25, 2016

Cxbx-Reloaded continues

I wanted to make at least a brief mention of this at the very least.  SoullessSentinel is still working on Cxbx-Reloaded.  It's kind of surprising since I haven't talked to him in quite a while nor have I even kept up on his progress.  So far, it appears to be coming along somewhat.

After looking through the git repository, it looks like someone fixed a handful of things that I was either too lazy to implement properly or just completely forgot about.  One thing he fixed was my implementation of section loading; a proper solution was lacking and I lazily added what was necessary for a few games just to test it.  I looked through that code thinking "man, I sure hope somebody fixed that", and bam, it was there, lol.  They also did a way with alot of the previous FS stuff.  Interesting nonetheless.

If you want to take a look at the repro, click here:

And if you want to see screens, click here:

Some of these titles I've never seen run before, so I guess that's some good progress in itself.  Best wishes for this branch.


Tuesday, May 17, 2016

Azurik: A hack worth a mention

This was the positive result of a hack done by JayFox some time ago, and I don't believe I've ever shared it here.  And the reason I'm sharing this is because it's screen footage of XQEMU running my favourite Xbox game, Azurik: Rize of Perathia!

There's a reason why I've always considered this game to be one of, if not the, hardest games to emulate on Xbox.  I couldn't fix it on Cxbx (and I tried for years), Microsoft couldn't do it, and even JayFox himself had quite a bit of trouble with it.  In order for this game to work, JayFox had to add a series of hacks.  I don't remember what he did, but getting it this far was quite a feat.  From what I remember, he tells me that the bug you see is an issue with floating point precision for vertex skinned meshes.  I ran into a similar issue on Cxbx, but it was much more severe as everything was greatly distorted.

The hacks used to get this far are not in his main branch, so I cannot replicate it.  Just wanting to share a bit of news that never seemed to get the mention it deserved.


Sunday, May 8, 2016

Oh no, is xqemu dead??

Heh, of course not!

Now, your concern about the xqemu project is greatly appreciated, however it's important to remember that JayFox and espes are two busy people.  Both of them last I checked are busy with University classes and what not, and espes has a job also.  So lately they've had their hands full.  Every once and a while progress on an emu will slow down due to IRL taking priority, that doesn't mean that it's dead because you haven't heard anything in a while or haven't seen any updates on github in a few months.  Trust me, it's not going anywhere; *Madara Uchiha voice* settle down!

So, for all of you SEGA fans out there, please enjoy this video of House of the Dead 3.  Runs quite nicely, no?

The speed of the video was increased by 8x roughly.  So considering how problematic this game was to emulate on Cxbx and xqemu at first, I'd say that xqemu is doing just fine.  The compatibility is pretty good in many cases, just a few things to sort out here and there.