Making a code receive data from USB Gecko

Started by biolizard89, September 24, 2010, 02:53:27 AM


biolizard89

So I'm trying to write a code which interacts with the USB Gecko to send/receive custom data to a PC app.  So far, I can make the code send data over the Gecko with no issues.  However, trying to have the code receive data from the Gecko into a buffer in the Wii's RAM immediately causes the game to crash.  It is further complicated by the fact that since my code takes over the USB Gecko, trying to debug the ASM line-by-line via Gecko.NET is not possible.

The code never crashes without the branch to the code handler's exireceivebuffer routine (although obviously it doesn't receive any data).  When I do branch to that routine, something crashes, although I can't tell for sure where in the routine it crashes (since I can't debug while my code is trying to use the Gecko).

So, here's the entire ASM:

[spoiler]# save stack frame
stwu r1,-0x80(r1)
stmw r2,8(r1)

# save LR to r8 and CTR to r6
mflr r8
mfctr r6

la r9, 0(r20) # set r9 to PO... this will need a tweak whenever the code handler is modified... r20 in GameCube code handler... r16 in Wii code handler
la r11, 4(r9) # set r11 to PO+4

receivesegment: # start of loop

lwz r12, 0(r9)
lwz r3, 0(r11)

cmpwi cr0, r12, 0 # compare r12 to 0... was cmpdi r9, 0, but apparently that doesn't work on 32-bit PowerPC

beq endsegment # break the loop if we hit the null

# call the code handler routine... this will need a tweak whenever the code handler is modified... 0x80001E00 for GameCube
lis   r7,0x8000
ori   r7,r7,0x1E00
mtctr   r7

# run the Gecko Receive routine within its own stack frame
stwu r1,-0x80(r1)
stmw r2,8(r1)
bctrl # this is the actual branch THIS IS WHERE IT CRASHES
lmw r2,8(r1)
addi r1,r1,0x80

la r9, 8(r9) # r9 += 8
la r11, 8(r11) # r11 += 8
b receivesegment # end of loop
endsegment:

# restore LR from r8 and CTR from r6
mtlr r8
mtctr r6

# restore stack frame
lmw r2,8(r1)
addi r1,r1,0x80[/spoiler]

This ASM is compiled into a C0 code, and executed immediately after the following Ocarina code:

4E000008 00000000 // Load P2 address list pointer into PO
66000002 00000000 // Jump over P2 address list

80CD3614 0000000C // P2 XYZ
00000000 00000000 // Null terminator, end of address list


The entire code is supposed to read data from the Gecko and write 0x0C bytes of it to 0x80CD3614, and then stop when it hits the line with the 0's.  (But if I were to have more lines of addresses/lengths in that block, it would cycle through them until it reached the 0 line.)  This particular address is the XYZ coords of player 2 in Sonic Adventure 2: Battle for GameCube.  (I'm running this with the GameCube code handler.)
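To make the list layout concrete, here is a minimal C sketch of the loop the ASM is meant to perform: step through 8-byte entries (address, then length) until the address word is zero. The names range_entry, walk_ranges and sum_lengths are made up for illustration, and the callback only stands in for the branch to the handler's receive routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 8-byte entry of the list that PO points at: target address, then length. */
typedef struct {
    uint32_t address; /* e.g. 0x80CD3614 */
    uint32_t length;  /* e.g. 0x0000000C */
} range_entry;

/* Hypothetical callback standing in for the call into the code handler. */
typedef void (*range_fn)(uint32_t address, uint32_t length, void *ctx);

/* Visit entries until the null terminator (address == 0); return how many were visited. */
static size_t walk_ranges(const range_entry *list, range_fn fn, void *ctx)
{
    size_t n = 0;
    assert(list != NULL && fn != NULL);
    while (list[n].address != 0) {
        fn(list[n].address, list[n].length, ctx);
        n++;
    }
    return n;
}

/* Example callback: add up the number of bytes that would be transferred. */
static void sum_lengths(uint32_t address, uint32_t length, void *ctx)
{
    (void)address;
    *(uint32_t *)ctx += length;
}
```

With the single "P2 XYZ" entry followed by the zero line, walk_ranges visits one entry and sum_lengths accumulates 12 bytes.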

Changing the branch address from 0x80001E00 to 0x80001E34 (which is the exisendbuffer routine) does work (it sends 0x0C bytes over the USB Gecko from 0x80CD3614).

So, that's my situation.  Any suggestions on why receiving data doesn't work properly but sending does?  I'm fairly new to ASM.  (And I apologize for the long post here; hopefully the technical details I'm providing are useful.)

Thanks!

EDIT: In case it's any help, here are the relevant disassembled routines from the GameCube code handler I'm using, since it's somewhat different from the Wii code handler that is open-source:
[spoiler]800036BC:  7FC802A6   mflr   r30      #exireceivebyte
800036C0:  3C60A000   lis   r3,-24576
800036C4:  48000019   bl   0x800036dc
800036C8:  74C30800   andis.   r3,r6,2048
800036CC:  54C5843E   rlwinm   r5,r6,16,16,31
800036D0:  98AD0000   stb   r5,0(r13)
800036D4:  7FC803A6   mtlr   r30
800036D8:  4E800020   blr   
800036DC:  92F86814   stw   r23,26644(r24)      #checkexisend
800036E0:  90786824   stw   r3,26660(r24)
800036E4:  92D86820   stw   r22,26656(r24)
800036E8:  80B86820   lwz   r5,26656(r24)      #exicheckreceivewait
800036EC:  70A50001   andi.   r5,r5,1
800036F0:  4082FFF8   bne+   0x800036e8
800036F4:  80D86824   lwz   r6,26660(r24)
800036F8:  90B86814   stw   r5,26644(r24)
800036FC:  4E800020   blr   
80003700:  7D4802A6   mflr   r10      #exireceivebuffer... r3 counter, r12 buffer... 0x80001E00
80003704:  7C6903A6   mtctr   r3
80003708:  39C00000   li   r14,0
8000370C:  4800007D   bl   0x80003788      #bufferloop
80003710:  48000079   bl   0x80003788
80003714:  4BFFFFA9   bl   0x800036bc
80003718:  4182FFF4   beq+   0x8000370c
8000371C:  888D0000   lbz   r4,0(r13)
80003720:  7C8E61AE   stbx   r4,r14,r12
80003724:  39CE0001   addi   r14,r14,1
80003728:  4200FFE4   bdnz+   0x8000370c
8000372C:  7D4803A6   mtlr   r10
80003730:  4E800020   blr   
80003734:  7D4802A6   mflr   r10      #exisendbuffer... r3 counter, r12 buffer... 0x80001E34
80003738:  7C6903A6   mtctr   r3
8000373C:  39C00000   li   r14,0
80003740:  7C6C70AE   lbzx   r3,r12,r14      #sendloop
80003744:  4800001D   bl   0x80003760
80003748:  4182FFF8   beq+   0x80003740
8000374C:  39CE0001   addi   r14,r14,1
80003750:  4200FFF0   bdnz+   0x80003740
80003754:  7D4803A6   mtlr   r10
80003758:  4E800020   blr   
8000375C:  386000AA   li   r3,170      #exisendbyteAA
80003760:  7FC802A6   mflr   r30      #exisendbyte
80003764:  5463A016   rlwinm   r3,r3,20,0,11
80003768:  6463B000   oris   r3,r3,45056
8000376C:  3AC00019   li   r22,25
80003770:  3AE000D0   li   r23,208
80003774:  3F00CC00   lis   r24,-13312
80003778:  4BFFFF65   bl   0x800036dc
8000377C:  54C337FF   rlwinm.   r3,r6,6,31,31
80003780:  7FC803A6   mtlr   r30
80003784:  4E800020   blr   
80003788:  7FC802A6   mflr   r30      #exicheckreceive
8000378C:  3C60D000   lis   r3,-12288      #exicheckreceive2
80003790:  4BFFFF4D   bl   0x800036dc
80003794:  54C337FF   rlwinm.   r3,r6,6,31,31
80003798:  4182FFF4   beq+   0x8000378c
8000379C:  7FC803A6   mtlr   r30
800037A0:  4E800020   blr   
[/spoiler]
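As a reading aid, here is a rough C model of what the exireceivebuffer loop in the disassembly appears to do. The command words and flag bits are taken from the listing; the function names, the callback type and the mock device are hypothetical and exist only so the logic can be tried on a PC. One thing the listing suggests, though it is only a guess at the cause of the crash: exisendbyte loads r22, r23 and r24 itself before calling checkexisend, while the receive path calls checkexisend without setting them and also stores a byte through r13, so it may rely on the caller having those registers set up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command words and reply bits as read from the disassembly above. */
#define CMD_RECEIVE_BYTE 0xA0000000u /* lis r3,-24576 */
#define CMD_CHECK_RX     0xD0000000u /* lis r3,-12288 */
#define RX_BYTE_VALID    0x08000000u /* andis. r3,r6,2048 */
#define RX_READY         0x04000000u /* rlwinm. r3,r6,6,31,31 */

/* Hypothetical stand-in for one EXI transaction (the checkexisend routine). */
typedef uint32_t (*exi_xfer_fn)(uint32_t command, void *ctx);

/* Receive `count` bytes into `buf`, retrying each byte until it is flagged valid.
   Unlike the assembly, which runs the body once before bdnz, count == 0 returns at once. */
static void receive_buffer(uint8_t *buf, size_t count, exi_xfer_fn xfer, void *ctx)
{
    assert(buf != NULL && xfer != NULL);
    for (size_t i = 0; i < count; ) {
        /* The handler calls exicheckreceive twice before each read. */
        for (int pass = 0; pass < 2; pass++) {
            while ((xfer(CMD_CHECK_RX, ctx) & RX_READY) == 0) {
                /* spin until the Gecko reports data waiting */
            }
        }
        uint32_t reply = xfer(CMD_RECEIVE_BYTE, ctx);
        if ((reply & RX_BYTE_VALID) == 0) {
            continue; /* beq+ bufferloop: try this byte again */
        }
        buf[i++] = (uint8_t)(reply >> 16); /* rlwinm r5,r6,16,16,31 then stb */
    }
}

/* Mock device (illustration only): replays `data` and can reject one read. */
typedef struct {
    const uint8_t *data;
    size_t pos;
    int fail_next;
} mock_gecko;

static uint32_t mock_xfer(uint32_t command, void *ctx)
{
    mock_gecko *g = (mock_gecko *)ctx;
    if (command == CMD_CHECK_RX) {
        return RX_READY;
    }
    if (command == CMD_RECEIVE_BYTE) {
        if (g->fail_next) { /* simulate one rejected read */
            g->fail_next = 0;
            return 0;
        }
        return RX_BYTE_VALID | ((uint32_t)g->data[g->pos++] << 16);
    }
    return 0;
}
```

The model leaves out the register setup entirely, which is the part that differs between the send and receive paths in the listing.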

biolizard89

I really don't like bumping posts, but does anyone have suggestions on this?  dcx2?  brkirch?  Anyone?

Nuke

biolizard89,

You may be overcomplicating things. Commands to upload to and download from memory are already available; couldn't you use those? Just make your PC app talk to those commands and let the handler do its thing.

If I remember correctly, the 8 bytes are just the start and end addresses, so you would use 0x80CD3614 and 0x80CD3614+0x0C.


uploadcode:
   bl   exisendbyteAA

   li   r3, 8         # 8 bytes
   ori   r12, r31, dwordbuffer@l   # buffer

----
readmem:
   bl   exisendbyteAA

   li   r3, 8         # 8 bytes
   ori   r12, r31, dwordbuffer@l   # buffer
0xFFFFFFuuuuuuu

biolizard89

Hi Nuke,

Thank you very much for the reply.

I am not using the standard upload/download commands which WiiRd uses because they require a round trip between the PC and the Wii before the data is transferred.  This code is intended to dump and upload to many memory ranges every frame, and waiting for a ping for each memory range would severely slow down the game.  My method, at least for dumping the memory, is able to dump a memory range every frame with no noticeable slowdown.  I imagine that having an upload as well as a dump will slow down the game slightly, but it will be one round trip per frame instead of the 10 or so round trips that the standard commands would need.

In addition, my method allows a Gecko code to control what memory ranges get written to.  So it would allow access to data that uses pointers, which the standard commands do not permit (unless I made more round-trip pings to interpret the pointers).

Any suggestions?  The comments in the code handler imply that all I need to do is set r3 and r12 with the byte counter and the start address, and branch-link to the exireceivebuffer routine.  Am I reading the code wrong?  Are there some other registers that I need to be careful with?

Thanks!

Nuke

I now understand.

What I would suggest is to write a custom handler using the WiiRd handler as a base. You only need to change the EXI register addresses from 0xCD to 0xCC values in the asm for it to be compatible with GameCube. You could then perhaps upload the handler and overwrite the existing one.
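As a small illustration of the 0xCD versus 0xCC point, the sketch below computes the EXI channel 1 register addresses from a base value. The 0xCC base and the three offsets come from the GameCube handler disassembly earlier in the thread (lis r24,-13312 and the 26644/26656/26660 displacements); the 0xCD000000 Wii base and all of the C names are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hardware register base as loaded by `lis r24` in the handler. */
#define HW_BASE_GAMECUBE 0xCC000000u /* lis r24,-13312 in the GameCube handler */
#define HW_BASE_WII      0xCD000000u /* assumption: the 0xCD value mentioned above */

/* EXI channel 1 offsets used by the disassembled handler. */
#define EXI_CHAN1_SR_OFF   0x6814u /* 26644 */
#define EXI_CHAN1_CR_OFF   0x6820u /* 26656 */
#define EXI_CHAN1_DATA_OFF 0x6824u /* 26660 */

typedef struct {
    uint32_t sr;   /* status/select register */
    uint32_t cr;   /* control register */
    uint32_t data; /* data register */
} exi_chan1_addrs;

/* Compute the three register addresses for a given hardware base. */
static exi_chan1_addrs exi_chan1_for(uint32_t hw_base)
{
    exi_chan1_addrs a;
    a.sr = hw_base + EXI_CHAN1_SR_OFF;
    a.cr = hw_base + EXI_CHAN1_CR_OFF;
    a.data = hw_base + EXI_CHAN1_DATA_OFF;
    return a;
}
```

Under these assumptions only the base changes between the two platforms; the offsets and the command words stay the same.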

I can't say where or why your asm is crashing, but with asm things can easily go wrong, as it's not very user friendly. The only reason the Gecko handler is such tightly optimized asm is that we knew from the very beginning that we had a very tiny space to work with.

Sorry I can't help more than that.

biolizard89

Quote from: Nuke on October 12, 2010, 04:49:27 PM
I now understand.

What I would suggest is to write a custom handler using the WiiRd handler as a base. You only need to change the EXI register addresses from 0xCD to 0xCC values in the asm for it to be compatible with GameCube. You could then perhaps upload the handler and overwrite the existing one.

I can't say where or why your asm is crashing, but with asm things can easily go wrong, as it's not very user friendly. The only reason the Gecko handler is such tightly optimized asm is that we knew from the very beginning that we had a very tiny space to work with.

Sorry I can't help more than that.
Thanks Nuke!  I was unaware that porting the newer code handler to GameCube was a simple matter of changing the EXI registers.  I'll play around with it.

That said, I don't want to remove the codetype functionality of the code handler, as I would like Gecko codes to have control over which memory ranges get dumped/loaded, and for miscellaneous Gecko codes to be usable simultaneously.  I recall that the new code handler supports moving the code list to a custom empty memory range... I take it that would give me extra memory to implement my functionality?  Maybe stick the C USB Gecko library (about 2KiB) in that memory range so that I don't have to figure out the weirdness of why the ASM USB Gecko lib that the code handler uses is crashing for me?  Does that sound like a decent plan to try?

Thanks!

biolizard89

Quote from: biolizard89 on October 16, 2010, 07:55:57 PM
Quote from: Nuke on October 12, 2010, 04:49:27 PM
I now understand.

What I would suggest is to write a custom handler using the WiiRd handler as a base. You only need to change the EXI register addresses from 0xCD to 0xCC values in the asm for it to be compatible with GameCube. You could then perhaps upload the handler and overwrite the existing one.

I can't say where or why your asm is crashing, but with asm things can easily go wrong, as it's not very user friendly. The only reason the Gecko handler is such tightly optimized asm is that we knew from the very beginning that we had a very tiny space to work with.

Sorry I can't help more than that.
Thanks Nuke!  I was unaware that porting the newer code handler to GameCube was a simple matter of changing the EXI registers.  I'll play around with it.

That said, I don't want to remove the codetype functionality of the code handler, as I would like Gecko codes to have control over which memory ranges get dumped/loaded, and for miscellaneous Gecko codes to be usable simultaneously.  I recall that the new code handler supports moving the code list to a custom empty memory range... I take it that would give me extra memory to implement my functionality?  Maybe stick the C USB Gecko library (about 2KiB) in that memory range so that I don't have to figure out the weirdness of why the ASM USB Gecko lib that the code handler uses is crashing for me?  Does that sound like a decent plan to try?

Thanks!
Okay, I got bidirectional USB Gecko transfer working in a GeckoOS code.  It uses 119 lines of GeckoOS codes, and has been minimally tested (read: it probably crashes in many situations which I didn't test), but it does appear to function now.  Very little impact on game performance, either, although I did need to lower the USB Gecko's latency timer.  At the moment the code has no practical use, but once I develop a useful app that uses the code (and do further testing and cleanup), I will release it.  Thanks for your help Nuke!

Nuke

I'm pretty sure that I added a command to send and receive at the same time in the first version of the VHDL, which you may have. I never documented it or released it publicly.

I think it was command 0xC, i.e. EXI_CHAN1DATA = 0xC0000000;

One way to test is to write some test code for both read and write that uses 0xC instead of 0xA and 0xB.

The VHDL I released is actually missing this and also the ID stuff, as I think I uploaded the wrong version: I released v2.0, whereas v1.0 is the one people actually have.

Please try it and let me know.

Edit: I think 0xC was changed to a check TX command later on, so I will dig out my old sources and check this for you.

biolizard89

Quote from: Nuke on October 22, 2010, 05:43:31 AM
I'm pretty sure that I added a command to send and receive at the same time in the first version of the VHDL, which you may have. I never documented it or released it publicly.

I think it was command 0xC, i.e. EXI_CHAN1DATA = 0xC0000000;

One way to test is to write some test code for both read and write that uses 0xC instead of 0xA and 0xB.

The VHDL I released is actually missing this and also the ID stuff, as I think I uploaded the wrong version: I released v2.0, whereas v1.0 is the one people actually have.

Please try it and let me know.
Hey Nuke, thanks for the reply and info!  The USB Gecko lib I have on hand includes this:
/*---------------------------------------------------------------------------------------------*
    Name:           usb_checksendstatus
    Description:    Check that the FIFO is ready to send
*----------------------------------------------------------------------------------------------*/

static int __usb_checksendstatus()
{
    s32 i = 0;

    exi_chan1sr = 0x000000D0;    /* set up EXI channel 1 for the Gecko */
    exi_chan1data = 0xC0000000;  /* command 0xC: check send FIFO status */
    exi_chan1cr = 0x19;          /* start the transfer */
    while ((exi_chan1cr) & 1);   /* wait until bit 0 of the control register clears */
    i = exi_chan1data;           /* read back the reply */
    exi_chan1sr = 0x0;
    if (i & 0x04000000) {
        return 1;
    }
    return 0;
}

So it looks like command 0xC is used to check the send FIFO status.  0xD is apparently used to check the receive FIFO status.  Would your undocumented function be command 0xE?  If you can post some documentation (C code would be awesome) on what I would have to do to send and receive something simultaneously, I would happily test it when I have a chance (although I admit I don't have loads of free time to do this testing).  So I would be giving this: exi_chan1data = 0xE0000000 | (sendbyte<<20); assuming that it's command 0xE?  Should I expect the EXI bus to return 0x0C000000 if both the send and receive worked, since send is 0x04000000 and receive is 0x08000000?  How will this behave if a byte is sent but not received?  Would I use an ioctl on the PC side?  If so, what control code would I specify to the FTDI driver's ioctl on the PC?
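To spell out the arithmetic in the questions above, here is a small C sketch that builds a command word and decodes a reply. The send flag (0x04000000) matches the library code quoted above and the receive flag (0x08000000) is the value mentioned in this post; the position of the returned byte (bits 16 to 23) is taken from the receive routine in the handler disassembly and is only assumed to apply to a combined command. The command nibble is left as a parameter because the actual full-duplex command number is not known here, and the names gecko_command, duplex_result and decode_duplex_reply are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_SEND_OK    0x04000000u /* tested by __usb_checksendstatus */
#define STATUS_RECEIVE_OK 0x08000000u /* receive flag as described in this post */

/* Build a command word: 4-bit command in the top nibble, payload byte in bits 20-27. */
static uint32_t gecko_command(uint8_t command_nibble, uint8_t payload)
{
    assert(command_nibble <= 0xF);
    return ((uint32_t)command_nibble << 28) | ((uint32_t)payload << 20);
}

typedef struct {
    int sent;      /* non-zero if the send flag was set */
    int received;  /* non-zero if the receive flag was set */
    uint8_t byte;  /* received byte; only meaningful when `received` is set */
} duplex_result;

/* Decode a reply, assuming a combined command reports both flags independently. */
static duplex_result decode_duplex_reply(uint32_t reply)
{
    duplex_result r;
    r.sent = (reply & STATUS_SEND_OK) != 0;
    r.received = (reply & STATUS_RECEIVE_OK) != 0;
    r.byte = (uint8_t)(reply >> 16);
    return r;
}
```

For example, gecko_command(0xB, 0xAA) gives 0xBAA00000, the same word the handler's exisendbyteAA routine builds, and a reply with both flags set has 0x0C in its top byte.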

Thanks again Nuke, your expertise is very helpful!

Nuke

It is full duplex, so it can send and receive at once; I just didn't document it and later removed the command. I will check my old backups when I go home tonight. The VHDL code handles the procedure; it just does both read and write operations at once.

It's not commands E and F, as those are used to control the flash memory.

Try 0x6, 0x7 and 0x8; if none of them works, then I probably did remove it in the released version.

i = exi_chan1data;

Put a printf("%x\n",i); after the above line and it will tell you the returned value. Yes, it should return 0xC :)





dcx2

mmmm...VHDL....

I'm interested in this, as it could potentially benefit Gecko.NET, too.

biolizard89

Quote from: Nuke on October 22, 2010, 06:20:36 AM
It is full duplex, so it can send and receive at once; I just didn't document it and later removed the command. I will check my old backups when I go home tonight. The VHDL code handles the procedure; it just does both read and write operations at once.

It's not commands E and F, as those are used to control the flash memory.

Try 0x6, 0x7 and 0x8; if none of them works, then I probably did remove it in the released version.

i = exi_chan1data;

Put a printf("%x\n",i); after the above line and it will tell you the returned value. Yes, it should return 0xC :)
Hmm, well, my implementation using the half-duplex commands appears to be fast enough for now... I will try implementing it using the full-duplex command when I have some down time, but I'm not certain when that would be.

That said, one question: what would the code look like on the PC side?  The FTDI driver has receive and send functions that don't appear to be full-duplex or thread-safe, and it appears to me that if I initiate sending bytes, the driver will wait until all the bytes are sent before it will let me receive, and vice versa.  There's an ioctl function with no documentation; is that required?  Or is some other method necessary?

Thanks!

Nuke

You could take advantage of it properly if threaded, but I never tried it further than a test, which is probably the reason I removed it. I've added it back to the USB Gecko SE, so if you want to try it out, send me a PM.

