Generated code in newer GCC builds

Started by biolizard89, April 11, 2013, 08:52:04 AM


biolizard89

Hey everyone, long time no see.

So I had never gotten around to updating my DevKitPPC install since first setting it up on my laptop circa 2011, meaning that I've been using GCC 4.4.3 to build the USB Gecko driver for GeckoTunnel.  Well, I had to reformat last month (stupid hard drive committed suicide), and while I was restoring from backups, I figured I'd try the newer DevKitPPC.  So now I'm on GCC 4.6.3, and the code it generates seems to be really different.

Here's a snippet of ASM output from 4.4.3:

do_geckotunnel_comm:
mflr 0
stwu 1,-16(1)
stw 31,12(1)
mr 31,3
stw 0,20(1)
b .L33
.L39:
lwz 3,0(31)
cmpwi 7,3,-1
beq- 7,.L39
bl usb_receivebuffersafe
addi 31,31,8
.L33:
lwz 4,4(31)
cmpwi 7,4,0
bne+ 7,.L39
b .L42
.L38:
lwz 3,0(31)
cmpwi 7,3,-1
beq- 7,.L38
bl usb_sendbuffersafe
.L42:
addi 31,31,8
lwz 4,4(31)
cmpwi 7,4,0
bne+ 7,.L38
lwz 0,20(1)
lwz 31,12(1)
addi 1,1,16
mtlr 0
blr


Now here's the 4.6.3 version:

do_geckotunnel_comm:
mflr 0
mr 11,1
stwu 1,-16(1)
bl _savegpr_31
stw 0,20(1)
mr 31,3
b .L33
.L38:
lwz 3,0(31)
cmpwi 7,3,-1
beq- 7,.L38
bl usb_receivebuffersafe
addi 31,31,8
.L33:
lwz 4,4(31)
cmpwi 7,4,0
bne+ 7,.L38
b .L41
.L37:
lwz 3,0(31)
cmpwi 7,3,-1
beq- 7,.L37
bl usb_sendbuffersafe
.L41:
addi 31,31,8
lwz 4,4(31)
cmpwi 7,4,0
bne+ 7,.L37
addi 11,1,16
b _restgpr_31_x


A quick Google suggests that the savegpr and restgpr crap are actually from libgcc.  Which is useless for me, since GeckoOS doesn't have libgcc available to C0 codes.  I disassembled libgcc, figuring I could just embed this magical library function myself, and found this:

Quote
00000044 <_savegpr_31>:
  44:   93 eb ff fc    stw     r31,-4(r11)
  48:   4e 80 00 20    blr

[snip]

00000044 <_restgpr_31>:
  44:   83 eb ff fc    lwz     r31,-4(r11)
  48:   4e 80 00 20    blr

So it's storing r31 relative to r11 rather than relative to the stack pointer, which would be r1.

And then I examined the rest of the GCC-generated code and noticed that it's copying r1 to r11 and using r11 as a stack, or something.  I'm not really sure what the intention is here.  dcx2's register safety thread says that r11 is used to cache the stack pointer... what is the purpose of this?  Is this caching (assuming that's what it is) safe to do, given that interrupts could be triggered at any time?  (I'm doing this in a C0 code.)

The generated code is certainly shorter than the older GCC's generated code, so if this is legit, I'll switch to the newer GCC for GeckoTunnel, and enjoy the RAM savings.  But I'd like some explanation of what's going on... is anyone able to explain this?

Many thanks.  (I hope people still hang around here?)

γRB

Most of the time you can find the stack pointer in r29 (by convention), but that's not the case on the Wii, where it's in r1.

   r1   sp    stack pointer
   r2   rtoc  global pointer to _SDA2_BASE_
   r13        global pointer to _SDA_BASE_

biolizard89

I'm sorry, I don't understand your post... I know that r1 is the stack pointer; I don't understand what r2 and r13 have to do with my question.  (If it wasn't clear, I'm running the compiled code on the Wii in a GeckoOS C0 code.)

γRB

Oh, I didn't fully understand what you wanted, sorry.

dcx2

Actually I will wager that savegpr and restgpr are probably available somewhere in main memory.  It's probably different for each game, and they might not all use r11; some of them might still use r1, depending on what mood the compiler was in when it ran.

It is odd that sometimes the compiler uses r11 as a parameter register to these savegpr/restgpr functions.  r11 isn't really allowed for this purpose according to the PPC EABI.  I don't know why they need r11 for this purpose when they could just do extra math on r1.  Maybe if a function needed to dynamically calculate the stack offsets at runtime?  Who knows.

Caching the stack pointer in r11 is perfectly safe.  If an interrupt needs to run, it will use r1.  User code will never "see" any manipulation made to r1 during an interrupt.

I find it particularly odd that it wouldn't just inline this function.  I mean, you're caching the stack pointer with one instruction, and you've now got a bl and a blr, and they're all 100% superfluous.  You could remove the mr r11,r1, the bl, and the blr, and just replace the bl with stw r31, 12(r1) (like the old GCC did).  Faster AND smaller.  Maybe you have optimizations off?

biolizard89

Thanks dcx2, that was very informative.  Funny you should mention optimizations... the output I posted was with -Os.  With no optimization flags passed to GCC, the output actually doesn't contain any references to savegpr or restgpr.  This seems to me to be a bug in GCC, since as you mention, the savegpr/restgpr actually makes the output larger.  Very odd.

biolizard89

Well, I can now confirm that the new GCC is broken in terms of embedding its generated ASM into a C0 code.  I took the C code I had, compiled it with both the old and new GCC, added the standard stack frame stuff to the beginning and the end, and embedded it in a C0 code.  The new GCC code crashes as soon as the code handler executes.  The old GCC code works fine.

This is really, really unsettling, as I've always assumed that GCC was mature enough that the significant bugs were gone by now.  I guess I was wrong.

Is anyone else using GCC for Gecko code development?  If anyone's interested, I can upload a copy of the old DevKitPPC, since it seems that the DevKitPro downloads page doesn't have it anymore.