WiiRd forum

Main Forums => The Collective => Topic started by: wiiztec on June 05, 2010, 03:54:27 AM



Title: LUT discussion.
Post by: wiiztec on June 05, 2010, 03:54:27 AM
What the hell is a LUT?


Title: LUT discussion.
Post by: Romaap on June 05, 2010, 09:29:07 AM
Lookup Table; it's just a chunk of values.
These values can then be used by doing something like:
Code:
if button is pressed
increase X
endif
write ( [start_of_lookup_table] + X ) to ________
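For readers more at home in a high-level language, here is a minimal Python sketch of the same pseudocode. It assumes a made-up 4-byte table; `lookup_table`, `read_entry` and `button_pressed` are illustrative names only, not part of any real code.

```python
lookup_table = bytes([0x10, 0x20, 0x30, 0x40])  # hypothetical table contents

def read_entry(table: bytes, x: int) -> int:
    """Return the value stored at start_of_lookup_table + X."""
    if not 0 <= x < len(table):
        raise IndexError("X ran past the end of the table")
    return table[x]

x = 0
button_pressed = True  # stand-in for the real button check
if button_pressed:
    x += 1
value = read_entry(lookup_table, x)  # the value you would then write somewhere
```

The bounds check is an addition of this sketch. The pseudocode assumes X never runs past the end of the table.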


Title: LUT discussion.
Post by: wiiztec on June 05, 2010, 12:52:19 PM
Is it static in memory with a bunch of floating-point values? If so, I've used values from it in several of my codes.


Title: LUT discussion.
Post by: dcx2 on June 05, 2010, 01:07:40 PM
A Look-Up Table (LUT) is a way to speed up complex calculations.  For example, when you punch cos 30 into your calculator, it uses the 30 as an index into a cosine LUT, and at index 30 is the resulting value.  When you "look up" the value for 30, you get the result of cos 30.  (This is actually a lie; real calculators probably use a LUT plus interpolation or, better yet, Taylor expansions.)
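A minimal Python sketch of a cosine LUT. It assumes whole-degree entries from 0 to 90 and uses linear interpolation between neighbouring entries, which is one of the options mentioned above, not a description of what any particular calculator does.

```python
import math

# Precompute cos for whole degrees 0..90 once; lookups are then just indexing.
COS_LUT = [math.cos(math.radians(d)) for d in range(91)]

def cos_deg(angle: float) -> float:
    """cos(angle) for 0 <= angle <= 90 degrees, via table lookup plus linear interpolation."""
    if not 0.0 <= angle <= 90.0:
        raise ValueError("this sketch only covers 0..90 degrees")
    i = int(angle)                 # index of the table entry at or below angle
    if i == 90:
        return COS_LUT[90]
    frac = angle - i               # fraction of the way toward the next entry
    return COS_LUT[i] + (COS_LUT[i + 1] - COS_LUT[i]) * frac
```

`cos_deg(30)` is a pure lookup (index 30). `cos_deg(30.5)` blends entries 30 and 31, so it is only approximately equal to the true value.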

The blue lines in my Surface Swapper are the LUT embedded in the code (bl/mflr technique courtesy of brkirch).  The ASM uses the current surface value to index into the LUT, and then uses the value at that index as the new surface value.

If you didn't modify the LUT, then when you stand on, say, lava (0x0A), the lbzx will index to the 11th cell of the array (arrays start counting at 0).  The unmodified code naturally has an 0x0A at this index.  If you want to replace lava with ice, you simply change the 0x0A in the LUT into 0x21.  Then, when the lbzx indexes to the 11th cell, it finds a 0x21 instead of an 0x0A.  This is how it does 0x0A -> 0x21.  The best part about the LUT is that you can do this for EVERY surface individually.
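The Surface Swapper's lookup can be sketched in Python as follows. The table size of 0x40 entries is an assumption made for this sketch; only the lava (0x0A) and ice (0x21) IDs come from the post.

```python
# Identity table: entry n holds n, so every surface maps to itself (the unmodified code).
# The 0x40-entry size is an assumption for this sketch.
surface_lut = bytearray(range(0x40))

LAVA, ICE = 0x0A, 0x21
surface_lut[LAVA] = ICE  # the single edit that turns lava into ice

def swap_surface(surface: int) -> int:
    """What the lbzx does: use the current surface ID as an index into the table."""
    return surface_lut[surface]
```

Every other entry still holds its own index, which is why only lava changes. Editing more entries swaps more surfaces independently.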


Title: LUT discussion.
Post by: hetoan2 on June 06, 2010, 11:43:42 PM
@dcx2,

How would you go about making a lookup table for a general situation?

I'm not really sure what this is all about:

bl 0x0004       # get pointer to next instruction
mflr r4         # put pointer in r4

Are those specific to Mario Galaxy 2 or what? I don't really understand what they are doing.

EDIT: I think those are there to figure out the addresses that change depending on where the code is loaded in the code handler...
Am I right?

Also, is there a limit to the number of lines a C2 code can have? I tried to make a very long (150-ish lines) code using the 06/07 code type, but it just froze. So there has to be a limit for that, right? What about C2?

If you made a LUT for all values from 00 to FF it would be very long; that's why I'm asking :S


Title: LUT discussion.
Post by: Romaap on June 07, 2010, 05:38:03 AM
Your edit is right. What that does is branch to Program Counter + 4 (the next instruction) and put the next instruction's address in the Link Register.
The second line copies the Link Register to r4, so r4 now holds the address of the mflr instruction itself, which the code can use as a base pointer to reach its own data.


Title: LUT discussion.
Post by: dcx2 on June 07, 2010, 03:27:17 PM
hetoan, the bl/mflr technique is not game-specific, nor is the concept of a LUT, but it's extremely off-topic here...if you start a thread in Wii Game Hacking Help, I will elaborate.  A LUT for 0x00 through 0xFF could be large, but there are methods for mitigating the size, and there are also situations where a LUT won't work at all, but sometimes there are other ways to use something like a LUT.


Title: Re: LUT discussion.
Post by: hetoan2 on June 07, 2010, 05:28:23 PM
Split topic. My apologies for not doing so before.


Title: Re: LUT discussion.
Post by: dcx2 on June 07, 2010, 06:36:10 PM
Oh neat, I thought you could do that but I wasn't sure exactly how.  I could have used this Split Topic thing before...I'll give a thorough explanation here, but some of this has already been covered.

bl = branch and link.  Branch means to jump program execution to the specified address.  The address of the currently executing instruction is held in what is generically called the Program Counter (PC).  PowerPC does not expose the PC as a general-purpose register; in WiiRd's register view it shows up as SRR0, the save/restore register that holds the address of the interrupted instruction when the game is paused.  Normally, after an instruction is executed, the PC is incremented by 4 (every PowerPC instruction is exactly 4 bytes long).  However, for a branch, the PC is incremented or decremented by a different value (relative branch), or it might even be overwritten altogether (absolute branch).

The Link part comes in with the Link Register, LR.  Usually we just branch and we never go back to the branch point.  Sometimes, after branching somewhere, you want to come back to where you were before the branch (for example, function calls).  The bl instruction will put PC + 4 into the LR (i.e. the address of the next instruction), so that execution can continue when the program "links back" using blr.

The beauty of bl 4/mflr is that it branches to PC + 4, which is what normal program execution would do anyway (so we do not need a matching blr), but it also puts PC + 4 into the LR.  This is what provides us with the location of the instructions in memory.  It's a lot like the 4E code type in that regard.  We then use mflr (Move From Link Register) to get the address of the instruction after the bl (i.e. the mflr itself).
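The arithmetic behind bl 4/mflr can be sketched in Python. The load addresses below are made up; the point is that the offset from the mflr to the data is fixed when the code is written, so only the base supplied by bl 0x4/mflr changes when the code handler puts the code somewhere else. The layout list is hetoan2's C2 body from later in this thread.

```python
def mflr_base(bl_address: int) -> int:
    """LR after `bl 0x4` at bl_address: the address of the next instruction (the mflr)."""
    return (bl_address + 4) & 0xFFFFFFFF

# hetoan2's C2 body from later in this thread, one 4-byte word per entry;
# "<table>" marks where the embedded data starts.
LAYOUT = ["bl 0x4", "mflr r17", "lwz", "cmpwi", "bgt", "addi", "lbzx", "b", "<table>"]
TABLE_OFFSET = (LAYOUT.index("<table>") - LAYOUT.index("mflr r17")) * 4

def table_address(bl_address: int) -> int:
    """Address of the embedded table, wherever the code handler placed the code."""
    return mflr_base(bl_address) + TABLE_OFFSET
```

TABLE_OFFSET works out to 0x1C, which is the value in hetoan2's `addi r17,r17,0x1C`. Loading the code at 0x80001800 or at 0x80002000 moves the table by the same amount.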

brkirch demonstrated this technique with his Moon Jump code for Super Mario Galaxy 2.  Instead of using three Gecko Registers or some arbitrary (hopefully unused) address to store the "launch velocity vector" of the jump, he padded his C2 code with some blank lines, and made the C2 code branch over the blank lines.  Now he has a very small "data area" where he can store the velocity vector, literally embedded in the C2 code.  In order to gain access to this data (because as you said, it will change location depending on the codes that you are using, etc) he needs a pointer, and he gets the pointer using bl/mflr.

I will post again with more details specifically about how to use a Look Up Table, what the limitations are, and some ways to get around those limitations.


Title: Re: LUT discussion.
Post by: hetoan2 on June 07, 2010, 07:37:05 PM
Ah yes, I completely understand now :D

thanks.

I mean, in theory you can skip a set of values using cmpwi's, and then if it's in the next range you just do some set mathematical operation to make it uniform, no?

That way, if it goes from 00-15 and then from 6F-80, you can just have it recognize when the value is 6F or above and then have it follow the rest of the table, without some 20 extra lines of code.

As long as the table swaps the values then its okay.

Thanks... Also, when I split the topics it put this under Codes. I can move it to Wii Hacking/Help if you'd like; I didn't want to move it in case you end up losing this thread's location :S


Title: Re: LUT discussion.
Post by: paprika_killer on June 07, 2010, 08:33:05 PM
Wouldn't "the collective" be the place for this?

very nice explanation btw.

Also, bl is often used to call "functions" right?


Title: Re: LUT discussion.
Post by: brkirch on June 08, 2010, 01:06:35 AM
Quote from: hetoan2 on June 06, 2010, 11:43:42 PM
Also is there a limit to the number of lines a C2 code can have? I know i tried to make a very long (150ish lines) code that was 06/07 codetype, but it just froze. So there has to be a limit for that right? So how about C2?

If you made a LUT for all values from 00 - FF it would be very long, thats why i'm asking :S

There is no limit other than what can fit into memory.  That is why the F2/F4 code types were necessary instead of having the XOR checksums simply added directly to the C2 code type (the second part of a C2 code is read as a 32-bit line count).

Quote from: paprika_killer on June 07, 2010, 08:33:05 PM
Also, bl is often used to call "functions" right?

Yes, although sometimes bctrl or even bctr is used instead.  The LR value is then stored to the stack if the called function makes further function calls.  If the LR value has not been stored to the stack and you are using a lookup table in a C2 code, you must copy LR to another register with mflr before using bl 0x4 and mflr, and then restore the LR value at the end of the code with mtlr.

In the Super Mario Galaxy 2 code topic I posted commented disassembly of the moon jump code I created.  However the problem with it was that the offsets used to access the data had to be calculated manually which can be a lot of work for longer codes (like the SMG2 transformation code I will be releasing soon, which has over 400 lines of assembly) and has to be done every time the code length changes.  The solution I found was to rewrite the codes like this:
Code:
/* ASM insert for storing initial jump velocity */

.set codeaddress,0x803A2FE0

.set length,end1-start1
.set align,(length%8==0)*-0x60000000
.set numlines,(length+4)/8+(length%8==0)*-1

.set initialXoffset,initialX-offset1
.set initialYoffset,initialY-offset1
.set initialZoffset,initialZ-offset1

.set codereg1,3
.set codereg2,4

.int codeaddress<<7>>7|0xC2000000
.int numlines

start1:
bl 0x4 #move address of next instruction to LR (offsets off of that address will be used for storing initial jump velocity components within this code)
offset1:
mflr codereg2 #move LR to codereg2
lwz codereg1,724(r31) #read initial jump velocity x component
stw codereg1,initialXoffset(codereg2) #store a copy of initial jump velocity x component
lwz codereg1,728(r31) #read initial jump velocity y component
stw codereg1,initialYoffset(codereg2) #store a copy of initial jump velocity y component
lwz codereg1,732(r31) #read initial jump velocity z component
stw codereg1,initialZoffset(codereg2) #store a copy of initial jump velocity z component
lis r4,-32660 #execute instruction originally at this address
b end1 #skip section of code used for data
initialX:
.int 0x00000000 #initial jump velocity x component is stored here
initialY:
.int 0x00000000 #initial jump velocity y component is stored here
initialZ:
.int 0x00000000 #initial jump velocity z component is stored here
end1:
.int align
.balignl 8,0

/* ASM insert for replacing jump velocity with initial jump velocity */

.set codeaddress,0x80388E44

.set buttonsaddr,0x80750A00

.if buttonsaddr<<16>>16>=0x8000
.set buttonsaddrhigh,buttonsaddr>>16+1
.set buttonsaddrlow,buttonsaddr<<16>>16-0x10000
.else
.set buttonsaddrhigh,buttonsaddr>>16
.set buttonsaddrlow,buttonsaddr<<16>>16
.endif

.set length,end2-start2
.set align,(length%8==0)*-0x60000000
.set numlines,(length+4)/8+(length%8==0)*-1

.set initialXoffset,initialX-offset2
.set initialYoffset,initialY-offset2
.set initialZoffset,initialZ-offset2

.set codereg1,3
.set codereg2,4

.int codeaddress<<7>>7|0xC2000000
.int numlines

start2:
lis codereg2,buttonsaddrhigh
lwz codereg2,buttonsaddrlow(codereg2) #read address that contains current pressed buttons to codereg2
rlwinm. codereg2,codereg2,0,20,20 #check if button A is pressed
beq- endCode #if button A is not pressed, end code
bl 0x04 #move address of next instruction to LR (offsets off of that address will be used for reading initial jump velocity components contained within above C2 code)
offset2:
mflr codereg2 #move LR to codereg2
lwz codereg1,initialXoffset(codereg2) #read initial jump velocity x component
stw codereg1,724(r31) #replace current velocity x component with initial jump velocity x component
lwz codereg1,initialYoffset(codereg2) #read initial jump velocity y component
stw codereg1,728(r31) #replace current velocity y component with initial jump velocity y component
lwz codereg1,initialZoffset(codereg2) #read initial jump velocity z component
stw codereg1,732(r31) #replace current velocity z component with initial jump velocity z component
endCode:
mr r3,r31 #execute instruction originally at this address
end2:
.int align
.balignl 8,0

This assembly can then be put into a program like PyiiASMH (http://wiird.l0nk.org/forum/index.php/topic,4845.0.html) and the assembler will calculate ALL of the offsets, which not only saves time but also reduces the chance of the code not working correctly due to human error (it is really easy to accidentally get an offset wrong and very hard to track down where mistakes like that are).
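The header and line-count arithmetic that the `.set` lines above hand to the assembler can be checked in a short Python sketch. This illustrates the arithmetic only; it is not part of PyiiASMH. It assumes what the listings in this thread show: every C2 body ends with a 00000000 word that the code handler turns into the branch back. GNU as evaluates a true comparison to -1, which is why `(length%8==0)*-1` adds a line when the body already fills whole lines.

```python
def c2_layout(code_address: int, length: int) -> tuple:
    """Return (header word, line count, needs_nop) for a C2 body of `length` bytes.

    Mirrors brkirch's .set lines: when the body already fills whole 8-byte lines,
    a padding nop (0x60000000) is added so the final 00000000 word starts a new line.
    """
    if length % 4:
        raise ValueError("PowerPC code is made of 4-byte words")
    needs_nop = length % 8 == 0
    num_lines = (length + 4) // 8 + (1 if needs_nop else 0)
    # Same as codeaddress<<7>>7|0xC2000000 on 32 bits: 0x81xxxxxx becomes C3xxxxxx.
    header = 0xC2000000 | (code_address & 0x01FFFFFF)
    return header, num_lines, needs_nop
```

Both of brkirch's bodies are 52 bytes (13 words), so each needs 7 lines and no padding nop. A 140-byte body gives the 0x12 lines that appear in a header word like 00000012.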


Title: Re: LUT discussion.
Post by: wiiztec on June 08, 2010, 01:31:12 AM
Woah, 400 lines of assembly? That's like 200 code lines. I hope that already includes Yoshi transformations; otherwise the final code could be too long to use.


Title: Re: LUT discussion.
Post by: dcx2 on June 08, 2010, 02:18:34 AM
A look-up table takes an index into the table as its input, and its output is the value stored at that index.  As a result, an operation that uses a LUT is extremely fast and extremely flexible: the only operation is a read, and you can put arbitrary data into the table.  This is what enables a code like the Surface Swapper, which permits a many-to-many mapping of surfaces (as opposed to "many to one", i.e. all surfaces become ground, or "one to one", which would replace only lava with ice).

Unfortunately, in order to use a LUT, your input needs to be a valid index: it must run from 0 to some maximum value without gaps.  For large maximum values this can make a table impractically big, although you can fit quite a lot of values into a table if each entry is a single byte, in which case 8 entries fit on one code line.

As hetoan mentioned, if you have an irregular set of inputs, you can apply a mathematical transform to the inputs so that they map onto a contiguous range 0 -> n.  For instance, with inputs 0x00-0x15 and 0x6F-0x80: if the input is less than 0x16, go straight to the LUT.  If the input is greater than 0x6E but less than 0x81, subtract 0x59 from it, THEN go to the LUT (so 0x6F becomes index 0x16, right after the first range).
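A Python sketch of that remapping, using the two ranges from the post. The table contents are a placeholder in which every input maps to itself; a real code would put its replacement values there.

```python
from typing import Optional

def remap(value: int) -> Optional[int]:
    """Fold inputs 0x00-0x15 and 0x6F-0x80 onto the contiguous indices 0x00-0x27."""
    if 0 <= value <= 0x15:
        return value            # first range goes straight to the LUT
    if 0x6F <= value <= 0x80:
        return value - 0x59     # 0x6F lands on 0x16, right after the first range
    return None                 # not an input this table handles

# 0x28 entries instead of 0x81. Placeholder contents: every input maps to itself.
LUT = bytes(list(range(0x00, 0x16)) + list(range(0x6F, 0x81)))

def lookup(value: int) -> int:
    index = remap(value)
    return value if index is None else LUT[index]
```

The gap 0x16-0x6E costs two comparisons instead of 0x59 table bytes. Inputs outside both ranges are left alone, as the ASM would do by branching past the lbzx.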

If a contiguous range of inputs all produce the same output, you can cut that range from the LUT.  For instance, if the first twenty values should look up a 0, you can just test for an input in that range and spit out 0 without going to the table.
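A Python sketch of cutting a flat range out of the table, assuming (as in the post) that inputs 0-19 should all produce 0. The four entries of `TAIL_LUT` are made-up values.

```python
FLAT_END = 20  # inputs 0..19 all produce 0, so they get no table entries

# Only inputs 20 and up need real entries; index = input - FLAT_END. Contents are made up.
TAIL_LUT = bytes([5, 9, 2, 7])

def lookup(value: int) -> int:
    if value < FLAT_END:
        return 0                          # the cut range: answer without touching the table
    return TAIL_LUT[value - FLAT_END]     # shifted index into the shorter table
```

The table shrinks by twenty entries at the cost of one compare and one subtract.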

If you have a sparse set of inputs, you can iterate through a map.  Split memory into 16-bit chunks; the first 8 bits are the "InputChunk" and the second 8 bits are the "OutputChunk".  Your ASM will check an InputChunk; if it does not match the current input, it will test the next InputChunk.  It will continue until it either finds a matching InputChunk, or it reaches the end of the map.  If it finds a matching InputChunk, it will spit out the corresponding OutputChunk.  If it reaches the end of the map, it leaves the input alone.
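The InputChunk/OutputChunk map can be sketched in Python like this. The map holds only the pairs you want to change; the 0x0A -> 0x21 pair is the lava-to-ice swap from earlier, and the 0x0D -> 0x00 pair is a hypothetical second entry.

```python
def map_lookup(pairs: bytes, value: int) -> int:
    """Walk 2-byte (InputChunk, OutputChunk) entries and return the output for `value`,
    or `value` unchanged if no InputChunk matches before the end of the map."""
    for i in range(0, len(pairs) - 1, 2):
        if pairs[i] == value:
            return pairs[i + 1]
    return value

# Hypothetical map: lava 0x0A -> ice 0x21, and surface 0x0D -> 0x00.
SURFACE_MAP = bytes([0x0A, 0x21, 0x0D, 0x00])
```

Lookup time now grows with the number of entries, but the map stays small no matter how large the input IDs are.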

This would be the case if there were, say, two hundred surfaces, and I only wanted to swap the deadly surfaces out.  Instead of filling a LUT with 200 possibilities, my map only has the inputs that I want to alter, and their corresponding outputs, without all the values in between that the LUT requires to be continuous.


Title: Re: LUT discussion.
Post by: hetoan2 on June 08, 2010, 10:44:14 AM
@dcx2, I never thought of doing an input/output type system, although if it's not a lot of values you're changing you might be better off doing cmpwi and such.

@brkirch, thanks for that program you listed. Offsets are what always get me with branches and stuff >_<

It's a shame I didn't see it before :S


Title: Re: LUT discussion.
Post by: hetoan2 on June 10, 2010, 10:43:52 AM
Hey, I was just wondering why this isn't working:


bl 0x0004
mflr r17
lwz r0,260(r3)
cmpwi r0,0x4F
bgt branch
addi r17,r17,0x1C
lbzx r0,r17,r0
branch:
b 0x0058

00010203 04050607
08090A0B 0C0D0E0F
10111213 14151617
18191A1B 1C1D1E1F
20212223 24252627
28292A2B 2C2D2E2F
30313233 34353637
38393A3B 3C3D3E3F
40414243 44454647
48494A4B 4C4D4E4F

nop

I'm just wondering how come this doesn't work while cmpwi on individual values does.

I think the ASM is used for more than the ammo values I'm trying to swap in and out of it.
That shouldn't matter, though, as I haven't changed any of the values.

The instruction I wrote over is lwz r0,260(r3), and the next address stores the bytes.


Title: Re: LUT discussion.
Post by: James0x57 on June 10, 2010, 11:19:16 AM
That should work fine as it is IF the LR wasn't being used and r17 is free to use.
(As a heads up, C0 codetype uses the LR- but I see you were using C2 for this)

I assume you made sure r17 was free to use so this is how you'd make it okay to change the LR (as you did):

mflr r0 #back up LR
bl 0x0004 #change LR
mflr r17 #back up new LR
mtlr r0 #restore old LR
lwz r0,260(r3)
cmpwi r0,0x4F
bgt branch
addi r17,r17,0x1C
lbzx r0,r17,r0
branch:
b 0x0058 #this goes past the nop, but that should be a branch back to the routine you hijacked, so it's okay

00010203 04050607
08090A0B 0C0D0E0F
10111213 14151617
18191A1B 1C1D1E1F
20212223 24252627
28292A2B 2C2D2E2F
30313233 34353637
38393A3B 3C3D3E3F
40414243 44454647
48494A4B 4C4D4E4F

nop



[edit] Just in case it was a silly mistake instead: make sure you counted your data as part of the lines for the C2 count at the top of your code. ;)


Title: Re: LUT discussion.
Post by: dcx2 on June 10, 2010, 01:54:41 PM
In addition to what James mentions (back up LR in case you're in a leaf function; remember to count the data lines as part of the C2 code's length; your b 0x58 overshoots the nop by one instruction, which should land on the branch back)...

Some instructions treat r0 specially when it appears as an operand; lbzx is one of them.

http://pds.twi.tudelft.nl/vakken/in1200/labcourse/instruction-set/lbzx.html

In this case, if rA == r0, then instead of using the value in r0, it uses an actual 0.  However, it looks like your lbzx is using rB == r0, in which case you should be safe.
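The (rA|0) rule can be sketched in Python. The register values are made up; only the rule itself, that a zero rA field means the literal value 0, comes from the instruction reference.

```python
def lbzx_ea(gpr: list, ra: int, rb: int) -> int:
    """Effective address of `lbzx rD,rA,rB`: (rA|0) + rB, kept to 32 bits.

    A zero rA field means the literal value 0, not the contents of r0;
    the rB field has no such special case.
    """
    base = 0 if ra == 0 else gpr[ra]
    return (base + gpr[rb]) & 0xFFFFFFFF

gpr = [0] * 32
gpr[0] = 0x05          # index loaded by lwz r0,260(r3) (made-up value)
gpr[17] = 0x80001000   # table pointer from mflr r17 (made-up address)
```

`lbzx r0,r17,r0` addresses table + 5 as intended. Swapping the operands to `lbzx r0,r0,r17` silently drops the index.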

Also, be careful: if 260(r3) is signed, then your cmpwi would let negative values through, and they would index before the start of the table, which could cause problems.


Title: Re: LUT discussion.
Post by: James0x57 on June 10, 2010, 02:59:38 PM
Quote from: dcx2 on June 10, 2010, 01:54:41 PM
Also, be careful; if 260(r3) is signed, then your cmpwi would permit the table to be used with negative indices, which could make problems.

In that case you should use cmplwi instead, as it performs an unsigned comparison. Then the only values that make it past the bgt instruction are [0, 4F].
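The difference between the two compares can be sketched in Python, taking a word of 0xFFFFFFFF (-1 when read as signed) as the example input.

```python
def as_signed32(x: int) -> int:
    """Interpret a 32-bit register value as a signed integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

def passes_cmpwi(value: int, limit: int = 0x4F) -> bool:
    """True if `bgt` after `cmpwi rX,limit` falls through: value <= limit as signed."""
    return as_signed32(value) <= limit

def passes_cmplwi(value: int, limit: int = 0x4F) -> bool:
    """Same test after `cmplwi rX,limit`: value <= limit as unsigned 32-bit."""
    return (value & 0xFFFFFFFF) <= limit
```

With cmpwi, 0xFFFFFFFF counts as -1 and reaches the lbzx. With cmplwi it counts as 4294967295 and takes the bgt.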


Title: Re: LUT discussion.
Post by: hetoan2 on June 10, 2010, 05:26:08 PM
Well, at least it's not freezing now. It's a little buggy for some reason and keeps writing 01 to the address regardless, but I think it's because the instruction is used for multiple things.

When I fix it or need more help, I'll let you know.

By the way, it froze with lbzx r0,r17,r0.

I just added mr r15,r0 at the beginning and mr r0,r15 at the end.


Title: Re: LUT discussion.
Post by: dcx2 on June 10, 2010, 05:42:14 PM
Why are you doing mr r15,r0 and mr r0,r15?

Set an execute breakpoint on the instruction immediately before the one your C2 hooks.  When you hit that breakpoint, send the codes to the game while it is paused.  Then go back to the Breakpoint tab and Step Into; you should catch the very first execution of your C2 code.  Follow it through and see what's happening.

You say it freezes on lbzx r0,r17,r0.  Make sure r17+r0 points to your LUT.  If it does, but you still freeze, then you should try using something other than r0.


Title: Re: LUT discussion.
Post by: hetoan2 on June 10, 2010, 06:36:09 PM
That was the case; I can't recall what it was picking up, but meh.

Either way, James, with the added instructions in the code, you need to change the line addi r17,r17,0x1C to addi r17,r17,0x20, right?



Title: Re: LUT discussion.
Post by: hetoan2 on June 11, 2010, 01:03:20 AM
So I want the code to only work when r4 points to the address that contains the ammo value, which is 8155A640.

So I used this:

mflr r0
bl 0x0004
mflr r17
mtlr r0
lwz r15,260(r3)
cmplwi r15,0x4F
bgt- branch
lis r23,0x8155
ori r23,r23,0x6A40
cmpw r4,r23
bne branch
addi r17,r17,48
lbzx r15,r17,r15
branch:
b 0x0054

data 00-4F

mr r0,r15

which freezes; however, this code works (but is glitchy because the ASM is used in different things as well)


mflr r0
bl 0x0004
mflr r17
mtlr r0
lwz r15,260(r3)
cmplwi r15,0x4F
bgt- branch
addi r17,r17,32
lbzx r15,r17,r15
branch:
b 0x0054

data 00-4F

mr r0,r15

any reason why?

by the way r23 is free as far as i can tell


Title: Re: LUT discussion.
Post by: James0x57 on June 11, 2010, 01:33:44 AM
The "mr r0,r15" is not needed at all..?

Quote from: hetoan2 on June 10, 2010, 06:36:09 PM
Either way James, with the added instructions in the code, you need to change the line addi r17,r17,0x1C to addi r17,r17,0x20 right?
Oh balls, yes you do. =)


You don't need to use r23- I can't see why else it would freeze anyway:

lis r15,0x8155
ori r15,r15,0x6A40
cmpw r4,r15
lwz r15,260(r3)
bne branch

mflr r0
bl 0x0004
mflr r17
mtlr r0
cmplwi r15,0x4F
bgt branch
addi r17,r17,0x1C
lbzx r15,r17,r15
branch:
b 0x0054

data 00-4F

nop


2 more things:
1) Are you sure r17 is free?
2) Are you hijacking into this in the middle of a comparison? You may need to backup the CR (condition register)! [This would be very likely to cause a crash!]



Did you know that you can go to the bp tab on a crash and set a bpe to see where it crashed in certain cases? Might be useful!


Title: Re: LUT discussion.
Post by: hetoan2 on June 11, 2010, 01:38:50 AM
I think that r23 actually wasn't free; I changed it to r19, which is free, to be safe.
r17 is definitely free, but I changed it to r18 for safety anyway, which is also free.
I am not hijacking mid-comparison.
The mr r0,r15 is there because I changed the lwz r0 to lwz r15, since lbzx r0 was messing up.
And the code is starting to work better; however, if it is activated at the beginning of the disc boot-up it will freeze when starting a game, but it's fine if activated after the game starts :S


Edit: It seems to only freeze when you're not alive, like when the game is starting or if you kill yourself.


Title: Re: LUT discussion.
Post by: dcx2 on June 11, 2010, 03:34:25 AM
James made a good point, never hook between a comparison and its branch.

In addition, it's possible that the ASM instructions are being swapped out with something else temporarily, making your branch overwrite something unrelated.  I saw Resident Evil 4 switch out ASM depending on what gun was armed.  When the game starts (the point where the code would normally crash), make sure the ASM is what you expect it to be.  In the event that the ASM is changing, you should be able to use an F2/F4 code to make sure you're only hooking when it's appropriate.

However, much more likely is that registers look safe, but aren't, or change safety over the course of the game's execution.  If you can't ascertain 110% the safety of a register, then don't guess.  Just make a stack frame.  (for completeness sake, this is not how you really should make a stack frame.  Normally, after storing the LR in r0, you would push r0 onto the stack, but we can cheat because we only need to cache the LR for a few instructions.  I also allocate plenty of extra space because I'm paranoid)

---

mflr r0           # save LR
stwu r1,-32(r1)   # allocate room for a stack frame
stmw r29,8(r1)    # save r29-r31 in the frame so we can use them freely

bl 0x04           # get address of next instruction
table_pointer:
mflr r29          # r29 = table pointer
mtlr r0           # restore LR

lwz r30,260(r3)   # load index
cmplwi r30,0x4F   # only interested in r30 <= 0x4F
bgt- PopStackFrame

lis r31,0x8155    # r31 = ammo pointer
ori r31,r31,0x6A40
cmpw r4,r31       # only interested in r4 == ammo pointer
bne- PopStackFrame

addi r29,r29,data_offset-table_pointer
lbzx r30,r29,r30  # do the look-up
b PopStackFrame

data_offset:
.word 0x0001
.word 0x0203
.word 0x0405
.word 0x0607
.word 0x0809
.word 0x0A0B
.word 0x0C0D
.word 0x0E0F
.word 0x1011
.word 0x1213
.word 0x1415
.word 0x1617
.word 0x1819
.word 0x1A1B
.word 0x1C1D
.word 0x1E1F
.word 0x2021
.word 0x2223
.word 0x2425
.word 0x2627
.word 0x2829
.word 0x2A2B
.word 0x2C2D
.word 0x2E2F
.word 0x3031
.word 0x3233
.word 0x3435
.word 0x3637
.word 0x3839
.word 0x3A3B
.word 0x3C3D
.word 0x3E3F

PopStackFrame:
mr r0,r30       # make sure r0 is holding the value of interest
lmw r29,8(r1)   # restore r29-r31 from the frame
addi r1,r1,32   # release stack frame memory


---

It should turn into the following code.  PyiiASMH will calculate all the offsets for you - even the offset between bl/mflr and the table data!

C2000000 00000012
7C0802A6 9421FFE0
BFA10008 48000005
7FA802A6 7C0803A6
83C30104 281E004F
41810060 3FE08155
63FF6A40 7C04F800
40820050 3BBD0030
7FDDF0AE 48000044
00010203 04050607
08090A0B 0C0D0E0F
10111213 14151617
18191A1B 1C1D1E1F
20212223 24252627
28292A2B 2C2D2E2F
30313233 34353637
38393A3B 3C3D3E3F
7FC0F378 BBA10008
38210020 00000000
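The stwu/stmw and lmw/addi pairs in the listing above can be modeled with a tiny Python machine, to show why r29-r31 come back untouched. The stack address and register values are made up, and memory is modeled as a dictionary of 32-bit words. This is a sketch of the idea, not an emulator.

```python
class Cpu:
    """Tiny model of the registers and memory touched by the stack frame above."""

    def __init__(self, sp: int) -> None:
        self.gpr = [0] * 32
        self.gpr[1] = sp                 # r1 is the stack pointer
        self.mem = {}                    # address -> 32-bit word

    def stwu(self, rs: int, d: int, ra: int) -> None:
        ea = (self.gpr[ra] + d) & 0xFFFFFFFF
        self.mem[ea] = self.gpr[rs]      # stwu r1,-32(r1): old r1 becomes the back chain
        self.gpr[ra] = ea                # ...and r1 now points at the new frame

    def stmw(self, rs: int, d: int, ra: int) -> None:
        ea = self.gpr[ra] + d
        for r in range(rs, 32):          # stores rS..r31, one word each
            self.mem[ea] = self.gpr[r]
            ea += 4

    def lmw(self, rd: int, d: int, ra: int) -> None:
        ea = self.gpr[ra] + d
        for r in range(rd, 32):          # loads rD..r31 back from the same slots
            self.gpr[r] = self.mem[ea]
            ea += 4

    def addi(self, rd: int, ra: int, imm: int) -> None:
        self.gpr[rd] = (self.gpr[ra] + imm) & 0xFFFFFFFF

cpu = Cpu(sp=0x80400000)                    # made-up stack address
cpu.gpr[29:32] = [0x11, 0x22, 0x33]         # the game's values we must preserve
cpu.stwu(1, -32, 1)                         # stwu r1,-32(r1)
cpu.stmw(29, 8, 1)                          # stmw r29,8(r1)
cpu.gpr[29:32] = [0xAAAA, 0xBBBB, 0xCCCC]   # the insert freely clobbers r29-r31
cpu.lmw(29, 8, 1)                           # lmw r29,8(r1)
cpu.addi(1, 1, 32)                          # addi r1,r1,32
```

After the last line, r29-r31 and r1 hold their original values again, whatever the insert did to them in between.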


Title: Re: LUT discussion.
Post by: James0x57 on June 11, 2010, 04:35:25 AM
Quote from: dcx2 on June 11, 2010, 03:34:25 AM
James made a good point, never hook between a comparison and its branch.
No no no, never said that. Just back up the CR if you do!


.long 0x00010203
.long 0x04050607
.long 0x08090A0B
.long 0x0C0D0E0F
.long 0x10111213
.long 0x14151617
.long 0x18191A1B
.long 0x1C1D1E1F
.long 0x20212223
.long 0x24252627
.long 0x28292A2B, 0x2C2D2E2F #this works too!
.long 0x30313233, 0x34353637
.long 0x38393A3B, 0x3C3D3E3F

Fixed ;)


And I had no idea you could add or subtract labels (and constants) like that! That's great! Thanks!


Title: Re: LUT discussion.
Post by: brkirch on June 11, 2010, 07:31:36 AM
I would simply recommend avoiding nonvolatile registers (r14-r31) altogether if you don't make a stack frame to backup and restore them.  Nonvolatile registers are not supposed to change between function calls so usually nonvolatile registers are not free unless they are used within the function being hooked and the next instruction in that function with that register overwrites it without using its value.  BTW there shouldn't be a problem with hooking over a comparison as the comparison is executed at the end of the ASM insert (the branch instruction executed at the end of the ASM insert will not change CR).

Quote from: James0x57 on June 11, 2010, 04:35:25 AM
And I had no idea you could add or subtract labels (and constants) like that! That's great! Thanks!

You obviously missed the post I made earlier in this topic (http://wiird.l0nk.org/forum/index.php/topic,5977.msg51764.html#msg51764)...  ;)


Title: Re: LUT discussion.
Post by: hetoan2 on June 11, 2010, 10:37:27 AM
Thanks for the help you guys :D the code actually worked!

Sorry for being such a noob, but could you link me to a topic on how to make a stack frame, or explain it?

I've never had to use a stack frame before because generally I have enough free registers, but it should be useful.

I understand the general idea behind a stack frame; I just don't get what's going on and how to apply it in a universal situation.

Also, shouldn't this: cmplwi r30,0x4F be cmplwi r30,0x3F, if you're only writing up to 3F?

Not to pick on typos.
 


Title: Re: LUT discussion.
Post by: dcx2 on June 11, 2010, 01:08:45 PM
I have been owned on the "don't-hook-a-cmp", haha.

Yeah, hetoan, I typoed the 4f/3f deal.

For more info on stack frames, see section 5 of the PowerPC Application Binary Interface.  It goes into detail about the stack frame and has some pretty pictures to boot!

http://www.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF77852569970071B0D6/$file/eabi_app.pdf

EDIT: brkirch, I think the ASM example you posted with all the .set etc might have been a little too verbose.  It's quite formidable when you're not used to looking at ASM that way.  Despite that, I highly suggest folks go back and carefully read through, because there are a lot of neat tricks, like declaring a "register variable" at the top of the code and then using that register variable so you can easily change what regs it uses.


Title: Re: LUT discussion.
Post by: James0x57 on June 11, 2010, 02:37:24 PM
Haha, thanks Brkirch. ;) I only skimmed the post the first time- was kinda busy and noticed there was a lot to take in!

Quote from: dcx2 on June 11, 2010, 01:08:45 PM
EDIT: brkirch, I think the ASM example you posted with all the .set etc might have been a little too verbose.  It's quite formidable when you're not used to looking at ASM that way.  Despite that, I highly suggest folks go back and carefully read through, because there are a lot of neat tricks, like declaring a "register variable" at the top of the code and then using that register variable so you can easily change what regs it uses.
Agreed. I went back to it too though and that register variable is what I noticed too. Pretty useful. =)
I think ...we should get the 'code' font a smidge bigger..


Title: Re: LUT discussion.
Post by: hetoan2 on June 11, 2010, 04:36:08 PM
I can't get to that file, dcx2.


Title: Re: LUT discussion.
Post by: dcx2 on June 11, 2010, 04:38:38 PM
Don't click on it...copy and paste the whole url.

...or I can just fix it so it links properly.

http://www.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF77852569970071B0D6/$file/eabi_app.pdf


Title: Re: LUT discussion.
Post by: hetoan2 on June 11, 2010, 04:59:27 PM
wow i didn't even notice that the link was broken. :\ thanks tho :P


Title: Re: LUT discussion.
Post by: dcx2 on May 22, 2011, 03:12:34 PM
I found brkirch's post above again recently and was reminded of the awesomeness of some of the tricks in it.  For instance, you probably didn't notice, but brkirch is writing two C2 codes with his ASM, with one code reading the other code's data area!  This requires using PyiiASMH in RAW mode.  Amazing!  Exactly what I was looking for myself.

However, I noticed one thing that reminded me of a Y.S. technique.  Here is the piece in particular that I'm talking about in brkirch's assembly.


.set buttonsaddr,0x80750A00

.if buttonsaddr<<16>>16>=0x8000
.set buttonsaddrhigh,buttonsaddr>>16+1
.set buttonsaddrlow,buttonsaddr<<16>>16-0x10000
.else
.set buttonsaddrhigh,buttonsaddr>>16
.set buttonsaddrlow,buttonsaddr<<16>>16
.endif

[...]

start2:
lis codereg2,buttonsaddrhigh
lwz codereg2,buttonsaddrlow(codereg2) #read address that contains current pressed buttons to codereg2

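This trick is meant to handle sign-extension of the displacement operand: lwz sign-extends its 16-bit displacement, so when bit 15 of the low half is set, the high half loaded by lis has to be one larger to compensate.  Y.S. noted via http://www.ibm.com/developerworks/library/l-ppc/ that the @ha operator can serve this purpose: address@ha is the high half already adjusted for that carry, and address@l is the matching low half.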

.set buttonsaddr,0x80750A00

[...]

start2:
lis codereg2,buttonsaddr@ha
lwz codereg2,buttonsaddr@l(codereg2)
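What the assembler computes for @ha and @l can be sketched in Python. 0x80750A00 is brkirch's button address; 0x8155A640 is the ammo address quoted earlier in the thread, used here only as an example whose bit 15 is set.

```python
def split_ha_l(address: int) -> tuple:
    """Return (address@ha, address@l) as 16-bit fields.

    @l is simply the low half. Because lwz sign-extends it, @ha rounds the high
    half up by one whenever bit 15 of the low half is set.
    """
    ha = ((address + 0x8000) >> 16) & 0xFFFF
    lo = address & 0xFFFF
    return ha, lo

def rebuild(ha: int, lo: int) -> int:
    """The address that `lis rX,ha` followed by `lwz rY,lo(rX)` actually reads."""
    signed_lo = lo - 0x10000 if lo & 0x8000 else lo
    return ((ha << 16) + signed_lo) & 0xFFFFFFFF
```

For 0x80750A00 no adjustment is needed. For 0x8155A640 the high half becomes 0x8156, and the sign-extended displacement of -0x59C0 brings the sum back down to 0x8155A640.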

---

Using macros, it is possible to "extend" the ASM to make this reusable.  I call it "lwza" for "lwz absolute".  It loads a 4-byte value from a full 4-byte address.

.macro  lwza   reg1,address
lis   \reg1, \address@ha
lwz \reg1, \address@l(\reg1)
.endm
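To check what the lwza macro expands to, here is a Python sketch that encodes the two machine words. It uses the standard encodings: lis is addis with rA = 0 (primary opcode 15), and lwz is primary opcode 32. The helper names are hypothetical. One caveat carries over from the lbzx discussion: lwz also treats an rA field of 0 as a literal 0, so `lwza r0,...` would ignore the high half. The sketch rejects that case.

```python
def ha(address: int) -> int:
    """address@ha: high half, adjusted for the sign-extended low half."""
    return ((address + 0x8000) >> 16) & 0xFFFF

def lwza(reg: int, address: int) -> list:
    """Machine words for `lis reg,address@ha` / `lwz reg,address@l(reg)`."""
    if not 1 <= reg <= 31:
        raise ValueError("reg must be r1-r31: lwz treats an rA field of 0 as literal 0")
    lis = 0x3C000000 | (reg << 21) | ha(address)                        # addis reg,0,@ha
    lwz = 0x80000000 | (reg << 21) | (reg << 16) | (address & 0xFFFF)  # lwz reg,@l(reg)
    return [lis, lwz]
```

`lwza(4, 0x80750A00)` gives 3C808075 80840A00. The lis half of `lwza(31, 0x81556A40)` is 3FE08155, the same word that appears in dcx2's stack-frame code.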