LUT discussion.

wiiztec · June 05, 2010, 04:54:27 AM

What the hell is LUT?

Romaap · June 05, 2010, 10:29:07 AM

Lookup Table; it's just a chunk of values.
These values can then be used by doing something like:


if button is pressed
increase X
endif
write ( [start_of_lookup_table] + X ) to ________

wiiztec · June 05, 2010, 01:52:19 PM

Is it static in memory with a bunch of floating points? If so, I've used values from it in several of my codes.

dcx2 · June 05, 2010, 02:07:40 PM

A Look-Up Table (LUT) is a way to speed up complex calculations. For example, when you punch in cos 30 into your calculator, it uses the 30 as an index into a cosine LUT, and at index 30 is the resulting value. When you "look up" the value for 30, you will get the resulting value for cos 30. (this is actually a lie, real calculators probably use LUT/interpolation or better yet Taylor expansions)
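As a rough illustration of the calculator example (a Python model only; the table name, its one-entry-per-degree size, and the function name are assumptions made for the sketch), the angle is used directly as the index:

```python
import math

# Hypothetical table: one precomputed cosine per whole degree, 0..359.
COS_LUT = [math.cos(math.radians(d)) for d in range(360)]

def cos_deg(degrees: int) -> float:
    """Look up cos(degrees) instead of computing it; the angle is the index."""
    return COS_LUT[degrees % 360]

print(cos_deg(30))
```

All the work is done once when the table is built; each later "calculation" is a single indexed read.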

The blue lines in my Surface Swapper are the LUT embedded in the code (bl/mflr technique courtesy of brkirch). The ASM uses the current surface value to index into the LUT, and then uses the value at that index as the new surface value.

If you didn't modify the LUT, then when you stand on, say, lava (0x0A), the lbzx will index to the 11th cell of the array (arrays start counting at 0). The unmodified code naturally has a 0x0A at this index. If you want to replace lava with ice, you would simply change the 0x0A in the LUT into 0x21. Then, when the lbzx indexes to the 11th cell, it finds a 0x21 instead of a 0x0A. This is how it does 0x0A -> 0x21. The best part about the LUT is that you can do this for EVERY surface individually.
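The swap described above can be modelled in a few lines of Python (a sketch, not the actual Surface Swapper; the 0x50-entry table size and the function name are assumptions, only the 0x0A and 0x21 values come from the post):

```python
# Identity table: every surface ID maps to itself (the "unmodified" LUT).
# 0x50 entries is an arbitrary size chosen for the example.
surface_lut = bytearray(range(0x50))

LAVA, ICE = 0x0A, 0x21   # values quoted in the post above

def swap_surface(current: int) -> int:
    """Model of the lbzx: read the byte at table base + current surface."""
    return surface_lut[current]

before = swap_surface(LAVA)   # unmodified table returns the input unchanged
surface_lut[LAVA] = ICE       # edit one cell of the table
after = swap_surface(LAVA)    # same lookup now yields ice
print(hex(before), hex(after))
```

Every other cell is untouched, so all other surfaces still map to themselves until their own cell is edited.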

hetoan2 · June 07, 2010, 12:43:42 AM

@dcx2,

how would you go about making a look up table for any general situation?

I'm not really sure what this is all about:

bl 0x0004 # get pointer to next instruction
mflr r4 # put pointer in r4

Are those specific to Mario Galaxy 2 or what? I don't really understand what it is doing.

EDIT: I think those are to figure out the addresses that change because of the position it is loaded into the code handler...
Am I right?

Also, is there a limit to the number of lines a C2 code can have? I know I tried to make a very long (150ish lines) code that was 06/07 codetype, but it just froze. So there has to be a limit for that, right? So how about C2?

If you made a LUT for all values from 00 - FF it would be very long, that's why I'm asking :S

Romaap · June 07, 2010, 06:38:03 AM


Your edit is right. What that does is branch to Program Counter + 4 (the next instruction) and put the next instruction's address in the Link Register.
The second line copies the Link Register to r4, so now r4 holds the address of the code (specifically, the address of the mflr instruction itself).

dcx2 · June 07, 2010, 04:27:17 PM

hetoan, the bl/mflr technique is not game-specific, nor is the concept of a LUT, but it's extremely off-topic here...if you start a thread in Wii Game Hacking Help, I will elaborate. A LUT for 0x00 through 0xFF could be large, but there are methods for mitigating the size, and there are also situations where a LUT won't work at all, but sometimes there are other ways to use something like a LUT.

hetoan2 · June 07, 2010, 06:28:23 PM

Split topic. My apologies on not doing so before.

dcx2 · June 07, 2010, 07:36:10 PM

Oh neat, I thought you could do that but I wasn't sure exactly how. I could have used this Split Topic thing before...I'll give a thorough explanation here, but some of this has already been covered.

bl = branch and link. Branch means to jump program execution to the specified address. The address of the currently executing instruction is held in what is generically referred to as the Program Counter (PowerPC has no directly readable PC register; what a debugger shows as SRR0 is the copy saved when an exception such as a breakpoint is taken). Normally, after an instruction is executed, the Program Counter (PC) is incremented by 4 (the number of bytes in an instruction; since the PowerPC is a RISC architecture, all instructions are the same size). However, for a branch, the PC is incremented or decremented by a different value (relative branch), or it might even be overwritten altogether (absolute branch).

The Link part comes in with the Link Register, LR. Usually we just branch and we never go back to the branch point. Sometimes, after branching somewhere, you want to come back to where you were before the branch (for example, function calls). The bl instruction will put PC + 4 into the LR (i.e. the address of the next instruction), so that execution can continue when the program "links back" using blr.

The beauty of bl 4/mflr is that it branches to PC + 4, which is what normal program execution would do anyway (so we do not need the matching blr), but it also puts PC + 4 into the LR. This is what provides us with the location of the instructions in memory. It's a lot like the 4E code type in that regard. We then use mflr (Move From Link Register) to copy that address, i.e. the address of the instruction after the bl (the mflr itself), into a general-purpose register.

brkirch demonstrated this technique with his Moon Jump code for Super Mario Galaxy 2. Instead of using three Gecko Registers or some arbitrary (hopefully unused) address to store the "launch velocity vector" of the jump, he padded his C2 code with some blank lines, and made the C2 code branch over the blank lines. Now he has a very small "data area" where he can store the velocity vector, literally embedded in the C2 code. In order to gain access to this data (because as you said, it will change location depending on the codes that you are using, etc) he needs a pointer, and he gets the pointer using bl/mflr.
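A small Python model may help show why this works wherever the code ends up in memory. It is only a sketch: the function names, the load addresses and the 0x20 data offset are made up for illustration.

```python
def bl(pc: int, offset: int):
    """Model of 'bl offset': returns (new PC, new LR). LR is always PC + 4."""
    return pc + offset, pc + 4

def data_address(load_address: int, data_offset: int) -> int:
    """Address of embedded data for a code whose 'bl 4' sits at load_address.

    data_offset is measured from the instruction after the bl (the mflr),
    because that is the address the bl leaves in LR.
    """
    new_pc, lr = bl(load_address, 4)
    assert new_pc == lr          # 'bl 4' simply falls through to the next instruction
    return lr + data_offset

# The same code loaded at two different (hypothetical) addresses:
for base in (0x80002000, 0x80002F48):
    print(hex(base), "->", hex(data_address(base, 0x20)))
```

The offset from the mflr to the data is fixed when the code is assembled; only the base changes, and LR supplies that at run time.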

I will post again with more details specifically about how to use a Look Up Table, what the limitations are, and some ways to get around those limitations.

hetoan2 · June 07, 2010, 08:37:05 PM

Ah yes, I completely understand now :D

thanks.

I mean you can skip a set of values in theory using cmpwi's, and then if it's in the next range you just do some set mathematical operations to make it uniform, no?

That way if it goes from 00 - 15 and then from 6F - 80 you can just have it recognize when the value is at 6F or above and then have it follow the rest of the table, without some 20 extra lines of code.

As long as the table swaps the values then it's okay.

Thanks... Also, when I split the topics it put this under Codes. I can move it to Wii Hacking/Help if you'd like; I didn't want to move it in case you end up losing this thread's location :S

brkirch · June 08, 2010, 02:06:35 AM

Quote from: hetoan2 on June 07, 2010, 12:43:42 AM
Also is there a limit to the number of lines a C2 code can have? I know i tried to make a very long (150ish lines) code that was 06/07 codetype, but it just froze. So there has to be a limit for that right? So how about C2?

If you made a LUT for all values from 00 - FF it would be very long, thats why i'm asking :S

There is no limit other than what can fit into memory. That is why the F2/F4 code types were necessary instead of having the XOR checksums simply added directly to the C2 code type (the second part of a C2 code is read as a 32-bit line count).

Quote from: paprika_killer on June 07, 2010, 09:33:05 PM
Also, bl is often used to call "functions" right?

Yes, although sometimes bctrl or even bctr is used instead. If the called function makes function calls of its own, it first saves the LR value to the stack. If the LR value has not been saved to the stack at the point where your C2 code runs, and the code uses a lookup table, then you need to copy LR to another register with mflr before using bl 0x4 and mflr, and restore LR at the end of the code using mtlr.

In the Super Mario Galaxy 2 code topic I posted commented disassembly of the moon jump code I created. The problem with it, however, was that the offsets used to access the data had to be calculated manually, which can be a lot of work for longer codes (like the SMG2 transformation code I will be releasing soon, which has over 400 lines of assembly) and has to be redone every time the code length changes. The solution I found was to rewrite the codes like this:

/* ASM insert for storing initial jump velocity */

.set codeaddress,0x803A2FE0

.set length,end1-start1
.set align,(length%8==0)*-0x60000000
.set numlines,(length+4)/8+(length%8==0)*-1

.set initialXoffset,initialX-offset1
.set initialYoffset,initialY-offset1
.set initialZoffset,initialZ-offset1

.set codereg1,3
.set codereg2,4

.int codeaddress<<7>>7|0xC2000000
.int numlines

start1:
bl 0x4 #move address of next instruction to LR (offsets off of that address will be used for storing initial jump velocity components within this code)
offset1:
mflr codereg2 #move LR to codereg2
lwz codereg1,724(r31) #read initial jump velocity x component
stw codereg1,initialXoffset(codereg2) #store a copy of initial jump velocity x component
lwz codereg1,728(r31) #read initial jump velocity y component
stw codereg1,initialYoffset(codereg2) #store a copy of initial jump velocity y component
lwz codereg1,732(r31) #read initial jump velocity z component
stw codereg1,initialZoffset(codereg2) #store a copy of initial jump velocity z component
lis r4,-32660 #execute instruction originally at this address
b end1 #skip section of code used for data
initialX:
.int 0x00000000 #initial jump velocity x component is stored here
initialY:
.int 0x00000000 #initial jump velocity y component is stored here
initialZ:
.int 0x00000000 #initial jump velocity z component is stored here
end1:
.int align
.balignl 8,0

/* ASM insert for replacing jump velocity with initial jump velocity */

.set codeaddress,0x80388E44

.set buttonsaddr,0x80750A00

.if buttonsaddr<<16>>16>=0x8000
.set buttonsaddrhigh,buttonsaddr>>16+1
.set buttonsaddrlow,buttonsaddr<<16>>16-0x10000
.else
.set buttonsaddrhigh,buttonsaddr>>16
.set buttonsaddrlow,buttonsaddr<<16>>16
.endif

.set length,end2-start2
.set align,(length%8==0)*-0x60000000
.set numlines,(length+4)/8+(length%8==0)*-1

.set initialXoffset,initialX-offset2
.set initialYoffset,initialY-offset2
.set initialZoffset,initialZ-offset2

.set codereg1,3
.set codereg2,4

.int codeaddress<<7>>7|0xC2000000
.int numlines

start2:
lis codereg2,buttonsaddrhigh
lwz codereg2,buttonsaddrlow(codereg2) #read address that contains current pressed buttons to codereg2
rlwinm. codereg2,codereg2,0,20,20 #check if button A is pressed
beq- endCode #if button A is not pressed, end code
bl 0x04 #move address of next instruction to LR (offsets off of that address will be used for reading initial jump velocity components contained within above C2 code)
offset2:
mflr codereg2 #move LR to codereg2
lwz codereg1,initialXoffset(codereg2) #read initial jump velocity x component
stw codereg1,724(r31) #replace current velocity x component with initial jump velocity x component
lwz codereg1,initialYoffset(codereg2) #read initial jump velocity y component
stw codereg1,728(r31) #replace current velocity y component with initial jump velocity y component
lwz codereg1,initialZoffset(codereg2) #read initial jump velocity z component
stw codereg1,732(r31) #replace current velocity z component with initial jump velocity z component
endCode:
mr r3,r31 #execute instruction originally at this address
end2:
.int align
.balignl 8,0

This assembly can then be put into a program like PyiiASMH and the assembler will calculate ALL of the offsets, which not only saves time but also reduces the chance of the code not working correctly due to human error (it is really easy to accidentally get an offset wrong, and very hard to track down where mistakes like that are).
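For anyone who wants to sanity-check what the .set arithmetic works out to, here is a Python restatement of two of the calculations (a sketch with made-up function names; it assumes the assembler evaluates a true comparison as -1, which is what the *-0x60000000 and *-1 factors appear to rely on):

```python
NOP = 0x60000000

def c2_layout(length: int):
    """Return (numlines, padding_words) for an ASM insert of `length` bytes.

    Mirrors the .set numlines / .set align arithmetic: the insert must end
    with a 00000000 word and fill whole 8-byte code lines, so an insert that
    is already a multiple of 8 bytes gets a nop plus the zero word.
    """
    if length % 4:
        raise ValueError("PowerPC instructions are 4 bytes each")
    if length % 8 == 0:
        return length // 8 + 1, [NOP, 0]
    return (length + 4) // 8, [0]

def split_address(addr: int):
    """Split a 32-bit address into (high, low) for a lis / lwz low(reg) pair.

    The 16-bit displacement is sign-extended, so when the low half is 0x8000
    or more the high half is bumped by one and the low half goes negative.
    """
    high, low = addr >> 16, addr & 0xFFFF
    if low >= 0x8000:
        high, low = high + 1, low - 0x10000
    return high, low

print(c2_layout(52))                                   # 13 words of code and data
print([hex(x) for x in split_address(0x80750A00)])     # buttonsaddr from the code above
```

Both inserts above come to 13 words (52 bytes) between their start and end labels, so each should assemble to 7 code lines after its C2 header line.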

wiiztec · June 08, 2010, 02:31:12 AM

Whoa, 400 lines of assembly? That's like 200 code lines. I hope that already includes Yoshi transformations, otherwise the final code could be too long to use.

dcx2 · June 08, 2010, 03:18:34 AM

A look-up table takes an index into the table as its input, and its output is the value stored at that index. As a result, an operation that uses a LUT is extremely fast and extremely flexible; the only operation is a read, and you can put arbitrary data into the table. This is what enables a code like the Surface Swapper, which permits a "many-to-many" mapping of surfaces, with each surface remapped independently (as opposed to "many to one", i.e. all surfaces are ground, or "one to one", which would replace only lava with ice).

Unfortunately, in order to use a LUT, your input needs to be a valid index; it must run from 0 to some maximum value and should be contiguous. For large maximum values this can make a table impractically large, although in some cases you can fit quite a lot of values into a table if they are indexed by byte, in which case 8 values fit on one code line.
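To put a number on the "8 values per code line" point, a byte-indexed table can be laid out as code lines like this (a Python sketch; the function name is made up, and padding the last line with zeros is an assumption):

```python
def table_to_code_lines(table: bytes) -> list:
    """Format a byte-indexed table as 8-byte code lines, zero-padded at the end."""
    padded = table + bytes(-len(table) % 8)
    lines = []
    for i in range(0, len(padded), 8):
        chunk = padded[i:i + 8].hex().upper()
        lines.append(chunk[:8] + " " + chunk[8:])
    return lines

full = table_to_code_lines(bytes(range(0x100)))   # identity table for 0x00..0xFF
print(len(full), "lines; first:", full[0])
```

At 8 bytes per line, a full 0x00 to 0xFF byte table is 256 / 8 = 32 code lines, on top of the instructions that index into it.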

As hetoan mentioned, if you have an irregular set of inputs, you can apply a mathematical transform to the inputs so that they are mapped onto 0 -> n, contiguous. For instance, take 0x00 -> 0x15 and 0x6F -> 0x80: if the input is less than 0x16, go straight to the LUT. If the input is greater than 0x6E but less than 0x81, subtract 0x59 from the input (so 0x6F becomes 0x16), THEN go to the LUT.
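The two-range example works out like this in Python (a sketch of the index transform only; the function name and the use of None for "not in the table" are choices made for the illustration):

```python
def to_index(value: int):
    """Map 0x00-0x15 and 0x6F-0x80 onto 0..0x27; None means 'not in the table'."""
    if 0x00 <= value <= 0x15:
        return value               # first range is already a valid index
    if 0x6F <= value <= 0x80:
        return value - 0x59        # 0x6F -> 0x16, directly after the first range
    return None

indices = [to_index(v) for v in range(0x100) if to_index(v) is not None]
print(len(indices), hex(indices[-1]))
```

The table then needs only 0x28 entries (22 from the first range plus 18 from the second) instead of 0x81.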

If a contiguous range of inputs all map to the same output, you can cut that range from the LUT. For instance, if the first twenty values should all look up a 0, you can just test for an input in that range and spit out 0 without going to the table.
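A minimal Python sketch of that shortcut, with a made-up four-entry table standing in for the part of the LUT that still has to be stored:

```python
SKIPPED = 20                       # inputs 0..19 all map to 0, so they are not stored
TAIL_LUT = bytes([7, 9, 4, 2])     # made-up outputs for inputs 20..23

def lookup(value: int) -> int:
    if value < SKIPPED:
        return 0                   # answered by the range test alone
    return TAIL_LUT[value - SKIPPED]

print([lookup(v) for v in (0, 19, 20, 23)])
```

The cost is one extra compare and branch; the saving is twenty table entries.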

If you have a sparse set of inputs, you can iterate through a map. Split memory into 16-bit chunks; the first 8 bits are the "InputChunk" and the second 8 bits are the "OutputChunk". Your ASM will check an InputChunk; if it does not match the current input, it will test the next InputChunk. It will continue until it either finds a matching InputChunk, or it reaches the end of the map. If it finds a matching InputChunk, it will spit out the corresponding OutputChunk. If it reaches the end of the map, it leaves the input alone.

This would be the case if there were, say, two hundred surfaces, and I only wanted to swap the deadly surfaces out. Instead of filling a LUT with 200 possibilities, my map only has the inputs that I want to alter, and their corresponding outputs, without all the values in between that the LUT requires in order to be contiguous.
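The map walk described above could look roughly like this in Python (a sketch; the names are invented, and the second pair is a placeholder, only lava 0x0A -> ice 0x21 comes from the earlier post):

```python
# Flat run of 16-bit chunks: (InputChunk, OutputChunk) byte pairs.
SPARSE_MAP = bytes([0x0A, 0x21,    # lava -> ice
                    0x33, 0x00])   # placeholder pair for illustration

def remap(value: int, pairs: bytes = SPARSE_MAP) -> int:
    """Scan the InputChunks in order; return the matching OutputChunk, if any."""
    for i in range(0, len(pairs), 2):
        if pairs[i] == value:
            return pairs[i + 1]
    return value                   # end of map reached: leave the input alone

print(hex(remap(0x0A)), hex(remap(0x05)))
```

Unlike a LUT, the lookup time grows with the number of pairs, so this only pays off when the set of altered inputs is small compared with the full range.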

hetoan2 · June 08, 2010, 11:44:14 AM

@dcx2, I never thought of doing an input/output type system, although if it's not a lot of values you're changing you might be better off doing cmpwi and such.

@brkirch, thanks for that program you listed. Offsets are what always get me with branches and stuff >_<

It's a shame I didn't see it before :S

hetoan2 · June 10, 2010, 11:43:52 AM

Hey, I was just wondering why this isn't working:

bl 0x0004
mflr r17
lwz r0,260(r3)
cmpwi r0,0x4F
bgt branch
addi r17,r17,0x1C
lbzx r0,r17,r0
branch:
b 0x0058

00010203 04050607
08090A0B 0C0D0E0F
10111213 14151617
18191A1B 1C1D1E1F
20212223 24252627
28292A2B 2C2D2E2F
30313233 34353637
38393A3B 3C3D3E3F
40414243 44454647
48494A4B 4C4D4E4F

nop

Just wondering how come this doesn't work while cmpwi on individual values does.

I think the ASM is used for more than the ammo values I'm trying to swap in and out of it.
That shouldn't matter though, as I haven't changed any of the values.

The instruction I wrote over is lwz r0,260(r3), and the next address stores the bytes.

WiiRd forum
