Huge Data Copy In ASM

Started by Bully@Wiiplaza, June 17, 2012, 03:23:06 PM

Bully@Wiiplaza

I was about to write a "data copy" code in ASM and noticed that it's really bad to do that many loads and stores. <_<
[spoiler]lwz r5,0(r4)

lis r10, 0x8053
lhz r10, 0x56D2 (r10)

lis r12, 0x9371

cmpwi r10, 0x1000
bne- _END

stw r5, 0 (r12)
lwz r11, 4 (r4)
stw r11, 4 (r12)
lwz r11, 8 (r4)
stw r11, 8 (r12)
lwz r11, 12 (r4)
stw r11, 12 (r12)
lwz r11, 16 (r4)
stw r11, 16 (r12)
lwz r11, 20 (r4)
stw r11, 20 (r12)
lwz r11, 24 (r4)
stw r11, 24 (r12)
lwz r11, 28 (r4)
stw r11, 28 (r12)
lwz r11, 32 (r4)
stw r11, 32 (r12)
lwz r11, 36 (r4)
stw r11, 36 (r12)
lwz r11, 40 (r4)
stw r11, 40 (r12)
lwz r11, 44 (r4)
stw r11, 44 (r12)
lwz r11, 48 (r4)
stw r11, 48 (r12)
lwz r11, 52 (r4)
stw r11, 52 (r12)
lwz r11, 56 (r4)
stw r11, 56 (r12)
lwz r11, 60 (r4)
stw r11, 60 (r12)
lwz r11, 64 (r4)
stw r11, 64 (r12)
lwz r11, 68 (r4)
stw r11, 68 (r12)
lwz r11, 72 (r4)
stw r11, 72 (r12)
lwz r11, 76 (r4)
stw r11, 76 (r12)
lwz r11, 80 (r4)
stw r11, 80 (r12)
lwz r11, 84 (r4)
stw r11, 84 (r12)
lwz r11, 88 (r4)
stw r11, 88 (r12)
lwz r11, 92 (r4)
stw r11, 92 (r12)
lwz r11, 96 (r4)
stw r11, 96 (r12)
lwz r11, 100 (r4)
stw r11, 100 (r12)
lwz r11, 104 (r4)
stw r11, 104 (r12)
lwz r11, 108 (r4)
stw r11, 108 (r12)
lwz r11, 112 (r4)
stw r11, 112 (r12)
lwz r11, 116 (r4)
stw r11, 116 (r12)
lwz r11, 120 (r4)
stw r11, 120 (r12)
lwz r11, 124 (r4)
stw r11, 124 (r12)
lwz r11, 128 (r4)
stw r11, 128 (r12)
lwz r11, 132 (r4)
stw r11, 132 (r12)
lwz r11, 136 (r4)
stw r11, 136 (r12)
lwz r11, 140 (r4)
stw r11, 140 (r12)
lwz r11, 144 (r4)
stw r11, 144 (r12)
lwz r11, 148 (r4)
stw r11, 148 (r12)
lwz r11, 152 (r4)
stw r11, 152 (r12)
lwz r11, 156 (r4)
stw r11, 156 (r12)
lwz r11, 160 (r4)
stw r11, 160 (r12)
lwz r11, 164 (r4)
stw r11, 164 (r12)
lwz r11, 168 (r4)
stw r11, 168 (r12)
lwz r11, 172 (r4)
stw r11, 172 (r12)
lwz r11, 176 (r4)
stw r11, 176 (r12)
lwz r11, 180 (r4)
stw r11, 180 (r12)
lwz r11, 184 (r4)
stw r11, 184 (r12)
lwz r11, 188 (r4)
stw r11, 188 (r12)
lwz r11, 192 (r4)
stw r11, 192 (r12)
lwz r11, 196 (r4)
stw r11, 196 (r12)
lwz r11, 200 (r4)
stw r11, 200 (r12)
lwz r11, 204 (r4)
stw r11, 204 (r12)
lwz r11, 208 (r4)
stw r11, 208 (r12)
lwz r11, 212 (r4)
stw r11, 212 (r12)
lwz r11, 216 (r4)
stw r11, 216 (r12)

_END:

cmpwi r10, 0x10
bne- _ENDO

lwz r11, 0 (r12)
stw r11, 0 (r4)
lwz r11, 4 (r12)
stw r11, 4 (r4)
lwz r11, 8 (r12)
stw r11, 8 (r4)
lwz r11, 12 (r12)
stw r11, 12 (r4)
lwz r11, 16 (r12)
stw r11, 16 (r4)
lwz r11, 20 (r12)
stw r11, 20 (r4)
lwz r11, 24 (r12)
stw r11, 24 (r4)
lwz r11, 28 (r12)
stw r11, 28 (r4)
lwz r11, 32 (r12)
stw r11, 32 (r4)
lwz r11, 36 (r12)
stw r11, 36 (r4)
lwz r11, 40 (r12)
stw r11, 40 (r4)
lwz r11, 44 (r12)
stw r11, 44 (r4)
lwz r11, 48 (r12)
stw r11, 48 (r4)
lwz r11, 52 (r12)
stw r11, 52 (r4)
lwz r11, 56 (r12)
stw r11, 56 (r4)
lwz r11, 60 (r12)
stw r11, 60 (r4)
lwz r11, 64 (r12)
stw r11, 64 (r4)
lwz r11, 68 (r12)
stw r11, 68 (r4)
lwz r11, 72 (r12)
stw r11, 72 (r4)
lwz r11, 76 (r12)
stw r11, 76 (r4)
lwz r11, 80 (r12)
stw r11, 80 (r4)
lwz r11, 84 (r12)
stw r11, 84 (r4)
lwz r11, 88 (r12)
stw r11, 88 (r4)
lwz r11, 92 (r12)
stw r11, 92 (r4)
lwz r11, 96 (r12)
stw r11, 96 (r4)
lwz r11, 100 (r12)
stw r11, 100 (r4)
lwz r11, 104 (r12)
stw r11, 104 (r4)
lwz r11, 108 (r12)
stw r11, 108 (r4)
lwz r11, 112 (r12)
stw r11, 112 (r4)
lwz r11, 116 (r12)
stw r11, 116 (r4)
lwz r11, 120 (r12)
stw r11, 120 (r4)
lwz r11, 124 (r12)
stw r11, 124 (r4)
lwz r11, 128 (r12)
stw r11, 128 (r4)
lwz r11, 132 (r12)
stw r11, 132 (r4)
lwz r11, 136 (r12)
stw r11, 136 (r4)
lwz r11, 140 (r12)
stw r11, 140 (r4)
lwz r11, 144 (r12)
stw r11, 144 (r4)
lwz r11, 148 (r12)
stw r11, 148 (r4)
lwz r11, 152 (r12)
stw r11, 152 (r4)
lwz r11, 156 (r12)
stw r11, 156 (r4)
lwz r11, 160 (r12)
stw r11, 160 (r4)
lwz r11, 164 (r12)
stw r11, 164 (r4)
lwz r11, 168 (r12)
stw r11, 168 (r4)
lwz r11, 172 (r12)
stw r11, 172 (r4)
lwz r11, 176 (r12)
stw r11, 176 (r4)
lwz r11, 180 (r12)
stw r11, 180 (r4)
lwz r11, 184 (r12)
stw r11, 184 (r4)
lwz r11, 188 (r12)
stw r11, 188 (r4)
lwz r11, 192 (r12)
stw r11, 192 (r4)
lwz r11, 196 (r12)
stw r11, 196 (r4)
lwz r11, 200 (r12)
stw r11, 200 (r4)
lwz r11, 204 (r12)
stw r11, 204 (r4)
lwz r11, 208 (r12)
stw r11, 208 (r4)
lwz r11, 212 (r12)
stw r11, 212 (r4)
lwz r11, 216 (r12)
stw r11, 216 (r4)

_ENDO:[/spoiler]
The words are all contiguous in memory. Is there a good way to shorten this?
This code basically copies the data to an unused RAM location when one button is pressed and copies it back when another button is pressed.

Thanks!
My Wii hacking site...
http://bullywiihacks.com/

My youtube account with a lot of hacking videos...
http://www.youtube.com/user/BullyWiiPlaza

~Bully

megazig

see: memcpy

Bully@Wiiplaza

Quote from: megazig on June 17, 2012, 04:53:34 PM
see: memcpy
And how should that help when I don't know how to do a memory copy?

megazig

The function memcpy. It's in every Wii/GC game at around 0x80004040-ish.

void *memcpy(void *out, const void *in, size_t len);
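
For reference, a minimal C sketch of the same call, assuming the game's routine behaves like the standard C memcpy (destination first, then source, then byte count; under the usual PowerPC calling convention those arrive in r3, r4 and r5). The helper name `backup_block` and the 54-word size are only illustrative, taken from the loop counts used later in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WORD_COUNT 54u /* illustrative: 54 words = 216 bytes */

/* Hypothetical helper: copies WORD_COUNT words from src to dst.
 * The argument order mirrors memcpy(out, in, len):
 * r3 = destination, r4 = source, r5 = length in bytes. */
void backup_block(uint32_t *dst, const uint32_t *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, WORD_COUNT * sizeof(uint32_t));
}
```

Note that the length is given in bytes, not words, so a 54-word block is passed as 216.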

dcx2

Without getting into too much detail, li the number of words you want to copy, mtctr, lwz, stwu, bdnz to the lwz.

Here's a snippet from my SMG2 teleport/levitation code.

# set up a loop
# note that lfsu requires pointing 4 bytes _before_ the first value
li r0,3
mtctr r0
addi r31,r12,XYZ_ORIENT_OFFSET-4
subi r12,r30,4      # work on a cached copy of r30

# f0 has the scale value
# r31 has p1 wiimote orientation pointer
# r12 has a pointer to code's stored coordinates
TOP_OF_LOOP:
lfsu f1,4(r31)
lfsu f2,4(r12)
fnmsubs f1,f0,f1,f2   # f1 = coords - (scale * orientation)
stfs f1,0(r12)
bdnz+ TOP_OF_LOOP

Bully@Wiiplaza

Ah, you gave me an idea, dcx2. ;D

[spoiler]lwz r5,0(r4) #default instruction

lis r10, 0x8053
lhz r10, 0x56D2 (r10) # Set up button activator

lis r12, 0x9371 # unused RAM location

cmpwi r10, 0x1000
bne- NOCOPY

li r11, 0 # Set counter to 0

LOOP:

lwz r9, 0 (r4) # Load data
stw r9, 0 (r12) # Store data

addi r11, r11, 1 # increase counter
addi r4, r4, 4 # increase source reg
addi r12, r12, 4 # increase destination reg

cmpwi r11, 54 # 54 copies
blt- LOOP # Loop until 54 words have been copied

subi r4, r4, 216 # restore r4
subi r12, r12, 216 # restore r12

NOCOPY: # go here if button isn't pressed

cmpwi r10, 0x10
bne- NOSTORE

li r11, 0

LOOP2:

lwz r9, 0 (r12) # regs switched
stw r9, 0 (r4)

addi r11, r11, 1
addi r4, r4, 4
addi r12, r12, 4

cmpwi r11, 54
blt- LOOP2

subi r4, r4, 216 # No need for fixing r12

NOSTORE:
[/spoiler]This might work; I couldn't test it.

dcx2

Try reading my example again.  These instructions are totally unnecessary and you could make this code shorter.

addi r11, r11, 1 # increase counter <-- handled by bdnz
addi r4, r4, 4 # increase source reg <-- handled by lwzu
addi r12, r12, 4 # increase destination reg <-- handled by stwu

cmpwi r11, 54 # 54 copies <-- handled by bdnz

Bully@Wiiplaza

Quote from: dcx2 on June 18, 2012, 03:21:15 AM
Without getting into too much detail, li the number of words you want to copy, mtctr, lwzu, stwu, bdnz to the lwzu.

Shouldn't that be lwzu? In your last post you mentioned this:

Quote from: dcx2 on June 19, 2012, 03:29:32 PM
addi r4, r4, 4 # increase source reg <-- handled by lwzu

lis r12, 0x9371
li r11, 54
mtctr r11
_LOOP:
lwzu r9, 0 (r4)
stwu r9, 0 (r12)
bdnz+ _LOOP

dcx2

Yes, it should have been lwzu, my bad.

It seems you don't fully understand how u-suffixed instructions work.

lwzu rX,d(rY)

is equivalent to

lwz rX,d(rY)
addi rY, rY, d

Do you see why lwzu with d=0 will do nothing, now?

This is also why you must point to d bytes before the intended address when your loop starts.
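
The update-form behaviour described above can be modelled in C to see both points. This is only a sketch under stated assumptions: RAM is a small word array, registers are byte offsets into it, and the names `lwzu`, `stwu` and `copy_loop` are just labels for this model, not real APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "lwzu rX, d(rY)": EA = rY + d, rX = word at EA, rY = EA. */
uint32_t lwzu(const uint32_t *ram, long *ry, long d)
{
    long ea = *ry + d;
    assert(ea >= 0 && ea % 4 == 0);
    *ry = ea;               /* the "u": write the address back */
    return ram[ea / 4];
}

/* Model of "stwu rX, d(rY)": EA = rY + d, word at EA = rX, rY = EA. */
void stwu(uint32_t *ram, long *ry, long d, uint32_t value)
{
    long ea = *ry + d;
    assert(ea >= 0 && ea % 4 == 0);
    ram[ea / 4] = value;
    *ry = ea;
}

/* Model of the counted loop: both pointers start d bytes early, then
 * lwzu/stwu/bdnz run `words` times. Returns the final source register. */
long copy_loop(uint32_t *ram, long src, long dst, unsigned words, long d)
{
    long r4 = src - d;      /* subi r4, r4, d   */
    long r12 = dst - d;     /* subi r12, r12, d */
    assert(words > 0);      /* bdnz with CTR = 0 would wrap around */
    for (unsigned ctr = words; ctr != 0; ctr--) {
        uint32_t r9 = lwzu(ram, &r4, d);
        stwu(ram, &r12, d, r9);
    }
    return r4;
}
```

With d = 4 the loop walks through consecutive words and leaves the source register at start + 4 * (words - 1); with d = 0 it reads and writes the same word every iteration and the register never moves.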

Bully@Wiiplaza

Quote from: dcx2 on June 19, 2012, 06:38:20 PM
Do you see why lwzu with d=0 will do nothing, now?

This is also why you must point to d bytes before the intended address when your loop starts.
Ah, yes! Now I got it.
It should start at offset 0 and I need to add 4 each time it runs.

lis r12, 0x9371
li r11, 54
mtctr r11

subi r4, r4, 4
subi r12, r12, 4

_LOOP:

lwzu r9, 4 (r4)
stwu r9, 4 (r12)
bdnz+ _LOOP

addi r4, r4, -212 # restore r4: the loop leaves it at start + 212
addi r12, r12, -212 # restore r12 the same way

Hmm... maybe still not great because of two addi's and two subi's for fixing the registers.

dcx2

subi goes before loop.


lis r12, 0x9371
li r11, 54
mtctr r11
subi r4, r4, 4
# subi r12, r12, 4 # probably unnecessary since it doesn't have to be at this exact address

_LOOP:
lwzu r9, 4 (r4)
stwu r9, 4 (r12)
bdnz+ _LOOP

EDIT:

Taken a bit further, so you don't have to restore r4 at the end of your code...


lis r12, 0x9371
li r11, 54
mtctr r11
subi r11, r4, 4

_LOOP:
lwzu r9, 4 (r11)
stwu r9, 4 (r12)
bdnz+ _LOOP


You also have two loops in your original code.  You may be able to save some code space by swapping r11 and r12 (depending on your button activator) before executing the loop.
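
A rough C sketch of that idea, assuming the two activator values from the code in this thread (0x1000 to save, 0x10 to restore) and an exact match on the button halfword. The function name `save_or_restore` and its parameters are made up for illustration: the direction is chosen first, and a single loop does the copying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BTN_SAVE 0x1000u /* activator values taken from the thread's code */
#define BTN_LOAD 0x0010u

/* Picks source and destination from the button value, then runs one
 * shared copy loop. Returns the number of words copied (0 if neither
 * activator matched). */
unsigned save_or_restore(uint32_t *game, uint32_t *backup,
                         unsigned words, uint16_t buttons)
{
    uint32_t *src;
    uint32_t *dst;

    assert(game != NULL && backup != NULL);
    if (buttons == BTN_SAVE) {
        src = game;
        dst = backup;
    } else if (buttons == BTN_LOAD) {
        src = backup;   /* same loop, pointers swapped */
        dst = game;
    } else {
        return 0;
    }
    for (unsigned i = 0; i < words; i++) {
        dst[i] = src[i];
    }
    return words;
}
```

In assembly the same shape would be two compares that only set up the pointer registers, followed by one lwzu/stwu/bdnz loop.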

Bully@Wiiplaza

Quote from: dcx2 on June 19, 2012, 07:03:52 PM
subi goes before loop.
Yeah, I remembered this right before I saw your post.

Thank you![spoiler]lwz r5,0(r4)

lis r10, 0x8053
lhz r10, 0x56D2 (r10)

lis r12, 0x9371

cmpwi r10, 0x1000
bne- NOCOPY

li r11, 54
mtctr r11
subi r11, r4, 4

_LOOP:
lwzu r9, 4 (r11)
stwu r9, 4 (r12)
bdnz+ _LOOP

NOCOPY:

cmpwi r10, 0x10
bne- NOSTORE

li r11, 54
mtctr r11
subi r11, r4, 4

_LOOP2:
lwzu r9, 4 (r12)
stwu r9, 4 (r11)
bdnz+ _LOOP2

NOSTORE:[/spoiler]By the way, I noticed that every game I looked at has empty memory at 0x93700000.

conanac

In case you want to search for the signature or pattern of memcpy function, here is the one from sysmenu (3.3E):

[spoiler]
cmplw r4,r3
blt- 0x28
subi r4,r4,1
subi r6,r3,1
addi r5,r5,1
b 0x0C
lbzu r0,1(r4)
stbu r0,1(r6)
subic. r5,r5,1
bne+ 0xFFFFFFF4
blr
add r4,r4,r5
add r6,r3,r5
addi r5,r5,1
b 0x0C
lbzu r0,-1(r4)
stbu r0,-1(r6)
subic. r5,r5,1
bne+ 0xFFFFFFF4
blr
[/spoiler]

I think you could use lbzu r0,1(r4) and stbu r0,1(r6) to search for this memcpy function
(i.e. bytes: 8C040001 9C060001) in the games of interest (e.g. Animal Crossing has this function
at memory location 0x80004338, Pikmin 2 NTSC Wii has it at 0x80005FB4).

And like megazig said, you could just call this function (with the appropriate arguments), or you could
create your own routine similar to this one (which you already tried).
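
As a sketch of where those signature bytes come from: lbzu and stbu are D-form instructions (6-bit opcode, 5-bit rD/rS, 5-bit rA, 16-bit displacement), with primary opcodes 35 and 39. The function names `encode_dform` and `find_pair` below are hypothetical helpers for illustration, and the search assumes the code has already been read into an array of 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPCODE_LBZU 35u
#define OPCODE_STBU 39u

/* D-form layout: opcode (6 bits) | rD or rS (5) | rA (5) | d (16). */
uint32_t encode_dform(unsigned opcode, unsigned rt, unsigned ra, int16_t d)
{
    assert(opcode < 64 && rt < 32 && ra < 32);
    return ((uint32_t)opcode << 26) | ((uint32_t)rt << 21) |
           ((uint32_t)ra << 16) | (uint16_t)d;
}

/* Returns the index of the first word where `first` is immediately
 * followed by `second`, or -1 if the pair does not occur. */
long find_pair(const uint32_t *code, size_t words,
               uint32_t first, uint32_t second)
{
    for (size_t i = 0; i + 1 < words; i++) {
        if (code[i] == first && code[i + 1] == second) {
            return (long)i;
        }
    }
    return -1;
}
```

Encoding lbzu r0,1(r4) and stbu r0,1(r6) this way gives 8C040001 and 9C060001, the pair quoted above. A match only suggests a byte-copy loop, so the surrounding instructions still need checking against the listing.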

Cheers.



Bully@Wiiplaza

Creating your own is safer, but using the in-game one is even better because it shortens the code.
Thanks for your precise examples on this, though!

Stuff

Quote from: conanac on June 20, 2012, 12:49:38 AM
In case you want to search for the signature or pattern of memcpy function, here is the one from sysmenu (3.3E):


I saw this a few times in MH3 when I was searching for....some shady stuff. Now I know what that is. Pretty sweet.

I don't think it would optimize code length, though. This is "CopyNullTerminatedString" that I put in my FriendSwap Code. It would be the same length if I did "CopyXBytes" instead. Maybe just one more instruction.

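_READ:
lbzu r5, 1(r6)   # load the next byte and advance the source pointer
stbu r5, 1(r7)   # store it and advance the destination pointer
cmplwi r5, 0     # was that the null terminator?
bne _READ        # no: copy the next byte
blr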

To call this I just gotta put the pointers into r6 and r7 and then bl _READ. I chose r5, r6, and r7 because they were safe at the time.
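
In C terms the routine behaves roughly like the sketch below (the name `copy_string` is made up for illustration). One difference to keep in mind: because lbzu/stbu add 1 before each access, the assembly version expects r6 and r7 to point one byte before the first character, while the C model uses ordinary pointers to the first character.

```c
#include <assert.h>
#include <stddef.h>

/* Model of _READ: copies bytes up to and including the null terminator.
 * Returns the number of bytes copied, terminator included. */
size_t copy_string(char *dst, const char *src)
{
    size_t i = 0;
    char c;

    assert(dst != NULL && src != NULL);
    do {
        c = src[i];          /* lbzu r5, 1(r6) */
        dst[i] = c;          /* stbu r5, 1(r7) */
        i++;
    } while (c != '\0');     /* cmplwi r5, 0 ; bne back to _READ */
    return i;
}
```

A fixed-length "CopyXBytes" variant would replace the terminator test with a counter, which is where the one extra instruction mentioned above would come from.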

Using the built-in function would be cool because you don't have to write your own, but you have to make sure r3-r6 are safe (maybe not r3, but it looks like it needs something), most likely by backing them up somewhere, before you can pass the arguments. You'll also have to load a register with a pointer to the function, then mtctr/mtlr, and finally branch with bctrl/blrl. When you're done copying, you'll have to restore r3-r6.

Here's a really bad compare and contrast I just came up with in a few minutes.

Use game function
[spoiler]backup and restore-like 6-8 mr instructions unless there's a better way.
2 instructions to load a pointer into some register.
mtctr/mtlr
blrl/bctrl[/spoiler]

Shared
[spoiler]The arguments look like it's r3 and r4. So you gotta put your pointers there. You gotta do that with the built in function too. Not sure what r5 is for. Up to 4 instructions depending on where the pointer is coming from.
[/spoiler]
Create your own[spoiler]
If you're using safe registers, you shouldn't have to back up anything.
bl _READ
_READ can be 5 or 6 instructions. (This is a one time thing. If you need to use the copy function multiple times in your code, you probably have to do all the previous steps each time).[/spoiler]


So 10-11 instructions with an additional 5 instructions for every other time it's used compared to 14-16 instructions with an additional 14-16 instructions for every other time it's used. :/ And I really wanted to use memcpy.
.make Stuff happen.
Dropbox. If you don't have one, get it NOW! +250MB free if you follow my link :p.

Mod code Generator ~50% complete but very usable:
http://dl.dropbox.com/u/24514984/modcodes/modcodes.htm