Fog A.How to optimize for the Pentium family of microprocessors.2004.pdf

An example of an assembly language function library that can be called from many different languages and platforms can be found at www.agner.org/random/randoma.zip.

5 Debugging and verifying assembly code

Debugging assembly code can be hard and frustrating, as you have probably already discovered. I recommend that you start by writing the piece of code you want to optimize as a subroutine in C++. Next, write a test program that tests your subroutine thoroughly. Make sure the test program exercises all branches and boundary cases.

Once your C++ subroutine passes your test program, you are ready to translate the code to assembly language. Most C++ compilers can translate C++ to assembly, which gives you a starting point.

Now you can start to optimize. Each time you have made a modification, you should run it on the test program to see if it works correctly. Number all your versions and save them so that you can go back and test them again in case you discover an error that the test program didn't catch (such as writing to a wrong address).
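A minimal sketch of such a test program, in C++. The routine add_const_ref, the stand-in add_const_opt and the harness count_failures are hypothetical names chosen for illustration; in real use the candidate would be your assembly routine, declared extern "C" and linked in. The harness compares the candidate against the reference for every length from 0 up to a limit and for four start offsets, and it surrounds the working area with guard bytes so that a write to a wrong address shows up as a mismatch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference version: plain C++ that is easy to verify by inspection.
// (Hypothetical example routine: add c to each of n bytes, modulo 256.)
void add_const_ref(std::uint8_t* p, std::size_t n, std::uint8_t c) {
    for (std::size_t i = 0; i < n; ++i)
        p[i] = static_cast<std::uint8_t>(p[i] + c);
}

// Stand-in for the optimized routine under test. In real use this would be
// the assembly version; here it is an unrolled C++ loop.
void add_const_opt(std::uint8_t* p, std::size_t n, std::uint8_t c) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {                 // unrolled main loop
        p[i]     = static_cast<std::uint8_t>(p[i]     + c);
        p[i + 1] = static_cast<std::uint8_t>(p[i + 1] + c);
        p[i + 2] = static_cast<std::uint8_t>(p[i + 2] + c);
        p[i + 3] = static_cast<std::uint8_t>(p[i + 3] + c);
    }
    for (; i < n; ++i)                           // remaining 0 to 3 bytes
        p[i] = static_cast<std::uint8_t>(p[i] + c);
}

using Routine = void (*)(std::uint8_t*, std::size_t, std::uint8_t);

// Runs both routines on identical buffers and returns the number of test
// cases where the buffers differ afterwards. The guard bytes before and
// after the working area are part of the comparison, so an out-of-bounds
// write by the candidate is counted as a failure.
int count_failures(Routine reference, Routine candidate, std::size_t max_len) {
    const std::size_t guard = 8;                 // guard bytes on each side
    const std::uint8_t guard_value = 0xA5;
    int failures = 0;
    for (std::size_t offset = 0; offset < 4; ++offset) {
        for (std::size_t n = 0; n <= max_len; ++n) {   // includes n = 0
            std::vector<std::uint8_t> expected(guard + offset + n + guard,
                                               guard_value);
            for (std::size_t i = 0; i < n; ++i)
                expected[guard + offset + i] =
                    static_cast<std::uint8_t>(i * 37 + 11);
            std::vector<std::uint8_t> actual = expected;
            reference(expected.data() + guard + offset, n, 200);
            candidate(actual.data() + guard + offset, n, 200);
            if (actual != expected) ++failures;
        }
    }
    return failures;
}
```

Calling count_failures(add_const_ref, add_const_opt, 33) should return 0. Rerunning the same call after each modification of the candidate gives the regression check described above.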

Test the speed of the most critical part of your program with the methods described in chapter 20 page 132. If the code is significantly slower than expected, then check the list of possible bottlenecks on page 75 for PPro, P2 and P3, and page 95 for P4.

Highly optimized code tends to be very difficult for others to read and understand, and even for yourself when you get back to it after some time. To make the code maintainable, it is important that you organize it into small logical units (procedures or macros) with a well-defined interface and appropriate comments. The more complicated the code is to read, the more important good documentation becomes.

6 Reducing code size

As explained in chapter 9 page 29, the code cache is 8 or 16 kB on P1, PMMX, PPro, P2 and P3. If you have problems keeping the critical parts of your code within the code cache, then you may consider reducing the size of your code. You may also want to reduce the size of your code if speed is not important.

32-bit code is usually bigger than 16-bit code because addresses and data constants take 4 bytes in 32-bit code and only 2 bytes in 16-bit code. However, 16-bit code has other penalties, especially because of segment prefixes. Some other methods for reducing the size of your code are discussed below.

Jump addresses, data addresses, and data constants all take less space if they can be expressed as a sign-extended byte, i.e. if they are within the interval from -128 to +127.

For jump addresses, this means that short jumps take two bytes of code, whereas jumps beyond 127 bytes take 5 bytes if unconditional and 6 bytes if conditional.
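The rule for jumps can be written down as a small C++ helper. This is only a sketch for estimating code size by hand; the function names are made up for illustration, and the displacement is assumed to be the distance to the target counted from the end of the jump instruction.

```cpp
#include <cstdint>

// True if v can be encoded as a single sign-extended byte.
constexpr bool fits_in_signed_byte(std::int64_t v) {
    return v >= -128 && v <= 127;
}

// Code size in bytes of a jump in 32-bit code, following the rule above:
// 2 bytes for a short jump, otherwise 5 bytes if unconditional and
// 6 bytes if conditional.
constexpr int jump_size_32bit(std::int64_t displacement, bool conditional) {
    return fits_in_signed_byte(displacement) ? 2
                                             : (conditional ? 6 : 5);
}
```

For example, jump_size_32bit(127, true) gives 2, while jump_size_32bit(128, true) gives 6.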

Likewise, data addresses take less space if they can be expressed as a pointer and a displacement between -128 and +127. Example:

MOV EBX,DS:[100000] / ADD EBX,DS:[100004] ; 12 bytes

Reduce to:

MOV EAX,100000 / MOV EBX,[EAX] / ADD EBX,[EAX+4] ; 10 bytes

The advantage of using a pointer obviously increases if you use it many times. Storing data on the stack and using EBP or ESP as a pointer will thus make your code smaller than if you use static memory locations and absolute addresses, provided of course that your data are within -128 to +127 bytes of the pointer. Using PUSH and POP to write and read temporary data is even shorter.
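To see where the pointer starts to pay off, the byte counts can be tallied with a few lines of C++. This is a rough sketch under stated assumptions: the function names are hypothetical, every access is an instruction with a one-byte opcode such as MOV or ADD with a register destination, and the base register is one of EAX, EBX, ECX, EDX, ESI or EDI.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Instruction with a 4-byte absolute address in 32-bit code:
// opcode + ModR/M byte + 4 address bytes.
constexpr int kAbsoluteAccessSize = 6;

// MOV reg,imm32, needed once to load the pointer.
constexpr int kLoadPointerSize = 5;

// Same kind of instruction with a [base+disp] operand.
// Assumption: base is not EBP or ESP. (EBP always needs a displacement
// byte, and ESP needs an extra SIB byte.)
constexpr int based_access_size(std::int32_t disp) {
    return disp == 0 ? 2 : (disp >= -128 && disp <= 127) ? 3 : 6;
}

// Total size when every access uses its own absolute address.
int size_with_absolute_addresses(std::size_t accesses) {
    return static_cast<int>(accesses) * kAbsoluteAccessSize;
}

// Total size when a pointer is loaded once and each access uses a
// displacement relative to that pointer.
int size_with_pointer(const std::vector<std::int32_t>& displacements) {
    int total = kLoadPointerSize;
    for (std::int32_t d : displacements) total += based_access_size(d);
    return total;
}
```

With the two accesses from the example, size_with_absolute_addresses(2) gives 12 and size_with_pointer({0, 4}) gives 10. With a single access the pointer version is longer (7 bytes against 6), which is why the gain depends on reuse.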

Data constants may also take less space if they are between -128 and +127. Most instructions with immediate operands have a short form where the operand is a sign-extended single byte. Examples:

PUSH 200        ; 5 bytes
PUSH 100        ; 2 bytes
ADD EBX,128     ; 6 bytes
SUB EBX,-128    ; 3 bytes
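The sizes in these examples follow a simple pattern that can be sketched in C++. The helper names and the AddChoice struct are hypothetical, and the ALU sizes assume a 32-bit register other than EAX, as in the examples. The last function captures the trick shown in the final two lines: adding 128 needs a 4-byte constant, while subtracting -128 fits in one byte.

```cpp
#include <cstdint>

// True if v can be encoded as a single sign-extended byte.
constexpr bool fits_in_signed_byte(std::int64_t v) {
    return v >= -128 && v <= 127;
}

// PUSH with an immediate operand: 2 bytes in the short form, else 5.
constexpr int push_imm_size(std::int32_t v) {
    return fits_in_signed_byte(v) ? 2 : 5;
}

// ADD or SUB with a 32-bit register (assumed not EAX) and an immediate
// operand: 3 bytes in the short form, else 6.
constexpr int alu_reg_imm_size(std::int32_t v) {
    return fits_in_signed_byte(v) ? 3 : 6;
}

struct AddChoice {
    const char*  mnemonic;  // "ADD" or "SUB"
    std::int64_t operand;   // immediate to use with that mnemonic
    int          bytes;     // resulting instruction size
};

// Adding v gives the same register result as subtracting -v, so pick
// whichever form has the shorter encoding. The flags may differ between
// the two forms.
AddChoice shortest_add(std::int32_t v) {
    const std::int64_t negated = -static_cast<std::int64_t>(v);
    if (!fits_in_signed_byte(v) && fits_in_signed_byte(negated))
        return {"SUB", negated, 3};
    return {"ADD", v, alu_reg_imm_size(v)};
}
```

Here shortest_add(128) selects SUB with operand -128 at 3 bytes, matching the last example line.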

The most important instruction with an immediate operand that does not have such a short form is MOV. Examples:

MOV EAX, 0              ; 5 bytes

May be changed to:

SUB EAX,EAX             ; 2 bytes

And

MOV EAX, 1              ; 5 bytes

May be changed to:

SUB EAX,EAX / INC EAX   ; 3 bytes

or:

PUSH 1 / POP EAX        ; 3 bytes

And

MOV EAX, -1             ; 5 bytes

May be changed to:

OR EAX, -1              ; 3 bytes
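The substitutions above can be collected into one lookup, sketched here in C++. The function shortest_load_eax and the LoadSequence struct are hypothetical names, and extending the PUSH / POP form from the constant 1 to any constant between -128 and +127 is an assumption based on the short form of PUSH described earlier. Note that, unlike MOV, the SUB, INC and OR forms change the flags, and the PUSH / POP form uses the stack.

```cpp
#include <cstdint>
#include <string>

struct LoadSequence {
    std::string code;   // instruction sequence, in the notation used above
    int         bytes;  // total code size
};

// Returns a short instruction sequence that leaves 'value' in EAX,
// using the replacements for MOV EAX,imm32 listed above.
LoadSequence shortest_load_eax(std::int32_t value) {
    if (value == 0)  return {"SUB EAX,EAX", 2};
    if (value == 1)  return {"SUB EAX,EAX / INC EAX", 3};
    if (value == -1) return {"OR EAX,-1", 3};
    if (value >= -128 && value <= 127)   // PUSH imm8 (2) + POP EAX (1)
        return {"PUSH " + std::to_string(value) + " / POP EAX", 3};
    return {"MOV EAX," + std::to_string(value), 5};
}
```

For instance, shortest_load_eax(100) yields "PUSH 100 / POP EAX" at 3 bytes, while shortest_load_eax(200) falls back to the 5-byte MOV.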

If the same address or constant is used more than once then you may load it into a register. A MOV with a 4-byte immediate operand may sometimes be replaced by an arithmetic instruction if the value of the register before the MOV is known. Example:

MOV [mem1],200          ; 10 bytes
MOV [mem2],200          ; 10 bytes
MOV [mem3],201          ; 10 bytes
MOV EAX,100             ; 5 bytes
MOV EBX,150             ; 5 bytes

Assuming that mem1 and mem3 are both within -128/+127 bytes of mem2, this may be changed to:

MOV EBX,OFFSET mem2     ; 5 bytes
MOV EAX,200             ; 5 bytes
MOV [EBX+mem1-mem2],EAX ; 3 bytes
MOV [EBX],EAX           ; 2 bytes