File: Fog A.How to optimize for the Pentium family of microprocessors.2004.pdf

Скачиваний:

Добавлен:

23.08.2013

Размер:

814.91 Кб

Скачать

☆

<<< < Предыдущая 1 2 3 4 5 6 78 / 438 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 > Следующая >>>

An example of an assembly language function library that can be called from many different languages and platforms can be found at www.agner.org/random/randoma.zip.

5 Debugging and verifying assembly code

Debugging assembly code can be quite hard and frustrating, as you have probably already discovered. I recommend that you start by writing the piece of code you want to optimize as a subroutine in C++. Next, write a test program that tests your subroutine thoroughly. Make sure the test program exercises all branches and boundary cases.

When your C++ subroutine passes your test program, you are ready to translate the code to assembly language. Most C++ compilers can translate C++ to assembly, which gives you a starting point.

Now you can start to optimize. Each time you make a modification, run the test program to check that the code still works correctly. Number and save all your versions so that you can go back and test them again if you discover an error that the test program didn't catch (such as writing to a wrong address).

Test the speed of the most critical part of your program with the methods described in chapter 20 (page 132). If the code is significantly slower than expected, check the list of possible bottlenecks on page 75 for the PPro, P2 and P3, and on page 95 for the P4.

Highly optimized code tends to be very difficult for others to read and understand, and even for yourself when you come back to it after some time. To keep the code maintainable, it is important to organize it into small logical units (procedures or macros) with well-defined interfaces and appropriate comments. The harder the code is to read, the more important good documentation becomes.

6 Reducing code size

As explained in chapter 9 (page 29), the code cache is 8 or 16 KB on the P1, PMMX, PPro, P2 and P3. If you have trouble keeping the critical parts of your code within the code cache, consider reducing the size of your code. You may also want to reduce the size of your code where speed is not important.

32-bit code is usually bigger than 16-bit code because addresses and data constants take 4 bytes in 32-bit code and only 2 bytes in 16-bit code. However, 16-bit code has other penalties, especially because of segment prefixes. Some other methods for reducing the size of your code are discussed below.

Jump addresses, data addresses and data constants all take less space if they can be expressed as a sign-extended byte, i.e. if they are within the interval from -128 to +127.

For jump addresses, this means that short jumps (to targets within -128 to +127 bytes) take 2 bytes of code, whereas longer jumps take 5 bytes if unconditional and 6 bytes if conditional.

Likewise, data addresses take less space if they can be expressed as a pointer and a displacement between -128 and +127. Example:

MOV EBX,DS:[100000] / ADD EBX,DS:[100004] ; 12 bytes

Reduce to:

MOV EAX,100000 / MOV EBX,[EAX] / ADD EBX,[EAX+4] ; 10 bytes

The advantage of using a pointer obviously increases if you use it many times. Storing data on the stack and using EBP or ESP as a pointer will thus make your code smaller than using static memory locations and absolute addresses, provided of course that your data are within -128 to +127 bytes of the pointer. Using PUSH and POP to write and read temporary data is even shorter.

Data constants may also take less space if they are between -128 and +127. Most instructions with immediate operands have a short form where the operand is a sign-extended single byte. Examples:

PUSH 200        ; 5 bytes
PUSH 100        ; 2 bytes
ADD EBX,128     ; 6 bytes
SUB EBX,-128    ; 3 bytes

The most important instruction with an immediate operand that does not have such a short form is MOV. Examples:

MOV EAX, 0              ; 5 bytes

may be changed to:

SUB EAX,EAX             ; 2 bytes

and

MOV EAX, 1              ; 5 bytes

may be changed to:

SUB EAX,EAX / INC EAX   ; 3 bytes

or:

PUSH 1 / POP EAX        ; 3 bytes

and

MOV EAX, -1             ; 5 bytes

may be changed to:

OR EAX, -1              ; 3 bytes

If the same address or constant is used more than once, you may load it into a register. A MOV with a 4-byte immediate operand can sometimes be replaced by an arithmetic instruction if the value of the register before the MOV is known. Example:

MOV [mem1],200      ; 10 bytes
MOV [mem2],200      ; 10 bytes
MOV [mem3],201      ; 10 bytes
MOV EAX,100         ; 5 bytes
MOV EBX,150         ; 5 bytes

Assuming that mem1 and mem3 are both within -128/+127 bytes of mem2, this may be changed to:

MOV EBX,OFFSET mem2         ; 5 bytes
MOV EAX,200                 ; 5 bytes
MOV [EBX+mem1-mem2],EAX     ; 3 bytes
MOV [EBX],EAX               ; 2 bytes
