File: Fog A.How to optimize for the Pentium family of microprocessors.2004.pdf


example 16.11. FLD ST(0) plays the same role in example 16.13 as ORPD XMM3,XMM1 in example 16.11.

The repetition count for this loop is the number of significant bits in n. If this value often changes, then you may repeat the loop the maximum number of times in order to make the loop control branch predictable. This requires, of course, that there is no risk of overflow in the multiplications.
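A minimal C++ sketch of the fixed-repetition idea, not taken from the original text: the loop always runs an assumed maximum number of times (`kMaxBits`, a hypothetical bound on the number of significant bits in n), and the factor is chosen by a select rather than by a branch that depends on n. The function name `power_fixed` is likewise an assumption made for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: computes x^n with a trip count that does not depend on n.
// Assumption: n has at most kMaxBits significant bits.
double power_fixed(double x, std::uint32_t n) {
    const int kMaxBits = 16;              // assumed upper bound, chosen for illustration
    double result = 1.0;
    for (int i = 0; i < kMaxBits; ++i) {
        // Select the factor instead of branching on the current bit of n.
        // Compilers often turn this into a conditional move or blend,
        // but that is not guaranteed.
        double factor = (n & 1u) ? x : 1.0;
        result *= factor;
        x *= x;                           // squared on every pass, even when n is already 0
        n >>= 1;
    }
    return result;
}
```

Because x keeps being squared after the last significant bit of n has been consumed, it can grow very large in the extra passes. In this sketch only the selected factor is multiplied into `result`, so the surplus squarings do not change the answer, but the caveat about overflow in the text still has to be checked for any real implementation.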

Changing the code of example 16.13 to use XMM registers is no advantage, unless you can handle data in parallel, because conditional moves in XMM registers are complicated to implement (see page 110).

16.4 Macro loops (all processors)

If the repetition count for a loop is small and constant, then it is possible to unroll the loop completely. The advantage of this is that calculations that depend only on the loop counter can be done at assembly time rather than at execution time. The disadvantage is, of course, that it takes up more space in the trace cache or code cache.

The MASM language includes a powerful macro language that is useful for this purpose. If, for example, we need a list of square numbers, then the C++ code may look like this:

int squares[10];
for (int i = 0; i < 10; i++) squares[i] = i * i;

The same list can be generated by a macro loop in MASM language:

; Example 16.14
.DATA
squares LABEL DWORD     ; label at start of array
I = 0                   ; temporary counter
REPT 10                 ; repeat 10 times
   DD I * I             ; define one array element
   I = I + 1            ; increment counter
ENDM                    ; end of REPT loop

Here, I is a preprocessing variable. The I loop is run at assembly time, not at execution time. The variable I and the statement I = I + 1 never make it into the final code, and hence take no time to execute. In fact, example 16.14 generates no executable code, only data. The macro preprocessor will translate the above code to:

squares LABEL DWORD		; label at start of array
DD	0
DD	1
DD	4
DD	9
DD	16
DD	25
DD	36
DD	49
DD	64
DD	81
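For comparison, a similar table can be built at compile time in C++. This is only a sketch under the assumption of a C++17 compiler; the helper name `make_squares` is hypothetical and does not come from the original text. As with the REPT loop, the loop below is evaluated by the compiler, and only the ten constants end up in the program's data.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: fills the table during constant evaluation (C++17 or later).
constexpr std::array<int, 10> make_squares() {
    std::array<int, 10> t{};
    for (std::size_t i = 0; i < t.size(); ++i) {
        t[i] = static_cast<int>(i * i);   // same rule as DD I * I
    }
    return t;
}

// constexpr forces the initializer to be a compile-time constant.
constexpr std::array<int, 10> squares = make_squares();

// Checked by the compiler, so the table cannot depend on run-time code.
static_assert(squares[3] == 9, "table is computed at compile time");
```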

Now, let's return to the power example (example 16.12). If n is known at assembly time, then the power function can be implemented using the following macro:

; This macro will raise two packed double-precision floats in X
; to the power of N, where N is a positive integer constant.
; The result is returned in Y. X and Y must be two different
; XMM registers. X is not preserved.
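The idea of resolving the exponent before run time can also be sketched in C++ with a template parameter. This is a rough analogue under stated assumptions, not a translation of the macro: it works on a single scalar `double` rather than two packed values in an XMM register, it requires C++17, and the name `ipow` is hypothetical. All decisions about the bits of N are made by the compiler, so the generated code typically contains only multiplications.

```cpp
// Hypothetical analogue: x^N for a positive integer constant N,
// using repeated squaring that is unfolded at compile time.
template <unsigned N>
constexpr double ipow(double x) {
    static_assert(N > 0, "N must be a positive integer constant");
    if constexpr (N == 1) {
        return x;                          // base case: x^1
    } else if constexpr (N % 2 == 0) {
        return ipow<N / 2>(x * x);         // even N: x^N = (x^2)^(N/2)
    } else {
        return x * ipow<N / 2>(x * x);     // odd N:  x^N = x * (x^2)^((N-1)/2)
    }
}
```

For example, `ipow<10>(x)` unfolds to four multiplications: three squarings plus one extra product for the second-lowest bit of 10.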