Source: Fog A. How to optimize for the Pentium family of microprocessors. 2004.


MOV BYTE PTR [ESI], AL
MOV EBX, DWORD PTR [ESI]        ; partial memory stall

Here you get a stall because the processor has to combine the byte written from AL with the next three bytes, which were already in memory, to get the four bytes needed for reading into EBX. The penalty is approximately 7-8 clocks.

Unlike the partial register stalls, you also get a partial memory stall when you write a bigger operand to memory and then read part of it, if the smaller part doesn't start at the same address:

MOV DWORD PTR [ESI], EAX
MOV BL, BYTE PTR [ESI]          ; no stall
MOV BH, BYTE PTR [ESI+1]        ; stall

You can avoid this stall by changing the last line to MOV BH,AH, but such a solution is not possible in a situation like this:

FISTP QWORD PTR [EDI]
MOV EAX, DWORD PTR [EDI]
MOV EDX, DWORD PTR [EDI+4]      ; stall

Interestingly, you can also get a partial memory stall when writing and reading completely different addresses if they happen to have the same set-value in different cache banks:

MOV BYTE PTR [ESI], AL
MOV EBX, DWORD PTR [ESI+4092]   ; no stall
MOV ECX, DWORD PTR [ESI+4096]   ; stall

14.8 Bottlenecks in PPro, P2, P3

When optimizing code for these processors, it is important to analyze where the bottlenecks are. Spending time on optimizing away one bottleneck doesn't make sense if another bottleneck is narrower.

If you expect code cache misses, then you should restructure your code to keep the most used parts of code together.

If you expect many data cache misses, then forget about everything else and concentrate on how to restructure your data to reduce the number of cache misses (page 29), and avoid long dependence chains after a data read cache miss.
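One common case of this advice is to walk through data in the order in which it is stored, so that every cache line that is loaded gets fully used before the next one is needed. The following is only a sketch in C, not code from this manual; the function name `sum_in_memory_order` and the flat row-by-row layout are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical example: sum a matrix stored row by row in one flat array.
   The inner loop runs over consecutive addresses, so each cache line
   brought in by a miss is used completely before moving on. */
long sum_in_memory_order(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            total += m[r * cols + c];   /* consecutive addresses */
        }
    }
    return total;
}
```

Swapping the two loops would give the same sum but would touch memory in strides of `cols` elements, which is likely to cause more cache misses when the matrix is large.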

If you have many divisions, then try to reduce them (page 116) and make sure the processor has something else to do during the divisions.
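As an illustration of reducing divisions, a loop that divides many values by the same divisor can compute the reciprocal once and multiply inside the loop. This is a minimal sketch in C under that assumption; the name `scale_by_reciprocal` is hypothetical, and the result may differ from a true division in the last bit for divisors whose reciprocal is not exactly representable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: divide every element of a[] by the same divisor.
   A single division computes the reciprocal; the loop then uses only
   multiplications, which are much faster than divisions. */
void scale_by_reciprocal(double *a, size_t n, double divisor)
{
    assert(divisor != 0.0);
    double r = 1.0 / divisor;       /* the only division */
    for (size_t i = 0; i < n; i++) {
        a[i] *= r;                  /* multiply instead of divide */
    }
}
```

Whether the small rounding difference is acceptable depends on the application, so treat this as an option to evaluate rather than a drop-in replacement.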

Dependence chains tend to hamper out-of-order execution (page 34). Try to break long dependence chains, especially if they contain slow instructions such as multiplication, division, and floating-point instructions.
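A typical way to break a long dependence chain is to split one accumulator into several independent ones. The sketch below, written in C for illustration, sums an array with two accumulators so that consecutive floating-point additions do not have to wait for each other. The name `sum_two_chains` is an assumption, and a compiler is free to generate different code than the source suggests.

```c
#include <stddef.h>

/* Hypothetical example: sum an array using two independent chains.
   s0 and s1 do not depend on each other, so their additions can
   overlap in an out-of-order processor. */
double sum_two_chains(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];         /* chain 0: even elements */
        s1 += a[i + 1];     /* chain 1: odd elements  */
    }
    if (i < n) {
        s0 += a[i];         /* leftover element when n is odd */
    }
    return s0 + s1;         /* join the chains once, at the end */
}
```

Because floating-point addition is not associative, the result can differ slightly in rounding from a strictly sequential sum.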

If you have many jumps, calls, or returns, and especially if the jumps are poorly predictable, then see whether some of them can be avoided. Replace poorly predictable conditional jumps with conditional moves if possible, and replace small procedures with macros (page 50).
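C has no direct way to request a conditional move. A simple select such as `cond ? x : y` is often compiled to one, but that is not guaranteed. The sketch below shows a related jump-free technique, a select built from a bit mask, which produces the same result as a conditional move without any branch. The names `select_u32` and `max_u32` are hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: return x if cond is nonzero, otherwise y,
   without a conditional jump in the source. */
uint32_t select_u32(uint32_t cond, uint32_t x, uint32_t y)
{
    /* mask is all ones when cond is nonzero, all zeros otherwise */
    uint32_t mask = 0u - (uint32_t)(cond != 0);
    return y ^ ((x ^ y) & mask);
}

/* Example use: maximum of two unsigned values. */
uint32_t max_u32(uint32_t a, uint32_t b)
{
    return select_u32(a > b, a, b);
}
```

This only pays off when the condition is poorly predictable; for a well-predicted jump, the extra arithmetic may be slower than the branch.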

If you are mixing different data sizes (8-, 16-, and 32-bit integers), then look out for partial stalls. If you use PUSHF or LAHF instructions, then look out for partial flags stalls. Avoid testing flags after shifts or rotates by more than 1 (page 71).
