Fog A.How to optimize for the Pentium family of microprocessors.2004.pdf

The name Celeron applies to Pentium II and later models with less cache than the standard versions. The name Xeon applies to Pentium II and later models with more cache than the standard versions.

The P1 and PMMX processors represent the fifth generation in the Intel x86 series of microprocessors, and their processor kernels are very similar. PPro, P2 and P3 all have the sixth generation kernel. These three processors are almost identical except that new instructions are added to each new model. P4 is the first processor in the seventh generation which, for obscure reasons, is not called seventh generation in Intel documents. Quite unexpectedly, the generation number returned by the CPUID instruction in the P4 is not 7 but 15. The reader should be aware that the 5th, 6th and 7th generation microprocessors behave very differently. What is optimal for one generation may not be optimal for the others.

2 Getting started with optimization

2.1 Speed versus program clarity and security

Current trends in software technology go in the direction of ever more abstract and high-level programming techniques and languages. The motivations behind this trend are faster development, easier maintenance, and safer code. A typical programmer spends more time finding errors and making additions and modifications than writing new code. Therefore, most software is written in high-level languages that are easier to document and maintain. The flip side of the coin is that the code gets slower and the demands on hardware performance get bigger and bigger, as ever more complex intermediate layers separate the programmer's code from the hardware. Large runtime modules, emulators and virtual machines consume large amounts of hard disk space and RAM. The result is that programs take a long time to install, a long time to load, and a long time to execute.

At the opposite extreme, we have assembly language, which produces very compact and fast code, but is very difficult to debug and maintain and is very vulnerable to programming errors.

A good compromise is provided by the C++ programming language. C++ has all the advanced features of a high-level language, but it has also inherited the low-level features of the old C language. You can use the most advanced high-level programming techniques in most of your software project for reasons of maintainability and security, and still use low-level techniques in the innermost loop where speed is critical.

The security problems of low-level programming should not be ignored, however. Many of the software crashes and security holes that plague contemporary software are due to the unsafe features that C++ has inherited from C, such as absence of array bounds checking, uninitialized pointers, pointer arithmetic, pointer type casting, and memory leaks. Some programmers prefer to use other programming languages to avoid these problems, but most of the security problems in C++ can be avoided by using safer programming techniques. Some good pieces of advice for safe C++ programming are:

use references rather than pointers

use string objects rather than character arrays

use container classes rather than arrays (useful container classes are provided in the standard template library)

avoid dynamic memory allocation (new, delete) except in well-tested container classes

avoid functions that write to parameters through void pointers or variable argument lists, such as memcpy and scanf.

encapsulate everything in classes with well-defined interfaces and responsibilities

use systematic testing methods
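
A few of the points above can be illustrated with a minimal sketch. The function names (`make_greeting`, `sum`, `try_get`) are hypothetical and chosen only for this example; the sketch assumes nothing beyond the standard C++ library.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: builds a greeting with std::string instead of a
// fixed-size character array, so there is no buffer to overflow.
std::string make_greeting(const std::string& name) {
    return "Hello, " + name + "!";
}

// Takes the container by const reference: there is no null pointer to
// check and no separate length argument that can disagree with the data.
int sum(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

// Bounds-checked element access: at() throws std::out_of_range for a bad
// index, where values[index] would have undefined behavior.
bool try_get(const std::vector<int>& values, std::size_t index, int& out) {
    try {
        out = values.at(index);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

The bounds check in `at()` costs a comparison on every access, which is the kind of overhead one might choose to remove only in a well-tested inner loop.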

You may deviate from this advice in critical parts of the code where speed is important, but make sure the unsafe code is limited to well-tested functions or modules with a well-defined interface to the rest of the program.

Assembly language is, of course, even more unsafe and difficult to maintain. Assembly language should therefore be used only in the most critical part of your program, and only if it provides a significant improvement in speed. The assembly code should be confined to a well-tested function, module or library with a well-defined interface to the calling program.

2.2 Choice of programming language

Before starting a new software project, you have to decide which programming language to use. Low-level languages are good for optimizing execution speed or program size, while high-level languages are good for making clear and well-structured code.

Today, most universities teach Java as the first programming language for pedagogical reasons. The advantages of Java are that it is consistent, well structured, and portable. But it is not fast, because in most cases it runs on a Java virtual machine that interprets code rather than executing it directly. If execution speed is important, then the best choice will be C++. This language has the best of both worlds. The C++ language has more features and options than most other programming languages. Advanced features like inheritance, polymorphism, macros, template libraries and exception handling enable you to make well-structured and reusable code at a high level of abstraction. On the other hand, the C++ language is a superset of the old C language, which gives you access to fiddle with every bit and byte and to use low-level programming techniques.

C++ is definitely the language of choice if you want to make part of your project in assembly language. C++ has excellent features for integrating with assembly:

C++ links easily with assembly modules

C++ uses simple data structures that are also available in assembly

most C++ compilers can translate from C++ to assembly

most C++ compilers support inline assembly and direct access to registers and flags

some C++ compilers have "intrinsic functions" that translate directly to XMM instructions
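
As an illustration of the last point, here is a minimal sketch using the SSE intrinsics declared in `<xmmintrin.h>`. It assumes an x86 target and a compiler that provides this header; the function name `add_arrays` is hypothetical.

```cpp
#include <cstddef>
#include <vector>
#include <xmmintrin.h>  // SSE intrinsics; assumes an x86 target with SSE

// Hypothetical helper: adds two float arrays element by element.
// _mm_add_ps adds four packed floats at once and is typically compiled
// to a single ADDPS instruction working on XMM registers.
std::vector<float> add_arrays(const std::vector<float>& a,
                              const std::vector<float>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::vector<float> result(n);
    std::size_t i = 0;
    // Main loop: four floats per iteration, using unaligned loads/stores.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(&a[i]);
        const __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&result[i], _mm_add_ps(va, vb));
    }
    // Scalar tail for the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        result[i] = a[i] + b[i];
    }
    return result;
}
```

The unaligned load and store forms are used here because `std::vector` does not guarantee 16-byte alignment; aligned variants may be faster where alignment can be ensured.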

It is possible to call assembly language modules from other compiled languages such as Pascal, Fortran, Basic and C#, but this is usually more complicated than with C++. Strings and arrays may have to be translated to the appropriate format, and it may be necessary to encapsulate the assembly module into a dynamic link library (DLL).

Combining assembly code with Java is even more difficult because Java is usually not compiled into executable code but into an intermediate code that runs on a Java virtual machine.

See page 23 for details on how to call assembly language modules from various high-level languages.

2.3 Choice of algorithm

The first thing to do when you want to optimize a piece of software is to find the best algorithm. Optimizing a poor algorithm is a waste of time. So don't even think of converting your code to assembly before you have explored all possibilities for optimizing your algorithm and the implementation of your algorithm.
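
To make the point concrete, the sketch below solves one problem, detecting whether a list contains a duplicate value, with two algorithms. The problem and both function names are chosen only for illustration and do not come from the original text.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Straightforward version: compares every pair of elements,
// so the work grows as O(n^2).
bool has_duplicate_quadratic(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        for (std::size_t j = i + 1; j < v.size(); ++j) {
            if (v[i] == v[j]) return true;
        }
    }
    return false;
}

// Better algorithm: sort a copy so that equal values become adjacent,
// then scan once. The work grows as O(n log n).
bool has_duplicate_sorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return std::adjacent_find(v.begin(), v.end()) != v.end();
}
```

For a million elements the first version may need on the order of 10^11 comparisons, while the second needs on the order of 10^7. No amount of hand-tuned assembly applied to the first version is likely to close a gap of that size.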

2.4 Memory model

The Pentiums are designed primarily for 32-bit code, and the performance is inferior on 16-bit code. Segmenting your code and data also degrades performance significantly, so you should generally prefer 32-bit flat mode, and an operating system that supports this mode. The code examples shown in this manual assume a 32-bit flat memory model, unless otherwise specified.

2.5 Finding the hot spots

Before you try to optimize anything, you have to identify the critical parts of your program. Often, more than 99% of the CPU time is spent in the innermost loop of a program. If this is the case then you should isolate this hot spot in a separate subroutine that you can optimize for speed, while the rest of your program can be optimized for clarity and maintainability.

You may translate the critical subroutine to assembly and leave everything else in high-level language. Many assembly programmers waste a lot of energy optimizing the wrong parts of their programs. There are even people who make entire Windows programs in assembly.

Most of the code in a typical program goes to the user interface and to calling system routines. A user interface with menus and dialog boxes is certainly not something that is being executed a thousand times per second. People who try to optimize something like this in assembly may be spending hours - or more likely months - making the program respond ten nanoseconds faster to a mouse click on a system where the screen is refreshed 60 times per second. There are certainly better ways of investing your programming skills! The same applies to program sections that consist mainly of calls to system routines. Such calls are usually well optimized by C++ compilers and there is no reason to use assembly language here.

Assembly language should be used only for loops that are executed so many times that it really matters in terms of CPU time, and that is very many. A 2 GHz Pentium 4 can do 6·10^9 integer additions per second. So it is probably not worth the effort to optimize a loop that makes "only" one million integer operations. It will suffice to change from Java to C++.
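
A rough back-of-the-envelope estimate shows the scale involved. It takes the quoted peak rate of 6·10^9 additions per second at face value and assumes the loop could sustain that rate, which real code rarely does:

```latex
t \approx \frac{10^{6}\ \text{operations}}{6 \cdot 10^{9}\ \text{operations/s}}
  \approx 1.7 \cdot 10^{-4}\ \text{s}
  \approx 0.17\ \text{ms}
```

Even if the actual loop runs ten times slower than the peak rate, the total stays below two milliseconds, which leaves very little time for hand optimization to recover.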

Typical applications where assembly language can be useful for optimizing speed include processing of sound and images, compression and encryption of large amounts of data, simulation of complex systems, and mathematical calculations that involve iteration. The speed of such applications can sometimes be increased manyfold by conversion to assembly.

Assembly language is also useful when optimizing code for size. This is typically used in embedded systems where a piece of code has to fit into a ROM or flash RAM. Using assembly language for optimizing an application program for size is not worth the effort because data storage is so cheap.

If it is not obvious where the critical parts of your program are, then you may use a profiler to find them. If it turns out that the bottleneck is disk access, then you may modify your program to make disk access sequential in order to improve disk caching, rather than turning to assembly programming. If the bottleneck is graphics output, then you may look for a way of reducing the number of calls to graphics procedures, or for a better graphics library.
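
Where no profiler is at hand, a suspected hot spot can be timed directly. The following is a minimal sketch using `std::chrono` from the C++11 standard library; `sum_of_squares` is a hypothetical stand-in for the code under suspicion and `time_microseconds` is a hypothetical helper name.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical workload standing in for a suspected hot spot.
std::uint64_t sum_of_squares(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i * i;
    }
    return total;
}

// Runs `work` once and returns the elapsed wall-clock time in
// microseconds, measured with a monotonic clock.
template <typename Work>
double time_microseconds(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

A call such as `time_microseconds([&] { result = sum_of_squares(1000000); })` stores the result in a variable that is used afterwards, which makes it less likely that the compiler removes the measured work. A single measurement is noisy, so repeating it several times and comparing the runs gives a more trustworthy picture.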

Some high-level language compilers offer relatively good optimization for specific processors, but in most cases further optimization by hand can make the code much better. When the possibilities for optimizing in C++ have been exhausted, you can make your C++ compiler translate the critical subroutine to assembly, and do further optimizations by hand.