- •Contents
- •List of Figures
- •List of Tables
- •Acknowledgments
- •Introduction to MPI
- •Overview and Goals
- •Background of MPI-1.0
- •Background of MPI-1.1, MPI-1.2, and MPI-2.0
- •Background of MPI-1.3 and MPI-2.1
- •Background of MPI-2.2
- •Who Should Use This Standard?
- •What Platforms Are Targets For Implementation?
- •What Is Included In The Standard?
- •What Is Not Included In The Standard?
- •Organization of this Document
- •MPI Terms and Conventions
- •Document Notation
- •Naming Conventions
- •Semantic Terms
- •Data Types
- •Opaque Objects
- •Array Arguments
- •State
- •Named Constants
- •Choice
- •Addresses
- •Language Binding
- •Deprecated Names and Functions
- •Fortran Binding Issues
- •C Binding Issues
- •C++ Binding Issues
- •Functions and Macros
- •Processes
- •Error Handling
- •Implementation Issues
- •Independence of Basic Runtime Routines
- •Interaction with Signals
- •Examples
- •Point-to-Point Communication
- •Introduction
- •Blocking Send and Receive Operations
- •Blocking Send
- •Message Data
- •Message Envelope
- •Blocking Receive
- •Return Status
- •Passing MPI_STATUS_IGNORE for Status
- •Data Type Matching and Data Conversion
- •Type Matching Rules
- •Type MPI_CHARACTER
- •Data Conversion
- •Communication Modes
- •Semantics of Point-to-Point Communication
- •Buffer Allocation and Usage
- •Nonblocking Communication
- •Communication Request Objects
- •Communication Initiation
- •Communication Completion
- •Semantics of Nonblocking Communications
- •Multiple Completions
- •Non-destructive Test of status
- •Probe and Cancel
- •Persistent Communication Requests
- •Send-Receive
- •Null Processes
- •Datatypes
- •Derived Datatypes
- •Type Constructors with Explicit Addresses
- •Datatype Constructors
- •Subarray Datatype Constructor
- •Distributed Array Datatype Constructor
- •Address and Size Functions
- •Lower-Bound and Upper-Bound Markers
- •Extent and Bounds of Datatypes
- •True Extent of Datatypes
- •Commit and Free
- •Duplicating a Datatype
- •Use of General Datatypes in Communication
- •Correct Use of Addresses
- •Decoding a Datatype
- •Examples
- •Pack and Unpack
- •Canonical MPI_PACK and MPI_UNPACK
- •Collective Communication
- •Introduction and Overview
- •Communicator Argument
- •Applying Collective Operations to Intercommunicators
- •Barrier Synchronization
- •Broadcast
- •Example using MPI_BCAST
- •Gather
- •Examples using MPI_GATHER, MPI_GATHERV
- •Scatter
- •Examples using MPI_SCATTER, MPI_SCATTERV
- •Example using MPI_ALLGATHER
- •All-to-All Scatter/Gather
- •Global Reduction Operations
- •Reduce
- •Signed Characters and Reductions
- •MINLOC and MAXLOC
- •All-Reduce
- •Process-local reduction
- •Reduce-Scatter
- •MPI_REDUCE_SCATTER_BLOCK
- •MPI_REDUCE_SCATTER
- •Scan
- •Inclusive Scan
- •Exclusive Scan
- •Example using MPI_SCAN
- •Correctness
- •Introduction
- •Features Needed to Support Libraries
- •MPI's Support for Libraries
- •Basic Concepts
- •Groups
- •Contexts
- •Intra-Communicators
- •Group Management
- •Group Accessors
- •Group Constructors
- •Group Destructors
- •Communicator Management
- •Communicator Accessors
- •Communicator Constructors
- •Communicator Destructors
- •Motivating Examples
- •Current Practice #1
- •Current Practice #2
- •(Approximate) Current Practice #3
- •Example #4
- •Library Example #1
- •Library Example #2
- •Inter-Communication
- •Inter-communicator Accessors
- •Inter-communicator Operations
- •Inter-Communication Examples
- •Caching
- •Functionality
- •Communicators
- •Windows
- •Datatypes
- •Error Class for Invalid Keyval
- •Attributes Example
- •Naming Objects
- •Formalizing the Loosely Synchronous Model
- •Basic Statements
- •Models of Execution
- •Static communicator allocation
- •Dynamic communicator allocation
- •The General case
- •Process Topologies
- •Introduction
- •Virtual Topologies
- •Embedding in MPI
- •Overview of the Functions
- •Topology Constructors
- •Cartesian Constructor
- •Cartesian Convenience Function: MPI_DIMS_CREATE
- •General (Graph) Constructor
- •Distributed (Graph) Constructor
- •Topology Inquiry Functions
- •Cartesian Shift Coordinates
- •Partitioning of Cartesian structures
- •Low-Level Topology Functions
- •An Application Example
- •MPI Environmental Management
- •Implementation Information
- •Version Inquiries
- •Environmental Inquiries
- •Tag Values
- •Host Rank
- •IO Rank
- •Clock Synchronization
- •Memory Allocation
- •Error Handling
- •Error Handlers for Communicators
- •Error Handlers for Windows
- •Error Handlers for Files
- •Freeing Errorhandlers and Retrieving Error Strings
- •Error Codes and Classes
- •Error Classes, Error Codes, and Error Handlers
- •Timers and Synchronization
- •Startup
- •Allowing User Functions at Process Termination
- •Determining Whether MPI Has Finished
- •Portable MPI Process Startup
- •The Info Object
- •Process Creation and Management
- •Introduction
- •The Dynamic Process Model
- •Starting Processes
- •The Runtime Environment
- •Process Manager Interface
- •Processes in MPI
- •Starting Processes and Establishing Communication
- •Reserved Keys
- •Spawn Example
- •Manager-worker Example, Using MPI_COMM_SPAWN.
- •Establishing Communication
- •Names, Addresses, Ports, and All That
- •Server Routines
- •Client Routines
- •Name Publishing
- •Reserved Key Values
- •Client/Server Examples
- •Ocean/Atmosphere - Relies on Name Publishing
- •Simple Client-Server Example.
- •Other Functionality
- •Universe Size
- •Singleton MPI_INIT
- •MPI_APPNUM
- •Releasing Connections
- •Another Way to Establish MPI Communication
- •One-Sided Communications
- •Introduction
- •Initialization
- •Window Creation
- •Window Attributes
- •Communication Calls
- •Examples
- •Accumulate Functions
- •Synchronization Calls
- •Fence
- •General Active Target Synchronization
- •Lock
- •Assertions
- •Examples
- •Error Handling
- •Error Handlers
- •Error Classes
- •Semantics and Correctness
- •Atomicity
- •Progress
- •Registers and Compiler Optimizations
- •External Interfaces
- •Introduction
- •Generalized Requests
- •Examples
- •Associating Information with Status
- •MPI and Threads
- •General
- •Initialization
- •Introduction
- •File Manipulation
- •Opening a File
- •Closing a File
- •Deleting a File
- •Resizing a File
- •Preallocating Space for a File
- •Querying the Size of a File
- •Querying File Parameters
- •File Info
- •Reserved File Hints
- •File Views
- •Data Access
- •Data Access Routines
- •Positioning
- •Synchronism
- •Coordination
- •Data Access Conventions
- •Data Access with Individual File Pointers
- •Data Access with Shared File Pointers
- •Noncollective Operations
- •Collective Operations
- •Seek
- •Split Collective Data Access Routines
- •File Interoperability
- •Datatypes for File Interoperability
- •Extent Callback
- •Datarep Conversion Functions
- •Matching Data Representations
- •Consistency and Semantics
- •File Consistency
- •Random Access vs. Sequential Files
- •Progress
- •Collective File Operations
- •Type Matching
- •Logical vs. Physical File Layout
- •File Size
- •Examples
- •Asynchronous I/O
- •I/O Error Handling
- •I/O Error Classes
- •Examples
- •Subarray Filetype Constructor
- •Requirements
- •Discussion
- •Logic of the Design
- •Examples
- •MPI Library Implementation
- •Systems with Weak Symbols
- •Systems Without Weak Symbols
- •Complications
- •Multiple Counting
- •Linker Oddities
- •Multiple Levels of Interception
- •Deprecated Functions
- •Deprecated since MPI-2.0
- •Deprecated since MPI-2.2
- •Language Bindings
- •Overview
- •Design
- •C++ Classes for MPI
- •Class Member Functions for MPI
- •Semantics
- •C++ Datatypes
- •Communicators
- •Exceptions
- •Mixed-Language Operability
- •Problems With Fortran Bindings for MPI
- •Problems Due to Strong Typing
- •Problems Due to Data Copying and Sequence Association
- •Special Constants
- •Fortran 90 Derived Types
- •A Problem with Register Optimization
- •Basic Fortran Support
- •Extended Fortran Support
- •The mpi Module
- •No Type Mismatch Problems for Subroutines with Choice Arguments
- •Additional Support for Fortran Numeric Intrinsic Types
- •Language Interoperability
- •Introduction
- •Assumptions
- •Initialization
- •Transfer of Handles
- •Status
- •MPI Opaque Objects
- •Datatypes
- •Callback Functions
- •Error Handlers
- •Reduce Operations
- •Addresses
- •Attributes
- •Extra State
- •Constants
- •Interlanguage Communication
- •Language Bindings Summary
- •Groups, Contexts, Communicators, and Caching Fortran Bindings
- •External Interfaces C++ Bindings
- •Change-Log
- •Bibliography
- •Examples Index
- •MPI Declarations Index
- •MPI Function Index
3.6. BUFFER ALLOCATION AND USAGE (page 45)
Example 3.10 An exchange that relies on buffering.

CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank.EQ.0) THEN
    CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
    CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
ELSE IF (rank.EQ.1) THEN
    CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
    CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
END IF
The message sent by each process has to be copied out before the send operation returns and the receive operation starts. For the program to complete, it is necessary that at least one of the two messages sent be buffered. Thus, this program can succeed only if the communication system can buffer at least count words of data.

Advice to users. When standard send operations are used, then a deadlock situation may occur where both processes are blocked because buffer space is not available. The same will certainly happen if the synchronous mode is used. If the buffered mode is used, and not enough buffer space is available, then the program will not complete either. However, rather than a deadlock situation, we shall have a buffer overflow error.

A program is "safe" if no message buffering is required for the program to complete. One can replace all sends in such a program with synchronous sends, and the program will still run correctly. This conservative programming style provides the best portability, since program completion does not depend on the amount of buffer space available or on the communication protocol used.

Many programmers prefer to have more leeway and opt to use the "unsafe" programming style shown in Example 3.10. In such cases, the use of standard sends is likely to provide the best compromise between performance and robustness: quality implementations will provide sufficient buffering so that "common practice" programs will not deadlock. The buffered send mode can be used for programs that require more buffering, or in situations where the programmer wants more control. This mode might also be used for debugging purposes, as buffer overflow conditions are easier to diagnose than deadlock conditions.

Nonblocking message-passing operations, as described in Section 3.7, can be used to avoid the need for buffering outgoing messages. This prevents deadlocks due to lack of buffer space, and improves performance, by allowing overlap of computation and communication, and avoiding the overheads of allocating buffers and copying messages into buffers. (End of advice to users.)
3.6 Buffer Allocation and Usage

A user may specify a buffer to be used for buffering messages sent in buffered mode. Buffering is done by the sender.
CHAPTER 3. POINT-TO-POINT COMMUNICATION (page 46)
MPI_BUFFER_ATTACH(buffer, size)

  IN   buffer   initial buffer address (choice)
  IN   size     buffer size, in bytes (non-negative integer)

int MPI_Buffer_attach(void* buffer, int size)

MPI_BUFFER_ATTACH(BUFFER, SIZE, IERROR)
    <type> BUFFER(*)
    INTEGER SIZE, IERROR

{void MPI::Attach_buffer(void* buffer, int size)} (binding deprecated, see Section 15.2)

Provides to MPI a buffer in the user's memory to be used for buffering outgoing messages. The buffer is used only by messages sent in buffered mode. Only one buffer can be attached to a process at a time.
MPI_BUFFER_DETACH(buffer_addr, size)

  OUT  buffer_addr  initial buffer address (choice)
  OUT  size         buffer size, in bytes (non-negative integer)

int MPI_Buffer_detach(void* buffer_addr, int* size)

MPI_BUFFER_DETACH(BUFFER_ADDR, SIZE, IERROR)
    <type> BUFFER_ADDR(*)
    INTEGER SIZE, IERROR

{int MPI::Detach_buffer(void*& buffer) (binding deprecated, see Section 15.2)}

Detach the buffer currently associated with MPI. The call returns the address and the size of the detached buffer. This operation will block until all messages currently in the buffer have been transmitted. Upon return of this function, the user may reuse or deallocate the space taken by the buffer.
Example 3.11 Calls to attach and detach buffers.

#define BUFFSIZE 10000
int size;
char *buff;
MPI_Buffer_attach( malloc(BUFFSIZE), BUFFSIZE);
/* a buffer of 10000 bytes can now be used by MPI_Bsend */
MPI_Buffer_detach( &buff, &size);
/* Buffer size reduced to zero */
MPI_Buffer_attach( buff, size);
/* Buffer of 10000 bytes available again */

Advice to users. Even though the C functions MPI_Buffer_attach and MPI_Buffer_detach both have a first argument of type void*, these arguments are used differently: a pointer to the buffer is passed to MPI_Buffer_attach; the address of the pointer is passed to MPI_Buffer_detach, so that this call can return the pointer value. (End of advice to users.)
Rationale. Both arguments are defined to be of type void* (rather than void* and void**, respectively), so as to avoid complex type casts. E.g., in the last example, &buff, which is of type char**, can be passed as argument to MPI_Buffer_detach without type casting. If the formal parameter had type void** then we would need a type cast before and after the call. (End of rationale.)
The statements made in this section describe the behavior of MPI for buffered-mode sends. When no buffer is currently associated, MPI behaves as if a zero-sized buffer is associated with the process.

MPI must provide as much buffering for outgoing messages as if outgoing message data were buffered by the sending process, in the specified buffer space, using a circular, contiguous-space allocation policy. We outline below a model implementation that defines this policy. MPI may provide more buffering, and may use a better buffer allocation algorithm than described below. On the other hand, MPI may signal an error whenever the simple buffering allocator described below would run out of space. In particular, if no buffer is explicitly associated with the process, then any buffered send may cause an error.

MPI does not provide mechanisms for querying or controlling buffering done by standard mode sends. It is expected that vendors will provide such information for their implementations.

Rationale. There is a wide spectrum of possible implementations of buffered communication: buffering can be done at sender, at receiver, or both; buffers can be dedicated to one sender-receiver pair, or be shared by all communications; buffering can be done in real or in virtual memory; it can use dedicated memory, or memory shared by other processes; buffer space may be allocated statically or be changed dynamically; etc. It does not seem feasible to provide a portable mechanism for querying or controlling buffering that would be compatible with all these choices, yet provide meaningful information. (End of rationale.)
3.6.1 Model Implementation of Buffered Mode

The model implementation uses the packing and unpacking functions described in Section 4.2 and the nonblocking communication functions described in Section 3.7.

We assume that a circular queue of pending message entries (PME) is maintained. Each entry contains a communication request handle that identifies a pending nonblocking send, a pointer to the next entry and the packed message data. The entries are stored in successive locations in the buffer. Free space is available between the queue tail and the queue head.

A buffered send call results in the execution of the following code.

Traverse sequentially the PME queue from head towards the tail, deleting all entries for communications that have completed, up to the first entry with an uncompleted request; update queue head to point to that entry.

Compute the number, n, of bytes needed to store an entry for the new message. An upper bound on n can be computed as follows: A call to the function