- •Contents
- •List of Figures
- •List of Tables
- •Acknowledgments
- •Introduction to MPI
- •Overview and Goals
- •Background of MPI-1.0
- •Background of MPI-1.1, MPI-1.2, and MPI-2.0
- •Background of MPI-1.3 and MPI-2.1
- •Background of MPI-2.2
- •Who Should Use This Standard?
- •What Platforms Are Targets For Implementation?
- •What Is Included In The Standard?
- •What Is Not Included In The Standard?
- •Organization of this Document
- •MPI Terms and Conventions
- •Document Notation
- •Naming Conventions
- •Semantic Terms
- •Data Types
- •Opaque Objects
- •Array Arguments
- •State
- •Named Constants
- •Choice
- •Addresses
- •Language Binding
- •Deprecated Names and Functions
- •Fortran Binding Issues
- •C Binding Issues
- •C++ Binding Issues
- •Functions and Macros
- •Processes
- •Error Handling
- •Implementation Issues
- •Independence of Basic Runtime Routines
- •Interaction with Signals
- •Examples
- •Point-to-Point Communication
- •Introduction
- •Blocking Send and Receive Operations
- •Blocking Send
- •Message Data
- •Message Envelope
- •Blocking Receive
- •Return Status
- •Passing MPI_STATUS_IGNORE for Status
- •Data Type Matching and Data Conversion
- •Type Matching Rules
- •Type MPI_CHARACTER
- •Data Conversion
- •Communication Modes
- •Semantics of Point-to-Point Communication
- •Buffer Allocation and Usage
- •Nonblocking Communication
- •Communication Request Objects
- •Communication Initiation
- •Communication Completion
- •Semantics of Nonblocking Communications
- •Multiple Completions
- •Non-destructive Test of status
- •Probe and Cancel
- •Persistent Communication Requests
- •Send-Receive
- •Null Processes
- •Datatypes
- •Derived Datatypes
- •Type Constructors with Explicit Addresses
- •Datatype Constructors
- •Subarray Datatype Constructor
- •Distributed Array Datatype Constructor
- •Address and Size Functions
- •Lower-Bound and Upper-Bound Markers
- •Extent and Bounds of Datatypes
- •True Extent of Datatypes
- •Commit and Free
- •Duplicating a Datatype
- •Use of General Datatypes in Communication
- •Correct Use of Addresses
- •Decoding a Datatype
- •Examples
- •Pack and Unpack
- •Canonical MPI_PACK and MPI_UNPACK
- •Collective Communication
- •Introduction and Overview
- •Communicator Argument
- •Applying Collective Operations to Intercommunicators
- •Barrier Synchronization
- •Broadcast
- •Example using MPI_BCAST
- •Gather
- •Examples using MPI_GATHER, MPI_GATHERV
- •Scatter
- •Examples using MPI_SCATTER, MPI_SCATTERV
- •Example using MPI_ALLGATHER
- •All-to-All Scatter/Gather
- •Global Reduction Operations
- •Reduce
- •Signed Characters and Reductions
- •MINLOC and MAXLOC
- •All-Reduce
- •Process-local reduction
- •Reduce-Scatter
- •MPI_REDUCE_SCATTER_BLOCK
- •MPI_REDUCE_SCATTER
- •Scan
- •Inclusive Scan
- •Exclusive Scan
- •Example using MPI_SCAN
- •Correctness
- •Introduction
- •Features Needed to Support Libraries
- •MPI's Support for Libraries
- •Basic Concepts
- •Groups
- •Contexts
- •Intra-Communicators
- •Group Management
- •Group Accessors
- •Group Constructors
- •Group Destructors
- •Communicator Management
- •Communicator Accessors
- •Communicator Constructors
- •Communicator Destructors
- •Motivating Examples
- •Current Practice #1
- •Current Practice #2
- •(Approximate) Current Practice #3
- •Example #4
- •Library Example #1
- •Library Example #2
- •Inter-Communication
- •Inter-communicator Accessors
- •Inter-communicator Operations
- •Inter-Communication Examples
- •Caching
- •Functionality
- •Communicators
- •Windows
- •Datatypes
- •Error Class for Invalid Keyval
- •Attributes Example
- •Naming Objects
- •Formalizing the Loosely Synchronous Model
- •Basic Statements
- •Models of Execution
- •Static communicator allocation
- •Dynamic communicator allocation
- •The General case
- •Process Topologies
- •Introduction
- •Virtual Topologies
- •Embedding in MPI
- •Overview of the Functions
- •Topology Constructors
- •Cartesian Constructor
- •Cartesian Convenience Function: MPI_DIMS_CREATE
- •General (Graph) Constructor
- •Distributed (Graph) Constructor
- •Topology Inquiry Functions
- •Cartesian Shift Coordinates
- •Partitioning of Cartesian structures
- •Low-Level Topology Functions
- •An Application Example
- •MPI Environmental Management
- •Implementation Information
- •Version Inquiries
- •Environmental Inquiries
- •Tag Values
- •Host Rank
- •IO Rank
- •Clock Synchronization
- •Memory Allocation
- •Error Handling
- •Error Handlers for Communicators
- •Error Handlers for Windows
- •Error Handlers for Files
- •Freeing Errorhandlers and Retrieving Error Strings
- •Error Codes and Classes
- •Error Classes, Error Codes, and Error Handlers
- •Timers and Synchronization
- •Startup
- •Allowing User Functions at Process Termination
- •Determining Whether MPI Has Finished
- •Portable MPI Process Startup
- •The Info Object
- •Process Creation and Management
- •Introduction
- •The Dynamic Process Model
- •Starting Processes
- •The Runtime Environment
- •Process Manager Interface
- •Processes in MPI
- •Starting Processes and Establishing Communication
- •Reserved Keys
- •Spawn Example
- •Manager-worker Example, Using MPI_COMM_SPAWN.
- •Establishing Communication
- •Names, Addresses, Ports, and All That
- •Server Routines
- •Client Routines
- •Name Publishing
- •Reserved Key Values
- •Client/Server Examples
- •Ocean/Atmosphere - Relies on Name Publishing
- •Simple Client-Server Example.
- •Other Functionality
- •Universe Size
- •Singleton MPI_INIT
- •MPI_APPNUM
- •Releasing Connections
- •Another Way to Establish MPI Communication
- •One-Sided Communications
- •Introduction
- •Initialization
- •Window Creation
- •Window Attributes
- •Communication Calls
- •Examples
- •Accumulate Functions
- •Synchronization Calls
- •Fence
- •General Active Target Synchronization
- •Lock
- •Assertions
- •Examples
- •Error Handling
- •Error Handlers
- •Error Classes
- •Semantics and Correctness
- •Atomicity
- •Progress
- •Registers and Compiler Optimizations
- •External Interfaces
- •Introduction
- •Generalized Requests
- •Examples
- •Associating Information with Status
- •MPI and Threads
- •General
- •Initialization
- •Introduction
- •File Manipulation
- •Opening a File
- •Closing a File
- •Deleting a File
- •Resizing a File
- •Preallocating Space for a File
- •Querying the Size of a File
- •Querying File Parameters
- •File Info
- •Reserved File Hints
- •File Views
- •Data Access
- •Data Access Routines
- •Positioning
- •Synchronism
- •Coordination
- •Data Access Conventions
- •Data Access with Individual File Pointers
- •Data Access with Shared File Pointers
- •Noncollective Operations
- •Collective Operations
- •Seek
- •Split Collective Data Access Routines
- •File Interoperability
- •Datatypes for File Interoperability
- •Extent Callback
- •Datarep Conversion Functions
- •Matching Data Representations
- •Consistency and Semantics
- •File Consistency
- •Random Access vs. Sequential Files
- •Progress
- •Collective File Operations
- •Type Matching
- •Logical vs. Physical File Layout
- •File Size
- •Examples
- •Asynchronous I/O
- •I/O Error Handling
- •I/O Error Classes
- •Examples
- •Subarray Filetype Constructor
- •Requirements
- •Discussion
- •Logic of the Design
- •Examples
- •MPI Library Implementation
- •Systems with Weak Symbols
- •Systems Without Weak Symbols
- •Complications
- •Multiple Counting
- •Linker Oddities
- •Multiple Levels of Interception
- •Deprecated Functions
- •Deprecated since MPI-2.0
- •Deprecated since MPI-2.2
- •Language Bindings
- •Overview
- •Design
- •C++ Classes for MPI
- •Class Member Functions for MPI
- •Semantics
- •C++ Datatypes
- •Communicators
- •Exceptions
- •Mixed-Language Operability
- •Problems With Fortran Bindings for MPI
- •Problems Due to Strong Typing
- •Problems Due to Data Copying and Sequence Association
- •Special Constants
- •Fortran 90 Derived Types
- •A Problem with Register Optimization
- •Basic Fortran Support
- •Extended Fortran Support
- •The mpi Module
- •No Type Mismatch Problems for Subroutines with Choice Arguments
- •Additional Support for Fortran Numeric Intrinsic Types
- •Language Interoperability
- •Introduction
- •Assumptions
- •Initialization
- •Transfer of Handles
- •Status
- •MPI Opaque Objects
- •Datatypes
- •Callback Functions
- •Error Handlers
- •Reduce Operations
- •Addresses
- •Attributes
- •Extra State
- •Constants
- •Interlanguage Communication
- •Language Bindings Summary
- •Groups, Contexts, Communicators, and Caching Fortran Bindings
- •External Interfaces C++ Bindings
- •Change-Log
- •Bibliography
- •Examples Index
- •MPI Declarations Index
- •MPI Function Index
CHAPTER 3. POINT-TO-POINT COMMUNICATION
comm argument. This will allow communication with all the processes available at initialization time.

Users may define new communicators, as explained in Chapter 6. Communicators provide an important encapsulation mechanism for libraries and modules. They allow modules to have their own disjoint communication universe and their own process numbering scheme. (End of advice to users.)

Advice to implementors. The message envelope would normally be encoded by a fixed-length message header. However, the actual encoding is implementation dependent. Some of the information (e.g., source or destination) may be implicit, and need not be explicitly carried by messages. Also, processes may be identified by relative ranks, or absolute ids, etc. (End of advice to implementors.)
3.2.4 Blocking Receive

The syntax of the blocking receive operation is given below.
MPI_RECV(buf, count, datatype, source, tag, comm, status)

  OUT  buf       initial address of receive buffer (choice)
  IN   count     number of elements in receive buffer (non-negative integer)
  IN   datatype  datatype of each receive buffer element (handle)
  IN   source    rank of source or MPI_ANY_SOURCE (integer)
  IN   tag       message tag or MPI_ANY_TAG (integer)
  IN   comm      communicator (handle)
  OUT  status    status object (Status)
int MPI_Recv(void* buf, int count, MPI_Datatype datatype, int source,
             int tag, MPI_Comm comm, MPI_Status *status)
The blocking semantics of this call are described in Section 3.4.

The receive buffer consists of the storage containing count consecutive elements of the type specified by datatype, starting at address buf. The length of the received message must be less than or equal to the length of the receive buffer. An overflow error occurs if all incoming data does not fit, without truncation, into the receive buffer.
3.2. BLOCKING SEND AND RECEIVE OPERATIONS
If a message that is shorter than the receive buffer arrives, then only those locations corresponding to the (shorter) message are modified.

Advice to users. The MPI_PROBE function described in Section 3.8 can be used to receive messages of unknown length. (End of advice to users.)
Advice to implementors. Even though no specific behavior is mandated by MPI for erroneous programs, the recommended handling of overflow situations is to return in status information about the source and tag of the incoming message. The receive operation will return an error code. A quality implementation will also ensure that no memory that is outside the receive buffer will ever be overwritten.

In the case of a message shorter than the receive buffer, MPI is quite strict in that it allows no modification of the other locations. A more lenient statement would allow for some optimizations, but this is not allowed. The implementation must be ready to end a copy into the receiver memory exactly at the end of the receive buffer, even if it is an odd address. (End of advice to implementors.)

The selection of a message by a receive operation is governed by the value of the message envelope. A message can be received by a receive operation if its envelope matches the source, tag and comm values specified by the receive operation. The receiver may specify a wildcard MPI_ANY_SOURCE value for source, and/or a wildcard MPI_ANY_TAG value for tag, indicating that any source and/or tag are acceptable. It cannot specify a wildcard value for comm. Thus, a message can be received by a receive operation only if it is addressed to the receiving process, has a matching communicator, has matching source unless source=MPI_ANY_SOURCE in the pattern, and has a matching tag unless tag=MPI_ANY_TAG in the pattern.
The message tag is specified by the tag argument of the receive operation. The argument source, if different from MPI_ANY_SOURCE, is specified as a rank within the process group associated with that same communicator (remote process group, for intercommunicators). Thus, the range of valid values for the source argument is {0,...,n-1} ∪ {MPI_ANY_SOURCE}, where n is the number of processes in this group.

Note the asymmetry between send and receive operations: a receive operation may accept messages from an arbitrary sender; a send operation, on the other hand, must specify a unique receiver. This matches a "push" communication mechanism, where data transfer is effected by the sender (rather than a "pull" mechanism, where data transfer is effected by the receiver).

Source = destination is allowed, that is, a process can send a message to itself. (However, it is unsafe to do so with the blocking send and receive operations described above, since this may lead to deadlock. See Section 3.5.)

Advice to implementors. Message context and other communicator information can be implemented as an additional tag field. It differs from the regular message tag in that wildcard matching is not allowed on this field, and that value setting for this field is controlled by communicator manipulation functions. (End of advice to implementors.)
3.2.5 Return Status
The source or tag of a received message may not be known if wildcard values were used in the receive operation. Also, if multiple requests are completed by a single MPI function
(see Section 3.7.5), a distinct error code may need to be returned for each request. The information is returned by the status argument of MPI_RECV. The type of status is MPI-defined. Status variables need to be explicitly allocated by the user, that is, they are not system objects.

In C, status is a structure that contains three fields named MPI_SOURCE, MPI_TAG, and MPI_ERROR; the structure may contain additional fields. Thus, status.MPI_SOURCE, status.MPI_TAG and status.MPI_ERROR contain the source, tag, and error code, respectively, of the received message.

In Fortran, status is an array of INTEGERs of size MPI_STATUS_SIZE. The constants MPI_SOURCE, MPI_TAG and MPI_ERROR are the indices of the entries that store the source, tag and error fields. Thus, status(MPI_SOURCE), status(MPI_TAG) and status(MPI_ERROR) contain, respectively, the source, tag and error code of the received message.
In C++, the status object is handled through the following methods:

{ int MPI::Status::Get_source() const (binding deprecated, see Section 15.2) }

{ void MPI::Status::Set_source(int source) (binding deprecated, see Section 15.2) }

{ int MPI::Status::Get_tag() const (binding deprecated, see Section 15.2) }

{ void MPI::Status::Set_tag(int tag) (binding deprecated, see Section 15.2) }

{ int MPI::Status::Get_error() const (binding deprecated, see Section 15.2) }

{ void MPI::Status::Set_error(int error) (binding deprecated, see Section 15.2) }
In general, message-passing calls do not modify the value of the error code field of status variables. This field may be updated only by the functions in Section 3.7.5 which return multiple statuses. The field is updated if and only if such a function returns with an error code of MPI_ERR_IN_STATUS.

Rationale. The error field in status is not needed for calls that return only one status, such as MPI_WAIT, since that would only duplicate the information returned by the function itself. The current design avoids the additional overhead of setting it in such cases. The field is needed for calls that return multiple statuses, since each request may have had a different failure. (End of rationale.)
The status argument also returns information on the length of the message received. However, this information is not directly available as a field of the status variable and a call to MPI_GET_COUNT is required to "decode" this information.
MPI_GET_COUNT(status, datatype, count)

  IN   status    return status of receive operation (Status)
  IN   datatype  datatype of each receive buffer entry (handle)
  OUT  count     number of received entries (integer)
int MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count)

MPI_GET_COUNT(STATUS, DATATYPE, COUNT, IERROR)
    INTEGER STATUS(MPI_STATUS_SIZE), DATATYPE, COUNT, IERROR
{ int MPI::Status::Get_count(const MPI::Datatype& datatype) const (binding deprecated, see Section 15.2) }
Returns the number of entries received. (Again, we count entries, each of type datatype, not bytes.) The datatype argument should match the argument provided by the receive call that set the status variable. (We shall later see, in Section 4.1.11, that MPI_GET_COUNT may return, in certain situations, the value MPI_UNDEFINED.)
Rationale. Some message-passing libraries use INOUT count, tag and source arguments, thus using them both to specify the selection criteria for incoming messages and to return the actual envelope values of the received message. The use of a separate status argument prevents errors that are often associated with INOUT arguments (e.g., using the MPI_ANY_TAG constant as the tag in a receive). Some libraries use calls that refer implicitly to the "last message received." This is not thread safe.

The datatype argument is passed to MPI_GET_COUNT so as to improve performance. A message might be received without counting the number of elements it contains, and the count value is often not needed. Also, this allows the same function to be used after a call to MPI_PROBE or MPI_IPROBE. With a status from MPI_PROBE or MPI_IPROBE, the same datatypes are allowed as in a call to MPI_RECV to receive this message. (End of rationale.)
The value returned as the count argument of MPI_GET_COUNT for a datatype of length zero, where zero bytes have been transferred, is zero. If the number of bytes transferred is greater than zero, MPI_UNDEFINED is returned.
Rationale. Zero-length datatypes may be created in a number of cases. An important case is MPI_TYPE_CREATE_DARRAY, where the definition of the particular darray results in an empty block on some MPI process. Programs written in an SPMD style will not check for this special case and may want to use MPI_GET_COUNT to check the status. (End of rationale.)
Advice to users. The buffer size required for the receive can be affected by data conversions and by the stride of the receive datatype. In most cases, the safest approach is to use the same datatype with MPI_GET_COUNT and the receive. (End of advice to users.)
All send and receive operations use the buf, count, datatype, source, dest, tag, comm and status arguments in the same way as the blocking MPI_SEND and MPI_RECV operations described in this section.
3.2.6 Passing MPI_STATUS_IGNORE for Status
Every call to MPI_RECV includes a status argument, wherein the system can return details about the message received. There are also a number of other MPI calls where status is returned. An object of type MPI_STATUS is not an MPI opaque object; its structure is declared in mpi.h and mpif.h, and it exists in the user's program. In many cases, application programs are constructed so that it is unnecessary for them to examine the