After the call, every process has the group-wide concatenation of the sets of data.
5.8 All-to-All Scatter/Gather
MPI_ALLTOALL(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm)

IN    sendbuf      starting address of send buffer (choice)
IN    sendcount    number of elements sent to each process (non-negative integer)
IN    sendtype     data type of send buffer elements (handle)
OUT   recvbuf      address of receive buffer (choice)
IN    recvcount    number of elements received from any process (non-negative integer)
IN    recvtype     data type of receive buffer elements (handle)
IN    comm         communicator (handle)
int MPI_Alltoall(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)
MPI_ALLTOALL(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNT, RECVTYPE, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE, COMM, IERROR
{void MPI::Comm::Alltoall(const void* sendbuf, int sendcount,
    const MPI::Datatype& sendtype, void* recvbuf, int recvcount,
    const MPI::Datatype& recvtype) const = 0 (binding deprecated, see Section 15.2)}
MPI_ALLTOALL is an extension of MPI_ALLGATHER to the case where each process sends distinct data to each of the receivers. The j-th block sent from process i is received by process j and is placed in the i-th block of recvbuf.
The type signature associated with sendcount, sendtype at a process must be equal to the type signature associated with recvcount, recvtype at any other process. This implies that the amount of data sent must be equal to the amount of data received, pairwise between every pair of processes. As usual, however, the type maps may be different.
If comm is an intracommunicator, the outcome is as if each process executed a send to each process (itself included) with a call to,

    MPI_Send(sendbuf + i · sendcount · extent(sendtype), sendcount, sendtype, i, ...),

and a receive from every other process with a call to,

    MPI_Recv(recvbuf + i · recvcount · extent(recvtype), recvcount, recvtype, i, ...).

All arguments on all processes are significant. The argument comm must have identical values on all processes.
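As an illustrative sketch (not part of the standard text; buffer contents and the helper payload scheme are invented for illustration), the following C program has every process contribute one distinct integer per destination:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *sendbuf = malloc(size * sizeof(int));
    int *recvbuf = malloc(size * sizeof(int));

    /* Block j of the send buffer is destined for process j. */
    for (int j = 0; j < size; j++)
        sendbuf[j] = 100 * rank + j;

    /* Every process sends one int to, and receives one int from,
       every process (itself included). */
    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    /* recvbuf[i] now holds the value process i addressed to this rank,
       i.e., 100*i + rank. */
    for (int i = 0; i < size; i++)
        printf("rank %d got %d from rank %d\n", rank, recvbuf[i], i);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}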
The "in place" option for intracommunicators is specified by passing MPI_IN_PLACE to the argument sendbuf at all processes. In such a case, sendcount and sendtype are ignored. The data to be sent is taken from the recvbuf and replaced by the received data. Data sent and received must have the same type map as specified by recvcount and recvtype.

Rationale. For large MPI_ALLTOALL instances, allocating both send and receive buffers may consume too much memory. The "in place" option effectively halves the application memory consumption and is useful in situations where the data to be sent will not be used by the sending process after the MPI_ALLTOALL exchange (e.g., in parallel Fast Fourier Transforms). (End of rationale.)

Advice to implementors. Users may opt to use the "in place" option in order to conserve memory. Quality MPI implementations should thus strive to minimize system buffering. (End of advice to implementors.)
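A minimal sketch of the in-place form (illustrative only, not from the standard text): because sendcount and sendtype are ignored when MPI_IN_PLACE is given, the 0 and MPI_DATATYPE_NULL below are mere placeholders.

#include <stdlib.h>
#include <mpi.h>

/* Sketch: assumes MPI_Init has already been called. */
void alltoall_inplace_sketch(MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int *buf = malloc(size * sizeof(int));

    /* Before the call, block j of buf holds the data destined for rank j. */
    for (int j = 0; j < size; j++)
        buf[j] = 100 * rank + j;

    /* sendcount and sendtype are ignored with MPI_IN_PLACE; the exchange
       reads from and writes back into the single buffer buf. */
    MPI_Alltoall(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                 buf, 1, MPI_INT, comm);

    /* After the call, block i of buf holds the data sent by rank i. */
    free(buf);
}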
If comm is an intercommunicator, then the outcome is as if each process in group A sends a message to each process in group B, and vice versa. The j-th send buffer of process i in group A should be consistent with the i-th receive buffer of process j in group B, and vice versa.

Advice to users. When a complete exchange is executed on an intercommunication domain, then the number of data items sent from processes in group A to processes in group B need not equal the number of items sent in the reverse direction. In particular, one can have unidirectional communication by specifying sendcount = 0 in the reverse direction. (End of advice to users.)
MPI_ALLTOALLV(sendbuf, sendcounts, sdispls, sendtype, recvbuf, recvcounts, rdispls, recvtype, comm)

IN    sendbuf      starting address of send buffer (choice)
IN    sendcounts   non-negative integer array (of length group size) specifying
                   the number of elements to send to each processor
IN    sdispls      integer array (of length group size). Entry j specifies the
                   displacement (relative to sendbuf) from which to take the
                   outgoing data destined for process j
IN    sendtype     data type of send buffer elements (handle)
OUT   recvbuf      address of receive buffer (choice)
IN    recvcounts   non-negative integer array (of length group size) specifying
                   the number of elements that can be received from each processor
IN    rdispls      integer array (of length group size). Entry i specifies the
                   displacement (relative to recvbuf) at which to place the
                   incoming data from process i
IN    recvtype     data type of receive buffer elements (handle)
IN    comm         communicator (handle)
int MPI_Alltoallv(void* sendbuf, int *sendcounts, int *sdispls, MPI_Datatype sendtype, void* recvbuf, int *recvcounts, int *rdispls, MPI_Datatype recvtype, MPI_Comm comm)
MPI_ALLTOALLV(SENDBUF, SENDCOUNTS, SDISPLS, SENDTYPE, RECVBUF, RECVCOUNTS, RDISPLS, RECVTYPE, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER SENDCOUNTS(*), SDISPLS(*), SENDTYPE, RECVCOUNTS(*), RDISPLS(*), RECVTYPE, COMM, IERROR
{void MPI::Comm::Alltoallv(const void* sendbuf, const int sendcounts[],
    const int sdispls[], const MPI::Datatype& sendtype, void* recvbuf,
    const int recvcounts[], const int rdispls[],
    const MPI::Datatype& recvtype) const = 0 (binding deprecated, see Section 15.2)}
MPI_ALLTOALLV adds flexibility to MPI_ALLTOALL in that the location of data for the send is specified by sdispls and the location of the placement of the data on the receive side is specified by rdispls.
If comm is an intracommunicator, then the j-th block sent from process i is received by process j and is placed in the i-th block of recvbuf. These blocks need not all have the same size.
The type signature associated with sendcounts[j], sendtype at process i must be equal to the type signature associated with recvcounts[i], recvtype at process j. This implies that the amount of data sent must be equal to the amount of data received, pairwise between every pair of processes. Distinct type maps between sender and receiver are still allowed.
The outcome is as if each process sent a message to every other process with,

    MPI_Send(sendbuf + sdispls[i] · extent(sendtype), sendcounts[i], sendtype, i, ...),

and received a message from every other process with a call to

    MPI_Recv(recvbuf + rdispls[i] · extent(recvtype), recvcounts[i], recvtype, i, ...).

All arguments on all processes are significant. The argument comm must have identical values on all processes.

The "in place" option for intracommunicators is specified by passing MPI_IN_PLACE to the argument sendbuf at all processes. In such a case, sendcounts, sdispls and sendtype are ignored. The data to be sent is taken from the recvbuf and replaced by the received data. Data sent and received must have the same type map as specified by the recvcounts array and the recvtype, and is taken from the locations of the receive buffer specified by rdispls.

Advice to users. Specifying the "in place" option (which must be given on all processes) implies that the same amount and type of data is sent and received between any two processes in the group of the communicator. Different pairs of processes can exchange different amounts of data. Users must ensure that recvcounts[j] and recvtype on process i match recvcounts[i] and recvtype on process j. This symmetric exchange can be useful in applications where the data to be sent will not be used by the sending process after the MPI_ALLTOALLV exchange. (End of advice to users.)

If comm is an intercommunicator, then the outcome is as if each process in group A sends a message to each process in group B, and vice versa. The j-th send buffer of process i in group A should be consistent with the i-th receive buffer of process j in group B, and vice versa.

Rationale. The definitions of MPI_ALLTOALL and MPI_ALLTOALLV give as much flexibility as one would achieve by specifying n independent, point-to-point communications, with two exceptions: all messages use the same datatype, and messages are scattered from (or gathered to) sequential storage. (End of rationale.)

Advice to implementors. Although the discussion of collective communication in terms of point-to-point operation implies that each message is transferred directly from sender to receiver, implementations may use a tree communication pattern. Messages can be forwarded by intermediate nodes where they are split (for scatter) or concatenated (for gather), if this is more efficient. (End of advice to implementors.)
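An illustrative sketch of MPI_ALLTOALLV (the counts scheme is invented for illustration, not taken from the standard): rank r sends r + 1 integers to every destination, so every receiver sets recvcounts[i] = i + 1 and builds rdispls as the prefix sum of recvcounts, which satisfies the pairwise matching rule above.

#include <stdlib.h>
#include <mpi.h>

/* Sketch: assumes MPI_Init has already been called. */
void alltoallv_sketch(MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int *sendcounts = malloc(size * sizeof(int));
    int *sdispls    = malloc(size * sizeof(int));
    int *recvcounts = malloc(size * sizeof(int));
    int *rdispls    = malloc(size * sizeof(int));

    for (int j = 0; j < size; j++) {
        sendcounts[j] = rank + 1;            /* same count to every peer    */
        sdispls[j]    = j * (rank + 1);      /* contiguous send blocks      */
        recvcounts[j] = j + 1;               /* rank j sends j+1 elements   */
        rdispls[j]    = j * (j + 1) / 2;     /* prefix sum of recvcounts    */
    }

    int *sendbuf = malloc(size * (rank + 1) * sizeof(int));
    int *recvbuf = malloc((size * (size + 1) / 2) * sizeof(int));
    for (int k = 0; k < size * (rank + 1); k++)
        sendbuf[k] = rank;                   /* payload: the sender's rank  */

    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                  recvbuf, recvcounts, rdispls, MPI_INT, comm);

    /* recvbuf now holds i+1 copies of i at offset rdispls[i], for each i. */
    free(sendcounts); free(sdispls); free(recvcounts); free(rdispls);
    free(sendbuf); free(recvbuf);
}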
MPI_ALLTOALLW(sendbuf, sendcounts, sdispls, sendtypes, recvbuf, recvcounts, rdispls, recvtypes, comm)

IN    sendbuf      starting address of send buffer (choice)
IN    sendcounts   non-negative integer array (of length group size) specifying
                   the number of elements to send to each processor
IN    sdispls      integer array (of length group size). Entry j specifies the
                   displacement in bytes (relative to sendbuf) from which to take
                   the outgoing data destined for process j (array of integers)
IN    sendtypes    array of datatypes (of length group size). Entry j specifies
                   the type of data to send to process j (array of handles)
OUT   recvbuf      address of receive buffer (choice)
IN    recvcounts   non-negative integer array (of length group size) specifying
                   the number of elements that can be received from each processor
IN    rdispls      integer array (of length group size). Entry i specifies the
                   displacement in bytes (relative to recvbuf) at which to place
                   the incoming data from process i (array of integers)
IN    recvtypes    array of datatypes (of length group size). Entry i specifies
                   the type of data received from process i (array of handles)
IN    comm         communicator (handle)
int MPI_Alltoallw(void *sendbuf, int sendcounts[], int sdispls[], MPI_Datatype sendtypes[], void *recvbuf, int recvcounts[], int rdispls[], MPI_Datatype recvtypes[], MPI_Comm comm)
MPI_ALLTOALLW(SENDBUF, SENDCOUNTS, SDISPLS, SENDTYPES, RECVBUF, RECVCOUNTS, RDISPLS, RECVTYPES, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER SENDCOUNTS(*), SDISPLS(*), SENDTYPES(*), RECVCOUNTS(*), RDISPLS(*), RECVTYPES(*), COMM, IERROR
{void MPI::Comm::Alltoallw(const void* sendbuf, const int sendcounts[],
    const int sdispls[], const MPI::Datatype sendtypes[], void* recvbuf,
    const int recvcounts[], const int rdispls[],
    const MPI::Datatype recvtypes[]) const = 0 (binding deprecated, see Section 15.2)}
MPI_ALLTOALLW is the most general form of complete exchange. Like MPI_TYPE_CREATE_STRUCT, the most general type constructor, MPI_ALLTOALLW allows separate specification of count, displacement and datatype. In addition, to allow maximum flexibility, the displacement of blocks within the send and receive buffers is specified in bytes.
If comm is an intracommunicator, then the j-th block sent from process i is received by process j and is placed in the i-th block of recvbuf. These blocks need not all have the same size.
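For illustration only (this sketch is not part of the standard text), the following call reproduces the effect of MPI_ALLTOALL with count 1 and type MPI_INT, making the byte displacements and per-peer datatype arrays explicit; in general all four arrays may differ per peer.

#include <stdlib.h>
#include <mpi.h>

/* Sketch: assumes MPI_Init has already been called. */
void alltoallw_sketch(MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int *sendbuf = malloc(size * sizeof(int));
    int *recvbuf = malloc(size * sizeof(int));
    int *counts  = malloc(size * sizeof(int));
    int *displs  = malloc(size * sizeof(int));
    MPI_Datatype *types = malloc(size * sizeof(MPI_Datatype));

    for (int j = 0; j < size; j++) {
        sendbuf[j] = 100 * rank + j;        /* block j goes to rank j      */
        counts[j]  = 1;                     /* one element per peer        */
        displs[j]  = j * (int)sizeof(int);  /* displacement in BYTES       */
        types[j]   = MPI_INT;               /* one datatype per peer       */
    }

    /* The same count/displacement/type arrays serve both directions here. */
    MPI_Alltoallw(sendbuf, counts, displs, types,
                  recvbuf, counts, displs, types, comm);

    free(sendbuf); free(recvbuf); free(counts); free(displs); free(types);
}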