- •Contents
- •List of Figures
- •List of Tables
- •Acknowledgments
- •Introduction to MPI
- •Overview and Goals
- •Background of MPI-1.0
- •Background of MPI-1.1, MPI-1.2, and MPI-2.0
- •Background of MPI-1.3 and MPI-2.1
- •Background of MPI-2.2
- •Who Should Use This Standard?
- •What Platforms Are Targets For Implementation?
- •What Is Included In The Standard?
- •What Is Not Included In The Standard?
- •Organization of this Document
- •MPI Terms and Conventions
- •Document Notation
- •Naming Conventions
- •Semantic Terms
- •Data Types
- •Opaque Objects
- •Array Arguments
- •State
- •Named Constants
- •Choice
- •Addresses
- •Language Binding
- •Deprecated Names and Functions
- •Fortran Binding Issues
- •C Binding Issues
- •C++ Binding Issues
- •Functions and Macros
- •Processes
- •Error Handling
- •Implementation Issues
- •Independence of Basic Runtime Routines
- •Interaction with Signals
- •Examples
- •Point-to-Point Communication
- •Introduction
- •Blocking Send and Receive Operations
- •Blocking Send
- •Message Data
- •Message Envelope
- •Blocking Receive
- •Return Status
- •Passing MPI_STATUS_IGNORE for Status
- •Data Type Matching and Data Conversion
- •Type Matching Rules
- •Type MPI_CHARACTER
- •Data Conversion
- •Communication Modes
- •Semantics of Point-to-Point Communication
- •Buffer Allocation and Usage
- •Nonblocking Communication
- •Communication Request Objects
- •Communication Initiation
- •Communication Completion
- •Semantics of Nonblocking Communications
- •Multiple Completions
- •Non-destructive Test of status
- •Probe and Cancel
- •Persistent Communication Requests
- •Send-Receive
- •Null Processes
- •Datatypes
- •Derived Datatypes
- •Type Constructors with Explicit Addresses
- •Datatype Constructors
- •Subarray Datatype Constructor
- •Distributed Array Datatype Constructor
- •Address and Size Functions
- •Lower-Bound and Upper-Bound Markers
- •Extent and Bounds of Datatypes
- •True Extent of Datatypes
- •Commit and Free
- •Duplicating a Datatype
- •Use of General Datatypes in Communication
- •Correct Use of Addresses
- •Decoding a Datatype
- •Examples
- •Pack and Unpack
- •Canonical MPI_PACK and MPI_UNPACK
- •Collective Communication
- •Introduction and Overview
- •Communicator Argument
- •Applying Collective Operations to Intercommunicators
- •Barrier Synchronization
- •Broadcast
- •Example using MPI_BCAST
- •Gather
- •Examples using MPI_GATHER, MPI_GATHERV
- •Scatter
- •Examples using MPI_SCATTER, MPI_SCATTERV
- •Example using MPI_ALLGATHER
- •All-to-All Scatter/Gather
- •Global Reduction Operations
- •Reduce
- •Signed Characters and Reductions
- •MINLOC and MAXLOC
- •All-Reduce
- •Process-local reduction
- •Reduce-Scatter
- •MPI_REDUCE_SCATTER_BLOCK
- •MPI_REDUCE_SCATTER
- •Scan
- •Inclusive Scan
- •Exclusive Scan
- •Example using MPI_SCAN
- •Correctness
- •Introduction
- •Features Needed to Support Libraries
- •MPI's Support for Libraries
- •Basic Concepts
- •Groups
- •Contexts
- •Intra-Communicators
- •Group Management
- •Group Accessors
- •Group Constructors
- •Group Destructors
- •Communicator Management
- •Communicator Accessors
- •Communicator Constructors
- •Communicator Destructors
- •Motivating Examples
- •Current Practice #1
- •Current Practice #2
- •(Approximate) Current Practice #3
- •Example #4
- •Library Example #1
- •Library Example #2
- •Inter-Communication
- •Inter-communicator Accessors
- •Inter-communicator Operations
- •Inter-Communication Examples
- •Caching
- •Functionality
- •Communicators
- •Windows
- •Datatypes
- •Error Class for Invalid Keyval
- •Attributes Example
- •Naming Objects
- •Formalizing the Loosely Synchronous Model
- •Basic Statements
- •Models of Execution
- •Static communicator allocation
- •Dynamic communicator allocation
- •The General case
- •Process Topologies
- •Introduction
- •Virtual Topologies
- •Embedding in MPI
- •Overview of the Functions
- •Topology Constructors
- •Cartesian Constructor
- •Cartesian Convenience Function: MPI_DIMS_CREATE
- •General (Graph) Constructor
- •Distributed (Graph) Constructor
- •Topology Inquiry Functions
- •Cartesian Shift Coordinates
- •Partitioning of Cartesian structures
- •Low-Level Topology Functions
- •An Application Example
- •MPI Environmental Management
- •Implementation Information
- •Version Inquiries
- •Environmental Inquiries
- •Tag Values
- •Host Rank
- •IO Rank
- •Clock Synchronization
- •Memory Allocation
- •Error Handling
- •Error Handlers for Communicators
- •Error Handlers for Windows
- •Error Handlers for Files
- •Freeing Errorhandlers and Retrieving Error Strings
- •Error Codes and Classes
- •Error Classes, Error Codes, and Error Handlers
- •Timers and Synchronization
- •Startup
- •Allowing User Functions at Process Termination
- •Determining Whether MPI Has Finished
- •Portable MPI Process Startup
- •The Info Object
- •Process Creation and Management
- •Introduction
- •The Dynamic Process Model
- •Starting Processes
- •The Runtime Environment
- •Process Manager Interface
- •Processes in MPI
- •Starting Processes and Establishing Communication
- •Reserved Keys
- •Spawn Example
- •Manager-worker Example, Using MPI_COMM_SPAWN.
- •Establishing Communication
- •Names, Addresses, Ports, and All That
- •Server Routines
- •Client Routines
- •Name Publishing
- •Reserved Key Values
- •Client/Server Examples
- •Ocean/Atmosphere - Relies on Name Publishing
- •Simple Client-Server Example.
- •Other Functionality
- •Universe Size
- •Singleton MPI_INIT
- •MPI_APPNUM
- •Releasing Connections
- •Another Way to Establish MPI Communication
- •One-Sided Communications
- •Introduction
- •Initialization
- •Window Creation
- •Window Attributes
- •Communication Calls
- •Examples
- •Accumulate Functions
- •Synchronization Calls
- •Fence
- •General Active Target Synchronization
- •Lock
- •Assertions
- •Examples
- •Error Handling
- •Error Handlers
- •Error Classes
- •Semantics and Correctness
- •Atomicity
- •Progress
- •Registers and Compiler Optimizations
- •External Interfaces
- •Introduction
- •Generalized Requests
- •Examples
- •Associating Information with Status
- •MPI and Threads
- •General
- •Initialization
- •Introduction
- •File Manipulation
- •Opening a File
- •Closing a File
- •Deleting a File
- •Resizing a File
- •Preallocating Space for a File
- •Querying the Size of a File
- •Querying File Parameters
- •File Info
- •Reserved File Hints
- •File Views
- •Data Access
- •Data Access Routines
- •Positioning
- •Synchronism
- •Coordination
- •Data Access Conventions
- •Data Access with Individual File Pointers
- •Data Access with Shared File Pointers
- •Noncollective Operations
- •Collective Operations
- •Seek
- •Split Collective Data Access Routines
- •File Interoperability
- •Datatypes for File Interoperability
- •Extent Callback
- •Datarep Conversion Functions
- •Matching Data Representations
- •Consistency and Semantics
- •File Consistency
- •Random Access vs. Sequential Files
- •Progress
- •Collective File Operations
- •Type Matching
- •Logical vs. Physical File Layout
- •File Size
- •Examples
- •Asynchronous I/O
- •I/O Error Handling
- •I/O Error Classes
- •Examples
- •Subarray Filetype Constructor
- •Requirements
- •Discussion
- •Logic of the Design
- •Examples
- •MPI Library Implementation
- •Systems with Weak Symbols
- •Systems Without Weak Symbols
- •Complications
- •Multiple Counting
- •Linker Oddities
- •Multiple Levels of Interception
- •Deprecated Functions
- •Deprecated since MPI-2.0
- •Deprecated since MPI-2.2
- •Language Bindings
- •Overview
- •Design
- •C++ Classes for MPI
- •Class Member Functions for MPI
- •Semantics
- •C++ Datatypes
- •Communicators
- •Exceptions
- •Mixed-Language Operability
- •Problems With Fortran Bindings for MPI
- •Problems Due to Strong Typing
- •Problems Due to Data Copying and Sequence Association
- •Special Constants
- •Fortran 90 Derived Types
- •A Problem with Register Optimization
- •Basic Fortran Support
- •Extended Fortran Support
- •The mpi Module
- •No Type Mismatch Problems for Subroutines with Choice Arguments
- •Additional Support for Fortran Numeric Intrinsic Types
- •Language Interoperability
- •Introduction
- •Assumptions
- •Initialization
- •Transfer of Handles
- •Status
- •MPI Opaque Objects
- •Datatypes
- •Callback Functions
- •Error Handlers
- •Reduce Operations
- •Addresses
- •Attributes
- •Extra State
- •Constants
- •Interlanguage Communication
- •Language Bindings Summary
- •Groups, Contexts, Communicators, and Caching Fortran Bindings
- •External Interfaces C++ Bindings
- •Change-Log
- •Bibliography
- •Examples Index
- •MPI Declarations Index
- •MPI Function Index
11.3. COMMUNICATION CALLS                                    339
MPI_WIN_GET_GROUP(WIN, GROUP, IERROR)
    INTEGER WIN, GROUP, IERROR

{MPI::Group MPI::Win::Get_group() const (binding deprecated, see Section 15.2) }

MPI_WIN_GET_GROUP returns a duplicate of the group of the communicator used to create the window associated with win. The group is returned in group.
11.3 Communication Calls
MPI supports three RMA communication calls: MPI_PUT transfers data from the caller memory (origin) to the target memory; MPI_GET transfers data from the target memory to the caller memory; and MPI_ACCUMULATE updates locations in the target memory, e.g. by adding to these locations values sent from the caller memory. These operations are nonblocking: the call initiates the transfer, but the transfer may continue after the call returns. The transfer is completed, both at the origin and at the target, when a subsequent synchronization call is issued by the caller on the involved window object. These synchronization calls are described in Section 11.4, page 347.
The local communication buffer of an RMA call should not be updated, and the local communication buffer of a get call should not be accessed after the RMA call, until the subsequent synchronization call completes.
It is erroneous to have concurrent conflicting accesses to the same memory location in a window; if a location is updated by a put or accumulate operation, then this location cannot be accessed by a load or another RMA operation until the updating operation has completed at the target. There is one exception to this rule; namely, the same location can be updated by several concurrent accumulate calls, the outcome being as if these updates occurred in some order. In addition, a window cannot concurrently be updated by a put or accumulate operation and by a local store operation. This holds even if these two updates access different locations in the window. The last restriction enables more efficient implementations of RMA operations on many systems. These restrictions are described in more detail in Section 11.7, page 363.
The calls use general datatype arguments to specify communication buffers at the origin and at the target. Thus, a transfer operation may also gather data at the source and scatter it at the destination. However, all arguments specifying both communication buffers are provided by the caller.
For all three calls, the target process may be identical with the origin process; i.e., a process may use an RMA operation to move data in its memory.
Rationale. The choice of supporting "self-communication" is the same as for message-passing. It simplifies some coding, and is very useful with accumulate operations, to allow atomic updates of local variables. (End of rationale.)
MPI_PROC_NULL is a valid target rank in the MPI RMA calls MPI_ACCUMULATE, MPI_GET, and MPI_PUT. The effect is the same as for MPI_PROC_NULL in MPI point-to-point communication. After any RMA operation with rank MPI_PROC_NULL, it is still necessary to finish the RMA epoch with the synchronization method that started the epoch.
11.3.1 Put
The execution of a put operation is similar to the execution of a send by the origin process and a matching receive by the target process. The obvious difference is that all arguments are provided by one call: the call executed by the origin process.
MPI_PUT(origin_addr, origin_count, origin_datatype, target_rank, target_disp, target_count, target_datatype, win)

IN   origin_addr      initial address of origin buffer (choice)
IN   origin_count     number of entries in origin buffer (non-negative integer)
IN   origin_datatype  datatype of each entry in origin buffer (handle)
IN   target_rank      rank of target (non-negative integer)
IN   target_disp      displacement from start of window to target buffer (non-negative integer)
IN   target_count     number of entries in target buffer (non-negative integer)
IN   target_datatype  datatype of each entry in target buffer (handle)
IN   win              window object used for communication (handle)

int MPI_Put(void *origin_addr, int origin_count, MPI_Datatype origin_datatype,
            int target_rank, MPI_Aint target_disp, int target_count,
            MPI_Datatype target_datatype, MPI_Win win)

MPI_PUT(ORIGIN_ADDR, ORIGIN_COUNT, ORIGIN_DATATYPE, TARGET_RANK,
        TARGET_DISP, TARGET_COUNT, TARGET_DATATYPE, WIN, IERROR)
    <type> ORIGIN_ADDR(*)
    INTEGER(KIND=MPI_ADDRESS_KIND) TARGET_DISP
    INTEGER ORIGIN_COUNT, ORIGIN_DATATYPE, TARGET_RANK, TARGET_COUNT,
    TARGET_DATATYPE, WIN, IERROR

{void MPI::Win::Put(const void* origin_addr, int origin_count, const MPI::Datatype& origin_datatype, int target_rank, MPI::Aint target_disp, int target_count, const MPI::Datatype& target_datatype) const (binding deprecated, see Section 15.2) }

Transfers origin_count successive entries of the type specified by the origin_datatype, starting at address origin_addr on the origin node, to the target node specified by the win, target_rank pair. The data are written in the target buffer at address target_addr = window_base + target_disp × disp_unit, where window_base and disp_unit are the base address and window displacement unit specified at window initialization by the target process. The target buffer is specified by the arguments target_count and target_datatype. The data transfer is the same as that which would occur if the origin process executed a send operation with arguments origin_addr, origin_count, origin_datatype, target_rank, tag, comm, and the target process executed a receive operation with arguments target_addr, target_count, target_datatype, source, tag, comm, where target_addr is the target buffer address computed as explained above, and comm is a communicator for the group of win.
The communication must satisfy the same constraints as for a similar message-passing communication. The target_datatype may not specify overlapping entries in the target buffer. The message sent must fit, without truncation, in the target buffer. Furthermore, the target buffer must fit in the target window.
The target_datatype argument is a handle to a datatype object defined at the origin process. However, this object is interpreted at the target process: the outcome is as if the target datatype object was defined at the target process, by the same sequence of calls used to define it at the origin process. The target datatype must contain only relative displacements, not absolute addresses. The same holds for get and accumulate.
Advice to users. The target_datatype argument is a handle to a datatype object that is defined at the origin process, even though it defines a data layout in the target process memory. This causes no problems in a homogeneous environment, or in a heterogeneous environment if only portable datatypes are used (portable datatypes are defined in Section 2.4, page 11).

The performance of a put transfer can be significantly affected, on some systems, by the choice of window location and the shape and location of the origin and target buffers: transfers to a target window in memory allocated by MPI_ALLOC_MEM may be much faster on shared memory systems; transfers from contiguous buffers will be faster on most, if not all, systems; the alignment of the communication buffers may also impact performance. (End of advice to users.)
Advice to implementors. A high-quality implementation will attempt to prevent remote accesses to memory outside the window that was exposed by the process, both for debugging purposes and for protection with client-server codes that use RMA. That is, a high-quality implementation will check, if possible, window bounds on each RMA call, and raise an MPI exception at the origin call if an out-of-bound situation occurs. Note that the condition can be checked at the origin. Of course, the added safety achieved by such checks has to be weighed against their added cost. (End of advice to implementors.)
11.3.2 Get

MPI_GET(origin_addr, origin_count, origin_datatype, target_rank, target_disp, target_count, target_datatype, win)

OUT  origin_addr      initial address of origin buffer (choice)
IN   origin_count     number of entries in origin buffer (non-negative integer)
IN   origin_datatype  datatype of each entry in origin buffer (handle)
IN   target_rank      rank of target (non-negative integer)
IN   target_disp      displacement from window start to the beginning of the target buffer (non-negative integer)
IN   target_count     number of entries in target buffer (non-negative integer)
IN   target_datatype  datatype of each entry in target buffer (handle)
IN   win              window object used for communication (handle)

int MPI_Get(void *origin_addr, int origin_count, MPI_Datatype origin_datatype,
            int target_rank, MPI_Aint target_disp, int target_count,
            MPI_Datatype target_datatype, MPI_Win win)

MPI_GET(ORIGIN_ADDR, ORIGIN_COUNT, ORIGIN_DATATYPE, TARGET_RANK,
        TARGET_DISP, TARGET_COUNT, TARGET_DATATYPE, WIN, IERROR)
    <type> ORIGIN_ADDR(*)
    INTEGER(KIND=MPI_ADDRESS_KIND) TARGET_DISP
    INTEGER ORIGIN_COUNT, ORIGIN_DATATYPE, TARGET_RANK, TARGET_COUNT,
    TARGET_DATATYPE, WIN, IERROR

{void MPI::Win::Get(void *origin_addr, int origin_count, const MPI::Datatype& origin_datatype, int target_rank, MPI::Aint target_disp, int target_count, const MPI::Datatype& target_datatype) const (binding deprecated, see Section 15.2) }

Similar to MPI_PUT, except that the direction of data transfer is reversed. Data are copied from the target memory to the origin. The origin_datatype may not specify overlapping entries in the origin buffer. The target buffer must be contained within the target window, and the copied data must fit, without truncation, in the origin buffer.
11.3.3 Examples

Example 11.1 We show how to implement the generic indirect assignment A = B(map), where A, B and map have the same distribution, and map is a permutation. To simplify, we assume a block distribution with equal size blocks.
SUBROUTINE MAPVALS(A, B, map, m, comm, p)
USE MPI
INTEGER m, map(m), comm, p
REAL A(m), B(m)
INTEGER otype(p), oindex(m),   & ! used to construct origin datatypes
        ttype(p), tindex(m),   & ! used to construct target datatypes
        count(p), total(p),    &
        win, ierr
INTEGER (KIND=MPI_ADDRESS_KIND) lowerbound, sizeofreal

! This part does the work that depends on the locations of B.
! Can be reused while this does not change
CALL MPI_TYPE_GET_EXTENT(MPI_REAL, lowerbound, sizeofreal, ierr)
CALL MPI_WIN_CREATE(B, m*sizeofreal, sizeofreal, MPI_INFO_NULL, &
                    comm, win, ierr)

! This part does the work that depends on the value of map and
! the locations of the arrays.
! Can be reused while these do not change

! Compute number of entries to be received from each process
DO i=1,p
  count(i) = 0
END DO
DO i=1,m
  j = map(i)/m+1
  count(j) = count(j)+1
END DO
total(1) = 0
DO i=2,p
  total(i) = total(i-1) + count(i-1)
END DO
DO i=1,p
  count(i) = 0
END DO

! compute origin and target indices of entries.
! entry i at current process is received from location
! k at process (j-1), where map(i) = (j-1)*m + (k-1),
! j = 1..p and k = 1..m
DO i=1,m
  j = map(i)/m+1
  k = MOD(map(i),m)+1
  count(j) = count(j)+1
  oindex(total(j) + count(j)) = i
  tindex(total(j) + count(j)) = k
END DO

! create origin and target datatypes for each get operation
DO i=1,p
  CALL MPI_TYPE_CREATE_INDEXED_BLOCK(count(i), 1, oindex(total(i)+1), &
                                     MPI_REAL, otype(i), ierr)
  CALL MPI_TYPE_COMMIT(otype(i), ierr)
  CALL MPI_TYPE_CREATE_INDEXED_BLOCK(count(i), 1, tindex(total(i)+1), &
                                     MPI_REAL, ttype(i), ierr)
  CALL MPI_TYPE_COMMIT(ttype(i), ierr)
END DO

! this part does the assignment itself
CALL MPI_WIN_FENCE(0, win, ierr)
DO i=1,p
  CALL MPI_GET(A, 1, otype(i), i-1, 0, 1, ttype(i), win, ierr)
END DO
CALL MPI_WIN_FENCE(0, win, ierr)

CALL MPI_WIN_FREE(win, ierr)
DO i=1,p
  CALL MPI_TYPE_FREE(otype(i), ierr)
  CALL MPI_TYPE_FREE(ttype(i), ierr)
END DO
RETURN
END
Example 11.2 A simpler version can be written that does not require that a datatype be built for the target buffer. But one then needs a separate get call for each entry, as illustrated below. This code is much simpler, but usually much less efficient, for large arrays.
SUBROUTINE MAPVALS(A, B, map, m, comm, p)
USE MPI
INTEGER m, map(m), comm, p
REAL A(m), B(m)
INTEGER win, ierr
INTEGER (KIND=MPI_ADDRESS_KIND) lowerbound, sizeofreal

CALL MPI_TYPE_GET_EXTENT(MPI_REAL, lowerbound, sizeofreal, ierr)
CALL MPI_WIN_CREATE(B, m*sizeofreal, sizeofreal, MPI_INFO_NULL, &
                    comm, win, ierr)

CALL MPI_WIN_FENCE(0, win, ierr)
DO i=1,m
  j = map(i)/m
  k = MOD(map(i),m)
  CALL MPI_GET(A(i), 1, MPI_REAL, j, k, 1, MPI_REAL, win, ierr)