Jones D.M. The new C standard (C90 and C++). An economic and cultural commentary. 2005
5.2.1 Character sets
C++
The C++ Standard does not contain a requirement to define a collating sequence on the character sets it specifies.
213 Each set is further divided into a basic character set, whose contents are given by this subclause, and a set of zero or more locale-specific members (which are not members of the basic character set) called extended characters.
C90
This explicit subdivision of characters into sets is new in C99. The wording in the C90 Standard specified the minimum contents of the basic source and basic execution character sets. These terms are now defined exactly, with all other characters being called extended characters.
212 source character set
. . . ; any additional members beyond those required by this subclause are locale-specific.
C++
2.2p3
The values of the members of the execution character sets are implementation-defined, and any additional members are locale-specific.
The C++ Standard more closely follows the C90 wording.
218 it is used to terminate a character string.
C++
2.13.4p4
After any necessary concatenation, in translation phase 7 (2.1), '\0' is appended to every string literal so that programs that scan a string can find its end.
In practice the C usage is the same as that specified by C++.
219 Both the basic source and basic execution character sets shall have the following members: the 26 uppercase letters of the Latin alphabet
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
the 26 lowercase letters of the Latin alphabet
a b c d e f g h i j k l m
n o p q r s t u v w x y z
the 10 decimal digits
0 1 2 3 4 5 6 7 8 9
the following 29 graphic characters
! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~
the space character, and control characters representing horizontal tab, vertical tab, and form feed.
September 2, 2005
v 1.0b
C90
The C90 Standard referred to these characters as the English alphabet.
C++
2.2p1
The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphics characters:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
The C++ Standard includes new-line in the basic source character set (C only includes it in the basic execution character set).
The C++ Standard does not separate out the uppercase, lowercase, and decimal digits from the graphical characters, so technically they are not defined for the basic source character set (the library functions such as toupper effectively define these terms for the execution character set).
220 The representation of each member of the source and execution basic character sets shall fit in a byte.
C++
1.7p1
A byte is at least large enough to contain any member of the basic execution character set and . . .
This requirement reverses the dependency given in the C Standard, but the effect is the same.
221 In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
C++
The above wording has been proposed as the response to C++ DR #173.
222 In source files, there shall be some way of indicating the end of each line of text;
C++
The C++ Standard does not specify this level of detail (although it does refer to end-of-line indicators, 2.1p1n1).
223 this International Standard treats such an end-of-line indicator as if it were a single new-line character.
C++
2.1p1n1
. . . (introducing new-line characters for end-of-line indicators) . . .
225 If any other characters are encountered in a source file (except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token), the behavior is undefined.
C90
Support for additional characters in identifiers is new in C99.
C++
2.1p1
Any source file character not in the basic source character set (2.2) is replaced by the universal-character-name that designates that character.
The C++ Standard specifies the behavior and a translator is required to handle source code containing such a character. A C translator is permitted to issue a diagnostic and fail to translate the source code.
226 A letter is an uppercase letter or a lowercase letter as defined above;
C90
This definition is new in C99.
227 in this International Standard the term does not include other characters that are letters in other alphabets.
C++
The definition used in the C++ Standard, 17.3.2.1.3 (the footnote applies to C90 only), implies this is also true in C++.
228 The universal character name construct provides a way to name other characters.
C90
Support for universal character names is new in C99.
5.2.1.1 Trigraph sequences
5.2.1.2 Multibyte characters
235 The source character set may contain multibyte characters, used to represent members of the extended character set.
C++
The representations used for multibyte characters, in source code, invariably involve at least one character that is not in the basic source character set:
2.1p1
Any source file character not in the basic source character set (2.2) is replaced by the universal-character-name that designates that character.
The C++ Standard does not discuss the issue of a translator having to process multibyte characters during translation. However, implementations may choose to replace such characters with a corresponding universal-character-name.
236 The execution character set may also contain multibyte characters, which need not have the same encoding as for the source character set.
C++
There is no explicit statement about such behavior being permitted in the C++ Standard. The C header <wchar.h> (specified in Amendment 1 to C90) is included by reference and so the support it defines for multibyte characters needs to be provided by C++ implementations.
240 — A multibyte character set may have a state-dependent encoding, wherein each sequence of multibyte characters begins in an initial shift state and enters other locale-specific shift states when specific multibyte characters are encountered in the sequence.
C90
The C90 Standard specified implementation-defined shift states rather than locale-specific shift states.
C++
The definition of multibyte character, 1.3.8, says nothing about encoding issues (other than that more than one byte may be used). The definition of multibyte strings, 17.3.2.1.3.2, requires the multibyte characters to begin and end in the initial shift state.
241 While in the initial shift state, all single-byte characters retain their usual interpretation and do not alter the shift state.
C++
The C++ Standard does not explicitly specify this requirement.
242 12) The trigraph sequences enable the input of characters that are not defined in the Invariant Code Set as described in ISO/IEC 646, which is a subset of the seven-bit US ASCII code set.
C90
The C90 Standard explicitly referred to the 1983 version of ISO/IEC 646 standard.
243 The interpretation for subsequent bytes in the sequence is a function of the current shift state.
C++
A set of virtual functions for handling state-dependent encodings, during program execution, is discussed in Clause 22, Localization library. But, this requirement is not specified.
244 — A byte with all bits zero shall be interpreted as a null character independent of shift state.
C++
2.2p3
. . . , plus a null character (respectively, null wide character), whose representation has all zero bits.
While the C++ Standard does not rule out the possibility of all bits zero having another interpretation in other contexts, other requirements (17.3.2.1.3.1p1 and 17.3.2.1.3.2p1) restrict these other contexts, as do existing character set encodings.
245 Such a byte shall not occur as part of any other multibyte character.
C++
This requirement can be deduced from the definition of null terminated byte strings, 17.3.2.1.3.1p1, and null terminated multibyte strings, 17.3.2.1.3.2p1.
247 — An identifier, comment, string literal, character constant, or header name shall begin and end in the initial shift state.
C90
Support for multibyte characters in identifiers is new in C99.
C++
In C++ all characters are mapped to the source character set in translation phase 1. Any shift state encoding will not exist after translation phase 1, so the C requirement is not applicable to C++ source files.
248 — An identifier, comment, string literal, character constant, or header name shall consist of a sequence of valid multibyte characters.
C90
Support for multibyte characters in identifiers is new in C99.
C++
In C++ all characters are mapped to the source character set in translation phase 1. Any shift state encoding will not exist after translation phase 1, so the C requirement is not applicable to C++ source files.
5.2.2 Character display semantics
C++
Clause 18 mentions “display as a wstring” in Notes:. But, there is no other mention of display semantics anywhere in the standard.
249 The active position is that location on a display device where the next character output by the fputc function would appear.
C++
C++ has no concept of active position. The fputc function appears in "Table 94" as one of the functions supported by C++.
250 The intent of writing a printing character (as defined by the isprint function) to a display device is to display a graphic representation of that character at the active position and then advance the active position to the next position on the current line.
C++
The C++ Standard does not discuss character display semantics.
251 The direction of writing is locale-specific.
C++
The C++ Standard does not discuss character display semantics.
252 If the active position is at the final position of a line (if there is one), the behavior of the display device is unspecified.
C++
The C++ Standard does not discuss character display semantics.
253 Alphabetic escape sequences representing nongraphic characters in the execution character set are intended to produce actions on display devices as follows:
C++
The C++ Standard does not discuss character display semantics.
254 \a (alert) Produces an audible or visible alert without changing the active position.
C++
Alert appears in Table 5, 2.13.2p3. There is no other description of this escape sequence, although the C behavior might be implied from the following wording:
17.4.1.2p3
The facilities of the Standard C Library are provided in 18 additional headers, as shown in Table 12:
255 \b (backspace) Moves the active position to the previous position on the current line.
C++
Backspace appears in Table 5, 2.13.2p3. There is no other description of this escape sequence, although the C behavior might be implied from the following wording:
17.4.1.2p3
The facilities of the Standard C Library are provided in 18 additional headers, as shown in Table 12:
256 If the active position is at the initial position of a line, the behavior of the display device is unspecified.
C90
If the active position is at the initial position of a line, the behavior is unspecified.
This wording differs from C99 in that it renders the behavior of the program as unspecified. The program simply writes the character; how the device handles the character is beyond its control.
C++
The C++ Standard does not discuss character display semantics.
257 \f (form feed) Moves the active position to the initial position at the start of the next logical page.
C++
Form feed appears in Table 5, 2.13.2p3. There is no other description of this escape sequence, although the C behavior might be implied from the following wording:
17.4.1.2p3
The facilities of the Standard C Library are provided in 18 additional headers, as shown in Table 12:
258 \n (new line) Moves the active position to the initial position of the next line.
C++
New line appears in Table 5, 2.13.2p3. There is no other description of this escape sequence, although the C behavior might be implied from the following wording:
17.4.1.2p3
The facilities of the Standard C Library are provided in 18 additional headers, as shown in Table 12:
259 \r (carriage return) Moves the active position to the initial position of the current line.
C++
Carriage return appears in Table 5, 2.13.2p3. There is no other description of this escape sequence, although the C behavior might be implied from the following wording:
17.4.1.2p3
The facilities of the Standard C Library are provided in 18 additional headers, as shown in Table 12:
260 \t (horizontal tab) Moves the active position to the next horizontal tabulation position on the current line.
C++
Horizontal tab appears in Table 5, 2.13.2p3. There is no other description of this escape sequence, although the C behavior might be implied from the following wording:
17.4.1.2p3
The facilities of the Standard C Library are provided in 18 additional headers, as shown in Table 12:
261 If the active position is at or past the last defined horizontal tabulation position, the behavior of the display device is unspecified.
C90
If the active position is at or past the last defined horizontal tabulation position, the behavior is unspecified.
262 \v (vertical tab) Moves the active position to the initial position of the next vertical tabulation position.
C++
Vertical tab appears in Table 5, 2.13.2p3. There is no other description of this escape sequence, although the C behavior might be implied from the following wording:
17.4.1.2p3
The facilities of the Standard C Library are provided in 18 additional headers, as shown in Table 12:
263 If the active position is at or past the last defined vertical tabulation position, the behavior of the display device is unspecified.
C90
If the active position is at or past the last defined vertical tabulation position, the behavior is unspecified.
264 Each of these escape sequences shall produce a unique implementation-defined value which can be stored in a single char object.
C++
This requirement can be deduced from 2.2p3.
265 The external representations in a text file need not be identical to the internal representations, and are outside the scope of this International Standard.
C++
The C++ Standard does not get involved in such details.
5.2.3 Signals and interrupts
C++ |
267 |
The C++ Standard specifies, Clause 15 Exception handling, a much richer set of functionality for dealing with exceptional behaviors. While it does not go into the details contained in this C subclause, they are likely, of necessity, to be followed by a C++ implementation.
268 Functions shall be implemented such that they may be interrupted at any time by a signal, or may be called by a signal handler, or both, with no alteration to earlier, but still active, invocations’ control flow (after the interruption), function return values, or objects with automatic storage duration.
C++
This implementation requirement is not specified in the C++ Standard (1.9p9).
269 All such objects shall be maintained outside the function image (the instructions that compose the executable representation of a function) on a per-invocation basis.
|
C++
The C++ Standard does not contain this requirement.
5.2.4 Environmental limits
270 Both the translation and execution environments constrain the implementation of language translators and libraries.
C++
There is an informative annex which states:
Annex Bp1
Because computers are finite, C++ implementations are inevitably limited in the size of the programs they can successfully process.
271 The following summarizes the language-related environmental limits on a conforming implementation;
C++
There is an informative annex which states:
Annex Bp2
The bracketed number following each quantity is recommended as the minimum for that quantity. However, these quantities are only guidelines and do not determine conformance.
272 the library-related limits are discussed in clause 7.
C++
Clause 18.2 contains an Implementation Limits:.
5.2.4.1 Translation limits
273 The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:13)
C++
Annex Bp2
However, these quantities are only guidelines and do not determine conformance.
This wording appears in an informative annex, which itself has no formal status.
274 — 127 nesting levels of blocks
C90
15 nesting levels of compound statements, iteration control structures, and selection control structures
The number of constructs that could create a block increased between C90 and C99, including selection statements and their associated substatements, and iteration statements and their associated bodies. Although use of these constructs doubles the number of blocks created in C99, the limit on the nesting of blocks has increased by more than a factor of eight (from 15 to 127). So, the conformance status of a program will not be adversely affected.
C++
The following is a non-normative specification.
Annex Bp2
Nesting levels of compound statements, iteration control structures, and selection control structures [256]
275 — 63 nesting levels of conditional inclusion
C90
8 nesting levels of conditional inclusion
C++
The following is a non-normative specification.
Annex Bp2
Nesting levels of conditional inclusion [256]
276 — 12 pointer, array, and function declarators (in any combinations) modifying an arithmetic, structure, union, or incomplete type in a declaration
C++
The following is a non-normative specification.
Annex Bp2
Pointer, array, and function declarators (in any combinations) modifying an arithmetic, structure, union, or incomplete type in a declaration [256]
277 — 63 nesting levels of parenthesized declarators within a full declarator
C90
31 nesting levels of parenthesized declarators within a full declarator
C++
The C++ Standard does not discuss declarator parentheses nesting limits.
278 — 63 nesting levels of parenthesized expressions within a full expression
C90
31 nesting levels of parenthesized expressions within a full expression
C++
The following is a non-normative specification.
Annex Bp2
Nesting levels of parenthesized expressions within a full expression [256]
279 — 63 significant initial characters in an internal identifier or a macro name (each universal character name or extended source character is considered a single character)
C90
31 significant initial characters in an internal identifier or a macro name
C++
2.10p1