CHAPTER 2
This chapter describes the context-free grammars used in this specification to define the lexical and syntactic structure of a Java program.
Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given context-free grammar specifies a language, namely, the infinite set of possible sequences of terminal symbols that can result from repeatedly replacing any nonterminal in the sequence with a right-hand side of a production for which the nonterminal is the left-hand side.
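The replacement process described above can be illustrated with a minimal sketch in Java. The toy grammar used here (a single nonterminal S with the alternatives "a b" and "a S b") is hypothetical and is not part of this specification; the class and method names are likewise assumptions made for illustration. Each step replaces the leftmost nonterminal with one chosen right-hand side. The sketch assumes Java 9 or later for List.of and Map.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DerivationSketch {
    // Toy grammar (hypothetical, not taken from this specification):
    //   S:
    //       a b
    //       a S b
    static final Map<String, List<List<String>>> PRODUCTIONS = Map.of(
        "S", List.of(List.of("a", "b"), List.of("a", "S", "b")));

    // Replaces the leftmost nonterminal with the chosen alternative.
    // Returns the input unchanged if it contains only terminal symbols.
    static List<String> step(List<String> sentence, int alternative) {
        for (int i = 0; i < sentence.size(); i++) {
            List<List<String>> alternatives = PRODUCTIONS.get(sentence.get(i));
            if (alternatives != null) {
                List<String> next = new ArrayList<>(sentence.subList(0, i));
                next.addAll(alternatives.get(alternative));
                next.addAll(sentence.subList(i + 1, sentence.size()));
                return next;
            }
        }
        return sentence;
    }

    // Starts from the goal symbol and applies the given choices in order.
    static List<String> derive(String goal, int... choices) {
        List<String> sentence = List.of(goal);
        for (int choice : choices) {
            sentence = step(sentence, choice);
        }
        return sentence;
    }

    public static void main(String[] args) {
        System.out.println(derive("S", 1, 1, 0));
    }
}
```

With this grammar, the choices 1, 1, 0 take the goal symbol S through "a S b" and "a a S b b" to the terminal sequence "a a a b b b". Because the second alternative can be chosen any number of times, the language is infinite.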
These input elements, with white space (§3.6) and comments (§3.7) discarded, form the terminal symbols for the syntactic grammar for Java and are called Java tokens (§3.5). These tokens are the identifiers (§3.8), keywords (§3.9), literals (§3.10), separators (§3.11), and operators (§3.12) of the Java language.
An LALR(1) version of the syntactic grammar is presented in Chapter 19. The grammar in the body of this specification is very similar to the LALR(1) grammar but more readable.
Terminal symbols are shown in fixed width font in the productions of the lexical and syntactic grammars, and throughout this specification whenever the text is directly referring to such a terminal symbol. These are to appear in a program exactly as written.
Nonterminal symbols are shown in italic type. The definition of a nonterminal is introduced by the name of the nonterminal being defined followed by a colon. One or more alternative right-hand sides for the nonterminal then follow on succeeding lines. For example, the syntactic definition:
IfThenStatement:
    if ( Expression ) Statement
states that the nonterminal IfThenStatement represents the token if, followed by a left parenthesis token, followed by an Expression, followed by a right parenthesis token, followed by a Statement. As another example, the syntactic definition:
ArgumentList:
    Argument
    ArgumentList , Argument
states that an ArgumentList may represent either a single Argument or an ArgumentList, followed by a comma, followed by an Argument. This definition of ArgumentList is recursive, that is to say, it is defined in terms of itself. The result is that an ArgumentList may contain any positive number of arguments. Such recursive definitions of nonterminals are common.
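The effect of the recursive ArgumentList definition can be sketched as a small recognizer in Java. This is an illustration only: it assumes, for simplicity, that every token other than a comma is a complete Argument, and the class and method names are hypothetical. The recursive alternative is unrolled into a loop, where each pass extends the list matched so far by a comma and one more Argument.

```java
import java.util.List;

class ArgumentListSketch {
    // Assumption for this sketch: any token other than "," is one Argument.
    static boolean isArgument(String token) {
        return !token.equals(",");
    }

    // Returns the number of arguments if the tokens form an ArgumentList,
    // or -1 if they do not.
    static int countArguments(List<String> tokens) {
        // Base alternative:  ArgumentList: Argument
        if (tokens.isEmpty() || !isArgument(tokens.get(0))) {
            return -1;
        }
        int count = 1;
        int i = 1;
        // Recursive alternative:  ArgumentList: ArgumentList , Argument
        // Each pass extends the list matched so far by ", Argument".
        while (i < tokens.size()) {
            boolean commaThenArgument = tokens.get(i).equals(",")
                && i + 1 < tokens.size()
                && isArgument(tokens.get(i + 1));
            if (!commaThenArgument) {
                return -1;
            }
            count++;
            i += 2;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countArguments(List.of("x", ",", "y", ",", "z")));
    }
}
```

An empty token list is rejected, which reflects the statement that an ArgumentList contains a positive number of arguments; a trailing comma is rejected because the recursive alternative requires an Argument after each comma.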
The subscripted suffix "opt", which may appear after a terminal or nonterminal, indicates an optional symbol. The alternative containing the optional symbol actually specifies two right-hand sides, one that omits the optional element and one that includes it. This means that:
BreakStatement:
    break Identifieropt ;
is a convenient abbreviation for:
BreakStatement:
    break ;
    break Identifier ;
and that:
ForStatement:
    for ( ForInitopt ; Expressionopt ; ForUpdateopt ) Statement
is a convenient abbreviation for:
ForStatement:
    for ( ; Expressionopt ; ForUpdateopt ) Statement
    for ( ForInit ; Expressionopt ; ForUpdateopt ) Statement
which in turn is an abbreviation for:
ForStatement:
    for ( ; ; ForUpdateopt ) Statement
    for ( ; Expression ; ForUpdateopt ) Statement
    for ( ForInit ; ; ForUpdateopt ) Statement
    for ( ForInit ; Expression ; ForUpdateopt ) Statement
which in turn is an abbreviation for:
ForStatement:
    for ( ; ; ) Statement
    for ( ; ; ForUpdate ) Statement
    for ( ; Expression ; ) Statement
    for ( ; Expression ; ForUpdate ) Statement
    for ( ForInit ; ; ) Statement
    for ( ForInit ; ; ForUpdate ) Statement
    for ( ForInit ; Expression ; ) Statement
    for ( ForInit ; Expression ; ForUpdate ) Statement
so the nonterminal ForStatement actually has eight alternative right-hand sides.
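The expansion of "opt" symbols described above can be sketched mechanically in Java. This is a minimal illustration, not part of the specification: symbols are plain strings, a trailing "?" is this sketch's own stand-in for the subscripted "opt" (so a terminal that itself ends in "?" is not supported), and the class and method names are hypothetical. Each optional symbol doubles the number of right-hand sides, first omitting the symbol and then including it.

```java
import java.util.ArrayList;
import java.util.List;

class OptExpansionSketch {
    // Expands one right-hand side containing optional symbols into the
    // full list of alternatives it abbreviates.
    static List<List<String>> expand(List<String> rhs) {
        List<List<String>> results = new ArrayList<>();
        results.add(new ArrayList<>());
        for (String symbol : rhs) {
            boolean optional = symbol.endsWith("?");
            String name = optional
                ? symbol.substring(0, symbol.length() - 1)
                : symbol;
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : results) {
                if (optional) {
                    // Alternative that omits the optional symbol.
                    next.add(new ArrayList<>(prefix));
                }
                // Alternative that includes the symbol.
                List<String> extended = new ArrayList<>(prefix);
                extended.add(name);
                next.add(extended);
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> forStatement = List.of(
            "for", "(", "ForInit?", ";", "Expression?", ";", "ForUpdate?", ")", "Statement");
        for (List<String> alternative : expand(forStatement)) {
            System.out.println(String.join(" ", alternative));
        }
    }
}
```

Applied to the ForStatement right-hand side with three optional symbols, the sketch yields eight alternatives; applied to the BreakStatement right-hand side, it yields the two alternatives "break ;" and "break Identifier ;".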
A very long right-hand side may be continued on a second line by substantially indenting this second line, as in:
ConstructorDeclaration:
    ConstructorModifiersopt ConstructorDeclarator
            Throwsopt ConstructorBody
which defines one right-hand side for the nonterminal ConstructorDeclaration. (This right-hand side is an abbreviation for four alternative right-hand sides, because of the two occurrences of "opt".)
When the words "one of" follow the colon in a grammar definition, they signify that each of the terminal symbols on the following line or lines is an alternative definition. For example, the lexical grammar for Java contains the production:
ZeroToThree: one of
    0 1 2 3
which is merely a convenient abbreviation for:
ZeroToThree:
    0
    1
    2
    3
When an alternative in a lexical production appears to be a token, it represents the sequence of characters that would make up such a token. Thus, the definition:
BooleanLiteral: one of
    true false
in a lexical grammar production is shorthand for:
BooleanLiteral:
    t r u e
    f a l s e
The right-hand side of a lexical production may specify that certain expansions are not permitted by using the phrase "but not" and then indicating the expansions to be excluded, as in the productions for InputCharacter (§3.4) and Identifier (§3.8):
InputCharacter:
    UnicodeInputCharacter but not CR or LF
Identifier:
    IdentifierName but not a Keyword or BooleanLiteral or NullLiteral
Finally, a few nonterminal symbols are described by a descriptive phrase in roman type in cases where it would be impractical to list all the alternatives:
RawInputCharacter:
    any Unicode character
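The "but not" notation used in the Identifier production above can be read as a match followed by an exclusion test, as in the following Java sketch. Several things here are assumptions made for illustration: the keyword set is a small, deliberately incomplete sample (the full list belongs to §3.9 and is not reproduced), the IdentifierName rule is approximated with the standard Character.isJavaIdentifierStart and Character.isJavaIdentifierPart methods, and the class and method names are hypothetical.

```java
import java.util.Set;

class ButNotSketch {
    // Incomplete sample of keywords, for illustration only.
    static final Set<String> SAMPLE_KEYWORDS = Set.of("if", "for", "break", "class");

    // Approximation of the IdentifierName rule using standard library checks.
    static boolean isIdentifierName(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Identifier: IdentifierName but not a Keyword or BooleanLiteral or NullLiteral
    static boolean isIdentifier(String s) {
        return isIdentifierName(s)
            && !SAMPLE_KEYWORDS.contains(s)                // but not a Keyword
            && !s.equals("true") && !s.equals("false")     // or BooleanLiteral
            && !s.equals("null");                          // or NullLiteral
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("count"));
        System.out.println(isIdentifier("true"));
    }
}
```

The exclusion applies to the whole matched sequence, so a name such as trueValue is still accepted even though it begins with the characters of a BooleanLiteral.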
Java Language Specification (HTML generated by Suzette Pelouch on February 24, 1998)
Copyright © 1996 Sun Microsystems, Inc.
All rights reserved
Please send any comments or corrections to doug.kramer@sun.com