Syntax Analysis 1
COMP321 Compiler β Syntax Analysis 1
Kyungpook National University | Hwisoo So | Spring 2026
π 1νμ΄μ§ β μ λͺ©
Syntax Analysis β 1
ꡬ문 λΆμ β 1
μ»΄νμΌλ¬μμ Syntax Analysis(ꡬ문 λΆμ) λ¨κ³μ λν 첫 λ²μ§Έ κ°μλ€.
μ¦, μ§κΈλΆν° parser(λ¬Έλ² κ²μ¬κΈ°) ννΈλ₯Ό 본격μ μΌλ‘ μμνλ€λ μλ―Έ.
- Hwisoo So β μνμ (κ°μμ μ΄λ¦)
- Kyungpook National University β κ²½λΆλνκ΅
- COMP321 Compiler Spring 2026 β μ»΄νμΌλ¬ κ³Όλͺ© (2026λ λ΄νκΈ°)
π 2νμ΄μ§ β Outlook / Todayβs agenda
Outlook / Todayβs agenda
κ°μ κ°μ / μ€λ λ°°μΈ λ΄μ©
Syntax and Semantics of Programming Languages
νλ‘κ·Έλλ° μΈμ΄μ ꡬ문과 μλ―Έ
μμΌλ‘ κ³μ λμ€λ ν΅μ¬ κ°λ :
- Syntax = νν (λ¬Έλ²)
- Semantics = μλ―Έ
Specifying the Syntax of a programming language
νλ‘κ·Έλλ° μΈμ΄μ ꡬ문μ μ μνλ λ°©λ²
- CFGs, BNF and EBNF β CFG, BNF, EBNF
- CFG β λ¬Έλ² μ μ μ΄λ‘
- BNF β λ¬Έλ² νν λ°©μ
- EBNF β λ νΈν νμ₯ λ²μ
- Grammar transformations β λ¬Έλ² λ³ν
νμ λ§λ€κΈ° μ½κ² λ¬Έλ²μ λ°κΎΈλ κ³Όμ (β LL νμ λ§λ€ λ νμ)
Parsing
νμ± (ꡬ문 λΆμ κ³Όμ )
- Top-down parsing (LL) β νλ€μ΄ νμ± (LL)
- Recursive descent (LL) parser construction β μ¬κ· νκ° νμ ꡬν
- LL Grammars β LL λ¬Έλ²
LL νμλ μΌμͺ½λΆν° μ½κ³ (L), μΌμͺ½λΆν° μ λ(L)
AST Construction
AST μμ±
- Parse trees vs. ASTs β νμ€ νΈλ¦¬ vs AST
- Parse Tree = λ¬Έλ² κ·Έλλ‘
- AST = μ€μ μλ―Έ μ€μ¬
Chomskyβs Hierarchy
μ΄μ€ν€ κ³μΈ΅ β μΈμ΄ μ΄λ‘ (μ κ·, CFG λ±)
π 3νμ΄μ§ β μ»΄νμΌλ¬ ꡬ쑰 볡μ΅
Recapitulate: the structure of a compilerβ¦
볡μ΅: μ»΄νμΌλ¬μ ꡬ쑰
κ·Έλ¦Ό μ€λͺ β μ»΄νμΌλ¬ μ 체 νμ΄νλΌμΈ
Source Code
β
Lexical Analyzer
β [Tokens]
Syntax Analyzer
β [Abstract Syntax Tree (AST)]
Semantic Analyzer
β [Decorated AST]
Intermediate Code Generator
β [Intermediate Representation (IR)]
Code Optimizer
β
Code Generator
β
Target Assembly
| λ¨κ³ | μΆλ ₯ |
|---|---|
| Lexical Analyzer | Tokens |
| Syntax Analyzer | AST |
| Semantic Analyzer | Decorated AST |
| Intermediate Code Generator | IR |
- Compiler Frontend (Analysis) β μ»΄νμΌλ¬ μλ¨: Lexical + Syntax + Semantic
- Compiler Backend (Synthesis) β μ»΄νμΌλ¬ λ·λ¨: IR μμ± μ΄ν
β¦each phase transforms the programβ¦
κ° λ¨κ³λ νλ‘κ·Έλ¨μ λ€λ₯Έ ννλ‘ λ³ννλ€
μ»΄νμΌλ¬λ βλ²μκΈ°βκ° μλλΌ λ¨κ³λ³ λ³νκΈ° νμ΄νλΌμΈμ΄λ€.
π 4νμ΄μ§ β μ»΄νμΌλ¬ ꡬ쑰 λ³΅μ΅ (μ½λ μμ)
Recapitulate: the structure of a compilerβ¦
μ»΄νμΌλ¬ ꡬ쑰 볡μ΅
μ½λ μμ
position = initial + rate * 60
μ¬λ³Ό ν μ΄λΈ μ°κ²°:
| λ²νΈ | μλ³μ |
|---|---|
| 1 | position |
| 2 | initial |
| 3 | rate |
| 4 | 60 |
- Symbol Table (μ¬λ³Ό ν μ΄λΈ) β λ³μ μ 보 μ μ₯: μ΄λ¦ / νμ / μμΉ
λ¨κ³λ³ λ³ν
1λ¨κ³: Tokens (scanner κ²°κ³Ό)
id, =, id, +, id, *, intliteral
2λ¨κ³: AST (κ΅¬μ‘°λ§ νν)
=
/ \
id +
/ \
id *
/ \
id int
3λ¨κ³: Decorated AST β νμ μ 보 μΆκ°λ¨ (μ: int β float λ³ν)
Note: all variables are of type real
λͺ¨λ λ³μλ real νμ β νμ λ³ν λ°μ (int β float)
π 5νμ΄μ§ β Syntax and Semantics (μμ΄ μμ)
English Language: Syntax and Semantics
μμ΄μμμ ꡬ문과 μλ―Έ
| λ¬Έμ₯ | ν΄μ |
|---|---|
| John eats apples. | μ‘΄μ μ¬κ³Όλ₯Ό λ¨Ήλλ€ |
| Apples eat John. | μ¬κ³Όκ° μ‘΄μ λ¨Ήλλ€ |
Syntax (ꡬ문)
- The form or structure of English sentences β λ¬Έμ₯μ νν λλ ꡬ쑰
- Not concerned with the meaning β μλ―Έλ κ³ λ €νμ§ μλλ€
- Specified by the English grammar β λ¬Έλ²μ μν΄ μ μλλ€
Syntaxλ βλ§λ λ¬Έμ₯μΈμ§βλ§ λ³Έλ€
Apples eat Johnβ λ¬Έλ²μ μΌλ‘λ λ§μ, μλ―Έλ μ΄μν¨
Semantics (μλ―Έ)
- The meaning of English sentences β λ¬Έμ₯μ μλ―Έ
- Semantic definition? β μλ―Έλ₯Ό μ΄λ»κ² μ μν κ²μΈκ°?
Semanticsλ βλ§μ΄ λλμ§β λ³Έλ€
π₯ 1~5νμ΄μ§ ν΅μ¬ μμ½
- μ»΄νμΌλ¬ ꡬ쑰: Lexical β Syntax β Semantic
- Syntax vs Semantics: Syntax = λ¬Έλ² / Semantics = μλ―Έ
- Syntax Analysis μν : ν ν°μ λ°μμ ꡬ쑰(νΈλ¦¬) λ§λ λ€
- AST μ€μμ±: μ΄ν λͺ¨λ λ¨κ³μ κΈ°λ°
π 6νμ΄μ§ β Programming Languages: Syntax and Semantics
Programming Languages: Syntax and Semantics
νλ‘κ·Έλλ° μΈμ΄: ꡬ문과 μλ―Έ
μ½λ μμ
b = b + 1;
if (b==2) printf("too small\n");
b = b + 1;β bλ b+1λ‘ κ°±μ (λ¬Έλ² + μλ―Έ λͺ¨λ μ μ)if (b==2) printf("too small\n");β 쑰건문 μμ (λ¬Έλ² + μλ―Έ λͺ¨λ μ μ)
Syntax (ꡬ문)
- The form or structure of programs β νλ‘κ·Έλ¨μ νν λλ ꡬ쑰
- Not concerned with the meaning of programs β νλ‘κ·Έλ¨μ μλ―Έλ κ³ λ €νμ§ μλλ€
- Specified by a context-free grammar (CFG) β λ¬Έλ§₯ μμ λ¬Έλ²(CFG)μ μν΄ μ μλλ€
ν΅μ¬: νλ‘κ·Έλλ° μΈμ΄μ λ¬Έλ²μ μ λΆ CFGλ‘ μ μλ¨
Semantics (μλ―Έ)
- The meaning of programs β νλ‘κ·Έλ¨μ μλ―Έ
- Specified by:
- operational, denotational or axiomatic semantics β μ°μ°μ / μ§μμ / 곡리μ μλ―Έλ‘
- attribute grammars (will be covered by this course) β μμ± λ¬Έλ² (μ΄ μμ μμ λ€λ£Έ)
- informal English descriptions (as with C, Java, MiniC) β λΉκ³΅μμ μμ΄ μ€λͺ
π 7νμ΄μ§ β Context-free Grammars (CFGs)
Context-free Grammars (CFGs)
λ¬Έλ§₯ μμ λ¬Έλ²
Example: CFG for micro-English
Sentence ::= Subject Verb Object .
Subject ::= I | A Noun | The Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat | Ξ΅
Verb ::= like | is | see | sees
| κΈ°νΈ | μλ―Έ |
|---|---|
::= |
μ μ |
\| |
μ ν |
Ξ΅ |
μ무κ²λ μμ (λΉ λ¬Έμμ΄) |
CFG κΈ°λ³Έ κ°λ
- Context-free grammars consist of productions of the form:
CFGλ λ€μ ννμ productionμΌλ‘ ꡬμ±λλ€
<nonterminal> ::= sequence of (non) terminals
- a terminal of a grammar is a token β terminalμ ν ν°μ΄λ€
-
**the symbol β β denotes alternative forms** β |λ μ νμ μλ―Ένλ€ - Ξ΅ denotes the empty string β
Ξ΅λ λΉ λ¬Έμμ΄μ μλ―Ένλ€
π 8νμ΄μ§ β Context-free Grammars (derivation)
Context-free Grammars
λ¬Έλ§₯ μμ λ¬Έλ² β derivation
A derivation shows how to generate a syntactically valid string.
μ λ(derivation)λ λ¬Έλ²μ μΌλ‘ μ¬λ°λ₯Έ λ¬Έμμ΄μ μμ±νλ κ³Όμ μ 보μ¬μ€λ€
μμ
Sentence β Subject Verb Object .
β I Verb Object .
β I see Object .
β I see the Noun .
β I see the cat .
- derivation = βλ¬Έμ₯μ ν λ¨κ³μ© μμ±νλ κ³Όμ β
- The non-terminal to be replaced next is underlined
λ€μμ μΉνν non-terminalμ λ°μ€λ‘ νμλλ€
π 9νμ΄μ§ β Context-free Grammars (λ¬Έμ₯ μμ)
Context-free Grammars
micro-Englishμμ κ°λ₯ν λ¬Έμ₯λ€
ν¬ν¨λλ λ¬Έμ₯ (valid)
| λ¬Έμ₯ | ν΄μ | λΉκ³ |
|---|---|---|
| I like the cat. | λλ κ³ μμ΄λ₯Ό μ’μνλ€ | λ¬Έλ² OK |
| The cat like the mat. | κ³ μμ΄κ° λ§€νΈλ₯Ό μ’μνλ€ | λ¬Έλ² OK, μμ΄λ νλ¦Ό (likesμ¬μΌ ν¨) |
| The mat is the rat. | λ§€νΈλ μ₯λ€ | λ¬Έλ² OK, μλ―Έ μ΄μ |
Syntax vs Semantics μ°¨μ΄ λ€μ λ±μ₯
ν¬ν¨λμ§ μλ λ¬Έμ₯ (invalid)
- I like the black cat. β adjective(νμ©μ¬) μμ β μμ± λΆκ°
How many sentences can be derived?
λͺ κ°μ λ¬Έμ₯μ λ§λ€ μ μλκ°?
Just a few, i.e., finitely many
λͺ κ° μ λλ€ (μ ν κ°) β μ΄ CFGλ ννλ ₯μ΄ μ νλ¨
π 10νμ΄μ§ β Derivations
Derivations
μ λ
- From a grammar we can derive strings β λ¬Έλ²μΌλ‘λΆν° λ¬Έμμ΄μ μμ±ν μ μλ€
- by generating sequences of tokens β ν ν°μ μνμ€λ₯Ό μμ±ν¨μΌλ‘μ¨
- In each derivation step, a nonterminal is replaced β κ° λ¨κ³μμ nonterminalμ΄ μΉνλλ€
- by a right-hand side of a production β ν΄λΉ productionμ RHSλ‘
ν΅μ¬ κ°λ
| μ©μ΄ | μλ―Έ |
|---|---|
| Sentential form | μ€κ° κ³Όμ μ λ¬Έμμ΄ (terminal + nonterminal νΌμ¬) |
| Sentence | terminalλ§ μλ μ΅μ’ μν |
- A context-free grammar is a generator of a language
CFGλ μΈμ΄ μμ±κΈ°λ€ - the set of all strings that can be derived
μ λ κ°λ₯ν λͺ¨λ λ¬Έμμ΄μ μ§ν© - A sentence that cannot be derived is syntactically illegal
μ λν μ μλ λ¬Έμ₯μ λ¬Έλ²μ μΌλ‘ νλ¦° κ²μ΄λ€
CFG = βκ°λ₯ν λͺ¨λ νλ‘κ·Έλ¨ μ μβ
parser μν = μ΄ λ¬Έμ₯μ΄ CFGλ‘ λ§λ€μ΄μ§ μ μλμ§ κ²μ¬
π₯ 6~10νμ΄μ§ ν΅μ¬ μμ½
- Syntax vs Semantics: Syntax = ꡬ쑰 (CFG) / Semantics = μλ―Έ
- CFG ν΅μ¬ ꡬμ±: terminal = ν ν° / nonterminal = λ¬Έλ² λ³μ / production = κ·μΉ / Ξ΅ = empty
- derivation: λ¬Έμ₯μ λ¨κ³μ μΌλ‘ μμ± / sentential form β μ€κ° μν / sentence β μ΅μ’ μν
- ν΅μ¬ μ μ: CFG = language generator / βμμ± κ°λ₯ = λ¬Έλ²μ μΌλ‘ correctβ
π 11νμ΄μ§ β Context-free Grammars (μ μ)
Context-free Grammars
CFG μ μ
A context free grammar specifies all valid strings (a.k.a. programs) of a given programming language!
CFGλ μ£Όμ΄μ§ νλ‘κ·Έλλ° μΈμ΄μμ κ°λ₯ν λͺ¨λ μ¬λ°λ₯Έ λ¬Έμμ΄(μ¦, νλ‘κ·Έλ¨)μ μ μνλ€.
β Formal description of the syntax of a programming language.
β νλ‘κ·Έλλ° μΈμ΄μ ꡬ문μ λν νμμ μ μ
Elements of CFGs (CFGμ κ΅¬μ± μμ)
Terminal Symbols (Terminals) β λ¨λ§ κΈ°νΈ
- μ:
<=,while,if,>,==,ID,INTLITERAL - μ€μ μ½λμμ λμ€λ ν ν°λ€
Nonterminal Symbols (Non-terminals) β λΉλ¨λ§ κΈ°νΈ
- Particular class of phrases in the language β μΈμ΄μμ νΉμ ν ꡬ쑰λ₯Ό λνλ΄λ κΈ°νΈ
- μ:
Program,Command,Expression,Declaration - λ¬Έλ²μ βλ³μβ μν
Start Symbol β μμ κΈ°νΈ
- One of the nonterminals β λΉλ¨λ§ μ€ νλ
- usually the lefthand-side of the first production β λ³΄ν΅ μ²« λ²μ§Έ productionμ μΌμͺ½
- μ:
sentence
Productions β μμ° κ·μΉ
- μ:
sentence ::= noun verb
CFG = (Terminal, Nonterminal, Start, Production)
π 12νμ΄μ§ β CFG in BNF
Context-free Grammars
BNF νν
CFGs can be expressed in BNF (Backus-Naur Form)
CFGλ BNFλ‘ ννν μ μλ€
To recognize P. Naurβs contributionβ¦ and J. W. Backusβ¦
BNFλ Naurμ Backusμ κΈ°μ¬λ₯Ό 기리기 μν΄ μ΄λ¦ λΆμ¬μ‘λ€
Example in BNF
Program ::= Command
Command ::= single-Command
| Command ; single-Command
single-Command ::= V-name := Expression
| begin Command end
| κΈ°νΈ | μλ―Έ |
|---|---|
::= |
λ¬Έλ² μ μ |
:= |
μ€μ μ½λ λμ μ°μ°μ |
\| |
μ ν |
β οΈ
::=μ:=κ΅¬λΆ μ€μ β ν·κ°λ¦¬λ©΄ μνμμ νλ¦°λ€
- μ¬κ· μμ (
Command ::= Command ; ...) β βλ¬Έμ₯ μ¬λ¬ κ° μ΄μ΄λΆμ΄κΈ°β νν
π 13νμ΄μ§ β EBNF
Context-free Grammars
EBNF
For our convenience, we will use EBNF
νΈμλ₯Ό μν΄ EBNFλ₯Ό μ¬μ©νλ€
EBNF = BNF + regular expressions
EBNF = BNF + μ κ·ννμ
BNF vs EBNF λΉκ΅
BNF
Command ::= single-Command
| Command ; single-Command
EBNF
Command ::= single-Command (; single-Command)*
*means 0 or more occurrences
*λ 0λ² μ΄μ λ°λ³΅
BNF β 볡μ‘ν μ¬κ· νμ
EBNF β ν¨μ¬ μ§κ΄μ μΌλ‘ νν κ°λ₯
π 14νμ΄μ§ β CFG for extended micro-English
A CFG for extended micro-English
νμ₯λ micro-Englishμ© CFG
Verify if βPeter passed the testβ is a sentence?
βPeter passed the testβκ° λ¬Έμ₯μΈμ§ νμΈνλΌ
λ¬Έλ²
sentence ::= subject predicate
subject ::= NOUN | ARTICLE NOUN
predicate ::= VERB object
object ::= NOUN | ARTICLE NOUN
νλ¨ κ³Όμ
| λ¨μ΄ | λΆλ₯ |
|---|---|
| Peter | NOUN |
| passed | VERB |
| the | ARTICLE |
| test | NOUN |
β μ λΆ κ·μΉ λ§μ‘± β β valid sentence
π 15νμ΄μ§ β Two derivations
Two derivations of βPeter passed the testβ
λ κ°μ§ μ λ λ°©λ²
derivation 1 (leftmost)
sentence β subject predicate
β NOUN predicate
β NOUN VERB object
β NOUN VERB ARTICLE NOUN
derivation 2 (rightmost)
sentence β subject predicate
β subject VERB object
β subject VERB ARTICLE NOUN
β NOUN VERB ARTICLE NOUN
λ λ°©μ λͺ¨λ κ°μ κ²°κ³Ό μμ±:
Sentence: NOUN VERB ARTICLE NOUN
- Sentential forms β μ€κ° κ³Όμ λ€
- κ°μ λ¬Έμ₯λ μ¬λ¬ derivation κ°λ₯ β CFGλ λΉκ²°μ μ ꡬ쑰
π₯ 11~15νμ΄μ§ ν΅μ¬ μμ½
- CFG κ΅¬μ± μμ (νμ): Terminal / Nonterminal / Start Symbol / Production
- BNF vs EBNF: BNF = κΈ°λ³Έ / EBNF = λ°λ³΅(
*,+,?) κ°λ₯ - derivation ν΅μ¬: λ¬Έμ₯ μμ± κ³Όμ , μ¬λ¬ λ°©μ κ°λ₯
- ν΅μ¬ λ¬Έμ μ ν: βμ΄ λ¬Έμ₯μ΄ CFGλ‘ μμ± κ°λ₯νκ°?β
- λ§€μ° μ€μ: κ°μ λ¬Έμμ΄ = μ¬λ¬ derivation κ°λ₯
π 16νμ΄μ§ β Leftmost and Rightmost Derivations
Leftmost and Rightmost Derivations
μ’μΈ‘ μ λμ μ°μΈ‘ μ λ
At each step in the derivation, two choices are made:
μ λ κ³Όμ μ κ° λ¨κ³μμ λ κ°μ§ μ νμ΄ μ‘΄μ¬νλ€
- Which nonterminal to replace? β μ΄λ€ nonterminalμ μΉνν κ²μΈκ°?
- Which alternative to use for that nonterminal? β μ΄λ€ κ·μΉμ μ¬μ©ν κ²μΈκ°?
Two types of useful derivations (μ μ©ν λ κ°μ§ μ λ λ°©μ)
| λ°©μ | μ€λͺ | μ°κ²°λλ νμ |
|---|---|---|
| Leftmost derivation | νμ κ°μ₯ μΌμͺ½ nonterminalμ μΉν | LL parser |
| Rightmost derivation | νμ κ°μ₯ μ€λ₯Έμͺ½ nonterminalμ μΉν | LR parser |
β μν ν΅μ¬: Leftmost β LL / Rightmost β LR
π 17νμ΄μ§ β Conventions for writing CFGs
Conventions for writing CFGs
CFG μμ± κ·μΉ
Start symbol
- The left-hand side of the first production β 첫 λ²μ§Έ productionμ μΌμͺ½
- The letter S, whenever it appears β λ³΄ν΅ Sλ₯Ό μ¬μ©
Nonterminals (λΉλ¨λ§)
- lower-case names such as βsentenceβ, βexprβ
- capital letters like A, B, C
Terminals (λ¨λ§)
- boldface names such as ID and INTLITERAL
- digits, operators, punctuation characters:
1,+,[ - Sometimes in double quotes β κ°λ λ°μ΄νλ‘ νν
π 18νμ΄μ§ β Example Grammar for Pascal
Example Grammar for Pascal
Pascal μΈμ΄ λ¬Έλ² μμ
program ::= PROGRAM id ( id more_ids ) ; block .
block ::= variables BEGIN stmt more_stmts END
more_ids ::= , id more_ids | Ξ΅
variables ::= VAR id more_ids : type ; more_variables | Ξ΅
more_variables ::= id more_ids : type ; more_variables | Ξ΅
stmt ::= id := exp
| READ ( id more_ids )
| IF exp THEN stmt ELSE stmt
| WHILE exp DO stmt
| BEGIN stmt more_stmts END
more_stmts ::= ; stmt more_stmts | Ξ΅
exp ::= num | id | exp + exp | exp β exp
type ::= integer | boolean | char
ν΅μ¬ ν¬μΈνΈ
- program ꡬ쑰 β
PROGRAMμμ βblockβ λμ. - block β λ³μ μ μΈ +
BEGIN ~ END - stmt μ’ λ₯: λμ / READ / IF / WHILE / BEGIN-END
- Ξ΅ β optional (μμ΄λ λ¨)
Note: The productions for βnumβ and βidβ have been omitted
numκ³Ό id μ μλ μλ΅λ¨ (lexical λ¨κ³μμ μ΄λ―Έ μ μλ ν ν°)
π 19νμ΄μ§ β Pascal λ¬Έλ² κ²μ¬ μμ
Example Grammar for Pascal (cont.)
Does the following program adhere to the Pascal syntax?
λ€μ νλ‘κ·Έλ¨μ΄ Pascal λ¬Έλ²μ λ§μ‘±νλκ°?
PROGRAM foo (input, output);
BEGIN
IF a THEN a:= a+1; b := 22 ENDIF;
END;
ν΅μ¬ λΆμ
Pascal λ¬Έλ²μμ IFλ¬Έ κ·μΉ:
IF exp THEN stmt ELSE stmt
β λ°λμ ELSE νμ
β μ΄ μ½λλ ELSE μμ + ENDIF ꡬ쑰λ Pascal μ€νμΌ μλ
β λ¬Έλ² μ€λ₯ κ°λ₯μ± μμ
π 20νμ΄μ§ β Grammar & Program, side-by-side
Grammar & Program, side-by-side
λ¬Έλ²κ³Ό νλ‘κ·Έλ¨ λΉκ΅
μΌμͺ½: CFG / μ€λ₯Έμͺ½: μ€μ μ½λλ₯Ό λλν λΉκ΅
λ§€μΉ κ³Όμ
| μ½λ | λ¬Έλ² |
|---|---|
| PROGRAM | ok |
| foo | id |
| (input, output) | ok |
| BEGIN ~ END | block |
μ€λ₯ λΆλΆ
IF a THEN a:= a+1; b := 22 ENDIF;
CFG κ·μΉ: IF exp THEN stmt ELSE stmt
β ELSE μμ β λ¬Έλ² λΆμΌμΉ β syntax error
π₯ 16~20νμ΄μ§ ν΅μ¬ μμ½
- derivation μ’ λ₯ (λ§€μ° μ€μ): Leftmost β LL parser / Rightmost β LR parser
- CFG μμ± κ·μΉ: Start symbol / Terminal / Nonterminal ꡬλΆ
- Ξ΅ μλ―Έ: optional (μμ΄λ λ¨)
- μ€μ μν ν΅μ¬ μ ν: μ½λκ° CFGμ λ§λμ§ νλ¨
- μ€μ ν¬μΈνΈ: CFGλ βκ°λ₯ν νλ‘κ·Έλ¨ μ μβ / parserλ βκ·Έ μμ ν¬ν¨λλμ§ κ²μ¬β
π 21νμ΄μ§ β Lexical Analyzer μν
Lexical Analyzer (Scanner, Tokenizer)
μλ³Έ μ½λ
PROGRAM foo (input, output);
VAR a, b: integer;
BEGIN
IF a <> 0 THEN BEGIN a:= a+1; b := 22 END
END.
scanner μΆλ ₯ (ν ν°λ€)
PROGRAM foo ( , output ) ;
IF a THEN
a := a + 1 ; b := 22
END .
;
BEGIN
VAR a , b : integer ;
input
<> 0
END
BEGIN
ν΅μ¬ μ€λͺ
scanner μν :
- λ¬Έμμ΄ β ν ν° λμ΄ β
- λ¬Έλ² μ²΄ν¬ β
- ꡬ쑰 μ΄ν΄ β
Lexical Analyzerλ βμλ―Έβλ βꡬ쑰βλ₯Ό λͺ¨λ₯Έλ€ β λ¨μν λ¬Έμμ΄ β ν ν°μΌλ‘ μͺΌκ°€ λΏμ΄λ€
π 22νμ΄μ§ β Syntax Analyzer μν
Syntax Analyzer (Parser)
νλ¦
Tokens β Parser β Parse Tree
| λ¨κ³ | μν |
|---|---|
| scanner | ν ν° μμ± |
| parser | ꡬ쑰 μμ± (νΈλ¦¬) |
- Scanner β ν ν° λμ΄
- Syntax Analyzer (Parser) β ν ν°μ μ λ ₯μΌλ‘ λ°μ λ¬Έλ² κ΅¬μ‘°(Parse Tree) μμ±
π 23νμ΄μ§ β Syntax Error μμ
Syntax Error λ°μ
λ¬Έμ μ½λ
PROGRAM foo (input, output);
BEGIN
IF a THEN a:= a+1; b := 22 ENDIF;
END;
parser μλ¬
Expected token: ELSE
Actual token from the scanner: (μ€μ scannerκ° μ€ ν ν°)
ν΅μ¬
- CFG κ·μΉ:
IF exp THEN stmt ELSE stmt - μ€μ μ½λ: ELSE μμ β κ·μΉ λΆμΌμΉ
scannerλ λ¬Έμ μμ
parserμμ μλ¬ λ°μ
Syntax Error = parser λ¨κ³μμ λ°μ
π 24νμ΄μ§ β Grammar & Program (Reminder)
Grammar & Program, side-by-side (Reminder)
λ¬Έλ²κ³Ό νλ‘κ·Έλ¨ λΉκ΅ (λ€μ 보기)
μΌμͺ½: CFG / μ€λ₯Έμͺ½: μ½λλ₯Ό λλν λΉκ΅ν΄μ λ¬Έλ² μλ° μ°ΎκΈ°
PROGRAM foo (input, output);
BEGIN
IF a THEN a:= a+1; b := 22 ENDIF;
END;
- IF β ELSE νμ (μμ β μ€λ₯)
- ENDIF β Pascal λ¬Έλ² μλ
- β CFGλ‘ μμ± λΆκ° β syntax error
π 25νμ΄μ§ β SYNTAX ERROR νμ
SYNTAX ERROR!
SYNTAX ERROR!
Expected token: ELSE
Actual token from the scanner: ...
parserλ νμ:
κΈ°λν ν ν° vs μ€μ ν ν° β mismatch β Syntax Error
π₯ 21~25νμ΄μ§ ν΅μ¬ μμ½
- scanner vs parser: scanner β ν ν° μμ± / parser β λ¬Έλ² κ²μ¬
- Syntax Error λ°μ μμΉ: parser λ¨κ³
- error μμΈ: CFG κ·μΉ λΆμΌμΉ
- λν μ: IFμ ELSE μμ β μ€λ₯
- μ€μν κ°λ
νλ¦:
input β scanner β tokens β parser β parse tree β μ€ν¨ μ syntax error
π 26νμ΄μ§ β Extended BNF
Extended BNF
νμ₯λ BNF
Extended BNF combines BNF with RE
EBNFλ BNFμ μ κ·ννμμ κ²°ν©ν κ²μ΄λ€
EBNF production νν
LHS ::= RHS
- LHS β nonterminal symbol
- RHS β extended regular expression (terminal + nonterminalλ‘ κ΅¬μ±)
EBNFκ° BNFμ μΆκ°νλ κΈ°νΈ
| κΈ°νΈ | μλ―Έ |
|---|---|
( ) |
κ·Έλ£Ήν |
* |
0λ² μ΄μ λ°λ³΅ |
+ |
1λ² μ΄μ λ°λ³΅ |
? |
0 λλ 1λ² |
β μ΄ 4κ°λ 무쑰건 κΈ°μ΅
The miniC grammar is given in EBNF
miniC λ¬Έλ²μ EBNFλ‘ μ£Όμ΄μ§λ€
π 27νμ΄μ§ β EBNF Example
Extended BNF Example
κ°λ¨ν ννμ μΈμ΄
Expression ::= PrimaryExp (Operator PrimaryExp)*
PrimaryExp ::= Literal | Identifier | "(" Expression ")"
Identifier ::= Letter (Letter | Digit)*
Literal ::= Digit Digit*
Letter ::= "a" | ... | "z"
Digit ::= "0" | ... | "9"
Operator ::= "+" | "-" | "*" | "/"
ν΅μ¬
(Operator PrimaryExp)*β μ°μ° μ¬λ¬ λ² κ°λ₯a + b * cκ°μ ννμ μμ± κ°λ₯
π 28νμ΄μ§ β Kleene Closure (*)
An EBNF example from miniC: Kleene Closure
A miniC program that consists of a sequence of zero or more declarations
miniC νλ‘κ·Έλ¨μ μ μΈμ΄ 0κ° μ΄μμΌλ‘ ꡬμ±λλ€
BNF (볡μ‘ν μ¬κ·)
program ::= decl-list
decl-list ::= decl-list func-decl
| decl-list var-decl
| Ξ΅
EBNF (ν μ€λ‘)
program ::= (func-decl | var-decl)*
*= Kleene Closure = 0κ° μ΄μ λ°λ³΅
π 29νμ΄μ§ β Positive Closure (+)
An EBNF example from miniC: Positive Closure
one or more declarations β νλ μ΄μ μ μΈ
BNF
decl-list ::= decl-list func-decl
| decl-list var-decl
| func-decl
| var-decl
EBNF
program ::= (func-decl | var-decl)+
+= 1κ° μ΄μ
π 30νμ΄μ§ β Option (?)
An EBNF example from miniC: Option ?
The if-statement with an optional else-part
elseκ° μ νμ μΈ ifλ¬Έ
BNF
stmt ::= if "(" expr ")" stmt
| if "(" expr ")" stmt else stmt
| other
EBNF
stmt ::= if "(" expr ")" stmt (else stmt)?
| other
?= μμ μλ μκ³ μμ μλ μμ
μ΄κ²μ΄ dangling else λ¬Έμ μ μμμ΄λ€
π₯ 26~30νμ΄μ§ ν΅μ¬ μμ½
| κΈ°νΈ | μλ―Έ |
|---|---|
* |
0 μ΄μ (Kleene Closure) |
+ |
1 μ΄μ (Positive Closure) |
? |
μ ν (Option) |
() |
κ·Έλ£Ήν |
- BNF vs EBNF: EBNFλ βμΆμ½ ννβ
- λͺ¨λ 볡μ‘ν λ¬Έλ²λ κ²°κ΅ REμ²λΌ νν κ°λ₯
π 31νμ΄μ§ β Grammar Transformations (μ΄λ‘ )
A little bit of useful theory
μ½κ°μ μ μ©ν μ΄λ‘
We will now look at a very few useful bits of theory.
μ΄μ λͺ κ°μ§ μ μ©ν μ΄λ‘ μ μ΄ν΄λ³Έλ€
These will be necessary later when we implement parsers.
μ΄κ²λ€μ λμ€μ parserλ₯Ό ꡬνν λ νμνλ€
Grammar transformations (λ¬Έλ² λ³ν)
A grammar can be transformed in a number of ways without changing the meaning
λ¬Έλ²μ μλ―Έλ₯Ό λ°κΎΈμ§ μκ³ μ¬λ¬ λ°©μμΌλ‘ λ³νλ μ μλ€
(i.e., the set of strings that it generates)
μ¦, μμ±νλ λ¬Έμμ΄ μ§ν©μ λμΌνκ² μ μ§
λ³ν μ΄μ : νμ λ§λ€κΈ° μ½κ² νκΈ° μν΄
π 32νμ΄μ§ β Grammar Transformations (1)
Grammar Transformations (1)
λ¬Έλ² λ³ν 1 β Left Factorization (μ’μΈ‘ μΈμλΆν΄)
κΈ°μ‘΄ λ¬Έλ²
single-Command ::= V-name := Expression
| if Expression then single-Command
| if Expression then single-Command else single-Command
λ¬Έμ : λ κ·μΉμ΄ κ°μ prefix (if Expression then single-Command)
λ³ν ν
single-Command ::= V-name := Expression
| if Expression then single-Command ( Ξ΅ | else single-Command )
μΌλ° νν
N ::= X Y | X Z β N ::= X (Y | Z)
βμλΆλΆ κ°μΌλ©΄ λ¬Άμ΄λΌβ
LL parserμμ νμ
π 33νμ΄μ§ β Grammar Transformations (2)
Grammar Transformations (2)
λ¬Έλ² λ³ν 2 β Elimination of Left Recursion (μ’μΈ‘ μ¬κ· μ κ±°)
λ¬Έμ λ¬Έλ²
N ::= X | N Y
β LL parserμμ 무ν 루ν λ°μ
μμ
Identifier ::= Letter
| Identifier Letter
| Identifier Digit
λ³ν ν
Identifier ::= Letter (Letter | Digit)*
λ³ν μ리
N β X
β X Y
β X Y Y
β ...
β X Y*
Left recursion β Kleene star (
*)
π 34νμ΄μ§ β Grammar Transformations (3)
Grammar Transformations (3)
λ¬Έλ² λ³ν 3 β Substitution of non-terminal symbols (λΉλ¨λ§ μΉν)
κΈ°λ³Έ νν
N ::= X
M ::= Ξ± N Ξ²
λ³ν ν:
M ::= Ξ± X Ξ²
μ€κ° λ¨κ³ μ κ±° β μ§μ μ°κ²°
μμ
single-Command ::= for contrVar := Expression
to-or-dt Expression do single-Command
to-or-dt ::= to | downto
λ³ν ν:
single-Command ::= for contrVar := Expression
(to | downto) Expression do single-Command
nonterminal μ κ±° β inline μ²λ¦¬
π 35νμ΄μ§ β Outlook (μ 리)
Outlook
μ 리 λ° μ΄ν λ΄μ©
μλ£λ λ΄μ© β
- Syntax and Semantics of Programming Languages β
- Specifying the Syntax of a programming language β
- CFGs, BNF and EBNF β
- Grammar transformations β
λ€μ λ΄μ©
- Parsing
- Top-down parsing (LL)
- Recursive descent
- LL Grammars
- AST Construction
- Chomskyβs Hierarchy
μ§κΈκΉμ§λ βλ¬Έλ² μ μβ
μ΄μ λΆν°λ βνμ± κ΅¬νβ
π₯ 31~35νμ΄μ§ ν΅μ¬ μμ½
| λ³ν | λͺ©μ |
|---|---|
| Left Factorization | κ³΅ν΅ prefix μ κ±° |
| Left Recursion μ κ±° | LL parser νμ 쑰건 |
| Substitution | λΉλ¨λ§ inline μ²λ¦¬ |
- Grammar Transformation λͺ©μ : νμ λ§λ€κΈ° μ½κ²
π₯ μ 체 κ°μ ν΅μ¬ μμ½ (1~35νμ΄μ§)
β μν ν΅μ¬ 6κ°μ§
| νλͺ© | λ΄μ© |
|---|---|
| β 1. CFG | μΈμ΄ μ μ ν΅μ¬ |
| β 2. Derivation | λ¬Έμ₯ μμ± κ³Όμ |
| β 3. Syntax vs Semantics | 무쑰건 κ΅¬λΆ |
| β 4. EBNF κΈ°νΈ | * / + / ? / () |
| β 5. Grammar Transformation | Left factoring / Left recursion μ κ±° |
| β 6. Parser μν | CFG κΈ°λ° λ¬Έλ² κ²μ¬ |
μ»΄νμΌλ¬ μ 체 νλ¦
input β scanner β tokens β parser β parse tree
β μ€ν¨ μ Syntax Error
Grammar Transformation μμ½
RE β NFA β DFA β μ΅μ DFA (lexical)
CFG β Left Factoring β Left Recursion μ κ±° β Parser (syntax)
Leave a comment