Syntax Analysis 2
COMP321 Compiler β Syntax Analysis 2
Kyungpook National University | Hwisoo So | Spring 2026
π 1νμ΄μ§ β μ λͺ©
Syntax Analysis β 2
ν΄μ: ꡬ문 λΆμ β 2
β μ€λͺ μ΄ κ°μλ Syntax Analysis(ꡬ문 λΆμ)μ λ λ²μ§Έ ννΈλ€. μ¦,
- Lexical Analysis(ν ν°ν) μ΄ν λ¨κ³
- μ€μ λ¬Έμ₯ ꡬ쑰λ₯Ό λΆμνλ λ¨κ³
λ₯Ό 본격μ μΌλ‘ λ€λ£¨κΈ° μμνλ κ°μλ€.
Hwisoo So
Kyungpook National University
ν΄μ: μν¬μ / κ²½λΆλνκ΅
β μ€λͺ : κ°μ λ΄λΉ κ΅μ μ 보.
COMP321 Compiler
Spring 2026
ν΄μ: COMP321 μ»΄νμΌλ¬ κ³Όλͺ© / 2026λ λ΄ νκΈ°
β μ€λͺ : κ³Όλͺ© μ 보 + νκΈ°.
πΌ κ·Έλ¦Ό μ€λͺ
(νμ΄μ§ μ 체)
- μ€μμ βSyntax Analysis β 2β ν¬κ² μμ
- μλμ κ΅μ μ΄λ¦ + νκ΅
- νλ¨μ κ²½λΆλνκ΅ λ‘κ³
π μλ―Έ: μ΄κ±΄ κ·Έλ₯ κ°μ νμ΄ν μ¬λΌμ΄λλΌμ κ°λ λ΄μ© μμ.
π 2νμ΄μ§
Syntax Analysis COMP321@KNU
ν΄μ: ꡬ문 λΆμ (κ²½λΆλ μ»΄νμΌλ¬ κ³Όλͺ©)
Outlook
ν΄μ: κ°μ / μμΌλ‘ λ°°μΈ λ΄μ©
Syntax and Semantics of Programming Languages β
ν΄μ: νλ‘κ·Έλλ° μΈμ΄μ ꡬ문과 μλ―Έ β
β μ€λͺ : μ΄λ―Έ μ΄μ κ°μμμ
- syntax (νμ)
- semantics (μλ―Έ)
λ₯Ό λ°°μ λ€λ λ».
Specifying the Syntax of a programming language: β
ν΄μ: νλ‘κ·Έλλ° μΈμ΄μ ꡬ문μ μ μνλ λ°©λ² β
β CFGs, BNF and EBNF β
ν΄μ: CFG, BNF, EBNF β
β μ€λͺ : λ¬Έλ² μ μ λ°©μλ€ μ΄λ―Έ νμ΅ μλ£.
β Grammar transformations β
ν΄μ: λ¬Έλ² λ³ν β
β μ€λͺ : left recursion μ κ±°, factoring κ°μ κ².
Parsing
ν΄μ: νμ±
β μ€λͺ : π μ§κΈλΆν° ν΅μ¬ β scanner λ€μ λ¨κ³ = parser
β Top-down parsing (LL)
ν΄μ: νλ€μ΄ νμ± (LL)
β Recursive descent (LL) parser construction
ν΄μ: μ¬κ· νκ° νμ ꡬν
β LL Grammars
ν΄μ: LL λ¬Έλ²
AST Construction
ν΄μ: AST μμ±
β Parse trees vs. ASTs
ν΄μ: νμ€ νΈλ¦¬ vs AST
Chomskyβs Hierarchy
ν΄μ: μ΄μ€ν€ κ³μΈ΅ ꡬ쑰
β μ€λͺ : μ΄ μ¬λΌμ΄λλ μμΌλ‘ λ°°μΈ κ² λ‘λλ§΅μ΄λ€.
ν΅μ¬ νλ¦:
λ¬Έλ² μ μ β νμ± β AST μμ± β μ΄λ‘
νΉν μ€μν 건:
π μ΄μ λΆν° βParsingβ ννΈ μμ
π 3νμ΄μ§
Parsing
ν΄μ: νμ±
We will now look at parsing.
ν΄μ: μ΄μ νμ±μ μ΄ν΄λ³Ό κ²μ΄λ€.
Topics:
ν΄μ: λ€λ£° λ΄μ©:
Derivations and parse trees
ν΄μ: μ λ κ³Όμ κ³Ό νμ€ νΈλ¦¬
Ambiguous grammars
ν΄μ: λͺ¨νΈν λ¬Έλ²
β Operator precedence
ν΄μ: μ°μ°μ μ°μ μμ
β Operator associativity
ν΄μ: μ°μ°μ κ²°ν© λ°©ν₯
Recursive descent parsing
ν΄μ: μ¬κ· νκ° νμ±
β What it is.
ν΄μ: κ·Έκ² λ¬΄μμΈμ§
β How to implement it given an EBNF specification.
ν΄μ: EBNF λͺ μΈλ₯Ό κΈ°λ°μΌλ‘ μ΄λ»κ² ꡬννλμ§
β μ€λͺ : μ΄ νμ΄μ§ ν΅μ¬:
π parser 곡λΆμμ μ€μν 3κ°
- derivation (λ¬Έμ₯ μμ± κ³Όμ )
- parse tree (ꡬ쑰 νν)
- ambiguity (λͺ¨νΈμ±)
그리κ³
π μ€μ ꡬν: recursive descent
π 4νμ΄μ§
Grammars: Definition
ν΄μ: λ¬Έλ²: μ μ
Typically, languages are specified as grammars
ν΄μ: μΌλ°μ μΌλ‘ μΈμ΄λ λ¬Έλ²μΌλ‘ μ μλλ€
β μ€λͺ : νλ‘κ·Έλλ° μΈμ΄ = CFGλ‘ μ μλ¨
Formally, a grammar is a tuple (N, Ξ£, S, P) where
ν΄μ: νμμ μΌλ‘ λ¬Έλ²μ (N, Ξ£, S, P)λ‘ κ΅¬μ±λλ€
N is a finite set of non-terminal symbols
ν΄μ: Nμ μ νν λΉλ¨λ§ μ§ν©μ΄λ€
β μ€λͺ
: <expr>, <stmt> κ°μ κ²
Ξ£ is a set of terminal symbols
ν΄μ: Ξ£λ λ¨λ§ κΈ°νΈ μ§ν©μ΄λ€
β μ€λͺ : μ€μ ν ν° (if, +, id λ±)
S β N is the start symbol
ν΄μ: Sλ μμ κΈ°νΈμ΄λ€
β μ€λͺ : λ¬Έλ²μ μΆλ°μ
P β (N βͺ Ξ£)β Γ (N βͺ Ξ£)β are the production rules
ν΄μ: Pλ μμ± κ·μΉ μ§ν©μ΄λ€
β μ€λͺ : A β B νν
N = {<prog>, <expr>, <var>}
π λΉλ¨λ§ μ§ν©
Ξ£ = {True, False, ;, &, |, a, b, β¦}
π λ¨λ§ μ§ν©
S = <prog>
π μμ μ¬λ³Ό
Production rules
<prog> β
<prog> β <expr>; <prog>
π νλ‘κ·Έλ¨μ μ¬λ¬ exprλ‘ κ΅¬μ±λ¨
<expr> β True / False
π boolean κ°
<expr> β <var>
π λ³μ
<expr> β !<expr>
π NOT μ°μ°
<expr> β <expr> & <expr>
<expr> β <expr> | <expr>
π AND, OR
<var> β Var(_)
π λ³μ μμ±
β ν΅μ¬ μ€λͺ :
π βλ¬Έλ² = μΈμ΄λ₯Ό μ μνλ μνμ ꡬ쑰β
π 5νμ΄μ§
Grammars: Language
ν΄μ: λ¬Έλ²μ΄ μ μνλ μΈμ΄
The language of a grammar is the set of words that can be derived from the start symbol by applying production rules
ν΄μ: λ¬Έλ²μ μΈμ΄λ μμ κΈ°νΈλ‘λΆν° μμ± κ·μΉμ μ μ©νμ¬ λ§λ€μ΄μ§ μ μλ λͺ¨λ λ¬Έμμ΄μ μ§ν©μ΄λ€
β μ€λͺ : π λ§€μ° μ€μ β Grammar β Language μμ±
Example derivation
<prog>
β <expr>; <prog>
β <expr>;
β <expr>&<expr>;
β <var>&<expr>;
β foo&<expr>;
β foo&<var>;
β foo&bar;
ν΄μ + μ€λͺ : μ΄κ±΄ λ¬Έμμ΄μ λ§λλ κ³Όμ μ΄λ€.
λ¨κ³λ³:
<prog>- β
<expr>; - β
<expr>&<expr>; - β
<var>&<expr>; - β
foo&<expr>; - β
foo&<var>; - β
foo&bar;
π μ΅μ’ κ²°κ³Ό λ¬Έμμ΄ μμ±λ¨
πΌ κ·Έλ¦Ό μ€λͺ (μ€μ)
μ¬λΌμ΄λ νλ¨ νΈλ¦¬
π parse tree ꡬ쑰:
루νΈ: <prog>
μ€κ°: <expr> & <expr>
리ν: foo, bar
π μλ―Έ:
foo & bar;μ΄ κ΅¬μ‘°λ₯Ό νΈλ¦¬λ‘ ννν κ²
β ν΅μ¬ μ 리:
π βλ¬Έλ²μ λ¬Έμμ΄μ μμ±νλ κ·μΉμ΄κ³ , κ·Έ κ²°κ³Ό μ§ν©μ΄ μΈμ΄λ€β
π₯ 1~5νμ΄μ§ ν΅μ¬ μμ½
- λ¬Έλ²(CFG)μ μΈμ΄λ₯Ό μ μνλ μνμ ꡬ쑰
- (N, Ξ£, S, P)λ‘ κ΅¬μ±λ¨
- production ruleμ ν΅ν΄ λ¬Έμμ΄μ μμ±
- derivation = μμ± κ³Όμ
- parse tree = ꡬ쑰 νν
π 6νμ΄μ§
Grammars for Regular Languages
ν΄μ: μ κ· μΈμ΄λ₯Ό μν λ¬Έλ²
Can we place a restriction on the form of a grammar to ensure that it describes a regular language?
ν΄μ: λ¬Έλ²μ ννμ μ νμ λμ΄ κ·Έκ²μ΄ μ κ· μΈμ΄λ₯Ό νννλλ‘ λ§λ€ μ μμκΉ?
β μ€λͺ : π ν΅μ¬ μ§λ¬Έ β βλͺ¨λ CFG λ§κ³ μ κ·μΈμ΄λ§ νννλ λ¬Έλ² λ°λ‘ λ§λ€ μ μλ?β
Provable fact:
ν΄μ: μ¦λͺ κ°λ₯ν μ¬μ€:
For any RE r, there is a grammar g such that L(r) = L(g).
ν΄μ: μμμ μ κ·ννμ rμ λν΄, L(r) = L(g)λ₯Ό λ§μ‘±νλ λ¬Έλ² gκ° μ‘΄μ¬νλ€.
β μ€λͺ : π RE β Grammar λμ κ°λ₯ β μ¦, μ κ·ννμ == μ κ·λ¬Έλ²
The grammars that generate regular sets are called regular grammars
ν΄μ: μ κ· μ§ν©μ μμ±νλ λ¬Έλ²μ μ κ· λ¬Έλ²μ΄λΌκ³ νλ€
Definition:
ν΄μ: μ μ:
In a regular grammar, all productions have one of two forms:
ν΄μ: μ κ· λ¬Έλ²μμλ λͺ¨λ μμ± κ·μΉμ΄ λ€μ λ νν μ€ νλμ΄λ€
A β aA(orA β Aa)
ν΄μ: A β aA (λλ A β Aa)
β μ€λͺ : π μ¬κ· (λ¬Έμ + μν)
A β a(orA β Ξ΅)
ν΄μ: A β a (λλ A β Ξ΅)
β μ€λͺ : π μ’ λ£ μ‘°κ±΄
where A is any non-terminal and a is any terminal symbol
ν΄μ: μ¬κΈ°μ Aλ λΉλ¨λ§, aλ λ¨λ§μ΄λ€
These are also called type 3 grammars (Chomsky)
ν΄μ: μ΄κ²μ μ΄μ€ν€ νμ 3 λ¬Έλ²μ΄λΌκ³ νλ€
β μ€λͺ : μ΄μ€ν€ κ³μΈ΅:
- Type 3 β Regular
- Type 2 β CFG
β ν΅μ¬: π μ κ·λ¬Έλ²μ νν μ ν μμ
π 7νμ΄μ§
Grammars for Regular Languages
ν΄μ: μ κ· μΈμ΄ λ¬Έλ²
In a regular grammar (Type-3 grammar), all productions have one of two forms
ν΄μ: μ κ· λ¬Έλ²μμλ μμ± κ·μΉμ΄ λ ννλ§ κ°μ§λ€
A β aA
A β a
Operations:
ν΄μ: μ°μ°:
(1) Symbol: a
ν΄μ: κΈ°νΈ: a
A β a
ν΄μ: Aλ aλ‘ μμ±
| **(2) Alternation: (a | b)** |
ν΄μ: μ ν (a λλ b)
A β a
A β b
ν΄μ: A β a λλ A β b
β μ€λͺ : π OR μ°μ°
(3) Concatenation: aβb
ν΄μ: μ°κ²°: a λ€μ b
A β aB
B β b
ν΄μ: a λ€μ b μμ±
β μ€λͺ : π μμ μ°κ²°
(4) Repetition (Kleene Star): a*
ν΄μ: λ°λ³΅: a*
A β aA
A β Ξ΅
ν΄μ: λ°λ³΅νκ±°λ μ’ λ£
β μ€λͺ : π a* ꡬν
β ν΅μ¬: π μ κ·ννμ μ°μ°μ λ¬Έλ²μΌλ‘ ꡬν κ°λ₯
π 8νμ΄μ§
Parse Trees
ν΄μ: νμ€ νΈλ¦¬
Correspondence between a derivation and a parse tree:
ν΄μ: μ λ κ³Όμ κ³Ό νμ€ νΈλ¦¬λ μλ‘ λμλλ€
β μ€λͺ : π derivation β tree
λ¬Έλ²:
expr ::= id | int | - expr | ( expr ) | expr op expr
op ::= + | - | * | /
ν΄μ: exprλ λ€μ μ€ νλ / opλ μ°μ°μ
generate string βslope * x + interceptβ:
ν΄μ: βslope * x + interceptβ μμ±
derivation κ³Όμ
expr β expr op expr
β expr op id
β expr + id
β expr op expr + id
β expr op id + id
β expr * id + id
β id * id + id
π μ΅μ’ :
slope * x + intercept
πΌ κ·Έλ¦Ό μ€λͺ (λ§€μ° μ€μ)
μ¬λΌμ΄λ μ€λ₯Έμͺ½ νΈλ¦¬:
expr
/ | \
expr op expr
| | |
expr + id(intercept)
/ | \
id * id
π μλ―Έ:
(slope * x) + intercept
β ν΅μ¬:
- π derivation = linear κ³Όμ
- π parse tree = ꡬ쑰 νν
π 9νμ΄μ§
Parse Trees (cont.)
ν΄μ: νμ€ νΈλ¦¬ (κ³μ)
Correspondence between a derivation and a parse tree
ν΄μ: μ λμ νμ€ νΈλ¦¬ λμ
The parse treeβs internal nodes are nonterminals
ν΄μ: λ΄λΆ λ Έλλ λΉλ¨λ§μ΄λ€
The children of a node are the terminals and nonterminals on the right-hand side
ν΄μ: μμμ RHS κ΅¬μ± μμ
The leaves of the parse tree are the terminals (tokens)
ν΄μ: 리νλ ν ν°μ΄λ€
When read from left to right, the leaves make up the sentence
ν΄μ: 리νλ₯Ό μΌβμ€ μ½μΌλ©΄ λ¬Έμ₯μ΄ λλ€
πΌ κ·Έλ¦Ό μ€λͺ
νΈλ¦¬ ꡬ쑰:
루νΈ: expr
λ΄λΆ: expr, op
리ν: id(slope), *, id(x), +, id(intercept)
π μ½μΌλ©΄:
slope * x + intercept
β ν΅μ¬: π parse treeλ βλ¬Έμ₯μ ꡬ쑰βλ₯Ό μ νν 보μ¬μ€λ€
π 10νμ΄μ§
Ambiguous Grammars
ν΄μ: λͺ¨νΈν λ¬Έλ²
Ambiguity: one sentential form has several distinct parse trees
ν΄μ: λͺ¨νΈμ±: νλμ λ¬Έμ₯μ΄ μ¬λ¬ νμ€ νΈλ¦¬λ₯Ό κ°μ§ μ μλ€
Example: slope * x + intercept
ν΄μ: μμ
(slope * x) + intercept
ν΄μ: κ³± λ¨Όμ
slope * (x + intercept)
ν΄μ: λ§μ λ¨Όμ
πΌ κ·Έλ¦Ό μ€λͺ (ν΅μ¬)
λ κ° νΈλ¦¬ μ‘΄μ¬:
1οΈβ£
+
/ \
* intercept
/ \
slope x
π
(slope * x) + intercept
2οΈβ£
*
/ \
slope +
/ \
x intercept
π
slope * (x + intercept)
Problem: Operator precedence!
ν΄μ: λ¬Έμ : μ°μ°μ μ°μ μμ
In the 2nd tree, β+β has precedence over β*β!
ν΄μ: λ λ²μ§Έ νΈλ¦¬μμλ +κ° *λ³΄λ€ μ°μ λ¨
β ν΅μ¬ μ€λͺ :
π grammarκ° λͺ¨νΈνλ©΄: κ°μ μ½λ β λ€λ₯Έ μλ―Έ
π μ»΄νμΌλ¬ μ μ₯μμ μΉλͺ μ
π₯ 6~10νμ΄μ§ ν΅μ¬ μμ½
- μ κ·λ¬Έλ² = νν μ νλ CFG
- RE β Grammar λμ κ°λ₯
- derivation β parse tree λμ
- parse treeλ λ¬Έμ₯ ꡬ쑰λ₯Ό νν
- ambiguous grammarλ μ λ μ¬μ©νλ©΄ μλ¨
- λ¬Έμ μμΈ = operator precedence
π 11νμ΄μ§
Ambiguous Grammars (cont.)
ν΄μ: λͺ¨νΈν λ¬Έλ² (κ³μ)
When more than one distinct derivation of a sentence exists (which means there exist several distinct parse trees), the grammar is ambiguous.
ν΄μ: νλμ λ¬Έμ₯μ λν΄ μ¬λ¬ μλ‘ λ€λ₯Έ μ λ κ³Όμ μ΄ μ‘΄μ¬νλ©΄ (μ¦ μ¬λ¬ κ°μ νμ€ νΈλ¦¬κ° μ‘΄μ¬νλ©΄) κ·Έ λ¬Έλ²μ λͺ¨νΈνλ€
β μ€λͺ : π ν΅μ¬ μ μ
μ¬λ¬ derivation β μ¬λ¬ parse tree β ambiguous
A programming language construct should have only one parse tree to avoid misinterpretation by a programmer/compiler.
ν΄μ: νλ‘κ·Έλλ° μΈμ΄ ꡬ쑰λ μ€ν΄λ₯Ό λ°©μ§νκΈ° μν΄ νλμ νμ€ νΈλ¦¬λ§ κ°μ ΈμΌ νλ€
β μ€λͺ : π μ»΄νμΌλ¬λ βμλ―Έκ° νλλ‘ κ³ μ βλμ΄μΌ ν¨
For expression grammars, precedence and associativity of operators are used to disambiguate the productions.
ν΄μ: ννμ λ¬Έλ²μμλ μ°μ°μ μ°μ μμμ κ²°ν© λ°©ν₯μ μ¬μ©νμ¬ λͺ¨νΈμ±μ μ κ±°νλ€
We rewrite the grammar to make it un-ambiguous:
ν΄μ: λ¬Έλ²μ λ€μ μμ±νμ¬ λͺ¨νΈμ±μ μ κ±°νλ€
expr ::= term | expr add_op term
term ::= factor | term mult_op factor
factor ::= id | number | - factor | ( expr )
add_op ::= + | -
mult_op ::= * | /
ν΄μ + μ€λͺ :
exprβ π λ§μ /λΊμ λ΄λΉtermβ π κ³±μ /λλμ λ΄λΉfactorβ π κ°μ₯ κΈ°λ³Έ λ¨μ
πΌ κ·Έλ¦Ό μ€λͺ
μ€λ₯Έμͺ½ λ Έλ λ°μ€:
π β+λ₯Ό *λ³΄λ€ μμ λμ§ λ§κ³ μλλ‘ λ΄λ €λΌβ
β μ¦: * > +
μΌμͺ½ λ Έλ λ°μ€:
π left associativity
a - b - c = (a - b) - c
β ν΅μ¬: π λ¬Έλ² κ΅¬μ‘°λ‘ μ°μ μμλ₯Ό κ°μ νλ€
π 12νμ΄μ§
Ambiguous if-then-else
ν΄μ: if-then-elseμ λͺ¨νΈμ±
A well-known example of an ambiguous grammar are the following productions for if-then-else
ν΄μ: μ μλ €μ§ λͺ¨νΈν λ¬Έλ² μμλ if-then-elseμ΄λ€
stmt ::= IF expr THEN stmt
| IF expr THEN stmt ELSE stmt
ν΄μ: ifλ¬Έ μ μ
Example: IF a THEN IF b THEN x=false; ELSE x=true;
ν΄μ: μμ
β μ€λͺ (ν΅μ¬):
π λ¬Έμ : ELSEκ° μ΄λ IFμ λΆλκ°?
This grammar can be repaired, but the above problem indicates a programming language design problem.
ν΄μ: μ΄ λ¬Έλ²μ μμ ν μ μμ§λ§, μ΄ λ¬Έμ λ μΈμ΄ μ€κ³ λ¬Έμ λ₯Ό μλ―Ένλ€
Ada uses a different syntax to avoid this problem
ν΄μ: Adaλ λ€λ₯Έ λ¬Έλ²μΌλ‘ ν΄κ²°νλ€
stmt ::= IF expr THEN stmt END IF
| IF expr THEN stmt ELSE stmt END IF
ν΄μ: END IF μΆκ°
β ν΅μ¬:
π dangling else λ¬Έμ
π ν΄κ²°: ꡬ쑰λ₯Ό λͺ ννκ² λ§λ λ€
π 13νμ΄μ§
Some Terminology
ν΄μ: μ©μ΄ μ 리
Recognition
ν΄μ: μΈμ
To answer the question βDoes the input conform to the syntax of the language?β
ν΄μ: μ λ ₯μ΄ λ¬Έλ²μ λ§λμ§ νμΈνλ κ²
A recognizer uses a CFG to check the syntax of the program.
ν΄μ: recognizerλ CFGλ₯Ό μ¬μ©νμ¬ λ¬Έλ²μ κ²μ¬νλ€
It answers βYesβ or βNoβ
ν΄μ: Yes/Noλ‘ νλ¨
β μ€λͺ : π recognizer = λ¬Έλ² μ²΄ν¬κΈ°
Parsing
ν΄μ: νμ±
- Recognize the input program.
ν΄μ: μ λ ₯μ΄ λ§λμ§ νμΈ
- Determine the phrase structure
ν΄μ: ꡬ쑰λ₯Ό λΆμ
A parser uses a CFG to parse a sentence or a program.
ν΄μ: parserλ CFGλ₯Ό μ΄μ©ν΄ λ¬Έμ₯μ λΆμνλ€
It constructs the leftmost or rightmost derivation
ν΄μ: μ’μΈ‘/μ°μΈ‘ μ λλ₯Ό μμ±νλ€
builds the AST
ν΄μ: ASTλ₯Ό λ§λ λ€
β ν΅μ¬ λΉκ΅:
| Β | μν |
|---|---|
| Recognizer | Yes/No |
| Parser | ꡬ쑰 μμ± (AST) |
π 14νμ΄μ§
Context-Free Grammar Classes
ν΄μ: CFG ν΄λμ€
For an arbitrary CFG, parsing can take O(nΒ³) time
ν΄μ: μΌλ° CFG νμ±μ O(nΒ³)
too slow for practical applications
ν΄μ: λ무 λλ¦Ό
For several classes of grammars, a parser that takes O(n) time can be constructed
ν΄μ: νΉμ λ¬Έλ²μμλ O(n) κ°λ₯
Top-down LL parsers
ν΄μ: LL νμ
Bottom-up LR parsers
ν΄μ: LR νμ
LL = Left-to-right scanning, Left-most derivation
ν΄μ: LL = μ’βμ°, μ’μΈ‘ μ λ
LR = Left-to-right scanning, Right-most derivation
ν΄μ: LR = μ’βμ°, μ°μΈ‘ μ λ
The class of LR grammars is a proper superset of the class of LL grammars
ν΄μ: LR β LL
πΌ κ·Έλ¦Ό μ€λͺ
LL β LR
π LRμ΄ λ κ°λ ₯ν¨
π 15νμ΄μ§
Parser Motivation
ν΄μ: νμμ λͺ©μ
Given a grammar G and an input string s, we need an algorithm to:
ν΄μ: λ¬Έλ² Gμ λ¬Έμμ΄ sκ° μ£Όμ΄μ‘μ λ
Decide whether s is in L(G)
ν΄μ: sκ° μΈμ΄μ μνλμ§ νλ¨
If so, generate a parse tree
ν΄μ: λ§μΌλ©΄ νμ€ νΈλ¦¬ μμ±
We will see two algorithms
ν΄μ: λ κ°μ§ μκ³ λ¦¬μ¦μ λ³Έλ€
Each with different tradeoffs in time and space
ν΄μ: μκ°/κ³΅κ° νΈλ μ΄λμ€ν μ‘΄μ¬
β ν΅μ¬:
π parserμ μν 2κ°
- νλ³ (membership)
- ꡬ쑰 μμ± (parse tree)
π₯ 11~15νμ΄μ§ ν΅μ¬ μμ½
- ambiguous grammar β λ°λμ μ κ±°
- precedence + associativityλ‘ ν΄κ²°
- dangling else = λνμ λ¬Έμ
- recognizer vs parser μ°¨μ΄
- LL vs LR ꡬ쑰
- parser λͺ©μ = νλ³ + νΈλ¦¬ μμ±
π 16νμ΄μ§
CYK Algorithm
ν΄μ: CYK μκ³ λ¦¬μ¦
Cocke-Younger-Kasami (CYK) Algorithm
ν΄μ: Cocke, Younger, Kasamiκ° λ§λ μκ³ λ¦¬μ¦
Parsing algorithm for context-free grammars
ν΄μ: CFGλ₯Ό μν νμ± μκ³ λ¦¬μ¦μ΄λ€
Invented by John Cocke, Daniel Younger, and Tadao Kasami
ν΄μ: μΈ μ¬λμ΄ λ§λ μκ³ λ¦¬μ¦
Basic idea given string s with n tokens:
ν΄μ: κΈΈμ΄ nμΈ λ¬Έμμ΄ sκ° μ£Όμ΄μ‘μ λ κΈ°λ³Έ μμ΄λμ΄
- Find production rules that cover 1 token in s
ν΄μ: κΈΈμ΄ 1μ§λ¦¬ ν ν°μ μμ±ν μ μλ κ·μΉ μ°ΎκΈ°
- Use 1. to find rules that cover 2 tokens
ν΄μ: κΈΈμ΄ 2μ§λ¦¬ κ΅¬κ° μμ±
- Use 2. to find rules that cover 3 tokens
ν΄μ: κΈΈμ΄ 3μ§λ¦¬ κ΅¬κ° μμ±
β¦
ν΄μ: κ³μ λ°λ³΅
N. Use N-1. to find rules that cover n tokens
ν΄μ: μ΅μ’ μ μΌλ‘ μ 체 λ¬Έμμ΄ μμ±
If succeeds then s is in L(G), else it is not
ν΄μ: μ±κ³΅νλ©΄ μΈμ΄μ μν¨
β ν΅μ¬ μ€λͺ :
π CYK ν΅μ¬ ꡬ쑰
κΈΈμ΄ 1 β κΈΈμ΄ 2 β κΈΈμ΄ 3 β β¦ β μ 체
π bottom-up λ°©μ
π 17νμ΄μ§
A graphical way to visualize CYK
ν΄μ: CYKλ₯Ό κ·Έλνλ‘ νν
Initial graph: the input (terminals)
ν΄μ: μ΄κΈ° κ·Έλν = μ λ ₯
Repeat: add non-terminal edges until no more can be added
ν΄μ: λΉλ¨λ§ λ Έλλ₯Ό κ³μ μΆκ°
An edge is added when adjacent edges form RHS of a grammar production
ν΄μ: μΈμ ν λ Έλκ° RHS κ·μΉμ λ§μ‘±νλ©΄ μΆκ°
πΌ κ·Έλ¦Ό μ€λͺ (ν΅μ¬)
μ λ ₯:
a + a * a
κ·Έλν ꡬ쑰:
- μλ: terminals (
a,+,a,*,a) - μλ‘ μ¬λΌκ°λ©΄μ E μΆκ°
π μλ―Έ:
E β a
E β E + E
E β E * E
β ν΅μ¬: π κ·Έλν = parsing κ³Όμ μκ°ν
π 18νμ΄μ§
CYK: the algorithm
ν΄μ: CYK μκ³ λ¦¬μ¦
CYK is easiest for grammars in Chomsky Normal Form
ν΄μ: CYKλ CNFμμ κ°μ₯ μ½λ€
O(NΒ³) time, O(NΒ²) space
ν΄μ: μκ° O(NΒ³), κ³΅κ° O(NΒ²)
Chomsky Normal Form
ν΄μ: μ΄μ€ν€ μ κ·ν
A β BC
ν΄μ: λΉλ¨λ§ λ κ°
A β d
ν΄μ: ν°λ―Έλ νλ
S β Ξ΅
ν΄μ: λΉ λ¬Έμμ΄
β μ€λͺ :
π CNF 쑰건:
A β BCA β a
π μ΄μ : CYKλ 2κ°μ© μͺΌκ°λ©΄μ κ³μ°νκΈ° λλ¬Έ
π 19νμ΄μ§
CYK Implementation
ν΄μ: CYK ꡬν
CYK uses a table e(N,N)
ν΄μ: NxN ν μ΄λΈ μ¬μ©
set e(i,j) to true if substring input[i:j] can be derived
ν΄μ: input[i:j]λ₯Ό μμ±ν μ μμΌλ©΄ true
input[i:j] is input from index i to j-1
ν΄μ: iλΆν° j-1κΉμ§
For the grammar
ν΄μ: λ¬Έλ²
E β a | E + E | E * E
κ·μΉ
e(i,i+1)ifinput[i] == 'a'
ν΄μ: κΈΈμ΄ 1μ΄λ©΄ aμΈμ§ νμΈ
e(i,j)ife(i,k),input[k]=='+',e(k+1,j)
ν΄μ:
+κΈ°μ€μΌλ‘ λΆν
e(i,j)ife(i,k),input[k]=='*',e(k+1,j)
ν΄μ:
*κΈ°μ€ λΆν
β ν΅μ¬:
π λΆν μ 볡: i ~ j ꡬκ°μ kμμ μͺΌκ°¬
π 20νμ΄μ§
CYK Implementation
ν΄μ: CYK ꡬν
This is a form of dynamic programming
ν΄μ: DP μκ³ λ¦¬μ¦μ΄λ€
We use a table to store temporary results
ν΄μ: μ€κ° κ²°κ³Ό μ μ₯
we use the temp results to compute new ones
ν΄μ: κΈ°μ‘΄ κ²°κ³Όλ‘ μλ‘μ΄ κ³μ°
Alternative: recursive checking
ν΄μ: λμ: μ¬κ·
But we will end up re-doing a lot of computation
ν΄μ: μ€λ³΅ κ³μ° λ°μ
β ν΅μ¬ μ€λͺ :
π CYK = DP μ΄μ : κ°μ ꡬκ°μ μ¬λ¬ λ² κ³μ°νμ§ μκΈ° μν΄
π₯ 16~20νμ΄μ§ ν΅μ¬ μμ½
1. CYK λ³Έμ§
Bottom-up parsing + DP
2. λμ λ°©μ
κΈΈμ΄ 1 β 2 β 3 β β¦ β μ 체
3. ν΅μ¬ μ°μ°
e(i,j)= κ΅¬κ° i~jκ° μμ± κ°λ₯νκ°
4. λΆν
i ~ j β i~k + k+1~j
5. 쑰건
CNF (A β BC)
6. 볡μ‘λ
μκ°: O(nΒ³) / 곡κ°: O(nΒ²)
COMP321 Compiler β Syntax Analysis 2 (21~42νμ΄μ§)
Kyungpook National University | Hwisoo So | Spring 2026
π 21νμ΄μ§
Illustration
ν΄μ: μμ
Initial graph: the input (terminals)
ν΄μ: μ΄κΈ° κ·Έλν = μ λ ₯
Repeat: add non-terminal edges until no more can be added
ν΄μ: λ μ΄μ μΆκ°ν μ μμ λκΉμ§ λΉλ¨λ§μ μΆκ°
An edge is added when adjacent edges form RHS of a grammar production
ν΄μ: μΈμ ν μμκ° RHSλ₯Ό λ§μ‘±νλ©΄ μΆκ°
πΌ κ·Έλ¦Ό μ€λͺ (ν΅μ¬)
μ λ ₯:
a + a * a
ꡬ쑰:
- μλ:
a + a * a(ν ν°) - μ: μ μ E μμ±λ¨
νμ:
e(0,1), e(2,3), e(4,5)
e(0,3), e(2,5)
e(0,5)
π μλ―Έ: μμ κ΅¬κ° β ν° κ΅¬κ°μΌλ‘ νμ₯
β ν΅μ¬: π CYKλ βκ·Έλν μλ‘ μμκ°λ κ³Όμ β
π 22νμ΄μ§
CYK is dynamic programming
ν΄μ: CYKλ λμ κ³νλ²μ΄λ€
Input: a + a * a
ν΄μ: μ λ ₯
Letβs compute which facts we know hold
ν΄μ: μ°ΈμΈ κ²λ€μ κ³μ°ν΄λ³΄μ
weβll deduce facts gradually until no more can be deduced
ν΄μ: λ μ΄μ μΆλ‘ ν μ μμ λκΉμ§ λ°λ³΅
Step 1
base case (length 1)
e(0,1) = e(2,3) = e(4,5) = true
ν΄μ: κ° aλ Eλ‘ μμ± κ°λ₯
Step 2
length 3
e(0,3) = true (+)
e(2,5) = true (*)
ν΄μ: a+a / a*a κ°λ₯
Step 3
length 5
e(0,5) = true
ν΄μ: μ 체 λ¬Έμμ΄ κ°λ₯
πΌ κ·Έλ¦Ό μ€λͺ
μΈλ±μ€:
0 1 2 3 4
a + a * a
π ꡬκ°:
(0,1): a(0,3): a+a(0,5): μ 체
β ν΅μ¬: π DP ν΅μ¬ νλ¦
length 1 β length 3 β length 5
π 23νμ΄μ§
Visualize this parser in tabular form
ν΄μ: ν μ΄λΈλ‘ μκ°ν
Step 1 / Step 2 / Step 3
ν΄μ: λ¨κ³λ³ κ²°κ³Ό
πΌ κ·Έλ¦Ό μ€λͺ (μν ν΅μ¬)
ν ꡬ쑰:
- ν: i
- μ΄: j
e(i,j)μ±μμ§λ μμ
Step 1 (1μΉΈ):
(0,1), (2,3), (4,5)
Step 2 (3μΉΈ):
(0,3), (2,5)
Step 3 (5μΉΈ):
(0,5)
πΌ μ€λ₯Έμͺ½ κ·Έλ¦Ό μλ―Έ
μ«μ:
1β step12β step23β step3
π μ μ νμ₯λ¨
πΌ μλ κ·Έλν
E β a
E β E + E
E β E * E
π μ€μ μ μ© κ·μΉ
β ν΅μ¬: π CYKλ βμΌκ° ν μ΄λΈ μ±μ°κΈ°β
μλ β μ
μ§§μ β κΈ΄
π 24νμ΄μ§
CYK Parser
ν΄μ: CYK νμ
Builds the parse tree bottom-up
ν΄μ: μλμμ μλ‘ νΈλ¦¬ μμ±
given A β B C
ν΄μ: κ·μΉ
when parser finds adjacent B C
ν΄μ: Bμ Cκ° λΆμ΄μμΌλ©΄
it reduces B C to A
ν΄μ: Aλ‘ μΆμ
adding node A to parse tree
ν΄μ: νΈλ¦¬μ μΆκ°
Next lecture: top-down parsers
ν΄μ: λ€μ: top-down
β ν΅μ¬: π CYK = bottom-up reduction
B C β A
π 25νμ΄μ§
CYK Pseudocode
ν΄μ: CYK μμ¬μ½λ
μ΄κΈ° μ€μ
s = input string
P(N,N,r) = false
ν΄μ: 3μ°¨μ ν μ΄λΈ
P(i,j,Rk) = Rk is used to parse input from i to j
ν΄μ: Rkκ° i~j μμ± κ°λ₯
Step 1
for each i
for each Rk β ai
P[i,i+1,k] = true
ν΄μ: κΈΈμ΄ 1 μ²λ¦¬
Step 2
for i = 2 to n
ν΄μ: κΈΈμ΄ μ¦κ°
for each j
ν΄μ: μμ μμΉ
for each k
ν΄μ: λΆν μμΉ
ν΅μ¬ 쑰건
if P[j,j+k,B] and P[j+k,j+i,C]
β P[j,j+i,A] = true
ν΄μ: B + C β A
λ§μ§λ§
if P[0,n,R1] true
β accept
ν΄μ: μ 체 μμ± κ°λ₯νλ©΄ μ±κ³΅
β ν΅μ¬ ꡬ쑰 (μνμ©):
for length
for start
for split
for rule
π₯ 21~25νμ΄μ§ ν΅μ¬ μμ½
1. CYK μ 체 νλ¦
1 β 3 β 5 β β¦ β n
2. ν μ΄λΈ μλ―Έ
e(i,j)= i~j μμ± κ°λ₯?
3. ν΅μ¬ μ°μ°
i~j β i~k + k+1~j
4. μκ³ λ¦¬μ¦ κ΅¬μ‘°
length
start
split
rule
5. λ³Έμ§
DP + Bottom-up parsing
π 26νμ΄μ§
Illustration
ν΄μ: μμ
μ½λ ꡬ쑰
for each i = 2 to n
for each j = 0 to n-i
for each k = 1 to i-1
for each production RA β RB RC
ν΄μ:
i: λΆλΆ λ¬Έμμ΄ κΈΈμ΄j: μμ μμΉk: λΆν μμΉ
if P[j,j+k,B] and P[j+k,j+i,C]
then P[j,j+i,A] = true
ν΄μ: μΌμͺ½κ³Ό μ€λ₯Έμͺ½μ΄ κ°κ° μμ± κ°λ₯νλ©΄ μ 체λ μμ± κ°λ₯
πΌ κ·Έλ¦Ό μ€λͺ
μ λ ₯:
a a a
λ¬Έλ²:
E β a
E β E E
μν
P(0,1,E)
P(1,2,E)
P(2,3,E)
π κΈΈμ΄ 1μ λͺ¨λ true
β ν΅μ¬: π μ§κΈμ length=2 κ³μ° μμ μ§μ
π 27νμ΄μ§
μ§ν μν
i = 2
j = 0
k = 1
ν΄μ:
- κΈΈμ΄ 2
- μμ 0
- split = 1
체ν¬
P(0,1,E) and P(1,2,E)
π λ λ€ true
κ²°κ³Ό
P(0,2,E) = true
πΌ κ·Έλ¦Ό μ€λͺ
a a
β
E E β E
β ν΅μ¬: π κΈΈμ΄ 2μ§λ¦¬ μμ± μ±κ³΅
π 28νμ΄μ§
μ§ν μν
i = 2
j = 1
k = 1
ν΄μ: λ λ²μ§Έ ꡬκ°
체ν¬
P(1,2,E) and P(2,3,E)
π true
κ²°κ³Ό
P(1,3,E) = true
πΌ κ·Έλ¦Ό μ€λͺ
a a
β
E E β E
β ν΅μ¬: π λ λ²μ§Έ κΈΈμ΄ 2 ꡬκ°λ μ±κ³΅
π 29νμ΄μ§
μν μ 리
νμ¬ true:
P(0,1,E)
P(1,2,E)
P(2,3,E)
P(0,2,E)
P(1,3,E)
π μλ―Έ: κΈΈμ΄ 1 + κΈΈμ΄ 2 μ λΆ μλ£
λ€μ λ¨κ³
i = 3
π μ 체 κΈΈμ΄ κ²μ¬ μμ
π 30νμ΄μ§
μ§ν μν
i = 3
j = 0
k = 1
첫 λ²μ§Έ λΆν
P(0,1,E) and P(1,3,E)
π true
κ²°κ³Ό
P(0,3,E) = true
λ λ²μ§Έ λΆν
k = 2
체ν¬
P(0,2,E) and P(2,3,E)
π true
κ²°κ³Ό
P(0,3,E) = true (μ΄λ―Έ true)
πΌ κ·Έλ¦Ό μ€λͺ
a a a
β
(E E) E
β
E
λλ
a (a a)
β
E (E E)
β
E
β ν΅μ¬:
π μ¬λ¬ λ°©μμΌλ‘ μμ± κ°λ₯
π κ·Έλλ κ²°κ³Όλ λμΌ
π₯ 26~30νμ΄μ§ ν΅μ¬ μμ½
1. 루ν ꡬ쑰 (μκΈ° νμ)
for length i
for start j
for split k
for rule
2. ν΅μ¬ 쑰건
P[j,j+k,B] && P[j+k,j+i,C]
β P[j,j+i,A]
3. μ§ν μμ
κΈΈμ΄1 β κΈΈμ΄2 β κΈΈμ΄3
4. λΆν ν΅μ¬
i~j = i~k + k+1~j
5. μν ν¬μΈνΈ
π i, j, k κ° μ§μ μΆμ
π ν μ±μ°κΈ°
π μ΅μ’ P(0,n) νμΈ
π₯ μ§μ§ μ€μν ν μ€
CYKλ βκ΅¬κ° DP + λΆν μ 볡 + bottom-up νμ±βμ΄λ€
π 31νμ΄μ§
Illustration
ν΄μ: μμ
μ½λ (κ³μ λ°λ³΅)
for each i = 2 to n
for each j = 0 to n-i
for each k = 1 to i-1
for each production RA β RB RC
ν΄μ: μ¬μ ν CYK 루ν
μν
i = 3
j = 0
k = 1
true κ°
P(0,1,E)
P(1,2,E)
P(2,3,E)
P(0,2,E)
P(1,3,E)
P(0,3,E)
πΌ κ·Έλ¦Ό μ€λͺ
μ λ ₯:
a a a
ꡬ쑰:
- μλ:
a a a - μ: Eλ€μ΄ κ³μ μμ
π μ΅μ’ : μ 체 λ¬Έμμ΄λ Eλ‘ μμ± κ°λ₯
β ν΅μ¬: π CYK ν μ΄λΈ μμ± μν
π 32νμ΄μ§
Illustration
ν΄μ: μμ
μν
i = 3
j = 0
k = 2
체ν¬
P(0,2,E) and P(2,3,E)
π true
κ²°κ³Ό
P(0,3,E) = true
(μ΄λ―Έ trueμ§λ§ λ νμΈλ¨)
πΌ κ·Έλ¦Ό μ€λͺ
λ κ°μ§ κ²½μ°:
(a a) a
a (a a)
λ λ€ κ°λ₯
β ν΅μ¬: π νλμ λ¬Έμμ΄μ΄ μ¬λ¬ λ°©μμΌλ‘ μμ±λ¨
π 33νμ΄μ§
CYK Pseudocode
ν΄μ: CYK μμ¬μ½λ
if any of P[0,n-1,x] is true
β s is in L(G)
ν΄μ: μ 체 ꡬκ°μ΄ μμ± κ°λ₯νλ©΄ μ±κ³΅
O(NΒ²) space complexity
ν΄μ: κ³΅κ° λ³΅μ‘λ
O(NΒ³Β·r) time complexity
ν΄μ: μκ° λ³΅μ‘λ
β μ€λͺ : π r = κ·μΉ κ°μ
β ν΅μ¬:
| Β | 볡μ‘λ |
|---|---|
| μκ° | O(nΒ³) |
| κ³΅κ° | O(nΒ²) |
π 34νμ΄μ§
Given a CYK graph, we can find a parse tree
ν΄μ: CYK κ·Έλνμμ νμ€ νΈλ¦¬λ₯Ό λ§λ€ μ μλ€
Parse tree:
ν΄μ: νμ€ νΈλ¦¬
πΌ κ·Έλ¦Ό μ€λͺ (μ΄ νμ΄μ§ ν΅μ¬)
μ λ ₯:
a + a * a
κ·Έλν ꡬ쑰
μμͺ½:
E9, E10
E6, E7, E8
E11
π CYK κ²°κ³Ό (μ¬λ¬ ν보 λ Έλ)
νΈλ¦¬ (μΌμͺ½ κ·Έλ¦Ό)
E
/ \
E E
/ \ |
a a a
+
*
π μ€μ μ νλ parse tree
Q: Is this the tree we want?
ν΄μ: μ΄κ² μ°λ¦¬κ° μνλ νΈλ¦¬μΈκ°?
Why is this not part of the tree?
ν΄μ: μ μΌλΆ λ Έλλ νΈλ¦¬μ ν¬ν¨λμ§ μλκ°?
β ν΅μ¬ μ€λͺ (μ§μ§ μ€μ):
π CYKλ βκ°λ₯ν λͺ¨λ μ‘°ν©βμ λ§λ λ€
νμ§λ§
κ·Έμ€ νλλ§ μ νν΄μ parse tree λ§λ λ€
β μ μΌλΆ λ Έλλ μ μ°λ?
π μ΄μ : μ 체λ₯Ό ꡬμ±νλ κ²½λ‘λ§ μ νν΄μΌ νκΈ° λλ¬Έ
β ν΅μ¬ μ΄ν΄:
| Β | μλ―Έ |
|---|---|
| CYK κ²°κ³Ό | μ¬λ¬ κ°λ₯μ± (graph) |
| parse tree | νλμ μ νλ κ²½λ‘ |
π₯ 31~34νμ΄μ§ ν΅μ¬ μμ½
1. CYK κ²°κ³Ό vs parse tree
| Β | μλ―Έ |
|---|---|
| CYK | κ°λ₯ν λͺ¨λ ꡬ쑰 |
| Parse Tree | κ·Έ μ€ νλ μ ν |
2. μ€μν κ°λ
π ambiguityκ° μμΌλ©΄ β parse tree μ¬λ¬ κ° κ°λ₯
3. ν΅μ¬ μ§λ¬Έ
βμ΄λ€ κ²½λ‘λ₯Ό μ νν κ²μΈκ°?β
4. μκ³ λ¦¬μ¦ κ΄μ
- CYK = recognition + ν보 μμ±
- Tree μμ± = μ ν κ³Όμ μΆκ° νμ
π₯ μ΄ ννΈ ν μ€ μ 리
CYKλ βμ λ΅ μ¬λΆβκΉμ§λ 보μ₯νμ§λ§ βνΈλ¦¬ μ νβμ λ³λ λ¬Έμ λ€
π 35νμ΄μ§
Parsing
ν΄μ: νμ±
πΌ κ·Έλ¦Ό (μ 체 ꡬ쑰)
κ΅¬μ± μμ:
program text β parser β parse tree β AST β interpreter
μ€κ°μ:
- grammar
- syntax-directed translation
- AST-based interpreter
β μ€λͺ : μ 체 μ»΄νμΌλ¬ νλ¦:
ν μ€νΈ β νμ± β ꡬ쑰 β AST β μ€ν/ν΄μ
π parserμ μμΉ: μ€κ° ν΅μ¬ λ¨κ³
β ν΅μ¬:
π parserλ λ¨μν κ²μ¬νλ κ² μλλΌ
π βꡬ쑰(AST)βκΉμ§ λ§λ λ€
π 36νμ΄μ§
Top-Down Parsers and LL Grammars
ν΄μ: νλ€μ΄ νμμ LL λ¬Έλ²
Top-down parser is a parser for LL class of grammars
ν΄μ: νλ€μ΄ νμλ LL λ¬Έλ²μ©μ΄λ€
LL = Left-to-right scanning of input, Left-most derivation
ν΄μ: LL = μ’βμ° μ½κ³ , μ’μΈ‘ μ λ
Also called a predictive parser
ν΄μ: μμΈ‘ νμλΌκ³ λ νλ€
LL class is a strict subset of LR
ν΄μ: LL β LR
LL grammars cannot contain left-recursive productions
ν΄μ: LLμ left recursion λͺ» μ
X ::= X Y β
π μΌμͺ½ μ¬κ· β κΈμ§
LL(k) where k is lookahead depth
ν΄μ: kλ lookahead κ°μ
if k=1, common prefix cannot be handled
ν΄μ: LL(1)μ prefix κ²ΉμΉλ©΄ λͺ» μ²λ¦¬
μμ:
X ::= a b | a c
π λ λ€ a μμ β λ¬Έμ
A top-down parser constructs a parse tree from the root down
ν΄μ: 루νΈλΆν° νΈλ¦¬ μμ±
Not too difficult to implement recursive descent
ν΄μ: ꡬν μ¬μ
β ν΅μ¬:
| Β | μλ―Έ |
|---|---|
| Top-down | μμμ μμ |
| LL | μμΈ‘ κΈ°λ° |
π 37νμ΄μ§
Top-down Parsing Example: Micro English
ν΄μ: μμ: Micro English
λ¬Έλ²:
Sentence ::= Subject Verb Object .
Subject ::= I | a Noun | the Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat
Verb ::= like | is | see | sees
ν΄μ: κ°λ¨ν μμ΄ λ¬Έλ²
λ¬Έμ₯ μμ:
The cat sees the rat.
I like a cat
β μ€λͺ : π μ΄κ±΄ μ€μ βμΈμ΄ νμ± μμβ
β ν΅μ¬: π parserλ μ΄λ° λ¬Έμ₯μ κ΅¬μ‘°λ‘ λ°κΏ
π 38νμ΄μ§
Top-down LL Parsing
ν΄μ: νλ€μ΄ LL νμ±
πΌ κ·Έλ¦Ό μ€λͺ (ν΅μ¬)
λ¬Έμ₯:
The cat sees a rat .
νΈλ¦¬ ꡬ쑰:
Sentence
βββ Subject
βββ Verb
βββ Object
βββ .
μΈλΆ:
Subject β The cat
Verb β sees
Object β a rat
β ν΅μ¬: π μμμ μλλ‘ νΈλ¦¬ μμ±
Sentence β Subject β Noun ...
π 39νμ΄μ§
λ¬Έλ² λ°λ³΅
π λμΌ λ¬Έλ² λ€μ 보μ¬μ€
β μ€λͺ :
π parser ꡬν μ€λΉ λ¨κ³
π λ¬Έλ²μ κ³μ νμΈνλ μ΄μ : μ½λλ‘ λ°κΎΈκΈ° μν΄
π 40νμ΄μ§
λ¬Έλ² + νΈλ¦¬
β μ€λͺ :
π μ€μ parsing μ§ν μ€ μν
π parserλ: νμ¬ μ λ ₯ 보면μ κ·μΉ μ ν
π 41νμ΄μ§
λ¬Έλ² + νΈλ¦¬ (κ³μ)
β μ€λͺ : π λ¨κ³λ³ νμ₯
Sentence
β Subject
β the Noun
β the cat
β ν΅μ¬: π LL parserλ βν λ¨κ³μ© λ΄λ €κ°β
π 42νμ΄μ§
Outlook
ν΄μ: μ 리
Parsing β
ν΄μ: νμ± μλ£
Top-down parsing β
ν΄μ: νλ€μ΄ μλ£
Recursive descent parser construction
ν΄μ: μ¬κ· νκ° νμ ꡬν
AST Construction
ν΄μ: AST μμ±
Chomskyβs Hierarchy
ν΄μ: μ΄μ€ν€ κ³μΈ΅
β μ€λͺ : π μ§κΈκΉμ§ ν κ² μ 리
π₯ 35~42νμ΄μ§ ν΅μ¬ μμ½
1. Parser μ 체 νλ¦
input β parser β parse tree β AST
2. Top-down parsing
루νΈλΆν° μμν΄μ λ΄λ €κ°
3. LL νΉμ§
Left-to-right + Leftmost derivation
4. μ μ½
Left recursion κΈμ§
5. ν΅μ¬ μμ΄λμ΄
νμ¬ μ λ ₯ λ³΄κ³ βμ΄λ€ κ·μΉ μΈμ§ μμΈ‘β
6. Recursive Descent
π ν¨μ κΈ°λ° κ΅¬ν
parseExpr() {
if (...) parseTerm();
}
π₯ μ 체 κ°μ ν μ€ μ 리
Scanner β Parser β AST β μ€ν
Leave a comment