Syntax Analysis 2

COMP321 Compiler β€” Syntax Analysis 2

Kyungpook National University | Hwisoo So | Spring 2026


πŸ“‹ 1νŽ˜μ΄μ§€ β€” 제λͺ©

Syntax Analysis – 2

해석: ꡬ문 뢄석 – 2

βœ” μ„€λͺ… 이 κ°•μ˜λŠ” Syntax Analysis(ꡬ문 뢄석)의 두 번째 νŒŒνŠΈλ‹€. 즉,

  • Lexical Analysis(토큰화) 이후 단계
  • μ‹€μ œ λ¬Έμž₯ ꡬ쑰λ₯Ό λΆ„μ„ν•˜λŠ” 단계

λ₯Ό 본격적으둜 닀루기 μ‹œμž‘ν•˜λŠ” κ°•μ˜λ‹€.


Hwisoo So
Kyungpook National University

해석: μ†Œν¬μˆ˜ / κ²½λΆλŒ€ν•™κ΅

βœ” μ„€λͺ…: κ°•μ˜ λ‹΄λ‹Ή ꡐ수 정보.


COMP321 Compiler
Spring 2026

해석: COMP321 컴파일러 κ³Όλͺ© / 2026λ…„ λ΄„ ν•™κΈ°

βœ” μ„€λͺ…: κ³Όλͺ© 정보 + ν•™κΈ°.

πŸ–Ό κ·Έλ¦Ό μ„€λͺ…

(νŽ˜μ΄μ§€ 전체)

  • 쀑앙에 β€œSyntax Analysis – 2” 크게 있음
  • μ•„λž˜μ— ꡐ수 이름 + 학ꡐ
  • ν•˜λ‹¨μ— κ²½λΆλŒ€ν•™κ΅ 둜고

πŸ‘‰ 의미: 이건 κ·Έλƒ₯ κ°•μ˜ 타이틀 μŠ¬λΌμ΄λ“œλΌμ„œ κ°œλ… λ‚΄μš© μ—†μŒ.


πŸ“‹ 2νŽ˜μ΄μ§€

Syntax Analysis COMP321@KNU

해석: ꡬ문 뢄석 (κ²½λΆλŒ€ 컴파일러 κ³Όλͺ©)

Outlook

해석: κ°œμš” / μ•žμœΌλ‘œ 배울 λ‚΄μš©


Syntax and Semantics of Programming Languages βœ“

해석: ν”„λ‘œκ·Έλž˜λ° μ–Έμ–΄μ˜ ꡬ문과 의미 βœ“

βœ” μ„€λͺ…: 이미 이전 κ°•μ˜μ—μ„œ

  • syntax (ν˜•μ‹)
  • semantics (의미)

λ₯Ό λ°°μ› λ‹€λŠ” 뜻.


Specifying the Syntax of a programming language: βœ“

해석: ν”„λ‘œκ·Έλž˜λ° μ–Έμ–΄μ˜ ꡬ문을 μ •μ˜ν•˜λŠ” 방법 βœ“


– CFGs, BNF and EBNF βœ“

해석: CFG, BNF, EBNF βœ“

βœ” μ„€λͺ…: 문법 μ •μ˜ 방식듀 이미 ν•™μŠ΅ μ™„λ£Œ.


– Grammar transformations βœ“

해석: 문법 λ³€ν™˜ βœ“

βœ” μ„€λͺ…: left recursion 제거, factoring 같은 것.


Parsing

해석: νŒŒμ‹±

βœ” μ„€λͺ…: πŸ‘‰ μ§€κΈˆλΆ€ν„° 핡심 β€” scanner λ‹€μŒ 단계 = parser


– Top-down parsing (LL)

해석: νƒ‘λ‹€μš΄ νŒŒμ‹± (LL)


– Recursive descent (LL) parser construction

해석: μž¬κ·€ ν•˜κ°• νŒŒμ„œ κ΅¬ν˜„


– LL Grammars

해석: LL 문법


AST Construction

해석: AST 생성


– Parse trees vs. ASTs

해석: 파슀 트리 vs AST


Chomsky’s Hierarchy

해석: μ΄˜μŠ€ν‚€ 계측 ꡬ쑰

βœ” μ„€λͺ…: 이 μŠ¬λΌμ΄λ“œλŠ” μ•žμœΌλ‘œ 배울 것 λ‘œλ“œλ§΅μ΄λ‹€.

핡심 흐름:

문법 μ •μ˜ β†’ νŒŒμ‹± β†’ AST 생성 β†’ 이둠

특히 μ€‘μš”ν•œ 건:

πŸ‘‰ μ΄μ œλΆ€ν„° β€œParsing” 파트 μ‹œμž‘


πŸ“‹ 3νŽ˜μ΄μ§€

Parsing

해석: νŒŒμ‹±


We will now look at parsing.

해석: 이제 νŒŒμ‹±μ„ μ‚΄νŽ΄λ³Ό 것이닀.


Topics:

해석: λ‹€λ£° λ‚΄μš©:


Derivations and parse trees

해석: μœ λ„ κ³Όμ •κ³Ό 파슀 트리


Ambiguous grammars

해석: λͺ¨ν˜Έν•œ 문법


– Operator precedence

해석: μ—°μ‚°μž μš°μ„ μˆœμœ„


– Operator associativity

해석: μ—°μ‚°μž κ²°ν•© λ°©ν–₯


Recursive descent parsing

해석: μž¬κ·€ ν•˜κ°• νŒŒμ‹±


– What it is.

해석: 그게 무엇인지


– How to implement it given an EBNF specification.

해석: EBNF λͺ…μ„Έλ₯Ό 기반으둜 μ–΄λ–»κ²Œ κ΅¬ν˜„ν•˜λŠ”μ§€

βœ” μ„€λͺ…: 이 νŽ˜μ΄μ§€ 핡심:

πŸ‘‰ parser κ³΅λΆ€μ—μ„œ μ€‘μš”ν•œ 3개

  • derivation (λ¬Έμž₯ 생성 κ³Όμ •)
  • parse tree (ꡬ쑰 ν‘œν˜„)
  • ambiguity (λͺ¨ν˜Έμ„±)

그리고

πŸ‘‰ μ‹€μ œ κ΅¬ν˜„: recursive descent


πŸ“‹ 4νŽ˜μ΄μ§€

Grammars: Definition

해석: 문법: μ •μ˜


Typically, languages are specified as grammars

해석: 일반적으둜 μ–Έμ–΄λŠ” λ¬Έλ²•μœΌλ‘œ μ •μ˜λœλ‹€

βœ” μ„€λͺ…: ν”„λ‘œκ·Έλž˜λ° μ–Έμ–΄ = CFG둜 μ •μ˜λ¨


Formally, a grammar is a tuple (N, Ξ£, S, P) where

해석: ν˜•μ‹μ μœΌλ‘œ 문법은 (N, Ξ£, S, P)둜 κ΅¬μ„±λœλ‹€


N is a finite set of non-terminal symbols

해석: N은 μœ ν•œν•œ 비단말 집합이닀

βœ” μ„€λͺ…: <expr>, <stmt> 같은 것


Ξ£ is a set of terminal symbols

해석: Ξ£λŠ” 단말 기호 집합이닀

βœ” μ„€λͺ…: μ‹€μ œ 토큰 (if, +, id λ“±)


S ∈ N is the start symbol

해석: SλŠ” μ‹œμž‘ κΈ°ν˜Έμ΄λ‹€

βœ” μ„€λͺ…: λ¬Έλ²•μ˜ 좜발점


P βŠ† (N βˆͺ Ξ£)βˆ— Γ— (N βˆͺ Ξ£)βˆ— are the production rules

해석: PλŠ” 생성 κ·œμΉ™ 집합이닀

βœ” μ„€λͺ…: A β†’ B ν˜•νƒœ


N = {<prog>, <expr>, <var>}

πŸ‘‰ 비단말 μ§‘ν•©

Ξ£ = {True, False, ;, &, |, a, b, …}

πŸ‘‰ 단말 μ§‘ν•©

S = <prog>

πŸ‘‰ μ‹œμž‘ 심볼


Production rules

<prog> β†’
<prog> β†’ <expr>; <prog>

πŸ‘‰ ν”„λ‘œκ·Έλž¨μ€ μ—¬λŸ¬ expr둜 ꡬ성됨

<expr> β†’ True / False

πŸ‘‰ boolean κ°’

<expr> β†’ <var>

πŸ‘‰ λ³€μˆ˜

<expr> β†’ !<expr>

πŸ‘‰ NOT μ—°μ‚°

<expr> β†’ <expr> & <expr>
<expr> β†’ <expr> | <expr>

πŸ‘‰ AND, OR

<var> β†’ Var(_)

πŸ‘‰ λ³€μˆ˜ 생성

βœ” 핡심 μ„€λͺ…:

πŸ‘‰ β€œλ¬Έλ²• = μ–Έμ–΄λ₯Ό μ •μ˜ν•˜λŠ” μˆ˜ν•™μ  ꡬ쑰”


πŸ“‹ 5νŽ˜μ΄μ§€

Grammars: Language

해석: 문법이 μ •μ˜ν•˜λŠ” μ–Έμ–΄


The language of a grammar is the set of words that can be derived from the start symbol by applying production rules

해석: λ¬Έλ²•μ˜ μ–Έμ–΄λž€ μ‹œμž‘ κΈ°ν˜Έλ‘œλΆ€ν„° 생성 κ·œμΉ™μ„ μ μš©ν•˜μ—¬ λ§Œλ“€μ–΄μ§ˆ 수 μžˆλŠ” λͺ¨λ“  λ¬Έμžμ—΄μ˜ 집합이닀

βœ” μ„€λͺ…: πŸ‘‰ 맀우 μ€‘μš” β€” Grammar β†’ Language 생성


Example derivation

<prog>
β‡’ <expr>; <prog>
β‡’ <expr>;
β‡’ <expr>&<expr>;
β‡’ <var>&<expr>;
β‡’ foo&<expr>;
β‡’ foo&<var>;
β‡’ foo&bar;

해석 + μ„€λͺ…: 이건 λ¬Έμžμ—΄μ„ λ§Œλ“œλŠ” 과정이닀.

단계별:

  • <prog>
  • β†’ <expr>;
  • β†’ <expr>&<expr>;
  • β†’ <var>&<expr>;
  • β†’ foo&<expr>;
  • β†’ foo&<var>;
  • β†’ foo&bar;

πŸ‘‰ μ΅œμ’… κ²°κ³Ό λ¬Έμžμ—΄ 생성됨


πŸ–Ό κ·Έλ¦Ό μ„€λͺ… (μ€‘μš”)

μŠ¬λΌμ΄λ“œ ν•˜λ‹¨ 트리

πŸ‘‰ parse tree ꡬ쑰:

루트: <prog>
쀑간: <expr> & <expr>
리프: foo, bar

πŸ‘‰ 의미: foo & bar; 이 ꡬ쑰λ₯Ό 트리둜 ν‘œν˜„ν•œ 것

βœ” 핡심 정리:

πŸ‘‰ β€œλ¬Έλ²•μ€ λ¬Έμžμ—΄μ„ μƒμ„±ν•˜λŠ” κ·œμΉ™μ΄κ³ , κ·Έ κ²°κ³Ό 집합이 언어닀”


πŸ”₯ 1~5νŽ˜μ΄μ§€ 핡심 μš”μ•½

  • 문법(CFG)은 μ–Έμ–΄λ₯Ό μ •μ˜ν•˜λŠ” μˆ˜ν•™μ  ꡬ쑰
  • (N, Ξ£, S, P)둜 ꡬ성됨
  • production rule을 톡해 λ¬Έμžμ—΄μ„ 생성
  • derivation = 생성 κ³Όμ •
  • parse tree = ꡬ쑰 ν‘œν˜„

πŸ“‹ 6νŽ˜μ΄μ§€

Grammars for Regular Languages

해석: μ •κ·œ μ–Έμ–΄λ₯Ό μœ„ν•œ 문법


Can we place a restriction on the form of a grammar to ensure that it describes a regular language?

해석: λ¬Έλ²•μ˜ ν˜•νƒœμ— μ œν•œμ„ 두어 그것이 μ •κ·œ μ–Έμ–΄λ₯Ό ν‘œν˜„ν•˜λ„λ‘ λ§Œλ“€ 수 μžˆμ„κΉŒ?

βœ” μ„€λͺ…: πŸ‘‰ 핡심 질문 β€” β€œλͺ¨λ“  CFG 말고 μ •κ·œμ–Έμ–΄λ§Œ ν‘œν˜„ν•˜λŠ” 문법 λ”°λ‘œ λ§Œλ“€ 수 μžˆλƒ?”


Provable fact:

해석: 증λͺ… κ°€λŠ₯ν•œ 사싀:


For any RE r, there is a grammar g such that L(r) = L(g).

해석: μž„μ˜μ˜ μ •κ·œν‘œν˜„μ‹ r에 λŒ€ν•΄, L(r) = L(g)λ₯Ό λ§Œμ‘±ν•˜λŠ” 문법 gκ°€ μ‘΄μž¬ν•œλ‹€.

βœ” μ„€λͺ…: πŸ‘‰ RE ↔ Grammar λŒ€μ‘ κ°€λŠ₯ β€” 즉, μ •κ·œν‘œν˜„μ‹ == μ •κ·œλ¬Έλ²•


The grammars that generate regular sets are called regular grammars

해석: μ •κ·œ 집합을 μƒμ„±ν•˜λŠ” 문법을 μ •κ·œ 문법이라고 ν•œλ‹€


Definition:

해석: μ •μ˜:


In a regular grammar, all productions have one of two forms:

해석: μ •κ·œ λ¬Έλ²•μ—μ„œλŠ” λͺ¨λ“  생성 κ·œμΉ™μ΄ λ‹€μŒ 두 ν˜•νƒœ 쀑 ν•˜λ‚˜μ΄λ‹€


  1. A β†’ aA (or A β†’ Aa)

해석: A β†’ aA (λ˜λŠ” A β†’ Aa)

βœ” μ„€λͺ…: πŸ‘‰ μž¬κ·€ (문자 + μƒνƒœ)


  1. A β†’ a (or A β†’ Ξ΅)

해석: A β†’ a (λ˜λŠ” A β†’ Ξ΅)

βœ” μ„€λͺ…: πŸ‘‰ μ’…λ£Œ 쑰건


where A is any non-terminal and a is any terminal symbol

해석: μ—¬κΈ°μ„œ AλŠ” 비단말, aλŠ” 단말이닀


These are also called type 3 grammars (Chomsky)

해석: 이것을 μ΄˜μŠ€ν‚€ νƒ€μž… 3 문법이라고 ν•œλ‹€

βœ” μ„€λͺ…: μ΄˜μŠ€ν‚€ 계측:

  • Type 3 β†’ Regular
  • Type 2 β†’ CFG

βœ” 핡심: πŸ‘‰ μ •κ·œλ¬Έλ²•μ€ ν˜•νƒœ μ œν•œ 있음


πŸ“‹ 7νŽ˜μ΄μ§€

Grammars for Regular Languages

해석: μ •κ·œ μ–Έμ–΄ 문법


In a regular grammar (Type-3 grammar), all productions have one of two forms

해석: μ •κ·œ λ¬Έλ²•μ—μ„œλŠ” 생성 κ·œμΉ™μ΄ 두 ν˜•νƒœλ§Œ κ°€μ§„λ‹€

A β†’ aA
A β†’ a

Operations:

해석: μ—°μ‚°:


(1) Symbol: a

해석: 기호: a

A β†’ a

해석: AλŠ” a둜 생성


**(2) Alternation: (a b)**

해석: 선택 (a λ˜λŠ” b)

A β†’ a
A β†’ b

해석: A β†’ a λ˜λŠ” A β†’ b

βœ” μ„€λͺ…: πŸ‘‰ OR μ—°μ‚°


(3) Concatenation: aβˆ™b

해석: μ—°κ²°: a λ‹€μŒ b

A β†’ aB
B β†’ b

해석: a λ‹€μŒ b 생성

βœ” μ„€λͺ…: πŸ‘‰ μˆœμ„œ μ—°κ²°


(4) Repetition (Kleene Star): a*

해석: 반볡: a*

A β†’ aA
A β†’ Ξ΅

해석: λ°˜λ³΅ν•˜κ±°λ‚˜ μ’…λ£Œ

βœ” μ„€λͺ…: πŸ‘‰ a* κ΅¬ν˜„

βœ” 핡심: πŸ‘‰ μ •κ·œν‘œν˜„μ‹ 연산을 λ¬Έλ²•μœΌλ‘œ κ΅¬ν˜„ κ°€λŠ₯


πŸ“‹ 8νŽ˜μ΄μ§€

Parse Trees

해석: 파슀 트리


Correspondence between a derivation and a parse tree:

해석: μœ λ„ κ³Όμ •κ³Ό 파슀 νŠΈλ¦¬λŠ” μ„œλ‘œ λŒ€μ‘λœλ‹€

βœ” μ„€λͺ…: πŸ‘‰ derivation ↔ tree


문법:

expr ::= id | int | - expr | ( expr ) | expr op expr
op ::= + | - | * | /

해석: exprλŠ” λ‹€μŒ 쀑 ν•˜λ‚˜ / opλŠ” μ—°μ‚°μž


generate string β€œslope * x + intercept”:

해석: β€œslope * x + intercept” 생성


derivation κ³Όμ •

expr β†’ expr op expr
     β†’ expr op id
     β†’ expr + id
     β†’ expr op expr + id
     β†’ expr op id + id
     β†’ expr * id + id
     β†’ id * id + id

πŸ‘‰ μ΅œμ’…: slope * x + intercept


πŸ–Ό κ·Έλ¦Ό μ„€λͺ… (맀우 μ€‘μš”)

μŠ¬λΌμ΄λ“œ 였λ₯Έμͺ½ 트리:

        expr
       /  |   \
    expr op  expr
     |     |     |
   expr    +   id(intercept)
   / | \
 id  *  id

πŸ‘‰ 의미: (slope * x) + intercept

βœ” 핡심:

  • πŸ‘‰ derivation = linear κ³Όμ •
  • πŸ‘‰ parse tree = ꡬ쑰 ν‘œν˜„

πŸ“‹ 9νŽ˜μ΄μ§€

Parse Trees (cont.)

해석: 파슀 트리 (계속)


Correspondence between a derivation and a parse tree

해석: μœ λ„μ™€ 파슀 트리 λŒ€μ‘


The parse tree’s internal nodes are nonterminals

해석: λ‚΄λΆ€ λ…Έλ“œλŠ” 비단말이닀


The children of a node are the terminals and nonterminals on the right-hand side

해석: μžμ‹μ€ RHS ꡬ성 μš”μ†Œ


The leaves of the parse tree are the terminals (tokens)

해석: λ¦¬ν”„λŠ” 토큰이닀


When read from left to right, the leaves make up the sentence

해석: 리프λ₯Ό μ™Όβ†’μ˜€ 읽으면 λ¬Έμž₯이 λœλ‹€


πŸ–Ό κ·Έλ¦Ό μ„€λͺ…

트리 ꡬ쑰:

루트: expr
λ‚΄λΆ€: expr, op
리프: id(slope), *, id(x), +, id(intercept)

πŸ‘‰ 읽으면: slope * x + intercept

βœ” 핡심: πŸ‘‰ parse treeλŠ” β€œλ¬Έμž₯의 ꡬ쑰”λ₯Ό μ •ν™•νžˆ 보여쀀닀


πŸ“‹ 10νŽ˜μ΄μ§€

Ambiguous Grammars

해석: λͺ¨ν˜Έν•œ 문법


Ambiguity: one sentential form has several distinct parse trees

해석: λͺ¨ν˜Έμ„±: ν•˜λ‚˜μ˜ λ¬Έμž₯이 μ—¬λŸ¬ 파슀 트리λ₯Ό κ°€μ§ˆ 수 μžˆλ‹€


Example: slope * x + intercept

해석: μ˜ˆμ‹œ


(slope * x) + intercept

해석: κ³± λ¨Όμ €


slope * (x + intercept)

해석: λ§μ…ˆ λ¨Όμ €


πŸ–Ό κ·Έλ¦Ό μ„€λͺ… (핡심)

두 개 트리 쑴재:

1️⃣

   +
  / \
 *   intercept
/ \
slope x

πŸ‘‰ (slope * x) + intercept

2️⃣

   *
  / \
slope  +
       / \
      x intercept

πŸ‘‰ slope * (x + intercept)


Problem: Operator precedence!

해석: 문제: μ—°μ‚°μž μš°μ„ μˆœμœ„


In the 2nd tree, β€œ+” has precedence over β€œ*”!

해석: 두 번째 νŠΈλ¦¬μ—μ„œλŠ” +κ°€ *보닀 μš°μ„ λ¨

βœ” 핡심 μ„€λͺ…:

πŸ‘‰ grammarκ°€ λͺ¨ν˜Έν•˜λ©΄: 같은 μ½”λ“œ β†’ λ‹€λ₯Έ 의미

πŸ‘‰ 컴파일러 μž…μž₯μ—μ„œ 치λͺ…적


πŸ”₯ 6~10νŽ˜μ΄μ§€ 핡심 μš”μ•½

  • μ •κ·œλ¬Έλ²• = ν˜•νƒœ μ œν•œλœ CFG
  • RE ↔ Grammar λŒ€μ‘ κ°€λŠ₯
  • derivation ↔ parse tree λŒ€μ‘
  • parse treeλŠ” λ¬Έμž₯ ꡬ쑰λ₯Ό ν‘œν˜„
  • ambiguous grammarλŠ” μ ˆλŒ€ μ‚¬μš©ν•˜λ©΄ μ•ˆλ¨
  • 문제 원인 = operator precedence

πŸ“‹ 11νŽ˜μ΄μ§€

Ambiguous Grammars (cont.)

해석: λͺ¨ν˜Έν•œ 문법 (계속)


When more than one distinct derivation of a sentence exists (which means there exist several distinct parse trees), the grammar is ambiguous.

해석: ν•˜λ‚˜μ˜ λ¬Έμž₯에 λŒ€ν•΄ μ—¬λŸ¬ μ„œλ‘œ λ‹€λ₯Έ μœ λ„ 과정이 μ‘΄μž¬ν•˜λ©΄ (즉 μ—¬λŸ¬ 개의 파슀 νŠΈλ¦¬κ°€ μ‘΄μž¬ν•˜λ©΄) κ·Έ 문법은 λͺ¨ν˜Έν•˜λ‹€

βœ” μ„€λͺ…: πŸ‘‰ 핡심 μ •μ˜

μ—¬λŸ¬ derivation ⇔ μ—¬λŸ¬ parse tree ⇔ ambiguous


A programming language construct should have only one parse tree to avoid misinterpretation by a programmer/compiler.

해석: ν”„λ‘œκ·Έλž˜λ° μ–Έμ–΄ κ΅¬μ‘°λŠ” μ˜€ν•΄λ₯Ό λ°©μ§€ν•˜κΈ° μœ„ν•΄ ν•˜λ‚˜μ˜ 파슀 트리만 κ°€μ Έμ•Ό ν•œλ‹€

βœ” μ„€λͺ…: πŸ‘‰ μ»΄νŒŒμΌλŸ¬λŠ” β€œμ˜λ―Έκ°€ ν•˜λ‚˜λ‘œ κ³ μ •β€λ˜μ–΄μ•Ό 함


For expression grammars, precedence and associativity of operators are used to disambiguate the productions.

해석: ν‘œν˜„μ‹ λ¬Έλ²•μ—μ„œλŠ” μ—°μ‚°μž μš°μ„ μˆœμœ„μ™€ κ²°ν•© λ°©ν–₯을 μ‚¬μš©ν•˜μ—¬ λͺ¨ν˜Έμ„±μ„ μ œκ±°ν•œλ‹€


We rewrite the grammar to make it un-ambiguous:

해석: 문법을 λ‹€μ‹œ μž‘μ„±ν•˜μ—¬ λͺ¨ν˜Έμ„±μ„ μ œκ±°ν•œλ‹€


expr   ::= term | expr add_op term
term   ::= factor | term mult_op factor
factor ::= id | number | - factor | ( expr )
add_op ::= + | -
mult_op ::= * | /

해석 + μ„€λͺ…:

  • expr β†’ πŸ‘‰ λ§μ…ˆ/λΊ„μ…ˆ λ‹΄λ‹Ή
  • term β†’ πŸ‘‰ κ³±μ…ˆ/λ‚˜λˆ—μ…ˆ λ‹΄λ‹Ή
  • factor β†’ πŸ‘‰ κ°€μž₯ κΈ°λ³Έ λ‹¨μœ„

πŸ–Ό κ·Έλ¦Ό μ„€λͺ…

였λ₯Έμͺ½ λ…Έλž€ λ°•μŠ€:

πŸ‘‰ β€œ+λ₯Ό *보닀 μœ„μ— 두지 말고 μ•„λž˜λ‘œ 내렀라”

β†’ 즉: * > +

μ™Όμͺ½ λ…Έλž€ λ°•μŠ€:

πŸ‘‰ left associativity

a - b - c = (a - b) - c

βœ” 핡심: πŸ‘‰ 문법 ꡬ쑰둜 μš°μ„ μˆœμœ„λ₯Ό κ°•μ œν•œλ‹€


πŸ“‹ 12νŽ˜μ΄μ§€

Ambiguous if-then-else

해석: if-then-else의 λͺ¨ν˜Έμ„±


A well-known example of an ambiguous grammar are the following productions for if-then-else

해석: 잘 μ•Œλ €μ§„ λͺ¨ν˜Έν•œ 문법 μ˜ˆμ‹œλŠ” if-then-else이닀


stmt ::= IF expr THEN stmt
       | IF expr THEN stmt ELSE stmt

해석: ifλ¬Έ μ •μ˜


Example: IF a THEN IF b THEN x=false; ELSE x=true;

해석: 예제

βœ” μ„€λͺ… (핡심):

πŸ‘‰ 문제: ELSEκ°€ μ–΄λŠ IF에 λΆ™λŠ”κ°€?


This grammar can be repaired, but the above problem indicates a programming language design problem.

해석: 이 문법은 μˆ˜μ •ν•  수 μžˆμ§€λ§Œ, 이 λ¬Έμ œλŠ” μ–Έμ–΄ 섀계 문제λ₯Ό μ˜λ―Έν•œλ‹€


Ada uses a different syntax to avoid this problem

해석: AdaλŠ” λ‹€λ₯Έ λ¬Έλ²•μœΌλ‘œ ν•΄κ²°ν•œλ‹€


stmt ::= IF expr THEN stmt END IF
       | IF expr THEN stmt ELSE stmt END IF

해석: END IF μΆ”κ°€

βœ” 핡심:

πŸ‘‰ dangling else 문제

πŸ‘‰ ν•΄κ²°: ꡬ쑰λ₯Ό λͺ…ν™•ν•˜κ²Œ λ§Œλ“ λ‹€


πŸ“‹ 13νŽ˜μ΄μ§€

Some Terminology

해석: μš©μ–΄ 정리


Recognition

해석: 인식


To answer the question β€œDoes the input conform to the syntax of the language?”

해석: μž…λ ₯이 문법에 λ§žλŠ”μ§€ ν™•μΈν•˜λŠ” 것


A recognizer uses a CFG to check the syntax of the program.

해석: recognizerλŠ” CFGλ₯Ό μ‚¬μš©ν•˜μ—¬ 문법을 κ²€μ‚¬ν•œλ‹€


It answers β€œYes” or β€œNo”

해석: Yes/No둜 νŒλ‹¨

βœ” μ„€λͺ…: πŸ‘‰ recognizer = 문법 체크기


Parsing

해석: νŒŒμ‹±


  1. Recognize the input program.

해석: μž…λ ₯이 λ§žλŠ”μ§€ 확인


  1. Determine the phrase structure

해석: ꡬ쑰λ₯Ό 뢄석


A parser uses a CFG to parse a sentence or a program.

해석: parserλŠ” CFGλ₯Ό μ΄μš©ν•΄ λ¬Έμž₯을 λΆ„μ„ν•œλ‹€


It constructs the leftmost or rightmost derivation

해석: 쒌츑/우츑 μœ λ„λ₯Ό μƒμ„±ν•œλ‹€


builds the AST

해석: ASTλ₯Ό λ§Œλ“ λ‹€

βœ” 핡심 비ꡐ:

Β  μ—­ν• 
Recognizer Yes/No
Parser ꡬ쑰 생성 (AST)

πŸ“‹ 14νŽ˜μ΄μ§€

Context-Free Grammar Classes

해석: CFG 클래슀


For an arbitrary CFG, parsing can take O(nΒ³) time

해석: 일반 CFG νŒŒμ‹±μ€ O(nΒ³)


too slow for practical applications

해석: λ„ˆλ¬΄ 느림


For several classes of grammars, a parser that takes O(n) time can be constructed

해석: νŠΉμ • λ¬Έλ²•μ—μ„œλŠ” O(n) κ°€λŠ₯


Top-down LL parsers

해석: LL νŒŒμ„œ


Bottom-up LR parsers

해석: LR νŒŒμ„œ


LL = Left-to-right scanning, Left-most derivation

해석: LL = μ’Œβ†’μš°, 쒌츑 μœ λ„


LR = Left-to-right scanning, Right-most derivation

해석: LR = μ’Œβ†’μš°, 우츑 μœ λ„


The class of LR grammars is a proper superset of the class of LL grammars

해석: LR βŠƒ LL

πŸ–Ό κ·Έλ¦Ό μ„€λͺ…

LL βŠ‚ LR

πŸ‘‰ LR이 더 κ°•λ ₯함


πŸ“‹ 15νŽ˜μ΄μ§€

Parser Motivation

해석: νŒŒμ„œμ˜ λͺ©μ 


Given a grammar G and an input string s, we need an algorithm to:

해석: 문법 G와 λ¬Έμžμ—΄ sκ°€ μ£Όμ–΄μ‘Œμ„ λ•Œ


Decide whether s is in L(G)

해석: sκ°€ 언어에 μ†ν•˜λŠ”μ§€ νŒλ‹¨


If so, generate a parse tree

해석: 맞으면 파슀 트리 생성


We will see two algorithms

해석: 두 κ°€μ§€ μ•Œκ³ λ¦¬μ¦˜μ„ λ³Έλ‹€


Each with different tradeoffs in time and space

해석: μ‹œκ°„/곡간 νŠΈλ ˆμ΄λ“œμ˜€ν”„ 쑴재

βœ” 핡심:

πŸ‘‰ parser의 μ—­ν•  2개

  1. νŒλ³„ (membership)
  2. ꡬ쑰 생성 (parse tree)

πŸ”₯ 11~15νŽ˜μ΄μ§€ 핡심 μš”μ•½

  • ambiguous grammar β†’ λ°˜λ“œμ‹œ 제거
  • precedence + associativity둜 ν•΄κ²°
  • dangling else = λŒ€ν‘œμ  문제
  • recognizer vs parser 차이
  • LL vs LR ꡬ쑰
  • parser λͺ©μ  = νŒλ³„ + 트리 생성

πŸ“‹ 16νŽ˜μ΄μ§€

CYK Algorithm

해석: CYK μ•Œκ³ λ¦¬μ¦˜


Cocke-Younger-Kasami (CYK) Algorithm

해석: Cocke, Younger, Kasamiκ°€ λ§Œλ“  μ•Œκ³ λ¦¬μ¦˜


Parsing algorithm for context-free grammars

해석: CFGλ₯Ό μœ„ν•œ νŒŒμ‹± μ•Œκ³ λ¦¬μ¦˜μ΄λ‹€


Invented by John Cocke, Daniel Younger, and Tadao Kasami

해석: μ„Έ μ‚¬λžŒμ΄ λ§Œλ“  μ•Œκ³ λ¦¬μ¦˜


Basic idea given string s with n tokens:

해석: 길이 n인 λ¬Έμžμ—΄ sκ°€ μ£Όμ–΄μ‘Œμ„ λ•Œ κΈ°λ³Έ 아이디어


  1. Find production rules that cover 1 token in s

해석: 길이 1짜리 토큰을 생성할 수 μžˆλŠ” κ·œμΉ™ μ°ΎκΈ°


  1. Use 1. to find rules that cover 2 tokens

해석: 길이 2짜리 ꡬ간 생성


  1. Use 2. to find rules that cover 3 tokens

해석: 길이 3짜리 ꡬ간 생성


…

해석: 계속 반볡


N. Use N-1. to find rules that cover n tokens

해석: μ΅œμ’…μ μœΌλ‘œ 전체 λ¬Έμžμ—΄ 생성


If succeeds then s is in L(G), else it is not

해석: μ„±κ³΅ν•˜λ©΄ 언어에 속함

βœ” 핡심 μ„€λͺ…:

πŸ‘‰ CYK 핡심 ꡬ쑰

길이 1 β†’ 길이 2 β†’ 길이 3 β†’ … β†’ 전체

πŸ‘‰ bottom-up 방식


πŸ“‹ 17νŽ˜μ΄μ§€

A graphical way to visualize CYK

해석: CYKλ₯Ό κ·Έλž˜ν”„λ‘œ ν‘œν˜„


Initial graph: the input (terminals)

해석: 초기 κ·Έλž˜ν”„ = μž…λ ₯


Repeat: add non-terminal edges until no more can be added

해석: 비단말 λ…Έλ“œλ₯Ό 계속 μΆ”κ°€


An edge is added when adjacent edges form RHS of a grammar production

해석: μΈμ ‘ν•œ λ…Έλ“œκ°€ RHS κ·œμΉ™μ„ λ§Œμ‘±ν•˜λ©΄ μΆ”κ°€


πŸ–Ό κ·Έλ¦Ό μ„€λͺ… (핡심)

μž…λ ₯:

a + a * a

κ·Έλž˜ν”„ ꡬ쑰:

  • μ•„λž˜: terminals (a, +, a, *, a)
  • μœ„λ‘œ μ˜¬λΌκ°€λ©΄μ„œ E μΆ”κ°€

πŸ‘‰ 의미:

E β†’ a
E β†’ E + E
E β†’ E * E

βœ” 핡심: πŸ‘‰ κ·Έλž˜ν”„ = parsing κ³Όμ • μ‹œκ°ν™”


πŸ“‹ 18νŽ˜μ΄μ§€

CYK: the algorithm

해석: CYK μ•Œκ³ λ¦¬μ¦˜


CYK is easiest for grammars in Chomsky Normal Form

해석: CYKλŠ” CNFμ—μ„œ κ°€μž₯ 쉽닀


O(NΒ³) time, O(NΒ²) space

해석: μ‹œκ°„ O(NΒ³), 곡간 O(NΒ²)


Chomsky Normal Form

해석: μ΄˜μŠ€ν‚€ μ •κ·œν˜•


A β†’ BC

해석: 비단말 두 개


A β†’ d

해석: 터미널 ν•˜λ‚˜


S β†’ Ξ΅

해석: 빈 λ¬Έμžμ—΄

βœ” μ„€λͺ…:

πŸ‘‰ CNF 쑰건:

  • A β†’ BC
  • A β†’ a

πŸ‘‰ 이유: CYKλŠ” 2κ°œμ”© μͺΌκ°œλ©΄μ„œ κ³„μ‚°ν•˜κΈ° λ•Œλ¬Έ


πŸ“‹ 19νŽ˜μ΄μ§€

CYK Implementation

해석: CYK κ΅¬ν˜„


CYK uses a table e(N,N)

해석: NxN ν…Œμ΄λΈ” μ‚¬μš©


set e(i,j) to true if substring input[i:j] can be derived

해석: input[i:j]λ₯Ό 생성할 수 있으면 true


input[i:j] is input from index i to j-1

해석: iλΆ€ν„° j-1κΉŒμ§€


For the grammar

해석: 문법

E β†’ a | E + E | E * E

κ·œμΉ™

  1. e(i,i+1) if input[i] == 'a'

해석: 길이 1이면 a인지 확인

  1. e(i,j) if e(i,k), input[k]=='+', e(k+1,j)

해석: + κΈ°μ€€μœΌλ‘œ λΆ„ν• 

  1. e(i,j) if e(i,k), input[k]=='*', e(k+1,j)

해석: * κΈ°μ€€ λΆ„ν• 

βœ” 핡심:

πŸ‘‰ λΆ„ν•  정볡: i ~ j ꡬ간을 kμ—μ„œ μͺΌκ°¬


πŸ“‹ 20νŽ˜μ΄μ§€

CYK Implementation

해석: CYK κ΅¬ν˜„


This is a form of dynamic programming

해석: DP μ•Œκ³ λ¦¬μ¦˜μ΄λ‹€


We use a table to store temporary results

해석: 쀑간 κ²°κ³Ό μ €μž₯


we use the temp results to compute new ones

해석: κΈ°μ‘΄ 결과둜 μƒˆλ‘œμš΄ 계산


Alternative: recursive checking

해석: λŒ€μ•ˆ: μž¬κ·€


But we will end up re-doing a lot of computation

해석: 쀑볡 계산 λ°œμƒ

βœ” 핡심 μ„€λͺ…:

πŸ‘‰ CYK = DP 이유: 같은 ꡬ간을 μ—¬λŸ¬ 번 κ³„μ‚°ν•˜μ§€ μ•ŠκΈ° μœ„ν•΄


πŸ”₯ 16~20νŽ˜μ΄μ§€ 핡심 μš”μ•½

1. CYK 본질

Bottom-up parsing + DP

2. λ™μž‘ 방식

길이 1 β†’ 2 β†’ 3 β†’ … β†’ 전체

3. 핡심 μ—°μ‚°

e(i,j) = ꡬ간 i~jκ°€ 생성 κ°€λŠ₯ν•œκ°€

4. λΆ„ν• 

i ~ j β†’ i~k + k+1~j

5. 쑰건

CNF (A β†’ BC)

6. λ³΅μž‘λ„

μ‹œκ°„: O(nΒ³) / 곡간: O(nΒ²)

COMP321 Compiler β€” Syntax Analysis 2 (21~42νŽ˜μ΄μ§€)

Kyungpook National University | Hwisoo So | Spring 2026


πŸ“‹ 21νŽ˜μ΄μ§€

Illustration

해석: μ˜ˆμ‹œ


Initial graph: the input (terminals)

해석: 초기 κ·Έλž˜ν”„ = μž…λ ₯


Repeat: add non-terminal edges until no more can be added

해석: 더 이상 μΆ”κ°€ν•  수 없을 λ•ŒκΉŒμ§€ 비단말을 μΆ”κ°€


An edge is added when adjacent edges form RHS of a grammar production

해석: μΈμ ‘ν•œ μš”μ†Œκ°€ RHSλ₯Ό λ§Œμ‘±ν•˜λ©΄ μΆ”κ°€


πŸ–Ό κ·Έλ¦Ό μ„€λͺ… (핡심)

μž…λ ₯:

a + a * a

ꡬ쑰:

  • μ•„λž˜: a + a * a (토큰)
  • μœ„: 점점 E 생성됨

ν‘œμ‹œ:

e(0,1), e(2,3), e(4,5)
e(0,3), e(2,5)
e(0,5)

πŸ‘‰ 의미: μž‘μ€ ꡬ간 β†’ 큰 κ΅¬κ°„μœΌλ‘œ ν™•μž₯

βœ” 핡심: πŸ‘‰ CYKλŠ” β€œκ·Έλž˜ν”„ μœ„λ‘œ μŒ“μ•„κ°€λŠ” 과정”


πŸ“‹ 22νŽ˜μ΄μ§€

CYK is dynamic programming

해석: CYKλŠ” 동적 κ³„νšλ²•μ΄λ‹€


Input: a + a * a

해석: μž…λ ₯


Let’s compute which facts we know hold

해석: 참인 것듀을 κ³„μ‚°ν•΄λ³΄μž


we’ll deduce facts gradually until no more can be deduced

해석: 더 이상 μΆ”λ‘ ν•  수 없을 λ•ŒκΉŒμ§€ 반볡


Step 1

base case (length 1)

e(0,1) = e(2,3) = e(4,5) = true

해석: 각 aλŠ” E둜 생성 κ°€λŠ₯


Step 2

length 3

e(0,3) = true   (+)
e(2,5) = true   (*)

해석: a+a / a*a κ°€λŠ₯


Step 3

length 5

e(0,5) = true

해석: 전체 λ¬Έμžμ—΄ κ°€λŠ₯


πŸ–Ό κ·Έλ¦Ό μ„€λͺ…

인덱슀:

0 1 2 3 4
a + a * a

πŸ‘‰ ꡬ간:

  • (0,1): a
  • (0,3): a+a
  • (0,5): 전체

βœ” 핡심: πŸ‘‰ DP 핡심 흐름

length 1 β†’ length 3 β†’ length 5

πŸ“‹ 23νŽ˜μ΄μ§€

Visualize this parser in tabular form

해석: ν…Œμ΄λΈ”λ‘œ μ‹œκ°ν™”


Step 1 / Step 2 / Step 3

해석: 단계별 κ²°κ³Ό


πŸ–Ό κ·Έλ¦Ό μ„€λͺ… (μ‹œν—˜ 핡심)

ν‘œ ꡬ쑰:

  • ν–‰: i
  • μ—΄: j
  • e(i,j) μ±„μ›Œμ§€λŠ” μˆœμ„œ

Step 1 (1μΉΈ):

(0,1), (2,3), (4,5)

Step 2 (3μΉΈ):

(0,3), (2,5)

Step 3 (5μΉΈ):

(0,5)

πŸ–Ό 였λ₯Έμͺ½ κ·Έλ¦Ό 의미

숫자:

  • 1 β†’ step1
  • 2 β†’ step2
  • 3 β†’ step3

πŸ‘‰ 점점 ν™•μž₯됨


πŸ–Ό μ•„λž˜ κ·Έλž˜ν”„

E β†’ a
E β†’ E + E
E β†’ E * E

πŸ‘‰ μ‹€μ œ 적용 κ·œμΉ™

βœ” 핡심: πŸ‘‰ CYKλŠ” β€œμ‚Όκ° ν…Œμ΄λΈ” μ±„μš°κΈ°β€

μ•„λž˜ β†’ μœ„
짧은 β†’ κΈ΄

πŸ“‹ 24νŽ˜μ΄μ§€

CYK Parser

해석: CYK νŒŒμ„œ


Builds the parse tree bottom-up

해석: μ•„λž˜μ—μ„œ μœ„λ‘œ 트리 생성


given A β†’ B C

해석: κ·œμΉ™


when parser finds adjacent B C

해석: B와 Cκ°€ λΆ™μ–΄μžˆμœΌλ©΄


it reduces B C to A

해석: A둜 μΆ•μ†Œ


adding node A to parse tree

해석: νŠΈλ¦¬μ— μΆ”κ°€


Next lecture: top-down parsers

해석: λ‹€μŒ: top-down

βœ” 핡심: πŸ‘‰ CYK = bottom-up reduction

B C β†’ A

πŸ“‹ 25νŽ˜μ΄μ§€

CYK Pseudocode

해석: CYK μ˜μ‚¬μ½”λ“œ


초기 μ„€μ •

s = input string
P(N,N,r) = false

해석: 3차원 ν…Œμ΄λΈ”


P(i,j,Rk) = Rk is used to parse input from i to j

해석: Rkκ°€ i~j 생성 κ°€λŠ₯


Step 1

for each i
  for each Rk β†’ ai
    P[i,i+1,k] = true

해석: 길이 1 처리


Step 2

for i = 2 to n

해석: 길이 증가


for each j

해석: μ‹œμž‘ μœ„μΉ˜


for each k

해석: λΆ„ν•  μœ„μΉ˜


핡심 쑰건

if P[j,j+k,B] and P[j+k,j+i,C]
β†’ P[j,j+i,A] = true

해석: B + C β†’ A


λ§ˆμ§€λ§‰

if P[0,n,R1] true
β†’ accept

해석: 전체 생성 κ°€λŠ₯ν•˜λ©΄ 성곡


βœ” 핡심 ꡬ쑰 (μ‹œν—˜μš©):

for length
  for start
    for split
      for rule

πŸ”₯ 21~25νŽ˜μ΄μ§€ 핡심 μš”μ•½

1. CYK 전체 흐름

1 β†’ 3 β†’ 5 β†’ … β†’ n

2. ν…Œμ΄λΈ” 의미

e(i,j) = i~j 생성 κ°€λŠ₯?

3. 핡심 μ—°μ‚°

i~j β†’ i~k + k+1~j

4. μ•Œκ³ λ¦¬μ¦˜ ꡬ쑰

length
  start
    split
      rule

5. 본질

DP + Bottom-up parsing


πŸ“‹ 26νŽ˜μ΄μ§€

Illustration

해석: μ˜ˆμ‹œ


μ½”λ“œ ꡬ쑰

for each i = 2 to n
  for each j = 0 to n-i
    for each k = 1 to i-1
      for each production RA β†’ RB RC

해석:

  • i: λΆ€λΆ„ λ¬Έμžμ—΄ 길이
  • j: μ‹œμž‘ μœ„μΉ˜
  • k: λΆ„ν•  μœ„μΉ˜

if P[j,j+k,B] and P[j+k,j+i,C]
then P[j,j+i,A] = true

해석: μ™Όμͺ½κ³Ό 였λ₯Έμͺ½μ΄ 각각 생성 κ°€λŠ₯ν•˜λ©΄ 전체도 생성 κ°€λŠ₯


πŸ–Ό κ·Έλ¦Ό μ„€λͺ…

μž…λ ₯:

a a a

문법:

E β†’ a
E β†’ E E

μƒνƒœ

P(0,1,E)
P(1,2,E)
P(2,3,E)

πŸ‘‰ 길이 1은 λͺ¨λ‘ true

βœ” 핡심: πŸ‘‰ μ§€κΈˆμ€ length=2 계산 μ‹œμž‘ 직전


πŸ“‹ 27νŽ˜μ΄μ§€

μ§„ν–‰ μƒνƒœ

i = 2
j = 0
k = 1

해석:

  • 길이 2
  • μ‹œμž‘ 0
  • split = 1

체크

P(0,1,E) and P(1,2,E)

πŸ‘‰ λ‘˜ λ‹€ true


κ²°κ³Ό

P(0,2,E) = true

πŸ–Ό κ·Έλ¦Ό μ„€λͺ…

a a
↓
E E β†’ E

βœ” 핡심: πŸ‘‰ 길이 2짜리 생성 성곡


πŸ“‹ 28νŽ˜μ΄μ§€

μ§„ν–‰ μƒνƒœ

i = 2
j = 1
k = 1

해석: 두 번째 ꡬ간


체크

P(1,2,E) and P(2,3,E)

πŸ‘‰ true


κ²°κ³Ό

P(1,3,E) = true

πŸ–Ό κ·Έλ¦Ό μ„€λͺ…

a a
↓
E E β†’ E

βœ” 핡심: πŸ‘‰ 두 번째 길이 2 ꡬ간도 성곡


πŸ“‹ 29νŽ˜μ΄μ§€

μƒνƒœ 정리

ν˜„μž¬ true:

P(0,1,E)
P(1,2,E)
P(2,3,E)
P(0,2,E)
P(1,3,E)

πŸ‘‰ 의미: 길이 1 + 길이 2 μ „λΆ€ μ™„λ£Œ


λ‹€μŒ 단계

i = 3

πŸ‘‰ 전체 길이 검사 μ‹œμž‘


πŸ“‹ 30νŽ˜μ΄μ§€

μ§„ν–‰ μƒνƒœ

i = 3
j = 0
k = 1

첫 번째 λΆ„ν• 

P(0,1,E) and P(1,3,E)

πŸ‘‰ true


κ²°κ³Ό

P(0,3,E) = true

두 번째 λΆ„ν• 

k = 2

체크

P(0,2,E) and P(2,3,E)

πŸ‘‰ true


κ²°κ³Ό

P(0,3,E) = true (이미 true)

πŸ–Ό κ·Έλ¦Ό μ„€λͺ…

a a a
↓
(E E) E
↓
E

λ˜λŠ”

a (a a)
↓
E (E E)
↓
E

βœ” 핡심:

πŸ‘‰ μ—¬λŸ¬ λ°©μ‹μœΌλ‘œ 생성 κ°€λŠ₯
πŸ‘‰ κ·Έλž˜λ„ κ²°κ³ΌλŠ” 동일


πŸ”₯ 26~30νŽ˜μ΄μ§€ 핡심 μš”μ•½

1. 루프 ꡬ쑰 (μ•”κΈ° ν•„μˆ˜)

for length i
  for start j
    for split k
      for rule

2. 핡심 쑰건

P[j,j+k,B] && P[j+k,j+i,C]
β†’ P[j,j+i,A]

3. μ§„ν–‰ μˆœμ„œ

길이1 β†’ 길이2 β†’ 길이3

4. λΆ„ν•  핡심

i~j = i~k + k+1~j

5. μ‹œν—˜ 포인트

πŸ‘‰ i, j, k κ°’ 직접 좔적
πŸ‘‰ ν‘œ μ±„μš°κΈ°
πŸ‘‰ μ΅œμ’… P(0,n) 확인


πŸ”₯ μ§„μ§œ μ€‘μš”ν•œ ν•œ 쀄
CYKλŠ” β€œκ΅¬κ°„ DP + λΆ„ν•  정볡 + bottom-up νŒŒμ‹±β€μ΄λ‹€


πŸ“‹ 31νŽ˜μ΄μ§€

Illustration

해석: μ˜ˆμ‹œ


μ½”λ“œ (계속 반볡)

for each i = 2 to n
  for each j = 0 to n-i
    for each k = 1 to i-1
      for each production RA β†’ RB RC

해석: μ—¬μ „νžˆ CYK 루프


μƒνƒœ

i = 3
j = 0
k = 1

true κ°’

P(0,1,E)
P(1,2,E)
P(2,3,E)
P(0,2,E)
P(1,3,E)
P(0,3,E)

πŸ–Ό κ·Έλ¦Ό μ„€λͺ…

μž…λ ₯:

a a a

ꡬ쑰:

  • μ•„λž˜: a a a
  • μœ„: E듀이 계속 μŒ“μž„

πŸ‘‰ μ΅œμ’…: 전체 λ¬Έμžμ—΄λ„ E둜 생성 κ°€λŠ₯

βœ” 핡심: πŸ‘‰ CYK ν…Œμ΄λΈ” μ™„μ„± μƒνƒœ


πŸ“‹ 32νŽ˜μ΄μ§€

Illustration

해석: μ˜ˆμ‹œ


μƒνƒœ

i = 3
j = 0
k = 2

체크

P(0,2,E) and P(2,3,E)

πŸ‘‰ true


κ²°κ³Ό

P(0,3,E) = true

(이미 trueμ§€λ§Œ 또 확인됨)


πŸ–Ό κ·Έλ¦Ό μ„€λͺ…

두 κ°€μ§€ 경우:

(a a) a
a (a a)

λ‘˜ λ‹€ κ°€λŠ₯

βœ” 핡심: πŸ‘‰ ν•˜λ‚˜μ˜ λ¬Έμžμ—΄μ΄ μ—¬λŸ¬ λ°©μ‹μœΌλ‘œ 생성됨


πŸ“‹ 33νŽ˜μ΄μ§€

CYK Pseudocode

해석: CYK μ˜μ‚¬μ½”λ“œ


if any of P[0,n-1,x] is true
β†’ s is in L(G)

해석: 전체 ꡬ간이 생성 κ°€λŠ₯ν•˜λ©΄ 성곡


O(NΒ²) space complexity

해석: 곡간 λ³΅μž‘λ„


O(NΒ³Β·r) time complexity

해석: μ‹œκ°„ λ³΅μž‘λ„

βœ” μ„€λͺ…: πŸ‘‰ r = κ·œμΉ™ 개수

βœ” 핡심:

Β  λ³΅μž‘λ„
μ‹œκ°„ O(nΒ³)
곡간 O(nΒ²)

πŸ“‹ 34νŽ˜μ΄μ§€

Given a CYK graph, we can find a parse tree

해석: CYK κ·Έλž˜ν”„μ—μ„œ 파슀 트리λ₯Ό λ§Œλ“€ 수 μžˆλ‹€


Parse tree:

해석: 파슀 트리


πŸ–Ό κ·Έλ¦Ό μ„€λͺ… (이 νŽ˜μ΄μ§€ 핡심)

μž…λ ₯:

a + a * a

κ·Έλž˜ν”„ ꡬ쑰

μœ„μͺ½:

E9, E10
E6, E7, E8
E11

πŸ‘‰ CYK κ²°κ³Ό (μ—¬λŸ¬ 후보 λ…Έλ“œ)


트리 (μ™Όμͺ½ κ·Έλ¦Ό)

        E
       / \
      E   E
     / \   |
    a   a  a
        +
        *

πŸ‘‰ μ‹€μ œ μ„ νƒλœ parse tree


Q: Is this the tree we want?

해석: 이게 μš°λ¦¬κ°€ μ›ν•˜λŠ” νŠΈλ¦¬μΈκ°€?


Why is this not part of the tree?

해석: μ™œ 일뢀 λ…Έλ“œλŠ” νŠΈλ¦¬μ— ν¬ν•¨λ˜μ§€ μ•ŠλŠ”κ°€?


βœ” 핡심 μ„€λͺ… (μ§„μ§œ μ€‘μš”):

πŸ‘‰ CYKλŠ” β€œκ°€λŠ₯ν•œ λͺ¨λ“  쑰합”을 λ§Œλ“ λ‹€

ν•˜μ§€λ§Œ

그쀑 ν•˜λ‚˜λ§Œ μ„ νƒν•΄μ„œ parse tree λ§Œλ“ λ‹€


βœ” μ™œ 일뢀 λ…Έλ“œλŠ” μ•ˆ 쓰냐?

πŸ‘‰ 이유: 전체λ₯Ό κ΅¬μ„±ν•˜λŠ” 경둜만 선택해야 ν•˜κΈ° λ•Œλ¬Έ


βœ” 핡심 이해:

  의미
CYK κ²°κ³Ό μ—¬λŸ¬ κ°€λŠ₯μ„± (graph)
parse tree ν•˜λ‚˜μ˜ μ„ νƒλœ 경둜

πŸ”₯ 31~34νŽ˜μ΄μ§€ 핡심 μš”μ•½

1. CYK κ²°κ³Ό vs parse tree

  의미
CYK κ°€λŠ₯ν•œ λͺ¨λ“  ꡬ쑰
Parse Tree κ·Έ 쀑 ν•˜λ‚˜ 선택

2. μ€‘μš”ν•œ κ°œλ…

πŸ‘‰ ambiguityκ°€ 있으면 β†’ parse tree μ—¬λŸ¬ 개 κ°€λŠ₯

3. 핡심 질문

β€œμ–΄λ–€ 경둜λ₯Ό 선택할 것인가?”

4. μ•Œκ³ λ¦¬μ¦˜ 관점

  • CYK = recognition + 후보 생성
  • Tree 생성 = 선택 κ³Όμ • μΆ”κ°€ ν•„μš”

πŸ”₯ 이 파트 ν•œ 쀄 정리
CYKλŠ” β€œμ •λ‹΅ μ—¬λΆ€β€κΉŒμ§€λŠ” 보μž₯ν•˜μ§€λ§Œ β€œνŠΈλ¦¬ 선택”은 별도 λ¬Έμ œλ‹€


πŸ“‹ 35νŽ˜μ΄μ§€

Parsing

해석: νŒŒμ‹±


πŸ–Ό κ·Έλ¦Ό (전체 ꡬ쑰)

ꡬ성 μš”μ†Œ:

program text β†’ parser β†’ parse tree β†’ AST β†’ interpreter

쀑간에:

  • grammar
  • syntax-directed translation
  • AST-based interpreter

βœ” μ„€λͺ…: 전체 컴파일러 흐름:

ν…μŠ€νŠΈ β†’ νŒŒμ‹± β†’ ꡬ쑰 β†’ AST β†’ μ‹€ν–‰/해석

πŸ‘‰ parser의 μœ„μΉ˜: 쀑간 핡심 단계

βœ” 핡심:

πŸ‘‰ parserλŠ” λ‹¨μˆœνžˆ κ²€μ‚¬ν•˜λŠ” 게 μ•„λ‹ˆλΌ
πŸ‘‰ β€œκ΅¬μ‘°(AST)β€κΉŒμ§€ λ§Œλ“ λ‹€


πŸ“‹ 36νŽ˜μ΄μ§€

Top-Down Parsers and LL Grammars

해석: νƒ‘λ‹€μš΄ νŒŒμ„œμ™€ LL 문법


Top-down parser is a parser for LL class of grammars

해석: νƒ‘λ‹€μš΄ νŒŒμ„œλŠ” LL λ¬Έλ²•μš©μ΄λ‹€


LL = Left-to-right scanning of input, Left-most derivation

해석: LL = μ’Œβ†’μš° 읽고, 쒌츑 μœ λ„


Also called a predictive parser

해석: 예츑 νŒŒμ„œλΌκ³ λ„ ν•œλ‹€


LL class is a strict subset of LR

해석: LL βŠ‚ LR


LL grammars cannot contain left-recursive productions

해석: LL은 left recursion λͺ» 씀

X ::= X Y   ❌

πŸ‘‰ μ™Όμͺ½ μž¬κ·€ β†’ κΈˆμ§€


LL(k) where k is lookahead depth

해석: kλŠ” lookahead 개수


if k=1, common prefix cannot be handled

해석: LL(1)은 prefix 겹치면 λͺ» 처리

μ˜ˆμ‹œ:

X ::= a b | a c

πŸ‘‰ λ‘˜ λ‹€ a μ‹œμž‘ β†’ 문제


A top-down parser constructs a parse tree from the root down

해석: λ£¨νŠΈλΆ€ν„° 트리 생성


Not too difficult to implement recursive descent

해석: κ΅¬ν˜„ 쉬움

βœ” 핡심:

  의미
Top-down μœ„μ—μ„œ μ‹œμž‘
LL 예츑 기반

πŸ“‹ 37νŽ˜μ΄μ§€

Top-down Parsing Example: Micro English

해석: μ˜ˆμ‹œ: Micro English


문법:

Sentence ::= Subject Verb Object .
Subject  ::= I | a Noun | the Noun
Object   ::= me | a Noun | the Noun
Noun     ::= cat | mat | rat
Verb     ::= like | is | see | sees

해석: κ°„λ‹¨ν•œ μ˜μ–΄ 문법


λ¬Έμž₯ μ˜ˆμ‹œ:

The cat sees the rat.
I like a cat

βœ” μ„€λͺ…: πŸ‘‰ 이건 μ‹€μ œ β€œμ–Έμ–΄ νŒŒμ‹± μ˜ˆμ‹œβ€

βœ” 핡심: πŸ‘‰ parserλŠ” 이런 λ¬Έμž₯을 ꡬ쑰둜 λ°”κΏˆ


πŸ“‹ 38νŽ˜μ΄μ§€

Top-down LL Parsing

해석: νƒ‘λ‹€μš΄ LL νŒŒμ‹±


πŸ–Ό κ·Έλ¦Ό μ„€λͺ… (핡심)

λ¬Έμž₯:

The cat sees a rat .

트리 ꡬ쑰:

Sentence
 β”œβ”€β”€ Subject
 β”œβ”€β”€ Verb
 β”œβ”€β”€ Object
 └── .

μ„ΈλΆ€:

Subject β†’ The cat
Verb    β†’ sees
Object  β†’ a rat

βœ” 핡심: πŸ‘‰ μœ„μ—μ„œ μ•„λž˜λ‘œ 트리 생성

Sentence β†’ Subject β†’ Noun ...

πŸ“‹ 39νŽ˜μ΄μ§€

문법 반볡

πŸ‘‰ 동일 문법 λ‹€μ‹œ λ³΄μ—¬μ€Œ

βœ” μ„€λͺ…:

πŸ‘‰ parser κ΅¬ν˜„ μ€€λΉ„ 단계

πŸ‘‰ 문법을 계속 ν™•μΈν•˜λŠ” 이유: μ½”λ“œλ‘œ λ°”κΎΈκΈ° μœ„ν•΄


πŸ“‹ 40νŽ˜μ΄μ§€

문법 + 트리

βœ” μ„€λͺ…:

πŸ‘‰ μ‹€μ œ parsing μ§„ν–‰ 쀑 μƒνƒœ

πŸ‘‰ parserλŠ”: ν˜„μž¬ μž…λ ₯ λ³΄λ©΄μ„œ κ·œμΉ™ 선택


πŸ“‹ 41νŽ˜μ΄μ§€

문법 + 트리 (계속)

βœ” μ„€λͺ…: πŸ‘‰ 단계별 ν™•μž₯

Sentence
β†’ Subject
β†’ the Noun
β†’ the cat

βœ” 핡심: πŸ‘‰ LL parserλŠ” β€œν•œ 단계씩 내렀감”


πŸ“‹ 42νŽ˜μ΄μ§€

Outlook

해석: 정리


Parsing βœ“

해석: νŒŒμ‹± μ™„λ£Œ


Top-down parsing βœ“

해석: νƒ‘λ‹€μš΄ μ™„λ£Œ


Recursive descent parser construction

해석: μž¬κ·€ ν•˜κ°• νŒŒμ„œ κ΅¬ν˜„


AST Construction

해석: AST 생성


Chomsky’s Hierarchy

해석: μ΄˜μŠ€ν‚€ 계측

βœ” μ„€λͺ…: πŸ‘‰ μ§€κΈˆκΉŒμ§€ ν•œ 것 정리


πŸ”₯ 35~42νŽ˜μ΄μ§€ 핡심 μš”μ•½

1. Parser 전체 흐름

input β†’ parser β†’ parse tree β†’ AST

2. Top-down parsing

λ£¨νŠΈλΆ€ν„° μ‹œμž‘ν•΄μ„œ 내렀감

3. LL νŠΉμ§•

Left-to-right + Leftmost derivation

4. μ œμ•½

Left recursion κΈˆμ§€

5. 핡심 아이디어

ν˜„μž¬ μž…λ ₯ 보고 β€œμ–΄λ–€ κ·œμΉ™ μ“Έμ§€ μ˜ˆμΈ‘β€

6. Recursive Descent

πŸ‘‰ ν•¨μˆ˜ 기반 κ΅¬ν˜„

parseExpr() {
  if (...) parseTerm();
}

πŸ”₯ 전체 κ°•μ˜ ν•œ 쀄 정리
Scanner β†’ Parser β†’ AST β†’ μ‹€ν–‰

Categories:

Updated:

Leave a comment