Syntax Analysis 1

COMP321 Compiler β€” Syntax Analysis 1

Kyungpook National University | Hwisoo So | Spring 2026


πŸ“‹ 1νŽ˜μ΄μ§€ β€” 제λͺ©

Syntax Analysis – 1

ꡬ문 뢄석 – 1

μ»΄νŒŒμΌλŸ¬μ—μ„œ Syntax Analysis(ꡬ문 뢄석) 단계에 λŒ€ν•œ 첫 번째 κ°•μ˜λ‹€.
즉, μ§€κΈˆλΆ€ν„° parser(문법 검사기) 파트λ₯Ό 본격적으둜 μ‹œμž‘ν•œλ‹€λŠ” 의미.

  • Hwisoo So β†’ μ†Œνœ˜μˆ˜ (κ°•μ˜μž 이름)
  • Kyungpook National University β†’ κ²½λΆλŒ€ν•™κ΅
  • COMP321 Compiler Spring 2026 β†’ 컴파일러 κ³Όλͺ© (2026λ…„ λ΄„ν•™κΈ°)

πŸ“‹ 2νŽ˜μ΄μ§€ β€” Outlook / Today’s agenda

Outlook / Today’s agenda

κ°•μ˜ κ°œμš” / 였늘 배울 λ‚΄μš©

Syntax and Semantics of Programming Languages

ν”„λ‘œκ·Έλž˜λ° μ–Έμ–΄μ˜ ꡬ문과 의미

μ•žμœΌλ‘œ 계속 λ‚˜μ˜€λŠ” 핡심 κ°œλ…:

  • Syntax = ν˜•νƒœ (문법)
  • Semantics = 의미

Specifying the Syntax of a programming language

ν”„λ‘œκ·Έλž˜λ° μ–Έμ–΄μ˜ ꡬ문을 μ •μ˜ν•˜λŠ” 방법

  • CFGs, BNF and EBNF β†’ CFG, BNF, EBNF
    • CFG β†’ 문법 μ •μ˜ 이둠
    • BNF β†’ 문법 ν‘œν˜„ 방식
    • EBNF β†’ 더 νŽΈν•œ ν™•μž₯ 버전
  • Grammar transformations β†’ 문법 λ³€ν™˜
    νŒŒμ„œ λ§Œλ“€κΈ° μ‰½κ²Œ 문법을 λ°”κΎΈλŠ” κ³Όμ • (β†’ LL νŒŒμ„œ λ§Œλ“€ λ•Œ ν•„μˆ˜)

Parsing

νŒŒμ‹± (ꡬ문 뢄석 κ³Όμ •)

  • Top-down parsing (LL) β†’ νƒ‘λ‹€μš΄ νŒŒμ‹± (LL)
  • Recursive descent (LL) parser construction β†’ μž¬κ·€ ν•˜κ°• νŒŒμ„œ κ΅¬ν˜„
  • LL Grammars β†’ LL 문법
    LL νŒŒμ„œλŠ” μ™Όμͺ½λΆ€ν„° 읽고(L), μ™Όμͺ½λΆ€ν„° μœ λ„(L)

AST Construction

AST 생성

  • Parse trees vs. ASTs β†’ 파슀 트리 vs AST
    • Parse Tree = 문법 κ·ΈλŒ€λ‘œ
    • AST = μ‹€μ œ 의미 쀑심

Chomsky’s Hierarchy

μ΄˜μŠ€ν‚€ 계측 β€” μ–Έμ–΄ 이둠 (μ •κ·œ, CFG λ“±)


πŸ“‹ 3νŽ˜μ΄μ§€ β€” 컴파일러 ꡬ쑰 볡슡

Recapitulate: the structure of a compiler…

볡슡: 컴파일러의 ꡬ쑰

κ·Έλ¦Ό μ„€λͺ… β€” 컴파일러 전체 νŒŒμ΄ν”„λΌμΈ

Source Code
    ↓
Lexical Analyzer
    ↓  [Tokens]
Syntax Analyzer
    ↓  [Abstract Syntax Tree (AST)]
Semantic Analyzer
    ↓  [Decorated AST]
Intermediate Code Generator
    ↓  [Intermediate Representation (IR)]
Code Optimizer
    ↓
Code Generator
    ↓
Target Assembly
단계 좜λ ₯
Lexical Analyzer Tokens
Syntax Analyzer AST
Semantic Analyzer Decorated AST
Intermediate Code Generator IR
  • Compiler Frontend (Analysis) β†’ 컴파일러 μ•žλ‹¨: Lexical + Syntax + Semantic
  • Compiler Backend (Synthesis) β†’ 컴파일러 뒷단: IR 생성 이후

…each phase transforms the program…
각 λ‹¨κ³„λŠ” ν”„λ‘œκ·Έλž¨μ„ λ‹€λ₯Έ ν˜•νƒœλ‘œ λ³€ν™˜ν•œλ‹€

μ»΄νŒŒμΌλŸ¬λŠ” β€œλ²ˆμ—­κΈ°β€κ°€ μ•„λ‹ˆλΌ 단계별 λ³€ν™˜κΈ° νŒŒμ΄ν”„λΌμΈμ΄λ‹€.


πŸ“‹ 4νŽ˜μ΄μ§€ β€” 컴파일러 ꡬ쑰 볡슡 (μ½”λ“œ μ˜ˆμ‹œ)

Recapitulate: the structure of a compiler…

컴파일러 ꡬ쑰 볡슡

μ½”λ“œ μ˜ˆμ‹œ

position = initial + rate * 60

심볼 ν…Œμ΄λΈ” μ—°κ²°:

번호 μ‹λ³„μž
1 position
2 initial
3 rate
4 60
  • Symbol Table (심볼 ν…Œμ΄λΈ”) β†’ λ³€μˆ˜ 정보 μ €μž₯: 이름 / νƒ€μž… / μœ„μΉ˜

단계별 λ³€ν™˜

1단계: Tokens (scanner κ²°κ³Ό)

id, =, id, +, id, *, intliteral

2단계: AST (ꡬ쑰만 ν‘œν˜„)

        =
       / \
      id   +
           / \
          id   *
               / \
              id  int

3단계: Decorated AST β†’ νƒ€μž… 정보 좔가됨 (예: int β†’ float λ³€ν™˜)

Note: all variables are of type real
λͺ¨λ“  λ³€μˆ˜λŠ” real νƒ€μž… β†’ νƒ€μž… λ³€ν™˜ λ°œμƒ (int β†’ float)


πŸ“‹ 5νŽ˜μ΄μ§€ β€” Syntax and Semantics (μ˜μ–΄ μ˜ˆμ‹œ)

English Language: Syntax and Semantics

μ˜μ–΄μ—μ„œμ˜ ꡬ문과 의미

λ¬Έμž₯ 해석
John eats apples. 쑴은 사과λ₯Ό λ¨ΉλŠ”λ‹€
Apples eat John. 사과가 쑴을 λ¨ΉλŠ”λ‹€

Syntax (ꡬ문)

  • The form or structure of English sentences β†’ λ¬Έμž₯의 ν˜•νƒœ λ˜λŠ” ꡬ쑰
  • Not concerned with the meaning β†’ μ˜λ―ΈλŠ” κ³ λ €ν•˜μ§€ μ•ŠλŠ”λ‹€
  • Specified by the English grammar β†’ 문법에 μ˜ν•΄ μ •μ˜λœλ‹€

SyntaxλŠ” β€œλ§žλŠ” λ¬Έμž₯μΈμ§€β€λ§Œ λ³Έλ‹€
Apples eat John β†’ λ¬Έλ²•μ μœΌλ‘œλŠ” 맞음, μ˜λ―ΈλŠ” 이상함

Semantics (의미)

  • The meaning of English sentences β†’ λ¬Έμž₯의 의미
  • Semantic definition? β†’ 의미λ₯Ό μ–΄λ–»κ²Œ μ •μ˜ν•  것인가?

SemanticsλŠ” β€œλ§μ΄ λ˜λŠ”μ§€β€ λ³Έλ‹€


πŸ”₯ 1~5νŽ˜μ΄μ§€ 핡심 μš”μ•½

  1. 컴파일러 ꡬ쑰: Lexical β†’ Syntax β†’ Semantic
  2. Syntax vs Semantics: Syntax = 문법 / Semantics = 의미
  3. Syntax Analysis μ—­ν• : 토큰을 λ°›μ•„μ„œ ꡬ쑰(트리) λ§Œλ“ λ‹€
  4. AST μ€‘μš”μ„±: 이후 λͺ¨λ“  λ‹¨κ³„μ˜ 기반

πŸ“‹ 6νŽ˜μ΄μ§€ β€” Programming Languages: Syntax and Semantics

Programming Languages: Syntax and Semantics

ν”„λ‘œκ·Έλž˜λ° μ–Έμ–΄: ꡬ문과 의미

μ½”λ“œ μ˜ˆμ‹œ

b = b + 1;
if (b==2) printf("too small\n");
  • b = b + 1; β†’ bλŠ” b+1둜 κ°±μ‹  (문법 + 의미 λͺ¨λ‘ 정상)
  • if (b==2) printf("too small\n"); β†’ 쑰건문 μ˜ˆμ‹œ (문법 + 의미 λͺ¨λ‘ 정상)

Syntax (ꡬ문)

  • The form or structure of programs β†’ ν”„λ‘œκ·Έλž¨μ˜ ν˜•νƒœ λ˜λŠ” ꡬ쑰
  • Not concerned with the meaning of programs β†’ ν”„λ‘œκ·Έλž¨μ˜ μ˜λ―ΈλŠ” κ³ λ €ν•˜μ§€ μ•ŠλŠ”λ‹€
  • Specified by a context-free grammar (CFG) β†’ λ¬Έλ§₯ 자유 문법(CFG)에 μ˜ν•΄ μ •μ˜λœλ‹€

핡심: ν”„λ‘œκ·Έλž˜λ° μ–Έμ–΄μ˜ 문법은 μ „λΆ€ CFG둜 μ •μ˜λ¨

Semantics (의미)

  • The meaning of programs β†’ ν”„λ‘œκ·Έλž¨μ˜ 의미
  • Specified by:
    • operational, denotational or axiomatic semantics β†’ 연산적 / μ§€μ‹œμ  / 곡리적 의미둠
    • attribute grammars (will be covered by this course) β†’ 속성 문법 (이 μˆ˜μ—…μ—μ„œ λ‹€λ£Έ)
    • informal English descriptions (as with C, Java, MiniC) β†’ 비곡식적 μ˜μ–΄ μ„€λͺ…

πŸ“‹ 7νŽ˜μ΄μ§€ β€” Context-free Grammars (CFGs)

Context-free Grammars (CFGs)

λ¬Έλ§₯ 자유 문법

Example: CFG for micro-English

Sentence ::= Subject Verb Object .
Subject  ::= I | A Noun | The Noun
Object   ::= me | a Noun | the Noun
Noun     ::= cat | mat | rat | Ξ΅
Verb     ::= like | is | see | sees
기호 의미
::= μ •μ˜
\| 선택
Ξ΅ 아무것도 μ—†μŒ (빈 λ¬Έμžμ—΄)

CFG κΈ°λ³Έ κ°œλ…

  • Context-free grammars consist of productions of the form:
    CFGλŠ” λ‹€μŒ ν˜•νƒœμ˜ production으둜 κ΅¬μ„±λœλ‹€
<nonterminal> ::= sequence of (non) terminals
  • a terminal of a grammar is a token β†’ terminal은 토큰이닀
  • **the symbol β€œ ” denotes alternative forms** β†’ |λŠ” 선택을 μ˜λ―Έν•œλ‹€
  • Ξ΅ denotes the empty string β†’ Ξ΅λŠ” 빈 λ¬Έμžμ—΄μ„ μ˜λ―Έν•œλ‹€

πŸ“‹ 8νŽ˜μ΄μ§€ β€” Context-free Grammars (derivation)

Context-free Grammars

λ¬Έλ§₯ 자유 문법 β€” derivation

A derivation shows how to generate a syntactically valid string.
μœ λ„(derivation)λŠ” λ¬Έλ²•μ μœΌλ‘œ μ˜¬λ°”λ₯Έ λ¬Έμžμ—΄μ„ μƒμ„±ν•˜λŠ” 과정을 보여쀀닀

μ˜ˆμ‹œ

Sentence  β†’  Subject Verb Object .
          β†’  I Verb Object .
          β†’  I see Object .
          β†’  I see the Noun .
          β†’  I see the cat .
  • derivation = β€œλ¬Έμž₯을 ν•œ 단계씩 μƒμ„±ν•˜λŠ” 과정”
  • The non-terminal to be replaced next is underlined
    λ‹€μŒμ— μΉ˜ν™˜ν•  non-terminal은 λ°‘μ€„λ‘œ ν‘œμ‹œλœλ‹€

πŸ“‹ 9νŽ˜μ΄μ§€ β€” Context-free Grammars (λ¬Έμž₯ μ˜ˆμ‹œ)

Context-free Grammars

micro-Englishμ—μ„œ κ°€λŠ₯ν•œ λ¬Έμž₯λ“€

ν¬ν•¨λ˜λŠ” λ¬Έμž₯ (valid)

λ¬Έμž₯ 해석 λΉ„κ³ 
I like the cat. λ‚˜λŠ” 고양이λ₯Ό μ’‹μ•„ν•œλ‹€ 문법 OK
The cat like the mat. 고양이가 맀트λ₯Ό μ’‹μ•„ν•œλ‹€ 문법 OK, μ˜μ–΄λŠ” ν‹€λ¦Ό (likesμ—¬μ•Ό 함)
The mat is the rat. λ§€νŠΈλŠ” μ₯λ‹€ 문법 OK, 의미 이상

Syntax vs Semantics 차이 λ‹€μ‹œ λ“±μž₯

ν¬ν•¨λ˜μ§€ μ•ŠλŠ” λ¬Έμž₯ (invalid)

  • I like the black cat. β†’ adjective(ν˜•μš©μ‚¬) μ—†μŒ β†’ 생성 λΆˆκ°€

How many sentences can be derived?

λͺ‡ 개의 λ¬Έμž₯을 λ§Œλ“€ 수 μžˆλŠ”κ°€?

Just a few, i.e., finitely many
λͺ‡ 개 μ•ˆ λœλ‹€ (μœ ν•œ 개) β†’ 이 CFGλŠ” ν‘œν˜„λ ₯이 μ œν•œλ¨


πŸ“‹ 10νŽ˜μ΄μ§€ β€” Derivations

Derivations

μœ λ„

  • From a grammar we can derive strings β†’ λ¬Έλ²•μœΌλ‘œλΆ€ν„° λ¬Έμžμ—΄μ„ 생성할 수 μžˆλ‹€
  • by generating sequences of tokens β†’ ν† ν°μ˜ μ‹œν€€μŠ€λ₯Ό μƒμ„±ν•¨μœΌλ‘œμ¨
  • In each derivation step, a nonterminal is replaced β†’ 각 λ‹¨κ³„μ—μ„œ nonterminal이 μΉ˜ν™˜λœλ‹€
  • by a right-hand side of a production β†’ ν•΄λ‹Ή production의 RHS둜

핡심 κ°œλ…

μš©μ–΄ 의미
Sentential form 쀑간 κ³Όμ •μ˜ λ¬Έμžμ—΄ (terminal + nonterminal 혼재)
Sentence terminal만 μžˆλŠ” μ΅œμ’… μƒνƒœ
  • A context-free grammar is a generator of a language
    CFGλŠ” μ–Έμ–΄ 생성기닀
  • the set of all strings that can be derived
    μœ λ„ κ°€λŠ₯ν•œ λͺ¨λ“  λ¬Έμžμ—΄μ˜ μ§‘ν•©
  • A sentence that cannot be derived is syntactically illegal
    μœ λ„ν•  수 μ—†λŠ” λ¬Έμž₯은 λ¬Έλ²•μ μœΌλ‘œ ν‹€λ¦° 것이닀

CFG = β€œκ°€λŠ₯ν•œ λͺ¨λ“  ν”„λ‘œκ·Έλž¨ μ •μ˜β€
parser μ—­ν•  = 이 λ¬Έμž₯이 CFG둜 λ§Œλ“€μ–΄μ§ˆ 수 μžˆλŠ”μ§€ 검사


πŸ”₯ 6~10νŽ˜μ΄μ§€ 핡심 μš”μ•½

  1. Syntax vs Semantics: Syntax = ꡬ쑰 (CFG) / Semantics = 의미
  2. CFG 핡심 ꡬ성: terminal = 토큰 / nonterminal = 문법 λ³€μˆ˜ / production = κ·œμΉ™ / Ξ΅ = empty
  3. derivation: λ¬Έμž₯을 λ‹¨κ³„μ μœΌλ‘œ 생성 / sentential form β†’ 쀑간 μƒνƒœ / sentence β†’ μ΅œμ’… μƒνƒœ
  4. 핡심 μ •μ˜: CFG = language generator / β€œμƒμ„± κ°€λŠ₯ = λ¬Έλ²•μ μœΌλ‘œ correct”

πŸ“‹ 11νŽ˜μ΄μ§€ β€” Context-free Grammars (μ •μ˜)

Context-free Grammars

CFG μ •μ˜

A context free grammar specifies all valid strings (a.k.a. programs) of a given programming language!
CFGλŠ” μ£Όμ–΄μ§„ ν”„λ‘œκ·Έλž˜λ° μ–Έμ–΄μ—μ„œ κ°€λŠ₯ν•œ λͺ¨λ“  μ˜¬λ°”λ₯Έ λ¬Έμžμ—΄(즉, ν”„λ‘œκ·Έλž¨)을 μ •μ˜ν•œλ‹€.

β†’ Formal description of the syntax of a programming language.
β†’ ν”„λ‘œκ·Έλž˜λ° μ–Έμ–΄μ˜ ꡬ문에 λŒ€ν•œ ν˜•μ‹μ  μ •μ˜

Elements of CFGs (CFG의 ꡬ성 μš”μ†Œ)

Terminal Symbols (Terminals) β€” 단말 기호

  • 예: <=, while, if, >, ==, ID, INTLITERAL
  • μ‹€μ œ μ½”λ“œμ—μ„œ λ‚˜μ˜€λŠ” 토큰듀

Nonterminal Symbols (Non-terminals) β€” 비단말 기호

  • Particular class of phrases in the language β†’ μ–Έμ–΄μ—μ„œ νŠΉμ •ν•œ ꡬ쑰λ₯Ό λ‚˜νƒ€λ‚΄λŠ” 기호
  • 예: Program, Command, Expression, Declaration
  • λ¬Έλ²•μ˜ β€œλ³€μˆ˜β€ μ—­ν• 

Start Symbol β€” μ‹œμž‘ 기호

  • One of the nonterminals β†’ 비단말 쀑 ν•˜λ‚˜
  • usually the lefthand-side of the first production β†’ 보톡 첫 번째 production의 μ™Όμͺ½
  • 예: sentence

Productions β€” 생산 κ·œμΉ™

  • 예: sentence ::= noun verb

CFG = (Terminal, Nonterminal, Start, Production)


πŸ“‹ 12νŽ˜μ΄μ§€ β€” CFG in BNF

Context-free Grammars

BNF ν‘œν˜„

CFGs can be expressed in BNF (Backus-Naur Form)
CFGλŠ” BNF둜 ν‘œν˜„ν•  수 μžˆλ‹€

To recognize P. Naur’s contribution… and J. W. Backus…
BNFλŠ” Naur와 Backus의 κΈ°μ—¬λ₯Ό 기리기 μœ„ν•΄ 이름 λΆ™μ—¬μ‘Œλ‹€

Example in BNF

Program       ::= Command
Command       ::= single-Command
               | Command ; single-Command
single-Command ::= V-name := Expression
               | begin Command end
기호 의미
::= 문법 μ •μ˜
:= μ‹€μ œ μ½”λ“œ λŒ€μž… μ—°μ‚°μž
\| 선택

⚠️ ::=와 := ꡬ뢄 μ€‘μš” β€” ν—·κ°ˆλ¦¬λ©΄ μ‹œν—˜μ—μ„œ ν‹€λ¦°λ‹€

  • μž¬κ·€ 있음 (Command ::= Command ; ...) β†’ β€œλ¬Έμž₯ μ—¬λŸ¬ 개 이어뢙이기” ν‘œν˜„

πŸ“‹ 13νŽ˜μ΄μ§€ β€” EBNF

Context-free Grammars

EBNF

For our convenience, we will use EBNF
편의λ₯Ό μœ„ν•΄ EBNFλ₯Ό μ‚¬μš©ν•œλ‹€

EBNF = BNF + regular expressions
EBNF = BNF + μ •κ·œν‘œν˜„μ‹

BNF vs EBNF 비ꡐ

BNF

Command ::= single-Command
          | Command ; single-Command

EBNF

Command ::= single-Command (; single-Command)*

* means 0 or more occurrences
*λŠ” 0번 이상 반볡

BNF β†’ λ³΅μž‘ν•œ μž¬κ·€ ν•„μš”
EBNF β†’ 훨씬 μ§κ΄€μ μœΌλ‘œ ν‘œν˜„ κ°€λŠ₯


πŸ“‹ 14νŽ˜μ΄μ§€ β€” CFG for extended micro-English

A CFG for extended micro-English

ν™•μž₯된 micro-English용 CFG

Verify if β€œPeter passed the test” is a sentence?
β€œPeter passed the test”가 λ¬Έμž₯인지 ν™•μΈν•˜λΌ

문법

sentence  ::= subject predicate
subject   ::= NOUN | ARTICLE NOUN
predicate ::= VERB object
object    ::= NOUN | ARTICLE NOUN

νŒλ‹¨ κ³Όμ •

단어 λΆ„λ₯˜
Peter NOUN
passed VERB
the ARTICLE
test NOUN

β†’ μ „λΆ€ κ·œμΉ™ 만쑱 β†’ βœ” valid sentence


πŸ“‹ 15νŽ˜μ΄μ§€ β€” Two derivations

Two derivations of β€œPeter passed the test”

두 κ°€μ§€ μœ λ„ 방법

derivation 1 (leftmost)

sentence  β†’  subject predicate
          β†’  NOUN predicate
          β†’  NOUN VERB object
          β†’  NOUN VERB ARTICLE NOUN

derivation 2 (rightmost)

sentence  β†’  subject predicate
          β†’  subject VERB object
          β†’  subject VERB ARTICLE NOUN
          β†’  NOUN VERB ARTICLE NOUN

두 방식 λͺ¨λ‘ 같은 κ²°κ³Ό 생성:
Sentence: NOUN VERB ARTICLE NOUN

  • Sentential forms β†’ 쀑간 κ³Όμ •λ“€
  • 같은 λ¬Έμž₯도 μ—¬λŸ¬ derivation κ°€λŠ₯ β†’ CFGλŠ” 비결정적 ꡬ쑰

πŸ”₯ 11~15νŽ˜μ΄μ§€ 핡심 μš”μ•½

  1. CFG ꡬ성 μš”μ†Œ (ν•„μˆ˜): Terminal / Nonterminal / Start Symbol / Production
  2. BNF vs EBNF: BNF = κΈ°λ³Έ / EBNF = 반볡(*, +, ?) κ°€λŠ₯
  3. derivation 핡심: λ¬Έμž₯ 생성 κ³Όμ •, μ—¬λŸ¬ 방식 κ°€λŠ₯
  4. 핡심 문제 μœ ν˜•: β€œμ΄ λ¬Έμž₯이 CFG둜 생성 κ°€λŠ₯ν•œκ°€?”
  5. 맀우 μ€‘μš”: 같은 λ¬Έμžμ—΄ = μ—¬λŸ¬ derivation κ°€λŠ₯

πŸ“‹ 16νŽ˜μ΄μ§€ β€” Leftmost and Rightmost Derivations

Leftmost and Rightmost Derivations

쒌츑 μœ λ„μ™€ 우츑 μœ λ„

At each step in the derivation, two choices are made:
μœ λ„ κ³Όμ •μ˜ 각 λ‹¨κ³„μ—μ„œ 두 κ°€μ§€ 선택이 μ‘΄μž¬ν•œλ‹€

  1. Which nonterminal to replace? β†’ μ–΄λ–€ nonterminal을 μΉ˜ν™˜ν•  것인가?
  2. Which alternative to use for that nonterminal? β†’ μ–΄λ–€ κ·œμΉ™μ„ μ‚¬μš©ν•  것인가?

Two types of useful derivations (μœ μš©ν•œ 두 κ°€μ§€ μœ λ„ 방식)

방식 μ„€λͺ… μ—°κ²°λ˜λŠ” νŒŒμ„œ
Leftmost derivation 항상 κ°€μž₯ μ™Όμͺ½ nonterminal을 μΉ˜ν™˜ LL parser
Rightmost derivation 항상 κ°€μž₯ 였λ₯Έμͺ½ nonterminal을 μΉ˜ν™˜ LR parser

⭐ μ‹œν—˜ 핡심: Leftmost β†’ LL / Rightmost β†’ LR


πŸ“‹ 17νŽ˜μ΄μ§€ β€” Conventions for writing CFGs

Conventions for writing CFGs

CFG μž‘μ„± κ·œμΉ™

Start symbol

  • The left-hand side of the first production β†’ 첫 번째 production의 μ™Όμͺ½
  • The letter S, whenever it appears β†’ 보톡 Sλ₯Ό μ‚¬μš©

Nonterminals (비단말)

  • lower-case names such as β€œsentence”, β€œexpr”
  • capital letters like A, B, C

Terminals (단말)

  • boldface names such as ID and INTLITERAL
  • digits, operators, punctuation characters: 1, +, [
  • Sometimes in double quotes β†’ 가끔 λ”°μ˜΄ν‘œλ‘œ ν‘œν˜„

πŸ“‹ 18νŽ˜μ΄μ§€ β€” Example Grammar for Pascal

Example Grammar for Pascal

Pascal μ–Έμ–΄ 문법 μ˜ˆμ‹œ

program      ::= PROGRAM id ( id more_ids ) ; block .
block        ::= variables BEGIN stmt more_stmts END
more_ids     ::= , id more_ids | Ξ΅
variables    ::= VAR id more_ids : type ; more_variables | Ξ΅
more_variables ::= id more_ids : type ; more_variables | Ξ΅
stmt         ::= id := exp
               | READ ( id more_ids )
               | IF exp THEN stmt ELSE stmt
               | WHILE exp DO stmt
               | BEGIN stmt more_stmts END
more_stmts   ::= ; stmt more_stmts | Ξ΅
exp          ::= num | id | exp + exp | exp – exp
type         ::= integer | boolean | char

핡심 포인트

  1. program ꡬ쑰 β†’ PROGRAM μ‹œμž‘ β†’ block β†’ 끝에 .
  2. block β†’ λ³€μˆ˜ μ„ μ–Έ + BEGIN ~ END
  3. stmt μ’…λ₯˜: λŒ€μž… / READ / IF / WHILE / BEGIN-END
  4. Ξ΅ β†’ optional (없어도 됨)

Note: The productions for β€˜num’ and β€˜id’ have been omitted
numκ³Ό id μ •μ˜λŠ” μƒλž΅λ¨ (lexical λ‹¨κ³„μ—μ„œ 이미 μ •μ˜λœ 토큰)


πŸ“‹ 19νŽ˜μ΄μ§€ β€” Pascal 문법 검사 예제

Example Grammar for Pascal (cont.)

Does the following program adhere to the Pascal syntax?
λ‹€μŒ ν”„λ‘œκ·Έλž¨μ΄ Pascal 문법을 λ§Œμ‘±ν•˜λŠ”κ°€?

PROGRAM foo (input, output);
BEGIN
  IF a THEN a:= a+1; b := 22 ENDIF;
END;

핡심 뢄석

Pascal λ¬Έλ²•μ—μ„œ IFλ¬Έ κ·œμΉ™:

IF exp THEN stmt ELSE stmt

β†’ λ°˜λ“œμ‹œ ELSE ν•„μš”
β†’ 이 μ½”λ“œλŠ” ELSE μ—†μŒ + ENDIF ꡬ쑰도 Pascal μŠ€νƒ€μΌ μ•„λ‹˜
β†’ 문법 였λ₯˜ κ°€λŠ₯μ„± 있음


πŸ“‹ 20νŽ˜μ΄μ§€ β€” Grammar & Program, side-by-side

Grammar & Program, side-by-side

문법과 ν”„λ‘œκ·Έλž¨ 비ꡐ

μ™Όμͺ½: CFG / 였λ₯Έμͺ½: μ‹€μ œ μ½”λ“œλ₯Ό λ‚˜λž€νžˆ 비ꡐ

λ§€μΉ­ κ³Όμ •

μ½”λ“œ 문법
PROGRAM ok
foo id
(input, output) ok
BEGIN ~ END block

였λ₯˜ λΆ€λΆ„

IF a THEN a:= a+1; b := 22 ENDIF;

CFG κ·œμΉ™: IF exp THEN stmt ELSE stmt
β†’ ELSE μ—†μŒ β†’ 문법 뢈일치 β†’ syntax error


πŸ”₯ 16~20νŽ˜μ΄μ§€ 핡심 μš”μ•½

  1. derivation μ’…λ₯˜ (맀우 μ€‘μš”): Leftmost β†’ LL parser / Rightmost β†’ LR parser
  2. CFG μž‘μ„± κ·œμΉ™: Start symbol / Terminal / Nonterminal ꡬ뢄
  3. Ξ΅ 의미: optional (없어도 됨)
  4. μ‹€μ œ μ‹œν—˜ 핡심 μœ ν˜•: μ½”λ“œκ°€ CFG에 λ§žλŠ”μ§€ νŒλ‹¨
  5. μ€‘μš” 포인트: CFGλŠ” β€œκ°€λŠ₯ν•œ ν”„λ‘œκ·Έλž¨ μ •μ˜β€ / parserλŠ” β€œκ·Έ μ•ˆμ— ν¬ν•¨λ˜λŠ”μ§€ 검사”

πŸ“‹ 21νŽ˜μ΄μ§€ β€” Lexical Analyzer μ—­ν• 

Lexical Analyzer (Scanner, Tokenizer)

원본 μ½”λ“œ

PROGRAM foo (input, output);
VAR a, b: integer;
BEGIN
  IF a <> 0 THEN BEGIN a:= a+1; b := 22 END
END.

scanner 좜λ ₯ (토큰듀)

PROGRAM  foo  (  ,  output  )  ;
IF  a  THEN
a  :=  a  +  1  ;  b  :=  22
END  .
;
BEGIN
VAR  a  ,  b  :  integer  ;
input
<>  0
END
BEGIN

핡심 μ„€λͺ…

scanner μ—­ν• :

  • λ¬Έμžμ—΄ β†’ 토큰 λ‚˜μ—΄ βœ”
  • 문법 체크 ❌
  • ꡬ쑰 이해 ❌

Lexical AnalyzerλŠ” β€œμ˜λ―Έβ€λ‚˜ β€œκ΅¬μ‘°β€λ₯Ό λͺ¨λ₯Έλ‹€ β€” λ‹¨μˆœνžˆ λ¬Έμžμ—΄ β†’ ν† ν°μœΌλ‘œ μͺΌκ°€ 뿐이닀


πŸ“‹ 22νŽ˜μ΄μ§€ β€” Syntax Analyzer μ—­ν• 

Syntax Analyzer (Parser)

흐름

Tokens  β†’  Parser  β†’  Parse Tree
단계 μ—­ν• 
scanner 토큰 생성
parser ꡬ쑰 생성 (트리)
  • Scanner β†’ 토큰 λ‚˜μ—΄
  • Syntax Analyzer (Parser) β†’ 토큰을 μž…λ ₯으둜 λ°›μ•„ 문법 ꡬ쑰(Parse Tree) 생성

πŸ“‹ 23νŽ˜μ΄μ§€ β€” Syntax Error μ˜ˆμ‹œ

Syntax Error λ°œμƒ

문제 μ½”λ“œ

PROGRAM foo (input, output);
BEGIN
  IF a THEN a:= a+1; b := 22 ENDIF;
END;

parser μ—λŸ¬

Expected token: ELSE
Actual token from the scanner: (μ‹€μ œ scannerκ°€ μ€€ 토큰)

핡심

  • CFG κ·œμΉ™: IF exp THEN stmt ELSE stmt
  • μ‹€μ œ μ½”λ“œ: ELSE μ—†μŒ β†’ κ·œμΉ™ 뢈일치

scannerλŠ” 문제 μ—†μŒ
parserμ—μ„œ μ—λŸ¬ λ°œμƒ
Syntax Error = parser λ‹¨κ³„μ—μ„œ λ°œμƒ


πŸ“‹ 24νŽ˜μ΄μ§€ β€” Grammar & Program (Reminder)

Grammar & Program, side-by-side (Reminder)

문법과 ν”„λ‘œκ·Έλž¨ 비ꡐ (λ‹€μ‹œ 보기)

μ™Όμͺ½: CFG / 였λ₯Έμͺ½: μ½”λ“œλ₯Ό λ‚˜λž€νžˆ λΉ„κ΅ν•΄μ„œ 문법 μœ„λ°˜ μ°ΎκΈ°

PROGRAM foo (input, output);
BEGIN
  IF a THEN a:= a+1; b := 22 ENDIF;
END;
  • IF β†’ ELSE ν•„μš” (μ—†μŒ β†’ 였λ₯˜)
  • ENDIF β†’ Pascal 문법 μ•„λ‹˜
  • β†’ CFG둜 생성 λΆˆκ°€ β†’ syntax error

πŸ“‹ 25νŽ˜μ΄μ§€ β€” SYNTAX ERROR ν™•μ •

SYNTAX ERROR!

SYNTAX ERROR!
Expected token: ELSE
Actual token from the scanner: ...

parserλŠ” 항상:

κΈ°λŒ€ν•œ 토큰  vs  μ‹€μ œ 토큰  β†’  mismatch  β†’  Syntax Error

πŸ”₯ 21~25νŽ˜μ΄μ§€ 핡심 μš”μ•½

  1. scanner vs parser: scanner β†’ 토큰 생성 / parser β†’ 문법 검사
  2. Syntax Error λ°œμƒ μœ„μΉ˜: parser 단계
  3. error 원인: CFG κ·œμΉ™ 뢈일치
  4. λŒ€ν‘œ 예: IF에 ELSE μ—†μŒ β†’ 였λ₯˜
  5. μ€‘μš”ν•œ κ°œλ… 흐름:
    input β†’ scanner β†’ tokens β†’ parser β†’ parse tree
                                 ❌ μ‹€νŒ¨ μ‹œ syntax error
    

πŸ“‹ 26νŽ˜μ΄μ§€ β€” Extended BNF

Extended BNF

ν™•μž₯된 BNF

Extended BNF combines BNF with RE
EBNFλŠ” BNF와 μ •κ·œν‘œν˜„μ‹μ„ κ²°ν•©ν•œ 것이닀

EBNF production ν˜•νƒœ

LHS ::= RHS
  • LHS β†’ nonterminal symbol
  • RHS β†’ extended regular expression (terminal + nonterminal둜 ꡬ성)

EBNFκ°€ BNF에 μΆ”κ°€ν•˜λŠ” 기호

기호 의미
( ) κ·Έλ£Ήν™”
* 0번 이상 반볡
+ 1번 이상 반볡
? 0 λ˜λŠ” 1번

⭐ 이 4κ°œλŠ” 무쑰건 κΈ°μ–΅

The miniC grammar is given in EBNF
miniC 문법은 EBNF둜 μ£Όμ–΄μ§„λ‹€


πŸ“‹ 27νŽ˜μ΄μ§€ β€” EBNF Example

Extended BNF Example

κ°„λ‹¨ν•œ ν‘œν˜„μ‹ μ–Έμ–΄

Expression  ::= PrimaryExp (Operator PrimaryExp)*
PrimaryExp  ::= Literal | Identifier | "(" Expression ")"
Identifier  ::= Letter (Letter | Digit)*
Literal     ::= Digit Digit*
Letter      ::= "a" | ... | "z"
Digit       ::= "0" | ... | "9"
Operator    ::= "+" | "-" | "*" | "/"

핡심

  • (Operator PrimaryExp)* β†’ μ—°μ‚° μ—¬λŸ¬ 번 κ°€λŠ₯
  • a + b * c 같은 ν‘œν˜„μ‹ 생성 κ°€λŠ₯

πŸ“‹ 28νŽ˜μ΄μ§€ β€” Kleene Closure (*)

An EBNF example from miniC: Kleene Closure

A miniC program that consists of a sequence of zero or more declarations
miniC ν”„λ‘œκ·Έλž¨μ€ 선언이 0개 μ΄μƒμœΌλ‘œ κ΅¬μ„±λœλ‹€

BNF (λ³΅μž‘ν•œ μž¬κ·€)

program   ::= decl-list
decl-list ::= decl-list func-decl
            | decl-list var-decl
            | Ξ΅

EBNF (ν•œ μ€„λ‘œ)

program ::= (func-decl | var-decl)*

* = Kleene Closure = 0개 이상 반볡


πŸ“‹ 29νŽ˜μ΄μ§€ β€” Positive Closure (+)

An EBNF example from miniC: Positive Closure

one or more declarations β†’ ν•˜λ‚˜ 이상 μ„ μ–Έ

BNF

decl-list ::= decl-list func-decl
            | decl-list var-decl
            | func-decl
            | var-decl

EBNF

program ::= (func-decl | var-decl)+

+ = 1개 이상


πŸ“‹ 30νŽ˜μ΄μ§€ β€” Option (?)

An EBNF example from miniC: Option ?

The if-statement with an optional else-part
elseκ°€ 선택적인 ifλ¬Έ

BNF

stmt ::= if "(" expr ")" stmt
       | if "(" expr ")" stmt else stmt
       | other

EBNF

stmt ::= if "(" expr ")" stmt (else stmt)?
       | other

? = μžˆμ„ μˆ˜λ„ 있고 없을 μˆ˜λ„ 있음

이것이 dangling else 문제의 μ‹œμž‘μ΄λ‹€


πŸ”₯ 26~30νŽ˜μ΄μ§€ 핡심 μš”μ•½

기호 의미
* 0 이상 (Kleene Closure)
+ 1 이상 (Positive Closure)
? 선택 (Option)
() κ·Έλ£Ήν™”
  • BNF vs EBNF: EBNFλŠ” β€œμΆ•μ•½ ν‘œν˜„β€
  • λͺ¨λ“  λ³΅μž‘ν•œ 문법도 κ²°κ΅­ RE처럼 ν‘œν˜„ κ°€λŠ₯

πŸ“‹ 31νŽ˜μ΄μ§€ β€” Grammar Transformations (이둠)

A little bit of useful theory

μ•½κ°„μ˜ μœ μš©ν•œ 이둠

We will now look at a very few useful bits of theory.
이제 λͺ‡ κ°€μ§€ μœ μš©ν•œ 이둠을 μ‚΄νŽ΄λ³Έλ‹€

These will be necessary later when we implement parsers.
이것듀은 λ‚˜μ€‘μ— parserλ₯Ό κ΅¬ν˜„ν•  λ•Œ ν•„μš”ν•˜λ‹€

Grammar transformations (문법 λ³€ν™˜)

A grammar can be transformed in a number of ways without changing the meaning
문법은 의미λ₯Ό λ°”κΎΈμ§€ μ•Šκ³  μ—¬λŸ¬ λ°©μ‹μœΌλ‘œ λ³€ν™˜λ  수 μžˆλ‹€

(i.e., the set of strings that it generates)
즉, μƒμ„±ν•˜λŠ” λ¬Έμžμ—΄ 집합은 λ™μΌν•˜κ²Œ μœ μ§€

λ³€ν™˜ 이유: νŒŒμ„œ λ§Œλ“€κΈ° μ‰½κ²Œ ν•˜κΈ° μœ„ν•΄


πŸ“‹ 32νŽ˜μ΄μ§€ β€” Grammar Transformations (1)

Grammar Transformations (1)

문법 λ³€ν™˜ 1 β€” Left Factorization (쒌츑 μΈμˆ˜λΆ„ν•΄)

κΈ°μ‘΄ 문법

single-Command ::= V-name := Expression
                 | if Expression then single-Command
                 | if Expression then single-Command else single-Command

문제: 두 κ·œμΉ™μ΄ 같은 prefix (if Expression then single-Command)

λ³€ν™˜ ν›„

single-Command ::= V-name := Expression
                 | if Expression then single-Command ( Ξ΅ | else single-Command )

일반 ν˜•νƒœ

N ::= X Y | X Z   β†’   N ::= X (Y | Z)

β€œμ•žλΆ€λΆ„ κ°™μœΌλ©΄ 묢어라”
LL parserμ—μ„œ ν•„μˆ˜


πŸ“‹ 33νŽ˜μ΄μ§€ β€” Grammar Transformations (2)

Grammar Transformations (2)

문법 λ³€ν™˜ 2 β€” Elimination of Left Recursion (쒌츑 μž¬κ·€ 제거)

문제 문법

N ::= X | N Y

β†’ LL parserμ—μ„œ λ¬΄ν•œ 루프 λ°œμƒ

μ˜ˆμ‹œ

Identifier ::= Letter
             | Identifier Letter
             | Identifier Digit

λ³€ν™˜ ν›„

Identifier ::= Letter (Letter | Digit)*

λ³€ν™˜ 원리

N β†’ X
  β†’ X Y
  β†’ X Y Y
  β†’ ...
  β†’ X Y*

Left recursion β†’ Kleene star (*)


πŸ“‹ 34νŽ˜μ΄μ§€ β€” Grammar Transformations (3)

Grammar Transformations (3)

문법 λ³€ν™˜ 3 β€” Substitution of non-terminal symbols (비단말 μΉ˜ν™˜)

κΈ°λ³Έ ν˜•νƒœ

N ::= X
M ::= Ξ± N Ξ²

λ³€ν™˜ ν›„:

M ::= Ξ± X Ξ²

쀑간 단계 제거 β†’ 직접 μ—°κ²°

μ˜ˆμ‹œ

single-Command ::= for contrVar := Expression
                   to-or-dt Expression do single-Command

to-or-dt ::= to | downto

λ³€ν™˜ ν›„:

single-Command ::= for contrVar := Expression
                   (to | downto) Expression do single-Command

nonterminal 제거 β†’ inline 처리


πŸ“‹ 35νŽ˜μ΄μ§€ β€” Outlook (정리)

Outlook

정리 및 이후 λ‚΄μš©

μ™„λ£Œλœ λ‚΄μš© βœ”

  • Syntax and Semantics of Programming Languages βœ”
  • Specifying the Syntax of a programming language βœ”
    • CFGs, BNF and EBNF βœ”
    • Grammar transformations βœ”

λ‹€μŒ λ‚΄μš©

  • Parsing
    • Top-down parsing (LL)
    • Recursive descent
    • LL Grammars
  • AST Construction
  • Chomsky’s Hierarchy

μ§€κΈˆκΉŒμ§€λŠ” β€œλ¬Έλ²• μ •μ˜β€
μ΄μ œλΆ€ν„°λŠ” β€œνŒŒμ‹± κ΅¬ν˜„β€


πŸ”₯ 31~35νŽ˜μ΄μ§€ 핡심 μš”μ•½

λ³€ν™˜ λͺ©μ 
Left Factorization 곡톡 prefix 제거
Left Recursion 제거 LL parser ν•„μˆ˜ 쑰건
Substitution 비단말 inline 처리
  • Grammar Transformation λͺ©μ : νŒŒμ„œ λ§Œλ“€κΈ° μ‰½κ²Œ

πŸ”₯ 전체 κ°•μ˜ 핡심 μš”μ•½ (1~35νŽ˜μ΄μ§€)

⭐ μ‹œν—˜ 핡심 6κ°€μ§€

ν•­λͺ© λ‚΄μš©
⭐ 1. CFG μ–Έμ–΄ μ •μ˜ 핡심
⭐ 2. Derivation λ¬Έμž₯ 생성 κ³Όμ •
⭐ 3. Syntax vs Semantics 무쑰건 ꡬ뢄
⭐ 4. EBNF 기호 * / + / ? / ()
⭐ 5. Grammar Transformation Left factoring / Left recursion 제거
⭐ 6. Parser μ—­ν•  CFG 기반 문법 검사

컴파일러 전체 흐름

input β†’ scanner β†’ tokens β†’ parser β†’ parse tree
                              ❌ μ‹€νŒ¨ μ‹œ Syntax Error

Grammar Transformation μš”μ•½

RE β†’ NFA β†’ DFA β†’ μ΅œμ†Œ DFA  (lexical)

CFG β†’ Left Factoring β†’ Left Recursion 제거 β†’ Parser  (syntax)

Categories:

Updated:

Leave a comment