Lexical Analysis Part 3

COMP321 Compiler โ€” Lexical Analysis Part 3

Kyungpook National University | Hwisoo So | Spring 2026


๐Ÿ“‹ 1ํŽ˜์ด์ง€ โ€” ์ œ๋ชฉ

Lexical Analysis โ€“ Part 3

์–ดํœ˜ ๋ถ„์„ โ€“ 3๋ฒˆ์งธ ํŒŒํŠธ

์ปดํŒŒ์ผ๋Ÿฌ์˜ Lexical Analysis(์Šค์บ๋„ˆ ๋‹จ๊ณ„)๋ฅผ ๋‹ค๋ฃจ๋Š” ๊ฐ•์˜์˜ ์„ธ ๋ฒˆ์งธ ํŒŒํŠธ๋‹ค.
์ฆ‰, ์ง€๊ธˆ๊นŒ์ง€ ๋ฐฐ์šด ๋‚ด์šฉ์˜ ์—ฐ์žฅ์„  + ๋” ์‹ฌํ™”๋œ ๋‚ด์šฉ์ด๋ผ๊ณ  ๋ณด๋ฉด ๋œ๋‹ค.

  • Hwisoo So โ†’ ์†Œํœ˜์ˆ˜ (๊ฐ•์˜์ž ์ด๋ฆ„) โ€” ์ด ๊ฐ•์˜๋ฅผ ์ง„ํ–‰ํ•˜๋Š” ๊ต์ˆ˜๋‹˜ ์ด๋ฆ„์ด๋‹ค.
  • Kyungpook National University โ†’ ๊ฒฝ๋ถ๋Œ€ํ•™๊ต โ€” ๊ฐ•์˜๊ฐ€ ์ง„ํ–‰๋˜๋Š” ์†Œ์† ๋Œ€ํ•™
  • COMP321 Compiler Spring 2026 โ†’ ์ปดํŒŒ์ผ๋Ÿฌ ๊ณผ๋ชฉ (COMP321), 2026๋…„ ๋ด„ํ•™๊ธฐ โ€” ์ด ๊ฐ•์˜๊ฐ€ ์–ด๋–ค ๊ณผ๋ชฉ์ด๊ณ  ์–ธ์ œ ์ง„ํ–‰๋˜๋Š”์ง€ ๋ช…์‹œ

๐Ÿ–ผ ๊ทธ๋ฆผ ์„ค๋ช…

  • ์ค‘์•™: Lexical Analysis โ€“ Part 3 (์ œ๋ชฉ)
  • ์•„๋ž˜: ๊ต์ˆ˜ ์ด๋ฆ„ + ํ•™๊ต ๋กœ๊ณ 

์ด ํŽ˜์ด์ง€๋Š” ์™„์ „ํ•œ ํ‘œ์ง€(slide cover)๋‹ค. ๋‚ด์šฉ ์—†์Œ, ๋‹จ์ˆœํžˆ ๊ฐ•์˜ ์‹œ์ž‘ ์•Œ๋ฆผ ์—ญํ• .


๐Ÿ“‹ 2ํŽ˜์ด์ง€ โ€” Outline (๊ฐ•์˜ ๊ฐœ์š”)

The role of a scanner โœ”

์Šค์บ๋„ˆ์˜ ์—ญํ•  โœ”
์ด๋ฏธ ๋ฐฐ์šด ๋‚ด์šฉ์ด๋ผ๋Š” ์˜๋ฏธ โ†’ scanner = ๋ฌธ์ž์—ด โ†’ ํ† ํฐ์œผ๋กœ ์ชผ๊ฐœ๋Š” ์—ญํ• 

Scanner concepts โœ”

์Šค์บ๋„ˆ ๊ฐœ๋… โœ”

Tokens, Lexemes, Patterns โ€” ํ•ต์‹ฌ 3๊ฐœ ๊ฐœ๋…

  • Lexeme: ์‹ค์ œ ๋ฌธ์ž์—ด (์˜ˆ: โ€œifโ€, โ€œabcโ€)
  • Token: ๋ถ„๋ฅ˜ (์˜ˆ: IF, ID ๋“ฑ)
  • Pattern: ๊ทœ์น™ (์ •๊ทœ์‹)

Regular Expression & Automata (to be continuedโ€ฆ)

์ •๊ทœํ‘œํ˜„์‹๊ณผ ์˜คํ† ๋งˆํƒ€ (๊ณ„์† ์ง„ํ–‰ ์˜ˆ์ •)
์ด ํŒŒํŠธ๊ฐ€ ์ง€๊ธˆ ๊ฐ•์˜์˜ ํ•ต์‹ฌ ์ฃผ์ œ

ํ•ญ๋ชฉ ์ƒํƒœ
Definitions of REs, DFAs and NFAs โœ” ์™„๋ฃŒ โ€” ์ด๋ฏธ ๋ฐฐ์šด ๋‚ด์šฉ
REs โ†’ NFA โœ” ์™„๋ฃŒ โ€” Thompson construction ์‚ฌ์šฉ
(Thompsonโ€™s construction, Algorithmโ€ฆ) ์ฑ… ๊ธฐ์ค€ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋ฒˆํ˜ธ๊นŒ์ง€ ๋ช…์‹œ โ†’ ์‹œํ—˜ ๋ฒ”์œ„ ๋А๋‚Œ ๊ฐ•ํ•จ
NFA โ†’ DFA (subset constructionโ€ฆ) ํ•ต์‹ฌ ์•Œ๊ณ ๋ฆฌ์ฆ˜ โ†’ ์ดํ›„ ์Šฌ๋ผ์ด๋“œ์—์„œ ์ž์„ธํžˆ ๋‚˜์˜ด
DFA โ†’ minimal-state DFA ์ƒํƒœ ์ตœ์†Œํ™” โ†’ ์„ฑ๋Šฅ ์ตœ์ ํ™” ๋‹จ๊ณ„
Scanner generators ์ด๋ฒˆ ๊ฐ•์˜์—์„œ ์ƒˆ๋กœ ๋‚˜์˜ฌ ํ•ต์‹ฌ
Extra materials ์ถ”๊ฐ€ ์ž๋ฃŒ

Todayโ€™s agenda โ€” ์˜ค๋Š˜์˜ ๊ฐ•์˜ ๊ณ„ํš

์ด ํŽ˜์ด์ง€๋Š” ์ „์ฒด ๊ฐ•์˜ ํ๋ฆ„ ์ •๋ฆฌ โ†’ ์ง€๊ธˆ๋ถ€ํ„ฐ scanner generator ์ค‘์‹ฌ์œผ๋กœ ๊ฐ„๋‹ค


๐Ÿ“‹ 3ํŽ˜์ด์ง€ โ€” Scanner Generators

Scanner Generators

์Šค์บ๋„ˆ ์ƒ์„ฑ๊ธฐ

ํ•ต์‹ฌ ๋ฉ”์‹œ์ง€: โ€œ์Šค์บ๋„ˆ๋Š” ์ง์ ‘ ์งค ์ˆ˜๋„ ์žˆ์ง€๋งŒ ์ž๋™ ์ƒ์„ฑ๋„ ๊ฐ€๋Šฅํ•˜๋‹คโ€

Scanners generated in C (C๋กœ ์ƒ์„ฑ๋˜๋Š” ์Šค์บ๋„ˆ)

  • lex (UNIX) โ†’ ๊ฐ€์žฅ ์ „ํ†ต์ ์ธ lexer ์ƒ์„ฑ๊ธฐ
  • flex (GNUโ€™s fast lex, UNIX) โ†’ lex์˜ ์—…๊ทธ๋ ˆ์ด๋“œ ๋ฒ„์ „
  • Mks lex (MS-DOS, Windows, OS/2) โ†’ ์œˆ๋„์šฐ ๊ณ„์—ด

Scanners generated in Java (Java๋กœ ์ƒ์„ฑ๋˜๋Š” ์Šค์บ๋„ˆ)

  • Jlex (Princeton University) โ†’ ํ”„๋ฆฐ์Šคํ„ด ๋Œ€ํ•™
  • JFlex
  • JavaCC (SUN Microsystems โ†’ now, Oracle)

์•ž์œผ๋กœ๋Š” JLex ๊ธฐ์ค€์œผ๋กœ ์„ค๋ช… ์ง„ํ–‰


๐Ÿ“‹ 4ํŽ˜์ด์ง€ โ€” The Scanner Spec in JLex

The Scanner Spec in JLex

JLex์—์„œ์˜ ์Šค์บ๋„ˆ ๋ช…์„ธ

JLex ํŒŒ์ผ์€ ์„ธ ๋ถ€๋ถ„ ๊ตฌ์กฐ๋กœ ๋˜์–ด ์žˆ๋‹ค:

[์œ ์ € ์ฝ”๋“œ]
%%
[์„ ์–ธ]
%%
[์ •๊ทœ์‹ ๊ทœ์น™]
๊ตฌ๊ฐ„ ์˜๋ฏธ
user code ์‚ฌ์šฉ์ž ์ฝ”๋“œ โ€” ์—ฌ๊ธฐ์— ์ž‘์„ฑํ•œ ์ฝ”๋“œ๋Š” ๊ทธ๋Œ€๋กœ ์ƒ์„ฑ๋œ scanner ์ฝ”๋“œ์— ๋“ค์–ด๊ฐ
%% ๊ตฌ๋ถ„์ž
JLEX directives (declarations) JLex ์ง€์‹œ๋ฌธ (์„ ์–ธ) โ€” ๋ณ€์ˆ˜, ์ •์˜ ๋“ฑ ์ž‘์„ฑ
%% ๊ตฌ๋ถ„์ž
regular expression rules ์ •๊ทœํ‘œํ˜„์‹ ๊ทœ์น™ โ€” ํ•ต์‹ฌ ๋ถ€๋ถ„, ํ† ํฐ ์ •์˜ํ•˜๋Š” ๊ณณ

์ด ๊ตฌ์กฐ๋Š” ์‹œํ—˜์— ์ž์ฃผ ๋‚˜์˜ด


๐Ÿ“‹ 5ํŽ˜์ด์ง€ โ€” How a Scanner Generator Works

How a Scanner Generator Works

์Šค์บ๋„ˆ ์ƒ์„ฑ๊ธฐ๊ฐ€ ์–ด๋–ป๊ฒŒ ๋™์ž‘ํ•˜๋Š”๊ฐ€

๋‚ด๋ถ€ ๋™์ž‘ ํ๋ฆ„ (๋งค์šฐ ์ค‘์š”)

RE โ†’ NFA โ†’ DFA โ†’ ์ตœ์†Œ DFA
  1. ์‚ฌ๋žŒ์ด ์ •๊ทœ์‹ ์ž‘์„ฑ
  2. ์ปดํ“จํ„ฐ๊ฐ€ ์ž๋™์œผ๋กœ NFA ์ƒ์„ฑ
  3. DFA๋กœ ๋ณ€ํ™˜
  4. ์ƒํƒœ ์ตœ์†Œํ™”
  5. ์ตœ์ข… scanner ์ฝ”๋“œ ์ƒ์„ฑ

์ด๊ฒŒ ์ปดํŒŒ์ผ๋Ÿฌ ์ด๋ก  ํ•ต์‹ฌ ํ๋ฆ„์ด๋‹ค.

๋‘ ๊ฐ€์ง€ ๊ตฌํ˜„ ๋ฐฉ์‹

๋ฐฉ์‹ ์„ค๋ช…
Table-driven code (JLex) ํ…Œ์ด๋ธ” ๊ธฐ๋ฐ˜ ์ฝ”๋“œ โ€” ์ž…๋ ฅ์— ๋Œ€ํ•ด DFA๋ฅผ ์‹œ๋ฎฌ๋ ˆ์ด์…˜, ์ƒํƒœ ์ „์ด๋ฅผ ํ…Œ์ด๋ธ”๋กœ ๊ด€๋ฆฌ
Hard-wired code (Direct-coded) ํ•˜๋“œ์ฝ”๋”ฉ ๋ฐฉ์‹ โ€” ์šฐ๋ฆฌ๊ฐ€ ์ง์ ‘ if๋ฌธ์œผ๋กœ ๋งŒ๋“  scanner, like our hand-crafted scanner for Assignment #1
  • Assignment #1 = Scanner + Parser โ†’ ๊ณผ์ œ1์€ ์Šค์บ๋„ˆ + ํŒŒ์„œ์˜€๋‹ค

์ด ํŽ˜์ด์ง€ ํ•ต์‹ฌ ์š”์•ฝ

์Šค์บ๋„ˆ ๋งŒ๋“œ๋Š” ๋‘ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•: ์ž๋™ ์ƒ์„ฑ(JLex) vs ์ง์ ‘ ๊ตฌํ˜„(Assignment)
๋‚ด๋ถ€ ์›๋ฆฌ๋Š” ๋™์ผ: RE โ†’ NFA โ†’ DFA โ†’ ์ตœ์†Œ DFA


๐Ÿ“‹ 6ํŽ˜์ด์ง€ โ€” Table-driven vs. Directed-coded Scanners

Table-driven vs. Directed-coded Scanners

ํ…Œ์ด๋ธ” ๊ธฐ๋ฐ˜ vs ์ง์ ‘ ์ฝ”๋“œ ๊ธฐ๋ฐ˜ ์Šค์บ๋„ˆ

Direct-coded ๋ฐฉ์‹ ์˜ˆ์‹œ (์ฝ”๋“œ)

switch (currentChar) {
    case '+': return token representation for '+';
    case '<':
        takeIt();
        if (currentChar == '=')
            return token representation for '<=';
        else
            return token representation for '<';
}
  • case '+': โ€˜+โ€™์ด๋ฉด + ํ† ํฐ ๋ฐ˜ํ™˜ โ†’ ์ง์ ‘ ๊ตฌํ˜„ ๋ฐฉ์‹, ๋ฌธ์ž ํ•˜๋‚˜ ๋ณด๊ณ  ๋ฐ”๋กœ ํ† ํฐ ๊ฒฐ์ •
  • case '<': โ€˜<โ€™๋ฅผ ์ฝ๊ณ  ๋‹ค์Œ ๋ฌธ์ž๊ฐ€ โ€˜=โ€™์ด๋ฉด โ€œ<=โ€ ์•„๋‹ˆ๋ฉด โ€œ<โ€

Direct-coded ๋ฐฉ์‹์˜ ๋Œ€ํ‘œ ์˜ˆ์‹œ: < ์ฝ์Œ โ†’ ๋‹ค์Œ ๋ฌธ์ž ํ™•์ธ(lookahead) โ†’ ์กฐ๊ฑด๋ฌธ์œผ๋กœ ์ฒ˜๋ฆฌ
์ฆ‰, ์ฝ”๋“œ ํ๋ฆ„ ์ž์ฒด๊ฐ€ ์ƒํƒœ ๋จธ์‹  ์—ญํ• ์„ ํ•จ

์™ผ์ชฝ ํ‘œ โ€” Table encoding RE (DFA transition table)

  • ์ƒํƒœ: s0, s1, s2, se
  • ์ž…๋ ฅ: 0~9, ๊ธฐํƒ€ ๋ฌธ์ž
  • ฮด(state, input) = next state
    ์˜ˆ: s0์—์„œ ์ˆซ์ž โ†’ s1 / s1์—์„œ ์ˆซ์ž โ†’ s2 / ์™„์ „ํžˆ ์ž๋™ ์ƒ์„ฑ๋œ ํ˜•ํƒœ

์˜ค๋ฅธ์ชฝ ์ฝ”๋“œ โ€” Skeleton recognizer (Table-driven scanner ํ•ต์‹ฌ ๊ตฌ์กฐ)

Char โ† next character
State โ† s0
while (Char โ‰  EOF)
    State โ† ฮด(State, Char)
    Char โ† next character
if (State is a final state)
    then report success
else report failure
  • ๋ฌธ์ž ํ•˜๋‚˜ ์ฝ๊ณ  ์ƒํƒœ ๊ฐฑ์‹  โ†’ EOF๊นŒ์ง€ ๋ฐ˜๋ณต โ†’ ๋งˆ์ง€๋ง‰ ์ƒํƒœ๊ฐ€ final์ด๋ฉด ์„ฑ๊ณต
  • ์ƒํƒœ ์ „์ด๋Š” ํ•จ์ˆ˜ ฮด๋กœ ์ฒ˜๋ฆฌ, ์‹ค์ œ ๊ตฌํ˜„์€ โ€œํ…Œ์ด๋ธ” lookupโ€

๐Ÿ”ฅ ํ•ต์‹ฌ ๋น„๊ต

๋ฐฉ์‹ ํŠน์ง•
Table-driven ํ…Œ์ด๋ธ” ๊ธฐ๋ฐ˜ DFA
Direct-coded if/switch ๊ธฐ๋ฐ˜

๐Ÿ“‹ 7ํŽ˜์ด์ง€ โ€” Ease of Implementation & Performance

Ease of Implementation & Performance

๊ตฌํ˜„ ๋‚œ์ด๋„์™€ ์„ฑ๋Šฅ

Table-driven scanner

โ€œํ…Œ์ด๋ธ” ๊ธฐ๋ฐ˜ ์Šค์บ๋„ˆ๋Š” ์˜คํ† ๋งˆํƒ€๋ฅผ ํฐ ํ…Œ์ด๋ธ”๋กœ ํ‘œํ˜„ํ•œ๋‹คโ€

  • Costly lookup โ†’ ์ƒํƒœ + ์ž…๋ ฅ โ†’ ํ…Œ์ด๋ธ” index ๊ณ„์‚ฐ ํ•„์š”, cache ํšจ์œจ ์•ˆ ์ข‹์Œ
  • Easy, automated way to implement โ†’ JLex ๊ฐ™์€ ๋„๊ตฌ๊ฐ€ ์ด ๋ฐฉ์‹
  • Hard to implement manually โ†’ ์ง์ ‘ ๊ตฌํ˜„์€ ์–ด๋ ต๋‹ค

Direct-coded scanner

โ€œ์ง์ ‘ ๊ตฌํ˜„ ์Šค์บ๋„ˆ๋Š” ํ”„๋กœ๊ทธ๋žจ ํ๋ฆ„ ์ž์ฒด๋กœ ์˜คํ† ๋งˆํƒ€๋ฅผ ํ‘œํ˜„ํ•œ๋‹คโ€

  • if๋ฌธ / switch๋ฌธ ์ž์ฒด๊ฐ€ ์ƒํƒœ ๋จธ์‹ 
  • Automated and manual implementations feasible โ†’ ์ž๋™ ์ƒ์„ฑ๋„ ๊ฐ€๋Šฅํ•˜๊ณ  ์ง์ ‘ ๊ตฌํ˜„๋„ ๊ฐ€๋Šฅ
  • No table and thus no table lookup needed โ†’ ํ…Œ์ด๋ธ”์ด ์—†์–ด์„œ lookup ํ•„์š” ์—†์Œ
  • More complex control flow โ†’ ์ œ์–ด ํ๋ฆ„์ด ๋” ๋ณต์žกํ•จ
  • Ok for architectures with good branch prediction โ†’ x86 ๊ฐ™์€ CPU์—์„œ๋Š” ๊ดœ์ฐฎ๋‹ค
  • Tends to be faster than table-driven scanner โ†’ ์ผ๋ฐ˜์ ์œผ๋กœ ๋” ๋น ๋ฆ„

๐Ÿ”ฅ ํ•ต์‹ฌ ์ •๋ฆฌ

๋ฐฉ์‹ ํŠน์ง•
Table-driven ๊ตฌํ˜„ ์‰ฌ์›€, ๋А๋ฆผ
Direct-coded ๊ตฌํ˜„ ์–ด๋ ค์›€, ๋น ๋ฆ„

๐Ÿ“‹ 8ํŽ˜์ด์ง€ โ€” JLex Example Spec

JLex Example Spec

JLex ์˜ˆ์ œ ๋ช…์„ธ

%%
LETTER=[a-zA-Z_]
DIGIT=[0-9]

"if" { return new Token(Token.IF, "if", src_pos); }
"<"  { ... }
"<=" { ... }
{LETTER}({LETTER}|{DIGIT})* { return new Token(Token.ID, "spelling", src_pos); }
  • LETTER=[a-zA-Z_] โ†’ ์•ŒํŒŒ๋ฒณ + ์–ธ๋”๋ฐ”
  • DIGIT=[0-9] โ†’ ์ˆซ์ž
  • โ€œifโ€ { return new Token(Token.IF, โ€ฆ) } โ†’ keyword ์ฒ˜๋ฆฌ, โ€œifโ€๊ฐ€ ๋‚˜์˜ค๋ฉด IF ํ† ํฐ ๋ฐ˜ํ™˜
  • โ€<โ€ โ†’ LESS ํ† ํฐ
  • โ€<=โ€ โ†’ LESS_EQ ํ† ํฐ
  • **{LETTER}({LETTER} {DIGIT})*** โ†’ ๋ฌธ์ž๋กœ ์‹œ์ž‘ํ•˜๊ณ  ์ดํ›„ ๋ฌธ์ž/์ˆซ์ž ๋ฐ˜๋ณต โ†’ identifier ์ •์˜
  • return new Token(Token.ID, โ€ฆ) โ†’ ID ํ† ํฐ ๋ฐ˜ํ™˜

๋‘ ๊ฐ€์ง€ ํ•ต์‹ฌ ๊ทœ์น™ (Two rules)

  1. ์ฒซ ๋ฒˆ์งธ ํŒจํ„ด ์šฐ์„  (์šฐ์„ ์ˆœ์œ„ = ์œ„์— ์žˆ๋Š” ๊ทœ์น™)
    ์—ฌ๋Ÿฌ ํŒจํ„ด์ด ๋งค์นญ๋˜๋ฉด ์ฒซ ๋ฒˆ์งธ ์‚ฌ์šฉ โ†’ ์˜ˆ: if๋Š” identifier๊ฐ€ ์•„๋‹ˆ๋ผ keyword(IF)๋กœ ์ฒ˜๋ฆฌ

  2. Longest match (๊ฐ€์žฅ ๊ธด ๋ฌธ์ž์—ด ๋งค์นญ)
    <= vs < โ†’ <= ์„ ํƒ๋จ


๐Ÿ“‹ 9ํŽ˜์ด์ง€ โ€” NFA for JLex Example Spec

NFA for JLex Example Spec

JLex ์˜ˆ์ œ์˜ NFA

โ€œThe NFAs for the different REs are combined as aboveโ€
๊ฐ ์ •๊ทœ์‹์˜ NFA๋ฅผ ํ•ฉ์นœ๋‹ค

๊ตฌ์กฐ:

s0
โ”œโ”€ ฮต โ†’ if ํŒจํ„ด NFA
โ”œโ”€ ฮต โ†’ < ํŒจํ„ด NFA
โ”œโ”€ ฮต โ†’ <= ํŒจํ„ด NFA
โ””โ”€ ฮต โ†’ id ํŒจํ„ด NFA

ฮต-transition์œผ๋กœ ์—ฐ๊ฒฐ โ†’ ํ•˜๋‚˜์˜ ํฐ NFA๋กœ ํ•ฉ์นจ โ†’ ์ดํ›„ DFA๋กœ ๋ณ€ํ™˜๋จ

โ€œInstead of an NFA, a DFA can also be usedโ€ โ†’ NFA ๋Œ€์‹  DFA๋„ ๊ฐ€๋Šฅ

๐Ÿ–ผ ๊ทธ๋ฆผ ์„ค๋ช…

  • ์‹œ์ž‘ ์ƒํƒœ s0์—์„œ ์—ฌ๋Ÿฌ ฮต transition์œผ๋กœ ๊ฐ ํŒจํ„ด๋ณ„ NFA๋กœ ๋ถ„๊ธฐ

๐Ÿ“‹ 10ํŽ˜์ด์ง€ โ€” NFA for JLex Example Spec (detail)

NFA for JLex Example Spec (detail)

JLex ์˜ˆ์ œ NFA ์ƒ์„ธ

ํŒจํ„ด ์ƒํƒœ ์ „์ด ์„ค๋ช…
if s1 โ†’ s2 โ†’ s3 i โ†’ f โ†’ โ€œifโ€ ์ธ์‹
< s4 โ†’ s5 โ€<โ€
<= s6 โ†’ s7 โ†’ s8 โ€<=โ€
identifier s9 โ†’ s10 (loop) LETTER โ†’ (LETTER or DIGIT ๋ฐ˜๋ณต)

๐Ÿ”ฅ ํ•ต์‹ฌ ๊ตฌ์กฐ

  • ํ•˜๋‚˜์˜ ์‹œ์ž‘ ์ƒํƒœ s0์—์„œ ฮต๋กœ ๊ฐ ํŒจํ„ด ์‹œ์ž‘
  • ๊ฐ ํŒจํ„ด NFA ๋”ฐ๋กœ ์กด์žฌ
  • ์ตœ์ข…์ ์œผ๋กœ: ๋ชจ๋“  ํ† ํฐ์„ ํ•˜๋‚˜์˜ NFA๋กœ ํ•ฉ์นœ ๊ฒƒ

๐Ÿ”ฅ 6~10ํŽ˜์ด์ง€ ํ•ต์‹ฌ ์š”์•ฝ

์ด ํŒŒํŠธ ํ•ต์‹ฌ 3๊ฐœ:

  1. ์Šค์บ๋„ˆ ๊ตฌํ˜„ ๋ฐฉ์‹ 2๊ฐ€์ง€: Table-driven / Direct-coded
  2. JLex ๊ทœ์น™: ์œ„์— ์žˆ๋Š” ๊ทœ์น™ ์šฐ์„  / longest match
  3. NFA ๊ตฌ์„ฑ: ๊ฐ ํŒจํ„ด NFA ์ƒ์„ฑ โ†’ ฮต๋กœ ํ•ฉ์ณ์„œ ํ•˜๋‚˜์˜ NFA

๐Ÿ“‹ 11ํŽ˜์ด์ง€ โ€” Limitations of Regular Languages

Limitations of Regular Languages

์ •๊ทœ์–ธ์–ด์˜ ํ•œ๊ณ„

์ด ํŽ˜์ด์ง€๋ถ€ํ„ฐ๋Š” ์•„์ฃผ ์ค‘์š”ํ•˜๋‹ค. ์ง€๊ธˆ๊นŒ์ง€๋Š” ์ •๊ทœํ‘œํ˜„์‹(RE)๊ณผ ์˜คํ† ๋งˆํƒ€๊ฐ€ scanner๋ฅผ ๋งŒ๋“œ๋Š” ๋ฐ ์–ผ๋งˆ๋‚˜ ์œ ์šฉํ•œ์ง€๋ฅผ ๋ดค๋‹ค. ๊ทธ๋Ÿฐ๋ฐ ์—ฌ๊ธฐ์„œ๋Š” ๋ฐ˜๋Œ€๋กœ, โ€œ์ •๊ทœํ‘œํ˜„์‹์ด ๊ทธ๋ ‡๊ฒŒ ์ข‹๋‹ค๋ฉด, ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์–ธ์–ด ์ „์ฒด๋ฅผ ๋‹ค ์ •๊ทœํ‘œํ˜„์‹์œผ๋กœ ํ‘œํ˜„ํ•˜๋ฉด ๋˜์ง€ ์•Š๋‚˜?โ€๋ผ๋Š” ์งˆ๋ฌธ์— ๋‹ตํ•˜๋ ค๊ณ  ํ•œ๋‹ค.

์ด ์Šฌ๋ผ์ด๋“œ์˜ ํ•ต์‹ฌ ํ๋ฆ„:

  • RE๋Š” ๊ฐ•๋ ฅํ•˜๋‹ค
  • ํ•˜์ง€๋งŒ ๋ชจ๋“  ๊ฑธ ํ‘œํ˜„ํ•  ์ˆ˜๋Š” ์—†๋‹ค
  • ๊ทธ๋ž˜์„œ ์ปดํŒŒ์ผ๋Ÿฌ์—๋Š” scanner ๋ง๊ณ  parser๊ฐ€ ๋”ฐ๋กœ ํ•„์š”ํ•˜๋‹ค

Advantages of Regular Expressions

์ •๊ทœํ‘œํ˜„์‹์˜ ์žฅ์ 

Simple & powerful notation for specifying patterns

ํŒจํ„ด์„ ๋ช…์‹œํ•˜๊ธฐ ์œ„ํ•œ ๋‹จ์ˆœํ•˜๋ฉด์„œ๋„ ๊ฐ•๋ ฅํ•œ ํ‘œ๊ธฐ๋ฒ•์ด๋‹ค.

์ •๊ทœํ‘œํ˜„์‹์€ ์งง์€ ๋ฌธ๋ฒ•์œผ๋กœ ๋งŽ์€ ๋ฌธ์ž์—ด ์ง‘ํ•ฉ์„ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด:

  • [0-9]+ โ†’ ์ˆซ์ž๊ฐ€ ํ•˜๋‚˜ ์ด์ƒ
  • [a-zA-Z_][a-zA-Z0-9_]* โ†’ identifier ํ˜•ํƒœ

์ฆ‰, RE์˜ ์žฅ์ ์€ โ€œ๋ฌธ์ž์—ด ํŒจํ„ด์„ ๊ธฐ์ˆ ํ•˜๋Š” ๋ฐ ๋งค์šฐ ์••์ถ•์ ์ด๊ณ  ๋ช…ํ™•ํ•˜๋‹คโ€๋Š” ๊ฒƒ์ด๋‹ค.

Automatic construction of fast recognizers (scanners)

๋น ๋ฅธ ์ธ์‹๊ธฐ(์Šค์บ๋„ˆ)๋ฅผ ์ž๋™์œผ๋กœ ๊ตฌ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค.

RE๋ฅผ ์จ ๋†“์œผ๋ฉด ๊ฑฐ๊ธฐ์„œ ๋๋‚˜๋Š” ๊ฒŒ ์•„๋‹ˆ๋ผ:

RE โ†’ NFA โ†’ DFA โ†’ ์ตœ์†Œ DFA โ†’ scanner code ์ƒ์„ฑ

์ด๋Ÿฐ ๊ณผ์ •์„ ํ†ตํ•ด ์‹ค์ œ๋กœ ์‹คํ–‰ ๊ฐ€๋Šฅํ•œ ์Šค์บ๋„ˆ๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค. ์ฆ‰, ์ •๊ทœํ‘œํ˜„์‹์€ ์ด๋ก ์šฉ ๊ธฐํ˜ธ๊ฐ€ ์•„๋‹ˆ๋ผ ์ž๋™ํ™” ๋„๊ตฌ(JLex, flex ๋“ฑ)์™€ ๋ฐ”๋กœ ์—ฐ๊ฒฐ๋˜๋Š” ์‹ค์šฉ์ ์ธ ํ‘œํ˜„ ๋ฐฉ์‹์ด๋‹ค.

Many patterns can be specified with REs

๋งŽ์€ ํŒจํ„ด๋“ค์„ ์ •๊ทœํ‘œํ˜„์‹์œผ๋กœ ๋ช…์‹œํ•  ์ˆ˜ ์žˆ๋‹ค.

scanner ๋‹จ๊ณ„์—์„œ ๋‹ค๋ฃฐ ๋Œ€๋ถ€๋ถ„์˜ ํ† ํฐ๋“ค์€ RE๋กœ ์ถฉ๋ถ„ํžˆ ํ‘œํ˜„ ๊ฐ€๋Šฅํ•˜๋‹ค:

  • ํ‚ค์›Œ๋“œ: if, while, return
  • identifier
  • ์ •์ˆ˜ literal, ์‹ค์ˆ˜ literal
  • ์—ฐ์‚ฐ์ž: +, -, <=
  • ๊ตฌ๋‘์ : ;, (, )

Example โ€” an expression grammar

์˜ˆ์‹œ โ€” ์‹(expression) ๋ฌธ๋ฒ•

๊ต์ˆ˜๋‹˜์ด ์ผ๋ถ€๋Ÿฌ expression์„ ์˜ˆ๋กœ ๋“ ๋‹ค. ์™œ๋ƒํ•˜๋ฉด expression์€ ๋”ฑ RE๋กœ๋Š” ๋ถ€์กฑํ•˜๊ณ  CFG๊ฐ€ ํ•„์š”ํ•œ ๋Œ€ํ‘œ ์‚ฌ๋ก€์ด๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

Term: [a-zA-Z] ([a-zA-Z] | [0-9])*

ํ•ญ(Term): ์•ŒํŒŒ๋ฒณ ํ•˜๋‚˜๋กœ ์‹œ์ž‘ํ•˜๊ณ , ๊ทธ ๋’ค์— ์•ŒํŒŒ๋ฒณ์ด๋‚˜ ์ˆซ์ž๊ฐ€ 0๊ฐœ ์ด์ƒ ์˜ค๋Š” ํ˜•ํƒœ

๊ตฌ์กฐ: ์ฒซ ๊ธ€์ž = ์•ŒํŒŒ๋ฒณ, ์ดํ›„ = ์•ŒํŒŒ๋ฒณ ๋˜๋Š” ์ˆซ์ž ๋ฐ˜๋ณต
โ†’ abc, a1, x99 ๊ฐ™์€ ๋ฌธ์ž์—ด. ์ด๊ฑด ์ •๊ทœํ‘œํ˜„์‹์œผ๋กœ ์ž˜ ์ •์˜๋˜๋Š” ๋ถ€๋ถ„์ด๋‹ค.

Op: + | - | * | /

์—ฐ์‚ฐ์ž: + ๋˜๋Š” - ๋˜๋Š” * ๋˜๋Š” /
์ด๊ฒƒ๋„ ์—ญ์‹œ RE๋กœ ํ‘œํ˜„์ด ์‰ฝ๋‹ค.

Expr: ( Term Op )* Term

์‹: (Term Op)๊ฐ€ 0๋ฒˆ ์ด์ƒ ๋ฐ˜๋ณต๋˜๊ณ  ๋งˆ์ง€๋ง‰์— Term ํ•˜๋‚˜๊ฐ€ ์˜ค๋Š” ํ˜•ํƒœ

ํ‘œ๋ฉด์ ์œผ๋กœ๋Š” ๊ทธ๋Ÿด๋“ฏํ•ด ๋ณด์ด์ง€๋งŒ, ๊ด„ํ˜ธ๊ฐ€ ์ค‘์ฒฉ๋˜๋Š” ์ผ๋ฐ˜์ ์ธ expression ์ „์ฒด๋ฅผ ํ‘œํ˜„ํ•˜์ง€ ๋ชปํ•œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค๋ฉด:

  • (a+b), ((a+b)*c), a*(b+c) ๊ฐ™์€ ๊ตฌ์กฐ๋Š” ๋‹จ์ˆœํžˆ (Term Op)* Term๋กœ๋Š” ์•ˆ ๋œ๋‹ค.
  • ์™œ๋ƒํ•˜๋ฉด Term ์•ˆ์— ๋˜ Expr๊ฐ€ ๋“ค์–ด๊ฐˆ ์ˆ˜ ์žˆ์–ด์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.
  • ์ฆ‰, expression์˜ ์ง„์งœ ๋ณธ์งˆ์€ ์žฌ๊ท€ ๊ตฌ์กฐ์ธ๋ฐ, RE๋Š” ๊ทธ๋Ÿฐ ์žฌ๊ท€์  ์ค‘์ฒฉ์„ ์ œ๋Œ€๋กœ ํ‘œํ˜„ํ•˜์ง€ ๋ชปํ•œ๋‹ค.

Of course, this would generate a DFAโ€ฆ

โ€œ๊ทธ๋ž˜, ์ด๋Ÿฐ ๋‹จ์ˆœํ™”๋œ ํ˜•ํƒœ๋Š” DFA๋กœ ๋งŒ๋“ค ์ˆ˜ ์žˆ์ง€. ๊ทธ๋Ÿฐ๋ฐ ์ง„์งœ ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์–ธ์–ด์˜ ์‹ ์ „์ฒด๋ฅผ ๋‹ค๋ฃฐ ์ˆ˜ ์žˆ๋А๋ƒ? ๊ทธ๊ฑด ์•„๋‹ˆ๋‹ค.โ€


If REs are so useful โ€ฆ Why not use them for everything?

์ •๊ทœํ‘œํ˜„์‹์ด ๊ทธ๋ ‡๊ฒŒ ์œ ์šฉํ•˜๋‹ค๋ฉด โ€” ์™œ ๋ชจ๋“  ๊ฒƒ์— ์ •๊ทœํ‘œํ˜„์‹์„ ์“ฐ์ง€ ์•Š๋Š”๊ฐ€?

์ด ๋ฌธ์žฅ์ด ์ด ํŽ˜์ด์ง€์˜ ํ•ต์‹ฌ ์งˆ๋ฌธ์ด๋‹ค. ์ด์ œ๋ถ€ํ„ฐ๋Š” RE์˜ ํ‘œํ˜„๋ ฅ ํ•œ๊ณ„(expressive power)๋ฅผ ์„ค๋ช…ํ•˜๊ฒ ๋‹ค๋Š” ์˜ˆ๊ณ ๋‹ค.


๐Ÿ“‹ 12ํŽ˜์ด์ง€ โ€” Limitations of Regular Languages (cont.)

Limitations of Regular Languages (cont.)

์ •๊ทœ์–ธ์–ด์˜ ํ•œ๊ณ„ (๊ณ„์†)

Regular expressions are limited in what can be expressed.

์ •๊ทœํ‘œํ˜„์‹์€ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒƒ์— ํ•œ๊ณ„๊ฐ€ ์žˆ๋‹ค.

ํ•ต์‹ฌ: RE๋กœ ํ‘œํ˜„ ๊ฐ€๋Šฅํ•œ ์–ธ์–ด / RE๋กœ ์ ˆ๋Œ€ ํ‘œํ˜„ ๋ถˆ๊ฐ€ํ•œ ์–ธ์–ด โ€” ์ด๊ฑด โ€œ์‹ค๋ ฅ ์ฐจ์ดโ€๊ฐ€ ์•„๋‹ˆ๋ผ โ€œํ˜•์‹ ์ž์ฒด์˜ ํ•œ๊ณ„โ€๋‹ค.


Example: 0โฟ1โฟ

์ •๊ทœํ‘œํ˜„์‹์œผ๋กœ ํ‘œํ˜„ ๋ถˆ๊ฐ€๋Šฅํ•œ ๋Œ€ํ‘œ ์˜ˆ์‹œ

์ด ์–ธ์–ด๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค: 01 / 0011 / 000111 / 00001111 โ€ฆ
์กฐ๊ฑด: ์•ž์—๋Š” 0๋งŒ / ๋’ค์—๋Š” 1๋งŒ / 0์˜ ๊ฐœ์ˆ˜์™€ 1์˜ ๊ฐœ์ˆ˜๊ฐ€ ์ •ํ™•ํžˆ ๊ฐ™์•„์•ผ ํ•จ

๋ฌธ์ œ๋Š” ๋งˆ์ง€๋ง‰ ์กฐ๊ฑด์ด๋‹ค. RE๋‚˜ DFA๋Š” ๋ฌธ์ž์—ด์„ ์™ผ์ชฝ์—์„œ ์˜ค๋ฅธ์ชฝ์œผ๋กœ ๋ณด๋ฉด์„œ ํ˜„์žฌ ์ƒํƒœ๋งŒ ๊ธฐ์–ตํ•œ๋‹ค. 0์ด ๋ช‡ ๊ฐœ ๋‚˜์™”๋Š”์ง€ ๋ฌดํ•œํžˆ ์ •ํ™•ํ•˜๊ฒŒ ๊ธฐ์–ตํ•ด ๋‘์—ˆ๋‹ค๊ฐ€ ๋’ค์˜ 1 ๊ฐœ์ˆ˜์™€ ๋งž์ถฐ์•ผ ํ•˜๋ฏ€๋กœ, ์œ ํ•œํ•œ ์ƒํƒœ๋งŒ ๊ฐ€์ง„ DFA/์ •๊ทœ์–ธ์–ด ์ฒด๊ณ„๋กœ๋Š” ์•ˆ ๋œ๋‹ค.


For the same reason, we cannot specify the set of arithmetic expressions for which left and right parentheses match.

๊ฐ™์€ ์ด์œ ๋กœ, ์™ผ์ชฝ ๊ด„ํ˜ธ์™€ ์˜ค๋ฅธ์ชฝ ๊ด„ํ˜ธ๊ฐ€ ์„œ๋กœ ์ •ํ™•ํžˆ ๋งž๋Š” ์‚ฐ์ˆ ์‹๋“ค์˜ ์ง‘ํ•ฉ๋„ ์ •๊ทœํ‘œํ˜„์‹์œผ๋กœ ๋ช…์‹œํ•  ์ˆ˜ ์—†๋‹ค.

์˜ˆ: (a) ๋งž์Œ / ((a)) ๋งž์Œ / (() ์•ˆ ๋งž์Œ / ()) ์•ˆ ๋งž์Œ

์ด โ€œ๊ด„ํ˜ธ ์ง ๋งž์ถ”๊ธฐโ€ ๋ฌธ์ œ๋Š” 0โฟ1โฟ์™€ ๋ณธ์งˆ์ด ๊ฐ™๋‹ค:

  • ์—ด๋ฆฐ ๊ด„ํ˜ธ (๋ฅผ ๋ช‡ ๊ฐœ ๋ดค๋Š”์ง€ ๊ธฐ์–ตํ•ด์•ผ ํ•˜๊ณ 
  • ๋‹ซํžŒ ๊ด„ํ˜ธ )๊ฐ€ ๋‚˜์˜ฌ ๋•Œ ํ•˜๋‚˜์”ฉ ๋Œ€์‘์‹œ์ผœ์•ผ ํ•˜๋ฉฐ
  • ์ค‘๊ฐ„์— ์ค‘์ฒฉ๋„ ๊ฐ€๋Šฅํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค

์ฆ‰, RE๋Š” โ€œ์ด ๊ธ€์ž ๋‹ค์Œ์—” ์ € ๊ธ€์žโ€ ๊ฐ™์€ ํ‰๋ฉด์ ์ธ ํŒจํ„ด์—๋Š” ๊ฐ•ํ•˜์ง€๋งŒ, โ€œ์•ž์—์„œ ์—ด์–ด ๋†“์€ ๊ตฌ์กฐ๋ฅผ ๋’ค์—์„œ ์ •ํ™•ํžˆ ๋‹ซ์•„์•ผ ํ•œ๋‹คโ€ ๊ฐ™์€ ๊ณ„์ธต์  ๊ตฌ์กฐ๋Š” ๋‹ค๋ฃจ์ง€ ๋ชปํ•œ๋‹ค.

์˜ˆ์‹œ: (a), ((a)), (((a))), ((((a)))) โ€” ์ค‘์ฒฉ ๊นŠ์ด๊ฐ€ ๋ฌดํ•œํžˆ ์ฆ๊ฐ€ ๊ฐ€๋Šฅ


Regular expressions only usable to specify keywords, identifiers, literals, operators and punctuation characters of a programming language.

์ •๊ทœํ‘œํ˜„์‹์€ ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์–ธ์–ด์˜ ํ‚ค์›Œ๋“œ, ์‹๋ณ„์ž, ๋ฆฌํ„ฐ๋Ÿด, ์—ฐ์‚ฐ์ž, ๊ตฌ๋‘์  ๋ฌธ์ž๋ฅผ ๋ช…์‹œํ•˜๋Š” ๋ฐ์—๋งŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

์ด ๋ฌธ์žฅ์€ ์ปดํŒŒ์ผ๋Ÿฌ ๊ตฌ์กฐ๋ฅผ ์ •ํ™•ํžˆ ๋‚˜๋ˆ ์ค€๋‹ค:

  • RE๊ฐ€ ๋งก๋Š” ์ผ = lexical level (ํ† ํฐ ํ•˜๋‚˜ํ•˜๋‚˜์˜ ๋ชจ์–‘)

For everything else (expressions, statements, nested statements, โ€ฆ) we need a more powerful mechanism

๊ทธ ๋ฐ–์˜ ๋ชจ๋“  ๊ฒƒ๋“ค์„ ์œ„ํ•ด์„œ๋Š” ๋” ๊ฐ•๋ ฅํ•œ ๋ฉ”์ปค๋‹ˆ์ฆ˜์ด ํ•„์š”ํ•˜๋‹ค.

์—ฌ๊ธฐ์„œ โ€œmore powerful mechanismโ€์ด ๋ฐ”๋กœ CFG์™€ parser๋‹ค. ์™œ๋ƒํ•˜๋ฉด:

  • expression ์•ˆ์— expression ๋“ค์–ด๊ฐ
  • if๋ฌธ ์•ˆ์— statement ๋“ค์–ด๊ฐ
  • while๋ฌธ ์•ˆ์— block ๋“ค์–ด๊ฐ
  • block ์•ˆ์— ๋˜ if/while ๋“ค์–ด๊ฐ

๊ทธ๋ž˜์„œ scanner ๋‹ค์Œ์— parser๊ฐ€ ๋ฐ˜๋“œ์‹œ ํ•„์š”ํ•˜๋‹ค.


๐Ÿ“‹ 13ํŽ˜์ด์ง€ โ€” Regular Expressions and Context-Free Grammars

Regular Expressions and Context-Free Grammars

์ •๊ทœํ‘œํ˜„์‹๊ณผ ๋ฌธ๋งฅ์ž์œ ๋ฌธ๋ฒ•

์—ฌ๊ธฐ์„œ๋Š” RE์™€ CFG๋ฅผ ๋‹จ์ˆœ ๋„๊ตฌ ๋น„๊ต๊ฐ€ ์•„๋‹ˆ๋ผ ํ˜•์‹์–ธ์–ด์˜ ํ‘œํ˜„๋ ฅ ๊ด€์ ์—์„œ ๊ตฌ๋ถ„ํ•œ๋‹ค.


โ€œThe โ€˜languagesโ€™ that can be defined by REs and Context-free grammars (CFGs) have been extensively studied by theoretical computer scientists.โ€

์ •๊ทœํ‘œํ˜„์‹๊ณผ ๋ฌธ๋งฅ์ž์œ ๋ฌธ๋ฒ•์œผ๋กœ ์ •์˜ํ•  ์ˆ˜ ์žˆ๋Š” โ€œ์–ธ์–ด๋“คโ€์€ ์ด๋ก  ์ปดํ“จํ„ฐ๊ณผํ•™์ž๋“ค์— ์˜ํ•ด ๋งค์šฐ ๊ด‘๋ฒ”์œ„ํ•˜๊ฒŒ ์—ฐ๊ตฌ๋˜์–ด ์™”๋‹ค.

์—ฌ๊ธฐ์„œ โ€œlanguageโ€๋Š” ์ž์—ฐ์–ด๊ฐ€ ์•„๋‹ˆ๋ผ ๋ฌธ์ž์—ด ์ง‘ํ•ฉ์ด๋ผ๋Š” ๋œป์ด๋‹ค: identifier๋“ค์˜ ์ง‘ํ•ฉ / ์˜ฌ๋ฐ”๋ฅธ arithmetic expression๋“ค์˜ ์ง‘ํ•ฉ / ๊ด„ํ˜ธ๊ฐ€ ์ง ๋งž๋Š” ๋ฌธ์ž์—ด๋“ค์˜ ์ง‘ํ•ฉ ๋“ฑ.


RE are a โ€œweakerโ€ formalism than CFGs

์ •๊ทœํ‘œํ˜„์‹์€ CFG๋ณด๋‹ค โ€œ๋” ์•ฝํ•œโ€ ํ˜•์‹ ์ฒด๊ณ„์ด๋‹ค.

์—ฌ๊ธฐ์„œ โ€œweakerโ€๋Š” ์„ฑ๋Šฅ์ด ๋‚˜์˜๋‹ค๊ฑฐ๋‚˜ ์“ธ๋ชจ์—†๋‹ค๋Š” ๋œป์ด ์•„๋‹ˆ๋‹ค. ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋Š” ์–ธ์–ด์˜ ๋ฒ”์œ„๊ฐ€ ๋” ์ข๋‹ค๋Š” ๋œป์ด๋‹ค.

RE๋กœ ํ‘œํ˜„ ๊ฐ€๋Šฅํ•œ ์–ธ์–ด โŠ‚ CFG๋กœ ํ‘œํ˜„ ๊ฐ€๋Šฅํ•œ ์–ธ์–ด

Any language expressible by a RE can be expressed by a CFG but not the other way around!

์ •๊ทœํ‘œํ˜„์‹์œผ๋กœ ํ‘œํ˜„ ๊ฐ€๋Šฅํ•œ ๋ชจ๋“  ์–ธ์–ด๋Š” CFG๋กœ๋„ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ์ง€๋งŒ, ๊ทธ ๋ฐ˜๋Œ€๋Š” ์„ฑ๋ฆฝํ•˜์ง€ ์•Š๋Š”๋‹ค.

  • regular language โ†’ CFG๋กœ ํ‘œํ˜„ ๊ฐ€๋Šฅ
  • ํ•˜์ง€๋งŒ ๋ชจ๋“  CFG language๊ฐ€ regular์ธ ๊ฑด ์•„๋‹˜
  • ์˜ˆ: identifier ์ง‘ํ•ฉ โ†’ RE ๊ฐ€๋Šฅ, CFG๋„ ๊ฐ€๋Šฅ / ๊ด„ํ˜ธ ์ง ๋งž๋Š” ๋ฌธ์ž์—ด โ†’ CFG ๊ฐ€๋Šฅ, RE ๋ถˆ๊ฐ€

์ฆ‰ CFG๊ฐ€ strictly more powerful ํ•˜๋‹ค. ๊ทธ๋ž˜์„œ parser๋Š” scanner๋ณด๋‹ค ๋” ๊ฐ•ํ•œ ํ˜•์‹์„ ์‚ฌ์šฉํ•œ๋‹ค.


The languages expressible as RE are called regular languages

์ •๊ทœํ‘œํ˜„์‹์œผ๋กœ ํ‘œํ˜„ ๊ฐ€๋Šฅํ•œ ์–ธ์–ด๋“ค์„ ์ •๊ทœ์–ธ์–ด๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค.

  • RE๋กœ ํ‘œํ˜„ ๊ฐ€๋Šฅ = DFA/NFA๋กœ ์ธ์‹ ๊ฐ€๋Šฅ = regular language

Generally: a language that exhibits โ€œself embeddingโ€ cannot be expressed by a RE.

์ผ๋ฐ˜์ ์œผ๋กœ โ€œ์ž๊ธฐ ๋‚ดํฌ(self embedding)โ€๋ฅผ ๋ณด์ด๋Š” ์–ธ์–ด๋Š” ์ •๊ทœํ‘œํ˜„์‹์œผ๋กœ ํ‘œํ˜„ํ•  ์ˆ˜ ์—†๋‹ค.

self embedding์ด๋ž€ ๊ตฌ์กฐ ์•ˆ์— ๊ฐ™์€ ์ข…๋ฅ˜์˜ ๊ตฌ์กฐ๊ฐ€ ๋‹ค์‹œ ๋“ค์–ด๊ฐ€๋Š” ๊ฒƒ์ด๋‹ค:

  • expression ์•ˆ์— expression
  • statement ์•ˆ์— statement
  • parentheses ์•ˆ์— parentheses

์ฆ‰, โ€œ๋‚˜์™€ ๊ฐ™์€ ์ข…๋ฅ˜์˜ ๊ตฌ์กฐ๊ฐ€ ๋‚ด ์•ˆ์— ๋˜ ๋“ค์–ด์˜จ๋‹คโ€๋Š” ์žฌ๊ท€์  ์„ฑ์งˆ์ด๋‹ค.
์˜ˆ: expr โ†’ (expr)

RE๋Š” ์ด๋Ÿฐ ์ž๊ธฐ ๋‚ดํฌ๋ฅผ ํ‘œํ˜„ํ•˜์ง€ ๋ชปํ•œ๋‹ค. ๋ฐ˜๋ฉด ์ž๊ธฐ ๋‚ดํฌ๋Š” ๊นŠ์ด๊ฐ€ ๋ฌดํ•œํžˆ ์ฆ๊ฐ€ํ•  ์ˆ˜ ์žˆ๋Š” ๊ณ„์ธต ๊ตฌ์กฐ๋ฅผ ์š”๊ตฌํ•œ๋‹ค.


Programming languages exhibit self embedding.

ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์–ธ์–ด๋Š” ์ž๊ธฐ ๋‚ดํฌ๋ฅผ ๋ณด์ธ๋‹ค.

์ด๊ฑด ๊ณง, ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์–ธ์–ด๋Š” RE๋งŒ์œผ๋กœ๋Š” ๋‹ค๋ฃฐ ์ˆ˜ ์—†๋‹ค๋Š” ๋ง์ด๋‹ค.
์ฆ‰, ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์–ธ์–ด๋Š” ๋ณธ์งˆ์ ์œผ๋กœ parser๊ฐ€ ํ•„์š”ํ•˜๋„๋ก ์„ค๊ณ„๋˜์–ด ์žˆ๋‹ค.


Example: an expression can contain another expression

ํ•˜๋‚˜์˜ ์‹์€ ๋˜ ๋‹ค๋ฅธ ์‹์„ ํฌํ•จํ•  ์ˆ˜ ์žˆ๋‹ค โ†’ self embedding์˜ ๋Œ€ํ‘œ ์‚ฌ๋ก€

์˜ˆ: a+b / (a+b)*c / ((a+b)*(c-d)) โ€” ์‹ ์•ˆ์— ์‹์ด ๋˜ ๋“ค์–ด๊ฐ„๋‹ค. ์ด๊ฒŒ parser๊ฐ€ ํ•„์š”ํ•œ ์ด์œ ๋‹ค.


expr ::= id | integer | - expr | ( expr ) | expr op expr

๊ทœ์น™ ์˜๋ฏธ ์˜ˆ์‹œ
expr ::= id ์‹์€ ์‹๋ณ„์ž ํ•˜๋‚˜ x
expr ::= integer ์‹์€ ์ •์ˆ˜ ํ•˜๋‚˜ 3
expr ::= - expr ์‹ ์•ž์— unary minus -x, -(a+b)
expr ::= ( expr ) ๊ด„ํ˜ธ ์•ˆ์— ์‹ (x+1)
expr ::= expr op expr ์‹๊ณผ ์‹ ์‚ฌ์ด์— ์—ฐ์‚ฐ์ž a+b, x*y

ํ•ต์‹ฌ: expr์˜ ์ •์˜ ์•ˆ์— ๋˜ expr์ด ๋“ฑ์žฅ โ†’ ์ด๊ฒŒ ์žฌ๊ท€๊ณ , ์ด๊ฒŒ self embedding์ด๋‹ค.

op ::= + | - | * | /

์—ฐ์‚ฐ์ž(op)๋Š” +, -, *, / ์ค‘ ํ•˜๋‚˜์ด๋‹ค. ์ด ๋ถ€๋ถ„ ์ž์ฒด๋Š” RE๋กœ๋„ ๊ฐ€๋Šฅํ•˜๋‹ค.


How many expressions can be derived? Infinitely many.

๋ช‡ ๊ฐœ์˜ ์‹์ด ์œ ๋„๋  ์ˆ˜ ์žˆ๋Š”๊ฐ€? ๋ฌดํ•œํžˆ ๋งŽ๋‹ค.

์žฌ๊ท€ ๋•Œ๋ฌธ์— ํ‘œํ˜„ ๊ฐ€๋Šฅํ•œ ๊ตฌ์กฐ์˜ ๊นŠ์ด์— ์ œํ•œ์ด ์—†๋‹ค๋Š” ๋œป์ด๋‹ค. ์ฆ‰, expression grammar๋Š” ๊ธธ์ด๋งŒ ๊ธด ๊ฒŒ ์•„๋‹ˆ๋ผ ๊ตฌ์กฐ๊ฐ€ ๋ฌดํ•œํžˆ ์ค‘์ฒฉ๋  ์ˆ˜ ์žˆ๋‹ค. ์ด๋Ÿฐ ์„ฑ์งˆ์ด regular language์˜ ๋ฒ”์œ„๋ฅผ ๋„˜๋Š”๋‹ค.


๐Ÿ“‹ 14ํŽ˜์ด์ง€ โ€” Extra Materials: Training for Lexical Analysis

Extra Materials: Training for Lexical Analysis

์ถ”๊ฐ€ ์ž๋ฃŒ: Lexical Analysis ์—ฐ์Šต

์ด ํŽ˜์ด์ง€๋Š” ๋ณธ๊ฐ•์˜ ๋‚ด์šฉ์ด๋ผ๊ธฐ๋ณด๋‹ค ์ถ”๊ฐ€ ์—ฐ์Šต ํŒŒํŠธ๊ฐ€ ์‹œ์ž‘๋œ๋‹ค๋Š” ํ‘œ์ง€๋‹ค.

  • ์•ž 11~13ํŽ˜์ด์ง€: ์ด๋ก  ์„ค๋ช…
  • 15ํŽ˜์ด์ง€๋ถ€ํ„ฐ: ์—ฐ์Šต๋ฌธ์ œ

๐Ÿ–ผ ๊ทธ๋ฆผ ์„ค๋ช…

  • ์ค‘์•™ ํฐ ์ œ๋ชฉ: Extra Materials
  • ๊ทธ ์•„๋ž˜: Training for Lexical Analysis
  • ํ•˜๋‹จ: ํ•™๊ต ๋กœ๊ณ 

1ํŽ˜์ด์ง€์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์‹ค์งˆ ๋‚ด์šฉ๋ณด๋‹ค๋Š” โ€œ์ด์ œ ์—ฐ์Šต ์„น์…˜ ์‹œ์ž‘โ€์„ ์•Œ๋ ค์ฃผ๋Š” ํ‘œ์ง€ ์—ญํ• ์ด๋‹ค.


๐Ÿ“‹ 15ํŽ˜์ด์ง€ โ€” Writing Regular Expressions

Writing Regular Expressions

์ •๊ทœํ‘œํ˜„์‹ ์ž‘์„ฑํ•˜๊ธฐ

์ด์ œ๋ถ€ํ„ฐ๋Š” ์•ž์—์„œ ๋ฐฐ์šด ๋‚ด์šฉ์„ ์‹ค์ œ๋กœ ์จ๋ณด๋Š” ์—ฐ์Šต์ด๋‹ค.

Write a regular expression for each of the following sets of tokens:
๋‹ค์Œ ๊ฐ๊ฐ์˜ ํ† ํฐ ์ง‘ํ•ฉ์— ๋Œ€ํ•ด ์ •๊ทœํ‘œํ˜„์‹์„ ์ž‘์„ฑํ•˜์‹œ์˜ค.

์ •๊ทœ์‹์„ ์“ธ ๋•Œ ํ™•์ธํ•ด์•ผ ํ•  ๊ฒƒ๋“ค:

  • ๋ฐ˜๋“œ์‹œ ํŠน์ • ์ ‘๋‘์–ด๊ฐ€ ์žˆ์–ด์•ผ ํ•˜๋Š”์ง€
  • ๋ฐ˜๋ณต์ด ๋˜๋Š”์ง€
  • underscore๊ฐ€ ์–ด๋””์—๋‚˜ ๊ฐ€๋Šฅํ•œ์ง€
  • ๋งˆ์ง€๋ง‰ ๊ธ€์ž ์ œ์•ฝ์ด ์žˆ๋Š”์ง€

1) Ruby binary literals

โ€œ0bโ€ ๋‹ค์Œ์— ์ด์ง„์ˆ˜๊ฐ€ ์˜ค๋Š” Ruby์˜ ์ด์ง„ ๋ฆฌํ„ฐ๋Ÿด
์˜ˆ: 0b001011, 0b01

2) Ruby binary literals (with optional underscore)

๋‘ ๊ฐœ์˜ ์ด์ง„ ์ˆซ์ž ์‚ฌ์ด์— ์„ ํƒ์ ์œผ๋กœ underscore _๊ฐ€ ๋“ค์–ด๊ฐˆ ์ˆ˜ ์žˆ๋Š” Ruby ์ด์ง„ ๋ฆฌํ„ฐ๋Ÿด
์˜ˆ: 0b0_101, 0b11_01 ๊ฐ€๋Šฅ / 0b_1, 0b1_ ๋ถˆ๊ฐ€

3) Ada identifiers

๋ฌธ์ž(letter) ํ•˜๋‚˜๋กœ ์‹œ์ž‘ํ•˜๊ณ , ๊ทธ ๋’ค์— ๋ฌธ์žยท์ˆซ์žยทunderscore๊ฐ€ ์ž„์˜ ๊ฐœ์ˆ˜ ์˜ฌ ์ˆ˜ ์žˆ๋‹ค.
๋‹จ, underscore๋กœ ๋๋‚˜๋ฉด ์•ˆ ๋˜๊ณ , underscore๊ฐ€ ๋‘ ๋ฒˆ ์—ฐ์† ๋‚˜์™€์„œ๋„ ์•ˆ ๋œ๋‹ค.

4) Floating-point number

ํ•˜๋‚˜ ์ด์ƒ์˜ ์ˆซ์ž(์ •์ˆ˜๋ถ€) ๋’ค์— ์†Œ์ˆ˜์  .์ด ์˜ค๊ณ , ๊ทธ ๋’ค์— ํ•˜๋‚˜ ์ด์ƒ์˜ ์ˆซ์ž(์†Œ์ˆ˜๋ถ€)๊ฐ€ ์˜ค๋Š” ํ˜•ํƒœ
์˜ˆ: 2.24, 0.1234

5) Floating-point number in scientific notation

์œ„์™€ ๊ฐ™์€ ๋ถ€๋™์†Œ์ˆ˜์  ์ˆ˜ ๋’ค์—, ์„ ํƒ์ ์œผ๋กœ e ๋˜๋Š” E์™€ ๋ถ€ํ˜ธ ์žˆ๋Š” ์ •์ˆ˜ ์ง€์ˆ˜๊ฐ€ ๋”ฐ๋ผ์˜ฌ ์ˆ˜ ์žˆ๋‹ค.
์˜ˆ: 1.2e-2, 2.3E+34, 2.3E34


๐Ÿ“‹ 16ํŽ˜์ด์ง€ โ€” ์—ฐ์Šต๋ฌธ์ œ 1๋ฒˆ: Ruby Binary Literals

1) Ruby binary literals consisting of โ€œ0bโ€ followed by the binary number.

โ€œ0bโ€ ๋‹ค์Œ์— ์ด์ง„์ˆ˜๊ฐ€ ์˜ค๋Š” Ruby์˜ ์ด์ง„ ๋ฆฌํ„ฐ๋Ÿด
์˜ˆ: 0b001011, 0b01

๋ฌธ์ œ ์˜๋ฏธ ์„ค๋ช…

ํ—ˆ์šฉ ์ด์œ 
0b0 ์กฐ๊ฑด ๋งŒ์กฑ
0b1 ์กฐ๊ฑด ๋งŒ์กฑ
0b01 ์กฐ๊ฑด ๋งŒ์กฑ
0b101010 ์กฐ๊ฑด ๋งŒ์กฑ
๋ถˆํ—ˆ ์ด์œ 
0b ๋’ค์— ์ด์ง„์ˆ˜ ์—†์Œ
0b102 2๊ฐ€ ๋“ค์–ด๊ฐ
0B01 ๋Œ€๋ฌธ์ž B
00b01 ์‹œ์ž‘ ํ˜•์‹ ๋‹ค๋ฆ„

์ •๊ทœํ‘œํ˜„์‹

0b[01]+

๋˜๋Š” ๋™์ผํ•œ ํ‘œํ˜„:

0b(0|1)+

์™œ +๊ฐ€ ํ•„์š”ํ•œ๊ฐ€?

  • [01]* ๋ผ๊ณ  ์“ฐ๋ฉด 0b ๋’ค์— ์•„๋ฌด๊ฒƒ๋„ ์—†๋Š” ๊ฒฝ์šฐ๋„ ํ—ˆ์šฉ๋จ
  • ์˜ˆ์‹œ์™€ ๋ฌธ์ œ ๋ฌธ๋งฅ์ƒ binary number๊ฐ€ ์™€์•ผ ํ•˜๋ฏ€๋กœ ์ตœ์†Œ 1์ž๋ฆฌ ํ•„์š”
  • ๋”ฐ๋ผ์„œ [01]+์ด ๋งž๋‹ค

๐Ÿ“‹ 17ํŽ˜์ด์ง€ โ€” ์—ฐ์Šต๋ฌธ์ œ 2๋ฒˆ: Ruby Binary Literals (with optional underscore)

2) Ruby binary literals, with an optional underscore (โ€œ_โ€) between a pair of binary digits.

๋‘ ๊ฐœ์˜ ์ด์ง„ ์ˆซ์ž ์‚ฌ์ด์— ์„ ํƒ์ ์œผ๋กœ underscore๊ฐ€ ๋“ค์–ด๊ฐˆ ์ˆ˜ ์žˆ๋Š” Ruby ์ด์ง„ ๋ฆฌํ„ฐ๋Ÿด
์˜ˆ: 0b0_101, 0b11_01 ๊ฐ€๋Šฅ / 0b_1, 0b1_ ๋ถˆ๊ฐ€

๋ฌธ์ œ ์˜๋ฏธ ์„ค๋ช…

ํ•ต์‹ฌ: _๋Š” ์•„๋ฌด ๋ฐ๋‚˜ ๋ถ™๋Š” ๊ฒŒ ์•„๋‹ˆ๋ผ, ๋ฐ˜๋“œ์‹œ binary digit์™€ binary digit ์‚ฌ์ด์—๋งŒ ์˜ฌ ์ˆ˜ ์žˆ๋‹ค.

ํ—ˆ์šฉ ์ด์œ 
0b0_1 digit _ digit
0b10_10 digit ์‚ฌ์ด underscore
0b0_101 digit ์‚ฌ์ด underscore
๋ถˆํ—ˆ ์ด์œ 
0b_1 ์‹œ์ž‘ ์งํ›„ underscore (์•ž์ชฝ digit ์—†์Œ)
0b1_ ๋์ด underscore (๋’ค์ชฝ digit ์—†์Œ)
0b__1 underscore ์—ฐ์†
0b0__1 ์—ฐ์† underscore

์ž˜๋ชป๋œ ์ ‘๊ทผ (ํ•จ์ •)

0b[01_]+   โ† ํ‹€๋ฆผ!

์ด๋ ‡๊ฒŒ ์“ฐ๋ฉด 0b_1, 0b1_, 0b___ ๊ฐ™์€ ๊ฒƒ๋„ ํ—ˆ์šฉํ•ด ๋ฒ„๋ฆฐ๋‹ค. ๋‹จ์ˆœ ํ—ˆ์šฉ ๋ฌธ์ž ์ง‘ํ•ฉ ๋‚˜์—ด๋งŒ์œผ๋กœ๋Š” ์œ„์น˜ ์ œ์•ฝ์„ ๋ฐ˜์˜ ๋ชปํ•จ.

์ •ํ™•ํ•œ ์ •๊ทœํ‘œํ˜„์‹

0b[01](_?[01])*

๊ตฌ์กฐ ๋ถ„์„:

  • ์ฒซ ๊ธ€์ž๋Š” ๋ฐ˜๋“œ์‹œ [01] โ†’ 0b_1 ๋ง‰ํž˜
  • ๋ฐ˜๋ณต ๋‹จ์œ„ (_?[01]): underscore๊ฐ€ ์˜ค๋ฉด ๋ฐ˜๋“œ์‹œ ๋’ค์— [01]๊ฐ€ ๋”ฐ๋ผ์˜ด โ†’ 0b1_ ๋ง‰ํž˜
  • underscore ์—ฐ์† ๋ถˆ๊ฐ€ (_?๋Š” ์ตœ๋Œ€ ํ•˜๋‚˜) โ†’ 0b0__1 ๋ง‰ํž˜

๐Ÿ“‹ 18ํŽ˜์ด์ง€ โ€” ์—ฐ์Šต๋ฌธ์ œ 3๋ฒˆ: Ada Identifiers

3) Ada identifiers: a letter followed by any number of letters, digits, and underlines.

Ada ์‹๋ณ„์ž: ๋ฌธ์ž ํ•˜๋‚˜๋กœ ์‹œ์ž‘ํ•˜๊ณ , ๊ทธ ๋’ค์— ๋ฌธ์ž, ์ˆซ์ž, underscore๊ฐ€ ์ž„์˜ ๊ฐœ์ˆ˜ ์˜ฌ ์ˆ˜ ์žˆ๋‹ค.
๋‹จ, underscore๋กœ ๋๋‚˜๋ฉด ์•ˆ ๋˜๊ณ , underscore๊ฐ€ ๋‘ ๋ฒˆ ์—ฐ์† ๋‚˜์™€์„œ๋„ ์•ˆ ๋œ๋‹ค.

๋ฌธ์ œ ์˜๋ฏธ ์„ค๋ช…

๋‹จ์ˆœํžˆ ์“ฐ๋ฉด ํ‹€๋ฆฌ๊ธฐ ์‰ฌ์šด ํ•จ์ • ๋ฌธ์ œ๋‹ค. ์ฒ˜์Œ ๋ณด๋ฉด ์ด๋ ‡๊ฒŒ ์“ฐ๊ณ  ์‹ถ์–ด์ง„๋‹ค:

[a-zA-Z][a-zA-Z0-9_]*   โ† ํ‹€๋ฆผ!

์ด ์‹์€ abc_, ab__cd ๊ฐ™์€ ๊ธˆ์ง€๋œ ํŒจํ„ด๋„ ํ—ˆ์šฉํ•ด ๋ฒ„๋ฆฐ๋‹ค.

์กฐ๊ฑด ์˜ˆ์‹œ ๋ถˆํ—ˆ ์ด์œ 
underscore๋กœ ๋๋‚˜๋ฉด ์•ˆ ๋จ abc_, x1_ ๋ underscore
underscore ๋‘ ๋ฒˆ ์—ฐ์† ๊ธˆ์ง€ ab__cd, x__1 ์—ฐ์† underscore
ํ—ˆ์šฉ ๋ถˆํ—ˆ
a, abc, a1, a_b _abc (์‹œ์ž‘์ด letter ์•„๋‹˜)
a1_b2_c3 abc_ (๋ underscore)
ย  ab__cd (์—ฐ์† underscore)

ํ•ต์‹ฌ ์‚ฌ๊ณ ๋ฒ•

underscore๋ฅผ โ€œ๋งˆ์Œ๋Œ€๋กœ ๋ฐ˜๋ณต ๊ฐ€๋Šฅํ•œ ๋ฌธ์žโ€๋กœ ๋ณด๋ฉด ์•ˆ ๋œ๋‹ค. underscore๋Š” ์˜ค์ง ์ •์ƒ ๋ฌธ์ž ์‚ฌ์ด๋ฅผ ์ด์–ด์ฃผ๋Š” ์—ฐ๊ฒฐ ๋ฌธ์ž์ฒ˜๋Ÿผ ์ƒ๊ฐํ•ด์•ผ ํ•œ๋‹ค.

์ฆ‰ ๊ตฌ์กฐ:

  • underscore๊ฐ€ ์˜ค๋ ค๋ฉด ๋ฐ˜๋“œ์‹œ ๊ทธ ๋’ค์— letter ๋˜๋Š” digit๊ฐ€ ์™€์•ผ ํ•จ

์ •๊ทœํ‘œํ˜„์‹

[a-zA-Z]([a-zA-Z0-9]|_[a-zA-Z0-9])*

์™œ ์ด ์‹์ด ๋งž๋Š”๊ฐ€:

  • ์‹œ์ž‘ ๋ฌธ์ž [a-zA-Z]: ๋ฐ˜๋“œ์‹œ letter๋กœ ์‹œ์ž‘
  • ๋ฐ˜๋ณต ๋‹จ์œ„ ([a-zA-Z0-9]|_[a-zA-Z0-9]):
    • ๊ทธ๋ƒฅ letter/digit ํ•˜๋‚˜ ์˜ค๊ฑฐ๋‚˜
    • underscore + letter/digit ์˜ค๊ฑฐ๋‚˜
  • ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด: _ ํ˜ผ์ž ๋์— ๋ชป ์˜ด / __ ๋ถˆ๊ฐ€ / underscore ๋’ค์—๋Š” ๋ฌด์กฐ๊ฑด letter/digit๊ฐ€ ์˜ด

๐Ÿ“‹ 19ํŽ˜์ด์ง€ โ€” ์—ฐ์Šต๋ฌธ์ œ 4๋ฒˆ: Floating-point Number

4) A floating-point number: one or more digits (whole-number part) followed by a decimal point (โ€œ.โ€) and one or more digits (fractional part).

๋ถ€๋™์†Œ์ˆ˜์  ์ˆ˜: ํ•˜๋‚˜ ์ด์ƒ์˜ ์ˆซ์ž(์ •์ˆ˜๋ถ€) ๋’ค์— ์†Œ์ˆ˜์  .์ด ์˜ค๊ณ , ๊ทธ ๋’ค์— ํ•˜๋‚˜ ์ด์ƒ์˜ ์ˆซ์ž(์†Œ์ˆ˜๋ถ€)๊ฐ€ ์˜ค๋Š” ํ˜•ํƒœ
์˜ˆ: 2.24, 0.1234

ํ•ต์‹ฌ ์ œ์•ฝ ์„ธ ๊ฐ€์ง€

  1. ์ •์ˆ˜๋ถ€๊ฐ€ ์ตœ์†Œ 1์ž๋ฆฌ ์žˆ์–ด์•ผ ํ•จ
  2. ์†Œ์ˆ˜์ ์ด ๋ฐ˜๋“œ์‹œ ์žˆ์–ด์•ผ ํ•จ
  3. ์†Œ์ˆ˜๋ถ€๋„ ์ตœ์†Œ 1์ž๋ฆฌ ์žˆ์–ด์•ผ ํ•จ

์ •๊ทœํ‘œํ˜„์‹

[0-9]+\.[0-9]+
๋ถ€๋ถ„ ์ •๊ทœ์‹ ์ด์œ 
์ •์ˆ˜๋ถ€ [0-9]+ one or more digits
์†Œ์ˆ˜์  \. ์‹ค์ œ ๋งˆ์นจํ‘œ ๋ฌธ์ž (.๋Š” ๋ฉ”ํƒ€๋ฌธ์ž๋ผ escape ํ•„์š”)
์†Œ์ˆ˜๋ถ€ [0-9]+ one or more digits

์™œ ์ ์„ escapeํ•ด์•ผ ํ•˜๋‚˜?

์ •๊ทœํ‘œํ˜„์‹์—์„œ .์€ โ€œ์•„๋ฌด ๋ฌธ์ž ํ•˜๋‚˜โ€๋ผ๋Š” ํŠน๋ณ„ํ•œ ์˜๋ฏธ๋ฅผ ๊ฐ€์ง„๋‹ค. ์‹ค์ œ ์†Œ์ˆ˜์  ๋ฌธ์ž . ์ž์ฒด๋ฅผ ์˜๋ฏธํ•˜๋ ค๋ฉด ๋ฐ˜๋“œ์‹œ \.๋กœ ์จ์•ผ ํ•œ๋‹ค.

ํ—ˆ์šฉ ๋ถˆํ—ˆ ์ด์œ 
2.24, 0.1234 .5 ์ •์ˆ˜๋ถ€ ์—†์Œ
ย  5. ์†Œ์ˆ˜๋ถ€ ์—†์Œ
ย  12 ์†Œ์ˆ˜์  ์—†์Œ
ย  12.a ์†Œ์ˆ˜๋ถ€๊ฐ€ digit ์•„๋‹˜

๐Ÿ“‹ 20ํŽ˜์ด์ง€ โ€” ์—ฐ์Šต๋ฌธ์ œ 5๋ฒˆ: Floating-point in Scientific Notation

5) Floating-point number in scientific notation: same as above, but optionally followed by โ€œeโ€ or โ€œEโ€, and a signed integer exponent.

๊ณผํ•™์  ํ‘œ๊ธฐ๋ฒ•์˜ ๋ถ€๋™์†Œ์ˆ˜์  ์ˆ˜: ์œ„์™€ ๊ฐ™๋˜, ์„ ํƒ์ ์œผ๋กœ e ๋˜๋Š” E์™€ ๋ถ€ํ˜ธ ์žˆ๋Š” ์ •์ˆ˜ ์ง€์ˆ˜๊ฐ€ ๋”ฐ๋ผ์˜ฌ ์ˆ˜ ์žˆ๋‹ค.
์˜ˆ: 1.2e-2, 2.3E+34, 2.3E34

๊ตฌ์กฐ

๊ธฐ๋ณธ ๋ผˆ๋Œ€ (19ํŽ˜์ด์ง€) + optional exponent suffix:

[0-9]+\.[0-9]+([eE][+-]?[0-9]+)?
๋ถ€๋ถ„ ์ •๊ทœ์‹ ์ด์œ 
๊ธฐ๋ณธ float [0-9]+\.[0-9]+ 19ํŽ˜์ด์ง€์™€ ๋™์ผ
exponent ์ „์ฒด ([eE][+-]?[0-9]+)? optional
e ๋˜๋Š” E [eE] ๋‘˜ ๋‹ค ๊ฐ€๋Šฅ
๋ถ€ํ˜ธ (optional) [+-]? ์žˆ์–ด๋„ ๋˜๊ณ  ์—†์–ด๋„ ๋จ
์ง€์ˆ˜ ์ˆซ์ž [0-9]+ ์ตœ์†Œ 1์ž๋ฆฌ ์ด์ƒ

examples ํ•ด์„

์˜ˆ์‹œ ํ•ด์„
1.2e-2 ๊ธฐ๋ณธ float 1.2 + ์ง€์ˆ˜ e-2
2.3E+34 ๊ธฐ๋ณธ float 2.3 + ์ง€์ˆ˜ E+34 (์–‘์ˆ˜ ๋ถ€ํ˜ธ ๋ช…์‹œ)
2.3E34 ๊ธฐ๋ณธ float 2.3 + ์ง€์ˆ˜ E34 (๋ถ€ํ˜ธ ์ƒ๋žต)

๋ถˆํ—ˆ ์˜ˆ์‹œ

๋ถˆํ—ˆ ์ด์œ 
1.2e ์ง€์ˆ˜ ์ˆซ์ž ์—†์Œ
1.2e+ ๋ถ€ํ˜ธ๋งŒ ์žˆ๊ณ  ์ˆซ์ž ์—†์Œ
1.2e+3.4 exponent๋Š” ์ •์ˆ˜์—ฌ์•ผ ํ•จ
.2e3 ๊ธฐ๋ณธ float ์กฐ๊ฑด ์œ„๋ฐ˜
2.e3 ๊ธฐ๋ณธ float ์กฐ๊ฑด ์œ„๋ฐ˜

๐Ÿ”ฅ 16~20ํŽ˜์ด์ง€ ํ•ต์‹ฌ ์ •๋ฆฌ

๋ฌธ์ œ ์ •๊ทœํ‘œํ˜„์‹
16ํŽ˜์ด์ง€ โ€” Ruby binary 0b[01]+
17ํŽ˜์ด์ง€ โ€” Ruby binary + underscore 0b[01](_?[01])*
18ํŽ˜์ด์ง€ โ€” Ada identifier [a-zA-Z]([a-zA-Z0-9]|_[a-zA-Z0-9])*
19ํŽ˜์ด์ง€ โ€” Floating-point [0-9]+\.[0-9]+
20ํŽ˜์ด์ง€ โ€” Scientific notation [0-9]+\.[0-9]+([eE][+-]?[0-9]+)?

๐Ÿ“‹ 21ํŽ˜์ด์ง€ โ€” Subset Construction with Animation

Subset Construction with Animation

๋ถ€๋ถ„์ง‘ํ•ฉ ๊ตฌ์„ฑ๋ฒ• (์• ๋‹ˆ๋ฉ”์ด์…˜)

Reminder) subset construction: NFA to DFA
๋ณต์Šต: subset construction์€ NFA๋ฅผ DFA๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค

NFA์˜ ํ•ต์‹ฌ ํŠน์ง•:

  • ๊ฐ™์€ ์ž…๋ ฅ์—์„œ ์—ฌ๋Ÿฌ ๊ฒฝ๋กœ ๊ฐ€๋Šฅ
  • ์˜ˆ: s1์—์„œ a ์ž…๋ ฅ โ†’ s1 ๊ฐˆ ์ˆ˜๋„ ์žˆ๊ณ  s2 ๊ฐˆ ์ˆ˜๋„ ์žˆ์Œ โ†’ ์ด๊ฑธ โ€œguessโ€๋ผ๊ณ  ํ‘œํ˜„

Instead of guessing in state s1 on symbol a, we can follow both transitions in parallel.

์ƒํƒœ s1์—์„œ ์ž…๋ ฅ a๊ฐ€ ๋“ค์–ด์™”์„ ๋•Œ ์ถ”์ธกํ•˜๋Š” ๋Œ€์‹ , ์šฐ๋ฆฌ๋Š” ๋‘ ๊ฒฝ๋กœ๋ฅผ ๋™์‹œ์— ๋”ฐ๋ผ๊ฐˆ ์ˆ˜ ์žˆ๋‹ค.

NFA์˜ ๋น„๊ฒฐ์ •์„ฑ์„ ์ œ๊ฑฐํ•˜๋Š” ํ•ต์‹ฌ ์•„์ด๋””์–ด: โ€œํ•˜๋‚˜๋งŒ ๊ณ ๋ฅด๋Š” ๊ฒŒ ์•„๋‹ˆ๋ผ ๋‘˜ ๋‹ค ๊ฐ„๋‹ค๊ณ  ์ƒ๊ฐํ•œ๋‹คโ€


We introduce a โ€œvirtual stateโ€

์šฐ๋ฆฌ๋Š” โ€œ๊ฐ€์ƒ ์ƒํƒœโ€๋ฅผ ๋„์ž…ํ•œ๋‹ค

  • ์ƒˆ๋กœ์šด ๊ฐœ๋…: {s1, s2} โ€” ์—ฌ๋Ÿฌ ์ƒํƒœ๋ฅผ ๋ฌถ์€ ํ•˜๋‚˜์˜ ์ƒํƒœ
  • DFA์—์„œ๋Š” ์ƒํƒœ ํ•˜๋‚˜๊ฐ€ ์‹ค์ œ๋กœ๋Š” NFA ์ƒํƒœ๋“ค์˜ ์ง‘ํ•ฉ์ด ๋œ๋‹ค
  • ํ‘œ๊ธฐ: {s1, s2} (์ง‘ํ•ฉ์œผ๋กœ ํ‘œํ˜„)

A question: if we are in {s1, s2}, where can we go on b?

์งˆ๋ฌธ: ์šฐ๋ฆฌ๊ฐ€ {s1, s2} ์ƒํƒœ์— ์žˆ์„ ๋•Œ b๋ฅผ ์ฝ์œผ๋ฉด ์–ด๋””๋กœ ๊ฐ€๋Š”๊ฐ€?

Answer: wherever s1 or s2 can go
๋‹ต: s1์ด๋‚˜ s2๊ฐ€ ๊ฐˆ ์ˆ˜ ์žˆ๋Š” ๋ชจ๋“  ์ƒํƒœ๋กœ ๊ฐ„๋‹ค

Move({s1, s2}, b) = Move(s1, b) โˆช Move(s2, b)

๐Ÿ–ผ ๊ทธ๋ฆผ ์„ค๋ช…

  • ์ƒํƒœ s1์—์„œ a โ†’ s1 ๋˜๋Š” s2
  • ์ƒํƒœ s2์—์„œ b โ†’ s3
  • ์ตœ์ข… s4๋Š” accepting state
  • NFA๋Š” โ€œ๊ฐˆ๋ฆผ๊ธธ ์กด์žฌโ€ โ†’ DFA๋Š” โ€œ๊ฐˆ๋ฆผ๊ธธ ์ œ๊ฑฐ โ†’ ์ƒํƒœ ๋ฌถ๊ธฐโ€

๐Ÿ“‹ 22ํŽ˜์ด์ง€ โ€” Example: Simulating an NFA โ€œon the flyโ€

Example: Simulating an NFA โ€œon the flyโ€

์˜ˆ: NFA๋ฅผ ์ฆ‰์„์—์„œ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ํ•˜๊ธฐ

On-the-fly simulation: useful when a regular expression is used only once

NFA๋ฅผ ์ฆ‰์„์—์„œ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ํ•˜๋Š” ๊ฒƒ์€ ์ •๊ทœํ‘œํ˜„์‹์ด ํ•œ ๋ฒˆ๋งŒ ์‚ฌ์šฉ๋  ๋•Œ ์œ ์šฉํ•˜๋‹ค

์šฉ๋„ ์˜ˆ์‹œ: IDE์˜ find / grep โ€” DFA๋กœ ๋ณ€ํ™˜ ์•ˆ ํ•˜๊ณ  ๊ทธ๋ƒฅ NFA๋กœ ์ฒ˜๋ฆฌ ๊ฐ€๋Šฅ


In a compiler, โ€ฆ used over and over again โ†’ need a more efficient approach

์ปดํŒŒ์ผ๋Ÿฌ์—์„œ๋Š” ์ •๊ทœํ‘œํ˜„์‹์ด ๋ฐ˜๋ณต์ ์œผ๋กœ ์‚ฌ์šฉ๋œ๋‹ค โ†’ ๋” ํšจ์œจ์ ์ธ ๋ฐฉ๋ฒ•์ด ํ•„์š”ํ•˜๋‹ค โ†’ DFA ์‚ฌ์šฉ

scanner๋Š” ๋ชจ๋“  ์ž…๋ ฅ์— ๋Œ€ํ•ด ๊ณ„์† ์‚ฌ์šฉ๋จ

๐Ÿ”ฅ ํ•ต์‹ฌ ๋น„๊ต

๋ฐฉ์‹ ํŠน์ง•
NFA ์ง์ ‘ ์‚ฌ์šฉ ๊ฐ„๋‹จ, ๋А๋ฆผ
DFA ๋ณ€ํ™˜ ํ›„ ์‚ฌ์šฉ ๋ณต์žก, ๋น ๋ฆ„

๐Ÿ–ผ ๊ทธ๋ฆผ ์„ค๋ช…

  • s0 โ†’ s1 โ†’ s2 โ†’ s3 โ†’ s4 (์ž…๋ ฅ: a, b, b)
  • ๋ฌธ์ž์—ด โ€œabbโ€ ์ฒ˜๋ฆฌ ๊ณผ์ • โ€” NFA๋Š” ์—ฌ๋Ÿฌ ๊ฒฝ๋กœ ์‹œ๋„ ๊ฐ€๋Šฅ

๐Ÿ“‹ 23ํŽ˜์ด์ง€ โ€” Algorithm: NFA โ†’ DFA with Subset Construction

Algorithm: NFA โ†’ DFA with Subset Construction

์•Œ๊ณ ๋ฆฌ์ฆ˜: subset construction์œผ๋กœ NFA โ†’ DFA

ํ•ต์‹ฌ ๊ฐœ๋…

  • Subset construction works on sets of NFA states: DFA ์ƒํƒœ = NFA ์ƒํƒœ ์ง‘ํ•ฉ
  • Each set of NFA states becomes a DFA state: ์˜ˆ: {s0, s1}, {s1, s2}, {s1}

๋‘ ๊ฐ€์ง€ ํ•ต์‹ฌ ํ•จ์ˆ˜ (Two key functions)

ํ•จ์ˆ˜ ์˜๋ฏธ
Move(S, a) ์ƒํƒœ ์ง‘ํ•ฉ S์—์„œ ์ž…๋ ฅ a๋กœ ๊ฐˆ ์ˆ˜ ์žˆ๋Š” ์ƒํƒœ ์ง‘ํ•ฉ
ฮต-closure(S) ฮต-transition(์ž…๋ ฅ ์—†์ด ์ด๋™)์œผ๋กœ ๋„๋‹ฌ ๊ฐ€๋Šฅํ•œ ๋ชจ๋“  ์ƒํƒœ ์ง‘ํ•ฉ

์•Œ๊ณ ๋ฆฌ์ฆ˜ ํ๋ฆ„

  1. Start state = ฮต-closure({s0}): ์‹œ์ž‘ ์ƒํƒœ๋Š” s0์˜ ฮต-closure (๊ทธ๋ƒฅ s0๊ฐ€ ์•„๋‹ˆ๋ผ ฮต๋กœ ๊ฐˆ ์ˆ˜ ์žˆ๋Š” ๋ชจ๋“  ์ƒํƒœ ํฌํ•จ)
  2. Compute Move(S0, ฮฑ): ๊ฐ ์ž…๋ ฅ ฮฑ์— ๋Œ€ํ•ด Move ๊ณ„์‚ฐ
  3. take ฮต-closure: ๊ทธ ๊ฒฐ๊ณผ์— ๋Œ€ํ•ด ฮต-closure ์ ์šฉ
  4. Iterate until no more states: ์ƒˆ ์ƒํƒœ๊ฐ€ ์•ˆ ์ƒ๊ธธ ๋•Œ๊นŒ์ง€ ๋ฐ˜๋ณต

๐Ÿ“‹ 24ํŽ˜์ด์ง€ โ€” ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ƒ์„ธ ์ฝ”๋“œ

์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ฝ”๋“œ

Dstates โ† {}
add ฮต-closure(s0) as unmarked state to Dstates

while (unmarked state T exists)
    mark T
    for each ฮฑ โˆˆ ฮฃ
        U โ† ฮต-closure(Move(T, ฮฑ))
        if (U โˆ‰ Dstates) add U
        ฮด[T, ฮฑ] โ† U

๊ฐ ์ค„ ํ•ด์„

์ฝ”๋“œ ํ•ด์„
Dstates โ† {} DFA ์ƒํƒœ ์ง‘ํ•ฉ ์ดˆ๊ธฐํ™”
add ฮต-closure(s0) ์‹œ์ž‘ ์ƒํƒœ ์ถ”๊ฐ€
while (unmarked T exists) ์•„์ง ์ฒ˜๋ฆฌ ์•ˆ ํ•œ ์ƒํƒœ๊ฐ€ ์žˆ์œผ๋ฉด ๋ฐ˜๋ณต
mark T ํ˜„์žฌ ์ƒํƒœ ์ฒ˜๋ฆฌ ์™„๋ฃŒ ํ‘œ์‹œ
for each ฮฑ โˆˆ ฮฃ ๋ชจ๋“  ์ž…๋ ฅ ๋ฌธ์ž์— ๋Œ€ํ•ด
U โ† ฮต-closure(Move(T, ฮฑ)) ํ•ต์‹ฌ: Move โ†’ ฮต-closure ์ˆœ์„œ๋กœ ๊ณ„์‚ฐ
if (U โˆ‰ Dstates) add U ์ƒˆ ์ƒํƒœ๋ฉด ์ถ”๊ฐ€
ฮด[T, ฮฑ] โ† U transition ๊ธฐ๋ก

์ด๊ฑด ๊ทธ๋ƒฅ ์™ธ์šฐ๋Š” ๊ฒŒ ์•„๋‹ˆ๋ผ ์ƒํƒœ ํ™•์žฅ ๊ณผ์ • ์ดํ•ดํ•ด์•ผ ํ•จ


๐Ÿ“‹ 25ํŽ˜์ด์ง€ โ€” ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์‹œ์ž‘ ์ƒํƒœ

์ดˆ๊ธฐ ์ƒํƒœ ์„ค์ •

Dstates = {}
ฮต-closure(s0) = {s0, s1}

์ดˆ๊ธฐ ์ƒํƒœ๋Š” {s0, s1} โ€” ์™œ s1๊นŒ์ง€ ํฌํ•จ๋˜๋Š”๊ฐ€?

ฮต-transition ๋•Œ๋ฌธ: s0์—์„œ ์ž…๋ ฅ ์—†์ด s1 ๊ฐˆ ์ˆ˜ ์žˆ์Œ

  • ์‹œ์ž‘ ์ƒํƒœ๋Š” ๋‹จ์ผ ์ƒํƒœ๊ฐ€ ์•„๋‹ˆ๋ผ ์ง‘ํ•ฉ ์ƒํƒœ
  • add as unmarked state โ†’ ๋ฏธ์ฒ˜๋ฆฌ ์ƒํƒœ๋กœ ์ถ”๊ฐ€

๐Ÿ–ผ ๊ทธ๋ฆผ ์„ค๋ช…

  • s0 โ†’ ฮต โ†’ s1
  • DFA ์‹œ์ž‘ ์ƒํƒœ: ฮต-closure(s0) โ† ์‹œํ—˜ ํฌ์ธํŠธ

๐Ÿ”ฅ 21~25ํŽ˜์ด์ง€ ์ „์ฒด ํ•ต์‹ฌ ์š”์•ฝ

  1. ํ•ต์‹ฌ ์•„์ด๋””์–ด: NFA๋Š” ์—ฌ๋Ÿฌ ๊ฒฝ๋กœ / DFA๋Š” ํ•˜๋‚˜์˜ ๊ฒฝ๋กœ โ†’ ํ•ด๊ฒฐ: ์—ฌ๋Ÿฌ ์ƒํƒœ๋ฅผ ํ•˜๋‚˜๋กœ ๋ฌถ๋Š”๋‹ค

  2. DFA ์ƒํƒœ ์ •์˜: DFA state = NFA ์ƒํƒœ ์ง‘ํ•ฉ

  3. ํ•ต์‹ฌ ํ•จ์ˆ˜:

ํ•จ์ˆ˜ ์˜๋ฏธ
Move(S, a) ์ƒํƒœ ์ง‘ํ•ฉ S์—์„œ a๋กœ ์ด๋™ ๊ฐ€๋Šฅํ•œ ์ƒํƒœ๋“ค
ฮต-closure(S) ฮต ์ „์ด๋กœ ๋„๋‹ฌ ๊ฐ€๋Šฅํ•œ ๋ชจ๋“  ์ƒํƒœ๋“ค
  1. ์•Œ๊ณ ๋ฆฌ์ฆ˜ ํ๋ฆ„:
    • ์ดˆ๊ธฐ: S0 = ฮต-closure(s0)
    • ๋ฐ˜๋ณต: U = ฮต-closure(Move(T, ฮฑ))
    • ์ƒˆ ์ƒํƒœ๋ฉด ์ถ”๊ฐ€
  2. ์ค‘์š”ํ•œ ํฌ์ธํŠธ: โ€œguess ํ•˜์ง€ ์•Š๋Š”๋‹คโ€ โ†’ โ€œ๋ชจ๋“  ๊ฐ€๋Šฅ์„ฑ์„ ๋™์‹œ์— ํฌํ•จํ•œ๋‹คโ€

๐Ÿ”ฅ ์ „์ฒด ๊ฐ•์˜ ํ•ต์‹ฌ ์š”์•ฝ (1~25ํŽ˜์ด์ง€)

์ „์ฒด ํ๋ฆ„

์ •๊ทœํ‘œํ˜„์‹ โ†’ NFA โ†’ DFA โ†’ ์ตœ์†Œ DFA

๊ฐ ๋‹จ๊ณ„ ์š”์•ฝ

๋‹จ๊ณ„ ๋ฐฉ๋ฒ• ํ•ต์‹ฌ ๊ฐœ๋…
RE โ†’ NFA Thompson Construction ฮต-transition
NFA โ†’ DFA Subset Construction ฮต-closure, Move
DFA โ†’ min DFA Partition Refinement distinguishable

RE์˜ ํ•œ๊ณ„

RE๋กœ ์ž˜ ๋˜๋Š” ๊ฒƒ RE๋กœ ์•ˆ ๋˜๋Š” ๊ฒƒ
keyword 0โฟ1โฟ
identifier ๊ด„ํ˜ธ ์ง ๋งž์ถ”๊ธฐ
literal ์ค‘์ฒฉ๋œ expression/statement ์ „์ฒด
operator, punctuation self embedding ๊ตฌ์กฐ

์ด๋ก  ๊ด€๊ณ„

  • RE < CFG โ€” ๋ชจ๋“  regular language๋Š” CFG๋กœ ํ‘œํ˜„ ๊ฐ€๋Šฅ, ํ•˜์ง€๋งŒ ๋ชจ๋“  CFL์ด regular์ธ ๊ฑด ์•„๋‹˜

์ •๊ทœํ‘œํ˜„์‹ ๋ฌธ์ œ ์ •๋‹ต ๋ชจ์Œ

๋ฌธ์ œ ์ •๊ทœํ‘œํ˜„์‹
Ruby binary 0b[01]+
Ruby binary + _ 0b[01](_?[01])*
Ada identifier [a-zA-Z]([a-zA-Z0-9]|_[a-zA-Z0-9])*
Floating-point [0-9]+\.[0-9]+
Scientific notation [0-9]+\.[0-9]+([eE][+-]?[0-9]+)?

Subset Construction ์‹œํ—˜ ํฌ์ธํŠธ

  • Move(T, ฮฑ): ์ƒํƒœ ์ง‘ํ•ฉ T์—์„œ ฮฑ๋กœ ์ด๋™ํ•˜๋Š” ์ƒํƒœ๋“ค
  • ฮต-closure(T): ฮต ์ „์ด๋กœ ๋„๋‹ฌ ๊ฐ€๋Šฅํ•œ ๋ชจ๋“  ์ƒํƒœ๋“ค
  • ๋ฐ˜๋ณต: ๋” ์ด์ƒ ์ƒํƒœ ์ถ”๊ฐ€ ์—†์„ ๋•Œ๊นŒ์ง€ (fixed-point)
  • DFA ์‹œ์ž‘ ์ƒํƒœ: ๋ฐ˜๋“œ์‹œ ฮต-closure(s0)

COMP321 Compiler โ€” Lexical Analysis Part 3

Kyungpook National University | Hwisoo So | Spring 2026


๐Ÿ“‹ 26ํŽ˜์ด์ง€ โ€” ์‹œ์ž‘ ์ƒํƒœ ํ™•์ธ

ํ˜„์žฌ ์ƒํƒœ

Dstates = { {s0, s1} }

DFA ์ƒํƒœ ์ง‘ํ•ฉ์—๋Š” ํ˜„์žฌ {s0, s1} ํ•˜๋‚˜๋งŒ ์žˆ๋‹ค.


Dstates contains now one elementโ€ฆ

Dstates์—๋Š” ํ˜„์žฌ ํ•˜๋‚˜์˜ ์ƒํƒœ {s0, s1}๊ฐ€ ์žˆ๋‹ค.

์ด ์ƒํƒœ๋Š” ์–ด๋””์„œ ์™”๋ƒ?

ฮต-closure(s0) = {s0, s1}

์ฆ‰ ์‹œ์ž‘ ์ƒํƒœ๋‹ค.


Note1: unmarked states are printed in red

์•„์ง ์ฒ˜๋ฆฌ๋˜์ง€ ์•Š์€ ์ƒํƒœ๋Š” ๋นจ๊ฐ„์ƒ‰์œผ๋กœ ํ‘œ์‹œ๋œ๋‹ค.

Note2: Dstates means โ€œdeterministic statesโ€

Dstates๋Š” DFA ์ƒํƒœ๋“ค์„ ์˜๋ฏธํ•œ๋‹ค.


๐Ÿ”ฅ ํ•ต์‹ฌ

ํ˜„์žฌ ์ƒํƒœ ์ฒ˜๋ฆฌ ์—ฌ๋ถ€
{s0, s1} ์•„์ง ์ฒ˜๋ฆฌ ์•ˆ ํ•จ (unmarked)

๐Ÿ“‹ 27ํŽ˜์ด์ง€ โ€” ์ฒซ ์ƒํƒœ ์ฒ˜๋ฆฌ ์‹œ์ž‘

Now we pick state {s0, s1} and mark it

์ด์ œ {s0, s1} ์ƒํƒœ๋ฅผ ์„ ํƒํ•˜๊ณ  ์ฒ˜๋ฆฌ ์™„๋ฃŒ๋กœ ํ‘œ์‹œํ•œ๋‹ค.

subset construction ํ•ต์‹ฌ ํ๋ฆ„:

  1. unmarked ์ƒํƒœ ์„ ํƒ
  2. ์ฒ˜๋ฆฌ
  3. mark

marked states are printed in green

์ฒ˜๋ฆฌ๋œ ์ƒํƒœ๋Š” ์ดˆ๋ก์ƒ‰์œผ๋กœ ํ‘œ์‹œ๋œ๋‹ค.

๐Ÿ”ฅ ํ•ต์‹ฌ

์ƒํƒœ {s0, s1} โ†’ ์ด์ œ ์ฒ˜๋ฆฌ ์‹œ์ž‘ โ†’ ์ดํ›„ transition ๊ณ„์‚ฐํ•  ๋Œ€์ƒ


๐Ÿ“‹ 28ํŽ˜์ด์ง€ โ€” ์ž…๋ ฅ ์ง‘ํ•ฉ ํ™•์ธ

T = {s0, s1}, ฮฃ = {a, b}

ํ˜„์žฌ ์ƒํƒœ T๋Š” {s0, s1}, ์ž…๋ ฅ ์ง‘ํ•ฉ์€ {a, b}

DFA๋Š” ๋ชจ๋“  ์ž…๋ ฅ์— ๋Œ€ํ•ด transition์ด ํ•„์š”ํ•˜๋‹ค. ์ฆ‰:

  • a์— ๋Œ€ํ•ด? โ†’ ๊ณ„์‚ฐ ํ•„์š”
  • b์— ๋Œ€ํ•ด? โ†’ ๊ณ„์‚ฐ ํ•„์š”

The for loop iterates over all symbols

for๋ฌธ์€ ๋ชจ๋“  ์ž…๋ ฅ ๋ฌธ์ž์— ๋Œ€ํ•ด ๋ฐ˜๋ณตํ•œ๋‹ค.

๐Ÿ”ฅ ํ•ต์‹ฌ โ€” ์ง€๊ธˆ ํ•ด์•ผ ํ•  ๊ฒƒ

Move({s0, s1}, a) = ?
Move({s0, s1}, b) = ?

๐Ÿ“‹ 29ํŽ˜์ด์ง€ โ€” ฮฑ = a ๊ณ„์‚ฐ

Move(T, a) = {s1, s2}

{s0, s1}์—์„œ a๋ฅผ ์ฝ์œผ๋ฉด {s1, s2}๋กœ ๊ฐ„๋‹ค.

์™œ ์ด๋ ‡๊ฒŒ ๋˜๋Š”๊ฐ€? ๊ณ„์‚ฐ ๊ณผ์ •

์ƒํƒœ a ์ž…๋ ฅ ๊ฒฐ๊ณผ
s0์—์„œ a s0 โ†’ s1 s1
s1์—์„œ a s1 โ†’ s1, s1 โ†’ s2 s1, s2

ํ•ฉ์น˜๋ฉด: {s1, s2}


ฮต-closure({s1, s2}) = {s1, s2}

ฮต-transition์œผ๋กœ ๋” ๊ฐˆ ๊ณณ ์—†์Œ โ†’ ๊ทธ๋Œ€๋กœ {s1, s2}

U = {s1, s2}

๋‹ค์Œ ์ƒํƒœ๋Š” {s1, s2}

๐Ÿ”ฅ ํ•ต์‹ฌ

ฮด({s0,s1}, a) = {s1, s2}

๐Ÿ“‹ 30ํŽ˜์ด์ง€ โ€” ์ƒˆ ์ƒํƒœ ์ถ”๊ฐ€

State U is not in Dstates โ†’ add it

์ด ์ƒํƒœ {s1, s2}๋Š” ์•„์ง ์—†์œผ๋ฏ€๋กœ ์ถ”๊ฐ€ํ•œ๋‹ค.

Dstates ์—…๋ฐ์ดํŠธ

์ด์ „: Dstates = { {s0, s1} }
      โ†“
์ดํ›„: Dstates = { {s0, s1}, {s1, s2} }

๐Ÿ–ผ ๊ทธ๋ฆผ ์„ค๋ช…

{s0, s1} โ€“aโ€“> {s1, s2}

๐Ÿ”ฅ ํ•ต์‹ฌ

DFA ์ƒํƒœ ํ™•์žฅ ์‹œ์ž‘๋จ


๐Ÿ”ฅ 26~30ํŽ˜์ด์ง€ ์ „์ฒด ํ๋ฆ„ ์š”์•ฝ

๋‹จ๊ณ„ ๋‚ด์šฉ
1. ์‹œ์ž‘ ์ƒํƒœ {s0, s1} = ฮต-closure(s0)
2. ์ฒ˜๋ฆฌ ์‹œ์ž‘ T = {s0, s1} mark
3. a์— ๋Œ€ํ•ด Move({s0,s1}, a) = {s1, s2}
4. ฮต-closure = {s1, s2}
5. ์ƒํƒœ ์ถ”๊ฐ€ Dstates = { {s0,s1}, {s1,s2} }

์ด ๊ตฌ๊ฐ„ ํ•ต์‹ฌ ๊ฐ๊ฐ

  • NFA๋Š” โ€œ๊ฐˆ๋ฆผ๊ธธโ€ โ†’ DFA๋Š” โ€œ๋ชจ๋“  ๊ฐˆ๋ฆผ๊ธธ์„ ํ•œ ๋ฒˆ์— ํฌํ•จโ€
  • ํ•œ ์ƒํƒœ = ์—ฌ๋Ÿฌ ์ƒํƒœ์˜ ๋ฌถ์Œ

๐Ÿ”ฅ ์‹œํ—˜ ํฌ์ธํŠธ

ํ•ด์•ผ ํ•  ๊ฒƒ ๋‚ด์šฉ
Move ๊ณ„์‚ฐ ๊ฐ ์ƒํƒœ์—์„œ ๊ฐˆ ์ˆ˜ ์žˆ๋Š” ๋ชจ๋“  ์ƒํƒœ ํ•ฉ์ง‘ํ•ฉ
ฮต-closure ์ ์šฉ ์ž…๋ ฅ ์—†์ด ๊ฐˆ ์ˆ˜ ์žˆ๋Š” ์ƒํƒœ๊นŒ์ง€ ํฌํ•จ
์ƒˆ๋กœ์šด ์ƒํƒœ ํŒ๋ณ„ ์—†์œผ๋ฉด ์ถ”๊ฐ€

๐Ÿ“‹ 31ํŽ˜์ด์ง€ โ€” ์ „์ด ํ•จ์ˆ˜ ฮด ๊ตฌ์„ฑ ์‹œ์ž‘

Here we start assembling the transition function ฮด

์ด์ œ ์ „์ด ํ•จ์ˆ˜ ฮด๋ฅผ ๊ตฌ์„ฑํ•˜๊ธฐ ์‹œ์ž‘ํ•œ๋‹ค.

์ง€๊ธˆ๊นŒ์ง€ ํ•œ ๊ฑด โ€œ์ƒํƒœ ์ฐพ๊ธฐโ€์˜€๊ณ , ์ด์ œ๋ถ€ํ„ฐ๋Š” โ€œ์ „์ดํ‘œ ๋งŒ๋“ค๊ธฐโ€๋‹ค.


If we read an a in state T, then DFA moves to {s1, s2}

์ƒํƒœ T์—์„œ a๋ฅผ ์ฝ์œผ๋ฉด DFA๋Š” {s1, s2}๋กœ ์ด๋™ํ•œ๋‹ค.

์ด๋ฏธ ๊ตฌํ•œ ๊ฒƒ:

ฮด({s0, s1}, a) = {s1, s2}

ฮด ์ „์ดํ‘œ (ํ˜„์žฌ)

์ƒํƒœ a b
{s0,s1} {s1,s2} ย 

๐Ÿ”ฅ ํ•ต์‹ฌ

DFA transition table ๋งŒ๋“ค๊ธฐ ์‹œ์ž‘


๐Ÿ“‹ 32ํŽ˜์ด์ง€ โ€” b ์ฒ˜๋ฆฌ ์‹œ์ž‘

ํ˜„์žฌ ์ƒํƒœ

Dstates = { {s0, s1}, {s1, s2} }

Second iteration: ฮฑ = b

๋‘ ๋ฒˆ์งธ ๋ฐ˜๋ณต: ์ž…๋ ฅ b

  • a๋Š” ๋๋‚ฌ๊ณ 
  • ์ด์ œ b ์ฒ˜๋ฆฌ ์ฐจ๋ก€

๐Ÿ”ฅ ํ•ต์‹ฌ โ€” ์ง€๊ธˆ ํ•ด์•ผ ํ•  ๊ฒƒ

Move({s0,s1}, b) = ?

๐Ÿ“‹ 33ํŽ˜์ด์ง€ โ€” ฮฑ = b ๊ณ„์‚ฐ

Move(T, b) = {s1}

{s0, s1}์—์„œ b๋ฅผ ์ฝ์œผ๋ฉด {s1}๋กœ ๊ฐ„๋‹ค.

์™œ ์ด๋ ‡๊ฒŒ ๋˜๋Š”๊ฐ€? ๊ณ„์‚ฐ ๊ณผ์ •

์ƒํƒœ b ์ž…๋ ฅ ๊ฒฐ๊ณผ
s0์—์„œ b ์ด๋™ ์—†์Œ โ€”
s1์—์„œ b s1 โ†’ s1 s1

๊ฒฐ๊ณผ: {s1}


ฮต-closure({s1}) = {s1}

ฮต ์ด๋™ ์—†์Œ โ†’ ๊ทธ๋Œ€๋กœ {s1}

๐Ÿ”ฅ ํ•ต์‹ฌ

ฮด({s0,s1}, b) = {s1}

๐Ÿ“‹ 34ํŽ˜์ด์ง€ โ€” ์ƒˆ ์ƒํƒœ {s1} ์ถ”๊ฐ€

State U is not in Dstates โ†’ add it

{s1}๋Š” ์•„์ง ์—†์œผ๋ฏ€๋กœ ์ถ”๊ฐ€ํ•œ๋‹ค.

Dstates ์—…๋ฐ์ดํŠธ

์ด์ „: Dstates = { {s0,s1}, {s1,s2} }
      โ†“
์ดํ›„: Dstates = { {s0,s1}, {s1,s2}, {s1} }

๐Ÿ”ฅ ํ•ต์‹ฌ

์ƒˆ๋กœ์šด DFA ์ƒํƒœ {s1} ๋“ฑ์žฅ


๐Ÿ“‹ 35ํŽ˜์ด์ง€ โ€” ฮด ํ…Œ์ด๋ธ” ์—…๋ฐ์ดํŠธ

If we read b in state T, DFA moves to {s1}

์ƒํƒœ T์—์„œ b๋ฅผ ์ฝ์œผ๋ฉด {s1}๋กœ ์ด๋™ํ•œ๋‹ค.

ฮด ์ „์ดํ‘œ (์—…๋ฐ์ดํŠธ)

์ƒํƒœ a b
{s0,s1} {s1,s2} {s1}

์ฒซ ๋ฒˆ์งธ ์ƒํƒœ {s0,s1}์— ๋Œ€ํ•œ transition ์™„์„ฑ


๐Ÿ”ฅ 31~35ํŽ˜์ด์ง€ ์ „์ฒด ํ๋ฆ„ ์š”์•ฝ

์ƒํƒœ {s0,s1} ์ฒ˜๋ฆฌ ์™„๋ฃŒ

์ž…๋ ฅ ๊ฒฐ๊ณผ
a {s1,s2}
b {s1}

ํ˜„์žฌ DFA ์ƒํƒœ

{s0,s1}   โ†’ (์™„๋ฃŒ)
{s1,s2}   โ†’ (๋ฏธ์ฒ˜๋ฆฌ)
{s1}      โ†’ (๋ฏธ์ฒ˜๋ฆฌ)

ํ˜„์žฌ ์ „์ด ๊ตฌ์กฐ

{s0,s1}
  โ”œโ”€ a โ†’ {s1,s2}
  โ””โ”€ b โ†’ {s1}

๐Ÿ”ฅ ์‹œํ—˜ ํ•ต์‹ฌ ํฌ์ธํŠธ

์‹ค์ˆ˜ ์œ ํ˜• ๋‚ด์šฉ
โŒ s0์—์„œ b ์•ˆ ๊ฐ€๋Š”๋ฐ ํฌํ•จ์‹œํ‚ค๋Š” ๊ฒฝ์šฐ s0 โ†’ b ์ด๋™ ์—†์Œ ํ™•์ธ ํ•„์ˆ˜
โŒ ฮต-closure ๋นผ๋จน๋Š” ๊ฒฝ์šฐ Move ํ›„ ๋ฐ˜๋“œ์‹œ ฮต-closure ์ ์šฉ
โŒ ๊ธฐ์กด ์ƒํƒœ์ธ๋ฐ ๋˜ ์ถ”๊ฐ€ํ•˜๋Š” ๊ฒฝ์šฐ ์ด๋ฏธ ์žˆ์œผ๋ฉด ์ ˆ๋Œ€ ์ถ”๊ฐ€ ์•ˆ ํ•จ

๐Ÿ“‹ 36ํŽ˜์ด์ง€ โ€” ๋‹ค์Œ ์ฒ˜๋ฆฌ ๋Œ€์ƒ ์„ ํƒ

ํ˜„์žฌ ์ƒํƒœ

Dstates = { {s0,s1}, {s1,s2}, {s1} }

Dstates contains now two unmarked states: {s1,s2} and {s1}

์•„์ง ์ฒ˜๋ฆฌ ์•ˆ ๋œ ์ƒํƒœ๋Š” {s1,s2}์™€ {s1} ๋‘ ๊ฐœ

์ƒํƒœ ์ฒ˜๋ฆฌ ์—ฌ๋ถ€
{s0,s1} โœ” ์™„๋ฃŒ
{s1,s2} โŒ ๋ฏธ์ฒ˜๋ฆฌ
{s1} โŒ ๋ฏธ์ฒ˜๋ฆฌ

๋‹ค์Œ ์ฒ˜๋ฆฌ ๋Œ€์ƒ: T = {s1, s2}

๐Ÿ”ฅ ํ•ต์‹ฌ

์ด์ œ {s1,s2} ์ƒํƒœ๋ฅผ ์ฒ˜๋ฆฌํ•ด์•ผ ํ•จ


๐Ÿ“‹ 37ํŽ˜์ด์ง€ โ€” {s1,s2} ์ฒ˜๋ฆฌ ์‹œ์ž‘

T = {s1, s2}

ํ˜„์žฌ ์ฒ˜๋ฆฌํ•  ์ƒํƒœ๋Š” {s1,s2}

ํ•ด์•ผ ํ•  ๊ฒƒ

Move({s1,s2}, a) = ?
Move({s1,s2}, b) = ?

๐Ÿ”ฅ ํ•ต์‹ฌ

์ƒˆ๋กœ์šด ์ƒํƒœ ๋งŒ๋“ค์–ด์งˆ ๊ฐ€๋Šฅ์„ฑ ์žˆ์Œ


๐Ÿ“‹ 38ํŽ˜์ด์ง€ โ€” ฮฑ = a ๊ณ„์‚ฐ

ฮฑ = a

์ž…๋ ฅ a์— ๋Œ€ํ•ด ๊ณ„์‚ฐ

๊ณ„์‚ฐ ๊ณผ์ •

์ƒํƒœ a ์ž…๋ ฅ ๊ฒฐ๊ณผ
s1์—์„œ a s1 โ†’ s1, s1 โ†’ s2 s1, s2
s2์—์„œ a ์ด๋™ ์—†์Œ โ€”

๊ฒฐ๊ณผ: {s1, s2}


ฮต-closure({s1, s2}) = {s1, s2}

์ถ”๊ฐ€ ฮต ์ด๋™ ์—†์Œ โ†’ ๊ทธ๋Œ€๋กœ {s1, s2}

๐Ÿ”ฅ ํ•ต์‹ฌ

ฮด({s1,s2}, a) = {s1,s2}

๐Ÿ‘‰ ์ž๊ธฐ ์ž์‹ ์œผ๋กœ ๋Œ์•„์˜ด (self-loop)


๐Ÿ“‹ 39ํŽ˜์ด์ง€ โ€” ์ค‘๋ณต ์ƒํƒœ ํ™•์ธ

U = {s1,s2} already in Dstates

์ด ์ƒํƒœ๋Š” ์ด๋ฏธ ์กด์žฌํ•จ

  • ์ƒˆ๋กœ์šด ์ƒํƒœ๊ฐ€ ์•„๋‹˜
  • Dstates ๋ณ€ํ™” ์—†์Œ

๐Ÿ”ฅ ํ•ต์‹ฌ

๊ธฐ์กด ์ƒํƒœ๋ฉด ์ถ”๊ฐ€ ์•ˆ ํ•จ โ€” ์ „์ด ๊ธฐ๋ก๋งŒ ํ•จ


๐Ÿ“‹ 40ํŽ˜์ด์ง€ โ€” ์ฒ˜๋ฆฌ ์™„๋ฃŒ (a)

Nothing to be done here

์ถ”๊ฐ€ ์ž‘์—… ์—†์Œ โ€” ์ด๋ฏธ ์žˆ๋Š” ์ƒํƒœ โ†’ skip

ํ˜„์žฌ๊นŒ์ง€ transition

{s1,s2} --a--> {s1,s2}   (self-loop)

๐Ÿ”ฅ 36~40ํŽ˜์ด์ง€ ์ „์ฒด ํ๋ฆ„ ์š”์•ฝ

๋‹จ๊ณ„ ๋‚ด์šฉ
1. ์ฒ˜๋ฆฌ ๋Œ€์ƒ T = {s1,s2}
2. a ์ž…๋ ฅ Move({s1,s2}, a) = {s1,s2}
3. ฮต-closure ์ ์šฉ = {s1,s2}
4. ์ƒํƒœ ์กด์žฌ ์—ฌ๋ถ€ ์ด๋ฏธ ์žˆ์Œ โ†’ ์ถ”๊ฐ€ X
5. transition ์ถ”๊ฐ€ ฮด({s1,s2}, a) = {s1,s2}

ํ˜„์žฌ ๋ˆ„์  ์ „์ด

{s0,s1}
  โ”œโ”€ a โ†’ {s1,s2}
  โ””โ”€ b โ†’ {s1}

{s1,s2}
  โ””โ”€ a โ†’ {s1,s2}   (self-loop)

๐Ÿ”ฅ ์ค‘์š”ํ•œ ํŒจํ„ด โ€” self-loop

{s1,s2} --a--> {s1,s2}

์ด๋Ÿฐ ๊ตฌ์กฐ๋Š” ์ƒํƒœ ์•ˆ์ •ํ™” ๋А๋‚Œ โ€” ์‹œํ—˜์—์„œ ์ž์ฃผ ๋‚˜์˜ค๋Š” ๊ตฌ์กฐ

์ด ๊ตฌ๊ฐ„ ํ•ต์‹ฌ ํฌ์ธํŠธ

  1. ์ƒํƒœ ์ค‘๋ณต ์ฒดํฌ: ์ด๋ฏธ ์žˆ์œผ๋ฉด ์ ˆ๋Œ€ ์ถ”๊ฐ€ํ•˜์ง€ ์•Š์Œ
  2. self-loop ์ดํ•ด: ์ƒํƒœ๊ฐ€ ์ž๊ธฐ ์ž์‹ ์œผ๋กœ ๋Œ์•„์˜ฌ ์ˆ˜ ์žˆ์Œ
  3. Move ๊ณ„์‚ฐ ์ •ํ™•ํžˆ: ์ง‘ํ•ฉ ์•ˆ ๋ชจ๋“  ์ƒํƒœ ๋‹ค ๊ณ ๋ คํ•ด์•ผ ํ•จ

๐Ÿ“‹ 41ํŽ˜์ด์ง€ โ€” ฮด ํ‘œ ํ™•์ • (a ๋ถ€๋ถ„)

ํ˜„์žฌ ฮด ์ „์ดํ‘œ

์ƒํƒœ a b
{s0,s1} {s1,s2} {s1}
{s1,s2} {s1,s2} ย 

{s1,s2}์—์„œ a์— ๋Œ€ํ•œ ๊ณ„์‚ฐ ๊ฒฐ๊ณผ๋ฅผ ์ „์ดํ‘œ์— ๊ณต์‹ ๋ฐ˜์˜ํ•œ ๋‹จ๊ณ„.

์™œ ์ž๊ธฐ ์ž์‹ ์œผ๋กœ ๊ฐ€๋Š”๊ฐ€?

s1์—์„œ a โ†’ s1, s2
s2์—์„œ a โ†’ ์ด๋™ ์—†์Œ
ํ•ฉ์น˜๋ฉด: {s1, s2}

์ฆ‰ {s1,s2}์—์„œ a๋ฅผ ์ฝ์–ด๋„ ๊ทธ๋Œ€๋กœ {s1,s2}์— ๋จธ๋ฌธ๋‹ค โ†’ self-loop

๐Ÿ”ฅ ํ•ต์‹ฌ

{s1,s2} --a--> {s1,s2}

๐Ÿ“‹ 42ํŽ˜์ด์ง€ โ€” ฮฑ = b ๊ณ„์‚ฐ ์‹œ์ž‘

T = {s1, s2} ฮฑ = b

ํ˜„์žฌ ์ƒํƒœ๋Š” {s1,s2}, ์ด๋ฒˆ์—๋Š” ์ž…๋ ฅ b์— ๋Œ€ํ•ด ๊ณ„์‚ฐํ•œ๋‹ค.

  • a๋Š” ๋๋‚ฌ๊ณ 
  • ๋‚จ์€ ์ž…๋ ฅ b๋ฅผ ์ฒ˜๋ฆฌํ•˜๋Š” ์ฐจ๋ก€

ํ•ด์•ผ ํ•  ๊ณ„์‚ฐ

Move({s1,s2}, b) = ?

์—ฌ๊ธฐ์„œ ์ƒˆ ์ƒํƒœ๊ฐ€ ๋‚˜์˜ฌ ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ๋‹ค.

๐Ÿ”ฅ ํ•ต์‹ฌ

์ƒํƒœ: {s1,s2} / ์ž…๋ ฅ: b โ†’ ๊ณ„์‚ฐ ์‹œ์ž‘


๐Ÿ“‹ 43ํŽ˜์ด์ง€ โ€” ์ƒˆ ์ƒํƒœ {s1,s3} ๋“ฑ์žฅ

U = {s1, s3}

๋‹ค์Œ ์ƒํƒœ U๋Š” {s1, s3}์ด๋‹ค.

์™œ ์ด๋ ‡๊ฒŒ ๋˜๋Š”๊ฐ€? ๊ณ„์‚ฐ ๊ณผ์ •

Move({s1,s2}, b):
  s1์—์„œ b โ†’ s1 โ†’ s1
  s2์—์„œ b โ†’ s2 โ†’ s3

ํ•ฉ์น˜๋ฉด: {s1, s3}

ฮต-closure({s1,s3}) = {s1,s3}

์ถ”๊ฐ€ ฮต ์ด๋™ ์—†์Œ โ†’ ๊ทธ๋Œ€๋กœ {s1, s3}


์ด ์ƒํƒœ๊ฐ€ ์ค‘์š”ํ•œ ์ด์œ 

์ƒํƒœ ์˜๋ฏธ
s1 ๊ณ„์† ๋ฐ˜๋ณต ๊ตฌ์กฐ ์ชฝ ์ƒํƒœ
s3 accepting state s4 ๋ฐ”๋กœ ์ง์ „ ์ƒํƒœ

์ฆ‰ {s1,s3}๋Š” โ€œ์•„์ง ๋ฐ˜๋ณต๋„ ๊ฐ€๋Šฅํ•˜๊ณ , ํ•œํŽธ์œผ๋กœ๋Š” ์ˆ˜์šฉ ์ƒํƒœ์— ๊ฐ€๊นŒ์›Œ์ง„ ์ƒํƒœโ€๋‹ค.

๐Ÿ”ฅ ํ•ต์‹ฌ

ฮด({s1,s2}, b) = {s1,s3}

๐Ÿ“‹ 44ํŽ˜์ด์ง€ โ€” ์ƒˆ ์ƒํƒœ ์—ฌ๋ถ€ ํ™•์ธ

ํ˜„์žฌ ์ƒํ™ฉ

U = {s1, s3}
T = {s1,s2}
ฮฑ = b
Dstates = { {s0,s1}, {s1,s2}, {s1} }

ํ˜„์žฌ Dstates์—๋Š” {s1,s3}๊ฐ€ ๋“ค์–ด ์žˆ์ง€ ์•Š๋‹ค.

๊ฒฐ๋ก 

{s1,s3} ๋Š” ์ƒˆ๋กœ์šด DFA ์ƒํƒœ โ†’ ์ถ”๊ฐ€ ํ•„์š”

๐Ÿ“‹ 45ํŽ˜์ด์ง€ โ€” Dstates ์—…๋ฐ์ดํŠธ

์—…๋ฐ์ดํŠธ ๊ฒฐ๊ณผ

Dstates = { {s0,s1}, {s1,s2}, {s1}, {s1,s3} }

DFA ์ƒํƒœ ์ง‘ํ•ฉ์ด ์ด์ œ ๋„ค ๊ฐœ๊ฐ€ ๋œ๋‹ค.

ฮด ์ „์ดํ‘œ ์—…๋ฐ์ดํŠธ

์ƒํƒœ a b
{s0,s1} {s1,s2} {s1}
{s1,s2} {s1,s2} {s1,s3}

ฮด({s1,s2}, b) = {s1,s3} ๊ณต์‹ ๋ฐ˜์˜ ์™„๋ฃŒ.

์ด ์ƒํƒœ๊ฐ€ ์ค‘์š”ํ•œ ์ด์œ 

{s1,s3}๋Š” ์•ž์œผ๋กœ ์ž…๋ ฅ b๋ฅผ ํ•˜๋‚˜ ๋” ์ฝ์œผ๋ฉด s4(accepting) ์ชฝ์œผ๋กœ ๊ฐˆ ์ˆ˜ ์žˆ๋‹ค.
์ฆ‰, ์ด์ œ DFA๊ฐ€ accepting NFA state๋ฅผ ํฌํ•จํ•˜๋Š” ์ƒํƒœ๋กœ ์ด์–ด์งˆ ์ค€๋น„๋ฅผ ํ•œ ์…ˆ์ด๋‹ค.


๐Ÿ”ฅ 41~45ํŽ˜์ด์ง€ ์ „์ฒด ํ๋ฆ„ ์š”์•ฝ

๋‹จ๊ณ„ ๋‚ด์šฉ
1. {s1,s2}์—์„œ a ฮด({s1,s2}, a) = {s1,s2} โ†’ self-loop
2. {s1,s2}์—์„œ b Move({s1,s2}, b) = {s1,s3}
3. ฮต-closure ์ ์šฉ = {s1,s3}
4. ์ƒํƒœ ์ถ”๊ฐ€ Dstates = { {s0,s1}, {s1,s2}, {s1}, {s1,s3} }
5. ์ „์ดํ‘œ ์—…๋ฐ์ดํŠธ ฮด({s1,s2}, b) = {s1,s3}

ํ˜„์žฌ๊นŒ์ง€ ์ „์ฒด DFA ๊ตฌ์กฐ

{s0,s1}
  โ”œโ”€ a โ†’ {s1,s2}
  โ””โ”€ b โ†’ {s1}

{s1,s2}
  โ”œโ”€ a โ†’ {s1,s2}   (self-loop)
  โ””โ”€ b โ†’ {s1,s3}

๐Ÿ”ฅ ์‹œํ—˜์—์„œ ์ค‘์š”ํ•œ ํฌ์ธํŠธ

ํฌ์ธํŠธ ์„ค๋ช…
์—ฌ๋Ÿฌ ์ƒํƒœ์˜ ์ด๋™์€ ํ•ฉ์ง‘ํ•ฉ Move({s1,s2}, b) = Move(s1,b) โˆช Move(s2,b)
๊ธฐ์กด ์ƒํƒœ์ธ์ง€ ํ•ญ์ƒ ํ™•์ธ ์ด๋ฏธ ์žˆ์œผ๋ฉด ์ถ”๊ฐ€ X, ์—†์œผ๋ฉด ์ถ”๊ฐ€ O
accepting ํฌํ•จ ์—ฌ๋ถ€ ๋‚˜์ค‘์— ์ฒดํฌ ํ˜„์žฌ {s1,s3}๋Š” ์•„์ง s4 ์—†์Œ, ๊ณง {s1,s4} ๋“ฑ์žฅ ์˜ˆ์ •

๐Ÿ“‹ 46ํŽ˜์ด์ง€ โ€” ๋‹ค์Œ ์ฒ˜๋ฆฌ ๋Œ€์ƒ: {s1}

ํ˜„์žฌ ์ƒํƒœ

Dstates = { {s0,s1}, {s1,s2}, {s1}, {s1,s3} }
์ƒํƒœ ์ฒ˜๋ฆฌ ์—ฌ๋ถ€
{s0,s1} โœ” ์™„๋ฃŒ
{s1,s2} โœ” ์™„๋ฃŒ
{s1} โŒ ๋ฏธ์ฒ˜๋ฆฌ
{s1,s3} โŒ ๋ฏธ์ฒ˜๋ฆฌ

{ s1 } is the next unmarked state in Dstates.

{s1}์ด Dstates์—์„œ ๋‹ค์Œ์œผ๋กœ ์ฒ˜๋ฆฌํ•  ๋ฏธํ‘œ์‹œ(unmarked) ์ƒํƒœ์ด๋‹ค.

subset construction์˜ ํ๋ฆ„:

  1. ์•„์ง ์ฒ˜๋ฆฌ ์•ˆ ํ•œ ์ƒํƒœ ํ•˜๋‚˜ ๊ณ ๋ฆ„
  2. ๊ทธ ์ƒํƒœ์—์„œ ์ž…๋ ฅ ๋ฌธ์ž ์ „๋ถ€์— ๋Œ€ํ•ด Move ๊ณ„์‚ฐ
  3. ๊ฒฐ๊ณผ ์ƒํƒœ๋ฅผ ์ถ”๊ฐ€/๊ธฐ๋ก
  4. mark ์ฒ˜๋ฆฌ

๐Ÿ”ฅ ํ•ต์‹ฌ

๋‹ค์Œ ์ฒ˜๋ฆฌ ๋Œ€์ƒ: T = {s1}

๐Ÿ“‹ 47ํŽ˜์ด์ง€ โ€” {s1}์—์„œ a ๊ณ„์‚ฐ

T = { s1 } ฮฑ = a

ํ˜„์žฌ ์ƒํƒœ๋Š” {s1}, ์ž…๋ ฅ a์— ๋Œ€ํ•ด ๊ณ„์‚ฐํ•œ๋‹ค.

ํ•ด์•ผ ํ•  ๊ณ„์‚ฐ

Move({s1}, a) = ?

์›์†Œ๊ฐ€ ํ•˜๋‚˜๋ฟ์ด๋ฏ€๋กœ ์‚ฌ์‹ค์ƒ NFA ์ƒํƒœ s1 ํ•˜๋‚˜์—์„œ์˜ ์ด๋™๋งŒ ๋ณด๋ฉด ๋œ๋‹ค.

๐Ÿ”ฅ ํ•ต์‹ฌ

์ง€๊ธˆ๋ถ€ํ„ฐ ๊ณ„์‚ฐํ•  ๊ฒƒ: ฮด({s1}, a)

๐Ÿ“‹ 48ํŽ˜์ด์ง€ โ€” ๊ณ„์‚ฐ ๊ฒฐ๊ณผ

U = { s1 , s2 }

๋‹ค์Œ ์ƒํƒœ U๋Š” {s1, s2}์ด๋‹ค.

์™œ ์ด๋ ‡๊ฒŒ ๋˜๋Š”๊ฐ€? ๊ณ„์‚ฐ ๊ณผ์ •

Move({s1}, a):
  s1์—์„œ a โ†’ s1 โ†’ s1
           โ†’ s1 โ†’ s2

๊ฒฐ๊ณผ: {s1, s2}

ฮต-closure({s1,s2}) = {s1,s2}

์ถ”๊ฐ€ ฮต ์ด๋™ ์—†์Œ


์˜๋ฏธ ์„ค๋ช…

s1์€ ์›๋ž˜ โ€œa๋ฅผ ์ฝ์œผ๋ฉด ์ž๊ธฐ ์ž์‹ ์— ๋จธ๋ฌผ ์ˆ˜๋„ ์žˆ๊ณ , ๋‹ค์Œ ๋‹จ๊ณ„์ธ s2๋กœ๋„ ๊ฐˆ ์ˆ˜ ์žˆ๋Š”โ€ ๋น„๊ฒฐ์ • ์ƒํƒœ์˜€๋‹ค. DFA์—์„œ๋Š” ๊ทธ ๋‘ ๊ฐ€๋Šฅ์„ฑ์„ ํ•œ ๋ฒˆ์— ๋‹ด์•„์•ผ ํ•˜๋ฏ€๋กœ:

{s1} --a--> {s1,s2}

๐Ÿ”ฅ ํ•ต์‹ฌ

ฮด({s1}, a) = {s1,s2}

๐Ÿ“‹ 49ํŽ˜์ด์ง€ โ€” ์ „์ดํ‘œ ๋ฐ˜์˜ (๊ธฐ์กด ์ƒํƒœ)

ํ˜„์žฌ ฮด ์ „์ดํ‘œ

์ƒํƒœ a b
{s0,s1} {s1,s2} {s1}
{s1,s2} {s1,s2} {s1,s3}
{s1} {s1,s2} ย 

{s1}์—์„œ a๋ฅผ ์ฝ์œผ๋ฉด {s1,s2}๋กœ ๊ฐ„๋‹ค๋Š” ๊ณ„์‚ฐ ๊ฒฐ๊ณผ๋ฅผ ฮด ํ‘œ์— ๊ธฐ๋ก ์™„๋ฃŒ.


์ค‘์š”ํ•œ ํฌ์ธํŠธ

์—ฌ๊ธฐ์„œ ์ค‘์š”ํ•œ ์ ์€ {s1,s2}๊ฐ€ ์ด๋ฏธ Dstates ์•ˆ์— ์žˆ๋Š” ์ƒํƒœ๋ผ๋Š” ์ ์ด๋‹ค.

  • ์ƒˆ๋กœ์šด ์ƒํƒœ ์•„๋‹˜ โ†’ ์ถ”๊ฐ€๋Š” ์•ˆ ํ•จ
  • ์ „์ดํ‘œ๋งŒ ๊ธฐ๋ก

ํ•™์ƒ๋“ค์ด ์ž์ฃผ ํ—ท๊ฐˆ๋ฆฌ๋Š” ๋ถ€๋ถ„:

โŒ โ€œ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์™”์œผ๋ฉด ๋ฌด์กฐ๊ฑด ์ƒํƒœ ์ถ”๊ฐ€โ€ โ†’ ํ‹€๋ฆผ
โœ… ์—†์„ ๋•Œ๋งŒ ์ถ”๊ฐ€, ์žˆ์œผ๋ฉด ์ „์ด๋งŒ ๊ธฐ๋ก

๐Ÿ”ฅ ํ•ต์‹ฌ

์ด ํŽ˜์ด์ง€์˜ ํ•ต์‹ฌ:
๊ธฐ์กด ์ƒํƒœ๋กœ ๊ฐ€๋Š” ์ „์ด๋ฅผ ํ‘œ์— ๋ฐ˜์˜

๐Ÿ“‹ 50ํŽ˜์ด์ง€ โ€” {s1}์—์„œ b ๊ณ„์‚ฐ ์‹œ์ž‘

T = { s1 } ฮฑ = b

ํ˜„์žฌ ์ƒํƒœ๋Š” {s1}, ์ด๋ฒˆ์—๋Š” ์ž…๋ ฅ b์— ๋Œ€ํ•ด ๊ณ„์‚ฐํ•œ๋‹ค.

ํ•ด์•ผ ํ•  ๊ฒƒ

Move({s1}, b) = ?

{s1}์— ๋Œ€ํ•œ ์ „์ด๋Š” ๋‘ ๊ฐœ๋ฅผ ๋‹ค ์ฑ„์›Œ์•ผ ํ•œ๋‹ค:

  • a โ†’ ์ด๋ฏธ ๊ณ„์‚ฐํ•จ (= {s1,s2})
  • b โ†’ ์ง€๊ธˆ๋ถ€ํ„ฐ ๊ณ„์‚ฐ

๊ทธ๋ž˜์•ผ DFA์—์„œ {s1} ํ–‰์ด ์™„์„ฑ๋œ๋‹ค.

๐Ÿ”ฅ ํ•ต์‹ฌ

๋‹ค์Œ ๊ณ„์‚ฐ: ฮด({s1}, b)

๐Ÿ”ฅ 46~50ํŽ˜์ด์ง€ ์ „์ฒด ํ๋ฆ„ ์š”์•ฝ

๋‹จ๊ณ„ ๋‚ด์šฉ
1. ๋‹ค์Œ ์ฒ˜๋ฆฌ ์ƒํƒœ ์„ ํƒ T = {s1}
2. a์— ๋Œ€ํ•ด ๊ณ„์‚ฐ Move({s1}, a) = {s1,s2}
3. ฮต-closure ์ ์šฉ = {s1,s2}
4. ์ „์ดํ‘œ ๋ฐ˜์˜ ฮด({s1}, a) = {s1,s2} (๊ธฐ์กด ์ƒํƒœ โ†’ ์ถ”๊ฐ€ ์—†์Œ)
5. b ๊ณ„์‚ฐ ์‹œ์ž‘ ฮด({s1}, b) ๊ณ„์‚ฐ ์ค€๋น„

ํ˜„์žฌ๊นŒ์ง€ ๋ˆ„์ ๋œ DFA ์ „์ดํ‘œ

DFA ์ƒํƒœ a b
{s0,s1} {s1,s2} {s1}
{s1,s2} {s1,s2} {s1,s3}
{s1} {s1,s2} ์•„์ง ๊ณ„์‚ฐ ์ค‘
{s1,s3} ์•„์ง ๋ฏธ์ฒ˜๋ฆฌ ์•„์ง ๋ฏธ์ฒ˜๋ฆฌ

๐Ÿ”ฅ ์ด ๊ตฌ๊ฐ„์—์„œ ๊ผญ ์ดํ•ดํ•ด์•ผ ํ•  ํฌ์ธํŠธ

  1. ์›์†Œ ํ•˜๋‚˜์งœ๋ฆฌ ์ง‘ํ•ฉ ์ƒํƒœ๋„ ๊ฒฐ๊ตญ DFA ์ƒํƒœ๋‹ค
    {s1}๋„ ๊ทธ๋ƒฅ ํ•˜๋‚˜์˜ DFA ์ƒํƒœ๋‹ค.

  2. ์ง‘ํ•ฉ ์ƒํƒœ์—์„œ ๊ณ„์‚ฐํ•  ๋•Œ๋„ ๊ทœ์น™์€ ๊ฐ™๋‹ค
    Move โ†’ ฮต-closure โ†’ ๊ธฐ์กด ์ƒํƒœ์ธ์ง€ ํ™•์ธ โ†’ ์ „์ดํ‘œ ๊ธฐ๋ก

  3. ๊ธฐ์กด ์ƒํƒœ๋ฉด ์ƒˆ๋กœ ์ถ”๊ฐ€ํ•˜์ง€ ์•Š๋Š”๋‹ค
    {s1,s2}๋Š” ์ด๋ฏธ ์žˆ์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ถ”๊ฐ€ํ•˜์ง€ ์•Š๊ณ  ์ „์ด๋งŒ ์ ๋Š”๋‹ค.

    COMP321 Compiler โ€” Lexical Analysis Part 3

    Kyungpook National University | Hwisoo So | Spring 2026


๐Ÿ“‹ 51ํŽ˜์ด์ง€ โ€” {s1}์—์„œ b ๊ณ„์‚ฐ

T = { s1 } ฮฑ = b

ํ˜„์žฌ ์ƒํƒœ {s1}์—์„œ ์ž…๋ ฅ b์— ๋Œ€ํ•ด ๊ณ„์‚ฐํ•œ๋‹ค.

๊ณ„์‚ฐ ๊ณผ์ •

Move({s1}, b):
  s1์—์„œ b โ†’ s1 โ†’ s1

๊ฒฐ๊ณผ: {s1}

ฮต-closure({s1}) = {s1}

์ถ”๊ฐ€ ฮต ์ด๋™ ์—†์Œ โ†’ ๊ทธ๋Œ€๋กœ {s1}

๐Ÿ”ฅ ์ตœ์ข… ๊ฒฐ๊ณผ

ฮด({s1}, b) = {s1}

์˜๋ฏธ โ€” ์™„์ „ํ•œ ๋ฃจํ”„ํ˜• ์ƒํƒœ

{s1} ์ƒํƒœ๋Š”:

์ž…๋ ฅ ๊ฒฐ๊ณผ
a {s1,s2}
b {s1} โ† self-loop

๐Ÿ“‹ 52ํŽ˜์ด์ง€ โ€” {s1} ์ „์ด ์™„์„ฑ

ฮด ์ „์ดํ‘œ ๋ฐ˜์˜

์ƒํƒœ a b
{s0,s1} {s1,s2} {s1}
{s1,s2} {s1,s2} {s1,s3}
{s1} {s1,s2} {s1}

{s1} ์ƒํƒœ ์ „์ด ์™„์„ฑ

๐Ÿ”ฅ ํ•ต์‹ฌ

{s1} โ†’ a โ†’ {s1,s2}
{s1} โ†’ b โ†’ {s1}   (self-loop)

์ด ์ƒํƒœ๋Š” ์™„์ „ํ•œ โ€œ๋ฃจํ”„ํ˜• ์ƒํƒœโ€ โ€” b๋ฅผ ๊ณ„์† ์ฝ์–ด๋„ ์ž๊ธฐ ์ž์‹ ์— ๋จธ๋ฌธ๋‹ค.


๐Ÿ“‹ 53ํŽ˜์ด์ง€ โ€” ์•„์ง ๋ฏธ์ฒ˜๋ฆฌ ์ƒํƒœ ๋‚จ์•„์žˆ์Œ

ํ˜„์žฌ ์ฒ˜๋ฆฌ ํ˜„ํ™ฉ

Dstates = {
  {s0,s1},    โœ” ์™„๋ฃŒ
  {s1,s2},    โœ” ์™„๋ฃŒ
  {s1},       โœ” ์™„๋ฃŒ
  {s1,s3}     โŒ ๋ฏธ์ฒ˜๋ฆฌ
}

์•„์ง ์ฒ˜๋ฆฌ ์•ˆ ๋œ ์ƒํƒœ๊ฐ€ ๋‚จ์•„ ์žˆ๋‹ค โ†’ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๊ณ„์† ์ง„ํ–‰

๐Ÿ”ฅ ํ•ต์‹ฌ

๋‹ค์Œ ์ฒ˜๋ฆฌ ๋Œ€์ƒ: T = {s1,s3}

๐Ÿ“‹ 54ํŽ˜์ด์ง€ โ€” {s1,s3}์—์„œ a ๊ณ„์‚ฐ

T = { s1, s3 } ฮฑ = a

ํ˜„์žฌ ์ƒํƒœ {s1,s3}์—์„œ ์ž…๋ ฅ a์— ๋Œ€ํ•ด ๊ณ„์‚ฐํ•œ๋‹ค.

๊ณ„์‚ฐ ๊ณผ์ •

Move({s1,s3}, a):
  s1์—์„œ a โ†’ s1 โ†’ s1
           โ†’ s1 โ†’ s2
  s3์—์„œ a โ†’ ์ด๋™ ์—†์Œ

๊ฒฐ๊ณผ: {s1, s2}

ฮต-closure({s1,s2}) = {s1,s2}

์ถ”๊ฐ€ ฮต ์ด๋™ ์—†์Œ

๐Ÿ”ฅ ๊ฒฐ๊ณผ

ฮด({s1,s3}, a) = {s1,s2}

์˜๋ฏธ

{s1,s3}์—์„œ a๋ฅผ ์ฝ์œผ๋ฉด ๋‹ค์‹œ ๋ฐ˜๋ณต ์ƒํƒœ {s1,s2}๋กœ ๋Œ์•„๊ฐ„๋‹ค.


๐Ÿ“‹ 55ํŽ˜์ด์ง€ โ€” {s1,s3}์—์„œ b ๊ณ„์‚ฐ (โญ ํ•ต์‹ฌ)

T = { s1, s3 } ฮฑ = b

ํ˜„์žฌ ์ƒํƒœ {s1,s3}์—์„œ ์ž…๋ ฅ b์— ๋Œ€ํ•ด ๊ณ„์‚ฐํ•œ๋‹ค.

๊ณ„์‚ฐ ๊ณผ์ •

Move({s1,s3}, b):
  s1์—์„œ b โ†’ s1 โ†’ s1
  s3์—์„œ b โ†’ s3 โ†’ s4  โ† accepting state!

๊ฒฐ๊ณผ: {s1, s4}

ฮต-closure({s1,s4}) = {s1,s4}

์ถ”๊ฐ€ ฮต ์ด๋™ ์—†์Œ

๐Ÿ”ฅ ๊ฒฐ๊ณผ

ฮด({s1,s3}, b) = {s1,s4}

โญ ๋งค์šฐ ์ค‘์š”ํ•œ ํฌ์ธํŠธ โ€” accepting state ๋“ฑ์žฅ

์ƒํƒœ ์˜๋ฏธ
s1 ์•„์ง ๋ฐ˜๋ณต ๊ฐ€๋Šฅ
s4 accepting state

์ฆ‰ {s1,s4}๋Š”:

  • ์•„์ง ๋ฐ˜๋ณต๋„ ๊ฐ€๋Šฅ (s1)
  • ๋™์‹œ์— accept ์ƒํƒœ ํฌํ•จ (s4)

โ†’ โ€œ์ด ๋ฌธ์ž์—ด์€ accept ๊ฐ€๋Šฅ ์ƒํƒœ์— ๋„๋‹ฌํ–ˆ๋‹คโ€


๐Ÿ”ฅ 51~55ํŽ˜์ด์ง€ ์ „์ฒด ํ๋ฆ„ ์š”์•ฝ

๋‹จ๊ณ„ ๋‚ด์šฉ
1. {s1} ์ฒ˜๋ฆฌ ์™„๋ฃŒ a โ†’ {s1,s2} / b โ†’ {s1} (self-loop)
2. ๋‹ค์Œ ์ฒ˜๋ฆฌ T = {s1,s3}
3. a ์ฒ˜๋ฆฌ {s1,s3} โ†’ a โ†’ {s1,s2}
4. b ์ฒ˜๋ฆฌ {s1,s3} โ†’ b โ†’ {s1,s4} โ† accepting ํฌํ•จ
5. ์ƒˆ ์ƒํƒœ ๋“ฑ์žฅ {s1,s4}

ํ˜„์žฌ๊นŒ์ง€ ๋ˆ„์  ์ „์ดํ‘œ

์ƒํƒœ a b
{s0,s1} {s1,s2} {s1}
{s1,s2} {s1,s2} {s1,s3}
{s1} {s1,s2} {s1}
{s1,s3} {s1,s2} {s1,s4}

์ด ๊ตฌ๊ฐ„ ํ•ต์‹ฌ ํฌ์ธํŠธ

๊ฐœ๋… ์„ค๋ช…
accepting state ํŒ๋‹จ ๊ธฐ์ค€ DFA ์ƒํƒœ ์•ˆ์— accepting NFA ์ƒํƒœ ํฌํ•จํ•˜๋ฉด accept
{s1} ์ˆœ์ˆ˜ ๋ฐ˜๋ณต ์ƒํƒœ (self-loop)
{s1,s2} ์ค‘๊ฐ„ ๋‹จ๊ณ„ ์ƒํƒœ
{s1,s3} accept ์ง์ „ ์ƒํƒœ
{s1,s4} accept ์ƒํƒœ

๐Ÿ“‹ 56ํŽ˜์ด์ง€ โ€” {s1,s4} ์ „์ด ๊ณ„์‚ฐ

T = { s1, s3 } ฮฑ = b โ†’ {s1,s4} ํ™•์ •

{s1, s3}์—์„œ b๋ฅผ ์ฝ์œผ๋ฉด {s1, s4}๋กœ ๊ฐ„๋‹ค.

Move({s1,s3}, b):
  s1์—์„œ b โ†’ s1
  s3์—์„œ b โ†’ s4

๊ฒฐ๊ณผ: {s1, s4}

์ด ์ˆœ๊ฐ„์ด ์•„์ฃผ ์ค‘์š”ํ•˜๋‹ค. ์™œ๋ƒํ•˜๋ฉด ์ด์ œ DFA ์•ˆ์— accepting state๊ฐ€ ์ƒ๊ธฐ๊ธฐ ์‹œ์ž‘ํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

๐Ÿ”ฅ ํ•ต์‹ฌ

ฮด({s1,s3}, b) = {s1,s4}

๐Ÿ“‹ 57ํŽ˜์ด์ง€ โ€” U = {s1, s4} ํ™•์ •

U = { s1, s4 }

๋‹ค์Œ ์ƒํƒœ U๋Š” {s1, s4}์ด๋‹ค.

{s1,s4}๋Š” ๊ทธ๋ƒฅ ์ƒˆ๋กœ์šด ์ƒํƒœ๊ฐ€ ์•„๋‹ˆ๋ผ accepting NFA state๋ฅผ ํฌํ•จํ•˜๋Š” DFA ์ƒํƒœ๋‹ค.

์ฆ‰ DFA ๊ด€์ ์—์„œ ์ด ์ƒํƒœ๋Š” ๋‚˜์ค‘์— accepting state๊ฐ€ ๋œ๋‹ค.


๐Ÿ“‹ 58ํŽ˜์ด์ง€ โ€” {s1,s4} Dstates์— ์ถ”๊ฐ€

U not yet in Dstates โ†’ add.

{s1,s4}๋Š” ์•„์ง Dstates์— ์—†์œผ๋ฏ€๋กœ ์ถ”๊ฐ€ํ•œ๋‹ค.

Dstates ์—…๋ฐ์ดํŠธ

์ด์ „:
Dstates = { {s0,s1}, {s1,s2}, {s1}, {s1,s3} }

์ดํ›„:
Dstates = {
  {s0,s1},
  {s1,s2},
  {s1},
  {s1,s3},
  {s1,s4}   โ† ์ƒˆ๋กœ ์ถ”๊ฐ€
}

subset construction์—์„œ ํ•ญ์ƒ ํ•ด์•ผ ํ•˜๋Š” ์ผ:

  1. Move ๊ณ„์‚ฐ
  2. ฮต-closure ์ ์šฉ
  3. ๊ธฐ์กด ์ƒํƒœ์ธ์ง€ ํ™•์ธ
  4. ์—†์œผ๋ฉด ์ถ”๊ฐ€ โ† ์ง€๊ธˆ ์—ฌ๊ธฐ

๐Ÿ“‹ 59ํŽ˜์ด์ง€ โ€” ์ „์ดํ‘œ ์—…๋ฐ์ดํŠธ ํ™•์ •

ํ˜„์žฌ ฮด ์ „์ดํ‘œ

์ƒํƒœ a b
{s0,s1} {s1,s2} {s1}
{s1,s2} {s1,s2} {s1,s3}
{s1} {s1,s2} {s1}
{s1,s3} {s1,s2} {s1,s4}
{s1,s4} ย  ย 

{s1,s3} โ€“bโ€“> {s1,s4} ๋ฐ˜์˜ ์™„๋ฃŒ.

์ด์ œ ๋‚จ์€ ๋ฏธ์ฒ˜๋ฆฌ ์ƒํƒœ: {s1,s4} ํ•˜๋‚˜๋ฟ โ†’ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๊ฑฐ์˜ ๋


๐Ÿ“‹ 60ํŽ˜์ด์ง€ โ€” ๋งˆ์ง€๋ง‰ ์ƒํƒœ {s1,s4} ์ฒ˜๋ฆฌ

T = { s1, s4 } is the last unmarked state.

{s1,s4}๋Š” ๋งˆ์ง€๋ง‰ ๋ฏธ์ฒ˜๋ฆฌ ์ƒํƒœ์ด๋‹ค.

โ€œReading an a takes us to {s1, s2}. Reading a b takes us to {s1}. Those two transitions are added to ฮด.โ€

ฮฑ = a ๊ณ„์‚ฐ

Move({s1,s4}, a):
  s1์—์„œ a โ†’ s1, s2
  s4์—์„œ a โ†’ ์ด๋™ ์—†์Œ

๊ฒฐ๊ณผ: {s1, s2}
โ†’ ฮด({s1,s4}, a) = {s1,s2}

ฮฑ = b ๊ณ„์‚ฐ

Move({s1,s4}, b):
  s1์—์„œ b โ†’ s1
  s4์—์„œ b โ†’ ์ด๋™ ์—†์Œ

๊ฒฐ๊ณผ: {s1}
โ†’ ฮด({s1,s4}, b) = {s1}

์˜๋ฏธ ์„ค๋ช…

{s1,s4}๋Š” accept ์ƒํƒœ๋ฅผ ํฌํ•จํ•˜์ง€๋งŒ, ์ž…๋ ฅ์„ ๋” ์ฝ์œผ๋ฉด ๋‹ค์‹œ ๋น„accept ์ƒํƒœ๋กœ ๊ฐˆ ์ˆ˜๋„ ์žˆ๋‹ค. ์ด๊ฒŒ ์ž์—ฐ์Šค๋Ÿฌ์šด ์ด์œ :

DFA์˜ accept ์—ฌ๋ถ€๋Š” โ€œํ˜„์žฌ๊นŒ์ง€ ์ฝ์€ ๋ฌธ์ž์—ด์ด accept๋˜๋Š”๊ฐ€โ€๋ฅผ ๋œปํ•˜์ง€,
โ€œ์—ฌ๊ธฐ์„œ๋ถ€ํ„ฐ ์˜์›ํžˆ accept ์ƒํƒœ์— ๊ณ ์ •๋œ๋‹คโ€๋Š” ๋œป์€ ์•„๋‹ˆ๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

์ฆ‰ accepting state๋„ ๋‹ค๋ฅธ ์ž…๋ ฅ์„ ๋ฐ›์œผ๋ฉด ๋‹ค๋ฅธ ์ƒํƒœ๋กœ ์ „์ดํ•  ์ˆ˜ ์žˆ๋‹ค.


๐Ÿ“‹ 61ํŽ˜์ด์ง€ โ€” ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ข…๋ฃŒ

No more unmarked states in Dstates. The algorithm stops.

Dstates ์•ˆ์— ๋” ์ด์ƒ ๋ฏธ์ฒ˜๋ฆฌ ์ƒํƒœ๊ฐ€ ์—†๋‹ค. ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ์ข…๋ฃŒ๋œ๋‹ค.

์ตœ์ข… DFA ์ƒํƒœ ์ง‘ํ•ฉ

{s0,s1}
{s1,s2}
{s1}
{s1,s3}
{s1,s4}

๊ฐ ์ƒํƒœ์— ๋Œ€ํ•ด ์ž…๋ ฅ a, b ์ „์ด๊ฐ€ ๋ชจ๋‘ ์ฑ„์›Œ์กŒ๋‹ค โ†’ subset construction ์™„์ „ํžˆ ๋


๐Ÿ“‹ 62ํŽ˜์ด์ง€ โ€” ์ตœ์ข… DFA ์™„์„ฑ โญ

The DFA for the above transition function ฮด

โ€œAll DFA states that contain an accepting NFA state become accepting states in the DFA!โ€
accepting NFA state๋ฅผ ํฌํ•จํ•˜๋Š” ๋ชจ๋“  DFA ์ƒํƒœ๋Š” DFA์—์„œ๋„ accepting state๊ฐ€ ๋œ๋‹ค.

์ตœ์ข… ์ „์ดํ‘œ (์™„์„ฑ๋ณธ)

์ƒํƒœ a b
{s0,s1} {s1,s2} {s1}
{s1,s2} {s1,s2} {s1,s3}
{s1} {s1,s2} {s1}
{s1,s3} {s1,s2} {s1,s4}
{s1,s4} {s1,s2} {s1}

Accepting state ํŒ์ •

NFA์˜ accepting state = s4

๋”ฐ๋ผ์„œ s4๋ฅผ ํฌํ•จํ•˜๋Š” DFA ์ƒํƒœ๋งŒ accepting:

DFA ์ƒํƒœ accepting ์—ฌ๋ถ€
{s0,s1} โŒ
{s1,s2} โŒ
{s1} โŒ
{s1,s3} โŒ
{s1,s4} โœ… accepting

๐Ÿ–ผ ์ตœ์ข… DFA ๊ทธ๋ฆผ (์ƒํƒœ ์ด๋ฆ„ ๋‹จ์ˆœํ™”)

์Šฌ๋ผ์ด๋“œ์—๋Š” ์ง‘ํ•ฉ ์ด๋ฆ„์„ ์ค„์—ฌ์„œ ๊ทธ๋ฆฐ๋‹ค:

์ง‘ํ•ฉ ์ƒํƒœ ๋‹จ์ˆœ ์ด๋ฆ„
{s0,s1} S0
{s1} S1
{s1,s2} S1S2
{s1,s3} S1S3
{s1,s4} S1S4

์ฃผ์š” ํ™”์‚ดํ‘œ:

S0     --a--> S1S2
S0     --b--> S1
S1S2   --a--> S1S2  (self-loop)
S1S2   --b--> S1S3
S1     --a--> S1S2
S1     --b--> S1    (self-loop)
S1S3   --a--> S1S2
S1S3   --b--> S1S4
S1S4   --a--> S1S2
S1S4   --b--> S1

์ด ๊ทธ๋ฆผ์€ ์•ž์—์„œ ์ˆ˜์‹ญ ํŽ˜์ด์ง€ ๋™์•ˆ ๊ณ„์‚ฐํ•œ ๊ฒฐ๊ณผ๋ฅผ ํ•œ ์žฅ์œผ๋กœ ์••์ถ•ํ•œ ์ตœ์ข… DFA ์™„์„ฑ๋ณธ์ด๋‹ค.


๐Ÿ“‹ 63ํŽ˜์ด์ง€ โ€” Extra: Executing JLex

Extra: Executing JLex

์ถ”๊ฐ€: JLex ์‹คํ–‰ํ•˜๊ธฐ

Note: this is not in the range of midterm / final exam. Just for practice.
์ฃผ์˜: ์ด๊ฒƒ์€ ์ค‘๊ฐ„/๊ธฐ๋ง ์‹œํ—˜ ๋ฒ”์œ„๊ฐ€ ์•„๋‹ˆ๋‹ค. ์—ฐ์Šต์šฉ์ผ ๋ฟ์ด๋‹ค.

์ด ํŽ˜์ด์ง€ ์ดํ›„๋Š” JLex๋ฅผ ์‹ค์ œ๋กœ ์‹คํ–‰ํ•ด ๋ณด๋Š” ๋ฐฉ๋ฒ• ์†Œ๊ฐœ ํŒŒํŠธ๋‹ค.


๐Ÿ“‹ 64ํŽ˜์ด์ง€ โ€” JLex ์‹คํ–‰ ํ™˜๊ฒฝ ์ค€๋น„

Running JLex on a Sample Scanner Spec

์ƒ˜ํ”Œ scanner ๋ช…์„ธ์— ๋Œ€ํ•ด JLex ์‹คํ–‰ํ•˜๊ธฐ

์ค€๋น„ ๋‹จ๊ณ„

๋‹จ๊ณ„ ๋‚ด์šฉ
1 LMS์—์„œ jlex_examples.zip ๋‹ค์šด๋กœ๋“œ ๋ฐ ์••์ถ• ํ•ด์ œ
2 JLex ๋‹ค์šด๋กœ๋“œ โ€” Source Code ๋ฐ›๊ธฐ
3 JLex์˜ Main.java๋ฅผ jlex_examples์˜ JLex ํด๋” ์•ˆ์— ๋„ฃ๊ธฐ
4 User Manual ๋ฌธ์„œ ์ฐธ๊ณ  ๊ฐ€๋Šฅ
5 JDK ํ•„์š”

์•„์ง scanner๋ฅผ ๋Œ๋ฆฌ๋Š” ๋‹จ๊ณ„๊ฐ€ ์•„๋‹ˆ๋ผ ์‚ฌ์ „ ์„ธํŒ… ๋‹จ๊ณ„๋‹ค.


๐Ÿ“‹ 65ํŽ˜์ด์ง€ โ€” JLex ์ปดํŒŒ์ผ ๋ฐ ์‹คํ–‰

Running JLex on a Sample Scanner Spec

JLex ์ปดํŒŒ์ผ

# JLex ๋””๋ ‰ํ„ฐ๋ฆฌ๋กœ ์ด๋™ ํ›„
javac Main.java

Scanner.l์—์„œ scanner ์ฝ”๋“œ ์ƒ์„ฑ

# JLex์˜ ์ƒ์œ„ ๋””๋ ‰ํ„ฐ๋ฆฌ๋กœ ์ด๋™ ํ›„
java JLex.Main Scanner.l
  • Scanner.l: JLex์— ๋„˜๊ธฐ๋Š” scanner ๋ช…์„ธ ํŒŒ์ผ
  • ์ด ๋ช…๋ น์ด ์‹คํ–‰๋˜๋ฉด JLex๊ฐ€ ์ด๋ก  ์ˆœ์„œ๋Œ€๋กœ ์ฒ˜๋ฆฌํ•œ๋‹ค:
user code ์ฒ˜๋ฆฌ
declarations ์ฒ˜๋ฆฌ
lexical rules ์ฒ˜๋ฆฌ
NFA ์ƒ์„ฑ
DFA transition table ์ƒ์„ฑ
minimization
lexical analyzer code ์ถœ๋ ฅ

์ฆ‰ ์šฐ๋ฆฌ๊ฐ€ ๋ฐฐ์šด ์ด๋ก  ์ˆœ์„œ ๊ทธ๋Œ€๋กœ ์ฝ˜์†”์— ๋‚˜ํƒ€๋‚œ๋‹ค:

spec โ†’ NFA โ†’ DFA โ†’ minimization โ†’ code output

๐Ÿ“‹ 66ํŽ˜์ด์ง€ โ€” Scanner ์ปดํŒŒ์ผ ๋ฐ ํ…Œ์ŠคํŠธ

Scanner ์ปดํŒŒ์ผ ๋ฐ ์‹คํ–‰

handler์™€ ํ•จ๊ป˜ ์ปดํŒŒ์ผ

javac Scanner.l.java Test.java
  • Test.java: handler ํŒŒ์ผ
  • ์ด ๋ช…๋ น์œผ๋กœ input.txt๋ฅผ ์Šค์บ”ํ•  ์ค€๋น„๊ฐ€ ๋œ๋‹ค.

scanner ์‹คํ–‰

java Test

์‹ค์ œ ์ถœ๋ ฅ ์˜ˆ์‹œ

Token(IF, "if", 0)
Token(ID, "spelling", 3)
Token(LESS, "<", 6)
Token(LESS_EQ, "<=", 17)

JLex spec์—์„œ ์ •์˜ํ–ˆ๋˜ ๊ทœ์น™๋“ค์ด ์‹ค์ œ ์ž…๋ ฅ ํŒŒ์ผ์— ์ ์šฉ๋˜์–ด, ๋ฌธ์ž์—ด์ด ํ† ํฐ์œผ๋กœ ๋ถ„๋ฅ˜๋œ ๊ฒฐ๊ณผ๊ฐ€ ์ถœ๋ ฅ๋œ๋‹ค.

์ด ์žฅ๋ฉด์€ โ€œ์ •๊ทœํ‘œํ˜„์‹ ๋ช…์„ธ โ†’ scanner ์ƒ์„ฑ โ†’ ์ž…๋ ฅ ์Šค์บ” โ†’ token ์ถœ๋ ฅโ€์ด ์‹ค์ œ๋กœ ๋™์ž‘ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ๋ณด์—ฌ์ฃผ๋Š” ์ตœ์ข… ์‹ค์Šต ์˜ˆ์‹œ๋‹ค.

Scanner2.l๋„ ํ™•์ธํ•ด ๋ณผ ์ˆ˜ ์žˆ๋‹ค.


๐Ÿ”ฅ 56~66ํŽ˜์ด์ง€ ์ „์ฒด ํ•ต์‹ฌ ์š”์•ฝ

1. Subset Construction ์ตœ์ข… ๋งˆ๋ฌด๋ฆฌ

๋งˆ์ง€๋ง‰ ๋ฏธ์ฒ˜๋ฆฌ ์ƒํƒœ {s1,s4}๊นŒ์ง€ ์ฒ˜๋ฆฌํ•˜๋ฉด ๋” ์ด์ƒ ์ƒˆ ์ƒํƒœ๊ฐ€ ์ƒ๊ธฐ์ง€ ์•Š๊ณ  ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ๋๋‚œ๋‹ค. ์ตœ์ข… DFA ์ „์ดํ‘œ์™€ DFA ๊ทธ๋ฆผ์ด 62ํŽ˜์ด์ง€์— ์ •๋ฆฌ๋œ๋‹ค.

2. Accepting State ํŒ์ • ๊ทœ์น™ (โญ ์‹œํ—˜ ํ•ต์‹ฌ)

DFA ์ƒํƒœ ์•ˆ์— accepting NFA state๊ฐ€ ํฌํ•จ๋˜์–ด ์žˆ์œผ๋ฉด ๊ทธ DFA ์ƒํƒœ๋Š” accepting์ด๋‹ค.

์ด ์˜ˆ์—์„œ๋Š” {s1,s4}๊ฐ€ accepting state๋‹ค.

3. 63ํŽ˜์ด์ง€ ์ดํ›„๋Š” ์‹ค์Šต ํŒŒํŠธ

JLex ์„ค์น˜, ์ปดํŒŒ์ผ, ์‹คํ–‰, ํ…Œ์ŠคํŠธ ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ฃผ๋Š” ๊ตฌ๊ฐ„์ด๋ฉฐ, ์Šฌ๋ผ์ด๋“œ์— ๋ช…์‹œ์ ์œผ๋กœ ์‹œํ—˜๋ฒ”์œ„๊ฐ€ ์•„๋‹ˆ๋ผ๊ณ  ์ ํ˜€ ์žˆ๋‹ค.


๐Ÿ”ฅ ์ „์ฒด ๊ฐ•์˜ ์‹œํ—˜ ํ•ต์‹ฌ ์ •๋ฆฌ

๊ณ„์‚ฐํ•  ์ค„ ์•Œ์•„์•ผ ํ•˜๋Š” ๊ฒƒ

ํ•ญ๋ชฉ ๋‚ด์šฉ
Move(S, a) ์ƒํƒœ ์ง‘ํ•ฉ S์—์„œ a๋กœ ์ด๋™ ๊ฐ€๋Šฅํ•œ ์ƒํƒœ๋“ค์˜ ํ•ฉ์ง‘ํ•ฉ
ฮต-closure(S) ฮต ์ „์ด๋กœ ๋„๋‹ฌ ๊ฐ€๋Šฅํ•œ ๋ชจ๋“  ์ƒํƒœ ํฌํ•จ
์ƒˆ DFA ์ƒํƒœ ์ƒ์„ฑ ๊ฒฐ๊ณผ ์ƒํƒœ๊ฐ€ Dstates์— ์—†์œผ๋ฉด ์ถ”๊ฐ€
Accepting DFA state ํŒ์ • ์ง‘ํ•ฉ ์•ˆ์— NFA accepting state ํฌํ•จ ์—ฌ๋ถ€ ํ™•์ธ

์‹œํ—˜ ๋ฒ”์œ„ ์ •๋ฆฌ

๋ฒ”์œ„ ๋‚ด์šฉ
โœ… ์‹œํ—˜ ๋ฒ”์œ„ RE โ†’ NFA, NFA โ†’ DFA (subset construction), DFA ์ตœ์†Œํ™”, scanner generator ์ด๋ก 
โŒ ์‹œํ—˜ ๋ฒ”์œ„ ์•„๋‹˜ JLex ์‹คํ–‰ ๋ช…๋ น, ํŒŒ์ผ ๋ฐฐ์น˜, ์ฝ˜์†” ์‹คํ–‰ ์ ˆ์ฐจ (63ํŽ˜์ด์ง€ ์ดํ›„)

์ค‘์š”๋„ ์ •๋ฆฌ

๋งค์šฐ ์ค‘์š” ๋œ ์ค‘์š” (์‹ค์Šต์šฉ)
Move, ฮต-closure ๊ณ„์‚ฐ JLex ์‹คํ–‰ ๋ช…๋ น์–ด
DFA ์ƒํƒœ ํ™•์žฅ ๊ณผ์ • ํŒŒ์ผ ๊ตฌ์กฐ ๋ฐฐ์น˜
accepting state ํŒ์ • ์ฝ˜์†” ์ถœ๋ ฅ ํ•ด์„
subset construction ํ๋ฆ„ ์ „์ฒด ย 

Categories:

Updated:

Leave a comment