Breder.org

A Practical Markdown Grammar Definition

Specifying the grammar of a language is the first step toward writing a solid parser for it. In this document, I specify a set of Markdown-like grammar rules that I find the most practical.

In the following definition, each Word names a production, text in single quotes is a literal string, square brackets [ ] mark the enclosed contents as optional, and | separates alternatives, any one of which may match.
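Rules written in the right-recursive form X := Item [ X ], such as Blocks, ListItems and Spaces below, describe one or more repetitions of Item. A hand-written parser will typically turn that recursion into a plain loop. A minimal Python sketch for Spaces, where match_spaces is a hypothetical helper name:

```python
def match_spaces(text: str, pos: int):
    """Match Spaces := Space [ Spaces ] starting at text[pos].

    The right recursion means "one or more Space", so a loop is enough.
    Returns the index just past the last ' ', or None if text[pos] is not ' '.
    """
    end = pos
    while end < len(text) and text[end] == " ":
        end += 1
    return end if end > pos else None
```

For example, match_spaces("a  b", 1) returns 3, the index of 'b'.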

Where the grammar is ambiguous, the first alternative that matches takes precedence.
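This precedence rule is what parsing expression grammars (PEGs) call ordered choice: alternatives are tried in the order listed, and the first success wins, so later alternatives are never consulted. The Python sketch below assumes a parser is any function taking (text, pos) and returning (value, new_pos) or None; first_match and literal are hypothetical helpers, not part of any library.

```python
from typing import Callable, Optional, Sequence, Tuple

# A parser takes (text, pos) and returns (value, new_pos) on success, else None.
Parser = Callable[[str, int], Optional[Tuple[object, int]]]


def first_match(alternatives: Sequence[Parser]) -> Parser:
    """Ordered choice: try each alternative in order, return the first success."""
    def parse(text: str, pos: int):
        for alternative in alternatives:
            result = alternative(text, pos)
            if result is not None:
                return result
        return None
    return parse


def literal(s: str) -> Parser:
    """Match the exact string s at pos."""
    def parse(text: str, pos: int):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse
```

With first_match([literal("**"), literal("*")]), the input "**x" yields ("**", 2); swapping the two alternatives yields ("*", 1) instead. This is why the order of alternatives, such as BoldText before EmphasisText, matters.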

Document := [ Blocks ]
Blocks   := Block [ Blocks ]

Block := EmptyLine | Header | List | ImageBlock | CodeBlock | Paragraph

EmptyLine := '\n'

Header      := HeaderLevel Spaces FormattedText '\n'
HeaderLevel := '#' | '##' | '###' | '####' | '#####' | '######'

List      := ListItems
ListItems := ListItem [ ListItems ]
ListItem  := '*' Spaces InlineText '\n'

ImageBlock := '![' FormattedText '](' UrlText ')'

CodeBlock           := '```\n' NonTripleBackQuotes '\n```\n'
NonTripleBackQuotes := any non-empty utf-8 sequence which
                       does not include '\n```\n'

Spaces := Space [ Spaces ]
Space  := ' '

Paragraph := [ Spaces ] InlineText [ Spaces ] '\n'

InlineText    := AngleBracketsLink | InlineLink | FormattedText
FormattedText := BoldText | EmphasisText | InlineCode | LiteralText

BoldText      := '**' InlineText  '**'
EmphasisText  := '*' InlineText '*'
InlineCode    := '`' NonBackQuote  '`'
NonBackQuote  := any non-empty utf-8 sequence which does not include
                 '`', '\t', '\n' and does not start or end with ' '

AngleBracketsLink := '<' UrlText '>'
InlineLink := '[' FormattedText '](' UrlText ')'
UrlText := any non-empty utf-8 sequence
           excluding ' ', '\t', '\n', '>', ')'

LiteralText := any non-empty utf-8 sequence excluding '\n'
    and which does not start or end with either ' ' or '\t'
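Putting the block-level rules together, a hand-written parser can try each Block alternative in the listed order. The Python sketch below is one possible reading of the grammar, not a reference implementation. It assumes that inline content (FormattedText and InlineText) is returned as raw, unvalidated text, that the alt text of an ImageBlock can be approximated as any run of characters other than ']' and newline, and it uses the hypothetical names parse_block and parse_document.

```python
import re

# A simplified block-level parser for the grammar (sketch, not a reference).
# Assumptions: FormattedText and InlineText are kept as raw, unvalidated text,
# and the text inside '![...]' is approximated as "no ']' and no newline".
FENCE = "`" * 3  # three backquotes, built with '`' * 3 to avoid a literal fence
HEADER = re.compile(r"(#{1,6}) +([^\n]+)\n")
LIST_ITEM = re.compile(r"\* +([^\n]+)\n")
IMAGE_BLOCK = re.compile(r"!\[([^\]\n]+)\]\(([^ \t\n>)]+)\)")
PARAGRAPH = re.compile(r" *([^ \t\n][^\n]*?) *\n")
CODE_OPEN, CODE_CLOSE = FENCE + "\n", "\n" + FENCE + "\n"


def parse_block(text: str, pos: int):
    """Try each Block alternative in grammar order; return (block, new_pos)."""
    if text.startswith("\n", pos):  # EmptyLine
        return ("empty",), pos + 1
    m = HEADER.match(text, pos)  # Header
    if m:
        return ("header", len(m.group(1)), m.group(2)), m.end()
    items, end = [], pos  # List: one or more ListItems
    m = LIST_ITEM.match(text, end)
    while m:
        items.append(m.group(1))
        end = m.end()
        m = LIST_ITEM.match(text, end)
    if items:
        return ("list", items), end
    m = IMAGE_BLOCK.match(text, pos)  # ImageBlock (no trailing '\n' consumed)
    if m:
        return ("image", m.group(1), m.group(2)), m.end()
    if text.startswith(CODE_OPEN, pos):  # CodeBlock
        body_start = pos + len(CODE_OPEN)
        close = text.find(CODE_CLOSE, body_start)
        if close > body_start:  # body is non-empty and ends at the first close
            return ("code", text[body_start:close]), close + len(CODE_CLOSE)
    m = PARAGRAPH.match(text, pos)  # Paragraph, the last alternative
    if m:
        return ("paragraph", m.group(1)), m.end()
    raise ValueError(f"no Block alternative matches at offset {pos}")


def parse_document(text: str):
    """Document := [ Blocks ]: parse Block after Block until the input ends."""
    blocks, pos = [], 0
    while pos < len(text):
        block, pos = parse_block(text, pos)
        blocks.append(block)
    return blocks
```

Lines such as "#hashtag" or "*emphasis*" fall through to Paragraph, because Header and ListItem both require at least one space after their markers. Since ImageBlock does not consume a trailing '\n', the newline after an image is parsed as a separate EmptyLine.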
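At the inline level, NonBackQuote and LiteralText are character-level constraints that translate directly into predicates. The Python sketch below handles only InlineCode rather than the full InlineText choice, and the names is_non_back_quote, is_literal_text and parse_inline_code are hypothetical.

```python
def is_non_back_quote(s: str) -> bool:
    """NonBackQuote: non-empty, free of '`', tab and newline, no edge spaces."""
    return (
        s != ""
        and not any(c in s for c in "`\t\n")
        and not s.startswith(" ")
        and not s.endswith(" ")
    )


def is_literal_text(s: str) -> bool:
    """LiteralText: non-empty, no newline, no leading/trailing space or tab."""
    return s != "" and "\n" not in s and s[0] not in " \t" and s[-1] not in " \t"


def parse_inline_code(text: str, pos: int = 0):
    """InlineCode := '`' NonBackQuote '`'. Returns (code, new_pos) or None.

    NonBackQuote cannot contain '`', so the first backquote after the
    opening one must be the closing delimiter; no backtracking is needed.
    """
    if not text.startswith("`", pos):
        return None
    close = text.find("`", pos + 1)
    if close == -1:
        return None
    code = text[pos + 1:close]
    return (code, close + 1) if is_non_back_quote(code) else None
```

For example, parse_inline_code("`x = 1` rest") returns ("x = 1", 7), while "` x `" is rejected because of its leading and trailing spaces.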
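The two link forms share UrlText, so a single character class can serve both. This Python sketch tries them in the same order as the InlineText rule lists them. It assumes the link text of an InlineLink can be approximated as any characters other than ']' and newline instead of a full FormattedText parse, and that an angle-bracket link reuses its URL as its text; parse_link is a hypothetical name.

```python
import re

# UrlText: non-empty, excluding ' ', '\t', '\n', '>' and ')'.
URL_TEXT = r"[^ \t\n>)]+"

# AngleBracketsLink := '<' UrlText '>'
ANGLE_BRACKETS_LINK = re.compile(r"<(" + URL_TEXT + r")>")

# InlineLink := '[' FormattedText '](' UrlText ')'
# Assumption: FormattedText is approximated as "no ']' and no newline".
INLINE_LINK = re.compile(r"\[([^\]\n]+)\]\((" + URL_TEXT + r")\)")


def parse_link(text: str, pos: int = 0):
    """Try AngleBracketsLink, then InlineLink, at pos (ordered choice).

    Returns (link_text, url, new_pos), or None if neither form matches.
    Assumption: an angle-bracket link reuses its URL as the link text.
    """
    m = ANGLE_BRACKETS_LINK.match(text, pos)
    if m:
        return m.group(1), m.group(1), m.end()
    m = INLINE_LINK.match(text, pos)
    if m:
        return m.group(1), m.group(2), m.end()
    return None
```

A URL containing a space, such as <not a url>, matches neither form, so under the InlineText rule it would fall through to FormattedText.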