A Practical Markdown Grammar Definition
Specifying the grammar of a language is the first step towards writing a solid parser for it. In this document, I specify a set of Markdown-like grammar rules that I find the most practical.
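In the following definition, a capitalized Word names a production, a quoted string such as '#' stands for those literal characters ('\n' being a newline), productions written side by side must appear in sequence, [ ] marks its contents as optional, and | separates alternatives.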
In cases where the grammar may be ambiguous, the first production which matches takes precedence.
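One way to read this notation, including the first-match rule, is as ordered choice in the style of a parsing expression grammar (PEG). The sketch below assumes that reading. The helper names lit, seq, choice and optional are hypothetical and are not part of the grammar. Each parser takes the input and a start offset, and returns either the offset just past its match or None.

```python
from typing import Callable, Optional

# A parser receives the input and a start offset and returns the offset just
# past what it matched, or None when it does not match.
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    """A quoted literal such as '#': match s exactly."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def seq(*parts: Parser) -> Parser:
    """Productions written side by side: match each part in turn."""
    def parse(text: str, pos: int) -> Optional[int]:
        for part in parts:
            end = part(text, pos)
            if end is None:
                return None
            pos = end
        return pos
    return parse


def choice(*alternatives: Parser) -> Parser:
    """'|' with first-match precedence: return the first alternative that matches."""
    def parse(text: str, pos: int) -> Optional[int]:
        for alternative in alternatives:
            end = alternative(text, pos)
            if end is not None:
                return end
        return None
    return parse


def optional(p: Parser) -> Parser:
    """'[ p ]': match p if possible, otherwise match the empty string."""
    def parse(text: str, pos: int) -> Optional[int]:
        end = p(text, pos)
        return pos if end is None else end
    return parse


# FormattedText tries BoldText before EmphasisText, so '**' is attempted first.
marker = choice(lit("**"), lit("*"))
print(marker("**bold**", 0))  # 2
print(marker("*em*", 0))      # 1
```

Under this reading, choice never goes back to try a later alternative once an earlier one has matched, so the order of alternatives matters. Listing BoldText before EmphasisText is what lets '**' win over '*'.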
Document := [ Blocks ]
Blocks := Block [ Blocks ]
Block := EmptyLine | Header | List | ImageBlock | CodeBlock | Paragraph
EmptyLine := '\n'
Header := HeaderLevel Spaces FormattedText '\n'
HeaderLevel := '#' | '##' | '###' | '####' | '#####' | '######'
List := ListItems
ListItems := ListItem [ ListItems ]
ListItem := '*' Spaces InlineText '\n'
ImageBlock := ''
CodeBlock := '```\n' NonTripleBackQuotes '\n```\n'
NonTripleBackQuotes := any non-empty utf-8 sequence which
does not include '\n```\n'
Spaces := Space [ Spaces ]
Space := ' '
Paragraph := [ Spaces ] InlineText [ Spaces ] '\n'
InlineText := AngleBracketsLink | InlineLink | FormattedText
FormattedText := BoldText | EmphasisText | InlineCode | LiteralText
BoldText := '**' InlineText '**'
EmphasisText := '*' InlineText '*'
InlineCode := '`' NonBackQuote '`'
NonBackQuote := any non-empty utf-8 sequence which does not include
'`', '\t' or '\n', and which does not start or end with ' '
AngleBracketsLink := '<' UrlText '>'
InlineLink := '[' FormattedText '](' UrlText ')'
UrlText := any non-empty utf-8 sequence
excluding ' ', '\t', '\n', '>', ')'
LiteralText := any non-empty utf-8 sequence excluding '\n'
and which does not start or end with either ' ' or '\t'
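The block level of the grammar can be sketched as a loop that tries the Block alternatives in the listed order at each position and keeps the first one that matches. This minimal recognizer rests on several assumptions:

- The input ends with '\n'.
- Inline content (InlineText, FormattedText) is kept as an unparsed string.
- The line-level rules are approximated with Python regular expressions.
- ImageBlock is left out, because the rule above gives only ''.

The function name parse_blocks and the block labels are hypothetical.

```python
import re
from typing import Any, List, Tuple

FENCE = "`" * 3             # the triple back quote used by CodeBlock
CODE_OPEN = FENCE + "\n"
CODE_CLOSE = "\n" + FENCE + "\n"

# Header := HeaderLevel Spaces FormattedText '\n' (one to six '#').
HEADER_RE = re.compile(r"(#{1,6}) +([^ \n][^\n]*)\n")
# ListItem := '*' Spaces InlineText '\n'
LIST_ITEM_RE = re.compile(r"\* +([^ \n][^\n]*)\n")
# Paragraph := [ Spaces ] InlineText [ Spaces ] '\n'
PARAGRAPH_RE = re.compile(r" *([^\n]*[^ \n]) *\n")


def parse_blocks(text: str) -> List[Tuple[str, Any]]:
    """Split a document (assumed to end with '\\n') into (kind, payload)
    blocks, trying the Block alternatives in order; the first match wins."""
    blocks: List[Tuple[str, Any]] = []
    pos = 0
    while pos < len(text):
        # EmptyLine := '\n'
        if text[pos] == "\n":
            blocks.append(("empty", ""))
            pos += 1
            continue
        # Header: payload is (level, unparsed text).
        m = HEADER_RE.match(text, pos)
        if m:
            blocks.append(("header", (len(m.group(1)), m.group(2))))
            pos = m.end()
            continue
        # List := ListItems, i.e. one or more consecutive ListItem lines.
        items = []
        m = LIST_ITEM_RE.match(text, pos)
        while m:
            items.append(m.group(1))
            pos = m.end()
            m = LIST_ITEM_RE.match(text, pos)
        if items:
            blocks.append(("list", items))
            continue
        # ImageBlock is omitted in this sketch.
        # CodeBlock := '```\n' NonTripleBackQuotes '\n```\n'
        if text.startswith(CODE_OPEN, pos):
            end = text.find(CODE_CLOSE, pos + len(CODE_OPEN))
            if end > pos + len(CODE_OPEN):  # the body must be non-empty
                blocks.append(("code", text[pos + len(CODE_OPEN):end]))
                pos = end + len(CODE_CLOSE)
                continue
        # Paragraph: surrounding spaces are dropped from the payload.
        m = PARAGRAPH_RE.match(text, pos)
        if m:
            blocks.append(("paragraph", m.group(1)))
            pos = m.end()
            continue
        raise ValueError(f"no Block alternative matches at offset {pos}")
    return blocks


doc = "# Title\n\n* one\n* two\n" + FENCE + "\nprint(1)\n" + FENCE + "\n  Some text  \n"
for block in parse_blocks(doc):
    print(block)
# ('header', (1, 'Title'))
# ('empty', '')
# ('list', ['one', 'two'])
# ('code', 'print(1)')
# ('paragraph', 'Some text')
```

Grouping consecutive ListItem lines into one list mirrors ListItems := ListItem [ ListItems ]. A complete parser would then run an inline parser over each header, list item and paragraph payload.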