A Practical Markdown Grammar Definition
Specifying the grammar of a language is the first step towards writing a solid parser capable of interpreting it. In this document, I will specify a set of Markdown-like grammar rules chosen to be the most practical for my own use.
In the following definition, a Word is a production, [ ] marks its inner contents as optional, and | represents a choice between the productions it separates.
In cases where the grammar may be ambiguous, the first production which matches takes precedence.
    Document := [ Blocks ]
    Blocks := Block [ Blocks ]
    Block := EmptyLine | Header | List | ImageBlock | CodeBlock | Paragraph
    EmptyLine := '\n'
    Header := HeaderLevel Spaces FormattedText '\n'
    HeaderLevel := '#' | '##' | '###' | '####' | '#####' | '######'
    List := ListItems
    ListItems := ListItem [ ListItems ]
    ListItem := '*' Spaces InlineText '\n'
    ImageBlock := '![' FormattedText '](' UrlText ')'
    CodeBlock := '```\n' NonTripleBackQuotes '\n```\n'
    NonTripleBackQuotes := any non-empty utf-8 sequence which does not include '\n```\n'
    Spaces := Space [ Spaces ]
    Space := ' '
    Paragraph := [ Spaces ] InlineText [ Spaces ] '\n'
    InlineText := AngleBracketsLink | InlineLink | FormattedText
    FormattedText := BoldText | EmphasisText | InlineCode | LiteralText
    BoldText := '**' InlineText '**'
    EmphasisText := '*' InlineText '*'
    InlineCode := '`' NonBackQuote '`'
    NonBackQuote := any non-empty utf-8 sequence which does not include '`', '\t', '\n' and does not start or end with ' '
    AngleBracketsLink := '<' UrlText '>'
    InlineLink := '[' FormattedText '](' UrlText ')'
    UrlText := any non-empty utf-8 sequence excluding ' ', '\t', '\n', '>', ')'
    LiteralText := any non-empty utf-8 sequence excluding '\n' and which does not start or end with either ' ' or '\t'
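To make the "first production which matches takes precedence" rule concrete, here is a minimal Python sketch of how the Block alternatives might be tried in order. It is not a full parser for the grammar above. The names regex_matcher, BLOCK_ALTERNATIVES and parse_blocks are hypothetical, and each regular expression only approximates its production (the inline productions are not parsed at all; the text of a header, list item or paragraph is treated as an opaque run of characters).

```python
import re
from typing import Callable, List, Optional, Tuple

# A matcher takes the remaining input and returns (kind, consumed_length),
# or None when its production does not match at the start of that input.
Matcher = Callable[[str], Optional[Tuple[str, int]]]


def regex_matcher(kind: str, pattern: str, flags: int = 0) -> Matcher:
    """Build a matcher from a regex that approximates one Block production."""
    compiled = re.compile(pattern, flags)

    def match(text: str) -> Optional[Tuple[str, int]]:
        m = compiled.match(text)  # anchored at the start of `text`
        return (kind, m.end()) if m else None

    return match


# Same order as: EmptyLine | Header | List | ImageBlock | CodeBlock | Paragraph.
# Assumption: inline content is approximated as "starts with a non-space
# character and runs to the end of the line".
BLOCK_ALTERNATIVES: List[Matcher] = [
    regex_matcher("EmptyLine", r"\n"),
    # HeaderLevel is folded into #{1,6}; Spaces is one or more ' '.
    regex_matcher("Header", r"#{1,6} +\S[^\n]*\n"),
    # List is one or more consecutive ListItem lines.
    regex_matcher("List", r"(?:\* +\S[^\n]*\n)+"),
    regex_matcher("ImageBlock", r"!\[[^\]\n]+\]\([^ \t\n>)]+\)"),
    # The lazy .+? stops at the first '\n```\n' after the opening fence.
    regex_matcher("CodeBlock", r"```\n.+?\n```\n", re.DOTALL),
    regex_matcher("Paragraph", r" *\S[^\n]*\n"),
]


def parse_blocks(text: str) -> List[Tuple[str, str]]:
    """Split `text` into (kind, source) pairs using ordered choice."""
    blocks: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        for matcher in BLOCK_ALTERNATIVES:
            result = matcher(text[pos:])
            if result is not None:
                kind, length = result
                blocks.append((kind, text[pos:pos + length]))
                pos += length
                break  # first matching production wins
        else:
            raise ValueError(f"no Block production matches at offset {pos}")
    return blocks
```

With this ordering, parse_blocks("* item\n") yields a List block, while parse_blocks("*emphasis*\n") falls through to Paragraph because ListItem requires at least one space after the '*'. Likewise, a line consisting of three backquotes would be accepted by the Paragraph pattern on its own, but CodeBlock is tried first and consumes the whole fenced region.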