{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Translating [Markdown] to [Python]\n",
    "\n",
    "A primary translation is literate programming is the tangle step that converts the literate program into \n",
    "the programming language. The 1979 implementation converts `\".WEB\"` files to valid pascal - `\".PAS\"` - files.\n",
    "The `pidgy` approach begins with [Markdown] files and proper [Python] files as the outcome. The rest of this \n",
    "document configures how [IPython] acknowledges the transformation and the heuristics the translate [Markdown] to [Python].\n",
    "\n",
    "[Markdown]: #\n",
    "[Python]: #"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "    import typing, mistune, IPython, pidgy.util\n",
    "    __all__ = 'tangle', 'Tangle'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `pidgy` tangle workflow has three steps:\n",
    "\n",
    "1. Block-level lexical analysis to tokenize [Markdown].\n",
    "2. Normalize the tokens to compacted `\"code\" and not \"code\"` tokens.\n",
    "3. Translate the normalized tokens to a string of valid [Python] code.\n",
    "\n",
    "[Markdown]: #\n",
    "[Python]: #"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "    @pidgy.implementation\n",
    "    def tangle(str:str)->str:\n",
    "        translate = Tangle()\n",
    "        return translate.stringify(translate.parse(''.join(str)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "    class pidgyManager(IPython.core.inputtransformer2.TransformerManager):\n",
    "        def transform_cell(self, cell): return super(type(self), self).transform_cell(tangle(str=cell))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Block level lexical analysis.\n",
    "\n",
    "`pidgy` uses a modified `mistune.BlockLexer` to create block level tokens\n",
    "for a [Markdown] source. A specific `pidgy` addition is the addition off \n",
    "a `doctest` block object, `doctest` are testable strings that are ignored by the tangle\n",
    "step. The tokens are to be normalized and translated to [Python] strings.\n",
    "\n",
    "<details><summary><code>BlockLexer</code></summary>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "    class BlockLexer(mistune.BlockLexer, pidgy.util.ContextDepth):\n",
    "        class grammar_class(mistune.BlockGrammar):\n",
    "            doctest = __import__('doctest').DocTestParser._EXAMPLE_RE\n",
    "            block_code = __import__('re').compile(r'^((?!\\s+>>>\\s) {4}[^\\n]+\\n*)+')\n",
    "            default_rules = \"newline hrule block_code fences heading nptable lheading block_quote list_block def_links def_footnotes table paragraph text\".split()\n",
    "\n",
    "        def parse_doctest(self, m): self.tokens.append({'type': 'paragraph', 'text': m.group(0)})\n",
    "\n",
    "        def parse_fences(self, m):\n",
    "            if m.group(2): self.tokens.append({'type': 'paragraph', 'text': m.group(0)})\n",
    "            else: super().parse_fences(m)\n",
    "\n",
    "        def parse_hrule(self, m): self.tokens.append(dict(type='hrule', text=m.group(0)))\n",
    "            \n",
    "        def parse_def_links(self, m):\n",
    "            super().parse_def_links(m)\n",
    "            self.tokens.append(dict(type='def_link', text=m.group(0)))\n",
    "            \n",
    "        def parse_front_matter(self): ...\n",
    "        def parse(self, text: str, default_rules=None, normalize=True) -> typing.List[dict]:\n",
    "            front_matter = None\n",
    "            if not self.depth: \n",
    "                self.tokens = []\n",
    "                if text.strip() and text.startswith('---\\n') and '\\n---\\n' in text[4:]:\n",
    "                    front_matter, sep, text = text[4:].partition('---\\n')\n",
    "                    front_matter = {'type': 'front_matter', 'text': F\"\\n{front_matter}\"}\n",
    "\n",
    "            with self: tokens = super().parse(pidgy.util.whiten(text), default_rules)\n",
    "            if normalize and not self.depth: tokens = normalizer(text, tokens)\n",
    "            if front_matter: tokens.insert(0, front_matter)\n",
    "            return tokens"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Normalizing the tokens\n",
    "\n",
    "Tokenizing [Markdown] typically extracts conventions at both the block and inline level.\n",
    "Fortunately, `pidgy`'s translation is restricted to block level [Markdown] tokens, and mitigating some potential complexities from having opinions about inline code while tangling.\n",
    "\n",
    "<details><summary><code>normalizer</code></summary>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "    def normalizer(text, tokens):\n",
    "        compacted = []\n",
    "        while tokens:\n",
    "            token = tokens.pop(0)\n",
    "            if 'text' not in token: continue\n",
    "            if not token['text'].strip(): continue\n",
    "            block, body = token['text'].splitlines(), \"\"\n",
    "            while block:\n",
    "                line = block.pop(0)\n",
    "                if line:\n",
    "                    before, line, text = text.partition(line)\n",
    "                    body += before + line\n",
    "            if token['type']=='code':\n",
    "                compacted.append({'type': 'code', 'lang': None, 'text': body})\n",
    "            elif compacted and compacted[-1]['type'] == 'paragraph':\n",
    "                compacted[-1]['text'] += body\n",
    "            else: compacted.append({'type': 'paragraph', 'text': body})\n",
    "                \n",
    "        if compacted and compacted[-1]['type'] == 'paragraph':\n",
    "            compacted[-1]['text'] += text\n",
    "        elif text.strip():\n",
    "            compacted.append({'type': 'paragraph', 'text': text})\n",
    "        # Deal with front matter\n",
    "        if compacted and compacted[0]['text'].startswith('---\\n') and '\\n---' in compacted[0]['text'][4:]:\n",
    "            token = compacted.pop(0)\n",
    "            front_matter, sep, paragraph = token['text'][4:].partition('---')\n",
    "            compacted = [{'type': 'front_matter', 'text': F\"\\n{front_matter}\"},\n",
    "                        {'type': 'paragraph', 'text': paragraph}] + compacted\n",
    "        return compacted"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Flattening the tokens to a [Python] string.\n",
    "\n",
    "The tokenizer controls the translation of markdown strings to python strings.  Our major constraint is that the Markdown input should retain line numbers.\n",
    "\n",
    "<details><summary><code>Flatten</code></summary>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "    class Tangle(BlockLexer):\n",
    "        def stringify(self, tokens: typing.List[dict], source: str = \"\"\"\"\"\", last: int =0) -> str:\n",
    "            import textwrap\n",
    "            INDENT = indent = pidgy.util.base_indent(tokens) or 4\n",
    "            for i, token in enumerate(tokens):\n",
    "                object = token['text']\n",
    "                if token and token['type'] == 'code':\n",
    "                    if object.lstrip().startswith(pidgy.util.FENCE):\n",
    "\n",
    "                        object = ''.join(''.join(object.partition(pidgy.util.FENCE)[::2]).rpartition(pidgy.util.FENCE)[::2])\n",
    "                        indent = INDENT + pidgy.util.num_first_indent(object)\n",
    "                        object = textwrap.indent(object, INDENT*pidgy.util.SPACE)\n",
    "\n",
    "                    if object.lstrip().startswith(pidgy.util.MAGIC):  ...\n",
    "                    else: indent = pidgy.util.num_last_indent(object)\n",
    "                elif token and token['type'] == 'front_matter': \n",
    "                    object = textwrap.indent(\n",
    "                        F\"locals().update(__import__('ruamel.yaml').yaml.safe_load({pidgy.util.quote(object)}))\\n\", indent*pidgy.util.SPACE)\n",
    "\n",
    "                elif not object: ...\n",
    "                else:\n",
    "                    object = textwrap.indent(object, pidgy.util.SPACE*max(indent-pidgy.util.num_first_indent(object), 0))\n",
    "                    for next in tokens[i+1:]:\n",
    "                        if next['type'] == 'code':\n",
    "                            next = pidgy.util.num_first_indent(next['text'])\n",
    "                            break\n",
    "                    else: next = indent       \n",
    "                    Δ = max(next-indent, 0)\n",
    "\n",
    "                    if not Δ and source.rstrip().rstrip(pidgy.util.CONTINUATION).endswith(pidgy.util.COLON): \n",
    "                        Δ += 4\n",
    "\n",
    "                    spaces = pidgy.util.indents(object)\n",
    "                    \"what if the spaces are ling enough\"\n",
    "                    object = object[:spaces] + Δ*pidgy.util.SPACE+ object[spaces:]\n",
    "                    if not source.rstrip().rstrip(pidgy.util.CONTINUATION).endswith(pidgy.util.QUOTES): \n",
    "                        object = pidgy.util.quote(object)\n",
    "                source += object\n",
    "\n",
    "            # add a semicolon to the source if the last block is code.\n",
    "            for token in reversed(tokens):\n",
    "                if token['text'].strip():\n",
    "                    if token['type'] != 'code': \n",
    "                        source = source.rstrip() + pidgy.util.SEMI\n",
    "                    break\n",
    "\n",
    "            return source"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Append the lexer for nested rules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "    for x in \"default_rules footnote_rules list_rules\".split():\n",
    "        setattr(BlockLexer, x, list(getattr(BlockLexer, x)))\n",
    "        getattr(BlockLexer, x).insert(getattr(BlockLexer, x).index('block_code'), 'doctest')\n",
    "        if 'block_html' in getattr(BlockLexer, x):\n",
    "            getattr(BlockLexer, x).pop(getattr(BlockLexer, x).index('block_html'))\n",
    "    del x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## More `pidgy` langauge features\n",
    "\n",
    "`pidgy` experiments extra language features for python, using the same system\n",
    "that IPython uses to add features like line and cell magics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "    import ast, pidgy, IPython"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Recently, IPython introduced a convention that allows top level await statements outside of functions. Building of this convenience, `pidgy` allows for top-level __return__ and __yield__ statements.  These statements are replaced with the an IPython display statement."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "    class ExtraSyntax(ast.NodeTransformer):\n",
    "        def visit_FunctionDef(self, node): return node\n",
    "        visit_AsyncFunctionDef = visit_FunctionDef        \n",
    "\n",
    "        def visit_Return(self, node):\n",
    "            replace = ast.parse('''__import__('IPython').display.display()''').body[0]\n",
    "            replace.value.args = node.value.elts if isinstance(node.value, ast.Tuple) else [node.value]\n",
    "            return ast.copy_location(replace, node)\n",
    "\n",
    "        def visit_Expr(self, node):\n",
    "            if isinstance(node.value, (ast.Yield, ast.YieldFrom)):  return ast.copy_location(self.visit_Return(node.value), node)\n",
    "            return node\n",
    "\n",
    "        visit_Expression = visit_Expr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We know naming is hard, there is no point focusing on it. `pidgy` allows authors\n",
    "to use emojis as variables in python. They add extra color and expression to the narrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "    def demojize(lines, delimiters=('_', '_')):\n",
    "        str = ''.join(lines)\n",
    "        import tokenize, emoji, stringcase; tokens = []\n",
    "        try:\n",
    "            for token in list(tokenize.tokenize(\n",
    "                __import__('io').BytesIO(str.encode()).readline)):\n",
    "                if token.type == tokenize.ERRORTOKEN:\n",
    "                    string = emoji.demojize(token.string, delimiters=delimiters\n",
    "                                           ).replace('-', '_').replace(\"’\", \"_\")\n",
    "                    if tokens and tokens[-1].type == tokenize.NAME: tokens[-1] = tokenize.TokenInfo(tokens[-1].type, tokens[-1].string + string, tokens[-1].start, tokens[-1].end, tokens[-1].line)\n",
    "                    else: tokens.append(\n",
    "                        tokenize.TokenInfo(\n",
    "                            tokenize.NAME, string, token.start, token.end, token.line))\n",
    "                else: tokens.append(token)\n",
    "            return tokenize.untokenize(tokens).decode()\n",
    "        except BaseException: raise SyntaxError(str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "    def init_json():\n",
    "        import builtins\n",
    "        builtins.yes = builtins.true = True\n",
    "        builtins.no = builtins.false = False\n",
    "        builtins.null = None"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}