<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Genlex (ocaml.Stdlib.Genlex)</title><link rel="stylesheet" href="../../../_odoc-theme/odoc.css"/><meta charset="utf-8"/><meta name="generator" content="odoc 2.2.1"/><meta name="viewport" content="width=device-width,initial-scale=1.0"/><script src="../../../highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body class="odoc"><nav class="odoc-nav"><a href="../index.html">Up</a> – <a href="../../index.html">ocaml</a> &#x00BB; <a href="../index.html">Stdlib</a> &#x00BB; Genlex</nav><header class="odoc-preamble"><h1>Module <code><span>Stdlib.Genlex</span></code></h1><ul class="at-tags"><li class="deprecated"><span class="at-tag">deprecated</span> Use the camlp-streams library instead.</li></ul><p>A generic lexical analyzer.</p><p>This module implements a simple 'standard' lexical analyzer, presented as a function from character streams to token streams. It implements roughly the lexical conventions of OCaml, but is parameterized by the set of keywords of your language.</p><p>Example: a lexer suitable for a desk calculator is obtained by</p><pre class="language-ocaml"><code>let lexer = make_lexer [&quot;+&quot;; &quot;-&quot;; &quot;*&quot;; &quot;/&quot;; &quot;let&quot;; &quot;=&quot;; &quot;(&quot;; &quot;)&quot;]</code></pre><p>The associated parser would be a function from <code>token stream</code> to, for instance, <code>int</code>, and would have rules such as:</p><pre class="language-ocaml"><code>let rec parse_expr = parser
  | [&lt; n1 = parse_atom; n2 = parse_remainder n1 &gt;] -&gt; n2
and parse_atom = parser
  | [&lt; 'Int n &gt;] -&gt; n
  | [&lt; 'Kwd &quot;(&quot;; n = parse_expr; 'Kwd &quot;)&quot; &gt;] -&gt; n
and parse_remainder n1 = parser
  | [&lt; 'Kwd &quot;+&quot;; n2 = parse_expr &gt;] -&gt; n1 + n2
  | [&lt; &gt;] -&gt; n1</code></pre><p>Note that the <code>parser</code> keyword and the associated stream notation are available only through camlp4 extensions. Sources that use them must therefore be preprocessed, <i>e.g.</i> with the <code>&quot;-pp&quot;</code> command-line switch of the compilers.</p></header><div class="odoc-content"><div class="odoc-spec"><div class="spec type anchored" id="type-token"><a href="#type-token" class="anchor"></a><code><span><span class="keyword">type</span> token</span><span> = </span></code><ol><li id="type-token.Kwd" class="def variant constructor anchored"><a href="#type-token.Kwd" class="anchor"></a><code><span>| </span><span><span class="constructor">Kwd</span> <span class="keyword">of</span> string</span></code></li><li id="type-token.Ident" class="def variant constructor anchored"><a href="#type-token.Ident" class="anchor"></a><code><span>| </span><span><span class="constructor">Ident</span> <span class="keyword">of</span> string</span></code></li><li id="type-token.Int" class="def variant constructor anchored"><a href="#type-token.Int" class="anchor"></a><code><span>| </span><span><span class="constructor">Int</span> <span class="keyword">of</span> int</span></code></li><li id="type-token.Float" class="def variant constructor anchored"><a href="#type-token.Float" class="anchor"></a><code><span>| </span><span><span class="constructor">Float</span> <span class="keyword">of</span> float</span></code></li><li id="type-token.String" class="def variant constructor anchored"><a href="#type-token.String" class="anchor"></a><code><span>| </span><span><span class="constructor">String</span> <span class="keyword">of</span> string</span></code></li><li id="type-token.Char" class="def variant constructor anchored"><a href="#type-token.Char" class="anchor"></a><code><span>| </span><span><span class="constructor">Char</span> <span class="keyword">of</span> char</span></code></li></ol></div><div 
class="spec-doc"><p>The type of tokens. The lexical classes are: <code>Int</code> and <code>Float</code> for integer and floating-point numbers; <code>String</code> for string literals, enclosed in double quotes; <code>Char</code> for character literals, enclosed in single quotes; <code>Ident</code> for identifiers (either sequences of letters, digits, underscores and quotes, or sequences of 'operator characters' such as <code>+</code>, <code>*</code>, etc.); and <code>Kwd</code> for keywords (either identifiers or single 'special characters' such as <code>(</code>, <code>}</code>, etc.).</p></div></div><div class="odoc-spec"><div class="spec value anchored" id="val-make_lexer"><a href="#val-make_lexer" class="anchor"></a><code><span><span class="keyword">val</span> make_lexer : <span><span>string list</span> <span class="arrow">&#45;&gt;</span></span> <span><span>char <a href="../Stream/index.html#type-t">Stream.t</a></span> <span class="arrow">&#45;&gt;</span></span> <span><a href="#type-token">token</a> <a href="../Stream/index.html#type-t">Stream.t</a></span></span></code></div><div class="spec-doc"><p>Construct the lexer function. The first argument is the list of keywords. An identifier <code>s</code> is returned as <code>Kwd s</code> if <code>s</code> belongs to this list, and as <code>Ident s</code> otherwise. A special character <code>s</code> is returned as <code>Kwd s</code> if <code>s</code> belongs to this list, and causes a lexical error (exception <a href="../Stream/index.html#exception-Error"><code>Stream.Error</code></a> with the offending lexeme as its parameter) otherwise. Blanks and newlines are skipped. Comments delimited by <code>(*</code> and <code>*)</code> are skipped as well, and can be nested. A <a href="../Stream/index.html#exception-Failure"><code>Stream.Failure</code></a> exception is raised if the end of the stream is reached unexpectedly.</p></div></div></div></body></html>