mirror of
https://github.com/c-cube/ocaml-containers.git
synced 2025-12-06 11:15:31 -05:00
2 lines
No EOL
12 KiB
HTML
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>CCUtf8_string (containers.CCUtf8_string)</title><link rel="stylesheet" href="../../odoc.css"/><meta charset="utf-8"/><meta name="generator" content="odoc 1.5.1"/><meta name="viewport" content="width=device-width,initial-scale=1.0"/><script src="../../highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div class="content"><header><nav><a href="../index.html">Up</a> – <a href="../index.html">containers</a> » CCUtf8_string</nav><h1>Module <code>CCUtf8_string</code></h1><h2 id="unicode-string,-in-utf8"><a href="#unicode-string,-in-utf8" class="anchor"></a>Unicode String, in UTF8</h2></header><aside><p>A unicode string represented by a utf8 bytestring. This representation is convenient for manipulating normal OCaml strings that are encoded in UTF8.</p><p>We perform only basic decoding and encoding between codepoints and bytestrings. For more elaborate operations, please use the excellent <a href="http://erratique.ch/software/uutf">Uutf</a>.</p><p><b>status: experimental</b></p><dl><dt>since</dt><dd>2.1</dd></dl></aside><dl><dt class="spec type" id="type-uchar"><a href="#type-uchar" class="anchor"></a><code><span class="keyword">type</span> uchar</code><code> = Stdlib.Uchar.t</code></dt><dt class="spec type" id="type-gen"><a href="#type-gen" class="anchor"></a><code><span class="keyword">type</span> <span>'a gen</span></code><code> = unit <span>-></span> <span><span class="type-var">'a</span> option</span></code></dt><dt class="spec type" id="type-iter"><a href="#type-iter" class="anchor"></a><code><span class="keyword">type</span> <span>'a iter</span></code><code> = <span>(<span class="type-var">'a</span> <span>-></span> unit)</span> <span>-></span> unit</code></dt><dd><p>Fast internal iterator.</p><dl><dt>since</dt><dd>2.8</dd></dl></dd></dl><dl><dt class="spec type" id="type-t"><a href="#type-t" class="anchor"></a><code><span class="keyword">type</span> t</code><code> = <span 
class="keyword">private</span> string</code></dt><dd><p>A UTF8 string</p></dd></dl><dl><dt class="spec value" id="val-equal"><a href="#val-equal" class="anchor"></a><code><span class="keyword">val</span> equal : <a href="index.html#type-t">t</a> <span>-></span> <a href="index.html#type-t">t</a> <span>-></span> bool</code></dt><dt class="spec value" id="val-hash"><a href="#val-hash" class="anchor"></a><code><span class="keyword">val</span> hash : <a href="index.html#type-t">t</a> <span>-></span> int</code></dt><dt class="spec value" id="val-compare"><a href="#val-compare" class="anchor"></a><code><span class="keyword">val</span> compare : <a href="index.html#type-t">t</a> <span>-></span> <a href="index.html#type-t">t</a> <span>-></span> int</code></dt><dt class="spec value" id="val-pp"><a href="#val-pp" class="anchor"></a><code><span class="keyword">val</span> pp : Stdlib.Format.formatter <span>-></span> <a href="index.html#type-t">t</a> <span>-></span> unit</code></dt><dt class="spec value" id="val-to_string"><a href="#val-to_string" class="anchor"></a><code><span class="keyword">val</span> to_string : <a href="index.html#type-t">t</a> <span>-></span> string</code></dt><dd><p>Identity.</p></dd></dl><dl><dt class="spec exception" id="exception-Malformed"><a href="#exception-Malformed" class="anchor"></a><code><span class="keyword">exception</span> </code><code><span class="exception">Malformed</span> <span class="keyword">of</span> string * int</code></dt><dd><p>Malformed string at given offset</p></dd></dl><dl><dt class="spec value" id="val-to_gen"><a href="#val-to_gen" class="anchor"></a><code><span class="keyword">val</span> to_gen : <span>?⁠idx:int</span> <span>-></span> <a href="index.html#type-t">t</a> <span>-></span> <span><a href="index.html#type-uchar">uchar</a> <a href="index.html#type-gen">gen</a></span></code></dt><dd><p>Generator of unicode codepoints.</p><dl><dt>parameter idx</dt><dd><p>offset where to start the decoding.</p></dd></dl></dd></dl><dl><dt 
class="spec value" id="val-to_iter"><a href="#val-to_iter" class="anchor"></a><code><span class="keyword">val</span> to_iter : <span>?⁠idx:int</span> <span>-></span> <a href="index.html#type-t">t</a> <span>-></span> <span><a href="index.html#type-uchar">uchar</a> <a href="index.html#type-iter">iter</a></span></code></dt><dd><p>Iterator of unicode codepoints.</p><dl><dt>parameter idx</dt><dd><p>offset where to start the decoding.</p></dd></dl><dl><dt>since</dt><dd>2.8</dd></dl></dd></dl><dl><dt class="spec value" id="val-to_seq"><a href="#val-to_seq" class="anchor"></a><code><span class="keyword">val</span> to_seq : <span>?⁠idx:int</span> <span>-></span> <a href="index.html#type-t">t</a> <span>-></span> <span><a href="index.html#type-uchar">uchar</a> Stdlib.Seq.t</span></code></dt><dd><p>Sequence of unicode codepoints. Renamed from <code>to_std_seq</code> since 3.0.</p><dl><dt>parameter idx</dt><dd><p>offset where to start the decoding.</p></dd></dl><dl><dt>since</dt><dd>3.0</dd></dl></dd></dl><dl><dt class="spec value" id="val-to_list"><a href="#val-to_list" class="anchor"></a><code><span class="keyword">val</span> to_list : <span>?⁠idx:int</span> <span>-></span> <a href="index.html#type-t">t</a> <span>-></span> <span><a href="index.html#type-uchar">uchar</a> list</span></code></dt><dd><p>List of unicode codepoints.</p><dl><dt>parameter idx</dt><dd><p>offset where to start the decoding.</p></dd></dl></dd></dl><dl><dt class="spec value" id="val-fold"><a href="#val-fold" class="anchor"></a><code><span class="keyword">val</span> fold : <span>?⁠idx:int</span> <span>-></span> <span>(<span class="type-var">'a</span> <span>-></span> <a href="index.html#type-uchar">uchar</a> <span>-></span> <span class="type-var">'a</span>)</span> <span>-></span> <span class="type-var">'a</span> <span>-></span> <a href="index.html#type-t">t</a> <span>-></span> <span class="type-var">'a</span></code></dt><dt class="spec value" id="val-iter"><a href="#val-iter" class="anchor"></a><code><span 
class="keyword">val</span> iter : <span>?⁠idx:int</span> <span>-></span> <span>(<a href="index.html#type-uchar">uchar</a> <span>-></span> unit)</span> <span>-></span> <a href="index.html#type-t">t</a> <span>-></span> unit</code></dt><dt class="spec value" id="val-n_chars"><a href="#val-n_chars" class="anchor"></a><code><span class="keyword">val</span> n_chars : <a href="index.html#type-t">t</a> <span>-></span> int</code></dt><dd><p>Number of characters.</p></dd></dl><dl><dt class="spec value" id="val-n_bytes"><a href="#val-n_bytes" class="anchor"></a><code><span class="keyword">val</span> n_bytes : <a href="index.html#type-t">t</a> <span>-></span> int</code></dt><dd><p>Number of bytes.</p></dd></dl><dl><dt class="spec value" id="val-map"><a href="#val-map" class="anchor"></a><code><span class="keyword">val</span> map : <span>(<a href="index.html#type-uchar">uchar</a> <span>-></span> <a href="index.html#type-uchar">uchar</a>)</span> <span>-></span> <a href="index.html#type-t">t</a> <span>-></span> <a href="index.html#type-t">t</a></code></dt><dt class="spec value" id="val-filter_map"><a href="#val-filter_map" class="anchor"></a><code><span class="keyword">val</span> filter_map : <span>(<a href="index.html#type-uchar">uchar</a> <span>-></span> <span><a href="index.html#type-uchar">uchar</a> option</span>)</span> <span>-></span> <a href="index.html#type-t">t</a> <span>-></span> <a href="index.html#type-t">t</a></code></dt><dt class="spec value" id="val-flat_map"><a href="#val-flat_map" class="anchor"></a><code><span class="keyword">val</span> flat_map : <span>(<a href="index.html#type-uchar">uchar</a> <span>-></span> <a href="index.html#type-t">t</a>)</span> <span>-></span> <a href="index.html#type-t">t</a> <span>-></span> <a href="index.html#type-t">t</a></code></dt><dt class="spec value" id="val-append"><a href="#val-append" class="anchor"></a><code><span class="keyword">val</span> append : <a href="index.html#type-t">t</a> <span>-></span> <a 
href="index.html#type-t">t</a> <span>-></span> <a href="index.html#type-t">t</a></code></dt><dt class="spec value" id="val-concat"><a href="#val-concat" class="anchor"></a><code><span class="keyword">val</span> concat : <a href="index.html#type-t">t</a> <span>-></span> <span><a href="index.html#type-t">t</a> list</span> <span>-></span> <a href="index.html#type-t">t</a></code></dt><dt class="spec value" id="val-of_seq"><a href="#val-of_seq" class="anchor"></a><code><span class="keyword">val</span> of_seq : <span><a href="index.html#type-uchar">uchar</a> Stdlib.Seq.t</span> <span>-></span> <a href="index.html#type-t">t</a></code></dt><dd><p>Build a string from unicode codepoints. Renamed from <code>of_std_seq</code> since 3.0.</p><dl><dt>since</dt><dd>3.0</dd></dl></dd></dl><dl><dt class="spec value" id="val-of_iter"><a href="#val-of_iter" class="anchor"></a><code><span class="keyword">val</span> of_iter : <span><a href="index.html#type-uchar">uchar</a> <a href="index.html#type-iter">iter</a></span> <span>-></span> <a href="index.html#type-t">t</a></code></dt><dd><p>Build a string from unicode codepoints.</p><dl><dt>since</dt><dd>2.8</dd></dl></dd></dl><dl><dt class="spec value" id="val-uchar_to_bytes"><a href="#val-uchar_to_bytes" class="anchor"></a><code><span class="keyword">val</span> uchar_to_bytes : <a href="index.html#type-uchar">uchar</a> <span>-></span> <span>char <a href="index.html#type-iter">iter</a></span></code></dt><dd><p>Translate the unicode codepoint to an iterator over its utf-8 bytes. 
This can be used, for example, in combination with <span class="xref-unresolved" title="unresolved reference to &quot;Buffer.add_char&quot;"><code>Buffer</code>.add_char</span> on a pre-allocated buffer to add the bytes one by one (despite its name, <span class="xref-unresolved" title="unresolved reference to &quot;Buffer.add_char&quot;"><code>Buffer</code>.add_char</span> takes individual bytes, not unicode codepoints).</p><dl><dt>since</dt><dd>3.2</dd></dl></dd></dl><dl><dt class="spec value" id="val-of_gen"><a href="#val-of_gen" class="anchor"></a><code><span class="keyword">val</span> of_gen : <span><a href="index.html#type-uchar">uchar</a> <a href="index.html#type-gen">gen</a></span> <span>-></span> <a href="index.html#type-t">t</a></code></dt><dt class="spec value" id="val-of_list"><a href="#val-of_list" class="anchor"></a><code><span class="keyword">val</span> of_list : <span><a href="index.html#type-uchar">uchar</a> list</span> <span>-></span> <a href="index.html#type-t">t</a></code></dt><dt class="spec value" id="val-of_string_exn"><a href="#val-of_string_exn" class="anchor"></a><code><span class="keyword">val</span> of_string_exn : string <span>-></span> <a href="index.html#type-t">t</a></code></dt><dd><p>Validate string by checking it is valid UTF8.</p><dl><dt>raises Invalid_argument</dt><dd><p>if the string is not valid UTF8.</p></dd></dl></dd></dl><dl><dt class="spec value" id="val-of_string"><a href="#val-of_string" class="anchor"></a><code><span class="keyword">val</span> of_string : string <span>-></span> <span><a href="index.html#type-t">t</a> option</span></code></dt><dd><p>Safe version of <a href="index.html#val-of_string_exn"><code>of_string_exn</code></a>.</p></dd></dl><dl><dt class="spec value" id="val-is_valid"><a href="#val-is_valid" class="anchor"></a><code><span class="keyword">val</span> is_valid : string <span>-></span> bool</code></dt><dd><p>Valid UTF8?</p></dd></dl><dl><dt class="spec value" id="val-unsafe_of_string"><a href="#val-unsafe_of_string" 
class="anchor"></a><code><span class="keyword">val</span> unsafe_of_string : string <span>-></span> <a href="index.html#type-t">t</a></code></dt><dd><p>Conversion from a string without validating. <b>CAUTION</b>: this is unsafe and can break all the other functions in this module. Use only if you're sure the string is valid UTF8. Upon iteration, if an invalid substring is met, <code>Malformed</code> will be raised.</p></dd></dl></div></body></html>