mirror of
https://github.com/c-cube/moonpool.git
synced 2025-12-08 12:15:44 -05:00
33 lines
12 KiB
HTML
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Multicore_magic (multicore-magic.Multicore_magic)</title><meta charset="utf-8"/><link rel="stylesheet" href="../../_odoc-theme/odoc.css"/><meta name="generator" content="odoc 3.0.0"/><meta name="viewport" content="width=device-width,initial-scale=1.0"/><script src="../../highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body class="odoc"><nav class="odoc-nav"><a href="../index.html">Up</a> – <a href="../../index.html">Index</a> » <a href="../index.html">multicore-magic</a> » Multicore_magic</nav><header class="odoc-preamble"><h1>Module <code><span>Multicore_magic</span></code></h1><p>This is a library of magic multicore utilities intended for experts for extracting the best possible performance from multicore OCaml.</p><p>Hopefully future releases of multicore OCaml will make this library obsolete!</p></header><div class="odoc-tocs"><nav class="odoc-toc odoc-local-toc"><ul><li><a href="#helpers-for-using-padding-to-avoid-false-sharing">Helpers for using padding to avoid false sharing</a></li><li><a href="#missing-atomic-operations">Missing <code>Atomic</code> operations</a></li><li><a href="#fixes-and-workarounds">Fixes and workarounds</a></li><li><a href="#missing-functionality">Missing functionality</a></li><li><a href="#avoiding-contention">Avoiding contention</a></li></ul></nav></div><div class="odoc-content"><h2 id="helpers-for-using-padding-to-avoid-false-sharing"><a href="#helpers-for-using-padding-to-avoid-false-sharing" class="anchor"></a>Helpers for using padding to avoid false sharing</h2><div class="odoc-spec"><div class="spec value anchored" id="val-copy_as_padded"><a href="#val-copy_as_padded" class="anchor"></a><code><span><span class="keyword">val</span> copy_as_padded : <span><span class="type-var">'a</span> <span class="arrow">-></span></span> <span class="type-var">'a</span></span></code></div><div class="spec-doc"><p>Depending on the object, either creates a 
shallow clone of it or returns it as is. When cloned, the clone will have extra padding words added after the last used word.</p><p>This is designed to help avoid <a href="https://en.wikipedia.org/wiki/False_sharing">false sharing</a>. False sharing has a negative impact on multicore performance. Accesses of both atomic and non-atomic locations, whether read-only or read-write, may suffer from false sharing.</p><p>The intended use case for this is to pad all long lived objects that are being accessed highly frequently (read or written).</p><p>Many kinds of objects can be padded, for example:</p><pre class="language-ocaml"><code> let padded_atomic = Multicore_magic.copy_as_padded (Atomic.make 101)
 let padded_ref = Multicore_magic.copy_as_padded (ref 42)

 let padded_record =
   Multicore_magic.copy_as_padded { number = 76; pointer = [ 1; 2; 3 ] }

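 (* Illustrative addition, not part of the upstream example: arrays should
    not go through [copy_as_padded]. Assuming the signatures documented on
    this page, a padded array is created with [make_padded_array] and its
    unpadded length is read back with [length_of_padded_array]. *)
 let padded_array = Multicore_magic.make_padded_array 8 0
 let unpadded_length = Multicore_magic.length_of_padded_array padded_array
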
let padded_variant = Multicore_magic.copy_as_padded (Some 1)</code></pre><p>Padding changes the length of an array. If you need to pad an array, use <a href="#val-make_padded_array"><code>make_padded_array</code></a>.</p></div></div><div class="odoc-spec"><div class="spec value anchored" id="val-copy_as"><a href="#val-copy_as" class="anchor"></a><code><span><span class="keyword">val</span> copy_as : <span><span class="optlabel">?padded</span>:bool <span class="arrow">-></span></span> <span><span class="type-var">'a</span> <span class="arrow">-></span></span> <span class="type-var">'a</span></span></code></div><div class="spec-doc"><p><code>copy_as x</code> by default simply returns <code>x</code>. When <code>~padded:true</code> is explicitly specified, returns <a href="#val-copy_as_padded" title="copy_as_padded"><code>copy_as_padded x</code></a>.</p></div></div><div class="odoc-spec"><div class="spec value anchored" id="val-make_padded_array"><a href="#val-make_padded_array" class="anchor"></a><code><span><span class="keyword">val</span> make_padded_array : <span>int <span class="arrow">-></span></span> <span><span class="type-var">'a</span> <span class="arrow">-></span></span> <span><span class="type-var">'a</span> array</span></span></code></div><div class="spec-doc"><p>Creates a padded array. The length of the returned array includes padding. 
Use <a href="#val-length_of_padded_array"><code>length_of_padded_array</code></a> to get the unpadded length.</p></div></div><div class="odoc-spec"><div class="spec value anchored" id="val-length_of_padded_array"><a href="#val-length_of_padded_array" class="anchor"></a><code><span><span class="keyword">val</span> length_of_padded_array : <span><span><span class="type-var">'a</span> array</span> <span class="arrow">-></span></span> int</span></code></div><div class="spec-doc"><p>Returns the length of an array created by <a href="#val-make_padded_array"><code>make_padded_array</code></a> without the padding.</p><p><b>WARNING</b>: This is not guaranteed to work with <a href="#val-copy_as_padded"><code>copy_as_padded</code></a>.</p></div></div><div class="odoc-spec"><div class="spec value anchored" id="val-length_of_padded_array_minus_1"><a href="#val-length_of_padded_array_minus_1" class="anchor"></a><code><span><span class="keyword">val</span> length_of_padded_array_minus_1 : <span><span><span class="type-var">'a</span> array</span> <span class="arrow">-></span></span> int</span></code></div><div class="spec-doc"><p>Returns the length of an array created by <a href="#val-make_padded_array"><code>make_padded_array</code></a> without the padding minus 1.</p><p><b>WARNING</b>: This is not guaranteed to work with <a href="#val-copy_as_padded"><code>copy_as_padded</code></a>.</p></div></div><h2 id="missing-atomic-operations"><a href="#missing-atomic-operations" class="anchor"></a>Missing <code>Atomic</code> operations</h2><div class="odoc-spec"><div class="spec value anchored" id="val-fenceless_get"><a href="#val-fenceless_get" class="anchor"></a><code><span><span class="keyword">val</span> fenceless_get : <span><span><span class="type-var">'a</span> <a href="../../ocaml/Stdlib/Atomic/index.html#type-t">Stdlib.Atomic.t</a></span> <span class="arrow">-></span></span> <span class="type-var">'a</span></span></code></div><div class="spec-doc"><p>Get a value from the atomic 
without performing an acquire fence.</p><p>Consider the following prototypical example of a lock-free algorithm:</p><pre class="language-ocaml"><code> let rec prototypical_lock_free_algorithm () =
   let expected = Atomic.get atomic in
   let desired = (* computed from expected *) in
   if not (Atomic.compare_and_set atomic expected desired) then
     (* failure, maybe retry *)
   else
     (* success *)</code></pre><p>A potential performance problem with the above example is that it performs two acquire fences. Both the <code>Atomic.get</code> and the <code>Atomic.compare_and_set</code> perform an acquire fence. This may have a negative impact on performance.</p><p>Assuming the first fence is not necessary, we can rewrite the example using <a href="#val-fenceless_get"><code>fenceless_get</code></a> as follows:</p><pre class="language-ocaml"><code> let rec prototypical_lock_free_algorithm () =
   let expected = Multicore_magic.fenceless_get atomic in
   let desired = (* computed from expected *) in
   if not (Atomic.compare_and_set atomic expected desired) then
     (* failure, maybe retry *)
   else
     (* success *)</code></pre><p>Now only a single acquire fence is performed by <code>Atomic.compare_and_set</code> and performance may be improved.</p></div></div><div class="odoc-spec"><div class="spec value anchored" id="val-fenceless_set"><a href="#val-fenceless_set" class="anchor"></a><code><span><span class="keyword">val</span> fenceless_set : <span><span><span class="type-var">'a</span> <a href="../../ocaml/Stdlib/Atomic/index.html#type-t">Stdlib.Atomic.t</a></span> <span class="arrow">-></span></span> <span><span class="type-var">'a</span> <span class="arrow">-></span></span> unit</span></code></div><div class="spec-doc"><p>Set the value of an atomic without performing a full fence.</p><p>Consider the following example:</p><pre class="language-ocaml"><code> let new_atomic = Atomic.make dummy_value in
 (* prepare data_structure referring to new_atomic *)
 Atomic.set new_atomic data_structure;
 (* publish the data_structure: *)
 Atomic.exchange old_atomic data_structure</code></pre><p>A potential performance problem with the above example is that it performs two full fences. Both the <code>Atomic.set</code> used to initialize the data structure and the <code>Atomic.exchange</code> used to publish the data structure perform a full fence. The same would also apply in cases where <code>Atomic.compare_and_set</code> or <code>Atomic.set</code> would be used to publish the data structure. This may have a negative impact on performance.</p><p>Using <a href="#val-fenceless_set"><code>fenceless_set</code></a> we can rewrite the example as follows:</p><pre class="language-ocaml"><code> let new_atomic = Atomic.make dummy_value in
 (* prepare data_structure referring to new_atomic *)
 Multicore_magic.fenceless_set new_atomic data_structure;
 (* publish the data_structure: *)
 Atomic.exchange old_atomic data_structure</code></pre><p>Now only a single full fence is performed by <code>Atomic.exchange</code> and performance may be improved.</p></div></div><div class="odoc-spec"><div class="spec value anchored" id="val-fence"><a href="#val-fence" class="anchor"></a><code><span><span class="keyword">val</span> fence : <span><span>int <a href="../../ocaml/Stdlib/Atomic/index.html#type-t">Stdlib.Atomic.t</a></span> <span class="arrow">-></span></span> unit</span></code></div><div class="spec-doc"><p>Perform a full acquire-release fence on the given atomic.</p><p><code>fence atomic</code> is equivalent to <code>ignore (Atomic.fetch_and_add atomic 0)</code>.</p></div></div><h2 id="fixes-and-workarounds"><a href="#fixes-and-workarounds" class="anchor"></a>Fixes and workarounds</h2><div class="odoc-spec"><div class="spec module anchored" id="module-Transparent_atomic"><a href="#module-Transparent_atomic" class="anchor"></a><code><span><span class="keyword">module</span> <a href="Transparent_atomic/index.html">Transparent_atomic</a></span><span> : <span class="keyword">sig</span> ... <span class="keyword">end</span></span></code></div><div class="spec-doc"><p>A replacement for <code>Stdlib.Atomic</code> with fixes and performance improvements.</p></div></div><h2 id="missing-functionality"><a href="#missing-functionality" class="anchor"></a>Missing functionality</h2><div class="odoc-spec"><div class="spec module anchored" id="module-Atomic_array"><a href="#module-Atomic_array" class="anchor"></a><code><span><span class="keyword">module</span> <a href="Atomic_array/index.html">Atomic_array</a></span><span> : <span class="keyword">sig</span> ... 
<span class="keyword">end</span></span></code></div><div class="spec-doc"><p>Array of (potentially unboxed) atomic locations.</p></div></div><h2 id="avoiding-contention"><a href="#avoiding-contention" class="anchor"></a>Avoiding contention</h2><div class="odoc-spec"><div class="spec value anchored" id="val-instantaneous_domain_index"><a href="#val-instantaneous_domain_index" class="anchor"></a><code><span><span class="keyword">val</span> instantaneous_domain_index : <span>unit <span class="arrow">-></span></span> int</span></code></div><div class="spec-doc"><p><code>instantaneous_domain_index ()</code> potentially (re)allocates and returns a non-negative integer "index" for the current domain. The indices are guaranteed to be unique among the domains that exist at a point in time. Each call of <code>instantaneous_domain_index ()</code> may return a different index.</p><p>The intention is that the returned value can be used as an index into a contention avoiding parallelism safe data structure. For example, a naïve scalable increment of one counter from an array of counters could be done as follows:</p><pre class="language-ocaml"><code> let incr counters =
   (* Assuming length of [counters] is a power of two and larger than
      the number of domains. *)
   let mask = Array.length counters - 1 in
   let index = Multicore_magic.instantaneous_domain_index () in
   Atomic.incr counters.(index land mask)</code></pre><p>The implementation ensures that the indices are allocated as densely as possible at any given moment. This should allow allocating as many counters as needed and essentially eliminate contention.</p><p>On OCaml 4 <code>instantaneous_domain_index ()</code> will always return <code>0</code>.</p></div></div></div></body></html>