fix: in Lock, prevent flambda from reordering mutex-protected operations

inspired by Mutex.protect in the stdlib, which is also `[@inline never]` for this reason
document how many threads are used for work in Ws_pool
2025-12-21 10:06:43 -05:00 · 2025-11-19 12:28:09 -05:00 · 2025-11-19 12:24:33 -05:00 · 2025-11-14 22:56:37 -05:00
4 changed files with 35 additions and 36 deletions
--- a/README.md
+++ b/README.md
@ -16,16 +16,13 @@ In addition, some concurrency and parallelism primitives are provided:
 - `Moonpool.Chan` provides simple cooperative and thread-safe channels
    to use within pool-bound tasks. They're essentially re-usable futures.

-    On OCaml 5 (meaning there's actual domains and effects, not just threads),
-    a `Fut.await` primitive is provided. It's simpler and more powerful
+    Moonpool now requires OCaml 5 (meaning there's actual domains and effects, not just threads),
+    so the `Fut.await` primitive is always provided. It's simpler and more powerful
    than the monadic combinators.
 - `Moonpool_forkjoin`, in the library `moonpool.forkjoin`
    provides the fork-join parallelism primitives
    to use within tasks running in the pool.

-On OCaml 4.xx, there is only one domain; all threads run on it, but the
-pool abstraction is still useful to provide preemptive concurrency.
-
 ## Usage

 The user can create several thread pools (implementing the interface `Runner.t`).
@ -182,7 +179,7 @@ scope).

 ### Fork-join

-On OCaml 5, again using effect handlers, the sublibrary `moonpool.forkjoin`
+The sub-library `moonpool.forkjoin`
 provides a module `Moonpool_forkjoin`
 implements the [fork-join model](https://en.wikipedia.org/wiki/Fork%E2%80%93join_model).
 It must run on a pool (using `Runner.run_async` or inside a future via `Fut.spawn`).
@ -296,21 +293,18 @@ You are assuming that, if pool P1 has 5000 tasks, and pool P2 has 10 other tasks

 ## OCaml versions

-This works for OCaml >= 4.08.
- On OCaml 4.xx, there are no domains, so this is just a library for regular thread pools
-    with not actual parallelism (except for threads that call C code that releases the runtime lock, that is).
-    C calls that do release the runtime lock (e.g. to call [Z3](https://github.com/Z3Prover/z3), hash a file, etc.)
-    will still run in parallel.
- on OCaml 5.xx, there is a fixed pool of domains (using the recommended domain count).
-    These domains do not do much by themselves, but we schedule new threads on them, and form pools
-    of threads that contain threads from each domain.
-    Each domain might thus have multiple threads that belong to distinct pools (and several threads from
-    the same pool, too — this is useful for threads blocking on IO); Each pool will have threads
-    running on distinct domains, which enables parallelism.
+This works for OCaml >= 5.00.

-    A useful analogy is that each domain is a bit like a CPU core, and `Thread.t` is a logical thread running on a core.
-    Multiple threads have to share a single core and do not run in parallel on it[^2].
-    We can therefore build pools that spread their worker threads on multiple cores to enable parallelism within each pool.
+Internally, there is a fixed pool of domains (using the recommended domain count).
+These domains do not do much by themselves, but we schedule new threads on them, and form pools
+of threads that contain threads from each domain.
+Each domain might thus have multiple threads that belong to distinct pools (and several threads from
+the same pool, too — this is useful for threads blocking on IO); Each pool will have threads
+running on distinct domains, which enables parallelism.
+
+A useful analogy is that each domain is a bit like a CPU core, and `Thread.t` is a logical thread running on a core.
+Multiple threads have to share a single core and do not run in parallel on it[^2].
+We can therefore build pools that spread their worker threads on multiple cores to enable parallelism within each pool.

 TODO: actually use https://github.com/haesbaert/ocaml-processor to pin domains to cores,
 possibly optionally using `select` in dune.
@ -326,3 +320,4 @@ $ opam install moonpool
 ```

 [^2]: ignoring hyperthreading for the sake of the analogy.
+
--- a/src/core/fifo_pool.mli
+++ b/src/core/fifo_pool.mli
@ -29,10 +29,8 @@ val create : (unit -> t, _) create_args
 (** [create ()] makes a new thread pool.
    @param on_init_thread
      called at the beginning of each new thread in the pool.
-    @param min
-      minimum size of the pool. See {!Pool.create_args}. The default is
-      [Domain.recommended_domain_count()], ie one worker per CPU core. On OCaml
-      4 the default is [4] (since there is only one domain).
+    @param num_threads
+      number of worker threads. See {!Ws_pool.create} for more details.
    @param on_exit_thread called at the end of each worker thread in the pool.
    @param name name for the pool, used in tracing (since 0.6) *)

--- a/src/core/lock.ml
+++ b/src/core/lock.ml
@ -5,13 +5,13 @@ type 'a t = {

 let create content : _ t = { mutex = Mutex.create (); content }

-let with_ (self : _ t) f =
+let[@inline never] with_ (self : _ t) f =
  Mutex.lock self.mutex;
-  try
-    let x = f self.content in
+  match f self.content with
+  | x ->
    Mutex.unlock self.mutex;
    x
-  with e ->
+  | exception e ->
    Mutex.unlock self.mutex;
    raise e

@ -24,13 +24,13 @@ let[@inline] update_map l f =
      l.content <- x';
      y)

-let get l =
+let[@inline never] get l =
  Mutex.lock l.mutex;
  let x = l.content in
  Mutex.unlock l.mutex;
  x

-let set l x =
+let[@inline never] set l x =
  Mutex.lock l.mutex;
  l.content <- x;
  Mutex.unlock l.mutex
--- a/src/core/ws_pool.mli
+++ b/src/core/ws_pool.mli
@ -1,8 +1,8 @@
 (** Work-stealing thread pool.

    A pool of threads with a worker-stealing scheduler. The pool contains a
-    fixed number of threads that wait for work items to come, process these, and
-    loop.
+    fixed number of worker threads that wait for work items to come, process
+    these, and loop.

    This is good for CPU-intensive tasks that feature a lot of small tasks. Note
    that tasks will not always be processed in the order they are scheduled, so
@ -15,8 +15,8 @@
    in it to stop (after they finish their work), and wait for them to stop.

    The threads are distributed across a fixed domain pool (whose size is
-    determined by {!Domain.recommended_domain_count} on OCaml 5, and simply the
-    single runtime on OCaml 4). *)
+    determined by {!Domain.recommended_domain_count}. See {!create} for more
+    details. *)

 include module type of Runner

@ -36,8 +36,14 @@ val create : (unit -> t, _) create_args
    @param num_threads
      size of the pool, ie. number of worker threads. It will be at least [1]
      internally, so [0] or negative values make no sense. The default is
-      [Domain.recommended_domain_count()], ie one worker thread per CPU core. On
-      OCaml 4 the default is [4] (since there is only one domain).
+      [Domain.recommended_domain_count()], ie one worker thread per CPU core.
+
+    Note that specifying [num_threads=n] means that the degree of parallelism is
+    at most [n]. This behavior is different than the one of [Domainslib], see
+    https://github.com/c-cube/moonpool/issues/41 for context.
+
+    If you want to use all cores, use [Domain.recommended_domain_count()].
+
    @param on_exit_thread called at the end of each thread in the pool
    @param name
      a name for this thread pool, used if tracing is enabled (since 0.6) *)
Author	SHA1	Message	Date
Simon Cruanes	189a95a514	fix: in Lock, prevent flambda from reordering mutex-protected operations Some checks failed github pages / Deploy doc (push) Has been cancelled Details Build and Test / build (push) Has been cancelled Details Build and Test / build-compat (push) Has been cancelled Details Build and Test / format (push) Has been cancelled Details inspired by Mutex.protect in the stdlib, which is also `[@inline never]` for this reason	2025-11-19 12:28:09 -05:00
Simon Cruanes	0959004b11	document how many threads are used for work in Ws_pool	2025-11-19 12:24:33 -05:00
Simon Cruanes	75e528413b	remove mentions of ocaml4 in readme Some checks failed github pages / Deploy doc (push) Has been cancelled Details Build and Test / build (push) Has been cancelled Details Build and Test / build-compat (push) Has been cancelled Details Build and Test / format (push) Has been cancelled Details	2025-11-14 22:56:37 -05:00