fix: in Lock, prevent flambda from reordering mutex-protected operations

inspired by Mutex.protect in the stdlib, which is also `[@inline never]` for this reason
document how many threads are used for work in Ws_pool
2025-12-21 10:06:43 -05:00 · 2025-11-19 12:28:09 -05:00 · 2025-11-19 12:24:33 -05:00 · 2025-11-14 22:56:37 -05:00
4 changed files with 35 additions and 36 deletions
--- a/README.md
+++ b/README.md
@ -16,16 +16,13 @@ In addition, some concurrency and parallelism primitives are provided:
 - `Moonpool.Chan` provides simple cooperative and thread-safe channels
    to use within pool-bound tasks. They're essentially re-usable futures.

-    On OCaml 5 (meaning there's actual domains and effects, not just threads),
-    a `Fut.await` primitive is provided. It's simpler and more powerful
+    Moonpool now requires OCaml 5 (meaning there's actual domains and effects, not just threads),
+    so the `Fut.await` primitive is always provided. It's simpler and more powerful
    than the monadic combinators.
 - `Moonpool_forkjoin`, in the library `moonpool.forkjoin`
    provides the fork-join parallelism primitives
    to use within tasks running in the pool.

-On OCaml 4.xx, there is only one domain; all threads run on it, but the
-pool abstraction is still useful to provide preemptive concurrency.
-
 ## Usage

 The user can create several thread pools (implementing the interface `Runner.t`).
@ -182,7 +179,7 @@ scope).

 ### Fork-join

-On OCaml 5, again using effect handlers, the sublibrary `moonpool.forkjoin`
+The sub-library `moonpool.forkjoin`
 provides a module `Moonpool_forkjoin`
 implements the [fork-join model](https://en.wikipedia.org/wiki/Fork%E2%80%93join_model).
 It must run on a pool (using `Runner.run_async` or inside a future via `Fut.spawn`).
@ -296,12 +293,9 @@ You are assuming that, if pool P1 has 5000 tasks, and pool P2 has 10 other tasks

 ## OCaml versions

-This works for OCaml >= 4.08.
- On OCaml 4.xx, there are no domains, so this is just a library for regular thread pools
-    with not actual parallelism (except for threads that call C code that releases the runtime lock, that is).
-    C calls that do release the runtime lock (e.g. to call [Z3](https://github.com/Z3Prover/z3), hash a file, etc.)
-    will still run in parallel.
- on OCaml 5.xx, there is a fixed pool of domains (using the recommended domain count).
+This works for OCaml >= 5.00.
+
+Internally, there is a fixed pool of domains (using the recommended domain count).
 These domains do not do much by themselves, but we schedule new threads on them, and form pools
 of threads that contain threads from each domain.
 Each domain might thus have multiple threads that belong to distinct pools (and several threads from
@ -326,3 +320,4 @@ $ opam install moonpool
 ```

 [^2]: ignoring hyperthreading for the sake of the analogy.
+
--- a/src/core/fifo_pool.mli
+++ b/src/core/fifo_pool.mli
@ -29,10 +29,8 @@ val create : (unit -> t, _) create_args
 (** [create ()] makes a new thread pool.
    @param on_init_thread
      called at the beginning of each new thread in the pool.
-    @param min
-      minimum size of the pool. See {!Pool.create_args}. The default is
-      [Domain.recommended_domain_count()], ie one worker per CPU core. On OCaml
-      4 the default is [4] (since there is only one domain).
+    @param num_threads
+      number of worker threads. See {!Ws_pool.create} for more details.
    @param on_exit_thread called at the end of each worker thread in the pool.
    @param name name for the pool, used in tracing (since 0.6) *)

--- a/src/core/lock.ml
+++ b/src/core/lock.ml
@ -5,13 +5,13 @@ type 'a t = {

 let create content : _ t = { mutex = Mutex.create (); content }

-let with_ (self : _ t) f =
+let[@inline never] with_ (self : _ t) f =
  Mutex.lock self.mutex;
-  try
-    let x = f self.content in
+  match f self.content with
+  | x ->
    Mutex.unlock self.mutex;
    x
-  with e ->
+  | exception e ->
    Mutex.unlock self.mutex;
    raise e

@ -24,13 +24,13 @@ let[@inline] update_map l f =
      l.content <- x';
      y)

-let get l =
+let[@inline never] get l =
  Mutex.lock l.mutex;
  let x = l.content in
  Mutex.unlock l.mutex;
  x

-let set l x =
+let[@inline never] set l x =
  Mutex.lock l.mutex;
  l.content <- x;
  Mutex.unlock l.mutex
--- a/src/core/ws_pool.mli
+++ b/src/core/ws_pool.mli
@ -1,8 +1,8 @@
 (** Work-stealing thread pool.

    A pool of threads with a worker-stealing scheduler. The pool contains a
-    fixed number of threads that wait for work items to come, process these, and
-    loop.
+    fixed number of worker threads that wait for work items to come, process
+    these, and loop.

    This is good for CPU-intensive tasks that feature a lot of small tasks. Note
    that tasks will not always be processed in the order they are scheduled, so
@ -15,8 +15,8 @@
    in it to stop (after they finish their work), and wait for them to stop.

    The threads are distributed across a fixed domain pool (whose size is
-    determined by {!Domain.recommended_domain_count} on OCaml 5, and simply the
-    single runtime on OCaml 4). *)
+    determined by {!Domain.recommended_domain_count}. See {!create} for more
+    details. *)

 include module type of Runner

@ -36,8 +36,14 @@ val create : (unit -> t, _) create_args
    @param num_threads
      size of the pool, ie. number of worker threads. It will be at least [1]
      internally, so [0] or negative values make no sense. The default is
-      [Domain.recommended_domain_count()], ie one worker thread per CPU core. On
-      OCaml 4 the default is [4] (since there is only one domain).
+      [Domain.recommended_domain_count()], ie one worker thread per CPU core.
+
+    Note that specifying [num_threads=n] means that the degree of parallelism is
+    at most [n]. This behavior is different than the one of [Domainslib], see
+    https://github.com/c-cube/moonpool/issues/41 for context.
+
+    If you want to use all cores, use [Domain.recommended_domain_count()].
+
    @param on_exit_thread called at the end of each thread in the pool
    @param name
      a name for this thread pool, used if tracing is enabled (since 0.6) *)
Author	SHA1	Message	Date
Simon Cruanes	189a95a514	fix: in Lock, prevent flambda from reordering mutex-protected operations Some checks failed github pages / Deploy doc (push) Has been cancelled Details Build and Test / build (push) Has been cancelled Details Build and Test / build-compat (push) Has been cancelled Details Build and Test / format (push) Has been cancelled Details inspired by Mutex.protect in the stdlib, which is also `[@inline never]` for this reason	2025-11-19 12:28:09 -05:00
Simon Cruanes	0959004b11	document how many threads are used for work in Ws_pool	2025-11-19 12:24:33 -05:00
Simon Cruanes	75e528413b	remove mentions of ocaml4 in readme Some checks failed github pages / Deploy doc (push) Has been cancelled Details Build and Test / build (push) Has been cancelled Details Build and Test / build-compat (push) Has been cancelled Details Build and Test / format (push) Has been cancelled Details	2025-11-14 22:56:37 -05:00