Module CCParse
Very Simple Parser Combinators
These combinators can be used to write very simple parsers, for example to extract data from a line-oriented file, or as a replacement to Scanf.
A few examples
Some more advanced example(s) can be found in the /examples directory.
Parse a tree
open CCParse;;
type tree = L of int | N of tree * tree;;
let mk_leaf x = L x
let mk_node x y = N(x,y)
let ptree = fix @@ fun self ->
skip_space *>
( (char '(' *> (pure mk_node <*> self <*> self) <* char ')')
<|>
(U.int >|= mk_leaf) )
;;
parse_string_exn ptree "(1 (2 3))" ;;
parse_string_exn ptree "((1 2) (3 (4 5)))" ;;Parse a list of words
open Containers.Parse;;
let p = U.list ~sep:"," U.word;;
parse_string_exn p "[abc , de, hello ,world ]";;Stress Test
This makes a list of 100_000 integers, prints it and parses it back.
let p = CCParse.(U.list ~sep:"," U.int);;
let l = CCList.(1 -- 100_000);;
let l_printed =
CCFormat.(to_string (within "[" "]" (list ~sep:(return ",@,") int))) l;;
let l' = CCParse.parse_string_exn p l_printed;;
assert (l=l');;Stability guarantees
Some functions are marked "experimental" and are still subject to change.
module Position : sig ... end
module Error : sig ... end
type +'a or_error= ('a, Error.t) Stdlib.result'a or_erroris eitherOk xfor some resultx : 'a, or an errorError.t.See
stringify_resultandError.to_stringto print the error message.
exceptionParseError of Error.t
Input
Combinators
type 'a tThe abstract type of parsers that return a value of type
'a(or fail).- raises ParseError
in case of failure.
- since
- NEXT_RELEASE the type is private.
val return : 'a -> 'a tAlways succeeds, without consuming its input.
val map : ('a -> 'b) -> 'a t -> 'b tval map2 : ('a -> 'b -> 'c) -> 'a t -> 'b t -> 'c tval map3 : ('a -> 'b -> 'c -> 'd) -> 'a t -> 'b t -> 'c t -> 'd tval bind : ('a -> 'b t) -> 'a t -> 'b tbind f presults in a new parser which behaves aspthen, in case of success, appliesfto the result.- since
- NEXT_RELEASE
val eoi : unit tExpect the end of input, fails otherwise.
val empty : unit tSucceed with
().- since
- NEXT_RELEASE
val fail : string -> 'a tfail msgfails with the given message. It can trigger a backtrack.
val fail_lazy : (unit -> string) -> 'a tLike
fail, but only produce an error message on demand.- since
- NEXT_RELEASE
val parsing : string -> 'a t -> 'a tparsing s pbehaves the same asp, with the information that we are parsings, ifpfails. The messagesis added to the error, it does not replace it, not does the location change (the error still points to the same location as inp).
val set_error_message : string -> 'a t -> 'a tset_error_message msg pbehaves likep, but ifpfails,set_error_message msg pfails withmsginstead and at the current position. The internal error message ofpis just discarded.- since
- NEXT_RELEASE
val with_pos : 'a t -> ('a * position) twith_pos pbehaves likep, but returns the (starting) position along withp's result.EXPERIMENTAL
- since
- NEXT_RELEASE
val any_char : char tany_charparses any character. It still fails if the end of input was reached.- since
- NEXT_RELEASE
val any_char_n : int -> string tany_char_n lenparses exactlylencharacters from the input. Fails if the input doesn't contain at leastlenchars.- since
- NEXT_RELEASE
val char : char -> char tchar cparses the charactercand nothing else.
type sliceA slice of the input, as returned by some combinators such as
split_1orsplit_listortake.The idea is that one can use some parsers to cut the input into slices, e.g. split into lines, or split a line into fields (think CSV or TSV). Then a variety of parsers can be used on each slice to extract data from it using
recurse.Slices contain enough information to make it possible for
recurse slice pto report failures (ifpfails) using locations from the original input, not relative to the slice. Therefore, even after splitting the input into lines using, say,each_line, a failure to parse the 500th line will be reported at line 500 and not at line 1.EXPERIMENTAL
- since
- NEXT_RELEASE
module Slice : sig ... endFunctions on slices.
val recurse : slice -> 'a t -> 'a trecurse slice pparses theslice(most likely obtained via another combinator, such assplit_1orsplit_n), usingp.The slice contains a position which is used to relocate error messages to their position in the whole input, not just relative to the slice.
EXPERIMENTAL
- since
- NEXT_RELEASE
val set_current_slice : slice -> unit tset_current_slice slicereplaces the parser's state withslice.EXPERIMENTAL
- since
- NEXT_RELEASE
val chars_fold : f:('acc -> char -> [ `Continue of 'acc | `Consume_and_stop of 'acc | `Stop of 'acc | `Fail of string ]) -> 'acc -> ('acc * slice) tchars_fold f acc0folds over characters of the input. Each charcis passed, along with the current accumulator, tof;fcan either:- stop, by returning
`Stop acc. In this case the final accumulatoraccis returned, andcis not consumed. - consume char and stop, by returning
`Consume_and_stop acc. - fail, by returning
`Fail msg. In this case the parser fails with the given message. - continue, by returning
`Continue acc. The parser continues to the next char with the new accumulator.
This is a generalization of of
chars_ifthat allows one to transform characters on the fly, skip some, handle escape sequences, etc. It can also be useful as a base component for a lexer.- returns
a pair of the final accumular, and the slice matched by the fold.
- since
- NEXT_RELEASE
- stop, by returning
val chars_fold_transduce : f:('acc -> char -> [ `Continue of 'acc | `Yield of 'acc * char | `Consume_and_stop | `Stop | `Fail of string ]) -> 'acc -> ('acc * string) tSame as
char_foldbut with the following differences:- returns a string along with the accumulator, rather than the slice of all the characters accepted by
`Continue _. The string is built from characters returned by`Yield. - new case
`Yield (acc, c)addscto the returned string and continues parsing withacc.
- since
- NEXT_RELEASE
- returns a string along with the accumulator, rather than the slice of all the characters accepted by
val take : int -> slice ttake lenparses exactlylencharacters from the input. Fails if the input doesn't contain at leastlenchars.- since
- NEXT_RELEASE
val take_if : (char -> bool) -> slice ttake_if ftakes characters as long as they satisfy the predicatef.- since
- NEXT_RELEASE
val take1_if : ?descr:string -> (char -> bool) -> slice ttake1_if ftakes characters as long as they satisfy the predicatef. Fails if no character satisfiesf.- parameter descr
describes what kind of character was expected, in case of error
- since
- NEXT_RELEASE
val char_if : ?descr:string -> (char -> bool) -> char tchar_if fparses a characterciff c = true. Fails if the next char does not satisfyf.- parameter descr
describes what kind of character was expected, in case of error
val chars_if : (char -> bool) -> string tchars_if fparses a string of chars that satisfyf. Cannot fail.
val chars1_if : ?descr:string -> (char -> bool) -> string tLike
chars_if, but accepts only non-empty strings.chars1_if pfails if the string accepted bychars_if pis empty.chars1_if pis equivalent totake1_if p >|= Slice.to_string.- parameter descr
describes what kind of character was expected, in case of error
val endline : char tParse '\n'.
val space : char tTab or space.
val white : char tTab or space or newline.
val skip_chars : (char -> bool) -> unit tSkip 0 or more chars satisfying the predicate.
val skip_space : unit tSkip ' ' and '\t'.
val skip_white : unit tSkip ' ' and '\t' and '\n'.
val suspend : (unit -> 'a t) -> 'a tsuspend fis the same asf (), but evaluatesf ()only when needed.
val string : string -> string tstring sparses exactly the strings, and nothing else.
val many : 'a t -> 'a list tmany pparsesprepeatedly, untilpfails, and collects the results into a list.
val optional : _ t -> unit toptional ptries to parsep, and return()whether it succeeded or failed. Cannot fail itself. It consumes input ifpsucceeded (as much aspconsumed), but consumes not input ifpfailed.- since
- NEXT_RELEASE
val try_ : 'a t -> 'a ttry_ pis just likep(it used to play a role in backtracking semantics but no more).- deprecated
since NEXT_RELEASE it can just be removed. See
try_optif you want to detect failure.
val try_opt : 'a t -> 'a option ttry_opt ptries to parse usingp, and returnSome xifpsucceeded withx(and consumes whatpconsumed). Otherwise it returnsNoneand consumes nothing. This cannot fail.- since
- NEXT_RELEASE
val many_until : until:_ t -> 'a t -> 'a list tmany_until ~until pparses as manypas it can until theuntilparser successfully returns. Ifpfails before that thenmany_until ~until pfails as well. Typicallyuntilcan be a closing ')' or another termination condition, and what is consumed byuntilis also consumed bymany_until ~until p.EXPERIMENTAL
- since
- NEXT_RELEASE
val try_or : 'a t -> f:('a -> 'b t) -> else_:'b t -> 'b ttry_or p1 ~f ~else_:p2attempts to parsexusingp1, and then becomesf x. Ifp1fails, then it becomesp2. This can be useful iffis expensive but only ever works ifp1matches (e.g. after an opening parenthesis or some sort of prefix).- since
- NEXT_RELEASE
val try_or_l : ?msg:string -> ?else_:'a t -> (unit t * 'a t) list -> 'a ttry_or_l ?else_ ltries each pair(test, p)in order. If the n-thtestsucceeds, thentry_or_l lbehaves like n-thp, whetherpfails or not. If they all fail, andelse_is defined, then it behaves likeelse_. If all fail, andelse_isNone, then it fails as well.This is a performance optimization compared to
(<|>). We commit to a branch if the test succeeds, without backtracking at all.See
lookahead_ignorefor a convenient way of writing the test conditions.- parameter msg
error message if all options fail
EXPERIMENTAL
- since
- NEXT_RELEASE
val or_ : 'a t -> 'a t -> 'a tor_ p1 p2tries to parsep1, and if it fails, triesp2from the same position.- since
- NEXT_RELEASE
val both : 'a t -> 'b t -> ('a * 'b) tboth a bparsesa, thenb, then returns the pair of their results.- since
- NEXT_RELEASE
val many1 : 'a t -> 'a list tmany1 pis likemany pexcepts it fails if the list is empty (i.e. it needspto succeed at least once).
val skip : _ t -> unit tskip pparses zero or more timespand ignores its result. It is eager, meaning it will continue as long aspsucceeds. As soon aspfails,skip pstops consuming any input.
val sep_until : until:_ t -> by:_ t -> 'a t -> 'a list tSame as
sepbut stop whenuntilparses successfully.- since
- NEXT_RELEASE
val lookahead : 'a t -> 'a tlookahead pbehaves likep, except it doesn't consume any input.EXPERIMENTAL
- since
- NEXT_RELEASE
val lookahead_ignore : 'a t -> unit tlookahead_ignore ptries to parse input withp, and succeeds ifpsucceeds. However it doesn't consume any input and returns(), so in effect its only use-case is to detect whetherpsucceeds, e.g. intry_or_l.EXPERIMENTAL
- since
- NEXT_RELEASE
val line : slice tParse a line,
'\n'excluded, and position the cursor after the'\n'.- since
- NEXT_RELEASE
val line_str : string tline_strisline >|= Slice.to_string. It parses the next line and turns the slice into a string. The state points to the character immediately after the'\n'character.- since
- NEXT_RELEASE
val each_line : 'a t -> 'a list teach_line prunspon each line of the input. EXPERIMENTAL- since
- NEXT_RELEASE
val split_1 : on_char:char -> (slice * slice option) tsplit_1 ~on_charlooks foron_charin the input, and returns a pairsl1, sl2, where:sl1is the slice of the input the precedes the first occurrence ofon_char, or the whole input ifon_charcannot be found. It does not containon_char.sl2is the slice that comes afteron_char, orNoneifon_charcouldn't be found. It doesn't contain the first occurrence ofon_char(if any).
The parser is now positioned at the end of the input.
EXPERIMENTAL
- since
- NEXT_RELEASE
val split_list : on_char:char -> slice list tsplit_list ~on_charsplits the input on all occurrences ofon_char, returning a list of slices.EXPERIMENTAL
- since
- NEXT_RELEASE
val split_list_at_most : on_char:char -> int -> slice list tsplit_list_at_most ~on_char nappliessplit_1 ~on_charat mostntimes, to get a list ofn+1elements. The last element might containon_char. This is useful to limit the amount of work done bysplit_list.EXPERIMENTAL
- since
- NEXT_RELEASE
val split_2 : on_char:char -> (slice * slice) tsplit_2 ~on_charsplits the input into exactly 2 fields, and fails if the split yields less or more than 2 items. EXPERIMENTAL- since
- NEXT_RELEASE
val split_4 : on_char:char -> (slice * slice * slice * slice) tSee
split_2EXPERIMENTAL- since
- NEXT_RELEASE
val each_split : on_char:char -> 'a t -> 'a list tsplit_list_map ~on_char pusessplit_list ~on_charto split the input, then parses each chunk of the input thus obtained usingp.The difference with
sep ~by:(char on_char) pis thatsepcallspfirst, and only tries to findon_charafterpreturns. While it is more flexible, this technique also meansphas to be careful not to consumeon_charby error.A useful specialization of this is
each_line, which is basicallyeach_split ~on_char:'\n' p.EXPERIMENTAL
- since
- NEXT_RELEASE
val all : slice tallreturns all the unconsumed input as a slice, and consumes it. UseSlice.to_stringto turn it into a string.Note that
lookahead allcan be used to peek at the rest of the input without consuming anything.- since
- NEXT_RELEASE
val all_str : string tall_straccepts all the remaining chars and extracts them into a string. Similar toallbut with a string.EXPERIMENTAL
- since
- NEXT_RELEASE
val memo : 'a t -> 'a tMemoize the parser.
memo pwill behave likep, but when called in a state (read: position in input) it has already processed,memo preturns a result directly. The implementation uses an underlying hashtable. This can be costly in memory, but improve the run time a lot if there is a lot of backtracking involvingp.Do not call
memoinside other functions, especially with(>>=),map, etc. being so prevalent. Instead the correct way to use it is in a toplevel definition:let my_expensive_parser = memo (foo *> bar >>= fun i -> …)This function is not thread-safe.
Infix
module Infix : sig ... endinclude module type of Infix
val (>|=) : 'a t -> ('a -> 'b) -> 'b tAlias to
map.p >|= fparses an itemxusingp, and returnsf x.
val (>>=) : 'a t -> ('a -> 'b t) -> 'b tAlias to
bind.p >>= fresults in a new parser which behaves aspthen, in case of success, appliesfto the result.
val (<*) : 'a t -> _ t -> 'a ta <* bparsesaintox, parsesband ignores its result, and returnsx.
val (*>) : _ t -> 'a t -> 'a ta *> bparsesa, then parsesbintox, and returnsx. The result ofais ignored.
val (<|>) : 'a t -> 'a t -> 'a tAlias to
or_.a <|> btries to parsea, and ifafails without consuming any input, backtracks and tries to parseb, otherwise it fails asa. Seetry_to ensureadoes not consume anything (but it is best to avoid wrapping large parsers withtry_).
val (<?>) : 'a t -> string -> 'a ta <?> msgbehaves likea, but ifafails,a <?> msgfails withmsginstead. Useful as the last choice in a series of<|>. For example:a <|> b <|> c <?> "expected one of a, b, c".
val (|||) : 'a t -> 'b t -> ('a * 'b) tAlias to
both.a ||| bparsesa, thenb, then returns the pair of their results.- since
- NEXT_RELEASE
Let operators on OCaml >= 4.08.0, nothing otherwise
- since
- 2.8
Parse input
val stringify_result : 'a or_error -> ('a, string) Stdlib.resultTurn a
Error.t-oriented result into a more basic string result.- since
- NEXT_RELEASE
val parse_string : 'a t -> string -> ('a, string) Stdlib.resultParse a string using the parser.
val parse_string_e : 'a t -> string -> 'a or_errorVersion of
parse_stringthat returns a more detailed error.
val parse_string_exn : 'a t -> string -> 'a- raises ParseError
if it fails.
val parse_file : 'a t -> string -> ('a, string) Stdlib.resultparse_file p filenameparses file namedfilenamewithpby opening the file and reading it whole.
val parse_file_e : 'a t -> string -> 'a or_errorVersion of
parse_filethat returns a more detailed error.
val parse_file_exn : 'a t -> string -> 'aSame as
parse_file, but- raises ParseError
if it fails.
module U : sig ... end
module Debug_ : sig ... endDebugging utils. EXPERIMENTAL