Version: 06-Nov-2018

This book is pre-release and is an evolving work in progress. It is published here to gather feedback and to provide early value to those who have an interest in resource-oriented computing.

Please send any comments or feedback to: rocbook@durablescope.com

© 2018 Tony Butterfield.
All rights reserved.

Spaces

Resource Identifier Grammars

We have previously talked about the process of requests being resolved to endpoints. While request resolution treats identifiers as opaque, once an endpoint is evaluating the request it can parse the identifier in any way it wants. In NetKernel most endpoints use identifier grammars to parse identifiers. These grammars are a variant of Backus–Naur form (BNF). They offer a number of benefits over simple pattern matching with technologies such as globbing or regular expressions. In addition to matching identifier patterns, identifier grammars can extract and unescape nested fragments of an identifier. This extraction feature becomes critical when grammars define resource identifiers with arguments, as we shall soon see. The inverse operation is also possible: constructing a resource identifier from a grammar and a set of arguments.

Standard Grammars

Standard Grammars provide the most general grammar implementation and form the basis for both active and simple grammars. They are typically defined with the XML syntax that we use here, although a builder API also exists for constructing them. Standard Grammars support arbitrary ordering, interleaves, optional arguments, escaping, pre-configured regular expressions for common patterns, and full regular expression embedding.

Root Element of <grammar>

The root element of a grammar is always named "grammar". Any text contained within the root element, or within any other element except <regex>, is considered literal text that must appear in a matched resource identifier. Insignificant whitespace is ignored (insignificant whitespace is an XML term for whitespace, such as spaces, tabs, and newlines, that occurs directly before or after tags). For example, the following grammar:

<grammar>https://google.com</grammar>

matches only a single resource identifier: unsurprisingly, the value https://google.com.

<group> Element

The group element is used for defining anonymous or named fragments of an identifier. Fragments of identifiers within named groups are captured and returned by the parser; this is useful for parsing out arguments or query parameters from identifiers as we shall see. The following grammar extracts a product id from an identifier with a product URI scheme:

<grammar>
    product:
    <group name="productId">
        <regex type="anything"/>
    </group>
</grammar>

So, for example, for product:123 it will return a single named argument productId=>123. If the name attribute is omitted then no argument is captured by the grammar, but matching is otherwise identical.

The group element also takes optional min and max attributes that specify how many repetitions of the group to allow; by default the group must appear exactly once. Valid values are non-negative integers, or an asterisk to indicate no upper limit.
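
As a minimal sketch of repetition, assuming the min and max attributes behave as described above, the following grammar uses a hypothetical list: scheme and a literal fragment chosen purely for illustration:

```xml
<!-- Sketch: the "list:" scheme is hypothetical; the literal "item" must occur one to three times -->
<grammar>
    list:
    <group min="1" max="3">item</group>
</grammar>
```

Under that assumption it would match list:item, list:itemitem and list:itemitemitem, but neither list: on its own nor an identifier with four repetitions. Setting max="*" would remove the upper limit.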

<optional> Element

The optional element contains other grammar elements that are considered optional. A name attribute can be specified for this element. For example, the following grammar describes both the identifiers res:/customers and res:/customers/:

<grammar>res:/customers
  <optional>/</optional>
</grammar> 
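
Combined with a named group, the optional element gives one way to express an optional argument. The following is a sketch rather than a definitive pattern: the argument name customerId is hypothetical, and it assumes that a named group nested inside <optional> is captured only when that part of the identifier is present:

```xml
<!-- Sketch: customerId is a hypothetical argument name -->
<grammar>res:/customers
  <optional>/
    <group name="customerId">
      <regex type="anything"/>
    </group>
  </optional>
</grammar>
```

Under those assumptions res:/customers would match with no arguments, while res:/customers/42 would match with customerId=>42.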

<choice> Element

The choice element contains multiple group elements and supports the use of exactly one of the groups. A name attribute can be specified for this element. For example, the following grammar describes both the identifiers file:/customers and file:/products:

<grammar>
  <choice>
    <group>file:/customers</group>
    <group>file:/products</group>
  </choice>
</grammar>
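
A choice becomes more useful when the selected alternative is also captured as an argument. The sketch below assumes that a <choice> may be nested inside a named group and that the named group then captures whichever alternative matched; the argument name collection is hypothetical:

```xml
<!-- Sketch: collection is a hypothetical argument name -->
<grammar>file:/
  <group name="collection">
    <choice>
      <group>customers</group>
      <group>products</group>
    </choice>
  </group>
</grammar>
```

Under that assumption file:/customers would yield collection=>customers and file:/products would yield collection=>products, while any other identifier would not match.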

<interleave> Element

The interleave element contains multiple group elements which can be interleaved in any order. A name attribute can be specified for this element. For example, the following grammar describes both the identifiers file:/customers/delinquentpayment and file:/customers/paymentdelinquent:

<grammar>file:/customers/
  <interleave>
    <group>delinquent</group>
    <group>payment</group>
  </interleave>
</grammar>

<regex> Element

TODO

Active Grammars

TODO

Path-based Simple Grammars

TODO