Version: 02-Sep-2018

This book is pre-release and is an evolving work-in-progress. It is published here for the purposes of gaining feedback and providing early value to those who have an interest in resource-oriented computing.

Please send any comments or feedback to: rocbook@durablescope.com

© 2018 Tony Butterfield.
All rights reserved.

Requests in Depth

In this chapter, we will delve into some of the core concepts behind requests.

The Structure of the Request-Response Mechanism

The concept of a request being issued and a response obtained is a core part of both the REST and ROC architectural styles. Requests are the only way to interact with resources. An endpoint issues a request, and the kernel resolves the request to another endpoint. This endpoint does some processing and returns a response, which is delivered back to the issuing endpoint.

First, we'll look at the different facets of a request.

Resource Identifier

The identifier of the resource the request wants to interact with. This is simply a string of characters. NetKernel places no constraints on this identifier, and it is considered opaque - the kernel does not analyse its structure. Specific endpoints that resolve a request will, of course, look at the structure to determine a match.

Verb

When a request is issued to a resource, there is an intent. The verb indicates that intent. Verbs are limited to a small fixed set. Contrast this with the unlimited number of methods which may be associated with an object in the object-oriented programming style. Through this constraint we gain a degree of control that allows us to reason much more clearly about the operation.

The verbs supported by NetKernel are:

SOURCE - request a snapshot of the current state of a resource. This operation will typically not modify the resource or have any other side-effects. It should be idempotent, meaning it doesn't matter if you make a request once and re-use the response, or issue the request multiple times. The state of the resource, a representation, is returned as the response body.

SINK - request to update the resource with new state passed in the request. Depending upon the structure of the representation and the capabilities of the resource this may be a partial update or a complete replacement. The new state is passed to the resource in the request body.

EXISTS - request to determine if a resource identifier can resolve to a resource. A boolean TRUE is returned if the identifier can be resolved to an endpoint and that endpoint determines the resource exists.

DELETE - request the deletion of the resource. The semantics of delete are such that if the resource is successfully deleted a boolean TRUE is returned as the response representation, otherwise FALSE.

NEW - request creation of a new resource. The request identifier serves as either the base of the resource's identifier or as the actual identifier - the specific semantics will depend upon the implementation. Additional state may be passed to the endpoint to seed the resource state by passing a representation in the request body.

RESOLVE - request to resolve a resource without actually issuing the request to the resolved endpoint. This is useful for tracing and debugging. A Resolution representation is returned as the response body.

TRANSREPT - request to convert a representation from one form to another. Transrept is a contraction of trans-representation, in other words, the process of moving from one representation to another. A representation is passed in the request body, and a different representation of the same state is returned in the response body. This request is useful for adapting between the representation that an endpoint for a resource works with natively and the representation that a client expects.
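The resource-level verbs can be illustrated with a toy endpoint. The following is a minimal Python sketch rather than NetKernel code: the class InMemoryEndpoint, its handle method and the identifier-numbering scheme used for NEW are all assumptions made for illustration. RESOLVE and TRANSREPT are left out because, in this sketch, they would be handled outside the endpoint.

```python
from enum import Enum
from itertools import count


class Verb(Enum):
    SOURCE = "SOURCE"
    SINK = "SINK"
    EXISTS = "EXISTS"
    DELETE = "DELETE"
    NEW = "NEW"
    RESOLVE = "RESOLVE"
    TRANSREPT = "TRANSREPT"


class InMemoryEndpoint:
    """Hypothetical endpoint that keeps resource state in a dict keyed by identifier."""

    def __init__(self):
        self._state = {}
        self._ids = count(1)

    def handle(self, verb, identifier, body=None):
        if verb is Verb.SOURCE:
            # Return a snapshot of the current state; no side-effects.
            return self._state[identifier]
        if verb is Verb.SINK:
            # Replace the state with the representation from the request body.
            self._state[identifier] = body
            return None
        if verb is Verb.EXISTS:
            return identifier in self._state
        if verb is Verb.DELETE:
            # TRUE only if a resource was actually deleted.
            if identifier in self._state:
                del self._state[identifier]
                return True
            return False
        if verb is Verb.NEW:
            # Assumption: the request identifier is the base of the new identifier.
            new_id = f"{identifier}/{next(self._ids)}"
            self._state[new_id] = body
            return new_id
        raise NotImplementedError(f"{verb.name} is not handled by this sketch")
```

Because the set of verbs is closed, a single dispatch method can cover every possible interaction with the resource, which is the constraint the text contrasts with open-ended object methods.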

Request Scope

The scope is a list of spaces that are used to direct the request resolution process. We'll cover this in more detail in the next section.

Representation

The request may specify the type of representation it expects the response to deliver. When specified, this field is passed to the resolved endpoint so that it can take the requestor's preference into account. However, the endpoint is under no obligation to fulfil the desire. If the desire is not met, the kernel will orchestrate an additional request to transrept the response's representation into the one requested. If this second request fails, the whole request fails and returns an exception response to the requestor. That way the requestor will either get the representation type it expects, or the request evaluation will fail with an exception.
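The "expected type or exception" guarantee can be sketched in a few lines of Python. This is an illustrative model only: the TRANSREPTORS registry, the issue_request function and the stock_level endpoint are hypothetical names, and Python types stand in for representation types.

```python
class TransreptionError(Exception):
    """Raised when the response cannot be converted to the requested type."""


# Hypothetical registry of transreptors keyed by (from_type, to_type).
TRANSREPTORS = {
    (str, int): int,
    (int, str): str,
}


def issue_request(endpoint, identifier, expected_type=None):
    """Evaluate the endpoint, then enforce the requested representation type."""
    response = endpoint(identifier, expected_type)
    if expected_type is None or isinstance(response, expected_type):
        return response
    # The endpoint did not honour the preference: attempt a transrept step.
    converter = TRANSREPTORS.get((type(response), expected_type))
    if converter is None:
        raise TransreptionError(
            f"no transreptor from {type(response).__name__} to {expected_type.__name__}"
        )
    try:
        return converter(response)
    except ValueError as exc:
        raise TransreptionError(str(exc)) from exc


def stock_level(identifier, expected_type):
    # This endpoint always answers with text, whatever the requestor prefers.
    return "42"
```

With these definitions, issue_request(stock_level, "res:/stock", int) yields the integer 42, while asking for a float raises TransreptionError because no matching transreptor is registered.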

Parent Request

Primary

Headers

TODO

things to cover: failure modes

Request Resolution and Scope

Request resolution is the process of determining which resource a request is referencing. Before a request can be evaluated, we must resolve which endpoint can perform that evaluation. This section discusses the process of request resolution. Firstly we'll look at the technical details, the data structures and mechanism, and follow that with some examples.

In the world-wide-web, the resource's identifier is enough to determine resolution due to the global address space and the standardised structure of the identifiers — uniform resource locators (URLs). Part of the URL determines the hostname (or IP address) and port of a server. Another part provides the server with a hierarchical path to the exact resource. Here resolution follows a static two-step process, firstly determine the server to talk to, and secondly, tell that server the path to the resource.
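The static two-step resolution of the web can be shown with Python's standard urllib.parse module. The helper name resolve_url and the default-port table are assumptions for this sketch.

```python
from urllib.parse import urlsplit

# Default ports assumed for the two common web schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def resolve_url(url):
    """Split a URL into (server address, path): the two steps of web resolution."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    # Step 1: which server to talk to. Step 2: the path to hand to that server.
    return (parts.hostname, port), parts.path
```

Everything needed for resolution is contained in the identifier itself, which is exactly what does not hold for the opaque identifiers of ROC.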

In resource-oriented computing the resource identifier is still an essential factor in determining resolution; however, there are no constraints on the syntax of resource identifiers. Resolution infrastructure must treat the identifiers as opaque. Additionally, a resource-oriented system typically has many address spaces, some static and some dynamic. For any particular request, these are composed into request scopes which guide the resolution process.

We've previously introduced request scope: the list of spaces contained within a request that determines how it is resolved. In this section, we'll introduce the idea of evaluation scope. Evaluation scope is also a list of spaces, but the role is different. It is the scope passed to an endpoint by the kernel when a request is evaluated. Evaluation scope is determined by the resolution process that caused an endpoint to be invoked. The endpoint then uses this evaluation scope as the basis for the request scope when issuing sub-requests. By basis, we mean that, though often the evaluation scope is used unchanged, the evaluation scope can be manipulated before issuing a sub-request. We will see more of this later, but typical examples are when pass-by-value arguments are used with a request, or when a sub-request should run in a sandbox isolated from its calling scope.

The type of scope used in ROC is called dynamic scoping. In dynamic scoping, the runtime state of the request call stack determines the scope. Dynamic scoping is rare in modern programming languages; they instead use lexical scoping. In lexical scoping, the scope is statically defined by the syntax of the programming language. This has the advantage of being simpler for a developer to reason about, and also enables compile-time checks. However, it has some serious shortcomings for ROC. Firstly, in an environment where resource-oriented systems cross programming language, and even network boundaries, lexical scoping simply isn't possible. Additionally, dynamic scoping enables a much looser coupling. Resources can share their dynamic contextual state with others even though they have been implemented independently - this is important for many fundamental patterns in ROC including the pervasive language runtime pattern.

At the top of a chain of requests, there is a root request with no parent that was invoked by a transport due to some event external to the ROC system. The evaluation scope of this transport is just the single space in which it resides.

Figure: Root request and sub-requests form a request tree

The resolution process is orchestrated by the kernel whenever it receives a request. Each space within the request scope is asked in sequence if it can resolve the request. The sequence starts with the innermost space and ends when a resolution is found or when the outermost space has been tested, whichever happens first. If no resolution is found, a "Resolution not found" error is returned to the requesting endpoint. At the start of resolution, the request scope is used as the basis for an evaluation scope. As each space in the scope is tested for resolution, it is discarded from the evaluation scope if it doesn't resolve. We describe this operation as popping scope. This results in an evaluation scope of the same size as, or smaller than, the request scope. When resolution successfully occurs, the evaluation scope contains at least one space.

Figure: Structure of request scope
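The popping of scope during resolution can be sketched as follows. This is a simplified Python model, not the kernel's implementation: the Space class, the innermost-first list ordering and the exact-match lookup are assumptions made for illustration.

```python
class ResolutionNotFound(Exception):
    """Stands in for the "Resolution not found" error."""


class Space:
    """Hypothetical space mapping identifiers directly to endpoints."""

    def __init__(self, name, endpoints):
        self.name = name
        self.endpoints = dict(endpoints)

    def resolve(self, identifier):
        return self.endpoints.get(identifier)


def resolve(request_scope, identifier):
    """Return (endpoint, evaluation_scope); request_scope is ordered innermost first."""
    for depth, space in enumerate(request_scope):
        endpoint = space.resolve(identifier)
        if endpoint is not None:
            # Every space tested before this one has been popped, so the
            # evaluation scope starts at the space that resolved.
            return endpoint, request_scope[depth:]
    raise ResolutionNotFound(identifier)
```

A request resolved in the innermost space keeps the whole request scope as its evaluation scope, while one resolved further out loses the inner spaces that failed to resolve it.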

The process of asking each space within the request scope if it can resolve involves delegating the request to that space. The space has the responsibility to define how and if it can resolve a request. Typically a space contains a list of endpoints which are tested for resolution in turn. A match is made based upon the verb and identifier of the request. It is common for endpoints to define a grammar for the identifiers that they will accept. A grammar is a pattern matching language which allows sub-sequences of the identifier to be extracted. This latter capability is useful when an endpoint embodies a set of resources, and part of the identifier is used to determine which specific resource it is.

Figure: Delegation of resolution to a space

It is possible for endpoints to delegate resolution into another space. Endpoints such as overlays and, as we will see later, import, do precisely this. They act as a gateway from one space to another. All the resources of the delegate space are then available within the host space. When delegation occurs, the delegate space is added into the evaluation scope at the innermost position. This has the effect of giving any invoked endpoint access to the resources of the delegate space as well as any resources from the existing outer spaces.

Figure: Overlay delegating request resolution to a second space
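Delegation can be modelled by letting a space report which delegate spaces were entered on the way to the endpoint. The Python sketch below makes several assumptions: the Space class, its imports list and the innermost-first ordering of scopes are hypothetical, and the model covers only the effect on evaluation scope, not how overlays or imports are configured.

```python
class Space:
    """Hypothetical space with its own endpoints plus spaces it delegates into."""

    def __init__(self, name, endpoints=None, imports=None):
        self.name = name
        self.endpoints = dict(endpoints or {})
        self.imports = list(imports or [])

    def resolve(self, identifier):
        """Return (endpoint, delegate spaces entered, innermost first) or None."""
        if identifier in self.endpoints:
            return self.endpoints[identifier], []
        for delegate in self.imports:
            found = delegate.resolve(identifier)
            if found is not None:
                endpoint, entered = found
                # The delegate sits just inside this host space.
                return endpoint, entered + [delegate]
        return None


def resolve(request_scope, identifier):
    """Return (endpoint, evaluation_scope); scopes are ordered innermost first."""
    for depth, space in enumerate(request_scope):
        found = space.resolve(identifier)
        if found is not None:
            endpoint, entered = found
            # Delegate spaces are added at the innermost position.
            return endpoint, entered + request_scope[depth:]
    raise LookupError(f"Resolution not found: {identifier}")
```

An endpoint resolved through a delegation therefore sees the delegate space first, followed by the host space and any spaces outside it.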

Taking Control of Scope

In the previous sections, we saw how evaluation scopes could grow and shrink as a request tree is recursively evaluated. Mutation of the scope happens as a direct consequence of the spatial structure of a system through the process of resolution, and it happens automatically. In this section, we will look at various patterns for an endpoint to take runtime control over scope to implement a number of key patterns.

Up until now, we have considered spaces to be static constructs which exist to define our resource landscape. In this section, we see how ephemeral spaces can be created to last just for the lifetime of a sub-request.

Creating Context

We often want to create a context for a sub-request to execute in which has a set of resources available to define the landscape of that processing. A good example of this is when an external event triggers a transport to issue a root request. Usually, there is some state associated with that external event. One approach would be to pass that state as a pass-by-value argument to a sub-request. However, that approach has several problems. Firstly this state must be passed as an argument recursively through any necessary layers of resources to ensure that some endpoint that requires the state has access to it; otherwise, it is lost. Secondly, all of the work to process that state must be performed up front regardless of whether any of that state is ever required.

A better approach is for the transport to create an endpoint to expose one or more resources containing the external state. The transport will then place the endpoint in a transient space and inject it into the request scope of the root request. These resources can then be sourced by a sub-request regardless of how deep because they can always be resolved from the outermost request scope. The added advantage of this approach is that the endpoint that exposes the external state can perform any processing or extraction of that state only if it is asked for.

Figure: adding context into a root request

The HTTP transport within NetKernel uses this approach to expose resources such as httpRequest:/body, httpRequest:/header/accept by extracting state from the Java servlet request.
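The lazy behaviour of such a context space can be sketched in Python. The classes LazyContextSpace and StaticSpace, the shape of the raw event and the position of the context space in the scope list are assumptions for illustration; only the httpRequest:/body and httpRequest:/header/... identifiers come from the text.

```python
class StaticSpace:
    """Hypothetical space holding ready-made representations."""

    def __init__(self, resources):
        self.resources = dict(resources)

    def source(self, identifier):
        return self.resources.get(identifier)


class LazyContextSpace:
    """Transient space exposing external event state, extracted only on demand."""

    HEADER_PREFIX = "httpRequest:/header/"

    def __init__(self, raw_event):
        self._raw_event = raw_event
        self.extractions = 0  # counts how often state was actually extracted

    def source(self, identifier):
        if identifier == "httpRequest:/body":
            self.extractions += 1
            return self._raw_event["body"]
        if identifier.startswith(self.HEADER_PREFIX):
            self.extractions += 1
            name = identifier[len(self.HEADER_PREFIX):]
            return self._raw_event["headers"].get(name)
        return None  # not resolved by this space


def source(scope, identifier):
    """Try each space in order and return the first representation found."""
    for space in scope:
        value = space.source(identifier)
        if value is not None:
            return value
    raise LookupError(f"Resolution not found: {identifier}")
```

No argument needs to be threaded through intermediate layers: any sub-request whose scope still contains the context space can source the state, and nothing is extracted until it does.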

Sandboxed Request

Security is a very real concern for today's computer systems. Limiting access to resources is a commonly used approach in the real world as well as in resource-oriented systems. In the situation where we have untrusted code, or dynamically configured sub-systems, we want to ensure that they can only access predefined sets of resources. By running an accessor or issuing a request into a sandbox, we can ensure that only those resources contained within the sandbox can be resolved.

This approach is achieved very simply in ROC by creating an orphaned request scope for a sub-request that contains nothing but the sandbox space and optionally some pass-by-value arguments.

No sub-requests issued within the sandbox can resolve out to the parent scope; only a response can be returned.

Figure: request sandbox
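A sandboxed request amounts to building a fresh scope instead of extending the caller's. In this minimal Python sketch, spaces are plain dicts from identifier to representation, and the function names are hypothetical.

```python
def source(scope, identifier):
    """Resolve innermost first; each space is a dict of identifier -> representation."""
    for space in scope:
        if identifier in space:
            return space[identifier]
    raise LookupError(f"Resolution not found: {identifier}")


def source_in_sandbox(sandbox, identifier, arguments=None):
    """Issue a request whose scope is orphaned from the caller's scope."""
    # Only the optional pass-by-value arguments and the sandbox space are
    # visible; none of the caller's spaces are carried over.
    orphaned_scope = [dict(arguments or {}), sandbox]
    return source(orphaned_scope, identifier)
```

Since the caller's scope is never part of the orphaned scope, a resource that exists only in the caller's spaces simply fails to resolve from inside the sandbox.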

Resource Identifier Grammars

We have previously talked about the process of requests being resolved to endpoints. While the process of request resolution treats the identifiers as opaque, once an endpoint is evaluating the request it can parse the identifier in any way it wants. In NetKernel most endpoints use identifier grammars to parse identifiers. These grammars are a variant of Backus–Naur form (BNF). They offer a number of benefits over simple pattern matching with technology like globbing or regular expressions. In addition to providing identifier pattern matching functionality, identifier grammars can extract and unescape nested fragments of an identifier. This extraction feature is critical when grammars start to define resource identifiers with arguments as we shall soon see. In addition to parsing out arguments, the inverse can also be performed - constructing resource identifiers from a grammar with a set of arguments.

Standard Grammars

Standard Grammars provide the most general grammar implementation as well as the basis for both active and simple grammars. They are typically defined with an XML syntax that we will use here, but a builder API also exists for constructing them. Standard Grammars support arbitrary ordering, interleaves, optional arguments, escaping, pre-configured regular expressions for common patterns as well as full regular expression embedding.

Root Element of <grammar>

The root element of a grammar is always named "grammar". Any text contained within the root element, or within any element other than <regex>, is considered to be literal text that must appear in a matched resource identifier. Insignificant whitespace is ignored (insignificant whitespace is an XML term for whitespace such as spaces, tabs, and newlines, which occurs directly before or after tags). For example, the following grammar:

<grammar>https://google.com</grammar>

only matches a single resource identifier of, unsurprisingly, value https://google.com.

<group> Element

The group element is used for defining anonymous or named fragments of an identifier. Fragments of identifiers within named groups are captured and returned by the parser; this is useful for parsing out arguments or query parameters from identifiers as we shall see. The following grammar extracts a product id from an identifier with a product URI scheme:

<grammar>
    product:
    <group name="productId">
        <regex type="anything"/>
    </group>
</grammar>

So, for example, for product:123 it will return a single named argument productId=>123. If the name attribute is omitted then no argument is captured by the grammar, but matching will be identical.

The group element also takes optional max and min attributes to specify how many repetitions of the group to allow, the default being one and only one. Non-negative integer values are valid as well as an asterisk to indicate infinity.
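For readers more familiar with regular expressions, the behaviour of a named group can be approximated with a named capture group in Python's re module. This is only an analogue, not the grammar engine: the "anything" regex type is assumed to correspond to .*, and the function name parse_product is hypothetical.

```python
import re

# Rough regex analogue of the product grammar: a literal prefix followed by
# a named group that captures the rest of the identifier.
NAMED = re.compile(r"product:(?P<productId>.*)")

# The same pattern with an anonymous group: it matches the same identifiers
# but captures no argument.
ANONYMOUS = re.compile(r"product:(?:.*)")


def parse_product(identifier, pattern=NAMED):
    """Return the captured arguments as a dict, or None if there is no match."""
    match = pattern.fullmatch(identifier)
    return match.groupdict() if match else None
```

Under these assumptions, parse_product("product:123") yields {"productId": "123"}, and the anonymous variant matches the same identifier but yields an empty dict.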

<optional> Element

The optional element contains other grammar elements that are considered optional. A name attribute can be specified for this element. For example, the following describes both the identifier res:/customers and res:/customers/:

<grammar>res:/customers
  <optional>/</optional>
</grammar> 

<choice> Element

The choice element contains multiple group elements and supports the use of exactly one of the groups. A name attribute can be specified for this element. For example, the following grammar describes both the identifier file:/customers and file:/products:

<grammar>
  <choice>
    <group>file:/customers</group>
    <group>file:/products</group>
  </choice>
</grammar>

<interleave> Element

The interleave element contains multiple group elements which can be interleaved in any order. A name attribute can be specified for this element. For example, the following grammar describes both the identifier file:/customers/delinquentpayment and file:/customers/paymentdelinquent:

<grammar>file:/customers/
  <interleave>
    <group>delinquent</group>
    <group>payment</group>
  </interleave>
</grammar>
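The three structural elements have rough counterparts in regular expressions, which may help when reading the grammars above. The Python sketch below is an approximation written for comparison only; the interleave helper is a hypothetical function that enumerates every ordering of its literal fragments.

```python
import re
from itertools import permutations


def interleave(*fragments):
    """Build a regex fragment matching the literal fragments in any order."""
    orderings = ("".join(re.escape(f) for f in p) for p in permutations(fragments))
    return "(?:" + "|".join(orderings) + ")"


# <optional>: the trailing slash may or may not be present.
OPTIONAL = re.compile(r"res:/customers/?")

# <choice>: exactly one of the alternatives.
CHOICE = re.compile(r"(?:file:/customers|file:/products)")

# <interleave>: both fragments, in either order.
INTERLEAVE = re.compile(re.escape("file:/customers/") + interleave("delinquent", "payment"))
```

The regex for an interleave grows factorially with the number of fragments, which hints at why a dedicated grammar element is more convenient than a hand-written regular expression.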

<regex> Element

TODO

Active Grammars

TODO

Path-based Simple Grammars

TODO

Argument Passing

In the simplest case, there is a one-to-one correspondence between identifier and endpoint. For example, we could create an endpoint that returned the numeric value of Pi and map this into an address space using the identifier "Pi". However, when identifiers contain wildcards or regular expressions, there is a set of possible identifiers that are resolved to a single endpoint. In NetKernel there is a pattern matching language called grammars which serves the joint purpose of both matching identifiers and of extracting named parts of those identifiers as arguments.

Substring Arguments

The arguments can be thought of in two different ways. In the most literal sense, the arguments are simply sub-strings of the identifier and can be treated as values in their own right. For example, we could define the following grammar using a regular expression for an endpoint to source product information:

product:(.*)

and match an identifier:

product:0654321

From this, we could get the argument value, let's call it product id, of 0654321, which would be useful to the implementation of the accessor, possibly to perform database queries.

Pass-by-reference arguments

The second way for an endpoint to consider the arguments is as nested resource identifiers. So using the same grammar, we could match the following identifier:

product:Pi

When the endpoint extracts the product id from this identifier, it gets the argument value of "Pi". Treating this as a nested resource identifier, it would then issue a SOURCE request for it within its evaluation scope. Assuming we have the same endpoint which resolves "Pi" as described above, then the endpoint would determine that the product id was a numeric value of 3.141....
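The Pi example can be worked through in a small Python model. Everything here is a sketch under assumptions: a space is modelled as a list of (pattern, endpoint) pairs, regular expressions stand in for grammars, and the endpoint simply receives the scope it was resolved in rather than a properly popped evaluation scope.

```python
import math
import re


def source(scope, identifier):
    """Resolve the identifier innermost first and evaluate the matching endpoint."""
    for space in scope:
        for pattern, endpoint in space:
            match = pattern.fullmatch(identifier)
            if match:
                argument = match.group(1) if match.groups() else None
                return endpoint(argument, scope)
    raise LookupError(f"Resolution not found: {identifier}")


def pi_endpoint(argument, scope):
    # One-to-one mapping: the identifier "Pi" always yields the same value.
    return math.pi


def product_endpoint(argument, scope):
    # Pass-by-reference: treat the argument as a nested resource identifier
    # and issue a sub-request for it within the scope we were given.
    product_id = source(scope, argument)
    return {"productId": product_id}


SPACE = [
    (re.compile(r"Pi"), pi_endpoint),
    (re.compile(r"product:(.*)"), product_endpoint),
]
```

Sourcing product:Pi in this model returns a product id equal to math.pi, because the argument is resolved as a resource rather than used as a literal string.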

In the general case, it is necessary for an escaping mechanism to encode nested identifiers as arguments. Later we will see how NetKernel grammars achieve this.
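To see why escaping is needed, consider nesting an identifier that itself contains the delimiter characters of the outer identifier. The Python sketch below uses percent-encoding from urllib.parse purely as a stand-in; it is an assumption for illustration and not the escaping scheme that NetKernel grammars use.

```python
from urllib.parse import quote, unquote


def build_identifier(scheme, argument):
    """Construct an identifier, escaping the nested argument in full."""
    # safe="" forces every reserved character, including ":" and "/", to be escaped.
    return f"{scheme}:{quote(argument, safe='')}"


def extract_argument(identifier):
    """Split off the scheme and unescape the nested argument."""
    scheme, _, escaped = identifier.partition(":")
    return scheme, unquote(escaped)
```

Escaping keeps the outer structure unambiguous: the nested identifier product:Pi is carried as product%3APi, so the only unescaped colon in the result is the one that belongs to the outer identifier.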

Pass-by-value arguments

We have seen that nested resource identifiers can be encoded into an identifier as arguments. This approach works well when there is an existing resource which can be referenced to provide a representation of an argument. In some cases, however, we might have a literal value, or maybe have just performed a calculation and we want to pass that value into another request. In this case, pass-by-value arguments are useful.

Pass-by-value arguments are an extension of pass-by-reference. They work by firstly creating unique identifiers for all the representations we need to pass. A special transient space is then constructed, with an endpoint that resolves the unique identifiers we have created to the representations. These identifiers are then added to the sub-request as arguments. The final step before issuing the request is to inject this transient space as the innermost space on the request scope.

Now when the request is issued, the endpoint resolves precisely as it would with pass-by-reference. Internally the request is also evaluated by the endpoint identically to how it would with pass-by-reference arguments. However, when the resource identifier for one of the arguments is resolved, it resolves to the new transient space, and the representation the client created is returned.

Figure: Pass-by-value argument passing

Once the sub-request completes, the transient space is no longer referenced, and is discarded.
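The steps of pass-by-value can be sketched in Python as follows. The model is deliberately small: spaces are dicts, the "pbv:" identifier prefix is a made-up scheme, and call_with_values and double are hypothetical names.

```python
import uuid


def source(scope, identifier):
    """Resolve innermost first; each space is a dict of identifier -> representation."""
    for space in scope:
        if identifier in space:
            return space[identifier]
    raise LookupError(f"Resolution not found: {identifier}")


def double(scope, operand_identifier):
    """Endpoint written for pass-by-reference: it sources its argument."""
    return 2 * source(scope, operand_identifier)


def call_with_values(endpoint, scope, *values):
    """Issue a sub-request to the endpoint, passing the values by value."""
    # 1. Create a unique identifier for each representation and place them
    #    in a transient space (the "pbv:" prefix is an assumption).
    transient = {f"pbv:{uuid.uuid4()}": value for value in values}
    # 2. Inject the transient space as the innermost space of the request scope.
    sub_scope = [transient] + list(scope)
    # 3. Pass the generated identifiers as the arguments of the sub-request.
    result = endpoint(sub_scope, *transient)
    # After returning, nothing references the transient space any more.
    return result
```

The endpoint cannot tell the difference between the two styles: double sources an identifier in both cases, and only the contents of the scope decide whether that identifier leads to a long-lived resource or to a value the caller has just computed.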

Transreption

TODO