Caching

The aim of the caching in resource oriented computing is to eliminate redundant computation. This causes systems to run faster, and with less CPU usage. The end result is a reduction in hardware platform requirements and reduced energy costs.

It might at first not seem obvious, but in virtually all information systems identical operations and calculations are repeated many times. The reason for this repetition is twofold: Firstly, when running the same processes over different data, many sub-components are invoked identically. Secondly, systems connect to the real world and real world state changes in certain well defined ways. It often changes slowly, partially, and incrementally, following distributions such as power law and normal curves.

When software is used to perform an operation on different data either due to iterating over many data points, or over time as data evolves many parts of that process remain identical. A page on a newspaper website might display a weather widget on every page. Weather forecasts typically only change every hour at the most, and there will be many thousands of readers in the same geographic area receiving the same information. Behind the scenes, generating this widget involved retrieving forecast information from a third party, extracting the relevant information, and formatting it into their own styled HTML. This work needs only to be done once per hour per geographic area. Performing the work for each served page is wasteful.

Over time, this same newspaper will accumulate a vast number of stories. Typically they want to keep them all available, either to link to from future stories, or to provide relevant search results to draw readers in from search engines. Many, or most, of these stories will fade away into obscurity, however at anyone time there will be hot stories that draw in vast crowds. Then, again, these stories will fade in interest and back into insignificance. To render a story to the website the server will typically retrieve the story from a content management system, perform a search for related stories, pull in relevant advertising, and format the page for viewing. In this scenario it can be seen that popular stories change over time but that in any given hour there is certain content that is repeatedly served many times.

Of course, systems can be designed and implemented to avoid such repeated computations, and often they are. In well defined places it is feasible and quite common for a mechanism to be created, and this are seen as the obvious and essential way to implement a solution in code. For example, in the newspaper, web-caches exist to store pages and fragments of pages. In more subtle ways, system configurations such as database settings and connection pools, and various application settings are loaded from the filesystem and stored within the software. Any scripts or dynamically compiled programming languages are loaded from the filesystem, parsed an compiled into efficient in memory representations ready for execution. These techniques are essential for the performant operation of todays software, and often take the form of in-memory caches.

There are, however, severe limitations to this approach, and often optimisation doesn’t happen because it is simply too much work and too complex to decide which parts of a system would benefit most and then implement a solution. A solution then adds complexity and coupling between parts that makes evolution and ongoing maintenance harder. The location within an application where a mechanism like this exists is fixed and provides a point optimisation. Different mechanisms might need to be implemented in different locations that might conflict in terms of how they manage their resources such as memory usage. Determining primary keys for data stored in a cache is often hard. Providing mechanisms to flush the cached data is necessary and ensuring they get called at the right times is hard too.

Probably the most technically challenging part of caching is managing state lifecycle. If a newspaper only updates it's weather forecasts every hour that probably doesn’t carry to much liability. If a storm is suddenly reclassified and turns for the worst a few people might be unhappy. If the advertisement for a product appears when it is out of stock and it is chosen rather than something else, there will be revenue lost. If a bank trading system fails to see spike of a market bubble bursting in a timely manner it could precipitate the end of that bank… and the collapse of the whole market. Simply saying that cached state is valid for a fixed period of time is often not adequate. Store it too long and you increase risk, store it a shorter time and you increase computational overheads. More sophisticated approaches are possible, of course, but they require even more coupling between the state and events that cause changes to that state.

Resource oriented computing has an approach to caching where every operation, every response to every idempotent request, is intrinsically cacheable. That is the basic premise, but of course it can be configured and controlled, often in quite sophisticated ways. For example rather than a simple time-to-live expiration of a representation, endpoints can implement functions to determine validity, for example by looking for changes in a file or database. Validity naturally propagates so that derivative resources generated from sub-requests to now expired representations also become expired. Caching is not some external bolted-on feature but an intrinsic part of the abstraction. It relies quite deeply on some of the core characteristics of the resource oriented approach. Let us look now at how it works.

Some background - Memoization

Resource oriented caching is somewhat similar to memoization¹, a term coined by Donald Michie in 1968 to describe an optimisation technique whereby software remembers the return value from function calls, keyed by the inputs to the function. Then, if the same function is ever called again, the return value could be substituted in place of actually executing it. Many functional programming languages provide some sort of memoization feature. Of course for this to work constraints must be placed on the function being called. Firstly, it must always return the same result for the same inputs. Mathematical functions always do this but in software this means that the function has access to no internal state that can change. The second constrain is that functions must not have any side-effects, that is, they should only return a result and not effect the state of any other function or external system. A function that satisfies these two constraints is called pure in functional programming and can be said to be referentially transparent².

Memoisation actually dates back further to 1953 when an optimisation technique for specific classes of algorithms was generalised with a term called dynamic programming³ by Richard E. Bellman. Dynamic programming works with certain classes of algorithms - algorithms which can be hierarchically decomposed into sub-problems which overlap. By overlap here we mean that the algorithm will usually evaluate a proportion of the problem space multiple times, sometimes many times. In dynamic programming sub-problems are enumerated within the scope of a single execution of an algorithm, and when executed their solutions are memoised and stored in a lookup table for future reference. By creating an implementation of an algorithm that uses dynamic programming techniques the computational time complexity of an algorithm can often be reduced as compared to a brute-force approach. For example, the classic travelling salesman problem can be changed from an extremely hard O(n!) (execution time proportional to the factorial of the number of places to visit) to a, still hard but slightly less so, O(2^n) (execution time proportional to 2 to the power of n). Unfortunately each algorithm requires research and investigation to see how amenable to dynamic programming it is, and then a custom implementation must be developed. It is a great approach to specific well defined algorithmic problems but not to optimising general computation.

We have already seen that real-world systems typically do have a large amount of repeat computation, or overlapping sub-problems in algorithmic theory parlance. We see that the dynamic programming approach doesn’t help too much because it can only be applied to the execution of single algorithms. General memoisation can be applied to more general computation if it is structured into pure functions. Unfortunately pure functions are quite limiting. They can work well over a short time scale but not so well when dealing with state that can change in a long running system.

Some More Background - the World Wide Web

The Web takes an interesting and pragmatic approach to state change. By necessity of the network separation, representations passed between server and client, or vice versa, are immutable. Change in state is transferred by polling a resource, and that polling is controlled through metadata in the request and response headers. There is no space, or need, to go into great detail on how this mechanism works suffice to say that response can state a validity duration that can range from zero to infinite. When a client polls a resource on a server, a mechanism called eTags can effectively let a client know that the representation they have is still valid, or they are delivered a new one. Both a client and any intermediary proxies can use the validity period of resources to cache state and provide optimisations. A formalisation of the Web architecture is called Representational state transfer⁴ (REST).

In ROC a very similar approach to state change is taken to REST. A response transfers a representation of state back to a client, and metadata in the headers determines the future validity of the representation. However the Web has a flat structure - it is not turtles all the way down.⁵ The Web has an architecture that typically consists of a user agent (a browser or app) connecting to a server. Once inside the server the Web ends and code starts. Functional decomposition doesn’t exist inside the Webs world view. Microservice⁶ architectures are starting to scratch the surface of a world beyond, by composing architectures in a macro way with resources, but in a very limited way. The ROC world view is very much that “everything is a resource”. In this world view we functionally decompose into resources until it is no longer practical - there is a pragmatism involved which we will discuss in later chapters. We could say it is turtles as far down as we choose.

By functionally decomposing systems into resources with a rich state management mechanism, ROC enables long time scale memoisation that can benefit real world information systems. We are almost at the point where we can fully explore the details of resource oriented caching. We must however cover one more concept. We have talked about scope previously as the context which guides the request resolution process, we will now see how it is essential to the identity of resources.

Scope and Identity

In this section we will take a deeper look at scope and how it effects the identity of resources. This is important because, as we shall see, a resource identifier is not enough to uniquely identify a resource; and if we can’t identify a resource we cannot correctly store and retrieve responses to it in a cache.

As humans we very naturally, without even thinking, use the joint concepts of context and name in language. Context is directly analogous to scope and name with resource identifier. We use pronouns such as “it” to refer to a noun in a context that we have previously established. In this example “it” is the resource identifier and the preceding text is the context. In the following sentence we resolve “it” to “dog” : I have a dog, it barked. We do this kind of resolving with proper nouns too. If a teacher mentions her pupil Charlie in a conversation in the classroom everybody immediately resolves “Charlie” to the pupil present. Of course people elsewhere are using “Charlie” to refer to other people and maybe even pets. We can clearly see that the resource identifier “Charlie” doesn’t uniquely identify a resource in any absolute way. In fact we can never uniquely identify anything absolutely by just name unless we can guarantee some unique naming scheme across the entire universe or even multiverses! In fact this is absurd to attempt, we always consider a context along with a name or identity even if it is not made explicit.

Digging deeper into context we see that it has a hierarchical nature such that it forms a vast tree of contexts covering all geographical and contextual space.

Here are a couple of examples:

“Orange” identifies all of these, a fruit, a colour - both concepts; as well as a town in southern France and famous county in California.

“Barrow” identifiers the concept of an ancient burial mound. If I was in a field in deepest Wiltshire, UK it might identify a specific barrow. Barrow also identifies as many as twelve towns in the UK and several in the US.

In programming languages the concept of context translates to scope. We normally think of lexical scope in programming languages. Lexical scope is hierarchical and defined by the syntactic elements of the code, for example with curly brackets and function definitions. Lexical scope normally gives any particular line of code access to resolve variables, definitions, and functions etc from the immediate scope, and any outward broader scope, up to what some programming languages, such as Javascript, call globals.

When a request is resolved in ROC, the innermost space in the scope is interrogated first. If it can’t be resolved there, the next space, one step out, is then interrogated. This progressive broadening of scope is analogous to the process used by a compiler to bind a variable reference in programming languages. In ROC we call the amount of the scope that was interrogated to resolve a resource the response scope. It is called response scope because it's attached to the response as a metadata header to indicate what scope was needed to fully define that response. At first this concept might seem trivial, but it is this response scope that defines the minimum bubble of context to absolutely define a resource. Imagine Charlie, from the example above, is on a school trip. On the bus trip his teacher may shout across “Charlie are you being quiet?”, and this request will still resolve to our Charlie - even though the bus is on a road in the middle of nowhere. The context of where the bus is is irrelevant for the teacher to resolve Charlie, only the contents of the bus are needed. Figure: Requesting Charlie to be quiet

If Charlie is not on the bus, how does the request “Charlie be quiet” resolve? On first analysis we could say that we pop scope to “middle of nowhere” and see if somebody outside the bus responds to the request. Of course shouting doesn’t travel that far - probably not out of the bus - so request scope is naturally limited in this example. This is an example of the scope sandboxing that we talked about earlier. If the teachers voice had unlimited range, the request might find a few Charlies in the neighbourhood of the bus. Maybe they would all respond? In ROC a space will only resolve a request to one resource - normally the first one defined.

Teacher request scope for “Charlie be quiet”:

Schoolbus(without Charlie) -> Middle of nowhere -> Planet earth

Response scope:

Schoolbus(without Charlie) -> Middle of nowhere

In these real world analogies, the scopes are rather artificial, in ROC the scopes are real spaces containing endpoints and their instantiated resources within the structure of the system.

In ROC we also have nested sub-requests. Sub-requests effect response scopes too. In the next section we’ll see how.

Response Scope for Requests With Nested Sub-Requests

We have simplified our discussion of scopes so far by only thinking about a single request in isolation. In reality the implementation of many endpoints cause sub-requests to be issued because they are designed by recursive decomposition. When a request is evaluated the fact that it’s implementation might involve sub-requests is irrelevant to the client - the endpoint embodies a resource, and how that resource is implemented is black box. However, when thinking about caching, what these sub-requests do is important because they can effect the response scope. Any endpoint which issues sub-requests which resolve by popping scope cause the outer response to depend upon that scope. This effect on response scope applies recursively to all ancestor requests within the sub-request tree.

Let us look at an example to make things clearer. The teacher is now worried that Charlie is upset, and decides to try and distract him by asking a question, “Is that a panda in the tree?”. Charlie takes the bait and answers. He, in effect, issues the sub-request “What is in the tree?”. The request fails to resolve in “Schoolbus” because he must look out of the window, the scope pops and the request resolves in “Middle of nowhere”. Charlie gets the response “Squirrel” and responds back to the teacher, “No”. In this situation both the Schoolbus, and Middle of Nowhere, are now in the response scope.

Figure: Response Scope with Nested Sub-Request That Pops Scope

If the teacher asked this question when she wasn’t in the bus, or if the bus was somewhere else, the response scope would be different; and when that is the case we are interacting with a different resource. Any cached response would not be valid when the relevant part of the context changed.

Mechanics of Caching

In the previous sections we have covered all the background necessary to describe how caching in resource oriented computing actually operates. I hope from the places we have explored you are already forming a reasonable picture, so let's dive deep and cover the salient points.

ROC caching works by storing a lookup table, in memory, of responses from previous requests. Only source, exists, and meta requests are eligible for storage because they are the only request verbs that can potentially be idempotent and not have side effects. Responses are keyed in the lookup table by identifier, response scope, verb, and requested representation type. Caching is transparent to both the issuer of the request and the implementing endpoint - this is possible because the kernel always acts as in intermediary orchestrating the resolution and evaluation of a request. Caching does not effect the outcome of any request only the computational effort to evaluate it. The size of the cache is managed to ensure it doesn’t grow unbounded. A value function based upon execution time, last used time, and number of times used determines which responses are kept, and which are discarded. The expiration function on each response ensures that expired responses are never retrieved from cache and are discarded as soon as possible. The expiry function set by an implementing endpoint can also ensure that evaluations which are not idempotent or do have side effects will never be cached.

Figure: UML Class Diagram of Cache Classes

Cache Keying

When an eligible response is cached it is stored in the lookup table with a key to allow it’s retrieval. In our previous discussion we have shown how both resource identifier and response scope are critical to specify the resource. In addition, the response is keyed by request verb and response representation. By keying with request verb the cache ensures we can cache responses for source, exists and meta requests to the same resource. The response representation is part of the key to allow for many different representational forms of response to be cached independently. This occurs if multiple clients make a request to a resource but require different representations.

Scopes cannot be compared directly when looking up a response during a cache get operation. Only a matching head of equal length to the response scope needs match a request scope.

Figure: Example of matching request scope to cached response scope

Figure: Example of a non-matching request scope to cached response scope

Expiry Function

The expiry function associated with every response determines how long it remains valid. This function is monotonic - once it returns true it will remain true forever. Once a response is expired it can never become valid again, and a new one must be requested. If an endpoint returns a response that is expired immediately, it is still valid for a client to use that ephemeral representation, it will just not be cached. If the response becomes expired after it has been stored in the cache, then it will never be returned by the cache because it is tested for expiry before being returned.

Whilst, as we see in a moment, it is possible for expiry functions to have custom implementations with logic determined by an endpoint, it is more common for the expiry function to be declared declaratively. This has the combined advantage of being easier for a developer, and of the kernel being able to optimise them. The declarative expiry functions are:

ALWAYS - the response will never be cached possibly because the resources state is constantly changing or because requesting it causes some desirable side-effect.
NEVER - the response will remain valid forever and can be cached indefinitely. Usually resources which embody platonic information or historical data will have this expiry.
CONSTANT - the response will remain valid for a fixed duration of time. This is equivalent to time-to-live.
DEPENDENT - the response will remain valid whilst the responses from all sub-requests remain valid. This basic dependency mode ensures that a derivative resource will be re-evaluated when any of it’s inputs change.
POLLED_DEPENDENT - this mode works in a very similar way to DEPENDENT except that dependent resources are only polled periodically to reduce polling overhead
MIN_CONSTANT_DEPENDENT - this mode blends CONSTANT expiry with DEPENDENT expiry such that either will cause expiration. I.e. the response will expire after a duration of time or if any of the dependent resources expire, whatever comes first.
MAX_CONSTANT_DEPENDENT - this mode blends CONSTANT expiry with DEPENDENT expiry such that the response will not expire any earlier than a fixed duration into the future but a dependent resource must expire too.
FUNCTION - a software function is supplied to determine expiry.
MIN_FUNCTION_DEPENDENT - this mode blends FUNCTION expiry with DEPENDENT expiry such that either will cause expiration.

The default expiry function for responses from endpoints handling source requests is DEPENDENT. This default is sensible for endpoints with no state that may or may not issue sub-requests. If an endpoint doesn’t explicitly return a response a NULL response is generated with an expiry of ALWAYS. This default assumes that, because the response of the endpoint is not important, the side-effects or state modification it performs is therefore it should always be re-evaluated.

Implementing custom expiry functions is typically only needed in endpoints which hold mutable state, or embody state in some external database. Simple examples include an accessor that maps the "file:” scheme onto files in the filesystem. Its expiry function for source requests will compare the timestamp of the file to the timestamp when the representation was created - if it changes the representation expires.

Cache Management

Because the cache is stored in memory its size must be managed to ensure it doesn’t grow unbounded. The cache can be configured to store a finite number of responses or to be dynamic and use a configured amount of the available memory. Either way, in order to store new responses sometimes old ones must be discarded. This discarding occurs periodically in chunks to improve efficiency - this is called the cull cycle. Culls are triggered based upon the configured limits of size and available memory. Each cull is configured to eliminate a certain percentage of the responses from the cache. Any expired responses are culled first as they are useless, then after this the responses are ranked based upon a value function. The least valuable responses are eliminated.

The cache value function uses execution time, time since last retrieval, and number of times retrieved to determine value. Execution time is a measure of the total CPU time taken to generate a response. This metric is captured whenever an endpoint is invoked and it includes any cost from sub-requests regardless of whether a sub-request’s response came from cache or was actually evaluated. It gives a true cost of generating the response from scratch - i.e. with an empty cache. Time since last retrieval and number of times retrieved are both updated whenever a response is retrieved.

Resolution Cache

So far our discussion has revolved around caching responses to save the computation needed when executing requests. If a request needs to be constantly, or periodically evaluated, then there is also benefit to caching the resolution. NetKernel implements a resolution cache for this purpose. Resolution is typically many orders of magnitude faster than endpoint execution - except for anything but the simplest requests - but there is still benefit. The resolution cache is very simple and provides a lookup from request scope to resolution. As an internal optimisation to the processing model, it doesn't require any specific consideration when designing or implementing a system.

Kernel Request Processing

All requests issued within ROC are processed by the kernel. The kernel acts as the middleware between endpoints.

Figure: Kernel caching flowchart

Control headers

For the most part caching behaves sensibly with minimal configuration. Additional headers provide some fine tuning. Firstly the request headers:

no-cache This request header ensures that the response will not be retrieved from cache or stored in cache. It doesn’t effect the behaviour of sub-requests issued, if any, by the executed endpoint. No-cache is more often used as a response header (see below). Its use as a request header is limited to situations where a client wants to override the implemented caching behaviour of a resource.

forget-dependencies This header causes the expiry dependencies for any sub-requests issued by the executed endpoint to be forgotten. This doesn’t effect the behaviour of those sub-requests in any way, but it does effect the request that the forget-dependencies header is attached too. By forgetting the expiry dependencies, long running requests with large numbers of sub-requests can execute without accumulating state in the kernel that would normally be used to determine its expiration. Because the dependencies are lost the response will be ALWAYS expired. Typically this header is used for long running batch style requests.

exclude-dependencies This header excludes the expiry dependencies from the request it is attached to from propagating into its parent request. This request will execute and cache as normal but the parent may potentially be more cacheable because it is lacking this requests expiry dependencies. This header is used to “fix” errant or undesirable caching characteristics. For example, a resource that marks it’s responses as ALWAYS expired, could be wrapped by an endpoint that sets a CONSTANT expiry period and issues a sub-request to this ALWAYS expired resource with an exclude-dependencies header.

The response headers that effect caching are:

no-cache This header ensures that the response will not be cached. It doesn’t effect the behaviour of any sub-requests issued by the executed endpoint. Sometimes no-cache is mistaken with an ALWAYS expiry, it is different in behaviour as well as intent. ALWAYS expiry ensures a response will not be cached but this ALWAYS expired propagates to a requesting endpoint if DEPENDENT expiry or one of it’s variants is used. This could inhibit the whole ancestor request chain from being cachable. The no-cache header simply stops the response from this single request from caching and has no effect on the ancestor request chain. No-cache is often used in conjunction with NEVER expired when you want the calling endpoint to take control of caching by specifying it’s own expiry function.

boost This header specifies a positive integer which boosts the perceived computation cost of response. The units are microseconds.

Changing the Style of Coding

Over and above the effect that it has on performance, caching simplifies the way that systems are coded. Because an endpoint implementation can rely on valuable resources being cached, that endpoint doesn’t need to manage that state itself. Usually it should just request all the external resources for configuration, inputs and other data each time it needs them. It is typical in object oriented code for objects to obtain state through various method calls in their constructor, and store it for the duration of the objects lifetime. More complex objects might even contain methods to refresh or update their state when external dependencies change. These kinds of functionalities add complexity and rigidity to a system, which means more lines of code and less dynamic behaviour to changes in state.

Memoization Wikipedia page
↩
Referencial Transparency Wikipedia page
↩
Dynamic Programming Wikipedia page
↩
https://en.wikipedia.org/wiki/Representational_state_transfer
↩
Turtles all the way down Wikipedia page
↩
https://en.wikipedia.org/wiki/Microservices
↩