Configuration and scope

2012-07-14

Most applications have configuration: how to open a connection to the database, what file to log to, the locations of key data files, etc.

Configuration is hard to express correctly. It’s dynamic because you don’t know the configuration at compile time–instead it comes from a file, the network, command arguments, etc. Config is almost always implicit, because it affects your functions without being passed in as an explicit parameter. Most languages address this in two ways:

Globals

Global variables are accessible in every scope, so they make great implicit parameters for functions.

module App
  API_SERVER = "api3"
end

def save(record)
  http_put(App::API_SERVER, record)
end

Classes are often global, so you can also attach config to that class’s eigenclass, singleton object, or what have you:

require 'ostruct'

class App
  # Memoize a mutable config object on the class itself.
  def self.config; @config ||= OpenStruct.new; end
end

App.config.api_server = "api3"

App.config.api_server

Erlang apps often handle config with a globally-named module:

{ok, Server} = app_config:get(api_server),

The global variable model is concise and simple; it’s what you should reach for right away. Every thread sees the same values. In fact, all code everywhere sees the same values. Yet there are shortcomings: what if you’re writing a library? What about tests, where you might call the same function with several different configurations? What if you’re running more than one copy of your application concurrently?
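To make the testing problem concrete, here is a minimal sketch of what it takes to run one function under two configurations when the config is a global. AppConfig, endpoint_for, with_api_server, and the example.com URL are hypothetical names invented for illustration; none of them come from a library.

```ruby
# Hypothetical names throughout: AppConfig, endpoint_for, with_api_server.
module AppConfig
  class << self
    attr_accessor :api_server
  end
  self.api_server = "api3"
end

# A function that reads its configuration implicitly from the global.
def endpoint_for(path)
  "https://#{AppConfig.api_server}.example.com/#{path}"
end

# Temporarily swap the global, restoring it even if the block raises.
# Every thread in the process sees the swapped value while the block runs.
def with_api_server(name)
  previous = AppConfig.api_server
  AppConfig.api_server = name
  yield
ensure
  AppConfig.api_server = previous
end
```

A test can call with_api_server("staging") { endpoint_for("users") }, but the swap is visible to every thread for the duration of the block, which is exactly the concurrency shortcoming described above.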

Object graph traversal

An advanced OOP programmer may solve the global problem by putting configuration into instances. The application sets up a graph of instances, each with the configuration it needs to do its job.

class App
  def initialize(config)
    @api_client = App::APIClient.new config[:api_server]
    @logger = Logger.new config[:logger]
  end
end

… and so forth. What if the APIClient needs to use the logger? You could keep a pointer to the application around:

class APIClient
  def initialize(app, config)
    @app = app
    @server = config[:server]
  end

  def get
    @app.logger.log "getting"
  end
end

And traverse the graph of objects in your application. This basically amounts to passing a configuration parameter into every constructor, but has the added benefit of letting you look up other objects in the Application: maybe other local services you might need. It’s a good way to let different components work together cleanly without making their dependencies explicit: the Application doesn’t need to know exactly what services an APIClient needs. Hoorah, encapsulation! It’s also thread-safe: you can create as many applications concurrently as you like, and they won’t step on each other.

On the other hand, you do a lot of traversing, and since these are instance variables, there’s no way to refer to them within other functions, like class methods. It’s also more difficult to test, since you have to stand up all the dependencies (mocked or otherwise) in order to create an object.
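As a rough illustration of that testing cost, here is a sketch of the scaffolding needed just to construct one client in isolation. Logger and StringIO are Ruby's standard library classes; FakeApp and build_test_client are hypothetical helpers, and this APIClient is a simplified stand-in for the one above.

```ruby
require 'logger'
require 'stringio'

# Simplified stand-in for the APIClient above: it reaches through the
# application object to find its logger.
class APIClient
  def initialize(app, config)
    @app = app
    @server = config[:server]
  end

  def get
    @app.logger.info "getting from #{@server}"
    :ok
  end
end

# Hypothetical fake application exposing only what APIClient traverses to.
FakeApp = Struct.new(:logger)

# Hypothetical helper: every test must assemble this much of the graph
# before it can create a single client.
def build_test_client(io)
  APIClient.new(FakeApp.new(Logger.new(io)), { server: "api3" })
end
```

Each additional service the client looks up through the application adds another field to the fake, even though the constructor signature never changes.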

At this point, someone else reading this article is screaming “dependency injection frameworks” and pulling out XML. But before we pull out DI, let’s back up and think.

Backing up for a second

What we really want from configuration is to take functions like this:

f(config, x) = g(config, x * 2)
g(config, y) = h(config, y + 1)
h(config, z) = config + z

… and express them like this:

f(x) = g(x*2)
g(y) = h(y+1)
h(z) = config + z

We want the config variable to become implicit so that f and g are simplified. f and g do depend on config–but config may be irrelevant to their internal definition, and explicitly tracking every parameter dependency in the system can be exhausting. These implicit variables are known as dynamic scope in programming languages: variables which are bound in every function in a call stack, but are not explicit in their signatures. More particularly, we want two properties:

The variable is bound only within and below the binding expression. When control returns from the binding expression, the variable reverts to its previous value.
The variable is bound only for the thread that created it, and threads created from the bound scope; that is to say, two parallel invocations of f() can have different values of config. This lets us run, say, two copies of an application at the same time.
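Those two properties can be sketched in Ruby with a per-thread stack of bindings. DynamicVar and CONFIG are hypothetical names, not a standard Ruby API, and this sketch covers only part of the second property: each thread gets its own bindings, but a child thread starts from the default instead of inheriting its parent's value.

```ruby
# Hypothetical sketch of a dynamically scoped variable; not a standard class.
class DynamicVar
  def initialize(default)
    @default = default
    @key = :"dynamic_var_#{object_id}"
  end

  # The innermost binding for the calling thread, or the default if unbound.
  def value
    stack = Thread.current.thread_variable_get(@key)
    (stack.nil? || stack.empty?) ? @default : stack.last
  end

  # Bind new_value for the duration of the block, on this thread only.
  # The previous binding is restored even if the block raises.
  def with_value(new_value)
    stack = Thread.current.thread_variable_get(@key) || []
    Thread.current.thread_variable_set(@key, stack)
    stack.push(new_value)
    begin
      yield
    ensure
      stack.pop
    end
  end
end

CONFIG = DynamicVar.new(0)

# f and g never mention config; only h reads it.
def f(x); g(x * 2); end
def g(y); h(y + 1); end
def h(z); CONFIG.value + z; end
```

With this in place, CONFIG.with_value(10) { f(1) } evaluates h with config bound to 10, while a plain f(1) outside the block, or on another thread, still sees the default.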

In Scala, one kind of implicit scope is provided by implicit parameters, which let a value from the enclosing scope carry down (at least) one level, into functions whose parameters are tagged as “implicit” and have a matching type. (Well, at least, I think that’s what they do; A Tour of Scala: Implicit Parameters is beyond my mortal comprehension). Implicit parameters don’t carry across threads, which makes it a little tough to defer operations using, say, futures.

In Java, one might consider an InheritableThreadLocal for the task. That gives us the thread isolation property, provided that one remembers to clean up the thread local appropriately at the end of the binding context. Many Java libraries use this to provide, say, request context in a web app. Scala neatly wraps this construct with DynamicVariable, a mutable, thread-local, thread-inherited object which is bound only while a given closure is running. Since Scala doesn’t actually have dynamic scope, we still need to access the DynamicVariable object statically. No problem: we can bind it to a singleton object, just like the Ruby examples earlier:

class App {
  def start() {
    App.config.withValue(someConfigStructure) {
      httpServer.run();
    }
  }
}

import scala.util.DynamicVariable

object App {
  // DynamicVariable requires an initial (default) value.
  val config = new DynamicVariable[MyConfig](null)
}

class HttpServer {
  def run() {
    listen(App.config.value.httpPort)
  }
}

There’s a bit of a wart in that we need to call config.value in order to get the currently bound value, but the semantics are sound, the code is readable, and there’s no extraneous bookkeeping.

Dynamic scope

In languages that support dynamic scope (most Lisps, Perl, Haskell (sort of)), we can express this directly:

(ns app.config)
(def ^:dynamic config nil)

(ns app.core)
(defn start []
  (binding [app.config/config some-config-structure]
    (http-server/run)))

(ns app.http-server
  (:use app.config))
(defn run []
  (listen (:http-port config)))

One of the arguments against dynamic scope is that it can lead to name capture: a dynamic binding for “config” could break a function deep in someone else’s code that used that variable name. Clojure uses namespaces to separate vars, neatly allowing us to write either “app.config/config”, or, having included app.config, use the short name “config”. Other code remains unaffected.

Dynamic var bindings in Clojure have a root value (shared between all threads), and an overrideable thread-local value. However, not all Clojure closures close over dynamic vars! New threads do not inherit the dynamic frames of their parents by default: only future, bound-fn, and friends capture their dynamic scope. (Thread. (fn [] …)) will run with fresh (root) dynamic bindings. Use (bound-fn) where you want to preserve the current dynamic bindings between threads, and (fn) where you wish to reset them.

Thread-inheritable dynamic vars in Clojure

Alternatively, we could adopt Scala’s approach: define a new kind of reference, backed by an InheritableThreadLocal:

(defn thread-inheritable
  "Creates a dynamic, thread-local, thread-inheritable object, with initial
  value 'value'. Set with (.set x value), read with (deref x)."
  [value]
  (doto (proxy [InheritableThreadLocal clojure.lang.IDeref] []
          (deref [] (.get this)))
    (.set value)))

That proxy expression creates a new InheritableThreadLocal which also implements IDeref, Clojure’s interface for dereferenceable things like vars, refs, atoms, agents, etc. Now we just need a macro to set the local within some scope.

(defn- set-dynamic-thread-vars!
  "Takes a map of vars to values, and assigns each."
  [bindings-map]
  (doseq [[v value] bindings-map]
    (.set v value)))

(defmacro inheritable-binding
  "Creates new bindings for the (already-existing) dynamic thread-inherited
  vars, with the supplied initial values. Executes exprs in an implicit do, then
  re-establishes the bindings that existed before. All values are evaluated
  before any binding is set, like binding rather than let."
  [bindings & body]
  `(let [inner-bindings# (hash-map ~@bindings)
         outer-bindings# (into {} (for [[k# v#] inner-bindings#]
                                    [k# (deref k#)]))]
     (try
       (set-dynamic-thread-vars! inner-bindings#)
       ~@body
       (finally
         (set-dynamic-thread-vars! outer-bindings#)))))

Now we can define a new var, say config, and rebind it dynamically.

(def config (thread-inheritable :default))

(prn "Initially" @config)
(inheritable-binding [config :inside]
  ; In any functions we call, (deref config) will be :inside.
  (prn "Inside" @config)
  
  ; We can safely evaluate multiple bindings in parallel. It's the
  ; many-worlds hypothesis in action!
  (inheritable-binding [config :future]
    (future (prn "Future" @config)))
  
  ; Unlike regular ^:dynamic vars, bindings are inherited in child threads.
  (inheritable-binding [config :thread]
    (.start (Thread. (fn [] (prn "In unbound thread" @config))))))

More realistically, one might write:

(defmacro with-config 
  [m & body]
  `(inheritable-binding [config ~m] ~@body))

(defn start-server []
  (listen (:port @config)))

(with-config {:port 2}
  (start-server))

Voilà! Mutable, thread-safe, thread-inherited, implicit variables.

It’s worth noting that these variables are not a part of the dynamic binding, so they won’t be captured by (bound-fn). If you want to pass closures between existing threads, use ^:dynamic and (bound-fn). If you want your bindings to follow thread inheritance, use this inheritable-binding approach.

Closing thoughts

With all this in mind, remember LOGO? That little language has more in common with Lisp than you might think, though that discussion is, shall we say… out of this article’s scope.

TO RUNHTTPSERVER
  LISTEN :PORT
END

TO STARTAPP
  MAKE "PORT 8080
  RUNHTTPSERVER
END