Chapter 13. Datatypes and Protocols

Clojure is built on abstractions: sequences, references, macros, and so forth. However, most of those abstractions are implemented in Java, as classes and interfaces. It is difficult to add new abstractions to the language (for example, a queue data structure) without implementing them in Java.

Clojure 1.2 introduces several new features to make it easier to implement new abstractions directly in Clojure, while still taking full advantage of the performance optimizations in the Java platform. Datatypes and protocols are roughly analogous to Java's classes and interfaces, but they are more flexible.

Note

As of this writing, Clojure 1.2 has not yet been released. Although the concepts will remain the same, there may be minor changes in naming or syntax from what we describe in this chapter.

Protocols

A protocol is a set of methods. The protocol has a name and an optional documentation string. Each method has a name, one or more argument vectors, and an optional documentation string. That's it! There are no implementations, no actual code.

Protocols are created with defprotocol:

(defprotocol MyProtocol
  "This is my new protocol"
  (method-one [x] "This is the first method.")
  (method-two [x] [x y] "The second method."))

If you were to execute this example in the namespace my.code, the following Vars would be created:

  • my.code/MyProtocol: A protocol object.

  • my.code/method-one: A function of one argument.

  • my.code/method-two: A function of one or two arguments.

method-one and method-two are polymorphic functions, meaning they can have different implementations for different types of objects. You can call method-one or method-two immediately after defprotocol, but they will throw an exception because no implementations have been defined.
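
As a minimal sketch of that behavior, the following defines a protocol and immediately calls its method on a string. The names Greeter, greet, and greet-result are hypothetical, chosen only for this illustration. Because no type implements the protocol yet, the call throws an IllegalArgumentException, which the example catches:

```clojure
;; Hypothetical protocol, used only for illustration.
(defprotocol Greeter
  "Things that can produce a greeting."
  (greet [x] "Returns a greeting string for x."))

;; No type implements Greeter yet, so calling greet fails at runtime.
(def greet-result
  (try
    (greet "world")
    (catch IllegalArgumentException e
      :no-implementation)))
```

Here greet-result ends up holding :no-implementation, because the exception branch is taken.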

What is a protocol? It's a contract, a set of capabilities. An object or a datatype (described in the next section) can declare that it supports a particular protocol, meaning that it has implementations for the methods in that protocol.

Protocols As Interfaces

Conceptually, a protocol is similar to a Java interface. In fact, defprotocol creates a Java interface with the same methods. You can AOT-compile the Clojure source file containing defprotocol and use the interface in Java code. The Java interface will be in a package matching the namespace in which the protocol was defined. The package, interface, and method names will be adjusted to obey Java naming rules, such as replacing hyphens with underscores. Each method in the interface will have one argument fewer than the protocol method: that argument is the this pointer in Java. The previous example would create an interface matching the following Java code:

package my.code;

public interface MyProtocol {
    public Object method_one();
    public Object method_two();
    public Object method_two(Object y);
}

There is one important difference between protocols and interfaces: protocols have no inheritance. You cannot create "subprotocols" like Java's subinterfaces.

Protocols are also similar to "mix-in" facilities provided by languages such as Ruby, with another important difference: protocols have no implementation. As a result, protocols never conflict with one another, unlike mix-ins.

Datatypes

Although Clojure is not, strictly speaking, an object-oriented language, sometimes it is tempting to think in object-oriented terms when dealing with the real world. Most applications have many "records" of the same "type" with similar "fields."

Prior to Clojure 1.2, the standard way to handle records was to use maps. This worked, but did not permit any performance optimizations from reusing the same keys in many maps.

StructMaps were one solution, but they had several problems. StructMaps have a predefined set of keys, but no actual "type" that can be queried at runtime. They cannot be printed and read back as StructMaps. They cannot have primitive-typed fields, and they cannot match the performance of instance fields in plain old Java objects.

Clojure 1.2 introduces datatypes as a replacement for StructMaps. A datatype is a named record type, with a set of named fields that can implement protocols and interfaces. Datatypes are created with defrecord:

(defrecord name [fields...])

For example, a datatype might store an employee record with two fields, name and room number:

user> (defrecord Employee [name room])

In this example, defrecord creates a new class named Employee. It has a default constructor that takes arguments matching the fields of the type, in the same order. You can construct an instance of the datatype by adding a dot to the end of its name.

user> (def emp (Employee. "John Smith" 304))

Datatype instances behave like Clojure maps. You can retrieve the fields of a datatyped object by using keywords as accessor functions:

user> (:name emp)
"John Smith"
user> (:room emp)
304

This is much faster than map lookups and even faster than StructMap accessor functions. Datatype instances also support the assoc and dissoc functions.

user=> (defrecord Scientist [name iq])
user.Scientist
user=> (def x (Scientist. "Albert Einstein" 190))
#'user/x
user=> (assoc x :name "Stephen Hawking")
#:user.Scientist{:name "Stephen Hawking", :iq 190}

You can even assoc additional fields that were not part of the original datatype, without changing the object's type.

user=> (assoc x :field "physics")
#:user.Scientist{:name "Albert Einstein", :iq 190, :field "physics"}

However, if you dissoc one of the original datatype keys, you get an ordinary map as the result.

user=> (dissoc x :iq)
{:name "Albert Einstein"}
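
The effect of assoc and dissoc on an instance's type can be checked directly. The sketch below uses a hypothetical Planet record (not part of the chapter's examples) and tests the results with instance? and map?:

```clojure
;; Hypothetical record type, for illustration only.
(defrecord Planet [name moons])

(def earth (Planet. "Earth" 1))

;; Adding a key that is not a declared field keeps the record type.
(def with-extra (assoc earth :habitable true))

;; Removing a declared field falls back to an ordinary map.
(def without-moons (dissoc earth :moons))

(instance? Planet with-extra)     ; true
(instance? Planet without-moons)  ; false
(map? without-moons)              ; true
```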

Implementing Protocols and Interfaces

A datatype, by itself, just stores data. A protocol, by itself, doesn't do anything at all. Together they form a powerful abstraction. Once a protocol has been defined, it can be extended to support any datatype. We say the datatype implements the protocol. At that point, the protocol's methods can be called on instances of that datatype.

In-Line Methods

When creating a datatype with defrecord, you can supply method implementations for any number of protocols. The syntax is as follows:

(defrecord name [fields...]
  SomeProtocol
    (method-one [args] ... method body ...)
    (method-two [args] ... method body ...)
  AnotherProtocol
    (method-three [args] ... method body ...))

You can chain any number of protocols and methods after the fields vector. Each method implementation has the same number of arguments as the corresponding protocol method; the first argument is the instance on which the method was called. Fields of the instance are available as local variables in the method bodies, using the same names.

(defrecord name [x y z]
  SomeProtocol
  (method-one [args]
    ...do stuff with x, y, and z...))

These are the only locals available in the method bodies. Unlike fn, proxy, or reify (described in the section "Reifying Anonymous Datatypes"), defrecord does not close over its lexical scope.
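
A concrete version of the templates above might look like the following sketch. The Shape protocol and Rectangle record are hypothetical names, and the example assumes the released syntax, in which every in-line method takes the instance as an explicit first argument (named this here):

```clojure
;; Hypothetical protocol and record, for illustration only.
(defprotocol Shape
  (area [shape] "Returns the area of the shape.")
  (scale [shape factor] "Returns a copy of the shape scaled by factor."))

(defrecord Rectangle [width height]
  Shape
  ;; width and height are the record's fields, visible as locals.
  (area [this] (* width height))
  (scale [this factor]
    (Rectangle. (* width factor) (* height factor))))

(area (scale (Rectangle. 2 3) 2))  ; area of a 4 x 6 rectangle
```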

Extending Java Interfaces

Datatypes can also implement methods from Java interfaces. For example, you could implement the java.lang.Comparable interface, allowing your new datatype to support the Clojure compare function:

user> (defrecord Pair [x y]
        java.lang.Comparable
          (compareTo [this other]
             (let [result (compare x (:x other))]
               (if (zero? result)
                 (compare y (:y other))
                 result))))
user.Pair
user> (compare (Pair. 1 2) (Pair. 1 2))
0
user> (compare (Pair. 1 3) (Pair. 1 100))
-1

Note that the this argument, representing the object on which the method was called, must be explicitly included. This means that Clojure implementations of Java methods will have one more argument than appears in the Java method signature.

Since most of Clojure's core functions are defined to operate on interfaces, they can be extended to support new datatypes. Clojure defines too many interfaces to list here, but they can be found in the Clojure source code. Some examples are clojure.lang.Seqable and clojure.lang.Reversible for the seq and rseq functions, respectively. In a future release (2.0 or later), these interfaces will likely be redefined as protocols.

defrecord does not support Java class inheritance, so it cannot override methods of Java classes, even abstract classes. However, it does permit you to override methods of java.lang.Object such as hashCode, equals, and toString. Simply include java.lang.Object in the defrecord as if it were an interface. Clojure will generate good value-based implementations of the hashCode and equals methods, so it is rarely necessary to implement them yourself.
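
As a small sketch of overriding an Object method, the hypothetical Temperature record below supplies its own toString and leaves equals and hashCode to the generated value-based implementations:

```clojure
;; Hypothetical record, for illustration only.
(defrecord Temperature [degrees]
  Object
  ;; Only toString is overridden; equality stays value-based.
  (toString [this] (str degrees " degrees")))

(str (Temperature. 21))                     ; uses the custom toString
(= (Temperature. 21) (Temperature. 21))     ; generated value equality
```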

Java interfaces sometimes define overloaded methods with the same name but different argument types. If the methods have different numbers of arguments (arities), just define each arity as if it were a distinct method. (Do not use the multiple-arity syntax of fn.) If the methods have arguments of different types, add type tags (Chapter 8) to disambiguate them.

Datatypes As Classes

A datatype is equivalent to a Java class containing public final instance fields and implementing any number of interfaces. It does not extend any base class except java.lang.Object.

Unlike Java classes, a datatype is not required to provide implementations for every method of its protocols or interfaces. Methods lacking an implementation will throw an AbstractMethodError when called on instances of that datatype.

When AOT-compiled, defrecord will generate a Java class with the same name as the datatype and a package name matching the current namespace (subject to Java name rules, as with protocols). The generated class will have two constructors: one with just the fields as arguments and one with two extra arguments: a metadata map and a map of additional fields, either of which may be nil.

You cannot add additional constructors to a datatype, nor can you add methods that are not defined in a protocol or interface.

To optimize the memory usage of your datatype, you can add primitive type hints to the fields. You can also type-hint fields with class names; this will not affect memory usage (all pointers are the same size) but can prevent reflection warnings.

user> (defrecord Point [#^double x #^double y])
user.Point
user> (Point. 1 5)
#:user.Point{:x 1.0, :y 5.0}

Extending Protocols to Pre-Existing Types

Sometimes you may want to create a new protocol that operates on an existing datatype. Assume, for now, that you cannot modify the source code of the defrecord. You can still extend the protocol to support that datatype, using the extend function:

(extend DatatypeName
  SomeProtocol
    {:method-one (fn [x y] ...)
     :method-two existing-function}
  AnotherProtocol
    {...})

extend takes a datatype name followed by any number of protocol/method map pairs. A method map is an ordinary map from method names, given as keywords, to their implementations. The implementations can be anonymous functions created with fn or symbols naming existing functions.

Because extend is an ordinary function, all its arguments are evaluated. This means you could store a method map in a Var and reuse it to extend several datatypes, providing functionality very similar to mix-ins.

(def defaults
     {:method-one (fn [x y] ...)
      :method-two (fn [x] ...)})
(extend DefaultType
  SomeProtocol
    defaults)
(extend AnotherType
  SomeProtocol
    (assoc defaults :method-two (fn ...)))
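
The template above can be made concrete as follows. This is only a sketch: the Describable protocol, the Book and Film records, and describable-defaults are hypothetical names invented for the example.

```clojure
;; Hypothetical protocol and records, for illustration only.
(defprotocol Describable
  (describe [x] "Returns a short description.")
  (tag [x] "Returns a keyword tag."))

(defrecord Book [title])
(defrecord Film [title])

;; A shared method map: keys are the method names as keywords.
(def describable-defaults
  {:describe (fn [x] (str "Item: " (:title x)))
   :tag      (fn [x] :item)})

;; Book uses the defaults unchanged.
(extend Book
  Describable
  describable-defaults)

;; Film reuses the defaults but replaces one method.
(extend Film
  Describable
  (assoc describable-defaults :tag (fn [x] :film)))
```

With these definitions, (tag (Book. "Dune")) returns :item, while (tag (Film. "Alien")) returns :film; both types share the same describe implementation.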

There are two convenience macros that simplify the extension syntax, extend-type and extend-protocol. Use extend-type when you want to implement several protocols for the same datatype; use extend-protocol when you want to implement the same protocol for several datatypes.

(extend-type DatatypeName
  SomeProtocol
    (method-one [x] ... method body ...)
    (method-two [x] ...)
  AnotherProtocol
    (method-three [x] ...))

(extend-protocol SomeProtocol
  SomeDatatype
     (method-one [x] ...)
     (method-two [x y] ...)
  AnotherType
     (method-one [x] ...)
     (method-two [x y] ...))

Methods added using extend and its associated macros are attached to the protocol, not the datatype itself. This makes them more flexible (they work on standard Java classes, described in the following section) but slightly less efficient than methods embedded directly within defrecord.
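
One practical consequence is that method bodies written with extend-type do not see the record's fields as locals; they reach them through the instance. The sketch below, using hypothetical Named and Priced protocols and a Product record, reads the fields with keyword lookups:

```clojure
;; Hypothetical protocols and record, for illustration only.
(defprotocol Named
  (display-name [x]))

(defprotocol Priced
  (price-in-cents [x]))

(defrecord Product [title dollars])

;; Several protocols implemented for one datatype.
(extend-type Product
  Named
  (display-name [p] (:title p))
  Priced
  (price-in-cents [p] (* 100 (:dollars p))))

(price-in-cents (Product. "Pen" 3))
```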

Extending Java Classes and Interfaces

Datatypes and protocols are a powerful abstraction, but often you have to deal with Java classes for which you do not have the source code. Java does not provide a way to add new interfaces to an existing class (known as interface injection), but Clojure protocols can be extended to support existing Java classes.

extend, extend-type, and extend-protocol all accept Java classes as "types." This works on interfaces, too. You can write (extend-type SomeInterface...) to extend a protocol to all classes that implement SomeInterface. Because a class can implement more than one interface, this opens up the possibility of multiple inheritance of implementation: if a protocol is extended to two interfaces and a class implements both, which implementation is used is currently undefined, so this situation should be avoided.
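
For instance, a protocol can be extended to a final Java class and to a Java interface in one form. In this sketch the Sizable protocol and its size-of method are hypothetical; java.lang.String and java.util.Collection are standard Java types, and Clojure vectors implement java.util.Collection:

```clojure
;; Hypothetical protocol, for illustration only.
(defprotocol Sizable
  (size-of [x] "Returns a size for x."))

(extend-protocol Sizable
  ;; A concrete Java class we cannot modify.
  java.lang.String
  (size-of [s] (.length s))
  ;; A Java interface: covers every class that implements it.
  java.util.Collection
  (size-of [c] (.size c)))

(size-of "hello")
(size-of [1 2 3])
(size-of (java.util.ArrayList. [1 2]))
```

String does not implement Collection, so the two extensions cannot overlap here.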

Reifying Anonymous Datatypes

Sometimes you need an object that implements certain protocols or interfaces, but you do not want to create a named datatype. Clojure 1.2 supports this with the reify macro:

(reify
  SomeProtocol
    (method-one [this] ...)
    (method-two [this y] ...)
  AnotherProtocol
    (method-three [this] ...))

reify's syntax is very similar to defrecord without the fields vector. Also, like defrecord, reify can extend methods of Java interfaces and java.lang.Object.

Unlike defrecord, the method bodies of reify are lexical closures, like anonymous functions created with fn, so they can capture local variables:

user> (def thing (let [s "Capture me!"]
                    (reify java.lang.Object
                       (toString [this] s))))
#'user/thing
user> (str thing)
"Capture me!"

Many situations that formerly required the use of proxy can be handled with reify. In those cases, reify will be faster and simpler than proxy. However, reify is limited to implementing interfaces; unlike proxy, it cannot override base class methods.

Conceptually, reify fills the same role as anonymous inner classes in Java.
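
A typical use, sketched below, is building a one-off java.util.Comparator where Java code would use an anonymous inner class. The function name by-distance-from is hypothetical. The reified compare method closes over the target argument, and the compare called inside its body is the ordinary clojure.core function:

```clojure
;; Hypothetical helper, for illustration only.
(defn by-distance-from
  "Returns a Comparator that orders numbers by their distance from target."
  [target]
  (reify java.util.Comparator
    (compare [this a b]
      ;; target is captured from the enclosing function's scope.
      (compare (Math/abs (long (- a target)))
               (Math/abs (long (- b target)))))))

;; sort accepts any java.util.Comparator as its first argument.
(sort (by-distance-from 10) [1 12 9 20])
```

With a target of 10 the distances are 9, 2, 1, and 10, so the sorted order is 9, 12, 1, 20.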

Working with Datatypes and Protocols

Datatypes and protocols are a significant new feature in Clojure, and they will have a major impact on how most Clojure programs are written. Standards and best practices are still developing, but a few guidelines have emerged:

  • Prefer reify to proxy unless you need to override base class methods.

  • Prefer defrecord to gen-class unless you need gen-class features for Java interoperability.

  • Prefer defrecord to defstruct in all cases.

  • Specify your abstractions as protocols, not interfaces.

  • Prefer protocols to multimethods for the case of single-argument type-based dispatch.

  • Add type hints only where necessary for disambiguation or performance (Chapter 14); most types will be inferred automatically.

Datatypes and protocols do not remove any existing features: defstruct, gen-class, proxy, and multimethods are all still there. Only defstruct is likely to be deprecated.

The major difference between Java classes and protocols/datatypes is the lack of inheritance. The protocol extension mechanism is designed to enable method reuse without concrete inheritance and its associated problems.

A Complete Example

Here's a version of the classic "payroll" example using protocols and datatypes. Your payroll system will have one method that calculates employees' monthly paychecks based on how many hours they work:

(defprotocol Payroll
  (paycheck [emp hrs]))

Then there are two kinds of employees: "hourly" employees who are paid by the hour and "salaried" employees who are paid a fixed portion of their annual salary each month, regardless of how many hours they work:

(defrecord HourlyEmployee [name rate]
  Payroll
  (paycheck [this hrs] (* rate hrs)))

(defrecord SalariedEmployee [name salary]
  Payroll
  (paycheck [this hrs] (/ salary 12.0)))

Notice that you have not defined an IS-A relationship. There is no "Employee" base type; none is needed. All you have said is: these two types exist, and both support the paycheck method of Payroll.

Now you can define a couple of employees and calculate their paychecks:

user=> (def emp1 (HourlyEmployee. "Devin" 12))
user=> (def emp2 (SalariedEmployee. "Casey" 30000))
user=> (paycheck emp1 105)
1260
user=> (paycheck emp2 120)
2500.0

You might also need to send paychecks to contractors: in that case, the contractor's payment is specified before they start working. This could be another datatype, but you can also implement it using reify:

(defn contract [amount]
  (reify Payroll (paycheck [this hrs] amount)))

A contractor's paycheck is the same no matter how many hours are logged:

user=> (def con1 (contract 5000))
user=> (paycheck con1 80)
5000
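
Because all three kinds of payee support the same protocol, code that consumes them does not need to know which is which. The sketch below repeats the definitions so that it stands alone, using the released syntax with an explicit this argument, and adds a hypothetical total-payroll function that sums paychecks over a collection of [employee hours] pairs:

```clojure
(defprotocol Payroll
  (paycheck [emp hrs]))

(defrecord HourlyEmployee [name rate]
  Payroll
  (paycheck [this hrs] (* rate hrs)))

(defrecord SalariedEmployee [name salary]
  Payroll
  (paycheck [this hrs] (/ salary 12.0)))

(defn contract [amount]
  (reify Payroll (paycheck [this hrs] amount)))

;; Hypothetical helper: not part of the chapter's example.
(defn total-payroll
  "Sums the paychecks for a collection of [employee hours] pairs."
  [timesheet]
  (reduce + (for [[emp hrs] timesheet]
              (paycheck emp hrs))))

(total-payroll [[(HourlyEmployee. "Devin" 12) 105]
                [(SalariedEmployee. "Casey" 30000) 120]
                [(contract 5000) 80]])
```

The three paychecks are 1260, 2500.0, and 5000, so the total is 8760.0.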

Advanced Datatypes

Datatypes defined with defrecord are useful for storing structured data, but fundamentally they always act like maps. If you want to define a completely new type, one that doesn't behave like a map, use the deftype macro instead. deftype is a "lower-level" version of defrecord.

(deftype name [fields...]
  SomeProtocol
    (some-method [this x y] ...)
  SomeInterface
    (aMethod [this] ...))

The syntax is the same as that of defrecord, but deftype will not create any default method implementations for you. You must supply every method implementation yourself, including standard Object methods such as equals and hashCode if you want value-based equality. deftype creates a "bare" Java class; it is intended to allow the redefinition of core data structures, such as vectors or maps, in Clojure itself.
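
The following sketch shows what this means in practice. Money is a hypothetical type with a single field; it supplies its own equals, hashCode, and toString, and because it is not a map, keyword lookup on an instance finds nothing:

```clojure
;; Hypothetical type, for illustration only.
(deftype Money [cents]
  Object
  ;; Value-based equality must be written by hand.
  (equals [this other]
    (and (instance? Money other)
         (= cents (.cents ^Money other))))
  (hashCode [this] (hash cents))
  (toString [this] (str cents " cents")))

(= (Money. 500) (Money. 500))   ; uses the equals defined above
(.cents (Money. 500))           ; fields are read with Java field access
(:cents (Money. 500))           ; nil: a deftype instance is not a map
```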

Summary

Datatypes and protocols are two of the most exciting new features planned for Clojure 1.2. They provide a powerful solution to many of the same problems that object-oriented programming was intended to solve, but without the baggage of implementation inheritance. In fact, datatypes and protocols bear a remarkable similarity to early research in object-oriented design. They elegantly handle the problem of adding new functions to existing types, sometimes called the "expression problem." Because they are built on the Java platform's heavily-optimized method dispatch, they also provide excellent performance.
