Chapter 10: Designing Useful Domain-Specific Languages

Ruby makes it easy to implement domain-specific languages (DSLs), and many popular libraries offer DSLs to improve their usability.

In this chapter, you'll learn how to design and implement a DSL, which problems are handled well by DSLs, and both the advantages and disadvantages of using DSLs in your libraries.

We will cover the following topics:

  • Designing your DSL
  • Implementing your DSL
  • Learning when to use a DSL

By the end of the chapter, you'll have a better understanding of not only how to design a DSL, but why it may or may not be a good idea to do so.

Technical requirements

In this chapter and all chapters of this book, code given in code blocks is designed to be executed on Ruby 3.0. Many of the code examples will work on earlier versions of Ruby, but not all. The code for this chapter is available online at https://github.com/PacktPublishing/Polished-Ruby-Programming/tree/main/Chapter10.

Designing your DSL

The most important thing to think about when designing a DSL is to focus on how the DSL will be used. Some DSLs are designed to configure a library. Some DSLs are used for making specific changes using the library. Some DSLs exist purely to reduce the verbosity of the code. Sometimes the library exposes a DSL as its only interface, and the library and DSL are basically the same thing. Let's focus first on DSLs designed for configuring a library.

Configuration DSLs

DSLs designed to configure libraries are often referred to as configuration DSLs. They are usually initiated from a singleton method on the library's main module or class, typically named configure. RSpec, a popular Ruby library for testing, uses a configuration DSL like this:

RSpec.configure do |c|

  c.drb = true

  c.drb_port = 24601

  c.around do |spec|

    DB.transaction(rollback: :always, &spec)

  end

end

RSpec uses this DSL to configure itself. It passes in a configuration object, and you call methods on the configuration object to configure the library, in this case setting it to use drb (short for distributed Ruby, a standard library) on port 24601. It also calls the around configuration method with a block, which is yielded a proc (named spec in this example), which is passed as a block to DB.transaction for wrapping the entire test case in a database transaction that is always rolled back.

This type of configuration DSL is very helpful because it gives the user a single place to look when configuring the library. Without it, the user would have to search through all of the RSpec documentation to determine how to configure settings such as the following:

RSpec::Core::DRbRunner.new(port: 24601)

RSpec::Core::Hooks.register(:prepend, :around) do |spec|

  DB.transaction(rollback: :always, &spec)

end

The user now has a single place to look, the configuration DSL documentation, to determine all of the supported ways to configure the library. This makes configuration much easier. If your library has significant configuration options, strongly consider adding a configuration DSL for it. You don't necessarily have to use a separate method that takes a block.

For many libraries, the DSL can be as simple as singleton methods you can call on the library's main module. For example, if RSpec used this approach, a possible configuration DSL would be the following:

module RSpec

  self.drb = true

  self.drb_port = 24601

  around do |spec|

    DB.transaction(rollback: :always, &spec)

  end

end

This simpler approach has some disadvantages compared to the block-based approach, though. First, you need to know whether RSpec is a module or class because using module RSpec when RSpec is a class will result in a TypeError exception being raised. Second, writer methods (methods ending in =) are more awkward to call with this approach since they require an explicit self.

Suppose that you forget the explicit self, and you do the following:

module RSpec

  drb = true

  drb_port = 24601

end

Then you end up defining unused local variables, and this has no effect on the library. The general principle here is to avoid writer methods in cases where you will naturally call them on self. In these cases, it may be better to offer aliases such as set_drb and set_drb_port:

module RSpec

  set_drb true

  set_drb_port 24601

end
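The difference between the two forms can be checked with a small sketch. ToyLib is a hypothetical module used only for illustration, and the set_ methods are assumed aliases of the kind described above, not part of any real library:

```ruby
# Hypothetical library module, used only to illustrate the pitfall.
module ToyLib
  class << self
    attr_accessor :drb, :drb_port

    # Non-writer alias that reads naturally with an implicit receiver.
    def set_drb(value)
      self.drb = value
    end

    def set_drb_port(value)
      self.drb_port = value
    end
  end
end

module ToyLib
  drb = true          # assigns a local variable; ToyLib.drb is untouched
  set_drb_port 24601  # a method call on self, so the setting is stored
end

ToyLib.drb      # => nil
ToyLib.drb_port # => 24601
```

Because Ruby parses drb = true as a local variable assignment, only the set_drb_port call actually reaches the module.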

An alternative to this is having multipurpose drb and drb_port methods, which when called without arguments act as reader methods, but when called with one argument act as writer methods. Refer to the following code block:

module RSpec

  drb true       # Set the value

  drb_port 24601 # Set the value

end

RSpec.drb

# => true

RSpec.drb_port

# => 24601

Any of these approaches for configuration will work fine. The important principle is to make sure that you have a simple and well-documented way to configure your library, assuming your library is complex enough to require configuration.
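One possible way to write the multipurpose reader/writer methods is sketched here. OtherLib and the UNSET sentinel are assumptions made for this example; the sentinel is there so that nil and false can still be passed as values to set:

```ruby
# Hypothetical library module; not how RSpec itself is implemented.
module OtherLib
  # Sentinel default, so that nil and false remain valid values to set.
  UNSET = Object.new.freeze

  class << self
    # Reader when called without arguments, writer when given one.
    def drb(value = UNSET)
      return @drb if UNSET.equal?(value)
      @drb = value
    end

    def drb_port(value = UNSET)
      return @drb_port if UNSET.equal?(value)
      @drb_port = value
    end
  end
end

module OtherLib
  drb true       # Set the value
  drb_port 24601 # Set the value
end

OtherLib.drb      # => true
OtherLib.drb_port # => 24601
```

Using a plain nil default instead of a sentinel would be shorter, but then calling drb(nil) could not be distinguished from calling drb with no arguments.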

In this section, you learned about DSLs for configuring a library, using a real-world example from RSpec. In the next section, you'll learn about DSLs for making complex changes using a library.

DSLs for making specific changes

For libraries that need to make complex changes atomically, there are three common approaches. The first is passing arrays or hashes or some nesting of the arrays or hashes to a single method, often with keyword arguments to influence the command. Refer to the following code:

Foo.process_bars(

  [:bar1, :baz2, 3, {quux: 1}],

  [:bar2, :baz4, 5],

  # ...

  skip_check: ->(bar){bar.number == 5},

  generate_names: true

)

This type of API is often hard for users to use. Manually making sure each of the arrays or hashes being passed in is the right format can be challenging by itself. It's best to avoid defining methods that require users to pass many complex objects if you can, as such methods are more difficult for users to use correctly.

Another approach is creating objects and individually attaching them to a command object, which is passed in. You often see this pattern in less powerful and expressive languages, where objects are explicitly instantiated and then passed to methods:

bar1 = Bar.new(:bar1, :baz2, 3, quux: 1)

bar2 = Bar.new(:bar2, :baz4, 5)

command = ProcessBarCommand.new

command.add_bar(bar1)

command.add_bar(bar2)

# ...

command.skip_check{|bar| bar.number == 5}

command.generate_names = true

Foo.process_bars(command)

This approach is better than the previous approach in most cases, as it is easier for users to use. However, it is a bit verbose, and not idiomatic Ruby. For this type of command, an idiomatic approach in Ruby would be to use a DSL inside a block, such as the following code:

Foo.process_bars do |c|

  c.bar(:bar1, :baz2, 3, quux: 1)

  c.bar(:bar2, :baz4, 5)

  # ...

  c.skip_check{|bar| bar.number == 5}

  c.generate_names = true

end

This retains the benefits of the command object approach but decreases the verbosity. Unlike the command object approach, it contains the logic for the command processing inside the block, which is an important conceptual difference. It also makes things easier for the user, as the user doesn't need to reference other constants manually; they just need to call methods on the yielded object.

Note that there are cases when the command object approach is probably better, and that is when you are passing the object to multiple separate methods. While you can pass blocks to methods using the & operator, it's probably not a good general approach, because the block will get evaluated separately by each method. With the command object approach, the command can be self-contained and you do not need to recreate the command every time you are calling a method. When using the command object approach, it is often a good idea for the command object initializer to use a DSL, shown as follows:

command = ProcessBarCommand.new do |c|

  c.bar(:bar1, :baz2, 3, quux: 1)

  c.bar(:bar2, :baz4, 5)

  # ...

  c.skip_check{|bar| bar.number == 5}

  c.generate_names = true

end

Foo.process_bars(command)
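To see why a command object suits the multiple-method case, consider the following minimal sketch. BarCommand and the two lambdas are hypothetical names, and for brevity the initializer yields the command itself rather than a separate DSL object:

```ruby
# Hypothetical command class whose initializer accepts a DSL block.
class BarCommand
  attr_reader :bars
  attr_accessor :generate_names

  def initialize
    @bars = []
    @generate_names = false
    yield self if block_given?
  end

  def bar(*args, **opts)
    @bars << [args, opts]
  end
end

evaluations = 0
command = BarCommand.new do |c|
  evaluations += 1 # count how often the block runs
  c.bar(:bar1, :baz2, 3, quux: 1)
  c.bar(:bar2, :baz4, 5)
  c.generate_names = true
end

# Two hypothetical consumers that both receive the same command object.
count_bars = ->(cmd) { cmd.bars.size }
names_wanted = ->(cmd) { cmd.generate_names }

count_bars.call(command)   # => 2
names_wanted.call(command) # => true
evaluations                # => 1
```

The block runs once, when the command is built, no matter how many methods later receive the command.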

With some extra work, you can have your library support the same configuration block both directly passed to Foo.process_bars and when using the command object approach with ProcessBarCommand.new. This gives you the best of both worlds. You'll learn how to implement this technique in a later section.

In this section, you learned about DSLs for making complex changes in a library. In the next section, you'll learn about using DSLs to reduce the verbosity of code.

DSLs for reducing the verbosity of code

Sequel, a popular database library for Ruby, uses a DSL designed purely for reducing the verbosity of code. If you want to express an inequality condition in one of your database queries, you can use a long approach such as the following:

DB[:table].where(Sequel[:column] > 11)

# generates SQL: SELECT * FROM table WHERE (column > 11)

In this case, Sequel[:column] returns an object representing the SQL identifier, which supports the > method. This type of usage occurs very often in Sequel, so often that it is desirable to have a shortcut. Sequel has multiple shortcuts for this, but the one enabled by default uses a simple DSL:

DB[:table].where{column > 11}

This uses an instance_exec DSL, where methods inside the block are called on an object different than the object outside the block. Inside the block, methods called without an explicit receiver return Sequel identifier objects, so column inside the block is basically translated to Sequel[:column] (which itself is a shortcut for Sequel::SQL::Identifier.new(:column)). One issue with this approach is that if users are not familiar with the method and do not know the block is executed using instance_exec, they may do something like the following:

@some_var = 10

DB[:table].where{column > @some_var}

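This doesn't work because the block is evaluated in the context of a different object, so @some_var inside the block refers to an instance variable of that object (which is unset, and therefore nil) rather than the one set in the surrounding scope. The need to reference methods or instance variables in the surrounding scope is common enough that the DSL also supports this approach, by yielding an object instead of using instance_exec if the block accepts an object: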

@some_var = 10

DB[:table].where{|o| o.column > @some_var}
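The effect of instance_exec on instance variables can be reproduced without Sequel. In this sketch, Context, Runner, and Report are hypothetical names chosen for illustration:

```ruby
# Hypothetical context object; it has no instance variables of its own.
class Context
  def limit
    100
  end
end

module Runner
  # Runs the block with self changed to a Context instance.
  def self.with_instance_exec(&block)
    Context.new.instance_exec(&block)
  end

  # Runs the block with self unchanged, passing the Context as an argument.
  def self.with_yield
    yield Context.new
  end
end

class Report
  def initialize
    @some_var = 10
  end

  def via_instance_exec
    # Here @some_var is looked up on the Context instance, so it is nil.
    Runner.with_instance_exec { [limit, @some_var] }
  end

  def via_yield
    # Here self is still the Report, so @some_var is 10.
    Runner.with_yield { |o| [o.limit, @some_var] }
  end
end

Report.new.via_instance_exec # => [100, nil]
Report.new.via_yield         # => [100, 10]
```

Local variables are unaffected by instance_exec, since blocks are closures; only self, and therefore instance variables and implicit-receiver method calls, changes.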

In this section, you learned about DSLs designed to reduce code verbosity, using a real-world example from Sequel. In the next section, you'll learn about libraries implemented purely as DSLs.

Libraries implemented as DSLs

Some libraries are implemented purely as DSLs, in that the expected use of the library is only via the DSL, and you as a user are never expected to manually create the library's objects. One library designed like this is minitest/spec, which is another popular Ruby library for testing.

With minitest/spec, all use of the library is via a DSL. You use describe to open a block for test examples. Inside the block, before is used for the code run before every example, and after for the code run after every example. You use the it method to define test examples. Notice that in the following example, you never create any minitest objects:

require 'minitest/autorun'

describe Class do

  before do

    # setup code

  end

  

  after do

    # teardown code

  end

  it "should allow creating classes via .new" do

    _(Class.new).must_be_kind_of Class

  end

end

Another library implemented as a DSL is Sinatra, which was the first Ruby web framework showing you could implement a web application in a few lines of code, and an inspiration for many minimal web frameworks in Ruby and other languages. With Sinatra, after requiring the library, you can directly call methods to handle HTTP requests. This simple web application will return Index page for GET requests to the root of the application and File Not Found for all other requests, as shown here:

require 'sinatra'

get "/" do

  "Index page"

end

not_found do

  "File Not Found"

end

This type of DSL is not for every library. It is best left for specific environments, such as testing for minitest/spec, or for simple cases, such as only handling a few routes in Sinatra. For both minitest and Sinatra, there is an alternative API that is not a pure DSL, where classes are created in the standard Ruby way.

In this section, you learned about designing different types of DSLs. In the next section, you'll learn how to implement the DSLs you learned about in this section.

Implementing your DSL

One of the best aspects of Ruby is how easy Ruby makes it to implement a DSL. After programmer friendliness, probably the main reason you see so many DSLs in Ruby is the simplicity of implementation. There are a few different DSL types you learned about in the previous sections, and you'll learn how to implement each in this section.

The first type is the most basic type, where the DSL method accepts a block and yields an object to it, and you call methods on the yielded object. For example, the RSpec configuration example could be implemented as follows:

def RSpec.configure

  yield RSpec::Core::Configuration.new

end

In this case, the configuration is global and always affects the RSpec constant, so the RSpec::Core::Configuration instance may not even need a reference to the receiver.
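For a library of your own, a self-contained version of this pattern might look like the following sketch. TinyLib and its Configuration class are hypothetical, and memoizing a single configuration object is an assumption made here so that settings persist after the block returns:

```ruby
# Hypothetical library with a block-based configuration DSL.
module TinyLib
  class Configuration
    attr_accessor :drb, :drb_port
    attr_reader :around_hooks

    def initialize
      @drb = false
      @drb_port = nil
      @around_hooks = []
    end

    # Stores the hook; a real library would run it around each example.
    def around(&block)
      @around_hooks << block
    end
  end

  # One shared configuration object, so settings persist between calls.
  def self.configuration
    @configuration ||= Configuration.new
  end

  def self.configure
    yield configuration
  end
end

TinyLib.configure do |c|
  c.drb = true
  c.drb_port = 24601
  c.around { |spec| spec.call }
end

TinyLib.configuration.drb_port          # => 24601
TinyLib.configuration.around_hooks.size # => 1
```

Since the yielded object is an ordinary local inside the block, writer methods such as c.drb = true work without the explicit self problem described earlier.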

For the Foo.process_bars example given previously, assuming the ProcessBarCommand uses the add_bar method and the DSL uses the simpler bar method, you need to implement a wrapper object specific to the DSL. Often the name of this object has DSL in it. Since the skip_check and generate_names methods are the same in both cases, you can cheat and use method_missing, though it is often better to define actual methods, as you learned in Chapter 9, Metaprogramming and When to Use It. In this example, we'll use the method_missing shortcut:

class ProcessBarDSL

  def initialize(command)

    @command = command

  end

  def bar(...)

    @command.add_bar(...)

  end

  def method_missing(...)

    @command.send(...)

  end

end

With the ProcessBarDSL class created, you can implement Foo.process_bars by creating the ProcessBarCommand object, and yielding it wrapped in the ProcessBarDSL instance. After the block completes processing, you can implement the internal processing of the bars by calling a private internal method, here named handle_bar_processing:

def Foo.process_bars

  command = ProcessBarCommand.new

  yield ProcessBarDSL.new(command)

  handle_bar_processing(command)

end

If you want to support an API where you can either pass a block to Foo.process_bars or pass an already created ProcessBarCommand object, that is also easy to implement. Refer to the following code block:

def Foo.process_bars(command=nil)

  unless command

    command = ProcessBarCommand.new

    yield ProcessBarDSL.new(command)

  end

  handle_bar_processing(command)

end
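Putting the pieces together, an end-to-end sketch that supports both calling styles could look like this. The bodies of ProcessBarCommand and handle_bar_processing are hypothetical stand-ins, the DSL uses an explicit delegating method rather than method_missing, and having the command initializer accept the same block is an assumption of this sketch. Unlike the version above, calling Foo.process_bars with neither a command nor a block processes an empty command instead of raising:

```ruby
# Hypothetical command class; only what the sketch needs is defined.
class ProcessBarCommand
  attr_reader :bars
  attr_accessor :generate_names

  def initialize
    @bars = []
    @generate_names = false
    # Accept the same DSL block that Foo.process_bars accepts.
    yield ProcessBarDSL.new(self) if block_given?
  end

  def add_bar(*args, **opts)
    @bars << [args, opts]
  end
end

class ProcessBarDSL
  def initialize(command)
    @command = command
  end

  def bar(...)
    @command.add_bar(...)
  end

  # Explicit delegation, used here instead of method_missing.
  def generate_names=(value)
    @command.generate_names = value
  end
end

module Foo
  def self.process_bars(command = nil, &block)
    command ||= ProcessBarCommand.new(&block)
    handle_bar_processing(command)
  end

  # Hypothetical stand-in: returns the first argument of each bar.
  def self.handle_bar_processing(command)
    command.bars.map { |args, _opts| args.first }
  end
  private_class_method :handle_bar_processing
end

setup = proc do |c|
  c.bar(:bar1, :baz2, 3, quux: 1)
  c.bar(:bar2, :baz4, 5)
  c.generate_names = true
end

Foo.process_bars(&setup)                        # => [:bar1, :bar2]
Foo.process_bars(ProcessBarCommand.new(&setup)) # => [:bar1, :bar2]
```

Both entry points end up with the same kind of command object, so the processing code has only one case to handle.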

For the Sequel example with the where method, because it allows both the instance_exec approach and the block argument approach, it's slightly tricky. You need to check the arity of the block, and if the block has arity of 1, then the block expects an argument, and you yield the object to it. If the block does not have arity of 1, the block doesn't expect an argument, and you evaluate the block in the context of the object with instance_exec. Refer to the following code:

def where(&block)

  cond = if block.arity == 1

    yield Sequel::VIRTUAL_ROW

  else

    Sequel::VIRTUAL_ROW.instance_exec(&block)

  end

  add_where(cond)

end

The Sequel::VIRTUAL_ROW object uses a method_missing approach since all methods are treated as column names. Simplified, it is similar to the following code, though the actual implementation is more complex as it also supports creating a SQL function object if arguments are passed:

Sequel::VIRTUAL_ROW = Class.new(BasicObject) do

  def method_missing(meth)

    Sequel::SQL::Identifier.new(meth)

  end

end.new
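The arity check and the method_missing object can be combined into a runnable sketch that does not depend on Sequel. Everything in the MiniQuery module is hypothetical; in particular, Identifier merely builds a string, whereas Sequel builds expression objects:

```ruby
module MiniQuery
  # Hypothetical stand-in for Sequel::SQL::Identifier.
  Identifier = Struct.new(:name) do
    def >(other)
      "(#{name} > #{other.inspect})"
    end
  end

  # Every method called without arguments is treated as a column name.
  VIRTUAL_ROW = Class.new(BasicObject) do
    def method_missing(meth)
      ::MiniQuery::Identifier.new(meth)
    end
  end.new

  class Dataset
    attr_reader :conditions

    def initialize
      @conditions = []
    end

    def where(&block)
      cond = if block.arity == 1
        # The block takes an argument: yield the object, keep self.
        yield VIRTUAL_ROW
      else
        # No argument: evaluate the block with the object as self.
        VIRTUAL_ROW.instance_exec(&block)
      end
      @conditions << cond
      self
    end
  end
end

ds = MiniQuery::Dataset.new
ds.where { column > 11 }
ds.where { |o| o.column > 11 }
ds.conditions # => ["(column > 11)", "(column > 11)"]
```

Subclassing BasicObject keeps the number of predefined methods small, so that names such as name or class are also handled by method_missing.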

In the minitest/spec example, the describe method is added to Kernel. It creates a class, sets a name for the class based on the argument, and passes the block given to class_eval. Simplified, it looks as follows:

module Kernel

  def describe(name, *, &block)

    klass = Class.new(Minitest::Spec)

    klass.name = name # simplified; assumes a class-level name= writer, which plain Ruby classes do not provide

    klass.class_eval(&block)

    klass

  end

end

The before and after methods inside the describe block both define methods. before defines setup and after defines teardown. Simplified, they could be implemented by code as follows:

class Minitest::Spec

  def self.before(&block)

    define_method(:setup, &block)

  end

  def self.after(&block)

    define_method(:teardown, &block)

  end

end

The it method is similar, but the method it defines starts with test, and includes the description given. It also includes an incremented number so that two specs with the same description end up defining different test methods. It's a very common mistake to copy an existing test, modify the copy to test an additional feature, and forget to change the name. With a manual test name definition, that results in the second test method overriding the first. This can be caught if running tests in verbose warning mode (the ruby -w switch), as in that case Ruby will emit method redefinition warnings, but otherwise, it is easy to miss and results in you not testing everything you think you are testing.

Simplified, the it method could be implemented with an approach such as the following:

class Minitest::Spec

  def self.it(description, &b)

    @num_specs ||= 0

    @num_specs += 1

    define_method("test_#{@num_specs}_#{description}", &b)

  end

end
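The describe, before, and it techniques can be tried out together without minitest. In the following sketch, MiniSpec is a hypothetical base class, describe is a class method rather than a Kernel method, and run is a deliberately naive runner added only so the result can be observed:

```ruby
# Hypothetical base class standing in for Minitest::Spec.
class MiniSpec
  def self.describe(name, &block)
    klass = Class.new(self)
    klass.define_singleton_method(:name) { name.to_s }
    klass.class_eval(&block)
    klass
  end

  def self.before(&block)
    define_method(:setup, &block)
  end

  def self.it(description, &block)
    @num_specs ||= 0
    @num_specs += 1
    define_method("test_#{@num_specs}_#{description}", &block)
  end

  # Naive runner: calls each test method on a fresh instance.
  def self.run
    test_names = instance_methods(false).grep(/\Atest_/).sort
    test_names.to_h do |test_name|
      spec = new
      spec.setup if spec.respond_to?(:setup)
      [test_name, spec.public_send(test_name)]
    end
  end
end

spec = MiniSpec.describe("Array") do
  before { @list = [1, 2] }
  it("has a size") { @list.size }
  it("has a size") { @list.size * 10 } # copied spec, description unchanged
end

spec.name       # => "Array"
spec.run.values # => [2, 20]
```

Both specs run even though they share a description, because the counter makes the generated method names distinct.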

One issue with the minitest/spec implementation of describe is that it adds the method to Kernel, so it ends up being a method on every object. You can call it inside other classes and methods. This adds to the flexibility, and it's probably a good choice for minitest/spec, but it may not be the best decision for DSLs in general.

The Sinatra DSL works differently. It doesn't want to define methods such as get and not_found on every object, but it still wants you to be able to call them at the top level, outside of any classes and methods. It does this by calling extend in the top-level scope with a module. The top-level scope runs in the context of an object called main, and just like any other object, if you extend main with a module, the methods in the module are only added to main and not any other object. A simplified version of the Sinatra DSL is similar to the following:

module Sinatra::Delegator

  meths = %i[get not_found] # ...

  meths.each do |meth|

    define_method(meth) do |*args, &block|

      Sinatra::Application.send(meth, *args, &block)

    end

  end

end

extend Sinatra::Delegator
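The same idea can be demonstrated without Sinatra. In this sketch, MiniApp and MiniDelegator are hypothetical, and TOPLEVEL_BINDING.receiver is used to refer to main explicitly; at the top level of a script, this has the same effect as a bare extend call:

```ruby
# Hypothetical application class standing in for Sinatra::Application.
class MiniApp
  def self.routes
    @routes ||= {}
  end

  def self.get(path, &block)
    routes[["GET", path]] = block
  end

  def self.not_found(&block)
    routes[:not_found] = block
  end

  # Looks up a handler for the request, falling back to not_found.
  def self.call(verb, path)
    handler = routes[[verb, path]] || routes.fetch(:not_found)
    handler.call
  end
end

module MiniDelegator
  %i[get not_found].each do |meth|
    define_method(meth) do |*args, &block|
      MiniApp.public_send(meth, *args, &block)
    end
  end
end

# Add the DSL methods to main only, not to every object.
TOPLEVEL_BINDING.receiver.extend(MiniDelegator)

get("/") { "Index page" }
not_found { "File Not Found" }

MiniApp.call("GET", "/")        # => "Index page"
MiniApp.call("GET", "/missing") # => "File Not Found"
Object.new.respond_to?(:get)    # => false
```

Because extend only adds singleton methods to main, code inside your own classes and methods cannot call get or not_found by accident.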

In this section, you've learned the basics of implementing a variety of different types of DSLs. In the next section, you'll learn about which use cases lend themselves to DSL usage, and which use cases don't.

Learning when to use a DSL

There are some use cases in Ruby where using a DSL makes a lot of sense, and other cases where using a DSL increases complexity and makes the code worse instead of better. The best cases for DSL use in Ruby are where using the DSL makes the library easier to maintain and makes it simpler for a user to use the library. If you find yourself in that situation, then a DSL definitely sounds like the right choice. However, in most cases, a DSL is a trade-off.

In most cases, you design a DSL to make things easier in some way for the user, but it makes the internals more complex and makes your job as the maintainer of the library more difficult. It is possible but less likely for the opposite to be true, where you design a DSL to make your life as a maintainer easier, but the DSL makes the use of the library more difficult.

Of the DSL examples given previously, the RSpec configuration example may be an example of the best case for a DSL. It definitely makes it easier for the user to configure the library since they only need to look in one spot for configuration. Implementation of the DSL is fairly simple, and having all configurations run through a single configuration object may make it easier to maintain the library.

For the Foo.process_bars example, the DSL is definitely more idiomatic Ruby code, and likely to be easier for the user to use than the alternatives. In this case, it definitely adds maintenance work, since it requires creating a class specifically for the DSL. However, the DSL should be reasonably easy to maintain, so it's probably a good trade-off.

For the Sequel example with the where method that takes a block and either yields an object or uses instance_exec, it's definitely questionable whether the benefits outweigh the costs. This DSL only saves a little bit of typing for the user, and the fact that it can yield an object or use instance_exec is often a source of confusion, especially for users not familiar with the library. In general, using instance_exec for short blocks often results in user confusion, since most Ruby programmers are used to calling methods with blocks and using instance variables of the surrounding scope inside the blocks, and breaking that is often a bad idea.

With regard to the Sequel virtual row DSL, the DSL was designed back when the alternative approach was much more verbose (Sequel::SQL::Identifier.new(:column) > 11) than the current alternative approach (Sequel[:column] > 11), so the benefit of the DSL was higher back then than it is now. However, since the DSL is now widely used, it must continue to be supported. The principle to remember here is that you will often need to support any DSL for a long time, so implementing a DSL just to reduce code verbosity is often a bad idea. Try hard to think of alternative approaches to using a DSL if you are using it just to reduce code verbosity.

For minitest/spec, the benefit of using the DSL is huge. For basic usage, you don't need to know about any minitest-specific classes; you only need to know about four methods: describe, before, after, and it. This greatly simplifies the interface for the user and is one reason minitest/spec is such a pleasure to use. This does have an implementation cost, as minitest/spec has extra complexity on top of minitest itself, so there is a significant amount of maintenance involved. However, this is another case where the benefit outweighs the cost.

In the Sinatra case, the DSL is really what showed how simple web applications could be if you focused only on what was absolutely necessary to implement them. The actual DSL implementation in terms of extending main doesn't add much maintenance effort, so it is also a case where the benefit outweighs the cost.

As you have learned, there are some situations where implementing a DSL can be useful, and other situations where implementing a DSL can make a library worse.

Summary

In this chapter, you've learned that focusing on how the DSL will be used is the key to designing a good DSL. You've seen many examples of DSL usage from a user perspective, and how to implement each of these possible DSLs. Finally, you've learned when it is a good idea to implement a DSL, and when it may not be a good idea. With all you've learned in this chapter, you are better able to decide when a DSL makes sense for your library, and if you decide it does make sense, how better to design and implement it.

In the next chapter, you'll learn all about testing your Ruby code.

Questions

  1. What is the main advantage of using a DSL for configuring libraries?
  2. How can you implement a DSL that works both as a normal block DSL and an instance_exec DSL?
  3. Of the various reasons given in this chapter for using a DSL, which is the most likely to cause problems for the user and the least likely to add value?