12.6 Example: Decoding S-Expressions

For each Marshal function provided by the standard library’s encoding/... packages, there is a corresponding Unmarshal function that does decoding. For example, as we saw in Section 4.5, given a byte slice containing JSON-encoded data for our Movie type (§12.3), we can decode it like this:

data := []byte{/* ... */}
var movie Movie
err := json.Unmarshal(data, &movie)

The Unmarshal function uses reflection to modify the fields of the existing movie variable, creating new maps, structs, and slices as determined by the type Movie and the content of the incoming data.

Let’s now implement a simple Unmarshal function for S-expressions, analogous to the standard json.Unmarshal function used above, and the inverse of our earlier sexpr.Marshal. We must caution you that a robust and general implementation requires substantially more code than will comfortably fit in this example, which is already long, so we have taken many shortcuts. We support only a limited subset of S-expressions and do not handle errors gracefully. The code is intended to illustrate reflection, not parsing.

The lexer uses the Scanner type from the text/scanner package to break an input stream into a sequence of tokens such as comments, identifiers, string literals, and numeric literals. The scanner’s Scan method advances the scanner and returns the kind of the next token, which has type rune. Most tokens, like '(', consist of a single rune, but the text/scanner package represents the kinds of the multi-character tokens Ident, String, and Int using small negative values of type rune. Following a call to Scan that returns one of these kinds of token, the scanner’s TokenText method returns the text of the token.

Since a typical parser may need to inspect the current token several times, but the Scan method advances the scanner, we wrap the scanner in a helper type called lexer that keeps track of the token most recently returned by Scan.

gopl.io/ch12/sexpr
type lexer struct {
    scan  scanner.Scanner
    token rune // the current token
}

func (lex *lexer) next()        { lex.token = lex.scan.Scan() }
func (lex *lexer) text() string { return lex.scan.TokenText() }

func (lex *lexer) consume(want rune) {
    if lex.token != want { // NOTE: Not an example of good error handling.
        panic(fmt.Sprintf("got %q, want %q", lex.text(), want))
    }
    lex.next()
}

Now let’s turn to the parser. It consists of two principal functions. The first of these, read, reads the S-expression that starts with the current token and updates the variable referred to by the addressable reflect.Value v.

func read(lex *lexer, v reflect.Value) {
    switch lex.token {
    case scanner.Ident:
        // The only valid identifiers are
        // "nil" and struct field names.
        if lex.text() == "nil" {
            v.Set(reflect.Zero(v.Type()))
            lex.next()
            return
        }
    case scanner.String:
        s, _ := strconv.Unquote(lex.text()) // NOTE: ignoring errors
        v.SetString(s)
        lex.next()
        return
    case scanner.Int:
        i, _ := strconv.Atoi(lex.text()) // NOTE: ignoring errors
        v.SetInt(int64(i))
        lex.next()
        return
    case '(':
        lex.next()
        readList(lex, v)
        lex.next() // consume ')'
        return
    }
    panic(fmt.Sprintf("unexpected token %q", lex.text()))
}

Our S-expressions use identifiers for two distinct purposes, struct field names and the nil value for a pointer. The read function only handles the latter case. When it encounters the scanner.Ident "nil", it sets v to the zero value of its type using the reflect.Zero function. For any other identifier, it reports an error. The readList function, which we’ll see in a moment, handles identifiers used as struct field names.

A '(' token indicates the start of a list. The second function, readList, decodes a list into a variable of composite type—a map, struct, slice, or array—depending on what kind of Go variable we’re currently populating. In each case, the loop keeps parsing items until it encounters the matching close parenthesis, ')', as detected by the endList function.

The interesting part is the recursion. The simplest case is an array. Until the closing ')' is seen, we use Index to obtain the variable for each array element and make a recursive call to read to populate it. As in many other error cases, if the input data causes the decoder to index beyond the end of the array, the decoder panics. A similar approach is used for slices, except we must create a new variable for each element, populate it, then append it to the slice.

The loops for structs and maps must parse a (key value) sublist on each iteration. For structs, the key is a symbol identifying the field. Analogous to the case for arrays, we obtain the existing variable for the struct field using FieldByName and make a recursive call to populate it. For maps, the key may be of any type, and analogous to the case for slices, we create a new variable, recursively populate it, and finally insert the new key/value pair into the map.

func readList(lex *lexer, v reflect.Value) {
    switch v.Kind() {
    case reflect.Array: // (item ...)
        for i := 0; !endList(lex); i++ {
            read(lex, v.Index(i))
        }

    case reflect.Slice: // (item ...)
        for !endList(lex) {
            item := reflect.New(v.Type().Elem()).Elem()
            read(lex, item)
            v.Set(reflect.Append(v, item))
        }

    case reflect.Struct: // ((name value) ...)
        for !endList(lex) {
            lex.consume('(')
            if lex.token != scanner.Ident {
                panic(fmt.Sprintf("got token %q, want field name", lex.text()))
            }
            name := lex.text()
            lex.next()
            read(lex, v.FieldByName(name))
            lex.consume(')')
        }

    case reflect.Map: // ((key value) ...)
        v.Set(reflect.MakeMap(v.Type()))
        for !endList(lex) {
            lex.consume('(')
            key := reflect.New(v.Type().Key()).Elem()
            read(lex, key)
            value := reflect.New(v.Type().Elem()).Elem()
            read(lex, value)
            v.SetMapIndex(key, value)
            lex.consume(')')
        }

    default:
        panic(fmt.Sprintf("cannot decode list into %v", v.Type()))
    }
}

func endList(lex *lexer) bool {
    switch lex.token {
    case scanner.EOF:
        panic("end of file")
    case ')':
        return true
    }
    return false
}

Finally, we wrap up the parser in an exported function Unmarshal, shown below, that hides some of the rough edges of the implementation. Errors encountered during parsing result in a panic, so Unmarshal uses a deferred call to recover from the panic (§5.10) and return an error message instead.

// Unmarshal parses S-expression data and populates the variable
// whose address is in the non-nil pointer out.
func Unmarshal(data []byte, out interface{}) (err error) {
    lex := &lexer{scan: scanner.Scanner{Mode: scanner.GoTokens}}
    lex.scan.Init(bytes.NewReader(data))
    lex.next() // get the first token
    defer func() {
        // NOTE: this is not an example of ideal error handling.
        if x := recover(); x != nil {
            err = fmt.Errorf("error at %s: %v", lex.scan.Position, x)
        }
    }()
    read(lex, reflect.ValueOf(out).Elem())
    return nil
}

A production-quality implementation should never panic for any input and should report an informative error for every mishap, perhaps with a line number or offset. Nonetheless, we hope this example conveys some idea of what’s happening under the hood of the packages like encoding/json, and how you can use reflection to populate data structures.

Exercise 12.8: The sexpr.Unmarshal function, like json.UnMarshal, requires the complete input in a byte slice before it can begin decoding. Define a sexpr.Decoder type that, like json.Decoder, allows a sequence of values to be decoded from an io.Reader. Change sexpr.Unmarshal to use this new type.

Exercise 12.9: Write a token-based API for decoding S-expressions, following the style of xml.Decoder (§7.14). You will need five types of tokens: Symbol, String, Int, StartList, and EndList.

Exercise 12.10: Extend sexpr.Unmarshal to handle the booleans, floating-point numbers, and interfaces encoded by your solution to Exercise 12.3. (Hint: to decode interfaces, you will need a mapping from the name of each supported type to its reflect.Type.)

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.139.82.23