For each Marshal
function provided by the standard library’s
encoding/...
packages, there is a corresponding
Unmarshal
function that does decoding.
For example, as we saw in Section 4.5, given a byte slice
containing JSON-encoded data for our Movie
type (§12.3), we can decode it like this:
data := []byte{/* ... */}
var movie Movie
err := json.Unmarshal(data, &movie)
The Unmarshal
function uses reflection to modify the fields of
the existing movie
variable, creating new maps, structs, and
slices as determined by the type Movie
and the content of the
incoming data.
Let’s now implement a simple Unmarshal
function for
S-expressions, analogous to the standard json.Unmarshal
function used above, and the inverse of our earlier
sexpr.Marshal
.
We must caution you that a robust and general implementation requires
substantially more code than will comfortably fit in this example,
which is already long, so we have taken many shortcuts.
We support only a limited subset of S-expressions and do not handle
errors gracefully.
The code is intended to illustrate reflection, not parsing.
The lexer uses the Scanner
type from the text/scanner
package
to break an input stream into a sequence of tokens such as
comments, identifiers, string literals, and numeric literals.
The scanner’s Scan
method advances the scanner and returns the
kind of the next token, which has type rune
.
Most tokens, like '('
, consist of a single rune, but the
text/scanner
package represents the kinds of the
multi-character tokens Ident
, String
, and Int
using small negative values of type rune
.
Following a call to Scan
that returns one of these kinds of
token, the scanner’s TokenText
method returns the text of the
token.
Since a typical parser may need to inspect the current token several
times, but the Scan
method advances the scanner, we wrap the
scanner in a helper type called lexer
that keeps track of the
token most recently returned by Scan
.
type lexer struct {
	scan  scanner.Scanner
	token rune // the current token
}

func (lex *lexer) next()        { lex.token = lex.scan.Scan() }
func (lex *lexer) text() string { return lex.scan.TokenText() }

func (lex *lexer) consume(want rune) {
	if lex.token != want { // NOTE: Not an example of good error handling.
		panic(fmt.Sprintf("got %q, want %q", lex.text(), want))
	}
	lex.next()
}
Now let’s turn to the parser.
It consists of two principal functions.
The first of these, read
, reads the S-expression that starts with the
current token and updates the variable referred to by the
addressable reflect.Value
v
.
func read(lex *lexer, v reflect.Value) {
	switch lex.token {
	case scanner.Ident:
		// The only valid identifiers are
		// "nil" and struct field names.
		if lex.text() == "nil" {
			v.Set(reflect.Zero(v.Type()))
			lex.next()
			return
		}
	case scanner.String:
		s, _ := strconv.Unquote(lex.text()) // NOTE: ignoring errors
		v.SetString(s)
		lex.next()
		return
	case scanner.Int:
		i, _ := strconv.Atoi(lex.text()) // NOTE: ignoring errors
		v.SetInt(int64(i))
		lex.next()
		return
	case '(':
		lex.next()
		readList(lex, v)
		lex.next() // consume ')'
		return
	}
	panic(fmt.Sprintf("unexpected token %q", lex.text()))
}
Our S-expressions use identifiers for two distinct purposes,
struct field names and the nil
value for a pointer.
The read
function only handles the latter case.
When it encounters the scanner.Ident
"nil"
, it sets
v
to the zero value of its type using the reflect.Zero
function.
For any other identifier, it reports an error.
The readList
function, which we’ll see in a moment, handles
identifiers used as struct field names.
A '('
token indicates the start of a list.
The second function, readList
, decodes a list into a variable
of composite type—a map, struct, slice, or array—depending on what
kind of Go variable we’re currently populating.
In each case, the loop keeps parsing items until it encounters the
matching close parenthesis, ')'
, as detected by the
endList
function.
The interesting part is the recursion.
The simplest case is an array.
Until the closing ')'
is seen, we use Index
to obtain the variable for each array element and make a recursive
call to read
to populate it.
As in many other error cases, if the input data causes the decoder to
index beyond the end of the array, the decoder panics.
A similar approach is used for slices, except we must create a new
variable for each element, populate it, then append it to the slice.
The loops for structs and maps must parse a (key value)
sublist
on each iteration.
For structs, the key is a symbol identifying the field.
Analogous to the case for arrays, we obtain the existing variable for
the struct field using FieldByName
and make a recursive call to
populate it.
For maps, the key may be of any type, and analogous to the
case for slices, we create a new variable, recursively populate it,
and finally insert the new key/value pair into the map.
func readList(lex *lexer, v reflect.Value) {
	switch v.Kind() {
	case reflect.Array: // (item ...)
		for i := 0; !endList(lex); i++ {
			read(lex, v.Index(i))
		}

	case reflect.Slice: // (item ...)
		for !endList(lex) {
			item := reflect.New(v.Type().Elem()).Elem()
			read(lex, item)
			v.Set(reflect.Append(v, item))
		}

	case reflect.Struct: // ((name value) ...)
		for !endList(lex) {
			lex.consume('(')
			if lex.token != scanner.Ident {
				panic(fmt.Sprintf("got token %q, want field name", lex.text()))
			}
			name := lex.text()
			lex.next()
			read(lex, v.FieldByName(name))
			lex.consume(')')
		}

	case reflect.Map: // ((key value) ...)
		v.Set(reflect.MakeMap(v.Type()))
		for !endList(lex) {
			lex.consume('(')
			key := reflect.New(v.Type().Key()).Elem()
			read(lex, key)
			value := reflect.New(v.Type().Elem()).Elem()
			read(lex, value)
			v.SetMapIndex(key, value)
			lex.consume(')')
		}

	default:
		panic(fmt.Sprintf("cannot decode list into %v", v.Type()))
	}
}

func endList(lex *lexer) bool {
	switch lex.token {
	case scanner.EOF:
		panic("end of file")
	case ')':
		return true
	}
	return false
}
Finally, we wrap up the parser in an exported function
Unmarshal
, shown below, that hides some of the rough edges of
the implementation.
Errors encountered during parsing result in a panic, so
Unmarshal
uses a deferred call to recover from the panic
(§5.10) and
return an error message instead.
// Unmarshal parses S-expression data and populates the variable
// whose address is in the non-nil pointer out.
func Unmarshal(data []byte, out interface{}) (err error) {
	lex := &lexer{scan: scanner.Scanner{Mode: scanner.GoTokens}}
	lex.scan.Init(bytes.NewReader(data))
	lex.next() // get the first token
	defer func() {
		// NOTE: this is not an example of ideal error handling.
		if x := recover(); x != nil {
			err = fmt.Errorf("error at %s: %v", lex.scan.Position, x)
		}
	}()
	read(lex, reflect.ValueOf(out).Elem())
	return nil
}
A production-quality implementation should never panic for any input
and should report an informative error for every mishap, perhaps with a
line number or offset.
Nonetheless, we hope this example conveys some idea of what's
happening under the hood of packages like encoding/json
,
and how you can use reflection to populate data structures.
Exercise 12.8:
The sexpr.Unmarshal
function, like json.Unmarshal
,
requires the complete input in a byte slice before it can begin
decoding.
Define a sexpr.Decoder
type that, like json.Decoder
,
allows a sequence of values to be decoded from an io.Reader
.
Change sexpr.Unmarshal
to use this new type.
Exercise 12.9:
Write a token-based API for decoding S-expressions, following the
style of xml.Decoder
(§7.14).
You will need five types of tokens: Symbol
, String
,
Int
, StartList
, and EndList
.
Exercise 12.10:
Extend sexpr.Unmarshal
to handle the booleans, floating-point
numbers, and interfaces encoded by your solution to Exercise 12.3.
(Hint: to decode interfaces, you will need a mapping from the name of
each supported type to its reflect.Type
.)