The technique presented in this subsection will be demonstrated by the byWord.go file, which is shown in four parts. As you will see in the Go code, separating the words of a line can be tricky. The first part of this utility is as follows:
package main

import (
	"bufio"
	"flag"
	"fmt"
	"io"
	"os"
	"regexp"
)
The second code portion of byWord.go is shown in the following Go code:
func wordByWord(file string) error {
	var err error
	f, err := os.Open(file)
	if err != nil {
		return err
	}
	defer f.Close()

	r := bufio.NewReader(f)
	for {
		// Read up to and including the next newline character.
		line, err := r.ReadString('\n')
		if err == io.EOF {
			break
		} else if err != nil {
			fmt.Printf("error reading file %s", err)
			return err
		}
This part of the wordByWord() function is the same as the lineByLine() function of the byLine.go utility.
The third part of byWord.go is as follows:
		// Match every run of non-whitespace characters in the line.
		r := regexp.MustCompile("[^\\s]+")
		words := r.FindAllString(line, -1)
		for i := 0; i < len(words); i++ {
			fmt.Println(words[i])
		}
	}
	return nil
}
The remaining code of the wordByWord() function is new: it uses a regular expression to separate the words found in each line of the input. The regular expression in the regexp.MustCompile("[^\\s]+") statement matches one or more consecutive non-whitespace characters, which means that whitespace characters separate one word from another.
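The word-splitting step can be tried in isolation. The following is a minimal sketch, not part of byWord.go: the splitWords() helper and the wordRE variable are hypothetical names used only for illustration, and the sample line is taken from the output shown in this subsection. It compiles the expression once at package level and writes it as a raw string literal, so the backslash does not need to be doubled.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordRE matches one or more consecutive non-whitespace characters.
// The raw string literal `[^\s]+` is equivalent to "[^\\s]+".
var wordRE = regexp.MustCompile(`[^\s]+`)

// splitWords is a hypothetical helper that returns the words of a line.
func splitWords(line string) []string {
	// -1 asks for all matches instead of limiting their number.
	return wordRE.FindAllString(line, -1)
}

func main() {
	for _, w := range splitWords("01/08/18 20:25:09:669 | [INFO]\n") {
		fmt.Println(w)
	}
}
```

Leading, trailing, and repeated whitespace produce no empty words, because only runs of non-whitespace characters are matched.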
The last code segment of byWord.go is as follows:
func main() {
	flag.Parse()
	if len(flag.Args()) == 0 {
		fmt.Printf("usage: byWord <file1> [<file2> ...]\n")
		return
	}

	// Process each file given on the command line.
	for _, file := range flag.Args() {
		err := wordByWord(file)
		if err != nil {
			fmt.Println(err)
		}
	}
}
Executing byWord.go will produce the following type of output:
$ go run byWord.go /tmp/adobegc.log
01/08/18
20:25:09:669
|
[INFO]
You can verify the validity of byWord.go with the help of the wc(1) utility:
$ go run byWord.go /tmp/adobegc.log | wc
   91591   91591  559005
$ wc /tmp/adobegc.log
    4831   91591  583454 /tmp/adobegc.log
As you can see, the number of words that wc(1) reports for the input file (91591) is the same as the number of lines and the number of words in the output of byWord.go, which prints one word per line.