12.3.2. Defining the Query Program Classes

We’ll start by defining our TextQuery class. The user will create objects of this class by supplying an ifstream from which to read the input file. This class also provides the query operation that will take a string and return a QueryResult representing the lines on which that string appears.

The data members of the class have to take into account the intended sharing with QueryResult objects. The QueryResult class will share the vector representing the input file and the sets that hold the line numbers associated with each word in the input. Hence, our class has two data members: a shared_ptr to a dynamically allocated vector that holds the input file, and a map from string to shared_ptr<set>. The map associates each word in the file with a dynamically allocated set that holds the line numbers on which that word appears.
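
The sharing described here rests on how shared_ptr counts its owners. The following is a minimal sketch, not part of the query program; the helper names owners_while_shared and survives_original are made up for illustration. It shows two pointers owning one vector, and the vector outliving the pointer that first allocated it.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical helper: reports how many shared_ptrs own the vector
// while a second owner (standing in for a QueryResult) is alive.
long owners_while_shared()
{
    auto file = std::make_shared<std::vector<std::string>>();
    file->push_back("first line");
    auto copy = file;          // second owner; no text is copied
    return file.use_count();   // both pointers refer to the same vector
}

// Hypothetical helper: the vector survives as long as some shared_ptr
// still points to it, even after the original pointer is gone.
std::string survives_original()
{
    std::shared_ptr<std::vector<std::string>> copy;
    {
        auto file = std::make_shared<std::vector<std::string>>();
        file->push_back("first line");
        copy = file;
    }                          // file is destroyed here; the vector is not
    return copy->front();
}
```

This is the property that lets a QueryResult remain valid even if the TextQuery that produced it is destroyed first.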

To make our code a bit easier to read, we’ll also define a type member (§ 7.3.1, p. 271) to refer to line numbers, which are indices into a vector of strings:

class QueryResult; // declaration needed for return type in the query function
class TextQuery {
public:
    using line_no = std::vector<std::string>::size_type;
    TextQuery(std::ifstream&);
    QueryResult query(const std::string&) const;
private:
    std::shared_ptr<std::vector<std::string>> file;  // input file
    // map of each word to the set of the lines in which that word appears
    std::map<std::string,
             std::shared_ptr<std::set<line_no>>> wm;
};

The hardest part about this class is untangling the class names. As usual, for code that will go in a header file, we use std:: when we use a library name (§ 3.1, p. 83). In this case, the repeated use of std:: makes the code a bit hard to read at first. For example,

std::map<std::string, std::shared_ptr<std::set<line_no>>> wm;

is easier to understand when rewritten as

map<string, shared_ptr<set<line_no>>> wm;
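
Another way to tame the nesting, offered here only as a sketch, is to name the intermediate types. The aliases line_set and word_map below are hypothetical and do not appear in the TextQuery class; the static_assert checks that the alias denotes exactly the same type as the fully spelled-out form.

```cpp
#include <map>
#include <memory>
#include <set>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical aliases, introduced only for this illustration.
using line_no  = std::vector<std::string>::size_type;
using line_set = std::set<line_no>;
using word_map = std::map<std::string, std::shared_ptr<line_set>>;

// The alias and the long form name one and the same type.
static_assert(std::is_same<
                  word_map,
                  std::map<std::string,
                           std::shared_ptr<std::set<line_no>>>>::value,
              "word_map should match the long form");
```

Aliases keep the std:: qualification that header code needs while giving each layer of the type a readable name.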

The TextQuery Constructor

The TextQuery constructor takes an ifstream, which it reads a line at a time:

// read the input file and build the map of words to line numbers
TextQuery::TextQuery(ifstream &is): file(new vector<string>)
{
    string text;
    while (getline(is, text)) {       // for each line in the file
        file->push_back(text);        // remember this line of text
        line_no n = file->size() - 1; // the current line number
        istringstream line(text);     // separate the line into words
        string word;
        while (line >> word) {        // for each word in that line
            // if word isn't already in wm, subscripting adds a new entry
            auto &lines = wm[word]; // lines is a shared_ptr
            if (!lines) // that pointer is null the first time we see word
                lines.reset(new set<line_no>); // allocate a new set
            lines->insert(n);      // insert this line number
        }
    }
}

The constructor initializer allocates a new vector to hold the text from the input file. We use getline to read the file a line at a time and push each line onto the vector. Because file is a shared_ptr, we use the -> operator to dereference file to fetch the push_back member of the vector to which file points.
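
The reading step can be tried in isolation. This is a minimal sketch under assumed names: read_lines and read_lines_from_text are hypothetical helpers, not members of TextQuery. The first takes any istream, so the second can feed it an istringstream instead of a file.

```cpp
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read every line of is into a shared vector.
std::shared_ptr<std::vector<std::string>> read_lines(std::istream &is)
{
    auto file = std::make_shared<std::vector<std::string>>();
    std::string text;
    while (std::getline(is, text))   // getline discards the newline
        file->push_back(text);       // -> reaches the vector's push_back
    return file;
}

// Hypothetical convenience wrapper for experimenting without a file.
std::shared_ptr<std::vector<std::string>>
read_lines_from_text(const std::string &contents)
{
    std::istringstream in(contents);
    return read_lines(in);
}
```

For instance, read_lines_from_text("one two\nthree\n") should yield a vector of two strings, with the first line stored at index 0.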

Next we use an istringstream (§ 8.3, p. 321) to process each word in the line we just read. The inner while uses the istringstream input operator to read each word from the current line into word. Inside the while, we use the map subscript operator to fetch the shared_ptr<set> associated with word and bind lines to that pointer. Note that lines is a reference, so changes made to lines will be made to the element in wm.

If word wasn’t in the map, the subscript operator adds word to wm (§ 11.3.4, p. 435). The element associated with word is value initialized, which means that lines will be a null pointer if the subscript operator added word to wm. If lines is null, we allocate a new set and call reset so that the shared_ptr to which lines refers points to this newly allocated set.
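
The null-pointer behavior is easy to check on its own. In this sketch the aliases line_set and word_map and the helpers new_entry_is_null and ensure_set are hypothetical names chosen for illustration; the logic mirrors the subscript-then-reset step in the constructor.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical aliases for this sketch only.
using line_set = std::set<std::size_t>;
using word_map = std::map<std::string, std::shared_ptr<line_set>>;

// Hypothetical helper: subscripts wm with word and reports whether the
// resulting shared_ptr is null (true for a freshly added word).
bool new_entry_is_null(word_map &wm, const std::string &word)
{
    auto &lines = wm[word];   // adds word, value initialized, if missing
    return !lines;
}

// Hypothetical helper: make sure word maps to an allocated set.
void ensure_set(word_map &wm, const std::string &word)
{
    auto &lines = wm[word];          // reference to the element in wm
    if (!lines)
        lines.reset(new line_set);   // the map's own pointer now owns a set
}
```

Because lines is a reference, the reset changes the pointer stored in the map, not a temporary copy of it.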

Regardless of whether we created a new set, we call insert to add the current line number. Because lines is a reference, the call to insert adds an element to the set in wm. If a given word occurs more than once in the same line, the call to insert does nothing.
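
The claim about repeated words follows from how set::insert treats a key that is already present. A small sketch, with hypothetical helper names, makes this visible: insert returns a pair whose bool member says whether an element was actually added.

```cpp
#include <cstddef>
#include <set>

// Hypothetical helper: inserts the same line number twice and reports
// whether the second insert added an element.
bool second_insert_added()
{
    std::set<std::size_t> lines;
    lines.insert(3);
    auto ret = lines.insert(3);   // pair<iterator, bool>
    return ret.second;            // false: 3 was already in the set
}

// Hypothetical helper: size of the set after duplicate inserts.
std::size_t size_after_duplicates()
{
    std::set<std::size_t> lines;
    lines.insert(3);
    lines.insert(3);
    lines.insert(7);
    return lines.size();          // duplicates are not stored
}
```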

The QueryResult Class

The QueryResult class has three data members: a string that is the word whose results it represents; a shared_ptr to the vector containing the input file; and a shared_ptr to the set of line numbers on which this word appears. Its only member function is a constructor that initializes these three members:

class QueryResult {
friend std::ostream& print(std::ostream&, const QueryResult&);
public:
    // line_no must be declared here; TextQuery's type member is not in scope
    using line_no = std::vector<std::string>::size_type;
    QueryResult(std::string s,
                std::shared_ptr<std::set<line_no>> p,
                std::shared_ptr<std::vector<std::string>> f):
        sought(s), lines(p), file(f) { }
private:
    std::string sought;  // word this query represents
    std::shared_ptr<std::set<line_no>> lines; // lines it's on
    std::shared_ptr<std::vector<std::string>> file; // input file
};

The constructor’s only job is to store its arguments in the corresponding data members, which it does in the constructor initializer list (§ 7.1.4, p. 265).

The query Function

The query function takes a string, which it uses to locate the corresponding set of line numbers in the map. If the string is found, the query function constructs a QueryResult from the given string, the TextQuery file member, and the set that was fetched from wm.

The only question is: What should we return if the given string is not found? In this case, there is no set to return. We’ll solve this problem by defining a local static object that is a shared_ptr to an empty set of line numbers. When the word is not found, we’ll return a copy of this shared_ptr:

QueryResult
TextQuery::query(const string &sought) const
{
    // we'll return a pointer to this set if we don't find sought
    static shared_ptr<set<line_no>> nodata(new set<line_no>);
    // use find and not a subscript to avoid adding words to wm!
    auto loc = wm.find(sought);
    if (loc == wm.end())
        return QueryResult(sought, nodata, file); // not found
    else
        return QueryResult(sought, loc->second, file);
}
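
Two ideas in query can be checked separately: find looks a key up without adding it, and a local static shared_ptr is created once and handed back on every call. The sketch below uses a simplified map of string to int, and all three helper names are hypothetical. Note also that query is a const member, and the map subscript operator cannot be called on a const map, so find is the only one of the two lookups that would compile there.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical helper: look up key without modifying the map.
bool contains(const std::map<std::string, int> &m, const std::string &key)
{
    return m.find(key) != m.end();   // find never inserts
}

// Hypothetical helper: subscripting a copy shows that operator[] inserts.
std::size_t size_after_subscript(std::map<std::string, int> m,
                                 const std::string &key)
{
    m[key];                          // adds key if it was missing
    return m.size();
}

// Hypothetical helper: every call returns a pointer to the same empty set.
std::shared_ptr<std::set<std::size_t>> empty_lines()
{
    static std::shared_ptr<std::set<std::size_t>>
        nodata(new std::set<std::size_t>);
    return nodata;
}
```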

Printing the Results

The print function prints its given QueryResult object on its given stream:

ostream &print(ostream &os, const QueryResult &qr)
{
    // if the word was found, print the count and all occurrences
    os << qr.sought << " occurs " << qr.lines->size() << " "
       << make_plural(qr.lines->size(), "time", "s") << endl;
    // print each line in which the word appeared
    for (auto num : *qr.lines) // for every element in the set
        // don't confound the user with text lines starting at 0
        os << " (line " << num + 1 << ") "
           << *(qr.file->begin() + num) << endl;
    return os;
}

We use the size of the set to which qr.lines points to report how many matches were found. Because that set is in a shared_ptr, we have to remember to dereference lines. We call make_plural (§ 6.3.2, p. 224) to print time or times, depending on whether that size is equal to 1.
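
The definition of make_plural is not shown in this section. The following is a hypothetical stand-in, named make_plural_sketch to keep it distinct, that follows the rule stated above: the singular form only when the count is exactly 1. The definition in § 6.3.2 may differ in its details.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for make_plural; an assumption, not the
// definition from § 6.3.2. Returns word unchanged when ctr is 1 and
// word followed by ending otherwise.
std::string make_plural_sketch(std::size_t ctr, const std::string &word,
                               const std::string &ending)
{
    return (ctr == 1) ? word : word + ending;
}
```

Under this rule a count of 0 yields the plural form, so a word that is not found would be reported as occurring 0 times.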

In the for we iterate through the set to which lines points. The body of the for prints the line number, adjusted to use human-friendly counting. The numbers in the set are indices of elements in the vector, which are numbered from zero. However, most users think of the first line as line number 1, so we systematically add 1 to the line numbers to convert to this more common notation.

We use the line number to fetch a line from the vector to which file points. Recall that when we add a number to an iterator, we get the element that many elements further into the vector (§ 3.4.2, p. 111). Thus, file->begin() + num is the numth element after the start of the vector to which file points.
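
As a quick check of the iterator arithmetic, this sketch fetches an element both ways. The helper name nth is hypothetical, and it assumes the index is within range, since moving an iterator past the end of the vector and dereferencing it is undefined.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: fetch element n by iterator arithmetic.
// Assumes n < v.size().
const std::string &nth(const std::vector<std::string> &v,
                       std::vector<std::string>::size_type n)
{
    return *(v.begin() + n);   // same element as v[n]
}
```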

Note that this function correctly handles the case in which the word is not found. In this case, the set will be empty. The first output statement will note that the word occurred 0 times. Because *qr.lines is empty, the for loop won’t be executed.
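
The not-found case can be exercised with a simplified version of the printing logic. The function report below is a hypothetical sketch, not the print function itself: it takes the set and vector directly, writes to a string, and uses the fixed text time(s) in place of make_plural.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper modelled loosely on print. Assumes every number
// in lines is a valid index into file.
std::string report(const std::string &sought,
                   const std::set<std::size_t> &lines,
                   const std::vector<std::string> &file)
{
    std::ostringstream os;
    os << sought << " occurs " << lines.size() << " time(s)\n";
    for (auto num : lines)   // zero iterations when lines is empty
        os << "\t(line " << num + 1 << ") " << file[num] << "\n";
    return os.str();
}
```

With an empty set the result is just the count line, which is the behavior the paragraph above describes.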


Exercises Section 12.3.2

Exercise 12.30: Define your own versions of the TextQuery and QueryResult classes and execute the runQueries function from § 12.3.1 (p. 486).

Exercise 12.31: What difference(s) would it make if we used a vector instead of a set to hold the line numbers? Which approach is better? Why?

Exercise 12.32: Rewrite the TextQuery and QueryResult classes to use a StrBlob instead of a vector<string> to hold the input file.

Exercise 12.33: In Chapter 15 we’ll extend our query system and will need some additional members in the QueryResult class. Add members named begin and end that return iterators into the set of line numbers returned by a given query, and a member named get_file that returns a shared_ptr to the file in the QueryResult object.

