Visitor pattern for document traversal

The tree-structured DOM that we created must be traversed to render its content in an output format such as HTML, PDF, or SVG.

Note

The composite tree that we created can be traversed using the GoF visitor pattern. Wherever the composite pattern is used to compose a hierarchy of objects, the visitor pattern is a natural choice for traversing the tree.

In a visitor pattern implementation, every node in the composite tree supports a method called accept, which takes a concrete visitor instance as a parameter. The job of the accept routine is to reflect (that is, forward) the call to the appropriate visit method of that visitor. We declare an interface named IDocumentVisitor with a visit method for each of the element types in our hierarchy, as follows:

    public interface IDocumentVisitor 
    { 
      void visit(TDocument doc); 
      void visit(TDocumentTable table); 
      void visit(TDocumentTableRow row); 
      void visit(TDocumentTableCell cell); 
      void visit(TDocumentText txt); 
    } 

The traversal of the tree should start from the top node of the tree. In our case, we start the traversal from the TDocument node. For each node in the hierarchy, we will add an accept method, which takes an instance of IDocumentVisitor. The signature of this function is as follows:

    public abstract class TDocumentElement 
    { 
      //--- code omitted 
      public abstract void accept(IDocumentVisitor doc_vis); 
      //--- code omitted 
    } 

Every class that derives from TDocumentElement must implement this method. For example, the relevant part of the TDocument class is as follows:

    public class TDocument : TDocumentElement 
    { 
      //----- code omitted 
      public override void accept(IDocumentVisitor doc_vis) 
      { 
        doc_vis.visit(this); 
      } 
      //------ code omitted
    } 

In the TDocument class, the accept method reflects the call to the visit(TDocument) method of the IDocumentVisitor implementation. That visit method, in turn, calls the accept method of each child node, and each of those calls is reflected back to the visit overload that matches the child's type. In this manner, the accept/visit pair processes the whole hierarchy.

The traversal starts with the TDocument accept method, as follows:

    string filename = @"D:abfund.pdf"; 
    ds.accept(new PDFVisitor(filename)); 

The ds object is of type TDocument, and the accept method takes an instance of the IDocumentVisitor interface. In the document object, the call gets reflected to the IDocumentVisitor visit(TDocument) method.
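The accept/visit round trip described above can be tried in isolation. The following is a minimal sketch, not the chapter's full code: the hierarchy is cut down to TDocument and TDocumentText, DocumentElements is assumed to be a plain List, and TraceVisitor is a hypothetical visitor that only records the order in which nodes are visited.

```csharp
using System;
using System.Collections.Generic;

// Cut-down stand-ins for the chapter's classes (assumed shapes, not the full code).
public interface IDocumentVisitor
{
    void visit(TDocument doc);
    void visit(TDocumentText txt);
}

public abstract class TDocumentElement
{
    // Children of this node; assumed here to be a simple list.
    public List<TDocumentElement> DocumentElements = new List<TDocumentElement>();
    public abstract void accept(IDocumentVisitor doc_vis);
}

public class TDocument : TDocumentElement
{
    // 'this' is statically a TDocument, so the compiler picks visit(TDocument).
    public override void accept(IDocumentVisitor doc_vis) { doc_vis.visit(this); }
}

public class TDocumentText : TDocumentElement
{
    public string Text;
    public TDocumentText(string text) { Text = text; }
    public override void accept(IDocumentVisitor doc_vis) { doc_vis.visit(this); }
}

// Hypothetical visitor that records the order in which nodes are visited.
public class TraceVisitor : IDocumentVisitor
{
    public List<string> Trace = new List<string>();

    public void visit(TDocument doc)
    {
        Trace.Add("document");
        foreach (TDocumentElement child in doc.DocumentElements)
        {
            child.accept(this); // virtual call: dispatches on the child's runtime type
        }
    }

    public void visit(TDocumentText txt)
    {
        Trace.Add("text:" + txt.Text);
    }
}

public static class Demo
{
    public static void Main()
    {
        TDocument ds = new TDocument();
        ds.DocumentElements.Add(new TDocumentText("hello"));
        ds.DocumentElements.Add(new TDocumentText("world"));

        TraceVisitor visitor = new TraceVisitor();
        ds.accept(visitor);
        // Prints: document, text:hello, text:world
        Console.WriteLine(string.Join(", ", visitor.Trace));
    }
}
```

The visitor never inspects the type of a child. The virtual accept call selects the node class, and the overload chosen inside that class selects the visit method, which is why adding a new output format requires only a new visitor class.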

PDFVisitor for PDF generation

We have defined our object hierarchy and an interface to traverse the hierarchy. Now we need to implement routines to traverse the tree. The PDFVisitor class implements the IDocumentVisitor interface, as shown in the following code snippet:

    public class PDFVisitor : IDocumentVisitor 
    { 
      private string file_name = null; 
      private PdfWriter writer = null; 
      private Document document = null; 
      private PdfPTable table_temp = null; 
      private FileStream fs = null; 
      private int column_count; 
 
      public PDFVisitor(string filename) 
      { 
        file_name = filename; 
        fs = new FileStream(file_name, FileMode.Create); 
        document = new Document(PageSize.A4, 25, 25, 30, 30); 
        writer = PdfWriter.GetInstance(document, fs); 
      } 

The visit method, which takes TDocument as a parameter, adds some metadata to the PDF document being created. After this operation, the method inspects all the child elements of TDocument, and issues an accept method call with the current visitor instance. This invokes the accept method of the concrete class of TDocumentElement embedded as a child object:

    public void visit(TDocument doc) 
    { 
      document.AddAuthor(@"Praseed Pai & Shine Xavier"); 
      document.AddCreator(@"iTextSharp Library"); 
      document.AddKeywords(@"Design Patterns Architecture"); 
      document.AddSubject(@"Book on .NET Design Patterns"); 
      document.Open(); 
      column_count = doc.ColumnCount; 
      document.AddTitle(doc.Title); 
 
      for (int x = 0; x < doc.DocumentElements.Count; x++) 
      { 
        try 
        { 
          doc.DocumentElements[x].accept(this); 
        } 
        catch (Exception ex) 
        { 
          Console.Error.WriteLine(ex.Message); 
        } 
      } 
      document.Add(this.table_temp);
      document.Close();
      writer.Close();
      fs.Close();
    }

The TDocumentTable object is handled by its visit method in a similar fashion. Once we have worked with the table node itself, all the children stored in DocumentElements are processed by invoking the accept method of each node embedded inside the table:

    public void visit(TDocumentTable table) 
    { 
      this.table_temp = new PdfPTable(column_count); 
      PdfPCell cell = new PdfPCell(new Phrase("Header spanning 3 columns"));
      cell.Colspan = column_count;
      cell.HorizontalAlignment = 1; // 1 == Element.ALIGN_CENTER
      table_temp.AddCell(cell); 
      for (int x = 0; x < table.RowCount; x++) 
      { 
        try 
        { 
          table.DocumentElements[x].accept(this); 
        } 
        catch (Exception ex) 
        { 
          Console.Error.WriteLine(ex.Message); 
        } 
      } 
    } 

Note

A table is a collection of rows, and a row is a collection of cells. Each cell contains some text. A cell could hold a collection of text elements as well, but our implementation assumes that each cell stores only one.

Typically, the children of a TDocumentTable are instances of TDocumentTableRow. In our implementation, we navigate to all the children of a row object and issue an accept call to each of them:

    public void visit(TDocumentTableRow row) 
    { 
      for (int i = 0; i < row.DocumentElements.Count; ++i)
      { 
        row.DocumentElements[i].accept(this); 
      } 
    } 

To process TDocumentTableCell, we iterate through all the child elements of a cell; these elements are instances of TDocumentText. For the sake of brevity, the contents of a cell are stored in a TDocumentText attribute called Text:

    public void visit(TDocumentTableCell cell) 
    { 
      for (int i = 0; i < cell.DocumentElements.Count; ++i) 
      { 
        cell.DocumentElements[i].accept(this); 
      } 
    } 

The TDocumentText class has a property named Text, where an application developer can store some text. The visit method adds that text to the table as a cell:

    public void visit(TDocumentText txt) 
    { 
      table_temp.AddCell(txt.Text); 
    } 
  } 

HTMLVisitor for HTML generation

The HTMLVisitor class produces HTML output by traversing the DOM. The skeleton implementation of HTMLVisitor is as follows:

    public class HTMLVisitor : IDocumentVisitor 
    { 
      private string file_name = null;
      private StringBuilder document = null; 
      public HTMLVisitor(string filename) { 
        file_name = filename; 
      } 
      //--- Code omitted for all methods 
      public void visit(TDocument doc){} 
      public void visit(TDocumentTable table){} 
      public void visit(TDocumentTableRow row) {} 
      public void visit(TDocumentTableCell cell) {} 
      public void visit(TDocumentText txt) {} 
    }

The HTMLVisitor class can be leveraged as follows:

    string filename = @"D:abfund.html"; 
    ds.accept(new HTMLVisitor(filename)); 
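
Because the method bodies of HTMLVisitor are omitted, the following sketch shows one way such a visitor could be filled in. It is an assumption, not the book's implementation: the element classes are simplified stand-ins, HtmlSketchVisitor and its Wrap helper are hypothetical names, and the markup is kept in memory rather than written to a file. Text is escaped with WebUtility.HtmlEncode so that characters such as < do not break the output.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

public interface IDocumentVisitor
{
    void visit(TDocument doc);
    void visit(TDocumentTable table);
    void visit(TDocumentTableRow row);
    void visit(TDocumentTableCell cell);
    void visit(TDocumentText txt);
}

// Assumed, simplified element classes; the chapter's real ones carry more state.
public abstract class TDocumentElement
{
    public List<TDocumentElement> DocumentElements = new List<TDocumentElement>();
    public abstract void accept(IDocumentVisitor doc_vis);
}
public class TDocument : TDocumentElement
{
    public string Title = "";
    public override void accept(IDocumentVisitor v) { v.visit(this); }
}
public class TDocumentTable : TDocumentElement
{
    public override void accept(IDocumentVisitor v) { v.visit(this); }
}
public class TDocumentTableRow : TDocumentElement
{
    public override void accept(IDocumentVisitor v) { v.visit(this); }
}
public class TDocumentTableCell : TDocumentElement
{
    public override void accept(IDocumentVisitor v) { v.visit(this); }
}
public class TDocumentText : TDocumentElement
{
    public string Text = "";
    public override void accept(IDocumentVisitor v) { v.visit(this); }
}

// Hypothetical visitor: opening tag, then the children, then the closing tag.
public class HtmlSketchVisitor : IDocumentVisitor
{
    private readonly StringBuilder document = new StringBuilder();
    public string Html { get { return document.ToString(); } }

    private void Wrap(string tag, TDocumentElement node)
    {
        document.Append("<" + tag + ">");
        foreach (TDocumentElement child in node.DocumentElements) child.accept(this);
        document.Append("</" + tag + ">");
    }

    public void visit(TDocument doc)
    {
        document.Append("<html><head><title>" + WebUtility.HtmlEncode(doc.Title) + "</title></head>");
        Wrap("body", doc);
        document.Append("</html>");
    }
    public void visit(TDocumentTable table) { Wrap("table", table); }
    public void visit(TDocumentTableRow row) { Wrap("tr", row); }
    public void visit(TDocumentTableCell cell) { Wrap("td", cell); }
    public void visit(TDocumentText txt) { document.Append(WebUtility.HtmlEncode(txt.Text)); }
}

public static class Demo
{
    public static void Main()
    {
        TDocumentTableCell cell = new TDocumentTableCell();
        cell.DocumentElements.Add(new TDocumentText { Text = "a < b" });
        TDocumentTableRow row = new TDocumentTableRow();
        row.DocumentElements.Add(cell);
        TDocumentTable table = new TDocumentTable();
        table.DocumentElements.Add(row);
        TDocument ds = new TDocument { Title = "Demo" };
        ds.DocumentElements.Add(table);

        HtmlSketchVisitor visitor = new HtmlSketchVisitor();
        ds.accept(visitor);
        Console.WriteLine(visitor.Html);
    }
}
```

A file-based variant could keep the constructor that takes a file name and write the accumulated string out at the end of visit(TDocument), for example with File.WriteAllText.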