Importing from a directory

Mallet supports reading from a directory with the cc.mallet.pipe.iterator.FileIterator class. A file iterator is constructed with the following three parameters:

  • An array of directories (File[]) containing text files
  • A file filter that specifies which files to select within a directory
  • A pattern that is applied to a filename to produce a class label
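To make the third parameter concrete, the sketch below applies a regular expression to a file path and takes the first capture group as the class label. The class name LabelDemo, the method labelFor, and the pattern itself are hypothetical illustrations and are not Mallet's own definitions; the pattern also assumes forward-slash path separators.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mimics how a filename pattern can yield a class label.
class LabelDemo {
    // Illustrative pattern (an assumption, not taken from Mallet):
    // captures the name of the directory that directly contains the file.
    static final Pattern LAST_DIR = Pattern.compile(".*/([^/]+)/[^/]+");

    // Returns the captured directory name, or null if the path has no parent directory.
    static String labelFor(String path) {
        Matcher m = LAST_DIR.matcher(path);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(labelFor("path-to-my-dataset/tech/001.txt"));
    }
}
```

With a layout of one folder per topic, a pattern of this kind turns the folder name into the label, so no separate label file is needed.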

Consider data organized into folders by topic. We have five folders (tech, entertainment, politics, sport, and business), each containing documents on that topic, as shown in the following screenshot:

In this case, we initialize the iterator as follows:

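import java.io.File;
import cc.mallet.pipe.iterator.FileIterator;

// TxtFilter is assumed to be a java.io.FileFilter, defined elsewhere, that accepts only .txt files
FileIterator iterator = new FileIterator(
    new File[]{new File("path-to-my-dataset")},
    new TxtFilter(),
    FileIterator.LAST_DIRECTORY);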

The first parameter specifies the path to our root folder, the second parameter limits the iterator to .txt files only, and the last parameter tells the iterator to use the last directory name in the path as the class label.
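The TxtFilter class passed as the second parameter is not defined in this section. A minimal sketch of such a filter, assuming it is simply a java.io.FileFilter that checks the file extension, might look like this:

```java
import java.io.File;
import java.io.FileFilter;

// Assumed implementation of a filter like TxtFilter: accepts only files whose name ends in ".txt".
class TxtFilter implements FileFilter {
    @Override
    public boolean accept(File file) {
        // Compare against the file name only, so directory names in the path are ignored.
        return file.getName().endsWith(".txt");
    }
}
```

Because the check is a plain suffix match, it is case-sensitive; a file named NOTES.TXT would be rejected unless the name is lowercased first.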
