How it works...

The first string is a short piece of sample text containing a single email address. The regular expression, while fairly complex, matches a typical email address.

The RegExChunker class's constructor takes three arguments:

  • The first one is the regular expression.
  • The second is a name used to identify the type of the entity found. We used the EMAIL string as the entity type.
  • The third argument is the chunk score, which is assigned to every chunk the chunker produces. It indicates the importance of a successful match; we do not use it in our examples.

An instance of the class is created as shown next:

Chunker chunker = new RegExChunker(emailRegularExpression,"EMAIL",1.0);

The chunk method returns an instance of an object that implements the Chunking interface. Calling this interface's chunkSet method on that object returns a set of Chunk instances. Each Chunk instance represents an entity found in the sample text:

Chunking chunking = chunker.chunk(sampleText);
Set<Chunk> chunkSet = chunking.chunkSet();

The for-each statement iterates over the elements of the set and displays each entity found. The start method returns the index of the entity's first character in the sample text, and the end method returns the index one past its last character, so the two values can be passed directly to substring. The type method returns the entity's type:

for (Chunk chunk : chunkSet) {
    System.out.println("Entity: " +
        sampleText.substring(chunk.start(), chunk.end()) +
        " Type: " + chunk.type());
}
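
To make the start, end, and type values more concrete, here is a minimal sketch of the same idea using only the standard java.util.regex package. It is not how RegExChunker is implemented. The class name RegexEntitySketch, the Span helper, the findSpans method, the sample sentence, and the simplified email pattern are all hypothetical and chosen for illustration; the pattern is much looser than the regular expression used in the recipe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexEntitySketch {

    // Hypothetical stand-in for a chunk: a typed span of the input text.
    static final class Span {
        final int start;   // index of the first character of the match
        final int end;     // index one past the last character of the match
        final String type; // label given to every match, such as "EMAIL"

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Simplified email pattern for illustration only (an assumption,
    // not the regular expression from the recipe).
    static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Collects every match of the pattern in the text as a typed span.
    static List<Span> findSpans(Pattern pattern, String type, CharSequence text) {
        List<Span> spans = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            spans.add(new Span(matcher.start(), matcher.end(), type));
        }
        return spans;
    }

    public static void main(String[] args) {
        String sampleText = "Contact admin@example.com for help.";
        for (Span span : findSpans(EMAIL, "EMAIL", sampleText)) {
            // Prints: Entity: admin@example.com Type: EMAIL
            System.out.println("Entity: " +
                sampleText.substring(span.start, span.end) +
                " Type: " + span.type);
        }
    }
}
```

Because the end index is exclusive, it lines up with the second argument of substring, which is why no adjustment is needed when extracting the matched text.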