Class TokenizerByWord

java.lang.Object
com.topologi.diffx.load.text.TokenizerByWord
All Implemented Interfaces:
TextTokenizer

public final class TokenizerByWord extends Object implements TextTokenizer
Tokenizer for character events that works at word granularity.

This class is not synchronized.

Version:
11 May 2010
  • Field Details

    • recycling

      private final Map<String,TextEvent> recycling
      Maps characters to events so that events can be recycled as they are created.
    • whitespace

      private final WhiteSpaceProcessing whitespace
      Defines the whitespace processing.
  • Constructor Details

    • TokenizerByWord

      public TokenizerByWord(WhiteSpaceProcessing whitespace)
      Creates a new tokenizer.
      Parameters:
      whitespace - the whitespace processing for this tokenizer.
      Throws:
      NullPointerException - if the whitespace processing is not specified.
  • Method Details

    • tokenize

      public List<TextEvent> tokenize(CharSequence seq)
      Returns the list of TextEvent objects corresponding to the specified character sequence.
      Specified by:
      tokenize in interface TextTokenizer
      Parameters:
      seq - the character sequence to tokenize.
      Returns:
      the corresponding list.
    • granurality

      public TextGranularity granurality()
      Returns the text granularity of this tokenizer, which is always TextGranularity.WORD.
      Specified by:
      granurality in interface TextTokenizer
      Returns:
      the text granularity of this tokenizer.
    • getWordEvent

      private TextEvent getWordEvent(String word)
      Returns the word event corresponding to the specified characters.
      Parameters:
      word - the characters of the word
      Returns:
      the corresponding word event
    • getSpaceEvent

      private TextEvent getSpaceEvent(String space)
      Returns the space event corresponding to the specified characters.
      Parameters:
      space - the characters of the space
      Returns:
      the corresponding space event
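
The members above suggest a tokenizer that emits word events and space events and reuses an event when the same characters appear again. The sketch below illustrates that general idea only; it is not the Diff-X implementation. The names SketchToken and WordTokenizerSketch are hypothetical, the whitespace test via Character.isWhitespace is an assumption, and WhiteSpaceProcessing is not modelled at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a text event; not the Diff-X TextEvent type. */
final class SketchToken {
  final String text;
  final boolean space;

  SketchToken(String text, boolean space) {
    this.text = text;
    this.space = space;
  }

  @Override
  public String toString() {
    return (space ? "SPACE" : "WORD") + "(" + text + ")";
  }
}

/** Hypothetical word tokenizer sketch; assumptions are noted in the comments. */
final class WordTokenizerSketch {

  // Maps characters to tokens so that identical runs share one instance.
  private final Map<String, SketchToken> recycling = new HashMap<>();

  /** Splits the sequence into alternating runs of whitespace and non-whitespace. */
  List<SketchToken> tokenize(CharSequence seq) {
    List<SketchToken> tokens = new ArrayList<>();
    int length = seq.length();
    int start = 0;
    while (start < length) {
      // Assumption: Character.isWhitespace decides what counts as a space.
      boolean space = Character.isWhitespace(seq.charAt(start));
      int end = start + 1;
      while (end < length && Character.isWhitespace(seq.charAt(end)) == space) {
        end++;
      }
      tokens.add(getToken(seq.subSequence(start, end).toString(), space));
      start = end;
    }
    return tokens;
  }

  /** Returns the cached token for these characters, creating it on first use. */
  private SketchToken getToken(String run, boolean space) {
    SketchToken token = recycling.get(run);
    if (token == null) {
      token = new SketchToken(run, space);
      recycling.put(run, token);
    }
    return token;
  }
}
```

With this sketch, tokenizing "a big  a" yields five tokens (word, space, word, space, word), and the first and last tokens are the same recycled instance. Because the cache is a plain HashMap, the sketch is not synchronized either, so each thread would need its own instance.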