Package com.sun.speech.freetts.en
Class TokenizerImpl
java.lang.Object
    com.sun.speech.freetts.en.TokenizerImpl
Field Summary
Fields
static java.lang.String DEFAULT_POSTPUNCTUATION_SYMBOLS
    A string containing the default post-punctuation characters.
static java.lang.String DEFAULT_PREPUNCTUATION_SYMBOLS
    A string containing the default pre-punctuation characters.
static java.lang.String DEFAULT_SINGLE_CHAR_SYMBOLS
    A string containing the default single characters.
static java.lang.String DEFAULT_WHITESPACE_SYMBOLS
    A string containing the default whitespace characters.
static int EOF
    A constant indicating that the end of the stream has been read.
-
Constructor Summary
Constructors
TokenizerImpl()
    Constructs a Tokenizer.
TokenizerImpl(java.io.Reader file)
    Creates a tokenizer that will return tokens from the given file.
TokenizerImpl(java.lang.String string)
    Creates a tokenizer that will return tokens from the given string.
-
Method Summary
Methods
java.lang.String getErrorDescription()
    If hasErrors returns true, this will return a description of the error encountered; otherwise it will return null.
Token getNextToken()
    Returns the next token.
boolean hasErrors()
    Returns true if there were errors while reading tokens.
boolean hasMoreTokens()
    Returns true if there are more tokens, false otherwise.
boolean isBreak()
    Determines if the current token should start a new sentence.
void setInputReader(java.io.Reader reader)
    Sets the input reader.
void setInputText(java.lang.String inputString)
    Sets the text to tokenize.
void setPostpunctuationSymbols(java.lang.String symbols)
    Sets the postpunctuation symbols of this Tokenizer to the given symbols.
void setPrepunctuationSymbols(java.lang.String symbols)
    Sets the prepunctuation symbols of this Tokenizer to the given symbols.
void setSingleCharSymbols(java.lang.String symbols)
    Sets the single character symbols of this Tokenizer to the given symbols.
void setWhitespaceSymbols(java.lang.String symbols)
    Sets the whitespace symbols of this Tokenizer to the given symbols.
-
Field Detail
-
EOF
public static final int EOF
A constant indicating that the end of the stream has been read.
See Also:
    Constant Field Values
-
DEFAULT_WHITESPACE_SYMBOLS
public static final java.lang.String DEFAULT_WHITESPACE_SYMBOLS
A string containing the default whitespace characters.
See Also:
    Constant Field Values
-
DEFAULT_SINGLE_CHAR_SYMBOLS
public static final java.lang.String DEFAULT_SINGLE_CHAR_SYMBOLS
A string containing the default single characters.
See Also:
    Constant Field Values
-
DEFAULT_PREPUNCTUATION_SYMBOLS
public static final java.lang.String DEFAULT_PREPUNCTUATION_SYMBOLS
A string containing the default pre-punctuation characters.
See Also:
    Constant Field Values
-
DEFAULT_POSTPUNCTUATION_SYMBOLS
public static final java.lang.String DEFAULT_POSTPUNCTUATION_SYMBOLS
A string containing the default post-punctuation characters.
See Also:
    Constant Field Values
-
-
Constructor Detail
-
TokenizerImpl
public TokenizerImpl()
Constructs a Tokenizer.
-
TokenizerImpl
public TokenizerImpl(java.lang.String string)
Creates a tokenizer that will return tokens from the given string.
Parameters:
    string - the string to tokenize
-
TokenizerImpl
public TokenizerImpl(java.io.Reader file)
Creates a tokenizer that will return tokens from the given file.
Parameters:
    file - where to read the input from
-
-
Method Detail
-
setWhitespaceSymbols
public void setWhitespaceSymbols(java.lang.String symbols)
Sets the whitespace symbols of this Tokenizer to the given symbols.
Specified by:
    setWhitespaceSymbols in interface Tokenizer
Parameters:
    symbols - the whitespace symbols
-
setSingleCharSymbols
public void setSingleCharSymbols(java.lang.String symbols)
Sets the single character symbols of this Tokenizer to the given symbols.
Specified by:
    setSingleCharSymbols in interface Tokenizer
Parameters:
    symbols - the single character symbols
-
setPrepunctuationSymbols
public void setPrepunctuationSymbols(java.lang.String symbols)
Sets the prepunctuation symbols of this Tokenizer to the given symbols.
Specified by:
    setPrepunctuationSymbols in interface Tokenizer
Parameters:
    symbols - the prepunctuation symbols
-
setPostpunctuationSymbols
public void setPostpunctuationSymbols(java.lang.String symbols)
Sets the postpunctuation symbols of this Tokenizer to the given symbols.
Specified by:
    setPostpunctuationSymbols in interface Tokenizer
Parameters:
    symbols - the postpunctuation symbols
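This page does not describe how the prepunctuation and postpunctuation sets are applied to a token. As a rough sketch only, the hypothetical helper below (SymbolClassSketch is not part of FreeTTS) assumes leading characters found in the prepunctuation set and trailing characters found in the postpunctuation set are peeled off a whitespace-delimited chunk. The symbol strings in the usage note are illustrative and are not claimed to match the DEFAULT_* constants.

```java
// Hypothetical helper, not part of FreeTTS: shows one way prepunctuation and
// postpunctuation symbol sets could be applied to a whitespace-delimited chunk.
class SymbolClassSketch {

    /** Returns {prepunctuation, word, postpunctuation} for one raw chunk. */
    static String[] split(String raw, String preSymbols, String postSymbols) {
        int start = 0;
        // Peel leading characters that belong to the prepunctuation set.
        while (start < raw.length() && preSymbols.indexOf(raw.charAt(start)) >= 0) {
            start++;
        }
        int end = raw.length();
        // Peel trailing characters that belong to the postpunctuation set.
        while (end > start && postSymbols.indexOf(raw.charAt(end - 1)) >= 0) {
            end--;
        }
        return new String[] {
            raw.substring(0, start), raw.substring(start, end), raw.substring(end)
        };
    }
}
```

Under these assumptions, SymbolClassSketch.split("(Hello!)", "(", "!)") separates the chunk into "(", "Hello" and "!)". Changing the sets passed in changes where the word boundaries fall, which is presumably the effect the setters above have on the real tokenizer.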
-
setInputText
public void setInputText(java.lang.String inputString)
Sets the text to tokenize.
Specified by:
    setInputText in interface Tokenizer
Parameters:
    inputString - the string to tokenize
-
setInputReader
public void setInputReader(java.io.Reader reader)
Sets the input reader.
Specified by:
    setInputReader in interface Tokenizer
Parameters:
    reader - the input source
-
getNextToken
public Token getNextToken()
Returns the next token.
Specified by:
    getNextToken in interface Tokenizer
Returns:
    the next token if it exists, null if no more tokens
-
hasMoreTokens
public boolean hasMoreTokens()
Returns true if there are more tokens, false otherwise.
Specified by:
    hasMoreTokens in interface Tokenizer
Returns:
    true if there are more tokens, false otherwise
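Taken together, hasMoreTokens and getNextToken suggest a guard-then-fetch loop. The sketch below is self-contained, so it uses a hypothetical stand-in, WhitespaceTokenizer, that mirrors only those two method names and returns plain strings rather than Token objects. It is not the FreeTTS implementation; with FreeTTS on the classpath, the same loop shape would presumably apply to a TokenizerImpl instance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the FreeTTS class: it only mirrors the
// hasMoreTokens()/getNextToken() contract described above.
class WhitespaceTokenizer {
    private final String text;
    private final String whitespace;
    private int pos = 0;

    WhitespaceTokenizer(String text, String whitespaceSymbols) {
        this.text = text;
        this.whitespace = whitespaceSymbols;
        skipWhitespace();
    }

    // Advances past any characters in the whitespace set.
    private void skipWhitespace() {
        while (pos < text.length() && whitespace.indexOf(text.charAt(pos)) >= 0) {
            pos++;
        }
    }

    boolean hasMoreTokens() {
        return pos < text.length();
    }

    /** Returns the next token, or null if no tokens remain. */
    String getNextToken() {
        if (!hasMoreTokens()) {
            return null;
        }
        int start = pos;
        while (pos < text.length() && whitespace.indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        String token = text.substring(start, pos);
        skipWhitespace();
        return token;
    }

    /** Drains the tokenizer with the usual guard-then-fetch loop. */
    static List<String> drain(WhitespaceTokenizer tokenizer) {
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.getNextToken());
        }
        return tokens;
    }
}
```

Checking hasMoreTokens before each call keeps the loop from ever seeing the null that getNextToken returns once the input is exhausted.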
-
hasErrors
public boolean hasErrors()
Returns true if there were errors while reading tokens.
-
getErrorDescription
public java.lang.String getErrorDescription()
If hasErrors returns true, this will return a description of the error encountered; otherwise it will return null.
Specified by:
    getErrorDescription in interface Tokenizer
Returns:
    a description of the last error that occurred.
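The hasErrors and getErrorDescription pair implies that read failures are recorded rather than thrown to the caller. How TokenizerImpl does this internally is not stated here, so the following is only a sketch of that general pattern. ErrorRecordingReader and failingReader are hypothetical names introduced for illustration and are not part of FreeTTS.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical stand-in, not the FreeTTS class: records read failures
// instead of throwing, mirroring hasErrors()/getErrorDescription().
class ErrorRecordingReader {
    private final Reader reader;
    private String errorDescription = null;

    ErrorRecordingReader(Reader reader) {
        this.reader = reader;
    }

    /** Reads one character, or returns -1 at end of stream or on error. */
    int readChar() {
        try {
            return reader.read();
        } catch (IOException e) {
            // Remember the failure so the caller can query it afterwards.
            errorDescription = e.toString();
            return -1;
        }
    }

    boolean hasErrors() {
        return errorDescription != null;
    }

    /** Returns the recorded error description, or null if none occurred. */
    String getErrorDescription() {
        return errorDescription;
    }

    /** A reader that always fails, used here only to trigger the error path. */
    static Reader failingReader() {
        return new Reader() {
            @Override
            public int read(char[] buf, int off, int len) throws IOException {
                throw new IOException("simulated read failure");
            }

            @Override
            public void close() {
            }
        };
    }
}
```

With this shape, a caller finishes its token loop first and then checks hasErrors once, reading getErrorDescription only when that check returns true.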