org.apache.lucene.analysis
Class Tokenizer

java.lang.Object
  extended by org.apache.lucene.analysis.TokenStream
      extended by org.apache.lucene.analysis.Tokenizer
Direct Known Subclasses:
CharTokenizer, ChineseTokenizer, CJKTokenizer, EdgeNGramTokenizer, KeywordTokenizer, NGramTokenizer, SinkTokenizer, StandardTokenizer, WikipediaTokenizer

public abstract class Tokenizer
extends TokenStream

A Tokenizer is a TokenStream whose input is a Reader.

This is an abstract class.

NOTE: subclasses must override TokenStream.next(Token). It is also acceptable to override TokenStream.next() instead, but that method is now deprecated in favor of TokenStream.next(Token).

NOTE: subclasses overriding TokenStream.next(Token) must call Token.clear() on the Token they are passed, since the same instance may carry state from a previous call.
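
For illustration, a minimal sketch (not part of Lucene) of a Tokenizer subclass that follows both notes above: it overrides next(Token), calls Token.clear() before reuse, and reads whitespace-separated chunks from the protected input Reader. The class name SimpleWhitespaceTokenizer is hypothetical; offsets and other Token fields are omitted for brevity.

  import java.io.IOException;
  import java.io.Reader;

  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.Tokenizer;

  public class SimpleWhitespaceTokenizer extends Tokenizer {

    public SimpleWhitespaceTokenizer(Reader input) {
      super(input);                        // stores the Reader in the protected field 'input'
    }

    public Token next(Token result) throws IOException {
      result.clear();                      // required: the reusable Token may hold stale state
      StringBuffer term = new StringBuffer();
      int c;
      while ((c = input.read()) != -1) {
        if (Character.isWhitespace((char) c)) {
          if (term.length() > 0) break;    // a complete token has been collected
          continue;                        // skip leading whitespace
        }
        term.append((char) c);
      }
      if (term.length() == 0) return null; // no more characters: signal end of stream
      result.setTermBuffer(term.toString().toCharArray(), 0, term.length());
      return result;
    }
  }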


Field Summary
protected  Reader input
          The text source for this Tokenizer.
 
Constructor Summary
protected Tokenizer()
          Construct a tokenizer with null input.
protected Tokenizer(Reader input)
          Construct a token stream processing the given input.
 
Method Summary
 void close()
          By default, closes the input Reader.
 void reset(Reader input)
          Expert: Reset the tokenizer to a new reader.
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
next, next, reset
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

input

protected Reader input
The text source for this Tokenizer.

Constructor Detail

Tokenizer

protected Tokenizer()
Construct a tokenizer with null input.


Tokenizer

protected Tokenizer(Reader input)
Construct a token stream processing the given input.

Method Detail

close

public void close()
           throws IOException
By default, closes the input Reader.

Overrides:
close in class TokenStream
Throws:
IOException
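
A short usage sketch (hypothetical class name TokenizerCloseExample; WhitespaceTokenizer is a real Lucene tokenizer): tokens are pulled with next(Token), and close() is called in a finally block so the underlying Reader is released even if tokenization fails.

  import java.io.IOException;
  import java.io.StringReader;

  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.Tokenizer;
  import org.apache.lucene.analysis.WhitespaceTokenizer;

  public class TokenizerCloseExample {
    public static void main(String[] args) throws IOException {
      Tokenizer tokenizer = new WhitespaceTokenizer(new StringReader("a quick brown fox"));
      try {
        final Token reusableToken = new Token();
        for (Token t = tokenizer.next(reusableToken); t != null; t = tokenizer.next(reusableToken)) {
          System.out.println(new String(t.termBuffer(), 0, t.termLength()));
        }
      } finally {
        tokenizer.close();  // also closes the StringReader passed to the constructor
      }
    }
  }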

reset

public void reset(Reader input)
           throws IOException
Expert: Reset the tokenizer to a new reader. Typically, an analyzer (in its reusableTokenStream method) will use this to re-use a previously created tokenizer.

Throws:
IOException
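
A sketch of the pattern described above, assuming the per-thread caching helpers getPreviousTokenStream()/setPreviousTokenStream() on Analyzer: a hypothetical ReusableWhitespaceAnalyzer creates its Tokenizer once per thread and calls reset(Reader) on later invocations instead of constructing a new one.

  import java.io.IOException;
  import java.io.Reader;

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.Tokenizer;
  import org.apache.lucene.analysis.WhitespaceTokenizer;

  public class ReusableWhitespaceAnalyzer extends Analyzer {

    public TokenStream tokenStream(String fieldName, Reader reader) {
      // non-reusing path: always build a fresh tokenizer
      return new WhitespaceTokenizer(reader);
    }

    public TokenStream reusableTokenStream(String fieldName, Reader reader)
        throws IOException {
      Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
      if (tokenizer == null) {
        // first call on this thread: create the tokenizer and remember it
        tokenizer = new WhitespaceTokenizer(reader);
        setPreviousTokenStream(tokenizer);
      } else {
        // later calls: re-point the cached tokenizer at the new reader
        tokenizer.reset(reader);
      }
      return tokenizer;
    }
  }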


Copyright © 2000-2008 Apache Software Foundation. All Rights Reserved.