org.apache.lucene.analysis.cn
Class ChineseTokenizer

java.lang.Object
  extended by org.apache.lucene.analysis.TokenStream
      extended by org.apache.lucene.analysis.Tokenizer
          extended by org.apache.lucene.analysis.cn.ChineseTokenizer

public final class ChineseTokenizer
extends Tokenizer

Title: ChineseTokenizer
Description: Extracts tokens from the stream using Character.getType().
Rule: Each Chinese character is a single token.
Copyright: Copyright (c) 2001
Company:

The difference between the ChineseTokenizer and the CJKTokenizer (id=23545) is their token parsing logic. For example, if the Chinese text "C1C2C3C4" is indexed, the ChineseTokenizer returns the tokens C1, C2, C3, C4, whereas the CJKTokenizer returns C1C2, C2C3, C3C4. The index created by the CJKTokenizer is therefore much larger. The problem is that when searching for C1, C1C2, C1C3, C4C2, C1C2C3 ... the ChineseTokenizer works, but the CJKTokenizer will not work.
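
The difference in parsing logic described above can be sketched in plain Java. This is an illustrative stand-in, not the Lucene implementation: the class TokenizationSketch and its methods unigrams and bigrams are hypothetical names, the sample text "中文搜索" plays the role of C1C2C3C4, and the sketch assumes every character fits in a single Java char.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only: this is not the Lucene implementation.
final class TokenizationSketch {

    // One token per character, as described for the ChineseTokenizer.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    // Overlapping two-character tokens, as described for the CJKTokenizer.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "中文搜索"; // plays the role of C1C2C3C4
        System.out.println(unigrams(text)); // [中, 文, 搜, 索]
        System.out.println(bigrams(text));  // [中文, 文搜, 搜索]
        // The single character C1 is a term only in the unigram output.
        System.out.println(unigrams(text).contains("中")); // true
        System.out.println(bigrams(text).contains("中"));  // false
    }
}
```

A query for the single character C1 matches an indexed term only in the one-character scheme, which is the kind of search the description says the CJKTokenizer cannot serve.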

Version:
1.0
Author:
Yiyi Sun

Field Summary
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Constructor Summary
ChineseTokenizer(Reader in)
           
 
Method Summary
 Token next()
          Returns the next token in the stream, or null at EOS.
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, reset
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
next, reset
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ChineseTokenizer

public ChineseTokenizer(Reader in)
Method Detail

next

public final Token next()
                 throws IOException
Description copied from class: TokenStream
Returns the next token in the stream, or null at EOS. The returned Token is a "full private copy" (not re-used across calls to next()), but calling this method is slower than calling TokenStream.next(Token).

Overrides:
next in class TokenStream
Throws:
IOException
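
The null-at-EOS contract of next() leads to a simple consuming loop, sketched below without any Lucene dependency. SingleCharStreamSketch is a hypothetical stand-in, not a Lucene class, and it returns String rather than Token. Its rule follows the class description (Character.getType(), one token per Chinese character); grouping runs of Latin letters and digits into one token and treating everything else as a separator are assumptions made for the sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token stream; not a Lucene class.
final class SingleCharStreamSketch {
    private static final int NONE = -2; // no character pushed back
    private final Reader input;
    private int pending = NONE;

    SingleCharStreamSketch(Reader input) {
        this.input = input;
    }

    private int read() throws IOException {
        if (pending != NONE) {
            int c = pending;
            pending = NONE;
            return c;
        }
        return input.read();
    }

    // Returns the next token, or null at end of stream.
    String next() throws IOException {
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = read()) != -1) {
            char ch = (char) c;
            switch (Character.getType(ch)) {
                case Character.OTHER_LETTER: // e.g. a Chinese character
                    if (word.length() > 0) {
                        pending = c; // emit the pending word first
                        return word.toString();
                    }
                    return String.valueOf(ch);
                case Character.DECIMAL_DIGIT_NUMBER:
                case Character.LOWERCASE_LETTER:
                case Character.UPPERCASE_LETTER:
                    word.append(ch); // assumption: group into one token
                    break;
                default: // assumption: anything else separates tokens
                    if (word.length() > 0) {
                        return word.toString();
                    }
            }
        }
        return word.length() > 0 ? word.toString() : null;
    }

    // Consumes the stream until next() returns null.
    static List<String> drain(String text) {
        SingleCharStreamSketch stream =
                new SingleCharStreamSketch(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        try {
            String token;
            while ((token = stream.next()) != null) {
                tokens.add(token);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(drain("Lucene 中文 2.4")); // [Lucene, 中, 文, 2, 4]
    }
}
```

With the real class the loop has the same shape: call next() repeatedly and stop at the first null, handling the declared IOException.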


Copyright © 2000-2008 Apache Software Foundation. All Rights Reserved.