Uses of Class
org.apache.lucene.analysis.Tokenizer

Packages that use Tokenizer
org.apache.lucene.analysis           API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.cjk       Analyzer for Chinese, Japanese, and Korean.
org.apache.lucene.analysis.cn        Analyzer for Chinese.
org.apache.lucene.analysis.ngram     Tokenizers that break the input into n-gram tokens.
org.apache.lucene.analysis.ru        Analyzer for Russian.
org.apache.lucene.analysis.sinks     Implementations of SinkTokenizer that might be useful.
org.apache.lucene.analysis.standard  A fast grammar-based tokenizer constructed with JFlex.
org.apache.lucene.wikipedia.analysis Tokenizer for Wikipedia syntax.
 

Uses of Tokenizer in org.apache.lucene.analysis
 

Subclasses of Tokenizer in org.apache.lucene.analysis
 class CharTokenizer
          An abstract base class for simple, character-oriented tokenizers.
 class KeywordTokenizer
          Emits the entire input as a single token.
 class LetterTokenizer
          A LetterTokenizer is a tokenizer that divides text at non-letters.
 class LowerCaseTokenizer
          LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together.
 class SinkTokenizer
          A SinkTokenizer can be used to cache Tokens for later use in an Analyzer.
 class WhitespaceTokenizer
          A WhitespaceTokenizer is a tokenizer that divides text at whitespace.
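
For example, WhitespaceTokenizer and LowerCaseTokenizer can be compared directly. A minimal sketch against the Lucene 2.x TokenStream API, where next() returns each Token until null; the sample text and demo class name are illustrative:

    import java.io.StringReader;

    import org.apache.lucene.analysis.LowerCaseTokenizer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.WhitespaceTokenizer;

    public class CoreTokenizerDemo {
        public static void main(String[] args) throws Exception {
            String text = "The Quick  BROWN Fox";
            // Splits on whitespace only: The, Quick, BROWN, Fox
            dump(new WhitespaceTokenizer(new StringReader(text)));
            // Splits at non-letters and lower-cases: the, quick, brown, fox
            dump(new LowerCaseTokenizer(new StringReader(text)));
        }

        static void dump(Tokenizer tokenizer) throws Exception {
            for (Token t = tokenizer.next(); t != null; t = tokenizer.next()) {
                System.out.println(new String(t.termBuffer(), 0, t.termLength()));
            }
            tokenizer.close();
        }
    }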
 

Uses of Tokenizer in org.apache.lucene.analysis.cjk
 

Subclasses of Tokenizer in org.apache.lucene.analysis.cjk
 class CJKTokenizer
          CJKTokenizer was adapted from StopTokenizer, which does a decent job for most European languages; for CJK text it instead emits overlapping two-character tokens.
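
A minimal sketch of CJKTokenizer in use, assuming the contrib analyzers jar is on the classpath; the sample text and demo class name are illustrative:

    import java.io.StringReader;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.cjk.CJKTokenizer;

    public class CjkDemo {
        public static void main(String[] args) throws Exception {
            CJKTokenizer tokenizer = new CJKTokenizer(new StringReader("中文分词 test"));
            // CJK runs come out as overlapping two-character tokens
            // (中文, 文分, 分词); Latin runs come out as whole words.
            for (Token t = tokenizer.next(); t != null; t = tokenizer.next()) {
                System.out.println(new String(t.termBuffer(), 0, t.termLength()));
            }
            tokenizer.close();
        }
    }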
 

Uses of Tokenizer in org.apache.lucene.analysis.cn
 

Subclasses of Tokenizer in org.apache.lucene.analysis.cn
 class ChineseTokenizer
          Extracts tokens from the stream using Character.getType(), treating each Chinese character as a single token. ChineseTokenizer and CJKTokenizer differ in their token parsing logic: ChineseTokenizer emits one token per Chinese character, whereas CJKTokenizer emits overlapping two-character tokens.
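
A minimal sketch showing the one-character-per-token rule, again assuming the contrib analyzers jar; the sample text and demo class name are illustrative:

    import java.io.StringReader;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.cn.ChineseTokenizer;

    public class ChineseDemo {
        public static void main(String[] args) throws Exception {
            ChineseTokenizer tokenizer = new ChineseTokenizer(new StringReader("中文测试 ok"));
            // Each Chinese character becomes its own token (中, 文, 测, 试),
            // while runs of Latin letters are kept together.
            for (Token t = tokenizer.next(); t != null; t = tokenizer.next()) {
                System.out.println(new String(t.termBuffer(), 0, t.termLength()));
            }
            tokenizer.close();
        }
    }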
 

Uses of Tokenizer in org.apache.lucene.analysis.ngram
 

Subclasses of Tokenizer in org.apache.lucene.analysis.ngram
 class EdgeNGramTokenizer
          Tokenizes the input from an edge into n-grams of the given size(s).
 class NGramTokenizer
          Tokenizes the input into n-grams of the given size(s).
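
A minimal sketch of both tokenizers; the gram sizes, sample text, and demo class name are illustrative, and EdgeNGramTokenizer.Side.FRONT is assumed to select the leading edge as in the contrib API of this era:

    import java.io.StringReader;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.ngram.EdgeNGramTokenizer;
    import org.apache.lucene.analysis.ngram.NGramTokenizer;

    public class NGramDemo {
        public static void main(String[] args) throws Exception {
            // All n-grams of length 1 to 2 over the input: f, o, x, fo, ox
            NGramTokenizer ngrams = new NGramTokenizer(new StringReader("fox"), 1, 2);
            for (Token t = ngrams.next(); t != null; t = ngrams.next()) {
                System.out.println(new String(t.termBuffer(), 0, t.termLength()));
            }
            // N-grams anchored at the front edge only: f, fo, fox
            EdgeNGramTokenizer edges = new EdgeNGramTokenizer(
                    new StringReader("fox"), EdgeNGramTokenizer.Side.FRONT, 1, 3);
            for (Token t = edges.next(); t != null; t = edges.next()) {
                System.out.println(new String(t.termBuffer(), 0, t.termLength()));
            }
        }
    }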
 

Uses of Tokenizer in org.apache.lucene.analysis.ru
 

Subclasses of Tokenizer in org.apache.lucene.analysis.ru
 class RussianLetterTokenizer
          A RussianLetterTokenizer is a tokenizer that extends LetterTokenizer by additionally looking up letters in a given "Russian charset" table.
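
A minimal sketch, assuming the RussianCharsets.UnicodeRussian table from the same contrib package supplies the charset argument; the sample text and demo class name are illustrative:

    import java.io.StringReader;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.ru.RussianCharsets;
    import org.apache.lucene.analysis.ru.RussianLetterTokenizer;

    public class RussianDemo {
        public static void main(String[] args) throws Exception {
            RussianLetterTokenizer tokenizer = new RussianLetterTokenizer(
                    new StringReader("Быстрая лиса"),
                    RussianCharsets.UnicodeRussian);
            // Tokens break at characters not found in the supplied charset table.
            for (Token t = tokenizer.next(); t != null; t = tokenizer.next()) {
                System.out.println(new String(t.termBuffer(), 0, t.termLength()));
            }
            tokenizer.close();
        }
    }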
 

Uses of Tokenizer in org.apache.lucene.analysis.sinks
 

Subclasses of Tokenizer in org.apache.lucene.analysis.sinks
 class DateRecognizerSinkTokenizer
          Attempts to parse the Token.termBuffer() as a Date using a DateFormat.
 class TokenRangeSinkTokenizer
          Counts the tokens as they go by and saves to its internal list those whose position falls between lower and upper, exclusive of upper.
 class TokenTypeSinkTokenizer
          If Token.type() matches the passed-in typeToMatch, adds the token to the sink.
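
These sink classes are meant to be paired with TeeTokenFilter from the core analysis package: tokens flow through the tee, and the sink caches the ones it recognizes for later replay. A minimal sketch using DateRecognizerSinkTokenizer with an explicit DateFormat; the date pattern, sample text, and demo class name are illustrative:

    import java.io.StringReader;
    import java.text.SimpleDateFormat;

    import org.apache.lucene.analysis.TeeTokenFilter;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;
    import org.apache.lucene.analysis.sinks.DateRecognizerSinkTokenizer;

    public class SinkDemo {
        public static void main(String[] args) throws Exception {
            DateRecognizerSinkTokenizer dates =
                    new DateRecognizerSinkTokenizer(new SimpleDateFormat("MM/dd/yyyy"));
            TokenStream stream = new TeeTokenFilter(
                    new WhitespaceTokenizer(new StringReader("shipped 12/25/2007 by air")),
                    dates);
            // Draining the main stream pushes every token past the sink.
            while (stream.next() != null) { }
            // The sink replays only the tokens it could parse as dates.
            for (Token t = dates.next(); t != null; t = dates.next()) {
                System.out.println(new String(t.termBuffer(), 0, t.termLength()));
            }
        }
    }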
 

Uses of Tokenizer in org.apache.lucene.analysis.standard
 

Subclasses of Tokenizer in org.apache.lucene.analysis.standard
 class StandardTokenizer
          A grammar-based tokenizer constructed with JFlex.
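
A minimal sketch showing the token types the JFlex grammar assigns via Token.type(); the sample text and demo class name are illustrative, and the exact type labels (e.g. <ALPHANUM>, <NUM>) come from StandardTokenizer's grammar:

    import java.io.StringReader;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.standard.StandardTokenizer;

    public class StandardDemo {
        public static void main(String[] args) throws Exception {
            StandardTokenizer tokenizer = new StandardTokenizer(
                    new StringReader("visit example.com, room 42"));
            for (Token t = tokenizer.next(); t != null; t = tokenizer.next()) {
                System.out.println(
                        new String(t.termBuffer(), 0, t.termLength()) + "\t" + t.type());
            }
            tokenizer.close();
        }
    }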
 

Uses of Tokenizer in org.apache.lucene.wikipedia.analysis
 

Subclasses of Tokenizer in org.apache.lucene.wikipedia.analysis
 class WikipediaTokenizer
          Extension of StandardTokenizer that is aware of Wikipedia syntax.
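
A minimal sketch, assuming the contrib wikipedia jar is on the classpath; the markup sample and demo class name are illustrative, and the point is that Wikipedia constructs such as internal links receive their own token types:

    import java.io.StringReader;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.wikipedia.analysis.WikipediaTokenizer;

    public class WikipediaDemo {
        public static void main(String[] args) throws Exception {
            WikipediaTokenizer tokenizer = new WikipediaTokenizer(
                    new StringReader("'''Lucene''' is a [[search engine]] library."));
            for (Token t = tokenizer.next(); t != null; t = tokenizer.next()) {
                // type() distinguishes, e.g., internal-link tokens from plain words.
                System.out.println(
                        new String(t.termBuffer(), 0, t.termLength()) + "\t" + t.type());
            }
            tokenizer.close();
        }
    }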
 



Copyright © 2000-2008 Apache Software Foundation. All Rights Reserved.