Packages that use Tokenizer

Package | Description
---|---
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean.
org.apache.lucene.analysis.cn | Analyzer for Chinese.
org.apache.lucene.analysis.ngram | Tokenizers that break the input into character n-grams.
org.apache.lucene.analysis.ru | Analyzer for Russian.
org.apache.lucene.analysis.sinks | Implementations of SinkTokenizer that might be useful.
org.apache.lucene.analysis.standard | A fast grammar-based tokenizer constructed with JFlex.
org.apache.lucene.wikipedia.analysis | Tokenizer that is aware of Wikipedia syntax.
Uses of Tokenizer in org.apache.lucene.analysis

Subclasses of Tokenizer in org.apache.lucene.analysis:

Class | Description
---|---
CharTokenizer | An abstract base class for simple, character-oriented tokenizers.
KeywordTokenizer | Emits the entire input as a single token.
LetterTokenizer | A tokenizer that divides text at non-letters.
LowerCaseTokenizer | Performs the function of LetterTokenizer and LowerCaseFilter together.
SinkTokenizer | Can be used to cache Tokens for later use in an Analyzer.
WhitespaceTokenizer | A tokenizer that divides text at whitespace.
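For orientation, here is a minimal sketch of driving one of these tokenizers by hand, assuming the Lucene 2.x Token-based API (next() returns the next Token, or null at end of stream); the sample text and class name are illustrative:

```java
import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public class CoreTokenizerDemo {
    public static void main(String[] args) throws Exception {
        // WhitespaceTokenizer splits on whitespace only, preserving case and punctuation:
        // [The] [Quick] [brown] [FOX!]
        Tokenizer tokenizer = new WhitespaceTokenizer(new StringReader("The Quick brown FOX!"));
        for (Token tok = tokenizer.next(); tok != null; tok = tokenizer.next()) {
            System.out.println(tok.termText());
        }
        tokenizer.close();
    }
}
```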
Uses of Tokenizer in org.apache.lucene.analysis.cjk

Subclasses of Tokenizer in org.apache.lucene.analysis.cjk:

Class | Description
---|---
CJKTokenizer | Adapted from StopTokenizer, which does a decent job for most European languages.
Uses of Tokenizer in org.apache.lucene.analysis.cn

Subclasses of Tokenizer in org.apache.lucene.analysis.cn:

Class | Description
---|---
ChineseTokenizer | Extracts tokens from the stream using Character.getType(), treating each Chinese character as a single token. ChineseTokenizer and CJKTokenizer (id=23545) differ in their token parsing logic.
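The difference in parsing logic is easiest to see side by side. A minimal sketch, assuming the Lucene 2.x contrib analyzers and the Token-based next() API; the sample string is illustrative:

```java
import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cjk.CJKTokenizer;
import org.apache.lucene.analysis.cn.ChineseTokenizer;

public class CjkVsCnDemo {
    static void dump(Tokenizer tokenizer) throws Exception {
        for (Token tok = tokenizer.next(); tok != null; tok = tokenizer.next()) {
            System.out.print("[" + tok.termText() + "] ");
        }
        System.out.println();
        tokenizer.close();
    }

    public static void main(String[] args) throws Exception {
        String text = "中文分词";
        // CJKTokenizer emits overlapping two-character tokens: [中文] [文分] [分词]
        dump(new CJKTokenizer(new StringReader(text)));
        // ChineseTokenizer emits one token per character: [中] [文] [分] [词]
        dump(new ChineseTokenizer(new StringReader(text)));
    }
}
```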
Uses of Tokenizer in org.apache.lucene.analysis.ngram

Subclasses of Tokenizer in org.apache.lucene.analysis.ngram:

Class | Description
---|---
EdgeNGramTokenizer | Tokenizes the input from an edge into n-grams of given size(s).
NGramTokenizer | Tokenizes the input into n-grams of the given size(s).
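A minimal sketch of both n-gram tokenizers, assuming the Lucene 2.x contrib constructors that take minimum and maximum gram sizes; the input string is illustrative:

```java
import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenizer;
import org.apache.lucene.analysis.ngram.NGramTokenizer;

public class NGramDemo {
    public static void main(String[] args) throws Exception {
        // All character n-grams of length 2 to 3: lu uc ce en ne, then luc uce cen ene
        NGramTokenizer ngrams = new NGramTokenizer(new StringReader("lucene"), 2, 3);
        for (Token tok = ngrams.next(); tok != null; tok = ngrams.next()) {
            System.out.print(tok.termText() + " ");
        }
        ngrams.close();
        System.out.println();

        // N-grams anchored at the front edge only: lu luc luce
        EdgeNGramTokenizer edges = new EdgeNGramTokenizer(
                new StringReader("lucene"), EdgeNGramTokenizer.Side.FRONT, 2, 4);
        for (Token tok = edges.next(); tok != null; tok = edges.next()) {
            System.out.print(tok.termText() + " ");
        }
        edges.close();
    }
}
```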
Uses of Tokenizer in org.apache.lucene.analysis.ru

Subclasses of Tokenizer in org.apache.lucene.analysis.ru:

Class | Description
---|---
RussianLetterTokenizer | Extends LetterTokenizer by additionally looking up letters in a given "russian charset".
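A minimal sketch, assuming the Lucene 2.x contrib constructor that takes a Reader plus a charset table such as RussianCharsets.UnicodeRussian; the input string is illustrative:

```java
import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.ru.RussianCharsets;
import org.apache.lucene.analysis.ru.RussianLetterTokenizer;

public class RussianTokenizerDemo {
    public static void main(String[] args) throws Exception {
        // Letters are looked up in the supplied charset table, here Unicode Russian,
        // so Cyrillic words are kept intact while punctuation splits tokens.
        RussianLetterTokenizer tokenizer = new RussianLetterTokenizer(
                new StringReader("Привет, мир!"), RussianCharsets.UnicodeRussian);
        for (Token tok = tokenizer.next(); tok != null; tok = tokenizer.next()) {
            System.out.println(tok.termText());
        }
        tokenizer.close();
    }
}
```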
Uses of Tokenizer in org.apache.lucene.analysis.sinks

Subclasses of Tokenizer in org.apache.lucene.analysis.sinks:

Class | Description
---|---
DateRecognizerSinkTokenizer | Attempts to parse the Token.termBuffer() as a Date using a DateFormat.
TokenRangeSinkTokenizer | Counts the tokens as they go by and saves to the internal list those between lower (inclusive) and upper (exclusive).
TokenTypeSinkTokenizer | Adds a token to the sink if its Token.type() matches the passed-in typeToMatch.
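These sinks are typically fed through a TeeTokenFilter, which forwards every token downstream while offering each one to the sink. A minimal sketch, assuming the Lucene 2.x TeeTokenFilter/SinkTokenizer API; the date format and sample text are illustrative:

```java
import java.io.StringReader;
import java.text.SimpleDateFormat;
import org.apache.lucene.analysis.SinkTokenizer;
import org.apache.lucene.analysis.TeeTokenFilter;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.sinks.DateRecognizerSinkTokenizer;

public class SinkDemo {
    public static void main(String[] args) throws Exception {
        // The sink keeps only tokens that parse as dates in the given format.
        SinkTokenizer dates = new DateRecognizerSinkTokenizer(new SimpleDateFormat("MM/dd/yyyy"));
        // TeeTokenFilter passes every token downstream and offers it to the sink.
        TokenStream stream = new TeeTokenFilter(
                new WhitespaceTokenizer(new StringReader("shipped 01/15/2008 arrived 02/03/2008")),
                dates);
        while (stream.next() != null) {
            // Exhaust the main stream so the sink gets populated.
        }
        // Replay the cached date tokens from the sink: 01/15/2008, 02/03/2008
        for (Token tok = dates.next(); tok != null; tok = dates.next()) {
            System.out.println(tok.termText());
        }
    }
}
```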
Uses of Tokenizer in org.apache.lucene.analysis.standard

Subclasses of Tokenizer in org.apache.lucene.analysis.standard:

Class | Description
---|---
StandardTokenizer | A grammar-based tokenizer constructed with JFlex.
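A minimal sketch showing the token types the grammar assigns, assuming the Lucene 2.x Token-based API; the sample text is illustrative:

```java
import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class StandardTokenizerDemo {
    public static void main(String[] args) throws Exception {
        StandardTokenizer tokenizer = new StandardTokenizer(
                new StringReader("Send 2 copies to lucene@apache.org"));
        for (Token tok = tokenizer.next(); tok != null; tok = tokenizer.next()) {
            // The grammar assigns each token a type such as <ALPHANUM>, <NUM>, or <EMAIL>.
            System.out.println(tok.termText() + "\t" + tok.type());
        }
        tokenizer.close();
    }
}
```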
Uses of Tokenizer in org.apache.lucene.wikipedia.analysis

Subclasses of Tokenizer in org.apache.lucene.wikipedia.analysis:

Class | Description
---|---
WikipediaTokenizer | Extension of StandardTokenizer that is aware of Wikipedia syntax.
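A minimal sketch, assuming the Lucene 2.x contrib API; the markup sample is illustrative, and markup-derived tokens carry the type constants the class defines (e.g. WikipediaTokenizer.BOLD, WikipediaTokenizer.INTERNAL_LINK):

```java
import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.wikipedia.analysis.WikipediaTokenizer;

public class WikipediaTokenizerDemo {
    public static void main(String[] args) throws Exception {
        // Wikipedia markup: a bold word and an internal link.
        WikipediaTokenizer tokenizer = new WikipediaTokenizer(
                new StringReader("'''Lucene''' is hosted at [[Apache Software Foundation]]"));
        for (Token tok = tokenizer.next(); tok != null; tok = tokenizer.next()) {
            // Tokens from markup get Wikipedia-specific types; plain words keep
            // the standard grammar types.
            System.out.println(tok.termText() + "\t" + tok.type());
        }
        tokenizer.close();
    }
}
```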
|