org.apache.lucene.benchmark.byTask.feeds
Class BasicDocMaker

java.lang.Object
  extended by org.apache.lucene.benchmark.byTask.feeds.BasicDocMaker
All Implemented Interfaces:
DocMaker
Direct Known Subclasses:
DirDocMaker, LineDocMaker, ReutersDocMaker, SimpleDocMaker, TrecDocMaker

public abstract class BasicDocMaker
extends Object
implements DocMaker

Creates documents for the test. Maintains counters (of chars etc.) so that subclasses need only provide the textual content; creating documents of a requested size is handled here.

Config Params (default is in caps):
doc.stored=true|FALSE
doc.tokenized=TRUE|false
doc.term.vector=true|FALSE
doc.term.vector.positions=true|FALSE
doc.term.vector.offsets=true|FALSE
doc.store.body.bytes=true|FALSE // Store the raw UTF-8 bytes of the body contents as a field
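
As a rough illustration of how the parameters above and their defaults could be resolved, the following sketch reads them from a plain java.util.Properties object. The class DocFieldOptionsSketch and its flag helper are hypothetical names introduced here; the benchmark itself reads these keys through its own Config class, whose API is not shown on this page.

```java
import java.util.Properties;

// Hypothetical stand-in: the real benchmark reads these keys through its Config class.
class DocFieldOptionsSketch {
    final boolean stored;
    final boolean tokenized;
    final boolean termVector;
    final boolean termVectorPositions;
    final boolean termVectorOffsets;
    final boolean storeBodyBytes;

    DocFieldOptionsSketch(Properties props) {
        // Defaults mirror the capitalised values in the parameter list.
        stored = flag(props, "doc.stored", false);
        tokenized = flag(props, "doc.tokenized", true);
        termVector = flag(props, "doc.term.vector", false);
        termVectorPositions = flag(props, "doc.term.vector.positions", false);
        termVectorOffsets = flag(props, "doc.term.vector.offsets", false);
        storeBodyBytes = flag(props, "doc.store.body.bytes", false);
    }

    // Returns the parsed value of key, or dflt when the key is absent.
    private static boolean flag(Properties props, String key, boolean dflt) {
        String value = props.getProperty(key);
        return value == null ? dflt : Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("doc.stored", "true");
        DocFieldOptionsSketch opts = new DocFieldOptionsSketch(props);
        System.out.println("stored=" + opts.stored + ", tokenized=" + opts.tokenized);
    }
}
```

Only keys that are explicitly set override the defaults, so an empty Properties object yields the capitalised values from the list.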


Field Summary
static String BODY_FIELD
           
static String BYTES_FIELD
           
protected  Config config
           
static String DATE_FIELD
           
protected  boolean forever
           
static String ID_FIELD
           
protected  Field.Index indexVal
           
static String NAME_FIELD
           
protected  Field.Store storeVal
           
protected  Field.TermVector termVecVal
           
static String TITLE_FIELD
           
 
Constructor Summary
BasicDocMaker()
           
 
Method Summary
protected  void addBytes(long n)
           
protected  void addUniqueBytes(long n)
           
protected  void collectFiles(File f, ArrayList inputFiles)
           
 long getByteCount()
          Return total byte size of docs made since last reset.
 int getCount()
          Return number of docs made since last reset.
 HTMLParser getHtmlParser()
          Returns the htmlParser.
protected abstract  DocData getNextDocData()
          Return the data of the next document.
 Document makeDocument()
          Create the next document.
 Document makeDocument(int size)
          Create the next document, of the given size by input bytes.
 long numUniqueBytes()
          Return total bytes of all available unique texts, or 0 if not applicable.
 void printDocStatistics()
          Print some statistics on docs available/added/etc.
 void resetInputs()
          Reset inputs so that the test run would behave, input wise, as if it just started.
protected  void resetUniqueBytes()
           
 void setConfig(Config config)
          Set the properties.
 void setHTMLParser(HTMLParser htmlParser)
          Set the HTML parser to use, when appropriate.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.lucene.benchmark.byTask.feeds.DocMaker
numUniqueTexts
 

Field Detail

forever

protected boolean forever

BODY_FIELD

public static final String BODY_FIELD
See Also:
Constant Field Values

TITLE_FIELD

public static final String TITLE_FIELD
See Also:
Constant Field Values

DATE_FIELD

public static final String DATE_FIELD
See Also:
Constant Field Values

ID_FIELD

public static final String ID_FIELD
See Also:
Constant Field Values

BYTES_FIELD

public static final String BYTES_FIELD
See Also:
Constant Field Values

NAME_FIELD

public static final String NAME_FIELD
See Also:
Constant Field Values

config

protected Config config

storeVal

protected Field.Store storeVal

indexVal

protected Field.Index indexVal

termVecVal

protected Field.TermVector termVecVal
Constructor Detail

BasicDocMaker

public BasicDocMaker()
Method Detail

getNextDocData

protected abstract DocData getNextDocData()
                                   throws NoMoreDataException,
                                          Exception
Return the data of the next document. All current implementations can create docs forever: when the input data is exhausted, the input files are iterated again from the start. This re-iteration can be avoided by setting doc.maker.forever to false (default is true).

Returns:
data of the next document.
Throws:
NoMoreDataException - if data is exhausted (and 'forever' is set to false).
Exception - if the next doc data cannot be created.
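
The re-iteration behaviour described above can be sketched without any Lucene types. In the example below, CyclingSourceSketch and NoMoreDataSketchException are hypothetical stand-ins for a concrete doc maker and for NoMoreDataException; the in-memory list of strings stands in for the input files.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the benchmark's NoMoreDataException.
class NoMoreDataSketchException extends Exception {
    NoMoreDataSketchException() {
        super("input exhausted and forever=false");
    }
}

// Hypothetical source that hands out texts from a fixed list.
class CyclingSourceSketch {
    private final List<String> texts;
    private final boolean forever; // plays the role of doc.maker.forever
    private int next = 0;

    CyclingSourceSketch(List<String> texts, boolean forever) {
        this.texts = texts;
        this.forever = forever;
    }

    // Returns the next text, restarting from the first one when 'forever' is set.
    String nextText() throws NoMoreDataSketchException {
        if (next >= texts.size()) {
            if (!forever || texts.isEmpty()) {
                throw new NoMoreDataSketchException();
            }
            next = 0; // start another pass over the same inputs
        }
        return texts.get(next++);
    }

    public static void main(String[] args) throws Exception {
        CyclingSourceSketch src = new CyclingSourceSketch(Arrays.asList("a", "b"), true);
        for (int i = 0; i < 5; i++) {
            System.out.print(src.nextText() + " ");
        }
        System.out.println();
    }
}
```

With forever set to false, the same source throws on the first call after the last text, which lets a caller detect the end of the input and stop.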

makeDocument

public Document makeDocument()
                      throws Exception
Description copied from interface: DocMaker
Create the next document.

Specified by:
makeDocument in interface DocMaker
Throws:
Exception

makeDocument

public Document makeDocument(int size)
                      throws Exception
Description copied from interface: DocMaker
Create the next document, of the given size by input bytes. If the implementation does not support control over size, an exception is thrown.

Specified by:
makeDocument in interface DocMaker
Parameters:
size - size of document, or 0 if there is no size requirement.
Throws:
Exception
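
This page does not describe how a document of a given size is assembled, so the following is only one plausible strategy, sketched under stated assumptions: texts are concatenated until the requested size is reached, the result is cut at that size, and the remainder is kept for the next call. SizedBodySketch and nextBody are hypothetical names, and character counts stand in for input bytes.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper; not part of the benchmark API.
class SizedBodySketch {
    private final Iterator<String> source;
    private String leftover = ""; // text cut off from the previous body

    SizedBodySketch(Iterator<String> source) {
        this.source = source;
    }

    // Returns a body of at most 'size' chars; size 0 means no size requirement.
    String nextBody(int size) {
        if (size <= 0) {
            if (!leftover.isEmpty()) {
                String rest = leftover;
                leftover = "";
                return rest;
            }
            return source.next();
        }
        StringBuilder sb = new StringBuilder(leftover);
        // Keep appending whole texts until the requested size is covered.
        while (sb.length() < size && source.hasNext()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(source.next());
        }
        int cut = Math.min(size, sb.length());
        leftover = sb.substring(cut); // carried over to the next call
        return sb.substring(0, cut);
    }

    public static void main(String[] args) {
        SizedBodySketch bodies =
            new SizedBodySketch(Arrays.asList("hello", "world").iterator());
        System.out.println(bodies.nextBody(7));
        System.out.println(bodies.nextBody(7));
    }
}
```

The body comes out shorter than requested only when the source runs dry, as in the second call in main.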

setConfig

public void setConfig(Config config)
Description copied from interface: DocMaker
Set the properties.

Specified by:
setConfig in interface DocMaker

resetInputs

public void resetInputs()
Description copied from interface: DocMaker
Reset inputs so that the test run would behave, input wise, as if it just started.

Specified by:
resetInputs in interface DocMaker

numUniqueBytes

public long numUniqueBytes()
Description copied from interface: DocMaker
Return total bytes of all available unique texts, or 0 if not applicable.

Specified by:
numUniqueBytes in interface DocMaker

getCount

public int getCount()
Description copied from interface: DocMaker
Return number of docs made since last reset.

Specified by:
getCount in interface DocMaker

getByteCount

public long getByteCount()
Description copied from interface: DocMaker
Return total byte size of docs made since last reset.

Specified by:
getByteCount in interface DocMaker
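
The "since last reset" semantics of getCount and getByteCount can be illustrated with a small counter holder. DocCountersSketch, recordDoc and reset are hypothetical names; the sketch assumes that resetting the inputs also zeroes both counters, which this page implies but does not state outright.

```java
// Hypothetical counter holder; names are illustrative, not the benchmark's fields.
class DocCountersSketch {
    private int docCount;   // docs made since last reset
    private long byteCount; // total byte size of docs made since last reset

    // Record one created document of the given byte size.
    synchronized void recordDoc(long bytes) {
        docCount++;
        byteCount += bytes;
    }

    synchronized int getCount() {
        return docCount;
    }

    synchronized long getByteCount() {
        return byteCount;
    }

    // Zero the per-run counters, as a reset of the inputs is assumed to do.
    synchronized void reset() {
        docCount = 0;
        byteCount = 0;
    }

    public static void main(String[] args) {
        DocCountersSketch counters = new DocCountersSketch();
        counters.recordDoc(120);
        counters.recordDoc(80);
        System.out.println(counters.getCount() + " docs, " + counters.getByteCount() + " bytes");
        counters.reset();
        System.out.println(counters.getCount() + " docs, " + counters.getByteCount() + " bytes");
    }
}
```

The methods are synchronized because a benchmark may create documents from several threads; whether the real class does the same is an assumption.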

addUniqueBytes

protected void addUniqueBytes(long n)

resetUniqueBytes

protected void resetUniqueBytes()

addBytes

protected void addBytes(long n)

printDocStatistics

public void printDocStatistics()
Description copied from interface: DocMaker
Print some statistics on docs available/added/etc.

Specified by:
printDocStatistics in interface DocMaker

collectFiles

protected void collectFiles(File f,
                            ArrayList inputFiles)

setHTMLParser

public void setHTMLParser(HTMLParser htmlParser)
Description copied from interface: DocMaker
Set the HTML parser to use, when appropriate.

Specified by:
setHTMLParser in interface DocMaker

getHtmlParser

public HTMLParser getHtmlParser()
Description copied from interface: DocMaker
Returns the htmlParser.

Specified by:
getHtmlParser in interface DocMaker


Copyright © 2000-2008 Apache Software Foundation. All Rights Reserved.