org.apache.lucene.benchmark.byTask.feeds
Class TrecDocMaker

java.lang.Object
  extended by org.apache.lucene.benchmark.byTask.feeds.BasicDocMaker
      extended by org.apache.lucene.benchmark.byTask.feeds.TrecDocMaker
All Implemented Interfaces:
DocMaker

public class TrecDocMaker
extends BasicDocMaker

A DocMaker that reads its input from the (compressed) TREC collection.

Config properties:


Field Summary
protected  File dataDir
           
protected  ThreadLocal dateFormat
           
protected  ArrayList inputFiles
           
protected  int iteration
           
protected  int nextFile
           
protected  BufferedReader reader
           
 
Fields inherited from class org.apache.lucene.benchmark.byTask.feeds.BasicDocMaker
BODY_FIELD, BYTES_FIELD, config, DATE_FIELD, forever, ID_FIELD, indexVal, NAME_FIELD, storeVal, termVecVal, TITLE_FIELD
 
Constructor Summary
TrecDocMaker()
           
 
Method Summary
protected  void closeInputs()
           
protected  DateFormat getDateFormat(int n)
           
protected  DocData getNextDocData()
          Return the data of the next document.
 int numUniqueTexts()
          Return how many real unique texts are available, 0 if not applicable.
protected  void openNextFile()
           
protected  Date parseDate(String dateStr)
           
protected  StringBuffer read(String prefix, StringBuffer sb, boolean collectMatchLine, boolean collectAll)
           
 void resetInputs()
          Reset inputs so that the test run behaves, input-wise, as if it had just started.
 void setConfig(Config config)
          Set the configuration properties.
 
Methods inherited from class org.apache.lucene.benchmark.byTask.feeds.BasicDocMaker
addBytes, addUniqueBytes, collectFiles, getByteCount, getCount, getHtmlParser, makeDocument, makeDocument, numUniqueBytes, printDocStatistics, resetUniqueBytes, setHTMLParser
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

dateFormat

protected ThreadLocal dateFormat

dataDir

protected File dataDir

inputFiles

protected ArrayList inputFiles

nextFile

protected int nextFile

iteration

protected int iteration

reader

protected BufferedReader reader
Constructor Detail

TrecDocMaker

public TrecDocMaker()
Method Detail

setConfig

public void setConfig(Config config)
Description copied from interface: DocMaker
Set the configuration properties.

Specified by:
setConfig in interface DocMaker
Overrides:
setConfig in class BasicDocMaker

openNextFile

protected void openNextFile()
                     throws NoMoreDataException,
                            Exception
Throws:
NoMoreDataException
Exception

closeInputs

protected void closeInputs()

read

protected StringBuffer read(String prefix,
                            StringBuffer sb,
                            boolean collectMatchLine,
                            boolean collectAll)
                     throws Exception
Throws:
Exception

getNextDocData

protected DocData getNextDocData()
                          throws NoMoreDataException,
                                 Exception
Description copied from class: BasicDocMaker
Return the data of the next document. All current implementations can create docs forever: when the input data is exhausted, the input files are iterated over again from the beginning. To avoid this re-iteration, set doc.maker.forever to false (the default is true).

Specified by:
getNextDocData in class BasicDocMaker
Returns:
data of the next document.
Throws:
NoMoreDataException - if the data is exhausted and 'forever' is set to false.
Exception
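
The re-iteration rule described above can be sketched in isolation. This is a minimal model, not the TrecDocMaker source: CyclingInputs, NoMoreInputException, and the file names are hypothetical, and only the property name doc.maker.forever and its default of true are taken from the description.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for the benchmark's NoMoreDataException.
class NoMoreInputException extends Exception {
    NoMoreInputException(String msg) { super(msg); }
}

// Hypothetical model of input cycling; not the actual TrecDocMaker code.
class CyclingInputs {
    private final List<String> inputFiles;
    private final boolean forever;
    private int nextFile = 0;   // index of the next file to hand out
    private int iteration = 0;  // number of times the list has wrapped around

    CyclingInputs(List<String> inputFiles, Properties props) {
        this.inputFiles = inputFiles;
        // doc.maker.forever defaults to true when the property is absent.
        this.forever = Boolean.parseBoolean(props.getProperty("doc.maker.forever", "true"));
    }

    // Returns the next input name, wrapping around only when forever is true.
    String next() throws NoMoreInputException {
        if (nextFile >= inputFiles.size()) {
            if (!forever) {
                throw new NoMoreInputException("inputs exhausted");
            }
            nextFile = 0;
            iteration++;
        }
        return inputFiles.get(nextFile++);
    }

    int iteration() { return iteration; }

    // Makes the next call behave as if the run had just started.
    void reset() {
        nextFile = 0;
        iteration = 0;
    }
}

class CyclingInputsDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("doc.maker.forever", "false");
        // Assumed file names, for illustration only.
        CyclingInputs inputs = new CyclingInputs(Arrays.asList("a.gz", "b.gz"), props);
        System.out.println(inputs.next());
        System.out.println(inputs.next());
        try {
            inputs.next(); // third call: the list is exhausted and forever is false
        } catch (NoMoreInputException e) {
            System.out.println("stopped: " + e.getMessage());
        }
    }
}
```

With the property left unset, the third call would instead wrap around to the first name and increment the iteration counter.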

getDateFormat

protected DateFormat getDateFormat(int n)

parseDate

protected Date parseDate(String dateStr)
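
This page gives only the signatures of getDateFormat and parseDate. Together with the ThreadLocal dateFormat field, they suggest a per-thread cache of date formats, which is a common pattern because java.text.SimpleDateFormat is not thread-safe. The sketch below illustrates that pattern under stated assumptions: the class name PerThreadDateParser and both date patterns are hypothetical, and the actual TrecDocMaker behavior may differ.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical illustration of per-thread date formats; not the TrecDocMaker source.
class PerThreadDateParser {
    // Assumed patterns, for illustration only.
    private static final String[] PATTERNS = {
        "EEE, dd MMM yyyy HH:mm:ss z",
        "yyyy-MM-dd HH:mm:ss",
    };

    // Each thread lazily builds its own array of formats.
    private final ThreadLocal<DateFormat[]> dateFormat = new ThreadLocal<DateFormat[]>();

    // Returns the calling thread's format for pattern n.
    DateFormat getDateFormat(int n) {
        DateFormat[] formats = dateFormat.get();
        if (formats == null) {
            formats = new DateFormat[PATTERNS.length];
            for (int i = 0; i < PATTERNS.length; i++) {
                formats[i] = new SimpleDateFormat(PATTERNS[i], Locale.US);
            }
            dateFormat.set(formats);
        }
        return formats[n];
    }

    // Tries each pattern in order; returns null if none matches.
    Date parseDate(String dateStr) {
        for (int i = 0; i < PATTERNS.length; i++) {
            try {
                return getDateFormat(i).parse(dateStr.trim());
            } catch (ParseException e) {
                // Not this pattern; try the next one.
            }
        }
        return null;
    }
}
```

Keeping the formats in a ThreadLocal avoids both synchronizing on a shared DateFormat and constructing a new one for every document.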

resetInputs

public void resetInputs()
Description copied from interface: DocMaker
Reset inputs so that the test run behaves, input-wise, as if it had just started.

Specified by:
resetInputs in interface DocMaker
Overrides:
resetInputs in class BasicDocMaker

numUniqueTexts

public int numUniqueTexts()
Description copied from interface: DocMaker
Return how many real unique texts are available, 0 if not applicable.



Copyright © 2000-2008 Apache Software Foundation. All Rights Reserved.