Tuesday, 14. August 2007
Lucene Performance
javatux, 13:41h
I started a proof of concept to see whether Lucene can be used for searching in log files. Here are the first results for the indexer (see also the benchmarks on the Lucene site):
Hardware Environment
- Dedicated machine for indexing: yes
- CPU: Intel Pentium M, 1.6 GHz, 1 processor
- RAM: 2 GB
- Drive configuration: IDE 2.5" hard disk (in a Dell Latitude D810 notebook)
Software environment
- Lucene Version: 2.2.0
- Java Version: Java SE 1.6.0_02
- Java VM: client VM
- OS Version: WinXP with SP1
- Location of index: local
Lucene indexing variables
- Number of source documents: 9
- Total filesize of source documents: 15 MB
- Average filesize of source documents: 2 MB
- Source documents storage location: Filesystem
- File type of source documents: log files
- Parser(s) used, if any:
- Analyzer(s) used: StandardAnalyzer
- Number of fields per document: 3
- Type of fields: text
- Index persistence: FSDirectory
Figures
- Index size: 3 MB
- Time taken (in ms/s as an average of at least 3 indexing runs): 7 s (first try: 37 s -> ignored)
- Time taken / 1000 docs indexed: 150 s (estimated)
- Memory consumption: started with -Xmx128m -Xms128m
- Query speed: not yet measured
Notes
- Note: first prototype, no special tuning/strategies
- Note: maxFieldLength set to 1,000,000 (default of 10,000 was too small)
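The "Time taken" figure is an average over several runs with the slow first run left out. A minimal sketch of such a timing harness could look like the following. It is not the code used for the numbers above: the class name IndexTimer and the method averageMillis are made up for illustration, and the workload in main is only a placeholder for a real indexing run.

```java
class IndexTimer {
    /**
     * Runs the task warmupRuns times without timing it, then measuredRuns
     * times with timing, and returns the average duration of the measured
     * runs in milliseconds.
     */
    static double averageMillis(Runnable task, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        // Warm-up runs are executed but deliberately not counted.
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long totalNanos = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1000000.0 / measuredRuns;
    }

    public static void main(String[] args) {
        // Placeholder workload (hypothetical); a real test would build the index here.
        Runnable indexRun = new Runnable() {
            public void run() {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 100000; i++) {
                    sb.append(i);
                }
            }
        };
        double avg = averageMillis(indexRun, 1, 3);
        System.out.println("average ms over 3 runs: " + avg);
    }
}
```

Dropping the first run keeps one-time costs such as class loading and a cold file cache out of the average, assuming that is what made the first try so much slower.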
javatux,
Wednesday, 12. September 2007, 11:44
Update
I reran the tests and corrected the values because the first try was not really meaningful. I had started with the default maxFieldLength of 10,000, so Lucene indexed only the first 10,000 terms of each field and not all files were read to EOF. The maxFieldLength is now increased to 1,000,000, which is probably much too high, but all files are read completely now.
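To show why the low limit matters for log files, here is a simplified model of a term-count cap in plain Java. It is not Lucene code: FieldLengthSketch and indexedTerms are hypothetical names, and splitting on whitespace stands in for the real analyzer. In Lucene 2.2 the actual setting is changed with IndexWriter.setMaxFieldLength(int).

```java
import java.util.ArrayList;
import java.util.List;

class FieldLengthSketch {
    /** Keeps at most maxFieldLength whitespace-separated terms of the text. */
    static List<String> indexedTerms(String text, int maxFieldLength) {
        List<String> terms = new ArrayList<String>();
        for (String term : text.trim().split("\\s+")) {
            if (terms.size() >= maxFieldLength) {
                break; // everything after the cap is silently dropped
            }
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String log = "start step1 step2 step3 ERROR shutdown";
        // With a cap of 4 terms the ERROR entry never reaches the index.
        System.out.println(indexedTerms(log, 4).contains("ERROR"));  // false
        // With a large enough cap the whole line is indexed.
        System.out.println(indexedTerms(log, 10).contains("ERROR")); // true
    }
}
```

Anything past the cap cannot be found by a search, which is exactly the part of a long log file that was missing in the first test.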
javatux,
Wednesday, 17. October 2007, 21:05
Lucene Performance Tuning
The analyzer has a big influence on indexing performance. If you use the SimpleAnalyzer instead of the StandardAnalyzer, performance increases by a factor of about 6-8. In the project "LogBrowser", where several GB of log files are indexed, the indexing time dropped from 5 hours to about 50 minutes!
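A plausible reason for the difference is that the SimpleAnalyzer does far less work per token: it only splits the text at non-letter characters and lowercases the result, while the StandardAnalyzer runs a grammar-based tokenizer plus additional filters. The following plain Java sketch imitates that letter-only behaviour. It is an approximation for illustration, not Lucene's implementation, and the class name LetterLowerCaseSketch is made up.

```java
import java.util.ArrayList;
import java.util.List;

class LetterLowerCaseSketch {
    /** Splits the text at every non-letter character and lowercases each token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "2007-10-17 21:05:13 ERROR Connection to db01.example.com failed";
        // prints [error, connection, to, db, example, com, failed]
        System.out.println(tokenize(line));
    }
}
```

The price for the speed is visible in the output: timestamps and other numbers produce no tokens at all, and host names are broken into pieces, so they cannot be searched as such in an index built this way.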