lucene-java-user mailing list archives

From Don Gilbert
Subject Re: Lucene outperforms MySQL, BerkeleyDB, and PostgreSQL for genome map searches
Date Fri, 02 Sep 2005 22:50:16 GMT


The limitation on huge genome map ranges is for display software (putting all
those features into an image a person can understand).  500Kb is about an
average viewable size, though some uses will draw on 1-10 Mb of data.  I used
a real-world test case that will be directly applicable to how fast biologists
get to see their interesting genes.  Sometime maybe I'll benchmark bigger
ranges.  This use of Lucene for bio-data is perhaps not where it can show its
advantage most dramatically (as noted, the SQL databases are pretty good at
numeric searches, and Lucene only edges them out by a nose :).  It is with
the text-rich (biology-jargon) literature and experimental data sets that
Lucene can really show its stuff.  Phrase searching of biology experimental
phrases is a good example - almost impossible to do with SQL systems (even
MySQL textsearch is weak here), and Lucene in my tests easily picks out
relevant biology phrases. Lion Bioscience's SRS is a widely used commercial
system that is text-search based, but it lacks phrase search ability.

-- Don
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
