lucene-dev mailing list archives

From "Smiley, David W." <dsmi...@mitre.org>
Subject Re: Geospatial search in Lucene/Solr
Date Tue, 28 Dec 2010 16:59:06 GMT
Thanks for letting me know about this, Rob.  I think geonames is much simpler (and much less
data) to work with than Wikipedia.  It's plain tab-delimited and I like that it includes the
population.  I'll press forward with my benchmark-module-based patch.  I can relatively easily
switch between the lat-lon type and my geohash type since they both conform to the SpatialQueriable
interface, so I don't need two complete Lucene checkouts.  I had to add Solr
& spatial as dependencies to the benchmark module, but it's worth it to me.
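
As a rough illustration of the setup discussed in this thread, here is a minimal
Python sketch of reading tab-delimited geonames records and generating random box
queries centered on a random location from the data set.  It is not the benchmark
patch itself, which targets Lucene's Java benchmark module.  The column positions
(4, 5, 14), the names Place, Box, parse_place, random_box and in_box, and the
max_half_deg parameter are all assumptions made for illustration.

```python
import random
from typing import NamedTuple, Sequence

# Column positions are an assumption based on the geonames.org dump layout;
# check them against the readme shipped with the data before relying on them.
NAME_COL, LAT_COL, LON_COL, POP_COL = 1, 4, 5, 14


class Place(NamedTuple):
    name: str
    lat: float
    lon: float
    population: int


class Box(NamedTuple):
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


def parse_place(line: str) -> Place:
    """Parse one tab-delimited record into a Place (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    return Place(
        name=cols[NAME_COL],
        lat=float(cols[LAT_COL]),
        lon=float(cols[LON_COL]),
        population=int(cols[POP_COL] or 0),  # treat an empty field as 0
    )


def random_box(places: Sequence[Place], rng: random.Random,
               max_half_deg: float = 1.0) -> Box:
    """Build a randomly sized box centered on a randomly chosen place."""
    center = rng.choice(places)
    half_lat = rng.uniform(0.0, max_half_deg)
    half_lon = rng.uniform(0.0, max_half_deg)
    # Clamp to valid coordinate ranges; dateline wrapping is not handled.
    return Box(
        min_lat=max(-90.0, center.lat - half_lat),
        max_lat=min(90.0, center.lat + half_lat),
        min_lon=max(-180.0, center.lon - half_lon),
        max_lon=min(180.0, center.lon + half_lon),
    )


def in_box(place: Place, box: Box) -> bool:
    """A box match expressed as a pair of range checks, one per axis."""
    return (box.min_lat <= place.lat <= box.max_lat
            and box.min_lon <= place.lon <= box.max_lon)
```

Seeding the generator, e.g. random_box(places, random.Random(42)), keeps the query
set repeatable across runs, which matters when comparing two field types on the
same queries.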

~ David

On Dec 28, 2010, at 11:18 AM, Robert Muir wrote:

> On Tue, Dec 28, 2010 at 10:47 AM, Smiley, David W. <dsmiley@mitre.org> wrote:
>> Presently, I’m working on Lucene’s benchmark contrib module to evaluate the
>> performance of SOLR-2155 compared to the LatLon type (i.e. a pair of lat-lon
>> range queries), and then I’ll work on a more efficient probably non-geohash
>> implementation but based on the same underlying concept of a hierarchical
>> grid.  I’m using the geonames.org data set.  Unfortunately, the benchmark
>> code seems very oriented to a generic title-body document whereas I’m
>> looking to create lat-lon pairs… and furthermore to create documents
>> containing multiple lat-lon pairs, and even furthermore a query generator
>> that generates random box queries centered on a random location from the
>> data set.  I seem to be stretching the benchmark framework beyond the
>> use-case it was designed for and so perhaps it won’t be committable but at
>> least I’ll have a patch for other geospatial birds-of-a-feather like you to
>> use.
>> 
>> Stretch away.  The Title/Body orientation is just a relic of what we have
>> done in the past, it doesn't have to stay that way.
> 
> just for reference, a couple of us are using a python front-end to
> contrib/benchmark that Mike developed:
> 
> http://code.google.com/p/luceneutil/
> 
> This is nice as it's designed for you to just declare 'competitors' (2
> checkouts of solrcene), and then you run the Python script and it
> gives you the relative comparison... because they are 2 different
> checkouts it's simple to compare different approaches, and each
> checkout can run with a different index (e.g. different codecs or test
> index format changes).
> 
> I thought it might be interesting to you, because there's a variety of
> queries tested here, like numeric range, sorting, primary-key lookup,
> span queries, etc., beyond the "standard" set of queries. The framework
> also ensures that you are bringing back the same results in the same
> order, runs multiple iterations (including iterations in new JVMs),
> makes it easy to test optimized, optimized with deletions,
> multi-segment, multi-segment with deletions, and can output to txt,
> html, jira format for convenience.
> 
> currently we are generally testing with a line file format from
> Wikipedia, but besides geonames I wanted to point out that Wikipedia
> does include lat/long information for many articles (this is a major
> source for much of geonames place data!).
> 
> it would definitely be cool if we could test spatial queries with this
> as well... e.g. by parsing out the lat/long from the Wikipedia XML and
> adding it to the line files, and adding some spatial queries to the
> default list of queries being tested.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
> 

