lucene-general mailing list archives

From Mark Bennett <>
Subject Re: Can lucene documents have several thousand attributes each?
Date Fri, 23 May 2014 15:15:14 GMT
Another feature that might be useful, and that might not be obvious at first, is that document
tokens can have custom payloads, so you could encode arbitrary binary metadata in them.

Then at search time, maybe override the Similarity class to leverage those payloads.

Non-trivial, but likely do-able.
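
As a rough illustration of the payload idea, the sketch below packs an externally computed distance into a fixed-width byte string, which is the kind of binary metadata a token payload could carry. It is plain Python rather than Lucene API code; the function names, the sample object ids, and the big-endian 4-byte float layout are all assumptions made for the example.

```python
import struct


def encode_distance(distance: float) -> bytes:
    """Pack a distance as a 4-byte big-endian IEEE 754 float (assumed layout)."""
    return struct.pack(">f", distance)


def decode_distance(payload: bytes) -> float:
    """Reverse of encode_distance: unpack the 4 bytes back into a float."""
    return struct.unpack(">f", payload)[0]


# Hypothetical data: distances from one object to others, keyed by object id.
distances = {"obj_17": 2.5, "obj_42": 10.0}

# One payload per "other object" token, ready to attach at index time.
payloads = {obj_id: encode_distance(d) for obj_id, d in distances.items()}
```

At search time, custom scoring code would decode the payload and compare it with the requested maximum distance. Single precision keeps the payload small but rounds values that are not exactly representable, so a wider encoding may be preferable if the distances need full precision.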

Mark Bennett / LucidWorks: Search & Big Data /
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On May 23, 2014, at 1:29 AM, Leighton Hargreaves <>

> Thanks for the responses, I didn't even realise there was a spatial feature.  The distances
> I need to search for, though, are the minimum distances between arbitrarily complex 3D geometry
> (the geometry itself wouldn't be represented in lucene, only metadata about it).  So I want
> to calculate these minimum distances within my own geometry engine, and then pass the calculated
> distances into lucene/solr.
> So really my question is, what is the best way to represent values which relate to 2
> documents, so that I can search for documents 'in relation to' another document?  (In this
> case the relation is an externally-calculated distance.)
> -----Original Message-----
> From: Ted Dunning [] 
> Sent: 21 May 2014 22:19
> To:
> Subject: Re: Can lucene documents have several thousand attributes each?
> Also, you can use 2D projections with AND to limit the number of documents you need to
> compute distances on.
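
A minimal sketch of this pre-filtering idea, assuming each object can be reduced to a representative 3D point (the thread does not say this): a cheap test on the three 2D projections, combined with AND, discards most candidates before the exact distance is computed. The function names and sample points are hypothetical.

```python
import math


def passes_projections(p, q, max_dist):
    """Necessary condition: the pair is within max_dist in every 2D projection.

    A projected distance can never exceed the full 3D distance, so this
    test produces no false negatives, only false positives.
    """
    planes = [(0, 1), (0, 2), (1, 2)]  # xy, xz, yz
    return all(
        math.hypot(p[i] - q[i], p[j] - q[j]) <= max_dist
        for i, j in planes
    )


def within_distance(points, target, max_dist):
    """Return ids of points within max_dist of the target, pre-filtered by projections."""
    q = points[target]
    candidates = [
        key for key, p in points.items()
        if key != target and passes_projections(p, q, max_dist)
    ]
    # The exact (more expensive) check runs only on the surviving candidates.
    return sorted(k for k in candidates if math.dist(points[k], q) <= max_dist)


# Hypothetical sample data.
points = {
    "a": (0.0, 0.0, 0.0),
    "b": (1.0, 1.0, 1.0),
    "c": (5.0, 0.0, 0.0),
    "d": (1.2, 1.2, 1.2),
}
```

Here `d` survives the projection test but fails the exact check, which shows why the second pass is still needed. For distances between complex geometries, the same pattern would apply with bounding volumes in place of points, and with the exact check delegated to the external geometry engine.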
> On Wed, May 21, 2014 at 10:29 AM, <>
>> Hi Leighton,
>> I’m assuming the reason you’re suggesting this approach, rather than 
>> using the Lucene/Solr spatial feature, is that it’s not a 2D 
>> distance?  Solr actually supports n-dimensional Euclidean distance 
>> calculation with this function query (aka ValueSource):
>> dist(2, x,y,z,0,0,0): Euclidean distance between (0,0,0) and (x,y,z) 
>> for each document
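
For reference, the arithmetic behind a call like dist(2, x,y,z,0,0,0) can be sketched as follows. This illustrates the p-norm formula in Python and is not Solr's implementation; the function name is hypothetical, and only finite powers of 1 or more are handled.

```python
def p_norm_distance(power, *coords):
    """Distance between two points given as a flat list: first point, then second.

    power=1 gives the Manhattan distance and power=2 the Euclidean distance.
    """
    if power < 1:
        raise ValueError("this sketch only handles power >= 1")
    if len(coords) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    half = len(coords) // 2
    a, b = coords[:half], coords[half:]
    total = sum(abs(x - y) ** power for x, y in zip(a, b))
    return total ** (1.0 / power)


# Euclidean distance between (3, 4, 0) and the origin.
example = p_norm_distance(2, 3, 4, 0, 0, 0, 0)
```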
>> On Wed, May 21, 2014 at 12:30 PM, Leighton Hargreaves <> wrote:
>>> Hello Lucene project.
>>> I'm in the process of evaluating lucene for a project where we will 
>>> need to search a large set of 3D objects by various attributes.  In 
>>> many ways, lucene's functionality seems perfect.
>>> But one thing I'm not sure of: we need to find the set of objects 
>>> that are within a given distance of any given object.
>>> One solution would be to add a numeric field to each 3D object, for 
>>> each other 3D object, with a name such as 'distance_to_<other_object_id_1>'.
>>> This would allow us to find objects within a given distance of a 
>>> given object with a query like 
>>> 'distance_to_<object_id>:[* TO <max_distance>]'.
>>> But this would mean each 3D object would have several thousand
>>> attributes, one for every other 3D object.  Would this be a 
>>> prohibitively expensive way to do it?
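
To make the cost of this layout concrete, here is a small sketch that models documents as Python dictionaries with one 'distance_to_<id>' field per other object and applies the equivalent of the range query. The data and helper names are hypothetical; in a real index the distances would come from the external geometry engine.

```python
def build_documents(pair_distances, ids):
    """Create one document per object, with a distance field for every other object."""
    docs = {object_id: {"id": object_id} for object_id in ids}
    for (a, b), distance in pair_distances.items():
        # Each pairwise distance is stored twice, once on each document.
        docs[a][f"distance_to_{b}"] = distance
        docs[b][f"distance_to_{a}"] = distance
    return docs


def within(docs, object_id, max_distance):
    """Emulate a range query on distance_to_<object_id> with an upper bound."""
    field = f"distance_to_{object_id}"
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if field in doc and doc[field] <= max_distance
    )


# Hypothetical sample data: three objects and their pairwise distances.
ids = ["a", "b", "c"]
pair_distances = {("a", "b"): 1.5, ("a", "c"): 7.0, ("b", "c"): 4.0}
docs = build_documents(pair_distances, ids)

# Total number of distance fields across all documents (excluding "id").
distance_field_count = sum(len(doc) - 1 for doc in docs.values())
```

With n objects this layout stores n * (n - 1) distance values and defines n distinct field names, so both grow quickly once n reaches several thousand.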
>>> Another solution would be to handle the spatial aspect within my own 
>>> software, i.e. filter lucene's results according to distance.  But I 
>>> worry that this would negatively affect performance by causing the 
>>> set of results returned to my code to be large, prior to filtering 
>>> by my own software.
>>> I apologise if the question is confusing or badly explained, I'm 
>>> just asking in case it turns out to be a standard class of problem 
>>> with good existing solutions.
>>> Regards,
>>> Leighton Hargreaves
