lucene-dev mailing list archives

From "Vaijanath N. Rao" <>
Subject Re: Lucene Indexing structure
Date Wed, 07 May 2008 11:36:16 GMT
Hi Aaron,

I looked into it, and have already
mailed Mathias, who is the author of the tool. The problem with the tool
is that it iterates over each document in linear fashion. One solution I
have found is to cluster the images outside Lucene using
either a SOM (Self-Organizing Map) or any other clustering/classification
algorithm, and then index the images and their features in Lucene along
with the cluster id.
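
A rough sketch of the indexing side, in plain Python rather than against the Lucene API. It assumes a small k-means as a stand-in for the SOM (any clustering algorithm would do), and the image names, vectors, and the `build_records` helper are made up for illustration. Each record mirrors what would become a Lucene document: the name, the feature vector as a string, and the cluster id.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(centroids, vec):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: euclidean(centroids[i], vec))

def kmeans(vectors, k, iterations=10):
    """Tiny k-means; the first k vectors are used as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(centroids, v)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def build_records(images, k):
    """Cluster the vectors, then build one record per image (hypothetical
    stand-in for a Lucene document with name, CH and cluster fields)."""
    names = list(images)
    centroids = kmeans([images[n] for n in names], k)
    records = []
    for n in names:
        records.append({
            "name": n,
            "CH": " ".join(str(x) for x in images[n]),
            "cluster": str(nearest(centroids, images[n])),
        })
    return centroids, records

# Made-up colour histogram vectors: two dark images, two bright ones.
images = {
    "dark1.jpg": [3, 1, 5, 4],
    "bright1.jpg": [40, 42, 39, 41],
    "dark2.jpg": [1, 3, 5, 4],
    "bright2.jpg": [41, 40, 42, 40],
}
centroids, records = build_records(images, k=2)
for r in records:
    print(r)
```

The centroids have to be kept alongside the index, since they are needed again at query time to decide which cluster a query image falls into.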

So now when a search happens, I first retrieve the cluster id for the
query image and then search in Lucene for all the images having this
cluster id. Once I get all the images within the cluster, I re-rank them
based on distance (let's say Euclidean). This saves some computation
time, since distances are computed only within one cluster.
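
The search side could look roughly like this. Again it is only a sketch in plain Python: the `index` list and the `centroids` dict are hypothetical stand-ins for the Lucene index and the stored cluster centres, and the list filter stands in for a query on the cluster id field. The CH values for doc1 and doc2 and the query are the ones from my original mail quoted below.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical stand-in for the index: stored CH field plus cluster id.
index = [
    {"name": "doc1", "CH": "1 3 5 4", "cluster": "0"},
    {"name": "doc2", "CH": "3 1 5 4", "cluster": "0"},
    {"name": "doc3", "CH": "40 42 39 41", "cluster": "1"},
]

# Assumed cluster centres, computed offline when the images were clustered.
centroids = {
    "0": [2.0, 2.0, 5.0, 4.0],
    "1": [40.0, 42.0, 39.0, 41.0],
}

def search(query_vec, top_n=10):
    # Step 1: pick the cluster whose centre is closest to the query.
    cluster = min(centroids, key=lambda c: euclidean(centroids[c], query_vec))
    # Step 2: fetch only the documents carrying that cluster id.
    candidates = [d for d in index if d["cluster"] == cluster]
    # Step 3: parse the stored strings back to floats and re-rank by distance.
    scored = []
    for d in candidates:
        vec = [float(x) for x in d["CH"].split()]
        scored.append((euclidean(query_vec, vec), d["name"]))
    scored.sort()  # smallest distance first
    return scored[:top_n]

results = search([3, 1, 4, 5])
for dist, name in results:
    print(name, round(dist, 3))
```

Only the documents in the chosen cluster are parsed and scored, so doc3 is never touched. The trade-off is that a true near neighbour sitting just across a cluster boundary will be missed unless neighbouring clusters are searched as well.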

The above design is also scalable, since at any point in time I know
there will be only a few clusters and I have to iterate over only those
images that are within one cluster. But yes, it might still have a
bottleneck. Perhaps you can help me make this better.

I will also look into what Glen suggested, but I am not sure how to go
about it. It's definitely worth a try, though.

--Thanks and Regards

Aaron Schon wrote:
> Take a look at the Lire project:
> 2008/4/26 Vaijanath N. Rao <>:
>> Hi Lucene-user and Lucene-dev,
>>  I want to use Lucene as a backend for image search (content-based
>> image retrieval).
>>  Indexing Mechanism:
>>  a) Get the Image properties such as Texture Tamura (TT), Texture Edge
>> Histogram (TE), Color Coherence Vector (CCV) and Color Histogram (CH) and
>> Color Correlogram  (CC) .
>>  b) Convert each of these vectors into a String and index it into Lucene as
>> a field; thus each image (a document in Lucene terms) consists of 6 fields:
>> Image name, TT field, TE field, CCV field, CH field and CC field.
>>  Searching Mechanism:
>>  a) For the search image, convert the image into the above 5 properties.
>>  b) For every field and for every value within the field, construct the
>> query. For example, let's say the user wants to search only by Color Histogram
>> similarity and the query image has 3 1 4 5 as the CH value; the query
>> will look like:
>>    query = "CH:3 CH:1 CH:4 CH:5"
>>  c) For the results returned, convert all the field values back into floats,
>> do the distance computation, and re-rank the documents with the smallest
>> distance at the top and the largest distance at the bottom.
>>  For example:
>>    For the above query, assume that the output has two documents,
>>    one having CH as "1 3 5 4" and the other having CH as "3 1 5 4";
>> the distance computation will rank the second document higher than the
>> first.
>>  Obviously there is something wrong with the above approach (to get the
>> correct documents we need to retrieve all the documents and then do the
>> required distance calculation), but that's due to my lack of knowledge of
>> Lucene and Lucene's index storage.
>>  What I want to know is how to improve upon the existing architecture other
>> than making the number of fields in Lucene equal to the total number of
>> features * size of each feature.
>>  Any other pointers are welcome. Is there any range tree
>> implementation within Lucene which I can use for this operation?
>>  --Thanks and Regards
>>  Vaijanath N. Rao
>>  ---------------------------------------------------------------------
>>  To unsubscribe, e-mail:
>>  For additional commands, e-mail:
