lucene-java-user mailing list archives

From Ian Lea <>
Subject Re: Optimal way to index
Date Mon, 11 Feb 2013 16:33:05 GMT
You can certainly use Lucene for this, and it will be blindingly fast
even if you use a disk-based index.

Just index the documents as you've laid them out, with the field you want
to search on (the synonym) indexed and the other fields (the word and the
ID) stored.

I've never used Guava Table so can't comment on that, but with only a
few thousand words it would certainly be feasible to use something
like that.  Better?  I don't know.
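
For what it's worth, here is a minimal sketch of the purely in-memory
route. It assumes a plain java.util.HashMap keyed on the synonym rather
than Guava Table (which I haven't used), and the class and method names
(SynonymIndex, Entry, add, lookup) are made up for illustration. Lookups
are exact and case-sensitive, matching the a / A / Aa / aa / AA rows in
your example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory lookup: synonym -> (base word, ID). */
final class SynonymIndex {

    /** Lookup result: the base word and its ID. */
    static final class Entry {
        final String word;
        final long id;

        Entry(String word, long id) {
            this.word = word;
            this.id = id;
        }
    }

    private final Map<String, Entry> bySynonym = new HashMap<>();

    /** Registers the word itself and every synonym under the same entry. */
    void add(long id, String word, List<String> synonyms) {
        Entry entry = new Entry(word, id);
        bySynonym.put(word, entry);
        for (String synonym : synonyms) {
            bySynonym.put(synonym, entry);
        }
    }

    /** Exact, case-sensitive lookup; returns null when the term is unknown. */
    Entry lookup(String term) {
        return bySynonym.get(term);
    }

    public static void main(String[] args) {
        SynonymIndex index = new SynonymIndex();
        // Sample row taken from the question: ID 13991, word "A".
        index.add(13991L, "A", Arrays.asList("a", "A", "Aa", "aa", "AA"));

        Entry hit = index.lookup("aa");
        System.out.println(hit.word + " " + hit.id); // prints "A 13991"
    }
}
```

A map like this only handles exact matches. If you later need tokenised
or fuzzy matching on the multi-word names, that is where Lucene earns
its keep.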

Personally I'd probably go with Lucene, as I'd be positive it would a)
work and b) be fast even if the thousands end up being tens of
thousands, or more.


On Mon, Feb 11, 2013 at 3:14 PM, Mohammad Tariq <> wrote:
> Hello list,
>          I have a scenario wherein I need an in-memory index as I need
> faster search. The problem goes like this :
> I have a list which contains a couple of thousand words. Each word has a
> corresponding ID and a list of synonyms. The actual word is a column in my
> HBase table. I get files which contain values for this column, and I have to
> extract the values from these files and put them into the appropriate column.
> But sometimes the files may contain a synonym instead of the actual word.
> This is where the index comes into the picture. I should have an
> index that contains all the words along with their IDs and all the synonyms,
> and it should always be in memory so that inserts into HBase are quick.
> Something like this:
>  ID       WORD   SYNONYMS
>  13991    A      a, A, Aa, aa, AA
> Then the index should be something like this:
>  a     A    13991
>  A     A    13991
>  Aa    A    13991
>  aa    A    13991
>  AA    A    13991
> That way, if I get 'a' in the file, I should be able to do a lookup and the
> index should give me 'A' along with '13991'. I need both the base name and
> the ID. The names could even be strings of 4 to 5 words.
> I have all this information stored in an HBase table with two columns,
> where the first column contains the actual word and the second column
> contains the entire list of synonyms. The rowkey is the ID.
> Now, I am not sure whether it is feasible to use Lucene for this, or whether
> I should go with something like 'Guava Table' or something else. I need some
> guidance because, being new to Lucene, I am not able to think in the right
> direction. If it is feasible to use Lucene to achieve this, how do I do it
> efficiently?
> I am using HBase filters right now to do the fetch, which is slowing down
> the process.
> I am sorry if my questions sound too childish or senseless as I am not very
> good at Lucene. Thank you so much for your valuable time.
> Warm Regards,
> Tariq
