lucene-java-user mailing list archives

From Sriram Sankar <san...@gmail.com>
Subject Re: posting list strings
Date Fri, 12 Jul 2013 19:54:37 GMT
Thanks!


On Tue, Jul 9, 2013 at 2:34 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
> You can replace each term by its hash directly in the analyzer chain.
> Just write a custom TermToBytesRefAttribute that hashes the term to a
> constant-length byte[] (using an AttributeFactory)! :-) This would give you
> all the features of hashed, constant-length terms, but you would lose prefix
> and wildcard queries. In fact, NumericTokenStream does exactly this for
> numeric terms!
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Adrien Grand [mailto:jpountz@gmail.com]
> > Sent: Tuesday, July 09, 2013 11:25 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: posting list strings
> >
> > Hi,
> >
> > Lucene stores the string because it may need it to run prefix or range
> > queries. We don't have a hash-based terms dictionary right now, but I know
> > some people wrote one since they don't need support for these queries;
> > see, for instance, the Earlybird paper [1]. Then, if you can find a perfect
> > hashing function, you can just replace your terms by their hash.
> >
> > [1] http://www.umiacs.umd.edu/~jimmylin/publications/Busch_etal_ICDE2012.pdf
> >
> > --
> > Adrien
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
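A minimal, JDK-only sketch of the idea discussed in this thread: hash each term to a constant-length byte[] and check whether that hash is collision-free ("perfect") over a known vocabulary, since two distinct terms sharing a hash would silently share one posting list. Assumptions: 64-bit FNV-1a is an arbitrary hash choice not named in the thread, and the class and method names (HashedTerms, hashedTermBytes, isPerfectOn) are hypothetical. It does not call Lucene's TermToBytesRefAttribute or AttributeFactory, whose exact signatures vary between Lucene versions; wiring these bytes into an analyzer chain is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; not part of Lucene.
class HashedTerms {
    // 64-bit FNV-1a offset basis and prime (assumed hash choice).
    private static final long FNV_OFFSET = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    /** Hashes the term's UTF-8 bytes with 64-bit FNV-1a. */
    static long fnv1a64(String term) {
        long h = FNV_OFFSET;
        for (byte b : term.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff); // mix in one unsigned byte
            h *= FNV_PRIME;  // wraps modulo 2^64
        }
        return h;
    }

    /** Encodes the hash as a constant-length, 8-byte big-endian array. */
    static byte[] hashedTermBytes(String term) {
        return ByteBuffer.allocate(Long.BYTES).putLong(fnv1a64(term)).array();
    }

    /**
     * Returns true if no two distinct terms in the vocabulary share a hash,
     * i.e. the hash is perfect on this particular set of terms.
     */
    static boolean isPerfectOn(List<String> vocabulary) {
        Map<Long, String> seen = new HashMap<>();
        for (String term : vocabulary) {
            String previous = seen.putIfAbsent(fnv1a64(term), term);
            if (previous != null && !previous.equals(term)) {
                return false; // collision: two terms would merge posting lists
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> vocab = Arrays.asList("lucene", "posting", "list", "strings");
        for (String t : vocab) {
            System.out.println(t + " -> " + hashedTermBytes(t).length + " bytes");
        }
        System.out.println("collision-free: " + isPerfectOn(vocab));
    }
}
```

Because the 8-byte keys sort by hash value rather than by the original string, prefix, wildcard, and range queries over such terms no longer work, which is the trade-off both replies mention. A 64-bit hash is not perfect in general; isPerfectOn only confirms it for the vocabulary it is given, so new terms would need re-checking.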
