lucene-java-user mailing list archives

From "Erick Erickson" <>
Subject Re: lucene newbie question
Date Mon, 02 Oct 2006 19:09:24 GMT
Another Erick (note the correct spelling <G>). See below..

On 10/2/06, Los Morales <> wrote:
> Hi Erik,
> Thanks for the response.
> >Consider the index in the back of a book.  You could tear that out  and
> >still use it to tell what page something is on, but you have no  actual
> >content in hand.
> So, I guess what I'm having a hard time trying to figure out is, what's the
> point of having an index when you can't search/retrieve the contents of a
> field in the index since it is not stored?  Isn't the whole point of having
> an index to be able to search and retrieve the contents efficiently?

What you're missing, I think, is that you CAN search on an unstored field.
Consider a book. I want to show the user the titles of the most-relevant
books. If I store the text of the entire book, it bloats the size of the
index markedly. So, I index the text but do NOT store it. Now I can show my
titles in relevancy order (when searched over the entire text), but don't
have to pay the penalty size-wise. What I can't do in this case is
reconstruct the book from the index because I didn't store the text. But I
can search it, which is what my app requires.
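
Here's a rough sketch of that idea using plain Java collections rather than Lucene itself. The class and method names (UnstoredFieldSketch, addBook, searchTitles) are made up for illustration, and it does no relevancy ranking; it only shows that the body text can be searched even though nothing but the title is kept.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model: the book text is indexed but NOT stored; the title is stored. */
final class UnstoredFieldSketch {
    // term -> ids of the documents whose text contained that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // doc id -> stored title (the only content kept verbatim)
    private final List<String> storedTitles = new ArrayList<>();

    /** Indexes the terms of the text, stores only the title, returns the doc id. */
    int addBook(String title, String text) {
        int docId = storedTitles.size();
        storedTitles.add(title);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId; // the text itself is not kept anywhere
    }

    /** Returns the stored titles of all books whose text contained the term. */
    List<String> searchTitles(String term) {
        List<String> titles = new ArrayList<>();
        for (int docId : postings.getOrDefault(term.toLowerCase(), new TreeSet<>())) {
            titles.add(storedTitles.get(docId));
        }
        return titles;
    }

    public static void main(String[] args) {
        UnstoredFieldSketch index = new UnstoredFieldSketch();
        index.addBook("Moby Dick", "Call me Ishmael. The white whale appears.");
        index.addBook("Walden", "I went to the woods.");
        System.out.println(index.searchTitles("whale"));
    }
}
```

Note there is no way to get the book text back out of this structure, only the titles. That's the trade-off: smaller index, no reconstruction.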

> Basically I'm not sure of the point of the UnIndexed and UnStored field
> types.  Say I use a field type "unindexed" for my SSN.  I know it's stored
> in the index but how am I supposed to retrieve it?

You'd search on what you *have* indexed, get the doc (from the index), and
then read the field. Something like

String s = hits.doc(52).get("SSN");  // hits: the Hits object returned by your search

I'm doing this now since we have images stored with internal IDs on a
separate file system. I *never* care to allow the user to search by our
internal ID number. So I index the caption, and STORE but do not INDEX the
internal ID. We provide a page full of links (in relevancy order) and when
the user clicks on one, use the stored internal ID to fetch the right image.
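
In toy form (again plain Java, not Lucene; StoredOnlyFieldSketch, addImage, and internalId are hypothetical names), the stored-but-not-indexed case looks something like this: the caption terms go into the searchable structure, while the internal ID is only kept alongside the document for later lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model: the caption is indexed; the internal ID is stored but NOT indexed. */
final class StoredOnlyFieldSketch {
    // caption term -> ids of the documents whose caption contained it
    private final Map<String, List<Integer>> captionPostings = new HashMap<>();
    // doc id -> stored internal ID (retrievable, never searchable)
    private final List<String> storedInternalIds = new ArrayList<>();

    /** Indexes the caption terms, stores the internal ID, returns the doc id. */
    int addImage(String caption, String internalId) {
        int docId = storedInternalIds.size();
        storedInternalIds.add(internalId);
        for (String term : caption.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> docs = captionPostings.computeIfAbsent(term, t -> new ArrayList<>());
            if (!docs.contains(docId)) {
                docs.add(docId);
            }
        }
        return docId;
    }

    /** Searches captions only; the internal ID never enters this structure. */
    List<Integer> search(String term) {
        return captionPostings.getOrDefault(term.toLowerCase(), new ArrayList<>());
    }

    /** Reads the stored field back once a search has produced a doc id. */
    String internalId(int docId) {
        return storedInternalIds.get(docId);
    }
}
```

So the flow is: search the caption, get a doc id back, then read the stored ID from that doc. Searching for the ID itself finds nothing, which is exactly what I want.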

> As for the unstored, it's like the scenario I described above... I see the
> fields in the index but I won't be able to search/retrieve it since I don't
> have the contents.  The "text" field type makes sense to me (with data being
> a String), as well as the type "keyword".
> Is there a scenario or scenarios you can describe where Unindexed/Unstored
> will be useful?  Thanks in advance!

Again, you can search unstored fields. You just can't reconstruct the input
with 100% fidelity (things like stop words will be missing, and any funky
games you played during indexing will mess up an attempt to reconstruct the

Hope this helps.

> >From: Erik Hatcher <>
> >Reply-To:
> >To:
> >Subject: Re: lucene newbie question
> >Date: Mon, 2 Oct 2006 14:12:25 -0400
> >
> >
> >On Oct 2, 2006, at 2:08 PM, Los Morales wrote:
> >>I'm new to Lucene and IR in general.  I'm a bit confused on the concept
> >>of fields.  From what I've read, a field does not have to be indexed but
> >>its value can be stored in an index.  Likewise a field can be indexed but
> >>its value is not stored in an index.  Now how can a field be searchable
> >>when its value is not stored in the index and vice-versa?  Again, I'm new
> >>to the Index/Search paradigm.  Thanks in advanced.
> >
> >Consider the index in the back of a book.  You could tear that out  and
> >still use it to tell what page something is on, but you have no  actual
> >content in hand.  When a field is tokenized (and therefore  implicitly
> >indexed), it is run through the specified Analyzer and the  terms emitted
> >are indexed, but the original text may or may not also  be stored in the
> >index.
> >
> >Make sense?
> >
> >       Erik
> >
> >