lucene-java-user mailing list archives

From Josh Stone <>
Subject Re: Using Lucene to match document sets to each other
Date Fri, 16 Dec 2011 17:53:35 GMT
Thanks for the response, Donna. That would make more sense, but the items
I'm pulling in from the web contain large bodies of text (descriptions),
whereas the products in my catalog consist of shorter fields such as
product name, manufacturer, product code, etc. So using the smaller fields
from my catalog to build queries against the larger fields in the items I
pull in seems to be the only way to do things (that I can think of).

And this brings up my exact problem. I have a document (set of fields) that
I want to use as search criteria for a search against another set of
documents. Can something like this be done?
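
One way to picture "a document as search criteria" is to turn each short
catalog field into a boosted phrase clause against the items' description
field. The sketch below is plain Java that only builds a query string in
Lucene's classic query-parser syntax; it does not call Lucene itself. The
class name, the field name "description", and the boost values are
hypothetical, and whether a phrase such as "X-100" actually matches depends
on the analyzer used at index and query time. Clauses separated by spaces are
optional only if the parser's default operator is left at OR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: converts a product's short fields into one query string.
final class ProductQuerySketch {
    // Characters with special meaning in Lucene's classic query syntax.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    // Backslash-escape every special character in a field value.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Emits one boosted phrase clause per product field value, all aimed
    // at the same target field of the item documents.
    static String toQuery(String targetField, Map<String, Float> valueBoosts) {
        StringJoiner query = new StringJoiner(" ");
        for (Map.Entry<String, Float> e : valueBoosts.entrySet()) {
            query.add(targetField + ":\"" + escape(e.getKey()) + "\"^" + e.getValue());
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> product = new LinkedHashMap<>();
        product.put("Acme Widget", 3.0f); // product name (assumed boost)
        product.put("X-100", 2.0f);       // product code (assumed boost)
        System.out.println(toQuery("description", product));
    }
}
```

The resulting string could then be handed to a query parser, so that an item
mentioning both the name and the code scores higher than one mentioning only
one of them.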


On Fri, Dec 16, 2011 at 5:02 AM, Donna L Gresh <> wrote:

> Maybe I'm misunderstanding what you're trying to do, but why not do it the
> other way around? That is, index the items in your catalog, and use the
> items on the web as the query into the catalog. I have an analogous process
> (though completely different application area) and I index the stuff that
> doesn't change much, and use the things that are constantly changing as the
> query.
>
> Donna L. Gresh
> Business Analytics and Mathematical Sciences
> IBM T.J. Watson Research Center
> (914) 945-2472
>
> From: Josh Stone <>
> To:
> Date: 12/15/2011 04:57 PM
> Subject: Using Lucene to match document sets to each other
>
> I have a use case for which I'm trying to figure out the best way to use
> Lucene and could use some guidance.
> I have a set of documents representing products in a catalog (name,
> description, etc.). I then pull down data from different sources such as
> Ebay and Amazon and need to determine if the items retrieved from those
> sources match any of the products in the catalog. So I'm essentially
> attempting to take many items and many products and determine where I have
> matches.
> I'm not sure of the best way to go about this, but one questionable approach
> is to index the items as I pull them in (to RAM) and do one search for
> every product in my catalog, looking for matching names or descriptions.
> This means one query per product, so the number of queries grows with the
> size of the catalog. Is there a better approach? Any help is appreciated.
> Thanks,
> Josh
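
The approach described in the quoted message (index the pulled items in
memory, then run one search per catalog product) can be sketched roughly as
follows. This is a plain-Java stand-in for an inverted index, not Lucene's
API; the class name, the tokenizer, and the "all name tokens must appear"
rule are assumptions made for illustration. It shows that the work is one
index lookup per product rather than one comparison per product/item pair.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory index over item descriptions.
final class ItemIndexSketch {
    // token -> ids of the items whose description contains that token
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lowercases the text and splits it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexes one pulled item.
    void addItem(int itemId, String description) {
        for (String t : tokenize(description)) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(itemId);
        }
    }

    // Returns the items whose description contains every token of the product name.
    Set<Integer> match(String productName) {
        Set<Integer> result = null;
        for (String t : tokenize(productName)) {
            Set<Integer> ids = postings.getOrDefault(t, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

Each call to match touches only the posting lists for the product's own
tokens, which is the property that makes one query per product workable in
practice.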
