lucene-solr-user mailing list archives

From Joachim Martin <jmar...@path-works.com>
Subject Re: relational design in solr?
Date Fri, 22 Sep 2006 15:04:20 GMT
Chris,

I think what I am trying to do is actually much simpler than what you 
are talking about here.
I do plan on returning document ids and retrieving full entity data from 
the database; Solr would
just be used for the search, not for results display.

The problem is that some data cannot be "flattened", for example when a 
document has repeating
fields that are complex types, such as address.

The best example I can think of is a resume database.  You could 
certainly just put the whole resume
document into the text index and do full text searches.  But to answer 
the question of which people
received a Harvard MBA in the last 10 years and have worked at Intel in 
the last 5 years, you have
to correlate the years of attendance with the schoolName entry.  
Otherwise you might be getting years
for some other education/work history entry.
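
To illustrate the mismatch, here is a minimal Python sketch (not Solr code).
The resume data, the degree/gradYear field names and both helper functions
are made up for the example; only schoolName comes from my description above.
It shows a flattened document matching a "Harvard MBA since 1996" query even
though no single education entry satisfies it.

```python
# Hypothetical resume with two education entries, flattened into
# parallel multi-valued fields (all values are illustrative).
flattened = {
    "id": "resume-1",
    "schoolName": ["Harvard", "State College"],
    "degree": ["BA", "MBA"],
    "gradYear": [1985, 2001],
}


def flat_match(doc, school, degree, min_year):
    """Test each clause against the whole document, independently."""
    return (
        school in doc["schoolName"]
        and degree in doc["degree"]
        and any(year >= min_year for year in doc["gradYear"])
    )


# The same data kept as one record per education entry.
education = [
    {"schoolName": "Harvard", "degree": "BA", "gradYear": 1985},
    {"schoolName": "State College", "degree": "MBA", "gradYear": 2001},
]


def record_match(records, school, degree, min_year):
    """Require all clauses to hold within a single record."""
    return any(
        r["schoolName"] == school
        and r["degree"] == degree
        and r["gradYear"] >= min_year
        for r in records
    )


# The flattened form matches because each clause is satisfied by a
# different entry; the per-record form correctly rejects the resume.
flat_hit = flat_match(flattened, "Harvard", "MBA", 1996)
record_hit = record_match(education, "Harvard", "MBA", 1996)
```

The flattened check succeeds because "Harvard" comes from one entry while
"MBA" and 2001 come from the other.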

By adding an objType field and combining search results, you can be sure 
that the year/schoolName
query matched a unique education record.  The tricky bit is in getting a 
list of field values (e.g. foreign
keys, which are essentially facets) for a result set very quickly.
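
Roughly what I have in mind, again as a plain Python sketch rather than
anything Solr provides: the in-memory list stands in for the index, and
personId, degree, year, employer, endYear and the foreign_keys helper are
hypothetical names. Only objType and schoolName come from the idea itself.
Each "search" returns the set of foreign keys for one objType, and the sets
are intersected.

```python
# In-memory stand-in for a single index holding several document types.
docs = [
    {"id": "p1", "objType": "person", "name": "A"},
    {"id": "p2", "objType": "person", "name": "B"},
    {"id": "e1", "objType": "education", "personId": "p1",
     "schoolName": "Harvard", "degree": "MBA", "year": 2001},
    {"id": "e2", "objType": "education", "personId": "p2",
     "schoolName": "Harvard", "degree": "BA", "year": 1985},
    {"id": "e3", "objType": "education", "personId": "p2",
     "schoolName": "State College", "degree": "MBA", "year": 2001},
    {"id": "w1", "objType": "work", "personId": "p1",
     "employer": "Intel", "endYear": 2004},
    {"id": "w2", "objType": "work", "personId": "p2",
     "employer": "Intel", "endYear": 2005},
]


def foreign_keys(index, obj_type, predicate):
    """Run one 'search' on a single objType and collect its foreign keys."""
    return {
        d["personId"]
        for d in index
        if d["objType"] == obj_type and predicate(d)
    }


# Each predicate is evaluated against one record, so school, degree and
# year are guaranteed to come from the same education entry.
harvard_mba = foreign_keys(
    docs, "education",
    lambda d: d["schoolName"] == "Harvard"
    and d["degree"] == "MBA"
    and d["year"] >= 1996,
)
intel = foreign_keys(
    docs, "work",
    lambda d: d["employer"] == "Intel" and d["endYear"] >= 2001,
)

# "Join" the two result sets on the foreign key.
matching_people = sorted(harvard_mba & intel)
```

Here only p1 survives the intersection. The expensive step in a real index
is the one this sketch gets for free: collecting the personId values for
every hit in each result set.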

If this can be done, figuring out a generic way of specifying multiple 
searches and relationships between
result sets (without reinventing SQL) becomes the challenge.

We'll see.  I have my doubts that it will work for any but the smallest 
of collections, which ours certainly
isn't.

Thanks --Joachim

Chris Hostetter wrote:

>While it's certainly possible to "join" the results of multiple indexes, I
>would do so only when absolutely necessary -- in my experience the only
>time I've found that it makes sense is when one aspect of the data
>changes extremely rapidly compared to everything else, making complex
>reindexing a pain, but reindexing just the changed data in its own index
>is a lot more feasible.
>
>As a rule of thumb, when building "paginated" style search applications, I
>would advise people to try and flatten their index as much as possible, so
>that the application can do one "user query" (based on the user's input)
>to get a single page of results, and then use the uniqueKeys from that
>page of results to look up ancillary data from any other indexes (or
>databases) that you need -- the key being that all the data you want to
>search on, and all the data you need to sort on, are in the index, but other
>data you need to return to the user can come from other sources.
>
>If you find yourself wanting to "join" two indexes for the purposes of
>matching or sorting, the amount of work you wind up doing tends to be
>prohibitive on really large indexes -- and if your indexes aren't that
>large, it would probably just be easier to put everything in one index and
>rebuild it frequently.
>
>: I am trying to integrate solr search results with results from an rdbms
>: query.  It's working ok, but it is fairly complicated due to the large size
>: of the results from the database, and many different sort requirements.
>:
>: I know that solr/lucene was not designed to intelligently handle
>: multiple document types in the same collection, i.e. provide join
>: features, but I'm wondering if anyone on this list has any thoughts on
>: how to do it in lucene, and how it might be integrated into a custom
>: solr deployment.  I can't see going back to vanilla lucene after solr!
>:
>: My basic idea is to add an objType field that would be used to define a
>: "table".  There would be one main objType, any related objTypes would
>: have a field pointing back to the main objs via id, like a foreign key.
>:
>: I'd run multiple parallel searches and merge the results based on
>: foreign keys, either using a Filter or just using custom code.  I'm
>: anticipating that iterating through the results to retrieve the foreign
>: key values will be too slow.
>:
>: Our data is highly textual, temporal and spatial, which pretty much
>: correspond to the 3 tables I would have.  I can de-normalize a lot of
>: the data, but the combination of times, locations and textual
>: representations would be way too large to fully flatten.
>:
>: I'm about to start experimenting with different strategies, and I would
>: appreciate any insight anyone can provide.  Would the faceting code help
>: here somehow?
>
>
>
>-Hoss
>  
>

