lucene-solr-user mailing list archives

From Joachim Martin <>
Subject Re: Simple Faceted Searching out of the box
Date Fri, 22 Sep 2006 20:02:45 GMT
I think you will find that this architecture is quite common.  What 
commercial packages
provide (remember you are getting this for free!) are the tools for 
managing the dynamic
export of data out of your database into the full-text search engine.

Solr provides a very easy way to do this, but yes, you have to do some 
work to automate it.

There are two common ways of doing this:  1) write a component that 
periodically checks for
new/updated database content and submits it to Solr.  2) write a trigger 
in the database
that immediately posts to Solr (I would use JMS or some other 
asynchronous messaging
system for this).  I'm sure there are other solutions.
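
To make option 1 concrete, here is a minimal polling sketch in Python. Everything specific in it is an assumption for illustration only: the "jobs" table and its columns, the integer "updated_at" change marker, and the update URL (adjust to wherever your Solr instance actually listens). It builds the same <add> and <commit/> XML messages you are already posting with curl.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint; change to match your own Solr installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def fetch_changed_rows(conn, since):
    """Return rows changed after `since` (hypothetical 'jobs' table)."""
    cur = conn.execute(
        "SELECT id, title, body, updated_at FROM jobs "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    )
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur]


def build_add_xml(rows):
    """Build a Solr <add> message with one <doc> per row."""
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name, value in row.items():
            if value is None:
                continue  # skip NULL columns instead of sending empty fields
            ET.SubElement(doc, "field", {"name": name}).text = str(value)
    return ET.tostring(add, encoding="unicode")


def post_to_solr(xml_body, url=SOLR_UPDATE_URL):
    """POST an XML update message to Solr and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def sync_once(conn, since, send=post_to_solr):
    """Push rows changed after `since`; return the new high-water mark."""
    rows = fetch_changed_rows(conn, since)
    if not rows:
        return since
    send(build_add_xml(rows))
    send("<commit/>")
    return max(row["updated_at"] for row in rows)
```

You would call sync_once from cron or a timer loop and store the returned marker between runs. Deletes are not handled here; they would need their own mechanism.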

When/if MySQL full-text search is as good as Solr/Lucene, you can cut 
out one of the steps.

I could see a component added to Solr that did #1 above for you.  MG4J 
has a simple
loader that takes a SQL query and indexes the result 
(JdbcDocumentCollection). For
Solr, you'd want to be able to handle multi-valued fields, which 
complicates things.
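
To illustrate why multi-valued fields complicate a query-driven loader: a SQL join returns one row per value, so the loader has to fold several rows back into one document and emit the field once per value. A rough Python sketch follows; the "id" key and "category" field are made-up names, and this is not how MG4J or Solr does it.

```python
import xml.etree.ElementTree as ET


def group_rows(rows, key="id", multi=("category",)):
    """Fold joined rows (one per value) into one dict per document.

    Fields named in `multi` collect their distinct values in a list;
    every other field keeps the last value seen.
    """
    docs = {}
    for row in rows:
        doc = docs.setdefault(row[key], {})
        for name, value in row.items():
            if value is None:
                continue
            if name in multi:
                values = doc.setdefault(name, [])
                if value not in values:
                    values.append(value)
            else:
                doc[name] = value
    return list(docs.values())


def doc_to_xml(doc):
    """Render one document as a Solr <doc>, repeating multi-valued fields."""
    el = ET.Element("doc")
    for name, value in doc.items():
        for v in value if isinstance(value, list) else [value]:
            ET.SubElement(el, "field", {"name": name}).text = str(v)
    return ET.tostring(el, encoding="unicode")
```

The loader needs to be told which columns are multi-valued and which column identifies the document, and that is exactly the configuration a plain "index this query" tool does not have.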

If this architecture bothers technical folks, they either are accustomed 
to using very
expensive software, or haven't been doing this very long.

Of course, I am trying to figure out a way to make Solr more like a 
database, so there
you go...


Tim Archambault wrote:

> Okay, I'll use an example.
> A recruitment (jobs) customer goes onto our website and posts an 
> online job
> posting to our newspaper website. Upon insert into the database, I 
> need to
> generate an xml file to be sent to SOLR to ADD as a record to the search
> engine. Same goes for an edit, my database updates the record and then I
> have to send an ADD statement to Solr again to commit my change. 2x the
> work.
> I've been talking with other papers about Solr and I think what 
> bothers many
> is that there is a deposit of information in a structured database here
> [named A], then we have another set of basically the same data over here
> [named B] and they don't understand why they have to manage two different
> sets of data [A & B] that are virtually the same thing.  Many foresee a
> maintenance nightmare. I've come to the conclusion that there's 
> somewhat of
> a disconnect between what a database does and what a search engine 
> does. I
> accept that the redundancy is necessary given the very different tasks 
> that
> each performs [keep in mind I'm still naive to the programming details 
> here,
> I understand conceptually].
> In writing this to you another thought came to mind. Maybe there are
> alternative ways to inject records into Solr outside the bounds of the
> cygwin and CURL examples I've been using. Maybe that is the question 
> we need
> to be asking. What are some alternative ways to populate Solr?
> Enough said, it's Friday afternoon.
> Have a great weekend.
> Tim
> On 9/22/06, Erik Hatcher <> wrote:
>> On Sep 22, 2006, at 2:45 PM, Tim Archambault wrote:
>> > I believe there's a way to access MSSQL, MySQL etc. directly with
>> > Lucene,
>> > but not sure how to do this with SOLR.
>> Nope.  Lucene is a pure search engine, with no hooks to databases, or
>> document parsers, etc.  Lots of folks have built these kinds of
>> things on top of Lucene, but the Lucene core is purely the text engine.
>> How would you envision communicating with Solr with a database in the
>> picture?   How would the entire database be initially indexed?  How
>> would changes to the database trigger Solr updates?   I'm not quite
>> clear on what it would mean for Solr to work with a database directly
>> so I'm curious.
>>         Erik
