lucene-solr-user mailing list archives

From Hal Roberts
Subject Re: indexing db records via SolrJ
Date Mon, 16 Mar 2015 15:08:21 GMT
We import anywhere from five to fifty million small documents a day from 
a Postgres database.  I wrestled with DIH for about a year trying to get 
it to work for us, and was much happier once I ditched that approach and 
wrote the few hundred lines of relatively simple code that directly 
handle the logic of what gets updated and how it gets queried from 
Postgres.

The DIH stuff is great for lots of cases, but if you are getting to the 
point of trying to hack its undocumented internals, I suspect you are 
better off spending a day or two of your time just writing all of the 
update logic yourself.

We found a relatively simple combination of postgres triggers, export to 
csv based on those triggers, and then just calling update/csv to work 
best for us.
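
As a rough illustration of the last step (posting exported CSV to the 
update/csv handler), here is a minimal sketch, not our actual code.  The 
core URL, the field names, and the helper names rows_to_csv, 
build_csv_update_request and send are all assumptions made up for the 
example; in practice the rows would come from whatever table the 
triggers write to.

```python
import csv
import io
import urllib.parse
import urllib.request


def rows_to_csv(fieldnames, rows):
    """Serialize dict rows to CSV text; the header line names the Solr fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def build_csv_update_request(solr_core_url, csv_text, commit=True):
    """Build (but do not send) a POST to the core's update/csv handler.

    solr_core_url is an assumed value such as
    "http://localhost:8983/solr/collection1".
    """
    query = urllib.parse.urlencode({"commit": "true" if commit else "false"})
    return urllib.request.Request(
        url=f"{solr_core_url}/update/csv?{query}",
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "application/csv; charset=utf-8"},
        method="POST",
    )


def send(request):
    """Send the request and return the HTTP status code (needs a running Solr)."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

With a Solr instance running, something like 
send(build_csv_update_request(core_url, rows_to_csv(fields, rows))) 
would push one batch.  For large volumes it is probably better to post 
in batches and commit less often than once per request.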


On 3/16/15 9:59 AM, Shawn Heisey wrote:
> On 3/16/2015 7:15 AM, sreedevi s wrote:
>> I had checked this post. I don't know whether this is possible, but my
>> question is whether I can use the DIH configuration for indexing via SolrJ
> You can use SolrJ for accessing DIH.  I have code that does this, but
> only for full index rebuilds.
> It won't be particularly obvious how to do it.  Writing code that can
> interpret DIH status and know when it finishes, succeeds, or fails is
> very tricky because DIH only uses human-readable status info, not
> machine-readable, and the info is not very consistent.
> I can't just share my code, because it's extremely convoluted ... but
> the general gist is to create a SolrQuery object, use setRequestHandler
> to set the handler to "/dataimport" or whatever your DIH handler is, and
> set the other parameters on the request like "command" to "full-import"
> and so on.
> Thanks,
> Shawn
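
For reference, the request Shawn describes boils down to an HTTP call 
to the DIH handler with a "command" parameter.  The sketch below shows 
the plain HTTP equivalent rather than SolrJ.  The core URL and the 
helper names dih_url and is_idle are assumptions for illustration, and 
the idle check assumes the JSON response has a top-level "status" field 
that reads "idle" or "busy".  It deliberately does not try to decide 
whether an import succeeded, which is the hard part Shawn mentions.

```python
import urllib.parse


def dih_url(solr_core_url, command, handler="/dataimport", **params):
    """Build a DIH request URL, e.g. command="full-import" or "status".

    solr_core_url and handler are assumed values; use whatever path the
    DIH handler is registered under in solrconfig.xml.
    """
    query = urllib.parse.urlencode({"command": command, "wt": "json", **params})
    return f"{solr_core_url}{handler}?{query}"


def is_idle(status_response):
    """Return True if a parsed JSON status response reports DIH as idle.

    Assumes a top-level "status" key; this says nothing about whether
    the last import succeeded or failed.
    """
    return status_response.get("status") == "idle"
```

A caller would fetch dih_url(core_url, "full-import") once, then poll 
dih_url(core_url, "status") until is_idle() returns True.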

Hal Roberts
Berkman Center for Internet & Society
Harvard University
