lucene-solr-user mailing list archives

From Swaraj Kumar <>
Subject Re: What is the best way of Indexing different formats of documents?
Date Tue, 07 Apr 2015 12:31:53 GMT
You can use either DIH (DataImportHandler) or /update/extract to index documents in Solr.
DIH has several benefits, listed below:

1. Clean and re-index using a single command.
2. DIH can also optimize the index after an import using optimize=true.
3. You can run a delta-import based on the last index time, whereas with
/update/extract you have to handle incremental (delta) updates manually.
4. You can combine multiple entity processors and transformers in DIH,
which is very useful for indexing exactly the data you want.
5. The "rows" request parameter limits the number of records processed.
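As a rough illustration of the two routes described above, here is a minimal Python sketch that only builds the HTTP requests and sends nothing. The host, port, core name (`mycore`) and the `/dataimport` handler path are assumptions about a typical setup, not something stated in this thread, and `dih_url`, `extract_url` and `extract_request` are hypothetical helper names. The DIH parameters used (`command`, `clean`, `optimize`, `commit`, `rows`) and Solr Cell's `literal.<field>` convention are standard, but check them against the reference guide for your Solr version.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local Solr on the default port with a core named "mycore".
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"


def dih_url(command, clean=None, optimize=None, rows=None, commit=True):
    """Build a DataImportHandler URL (assumes DIH is registered at /dataimport)."""
    params = {"command": command, "commit": str(commit).lower()}
    if clean is not None:
        params["clean"] = str(clean).lower()        # wipe the index before importing
    if optimize is not None:
        params["optimize"] = str(optimize).lower()  # optimize after the import
    if rows is not None:
        params["rows"] = str(rows)                  # process at most this many rows
    return f"{SOLR_CORE_URL}/dataimport?{urlencode(params)}"


def extract_url(doc_id, commit=True):
    """Build an /update/extract (Solr Cell / Tika) URL.

    literal.id supplies the value of the unique key field for the document.
    """
    params = {"literal.id": doc_id, "commit": str(commit).lower()}
    return f"{SOLR_CORE_URL}/update/extract?{urlencode(params)}"


def extract_request(doc_id, payload, content_type):
    """Prepare (but do not send) a POST that streams raw file bytes to Solr."""
    return Request(
        extract_url(doc_id),
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )


if __name__ == "__main__":
    print(dih_url("full-import", clean=True, optimize=True))
    print(dih_url("delta-import"))
    print(extract_url("doc1"))
```

Passing the prepared request to `urllib.request.urlopen(req)` would stream the file bytes to Solr, which is one way an application can push documents directly instead of having DIH pull them from a database.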


Swaraj Kumar
Senior Software Engineer I
Mob No- 9811774497

On Tue, Apr 7, 2015 at 4:18 PM, <> wrote:

> Hi,
> I am new to Solr and come from a database background. We have a
> requirement to index files of different formats (X12, EDIFACT, CSV, XML).
> The input files can be in any of these formats and we need to do
> content-based search on them.
> From the web I understand we can use Tika to extract the content
> and store it in Solr. What I want to know is: is there a better approach
> for indexing files in Solr? Can we index documents by streaming them
> directly from the application? If so, what are the disadvantages
> (compared with DIH, which fetches from the database)? Could someone share
> some insight on this? Are there any web links I can refer to for an idea
> of how to do this? Please help.
> Thanks
> Sangeetha
