lucene-solr-user mailing list archives

From Sascha Szott <sz...@zib.de>
Subject Building documents using content residing both in database tables and text files
Date Tue, 11 Aug 2009 14:35:14 GMT
Hello,

is it possible (and if so, how) to configure DIH (the DataImportHandler)
to build index documents from content that resides in different data
sources?

Here is an example scenario:
Let's assume we have a table T with two columns, ID (the primary key of
T) and TITLE. Furthermore, each record in T is assigned a directory
containing text files that were generated from PDF documents using
Tika. Each directory is named after the ID of the record in T it
belongs to; e.g., all text files associated with the record with
id = 101 are stored in directory 101.

Is there a way to configure DIH such that it uses ID, TITLE and the 
content of all related text files when building a document (the 
documents should have three fields: id, title, and text)?
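For illustration only, here is a plain Python sketch of the mapping I
have in mind, done outside of DIH. It assumes the table is reachable
through a DB-API connection (sqlite3 is used purely for the demo), that
the extracted files end in ".txt", and that the contents of all files
belonging to a record are concatenated into a single "text" value.
build_documents is a hypothetical helper, not part of Solr or DIH.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_documents(conn, base_dir):
    """Yield one document per row of T with the fields id, title and text.

    Hypothetical helper: "text" is the concatenation of every *.txt file
    in base_dir/<ID>, read in file-name order (assumed file extension).
    """
    base = Path(base_dir)
    for record_id, title in conn.execute("SELECT ID, TITLE FROM T ORDER BY ID"):
        record_dir = base / str(record_id)  # e.g. base_dir/101 for ID 101
        texts = []
        if record_dir.is_dir():  # records without a directory get empty text
            for path in sorted(record_dir.glob("*.txt")):
                texts.append(path.read_text(encoding="utf-8"))
        yield {"id": str(record_id), "title": title, "text": "\n".join(texts)}


if __name__ == "__main__":
    # Demo data: an in-memory table T and a temporary directory tree.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, TITLE TEXT)")
    conn.execute("INSERT INTO T VALUES (101, 'Some title')")
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "101").mkdir()
        (Path(tmp) / "101" / "part1.txt").write_text("first file", encoding="utf-8")
        (Path(tmp) / "101" / "part2.txt").write_text("second file", encoding="utf-8")
        for doc in build_documents(conn, tmp):
            print(doc)
```

What I am looking for is the DIH equivalent of this join: one entity
reading ID and TITLE from the database, and a nested source reading the
files in the directory named after each ID.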

Furthermore, as you may have noticed, a second question arises
naturally: will Solr Cell and DIH be integrated in an upcoming release,
so that the PDF documents could be used directly instead of text files
extracted outside of Solr?

Best,
Sascha

