lucene-solr-user mailing list archives

From Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>
Subject Re: Building documents using content residing both in database tables and text files
Date Tue, 11 Aug 2009 15:00:52 GMT
Isn't it possible to do this by having two data sources (one Jdbc and
another File) and two entities? The outer entity can read from a DB
and the inner entity can read from a file.
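
A rough, untested sketch of what such a data-config.xml might look like
for the scenario quoted below. The driver class, JDBC URL, credentials,
and the base directory /data/texts are placeholders, not values from
this thread. It also assumes a DIH version that ships
PlainTextEntityProcessor, and it uses an intermediate
FileListEntityProcessor entity to enumerate the files in each record's
directory:

```xml
<dataConfig>
  <!-- Placeholder connection settings; replace with your own. -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.example.jdbc.Driver"
              url="jdbc:example://localhost/mydb"
              user="user" password="password"/>
  <dataSource name="files" type="FileDataSource" encoding="UTF-8"/>

  <document>
    <!-- Outer entity: one Solr document per row of table T. -->
    <entity name="record" dataSource="db" query="SELECT ID, TITLE FROM T">
      <field column="ID" name="id"/>
      <field column="TITLE" name="title"/>

      <!-- Lists the text files in the directory named after the row's ID.
           /data/texts is an assumed location. -->
      <entity name="file" processor="FileListEntityProcessor"
              dataSource="null"
              baseDir="/data/texts/${record.ID}"
              fileName=".*\.txt">

        <!-- Inner entity: reads each file's content into the text field. -->
        <entity name="content" processor="PlainTextEntityProcessor"
                dataSource="files"
                url="${file.fileAbsolutePath}">
          <field column="plainText" name="text"/>
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Since a record can have several files, the text field would presumably
need to be multiValued in schema.xml. The case of the column name in
${record.ID} may also depend on what your JDBC driver returns.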


On Tue, Aug 11, 2009 at 8:05 PM, Sascha Szott<szott@zib.de> wrote:
> Hello,
>
> is it possible (and if it is, how can I accomplish it) to configure DIH to
> build up index documents by using content that resides in different data
> sources?
>
> Here is an example scenario:
> Let's assume we have a table T with two columns, ID (which is the primary
> key of T) and TITLE. Furthermore, each record in T is assigned a directory
> containing text files that were generated out of pdf documents by using
> Tika. A directory name is built by using the ID of the record in T
> associated to that directory, e.g. all text files associated to a record
> with id = 101 are stored in directory 101.
>
> Is there a way to configure DIH such that it uses ID, TITLE and the content
> of all related text files when building a document (the documents should
> have three fields: id, title, and text)?
>
> Furthermore, as you may have noticed, a second question arises naturally:
> Will there be any integration of Solr Cell and DIH in an upcoming release,
> so that it would be possible to directly use the pdf documents instead of
> the extracted text files that were generated outside of Solr?

This is something I wish to see, but there has been no user request
yet. You can raise an issue and it can be looked into.
>
> Best,
> Sascha
>
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
