lucene-solr-user mailing list archives

From Shalin Shekhar Mangar <shalinman...@gmail.com>
Subject Re: DataImportHandler, BlobTransformer, FieldReaderDataSource and TikaEntityExtractor
Date Tue, 30 Jul 2013 07:49:29 GMT
There's no BlobTransformer in DataImportHandler. You'll have to write one.
Also, you'd probably need to write a FieldInputStreamDataSource instead of
FieldReaderDataSource.
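
For illustration, here is a minimal sketch of what such a hand-written
transformer might look like. It assumes that DataImportHandler accepts a
custom transformer class exposing a transformRow(Map<String, Object>) method,
so the sketch needs no Solr classes to compile. The class name
BlobToBytesTransformer and the choice to replace each BLOB with a byte array
are assumptions made for this example, not part of the Solr distribution.

```java
import java.sql.Blob;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
import javax.sql.rowset.serial.SerialBlob;

/**
 * Hypothetical transformer: replaces every java.sql.Blob value in a row
 * with its raw bytes so that a later stage can open a stream over them.
 */
public class BlobToBytesTransformer {

    /** Row-level hook; mutates the row in place and returns it. */
    public Object transformRow(Map<String, Object> row) {
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Blob) {
                // Replacing a value is not a structural change, so it is
                // safe to do while iterating over the entry set.
                entry.setValue(toBytes((Blob) value));
            }
        }
        return row;
    }

    /** Reads the whole BLOB into memory; only suitable for modest sizes. */
    static byte[] toBytes(Blob blob) {
        try {
            long length = blob.length();
            if (length > Integer.MAX_VALUE) {
                throw new IllegalStateException("BLOB too large: " + length);
            }
            // Blob positions are 1-based; some implementations reject
            // getBytes on an empty BLOB, so handle that case separately.
            return length == 0 ? new byte[0] : blob.getBytes(1, (int) length);
        } catch (SQLException e) {
            throw new IllegalStateException("Could not read BLOB", e);
        }
    }

    /** Small demonstration using an in-memory BLOB instead of a database. */
    public static void main(String[] args) throws Exception {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1);
        row.put("content", new SerialBlob(new byte[] {1, 2, 3}));
        new BlobToBytesTransformer().transformRow(row);
        System.out.println(row.get("content") instanceof byte[]);
    }
}
```

A data source that hands Tika an InputStream over those bytes would still be
needed; that part depends on the DataImportHandler DataSource API and is not
shown here.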


On Tue, Jul 30, 2013 at 12:30 PM, Raymond Wiker <rwiker@gmail.com> wrote:

> I have a case where I want to index documents and metadata content from a
> database. The metadata is not a problem, but it does not appear that I
> can handle the document content (held as BLOBs in the database) with
> out-of-the-box SOLR 4.4 functionality.
>
> I was hoping to be able to solve this by doing something like the
> following:
>
> *DataImportHandler *extracts all the columns (fields), including the
> document (BLOB)
>
> *BlobTransformer *to extract the BLOB content
>
> *FieldReaderDataSource *as a bridge between the extracted BLOB and Tika
>
> *TikaEntityExtractor *to extract the text and embedded metadata from the
> BLOB.
>
> The first problem is that "BlobTransformer" does not appear to exist. It
> could be that I need to load some additional jar files, or it could be that
> the "BlobTransformer" functionality is simply not part of the Solr
> distribution.
>
> Is there a way of handling this type of content using DataImportHandler, or
> do I need to write an external connector for it?
>



-- 
Regards,
Shalin Shekhar Mangar.
