lucene-solr-user mailing list archives

From Raymond Wiker <rwi...@gmail.com>
Subject DataImportHandler, BlobTransformer, FieldReaderDataSource and TikaEntityExtractor
Date Tue, 30 Jul 2013 07:00:28 GMT
I have a case where I want to index documents and metadata from a
database. The metadata is not a problem, but it does not appear that I
can handle the document content (held as BLOBs in the database) with
out-of-the-box Solr 4.4 functionality.

I was hoping to be able to solve this by doing something like the
following:

*DataImportHandler* extracts all the columns (fields), including the
document (BLOB)

*BlobTransformer* to extract the BLOB content

*FieldReaderDataSource* as a bridge between the extracted BLOB and Tika

*TikaEntityExtractor* to extract the text and embedded metadata from the
BLOB.
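
Roughly, the chain above might be expressed in a DIH data-config.xml
along the lines of the sketch below. This is an untested illustration
under assumptions: the JDBC driver, URL, credentials, and the table and
column names (documents, id, title, content) are hypothetical. It leaves
out the BlobTransformer step, and it swaps in FieldStreamDataSource for
FieldReaderDataSource, because the nested Tika entity (TikaEntityProcessor
in the Solr distribution) reads a binary stream rather than a character
reader.

```xml
<dataConfig>
  <!-- Hypothetical JDBC connection; replace driver, url, user and password -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/docs"
              user="solr" password="secret"/>

  <!-- Exposes a binary column of the parent row as a stream -->
  <dataSource name="blob" type="FieldStreamDataSource"/>

  <document>
    <!-- Outer entity: one row per document, metadata plus the BLOB column -->
    <entity name="doc" dataSource="db"
            query="SELECT id, title, content FROM documents">
      <field column="id" name="id"/>
      <field column="title" name="title"/>

      <!-- Inner entity: Tika parses the BLOB held in doc.content.
           url is a placeholder value (assumption: the stream comes
           from dataField, not from url). -->
      <entity name="tika" processor="TikaEntityProcessor"
              dataSource="blob" dataField="doc.content"
              url="content" format="text">
        <!-- "text" is the column in which Tika returns the extracted body -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

TikaEntityProcessor lives in the DIH extras jar, so that jar and the
Tika libraries would have to be on the classpath for a configuration
like this to load.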

The first problem is that "BlobTransformer" does not appear to exist. It
could be that I need to load some additional jar files, or it could be that
the "BlobTransformer" functionality is simply not part of the Solr
distribution.

Is there a way of handling this type of content using DataImportHandler, or
do I need to write an external connector for it?
