manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: MCF not indexing documents due to mime-type
Date Wed, 20 Dec 2017 12:53:10 GMT
Hi Phil,

Some output connectors accept *only* text documents.  That's why
you need to run your documents through Tika first.  So your original setup
was right.
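
As a rough illustration of why adding application/pdf to the included
mime types makes no difference, here is a minimal sketch of that kind of
check.  This is not the actual connector code: the function names, the
accepts_binary flag, and the set of text types are all assumptions made
up for the example.

```python
# Hypothetical sketch only; none of these names come from ManifoldCF itself.
# Assumption: a text-only connector accepts plain text and nothing else.
TEXT_ONLY_TYPES = {"text/plain"}


def normalize(mime_type):
    """Drop parameters such as '; charset=utf-8' and lowercase the type."""
    return mime_type.split(";", 1)[0].strip().lower()


def is_indexable(mime_type, accepts_binary, included=None, excluded=None):
    """Decide whether a document with this mime type would be sent on.

    accepts_binary stands in for a setting that lets the target extract
    text itself.  When it is False, the text-only test is applied before
    the include/exclude lists are ever consulted.
    """
    base = normalize(mime_type)
    if not accepts_binary:
        # The include list cannot rescue a non-text type here.
        return base in TEXT_ONLY_TYPES
    if excluded and base in excluded:
        return False
    if included:
        return base in included
    return True


# A PDF is rejected even though it is explicitly included.
print(is_indexable("application/pdf", False, included={"application/pdf"}))
# Text produced by an upstream extraction step passes.
print(is_indexable("text/plain; charset=utf-8", False))
```

With extraction done first (Tika in the pipeline), what reaches the output
connector is plain text, so it passes a check like this one.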

If you are still using ElasticSearch, the only way to make it accept
non-text documents is to specify the mapper attachment in the output
connection configuration.



Karl


On Wed, Dec 20, 2017 at 4:25 AM, Phillip Rhodes <motley.crue.fan@gmail.com>
wrote:

> MCF folks:
>
> I'm about to tear my hair out over this one... I just realized that
> I've been running MCF with the "Use the Extract Update Handler:"
> option checked.  Suspecting this might be related to another issue I
> was having (content was not being stored in the field named in the
> "Content field name:" option in MCF), I turned this option off.
>
> Now, MCF happily rejects nearly every document in my repository with this:
>
> Result Code: EXCLUDEDMIMETYPE
> Result Description: Excluding document because of mime type
> (application/pdf)
> (and so on for many other mime types)
>
> So... this is *not* what I would expect to happen as I have nothing at
> all listed in the "excluded mime types" setting for this output
> connector.  With nothing explicitly excluded, I would (perhaps
> naively) expect all mime types to be sent to Solr.
>
> But what makes it even worse is this: even when I explicitly add types
> (for example, application/pdf) to the "included mime types" setting
> and re-index, I *still* get the same message and no PDF files are
> indexed.
>
> Any ideas?  Is this a bug, or is there something else I need to do?
>
>
>
> Thanks,
>
>
> Phil
> ~~~
> This message optimized for indexing by NSA PRISM
>
