lucene-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2416) Solr Cell fails to index Zip file contents
Date Wed, 12 Jul 2017 20:11:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084599#comment-16084599 ]

Tim Allison commented on SOLR-2416:
-----------------------------------

This should have been fixed by SOLR-7189, no?  Or am I confusing DIH and Solr Cell?

In Tika 1.15 (TIKA-2096), we changed the default behavior to add an embedded parser if a user
fails to pass one in via the parse context.  So, if we upgrade to Tika 1.16 (just out), this
will be fixed, too.  We'll probably want to let Solr users configure turning off embedded
document handling...
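
For anyone following along, here is a rough sketch of the symptom in the report: the same zip yields only entry names when embedded content is skipped, and names plus bodies when it is read. It uses only the JDK's java.util.zip and is not Tika's or Solr's code. The class name ZipExtractionSketch, the methods extract and sampleZip, and the recurse flag are hypothetical names made up for illustration; the flag only stands in for the kind of embedded-document switch mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; this is NOT how Tika or Solr Cell is implemented.
public class ZipExtractionSketch {

    // Collects entry names, and entry bodies as UTF-8 text only when recurse is true.
    static List<String> extract(byte[] zipBytes, boolean recurse) throws IOException {
        List<String> out = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                out.add(entry.getName());
                if (recurse && !entry.isDirectory()) {
                    // Reads stop at the end of the current entry, not the whole archive.
                    ByteArrayOutputStream body = new ByteArrayOutputStream();
                    byte[] chunk = new byte[4096];
                    int n;
                    while ((n = zin.read(chunk)) != -1) {
                        body.write(chunk, 0, n);
                    }
                    out.add(new String(body.toByteArray(), StandardCharsets.UTF_8));
                }
            }
        }
        return out;
    }

    // Builds a one-entry zip in memory so the sketch needs no files on disk.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buf)) {
            zout.putNextEntry(new ZipEntry("a.txt"));
            zout.write("hello".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZip();
        System.out.println(extract(zip, false)); // names only: [a.txt]
        System.out.println(extract(zip, true));  // names and contents: [a.txt, hello]
    }
}
```

With recurse set to false the output matches what the reporter describes (file names only); with it set to true the entry text is included as well.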

> Solr Cell fails to index Zip file contents
> ------------------------------------------
>
>                 Key: SOLR-2416
>                 URL: https://issues.apache.org/jira/browse/SOLR-2416
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
>    Affects Versions: 1.4.1
>            Reporter: Jayendra Patil
>             Fix For: 6.0
>
>         Attachments: SOLR-2416_ExtractingDocumentLoader.patch, SOLR-4216.patch
>
>
> Working with the latest Solr Trunk code and seems the Tika handlers for Solr Cell (ExtractingDocumentLoader.java) and Data Import handler (TikaEntityProcessor.java) fails to index the zip file contents again.
> It just indexes the file names again.
> This issue was addressed some time back, late last year, but seems to have reappeared with the latest code.
> Jira for the Data Import handler part with the patch and the testcase - https://issues.apache.org/jira/browse/SOLR-2332.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

