lucene-dev mailing list archives

From "Shawn Heisey (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SOLR-7670) solr import files from multiple dataSource entity
Date Fri, 12 Jun 2015 14:09:00 GMT

     [ https://issues.apache.org/jira/browse/SOLR-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Heisey resolved SOLR-7670.
--------------------------------
    Resolution: Invalid

Issues like this should first be raised on the mailing list, to determine whether there is
a bug or just a misconfiguration.

My guess is that this is a misconfiguration, and I think I know what it is: you have
nested entities that use ${files.fileAbsolutePath} in the inner entity, but you don't have
any entity named "files". The outer entity is named files1 in the first nested case and
files2 in the second, so the inner entities should reference ${files1.fileAbsolutePath} and
${files2.fileAbsolutePath} respectively.
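If that guess is right, the corrected config would look something like the sketch below. It has not been tested against 5.1; the only intended change from the quoted config is the url attribute of each inner entity, which now names its own outer entity. The <field> mappings are omitted for brevity and would stay exactly as they are.

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="files1"
            dataSource="null"
            rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="/w/PDF/"
            fileName=".*\.(pdf)|(doc)|(docx)|(ppt)|(pptx)|(xls)|(xlsx)|(odf)|(txt)|(rtf)|(html)|(htm)|(jpg)"
            onError="skip"
            recursive="true">
      <!-- <field> mappings unchanged from the original config (omitted here) -->
      <entity name="documentImport1"
              processor="TikaEntityProcessor"
              url="${files1.fileAbsolutePath}"
              format="text">
        <!-- url now refers to the outer entity by its real name, files1 -->
      </entity>
    </entity>
    <entity name="files2"
            dataSource="null"
            rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="/w/KNOW-HOW/"
            fileName=".*\.(pdf)|(doc)|(docx)|(ppt)|(pptx)|(xls)|(xlsx)|(odf)|(txt)|(rtf)|(html)|(htm)|(jpg)"
            onError="skip"
            recursive="true">
      <!-- <field> mappings unchanged from the original config (omitted here) -->
      <entity name="documentImport2"
              processor="TikaEntityProcessor"
              url="${files2.fileAbsolutePath}"
              format="text">
        <!-- url now refers to the outer entity by its real name, files2 -->
      </entity>
    </entity>
  </document>
</dataConfig>
```

With ${files.fileAbsolutePath}, there is no entity called "files" to resolve the variable against, which would plausibly leave the Tika entity with an unusable path and explain the FileNotFoundException.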

If that is not the problem, please bring this issue up on the user mailing list. Use a paste
website (http://apaste.info, for example) to share the full stack trace of the exception
and any relevant configs.

https://lucene.apache.org/solr/resources.html#mailing-lists

I will mark this issue resolved.  If it turns out that there actually is a bug, we can re-open
it.

> solr import files from multiple dataSource entity
> -------------------------------------------------
>
>                 Key: SOLR-7670
>                 URL: https://issues.apache.org/jira/browse/SOLR-7670
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.1
>            Reporter: István Bakró Nagy
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I am trying to import files from multiple folders.
> My solrconfig.xml configures org.apache.solr.handler.dataimport.DataImportHandler to use the following file:
> <dataConfig>  
>     <dataSource type="BinFileDataSource" />
>         <document>
>             <entity name="files1"
>                     dataSource="null"
>                     rootEntity="false"
>                     processor="FileListEntityProcessor"
>                     baseDir="/w/PDF/"
>                     fileName=".*\.(pdf)|(doc)|(docx)|(ppt)|(pptx)|(xls)|(xlsx)|(odf)|(txt)|(rtf)|(html)|(htm)|(jpg)"
>                     onError="skip"
>                     recursive="true">
>                 <field column="fileAbsolutePath" name="id" />
>                 <field column="fileSize" name="size" />
>                 <field column="fileLastModified" name="lastModified" />
>                 <field column="file" name="fileName"/>
>                 <entity
>                     name="documentImport1"
>                     processor="TikaEntityProcessor"
>                     url="${files.fileAbsolutePath}"
>                     format="text">
>                     <field column="file" name="fileName"/>
>                     <field column="Author" name="author" meta="true"/>
>                     <field column="title" name="title" meta="true"/>
>                     <field column="text" name="text"/>
>                     <copyField source="content" dest="text"/>
>                 </entity>
>             </entity>
>             <entity name="files2"
>                     dataSource="null"
>                     rootEntity="false"
>                     processor="FileListEntityProcessor"
>                     baseDir="/w/KNOW-HOW/"
>                     fileName=".*\.(pdf)|(doc)|(docx)|(ppt)|(pptx)|(xls)|(xlsx)|(odf)|(txt)|(rtf)|(html)|(htm)|(jpg)"
>                     onError="skip"
>                     recursive="true">
>                 <field column="fileAbsolutePath" name="id" />
>                 <field column="fileSize" name="size" />
>                 <field column="fileLastModified" name="lastModified" />
>                 <field column="file" name="fileName"/>
>                 <entity
>                     name="documentImport2"
>                     processor="TikaEntityProcessor"
>                     url="${files.fileAbsolutePath}"
>                     format="text">
>                     <field column="file" name="fileName"/>
>                     <field column="Author" name="author" meta="true"/>
>                     <field column="title" name="title" meta="true"/>
>                     <field column="text" name="text"/>
>                     <copyField source="content" dest="text"/>
>                 </entity>
>             </entity>
>         </document> 
> </dataConfig>
> During import I get a FileNotFoundException.
> What am I missing?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


