lucene-solr-user mailing list archives

From solr2020 <psgoms...@gmail.com>
Subject Re: Apache Solr.
Date Mon, 03 Feb 2014 17:28:55 GMT
You can use a configuration like this in the DataImportHandler (DIH) config XML file to
index different types of files:

<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <!-- Outer entity: lists the files under baseDir. With rootEntity="false"
         it does not create documents itself; the inner entity does. -->
    <entity name="files" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="(enter the file repository path)"
            fileName=".*\.(doc|docx|pdf|txt|ppt|xls|xlsx|sql|vsd|zip)$"
            onError="skip" recursive="true">
      <field column="fileAbsolutePath" name="id" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastModified" />
      <!-- Inner entity: Tika extracts text and metadata from each listed file -->
      <entity name="tika-documentimport" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text">
        <field column="File" name="fileName" />
        <field column="Author" name="author" meta="true" />
      </entity>
    </entity>
  </document>
</dataConfig>
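A note on the `fileName` attribute: it is a Java regular expression. Written as `.*.(doc)|(pdf)|(docx)|...`, the `|` splits the whole pattern into separate alternatives (`.*.(doc)`, `(pdf)`, `(docx)`, ...) and the unescaped `.` matches any character. Assuming the file lister tests names with `Matcher.find()` (an assumption here, not checked against the DIH source), any name that merely contains `pdf` or `txt` gets picked up. If it used `matches()` instead, `report.pdf` would be rejected outright. Grouping the extensions, as in `.*\.(doc|docx|pdf|txt|ppt|xls|xlsx|sql|vsd|zip)$`, avoids both problems. The Java sketch below is a hypothetical stand-in for what the outer `files` entity does: walk `baseDir` recursively, filter names by the pattern, and emit the `fileAbsolutePath`, `fileSize` and `fileLastModified` columns that the config maps to `id`, `size` and `lastModified`. The class `FileListSketch` and its methods are made up for illustration and are not DIH code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the outer "files" entity; not the DIH implementation.
public class FileListSketch {

    // Pattern as posted: "|" splits it into ".*.(doc)", "(pdf)", "(docx)", ...
    // and the unescaped "." before "(doc)" matches any character.
    static final Pattern ORIGINAL = Pattern.compile(
        ".*.(doc)|(pdf)|(docx)|(txt)|(ppt)|(xls)|(xlsx)|(sql)|(vsd)|(zip)");

    // Extensions grouped in one alternation, dot escaped, anchored at the end.
    static final Pattern FIXED = Pattern.compile(
        ".*\\.(doc|docx|pdf|txt|ppt|xls|xlsx|sql|vsd|zip)$");

    // Assumption: file names are tested with find(), not matches().
    static boolean accepts(Pattern p, String fileName) {
        return p.matcher(fileName).find();
    }

    // One row per accepted file, keyed like the DIH implicit columns.
    static List<Map<String, Object>> listFiles(Path baseDir, Pattern p) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(baseDir)) { // like recursive="true"
            files = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Path f : files) {
            if (!accepts(p, f.getFileName().toString())) {
                continue; // name rejected by the fileName pattern
            }
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("fileAbsolutePath", f.toAbsolutePath().toString());
            row.put("fileSize", Files.size(f));
            // Epoch milliseconds in this sketch; DIH itself may use another type.
            row.put("fileLastModified", Files.getLastModifiedTime(f).toMillis());
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] names = {"report.pdf", "notes.docx", "pdf-howto.md", "data.txt.bak", "archive.zip"};
        for (String n : names) {
            System.out.printf("%-14s original=%-5b fixed=%b%n",
                n, accepts(ORIGINAL, n), accepts(FIXED, n));
        }
    }
}
```

Running `main` prints both verdicts for each sample name. `pdf-howto.md` and `data.txt.bak` are accepted by the posted pattern but rejected by the grouped one.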

Hope this helps.




--
View this message in context: http://lucene.472066.n3.nabble.com/Apache-Solr-tp4114996p4115102.html
Sent from the Solr - User mailing list archive at Nabble.com.
