lucene-solr-user mailing list archives

From scorpking <lehoank1...@gmail.com>
Subject Re: indexing data from rich documents - Tika with solr3.1
Date Tue, 13 Sep 2011 03:53:27 GMT
Hi,
Can you explain this problem to me?
I have indexed data from multiple files using the Tika libraries. I have
also indexed data over HTTP, but only from a single file (e.g.
http://myweb/filename.pdf). Now I have files in many formats under one HTTP
path (e.g. http://myweb/files/). I tried to index data from that HTTP path,
but it does not work. Here is my data-config:

<dataConfig>
    <dataSource type="BinURLDataSource" name="bin" encoding="utf-8"/>
    <document>
        <entity name="sd" processor="FileListEntityProcessor"
                fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)"
                baseDir="http://www.lc.unsw.edu.au/onlib/pdf/"
                recursive="true" rootEntity="false"
                transformer="DateFormatTransformer">
            <entity name="tika-test" processor="TikaEntityProcessor"
                    url="${sd.fileAbsolutePath}" format="text"
                    dataSource="bin">
                <field column="Author" name="author" meta="true"/>
                <field column="title" name="title" meta="true"/>
                <field column="text" name="text"/>
            </entity>
            <field column="file" name="filename"/>
        </entity>
    </document>
</dataConfig>

Error:
Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
'baseDir' value: http://www.lc.unsw.edu.au/onlib/pdf/ is not a directory
Processing Document # 1
	at org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:124)
	at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:69)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:552)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
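
One possible reading of the "is not a directory" message, stated as an
assumption and not checked against the Solr source: the baseDir value may be
tested as a path on the local filesystem, so an HTTP URL would fail that test.
A minimal Java sketch of that assumption follows. BaseDirCheck and
isLocalDirectory are hypothetical names used only for illustration; they are
not Solr classes or methods.

```java
import java.io.File;

public class BaseDirCheck {
    // Hypothetical helper, not part of Solr: treats the given string as a
    // local filesystem path and reports whether it is an existing directory.
    static boolean isLocalDirectory(String baseDir) {
        return new File(baseDir).isDirectory();
    }

    public static void main(String[] args) {
        // The baseDir value from the data-config above.
        String httpPath = "http://www.lc.unsw.edu.au/onlib/pdf/";
        // A path that normally is a real local directory, for comparison.
        String tmpDir = System.getProperty("java.io.tmpdir");

        System.out.println(httpPath + " -> " + isLocalDirectory(httpPath));
        System.out.println(tmpDir + " -> " + isLocalDirectory(tmpDir));
    }
}
```

If that assumption holds, the listing step would need a directory that exists
on the machine running Solr, while the BinURLDataSource in the inner entity
would still be what fetches each individual document.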

Thanks for your help.


