lucene-solr-user mailing list archives

From scorpking <lehoank1...@gmail.com>
Subject Re: indexing data from rich documents - Tika with solr3.1
Date Mon, 19 Sep 2011 10:04:00 GMT
Yes, I want to use DIH, and I tried to set up my data-config file, but it is
wrong. This is my config:

<dataConfig>
    <dataSource type="JdbcDataSource"
                driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                url="jdbc:sqlserver://ipAddress;databaseName=VTC_Edu"
                user="myuser" password="mypass" name="VTCEduDocument"/>

    <dataSource type="BinURLDataSource" name="dsurl"/>

    <document>
        <entity name="VTCEduDocument" pk="pk_document_id"
                query="select TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]"
                transformer="vn.vtc.solr.transformer.ImageFilter,vn.vtc.solr.transformer.RemoveHTML,RegexTransformer,TemplateTransformer,vn.vtc.solr.transformer.vntransformer,vn.vtc.solr.correctUnicodeString.correctUnicodeString,vn.vtc.solr.unescapeHtmlString.UnescapeHtmlString,vn.vtc.solr.correctISOString.correctISOString">
            <field column="pk_document_id" name="pk_document_id"/>
            <field column="s_path_origin" name="s_path_origin"/>
        </entity>

        <entity processor="TikaEntityProcessor" dataSource="dsurl" format="text"
                url="http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}">
            <field column="Author" name="author" meta="true"/>
            <field column="title" name="title" meta="true"/>
            <field column="text" name="text"/>
        </entity>
    </document>
</dataConfig>

And here is the error:
SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url null Processing Document # 1
	at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
	at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:89)
	at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:38)
	at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
	at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
	at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
Caused by: java.net.MalformedURLException: no protocol: nullselect TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]
	at java.net.URL.<init>(URL.java:567)
	at java.net.URL.<init>(URL.java:464)
	at java.net.URL.<init>(URL.java:413)
	at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:81)
	... 10 more
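
One reading of this trace: SqlEntityProcessor.initQuery is calling BinURLDataSource.getData, so the SQL entity appears to be bound to the URL data source instead of the JDBC one, and the SQL query is being parsed as a URL. The SQL entity has no dataSource attribute, and both data sources are named, so DIH may be picking the wrong one as the default. A possible layout is sketched here, untested: it names the JDBC source explicitly on the SQL entity and nests the Tika entity inside it so that ${VTCEduDocument.s_path_origin} can resolve for each row. The name "tika" on the inner entity is a made-up placeholder, and the transformer list is left out only to keep the sketch short.

```xml
<dataConfig>
  <!-- Same two data sources as in the config above; connection details unchanged -->
  <dataSource type="JdbcDataSource" name="VTCEduDocument"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://ipAddress;databaseName=VTC_Edu"
              user="myuser" password="mypass"/>
  <dataSource type="BinURLDataSource" name="dsurl"/>

  <document>
    <!-- Assumption: an explicit dataSource makes this entity use the JDBC source -->
    <entity name="VTCEduDocument" dataSource="VTCEduDocument" pk="pk_document_id"
            query="select TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]">
      <field column="pk_document_id" name="pk_document_id"/>
      <field column="s_path_origin" name="s_path_origin"/>

      <!-- Nested so the parent row's s_path_origin is available in the url.
           name="tika" is a hypothetical placeholder, not from the original config. -->
      <entity name="tika" processor="TikaEntityProcessor" dataSource="dsurl" format="text"
              url="http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If this reading is right, the "no protocol: nullselect TOP 10 ..." message should disappear, because the query would no longer be handed to BinURLDataSource.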

What is wrong with my config?
Thanks
