lucene-solr-user mailing list archives

From scr...@asia.com
Subject Re: Import XML files different format?
Date Wed, 23 Jun 2010 12:38:04 GMT
Thanks Eric for your answer.

I'll try to use DIH via data-config.xml, as I might index other content with a different XML
structure in the future.

Will I need a separate data-config for each XML content structure, and then switch between
them manually?

-----Original Message-----
From: Erik Hatcher <erik.hatcher@gmail.com>
To: solr-user@lucene.apache.org
Sent: Wed, Jun 23, 2010 2:19 pm
Subject: Re: Import XML files different format?


You can use DataImportHandler's XML/XPath capabilities to do this: 
 
  <http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource >

 
or you could, of course, convert your XML to Solr's XML format. 
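
A minimal sketch of that conversion, assuming the <race>/<go> layout quoted further down and assuming the Solr schema defines fields named after the child elements (id, title, url, ...). The function name race_to_solr_add is hypothetical, and this is only one way to do it:

```python
import xml.etree.ElementTree as ET


def race_to_solr_add(race_xml):
    """Convert <race><go>...</go></race> into Solr's <add><doc> update format.

    Assumption: each child element of <go> maps to a Solr field of the
    same name. CDATA sections are read as ordinary text by the parser.
    """
    race = ET.fromstring(race_xml)
    add = ET.Element("add")
    for go in race.findall("go"):
        doc = ET.SubElement(add, "doc")
        for child in go:
            field = ET.SubElement(doc, "field", name=child.tag)
            field.text = (child.text or "").strip()
    # Special characters such as & and < are escaped on serialization.
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    sample = b"""<?xml version="1.0" encoding="utf-8"?>
<race>
<go>
   <id><![CDATA[1]]></id>
   <title><![CDATA[Example title]]></title>
</go>
</race>"""
    print(race_to_solr_add(sample))
```

The resulting string could then be posted to Solr's XML update handler like any other <add> document.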
 
Another fine option, given what this data looks like, is CSV format. 
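
As a rough illustration of the CSV route, the sketch below flattens each <go> record into one CSV row with a header line of field names. The column list is taken from the sample XML quoted further down; the function name race_to_csv is hypothetical, and it is an assumption that the Solr schema uses these same field names:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Column names taken from the sample <go> record in this thread.
FIELDS = ["id", "title", "url", "content", "city",
          "postcode", "contract", "category", "date", "time"]


def race_to_csv(race_xml):
    """Flatten <race><go>...</go></race> into CSV text, one row per <go>.

    Missing or empty elements become empty cells.
    """
    race = ET.fromstring(race_xml)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for go in race.findall("go"):
        writer.writerow(
            {name: (go.findtext(name) or "").strip() for name in FIELDS}
        )
    return buf.getvalue()


if __name__ == "__main__":
    sample = b"<race><go><id>1</id><city>Paris</city></go></race>"
    print(race_to_csv(sample))
```

The csv module quotes values containing commas or newlines, so free-text fields such as content should survive the round trip.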
 
I'd imagine you have the original data in a relational database though? 
 
   Erik 
 
On Jun 23, 2010, at 7:59 AM, scrapy@asia.com wrote: 
 
> Hi, 
> 
> I'm new to Solr. It looks great. 
> 
> I would like to add an XML document in the following format to Solr: 
> 
> <?xml version="1.0" encoding="utf-8"?> 
> <race> 
> <go> 
>    <id><![CDATA[...]]></id> 
>    <title><![CDATA[...]]></title> 
>    <url><![CDATA[...]]></url> 
>    <content><![CDATA[...]]></content> 
>    <city><![CDATA[...]]></city> 
>    <postcode><![CDATA[...]]></postcode> 
>    <contract><![CDATA[...]]></contract> 
>    <category><![CDATA[...]]></category> 
>    <date><![CDATA[...]]></date> 
>    <time><![CDATA[...]]></time> 
> </go> 
> 
> etc... 
> </race> 
> 
> 
> 
> Is there a way to do this? If yes how? 
> 
> Or do I need to convert it with a script to this: 
> 
> <add> 
> <doc> 
>   <field name="authors">Patrick Eagar</field> 
>   <field name="subject">Sports</field> 
> etc... 
> 
> 
> Thanks for your help 
> 
> Regards 
 

 
