lucene-solr-user mailing list archives

From Marcelo Iturbe <marc...@santiago.cl>
Subject Re: Problems with DIH and missing fields.
Date Fri, 01 Apr 2011 15:31:28 GMT
Solved it!

commonField="true"
should be
commonField="false"

Mistakes like this happen when copying from a sample project...
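
For anyone hitting the same symptom: as far as I understand it, commonField="true" tells XPathEntityProcessor to carry a value seen in one record over to the following records, which matches the "previous contact's data" behaviour described below. A minimal sketch of the corrected entity, assuming the same URL, xpaths and column names as in the config quoted further down (commonField="false" is the default, so the attribute could probably just be dropped):

```xml
<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="gcontacts"
                pk="link"
                url="http://172.16.0.30/sayt2/contacts/testtim.xml"
                processor="XPathEntityProcessor"
                forEach="/feed/entry">

            <!-- Per-entry fields: commonField="false" so that a value missing
                 from one entry is NOT filled in from an earlier entry. -->
            <field column="source" xpath="/feed/entry/id" commonField="false" />
            <field column="source-link" xpath="/feed/entry/link[@rel='edit']/@href" commonField="false" />

            <field column="title" xpath="/feed/entry/title" commonField="false" />
            <field column="link" xpath="/feed/entry/link[@rel='edit']/@href" />
            <field column="email" xpath="/feed/entry/email/@address" commonField="false" />
            <field column="phoneNumber" xpath="/feed/entry/phoneNumber" commonField="false" />
            <field column="organization" xpath="/feed/entry/organization" commonField="false" />
            <field column="postalAddress" xpath="/feed/entry/postalAddress" commonField="false" />
        </entity>
    </document>
</dataConfig>
```

commonField="true" only seems appropriate for values that sit outside the forEach element and really are shared by every record, such as a feed-level title.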

Thanks for your help.


On Fri, Apr 1, 2011 at 10:29 AM, Marcelo Iturbe <marcelo@santiago.cl> wrote:

>
> Hello,
> I was able to repeat this behaviour in Solr 3.1.0
>
> The procedure is
>  - rename the directory example-DIH/rss to example-DIH/gcontacts
>  - modify solr.xml to only load gcontacts
>  - rename rss-data-config.xml to gcontacts-data-config.xml and modify (see
> content below)
>  - modify schema.xml
>
> This is from my schema.xml
>     <field name="source" type="text" indexed="true" stored="true" />
>     <field name="source-link" type="string" indexed="false" stored="true"
> />
>
>     <field name="title" type="string" indexed="true" stored="true" />
>     <field name="link" type="string" indexed="true" stored="true" />
>
>     <field name="email" type="string" indexed="true" stored="true"
> multiValued="true" default=" "/>
>     <field name="phoneNumber" type="string" indexed="true" stored="true"
> multiValued="true"  default=" "/>
>     <field name="organization" type="string" indexed="true" stored="true"
> multiValued="true"  default=" "/>
>
>     <field name="postalAddress" type="string" indexed="true" stored="true"
> multiValued="true"  default=" "/>
>
>     <field name="all_text" type="text" indexed="true" stored="true"
> multiValued="true" />
>     <copyField source="title" dest="all_text" />
>     <copyField source="email" dest="all_text" />
>     <copyField source="phoneNumber" dest="all_text" />
>     <copyField source="organization" dest="all_text" />
>     <copyField source="postalAddress" dest="all_text" />
>
> this is my gcontacts-data-config.xml file
> <dataConfig>
>     <dataSource type="URLDataSource" />
>     <document>
>         <entity name="gcontacts"
>                 pk="link"
>                 url="http://172.16.0.30/sayt2/contacts/testtim.xml"
>                 processor="XPathEntityProcessor"
>                 forEach="/feed/entry"
>                 >
>
>             <field column="source" xpath="/feed/entry/id"
> commonField="true" />
>             <field column="source-link"
> xpath="/feed/entry/link[@rel='edit']/@href" commonField="true" />
>
>             <field column="title" xpath="/feed/entry/title"
> commonField="true"/>
>             <field column="link"
> xpath="/feed/entry/link[@rel='edit']/@href" />
>             <field column="email" xpath="/feed/entry/email/@address"
> commonField="true"/>
>             <field column="phoneNumber" xpath="/feed/entry/phoneNumber"
> commonField="true"/>
>             <field column="organization" xpath="/feed/entry/organization"
> commonField="true"/>
>             <field column="postalAddress"
> xpath="/feed/entry/postalAddress"  commonField="true"/>
>         </entity>
>     </document>
> </dataConfig>
>
> This is from my solr.xml file
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <solr sharedLib="lib" persistent="true">
>     <cores adminPath="/admin/cores">
>         <core default="false" instanceDir="gcontacts" name="gcontacts"/>
>     </cores>
> </solr>
>
> Thanks for your help.
>
> Regards
>
>
> On Fri, Apr 1, 2011 at 4:27 AM, Stefan Matheis <
> matheis.stefan@googlemail.com> wrote:
>
>> Marcelo,
>>
>> could you paste the relevant parts of your DIH config?
>>
>> Regards
>> Stefan
>>
>> On Thu, Mar 31, 2011 at 9:55 PM, Marcelo Iturbe <marcelo@santiago.cl>
>> wrote:
>> > Hello,
>> > I have an XML which contains personal contacts. Not all contacts have
>> > the same fields (email, phone, postal).
>> >
>> > The problem is that when certain fields are NOT present, Solr is
>> > injecting the previous contact's data.
>> >
>> > For example, assume the following from the XML feed:
>> > <entry>
>> >        <title type='text'>Jane Doe</title>
>> >        <gd:email rel='http://schemas.google.com/g/2005#work' address='
>> > jane.doe@gmail.com' primary='true'/>
>> >        <gd:postalAddress rel='http://schemas.google.com/g/2005#home
>> > '>Santiago
>> >            Region Metropolitana
>> >        Chile</gd:postalAddress>
>> >    </entry>
>> >    <entry>
>> >        <title type='text'>Jeff Smith</title>
>> >        <gd:email rel='http://schemas.google.com/g/2005#work' address='
>> > jeff.smith@gmail.com' primary='true'/>
>> >    </entry>
>> >    <entry>
>> >        <title type='text'>Ana Mercurio</title>
>> >        <gd:phoneNumber rel='http://schemas.google.com/g/2005#mobile'
>> > primary='true'>+56912345678</gd:phoneNumber>
>> >    </entry>
>> >
>> > The second contact will have the first contact's postal address.
>> > The third contact will have Jane's postal address and Jeff's email
>> > address:
>> >
>> > <lst>
>> >    <arr name="title">
>> >        <str>Ana Mercurio</str>
>> >    </arr>
>> >    <arr name="phoneNumber">
>> >        <str>+5612345678</str>
>> >    </arr>
>> >    <arr name="email">
>> >        <str>jeff.smith@gmail.com</str>
>> >    </arr>
>> >    <arr name="postalAddress">
>> >        <str>Santiago
>> >            Region Metropolitana
>> >        Chile</str>
>> >    </arr>
>> > </lst>
>> >
>> > This is how I have the fields specified in the schema.xml file:
>> >    <field name="email" type="string" indexed="true" stored="true"
>> > multiValued="true" default=" "/>
>> >    <field name="phoneNumber" type="string" indexed="true" stored="true"
>> > multiValued="true"  default=" "/>
>> >    <field name="postalAddress" type="string" indexed="true"
>> stored="true"
>> > multiValued="true"  default=" "/>
>> >
>> > What did I miss?
>> >
>> > Thanks for your help.
>> >
>>
>
>
