lucene-solr-user mailing list archives

From Marcelo Iturbe <marc...@santiago.cl>
Subject Re: Problems with DIH and missing fields.
Date Fri, 01 Apr 2011 14:29:15 GMT
Hello,
I was able to reproduce this behaviour in Solr 3.1.0.

The procedure is
 - rename the directory example-DIH/rss to example-DIH/gcontacts
 - modify solr.xml to only load the gcontacts core
 - rename rss-data-config.xml to gcontacts-data-config.xml and modify it (see content below)
 - modify schema.xml
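
For the renamed config file to be picked up, the DIH request handler in the core's solrconfig.xml presumably has to point at the new file name as well. A minimal sketch, assuming the handler is registered the same way as in the stock rss example (the handler name /dataimport is taken from that example and is an assumption here):

```xml
<!-- solrconfig.xml of the gcontacts core (sketch, assumed layout) -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <!-- must match the renamed data-config file -->
        <str name="config">gcontacts-data-config.xml</str>
    </lst>
</requestHandler>
```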

This is from my schema.xml
    <field name="source" type="text" indexed="true" stored="true" />
    <field name="source-link" type="string" indexed="false" stored="true" />

    <field name="title" type="string" indexed="true" stored="true" />
    <field name="link" type="string" indexed="true" stored="true" />
    <field name="email" type="string" indexed="true" stored="true" multiValued="true" default=" "/>
    <field name="phoneNumber" type="string" indexed="true" stored="true" multiValued="true" default=" "/>
    <field name="organization" type="string" indexed="true" stored="true" multiValued="true" default=" "/>
    <field name="postalAddress" type="string" indexed="true" stored="true" multiValued="true" default=" "/>

    <field name="all_text" type="text" indexed="true" stored="true" multiValued="true" />
    <copyField source="title" dest="all_text" />
    <copyField source="email" dest="all_text" />
    <copyField source="phoneNumber" dest="all_text" />
    <copyField source="organization" dest="all_text" />
    <copyField source="postalAddress" dest="all_text" />

This is my gcontacts-data-config.xml file:
<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="gcontacts"
                pk="link"
                url="http://172.16.0.30/sayt2/contacts/testtim.xml"
                processor="XPathEntityProcessor"
                forEach="/feed/entry"
                >

            <field column="source" xpath="/feed/entry/id" commonField="true" />
            <field column="source-link" xpath="/feed/entry/link[@rel='edit']/@href" commonField="true" />

            <field column="title" xpath="/feed/entry/title" commonField="true"/>
            <field column="link" xpath="/feed/entry/link[@rel='edit']/@href" />
            <field column="email" xpath="/feed/entry/email/@address" commonField="true"/>
            <field column="phoneNumber" xpath="/feed/entry/phoneNumber" commonField="true"/>
            <field column="organization" xpath="/feed/entry/organization" commonField="true"/>
            <field column="postalAddress" xpath="/feed/entry/postalAddress" commonField="true"/>
        </entity>
    </document>
</dataConfig>
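
For comparison, here is a sketch of the same entity with commonField removed from every field. It rests on one assumption: the DIH documentation describes commonField="true" as copying a field's value, once seen in a record, into the records that follow, which would match the carry-over described in the quoted message. This variant has not been run against the feed, so it is something to try rather than a confirmed fix.

```xml
<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="gcontacts"
                pk="link"
                url="http://172.16.0.30/sayt2/contacts/testtim.xml"
                processor="XPathEntityProcessor"
                forEach="/feed/entry">

            <!-- Assumption: without commonField, each field is filled only
                 from the current /feed/entry record and is left empty
                 when the element is missing. -->
            <field column="source" xpath="/feed/entry/id" />
            <field column="source-link" xpath="/feed/entry/link[@rel='edit']/@href" />

            <field column="title" xpath="/feed/entry/title" />
            <field column="link" xpath="/feed/entry/link[@rel='edit']/@href" />
            <field column="email" xpath="/feed/entry/email/@address" />
            <field column="phoneNumber" xpath="/feed/entry/phoneNumber" />
            <field column="organization" xpath="/feed/entry/organization" />
            <field column="postalAddress" xpath="/feed/entry/postalAddress" />
        </entity>
    </document>
</dataConfig>
```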

This is from my solr.xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<solr sharedLib="lib" persistent="true">
    <cores adminPath="/admin/cores">
        <core default="false" instanceDir="gcontacts" name="gcontacts"/>
    </cores>
</solr>

Thanks for your help.

Regards

On Fri, Apr 1, 2011 at 4:27 AM, Stefan Matheis <matheis.stefan@googlemail.com> wrote:

> Marcelo,
>
> could you paste the relevant parts of your DIH config?
>
> Regards
> Stefan
>
> On Thu, Mar 31, 2011 at 9:55 PM, Marcelo Iturbe <marcelo@santiago.cl> wrote:
> > Hello,
> > I have an XML which contains personal contacts. Not all contacts have the
> > same fields (email, phone, postal).
> >
> > The problem is that when certain fields are NOT present, Solr is injecting
> > the previous contact's data.
> >
> > For example, assume the following from the XML feed:
> >    <entry>
> >        <title type='text'>Jane Doe</title>
> >        <gd:email rel='http://schemas.google.com/g/2005#work' address='jane.doe@gmail.com' primary='true'/>
> >        <gd:postalAddress rel='http://schemas.google.com/g/2005#home'>Santiago
> >            Region Metropolitana
> >        Chile</gd:postalAddress>
> >    </entry>
> >    <entry>
> >        <title type='text'>Jeff Smith</title>
> >        <gd:email rel='http://schemas.google.com/g/2005#work' address='jeff.smith@gmail.com' primary='true'/>
> >    </entry>
> >    <entry>
> >        <title type='text'>Ana Mercurio</title>
> >        <gd:phoneNumber rel='http://schemas.google.com/g/2005#mobile' primary='true'>+56912345678</gd:phoneNumber>
> >    </entry>
> >
> > The second contact will have the first contact's postal address.
> > The third contact will have Jane's postal address and Jeff's email address:
> >
> > <lst>
> >    <arr name="title">
> >        <str>Ana Mercurio</str>
> >    </arr>
> >    <arr name="phoneNumber">
> >        <str>+5612345678</str>
> >    </arr>
> >    <arr name="email">
> >        <str>jeff.smith@gmail.com</str>
> >    </arr>
> >    <arr name="postalAddress">
> >        <str>Santiago
> >            Region Metropolitana
> >        Chile</str>
> >    </arr>
> > </lst>
> >
> > This is how I have the fields specified in the schema.xml file:
> >    <field name="email" type="string" indexed="true" stored="true" multiValued="true" default=" "/>
> >    <field name="phoneNumber" type="string" indexed="true" stored="true" multiValued="true" default=" "/>
> >    <field name="postalAddress" type="string" indexed="true" stored="true" multiValued="true" default=" "/>
> >
> > What did I miss?
> >
> > Thanks for your help.
> >
>
