manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Field mapping for RSS feed
Date Tue, 02 Aug 2011 15:47:48 GMT
Hi Kate,

Field mapping won't do the trick, because the RSS connector is
currently very selective about which fields it extracts. It by no
means extracts all of them, so the ones that it *does* extract from
the feed are "special".

The behavior you describe sounds like a bug to me.  I'll go spelunking
through the code at first opportunity.  In the meantime, could you
create a Jira ticket describing the behavior you see vs. the behavior
you want?

Thanks!
Karl

On Tue, Aug 2, 2011 at 11:41 AM, K McGonigal <kmcgoniga@gmail.com> wrote:
> Hi,
>
> I'm trying to use ManifoldCF to index an RSS feed into Solr.  It sort of
> works, but my main problem at the moment is that the *channel* description
> from the RSS feed is written to the "description" field in Solr when I would
> really like the *item* description to be written instead.
>
> I have a typical RSS feed with the general structure:
>
> <rss>
>     <channel>
>         <title></title>
>         <link></link>
>         <description> *** the description I don't want *** </description>
>         <item>
>             <title></title>
>             <link></link>
>             <pubDate></pubDate>
>             <description> *** the description I do want *** </description>
>             <author></author>
>             <category></category>
>         </item>
>     </channel>
> </rss>
>
> I tried setting up the field mapping on the job with the XPath address of
> the second description, i.e. "/rss/channel/item/description" as the source,
> but that did not work.
>
> I suspect I'm overlooking something simple, but I've spent 2 days trying to
> solve it.  I would be grateful for any help.
>
>
> Kate McGonigal
>
>
>
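
For readers following the thread, the difference between the two description elements in the quoted feed can be shown outside ManifoldCF. The sketch below uses Python's standard xml.etree.ElementTree to read both the channel-level and the item-level description. It is only an illustration of the two paths: the sample feed text and the helper name `descriptions` are made up for this example, and it says nothing about how the RSS connector itself extracts fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed with the same shape as the one in the question.
FEED = """<rss>
    <channel>
        <title>Example feed</title>
        <link>http://example.com/</link>
        <description>channel description</description>
        <item>
            <title>First item</title>
            <link>http://example.com/1</link>
            <description>item description</description>
        </item>
    </channel>
</rss>"""


def descriptions(feed_xml):
    """Return (channel description, list of item descriptions)."""
    root = ET.fromstring(feed_xml)  # root is the <rss> element
    # Path relative to <rss>: equivalent to /rss/channel/description
    channel_desc = root.findtext("channel/description")
    # One entry per <item>: equivalent to /rss/channel/item/description
    item_descs = [
        item.findtext("description", default="")
        for item in root.findall("channel/item")
    ]
    return channel_desc, item_descs


channel_desc, item_descs = descriptions(FEED)
print(channel_desc)
print(item_descs)
```

The two lookups differ only in whether the path passes through `item`, which is the distinction the question turns on.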
