manifoldcf-user mailing list archives

From Karl Wright
Subject Re: Field mapping for RSS feed
Date Tue, 02 Aug 2011 15:47:48 GMT
Hi Kate,

The field mapping won't do the trick because the RSS connector is
currently very selective about what fields it extracts - it by no
means extracts all of them, so the ones that it *does* extract from
the feed are "special".

The behavior you describe sounds like a bug to me.  I'll go spelunking
through the code at first opportunity.  In the meantime, could you
create a Jira ticket describing the behavior you see vs. the behavior
you want?


On Tue, Aug 2, 2011 at 11:41 AM, K McGonigal wrote:
> Hi,
> I'm trying to use ManifoldCF to index an RSS feed into Solr.  It sort of
> works, but my main problem at the moment is that the *channel* description
> from the RSS feed is written to the "description" field in Solr when I would
> really like the *item* description to be written instead.
> I have a typical RSS feed with the general structure:
> <rss>
>     <channel>
>         <title></title>
>         <link></link>
>         <description> *** the description I don't want *** </description>
>         <item>
>             <title></title>
>             <link></link>
>             <pubDate></pubDate>
>             <description> *** the description I do want *** </description>
>             <author></author>
>             <category></category>
>         </item>
>     </channel>
> </rss>
> I tried setting up the field mapping on the job with the XPath address of
> the second description, i.e. "/rss/channel/item/description" as the source,
> but that did not work.
> I suspect I'm overlooking something simple, but I've spent 2 days trying to
> solve it.  I would be grateful for any help.
> Kate McGonigal
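
For reference, the difference between the two description elements in the quoted feed can be shown outside ManifoldCF. The following is a minimal Python sketch, assuming a feed shaped like the one above. It uses only the standard library's xml.etree.ElementTree and says nothing about how the RSS connector itself extracts fields. The sample feed text and the function names channel_description and item_descriptions are hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed mirroring the structure in the quoted message.
SAMPLE_FEED = """<rss>
    <channel>
        <title>Example channel</title>
        <link>http://example.com/</link>
        <description>channel description</description>
        <item>
            <title>First item</title>
            <link>http://example.com/1</link>
            <description>item description</description>
        </item>
    </channel>
</rss>"""


def channel_description(feed_xml):
    """Return the text of /rss/channel/description, or None if absent."""
    root = ET.fromstring(feed_xml)  # root is the <rss> element
    return root.findtext("channel/description")


def item_descriptions(feed_xml):
    """Return the text of /rss/channel/item/description for every item."""
    root = ET.fromstring(feed_xml)
    # Paths are relative to <rss>, so "channel/item" selects each <item>;
    # an item with no <description> contributes None to the list.
    return [item.findtext("description") for item in root.findall("channel/item")]


if __name__ == "__main__":
    print(channel_description(SAMPLE_FEED))
    print(item_descriptions(SAMPLE_FEED))
```

The two paths differ only in the item step, which is why a consumer that reads "description" without that step picks up the channel-level text instead.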
