manifoldcf-user mailing list archives

From K McGonigal <kmcgon...@gmail.com>
Subject Re: Field mapping for RSS feed
Date Tue, 02 Aug 2011 18:56:02 GMT
Hi Karl,

Thank you for your quick response. I've opened a Jira ticket for this,
though I don't really understand what sort of solution you had in mind, so I
didn't propose anything.

I'm afraid I don't understand exactly what the Dechromed Content options do
either. I read about them in the End User Documentation, but there wasn't
much there yet.

I find it odd that I would be the first person to have this problem. You'd
think it would be very common.


Kate
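
Karl's explanation in the quoted reply below can be modeled in a few lines. This is a hypothetical Python sketch, not ManifoldCF code. The mode strings "description" and "content" and both function names are assumptions made for illustration. "None" as a chrome mode and "item/description" as a field name come from Karl's message, and the latter is only a name he proposed.

```python
from typing import Dict, Optional

# Hypothetical model of the behaviour described in Karl's reply.
# Mode strings and function names are illustrative assumptions,
# not ManifoldCF identifiers.

def primary_content(chrome_mode: str, item: Dict[str, str]) -> Optional[str]:
    """Return the feed text indexed as primary content, if any."""
    if chrome_mode == "description":
        return item.get("description")
    if chrome_mode == "content":
        return item.get("content")
    # chrome_mode "None": the item description is ignored, even when present.
    return None


def proposed_fields(chrome_mode: str, item: Dict[str, str]) -> Dict[str, str]:
    """Karl's suggestion: expose the item description as its own field
    when the chrome mode is "None" (proposed name "item/description")."""
    fields: Dict[str, str] = {}
    if chrome_mode == "None" and "description" in item:
        fields["item/description"] = item["description"]
    return fields


item = {"description": "the description I do want"}
print(primary_content("None", item))
print(proposed_fields("None", item))
```

In this model, "None" mode drops the item description from primary content. The proposed field would carry it through separately, so it could then be mapped to the Solr "description" field.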


On Tue, Aug 2, 2011 at 11:05 AM, Karl Wright <daddywri@gmail.com> wrote:

> I just looked at the code.  It's not a bug so much as an oversight of
> sorts.  The "description" or "content" fields are indexed as the
> primary content of the document if the "chrome" mode is selected
> accordingly.  If "None" is the "chrome" mode, then the item-level
> description field is ignored even when present.
>
> So I recommend simply adding a new kind of "description" field for
> when the "chrome" mode is set to "None".  "item/description" may be
> its name, or maybe the full XPath, your choice.  Propose something in
> the ticket and I'll respond.
>
> Thanks!
> Karl
>
>
> On Tue, Aug 2, 2011 at 11:47 AM, Karl Wright <daddywri@gmail.com> wrote:
> > Hi Kate,
> >
> > The field mapping won't do the trick because the RSS connector is
> > currently very selective about what fields it extracts - it by no
> > means extracts all of them, so the ones that it *does* extract from
> > the feed are "special".
> >
> > The behavior you describe sounds like a bug to me.  I'll go spelunking
> > through the code at first opportunity.  In the meantime, could you
> > create a Jira ticket describing the behavior you see vs. the behavior
> > you want?
> >
> > Thanks!
> > Karl
> >
> > On Tue, Aug 2, 2011 at 11:41 AM, K McGonigal <kmcgoniga@gmail.com> wrote:
> >> Hi,
> >>
> >> I'm trying to use ManifoldCF to index an RSS feed into Solr.  It sort of
> >> works, but my main problem at the moment is that the *channel* description
> >> from the RSS feed is written to the "description" field in Solr when I would
> >> really like the *item* description to be written instead.
> >>
> >> I have a typical RSS feed with the general structure:
> >>
> >> <rss>
> >>     <channel>
> >>         <title></title>
> >>         <link></link>
> >>         <description> *** the description I don't want *** </description>
> >>         <item>
> >>             <title></title>
> >>             <link></link>
> >>             <pubDate></pubDate>
> >>             <description> *** the description I do want *** </description>
> >>             <author></author>
> >>             <category></category>
> >>         </item>
> >>     </channel>
> >> </rss>
> >>
> >> I tried setting up the field mapping on the job with the XPath address of
> >> the second description, i.e. "/rss/channel/item/description", as the source,
> >> but that did not work.
> >>
> >> I suspect I'm overlooking something simple, but I've spent 2 days trying to
> >> solve it.  I would be grateful for any help.
> >>
> >>
> >> Kate McGonigal
> >>
> >>
> >>
> >
>
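
The two descriptions in Kate's feed live at different paths, which is easy to confirm outside ManifoldCF. The sketch below uses Python's standard xml.etree.ElementTree on a trimmed copy of her feed structure, with placeholder text filled in. It only shows what each path selects. Per Karl's reply, the connector's field mapping does not currently accept arbitrary XPath sources like this. ElementTree evaluates paths relative to an element, so the absolute XPath "/rss/channel/item/description" becomes "channel/item/description" when starting from the <rss> root.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the feed structure from the original message,
# with placeholder text filled in for illustration.
FEED = """<rss>
  <channel>
    <title>Example channel</title>
    <description>the description I don't want</description>
    <item>
      <title>First item</title>
      <description>the description I do want</description>
    </item>
  </channel>
</rss>"""


def descriptions(feed_xml: str) -> dict:
    """Return channel-level and item-level descriptions separately."""
    root = ET.fromstring(feed_xml)  # root element is <rss>
    # Direct child of <channel> only: does not match the item's description.
    channel_desc = root.findtext("channel/description")
    # One entry per <item>, in document order.
    item_descs = [d.text for d in root.findall("channel/item/description")]
    return {"channel": channel_desc, "items": item_descs}


print(descriptions(FEED))
```

Because "channel/description" matches only a direct child of <channel>, the item-level text never leaks into the channel result, and vice versa.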
