manifoldcf-user mailing list archives

From K McGonigal <kmcgon...@gmail.com>
Subject Re: Trouble indexing a Twitter search in RSS format
Date Mon, 15 Aug 2011 15:38:35 GMT
Hmm, it's odd that the URLs didn't work for you. I've asked other people here
to try them and they had no problems.

After your suggestion I tried the web connector (but still with no access
credentials) and it did pretty well ingesting the RSS feed, so I might be
able to just use that.

I'm still mystified as to why the RSS connector couldn't handle it though. I
turned on DEBUG logging in Manifold, but that did not show anything unusual.

Thanks,
Kate
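Karl's reply, quoted below, attributes the trouble to a session-based login tracked with cookies, which the RSS connector does not maintain. Here is a minimal sketch of what "maintenance of session cookies" means. It uses Python's standard `http.cookiejar` module, not anything from ManifoldCF. Cookies set by a login response are remembered and sent back on later requests to the same host. The URLs, the cookie value, the `StubResponse` class and the `cookie_header_after_login` helper are all hypothetical. The stub stands in for a real HTTP response so the example runs without network access.

```python
import email.message
import http.cookiejar
import urllib.request


class StubResponse:
    """Hypothetical offline stand-in for an HTTP response.

    CookieJar.extract_cookies() only needs an object whose info() method
    returns the response headers as an email.message.Message.
    """

    def __init__(self, headers):
        self._headers = email.message.Message()
        for name, value in headers:
            self._headers[name] = value  # duplicate header names are kept

    def info(self):
        return self._headers


def cookie_header_after_login(login_url, set_cookie_values, next_url):
    """Record cookies from a (stubbed) login response, then return the Cookie
    header the jar attaches to a later request, or None if it attaches none."""
    jar = http.cookiejar.CookieJar()
    login_request = urllib.request.Request(login_url)
    response = StubResponse([("Set-Cookie", v) for v in set_cookie_values])
    jar.extract_cookies(response, login_request)

    follow_up = urllib.request.Request(next_url)
    jar.add_cookie_header(follow_up)  # adds "Cookie" only if a stored cookie matches
    return follow_up.get_header("Cookie")


if __name__ == "__main__":
    # Hypothetical host and cookie; the session cookie is replayed on the feed request.
    print(cookie_header_after_login(
        "http://example.com/login",
        ["session=abc123; Path=/"],
        "http://example.com/search.rss?q=test",
    ))  # session=abc123
```

With a live site, the same jar would normally be passed to `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))` so that cookies are captured and replayed automatically. This only illustrates the idea. It says nothing about how the ManifoldCF web connector handles login sequences.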

On Fri, Aug 12, 2011 at 3:58 PM, Karl Wright <daddywri@gmail.com> wrote:

> When I drop any of these URLs into my browser, I get redirected to a
> login screen.  Therefore it looks to me like Twitter does some kind of
> session-based login, tracked with cookies.  That would require
> maintenance of session cookies which the RSS connector simply does not
> do, and the coding of a login sequence as well.
>
> This is not a straightforward feature to add to the RSS connector, by any
> means.
>
> The web connector does have support for login sequencing and cookie
> session maintenance, and it does know how to chase RSS feeds, so that
> might be an option for you to try.  The problem is that most login
> sequences are non-trivial to set up and you will need a lot of
> patience and web spelunking skills to get it right.  The documentation
> is of some help but really could use a good example.
>
>
> Hope this helps.
> Karl
>
> On Fri, Aug 12, 2011 at 4:42 PM, K McGonigal <kmcgoniga@gmail.com> wrote:
> > Sorry to bother everyone again, but I'm having trouble with an RSS
> > connector job on a Twitter search. When I try to run a job on
> > http://search.twitter.com/search.rss?q=Campylobacter the fetch appears
> > to work OK, but the document ingestion does not occur.
> >
> > I was wondering if it is just my setup, or could it be the redirection
> > that Twitter does on the links? For instance, a link shown in the RSS
> > feed as http://twitter.com/VashinkaInuiel/statuses/101493222852923393
> > redirects to
> > http://twitter.com/#!/VashinkaInuiel/statuses/101493222852923393 when
> > it is followed.
> >
> > Any help is very appreciated.
>
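One detail may bear on the redirect question in the original message, although nobody in this thread confirms it as the cause of the failed ingestion. Everything after `#` in a URL is a fragment, which the client keeps to itself and never sends to the server. A client that follows the redirect to the `#!` form therefore requests only `/` on twitter.com, not the individual status. Here is a minimal sketch using the standard library's `urllib.parse.urlsplit`. The helper name `request_target` is hypothetical.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the path plus any query string that an HTTP client puts on its
    request line for url. The fragment (after '#') is left out, because
    fragments are resolved by the client and never sent to the server."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


if __name__ == "__main__":
    feed_link = "http://twitter.com/VashinkaInuiel/statuses/101493222852923393"
    redirected = "http://twitter.com/#!/VashinkaInuiel/statuses/101493222852923393"
    print(request_target(feed_link))      # /VashinkaInuiel/statuses/101493222852923393
    print(request_target(redirected))     # /
    print(urlsplit(redirected).fragment)  # !/VashinkaInuiel/statuses/101493222852923393
    print(request_target("http://search.twitter.com/search.rss?q=Campylobacter"))
    # /search.rss?q=Campylobacter
```

As far as the HTTP request goes, every status link that redirects this way becomes a request for the same `/` page on twitter.com.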
