manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Web Connector and dates
Date Tue, 25 Jun 2013 13:18:47 GMT
Hi Stephane,

This is tricky, because the date is included in the index and yet the
version of the document better not include the date, or there can be no
incremental behavior.  However, it is possible to do this.  If you need
such a feature, please create a ticket.  I'm very behind at the moment so
it is unlikely to be worked on promptly, but I will get to it as soon as I
can.

Karl
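
The tension described above (the crawl date has to reach the index, yet it must stay out of the document's version, or every fetch would look like a change) can be illustrated with a minimal sketch. This is not ManifoldCF code: `version_string`, `crawl`, and the dict used as an index are hypothetical names chosen for illustration, and the content hash is only one assumed way to build a version.

```python
import hashlib
from datetime import datetime, timezone


def version_string(content: bytes) -> str:
    """Hypothetical version: depends only on the fetched content,
    never on the time of the crawl."""
    return hashlib.sha256(content).hexdigest()


def crawl(url: str, content: bytes, index: dict, now: datetime) -> bool:
    """Re-index a document only when its version string changes.

    `index` is a plain dict standing in for the search index (an
    assumption, not ManifoldCF's storage); `now` is the time of this
    fetch. Returns True if the document was (re)indexed.
    """
    version = version_string(content)
    previous = index.get(url)
    if previous is not None and previous["version"] == version:
        # Unchanged content: skip indexing and keep the earlier crawl date.
        return False
    # New or changed content: store the version and the crawl date together.
    index[url] = {"version": version, "crawl_date": now.isoformat()}
    return True


# Example: two fetches of the same content, then a changed page.
index = {}
first = datetime(2013, 6, 25, 13, 0, tzinfo=timezone.utc)
second = datetime(2013, 6, 26, 13, 0, tzinfo=timezone.utc)
crawl("http://example.com/", b"hello", index, first)
crawl("http://example.com/", b"hello", index, second)
crawl("http://example.com/", b"hello, world", index, second)
```

Because the date is stored next to the version rather than inside it, the second fetch is skipped and the recorded crawl date only moves when the content itself changes. Had the date been part of the version, every fetch would have been re-indexed.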





On Tue, Jun 25, 2013 at 9:14 AM, Stephane Gamard <stephane@gamard.net> wrote:

> Hi Karl,
>
>
> I hear you about the web date. I was hoping Manifold would give me access
> to the date it crawled the document, and would update that date in case the
> page had been updated (in a later fetch). Would that kind of information be
> available?
>
>
>
> On June 25, 2013 at 3:06:04 PM, Karl Wright (daddywri@gmail.com) wrote:
>
> Hi Stephane,
>
> Web connector content does not in general include a date - it is not in
> the content, and there is no way to generate it out of nothing.  Thus the
> Web connector has no facility for processing dates, and does not attempt to
> do anything with them even when the documents it is crawling were
> referenced by an RSS feed.
>
> The date for content indexed by the RSS connector comes, if present, from
> fields in the RSS feed.  The dates are carried down from the feed to the
> referenced content.  This is one specialization that makes the RSS
> connector different from the more general Web connector.
>
> As for your observation that you are seeing no dates at all in Solr, as
> usual I must request that you include the Solr log info output for a
> document that you think should have a date attached but doesn't.  This info
> output shows all the arguments passed to Solr from ManifoldCF, and their
> names.  It should be obvious what is going on if we can see one of those
> lines.
>
> Thanks,
> Karl
>
>
>
> On Tue, Jun 25, 2013 at 8:55 AM, Stephane Gamard <stephane@gamard.net> wrote:
>
>> Hi All,
>>
>>
>> I'm getting more and more confused about the dates of ingested content.
>> Karl explained to me the (not yet documented) pudateiso metadata for the RSS
>> connector, and now I'm mixing it with content from the Web connector as well.
>> My ingested content from the Web connector has no date. I did the
>> following to make sure it would get something (I tried multiple configurations):
>>
>>
>>
>> on my solr-output:
>>
>>
>> And on my job:
>>
>> The ingested content has none of the date fields (test and/or _date)
>> populated. Does the Web connector abide by the same rules as the file and
>> other connectors, as described here:
>> https://issues.apache.org/jira/browse/CONNECTORS-657
>>
>>
>
