incubator-droids-dev mailing list archives

From Thorsten Scherler <>
Subject Re: Link interface for crawler
Date Wed, 30 Jan 2013 12:55:56 GMT
On 01/30/2013 12:31 PM, Tobias Rübner wrote:
> Hi Thorsten,
> I would propose to extend the ContentEntity and add the needed fields there.
> The Task should only contain data relevant for executing the task.
> All other "meta" data should be stored in the ContentEntity.
> The getTo Information can already be stored in ContentEntity.setLinks and
> getFrom is a reverse search on the same field.
> What do you think of this approach?

I prefer a well-defined interface, since the ContentEntity is in the end
a simple HashMap where we store information. We have a couple of
developments that actively use link.getLastModifiedDate() in the
filtering state that would now need to become

The lastModified value is important for the execution of the task in some
use cases, where you can filter on it. Furthermore, IMO not every
ContentEntity provides Links (a list of new tasks).

Regarding getTo and getFrom, it is a bit different. I will try to explain
by example. A page may contain links, so it creates a new Task for each
link, where getFrom is the page that contained the link and getTo is the
linked page. Both can be used for filtering, so I would like to have them
exposed directly in the link and not go via the contentEntity.
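
To illustrate, here is a minimal sketch of how such a link could expose
these values for filtering. The method names getFrom, getTo,
getLastModifiedDate and getAnchorText come from this thread; the return
types, the SimpleLink class and the ModifiedSinceFilter class are
assumptions made for the example and are not the actual Droids API.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical sketch of a link contract; return types are assumed.
interface Link {
    URI getFrom();               // page on which the link was found
    URI getTo();                 // page the link points to
    Date getLastModifiedDate();  // may be null when unknown
    String getAnchorText();      // text of the anchor, may be null
}

// Hypothetical immutable implementation used only for illustration.
final class SimpleLink implements Link {
    private final URI from;
    private final URI to;
    private final Date lastModified;
    private final String anchorText;

    SimpleLink(URI from, URI to, Date lastModified, String anchorText) {
        this.from = from;
        this.to = to;
        // Defensive copy because java.util.Date is mutable.
        this.lastModified = lastModified == null ? null : new Date(lastModified.getTime());
        this.anchorText = anchorText;
    }

    public URI getFrom() { return from; }
    public URI getTo() { return to; }
    public Date getLastModifiedDate() {
        return lastModified == null ? null : new Date(lastModified.getTime());
    }
    public String getAnchorText() { return anchorText; }
}

// Hypothetical filter: accepts links modified after a threshold.
// Links without a known modification date are accepted as well.
final class ModifiedSinceFilter {
    private final Date threshold;

    ModifiedSinceFilter(Date threshold) {
        this.threshold = new Date(threshold.getTime());
    }

    boolean accept(Link link) {
        Date lastModified = link.getLastModifiedDate();
        return lastModified == null || lastModified.after(threshold);
    }

    // Returns a new list containing only the accepted links.
    List<Link> filter(List<Link> links) {
        List<Link> accepted = new ArrayList<Link>();
        for (Link link : links) {
            if (accept(link)) {
                accepted.add(link);
            }
        }
        return accepted;
    }
}
```

With a typed accessor like this, the filter reads the date directly from
the link instead of looking up an untyped value in a map, which is the
main reason I would like to keep the interface.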

In general, if I understand you correctly, you propose to move the
"meta" data down to the contentEntity, but for me that metadata belongs
to the task.

> Tobias
> On Wed, Jan 30, 2013 at 12:05 PM, Thorsten Scherler <> wrote:
>> Hi all,
>> Tobias, I saw that you dropped the Link interface and moved the links to
>> the contentEntity. The problem I see is that a URL needs things like
>> getAnchorText if it is to be useful for the crawler. The same is true for
>> getFrom and getTo, which are needed to implement mapping rules.
>> Can I bring back the Link interface?
>> salu2
>> --
>> Thorsten Scherler <>
>> codeBusters S.L. - web based systems
>> <consulting, training and solutions>

Thorsten Scherler <>
codeBusters S.L. - web based systems
<consulting, training and solutions>
