incubator-droids-dev mailing list archives

From "Mingfai Ma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DROIDS-48) Support prioritizing in the TaskQueue
Date Thu, 18 Jun 2009 08:26:08 GMT

    [ https://issues.apache.org/jira/browse/DROIDS-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721121#action_12721121 ]

Mingfai Ma commented on DROIDS-48:
----------------------------------

Let me submit another patch. I have a habit of using my IDE's formatter, but I haven't set
it up to use this project's coding style, so ... :-P

P.S. This issue could be handled just by adding an integer weight field, but I feel
it would be most flexible if the LinkTask could hold arbitrary data. The simplest way is
to make it extend a Map.

{code}
public class LinkTask extends HashMap<String, Serializable> { // other interfaces are skipped
    protected final String id; // whatever data type for the ID
    protected final URI uri;   // refer to DROIDS-52, this may cause problems for URI

    // all the other data are optional
}
{code}

Use cases:
- When submitting a link, we may want to attach cookie/HTTP header information, so
the fetcher can use the cookie info when fetching.
- Any optional field, such as weight, could be stored.
- Any component, such as a filter or parser, could attach an arbitrary tag to a link.
For example, a parser/factory could read a "parser"/"contentType" value to decide how the
data should be parsed (so the parser interface doesn't depend on HttpEntity), or the
outlinks could be attached directly to a LinkTask.
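
To make the use cases above concrete, here is a minimal sketch of a map-backed LinkTask. Only
the class name, the id/uri fields and the HashMap<String, Serializable> superclass come from
the proposal; the constructor, the getters and the key names ("weight", "cookie",
"contentType") are assumptions made for illustration, not part of any patch.

```java
import java.io.Serializable;
import java.net.URI;
import java.util.HashMap;

// Hypothetical sketch of the proposed map-backed task. Declared without
// "public" only so the example fits in a single source file.
class LinkTask extends HashMap<String, Serializable> {
    private static final long serialVersionUID = 1L;

    // Assumed key names; the proposal does not fix any.
    static final String WEIGHT = "weight";
    static final String COOKIE = "cookie";
    static final String CONTENT_TYPE = "contentType";

    protected final String id;
    protected final URI uri;

    LinkTask(String id, URI uri) {
        this.id = id;
        this.uri = uri;
    }

    String getId() {
        return id;
    }

    URI getUri() {
        return uri;
    }

    // Optional weight: returns 0 when no component has stored an Integer
    // under the assumed "weight" key.
    int getWeight() {
        Serializable w = get(WEIGHT);
        return (w instanceof Integer) ? (Integer) w : 0;
    }
}
```

A submitter could then call task.put(LinkTask.COOKIE, "session=abc") and the fetcher could read
it back with task.get(LinkTask.COOKIE). One consequence of extending HashMap is worth keeping in
mind: equals and hashCode are inherited from the map and only look at the entries, so two tasks
with different ids but identical optional data would compare equal unless those methods are
overridden.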

I'm throwing the initial idea out here to see if anyone has comments. More details on the
implementation can be provided.

> Support prioritizing in the TaskQueue
> -------------------------------------
>
>                 Key: DROIDS-48
>                 URL: https://issues.apache.org/jira/browse/DROIDS-48
>             Project: Droids
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.01
>            Reporter: Mingfai Ma
>         Attachments: DROIDS-48d.patch, DROIDS-48d2.patch
>
>
> Use case:
>  - When looping over a directory (imagine someone who doesn't know the dmoz database
> can be downloaded and tries to crawl it with Droids), we collect a lot of links that will
> be handled later. Assume the requirement is to fetch the dmoz directory +1 link outside dmoz.org.
> In the original mechanism, it will keep adding new links to the TaskQueue. Ideally, there
> should be a mechanism to give a higher priority to the non-dmoz.org links, so when non-dmoz
> links are added, they are processed first and removed from the TaskQueue asap.
> With the patch in DROIDS-47, a constructor is added to the SimpleTaskQueue to support
> a custom Queue. This issue suggests changing the SimpleTaskQueue to use a PriorityBlockingQueue
> by default, and adding a getWeight to the Task interface.
> I'm also thinking about a more complex TaskQueue, to be discussed on the mailing list later.
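
The prioritizing idea in the description can be sketched with a plain
java.util.concurrent.PriorityBlockingQueue. This is not the SimpleTaskQueue code from the
patches: WeightedTask, SimpleWeightedTask and WeightedQueueDemo are hypothetical names, and the
rule that a larger weight is processed sooner is an assumption.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical stand-in for a Task interface that exposes the proposed getWeight.
interface WeightedTask {
    String getId();

    int getWeight();
}

// Minimal immutable task used only for this illustration.
final class SimpleWeightedTask implements WeightedTask {
    private final String id;
    private final int weight;

    SimpleWeightedTask(String id, int weight) {
        this.id = id;
        this.weight = weight;
    }

    @Override
    public String getId() {
        return id;
    }

    @Override
    public int getWeight() {
        return weight;
    }
}

final class WeightedQueueDemo {
    // Assumption: a larger weight means "process sooner", so compare in descending order.
    static PriorityBlockingQueue<WeightedTask> newQueue() {
        Comparator<WeightedTask> byWeightDescending =
                (a, b) -> Integer.compare(b.getWeight(), a.getWeight());
        // 11 is the default initial capacity; the queue itself is unbounded.
        return new PriorityBlockingQueue<>(11, byWeightDescending);
    }
}
```

With this ordering, a non-dmoz.org link added with weight 10 is polled before dmoz.org links
added with weight 0, even if it was queued later. The order among tasks of equal weight is not
guaranteed by PriorityBlockingQueue, so a tie-breaker (for example an insertion counter) would
be needed if FIFO order within a weight matters.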

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

