incubator-droids-dev mailing list archives

From "Mingfai Ma (JIRA)" <j...@apache.org>
Subject [jira] Created: (DROIDS-54) Make LinkTask supports arbitrary data by extends HashMap, and consider to refactor Task, Link, and LinkTask
Date Thu, 18 Jun 2009 10:29:07 GMT
Make LinkTask supports arbitrary data by extends HashMap, and consider to refactor Task, Link,
and LinkTask
-----------------------------------------------------------------------------------------------------------

                 Key: DROIDS-54
                 URL: https://issues.apache.org/jira/browse/DROIDS-54
             Project: Droids
          Issue Type: New Feature
          Components: core
    Affects Versions: 0.01
            Reporter: Mingfai Ma


Refer to the initial idea at:
https://issues.apache.org/jira/browse/DROIDS-48?focusedCommentId=12721121&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12721121

The current implementation of LinkTask:
{code}
public class LinkTask implements Link, Serializable
{
  private Date started;
  private final int depth;
  private final URI uri;
  private final Link from;
  
  private Date lastModifedDate;
  private Collection<URI> linksTo;
  private String anchorText;
  private int weight;
{code}

Suggested change:
{code}
public class LinkTask extends HashMap<String, Serializable> 
// or
public class LinkTask extends HashMap<String, Serializable> implements Link
{code}

The minimum required attributes are:
 - final ? id
   - mainly to provide a compact value that can serve as a hash key and be kept in memory or
in a data grid for lookup, e.g. as a fetch history to avoid duplicate fetching. Refer to DROIDS-53.
 - final String url
   - the original String representation of the URL (preferred), or a java.net.URI representation
with the encoded string (which seems less suitable).
   - the url is the original one provided by the user at construction. Two different URL strings
may refer to the same resource, e.g. http://www.apache.org and http://www.apache.org/; it is up
to the user to decide whether they should be normalized (they could use the URL/LinkNormalizer
in DROIDS-45).
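
To make the two required attributes concrete, here is a minimal sketch of a map-backed task. It is not the Droids API: the class name MapLinkTask, the key names "id" and "url", and the choice of String for the undecided id type are all assumptions made for illustration.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a task that is a Map, with two required entries.
class MapLinkTask extends HashMap<String, Serializable> {
  private static final long serialVersionUID = 1L;

  // Assumed key names; the issue does not define them.
  static final String ID = "id";
  static final String URL = "url";

  MapLinkTask(String id, String url) {
    if (id == null || url == null) {
      throw new IllegalArgumentException("id and url are required");
    }
    super.put(ID, id);
    // Stored exactly as given; normalization is left to the caller.
    super.put(URL, url);
  }

  String getId() { return (String) get(ID); }

  String getUrl() { return (String) get(URL); }

  // Approximates "final" for the required keys. A complete version would
  // also have to guard remove(), putAll(), clear(), and similar methods.
  @Override
  public Serializable put(String key, Serializable value) {
    if (ID.equals(key) || URL.equals(key)) {
      throw new UnsupportedOperationException(key + " is read-only");
    }
    return super.put(key, value);
  }
}

class MapLinkTaskDemo {
  public static void main(String[] args) {
    // Only the small id is kept in the history, not the whole task.
    Set<String> history = new HashSet<String>();

    MapLinkTask first = new MapLinkTask("a1", "http://www.apache.org");
    MapLinkTask again = new MapLinkTask("a1", "http://www.apache.org");

    System.out.println(history.add(first.getId())); // true: not fetched yet
    System.out.println(history.add(again.getId())); // false: duplicate id
  }
}
```

In this sketch http://www.apache.org and http://www.apache.org/ would only collide in the history if the caller assigned them the same id, which matches the point that normalization is the user's decision.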

The other fields are basically optional.
  - started/taskDate: useful if the queue uses it for sorting; otherwise it is just
for logging.
  - "weight" is another example of a field that not all implementations need.
  - "linksTo", a.k.a. outLinks, is also optional. An implementation may extract the outlinks
and put them in the queue directly without storing them in the LinkTask.
  - "from", a.k.a. referrer, should not be stored as a Link reference, because holding the
reference can keep the referring task from being garbage collected.
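
The optional fields could then be handled as plain map entries, as in the sketch below. The key names ("started", "weight", "linksTo", "fromId") and the helper methods are hypothetical. In particular, recording the referrer by its id instead of by object reference is one possible way to address the GC concern, not something the issue prescribes.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of optional task fields stored as map entries.
class OptionalFieldsSketch {
  // Assumed key names; none of these are defined by Droids.
  static final String STARTED = "started";
  static final String WEIGHT = "weight";
  static final String LINKS_TO = "linksTo";
  static final String FROM_ID = "fromId";

  // Creates a task that records its referrer by id only, so the parent
  // map itself is not kept reachable through the child.
  static Map<String, Serializable> child(
      String id, String url, Map<String, Serializable> parent) {
    Map<String, Serializable> task = new HashMap<String, Serializable>();
    task.put("id", id);
    task.put("url", url);
    task.put(FROM_ID, parent.get("id"));
    return task;
  }

  // Implementations that do not use weights never set the key;
  // readers fall back to a default of 0.
  static int weightOf(Map<String, Serializable> task) {
    Serializable w = task.get(WEIGHT);
    return (w instanceof Integer) ? ((Integer) w).intValue() : 0;
  }

  public static void main(String[] args) {
    Map<String, Serializable> parent = new HashMap<String, Serializable>();
    parent.put("id", "a1");
    parent.put("url", "http://www.apache.org");
    parent.put(STARTED, new Date()); // only needed if the queue sorts by it

    Map<String, Serializable> task =
        child("b2", "http://www.apache.org/foundation/", parent);

    // Outlinks are attached only if the implementation wants to keep them.
    ArrayList<String> outLinks = new ArrayList<String>();
    outLinks.add("http://www.apache.org/licenses/");
    task.put(LINKS_TO, outLinks);

    System.out.println(task.get(FROM_ID)); // a1
    System.out.println(weightOf(task));    // 0, since no weight was set
  }
}
```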

By the way, should we also simplify Link, Task and LinkTask? If we use a Map, it is already
very generic. Link and Task could become separate concepts if we need to use them separately.




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

