incubator-droids-dev mailing list archives

From Thorsten Scherler <scher...@gmail.com>
Subject Re: svn commit: r1439804 - in /incubator/droids/branches/0.2.x-cleanup/droids-core: ./ src/main/java/org/apache/droids/core/ src/main/java/org/apache/droids/handle/ src/main/java/org/apache/droids/parse/ src/main/java/org/apache/droids/taskmaster/ src/test...
Date Wed, 30 Jan 2013 10:22:54 GMT
On 01/30/2013 10:42 AM, Tobias Rübner wrote:
> Hi Thorsten,
>
> actually, while implementing the new HTTPClient Crawler I needed a simple
> and generic way for the parser to create new tasks.
> When the parser is used for extracting the links, it does not know anything
> about the kind of task.
> I have an example in the SimpleLinkParser
> https://svn.apache.org/repos/asf/incubator/droids/branches/0.2.x-cleanup/droids-core/src/main/java/org/apache/droids/parse/SimpleLinkParser.java
>
> So I thought this might be a good approach.

I understand what you are saying about the parser. I worked around that by
using getOutlinks() from the LinkTask to store the links, and I do the
extraction via a c3 pipeline.

    Collection<URI> linksTo = new HashSet<URI>();
    List<String> outLinks = pipeline.getOutLinks();
    for (String location : outLinks) {
        URI uri;
        try {
            uri = new URI(location);
            linksTo.add(uri);
        } catch (URISyntaxException ex) {
            logger.error("Invalid location: " + location, ex);
        }
    }
    ((LinkTask) task).setLinksTo(linksTo);

I will try your implementation now and start moving things like the
linkTask back to droids-crawler again. Let us see where I need to adapt.
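
To make the idea concrete, here is a minimal sketch of how a parser could
create follow-up tasks through createTask(URI) without knowing the concrete
task type. It is not the Droids code: the Task interface is cut down to two
methods, getURI() is assumed, and OutlinkTasks, fromLocations and
CreateTaskSketch are hypothetical names used only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo entry point (not part of Droids).
class CreateTaskSketch {
    public static void main(String[] args) {
        LinkTask seed = new LinkTask(null, URI.create("http://example.org/"), 0);
        List<Task> next = OutlinkTasks.fromLocations(
                seed, Arrays.asList("http://example.org/a", "http://example.org/bad path"));
        for (Task t : next) {
            System.out.println(t.getURI());
        }
    }
}

// Assumption: trimmed-down stand-in for org.apache.droids.core.Task.
// Only createTask(URI) comes from the commit; getURI() is assumed here.
interface Task {
    URI getURI();

    Task createTask(URI uri);
}

// Hypothetical link task that remembers its parent and its crawl depth.
class LinkTask implements Task {
    private final LinkTask from;
    private final URI uri;
    private final int depth;

    LinkTask(LinkTask from, URI uri, int depth) {
        this.from = from;
        this.uri = uri;
        this.depth = depth;
    }

    @Override
    public URI getURI() {
        return uri;
    }

    LinkTask getFrom() {
        return from;
    }

    int getDepth() {
        return depth;
    }

    // The task itself knows how to build a child: same type, one level deeper.
    @Override
    public LinkTask createTask(URI child) {
        return new LinkTask(this, child, depth + 1);
    }
}

// Hypothetical parser-side helper: it only sees the Task interface.
final class OutlinkTasks {
    private OutlinkTasks() {
    }

    static List<Task> fromLocations(Task parent, List<String> locations) {
        List<Task> tasks = new ArrayList<Task>();
        for (String location : locations) {
            try {
                tasks.add(parent.createTask(new URI(location)));
            } catch (URISyntaxException ex) {
                // Skip locations that are not valid URIs, as in the snippet above.
                System.err.println("Invalid location: " + location);
            }
        }
        return tasks;
    }
}
```

With this shape the depth bookkeeping stays inside the task, and the parser
only needs the parent task and the extracted locations.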

>
> It is always good to discuss, so please share your thoughts and we can
> create a great Droids-API!

Yeah, I really appreciate your efforts!

salu2

>
> Tobias
>
>
> On Tue, Jan 29, 2013 at 4:31 PM, Thorsten Scherler <scherler@gmail.com> wrote:
>
>> On 01/29/2013 04:23 PM, Thorsten Scherler wrote:
>>> On 01/29/2013 10:50 AM, tobr@apache.org wrote:
>>>> Modified:
>>>>     incubator/droids/branches/0.2.x-cleanup/droids-core/src/main/java/org/apache/droids/core/Task.java
>>>> URL: http://svn.apache.org/viewvc/incubator/droids/branches/0.2.x-cleanup/droids-core/src/main/java/org/apache/droids/core/Task.java?rev=1439804&r1=1439803&r2=1439804&view=diff
>>>> ==============================================================================
>>>> --- incubator/droids/branches/0.2.x-cleanup/droids-core/src/main/java/org/apache/droids/core/Task.java (original)
>>>> +++ incubator/droids/branches/0.2.x-cleanup/droids-core/src/main/java/org/apache/droids/core/Task.java Tue Jan 29 09:50:17 2013
>>>> @@ -59,4 +59,6 @@ public interface Task extends Serializab
>>>>      public void abort();
>>>>
>>>>      public boolean isAborted();
>>>> +
>>>> +    public Task createTask(URI uri);
>>>>  }
>>> Why did you add createTask to the interface?
>>>
>>> IMO it is not really generic. Having seen your implementation and given
>>> my current use case, I would rather have expected something like
>>>
>>> Link task = new LinkTask(link, uri, link.getDepth() + 1);
>>>
>>> /**
>>>      * Creates a new LinkTask.
>>>      *
>>>      * @param from Link
>>>      * @param uri URI
>>>      * @param depth int
>>>      */
>>>
>>> ...but I understand your approach as well.
>>>
>>> However, I create the tasks in my main CrawlingDroid, and I am
>>> trying to understand why you have done it like that.
>>>
>>> salu2
>>>
>> Actually I just fixed my custom code for the linkTask with
>>
>>     @Override
>>     public Link createTask(URI uri) {
>>         return new LinkTask(this, uri, this.getDepth() + 1);
>>     }
>>
>> salu2
>>
>> --
>> Thorsten Scherler <scherler.at.gmail.com>
>> codeBusters S.L. - web based systems
>> <consulting, training and solutions>
>>
>> http://www.codebusters.es/
>>
>>


-- 
Thorsten Scherler <scherler.at.gmail.com>
codeBusters S.L. - web based systems
<consulting, training and solutions>

http://www.codebusters.es/

