nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <>
Subject [jira] [Updated] (NUTCH-1959) Improving CommonCrawlFormat implementations
Date Thu, 12 Mar 2015 18:32:39 GMT


Lewis John McGibbney updated NUTCH-1959:
    Attachment: NUTCH-1959.v02.patch

Giuseppe's patch

> Improving CommonCrawlFormat implementations
> -------------------------------------------
>                 Key: NUTCH-1959
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.9
>            Reporter: Giuseppe Totaro
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>         Attachments: NUTCH-1959.patch, NUTCH-1959.v02.patch
> {{CommonCrawlFormat}} is an interface for Java classes that implement methods for writing
data into Common Crawl format. {{AbstractCommonCrawlFormat}} is an abstract class that implements
{{CommonCrawlFormat}} and provides abstract methods for "CommonCrawl formatter" classes.
> Attached is a patch that includes several improvements to {{CommonCrawlFormat}}-based implementations:
> * {{CommonCrawlFormat}} and {{AbstractCommonCrawlFormat}} now expose only the {{getJsonData()}}
method, which is responsible for returning the JSON data.
> * {{AbstractCommonCrawlFormat}} also provides the abstract methods that each subclass
has to implement in order to handle JSON objects.
> * {{CommonCrawlFormatSimple}} is a {{StringBuilder}}-based formatter that now also provides
escaping of JSON string values.
> This patch aims to provide a better interface for implementing and extending {{CommonCrawlFormat}}.
> I would really appreciate your feedback.
> Thanks a lot,
> Giuseppe
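The design described in the issue (a single public {{getJsonData()}} entry point, abstract JSON-handling methods in {{AbstractCommonCrawlFormat}}, and a {{StringBuilder}}-based {{CommonCrawlFormatSimple}} that escapes string values) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in the attached patch: the hook names ({{startObject}}, {{closeObject}}, {{writeKeyValue}}, {{generateJson}}), the constructor, and the record fields ({{url}}, {{mime_type}}, {{content}}) are assumptions chosen for the example, and the escaping follows the JSON string rules of RFC 8259.

```java
// Hypothetical sketch: hook names, constructor and record fields are
// illustrative assumptions, not the actual Nutch API from the patch.
interface CommonCrawlFormat {
    // Single public entry point: return the record serialized as JSON.
    String getJsonData();
}

abstract class AbstractCommonCrawlFormat implements CommonCrawlFormat {
    protected final String url;
    protected final String mimeType;
    protected final String content;

    protected AbstractCommonCrawlFormat(String url, String mimeType, String content) {
        this.url = url;
        this.mimeType = mimeType;
        this.content = content;
    }

    // Template method: the record layout is fixed here once; subclasses
    // only decide how each JSON construct is emitted.
    @Override
    public final String getJsonData() {
        startObject();
        writeKeyValue("url", url);
        writeKeyValue("mime_type", mimeType);
        writeKeyValue("content", content);
        closeObject();
        return generateJson();
    }

    // Abstract JSON-handling hooks every formatter must implement.
    protected abstract void startObject();
    protected abstract void closeObject();
    protected abstract void writeKeyValue(String key, String value);
    protected abstract String generateJson();
}

class CommonCrawlFormatSimple extends AbstractCommonCrawlFormat {
    private final StringBuilder sb = new StringBuilder();
    private boolean firstField = true;

    CommonCrawlFormatSimple(String url, String mimeType, String content) {
        super(url, mimeType, content);
    }

    // This sketch has no nested objects, so starting an object starts a new
    // document; resetting the buffer makes repeated getJsonData() calls safe.
    @Override
    protected void startObject() {
        sb.setLength(0);
        sb.append('{');
        firstField = true;
    }

    @Override
    protected void closeObject() {
        sb.append('}');
    }

    @Override
    protected void writeKeyValue(String key, String value) {
        if (!firstField) sb.append(',');
        firstField = false;
        sb.append('"').append(escape(key)).append("\":");
        if (value == null) sb.append("null");
        else sb.append('"').append(escape(value)).append('"');
    }

    @Override
    protected String generateJson() {
        return sb.toString();
    }

    // Escapes quote, backslash and control characters as JSON requires.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\b': out.append("\\b"); break;
                case '\f': out.append("\\f"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    // Remaining control characters use the \\uXXXX form.
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }
}
```

Because {{getJsonData()}} is final in the abstract class, a new formatter (for example one backed by a streaming JSON library) would only need to implement the four hooks, while the record layout stays in one place.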

This message was sent by Atlassian JIRA
