Date: Mon, 23 Dec 2019 10:13:00 +0000 (UTC)
From: "ASF GitHub Bot (Jira)"
To: dev@nutch.apache.org
Reply-To: dev@nutch.apache.org
Subject: [jira] [Commented] (NUTCH-1863) Add JSON format dump output to readdb command

[ 
https://issues.apache.org/jira/browse/NUTCH-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002181#comment-17002181 ]

ASF GitHub Bot commented on NUTCH-1863:
---------------------------------------

sebastian-nagel commented on pull request #490: Fix for NUTCH-1863: Add JSON format dump output to readdb command
URL: https://github.com/apache/nutch/pull/490#discussion_r360828643

 ##########
 File path: src/java/org/apache/nutch/crawl/CrawlDbReader.java
 ##########
 @@ -128,16 +137,37 @@ private void closeReaders() {
      readers = null;
    }

 -  public static class CrawlDatumCsvOutputFormat extends
 -      FileOutputFormat {
 -    protected static class LineRecordWriter extends
 -        RecordWriter {
 +  public static class JsonIndenter extends MinimalPrettyPrinter {
 +
 +    /**
 +     *
 +     */
 +    private static final long serialVersionUID = -4464852619186879060L;

 Review comment:
   Could also use the class annotation `@SuppressWarnings("serial")` because we never serialize the JsonIndenter.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Add JSON format dump output to readdb command
> ---------------------------------------------
>
>                 Key: NUTCH-1863
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1863
>             Project: Nutch
>          Issue Type: New Feature
>          Components: crawldb
>    Affects Versions: 2.3, 1.10
>            Reporter: Lewis John McGibbney
>            Assignee: Shashanka Balakuntala Srinivasa
>            Priority: Major
>             Fix For: 1.17
>
>
> Opening up the ability for third parties to consume Nutch crawldb data as JSON would be a positive thing IMHO.
> This issue should improve the readdb functionality of both 1.X to enable JSON dumps of crawldb data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
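
For reference, the alternative raised in the review comment could look roughly like the sketch below. It is not the code from the pull request. `BasePrinter` is a hypothetical stand-in for a serializable library base class (in the pull request the base class is Jackson's MinimalPrettyPrinter), and `separator()` and `SerialWarningDemo` are made up for illustration. The sketch assumes, as the reviewer states, that the subclass is never actually serialized.

```java
import java.io.Serializable;

// Hypothetical stand-in for a serializable library base class.
class BasePrinter implements Serializable {
  private static final long serialVersionUID = 1L;

  String separator() {
    return " ";
  }
}

// Without the annotation, javac's "serial" lint check would warn that this
// serializable subclass declares no serialVersionUID. Suppressing the warning
// replaces the generated serialVersionUID field and its empty Javadoc stub.
// This is only reasonable if instances are never serialized.
@SuppressWarnings("serial")
class JsonIndenter extends BasePrinter {
  @Override
  String separator() {
    return "\n";
  }
}

class SerialWarningDemo {
  public static void main(String[] args) {
    BasePrinter printer = new JsonIndenter();
    System.out.println("separator is a newline: " + "\n".equals(printer.separator()));
  }
}
```

The trade-off is that the class stays technically serializable without a pinned version identifier, so the annotation fits only as long as the "never serialized" assumption holds.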