nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2441) ARG_SEGMENT usage
Date Mon, 04 Dec 2017 08:49:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276460#comment-16276460 ]

ASF GitHub Bot commented on NUTCH-2441:
---------------------------------------

lewismc commented on issue #250: fix for NUTCH-2441 ARG_SEGMENT fix for REST API
URL: https://github.com/apache/nutch/pull/250#issuecomment-348896808
 
 
   No problems thanks, can you just update the formatting then please?
   
   On Mon, Dec 4, 2017 at 06:35 Semyon <notifications@github.com> wrote:
   
   > *@okedoki* commented on this pull request.
   > ------------------------------
   >
   > In src/java/org/apache/nutch/metadata/Nutch.java
   > <https://github.com/apache/nutch/pull/250#discussion_r154567386>:
   >
   > > @@ -96,7 +96,7 @@
   >  	 * Similar to the -dir command in the bin/nutch script **/
   >  	public static final String ARG_SEGMENTDIR = "segment_dir";
   >  	/** Argument key to specify the location of individual segment for the REST endpoints **/
   >
   > @lewismc <https://github.com/lewismc> The motivation is the following:
   > Some of the endpoints treat segments as an individual Path, others as an
   > array of Paths, so the REST API value is used inconsistently. The patch
   > allows both forms for all endpoints (either an array of paths or an
   > individual path, depending on what you send).
   -- 
   http://home.apache.org/~lewismc/
   @hectorMcSpector
   http://www.linkedin.com/in/lmcgibbney
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> ARG_SEGMENT usage
> -----------------
>
>                 Key: NUTCH-2441
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2441
>             Project: Nutch
>          Issue Type: Improvement
>          Components: metadata
>    Affects Versions: 1.13
>            Reporter: Semyon Semyonov
>             Fix For: 1.14
>
>         Attachments: metadataARG_SEGMENT.patch
>
>
> The constant public static final String ARG_SEGMENT = "segment" in metadata/Nutch.java is
> not used consistently. In some cases (Fetcher and ParseSegment) it is interpreted as a single
> segment; in others (CrawlDb, LinkDb, IndexingJob) as an array of segments. This mismatch makes
> the parameter's usage inconsistent.
> After a discussion with [~wastl-nagel], the proposed solution is to allow both an array and
> a string in all cases. That avoids introducing breaking changes.
> A patch is proposed.
> *The remaining question is refactoring: all five components share the same code (two
> versions of the same code, to be precise). Shouldn't we extract a method and reduce duplication?*
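
The idea of accepting either a single string or an array under the same key could be sketched roughly as follows. This is only an illustration, not the attached patch: the class SegmentArgs and the method segmentPaths are hypothetical names, plain strings stand in for Hadoop Path objects, and the assumption that a JSON array may arrive as a Collection is mine.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: normalizes the "segment" argument
// so callers always receive a list, whichever form the client sent.
public class SegmentArgs {

    // Same key value as Nutch.ARG_SEGMENT in the issue description.
    public static final String ARG_SEGMENT = "segment";

    public static List<String> segmentPaths(Map<String, Object> args) {
        List<String> paths = new ArrayList<>();
        Object value = args.get(ARG_SEGMENT);
        if (value == null) {
            return paths; // argument absent: the caller decides on a fallback
        }
        if (value instanceof String) {
            // Single segment, e.g. "crawl/segments/20171204"
            paths.add((String) value);
        } else if (value instanceof String[]) {
            for (String s : (String[]) value) {
                paths.add(s);
            }
        } else if (value instanceof Collection) {
            // Assumption: a JSON array may be deserialized into a Collection
            for (Object o : (Collection<?>) value) {
                paths.add(String.valueOf(o));
            }
        } else {
            throw new IllegalArgumentException(
                "Unsupported type for '" + ARG_SEGMENT + "': " + value.getClass());
        }
        return paths;
    }
}
```

Extracting one such method would also address the refactoring question above, since each of the five components could call it instead of repeating the type check.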



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
