nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2481) HostDatum deltas(previous step statistics) and Metadata expressions
Date Mon, 29 Jan 2018 15:28:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343490#comment-16343490 ]

ASF GitHub Bot commented on NUTCH-2481:
---------------------------------------

YossiTamari commented on a change in pull request #278: NUTCH-2481
URL: https://github.com/apache/nutch/pull/278#discussion_r164464185
 
 

 ##########
 File path: src/java/org/apache/nutch/hostdb/UpdateHostDbReducer.java
 ##########
 @@ -88,95 +97,128 @@ public void configure(JobConf job) {
     numericFields = job.getStrings(UpdateHostDb.HOSTDB_NUMERIC_FIELDS);
     stringFields = job.getStrings(UpdateHostDb.HOSTDB_STRING_FIELDS);
     percentiles = job.getInts(UpdateHostDb.HOSTDB_PERCENTILES);
-    
+
     // What fields do we need to collect metadata from
     if (numericFields != null) {
       numericFieldWritables = new Text[numericFields.length];
       for (int i = 0; i < numericFields.length; i++) {
         numericFieldWritables[i] = new Text(numericFields[i]);
       }
     }
-    
+
     if (stringFields != null) {
       stringFieldWritables = new Text[stringFields.length];
       for (int i = 0; i < stringFields.length; i++) {
         stringFieldWritables[i] = new Text(stringFields[i]);
       }
     }
 
+    stringDeltaExpression = job
+        .get(UpdateHostDb.HOSTDB_UPDATEDB_DELTA_EXPRESSION);
+    if (!org.apache.commons.lang3.StringUtils.isEmpty(stringDeltaExpression)) {
+      // Create or retrieve a JexlEngine
+      JexlEngine jexl = new JexlEngine();
+
+      // Dont't be silent and be strict
 
 Review comment:
   `Dont't` must be a typo, but beyond that, `setSilent(true)` seems to contradict this comment.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> HostDatum deltas(previous step statistics) and Metadata expressions
> -------------------------------------------------------------------
>
>                 Key: NUTCH-2481
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2481
>             Project: Nutch
>          Issue Type: Improvement
>          Components: hostdb
>            Reporter: Semyon Semyonov
>            Priority: Minor
>
> To allow the use of previous-step statistics (deltas of fetched, unfetched, etc.) in hostdb. The motivation is to use these statistics in generate with maxCount expressions.
>  
> The solution allows filling in the metadata of a HostDatum based on a custom JEXL expression that uses two HostDatum objects: one from before the update (previousHostDatum) and one from after the update (currentHostDatum).
> For example, to fill in the difference in the fetched count between rounds t and t-1, we can use the following expression:
> <property>
>  <name>hostdb.deltaExpression</name>
>  <value>{return new ("javafx.util.Pair","FetchedDelta", currentHostDatum.fetched - previousHostDatum.fetched);}</value>
> </property>
> A pull request will be provided shortly.
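
To make the arithmetic behind the example expression concrete, here is a minimal plain-Java sketch of the same computation: a named delta derived from the host statistics before and after an update. It does not use JEXL or Nutch. `HostStats` and `HostDeltaSketch` are hypothetical stand-ins for illustration, not classes from the pull request, and `SimpleEntry` is used in place of `javafx.util.Pair` only because JavaFX is not available on every JDK.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class HostDeltaSketch {

  /** Hypothetical stand-in for the per-host counters; not Nutch's HostDatum. */
  static final class HostStats {
    final long fetched;
    final long unfetched;

    HostStats(long fetched, long unfetched) {
      this.fetched = fetched;
      this.unfetched = unfetched;
    }
  }

  /**
   * Mirrors the example expression: a (name, value) pair holding
   * current.fetched - previous.fetched.
   */
  static Map.Entry<String, Long> fetchedDelta(HostStats previous, HostStats current) {
    return new SimpleEntry<>("FetchedDelta", current.fetched - previous.fetched);
  }

  public static void main(String[] args) {
    HostStats previous = new HostStats(120, 80); // state at round t-1
    HostStats current = new HostStats(150, 50);  // state at round t
    Map.Entry<String, Long> delta = fetchedDelta(previous, current);
    System.out.println(delta.getKey() + "=" + delta.getValue()); // prints "FetchedDelta=30"
  }
}
```

In the proposal itself, the pair is produced by the configured JEXL expression and stored in the host's metadata; the sketch only shows which value ends up under which key.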



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
