nutch-dev mailing list archives

From "Semyon Semyonov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (NUTCH-2481) HostDatum deltas(previous step statistics) and Metadata expressions
Date Wed, 17 Jan 2018 13:44:00 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Semyon Semyonov updated NUTCH-2481:
-----------------------------------
    Description: 
Allow statistics from the previous step (deltas of fetched, unfetched, etc.) to be used in hostdb.
The motivation is to use these statistics in generate with maxCount expressions.

 

The solution fills in hostdatum metadata from a custom JEXL expression that has access to
two hostdatum objects: the one before the update (previousHostDatum) and the one after the update (currentHostDatum).

For example, to store the difference in the fetched count between round t and round t-1, we can use
the following expression:

<property>
 <name>hostdb.deltaExpression</name>
 <value>{return new ("javafx.util.Pair", "FetchedDelta", currentHostDatum.fetched - previousHostDatum.fetched);}</value>
</property>
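
To illustrate what such an expression is meant to compute, here is a minimal plain-Java sketch of the same delta. It does not use JEXL or any Nutch class: `HostDeltaSketch`, `HostSnapshot` and `fetchedDelta` are hypothetical names introduced only for this example, and `AbstractMap.SimpleEntry` stands in for the key/value pair that the expression returns.

```java
import java.util.AbstractMap;
import java.util.Map;

public class HostDeltaSketch {

    /** Hypothetical stand-in for the per-host counters stored in the hostdb. */
    static final class HostSnapshot {
        final long fetched;
        final long unfetched;

        HostSnapshot(long fetched, long unfetched) {
            this.fetched = fetched;
            this.unfetched = unfetched;
        }
    }

    /**
     * Mirrors the example expression: returns a (key, value) pair whose value
     * is the fetched count at round t minus the fetched count at round t-1.
     */
    static Map.Entry<String, Long> fetchedDelta(HostSnapshot previous, HostSnapshot current) {
        return new AbstractMap.SimpleEntry<>("FetchedDelta", current.fetched - previous.fetched);
    }

    public static void main(String[] args) {
        HostSnapshot previous = new HostSnapshot(120, 800); // state before the update
        HostSnapshot current = new HostSnapshot(200, 740);  // state after the update

        Map.Entry<String, Long> delta = fetchedDelta(previous, current);
        System.out.println(delta.getKey() + " = " + delta.getValue());
    }
}
```

In the proposal itself the pair would come from the configured JEXL expression and end up in the hostdatum metadata; the sketch only shows the arithmetic on the two snapshots.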

A pull request will be provided shortly.

  was:
Allow statistics from the previous step (deltas of fetched, unfetched, etc.) to be used in hostdb.
The motivation is to use these statistics in generate with maxCount expressions.

 

The 


> HostDatum deltas(previous step statistics) and Metadata expressions
> -------------------------------------------------------------------
>
>                 Key: NUTCH-2481
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2481
>             Project: Nutch
>          Issue Type: Improvement
>          Components: hostdb
>            Reporter: Semyon Semyonov
>            Priority: Minor
>
> Allow statistics from the previous step (deltas of fetched, unfetched, etc.) to be used in hostdb.
> The motivation is to use these statistics in generate with maxCount expressions.
>  
> The solution fills in hostdatum metadata from a custom JEXL expression that has access to
> two hostdatum objects: the one before the update (previousHostDatum) and the one after the update (currentHostDatum).
> For example, to store the difference in the fetched count between round t and round t-1, we can use
> the following expression:
> <property>
>  <name>hostdb.deltaExpression</name>
>  <value>{return new ("javafx.util.Pair", "FetchedDelta", currentHostDatum.fetched - previousHostDatum.fetched);}</value>
> </property>
> A pull request will be provided shortly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
