nutch-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From sna...@apache.org
Subject [nutch] branch master updated: NUTCH-2685: README.md file for exchange-jexl plugin.
Date Tue, 22 Jan 2019 16:26:47 GMT
This is an automated email from the ASF dual-hosted git repository.

snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git


The following commit(s) were added to refs/heads/master by this push:
     new 636f576  NUTCH-2685: README.md file for exchange-jexl plugin.
     new 6e000e1  Merge pull request #429 from r0ann3l/NUTCH-2685
636f576 is described below

commit 636f576d8bf4276562d36a70e1dafb524783e503
Author: r0ann3l <roannel.fdez@gmail.com>
AuthorDate: Wed Jan 16 10:48:27 2019 -0500

    NUTCH-2685: README.md file for exchange-jexl plugin.
---
 src/plugin/exchange-jexl/README.md | 64 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/src/plugin/exchange-jexl/README.md b/src/plugin/exchange-jexl/README.md
new file mode 100644
index 0000000..2d20242
--- /dev/null
+++ b/src/plugin/exchange-jexl/README.md
@@ -0,0 +1,64 @@
+exchange-jexl plugin for Nutch  
+==============================
+
+**exchange-jexl plugin** decides which index writer a document should be routed to, based
on a JEXL expression.
+
+## Configuration
+
+The **exchange-jexl plugin** must be configured in the exchanges.xml file, included in the
official Nutch distribution.
+
+```xml
+<exchanges>  
+  <exchange id="<exchange_id>" class="org.apache.nutch.exchange.jexl.JexlExchange">
 
+    <writers>  
+      ...  
+    </writers>  
+    <params>  
+      <param name="expr" value="<jexl_expression>" />
+    </params>  
+  </exchange>  
+    ...  
+</exchanges>
+```
+
+Each `<exchange>` element has two mandatory attributes:
+
+* `<exchange_id>` is a unique identification for each configuration. It is used by
Nutch to distinguish each one, even when they are for the same exchange implementation and
this ID allows to have multiple instances for the same exchange, but with different configurations.
+
+* `org.apache.nutch.exchange.jexl.JexlExchange` corresponds to the canonical name of the
class that implements the Exchange extension point. This value must not be modified for the
**exchange-jexl plugin**.
+
+## Writers section
+
+The `<writers>` element is independent for each configuration and contains a list of
`<writer id="<id>">` elements, where `<id>` indicates the ID of index writer
where the documents should be routed.
+
+## Params section
+
+The `<params>` element is where the parameters that the exchange needs are specified.
Each parameter has the form `<param name="<name>" value="<value>"/>`.
+
+The unique parameter needed by the **exchange-jexl plugin** has the `<name>` **expr**
and the `<value>` is a JEXL expression used to validate each document. The variable
**doc** can be used on the expressions and represents the document itself. For example, the
expression `doc.getFieldValue('host')=='example.org'` will match the documents where the **host**
field has the value **example.org**.
+
+## Use case 1
+
+```xml
+<exchanges xmlns="http://lucene.apache.org/nutch"
+       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+       xsi:schemaLocation="http://lucene.apache.org/nutch exchanges.xsd">
+  <exchange id="exchange_jexl_1" class="org.apache.nutch.exchange.jexl.JexlExchange">
+    <writers>
+      <writer id="indexer_solr_1" />
+      <writer id="indexer_rabbit_1" />
+    </writers>
+    <params>
+      <param name="expr" value="doc.getFieldValue('host')=='example.org'" />
+    </params>
+  </exchange>
+  <exchange id="default" class="default">
+    <writers>
+      <writer id="indexer_dummy_1" />
+    </writers>
+    <params />
+  </exchange>
+</exchanges>
+```
+
+According to this example, the documents which the value of **host** field is **example.org**
will be sent to **indexer_solr_1** and **indexer_rabbit_1**. The rest of documents where **host**
is different to **example.org** do not match with **exchange_jexl_1** exchange and will be
sent where the default exchange says; in this case to **indexer_dummy_1**.
\ No newline at end of file


Mime
View raw message