nutch-dev mailing list archives

From "Sujen Shah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2153) Nutch REST API (DB) uses POST instead of GET to request
Date Wed, 28 Oct 2015 17:29:28 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978826#comment-14978826 ]

Sujen Shah commented on NUTCH-2153:
-----------------------------------

Hi [~ahmadia] and [~chrismattmann], 

Currently, when the Nutch REST service runs in local mode, the crawldb job finishes quickly.
In distributed mode, however, the same job can take a considerable amount of time, so a
client issuing a GET request would have to wait a long time for the response.

A POST request was used because the crawldb resource is created only when a user issues a
request; it is not precomputed, which is usually the case when a GET is used. The /db endpoint
still needs development so that it can spin up threads for the computation, as the /job
endpoint does, and then provide a GET interface to query the results.
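
To make the idea concrete, here is a minimal sketch of the "start the work on POST, query it
with GET" pattern described above. It is not Nutch code: JobStore, submit, status and wait are
hypothetical names, the registry is in-memory only, and the HTTP layer is left out so that the
example stays self-contained. It is written in Python purely for brevity.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobStore:
    """Hypothetical in-memory job registry (illustration only, not a Nutch class)."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}
        self._lock = threading.Lock()

    def submit(self, task, *args):
        """What a POST handler could do: start the work and return an id at once."""
        job_id = uuid.uuid4().hex
        future = self._pool.submit(task, *args)
        with self._lock:
            self._jobs[job_id] = future
        return job_id

    def status(self, job_id):
        """What a GET handler could do: report the current state without blocking."""
        with self._lock:
            future = self._jobs.get(job_id)
        if future is None:
            return {"state": "UNKNOWN"}
        if not future.done():
            return {"state": "RUNNING"}
        error = future.exception()
        if error is not None:
            return {"state": "FAILED", "error": str(error)}
        return {"state": "FINISHED", "result": future.result()}

    def wait(self, job_id, timeout=None):
        """Helper for demos and tests: block until the job has ended."""
        with self._lock:
            future = self._jobs[job_id]
        future.exception(timeout=timeout)
```

With something like this behind the /db endpoint, the POST would return the job id immediately
and the client would poll a GET on that id until the state is FINISHED, so no request has to
stay open for the whole duration of a distributed crawldb job.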

I have tried to apply the same concept in the commoncrawldump service, since that might also
take a long time as the amount of crawled data increases.

I would like to know your thoughts on how to handle such cases, where issuing a GET requires
computing the resource.

Thanks!

> Nutch REST API (DB) uses POST instead of GET to request
> -------------------------------------------------------
>
>                 Key: NUTCH-2153
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2153
>             Project: Nutch
>          Issue Type: Bug
>          Components: REST_api
>    Affects Versions: 1.11
>            Reporter: Aron Ahmadia
>            Priority: Trivial
>              Labels: memex
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
