nutch-dev mailing list archives

From "Sujen Shah (JIRA)" <>
Subject [jira] [Commented] (NUTCH-2153) Nutch REST API (DB) uses POST instead of GET to request
Date Wed, 28 Oct 2015 17:29:28 GMT


Sujen Shah commented on NUTCH-2153:

Hi [~ahmadia] and [~chrismattmann], 

Currently, when the Nutch REST services are used in local mode, the crawldb job finishes
quickly. In distributed mode, however, the same job can take a considerable amount of time,
so a client issuing a GET request would have to wait a long time for the response.

A POST request was used because the crawldb resource is created when a user issues the request
rather than being precomputed (which is usually the case when a GET is used). The /db endpoint
still needs development: it should spin up threads for the computation, as the /job endpoint
does, and then provide a GET interface for querying the results.
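
A rough sketch of that pattern, for illustration only: a POST starts the computation on a
worker thread and returns an id immediately, and a later GET looks up the status or result.
This is not the Nutch REST API. JobStore, slow_crawldb_stats and the response fields are
hypothetical names, and the status codes are just one common convention.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobStore:
    """Hypothetical in-memory registry of background jobs (not Nutch code)."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}
        self._lock = threading.Lock()

    def submit(self, fn, *args):
        """What a POST handler could do: start the work, return at once."""
        job_id = uuid.uuid4().hex
        future = self._pool.submit(fn, *args)
        with self._lock:
            self._futures[job_id] = future
        # 202 Accepted: the request was taken, the resource is not ready yet.
        return 202, {"id": job_id, "state": "RUNNING"}

    def get(self, job_id):
        """What a GET handler could do: report state, never block on the job."""
        with self._lock:
            future = self._futures.get(job_id)
        if future is None:
            return 404, {"error": "unknown job"}
        if not future.done():
            return 200, {"id": job_id, "state": "RUNNING"}
        error = future.exception()
        if error is not None:
            return 200, {"id": job_id, "state": "FAILED", "error": str(error)}
        return 200, {"id": job_id, "state": "FINISHED", "result": future.result()}


def slow_crawldb_stats(delay_seconds):
    """Stand-in for a long-running crawldb job; the result is a placeholder."""
    time.sleep(delay_seconds)
    return {"placeholder": True}
```

A client would call submit once, keep the returned id, and poll get until the state is no
longer RUNNING, so no single request has to stay open for the duration of the job.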

I have tried to apply the same concept to the commoncrawldump service, since that might also
take a long time as the amount of crawled data increases.

I would like to know your thoughts on how to handle such cases, where serving a GET requires
computing the resource.


> Nutch REST API (DB) uses POST instead of GET to request
> -------------------------------------------------------
>                 Key: NUTCH-2153
>                 URL:
>             Project: Nutch
>          Issue Type: Bug
>          Components: REST_api
>    Affects Versions: 1.11
>            Reporter: Aron Ahmadia
>            Priority: Trivial
>              Labels: memex

This message was sent by Atlassian JIRA
