manifoldcf-user mailing list archives

From Shinichiro Abe <shinichiro.ab...@gmail.com>
Subject Re: Http status code 302
Date Wed, 09 Jan 2013 09:29:13 GMT
I'm using the web connector.

> Are you trying to crawl through a proxy?
No. I just set that URL as a seed, without a proxy.
(Also, I didn't obey robots.txt.)

Using curl, I get the same result as you.

Could you reproduce that?

Shinichiro

On 2013/01/09, at 17:49, Karl Wright wrote:

> When I try the URL you gave using curl and no special arguments, I get this:
> 
> 
> C:\Users\Karl>curl -vvv "http://lucene.jugem.jp/?eid=39"
> * About to connect() to lucene.jugem.jp port 80 (#0)
> *   Trying 210.172.160.170... connected
> * Connected to lucene.jugem.jp (210.172.160.170) port 80 (#0)
>> GET /?eid=39 HTTP/1.1
>> User-Agent: curl/7.21.7 (i386-pc-win32) libcurl/7.21.7 OpenSSL/1.0.0c zlib/1.2.5 librtmp/2.3
>> Host: lucene.jugem.jp
>> Accept: */*
>> 
> < HTTP/1.1 200 OK
> < Date: Wed, 09 Jan 2013 08:47:52 GMT
> < Server: Apache/2.0.59 (Unix)
> < Vary: User-Agent,Host,Accept-Encoding
> < Last-Modified: Tue, 08 Jan 2013 07:58:33 GMT
> < Accept-Ranges: bytes
> < Content-Length: 22594
> < Cache-Control: private
> < Pragma: no-cache
> < Connection: close
> < Content-Type: text/html
> 
> There's no 302 from here.
> 
> Are you trying to crawl through a proxy?  If so, that might be where
> the problem lies.
> 
> Karl
> 
> On Wed, Jan 9, 2013 at 3:40 AM, Karl Wright <daddywri@gmail.com> wrote:
>> It sounds like the httpclient upgrade definitely broke something.  We
>> should open a ticket.
>> 
>> But first, can you confirm what connector this is?  Is it the web
>> connector?  If so, I am puzzled because the web connector has always
>> logged any 302 return, but then queued a second document which it
>> subsequently fetches.
>> 
>> Karl
>> 
>> On Wed, Jan 9, 2013 at 2:10 AM, Shinichiro Abe
>> <shinichiro.abe.1@gmail.com> wrote:
>>> Hi,
>>> 
>>> I'm using trunk code and crawling a web site with seeds that include http://lucene.jugem.jp/?eid=39 (Koji's blog; I don't obey robots.txt).
>>> When I look at Simple History, it shows a 302 result code for the fetch activity, and the document isn't ingested.
>>> 
>>> When I used MCF 1.0.1 in the same situation, Simple History showed a 200 result code and MCF could ingest documents.
>>> 
>>> Why does trunk show a 302 status? Is it related to the httpclient upgrade?
>>> 
>>> Thanks in advance,
>>> Shinichiro Abe

