manifoldcf-user mailing list archives

From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject Re: Unexpected job status encountered
Date Thu, 03 Jan 2019 12:52:26 GMT

It works now because I have implemented preemptive authentication. I'll
create a ticket, because this is something I think we should support.

I have analyzed the logs once again. MCF never gets as far as
authenticating. Well, it tries, but it cannot repeat the request entity
after Solr rejects the first attempt. That's why I mentioned that
preemptive authentication could be a solution. Then we only need to post
to Solr once, instead of going through the unnecessary two-step
authentication process:
1. Try to post
2. Solr server sends a 401 response
3. Try to post once again using the header: "Authorization: Basic ******"

That's not very efficient if you have to post, say, 100,000 documents.

This is actually what happens:
1. http-outgoing-200 >> "POST /solr/uio/update/extract HTTP/1.1[\r][\n]"
2. http-outgoing-200 << "HTTP/1.1 401 Unauthorized[\r][\n]"
3. IO exception during indexing
https://www.journals.uio.no/index.php/bioimpedance/article/view/3350: null
org.apache.http.client.ClientProtocolException
(Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot
retry request with a non-repeatable request entity.)
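
To illustrate why the retry in step 3 cannot happen, here is a minimal
JDK-only sketch. It assumes the document body is handed to the HTTP
client as a one-shot stream, which is the usual reason an entity is
reported as non-repeatable. The class and method names are hypothetical
and are not taken from the MCF Solr connector.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class OneShotBody {

    // Drains the stream twice and reports how many bytes each pass yields.
    // The first pass stands in for the initial POST, the second for the
    // retry that would follow the 401 response.
    public static int[] readTwice(InputStream in) throws IOException {
        int first = in.readAllBytes().length;   // initial POST consumes the body
        int second = in.readAllBytes().length;  // retry finds nothing left to send
        return new int[] {first, second};
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical document content; a real crawl would stream the fetched file.
        InputStream body = new ByteArrayInputStream(
                "document content".getBytes(StandardCharsets.UTF_8));
        int[] sizes = readTwice(body);
        System.out.println("first send: " + sizes[0] + " bytes, retry: " + sizes[1] + " bytes");
    }
}
```

Once the stream has been consumed by the first POST, there is no body
left for the authenticated retry, so the client gives up instead of
sending an empty request.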

By using preemptive authentication, the following is now being sent to
Solr in the first request:
http-outgoing-30 >> "POST /solr/uio/update/extract HTTP/1.1[\r][\n]"
http-outgoing-30 >> "Authorization: Basic **************[\r][\n]"
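
As a rough sketch of the idea, and not the actual connector patch,
preemptive Basic authentication just means computing the Authorization
header up front and attaching it to the very first request. The example
below uses only the JDK (java.net.http, Java 11+) rather than the Apache
HttpClient that MCF uses. The host, port, credentials, content type and
all class and method names are assumptions for illustration; only the
/solr/uio/update/extract path comes from the logs above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PreemptiveBasicAuth {

    // Builds the value of a Basic Authorization header: "Basic " followed by
    // the Base64 encoding of "user:password".
    public static String basicAuthHeader(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // Builds a POST that already carries the credentials, so the server has
    // no reason to answer 401 and the body only has to be sent once.
    public static HttpRequest buildPost(URI uri, String user, String password, byte[] body) {
        return HttpRequest.newBuilder(uri)
                .header("Authorization", basicAuthHeader(user, password))
                .header("Content-Type", "application/octet-stream") // assumed content type
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical host and credentials; nothing is sent over the network here.
        URI solr = URI.create("http://localhost:8983/solr/uio/update/extract");
        HttpRequest request = buildPost(solr, "user", "pass", new byte[] {1, 2, 3});
        System.out.println(request.method() + " " + request.uri().getPath());
        System.out.println("Authorization: "
                + request.headers().firstValue("Authorization").orElse("(none)"));
    }
}
```

The trade-off is that the credentials go out with every request whether
or not the server asks for them, so this only makes sense when the
target is known to require Basic auth, preferably over HTTPS.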

Preemptive authentication is also suggested as a solution to other
developers facing the same problem:
https://developer.ibm.com/answers/questions/266117/im-getting-this-exception-trying-to-add-doc-to-wat/

I can create a patch or PR. It's very easy to implement, and we have
done it for all the other Solr connectors we have developed.

Erlend
