lucene-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post
Date Tue, 25 Sep 2018 10:30:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627138#comment-16627138 ]

Karl Wright commented on SOLR-12798:
------------------------------------

It looks like the only implementer of ContentWriter is StringPayloadContentWriter, which just
furnishes a string for output, correct?

In order to work within that framework, ContentStreamUpdateHandler would need a streaming
ContentWriter implementation that pulls from the input and writes to the output.  That seems
to be missing.  And then this has nothing whatsoever to do with how the content is actually
transmitted -- it seems that the assumption is that the new ContentWriter stuff all goes via
PUT with metadata in the URL.  That's not good for two reasons: first, the URL length problems
I've already mentioned, and second -- Solr Cell uses the "name" part of the multipart post
to inject its own bit of metadata into the document, and there would be no way to transmit
that anymore.  Logic is still therefore going to be needed to use multipart forms under specific
circumstances.  Maybe there needs to be a useMultipart() method in all Requests, and HttpSolrClient
should look at that to decide whether to use multipart or standard PUT?
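To make the two suggestions above concrete, here is a rough sketch, not a patch.  The
ContentWriter interface below is a local stand-in declared only so the example compiles on
its own; its two methods are assumed to mirror the SolrJ interface of the same name.
StreamingContentWriter, MultipartAware and TransportChooser are hypothetical names that do
not exist in SolrJ.  The first class copies an input stream to the output in chunks; the
second part shows how a client could consult a useMultipart() flag before picking a transport.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Local stand-in for SolrJ's ContentWriter (assumed shape), so this compiles alone.
interface ContentWriter {
    void write(OutputStream os) throws IOException;
    String getContentType();
}

// Hypothetical streaming implementation: pulls from the input, writes to the output.
final class StreamingContentWriter implements ContentWriter {
    private final InputStream in;
    private final String contentType;

    StreamingContentWriter(InputStream in, String contentType) {
        this.in = in;
        this.contentType = contentType;
    }

    @Override
    public void write(OutputStream os) throws IOException {
        // Copy in fixed-size chunks so the whole document is never held in memory.
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            os.write(buf, 0, n);
        }
        os.flush();
        // The caller owns both streams; neither is closed here.
    }

    @Override
    public String getContentType() {
        return contentType;
    }
}

// Hypothetical hook a request could expose to ask for multipart transmission.
interface MultipartAware {
    boolean useMultipart();
}

// Hypothetical client-side decision: multipart only when the request asks for it.
final class TransportChooser {
    enum Transport { MULTIPART_POST, CONTENT_WRITER_BODY }

    static Transport choose(Object request) {
        if (request instanceof MultipartAware && ((MultipartAware) request).useMultipart()) {
            return Transport.MULTIPART_POST;
        }
        return Transport.CONTENT_WRITER_BODY;
    }
}
```

With something like this, a Solr Cell request could return true from useMultipart() and keep
its metadata in the form parts, while ordinary update requests would continue to stream
their body through a ContentWriter.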




> Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post
> ----------------------------------------------------------------------------------------
>
>                 Key: SOLR-12798
>                 URL: https://issues.apache.org/jira/browse/SOLR-12798
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: 7.4
>            Reporter: Karl Wright
>            Priority: Major
>
> Project ManifoldCF uses SolrJ to post documents to Solr.  When upgrading from SolrJ 7.0.x
> to SolrJ 7.4, we encountered significant structural changes to SolrJ's HttpSolrClient class
> that seemingly disable any use of multipart post.  This is critical because ManifoldCF's documents
> often contain metadata in excess of 4K that therefore cannot be stuffed into a URL.
> The changes in question seem to have been performed by Paul Noble on 10/31/2017, with
> the introduction of the RequestWriter mechanism.  Basically, if a request has a RequestWriter,
> it is used exclusively to write the request, and that overrides the stream mechanism completely.
> I haven't chased it back to a specific ticket.
> ManifoldCF's usage of SolrJ involves the creation of ContentStreamUpdateRequests for
> all posts meant for Solr Cell, and the creation of UpdateRequests for posts not meant for
> Solr Cell (as well as for delete and commit requests).  For our release cycle that is taking
> place right now, we're shipping a modified version of HttpSolrClient that ignores the RequestWriter
> when dealing with ContentStreamUpdateRequests.  We apparently cannot use multipart for all
> requests because when we do, we get "pfountz Should not get here!" errors on the Solr
> side, which generate HTTP error code 500 responses.  That should not happen either,
> in my opinion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

