lucene-solr-user mailing list archives

From Tod <listac...@gmail.com>
Subject Re: Solrj ContentStreamUpdateRequest Slow
Date Fri, 06 Aug 2010 13:28:16 GMT
On 8/4/2010 11:11 PM, jayendra patil wrote:
> ContentStreamUpdateRequest seems to read the file contents and transfer it
> over http, which slows down the indexing.
> 
> Try Using StreamingUpdateSolrServer with stream.file param @
> http://wiki.apache.org/solr/SolrPerformanceFactors#Embedded_vs_HTTP_Post
> 
> e.g.
> 
> SolrServer server = new StreamingUpdateSolrServer("Solr Server URL",20,8);
> UpdateRequest req = new UpdateRequest("/update/extract");
> ModifiableSolrParams params = null ;
> params = new ModifiableSolrParams();
> params.add("stream.file", new String[]{"local file path"});
> params.set("literal.id", value);
> req.setParams(params);
> server.request(req);
> server.commit();

Thanks for your suggestions.  Unfortunately, I'm still seeing poor 
performance.

To be clear, I am trying to have Solr index multiple documents that 
exist on a remote server.  I'd prefer that Solr stream the documents 
itself after I pass it a pointer to them, rather than my retrieving and 
pushing them, so I can avoid the extra network overhead.

When I do this:

curl 'http://localhost:8080/solr/update/extract?stream.url=http://remote_server.mydomain.com/test.pdf&stream.contentType=application/pdf&literal.content_id=12342&commit=true'

It returns in around a second.  When I execute the attached code, it 
takes just over three minutes.  Ideally, I'd like to get closer to the 
performance I'm seeing with curl while using Solrj.
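
To make the comparison concrete, here is a rough sketch that assembles 
the same request URL as the curl command in plain Java, so the 
parameters can be checked side by side with what the Solrj code sends.  
The class and method names (ExtractUrlSketch, enc, buildExtractUrl) are 
made up for illustration and are not part of Solrj; only the parameter 
names and values come from the curl command.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Solrj: rebuilds the URL used in the curl test.
class ExtractUrlSketch {

    // URL-encode a single query parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Build the /update/extract URL with the same parameters as the curl call.
    static String buildExtractUrl(String solrBase, String docUrl,
                                  String contentType, String contentId) {
        return solrBase + "/update/extract"
            + "?stream.url=" + enc(docUrl)
            + "&stream.contentType=" + enc(contentType)
            + "&literal.content_id=" + enc(contentId)
            + "&commit=true";
    }

    public static void main(String[] args) {
        System.out.println(buildExtractUrl(
            "http://localhost:8080/solr",
            "http://remote_server.mydomain.com/test.pdf",
            "application/pdf",
            "12342"));
    }
}
```

Note that the curl call passes stream.contentType and commit=true in a 
single request, while the attached code sets neither and commits 
separately.  Whether that difference accounts for the slowdown is not 
something I have verified.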

To be fair, the Solr server I am using is really a workstation-class 
machine, plus I am still learning.  I have a feeling I'm doing something 
dumb but just can't seem to pinpoint the exact problem.
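
One way to narrow it down would be to time each step separately 
(connection setup, the request, the commit).  The following is only a 
minimal sketch of such a timer; StepTimer and timeMillis are 
hypothetical names, and the sleep in main is a stand-in for the real 
Solrj calls.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical timing helper, not part of Solrj.
class StepTimer {

    // Run one step, print how long it took, and return the elapsed milliseconds.
    static long timeMillis(String label, Runnable step) {
        long start = System.nanoTime();
        step.run();
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(label + " took " + elapsed + " ms");
        return elapsed;
    }

    public static void main(String[] args) {
        // Stand-in work. In the real program the lambda would wrap calls such as
        // solr.request(up) or solr.commit(), with a try/catch inside the lambda
        // because those methods throw checked exceptions.
        timeMillis("placeholder step", () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
}
```

Wrapping the request and the commit in separate timeMillis calls would 
show which of the two accounts for most of the three minutes.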


Thanks - Tod


--------code-----------


import java.io.File;
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;

import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;


/**
  * @author EDaniel
  */
public class SolrExampleTests {

   public static void main(String[] args) {
     System.out.println("main...");
     try {
//      String fileName = "/test/test.pdf";
       String fileName = "http://remoteserver/test/test.pdf";
       String solrId = "1234";
       indexFilesSolrCell(fileName, solrId);

     } catch (Exception ex) {
       System.out.println(ex.toString());
     }
   }

   /**
    * Method to index all types of files into Solr.
    * @param fileName
    * @param solrId
    * @throws IOException
    * @throws SolrServerException
    */
   public static void indexFilesSolrCell(String fileName, String solrId)
     throws IOException, SolrServerException {

     System.out.println("indexFilesSolrCell...");

     String urlString = "http://localhost:8080/solr";

     System.out.println("getting connection...");
//    SolrServer solr = new CommonsHttpSolrServer(urlString);
     SolrServer solr = new StreamingUpdateSolrServer(urlString,100,5);

     System.out.println("getting updaterequest handle...");
//    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
     UpdateRequest up = new UpdateRequest("/update/extract");

     // Pass Solr a pointer to the document (stream.url) instead of pushing its bytes.
     ModifiableSolrParams params = new ModifiableSolrParams();
//    params.add("stream.file", fileName);
     params.add("stream.url", fileName);
     params.set("literal.content_id", solrId);
     up.setParams(params);

     System.out.println("making request...");
     solr.request(up);

     System.out.println("committing...");
     solr.commit();

     System.out.println("done...");
   }
}
