lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr
Date Wed, 01 Apr 2015 15:34:52 GMT
The Data Import Handler is a process in Solr that reaches out, grabs
"something external" and indexes it. "Something external" can be a
database, files on the server, and so on. Along the way, you can apply
many transformations to the data. The point is that the source can be
anything.

The update handler is an end-point in Solr that expects certain
specific formats and puts them in the index. For instance, if you
index XML, it _must_ be in a very specific form to throw at the update
handler, something like
<add>
   <doc>
     <field...>
     <field...>
   </doc>
   <doc>
     <field...>
     <field...>
   </doc>
</add>
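
To make that shape concrete, here is a minimal Python sketch that builds such an <add> message from plain dicts. The function name build_add_xml and the field names "id" and "title" are my own illustrations, not anything from your schema; the only thing taken from Solr is the <add>/<doc>/<field name="..."> layout.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr <add> message from a list of dicts (field name -> value).

    A list or tuple value is written as repeated <field> elements,
    which is how a multi-valued field is expressed in this format.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # "id" and "title" are hypothetical field names for illustration only
    print(build_add_xml([
        {"id": "1", "title": "first"},
        {"id": "2", "title": "second"},
    ]))
```

The resulting string is what you would send as the body of a POST to the update handler with an XML content type.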

The CSV update handler is just an update handler that expects CSV
files. By default the header row supplies the field names, although you
can map the column headers in your CSV file to different field names in
your Solr schema.
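
As a rough sketch of that mapping, the Python below only assembles the request URL for a CSV upload. It assumes the handler's fieldnames, header and commit parameters as I remember them from the reference guide, so check them against your Solr version; the host, the core name "mycore", the field names and the helper csv_update_url are all made up for illustration.

```python
from urllib.parse import urlencode


def csv_update_url(base_url, core, fieldnames=None, commit=True):
    """Assemble a URL for posting a CSV file to a Solr update handler.

    fieldnames: optional list of Solr field names to use instead of the
    names in the CSV header row. header=true is sent alongside it so the
    first line of the file is still treated as a header, not as data.
    """
    params = {}
    if fieldnames:
        params["fieldnames"] = ",".join(fieldnames)
        params["header"] = "true"
    if commit:
        params["commit"] = "true"
    query = urlencode(params)
    url = f"{base_url.rstrip('/')}/{core}/update"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    # Hypothetical host, core and field names
    print(csv_update_url("http://localhost:8983/solr", "mycore",
                         ["id", "name", "price"]))
```

You would then POST the raw CSV file to that URL with a CSV content type (application/csv, if I recall the guide correctly). No regex or other per-row processing is involved on the client side, which is part of why this path is normally quick.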

Importing CSV files should be very fast. I suspect your regex is costly.

As Alexandre says, though, it would be a good idea to go through the
CSV import tutorial. The Solr reference guide has the details:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates

Best,
Erick

On Wed, Apr 1, 2015 at 8:04 AM, avinash09 <avinash.it09@gmail.com> wrote:
> sir , a silly  question m confuse here what is difference between data import
> handler and update csv
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904p4196940.html
> Sent from the Solr - User mailing list archive at Nabble.com.
