lucene-java-user mailing list archives

From nischal reddy <nischal.srini...@gmail.com>
Subject Re: Making lucene indexing multi threaded
Date Tue, 03 Sep 2013 03:39:26 GMT
Hi Erick,

I have commented out the indexing call (indexwriter.addDocument()) in
my application, and the run takes around 90 seconds. When I uncomment the
indexing call, it takes a lot of time.

My machine specs are:

Windows 7, Intel i7 processor, 4 GB RAM, and it doesn't have an SSD hard disk.

Can you please tell me how you are able to index 3-4K files in 1 second?
What approach are you following?

Is reading the files (I/O) eating up a lot of time?
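
One way to answer that question with numbers is to time each stage of the loop separately. The following is only a minimal sketch: StageTimer, reader, transformer and indexer are hypothetical names that do not appear in this thread, and the indexer argument merely stands in for whatever wraps indexwriter.addDocument() in the real application.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Accumulates wall-clock time per stage of a read/transform/index loop. */
class StageTimer {
    long readNanos;
    long transformNanos;
    long indexNanos;

    // reader, transformer and indexer are placeholders for the application's
    // own steps (assumption: the indexer would wrap IndexWriter.addDocument()).
    <I, R, D> void run(List<I> inputs,
                       Function<I, R> reader,
                       Function<R, D> transformer,
                       Consumer<D> indexer) {
        for (I input : inputs) {
            long t0 = System.nanoTime();
            R raw = reader.apply(input);        // e.g. read the file
            long t1 = System.nanoTime();
            D doc = transformer.apply(raw);     // e.g. build the document
            long t2 = System.nanoTime();
            indexer.accept(doc);                // e.g. index the document
            long t3 = System.nanoTime();
            readNanos += t1 - t0;
            transformNanos += t2 - t1;
            indexNanos += t3 - t2;
        }
    }

    String report() {
        return String.format("read=%d ms, transform=%d ms, index=%d ms",
                readNanos / 1_000_000, transformNanos / 1_000_000, indexNanos / 1_000_000);
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        List<String> sink = new ArrayList<>();
        // Stand-in stages; replace them with the real file reading and indexing.
        timer.<String, String, String>run(Arrays.asList("a.txt", "b.txt"),
                name -> "contents of " + name,
                String::toUpperCase,
                sink::add);
        System.out.println(timer.report());
    }
}
```

If the index figure dominates, the time is going into the indexing call rather than into file I/O, and the other way round.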

Any suggestions would help me a lot.

Thanks,
Nischal Y


On Mon, Sep 2, 2013 at 8:07 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> Stop. Back up. Test. <G>....
>
> The very _first_ thing I'd do is just comment out the bit that
> actually indexes the content. I'm guessing you have some
> loop like:
>
> while (more files) {
>   read the file
>   transform the data
>   create a Lucene document
>   index the document
> }
>
> Just comment out the "index the document" line and see how
> long _that_ takes. 9 times out of 10, the bottleneck is in
> reading and transforming the data, not in the indexing itself.
> As a comparison, I can index 3-4K docs/second on my laptop.
> This is using Solr and the Wikipedia dump, so the docs
> are several KB each.
>
> So, if you're going to multi-thread, you'll probably want to
> multi-thread the acquisition of the data and feed that
> through a separate thread that actually does the indexing.
> You don't want multiple IndexWriters active at once.
>
> FWIW,
> Erick
>
>
>
> On Mon, Sep 2, 2013 at 10:13 AM, nischal reddy
> <nischal.srinivas@gmail.com> wrote:
>
> > Hi,
> >
> > I am thinking of making my Lucene indexing multi-threaded. Can someone
> > throw some light on the best approach for achieving this?
> >
> > I will give a short gist of what I am trying to do; please suggest the
> > best way to tackle this.
> >
> > What am I trying to do?
> >
> > I am building an index for files (around 30000 files), and will later use
> > this index to search the contents of the files. The usual sequential
> > approach works fine but is taking a humongous amount of time (around 30
> > minutes; is this the expected time, or am I screwing things up somewhere?).
> >
> > What am I thinking of doing?
> >
> > To improve the performance, I am thinking of making my application
> > multi-threaded.
> >
> > Need suggestions :)
> >
> > Please suggest the best ways to do this. Normally, how long does Lucene
> > take to index 30k files?
> >
> > Please also point me to some links with examples (or best practices for
> > multi-threading Lucene) for making my application more robust.
> >
> > TIA,
> > Nischal Y
> >
>
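
The layout Erick describes above (several threads acquiring the data, one separate thread doing the indexing) could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: IndexingPipeline, loader and indexer are hypothetical names, the queue size of 1000 is arbitrary, and the indexer argument stands in for whatever wraps IndexWriter.addDocument() on the single writer.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

class IndexingPipeline {
    /**
     * Loads inputs on readerThreads worker threads and hands each result to
     * indexer on one dedicated thread. Returns how many items were indexed.
     * The loader must not return null.
     */
    static <I, D> int run(List<I> inputs, int readerThreads,
                          Function<I, D> loader, Consumer<D> indexer) {
        // Bounded queue: readers block instead of piling up documents in memory.
        BlockingQueue<Optional<D>> queue = new ArrayBlockingQueue<>(1000);
        int[] indexed = {0};

        // The only thread that calls the indexer.
        Thread indexThread = new Thread(() -> {
            try {
                while (true) {
                    Optional<D> next = queue.take();
                    if (!next.isPresent()) {
                        break;              // empty Optional marks the end of input
                    }
                    try {
                        indexer.accept(next.get());
                        indexed[0]++;
                    } catch (RuntimeException e) {
                        // Keep draining so the readers never block forever.
                        System.err.println("indexing failed: " + e);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        indexThread.start();

        ExecutorService readers = Executors.newFixedThreadPool(readerThreads);
        for (I input : inputs) {
            readers.execute(() -> {
                try {
                    queue.put(Optional.of(loader.apply(input)));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        readers.shutdown();
        try {
            readers.awaitTermination(1, TimeUnit.HOURS);
            queue.put(Optional.empty());    // tell the indexing thread to stop
            indexThread.join();             // join() also makes indexed[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return indexed[0];
    }
}
```

A call might look like IndexingPipeline.run(files, 4, MyApp::readAndTransform, MyApp::addToIndex), where both method references are placeholders for the application's own code. An input whose loader throws is simply skipped in this sketch; a real application would probably want to record those failures.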
