lucene-java-user mailing list archives

From "Ariel Isaac Romero Cartaya" <>
Subject Re: Big problem with big indexes
Date Mon, 16 Oct 2006 12:43:32 GMT
First of all, what is your machine architecture? Do you have a super PC?
I'm running this on a dual Xeon with hyperthreading, 2.4 GHz, 1 GB RAM, SATA HD.
I cannot get the search times you get. I think the problem may be in
the structure of my index. For example, I use a special analyzer for the
English language that filters stopwords, does stemming, and detects synonyms
with WordNet, but I need to do this to get the results I get. Could you help
me fix this problem?
I'm desperate.
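
On the hardware question: 1 GB of physical RAM does not say how much of it the JVM is allowed to use, which is what Erick asks about in point 1 below. A minimal sketch for reporting that from inside the JVM, using only the standard `java.lang.Runtime` methods; the class name `HeapReport` and the helper `toMiB` are made up for illustration:

```java
public class HeapReport {
    /** Converts a byte count to whole mebibytes (rounds down). */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most memory the JVM will attempt to use (set with -Xmx).
        System.out.println("max heap   (MiB): " + toMiB(rt.maxMemory()));
        // totalMemory(): memory currently reserved by the JVM.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory(): unused part of the currently reserved memory.
        System.out.println("free heap  (MiB): " + toMiB(rt.freeMemory()));
    }
}
```

Compile it with `javac HeapReport.java` and run it with the same JVM options as the search application, for example `java -Xmx512m HeapReport`, so the numbers reflect the heap the searcher actually gets.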

On 10/11/06, Erick Erickson <> wrote:
> Something's extremely not right <G>....
> First of all, I'm running a 1.4G index on a single machine and getting very
> good results, under 10 seconds even for the most complex queries I'm firing.
> This is with 870,000 documents, and includes sorting by criteria other than
> relevance. And using span queries. And using wildcards that build their own
> filters.
> So, something must be very different about how you are using lucene to get
> such poor search times.
> So, please tell us significantly more about the structure of your index and
> post the shortest example you can of your search code that demonstrates the
> problem, and maybe some of the wiser heads than mine can help out too.
> There should be no need to put the index in RAM, the index is just not big
> enough.
> So, some of the things I think would help analyze your problems....
> 1> hardware and op systems you're running on. Including how much memory
> you're allowing your JVM to have.
> 2> network topology. If you're running the searchers locally and just
> storing the indexes on remote machines, you're possibly having network
> latency problems. Personally, I don't think your problem is properly
> addressed by splitting your index. 600MB of index is just not big enough to
> need this.
> 3> This *should* work on a local machine with just a single index. How much
> trouble would it be to create it so? Can you try that and see what
> difference that makes?
> 4> how did you build your index? Is it optimized? Can you give us an idea of
> how many fields you are storing and some indication of the relative sizes of
> each? Mostly, I'm asking whether you have a bunch of small fields and some
> other very large ones.
> 5> Put one of the indexes on your local machine and get a copy of Luke
> (google luke lucene) and fire off a few queries via Luke and tell us what
> kind of results you get. Actually, this is probably the first thing you
> should try. If you get radically different results with Luke than your code,
> you can be pretty sure you're doing something out of the ordinary.
> 6> Timings of *only* the search code. By that I mean the time it takes for
> to complete. It's vaguely possible that the search is fine, but something
> you're doing when processing the results is taking forever. I have no
> evidence for this, of course, but it'd be a useful bit of information.
> I don't know if this helps much, but from your description, I think there's
> a fundamental, correctable problem because nobody would use the product if
> it gave such poor search times. And lots of people use it.
> Best
> Erick
> On 10/11/06, Ariel Isaac Romero Cartaya <> wrote:
> >
> > Hi everybody:
> >
> >      I have a big problem making parallel searches in big indexes.
> >      I have indexed over 60,000 articles with Lucene. I have distributed
> > the indexes across 10 computer nodes so that each index does not exceed
> > 60 MB in size. I run parallel searches on those indexes, but I get the
> > search results after 40 MINUTES !!! Then I put the indexes in memory to do
> > the parallel searches, but I still get the search results after 3 minutes !!!
> > That's too much time waiting !!!
> >   How can I reduce the search time ???
> >   Could you help me please ???
> >   I need help !!!!!
> >
> > Greetings
> >
> >
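
One way to get the timings Erick asks for in point 6 is to wrap the search call and the result handling in separate timers. The sketch below is plain Java with no Lucene calls: `runSearch` and `processResults` are hypothetical stand-ins for the real search call and for whatever is done with the hits afterwards, and the class and helper names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class SearchTiming {
    /** A result paired with the wall-clock time it took to produce. */
    public static final class Timed<T> {
        public final T result;
        public final long nanos;

        Timed(T result, long nanos) {
            this.result = result;
            this.nanos = nanos;
        }

        public long millis() {
            return nanos / 1_000_000L;
        }
    }

    /** Runs one step and measures only that step. */
    public static <T> Timed<T> time(Supplier<T> step) {
        long start = System.nanoTime();
        T result = step.get();
        return new Timed<>(result, System.nanoTime() - start);
    }

    // Hypothetical stand-in for the real search call; returns matching ids.
    public static List<Integer> runSearch(String query) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            if (i % 7 == 0) {
                ids.add(i);
            }
        }
        return ids;
    }

    // Hypothetical stand-in for post-processing (loading documents, formatting).
    public static int processResults(List<Integer> ids) {
        int processed = 0;
        for (int id : ids) {
            processed++; // real code would fetch or render the hit for this id
        }
        return processed;
    }

    public static void main(String[] args) {
        // Time the search on its own, then the result handling on its own.
        Timed<List<Integer>> search = time(() -> runSearch("example query"));
        Timed<Integer> post = time(() -> processResults(search.result));
        System.out.println("search only:       " + search.millis() + " ms");
        System.out.println("result processing: " + post.millis() + " ms");
    }
}
```

If the search-only figure is small and the processing figure is large, the index and the analyzer are probably not where the time goes.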
